Intel Software Guard Extensions (SGX) is a pretty cool piece of technology that aims to make it possible for user programs to hide secrets from other user programs and from the operating system itself. It establishes enclaves in the system that hide the data being processed, and the code processing it, from all other software. The original application for SGX was to support client-machine features like DRM, creating a safe space on a client that a server can trust. Recently, the people behind the Signal messaging system have provided a really interesting example of an application that uses SGX “in reverse”, to make it possible for a client to trust a server.
Fundamentally, SGX is about having a remote agent in a potentially hostile system that you can trust. It assumes you have a server that you control and trust, which communicates with a remote agent (an enclave on a client machine) that you want to establish trust in. The code and data in the enclave have to be set up from untrusted local storage on the remote machine. In a simplified view, the code has to exist in a local clear-text copy that gets copied into the enclave as it is being set up. Thus, there is no real way to guarantee what gets put into an enclave on the client end.
Once set up, the enclave will typically establish a network connection back to the trusted server and pull in the real secrets. The key to doing this right is to validate that the remote agent (enclave) can be trusted. The trusted party (server) has to be able to ascertain what was loaded into the enclave, and decide whether it wants to trust it. This is done using a mechanism known as attestation, where the trusted side basically gets measurements (secure checksums or hashes) of the setup on the untrusted side. If the measurements match what the server expects, the server can trust that the remote agent (enclave) runs the code it is supposed to run and has not been tampered with.
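The measurement check at the heart of attestation can be sketched in a few lines. This is a minimal illustration only: the function names `measure` and `verify_measurement` are made up for this example, a plain SHA-256 digest stands in for the enclave measurement, and the hardware-signed report that real SGX attestation relies on is left out entirely.

```python
import hashlib
import hmac


def measure(enclave_image: bytes) -> str:
    """Hypothetical stand-in for an enclave measurement: a SHA-256 digest of the loaded code."""
    return hashlib.sha256(enclave_image).hexdigest()


def verify_measurement(reported: str, expected: str) -> bool:
    """Compare the reported measurement with the expected one in constant time."""
    return hmac.compare_digest(reported, expected)


# The trusted side knows what the enclave is supposed to contain.
expected = measure(b"enclave code v1")

print(verify_measurement(measure(b"enclave code v1"), expected))  # True
print(verify_measurement(measure(b"tampered code"), expected))    # False
```

The trusted side only releases its secrets when the comparison succeeds. Any change to the loaded code, however small, produces a different digest.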
The typical scenario is for a server to check a client, but in the Signal case they turn that around and use SGX to allow the clients to check that the code running on the server can be trusted. In this case, the SGX enclaves are found on the server, not on the client.
Thus, we get from “trust” to “trust and verify”, which is a real breakthrough.
The Application
Signal uses SGX to let a user find other users to communicate with, without having to trust that the server does not keep a social graph of who knows whom.
The idea is that a Signal client application on a device like a phone contacts a Signal server and hands over a list of local contacts (hashed and encrypted) to the server. The server then checks which of the contacts are also Signal users, and sends that information back to the client. The client does not want the server to retain any information about the list of local contacts that it sent. The server should not know anything about which people are connected to which other people. Thus, the client has to trust the server to do the right thing – to do the match of their contacts against all active Signal users, and then forget everything that was sent.
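The basic exchange can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and phone numbers are invented for the example, SHA-256 is an assumed choice of hash, and the encryption of the list in transit is omitted.

```python
import hashlib


def hash_contact(phone_number: str) -> str:
    """Hash one contact identifier (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(phone_number.encode("utf-8")).hexdigest()


def client_request(contacts):
    """Client side: hash the local contact list before sending it."""
    return [hash_contact(c) for c in contacts]


def server_match(hashed_contacts, registered_hashes):
    """Server side: return the hashes that belong to registered users."""
    return [h for h in hashed_contacts if h in registered_hashes]


# Hypothetical data: one of the two local contacts is a registered user.
registered = {hash_contact("+15550001"), hash_contact("+15550003")}
request = client_request(["+15550001", "+15550002"])
matches = server_match(request, registered)

print(len(matches))  # 1
```

Nothing in this sketch stops the server from storing `request` after answering, which is exactly the trust problem described above.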
As the developers say in the blog post, doing this has been hard until now:
Doing better is difficult. There are a range of options that don’t work, like using bloom filters, encrypted bloom filters, sharded bloom filters, private information retrieval, or private set intersection. What if, instead, there were just a way for clients to verify that the code running on our servers was the code they wanted to be running on our servers, and not something that we or a third party had modified?
This is where SGX comes in. In reverse:
However, we can invert the traditional SGX relationship to run a secure enclave on the server. An SGX enclave on the server-side would enable a service to perform computations on encrypted client data without learning the content of the data or the result of the computation.
This provides a fundamental underpinning and is a very cool way to use SGX. SGX matches the goal – trust and verify.
Then, to build a really robust solution, there are many other practical concerns that the Signal developers had to resolve. For example, the list of active Signal users is in clear-text on the server… and it has to be handed in to the enclave in such a way that an observer in the OS cannot tell which of the users are being matched against the list from the client. Most of the blog post is spent explaining how this is done, removing side channels and information leakage through things like memory access patterns.
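One generic way to keep memory access patterns independent of the data is to touch every entry for every comparison, with no early exit. The sketch below shows only the shape of that idea. It is not necessarily what the Signal developers implemented, the name `oblivious_match` is made up for this example, and Python itself gives no real guarantees about timing or memory access.

```python
import hmac


def oblivious_match(client_contacts, registered_users):
    """Compare every client contact against every registered user.

    The loops always run to completion, and results are accumulated with a
    bitwise OR instead of a branch, so the sequence of accesses does not
    depend on which entries match.
    """
    matched = [0] * len(client_contacts)
    for user in registered_users:
        for i, contact in enumerate(client_contacts):
            matched[i] |= int(hmac.compare_digest(contact, user))
    return [bool(m) for m in matched]


# Hypothetical identifiers as bytes; only the second contact is registered.
print(oblivious_match([b"alice", b"bob"], [b"bob", b"carol"]))  # [False, True]
```

The cost is a full scan for every request, which is the usual price for hiding which entries were actually of interest.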
As a final flourish, the server source code is open and can be built reproducibly. Thus, we have a situation where it is possible to verify that an open-source piece of software running on a server is indeed what it claims to be! That is a rather useful building block to keep in mind for the future.