
PUBLISHED ON JULY 31, 2024

What is Verifiable Computing?

Every software developer knows that if someone else controls the hardware that runs your code, then you can’t completely trust the output. If a game player can change the abilities of their character by hacking the client software then they get an unfair advantage, so developers keep this logic out of the client-side code.

No problem, just put it on the server, and we trust the cloud providers enough for most things.

But what if there was a way to run code securely on hardware that you don’t control? What new use cases might it unlock?

In recent years we’ve developed a few ways to do this.

One type of computation we don’t currently trust to client devices is checking credentials, for example the KYC processes used by financial services: it’s common to require customers to upload photos of their ID, or to connect to their bank (often even requiring the customer to provide their bank password!) to retrieve account info, in order to check some validity condition.

Now if we can actually verify that the client device ran some specific computation correctly, and trust its output without even knowing the inputs, then we can eliminate a huge amount of personal data being accessed and stored by insecure third parties. There are many early-stage projects that do exactly this. They can prove possession of a valid citizenship document or even a passport. It’s possible for a client to prove the contents of any data they received over a secure web connection (TLS) or even the contents of any email (since email is generally now signed by DKIM).
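
As a concrete example of the kind of check that could run inside a verifiable environment, here is a minimal sketch in Python of verifying a DKIM-signed email and checking a single fact about it, using the third-party dkimpy package and the standard library’s email module. In a verifiable-computing setting this whole check would run inside the prover, so the verifier learns only the boolean result, never the email itself. The file name and domain below are placeholders.

```python
# Minimal sketch: a check that could run inside a verifiable environment.
# Requires the third-party "dkimpy" package (pip install dkimpy).
import email
from email.utils import parseaddr

import dkim  # provided by dkimpy


def sent_by_domain(raw_message: bytes, expected_domain: str) -> bool:
    """Return True if the message has a valid DKIM signature and its
    From address belongs to expected_domain."""
    if not dkim.verify(raw_message):  # cryptographic signature check
        return False
    msg = email.message_from_bytes(raw_message)
    addr = parseaddr(msg.get("From", ""))[1]
    return addr.lower().endswith("@" + expected_domain.lower())


# Example usage (placeholder file and domain):
# with open("statement.eml", "rb") as f:
#     print(sent_by_domain(f.read(), "examplebank.com"))
```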

Other use cases provide privacy for public blockchain transactions, fight disinformation, build better distributed and decentralized software, and reduce the risk in using public cloud providers for security-sensitive work.

How does it work?

Of course any computation can be verified by simply re-running it. This is how the original “Layer 1” blockchains work, and why they have limited throughput: every node re-runs all the transactions to verify them. But this obviously doesn’t scale. We want to offload computation to a client device or some other provider, and be sure that it was run correctly.
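
The naive approach is trivial to write down, which also makes the scaling problem obvious: verification costs exactly as much as the original computation. A minimal Python sketch:

```python
# Naive verification: re-run the computation and compare the result.
# This is effectively what every Layer 1 node does for every transaction,
# so verifying is as expensive as computing in the first place.
def verify_by_rerunning(program, inputs, claimed_output) -> bool:
    return program(*inputs) == claimed_output


# Example: checking someone's claim that 2 + 3 = 5 requires a full re-run.
assert verify_by_rerunning(lambda a, b: a + b, (2, 3), 5)
```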

There are two main ways to efficiently verify that a computation ran correctly: using trusted hardware or using a new cryptographic technique called “Zero Knowledge Proofs”. Neither of these was an option for general computation until recently.

Trusted hardware (trusted execution environment / TEE)

How do you trust the hardware if you don’t physically control it? Modern CPUs have a tamper-resistant module built in that contains a unique public/private key pair and the ability to “measure” the computing environment (essentially, take a cryptographic hash of the state of the hardware and software that’s running). If the hardware or software differs from what’s expected, the hash won’t match and secure boot fails. The CPU can then encrypt an area of memory and run code within it so that it can’t be modified (or, if it is modified, it will fail the cryptographic integrity check).

A verifier can then check the cryptographic signature of the output, following the chain of trust to a root certificate provided by the CPU manufacturer, guaranteeing that a genuine (and trusted) CPU ran the expected code correctly. The code can be audited, knowing that the exact version that was audited was actually used.
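
To make that chain of trust concrete, here is a toy model of the attestation flow in Python, using the third-party cryptography package. It is only a sketch of the idea: real attestation reports (from SEV-SNP or TDX, for example) are binary structures with X.509 certificate chains, and every key, “certificate” and report below is a simplified stand-in.

```python
# Toy model of TEE remote attestation using the "cryptography" package.
# Real attestation reports (SEV-SNP, TDX, SGX) are binary formats with
# X.509 certificate chains; this sketch keeps only the essential idea.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


def raw(pub: Ed25519PublicKey) -> bytes:
    return pub.public_bytes(Encoding.Raw, PublicFormat.Raw)


# --- Manufacturer side (done once, at manufacture time) ---
manufacturer_root = Ed25519PrivateKey.generate()    # root key, cert published
device_key = Ed25519PrivateKey.generate()           # burned into one CPU
device_cert = manufacturer_root.sign(raw(device_key.public_key()))  # "certificate"

# --- Device side (at boot / run time) ---
audited_code = b"the exact program binary that was audited"
measurement = hashlib.sha384(audited_code).digest()  # hash of the loaded code
output = b"result of running the program"
report = measurement + output
report_sig = device_key.sign(report)                 # signed attestation report


# --- Verifier side ---
def verify_attestation(report: bytes, report_sig: bytes,
                       device_pub: bytes, device_cert: bytes,
                       root_pub: Ed25519PublicKey,
                       expected_measurement: bytes) -> bool:
    try:
        # 1. The manufacturer's root key vouches for the device key.
        root_pub.verify(device_cert, device_pub)
        # 2. The device key signed this report.
        Ed25519PublicKey.from_public_bytes(device_pub).verify(report_sig, report)
    except InvalidSignature:
        return False
    # 3. The measurement matches the code we audited (SHA-384 is 48 bytes).
    return report[:48] == expected_measurement


ok = verify_attestation(report, report_sig,
                        raw(device_key.public_key()), device_cert,
                        manufacturer_root.public_key(), measurement)
print("attestation valid:", ok)
```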

The mechanism described above is roughly how AMD’s Secure Encrypted Virtualization (SEV) and Intel’s Trust Domain Extensions (TDX) work. This allows computation to occur natively, at almost full speed, on modern hardware. It even works with GPUs for large AI training workloads. (EQTY Lab, my previous employer, is building tools for verifiable AI training.)

Of course there’s still a trust assumption here. The hardware is difficult to tamper with, but attacks are at least theoretically possible, and there’s a long history of attacks on TEEs. Some of these only impact privacy, not verifiability, and the manufacturer only needs to be trusted at the time the specific CPU was manufactured: manufacturers publish certificate chains, and a compromised root key can be detected in a way similar to TLS certificate transparency logs.

We may not want to trust this alone to secure billions of dollars, but it’s a pragmatic solution for many use cases, and the trust assumptions involved are far smaller than those required by most critical systems today.

In the future we may have open source and verifiable “secure through physics” TEEs.

Zero Knowledge Cryptography

If we’re not comfortable assuming that the manufacturer is trustworthy and that an adversary hasn’t examined the chip with an electron microscope to extract the private key, there are purely mathematical techniques to verify computation on any untrusted device: general purpose Zero Knowledge Proofs.

These allow a verifier to quickly verify that a particular computation produced a particular output, even if the computation in question took a long time to run.

Developers no longer need to understand the math or build custom arithmetic circuits: there are now verifiable virtual machines that run compiled programs written in general purpose languages.

The first such VMs were built for running blockchain smart contracts, but then in 2022 RISC Zero built a truly general purpose tool: a virtual machine that implements the RISC-V instruction set, built on ZK circuits. It can run compiled programs written in any language and outputs a “receipt” that can be easily and quickly verified. Since then there has been a flurry of activity and there are now many options, as of 2024:

  1. The RISC Zero zkVM is still the most mature implementation, and currently the only one that supports continuations to prove very large programs without running into memory constraints.
  2. Jolt is a new, highly performant and fully open source zkVM.
  3. SP1 is another highly performant open source zkVM.
  4. Powdr is a toolkit for creating zkVMs like RISC Zero.
  5. Valida is another zkVM with a “RISC-inspired” custom instruction set. They’re building a compiler for LLVM IR so that almost any compiled language can be used.
  6. Delphinus Lab’s zkWasm is a WASM-based ZK runtime for running WASM code verifiably.
  7. Lurk is a Lisp-like language for writing programs that can be proved using zk-SNARKs.

This section won’t age well because this area is evolving so rapidly!
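
Whichever zkVM you choose, the developer-facing pattern is broadly the same: compile an ordinary program for the zkVM’s instruction set, run it in the proving VM to get an output plus a receipt, and let anyone verify that receipt cheaply. The sketch below uses a made-up Python module and function names purely to illustrate that prove-then-verify pattern; real APIs (RISC Zero’s, for example) are Rust libraries and differ in their details.

```python
# Hypothetical sketch of the common zkVM workflow. The "zkvm" module and
# everything on it is invented for illustration; it is not a real API.
import zkvm  # hypothetical module

# Prover side (an untrusted client or cloud machine):
program = zkvm.load_elf("guest_program.elf")  # program compiled for the zkVM's ISA
receipt = zkvm.prove(program, inputs={"account_balance": 1_000_000})
# The receipt contains the program's public outputs plus a succinct proof.

# Verifier side (fast, and far cheaper than re-running the program):
assert zkvm.verify(receipt, expected_program=program.image_id)
print("verified public output:", receipt.public_output)
```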

ZK performance costs

General purpose ZK computation comes with a heavy performance cost: currently around 1 million to 10 million times slower than running the program directly, so a computation that takes one second natively could take on the order of weeks to months to prove.

For some specialized applications this overhead can be greatly reduced, and this is highly promising for verifiable machine learning and AI:

the greatest part of an AI computation is matrix multiplications, for which it is possible to make very efficient ZK-SNARKs or MPCs (or even FHE), and so the total overhead of putting AI inside cryptographic boxes is surprisingly low. – Vitalik Buterin

So I would expect AI-specific runtimes to eventually provide good GPU support with reasonable performance (much less overhead than there would be for doing general computation within ZK or FHE). An example of work in this space is the EZKL machine learning framework, which allows existing PyTorch models to be run by a zero knowledge prover (so this is inference only, not model training, which generally requires much larger computation).
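
For a sense of what “run an existing PyTorch model in a prover” looks like in practice, toolchains in this space (EZKL included) consume a standard ONNX export of the model. The sketch below covers only that first, framework-agnostic step using ordinary PyTorch APIs; the ZK-specific steps (circuit setup, proving, verifying) are best taken from each framework’s own documentation. The model and file name are placeholders.

```python
# Export a small PyTorch model to ONNX -- the interchange format consumed
# by ZK inference frameworks such as EZKL. Model and file names are
# illustrative only.
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.net(x)


model = TinyClassifier().eval()
example_input = torch.randn(1, 4)

# Standard ONNX export; the resulting file is what a ZK prover toolchain
# would compile into an arithmetic circuit for verifiable inference.
torch.onnx.export(model, example_input, "tiny_classifier.onnx",
                  input_names=["input"], output_names=["output"])
```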

There are ZK-specific ASICs being developed, but we may need to wait for the pace of innovation to slow down and to converge on specific proof types before this is practical.

What about privacy?

Cryptography in general provides privacy as well as verifiability of data, and ZK and TEEs can both provide privacy and verifiability of computation.

The most common messaging around TEE-based products is privacy: cloud providers advertise their TEE-based VMs as “confidential computing”. It is, after all, an encrypted VM that not even the host can access.

In contrast, a Zero Knowledge “prover” (the host who runs the verifiable program) does have full access to the program and data. But verifiability itself can enable privacy: because the output can be verified, it’s now safe to let a client run the program, keeping the inputs private.

As an aside, there is another promising branch of cryptography called Fully Homomorphic Encryption (FHE), which allows computation over encrypted data. Using FHE a cloud provider could run a program on your encrypted data, and give you an encrypted result which only you can decrypt. The data remains encrypted throughout the whole process so the cloud provider can’t access it. By itself this doesn’t provide verifiability, although it can be combined with techniques that do.
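
FHE itself is heavyweight, but the core idea of computing on ciphertexts is easy to see with a simpler, additively homomorphic scheme. Below is a toy Paillier example in Python (Paillier is only additively homomorphic, not FHE, and the key size here is absurdly small, so this is purely illustrative): anyone can add two encrypted numbers without being able to decrypt either of them.

```python
# Toy Paillier cryptosystem: additively homomorphic, NOT fully homomorphic
# and NOT secure with keys this small -- illustration only. Python 3.9+.
import math
import random

# Key generation with tiny primes (real keys are thousands of bits).
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because g = n + 1


def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c: int) -> int:
    l = (pow(c, lam, n_sq) - 1) // n
    return (l * mu) % n


def add_encrypted(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % n_sq


# A "cloud provider" can add the numbers without ever seeing them:
a, b = 12345, 6789
total = add_encrypted(encrypt(a), encrypt(b))
assert decrypt(total) == a + b
print("decrypted sum:", decrypt(total))
```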

The future

The ability to outsource computation reliably is new, and it will change some of the most common patterns in software development, improving both distributed software and privacy.

Servers that request access to private data in order to process it can now let the client do the computation privately and securely instead, knowing that the result can’t be faked. Many things that required a server may no longer need one. And distributed software of all types now has more efficient ways of coming to consensus on shared state than re-running the computation on every node.
