In this blog post, we present *Testudo*, a new **near linear-time**$^*$ **prover** SNARK with the following advantages:

- **Small & Universal Setup:** It uses a trusted setup of square-root size, i.e., for an R1CS of size $N$, the trusted setup is of size $O(\sqrt{N})$. For large circuits, this brings the trusted setup material to MBs rather than GBs.
- **Very fast prover:** Estimated to run more than 5x faster than the fastest Groth16 implementation (i.e., Bellperson) for data-parallel computations.
- **Small Proofs & Fast Verifier:** Constant size proofs and verification time.
- **Uses R1CS,** the most widely used approach to writing circuits in deployed libraries. This gives us the following advantages:
  - Switching to Testudo allows us to reuse existing code.
  - It does not require rewriting circuits in a different arithmetization.

(*) Our prover runs $\sqrt{N}$ multi-exps of size $\sqrt{N}$, which is roughly $O(N \cdot \lambda/\log(N))$ group operations with $\lambda \gg \log(N)$ for security.
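
The count in this footnote can be spelled out in one line. The derivation below assumes a Pippenger-style bucket method for each multi-exp, costing $O(n\lambda/\log n)$ group operations for $n$ bases and $\lambda$-bit scalars; that cost model is an assumption made here for illustration:

$$
\sqrt{N}\cdot O\!\left(\frac{\sqrt{N}\,\lambda}{\log\sqrt{N}}\right)
= O\!\left(\frac{N\lambda}{\tfrac{1}{2}\log N}\right)
= O\!\left(\frac{N\lambda}{\log N}\right).
$$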

*Internals*: At its core, our SNARK relies on three carefully combined building blocks:

- a modified version of Spartan (sumcheck-based scheme)
- a pairing-based multivariate commitment scheme (modified version of PST)
- a final Groth16 layer.

An implementation of Testudo is in progress, built on arkworks with a blst integration that supports GPUs.

*Call for participation:* We welcome anyone interested to contribute to the Testudo implementation. The project tackles many challenging points and is at the state of the art when it comes to proving R1CS circuits. See the last section for more information!

*Why the name Testudo?* Testudo was a type of battle formation that ancient Rome adopted, where its soldiers operated “under the hood” of their shields. Testudo, the proof scheme, is similar: a Spartan prover working under the hood of Groth16.

## Context

Our initial motivation for developing Testudo was to improve the SNARKs used in Filecoin. Filecoin requires storage providers to prove to the whole network that they are holding the storage they had initially committed to. The circuit involved has $\approx 2^{30}$ constraints (one of the largest circuits used in practice today) and is verified by Groth16.

The computation is large enough to push current hardware to its limits: the big circuit is actually “broken down” into 10 subcircuits each of size $\approx 2^{27}$, due to limitations on the maximum size of the trusted setup. Also, one issue with Groth16 is the *function-specific* trusted setup, which complicates deployment of new versions of the Filecoin protocol (e.g., a new proof-of-space) since this would require a new circuit. While Filecoin is an interesting specific case study, issues of this kind may also apply more generally to other deployed systems with similar requirements.

Therefore, our goals were as follows:

- Universal trusted setup of sub-linear size
- Faster (or at least comparable) prover time than Groth16
- Short proofs and fast verification time
- R1CS-based, to be “off-the-shelf” compatible with current FIL Proofs circuits
- A SNARK that could leverage “data parallel computation” since FIL Proofs are largely data parallel

## A bit more in depth

Recall that an R1CS system is defined by three $N \times N$ matrices $A,B,C$, and we say that it is satisfiable if there exists a witness $w$ such that $\langle A \circ w \rangle \cdot \langle B \circ w \rangle - \langle C \circ w \rangle = \vec{0}$.
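
To make the satisfiability condition concrete, here is a minimal sketch of an R1CS check over a toy field. It reads $\langle A \circ w \rangle$ as the matrix-vector product and the outer product as entry-wise multiplication. The modulus `P`, the helper names, and the tiny "cube" circuit are all made up for illustration and are not taken from the Testudo codebase.

```python
# Toy R1CS check over a small prime field (illustrative only).
P = 97  # hypothetical small modulus; real systems use a ~255-bit prime


def mat_vec(M, z, p=P):
    """Matrix-vector product modulo p."""
    return [sum(m * x for m, x in zip(row, z)) % p for row in M]


def r1cs_satisfied(A, B, C, z, p=P):
    """True iff (A z) * (B z) - (C z) == 0 entry-wise modulo p."""
    Az, Bz, Cz = mat_vec(A, z, p), mat_vec(B, z, p), mat_vec(C, z, p)
    return all((a * b - c) % p == 0 for a, b, c in zip(Az, Bz, Cz))


# Witness layout z = (1, x, y, out) for the statement "out = x^3":
#   row 0: x * x = y
#   row 1: y * x = out
#   rows 2-3: zero padding so that the matrices are square (N = 4)
A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
B = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
C = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

good = [1, 3, 9, 27]  # x = 3, y = 9, out = 27
bad = [1, 3, 9, 28]   # wrong output value

print(r1cs_satisfied(A, B, C, good))
print(r1cs_satisfied(A, B, C, bad))
```

The first call accepts the honest witness and the second rejects the tampered one, because row 1 no longer balances.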

### Spartan Recap

At a high level, the Spartan prover consists of several steps as in the following:

### High Level Testudo Description

- **Polynomial Commitments:**
  - In a *preprocessing* phase, the prover encodes $A,B,C$ as sparse polynomials $\tilde{A}, \tilde{B}, \tilde{C}$ and commits to them via polynomial commitments (called **computation commitments** in Spartan). We note that for **uniform** circuits (i.e. very data-parallel, with many sub-circuits repeating in regular patterns), this step is not necessary or much reduced in complexity, since the Verifier can efficiently compute $\tilde{A}, \tilde{B}, \tilde{C}$ on their own (or the polynomials $\tilde{A},\tilde{B},\tilde{C}$ are much smaller than $A,B,C$).
  - In the **online** phase the prover computes a multilinear extension $\tilde{w}$ of the witness and commits to it using any multivariate polynomial commitment scheme. While Spartan uses a discrete-log transparent scheme (Hyrax) we use our modification of PST described below.
  - Note that the polynomials are of size $O(N)$ here, corresponding to the number of R1CS constraints.
- **Sumchecks:**
  - The Spartan prover executes two sumcheck protocols sequentially using $\tilde{A},\tilde{B},\tilde{C},\tilde{w}$ to prove the satisfiability equation of the R1CS constraints $\langle A \circ w \rangle \cdot \langle B \circ w \rangle - \langle C \circ w \rangle = \vec{0}$.
  - The sumcheck verifier naively runs in time $O(\log N)$. In Testudo we encode this verifier into an R1CS circuit and the prover provides a Groth16 proof of knowledge of the accepting Spartan sumcheck proof.
- **Opening of polynomial commitments**: At the end of the sumchecks, the prover needs to show to the verifier the value of the polynomials $\tilde{A}, \tilde{B},\tilde{C}, \tilde{w}$ on a random point $\vec{x},\vec{y}$ (of size $O(\log N)$).
  - 🐌 To open the modified PST commitment for the **witness**, the prover operates on an $O(N)$-sized polynomial, which requires several polynomial divisions to get the quotient polynomials necessary for the proof, plus $O(N)$-sized exponentiations.
  - 🐌 The prover also opens the **Computation Commitments**, which is where the Spartan prover spends a *large* chunk of its time. We recall however that on uniform circuits this step is not mandatory as verifier can open the matrix himself.
  - These openings can be either left in the “clear” for a $O(\log N)$ size proof (the PST opening of the witness polynomial) or also fed to a Groth16 prover for a constant size proof. We discuss the benefits and challenges of either approach below.
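
The sumcheck step and the final opening can be illustrated with a heavily simplified sketch. It runs the sumcheck protocol on a single multilinear polynomial given by its table of evaluations, so every round polynomial is linear. Spartan's actual sumchecks run over products of polynomials and are not reproduced here. The modulus and all function names are assumptions chosen for illustration.

```python
import random

P = 2**61 - 1  # a Mersenne prime, used here as a toy field modulus


def fold(table, r):
    """Fix the first variable of a multilinear table to r (halves its size)."""
    half = len(table) // 2
    return [(table[i] + r * (table[half + i] - table[i])) % P for i in range(half)]


def mle_eval(table, point):
    """Evaluate the multilinear extension of `table` at `point`."""
    for r in point:
        table = fold(table, r)
    return table[0]


def sumcheck(table, rng):
    """Run prover and verifier together; return (accepted, point, final_claim)."""
    claim = sum(table) % P            # the sum the prover claims
    point = []
    while len(table) > 1:
        half = len(table) // 2
        g0 = sum(table[:half]) % P    # round polynomial evaluated at 0
        g1 = sum(table[half:]) % P    # round polynomial evaluated at 1
        if (g0 + g1) % P != claim:    # verifier's per-round check
            return False, point, claim
        r = rng.randrange(P)          # verifier's random challenge
        claim = (g0 + r * (g1 - g0)) % P  # g(r), since g is linear
        table = fold(table, r)
        point.append(r)
    return True, point, claim


rng = random.Random(0)
w = [3, 1, 4, 1, 5, 9, 2, 6]          # N = 8 evaluations, so log N = 3 rounds
ok, point, final_claim = sumcheck(w, rng)
print(ok, len(point), final_claim == mle_eval(w, point))
```

The verifier does a constant amount of work per round and there are $\log N$ rounds. What remains is the last comparison: the final claim must equal the polynomial evaluated at the random point, which is the value the prover has to open from its commitment.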

### Testudo Details

In Testudo, we apply several optimizations to the blueprint above, both at prover and verifier level.

- 🚀 **Moving to universal trusted setup:** We note that above we require a trusted setup for
  - the modified PST polynomial commitments,
  - Groth16 on the sumcheck and polynomial commitment verification; we note that these circuits are of size $O(\log N)$, which is their contribution to the size of the trusted setup.

  Both are independent of the specific R1CS being proven, yielding a universal trusted setup.
- 🚀 **Reducing the trusted setup size**: PST, like KZG, requires a trusted setup of size $O(N)$. By *observing a tensor product structure* in the opening proof of PST, this can be reduced to $O(\sqrt{N})$. In practice, this means going from GBs to MBs when loading/storing a setup. This is obtained by designing a version of PST that works together with inner pairing product arguments (MIPP).
  - We briefly exemplify this on a polynomial defined by 4 evaluation points over the hypercube. Note how the `PST.Commit` steps only operate on a polynomial of half of the degree!
  - 🚀 The verifier needs to verify (a) one inner-pairing product proof of size $O(\log(\sqrt{N}))$ and (b) one PST opening of the same size (vs $O(\log(N))$ before; constants matter 😉).
- 🚀 **Faster Proving Times**
  - 🚀 Proving operates on $O(\sqrt{N})$-sized polynomials now instead of $O(N)$: so even if there are more polynomials, the proofs can be parallelised and are thus faster to compute. This holds in particular when it comes to the quotient polynomials which are smaller.
- 🚀 **Large speed up for uniform computations**: As an example, in Filecoin Proofs, we verify hundreds of Merkle tree openings. You can think of it as applying a subcircuit (“Verify one Merkle Tree opening proof”) many times over.
  - We now treat the “R1CS Matrix” as the matrix for the small subcircuit.
  - 💡 In the sumcheck phase, we simply “concatenate” each witness (for each subcircuit) together. In theory, we should also concatenate the R1CS matrices together, but, since they are the same, *the prover doesn’t have to do that in practice and we can leverage the specific details of the Spartan protocol here* (a slightly more formal draft).
  - Here is an example where we repeat the same circuit twice. Note that the prover only needs to commit and keep in memory the small blue matrix.
  - 🚀 While the witness polynomial remains the same length, the computation commitment now only operates *on the small circuit* which is a significant speed-up.
- 🚀 **Fast Verification and Small Proof Size**
  - 🚀 **Constant-time sumcheck verification:** A sumcheck operates on finite fields only. Therefore, in Testudo, the prover encodes the verification of the sumcheck as a circuit and gives back a SNARK proof to the final verifier. Testudo uses Groth16 to implement this verifier, which still keeps the “universality” of Testudo because sumcheck verification doesn’t change with the user circuit.
  - 🚀 **Constant-time polynomial commitment opening verification:** We can also apply a final proof system to compress the verification of the polynomial openings. These openings are of size $O(\log\sqrt{N})$, which is small enough to run inside a circuit.
    - We call this the *outer proof* over the *outer curve*, see next section for more details.
  - 🚀 This yields a constant size proof and constant time verifier!
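
The square-root trick rests on an algebraic fact about multilinear polynomials that is easy to check in isolation. The sketch below shows only that fact: a table of $N$ evaluations can be cut into $\sqrt{N}$ rows of length $\sqrt{N}$, each row evaluated on the second half of the point, and the results recombined using the first half. It contains no commitments, pairings or MIPP. The modulus and helper names are assumptions for illustration.

```python
P = 2**61 - 1  # toy field modulus (assumption for illustration)


def eq_weights(point):
    """Lagrange weights over {0,1}^k, with the first variable most significant."""
    weights = [1]
    for r in point:
        weights = [w * f % P for w in weights for f in ((1 - r) % P, r % P)]
    return weights


def mle_eval(table, point):
    """Direct evaluation of the multilinear extension: O(N) work on one big table."""
    return sum(t * w for t, w in zip(table, eq_weights(point))) % P


def split_eval(table, point):
    """Same value, computed from sqrt(N) row polynomials of size sqrt(N)."""
    k = len(point) // 2
    hi, lo = point[:k], point[k:]
    n = 1 << len(lo)                                  # length of each row
    rows = [table[i:i + n] for i in range(0, len(table), n)]
    row_vals = [mle_eval(row, lo) for row in rows]    # independent, parallelisable
    return mle_eval(row_vals, hi)                     # recombine with the high half


table = list(range(1, 17))      # N = 16, so 4 rows of 4 entries
point = [5, 7, 11, 13]
print(split_eval(table, point) == mle_eval(table, point))
```

Each row evaluation touches only a $\sqrt{N}$-sized polynomial and is independent of the others, which is the structure that allows smaller setup material and parallel proving in the description above.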

## Testudo81 and Testudo77

**Originally**, we wanted to realize Testudo on BLS12-377 (hence, *Testudo77*) because of its nice 2-chain property. This enables us to use BW6 (the “outer curve”) to efficiently prove statements about elliptic curve operations natively, for example to verify the PST opening inside a proof, where one needs to do scalar multiplications and pairings.

**However, Filecoin is operating on BLS12-381!** That means, in order to introduce this proof system in Filecoin, we would require storage providers to re-encode their storage using the new curve, mostly because of Poseidon which is field-dependent (*unlike* SHA256 for example).

**Testudo81** is our version of Testudo that runs on BLS12-381. The main issue is that it is *not* possible to use Groth16 naively on top of BLS12-381 because any “outer curve” will lack high 2-adicity in its scalar field, required to compute FFTs efficiently (see Timofey’s post for more information about that).

**The problem**: *How can we be backwards compatible, so that Filecoin storage providers don’t need to re-encode their storage?*
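
The 2-adicity obstacle can be checked with a few lines of arithmetic. An outer curve must have the inner curve's base field as its scalar field, so what matters is how many times 2 divides that base field modulus minus one. The helper `two_adicity` below is written for illustration. The moduli are the commonly published values for these fields and are worth double-checking against a curve library before relying on them.

```python
def two_adicity(modulus):
    """Largest s such that 2**s divides modulus - 1."""
    n, s = modulus - 1, 0
    while n % 2 == 0:
        n //= 2
        s += 1
    return s


# Commonly published field moduli (assumption: verify against a curve library).
BLS12_381_SCALAR = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001
BLS12_381_BASE = 0x1A0111EA397FE69A4B1BA7B6434BACD764774B84F38512BF6730D2A0F6B0F6241EABFFFEB153FFFFB9FEFFFFFFFFAAAB
BLS12_377_BASE = 0x01AE3A4617C510EAC63B05C06CA1493B1A22D9F300F5138F1EF3622FBA094800170B5D44300000008508C00000000001

# Scalar field of BLS12-381 itself: plenty of room for FFTs.
print("BLS12-381 scalar field:", two_adicity(BLS12_381_SCALAR))
# Scalar field that any outer curve for BLS12-381 would need: almost none.
print("BLS12-381 base field:  ", two_adicity(BLS12_381_BASE))
# Scalar field of BW6, the outer curve of BLS12-377: large again.
print("BLS12-377 base field:  ", two_adicity(BLS12_377_BASE))
```

With these values the BLS12-381 base field admits only a multiplicative subgroup of size 2 among powers of two, so radix-2 FFTs of useful length are unavailable to a Groth16 prover on an outer curve.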

### Testudo77

This version is the simplest and most elegant solution if we can afford using these curves. Basically it runs an external Groth16 proof system on BW6 to verify both sumcheck and polynomial commitment openings.

🛗 **Aggregation**: An additional advantage is that we can use tools like *Snarkpack* to further aggregate Testudo77 proofs!

**Outer proof constraints:** We expect the number of constraints to be less than 10 million, which results in a few seconds of additional proving time using Groth16 (which is totally fine for our setting).

### Testudo81

In this case, since we cannot use Groth16 to compress the proof size and verification time, we could leave them in the clear for an $O(\log N)$ size proof. Alternatively, we propose to use a subset of Testudo itself (i.e. the modified Spartan component, which does not require FFTs) to compress the polynomial commitment openings (using for example the Yeti curve as the “outer curve”). While this version achieves compatibility, it loses in compactness and verification time: the “Spartan part” of Testudo is not constant-size (asymptotically this would be an $O(\log \log N)$ proof size/verification time, with some high constants).

## Implementation

The work-in-progress implementation is open source on GitHub. Currently, it features:

- The Groth16 verifier of the sumchecks
- The square root version of PST + MIPP
- Arkworks wrapper around the fast blst library with GPU integration (repo)

**Benchmarks**

### Modified PST

One of our main contributions is a very fast implementation of a multilinear polynomial commitment, built on top of the existing arkworks library. We present a comparison between the two versions on a polynomial of size $2^{25}$:

| | Commitment (s) | Opening (s) | Verification (ms) | Proof size | Committer key |
|---|---|---|---|---|---|
| arkworks PST | 34 | 184 | 14 | 2 KB | 9.6 GB |
| testudo PST | 24 | 1.2 | 32 | 17 KB | 2.3 MB |

**Verification time**: Note that this time has more than doubled. This is due to a *non-optimized* implementation. For example, the PST and MIPP verifications happen in isolation, i.e. the pairings are evaluated separately instead of together. Several other optimizations still have to be implemented to verify both parts *together*.

### Estimation using uniform circuits

We have the necessary building blocks to accurately estimate the proving time of uniform circuits (even though the implementation does not yet offer that feature).

Specifically, we need to add the time for

- The first sumcheck on the full R1CS matrix (SC1)
- The second sumcheck on the small subcircuit (SC2)
- The PST times on the full witness size - commitment and opening combined (PST/MIPP)
- The computation commitment time on the small subcircuit (CC)

We have all this for a subcircuit of size $2^{20}$ repeated $128 = 2^{7}$ times, giving a circuit of size $2^{27}$ constraints.

| R1CS size | PST/MIPP (s), commitment + opening | SC1 (s) | SC2 (s) | CC (s) |
|---|---|---|---|---|
| $2^{27}$ | 47 + 3 | 105 | 107 | 3944 |
| $2^{20}$ | 1.17 + 0.454 | 0.843 | 0.773 | 30 |

**Total Proving Time:** PST($2^{27}$) + SC1($2^{27}$) + SC2($2^{20}$) + CC($2^{20}$) = **185s**

For comparison, Bellperson on a $2^{27}$ R1CS: **1020s**

**Speedup factor: ≈5.5x** (1020s / 185s) 🚀
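
The estimate can be reproduced from the rounded table entries alone. The short script below does that and also shows what the total would be if SC2 and CC had to run at full size. It uses only the figures quoted above; the variable names are ours.

```python
# Timings in seconds, copied from the table above.
pst_full = 47 + 3    # PST/MIPP commitment + opening at 2^27
sc1_full = 105       # first sumcheck at 2^27
sc2_full = 107       # second sumcheck at 2^27
cc_full = 3944       # computation commitment at 2^27
sc2_small = 0.773    # second sumcheck at 2^20
cc_small = 30        # computation commitment at 2^20
bellperson = 1020    # Groth16 (Bellperson) at 2^27

uniform_total = pst_full + sc1_full + sc2_small + cc_small
non_uniform_total = pst_full + sc1_full + sc2_full + cc_full

print(f"uniform estimate:     {uniform_total:.1f} s")
print(f"non-uniform estimate: {non_uniform_total} s")
print(f"speedup vs Bellperson (uniform): {bellperson / uniform_total:.1f}x")
```

With these inputs the uniform estimate comes to roughly 186 s and the ratio to the Bellperson figure to roughly 5.5, while the non-uniform total is dominated almost entirely by the computation commitment.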

💡 As you can see, the CC is the most expensive part. There are many improvements to be done at this early stage that can drastically reduce the proving time there. We remind the reader that for uniform circuits this cost can be eliminated however.
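
Why the small-subcircuit figures can stand in for the full matrix on uniform circuits comes down to block structure. The sketch below is a toy illustration, not the Testudo implementation: it checks that multiplying a block-diagonal matrix of identical blocks by the concatenated witness gives the same result as applying the small matrix to each witness chunk. The modulus, the matrix and the helper names are made up for the example.

```python
P = 97  # hypothetical small modulus for illustration


def mat_vec(M, z):
    """Matrix-vector product modulo P."""
    return [sum(m * x for m, x in zip(row, z)) % P for row in M]


def block_diag(M, copies):
    """Explicit block-diagonal matrix with `copies` copies of M (comparison only)."""
    n = len(M)
    big = [[0] * (n * copies) for _ in range(n * copies)]
    for c in range(copies):
        for i in range(n):
            for j in range(n):
                big[c * n + i][c * n + j] = M[i][j]
    return big


def uniform_mat_vec(M, z, copies):
    """Apply M to each witness chunk without ever building the big matrix."""
    n = len(M)
    out = []
    for c in range(copies):
        out.extend(mat_vec(M, z[c * n:(c + 1) * n]))
    return out


A_small = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # matrix of one subcircuit
w1, w2 = [1, 5, 7], [1, 11, 13]                # one witness per repetition
z = w1 + w2                                    # concatenated witness

print(mat_vec(block_diag(A_small, 2), z) == uniform_mat_vec(A_small, z, 2))
```

Only `A_small` has to be stored and committed to, while the witness keeps its full length, which matches the split between full-size and small-size terms in the estimate above.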

### Comparison with Plonk-ish techniques

When it comes to universal trusted setup proofs, many systems today do not use R1CS but rather “custom gates” (sometimes also called Plonkish arithmetization), and apply SNARKs such as Plonk (or alternatives such as Hyperplonk) to the resulting constraint systems. The use of “custom gates” makes a comparison to pure R1CS-based schemes not immediate. We are still working on achieving meaningful comparisons but we estimate that Testudo is competitive with approaches that do use custom gates.

## Open problems

We still have a few problems to solve down the road:

- ☄️ **Efficient “streaming” aggregation:** Snarkpack allows a prover to only keep Groth16 proofs locally and then aggregate them all at once. This is very light in memory, even for a large number of proofs.
  - For Testudo81, can we achieve the same functionality and performance as Snarkpack? See our draft for our current thinking and problem.
  - For Testudo77, can we achieve a *faster* aggregation? Instead of “fully” proving a Testudo proof, can we keep some small intermediate steps and finalize the last parts of Testudo over all proofs?
- 📚 **Compact proof on Testudo81:** Some open questions are:
  - How to achieve a compact proof using Testudo81?
  - Can we use another proof system, with non-native arithmetic in a practical way?
  - Can we use a mixed FFT algorithm to go around the 2-adicity problem?
- 🚀 **Faster computation commitment and CRS for it**: Currently, the computation commitment is **still a bottleneck** compared to the rest (when *not* using data parallelism). One area to explore is taking advantage of the PST commitment scheme here and batch it with the PST of the witness.
- **Different Polynomial Commitments**: Another design option would be to choose Dory as it could simplify the implementation of the Computation Commitment. In general, being agnostic to the polynomial commitment would help to try different strategies.

## Call for participation

We’re looking for enthusiastic engineers to help us push this effort forward. We believe that building this in the open will give the best results. This is a complex piece of software and the structure behind it is challenging.

There are both design and engineering challenges that are left open. On the implementation side, some of the items we would love to work on are:

- ⚙️ **A R1CS circuit building layer**: Basically bringing a “ConstraintSystem” that enables development of complex R1CS circuits. The original Spartan test codebase only created R1CS manually.
- ⛓️ **Data-parallel Testudo** that greatly reduces the proving time over uniform circuits. This can be a game changer for Merkle tree path verification for example.
- 🏎️ **GPU-optimized schemes**: For example, bringing the whole PST commitment to GPU *at once*, bringing the computation commitment to the GPU, and even look at parallel computation of the sumcheck on the GPU.
- 📦 **A simplified, and more compact computation commitment (CC):** By moving CC to use PST for example we can apply several optimizations combined with the PST commitment computed for the witness. We can also reduce the size of the preprocessing data required during proving time, which can easily attain TB for circuits of size $2^{30}$.

If you want to help, please reach out on the discord server of cryptonet or email us here! Feel free to discuss over Twitter.

## Acknowledgements

We thank Srinath Setty for helpful pointers about the Spartan codebase that unblocked us many times!

## Team

The Testudo effort was started inside the cryptonet team. The main Testudo team is composed of: **Matteo Campanelli** (cryptonet), **Nicolas Gailly** (cryptonet), **Rosario Gennaro** (cryptonet/CCNY), **Philipp Jovanovic** (UCL), **Mara Mihali** (formerly UCL/cryptonet), **Justin Thaler** (a16z/Georgetown).