# 1. TL;DR

## Bad News

Q: Is a big space gap a problem?

A: Seems like it; there doesn’t seem to be a way to fix it (sections 2-3 and 6 below).

Q: What does that mean for the cost of unsealing?

A: I don’t see a way to improve the cost of unsealing, because we don’t have constructions with small space gaps and low cost of unsealing.

Q: Can we have a big space gap, but have both good and bad guys take advantage of the space gap?

A: Doesn’t seem like it, because effectively you’d get a new PoS with a small space gap, so that means you need to know how to design such a PoS (section 4 below)

## Good News

Q: Can we think of bigger space gaps and incorporate the space gap into the rationality analysis?

A: Yes. In fact, if we do so, we will almost certainly improve the cost ratio for almost any construction we can think of. However, we STILL need some result that shows the cost is high for a small space gap. (Section 5 below)

Q: What does it mean for SDR as currently deployed?

A: I think we can improve the cost ratio by another factor of 3-4 on top of the 11x improvement we’ve already had. In other words, we may be able to prove that the adversary’s cost is equivalent to recomputing about 7 layers of the 11 layer graph (currently we have about 2.2 layers; before that, we had about 0.2 layers)

# 2. Basic summary of the traditional analysis

**Traditional Analysis.** Typical proof-of-space security analysis goes like this. Suppose the space requirement is $n$, and assume the adversary can perform the PC1 computation (that is, pebbling the whole graph; equivalently, unsealing) at cost $\sigma$. (Because we don’t know for sure what the best technology available to the adversary is, we must take $\sigma$ to be smaller than the typical cost of PC1 computation during sealing; assume $a\cdot \sigma$ is the cost of the actual PC1 computation for typical storage providers, so that $a$ is the ratio of costs between typical providers and what is available to the adversary.) If the adversary stores less than $(1-g)n$, then at execution time the adversary will have to redo (in expectation) some fraction $r$ of the cost $\sigma$. This $g$ is known as the *space gap*.

**Application of the Traditional Analysis to the Latency model**. This analysis seems sufficient in the latency model as long as the space gap $g$ is small enough: if $r\sigma$ must be done sequentially and the verifier will time out before it’s done, then the verifier will catch any adversary who stores less than $(1-g)n$; if the penalties for being caught are significant enough, the adversary will be incentivized to store at least $(1-g)n$.

**Application of the Traditional Analysis to the Cost Model.** In the cost model, assume the cost of storing $n$ for the entire time between executions is $s$. Then an adversary who erases everything will incur cost at most $\sigma$ (perhaps less, depending on what it costs to compute query answers, which may be less than computing everything); to ensure a rational adversary does not do so, we need at least $\sigma>s$.

But what about an adversary who commits some amount of storage $(1-\epsilon)n$ that is less than $n$ but more than $0$? With the traditional analysis, here’s the best we can say:

1) If $\epsilon<g$, all we know is that the adversary will have cost $(1-\epsilon)s$.

2) If $\epsilon\ge g$, all we know is that the adversary will have expected cost $(1-\epsilon)\cdot s+ r\cdot \sigma$; since we don’t know anything else about $\epsilon$, all we can use is $r\cdot\sigma$.

Thus, to ensure that erasing more than $gn$ is not a rational strategy, we need $r\sigma>(1-g)s.$ (Note that there is nothing we can do to ensure that erasing less than $gn$ is not a rational strategy — given this type of analysis, it may be perfectly rational.) This leads us to conclude that $\sigma>(1-g)s/r.$ Noting that typical parties incur cost $a\sigma$ to unseal, we get that the unsealing cost is at least $a(1-g)s/r$. Noting that $r<1$, that’s a pretty high unsealing cost.

In other words, if we want to ensure that storage, rather than computation, is the rational strategy, then the cost of unsealing must be at least $a(1-g)/r$ times the cost of storage between executions. In our current implementation, $1-g=0.8$. The probability of being caught is $1-(1-0.1)^{10}\approx 0.65$ (because the probability of being caught is 10% per challenge per our analysis, and there are 10 challenges), and if you are caught, you pay a fraction $1/5$ of $\sigma$ (per the new analysis; in the previous analysis it was $1/55$), so the expected fraction is $r=0.65/5=0.13$. Setting $a=10$ (we don’t actually know what $a$ is, but this has been considered reasonable), **we get that the cost of unsealing must be at least** $10\cdot 0.8/0.13 \approx 60$ **times the cost of storage between WindowPoSTs in order for storage to be rational.**
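The arithmetic above can be reproduced in a few lines. This is only a sanity-check sketch; the parameter values (10% catch probability per challenge, 10 challenges, $r_{\text{caught}}=1/5$, $a=10$, $g=0.2$) are the assumptions stated in the text, not measured quantities.

```python
# Sanity check of the "cost of unsealing must be ~60x storage" calculation.
# All parameter values are the assumptions from the surrounding text.

g = 0.2               # space gap: adversary may erase up to g*n "for free"
p_catch_per_query = 0.1  # probability one challenge catches the adversary
num_queries = 10
r_if_caught = 1 / 5   # fraction of sigma redone when caught (new analysis)
a = 10                # typical-vs-adversarial computation cost ratio (assumed)

p_catch = 1 - (1 - p_catch_per_query) ** num_queries  # ~0.65
r = p_catch * r_if_caught                             # expected fraction ~0.13
ratio = a * (1 - g) / r                               # required unsealing/storage ratio

print(f"p_catch = {p_catch:.2f}, r = {r:.2f}, required cost ratio = {ratio:.0f}")
```

Running this gives $p_{\text{catch}} \approx 0.65$, $r \approx 0.13$, and a required ratio of about $60$, matching the bolded conclusion.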

# 3. How was the space gap incorporated into rationality in the traditional analysis?

Why do we want a small space gap in the cost model? The lower bound on the cost of unsealing actually favors a big space gap. The only reason we want a small gap is that we want this to be a proof of space, and the smaller the space gap, the more space we can be sure is occupied. In the extreme, imagine we had a huge space gap of $g=99\%$. Then it would be rational to store $1\%$ of each replica, and that’s all we can prove. That doesn’t feel like much of a proof of space. Moreover, the typical honest participants store $100\%$ of each replica, because they actually provide storage for data, which they can’t compress. This would make it easy for the adversary to take over the majority of the space in the system (WinningPoST helps with that, but that’s another story).

# 4. What if the space gap is large but everyone can take advantage of it?

In the analysis above, we assumed the adversary could erase a $g$ fraction of the storage without any penalty for answering queries. This, of course, raises the question of whether the honest parties can do the same.

It is certainly possible to come up with PoS constructions in which the honest parties can erase some amount of storage and reduce their costs of answering queries (see the end of this section for an example). Then they should do so! The result is simply a new PoS construction. In other words, after the honest parties have taken advantage of the space gap, all you have is a different PoS. In that new PoS, we won’t know how to take advantage of the space gap anymore (else we could improve the PoS again). But this new PoS will still likely have a space gap against parties who don’t follow the protocol, because of the difference between what we know how to do and what we can prove is impossible to do (i.e., the difference between known upper and lower bounds, and the ratio $a$ of known cost of computation to adversarial cost of computation).

The situation is more subtle in a PoRep, because removing a portion of the PoS reduces your available PoRep storage accordingly (else you are not actually saving on storage, in which case you might as well keep the PoS as is, since you gain nothing by taking advantage of the space gap). And if we care about the cost of sealing / unsealing, then removing a portion of the PoS and the corresponding storage may actually increase the per-byte cost of sealing / unsealing (since you may be erasing bytes that are cheaper to seal / unseal).

So even if you can take advantage of the space gap, in a PoRep context it may have other costs.

So, for the discussion below, we assume a PoRep construction is fixed, and honest parties execute it as is, because they have to store data.

**Example of a space gap everyone can take advantage of.** For example, suppose we have a single-layer PoS (full graph, not just the tail) of in-degree $d$. Take any node. Erase it. Mark its $d$ ancestors as unerasable. Now erase any unmarked node. Mark its ancestors as unerasable. Repeat until all nodes are marked as unerasable. At each step, you mark at most $d$ new nodes and erase 1, so you can erase a $1/(d+1)$ fraction of the nodes before all the nodes are marked. (You can probably improve this to $1/d$ by erasing unmarked nodes that have at least one edge from a node already marked as unerasable, but you have to prove that such a node exists at every step.) Note that answering queries for unerased nodes is the same as before, and answering queries for erased nodes requires just a single hash of $d$ ancestors. So for a random query, with probability $1-1/(d+1)$ you do one lookup, and with probability $1/(d+1)$ you do $d$ lookups plus one hash. Thus, the expected cost of answering a query is at most about two lookups and one hash. So the honest parties can also erase those nodes if the cost of one extra lookup plus one hash per query is less than the cost of storing a $1/(d+1)$ fraction of the PoS.
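The erase-and-mark procedure can be simulated to sanity-check the $1/(d+1)$ bound. This is a sketch on a hypothetical random in-degree-$d$ DAG (standing in for the real PoS graph); the graph size, degree, and seed are arbitrary. The simulation visits nodes from sinks toward sources, which guarantees an erased node never loses one of its own parents.

```python
import random

def greedy_erase(n=2000, d=5, seed=1):
    """Greedily erase nodes of a random in-degree-d DAG, marking each erased
    node's d parents as unerasable, per the space-gap example above."""
    rng = random.Random(seed)
    # Node i (for i >= d) gets d distinct parents among nodes 0..i-1;
    # nodes 0..d-1 are sources and are always kept.
    parents = {i: rng.sample(range(i), d) for i in range(d, n)}
    erased, unerasable = set(), set()
    # Visit from sinks toward sources: a node's parents have smaller indices,
    # so they cannot have been erased already, only marked.
    for v in range(n - 1, d - 1, -1):
        if v in unerasable:
            continue
        erased.add(v)                   # answer v's queries by re-hashing parents
        unerasable.update(parents[v])   # parents must stay stored
    return len(erased) / n

frac = greedy_erase()
print(f"erased fraction: {frac:.3f}; guaranteed bound 1/(d+1) = {1/6:.3f}")
```

Each erase marks at most $d$ new nodes, so the erased fraction always comes out at least $(n-d)/((d+1)n) \approx 1/(d+1)$; overlaps among marked sets typically push it higher.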

# 5. Improving the traditional analysis by considering multiple space gaps and rationality

In reality, a rational adversary can use $(1-\epsilon)n$ storage for any $\epsilon$. What if instead of a single space-gap analysis for $\epsilon=g$, we had a function $r(\epsilon)$ which told the adversary the (expected) work $\sigma\cdot r(\epsilon)$ needed to pass execution? We know $r(0)=0$ (execution is trivial to pass) and $r(1)\le 1$ (even if you store nothing, you won’t have to do more work than pebbling the whole graph, or PC1). Then the rational (risk-neutral) adversary will choose $\epsilon$ that minimizes the cost of storage plus the expected cost of answering, i.e., $(1-\epsilon) \cdot s+\sigma \cdot r(\epsilon)$.

The current SPR proof tells us only that for $\epsilon=0.2$, $r(\epsilon)> 1/5\cdot 0.65 = 0.13$ (assuming we use 10 queries in the execution phase, because 0.65 is the probability of being caught). But imagine we could also prove (something like this seems plausible, because we add a new almost fully unpebbled layer for each time we take away 0.1113 pebbles) that

- $r(0.3)>0.18$
- $r(0.4)>0.23$
- $r(0.5)>0.28$
- $r(0.6)>0.33$
- $r(0.7)>0.38$
- $r(0.8)>0.43$
- $r(0.9)>0.48$

Then what we know about an adversary who chooses to save, say, $0.7n$ storage is that the expected computation is at least $0.38\sigma$. That’s much better than what we know from the traditional single space-gap result, which allows the adversary to save almost all the storage with an expected computation of only $0.13\sigma$.

We can convert this to a smoother result: “$r(\epsilon)>0.5\cdot\epsilon + 0.03$ as long as $\epsilon\ge 0.2$”.
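A quick check that the smoothed linear bound agrees with the pointwise bounds listed above (these are the hypothetical values from the text, not proven ones):

```python
# Verify that r(eps) = 0.5*eps + 0.03 reproduces the tabulated lower bounds.
points = {0.2: 0.13, 0.3: 0.18, 0.4: 0.23, 0.5: 0.28,
          0.6: 0.33, 0.7: 0.38, 0.8: 0.43, 0.9: 0.48}

for eps, bound in points.items():
    assert abs((0.5 * eps + 0.03) - bound) < 1e-9, (eps, bound)

print("linear bound matches all listed points")
```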

Then to argue that it’s irrational for the adversary to reduce storage by more than $g=20\%$, it suffices to argue that for all $\epsilon\ge 0.2$, $(1-\epsilon) \cdot s+\sigma \cdot (0.5\epsilon + 0.03)>0.8s$. Because the left-hand side is linear in $\epsilon$, it’s minimized at one of the extremes $\epsilon=0.2$ or $\epsilon=1$. Note that at $\epsilon=0.2$ the inequality is definitely true (it becomes $0.13\sigma>0$), and at $\epsilon = 1$ it becomes $0.53\sigma>0.8s$. Thus, it is not rational to save $20\%$ or more storage as long as $\sigma>1.5s$. This is four times better than the single space-gap argument, which would require $0.13\sigma>0.8s$, or $\sigma>6.2s$.
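The threshold argument can be confirmed numerically. This sketch uses the hypothetical smoothed bound $r(\epsilon)=0.5\epsilon+0.03$ from above, with $s$ normalized to $1$, and compares the adversary’s total cost against the $0.8s$ cost of simply storing $(1-g)n=0.8n$:

```python
# Numeric check of the sigma > 1.5s threshold under the smoothed bound.
def adversary_cost(eps, s, sigma):
    # storage cost plus expected recomputation cost, r(eps) = 0.5*eps + 0.03
    return (1 - eps) * s + sigma * (0.5 * eps + 0.03)

s = 1.0
for sigma in (1.6, 1.0):  # just above / well below the sigma > 1.5s threshold
    worst = min(adversary_cost(e / 100, s, sigma) for e in range(20, 101))
    print(f"sigma = {sigma}s: erasing >= 20% is irrational: {worst > 0.8 * s}")
```

With $\sigma=1.6s$ the adversary’s cost stays above $0.8s$ for every $\epsilon\in[0.2,1]$ (the minimum, at $\epsilon=1$, is $0.53\cdot 1.6s=0.848s$), while with $\sigma=s$ full erasure beats honest storage, consistent with the $\sigma>1.5s$ threshold derived above.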

# 6. Can we live with a large space gap?

Even the improved analysis above starts with a small space gap of $0.2$. That is, saving 19.99% of storage may well be a perfectly rational decision, because we have proven nothing interesting about how much that would cost.

What if our only starting point was a big space gap; that is, what if there was nothing we could prove about a small space gap? Then the improved analysis doesn’t seem to help much compared with the traditional analysis: if the space gap is large, we still have a problem.