Creator

Created

Jun 21, 2023 12:46 PM

@, June 2023

➡️

What to expect from this document? This is a recap of constructions we have considered in the past for proving storage in Filecoin. They are divided in two sections: (1) The first section lists replacements for SDR (ie, replacements for the Proof of Useful Space itself and therefore for the method used to add storage). (2) The second section lists alternative design to winningPoSt and windowPoSt (ie, how and when the storage auditing for the chosen PoUS is conducted).

🗣️ Please, be aware that this doc does not describes neither analyses each construction in full details. Links to the relevant documents are provided for this! [Some of these docs may not be public yet, sorry! Ping me if you need access.]

Proofs of Useful Space - adding storage
Asymmetric Constructions
ZigZag
DRG PoRep
Symmetric Constructions
Small SDR
NSE
Wide PoRep
Others…
PoSt & friends - auditing storage
Latency model
ElectionPoSt
Sequential ElectionPoSt
ChainPoSt
Cost Model
SurprisePoSt
BucketPoSt
Periodical Rescaling
ChainPoSt

	Security Model	Possible paired PoSt (auditing schedule)	Security analysis completed?	“Fast” (faster than SDR) retrieval ?
ZigZag	latency (based on SHA256 latency)	winningPoSt	no (sealstack to solve, bug in the pebbling argument to fix)	yes (asymmetric)
DRG PoRep	latency (based on SHA256 latency)	winningPoSt	no (sealstack to solve)	yes (asymmetric)
Small SDR	cost and latency (based on SHA256 evaluation)	windowPoSt/ chainPost	yes	yes (symmetric)
NSE	cost (based on SHA256 evaluation)	windowPoSt (or anything from the cost model category)	yes	yes
Wide PoRep	latency (based on SHA256 latency)	winningPost	yes	yes-ish (no explicit params)

Proofs of Useful Space - adding storage

Here we provide a list of possible replacements for the Stacked-DRG proof of useful space (aka “SDR”). For each construction, we provide (1) a very high level idea of the construction, (2) some comments about possible pros/cons and similarities respect to SDR (eg, security model) and (3) links to relevant documents with more details.

Asymmetric Constructions

We call a PoS “asymmetric” when the decoding algorithm (from replica back to data) is faster than the encoding algorithm (ie, the algorithm run by the prover/SP to go from data to replica).

Note any asymmetric construction used in the latency model (eg, in combo with winningPoSt) is subject to the following problem:

SealStack Attack: Assume we have a PoRep/PoUS/ with fast unsealing/decoding. The SP runs the sealing process and creates replica R (from data = 0, or any data); The SP runs the sealing process again with data = R and creates replica R'; The SP deletes R and keeps only R'. Every time a post (eg, winning PoSt) is required for R, the SP runs the unsealing algorithm on R' and gets R. If the unsealing algorithm is faster than 30s (block time/response time), then winningPoSt is broken. In other words we must have RetrievalLatency > RegenLatency > ResponseTime (Note: if unsealing algorithm is fast but still as expensive as sealing, then WindowPoSt is still secure and we can use the asymmetric construction in the cost model ) Possible solutions:
(sub optimal) Use the construction in the cost model with windowPoSt only;
Maybe use the construction for FIL+ deals (data can not be chosen by the prover);

ZigZag

Construction: It is similar to Stacked-DRGs (SDR) PoUS and presented in the same paper. Respect to SDR, it achieves fast retrieval because the decoding procedure can per run in parallel (ie, all nodes are decoded in parallel). In the paper, it was designed to work in the latency model (so it can work with any PoSt from the next section from the “latency model” category). The latency bound uses SHA256 running time on ASICs

It should be possible to use it in the cost model with similar analysis as the one we did for SDR. Comments:

Same spacegap as SDR (20%), similar or slightly worse params than SDR; Sector size can stay as today (32/64GiB);
⚠️ The security proof in the paper is broken (irene and luca found a bug that still need to be fixed and will have consequences on the parameters)
Sealstack

Links:

Ben’s paper

DRG PoRep

Construction:

Take a DRG, and use the “DAG encoding” to create the replica R from the data D: R_i = D_i + H(R_j for all j that are parents of i ) (note: encoding is sequential, decoding is parallelazible)

Comments:

Sector size can stay as today (32/64GiB);
Fast retrieval (10 s for 10MB)? Yes, this solution is an asymmetric porep with local decodability)
🔍 Large spacegap (if we use the DRG we have now)

Possible fix: find better DRG construction! See for example https://docs.google.com/document/d/1GAk87FUms8IZkJXJ4Kt5_Za_4Q1b-GnRDR2UA3UVt_o/edit#bookmark=id.omi8z5hfyxc3

It works well in the latency model, it can be used paired with winningPoSt and then a rescaling method (or fallback mechanism, see next section)
🔍 SealStack problem

Links:

Symmetric Constructions

We call a PoS “symmetric” when the decoding algorithm is the same as decoding.

Small SDR

Construction: The same as the currently used SDR, just smaller sector size.

Comments:

Sector size suggested to get fast retrieval: 512MiB/256MiB
As it is today for SDR, we can use this both with latency and cost model:

cost model: use windowPoSt (same as current shipped solution but more snarks per day per miner). Could use snarkpack/aggregation here to solve/improve this?
latency model: we can not use winningPoSt (latency bound is smaller than 30s), maybe we can used the ChainPoSt idea (requires VDF)

in 2021, we estimated the per day SNARK proofs generated for windowPoSt be higher than the one with SDR.

Note that the impact of this have changed now, since we use the optimistic approach and we could use SnarkPack.

Links:

https://docs.google.com/document/d/1GAk87FUms8IZkJXJ4Kt5_Za_4Q1b-GnRDR2UA3UVt_o/edit#bookmark=id.bp1h80okt0gw

NSE

Constructions: Based on labeling a layered graph. The graph is made by expander graphs and butterfly graphs (no DRGs) and it is also divided horizontally in small (ie, 4MiB) independent windows. These can be labeled in parallel, so helping with fast labeling and faster retrieval than SDR. Designed to work in the cost model, the main parameters were give for cost based on SHA256 evaluation (same as SDR) but some informal arguments about labeling being also memory/bandwidth hard were given (but no formal analysis for that)

Comments:

Sector size was much lager than 32/64GiB, but time to first byte was lower!
The idea was to use it together with windowPoSt, but we estimated the per day SNARK proofs generated for windowPoSt be higher than the one with SDR.

Note that the impact of this have changed now, since we use the optimistic approach and we could use SnarkPack.

Links: https://github.com/filecoin-project/research-private/blob/master/papers/nse.pdf

Wide PoRep

Construction:

Based on labeling a layered graph. The graph consists of a unique (larger) DRG layer on the top and then smaller (narrowed) butterfly layers below of it. Do the same labeling as in SDR and then use the “XOR encoding” to create the replica R from the data D.

Comments:

Goal: keep decoding reasonably cheap (and low-latency) while still getting strong latency bounds
Fast retrieval? maybe yes (no explicit params)
How do we do the audit (PoSt)? WinningPoSt should work, or others in the latency model. Note clear if we can use this in the cost model, maybe yes!

Links:

See construction 2 here https://github.com/filecoin-project/research-private/blob/master/papers/wide.pdf

Others…

Others (less clear or less studied proposal) proposals:

Stacked-Expanders and similar ideas (Stack expanders with much higher degree than the one used in SDR and improve the regeneration bound, to be used in the cost model)

link: https://www.overleaf.com/read/kgfyzjvsshwt

Tried to use constructions similar to the Stacked-Expander or NSE in a cost model where the main cost is given by memory access (ie, bandwidth hard graph), but did not manage to find good params. Links:

https://docs.google.com/document/d/12cy4VhPL26G76Y5ULkcUJutgHewwZDWP1jh7IzAfGI8/edit?usp=sharing

Expanded-SDR (This construction is based on the combination of small SDR graphs and a big expander that acts like a glue. To be used in the latency model, mainly)
Mem PoRep

Memory-intensive PoRep
DRGs with high degree, small sector size (e.g., 4 GiB)
Edges determined dynamically (as function of hash outputs)
Goal: rely on memory-hardness for adversary

Modern GPUs: very high memory bandwidth (~1 TB/s)
Either cost or latency based

WinPoRep (windowed porep, overaleaf by Juan)
VDE-PoRep

VDE stay for verifiable delay encoding (it can be instantiated using a VDF)
R_i = VDE(D_i+ H(comm_D))
it works in the latency model, VDF a_max can be problematic
see https://eprint.iacr.org/2018/678.pdf

Selective Unsealing

For example: “per node sealing” → Each node is sealed and unsealed independently

Toy Example: a "comb sealing", where each node of the key is the composition of multiple hash of different nodes) , K1 ← P1 ← P2 ← … ← PN , K1 =: H(P1) , Pi =: H(Pi+1), R1 = K1 + D1

Pros: We can have both faster retrieval and updatability
Cons: It seems this requires a huge amount of online challenges or sealing cost will spike!
This construction works in the cost model.
Link: https://docs.google.com/document/d/1fpxNiGOoPFhBbHETpVFrTYeUi88upyeOR6oEN5I-BiU/edit#bookmark=id.mee7y8y9um7

PoSt & friends - auditing storage

This section contains a list of alternative designs we considered for auditing storage over time before we decided to ship Filecoin with WinningPoSt and WindowPost. 👉 Note: not any PoSt can be used paired with any Proof of Useful Space. Once a PoUS is chosen, then a security model is chosen (latency or cost, rarely both are possible) and we can go a pick a PoSt in that category. Moreover, the explicit PoSt params will depends on the PoUS chosen. For example windowPoSt for SDR does not have the same parameters as windowPoSt for NSE.

Latency model

ElectionPoSt

Check every (or a fraction of) sectors at each epoch (eg, every 30 s), use the PoSt proofs as input for a standard VRF based election. Protocol overview:

At the beginning of the block:

Miner gets challenges from the chain (or drand)
A subset of sectors are randomly selected: selected = sample_rate * miner_size
For each selected sector, miner reads some random positions (can be done in parallel), hash do generating the “partial tickets”
Miner checks if any of the partial ticket is winning the VRF-based election

something like, partial ticket < target

A winning ticket and can be used as to submit a block (generate a single PoSt for the block)

the probability of scratching a winning ChallengeTicket is dependent on sector size and total storage. Miners are rewarded in proportion to the quantity of winning ChallengeTickets they generate in a given epoch, thereby incentivizing a miner to check as much of their storage as allowed in order to express their full power in a leader election.

Security (latency model):

if a sector is not stored, the miner has no time to regenerate and can to create the partial ticket

Major Problems:

if the sample_rate = 1 —> hardware requirements (i.e. if you are a big miner you have to try all your sector in a short period of time), checking all HDDs every 30s may not be not feasible!
if sample_rate ≠ 1 —> possible power table inflation

Links to past relevant docs:

Expected_Consensus.pdf400.8KB

Sequential ElectionPoSt

Comments:

Proposed to get fast sealing/unsealing

estimation in 2019: ~137s per 32GiB

The PoRep was DRG-PoRep
The security model was latency and it tried to solve this problem

large challenge-response time (1h) —> large sealing/unsealing time (120 hours), not good for product
short challenge-response time (1 min or 15 secs) —> miners not able to post on chain on time, footprint is high, but sealing is fast

Commit-then-prove paradigm

Protocol overview:

At the beginning of the block:

Miner gets challenge from the chain (or drand)
Miner generates a PoSt with sequential challenges (PoSt on sector i gives the challenges for PoSt on sector i+1)
Miner checks if they won the block using the output of the last post
Commit to the post output (witness)

After X blocks:

Miner submits a SNARK proof

Major problems:

same as ElectionPoSt (?)

Links to past relevant docs:

ChainPoSt

Challenge a new sector every x (each sector is chosen at random), post on chain only one SNARK after many sectors are proven.

drand seed —> PoStProof for R1 → VDF → PoStProof for R2 → submit on chain

To cover block time: 30 seconds say RegenLatency = x seconds

say VDF runs for y seconds

numbers of PoSt proofs = n

⇒ n*x + (n-1)*y = 30 seconds (n = (30+y)/ (x+y)

If you do not have a sector, you could lose the block reward (same as winningPoSt Rational argument) Attack:

run the VDF in y/A_max time say for example A_max = 2

Notes:

it should be possible to use this with “small SDR/small DRG” PoRep to get fast retrieval
security in the latency model: either most sectors are stored, or the probability of be able to post a valid SNARK is low
Link: https://docs.google.com/document/d/1GAk87FUms8IZkJXJ4Kt5_Za_4Q1b-GnRDR2UA3UVt_o/edit#bookmark=kix.gwtw1oh2qsha

Cost Model

SurprisePoSt

Given a proving period P consisting of k epochs and a power table with length M (meaning with M miners), at each epoch, M/k miners are sampled at random and need to prove their storage (or a fixed amount of his sectors).

Security: in the cost model

Deprecated in favour of WindowPost, since the second one was easier to implement and analyze.

Minor comments:

WindowPoSt was called “Scheduled PoSt” in some old docs;
We used “fallback PoSt” as a generic name for Surprise/Window PoSt. The idea was: the “main” PoSt (for security) is Election PoSt and then you need another mechanism for rescaling/power table inflation

BucketPoSt

Idea: Like winningPoSt (PoSt at election time, not periodically), but (1) use the cost model instead of latency (2) challenge a fraction of sectors instead of only 1

Problem: it require a minimum size for miners. Link: https://docs.google.com/document/d/1GAk87FUms8IZkJXJ4Kt5_Za_4Q1b-GnRDR2UA3UVt_o/edit#bookmark=id.bn7cr5ajl16j

Periodical Rescaling

[This is not a PoSt mechanism] There are periodical checkpoint where each miner locally updates his own power table according to the frequency each other miner posted Election/Winning PoSt proofs on-chain since the last checkpoint.

This can be used in the place of any fallback PoSt mechanism.

Problems:

may be hard to implement ??
it was not enough for “storage market soundness/security” (aka “sector accountability”): A miner has to prove that he can access/retrieve the sectors he is claiming (with some frequency). See https://docs.google.com/document/d/1gJrAmuUKsMSS4K-u3elPp11WE7NY1qt3Pzqw30mnbkE/edit

ChainPoSt

Challenge the same sector every few hours, post on chain only one SNARK after many PoSt proofs for the sector are collected.

See here: https://docs.google.com/document/d/1fpxNiGOoPFhBbHETpVFrTYeUi88upyeOR6oEN5I-BiU/edit#bookmark=id.8at2q1i9s940

Recap:

Consider the scenario of a scheduled/windowPoSt (ie, per sector, miner proves once every X hours). VDF can be used to space out different execution steps of a Proof of Space (aka PoSt) without having to post on chain each PoSt.

drand seed —> 1st PoStProof → VDF → 2nd PoStProof → submit on chain

Security (in the cost model):

VDF runs for Y hours

We need Y + operational time = X So Y needs to be close to X If this is not possible, repeat many time:

drand seed —> 1st PoSt → VDF → 2nd PoSt → VDF —> … n-th PoSt —> submit on chain

n*Y ~ X

Attack:

The attacker has SW and HW to run the VDF in Y/A_max (faster)

The attacker is not storing and regenerated for the 1st PoSt, then runs VDFs and the other PoSt proofs, as soon as he is done, it deletes the sector. RC = regen cost SC = storage cost for X hours (this is the cost for honest player) The attacker pays: RC + SC/X * (nY/A_max) ~ RC + SC/A_max

Or the attacker pays regeneration twice, to not store during VDF running time

Comparison formula: RC > SC* min {(A_max -1) / A_max, 1/2}

Without VDF: RC > SC

So we can have better parameters with the VDF!

Surveying prior endevors in Filecoin's Proof of Useful Space

Proofs of Useful Space - adding storage

Asymmetric Constructions

ZigZag

DRG PoRep

Symmetric Constructions

Small SDR

NSE

Wide PoRep

Others…

PoSt & friends - auditing storage

Latency model

ElectionPoSt

Sequential ElectionPoSt

ChainPoSt

Cost Model

SurprisePoSt

BucketPoSt

Periodical Rescaling

ChainPoSt