- Fast Retrieval
- Overview: retrieval via unsealing or extra copy?
- Hot/cold storage paradigm
- Option A: miners to keep an extra unsealed copy of the data.
- Rationality of the extra copy strategy
- Option B: Solving the SealStack problem and design a PoRep with fast unsealing.
Author: @Irene Date: December 2021
Fast Retrieval
Overview: retrieval via unsealing or extra copy?
If we assume that data are retrieved directly form their sealed copies (ie, replicas), then in order to get fast retrieval we need to work on the PoRep protocol and replace SDR with a PoRep which provides fast unsealing. Option B below shows a possible research paths following this direction (not easy). However, we believe this is not the only way to attack the problem since in most of the cases retrieval will happen without unsealing. We believe that miners can store the replica plus a copy of the data unsealed (“extra copy” strategy) and use this to quickly answer retriever queries (see Option A below). Why this? For two reasons:
- Reason Number 1: hot/cold storage paradigm;
- Reason Number 2: rationality of the extra copy. That is, we can prove that for any file with a retrieval frequency greater than the replica audit frequency, then the extra copy strategy is the rational one (see at the bottom of this doc for details).
Hot/cold storage paradigm
Usually, a storage provider has multiple copies of a file in order to guarantee reliability to clients. These copies do not have all the same features and are not all stored on the same medium (hardware), indeed we can distinguish in
- copy in hot storage: used for frequently accessing the data and for answering clients’ retrieval requests in very short time (ms); stored on the latest and fastest drives or SSDs.
- copy in a cold storage: used for backup. It is often the default option for the “second copy” and cheaper than hot storage because uses slower drives.
Note that hot data storage because it’s resource-intensive and has higher cost (so, cloud data storage providers may charge a an extra for providing this service)
We think that the cold/hot storage distinction makes sense for Filecoin as well! Sectors represent cold storage, where the data are stored (in encoded form) for the entire duration of a storage deal. For data that are frequently requested for retrieval, miners offering the retrieval service has a copy of the data in the clear (ie, not encoded), this extra copy is the hot storage in Filecoin. Keeping the extra copy has an higher cost since it requires extra space, but the market will take care of this. Moreover we can show that it is still the rational option for frequently requested data.
Option A: miners to keep an extra unsealed copy of the data.
(feasible)
We already know that this last option is the rational strategy if retrieval happens at least once in a day (see section below for details). For this reason, we believe this option is the most interesting to explore as first step.
The only missing step here is to enforce the daily retrieval (in the cost model, we could do this adding a daily PDP, in the time model we could use a non-perfect VDF).
Cons:
- space utilization is not optimal, are we okay with this?
- SHA CommD makes this option expensive to prove
Rationality of the extra copy strategy
Respect to retrieval, a miner can choose among two approaches:
- Storing only the replica and answering retrieval queries via unsealing (unsealing strategy), or
- Storing the replica plus a copy of the data unsealed (“extra copy” strategy).
The choice among the two depends on the different use-cases and on the unsealing cost and time. However, we can state that
To support the statement above, we provide the following argument:
TERMINOLOGY:
- Extra Copy Threshold: if a file is requested T_file times in a day, then storing the extra copy costs the same than retrieving the file from the replica via unsealing for each request;
- Retrieval frequency (RFreq) = number of retrievals in a day;
- Trivially, if RFreq > T_file ⇒ the extra copy is the rational (ie, cheapest) strategy.
- Replica check frequency (CFreq) = number of times that a replica is audited via a PoST in a day (eg, today we have CFreq = 1);
ANALYSIS:
Step 1: we show that T_file < CFreq
In order to avoid the seal-stack attack we must have that the cost of passing the check via unsealing the challenged nodes is larger than the cost of storing the replica:
UnsealingChecksCost * CFreq > StoringReplicaCost_for1day.
On the other hand, by definition we have that :
UnsealingFileCost * T_file = StoringFileCost_for1day.
Putting all together, we find that
⇒ T_file < (UnsealingChecksCost / UnsealingFileCost)* CFreq
Assuming that for efficiency we check very few values (ie, the size of checks is smaller than the average file size), we have that UnsealingChecksCost ≤ UnsealingFileCost and therefore for any file, T_file < CFreq.
Step 2: now we have that RFreq ≥ CFreq ⇒ RFreq > T_file, and the extra copy is the rational strategy.
Option B: Solving the SealStack problem and design a PoRep with fast unsealing.
(very hard research problem)
- Problem 1: Design a PoRep with fast unsealing (asymmetric or not).
- We only know one (broken) asymetric PoRep (ZigZag) which have fast unsealing. For this option we suggest a FastR&D effort to understand if this is feasible/interesting path.
- Symmetric solutions: We know how to get to ~10 s retrieval for file of size 10MB thanks to the previous effort (Q1&Q2 2021), but decreasing this values again is not trivial.
- Problem 2: For all the design above, we know that SealStack attack applies and winning PoSt is broken. SealStack seems an hard problem to solve, we have brainstormed only few ideas. But in order to be able to deploy a PoRep with fast unsealing (faster than 30s), we need to solve it!
SealStack Problem
Assume we have a PoRep with fast unsealing. The SP runs the sealing process and creates replica R (from data = 0, or any data); The SP runs the sealing process again with data = R and creates replica R'; The SP deletes R and keeps only R'.
Every time a post (winning or window post) is required for R, the SP runs the unsealing algorithm on R' and gets R.
If the unsealing algorithm is faster than 30s (block time), then winningPoSt is broken. (Note: if unsealing algorithm is fast but still as expensive as sealing , eg something parallelizable, then WindowPoSt is still secure)