Public project doc:
Protocol Overview
We start with an initial consortium of n auditor parties A_1,...,A_n. At this stage, the aim of these nodes is to scan the network according to the storage deals that are on-chain (each storage deal is required to provide retrieval).
In particular, this means that the parties being audited are essentially storage providers. In future versions of the protocol, we'll open up auditing to any IPFS node. To do so, we'll put in place mechanisms such as indices, which will allow non-SP parties to be audited according to what they claim to store and keep available for retrieval.
Step 1: Initialization
The Auditor list is managed by a Smart Contract: a commitment to the Auditors list is stored onchain.
Parties involved in Step 1:
- Auditors (passive)
- Smart Contract on-chain storing the Auditors list
Step 2: Survey Protocol
Survey protocol starts at epoch1_start and ends at epoch1_deadline.
- Each Auditor A_i, for each storage_deal on a CID cid, queries the corresponding storage provider SP_j and checks:
- Time to First Byte (Metric 1)
- Average Retrieval Speed (Metric 2)
- Retrieval Success (Metric 3)
- According to the survey results, each auditor A_i builds a local table of the form
- T_i = [SP_ID, Metric 1, Metric 2, Metric 3]
- The local table T_i is committed to:
- A_i.Commit = Comm(T_i) [note: here and below, we can either use SHA-256 for Comm or a more structured vector commitment (VC)]
- The commitment is signed:
- SignedA_i.Commit = Sign(sk_{A_i}, A_i.Commit)
- The pair (A_i.Commit, SignedA_i.Commit) is posted onchain
- For now we are not considering availability of the Auditors' tables. The Aggregator is trusted to propagate the tables it received
- A Smart Contract checks SignedA_i.Commit:
- Verify(pk_{A_i}, A_i.Commit, SignedA_i.Commit)
- T_i is stored offchain by A_i
Survey Protocol ends at epoch1_deadline
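The commit-and-sign flow above can be sketched as follows. This is a minimal illustration, not the production scheme: SHA-256 over a canonical serialization stands in for Comm (per the note above), and HMAC-SHA256 stands in for the auditor's real signature scheme (onchain this would be e.g. ECDSA); all key and table values are hypothetical.

```python
import hashlib
import hmac
import json

def comm(table):
    """Comm(T_i): SHA-256 over a canonical serialization of the table."""
    return hashlib.sha256(json.dumps(table, sort_keys=True).encode()).hexdigest()

def sign(sk, commitment):
    """Stand-in for Sign(sk_{A_i}, .): HMAC-SHA256 here; a real deployment
    would use the auditor's onchain signature scheme."""
    return hmac.new(sk, commitment.encode(), hashlib.sha256).hexdigest()

def verify(sk, commitment, signature):
    """Stand-in for Verify(pk_{A_i}, .): with HMAC, the same key re-signs
    and the digests are compared in constant time."""
    return hmac.compare_digest(sign(sk, commitment), signature)

# Local table T_i = [SP_ID, Metric 1, Metric 2, Metric 3]
T_i = [
    ["SP_001", 120, 3.5, 1],  # TTFB (ms), avg speed (MB/s), success (0/1)
    ["SP_002", 450, 1.2, 1],
]
sk_Ai = b"auditor-secret-key"             # hypothetical key material
A_i_commit = comm(T_i)                    # A_i.Commit = Comm(T_i)
signed = sign(sk_Ai, A_i_commit)          # SignedA_i.Commit
assert verify(sk_Ai, A_i_commit, signed)  # the smart-contract-side check
```

Only the pair (A_i_commit, signed) would go onchain; T_i itself stays with the auditor.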
Parties Involved in Step 2
- Auditors
- Storage providers
- Smart Contract Checking valid signatures on tables
Note on the Survey Protocol Step:
- Auditors' Tables: Auditors' tables do not need to be accessed by any smart contract. This means that the way they are built and organized can be adapted to the protocol's needs (e.g., having a unique table per Auditor vs. having a different table per metric). The main driver here should be simplifying the proof-of-correct-aggregation step.
- Auditors' Offchain Storage: We assume Auditors maintain a list of SP public keys offchain, together with their mapping to SP_IDs.
Step 2b: Sharing Auditors' Tables
The sharing step starts at epoch2_start, right after epoch1_deadline, and ends at epoch2_deadline.
- Each Auditor A_i sends the table T_i offchain to the Aggregator
Step 3: Aggregation
Aggregation step starts at epoch3_start, right after epoch2_deadline.
- (optional) The Aggregator Aggr posts onchain the list of Auditors A_1,...,A_k from whom it received a table, together with whether each table matches its commitment (0/1):
- [A_1, A_1.Commit, 0/1; ...; A_k, A_k.Commit, 0/1]
- Aggr aggregates the data from the valid tables received from A_1,...,A_n (tables that do not verify against the corresponding commitment are discarded) and produces the aggregated tables
- The aggregation is an average of the internal values received for each SP, for each metric (i.e., the lowest and the highest reported value are discarded before averaging). The result has the form
- T1_Aggr = [Metric 1; SP_ID1 , Metric1_value; ...; SP_IDn, Metric1_value]
- T2_Aggr = [Metric 2; SP_ID1 , Metric2_value; ...; SP_IDn, Metric2_value]
- T3_Aggr = [Metric 3; SP_ID1 , Metric3_value; ...; SP_IDn, Metric3_value]
- Aggr Commits to the tables
- Aggr.Commit1 = Comm(T1_Aggr)
- Aggr.Commit2 = Comm(T2_Aggr)
- Aggr.Commit3 = Comm(T3_Aggr)
- Aggr signs the commitments
- SignedAggr.Commit1 = Sign(sk_{Aggr}, Aggr.Commit1)
- SignedAggr.Commit2 = Sign(sk_{Aggr}, Aggr.Commit2)
- SignedAggr.Commit3 = Sign(sk_{Aggr}, Aggr.Commit3)
- The following pairs are sent onchain:
- (Aggr.Commit1, SignedAggr.Commit1)
- (Aggr.Commit2, SignedAggr.Commit2)
- (Aggr.Commit3, SignedAggr.Commit3)
- A Smart Contract checks SignedAggr.Commit1, SignedAggr.Commit2, SignedAggr.Commit3:
- Verify(pk_Aggr, Aggr.Commit1, SignedAggr.Commit1)
- Verify(pk_Aggr, Aggr.Commit2, SignedAggr.Commit2)
- Verify(pk_Aggr, Aggr.Commit3, SignedAggr.Commit3)
- Aggr runs the Data Availability Protocol for T1_Aggr, T2_Aggr, T3_Aggr
- In the most straightforward case, the Data Availability Protocol simply means posting the aggregated tables on-chain
- A smart contract checks correct aggregation
- Verify(Aggr.Commit1, Aggr.Commit2, Aggr.Commit3, A_1.Commit, ..., A_k.Commit) → 0/1
Note that we could reconsider the way we aggregate (i.e., live with a plain arithmetic average for the MVP) if this simplified the way we prove correct aggregation (e.g., with vector commitments, excluding some values would make the proof of correct aggregation considerably more complicated).
Note: Given gas costs, especially in the long run, we can consider using a unique table rather than three different ones. See the cost analysis below (Option 2). The dynamics of the protocol remain unchanged.
Aggregation Step ends at epoch3_end
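The aggregation rule described above (for each SP and each metric, drop the single lowest and highest reported value and average the rest) can be sketched as follows. The table shape and values are hypothetical; here each auditor's table is modeled as a dict from SP_ID to a metric tuple.

```python
def aggregate(tables):
    """Aggregate Auditors' tables: for each SP and each metric, discard the
    lowest and highest reported value and average the remaining ones.
    `tables` is a list of T_i dicts of the form {SP_ID: (m1, m2, m3)}."""
    sp_ids = set().union(*(t.keys() for t in tables))
    out = {}
    for sp in sorted(sp_ids):
        vals = [t[sp] for t in tables if sp in t]
        metrics = []
        for m in range(3):
            col = sorted(v[m] for v in vals)
            # Keep everything if there are too few reports to trim safely
            trimmed = col[1:-1] if len(col) > 2 else col
            metrics.append(sum(trimmed) / len(trimmed))
        out[sp] = tuple(metrics)
    return out

tables = [
    {"SP_001": (100, 3.0, 1)},
    {"SP_001": (120, 3.5, 1)},
    {"SP_001": (400, 1.0, 0)},  # outlier report; min/max get discarded
    {"SP_001": (110, 3.2, 1)},
]
# Metric 1: sorted [100, 110, 120, 400] -> trimmed [110, 120] -> 115.0
print(aggregate(tables)["SP_001"][0])
```

The three T*_Aggr tables of the protocol are simply per-metric projections of this output.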
Note: We assumed here that the aggregation step happens according to Table Onchain Creation, Option 1. If we consider Table Onchain Creation, Option 2 or Option 3, the aggregation step needs to be modified accordingly (mainly in the way the aggregator builds the aggregated table).
Parties involved in Step 3
- Aggregator Node
- Smart Contract checking correct aggregation and signatures
Note on Correct Aggregation Step
We have different options for checking aggregation:
- Option 1: Use a SNARK
- Option 2: Use Vector Commitments (the overhead of averaging only the internal values is still TBD; easy if we use a plain arithmetic average)
- Option 3: Data availability protocol for the Auditors' tables, not only for the Aggregator's table
- aggregation can then be optimistic, with fraud proofs
Assumptions:
- In the protocol above we assumed a Data Availability Layer, at least for the aggregated tables produced by the Aggregator
- We assume the Aggregator honestly reports the tables it received
Step 4: Client Query
- Clients can query the table via a smart contract
- Queries supported by the Client Query step: TBD
[for MVP: probably single metric queries]
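Since the supported queries are still TBD, here is only a hypothetical sketch of what an MVP single-metric query over an aggregated table could look like; the function name, table shape, and threshold semantics are all assumptions, and the real interface would live in the smart contract.

```python
# Hypothetical single-metric query against an aggregated table,
# modeled as T_Aggr = [(SP_ID, metric_value), ...].
def query(t_aggr, threshold, higher_is_better=True):
    """Return the SP_IDs whose aggregated metric meets the threshold."""
    if higher_is_better:
        return [sp for sp, v in t_aggr if v >= threshold]
    return [sp for sp, v in t_aggr if v <= threshold]

T2_Aggr = [("SP_001", 3.2), ("SP_002", 1.1), ("SP_003", 4.0)]
print(query(T2_Aggr, 3.0))  # SPs with avg retrieval speed >= 3.0
```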
A Visual Representation of the Protocol
| When | What Happens | Auditor | Storage Provider | Aggregator | Data Availability Layer | Smart Contract | Onchain | Offchain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Init | Auditors List | | | | | Auditors' identities are set up in a smart contract | Commitment to Auditors' identities | |
| Epoch 1 start | Survey Protocol | Query SPs on metrics and build a local table; commit to the table and sign the commitment | Answer Auditors' queries | | | | | |
| Epoch 1 deadline | | Send the commitment and the signature onchain | | | | Check Auditors' signatures | Signed Auditors' commitments to the tables | Auditors keep: list of pk ↔ SP_ID; local tables |
| Epoch 2 start | Aggregation | Send table to the Aggregator | | | | | | |
| Epoch 2 deadline | | | | | | | | |
| Epoch 3 start | | | | Post onchain the list of Auditors who sent a valid table: [A_1, A_1.Commit, 0/1; ...; A_k, A_k.Commit, 0/1]; aggregate Auditors' tables into T1_Aggr, T2_Aggr, T3_Aggr; commit to each table and sign the commitments; send the commitments and signatures onchain; run the Data Availability Protocol for the aggregated tables | There is a protocol for Data Availability of the aggregated tables (posting onchain for the MVP?) | Check Aggregator's signatures; check correct aggregation | Signed Aggregator commitments | |
| Epoch 3 deadline | | | | | | | | |
| Anytime | Client Query | | | | | List of SPs meeting the query requirement (type of query supported TBD → single-metric for the MVP) | | |
Ethereum Costs and Compatibility of the Protocol
We want to understand here which is the best way to proceed with the aggregated table produced by the Aggregator.
The reference scenario is the one where the aggregated table is onchain and accessible by a smart contract. In particular, the smart contract can query the table during execution and make decisions based on the result (which, in our setting, corresponds to giving out rewards based on the aggregated metrics in the table).
Moreover, we assume that a table is created from scratch during the very first round of the Storage Metrics DAO protocol and for the following rounds it is updated according to the new metrics values coming from the auditors. This means that in our analysis we need to split the cost into two different processes
- Table Onchain Creation
- Table Onchain Update
Before getting into the details of each option that we are considering, we report here a brief recap of Ethereum gas costs.
Gas Costs
- 21000 units: base cost of a transaction
- 68 units: cost per non-zero byte of transaction data (4 units per zero byte)
- 20000 units: cost of setting a 32-byte storage slot from zero to a non-zero value
- 5000 units: cost of changing a non-zero 32-byte slot to a different non-zero value
- 10000 units refund: gas credit for setting a non-zero 32-byte slot back to zero
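This recap can be wrapped in a small helper that reproduces the estimates in the sections below. It is a sketch under the document's own simplification that every byte posted is also written into 32-byte storage slots at the creation (20000) or update (5000) rate.

```python
TX_BASE = 21_000        # fixed per-transaction cost
BYTE_COST = 68          # per (non-zero) byte of transaction data
SSTORE_NEW = 20_000     # zero -> non-zero, per 32-byte slot
SSTORE_UPDATE = 5_000   # non-zero -> non-zero, per 32-byte slot

def creation_cost(n_bytes):
    """Gas to post n_bytes and write them into fresh storage slots."""
    return TX_BASE + BYTE_COST * n_bytes + n_bytes * SSTORE_NEW // 32

def update_cost(n_bytes):
    """Gas to post n_bytes and overwrite existing non-zero slots."""
    return TX_BASE + BYTE_COST * n_bytes + n_bytes * SSTORE_UPDATE // 32

print(creation_cost(2400))  # Option 1 table creation: ~1.68M gas
print(update_cost(2400))    # Option 1 full update: ~560k gas
```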
Table Onchain Creation
Option 1: One table for each metric
We have:
- 3 different tables (one per metric)
- 2 columns per each table ([SP_ID; Metric_Value])
- 200 rows per table
- Each Metric_Value entry in the table is 1 byte
- Each SP_ID entry in the table is 3 bytes
This leads to a total of 2400 bytes
- 3 * 3 * 200 = 1800 bytes in total for SP_ID entries (3 tables * 3 bytes * 200 rows)
- 3 * 200 = 600 bytes in total for Metric_Value entries (3 tables * 1 byte * 200 rows)
It means that the total cost would be
21000 + 68 * 2400 + 2400 * 20000/32 ≈ 1.68M gas units, which corresponds to ~$170
We can improve on the above by using 3 different tables but putting SP_ID in only one of the 3. This leads to:
- 3 different tables (one per metric)
- 1 table of 2 columns [SP_ID; Metric_Value]
- 2 tables of 1 column each [Metric_Value]
- 200 rows per table
- Each Metric_Value entry in the table is 1 byte
- Each SP_ID entry in the table is 3 bytes
This leads to a total of 1200 bytes
- 3 * 200 = 600 bytes in total for SP_ID entries (1 table * 3 bytes * 200 rows)
- 3 * 200 = 600 bytes in total for Metric_Value entries (3 tables * 1 byte * 200 rows)
It means that the total cost would be
21000 + 68 * 1200 + 1200 * 20000/32 ≈ 850k gas units, which corresponds to ~$85
Option 2: A Unique Table with Metrics + an Indexing Table for Navigation
A second option would be to have a unique table of the form
[SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]
and a separate table used for indexing and lookup for SP_ID.
In this case, we would have
- 1 table of 4 columns: [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]
- Each SP_ID entry is 3 bytes
- Each Metric_Value entry is 1 byte
- Sorted by SP_ID to facilitate binary search for the SP_ID → row lookup
- 200 rows
- 3 tables of 1 column each used for lookup (one per metric)
- Each entry of the table is 2 bytes, corresponding to a row in the main table
- Rows are sorted by the Metric in question
- 200 rows for each table
This leads to a total of (3 + 3)*200 + (3 * 2)* 200 = 2400 bytes
It means that the total cost would be
21000 + 68 * 2400 + 2400 * 20000/32 ≈ 1.68M gas units, which corresponds to ~$170
Note: Option 2 becomes better than Option 1 the more metrics we add (row numbers are 2 bytes, while SP_ID entries are 3 bytes each at the moment; this will likely grow in the future).
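Option 2's layout can be sketched in a few lines: a main table sorted by SP_ID (so SP_ID → row is a binary search) plus per-metric index tables of row numbers sorted by the metric's value. The data and the best-first ordering of the index tables are assumptions for illustration.

```python
import bisect

# Main table, sorted by SP_ID: [(SP_ID, Metric1, Metric2, Metric3), ...]
main = [
    ("SP_001", 120, 3.5, 1),
    ("SP_005", 450, 1.2, 1),
    ("SP_009", 200, 2.8, 0),
]
sp_ids = [row[0] for row in main]  # the column the binary search runs on

def lookup_sp(sp_id):
    """SP_ID -> row of metrics, via binary search on the sorted table."""
    i = bisect.bisect_left(sp_ids, sp_id)
    if i < len(main) and main[i][0] == sp_id:
        return main[i]
    return None

# Per-metric index table: row numbers sorted by that metric's value
# (best-first here; the doc leaves the exact ordering open).
index_m2 = sorted(range(len(main)), key=lambda r: main[r][2], reverse=True)

print(lookup_sp("SP_005"))   # the row for SP_005
print(main[index_m2[0]][0])  # the SP with the best Metric 2 value
```

Storing only 2-byte row numbers in the index tables is what keeps them cheaper than repeating the 3-byte SP_IDs.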
Cost Estimation for Table Creation and posting onchain
According to the calculations above, the cost of creating the table and posting it onchain is ~$170.
Option 3: Table based on thresholds
We can decide to build the table according to some threshold T_Metric which defines good and bad behaviour according to a particular metric.
As a way to save gas, we could include in the table only the SP_IDs that actually meet the threshold, and add/remove over time the ones that start/stop meeting it.
Table creation cost should be comparable to the former cases (depending on whether we go for Option 1 or 2).
Option 4: Offline Table
In this scenario the table is maintained offchain by the Aggregator. The Aggregator is somewhat trusted with respect to data availability, in the sense that the only thing onchain is a commitment to the table. The cost of creating such a table is therefore minimal.
However, access to the table is conditioned on the Aggregator correctly maintaining it. Moreover, we would lose smart-contract access to it.
Table Onchain Updates
For table updates we have to consider:
- Changes to SP_ID entries
- Non-zero → non-zero changes to Metric_Value entries
Option 1: One table for each metric
We have:
- 3 different tables (one per metric)
- 2 columns per each table ([SP_ID; Metric_Value])
- 200 rows per table
- Each Metric_Value entry in the table is 1 byte
- Each SP_ID entry in the table is 3 bytes
This means that in the worst-case scenario, where we change all the Metric_Value entries in the table, we have to change a total of 3 * 200 = 600 bytes.
We'd also have to change all the SP_ID entries (in order to rebuild the sorting), which means
3 * 200 * 3 = 1800 bytes
It means that the total cost of a full update would be
21000 + 68 * 2400 + 2400 * 5000/32 ≈ 560k gas units, which corresponds to ~$55
Option 2: One table for all the metrics
In this case, we would have
- 1 table of 4 columns: [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]
- Each SP_ID entry is 3 bytes
- Each Metric_Value entry is 1 byte
- 200 rows
- 3 tables of 1 column used for lookup
- Each entry of the table is 2 bytes
- 200 rows
This means that in the worst-case scenario, where we change all the Metric_Value entries in the table, we have to change a total of 3 * 200 = 600 bytes.
We could also have to change the lookup tables; in the worst case this means changing 2 * 3 * 200 = 1200 bytes.
Imagining the table as an array of tuples, we'd actually have to pay the updating cost also for the SP_ID part (since we'd modify the tuple [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]).
This means we'd have to pay the update cost for an additional 3 * 200 = 600 bytes (the SP_ID entries).
It means that the total cost of a full update would be
21000 + 68 * 2400 + 2400 * 5000/32 ≈ 560k gas units, which corresponds to ~$55
Variations on Option 2
- 1 table of [miner ID] sorted by SP_ID
- 1 table of [Metric1, Metric2, Metric3] sorted by SP_ID
- 3 tables of [row_number_in_previous_tables] sorted by MetricX_Value
This means that in the worst-case scenario, where we change all the Metric_Value entries in the table, we have to change a total of 3 * 200 = 600 bytes.
We could also have to change the lookup tables; in the worst case this means changing 2 * 3 * 200 = 1200 bytes.
It means that the total cost of a full update would be
21000 + 68 * 1800 + 1800 * 5000/32 ≈ 425k gas units, which corresponds to ~$42
Note: Not worthwhile for now (the improvement over Option 2 is too small).
Like Option 2 but with only the top 20 SPs onchain (it cuts costs by ~10x). The gas cost would be:
21000 + 68 * 240 + 240 * 5000/32 ≈ 75k gas units, which corresponds to ~$7.5
Option 3: Table based on thresholds
The table shows only the SP_IDs that actually meet the threshold, with entries added/removed over time as SPs start/stop meeting it.
Table update costs depend on how variable SP performance is with respect to the threshold.
If we assume that SPs provide a consistent level of service (meaning no huge oscillations across multiple audits), then the update cost is minimal.
In the worst-case scenario, the cost is comparable to that of the other options.
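Under Option 3, only the SPs that cross the threshold in a given round require onchain writes. A sketch of computing that round-over-round diff (the function name, data shapes, and "higher is better" threshold semantics are assumptions):

```python
def threshold_diff(prev_members, new_metrics, threshold):
    """SP_IDs to add to / remove from the onchain threshold table this round.
    prev_members: set of SP_IDs currently in the table;
    new_metrics: {SP_ID: metric_value} from the latest aggregation."""
    meets = {sp for sp, v in new_metrics.items() if v >= threshold}
    to_add = meets - prev_members        # started meeting the threshold
    to_remove = prev_members - meets     # stopped meeting the threshold
    return to_add, to_remove

prev = {"SP_001", "SP_002"}
metrics = {"SP_001": 3.5, "SP_002": 1.0, "SP_003": 4.2}
add, remove = threshold_diff(prev, metrics, threshold=3.0)
print(add, remove)
```

If SP performance is stable, both sets are usually empty and the round costs only the base transaction.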
Option 4: Offline Table
Since no update happens onchain, this solution has no updating costs. Each round maintains the minimal constant cost of keeping a commitment to the table onchain.