Storage Metrics DAO MVP Protocol Specs


Public project doc:

Protocol Overview

We have an initial consortium of n auditor parties A_1,...,A_n. At this stage, the aim of these nodes is to scan the network according to the storage deals that are on-chain (each storage deal needs to provide retrieval).

In particular, this means that the parties being audited are essentially storage providers. In future versions of the protocol, we'll open the possibility for any IPFS node to be audited. In order to do so, we'll put in place mechanisms such as indices, which will allow non-SP parties to be audited according to what they claim to store and keep available for retrieval.

Step 1: Initialization

The Auditors list is stored in a Smart Contract. A commitment to the Auditors list is stored onchain.
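
As an illustration, a minimal way to realize such a commitment (an assumption for this sketch: a plain SHA-256 hash over the sorted list of Auditor public keys; a Merkle root or vector commitment would work as well):

```python
# Sketch only: commitment to the Auditors list as a SHA-256 hash over the
# sorted, concatenated Auditor public keys (a Merkle root or vector
# commitment could be used instead).
import hashlib

def auditor_list_commitment(auditor_pubkeys):
    # auditor_pubkeys: list of public keys as bytes
    return hashlib.sha256(b"".join(sorted(auditor_pubkeys))).digest()
```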

Parties involved in Step 1:

  • Auditors (passive)
  • Smart Contract on-chain storing the Auditors list

Step 2: Survey Protocol

The Survey Protocol starts at epoch1_start and ends at epoch1_deadline.

  • Each Auditor A_i, for each storage_deal on a CID cid, queries the corresponding storage provider SP_j and checks
    • Time to first byte (Metric 1)
    • Average Retrieval Speed (Metric 2)
    • Retrieval Success (Metric 3)
  • According to the survey results, each auditor A_i builds a local table of the form
    • T_i = [SP_ID , Metric 1, Metric 2, Metric 3]
  • Local table T_i is committed
    • A_i.Commit = Comm(T_i) [note: here and below, Comm can either be a plain SHA-256 hash or a more structured vector commitment (VC); a sketch follows this list]
  • The commitment is signed
    • SignedA_i.Commit = Sign(sk_{A_i}, A_i.Commit)
  • The pair (A_i.Commit, SignedA_i.Commit) is posted onchain
    • For now we are not considering availability of the Auditors' tables. The Aggregator is trusted in the sense of propagating the tables it received
  • A Smart Contract checks SignedA_i.Commit
    • Verify(pk_{A_i}, A_i.Commit, SignedA_i.Commit)
  • T_i is stored offchain by A_i
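
A minimal sketch of the commit-and-sign flow above, assuming Comm is a plain SHA-256 hash over a canonical serialization of T_i and that signatures are Ed25519 via the Python `cryptography` package (all illustrative choices; the spec leaves the exact schemes open):

```python
# Sketch of the Survey Protocol commit-and-sign step (illustrative choices:
# SHA-256 commitment, Ed25519 signatures via the `cryptography` package).
import hashlib, json
from cryptography.hazmat.primitives.asymmetric import ed25519

def commit(table):
    # table: list of [SP_ID, Metric1, Metric2, Metric3] rows
    serialized = json.dumps(table, sort_keys=True).encode()
    return hashlib.sha256(serialized).digest()            # A_i.Commit

sk_Ai = ed25519.Ed25519PrivateKey.generate()              # sk_{A_i}
pk_Ai = sk_Ai.public_key()                                # pk_{A_i}

T_i = [["SP_1", 120, 35, 1], ["SP_2", 300, 12, 0]]        # hypothetical survey results
A_i_Commit = commit(T_i)
Signed_A_i_Commit = sk_Ai.sign(A_i_Commit)                # SignedA_i.Commit

# The onchain check: Verify(pk_{A_i}, A_i.Commit, SignedA_i.Commit)
pk_Ai.verify(Signed_A_i_Commit, A_i_Commit)               # raises if invalid
```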

Survey Protocol ends at epoch1_deadline

Parties Involved in Step 2

  • Auditors
  • Storage providers
  • Smart Contract Checking valid signatures on tables

Note on the Survey Protocol Step:

  • Auditors' Tables: Auditors' tables do not need to be accessed by any smart contract. This means that the way they are built and organized can be adapted to the protocol's needs (e.g.: having a unique table per Auditor vs having different per-metric tables, ...). The main driver here should be how to simplify the proof of correct aggregation step.
  • Auditors' Offchain Storage: We assume Auditors maintain a list of SP public keys offchain, and their mapping to SP_IDs
    • [SP_ID, PubKey]

Step 3: Sharing Auditors' Tables

The sharing step starts at epoch2_start, right after epoch1_deadline, and ends at epoch2_deadline.

  • Each Auditor A_i sends the table T_i offchain to the Aggregator

Step 4: Aggregation

Aggregation step starts at epoch3_start, right after epoch2_deadline.

  • (optional) The Aggregator Aggr posts onchain the list of Auditors A_1,...,A_k from whom it received a table, together with a bit indicating whether each table matches its commitment (0/1), as a list of the form
    • [A_1, A_1.Commit, 0/1 ; ...; A_k, A_k.Commit, 0/1]
  • Aggr aggregates the data from the valid tables received from A_1,...,A_n (tables which do not verify against the corresponding commitment are not considered by the Aggregator) and produces
    • The aggregation is an average over the inner values received for each SP, for each metric (i.e. we drop the lowest and the highest reported value; see the sketch after this list). The result is something of the form
      • T1_Aggr = [Metric 1; SP_ID1 , Metric1_value; ...; SP_IDn, Metric1_value]
      • T2_Aggr = [Metric 2; SP_ID1 , Metric2_value; ...; SP_IDn, Metric2_value]
      • T3_Aggr = [Metric 3; SP_ID1 , Metric3_value; ...; SP_IDn, Metric3_value]

      Note that we could reconsider the way we aggregate (e.g. live with the plain arithmetic average for the MVP) if this would simplify the way we prove correct aggregation (e.g. with vector commitments, excluding some values could make the proof of correct aggregation considerably more complicated). Note: given gas costs, especially in the long run, we can consider using a unique table rather than three different ones. See the cost analysis below (Option 2). The dynamics of the protocol remain unchanged.

  • Aggr Commits to the tables
    • Aggr.Commit1 = Comm(T1_Aggr)
    • Aggr.Commit2 = Comm(T2_Aggr)
    • Aggr.Commit3 = Comm(T3_Aggr)
  • Aggr signs the commitments
    • SignedAggr.Commit1 = Sign(sk_{Aggr}, Aggr.Commit1)
    • SignedAggr.Commit2 = Sign(sk_{Aggr}, Aggr.Commit2)
    • SignedAggr.Commit3 = Sign(sk_{Aggr}, Aggr.Commit3)
  • The pairs
    • (Aggr.Commit1, SignedAggr.Commit1)
    • (Aggr.Commit2, SignedAggr.Commit2)
    • (Aggr.Commit3, SignedAggr.Commit3)
    • are sent onchain

  • A Smart Contract checks SignedAggr.Commit1, SignedAggr.Commit2, SignedAggr.Commit3
    • Verify(pk_Aggr, SignedAggr.Commit1)
    • Verify(pk_Aggr, SignedAggr.Commit2)
    • Verify(pk_Aggr, SignedAggr.Commit3)
  • Aggr runs the Data Availability Protocol for T1_Aggr, T2_Aggr, T3_Aggr
    • Data Availability Protocol in the most straightforward case would mean posting the aggregated tables on-chain
  • A smart contract checks correct aggregation
    • Verify(Aggr.Commit1, Aggr.Commit2, Aggr.Commit3, A_1.Commit, ..., A_k.Commit) → 0/1
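
A minimal sketch of the aggregation rule described above (dropping the lowest and highest reported value per SP and metric, then averaging), assuming each valid Auditor table is represented as a dict SP_ID → (Metric1, Metric2, Metric3); names are illustrative:

```python
# Sketch of the aggregation step: for each SP and each metric, drop the lowest
# and highest reported value and average the remaining (inner) values.
from collections import defaultdict

def aggregate(valid_tables):
    # valid_tables: list of dicts {SP_ID: (metric1, metric2, metric3)},
    # one per Auditor whose table matched its onchain commitment
    per_sp = defaultdict(lambda: ([], [], []))
    for table in valid_tables:
        for sp_id, metrics in table.items():
            for m, value in enumerate(metrics):
                per_sp[sp_id][m].append(value)

    aggregated = {}
    for sp_id, columns in per_sp.items():
        row = []
        for values in columns:
            values = sorted(values)
            if len(values) > 2:               # drop lowest and highest when possible
                values = values[1:-1]
            row.append(sum(values) / len(values))
        aggregated[sp_id] = tuple(row)        # rows of T1_Aggr, T2_Aggr, T3_Aggr
    return aggregated
```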

Aggregation Step ends at epoch3_end

Note: here we considered the aggregation step happening according to Table Onchain Creation, Option 1. If we consider Table Onchain Creation, Option 2 or Option 3, the aggregation step needs to be modified accordingly (mainly regarding the way the Aggregator builds the aggregated table).

Parties involved in Step 4

  • Aggregator Node
  • Smart Contract checking correct aggregation and signatures

Note on Correct Aggregation Step

We have different options for checking aggregation

  • Option 1: Use a Snark
  • Option 2: Use Vector Commitments (overhead for using only the inner values still TBD; easy if we want to use the plain arithmetic average)
  • Option 3: Data availability protocol for the Auditors' tables, not only for the Aggregator's table
    • aggregation can be optimistic w/ fraud proofs

Assumptions:

  • In the protocol above we assumed a Data Availability Layer at least for the aggregated table produced by the Aggregator

  • We assume Aggregator to be honest in reporting the tables he received

Step 5: Client Query

  • Clients can query the table via a smart contract
    • Queries supported by the Client Query step: TBD
    • [for the MVP: probably single-metric queries; see the sketch below]
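
As an illustration of what a single-metric query could look like (hypothetical names; the actual query interface and supported query types are still TBD), assuming the aggregated table is exposed as SP_ID → metric values:

```python
# Hypothetical single-metric query: return the SPs whose aggregated value for
# a given metric meets a threshold (the direction depends on the metric, e.g.
# lower is better for time to first byte, higher for retrieval success).
def query_single_metric(aggregated, metric_index, threshold, higher_is_better=True):
    # aggregated: dict {SP_ID: (metric1, metric2, metric3)}
    meets = (lambda v: v >= threshold) if higher_is_better else (lambda v: v <= threshold)
    return [sp_id for sp_id, metrics in aggregated.items()
            if meets(metrics[metric_index])]
```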

A Visual Representation of the Protocol

The protocol flow, per phase, with the acting party (Auditor, Storage Provider, Aggregator, Data Availability Layer, Smart Contract) and the Onchain / Offchain state produced at each step:

  • Init: Auditors List
    • Smart Contract: Auditors' identities are set up in a smart contract
    • Onchain: commitment to the Auditors' identities
  • Epoch 1 start: Survey Protocol
    • Auditor: queries SPs on the metrics and builds a local table
    • Storage Provider: answers the Auditors' queries
    • Auditor: commits to the table and signs the commitment
  • Epoch 1 deadline
    • Auditor: sends the commitment and the signature onchain
    • Smart Contract: checks the Auditors' signatures
    • Onchain: signed Auditors' commitments to the tables
    • Offchain (Auditors): list of pk ↔ SP_ID, local tables
  • Epoch 2 start: Sharing Auditors' Tables
    • Auditor: sends the table to the Aggregator
  • Epoch 2 deadline / Epoch 3 start: Aggregation
    • Aggregator: posts onchain the list of Auditors who sent a valid table [A_1, A_1.Commit, 0/1 ; ...; A_k, A_k.Commit, 0/1]
    • Aggregator: aggregates the Auditors' tables into T1_Aggr, T2_Aggr, T3_Aggr
    • Aggregator: commits to each table and signs the commitments
    • Aggregator: sends the commitments and signatures onchain
    • Smart Contract: checks the Aggregator's signatures
    • Onchain: signed Aggregator commitments
    • Smart Contract: checks correct aggregation
    • Aggregator: runs the Data Availability Protocol for the aggregated tables
    • Data Availability Layer: protocol for the data availability of the aggregated tables [posting onchain for the MVP?]
  • Epoch 3 deadline
  • Anytime: Client Query
    • Smart Contract: returns the list of SPs meeting the query requirements (type of query supported TBD → single metric for the MVP)

Ethereum Costs and Compatibility of the Protocol

Here we want to understand the best way to handle the aggregated table produced by the Aggregator.

The reference scenario is the one where the aggregated table is onchain and is accessible by a smart contract. In particular, it is possible for the smart contract to query the table during execution and to make decisions based on the query (which, in our setting, corresponds to giving out rewards based on the aggregated metrics in the table).

Moreover, we assume that the table is created from scratch during the very first round of the Storage Metrics DAO protocol and that, in the following rounds, it is updated according to the new metric values coming from the Auditors. This means that in our analysis we need to split the cost into two different processes:

  • Table Onchain Creation
  • Table Onchain Update

Before getting into the details of each option that we are considering, we report here a brief recap of Ethereum gas costs; a small helper reproducing the estimates below follows the recap.

Gas Costs

  • 21000 units: transaction cost
  • 68 units: cost for each byte in the transaction
  • 20000 units: cost for changing a 32-byte storage slot (EVM word) from a zero value to a non-zero value
  • 5000 units: cost for changing a 32-byte non-zero value to a different non-zero value
  • 10000 units bonus: gas credit for changing a 32-byte non-zero value to zero
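
All creation and update estimates below follow the same formula; a small helper that reproduces them (assuming the per-byte and per-slot costs listed above, and ignoring the refund bonus):

```python
# Gas estimate for posting n_bytes of table data onchain:
#   21000 base transaction cost
# + 68 gas per byte of transaction data
# + one storage write per 32-byte slot (20000 gas on creation, 5000 on update)
def posting_gas(n_bytes, per_slot_cost):
    return 21000 + 68 * n_bytes + n_bytes * per_slot_cost // 32

print(posting_gas(2400, 20000))  # Option 1 creation: ~1.68M gas
print(posting_gas(1200, 20000))  # Option 1 with a shared SP_ID column: ~850k gas
print(posting_gas(2400, 5000))   # Option 1 full update: ~560k gas
```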

Table Onchain Creation

Option 1: One table for each metric

We have:

  • 3 different tables (one per metric)
  • 2 columns per each table ([SP_ID; Metric_Value])
  • 200 rows per table
  • Each Metric_Value entry in the table is 1 byte
  • Each SP_ID entry in the table is 3 bytes

This leads to a total of 2400 bytes:

  • 3 tables * 3 bytes * 200 rows = 1800 bytes in total for SP_ID entries
  • 3 tables * 1 byte * 200 rows = 600 bytes in total for Metric_Value entries

It means that the total cost would be

21000 + 68 * 2400 + 2400*20000/32 ~1.68M gas units, which corresponds to ~170 $

We can improve on the above by using 3 different tables but having SP_ID in only one of the 3. This would lead to

  • 3 different tables (one per metric)
  • 1 table of 2 columns [SP_ID; Metric_Value]
  • 2 tables of 1 column each [Metric_Value]
  • 200 rows per table
  • Each Metric_Value entry in the table is 1 byte
  • Each SP_ID entry in the table is 3 bytes

This leads to a total of 1200 bytes:

  • 1 table * 3 bytes * 200 rows = 600 bytes in total for SP_ID entries
  • 3 tables * 1 byte * 200 rows = 600 bytes in total for Metric_Value entries

It means that the total cost would be

21000 + 68 * 1200 + 1200*20000/32 ~ 850k gas units, which corresponds to ~85 $

Option 2: A Unique Table with Metrics + an Indexing Table for Navigation

A second option would be to have a unique table of the form

[SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]

and a separate table used for indexing and lookup for SP_ID.

In this case, we would have

  • 1 table of 4 columns: [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]
    • Each SP_ID entry is 3 bytes
    • Each Metric_Value entry is 1 byte
    • Sorted by SP_ID to facilitate the binary search for the SP_ID→row lookup (see the lookup sketch below)
    • 200 rows
  • 3 tables of 1 column each used for lookup (one lookup table per metric)
    • Each entry of the table is 2 bytes and corresponds to a row index in the main table
    • Rows are sorted by the Metric in question
    • 200 rows for each table

This leads to a total of (3 + 3)*200 bytes for the main table + (3 * 2)*200 bytes for the lookup tables = 2400 bytes

It means that the total cost would be

21000 + 68 * 2400 + 2400*20000/32 ~1.68M gas units, which corresponds to ~170 $

Note: Option 2 becomes better than Option 1 the more metrics we add (row indices are 2 bytes while SP_ID entries are 3 bytes each at the moment, and the latter may grow in the future).
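
A sketch of how lookups over this layout could work (hypothetical helper names), assuming the main table is sorted by SP_ID and each per-metric lookup table holds row indices into the main table, sorted by that metric's value:

```python
import bisect

# Main table: rows [SP_ID, Metric1, Metric2, Metric3], sorted by SP_ID.
# Lookup tables: one per metric, holding row indices sorted by that metric.
def find_row_by_sp_id(main_table, sp_id):
    ids = [row[0] for row in main_table]       # column already sorted by SP_ID
    i = bisect.bisect_left(ids, sp_id)
    return main_table[i] if i < len(ids) and ids[i] == sp_id else None

def rows_ranked_by_metric(main_table, metric_lookup_table):
    # Walk the lookup table to enumerate rows in the metric's order
    return [main_table[i] for i in metric_lookup_table]
```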

Cost Estimation for Table Creation and posting onchain

According to the calculation above, the cost of creating and posting the table onchain is ~ 170 $.

Option 3: Table based on thresholds

We can decide to build the table according to some threshold T_Metric which defines good and bad behaviour according to a particular metric.

As a way to save gas costs, we could include in the table only the SP_IDs that actually meet the threshold, and then add/remove over time the ones that start/stop meeting the threshold.

Table Creation cost should be comparable to the former cases (depending on whether we go for Option 1 or 2).

Option 4: Offline Table

In this scenario the table is maintained offchain by the Aggregator. The Aggregator is somewhat trusted wrt data availability, in the sense that the only thing onchain is a commitment to the table. The cost of creating such a table is then minimal.

On the other hand, access to the table is conditional on its correct maintenance by the Aggregator. Moreover, we would lose smart contract access to it.

Table Onchain Updates

For table updates we have to consider

  • Change on SP_ID entries
  • Changes non-zero → non-zero on Metric_Value entries

Option 1: One table for each metric

We have:

  • 3 different tables (one per metric)
  • 2 columns per each table ([SP_ID; Metric_Value])
  • 200 rows per table
  • Each Metric_Value entry in the table is 1 byte
  • Each SP_ID entry in the table is 3 bytes

This means that in the “worst-case scenario”, where we change all the Metric_Value entries in the table, we'll have to change a total of 3*200 = 600 bytes.

We'd also have to change all the SP_ID entries (in order to rebuild the sorting), which means

3*200*3 = 1800 bytes

It means that the total cost of a full update would be

21000 + 68 * 2400 + 2400*5000/32 ~560k gas units, which corresponds to ~55$

Option 2: One table for all the metrics

In this case, we would have

  • 1 table of 4 columns: [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]
    • Each SP_ID entry is 3 bytes
    • Each Metric_Value entry is 1 byte
    • 200 rows
  • 3 tables of 1 column used for lookup
    • Each entry of the table is 2 bytes
    • 200 rows

This means that in the “worst-case scenario”, where we change all the Metric_Value entries in the table, we'll have to change a total of 3*200 = 600 bytes.

We could also have to change the lookup tables, and in the worst case this would mean changing 2*3*200 = 1200 bytes.

Imagining the table as an array of tuples, we'd actually have to pay the updating cost also for the SP_ID part (since we'd modify the whole tuple [SP_ID, Metric1_Value, Metric2_Value, Metric3_Value]).

This means we'd have to add a further 3*200 = 600 bytes.

It means that the total cost of a full update would be

21000 + 68 * 2400 + 2400*5000/32 ~560k gas units, which corresponds to ~55$

Variations on Option 2

Option 2b

  • 1 table of [SP_ID] sorted by SP_ID
  • 1 table of [Metric1, Metric2, Metric3] sorted by SP_ID
  • 3 tables of [row_number_in_previous_tables] sorted by MetricX_Value

This means that in the “worst-case scenario”, where we change all the Metric_Value entries in the table, we'll have to change a total of 3*200 = 600 bytes.

We could also have to change the lookup tables, and in the worst case this would mean changing 2*3*200 = 1200 bytes.

It means that the total cost of a full update would be

21000 + 68 * 1800 + 1800*5000/32 ~425k gas units, which corresponds to ~42$

Note: not worth it for now (the improvement over Option 2 is too small).

Option 2c

Like Option 2 but with only the top 20 SPs onchain (this cuts costs by 10x). The gas cost would be:

21000 + 68 * 240 + 240*5000/32 ~75k gas units which corresponds to 7.5$

Option 3: Table based on thresholds

The table contains the SP_IDs that actually meet the threshold; SPs are added/removed over time as they start/stop meeting the threshold.

The Table Update cost depends on how variable the SPs' performance is wrt the threshold.

If we assume that SPs provide a consistent level of service (meaning no huge oscillations over multiple audits), then the update cost is minimal.

In the worst-case scenario, the cost is comparable to that of the other cases.

Option 4: Offline Table

Since we do not have any update happening onchain, this solution has no updating costs. Each round maintains the minimal, constant cost of having a commitment to the table onchain.