Proof of Space over DAG

Creator

Created

Mar 30, 2023 12:21 PM

Motivation

The core question is: Can we perform proofs over data structured as an arbitrary Directed Acyclic Graph (DAG)?
It stems from the desire to use IPFS CIDs instead of Filecoin specific commitment to an encoding of this data. While there are numerous other obstacles to achieving this goal, most of them can be classified as engineering tradeoffs. Some of them are:

Choice of the hash function/commitment to data blob - a trade-off between ease of use and performance of IPFS vs Filecoin
Size of data blobs/individual nodes in DAG - trade-off and can be hidden with larger abstract blobs, which can be decomposed into smaller ones for proving.

I have identified one core constraint of IPFS, which is core to the system: The data is represented as arbitrary DAG. To achieve commitment compatibility, we have to be able to work around this constraint.

The world today

Right now, the IPFS DAG, which is referenced by IPFS CID, is traversed in the data preparation step and encoded into a CAR (Content ARchive) file, which is then committed to as a vector (PieceCID).

This step is performed outside of the protocol by the Client, increasing operational complexity and making compatibility efforts between Filecoin and IPFS harder.

Solution Space

The main goal is to allow the client to use commitment to the DAG directly in the Filecoin protocol.

There are many paths to achieving this goal, but most of them narrow down to one key point: proving over arbitrary DAG.

PoS keeps working over vectors, and an additional proof provides the transformation of the DAG Commitment into Vector Commitment. (1)
PoS keeps working over vectors but internally transforms the DAG into Vector. (2)
PoS works directly over the DAG (for example, replication step node labelling running over a DAG). (3)

They all require being able to challenge arbitrary nodes (or leaf nodes) in the DAG as uniformly as possible.

Solutions 1 and 2 additionally also require the creation of self-consistent labelling of the DAG, which can then be used as mapping function $f: \mathit{DAG} \longrightarrow V$ from DAG space to vector space.

Ideally, the mapping is not performed by the client and is unique for each commitment to the DAG. This requires proof to prove the validity of the mapping.

Challenging the DAG

There exist some trivial straw-man examples of challenging the DAG, but they suffer from non-uniform challenge distribution.

Top-down

Consider a node of a DAG. This node has $M$ outgoing edges. We can choose one of these edges to traverse, and we repeat this recursively. This results in a highly biased distribution of challenges. Example:

Let’s say the root node has two outgoing edges: A, B. Under node A, there is only one node. Under node B, there are 10e6 nodes. Top-down traversal

Prover gets to choose weighting between A and B (1:1 above, prover can say 1:10), over 1000 challange.

Bottom-up

Based on labelling

most likely doesn’t work

Vector labelling the DAG

P(DAG) = V

C1 < C2

C1 hits DAG1 and C2 hits DAG2

label DAG1 with V1, same for dag2

V1 < V2