Protocol Research - Full optimistic MT Translation - WIP

Track

FilFil

DRI

Milestone

FilFil Edition 2.0: Perpetual Filecoin Archive Storage

Status

On Pause

Target Date

tl;dr

⚠️ Disclaimer: everything below is (Preliminary) Work in Progress

Abstraction

We can abstract the ultimate goal of a Merkle Tree (MT) translation proof as the need for a publicly verifiable proof that two vector commitments open to the same input vector.

Getting back to Filecoin

We have the following scenario:

We have some input data D
We have a first commitment Comm1
We have a second commitment Comm2

Concretely, we can think of Comm1 as an IPFS CID and of Comm2 as a CommP.

Question: How does one know that both Comm1 and Comm2 are commitments to the same (original) D?

One possible answer is to open the two commitments inside a SNARK and prove that they open to the same (expected) vector D.

Unfortunately, our commitments are based on SHA-256, which makes it impractical to deal with multiple openings.

Merkle Tree Translation Proof and Proofs of Wrong Merkle Tree Translation

At a high level (see here for more details), we could deal with this issue by combining two proofs:

An interactive proof of Merkle Tree translation (probabilistic checks on different random openings of both Comm1 and Comm2). Note that this step would not be on-chain, since one has to check all the vanilla proofs (MT paths, in the case of MT commitments), which cannot be proven inside a SNARK.
A non-interactive proof of Wrong Merkle Tree translation: at a later time, anyone retrieving D can challenge the author of either Comm1 or Comm2, claiming that there is some position that does not open to D. This challenge could be resolved on-chain by the SP posting a (single) proof that the translation was performed correctly at the given position (i.e. a single MT path, in the case of commitments done via MT).
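
The on-chain resolution step above can be illustrated with a toy model. This is a minimal sketch under strong assumptions: both commitments are modeled as plain binary SHA-256 Merkle trees over the same leaves, distinguished only by hypothetical domain-separation tags (TAG1, TAG2). Real IPFS CIDs and CommPs are built differently (different chunking, padding and tree shapes), and all function names here are illustrative. The point is only that a single position can be checked with one path per commitment.

```python
import hashlib

# Hypothetical tags standing in for two different commitment schemes
# (NOT the real IPFS CID or CommP constructions).
TAG1, TAG2 = b"comm1", b"comm2"

def leaf_hash(tag, leaf):
    return hashlib.sha256(tag + b"\x00" + leaf).digest()

def node_hash(tag, left, right):
    return hashlib.sha256(tag + b"\x01" + left + right).digest()

def build_tree(leaves, tag):
    """Return all levels of the tree; assumes len(leaves) is a power of two."""
    level = [leaf_hash(tag, leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(tag, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def open_path(levels, index):
    """Collect the sibling hashes from the leaf at `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify_path(root, tag, leaf, index, path):
    node = leaf_hash(tag, leaf)
    for sibling in path:
        if index % 2 == 0:
            node = node_hash(tag, node, sibling)
        else:
            node = node_hash(tag, sibling, node)
        index //= 2
    return node == root

def resolve_challenge(comm1, comm2, index, leaf, path1, path2):
    """The SP wins only if the same leaf opens under BOTH commitments."""
    return (verify_path(comm1, TAG1, leaf, index, path1)
            and verify_path(comm2, TAG2, leaf, index, path2))

D = [b"a", b"b", b"c", b"d"]
D_bad = [b"a", b"b", b"X", b"d"]      # modified data D' at position 2

t1 = build_tree(D, TAG1)
t2 = build_tree(D, TAG2)              # honest translation
t2_bad = build_tree(D_bad, TAG2)      # wrong translation

honest = resolve_challenge(t1[-1][0], t2[-1][0], 2, D[2],
                           open_path(t1, 2), open_path(t2, 2))
cheating = resolve_challenge(t1[-1][0], t2_bad[-1][0], 2, D[2],
                             open_path(t1, 2), open_path(t2_bad, 2))
print(honest, cheating)
```

In this toy setting the honest SP can answer a challenge at any position, while an SP whose second commitment was computed over D' cannot produce a valid pair of paths for the modified position.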

What's Wrong?

In our case, dealing with MT translation proofs is problematic, since they cannot be on-chain and are not publicly verifiable.

(Crazy) Idea

Is there a world where we can use Proofs of Wrong Merkle Tree Translation only?

Everything would turn into an optimistic protocol, where we claim that Comm1 and Comm2 (that is, an IPFS CID and a CommP) are commitments to the same data D until someone produces a Proof of Wrong Merkle Tree Translation (which is essentially a fraud proof).

Caveat:

There is no "a priori" proof that the "translation" was done correctly in the first place.
One can realize that Comm1 and Comm2 do not both open to D only if D is retrieved.

Why might this idea still make sense?

It is enough to download D once in order to realize whether there is a mismatch. Basically, it is a "once and for all" retrieval.
If the file is not retrieved AND there is a mismatch, then the mismatch goes unnoticed.
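
The "retrieve once, check both" idea can be sketched as follows. This is a toy model, not the actual CID or CommP computation: `merkle_root` is a hypothetical stand-in for both commitment functions (distinguished by an assumed tag), and `audit_after_retrieval` is an illustrative name. It only shows that a single retrieval of D is enough to recompute both commitments and compare them against the published values.

```python
import hashlib

def merkle_root(leaves, tag):
    """Toy binary SHA-256 Merkle root; assumes len(leaves) is a power of two."""
    level = [hashlib.sha256(tag + b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        level = [hashlib.sha256(tag + b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def audit_after_retrieval(retrieved, comm1, comm2):
    """Recompute both (toy) commitments from the retrieved data and compare."""
    return {
        "comm1_ok": merkle_root(retrieved, b"comm1") == comm1,
        "comm2_ok": merkle_root(retrieved, b"comm2") == comm2,
    }

D = [b"a", b"b", b"c", b"d"]
D_bad = [b"a", b"b", b"X", b"d"]

# Published values: Comm1 commits to D, but Comm2 was computed over D'.
published_comm1 = merkle_root(D, b"comm1")
published_comm2 = merkle_root(D_bad, b"comm2")

report = audit_after_retrieval(D, published_comm1, published_comm2)
print(report)
```

Note that this check only tells the retriever that a mismatch exists. Turning it into an on-chain claim at a specific position is a separate problem.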

BUT: does it matter if there is a mismatch between Comm1 and Comm2 for some data D that never gets retrieved?

On one hand

If the data is never retrieved, then one can claim that "nobody cares that this data is actually there" during the whole deal duration.
After the deal duration any data can be removed from the network, so if nobody retrieved it there is no difference "in practice" between having the real D stored for the deal duration and having a modified D’.
Space hardness is not put at risk by having Comm1 and Comm2 open to two different data D and D’.

On the other hand

We cannot ensure a priori that specific data is stored in the network (until it gets retrieved). This can be undesirable from the SLA perspective.

Problem

In general, having a practical Proof of Wrong MT Translation is not trivial. Indeed, even if one notices that there is a mismatch between the two commitments, identifying a single path where the two commitments differ requires an extensive search if the original file is not available (basically, if the original file was not stored). In practice, this prevents anyone who is not storing the original file from producing a Proof of Wrong MT Translation.

Possible mitigations

Having Proof of Wrong MT Translation oracles that are called if and only if a mismatch in the commitments is found.

The oracle would take care of downloading the file from the SP and evaluating the two commitments.

If there is a mismatch, it sends a mismatch notification on-chain.

Pros: anyone can verify the mismatch by downloading the file themselves
Cons: we rely on an oracle

Why do we need an oracle? Without an oracle, anyone could deliberately send unfounded claims of Wrong MT Translation on-chain. We could require a collateral deposit in order to put a claim of Wrong MT Translation on-chain, but this would make things much more convoluted (opening the door to corner cases that we would need to analyse).