⚠️ Disclaimer: everything below is (Preliminary) Work in Progress
Abstraction
We can abstract the ultimate goal of an MT (Merkle Tree) translation proof as the need for a publicly verifiable proof that two vector commitments open to the same input vector.
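As a concrete picture of this abstraction, the sketch below builds two toy SHA-256 Merkle-tree vector commitments over the same vector and shows that their roots differ while every position opens to the same value. It is a minimal sketch under assumptions: the `tag` domain separator is a hypothetical device used only to model two distinct schemes, and the tree layout (0x00/0x01 leaf/node prefixes, duplicating an odd last node) is illustrative. Real IPFS CIDs and CommPs use their own chunking, tree layout and hashing details.

```python
import hashlib
from typing import List


class MerkleVC:
    """Toy SHA-256 Merkle-tree vector commitment.

    `tag` is a hypothetical domain separator that models two distinct
    commitment schemes over the same vector; it is not part of the real
    CID or CommP constructions.
    """

    def __init__(self, tag: bytes) -> None:
        self.tag = tag

    def _h(self, *parts: bytes) -> bytes:
        m = hashlib.sha256(self.tag)
        for p in parts:
            m.update(p)
        return m.digest()

    def _layers(self, vector: List[bytes]) -> List[List[bytes]]:
        # Leaves are hashed with a 0x00 prefix, internal nodes with 0x01.
        level = [self._h(b"\x00", v) for v in vector]
        layers = [level]
        while len(level) > 1:
            # On odd-length levels the last node is paired with itself.
            padded = level + [level[-1]] if len(level) % 2 else level
            level = [self._h(b"\x01", padded[i], padded[i + 1])
                     for i in range(0, len(padded), 2)]
            layers.append(level)
        return layers

    def commit(self, vector: List[bytes]) -> bytes:
        """Root of the tree over a non-empty vector."""
        return self._layers(vector)[-1][0]

    def open(self, vector: List[bytes], index: int) -> List[bytes]:
        """Sibling path for `index` (an odd last node is its own sibling)."""
        path = []
        for level in self._layers(vector)[:-1]:
            sibling = index ^ 1
            path.append(level[sibling] if sibling < len(level) else level[index])
            index //= 2
        return path

    def verify(self, root: bytes, index: int, value: bytes, path: List[bytes]) -> bool:
        """Recompute the root from `value` and its path and compare."""
        node = self._h(b"\x00", value)
        for sibling in path:
            if index % 2 == 0:
                node = self._h(b"\x01", node, sibling)
            else:
                node = self._h(b"\x01", sibling, node)
            index //= 2
        return node == root


# Two different commitments over the same vector D: the roots differ,
# yet every position opens to the same value under both schemes.
D = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3", b"chunk-4"]
scheme1, scheme2 = MerkleVC(b"scheme-1"), MerkleVC(b"scheme-2")
comm1, comm2 = scheme1.commit(D), scheme2.commit(D)
same_vector = all(
    scheme1.verify(comm1, i, v, scheme1.open(D, i))
    and scheme2.verify(comm2, i, v, scheme2.open(D, i))
    for i, v in enumerate(D)
)
print(comm1 != comm2, same_vector)  # True True
```

A translation proof would have to convince a verifier of exactly what the final loop checks, without the verifier re-running it over every position.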
Getting back to Filecoin
We have the following scenario:
- We have some input data D
- We have a first commitment Comm1
- We have a second commitment Comm2
Concretely, we can think of Comm1 as an IPFS CID and of Comm2 as a CommP.
Question: How does one know that Comm1 and Comm2 are both commitments to the same (original) D?
One possible answer could be to open the two commitments in a Snark and prove that they open to the same (expected) vector D.
Unfortunately, our commitments are based on SHA-256, which makes it impractical to handle multiple openings inside a Snark.
Merkle Tree Translation Proof and Proofs of Wrong Merkle Tree Translation
At a high level (see here for more details), we could deal with this issue by combining two proofs:
- An interactive proof of Merkle Tree translation (probabilistic checks on different random openings of both Comm1 and Comm2). Note that this step would not be onchain, since one has to check all the vanilla proofs (MT paths, in the case of MT), which cannot be proven inside a snark.
- A non-interactive proof of Wrong Merkle Tree translation: at a later time, anyone retrieving D can challenge the author of either Comm1 or Comm2, claiming that there is some position that does not open to D. This challenge could be resolved on-chain by the SP posting a (single) proof that the translation was performed correctly at the given position (i.e., a single MT path in the case of commitments done via MT).
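The interactive step in the first bullet can be sketched as verifier-driven spot checks. This is a minimal sketch under assumptions: a toy power-of-two SHA-256 Merkle tree stands in for both commitments (the tags `b"c1"` and `b"c2"` are hypothetical and only make the two trees distinct), and `spot_check`, `make_prover` and the `sample` callback are illustrative names. The verifier ends up checking 2k Merkle paths, which is the work that does not fit into a snark.

```python
import hashlib
import random
from typing import Callable, List, Tuple


def h(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha256(tag + data).digest()


def build(tag: bytes, vector: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaves first (toy: len(vector) must be a power of two)."""
    assert vector and len(vector) & (len(vector) - 1) == 0
    levels = [[h(tag, b"L" + v) for v in vector]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(tag, b"N" + prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels


def open_at(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from leaf to root for position `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(tag: bytes, root: bytes, index: int, value: bytes, path: List[bytes]) -> bool:
    node = h(tag, b"L" + value)
    for sibling in path:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = h(tag, b"N" + pair)
        index //= 2
    return node == root


Opening = Tuple[bytes, List[bytes], bytes, List[bytes]]


def spot_check(root1: bytes, root2: bytes, prover: Callable[[int], Opening],
               n: int, k: int, sample: Callable[[int], int]) -> bool:
    """Challenge k random positions; at each one the prover must open
    Comm1 and Comm2 to the same value with valid Merkle paths."""
    for _ in range(k):
        i = sample(n)  # random position in [0, n)
        v1, p1, v2, p2 = prover(i)
        if not verify(b"c1", root1, i, v1, p1) or not verify(b"c2", root2, i, v2, p2):
            return False  # invalid Merkle path
        if v1 != v2:
            return False  # the two commitments disagree at position i
    return True


def make_prover(data1: List[bytes], data2: List[bytes]) -> Callable[[int], Opening]:
    """Prover that committed to data1 under Comm1 and data2 under Comm2."""
    t1, t2 = build(b"c1", data1), build(b"c2", data2)
    return lambda i: (data1[i], open_at(t1, i), data2[i], open_at(t2, i))


D = [bytes([i]) * 4 for i in range(8)]
D_bad = D[:5] + [b"evil"] + D[6:]  # differs from D only at position 5
root1 = build(b"c1", D)[-1][0]
root_ok = build(b"c2", D)[-1][0]
root_bad = build(b"c2", D_bad)[-1][0]
print(spot_check(root1, root_ok, make_prover(D, D), len(D), 4,
                 random.Random(1).randrange))  # True
```

Against `root_bad`, a run only fails if position 5 is sampled, which is why these checks are probabilistic rather than a proof of translation.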
What's Wrong?
In our case, relying on MT translation proofs is problematic, since they cannot be checked onchain and are not publicly verifiable.
(Crazy) Idea
Is there a world where we can use Proofs of Wrong Merkle Tree Translation only?
Everything would turn into an optimistic protocol: we claim that Comm1 and Comm2 (i.e., an IPFS CID and a CommP) are commitments to the same data D until someone presents a Proof of Wrong Merkle Tree Translation (which is essentially a fraud proof).
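One way to model such an optimistic protocol is sketched below, assuming the challenge-response resolution described earlier for the Proof of Wrong Merkle Tree Translation: a claim stands by default, anyone can challenge one position, and an unanswered challenge past its deadline counts as the fraud proof. `OptimisticRegistry`, `Claim`, the response window and the toy power-of-two SHA-256 tree (with hypothetical tags `b"c1"`/`b"c2"`) are all illustrative assumptions; time is modelled as an integer block height.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def h(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha256(tag + data).digest()


def build(tag: bytes, vector: List[bytes]) -> List[List[bytes]]:
    """Toy Merkle tree levels, leaves first; len(vector) must be a power of two."""
    assert vector and len(vector) & (len(vector) - 1) == 0
    levels = [[h(tag, b"L" + v) for v in vector]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(tag, b"N" + prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def open_at(levels: List[List[bytes]], index: int) -> List[bytes]:
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(tag: bytes, root: bytes, index: int, value: bytes, path: List[bytes]) -> bool:
    node = h(tag, b"L" + value)
    for sibling in path:
        node = h(tag, b"N" + (node + sibling if index % 2 == 0 else sibling + node))
        index //= 2
    return node == root


@dataclass
class Claim:
    comm1: bytes
    comm2: bytes
    n: int                            # number of committed positions
    challenged: Optional[int] = None  # position under challenge, if any
    deadline: int = 0                 # last block height for the SP's response
    fraudulent: bool = False


class OptimisticRegistry:
    """Hypothetical registry: "Comm1 and Comm2 commit to the same D" is
    accepted by default and overturned only if a single-position challenge
    is left unanswered until its deadline."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.claims: Dict[int, Claim] = {}

    def post_claim(self, claim_id: int, comm1: bytes, comm2: bytes, n: int) -> None:
        self.claims[claim_id] = Claim(comm1, comm2, n)

    def challenge(self, claim_id: int, index: int, now: int) -> None:
        c = self.claims[claim_id]
        assert c.challenged is None and not c.fraudulent and 0 <= index < c.n
        c.challenged, c.deadline = index, now + self.window

    def respond(self, claim_id: int, value: bytes, path1: List[bytes],
                path2: List[bytes], now: int) -> bool:
        """The SP opens both commitments at the challenged position to one value."""
        c = self.claims[claim_id]
        if c.challenged is None or now > c.deadline:
            return False
        i = c.challenged
        if verify(b"c1", c.comm1, i, value, path1) and verify(b"c2", c.comm2, i, value, path2):
            c.challenged = None  # challenge defeated; the claim stands
            return True
        return False

    def finalize(self, claim_id: int, now: int) -> bool:
        """An unanswered challenge past its deadline acts as the fraud proof."""
        c = self.claims[claim_id]
        if c.challenged is not None and now > c.deadline:
            c.fraudulent = True
        return c.fraudulent


# Comm2 secretly commits to a modified D' that differs from D at position 2.
D = [bytes([i]) * 4 for i in range(4)]
D_bad = D[:2] + [b"evil"] + D[3:]
t1, t2 = build(b"c1", D), build(b"c2", D_bad)
reg = OptimisticRegistry(window=10)
reg.post_claim(1, t1[-1][0], t2[-1][0], len(D))
reg.challenge(1, 2, now=100)
print(reg.respond(1, D[2], open_at(t1, 2), open_at(t2, 2), now=105))  # False
print(reg.finalize(1, now=111))                                        # True
```

If Comm2 had been computed over D, the SP could answer a challenge at any position and the claim would stand; note that the challenger still has to pick a position where the two commitments actually differ.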
Caveats:
- There is no "a priori" proof that the "translation" was done correctly in the first place.
- One can realize that Comm1 and Comm2 do not both open to D if and only if D is retrieved.
Why might this idea still make sense?
- It is enough to download D once to find out whether there is a mismatch. Basically, it is a "once and for all" retrieval.
- If the file is not retrieved AND there is a mismatch, then the mismatch goes unnoticed.
- BUT: does it matter if Comm1 and Comm2 mismatch for some data D that never gets retrieved?
- On the one hand:
  - If the data is never retrieved, then one can argue that "nobody cares whether this data is actually there" for the whole deal duration.
  - After the deal duration, any data can be removed from the network, so if nobody retrieved it, there is no practical difference between having stored the real D or a modified D' for the deal duration.
  - Space hardness is not put at risk by Comm1 and Comm2 opening to two different data D and D'.
- On the other hand:
  - We cannot ensure a priori that specific data is stored in the network (until it gets retrieved). This may be undesirable from an SLA perspective.
Problem
In general, building a practical Proof of Wrong MT Translation is not trivial. Even if one notices a mismatch between the two commitments, identifying a single path where they differ requires an extensive search if the original file is not available (i.e., if the original file was not stored). In practice, this prevents anyone who is not storing the original file from producing a Proof of Wrong MT Translation.
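To give a rough sense of that search cost, assume (for illustration only) that without the original file the only strategy is to request random openings of both commitments and compare them, and that Comm1 and Comm2 disagree at m out of n positions. Each uniform probe, drawn independently, hits a mismatch with probability m/n, so the number of probes until the first hit is geometric:

```latex
\Pr[\text{no mismatch found after } k \text{ probes}] = \left(1 - \frac{m}{n}\right)^{k},
\qquad
\mathbb{E}[\text{probes until the first mismatch}] = \frac{n}{m}.
```

For a single modified chunk (m = 1), this is n probes on average, each costing a pair of Merkle openings: about as expensive as checking every position.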
Possible mitigations
- Having Proof of Wrong MT Translation oracles that are called if and only if a mismatch between the commitments is found:
  - The oracle would take care of downloading the file from the SP and evaluating the two commitments.
  - If there is a mismatch, it sends a mismatch notification onchain.
  - Pros: everyone can verify the mismatch by downloading the file themselves.
  - Cons: we rely on an oracle.
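A minimal sketch of the oracle's check, under several assumptions: both commitments are modelled by the same toy SHA-256 Merkle root over fixed 4-byte chunks (separated by hypothetical tags), the downloaded file is first authenticated against Comm1 (in the spirit of content addressing), and `oracle_check`, `MismatchNotification` and `CHUNK` are hypothetical names rather than an existing API.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

CHUNK = 4  # hypothetical toy chunk size in bytes


def chunks(data: bytes) -> List[bytes]:
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]


def merkle_root(tag: bytes, leaves: List[bytes]) -> bytes:
    level = [hashlib.sha256(tag + b"L" + x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd last node with itself
        level = [hashlib.sha256(tag + b"N" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def commit1(data: bytes) -> bytes:  # toy stand-in for Comm1
    return merkle_root(b"c1", chunks(data))


def commit2(data: bytes) -> bytes:  # toy stand-in for Comm2
    return merkle_root(b"c2", chunks(data))


@dataclass
class MismatchNotification:
    comm1: bytes
    comm2: bytes
    recomputed_comm2: bytes  # what Comm2 should be for the retrieved file


def oracle_check(data: bytes, comm1: bytes, comm2: bytes) -> Optional[MismatchNotification]:
    """Evaluate both commitments on the downloaded file; return a
    notification to post onchain if they do not refer to the same data."""
    if commit1(data) != comm1:
        raise ValueError("downloaded file does not match Comm1; retry retrieval")
    recomputed = commit2(data)
    if recomputed != comm2:
        return MismatchNotification(comm1, comm2, recomputed)
    return None


D = b"some original file contents"
print(oracle_check(D, commit1(D), commit2(D)))                     # None
print(oracle_check(D, commit1(D), commit2(D + b"!")) is not None)  # True
```

Because the check is deterministic, anyone who downloads the same file can rerun it to confirm a notification (the Pros above); what still has to be trusted is who is allowed to post notifications onchain (the Cons).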
Why do we need an oracle? Without an oracle, anyone could deliberately send claims of Wrong MT Translation onchain. We could require a collateral deposit in order to post such a claim onchain, but this would make things much more convoluted (opening the door to corner cases that we would need to analyse).