🛠️

Enabling Filecoin Client Data Usage

Created

May 8, 2023 1:47 PM

👨‍🚀

The goal of this document is to highlight the different avenues one could take to allow verifiable computing over client data stored in Filecoin.

📋 TLDR
🚨 Core Problem
🌐 Single Commitment Vision
🥢 Multiple Commitments Vision
🤝 Multiple Commitments (Trusted)
🔐 Multiple Commitments (Verifiable)
🔏 Summary

📋 TLDR

Goal: enable verifiable computation over data stored on Filecoin

Problem: the current commitments to data are not amenable to verifiable computation

Solutions: three options

Change the Filecoin / IPFS commitment types (best, but very difficult)
Use an additional, different commitment, and either:

Trust users to compute it correctly (easy, but no guarantees)
OR prove that both commitments relate to the same data (difficult, with some guarantees)

🚨 Core Problem

CommD is not at all SNARK-friendly, i.e. we can’t efficiently run any verifiable computation over it.

Therefore, we want a new PieceCID/CommD that is SNARK-friendly. Each of the following sections explores a different path to get there.
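To make the cost concrete, here is a simplified sketch of a binary SHA-256 Merkle commitment. It is not the actual CommD algorithm (the padding and truncation rules of the real construction are left out, and the 32-byte leaf size is an assumption of this sketch); it only counts how many SHA-256 calls a proof over the full commitment would have to re-execute inside a circuit.

```python
import hashlib

NODE_SIZE = 32  # bytes per leaf; an assumption of this sketch


def merkle_root(data: bytes) -> tuple:
    """Return (root, sha256_calls) for a simplified binary SHA-256 Merkle tree.

    Leaves are raw 32-byte chunks, zero-padded up to a power-of-two count.
    The real CommD padding/truncation rules are deliberately omitted.
    """
    leaves = [data[i:i + NODE_SIZE].ljust(NODE_SIZE, b"\x00")
              for i in range(0, len(data), NODE_SIZE)] or [b"\x00" * NODE_SIZE]
    # Pad the number of leaves up to the next power of two.
    while len(leaves) & (len(leaves) - 1):
        leaves.append(b"\x00" * NODE_SIZE)

    calls = 0
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            # Each parent is the hash of its two children.
            parents.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            calls += 1
        level = parents
    return level[0], calls


if __name__ == "__main__":
    root, calls = merkle_root(b"\x01" * (1024 * 1024))  # 1 MiB of data
    print(root.hex(), calls)
```

With n leaves the tree needs n - 1 hash calls, so the count grows linearly with the data size, and every one of those calls would have to be expressed as SHA-256 constraints in a proof that opens the whole commitment.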

🌐 Single Commitment Vision

👉

There is a single CommD commitment, used for storage in Filecoin (as it is now) as well as for verifiable computation over the data and for IPFS.

Currently, verifiable computation over a SHA-256 PieceCID is very expensive, so we need another way to commit to the piece data.

Pros:

👍 One can know whether the Filecoin PieceCID they want to compute over is actually stored in Filecoin

Both commitments represent exactly the same data, with complete assurance

Cons:

Retrieving a PieceCID is not ideal for clients using IPFS
Not backward compatible with Filecoin storage proofs
A new commitment might not meet practical, user-friendly requirements
Suffers if we upgrade the PieceCID commitment scheme
Potentially binds us to a specific proof scheme or curve (ok)

Action Items:

Accept the current time needed to prove statements about IPFS CIDs (very hard right now)
OR change the IPFS CID to be SNARK-friendly and change the Filecoin PieceCID to be that (requires research, difficult)

🥢 Multiple Commitments Vision

👉

Thesis: There are multiple PieceCID commitments of the same data, one optimized for each usage (storage, computing, streaming, etc.)

For example, a streaming Merkle tree is better than a balanced Merkle tree for videos, hence one might opt for two retrieval commitments instead of one.
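As a rough illustration of what a usage-specific commitment could look like, the sketch below commits to a list of chunks with a backward hash chain, so a receiver can verify each chunk as soon as it arrives. The hash chain is only a simple stand-in for a streaming-oriented structure; it is an assumption of this sketch, not necessarily the streaming Merkle tree meant above, and the names `chain_commit` and `verify_stream` are hypothetical.

```python
import hashlib

END = b"\x00" * 32  # fixed-size terminator link for the last chunk


def chain_commit(chunks):
    """Return (commitment, links) for a backward hash chain over chunks.

    links[i] commits to everything after chunks[i] and is sent with it.
    """
    nxt = END
    links = []
    for chunk in reversed(chunks):
        links.append(nxt)
        nxt = hashlib.sha256(chunk + nxt).digest()
    links.reverse()
    return nxt, links


def verify_stream(commitment, stream):
    """Yield chunks from (chunk, link) pairs, checking each one on arrival."""
    expected = commitment
    for chunk, link in stream:
        if len(link) != 32:
            raise ValueError("malformed link")
        if hashlib.sha256(chunk + link).digest() != expected:
            raise ValueError("chunk does not match commitment")
        yield chunk
        expected = link  # the next chunk must hash to this link
    if expected != END:
        raise ValueError("stream ended early")


if __name__ == "__main__":
    video = [b"frame-0", b"frame-1", b"frame-2"]
    commitment, links = chain_commit(video)
    for frame in verify_stream(commitment, zip(video, links)):
        print("verified", frame)
```

The same bytes would get a different commitment under a balanced tree, which is exactly the situation this section is about: one piece of data, several commitments, each suited to a different access pattern.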

The link between the two can be:

🤝 Trusted
🔐 Verifiable

Pros:

👍 One specialized commitment for each usage

Ideally even more than one depending on application

Can be built on top, does not necessarily require Filecoin core changes

🤝 Multiple Commitments (Trusted)

👉

In this world, we trust the client to have computed the right commitments.

The storage commitment is stored on Filecoin as usual. The other commitments can be stored in the notes of a deal, in a smart contract, or in an external db…

The assumption is that we trust the client to commit correctly to its data. Filecoin is used for storage, and there is no enforced link with, for example, the “computation” commitment.

Pros:

👍 Easy to implement a product directly now: no changes at all on the Filecoin side, mostly on client tooling

Cons:

There is no way to know whether a CID is stored in Filecoin, unless it’s the PieceCID

However, given that the client is trusted for its own data, users can also trust the link between the PieceCID and the other commitments

📝

Example: Solana stores block data in Filecoin and writes the retrieval hash and the computation hash to a trusted channel (e.g. deal info label, mapping smart contract). Those who want to use Solana data use the specific trusted hash.

Action Items:

Define a standard way to add support for new commitments:

Either via the “label” fields in the deal
Or an external contract that creates the mapping
Or multiple contracts, standard interface …

Create a toy application
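A toy version of such a mapping could look like the sketch below. It is an in-memory stand-in for whatever channel ends up being used (deal label, contract, external db); the class name, method names and the placeholder identifiers are all hypothetical, and nothing here checks that the commitments actually refer to the same data, which is the whole point of the trusted model.

```python
from dataclasses import dataclass, field


@dataclass
class CommitmentRegistry:
    """Hypothetical stand-in for a deal label or a mapping contract.

    Maps a PieceCID to the extra commitments the client claims for it.
    The link is trusted: no proof is checked on registration.
    """
    _links: dict = field(default_factory=dict)

    def register(self, piece_cid: str, usage: str, commitment: str) -> None:
        by_usage = self._links.setdefault(piece_cid, {})
        # Refuse to silently overwrite an existing claim with a different one.
        if usage in by_usage and by_usage[usage] != commitment:
            raise ValueError(f"{usage!r} already registered for {piece_cid}")
        by_usage[usage] = commitment

    def lookup(self, piece_cid: str, usage: str):
        """Return the claimed commitment, or None if nothing is registered."""
        return self._links.get(piece_cid, {}).get(usage)


if __name__ == "__main__":
    registry = CommitmentRegistry()
    # Placeholder identifiers, not real CIDs.
    registry.register("piece-cid-placeholder", "compute", "snark-friendly-root")
    registry.register("piece-cid-placeholder", "retrieval", "ipfs-cid-placeholder")
    print(registry.lookup("piece-cid-placeholder", "compute"))
```

A consumer who trusts the publisher looks up the commitment for the usage they need and works with that, while the PieceCID remains the only value Filecoin itself vouches for.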

🔐 Multiple Commitments (Verifiable)

👉

In this world, there is a cryptographic proof showing that another commitment commits to the same data as the PieceCID.

There are two types of verifiability:

Perfectly Linked: 100% confidence that the link between the PieceCID and the other commitments is correct

Impractical today for large sizes, especially for PieceCIDs using SHA

Weakly Linked: It admits some variation of data between the two commitments

e.g. I can only be sure that 98% of the data I wish to compute on is correctly stored on Filecoin.
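One possible way to get a guarantee of this shape is random spot-checking: open the same randomly chosen chunks under both commitments and compare them. This is an assumption made for illustration, not a mechanism this document specifies. If more than a fraction ε of the chunks differed, k independent uniform samples would all match with probability at most (1 - ε)^k, so the sketch below computes the smallest k that pushes that probability under a chosen bound. The function name and parameters are hypothetical.

```python
import math


def samples_needed(max_bad_fraction: float, failure_prob: float) -> int:
    """Smallest k such that (1 - max_bad_fraction) ** k <= failure_prob.

    Assumes chunks are sampled independently and uniformly at random
    (with replacement), and that each sampled chunk can be opened and
    compared under both commitments.
    """
    if not (0 < max_bad_fraction < 1 and 0 < failure_prob < 1):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(failure_prob) / math.log(1 - max_bad_fraction))


if __name__ == "__main__":
    # "98% of the data is correctly stored" corresponds to a bad fraction of 2%.
    k = samples_needed(max_bad_fraction=0.02, failure_prob=1e-6)
    print(k, 0.98 ** k)
```

Under these assumptions the number of openings depends only on the tolerated fraction and the target error, not on the size of the piece, which is what makes a weak link so much cheaper than a perfect one.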

📝

Some users independently store Wikipedia pages. You want to contribute, but from the retrieval CID alone you cannot tell which pages are missing, so you check on the Filecoin network which data is missing.

Pros:

Anyone can create the links using whatever technology they prefer
Proofs can be stored anywhere and do not necessarily have to be verifiable on-chain

Cons:

Very likely expensive, but doable (much better than the single commitment version)

Neither the user nor the SP will create these links

Action Items:

Find a way to weakly map SHA256 commitments into other commitments that are easier to prove.

✅ ⚖️Translations Proofs
Find an example and generate a proof linking a commitment to the PieceCID (OK to be off-chain and OK to be weakly linked)

Map an IPFS CID to a PieceCID (doesn’t have to be a SNARK and can be interactive)

🔏 Summary

Solution	Pros	Cons
🌐 Single commitment	• One can know whether the Filecoin PieceCID they want to compute over is actually stored in Filecoin ◦ Both commitments represent exactly the same data, with complete assurance	• Retrieving a PieceCID is not ideal for clients using IPFS • Not backward compatible with Filecoin storage proofs • A new commitment might not meet practical, user-friendly requirements • Suffers if we upgrade the PieceCID commitment scheme • Potentially binds us to a specific proof scheme or curve (ok)
🥢 Multiple commitments	• One specialized commitment for each usage ◦ Ideally even more than one depending on application • Can be built on top, does not necessarily require Filecoin core changes
— 🤝 Trusted	• Easy to implement a product directly now: no changes at all on the Filecoin side, mostly on client tooling	• There is no way to know whether a CID is stored in Filecoin, unless it’s the PieceCID ◦ However, given that the client is trusted for its own data, users can also trust the link between the PieceCID and the other commitments
— 🔐 Verifiable	• Anyone can create the links using whatever technology they prefer • Proofs can be stored anywhere and do not necessarily have to be verifiable on-chain	• Very likely expensive, but doable (much better than the single commitment version) ◦ Neither the user nor the SP will create these links