Problem
How to incentivise the availability and retrieval of data.
To anyone? To specific parties?
Paying for specific retrieval is a different problem to paying for general availability. We may want both.
Some data needs high retrievability, but some does not. We don't want to pay the full cost of high retrievability for archive data that is rarely or never fetched. Such data might be better served by one-off retrieval deals, as in the current system.
"Primary": retrieval from storage providers
- Mechanism to incentivise/ensure baseline retrievability from a storage provider who is proving they have the data.
- May have a cost, may not perform well (e.g. if unsealing is needed), and is not replicated
- We can probably establish this as a baseline expectation for all providers, i.e. taking a deal means committing to make the data available at least at a low level, for a price. It's not a universal expectation today: we should make it one.
- Fundamentally coupled to Filecoin
"Secondary": retrieval from caches
- Mechanism to ensure high-QoS retrievability from nodes providing distribution, rather than from primary storage
- Not fundamentally dependent on Filecoin: could be on just IPFS, or even web2
- Storage providers could do both primary and secondary. Dedicated retrieval providers could do only secondary.
Secondary retrieval has three parties involved: the host (providing the data), the client (fetching the data) and the principal/operator ("owns" the data, wants it to be retrieved). By analogy: AWS, the browser client, and the person with the AWS account. The principal pays the host to provide data to the client.
In Web2, the host keeps logs and bills the principal in arrears. The principal trusts the host to report accurately. The principal can independently verify availability and performance, and can roughly verify billing indirectly (e.g. by correlation with other metrics).
Idea: retrieval vouchers
Principal provides one-time vouchers to the client to use for fetching resources from the host. Client provides a voucher in a request header. Host aggregates vouchers and periodically redeems them against escrow funds posted by the principal.
- Incentivise specific retrieval, not general availability
- Host cannot fake logs
- Principal needs to authenticate clients, else host might pose as client
- Requires a server to generate vouchers, might be expensive
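A minimal sketch of the voucher flow above. It assumes (this is not in the notes) that each voucher is a random secret whose SHA-256 digest the principal commits to the escrow in advance, as a stand-in for whatever signature scheme would really be used. `Escrow`, `Host`, `issue_vouchers` and the flat per-voucher price are hypothetical names and parameters. It illustrates one-time use and why the host cannot redeem without real vouchers. It does not address client authentication.

```python
import hashlib
import secrets
from typing import Optional


def digest(voucher: bytes) -> str:
    """Public commitment to a voucher: its SHA-256 digest."""
    return hashlib.sha256(voucher).hexdigest()


def issue_vouchers(n: int) -> tuple[list[bytes], set[str]]:
    """Principal side: create n secret vouchers and their public commitments."""
    vouchers = [secrets.token_bytes(32) for _ in range(n)]
    return vouchers, {digest(v) for v in vouchers}


class Escrow:
    """Hypothetical escrow holding the principal's funds and commitments."""

    def __init__(self, balance: int, price: int, commitments: set[str]):
        self.balance = balance
        self.price = price  # assumed flat payment per voucher
        self.unspent = set(commitments)

    def redeem(self, vouchers: list[bytes]) -> int:
        """Pay for each voucher that is committed and not yet spent."""
        paid = 0
        for voucher in vouchers:
            d = digest(voucher)
            if d in self.unspent and self.balance >= self.price:
                self.unspent.remove(d)
                self.balance -= self.price
                paid += self.price
        return paid


class Host:
    """Hypothetical host: serves data only against a fresh, committed voucher."""

    def __init__(self, content: dict[str, bytes], commitments: set[str]):
        self.content = content
        self.commitments = commitments  # public digests, as posted to escrow
        self.seen: set[str] = set()
        self.collected: list[bytes] = []

    def serve(self, key: str, voucher: bytes) -> Optional[bytes]:
        """Return the data, or None if the voucher is unknown or replayed."""
        d = digest(voucher)
        if key not in self.content or d not in self.commitments or d in self.seen:
            return None
        self.seen.add(d)
        self.collected.append(voucher)
        return self.content[key]
```

In this sketch the host learns a voucher secret only when a client presents it, so it cannot redeem vouchers that were never handed out. A host that obtains vouchers by posing as a client can still redeem them, which is the authentication concern listed above.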
Idea: keepers
Incentivise general availability of specific data by testing it periodically. Some broker incentivises "keepers" to test retrievability from hosts. Principal commits payment for specific data items and a time period. Keeper draws a ticket telling it which data item(s) to fetch. If it finds a host that serves the data, it posts a witness (digest?) that it retrieved successfully. Both host and keeper earn a fee. Keeper earns more by finding a different host than other keepers did (identity/sybil problem with hosts).
Can we do specific funding by principals, as well as network funding for, e.g. "all data in deals"? But some data is not worth paying for high retrievability. Maybe client pays a fee to the network for availability, which then pays the broker?
- Can a keeper make a request that hides that it is a keeper? Host must be blind to being tested. Hard with collusion.
- Keepers post stake, hosts can report keepers that expose themselves?
- Host needs to identify itself in responses to earn rewards
- Requires an index-like record of all eligible content. An off-chain index of all on-chain deals? Keeper needs to prove the data fetched was eligible.
- Works maybe for piece-level data, but not IPFS-block level
- How to decentralise/distribute broker to scale?
- How to ensure more popular data (expensive to host) is incentivised more as popularity grows?
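A rough sketch of two pieces of the keeper idea: drawing a ticket and splitting fees. The notes do not specify either, so everything here is an assumption: the ticket is derived from a public randomness beacon and the keeper's id, and the "earn more for a different host" rule is modelled as a flat bonus for the first keeper to report each host. `draw_ticket`, `settle`, and the fee parameters are hypothetical. Witness verification, stake, and host blindness are left out because the notes leave them open.

```python
import hashlib
from collections import defaultdict


def draw_ticket(beacon: bytes, keeper_id: str, eligible: list[str]) -> str:
    """Pick one eligible item from public randomness and the keeper's id.

    Deterministic, so anyone can recompute which item a keeper was
    assigned and reject reports for items it was not asked to fetch.
    """
    seed = hashlib.sha256(beacon + keeper_id.encode()).digest()
    return eligible[int.from_bytes(seed, "big") % len(eligible)]


def settle(
    reports: list[tuple[str, str]],
    host_fee: int,
    keeper_fee: int,
    novelty_bonus: int,
) -> tuple[dict[str, int], dict[str, int]]:
    """Compute payouts for one data item from successful retrieval reports.

    Each report is (keeper_id, host_id), in the order it was posted.
    Every report pays the host and the keeper a base fee. The first
    keeper to report a given host also earns the novelty bonus.
    Returns (host_payouts, keeper_payouts).
    """
    host_payouts: dict[str, int] = defaultdict(int)
    keeper_payouts: dict[str, int] = defaultdict(int)
    seen_hosts: set[str] = set()
    for keeper_id, host_id in reports:
        host_payouts[host_id] += host_fee
        keeper_payouts[keeper_id] += keeper_fee
        if host_id not in seen_hosts:
            seen_hosts.add(host_id)
            keeper_payouts[keeper_id] += novelty_bonus
    return dict(host_payouts), dict(keeper_payouts)
```

Under these assumptions a host running many identities would collect the novelty bonus for a colluding keeper each time, which is the sybil problem noted above. The sketch does nothing to prevent it.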