Goal
- Short-term: unblock small deals via the L1 chain
Design
If the CommP and the way to get the data are available on the L1 chain, there is little reason to go through off-chain aggregation: PoDSI proofs for all pieces will be more expensive than performing the work on-chain. There is room to optimize the cost by performing the work needed to generate the “bunch” in an L2, but during the initial iteration I would suggest focusing on L1 and transitioning to L2 only if L1 proves too expensive.
This only works because all the pieces to be aggregated want their aggregation proven on-chain. PoDSI differs here: it is a more general solution with off-chain applications.
The contract does not output proofs. Operations are performed on-chain; thus, the result is assumed to be correct as long as the code is correct.
The PoDSI can still be generated off-chain for pieces, based on the state of the contract and the piece_ids passed to the TakeBunch method.
Idea - L1 Aggregation Contract (Buncher)
Storage Client/Contract entry point:
SubmitForAggregation(CommPc, Size, fetch_uri, [n_replicas?]) (piece_id uint64)
- submits a piece to be aggregated and stores that information in the contract under an incremental ID
IsInDeal(piece_id) Optional<deal_id>
- returns the DealID if the piece was onboarded in a deal, None otherwise. If we support multiple replicas, this becomes []deal_id
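A minimal Go-flavored sketch of the client-facing state and entry points; the types and helpers are illustrative assumptions, and real contract plumbing (caller authentication, payment, events) is omitted:

```go
// Hedged sketch only; field and method names follow the signatures above
// but everything else is an illustrative assumption.
package buncher

type PieceInfo struct {
	CommPc   [32]byte // piece commitment
	Size     uint64   // padded piece size
	FetchURI string   // where the SP/Aggregator can fetch the data
	DealID   *uint64  // set once the piece lands in a deal; []uint64 with n_replicas
}

type Buncher struct {
	nextID uint64
	pieces map[uint64]*PieceInfo
}

func NewBuncher() *Buncher {
	return &Buncher{pieces: map[uint64]*PieceInfo{}}
}

// SubmitForAggregation stores the piece info under an incremental ID.
func (b *Buncher) SubmitForAggregation(commPc [32]byte, size uint64, fetchURI string) (pieceID uint64) {
	pieceID = b.nextID
	b.nextID++
	b.pieces[pieceID] = &PieceInfo{CommPc: commPc, Size: size, FetchURI: fetchURI}
	return pieceID
}

// IsInDeal returns the DealID if the piece was onboarded, nil otherwise.
func (b *Buncher) IsInDeal(pieceID uint64) *uint64 {
	if p, ok := b.pieces[pieceID]; ok {
		return p.DealID
	}
	return nil
}
```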
SP/Aggregator Entry Point:
The SP/Aggregator can watch the contract and try to fetch the data. If they were able to fetch enough small pieces to warrant onboarding them in a deal, they call TakeBunch.
ListOpen() []{CommPc, Size, fetch_uri, [n_replicas?]}
- read-only, to be used off-chain
TakeBunch(piece_ids []uint64, miner_id/actor_id_for_authenticate_message_callback_passthrough) (bunch_id uint64, CommPa, Size)
TakeBunch:
- takes in piece_ids
- for the submitted pieces, reads their info, combines them, and builds the Data Segment Index (or not) at the same time, combining all of this into CommPa
  - This might be expensive
It replicates the work done in go-data-segment, which implements PoDSI. The significant difference is that go-data-segment focuses on producing proofs as its output; proofs are not required here, as the work is done on-chain, so there is no need to prove it.
A large chunk of the work is similar to ComputeUnsealedCID, except that it doesn’t include the Data Segment Index. The DSI is used for retrieval, so in theory it isn’t mandatory and can be omitted. We could also leave the content of the DSI unchecked, allowing the party performing TakeBunch to provide a commitment to it; this scenario is strictly better than omitting the DSI outright, as some parties will provide a valid one. For pieces under an unchecked, invalid DSI, generating a valid PoDSI proof would not be possible, but a naive inclusion proof, which doesn’t provide retrievability guarantees, could still be generated.
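For a sense of the work involved, here is a minimal sketch of the commitment folding TakeBunch would perform, assuming the piece commitments were already validated and zero-padded to equal sizes and a power-of-two count; the real ComputeUnsealedCID / go-data-segment logic additionally handles mixed piece sizes, alignment, and the DSI:

```go
// Hedged sketch only: hash details follow simplified Filecoin piece-tree
// conventions, not the exact go-data-segment implementation.
package commp

import "crypto/sha256"

type Node [32]byte

// combine hashes two sibling nodes into their parent and truncates the
// result to 254 bits (Fr32), as the Filecoin piece tree does.
func combine(l, r Node) Node {
	h := sha256.New()
	h.Write(l[:])
	h.Write(r[:])
	var out Node
	copy(out[:], h.Sum(nil))
	out[31] &= 0x3f // clear the top two bits of the last byte
	return out
}

// AggregateCommP folds equally-sized piece commitments into a single
// root. Assumes len(pieces) is a power of two after zero-piece padding.
func AggregateCommP(pieces []Node) Node {
	for len(pieces) > 1 {
		next := make([]Node, 0, len(pieces)/2)
		for i := 0; i < len(pieces); i += 2 {
			next = append(next, combine(pieces[i], pieces[i+1]))
		}
		pieces = next
	}
	return pieces[0]
}
```

This also makes the cost shape visible: n equally-sized pieces cost n-1 combine calls, regardless of tree shape.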
TakeBunch then creates an entry for the bunch, allowing a deal to be made for this CommPa against the L1 Aggregator via the FIP-0044 Standard Authentication Method for Actors mechanism.
In the Standard Authentication Method callback, the L1 Aggregation contract would parse the DealProposal and validate it against the bunch it corresponds to (the signature field of the authentication can carry the bunch_id to reduce the validation cost): fields like CommPa, Size, Verified Deal status, Label, deal terms, and whether more deals are requested. The bunch_id has to be included in the Label field of the deal and verified by the authentication method.
The actor_id_for_authenticate_message_callback_passthrough could be used if TakeBunch is performed by a third party different from the SP. The party which performed TakeBunch can then further validate the DealProposal, checking, for example, that the deal is being made with an SP the given third party works with/for.
Acceptance of the deal is reported by a callback from the market through the MarketNotifyDeal method. As the deal content was already verified during the authentication callback, the bunch_id within the Label field can be trusted to identify the correct bunch for the deal. The DealID contained within the notification can be saved in that bunch.
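A hedged sketch of how the two callbacks could fit together, under the once-per-bunch rule noted in the open questions below; all types, helpers, and the bunch_id-in-signature encoding are illustrative assumptions, not the actual market-actor interfaces:

```go
// Hedged sketch; real FIP-0044 / market actor plumbing is more involved.
package callbacks

import (
	"errors"
	"strconv"
)

type DealProposal struct {
	PieceCID     [32]byte // CommPa the deal is proposed for
	PieceSize    uint64
	VerifiedDeal bool
	Label        string // must carry the bunch_id, e.g. "42"
}

type Bunch struct {
	CommPa        [32]byte
	Size          uint64
	Authenticated bool
	DealID        uint64
}

var bunches = map[uint64]*Bunch{} // contract state: bunch_id -> bunch

// AuthenticateMessage (FIP-0044): the signature payload carries the
// bunch_id, so validation runs against one bunch instead of scanning state.
func AuthenticateMessage(signature []byte, proposal DealProposal) error {
	bunchID, err := strconv.ParseUint(string(signature), 10, 64)
	if err != nil {
		return errors.New("signature does not carry a bunch_id")
	}
	b, ok := bunches[bunchID]
	switch {
	case !ok:
		return errors.New("unknown bunch")
	case b.Authenticated:
		return errors.New("bunch already authenticated") // once per bunch
	case proposal.Label != strconv.FormatUint(bunchID, 10):
		return errors.New("Label does not match bunch_id")
	case proposal.PieceCID != b.CommPa || proposal.PieceSize != b.Size:
		return errors.New("proposal does not match the bunch")
	}
	b.Authenticated = true
	return nil
}

// MarketNotifyDeal: the caller must be verified to be the market actor
// (omitted here); the deal content was already checked during
// authentication, so the bunch_id in the Label can be trusted.
func MarketNotifyDeal(dealID uint64, proposal DealProposal) error {
	bunchID, _ := strconv.ParseUint(proposal.Label, 10, 64)
	b, ok := bunches[bunchID]
	if !ok || !b.Authenticated {
		return errors.New("notification for unknown or unauthenticated bunch")
	}
	b.DealID = dealID // IsInDeal can now answer for the bunch's pieces
	return nil
}
```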
Why
- This is guaranteed to be cheaper than submitting PoDSI proofs on-chain for all of these pieces.
Open Questions
- Keeping multiple replicas of the same piece out of a single sector is hard to solve without making the aggregated piece sector-sized.
  - Deals have no concept of “not with this other DealID”.
  - One option to avoid this is to store one replica of a piece per SP.
- The cost of TakeBunch will depend on how many pieces there are and should be linear in the number of pieces.
  - If the CommPa could be built incrementally with each piece, then whoever submits the piece would pay for it.
    - This can work if there is an ACL in front of the contract and pieces are guaranteed to be available at the URI. If there is no ACL, one piece could pollute the whole bunch.
  - Pieces from a single caller (including an ACL proxy contract) could be incrementally bunched up. This partial bunch could then be included in a bunch in the call to TakeBunch. TakeBunch cost can be significantly reduced if partial bunches are allowed, and the cost is then shifted to the caller of SubmitForAggregation.
  - Build the sub-bunches incrementally. When a piece gets added, it gets folded into another piece of the same size (or takes a spot in an array if there is no piece of the same size), bumping the now-combined piece a level up. At the same time, a DSI entry is built, hashed, and included in a similar structure for the DSI of the sub-bunch, hashing and folding as soon as possible. (See the sketch after this list.)
  - Building the DSI incrementally is unfeasible, as DSI entries include the offset of the given sub-piece within the large piece. This offset is not known until the bunch is finalised.
    - So TakeBunch would have to pay the cost of building the DSI.
  - Smaller pieces/sub-bunches cost more than larger ones.
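The sketch referenced above: incremental folding of same-size commitments is essentially a binary counter, so each submission pays O(log n) hashing while the commitment side of the work moves to SubmitForAggregation callers. Names and the combine helper are illustrative assumptions; as noted, the DSI part cannot be built this way.

```go
// Hedged sketch of incremental sub-bunch folding, commitments only.
package subbunch

import "crypto/sha256"

type Node [32]byte

func combine(l, r Node) Node {
	sum := sha256.Sum256(append(l[:], r[:]...))
	sum[31] &= 0x3f // Fr32 truncation, as in the piece tree
	return Node(sum)
}

// SubBunch keeps at most one pending node per size level.
type SubBunch struct {
	levels []*Node // levels[i] holds a pending node covering 2^i pieces
}

// Add folds an incoming piece into the occupant of its level, carrying
// the combined node upward, or parks it in the first empty slot.
func (b *SubBunch) Add(piece Node) {
	cur := piece
	for i := 0; ; i++ {
		if i == len(b.levels) {
			b.levels = append(b.levels, nil)
		}
		if b.levels[i] == nil {
			n := cur
			b.levels[i] = &n
			return
		}
		cur = combine(*b.levels[i], cur) // fold with the same-size node
		b.levels[i] = nil                // and bump the result a level up
	}
}
```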
- Let’s assume we want multiple replicas of bunches (I haven’t assumed this as a requirement so far, to keep things simple, but it shows the edge case better).
  - The contract has to refuse authentication if it already authenticated a given bunch (or authenticated it more than n_replicas times) within one PSD (PublishStorageDeals) call.
  - The simplest solution is to lock out the bunch after these N tries, which is suboptimal.
  - Ideally, the Buncher contract would know that the PSD call has ended and the deals fell through.
    - The Buncher contract does not directly know if/when a deal it authenticated didn’t go through.
    - We don’t want to lock out the bunch in these cases.
  - The key is: the Buncher needs to know that the given PSD call has finished.
  - Heuristic: Epoch. Keep (epoch, tries) in the bunch state. Only allow up to N tries in one epoch.
  - Heuristic: Top-level message. Keep (sender, nonce, tries) in the bunch state. Only allow up to N tries during one top-level message.
  - Sequential nonce for tries? Does it help?
    - I don’t see how.
    - Getting more requests for authentication is the attack vector; it does not solve the issue.
    - Hmm, PSD requires that one PSD call uses just one MinerID, so getting more PSDs from different MinerIDs could be a heuristic.
  - Heuristic: MinerID. Note down the MinerID from the deal proposal in the Buncher state, along with a counter. When the MinerID changes, the counter increments. In the bunch state, keep (observedCounter, tries); reset tries if observedCounter differs from the counter in the Buncher state.
    - Vulnerable to reentrancy.
  - Heuristic: MinerID+Epoch. The counter also increments when an epoch change is observed, allowing the same Miner to retry the next epoch if nobody else is using the contract (a sketch follows this list).
    - Vulnerable to reentrancy.
  - Reentrancy?
    - Reentrancy is an issue for the MinerID heuristic: someone can call PSD from within the authentication callback.
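A sketch of the MinerID+Epoch heuristic referenced above, under an illustrative state layout; all names here are assumptions, and it deliberately does nothing about the reentrancy problem just discussed:

```go
// Hedged sketch; not an actual builtin-actors or FEVM API. Reentrancy
// via nested PSD calls is NOT handled, which is exactly the weakness
// noted above.
package heuristic

const maxTries = 3 // N: allowed authentications per bunch per counter window

// BuncherState is global contract state.
type BuncherState struct {
	lastMiner uint64 // MinerID seen in the previous authentication
	lastEpoch int64  // epoch of the previous authentication
	counter   uint64 // bumps when the MinerID or the epoch changes
}

// BunchState is per-bunch state.
type BunchState struct {
	observedCounter uint64
	tries           uint64
}

// allowAuthentication applies the heuristic: reset tries whenever a new
// (MinerID, epoch) window starts, and refuse after maxTries within one.
func (s *BuncherState) allowAuthentication(b *BunchState, miner uint64, epoch int64) bool {
	if miner != s.lastMiner || epoch != s.lastEpoch {
		s.counter++
		s.lastMiner, s.lastEpoch = miner, epoch
	}
	if b.observedCounter != s.counter {
		b.observedCounter = s.counter
		b.tries = 0
	}
	if b.tries >= maxTries {
		return false
	}
	b.tries++
	return true
}
```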
- Groups, or groups of groups, deploy their own instances, put an ACL in front, and then get DataCap for it.
- IsOnboarded won’t know that a bunch was successfully onboarded in a deal. How did a different contract-as-a-client solve this?
  - DealIDs are reported via the MarketNotifyDeal method (see its method definition).
  - Yes, the authentication callback has to verify that the market actor is the caller. If that is the case, the deal was attempted.
  - The authentication callback should be allowed to validate only once for each bunch.