Challenges of on-chain sector termination notifications for user actors

Creator

Alex North

Created

Jul 23, 2023 11:25 PM

Here’s a design direction that Zen and I have discussed in detail, but that unfortunately doesn’t work.

Sector state notifications

For user-programmed storage markets and similar applications, it would be nice if sector state changes could be pushed to those actors, so that the actors could rely on those pushes to track relevant changes. E.g. for data D satisfying some deal:

Data D committed to sector S1
Data D committed to sector S2
Data D removed from sector S1
(Optional) Sector S2 faulted/recovered
Sector S2 terminated
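As a rough illustration of what a user-programmed market might do with these pushes, here is a minimal sketch. The `DataTracker` class and its method names are hypothetical and exist only for this example; they are not part of any actor API.

```python
from dataclasses import dataclass, field


@dataclass
class DataTracker:
    """Hypothetical market-side view of which sectors currently hold data D."""
    sectors: set = field(default_factory=set)

    def on_committed(self, sector: str) -> None:
        self.sectors.add(sector)

    def on_removed(self, sector: str) -> None:
        self.sectors.discard(sector)

    def on_terminated(self, sector: str) -> None:
        self.sectors.discard(sector)

    def is_stored(self) -> bool:
        return bool(self.sectors)


# Replay the event sequence from the list above (fault/recovery omitted).
tracker = DataTracker()
tracker.on_committed("S1")
tracker.on_committed("S2")
tracker.on_removed("S1")
stored_before_termination = tracker.is_stored()  # D still lives in S2
tracker.on_terminated("S2")
stored_after_termination = tracker.is_stored()   # no sector holds D any more
```

The market's view is only correct if every removal and termination event actually arrives, which is the problem the rest of this note explores.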

Opt-in notifications

For data being added to a sector, we could have SPs opt in to sending the notifications by specifying the actors to notify. The 🎯Direct data onboarding & FIL+ technical design does this, but it’s not enough to implement a complete market. What if the SP later overwrites that data (when we can re-snap)? What if the sector terminates early? The market needs to know that the SP is no longer storing the data.

For cases where data is removed, it’s not enough to allow the SP to send the notification; they must be forced to send it.

Forcing notifications

We could record in sector state information about (or a commitment to) which actors were notified about the initial commitment, and hence must be notified about changes. This could be a lot of state, but ignore that for now. Then, whenever an SP changes a sector, we could force them to notify the same parties notified earlier. This works for explicit removal of sector data.
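A minimal sketch of this idea, assuming the sector stores the full list of notified actors rather than a commitment to it. All names here (`Sector`, `commit_data`, `remove_data`, the `registry` of callbacks) are hypothetical stand-ins for on-chain state and actor calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A notification callback takes (sector id, event name). Hypothetical signature.
Notify = Callable[[str, str], None]


@dataclass
class Sector:
    sector_id: str
    # Actors notified at commitment, recorded in sector state.
    notifees: List[str] = field(default_factory=list)


def commit_data(sector: Sector, actors: List[str], registry: Dict[str, Notify]) -> None:
    # Opt-in step: the SP chooses which actors to notify, and the choice is recorded.
    sector.notifees = list(actors)
    for actor in actors:
        registry[actor](sector.sector_id, "committed")


def remove_data(sector: Sector, registry: Dict[str, Notify]) -> None:
    # Forced step: the recipients come from sector state, not from the SP.
    for actor in sector.notifees:
        registry[actor](sector.sector_id, "removed")
    sector.notifees = []


log = []
registry = {"market": lambda sid, event: log.append(("market", sid, event))}

s1 = Sector("S1")
commit_data(s1, ["market"], registry)
remove_data(s1, registry)
```

After these two calls, `log` holds one "committed" and one "removed" entry for the market, even though the SP supplied no recipient list when removing the data.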

Cron for involuntary events can’t send notifications

Forced notifications don’t work so well for involuntary events. A sector can be terminated automatically by being faulty for a long period, with no interaction from the SP. The fault and eventual termination are handled by cron. We can’t make calls to untrusted code like user-programmed markets from cron (and would like to minimise use of cron altogether), so cron can’t send notifications to interested actors.

Forcing notifications for involuntary events

A promising direction for reducing cron work is to restrict cron to merely checking that the SP has manually performed any required processing. For example, we could introduce a “process deadline” method, to be invoked explicitly at the end of every proving deadline, which includes the logic for faulting partitions that missed Window PoST and terminating long-faulty sectors. Such an explicit call could be forced to send notifications.

The cron handler would then merely check that the deadline has been processed manually by the SP. And if it hasn’t? Enact some cheap-to-enforce penalty like immediately faulting everything in the deadline.

So, to summarise:

An SP must make an explicit call at the end of every proving deadline to do the work currently done by cron.
This work can include forced notifications of sector state changes, including termination, to user-programmed actors.
Cron remains, but merely checks that the SP has made the call. If it hasn’t, the deadline is faulted (or otherwise penalised).
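The split between the explicit call and the cron check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Deadline` structure, the `FAULT_TERMINATION_THRESHOLD` value, tracking at sector rather than partition granularity, and modelling the penalty as a single `all_faulted` flag.

```python
from dataclasses import dataclass, field

# Hypothetical: consecutive faulty deadlines before a sector is terminated.
FAULT_TERMINATION_THRESHOLD = 3


@dataclass
class Deadline:
    sectors: set                                     # live sectors in this deadline
    proven: set                                      # sectors covered by Window PoST
    faulty_for: dict = field(default_factory=dict)   # sector -> consecutive faulty deadlines
    processed: bool = False
    all_faulted: bool = False
    notifications: list = field(default_factory=list)


def process_deadline(dl: Deadline) -> None:
    """Explicit SP call: fault, terminate, and emit the forced notifications."""
    for s in sorted(dl.sectors):
        if s in dl.proven:
            dl.faulty_for.pop(s, None)
            continue
        dl.faulty_for[s] = dl.faulty_for.get(s, 0) + 1
        if dl.faulty_for[s] >= FAULT_TERMINATION_THRESHOLD:
            dl.sectors.remove(s)
            del dl.faulty_for[s]
            dl.notifications.append((s, "terminated"))
        else:
            dl.notifications.append((s, "faulted"))
    dl.processed = True


def cron_check(dl: Deadline) -> bool:
    """Cron only checks the flag; returns True if the penalty was applied."""
    if dl.processed:
        return False
    dl.all_faulted = True  # cheap penalty, no calls to untrusted code
    return True


diligent = Deadline(sectors={"S1", "S2"}, proven={"S1"}, faulty_for={"S2": 2})
process_deadline(diligent)
diligent_penalised = cron_check(diligent)

lazy = Deadline(sectors={"S3"}, proven={"S3"})
lazy_penalised = cron_check(lazy)  # SP never called process_deadline
```

In this sketch only `process_deadline` ever produces notifications, so cron never has to call user code. The second deadline is penalised even though its sector was proven, because the penalty is for skipping the call.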

Challenge: malicious actors

The big problem with this is that it boils down to “force the SP to call untrusted user code”. Malicious user-programmed “markets” could consume arbitrary amounts of gas. Not only could this be expensive, but it could exceed the block gas limit, making the message impossible to execute.

This is a risk generally for on-chain notifications, but at least when notification is voluntary the SP can set appropriate gas limits and, if it all goes wrong, can just choose to resubmit and not notify the malicious actor. This might cost them something, but is fundamentally recoverable and limited in scope to interactions with the malicious actor.

But if the core protocol requires the SP to send the notification, a malicious contract can hold the SP to ransom, up to the value of whatever penalty the chain would enforce for not sending the impossible notification.
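A toy model of the difference between the voluntary and forced cases, assuming each recipient has a fixed gas cost and that running out of gas fails the whole message. The gas figures, `BLOCK_GAS_LIMIT` value, and function names are hypothetical and chosen only to make the failure visible.

```python
BLOCK_GAS_LIMIT = 1_000  # hypothetical units, not the real chain value


class OutOfGas(Exception):
    pass


def send_notifications(costs: dict, recipients: list, gas_limit: int) -> int:
    """Return total gas used, or raise OutOfGas naming the offending recipient."""
    used = 0
    for r in recipients:
        if used + costs[r] > gas_limit:
            raise OutOfGas(r)
        used += costs[r]
    return used


costs = {"honest_market": 50, "malicious_market": 10 * BLOCK_GAS_LIMIT}
recipients = ["honest_market", "malicious_market"]

# Voluntary: the SP may resubmit without the actor that blew the gas limit.
try:
    voluntary_gas = send_notifications(costs, recipients, BLOCK_GAS_LIMIT)
except OutOfGas as e:
    remaining = [r for r in recipients if r != e.args[0]]
    voluntary_gas = send_notifications(costs, remaining, BLOCK_GAS_LIMIT)

# Forced: the recipient list is fixed by sector state, so there is no retry.
try:
    send_notifications(costs, recipients, BLOCK_GAS_LIMIT)
    forced_succeeded = True
except OutOfGas:
    forced_succeeded = False
```

In the forced case the message can never succeed, so the SP is left with the protocol penalty unless the malicious actor chooses to behave.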

Challenge: chain bandwidth & cost

Even without forced notifications, forced termination handling has a bandwidth challenge. In a disaster scenario, SPs might have many sectors terminating in a short period of time. The chain might not have enough total bandwidth to process all of the terminations within the deadline’s window. Faulting sectors will only make the systemic problem worse, as it would introduce some expensive Window PoST verifications for recovery as further competition for scarce bandwidth, or set more sectors on track for termination.

Even if bandwidth is sufficient, the gas price could be arbitrarily high. Forced termination processing could be a big financial risk to SPs, compounding the cost of any disaster. Gas lanes or an equivalent might help.