🎯

Direct data onboarding & FIL+ technical design

Creator

Alex North

Created

Jun 8, 2023 11:12 PM

Project

Storage Programmability

Outline

See 🎯Direct data onboarding: impacts & work outline for a high-level overview of the goals and expected impacts of this project.

Summary of technical changes

Instead of PublishStorageDeals, a FIL+ allocation is made by a client directly with the Datacap/Verified Registry actors (already possible today).
New actor methods for activating a sector or committing new data to an existing sector (SnapDeals) accept additional parameters. These new methods facilitate a flow that skips the built-in market actor entirely.
At pre-commit, an SP must specify a sector’s data commitment (unsealed CID), but does not need to specify the structure of that data nor any deals or FIL+ allocations.
At activation time, an SP specifies the structure of data that has been committed to the sector, a sequence of piece CIDs and sizes.

The SP specifies the IDs of any FIL+ allocations claimed by a piece of data being committed.
The SP specifies the IDs of any built-in market deals (later: any market deals) satisfied by a piece of data being committed.
Both of these are optional: no allocation or deal ID is required.

The miner code verifies that the pieces of data are included in the sector’s data commitment, then notifies the verified registry and built-in market actor to claim/activate the associated records.
The new method parameters make space for notifying any actor about such a data commitment, but actually sending those notifications is deferred to a future change.

Design principles

This design is informed by the following principles:

The existing onboarding flow keeps working with no changes required from SPs. The existing flow can only use the built-in market, in the same way it does today.
The new flow supports onboarding FIL+ verified data with no interactions with the built-in market (especially, no deal required).
The new flow supports onboarding non-FIL+ data with no interactions with either the built-in market or verified registry actors. Participants only pay costs for the facilities they use.
The new flow can also support the built-in market, so that the old flow is ready to be deprecated once participants migrate. We’re ready to cut the technical debt of having two flows.
The new flow is ready to support alternative markets with no API changes, though actually allowing them is deferred.

Design

The schemas described below include consideration for a number of other potential, future features of storage and data onboarding. These are mostly represented by placeholder fields in method parameters which are to be initially set to nil, but allow for future specification of their values without requiring SPs to migrate to new method numbers or parameter schemas. These future features are outlined at the end.

Workflows

A sketch of the current verified data onboarding workflow is:

Client signs a deal proposal message and transmits it to SP off-chain. This specifies the piece CIDs and terms. The parties exchange the data to be stored.
SP publishes the deal proposal to the built-in market actor, committing to the deal. The market actor acts as a delegate for the client to create a datacap allocation with the verified registry actor. The client receives the message CID for the deal publication and monitors it to learn whether the deal has been agreed, and to learn the deal ID.
SP pre-commits a sector, naming the deal IDs. The miner actor verifies that the deals are valid, and computes the corresponding sector data CID from the deal piece CIDs. The client monitors for their deal ID, and learns the corresponding sector ID.
SP proves and activates the sector. The miner actor activates the deals with the market, which again checks their validity. The market returns the associated datacap allocation IDs. The miner actor claims those allocations from the verified registry, and computes QA power. The client monitors the sector ID at least until activation. The client may monitor the market deal ID and/or sector ID for ongoing status.

The proposed onboarding workflow for Direct FIL+ is:

Client publishes a datacap allocation directly on-chain with the verified registry actor. This specifies the piece CID and terms. The SP can observe the allocation to them directly in chain state (we could add an actor event to support this). The parties exchange the data to be stored.
SP pre-commits sector and specifies the sector’s data CID (instead of computing it from deals).
SP proves and activates the sector. The SP provides the piece CIDs comprising sector content, along with any associated datacap allocation IDs (instead of fetching from the market). The miner actor verifies that the pieces match the sector’s unsealed CID. The miner actor claims any specified allocations from the verified registry, and computes QA power. The client monitors their allocation ID, and learns the corresponding sector ID when it is activated. The client may monitor the sector ID for ongoing status (the claim doesn’t track status).

The proposed flow also supports deals with the built-in market, or (in the future) other smart contracts with the following additional but optional parameter:

During sector activation, for any data piece, the SP also provides the address of a market contract and a notification payload message. The miner actor invokes a standard notification receiver method at the requested address after activating the sector. The receiver can trust that the miner has proven storage of the piece of data in the sector.

The current workflow remains in place for a transition period. An SP may publish storage deals, and specify them at sector pre-commit. If they do so, they must onboard that sector via the existing ProveCommitSector methods.

Storage miner actor

State

No changes to miner state schemas are necessary, but some fields become redundant, and the interpretation of some fields changes slightly. No migration is necessary to this state.

SectorPreCommitInfo

The DealIDs field remains in use for onboarding that continues to use the old flow during a transition period. It is empty and ignored for the new flow. Pre-commit deal IDs can be removed in a future FIP after deprecating the old flow.

The UnsealedCID (CommD) field changes from optional to required. Miners must always specify the unsealed CID at pre-commitment.

When the old flow is deprecated by a subsequent FIP, the Expiration field may also be removed from pre-commit, and specified during sector activation.

SectorOnChainInfo

Sector on-chain info no longer references deals, and the DealIDs field is to be ignored. It can be removed in a future migration. The miner actor doesn’t persist deal information in state long term.

The DealWeight field is interpreted to carry the weight of any non-zero data in the sector, not only the weight of deals made via the built-in market actor.

Sector activation

The miner actor exports new methods for sector onboarding. With the new flow, deal IDs are not specified at sector pre-commit, and deal information is not fetched by the miner actor from the built-in market actor. Instead, the pieces of data that comprise a sector are declared as a manifest when a replica or update is activated (proven). The provider must commit to both the sealed and unsealed CIDs at pre-commit (this is already available via the PreCommitSectorBatch2 method).

Each piece in a manifest may declare a FIL+ verified data cap allocation ID that it satisfies. When activating a sector, the miner actor will attempt to claim that allocation directly from the verified registry actor and, if successful, calculate quality-adjusted power according to the piece’s size.
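As a rough illustration of how a claimed allocation feeds into quality-adjusted power, the sketch below applies a sector quality formula with relative multipliers of 10 for empty space and unverified data and 100 for verified data. Those constants, and every function and variable name here, are assumptions for illustration and are not defined in this document; the real actor uses fixed-point arithmetic rather than Python fractions.

```python
from fractions import Fraction

# Assumed policy constants (relative multipliers; not defined in this document).
QUALITY_BASE_MULTIPLIER = 10
DEAL_WEIGHT_MULTIPLIER = 10
VERIFIED_DEAL_WEIGHT_MULTIPLIER = 100


def qa_power(sector_size, duration, deal_weight, verified_weight):
    """Quality-adjusted power for one sector.

    Weights are space-time values: piece size (bytes) * duration (epochs).
    """
    sector_space_time = sector_size * duration
    base_space_time = sector_space_time - deal_weight - verified_weight
    if base_space_time < 0:
        raise ValueError("pieces exceed sector capacity")
    weighted = (
        base_space_time * QUALITY_BASE_MULTIPLIER
        + deal_weight * DEAL_WEIGHT_MULTIPLIER
        + verified_weight * VERIFIED_DEAL_WEIGHT_MULTIPLIER
    )
    quality = Fraction(weighted, sector_space_time * QUALITY_BASE_MULTIPLIER)
    return int(sector_size * quality)


GiB = 1 << 30
duration = 1_000_000
# A 32 GiB sector half-filled with one claimed verified piece.
half_verified = qa_power(32 * GiB, duration, 0, 16 * GiB * duration)
```

Under these assumed multipliers, only the space-time covered by a successfully claimed piece earns the higher weighting; the rest of the sector counts at the base rate.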

Each piece in a manifest may specify the address of an actor and a notification payload, to be sent when the sector is activated. When activating a sector, the miner actor will synchronously invoke that method with the piece CID and payload. This functions as a notification that the piece has been committed, e.g. to a marketplace. The miner actor will ignore the success or otherwise of that notification; the sector is committed regardless.

So, in the new ProveCommitAggregateX or ProveReplicaUpdateAggregate the miner actor:

computes a data commitment from the piece manifests (and implicit zero-padding)
confirms that the computed data commitment matches the pre-committed unsealed CID (when necessary)
verifies the proof that the sealed CID corresponds to the unsealed CID
attempts to claim any FIL+ allocations specified by the SP
attempts to invoke a notification method on any addresses specified by the SP
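The steps above can be sketched as a single function. This is a minimal model, not the actor implementation: `compute_unsealed_cid` is a stand-in hash rather than the real data commitment algorithm, and the `verify_proof`, `claim` and `notify` callbacks are hypothetical hooks standing in for proof verification, the verified registry, and notification receivers.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Piece:
    cid: str
    size: int  # padded piece size in bytes
    allocation_id: Optional[int] = None
    notify: List[Tuple[str, bytes]] = field(default_factory=list)  # (address, payload)


def compute_unsealed_cid(pieces, sector_size):
    """Stand-in for the real data commitment computation (NOT the actual algorithm)."""
    used = sum(p.size for p in pieces)
    if used > sector_size:
        raise ValueError("pieces exceed sector capacity")
    h = hashlib.sha256()
    for p in pieces:
        h.update(f"{p.cid}:{p.size};".encode())
    h.update(f"zero:{sector_size - used}".encode())  # implicit zero-padding
    return h.hexdigest()


def activate(pieces, sector_size, precommitted_cid, verify_proof, claim, notify):
    """Run the five activation steps in order; all callbacks are hypothetical."""
    computed = compute_unsealed_cid(pieces, sector_size)  # 1. compute commitment
    if computed != precommitted_cid:                      # 2. compare with pre-commit
        raise ValueError("data commitment mismatch")
    if not verify_proof(computed):                        # 3. verify the seal proof
        raise ValueError("invalid proof")
    for p in pieces:                                      # 4. claim allocations
        if p.allocation_id is not None:
            claim(p.allocation_id, p)
    for p in pieces:                                      # 5. send notifications
        for address, payload in p.notify:
            notify(address, p.cid, payload)
    return computed
```

Notifications are deliberately issued in a separate pass after all claims, mirroring the ordering of the list.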

The new ProveReplicaUpdateAggregate also includes a placeholder for sector expiration extension so that, in a future FIP, data may be replaced and the sector expiration extended in a single operation.

// Information provided by SP when activating/updating a piece of data
// in a sector.
// At minimum, the piece CID and size must be specified.
// Everything else is optional or supports future functionality.
struct PieceActivationManifest {
	PieceCID:                  CID
	Size:                      PaddedPieceSize
	// Each piece may specify a FIL+ verified data cap allocation ID
	// that it fully satisfies.
	VerifiedAllocationID:      Option<AllocationID>
	// Placeholder for future on-the-fly allocation/claim with signed voucher.
	VerifiedAllocationVoucher: Option<?>
	// Placeholder for previous sector that claimed the same allocation.
	// Supports future recovering lost pieces, and moving between sectors.
	VerifiedReplace:           Option<?> // Must be Nil
	// Each piece may specify hooks to notify on commitment.
	Notify:                    []{
		Address:  Address
		Payload:  []byte // E.g. an encoded deal ID to activate
		// Gas limit might be added here in the future.
	}
}

///// Prove Commit /////

// Sector metadata is declared at ProveCommit or ReplicaUpdate
// (previously, piece data was fetched from the built-in market actor).
struct ActivationManifest {
	Sector:         SectorNumber
	// Declaration of pieces that make up the sector data.
	// Implicit "zero" piece fills any remaining capacity.
	Pieces:         []PieceActivationManifest
	// Placeholder for future support of deadline assignment.
	Schedule:       Option<?> // Must be Nil
	// Placeholder for future support of data term commitment/expiration.
	DataTerm:       Option<?> // Must be Nil
	// Placeholder for future support of storing data commitment on-chain.
	PersistDataCommitment: Option<?> // Must be Nil
}

struct ProveCommitAggregateXParams {
	Sectors:                    []ActivationManifest
	AggregateProof:             []byte
	// Placeholder for inclusion proofs for large allocations.
	// Multiple sectors may satisfy parts of a single allocation (ID or voucher).
	// Entry schema to be determined by future FIP.
	// See discussion #708.
	VerifiedAllocationClaims:   []{} // Must be empty
	// Whether to abort if any sector activation fails.
	RequireActivationSuccess:   bool
	// Whether to abort if any notification fails.
	RequireNotificationSuccess: bool
}

struct ProveCommitAggregateXReturn {
	Sectors: []{ // Array parallel to params
		Activated: bool // Whether sector activated
		Power:     StoragePower // Worthwhile? What about initial pledge too?
		Pieces: []{ // Parallel to parameter
			Claimed: bool // Whether any verified range was claimed
			Notifications: []{ // Pass-through result from notifications
				Code:  ExitCode
				Data:  []byte
			}
		}
	}
}

///// Replica Update /////

// Note: this is structured to support aggregated proof verification,
// even though we don't support that proof yet.
// When the proof type is supported, the proof bytes and type can be changed
// without switching to a new method.
// Note: supports re-snap, but this will not be permitted yet.
struct UpdateManifest {
	Sector:              SectorNumber
	Deadline:            uint64
	Partition:           uint64
	NewSealedSectorCID:  Cid // CommR
	// Declaration of pieces that make up the new sector data.
	// The pieces might be new, or maintained existing.
	// All maintained verified claims must be re-specified, and dropped
	// claims explicitly identified below.
	// Implicit "zero" piece fills any remaining capacity.
	Pieces:              []PieceActivationManifest
	// Placeholder for IDs of verified claims being dropped.
	DropClaims:          []ClaimID
	// Placeholder for manifest of provably-removed pieces.
	// Supports future data deletion.
	RemovedPieces:       [] // Must be empty
	// Placeholder for future support of expiration extension or data term.
	DataTerm:            Option<?> // Must be Nil
	// Placeholder for future support of storing data commitment on-chain.
	PersistDataCommitment: Option<?> // Must be Nil
}

struct ProveReplicaUpdateAggregateParams {
	Sectors:                     []UpdateManifest
	AggregateProof:              []byte // Before aggregation, concatenate proofs
	UpdateProofType:             RegisteredUpdateProof
	AggregateProofType:          RegisteredAggregateProof
	VerifiedAllocationClaims:    []{} // Must be empty
	RequireActivationSuccess:    bool
	RequireNotificationSuccess:  bool
}

struct ProveReplicaUpdateAggregateReturn = ProveCommitAggregateXReturn
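To make the "Must be Nil" and "Must be empty" constraints concrete, here is a minimal sketch that mirrors the ProveCommit parameter schema as Python dataclasses and rejects any placeholder that has been set. The Python field names and the `validate_placeholders` helper are illustrative assumptions; only fields explicitly marked in the schema above are checked.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PieceActivationManifest:
    piece_cid: str
    size: int
    verified_allocation_id: Optional[int] = None
    verified_allocation_voucher: Optional[Any] = None  # placeholder
    verified_replace: Optional[Any] = None             # must be Nil
    notify: List[tuple] = field(default_factory=list)  # (address, payload)


@dataclass
class ActivationManifest:
    sector: int
    pieces: List[PieceActivationManifest]
    schedule: Optional[Any] = None                 # must be Nil
    data_term: Optional[Any] = None                # must be Nil
    persist_data_commitment: Optional[Any] = None  # must be Nil


@dataclass
class ProveCommitAggregateXParams:
    sectors: List[ActivationManifest]
    aggregate_proof: bytes
    verified_allocation_claims: list = field(default_factory=list)  # must be empty
    require_activation_success: bool = False
    require_notification_success: bool = False


def validate_placeholders(params):
    """Raise ValueError if any reserved field is populated; otherwise return None."""
    if params.verified_allocation_claims:
        raise ValueError("VerifiedAllocationClaims must be empty")
    for s in params.sectors:
        for name in ("schedule", "data_term", "persist_data_commitment"):
            if getattr(s, name) is not None:
                raise ValueError(f"sector {s.sector}: {name} must be Nil")
        for p in s.pieces:
            if p.verified_replace is not None:
                raise ValueError(f"piece {p.piece_cid}: verified_replace must be Nil")
```

Rejecting populated placeholders up front means a future FIP can give them meaning without silently changing the behaviour of messages that already set them.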

Failure handling

Each batched sector activation comprises multiple sectors, each with multiple data pieces. Each data piece can have a single verified claim and multiple notifications. Any of these items might fail, but it is practical to offer only limited handling of individual failures within a group.

The activation of each sector is independent. A failed activation will not cause the method to abort, unless all activations fail. An SP can specify RequireActivationSuccess=true in the top-level request parameters to instead require every sector activation to succeed.

Sector activation includes claiming of any verified allocations. If a claim fails for one piece, no claims will be made for any piece in the sector and the sector will not be activated. An SP cannot choose to have activation proceed despite an invalid claim; they can instead resubmit the failed sector with only the remaining valid claims.

Sector activation does not include notifications. Notifications are sent strictly after activation, and only for successfully activated sectors. If a notification call returns a non-zero exit code, sector activation will proceed regardless. An SP can specify RequireNotificationSuccess=true in the top-level request parameters to instead require every notification to succeed, aborting the entire operation if one fails. There is no ability to abort the activation of a single sector only in response to a failed notification.

This is a change in semantics from the current flow, where a failed deal activation (i.e. notification) happens first and leads to an individual sector failing, and verified claims happen second and must all succeed.
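The failure rules above can be modelled in a few lines. In this sketch, raising an exception stands for aborting the whole message, `claim_all` is a hypothetical hook that atomically claims all of a sector's allocations (returning False if any claim is invalid), and `notify` is a hypothetical hook returning an exit code. The dictionary shapes are assumptions for illustration.

```python
def process_batch(sectors, claim_all, notify,
                  require_activation_success=False,
                  require_notification_success=False):
    """Apply the batch failure semantics; returns one result per sector."""
    results = []
    for s in sectors:
        allocations = [p["allocation"] for p in s["pieces"]
                       if p.get("allocation") is not None]
        # Claims are all-or-nothing per sector: one bad claim blocks activation.
        activated = claim_all(s["sector"], allocations) if allocations else True
        if not activated and require_activation_success:
            raise RuntimeError(f"sector {s['sector']} failed to activate")
        results.append({"sector": s["sector"], "activated": activated,
                        "notifications": []})
    if not any(r["activated"] for r in results):
        raise RuntimeError("every sector activation failed")
    # Notifications are sent strictly after activation, and only for
    # sectors that activated successfully.
    for s, r in zip(sectors, results):
        if not r["activated"]:
            continue
        for p in s["pieces"]:
            for address in p.get("notify", []):
                code = notify(address, s["sector"])
                if code != 0 and require_notification_success:
                    raise RuntimeError(f"notification to {address} failed")
                r["notifications"].append(code)
    return results
```

Note that a failed notification can only be ignored or abort everything; there is no branch that rolls back a single sector, matching the limitation described above.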

Notifications

When a piece manifest specifies one or more notification receivers, the storage miner invokes these receivers after activating the sector or replica update. The receiving actor must accept a standard FRC42 method number and parameter schema.

The miner actor will invoke each receiver address only once, with a batch of notification payloads.
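A minimal sketch of this batching, assuming simplified tuple and dictionary stand-ins for the schemas in this section: pieces are grouped first by receiver address and then by sector, so each receiver gets exactly one call containing only the pieces that named it. The `batch_notifications` helper and its input shape are hypothetical.

```python
from collections import defaultdict


def batch_notifications(sectors):
    """Group notification requests into one parameter object per receiver.

    sectors: list of (sector_number, commitment_epoch, pieces), where
    pieces is a list of (cid, size, [(address, payload), ...]).
    Returns {address: {"sectors": [sector changes, ...]}}.
    """
    per_receiver = defaultdict(dict)  # address -> sector_number -> changes
    for sector_number, commitment_epoch, pieces in sectors:
        for cid, size, notify in pieces:
            for address, payload in notify:
                changes = per_receiver[address].setdefault(sector_number, {
                    "sector": sector_number,
                    "commitment_epoch": commitment_epoch,
                    "added": [],
                })
                changes["added"].append(
                    {"cid": cid, "size": size, "payload": payload})
    return {address: {"sectors": list(by_sector.values())}
            for address, by_sector in per_receiver.items()}
```

Pieces with no notification requests never appear in any batch, so a receiver learns nothing about the rest of the sector's content.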

struct PieceInfo {
	// The piece data commitment and size.
	CID:  CID
	Size: PaddedPieceSize
	// Receiver-specific identifier.
	// E.g. an encoded deal ID which the provider claims this piece satisfies.
	Payload: []byte
}

// Each sector may have multiple pieces sharing the sector's 
// number and commitment/expiration.
struct SectorChanges {
	Sector:          SectorNumber
	// Epoch until which the data is committed to the sector.
	CommitmentEpoch: ChainEpoch
	// Placeholder for future deletion support.
	ExpirationEpoch: Option<ChainEpoch> // Always Nil for now
	// Information about a piece added to (or retained in) the sector.
	Added:           []PieceInfo
	// Placeholder for future proven deletions
	Removed:         [] // Always empty for now
}

// Each notification may contain multiple sectors and pieces.
// Only pieces that the SP requested notification for are included,
// the rest of the sector content is omitted.
struct SectorContentChangedParams {
	Sectors: []SectorChanges
}

// For each piece in each sector, the notifee returns an exit code and
// (possibly-empty) result data.
// The miner actor will pass through results to its caller.
struct SectorContentChangedReturn {
	Sectors: []{
		Added: []{
			Code:   ExitCode
			Data:   []byte
		}
	}
}

// A receiver of SectorContentChanged should be idempotent;
// it should be prepared to receive a second notification of the same
// piece+payload with the same sector information, or with different
// sector information (e.g. a different commitment epoch, or different
// sector entirely).

const SECTOR_CONTENT_CHANGED_METHOD = frc42_hash("SectorContentChanged")
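For readers unfamiliar with how an FRC42 method number is derived, the sketch below follows our understanding of the FRC-0042 convention: hash the method name with a "1|" domain-separation prefix using BLAKE2b-512, then take the first big-endian 4-byte chunk whose value is at least 2^24. Treat these details as an assumption to be checked against the FRC itself; this Python function is illustrative and is not the library used by the actors.

```python
import hashlib

# Method numbers below this value are assumed to be reserved.
FIRST_EXPORTED_METHOD_NUMBER = 1 << 24


def frc42_hash(method_name: str) -> int:
    """Derive a method number from a method name (sketch of FRC-0042)."""
    # hashlib.blake2b defaults to a 64-byte (512-bit) digest.
    digest = hashlib.blake2b(b"1|" + method_name.encode("ascii")).digest()
    for offset in range(0, len(digest), 4):
        candidate = int.from_bytes(digest[offset:offset + 4], "big")
        if candidate >= FIRST_EXPORTED_METHOD_NUMBER:
            return candidate
    raise ValueError(f"no valid method number for {method_name!r}")


sector_content_changed_method = frc42_hash("SectorContentChanged")
```

Because the number is derived from the name alone, any receiver can implement the method without coordinating a method-number registry with the miner actor.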

💡

Invoking untrusted code during sector activation creates the potential for contracts to grief SPs by resource exhaustion, and possibly for SPs to exploit buggy notification receivers. The security details still need to be worked out. For now, the miner actor will only notify the built-in market actor. Arbitrary addresses can be allowed after we learn from this schema in production and take more time to design appropriate security measures.

Transition details

Existing onboarding methods remain in place, but internally adjust to new flows.

PreCommitSectorBatch2 can accept deal IDs, and now always requires the unsealed CID.
ProveCommitSector (old) will read deal IDs from pre-commit state, if any. It will activate them (via the existing ActivateDeals), but won’t write them to sector info. It can only use the built-in market. Opts to fail if deals fail (preserve behaviour)?
ProveCommitSectorX will reject deal IDs from pre-commit state and use the manifest instead. It can support direct FIL+ and send activation notifications.
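The routing rule between the old and new methods can be summarised in a small sketch. The function name and the returned labels are hypothetical; the point is only that pre-committed deal IDs bind a sector to the old method.

```python
def select_activation_path(method, precommit_deal_ids):
    """Decide how a pre-committed sector is activated under each method."""
    if method == "ProveCommitSector":
        # Old flow: deals, if any, come from pre-commit state.
        return "activate_deals" if precommit_deal_ids else "no_deals"
    if method == "ProveCommitSectorX":
        # New flow: deal IDs in pre-commit state are rejected.
        if precommit_deal_ids:
            raise ValueError("sector pre-committed with deal IDs must use ProveCommitSector")
        return "use_manifest"
    raise ValueError(f"unknown method {method!r}")
```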

Storage market actor

The built-in storage market actor is already pre-authorized to act as a delegate for verified registry clients, and thus allocates data cap according to the verified field in the DealProposal message.

State

A new field maps sector numbers to the deal IDs that the market has been notified are stored in those sectors.

// New structure storing per-sector deal information.
struct SectorDeals {
	Deals:            []DealID
}

// Existing deal state structure gets a new field with sector number.
pub struct DealState {
	// All existing fields as today.
	// ...

	// 0 if not yet included in a proven sector (0 is also a valid sector number)
	pub sector_number: SectorNumber,
}

struct State {
	// All existing state as today.
	// ...

	// Existing mapping of deal state by ID.
	// Since deal state includes sector number, this gives deal->sector index.
	States CID // AMT[DealID]DealState

	// New mapping of sector IDs to deal IDs, grouped by storage provider.
	SectorDeals CID // HAMT[Address]HAMT[SectorNumber]SectorDeals
}

Deal activation

The PublishStorageDeals method is unchanged.

When an SP activates a piece with the old onboarding methods, deals are activated via the legacy ActivateDeals method, which returns the necessary verified allocation IDs to the miner actor. The ActivateDeals method parameters are expanded to include the sector ID for each deal, in order to update the SectorDeals mapping in state.

When an SP activates a piece with the new onboarding methods, deals are activated by the FRC42 SectorContentChanged method instead of ActivateDeals. The implementation checks that the piece CID and size match the deal ID nominated in the notification payload, and considers the deal active if so. If a piece fails to meet the conditions to activate a deal, the miner actor can activate the sector anyway (depending on the SP’s preference).
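A minimal sketch of the market's receiver, under several assumptions: the payload is taken to be a big-endian encoded deal ID, deals and the SectorDeals mapping are plain dictionaries, and the non-zero exit code is illustrative. It does not model other activation conditions (such as deal start epochs) or a deal moving between sectors.

```python
EXIT_OK = 0
EXIT_REJECTED = 16  # illustrative non-zero exit code, not a specified value


def sector_content_changed(deals, sector_deals, provider, params):
    """Activate deals named in notification payloads; returns per-piece codes.

    deals: {deal_id: {"provider", "piece_cid", "piece_size", "sector_number"}}
    sector_deals: {provider: {sector_number: [deal_id, ...]}}
    """
    ret = {"sectors": []}
    for sector in params["sectors"]:
        added = []
        for piece in sector["added"]:
            deal_id = int.from_bytes(piece["payload"], "big")  # assumed encoding
            deal = deals.get(deal_id)
            matches = (
                deal is not None
                and deal["provider"] == provider
                and deal["piece_cid"] == piece["cid"]
                and deal["piece_size"] == piece["size"]
            )
            if matches:
                deal["sector_number"] = sector["sector"]
                ids = sector_deals.setdefault(provider, {}).setdefault(
                    sector["sector"], [])
                if deal_id not in ids:  # a repeated notification is harmless
                    ids.append(deal_id)
            added.append({"code": EXIT_OK if matches else EXIT_REJECTED,
                          "data": b""})
        ret["sectors"].append({"added": added})
    return ret
```

The return value is parallel to the parameters, so the miner actor can pass each piece's exit code back to the SP unchanged.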

Termination

The built-in market retains a privileged notification when sectors are terminated ahead of their commitment. This notification is not available to other actors (because it can be called from cron).

The termination notification is invoked for all sectors that have any non-zero data (identifiable by non-zero DealWeight), but only when terminated early. The miner actor no longer carries information about which, if any, deals are affected by any particular sector. For each terminated sector, the built-in market actor looks up the SectorDeals mapping and marks any deals found as terminated (but defers further processing until cron, as it does today). This replaces the OnMinerSectorsTerminate method, which is removed.

⚠️

This termination notification is not sustainable in the long term. The built-in market must eventually give up its privileged notification of sector state and resort to best-effort mechanisms available to user actors.

struct SectorTerminatedParams {
	Epoch:   ChainEpoch
	Sectors: Bitfield // Sector numbers
}
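The market's handling of this notification can be sketched as a lookup in the SectorDeals mapping. The dictionary layout and the `terminated_epoch` field are hypothetical stand-ins for market state; further processing of terminated deals is left to cron and not modelled.

```python
def on_sectors_terminated(deals, sector_deals, provider, epoch, sector_numbers):
    """Mark deals in the terminated sectors; returns the affected deal IDs.

    deals: {deal_id: {"terminated_epoch": None or int, ...}}
    sector_deals: {provider: {sector_number: [deal_id, ...]}}
    """
    affected = []
    by_sector = sector_deals.get(provider, {})
    for sector_number in sector_numbers:
        # Sectors with data but no market deals simply have no entry.
        for deal_id in by_sector.pop(sector_number, []):
            deal = deals.get(deal_id)
            if deal is not None and deal.get("terminated_epoch") is None:
                deal["terminated_epoch"] = epoch
                affected.append(deal_id)
    return affected
```

Sectors onboarded without any built-in market deal cost only a failed lookup here, which is the extra termination gas noted under gas impacts.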

Gas impacts

The overall gas impact of these changes is to enable a far more gas-efficient workflow for onboarding data that does not need a built-in storage market deal. This could account for almost all data onboarding today, if participants adopt the new process. See 🎯Direct data onboarding: impacts & work outline.

Some gas impacts on existing flows are expected:

Simply deferring the specification of deal IDs from pre-commit to prove-commit will reduce costs associated with verifying deals from pre-commit.
Maintaining the new SectorDeals mapping in the built-in market will increase the gas cost of deal activation, when deals are used.
Notifying the built-in market actor of early termination for any sector with data, and inspecting the SectorDeals mapping, will increase the gas cost of termination.

Note: there’s an idea to flip a bit in per-sector info to indicate whether built-in market notification is needed. Only relevant when there are a lot of direct-data sectors though.

Deferred functionality

The following features are currently deferred from the minimal implementation. Placeholder fields in parameter schemas have generally been included so that their future implementation need not require operational migrations to new methods.

Activation notifications to actors other than built-in market

Security issues need thought

Signed vouchers for FIL+ allocations that can be submitted by the SP instead of the client

Also “instant” version for on-the-fly allocation and claim while activating sector

Re-committing verified data that was lost
Proven data deletion
Verified allocations larger than one sector (scalable FIL+)
Notifications of sector extension

May not be relevant, if replaced by re-activation notifications

Pull-style getters for sector state
SP choice of Window PoSt scheduling for new sectors

Migration

No migration of miner actor state is necessary. We could opt to migrate to remove DealIDs from SectorOnChainInfo, but there is little to gain from this. It could be done alongside another more substantial migration in the future.

A migration is necessary to initialise the built-in market actor’s SectorDeals mapping. This migration must read all sector infos, but need not write them.
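A minimal sketch of that migration, assuming miner sector infos are available as plain dictionaries: it reads each sector's existing DealIDs and builds the provider-grouped mapping without modifying the sector infos. The input shape and function name are hypothetical, and populating the new sector number field on each deal state is not shown.

```python
def build_sector_deals(miners):
    """Build {provider: {sector_number: [deal_id, ...]}} from sector infos.

    miners: {provider_address: [{"sector_number": int, "deal_ids": [...]}, ...]}
    """
    mapping = {}
    for address, sectors in miners.items():
        by_sector = {
            s["sector_number"]: list(s["deal_ids"])
            for s in sectors
            if s["deal_ids"]  # skip sectors without deals
        }
        if by_sector:
            mapping[address] = by_sector
    return mapping
```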

Future workflows

One goal of this design is to support future user-programmed smart contract actors to offer much improved deal-making functionality. While none of that is directly enabled by this proposal, the new APIs are constructed to support it in the future. Some are sketched below for the purpose of verifying the capability of the new APIs.

Negatively-priced deal

A negatively-priced deal is one where the SP pays the client, essentially for the datacap allocation. Claiming the datacap allocation must be conditional on the SP activating the associated deal.

This is most easily achieved by a market actor that acts as a delegate for the client’s datacap. The market actor receives a pre-payment from the SP, and remits it to the client upon creating the associated datacap allocation. The market does not require any notification of piece activation, because the SP has already paid.

Instant deals

The built-in market requires deals to be published before they can be activated into a sector. An instant deal is created and activated at the same time, when the SP proves the data in a sector.

This can be achieved by a market actor that accepts a signed deal proposal embedded in a piece activation notification payload. The client transmits a signed deal proposal message to the SP off-chain, and the SP submits it on-chain only at the last moment of sector activation. This removes at least half the on-chain cost of deal accounting, since the deal is written to state just once, and never read.

The same flow can support instant deals from smart-contract clients, which can skip the off-chain transmission of a signed deal too.

Signed FIL+ allocation vouchers submitted at sector activation can support a similar flow for datacap. Smart-contracts can be voucher signatories.

Loooong deals

A market could support deals that are longer than a sector’s maximum lifetime by supporting the movement of data between sectors while satisfying a single deal.

This can be achieved by a market actor that activates deals with longer terms than the associated sector, but keeps track of when that sector expires. If the SP moves data to a new sector before the old one expires, the deal persists. Otherwise, the market can penalise the SP for early exit.

To move deal data from one sector to another, the SP activates the piece into a new sector and submits a notification payload identifying the deal to be continued. The market updates its tracking of when the data’s sector will expire.
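As a rough model of this bookkeeping, the sketch below tracks each deal's current sector expiration and reports a breach when the sector has expired before the deal's end. The class, method names and the breach rule are illustrative assumptions about how such a future market might work, not part of this proposal.

```python
class LongDealTracker:
    """Tracks which sector currently holds each long deal's data."""

    def __init__(self):
        self.deals = {}

    def add_deal(self, deal_id, end_epoch):
        self.deals[deal_id] = {
            "end_epoch": end_epoch,
            "sector": None,
            "sector_expiration": None,
        }

    def on_piece_activated(self, deal_id, sector, sector_expiration):
        """Handle a notification naming the deal to start or continue."""
        deal = self.deals[deal_id]
        deal["sector"] = sector
        deal["sector_expiration"] = sector_expiration

    def is_in_breach(self, deal_id, epoch):
        """True if the holding sector expired before the deal's end."""
        deal = self.deals[deal_id]
        expiration = deal["sector_expiration"]
        if expiration is None:
            return False  # never activated; start deadlines are not modelled
        return expiration <= epoch < deal["end_epoch"]
```

A market built this way only needs the activation notification already defined above; moving data is just a second notification for the same deal from a new sector.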

A similar flow could support deal transfer to a new SP (with client approval).