@anorth, @nicola, @nikkolas, @luca, @kuba
FIP discussion: https://github.com/filecoin-project/FIPs/issues/119
This document outlines a solution to enabling exponential, or near-enough, growth in the amount of storage committed to the Filecoin network. This proposal gives the big picture; deeper technical design docs are expected to be necessary to specify the SNARK circuits, the new miner actor, and the corresponding Lotus miner changes.
- Background
- Problem detail
- Goals
- Non-goals
- Design ideas
  - Overview
  - Supersectors
  - Pre-commitment
  - Proof of replication
  - Proof of space-time
  - Faults
  - Window PoSt accounting
  - Merging supersectors
  - Cron
  - Deals
  - CC upgrades
- Outstanding questions
- Scale
- Product implications
  - Sealing pipeline storage of intermediate data
- Migration
- Beyond the baseline
  - Migrating old sectors
  - Pre-commit randomness
  - Toward exponential deals
    - Superdeals
- Effort
- Alternatives
  - In-place migration
- Q&A
Background
The Filecoin network’s capacity to onboard new storage and to maintain proofs of committed storage are limited by blockchain transaction processing throughput. This prevents the network from consistently growing any faster. Miners who are willing to significantly increase their storage in the network are prevented or slowed down when trying to do so. The recently released Hyperdrive network upgrade will raise onboarding capacity to about 500-1000 PiB/day, but we expect this capacity to become saturated.
As onboarding rates increase, the fixed amount of network throughput consumed by maintaining the proofs of already-committed storage will increase. Projections suggest this could become a significant cost to the network eventually. We also bear some risk that throughput may be insufficient to dispute fraudulent proofs, were a large party to attempt them.
Problem detail
Validation of the Filecoin blockchain is subject to a fixed amount of computational work per epoch (including state access), enforced as the block gas limit. It’s also subject to limits on the size of the state tree, though these are less sharp. These limits set a reasonable hardware requirement on being able to validate the chain and thus run a full node.
Many parts of the application logic for onboarding and maintaining storage incur a constant computational and/or state cost per sector. This results in blockchain validation costs that are linear in the rate of growth of storage, and in the total amount of storage committed. Some costs are linear in the number of individual miner actors. These linearities add up to consume the entire computational throughput, thus limiting further onboarding. The total size of the state is also of practical concern, but secondary to the cost of reading or writing that state during processing.
Linearities exist in:
- Pre-committing and proving new sectors (both state access and proof verification)
- Proving Window PoSt of all storage daily (state access)
- Detecting and accounting for faults and recoveries
- Cron processing checking for missed proofs, expiring sectors, and other housekeeping
We wish to remove or reduce all such linear costs from the blockchain validation process in order to remove limitations on the rate of growth, now and in the long term when power and growth are significantly (exponentially) higher.
The proof aggregation technology of SNARKPack goes a long way toward addressing the linear cost of PoRep and PoSt proof verification. Verifying a proof aggregate is logarithmic in the number of sectors proven. In its current form, however, there remain linear costs associated with providing the public inputs for each sector’s proof.
Goals
Our goal is to enable arbitrary amounts of storage to be committed and maintained by miners within a fixed network transaction throughput. Arbitrary amounts here means at least the amounts that could possibly be committed over the next few years, as limited by external constraints like global HDD production.
Given the problem described above, we can break this down a little to:
- Redesign storage onboarding and maintenance state and processes to remove linear per-sector costs, or dramatically reduce constants below practical needs
- Maintain security of proof of replication and space-time for all storage
- Maintain economic attractiveness of committing and maintaining storage
- Maintain macroeconomic security of pledge, etc
- Maintain or improve cost of making deals
- Maintain discoverable and verifiable information about the content of each sector, specifically deals
- Maintain reasonable expectations of miner operational effectiveness
Note: we are not concerned with linear work performed by a single miner in committing or maintaining their storage. A miner is expected to incur approximately linear cost to maintain storage, perhaps enjoying economies of scale. We are concerned when one miner’s storage incurs a cost to the whole network.
Some contextual goals for our approach include:
- A solution that is in reach for implementation in the next 3-6 months. In practice, this means relying on PoRep and ZK proof technologies that already exist today.
- A solution that is good enough that we won’t have to re-solve the problem within a few years
Non-goals
This proposal is scoped to solving one (complex) problem.
Out of scope:
- Exponential deal growth. Deals also incur linear costs to the network, and if they grew exponentially in number would pose similar problems. This proposal does not cater to unbounded growth of deals, except by attempting to make it no harder to solve that problem later.
We think this sequencing is reasonable because (a) deals are in practice rare at present, and (b) off-chain aggregation into whole-sector-size deals mitigates costs in the near term. We expect exponential deal growth to be a challenge to address in 2022.
Design ideas
Overview
At the highest level, approaches break down as either removing a linearity entirely (enabling true exponential growth), or reducing the constants of linearity so far that they represent negligible cost even at orders of magnitude greater growth and maintenance levels (e.g. ≫ future global storage production).
The premise behind this design is that we cannot store or access a fixed-size piece of state for each 32 or 64 GiB sector of storage, either while onboarding or maintaining storage. Specifically, we cannot store or access a replica commitment (CommR) per sector, nor mutate per-partition state when accounting Window PoSt. CommR in aggregate today accounts for over half of the state tree at a single epoch, and Window PoSt partition state manipulation dominates the cost of maintenance.
The key design idea is to maintain largely the same data and processes we have today, but applied to an arbitrary number of sectors as a unit. We call this unit a supersector. This proposal redesigns state, proofs and algorithms to enable a miner to commit to and maintain units of storage larger than one sector, with cost that is logarithmic or better in the amount of storage. Thus, with a fixed chain processing capacity, the unit of accounting and proof can increase in size over time to support unbounded storage growth and capacity. This proposal assumes that miners will increase their unit of commitment if blockchain transaction throughput is near capacity.
Supersectors
A supersector is an immutable aggregation of multiple fixed-size sectors into one unit. A miner performs sector replication locally using the same SDR scheme in use today, then aggregates multiple of these sectors into a supersector for onboarding. The miner generates a commitment to the supersector as a whole, aggregating the individual sector CommRs into a Merkle tree or vector commitment (TBD), producing an aggregate commitment: CommRAgg. On-chain, the miner pre-commits only to CommRAgg, and proves replication of the supersector as one unit with an extension to the SNARKPack aggregation scheme. This extension to SNARKPack hides the individual CommRs from messages and chain state (as well as aggregating the per-sector proofs).
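As a concrete illustration of the aggregation (whether a Merkle tree or vector commitment is still TBD), here is a minimal sketch computing CommRAgg as the root of a binary Merkle tree over the ordered sector CommRs. SHA-256 stands in for whatever hash is chosen; a real circuit would likely want a SNARK-friendly hash, and the padding rule here is arbitrary:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Commitment is a 32-byte commitment; here it stands in for a sector's CommR.
type Commitment = [32]byte

// commRAgg computes an aggregate commitment over the ordered sector CommRs
// as the root of a binary Merkle tree (width w=2). An unpaired node is
// hashed with a zero commitment; the real padding rule is TBD.
func commRAgg(commRs []Commitment) Commitment {
	layer := append([]Commitment(nil), commRs...)
	for len(layer) > 1 {
		var next []Commitment
		for i := 0; i < len(layer); i += 2 {
			right := Commitment{} // zero-pad an unpaired node
			if i+1 < len(layer) {
				right = layer[i+1]
			}
			next = append(next, sha256.Sum256(append(layer[i][:], right[:]...)))
		}
		layer = next
	}
	return layer[0]
}

func main() {
	commRs := []Commitment{{1}, {2}, {3}} // placeholder CommRs for three sectors
	fmt.Printf("CommRAgg: %x\n", commRAgg(commRs))
}
```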
Each supersector has a numeric identifier, unique to the miner as sector numbers are today. The sectors comprising a supersector are ordered, but the ordering and corresponding sector indices are private information to the miner. Thus a globally unique sector identifier for replication is given by (miner ID, supersector ID, sector index). Unlike the partitions used for scheduling Window PoSt today, a supersector’s membership is fixed (but see merging).
Each supersector carries metadata equivalent to that carried by each sector today: activation/expiration epochs, pledge and penalty parameters. Thus supersectors of the same size may have heterogeneous weight, power, and pledge requirements depending on the epoch at which they were committed, as sectors do today. Deal-related metadata like deal IDs, deal weight and deal-influenced power and pledge is indexed by the sector containing the deals, to enable partial faults and merging. Thus a supersector with no deals has fixed-size metadata regardless of size, but a supersector containing deals has additional metadata linear in the number of [sectors with] deals (but see toward exponential deals).
Aggregated proof verification cost increases at every power-of-two number of proofs aggregated. Each sector PoRep involves 10 SNARKs, so a supersector whose sector count falls short of the next power of two divided by 10 will waste circuit constraints and proof verification time at PoRep. This cost can be captured by the proof verification gas cost of aggregates being a piecewise function of the number of proofs, as it is today.
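A small sketch of that padding arithmetic, assuming the 10-SNARKs-per-sector figure above (the actual gas schedule would be a piecewise function over these padded sizes):

```go
package main

import (
	"fmt"
	"math/bits"
)

const snarksPerPoRep = 10 // each sector PoRep comprises 10 SNARKs

// paddedProofs returns the number of proof slots a SNARKPack aggregate of
// n sector PoReps occupies: the proof count rounds up to a power of two.
func paddedProofs(nSectors int) int {
	n := nSectors * snarksPerPoRep
	if n <= 1 {
		return n
	}
	return 1 << bits.Len(uint(n-1)) // next power of two >= n
}

func main() {
	for _, n := range []int{10, 12, 13, 100} {
		p := paddedProofs(n)
		fmt.Printf("%4d sectors: %4d proofs padded to %4d (%d wasted slots)\n",
			n, n*snarksPerPoRep, p, p-n*snarksPerPoRep)
	}
}
```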
TODO: talk about maximum supersector size governed by the SNARK circuit constraints allocated to proving inclusion. Height of the merkle tree / length of the vector commitment.
Pre-commitment
The sector number used to guarantee sector uniqueness is a combination of the miner ID, supersector ID, and sector index. Thus, a miner must choose how many sectors they will initially aggregate into a supersector at sealing time, and seal each one with the correct supersector ID and index. All sectors committed to a supersector must share a common sector size and expiration epoch. The miner selects a single epoch from which to commit to chain randomness for all sectors (see pre-commit randomness). The miner seals and computes CommR for each sector, then computes an aggregate CommRAgg from those sector CommRs, in order. The individual sector CommRs are never posted to the chain.
Holding multiple sectors to pre-commit as a batch has an operational cost for miners. Depending on their sealing pipeline throughput and prevailing gas costs, miners may pre-commit larger or smaller supersectors, and merge them later. Note that sector indices must be contiguous. If a miner’s operation fails to seal some sectors, they will need to retry those sectors until they form a complete set. See failure analysis.
Sectors may contain deals, and any non-deal sector content must be zero to enable on-chain data commitment (CommD) calculation, similar to today. The miner must identify which sectors contain which deals in order to enable subsequent computation of the data commitments. Per-sector deal ID lists are written and stored in state for the duration of pre-commit; this is linear in the number of deals. We might constrain the supersector ordering of sectors with deals to afford a compact representation in state.
The pre-commitment message on chain includes:
- Supersector ID
- Number of 32G sectors
- Aggregated CommR
- Seal randomness epoch
- Expiration
- Deal IDs, mapped by sector index
- CC sector replacement parameters (see CC upgrades)
The miner actor code computes deal weights and the pre-commit deposit for the supersector. The miner stores pre-commit metadata on chain for reference during prove-commit.
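A hypothetical shape for these pre-commitment parameters, in the style of the existing specs-actors Go types. All names and types here are illustrative (a real actor would use abi.ChainEpoch, abi.DealID, and so on), not a finalized schema:

```go
package main

// SupersectorPreCommitParams sketches the on-chain pre-commitment message
// fields listed above. Field names and types are illustrative only.
type SupersectorPreCommitParams struct {
	SupersectorID uint64
	SectorCount   uint64   // number of 32G sectors aggregated
	CommRAgg      [32]byte // aggregated replica commitment
	SealRandEpoch int64    // single seal-randomness epoch shared by all sectors
	Expiration    int64    // common expiration epoch for all sectors
	// Deal IDs keyed by sector index; only sectors containing deals appear,
	// so this map is linear in the number of deal-bearing sectors.
	DealIDs map[uint16][]uint64
	// CC supersector replacement parameters (see CC upgrades) elided.
}

func main() {}
```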
Proof of replication
The miner waits out the challenge period to obtain a challenge seed, then generates Merkle inclusion proofs for the challenges and a SNARK for each sector, proving against its (private) CommR, as usual. The same challenges may be used for all sectors with the same security as for a single sector (a 2⁻¹⁰ chance to cheat, independent of sector count; notes) because cheating is defended by the pre-commit deposit (the same is not true for PoSt). The miner creates a SNARKPack aggregate of these PoReps with public inputs:
- the miner ID
- the challenge seed
- the aggregated replica commitment CommRAgg
- a commitment to all the proofs aggregated (Cp)
Additionally, it generates an aggregation proof for the supersector, proving:
- sector identifiers are calculated correctly (from indices)
- challenges are calculated correctly
- each CommR is included in the relevant position in CommRAgg
- each data commitment (CommD) is satisfied
- all CommR are distinct (redundant?)
Public inputs to this proof are:
- the inputs to the aggregated PoRep
- the supersector ID
- the distinct data commitments (one for each sector with deals, plus the CC commitment)
Note that the data commitment for sectors without deals is constant. We must ensure this shared commitment can be passed to proof verification just once, and re-used for proofs of all empty sectors. In this way, proofs of data commitment will be linear in the number of deals, not the number of sectors.
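A sketch of how the verifier's CommD input list could be assembled so the shared zero-data commitment appears exactly once (names hypothetical):

```go
package main

import "fmt"

// CommD is a 32-byte unsealed data commitment.
type CommD = [32]byte

// zeroCommD stands in for the constant data commitment of an empty sector.
var zeroCommD = CommD{}

// distinctCommDs assembles the data-commitment public inputs for a
// supersector proof: the shared CC commitment once, then one commitment per
// deal-bearing sector, keyed by sector index. The list is linear in the
// number of sectors with deals, not in the total sector count.
func distinctCommDs(sectorCount int, dealCommDs map[int]CommD) []CommD {
	inputs := []CommD{zeroCommD}
	for idx := 0; idx < sectorCount; idx++ {
		if d, ok := dealCommDs[idx]; ok {
			inputs = append(inputs, d)
		}
	}
	return inputs
}

func main() {
	// 1000 sectors, two of which carry deals: just 3 CommD inputs in total.
	inputs := distinctCommDs(1000, map[int]CommD{7: {1}, 42: {2}})
	fmt.Println(len(inputs))
}
```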
This supersector proof requires a new SNARK circuit, and hence a new trusted set-up. See more detail in SNARK Input Aggregation proposals.
If supersectors are sufficiently small, then multiple supersectors may be proven together by a larger SNARKPack aggregate. Each supersector proof will have a different CommRAgg public input, so aggregation will have validation cost linear in the number of supersectors. This amortizes other per-supersector costs and is only marginally more complicated, so should be implemented up front; otherwise we’ll definitely come back to implement it sooner rather than later (ref: aggregated proofs for Hyperdrive).
The miner actor verifies the supersector proof with the sector number, CommRAgg, distinct CommDs, and chain randomness seed. Similar to today, it then computes power and pledge for the supersector unit and activates deals.
Note: today there is a limit on the maximum aggregated PoRep of 819 sectors, governed by the performance loss in providing a linear number of public inputs to verification. Given the aggregation proof that hides the individual sector CommR inputs, and future improvements to SNARKPack, we expect to be able to support much larger aggregations.
Proof of space-time
Supersectors are subject to PoSt every 24 hrs, as today. The notion of a partition of 2349 (32GiB) sectors remains as a primitive to the proof algorithm, but a partition is no longer a unit of accounting (see accounting). Window PoSt proofs are submitted optimistically, verified off-chain, as today.
A new SNARK circuit proves Window PoSt for a partition of sectors that all belong to the same supersector. This follows the same idea as the supersector aggregation proof: instead of proving against public-input CommRs, we prove against CommRs that are proven to be included in CommRAgg. The circuit proves:
- for each sector and challenge, inclusion of a leaf inside that sector’s (private) CommR
- all sector CommRs are included in CommRAgg (public)
This circuit has public inputs of:
- the challenge seed
- CommRAgg
Thus, inputs are no longer linear in the number of sectors proven. See SNARK Input Aggregation Proposals.
Window PoSt proofs for a partition may be aggregated, with one public input per partition. A supersector must be proven all at once, in a single aggregate, else we would require accounting state to track which of the contained sectors are proven. If a supersector has more sectors than the partition size, the Window PoSt must aggregate all proofs for the supersector into one submission with SNARKPack. This aggregate proof could have a single CommRAgg input.
Proofs for different supersectors may also be aggregated into one submission. Such a submission would have one public input per supersector included. Cross-supersector aggregation is a necessary feature to constrain Window PoSt costs while most supersectors are smaller than a full Window PoSt partition (today 2349, but probably smaller), lest more Window PoSts be required for a given amount of storage.
When a supersector’s size is not an exact multiple of the Window PoSt partition size, some proof verification throughput is wasted. In theory, we could optimize the packing and aggregation of sectors from multiple supersectors into partitions, but choose not to for this baseline.
The partition size is a parameter to be determined. The wasted proof generation and verification capacity is an off-chain cost with optimistic Window PoSt verification, on-chain only for disputes and fault recoveries. A smaller value will reduce this wasted capacity and support a greater number of supersectors per deadline per GPU, but increase the number of proofs to be aggregated, the cost of which grows logarithmically. The chosen value may become the minimum recommended supersector size, which suggests a much smaller value than 2349. The cost of computing a SNARK grows linearly with partition size, so the same GPU throughput should be able to prove the ~same amount of storage with many small proofs as with one large one.
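The trade-off shows up in a small sketch: the number of partition-sized SNARKs per supersector at each Window PoSt, and the proved-but-empty slots wasted when the size is not an exact multiple (the partition size of 240 below is purely illustrative):

```go
package main

import "fmt"

// windowPoStProofs returns how many partition-sized SNARKs a supersector of
// n sectors needs per Window PoSt, and how many challenge slots are wasted
// when n is not an exact multiple of the partition size.
func windowPoStProofs(nSectors, partitionSize int) (proofs, wastedSlots int) {
	proofs = (nSectors + partitionSize - 1) / partitionSize // ceiling division
	wastedSlots = proofs*partitionSize - nSectors
	return
}

func main() {
	const partitionSize = 240 // illustrative only; the real value is TBD
	for _, n := range []int{1, 240, 250, 2349} {
		p, w := windowPoStProofs(n, partitionSize)
		fmt.Printf("%4d sectors: %2d proofs, %3d wasted slots\n", n, p, w)
	}
}
```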
Note: unlike at PoRep, challenges cannot be re-used for different sectors. Challenge generation is still to be resolved.
TODO: talk about disputes (which will dispute the whole aggregated submission)
Faults
A miner can declare some part of a supersector faulty by submitting a fault mask. A fault mask identifies one or more ranges of contiguous sector indices within the supersector. Similarly, a miner may declare a “skipped fault” with a Window PoSt by attaching a fault mask. Sectors within this fault mask cannot be sampled by a Window PoSt proof. These indices must be communicated to the SNARK proof and verification in a sublinear way.
By identifying ranges of sectors as faulty, the chain can process faults with a cost that is not linear in the number of sectors. A range could be a singleton, but could also be much larger. If sector operational faults are correlated (e.g. same drive, machine, rack, etc) and correlated sectors are adjacent in the supersector, faults can also become somewhat scale-invariant. However, if faults are uniformly distributed, then tracking faults remains linear.
The representation size (i.e. number of ranges) of a fault mask must be subject to some upper limit, which will limit the resolution of fault declarations. If the limit is a constant (e.g. 10 ranges) or grows logarithmically (e.g. 1 range per power-of-two sectors) then we achieve a sublinear bound on total cost, with the miner “paying” for more random faults by being forced to declare ranges that include non-faulty sectors. This parameter trades tolerance-for-miner-faults against potential chain congestion in a period of heavy faults.
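A minimal sketch of such a fault mask and its validation, assuming a constant cap of 10 ranges (both the representation and the cap are illustrative):

```go
package main

import "fmt"

// Range is a run of contiguous sector indices within a supersector.
type Range struct{ First, Count uint64 }

// FaultMask declares faulty sectors as ordered, non-overlapping ranges.
type FaultMask struct{ Ranges []Range }

// maxRanges is a hypothetical constant cap on the mask's representation size.
const maxRanges = 10

// Validate checks the ranges are sorted, non-overlapping, non-empty, within
// the supersector's bounds, and within the representation limit.
func (m FaultMask) Validate(sectorCount uint64) error {
	if len(m.Ranges) > maxRanges {
		return fmt.Errorf("fault mask has %d ranges, limit is %d", len(m.Ranges), maxRanges)
	}
	var next uint64 // lowest index the next range may start at
	for _, r := range m.Ranges {
		if r.Count == 0 || r.First < next || r.First+r.Count > sectorCount {
			return fmt.Errorf("bad range [%d, %d)", r.First, r.First+r.Count)
		}
		next = r.First + r.Count
	}
	return nil
}

func main() {
	m := FaultMask{Ranges: []Range{{First: 8, Count: 4}, {First: 100, Count: 1}}}
	fmt.Println(m.Validate(2048)) // <nil>: a valid declaration
}
```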
Note that where sectors have different power and penalty parameters (because they have deals), fault accounting will need to process linear per-sector metadata to determine their power. Deal IDs, sector weight and pledge for sectors with deals are mapped by sector ID in supersector metadata. This linearity could be removed by decoupling the storage market from power, as in FIL+ explicit subsidy.
Sectors that remain faulty for 14 days are declared to be terminated. Each supersector must carry a termination mask that functions similarly to the fault mask to identify permanently-unavailable sectors. As for faults, termination of sectors with deals will involve linear processing.
Window PoSt accounting
Each supersector is assigned to one of 48 proving deadlines each day. Unlike the proving partitions in use today, a supersector is an immutable and indivisible collection, so attributes like power are fixed up front. Most of the mutable state for each partition becomes irrelevant, but some moves into the per-deadline state instead, which aggregates over supersectors rather than partitions.
Deadline state includes:
- a bitfield of supersector IDs allocated to the deadline
- a count of total sectors comprising the supersectors
- bitfields for unproven, faulty, recovering, and terminated supersectors
- memoized totals of live, unproven, faulty, and recovering power
- an expiration queue for supersectors
- a bitfield of early-terminated supersectors
- a bitfield (or AMT?) indicating Window PoSt status for each supersector
- an AMT of optimistically accepted WindowPoSt proofs
- a snapshot of deadline state and optimistic proofs from the prior challenge window
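A hypothetical Go shape for this state, simplifying the chain types (bitfields, AMTs and big integers) to placeholders; field names are illustrative:

```go
package main

// BitField stands in for an RLE+-encoded set of supersector IDs.
type BitField struct{ /* RLE+ encoded IDs */ }

// Deadline sketches the proposed per-deadline state, aggregating over
// supersectors rather than partitions.
type Deadline struct {
	Supersectors    BitField // supersector IDs allocated to this deadline
	TotalSectors    uint64   // count of sectors across those supersectors
	Unproven        BitField // not yet proven since activation
	Faults          BitField
	Recovering      BitField
	Terminated      BitField
	LivePower       int64 // memoized power totals (simplified)
	UnprovenPower   int64
	FaultyPower     int64
	RecoveringPower int64
	Expirations     map[int64]BitField // expiration queue: epoch -> supersectors (an AMT on chain)
	EarlyTerminated BitField
	PoStStatus      BitField // Window PoSt received this window, per supersector
	// Optimistically accepted proofs (an AMT) and the prior window's
	// snapshot of state and proofs are elided from this sketch.
}

func main() {}
```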
New supersectors are allocated automatically to deadlines with a best-effort approach to balancing the number of sectors (i.e. amount of power) to be proven at each deadline. The primary parameter to optimize is how much GPU power the miner needs on-hand to compute the SNARKs for Window PoSt. This per-miner hardware requirement still scales linearly with the power to be proven, despite the verification of those proofs becoming sub-linear. The larger units of supersectors will be less amenable to optimal packing than individual sectors, but since the miner is in control of supersector size, they can exert sufficient influence to balance their load. Note that this will answer small miners’ request to spread their proof burden/risk out over time, even if they have much less than a full Window PoSt partition of sectors (but at a net increase in gas cost of maintenance). Depending on the chosen partition size, multiple small supersectors in one deadline may require more GPUs to compute PoSt, until they are merged.
Note that this power balancing is less effective at balancing the blockchain transaction throughput required for Window PoSt, since that throughput depends on the size of supersectors.
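A best-effort balancing policy could be as simple as a greedy choice of the least-loaded deadline, sketched below (the policy and names are illustrative):

```go
package main

import "fmt"

const numDeadlines = 48

// assignDeadline places a new supersector in the deadline currently proving
// the fewest sectors, keeping each deadline's Window PoSt SNARK workload
// (and hence the miner's GPU requirement) roughly even.
func assignDeadline(sectorsPerDeadline *[numDeadlines]uint64, supersectorSize uint64) int {
	best := 0
	for d := 1; d < numDeadlines; d++ {
		if sectorsPerDeadline[d] < sectorsPerDeadline[best] {
			best = d
		}
	}
	sectorsPerDeadline[best] += supersectorSize
	return best
}

func main() {
	var load [numDeadlines]uint64
	for _, size := range []uint64{2349, 100, 11, 500} {
		fmt.Printf("supersector of %4d sectors -> deadline %d\n",
			size, assignDeadline(&load, size))
	}
}
```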
Size limits on RLE+ encoded bitfields limit the maximum number of supersectors per deadline. With our current 32KiB limit, the worst possible dispersion of supersector IDs allows for 3912 per deadline (32 KiB = 262,144 bits, or roughly 67 bits per isolated ID in the worst case). If supersectors can be merged, this will present no practical problem.
Merging supersectors
The ability to merge supersectors provides important flexibility in scale for different miners over time.
To merge supersectors, a miner allocates a new supersector ID and identifies two (or more?) existing supersectors, in a defined order, to be merged. The miner computes a new CommRAgg from the (private) CommRs for every sector in the new supersector, and submits a SNARK proving that all CommRs are contained in the CommRAgg.
The sectors within the new supersector are logically reindexed – the indices used at PoRep or for Window PoSt in the old supersectors are not relevant to the future. The new supersector replaces those being merged immediately. The old supersector ID cannot be reused (similar to sector IDs today).
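The off-chain side of a merge is then just a concatenate-and-recommit over the private CommRs, sketched here reusing the illustrative commRAgg function from the Supersectors section (the on-chain part is the containment SNARK described above):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Commitment = [32]byte

// commRAgg is the same illustrative binary Merkle root used in the
// Supersectors section sketch.
func commRAgg(commRs []Commitment) Commitment {
	layer := append([]Commitment(nil), commRs...)
	for len(layer) > 1 {
		var next []Commitment
		for i := 0; i < len(layer); i += 2 {
			right := Commitment{}
			if i+1 < len(layer) {
				right = layer[i+1]
			}
			next = append(next, sha256.Sum256(append(layer[i][:], right[:]...)))
		}
		layer = next
	}
	return layer[0]
}

// merge concatenates the (private) CommRs of the constituent supersectors in
// the defined order, logically reindexing the sectors, and recomputes the
// aggregate commitment for the new supersector.
func merge(constituents ...[]Commitment) Commitment {
	var all []Commitment
	for _, c := range constituents {
		all = append(all, c...)
	}
	return commRAgg(all)
}

func main() {
	a := []Commitment{{1}, {2}} // CommRs of supersector A, in order
	b := []Commitment{{3}}      // CommRs of supersector B
	fmt.Printf("merged CommRAgg: %x\n", merge(a, b))
}
```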
For miners onboarding supersectors that are smaller than a full Window PoSt partition (value TBC), merging is necessary to keep GPU requirements for Window PoSt near a practical minimum. This is because some proof generation (and verification) is wasted in the non-full partitions. E.g. if we set the partition size to be 240 (i.e. > 2349/10) and a miner has 11 supersectors of only 1 sector each in a single deadline, they would need 2 GPUs to compute the 11 SNARKs in the same time that today they can compute up to 2349 sectors with one. If they merge into 1 supersector of 11 sectors, they only need 1/10 of a GPU.
The merged sector cryptoeconomic parameters must preserve properties established for the constituent sectors.
Initial proposed behaviours (@Kubuxu)
| Sector Parameter | Merging Behaviour | Comments |
| --- | --- | --- |
| | Arithmetic average | Preserves total lifetime as well as time for penalties |
| | Arithmetic average | Preserves total lifetime of sectors <- won’t work with deals |
| | No change | |
| | No change | |
| | Sum? | |
| | Sum? | |
| | Sum | |
| | Sum | |
| | Sum | |
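As a sketch of the “arithmetic average” rows: if the averaged parameter is an expiration epoch and the average is weighted by constituent size, then the total remaining sector-lifetime is preserved at the merge epoch $t$ (the exact weighting is not specified above):

$$e_{\text{merged}} = \frac{\sum_i s_i e_i}{\sum_i s_i} \quad\Longrightarrow\quad \Big(\sum_i s_i\Big)\,(e_{\text{merged}} - t) = \sum_i s_i\,(e_i - t)$$

where $s_i$ and $e_i$ are the size and expiration of constituent $i$.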
Doing Explicit FIL+ subsidy would simplify this quite a bit.
After merging, the new supersector should be ~indistinguishable from a supersector that was onboarded as a unit. In particular, the merged supersector has the same data in chain state and may be subsequently merged with another supersector, continually compressing the per-sector state representation.
TODO: implications of external parties (deal clients) not having a stable public reference to the [super]sector containing their data.
A “sector ID” is required to guarantee uniqueness of a new sector at PoRep. It is mixed with the sealed data and changes CommR, but this number is not required at any point later – it might be more informative to call this a sector seed rather than a long-term identifier. The “sector ID” used in PoSt orders sectors, marks faults, and seeds challenge generation. It is not tied to the CommR, but must match between proof generation and verification. We could use a different number than the PoRep sector ID, provided it gives a stable ordering, can be matched to chain state to identify faults, and is common to the prover and verifier. In particular, we could use a supersector ID and sector index and things should "just work" on the proofs side; the supersector ID need not match that used at PoRep either.
Cron
Given faults and Window PoSt accounting that are largely similar to today, only on larger and heterogeneous units of storage, miner cron processing is largely similar to today.
Notable differences are:
- Missed Window PoSt is checked for supersectors, rather than partitions. A miner will likely have more supersectors than partitions.
Deals
This proposal changes little with respect to deals. Deals are still assigned to sectors (within supersectors) explicitly at pre-commitment time. The chain still needs to compute and prove the CommD for non-empty sectors. The miner still needs to seal this data into specific sectors, and the supersector metadata records per-sector deal information (only for sectors with deals, of course).
If a supersector is terminated before its scheduled expiration, all unfinished deals will be terminated.
CC upgrades
For this baseline proposal, committed capacity upgrade mechanisms are left largely unchanged, but operate at the supersector level.
A committed capacity supersector is a supersector with no deals in any of its sectors. Such a supersector may be upgraded to one with deals by committing a new supersector of the same size that does have deals. The CC supersector can then be terminated ahead of its scheduled expiration.
TODO: talk about how the ability to merge supersectors makes this not a big deal in practice.
TODO: if we can do in-place CC upgrades, rewrite this section to use that (re-compute and prove new CommRAgg)
Outstanding questions
- How to aggregate CommRs: merkle tree or vector commitment
- Parameter choice for width. Can we have w=2?
- Parameter choice for Window PoSt partition size
Scale
TODO: estimates etc.
A limit on per-miner storage is given by max-supersector-size (??) × deadlines (48) × supersectors-per-deadline (3912).
Window PoSt for large supersectors will require one public input per partition (which may be the same CommRAgg repeated). This introduces a small linearity in proof verification (which is off-chain except for disputes).
For a while, most miners will build supersectors that are smaller than today’s Window PoSt partitions (2349 sectors). This means a miner will have more supersectors than they would have had partitions for a given amount of committed storage, so there are more PoSts to submit and check. It will likely be worth optimizing miner state structure to minimize the redundant state loaded to process a Window PoSt.
Product implications
Sealing pipeline storage of intermediate data
Pre-committing and proving a large supersector requires the miner to hold on to the intermediate sealing layer data for all the sectors involved until proofs are constructed. This intermediate data is large, about 10x the size of the sector (320–400 GB for a 32 GiB sector), so a batch of 100 sectors would hold roughly 35 TB until proven. The intermediate data consumes expensive NVMe storage, necessary for fast random access during proof generation but idle while waiting for other sealing processes in a batch. The most effective CPU:NVMe ratio keeps both saturated.
Thus, it’s economically unattractive to build batches larger than the simultaneous throughput of a sealing pipeline unless gas costs (and projected future maintenance costs) are very high. This affects all miners, but especially smaller operations. We already see reluctance to use the batch pre-commit method from Hyperdrive due to this cost.
A pretty good mitigation to this cost would be to enable miners to commit supersectors at the natural batch size for their sealing pipeline, even if very small, but then merge those supersectors into larger ones for reduced long-term state and maintenance costs. This would consume more gas, but allow the onboarding batch size to adjust with market forces without committing an operator to maintaining many small supersectors for their lifetime. Thus the merging supersectors extension to the baseline proposal may be necessary.
Non-interactive PoRep might resolve this much more fully.
- Higher bar for miner operations for large aggregates
  - PoRep requires sealing all the sectors at once, no gaps
  - Faults take down a larger unit of storage, with higher penalties (unless partial faults)
  - Miners need an active supersector merging policy to avoid hitting hard limits on, e.g., the GPU requirement for PoSt
  - Miners must record CommRs as critical metadata: losing a CommR is nearly as bad as losing the sector (it could only be regenerated by re-sealing)
  - Effectively packing deals into supersectors
  - Harder CC upgrades
- Small miners
  - Economies of scale
    - Hardware utilisation when supersector size = partition size
    - Amortized gas for committing and maintaining larger supersectors
  - Sealing multiple sectors and holding them for batches consumes expensive NVMe space, so they won’t use large batches unless gas costs force it
  - Many small (non-optimal) supersectors will require more GPU for Window PoSt (but this might motivate larger ones, even when gas is cheap: a GPU vs NVMe investment trade-off)
Migration
A new miner actor implements supersector commitment and maintenance instead of per-sector methods and partition-wise accounting. These super-miners are distinct from the existing miner actors, but interact with the same power and market singleton actors. Old miner actors continue to function on chain, but are deprecated from the point of view of new development.
Miner operators must provision a new miner actor in order to commit supersectors. Large-scale miners are expected to do so immediately, but smaller ones might not. We might set a sunset epoch for old miner actors after which no new sectors may be committed or extended, but we would need a good story for small miner efficiency before doing so (e.g. that committing small supersectors is no worse than today, and they can be merged later).
A moderate amount of code will be duplicated between the different miner actors, mostly associated with metadata like control addresses, peer IDs, rewards & withdrawals, consensus faults. We may factor some of this out to shared libraries to reduce the maintenance burden.
See migrating old sectors for an extension, and alternatives considered.
Beyond the baseline
Migrating old sectors
If the Hyperdrive network upgrade prompts a great increase in storage onboarding rate, then the total state size may blow out to hundreds of gigabytes. This is deemed manageable in the short term, but it will be very valuable to discard that state in favour of supersectors’ much more compressed state, by migrating old sectors into new supersectors.
A miner operator which owns an old and new miner actor may aggregate a collection of sectors from an old miner actor into a single supersector in a new actor, as an atomic operation. These sectors remain sealed and so do not need a new PoRep, but become scheduled for Window PoSt in the new miner and retired from the old one. Since PoRep is not required, the sectors may be allocated a new supersector ID and sector indices easily.
The CommRs for all the migrated sectors are initially public information, so the CommRAgg may be computed on chain, or computed off-chain and proven by something similar to a Window PoSt.
This enables a gradual migration of activity to the new actors. TODO: how? We need to handle the distinct sector IDs.
Pre-commit randomness
It’s not strictly necessary for all sectors to use the same pre-commit epoch and randomness, but it is simpler. All sectors pre-committed together would need epochs within MaxPreCommitRandomnessLookback in any case.
If sectors in a pre-commitment have different randomness epochs then:
- we’ll need more expressive messages and state to represent per-sector values
- we’ll need to store the different epochs on-chain for the duration of pre-commit
Toward exponential deals
Decoupling capacity from market: FIL+ explicit subsidy
Superdeals
The idea would be to allow a single deal to span a whole supersector, reducing dealmaking overhead and allowing for bigger continuous deals.
Effort
- CryptoNet analysis
- CryptoEcon analysis
- Operational analysis
- 3 new circuits (PoRep aggregation, Window PoSt, Winning PoSt) + modifications to SNARKPack
- Actors
- Lotus
Alternatives
In-place migration
Attempt to migrate partition-wise miner state into supersectors, as an atomic state migration. Scary.
Q&A
- What’s the incentive for miners to adapt to this? Does the incentive cover the extra operational risk and cost?
  - The incentive will be a reduced gas cost of committing and maintaining large amounts of storage. There is no incentive if gas is cheap, but the premise motivating this proposal is that network bandwidth will become saturated by onboarding demand. We will need to address operational risk/cost to make this an easy choice though, probably through some of the protocol extensions sketched out above.
- Why can’t we get rid of the notion of individual CommRs?
  - Currently, a PoRep is actually 10 SNARKs, so even by treating each CommR as an internal node of a bigger Merkle tree, we would eventually still need to compute them. We probably can’t do a PoRep for 1000x sectors at once(?).
- Why must we have 32/64 GiB sectors involved?
  - This is our fundamental unit of proof of space for the technology we have today. We can’t go larger without unreasonable RAM requirements.