Storage is committed to Filecoin in units of a sector. A sector is a 32 or 64 GiB chunk that is the unit of a single proof of replication. Each sector has a record in chain state (called “sector info”) that includes the replica commitment (a CID) and some metadata about when it was committed, its pledge requirement, its deals, etc. This record is mostly static, changing only when sectors are extended or have their content updated. For each sector, the chain also maintains some associated state information, reflecting things like whether the sector is faulty and when it is due to expire. This state information is aggregated into partitions of 2349 sectors, which is the unit of a Window PoSt proof. It is accessed or modified each day, when a provider submits a Window PoSt covering every sector.
There are more than four hundred million sectors currently committed to the Filecoin network. The information about sectors thus dominates the Filecoin chain state size. The replica commitments alone account for more than 400,000,000 * 36 B = 13.4 GiB of state. Loading each of the >170,000 partitions every day also accounts for a significant fraction of the chain’s gas bandwidth.
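The arithmetic behind those figures can be reproduced with a short calculation. This is only a back-of-the-envelope sketch: the sector count is the lower bound quoted above, the 36 bytes per commitment and the 2349 sectors per partition are the figures from the text, and the variable names are made up for illustration.

```python
import math

SECTOR_COUNT = 400_000_000      # lower bound from the text ("more than four hundred million")
COMMITMENT_BYTES = 36           # size of one replica commitment (CID), as quoted in the text
SECTORS_PER_PARTITION = 2349    # partition size, the unit of a Window PoSt proof

# Total state taken up by replica commitments alone, converted from bytes to GiB.
commitment_gib = SECTOR_COUNT * COMMITMENT_BYTES / 2**30

# Number of partitions needed if every partition were completely full.
# Real partitions are per-provider and often not full, so this is a lower bound.
partition_count = math.ceil(SECTOR_COUNT / SECTORS_PER_PARTITION)

print(f"replica commitments: {commitment_gib:.1f} GiB")
print(f"partitions (lower bound): {partition_count:,}")
```

Because partitions belong to individual providers and are rarely packed full, the real partition count will be higher than this estimate.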
In order to scale the capacity of the network to record commitments and maintain sector state accounting, we will one day need to undertake a significant reworking of how this per-sector state is recorded and processed. We’ve thought about this before: in early 2021 the rate of sector onboarding was growing exponentially, and if the exponential trend had continued we would have run into capacity constraints much sooner than initially anticipated. The Neutron proposal (motivation, detailed design) was one attempt to re-design things to cater for ongoing exponential growth by establishing a fixed-size on-chain representation for a collection of multiple sectors. An alternative idea, much more ambitious, is to move much of the sector state off-chain entirely, and have storage providers prove mutations of that state with a SNARK or similar verifiable computation.
As it happened, the growth rate levelled off and we put the proposal to one side. But big changes like this will become relevant again in the future. Even if we’re not staring down a hard capacity constraint, a more efficient representation of sector info and state has the potential to significantly reduce gas costs for storage providers.
All of this is to set some context in which to evaluate a common question: why shouldn’t we provide on-chain sector information to actors and smart contracts that will be built on the FVM?
The answer has two parts: (1) we want on-chain APIs to be supported forever, because changing them would break deployed smart contracts that depend on them, and such breakage could result in large losses of functionality and economic value; and (2) we don’t want to commit to providing any particular sector information through those APIs, because we intend to remove that information from the chain in the future in order to reduce costs and support greater scale.
Both the Neutron and the off-chain state proposals would make it impossible to answer API questions like “what’s this sector’s replica commitment” or “is this sector faulty”: that per-sector information wouldn’t be stored on-chain. If we provided those APIs now, we would later face a devastating choice between running into hard network capacity limits and breaking a huge ecosystem of smart contracts implementing the storage economy.
There are some things we can expose safely, though. Most of the high-level functionality that one would want can still be provided, just in a less direct fashion and requiring some off-chain data or computation to support it. While the raw sector state will one day be taken off-chain, a commitment to that state will most likely remain. So rather than an API to answer “what’s this sector’s replica commitment”, we can provide an on-chain API that verifies a proof that “I assert this sector’s replica commitment is X”. With this API, a party that maintains per-sector information off-chain (e.g. the sector’s provider, or a block explorer) can submit their claimed answer to the query, which can be verified on-chain by whatever application wanted to know it in the first place. It’s not quite as easy to use, but it can support far greater storage capacity in the future.
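To make the "verify a claim against a commitment" idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the on-chain commitment is a binary SHA-256 Merkle root over (sector number, replica commitment) pairs. The actual commitment scheme is not specified here and might well be a different structure or a SNARK, and every function name below (`leaf_hash`, `build_tree`, `make_proof`, `verify_claim`) is hypothetical rather than part of any Filecoin API.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(sector_number: int, replica_commitment: bytes) -> bytes:
    # The 0x00 prefix domain-separates leaves from interior nodes.
    return _h(b"\x00" + sector_number.to_bytes(8, "big") + replica_commitment)

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def _padded(level):
    # Simplistic handling of odd-sized levels: duplicate the last node.
    return level + [level[-1]] if len(level) % 2 else level

def build_tree(leaves):
    """Off-chain: build every level of the tree; the last level holds the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = _padded(levels[-1])
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def make_proof(levels, index):
    """Off-chain: collect the sibling hashes on the path from a leaf to the root."""
    proof = []
    for level in levels[:-1]:
        level = _padded(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof

def verify_claim(root, sector_number, claimed_commitment, proof):
    """'On-chain': needs only the root, the claim and a logarithmic-size proof."""
    acc = leaf_hash(sector_number, claimed_commitment)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root

# Usage: a provider keeps the full table off-chain; the chain stores only `root`.
sectors = {n: _h(f"replica-{n}".encode()) for n in range(5)}  # dummy commitments
levels = build_tree([leaf_hash(n, c) for n, c in sorted(sectors.items())])
root = levels[-1][0]
proof = make_proof(levels, 3)
honest = verify_claim(root, 3, sectors[3], proof)   # correct claim for sector 3
forged = verify_claim(root, 3, sectors[4], proof)   # wrong commitment for sector 3
print(honest, forged)
```

The point of the sketch is the asymmetry: the verifier stores a single 32-byte root regardless of how many sectors exist, while whoever wants to make a claim carries the burden of holding the data and producing the proof. A production design would need more care than this, for example around odd-sized levels and around proving the absence or fault status of a sector.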