@Alex North, Dec 2021
This is a high-level overview of Ethereum's scaling plans. If you're familiar with "Ethereum 2.0" as the sharded blockchain originally proposed, the current plans are quite different. This summary is intended to spread ideas and help us compare and contrast approaches. It assumes a fair bit of background knowledge, or a willingness to Google unfamiliar concepts.
What's the problem?
The key challenge in scaling blockchains is an apparent trilemma between the goals of decentralization, security, and performance. Increasing transaction throughput and decreasing latency is hard while maintaining a permissionless and secure network.
Alternative solutions in brief
A naive solution is to split the network into multiple smaller networks, each forming consensus on its own blockchain. These are called sidechains. Each chain presumably has the throughput of the original network, so you get a multiplier. The big, show-stopping disadvantage of this approach is that the miner/validator power is divided among the chains, so each is 1/N as secure as the original chain.
The sharding approach improves on this by adding a mechanism to post fraud proofs about sidechain behaviour to a main chain, thus inheriting some of its security. Where all the chains are identical, this is the old Ethereum 2.0 approach. The Polkadot network is sharded too, but its parachains (shards) can be heterogeneous.
A further challenge for each of these approaches is interoperability & composability. Protocols for cross-chain/shard communication are possible, but significantly hamper the Lego-like composability of smart contracts interacting in a common address space. These approaches require paying the cross-shard communication costs to scale throughput.
The modular blockchain
The new Ethereum architecture is modular, with distinct layers for security, execution, and data availability.
a rollup-centric roadmap could also imply a re-envisioning of eth2’s long-term future: as a single high-security execution shard that everyone processes, plus a scalable data availability layer.
The layer-1 beacon chain provides security through its proof-of-stake mechanism and hundreds of thousands of validators. Its contract and execution environment provides recourse to recover funds if any of the layer-2 execution environments fail.
L1 is decoupled from the layer-2 execution environments that almost all applications will actually use. Execution environments can draw on L1 security through smart contracts specific to each environment, which provide, for example, a "withdraw" method that can be executed on L1 even if the L2 fails. But L1 won't actually execute the transactions of the L2 executor. The most promising execution environments are rollups, in either the zero-knowledge or optimistic variants.
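The "withdraw even if L2 fails" escape hatch can be sketched in a few lines. This is a hypothetical illustration, assuming the rollup's L1 contract stores a Merkle root of account balances and pays out against a valid Merkle proof; all names and the leaf format are invented for the example.

```python
# Sketch of an L1 escape hatch: the rollup commits a Merkle root of balances
# on L1; a user can exit by presenting a Merkle proof, with no L2 involvement.
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    layer = [h(l) for l in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate last node on odd layers
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    layer = [h(l) for l in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        proof.append((layer[index ^ 1], index % 2 == 0))
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return proof

def verify(root, leaf, proof):
    """What the L1 contract runs: rebuild the root from the leaf and siblings."""
    node = h(leaf)
    for sibling, leaf_is_left in proof:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

balances = [b"alice:40", b"bob:25", b"carol:10", b"dan:5"]
root = merkle_root(balances)       # committed on L1 by the rollup contract
proof = merkle_proof(balances, 1)  # Bob's exit proof, built from L1 data
assert verify(root, b"bob:25", proof)  # the contract pays Bob his 25
```

Because the leaves themselves are reconstructible from data posted to L1, a user can build this proof without any cooperation from the rollup's operators.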
Scaling with rollups is supercharged by data shards. Rather than sharding execution into different chains, the data availability layer (i.e. chain state) is sharded. A single rollup can access many or all data shards at once, thus remaining internally composable (contracts with state on one shard can directly call those with state on another) while enjoying huge transaction throughput (estimates are 100k TPS per rollup).
Multiple rollups (with cross-chain comms) bring us well into the millions of TPS. And because execution environments are decoupled from L1, there's loads of room for innovation, alternative security models, side-chains etc. to lift the bounds arbitrarily higher.
Layer 2 vs side-chains
The term "Layer 2" is reserved for techniques that perform transactions off the primary chain but inherit the full security of the L1 chain. This means that users can (1) reconstruct the state from data available on L1, and (2) withdraw their funds on L1 directly.
If you can't recreate the state and exit on L1 when the environment goes down, it's a side-chain, which is fundamentally less secure.
Read Vitalik's primer on rollups:
An Incomplete Guide to Rollups
First, Bob puts $1 (or some ETH or stablecoin equivalent) into a smart contract. To make his first payment to Alice, Bob signs a "ticket" (an off-chain message), that simply says "$0.001", and sends it to Alice. To make his second payment, Bob would sign another ticket that says "$0.002", and send it to Alice.
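The cumulative-ticket scheme in that excerpt can be sketched as follows. This is a toy model: an HMAC stands in for Bob's real on-chain signature (which would be ECDSA), and the key and unit choices are invented.

```python
# Payment-channel tickets: Bob signs cumulative IOUs off-chain; Alice only
# ever needs to redeem her latest ticket against the on-chain deposit.
import hashlib
import hmac

BOB_KEY = b"bob-signing-key"  # stand-in for Bob's private key
DEPOSIT = 1000                # $1.000 locked in the contract, in tenths of a cent

def sign_ticket(cumulative_amount: int) -> tuple[int, bytes]:
    """Each ticket states the *cumulative* total owed so far."""
    assert 0 < cumulative_amount <= DEPOSIT
    sig = hmac.new(BOB_KEY, cumulative_amount.to_bytes(8, "big"),
                   hashlib.sha256).digest()
    return cumulative_amount, sig

def contract_redeem(ticket: tuple[int, bytes]) -> int:
    """What the on-chain contract does: verify the signature, pay Alice."""
    amount, sig = ticket
    expected = hmac.new(BOB_KEY, amount.to_bytes(8, "big"),
                        hashlib.sha256).digest()
    assert hmac.compare_digest(sig, expected)
    return amount

t1 = sign_ticket(1)  # "$0.001"
t2 = sign_ticket(2)  # "$0.002" — supersedes t1, no new on-chain transaction
```

The point of the cumulative amounts is that stale tickets are harmless: redeeming `t2` makes `t1` worthless, so only one on-chain settlement is ever needed.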
A rollup is a blockchain execution environment focused only on maximizing transaction throughput. A rollup outsources security and data availability to some other system (the L1 chain). Note that a rollup does not execute over the global L1 state, but over its own state tree. Transactions in the rollup can only operate on state within the rollup. It's a distinct execution environment, and using it entails bridging assets, re-deploying contracts etc. into the rollup's state.
A rollup achieves this outsourcing by publishing either the transaction inputs (optimistic rollups) or an extract of the state (zk rollups, e.g. just the token balances but not other state) to L1 via a smart contract, along with the new rollup state root that results from executing those transactions. The L1 doesn't execute those transactions directly, it just records the transaction data or state extract and the new rollup state root. This means that the rollup's internal state tree can be reconstructed from data secured on L1. This in turn means that a user can exit the rollup by invoking the rollup's contract on L1 directly.
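The publish-and-reconstruct loop can be sketched with a toy data model. The transaction format, `apply_batch`, and hashing the JSON state in place of a real Merkle tree are all illustrative assumptions, not any particular rollup's design.

```python
# What an (optimistic-style) rollup posts to L1: the raw transaction batch
# plus the state root that results from executing it off-chain.
import hashlib
import json

def state_root(state: dict) -> str:
    # Real rollups commit a Merkle root; hashing the sorted state stands in.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_batch(state: dict, txs: list) -> dict:
    """Execute a batch of toy transfers: {"from": .., "to": .., "amt": ..}."""
    state = dict(state)
    for tx in txs:
        state[tx["from"]] -= tx["amt"]
        state[tx["to"]] = state.get(tx["to"], 0) + tx["amt"]
    return state

# The sequencer executes off-chain and posts (txs, new_root) to the L1 contract.
genesis = {"alice": 100, "bob": 50}
batch = [{"from": "alice", "to": "bob", "amt": 10}]
new_state = apply_batch(genesis, batch)
posted = {"txs": batch, "state_root": state_root(new_state)}

# Anyone holding the L1 data can replay the batch and check the claimed root;
# a mismatch is the basis of a fraud proof, and a match means the full L2
# state is reconstructible from L1 data alone.
assert state_root(apply_batch(genesis, batch)) == posted["state_root"]
```

Note that L1 stores only `posted` — the inputs and the root — never the full L2 state, which is exactly the asymmetry the next paragraph describes.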
Thus a rollup has a small state on L1 and a larger state in L2. The L1 state is just the transaction data that can be used to recreate the L2 state, or an extract of the L2 state. The L1 chain is not exposed to the full L2 state directly.
In a rollup-centric network, the L1 validator nodes do not execute the rolled-up transactions, so they do not need to be beefy computers nor have access to the L2 state. The rollup requires only a few aggregator/sequencer/proposer nodes which actually execute the transactions to compute the new rollup state. These do need to be powerful computers, but it's ok to require only a few powerful computers to serve the whole network. Note that even though only a few machines actually execute transactions, the network is still decentralised and secure because thousands of L1 nodes validate those transactions (either through ZK proof or lack of a fraud proof) and provide an exit if the rollup network becomes unavailable. This is quite different to a permissioned validator set.
Rollup throughput is thus limited only by the rate at which it can write new state data to some data availability layer. For a true L2 rollup this means the data throughput of the L1 chain. But other rollups could store the transaction data on a sidechain or somewhere else, too.
In a rollup-centric world, compute is cheap because validating nodes don't have to execute the transactions. Data availability becomes the scarce resource.
Ethereum will scale the size and rate of data availability through sharding. This part is inherited from the old Eth2.0 plans, but note that only data availability is sharded, not execution. Validator nodes will be assigned to one of (initially) 64 shards, each of which can propose a new data block every epoch. The validator set provides consensus on the availability of that data.
Read the data sharding proposal from Vitalik.
The sharding proposal opens up a space of 16 MB every 12 seconds that can be filled with any data, and the system guarantees consensus on the availability of that data. This data space can be used by rollups. This ~1398k bytes per sec is a 23x improvement on the ~60 kB/sec of the existing Ethereum chain, and in the longer term the data capacity is expected to grow even further. Hence, rollups that use eth2 sharded data can collectively process as much as ~100k TPS, and even more in the future.
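The quoted numbers are easy to sanity-check. The ~12 bytes per transaction figure is an assumption (roughly what Vitalik's rollup guide quotes for a compressed ERC-20 transfer on an optimistic rollup); everything else follows arithmetically.

```python
# Back-of-envelope check of the sharded data throughput numbers quoted above.
SHARD_DATA_BYTES = 16 * 1024 * 1024  # 16 MB of data space per slot
SLOT_SECONDS = 12

bytes_per_sec = SHARD_DATA_BYTES / SLOT_SECONDS
print(f"{bytes_per_sec / 1000:.0f} kB/s")      # 1398 kB/s
print(f"{bytes_per_sec / 60_000:.1f}x")        # 23.3x the ~60 kB/s of today's chain

# Assumption: ~12 bytes of posted data per compressed rollup transaction.
TX_BYTES = 12
print(f"{bytes_per_sec / TX_BYTES:,.0f} TPS")  # 116,508 TPS
```

So the "~100k TPS" headline is simply the shard data rate divided by per-transaction calldata, which is why data availability, not compute, is the binding constraint.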
Each validator only needs to store and process the data for its shard, plus a small subset of other data in order to participate in data availability sampling. So validator hardware and bandwidth requirements remain modest. Blocks on the beacon chain will confirm blocks on the data shards after they are accepted by the shard committee.
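The reason a "small subset" of samples suffices is statistical. A simplified sketch of the data availability sampling argument, assuming 2x erasure coding (so the block is recoverable from any half of its chunks, and a withholder must therefore hide at least 50% of them):

```python
# Data availability sampling, simplified: each random chunk query hits a
# withheld chunk with probability >= 0.5, so the chance that k independent
# samples all miss the withholding shrinks exponentially in k.
def miss_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """Probability that all `samples` random chunk queries succeed even
    though `withheld_fraction` of the chunks are actually missing."""
    return (1 - withheld_fraction) ** samples

for k in (10, 20, 30):
    print(k, miss_probability(k))
# 30 samples already push the failure probability below one in a billion.
```

This is why individual validators can stay lightweight: confidence in availability comes from many cheap random samples, not from downloading every shard.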
A rollup can be homed on a single shard, or it can use and store data across multiple shards, enjoying a multiplier on potential data throughput. It can do this while retaining full composability: contracts with data on one shard can call to contracts on another. This is very different to the sharded monolithic approach, where cross-shard transactions are slow and limited.
The Ethereum L1 execution chain will continue much as it looks today, with its state tree eventually migrating to become one data shard (I think).
The beefy rollup sequencer nodes that perform the L2 transactions would need to participate in all the data shards that they use, but clients and L1 validators do not.
Note that the more validators there are, the more data shards can be deployed. This gives Ethereum's scaling the opposite tendency to the blockchain trilemma: the system scales horizontally until bandwidth limits data throughput for rollup sequencers. Thanks to the modularity, it can scale arbitrarily further (at the cost of composability) by utilising other data availability sources. Ethereum can scale validators into the millions because the proof-of-stake consensus mechanism keeps hardware requirements low. Without PoS, data sharding in this way would not be possible.
- The rollup-centric approach is modular and separates the security, execution, and data availability layers. These can scale somewhat independently. Ethereum-based applications can choose components.
- L1 validators do not execute rollup transactions, so execution throughput scales independently of the chain processing capacity.
- Rollup execution inherits the full security of the L1 chain, because the rollup state can be reconstructed from transaction inputs posted to the data availability layer. A user can exit the rollup directly on L1, independent of the rollup sequencer.
- A rollup can be fully composable across data shards. This enables the simple programming model of direct, synchronous calls between contracts, i.e. no impedance to development.
- Distributed consensus over data availability is an easier problem than consensus over execution. Projections for sharded data throughput are >1MB/sec, which can grow as more validators join the network to support more shards.
- The limiting factor on rollup throughput is writing the transaction inputs to data availability. 100k TPS, fully-composable and enjoying full Ethereum L1 security seems reasonable.
- If that's not enough, the modular approach means that a rollup can choose an alternative data-availability layer, trading yet further execution throughput for lesser security. The security loss is one of availability, not consensus: the rollup could freeze funds, but not steal or double-spend them.
The modular architecture separates the functions of security, execution, and data availability into distinct layers. It obtains a much higher throughput without sharding the execution layer, thus retaining the full benefits of lego-like contract composability without the pain of cross-shard invocations.
This would appear to be better than any monolithic-architecture approach (where execution and state are coupled), even if sharded. And when limits are reached, you can shard the rollup internally, and/or use alternative data availability layers to the primary L1 (zkPorter, StarkWare DAC, Celestia).
Some arguments from Polynya (a StarkWare affiliate?)
why not just have sharding or multi-chain as is but build rollups on top of shards / subnets? This is definitely a great interim solution, but this just adds extra steps and limitations. Each rollup is now constrained to the single shard. With a fully modular architecture, all execution layers have access to the full network. Thus, you can have uber-rollups doing tens of thousands of TPS, and with innovative inter-rollup communication schemes.
Multi-chain and sharded networks are still monolithic, or mostly monolithic. The modular architecture is necessarily superior to monolithic architectures, by at least 100x short-term, and 10,000x long term. You get better execution layers, better security layers and better data availability layers if each are laser focused on the one task instead of trying to do it all - and these benefits compound. Anything a monolithic execution layer can do; a modular execution layer can necessarily do way better.
Implications for Filecoin
The immediate relevance of this to Filecoin is as inspiration for our own scaling plans. Filecoin is a monolithic blockchain. I believe our current plans for scaling involve sharding the monolith, which means that execution environments are remote and need inter-chain protocols to communicate between shards. This is slow and limiting, and would likely significantly hamper the contract-level innovation that will be possible when the FVM enables user-programmable contracts, as compared with what will be possible on Ethereum.
Perhaps we should explore the possibilities of modularising the Filecoin chain in a similar way. The proof-of-space mechanism limits our validator set significantly compared with Ethereum, but maybe a data availability layer could be rolled in with short-term PoS and long-term deal-backed storage?
Another implication is that rollups are coming, and we might prepare our VM to support them. Even without the power of a high-throughput DA layer, rollups expand the execution throughput significantly. On the assumption (which need not hold) that developers will prefer to use the same languages and development targets for L2 as L1, we can prepare FVM to be rollup friendly.
We don't have a rollup. Perhaps we should work on one, or maybe get cosy with an existing rollup team.
Some scattered references:
- Fluence describes fraud proofs of WASM execution traces
- Arbitrum Nitro working on WASM optimistic rollup
Filecoin and data availability
There's some rhyming here that's hard to ignore. Filecoin as a product is all about storing data and making it available. Data availability is the scaling constraint on a rollup-centric modular blockchain. Proving that data has been stored, or even is retrievable, is not the same problem as data availability for blockchain state, but ... I can't shake the idea that there's something in there.
If sealing, proof-of-storage, and unsealing were sub-second, perhaps a Filecoin-like thing could serve as our own data availability layer?
Perhaps there's an opportunity for Filecoin to become the secondary data availability layer of choice for rollups and other execution environments that want even cheaper, higher throughput than Ethereum will support, and security based on PoST rather than sampling large replica sets.
Almost everything above is drawn from articles I read elsewhere.
- This post and almost all the links from it are great information. The author (a Starkware affiliate?) does have an agenda, but also paints an inspiring vision. I think that Filecoin and its storage-based security provides a strong counter-example to his thesis that L1s beyond Bitcoin and Ethereum are doomed.
- Matter Labs (zkSync) Blog
- Offchain Labs (Arbitrum) Blog, Docs, Talk
- StarkWare Blog, StarkEx
- Optimism (Optimistic Ethereum) Blog, Docs
- L2Beat - Active L2s today
- Celestia - dedicated data availability layer