This week we are updating the Tech Tree to reflect several new milestones in Ethereum 1.x R&D that fall short of a full realization of Stateless Ethereum but are significantly more achievable in the mid-term. The most notable addition to the tech tree is Alexey’s reGenesis proposal. While it is far from a fully specified upgrade, the prevailing sentiment among R&D is that reGenesis offers a less radical but much more feasible step towards the ultimate goal of a “fully stateless” system. Complementing reGenesis in many respects is a static state network that would help distribute state snapshots and historical chain data over a bittorrent-style, DHT-based network. At the same time, more near-term improvements such as code merkleization and a binary trie representation of state are getting close to EIP-ready. Below, I’ll explain the changes that have been made, with links to the relevant discussions if you want to dive deeper into any particular feature.
Binary Trie
Ethereum currently uses a hexary Merkle-Patricia Trie to encode state, but switching to a binary format would bring significant efficiency gains, particularly in the expected size of witnesses. A complete re-encoding of Ethereum’s state requires a specification of the new format as well as a clear transition strategy. Finally, a decision must be made about whether smart contract code will also be merkleized, and whether that should happen as part of the binary trie transition or as a separate change.
Binary Trie Format
The general idea of a binary trie is simpler (pun intended :)) than Ethereum’s current hexary trie structure. Instead of choosing among 16 possible paths at each step from the root of the trie down to its child nodes, a binary trie has only 2. A complete re-specification of the state trie also brings an opportunity to fix known inefficiencies that have become apparent now that Ethereum has been running for over 5 years. Specifically, this might be a chance to make the state much better suited to the practical performance challenges of database encoding (as discussed in a prior article on state growth).
The conversation surrounding a formal binary trie specification and rules for merkleization can be found on ethresearch.
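To make the 16-versus-2 branching concrete, here is a minimal sketch of how the same key becomes a path in each trie, plus a rough count of the sibling hashes a Merkle proof needs. It assumes a perfectly balanced trie of full branch nodes and uses SHA-256 as a stand-in for keccak256; the function names are illustrative and not taken from any client or from the draft specification.

```python
import hashlib
from typing import List


def to_nibbles(key: bytes) -> List[int]:
    """Split a key into 4-bit steps: one of 16 children per hexary node."""
    path = []
    for byte in key:
        path.append(byte >> 4)
        path.append(byte & 0x0F)
    return path


def to_bits(key: bytes) -> List[int]:
    """Split a key into 1-bit steps: one of 2 children per binary node."""
    return [(byte >> (7 - i)) & 1 for byte in key for i in range(8)]


def balanced_depth(leaves: int, branching: int) -> int:
    """Depth of the smallest perfectly balanced trie that can hold `leaves` leaves."""
    depth, capacity = 0, 1
    while capacity < leaves:
        capacity *= branching
        depth += 1
    return depth


def sibling_hashes(leaves: int, branching: int) -> int:
    """Sibling hashes in one proof: (branching - 1) per level (idealized model)."""
    return balanced_depth(leaves, branching) * (branching - 1)


key = hashlib.sha256(b"example account").digest()  # stand-in for keccak256
hexary_path = to_nibbles(key)   # 64 steps, 16 choices each
binary_path = to_bits(key)      # 256 steps, 2 choices each

# Hypothetical state with 2**24 leaves.
hexary_hashes = sibling_hashes(2**24, 16)
binary_hashes = sibling_hashes(2**24, 2)
print(hexary_hashes, binary_hashes)
```

In this idealized model the binary path is four times longer, but each level contributes one sibling hash instead of up to fifteen, which is where the witness savings come from. Real tries are not perfectly balanced, so actual numbers will differ.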
Binary Trie Transition
It’s not just the destination (binary trie format) that matters, but the journey itself! Ideally, the transition would not interrupt transaction processing on the network at all, which means that clients must build the new binary trie while also handling new blocks that arrive roughly every 15 seconds. The transition strategy that currently looks most promising is called the overlay method, which is based in part on geth’s new snapshot sync protocol. In short, new state changes will be added on top of the existing (hexary) trie in a binary structure, creating a kind of binary/hexary hybrid for the duration of the transition. State that has not been touched will be converted as a background process. Once the conversion is complete, the two layers will be merged into a single binary trie.
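The overlay idea can be sketched in a few lines. This is a toy model under loud assumptions: plain dictionaries stand in for the hexary and binary tries, deletions are not modeled, and the class and method names are hypothetical rather than taken from geth or any other client.

```python
class OverlayState:
    """Toy overlay: a frozen legacy layer plus a new layer that receives all writes."""

    def __init__(self, hexary_base: dict):
        self.base = dict(hexary_base)  # stand-in for the existing hexary trie
        self.overlay = {}              # stand-in for the new binary trie

    def get(self, key):
        # Reads check the binary overlay first, then fall back to the hexary base.
        if key in self.overlay:
            return self.overlay[key]
        return self.base.get(key)

    def put(self, key, value):
        # All new state changes go to the binary overlay only.
        self.overlay[key] = value

    def convert_step(self, batch_size: int) -> bool:
        """Background work: move up to `batch_size` entries out of the base.

        Returns True once the base is empty and only the binary layer remains.
        """
        for _ in range(min(batch_size, len(self.base))):
            key, value = self.base.popitem()
            # Never overwrite a value that was written during the transition.
            self.overlay.setdefault(key, value)
        return not self.base


state = OverlayState({"a": 1, "b": 2, "c": 3})
state.put("a", 10)  # block processing continues during the transition
state.put("d", 4)
while not state.convert_step(batch_size=2):
    pass  # in a client, this would be interleaved with processing new blocks
```

The key property illustrated here is that block processing never waits on the conversion: writes land in the new layer immediately, and the background step only fills in what has not been touched.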
It is important to note that the binary transition is one context where client diversity plays a vital role. Each client will need to either implement its own version of the transition or rely on other clients to perform the conversion and wait for the new trie once it is complete. This will certainly be a ‘measure twice, cut once’ kind of situation, with all client teams working together to implement tests and coordinate the switch. It is possible that, for the sake of safety and security, the network will need to pause briefly (e.g., mine several empty blocks) during the transition, but agreement on any specific plan is still too far off to predict.
Code Merkleization
Smart contract code makes up a major part of the Ethereum state trie (approximately 1 GB of the ~50GB of state). A witness for any interaction with a smart contract must supply the code being executed so that its codeHash can be computed, and that can be a lot of extra data. Code merkleization is a way of splitting contract code into smaller chunks and replacing codeHash with the root of another merkle trie. This would allow a witness to replace potentially large portions of smart contract code with reference hashes, saving critical kilobytes of witness data.
There are several possible code merkleization schemes, ranging from uniform chunking (for instance, into 64-byte pieces) on the simpler end to more sophisticated techniques such as static analysis based on Solidity’s functionId or JUMPDEST instructions. The best approach will ultimately depend on what works best with real data collected from mainnet.
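As an illustration of the simplest scheme mentioned above, the following sketch splits bytecode into fixed 64-byte chunks, builds a binary Merkle tree over them, and proves a single chunk against the root. Several things here are assumptions rather than part of any proposal: SHA-256 stands in for keccak256, odd levels are padded by duplicating the last node, the code is assumed to be non-empty, and all function names are made up for the example.

```python
import hashlib
from typing import List

CHUNK_SIZE = 64  # bytes per chunk, matching the example size in the text


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for keccak256


def chunk_code(code: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    return [code[i:i + size] for i in range(0, len(code), size)]


def build_levels(chunks: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, leaves first, root level last."""
    levels = [[h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:  # pad odd levels by duplicating the last node
            current = current + [current[-1]]
            levels[-1] = current
        levels.append([h(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Collect the sibling hash at each level on the way up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root: bytes, chunk: bytes, index: int, proof: List[bytes]) -> bool:
    node = h(chunk)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


code = bytes(range(256)) + bytes(44)  # 300 bytes of placeholder "bytecode"
chunks = chunk_code(code)
levels = build_levels(chunks)
code_root = levels[-1][0]             # would take the place of codeHash
proof = prove(levels, 4)
print(verify(code_root, chunks[4], 4, proof))
```

A witness built this way carries only the chunks a transaction actually executes plus a handful of hashes, instead of the entire contract. Smarter schemes would change how `chunk_code` draws its boundaries, not the proof mechanics.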
reGenesis
The best place to start with the reGenesis proposal is this explanation by @mandrigin or the complete proposal by @realLedgerwatch, but the short version is that reGenesis is essentially “spring cleaning for the blockchain”. The entire state would conceptually be divided into an ‘active’ and an ‘inactive’ state. Periodically, the whole ‘active’ state would be deactivated, and new transactions would start building an active state again from almost nothing (hence the name “reGenesis”). If a transaction needs an older part of the state, it would provide a witness very similar to what would be required for Stateless Ethereum: a Merkle proof showing that the state change is consistent with some piece of inactive state. If a transaction touches an ‘inactive’ part of the state, that part is automatically promoted to ‘active’ (regardless of whether or not the transaction is successful), where it remains until the next reGenesis event. This has the nice property of creating some of the economic bounds on state usage that state rent offered, without actually deleting any state. It also allows transaction senders who cannot generate a witness to simply keep retrying a transaction until everything it touches is ‘active’ again.
The interesting thing about reGenesis is that it brings Ethereum much closer to the ultimate goal of Stateless Ethereum while sidestepping some of the biggest challenges of statelessness, such as witness gas accounting during EVM execution. It also introduces a form of transaction witnesses moving around the network, which enables leaner, lighter clients and gives dapp developers more opportunity to get used to the stateless model and to witness production. “True” statelessness after reGenesis would then be a matter of degree: Stateless Ethereum is really just reGenesis after every block.
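The active/inactive mechanics described above can be sketched as a toy model. Everything here is a simplification for illustration: dictionaries stand in for the state tries, comparing a witness value with the stored inactive value stands in for verifying a Merkle proof, account creation is not modeled, and the class and method names are hypothetical and not taken from the proposal.

```python
class ReGenesisState:
    """Toy model of reGenesis with dicts in place of active and inactive tries."""

    def __init__(self, state: dict):
        self.active = {}             # starts (almost) empty after a reGenesis
        self.inactive = dict(state)  # everything else, provable via witnesses

    def regenesis(self):
        """Deactivate the whole active state and start over."""
        self.inactive.update(self.active)
        self.active = {}

    def apply(self, touched, witness: dict, execute) -> bool:
        """Run a transaction that touches `touched` keys and carries `witness`."""
        # Stand-in for checking Merkle proofs against the inactive state root.
        for key, value in witness.items():
            if key not in self.active and self.inactive.get(key) != value:
                raise ValueError(f"witness does not match inactive state: {key!r}")
        # Witnessed state is promoted whether or not the transaction succeeds.
        for key, value in witness.items():
            self.active.setdefault(key, value)
        # The transaction fails if anything it touches is still inactive.
        if any(key not in self.active for key in touched):
            return False
        execute(self.active)
        return True


def transfer(state: dict):
    state["alice"] -= 3
    state["bob"] += 3


chain = ReGenesisState({"alice": 10, "bob": 5})
first_try = chain.apply(["alice", "bob"], {"alice": 10}, transfer)   # bob still inactive
second_try = chain.apply(["alice", "bob"], {"bob": 5}, transfer)     # now fully active
print(first_try, second_try, chain.active)
```

The failed first attempt still leaves `alice` active, so the retry only needs a witness for `bob`. Calling `chain.regenesis()` afterwards would fold the updated balances back into the inactive state and empty the active set.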
State Network
An improved network protocol has been a ‘side-quest’ on the tech tree from the outset, but with the addition of reGenesis to the scope of Stateless Ethereum, finding alternative network primitives for sharing Ethereum chain data (including state) now aligns much better with the main quest. Ethereum’s current network protocol is a monolith, when in reality there are several distinct kinds of data that could be shared over different ‘sub-networks’ optimized for different purposes.
This concept was previously discussed as the “Three Networks” in earlier Stateless Ethereum discussions, with a DHT-based network able to serve some of the data that does not change from moment to moment more effectively. With the introduction of reGenesis, the ‘inactive’ state would fall into this category of static data and could in theory be served by a bittorrent-style swarming network, instead of piece-by-piece by a fully synced client as is done today.
A network that distributes the state that has not changed since the last reGenesis event would be a static state network, and it could potentially be built on top of the new Discovery v5.1 specification in the devp2p library (Ethereum’s networking protocol). Earlier proposals such as Merry-go-Round sync and the (more advanced) SNAP protocol for syncing active state would still be valuable steps towards a fully distributed dynamic state network for clients that want to sync the complete state quickly.
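To give a feel for how a DHT-based network might decide which nodes serve which piece of static state, here is a small sketch using a Kademlia-style XOR distance. This is an assumption made for illustration: the text above only says the network would be DHT-based, and the identifiers, key derivation, and replication factor below are all hypothetical.

```python
import hashlib
from typing import List


def to_id(label: str) -> int:
    """Derive a 256-bit identifier from a label (stand-in for real node/content IDs)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def closest_nodes(content_key: int, node_ids: List[int], k: int = 2) -> List[int]:
    """Pick the k nodes whose IDs are nearest to the content key by XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, content_key))[:k]


# Tiny 4-bit example so the distances are easy to check by hand.
small_nodes = [0b1000, 0b0111, 0b1011, 0b0010]
holders = closest_nodes(0b1010, small_nodes, k=2)

# Same idea with hashed identifiers for a hypothetical snapshot chunk.
nodes = [to_id(f"node-{i}") for i in range(8)]
chunk_key = to_id("inactive-state-chunk-42")
chunk_holders = closest_nodes(chunk_key, nodes, k=3)
```

Because static data never changes between reGenesis events, any node can compute the same set of holders for a chunk and fetch it from several of them in parallel, which is what makes a swarming approach attractive here.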
Wrapping up
A more concise and technical version of every leaf in the Stateless Tech Tree (not only the updated ones) can be found in the Stateless Ethereum specs repository, and active discussion of all the topics covered here takes place in the Eth1x/2 R&D Discord. Please request an invitation on ethresear.ch if you would like to join. As always, feel free to tweet @gichiba or @JHancock with feedback, questions, and suggestions for new topics.