January 14th tl;dc (too long, didn’t call)
Notice: This is a summary of the topics covered in the recurring Eth1.x research call and does not represent finalized plans or commitments to network upgrades.
The main topics of this call were:
- Rough data quantifying the benefits of switching to a binary trie structure
- Transition approaches and possible obstacles for a move to binary tries
- “Merklizing” contract code for proofs, and effects on gas scheduling/metering
- Chain pruning and historical chain/state data: implications for the network and methods of distribution.
Logistics
The weekend after EthCC (March 7-8), there will be a small 1.x research summit for a few days of focused discussion and work on these topics. Attendance will be capped at 40 people (due to venue limits), which should be enough for everyone expected to attend.
Additionally, there may be some informal, ad hoc meetups during Stanford Blockchain Week and ETHDenver, but nothing is formally scheduled.
The next call is tentatively scheduled for the first or second week of February, roughly halfway between now and the summit in Paris.
Technical discussion
EIP #2465
While not directly related to stateless Ethereum, this EIP improves the network protocol for transaction propagation, making it a fairly straightforward upgrade that aligns with the research goals. Support!
Binary trie size benefits
Switching to a binary trie structure (from the current hexary trie structure) should in theory reduce proof sizes by about 3.75x, but in practice the reduction might only be about half that, depending on how it is measured.
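One way to see where the theoretical 3.75x comes from: each level of a hexary proof carries up to 15 sibling hashes, while a binary proof carries one sibling per level but needs four times as many levels (since 16 = 2^4), so the ratio is 15/4. The sketch below assumes a perfectly balanced, fully populated trie and 32-byte hashes; real state tries are sparse and unevenly filled, which is one reason the practical gain is smaller.

```python
HASH_BYTES = 32  # assumed hash size (Keccak-256 output length)

def trie_depth(num_leaves: int, arity: int) -> int:
    """Levels needed for a balanced trie with this branching factor to hold num_leaves."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= arity
        depth += 1
    return depth

def proof_hash_bytes(num_leaves: int, arity: int) -> int:
    """Bytes of sibling hashes in a single-leaf proof (balanced, full trie assumed)."""
    siblings_per_level = arity - 1
    return trie_depth(num_leaves, arity) * siblings_per_level * HASH_BYTES

n = 16 ** 6  # 16,777,216 leaves, chosen so both tries are exactly full
hexary = proof_hash_bytes(n, 16)
binary = proof_hash_bytes(n, 2)
print(hexary, binary, hexary / binary)  # 2880 768 3.75
```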
Proofs consist of roughly 30% code and 70% hashes. The hashes within the trie can be reduced by about 3x, but the code does not benefit from a binary trie, since it must always be included in the proof in full. As a result, switching to a binary trie format would likely bring proof sizes to ~300-1400kB, down from ~800-3400kB with the hexary trie.
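A back-of-the-envelope check of why code dilutes the gain: only the hash portion of a proof shrinks, while the code portion is carried over unchanged. The sketch below applies the average 30/70 split and the ~3x hash reduction uniformly; the function and its defaults are illustrative assumptions, not how the figures above were measured.

```python
def binary_witness_estimate(hexary_kb: float, code_share: float = 0.30,
                            hash_reduction: float = 3.0) -> float:
    """Estimate proof size after switching to a binary trie.

    Code bytes are kept as-is; only the hash portion is divided by hash_reduction.
    """
    code_kb = hexary_kb * code_share
    hash_kb = hexary_kb * (1.0 - code_share)
    return code_kb + hash_kb / hash_reduction

for size in (800, 3400):
    print(size, "->", round(binary_witness_estimate(size)))
# 800 -> 427
# 3400 -> 1813
```

With the average split this works out to roughly a 1.9x overall reduction, about half of the theoretical 3.75x. The ~300-1400kB range quoted in the call is somewhat tighter than this uniform blend, so the sketch is only a sanity check on the mechanism, not a reproduction of those numbers.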
Executing the transition
Actually carrying out the transition to a binary trie is a separate challenge, with several open questions that still need to be worked out. There are essentially two strategies under consideration:
progressive transition: This is a ‘ship of Theseus’ model in which the entire state trie is migrated to the binary format account by account and storage slot by storage slot, as each piece of state is touched by EVM execution. This means that, for the foreseeable future, Ethereum’s state would be a hexary/binary hybrid, and accounts would need to be “activated” to update to the new trie format (possibly with a POKE opcode ;). The advantages are that the chain’s normal operation is not interrupted and no large-scale coordination is needed for the upgrade. The disadvantage is complexity: clients must support both the hexary and binary trie formats, and the process would never truly finish, because some parts of the state cannot be touched externally and would have to be explicitly activated by their owners, which is unlikely to happen for the entire state. The progressive approach would also require clients to change their databases to store state as a kind of ‘virtualized’ binary trie inside a hexary database layout, to avoid a sudden jump in storage requirements for all clients (note: this database improvement can be made independently of the full ‘progressive’ transition and would be worthwhile on its own).
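The progressive model can be pictured as two stores side by side, with each piece of state moving from the old one to the new one the first time execution touches it. In this toy sketch plain dicts stand in for the two tries (an assumption for illustration; no trie hashing is modeled), and the hypothetical `HybridState` class shows why the migration never completes on its own: state nobody touches stays in the legacy store.

```python
class HybridState:
    """Toy hexary/binary hybrid state; dicts stand in for the two tries."""

    def __init__(self, legacy):
        self.hexary = dict(legacy)  # not-yet-migrated accounts/slots
        self.binary = {}            # accounts/slots already migrated

    def read(self, key):
        if key in self.binary:
            return self.binary[key]
        if key in self.hexary:
            # First access by execution: migrate this piece of state.
            self.binary[key] = self.hexary.pop(key)
            return self.binary[key]
        return None  # key does not exist in either trie

    def write(self, key, value):
        self.hexary.pop(key, None)  # retire any legacy copy
        self.binary[key] = value    # new data always lands in the binary trie

    def migration_done(self):
        return not self.hexary

state = HybridState({"alice": 10, "bob": 5, "dormant": 1})
state.read("alice")
state.write("bob", 7)
print(sorted(state.binary), sorted(state.hexary), state.migration_done())
# ['alice', 'bob'] ['dormant'] False
```

The `dormant` entry stays in the hexary store until something touches it, which is the "never truly concludes" problem described above.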
compute and clean-cut: This approach would make the transition all at once through one or more hard forks, in which a future date is set for the switch and every network participant then recomputes the state as a binary trie and moves to the new format together. This might be seen as ‘simpler’ to implement from an engineering standpoint, but it is more complex in terms of coordination. The new binary trie state has to be computed ahead of the fork, which could take on the order of an hour, and it is not clear how transactions and new blocks would be handled during that window (they would need to be included in the not-yet-computed binary state trie, the legacy trie, or both). This is made harder still by the tendency of many miners and exchanges to upgrade their clients at the last minute before a fork. Alternatively, the entire chain could be paused temporarily while the new state is computed, which could be even harder, and more contentious, to coordinate.
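The expensive step in the clean-cut approach is that every node must recompute the whole state as a binary trie, which means hashing every account and storage slot at least once. The sketch below computes the root of a toy binary trie over 32-byte keys to show the shape of that work. The node layout (leaves at the shortest unique key prefix, 0x00/0x01 prefixes to separate internal nodes from leaves) is an assumption for illustration, not a proposed format, and `hashlib.sha3_256` is used as a stand-in because Ethereum’s Keccak-256 uses different padding and is not in the standard library.

```python
import hashlib

EMPTY = b"\x00" * 32  # root of an empty subtree (assumed convention)

def h(data: bytes) -> bytes:
    # Stand-in hash: SHA3-256, not Ethereum's Keccak-256.
    return hashlib.sha3_256(data).digest()

def bit(key: bytes, i: int) -> int:
    """The i-th bit of a key, most significant bit first."""
    return (key[i // 8] >> (7 - i % 8)) & 1

def binary_root(items, depth=0):
    """Root of a toy binary trie over (32-byte key, value) pairs with unique keys.

    Items are split on successive key bits; a single remaining item becomes
    a leaf, and internal nodes hash their left and right children.
    """
    if not items:
        return EMPTY
    if len(items) == 1:
        key, value = items[0]
        return h(b"\x01" + key + value)
    left = [kv for kv in items if bit(kv[0], depth) == 0]
    right = [kv for kv in items if bit(kv[0], depth) == 1]
    return h(b"\x00" + binary_root(left, depth + 1) + binary_root(right, depth + 1))

state = {h(b"alice"): b"\x0a", h(b"bob"): b"\x05", h(b"carol"): b"\x01"}
print(binary_root(list(state.items())).hex())
```

The root depends only on the keys and values, not on the order they are visited in, so every node converting the same pre-fork state should arrive at the same root.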
Both options are still ‘on the table’ and need more discussion before any decisions are made about next steps. The central concern is the trade-off between implementation complexity on one side and coordination difficulty on the other.
Code “chunking”
To address the code portion of proofs, some prototyping has been done on code ‘merklization’, which splits contract code into chunks before it is included in a proof. The basic idea is that if a method in a smart contract is called, the proof should only need to include the parts of the contract code that were actually executed, rather than the entire contract. This research is still at an early stage, but initial results suggest a roughly 50% reduction in the code portion of a proof. More ambitiously, code chunking could be extended into a single global ‘code trie’, but this idea is not well developed and likely has its own challenges that need further exploration.
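The core idea can be sketched as: record which byte offsets execution read, map them to fixed-size chunks, and include only those chunks. The 32-byte chunk size and helper names below are hypothetical, `executed_pcs` is assumed to list every byte the interpreter read (including PUSH immediates, which may spill into the next chunk), and the Merkle hashes needed to prove the chunks against the code root are ignored, so this only counts raw code bytes.

```python
CHUNK_SIZE = 32  # hypothetical chunk size in bytes

def touched_chunks(executed_pcs, chunk_size=CHUNK_SIZE):
    """Sorted indices of the code chunks containing any executed byte."""
    return sorted({pc // chunk_size for pc in executed_pcs})

def code_witness_bytes(code_len, executed_pcs, chunk_size=CHUNK_SIZE):
    """Code bytes a proof carries when only touched chunks are included."""
    total = 0
    for idx in touched_chunks(executed_pcs, chunk_size):
        start = idx * chunk_size
        total += min(chunk_size, code_len - start)  # last chunk may be short
    return total

# A 1000-byte contract where a call only runs two short code regions.
pcs = list(range(0, 40)) + list(range(600, 610))
print(touched_chunks(pcs))             # [0, 1, 18, 19]
print(code_witness_bytes(1000, pcs))   # 128
```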
There are different strategies for splitting code into chunks, which are then used to generate proofs. The first is ‘dynamic’: it looks for JUMPDEST instructions and splits near those points, so chunk sizes vary depending on the code being split. The second is ‘static’: it splits code into fixed-size chunks and includes metadata indicating where valid jump targets are located within each chunk. Either approach appears workable, and the two could potentially be compatible, leaving it to users to choose which one to use. In both cases, chunking enables a further reduction in witness sizes.
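Both strategies can be sketched in a few lines. The one subtlety is that a 0x5b byte is only a real JUMPDEST if it is not part of a PUSH instruction’s immediate data, so the code must be scanned instruction by instruction. The function names, the split-just-before-JUMPDEST rule, and the per-chunk metadata format (in-chunk offsets of valid jump targets) are assumptions chosen for illustration.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F

def valid_jumpdests(code):
    """Offsets of real JUMPDEST opcodes, skipping bytes inside PUSH data."""
    dests, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.append(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 immediate bytes
        pc += 1
    return dests

def dynamic_chunks(code):
    """'Dynamic' split: cut just before each JUMPDEST, so sizes vary."""
    bounds = [0] + [d for d in valid_jumpdests(code) if d > 0] + [len(code)]
    return [code[a:b] for a, b in zip(bounds, bounds[1:])]

def static_chunks(code, size=32):
    """'Static' split: fixed-size chunks tagged with in-chunk jump-target offsets."""
    dests = valid_jumpdests(code)
    return [(code[start:start + size],
             [d - start for d in dests if start <= d < start + size])
            for start in range(0, len(code), size)]

# PUSH1 0x5b (data, not a jump target), JUMPDEST, STOP, JUMPDEST, STOP
code = bytes([0x60, 0x5B, 0x5B, 0x00, 0x5B, 0x00])
print(valid_jumpdests(code))                             # [2, 4]
print([c.hex() for c in dynamic_chunks(code)])           # ['605b', '5b00', '5b00']
print([meta for _, meta in static_chunks(code, size=4)])  # [[2], [0]]
```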
(un)gas
An open question is what changes to gas scheduling would be needed or desirable once block witnesses are introduced. Witness generation needs to be paid for with gas. If code is chunked, there will be some overlap within a block when multiple transactions touch the same code, so parts of a block witness might be paid for more than once by the transactions in that block. A safe idea (and one that is good for miners) would be to make each transaction’s sender pay the full cost of that transaction’s own witness, and let the miner keep the overpayment. This minimizes the need to change gas costs and gives miners an incentive to produce witnesses, but it unfortunately breaks the current security model, under which sub-calls (within a transaction) are only trusted with a portion of the total committed gas. How to handle this change to the security model needs to be analyzed thoroughly. Ultimately, the goal is to charge each transaction for the cost of producing its own witness, in proportion to the code it touches.
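The “each sender pays for their own witness, the miner keeps the excess” idea can be made concrete with a little set arithmetic: each transaction is charged for every chunk it touches, while the block witness only contains each chunk once. The flat `gas_per_chunk` price and the chunk identifiers are hypothetical; no actual gas value was proposed on the call.

```python
def witness_charges(tx_chunks, gas_per_chunk):
    """Charge each transaction for every code chunk it touches.

    tx_chunks maps a transaction id to the set of chunk ids it touched.
    Returns the per-transaction charges, the cost of the deduplicated
    block witness, and the surplus the miner would keep.
    """
    charges = {tx: len(chunks) * gas_per_chunk for tx, chunks in tx_chunks.items()}
    unique_chunks = set().union(*tx_chunks.values())
    block_cost = len(unique_chunks) * gas_per_chunk
    surplus = sum(charges.values()) - block_cost
    return charges, block_cost, surplus

# Two transactions both call into contract A's first chunk.
txs = {"tx1": {"A:0", "A:1"}, "tx2": {"A:0", "B:0"}}
print(witness_charges(txs, gas_per_chunk=100))
# ({'tx1': 200, 'tx2': 200}, 300, 100)
```

The surplus is exactly the overlap: chunk `A:0` is paid for twice but appears in the block witness once.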
Wei Tang’s UNGAS proposal could make changes to the EVM easier. It is not strictly necessary for stateless Ethereum, but it offers an idea for how to simplify future breaking changes to gas schedules. The question to ask is: “What do the changes look like without and with UNGAS, and, taking that into account, does UNGAS actually make them significantly easier to implement?” Answering this requires experiments that run scenarios with merklized code and the new gas rules, and then evaluate what should change in terms of cost and execution in the EVM.
Pruning and data transmission
In a stateless paradigm, nodes that are missing some or all of the state need a way to tell the rest of the network what data they have and what data they lack. This has implications for network topology: stateless clients that lack data must be able to find the data they need on the network quickly and reliably, and must also be able to signal up front what data they do not have (and may need). Adding such a feature to one of the chain-pruning EIPs would be a networking (not consensus) protocol change, and it is something that could be done now.
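One simple way a node could advertise what it holds is a bitfield over a fixed numbering of data chunks, which peers can check before routing a request. The encoding and helper names below are hypothetical and not taken from any chain-pruning EIP or devp2p message; they only illustrate the “announce what you have and lack” idea.

```python
def encode_have(have, total):
    """Pack the set of chunk indices a node holds into a little bitfield."""
    bits = bytearray((total + 7) // 8)
    for i in have:
        bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)

def has_chunk(bitfield, i):
    """True if the advertised bitfield covers chunk i (i must be < total)."""
    return bool(bitfield[i // 8] & (1 << (i % 8)))

def peers_for(chunk, adverts):
    """Peers whose advertisement says they can serve `chunk`."""
    return [peer for peer, bf in adverts.items() if has_chunk(bf, chunk)]

adverts = {"A": encode_have({0, 1, 2}, 16), "B": encode_have({9, 15}, 16)}
print(adverts["A"])           # b'\x07\x00'
print(peers_for(9, adverts))  # ['B']
print(peers_for(3, adverts))  # []
```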
The other side of this problem is where to store historical data. The most promising solution proposed so far is an Ethereum-specific distributed storage network that can serve requested data. This could take several forms: the full state might be amenable to ‘chunking’, similar to contract code; partial-state nodes could be responsible for (randomly assigned) chunks of the state and serve them on request at the edges of the network; and clients could use additional data-routing mechanisms so that a stateless node can still get missing data through an intermediary (a peer that lacks the data itself but is connected to another node that has it). Whatever the implementation, the basic goal is for clients to be able to join the network and reliably obtain all the data they need without competing for access to full-state nodes, which is essentially what happens with LES nodes today. Work on these ideas is still at an early stage, but the geth team has promising results from experimenting with ‘state tiling’ (chunking), and turbo-geth is working on data routing for gossiping pieces of the state.
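For the “randomly assigned chunks” idea, one option that needs no central index is rendezvous (highest-random-weight) hashing: every node scores each (node, chunk) pair with a hash, and the top-scoring nodes are responsible for that chunk. This is a swapped-in, well-known technique used here for illustration, not something proposed on the call; SHA-256, the node ids, and the replica count are assumptions.

```python
import hashlib

def score(node_id: str, chunk: int) -> int:
    """Pseudorandom weight for a (node, chunk) pair."""
    digest = hashlib.sha256(f"{node_id}:{chunk}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def holders(chunk: int, nodes, replicas: int = 2):
    """Nodes responsible for a state chunk under rendezvous hashing.

    Any participant can compute this locally, so a stateless client
    knows whom to ask (or whom a relaying peer should ask) for a chunk.
    """
    return sorted(nodes, key=lambda n: score(n, chunk), reverse=True)[:replicas]

nodes = [f"node{i}" for i in range(10)]
print(holders(7, nodes))
```

A useful property of this choice is that when a node leaves, only the chunks it was responsible for move to new holders; every other assignment stays the same.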
As always, if you have questions about Eth1x efforts, suggestions for topics, or want to contribute or attend an event, please introduce yourself on ethresear.ch or reach out to @gichiba and/or @JHancock on Twitter.