Ethereum development has been moving at a rapid pace over the past month. The launch of PoC5 (“proof of concept five”) last month, just before the sale, was a major milestone for the project: for the first time we had two clients, one written in C++ and one in Go, interoperating and processing the same blockchain. Two weeks later the Python client joined the mix, and a Java version is now nearly complete. We are currently deploying an initial portion of the funds withdrawn from the Ethereum exodus address to expand our operations, and we are hard at work on PoC6, the next version in the series, which brings a number of improvements.
Currently, Ethereum is at a stage roughly comparable to Bitcoin in mid-2009: the clients and the protocol work, users can send transactions and build decentralized applications with contracts, even with appealing HTML and JavaScript user interfaces. However, the software is inefficient, the user interface is undeveloped, networking problems and bugs will take time to iron out, and there remains a considerable risk of security flaws and consensus failures. Before we can confidently release Ethereum 1.0, only four essential tasks remain: security testing at the protocol and network level, a more efficient virtual machine, a thorough test suite to verify inter-client compatibility, and a finalized consensus algorithm. All of these are now at the top of our priority list; in parallel, we are also working on powerful and easy-to-use tools for building decentralized applications, standard libraries for contracts, better user interfaces, light clients, and the other small features that take the development experience from good to great.
PoC6
The significant modifications slated for PoC6 are outlined as follows:
- The block time is reduced from 60 seconds to 12 seconds, using a new GHOST-based protocol that builds on our previous work to bring the block time down to 60 seconds
- The ADDMOD and MULMOD opcodes (unsigned modular addition and unsigned modular multiplication) are added at slots 0x14 and 0x15, respectively. Their main purpose is to make it easier to implement certain kinds of number-theoretic cryptographic algorithms, such as elliptic curve signature verification. Refer to this link for example code that uses these operations.
- The opcodes DUP and SWAP will be removed from their current locations. Instead, the new opcodes DUP1, DUP2 … DUP16 will occupy slots 0x80 … 0x8f, and similarly, SWAP1 … SWAP16 will be positioned at 0x90 … 0x9f. The DUPn operator duplicates the nth highest value in the stack to the top, whereas the SWAPn operator exchanges the highest and (n+1)-th highest values in the stack.
- The with statement is added to Serpent, as a manual way of using these opcodes for more efficient variable access. Example usage can be found here. Note that this is an advanced feature, and it comes with a caveat: if you nest so many layers inside a with statement that you end up accessing a variable more than 16 stack levels deep, compilation will fail. Eventually, the aim is for the Serpent compiler to switch intelligently between stack-based and memory-based variables as needed to maximize efficiency.
- The POST opcode is added at slot 0xf3. POST is similar to CALL, with two key differences: (1) it takes 5 inputs and produces 0 outputs (i.e. it does not return a value), and (2) its execution happens asynchronously, after everything else is done. More precisely, the transaction execution process now consists of (1) initializing a “post queue” containing the message embedded in the transaction, (2) repeatedly processing the first message in the post queue until the queue is empty, and (3) refunding gas to the transaction origin and processing suicides. The POST opcode adds a message to the post queue; a short sketch of this loop appears after this list.
- The hash of a block is now computed from the header only rather than from the entire block (something that should have been done from the beginning), the code hash of accounts with no code is now “” instead of sha3(“”) (saving 32 bytes for every non-contract account), and the destination address of contract creation transactions is now the empty string rather than twenty zero bytes.
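To make the post-queue semantics above concrete, here is a minimal Python sketch of the transaction execution loop (the function names are illustrative stand-ins, not actual client code, and the assumption that POSTed messages join the back of the queue is mine):

```python
from collections import deque

def execute_transaction(tx_message, apply_message, refund_and_process_suicides):
    """Sketch of the PoC6 transaction execution process described above.

    apply_message(msg) runs one message in the VM and returns the list of
    messages that the executed code queued via the POST opcode; both
    callables are hypothetical stand-ins, not real client APIs.
    """
    # (1) initialize the post queue with the message embedded in the transaction
    post_queue = deque([tx_message])

    # (2) repeatedly process the first message in the queue until it is empty;
    #     POSTed messages are assumed here to be appended at the back, so they
    #     run only after everything queued before them
    while post_queue:
        posted = apply_message(post_queue.popleft())
        post_queue.extend(posted)

    # (3) refund gas to the transaction origin and process suicides
    refund_and_process_suicides()
```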
On Efficiency
In addition to these changes, a major idea we are beginning to explore is “native contract extensions”. The idea comes from long internal and external discussions about the trade-off between keeping the virtual machine’s instruction set lean (“RISC”), limited to basic memory, storage and blockchain interaction, sub-calls and arithmetic, and making it richer (“CISC”), with features such as elliptic curve signature verification, a wider selection of hash algorithms, bloom filters, and data structures such as heaps. The argument for the smaller instruction set is twofold. First, it keeps the virtual machine simpler, making it easier to write multiple implementations and reducing the risk of security vulnerabilities and consensus failures. Second, no fixed set of opcodes will ever cover everything people want to do, so a more general-purpose approach is far more future-proof.
The case for more opcodes is simple efficiency. Consider the heap, a data structure supporting three operations: inserting a value, quickly reading the current smallest value, and removing the smallest value. Heaps are particularly useful for building decentralized markets; the simplest market design keeps a heap of sell orders, an inverted (i.e. highest-first) heap of buy orders, and repeatedly pops the top buy and sell orders off their heaps and matches them with each other as long as the highest bid is at least as high as the lowest ask. The way to do this reasonably quickly, with logarithmic time for insertion and removal and constant time for reading the minimum, is with a tree.
The fundamental invariant is that a parent node is always lower than both of its children. To insert a value, add it to the end of the lowest level (or the start of a new lowest level if the current one is full), then move the node up the tree, swapping it with its parent as long as the parent is greater than the child. Once this is done, the invariant holds again and the new node sits in its correct place in the tree.
To remove a node, we take the node at the top out, move a node from the lowest level into its place, and push that node down the tree as far as needed, swapping it with the smaller of its children while it is still larger than either of them.
To read the minimum, we, well, look at the top. The key point is that both operations take logarithmic time in the number of nodes in the tree; even if your heap has a billion items, adding or removing a node takes only about 30 steps. This is a somewhat nontrivial exercise in computer science, but if you are used to trees it is not particularly hard. Now, let’s see how to implement this in Ethereum code. The full code sample is here; for those interested, the parent directory also contains a batched market implementation using these heaps and an attempt at implementing futarchy using the markets. Below is the part of the heap algorithm that handles adding new values:
```
# push
if msg.data[0] == 0:
    sz = contract.storage[0]
    contract.storage[sz + 1] = msg.data[1]
    k = sz + 1
    while k > 1:
        bottom = contract.storage[k]
        top = contract.storage[k/2]
        if bottom < top:
            contract.storage[k] = top
            contract.storage[k/2] = bottom
            k /= 2
        else:
            k = 0
    contract.storage[0] = sz + 1
```
The layout we use is that contract.storage[0] stores the size (i.e. the number of values) of the heap, contract.storage[1] is the root node, and from there, for any n, contract.storage[n] is a node whose parent is contract.storage[n/2] and whose children are contract.storage[n*2] and contract.storage[n*2+1] (provided that n*2 and n*2+1 do not exceed the heap size, of course). Quite straightforward.
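For readers more comfortable with plain Python than Serpent, here is a minimal in-memory sketch of the same heap, using the same 1-indexed layout (slot 0 holds the size, slot 1 the root, the parent of n sits at n/2 and its children at 2n and 2n+1); it mirrors the structure described above rather than the on-chain contract:

```python
class Heap:
    """Min-heap in a 1-indexed list: slot 0 = size, slot 1 = root,
    parent of node n at n // 2, children at 2n and 2n + 1."""

    def __init__(self):
        self.storage = [0]                    # slot 0 holds the current size

    def push(self, value):
        self.storage.append(value)
        self.storage[0] += 1
        k = self.storage[0]
        # sift up: swap with the parent while the parent is larger
        while k > 1 and self.storage[k // 2] > self.storage[k]:
            self.storage[k], self.storage[k // 2] = self.storage[k // 2], self.storage[k]
            k //= 2

    def top(self):
        return self.storage[1]                # the minimum is always at the root

    def pop(self):
        smallest = self.storage[1]
        last = self.storage.pop()             # take a node from the lowest level
        self.storage[0] -= 1
        size = self.storage[0]
        if size > 0:
            self.storage[1] = last            # put it in the root's place
            k = 1
            # sift down: swap with the smaller child while a child is smaller
            while True:
                child = 2 * k
                if child > size:
                    break
                if child + 1 <= size and self.storage[child + 1] < self.storage[child]:
                    child += 1
                if self.storage[child] >= self.storage[k]:
                    break
                self.storage[k], self.storage[child] = self.storage[child], self.storage[k]
                k = child
        return smallest
```

For the market use case above, one would keep one such heap of sell orders and a second heap of buy orders keyed by negated price (so that the highest bid surfaces first), repeatedly popping from both while the best bid is at least the best ask.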
So, what’s the issue? In short, as stated above, the main issue is inefficiency. In theory, tree-based algorithms take about log(n) time for most of their operations. Here, however, the problem is that we have a tree (the heap) on top of a tree (the Ethereum Patricia tree storing the state) on top of yet another tree (LevelDB), so the market built this way actually incurs log³(n) overhead in practice, a considerable slowdown.
To illustrate further, over the past few days I have written, profiled and tested Serpent code for elliptic curve signature verification. The code is mostly a fairly straightforward port of pybitcointools, although with some recursion replaced by loops to improve efficiency. Even so, the gas cost is shocking: around 340000 on average for a single signature verification.
And that is after a number of optimizations are already in place. For example, take a look at the code for computing modular exponentiation:
```
with b = msg.data[0]:
    with e = msg.data[1]:
        with m = msg.data[2]:
            with o = 1:
                with bit = 2 ^ 255:
                    while gt(bit, 0):
                        # A hint of loop unrolling for 20% efficiency improvement
                        o = mulmod(mulmod(o, o, m), b ^ !(!(e & bit)), m)
                        o = mulmod(mulmod(o, o, m), b ^ !(!(e & div(bit, 2))), m)
                        o = mulmod(mulmod(o, o, m), b ^ !(!(e & div(bit, 4))), m)
                        o = mulmod(mulmod(o, o, m), b ^ !(!(e & div(bit, 8))), m)
                        bit = div(bit, 16)
                    return(o)
```
This consumes 5084 gas for any input. It is still a fairly simple algorithm; a more advanced implementation might speed it up by perhaps another 50%, but even then, iterating over 256 bits is expensive no matter how you do it.
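For reference, here is the same square-and-multiply logic in plain Python (a sketch of the algorithm only, not code that runs on-chain); the `b ^ !(!(e & bit))` expression in the Serpent above is simply a branch-free way of multiplying in b only when the corresponding bit of the exponent is set:

```python
def modexp(b, e, m):
    """Left-to-right square-and-multiply over the 256 exponent bits,
    scanning from the most significant bit downwards."""
    o = 1
    bit = 1 << 255
    while bit > 0:
        o = (o * o) % m          # always square
        if e & bit:              # multiply in b only when the bit is set
            o = (o * b) % m
        bit >>= 1
    return o

# sanity check against Python's built-in three-argument pow
assert modexp(7, 123456789, 1000000007) == pow(7, 123456789, 1000000007)
```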
What these two examples show is that high-performance, high-volume decentralized applications may in some cases be quite difficult to build on top of Ethereum without either complex opcodes that bake heaps, signature verification, and so on into the protocol, or something to take their place. The solution we are now working on, an idea championed by our lead developer Gavin Wood, aims to capture the benefits of both approaches, keeping the flexibility of simple opcodes while gaining the efficiency of natively implemented operations: native code extensions.
Native Code Extensions
Native code extensions work as follows. Suppose that there is some operation or data structure that we want Ethereum contracts to have access to, but that we can make much faster by implementing it in C++ or machine code. First, we write an implementation in Ethereum virtual machine code, test that it works, and publish that implementation as a contract. Then, we either write or find an implementation that does the job natively, and add a hook to the execution engine so that calls to that contract are handled by the native extension instead of by a sub-call into the virtual machine. Hence, instead of the elliptic curve recovery operation taking 22 seconds, it would take only 0.02 seconds.
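In code, the hook amounts to a lookup before each sub-call; a rough Python sketch of the dispatch is below (the registry and all names here are illustrative assumptions, not actual client internals):

```python
# Hypothetical registry mapping the addresses of published extension contracts
# to their vetted native implementations.
NATIVE_EXTENSIONS = {}

def register_extension(address, native_fn):
    NATIVE_EXTENSIONS[address] = native_fn

def call_contract(address, message_data, run_evm):
    """Dispatch a sub-call: use the native routine if one is registered for
    this address, otherwise interpret the contract's EVM code as usual."""
    native = NATIVE_EXTENSIONS.get(address)
    if native is not None:
        # same inputs and outputs as the published contract, just much faster
        return native(message_data)
    return run_evm(address, message_data)
```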
The challenge is making sure that the fees for these native extensions stay reasonable, and this is where things get complicated. First, let’s simplify the problem somewhat and see where the economic analysis leads. Suppose that miners have access to a magic oracle that tells them the maximum amount of time a given contract can take to execute. Without native extensions, this oracle already exists in a crude form: it is simply the transaction’s STARTGAS. With native extensions it becomes more complicated, since a contract whose STARTGAS is 1000000 may well use several native extensions and run drastically faster than that limit suggests. But for the sake of this analysis, assume that it exists.
Now, suppose a user sends a transaction that spends 1500 gas on miscellaneous business logic plus 340000 gas on an optimized elliptic curve operation which, thanks to the native extension, costs the miner only the equivalent of 500 gas of ordinary execution to process. The prevailing market transaction fee is 1 szabo (i.e. micro-ether) per gas. The user sets a GASPRICE of 0.01 szabo, in effect paying for 3415 gas at the market rate, because he is unwilling to pay for all 341500 gas of the transaction but knows that miners can process it for an effort equivalent to 2000 gas. The user sends the transaction, and it reaches a miner. There are two possible cases:
- The miner has enough unconfirmed transactions in its mempool, and enough processing power, to produce a block whose total gas consumed approaches the per-block gas limit (which, as a reminder, is 1.2 times the long-term exponential moving average of the gas used in recent blocks). In this case, the miner has a fixed amount of gas to fill, so it wants the highest GASPRICE it can get, and a transaction offering 0.01 szabo per gas against a market rate of 1 szabo per gas is summarily ignored.
- Either there are not enough unconfirmed transactions, or the miner is small and unwilling or unable to process every transaction. In that case, the dominant factor in deciding whether to include a transaction is its ratio of reward to processing time. Here the miner’s incentives are aligned, and since this transaction offers roughly a 70% better reward-to-cost ratio than most others, it will be accepted (the arithmetic behind that figure is spelled out in the sketch below).
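For concreteness, here is a quick check of the numbers used in the example, written out in Python (only the figures given above are used):

```python
market_rate   = 1.00       # szabo per gas, the prevailing fee level
user_gasprice = 0.01       # szabo per gas offered by the user

charged_gas = 1500 + 340000                  # business logic + EC code as metered
fee_paid    = charged_gas * user_gasprice    # 3415 szabo
real_cost   = 1500 + 500                     # gas-equivalent effort for a miner
                                             # running the native extension

# At the market rate, the fee is worth 3415 gas of ordinary work against
# roughly 2000 gas of actual effort: about a 70% better reward-to-cost ratio.
print(fee_paid / (real_cost * market_rate))  # ~1.71
```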
What we see is that, given our magic oracle, such transactions will be accepted, although they may need to wait a few extra blocks before being included. Over time, the block-level gas limit would rise as more contract extensions are used, making room for even more of them. The main concern is that if such mechanisms become too widespread, and the average block ends up spending more than 99% of its gas inside native extensions, then the regulatory mechanism that prevents large miners from producing extremely large blocks as a denial-of-service attack on the network would be weakened: at a gas limit of 1000000000, a malicious miner could create an unoptimized contract that really does consume that much computation, and stall the network.
So, taken together, we face two problems. One is the theoretical concern that the gas limit becomes a weaker safeguard, and the other is that no magic oracle actually exists. Fortunately, we can solve the second problem, and in doing so also limit the impact of the first. The naive solution is simple: instead of GASPRICE being a single value, a transaction would carry one default GASPRICE plus a list of [address, gasprice] pairs for specific contracts. As soon as execution enters an eligible contract, the virtual machine would keep track of how much gas was spent in that context and refund the transaction sender accordingly. To keep gas counts from getting too far out of hand, the secondary gas prices would be required to be at least 1% (or some other fraction) of the default gasprice. The downside is that this is space-inefficient, taking up roughly 25 extra bytes per contract listed. A possible fix is to let people register fee tables on the blockchain and have transactions simply point to the fee table they wish to use. In any case, the exact mechanism has not been finalized, so native extensions may have to wait until PoC7.
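As a rough illustration of the kind of scheme described above (and only that: as noted, the mechanism is not finalized, and the names, the example address, and the exact formula here are assumptions), the per-context pricing and refund could work something like this:

```python
def transaction_fee(default_gasprice, special_prices, gas_by_context):
    """default_gasprice: the ordinary GASPRICE of the transaction.
    special_prices: {contract_address: gasprice} pairs attached to the tx.
    gas_by_context: gas spent per context, with None meaning gas spent
    outside any specially priced contract. Sketch only, not a spec."""
    fee = 0
    for address, gas in gas_by_context.items():
        price = special_prices.get(address, default_gasprice)
        # secondary prices are floored at 1% of the default gasprice
        price = max(price, default_gasprice * 0.01)
        fee += gas * price
    return fee

# e.g. 1500 gas of ordinary execution plus 340000 gas inside a (hypothetical)
# signature-verification contract priced at 1% of a 1 szabo/gas default:
print(transaction_fee(1.0, {"0xsigverify": 0.01}, {None: 1500, "0xsigverify": 340000}))
# -> 4900.0 szabo, versus 341500 szabo at the flat rate
```

In effect, the sender is charged the default price only for the work that is genuinely expensive for the miner, and gets the rest back as a refund.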
Mining
The other change likely to come in PoC7 is a new mining algorithm. We (primarily Vlad Zamfir) have been slowly working on the mining algorithm in our mining repository and have reached a working proof of concept, although substantial research is still needed to improve its ASIC resistance. The basic idea of the algorithm is to randomly generate a new circuit every 1000 nonces; a device capable of running the algorithm would need to be able to process every circuit that might be generated, and in theory there is some circuit our generator could produce that is equivalent to SHA256, or BLAKE, or Keccak, or any of the other algorithms in X11. Such a device would therefore need to be a general-purpose computer; in essence, the goal is something approaching mathematically provable specialization-resistance. To make sure that every generated hash function is secure, a SHA3 is always applied at the end.
Of course, perfect specialization-resistance is impossible; there will always be some parts of a CPU that such an algorithm leaves unused, so a nonzero theoretical ASIC speedup is unavoidable. Currently, the largest threat to this approach is probably some kind of rapidly switching FPGA. However, there is an economic argument that CPUs will survive even if ASICs offer a speedup, provided that the speedup is low enough; see my previous article on mining for an outline of the details. One trade-off we may have to make is whether or not to make the algorithm memory-hard: ASIC resistance is hard enough as it is, and memory-hardness may get in the way of that goal (cf. Peter Todd’s argument that memory-based algorithms may actually encourage centralization), but if the algorithm is not memory-hard, it may end up being GPU-friendly. At the same time, we are looking into hybrid proof-of-stake scoring functions as a way of augmenting proof of work with additional security, requiring 51% attacks to simultaneously carry a large economic cost.
With the protocol stabilizing, another area ripe for development is what we are starting to call “Ethereum 1.5”: mechanisms built on top of Ethereum as it stands today, without any new changes to the core protocol, that allow contracts and decentralized applications to run with greater scalability and efficiency, whether by cleverly aggregating and batching transactions or by using the blockchain only as a fallback enforcement mechanism, with only the nodes that care about a particular contract running that contract by default. There are a number of mechanisms in this category; this is an area that will receive substantially more attention from us, and hopefully from others in the community as well.
