Heartfelt appreciation to Gavin Wood, Vlad Zamfir, our security evaluators, and others for several ideas that contributed to the insights shared in this article
From its inception, one of Ethereum’s aims—and arguably its primary purpose—has been to provide a significant level of abstraction. Instead of confining users to a limited range of transaction types and applications, the platform empowers anyone to create any sort of blockchain application by scripting and uploading it to the Ethereum blockchain. This endows Ethereum with a level of resilience and impartiality that surpasses many other blockchain frameworks: even if society concludes that blockchains are not particularly beneficial for finance, but rather intriguing for supply chain monitoring, self-operating vehicles, self-replenishing dishwashers, and gambling on chess matches without trust, Ethereum will still maintain its utility. Nevertheless, there exist numerous aspects where Ethereum lacks the level of abstraction it could potentially achieve.
Cryptography
At present, all Ethereum transactions are signed utilizing the ECDSA method, specifically the secp256k1 curve employed by Bitcoin. Elliptic curve signatures are a widely used type of signature today, notably due to their smaller signature and key sizes compared to RSA: an elliptic curve signature occupies only 65 bytes, whereas an RSA signature takes up several hundred bytes. However, there is a growing recognition that the specific signature mechanism adopted by Bitcoin is less than ideal; ed25519 is increasingly acknowledged as a preferable substitute, especially because of its more straightforward implementation, increased resistance to side-channel vulnerabilities, and quicker verification times. Should quantum computing become a reality, we may need to transition to Lamport signatures.
A recommendation that some of our security evaluators and others have presented is to incorporate ed25519 signatures as an optional feature in version 1.1. But what if we could remain true to our essence of abstraction and extend it further: allowing individuals to utilize any cryptographic verification algorithm they choose? Is that feasible to implement securely? Well, we have the Ethereum virtual machine, enabling individuals to execute arbitrary cryptographic verification algorithms; we just need to determine how it can be integrated.
Here is a potential strategy:
- Each account that is not classified as a contract is assigned a piece of “verification code.”
- When a transaction is initiated, it must explicitly identify both sender and recipient.
- The initial stage of processing a transaction involves calling the verification code, utilizing the transaction’s signature (now a simple byte array) as input. If the verification code produces any non-empty output within 50000 gas, the transaction is considered valid. If it outputs an empty array (i.e., exactly zero bytes; a single x00 byte does not count) or terminates with an exception condition, then it is denied.
- To facilitate individuals without ETH to establish accounts, we devise a protocol that allows the generation of verification code offline, using the hash of that code as an address. Individuals can transfer funds to this address. The first time a transaction is sent from that account, the verification code must be supplied in a distinct field (we might use the nonce for this, as in scenarios where this occurs, the nonce would inherently be zero) and the protocol (i) verifies that the verification code is accurate, and (ii) integrates it (this is similar to “pay-to-script-hash” in Bitcoin).
This method offers a few advantages. Firstly, it does not dictate anything regarding the utilized cryptographic algorithm or the signature format, barring that it must be limited to 50000 gas (this figure can be modified over time). Secondly, it retains the existing system’s property that no prior registration is necessary. Thirdly, and significantly, it enables users to impose higher-level validity conditions that depend on the state: for example, preventing transactions that exceed your available GavCoin from executing rather than merely getting recorded on the blockchain without consequence.
However, considerable modifications to the virtual machine will be needed for this to function effectively. The current virtual machine is well-structured for handling 256-bit numbers, accurately capturing the hashes and elliptic curve signatures in use, but it is less efficient for algorithms with varying sizes. Moreover, regardless of the current VM’s design, it fundamentally introduces a layer of abstraction between the code and the machine. Therefore, if this is to be one of the future applications of the VM, an architecture that directly translates VM code to machine code, with transformations applied to interpret specialized opcodes and guarantee security, will likely prove optimal—especially for costly and intricate cryptographic algorithms like zk-SNARKs. Even then, precautions must be taken to decrease any “startup costs” of the virtual machine to further enhance efficiency and reduce denial-of-service susceptibility; coupled with this, a gas cost framework that encourages the reuse of existing code while heavily penalizing distinct code usage for each account could also enhance performance by allowing just-in-time-compiling virtual machines to preserve a cache.
The Trie
Arguably, the most crucial data structure in Ethereum is the Patricia tree. The Patricia tree serves as a data structure that, much like the conventional binary Merkle tree, permits any data component contained in the trie to be securely verified against a root hash using a logarithmically sized (i.e., relatively concise) hash chain; it also holds the essential feature that allows for rapid addition, removal, or modification of data in the tree, necessitating only minor alterations to the overall structure. The trie is utilized in Ethereum to manage transactions, receipts, accounts, and, importantly, the storage linked to each account.
A frequently mentioned limitation of this method is that the trie is a specific data structure,optimized for a specific array of use cases, however, in numerous scenarios, accounts may perform more effectively with an alternative model. The prevalent requirement is a heap: a data construct that allows elements to be swiftly appended with a priority value, and from which the lowest-priority element can consistently be rapidly extracted – especially advantageous in market implementations featuring bid/ask propositions.
Presently, the sole method to achieve this is through a fairly ineffective workaround: creating a heap implementation in Solidity or Serpent on top of the trie. This essentially indicates that every modification to the heap necessitates a logarithmic number of alterations (for instance, at 1000 elements, ten changes; at 1000000 elements, twenty changes) to the trie, and each amendment to the trie requires modifications to a logarithmic quantity (once again ten at 1000 elements and twenty at 1000000 elements) of items, with each requiring an update to the leveldb database that internally uses a logarithmic-time-updateable trie. If contracts had the capability to incorporate a heap directly as a protocol feature, this additional burden could be significantly reduced.
One potential solution to address this issue is the straightforward approach: simply allow contracts to choose between either a conventional trie or a heap, and finalize it. A seemingly superior solution, however, is to generalize even further. The approach is outlined as follows. Instead of having just a trie or a treap, we can utilize an abstract hash tree: there is a root node, which may be empty or may represent the hash of one or more children, and each child may either be a terminal value or the hash of its own set of children. An extension could involve permitting nodes to maintain both a value and children. This would all be encoded in RLP; for example, we might specify that every node must conform to the following format:
[val, child1, child2, child3....]
Where val must be a sequence of bytes (we may limit it to 32 if preferred), and each child (which can be none or more) must be the 32-byte SHA3 hash of another node. Now, we have the execution environment of the virtual machine maintain a “current node” pointer, and implement a few opcodes:
- GETVAL: pushes the value of the node at the current pointer onto the stack
- SETVAL: assigns the value at the node located at the current pointer to the value on the top of the stack
- GETCHILDCOUNT: retrieves the number of children of the node
- ADDCHILD: introduces a new child node (starting with zero of its own)
- REMOVECHILD: removes a child node
- DESCEND: moves down to the kth child of the current node (taking k as an argument from the stack)
- ASCEND: moves up to the parent node
- ASCENDROOT: moves up to the root node
Accessing a Merkle tree containing 128 elements would thus appear as follows:
def access(i): ~ascendroot() return _access(i, 7) def _access(i, depth): while depth > 0: ~descend(i % 2) i /= 2 depth -= 1 ```html return ~getval()
The creation of the tree would appear as follows:
def create(vals): ~ascendroot() while ~getchildcount() > 0: ~removechild() _create(vals, 7) def _create(vals:arr, depth): if depth > 0: # Recursively create left child ~addchild() ~descend(0) _create(slice(vals, 0, 2**(depth - 1)), depth - 1) ~ascend() # Recursively create right child ~addchild() ~descend(1) _create(slice(vals, 2**(depth - 1), 2**depth), depth - 1) ~ascend() else: ~setval(vals[0])
Evidently, the trie, the treap, and indeed any other tree-structured data framework could consequently be created as a library on these techniques. Notably, what stands out is that each distinct opcode operates within constant time: theoretically, every node can keep track of the links to its descendants and ancestor at the database level, needing merely one layer of overhead.
Nevertheless, this method also presents disadvantages. Specifically, be aware that losing control over the structure of the tree results in forfeiting the capacity to perform optimizations. Presently, the majority of Ethereum clients, including C++, Go, and Python, maintain a higher-level cache allowing for updates to and reads from storage to occur in constant time when numerous reads and writes are executed within a single transaction. If tries become non-standardized, subsequent optimizations
“`like these turn unfeasible. Moreover, each separate trie configuration would need to establish its own gas expenses and its own methods to guarantee that the tree remains shielded from exploitation: quite a challenging issue, considering even our own trie exhibited a moderate level of susceptibility until recently when we substituted the trie keys with the SHA3 hash of the key instead of the actual key. Thus, it is uncertain whether going to such lengths is justified.
Currency
It is widely recognized and acknowledged that an open blockchain necessitates some form of cryptocurrency to motivate individuals to engage in the consensus process; this embodies the essence of the somewhat ridiculous meme:
Nevertheless, is it feasible to create a blockchain that does not depend on any particular currency, instead permitting individuals to conduct transactions using any currency they prefer? In a proof-of-work scenario, especially one that only relies on fees, this is actually quite straightforward for a basic currency blockchain; simply impose a block size limit and allow miners and transaction initiators to negotiate a balance regarding the transaction price (the transaction fees could very well be processed as a batch payment via credit card). However, for Ethereum, the situation is slightly more intricate. The reason being that Ethereum 1.0, in its current form, possesses an integrated gas mechanism that enables miners to securely accept transactions without the worry of being subjected to denial-of-service attacks; the mechanism functions in the following manner:
- Each transaction designates a maximum gas limit and a fee to be paid per gas unit.
- Let’s assume the transaction permits a gas limit of N. If the transaction is legitimate and requires fewer than N computational steps (let’s say, M computational steps), it pays the fee corresponding to M steps. If the transaction utilizes all N computational steps before concluding, the execution is reverted but it still pays a fee for N steps.
This mechanism relies on the availability of a specific currency, ETH, which is regulated by the protocol. Is it possible to replicate it without depending on any single currency? It appears that the answer is affirmative, as long as we integrate it with the aforementioned “use any cryptography you desire” approach. The strategy is as follows. Initially, we slightly expand the applicability of the previously mentioned cryptography-neutrality principle: instead of creating a distinct concept of “verification code” to determine the validity of a transaction, we simply assert that there is only one type of account – a contract, and a transaction is merely a message arriving from the zero address. If the transaction exits with an exceptional condition within 50000 gas, the transaction is deemed invalid; otherwise, it is accepted as valid. Within this framework, we then configure accounts to incorporate the following code:
- Verify if the transaction is accurate. If it is not, exit. If it is, remit a gas payment to a master contract that will subsequently reimburse the miner.
- Deliver the actual message.
- Transmit a message to alert the master contract. The master contract then assesses the amount of remaining gas and refunds a fee proportional to the available amount to the sender, while forwarding the remainder to the miner.
Step 1 can be developed in a uniform format, so that it evidently consumes less than 50000 gas. Similarly, Step 3 can be constructed. Step 2 can then include the message supplying a gas limit equal to the specified gas limit of the transaction minus 100000. Miners can then pattern-match to exclusively approve transactions that fit this standard format (new standard formats can, of course, be introduced over time), and they can be confident that no individual transaction will deprive them of more than 50000 computational energy units. Consequently, everything is entirely enforced by the gas limit, allowing miners and transaction initiators to utilize any currency they choose.
A challenge that presents itself is: how do you compensate contracts? Presently, contracts possess the capability to “charge” for services, utilizing code analogous to this registry example:
def reserve(_name:bytes32): if msg.value > 100 * 10**18: if not self.domains[_name].owner: self.domains[_name].owner = msg.sender
With a sub-currency, there is no evident mechanism for correlating a message and a compensation for that message. However, there are two general patterns that can serve as substitutes. The first is a type of “receipt” interface: upon sending a currency payment to an individual, you can request the contract to retain the sender and value of the transaction. An instance like registrar.reserve(“blahblahblah.eth”) could thus be replaced by:
gavcoin.sendWithReceipt(registrar, 100 * 10**18) registrar.reserve("blahblahblah.eth")
The cryptocurrency would contain code resembling this:
def sendWithReceipt(to, value): if self.balances[msg.sender] >= value: self.balances[msg.sender] -= value self.balances[to] += value self.last_sender = msg.sender self.last_recipient = to self.last_value = value def getLastReceipt(): return([self.last_sender, self.last_recipient, self.value]:arr)
Furthermore, the registrar would function in the following manner:
def reserve(_name:bytes32): r = gavcoin.getLastReceipt(outitems=3) if r[0] == msg.sender and r[1] == self and r[2] >= 100 * 10**18: if not self.domains[_name].owner: self.domains[_name].owner = msg.sender
Fundamentally, the registrar would verify the most recent payment executed in that currency agreement and confirm that it is a payment directed to itself. To avert the potential for double-utilization of a payment, it might be prudent to have the get_last_receipt method eliminate the receipt during the reading process.
The alternative pattern involves having a currency that offers an interface permitting another address to withdraw funds from your account. Consequently, the code would appear as follows from the caller’s side: initially, authorize a one-time withdrawal of a specified number of currency units, then make a reservation, and the reservation contract will endeavor to execute the withdrawal, proceeding only if the withdrawal is successful:
gavcoin.approveOnce(registrar, 100) registrar.reserve("blahblahblah.eth")
And the registrar would consist of:
def reserve(_name:bytes32): if gavcoin.sendCoinFrom(msg.sender, 100, self) == SUCCESS: if not self.domains[_name].owner: self.domains[_name].owner = msg.sender
The second pattern has been formalized in the wiki page on Standardized Contract APIs.
Currency-agnostic Proof of Stake
The preceding allows for the creation of an entirely currency-agnostic proof-of-work blockchain. Nevertheless, to what degree can currency-agnosticism be incorporated into proof of stake? Currency-agnostic proof of stake is valuable for two main reasons. Firstly, it fosters a more robust perception of economic neutrality, thereby increasing the likelihood of acceptance by pre-existing established entities, as it wouldn’t be perceived as favoring a specific elite group (bitcoin holders, ether holders, etc.). Secondly, it amplifies the total amounts deposited, as individuals possessing digital assets aside from ether would incur minimal personal cost in contributing some of those assets to a deposit contract. At first glance, it appears to be a challenging issue: unlike proof of work, which is fundamentally reliant on an external and neutral resource, proof of stake is inherently grounded in some form of currency. So how far can we advance?
The initial step is to strive to establish a proof of stake system that operates using any currency, utilizing some type of standardized currency interface. The concept is straightforward: anyone would be able to engage in the system by providing any currency as a security deposit. A market mechanism would then be employed to ascertain the value of each currency, allowing for the estimation of the amount of each currency that would be necessary to submit for acquiring a stake deposit slot. A simple initial approximation could involve maintaining an on-chain decentralized exchange and reading price data feeds; however, this overlooks issues of liquidity and sockpuppeting (e.g., it is simple to create a currency, distribute it across a limited number of accounts, and simulate a value of $1 trillion per unit); thus, a more coarse-grained and direct approach is demanded.
To conceptualize what we are seeking, consider David Friedman’s account of a specific aspect of the ancient Athenian legal system:
The Athenians devised a clear solution to the challenges of generating public goods, such as maintaining a warship or organizing a public festival. If you were one of the wealthiest Athenians, every two years you were required to produce a public good, with the designated magistrate informing you which one.
“As you no doubt know, we are sending a team to the Olympics this year. Congratulations, you are the sponsor.”
Or
“Behold that splendid trireme docked. This year, guess who gets to serve as captain and paymaster.”
Such a duty was referred to as a liturgy. There were
“`two methods to evade it. One way was to demonstrate that you had already participated in another liturgy this year or had completed one the previous year. The other was to illustrate that there was another Athenian, wealthier than you, who had not engaged in one last year and was not partaking in one this year.
This presents a clear enigma. How, in a realm devoid of accountants, income taxes, or public documentation detailing individuals’ assets and their worth, do I demonstrate that you possess greater wealth than I do? The solution is not found in the accountant’s realm but rather in that of an economist—take a moment to ponder this before flipping the page.
The resolution was straightforward. I propose to barter everything I possess for everything you own. If you decline, you have conceded that you are indeed wealthier than I am, thus allowing you to undertake the liturgy that was meant for me.
Here, we devise a rather clever scheme for stopping affluent individuals from feigning poverty. However, what we seek now is a plan to prevent impoverished individuals from masquerading as wealthy (or more specifically, to hinder those releasing minimal value into the proof of stake security deposit system from pretending they are staking a significantly larger amount).
A straightforward method could be a barter system like this, but executed in reverse through a voting mechanism: to enter the stakeholder pool, you would require the approval of 33% of existing stakeholders, yet every stakeholder granting you approval would have to accept the condition that they can swap their stake for yours: a stipulation they would likely avoid if they suspected that the value of your stake could indeed diminish. Stakeholders would then impose an insurance charge for endorsing a stake that is likely to substantially decline against the prevailing currencies utilized in the stake pool.
The aforementioned plan has two significant drawbacks. Firstly, it inherently leads to currency centralization; if one currency dominates, it will be the most convenient and secure option for staking as well. If there exist two assets, A and B, the process of joining with currency A in this context implies acquiring an option (in the financial sense of the term) to obtain B at the exchange rate of A:B and the price at the moment of joining, which consequently would have an associated cost (which can be evaluated using the Black-Scholes model). Just joining with currency A would be more straightforward. Nevertheless, this can be adjusted by prompting stakeholders to continually vote on the pricing of all currencies and assets utilized in the stake pool—an incentivized vote, as the vote mirrors both the asset’s significance from the perspective of the system and the exchange rate at which the assets could be forcibly exchanged.
A further, more critical issue arises from the potential for pathological metacoins. For instance, one could envision a currency that is supported by gold but has an additional rule, imposed by the institution backing it, declaring that forced transfers initiated by the protocol “do not count”; that is, if such a transfer occurs, the allocation before the transfer is frozen and a new currency is generated using that allocation as its foundation. The old currency no longer enjoys backing by gold, and the new one does. Athenian forcible-exchange protocols can take you far when you can genuinely force property exchanges, but when one can intentionally create pathological assets that arbitrarily bypass certain transaction categories, it becomes considerably more challenging.
Theoretically, the voting mechanism can indeed circumvent this dilemma: nodes can simply opt not to induct currencies they perceive as dubious, and the default strategy can lean towards conservatism, embracing a very limited number of currencies and assets. Overall, we consider currency-agnostic proof of stake to be an open issue; it remains uncertain just how far it can reach, and the ultimate outcome may very likely be some quasi-subjective amalgamation of TrustDavis and Ripple consensus.
SHA3 and RLP
Now, we arrive at the final components of the protocol that we have yet to dissect: the hash algorithm and the serialization algorithm. Here, regrettably, abstracting elements proves significantly more challenging, and it is also considerably tougher to discern the value. First off, it is essential to recognize that even though we have illustrated how we could theoretically abstract away the trees utilized for account storage, it is much more difficult to envision how we could abstract the top-level trie monitoring the accounts themselves. This tree is inherently system-wide, so one cannot merely assert that different users will possess varied versions of it. The top-level trie relies on SHA3, which necessitates some specific hashing algorithm to remain intact. Even the lower-level data structures will likely need to adhere to SHA3; otherwise, there exists a risk of employing a hash function that is not collision-resistant, rendering the entire framework no longer strongly cryptographically validated and possibly leading to discrepancies between full clients and light clients.
RLP is likewise unavoidable; at a minimum, each account must include code and storage, which need to be kept together somehow, thus constituting a serialization format. Fortunately, however, SHA3 and RLP are arguably the most thoroughly tested, future-ready, and robust segments of the protocol, so the advantage of transitioning to alternatives is quite minimal.