Ethereum can be sufficiently straightforward to comprehend from a high-level perspective: Decentralized applications driven by the same kind of crypto-economic assurances that support Bitcoin. However, once one zooms in to, for instance, a detailed view, complexities arise quickly.
Even if one possesses a solid understanding of proof-of-work, it may not be immediately obvious how that relates to a blockchain doing more than tracking everyone’s unspent transaction outputs. Bitcoin employs computational labor to decentralize currency. Ethereum utilizes computational effort to decentralize abstract calculations. Huh? That abstraction is referred to as the Ethereum Virtual Machine, and it serves as the core of the Ethereum protocol because “within” the EVM lies the exclusive domain of smart contracts, which are ultimately responsible for all those ludicrous #defi tweets.
Upgrading the EVM stands as one of the key milestones of the Stateless Ethereum Tech Tree, and before we delve into the intriguing work there, I believe it’s wise to first address the obvious inquiry: “WTF is the EVM?”. In the first part of this two-part series, we will revisit the basics and attempt to grasp the EVM from the ground up, so that later we can really engage with the current discussions about topics like Code Merklization and UNGAS—even elements from the thrilling realm of Eth2 like Execution Environments!
WTF is the EVM?
When first-year Algebra scholars are introduced to that well-known function f(x), an analogy of “the function machine” is often employed. The notion of deterministic input/output appears to be significantly more accessible for students to visualize as a tangible physical machine processing data. I appreciate this analogy as it works in both directions: The EVM, which in a sense actually is a literal machine operating, can be conceptualized as a function that receives some state as inputs and generates a new one based on a set of arbitrary rules.
Setting aside the particulars of those rules for now, let’s assume that the only acceptable state transitions are those stemming from valid transactions (which adhere to the rules). The abstract machine that will determine a new state (S’) based on an old valid state (S) and a new collection of valid transactions (T) is the Ethereum state transition function:
Y(S, T)= S’
The first aspect that’s crucial to grasp about this function is that, as an abstraction, it’s somewhat of a mathematical placeholder: arguably not a tangible entity, and definitely not the EVM. The Ethereum state transition function is elegantly inscribed in Greek in the yellow paper because envisioning the EVM as a black box function effectively aids in conceptualizing the entire blockchain system (of which the EVM is merely one segment). The bidirectional link between functions and machines is determinism: Given any legitimate input, both should yield one and only one output.
However, the EVM, as previously mentioned, is in some respects a tangible machine operating in the world. The EVM’s physical manifestation cannot be articulated in the same manner one might refer to a cloud or an ocean wave, yet it does exist across thousands of interconnected computers running Ethereum clients. And at any given moment, there is one and only one canonical Ethereum state, which is what we are concerned with. All the other components within an Ethereum client exist solely to maintain consensus regarding which state is valid.
The term ‘canonical’ is utilized because ‘valid’ isn’t entirely fitting; a state transition calculated correctly is ‘valid’, yet it may still not be incorporated “on-chain” as part of the canon. Determining which states are canonical and which states are not is solely the duty of miners executing proof-of-work on the chain. Everyone engaging with Ethereum mainnet has, either literally or symbolically, “invested in” one specific particular state history, namely the one with the most computational effort invested in it, as determined by Ethereum’s Greedy Heaviest Observed Subtree (GHOST) protocol. With each new block added to the network comes a fresh set of transactions, a state transition, and a newly determined output state ready to be forwarded into the next canonical block, as dictated by miners. And this process continues indefinitely; that’s how the Ethereum blockchain operates.
We’ve thus far ‘black-boxed’ the EVM as the state transition function (machine) that accepts previous valid blocks and a selection of new transactions (as input), performs some calculations on it, and produces a new valid state (as output). The other components of the Ethereum protocol (such as miners selecting canonical blocks) provide necessary context, but it’s now time for some deeper analysis. What about those specific rules we previously set aside? How does the EVM compute a new state? How can a singular machine handle everything from simple balance transfers to elliptic curve algebra?
The Steampunk Stack Machine
The best way I can introduce the concept of a stack machine is through this cartoon depiction of Babbage’s Analytical Engine (credit: Sydney Padua), which was designed in 1837 but never constructed:
With the majority of individuals now carrying extraordinarily powerful electronic computers in their pockets, it’s easy to forget that computers don’t necessarily have to be electronic or that powerful. Babbage’s Analytical Engine serves as a very (hypothetically) real example of a Turing-complete (!) computer that, if it were built, would have operated on steam and punch cards. The EVM is in significant ways much more closely related to the Analytical Engine of two centuries ago than to the CPU in the device you’re utilizing to read this article.
The EVM is a stack machine, and although in actuality, it’s a virtualized machine operating withinnumerous Ethereum clients concurrently, I find it beneficial to envision the EVM as a tangible, albeit more sophisticated (yet still steam-driven) iteration of the Analytical Engine. This analogy might appear somewhat exaggerated, but I urge you to bear with it for a while as it becomes quite illustrative when we approach the topic of gas and a communal execution atmosphere.
The steampunk EVM would serve as a mechanical computer that operates by manipulating tangible punch cards. Each card would possess 256 slots for hole punches, allowing each card to symbolize any integer from 0 to 2^256. To execute a computation, one might visualize this device, through an intricate system of compressed air, positioning the cards denoting numbers and functions into a stack, adhering to a straightforward principle of “first in, last out”, it would sequentially PUSH new cards atop the stack or POP cards from the top for subsequent use. These might consist of new numbers for calculation, or arithmetic actions such as ADD or MULTIPLY, but they could equally be unique commands such as to STORE a card or set of cards for future use. Since the cards are merely binary, the operations must also be ‘encoded’ into a binary format; thus, we refer to them as operational codes, or simply opcodes for brevity.
If the stack mechanism were calculating 4 * 5 + 12, it would operate as follows:
_POP value 4 from the stack, retain it in memory. POP the value 5 from the stack, hold it in memory. POP the value _ from the stack; transfer everything in memory to the multiplication module; PUSH the returned outcome (20) to the stack. POP the value 20 from the stack; remember it in memory. POP the value 12 from the stack; maintain it in memory. POP the value + from the stack; send everything in memory to the addition module; PUSH the returned outcome (32) onto the stack. (Source: The EVM Runtime Environment)
We can visualize opcodes like ADD or MULTIPLY as distinct modules incorporated into the device, positioned close to the stack for quick access. When the machine is required to multiply 4 and 5, it would relay both cards to the “multiplication engine,” which might click and hiss before delivering the number 20 punched into a new card to PUSH back to the stack’s apex.
The “authentic” EVM contains numerous diverse opcodes for executing various functions. A specific minimal viable collection of these opcodes is necessary to perform generalized computation, and the EVM includes all of them (combined with some unique ones for cryptographic processes, such as the SHA-3 hash function). For better or worse, the notion of whether the EVM is (or isn’t) Turing-complete has been a topic of longstanding debate— it is this stack-based framework which possesses the trait of Turing-completeness: The EVM’s execution rules can, in principle, given ample time and sufficient memory, run any imaginable computer program provided it’s compiled to the appropriate 256-bit words and executed in the stack.
Compiling a program in our alternative realm would require forming a booklet of punch cards containing the relevant data and opcodes. This is literally (or, figuratively literally, whatever) the process occurring behind the scenes when you draft a smart contract in a high-level language like Solidity and convert it into bytecode. You can gain a clear understanding of how a programming language is transformed into machine code by reviewing this humorously annotated output of a Solidity compiler.
Up until now, the state has not yet been addressed, but remember that we set out to comprehend the principles by which a state transition can be computed. We can now articulate it a bit more clearly: The EVM embodies the physical instantiation (read: instance) of the state transition function. A valid state in Ethereum is one calculated by the EVM, and the canonical state is the valid state with the maximum computational effort expended on it (as determined by the GHOST protocol).
(Ideal) Gas
We might envision Babbage completing the imaginary Ethereum Stack Engine and subsequently proclaiming that all mathematical computations and solutions for exceedingly challenging problems were now attainable. He would encourage mathematicians and engineers to organize their issues as ‘transactions’ and submit them to be processed by Lady Lovelace into punch cards, to be run through the global computer. (By the way, Lovelace was the first individual to ever write a computer program, making her the original compiler). Given that the machine is designed to be an execution of the EVM and part of a broader Ethereum steampunk cosmos, we would need to conceptualize the state as a type of expansive Merkleized library catalog that would be refreshed daily based on a predetermined assortment and sequence of transactions deemed ‘canonical’ and archived.
The dilemma with this vision is that a genuine, mechanical EVM would incur extraordinarily high operational costs. The movement of gears, coiling of springs, and activation of various pneumatic chambers sorting punch cards would consume tons of coal each day. Who would shoulder the financial burden of perpetually operating the engine? Suppose that five mathematicians wished to execute their programs on a specific day, but there was time available for only three. How would such resource management challenges be addressed? The approach that Ethereum adopts seems, paradoxically, significantly more intuitive when considering a large and unwieldy mechanical computer: Charge money for computation and memory utilization!
Envisioning the operations of the stack machine as powered by compressed air, one could quantify the precise amount of gas necessary for performing an ADD operation and juxtapose it against the (much greater) quantity of gas required for SHA3. The schedule of gas fees for each opcode could be made publicly accessible, and anyone submitting a program would need to ensure they provided at least sufficient funds for theircomputation and storage capacity based on the gas fees (which may be related to the cost of coal or the need for computation). The ultimate stroke of brilliance is to make the machine’s state itself a ledger for accounts and balances, enabling a user to incorporate payment for their computation within the transaction itself.
As you might be aware, gas in an Ethereum transaction represents the costs of computation and memory for the EVM. Transaction gas fees must be settled in ETH and are non-recoverable once execution occurs, regardless of whether the action is successful or not. If a contract invocation exhausts its gas limit at any moment during an operation, it results in an out-of-gas error.
The gas mechanism cleverly fulfills two roles: it efficiently allocates the shared computational resources of the EVM in line with demand, and it offers adequate protection against endlessly looping programs (a challenge that emerges from Turing-completeness).
In the upcoming edition of “The 1.X Files”
I trust this whimsical mechanical illustration of a stack machine has been useful. If you found delight in contemplating the steampunk EVM as much as I have, and if you appreciate historically plausible alternate reality comic books, do take a look at “The Thrilling Adventures of Babbage and Lovelace” mentioned earlier; it will not disappoint.
Grasping something so abstract can be quite challenging, but there are subjects in the Stateless Tech Tree that will be considerably simpler to tackle with a relatively complete (even if somewhat cartoon-like) mental visualization of an EVM implementation.
One such subject is the implementation of Code Merkleization into the EVM, which would significantly minimize the size of witnesses by dividing compiled contract code into smaller segments. Next time, we will be able to delve into these topics directly.
As always, if you have any inquiries, feedback, suggestions for new subjects, or steampunk Ethereum fanfiction, please reach out to @gichiba or @JHancock on Twitter.