“`html
Given that many of us have a bit more available time, I believed this would be an excellent chance to engage with a subject that may be somewhat dull and laborious, yet remains quite essential to the Stateless Ethereum initiative: grasping the formal Witness Specification.
Like the commander of the Battleship in StarCraft, we will adopt a steady pace. The witness specification is not a particularly intricate concept, but it is indeed very profound. That profundity can be slightly intimidating, yet it is certainly worthwhile to delve into, as it will yield insights that, perhaps to your enthusiastic delight, stretch well beyond the sphere of blockchains, or even the realm of software!
By the conclusion of this introductory guide, you should possess at least a foundational confidence in your capability to comprehend what the formal Stateless Ethereum Witness Specification entails. I will also aim to make it a bit more entertaining.
Recap: What you must understand about State
Stateless Ethereum is, of course, somewhat of a mischaracterization, as the state is genuinely at the core of this entire endeavor. Specifically, the aim is to discover a way to make retaining a copy of the entire Ethereum state an optional requirement. If you haven’t been keeping up with this series, it might be worth checking out my previous primer on the status of stateless Ethereum. I’ll provide a brief TL;DR here, though. Feel free to skim if you reckon you’ve already grasped this topic quite well.
The complete ‘state’ of Ethereum encapsulates the current condition of all accounts and balances, along with the collective memories of all smart contracts that have been deployed and are operational within the EVM. Each finalized block in the chain possesses one and only one state, which is unanimously acknowledged by all participants within the network. That state is modified and updated with each new block that is incorporated into the chain.
The Ethereum State is represented in silico as a Merkle-Patricia Trie: a hashed data structure that arranges each singular piece of information (e.g., an account balance) into one extensive interconnected unit that can be verified for uniqueness. The complete state trie is too gargantuan to visualize, but here’s a ‘toy model’ that will be useful when we address witnesses:
Like enchanting cryptographic caterpillars, the accounts and code of smart contracts reside in the leaves and branches of this tree, which through successive hashing ultimately results in a single root hash. If you wish to ascertain that two copies of a state trie are identical, you can merely compare the root hashes. Sustaining relatively secure and indisputable consensus over one ‘canonical’ state is the essence of what a blockchain is crafted to achieve.
To submit a transaction for inclusion in the next block, or to verify that a specific modification is coherent with the last incorporated block, Ethereum nodes must maintain a thorough copy of the state and repeatedly re-compute the root hash. Stateless Ethereum introduces a series of alterations that will eliminate this necessity by introducing what is known as a ‘witness’.
A Witness Overview
Before we delve into the witness specification, it will be beneficial to have an intuitive grasp of what a witness signifies. Again, a more comprehensive explanation can be found in the post regarding the Ethereum state linked above.
A witness is akin to a cheat sheet for an unaware (stateless) student (client). It comprises merely the essential information needed to clear the exam (submit a valid state change for inclusion in the subsequent block). Instead of poring over the entire textbook (maintaining a copy of the current state), the uninformed student (stateless client) requests a friend (full node) for a crib sheet to submit their responses.
In very abstract terms, a witness provides all of the necessary hashes within a state trie, coupled with certain ‘structural’ information regarding the location of those hashes in the trie. This allows an ‘unaware’ node to incorporate new transactions into its state and to compute a new root hash locally – without necessitating them to download an entire copy of the state trie.
Let’s transition from the cartoonish notion to a more tangible illustration. Here is a “real” depiction of a witness:
I suggest opening this image in a new tab to allow you to zoom in and truly appreciate it. This witness was chosen due to its relatively small size and ease of feature identification. Each tiny square in this image denotes a single ‘nibble’, or half of a byte, and you can confirm that yourself by counting the number of squares you traverse, starting from the root and arriving at an Ether balance (you should count 64). While we examine this image, note the substantial segment of code within one of the transactions that must be included for a contract call — code constitutes a considerably large portion of the witness, and could be diminished by code merkleization (which we’ll investigate on another occasion).
Some Formalities
One of the fundamental unique characteristics of Ethereum as a protocol is its autonomy from any particular implementation. This is why, rather than having just one official client as seen in Bitcoin, Ethereum comprises several entirely distinct versions of clients. These clients, crafted in various programming languages, must comply with The Ethereum Yellow Paper, which elucidates in much more formal terms how any client should function to partake in the Ethereum protocol. This ensures that a developer creating a client for Ethereum does not encounter any ambiguity within the system.
The Witness Specification aims to fulfill this exact purpose: to offer an unequivocal description of what a witness is, which will facilitate its implementation across any language, for all clients. If and when Stateless Ethereum gains traction, the witness specification…
“`can be incorporated into the Yellow Paper as an annex.
When we utilize the term unambiguous in this instance, it signifies something more robust than what you might mean in everyday language. It’s not merely that the formal specification is just a highly detailed account of what a witness is and how it functions. It implies that, ideally, there exists precisely one way to describe a specific witness. In other words, if you conform to the formal specification, it would be unfeasible for you to create an implementation for Stateless Ethereum that produces witnesses distinct from any other implementation also adhering to the stipulations. This is crucial, because the witness is intended to (optimistically) evolve into a new foundation of the Ethereum protocol; it must be valid inherently.
A Question of Semantics (and Syntax)
While ‘blockchain development’ often connotes something fresh and exhilarating, it should be noted that a significant portion is rooted in much older and wiser disciplines of computing, cryptography, and formal reasoning. This truly manifests in the Witness Specification! To comprehend its functionality, we must be comfortable with some of the technical jargon, and to achieve that, we will need to make a brief diversion into linguistics and formal language theories.
Recite the following two sentences aloud, paying special attention to your intonation and rhythm:
- furiously sleep ideas green colorless
- colorless green ideas sleep furiously
I wager the first sentence came out feeling somewhat mechanical, with flat emphasis and a pause after each term. In contrast, the second sentence likely felt more natural, albeit somewhat whimsical. Even though it didn’t truly convey any meaning, the second sentence made sense in a manner that the first did not. This serves as an intuition pump to focus on the contrast between Syntax and Semantics. If you’re a speaker of English, you have an understanding of what the words denote (their semantic content), but that was largely irrelevant here; what you noticed was a difference between legitimate and illegitimate grammar (their syntax).
This illustrative sentence originates from a 1956 paper by Noam Chomsky, a name you may recognize. Although he is currently known as a significant political and social theorist, Chomsky’s initial contributions as a scholar were in the realm of logic and linguistics, and within this document, he established one of the most beneficial classification systems for formal languages.
Chomsky focused on the mathematical depiction of grammar, how languages can be categorized based on their grammatical rules, and what traits those categories possess. One such trait pertinent to us is syntactic ambiguity.
Ambiguous Buffalo
Consider the grammatically accurate sentence “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.” — this classic example showcases just how ambiguous English syntax rules can be. If you understand that, depending on the scenario, the term ‘buffalo’ can function as a verb (to intimidate), an adjective (from Buffalo, NY), or a noun (a bison), you can interpret the sentence based on the contextual placement of each word.
We could also rephrase entirely using different terms and distinct sentences: “Are you familiar with those NY bison that the other NY bison intimidate? Well, they intimidate, also. To be precise, they intimidate NY bison.”
But if we wish to eliminate the ambiguity while still confining our terminology to just ‘buffalo’, and maintain it as a single sentence? It is feasible, yet we need to slightly adjust the rules of English. Our new “language” will be more precise. One approach to accomplish that is to mark each word to clarify its part of speech, like this:
Buffalo{pn} buffalo{n} Buffalo{pn} buffalo{n} buffalo{v} buffalo{v} Buffalo{pn} buffalo{n}
Perhaps that is still not entirely clear for the reader. To enhance clarity, let’s utilize a bit of substitution to help us group some of these “buffalo” together. Any bison from Buffalo, NY is fundamentally just one specific instance of what we could define as a “noun phrase”, or
where ::= signifies “What is on the left side can be replaced by what is on the right side”. Significantly, we don’t want this relationship to be reversible; imagine how frustrated the Boulder buffalo would become!
Implementing our substitution rule to the complete sentence, it would transform to:
Now, this remains somewhat perplexing, because in this sentence there lurks a clever relative clause, which becomes much clearer by inserting the word ‘that’ into the first portion of our sentence, i.e.
Therefore, let us create a substitution rule that consolidates the relative clause into
Moreover, since a relative clause merely provides further clarification about a noun phrase, the two combined are equivalent to just another noun phrase:
With these rules defined and applied, we can articulate the sentence as:
That appears quite impressive, and really gets at the central connection this amusing statement conveys: One specific group of bison frightening another group of bison.
We’ve progressed this far, so why not go all the way? Whenever ‘buffalo’ as a verb comes before a noun, we could designate that as a verb phrase, or
And with this, we have our single complete valid statement, which we could refer to as S:
S ::=
What we’ve accomplished here might be better depicted visually:
That structure looks intriguingly familiar, doesn’t it?
The buffalo illustration is a bit absurd and not particularly rigorous, but it’s close enough to illustrate what’s happening with the strange mathematical language of the Witness Specification, which I have quite slyly brought up in my discourse about buffalo. It’s referred to as Backus-Naur form notation, and it’s frequently utilized in formal specifications like this, across a range of real-world situations.
The ‘substitution guidelines’ we established for our restricted English language facilitated ensuring that, given a herd of “buffalo”, we could formulate a ‘valid’ sentence without needing to know anything about the meaning of the word buffalo in reality. In the classification first clarified by Chomsky, a language that has sufficiently precise rules of grammar that allow you to achieve this is termed a context-free language.
More critically, the guidelines guarantee that for every possible sentence made up of the word(s) buffalon, there exists one and only one method to construct the data structure shown in the tree diagram above. Clarity FTW!
Proceed and Explore the Spec
Witnesses are fundamentally just a single extensive object, encoded into a byte array. From the (anthropomorphic) viewpoint of a stateless client, that array of bytes might resemble a lengthy sentence composed of very similar looking words. As long as all clients adhere to the same set of rules, the byte array should translate into one and only one hashed data structure, regardless of how the implementation decides to represent it in memory or on disk.
The production rules, detailed in section 3.2, are somewhat more intricate and significantly less instinctive than those we utilized for our toy example, but the essence remains very much the same: To be unambiguous directives for a stateless client (or a developer crafting a client) to follow and be confident that they’re getting it correct.
I’ve skimmed over quite a bit within this discussion, and the rabbit hole of formal languages extends much deeper, to be sure. My intent here was simply to provide enough of an introduction and foundation to overcome that initial hurdle of comprehension. Now that you’ve surmounted that challenge, it’s time to open Wikipedia and tackle the rest on your own!
As always, if you have feedback, inquiries, or topic requests, please @gichiba or @JHancock on Twitter.