Sincere appreciation to Vlad Zamfir, Chris Barnett, and Dominic Williams for concepts and motivation
In a recent blog article, I detailed some provisional solutions for scalability, all of which fall under the umbrella of Ethereum 1.0 in its current form. Tailored micropayment protocols like channels and probabilistic payment systems could facilitate minor payments, utilizing the blockchain solely for eventual settlement, or merely probabilistically. For certain computation-intensive applications, computation can be performed by one entity by default, but in a manner that can be “pulled down” for auditing by the entire network if any party suspects wrongdoing. Nonetheless, these strategies are inherently application-specific and far from optimal. In this article, I present a more all-encompassing solution, which, although it does come with some “fragility” issues, offers a resolution that is significantly closer to universal applicability.
Grasping the Aim
To begin with, before delving into the specifics, we must attain a much deeper comprehension of our actual goals. What do we mean by scalability, especially in the context of Ethereum? In relation to a Bitcoin-like currency, the explanation is fairly straightforward; our objectives are to:
- Handle tens of thousands of transactions each second
- Ensure a transaction fee of under $0.001
- Achieve all this while preserving security against threats of at least 25% and avoiding heavily centralized full nodes
The initial goal alone is simple; we merely eliminate the block size limit and allow the blockchain to naturally expand until it reaches that scale, and the market self-regulates to compel smaller full nodes to continue exiting until the only three remaining are operated by GHash.io, Coinbase, and Circle. At that stage, a balance will arise between fees and size, as excessive size induces greater centralization, which in turn results in increased fees due to monopolistic pricing. To achieve the second aim, we can simply introduce numerous altcoins. However, to realize all three collectively, we must transcend a fundamental obstacle posed by Bitcoin and every other current cryptocurrency, and devise a system that operates without the requirement for any “full nodes” that must process every transaction.
In an Ethereum context, the notion of scalability becomes slightly more complex. Ethereum is, at its core, a platform for “dapps,” and within that framework, there exist two types of scalability that are pertinent:
- Enable a multitude of individuals to create dapps while keeping transaction fees minimal
- Allow each distinct dapp to achieve scalability in a manner akin to that for Bitcoin
The first is inherently less challenging than the second. The only characteristic that the “develop numerous alt-Etherea” approach lacks is that each individual alt-Ethereum has relatively fragile security; at a scale of 1000 alt-Etherea, each one would be susceptible to a 0.1% attack from the perspective of the total system (that 0.1% pertains to externally-sourced assaults; internally-sourced offenses, such as collusion between GHash.io and Discus Fish, would necessitate only 0.05%). If we could devise a method for all alt-Etherea to share consensus strength, e.g., some variation of merged mining that allows each chain to harness the strength of the entire group without necessitating the existence of miners simultaneously aware of all chains, we would reach our goal.
The second presents more challenges, as it leads to the same fragility characteristic that emerges from scaling Bitcoin as a currency: if every node perceives only a limited portion of the state, and arbitrary amounts of BTC can legitimately emerge in any part of the state deriving from any section of the state (such fungibility is part of a currency’s definition), one can intuitively discern how forgery assaults might propagate through the blockchain unnoticed until it is too late to reverse all without causing significant disruption to the system via a global rollback.
Reinventing the Wheel
We’ll commence by outlining a relatively straightforward model that does provide both types of scalability, but delivers the latter only in a very limited and expensive manner; effectively, we possess just enough intra-dapp scalability to guarantee asset fungibility, but not much beyond that. The model functions as follows:
Assume that the global Ethereum state (i.e., all accounts, contracts, and balances) is divided into N segments (“substates”) (consider values of N around 100, as in the example below), plus one special “hub” substate. We then introduce two mechanisms. First, a message is a tuple [dest_substate, address, value, data]. Secondly, there exists an opcode CROSS_SEND, which takes those four parameters as inputs, and sends such a unidirectional message to the designated substate.
Miners mine blocks on a particular substate s[j], and each block on s[j] concurrently acts as a block in the hub chain. Each block on s[j] has dependencies on the preceding block on s[j] and the preceding block on the hub chain. For instance, with N = 2, the chain would appear somewhat like this:
The block-level state transition function, while mining on substate s[j], accomplishes three tasks:
- Manages state transitions within s[j]
- If any of those state transitions generates a CROSS_SEND, incorporates that message into the hub chain
- If any messages exist on the hub chain with dest_substate = j, eliminates the messages from the hub chain, sends them to their corresponding destination addresses on s[j], and processes all resulting state transitions
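The three-step block-level state transition above can be sketched in Python. Everything here — the `Message` and `Substate` classes, and the treatment of every transaction as a value-carrying CROSS_SEND — is a hypothetical simplification for illustration, not the actual Ethereum data model:

```python
# Illustrative sketch of the hub-relay model; hypothetical data structures.
from dataclasses import dataclass, field

@dataclass
class Message:
    dest_substate: int
    address: str
    value: int
    data: bytes = b""

@dataclass
class Substate:
    balances: dict = field(default_factory=dict)

def process_block(j, cross_sends, substates, hub_queue):
    """Apply a block mined on substate j:
    1. run the state transitions within s[j] (here: debit each sender),
    2. push the resulting CROSS_SEND messages onto the hub chain,
    3. deliver and remove hub messages whose dest_substate == j."""
    for sender, msg in cross_sends:
        substates[j].balances[sender] = substates[j].balances.get(sender, 0) - msg.value
        hub_queue.append(msg)
    delivered = [m for m in hub_queue if m.dest_substate == j]
    hub_queue[:] = [m for m in hub_queue if m.dest_substate != j]
    for m in delivered:
        substates[j].balances[m.address] = substates[j].balances.get(m.address, 0) + m.value
    return delivered
```

Note that a message credited on the destination substate only after a block is mined there, which is why hub crossings are slow and expensive relative to intra-substate transactions.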
From a scalability viewpoint, this yields a considerable enhancement. All miners only need to be aware of two out of the total N + 1 substates: their own substate, and the hub substate. Dapps that are small and self-sufficient will operate on a single substate, while dapps that wish to operate across multiple substates will need to relay messages through the hub. For instance, a cross-substate currency dapp would maintain a contract on every substate, and each contract would feature an API that enables a user to destroy currency units within a specific substate in return for the contract sending a message that credits the user with an equivalent amount on a different substate.
Messages traversing the hub must be acknowledged by every node, making these transactions costly; nevertheless, in the scenario of ether or sub-currencies, we only require the transfer mechanism to be utilized occasionally for settlement, managing off-chain inter-substate exchanges for the majority of transfers.
Attacks, Challenges, and Responses
Now, let’s take this straightforward framework and evaluate its security features (for illustrative purposes, we’re using N = 100). Firstly, the framework is resilient against double-spending attacks up to 50% of the total hashpower; this is because each sub-chain is essentially merge-mined with all other sub-chains, with every block fortifying the security of all sub-chains at once.
However, there exist more perilous types of attacks. Suppose a malicious attacker possessing 4% hashpower infiltrates one of the substates, thereby gaining control of 80% of the mining capacity on it. At this point, that attacker generates blocks that are invalid—for instance, the attacker includes a state transition that generates messages that give out 1000000 ETH to every other substate without basis. Other miners within the same substate will identify the malicious miner’s blocks as invalid, but this is inconsequential; they only represent a minuscule portion of the total network, constituting just 20% of that substate. The miners on alternate substates remain unaware that the attacker’s blocks are invalid, as they lack information regarding the status of the “compromised substate,” leading them to potentially accept the blocks without scrutiny.
Fortunately, the solution is more intricate but remains well within the capabilities of our current understanding: as soon as one of the few legitimate miners on the compromised substate handles the invalid block, they will recognize its invalidity, identifying the specific fault. Consequently, they can produce a light-client Merkle tree proof which demonstrates that a particular segment of the state transition was invalid. To elaborate on how this functions, a light client proof comprises three components:
- The initial state root from which the state transition commenced
- The concluding state root that the state transition arrived at
- The portion of Patricia tree nodes that are accessed or altered during the execution of the state transition
The first two components correspond to the roots of the Ethereum Patricia state tree prior to and following the execution of the transaction; the Ethereum protocol mandates that both must be included in every block. The specified Patricia state tree nodes are essential for the verifier to follow along the computation and verify that the identical outcome is reached at the conclusion. For instance, if a transaction alters the state of three accounts, the collection of tree nodes that require provision might resemble the following:
Technically, the proof should encompass the array of Patricia tree nodes required to access the initial state roots and the transaction also, but that’s a relatively minor aspect. Overall, the proof can be perceived as comprising the minimal set of data from the blockchain necessary to facilitate that specific transaction, along with some additional nodes to establish that those portions of the blockchain are indeed part of the current state. Once the whistleblower constructs this proof, it will be disseminated throughout the network, and all other miners will view the proof and reject the invalid block.
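To make the fraud-proof idea concrete, here is a toy sketch in which the Patricia tree is replaced by a flat two-level Merkle construction (a root over sorted per-account leaf hashes). The `leaf`, `root`, and `verify_transition` functions and the hashing scheme are illustrative stand-ins, not the real Ethereum trie:

```python
# Toy fraud proof: the prover supplies the claimed pre- and post-state roots,
# the touched accounts in full, and leaf hashes for everything else (standing
# in for the sibling Patricia nodes of a real proof).
import hashlib

def h(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def leaf(key, value) -> str:
    return h(f"{key}:{value}".encode())

def root(leaves: dict) -> str:
    # Stand-in for a Patricia root: hash over the sorted leaf hashes.
    return h("".join(sorted(leaves.values())).encode())

def verify_transition(pre_root, post_root, touched, untouched_leaves, transition):
    """Replay `transition` over just the `touched` accounts and check that it
    carries the claimed pre-root to the claimed post-root."""
    leaves = dict(untouched_leaves)
    leaves.update({k: leaf(k, v) for k, v in touched.items()})
    if root(leaves) != pre_root:
        return False  # supplied data inconsistent with the claimed pre-state
    after = transition(dict(touched))
    leaves.update({k: leaf(k, v) for k, v in after.items()})
    return root(leaves) == post_root
```

A whistleblower can thus prove an invalid state transition using only the handful of accounts the transaction touches, without shipping the whole substate.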
Nevertheless, the most formidable class of attack is identified as a “data unavailability attack.” In this scenario, envision that the miner disseminates solely the block header to the network, along with the list of messages intended for addition to the hub, but fails to provide any of the transactions, intermediate state roots, or any additional data. This leads to an issue. Theoretically, it is entirely conceivable that the block is wholly legitimate; the block could have been properly constructed by collating transactions from a few affluent individuals who happened to be exceptionally generous. In actuality, however, this is not the case, and the block is fraudulent, but the unavailability of the data completely obstructs the construction of a definitive proof of the fraud. The 20% honest miners on the compromised substate may raise alarms, but they possess no evidence whatsoever, and any protocol that heeds their claims would inevitably succumb to a 0.2% denial-of-service attack where the miner seizes 20% of a substate and asserts that the other 80% of miners on that substate are colluding against him.
To tackle this challenge, we need what is referred to as a challenge-response protocol. Fundamentally, the mechanism operates as follows:
- Honest miners on the compromised substate observe the block that only includes the header.
- A legitimate miner issues a “challenge” in the form of an index (i.e., a number).
- If the block producer can furnish a “response” to the challenge, consisting of a light-client proof that the transaction execution at the specified index was carried out authentically (or a proof confirming that the specified index exceeds the count of transactions in the block), then the challenge is regarded as addressed.
- If a challenge remains unanswered for several seconds, miners on different substates consider the block dubious and refuse to mine on it (the game-theoretic rationale for this is consistent: they suspect that others will adopt the same strategy, and there is no benefit in mining on a substate that is likely to be orphaned soon)
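The steps above can be sketched as follows. This is a minimal model of the private-challenge heuristic: `request_response` stands in for a network round-trip to the block producer, and `proof_is_valid` is a placeholder for real light-client proof verification.

```python
import random

def proof_is_valid(proof, block, index):
    # Placeholder for genuine light-client proof verification.
    return proof.get("index") == index

def audit_block(block, request_response, num_challenges=3, seed=None):
    """A miner privately samples a few transaction indices and challenges the
    block producer for execution proofs at those indices; an unanswered
    challenge marks the block suspect and the miner refuses to mine on it."""
    rng = random.Random(seed)
    for _ in range(num_challenges):
        index = rng.randrange(block["claimed_tx_count"])
        proof = request_response(index)
        if proof is None:
            return "suspect"   # data unavailable: treat the block as dubious
        if not proof_is_valid(proof, block, index):
            return "invalid"   # provable fraud: broadcast the proof
    return "ok"
```

Because the indices are chosen privately, an attacker withholding data cannot pre-compute which transactions need to be made available in time.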
Be aware that the mechanism entails a few additional complexities to function effectively. If a block is released alongside all its transactions with the exception of a few, then the challenge-response protocol could efficiently evaluate them all and dismiss the block. However, if a block was genuinely published as headers-only, then if it contained hundreds of transactions, a multitude of challenges would be necessary. One heuristic method to resolve the issue is that miners receiving a block should privately select some random nonces, send out a few challenges for those nonces to certain known miners on the potentially compromised substate, and if responses to all challenges do not arrive promptly, treat the block as suspect. Note that the miner does NOT publicly broadcast the challenge – this would provide an opportunity for an attacker to swiftly fill in the missing data.
The second issue is that the protocol is susceptible to a denial-of-service attack wherein attackers publish an overwhelming number of challenges to legitimate blocks. To address this issue, creating a challenge ought to incur some expense – nonetheless, if this expense is excessively high, then the act of issuing a challenge will demand an extremely substantial “altruism delta,” perhaps so elevated that an attack will eventually occur and no individual will contest it. While some might lean towards resolving this with a market-oriented approach that assigns the obligation of generating the challenge to any parties who get defrauded by the erroneous state transition, it is important to recognize that it’s feasible to devise a state transition that fabricates new resources from nothing, marginally pilfering from everyone through inflation, and also favors affluent coin holders, resulting in a heist where there is no concentrated motivation to contest it.
For a currency, one “simple remedy” is capping the value of a transaction, so that the entire issue has only very restricted ramifications. For a Turing-complete protocol, the remedy is more intricate; the optimal strategies probably involve both making challenges costly and introducing a mining incentive for them. There will be a niche group of “challenge miners,” and the hypothesis is that they will be indifferent regarding which challenges to undertake, so even the smallest altruism delta, enforced by software defaults, will prompt them to execute correct challenges. One might even consider analyzing the duration it takes for challenges to receive responses, offering higher rewards for those that require more time.
The Twelve-Dimensional Hypercube
Note: this is NOT identical to the erasure-coding Borg cube. For additional details on that topic, refer here: https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer/
We can identify two weaknesses in the aforementioned framework. First, the justification that the challenge-response protocol will be effective is quite tenuous at best, exhibiting inadequate degenerate-case performance: a substate takeover assault paired with a denial of service strike that obstructs challenges could potentially compel an invalid block into a chain, necessitating a future day-long roll-back of the entire chain when (if?) the situation stabilizes. Furthermore, there is a vulnerability aspect here: an invalid block in any substate will nullify all subsequent blocks across all substates. Second, messages crossing substates must still be accessible by all nodes. We’ll begin by addressing the second issue, then go on to illustrate a possible defense to ameliorate the first issue slightly, and finally resolve it entirely while simultaneously abolishing proof of work.
The second issue, the costly nature of cross-substate messages, is resolved by transitioning the blockchain model from this:
To this:
Except the cube should feature twelve dimensions rather than three. Now, the protocol appears as follows:
- There exist 2^N substates, each identified by a binary string of length N (e.g., for N = 12, 001011111101). We define the Hamming distance HD(S1, S2) as the count of digits that differ between the IDs of substates S1 and S2 (e.g., HD(00110, 00111) = 1, HD(00110, 10010) = 2, etc.).
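The Hamming-distance metric is simple enough to state in code; a sketch matching the examples above:

```python
def hd(s1: str, s2: str) -> int:
    """Hamming distance between two equal-length binary substate IDs:
    the number of bit positions at which they differ."""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
```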
- The state of each substate retains the ordinary state tree as before, but also includes an outbox.
- An opcode exists, CROSS_SEND, taking four arguments [dest_substate, to_address, value, data], and recording a message with those parameters in the outbox of the substate from which the opcode was invoked.
- All miners must “mine an edge”; that is, valid blocks alter two adjacent substates S_a and S_b, and can incorporate transactions for either substate. The block-level state transition function is as follows:
- Process all transactions sequentially, applying the state transitions to S_a or S_b as required.
- Process all messages present in the outboxes of S_a and S_b sequentially. If a message in the outbox of S_a has final destination S_b, apply the resulting state transitions, and likewise for messages traveling from S_b to S_a. Otherwise, if a message in the outbox of S_a satisfies HD(S_b, msg.dest) < HD(S_a, msg.dest), transfer the message from the outbox of S_a to the outbox of S_b, and similarly in the opposite direction.
- A header chain exists that monitors all headers, thereby allowing all these blocks to be merged mined, and maintaining a centralized repository where the roots of each state are housed.
Essentially, rather than traversing through the hub, messages navigate their way around the substates along edges, and the continually decreasing Hamming distance guarantees that each message ultimately reaches its specified destination.
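Greedy routing along the cube's edges can be sketched as follows. In the real protocol the next hop is whichever adjacent edge happens to be mined next, whereas this toy `route` function flips differing bits in a fixed left-to-right order; either way, each forwarding step reduces the Hamming distance to the destination by exactly one:

```python
def route(src: str, dest: str) -> list:
    """Return one greedy path of substate IDs from src to dest: at each hop,
    flip one bit on which the current substate and the destination differ."""
    assert len(src) == len(dest)
    path = [src]
    cur = list(src)
    for i, bit in enumerate(dest):
        if cur[i] != bit:
            cur[i] = bit
            path.append("".join(cur))
    return path
```

The path length equals the initial Hamming distance, so no message needs more than N hops on an N-dimensional cube.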
The crucial design choice here is the configuration of all substates into a hypercube. Why was the cube selected? The optimal way to conceptualize the cube is as a balance between two extreme alternatives: on one side the circle, and on the other, the simplex (essentially, the analog of a tetrahedron with one vertex per substate). In a circle, a message would need to traverse, on average, a quarter of the distance around the circle before arriving at its target, implying that we gain no efficiency over the classic hub-and-spoke structure.
In a simplex, every pair of substates is connected by an edge, thus a cross-substate message would be transmitted as soon as a block connecting those two substates is generated. However, with miners selecting random edges, it would take a significant amount of time for a block on the correct edge to be created, and, more significantly, participants observing a specific substate must function as at least light clients across all other substates in order to authenticate blocks pertinent to their interests. The hypercube offers an ideal equilibrium – each substate has a logarithmically growing number of adjacent substates, the length of the longest route grows logarithmically, and the block time of any given edge also grows logarithmically.
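To make the balance concrete, here are back-of-the-envelope figures for 2^12 substates; the numbers below are illustrative arithmetic, not protocol parameters:

```python
N = 12          # hypercube dimensions
K = 2 ** N      # number of substates: 4096

# Circle: a message travels a quarter of the ring on average.
ring_avg_hops = K / 4            # 1024 hops

# Hypercube: log2(K) neighbors, at most N hops, N/2 on average.
cube_neighbors = N               # 12 adjacent substates
cube_max_hops = N                # worst case: flip every bit
cube_avg_hops = N / 2            # 6 hops on average

# Simplex: one hop, but every pair of substates shares an edge.
simplex_neighbors = K - 1        # 4095 adjacent substates
```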
It is essential to note that this algorithm shares most of the shortcomings related to the hub-and-spoke model – particularly, its poor performance in degenerate cases and the unclear economic implications surrounding challenge-response protocols. To enhance stability, one potential method is to adjust the header chain somewhat.
Currently, the header chain enforces very stringent validity criteria – if any block at any point in the header chain is found to be invalid, all blocks in every substate built upon it are also deemed invalid and must be redone. To address this issue, we could modify the header chain to merely track headers, allowing it to encompass both invalid headers and even multiple forks of the same substate chain. To incorporate a merge-mining protocol, we instantiate exponential subjective scoring while utilizing the header chain as an absolute common timekeeper. We select a lower base (for instance, 0.75 versus 0.99) and impose a maximum penalty factor of 1 / 2^N to eliminate the advantages of forking the header chain; for those unacquainted with the mechanics of ESS, this simply implies “permit the header chain to include all headers, but utilize the header chain’s order to penalize blocks arriving later without rendering this penalty excessively severe.” Furthermore, we introduce a delay for cross-substate communications, ensuring that a message in an outbox only becomes “eligible” when the originating block is a few dozen blocks deep.
Proof of Stake
Now, let’s focus on adapting the protocol to a nearly-pure proof of stake system. For the moment, we will set aside concerns related to nothing-at-stake; Slasher-like protocols combined with exponential subjective scoring can address those issues, which will be discussed in later sections. Our initial goal is to illustrate how the hypercube operates without mining while also partially resolving the fragility problem. We will commence with a proof of activity implementation for multichain. The protocol functions as follows:
- There exist 2^N substates represented by binary strings, similar to before, alongside a header chain (which also monitors the latest state root of each substate).
- Anyone can mine an edge, just like previously, but with reduced difficulty. However, once a block is mined, it is required to be published with the complete set of Merkle tree proofs, enabling a node with no previous information to fully verify all state transitions within the block.
- There exists a bonding protocol whereby an address can declare itself as a potential signer by providing a bond of size B (wealthier addresses will need to establish multiple sub-accounts). Possible signers are recorded in a specialized contract C[s] on each substate s.
- According to the block hash, a random selection of 200 substates s[i] is made, and a search index ind[i] is selected for each substate. Define signer[i] as the holder of the first address in C[s[i]] following index ind[i]. For the block to be considered valid, it must be endorsed by at least 133 members from the set signer[0] … signer[199].
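The sampling step can be sketched as follows. This is a hypothetical simplification: `select_signers` seeds a PRNG from the block hash, standing in for however the real protocol would derive its randomness, and `bonded` plays the role of the contracts C[s]:

```python
import hashlib, random

def select_signers(block_hash: bytes, bonded: dict, group_size=200):
    """`bonded` maps each substate ID to the sorted list of bonded addresses
    (the contract C[s]). For each of `group_size` draws, pick a substate and
    an index seeded by the block hash, and take the bonded address at that
    index -- a simplification of 'first address following index ind[i]'."""
    substates = sorted(bonded)
    rng = random.Random(hashlib.sha256(block_hash).digest())
    signers = []
    for _ in range(group_size):
        s = substates[rng.randrange(len(substates))]
        ind = rng.randrange(len(bonded[s]))
        signers.append(bonded[s][ind])
    return signers

def block_valid(signatures: set, signers: list, threshold=133) -> bool:
    """A block stands only if enough sampled signers endorsed it."""
    return sum(1 for s in signers if s in signatures) >= threshold
```

Because the selection is a deterministic function of the block hash, every node recomputes the same consensus group and can check the signature count independently.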
To verify the validity of a block, members of the consensus group must take two actions. First, they would verify that the initial state roots listed in the block correspond with the relevant state roots in the header chain. Second, they would process the transactions to ensure that the final state roots align with those provided in the header chain, and that all trie nodes necessary for computing the update are present somewhere in the network. If both validations succeed, they sign the block, and if the block receives signatures from a sufficient number of consensus group members, it becomes incorporated into the header chain, updating the state roots for the two impacted substates within the header chain.
And that encompasses everything. The critical attribute here is that each block is associated with a randomly chosen consensus group, selected from the overall state of all account holders. Thus, unless an adversary controls at least 33% of the stake within the entire system, it will be nearly impossible (specifically, a 2^-70 probability, which, given 2^30 proof of work, falls comfortably within the domain of cryptographic impossibility) for the attacker to secure a block signature. Furthermore, without a stake of at least 33%, an attacker will not be able to hinder legitimate miners from creating blocks and obtaining signatures for them.
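The cryptographic-impossibility claim can be sanity-checked with a binomial tail computation, treating the 200 signer draws as independent samples of the stake distribution (an approximation; real sampling without replacement only makes the attacker's odds worse):

```python
from math import comb

def attack_success_prob(p: float, group_size: int = 200, threshold: int = 133) -> float:
    """Probability that an attacker controlling fraction p of all bonded
    stake controls at least `threshold` of `group_size` sampled signers,
    modeling the draws as independent Bernoulli(p) trials."""
    return sum(comb(group_size, k) * p**k * (1 - p)**(group_size - k)
               for k in range(threshold, group_size + 1))
```

With p = 1/3 the result is vanishingly small, on the order of magnitude cited above, while an attacker holding the overwhelming majority of stake succeeds almost surely.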
This methodology presents the advantage of exhibiting favorable degenerate-case behavior; should a denial-of-service attack occur, it is likely that very few or no blocks will be produced, or at least the production rate will be extremely slow, but no harm will result.
Now, the challenge becomes how to further lessen dependence on proof of work while incorporating blockmaker and Slasher-based methods? A straightforward strategy is to implement a distinct blockmaker protocol for every edge, mirroring the single-chain approach. To motivate blockmakers to act with integrity and avoid double-signing, Slasher can also be implemented here: if a signer endorses a block that ultimately isn’t included in the main chain, they receive a penalty. Schelling point dynamics ensure that every participant is incentivized to adhere to the protocol, as they assume that others will do the same (along with a minor pseudo-incentive of software defaults which strengthen the equilibrium).
A full EVM
These protocols enable us to transmit one-way messages from one substate to another. Yet, one-way communications have limitations in functionality (or rather, they possess as much capability as we wish them to have because everything is Turing-complete, but they can prove cumbersome). What if we could enable the hypercube to simulate a complete cross-substate EVM, allowing for the invocation of functions across different substates?
Surprisingly, this is feasible. The crux lies in appending a data structure termed a continuation to the messages. For instance, let’s consider that we are in the midst of a computation where a contract calls a contract that creates a contract, and we are presently executing the code that generates the inner contract. Therefore, the current stage of our computation resembles the following:
What is the present “state” of this computation? In other words, what is the collection of all the data necessary to pause the computation, and later use that data to resume it? Within a single instance of the EVM, this comprises solely the program counter (i.e., our position within the code), the memory, and the stack. In scenarios where contracts invoke each other, we require that data for the entire “computational tree”, covering our current scope, the parent scope, the ancestor of that, and so on back to the original transaction:
This is referred to as a “continuation”. To resume execution from this continuation, we merely reactivate each computation and execute it to completion in reverse sequence (i.e., conclude with the innermost first, insert its output into the designated area of its parent, then finalize the parent, and so forth). Now, to achieve a fully scalable EVM, we simply substitute the idea of a one-way message with a continuation, and there we have it.
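A continuation can be sketched as a stack of paused call frames. The `Frame` fields and the `resume` loop below are illustrative simplifications, not actual EVM internals:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """One paused scope: program counter, stack, and memory."""
    code: list
    pc: int
    stack: list = field(default_factory=list)
    memory: dict = field(default_factory=dict)

@dataclass
class Continuation:
    """The whole computational tree, outermost transaction first,
    current (innermost) call last."""
    frames: list

def resume(cont: Continuation, run_frame):
    """Finish the innermost computation first, insert its output into the
    parent frame's stack, then finalize the parent, and so on outward
    until the original transaction's frame completes."""
    result = None
    while cont.frames:
        frame = cont.frames.pop()       # innermost first
        if result is not None:
            frame.stack.append(result)  # child's output lands in the parent
        result = run_frame(frame)
    return result
```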
However, the question arises: do we truly wish to proceed this far? Firstly, transitioning between substates with such a virtual machine would be exceedingly inefficient; if a transaction’s execution requires access to ten contracts, and each contract exists in a random substate, then navigating through the entirety of that execution will consume an average of six blocks per transmission, multiplied by two transmissions per sub-call, and multiplied by ten sub-calls, resulting in a total of 120 blocks. Furthermore, we encounter issues with synchronicity; if A invokes B once and then again, but in between those calls, C invokes B, then C will encounter B in a partially processed state, which may present security vulnerabilities. Lastly, integrating this mechanism with the idea of reverting transaction execution when transactions exhaust their gas is challenging. Thus, it may be more viable to forgo continuations and instead favor straightforward one-way messages; since the language is Turing-complete, continuations can always be implemented later.
Due to the inefficiency and instability of cross-chain messages regardless of their implementation, the majority of dapps will likely prefer to operate entirely within a single sub-state, and dapps or contracts that frequently interact will also want to exist within the same substate. To prevent absolutely everyone from residing on the same substate, we can allow the gas limits for each substate to “spill over” into one another, striving to maintain uniformity across substates; consequently, market dynamics will naturally foster an environment where popular substates become pricier, prompting marginally indifferent users and dapps to settle in fresh territories.
Not So Fast
What issues remain? First, there’s the data availability dilemma: what occurs if all full nodes within a given substate vanish? If such a scenario occurs, the substate’s data will disappear indefinitely, and the blockchain will essentially need to be forked from the last block where all substate data is known. This could lead to double-spends, some malfunctioning dapps due to duplicate messages, etc. Therefore, we need to ensure that this situation never arises. This operates on a 1-of-N trust model; as long as one honest node retains the data, we are secure. Single-chain architectures also utilize this trust model, but the risk escalates as the number of nodes expected to retain each piece of data diminishes – as it does here by a factor of 2048. The concern is eased by the existence of altruistic nodes such as blockchain explorers, yet this may become problematic if the network scales so much that no single data center can accommodate the entire state.
Secondly, there exists a fragility issue: if any block anywhere in the system is processed incorrectly, it could result in ripple effects throughout the entire network. A cross-substate message may fail to be sent, or could be resent; coins could be double-spent, and so forth. Certainly, once a problem is discovered, it would inevitably be noticed, and it could be rectified by reverting the entire chain from that moment, but it remains unclear how often such situations may occur. One potential resolution to fragility is to maintain a separate version of ether in each substate, permitting ethers in diverse substates to interact variably with one another, and then introduce message redundancy features to higher-level languages, acknowledging that messages will be probabilistic; this would reduce the number of nodes verifying each header down to approximately 20, thereby enhancing scalability, although much of that would be offset by an increased volume of cross-substate messaging for error correction.
A third complication is that scalability is restricted; every transaction must reside within a substate, and each substate must exist within a header that every node must monitor, thus if a node’s maximum processing capacity is N transactions, the network can handle up to N^2 transactions. One strategy to enhance scalability is to structure the hypercube hierarchically – envision the block headers in the header chain as transactions, and imagine the header chain itself evolving from a single-chain model to the identical hypercube model discussed here – that would yield N^3 scalability, and applying this recursively would yield a form reminiscent of tree chains, providing exponential scalability – albeit at the expense of increased complexity, and rendering transactions traversing the entire state space considerably less efficient.
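The scaling argument above is just repeated exponentiation; as a sketch:

```python
def capacity(node_limit: int, levels: int) -> int:
    """Total transaction throughput if one node can process `node_limit`
    transactions and the structure is nested `levels` deep: a flat chain is
    1 level, substates under a header chain give 2, a hypercube whose header
    chain is itself a hypercube gives 3, and so on."""
    return node_limit ** levels
```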
Lastly, fixing the number of substates at 4096 is not optimal; ideally, this number should increase over time as the state expands. One approach is to monitor the number of transactions per substate, and once that count exceeds the number of substates, we can simply add a dimension to the cube (i.e., double the count of substates). More sophisticated methods could involve the application of minimal cut algorithms like the relatively straightforward Karger’s algorithm to attempt to bisect each substate when a dimension is added. Nonetheless, such approaches face challenges, as they are complex and could unexpectedly elevate the costs and latency for dapps that inadvertently get bisected.
Alternative Approaches
Indeed, hypercubing the blockchain is not the sole method for enhancing blockchain scalability. A notably promising alternative is the establishment of an ecosystem consisting of multiple blockchains, some tailored for specific applications and others resembling generalized Ethereum-like scripting environments, while facilitating communication among them – practically, this typically involves having all (or at least some) of the blockchains maintain “light clients” of each other within their own states. The primary challenge is determining how these chains can reach a consensus, particularly within a proof-of-stake framework. Ideally, all chains involved in such a system would provide mutual reinforcement; however, how can that be achieved when it isn’t clear how to assess the value of each coin? If an aggressor holds 5% of all A-coins, 3% of all B-coins, and 80% of all C-coins, how can A-coin ascertain whether B-coin or C-coin should be accorded more weight?
One solution is to employ something akin to Ripple consensus among chains—each chain could initially decide, upon launch or gradually through stakeholder consensus, how much it values the consensus input of other chains, and then allow transitive effects to ensure that every chain protects all others over time. Such a model is highly effective as it welcomes innovation—anyone can create new chains at any moment with arbitrary rules, and all chains can still connect and reinforce each other; it is quite plausible that we will witness such an inter-chain mechanism developing among the majority of chains in the future, with some large chains, possibly even older ones like Bitcoin and frameworks such as a hypercube-based Ethereum 2.0, existing on their own simply for historical purposes. The concept here is to achieve a genuinely decentralized design: all parties support one another, rather than merely aligning with the strongest chain and hoping it doesn’t succumb to an unforeseen attack.

