Navigating the Minefield: Challenges Facing Ethereum’s State Mechanics

This blog post formally discloses a serious threat to the Ethereum network, one that was a clear and present danger until the Berlin hardfork.

State

Let’s start with some context on Ethereum and State.

The Ethereum state consists of a Patricia-Merkle trie, which is a kind of prefix tree. This post will not go into it in detail; suffice it to say that as the state grows, the branches of this tree become denser. Each added account is another leaf. Between the root of the tree and the leaf itself, there are a number of “intermediate” nodes.

To locate a specific account, or “leaf” within this vast tree, approximately 6-9 hashes must be resolved, going from the root through the intermediate nodes to finally determine the last hash that leads to the desired data.

In plain terms: whenever a trie lookup is performed to find an account, around 8-9 resolve operations take place. Each resolve operation is one database lookup, and each database lookup may involve any number of actual disk operations. The number of disk operations is hard to estimate, but since the trie keys are cryptographic hashes (collision-resistant), the keys are effectively “random”, which is the worst case for any database.
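
To make the lookup cost concrete, here is a minimal sketch of a toy hexary prefix tree that counts how many nodes a lookup has to resolve. It is not Ethereum's actual trie: SHA-256 stands in for Keccak-256 (which Python's hashlib does not provide), extension nodes and node hashing are left out, and `ToyTrie`, `key_nibbles` and the account count are hypothetical names and values chosen for illustration.

```python
import hashlib


def key_nibbles(address: bytes) -> str:
    # Assumption: SHA-256 stands in for Keccak-256, which hashlib lacks.
    return hashlib.sha256(address).hexdigest()  # 64 hex nibbles


class ToyTrie:
    """Hypothetical hexary prefix tree; branch nodes are dicts, leaves are tuples."""

    def __init__(self):
        self.root = {}

    def insert(self, address: bytes, value) -> None:
        path = key_nibbles(address)
        node, depth = self.root, 0
        while True:
            nibble = path[depth]
            child = node.get(nibble)
            if child is None or (isinstance(child, tuple) and child[0] == path):
                node[nibble] = (path, value)  # new leaf, or overwrite same key
                return
            if isinstance(child, tuple):
                # Two keys share this prefix: push the old leaf one level down.
                child = {child[0][depth + 1]: child}
                node[nibble] = child
            node, depth = child, depth + 1

    def lookup(self, address: bytes):
        """Return (value, resolves); each resolve stands for one database query."""
        path = key_nibbles(address)
        node, resolves = self.root, 0
        for nibble in path:
            resolves += 1
            child = node.get(nibble)
            if child is None:
                return None, resolves
            if isinstance(child, tuple):
                return (child[1] if child[0] == path else None), resolves
            node = child
        return None, resolves


trie = ToyTrie()
accounts = [i.to_bytes(20, "big") for i in range(5000)]
for i, account in enumerate(accounts):
    trie.insert(account, i)
average = sum(trie.lookup(a)[1] for a in accounts) / len(accounts)
print(f"average resolves per lookup: {average:.2f}")
```

Because the keys are hashes, the leaves spread evenly and the depth grows roughly with the base-16 logarithm of the number of accounts, which is why a state with many millions of accounts ends up with the deeper paths described above.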

As Ethereum has grown, it has been necessary to raise the gas costs of operations that access the trie. This was done in Tangerine Whistle at block 2,463,000 in October 2016, which included EIP 150. EIP 150 aggressively raised certain gas costs and introduced a number of changes to protect against DoS attacks, in the wake of the so-called “Shanghai attacks”.

Another such increase came with the Istanbul upgrade, at block 9,069,000 in December 2019. In this upgrade, EIP 1884 was activated.

EIP-1884 introduced the following changes:

SLOAD went from 200 to 800 gas,
BALANCE went from 400 to 700 gas (and a cheaper SELFBALANCE was added),
EXTCODEHASH went from 400 to 700 gas.

The issue(s)

In March 2019, Martin Swende took some measurements of EVM opcode performance. That investigation later led to the creation of EIP-1884. A few months before EIP-1884 went live, the paper Broken Metre was published (September 2019).

Two Ethereum security researchers, Hubert Ritzdorf and Matthias Egli, teamed up with one of the authors of the paper, Daniel Perez, and ‘weaponized’ an exploit, which they submitted to the Ethereum bug bounty on October 4, 2019.

We encourage you to read the submission thoroughly, as it’s a well-crafted report.

In a channel dedicated to cross-client security, developers from Geth, Parity, and Aleth were informed about the submission that same day.

The core of the exploit is to induce random trie lookups. A very straightforward version could be:

	jumpdest     ; jump target, start of loop
	gas          ; push remaining gas: a 'random' value on the stack
	extcodesize  ; treat it as an address, triggering a trie lookup
	pop          ; discard the extcodesize result
	push1 0x00   ; push the jump destination (offset 0, the jumpdest)
	jump         ; jump back to the start of the loop
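
As a rough sketch of how many trie lookups such a loop buys, the per-iteration gas can be added up and divided into a gas budget. The overhead values below are the standard EVM costs for these opcodes and are not taken from this post; the intrinsic transaction gas is ignored, and `lookups_for_budget` is a hypothetical helper.

```python
# Gas cost of the non-lookup opcodes in the loop (standard EVM fee schedule,
# an assumption not stated in the post).
LOOP_OVERHEAD = {"JUMPDEST": 1, "GAS": 2, "POP": 2, "PUSH1": 3, "JUMP": 8}


def lookups_for_budget(gas_budget: int, lookup_cost: int) -> int:
    """Number of full loop iterations (one trie lookup each) a gas budget buys."""
    per_iteration = lookup_cost + sum(LOOP_OVERHEAD.values())
    return gas_budget // per_iteration


for cost in (400, 700):
    print(cost, lookups_for_budget(10_000_000, cost))
```

Under these assumptions, 10M gas pays for roughly 24,000 lookups at 400 gas each and roughly 14,000 at 700 gas each, every one of them hitting a different, effectively random path in the trie.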

In their report, the researchers ran this payload via eth_call against nodes synced to mainnet; these were their results with 10M gas:

Figure: 10M gas exploit using EXTCODEHASH (at 400 gas)
Figure: 10M gas exploit using EXTCODESIZE (at 700 gas)

Clearly, the changes in EIP 1884 reduced the impact of the attack, but they were nowhere near sufficient.

This happened right before Devcon in Osaka.

During Devcon, knowledge of the problem was shared among the mainnet client developers. We also met with Hubert and Matthias, as well as Greg Markou (from Chainsafe, who were working on ETC). ETC developers had also received the report.

As 2019 drew to a close, we knew that we had larger problems than we had previously anticipated: malicious transactions could lead to block times in the minute range. To make matters worse, the developer community was already unhappy about EIP-1884, which had broken certain contract flows, and both users and miners were pushing for higher block gas limits.

Additionally, merely two months later, in December 2019, Parity Ethereum announced their exit from the ecosystem, allowing OpenEthereum to assume responsibility for the codebase’s maintenance.

A new coordination channel for clients was created, where developers from Geth, Nethermind, OpenEthereum, and Besu continued to coordinate.

The solution(s)

We understood that we would need to implement a dual approach to address these challenges. One strategy involved working on the Ethereum protocol, aiming to resolve this issue at the protocol level; ideally without disrupting contracts, and preferably without penalizing ‘positive’ actions, while still thwarting attacks.

The second strategy would involve software engineering, altering the data models and structures within the clients.

Protocol work

The first attempt at handling this type of attack was officially introduced in February 2020 as EIP 2583. The idea is simply to impose a penalty every time a trie lookup results in a miss.

However, Peter found a workaround for this idea, the ‘shielded relay’ attack, which places an upper bound (around 800) on how large such a penalty can effectively be.

The problem with penalties for misses is that the lookup has to happen first in order to determine that a penalty applies. If there is not enough gas left for the penalty, an unpaid lookup has already been executed. Even though that results in a throw, these state reads can be wrapped in nested calls, allowing the outer caller to continue the attack without paying the (full) penalty.

Due to this, the EIP was discontinued while we sought a superior alternative.

Alexey Akhunov examined the concept of Oil — a supplementary source of “gas”, which was inherently different from gas, in that it would remain hidden from the execution layer and could induce transaction-based global reverts.
In May 2020, Martin drafted a comparable proposal about Karma.

While iterating through these various concepts, Vitalik Buterin suggested merely increasing the gas costs, while maintaining access lists. In August 2020, Martin and Vitalik commenced iterations on what was to evolve into EIP-2929 and its partner EIP, EIP-2930.

EIP-2929 effectively resolved a significant number of the previous issues.

Unlike EIP-1884, which raised costs unconditionally, it raised costs only for items not already accessed. This results in a mere sub-percent increase in net costs.
Moreover, together with EIP-2930, it does not break any contract flows.
It can also be tuned further with higher gas costs (without breaking things).
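
The “only for items not already accessed” rule can be sketched as a per-transaction set of touched addresses. The two constants are taken from the EIP-2929 specification rather than from this post, and the sketch is simplified: the real rule also pre-warms the sender, the recipient and the precompiles, and EIP-2930 lets a transaction declare addresses up front. `account_access_cost` is a hypothetical helper.

```python
COLD_ACCOUNT_ACCESS_COST = 2600  # EIP-2929: first touch of an address in a tx
WARM_STORAGE_READ_COST = 100     # EIP-2929: address already in the accessed set


def account_access_cost(address: str, accessed: set) -> int:
    """Charge cold on first access, warm afterwards, and record the access."""
    if address in accessed:
        return WARM_STORAGE_READ_COST
    accessed.add(address)
    return COLD_ACCOUNT_ACCESS_COST


accessed_addresses = set()
normal = [account_access_cost("0xaaaa", accessed_addresses) for _ in range(3)]
attack = [account_access_cost(hex(i), accessed_addresses) for i in range(3)]
print(normal)  # repeated access to one address: [2600, 100, 100]
print(attack)  # a fresh address every time: [2600, 2600, 2600]
```

A contract that keeps touching the same few accounts pays the cold price once and the warm price afterwards, while the attack loop, which touches a new address on every iteration, pays the cold price every time.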

On April 15th, 2021, both went live with the Berlin upgrade.

Development work

Peter’s initiative to address this issue was dynamic state snapshots, implemented in October 2019.

A snapshot serves as an additional data structure for storing the Ethereum state in a flat format, which can be fully constructed online during the active operation of a Geth node. The advantage of the snapshot is that it functions as an acceleration structure for state accesses:

Instead of doing O(log N) disk reads (times LevelDB overhead) to access an account or storage slot, the snapshot provides direct, O(1) access time (times LevelDB overhead).
The snapshot enables account and storage iteration at O(1) complexity per entry, allowing remote nodes to retrieve sequential state data at significantly reduced costs compared to before.
The existence of the snapshot also facilitates more exotic use cases such as offline pruning of the state trie or transitioning to other data formats.
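
The flat layout can be illustrated with a minimal sketch: one keyed read per account, plus ordered iteration over the hashed keys. This is not Geth's implementation. `FlatSnapshot` is a hypothetical class, an in-memory dict and sorted list stand in for an ordered on-disk key-value store, and SHA-256 stands in for Keccak-256.

```python
import bisect
import hashlib


class FlatSnapshot:
    """Hypothetical flat store: account hash -> account data, no intermediate nodes."""

    def __init__(self):
        self._data = {}
        self._sorted_keys = []

    def put(self, address: bytes, account: dict) -> None:
        key = hashlib.sha256(address).digest()  # assumption: stand-in for Keccak-256
        if key not in self._data:
            bisect.insort(self._sorted_keys, key)
        self._data[key] = account

    def get(self, address: bytes):
        # One keyed read, regardless of how many accounts exist.
        return self._data.get(hashlib.sha256(address).digest())

    def iterate(self, start: bytes = b"", limit: int = 10):
        """Yield up to `limit` (hash, account) pairs in hash order from `start`."""
        i = bisect.bisect_left(self._sorted_keys, start)
        for key in self._sorted_keys[i:i + limit]:
            yield key, self._data[key]


snap = FlatSnapshot()
for n in range(100):
    snap.put(n.to_bytes(20, "big"), {"nonce": 0, "balance": n})
print(snap.get((7).to_bytes(20, "big")))
chunk = list(snap.iterate(limit=5))
print(len(chunk))
```

The ordered iteration is what makes it cheap to hand a remote node a contiguous range of state, since no intermediate trie nodes need to be resolved along the way.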

The downside of the snapshot is that the raw account and storage data is essentially duplicated. In the context of the mainnet, this translates to an additional 25GB of SSD storage used.

The concept of dynamic snapshots had already been initiated in mid-2019, primarily aiming to enable snap sync. At that time, there were several “large projects” the Geth team was involved with.

Offline state pruning
Dynamic snapshots + snap sync

LES state distribution through sharded state

Nevertheless, it was decided to fully prioritize snapshots and postpone the other projects for the time being. This laid the groundwork for what would later become the snap/1 sync algorithm. The snapshot work was merged in March 2020.

With the “dynamic snapshot” functionality released, we had a bit of breathing room. If the Ethereum network were hit by an attack, it would be painful, but it would at least be possible to tell users to enable the snapshot. Generating the snapshot would take a long time, and there was not yet any way to sync snapshots, but the network could at least continue to operate.

Connecting the threads

In March-April 2021, the snap/1 protocol was rolled out in geth, making it possible to sync using the new snapshot-based algorithm. Although it was not yet the default sync mode, it was an important step toward making snapshots useful not only as attack protection but also as a major improvement for users.

On the protocol front, the Berlin upgrade was implemented in April 2021.

Some benchmarking conducted in our AWS monitoring environment is outlined below:

Pre-berlin, no snapshots, 25M gas: 14.3s
Pre-berlin, with snapshots, 25M gas: 1.5s
Post-berlin, no snapshots, 25M gas: ~3.1s
Post-berlin, with snapshots, 25M gas: ~0.3s

The (approximate) data suggests that the Berlin upgrade diminished the attack effectiveness by 5x, while snapshots further mitigated it by 10x, resulting in a total impact reduction of 50x.
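
The rounded factors can be checked directly against the four timings listed above; this is plain arithmetic on the published numbers, with variable names chosen for illustration.

```python
timings = {
    ("pre-berlin", "no snapshots"): 14.3,
    ("pre-berlin", "snapshots"): 1.5,
    ("post-berlin", "no snapshots"): 3.1,
    ("post-berlin", "snapshots"): 0.3,
}
baseline = timings[("pre-berlin", "no snapshots")]
berlin_gain = baseline / timings[("post-berlin", "no snapshots")]
snapshot_gain = baseline / timings[("pre-berlin", "snapshots")]
combined_gain = baseline / timings[("post-berlin", "snapshots")]
print(f"Berlin: {berlin_gain:.1f}x, snapshots: {snapshot_gain:.1f}x, "
      f"combined: {combined_gain:.1f}x")
```

The exact ratios come out at about 4.6x, 9.5x and 47.7x, which the text rounds to 5x, 10x and 50x.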

We estimate that currently, on mainnet (15M gas), it would be possible to create blocks that take 2.5-3s to process on a geth node without snapshots. This number will continue to deteriorate (for non-snapshot nodes) as the state grows.

If refunds are used to increase the effective gas usage within a block, this can be exacerbated further by a factor of (max) 2x. With EIP 1559, the block gas limit will have higher elasticity, allowing a further 2x (the ELASTICITY_MULTIPLIER) in temporary bursts.

As for the feasibility of executing this attack: the cost for an attacker to buy a full block would be on the order of a few ether (15M gas at 100 Gwei is 1.5 ether).
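
The cost figure is a unit conversion, sketched below with a hypothetical helper `block_cost_ether`; the only inputs are the gas amount and gas price quoted above.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def block_cost_ether(gas: int, gas_price_gwei: int) -> float:
    """Cost in ether of buying `gas` units at `gas_price_gwei`."""
    return gas * gas_price_gwei * WEI_PER_GWEI / WEI_PER_ETHER


print(block_cost_ether(15_000_000, 100))  # 1.5
```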

Why reveal now

This risk has been an “open secret” for quite some time — it has actually been mistakenly made public at least once, and it has been referenced in ACD discussions multiple times without specific details.

Given that the Berlin upgrade is now completed, and that geth nodes by default are using snapshots, we believe the threat level is sufficiently low that transparency takes precedence, and now is the moment to offer a complete disclosure on the behind-the-scenes developments.

It is crucial that the community is provided with the opportunity to comprehend the reasoning behind modifications that adversely impact user experience, such as increased gas costs and constraints on refunds.

This article was composed by Martin Holst Swende and Peter Szilagyi on 2021-04-23.
It was shared with other Ethereum-related projects on 2021-04-26, and publicly revealed on 2021-05-18.
