Here are some interesting results on the performance of different miners over the first 280,000 blocks of the Ethereum blockchain. For this period, I compiled the lists of block and uncle coinbase addresses; the raw data can be found here for blocks and here for uncles. This gives us a wealth of useful data, particularly on stale rates and on how well connected various miners and pools are.
To begin with, the scatter plot:
We can see several key trends here. First, uncle rates are much lower than they were on the Olympic testnet: in total, we have seen 20,750 uncles in 280,000 blocks, an uncle rate of 7.41% (computed inclusively, i.e., uncles as a percentage of all blocks rather than uncles per block, the figure is 6.89%). In short, this is not much higher than comparable figures for bitcoin even back in 2011, when its mining ecosystem resembled Ethereum’s today, with CPUs and GPUs still dominant and transaction volumes low. Note that this does not mean miners receive only 93.11% of the income they would earn if they were perfectly connected to every other miner: Ethereum’s uncle mechanism recovers roughly 87% of the gap, so the actual “average loss” from imperfect connectivity is only about 0.9%. That said, these losses should grow as the network processes more transactions, for two reasons: first, the uncle mechanism compensates only for the base block reward, not for transaction fees; second, larger blocks necessarily take longer to propagate.
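The headline figures above can be reproduced with a few lines of arithmetic. This sketch assumes, as the “approximately 87%” suggests, that an uncle earns 7/8 of a block’s base reward (the best case under Frontier’s rules; uncles included further back earn less), and it ignores transaction fees entirely.

```python
# Totals quoted above for the first 280,000 blocks.
BLOCKS = 280_000
UNCLES = 20_750

# Assumption: an uncle earns 7/8 of a block's base reward (the best case),
# so only 1/8 of the value of a stale block is actually lost.
UNCLE_RECOVERY = 7 / 8


def uncle_rate_per_block(blocks: int, uncles: int) -> float:
    """Uncles per canonical block."""
    return uncles / blocks


def uncle_rate_inclusive(blocks: int, uncles: int) -> float:
    """Uncles as a share of all blocks produced (canonical blocks plus uncles)."""
    return uncles / (blocks + uncles)


def effective_loss(inclusive_rate: float, recovery: float = UNCLE_RECOVERY) -> float:
    """Fraction of base-reward income lost to staleness once uncle rewards are paid."""
    return inclusive_rate * (1 - recovery)


per_block = uncle_rate_per_block(BLOCKS, UNCLES)
inclusive = uncle_rate_inclusive(BLOCKS, UNCLES)
loss = effective_loss(inclusive)
print(f"per block {per_block:.2%}, inclusive {inclusive:.2%}, loss {loss:.2%}")
```

The inclusive rate comes out at about 6.8994% (quoted above as 6.89%), and dividing by 8 gives an effective loss of roughly 0.86%, the “around 0.9%” figure.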
Second, we see a general trend that larger miners have lower uncle rates. This is, of course, expected; however, it is important to explore (1) why this happens, and (2) to what extent it is a genuine effect rather than merely a statistical artifact of the fact that smaller samples tend to produce more extreme results.
When categorized by miner size, the statistics are summarized as follows:
Number of blocks mined | Average uncle rate |
1-10 | 0.127 |
10-100 | 0.097 |
100-1000 | 0.087 |
1000-10000 | 0.089* |
>= 10000 | 0.055 |
* This result may be heavily skewed by a single outlier: the possibly malfunctioning miner shown as the dot on the chart at 4005 blocks mined and a 0.378 uncle rate. Excluding this miner gives an average uncle rate of 0.071, which fits the general trend much better.
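The table groups miners by the order of magnitude of blocks mined. Below is a minimal sketch of that grouping. It assumes that “average uncle rate” means an unweighted mean of per-miner rates (uncles mined divided by blocks mined), which the text does not spell out, and the miner records in the usage example are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def size_bucket(blocks_mined: int) -> str:
    """Order-of-magnitude label matching the table rows (lower bound inclusive)."""
    if blocks_mined >= 10_000:
        return ">= 10000"
    lower = 10 ** (len(str(blocks_mined)) - 1)
    return f"{lower}-{lower * 10}"


def bucket_averages(miners, exclude=None):
    """miners: iterable of (blocks_mined, uncles_mined), one pair per coinbase.

    Returns the unweighted mean uncle rate per size bucket. `exclude` is an
    optional predicate on (blocks_mined, uncles_mined) used to drop outliers.
    """
    rates_by_bucket = defaultdict(list)
    for blocks, uncles in miners:
        if exclude is not None and exclude(blocks, uncles):
            continue
        rates_by_bucket[size_bucket(blocks)].append(uncles / blocks)
    return {label: mean(rates) for label, rates in rates_by_bucket.items()}


# Hypothetical miners, NOT the real data set.
toy_miners = [(5, 1), (40, 4), (4000, 1500), (2000, 140), (30000, 1650)]
print(bucket_averages(toy_miners))
# Drop anything with an uncle rate above 30%, as with the outlier above.
print(bucket_averages(toy_miners, exclude=lambda b, u: u / b > 0.3))
```

With only a handful of miners per bucket, a single extreme rate dominates an unweighted mean, which is exactly why the footnoted bucket moves so much when one miner is dropped.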
There are four main hypotheses that could explain these results:
- Professionalism disparity: Larger miners are professional operations with more resources to invest in improving their overall connectivity to the network (e.g., better network equipment, closely monitoring their uncle rates and optimizing accordingly), and so achieve higher efficiency. Smaller miners, by contrast, are typically hobbyists mining on their personal laptops, and may not have particularly good internet connections.
- Last-block effect: The miner that produced the most recent block ‘knows’ about it instantly rather than waiting roughly one second for it to propagate across the network, and so has an advantage in finding the next block.
- Pool efficiency: The very largest miners are pools, and pools for some reason tend to be more network-efficient than individual miners.
- Time period differences: Pools and other large miners were not active during the first days of the blockchain, when block times were extremely fast and uncle rates were therefore high.
The last-block effect clearly does not explain everything. If it were the only cause, efficiency would improve linearly with size: miners who mined one block might see an 8% uncle rate, miners who mined 28,000 blocks (i.e., 10% of the total) a 7.2% uncle rate, miners who mined 56,000 blocks a 6.4% uncle rate, and so on. The reasoning is that a miner producing 20% of the blocks will have mined the latest block 20% of the time, and so faces a 0% expected uncle rate 20% of the time, which accounts for the 20% relative drop from 8% to 6.4%. Under this model, the difference between miners who mined one block and miners who mined 100 blocks would be negligible. In practice, stale rates fall with size in an almost perfectly logarithmic fashion, a pattern that fits the professionalism disparity explanation much better than any of the others. The time period difference explanation is also consistent with the observed curve, but only around 1,600 uncles (i.e., 8% of all uncles and 0.6% of all blocks) were mined during the chaotic first two days, when uncle rates were high; this factor can therefore account for at most ~0.6 percentage points of the overall uncle rate.
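To see why a pure last-block effect would give an almost flat curve at small sizes, here is the linear model from the paragraph above with its illustrative 8% base rate. The formula (expected uncle rate = base rate × (1 - network share)) is a simplification of the argument, not a fitted model; the last lines check the ceiling on the contribution of the first two days.

```python
TOTAL_BLOCKS = 280_000
BASE_RATE = 0.08  # illustrative rate for a miner with negligible share


def last_block_only_rate(network_share: float, base_rate: float = BASE_RATE) -> float:
    """Expected uncle rate if the only size advantage is the last-block effect.

    With probability `network_share` the miner found the previous block itself,
    so it has nothing to wait for and (in this model) zero stale risk.
    """
    return base_rate * (1 - network_share)


for blocks_mined in (1, 100, 28_000, 56_000):
    share = blocks_mined / TOTAL_BLOCKS
    print(f"{blocks_mined:>6} blocks -> {last_block_only_rate(share):.3%}")

# Upper bound on the contribution of the chaotic first two days.
EARLY_UNCLES, TOTAL_UNCLES = 1_600, 20_750
print(f"{EARLY_UNCLES / TOTAL_UNCLES:.1%} of uncles, "
      f"{EARLY_UNCLES / TOTAL_BLOCKS:.2%} of blocks")
```

Under this model the one-block and 100-block miners differ by less than 0.003 percentage points, nothing like the gap between the first two rows of the table.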
That professionalism disparity is the dominant factor is, in a sense, good news, particularly since (i) it matters more between small and medium scales than between medium and large scales, and (ii) individual miners usually benefit from countervailing economic factors that offset their lower efficiency, notably the fact that they are mostly using hardware they have already bought.
Next, what explains the drop from a 7.1% uncle rate for miners with 1,000 to 10,000 blocks to 5.5% for those above that threshold? The last-block effect could explain roughly 40% of it, but not all (quick calculation: the average miner in the first group has about 1% of the network, versus about 10% in the second; the 9-percentage-point difference should predict a drop from 7.1% to 7.1% * 0.91 ≈ 6.46%, i.e., about 0.64 of the 1.6 points actually observed). It is important to note, though, that with so few miners in these groups, any conclusions here should be considered extremely tentative at best.
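The quick calculation in parentheses can be written out as below. It applies the same simplified linear last-block model to the difference in average network share between the two groups (roughly 1% and 10%, as stated above) and reports what fraction of the observed drop that difference would explain.

```python
def last_block_fraction_explained(rate_small: float, rate_large: float,
                                  share_small: float, share_large: float) -> float:
    """Fraction of the observed uncle-rate drop between two groups of miners
    that the linear last-block model predicts from their share difference."""
    predicted_large = rate_small * (1 - (share_large - share_small))
    predicted_drop = rate_small - predicted_large
    observed_drop = rate_small - rate_large
    return predicted_drop / observed_drop


fraction = last_block_fraction_explained(0.071, 0.055, 0.01, 0.10)
print(f"{fraction:.0%}")
```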
The defining feature of the miners above 10,000 blocks, unsurprisingly, is that they are pools (or at least three of the five are; the remaining two are solo miners, though they are the two smallest). Interestingly, the uncle rates of the two non-pools are 8.1% and 3.5% respectively, a weighted average of 6.0%, which is close to the 5.4% weighted average stale rate of the three pools. Pools therefore appear slightly more efficient than solo miners, but once again this result should not be considered statistically significant: although each pool contributes a large sample of blocks, the number of pools is tiny. Furthermore, the most efficient mining pool is not actually the largest one (nanopool); it’s suprnova.
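As a sanity check on the weighted averages, the two solo-miner rates and their 6.0% weighted average pin down roughly how the blocks split between those two miners. This is a back-of-the-envelope inversion of the quoted figures, so rounding in the inputs makes the result approximate.

```python
def implied_weight(rate_a: float, rate_b: float, weighted_avg: float) -> float:
    """Solve w * rate_a + (1 - w) * rate_b = weighted_avg for miner A's weight w."""
    return (weighted_avg - rate_b) / (rate_a - rate_b)


w = implied_weight(0.081, 0.035, 0.060)
print(f"the 8.1% miner accounts for about {w:.0%} of the two miners' blocks")
```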
The pool results raise an interesting question: where do the efficiencies and inefficiencies of pooled mining come from? On the one hand, pools are likely to be very well connected to the network and good at propagating their own blocks; they also benefit from a weaker version of the last-block effect (weaker because of the single-hop round trip from miner to pool and back to miner). On the other hand, the delay in receiving new work from a pool after a block is created should slightly raise a miner’s stale rate: by about 1%, assuming a network delay of 200ms. These effects most likely cancel each other out to a large degree.
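A first-order estimate of the cost of that delay: after every new block, a pooled miner keeps hashing on outdated work for the extra round trip, so roughly delay / block time of its hashing (and therefore of its blocks) ends up stale. The 15-second average block time is an assumption (roughly Frontier’s target); the 200ms delay is the figure used above.

```python
def added_stale_rate(work_delay_s: float, block_time_s: float = 15.0) -> float:
    """Approximate extra stale rate from receiving new work late.

    After each new block the miner spends `work_delay_s` seconds on the old
    parent; any block found in that window will be stale. For delays much
    shorter than the block time this is about delay / block_time.
    """
    return work_delay_s / block_time_s


print(f"{added_stale_rate(0.2):.2%}")
```

This gives about 1.3%, in line with the rough 1% quoted; a longer average block time would shrink the penalty proportionally.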
The third important question is: how much of the variation reflects genuine inequality in miner connectivity, and how much is simply due to chance? To investigate this, we can do a simple statistical check. Below are the deciles of uncle rates for all miners that produced more than 100 blocks (the first figure is the lowest uncle rate, the second is the 10th percentile, the third is the 20th percentile, and so on, ending with the highest):
[0.01125703564727955, 0.03481012658227848, 0.04812518452908179, 0.0582010582010582, 0.06701030927835051, 0.07642487046632124, 0.0847457627118644, 0.09588299024918744, 0.11538461538461539, 0.14803625377643503, 0.3787765293383271]
Here are the deciles produced by a random model where each miner has a 7.41% “natural” stale rate and all discrepancies arise from differing luck:
[0.03, 0.052980132450331126, 0.06140350877192982, 0.06594885598923284, 0.06948640483383686, 0.07207207207207207, 0.07488986784140969, 0.078125, 0.08302752293577982, 0.09230769230769231, 0.12857142857142856]
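The luck-only model can be simulated along these lines. This is a sketch under several assumptions: a miner that mined n blocks is given Binomial(n, 0.0741) uncles, deciles are taken by nearest-rank indexing into the sorted rates, and the miner sizes are randomly generated stand-ins for the real per-miner block counts (those are in the raw data and not reproduced here).

```python
import random


def deciles(values):
    """Minimum, 10th, 20th, ..., 90th percentile and maximum (nearest rank)."""
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, n * k // 10)] for k in range(11)]


def luck_only_rates(block_counts, natural_rate=0.0741, rng=None):
    """Simulated uncle rates when every miner has the same natural stale rate.

    A miner with n blocks gets Binomial(n, natural_rate) uncles, so any spread
    in the resulting rates is pure luck.
    """
    rng = rng if rng is not None else random.Random()
    rates = []
    for n in block_counts:
        uncles = sum(rng.random() < natural_rate for _ in range(n))
        rates.append(uncles / n)
    return rates


rng = random.Random(42)
# Hypothetical miner sizes (each above 100 blocks), log-uniformly spread.
sizes = [int(10 ** rng.uniform(2.01, 4.5)) for _ in range(60)]
print([round(r, 3) for r in deciles(luck_only_rates(sizes, rng=rng))])
```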
Comparing the two lists, luck accounts for roughly half of the observed spread. The other half does come from genuine differences in connectivity; in particular, a simple model in which “natural” stale rates are normally distributed random variables with a mean of 0.09, a standard deviation of 0.06 and a hard minimum of 0 gives:
[0, 0.025374105400130124, 0.05084745762711865, 0.06557377049180328, 0.07669616519174041, 0.09032875837855091, 0.10062893081761007, 0.11311861743912019, 0.13307984790874525, 0.16252390057361377, 0.21085858585858586]
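The heterogeneous model differs only in how each miner’s natural rate is chosen. Again a sketch: clamping to 0 is my reading of “hard minimum of 0” (which would explain the exact 0 at the bottom of the deciles above), rates are also clamped at 1 as a safeguard, and the miner sizes are hypothetical stand-ins.

```python
import random


def deciles(values):
    """Minimum, 10th, 20th, ..., 90th percentile and maximum (nearest rank)."""
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, n * k // 10)] for k in range(11)]


def heterogeneous_rates(block_counts, mean=0.09, sd=0.06, rng=None):
    """Each miner draws its own natural stale rate from N(mean, sd), clamped to
    [0, 1], then gets Binomial(n, rate) uncles for its n blocks."""
    rng = rng if rng is not None else random.Random()
    rates = []
    for n in block_counts:
        natural = min(1.0, max(0.0, rng.gauss(mean, sd)))
        uncles = sum(rng.random() < natural for _ in range(n))
        rates.append(uncles / n)
    return rates


rng = random.Random(7)
# Hypothetical miner sizes (each above 100 blocks), log-uniformly spread.
sizes = [int(10 ** rng.uniform(2.01, 4.5)) for _ in range(60)]
print([round(r, 3) for r in deciles(heterogeneous_rates(sizes, rng=rng))])
```

Running this and the luck-only simulation on the same sizes gives a feel for how much extra spread per-miner differences in connectivity add on top of binomial noise.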
The normal model’s deciles are a fairly close fit, although they rise too quickly at the low end and too slowly at the high end; in reality, the best-fitting “natural stale rate distribution” appears to be positively skewed, which is what we would expect given the diminishing returns to spending more effort on becoming better connected to the network. All in all, these effects are not very large; especially once divided by 8 to account for the uncle mechanism, the differences are much smaller than the differences in electricity costs. Hence, the best approaches to improving decentralization going forward arguably focus on building more decentralized alternatives to mining pools; in the medium term, mining pools adopting strategies such as Meni Rosenfeld’s Multi-PPS may be a reasonable remedy.