For those of you who are unaware, we are currently experiencing a chain halt, which means we don’t have the 67% consensus required between nodes to produce new blocks. We have been providing updates in the #node-runner Discord channel (which you can join by clicking the robot emoji in #welcome) whenever we have new developments to share, but we should have been providing more frequent public updates. Now that we have a solid grasp of the issue and how to resolve it, this is the first in a regular cadence of status updates we’ll be sharing until the chain gets moving again.
The Issue
The cause of the chain halt was a deterministic app hash error based on the new transaction indexer introduced in RC-0.6.3. The transaction indexer is used for consensus in only one place, replay protection, and this causes RC-0.6.3 and previous releases to handle a particular edge case differently.
Unlike a chain halt that results from node downtime, this one required identifying the cause (to determine whether a hotfix was required) and coordinating with node runners to update their software. This is why the chain halt has taken longer to resolve than might otherwise be expected.
Now the nodes are on the software we need them to be on, but there’s a different issue stemming from the chain halt itself. The longer the halt has persisted, the more voting rounds have occurred (72 in total) and the more memory nodes have had to retain. That has made it harder to keep nodes from crashing, and harder to get 67% of nodes to stay caught up on the round data and vote in sync. Solving this is our main focus now.
The Backup
One thing that is important to highlight, since not everyone may be aware of the backup mechanisms we have in place, is that service to applications has remained uninterrupted for the duration of the halt. This is because the majority of applications use the Pocket Dashboard to connect to Pocket, and we have built-in backup nodes that ensure applications’ relays continue to be serviced in any event.
The Solution
Once we identified that the root cause was the transaction indexer in RC-0.6.3, we coordinated with the largest node runners to get them all updated and ensure that 67% of the network is operating by the rules of the new transaction indexer. This has been completed successfully.
Now we are working to collectively disregard the 72 unsuccessful voting rounds, to lighten the load and make it easier to get that next block produced. As I write this, the core devs are working on a patch that will skip these voting rounds for just this block height. Node runners will be given time to update to this new version once it is released, with a deadline, set to account for different time zones, at which upgraded nodes will wake. Once 67% have updated to the new version and the nodes wake, nodes will achieve consensus on the next voting round and finally produce the next block.
We recommend that ALL node runners continue paying attention to these updates, as the more who upgrade to the new patch, the quicker we’ll unhalt the chain.
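As a rough illustration of the 67% condition described above, here is a minimal sketch in Python. The function name and parameters are hypothetical and are not part of Pocket Core. Whether the threshold is measured by node count or by voting power, and whether it is inclusive, are assumptions made for the example.

```python
def has_quorum(upgraded: int, total: int, threshold_pct: int = 67) -> bool:
    """Return True when the upgraded share meets the threshold.

    `upgraded` and `total` are hypothetical inputs: they could be node
    counts or voting power, depending on how the network weighs votes
    (an assumption here). The threshold is treated as inclusive.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= upgraded <= total:
        raise ValueError("upgraded must be between 0 and total")
    # Integer comparison avoids floating-point rounding at the boundary.
    return upgraded * 100 >= threshold_pct * total
```

With these assumptions, 67 upgraded out of 100 passes the check, while 66 out of 100 does not, which is why every additional node runner who upgrades helps.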
The Implications
- The chain will not be resuming until the wake deadline after the above patch is released. Edit: this is now May 31st 6pm EST.
- The long tail of node runners who do not participate in this hotfix will miss a block, which will result in jailing for any who also missed 3 out of the past 9 blocks. However, after this block they will continue to be in consensus.
- Node runners who update to the patch should revert to 0.6.3, because it is confirmed that the patch will only affect the next block (27197) and 0.6.3 will remain in consensus moving forward.
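For node runners wondering whether missing block 27197 would put them at risk, the jailing condition in the list above can be sketched as follows. This is only an illustration. The function name is hypothetical, and it assumes "3 out of the past 9" means at least 3 misses within the 9 blocks immediately before the one being skipped. The protocol's actual accounting may differ.

```python
def jailed_if_next_block_missed(past_missed: list[bool],
                                limit: int = 3,
                                window: int = 9) -> bool:
    """Estimate whether missing the next block would lead to jailing.

    `past_missed` is a hypothetical record, oldest first, where True
    marks a block the node missed. Only the most recent `window`
    entries are considered (assumed semantics).
    """
    recent = past_missed[-window:]
    # Jailing is assumed to apply when the node already has at least
    # `limit` misses in the window before the block it is about to miss.
    return sum(recent) >= limit
```

A node with a clean record over the previous 9 blocks would not be jailed under this reading, even if it misses the patched block.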
The Silver Linings
- A blockchain is only as good as its node runners, and our node runners have really stepped up this weekend, coordinating around the clock with our core devs to diagnose the halt, keep their nodes up, and cooperate across time zones to achieve consensus. It is seriously heartening to see the camaraderie that has been displayed during this challenging time, and it bodes well for the future resilience of our community.
- Once the chain gets moving again, we have enough consensus on 0.6.3 to activate all of the 0.6.X features: UpdateStake functionality (which will be part of a new process for whitelisting new chains more rapidly), higher network stability, Protobuf encoding for easier SDK/client development, etc.