PIP-4: Consensus Rule Change: 0.6.3

Proposal Edit – April 14th, 2021

0.6.0 has been replaced by 0.6.1, which fixes a critical bug that was discovered during the testnet upgrade rehearsal. Refer to this announcement for details on the hotfix.

Proposal Edit – May 11th, 2021

0.6.1 has been replaced by 0.6.3, which introduces a much-requested feature: Update Stake. Refer to this announcement for details on the new feature.

Attributes

Summary

Pocket Network Inc. is releasing a consensus rule change with the 0.6.3 release. The release contains several bug fixes, scalability features, and quality-of-life improvements.

Abstract

This Pocket Core release (0.6.3) improves security with two mission-critical patches to the merkle tree, improves network stability by removing or patching events, and changes the encoding algorithm from Amino to Google’s Protobuf.

Motivation

There are two major security issues in the merkle tree proof/claim implementation, as well as an exploitable prediction attack caused by a flaw in block hash generation. The current encoding scheme is both ‘custom’ and unsupported in almost all programming languages, which hinders ecosystem growth and future development. Lastly, this release partially addresses PUP-4.

Specification

0.6.0

  • Convert all consensus level amino encoding (including but not limited to the internal storage codecs) to protobuf encoding while maintaining as many legacy structures as possible

  • Introduce Previous Block Validator Voting structure into the block hash used for session and proposer selection algorithms.

  • Use the index of the leaves of the plasma core merkle tree as part of the parent hash to lock in the values using the Claim merkle root

  • Ensure consensus level events are not concatenated in the pocket core module by initializing in the transaction handler

  • Change ABCIValidatorUpdate to ABCIValidatorZeroUpdate for separation of service and validation

0.6.1

  • Hotfixed issue wherein GetParams() would ignore RelaysToTokenMultiplier
  • Added utility CLI command to convert evidence from amino to protobuf
  • Updated RPC spec to include stdTx
  • Return Dispatch for certain failed relay codes to save a hop on client side
  • Fix simulate relay to use basic auth
  • Updated User Guide to use RC-0.6.1
  • Added unsafe delete command to the keybase

0.6.3

  • Added Update Stake, which enables apps/nodes to edit certain parameters of their stake without needing to unstake first, by simply submitting the stake transaction again.
  • MaxApplications threshold activated (if all slots are taken, no additional apps can stake regardless of the stake amount)
  • Return AllClaims if no address is passed to nodeClaims query
  • No ABCI query during newTx() function in pocket core module
  • Change StdSignature from Base64 to Hex in RPC
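
For client authors affected by the last item, here is a minimal sketch of re-encoding a signature from Base64 to hex and back. It only illustrates the two text encodings; the function names and the 64-byte example value are hypothetical, and the exact RPC field layout is not shown here.

```python
import base64


def signature_base64_to_hex(sig_b64: str) -> str:
    """Re-encode a Base64 signature string as lowercase hex."""
    raw = base64.b64decode(sig_b64, validate=True)
    return raw.hex()


def signature_hex_to_base64(sig_hex: str) -> str:
    """Re-encode a hex signature string as Base64."""
    raw = bytes.fromhex(sig_hex)
    return base64.b64encode(raw).decode("ascii")


# Hypothetical 64-byte value standing in for a real signature.
example_sig = bytes(range(64))
as_b64 = base64.b64encode(example_sig).decode("ascii")
as_hex = signature_base64_to_hex(as_b64)
```

Both strings carry the same bytes, so tooling that stored Base64 signatures can convert them without re-signing anything.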

Rationale

The merkle tree bug fixes increase network security and stability, so applications and node runners should experience a higher degree of reliability.

With Protobuf encoding, building and improving client-side tooling such as SDKs becomes much easier, which should help expand the potential app user base and improve the overall development experience on Pocket Network.

In addition, a bug identified in event handling has now been fixed, which reduces block sizes and should enable faster transactions and better overall service.

We have successfully separated servicing and validation, which allows more nodes overall and more scalability: the network is, technically, no longer capped at 5000 nodes. That said, PUP-4 will likely still try to limit nodes to 5000, because jailing is not available to servicers, which may lead to service degradation.

Protobuf encoding should also slightly lower transaction latency, because it places less resource demand on nodes.

Viability

An extensive set of functional, integration, unit, load, and simulation tests was completed leading up to this upgrade. These can be found in the release notes.

Implementation

The implementation of 0.6.0 is nearly complete. What remains is a few pending tests, agreement on an upgrade height, and approval of this proposal.

0.6.3 has been released. Once ≥67% of validator power has updated to this version, this proposal will be edited to specify an upgrade height and voting will commence. Voting will last 7 days and pass with a 50% majority. If the vote passes, the Foundation will activate the upgrade at the specified height using the pocket gov upgrade transaction.

Audit

There was no external audit; refer to Viability.

Copyright

Copyright and related rights waived via CC0.

Now it seems we have enough validator power to proceed with the process of specifying an upgrade height, voting on this proposal, and ultimately submitting the gov upgrade transaction (per the Implementation section above).

To ensure full transparency/inclusion, I will share the data that leads me to believe we have enough validator power (so that it can be audited) and an off-chain poll to let node runners signal which upgrade height they are comfortable with.

Upgrade Readiness Data

With the help of C0D3R’s network charts, I was able to get a starting point for adoption percentages.

It was then a question of tracking down the Unknown nodes (a status that typically results from a masked v1 endpoint) and confirming either of the following:

  • Enough of the Unknowns are on 0.6.3+ to achieve ≥67% consensus overall when combined with the Knowns
  • Enough of the Unknowns are jailed, thus not participating in consensus, to achieve ≥67% consensus with the Knowns alone

It seems that we have the latter case.

I asked C0D3R to add columns to the Version Explorer displaying current staked amount and jailed blocks (where 0 means unjailed). I then sorted by the largest stakes and went about recording which Unknown nodes are both large (in terms of Validator weight) and jailed, meaning their weight is irrelevant to consensus.

I cross-checked this with the genesis file and can confirm that all of these jailed Unknowns are nodes that have been jailed since genesis, which means they are unlikely to come online soon and, if they do, will most likely come online under the latest version.

The result is an estimate that approximately 87% of validator power is currently running 0.6.3+. We need 67% to maintain consensus.
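
For anyone who wants to audit the estimate, here is a minimal sketch of the calculation described above. It assumes validator weight is proportional to stake, excludes jailed nodes from the total, and conservatively counts unjailed Unknowns as not upgraded. The `Validator` type, the helper names, and the sample stakes are all hypothetical; the real figures are in the spreadsheet linked below.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Validator:
    stake: int
    jailed: bool
    version: Optional[str]  # None means the version is Unknown


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.6.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def upgraded_power_share(validators: List[Validator], minimum: str = "0.6.3") -> float:
    """Share of unjailed stake known to be running `minimum` or later."""
    active = [v for v in validators if not v.jailed]
    total = sum(v.stake for v in active)
    if total == 0:
        return 0.0
    upgraded = sum(
        v.stake
        for v in active
        if v.version is not None and parse_version(v.version) >= parse_version(minimum)
    )
    return upgraded / total


# Hypothetical sample, not the real network data.
sample = [
    Validator(stake=50_000, jailed=False, version="0.6.3"),
    Validator(stake=30_000, jailed=False, version="0.6.3"),
    Validator(stake=10_000, jailed=False, version="0.5.2"),
    Validator(stake=10_000, jailed=False, version=None),   # unjailed Unknown
    Validator(stake=100_000, jailed=True, version=None),   # jailed Unknown, ignored
]
share = upgraded_power_share(sample)
ready = share >= 0.67
```

In this sample the large jailed Unknown drops out of the denominator entirely, which is the same effect that lifts the real estimate above the 67% threshold.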

Data here: https://docs.google.com/spreadsheets/d/1bXIBCnDi2gpagEj86V9f-6-BH1PEyiyZMyZQoCmGRVA/edit?usp=sharing

Upgrade Height Poll

Typically, the gov upgrade transaction should pass without anyone noticing, other than being able to use new features that the new consensus rules bring.

However, since we are changing the consensus rules, there is a non-zero risk that a bug emerges once the upgrade is activated, no matter how much testing has been done.

This means we should 1) allow some breathing room in light of the recent chain halt, and 2) choose a day/time when everyone will be awake and available to respond in the subsequent hours if needed. 11am EST seems to be the safest in my eyes, leaving a buffer before Europeans close up for the day, while being late enough that West Coast Americans should wake up before/during a hypothetical response period.

Using this reasoning, I would suggest the following options.

Which upgrade height would you prefer?
  • 28633 (Wed June 16th ~11am EST)
  • 29113 (Mon June 21st ~11am EST)
  • 29305 (Wed June 23rd ~11am EST)
  • Other (respond with suggestions)
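
If you want to suggest another height, here is a rough sketch of how heights map to dates. It assumes a steady 15-minute block time, which is what the options above imply (480 blocks between June 16th and June 21st), and it uses naive datetimes in the poll's ~11am EST frame. Actual block times drift, so treat the results as estimates; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Reference point taken from the first poll option.
REFERENCE_HEIGHT = 28633
REFERENCE_TIME = datetime(2021, 6, 16, 11, 0)  # ~11am EST, naive datetime

# Assumed average block time, implied by the poll options.
BLOCK_TIME = timedelta(minutes=15)


def estimate_time(height: int) -> datetime:
    """Estimate when a given block height will be reached."""
    return REFERENCE_TIME + (height - REFERENCE_HEIGHT) * BLOCK_TIME


def height_at(target: datetime) -> int:
    """Estimate the block height closest to a target date and time."""
    return REFERENCE_HEIGHT + round((target - REFERENCE_TIME) / BLOCK_TIME)


monday_option = estimate_time(29113)
wednesday_option = estimate_time(29305)
```

For example, `height_at(datetime(2021, 6, 22, 11, 0))` gives an estimated height for a Tuesday at the same time of day.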


I would personally prefer the earlier options, because the sooner we upgrade the sooner we activate all of the features that 0.6.X brings, including network stability enhancements, UpdateStake, and MaxApplications (which will help us to whitelist new chains more rapidly). But I understand that some node runners may still be burned out from the chain halt response.

Please signal your preference in the poll above. Once we have a signal, we’ll edit the proposal to specify an upgrade height and publish it for voting.

I would favor doing this upgrade as early as possible if it improves the network stability, but I am skeptical about it and I think adding features at the moment will make it worse (happy to be proven wrong).

We’ve been looking for stability improvements for a few releases but we’ve had multiple big outages & bugs (fastsync v0/v1, consensus split, 5K limit jail-looping some validators). Some node runners are still fire-fighting the situation since the halt, without a path out, and I think adding features will increase the risk to the point they won’t even be able to move ahead if there is another consensus break. I wish I could cast a positive vote on the poll, but I don’t see a foreseeable path to a stable situation at this stage.

Regarding the switch itself, there will likely be hiccups for node runners that have automation around the CLI, as the CLI options to stake/unstake/unjail etc. will change. +1 for something like 11am EST; picking a Monday or Tuesday could also help save a weekend in case things go wild.


Thanks @mxs for this analysis. I agree completely: network stability and operability on the node runners’ side are paramount at this time.
Thanks