Here are the notes from the call. Thanks to @Patrick727 and @RichCL for writing these up.
1. Chain halt post mortem
- May 28th - most difficult chain halt we have experienced
- Why was this specifically so difficult for Pocket to overcome?
- The bug that caused it.
- The coordination effort, which is a feature of how Tendermint handles immediate consistency for every block.
- On Friday, we figured out the cause within the first hour.
- Source of issue was tx indexer. Specific combination of Txs threw off the state.
- Read only nature of Txs is helpful for …
- But Tx Indexer, the default was storing redundant Txs. This was in 6.1.
- When we went to 6.2 → we introduced a Tx Indexer that would remove redundant Txs.
- Error showed an insufficient fee Tx, when we had sufficient fees.
- Very niche case.
- We believe it was a random collision issue.
- Next step: How do we get back to consensus?
- We decided to all get to 6.3, but when we got there we were unable to produce a block. Rounds got pushed back to 71.
- Downside: not enough nodes were surviving to vote on the consensus rounds to produce a block.
- Attempted Resolution: Brought together the biggest node runners, tried to delete the WAL (write-ahead log) files, then get 67% in isolation
- Eventual Resolution: Code a patch and bypass those existing rounds. Start at round 0 again.
- We got past the chain halt at this height.
- Aftermath:
- Difficult for node runners to:
- delete WAL files
- deal with Txs that had been living in the gossip through 4 days of turmoil
- Final resolution:
- We proceeded with most nodes on 0.6.3, but some on 6.3.2
- Questions:
- none in Slido or #community-calls
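The 67% figure in the attempted resolution above reflects Tendermint's rule that a block is committed only when validators holding more than two thirds of the total voting power vote for it. A minimal sketch of that check follows; the function and parameter names are hypothetical and this is not pocket-core or Tendermint code.

```python
def has_quorum(voting_power_present: int, total_voting_power: int) -> bool:
    """Return True if strictly more than 2/3 of the voting power is present.

    Hypothetical helper for illustration only. Integer arithmetic is used
    so that exactly 2/3 is (correctly) rejected without rounding issues.
    """
    if total_voting_power <= 0:
        raise ValueError("total_voting_power must be positive")
    return 3 * voting_power_present > 2 * total_voting_power
```

With 100 units of voting power, 66 units online is not enough to produce a block while 67 is, which is why too few surviving nodes kept the rounds from completing.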
2. DAO voting period for RC 0.6.3 upgrade: 6/23/2021 - 6/30/2021
- Some of the features are currently dormant because the consensus rule change has yet to be activated by the DAO
- Next Wednesday 12pm EST, we will submit pocket gov upgrade tx to activate update stake, protobuf encoding, other changes mentioned in last Community Call
- great time for Settlers of New Chains proposal.
- Andrew: proposed changes
- Merkle sum tree patch
- Proposer selection patch
- don’t hash in the vote data
- protects us from a Grinding attack
- Protobuf as the new standard for storing your evidence
- As a reminder, these are consensus-breaking changes.
- Allows UX updates, including Edit Stake
- drop old chains, add new ones easily, add POKT to your stake
- All written documentation here - pocket-core/doc/specs/architecture.md at 0e999d96c2d3043f5456f6312fa5c563cd77cf87 · pokt-network/pocket-core · GitHub
- Question: “Andrew, will these updates result in faster syncing for our nodes?”
- Protobuf encoding gives a slight improvement when reading/writing
- 6.4 beta - anecdotally, we have seen 60x faster syncing due to caching methods
- Still beta, but very promising
- Question: “Did you ever consider snapshots regarding this syncing dilemma ?”
- It is on Tendermint’s roadmap, but has been there for more than a year
- As we veer away from Tendermint, we will look at this
- Question: “Are you advising/requesting additional nodes to run/test Beta 0.6.4? Or do you have enough testers?”
- Interesting because it comes up in the context of us about to submit 6.3 changes.
- We do not recommend anyone downgrade to 6.3
- I wouldn’t recommend anyone to go to 6.4 until consensus changes are implemented
- But not dissuading Runners from using bleeding edge technology
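The notes above say that some 0.6.3 features stay dormant until the DAO activates the consensus rule change through a governance upgrade transaction. One common way to model this kind of gating is an activation height that is unset until governance sets it. The sketch below assumes that model; the class, field, and function names are hypothetical and do not describe pocket-core's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpgradeState:
    # Assumption: the governance upgrade tx records the height at which the
    # new consensus rules apply. None means the DAO has not activated them.
    activation_height: Optional[int] = None


def feature_active(state: UpgradeState, current_height: int) -> bool:
    """Return True once the chain has reached the activation height."""
    if state.activation_height is None:
        return False  # feature ships in the binary but stays dormant
    return current_height >= state.activation_height
```

Under this model every node switches rules at the same block, which matters because the changes are consensus breaking.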
3. DAO voting period for “Settlers of New Chains”: 6/24 - 6/31
- The reason why we haven’t been able to freely add new chains is that there has been a chicken/egg problem for new or small chains
- risk of targeted servicing
- We initially wanted to get to a certain scale before we start issuing rewards, but then there’s an incentive bootstrapping problem before we get to that scale
- First attempt: Campaign Net - isolated incentivized testnets
- Second attempt: Centralized funding based on Dashboard data
- Both efforts - Not super sustainable, and expensive
- Instead, how about we control demand-side through Dashboard and subsidize demand for a period of time
- Combine this with functionality, and this becomes MUCH easier for the Node Runners
- no need to wait for the 21-day period
- In order to bootstrap a chain for a period of time, PNF provides Minimum Viable Relays (MVR)
- once organic demand meets MVR, or 3 months have passed, we take off the subsidized demand
- This is a temporary measure till future Consensus features can ameliorate targeted servicing risk
- Inflation effect is negligible
- <1% on total supply
- small cost to bootstrap growth
- Proposal summary:
- would delegate power to:
- choose new stakes chains
- control max application parameters
- give notice on new chains
- We would start with the following chains
- Avalanche Mainnet (0003)
- BSC (0004)
- BSC Archival (0010)
- Fuse (0005)
- xDAI (0027)
- Ethereum Archival Trace (0028)
- Ethereum Ropsten (0023)
- Ethereum Rinkeby (0025)
- Ethereum Goerli (0026)
- Ethereum Kovan (0024)
- Ethereum Archival (0022)
- On radar for next chains:
- Solana
- Polygon (Matic)
- Question: “Can we have the “realistic” expected volumes on each chains, to have that information taken into account into our analysis and voting?”
- Shane: Reason we are choosing BSC, Fuse, Ava - we have been talking to their foundations, and DApps who have been asking for their endpoints
- Now that we have Portal, we are in a good place to serve these new chains
- Jack: we would like to discourage pre-staking to a chain unless you have pre-spun that specific node
- Why? It negatively impacts service for the Application if the Portal sends you traffic and your node is down
- No need to pre-stake due to Edit Stake functionality
- Question: “Does any other nodes providers have experience with any of those protocols, and with running them as full nodes ?”
- Alex: Yes, absolutely. PNF has spun up fallback nodes for the majority of these chains.
- If you run under docker, you can use one of my boxes with all of these new chains - just see in Node Runner channel chat history
- Initially difficult, but getting a lot more stable
- Re: resource requirements. We are transitioning from OpenEthereum to Erigon (TurboGeth)
- space requirements are significantly more forgiving
- separates the RPC from the data layer
- easier to scale
- faster than OpenEthereum
- Also, we will be running them bare metal outside of Docker as well
- Question: “Bullish 4 SOL, but big latency constraints. Will perf be sufficient?”
- Have not run it
- Of course, it is enormous
- Submission to DAO in the works: adjust relays for specific chains to help compensate for these more expensive chains
- Koen: Re: Erigon - I have been following them but my experience is that they are not production ready. Data results vary across different versions. A bit concerned. They are pushing out patches, but there are data differences
- Alex: I have been monitoring this. I have seen minor data differences, but I would have to do more research. If you know of any major data differences, please let me know.
- Syncing an archival node was taking 6-8 weeks. Without a filed version, it was near impossible.
- Koen: yeah, it was taking me 3 weeks last summer
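The subsidy rule described earlier in this section (PNF subsidizes demand until organic demand meets the MVR, or until 3 months have passed) can be sketched as a single check. The names below are hypothetical, and the 90-day window is an assumption standing in for "3 months"; the actual mechanics are defined by the proposal, not by this sketch.

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days for illustration.
SUBSIDY_WINDOW = timedelta(days=90)


def subsidy_active(organic_relays: int, mvr: int,
                   subsidy_start: date, today: date) -> bool:
    """Return True while a chain should still receive subsidized demand.

    The subsidy ends as soon as either condition from the notes holds:
    organic demand has reached the MVR, or the window has elapsed.
    """
    if organic_relays >= mvr:
        return False  # organic demand now meets the Minimum Viable Relays
    return today - subsidy_start < SUBSIDY_WINDOW
```

For example, a chain at half of its MVR two weeks after launch would still be subsidized, while the same chain three months later would not.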
4. Node Pilot keeps shipping! Open-source library
Repo: GitHub - decentralized-authority/node-launcher: Full crypto node runner used in Node Pilot.
