- Author(s): @iajrz @otto_vargas @jdaugherty
Current values:
- MinSignedPerWindow: 0.6
- SignedBlocksWindow: 10
- DowntimeJailDuration: 3600000000000
- SlashFractionDowntime: 0.000001
Proposed values:
- MinSignedPerWindow: 0.8
- SignedBlocksWindow: 5
- DowntimeJailDuration: 14400000000000
- SlashFractionDowntime: 0.001
When validators go offline or spread bad gossip due to misconfigurations, their behavior can disrupt service and put the chain at risk of halting. This proposal aims to make it more expensive for a validator to go offline and to create mitigations that correlate with the danger posed to network performance and resilience. It does so by removing non-responsive validator nodes from the validator pool faster and by increasing penalties for misconfigured validators.
The MinSignedPerWindow represents the minimum fraction of blocks that need to be signed by validator nodes during the given window. If a validator falls under this percentage (currently 60%, or 6 out of 10 blocks), the validator gets jailed. This value can be increased to 80%, which would decrease the window of time before a validator invokes this mitigation from the current 1 hour and 15 minutes to 30 minutes.
In addition to increasing the MinSignedPerWindow, the SignedBlocksWindow should be halved from the current 10 blocks to 5 blocks, requiring the validator to sign 4 out of every 5 blocks rather than 8 out of 10 blocks, so that the network can react more quickly to validator misconfigurations.
As part of the penalty for failing to sign an adequate number of blocks in the given window, we propose that DowntimeJailDuration be increased from 4 blocks (1 hour) to a minimum of 16 blocks (4 hours), which is approximately 0.56% of a validator’s monthly availability (4 of the roughly 720 hours in a 30-day month).
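
The raw DowntimeJailDuration values are easier to compare once converted to hours and blocks. The sketch below assumes the parameter is expressed in nanoseconds (consistent with 14400000000000 being described as 4 hours in this proposal) and uses the approximate 15-minute block time quoted here. The function name is illustrative, not part of any Pocket tooling.

```python
NANOSECONDS_PER_HOUR = 3_600 * 1_000_000_000
BLOCK_MINUTES = 15  # approximate block time quoted in this proposal


def jail_duration(duration_ns):
    """Convert a DowntimeJailDuration value (assumed nanoseconds) to (hours, blocks)."""
    hours = duration_ns / NANOSECONDS_PER_HOUR
    blocks = hours * 60 / BLOCK_MINUTES
    return hours, blocks


for label, value in [("current", 3600000000000), ("proposed", 14400000000000)]:
    hours, blocks = jail_duration(value)
    print(f"{label}: {hours:g} hours, about {blocks:g} blocks")
```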
To further deter this downtime, SlashFractionDowntime should also be increased, creating proportional penalties for validators that do not sign enough blocks in the above windows.
These parameter updates aim to reinforce desired validator behavior in anticipation of potential downstream impacts of recent and upcoming protocol upgrades and software releases:
- The release of PIP-22 (RC-0.9.0) has increased validator diversity and the likelihood that servicers will become validators due to the introduction of stake-weighted reward bins. This creates the potential for influxes of validator misconfigurations, which should be disincentivized with more aggressive slashing.
- The release of Lean Pocket (RC-0.9.2) will introduce the ability for servicers to run stacked nodes. This increases the danger posed by validator misconfigurations for servicers who have consolidated into a more vertical model: bad gossip from misconfigured validators spreads more effectively, producing disagreements among validators about state, with consequences ranging from stuck nodes to chain halts.
MinSignedPerWindow and SignedBlocksWindow work in tandem; the idea is to reduce the amount of time a validator’s vote still counts toward consensus when it’s going to be offline for an indeterminate amount of time.
Currently, validators must sign 60% of blocks in the SignedBlocksWindow (6/10 blocks). The window has a fixed rollover point, meaning that if a node gets stuck at the 6th block of the window, it would be counted as a voter for up to 9 blocks before being removed from the validator pool (4 from the first window and 5 from the next). In terms of time, this means a validator node could be stuck for up to 2 hours and 15 minutes (since each block is ~15 mins) before it’s removed from the validator pool. Furthermore, the validator node can be out of service for one out of every two and a half hours with no negative consequences.
The proposed changes to SignedBlocksWindow make it so that 4 out of 5 blocks need to be signed, which corresponds to an increase of MinSignedPerWindow from 60% to 80% to keep these parameters closely coupled. This reduces the worst-case absenteeism scenario to 45 minutes (one block from an ending window and two blocks from the next window), increasing the level of service to a maximum downtime of 15 minutes every 75 minutes.
The combination of the factors mentioned above and the introduction of larger (stake-weighted) nodes creates an incentive for both large servicers and validators to run tight infrastructure with solid monitoring and recovery capabilities, improving overall network health.
In line with the above changes, DowntimeJailDuration would be increased so that the price of downtime is higher. The jail duration starts counting when the node is taken out of the validator pool, which today means a validator has to be offline for over an hour to be jailed for a single hour. We propose increasing the duration so that the punishment is significant relative to the downtime: a penalty of four hours for punishable downtime, which is reached either 30 minutes or 45 minutes after a validator goes offline, depending on window timing.
Breakdown of Current vs. Future state of Downtime Duration

| Scenario | Present Parameters | Future Parameters |
|---|---|---|
| Slowest Detection | 135 minutes | 45 minutes |
| Fastest Detection | 75 minutes | 30 minutes |
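
The detection times above can be reproduced with a small simulation. This is a simplified model, not the chain's actual slashing code: it assumes a fixed (non-sliding) window whose miss counter resets at rollover, that a validator is jailed at the first missed block that exceeds the allowed misses for the window, and a 15-minute block time. The function names are hypothetical.

```python
import math

BLOCK_MINUTES = 15  # approximate block time quoted in this proposal


def blocks_until_jailed(window, min_signed_fraction, offline_from):
    """Count consecutive missed blocks before jailing.

    `offline_from` is the 0-based position in the window at which the
    validator stops signing. Assumes a fixed window that resets at rollover.
    """
    # round() guards against floating-point noise before taking the ceiling.
    min_signed = math.ceil(round(window * min_signed_fraction, 9))
    allowed_misses = window - min_signed
    missed_total = 0
    missed_in_window = 0
    position = offline_from
    while True:
        missed_total += 1
        missed_in_window += 1
        if missed_in_window > allowed_misses:
            return missed_total
        position += 1
        if position == window:  # window rolls over, miss counter resets
            position = 0
            missed_in_window = 0


def detection_range(window, min_signed_fraction):
    """Return (fastest, slowest) detection time in minutes over all offsets."""
    counts = [blocks_until_jailed(window, min_signed_fraction, k) for k in range(window)]
    return min(counts) * BLOCK_MINUTES, max(counts) * BLOCK_MINUTES


print("present:", detection_range(10, 0.6))  # (75, 135)
print("future: ", detection_range(5, 0.8))   # (30, 45)
```

Under these assumptions, the slowest case occurs when the validator goes offline late enough in a window to use up every allowed miss without being jailed, then has to exceed the allowance again in the following window.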
As an additional means of reinforcing desired behavior through monetary disincentives, SlashFractionDowntime needs to increase so that it is proportional to the risk it is meant to deter. An increase to 0.1% of the staked amount, combined with a mean validator stake that is higher than it was in the past, means that this level of slashing is going to be felt.
This is particularly important because misconfiguration is the most common reason why these penalties would be applied. The proposed value changes are set against the average validator stake of 70k $POKT, which would result in the validator losing ~3% of their monthly rewards every time this penalty was incurred.
Breakdown of Current vs. Future state of reward penalties (using validator stakes as of Oct. 12th, 2022)

| Scenario | Present Parameters (0.0001% of stake) | Future Parameters (0.1% of stake) |
|---|---|---|
| Smallest Validator Stake | 63,200 uPOKT | 63,200,000 uPOKT |
| Average Validator Stake | 72,184 uPOKT | 72,184,000 uPOKT |
| Highest Validator Stake | 333,333 uPOKT | 333,333,000 uPOKT |
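
The slashed amounts in the table follow directly from stake multiplied by SlashFractionDowntime. The sketch below assumes 1 POKT = 1,000,000 uPOKT and infers the underlying stakes (63,200, 72,184 and 333,333 POKT) from the table itself; the function name is illustrative only. Exact fractions are used to avoid floating-point rounding on large uPOKT values.

```python
from fractions import Fraction

UPOKT_PER_POKT = 1_000_000  # assumption: uPOKT is one millionth of a POKT


def downtime_slash_upokt(stake_pokt, slash_fraction):
    """Return the uPOKT slashed for one downtime penalty.

    `slash_fraction` is passed as a decimal string, e.g. "0.001".
    """
    stake_upokt = stake_pokt * UPOKT_PER_POKT
    return int(stake_upokt * Fraction(slash_fraction))


# Stakes in POKT, inferred from the table above.
for name, stake in [("smallest", 63_200), ("average", 72_184), ("highest", 333_333)]:
    present = downtime_slash_upokt(stake, "0.000001")
    future = downtime_slash_upokt(stake, "0.001")
    print(f"{name}: {present:,} uPOKT -> {future:,} uPOKT")
```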
The proposed adjustments to the above parameters would reinforce desired validator behavior, mitigate the impact of validator misconfigurations on network health, and protect service nodes as new reward and node configuration models change the makeup of, and node interactions with, the validator pool.
The validator penalty is too aggressive
The penalty is intentionally aggressive to be proportional to the threat. Given that this proposal aims to incentivize less vigilant validators to prioritize quality, we believe this penalty is large enough to be felt without being an outsized threat to profitability (assuming the configurations of misbehaving validators are corrected accordingly).
2 hours of jailing is not enough for validators to notice
When used in conjunction with MinSignedPerWindow we believe that the increased chance of jailing makes 2 hours a good starting point that incentivizes validators to debug and address downtime issues before the jailing period is over (and, ideally, moving forward with improved monitoring, etc.).
Edit: Following forum discussions below, this value was deemed too low and has been upped throughout the proposal to set the new value of the DowntimeJailDuration parameter to 14400000000000 (4 hours).
Why can’t we increase MinSignedPerWindow without decreasing SignedBlocksWindow?
While increasing MinSignedPerWindow to 80% (8 out of 10 blocks) achieves similar results, it does not impact the time it takes for the network to react to this misconfiguration. Decreasing SignedBlocksWindow allows the network to react twice as fast to misconfigured validators and remove them from the validator pool, which is better for network consensus.
Copyright and related rights waived via CC0.