Attributes
- Author(s): chris-chainflow#4082
- Recipient(s): Authors
- Category: Imbursement
- Fulfills: N/A
- Related Installments: N/A
- Asking Amount: 421200
Summary
We will customize open source tools, building the necessary plugins, etc., to provide a more comprehensive, POKT-node-specific monitoring and alerting toolset. We will provide the source code and documentation necessary for node operators to implement the solution for themselves.
Abstract
Chainflow will build a monitoring and alerting toolset that provides better visibility into validator operations, resulting in a more secure and reliable network, while freeing time for validators to participate in other POKT community initiatives, e.g. governance.
Motivation
The POKT network requires nodes to secure and provide services to the network. These nodes need to meet strict and demanding availability and security requirements. Monitoring and alerting tools are needed to ensure these nodes are operating as expected.
Using these tools helps node operators provide the availability and security required and demanded by the network and its users. Establishing these tools as a helpful and necessary infrastructure component establishes good practices among node operators.
Open source monitoring and alerting tools offer a partial solution. Yet these tools need customization to monitor and alert on key node functions. Developing these tools often takes a back seat to “fire-fighting” activities, which often hijack a node operator’s daily operations.
As a result, these tools come to be seen as a “luxury” to “get to later”, and don’t get developed or implemented. Furthermore, smaller node operators, key to stake decentralization, may not have the resources required to develop such tools.
This project will use open source tools as its foundation. We will customize the tools, building the necessary plugins, etc., to provide a more comprehensive, POKT-node-specific monitoring and alerting toolset. We will provide the source code and documentation necessary for node operators to implement the solution for themselves.
This will save them considerable time, attention and effort, making it much more likely they will establish this important piece of a highly available and secure node operation. As a result, the POKT network will benefit from a set of more reliable and secure nodes, whose operators are able to prevent and address infrastructure issues before they become major problems.
This should free operator time and attention to put toward other value-added ecosystem activities, e.g. thoughtfully participating in network governance.
Budget
(Note: Develop Specification and Develop Dashboard phases leave a long window for community feedback. Timeline can be compressed by reducing these windows.)
1 - Set up project structure, 1 week
- Confirm timeline
- Set up repo
- Set up project board
Payment 1 - $8424
2 - Develop Draft Specification
- Monitoring and alerting tool draft requirements specification
- draft spec
- collect community feedback
Payment 2 - $8424
3 - Develop Final Specification
- Monitoring and alerting tool final requirements specification
Example -
- refine spec
- share final spec for community feedback
- publish final spec
Payment 3 - $8424
4 - Develop Dashboard Prototype, 10 weeks
- Dashboard prototype for review
- Final dashboard for release
- Installation, configuration and operation documentation
- Design and implement prototype
- Collect prototype community feedback
- Design and implement release version
- Develop documentation
- Release code and documentation
5 - Develop Final Dashboard, 10 weeks
- Dashboard prototype for review
- Final dashboard for release
- Installation, configuration and operation documentation
- Design and implement prototype
- Collect prototype community feedback
- Design and implement release version
- Develop documentation
- Release code and documentation
Payment 4 - $8424
Rationale
We have already agreed to split the first payment into two payments, listed as Payment 1 and Payment 2 above.
Dissenting Opinions
Appendix: On Using an Open Source Tool, Specifically Grafana/Prometheus
Each open source tool has its own strengths and weaknesses; none does it all.
We never represented we’d be building a new tool from scratch. We chose Grafana/Prometheus as a starting point and built on it from there.
Furthermore, nothing works “out of the box” without customization, implementation and configuration work.
In my 20+ years of working with network and systems management tools, I’ve never come across one that works “out of the box”. Each tool requires its implementer to dedicate significant time and resources to customizing, configuring and implementing it.
This is particularly true for open source software. It’s also why for-profit software companies typically generate more revenue and enjoy higher margins on selling services related to the software they sell, as compared to the revenue and margins from the software itself.
We’ll customize the “default” configuration to add metrics that validators find valuable. We’ve done the up-front work required to minimize any additional configuration a validator would have to do to monitor their validator infrastructure.
We will develop a complete alerting module from scratch to enable custom alerts. The module currently supports Telegram and email notifications, and we will build it so that its features can be extended in the future.
We will release developer docs with the actual release so anyone can easily extend the features and add more custom alerts. We would be happy to add alerts the community finds useful.
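To illustrate what an extensible alerter could look like, here is a minimal sketch in Python. It is not the module we will ship. The names `Notifier`, `AlertRule`, `RecordingNotifier` and `Alerter`, as well as the `blocks_behind` metric, are hypothetical and chosen only for this example. The idea is that delivery channels (Telegram, email) and alert rules are both pluggable, so adding either one does not require changing the core loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class Notifier(Protocol):
    """A delivery channel, e.g. Telegram or email (hypothetical interface)."""

    def send(self, message: str) -> None: ...


@dataclass
class AlertRule:
    """A custom alert: fires when predicate(metric value) is true."""

    name: str
    metric: str
    predicate: Callable[[float], bool]
    message: str


@dataclass
class RecordingNotifier:
    """Stand-in channel that records messages instead of sending them.

    A real channel would call the Telegram or mail service here.
    """

    channel: str
    sent: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.sent.append(f"[{self.channel}] {message}")


@dataclass
class Alerter:
    """Evaluates every rule and fans fired alerts out to every channel."""

    notifiers: List[Notifier] = field(default_factory=list)
    rules: List[AlertRule] = field(default_factory=list)

    def add_rule(self, rule: AlertRule) -> None:
        self.rules.append(rule)

    def add_notifier(self, notifier: Notifier) -> None:
        self.notifiers.append(notifier)

    def evaluate(self, metrics: Dict[str, float]) -> List[str]:
        """Return the names of the rules that fired for this metrics snapshot."""
        fired = []
        for rule in self.rules:
            value = metrics.get(rule.metric)
            # Rules whose metric is missing from the snapshot are skipped.
            if value is not None and rule.predicate(value):
                fired.append(rule.name)
                for notifier in self.notifiers:
                    notifier.send(f"{rule.name}: {rule.message} (value={value})")
        return fired
```

In this sketch, a new alert is one more `AlertRule`, and a new channel is any object with a `send` method, which is the kind of extension point the developer docs would describe.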
We will provide detailed implementation instructions and templates to minimize the implementation overhead required of validators. This makes it more likely the validators will actually use the tool.
The available monitoring information increases in value the more easily and quickly it can be understood, digested and acted upon, so the UI/UX is key.
Simply dumping a waterfall of metrics and stats on validators unnecessarily increases the operational burden on validators and makes the tool less useful. We received valuable feedback that confirmed this for us. That’s why we will include a summary view and pay specific attention to how the dashboards are organized.
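As a rough illustration of the summary-view idea, the sketch below collapses several raw metrics into one overall status plus a short list of notes. The metric names and thresholds are assumptions made up for this example, not values from the specification, and `summarize` is a hypothetical helper rather than part of Grafana or Prometheus.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: metric name -> (warning_at, critical_at).
# For every metric listed here, a higher value is worse.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "blocks_behind": (2, 10),
    "disk_used_percent": (80, 95),
}

_RANK = {"ok": 0, "warning": 1, "critical": 2}


def summarize(metrics: Dict[str, float]) -> Tuple[str, List[str]]:
    """Reduce a metrics snapshot to (overall status, notes needing attention)."""
    worst = "ok"
    notes: List[str] = []
    for name, (warn_at, critical_at) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            # Missing data is surfaced rather than silently treated as healthy.
            level = "warning"
            notes.append(f"{name}: no data")
        elif value >= critical_at:
            level = "critical"
            notes.append(f"{name}: {value} (critical at {critical_at})")
        elif value >= warn_at:
            level = "warning"
            notes.append(f"{name}: {value} (warning at {warn_at})")
        else:
            level = "ok"
        if _RANK[level] > _RANK[worst]:
            worst = level
    return worst, notes
```

An operator would then see a single status first and only the metrics that need attention, instead of the full list of stats.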
Deliverable(s)
See Budget section.
Contributor(s)
This will be a joint effort led by Chainflow (Chainflow Staking System Intro and FAQ) and supported by Vitwit (vitwit.com).
Together, we have built a similar tool for the Cosmos community, under an Interchain Foundation Grant.
Chainflow, specifically Chris Remus, will be responsible for the timely delivery of each deliverable.
Copyright
Copyright and related rights waived via CC0.