PEP-1: POKT Node Monitoring and Alerting Dashboard

Attributes

  • Author(s): chris-chainflow#4082
  • Recipient(s): Authors
  • Category: Imbursement
  • Fulfills: N/A
  • Related Installments: N/A
  • Asking Amount: 421200

Summary
We will customize open source tools, building the necessary plugins, etc., to provide a more comprehensive, POKT-node-specific monitoring and alerting toolset. We will provide the source code and documentation necessary for node operators to implement the solution for themselves.

Abstract
Chainflow will build a monitoring and alerting toolset that provides better visibility into validator operations, resulting in a more secure and reliable network, while freeing time for validators to participate in other POKT community initiatives, e.g. governance.

Motivation
The POKT network requires nodes to secure and provide services to the network. These nodes need to meet strict and demanding availability and security requirements. Monitoring and alerting tools are needed to ensure these nodes are operating as expected.

Using these tools helps node operators provide the availability and security required and demanded by the network and its users. Establishing these tools as a helpful and necessary infrastructure component establishes good practices among node operators.

Open source monitoring and alerting tools offer a partial solution. Yet these tools need customization to monitor and alert on key node functions. Developing these tools often takes a back seat to “fire-fighting” activities. These activities often hijack a node operator’s daily operations.

As a result, these tools come to be seen as a “luxury” to “get to later”, and so don’t get developed or implemented. Furthermore, smaller node operators, key to stake decentralization, may not have the resources required to develop such tools.

This project will use open source tools as its foundation. We will customize the tools, building the necessary plugins, etc., to provide a more comprehensive, POKT-node-specific monitoring and alerting toolset. We will provide the source code and documentation necessary for node operators to implement the solution for themselves.

This will save them considerable time, attention and effort, making it much more likely they will establish this important piece of a highly available and secure node operation. As a result, the POKT network will benefit from a set of more reliable and secure nodes whose operators can prevent and address infrastructure issues before they become major problems.

This should free operator time and attention to put toward other value-added ecosystem activities, e.g. thoughtfully participating in network governance.

Budget
(Note: Develop Specification and Develop Dashboard phases leave a long window for community feedback. Timeline can be compressed by reducing these windows.)

1 - Set-up project structure, 1 week

  • Confirm timeline
  • Set-up repo
  • Set-up project board

Payment 1 - $8424

2 - Develop Draft Specification

  • Monitoring and alerting tool draft requirements specification
  • draft spec
  • collect community feedback

Payment 2 - $8424

3 - Develop Final Specification

  • Monitoring and alerting tool final requirements specification

Example -

  • refine spec
  • share final spec for community feedback
  • publish final spec

Payment 3 - $8424

4 - Develop Dashboard Prototype, 10 weeks

  • Dashboard prototype for review
  • Final dashboard for release
  • Installation, configuration and operation documentation
  • Design and implement prototype
  • Collect prototype community feedback
  • Design and implement release version
  • Develop documentation
  • Release code and documentation

5 - Develop Final Dashboard, 10 weeks

  • Dashboard prototype for review
  • Final dashboard for release
  • Installation, configuration and operation documentation
  • Design and implement prototype
  • Collect prototype community feedback
  • Design and implement release version
  • Develop documentation
  • Release code and documentation

Payment 4 - $8424

Rationale
We have already agreed to split the first payment into two payments, listed as Payment 1 and Payment 2 above.

Dissenting Opinions

Appendix: On Using an Open Source Tool, Specifically Grafana/Prometheus

Each open source tool has its own strengths and weaknesses; none does it all.

We never represented we’d be building a new tool from scratch. We chose Grafana/Prometheus as a starting point and built on it from there.

Furthermore, nothing works “out of the box” without customization, implementation and configuration work.

In my 20+ years of working with network and systems management tools, I’ve never come across one that works “out of the box”. Each tool requires its implementer to dedicate significant time and resources to customizing, configuring and implementing it.

This is particularly true for open source software. It’s also why for-profit software companies typically generate more revenue and enjoy higher margins on selling services related to the software they sell, as compared to the revenue and margins from the software itself.

We’ll customize the “default” configuration to add metrics that validators find valuable. We’ve done the up-front work required to minimize any additional configuration a validator would have to do to monitor their validator infrastructure.

We will develop a complete alerting module from scratch to enable custom alerts; it currently supports Telegram and email notifications. We will build the custom alerter module so that its features can be extended in the future.

We will release developer docs with the actual release so anyone can easily extend the features and add more custom alerts. We would be happy to add alerts the community finds useful.
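As a rough illustration only, and not the actual module, an extensible alerter of the kind described above could be sketched as follows. Every name here (`Alert`, `Alerter`, `register`, `send`, the stand-in channels) is hypothetical; real channels would call the Telegram Bot API or an SMTP server instead of appending to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str      # hypothetical alert identifier, e.g. "node_down"
    message: str   # human-readable text delivered to the operator


# A channel is any callable that delivers an Alert (Telegram, email, ...).
Channel = Callable[[Alert], None]


@dataclass
class Alerter:
    channels: Dict[str, Channel] = field(default_factory=dict)

    def register(self, name: str, channel: Channel) -> None:
        """Add a delivery channel without touching the existing ones."""
        self.channels[name] = channel

    def send(self, alert: Alert) -> List[str]:
        """Deliver the alert on every channel; return the names that succeeded."""
        delivered = []
        for name, channel in self.channels.items():
            try:
                channel(alert)
                delivered.append(name)
            except Exception:
                # One failing channel should not block the others.
                continue
        return delivered


# Stand-in channels that only record what they would have sent.
sent_log: List[str] = []


def fake_telegram(alert: Alert) -> None:
    sent_log.append(f"telegram: {alert.message}")


def fake_email(alert: Alert) -> None:
    sent_log.append(f"email: {alert.message}")


alerter = Alerter()
alerter.register("telegram", fake_telegram)
alerter.register("email", fake_email)
delivered = alerter.send(Alert("node_down", "Node 1 is not responding"))
```

Adding a new notification type in this sketch means writing one function and calling `register`, which is the kind of extension point the developer docs would describe.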

We will provide detailed implementation instructions and templates to minimize the implementation overhead required of validators. This makes it more likely the validators will actually use the tool.

The available monitoring information increases in value the more easily and quickly it can be understood, digested and acted upon, so the UI/UX is key.

Simply dumping a waterfall of metrics and stats on validators unnecessarily increases the operational burden on validators and makes the tool less useful. We received valuable feedback that confirmed this for us. That’s why we will include a summary view and pay specific attention to how the dashboards are organized.

Deliverable(s)
See Budget section.

Contributor(s)
This will be a joint effort led by Chainflow (chainflow.io/staking) and supported by Vitwit (vitwit.com).

Together, we have built a similar tool for the Cosmos community, under an Interchain Foundation Grant.

Chainflow, specifically Chris Remus, will be responsible for the timely delivery of each deliverable.

Copyright
Copyright and related rights waived via CC0.

Thanks for the in-depth proposal @chris-remus. It’s a clear need for nodes of all ranges. Looking forward to voting on this.

I see $33,000.00 USD and 21 weeks to develop and deliver monitoring and alerting software which is perhaps one or two iterations better than the two community contributions which have been developed for free in the last 30 days.
The proposal monitoring part is at least new… but… are we saying that only validators vote now?
My experience with governance networks (Dash and Blocknet) is that web portal integration is key to getting participation in the voting process. Do I misunderstand Pocket’s voting structure? Don’t non-nodes have voting power also?
I’d like to see this proposal replaced by a proposal to reward the current providers of monitoring software and incentivize them to the next level.
If someone wants to put up a separate proposal to facilitate the voting process via a web portal… that would be cool.

I believe there’s already some great work on a tool // dashboard (POKT ROKT) that already does this functionality. Maybe we should funnel resources to expand that or build on top of that before allocating more resources here.

Thanks for the awesome plug. I’m very proud of the ROKT, but… Let me clarify here that I am NOT personally going to accept any POKT for my part in the process because I am affiliated with the Pocket Team, but I will support any community effort in this area because it’s good for everyone.

Agreed with the sentiment. What’s needed? A formal proposal for this? I’ve been working on something similar for the past couple of weeks and would love any incentive to keep building on top. Would accept POKT as grant payment :smiley:

Here’s the (work in progress) dashboard:
https://pokt-monitor-spa.vercel.app/

And here’s the very simple express server that powers it https://github.com/amhed/pokt-monitor-server

You’re correct @BenVan that our governance takes place off-chain and it’s not just nodes that are entitled to a vote. We’re currently using a Rinkeby Aragon instance while we work on developing a custom governance dashboard (web portal), which will also use Aragon (and Discourse like we’re currently doing). So, to be clear, proposal monitoring wouldn’t be part of these dashboards.

That said, I don’t think @chris-remus was necessarily proposing to have governance features in his dashboard. There is mention of proposals in the example Google Sheet, and mention of freeing up time for nodes to participate in more value-add activities such as governance, but no formal proposal to build governance functionality.

On the subject of monitoring dashboards, I would generally lean towards supporting a diversity of implementations, because I believe competition ultimately leads to better products. If the offerings overlap significantly then, sure, it makes sense to choose, but if they take different approaches to the node ops experience (e.g. ROKT seems more CLI-centric, whereas Chainflow would be building Telegram/email notifications) then it’s worth considering how they complement each other.

If you look closely at the Cosmos Validator Mission Control repo, it is clearly stated that the work on this dashboard has been outsourced to an Indian company called Vitwit.

I would be pretty disappointed to see some POKT being allocated to this kind of project, especially when people from the community have already built some great tools that will continue to evolve over time, because these people are really part of the community and won’t be leaving after finishing a one-off mission. I’m afraid that if we approve this kind of project, the POKT allocated will just get sold once they get the first occasion, which is not good for the growth of this project.

I would prefer to see @BenVan and @amhedh being compensated for their great tools and help on Discord, which would also motivate them to improve their projects and I don’t think they would sell their hard earned POKT right away.

If anyone has an idea for a separate proposal, feel free to submit a PEP here! And if you need any help or have any questions, shoot a message to the #proposal-help channel in Discord.

@BenVan we can’t replace this proposal with another proposal, because all proposals should receive fair consideration from voters. But if any of you feel strongly about a different approach, publish that approach as its own proposal and the voters can decide for either, all, or none of them.

Also, in case anyone wasn’t aware, there’s actually a Reimbursement category within PEPs to reward work that was previously done, as well as a Bounty category to reward future work by an unspecified contributor.

Thank you for the insights, guys.

Since both proposals solve the same problem, it makes sense to have the voting for both of them at the same time (A or B or both). It is also important to understand the pros and cons of both approaches if one has to choose either, and to understand the possibility of collaboration if the two solutions are complementary. It would make sense to have Pocket core team and community-oriented insight to this end. More importantly, what features are important for the node runners, and which proposal best achieves this? This hasn’t been clear so far.

As for incentive for the work already done, the reimbursement program mentioned by Jack seems like a viable solution.

Hi Everyone :wave:

Thanks for the feedback to-date. To clarify, what we are proposing is a full featured monitoring and alerting dashboard for POKT nodes.

This tool -

  • Helps operators run their infrastructure more reliably
  • Resulting in a more resilient network
  • While reducing operational overhead
  • Allowing them to participate more actively in the POKT community

If you look at the screenshots here, you’ll see this is an operational tool whose target user is POKT node runners. That said, we can also allow certain metrics to be exposed externally to the community.

To address specific comments -

1 - Relevancy

You’ll see in the proposal multiple feedback stages. We’ve intentionally included these to make this tool as relevant as possible to its target audience.

2 - POKT ROKT

I haven’t seen this and would be glad to take a look. It’s hard for me to say what the overlap is without having seen it first. Regardless, as @JackALaing said, I also support a diversity of platform implementations, to reduce operational risk.

3 - @amhedh’s work

What we propose seems very different from, yet complementary with this tool. Our tool looks at a multitude of key operational metrics, rather than specifically reporting on a node’s earnings.

4 - Governance

The governance dashboard displays information about current and past proposals, as well as the particular node operator’s activity related to that proposal. It sends alerts when new proposals are issued and change status throughout the process.
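As a minimal sketch of the alerting logic described here, assuming proposals are tracked as a simple mapping from a proposal ID to its status: the function name `proposal_alerts` and the status strings are hypothetical and do not come from any existing POKT tooling.

```python
from typing import Dict, List


def proposal_alerts(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Compare two snapshots of {proposal_id: status} and describe what changed."""
    alerts = []
    for proposal_id, status in current.items():
        if proposal_id not in previous:
            # A proposal that was not in the last snapshot is new.
            alerts.append(f"New proposal {proposal_id} ({status})")
        elif previous[proposal_id] != status:
            # A known proposal whose status differs has moved through the process.
            alerts.append(f"Proposal {proposal_id}: {previous[proposal_id]} -> {status}")
    return alerts


# Hypothetical snapshots taken on two consecutive polls.
before = {"PEP-1": "draft"}
after = {"PEP-1": "voting", "PEP-2": "draft"}
messages = proposal_alerts(before, after)
```

Each returned message would then be handed to whatever notification channel the operator has configured.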

5 - Vitwit Partnership

Yes, we are happy to work with Vitwit as our development partner. They’re a very talented team with significant experience in the Cosmos ecosystem and with blockchain/crypto projects in general.

6 - Selling POKT "once we get the first occasion"

We’re very excited for the future of POKT and intend to be here for the long haul. That said, it’s possible that we may choose to sell a % of the fees from this project to pay the bills.

Chainflow’s a small, independent bootstrapped operator who does at times rely on grants to support operational costs. I can tell you that our general strategy is to hold as long as possible, as we believe we’re in the very early stages of decentralization’s manifestation. So we don’t take the decision to sell earnings like this lightly.

I don’t have much substantive feedback.

But do want to echo that at this stage diversity is really powerful, so long as there isn’t too much overlap.

I also would like to see the DAO be mindful of total outgoing POKT over time, particularly in the early stages of the network. Which may create a tension with funding diversity.

Side Note: Chris and Chainflow have been very great supporters of the network at its early early stages and I look forward to their continued contributions to the ecosystem.

Just checking in here on this proposal. This has been great feedback from everyone. I know a lot has changed in existing node tooling and monitoring for nodes since this discussion started.

@chris-remus I wonder if we can re-evaluate the needs of nodes with the existing tooling available and ensure that the milestones are within reason given the pushback on the cost of the proposal?

For example, I know our infrastructure team released this a while back - https://github.com/pokt-network/pocket-core-deployments/tree/staging/docker-compose/stacks

How would your proposal fit into that?
