Optimizing the v0 client to save the network a ton of money

At the time of writing, Pocket Network has 34770 nodes. The average cost per node is probably between $25 and $80 per month. This means that the network currently costs between $870k and $2.5m a month to run. This doesn’t make sense given the number of relays that the network is currently serving.

This cost is due to inefficiencies in the v0 architecture: every servicer is required to run a full node. This isn’t strictly necessary, because a servicer’s job is just to relay data from backend blockchain nodes. As a result, bare metal node runners end up running tens of pocket nodes on the same machine, which is a waste of resources.

V1 will supposedly solve these issues and make pocket much more resource efficient; however, this update is at least 18 months out. In the interim, node runners will have to spend between $15.5m and $45m to keep the network running in its current state. I have spoken with a few of the devs about making this change, and they have acknowledged the benefit of it, as well as the extreme difficulty of abstracting the full node from the servicer. They also think that it is optimal to focus fully on v1, rather than make these optimizations to a client that will eventually become obsolete.
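For anyone who wants to rerun the arithmetic with their own numbers, here is a rough sketch. The node count, the per-node cost range, and the 18-month horizon come from this post; the helper function and variable names are just illustrative. Note that taking the full $80 per node puts the upper bounds somewhat above the rounded figures quoted above.

```python
# Figures taken from the post; the helper and names below are illustrative only.
NODE_COUNT = 34_770
COST_PER_NODE_USD = (25, 80)  # low and high monthly cost estimates per node
MONTHS_UNTIL_V1 = 18          # the post's estimate of the time until v1


def network_cost(node_count, cost_per_node, months=1):
    """Total USD cost of running node_count nodes for the given number of months."""
    return node_count * cost_per_node * months


monthly = [network_cost(NODE_COUNT, c) for c in COST_PER_NODE_USD]
interim = [network_cost(NODE_COUNT, c, MONTHS_UNTIL_V1) for c in COST_PER_NODE_USD]

print(f"monthly:  ${monthly[0]:,} to ${monthly[1]:,}")
print(f"until v1: ${interim[0]:,} to ${interim[1]:,}")
```

Swapping in a different node count or cost range shows how sensitive the total is to the per-node estimate.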

I understand there are limitations with hiring talent and significant technical hurdles involved in this task, but I strongly believe that PNF/PNI/DAO should fund the development of a v0 client that abstracts the full node away from the servicer, because this modification would reduce costs dramatically. Not only would this benefit node runners, but it would also benefit the network as a whole, because it would allow the DAO to lower inflation while node runners increase their margins.

I don’t think it’s optimal for the network to spend tens of millions of dollars on infrastructure until v1 is released. It hurts token price and requires higher inflation. This modification should be pursued, and it makes economic sense to spend millions of dollars to get it done. The DAO could increase its percentage of the node rewards until node runners have repaid it.

Let me know what you all think.

Agreed with everything here up until the word “supposedly”; that implies there’s some question about this being the architecture outlined. It does get solved in v1. That being said, all of us running nodes recognize the benefit of abstracting away the chain storage requirements.

This is the meat of the problem for me. Part of the reason v1 has so many optimizations in usage is that it is a fundamental rewrite of the underlying system, which is always easier than trying to accomplish a major refactor of existing code to support new complex functionality.

At the same time, if such a thing WERE possible, I strongly agree with the benefits outlined (and assume that’s why they’re baked into the v1 architecture).

If a path to achieving this could be determined which does not impact the v1 timeline, I’d be in support of this. I expect it would require additional hiring, which also comes with its own challenges. Or perhaps this is a place where a third party team could build and demonstrate proof of concept.

The other consideration is timeline. If it takes a year to accomplish this iteration, is there any point?

I have a lot of questions here, but strongly support the idea, if it can be proven feasible and in a timeline that makes sense.

I know. Just acknowledging that implementation can differ from theory :slight_smile:

It’s not just the chain storage, since that can be fixed with chain pruning. It doesn’t make sense to have every servicer create all their proofs and validate every incoming transaction and block.

I agree. I just think it’s worthwhile to find a way to optimize this in the meantime. It would literally save tens of millions of dollars.

It probably will. Pierre and I are going to dive into this soon and figure out what is possible and how difficult it would be. We’ll report back with our findings.

Hey,

I’m glad you made this post and agree with some of your points. It’s something I’ve been thinking about as well to help reduce costs per node. I wonder if it’s already been done by some of the bigger node providers to help reduce costs. Right now, the only public approach I’ve seen is to use multiple beefier machines and load balance.

Researching and accomplishing something like this will require a good grasp of the current pocket-core code base. I’m willing to support a proposal that leads to a lighter node with better performance and lower costs, as long as the community can use it.

Another point to add to all of this is that hardware capacity is not as readily available right now due to supply and demand. A lot of providers today are having capacity/scaling issues because of this, even outside the Pocket ecosystem. Reducing the amount of compute/storage power needed would alleviate this strain, which is a huge win across the board.

I’m all for reducing the costs of node running; it’s the largest barrier to entry I hear from most prospective runners, although they typically opt for the POKTPool. And I can say from experience your monthly price estimate skews low; I pay $80/month with a sweetheart deal. Perhaps I’m being overcharged, but there’s substantial friction in switching providers.

One question - would this also lower the hardware requirements for a node? If so, I could see sync times dropping for new nodes, while nodes driving Ford Fiesta quality hardware are more able to compete by staying in sync more reliably. Not that we want to litter the nodescape with junk hardware, but the trade off here is faster node propagation.

Love the thought. I think, though, that the low-hanging fruit here may be pruning optimization rather than full node extraction from servicers. While the benefits of separating the servicer from the full node are clear, I don’t see that as being possible without ample knowledge of v0, which only the team really has. From my perspective, this would likely take away from the team’s focus on v1.

I could see the DAO funding a research team to identify whether there are ways to prune servicers in a manner that does not affect node quality. Tendermint does have pruning, from my understanding, and since servicers aren’t used for validation, there could be an opportunity here. Servicers would only hold the minimal chain data they need to do their job. Though, with pocket-core’s current caching feature, there may be limitations on what pruning can be done. Worth exploring though.
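To make the pruning idea concrete, here is a toy sketch of a store that keeps only the most recent blocks and drops everything older. This is purely conceptual: `PrunedBlockStore` and `keep_recent` are hypothetical names, not pocket-core or Tendermint APIs, and a real servicer would also need to confirm which heights its caching and claim/proof logic still depend on.

```python
from collections import OrderedDict


class PrunedBlockStore:
    """Toy block store that retains only the most recent `keep_recent` heights.

    Hypothetical illustration only; not how pocket-core or Tendermint store blocks.
    Assumes blocks are added in increasing height order.
    """

    def __init__(self, keep_recent):
        if keep_recent < 1:
            raise ValueError("keep_recent must be at least 1")
        self.keep_recent = keep_recent
        self._blocks = OrderedDict()  # height -> block data, oldest first

    def add_block(self, height, data):
        """Store a new block, then evict the oldest blocks beyond the window."""
        self._blocks[height] = data
        while len(self._blocks) > self.keep_recent:
            self._blocks.popitem(last=False)  # drop the oldest height

    def has(self, height):
        return height in self._blocks

    def heights(self):
        return list(self._blocks)


store = PrunedBlockStore(keep_recent=3)
for h in range(1, 6):
    store.add_block(h, f"block-{h}")
print(store.heights())
```

The open question from the post is how small that window can be before node quality suffers, which is what the proposed research would need to establish.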

I could see this happening outside of the core team and in a manner that doesn’t affect v1. This would require a good amount of testing to ensure it doesn’t affect the network at scale. The good news is that since this would be more of a parameter change in the node itself, it most likely wouldn’t require an update to pocket-core.

Those are some of my initial thoughts :slightly_smiling_face: Thanks for starting the conversation!

The requirements for the relay nodes are also a large cost factor - in addition to the Pocket nodes. If the goal is to bring down the overall costs for node runners, it seems like there are other options as well. For example, maybe shared RPC endpoints for the relay chains could be provided for node runners. While running a pocket node might only cost between $25 - $85 per month - it’s a lot higher if you also consider the costs of running nodes for the relay chains.

Thank you for the proposal. I like the motivation and idea proposed at a high level but wanted to flag that it may be a more significant undertaking than expected across research, development, and testing. In addition, it is important to note that these changes will likely extend across both pocket-core and the ~2-year-old forked Tendermint codebase, so the developers will need to have an extensive understanding of both.

It’s also worth mentioning that the core team is looking into optimizing some low-hanging fruit (e.g. w.r.t. storage), though the cost to node runners will still be high until v1 is officially released. Re modifications to the claim/proof lifecycle: this is why V1 is following an “optimistic” approach, but I believe that these changes in V0 will not be trivial.

Concerns

My main points of concern are:

  1. Implementation: Implementation is non-trivial and could extend well into the v1 development lifecycle, at which point the network operational overhead of having two breaking upgrades may not be worth it.
  2. Support: This will likely require the support / advisorship of the core protocol team members. Time permitting, we’d be happy to review documents and answer questions, but our main focus is V1.
  3. Tokenomics: Assuming all the safety guarantees are met, if significant changes to the core protocol are made, this could impact Pocket’s tokenomics, which would require a lengthier discussion. E.g. the WAGMI discussion alone spanned multiple months.
  4. Quality Assurance: In addition to the implementation in point (1), I believe testing and quality assurance of the changes would be even more difficult since bugs can’t be simply “patched”. With every minor and major release that has come out in V0, the team builds an extensive plan with lots of testing and has found some critical bugs along the way. As a newer member of the team, I was very impressed by the detailed hundreds of pages used to test some of the recent and upcoming releases.

Suggestion

With that being said, I personally really want to support community contribution and involvement as much as possible. So my suggestion would be:

  1. The DAO provides a small grant (e.g. 1%-5%) to fund the research for this proposal. The deliverable for this grant will be a design document and QA plan.
  2. The community and core protocol team members review and provide feedback on the proposed changes and testing plan. This can help scope the level of changes, highlight missing gaps, and give insight into the feasibility of the proposed design.
  3. Assuming (2) goes well, the DAO could provide another small grant (e.g. another 1%-5%) to fund the development of a prototype.
  4. Many issues & challenges are often uncovered in step (3), and timelines change, which can guide the research team as to whether this should be pursued.
  5. Depending on the outcome of (4), the DAO could provide the remaining funds.

Wins

I think this would be a win-win-win for the following reasons:

  1. There is the potential upside for lower costs for node runners.
  2. Even if the project fails or is not pursued after the research or prototyping phase, the team will still be compensated for their time.
  3. The protocol team is very invested in making V1 more contributor-friendly through better tooling, better documentation, a better codebase, and well-defined processes for proposal, submissions and reviews. This could be an excellent opportunity to support, work with and learn from external contributors in the community, which will guide how we can further improve it in V1.

I like this idea. There is no other way to come up with a lighter client without research & trial and error.

What are you referring to here?