TL;DR
Create a research and development team focused on the Quality of Service (QoS) of Machine Learning (ML) models deployed in the POKT Network. As first deliverables we aim to create an open evaluation framework, a leaderboard of staked ML models, and tools for model staking. The long-term goal is to provide Shannon with state-of-the-art quality metrics for the on-chain AI Watchers.
Socket Data
Ambition to support:
- Ambition #1: Pocket has $1B of annual protocol revenue
- Ambition #2: Pocket has the most trusted infrastructure brand in crypto
Planned work:
- Research and keep up-to-date bibliography on relevant subjects.
- Provide informed opinions on ML related topics for the development of the Shannon update.
- Develop tools for measuring selected ML models’ QoS.
- Help find correct values for the burn/mint parameters of ML services.
- Create the required databases to deploy a public site that will display this information and further insights into the POKT Network's ML-related QoS data.
- Develop tools to deploy ML models in the POKT network.
- Write updates on the research topics and relevant news of the crypto-AI world.
Who we are / Previous work:
- The team is composed of:
- Ramiro Rodriguez Colmeiro, PhD. +5 years of experience in ML. +2 years in the POKT Network.
- Nicolas Aguirre, PhD. +5 years of experience in ML. +1 year in the POKT Network.
- Collaboration (not full commitment) from 4 members (1 PhD, 3 PhD candidates) of the “Argentinian Philosophical Analysis Society” (SADAF-CONICET), focused on language and semantic analysis.
- Participation of the POKTscan / PNYX team for technical support.
- The team has a high degree of specialization in the main topic of this Socket (i.e. Machine Learning) and has the support of a research group focused on the evaluation of Language Models (one of the most important applications of ML models today).
- As most of the work will deal with specialized metrics, targeted to a specialized audience (AI model developers), we believe that the POKT Network should take a full technical approach to the QoS metrics and be able to stand at the edge of the current ML knowledge.
Type of Socket:
- The total duration of this Socket is unknown, but we expect it to run even after Shannon is deployed. We are requesting the maximum amount of 4,000 USD/month to start this research. This will not cover the whole cost of the operation; part of this effort will also be funded by PNYX AI.
Commitment to “Default to open”:
- We commit to developing all code in public repositories and carrying out research in the open (open channels, bibliography and discussions).
- Test bench Repository
- Other communication channels are TBD.
Socket ETH Address:
0x0BFd787CEb920d657e743FDe6695111F06915288
Note:
The Socket is a way to kick-off this experiment in time. We expect this research project to grow beyond a Socket in the future, to make room for more contributors and services. After some time (and deliverables), and if the DAO finds it worthy, we can move to a direct DAO fund.
Context of this Socket - CLICK ME!
In recent months the idea of running other RPCs has started to arise within the community, principally RPCs for Large Language Models (LLMs). This idea is not foreign to the academic world: a great talk on the subject was given at Berkeley RDI 2023 (you can see the presentation here), and check out the SAKSHI paper.
If you are wondering what Large Language Models (LLMs) are and why we should care, here is a very brief introduction: LLMs are Machine Learning (ML) models that produce (probable) sentences given a query; the best known of these is probably ChatGPT. The market for LLM use has grown rapidly for OpenAI and other giants like Meta or Google, and AI-based startups like Perplexity or Cohere are raising lots of capital to enter it. The market for LLMs and other ML-related services is just beginning to grow.
A place for the POKT Network in the Crypto-AI world
Many projects are trying to harness the narratives of Crypto and Artificial Intelligence (AI); the approaches are numerous and interact at many different levels. Enumerating all the proposed Crypto-AI projects is not practical: the list is vast, ever-growing and without a clear direction.
If we narrow down to projects that are focused on decentralized infrastructure, there are two broad categories: those providing computing for training (like Gensyn) and those providing inference services (Bittensor or Allora). These projects rely on specialized consensus mechanisms that were developed exclusively for integrating crypto and AI. Those focused on inference aim to keep part of the inference process on-chain or to prove the computation of some specific models (using Zero Knowledge Proofs, you can read more here), a task that's not easy at all.
The POKT Network is not an AI-native blockchain; it was created with the sole intention of providing the most reliable, performant and cost-effective Remote Procedure Calls (RPC) by means of crypto-economic incentives. There is no direct connection between AI and the POKT Network, except that many of the most popular AI-powered applications use an RPC to communicate with the underlying AI. For example, each time a user communicates with a chat bot, the website makes a call to a Language Model (LM) and the response is displayed to the user. This means that the POKT Network can become the default access to a wide range of AI models, providing decentralized and permissionless access to any app developer that needs to access a model and also to any AI model developer that wants to offer their AI product. The only question that remains is: why would they choose the POKT Network? Well, that's what we will try to answer…
Quality of Service is… hard…
While the crypto-economics and the RPC layer of the POKT Network are an excellent fit for ML-RPCs, the permissionless nature of the network is what makes it a challenge (big surprise). We know that measuring the Quality of Service (QoS) of the current blockchain endpoints is not easy and that a whole new actor is being implemented for the Shannon upgrade (the Watcher), but at least we know how to do it. Even today, Gateways can test the staked blockchains using simple mechanisms like majority checks among nodes in a session (to give a simple example).
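To make the contrast concrete, a majority check over deterministic blockchain responses can be sketched in a few lines (all names here are hypothetical, not the Gateways' actual implementation):

```python
from collections import Counter

def majority_check(responses: dict[str, str]) -> tuple[str, list[str]]:
    """Given {node_id: response} from one session, return the majority
    response and the list of nodes that disagreed with it."""
    tally = Counter(responses.values())
    majority, _ = tally.most_common(1)[0]
    dissenters = [node for node, resp in responses.items() if resp != majority]
    return majority, dissenters

# Example: node C returns a diverging block hash for the same query
session = {"A": "0xabc", "B": "0xabc", "C": "0xdef"}
consensus, outliers = majority_check(session)
```

This works because blockchain responses to the same query are expected to be byte-identical, which, as we discuss next, is exactly what breaks down for ML models.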
In the world of ML and specifically in the world of LLMs, the landscape changes quite a bit:
There is not a single tool for each use case
With blockchain nodes the user wants to obtain data from a very specific network, and that data is unique. With LLMs the user wants to access a Language Model (LM) to obtain an answer. The LM that the user wants to access is not fixed; in fact, users hardly ever know exactly which model processes their query. The expected response is also not unique: two answers can be lexically different but both conceptually correct and equivalent.
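A toy example shows why purely lexical comparison (the blockchain-style check) fails here; two conceptually equivalent answers score low on word overlap. Real scoring would need semantic tools such as embedding models or benchmark suites; this sketch only illustrates the failure mode:

```python
import math
from collections import Counter

def lexical_cosine(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words vectors (purely lexical)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Two conceptually equivalent answers with little word overlap:
a = "The capital of France is Paris."
b = "Paris serves as the French capital city."
score = lexical_cosine(a, b)  # low, even though both answers are correct
```

A majority check built on this kind of comparison would flag correct nodes as dissenters, which is why LM QoS needs semantic evaluation instead.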
ML is growing and changing each day
At the time of writing, the best open-source models that could be staked on the network are the LLAMA-2-based LLMs (including Mistral's variations), among many others (we had a list but it got old really fast). What we want to emphasize here is that, as opposed to blockchains, the number of LLMs grows too rapidly to catch up with. Creating a service/chain for each LLM does not work for ML models; the POKT Network should be able to change organically as the ML landscape changes (we will describe this more deeply in one of our first reports).
A single LLM can have many flavors
Related to the last point, there is a further reason not to lock a single model to a single service/chain. To give a hint of the cardinality of a single model's variations, this one repository holds over 30 variations of the LLAMA-2 13B model; having a chain/service for each of them makes no sense. Also, the POKT Network should attract servicers that have no extra cost for providing the service, i.e. people who already have an LLM working for other purposes and only want to sell their excess capacity. These people might be running case-specific models, fine-tuned for some particular task, which can also respond correctly to many other queries but won't respond like any public model.
Pricing is not straightforward
Running ML models is much more expensive than running blockchain nodes, not only due to hardware requirements, but also due to processing costs (more energy is required). This was one of the motivations of PIP-31. Sadly, two ML-RPC calls can differ wildly in computing costs. The cost can change because the data to be processed is large (a large corpus is given in the body) or because the model is inefficient (the number of tokens required to process the query can change from LM to LM).
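A minimal cost model makes the spread visible. Assuming (hypothetically) that cost scales linearly with prompt and completion tokens at per-1k-token rates, two calls to the same service can differ by more than an order of magnitude:

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              prompt_rate: float, completion_rate: float) -> float:
    """Estimated cost of one LLM-RPC call; rates are per 1k tokens,
    in arbitrary units. All numbers below are illustrative, not real prices."""
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1000

# Same relay type, wildly different costs:
short_chat = call_cost(50, 100, prompt_rate=0.5, completion_rate=1.5)
long_corpus = call_cost(8000, 200, prompt_rate=0.5, completion_rate=1.5)
```

Under these toy rates the large-corpus call costs over twenty times the short chat turn, which is why a single flat relay price is at best a temporary compromise.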
The path forward
Solving this prior to the Shannon update is not possible, and initially Shannon does not include many of the required specifications. However, that does not mean that we cannot start building.
We see the path to ML-RPCs as follows:
Morse Era
This is the current time, what can we do now?
Define a subset of ML models to support
We propose researching the subset of ML services/chains that the network can whitelist. Specifically, start with LMs and Text-to-Image models, without any further specification (i.e. supporting only Llama-2-7b-gptq is not useful).
Create tools to stake ML models
Staking ML models can be done with lots of different tools, such as TGI, vLLM, TorchServe or TFServing, among others. We will probably need some adaptation of the endpoints to ensure that all staked models for a given chain respond in the same way and provide all needed data.
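One shape such an adaptation could take is a thin translation layer that maps each serving backend's response into one relay schema. The payload shapes below are approximations of TGI-style and OpenAI-compatible (e.g. vLLM) responses, not exact API contracts; check each server's documentation:

```python
def normalize_response(backend: str, payload: dict) -> dict:
    """Map backend-specific completion payloads to one relay schema.
    Input shapes are approximate sketches of each server's output."""
    if backend == "tgi":  # Text Generation Inference style
        return {"text": payload["generated_text"],
                "tokens": payload.get("details", {}).get("generated_tokens")}
    if backend == "openai":  # vLLM / OpenAI-compatible style
        choice = payload["choices"][0]
        return {"text": choice["text"],
                "tokens": payload.get("usage", {}).get("completion_tokens")}
    raise ValueError(f"unknown backend: {backend}")

# Two different backends, one relay schema:
tgi = normalize_response(
    "tgi", {"generated_text": "hello", "details": {"generated_tokens": 2}})
oai = normalize_response(
    "openai", {"choices": [{"text": "hello"}], "usage": {"completion_tokens": 2}})
```

With an adapter like this, a Gateway sees the same fields regardless of which serving stack a servicer chose to run.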
Average RPC price
We won't have the tools to measure the procedure effort until Shannon or later. Setting a fair and cheap average price for the LLM-RPC call is probably the only way forward; finding that fair price is something that needs to be done.
Gateway-Side QoS Enforcing
As is the case with blockchain services today, QoS will only be enforced by the Gateways. Sadly this is the only choice, but we can give them a hand by creating public goods around QoS measuring.
A Permissionless and Public Leaderboard
Leveraging tools like Stanford - HELM and the LM Evaluation Harness for LLMs, and Stanford - HEIM for Text-to-Image models, we can provide tools for online measurement and ranking of the nodes in our network, comparing them with their theoretical performance. Both gateways and users will be able to know which kinds of models are being staked on the Network (and compare them to the big players' self-reported performance).
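The aggregation behind such a leaderboard could look like the sketch below: each node's benchmark scores are normalized against a reference model's published numbers and averaged. Node names, task names and scores are all hypothetical placeholders:

```python
def leaderboard(scores: dict[str, dict[str, float]],
                reference: dict[str, float]) -> list[tuple[str, float]]:
    """Rank nodes by their mean per-task score relative to a reference
    model's self-reported numbers (e.g. taken from a HELM-style report)."""
    def relative_mean(task_scores: dict[str, float]) -> float:
        return sum(s / reference[t] for t, s in task_scores.items()) / len(task_scores)
    return sorted(((node, round(relative_mean(ts), 3))
                   for node, ts in scores.items()),
                  key=lambda kv: kv[1], reverse=True)

staked = {
    "node-1": {"mmlu": 0.60, "hellaswag": 0.75},
    "node-2": {"mmlu": 0.45, "hellaswag": 0.80},
}
ref = {"mmlu": 0.65, "hellaswag": 0.85}
board = leaderboard(staked, ref)  # best-performing node first
```

A score near 1.0 would mean a staked node matches the reference model's claimed performance; persistent gaps would expose degraded or mislabeled deployments.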
Early Shannon Era
With the initial Shannon update we can focus on the Watcher module. During this time we can work to create the AI-related QoS modules, providing on-chain metrics of the staked models…
We will also have to work on the proof mechanism for the real procedure effort of each query, once more, the Watcher will play the role of overseeing this.
Mature Shannon Era
This is the exciting part… There is a very interesting article on ChatGPT performance evolution. The article shows that the LLM services behind ChatGPT-3.5 and ChatGPT-4 changed a lot in a short time, degrading in some tasks. This leads us to the question: how do you know that the model behind an API is still the one you measured?
And the answer is that you will never know.
What we can do with a fully functional POKT Network is provide blockchain-based QoS metrics for the LM models that we serve. You will no longer need to trust OpenAI, or any other organization or new AI clergy, that a model has a given capacity and that it has not changed.
We will be able to show a model performance leaderboard based on data in an immutable ledger.
New models/companies will be able to join permissionlessly and compete with big players on performance without having to fight the established names.
The QoS module (the Watchers) will be an ever-evolving part of the POKT Network that will by itself be a valuable tool to coordinate ML development and align consumer preferences with research trends.
We imagine the POKT network not only as the backbone of web3, but also as the decentralized source of QoS data.
Some final words
Solving all this is not an easy task, but we are in the correct place to do it. We have all the crypto-economic incentives in place and a strong community of builders and servicers.
Entering the market of ML RPCs will show that POKT can really become your API to the Open Internet.