OPEN: POKT AI Lab --- Socket

RawthiL · February 19, 2024, 8:19pm

TL-DR

Create a research and development team focused on the Quality of Service (QoS) of Machine Learning (ML) models deployed in the POKT Network. As first deliverables we aim to create an open evaluation framework, a leaderboard of staked ML models and tools for model staking. The long term goal is to provide Shannon with state of the art quality metrics for the on-chain AI Watchers.

Socket Data

Ambition to support:

Ambition #1: Pocket has $1B of annual protocol revenue
Ambition #2: Pocket has the most trusted infrastructure brand in crypto

Planned work:

Research and keep up-to-date bibliography on relevant subjects.
Provide informed opinions on ML related topics for the development of the Shannon update.
Develop tools for measuring selected ML models’ QoS.
Help finding correct values for burn/mint of ML services.
Create the required databases to deploy a public site that will display this information and more insight of the POKT Network ML-related QoS data.
Develop tools to deploy ML models in the POKT network.
Write updates on the research topics and relevant news of the crypto-AI world.

Who we are / Previous work:

The team is composed of:
- Ramiro Rodriguez Colmeiro, PhD. +5 years of experience in ML. +2 years in the POKT Network.
- Nicolas Aguirre, PhD. +5 years of experience in ML. +1 years in the POKT Network.
- Collaboration (not full commitment) from 4 members (1 PhD, 3 PhD candidates) of the “Argentinian Philosophical Analysis Society” (SADAF-CONICET), focused on language and semantic analysis.
Participation of the POKTscan / PNYX team for technical support.
The team has a high degree of specialization on the main topic of this Socket (i.e. Machine Learning) and counts with support from a research group focused on the evaluation of Language Models (one of the most important applications of ML models today).
As most of the work will deal with specialized metrics, targeted to a specialized audience (AI model developers), we believe that the POKT Network should take a full technical approach to the QoS metrics and be able to stand at the edge of the current ML knowledge.

Type of Socket:

The total time of this socket is unknown, but we expect to run even after Shannon is deployed. We are requesting the maximum amount of 4000 U$D/month to start this research. This will not cover the whole cost of the operation, part of this effort will be also funded by PNYX AI.

Commitment to “Default to open”:

We commit to develop all code in public repositories and carry research in the open (open channels, bibliography and discussions).
Test bench Repository
Other communication channels are TBD.

Socket ETH Address:

0x0BFd787CEb920d657e743FDe6695111F06915288

Note:

The Socket is a way to kick-off this experiment in time. We expect this research project to grow beyond a Socket in the future, to make room for more contributors and services. After some time (and deliverables), and if the DAO finds it worthy, we can move to a direct DAO fund.

Context of this Socket - CLICK ME!

In the last months the idea of running other RPCs has started to arise within the community, principally for the Large Language Models (LLMs) RPCs. This idea is not strange to the academic world, a great talk on the subject was given at Barkley RDI 2023, you can see the presentation here, and check the SAKSHI paper.

If you are wondering what are the Large Language Models (LLMs) and why should we care, let’s make a very brief introduction: LLMs are Machine Learning (ML) models that produce (probable) sentences given a query, the most known of these is probably ChatGPT. The market for LLM uses has grown rapidly for OpenAI and other giants like Meta or Google. Also AI-based startups like Perplexity or Cohere are raising lots of capital to enter this market. The market for LLMs and other ML-related services is just beginning to grow.

A place for the POKT Network in the Crypto-AI world

Many projects are trying to harness the narratives of Crypto and Artificial Intelligence (AI), the approaches are numerous, interacting at many different levels. Enumerating all the proposed Crypto-AI projects is not practical, the list is vast, ever-growing and without a clear direction.

If we narrow down to projects that are focused on decentralized infrastructure, there are two broad categories, those providing computing for training (like Gensyn) and those providing inference services (Bittensor or Allora ). These projects rely on specialized consensus mechanisms that were developed exclusively for integrating crypto and AI. Specifically for those focused on inference, they look for keeping part of the inference process on-chain or proving the computation of some specific models (using Zero Knowledge Proofs, you can read more here), a task that’s not easy at all.

The POKT Network is not an AI native blockchain, it was created with the sole intention of providing the most reliable, performant and cost-effective Remote Procedure Calls (RPC) by means of crypto economic incentives. There is no direct connection between AI and the POKT network, except that many of the most popular AI powered applications use an RPC to communicate with the underlying AI. For example, each time a user communicates with a chat bot, the website makes a call to a Language Model (LM) and the response is prompted to the user. This means that the POKT Network can become the default access to a wide range of AI models, providing decentralized and permissionless access to any app developer that needs to access a model and also to any AI model developer that wants to offer their AI product. The only question that remains is why would they choose the POKT Network? Well, that’s what we will try to answer…

Quality of Service is… hard…

While the crypto-economics and the RPC layer of the Pokt Network is an excellent fit for the ML-RPCs, the permissionless nature of the network is what makes it a challenge (big surprise). We know that measuring the Quality of Service (QoS) of the current blockchain endpoints is not easy and a whole new actor is being implemented for the Shannon upgrade (The Watcher), but at least we know how to do it. Even today, Gateways can test the staked blockchains using simple mechanisms like majority checks among nodes in a session (to give a simple example).

In the world of ML and specifically in the world of LLMs, the landscape changes quite a bit:

There is not a single tool for each use case

With blockchain nodes the user wanted to obtain data from a very specific network, and that data is unique. With LLMs the user wants to access a Language Model (LM), to obtain an answer. The LM that the user wants to access is not fixed, in fact the users hardly know exactly which model is the one that processes their query. The expected response is also not unique, two answers can be lexically different but both conceptually correct and equivalent.

ML is growing and changing each day

At the time of writing this the best open-source models that could be staked on the network are the LLAMA2-based LLMs (including the Mistral’s variations), or many others (We had a list but it got old really fast). What we want to emphasize here is that as opposed to blockchains, the number of LLMs grows too rapidly to be able to catch up. Creating a service/chain for each LLM is not apt for ML models, the POKT network should be able to change organically as the ML landscape changes (we will describe this more deeply as one of our first reports).

A single LLM can have many flavors

Related to the last point, there is further resistance to lock a single model to a single service/chain. LLM models can all be of the same type, just to give a hint on the cardinality of the LLAMA-2 13B model variations you can check this single repository that has over 30 variations of it, having a chain/service for each of them makes no sense. Also, the POKT network should attract servicers that have no extra cost for providing the service, i.e. people that already have an LLM model working for other purposes and only want to sell their excess capacity. These people might be running case-specific networks, fine-tuned for some random task, which can also respond correctly to many other queries but they won’t respond like any other public model.

Pricing is not straight forward

Running ML models is much more expensive than running blockchain models, not only due to hardware requirements, but also due to processing costs (more energy is required). This was one of the motivations of PIP-31. Sadly two ML-RPC calls can differ wildly in computing costs. The cost can change because the data to be processed is large (a large corpus is given in the body) or because the model is inefficient (the number of tokens required to process the query can change from LM to LM).

The path forward

Solving this prior Shannon update is not possible and initially Shannon does not include many of the required specifications. However it does not mean that we cannot start building.

We see the path to ML-RPCs as following:

Morse Era

This is the current time, what can we do now?

Define a subset of ML models to support

We propose research on the subset of ML services/chains that the network can whitelist. Specifically start with LMs and Text-to-Image models, without any further specification (i.e. supporting only Llama-2-7b-gptq is not useful).

Create tools to stake ML models

Staking ML models can be done with lots of different tools, such as TGI, vLLM, TorchServe or TFServing, among others. We will probably need some adaptation of the endpoints to ensure that all staked models for a given chain respond in the same way and provide all needed data.

Average RPC price

We won’t have the tools to measure the procedure effort until Shannon or later. Setting a fair and cheap price for the LLM-RPC call is probably the only way forward. Finding a fair price is something that needs to be done.

Gateway-Side QoS Enforcing

As it is the case with blockchains services today, the QoS will be only enforced by the Gateways. Sadly this is the only choice, but we can give them a hand by creating public goods around QoS measuring.

A Permissionless and Public Leaderboard

Leveraging tools like Stanford - HELM and LM Evaluation Harness for LLMs and Stanford - HEIM for Text-to-Image models, we can provide tools to do online measuring the ranking of the nodes in our network, comparing them with their theoretical performance. Both gateways and users will be able to know which kind of models are being staked on the Network (and compare them to the big players’ self-reported performance).

Early Shannon Era

With the initial Shannon update we can focus on the Watcher module. During this time we can work to create the AI-related QoS modules, providing on-chain metrics of the staked models…

We will also have to work on the proof mechanism for the real procedure effort of each query, once more, the Watcher will play the role of overseeing this.

Mature Shannon Era

This is the exciting part… There is a very interesting article on ChatGPT performance evolution. The article shows that the LLM service of ChatGPT3.5 and ChatGPT4 changed a lot in a short time, degrading in some tasks. This leads us to the question…

And the answer is that you will never know.

What we can do with a fully functional Pokt Network is to provide blockchain-based metrics of QoS on the LM models that we serve. You will no longer need to trust OpenAI, or any other organization or new AI clergy that a model has a given capacity and that it has not changed.

We will be able to show a model performance leaderboard based on data in an immutable ledger.

New models/companies will be able join permissionlessly and compete with big players for performance without having to fight the established names.

The QoS module (the Watchers) will be an ever evolving part of the POKT Network, that by itself will be a valuable tool to coordinate ML development and align consumer preferences to research trends.

We imagine the POKT network not only as the backbone of web3, but also as the decentralized source of QoS data.

Some final words

Solving all this is not an easy task, but we are in the correct place to do it. We have all the crypto-economic incentives in place and a strong community of builders and servicers.

Entering the market of ML RPCs will show that POKT can really become your API to the Open Internet.

doctorrobinson · February 21, 2024, 9:06pm

Well we’re excited to have this one kick off! It’ll be open on March 1, with payment streaming starting the same day.

We’ll need an ETH address to set you up on a stream @RawthiL

I’d love to have you join our builders or community call once we’re back and give an overview of what you’re making here, and if there’s opportunities for other ecosystems or people to get involved.

@b3n please set up this stream to open on March 1

poktblade · March 7, 2024, 8:58am

is POKT AI Lab a new entity?

RawthiL · March 7, 2024, 1:44pm

Nope, currently is only a socket. Any of the deliverables is (will be) property of the DAO.
For simplicity we will start the repositories inside POKTscan or PNYX, but then transfer all to the foundation controlled repos.

RawthiL · April 1, 2024, 6:09pm

2024/03 Update

The initial report, accessible here, delves into fundamental concepts of Machine Learning crucial for comprehending Pocket Network’s use case. Within this document, we concentrate on Morse’s capabilities, delineating what is achievable and what remains beyond reach. Additionally, we offer an in-depth exploration of the challenges in assessing the LLM model, alongside a possible solution to mitigate this issue. Finally we comment on the future steps of this socket.

It’s important to note that our analysis refrains from commenting on Pocket’s prices and market viability. These aspects constitute a broader subject currently under discussion by the wider community.

Updates TL-DR

Wrote first repport covering general ML concepts and the capability of Pocket Network Morse.
Released a Local-Net proof of concept for the Pocket Network (Morse). The Local-Net features two ML services: LLM and Diffusion Models (text to image generation).
Updated pocket-core and pocket-localnet repos (POKTscan forks) to support ML relays:
- The default mesh config timed-out on ml calls (both LLM and Diffusers).
- The pocket core had a limit of 1 MB incoming RPC sizes, a parameter was added to make this configurable and enforced in a more modular fashion.
- The relayer app of pocket-localnet was updated to allow response dumping and custom relays (this will only be used for the proof of concept).
Released guides to deploy ML models:
- LLM using vLLM backend and OpenAI API compatibility: pocket-ml-testbench/model-deployment/llm at main · pokt-scan/pocket-ml-testbench · GitHub
- Diffusers using a custom backend that looks to be like Stable Diffusion API (needs more work) pocket-ml-testbench/model-deployment/diffusers at main · pokt-scan/pocket-ml-testbench · GitHub
Started work on benchmarking process:
- Conceptual designed almost settled.
- Started to work on lm-eval-harness implementation.

RawthiL · April 30, 2024, 8:57pm

2024/04 Update

The second report is now online. This time we make an overview on the origins of the leaderboards that are used to compare LLMs, with special interest on the LM Evaluation Harness framework. We then comment on these framework’s weaknesses (a general issue with all leaderboards, not specific to LMEH) and the potential role of the Pocket Network.
Later we make an in-depth description of the architecture of the Machine Learning Test-Bench (MLTB), that is being developed to reproduce the LMEH in the Pocket Network nodes.

Updates TL-DR

Helped in the creation of the (POKT Square RAG Agent)[GitHub - pokt-scan/pokt-square: A conversational agent equipped with retrieval-augmented generation capabilities, built upon the foundation of the Pocket Network stack.].
Finished the architectural design of the asynchronic testing procedure (the Pocket ML Test-Bench).
Merged issues on the Test-Bench code:
- (Template code for Temporal IO on Python)[Python Temporal IO · Issue #9 · pokt-scan/pocket-ml-testbench · GitHub].
- (Created docker-compose files for deploying the test-bench for development)[Create docker-compose.yaml file for development · Issue #12 · pokt-scan/pocket-ml-testbench · GitHub].
- (Combined the task of storing Hugging Face datasets with the Sampler to create a single Temporal App for both)[Join Sampler into Temporal · Issue #18 · pokt-scan/pocket-ml-testbench · GitHub].
- (Added code for saving and retrieving tasks from MongoDB)[MongoDB: Saving and retrieving Task and Instances · Issue #17 · pokt-scan/pocket-ml-testbench · GitHub].
- (Updated the readmes with final software architecture)[Update main Readme · Issue #14 · pokt-scan/pocket-ml-testbench · GitHub].
- (Added basic requester code)[Requester: complete MVP · Issue #11 · pokt-scan/pocket-ml-testbench · GitHub].
A lot of code was also merged code without a matching issue (due to premature stage of the project), it includes:
- Creating a proper logger function for both Go and Python that rungs OK with temporal.
- Added initial Sampler workflow code in Python.
- Cleaned all Readmes for better understanding of the repository.
- Make the code more independent from Language Models Evaluation Harness.
- Added packages for Pocket RPC and MongoDB connection handling in Go.

Future Work TL-DR

Create first PoC of the Pocket MLTB on LocalNet.
We will provide support but not be actively developing the POKT Square RAG (due to time constraints we will be focusing on the MLTB).

RawthiL · May 31, 2024, 9:46pm

2024/05 Update

The third report is now online.
This moth we came across the problem of tokenizers. We make a high level description of them and then we highlight why the POKT Network will need to have access to them (which will be a deviation form the OpenAI API, more specifically an additional endpoint).

This month we run into some unexpected problems (TemporalIO+Python Async and Tokenizer access) and we were not able to finish the product as planed. Now that we have solved these (or we know how to solve them at least) we are more confident that we can finish the experiment next month with a community presentation

Updates TL-DR

Merged issues on the Test-Bench code:

Manager:
Sampler:
- Generate prompts and save to mongo + Tokenizer
Requester:
- Complete requester MVP
General:
- Fixes and enhancements to create the first version of Morse POC

Future Work TL-DR

Manager:
Sampler:
- Create logic for tokenizer signature task
- Add asynchronous read operations to the MongoDB collections.
Evaluator:
- Create initial code for LMEH processing
Create community presentation material!

Dermot · June 4, 2024, 9:25am

Thanks for all these great updates @RawthiL

Can you shed a bit more light on where you think this all ends up?

And for the evaluator specifically, are you developing a public goods style community-run Watcher?

RawthiL · June 4, 2024, 1:26pm

Sure,

This socket at its conclusion will provide the community with:

Back-end for implementing arbitrary tests on AI nodes
The code for displaying a site that shows the quality of the LLM nodes in the same way as Open LLM Leaderboard - a Hugging Face Space by open-llm-leaderboard

If the community wants, we will later propose to maintain this running and add API capabilities (so gateways or apps can directly query nodes’ status).

We have designed this in the philosophy of an off-chain watcher. We have some ideas on how to grow this into on-chain / L2 / beacons / etc, but we are too early, we will start thinking on that after Shannon mainnet.

We will try to explain this during our presentation at the end of the month and a subsequent proposal for continuation / maintaining a site running this backend.

RawthiL · July 1, 2024, 1:17pm

2024/06

The socket’s final report is online.

During the month of June, the socket achieved its final milestones, culminating in a full reproduction of the (now old) OpenLLM Leaderboard. Specifically, we merged issues on the Test-Bench code:

We will be presenting the results of the socket work on the Weekly Ecosystem Call (2024-07-03), making a full recap of the socket and a live presentation of the produced MVP!