Please read these first, as this proposal is about launching foundation models on Pocket Network. They help explain what foundation models are and who some of the competitors are.
1. What are Foundation Models? - Foundation Models in Generative AI Explained - AWS
2. Build Generative AI Applications with Foundation Models - Amazon Bedrock FAQs - AWS
Goals of this doc:
- Starting a discussion on enabling a brand-new offering type, specifically multi-purpose, general-use generative AI models on Pocket Network.
- Getting feedback from
- portal operators on their requirements.
- node runners on cost and implementation.
- DAO on ecosystem fit and tokenomics.
This doc is a conversation starter. It is not a be-all/end-all proposal for all possible uses of the RTTM (RelaysToTokensMultiplier) feature, or even for every implementation of AI on Pocket Network.
Introduction:
Once the RTTM changes are in place, Pocket Network can be an excellent venue for hosting and running off-the-shelf Foundation Models (FMs).
Why can Pocket be successful in this domain?
- Privacy - Pocket offers a much higher level of privacy and confidentiality than other commercial offerings.
- Permission - Pocket offers worldwide access. In contrast, competitors such as AWS Bedrock require case-by-case approval to access these models. Onboarding friction is much lower with Pocket.
- Price - Pocket can be more cost-effective than competitors because it doesn't incur the costs of running large data centers or the large staff associated with those offerings.
Possible Concerns (and why they shouldn’t be)
- Foundation Models - Some argue that only custom or fine-tuned models are useful, and that Pocket therefore wouldn't be competitive with off-the-shelf foundation models. This is incorrect: FMs remain extremely useful for many purposes, and they are sufficient for most LLM-enabled applications today.
- Performance - If you are worried about the performance of such systems, don't be: each LLM call already takes several seconds (typically 15 to 30), so the few milliseconds added by crossing routers and other network hops is not a competitive concern, either for the Cherry Picker or in comparison to centralized providers.
If we look around for inspiration among successful products, AWS Bedrock stands out as one of the better starting points for such models. Of the six models Bedrock offers, two stand out for their utility and simplicity. This document proposes that Pocket Network start with the following two.
Proposed Models
Llama 2
Llama 2 is a family of multipurpose text-generation (i.e. generative prediction) models. Its license [3] allows free redistribution.
Use Cases [4] [5]
Llama 2 is an incredibly powerful tool for creating non-harmful content such as blog posts, articles, academic papers, stories, and poems. Its many applications include writing emails, generating summaries, expanding sentences, and answering questions. It can also power automated customer-service bots, helping reduce the need for human input.
Text Generation: Llama 2 leverages reinforcement learning from human feedback and natural language processing to generate text from given prompts and commands. That means you can quickly create high-quality, non-toxic written content without spending hours at the keyboard. However, like every language model, Llama 2 doesn't give you finished text; it gives you a draft to work from.
Summarization: The Llama 2 language model can summarize any written text in seconds. Simply paste your existing text and give Llama 2 a prompt; it will quickly generate a summary without losing critical information.
Question & Answering: Like every language model, Llama 2 can answer users' questions by analyzing their commands and prompts and generating output. What distinguishes Llama 2 from other large language models is its ability to generate safe output: according to Meta AI's benchmarks, Llama 2 produces output with lower violation rates than its competitors.
Implementation
Llama 2 comes in three sizes: 7B, 13B, and 70B. Although 70B is the most capable, its hardware requirements are also quite high. 7B is very nimble and fast, but sometimes gives confidence-shaking answers. **Therefore, we propose running the 13B model.** It can run on most GPUs (and even CPUs, albeit slowly), as sketched below.
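As a rough illustration of what node runners would host, here is a minimal sketch of serving Llama 2 13B with the open-source Hugging Face Transformers stack. The checkpoint name, hardware assumptions, and prompt are illustrative; the gated meta-llama/Llama-2-13b-chat-hf weights require accepting Meta's license.

```python
# A minimal sketch of serving Llama 2 13B locally; assumes the gated
# meta-llama/Llama-2-13b-chat-hf checkpoint (requires accepting Meta's
# license on Hugging Face) and roughly 26 GB of VRAM at fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-13b-chat-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision to fit on fewer/smaller GPUs
    device_map="auto",          # spread layers across available devices
)

prompt = "Summarize the benefits of decentralized RPC networks in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```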
Pricing:
(Remember, one of the goals of this document is getting feedback from all participants, so these numbers are provisional.)
AWS charges [1] $0.00075 per 1,000 input tokens and $0.001 per 1,000 output tokens.
Pocket Network doesn't have the infrastructure or code for per-call metered billing. We could instead charge portal operators a fixed $0.00075 (equivalent in POKT) per call, covering up to 200 input tokens and 1,000 output tokens for the 13B model. Calls exceeding these limits would be dropped by the node runners.
Furthermore, we propose another tier, designed specifically for batch workloads, that is slower but cheaper. Those nodes would run on lower-end GPUs or CPUs only, and calls would be charged at $0.0005 per call, with the same limits of 200 input tokens and 1,000 output tokens for the 13B model.
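To make the proposed billing concrete, here is a minimal sketch of the admission check and the fixed per-call charge. The tier names, the drop behavior, and the helper functions are illustrative assumptions, not an existing Pocket Network API.

```python
# A minimal sketch of the proposed per-call limits and fixed pricing.
# Tier names and functions are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    usd_per_call: float
    max_input_tokens: int = 200
    max_output_tokens: int = 1000

STANDARD = Tier("standard-gpu", 0.00075)  # on-demand, GPU-backed
BATCH = Tier("batch-cpu", 0.0005)         # slower; lower-end GPUs or CPUs

def admit_call(tier: Tier, input_tokens: int, output_tokens: int) -> bool:
    """Node runners drop any call that exceeds the tier's token limits."""
    return (input_tokens <= tier.max_input_tokens
            and output_tokens <= tier.max_output_tokens)

def pokt_charge(tier: Tier, pokt_usd_price: float) -> float:
    """Fixed per-call fiat price converted into POKT."""
    return tier.usd_per_call / pokt_usd_price

# Example: at $0.25/POKT, a standard call costs 0.003 POKT.
assert admit_call(STANDARD, input_tokens=150, output_tokens=800)
print(round(pokt_charge(STANDARD, 0.25), 6))  # 0.003
```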
Mistral
Just like Llama 2 above, Mistral is another capable LLM. Allow-listing Mistral in addition to Llama 2 would let users A/B test the responses and give them an alternative. Pricing etc. would be similar to the Llama 2 proposal.
Stable Diffusion
Stable Diffusion is an image-generation engine. Its license [2] allows free redistribution.
Use Cases
- Text-to-Image: generate an image from a text prompt (see the sketch after this list).
- Image-to-Image: tweak an existing image towards a prompt.
- Inpainting: tweak an existing image only at specific masked parts.
- Outpainting: add to an existing image at the border of it.
- Data generation and augmentation: Stable Diffusion can generate new data samples similar to its training data and thus can be leveraged for data augmentation.
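As a rough illustration of the text-to-image case, here is a minimal sketch using the open-source diffusers library. The checkpoint name and hardware are assumptions; any Stable Diffusion checkpoint a node runner hosts would work similarly.

```python
# A minimal text-to-image sketch with Hugging Face diffusers; the checkpoint
# name is an assumption, and a single consumer GPU is sufficient.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    height=512, width=512,             # matches the 512x512 pricing tier below
).images[0]
image.save("lighthouse.png")
```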
Edit: Alternative Approach
The approaches above are prescriptive about specific model names. One of the comments below argues that as long as the "IQ" is there, it doesn't matter which exact model is served. For example (a sketch follows this list):
- Chain 1: Meets or beats a score of 70 per benchmark XYZ (for example, GitHub - EleutherAI/lm-evaluation-harness: a framework for few-shot evaluation of language models).
- Chain 2: Meets or beats a score of 55 per XYZ at a different price point than Chain 1.
- Chain 3: …
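Here is a minimal sketch of what score-based gating could look like, assuming illustrative chain IDs, score floors, and prices; the scores themselves would come from a harness such as lm-evaluation-harness.

```python
# A minimal sketch of gating chains by benchmark score rather than model name.
# Chain IDs, score floors, and prices are illustrative assumptions.

# chain id -> (minimum benchmark score, USD price per call)
CHAIN_REQUIREMENTS = {
    "chain-1": (70.0, 0.00075),
    "chain-2": (55.0, 0.00050),
}

def eligible_chains(model_score: float) -> list[str]:
    """Return every chain whose score floor the model meets or beats."""
    return [chain for chain, (floor, _price) in CHAIN_REQUIREMENTS.items()
            if model_score >= floor]

# Example: a model scoring 62 qualifies only for the cheaper chain-2.
print(eligible_chains(62.0))  # ['chain-2']
```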
Pricing:
(Again, please remember: one of the goals of this document is getting feedback from all participants, so these numbers are provisional.)
AWS charges [1] $0.018 per standard-quality image at 512x512 resolution. Pocket Network could match that price. Again, we don't necessarily need to beat AWS on price, because Pocket Network has other advantages, such as privacy and easier onboarding.
ARR / Inflation Management
The two most important things to know:
- These new chains will not be inflationary.
- They are not impacted by, nor do they impact, ARR measures.
If 0.003 POKT is minted as a result of a call, the portal operator will be charged exactly that amount. Any free tiers or promotional access will be at the operator's expense (unless, of course, the DAO passes a proposal to subsidize and/or support portal operators in this new area).
Flow of Funds
- Portal operators and the Pocket DAO (with feedback from node runners) agree on a price for each chain.
  For example, say $0.00075 per relay for a particular chain. This fiat value is converted to POKT weekly. At the time of writing, POKT trades at $0.25, so each call would cost 0.003 POKT.
- We set the RTTM value for that chain so it rewards (i.e. mints) exactly that much POKT per relay.
  For example, 0.003 POKT will be minted for that relay as a reward upon claim/proof.
- At the end of the week, we tally how much new POKT was minted as relay rewards on these chains, per portal operator, and charge each operator that amount in POKT.
  For example, if portal operator XYZ made 100,000 calls, 100,000 × 0.003 = 300 POKT would be minted, so we charge that operator 300 POKT.
- We burn their payment to avoid inflation. The burn is 1:1.
In terms of ops: the calculations above are not hard at all; everything is recorded on the blockchain as part of the claim/proof cycle. Public tools such as PoktScan already show individual app performance, and if needed, a more purpose-built tool could easily be created to see which app did what. The sketch below walks through the settlement math.
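Here is a minimal sketch of the weekly settlement math, assuming an illustrative POKT price, operator names, and relay counts; in practice the tallies would come from on-chain claim/proof records.

```python
# A minimal sketch of the weekly mint/charge/burn settlement described above.
# Operator names, relay counts, and the POKT price are illustrative.

USD_PRICE_PER_RELAY = 0.00075  # fiat price agreed for this chain
POKT_USD_PRICE = 0.25          # refreshed weekly

# Step 1: derive the per-relay mint (what RTTM should reward) from fiat price.
pokt_per_relay = USD_PRICE_PER_RELAY / POKT_USD_PRICE  # 0.003 POKT

# Step 2: tally weekly relays per portal operator (from claim/proof records).
weekly_relays = {"operator-xyz": 100_000, "operator-abc": 40_000}

# Step 3: charge each operator exactly what was minted for their relays,
# then burn the payment 1:1 so net inflation is zero.
for operator, relays in weekly_relays.items():
    minted = relays * pokt_per_relay
    charged = minted  # charged 1:1 with the mint
    burned = charged  # burned 1:1 with the charge
    print(f"{operator}: minted={minted:.3f} POKT, "
          f"charged={charged:.3f}, burned={burned:.3f}")
# operator-xyz: minted=300.000 POKT, charged=300.000, burned=300.000
```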