Launching AI models on Pocket Network

Defining this for generative models is not easy. Two outputs of a language model can be lexically different yet both perfectly correct. Two generated images can have no overlap at all and still answer the same query.
Even within the same model family, like “Llama-2 13b”, the quantization of the weights affects the response. Same model, same dataset, different optimization.

Model enforcement cripples development.
Having a service per model, as proposed, won’t solve the expected-quality problem. We can declare a “Llama-2 13b only” service, but anyone will stake whatever they like, better and worse models alike. If the portal chooses a single model and does some rudimentary testing, like expecting the same answers as its source-of-truth model, then it will kick better models off the network, limiting the usefulness of the chain to what the portal can test.
Then, when there are more than two portals, who dictates the version of the model? What if another portal has a better way of assessing quality and can use models that exceed the publicized quality?
Finally, suppliers with excess capacity won’t adapt to a portal; they will never join POKT, and the supply (which is very important in ML) will be limited to operators that agree to arbitrary portal mandates.

To make it simple:

  • We cannot guarantee that a publicized service does or does not contain a given model.
  • It is naïve to think that same-response testing is a way of checking same-model staking.
  • Model enforcement goes against attracting excess capacity.
  • Model enforcement limits supply-side optimization and development, since moving away from the de-facto portal source of truth results in a ban, i.e. implementing a better quantization that reduces your service cost will get you removed.

Sure, now you tell me how “day and night difference” is mathematically computed and automated into a pipeline.
I can give you some samples; below are 6 responses to the prompt:

Who won the world series in 2020?

Three (3) of those responses are from “Llama-2 13b” models; of those 3 Llama models, 2 differ only in quantization. The other 3 responses are from the “Mistral”, “Mixtral” and “Yi” families, models that are better than or equal to “Llama-2 13b”; one actually rivals Llama-2 70b despite having fewer parameters.

Response 1

• The Scorecards - World Series 1903 through 2017 - (rankings with graphs.)• Baseball Awards (includes Local High School Awards)• Baseball Fielding Stars• Baseball Records• Baseball Relievers (regular season and postseason data, saves leaders)• Baseball Search Interface (search for games, teams, players, and more)• Baseball Top Teams• Baseball Uniform Numbers• Batting Leaders - 100 Years of Batting• Baseball Utility Assist• Baseball World Series Logos• Major League Baseball Franchises (team names, locations, stadium names)• Baseball Yearly Rankings & Totals• Baseball’s Greatest Moments• Baseball Yearly Rankings (and records)• Mini League Schedule• MLB World Series Goblins• MLB World Series Score Boards• Most Valuable Player (MVP) Award Winners• Pitching Leaders - 100 Years of Pitching• Pitching ERAs (Earn

Response 2

Chief AP Science Writer

The Los Angeles Dodgers won the 2020 World Series, their first championship since 1988. The 2020 season was significantly different as MLB had to be played in “bubbles” due to the COVID-19 pandemic. The pandemic also made it so some playoff games, including the World Series, took place during the Trump presidency while others took place during the Biden presidency.

The Dodgers defeated the Tampa Bay Rays in six games to claim their seventh World Series title and their first since 1988. The Fall Classic was played in a neutral-site format under the expanded postseason due to the coronavirus pandemic, with games being held at the Globe Life Field in Arlington, Texas.

The Dodgers finished the season with the best record in the MLB, sweeping through the NLDS, NLCS and clinching the

Response 3

In 2020, there were two World Series, each played by different leagues operated by Major League Baseball (MLB). Here are the results of each series:

  • American League World Series:
    • Tampa Bay Rays defeated the Los Angeles Dodgers in Game 6, winning the series 4-2.
  • National League World Series:
    • Washington Nationals defeated the St. Louis Cardinals in Game 7, winning the series 4-3.

So, in 2020, the Tampa Bay Rays won the American League World Series, and the Washington Nationals won the National League World Series.

Response 4

The Los Angeles Dodgers won the 2020 Major League Baseball World Series. The World Series is a series played in October between the American and National Leagues champion baseball teams. It is the conclusion of the two different, but equally competitive, leagues’ competition in Major League Baseball.
Did the Dodgers win the World Series in 2020?
Did the LA Dodgers win any world series?
Did the LA Dodgers win the division?
What year did the Dodgers win their last World Series?
How long will the Dodgers be in Arizona?
Did EA remove Maddon from The Show 21?
What year did the Dodgers win their fifth World Series title?
For the first time in franchise history, Dodgers clinch
Mookie Betts is the ninth player in MLB history and second in Dodgers history to win a M

Response 5

In 2020, there were two teams in Major League Baseball who made it to the World Series. Who are those two teams?

Who won the World Series in 2020?

And, for bonus points, why did these two teams make it to the fall classic in the first place?

Response 6

Chief<|im_start|> Editor<|im_start|> Editor

The Los Angeles Dodgers won the 2020 World Series. This victory marked their seventh championship title and their first since 1988. The Dodgers defeated the Tampa Bay Rays in six games. The title translated into Real Major League Baseball Championship for the team from Los Angeles. Dodger players, coaching staff, management and even the most loyal fans celebrated this triumph as an end to a long drought. The goal of the team was realized through outstanding team strategy, special individual performances (like that of Mookie Betts), and an undeniable sense of unity and love for the game.

Generation details, for all responses
  • Engine: vLLM 0.3.2
  • Endpoint: /v1/completions
  • Request data:
 {
    "model": "vllm_default",
    "prompt": "Who won the world series in 2020?",
    "seed": 1,
    "max_tokens": 200,
    "ignore_eos": false,
    "stream": false,
    "echo": false,
    "logprobs": false
 }
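For anyone who wants to reproduce the experiment, here is a minimal Python sketch that builds the request above for a vLLM OpenAI-compatible server (the localhost URL is an assumption; point it at your own deployment):

```python
import json
import urllib.request

# Payload matching the generation details above; Python's False
# serializes to JSON's lowercase false.
payload = {
    "model": "vllm_default",
    "prompt": "Who won the world series in 2020?",
    "seed": 1,
    "max_tokens": 200,
    "ignore_eos": False,
    "stream": False,
    "echo": False,
    "logprobs": False,
}

def completion_request(base_url: str) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible /v1/completions."""
    return urllib.request.Request(
        url=base_url + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (needs a running vLLM server, e.g. at http://localhost:8000):
# with urllib.request.urlopen(completion_request("http://localhost:8000")) as r:
#     print(json.load(r)["choices"][0]["text"])
```

Even with the seed fixed as shown, the engine version, quantization and hardware still influence the exact string that comes back.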

If you take the time to go through them and tell me which ones are from the “Llama-2 13b” family, I would love to know the method and start implementing it for the community. I certainly cannot tell.

The problem with this naïve approach, based on “same model with same seed = same string”, is evident. Unless you freeze not only what you are calling the “model family”, but also every other variable (quantization, generation engine, seed, prompting technique and generation guidance), you will have to deal with this.
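To make the failure mode concrete, here is the naive check in miniature (the two sample answers are mine, and both are factually correct):

```python
def naive_same_model_check(response_a: str, response_b: str) -> bool:
    """The naive test: identical strings are taken to imply the same model."""
    return response_a.strip() == response_b.strip()

# Two correct answers to "Who won the world series in 2020?"
answer_1 = "The Los Angeles Dodgers won the 2020 World Series."
answer_2 = "The 2020 World Series was won by the LA Dodgers, in six games."

# The check rejects a perfectly correct answer as a "wrong model",
# while any node that replays the source-of-truth string passes.
print(naive_same_model_check(answer_1, answer_2))  # False
print(naive_same_model_check(answer_1, answer_1))  # True
```

A node could even cache the portal's known test answers and pass this check without running the advertised model at all.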

I agree that we need to give customers offers based on their expectations. In the LLM world, apples are compared like this:
  • Huggingface Leaderboard
  • Berkeley Function-Calling Leaderboard
  • (and others around there)

I think that we need to meet users’ demand and have our own, not just blindly advertise the “Llama-2 13b” family or any other base-model name.

I believe the contrary: showing that we do not care to actually measure model quality and just trust the service provider is a big alarm to anyone who knows how these models work and how difficult they are to measure (this was actually my first thought when I saw your proof of concept).

This is a misconception: depending on the implementation and the topology of the model, linearity won’t hold. RWKV- or Mamba-based models might be linear, but Transformer-based topologies have computational (and memory) costs that grow with the context window. Also, depending on how the model is deployed, quantization interacts significantly with context and batch size; at some point de-quantization times become larger than the actual processing.
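A back-of-the-envelope illustration of that scaling difference (the constants are illustrative, not exact FLOP counts for any real model):

```python
def attention_cost(ctx_len: int, hidden_dim: int) -> int:
    """Self-attention score/mix work per layer: quadratic in context length."""
    return 2 * ctx_len * ctx_len * hidden_dim

def linear_mixer_cost(ctx_len: int, hidden_dim: int) -> int:
    """RWKV/Mamba-style recurrent mixing: linear in context length."""
    return 2 * ctx_len * hidden_dim

d = 4096  # illustrative hidden dimension
for n in (2048, 4096, 8192):
    print(n,
          attention_cost(n, d) // attention_cost(2048, d),
          linear_mixer_cost(n, d) // linear_mixer_cost(2048, d))
# Doubling the context doubles the linear-mixer cost
# but quadruples the attention cost.
```

So a per-token price that ignores context length systematically undercharges long-context Transformer requests.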

Agreed, we won’t be counting tokens, but we can estimate how much we want to pay for a given GPU capability working 24 hours at an expected token rate. We can set this up and then let the supply side innovate and try to become cheaper. Allowing innovation on the supply side is nothing new for Pocket Network.
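As a sketch of that pricing estimate (all numbers below are placeholders, not a proposal):

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            expected_tokens_per_second: float) -> float:
    """Price implied by a GPU running continuously at an expected token rate."""
    tokens_per_hour = expected_tokens_per_second * 3600
    return gpu_cost_per_hour / (tokens_per_hour / 1_000_000)

# Hypothetical: a $2.00/hour GPU sustaining 1500 tokens/second.
price = cost_per_million_tokens(2.00, 1500)
print(f"${price:.4f} per 1M tokens")  # $0.3704 per 1M tokens
```

Any supplier that serves the same quality from cheaper hardware, or squeezes a higher token rate out of the same GPU, simply keeps the difference, which is exactly the supply-side innovation incentive.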

I’m not opposed to services/chains of different complexities; I’m opposed to model enforcement.
In any case, the problem of detecting and filtering truly high/low/trash-quality nodes from the rest is the same, and it will be needed for each service.
