Formalizing a Naming Scheme for RelayChainIDs

We are about to start onboarding numerous chains and whitelisting them in the SupportedBlockchains parameter, to support the new multi-chain era.

Before we do this, we still have a chance to enforce some order on our RelayChainID naming scheme. At the moment, only 0001 (Pocket mainnet) and 0021 (Ethereum mainnet full node) are locked-in, which means it may still be possible to come up with a naming scheme that is compatible with them.

Therefore I would like to break down the problem and propose a solution. I have no doubt that there will be gaps in my knowledge, meaning this won’t be a perfect solution. The goal is to start a discussion.

Defining the Problem

Limitations:

  • 4 characters
  • 16 symbols (hexadecimal): 0-9, a-f
  • 0001 (Pocket) and 0021 (Ethereum Mainnet Full Node) are locked-in

Characteristics we need to consider in the naming scheme, in order of priority:

  • Project: Pocket, Ethereum, Bitcoin, Avalanche, etc. Top priority because this has the largest set.
  • Network Type: mainnet, testnet, canary, …
  • Node Type: full, archival, light, sentry, witness, …
  • Shards: subsets of a project set, of which there may be many
  • Subgraphs: a different project type than chains, so might be worth having a distinct naming scheme
  • Duplicate network type indicators: e.g. Ethereum testnets: Rinkeby, Ropsten, Kovan, Görli. Lowest priority, because few projects have multiple versions of the same network type.

Proposed Solution

Proposition Pt. 1: Use the 4th column to define network/node types, with numbers for mainnets and letters for testnets:

  • 1: mainnet full
  • 2: mainnet archival
  • 3-9: mainnet spillover, for different node types or duplicate network types. No consistent logic because it will depend on the project.
  • a: testnet full
  • b: testnet archival
  • c-f: testnet spillover, for different node types or duplicate network types. No consistent logic because it will depend on the project.

Proposition Pt. 2: Use columns 1-3 to define different projects in numerical terms, from 1-999. Pocket would be 000X, Ethereum would be 002X.

Proposition Pt. 3: Use letters in place of the 0s before a project number to identify shards of the project. Ethereum Shard #1 (full node) would be AA21, Shard 2 would be AB21, and so on.

Proposition Pt. 4: Use letters-only in columns 1-4 to define subgraphs, from AAAA to FFFF. This ensures no confusion between chains and subgraphs.

Evaluating the Solution

Benefit of Propositions Pt. 1/2: The locked-in IDs (0001 and 0021) remain compatible with the naming scheme. For example:

  • POKT mainnet: 0001 (locked-in)
  • POKT testnet: 000a
  • ETH mainnet full: 0021 (locked-in)
  • ETH mainnet archival: 0022
  • ETH testnet full: 002a

Downside of Proposition Pt. 1: This naming scheme doesn’t actually support all the varieties of Ethereum testnets for both full and archival nodes. If this is a dealbreaker, we could swap to a-f for mainnets and 1-9 for testnets, but then this becomes incompatible with the locked-in 0001 and 0021.

Benefit of Proposition Pt. 3: Ties sharding to project IDs, making it easier to understand shard IDs at a glance.

Downside of Proposition Pt. 3: It’s not actually that scalable. The first 10 projects (000X-009X) will support up to 36 shards (AA-FF), but Ethereum is anticipated to have 64 shards already in Phase 1. And if sharding becomes prevalent we have a problem, because projects 010X to 099X would only support 6 shards each. If it turned out that all shards are going to have the same network/node type, we could use letters in the 4th column too and increase the scale for projects 000-009 to 216 shards.

Benefit of Proposition Pt. 4: Logically separates subgraphs from chains and supports up to 1296 subgraphs.
Note: we could technically use numbers in the 4th column and still maintain the logical separation while extending the possible scale to 3240 subgraphs.
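The capacity figures above follow from counting letter positions. A quick Python check, where the variable names are mine and the 3240 figure assumes the 4th column may take 1-9 or a-f (15 symbols, excluding 0):

```python
LETTERS = 6  # a-f
DIGITS_1_TO_9 = 9  # assumption: 0 is not used in the 4th column

# Pt. 3: shard prefixes are the letters that replace leading zeros.
shards_two_letter_prefix = LETTERS ** 2  # projects 000X-009X
shards_one_letter_prefix = LETTERS  # projects 010X-099X
shards_with_letter_column4 = LETTERS ** 3  # if the 4th column is also a letter

# Pt. 4: subgraphs use letters in columns 1-3, and optionally 1-9 in column 4.
subgraphs_letters_only = LETTERS ** 4
subgraphs_with_digits_in_column4 = LETTERS ** 3 * (LETTERS + DIGITS_1_TO_9)

print(shards_two_letter_prefix, shards_one_letter_prefix, shards_with_letter_column4)
print(subgraphs_letters_only, subgraphs_with_digits_in_column4)
```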

Permutation Efficiency

One way we can evaluate the efficiency of the naming scheme is to consider how many permutations may be locked out in the following scenarios:
(a) the project didn’t make full use of their available IDs
(b) the permutation wasn’t included in the logic of the scheme

Scenario (a)

A hypothetical project 003 with no shards and only 2 network types (mainnet full and testnet full) would use only the following IDs: 0031 and 003a. That leaves [A-F][A-F]3[2-9,B-F] unused, which is 504 unused permutations.

Already that seems like a lot of waste with only scenario (a) applying for only 1 project.

Scenario (b)

Scenario (b) is harder to evaluate, but I’ll give it a shot.

Because we’re only applying letters before project numbers which increment from right to left in columns 1-3, or exclusively in the form of subgraphs (AAAA-FFFF), but not after numbers that are located in columns 1-2, we’re missing out on the following permutations: [0-9][0-F][A-F][0-F]. That’s 13,824 permutations, which is 21% of the total possible permutations (65,536).

There are probably other missing permutations that I’m overlooking in scenario (b), but hopefully this will help others to assemble a more comprehensive critique.

Open Questions

These questions represent gaps in my knowledge, which may influence the appropriateness of the naming scheme:

  • If we have 4 characters then upgrade to 5 characters, what happens to the existing IDs? Do we simply add a column in front of the ID?
  • How do we handle forks? Which chain keeps the existing ID? Should we link the IDs of forks to the original ID?
  • Does it matter which client a node is using? Could a Pocket node using the Ethereum RelayChainID 0021 choose freely between Geth, OpenEthereum, Nethermind, and Besu?
  • Could a beacon chain be categorized within the 4th column?

Is there some immovable technical reason to be limited to a 4-byte ID?
If not, I would recommend going with an open standard (candidate) instead of homebrewing identifiers.

Hey @Garandor, this issue describes the reason behind the decision to limit to 4 bytes.

We made a final decision to go with a semi-randomized approach. We keep an index of decimal values x (between 0 and 65535) whose only purposes are to give a visible indicator of whether we are surpassing our total capacity, and to be converted to hex to produce the final chain id.

Discarded Solution

In line with Jack’s proposal, we initially felt that a sane, clearly deducible chain id that conveys metadata about the actual network to which the id is assigned was a valid aim and a desired goal.

However, in our attempt to represent all of the relevant data properties in a 4-digit hex word, we ended up subverting the limitations outlined in the proposal above only by agreeing to another set of limitations.

The intermediate proposal that came in response to this one suggested the following:

  • Represent properties of the node by designating chain id ranges per node property
  • Calculate chain ids as a function of the node properties: (network type, node type, project etc…)
  • Accommodate the limitations presented in the previous proposal:
    • Keep a space for the possibility of 64 shards per chain
    • Keep in mind that we have two already locked-in chain ids.

The tl;dr version of the proposal was to:

  • Establish an index for the network types we support, identified by decimal values (e.g. mainnet=1, testnet=2…)
  • Establish an index for the node types we support, identified by decimal values (e.g. full=1, archival=2, sentry=3…)
  • Establish an index for the projects/chains we support, identified by decimal values (e.g. pocket=1, ethereum=2…)

Then, estimate how many chain ids each project should be allowed space for, which was calculated following this simple logic:

  • Assume all projects will have shards at a given point in time, and we’ll host 64 shard nodes per project
  • Assume we want to spin 1 node per network per node type per project
  • Assume we want to run Subgraph nodes as well.

The math was:

Count of Nodes/Project
= 64 + 8 * 4 + 4’
= 96 + 4’
= 100

Where:
64 is the shard nodes,
8 is the supported node types,
4 is the supported network types,
4’ is the subgraph nodes.

We can therefore run 100 nodes per project. With 65536 assignable chain ids, this means we can support 655 projects (65536 / 100, rounded down), each with 100 unique chain ids.
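The same arithmetic as a short Python check. The constant names are placeholders of mine; the values are the ones stated above.

```python
SHARD_NODES = 64  # assumed shard nodes hosted per project
NODE_TYPES = 8  # supported node types
NETWORK_TYPES = 4  # supported network types
SUBGRAPH_NODES = 4  # subgraph nodes (the 4' term)

# One node per network per node type, plus shard and subgraph nodes.
ids_per_project = SHARD_NODES + NODE_TYPES * NETWORK_TYPES + SUBGRAPH_NODES

# A 4-character hexadecimal id has 16^4 possible values.
total_chain_ids = 16 ** 4

# Whole projects that fit when each reserves ids_per_project ids.
max_projects = total_chain_ids // ids_per_project

print(ids_per_project, total_chain_ids, max_projects)
```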

The ranges should be established by network type, meaning we will have 4 main ranges + subgraph range, and within each network type range, we have 8 chain ids per project to assign one for each node type running on that network type.

Then we can start assigning chain ids according to the following formula:

DecimalChainId = 655 * (NodeTypeId - 1) * RangeId + (projectId-1) + (NodeTypeId-1)

then convert the DecimalChainId to Hex.


An example application of this formula on the currently locked in chain ids:

  • 0001: Pocket Mainnet Full
    • ProjectId: 2 (Check projects index, pocket=2)
    • RangeId: 1 (Mainnet =1)
    • NodeTypeId: 1 (Full)

Math:

655 * (1 - 1) * 1 + 1*(2 - 1) + (1 - 1) = 1


Then we can convert the resulting decimal chain id to a 4-digit hexadecimal:

1 (dec) = 0001 (hex, 4 digits)

  • 0021: Ethereum Mainnet Full
    • ProjectId: 34 (Check projects index)
    • RangeId: 1 (Mainnet)
    • NodeTypeId: 1 (Full=1)

Math:

655 * (1 - 1) * 1 + 1* (34 - 1) + (1 - 1) = 33


Then we can convert the resulting decimal chain id to a 4-digit hexadecimal:

33 (dec) = 0021 (hex, 4 digits)
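For reference, a minimal Python sketch of the discarded formula, transcribed term by term from the text above. The function names are hypothetical, and the range check is an addition of mine rather than part of the proposal.

```python
def decimal_chain_id(project_id: int, range_id: int, node_type_id: int) -> int:
    """The formula from the post, using the indexes described above."""
    return (
        655 * (node_type_id - 1) * range_id
        + (project_id - 1)
        + (node_type_id - 1)
    )


def to_hex_chain_id(decimal_id: int) -> str:
    """Convert a decimal chain id to a zero-padded 4-character hex string."""
    if not 0 <= decimal_id < 16 ** 4:
        raise ValueError(f"{decimal_id} does not fit in 4 hex characters")
    return format(decimal_id, "04x")


# Pocket Mainnet Full: ProjectId 2, RangeId 1, NodeTypeId 1
pocket = to_hex_chain_id(decimal_chain_id(2, 1, 1))
# Ethereum Mainnet Full: ProjectId 34, RangeId 1, NodeTypeId 1
ethereum = to_hex_chain_id(decimal_chain_id(34, 1, 1))
print(pocket, ethereum)
```

Both locked-in ids come out as expected: 0001 for Pocket and 0021 for Ethereum.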

A more elaborate version of this is available in this Notion document.

Although this approach is quite deterministic, we did not feel comfortable moving forward with it, given its capacity of only 655 projects and the fact that it does not scale properly if we ever move to longer hex strings (5 digits or more).