Urbit for mathematicians

The three months since the last cp4space article have largely been spent learning about an interesting project called Urbit. My attention was drawn to its existence during a talk by Riva-Melissa Tez (formerly of Intel) on the importance of continued improvements to the computational efficiency of hardware.

A decent amount has been written on the Internet about Urbit, but it tends to either be non-technical, explaining the importance of the project, or technical but written in highly unconventional project-specific jargon. Instead, we’ll depart from either of these by examining the novel internals of Urbit that are most likely to be of interest to mathematicians and theoretical computer scientists.

Before we can do so, however, it is worth asking what Urbit is. This is often a source of confusion, because it is an ecosystem of multiple interacting components designed to mesh well together. If you were to metaphorically run 2-means on the set of Urbit components, they would cleanly separate into the operating system (Urbit OS) and identity system (Urbit ID).

The most interesting salient components of Urbit OS are:

Arvo: a deterministic operating system kernel implemented in a pure functional programming language (Hoon).
Nock: a very low-level functional programming language. It’s roughly halfway between a minimalistic Lisp dialect and the SKI combinator calculus. In the Urbit ecosystem, Nock is the machine code for the virtual machine in which the Urbit OS runs.
Vere: a software implementation (written in C) of the Nock virtual machine, capable of running the VM inside a Unix environment.
Hoon: a higher-level functional programming language in which the Urbit OS is directly written. This ultimately compiles down to Nock.

The identity system, Urbit ID, is a hierarchical system of integer addresses (analogous to IPv4 addresses) called Azimuth points. Azimuth points are securely associated to public keys (either mutably or immutably), and communication between machines running Urbit is built upon this public key infrastructure.

Azimuth points and their hierarchy

As mentioned, Azimuth points are non-negative integers. There are five such levels:

Points in [0, 2^8 – 1] are called galaxies.
Points in [2^8, 2^16 – 1] are called stars.
Points in [2^16, 2^32 – 1] are called planets.
Points in [2^32, 2^64 – 1] are called moons.
Points in [2^64, 2^128 – 1] are called comets.

These can cumulatively be represented by the datatypes uint8, uint16, uint32, uint64, and uint128, respectively. These datatypes possess standard ‘restriction maps’ familiar from using a programming language such as C:

uint64 → uint32 → uint16 → uint8

where, for instance, a 32-bit integer is coerced into a 16-bit integer by taking the lower half (or, equivalently, reducing modulo 2^16). There are two elegant ways to impose a ring structure on each of these 2^n-bit datatypes:

Integers modulo 2^(2^n), which form a ring;
Nimbers less than 2^(2^n), which form a finite field.

The first choice is the familiar one (in the C programming language, the operators {+, *, -} coincide with modular arithmetic in Z/(2^2^n)Z). In the second choice, ring addition is bitwise XOR, and ring multiplication is slightly more complicated to define.

For each of these two choices, the aforementioned restriction maps are ring homomorphisms.

Now, how do these restriction maps relate to Urbit? They impose a partial order on the set of Azimuth points, where x ≤ y if x is the image of y under a restriction map. The Hasse diagram of this partial order is a forest of 256 trees, each of which has a galaxy at its root. This Hasse diagram determines, for each Azimuth point that’s not a galaxy or comet, a unique parent of that Azimuth point. Most planets, for instance, have stars as parents.

The parenthood relationship for comets is different: the parent of a comet is its image under the restriction map uint128 → uint16, which must be one of five stars willing to host comets. Comets are direct hashes of public keys (so, unlike planets, can’t be transferred between different owners).

Planets, stars, and galaxies are tradable as non-fungible tokens on the Ethereum blockchain. This gives a modifiable decentralised mapping from each Azimuth point (32 bits and easily memorable) to an Ethereum address (160 bits and cryptographically secure). As such, it’s possible to trustlessly verify the owner of an Azimuth point and therefore set up a secure communication channel with that Azimuth point; all communications between machines running Urbit OS are established in this manner.

Names of Azimuth points

The integer representing an Azimuth point is converted into a human-readable name (called an ‘@p’) through the following process:

For planets and moons (not for stars and galaxies), a Feistel cipher is applied to the lowest 32 bits of the integer. This is a reversible permutation designed to obfuscate the relationship between the name of a planet and its parent star, reflecting the fact that planets are permitted to move freely between stars.
The bytes of the resulting (enciphered) integer are each converted to a three-letter syllable, using a separate lookup table for even-indexed bytes (suffixes) and odd-indexed bytes (prefixes). Each syllable consists of a generalised vowel flanked by two consonants.
The syllables are written in big-endian, with a hyphen between each 16-bit word and a tilde at the beginning.

An example of a 32-bit planet name is ~sorreg-namtyv, which belongs to the departed founder of Urbit.

A more comprehensive description of the process, including the two 256-element syllable lookup tables, is given in this StackOverflow answer.

In addition to the pronounceable ‘@p’ identifier, there is a visual encoding called a sigil. This stores each byte of the (enciphered!) 32-bit integer in a separate quadrant of a 2-by-2 arrangement, again using a pair of 256-element lookup tables. The sigil for the planet ~firmer-hatful is shown here:

sigil for ~firmer-hatful

Gavin Atkinson describes the design decisions involved in creating this system of sigils. Planets with circular sigils and/or meaningful @p names (e.g. ~timber-walled) tend to be in greater demand, and therefore more expensive, due to the combination of higher scarcity and aesthetic appeal.

The lookup tables mapping bytes to quadrant tiles were populated manually (by drawing 512 tile designs), so the Urbit sigils can be thought of as partially procedurally generated. [A fully procedurally-generated system may also have worked, such as the system in this Wolfram blog post from 2009, but this is unlikely to change now.] This is, nonetheless, highly consistent with the @p names; the syllable lookup tables were also manually populated.

Nock and its emulator

We proceed to turn our gaze from Urbit ID to Urbit OS.

Nock is a low-level pure functional programming language described here. Code and data are simply binary trees (called nouns) whose leaves are nonnegative integers. The Nock interpreter is a unary function * whose domain and codomain are both the space of nouns; its semantics are defined recursively in terms of itself and other unary functions.

My favourite of these functions is the tree-addressing operator, /, which is defined as follows:

/[1 a]              a
/[2 a b]            a
/[3 a b]            b
/[(a + a) b]        /[2 /[a b]]
/[(a + a + 1) b]    /[3 /[a b]]
/a                  /a

In particular, if we let L(x) and R(x) refer to the left and right subtrees of a tree x, then we have:

/[1 x] = x
/[2 x] = L(x)
/[3 x] = R(x)
/[4 x] = L(L(x))
/[5 x] = R(L(x))
/[6 x] = L(R(x))
/[7 x] = R(R(x))

and so forth. In general, /[n x] can be evaluated by:

expressing n in binary (for example, 1729 is “11011000001”);
removing the leading ‘1’ (in this example, giving “1011000001”);
replacing each ‘0’ with ‘L’ and ‘1’ with ‘R’ (in this example, “RLRRLLLLLR”);
taking x and sequentially applying each of the characters to it (in this example, giving R(L(L(L(L(L(R(R(L(R(x)))))))))), noting that the outermost function is the least significant bit).

If we enumerate the nodes in an infinite binary tree according to the natural number required to address the subtree rooted at that node using the / operator, this agrees with a breadth-first traversal of the tree. This is the same enumeration (the salmon spiral below) that relates the Calkin-Wilf sequence to the Calkin-Wilf tree, shown in this illustration by David Eppstein:

The operator / is useful enough that Lisp distributions typically implement the special-cases of this function for all 2 ≤ n ≤ 15, referring to them as {car, cdr, …, cddddr}.

Nock, if evaluated naively, would be incredibly inefficient: decrementing an integer n takes time proportional to n. The emulator therefore uses a clever idea called jets: runtime optimisations where the emulator pattern-matches various patterns (such as the decrementation function applied to an integer) and calls a native C function to decrement the integer directly.

The Urbit project has succeeded in implementing an entire operating system kernel as a function in this language, and it is performant in practice owing to the use of jets. The Hoon compiler (which compiles programs written in the higher-level language Hoon into the low-level Nock language) generates code that will be successfully pattern-matched by these jets.

With jets, it’s possible to go even lower-level than Nock. Urbit doesn’t do this because there’s not really much point in doing so, except to give a formal definition of Nock in terms of an even simpler language:

Aside: implementing Nock in the SK combinator calculus

If we’re opting for absolute simplicity, why not go the whole way? It’s possible to implement Nock in the untyped lambda calculus or in the SK combinator calculus.

It would be tempting to try the following method of encoding nouns:

The nonnegative integers at the leaves of noun trees can be represented using Church numerals: n is the higher-order function which maps its argument f to its n-fold functional composition, f^n.
The ‘cons cells’ [a b] can be represented as Church pairs.

However, unfortunately there isn’t a way to query whether a noun is an integer or a pair. As such, we’ll need to use a slightly more complicated encoding of Nock nouns where we ‘tag’ each noun with its type:

encode_noun([a b]) := pair(0, pair(encode_noun(a), encode_noun(b)))
encode_noun(n) := pair(1, church_numeral(n))

The unary functions used to define the Nock interpreter can now be themselves implemented as combinators, up to and including the Nock interpreter * itself.

For example, the question-mark operator can be implemented by taking the first element of the Church pair (which is the Church numeral 0 or 1) and then ‘boxing’ it as a Nock noun instead of a ‘bare’ Church numeral:

?(x) := pair(1, first(x))

where the ‘pair’ and ‘first’ functions are defined in the Wikipedia article.

The incrementation operator needs to unbox the integer, take its successor, and then rebox it, like so:

+(x) := pair(1, succ(second(x)))

Technically, we’ve cheated here, because according to the Nock definition, +(x) needs to also be defined on ordered pairs as well as integers. In particular, we need to use first(x) to extract a Church numeral specifying whether x is an ordered pair or an integer and branch on that: if the Church numeral is 1, then we can just return pair(1, succ(second(x))); if the Church numeral is 0, then we need to return +(x). The self-reference can be removed from this recursive definition by means of a fixed-point combinator.

The remaining operators are all recursive, so they’ll similarly require the fixed-point combinator in their implementations.

It would be a fun exercise to attempt to write a jet-propelled emulator for the SK combinator calculus, including optimisations such as internally representing Church numerals as integers (with a library such as GMP to allow arbitrarily large integers) and using jets to call the GMP implementations of arithmetic operations when appropriate.

By using the ‘compress identical subtrees’ idea, familiar from Hashlife and libraries for manipulating binary decision diagrams, we avoid excessive encoding overhead as well as enabling constant-time recognition of (standard implementations of!) functions such as the decrement operator.

3 Responses to Urbit for mathematicians

johnicholas says:

May 13, 2021 at 8:06 pm

I think very badly of Urbit author Curtis Yarvin / Mencius Moldbug’s politics and I think very highly of you and your work. I wonder if you would consider some adding some sort of disclaimer to this blog post something like “I do not endorse his alt-right politics, and I certainly hope that I’m not doing more harm than good by explaining and therefore popularizing this work.” I looked into Urbit around 2014, and I decided at that time not to use it further or try to publicize it.

Even if Yarvin is actually out, I would worry about further enriching Peter Thiel via this explainer.
It seems to me that Yarvin’s politics is not actually different than Urbit’s design: there’s very clearly an obsession with hierarchy, secrets and obscure language, anti-incrementalism, and totalitarian solutions in both.
It seems to me like the system of jets is eminently suitable for installing backdoors, and therefore it’s entirely likely that Urbit is either backdoored already, or is preadapted to have backdoors installed into it at some point in the future.

TL;DR: Urbit is very similar to both Gab and Parler.

- Hans Strunger says:
  
  May 14, 2021 at 2:19 pm
  
  What a petty way to life to consider every inventor or creators personal opinions. What do you think of Edison? Bell? Newton?
  
  Surely there are things you use, where you don’t agree to it’s inventor or creators personal opinions or political musings.
  
  Come over to the rational side, where you experience life and can use things to build a prosperous future instead.
  
Test says:

May 27, 2021 at 7:15 pm

Do you use Urbit? What’s your planet name?