Keep your public keys private

Yes, the title sounds very counterintuitive. After all, don’t digital signature schemes require the general public to know your public key so that they can verify your signatures?

That is correct, but importantly they don’t need to know your public key until the very moment that you actually want to sign something. Instead, you can publish a hash of your public key long beforehand, and only publish your public key at the moment you digitally sign the message. The verifier then checks that the digital signature is valid using that public key, and then also checks that the public key is consistent with the hash you published.

The pseudonymous inventor of Bitcoin, Satoshi Nakamoto, must have realised this when he or she wrote the whitepaper. A Bitcoin address (to which you send coins) is a RIPEMD-160 hash of a SHA-256 hash of the elliptic curve public key. Importantly, this hash is not the same thing as the public key itself.
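Concretely, the hashing step can be sketched in a few lines of Python. This is illustrative only (not consensus code), it uses a hypothetical 33-byte compressed public key, and it assumes the underlying OpenSSL build exposes RIPEMD-160 through hashlib:

import hashlib

def hash160(pubkey_bytes):
    # the address commits to RIPEMD-160(SHA-256(pubkey)), not to the key itself
    sha = hashlib.sha256(pubkey_bytes).digest()
    return hashlib.new('ripemd160', sha).digest()   # needs OpenSSL's ripemd160

# hypothetical compressed public key: an 0x02/0x03 prefix followed by the x-coordinate
pubkey = bytes.fromhex('02' + '11' * 32)
print(hash160(pubkey).hex())    # the 20-byte (160-bit) hash that the address encodes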

Why does this matter? It turns out that the hash offers a much stronger level of security than the elliptic curve:

In particular, if an attacker knows only your Bitcoin address, then there’s no easier way to steal your coins than by brute force: generate a random private key, derive the public key, hash it to obtain an address, and see if it matches. It would take an expected 2^160 iterations of this approach to steal your coins.

Note that the attacker will almost certainly end up with a different private key from the one you created, but it will ‘coincidentally’ be able to unlock the coins in the same address. This is because the 160-bit space of addresses is 2^96 times smaller than the 256-bit space of elliptic curve keys.

What if an attacker knows your elliptic curve public key? Then, using the Pollard rho algorithm, it ‘only’ takes on the order of 2^128 iterations to determine your private key. That’s still humongous; even with all of the computing power currently available on the planet, it would take millions of years to reverse-engineer your private key.

So why should you be concerned? There are two reasons:

  • There’s a polynomial-time quantum algorithm for breaking elliptic curve discrete logarithm. It uses the same ‘period-finding’ subroutine as Shor’s factorisation algorithm. It’s still far beyond current quantum computing technology, with the best published upper bound requiring 128 billion Toffoli gates to break a 256-bit elliptic curve discrete logarithm, but quantum computing progress has been accelerating in recent years.
  • More of a concern is that Pollard’s rho algorithm might not be the best classical algorithm for solving the elliptic curve discrete logarithm problem. Prime factorisation, for example, became much easier as recently as the early 1990s, when the General Number Field Sieve was developed (building on an idea of Pollard). It’s plausible that a secret governmental organisation has a faster method of cracking elliptic curve discrete logarithm.

If you’re unconvinced that this second reason is even remotely plausible, and strongly believe that Pollard rho is obviously the best possible algorithm for solving elliptic curve discrete logarithm, then you should ask yourself why serious cryptographers such as Joe Silverman are bothering to develop alternative methods.

(Before you dismiss this as an argumentum ad verecundiam, note that this is not purporting to be an argument that elliptic curve discrete logarithm is definitely insecure. Rather, it’s an argument that there’s no known proof that it is secure, because if such a proof did exist, then there would have been no point in Silverman proposing an approach that is guaranteed to fail.)

So, 2^128 iterations should be regarded as ‘an upper bound, which is tight assuming that there will be no academic, industrial, or governmental progress on finding faster algorithms’. And if a large-scale fault-tolerant quantum computer is developed, be very afraid…

On the other hand, cryptographic hash functions such as SHA-256 and RIPEMD-160 (both of which are composed to derive the Bitcoin address from the public key) are designed to thoroughly mush the input in as chaotic a manner as possible*. They have no nice mathematical structure by design, so it’s very unlikely that there’s a better approach than the 2^160-iteration brute-force algorithm described above.

*that’s not to say that hash functions are just kitchen sinks of arbitrary operations. They’re still very carefully designed to have good dispersion properties, be resistant to a bunch of different cryptographic attacks, produce statistically random output, and achieve these goals efficiently (in terms of speed in software/hardware implementations).

The take-home message

To reiterate:

  • if you share your public key, then your security level is “hopefully 128 bits, but maybe some organisation has found a faster method and is keeping it quiet”;
  • if you don’t share your public key until you absolutely need to (when signing a transaction), then your security level is “almost definitely 160 bits”.

You should feel much more confident if your Bitcoin is in an address where the world doesn’t know your public key. This means that whenever you spend any Bitcoin from an address (inevitably revealing the public key in the process), you should empty the entire address and send the remaining balance to a fresh unused address.

You’re still revealing your public key, but only very briefly: as soon as the block containing that transaction is confirmed, it becomes incredibly computationally difficult to rewrite the history. In essence, it only gives an evil adversary a maximum of a few minutes to try to break the discrete logarithm problem.

Many wallet implementations create fresh addresses for every transaction, so my recommendation is to use one of those (e.g. the Trezor hardware wallet).

Since ‘not reusing addresses’ is already known to be best practice, you might be tempted to ask:

Is this advice really necessary?

Apparently so.

At the time of writing, someone has 94258 Bitcoin* in a regular (pay-to-public-key-hash) address which has revealed its public key. So, if you are reading this and are the owner of 1P5ZEDWTKTFGxQjZphgWPQUpe554WKDfHQ, then I’d recommend moving the balance to a fresh address imminently.

*that’s about 3 billion dollars.

For example, if you look at one of the recent outgoing transactions, the ‘sigscript’ is the following:

473044022100ff4fa243701eea0bdba8615a8bc4f01df387
aae215a90fe382eaf5405ee0ad73021f6f814e02b88e1fbb
1b0d5099970da6cf426ef58027c4b99b1afb633bb5b62801
21024b83426cf9bff257261d87f2f2858b51b2eea756c0123c7e05bc0a007425c9f2

The last line here (68 hexadecimal characters, i.e. 34 bytes) contains 0x21 (meaning ’33’) followed by the 33-byte public key (a point on the elliptic curve). That is to say, there’s currently 3 billion dollars resting in an oft-reused Bitcoin address with an exposed public key, protected only by the difficulty of the elliptic curve discrete logarithm problem.
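For the curious, here is a small parsing sketch of the sigscript quoted above. It is illustrative only: each push here consists of a single length byte followed by that many bytes of data, so the script splits into exactly two pushes.

sigscript_hex = (
    "473044022100ff4fa243701eea0bdba8615a8bc4f01df387"
    "aae215a90fe382eaf5405ee0ad73021f6f814e02b88e1fbb"
    "1b0d5099970da6cf426ef58027c4b99b1afb633bb5b62801"
    "21024b83426cf9bff257261d87f2f2858b51b2eea756c0123c7e05bc0a007425c9f2"
)
script = bytes.fromhex(sigscript_hex)

pushes, i = [], 0
while i < len(script):
    length = script[i]                    # 0x01..0x4b means 'push the next `length` bytes'
    pushes.append(script[i + 1 : i + 1 + length])
    i += 1 + length

signature, pubkey = pushes                # DER signature (plus sighash byte), then the key
print(len(signature), len(pubkey))        # 71 and 33
print(pubkey.hex())                       # the exposed compressed public key

Feeding that 33-byte key through SHA-256 and RIPEMD-160 (as in the earlier sketch) and base58check-encoding the result with version byte 0x00 gives back the corresponding pay-to-public-key-hash address.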

That’s a very large and unnecessary bet to be making against the collective innovative capability of the world’s algebraic geometers and cryptographers…


The neural network of the Stockfish chess engine

Last time, we briefly mentioned the high-level differences between Stockfish and Leela Chess.

To recap, Stockfish evaluates about 100 million positions per second using rudimentary heuristics, whereas Leela Chess evaluates 40 000 positions per second using a deep neural network trained from millions of games of self-play. They also use different tree search approaches: Stockfish uses a variant of alpha-beta pruning, whereas Leela Chess uses Monte Carlo tree search.

An important recent change to Stockfish was to introduce a neural network to evaluate the positions in the search tree, instead of just relying on hardcoded heuristics. It’s still much simpler than Leela Chess’s neural network, and only slows Stockfish down to exploring 50 million positions per second.

The real cleverness of Stockfish’s neural network is that it’s an efficiently-updatable neural network (NNUE). Specifically, it’s a simple feedforward network with:

  • a large (10.5M parameters!) input layer, illustrated below, that can utilise two different levels of sparsity for computational efficiency;
  • three much smaller layers (with 17.5k parameters in total) which are evaluated densely using vector instructions;
  • a single scalar output to give a numerical score for the position, indicating how favourable it is for the player about to move.

Everything is done using integer arithmetic, with 16-bit weights in the first layer and 8-bit weights in the remaining layers.

The input layer

Let’s begin by studying the first — and most interesting — layer. Here’s an illustration I made using Wolfram Mathematica:

The inputs to the layer are two sparse binary arrays, each consisting of 41024 elements. It may seem highly redundant to encode a chess position using 82048 binary features, but this is similar to an approach (called ‘feature crosses’) used in recommender systems.

What are the two sparse binary arrays, and why do they have 41024 features? My preferred way of interpreting these two arrays is as the ‘worldviews’ of the white king and the black king. In particular, the differences are:

  • Coordinate systems: because black and white pawns move in opposite directions, the two players need to ‘see the board from opposite angles’. This already happens in regular (human) chess because the two players are seated opposite each other at the chessboard, so one player’s ‘top’ is the other player’s ‘bottom’. If you imagine each player numbering the squares from top to bottom, left to right, then any physical square would be called n by one player and 63 − n by the other player.
  • Piece types: instead of viewing the pieces as black or white, a player sees each piece as either ‘mine’ or ‘theirs’. The 10 non-king piece types are thus {my pawn, their pawn, my knight, their knight, my bishop, their bishop, my rook, their rook, my queen, their queen}.

The reason for these ‘player-relative coordinate systems’ is that Stockfish can then use the same neural network irrespective of whether it’s playing as white or black. The network uses both your king’s worldview and your enemy king’s worldview for evaluating a position, because they’re both highly relevant (you want to protect your own king and capture your enemy’s king).

So, why does each worldview have 41024 features? It can be seen as an outer product (or tensor product) of:

  • a 64-element feature encoding the position of the king whose worldview this is, in their own coordinate system. This is ‘one-hot encoding’, where exactly one of the 64 entries is ‘1’ and the other 63 entries are ‘0’.
  • a 641-element feature encoding, for each of the 64 × 10 ordered pairs (square, piece-type), whether or not that square is occupied by that piece. The 641st element is unused, and is (according to the Chess Programming Wiki) apparently a result of the network being ported from Shogi to chess.

Each of the two 64-by-641 outer product matrices* is then flattened (the matrix is ‘reshaped’ into a vector with the same entries) to yield the corresponding 41024-element ‘sparse worldview’. In the input layer, each of the two 41024-element sparse worldviews is then affinely transformed to form a 256-element ‘dense worldview’.

*Important note: the 41024-element sparse binary arrays are never explicitly materialised, either as a 64-by-641 matrix or as a 41024-element vector. The Stockfish NNUE effectively ‘fuses’ the construction of these sparse vectors with the subsequent affine transformation (described below), updating the 256-element dense worldviews directly when the configuration of pieces on the chessboard is modified.
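As a purely conceptual illustration of the indexing, here is one plausible flattening convention; Stockfish’s actual code may order the features differently and, as noted above, never materialises the sparse vector:

def feature_index(king_square, square, piece_type):
    # king_square and square are 0..63 in the player-relative coordinate system;
    # piece_type is 0..9 (my/their pawn, knight, bishop, rook, queen).
    # 64 * 641 = 41024 possible indices; since square*10 + piece_type is at most 639,
    # the last slot of each king block never occurs, matching the unused element above.
    return king_square * 641 + (square * 10 + piece_type)

def active_features(king_square, pieces):
    # pieces: iterable of (square, piece_type) pairs for all non-king pieces
    return [feature_index(king_square, sq, pt) for sq, pt in pieces]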

There are two levels of sparsity which are utilised when computing this affine transformation from \mathbb{R}^{41024} to \mathbb{R}^{256}, allowing the network to be efficiently evaluated many times in a tree search:

  • the 41024-element implicit vectors are themselves sparse: the number of nonzero elements is equal to the number of non-king pieces on the board.
  • moving a piece typically changes very few of the entries of the vector: if it’s a regular non-king move, only 2 entries change; if it’s a non-king move with capture, then 3 entries change.

It’s this second aspect which warrants the name ‘efficiently updatable’: when a move is made (or unmade, since we’re doing a tree search), we only need to add/subtract a few 256-element matrix columns from the resulting ‘dense worldview’ to update it.

Unless a king is moved, this (2 or 3 vector additions/subtractions) beats summing all of the matrix columns corresponding to nonzero entries (up to 30 vector additions), which in turn unconditionally beats doing a regular dense matrix-vector multiplication (41024 vector additions). That is to say, the second-level sparsity is about 10 times more efficient than the first-level sparsity, which is in turn about 1000 times more efficient than naively doing a dense matrix-vector multiplication.
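Here is a rough sketch of that update logic, assuming a hypothetical 41024 × 256 first-layer weight matrix W and a single 256-element accumulator; the real code keeps one accumulator per worldview, handles biases and king moves (which force a full refresh), and is heavily vectorised:

import numpy as np

W = np.zeros((41024, 256), dtype=np.int16)       # placeholder first-layer weights
acc = np.zeros(256, dtype=np.int16)              # dense worldview ('accumulator')

def refresh(accumulator, features):
    # first-level sparsity: sum one weight column per active feature (at most 30)
    accumulator[:] = W[features].sum(axis=0)

def update(accumulator, removed, added):
    # second-level sparsity: a non-king move toggles only 2 or 3 features
    for f in removed:
        accumulator -= W[f]
    for f in added:
        accumulator += W[f]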

The two dense worldviews are concatenated according to which player is about to move, producing a 512-element vector, which is elementwise clamped to [0, 127]. This elementwise clamping is the nonlinear activation function of the input layer, and (as we’ll describe) the hidden layers use a similar activation function. We can think of this as a ‘clipped ReLU’, which is exactly what the Stockfish source code calls it.

The remaining layers

The two hidden layers each use 8-bit weights and 32-bit biases. The activation function first divides the resulting 32-bit integer by 64 before again clamping to [0, 127], ready to be fed into the next layer. The output layer also uses 8-bit weights and 32-bit biases, but with no nonlinear activation function.

The first hidden layer takes 512 inputs (the clamped concatenated worldviews) and produces 32 outputs. The second hidden layer takes those 32 values as inputs, and again produces 32 outputs. The output layer takes those 32 values as inputs, and produces a single scalar output.
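Putting those numbers together, here is a rough NumPy sketch of the dense part of the network, with randomly initialised placeholder weights (Stockfish’s real implementation uses a specific weight layout and SIMD intrinsics):

import numpy as np

def clipped_relu(x):
    return np.clip(x >> 6, 0, 127).astype(np.int8)     # divide by 64, clamp to [0, 127]

def dense(x, W, b):
    return W.astype(np.int32) @ x.astype(np.int32) + b

def evaluate(x, params):
    # x: the 512-element clamped concatenation of the two dense worldviews
    W1, b1, W2, b2, W3, b3 = params
    h1 = clipped_relu(dense(x, W1, b1))                 # 512 -> 32
    h2 = clipped_relu(dense(h1, W2, b2))                # 32 -> 32
    return dense(h2, W3, b3)                            # 32 -> 1, no activation

rng = np.random.default_rng(0)
params = [rng.integers(-64, 64, (32, 512), dtype=np.int8), np.zeros(32, dtype=np.int32),
          rng.integers(-64, 64, (32, 32), dtype=np.int8),  np.zeros(32, dtype=np.int32),
          rng.integers(-64, 64, (1, 32), dtype=np.int8),   np.zeros(1, dtype=np.int32)]
print(evaluate(rng.integers(0, 128, 512, dtype=np.int8), params))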

Since these subsequent layers are applied to dense vectors, they can’t use the same ‘efficiently updatable’ approach as the input layer; that’s why they’re necessarily substantially smaller. They can, however, use hardware vectorisation instructions (SSE/AVX) to apply the linear transformation and activation function.

This scalar output is then further postprocessed using other Stockfish heuristics, including taking into account the 50-move rule which isn’t otherwise incorporated into the evaluation function.

Observe that the neural network doesn’t actually have complete information, such as whether a pawn has just moved two squares (relevant to en passant), whether a king is able to castle, and various other information pertaining to the rather complicated rules for determining a draw. This is okay: the network is only being used as a cheap approximate evaluation for a position; when deciding what move to make, Stockfish performs a very deep tree search and only uses this approximate evaluation in the leaves.

Equivalently, you can think of this as being a massive refinement of the approach of ‘look a few moves ahead and see whether either player gains a material or positional advantage’, using a neural network as a much more sophisticated position-aware alternative to crudely ‘counting material’.

This is a neural network, so the weight matrices and bias vectors of the layers were learned by training the network on millions of positions and using backpropagation and a gradient-based optimiser. Of course, for supervised learning, you need a ‘ground truth’ for the network to attempt to learn, which seems somewhat circular: how do you even gather training data? The answer is that you use the classical version of Stockfish to evaluate positions using the deep tree search, and use that as your training data.

In theory, you could then train another copy of the NNUE using the NNUE-enhanced Stockfish as the evaluation function, and then iterate this process. Leela Chess does the same thing: its current network is trained on positions evaluated by using deep lookahead with the previous network as the leaf evaluation function. Note that the construction of the training data is orders of magnitude more expensive than training the network with this data, as you’re doing thousands of evaluations of the previous network (owing to the deep tree search) to construct each item of training data for the new network.

Further reading

The network is described and illustrated on the Chess Programming Wiki, which also has tonnes of links to forum discussions and other references. The first description of an NNUE was a Japanese paper by Yu Nasu (who suggested it for the board game Shogi instead of chess); the paper has since been translated into English and German. There’s also the Stockfish source code, which is very well organised (there’s a directory for the NNUE) and clearly written.


Rigid heptagon linkage

In January 2000, Erich Friedman considered the problem of finding a rigid unit-distance graph G containing a regular heptagon as a subgraph. That is to say, the graph is immersed in the plane such that:

  • every edge of G must have unit length;
  • there exists some ε > 0 such that any ε-perturbation of the vertices of G, other than one induced by a rigid motion of the plane, results in at least one of those edges having non-unit length;
  • G contains a subgraph H isomorphic to a 7-cycle, such that the vertices and edges of H are those of a regular heptagon.

Last year, Watanabe Masaki discovered a solution where G has only 59 edges:

Watanabe Masaki’s 59-edge solution (blue lines represent edges)

On 19th December, Ed Pegg asked on Math StackExchange whether the following 42-edge configuration (invariant under an order-7 cyclic group of rotations, same as the heptagon) is rigid:

Ed Pegg’s 42-edge candidate graph

Jeremy Tan and William R. Somsky independently and concurrently proved that Pegg’s graph is indeed rigid, and that some of the vertices are redundant: it can be reduced to a 35-edge subgraph with the same rigidity property, significantly improving on Masaki’s upper bound of 59:

Jeremy’s 35-edge subgraph

Tan won the race by a narrow margin, posting a proof whilst Somsky was still writing up his solution. The proof also mentions that this graph is minimal (no proper subgraph is a solution), but it is an open problem whether the graph is minimum (has the smallest number of edges amongst all solutions).

This raises the question: can you find a solution with 34 or fewer edges?

In the other direction, what lower bounds can you find? A minimal example must be a Laman graph and therefore have 2n − 3 edges where n is the number of vertices. Each additional vertex can only be adjacent to at most 2 of the original vertices (by unit-distance-ness), so we have:

2n − 3 = (number of edges) ≤ ((n − 7) choose 2) + 2(n − 7) + 7

which gives a lower bound of 4 new vertices (11 vertices total) and 19 edges. The gap between my rather weak lower bound of 19 and Jeremy Tan’s upper bound of 35 is vast; can we narrow it in either direction?
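That counting argument is easy to check mechanically; here is a small sketch finding the smallest n for which the inequality above can hold:

from math import comb

# 2n - 3 edges are needed, and at most C(n-7, 2) + 2(n-7) + 7 are available
for n in range(7, 20):
    if 2 * n - 3 <= comb(n - 7, 2) + 2 * (n - 7) + 7:
        print(n, 2 * n - 3)    # prints 11 19: at least 11 vertices and 19 edges
        break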

Other news

Siobhan Roberts has published an article in the New York Times for the 50th anniversary of Martin Gardner’s initial publication of Conway’s Game of Life. The article mentions several recent discoveries, including the period-21 spaceship discovered by John Winston Garth using the search program ikpx2 described in a previous post.

Around the same time the article was published (and therefore too late to be mentioned therein), Dylan Chen discovered a 3c/7 spaceship he named ‘Anura’. It is the second elementary 3c/7 spaceship to be discovered (after Tim Coe’s spaghetti monster) and by far the smallest (313 cells, as opposed to 702 for the spaghetti monster).

Dylan Chen’s new 3c/7 spaceship, Anura.


Barreto-Naehrig curves and cryptographic pairings

There’s a very elegant cryptographic construction discovered by Barreto and Naehrig in a 2005 paper. It is beautiful from a pure mathematical perspective, but also has an impressive application: it was* part of the ingenious mechanism by which Zcash supports publicly-verifiable private transactions.

‘Publicly-verifiable private transactions’ sound paradoxical: they’re transactions where the inputs and outputs are cryptographically obfuscated (so no-one can see how much money is changing hands), but where it’s still possible for a member of the public to verify that all inputs and outputs are nonnegative and that the sum of the inputs equals the sum of the outputs (so there’s no ‘cheating’ happening).

If you’re not amazed by this, then I haven’t explained it properly: all account balances are encrypted, and the amounts of money changing hands in a transaction remain completely encrypted, but the transactions include a certificate which proves that the inputs and outputs (which are encrypted) satisfy nonnegativity and conservation of money. No information about the inputs and outputs is revealed, other than the fact that the transaction doesn’t ‘cheat’. This is an example of a ‘zero-knowledge proof’, and I still find it completely and utterly surprising that these things are even possible.

The rest of this article will attempt to explain one of the key mathematical ingredients (a construction of cryptographic pairings) and then briefly outline how it fits into these zero-knowledge proofs used in Zcash.

*Zcash changed their curve in 2017 from a Barreto-Naehrig curve to a BLS curve. BLS curves end up being marginally more efficient (higher level of security for a given size of elliptic curve), which is why the programmers made the change. The principle is still the same, namely constructing an elliptic curve which supports ‘pairings’, but there are various quantitative differences between Barreto-Naehrig and BLS curves. The reason for concentrating on Barreto-Naehrig curves is that they’re somewhat simpler and more aesthetically pleasing.

What are elliptic curves?

[If you know about the elliptic curve group law and the Cayley-Bacharach theorem from algebraic geometry, feel free to skip this section.]

The points on an elliptic curve (cubic curve with no singularities) can be made into an Abelian group. In particular, we define an operation by the following:

  • take an elliptic curve Γ;
  • choose one of its points of inflection and take that to be the ‘identity point’, 0;
  • assert that P + Q + R = 0 whenever there exists a line which intersects the curve at P, Q, and R. (Note that if there’s a repeated point, e.g. P = Q, then the line must have a ‘double intersection’ (point of tangency) at that point. Similarly, P + P + P = 0 if and only if P is a point of inflection; that’s why the identity point 0 must be one of the points of inflection.)

Traditionally, people often study elliptic curves in Weierstrass normal form, where the equation of the curve is y^2 = x^3 + ax + b. The identity point 0 is then typically chosen to be the point ‘at infinity’ where the curve intersects the line at infinity on the projective plane.

To show that this works as a definition, we firstly need to see that it’s well defined. In particular, given two points P and Q, how do we know that the third intersection R with the line ℓ through P and Q is uniquely determined?

In particular, ℓ is a linear equation, so we can use it to express x in terms of y or vice-versa. Substituting into the elliptic curve equation Γ gives a cubic equation in one variable. We know that this cubic equation has two roots (those corresponding to P and Q), so we can divide the cubic by those linear factors to determine the third root (corresponding to R).

Note that we didn’t assume that the ambient field was algebraically closed. This is important, because cryptographers use elliptic curves over finite fields, and a finite field cannot be algebraically closed.

This gives the following procedure for ‘adding’ two points:

  • draw a line through P and Q and let it intersect the curve again at R;
  • draw a line through R and 0 and let it intersect the curve again at S.

Then S is the sum of P and Q. This operation is commutative (interchanging P and Q does not affect the result), but how do we know that it’s associative? In particular, given the following diagram from a talk I gave in Winchester a couple of years ago, how do we know that the elliptic curve (blue), orange line, and green line all mutually intersect at the bottom-left?

It’s painful to verify this algebraically, but it follows immediately from the Cayley-Bacharach theorem. We’ve previously discussed this theorem, along with several miscellaneous applications to Euclidean geometry.
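If you want to experiment with the group law directly, here is a minimal sketch of the chord-and-tangent construction over a small prime field. The identity (point at infinity) is represented as None, and the tiny parameters are purely illustrative:

p, a, b = 97, 2, 3                                    # toy curve y^2 = x^3 + 2x + 3 over F_97

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = 0
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p)    # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p)           # chord slope
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)             # reflect the third intersection

P = (0, 10)                                           # on the curve, since 10^2 = 3 (mod 97)
print(ec_add(P, P), ec_add(P, ec_add(P, P)))          # 2P and 3P

With the identity at infinity in Weierstrass form, the ‘line through R and 0’ step amounts to reflecting the third intersection point in the x-axis, which is what the final line of ec_add does.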

Elliptic curve cryptography and the discrete logarithm problem

There are two reasons why elliptic curve cryptography requires the use of a finite field instead of the real or complex numbers. One reason is practicality: there are uncountably many reals, most of which cannot be described, and therefore they cannot feasibly be represented on a computer.

The other reason is security: the Weierstrass elliptic function allows you to construct an isomorphism between the elliptic curve and a torus (specifically, the complex plane quotiented out by an appropriate lattice), and the ‘discrete logarithm problem’** on a torus is trivial; you can solve it efficiently using continued fractions.

**given two points, A and B, determine an integer m such that mA = B, where mA := A + A + … + A just means ‘the point A added to itself m times’.

On the other hand, for elliptic curves over a finite field, there is no known way to efficiently solve the elliptic curve discrete logarithm problem on a classical computer; this is how HTTPS and Bitcoin digital signatures remain secure. On a quantum computer you can use Shor’s algorithm, but you’d need a fault-tolerant 2330-qubit quantum computer with 128 billion Toffoli gates to break the 256-bit curve ‘secp256k1’ used for Bitcoin digital signatures, and that seems to be beyond the current technological capabilities of large commercial*** organisations.

***I doubt governments have this technology either. After all, they use Excel spreadsheets as a database for tracking the spread of a pandemic.

Anyway, note the following asymmetry:

  • It is very easy, given a large number m and a point G, to compute the point mG: using a ‘double-and-add’ procedure, you can do it in log2(m) ‘doubles’ and H(m) ‘additions’, where H(m) is the number of ‘1’-bits in the binary expansion of m. This procedure was how Ancient Egyptians multiplied ordinary integers; a sketch of it appears after this list.
  • On the other hand, it’s infeasible to go the other way. Given points G and mG, there is no known classical algorithm to reverse-engineer this to extract the original number m, without performing an amount of work proportional to the square-root of the number of points on the elliptic curve. Bitcoin’s elliptic curve has roughly 2^256 points, so it would take about 2^128 operations to steal a private-key using the Pollard rho algorithm.
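Here is the double-and-add procedure in generic form: it works for any group written additively, so with ordinary integer addition it is exactly the Egyptian multiplication mentioned above, and with an elliptic-curve addition function (such as the ec_add sketch earlier, with identity None) it computes mG:

def scalar_mul(m, G, add, identity):
    result, addend = identity, G
    while m:
        if m & 1:                              # one 'addition' per '1'-bit of m
            result = add(result, addend)
        addend = add(addend, addend)           # one 'double' per bit of m
        m >>= 1
    return result

print(scalar_mul(13, 7, lambda x, y: x + y, 0))    # Egyptian multiplication: 13 * 7 = 91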

G is just a global ‘base point’ on the elliptic curve Γ which (together with the curve itself) is a public parameter of the cryptosystem. The order n of G (smallest integer such that nG = 0) must be prime. Then we have an isomorphism from [the additive group of] \mathbb{F}_n to the elliptic curve:

\mathbb{F}_n \rightarrow \Gamma

m \mapsto mG

which is cryptographically irreversible.

[Note: the prime order n of the base point G is not the same as the prime order p of the field in which the elliptic curve itself lives. If G is a generator of the curve, then p and n will be relatively close but not necessarily equal. In the case where they are equal, the elliptic curve is called anomalous, and it has inherent weaknesses.]

Elliptic curve digital signatures

The existence of this one-way function enables various important cryptographic primitives to be built on top of elliptic curves, such as the aforementioned elliptic curve digital signature algorithm (ECDSA) used by HTTPS and Bitcoin. In ECDSA, Alice’s private key is some integer m \in \mathbb{F}_n, and her public key is the corresponding point mG on the elliptic curve Γ.

To sign a message M, Alice firstly computes a cryptographic hash z of M. In general, a cryptographic hash is a fixed-length bitstring (for SHA-256, it consists of 256 bits). In this case, we interpret z as an element of \mathbb{F}_n by interpreting it as a binary number and reducing modulo n.

Then, Alice computes a single-use cryptographically secure random number k, also in the field \mathbb{F}_n, and reveals the following:

  • the abscissa (x-coordinate) r of the curve point kG;
  • the value s = k^{-1}(z + rm) \mod n.

Neither r nor s is allowed to be zero; if this happened (incredibly unlikely!), Alice should generate a new value k and try again. These data (r, s) together form the digital signature for the message M. Bob can verify that Alice created the signature by computing the cryptographic hash z and checking that the curve point (r/s)(mG) + (z/s)G has abscissa r. This only requires Bob to know the public-key mG, not the private-key m.

ECDSA key-pairs can be reused to sign many messages, but you must generate a different random number k each time you sign a message. Otherwise, it’s possible for someone to determine your private-key. Indeed, several bitcoins were stolen by attackers as a result of a poor random number generator on early versions of the Android operating system.
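To make the recipe concrete, here is a toy, self-contained sketch of ECDSA over secp256k1 (bundling the point arithmetic from the earlier sketches). It is not constant-time, it omits encoding details and other practical safeguards, and it exists purely to illustrate the formulas above, including the fresh single-use k; use a vetted library for anything real.

import hashlib, secrets

p = 2**256 - 2**32 - 977                      # field prime of secp256k1 (a = 0, b = 7)
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):                                # chord-and-tangent group law (a = 0)
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return None
    m = (3 * x1 * x1) * pow(2 * y1, -1, p) if P == Q else (y2 - y1) * pow(x2 - x1, -1, p)
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def mul(k, P):                                # double-and-add
    R = None
    while k:
        if k & 1: R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def hash_to_int(message):
    return int.from_bytes(hashlib.sha256(message).digest(), 'big') % n

def sign(priv, message):
    z = hash_to_int(message)
    while True:
        k = secrets.randbelow(n - 1) + 1      # fresh single-use random k
        r = mul(k, G)[0] % n
        s = pow(k, -1, n) * (z + r * priv) % n
        if r and s:                           # retry in the (incredibly unlikely) zero cases
            return (r, s)

def verify(pub, message, sig):
    r, s = sig
    z = hash_to_int(message)
    w = pow(s, -1, n)
    point = add(mul(z * w % n, G), mul(r * w % n, pub))    # (z/s)G + (r/s)(mG)
    return point is not None and point[0] % n == r

priv = secrets.randbelow(n - 1) + 1           # private key m
pub = mul(priv, G)                            # public key mG
print(verify(pub, b'hello', sign(priv, b'hello')))         # True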

Cryptographic pairings

A cryptographic pairing on an elliptic curve Γ is a bilinear map from Γ × Γ’ to [the multiplicative group of] some field F, where Γ’ is another elliptic curve isomorphic to Γ and related by a ‘twist’ (explained here). That is to say that:

\langle aP, bQ \rangle = \langle P, Q \rangle^{ab}

where P, Q are curve points on Γ, Γ’ (respectively) and a, b are integers. Note that the existence of a cryptographic pairing means that the elliptic curve discrete logarithm (hard) on Γ can be transported to the ordinary discrete logarithm (not quite as hard for a given size) on the field F. As such, the field F needs to be substantially larger than the curve Γ, lest it be the Achilles heel of the cryptosystem.

The field F is a finite field \mathbb{F}_{p^k}, whose characteristic p matches that of the ambient field \mathbb{F}_p in which the elliptic curve Γ lives. The minimal degree k for which a pairing exists is called the embedding degree of the elliptic curve.

For Bitcoin’s curve, the embedding degree is humongous (comparable to the number of points on Γ), which makes the pairing impossible to use. On the other hand, if k were very small (e.g. 1 or 2), the discrete logarithm in F would be much weaker than the discrete logarithm in Γ, so you’d need a massive elliptic curve to attain a desired level of security, and that would come at a computational cost.

The Barreto-Naehrig curves are a family of elliptic curves with a good embedding degree k = 12, so you can (for example) have a 256-bit elliptic curve with a 3072-bit embedding field F. This is what Zcash previously used, but it transpires that 3072-bit discrete logarithm is potentially slightly weaker than the desired security level. This means you’d want to use a slightly larger elliptic curve (384 or 512 bits), with a corresponding 4608- or 6144-bit embedding field F, respectively.

Details of the Barreto-Naehrig construction

The size of a Barreto-Naehrig curve is parametrised by an integer x. The values p and n are quartic polynomials in the value x:

  • p = 36 x^4 + 36 x^3 + 24 x^2 + 6 x + 1
  • n = 36 x^4 + 36 x^3 + 18 x^2 + 6 x + 1

Observe that the difference between them is only 6 x^2, which is slightly lower than the square-root of p (or n), consistent with Hasse’s bound. The validity of the construction relies on the fact that n is a factor of the 12th cyclotomic polynomial evaluated at p − n = 6 x^2:

In[5]:= InputForm@Factor@Cyclotomic[12, 6 x^2]

Out[5]//InputForm= 
(1 - 6*x + 18*x^2 - 36*x^3 + 36*x^4)*(1 + 6*x + 18*x^2 + 36*x^3 + 36*x^4)

We need to choose a value of x such that these two numbers are prime; the first few such positive values of x are:

1, 5, 6, 7, 20, 78, 82, 123, 166, 169, 173, 202, 257, 295, 308, 321, 420, 438, 448, 460, 487, 543, 596, 650, 720, 798, 810, 811, 833, 845, 869, 872, 921, 981, …

which apparently isn’t yet in the OEIS. (I’ll add it.)
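The search is easy to reproduce; here is a sketch using sympy for the primality tests:

from sympy import isprime

def bn_params(x):
    p = 36*x**4 + 36*x**3 + 24*x**2 + 6*x + 1
    n = 36*x**4 + 36*x**3 + 18*x**2 + 6*x + 1
    return p, n

good = [x for x in range(1, 1000) if all(isprime(v) for v in bn_params(x))]
print(good[:10])    # [1, 5, 6, 7, 20, 78, 82, 123, 166, 169]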

Of course, those values of x are far too small for cryptography. If you want a 256-bit elliptic curve, then you’ll want to choose x to be slightly lower than 2^64. By the prime number theorem, a randomly chosen x of this size will give prime values of p and n with probability roughly proportional to 1/log(x)².

After you’ve chosen a suitable x which passes both primality checks for p and n, you need to build the curve itself. Rather like Bitcoin’s elliptic curve secp256k1, the coefficient ‘a‘ in the equation y^2 = x^3 + a x + b is zero. [Note: the parameter x is not the same as the coordinate x; they just happen to have the same name.]

To determine b, you just keep trying successive values until you find one that works, as described in their algorithm:

Once you have the curve, how do you compute the pairing? There’s an algorithm by Miller (1986) for efficiently computing Tate/Weil pairings on arbitrary elliptic curves, and a paper by Devegili, Scott, and Dahab (2007) describes an optimised implementation of Tate and Ate pairings specifically for Barreto-Naehrig curves.

Interestingly, the paper makes the following comment:

Furthermore, for efficiency reasons in the pairing computation it is desirable to generate curves of prime order n such that n has a low Hamming weight. Constructing such curves for k = 12 or φ(k) > 4 is still a research problem.

The best choice of parameter I could find using the Barreto-Naehrig construction was x = 3(2^75 + 1), which results in n having 312 bits of which only 36 are nonzero.

Why are pairings useful?

They’re useful because they allow more computations on encrypted data.

Simply put, in the same way that an elliptic curve supports addition of numbers that have been encrypted as points, a pairing supports multiplication of encrypted numbers. It’s somewhat restricted, because the ‘product’ of two points belongs to F instead of Γ (i.e. it has a different type from those of the two multiplicands), so you can’t directly compute an encrypted product of three or more encrypted numbers. This is why pairings fall short of fully homomorphic encryption.

Despite this constraint, it’s still possible to take your desired computation (expressed as an arithmetic circuit) and compile it into a system of constraints that can be verified using pairings.

There’s an excellent explanation of zk-SNARKs here, which pedagogically illustrates this property of a pairing using the following diagram:

Illustration by Maksym Petkus from his article, Why and How zk-SNARK Works: Definitive Explanation

Petkus’s explanation abstracts away the particular choices of cryptographic primitives (the words ‘elliptic curve’ being mentioned only once in the whole article), but it’s useful additional context to know that the ‘Source set’ above is the elliptic curve Γ and the ‘Output set’ is the much larger embedding field F.

In addition to Petkus’s explanation, I’d strongly recommend also reading this series of blog posts by Ariel Gabizon.


Shallow trees with heavy leaves

There are two very different state-of-the-art chess engines: Stockfish and Leela Chess Zero.

  • Stockfish searches many more positions (100 000 000 per second) and evaluates them using computationally cheap heuristics. The tree search methodology is a refinement of alpha-beta pruning.
  • Leela Chess Zero, a descendant of DeepMind’s AlphaZero, searches far fewer positions (40 000 per second), using a deep convolutional neural network (trained through millions of games of self-play) to evaluate positions. It has a different search methodology, namely Monte Carlo tree search.

The neural network employed by Leela Chess Zero is itself fascinating, using an architecture similar to the SE-ResNet image classification model. This residual tower of squeeze-and-excitation layers feeds into separate ‘value’ and ‘policy’ heads, for evaluating the position and deciding what to do next, respectively.

However, I want to talk more about the general strategy of searching far fewer positions and expending more effort on each position. In particular, this major difference between Stockfish and Leela Chess Zero is reflected in two of the many search programs used to find spaceships in Conway’s Game of Life and related cellular automata:

  • The program ntzfind, originally written by a pseudonymous author ‘zdr’ and enhanced by Matthias Merzenich, Aidan Pierce, and Tom Rokicki, is a depth-first tree search which uses a huge in-memory lookup table to find all possible choices for the next row based on previous rows.
  • The new program ikpx2, adapted from the program I wrote to find the first elementary knightship, is more analogous to Leela Chess Zero in that its search tree is much smaller, but the amount of work done at each node is much greater.

In particular, ikpx2 uses a SAT solver to perform a deep lookahead to determine whether the current partial spaceship can be fully extended for several* more rows, and is therefore not close to a ‘dead end’. By comparison, ntzfind can only rule out a partial by performing a depth-first traversal of the entire subtree.

*this is configurable, but a typical value is 30.

The SAT solvers used are Armin Biere’s kissat and CaDiCaL. Kissat is more optimised than CaDiCaL, but doesn’t support incremental solving yet. As such, ikpx2 tries to learn (using basic reinforcement learning, specifically a multi-armed bandit model) which SAT solver is more appropriate for a given task. Theoretically, we could add additional state-of-the-art SAT solvers such as Mate Soos’s cryptominisat, and use a more powerful reinforcement learning model (such as a neural network) to evaluate the subproblem and decide which solver to use.
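Purely as an illustration of the bandit idea (this is not ikpx2’s actual scheme, and the solver callables are assumed to be supplied by the caller), an epsilon-greedy policy over solver backends might look like this:

import random

class SolverBandit:
    # epsilon-greedy choice between solver backends, rewarding fast solves;
    # `solvers` maps a name to a callable returning (elapsed_seconds, result)
    def __init__(self, solvers, epsilon=0.1):
        self.solvers = solvers
        self.epsilon = epsilon
        self.stats = {name: [0, 0.0] for name in solvers}    # [count, total reward]

    def choose(self):
        if random.random() < self.epsilon or any(c == 0 for c, _ in self.stats.values()):
            return random.choice(list(self.solvers))         # explore
        return max(self.stats, key=lambda s: self.stats[s][1] / self.stats[s][0])

    def solve(self, problem):
        name = self.choose()
        elapsed, result = self.solvers[name](problem)
        count, total = self.stats[name]
        self.stats[name] = [count + 1, total + 1.0 / (1.0 + elapsed)]   # faster = better
        return result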

Having this ‘shallow tree with heavy leaves’ means that the current search progress can easily fit entirely within memory. Also, parallelism is easy: we keep a priority queue of tasks (partial spaceships) and have many CPU threads which do the following:

  1. remove a task from the queue;
  2. solve it using SAT solvers;
  3. communicate the results back to the main orchestrating thread.

The results that are sent back to the main orchestrating thread are partials that have been extended (for 30 or so rows). A small initial segment of those rows are added to the in-memory tree as nodes; the remaining rows are not, as otherwise we’d end up with the same full unpruned search tree used by ntzfind and therefore our advantage is completely annihilated.

What happens to the remaining rows? Instead of merely discarding them, we take the entire extended partial and simulate it in the cellular automaton! Sometimes these partials evolve into an object that the main search couldn’t have found: for example, the tail might have a higher period than the rest of the spaceship, or leave a trail of debris behind. This idea seems to have first been explored by Paul Tooke in 2001, before being rediscovered 20 years later in the context of ikpx2.

Results

One of the early new ikpx2 search results using Paul Tooke’s idea was this partial, which evolves into a period-528 puffer engine:

Similarly, here is a novel period-21 spaceship found by John Winston Garth. He ran ikpx2 to look for 2c/7 spaceships; within 3 days of running on 12 CPU threads, it had both rediscovered David Eppstein’s weekender and found a wholly new period-21 attachment that clings to its tail:

The user ‘iNoMed’ then discovered that two of these could interact to produce an exhaust that can be perturbed by a nearby weekender to emit a forward-moving glider every 42 generations:

Another feature of ikpx2 is its ability to find objects which have different components of different symmetries. Here, for example, is a spaceship with a symmetric head, asymmetric thorax, and symmetric abdomen. The tail is high-period, again having arisen from simulating a partial rather than from direct SAT solving:

Somewhat disappointingly, ikpx2 has not succeeded in finding any new spaceship velocities in the standard Life rules. I tested it on the same input as the original ikpx used to find Sir Robin; ikpx2 was able to find the same result approximately 50x faster (in terms of core-hours elapsed).

There was a recent near-miss, which would be a (2,1)c/7 spaceship were it not for a single extra cell born in generation 7 (the left configuration, with or without the indicated green triplet, gives rise to the right configuration with the frustrating red tetraplet):

The existence of near-misses such as this one makes me hopeful that it will eventually find a (2,1)c/7 spaceship given more time.

Other cellular automata

Unlike the original version of ikpx, this version is able to search a wide family of related cellular automata. In particular, it extracts the set of prime implicants from the cellular automaton rule (regarded as a 9-input 1-output boolean function) and uses that to encode the mechanics of the rule into the SAT problems.

In particular, two invocations of ikpx2 were sufficient to find the following growing knightship in the rule Day&Night: one invocation to find the fast (2,1)c/5 frontend which stretches a thick oblique line, and another invocation to find a slower (2,1)c/6 tail which consumes it:

Here’s an 11c/22 spaceship in a nonstandard rule found by lemon41625, which is an even better demonstration of the symmetry switching. The discovery consists of odd-symmetric, asymmetric, and even-symmetric components with a high-period exhaust:

Source code

The source code for ikpx2 is open-source (released under an MIT licence) so you can experiment with it yourself. It’s currently x86_64-specific, because it has apgsearch as a dependency (in order to quickly simulate the partials and examine whether they evolve into anything remotely interesting).


Let the circumcentre be your origin

Suppose we have two vectors, u and v, in a Euclidean vector space. If we wanted to somehow quantify the proximity of these two vectors, there are two particularly appealing choices:

  • the squared distance, |u − v|²;
  • the inner product, u.v;

Indeed, the only isotropic (invariant under rotations/reflections of the ambient space) functions of u, v that can be expressed as polynomials of degree ≤ 2 in the entries of the vectors are precisely the linear combinations of |u − v|², u.v, and 1. Conversely, if we know both the squared distance and the inner product, we can completely recover the pair of vectors up to rotations/reflections of the ambient space.

Both (squared) distances and inner products are very useful in practice, and it seems unsatisfying to have to choose between them. Fortunately, there’s a common situation in which you don’t need to do so: that’s when all of your points lie on an origin-centred sphere! In particular, if u and v both lie on an origin-centred sphere of radius R, we have:

|u − v|² = 2(R² − u.v)

and conversely:

u.v = R² − ½|u − v|²

so we can compute either of these quantities given the other one.

There are many applications in machine learning in which you want to compute the matrix of distances between a set of points in some latent space. If you’ve constrained the latent embedding to force everything onto the unit sphere, then this can be done very efficiently: you just compute the pairwise dot-products by a single multiplication of a matrix by its transpose, and then apply a simple elementwise transformation to convert these inner products into distances.
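Here is a small NumPy sketch of that trick, for points normalised onto the unit sphere (so R = 1):

import numpy as np

X = np.random.default_rng(1).normal(size=(5, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # rows now lie on the unit sphere

gram = X @ X.T                                   # all pairwise inner products u.v
sq_dists = 2.0 * (1.0 - gram)                    # |u - v|^2 = 2(R^2 - u.v) with R = 1

direct = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
print(np.allclose(sq_dists, direct))             # True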

Often we don’t have the liberty to impose constraints on where our points lie, so having them be on an origin-centred sphere cannot be guaranteed. There is, however, one important exception:

Simplices

A non-degenerate simplex (i.e. a triangle, tetrahedron, or higher-dimensional analogue thereof) has a unique circumcentre, a point equidistant from all of the vertices. If you’re trying to reason about the geometry of a simplex, then you can firstly translate it so that this circumcentre coincides with the origin.

A helpful heuristic in solving Euclidean geometry problems concerned with a triangle is to ‘always draw the circumcircle’, and the approach of setting the circumcentre to be the origin is a natural extension of this. In Mathematical Olympiad Dark Arts (which I’m in the process of revising ready for publication as both books and online courses), this is the starting point for an algebraically convenient way to parameterise a triangle by complex numbers where the vertices are u², v², and w²:

By judiciously choosing the signs of u,v,w to ensure the angle bisectors meet the circle again at −vw, −uw, and −uv (this can be guaranteed), many of the most important triangle centres have positions given by homogeneous quadratic polynomials (or, failing that, rational functions) in u, v, w:

Similarly, important scalars associated with the triangle (such as the circumradius, inradius, semiperimeter, side-lengths, and so forth) are expressible as homogeneous polynomials in the parameters and their complex conjugates:

There’s actually a ‘strong type system’ lurking in here: we say that the parameters u, v, w have type (0, 1) and their conjugates have type (1, 0). Ordinary ‘dimensionless’ complex numbers (such as pi, i, and 2) have type (0, 0). Then we have the rules that if you multiply quantities, their types add elementwise, and you are only allowed to add/subtract quantities of the same type, and apply transcendental functions (such as exp) to ‘dimensionless’ quantities. In this type system, the following hold:

  • all points in the plane of the triangle have type (0, 2);
  • all lengths have type (1, 1);
  • all areas have type (2, 2);

and this type system helps catch any symbolic errors you might make when manually manipulating these expressions.

Cayley-Menger determinants

These are formulae for the volumes of simplices using only their squared side-lengths. We’ve looked at them before, where we used elementary row/column operations to manipulate determinants in order to prove their correctness. But with the trick of letting the circumcentre be the origin, we can much more succinctly prove the Cayley-Menger formula and a variant thereof.

In particular, here is the example for a tetrahedron (n = 3); it should be straightforward to see how it generalises:

Firstly, convince yourself that the matrix equation (first row) is true. It relies on what we’ve discussed about the relationship between dot-products and squared distances.

Observe the middle matrix, which is diagonal and has a signature of (1, n+1). You can think of this product as computing the (doubled) pairwise Lorentzian inner products of the rows of the leftmost matrix. The ‘time’ coordinate (first entry in each row) of the leftmost matrix is visibly equal to the norm of the ‘space’ coordinates (remaining entries), which is why each row has Lorentzian norm of zero (and therefore the diagonal of the product of the matrices is 0).

The two scalar equations below the matrix equation are, respectively:

  • the determinants of the upper-left submatrices of dimension n+1 (i.e. the matrices after the bottom row and rightmost column are removed);
  • the determinants of the full matrices of dimension n+2;

and the equations hold because determinants are multiplicative.

In the case of triangles, the first scalar equation simplifies to the theorem that the area of a triangle is abc/4R, where a,b,c are the side-lengths and R is the circumradius. The second scalar equation simplifies to Heron’s formula for the area of a triangle.
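Both simplifications are easy to sanity-check numerically for a random triangle whose circumcentre is at the origin:

import numpy as np

rng = np.random.default_rng(0)
R = 2.0
t = rng.uniform(0, 2 * np.pi, 3)
P = R * np.column_stack([np.cos(t), np.sin(t)])             # vertices on a circle of radius R

a = np.linalg.norm(P[1] - P[2])
b = np.linalg.norm(P[2] - P[0])
c = np.linalg.norm(P[0] - P[1])
u, v = P[1] - P[0], P[2] - P[0]
area = 0.5 * abs(u[0] * v[1] - u[1] * v[0])                 # shoelace formula

s = (a + b + c) / 2
print(np.isclose(area, a * b * c / (4 * R)))                # area = abc / 4R
print(np.isclose(area, np.sqrt(s * (s - a) * (s - b) * (s - c))))   # Heron's formula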


An attempt to understand the Monster group

The Monster group is very large, very complicated, and very mysterious.

According to the Classification of Finite Simple Groups that was completed last century, the Monster group is the largest of only 26 finite simple groups that do not fit into one of the infinite families of finite simple groups, namely:

  • the cyclic groups of prime order;
  • the alternating groups on 5 or more objects;
  • any of the ‘groups of Lie type‘, which are related to Lie groups but defined over finite fields.

The existence of the Monster was conjectured by Bernd Fischer, and the group was later constructed by Robert Griess. This construction was subsequently simplified by John Conway, but the resulting construction is still very complicated and somewhat piecemeal. Both constructions prove that the group is finite by showing that it’s the automorphism group of the Griess algebra defined on the ambient vector space.

Let’s look at the group A5, the smallest of the non-cyclic finite simple groups, by way of analogy. It’s the order-60 group of rotational symmetries of a regular dodecahedron, and this is the lowest-dimensional faithful representation of the group:

If we choose a coordinate basis for the dodecahedron such that the eight brightly-coloured vertices are (±1, ±1, ±1) and the remaining twelve vertices are the cyclic permutations of (±φ, ±1/φ, 0), then there’s a natural order-12 subgroup generated by cyclic permutations of the vertices together with an even number of sign flips.

This monomial subgroup is also a maximal subgroup, and happens to be the group A4 of rotations fixing a regular tetrahedron, such as the convex hull of the four blue vertices above. We can then describe A5 as being generated from this monomial subgroup together with an ‘exotic element’ ζ that sends the following (a numerical check of ζ appears after this list):

  • (1, -1, -1) → (0, φ, -1/φ);
  • (-1, 1, -1) → (φ, -1/φ, 0);
  • (-1, -1, 1) → (-1/φ, 0, φ);
  • (1, 1, 1) → (-1, -1, -1).
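As a sanity check (not needed for the argument, but reassuring), one can solve for the matrix of ζ from the first three images above and verify numerically that it is a rotation sending (1, 1, 1) to (−1, −1, −1):

import numpy as np

phi = (1 + np.sqrt(5)) / 2
A = np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
B = np.array([[0, phi, -1/phi], [phi, -1/phi, 0], [-1/phi, 0, phi]])

# we want M with M @ A[i] = B[i] for each row i, i.e. A @ M.T = B
M = np.linalg.solve(A, B).T

print(np.allclose(M @ M.T, np.eye(3)))           # orthogonal
print(np.isclose(np.linalg.det(M), 1.0))         # orientation-preserving (a rotation)
print(np.allclose(M @ [1, 1, 1], [-1, -1, -1]))  # the fourth image, as claimed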

We could also define A5 as the group of rotations which fix the following degree-6 polynomial (x − φy)(y − φz)(z − φx)(x + φy)(y + φz)(z + φx), which is isomorphic to Greg Egan’s potential function discussed here. This is mildly (but not precisely*) analogous to the description of the Monster as the automorphisms of the Griess algebra. Note that the polynomial is clearly invariant under the monomial subgroup A4, and with some effort can be shown to be invariant under the full group A5. Here’s a visualisation of the polynomial:

*in particular, the Griess algebra product and ambient inner product induce a symmetric trilinear real-valued function of three vectors, u.(v × w), whereas the dodecahedral potential is a non-linear real-valued function of a single vector.

Conway’s construction of the Monster group likewise begins with a maximal monomial subgroup, N0 = 2^35 (S3 × M24), and generates the Monster by adding an exotic element. But the construction is much more complicated, because:

  • the smallest dimension of a faithful representation of the Monster is 196883, compared with just 3 for the group A5;
  • the ambient 196883-dimensional space is a hodgepodge of multiple spaces, constructed in terms of various exceptional objects such as the Leech lattice, Golay code, and Parker loop.

Perhaps we could instead describe the Monster as the group of rotations fixing a set of vertices, in the same way that A5 can be described as the group of rotations fixing the 20 vertices of a dodecahedron? Again, this is possible: there’s a permutation representation on 97239461142009186000 vertices, namely the axes fixed by the centralisers of a certain important conjugacy class of elements in the Monster group (known as ‘transpositions’, ‘type-2A elements’, ‘short involutions’, or ‘Fischer involutions’).

The slight problem is that there are too many such vertices to write down explicitly. But maybe we can utilise the monomial subgroup, in the same way we did for A5: instead of listing all 20 vertices of the dodecahedron, it sufficed to list two of them, namely (1, 1, 1) and (0, φ, -1/φ), since the others are the images of one of these vertices under the action of the monomial subgroup.

Describing lattice points using the monomial subgroup

This same strategy (of describing a monomial subgroup together with a representative of each orbit) has already shown success in terms of studying one of the less complicated exceptional objects, the Leech lattice, where coordinates for the 196560 minimal nonzero vectors can be described efficiently as the set of images of:

  • (-3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);
  • (4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
  • (2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

where there are 98304 images of the first vector, 1104 images of the second vector, and 97152 images of the third vector. The monomial subgroup is the full automorphism group 2^12 (M24) of the binary Golay code (viewed as a subset of the vertices of a 24-dimensional cube, {-1, 1}^24) generated by coordinate permutations in M24 together with patterns of sign changes which coincide with elements of the Golay code. The Conway group Co0 is then generated by this monomial subgroup together with an exotic element, as before.

For the 20 vertices of the dodecahedron, we ended up with 2 orbits of points. For the 196560 minimal vectors of the Leech lattice, we have 3 orbits of points. We can ask the concrete question:

How many orbits are there, under the monomial subgroup N0 of the Monster group, of the 97239461142009186000 type-2A axes?

along with the natural follow-up questions:

What are the sizes of the orbits? And can we concisely describe coordinates of representatives of each orbit?

This set of vertices (whose automorphism group is the Monster) might give us more insight into the group, as well as providing a more convenient means of calculating with the group. There’s a Python package by Martin Seysen (from this year!) that could prove useful in trying to answer these questions.

We can also ask whether there’s a nice lattice associated with the Monster group, in the same way that the Leech lattice is associated with the Conway group Co0. There’s an allusion in Conway’s paper to such a lattice being investigated by Simon Norton, but this seemed to be a dead-end: I couldn’t find anything further on the topic, despite asking on MathOverflow.

Fortunately, Richard Borcherds (who won a Fields Medal for proving the monstrous moonshine conjecture) gave a talk on sporadic groups for the Archimedeans, and I was able to ask him about Norton’s lattice.

He responded by mentioning that he recalled that Norton’s lattice didn’t turn out to be unimodular, but that Scott Carnahan had recently constructed a unimodular lattice with Monster group symmetries. Carnahan obtains this lattice (in corollary 3.24) as the weight-2 subspace of an integral form he constructs on the monster vertex operator algebra, an infinite-dimensional graded algebra of which the Griess algebra is the weight-2 subspace.

It would be instructive to translate Carnahan’s lattice into the computationally convenient coordinate system used in Conway’s construction. This would hopefully allow one to study the geometry of the lattice by describing the shells of the lattice as unions of orbits of vectors under the monomial subgroup.


The exceptional Jordan algebra

In the early 1930s, Pascual Jordan attempted to formalise the algebraic properties of Hermitian matrices. In particular:

  • Hermitian matrices form a real vector space: we can add and subtract Hermitian matrices, and multiply them by real scalars. That is to say, if \lambda, \mu \in \mathbb{R} and A, B are Hermitian matrices, then so is the linear combination \lambda A + \mu B.
  • We cannot multiply Hermitian matrices and obtain a Hermitian result, unless the matrices commute. So the matrix product AB is not necessarily Hermitian, but the ‘symmetrised’ product A \circ B = \frac{1}{2}(AB + BA) is Hermitian, and coincides with ordinary multiplication whenever the matrices commute.

Now, this symmetrised product A \circ B is commutative by definition, and is also (bi)linear: (\lambda A + \mu B) \circ C = \lambda (A \circ C) + \mu (B \circ C). What other algebraic properties must this product satisfy? The important ones are:

  • Power-associativity: the expression A^n = A \circ \cdots \circ A does not depend on the parenthesisation.
  • Formal reality: a sum of squares is zero if and only if all of the summands are zero.

The second of these conditions means that we can say that an element of the Jordan algebra is ‘nonnegative’ if it can be expressed as a sum of squares. (In the familiar context of real symmetric matrices, this coincides with the property of the matrix being positive-semidefinite.) The nonnegative elements form a ‘cone’ closed under multiplication by positive real scalars and addition.
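For concreteness, here is a small NumPy sketch of the symmetrised product on complex Hermitian matrices, spot-checking some of these properties on a random example (a single numerical instance, of course, rather than a proof):

import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def jordan(A, B):
    return (A @ B + B @ A) / 2                    # the symmetrised product

A, B = random_hermitian(4), random_hermitian(4)
print(np.allclose(jordan(A, B), jordan(A, B).conj().T))    # result is Hermitian
print(np.allclose(jordan(A, B), jordan(B, A)))             # commutative
# one instance of power-associativity: (A∘A)∘(A∘A) == ((A∘A)∘A)∘A
print(np.allclose(jordan(jordan(A, A), jordan(A, A)),
                  jordan(jordan(jordan(A, A), A), A)))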

Jordan, von Neumann, and Wigner proceeded to classify all of the finite-dimensional algebras of this form (known as formally real Jordan algebras). They showed that every such algebra is a direct sum of ‘simple’ algebras, each of which is isomorphic to [at least] one of the following:

  • the real symmetric matrices of dimension n (for any positive integer n) with the aforementioned symmetrised product;
  • the complex Hermitian matrices of dimension n;
  • the quaternionic Hermitian matrices of dimension n;
  • the octonionic Hermitian matrices of dimension n (where n ≤ 3);
  • the algebras \mathbb{R}^n \oplus \mathbb{R} with the product (x, t) \circ (x', t') = (t'x + tx', \langle x, x' \rangle + tt'), known as ‘spin factors’. As John Baez mentions, these can be identified with Minkowski space, and the nonnegative elements are exactly the ‘future cone’ of the origin.

The qualification ‘at least’ is because there are some isomorphisms here:

Simple formally real Jordan algebras, showing the four infinite families and the exceptional Jordan algebra

Exactly one of these simple formally real Jordan algebras fails to fit into any of the four infinite families. This exceptional Jordan algebra is \mathfrak{h}_3(\mathbb{O}), the 3-by-3 self-adjoint octonionic matrices endowed with the symmetrised product. Viewed as a real vector space, it is 27-dimensional: an arbitrary element can be described uniquely by specifying the three diagonal elements (which must be real) and three lower off-diagonal elements (which can be arbitrary octonions); the three upper off-diagonal elements are then determined.

Projective spaces from Jordan algebras

Given a formally real Jordan algebra, we can consider the idempotent elements satisfying A \circ A = A. For the Jordan algebras built from n-by-n real, complex, or quaternionic matrices, these are the matrices with eigenvalues 0 and 1.

We get a partial order on these ‘projection’ matrices: A ‘contains’ B if and only if A \circ B = B (see the numerical sketch after this list). This partially-ordered set can be identified with the stratified collection of subspaces in the (n−1)-dimensional projective space over the base field:

  • the zero matrix corresponds to the empty space;
  • the rank-1 projection matrices correspond to points;
  • the rank-2 projection matrices correspond to lines;
  • the rank-(n−1) projection matrices correspond to hyperplanes;
  • the identity matrix corresponds to the full projective space itself.
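
In the familiar real symmetric case, this containment order is easy to experiment with. Here is a small numpy sketch (the particular projections chosen are arbitrary):

    import numpy as np

    def jordan(A, B):
        return (A @ B + B @ A) / 2

    def projector(vectors):
        # orthogonal projection onto the span of the given vectors
        Q, _ = np.linalg.qr(np.array(vectors, dtype=float).T)
        return Q @ Q.T

    P = projector([[1, 0, 0]])               # rank 1: a 'point'
    L = projector([[1, 0, 0], [0, 1, 0]])    # rank 2: a 'line' through that point
    M = projector([[0, 1, 0], [0, 0, 1]])    # rank 2: a 'line' avoiding that point

    print(np.allclose(jordan(P, P), P), np.allclose(jordan(L, L), L))   # both idempotent
    print(np.allclose(jordan(L, P), P))      # True:  L 'contains' P
    print(np.allclose(jordan(M, P), P))      # False: M does not contain P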

The exceptional Jordan algebra gives us the octonionic projective plane, discovered by Ruth Moufang. We can’t get any higher-dimensional octonionic projective spaces because Desargues’ theorem is false in the octonionic projective plane, whereas it’s true in any plane that can be embedded in a 3-dimensional projective space. We mentioned this seven years ago.

This hints at why 4-by-4 and higher octonionic matrices have no hope of forming a formally real Jordan algebra: we’d be able to define an octonionic projective 3-space, which is impossible.

What about the spin factors? The idempotents in \mathbb{R}^n \oplus \mathbb{R} are:

  • the zero element (0, 0), corresponding to the ‘empty space’;
  • the identity element (0, 1), corresponding to the ‘full space’;
  • the points (x, ½) where x is an arbitrary vector of length ½.

In other words, these correspond to spheres! Recall that the real, complex, quaternionic, and octonionic projective lines are (as topological manifolds) just the 1-, 2-, 4-, and 8-spheres, respectively; we can think of the spin factors as ‘projective lines’ built from arbitrary-dimensional spheres.
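
Here is a direct numerical check of the idempotency claim, using an arbitrary vector of length ½ in (say) \mathbb{R}^5:

    import numpy as np

    def spin_product(p, q):
        # Jordan product on the spin factor R^n (+) R:
        #   (x, t) o (x', t') = (t' x + t x', <x, x'> + t t')
        (x, t), (y, s) = p, q
        return (s * x + t * y, float(x @ y) + t * s)

    x = np.random.default_rng(1).normal(size=5)
    x *= 0.5 / np.linalg.norm(x)              # any vector of length 1/2
    p = (x, 0.5)

    y, t = spin_product(p, p)
    print(np.allclose(y, x), np.isclose(t, 0.5))    # True True: p is idempotent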

As for non-simple formally real Jordan algebras, the corresponding ‘projective spaces’ are just Cartesian products of the ‘projective spaces’ corresponding to the direct summands.

An exotic spacetime

In August of this year, Blake Stacey posted the following comment on John Baez’s essay:

Now for some context: it is possible to define the determinant of a 3-by-3 octonionic Hermitian matrix, and the group of linear operators (viewing \mathfrak{h}_3(\mathbb{O}) as a 27-dimensional real vector space) that preserves the determinant is a noncompact real form of the Lie group E6.
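
To make the determinant concrete, here is a minimal Python sketch. Nothing in it is taken from the post itself: the recursive Cayley-Dickson implementation, the matrix layout, and the formula \det = \alpha\beta\gamma - \alpha|a|^2 - \beta|b|^2 - \gamma|c|^2 + 2\mathrm{Re}(abc) follow one standard convention (the formula appears, for instance, in Baez's survey of the octonions):

    import numpy as np

    def cd_conj(x):
        # Cayley-Dickson conjugate: flip the sign of every component except the real part
        y = -x
        y[0] = x[0]
        return y

    def cd_mult(x, y):
        # Multiplication in the 2^k-dimensional Cayley-Dickson algebra (k = 3: octonions),
        # using the doubling formula (a,b)(c,d) = (ac - conj(d) b, da + b conj(c))
        if len(x) == 1:
            return x * y
        h = len(x) // 2
        a, b, c, d = x[:h], x[h:], y[:h], y[h:]
        return np.concatenate([cd_mult(a, c) - cd_mult(cd_conj(d), b),
                               cd_mult(d, a) + cd_mult(b, cd_conj(c))])

    def re(x):
        return x[0]

    def nrm2(x):
        return float(x @ x)

    def jordan_det(alpha, beta, gamma, a, b, c):
        # Determinant of the octonionic Hermitian matrix
        #   [[alpha,        c,  conj(b)],
        #    [conj(c),   beta,        a],
        #    [b,      conj(a),   gamma ]]
        # with real diagonal and octonionic a, b, c (one standard convention)
        return (alpha * beta * gamma
                - alpha * nrm2(a) - beta * nrm2(b) - gamma * nrm2(c)
                + 2 * re(cd_mult(cd_mult(a, b), c)))

    rng = np.random.default_rng(0)
    a, b, c = (rng.normal(size=8) for _ in range(3))

    # Sanity checks: the norm is multiplicative, and Re((ab)c) = Re(a(bc)), so the
    # trilinear term in the determinant is unambiguous
    print(np.isclose(nrm2(cd_mult(a, b)), nrm2(a) * nrm2(b)))                        # True
    print(np.isclose(re(cd_mult(cd_mult(a, b), c)), re(cd_mult(a, cd_mult(b, c)))))  # True

    # The identity matrix has determinant 1, and with real entries the formula
    # reduces to the ordinary 3x3 determinant
    z = np.zeros(8)
    print(jordan_det(1, 1, 1, z, z, z))                                              # 1.0
    ar, br, cr = (np.array([t] + [0.0] * 7) for t in (0.4, -1.2, 2.5))
    M = np.array([[1.0, 2.5, -1.2], [2.5, 3.0, 0.4], [-1.2, 0.4, 2.0]])
    print(np.isclose(jordan_det(1.0, 3.0, 2.0, ar, br, cr), np.linalg.det(M)))       # True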

This group E6 is transitive on the positive-definite matrices of determinant 1. The subgroup fixing any one of these (without loss of generality, the identity matrix) is the compact real Lie group F4, which also preserves the Jordan product. This means that it maps idempotents to idempotents, so can be seen as acting on the octonionic projective plane as its group of projective transformations.

This group F4 is transitive on the rank-1 idempotent matrices, and the subgroup fixing any one of these is Spin(9). (As a result, we can describe the octonionic projective plane as the quotient F4 / Spin(9). Elie Cartan proved that all compact Riemannian symmetric spaces are quotients of compact Lie groups.)

What’s the analogy for familiar (3+1)-dimensional Minkowski spacetime?

  • the full group (analogous to E6) is the proper orthochronous Lorentz group;
  • the subgroup fixing a timelike vector (analogous to F4) is the rotation group SO(3);
  • the subgroup additionally fixing a lightlike vector (analogous to Spin(9)) is the rotation group SO(2);
  • the symmetric space (analogous to the octonionic projective plane) is the quotient SO(3) / SO(2), which is just the familiar 2-sphere.

A lattice in this exotic spacetime

It is natural to consider the ‘integer points’ in this spacetime, namely the octonionic Hermitian matrices where the off-diagonal elements are Cayley integers and the diagonal elements are ordinary integers. John Baez mentions that this is the unique integral unimodular lattice in (26+1)-dimensional spacetime, and it can be seen as the direct sum II_{25,1} \oplus \mathbb{Z} of the exceptional Lorentzian lattice with a copy of the integers.

This lattice was thoroughly investigated in a marvellous paper by Noam Elkies and Benedict Gross. Possibly the most surprising discovery in this paper is that whilst E6 acts transitively on the positive-definite matrices of determinant 1, this no longer holds when you ‘discretise’! More precisely, the positive-definite ‘integer points’ of determinant 1 form two distinct orbits under the discrete subgroup of E6 that preserves the lattice.

One of these orbits contains the identity matrix; the other contains the circulant matrix with elements {2, η, η*} where \eta = \dfrac{-1 + \sqrt{-7}}{2}. [Note: there’s a 6-dimensional sphere of octonionic square-roots of -7. You’ll need to choose one that results in η being a Cayley integer.] If you use this other matrix E as your quadratic form instead of the identity matrix I, this leads to a very natural construction of the Leech lattice.
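
As a quick arithmetic check, we can model η inside a complex subfield of the octonions (which suffices for computations involving only the circulant matrix) and verify that E is Hermitian, positive-definite, and of determinant 1:

    import numpy as np

    eta = (-1 + 1j * np.sqrt(7)) / 2           # a complex model of eta; eta * conj(eta) = 2

    E = np.array([[2,            eta,           np.conj(eta)],
                  [np.conj(eta), 2,             eta         ],
                  [eta,          np.conj(eta),  2           ]])

    print(np.allclose(E, E.conj().T))          # Hermitian
    print(np.linalg.eigvalsh(E))               # all eigenvalues positive
    print(np.linalg.det(E).real)               # determinant 1 (up to rounding)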

Specifically, as shown in the Elkies-Gross paper, triples of Cayley integers with the norm \langle x | E | x \rangle form an isometric copy of the Leech lattice! By contrast, the usual inner product \langle x | I | x \rangle using the identity matrix as the quadratic form gives the direct sum E_8 \oplus E_8 \oplus E_8 — again an even unimodular lattice in 24 dimensions, but not as exceptional or beautiful or efficient as the Leech lattice.

Further reading

To get a full understanding of the octonions, Cayley integers, and exceptional Jordan algebra, I recommend reading all of the following:

Robert Wilson has also constructed the Leech lattice from the integral octonions (see here and here). Wilson’s construction also involves \eta = \dfrac{-1 + \sqrt{-7}}{2}, so it may be possible to show reasonably directly that it’s equivalent to the Elkies-Gross construction.


Closed-form numbers

Recall that the logarithm function (defined on the punctured complex plane) is multi-valued, with the different solutions to exp(x) = y differing by integer multiples of 2πi.

Visualisation of the imaginary part of the multi-valued complex logarithm function, created by Leonid 2.

 

James Propp remarked that the values of the form i^i^i are a dense subset of the unit circle. Extending this, one can show that the values of the form i^i^i^i are not dense in the complex plane, but those of the form i^i^i^i^i are.
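
Here is a quick cmath sketch of the first claim: every branch of i^i is a positive real number, so every branch of i^(i^i) lies on the unit circle. (Density requires more work, but the spread of arguments is already visible.)

    import cmath

    def branches(base, exponent, ks):
        # all values of base**exponent using the k-th branch of log(base), for k in ks
        return [cmath.exp(exponent * (cmath.log(base) + 2j * cmath.pi * k)) for k in ks]

    ks = range(-3, 4)

    # every branch of i^i is a positive real number...
    assert all(abs(v.imag) < 1e-12 and v.real > 0 for v in branches(1j, 1j, ks))

    # ...so every branch of i^(i^i) = exp((i^i) log i) lies on the unit circle
    values = [w for v in branches(1j, 1j, ks) for w in branches(1j, v, ks)]
    print(all(abs(abs(w) - 1) < 1e-9 for w in values))        # True
    print(len({round(cmath.phase(w), 6) for w in values}))    # many distinct points on the circle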

Defining the EL-numbers

Timothy Chow described an interesting subfield of the complex numbers, known as either closed-form numbers or (less ambiguously) EL-numbers. We’ll present a slightly different but equivalent definition here, where we embrace the complex logarithm function in its full multivalued glory instead of choosing a principal value of the logarithm function.

In particular, the EL-numbers can be defined as follows:

  • Zero: the complex number 0 is an EL-number;
  • Addition/subtraction*: given two EL-numbers, a and b, then f(a, b) is also an EL-number where c = f(a, b) is the solution to a + b + c = 0;
  • Exponentiation: given an EL-number x, then exp(x) is also an EL-number;
  • Logarithm: given a nonzero EL-number y, then (at least**) one solution of exp(x) = y is also an EL-number.

*this slightly unconventional operation f, which is equivalent to having both addition and subtraction rules, was chosen because it’s completely symmetric: the validity of c = f(a, b) is invariant under permutation of a, b, and c. The relevance of this will become apparent in a future post.

**we’ll show below that this implies that all solutions are EL-numbers, so this definition is well-defined.

The first two rules provide us with unary negation and addition, so we know that the EL-numbers form an additive group:

  • −x = f(x, 0);
  • x + y = f(f(x, y), 0).

Together with the other two rules, we can show that the EL-numbers form a field. In particular, if x and y are nonzero, then exp(log(x) + log(y)) = xy and exp(−log(x)) = 1/x, irrespective of which choices of logarithms were taken. Finally, we need the multiplicative identity 1 to exist, but that can be obtained straightforwardly as exp(0).
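
These identities are easy to check numerically, with the branch of each logarithm chosen at random:

    import cmath, random

    rng = random.Random(1)

    def some_log(z):
        # an arbitrary branch of the logarithm of z
        return cmath.log(z) + 2j * cmath.pi * rng.randint(-5, 5)

    x, y = complex(2, -1), complex(-3, 0.5)
    print(cmath.isclose(cmath.exp(some_log(x) + some_log(y)), x * y))   # True
    print(cmath.isclose(cmath.exp(-some_log(x)), 1 / x))                # True
    print(cmath.exp(0))                                                 # the multiplicative identity, (1+0j)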

Consequently, every rational number is a deterministic EL-number: we can, for each rational q, write down an expression which evaluates to q irrespective of which choices of logarithms are taken. Other deterministic EL-numbers include e, which can be obtained as exp(exp(0)). More generally, every element of the smallest field containing the rationals and closed under exponentiation is also a deterministic EL-number, and we conjecture that the converse is also true.

Mitigating multivaluedness

To show that we can obtain all logarithms (not just one), we apply the following procedure (a numerical sketch follows the list):

  • Apply the logarithm rule to obtain some solution x to exp(x) = y. This may be different from the ‘desired’ solution, w;
  • Apply the logarithm rule to obtain some solution p to exp(p) = −1. This is necessarily some odd integer multiple of πi.
  • We know that w − x is therefore a rational multiple of p, say qp, so we can construct q deterministically, multiply it by p, and add the result to x to obtain w.
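
Here is a numerical sketch of this procedure. (In the definition above, q would be constructed deterministically from the rationals; here we simply read it off from the imaginary parts.)

    import cmath, random

    rng = random.Random(0)

    y = complex(-2.0, 3.0)
    w = cmath.log(y) + 2j * cmath.pi * 5                       # the 'desired' logarithm of y
    x = cmath.log(y) + 2j * cmath.pi * rng.randint(-10, 10)    # the branch the logarithm rule gave us
    p = 1j * cmath.pi * (2 * rng.randint(-10, 10) + 1)         # some odd multiple of pi*i (a log of -1)

    # w - x is an even multiple of pi*i, so (w - x)/p is the rational q = (even)/(odd):
    q_num = round((w - x).imag / cmath.pi)                     # an even integer
    q_den = round(p.imag / cmath.pi)                           # an odd integer

    w_reconstructed = x + (q_num / q_den) * p
    print(cmath.isclose(w_reconstructed, w))                   # True
    print(cmath.isclose(cmath.exp(w_reconstructed), y))        # True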

Note that this was somewhat ‘interactive’: we needed to know which solutions x and p were obtained in order to choose the correct rational number q to use. The ‘closed-form expression’ for w depends on the branches of the logarithms taken, and each of these expressions is ‘nondeterministic’ in the sense that, if someone changed which branches were taken, it would no longer define w.

Nondeterministic EL-numbers

Clearly, some deterministic EL-numbers can have nondeterministic expressions: for example, exp(log(exp(0))/2) can be either 1 or −1, even though these are both deterministic EL-numbers. Are there any EL-numbers which only have nondeterministic expressions?

Well, yes: the imaginary unit i can be obtained as exp(log(−1) / 2). We’ll call this a nondeterministic EL-number, because any closed-form expression for i is also a closed-form expression for −i. By the same argument (namely the fact that complex conjugation is an exponential field automorphism of the complex numbers), every nonreal EL-number is necessarily nondeterministic.
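
The branch-dependence is easy to see numerically, with k indexing the branch of the logarithm:

    import cmath

    def exp_half_log(y, k):
        # exp(log(y)/2) using the k-th branch of the logarithm
        return cmath.exp((cmath.log(y) + 2j * cmath.pi * k) / 2)

    print([exp_half_log(1, k) for k in range(4)])     # alternates +1, -1, +1, -1
    print([exp_half_log(-1, k) for k in range(4)])    # alternates +i, -i, +i, -i (up to rounding)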

π is also an EL-number, because both πi and i have already been shown to be EL-numbers, and we know that they form a field. Is it a deterministic EL-number? Certainly it hasn’t been proved that this is not the case, because it hasn’t even been shown that e + π is irrational. But on the other hand, it seems very difficult to find a construction of π that doesn’t also end up constructing an infinite number of other multiples of π.

My suspicion is that π is not a deterministic EL-number — can this be proved, conditional on a plausible assumption such as Schanuel’s conjecture?


The Barnes-Wall lattices

For any nonnegative integer n, there exists the highly symmetric Barnes-Wall lattice in dimension 2^n. In low dimensions, these are (up to scaling and rotation) familiar lattices:

  • For n = 0, this is just the integer lattice, \mathbb{Z}.
  • For n = 1, the lattice is D_2, a scaled and rotated copy of the familiar square lattice \mathbb{Z}^2.
  • For n = 2, we obtain the D_4 lattice (consisting of all integer points with even coordinate sum). The convex hull of the vectors closest to the origin forms a 24-cell, a self-dual regular polytope in four dimensions.
  • For n = 3, this is a scaled copy of the E_8 lattice, discovered by Thorold Gosset and proved by Maryna Viazovska to be the densest packing of unit spheres in eight-dimensional space.
  • For n = 4, this is a scaled copy of the 16-dimensional laminated lattice, \Lambda_{16}, which again is the densest known lattice packing in its dimension. In Conway and Sloane’s Sphere Packings, Lattices and Groups, this is additionally described as a section of the Leech lattice.

We’ll describe a construction of the Barnes-Wall lattices that makes their full symmetry groups apparent (with the unavoidable exception of n = 3). Unlike the other simple constructions of the Barnes-Wall lattices, which involve quadratic fields, this construction only makes use of the classical integers \mathbb{Z}.

The real Clifford group and the minimal vectors

Being slightly notationally liberal (denoting a Weyl group by the name of its Dynkin diagram), let F_4 \subset O(4) be the symmetry group of a standard 24-cell. It has order 1152. In particular, it contains an index-3 subgroup consisting of the 384 signed permutation matrices; the remaining 768 elements are precisely the 4-by-4 Hadamard matrices (see previous post), appropriately scaled so as to be orthogonal matrices.

Being extremely notationally liberal, for n ≥ 3 we let F_{2^n} \subset O(2^n) be generated by:

  • the symmetric group S_n, which acts naturally on \mathbb{R}^{2^n} when considered as an n-fold tensor product of \mathbb{R}^2 with itself;
  • the group F_4, where we expand each of the 4 \times 4 matrices to a 2^n \times 2^n matrix by taking the Kronecker product with a 2^{n-2} \times 2^{n-2} identity matrix.

Equivalently, in the language of minimalistic quantum computation, it’s the group generated by (real orthogonal) 2-qubit gates with dyadic rational entries. (These are exactly elements of F_4 conjugated by elements of S_n.)

Technically, we’ve only defined the groups F_{2^n} for n ≥ 2. For completeness, we define F_1 and F_2 to be the groups of signed permutation matrices in dimensions 1 and 2, respectively. To cover these cases, we can amend the previous definition to:

For any nonnegative integer n, F_{2^n} \subseteq O(2^n) is the group of n-qubit gates generated by the dyadic rational gates acting on at most 2 qubits.

Then the minimal vectors of the Barnes-Wall lattice in dimension 2^n are just the images of the all-ones vector (1, 1, …, 1) under the action of this real Clifford group F_{2^n}. The number of such vectors is given by A006088, and the Euclidean length of each vector is \sqrt{2^n}.
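
As a sanity check in the smallest interesting case n = 2, the following sketch recovers the 24 minimal vectors of the (scaled) D_4 lattice as the orbit of the all-ones vector. The particular generators used (a transposition, a 4-cycle, a sign change, and a scaled Hadamard matrix) are just one convenient generating set for F_4:

    import numpy as np

    # Generators of F_4 acting on R^4: a transposition, a 4-cycle, a sign change,
    # and a 4x4 Hadamard matrix scaled to be orthogonal
    H = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]]) / 2.0
    swap  = np.eye(4)[[1, 0, 2, 3]]
    cycle = np.eye(4)[[1, 2, 3, 0]]
    flip  = np.diag([-1.0, 1.0, 1.0, 1.0])
    gens  = [H, swap, cycle, flip]

    def orbit(start, gens):
        # closure search: the orbit of a vector under the group generated by gens
        seen = {tuple(start)}
        frontier = [np.array(start, dtype=float)]
        while frontier:
            v = frontier.pop()
            for g in gens:
                w = g @ v
                key = tuple(int(round(t)) for t in w)   # every vector reached here is integral
                if key not in seen:
                    seen.add(key)
                    frontier.append(w)
        return seen

    vecs = orbit((1, 1, 1, 1), gens)
    print(len(vecs))                                        # 24 minimal vectors
    print(all(sum(t * t for t in v) == 4 for v in vecs))    # each has squared length 2^n = 4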

We can then simply define the Barnes-Wall lattice to be the lattice generated by these minimal vectors! The determinant of the lattice is 2^{n 2^{n-3}}.

The full symmetry group

Whenever n ≠ 3, the symmetry group of the lattice is exactly the real Clifford group described above, with order given in A014115.

When n = 3, however, the symmetry group of the lattice is 270 times larger! Rather than just being F_8, it additionally contains arbitrary permutations of the eight coordinates, and is the Weyl group of E_8.

A generator matrix

It is convenient to have an integral basis of this lattice, compatible with the described construction. In particular, the basis vectors will be a convenient subset of the minimal vectors, and moreover will lie on the vertices of the cube [-1, +1]^{2^n}.

Greg Kuperberg describes an elegant construction of an integral basis of minimal vectors in a scaled and rotated copy of the Barnes-Wall lattice. In particular, he takes the (n-1)-fold tensor product of the matrix [[1, 1], [1, i]] and then replaces each of the resulting (complex) entries a + bi with a 2-by-2 real matrix [[a, b], [-b, a]].

We’ll modify this construction slightly by multiplying the (n-1)-fold tensor product by the scalar 1 + i. This corresponds to scaling and rotating the lattice such that it agrees with our construction, and has the elegant property that every entry of the resulting real matrix is ±1. We’ll also permute the rows so that the matrix is symmetric.

The first six generator matrices are shown below:

The code above uses the following equivalent formulation (a Python sketch of which appears after the list):

  • Take an M-by-M array, where M = 2^{n-1}, and number the rows/columns in a 0-indexed fashion.
  • Define the weight of the (i, j)th entry to be popcount(i & j). That is to say, we take the size of the intersection between the set of ‘1’-bits in the binary expansions of i and j.
  • Populate the (i, j)th entry with a 2-by-2 symmetric matrix according to the weight modulo 4:
    • if 0 modulo 4, use the matrix [[1, 1], [1, -1]];
    • if 1 modulo 4, use the matrix [[-1, 1], [1, 1]];
    • if 2 modulo 4, use the matrix [[-1, -1], [-1, 1]];
    • if 3 modulo 4, use the matrix [[1, -1], [-1, -1]].

(These are precisely the four real symmetric 2-by-2 (±1)-matrices which square to 2I.)
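
Here is a Python sketch of that formulation. The four blocks are exactly those listed above; whether the output matches the pictured generator matrices entry-for-entry depends on row and column conventions, so treat it as illustrative:

    import numpy as np

    # The four symmetric (+-1)-matrices squaring to 2I, indexed by popcount(i & j) mod 4
    BLOCKS = [np.array([[ 1,  1], [ 1, -1]]),
              np.array([[-1,  1], [ 1,  1]]),
              np.array([[-1, -1], [-1,  1]]),
              np.array([[ 1, -1], [-1, -1]])]

    def generator_matrix(n):
        # 2^n x 2^n matrix of +-1 entries built from the popcount rule described above
        M = 2 ** (n - 1)
        G = np.zeros((2 * M, 2 * M), dtype=int)
        for i in range(M):
            for j in range(M):
                w = bin(i & j).count('1') % 4
                G[2*i:2*i+2, 2*j:2*j+2] = BLOCKS[w]
        return G

    G = generator_matrix(2)
    print(G)                                  # a 4x4 matrix of +-1 entries
    print((G * G).sum(axis=1))                # each row has squared length 2^n = 4
    print(np.allclose(G, G.T))                # symmetric, as promised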

Related reading

Other constructions of the Barnes-Wall lattices appear in:
