Every finite phoenix has period 2

A phoenix is an oscillator in Conway’s Life where every cell dies in every generation. The smallest example is Phoenix 1, which oscillates with period 2 and has a constant population of 12:

All known finite phoenices have period 2, and Stephen Silver proved in 2000 that there cannot exist a finite phoenix of period 3. Alex Greason more recently proved the non-existence of any phoenix (finite or infinite) with period 3 or 5.

Infinite phoenix agars (patterns that are periodic in two directions, filling the whole plane) and wicks (patterns that are periodic in one direction) are known for certain larger periods; the forum user wwei23 recently showed the existence of phoenix wicks of all periods divisible by 6:

Construction by wwei23 showing the existence of phoenix wicks of all periods of the form 6n

It seemed as though a finite period-4 phoenix may have been possible, as Keith Amling found period-4 wicks consisting of a narrow flexible rope supported by finite period-2 supports:

Keith Amling’s flexible period-4 wick

In particular, if it were possible to bend this around somehow into a closed loop, then we would have a finite period-4 oscillator. After trying in vain for a long time to find one, it became increasingly plausible that no such oscillator exists. Eventually it was possible to prove the non-existence of finite phoenices of periods between 3 and 69, and eventually prove the non-existence of finite phoenices of any period other than 2. The proof is computer-assisted, making use of SAT solvers to automate finite case-bashes (much like Alex Greason’s disproof of p3 and p5 phoenices), but the overall structure of the proof is quite simple and human-comprehensible.

Preliminaries

Before we begin the main proof, we will establish facts about finite phoenices which help to accelerate the SAT solver by reducing the search space that it needs to explore. These facts will also be proved with the help of a SAT solver, but obviously for the avoidance of circularity we cannot assume these facts until they have been proved. In particular, the overall structure of the proof will look like:

  • A: only 11 of the 16 possible 2 × 2 squares can occur in a finite phoenix;
  • B: only 99 of the 512 possible 3 × 3 squares can occur in a finite phoenix;
  • C: every finite phoenix has period 2;

where we assume A when proving B, and assume B when proving C.

Suppose that we have a finite phoenix oscillator. We say that a 2 × 2 square of cells is heavy if there is some time T where at least three of the four cells in that square are live. We show that no heavy squares can exist by the following argument:

  • Suppose there exists such a heavy square [x, x+1] × [y, y+1];
  • With out loss of generality, suppose that (x, y) is maximal, with respect to the lexicographic ordering on Z^2, over all such heavy squares (we can do this because the oscillator is finite by assumption);
  • Let T be a generation for which the heavy square contains at least three live cells, and then consider the 16 × 16 × 8 box of cells (the 16 × 16 neighbourhood centred on the heavy square, from time T − 5 to T + 2);
  • Create a Boolean variable for each cell together with constraints specifying that the Life rules are followed, that every live cell dies in every generation, and that there is no heavy square [u, u+1] × [v, v+1] such that (u, v) is lexicographically greater than (x, y);
  • Feed the resulting constraints into a SAT solver and derive a contradiction.

Now that we have established that no heavy squares can exist, we can feed this in as an additional constraint to speed up subsequent SAT problems. The next thing that we do is determine the set of possible 3 × 3 neighbourhoods that can occur at some generation in a finite phoenix: we find that of the 512 possible neighbourhoods (102 up to symmetry), only 99 of these (22 up to symmetry) can occur; they were originally tabulated by forum user wwei23 here. (Apparently wwei23 was able to establish this for all phoenices, finite or infinite, with the sole exception of the Venetian blinds agar.)

These neighbourhood constraints can again be injected into any SAT problems to accelerate the search. Specifically, we combine the Life rules with these neighbourhood constraints to obtain, for every 10 cells consisting of a 3×3 neighbourhood at time t together with the central cell at time t + 1, a proposition in these 10 Boolean variables whose 1024-element truth table consists of 99 true values and 925 false values. We encode this proposition by specifying all minimal clauses that are implied by this proposition, the set of which can be determined by an algorithm by Eugenio Morreale described in the solutions to exercises 29 and 30 from 7.1.1 of Knuth’s TAOCP.

The code for proving these preliminary lemmata is here.

The main proof

Suppose that we have a finite phoenix of period greater than 2. We define the extremal cell to be the cell (x, y) with the following properties:

  1. The cell (x, y) oscillates with period greater than 2;
  2. The value x + y is maximum amongst all cells with this property;
  3. The value y is maximum amongst all cells with these properties.

By definition, any other cell (u, v) with u + v > x + y or with u + v = x + y and v > y will necessarily be either constantly off or oscillate with period 2.

In the diagram above, we show the extremal cell in deep purple and the surrounding 29 × 29 neighbourhood. The forced-vacuum-or-p2 cells are shown in green; the cells that are allowed to be higher period are shown in purple. We also highlight a 39-cell patch centred on the extremal cell; this will be important later in the proof.

As the central cell oscillates with period greater than 2, there must be a time T for which the central cell is off at time T and on at time T+2. We now consider the 29 × 29 × 32 box of cells (the 29 × 29 neighbourhood from time T − 17 to T + 14) and create a Boolean variable for each cell; to these 26912 variables we introduce the following constraints:

  • every 3 × 3 neighbourhood is one of those that can appear in a phoenix, and the Life rules are obeyed;
  • if (u, v) is in the green region, then the variables (u, v, t) and (u, v, t+2) are equal;
  • the central cell (x, y, T) is off and (x, y, T+2) is on.

We then run an incremental SAT solver (Armin Biere’s CaDiCaL) on this problem to find all possible values of the 39-cell patch at time T. It transpires that there are 20 such possibilities for the patch:

For each of these patches, we create a new SAT problem, again on a 29 × 29 × 32 box of cells (albeit shifted 2 generations backwards in time), and search for all patches that can occur 2 generations before each of these patches. Ignoring the patches above, there are a further 20 distinct patches that can occur:

We can keep repeating this process, forming a breadth-first traversal of the directed graph of possible 39-cell patches that can occur at times T − 2k in the ancestry of the central cell. We find 46 more patches in the next layer, then 35 in the next, then 5, 6, 20, 28, 13, 4, and finally 0. At this point, we have traversed all of the possible 39-cell patches (197 in total).

For all of these 197 possible patches, the central cell is off, which means that there’s no time T − 2k which matches the state at time T, so we cannot have a periodic phoenix. (In particular, if we had a phoenix of period p, then it must be identical in generations T − 2p and T, thereby obtaining a contradiction.)

It therefore follows that every finite phoenix has period 2.

When I ran the program to enumerate the possible patches, I also tracked the full directed graph that contains a vertex for each of the 197 patches and a directed edge whenever a patch can be the grandparent of another patch. The reason for doing this is that I didn’t know a priori that all of the patches would have central cell off, leading to an easy proof; instead, I thought that it might be the case that some had central cell on, but that there’s no directed cycle containing such a vertex (a weaker condition, but still sufficient for the proof).

The graph has 197 vertices, 1017 edges, and 10 connected components, but there’s no obvious structure beyond that:

Corollaries

In any case, the fact that every finite phoenix has period 2 leads to the corollary that every phoenix has constant population: if it’s infinite, then the population is ℵ_0; otherwise, the oscillator is finite (therefore p2) and we can consider the (disjoint) sets A and B, where A is the set of cells alive at even generations and B is the set of cells alive at odd generations. Every cell in A has exactly 3 neighbours in B, and every cell in B has exactly 3 neighbours in A, so the sets have equal size.

(The p2 part of this proof was done much earlier by the forum user Praosylen who looked at the 3-regular bipartite graph with vertex-classes A and B and an edge whenever two cells are adjacent, and proved other properties such as bridgelessness. All of this research now applies to all finite phoenices, since non-p2 finite phoenices have been ruled out.)

Posted in Uncategorized | Leave a comment

Miscellaneous discoveries

Soon after the previous post announcing the discovery of an aperiodic monotile by Smith, Myers, Kaplan, and Goodman-Strauss, the same authors published a second aperiodic monotile which has the property that all of the tiles are of the same orientation: reflections are not needed.

This new tile, dubbed Spectre, is Tile(1,1) from the previous paper but with perturbations on the edges to enforce that all tiles are similarly oriented. The authors provide a curvilinear realisation, but there is also an equilateral polygonal realisation with 28 edges (two of which are parallel, so this is equivalently a 27-gon where one edge is twice as long as the remaining 26 edges):

Unlike their previous aperiodic monotile, the ‘hat’, this monotile cannot be described as the union of tiles of an Archimedean dual tiling. It does, however, admit the above description where every vertex lies in the ring of integers of the 12th cyclotomic field.

Omniperiodicity

In other news, Conway’s Game of Life has now been proved omniperiodic (for every positive integer n, there exists an oscillator of period n), with the recent discoveries this month of the remaining two unsolved periods, 19 (by Mitchell Riley) and 41 (by Nico Brown).

The proof of omniperiodicity involves witnesses for every period up to 42, beyond which adjustable glider loops handle every period from 43 upwards. These witnesses are listed on the status page.

Tetrational diehard progress

In the same cellular automaton, there has been a significant improvement in terms of engineering a ‘diehard’ (pattern that eventually fully disappears after a long number of generations) in a bouncing rectangle of area < 10000. Last year, we mentioned that Pavel Grankovskiy had engineered a diehard with a lifespan exceeding:

10^10^10^10^10^10^10^10^10

Note that this is a tetrational (iterated exponential) tower of height 9. In the last few days, various authors have increased the height of this tower considerably: Pavel Grankovskiy and a pseudonymous forum poster (toroidalet) improved the height to 15 before, in the space of a few hours, it explosively increased to 25, then 35, then 310, then 320, then 363, then 13131937954518, and now the record stands at 1.1038 × 10^1046.

The record-holder is really rather complicated and fits in a 109 × 91 box:

The pattern has been colour-coded vaguely in terms of function. The dense blobs of ‘spacedust’ are the results of SAT-solver-based predecessor searches to squeeze the pattern into this box. After running it for 120 generations, the pattern is less inscrutable:

Travelling to the southwest is a c/5 diagonal spaceship (in white) flanked by two c/4 diagonal ‘boatstretchers’ (one in mint green, and one in duck-egg blue). Most of the junk inside the body of the configuration exists as a delay mechanism to prolong the burning of the duck-egg blue fuse for as long as possible (more than eleven thousand generations). At generation 11880, we see that the fuse has commenced burning to leave a trail of loaves:

The fuse burns more quickly than the boatstretcher can extend it, and by generation 16000, it has completely burned to leave behind a trail of 478 loaves. Pairs of gliders produced by the body of the configuration head towards the receding c/5 diagonal spaceship:

When they reach the c/5 diagonal spaceship, a block is produced, and the pairs of gliders proceed to pull the block backwards, one cell at a time, until it collides with the southwesternmost loaf, cleanly annihilating it. If the distance between the spaceship and the loaf is N, then it takes 116N generations to pull the block back to the loaf (the gun is period-120, but the Doppler effect means the block is pulled with period 116), and then another 484N generations for the glider stream to catch up with the spaceship. As such, the spaceship has receded by a distance of 600N/5 = 120N, so it is 121 times further away than it was before.

This process repeats, with each of the 478 loaves multiplying the delay by a factor of 121. A few more factors of 121 are consumed by the lemon-yellow junk in the body of the mechanism, at which point the mint-green fuse is ignited.

This second fuse burns much like the first fuse, except now the scale is much larger: instead of producing 478 loaves, it produces 1.1038 × 10^1046 loaves. More precisely, the number of loaves is expressible using the following Python expression:

int('306'+'377400'*162+'38'+'78'*13+'56',11)

Here is the northeast corner of the pattern after the fuse has been ignited, simulated using a hashlife implementation. It took approximately 16 minutes to run the pattern for the 2^3480 generations necessary to reach this stage:

We have a second block-pull tractor beam mechanism, precisely reflected across the c/5 diagonal spaceship’s line of symmetry; this one is exponentially sparse (therefore not visible in the picture above), and so the times at which the loaves are removed are tetrationally sparse. Consequently, the time for the first n loaves to be destroyed is a power tower of height n. After a time expressible only as a power tower of height 1.1038 × 10^1046, all of the loaves are destroyed, and the entire mechanism cleanly self-destructs (with the help of the salmon and beige cells) to leave zero live cells.

Note that whilst hashlife is capable of simulating the 2^3480 generations necessary to burn the second fuse, it cannot handle the tetrational part of the mechanism; instead, this analysis relies on bespoke mathematical analysis of the behaviour of the pattern.

More ambitious plans

This is still at a very low rung on the fast-growing hierarchy: we would need an extra c/5d spaceship and corresponding pair of tractor beams for every 2 levels of the hierarchy, and even then we are restricted to primitive-recursive functions. If we dispense with the 10000-cell bounding box limitation, there are plans to build self-destructing Turing machines with extremely long lifespans. One of those plans involves a function that grows so quickly that it cannot be proved total in Peano arithmetic, based on the ‘primitive sequence system’. But that is a special case of something even more powerful, namely…

Bashicu Matrix System proved well-ordered

…the Bashicu Matrix System.

A very exciting paper by Samuel Vargovcik proves the well-orderedness of a conjectured system of ordinal notations, the Bashicu Matrix System, named after its discoverer. Ordinals (up to some large but as-yet-undetermined countable ordinal) are represented as matrices of natural numbers.

The order type of height-1 matrices (‘primitive sequence system’) is ε_0, same as the Kirby-Paris hydra; the order type of height-2 matrices (‘pair sequence system’) is Buchholz’s ordinal; beyond that, the exact order types are unknown and were only proved well-ordered by Vargovcik’s recent paper.

This is particularly noteworthy as other systems of ordinal notations, such as those of Buchholz, are bounded by much smaller ordinals (e.g. TFBO). The Bashicu Matrix System gives a very concise tabular representation for ordinals up to (some very large ordinal, far beyond TFBO); for example, the Bachmann-Howard ordinal is (according to this page) representable as the Bashicu matrix ((0, 0), (1, 1), (2, 2)), read in column-major order. The supremum of all height-2 matrices is the height-3 matrix ((0, 0, 0), (1, 1, 1)), which therefore corresponds to Buchholz’s ordinal.

Posted in Uncategorized | 5 Comments

Aperiodic monotile

David Smith, Joseph Myers, Craig Kaplan, and Chaim Goodman-Strauss have discovered an aperiodic monotile: a polygon that tiles the plane by rotations and reflections, but cannot tile the plane periodically.

Any tiling induced by the monotile is scalemic: the majority of tiles are unreflected, with only a relatively small proportion (shown in dark blue) being reflected. The ratio of unreflected to reflected tiles is φ^4 : 1, as can be determined from taking the authors’ description of the tiling in terms of a substitution system of ‘metatiles’:

  • H → 3H + 3P + 3F + T
  • P → 2H + P + 2F
  • F → 2H + P + 3F
  • T → H

and then taking the dominant eigenvector (corresponding to the dominant eigenvalue φ^4) to find the proportions of metatiles in the limit tiling, and using that to deduce the fraction of unreflected monotiles.

The tile itself is remarkably simple: it is a 13-sided polygon formed from the union of eight of the kites in the deltoidal trihexagonal tiling, which is itself the common refinement of the regular hexagonal tiling and its dual:

Adapted from an illustration by Tilman Piesk

This paper came a few days after the solution of another long-standing open problem in combinatorics, namely an improvement to the exponent in the upper bound on the Ramsey number R(s, s).

Additionally, a few months ago Barnabas Janzer proved that there exists a pair of convex compact sets S, K \in \mathbb{R}^4 each homeomorphic to a closed 4-ball such that a copy of S exists inside K in every orientation, but such that (remarkably!) the space of ways to embed copies of S inside K is not path-connected. This resolves a question posed by Hallard Croft.

Posted in Uncategorized | 7 Comments

The Osmiumlocks Prime

A couple of years ago I described a prime p which possesses various properties that renders it useful for computing number-theoretic transforms over the field \mathbb{F}_p. Specifically, we have:

p = \Phi_{192}(2) = \Phi_6(2^{32}) = 2^{64} - 2^{32} + 1

where the first of these equalities uses the identity that:

\Phi_{k}(x) = \Phi_{rad(k)}(x^{k/rad(k)})

where rad(k) is the product of the distinct prime factors of k.

It transpires that at approximately the same time, this field and its properties were rediscovered by the authors of the cryptographic protocol Plonky2. Remco Bloemen has written an article on this prime, which he calls the Goldilocks prime. Remco’s article is very much worth reading; it contains a bunch of new things not present in my original article, such as a fast implementation of modular inversion* based on the binary GCD algorithm, as well as a comprehensive discussion of Schönhage’s trick and some closely related variants.

*which can be improved by a further 20% as implemented here.

Besides convenient real-world properties such as fitting neatly into a machine-sized integer, there are two important mathematical properties of p:

  • p − 1 is divisible by 2^{32}, so 2^{32}th roots of unity exist in the field \mathbb{F}_p. This supports power-of-two-length NTTs up to length 2^{32}.
  • 2 is a primitive 192th root of unity, so multiplying an element of the field by a 192th root of unity (or indeed a 384th root of unity, by Schönhage’s trick) is particularly efficient. Since 192 has a large power-of-2 divisor (64), this means that we can use radix 64 for implementing these NTTs efficiently.

This prime has a much larger cousin with very similar properties:

q = \Phi_{1600}(2) = \Phi_{10}(2^{160}) = 2^{640} - 2^{480} + 2^{320} - 2^{160} + 1

We’ll call this the Osmiumlocks prime, because it’s in some sense a heavier version of the Goldilocks prime.

In this case, 2 is a primitive 1600th root of unity, and 2^{160}th roots of unity exist in the field, so we can have extremely large power-of-two-length NTTs. This is very fortuitous, even more so than the Goldilocks prime p, because the reason that k/rad(k) = 160 is so large is because the prime factorisation of k = 1600 = 2^6 . 5^2 has both prime factors repeated.

The beneficial properties of the Osmiumlocks prime q as compared with the Goldilocks prime p are that it’s sufficiently large to use for cryptography (for example, as the base field for an Edwards curve) and that it allows vastly longer power-of-two-length NTTs. You would only be limited by the maximum length of a power-of-two-length NTT over this field if you could store 2^160 words in memory; unless each word could be stored in fewer than 100 atoms, there are insufficiently many atoms inside the planet to build such a machine!

The disadvantage is that, being 640 bits long, field elements no longer fit into a 64-bit machine word. We’ll address this issue shortly.

Also, whereas multiplying by cube roots of unity was efficient in the Goldilocks field, it’s no longer efficient in the Osmiumlocks field; conversely, multiplying by fifth (and, less usefully, twenty-fifth) roots of unity is now efficient whereas it wasn’t in the Goldilocks field. All of the roots of unity in the Goldilocks field still exist in the Osmiumlocks field, though, because q − 1 is divisible by p − 1.

An unlikely alliance

I originally saw these two primes p and q as competitors: two primes with similar properties but very different sizes. It transpires, however, that the properties of p help to address the drawbacks of q, namely its inability to fit into a single machine word.

How would we implement arithmetic modulo q? If we represented it as 640/64 = 10 machine words, then we’d need 100 multiplication instructions (together with carries and suchlike) to implement multiplication using the ordinary method for integer multiplication. Instead, we’ll make use of the following observation:

2^{800} + 1 = (2^{160} + 1)q

This means that the field \mathbb{F}_q embeds into the ring of integers modulo 2^{800} + 1, so we can do our arithmetic in this larger ring (reducing at the end if necessary). How does that help? If we represent the elements in this ring as degree-n polynomials in a variable X = 2^{800/n}, then multiplying the polynomials modulo X^n + 1 is just a negacyclic convolution.

If we set n = 32, so that X = 2^{25}, then the coefficients of the polynomials are 25-bit integers. When we take a negacyclic convolution of two such polynomials, each coefficient will be the sum 32 different 50-bit products, which (allowing for an extra bit to deal with signs) will therefore fit in 56 bits.

We can do this negacyclic convolution efficiently using NTTs if we have efficient 64th roots of unity, which indeed we have in the original Goldilocks field! We can therefore compute one of these length-32 negacyclic convolutions over the Goldilocks field \mathbb{F}_p, involving just 32 multiplications of machine integers, a factor of 3 smaller than the 100 we would have otherwise needed.

We’ll need other operations, such as shifts and adds, but these tend to be cheap operations. Whether or not this is worthwhile depends on the size of the hardware multiplier. Many GPUs implement a full 64-bit product as many individual calls to a smaller (16- or 32-bit) hardware multiplier, so there we would experience a greater advantage from this NTT-based approach.

Also, the relevant operations vectorise very well: on a GPU, each warp consists of 32 threads, and we can have the ith thread in a warp hold the coefficient of X^i. Both the NTT butterfly operations and final carrying operations (to reduce the 56-bit coefficients back down to 25 bits) can be efficiently implemented using shuffles, which we have previously used in the context of fast Gaussian random number generation, and any arithmetic operations are performed across all threads in parallel.

Posted in Uncategorized | 1 Comment

Searching for optimal Boolean chains

I gave a half-hour talk on Tuesday about the project to search for optimal Boolean chains for all equivalence classes of 5-input 1-output and 4-input 2-output functions. The talk was not recorded, but the slides and transcript are included here for posterity. Some of the ideas in the talk haven’t been previously published, although it does follow a similar methodology to that which we introduced in a previous post.

The transcript is designed to be readable in a standalone fashion, so it incorporates both the spoken parts (in pale-yellow blocks) and the contents of the slides themselves. Both the transcript and slides were compiled from a single source file using an ad hoc build system, which preprocesses the source file into multiple separate LaTeX files which are compiled in parallel:

The build system also supports inline code in either Python or Wolfram Mathematica for drawing images, which is often more concise and expressive than drawing graphics using LaTeX’s native TikZ package. (It’s implemented using LaTeX’s immediate write18 commands, but with hash-based caching so that graphics are only regenerated if that part of the source file changes.)

Posted in Boolean optimisation | Leave a comment

The ordered partial partition polytope

In the tensor rank paper we introduced a new family of axis-aligned n-dimensional polytopes, one for each positive integer n. The vertices are naturally identified with ordered partial partitions (OPPs) of {1, …, n}, and the edges correspond to converting one OPP into an ‘adjacent’ OPP by moving an element in a specific way.

Here are the polytopes for n = 2 and n = 3:

Geometrically, the polytope has a simple description as being the set of all vectors x \in \mathbb{R}^n such that, for all k, the kth smallest coordinate is in the interval [0, k]. It has volume (n+1)^{n-1} and can tile the ambient space by translations.

But this doesn’t shed very much light on the combinatorial structure of the polytope, so it is useful to have a concrete description of the vertices, edges, and higher-dimensional faces. This was introduced in the paper, but the paper was chiefly concerned with the tensor rank of the determinant and therefore we did not delve too deeply into this.

As in the figure above, an ordered partial partition can be described by a list of disjoint subsets. For example, one of the OPPs of {1, …, 8} is:

[{3, 5}, {7}, {1, 4, 8}]

It is ordered because the order of the parts matters, and partial because some elements are allowed to be missing (in this case, 2 and 6). We can also draw this as a diagram where the parts are shown as layers, and all of the absent elements are present in a ‘ground level’ together with a sentinel value 0:

Whilst such a diagram takes up a lot more space, it is more helpful for visualising how an OPP can be transformed into an adjacent OPP. The two allowed moves are:

  • Taking a (nonzero) element that is the only occupant of its level (e.g. 7), and ‘merging it down’ into the next level (currently {3, 5}). The total number of levels decreases by 1.
  • Taking a (nonzero) element that is not the only occupant of its level (e.g. 2), and ‘separating it up’ into its own new level. The total number of levels increases by 1.

(This description is different from, but equivalent to, the one in the paper: the paper omits the ‘ground level’ and therefore instead has elements disappear and reappear; here we include this ‘ground level’ explicitly to make some of the descriptions more concise.)

For a given element i ∈ {1, …, n}, exactly one of these two moves is possible (depending on whether or not it is the only occupant of its level), and the two moves are inverses of each other.

The polytope and its face lattice

For a fixed n, we define a polytope whose vertices correspond to the OPPs of {1, …, n} and edges join pairs of OPPs if they can be interchanged by moving an element according to the aforementioned rules. Each vertex is incident with n edges, because from each OPP, we can move any of the n elements in exactly one way.

What about the higher-dimensional faces? A k-dimensional face is given by taking an OPP, allowing some size-k subset of the elements to be ‘movable’, and the remaining elements to be ‘immovable’. For example, we might make {2,3,7} movable elements, and the reachable configurations define a 3-dimensional face. Here we show the same OPP as before, but with the movable elements coloured in red:

We have also placed a solid black ‘floor’ on any level that contains an immovable element, because it is impossible for any elements to ever pass through those floors whilst subject to the rules we’ve described.

Which OPPs are reachable from this configuration? The element ‘2’ can stay on the ground level or move up into its own new level, so it has two possible locations. There are six possible configurations for the elements {3, 7}, depending on their relative orderings (3 possibilities) and whether or not at least one of them shares a level with ‘5’ (2 possibilities). Together, that means that this three-dimensional face has 12 vertices (where, by convention, we have omitted the contents of the ground level):

  • [{3, 5, 7}, {1, 4, 8}]
  • [{3, 5}, {7}, {1, 4, 8}]
  • [{5}, {3}, {7}, {1, 4, 8}]
  • [{5}, {3, 7}, {1, 4, 8}]
  • [{5}, {7}, {3}, {1, 4, 8}]
  • [{5, 7}, {3}, {1, 4, 8}]
  • [{2}, {5, 7}, {3}, {1, 4, 8}]
  • [{2}, {5}, {7}, {3}, {1, 4, 8}]
  • [{2}, {5}, {3, 7}, {1, 4, 8}]
  • [{2}, {5}, {3}, {7}, {1, 4, 8}]
  • [{2}, {3, 5}, {7}, {1, 4, 8}]
  • [{2}, {3, 5, 7}, {1, 4, 8}]

We have enumerated them in the order of a Hamiltonian cycle, making it apparent that these are indeed all reachable and therefore form a face of our polytope.

Counting the faces

Note that any face has a unique ‘minimum-energy’ vertex: the ordered partial partition where we merge all movable elements down into the next level containing at least one immovable element. To count the faces in the whole polytope, it therefore suffices to count, for each ordered partial partition, the faces that have that particular OPP as their minimum-energy vertex.

A vertex is minimum-energy if and only if every non-ground level contains at least one immovable element. This means that for a particular ordered partial partition R, the set of movable elements must be chosen by taking some (possibly empty) proper subset of each non-ground level P ∈ R together with an arbitrary subset of the ground level. The generating function of faces with R as its minimum-energy vertex is therefore the polynomial:

(1 + x)^{n-m} \prod_{P \in R} \left( (1 + x)^{|P|} - x^{|P|} \right)

where the coefficient of x^k counts the number of k-dimensional faces with R as its minimum-energy vertex, and m is the number of non-ground elements in R.

To obtain the full face-generating function for the whole polytope, we just need to sum over all ordered partial partitions R. The number of such ordered partial partitions grows superexponentially, which is problematic. However, the polynomial doesn’t depend on the ordering of the parts or the specific choice of elements in each part, so we just need to sum over all partitions of each integer mn and multiply each polynomial that occurs by the number of ordered partial partitions corresponding to that particular integer partition.

Integer partitions grow subexponentially, so it is much more feasible to count them. Here is some Wolfram Mathematica code that uses this idea to compute the face-generating function for the whole polytope as a function of n:

It’s relatively quick to compute the face-generating function up to n = 30. Here are the values for n up to 8 dimensions:

The constant term counts the number of vertices, given by A000629 in the OEIS. The linear term counts the number of edges; it doesn’t appear to be in the OEIS, but by n-regularity, it’s just n/2 times the number of vertices.

The term in x^{n-1} counts the codimension-1 faces, or facets, of the polytope. This sequence is A215149 in the OEIS, and has closed form n(1 + 2^{n-1}). We can interpret this geometrically: for each standard basis vector e_i, there are 2^{n-1} facets with outward normal +e_i and one facet with outward normal -e_i; summing over all n of the standard basis vectors gives the result.

The polytopes start to get outrageously complicated for larger n. For example, when n = 30, there are more than 22 undecillion vertices, 342 undecillion edges, and 16 billion facets:

Having a convenient way to compute the k-dimensional faces of the n-dimensional polytope in this family means that this could be submitted to the Encyclopedia of Combinatorial Polytope Sequences, which catalogues objects such as simplices, hypercubes, permutohedra, and associahedra.

An infinite-dimensional polytope

As abstract polytopes, these are nested: every ordered partial partition of {1, …, n} is also an ordered partial partition of {1, …, n+1}. This means that we can take their union, and obtain an abstract infinite-dimensional polytope whose vertices are all finitely-supported ordered partial partitions of the positive integers.

Posted in Uncategorized | 2 Comments

Tensor rank paper

Robin Houston, Nathaniel Johnston, and I have established some new bounds on the tensor rank of the determinant over various fields. The paper is now available as an arXiv preprint and contains the following results:

  • A new formula for the determinant (Houston’s identity) which applies over arbitrary commutative rings and establishes an upper bound of the nth Bell number for the tensor rank of the n × n determinant. This is a superexponential improvement over the previous state-of-the-art, which was (5/6)^{\lfloor n/3 \rfloor} n!
  • Tighter upper bounds in fields of characteristic p using the same identity, including an upper bound of 2^n - n in characteristic 2.
  • Two completely independent proofs of Houston’s identity: a combinatorial proof (which is a more streamlined version of the proof from this cp4space post) and a geometric proof.
  • Mildly improved lower bounds on the tensor rank of the determinant (improving Derksen’s bound by a small additive term).
  • A computer-assisted proof that the tensor rank of the 4 × 4 determinant over the field of two elements is exactly 12, which consumed 357 core-hours of CPU time*.
  • Some geometrical motivation which relates the formula to an interesting tiling of axis-aligned polytopes, the 1-skeleton of which is interpretable as a flip graph for ordered partial partitions:

David Eppstein talks more about orthogonal polyhedra here.

*the same machine later found the Walrus, the first (and currently only known) example of an elementary c/8 diagonal spaceship in Conway’s Game of Life.

Posted in Uncategorized | Leave a comment

Matrix multiplication update

At the end of the recent post on a combinatorial proof of Houston’s identity, I ended with the following paragraph:

This may seem paradoxical, but there’s an analogous situation in fast matrix multiplication: the best known upper bound for the tensor rank of 4-by-4 matrix multiplication is 49, by applying two levels of Strassen’s algorithm, but there is a little-known method by Winograd for multiplying two 4-by-4 matrices over a commutative ring using only 48 multiplications.

DeepMind’s AlphaTensor (blog post, Nature article, source code) has since found many non-equivalent 49-multiplication algorithms for 4-by-4 matrix multiplication, as well as improving on the state-of-the-art for some other sizes of matrix multiplication tensor.

As far as I can tell, DeepMind’s new algorithms do not beat the matrix multiplication exponent implied by Smirnov’s incredible algorithm for computing a 3-by-3-by-6 rectangular matrix multiplication using only 40 multiplications:

  • Naive exponent: 3
  • Strassen exponent: log(7) / log(2) = 2.8074
  • Smirnov exponent: log(64000) / log(54) = 2.7743

There are also a few galactic algorithms (which are not used in practice because their advantage over Smirnov only applies to extremely large matrices) which improve the exponent considerably further:

  • Coppersmith-Winograd: 2.3755
  • Virginia Vassilevska Williams: 2.3729

Matrix multiplication in GL(4, 2)

DeepMind’s AlphaTensor also found a record-breaking 47-multiplication algorithm for 4-by-4 matrix multiplication over the finite field \mathbb{F}_2. One reason why 4-by-4 matrix multiplication over \mathbb{F}_2 in particular is interesting is that the group of invertible matrices, GL(4, 2), is isomorphic to the alternating group A_8.

To summarise, here are the various algorithms for 4-by-4 matrix multiplication over \mathbb{F}_2 (all except the DeepMind algorithm work over arbitrary rings):

  • Naive: 64 multiplications + 48 additions;
  • Strassen^1: 56 multiplications + 88 additions;
  • Strassen^2: 49 multiplications + 165 additions;
  • Winograd: 48 multiplications + 120 additions;
  • DeepMind: 47 multiplications + ???* additions;

where the DeepMind algorithm can be applied recursively (unlike Winograd) to yield better exponents for large matrix multiplication over this field. Note that multiplications and additions over \mathbb{F}_2 are individual AND and XOR gates, respectively.  The naive algorithm has the advantage that the circuit has a small number of gates (112) and very low depth (3).

*AlphaTensor finds low-tensor-rank decompositions for tensors, rather than arithmetic circuits, so there’s a subsequent optimisation problem to reduce the number of additions. For example, Strassen’s algorithm for 2-by-2 matrices originally cost 7 multiplications and 18 additions, but the number of additions was later reduced to 15.

Posted in Uncategorized | 2 Comments

Updates and errata

In the Treefoil article, I erroneously described John Rickard’s length-24 cycle in \mathbb{Z}^3 as being the ‘uniquely minimal’ example of a cycle whose three axis-parallel projections are all trees (see here for a more detailed history on this problem). Dan Simms has since found a second non-equivalent length-24 cycle with the same bounding box, namely {0, 1, 2}^3. Whereas Rickard’s cycle avoids the central point in that bounding box, Simms’ cycle passes through it.

Simon Tatham has exhaustively searched all such cycles in the larger bounding box {0, 1, 2, 3}^3. Up to symmetry, there are 187075 such cycles of lengths between 24 and 50, enumerated below, confirming that Rickard’s and Simms’ cycles are the only examples of length 24 in this bounding box:

{24: 2,
28: 44,
30: 20,
32: 733,
34: 1481,
36: 5705,
38: 24259,
40: 47875,
42: 57937,
44: 31899,
46: 13920,
48: 2289,
50: 911}

Some of these cycles are visualised on Simon’s webpage.

Optimum Boolean circuit update

In other news, I’ve been busy over the last few months (ergo the sparsity of cp4space articles) revisiting the 5-input 1-output Boolean circuits. In particular, I reused the methodology from the 4-input 2-output search, performing an inductive search (with n ranging up to 11) of all sets of functions obtained as an n-step Boolean chain from five inputs.

What made this possible was the observation that every Boolean function has an invariant — the ‘bias’, or absolute difference between the number of 0s and the number of 1s in the truth table — which is constant across the NPN equivalence class. As such, we can classify sets of functions by the multiset of biases, and use that to divide up our problem space into manageable chunks. This enabled the search to be performed in parallel, and under memory constraints (on a 96-thread 768 GB instance from Amazon Web Services), taking about 5 days of wall-clock time to complete.

This effort both independently checked Donald Knuth’s 2005 result about the number of equivalence classes of functions with each cost, but more helpfully provided all of the minimum-cost circuits for all functions with cost <= 11 (there’s a unique equivalence class of functions with cost 12, represented by the truth table 0x169ae443). It would have been prohibitively expensive to find all cost-12 circuits for 0x169ae443, so I settled on exhaustively searching the tree-shaped circuits for this truth table instead.

The total number of helpful circuits is significantly narrowed down by considering the delay of the circuit: for a fixed gate cost, the next most important utility function is how quickly the output of the circuit is available. Each circuit has a 5-tuple of delays (the length of the longest path from each of the 5 inputs to the output), and we restrict to those circuits which are on the Pareto frontier (i.e. not strictly dominated by another circuit). This enabled the vast majority of circuits to be discarded, leaving just 10.6 million circuits in total for the 616124 equivalence classes of nontrivial functions (the constant function and identity functions are considered trivial since they require 0 gates to implement).

After generating these circuits, they were arranged into a 210 MiB flat file for fast access: the first 8 MiB is a k-perfect hashtable (2^17 buckets of 64 bytes each, where each bucket contains up to 13 different equivalence classes of functions) mapping each function to its cost and an index into the remaining 202 MiB which contains the circuits themselves. I’ve also written a second independent program that reads this flat file and checks the validity of all of the 10.6 million circuits.

Posted in Uncategorized | 1 Comment

A combinatorial proof of Houston’s identity

Robin Houston recently discovered a rather interesting formula for the determinant of an n-by-n matrix. In particular, the formula improves upon the best known upper bound for the tensor rank of the determinant (viewed as a multilinear map which takes n vectors of dimension n — the rows of the matrix — and outputs the determinant).

Houston also provided a Mathematica notebook which allows you to verify it for a given finite n (fast for n ≤ 6; very slow for larger n). The linked example shows n = 5, for which there are 52 nonzero terms as opposed to the 120 in the standard Laplace expansion of the determinant.

The sum ranges over all partial equivalence relations (PERs) on the set of n elements. The sign of a partial equivalence relation, sgn(~), is the product, over all equivalence classes S, of (−1)^(|S|+1). The size of a partial equivalence relation, |[∼]|, is simply the number of equivalence classes.

The partial equivalence classes containing singleton classes each contribute 0 to the sum, so these terms can be neglected; the remaining terms are in bijective correspondence with the set of all equivalence classes, in other words the nth Bell number. As each term is a product, over all rows, of linear functions of each row, this establishes that the tensor rank of the determinant is upper-bounded by the nth Bell number.

Houston discovered this identity by thinking geometrically in 3 dimensions and then generalising the result; we shall present a combinatorial proof of the same identity. In particular, we show that the expanded formula simplifies to the usual signed sum over all permutations, and do so without involving any geometrical or linear-algebraic properties of the determinant.

Part I: multiplicity of non-permutations

Observe that if you fully expand Houston’s identity, the resulting monomials are of the form a_{1, f(1)} a_{2, f(2)} \cdots a_{n, f(n)} where f is some function from {1, 2, …, n} to itself.

Also, from the definition, these functions have the property that if x is a fixed point of f, then there are no other elements y such that f(y) = x. This is a strictly weaker condition than injectivity, so some of these monomials that appear in the expansion do not belong in that of the determinant; we shall therefore need to show that these monomials appear with a coefficient of 0.

Given such a function f, which terms of Houston’s identity contain the corresponding monomial when expanded out? Let’s visualise f as a directed graph on the vertex-set {1, 2, …, n} where each vertex has exactly one outgoing edge and introduce the following terminology:

  • Fixed point: a vertex x with f(x) = x (i.e. a self-loop);
  • Leaf: a vertex x such that there are no incoming edges, i.e. no y such that f(y) = x;
  • Nonleaf: any vertex that is neither a leaf nor a fixed point.

Given such an f, we can characterise precisely the partial equivalence relations ~ which, when viewed as terms in Houston’s identity, give rise to the monomial corresponding to f. In particular, they are those partial equivalence relations ~ which satisfy the following properties:

  • If x is a fixed point, then it does not appear in any equivalence class.
  • If x and y are nonleaves belonging to the same connected component of the graph, then x ~ y.
  • If x is a leaf, then either x ~ f(x) or x does not appear in any equivalence class.

Note that any connected component (other than a self-loop) must contain at least two nonleaves: if x is any vertex in the component, then it follows from the constraints on f that f(x) and f(f(x)) are two distinct nonleaves. Consequently, we can describe a compatible PER with an ordered pair of:

  • An equivalence relation R on the set of nontrivial connected components of the graph corresponding to f;
  • A (boolean-valued) indicator function ι on the set of leaves of f which specifies which leaves belong to equivalence classes in the PER.

The first term here determines the size of ~ (it’s the same as the size of R). Fixing such an equivalence relation R, the sign of ~ depends on the parity of the number of leaves l such that ι(l) = 1. In other words, if there are any leaves at all, then we have an equal number of positive and negative terms of each size, so they cancel out perfectly.

Part II: multiplicity of permutations

As such, we’ve established that the only monomials that appear with nonzero coefficients are indeed the ones corresponding to permutations! It remains to show that the coefficients are correct, but it means that the analysis is much simpler because we can henceforth assume that f is a permutation. There are no leaves at all, and the nontrivial connected components are cycles.

Letting C be the set of nontrivial cycles, recall that we have a PER ~ corresponding to each equivalence relation R on C. The corresponding term in Houston’s identity has a coefficient of:

|[\sim]|! \textrm{ sgn}(\sim) = |[R]|! \textrm{ sgn}(R) \textrm{ sgn}(f)

where sgn(f) is the sign of the permutation f. Summing over all such R, we get that the overall coefficient of the monomial corresponding to f is:

\textrm{ sgn}(f) \sum_R |[R]|! \textrm{ sgn}(R)

We want to show that this simplifies to sgn(f). We can rewrite it using Stirling numbers of the second kind:

\textrm{ sgn}(f) \sum_{k=1}^n k! (-1)^{n-k} S(n, k)

This sum is the alternating sum of the number of facets of each dimension in a solid permutohedron, so is equal to its Euler characteristic, which is 1 by contractibility. (There’s probably a more direct way to show this using inclusion-exclusion.) As such, it does indeed simplify to sgn(f), thereby establishing the validity of Houston’s identity.

Asymptotics and further discussion

As discussed, Houston’s identity establishes an upper bound of B_n (the nth Bell number) for the tensor rank of the determinant. This asymptotically saves a multiplicative factor of:

\sqrt{2 \pi n} (\log n)^n

over the Laplace expansion, which is a substantial (superexponential!) improvement. The previous state-of-the-art appears to be:

(\sqrt[3]{1.2})^n

which is merely an exponential improvement over the Laplace expansion.

For practical calculation of large determinants over a field, it is far more efficient (polynomial time instead of superexponential time) to perform Gaussian elimination to reduce the matrix into an upper-triangular form and then simply take the product along the diagonal, so these asymptotics are less interesting in practice.

However, there still may be practical value in using this algorithm for small determinants, especially in applications where a branch-free algorithm is desired and multiplications are expensive. For example, it gives an 8-multiplication formula for a 3-by-3 determinant, instead of the 9 from the Laplace expansion, although the same author later discovered a simpler 8-multiplication formula which uses fewer additions/subtractions.

Can a formula be found using fewer than 8 multiplications? Even though the tensor rank for the 3-by-3 determinant is known to be 5, which implies that any ‘multilinear’ formula costs at least 8 multiplications, there may be a nonlinear formula which accomplishes the task in fewer multiplications.

This may seem paradoxical, but there’s an analogous situation in fast matrix multiplication: the best known upper bound for the tensor rank of 4-by-4 matrix multiplication is 49, by applying two levels of Strassen’s algorithm, but there is a little-known method by Winograd for multiplying two 4-by-4 matrices over a commutative ring using only 48 multiplications.

Posted in Uncategorized | 2 Comments