Complex Projective 4-Space

Schanuel’s conjecture and the semantics of FPSan

apgoucher — Sun, 03 May 2026 15:19:30 +0000

I’ve been spending some of my time recently developing a tool called FPSan in collaboration with Pawel Szczerbuk. It’s implemented as a Triton compiler pass, but has none of the desirable properties expected of a compiler pass: in particular, it doesn’t preserve functionality, it makes things slower, and it’s hitherto completely undocumented. (On the latter point, Pawel has an open PR adding documentation.)

Its purpose is to make it easier to verify algebraic equivalence of programs written in Triton that involve floating-point arithmetic. The key problem is that, in floating-point arithmetic, algebraic laws such as associativity do not hold exactly: in general, (a + b) + c need not equal a + (b + c). As such, if you rewrite a program in a way that relies on such laws, e.g. to replace a sequential summation loop with a parallel tree-shaped reduction, the program will no longer behave completely identically.
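A minimal illustration of the problem, using Python’s double-precision floats rather than the single-precision floats that FPSan targets (the same effect occurs in both): the two bracketings of one three-term sum disagree.

```python
# Two bracketings of the same three-term sum (Python floats are IEEE-754 doubles).
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c   # the large terms cancel exactly first, then 1.0 is added
right = a + (b + c)  # -1e20 + 1.0 rounds back to -1e20, so the 1.0 is lost

print(left, right)   # prints: 1.0 0.0
```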

FPSan can be viewed as an idempotent function on the space of programs that replaces all floating-point operations with (completely different!) integer operations, such that if f and g are algebraically equivalent programs then FPSan(f) and FPSan(g) produce identical results when given identical inputs.

More formally, conditional on the real version of Schanuel’s conjecture, this holds provided that the programs f and g have the following properties:

each program implements an arithmetic circuit on its floating-point inputs, and the control flow is independent of those floating-point inputs;
the arithmetic circuit only consists of inputs, outputs, the constants {-1.0, 0.0, +1.0}, the ring operations {−, +, ×}, and the exponential function exp.

These operations may seem somewhat restrictive, but they already encompass a vast range of the more common GPU kernels involved in machine learning: matrix multiplications and [the bulk of] self-attention are covered by FPSan’s guarantees.

The proof is deferred to the end of this article to avoid derailing the discussion. This is quite possibly the only compiler sanitiser whose correctness depends on an extremely difficult unsolved problem in transcendental number theory.

Implementation

Specifically, FPSan constructs a bijective ‘embedding function’ φ from the set of IEEE-754 single-precision floats (there are 2^32 of them) to the ring of integers modulo 2^32. The function φ is implemented as follows:

encode the float as a 32-bit word using the IEEE-754 encoding;
the uppermost bit (sign bit) is preserved;
for the remaining 31 bits, we apply a mod-2^31 multiplication by an odd constant, then a xorshift, then another mod-2^31 multiplication by an odd constant, and finally (if the sign bit was set) take the two’s complement;
interpret the 32-bit word as an integer modulo 2^32.

It’s designed to mix the bits reasonably well whilst having the properties that φ(−x) = −φ(x) for all nonzero x, φ(0.0) = 0, and φ(1.0) = 1. The ‘negative zero’ float gets mapped to 2^31, which is the other additively self-inverse element of the ring of integers modulo 2^32.

With this function, FPSan replaces:

floating-point addition fadd(x, y) with φ^-1(φ(x) + φ(y));
floating-point subtraction fsub(x, y) with φ^-1(φ(x) − φ(y));
floating-point multiplication fmul(x, y) with φ^-1(φ(x) × φ(y));
floating-point exponentiation exp(x) with φ^-1(C^φ(x)) where C is a particular constant that’s congruent to 5 (mod 8).

The last of these definitions makes use of the structure of the multiplicative group of integers modulo 2^32. In particular:

only the 2^31 odd elements, those that are 1 (mod 2), are invertible and therefore belong to the multiplicative group;
of those, the 2^30 elements that are 1 (mod 4) form a cyclic group under multiplication;
of those, the 2^29 elements that are 5 (mod 8) are generators, or equivalently have maximal period, which is why we choose C to be 5 (mod 8).
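The group structure described above can be checked exhaustively at a smaller power of two. This sketch uses 2^10 as a stand-in for 2^32 (K = 10 is an arbitrary choice to keep the brute-force search fast) and confirms that the elements that are 1 (mod 4) form a cyclic group whose generators are exactly the elements that are 5 (mod 8).

```python
K = 10            # stand-in for 32 so the brute-force search stays fast
M = 1 << K


def multiplicative_order(c, modulus):
    """Smallest e > 0 with c^e == 1 (mod modulus), for c coprime to modulus."""
    e, x = 1, c % modulus
    while x != 1:
        x = (x * c) % modulus
        e += 1
    return e


units = [c for c in range(M) if c % 2 == 1]       # the invertible elements
one_mod_4 = [c for c in units if c % 4 == 1]      # the subgroup of elements that are 1 (mod 4)
# The subgroup is cyclic iff some element's order equals its size; such elements are generators.
generators = [c for c in one_mod_4 if multiplicative_order(c, M) == len(one_mod_4)]

print(len(units), len(one_mod_4), len(generators))   # 512 256 128
print(sorted({c % 8 for c in generators}))           # [5]
```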

The map x → C^x is well-defined mod 2^32, because C^(2^32) = 1 (mod 2^32), so C^x (mod 2^32) depends only on the value of x mod 2^32. This works only because our modulus is a power of two; for an arbitrary modulus n, the exponent of the multiplicative group does not in general divide n.
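A sketch of this exponential map in Python, assuming a hypothetical value for C (the constant actually used by FPSan is not given here; any C that is 5 (mod 8) behaves the same way for these checks). Because C has multiplicative order exactly 2^30 modulo 2^32, reducing the exponent mod 2^32 does not change the result.

```python
MOD = 1 << 32
C = 0x9E3779B5  # hypothetical constant; C % 8 == 5


def exp_ring(x):
    """x -> C^x in Z/2^32; x may be any integer, since only x mod 2^32 matters."""
    return pow(C, x % MOD, MOD)


# The order of C is exactly 2^30, which divides 2^32.
assert C % 8 == 5
assert pow(C, 1 << 30, MOD) == 1 and pow(C, 1 << 29, MOD) != 1

# Homomorphism property: C^(x + y) == C^x * C^y (mod 2^32), even when x + y wraps around.
x, y = 0xDEADBEEF, 0x40000000
assert exp_ring((x + y) % MOD) == exp_ring(x) * exp_ring(y) % MOD
```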

The rewritten versions of fadd, fsub, fmul, and exp evidently obey all of the ring axioms, the identity exp(fadd(x, y)) = fmul(exp(x), exp(y)), and the relation exp(0.0) = 1.0.
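The scheme is compact enough to sketch in plain Python operating on 32-bit bit patterns. Everything specific here is an assumption: the multiplier A, the xorshift amount S and the constant C are hypothetical stand-ins for FPSan’s real constants, and the second multiplier B is derived so that φ(1.0) = 1. The description above leaves open exactly how the two’s complement step treats the sign bit, so this sketch negates the mixed 31-bit payload modulo 2^31 and keeps the sign bit; that is one reading which gives φ(−x) = −φ(x) for nonzero x and sends negative zero to 2^31. Working on bit patterns rather than Python floats avoids NaN payloads being altered by float conversions.

```python
import struct

MASK31 = (1 << 31) - 1
MASK32 = (1 << 32) - 1

# Hypothetical constants: FPSan's real multipliers and C are not given in the text.
A = 0x5DEECE6D          # odd multiplier mod 2^31
S = 23                  # xorshift amount; since 2*S >= 31 the xorshift is its own inverse
C = 0x9E3779B5          # base of the exponential; C % 8 == 5
ONE_BITS = 0x3F800000   # IEEE-754 single-precision encoding of 1.0


def _xorshift(y):
    return y ^ (y >> S)


def _pre_mix(m):
    return _xorshift((A * m) & MASK31)


# Choose the second multiplier so that the payload of 1.0 mixes to exactly 1.
# (_pre_mix(ONE_BITS) is odd for this choice of S, so the inverse exists.)
B = pow(_pre_mix(ONE_BITS), -1, 1 << 31)
A_INV = pow(A, -1, 1 << 31)
B_INV = pow(B, -1, 1 << 31)


def _mix(m):
    """Bijection on 31-bit payloads with _mix(0) == 0 and _mix(ONE_BITS) == 1."""
    return (B * _pre_mix(m)) & MASK31


def _unmix(g):
    return (A_INV * _xorshift((B_INV * g) & MASK31)) & MASK31


def phi(bits):
    """Embed a float32 bit pattern into Z/2^32."""
    sign, g = bits >> 31, _mix(bits & MASK31)
    if not sign:
        return g
    # Negative floats: negate the mixed payload mod 2^31 and keep the sign bit,
    # so phi(-x) == -phi(x) for nonzero x and -0.0 maps to 2^31.
    return (1 << 31) | ((-g) & MASK31)


def phi_inv(r):
    sign, low = r >> 31, r & MASK31
    g = (-low) & MASK31 if sign else low
    return (sign << 31) | _unmix(g)


def to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fadd(x, y):
    return phi_inv((phi(x) + phi(y)) & MASK32)


def fsub(x, y):
    return phi_inv((phi(x) - phi(y)) & MASK32)


def fmul(x, y):
    return phi_inv((phi(x) * phi(y)) & MASK32)


def fexp(x):
    return phi_inv(pow(C, phi(x), 1 << 32))
```

For example, with a, b, c = to_bits(1e20), to_bits(-1e20), to_bits(1.0), both bracketings fadd(fadd(a, b), c) and fadd(a, fadd(b, c)) return the same bit pattern, unlike ordinary float addition.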

Mixed-precision functionality

FPSan constructs an analogue of the embedding function φ for arbitrary floating-point datatypes, mapping into an integer ring of the same cardinality. To downcast from j-bit to k-bit precision, we embed our high-precision j-bit float into the ring of integers mod 2^j, then take the image mod 2^k, then unembed as a low-precision k-bit float. Upcasting is the reverse, where we choose the “sign-extending” lift from the integers mod 2^k to the integers mod 2^j; this in particular means that the constants {-1, 0, 1} survive arbitrary casting between different precisions. An upcast followed by a downcast induces the identity map; the reverse is not true because downcasting necessarily destroys information.
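At the level of the integer rings, the two casts are just reduction and sign extension. This sketch works directly on ring elements (the float encodings and the per-width embedding functions are omitted), and the function names are hypothetical.

```python
def downcast(r, k):
    """Image of r (an element of Z/2^j, j >= k) in Z/2^k: keep the low k bits."""
    return r & ((1 << k) - 1)


def upcast(r, k, j):
    """Sign-extending lift of r (0 <= r < 2^k) from Z/2^k to Z/2^j, j >= k."""
    if r >> (k - 1):                       # top bit set: treat r as the negative number r - 2^k
        return (r - (1 << k)) % (1 << j)
    return r


# Upcast followed by downcast is the identity on Z/2^8 (lifting into Z/2^16).
assert all(downcast(upcast(r, 8, 16), 8) == r for r in range(1 << 8))
# The images of -1, 0 and +1 survive casting in either direction.
assert upcast((1 << 8) - 1, 8, 16) == (1 << 16) - 1
assert upcast(0, 8, 16) == 0 and upcast(1, 8, 16) == 1
```

The reverse composition loses information: downcast(0x1234, 8) is 0x34, which lifts back to 0x34 rather than 0x1234.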

Constructing the multipliers in the embedding function and its inverse requires being able to compute efficient inverses modulo 2^k; we do this using ceil(log2(k)) iterations of the 2-adic Newton’s method.
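A sketch of the 2-adic Newton iteration, assuming it starts from x = 1 (which is correct to one bit for any odd a). Each step x ← x(2 − ax) squares the error 1 − ax, doubling the number of correct low-order bits, so ceil(log2(k)) steps suffice for k bits. The function name is hypothetical.

```python
def inverse_mod_2k(a, k):
    """Inverse of an odd integer a modulo 2^k, by 2-adic Newton iteration."""
    if a % 2 == 0:
        raise ValueError("only odd residues are invertible modulo 2^k")
    mask = (1 << k) - 1
    x = 1                                  # a * 1 == 1 (mod 2): one correct bit
    for _ in range((k - 1).bit_length()):  # ceil(log2(k)) iterations for k >= 1
        x = (x * (2 - a * x)) & mask       # the error 1 - a*x is squared each step
    return x & mask


print(hex(inverse_mod_2k(3, 32)))          # 0xaaaaaaab
```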

Pawel wrote the rules for converting Triton’s mixed-precision matrix multiplication primitive, tl.dot, into the FPSan equivalent by expanding it into floating-point scalar operations. The mixing functions φ and φ^-1 only need to be applied to each input and output element, with the core of the matrix multiplication only involving int32 multiplication and addition.
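The benefit of doing the core of the matrix multiplication in integer arithmetic is that the reduction order no longer matters. This sketch assumes the operands have already been mapped through φ (so they are plain integers mod 2^32) and compares a sequential accumulation with a pairwise tree-shaped reduction. It illustrates the principle only; it is not a transcription of Pawel’s conversion rules for tl.dot, and all names in it are hypothetical.

```python
import random

MASK32 = (1 << 32) - 1


def dot_sequential(row, col):
    """Left-to-right accumulation, as in a simple summation loop."""
    acc = 0
    for a, b in zip(row, col):
        acc = (acc + a * b) & MASK32
    return acc


def dot_tree(row, col):
    """Pairwise (tree-shaped) reduction of the same products."""
    terms = [(a * b) & MASK32 for a, b in zip(row, col)]
    if not terms:
        return 0
    while len(terms) > 1:
        paired = [(terms[i] + terms[i + 1]) & MASK32 for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])   # an odd element is carried to the next level
        terms = paired
    return terms[0]


def matmul(a, b, dot):
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


# Hypothetical operands, assumed to be already embedded via phi.
rng = random.Random(0)
A = [[rng.randrange(1 << 32) for _ in range(7)] for _ in range(3)]
B = [[rng.randrange(1 << 32) for _ in range(4)] for _ in range(7)]
print(matmul(A, B, dot_sequential) == matmul(A, B, dot_tree))  # True
```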

The proof

Now for the fun part: the proof that Schanuel implies the desired properties of FPSan.

Suppose that we have two arithmetic circuits f and g over the reals, each consisting only of inputs, outputs, the constants {-1, 0, +1}, the ring operations {−, +, ×}, and the exponential function exp. Suppose moreover that they’re equivalent, in that they produce the same real outputs for every choice of real inputs.

Assuming the real version of Schanuel’s conjecture, we have that the subring X of R generated by {0, −, +, ×, exp} is isomorphic to the free exponential ring on no generators, as proved in Macintyre 1991. If the circuits are equivalent over R, then they’re necessarily equivalent when we restrict to X, and by the isomorphism that means that they’re equivalent in the free exponential ring on no generators.

The ring of integers mod 2^32 together with the unary function C^x (where C is a particular constant that’s congruent to 5 (mod 8)) is a quotient of the free exponential ring on no generators; in particular, we can construct a surjective homomorphism θ from the free exponential ring on no generators to the ring of integers mod 2^32 by setting θ(exp(x)) = C^θ(x).
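Spelled out, θ is determined by the following rules on terms. It is well defined because the integers mod 2^32 with x ↦ C^x form an exponential ring in their own right (C^(x+y) = C^x C^y and C^0 = 1, exponents being read mod 2^32), so the universal property of the free exponential ring supplies a unique homomorphism into it:

```latex
\begin{aligned}
\theta(0) &= 0, \qquad \theta(1) = 1,\\
\theta(a + b) &= \theta(a) + \theta(b), \qquad \theta(a - b) = \theta(a) - \theta(b),\\
\theta(a \times b) &= \theta(a)\,\theta(b),\\
\theta(\exp a) &= C^{\theta(a)} \pmod{2^{32}}.
\end{aligned}
```

Surjectivity is then immediate, since θ(1) = 1 already generates the integers mod 2^32 additively.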

It follows, therefore, that the circuits remain equivalent under FPSan, as that just endows the floats with the structure of an exponential ring by pulling back the operations through the embedding function φ.

Sine and cosine

We also implement analogues of sine and cosine over the 2-adic integers by taking the real and imaginary parts of (−3/5 + 4/5 i)^n in the quadratic extension obtained by adjoining a formal symbol i satisfying i^2 = −1. These satisfy the trigonometric angle sum and difference identities, along with the usual norm identity sin(x)^2 + cos(x)^2 = 1.
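A sketch of these sine and cosine functions, working modulo 2^32 as a finite-precision stand-in for the 2-adic integers (matching FPSan’s 32-bit ring); the names are hypothetical. Elements of the extension are stored as pairs (re, im), and 1/5 exists because 5 is odd. Since −3/5 + 4/5 i is congruent to 1 modulo 4, its 2^30th power is 1 modulo 2^32, so only the angle n mod 2^32 matters.

```python
MOD = 1 << 32
INV5 = pow(5, -1, MOD)                        # 1/5 exists because 5 is odd
W = ((-3 * INV5) % MOD, (4 * INV5) % MOD)     # -3/5 + (4/5) i, stored as (re, im)


def gmul(p, q):
    """Multiply a + bi by c + di modulo 2^32, using i^2 = -1."""
    (a, b), (c, d) = p, q
    return ((a * c - b * d) % MOD, (a * d + b * c) % MOD)


def gpow(p, n):
    """p^n by square-and-multiply; reducing n mod 2^32 is valid for p = W."""
    result, base, n = (1, 0), p, n % MOD
    while n:
        if n & 1:
            result = gmul(result, base)
        base = gmul(base, base)
        n >>= 1
    return result


def cos2(n):
    return gpow(W, n)[0]


def sin2(n):
    return gpow(W, n)[1]
```

The norm identity follows because the norm (re^2 + im^2) is multiplicative and the norm of −3/5 + 4/5 i is (9 + 16)/25 = 1.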

The result (that any valid algebraic identity involving {0, 1, −, +, ×, exp} that holds over the reals also holds under FPSan, assuming Schanuel’s conjecture) can be strengthened: it remains true when sin and cos are included.

We’ll define the following two sequences of rings:

an ascending chain of subrings of the reals, {Y_0, Y_1, Y_2, …}, where Y_0 = Z and Y_{n+1} is the ring generated by adjoining Y_n with exp(x), sin(x), and cos(x) for all x in Y_n;
an ascending chain of subrings of the complex numbers, {W_0, W_1, W_2, …}, where W_0 = Z[i] and W_{n+1} is the ring generated by adjoining W_n with exp(x) for all x in W_n.

We can show by induction on n that W_n is exactly Y_n[i] (and in particular Y_n is the real part of W_n). In particular, assuming this holds for n−1, we have:

if x is in Y_{n−1}, then exp(x), cos(x) = (exp(ix) + exp(−ix))/2, and sin(x) = (exp(ix) − exp(−ix))/(2i) are all clearly in W_n;
if z is in W_{n−1}, so its real and imaginary parts a and b are in Y_{n−1}, then the real and imaginary parts of exp(z) are exp(a) cos(b) and exp(a) sin(b) which reside in Y_n.

Defining Y to be the union of Y_n for all n, and similarly defining W to be the union of W_n for all n, we have W = Y[i].

We can now repeat Macintyre’s proof idea but on the sequence of rings W_n: assuming the complex Schanuel’s conjecture, W is the free exponential ring generated by an i satisfying i^2 = −1. Any algebraic relation in Y involving {0, 1, −, +, ×, exp, cos, sin} can be converted to an algebraic relation in W involving {0, 1, i, −, +, ×, exp}, and must hold in any exponential ring where i satisfies i^2 = −1. The result follows.

Charles Corderman’s computer

apgoucher — Tue, 18 Mar 2025 22:18:56 +0000

This is an atypical post, being chiefly about the history of a rather obscure computer that was built in 1960 out of repurposed PDP parts, but it needs to be written somewhere lest it be forgotten. It has been referred to in different places as a ‘PDP-3’, ‘PDP-2½’, or by the name ‘CASINO’; based on the sources that I’ve read, these names all refer to the same unique computer designed by Charles L. Corderman.

It was the computer with which Corderman discovered the switch engine in 1971, the basis of all infinite-growth patterns that have naturally arisen from random soups in Conway’s Game of Life (as opposed to being deliberately engineered, such as Gosper’s glider gun). This discovery was first published in Volume 4 of Bob Wainwright’s LIFELINE newsletter, reproduced below:

Rich Schroeppel posted a terse note to a mailing list in July 1992 revealing more information about how Corderman discovered the switch engine, namely that it was the result of systematically running polyominos on an unusual computer.

[Corderman] had an oddball computer
configuration at a medium sized company in the
Boston suburbs. He was systematically tracking
all the polyominos, and eventually got up to
the Corderman-omino. He saw the puffer, and
word eventually reached us at MIT. I don’t
remember who found the blinker that stabilizes
the puffer, or if it was Corderman or an MIT
person.

The unique smallest polyomino which evolves into the switch engine has 9 cells, so this is most probably (but not certainly) the ‘Corderman-omino’ to which Schroeppel refers:

Much more interesting is the ‘oddball computer’ with which Corderman performed this search. Bill Gosper replied to Schroeppel’s post, detailing much more of its history, including the fact that Corderman designed and built the machine with two collaborators:

His computer was an absolutely unique (37 bits, no parity) large mainframe built and programmed by only three people, two of whom were gone when we met him. The design was begun by DEC, intended to be their PDP-2 or PDP-3, challenging IBM’s 709.

G.D. Searle (pharmaceuticals) of Waltham staked their data processing hopes on DEC’s delivery of the machine, and when DEC abandoned it, Searle bought the plans and the parts, and Corderman and his pals completed (and radically eccentrized) it.

The programming environment was *entirely* menu-driven via a light pen–an incredibly awkward pointing device consisting of little but a phototransistor that the mainframe had to query after displaying each dot. (This was late 1971 or so–years before mice and menus, except maybe Engelbart’s.)

No one had a memory big enough, nor a display fast enough for a bitmap raster.

During the demo, the Flexowriter (machine’s only keyboard) remained powered off, except briefly when he light penned it into printing a lowercase “r”, just to prove it worked.

Zero input keystrokes; one output keystroke. He had menus all the way down to twiddling individual memory bits. The screen phosphor, by the way, was magenta.

I believe word of his “switch engine” was brought to us personally by Wainwright, who entrained us for the Corderman segment of his New England Life tour.

I believe Corderman said he was exhausting the decominoes when he noticed several iterations of an almost viable switch engine. (His program “chased” live nongliders.) He then cultivated the sprout until he found the two variants. (The omino does not recur in the period, and may need to be rediscovered.)

Back in May 2023 I tried to see if there was any record online of the existence of this marvellous machine. Eventually I found a 1997 Usenet post by Max Ben-Aaron (who worked at the aforementioned company) which corroborates Gosper’s story and gives a name to this computer — ‘Casino’:

In the late 60’s & early 70’s I worked for a company (Medidata, later Searle Medidata) which started life as a not-for-profit spin-off from Lincoln Lab. (as I have heard), called American Science Institute. The chief engineer, Ed Rawson was a friend of Dec’s Olsen and he managed to get hold of the modules used for the prototype PDP-2 which never reached the market. ASI used them to build their own machine (designed, I believe, by Chuck Corderman) which they called “Casino” and was sometimes jocularly referred to as a PDP-2 1/2. Casino was noteworthy for having, very early in trhe [sic.] game, graphics capabilities. It also had some special terminals which had labels that cannot be repeated on this (family) newsgroup.

‘Chuck’ is an American nickname for ‘Charles’, so the story checks out.

I did some digging, and almost certainly ‘American Science Institute’ is a misremembering of the Scientific Engineering Institute of Waltham, MA, which (according to a book by Gordon Bell) built a PDP-3 in 1960 which later ended up in a museum in Oregon in 1974. I also found a letter referring to ‘CASINO’, this time in block capitals:

There’s a publication in a medical journal by Edward B. Rawson (the person who wrote that letter) published from the SEI in 1968; later that same year, the SEI had rebranded as Searle Medidata (now a for-profit company) and we see patent applications by Rawson. Crucially, the SEI and Searle Medidata had exactly the same postal address (140 Fourth Avenue, Waltham, MA) and both involved Rawson, so we can be pretty sure that the SEI became Searle Medidata. This also lines up precisely with Schroeppel’s description of a “medium sized company in the Boston suburbs”.

Gordon Bell, who wrote the remark about the PDP-3, was also privy to the various letters in that paper trail, and involved in establishing the DEC museum. So almost certainly the PDP-3 described by Bell is actually referring to CASINO itself, which would date its construction right back to 1960 (understandable, since that’s when the PDP-3 was going to be built anyway), coinciding with the lower bound of the 1960-1971 range. That would actually make a huge amount of sense: it would have been obsoleted by subsequent DEC machines by the late 1960s, so CASINO was no longer needed by Searle Medidata, and likely its creator (Corderman) didn’t want to see the machine go to waste so decided to repurpose it for running polyominoes in CGoL (because why not?). That would also explain why the other two coinventors of CASINO weren’t around at the time Gosper saw it: it was already eleven years old!

When I decided to search for more about the PDP-3 today, I saw that Lars Brinkhoff and ‘jnc’ had recently assembled a wiki page on CASINO. The authors had found the same online references that I did, along with another reference that I’d missed at the time: a 2009 book by Paul A. Suhler on the design of the Lockheed Blackbird. Conversely, the authors of that page are seemingly unaware that this was also the machine with which Corderman discovered the switch engine (unsurprisingly, as that can only be deduced from a 1992 e-mail that Bill Gosper posted in a private mailing list about cellular automata).

Brinkhoff later edited the page to explain the name CASINO (apparently an acronym for ‘Computer Able to select INternal Orders’), but no further references were included, so I’ll have to ask Brinkhoff directly how he determined this information.

The Wikipedia article on the PDP series is consistent with the machine having been originally built for military aviation, mentioning that “the only PDP-3 was built from DEC modules by the CIA’s Scientific Engineering Institute (SEI) in Waltham, Massachusetts to process radar cross section data for the Lockheed A-12 reconnaissance aircraft* in 1960” and reports the word size as being 36 bits. It seems that 36 bits was the word size intended by DEC for their PDP-3 range (which was never built by DEC), but Corderman’s actual physical computer had (depending on which source you believe) 37- or 38-bit words.

*the same A-12 after which Elon Musk and Grimes named one of their children

A Google search for “scientific engineering institute” “waltham” also found a blog post written in January 2025 on the history of the PDP machines, including an excerpt about CASINO from Suhler’s book. This mentions that Corderman’s two collaborators were Jay Lawson and the aforementioned Edward B. Rawson (chief engineer at SEI), and that indeed the machine had significantly diverged from DEC’s original PDP-3 architecture:

The project was run like a homebrew computer project, with more emphasis on getting the machine and software to run rather than on making it well documented and easy to use. The design evolved so rapidly that when one of the engineers returned after a two-week absence, he didn’t recognize it. The design evolved away from the original PDP-3 architecture, and it came to be called CASINO.

Given that CASINO is rumoured to have unfortunately been destroyed, it is unlikely that we will ever know the full details of this remarkable machine…

The minimal infinite threeld

apgoucher — Sun, 07 Jul 2024 21:52:49 +0000

In the post on threelds, we investigated under what conditions the additive group of one field (the ‘inner field’) could be isomorphic to the multiplicative group of another field (the ‘outer field’). To summarise, this happens in the following cases:

the outer field is F_3 and the inner field is F_2;
the outer field is F_(2^p) and the inner field is F_(2^p − 1), where 2^p − 1 is a Mersenne prime;
the outer field has characteristic 2 and every element has a unique nth root for every positive integer n.

This last case, the infinite case, is worth analysing further. We proved that there is at least one example of every infinite cardinality, by appealing to Löwenheim-Skolem, but this was rather non-constructive. In this post, we instead describe an explicit construction of such a field, which is ‘minimal’ in the sense that it embeds into every infinite field with this property.

Firstly, observe that in a field of finite characteristic p, every nonzero element that is algebraic (i.e. is a root of some nonzero polynomial with coefficients in the ring generated by 1) lies in some finite subfield, and is therefore a root of unity. As such, in our putative field of characteristic 2 where every element has a unique nth root for all n, there can be no such elements other than 1 itself: a root of unity ζ ≠ 1 of order k would be a second kth root of 1. In other words, every element other than 0 or 1 is transcendental.

If we took such a transcendental element X, then we could construct the field ‘generated’ by X under the field operations together with taking unique nth roots. This can be done completely explicitly:

enumerate all expressions (binary trees where each leaf is labelled with an element of the countable set {1, X_0, X_1, X_2, X_3, …} and each nonleaf node is labelled with an element of {+, −, ×, /});
define X_0 to be X;
for each n > 0, define X_n to be a pth root of the first well-formed* expression (in the above enumeration) that only mentions variables with indices less than n and doesn’t already have a pth root in F_2(X_0, …, X_{n−1}), where p is the (m+1)th prime and m is the 2-adic valuation of n.

*we include the constraint that no subtree evaluates to 0, thereby avoiding any division-by-zero issues.
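The bookkeeping in the last step is easy to make concrete. This sketch (hypothetical names) computes, for each step n, which prime p is used. Because every m occurs as the 2-adic valuation of infinitely many n, every prime is used infinitely often.

```python
def nth_prime(k):
    """k-th prime, 1-indexed (2, 3, 5, ...), by trial division."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate


def two_adic_valuation(n):
    """Exponent of the largest power of 2 dividing n > 0."""
    return (n & -n).bit_length() - 1


def root_degree(n):
    """Prime p used at step n > 0: the (m+1)-th prime, where m is the 2-adic valuation of n."""
    return nth_prime(two_adic_valuation(n) + 1)


print([root_degree(n) for n in range(1, 9)])   # [2, 3, 2, 5, 2, 3, 2, 7]
```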

Then we obtain a tower of fields F_2(X_0) ⊆ F_2(X_0, X_1) ⊆ F_2(X_0, X_1, X_2) ⊆ … where each field is a finite-degree extension of the previous field. The union of these fields then has the desired property that the multiplicative group forms a vector space over Q.

Each of the extensions that we take is a degree-p extension obtained by adjoining a pth root of some element. Up to isomorphism, there is only one way to do this, so by induction there is a unique field ‘generated’ by a transcendental element X. This must therefore be a subfield of the outer field of every infinite threeld, as claimed, as every such field has a transcendental element.

Note that this vector space has countably infinite dimension: the irreducible polynomials in F_2[X] form a countably infinite set of linearly independent elements in the multiplicative group (if they didn’t, this would provide a counterexample to unique factorisation in F_2[X]). This answers in the negative Thomas Blok’s question as to whether an infinite threeld can be finite-dimensional.

Every finite phoenix has period 2

apgoucher — Sat, 20 Jan 2024 16:14:50 +0000

A phoenix is an oscillator in Conway’s Life where every cell dies in every generation. The smallest example is Phoenix 1, which oscillates with period 2 and has a constant population of 12:

All known finite phoenices have period 2, and Stephen Silver proved in 2000 that there cannot exist a finite phoenix of period 3. Alex Greason more recently proved the non-existence of any phoenix (finite or infinite) with period 3 or 5.
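To make the definition concrete, here is a small Life simulator on sets of live cells, together with a check that a pattern is a phoenix of a given period: it returns to itself after that many generations and no live cell ever survives into the next generation. These are hypothetical helpers for experimentation, not code from the proof.

```python
from collections import Counter


def step(live):
    """One generation of Conway's Life (B3/S23) on a set of (x, y) cells."""
    counts = Counter((x + dx, y + dy)
                     for x, y in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if dx or dy)
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def is_phoenix(live, period):
    """True if the pattern returns after `period` generations and no live cell ever survives."""
    start = current = frozenset(live)
    for _ in range(period):
        nxt = step(current)
        if current & nxt:          # some cell was alive in two consecutive generations
            return False
        current = frozenset(nxt)
    return current == start


blinker = {(0, 0), (1, 0), (2, 0)}
print(is_phoenix(blinker, 2))      # False: the centre cell survives
```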

Infinite phoenix agars (patterns that are periodic in two directions, filling the whole plane) and wicks (patterns that are periodic in one direction) are known for certain larger periods; the forum user wwei23 recently showed the existence of phoenix wicks of all periods divisible by 6:

Construction by wwei23 showing the existence of phoenix wicks of all periods of the form 6n

It seemed as though a finite period-4 phoenix might have been possible, as Keith Amling found period-4 wicks consisting of a narrow flexible rope supported by finite period-2 supports:

Keith Amling’s flexible period-4 wick

In particular, if it were possible to bend this around somehow into a closed loop, then we would have a finite period-4 oscillator. After trying in vain for a long time to find one, it became increasingly plausible that no such oscillator exists. Eventually it was possible to prove the non-existence of finite phoenices of periods between 3 and 69, and then the non-existence of finite phoenices of any period other than 2. The proof is computer-assisted, making use of SAT solvers to automate finite case-bashes (much like Alex Greason’s disproof of p3 and p5 phoenices), but the overall structure of the proof is quite simple and human-comprehensible.

Preliminaries

Before we begin the main proof, we will establish facts about finite phoenices which help to accelerate the SAT solver by reducing the search space that it needs to explore. These facts will also be proved with the help of a SAT solver, but obviously for the avoidance of circularity we cannot assume these facts until they have been proved. In particular, the overall structure of the proof will look like:

A: only 11 of the 16 possible 2 × 2 squares can occur in a finite phoenix;
B: only 99 of the 512 possible 3 × 3 squares can occur in a finite phoenix;
C: every finite phoenix has period 2;

where we assume A when proving B, and assume B when proving C.

Suppose that we have a finite phoenix oscillator. We say that a 2 × 2 square of cells is heavy if there is some time T where at least three of the four cells in that square are live. We show that no heavy squares can exist by the following argument:

Suppose there exists such a heavy square [x, x+1] × [y, y+1];
Without loss of generality, suppose that (x, y) is maximal, with respect to the lexicographic ordering on Z^2, over all such heavy squares (we can do this because the oscillator is finite by assumption);
Let T be a generation for which the heavy square contains at least three live cells, and then consider the 16 × 16 × 8 box of cells (the 16 × 16 neighbourhood centred on the heavy square, from time T − 5 to T + 2);
Create a Boolean variable for each cell together with constraints specifying that the Life rules are followed, that every live cell dies in every generation, and that there is no heavy square [u, u+1] × [v, v+1] such that (u, v) is lexicographically greater than (x, y);
Feed the resulting constraints into a SAT solver and derive a contradiction.
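The heavy-square condition and the choice of the lexicographically maximal one are straightforward to express for a single generation. In this sketch (hypothetical names), a square is identified by its corner with the smallest coordinates, and Python’s tuple comparison supplies the lexicographic ordering.

```python
def heavy_squares(live):
    """Corners (x, y) of 2x2 squares [x, x+1] x [y, y+1] containing at least 3 live cells."""
    candidates = {(x - dx, y - dy) for x, y in live for dx in (0, 1) for dy in (0, 1)}
    return {(x, y) for x, y in candidates
            if sum((x + i, y + j) in live for i in (0, 1) for j in (0, 1)) >= 3}


def maximal_heavy_square(live):
    """Lexicographically largest heavy square, or None if there is none."""
    squares = heavy_squares(live)
    return max(squares) if squares else None   # tuples compare lexicographically


print(maximal_heavy_square({(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)}))   # (5, 5)
```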

Now that we have established that no heavy squares can exist, we can feed this in as an additional constraint to speed up subsequent SAT problems. The next thing that we do is determine the set of possible 3 × 3 neighbourhoods that can occur at some generation in a finite phoenix: we find that of the 512 possible neighbourhoods (102 up to symmetry), only 99 of these (22 up to symmetry) can occur; they were originally tabulated by forum user wwei23 here. (Apparently wwei23 was able to establish this for all phoenices, finite or infinite, with the sole exception of the Venetian blinds agar.)
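The count of 102 neighbourhoods up to symmetry can be confirmed by brute force: canonicalise each of the 512 grids by taking the smallest of its eight images under rotations and reflections. (Burnside’s lemma gives the same answer: (512 + 2·8 + 32 + 4·64)/8 = 816/8 = 102.) This checks only the number of symmetry classes, not which 99 neighbourhoods can occur.

```python
from itertools import product


def transforms(grid):
    """All 8 images of a 3x3 grid (tuple of 3 row-tuples) under rotations and reflections."""
    images = []
    g = grid
    for _ in range(4):
        g = tuple(zip(*g[::-1]))                       # rotate 90 degrees clockwise
        images.append(g)
        images.append(tuple(row[::-1] for row in g))   # and its mirror image
    return images


def canonical(grid):
    return min(transforms(grid))


grids = [tuple(tuple(bits[3 * r:3 * r + 3]) for r in range(3))
         for bits in product((0, 1), repeat=9)]
classes = {canonical(g) for g in grids}
print(len(grids), len(classes))   # 512 102
```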

These neighbourhood constraints can again be injected into any SAT problems to accelerate the search. Specifically, we combine the Life rules with these neighbourhood constraints to obtain, for every 10 cells consisting of a 3×3 neighbourhood at time t together with the central cell at time t + 1, a proposition in these 10 Boolean variables whose 1024-element truth table consists of 99 true values and 925 false values. We encode this proposition by specifying all minimal clauses that are implied by this proposition, the set of which can be determined by an algorithm by Eugenio Morreale described in the solutions to exercises 29 and 30 from 7.1.1 of Knuth’s TAOCP.
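A sketch of how such a 10-variable proposition can be built and turned into clauses, with two substitutions flagged loudly. The list of 99 allowed neighbourhoods is not reproduced here, so the example uses a weaker hypothetical filter (forbidding only neighbourhoods whose live centre would survive); and the clauses are the naive one-per-false-row encoding rather than the minimal clauses produced by Morreale’s algorithm, which is correct but much larger. Variables are numbered 1 to 10 in DIMACS style, also an assumption.

```python
from itertools import product


def life_next(nbhd):
    """Next state of the centre cell; nbhd is 9 bits in row-major order, index 4 = centre."""
    centre = nbhd[4]
    n = sum(nbhd) - centre
    return 1 if n == 3 or (centre and n == 2) else 0


def truth_table(allowed):
    """All 1024 assignments (9 neighbourhood bits + next centre bit) with their truth values."""
    return [(bits, bits[:9] in allowed and bits[9] == life_next(bits[:9]))
            for bits in product((0, 1), repeat=10)]


def naive_clauses(rows):
    """One clause per false row, violated by exactly that row (variables numbered 1..10)."""
    return [[-(i + 1) if b else i + 1 for i, b in enumerate(bits)]
            for bits, ok in rows if not ok]


# Hypothetical weaker filter standing in for the 99 phoenix neighbourhoods:
# forbid only neighbourhoods whose live centre would survive.
allowed = {n for n in product((0, 1), repeat=9) if not (n[4] and life_next(n))}
rows = truth_table(allowed)
clauses = naive_clauses(rows)
print(len(allowed), sum(ok for _, ok in rows), len(clauses))   # 428 428 596
```

With this weaker filter, the 84 excluded neighbourhoods are those with a live centre and two or three live neighbours (28 + 56), leaving 428 true rows and 596 naive clauses; the real 99-neighbourhood proposition has 925 false rows.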

The code for proving these preliminary lemmata is here.

The main proof

Suppose that we have a finite phoenix of period greater than 2. We define the extremal cell to be the cell (x, y) with the following properties:

The cell (x, y) oscillates with period greater than 2;
The value x + y is maximum amongst all cells with this property;
The value y is maximum amongst all cells with these properties.

By definition, any other cell (u, v) with u + v > x + y or with u + v = x + y and v > y will necessarily be either constantly off or oscillate with period 2.

In the diagram above, we show the extremal cell in deep purple and the surrounding 29 × 29 neighbourhood. The forced-vacuum-or-p2 cells are shown in green; the cells that are allowed to be higher period are shown in purple. We also highlight a 39-cell patch centred on the extremal cell; this will be important later in the proof.
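The choice of extremal cell and the resulting green region can be written down directly. In this sketch (hypothetical names), cells are given as a dictionary from coordinates to periods, and the green test applies the ordering from the definition to an offset (du, dv) from the extremal cell. Under this definition, the symmetry (du, dv) ↦ (−du, −dv) makes exactly half of the 840 non-central cells of the 29 × 29 window green.

```python
def extremal_cell(periods):
    """Among cells with period > 2, maximise x + y, then y (periods: dict (x, y) -> period)."""
    candidates = [cell for cell, p in periods.items() if p > 2]
    return max(candidates, key=lambda c: (c[0] + c[1], c[1]))


def is_green(du, dv):
    """Offset (du, dv) from the extremal cell lies beyond it in the ordering above."""
    return du + dv > 0 or (du + dv == 0 and dv > 0)


periods = {(0, 0): 4, (1, 0): 2, (0, 1): 3, (2, -1): 6, (3, 3): 1}
print(extremal_cell(periods))   # (0, 1)

window = range(-14, 15)         # the 29 x 29 neighbourhood
print(sum(is_green(du, dv) for du in window for dv in window))   # 420
```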

As the central cell oscillates with period greater than 2, there must be a time T for which the central cell is off at time T and on at time T+2. We now consider the 29 × 29 × 32 box of cells (the 29 × 29 neighbourhood from time T − 17 to T + 14) and create a Boolean variable for each cell; to these 26912 variables we introduce the following constraints:

every 3 × 3 neighbourhood is one of those that can appear in a phoenix, and the Life rules are obeyed;
if (u, v) is in the green region, then the variables (u, v, t) and (u, v, t+2) are equal;
the central cell (x, y, T) is off and (x, y, T+2) is on.
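A sketch of how the variables of the 29 × 29 × 32 box might be numbered and how the green-region and central-cell constraints translate into clauses. The DIMACS-style numbering and all names are assumptions, and the Life-rule and neighbourhood clauses are omitted; only the box dimensions and the variable count come from the text.

```python
SIZE, DEPTH = 29, 32   # 29 x 29 window, generations T - 17 .. T + 14
HALF = SIZE // 2       # offsets from the extremal cell run from -14 to +14
T_INDEX = 17           # layer index of generation T (layer 0 is T - 17)


def var(du, dv, layer):
    """1-based variable number for the cell at offset (du, dv) in a given layer."""
    return 1 + layer * SIZE * SIZE + (dv + HALF) * SIZE + (du + HALF)


def is_green(du, dv):
    """Cells that must be constantly off or period 2 (beyond the extremal cell)."""
    return du + dv > 0 or (du + dv == 0 and dv > 0)


def period2_clauses():
    """(u, v, t) <-> (u, v, t + 2) for every green cell, as two binary clauses each."""
    clauses = []
    for du in range(-HALF, HALF + 1):
        for dv in range(-HALF, HALF + 1):
            if not is_green(du, dv):
                continue
            for layer in range(DEPTH - 2):
                a, b = var(du, dv, layer), var(du, dv, layer + 2)
                clauses += [[-a, b], [a, -b]]
    return clauses


def centre_clauses():
    """The extremal cell is off at time T and on at time T + 2."""
    return [[-var(0, 0, T_INDEX)], [var(0, 0, T_INDEX + 2)]]


print(var(HALF, HALF, DEPTH - 1))     # 26912, the number of variables
print(len(period2_clauses()))         # 25200 = 2 * 420 green cells * 30 layer pairs
```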

We then run an incremental SAT solver (Armin Biere’s CaDiCaL) on this problem to find all possible values of the 39-cell patch at time T. It transpires that there are 20 such possibilities for the patch:

For each of these patches, we create a new SAT problem, again on a 29 × 29 × 32 box of cells (albeit shifted 2 generations backwards in time), and search for all patches that can occur 2 generations before each of these patches. Ignoring the patches above, there are a further 20 distinct patches that can occur:

We can keep repeating this process, forming a breadth-first traversal of the directed graph of possible 39-cell patches that can occur at times T − 2k in the ancestry of the central cell. We find 46 more patches in the next layer, then 35 in the next, then 5, 6, 20, 28, 13, 4, and finally 0. At this point, we have traversed all of the possible 39-cell patches (197 in total).
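The traversal itself is an ordinary breadth-first search. In this sketch, `predecessors` is a hypothetical callback standing in for the incremental SAT query that lists every patch that can occur two generations earlier, and it is exercised on a toy graph. For what it’s worth, the reported layer sizes 20, 20, 46, 35, 5, 6, 20, 28, 13 and 4 do sum to 197.

```python
def bfs_layers(start, predecessors):
    """Breadth-first traversal; returns the size of each new layer and every patch seen."""
    seen = set(start)
    layer = list(seen)
    sizes = []
    while layer:
        sizes.append(len(layer))
        new = []
        for patch in layer:
            for p in predecessors(patch):
                if p not in seen:
                    seen.add(p)
                    new.append(p)
        layer = new
    return sizes, seen


# Toy graph: each key maps to the patches that can occur two generations before it.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}
sizes, seen = bfs_layers(["a"], lambda p: toy[p])
print(sizes)   # [1, 2, 1]
```

Once the traversal terminates, the proof only needs to inspect the central cell of every patch in `seen`.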

For all of these 197 possible patches, the central cell is off, so no generation T − 2k can match generation T + 2 (where the central cell is on), and we cannot have a periodic phoenix. (In particular, if we had a phoenix of period p > 2, then generations T + 2 − 2p = T − 2(p − 1) and T + 2 would be identical, which is a contradiction.)

It therefore follows that every finite phoenix has period 2.

When I ran the program to enumerate the possible patches, I also tracked the full directed graph that contains a vertex for each of the 197 patches and a directed edge whenever a patch can be the grandparent of another patch. The reason for doing this is that I didn’t know a priori that all of the patches would have central cell off, leading to an easy proof; instead, I thought that it might be the case that some had central cell on, but that there’s no directed cycle containing such a vertex (a weaker condition, but still sufficient for the proof).
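The weaker condition can be checked with a reachability search: a vertex lies on a directed cycle exactly when it can reach itself along at least one edge. A minimal sketch, with hypothetical names and a toy graph standing in for the 197-vertex patch graph:

```python
def on_some_cycle(graph, vertex):
    """True if `vertex` can reach itself via at least one edge (iterative DFS)."""
    stack, seen = list(graph.get(vertex, ())), set()
    while stack:
        v = stack.pop()
        if v == vertex:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return False


def weaker_condition_holds(graph, central_on):
    """No patch whose central cell is on lies on a directed cycle."""
    return not any(on_some_cycle(graph, v) for v in central_on)


toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(weaker_condition_holds(toy, {"d"}), weaker_condition_holds(toy, {"b"}))   # True False
```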

The graph has 197 vertices, 1017 edges, and 10 connected components, but there’s no obvious structure beyond that:

Corollaries

In any case, the fact that every finite phoenix has period 2 leads to the corollary that every phoenix has constant population: if it’s infinite, then the population is ℵ_0; otherwise, the oscillator is finite (therefore p2) and we can consider the (disjoint) sets A and B, where A is the set of cells alive at even generations and B is the set of cells alive at odd generations. Every cell in A was just born, so it had exactly 3 live neighbours in the previous generation, i.e. exactly 3 neighbours in B; symmetrically, every cell in B has exactly 3 neighbours in A. Counting adjacent pairs (a, b) with a in A and b in B in two ways gives 3|A| = 3|B|, so the sets have equal size.

(The p2 part of this proof was done much earlier by the forum user Praosylen who looked at the 3-regular bipartite graph with vertex-classes A and B and an edge whenever two cells are adjacent, and proved other properties such as bridgelessness. All of this research now applies to all finite phoenices, since non-p2 finite phoenices have been ruled out.)

Miscellaneous discoveries

apgoucher — Sun, 23 Jul 2023 22:44:07 +0000

Soon after the previous post announcing the discovery of an aperiodic monotile by Smith, Myers, Kaplan, and Goodman-Strauss, the same authors published a second aperiodic monotile which has the property that all of the tiles are of the same orientation: reflections are not needed.

This new tile, dubbed Spectre, is Tile(1,1) from the previous paper but with perturbations on the edges to enforce that all tiles are similarly oriented. The authors provide a curvilinear realisation, but there is also an equilateral polygonal realisation with 28 edges (two adjacent edges of which are collinear, so this is equivalently a 27-gon where one edge is twice as long as the remaining 26 edges):

Unlike their previous aperiodic monotile, the ‘hat’, this monotile cannot be described as the union of tiles of an Archimedean dual tiling. It does, however, admit the above description where every vertex lies in the ring of integers of the 12th cyclotomic field.

Omniperiodicity

In other news, Conway’s Game of Life has now been proved omniperiodic (for every positive integer n, there exists an oscillator of period n), with the recent discoveries this month of the remaining two unsolved periods, 19 (by Mitchell Riley) and 41 (by Nico Brown).

The proof of omniperiodicity involves witnesses for every period up to 42, beyond which adjustable glider loops handle every period from 43 upwards. These witnesses are listed on the status page.

Tetrational diehard progress

In the same cellular automaton, there has been a significant improvement in terms of engineering a ‘diehard’ (pattern that eventually fully disappears after a large number of generations) in a bounding rectangle of area < 10000. Last year, we mentioned that Pavel Grankovskiy had engineered a diehard with a lifespan exceeding:

10^10^10^10^10^10^10^10^10

Note that this is a tetrational (iterated exponential) tower of height 9. In the last few days, various authors have increased the height of this tower considerably: Pavel Grankovskiy and a pseudonymous forum poster (toroidalet) improved the height to 15 before, in the space of a few hours, it explosively increased to 25, then 35, then 310, then 320, then 363, then 13131937954518, and now the record stands at 1.1038 × 10^1046.

The record-holder is really rather complicated and fits in a 109 × 91 box:

The pattern has been colour-coded vaguely in terms of function. The dense blobs of ‘spacedust’ are the results of SAT-solver-based predecessor searches to squeeze the pattern into this box. After running it for 120 generations, the pattern is less inscrutable:

Travelling to the southwest is a c/5 diagonal spaceship (in white) flanked by two c/4 diagonal ‘boatstretchers’ (one in mint green, and one in duck-egg blue). Most of the junk inside the body of the configuration exists as a delay mechanism to prolong the burning of the duck-egg blue fuse for as long as possible (more than eleven thousand generations). At generation 11880, we see that the fuse has commenced burning to leave a trail of loaves:

The fuse burns more quickly than the boatstretcher can extend it, and by generation 16000, it has completely burned to leave behind a trail of 478 loaves. Pairs of gliders produced by the body of the configuration head towards the receding c/5 diagonal spaceship:

When they reach the c/5 diagonal spaceship, a block is produced, and the pairs of gliders proceed to pull the block backwards, one cell at a time, until it collides with the southwesternmost loaf, cleanly annihilating it. If the distance between the spaceship and the loaf is N, then it takes 116N generations to pull the block back to the loaf (the gun is period-120, but the Doppler effect means the block is pulled with period 116), and then another 484N generations for the glider stream to catch up with the spaceship. As such, the spaceship has receded by a distance of 600N/5 = 120N, so it is 121 times further away than it was before.

This process repeats, with each of the 478 loaves multiplying the delay by a factor of 121. A few more factors of 121 are consumed by the lemon-yellow junk in the body of the mechanism, at which point the mint-green fuse is ignited.
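The arithmetic of one block-pull cycle can be checked with a few lines of Python. This is a minimal sketch that simply encodes the figures quoted above: 116N generations to pull the block back, 484N generations for the glider stream to catch up, and a c/5 spaceship. It does not derive those constants from the pattern itself.

```python
from fractions import Fraction

PULL = 116               # generations per unit of distance to pull the block back (from the text)
CATCH_UP = 484           # generations per unit of distance for the gliders to catch up (from the text)
SPEED = Fraction(1, 5)   # the c/5 diagonal spaceship

def next_distance(n):
    """Spaceship-to-loaf distance after one loaf is consumed: n + (600n)/5 = 121n."""
    elapsed = (PULL + CATCH_UP) * n
    return n + SPEED * elapsed

def distance_after(loaves, n0=1):
    """Distance after consuming the given number of loaves, starting from distance n0."""
    d = Fraction(n0)
    for _ in range(loaves):
        d = next_distance(d)
    return d
```

With 478 loaves, the initial distance is multiplied by 121^478, a number with 996 decimal digits. That is before the lemon-yellow junk contributes its extra factors of 121.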

This second fuse burns much like the first fuse, except now the scale is much larger: instead of producing 478 loaves, it produces 1.1038 × 10^1046 loaves. More precisely, the number of loaves is expressible using the following Python expression:

int('306'+'377400'*162+'38'+'78'*13+'56',11)
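To check the quoted magnitude, the expression can be evaluated directly and printed in scientific notation. The helper `scientific` is a hypothetical name used only for illustration. It works on the decimal string rather than on a float, since the value far exceeds the range of a double.

```python
# The number of loaves, written in base 11 exactly as in the expression above.
loaves = int('306' + '377400' * 162 + '38' + '78' * 13 + '56', 11)

def scientific(n, sig=5):
    """Format a positive integer as 'd.dddd × 10^e' (truncated, not rounded), avoiding floats."""
    s = str(n)
    return f"{s[0]}.{s[1:sig]} × 10^{len(s) - 1}"

print(scientific(loaves))
```

The output should agree with the 1.1038 × 10^1046 figure quoted above. The integer has 1047 decimal digits, well within Python's default limit on integer-to-string conversion.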

Here is the northeast corner of the pattern after the fuse has been ignited, simulated using a hashlife implementation. It took approximately 16 minutes to run the pattern for the 2^3480 generations necessary to reach this stage:

We have a second block-pull tractor beam mechanism, precisely reflected across the c/5 diagonal spaceship’s line of symmetry; this one is exponentially sparse (therefore not visible in the picture above), and so the times at which the loaves are removed are tetrationally sparse. Consequently, the time for the first n loaves to be destroyed is a power tower of height n. After a time expressible only as a power tower of height 1.1038 × 10^1046, all of the loaves are destroyed, and the entire mechanism cleanly self-destructs (with the help of the salmon and beige cells) to leave zero live cells.

Note that whilst hashlife is capable of simulating the 2^3480 generations necessary to burn the second fuse, it cannot handle the tetrational part of the mechanism; for that part, the lifespan relies instead on bespoke mathematical analysis of the pattern’s behaviour.

More ambitious plans

This is still at a very low rung on the fast-growing hierarchy: we would need an extra c/5d spaceship and corresponding pair of tractor beams for every 2 levels of the hierarchy, and even then we are restricted to primitive-recursive functions. If we dispense with the 10000-cell bounding box limitation, there are plans to build self-destructing Turing machines with extremely long lifespans. One of those plans involves a function that grows so quickly that it cannot be proved total in Peano arithmetic, based on the ‘primitive sequence system’. But that is a special case of something even more powerful, namely…
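For reference, the finite levels of the fast-growing hierarchy are easy to write down. This is a minimal sketch of the standard definition: f_0(n) = n + 1, and f_{k+1}(n) applies f_k to n a total of n times. Then f_1(n) = 2n and f_2(n) = n·2^n are already exponential, and f_3 grows roughly like a power tower of height n. The transfinite levels that the ordinal notations below are needed for are not covered by this sketch.

```python
def fgh(k, n):
    """Finite levels of the fast-growing hierarchy: f_0(n) = n + 1, f_{k+1}(n) = f_k^n(n)."""
    if k == 0:
        return n + 1
    x = n
    for _ in range(n):          # iterate f_{k-1} exactly n times, starting from n
        x = fgh(k - 1, x)
    return x
```

Only tiny arguments are feasible: `fgh(2, 3)` is 24, but `fgh(3, 3)` already runs to millions of digits.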

Bashicu Matrix System proved well-ordered

…the Bashicu Matrix System.

A very exciting paper by Samuel Vargovcik proves that the Bashicu Matrix System, a system of ordinal notations named after its discoverer, is well-ordered, as had been conjectured. Ordinals (up to some large but as-yet-undetermined countable ordinal) are represented as matrices of natural numbers.

The order type of height-1 matrices (‘primitive sequence system’) is ε_0, same as the Kirby-Paris hydra; the order type of height-2 matrices (‘pair sequence system’) is Buchholz’s ordinal; beyond that, the exact order types are unknown, and these larger systems were not known to be well-ordered until Vargovcik’s recent paper.

This is particularly noteworthy as other systems of ordinal notations, such as those of Buchholz, are bounded by much smaller ordinals (e.g. TFBO). The Bashicu Matrix System gives a very concise tabular representation for ordinals up to (some very large ordinal, far beyond TFBO); for example, the Bachmann-Howard ordinal is (according to this page) representable as the Bashicu matrix ((0, 0), (1, 1), (2, 2)), read in column-major order. The supremum of all height-2 matrices is the height-3 matrix ((0, 0, 0), (1, 1, 1)), which therefore corresponds to Buchholz’s ordinal.

Aperiodic monotile

apgoucher — Tue, 21 Mar 2023 07:47:17 +0000

David Smith, Joseph Myers, Craig Kaplan, and Chaim Goodman-Strauss have discovered an aperiodic monotile: a polygon that tiles the plane by rotations and reflections, but cannot tile the plane periodically.

Any tiling induced by the monotile is scalemic: the majority of tiles are unreflected, with only a relatively small proportion (shown in dark blue) being reflected. The ratio of unreflected to reflected tiles is φ^4 : 1, as can be determined from taking the authors’ description of the tiling in terms of a substitution system of ‘metatiles’:

H → 3H + 3P + 3F + T
P → 2H + P + 2F
F → 2H + P + 3F
T → H

and then taking the dominant eigenvector (corresponding to the dominant eigenvalue φ^4) to find the proportions of metatiles in the limit tiling, and using that to deduce the fraction of unreflected monotiles.

The tile itself is remarkably simple: it is a 13-sided polygon formed from the union of eight of the kites in the deltoidal trihexagonal tiling, which is itself the common refinement of the regular hexagonal tiling and its dual:

Adapted from an illustration by Tilman Piesk

This paper came a few days after the solution of another long-standing open problem in combinatorics, namely an improvement to the exponent in the upper bound on the Ramsey number R(s, s).

Additionally, a few months ago Barnabas Janzer proved that there exist two convex compact sets S and K, each homeomorphic to a closed 4-ball, such that a copy of S fits inside K in every orientation, but (remarkably!) the space of ways to embed copies of S inside K is not path-connected. This resolves a question posed by Hallard Croft.

The Osmiumlocks Prime

apgoucher — Wed, 08 Mar 2023 15:07:28 +0000

A couple of years ago I described a prime p which possesses various properties that render it useful for computing number-theoretic transforms over the field $\mathbb{F}_p$. Specifically, we have:

$$p = \Phi_{192}(2) = \Phi_6(2^{32}) = 2^{64} - 2^{32} + 1$$

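where Φ_k denotes the kth cyclotomic polynomial, and the first of these equalities uses the identity that:

$$\Phi_k(x) = \Phi_{\mathrm{rad}(k)}\left(x^{k/\mathrm{rad}(k)}\right)$$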

where rad(k) is the product of the distinct prime factors of k.

It transpires that at approximately the same time, this field and its properties were rediscovered by the authors of the cryptographic protocol Plonky2. Remco Bloemen has written an article on this prime, which he calls the Goldilocks prime. Remco’s article is very much worth reading; it contains a bunch of new things not present in my original article, such as a fast implementation of modular inversion* based on the binary GCD algorithm, as well as a comprehensive discussion of Schönhage’s trick and some closely related variants.

*which can be improved by a further 20% as implemented here.

Besides convenient real-world properties such as fitting neatly into a machine-sized integer, there are two important mathematical properties of p:

p − 1 is divisible by $2^{32}$, so $2^{32}$th roots of unity exist in the field $\mathbb{F}_p$. This supports power-of-two-length NTTs up to length $2^{32}$.
2 is a primitive 192nd root of unity, so multiplying an element of the field by a 192nd root of unity (or indeed a 384th root of unity, by Schönhage’s trick) is particularly efficient. Since 192 has a large power-of-2 divisor (64), we can use radix 64 for implementing these NTTs efficiently.
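These two properties are easy to confirm numerically. The sketch below checks the 2-adic valuation of p − 1 and the multiplicative orders of 2 and 8. It also checks a concrete instance of Schönhage’s trick: 2^24 − 2^72 is a square root of 2 modulo p, and hence a primitive 384th root of unity. The primality of p is assumed rather than checked, and the helper names are illustrative.

```python
P = 2**64 - 2**32 + 1  # the Goldilocks prime (primality assumed, not checked here)

def prime_factors(n):
    """Distinct prime factors of n by trial division (fine for small n)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_primitive_root_of_unity(g, n, p):
    """True iff g has multiplicative order exactly n modulo p."""
    return pow(g, n, p) == 1 and all(pow(g, n // r, p) != 1 for r in prime_factors(n))

# p - 1 = 2^32 * (2^32 - 1): exactly 32 factors of two, so NTT lengths up to 2^32.
assert (P - 1) % 2**32 == 0 and ((P - 1) >> 32) % 2 == 1

assert is_primitive_root_of_unity(2, 192, P)   # multiplying by a power of 2 is a shift
assert is_primitive_root_of_unity(8, 64, P)    # 2^3 supplies the radix-64 twiddle factors

# Schönhage's trick: 2^24 - 2^72 squares to 2, so it is a primitive 384th root of unity.
SQRT2 = (2**24 - 2**72) % P
assert SQRT2 * SQRT2 % P == 2
assert is_primitive_root_of_unity(SQRT2, 384, P)
```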

This prime has a much larger cousin with very similar properties:

$$q = \Phi_{1600}(2) = \Phi_{10}(2^{160}) = 2^{640} - 2^{480} + 2^{320} - 2^{160} + 1$$

We’ll call this the Osmiumlocks prime, because it’s in some sense a heavier version of the Goldilocks prime.

In this case, 2 is a primitive 1600th root of unity, and $2^{160}$th roots of unity exist in the field, so we can have extremely large power-of-two-length NTTs. This is even more fortuitous than in the case of the Goldilocks prime p: k/rad(k) = 160 is so large precisely because both prime factors of k = 1600 = 2^6 · 5^2 are repeated.

The beneficial properties of the Osmiumlocks prime q as compared with the Goldilocks prime p are that it’s sufficiently large to use for cryptography (for example, as the base field for an Edwards curve) and that it allows vastly longer power-of-two-length NTTs. You would only be limited by the maximum length of a power-of-two-length NTT over this field if you could store 2^160 words in memory; unless each word could be stored in fewer than 100 atoms, there are insufficiently many atoms inside the planet to build such a machine!

The disadvantage is that, being 640 bits long, field elements no longer fit into a 64-bit machine word. We’ll address this issue shortly.

Also, whereas multiplying by cube roots of unity was efficient in the Goldilocks field, it’s no longer efficient in the Osmiumlocks field; conversely, multiplying by fifth (and, less usefully, twenty-fifth) roots of unity is now efficient whereas it wasn’t in the Goldilocks field. All of the roots of unity in the Goldilocks field still exist in the Osmiumlocks field, though, because q − 1 is divisible by p − 1.
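The corresponding checks for q are equally short. Again, the primality of q is assumed rather than checked. The code confirms the size of q, the exact order of 2, the 2-adic valuation of q − 1, and the divisibility of q − 1 by p − 1.

```python
P = 2**64 - 2**32 + 1                  # Goldilocks prime
Y = 2**160
Q = Y**4 - Y**3 + Y**2 - Y + 1         # Phi_10(2^160) = Phi_1600(2); primality assumed

assert Q.bit_length() == 640

# 2 has order exactly 1600 = 2^6 * 5^2: 2^1600 = 1, but 2^(1600/2) and 2^(1600/5) are not 1.
assert pow(2, 1600, Q) == 1
assert pow(2, 800, Q) == Q - 1         # in fact 2^800 = -1 (mod q)
assert pow(2, 320, Q) != 1

# 2^320 is a fifth root of unity, so multiplying by fifth roots of unity is a shift.
assert pow(2**320, 5, Q) == 1

# q - 1 = 2^160 * (odd), so power-of-two NTTs up to length 2^160.
assert (Q - 1) % 2**160 == 0 and ((Q - 1) >> 160) % 2 == 1

# Every root of unity of the Goldilocks field also exists in the Osmiumlocks field.
assert (Q - 1) % (P - 1) == 0
```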

An unlikely alliance

I originally saw these two primes p and q as competitors: two primes with similar properties but very different sizes. It transpires, however, that the properties of p help to address the drawbacks of q, namely its inability to fit into a single machine word.

How would we implement arithmetic modulo q? If we represented it as 640/64 = 10 machine words, then we’d need 100 multiplication instructions (together with carries and suchlike) to implement multiplication using the ordinary method for integer multiplication. Instead, we’ll make use of the following observation:

$$2^{800} + 1 = (2^{160} + 1)\,q$$

This means that the field $\mathbb{F}_q$ embeds into the ring of integers modulo $2^{800} + 1$, so we can do our arithmetic in this larger ring (reducing modulo q at the end if necessary). How does that help? If we represent the elements in this ring as polynomials of degree less than n in a variable $X = 2^{800/n}$, then multiplying the polynomials modulo $X^n + 1$ is just a negacyclic convolution.

If we set n = 32, so that $X = 2^{25}$, then the coefficients of the polynomials are 25-bit integers. When we take a negacyclic convolution of two such polynomials, each coefficient will be the sum of 32 different 50-bit products, which (allowing for an extra bit to deal with signs) will therefore fit in 56 bits.
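Here is a minimal Python sketch of this representation, using a plain O(n²) schoolbook negacyclic product rather than NTTs. It splits an element into 32 limbs of 25 bits, which are the coefficients of a polynomial in X = 2^25. It multiplies modulo X^32 + 1, then checks two things: the result agrees with ordinary multiplication modulo q, and every coefficient fits in 56 signed bits. Python integers stand in for machine words here, and the function names are illustrative.

```python
import random

N_LIMBS = 32
LIMB_BITS = 25                              # X = 2^25, since 800 / 32 = 25
M = 2**800 + 1                              # = (2^160 + 1) * q
Q = 2**640 - 2**480 + 2**320 - 2**160 + 1

def to_limbs(x):
    """Split 0 <= x < 2^800 into the 32 coefficients (each < 2^25) of a polynomial in X."""
    mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & mask for i in range(N_LIMBS)]

def from_limbs(c):
    """Evaluate sum c_i X^i at X = 2^25 (c_i may be negative) and reduce modulo 2^800 + 1."""
    return sum(ci << (LIMB_BITS * i) for i, ci in enumerate(c)) % M

def negacyclic_schoolbook(a, b):
    """Product modulo X^n + 1: terms that wrap past X^(n-1) pick up a minus sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return c

a, b = random.randrange(Q), random.randrange(Q)
c = negacyclic_schoolbook(to_limbs(a), to_limbs(b))
assert from_limbs(c) % Q == a * b % Q       # arithmetic in the big ring agrees modulo q
assert all(abs(ci) < 2**55 for ci in c)     # each coefficient fits in 56 signed bits
```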

We can do this negacyclic convolution efficiently using NTTs if we have efficient 64th roots of unity, which indeed we have in the original Goldilocks field! We can therefore compute one of these length-32 negacyclic convolutions over the Goldilocks field $\mathbb{F}_p$ using just 32 multiplications of machine integers, a factor of 3 fewer than the 100 we would otherwise have needed.
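Here is a sketch of the NTT route under a few stated assumptions. I use a simple recursive radix-2 Cooley–Tukey transform, not the radix-64 scheme mentioned earlier. Python integers stand in for 64-bit words. The length is fixed at n = 32. The twist by powers of ψ = 8, a primitive 64th root of unity since 2 has order 192, turns the negacyclic convolution into a cyclic one, because ψ^32 = −1. The true coefficients are below 2^55 in absolute value, which is less than p/2, so they can be recovered exactly as signed integers at the end.

```python
P = 2**64 - 2**32 + 1  # Goldilocks prime
PSI = 8                # 2^3 has order 64 mod P (2 has order 192)
OMEGA = PSI * PSI % P  # 2^6 has order 32, the transform length used here

def ntt(a, w):
    """Recursive radix-2 Cooley-Tukey transform over F_P; w must have order exactly len(a)."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = w * w % P
    even, odd = ntt(a[0::2], w2), ntt(a[1::2], w2)
    out = [0] * n
    t = 1  # runs through w^k
    for k in range(n // 2):
        u, v = even[k], odd[k] * t % P
        out[k] = (u + v) % P
        out[k + n // 2] = (u - v) % P  # uses w^(n/2) = -1
        t = t * w % P
    return out

def negacyclic_ntt(a, b):
    """Signed coefficients of a*b modulo X^32 + 1, computed via NTTs over F_P."""
    n = len(a)
    psi_pows = [pow(PSI, i, P) for i in range(n)]
    # Twisting by psi^i turns the negacyclic convolution into a cyclic one (psi^n = -1).
    fa = ntt([x * s % P for x, s in zip(a, psi_pows)], OMEGA)
    fb = ntt([x * s % P for x, s in zip(b, psi_pows)], OMEGA)
    # The 32 pointwise products; all twist and twiddle factors are powers of 2, which an
    # optimised implementation would apply as shifts rather than multiplications.
    fc = [x * y % P for x, y in zip(fa, fb)]
    c = ntt(fc, pow(OMEGA, -1, P))       # inverse transform, not yet scaled by 1/n
    n_inv, psi_inv = pow(n, -1, P), pow(PSI, -1, P)
    out = []
    for i, x in enumerate(c):
        x = x * n_inv % P * pow(psi_inv, i, P) % P
        out.append(x - P if x > P // 2 else x)  # map back to a signed integer
    return out
```

Given two lists of 32 coefficients below 2^25, `negacyclic_ntt` should return exactly what a schoolbook negacyclic product would. `pow(x, -1, P)` requires Python 3.8 or later.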

We’ll need other operations, such as shifts and adds, but these tend to be cheap operations. Whether or not this is worthwhile depends on the size of the hardware multiplier. Many GPUs implement a full 64-bit product as many individual calls to a smaller (16- or 32-bit) hardware multiplier, so there we would experience a greater advantage from this NTT-based approach.

Also, the relevant operations vectorise very well: on a GPU, each warp consists of 32 threads, and we can have the ith thread in a warp hold the coefficient of $X^i$. Both the NTT butterfly operations and final carrying operations (to reduce the 56-bit coefficients back down to 25 bits) can be efficiently implemented using shuffles, which we have previously used in the context of fast Gaussian random number generation, and any arithmetic operations are performed across all threads in parallel.

Searching for optimal Boolean chains

apgoucher — Thu, 02 Mar 2023 19:07:09 +0000

I gave a half-hour talk on Tuesday about the project to search for optimal Boolean chains for all equivalence classes of 5-input 1-output and 4-input 2-output functions. The talk was not recorded, but the slides and transcript are included here for posterity. Some of the ideas in the talk haven’t been previously published, although it does follow a similar methodology to that which we introduced in a previous post.

The transcript is designed to be readable in a standalone fashion, so it incorporates both the spoken parts (in pale-yellow blocks) and the contents of the slides themselves. Both the transcript and slides were compiled from a single source file using an ad hoc build system, which preprocesses the source file into multiple separate LaTeX files which are compiled in parallel:

The build system also supports inline code in either Python or Wolfram Mathematica for drawing images, which is often more concise and expressive than drawing graphics using LaTeX’s native TikZ package. (It’s implemented using LaTeX’s \immediate\write18 commands, but with hash-based caching so that graphics are only regenerated if the corresponding part of the source file changes.)
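The hash-based caching idea can be sketched in a few lines. This is a hypothetical illustration, not the actual build system. The name `cached_render` and the `render` callback are inventions for the example. The cache key is a hash of the snippet (and its language), so an image is regenerated only when that snippet's text changes.

```python
import hashlib
from pathlib import Path

def cached_render(snippet: str, language: str, render, cache_dir: Path) -> Path:
    """Hypothetical helper: return the image for a snippet, rendering it only if not cached.

    `render(snippet, out_path)` is whatever actually runs Python or Mathematica to
    write the image; it is called only when no file exists for this snippet's hash.
    """
    key = hashlib.sha256(f"{language}\0{snippet}".encode("utf-8")).hexdigest()[:16]
    out = cache_dir / f"fig-{key}.pdf"
    if not out.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        render(snippet, out)
    return out
```

Editing any other part of the source leaves the snippet's hash unchanged, so its cached image is reused.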

The ordered partial partition polytope

apgoucher — Fri, 03 Feb 2023 13:48:55 +0000

In the tensor rank paper we introduced a new family of axis-aligned n-dimensional polytopes, one for each positive integer n. The vertices are naturally identified with ordered partial partitions (OPPs) of {1, …, n}, and the edges correspond to converting one OPP into an ‘adjacent’ OPP by moving an element in a specific way.

Here are the polytopes for n = 2 and n = 3:

Geometrically, the polytope has a simple description as the set of all vectors $x \in \mathbb{R}^n$ such that, for all k, the kth smallest coordinate of x lies in the interval [0, k]. It has volume $(n+1)^{n-1}$ and can tile the ambient space by translations.
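One way to see the volume is to decompose the polytope into unit cubes [a, a+1]^n, where a ranges over integer vectors whose sorted entries satisfy a_(k) ≤ k − 1. These vectors are the parking functions, shifted to start at 0. The cubes overlap only on their boundaries, so the volume equals their number, (n + 1)^(n − 1). This brute-force sketch counts them for small n.

```python
from itertools import product

def volume_by_unit_cubes(n):
    """Count integer vectors a in {0, ..., n-1}^n whose sorted entries satisfy a_(k) <= k - 1.

    Each such a indexes a unit cube [a, a+1]^n inside the polytope, and together these
    cubes tile it, so the count equals the polytope's volume.
    """
    count = 0
    for a in product(range(n), repeat=n):
        # enumerate() is 0-indexed, so 'x <= k' here means a_(k+1) <= k in 1-indexed terms
        if all(x <= k for k, x in enumerate(sorted(a))):
            count += 1
    return count
```

For n = 1 to 5, this gives 1, 3, 16, 125 and 1296, matching (n + 1)^(n − 1).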

But this doesn’t shed very much light on the combinatorial structure of the polytope, so it is useful to have a concrete description of the vertices, edges, and higher-dimensional faces. This was introduced in the paper, but the paper was chiefly concerned with the tensor rank of the determinant and therefore we did not delve too deeply into this.

As in the figure above, an ordered partial partition can be described by a list of disjoint subsets. For example, one of the OPPs of {1, …, 8} is:

[{3, 5}, {7}, {1, 4, 8}]

It is ordered because the order of the parts matters, and partial because some elements are allowed to be missing (in this case, 2 and 6). We can also draw this as a diagram where the parts are shown as layers, and all of the absent elements are present in a ‘ground level’ together with a sentinel value 0:

Whilst such a diagram takes up a lot more space, it is more helpful for visualising how an OPP can be transformed into an adjacent OPP. The two allowed moves are:

Taking a (nonzero) element that is the only occupant of its level (e.g. 7), and ‘merging it down’ into the next level (currently {3, 5}). The total number of levels decreases by 1.
Taking a (nonzero) element that is not the only occupant of its level (e.g. 2), and ‘separating it up’ into its own new level. The total number of levels increases by 1.

(This description is different from, but equivalent to, the one in the paper: the paper omits the ‘ground level’ and therefore instead has elements disappear and reappear; here we include this ‘ground level’ explicitly to make some of the descriptions more concise.)

For a given element i ∈ {1, …, n}, exactly one of these two moves is possible (depending on whether or not it is the only occupant of its level), and the two moves are inverses of each other.

The polytope and its face lattice

For a fixed n, we define a polytope whose vertices correspond to the OPPs of {1, …, n} and edges join pairs of OPPs if they can be interchanged by moving an element according to the aforementioned rules. Each vertex is incident with n edges, because from each OPP, we can move any of the n elements in exactly one way.

What about the higher-dimensional faces? A k-dimensional face is given by taking an OPP, allowing some size-k subset of the elements to be ‘movable’, and the remaining elements to be ‘immovable’. For example, we might make {2,3,7} movable elements, and the reachable configurations define a 3-dimensional face. Here we show the same OPP as before, but with the movable elements coloured in red:

We have also placed a solid black ‘floor’ on any level that contains an immovable element, because it is impossible for any elements to ever pass through those floors whilst subject to the rules we’ve described.

Which OPPs are reachable from this configuration? The element ‘2’ can stay on the ground level or move up into its own new level, so it has two possible locations. There are six possible configurations for the elements {3, 7}, depending on their relative orderings (3 possibilities) and whether or not at least one of them shares a level with ‘5’ (2 possibilities). Together, that means that this three-dimensional face has 12 vertices (where, by convention, we have omitted the contents of the ground level):

[{3, 5, 7}, {1, 4, 8}]
[{3, 5}, {7}, {1, 4, 8}]
[{5}, {3}, {7}, {1, 4, 8}]
[{5}, {3, 7}, {1, 4, 8}]
[{5}, {7}, {3}, {1, 4, 8}]
[{5, 7}, {3}, {1, 4, 8}]
[{2}, {5, 7}, {3}, {1, 4, 8}]
[{2}, {5}, {7}, {3}, {1, 4, 8}]
[{2}, {5}, {3, 7}, {1, 4, 8}]
[{2}, {5}, {3}, {7}, {1, 4, 8}]
[{2}, {3, 5}, {7}, {1, 4, 8}]
[{2}, {3, 5, 7}, {1, 4, 8}]

We have enumerated them in the order of a Hamiltonian cycle, making it apparent that these are indeed all reachable and therefore form a face of our polytope.
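The two moves are straightforward to implement. The sketch below represents an OPP as a tuple of frozensets listing the non-ground levels from bottom to top. The ground level, shared with the sentinel 0, is left implicit. A face is then enumerated by searching over the OPPs reachable using only the movable elements. The function names are illustrative.

```python
def move(opp, i):
    """Apply the unique legal move of element i to an OPP (tuple of levels, bottom first)."""
    levels = [set(lv) for lv in opp]
    for j, level in enumerate(levels):
        if i in level:
            break
    else:
        # i is on the ground level, which it shares with the sentinel 0: separate it up.
        return (frozenset({i}),) + tuple(frozenset(lv) for lv in levels)
    if len(level) == 1:
        del levels[j]               # i is the only occupant: merge it down into the level
        if j > 0:                   # below, or into the ground level if j == 0
            levels[j - 1].add(i)
    else:
        level.remove(i)             # i shares its level: separate it up into a new level
        levels.insert(j + 1, {i})   # directly above its old one
    return tuple(frozenset(lv) for lv in levels)

def face(opp, movable):
    """All OPPs reachable from opp by moving only the given elements (the face's vertices)."""
    start = tuple(frozenset(lv) for lv in opp)
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for i in movable:
            w = move(v, i)
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen
```

`face([{3, 5}, {7}, {1, 4, 8}], {2, 3, 7})` should return exactly the 12 OPPs listed above. Making every element of {1, 2, 3} movable, starting from the empty OPP, gives all 26 vertices of the n = 3 polytope.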

Counting the faces

Note that any face has a unique ‘minimum-energy’ vertex: the ordered partial partition where we merge all movable elements down into the next level containing at least one immovable element. To count the faces in the whole polytope, it therefore suffices to count, for each ordered partial partition, the faces that have that particular OPP as their minimum-energy vertex.

A vertex is minimum-energy if and only if every non-ground level contains at least one immovable element. This means that for a particular ordered partial partition R, the set of movable elements must be chosen by taking some (possibly empty) proper subset of each non-ground level P ∈ R together with an arbitrary subset of the ground level. The generating function of faces with R as its minimum-energy vertex is therefore the polynomial:

$$(1 + x)^{n - m} \prod_{P \in R} \left( (1 + x)^{|P|} - x^{|P|} \right)$$

where the coefficient of $x^k$ counts the number of k-dimensional faces with R as its minimum-energy vertex, and m is the number of non-ground elements in R.

To obtain the full face-generating function for the whole polytope, we just need to sum over all ordered partial partitions R. The number of such ordered partial partitions grows superexponentially, which is problematic. However, the polynomial doesn’t depend on the ordering of the parts or the specific choice of elements in each part, so we just need to sum over all partitions of each integer m ≤ n and multiply each polynomial that occurs by the number of ordered partial partitions corresponding to that particular integer partition.

Integer partitions grow subexponentially, so it is much more feasible to count them. Here is some Wolfram Mathematica code that uses this idea to compute the face-generating function for the whole polytope as a function of n:

It’s relatively quick to compute the face-generating function up to n = 30. Here are the values for n up to 8:

The constant term counts the number of vertices, given by A000629 in the OEIS. The linear term counts the number of edges; it doesn’t appear to be in the OEIS, but by n-regularity, it’s just n/2 times the number of vertices.

The term in $x^{n-1}$ counts the codimension-1 faces, or facets, of the polytope. This sequence is A215149 in the OEIS, and has closed form $n(2^{n-1} + 1)$. We can interpret this geometrically: for each standard basis vector $e_i$, there are $2^{n-1}$ facets with outward normal $e_i$ and one facet with outward normal $-e_i$; summing over all n of the standard basis vectors gives the result.
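The following is an independent Python sketch of the partition-based computation described above. It is a re-implementation, not the Mathematica code mentioned earlier. For each integer partition of m ≤ n, it multiplies the per-OPP polynomial by the number of OPPs with that shape: C(n, m) ways to choose the non-ground elements, times the number of ordered set partitions of them into levels of those sizes.

```python
from collections import Counter
from math import comb, factorial

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def partitions(m, largest=None):
    """Yield the integer partitions of m as non-increasing tuples."""
    if largest is None:
        largest = m
    if m == 0:
        yield ()
        return
    for first in range(min(m, largest), 0, -1):
        for rest in partitions(m - first, first):
            yield (first,) + rest

def face_polynomial(n):
    """Coefficient list [f_0, ..., f_n], where f_k is the number of k-dimensional faces."""
    total = [0] * (n + 1)
    for m in range(n + 1):                     # m = number of non-ground elements
        for shape in partitions(m):            # multiset of non-ground level sizes
            orders = factorial(len(shape))     # orderings of the levels, up to
            for mult in Counter(shape).values():
                orders //= factorial(mult)     # swapping levels of equal size
            fillings = factorial(m)            # ways to distribute the m elements
            for size in shape:
                fillings //= factorial(size)
            count = comb(n, m) * orders * fillings
            # (1 + x)^(n - m) * prod over levels of ((1 + x)^|P| - x^|P|)
            poly = [comb(n - m, j) for j in range(n - m + 1)]
            for size in shape:
                poly = poly_mul(poly, [comb(size, j) for j in range(size)])
            for k, coeff in enumerate(poly):
                total[k] += count * coeff
    return total
```

For example, `face_polynomial(3)` gives `[26, 39, 15, 1]`: 26 vertices, 39 edges, 15 facets and the polytope itself. This pure-Python version is fine for small n, but it gets slow as the number of partitions grows.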

The polytopes start to get outrageously complicated for larger n. For example, when n = 30, there are more than 22 undecillion vertices, 342 undecillion edges, and 16 billion facets:

Having a convenient way to compute the k-dimensional faces of the n-dimensional polytope in this family means that this could be submitted to the Encyclopedia of Combinatorial Polytope Sequences, which catalogues objects such as simplices, hypercubes, permutohedra, and associahedra.

An infinite-dimensional polytope

As abstract polytopes, these are nested: every ordered partial partition of {1, …, n} is also an ordered partial partition of {1, …, n+1}. This means that we can take their union, and obtain an abstract infinite-dimensional polytope whose vertices are all finitely-supported ordered partial partitions of the positive integers.

Tensor rank paper

apgoucher — Wed, 18 Jan 2023 12:00:36 +0000

Robin Houston, Nathaniel Johnston, and I have established some new bounds on the tensor rank of the determinant over various fields. The paper is now available as an arXiv preprint and contains the following results:

A new formula for the determinant (Houston’s identity) which applies over arbitrary commutative rings and establishes an upper bound of the nth Bell number for the tensor rank of the n × n determinant. This is a superexponential improvement over the previous state-of-the-art, which was
Tighter upper bounds in fields of characteristic p using the same identity, including an upper bound of in characteristic 2.
Two completely independent proofs of Houston’s identity: a combinatorial proof (which is a more streamlined version of the proof from this cp4space post) and a geometric proof.
Mildly improved lower bounds on the tensor rank of the determinant (improving Derksen’s bound by a small additive term).
A computer-assisted proof that the tensor rank of the 4 × 4 determinant over the field of two elements is exactly 12, which consumed 357 core-hours of CPU time*.
Some geometrical motivation which relates the formula to an interesting tiling of axis-aligned polytopes, the 1-skeleton of which is interpretable as a flip graph for ordered partial partitions:

David Eppstein talks more about orthogonal polyhedra here.

*the same machine later found the Walrus, the first (and currently only known) example of an elementary c/8 diagonal spaceship in Conway’s Game of Life.