In the Treefoil article, I erroneously described John Rickard’s length-24 cycle in as being the ‘uniquely minimal’ example of a cycle whose three axis-parallel projections are all trees (see here for a more detailed history on this problem). Dan Simms has since found a second non-equivalent length-24 cycle with the same bounding box, namely {0, 1, 2}^3. Whereas Rickard’s cycle avoids the central point in that bounding box, Simms’ cycle passes through it.

Simon Tatham has exhaustively searched all such cycles in the larger bounding box {0, 1, 2, 3}^3. Up to symmetry, there are 187075 such cycles of lengths between 24 and 50, enumerated below, confirming that Rickard’s and Simms’ cycles are the only examples of length 24 in this bounding box:

{24: 2, 28: 44, 30: 20, 32: 733, 34: 1481, 36: 5705, 38: 24259, 40: 47875, 42: 57937, 44: 31899, 46: 13920, 48: 2289, 50: 911}

Some of these cycles are visualised on Simon’s webpage.

### Optimum Boolean circuit update

In other news, I’ve been busy over the last few months (ergo the sparsity of cp4space articles) revisiting the 5-input 1-output Boolean circuits. In particular, I reused the methodology from the 4-input 2-output search, performing an inductive search (with *n* ranging up to 11) of all sets of functions obtained as an *n*-step Boolean chain from five inputs.

What made this possible was the observation that every Boolean function has an invariant — the ‘bias’, or absolute difference between the number of 0s and the number of 1s in the truth table — which is constant across the NPN equivalence class. As such, we can classify sets of functions by the multiset of biases, and use that to divide up our problem space into manageable chunks. This enabled the search to be performed in parallel, and under memory constraints (on a 96-thread 768 GB instance from Amazon Web Services), taking about 5 days of wall-clock time to complete.

This effort both independently checked Donald Knuth’s 2005 result about the number of equivalence classes of functions with each cost, but more helpfully provided *all* of the minimum-cost circuits for all functions with cost <= 11 (there’s a unique equivalence class of functions with cost 12, represented by the truth table 0x169ae443). It would have been prohibitively expensive to find all cost-12 circuits for 0x169ae443, so I settled on exhaustively searching the tree-shaped circuits for this truth table instead.

The total number of helpful circuits is significantly narrowed down by considering the delay of the circuit: for a fixed gate cost, the next most important utility function is how quickly the output of the circuit is available. Each circuit has a 5-tuple of delays (the length of the longest path from each of the 5 inputs to the output), and we restrict to those circuits which are on the Pareto frontier (i.e. not strictly dominated by another circuit). This enabled the vast majority of circuits to be discarded, leaving just 10.6 million circuits in total for the 616124 equivalence classes of nontrivial functions (the constant function and identity functions are considered trivial since they require 0 gates to implement).

After generating these circuits, they were arranged into a 210 MiB flat file for fast access: the first 8 MiB is a k-perfect hashtable (2^17 buckets of 64 bytes each, where each bucket contains up to 13 different equivalence classes of functions) mapping each function to its cost and an index into the remaining 202 MiB which contains the circuits themselves. I’ve also written a second independent program that reads this flat file and checks the validity of all of the 10.6 million circuits.

I’d be interested to see examples in practice of programs (or functions) which can be significantly optimized using the optimal 5-input 1-output or 4-input 2-output computations you (and others) have done. Is there a relatively simple example where such simplifications could be shown/visualized?

When I was doing a computer architecture paper at uni, there was an assignment where you had to do one of eight operations (represented by three bits), on two two digit binary numbers (for 2×2=4 bits), with a possible carry input (1 more bit), and generate a 2 bit number output with a carry. So an 8-input, 3-output function. I became mildly obsessed with finding the lowest number of gates I could do the function in, and managed to get it down to 28 (possibly with some XOR gates I can’t recall). I’m certain this is still far from optimal, and if I can find the outline for the assignment I’d be keen to see how few gates this could be accomplished in.