In 2005, Donald Knuth determined the minimum cost required to implement each of the 2^32 different 5-input 1-output Boolean functions as a circuit composed entirely of:

- 2-input gates (there are 16 of these), each of which has cost 1;
- 1-input gates (there are only 2 of these, namely the identity function and the NOT gate), each of which has cost 0.

Given that NOT gates are free, every 2-input gate is either silly (ignoring one or both of its inputs) or is equivalent to either AND or XOR.

A previous cp4space post discusses how we can efficiently determine the *NPN equivalence class* (that is to say, the equivalence class of functions up to permuting and negating the inputs and/or output(s)) of a function given its truth table, and therefore to query a database of optimal circuits for the 616126 equivalence classes.

I decided to attempt the analogous feat for 4-input 2-output functions. There are still 2^32 of them, but the equivalence classes are smaller:

- For 5-input 1-output functions, the symmetry group (by whose action we’re quotienting) has 2^5 × 5! × 1! × 2^1 = 7680 elements;
- For 4-input 2-output functions, the symmetry group has 2^4 × 4! × 2! × 2^2 = 3072 elements;

and, as such, there are more equivalence classes: 1476218, as mentioned here. This number can be calculated exactly using Burnside’s lemma, or approximated from below using the ceiling of the leading-order term: ceil(2^32 / 3072) = 1398102.

### The breadth-first search

With about 3750 CPU core-hours and a lot of memory usage, I was able to determine the optimal circuits for all of these 1476218 equivalence classes of 4-input 2-output Boolean functions. The number of classes and functions of each cost are tabulated below:

Representatives of the four classes of cost 0, for example, are:

- f(x1, x2, x3, x4) = (0, 0);
- f(x1, x2, x3, x4) = (0, x1);
- f(x1, x2, x3, x4) = (x1, x1);
- f(x1, x2, x3, x4) = (x1, x2);

and representatives for the eight classes of cost 1 are:

- f(x1, x2, x3, x4) = (0, x1 AND x2);
- f(x1, x2, x3, x4) = (0, x1 XOR x2);
- f(x1, x2, x3, x4) = (x1, x1 AND x2);
- f(x1, x2, x3, x4) = (x1, x1 XOR x2);
- f(x1, x2, x3, x4) = (x3, x1 AND x2);
- f(x1, x2, x3, x4) = (x3, x1 XOR x2);
- f(x1, x2, x3, x4) = (x1 AND x2, x1 AND x2);
- f(x1, x2, x3, x4) = (x1 XOR x2, x1 XOR x2).

The methodology was that of a breadth-first search, taking advantage of symmetry to vastly reduce the search space. The search up to depth 8, described here, was conducted using a multithreaded program (taking 115 core-hours), outputting a hefty 27-gigabyte file containing the entire search tree.

Each node in the tree at depth *n* is an equivalence class of sets of *n* distinct (from each other and their complements) nontrivial 4-input 1-output functions which can be implemented with minimum cost exactly *n*. Intuitively, the nodes in the tree represent initial segments of circuits, *up to equivalence*. Even though the tree grows super-exponentially* as a function of depth, it was still possible to explicitly compute and store the first eight levels:

- 1 node at depth 0;
- 2 nodes at depth 1;
- 15 nodes at depth 2;
- 156 nodes at depth 3;
- 2396 nodes at depth 4;
- 50865 nodes at depth 5;
- 1376962 nodes at depth 6;
- 45189111 nodes at depth 7;
- 1733295202 nodes at depth 8.

** technically, the entire tree is finite, because there are only finitely many sets of distinct 4-input Boolean functions, so ‘super-exponentially’ does not apply asymptotically. This is the same (very pedantic!) reason that it’s hard to make precise the notion that the number of positions in a Rubik’s cube that require *k* moves to solve** ‘grows exponentially’ as a function of *k* — there are only finitely many positions, and Tom Rokicki showed that indeed the tree stops at depth 20. It’s also why a population of bacteria or viruses can’t grow exponentially forever: they run out of limited resources, and the volume that they occupy is bounded above by the sphere which contains the population at time 0 and expands at the speed of light — an upper bound on volume which is cubic as a function of time!*

From there, I checked every irreducible way to append 2 more gates to one of these circuits, expanding the search to depth 10, and keeping track of the optimal solutions for each of the equivalence classes obtained in this manner. All but nineteen of the equivalence classes were solved with circuits of cost <= 10, providing a lower bound of 11 for those nineteen difficult classes. This lower bound of 11 turned out to be achievable in all nineteen cases, thereby conclusively answering the question.

### Partially verifying the results

How do we check that the results are correct?

In addition to computing the minimum cost of each of these equivalence classes, the search yielded explicit witnesses (circuits that achieve the minimum cost). A much simpler program can run through all 2^32 functions and verify that these witnesses indeed implement the purported functions. This verifies that the results are correct in one direction, showing that they are correct *upper* bounds for the minimum cost of each gate.

I can’t see any way to fully verify that they are correct *lower* bounds, without someone else independently running an exhaustive search themselves (ideally using a different algorithm) and checking that the results match the table above. This is because it’s easy to verify that a circuit is correct, but hard to verify that it’s minimum-cost.

That said, there was a certain amount of self-verification when running the breadth-first search: for example, creating an on-disk file of depth 6 and extending it by 2 gates produced identical results to the on-disk file of depth 8. (In an earlier attempt, I’d overlooked an edge-case, and these results didn’t agree — this was how I was able to catch the bug.)

What is an example of one of the maximally difficult (11 gate) equivalence classes?

You said we can be sure that these are upper bounds, but not lower bounds. Up to what number of gates can we be sure that the upper bound is the exact number (meaning, for example, do we know that for all equivalence classes where the upper bound is at most 4, that the upper bound is in fact the minimum number of gates)?

Also, once we are sure of the optimality of the circuits, will you do a similar efficient database for this (4 input 2 output), as for the 5 input 1 output case?

The 19 maximally difficult truth tables are here:

https://gitlab.com/apgoucher/variegate/-/tree/master/chainsearch#epilogue

The search was exhaustive so these _should_ be lower bounds as well: the point is that it’s helpful to have independent verification to be sure that there wasn’t a mistake in the code.

Yes, I’ve made a database for the 4-input 2-output case (64 MB in memory, 26 MB source) as well as the 5-input 1-output case (32 MB in memory, 12 MB source). I’ll probably elaborate on the hashtable structure in a future cp4space post (I’m using a semi-perfect hash where each query only needs to touch one cache line > 99% of the time, and two cache lines in the remaining cases).