At the end of the recent post on a combinatorial proof of Houston’s identity, I ended with the following paragraph:

This may seem paradoxical, but there’s an analogous situation in fast matrix multiplication: the best known upper bound for the tensor rank of 4-by-4 matrix multiplication is 49, by applying two levels of Strassen’s algorithm, but there is a little-known method by Winograd for multiplying two 4-by-4 matrices over a commutative ring using only 48 multiplications.

DeepMind’s AlphaTensor (blog post, Nature article, source code) has since found many non-equivalent 49-multiplication algorithms for 4-by-4 matrix multiplication, as well as improving on the state-of-the-art for some other sizes of matrix multiplication tensor.

As far as I can tell, DeepMind’s new algorithms do not beat the *matrix multiplication exponent* implied by Smirnov’s incredible algorithm for computing a 3-by-3-by-6 rectangular matrix multiplication using only 40 multiplications:

- Naive exponent:
**3** - Strassen exponent: log(7) / log(2) =
**2.8074** - Smirnov exponent: log(64000) / log(54) =
**2.7743**

There are also a few galactic algorithms (which are not used in practice because their advantage over Smirnov only applies to extremely large matrices) which improve the exponent considerably further:

- Coppersmith-Winograd:
**2.3755** - Virginia Vassilevska Williams:
**2.3729**

### Matrix multiplication in GL(4, 2)

DeepMind’s AlphaTensor also found a record-breaking 47-multiplication algorithm for 4-by-4 matrix multiplication over the finite field . One reason why 4-by-4 matrix multiplication over in particular is interesting is that the group of invertible matrices, GL(4, 2), is isomorphic to the alternating group A_8.

To summarise, here are the various algorithms for 4-by-4 matrix multiplication over (all except the DeepMind algorithm work over arbitrary rings):

**Naive:**64 multiplications + 48 additions;**Strassen^1:**56 multiplications + 88 additions;**Strassen^2:**49 multiplications + 165 additions;**Winograd:**48 multiplications + 120 additions;**DeepMind:**47 multiplications + ???* additions;

where the DeepMind algorithm can be applied recursively (unlike Winograd) to yield better exponents for large matrix multiplication over this field. Note that multiplications and additions over are individual AND and XOR gates, respectively. The naive algorithm has the advantage that the circuit has a small number of gates (112) and very low depth (3).

*AlphaTensor finds low-tensor-rank decompositions for tensors, rather than arithmetic circuits, so there’s a subsequent optimisation problem to reduce the number of additions. For example, Strassen’s algorithm for 2-by-2 matrices originally cost 7 multiplications and 18 additions, but the number of additions was later reduced to 15.

The DeepMind team published their formulas to their Github, using that, I calculated 340 additions for their 47 multiplications GF2 4×4 formula. The Kauers and Moosbauer follow up from a couple days later is even bigger with 414 additions. ( I calculated both of these in Maple with the code available at https://github.com/johnpmay/MapleSnippets ). It’s a pretty incredible number of additional additions to remove one multiplication.

The Winograd algorithm also doesn’t work over general rings. Its restriction is that multiplication must be commutative.