Recall how to multiply matrices and :
For each , it takes operations to calculate this sum, and so naive matrix multiplication takes operations total. Let denote the infimum of for which there exists an algorithm that computes the product of any two matrices in operations.
A 3-way tensor is a complex function of index triples, i.e., . A tensor has rank 1 if it enjoys a factorization of the form
The rank of a tensor is the size of its smallest decomposition into rank-1 tensors (like matrix rank).
Warning 1. Tensor rank is strange. Consider
Then for every , but .
Warning 2. Tensor rank is NP-hard to compute [Hastad 90].
Regardless, we use tensors (and their low-rank decompositions) to multiply matrices faster than . Define the matrix multiplication tensor as follows:
Suppose , i.e.,
Then we can use this decomposition to re-express matrix multiplication:
This can be implemented as follows (here, ; ignore the notes in brackets for now):
Initialize as the matrix of all zeros [, ]
Compute and [, ]
In each line, the first quantity in brackets counts the number of operations performed. Adding these quantities (and accounting for for loops), we get that the total complexity is . This is not very good:
Proof: Suppose . Then pick a nonzero such that for every . Then for every , meaning . Contradiction.
Now we fix and multiply matrices with . Consider blocks of size , and note that we can re-interpret the previous derivation of as a formula for the th block of in terms of block multiplication. In the algorithm described above, the second quantity in brackets reports the number of operations under this new interpretation. Due to the calculation of , the total number of operations then follows a recursion relation:
Note that . Let’s hunt for a pattern:
Since , the first term is dominant:
As such, we can multiply two matrices in only operations (even if is not a power of ; why?). To summarize:
Theorem. If , then .
Note: [Strassen 69, Landsberg 05], and so . MATLAB’s built-in matrix multiplication does not use Strassen’s algorithm due to local memory constraints, but Strassen provides speedups for large matrices ().