Recall how to multiply matrices
and
:
For each , it takes
operations to calculate this sum, and so naive matrix multiplication takes
operations total. Let
denote the infimum of
for which there exists an algorithm that computes the product of any two
matrices in
operations.
Conjecture. .
A 3-way tensor is a complex function of index triples, i.e., . A tensor has rank 1 if it enjoys a factorization of the form
The rank of a tensor is the size of its smallest decomposition into rank-1 tensors (like matrix rank).
Warning 1. Tensor rank is strange. Consider
Then for every
, but
.
Warning 2. Tensor rank is NP-hard to compute [Hastad 90].
Regardless, we use tensors (and their low-rank decompositions) to multiply matrices faster than . Define the matrix multiplication tensor as follows:
Suppose , i.e.,
Then we can use this decomposition to re-express matrix multiplication:
This can be implemented as follows (here, ; ignore the notes in brackets for now):
Initialize as the matrix of all zeros [
,
]
for
Compute and
[
,
]
[
,
]
for ,
[
,
]
endfor
endfor
In each line, the first quantity in brackets counts the number of operations performed. Adding these quantities (and accounting for for loops), we get that the total complexity is . This is not very good:
Lemma. .
Proof: Suppose . Then pick a nonzero
such that
for every
. Then
for every
, meaning
. Contradiction.
Now we fix and multiply
matrices with
. Consider blocks of size
, and note that we can re-interpret the previous derivation of
as a formula for the
th block of
in terms of block multiplication. In the algorithm described above, the second quantity in brackets reports the number of operations under this new interpretation. Due to the calculation of
, the total number of operations then follows a recursion relation:
Note that . Let’s hunt for a pattern:
Since , the first term is dominant:
As such, we can multiply two matrices in only
operations (even if
is not a power of
; why?). To summarize:
Theorem. If , then
.
Note: [Strassen 69, Landsberg 05], and so
. MATLAB’s built-in matrix multiplication does not use Strassen’s algorithm due to local memory constraints, but Strassen provides speedups for large matrices (
).