Unless you’ve been living under a rock, you probably know about Polymath8 and its dramatic success. In May, Yitang Zhang published a proof that there are infinitely many pairs of primes which differ by 70 million or less, and since then, the Polymath8 project has decreased this bound considerably. In a recent blog entry, Terry Tao took some time to reflect on why this Polymath project received so much attention. Here is one of the reasons he points out:
One could summarise the current state of progress by a single natural number $H$, which implied by infinite descent that the project was guaranteed to terminate at some point, but also made it possible to set up a “scoreboard” that could be quickly and easily updated.
In reading this, I was reminded of another big open problem — perhaps the biggest open problem in compressed sensing:
The deterministic RIP matrix problem. Construct an infinite family of deterministic matrices $\{\Phi_i\}_{i=1}^{\infty}$, where $\Phi_i$ is $M_i\times N_i$ and $M_i$ and $N_i/M_i$ are both arbitrarily large, such that each $\Phi_i$ satisfies the $(K_i,\delta)$-restricted isometry property (RIP) with $\delta<\sqrt{2}-1$ for all $i$ and with $K_i=\omega(M_i^{1/2})$.
For those who don’t know, a matrix $\Phi$ is said to be $(K,\delta)$-RIP if for every vector $x$ with at most $K$ nonzero entries, we have

$(1-\delta)\|x\|^2 \leq \|\Phi x\|^2 \leq (1+\delta)\|x\|^2.$
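For concreteness, here is a brute-force computation of the smallest $\delta$ for which a given matrix is $(K,\delta)$-RIP (a minimal numpy sketch; certifying RIP this way requires checking every support, which is exactly why the problem is hard):

```python
import itertools

import numpy as np

def rip_constant(Phi, K):
    """Smallest delta for which Phi is (K, delta)-RIP, by brute force.

    For each support S with |S| = K, the extreme eigenvalues of the Gram
    matrix Phi_S^T Phi_S measure how far Phi_S is from an isometry.
    Exponential in N, so only feasible for toy sizes.
    """
    _, N = Phi.shape
    delta = 0.0
    for S in itertools.combinations(range(N), K):
        G = Phi[:, list(S)].T @ Phi[:, list(S)]
        eigs = np.linalg.eigvalsh(G)
        delta = max(delta, abs(eigs[0] - 1), abs(eigs[-1] - 1))
    return delta

rng = np.random.default_rng(0)
M, N, K = 20, 40, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # columns have norm ~ 1 on average
print(rip_constant(Phi, K))
```

By Cauchy interlacing, the computed $\delta$ is monotone in $K$, which is a handy sanity check on the implementation.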
Matrices which satisfy this property are particularly well suited as compressed sensing matrices, as they allow sparse signals $x$ to be efficiently reconstructed from measurements of the form $y=\Phi x$ (and noisy variants thereof). At the moment, the RIP matrices with the fewest number of rows (i.e., fewest number of measurements) are constructed using random processes: if $\Phi$ is $M\times N$, then $M$ can scale like $K\log(N/K)$. However, deterministic matrices are almost invariably shown to be RIP using coherence-based arguments (e.g., using the Gershgorin circle theorem to produce a bound on $\delta$), and these arguments inevitably lead to $M$ scaling like $K^2$ (i.e., many more measurements than in the random case).
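To illustrate the recovery problem itself (not any particular construction in this post), here is a minimal numpy sketch of one standard reconstruction algorithm, orthogonal matching pursuit; exact recovery is typical for Gaussian $\Phi$ at these sizes, but not guaranteed:

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal matching pursuit: greedily recover a K-sparse x from y = Phi x."""
    _, N = Phi.shape
    residual, support = y.copy(), []
    for _ in range(K):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on the chosen support, then update the residual
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
M, N, K = 40, 100, 4
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x_hat = omp(Phi, Phi @ x, K)
print(np.linalg.norm(x - x_hat))
```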
Three years after Terry Tao posed this problem on his blog, the problem was partially solved in this breakthrough paper:
J. Bourgain, S. Dilworth, K. Ford, S. Konyagin, D. Kutzarova, Explicit constructions of RIP matrices and related problems, Duke Math. J., 2011.
This paper provides an explicit family of RIP matrices (of arbitrarily large aspect ratio) in which $K$ scales like $M^{1/2+\epsilon_0}$ for some $\epsilon_0>0$. Instead of applying the Gershgorin circle theorem to estimate $\delta$ in terms of coherence, they leverage additive combinatorics to demonstrate certain cancellations in the Gram matrix $\Phi_S^*\Phi_S$. Today (three years later), this is the only known deterministic matrix construction which breaks the square-root bottleneck.
It appears as though the paper by Bourgain et al. provides the only known techniques for attacking the deterministic RIP matrix problem. This leads to three natural questions:
1. What are the techniques?
2. How big is $\epsilon_0$?
and in the spirit of Polymath8:
3. Can we optimize their analysis to increase $\epsilon_0$?
The purpose of this blog entry is to start answering these questions, and perhaps kickstart an online project to increase $\epsilon_0$.
— The big-picture techniques —
Before explaining how Bourgain et al. beat the Gershgorin technique, let’s briefly discuss this technique. First, let $\Phi_S$ denote the submatrix of $\Phi$ whose columns are indexed by $S\subseteq\{1,\ldots,N\}$. Then $(K,\delta)$-RIP equivalently states that, for every $S$ of size $\leq K$, the eigenvalues of $\Phi_S^*\Phi_S$ lie in $[1-\delta,1+\delta]$. As such, we can prove that a matrix is RIP by approximating eigenvalues. To this end, if we assume the columns of $\Phi$ have unit norm, and if we let $\mu$ denote the largest off-diagonal entry of $\Phi^*\Phi$ in absolute value (this is the worst-case coherence of the columns of $\Phi$), then the Gershgorin circle theorem implies that $\Phi$ is $(K,(K-1)\mu)$-RIP. Unfortunately, the coherence can’t be too small, due to the Welch bound:

$\mu \geq \sqrt{\dfrac{N-M}{M(N-1)}},$
which is $\Omega(M^{-1/2})$ provided $N\geq cM$ for some $c>1$. Thus, to get $\delta=(K-1)\mu<1$, we require $K=O(M^{1/2})$. This is much smaller than the random RIP constructions, which instead take $K$ on the order of $M/\log(N/K)$, thereby revealing the shortcoming of the Gershgorin technique. We’ll now discuss the alternative techniques that Bourgain et al. use. To be clear, this blog entry will not discuss the techniques which are specific to their particular matrix construction — these are where additive combinatorics make an appearance, and will be the subject of a future blog entry.
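These quantities are easy to compute numerically. Here is a quick sketch comparing the coherence of a random matrix against the Welch bound, together with the sparsity level that the Gershgorin argument certifies:

```python
import numpy as np

def coherence(Phi):
    """Worst-case coherence: max |<phi_i, phi_j>| over i != j, after normalizing columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

def welch_bound(M, N):
    """Lower bound on the coherence of any N unit-norm vectors in R^M (N > M)."""
    return np.sqrt((N - M) / (M * (N - 1)))

rng = np.random.default_rng(0)
M, N = 64, 256
Phi = rng.standard_normal((M, N))
mu = coherence(Phi)
# Gershgorin: unit-norm columns with coherence mu give (K, (K-1)*mu)-RIP,
# so delta < 1 only certifies sparsity up to roughly 1/mu = O(sqrt(M)).
K_gershgorin = int(1 / mu) + 1
print(mu, welch_bound(M, N), K_gershgorin)
```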
The main idea is to convert the RIP statement, which concerns all $K$-sparse vectors simultaneously, into a statement about finitely many vectors:
Definition (flat RIP). We say $\Phi$ satisfies $(K,\theta)$-flat RIP if for every disjoint $S,S'\subseteq\{1,\ldots,N\}$ of size $\leq K$,

$\displaystyle\bigg|\bigg\langle \sum_{i\in S}\varphi_i, \sum_{j\in S'}\varphi_j \bigg\rangle\bigg| \leq \theta\sqrt{|S||S'|},$

where $\varphi_i$ denotes the $i$th column of $\Phi$.
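Here is a brute-force sketch of the flat RIP parameter, again only feasible at toy sizes — it maximizes $|\langle \sum_{i\in S}\varphi_i, \sum_{j\in S'}\varphi_j\rangle| / \sqrt{|S||S'|}$ over pairs of disjoint supports of size at most $K$:

```python
import itertools

import numpy as np

def flat_rip_theta(Phi, K):
    """Smallest theta such that Phi is (K, theta)-flat RIP, by brute force."""
    _, N = Phi.shape
    theta = 0.0
    for s in range(1, K + 1):
        for S in itertools.combinations(range(N), s):
            vS = Phi[:, list(S)].sum(axis=1)  # sum of columns indexed by S
            rest = [j for j in range(N) if j not in S]
            for t in range(1, K + 1):
                for Sp in itertools.combinations(rest, t):
                    val = abs(vS @ Phi[:, list(Sp)].sum(axis=1))
                    theta = max(theta, val / np.sqrt(s * t))
    return theta

rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 10))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns
print(flat_rip_theta(Phi, 2))
```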
Lemma A (essentially Lemma 3, cf. Theorem 13 in this paper). If $\Phi$ has $(K,\theta)$-flat RIP and unit-norm columns, then $\Phi$ has $(K,C\theta\log K)$-RIP for an absolute constant $C$.
Unlike the coherence argument, flat RIP doesn’t lead to much loss in $K$. In particular, this paper shows that random matrices satisfy $(K,\theta)$-flat RIP with $\theta=O(\sqrt{K/M})$ when $M=\Omega(K\log(N/K))$. As such, it makes sense that flat RIP would be a vehicle to break the square-root bottleneck. However, in practice, it’s difficult to control both the left- and right-hand sides of the flat RIP inequality — it would be much easier if we only had to worry about getting cancellations, and not getting different levels of cancellation for different-sized subsets. This leads to the following:
Definition (weak flat RIP). We say $\Phi$ satisfies $(K,\theta)$-weak flat RIP if for every disjoint $S,S'\subseteq\{1,\ldots,N\}$ of size $\leq K$,

$\displaystyle\bigg|\bigg\langle \sum_{i\in S}\varphi_i, \sum_{j\in S'}\varphi_j \bigg\rangle\bigg| \leq \theta K.$
Lemma B (essentially Lemma 1). If $\Phi$ has $(K,\theta)$-weak flat RIP and worst-case coherence $\mu$, then $\Phi$ has $(K,\sqrt{\theta\mu K})$-flat RIP.
Proof: By the triangle inequality, we have

$\displaystyle\bigg|\bigg\langle \sum_{i\in S}\varphi_i, \sum_{j\in S'}\varphi_j \bigg\rangle\bigg| \leq \sum_{i\in S}\sum_{j\in S'}|\langle\varphi_i,\varphi_j\rangle| \leq |S||S'|\mu.$

Since $\Phi$ also has weak flat RIP, we then have

$\displaystyle\bigg|\bigg\langle \sum_{i\in S}\varphi_i, \sum_{j\in S'}\varphi_j \bigg\rangle\bigg| \leq \min\{|S||S'|\mu,\ \theta K\} \leq \sqrt{|S||S'|\mu\cdot\theta K} = \sqrt{\theta\mu K}\cdot\sqrt{|S||S'|}. \qquad\Box$
Unfortunately, this coherence factor threatens to put us back in the square-root bottleneck: by the Welch bound, $\mu=\Omega(M^{-1/2})$, so once $K$ is on the order of $\sqrt{M}$, the product $\mu K$ is no longer small, and any gain must come from $\theta$. To rectify this, they use a trick in which a modest $K$ with tiny $\delta$ can be converted to a large $K$ with modest $\delta$:
Lemma C (buried in Lemma 3, cf. Theorem 1 in this paper). If $\Phi$ has $(K,\delta)$-RIP, then $\Phi$ has $(sK,2s\delta)$-RIP for every positive integer $s$.
This paper used this trick to get RIP results for larger $K$ when testing RIP for smaller $K$. For the deterministic RIP matrix problem, we are stuck with proving how small $\delta$ is when $K$ is on the order of $\sqrt{M}$. Note that this trick will inherently exhibit some loss in $K$. Assuming the best possible scaling $\delta=O(\sqrt{K_0/M})$ for all $K_0$, and $K_0$ is $O(\sqrt{M})$, then taking $K_0=\sqrt{M}$ gives $\delta=O(M^{-1/4})$, and so you can get $(K,\delta)$-RIP only if $K=O(M^{3/4})$. In this best-case scenario, you would want to pick $s=M^{1/4-\epsilon}$ for some $\epsilon>0$ and apply Lemma C to get $K=sK_0=M^{3/4-\epsilon}$. In some sense, this is another manifestation of the square-root bottleneck, but it would still be a huge achievement to saturate this bound.
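The exponent bookkeeping behind this ceiling fits in a tiny script (a sketch assuming the best-case scaling $\delta_0\sim\sqrt{K_0/M}$ and a factor-$2s$ loss in the Lemma C upgrade — both assumptions, not results from the paper):

```python
# Exponent bookkeeping for the Lemma C trick. Write K0 = M^a, and assume the
# best-case scaling delta_0 ~ sqrt(K0 / M) = M^{(a - 1)/2}. Lemma C upgrades
# (K0, delta_0)-RIP to (s*K0, 2*s*delta_0)-RIP, so keeping the final delta
# below a constant allows s ~ M^{(1 - a)/2}, and the upgraded sparsity is
# K = s * K0 = M^{(1 + a)/2}.
def final_exponent(a):
    return (1 + a) / 2

# Coherence-limited proofs only certify K0 up to sqrt(M), i.e. a <= 1/2:
best = max(final_exponent(a / 100) for a in range(51))
print(best)  # -> 0.75, i.e. K = O(M^{3/4}) at best via this route
```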
— How big is $\epsilon_0$? —
Before evaluating how big $\epsilon_0$ actually is, first note that the goal of this paper was only to establish that the bottleneck is breakable, i.e., to prove the following theorem:
Theorem (Theorem 1). There is an effective constant $\epsilon_0>0$ and an explicit number $M_0$ such that, for any positive integers $M\geq M_0$ and $N$, there is an explicit $M\times N$ matrix which is $(K,\delta)$-RIP with $K=M^{1/2+\epsilon_0}$ and $\delta=o(1)$.
Due to this difference in objective, the analysis in this paper is perhaps ripe for optimization. But before we start optimizing, it would be nice to see what the current record is. Halfway down page 155 of the journal version of the paper, the authors discuss how to use their results to produce the $\epsilon_0$ in the above theorem (note: I think the references to Corollary 1 in this paragraph should actually be references to Lemma 1). The main ingredient here is Lemma 2, which gives that their explicit construction satisfies weak flat RIP with an explicit bound on $\theta$. That bound involves a constant from Proposition 2 of this paper, a second constant which can be fixed in accordance with Corollary 4, two parameters of the matrix construction, and a free variable constrained to be a positive even integer. To prove Theorem 1, they make one particular choice of this free variable, but since our problem only requires a fixed $\delta<\sqrt{2}-1$, we can afford a more aggressive choice. Indeed, weak flat RIP implies flat RIP by Lemma B, which in turn implies RIP by Lemma A, and applying Lemma C then upgrades the sparsity level.
At this point, we wish to establish how big $\epsilon_0$ can be, given these ingredients. By elementary calculus, the resulting exponent can be maximized over the free variable, and plugging the maximizer back in yields the largest $\epsilon_0$ that the analysis (as is) supports. The resulting $\epsilon_0$ is astronomically small — and this is the record to beat.
— A way ahead —
Considering the calculation of $\epsilon_0$, it is clear that there are several parameters whose improvement would lead to big increases in $\epsilon_0$. By far the biggest contribution to the denominator of $\epsilon_0$ comes from the constant in Proposition 2, and the authors remark that optimizing this constant would be interesting in its own right (as a result in additive combinatorics). It’s unclear to me how amenable this constant is to optimization, but it’s certainly worth a try.
At the moment, I think the easiest gains will come from decreasing two of the constants in the analysis, which come from the proofs of Lemmas 2 and 10. As an example (and a starting point), I’ll use the rest of this blog entry to decrease the constant coming from the proof of Lemma 10.
At the end of page 173 of the journal version (in the proof of Lemma 10, after (6.14)), the authors combine their estimates with a simplifying assumption to conclude a convenient bound for sufficiently large parameters. Perhaps a more conservative approach is to first use precisely what the estimates give. Since the simplifying assumption holds (we actually don’t require this much, but fine), one term dominates the right-hand side, and a slightly sharper bound follows for sufficiently large parameters. In effect, this replaces certain constants in the proof (which are eventually multiplied together to form part of the expression for $\epsilon_0$) with smaller ones. Overall, this slight modification increases $\epsilon_0$ by a constant factor, nearly doubling the state of the art’s position beyond the square-root bottleneck. I’m interested to see how far this can be taken.
UPDATE: I just launched a wiki to keep track of the progress.