# Deterministic RIP matrices: Breaking the square-root bottleneck, II

In my previous blog entry, I kicked off an online project to make progress on the deterministic RIP matrix problem by optimizing the analysis in a breakthrough paper by Bourgain et al. Since then, I launched a wiki to maintain a “scoreboard” of world records for this problem, and Afonso Bandeira made his own contribution to the project. The purpose of this entry is to introduce some additive combinatorics and to develop some understanding of how these ideas are leveraged to prove RIP for a particular matrix construction. While this entry does not immediately produce a new world record, it does optimize the analysis of Corollaries 3 and 4 of the paper. I plan to generalize Lemmas 9 and 10 in a future blog post, which will lead to an explicit optimization problem that incorporates the work in this entry (and, hopefully, a new world record).

— A brief introduction to additive combinatorics —

In this section, we briefly detail some key ideas from additive combinatorics. Given an additive group $G$ and finite sets $A,B\subseteq G$, we can define the sumset

$A+B:=\{a+b:a\in A,~b\in B\}$,

the difference set

$A-B:=\{a-b:a\in A,~b\in B\}$

(not to be confused with a difference set in the sense of combinatorial design theory), and the additive energy

$E(A,B):=\#\big\{(a_1,a_2,b_1,b_2)\in A^2\times B^2:a_1+b_1=a_2+b_2\big\}$.

These definitions are useful in quantifying the additive structure of a set. In particular, consider the following:
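To make the definitions concrete, here is a minimal brute-force computation of the sumset, difference set, and additive energy for small sets of integers (the sets chosen below are my own toy examples, not from the paper). Note how an arithmetic progression exhibits a small sumset and large energy, while a Sidon set (all pairwise sums distinct) sits at the other extreme:

```python
from itertools import product

def sumset(A, B):
    """A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

def difference_set(A, B):
    """A - B = {a - b : a in A, b in B}."""
    return {a - b for a in A for b in B}

def additive_energy(A, B):
    """E(A, B) = #{(a1, a2, b1, b2) in A^2 x B^2 : a1 + b1 = a2 + b2}."""
    return sum(1 for a1, a2, b1, b2 in product(A, A, B, B)
               if a1 + b1 == a2 + b2)

A = {0, 1, 2, 3}     # arithmetic progression: highly structured
B = {0, 1, 4, 9}     # Sidon set: as unstructured as a 4-set can be

print(len(sumset(A, A)), additive_energy(A, A))  # 7 44
print(len(sumset(B, B)), additive_energy(B, B))  # 10 28
```

Equivalently, $E(A,B)=\sum_s r(s)^2$ where $r(s)$ counts ordered representations $s=a+b$; the energy is large exactly when many sums collide, which is the quantitative sense in which the progression above is "more additively structured" than the Sidon set.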

# Deterministic RIP matrices: Breaking the square-root bottleneck

Unless you’ve been living under a rock, you probably know about Polymath8, and how it has been a dramatic success. In May, Yitang Zhang published a proof that there are infinitely many pairs of primes which differ by 70 million or less, and since then, the Polymath project has exploded with activity to decrease this number considerably. In a recent blog entry, Terry Tao took some time to reflect on why this Polymath project received so much attention. Here is one of the reasons he points out:

One could summarise the current state of progress by a single natural number $H$, which implied by infinite descent that the project was guaranteed to terminate at some point, but also made it possible to set up a “scoreboard” that could be quickly and easily updated.

In reading this, I was reminded of another big open problem — perhaps the biggest open problem in compressed sensing:

The deterministic RIP matrix problem. Construct an infinite family of deterministic matrices $\{\Phi_M\}$, where $\Phi_M$ is $M\times N(M)$ and $M$ and $N(M)/M$ are both arbitrarily large, such that each $\Phi_M$ satisfies the $(K,\delta)$-restricted isometry property (RIP) with $K=\Omega(M^{1-\epsilon})$ for all $\epsilon>0$ and with $\delta<1/2$.
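For intuition about what the $(K,\delta)$-RIP condition demands, here is a brute-force checker of my own (not from any paper): it computes the smallest valid $\delta$ for a given $\Phi$ and $K$ by examining every size-$K$ column submatrix. This is exponential in $K$ and only feasible for tiny matrices; indeed, certifying RIP is computationally hard in general, which is part of what makes explicit deterministic constructions valuable.

```python
from itertools import combinations
import numpy as np

def restricted_isometry_constant(Phi, K):
    """Smallest delta with (1-delta)||x||^2 <= ||Phi x||^2 <= (1+delta)||x||^2
    for every K-sparse x, found by brute force over all size-K supports."""
    M, N = Phi.shape
    delta = 0.0
    for S in combinations(range(N), K):
        # The extreme singular values of the column submatrix give the
        # worst-case distortion over vectors supported on S.
        s = np.linalg.svd(Phi[:, list(S)], compute_uv=False)
        delta = max(delta, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return delta

# Sanity check: orthonormal columns are a perfect isometry (delta = 0).
print(restricted_isometry_constant(np.eye(4), 2))  # 0.0
```

The deterministic RIP matrix problem asks for this quantity to stay below $1/2$ even when $K$ grows nearly linearly in $M$, a regime where no exhaustive check is remotely possible and one must argue via structure, hence the additive combinatorics above.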

# Phase retrieval from coded diffraction patterns

In a previous post, I described a paper I wrote with Afonso and Yutong about how to design $O(\log n)$ masked illuminations that enable efficient phase retrieval of $n$-dimensional signals for X-ray crystallography and related applications. This masked-illumination methodology was originally proposed in this paper, and our phase retrieval guarantee was based on a recovery method known as polarization. This week, the following paper was posted online, and it gives the first guarantee of this kind for a more popular recovery method called PhaseLift:

Phase retrieval from coded diffraction patterns

Emmanuel J. Candes, Xiaodong Li, Mahdi Soltanolkotabi

In particular, this paper shows that $O(\log^4 n)$ masked illuminations enable PhaseLift recovery, and their result actually holds for a wide assortment of masks. Since this paper is so closely related to my research, I decided to interview one of the authors (Mahdi Soltanolkotabi). I’ve lightly edited his responses for formatting and hyperlinks:

# A geometric intuition for the null space property

Solving underdetermined linear systems is impossible unless you’re given additional information about the solution. The goal of compressed sensing is to solve these systems by assuming the desired solution is sparse. In this vein, one common approach is L1 minimization: Find the minimizer of $\|z\|_1$ subject to $Az=Ax$, where $x$ is the true solution (and so $Ax$ is the available data). Here is the standard picture to illustrate why L1 minimization works:

In this case, we applied the sensing matrix $A=[1,3]$, whose null space is the set of scalar multiples of $[3;-1]$. The vector we are sensing is $x=[0;1]$, and the red line denotes the null space of $A$ shifted by $x$ (i.e., the set of all $z$ such that $Az=Ax$). The blue diamond illustrates the smallest L1-ball which intersects this shifted subspace, and we note that this intersection occurs at our signal $x$. As such, L1 minimization encouraged sparsity so well that it recovered the desired signal exactly.
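The picture can be reproduced numerically. Since the null space of $A=[1,3]$ is one-dimensional (spanned by $[3;-1]$), every feasible $z$ lies on the line $x+t\,[3;-1]$, and L1 minimization reduces to a one-variable problem we can simply scan:

```python
import numpy as np

# The example from the figure: A = [1, 3], x = [0; 1].
A = np.array([[1.0, 3.0]])
x = np.array([0.0, 1.0])
n = np.array([3.0, -1.0])   # spans null(A), since A @ n = 0

# Every z with Az = Ax has the form x + t*n; minimize ||z||_1 over t.
ts = np.linspace(-2.0, 2.0, 40001)
candidates = x[None, :] + ts[:, None] * n[None, :]
norms = np.abs(candidates).sum(axis=1)
z = candidates[norms.argmin()]
print(np.round(z, 6))  # [0. 1.]: the L1 ball first touches the line at x
```

Along the feasible line the L1 norm is $|3t|+|1-t|$, which is uniquely minimized at $t=0$; this is exactly the geometric statement that the blue diamond touches the red line at the sparse point $x$ and nowhere else.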

The purpose of this blog entry is to gain a deeper intuition for why L1 minimization works so well. The following (rather obvious) lemma lies at the heart of this intuition:

# A fully automatic problem solver with human-style output

I think most would agree that the way we do math research has completely changed with technology in the last few decades. Today, I type research notes in LaTeX, I run simulations in MATLAB and Mathematica, I email with collaborators on a daily basis, I read the arXiv and various math blogs to keep in the know, and when I get stuck on something that’s a little outside my expertise, I ask a question on MathOverflow. With this in mind, can you think of another step we can take with technology that will revolutionize the way we conduct math research? It might sound ambitious, but the following paper is looking to make one such step:

A fully automatic problem solver with human-style output

M. Ganesalingam, W. T. Gowers

The vision of this paper is to make automated provers extremely mathematician-friendly so that they can be used on a day-to-day basis to help prove various lemmas and theorems. Today, we might use computers as a last resort to verify a slew of cases (e.g., to prove the four color theorem or the Kepler conjecture). The hope is that in the future, we will be able to seamlessly interface with computers to efficiently implement human-type logic (imagine HAL 9000 as a collaborator).

The paper balances its ambitious vision with a modest scope: The goal is to produce an algorithm which emulates the way human mathematicians (1) prove some of the simplest results that might appear in undergraduate-level math homework, and (2) write the proofs in LaTeX. To be fair, the only thing modest about this scope is the hardness of the results that are attempted, and as a first step toward the overall vision, this simplification is certainly acceptable. (Examples of attempted results include “a closed subset of a complete metric space is complete” and “the intersection of open sets is open.”)

# Probably Approximately Correct

A couple of months ago, Afonso recommended I read this new book by Leslie Valiant. In fact, this book had already been recommended by one of the blogs I follow, so I decided it must be worth the read. The purpose of this blog entry is to document some notes that I took while reading this fascinating book.

Considering the well-established theory of computability (and computational complexity), in hindsight, it seems rather natural to seek a theory of the learnable. What sort of things are inherently learnable, and efficiently so? In order to build a solid theory of learnability, we first need to make some assumptions. Valiant proposes two such assumptions:

1. The Invariance Assumption. Data collected for the learning phase comes from the same source as the data collected for the testing phase. Otherwise, the experiences you learn from are irrelevant to the situations you apply your learning to. In the real world, non-periodic changes in our environment are much slower than our learning processes, and so this is a reasonable assumption for human learning. As for machine learning, the validity of this assumption depends on how well the curriculum is designed.

2. The Learnable Regularity Assumption. In both the learning and testing phases, there are a few regularities available which completely distinguish objects so that efficient categorization is feasible. As humans who learn our environment, our senses are useful because they help us to identify objects, and our brains apply efficient algorithms to determine relationships between these objects, cause and effect, etc. Another way to phrase this assumption is that patterns can be recognized efficiently.

# Phase retrieval from very few measurements

Before the AIM meeting last month, I posted a paper on the arXiv that I wrote with Matt Fickus, Aaron Nelson and Yang Wang. This paper provides some very interesting results for phase retrieval in the setting where you wish to use as few intensity measurements as possible. There are really three different types of results in this paper, and they are partitioned into three different sections accordingly.

— 4M-4 injective intensity measurements —

Recall that Bodmann and Hammen provide a construction of $4M-4$ vectors in $\mathbb{C}^M$ which yield injective intensity measurements. Such an explicit construction is rather interesting because $4M-4$ is conjectured to be the smallest number of vectors for which injectivity is even possible. From my perspective, any insight into the structure of such ensembles could lead to a deeper understanding of what it takes to be injective in the complex case, and so such constructions are particularly important.

In our paper, we provide a second construction of $4M-4$ injective intensity measurements, and our method for proving injectivity was very different from the Bodmann-Hammen approach. Specifically, it is not clear if the Bodmann-Hammen construction has an efficient reconstruction algorithm, whereas for our construction, injectivity is essentially proved by applying a particular reconstruction algorithm that we designed in concert with the ensemble. We used several key ideas in our design, and I’ll briefly discuss them here.

# AIM workshop: Frame theory intersects geometry

A couple of years ago, I was invited to attend my first AIM workshop, and it was finally held this last week. The topic was Frame theory intersects geometry, and as the name suggests, this workshop was an experiment of sorts to get two different communities together to solve various problems. Specifically, we focused on how to leverage techniques from algebraic geometry to solve certain infamous problems in frame theory, and we enjoyed a number of successes, which this entry overviews.

— Part b of the 4M-4 conjecture is true —

Phase retrieval served as one of the main thrusts of this workshop. To help get the ball rolling on this subject, Thomas Strohmer and I gave survey talks on the subject (here are my slides). As a workshop, we quickly set our sights on the 4M-4 conjecture, which states that 4M-4 intensity measurements are necessary and generically sufficient for injectivity. I detail this conjecture in this blog post, where I also offer \$100 for the solution. In that post, I indicated that part b (generic sufficiency) is probably the lower-hanging fruit. In another post, I interviewed Vlad Voroninski about his recent proof that 4 generic unitaries suffice for injectivity, and he indicated that similar proof techniques should resolve part b. This week, these suspicions proved correct, as part b was solved (or at least an extremely plausible argument was sketched and will be written up rigorously soon).

# Three talks from SampTA 2013

Last week, I attended SampTA at Jacobs University in Bremen, Germany. The organizers did a fantastic job, and in this blog entry, I’ll discuss three of the plenary talks that I found particularly interesting.

— Fast algorithms for sparse Fourier transform —

On Tuesday, Piotr Indyk talked about a randomized version of the fast Fourier transform that provides speedups in the case where the desired signal is (nearly) sparse in the frequency domain. His talk was mostly based on this paper, but check out this page for a bigger picture of the research program. To help our intuition, he focused on the case where the Fourier transform is exactly sparse (having at most $k$ nonzero entries), in which case the number of operations is $O(k\log n)$, where $n$ is the dimension of the ambient vector space. Note that this is a substantial improvement over the FFT, which uses $O(n\log n)$ operations. Also note that when $k$ is small relative to $n$, this number of operations is actually less than the dimension of the space. As such, the sparse FFT (sFFT) will not even read the entire $n$-dimensional input!

To see how this is even possible, consider the following figure (I took this from these slides):

# An introduction to interlacing families

Perhaps you heard about it on Gil Kalai’s blog: the 54-year-old Kadison-Singer problem was recently solved by Adam Marcus, Dan Spielman and Nikhil Srivastava. The paper is fantastically short, and comes as a sequel to another recent paper which used similar methods to solve another important problem: For every degree $d>2$, there exists an infinite family of $d$-regular bipartite Ramanujan graphs. Considering the gravity of these results, it’s probably wise to get a grasp of the techniques that are used to prove them. The main technique they use (“the method of interlacing families”) is actually new, and so I decided to write a blog post to discuss it (and force myself to understand it).

— 1. A new probabilistic method —

The main idea is that you’re given a list of polynomials, and you’re asked to establish that the largest real root of one of these polynomials is smaller than something. This is very similar to the sort of problem that the probabilistic method could be used to solve: Suppose you have a huge collection $S$ of real numbers, and you want to see how small these numbers can be. Define a random variable $X$ to be uniformly distributed over $S$ and compute the expected value. Then you know that there is some member of $S$ which is no larger than $\mathbb{E}[X]$. Another way to say this is that the minimum is always less than or equal to the average. Functional analysts use this pigeonhole principle-type argument all the time without consciously acknowledging that they are using the probabilistic method. Of course, the probabilistic method can be formulated as “just counting,” but oftentimes, it is easier to think in terms of probability.
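Here is a toy instance of that argument, applied to largest real roots (the polynomials below are made-up illustrations, not from the Marcus-Spielman-Srivastava paper): since the minimum is at most the average, some member of the family must have its largest real root bounded by the average of the largest real roots.

```python
import numpy as np

# A made-up family of polynomials (coefficients, highest degree first).
polys = [
    [1, -3, 2],    # x^2 - 3x + 2 = (x-1)(x-2), largest real root 2
    [1, -5, 6],    # (x-2)(x-3), largest real root 3
    [1, -1, -6],   # (x-3)(x+2), largest real root 3
]

def largest_real_root(coeffs):
    roots = np.roots(coeffs)
    return roots[np.abs(np.imag(roots)) < 1e-9].real.max()

largest = [largest_real_root(p) for p in polys]
average = sum(largest) / len(largest)

# min <= average: some polynomial's largest root beats the average.
print(min(largest), average)  # 2.0  2.666...
```

The subtlety in the interlacing-families method is that the "average" one actually controls is the largest root of the *expected characteristic polynomial*, which bounds the minimum only for families with the interlacing property; the toy above illustrates just the elementary min ≤ average step.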