# Living on the edge: A geometric theory of phase transitions in convex optimization

Phase transitions are very common in modern data analysis (see this paper, for example). The idea is that, for a given task whose inputs are random linear measurements, there is often a magic number of measurements $M^*$ such that if you input $M\ll M^*$ measurements, the task will typically fail, but if you input $M\gg M^*$ measurements, the task will typically succeed. As a toy example, suppose the task is to reconstruct an arbitrary vector in $\mathbb{R}^N$ from its inner products with $M$ random vectors; of course, $M^*=N$ in this case. There are many tasks possible with data analysis, signal processing, etc., and it’s interesting to see what phase transitions emerge in these cases. The following paper introduces some useful techniques for exactly characterizing phase transitions of tasks involving convex optimization:

Living on the edge: A geometric theory of phase transitions in convex optimization

Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp

Along with this paper, I highly recommend watching this lecture by Joel Tropp on the same topic. This blog entry is based on both the paper and the lecture.
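As a warm-up, here is a minimal numerical sketch (in Python, with dimensions of my choosing) of the toy example above: least-squares reconstruction from random inner products generically fails when $M<N$ and succeeds once $M\geq N$, so the phase transition sits at $M^*=N$.

```python
import numpy as np

# A minimal sketch of the toy task: recover an arbitrary x in R^N from M random
# linear measurements. Recovery generically succeeds only when M >= N, so the
# phase transition for this task sits at M* = N. (Dimensions below are arbitrary.)
rng = np.random.default_rng(0)
N = 50
x = rng.standard_normal(N)

for M in (25, 50, 75):
    A = rng.standard_normal((M, N))                 # rows are the random measurement vectors
    y = A @ x                                       # the M inner products
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares reconstruction
    print(M, np.allclose(x_hat, x, atol=1e-6))      # False, True, True (typically)
```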

# Gordon’s escape through a mesh theorem

In many applications (such as compressed sensing), we use a random projection to capture a subset of $\mathbb{R}^n$. Sometimes, you can show that a random projection performs well by appealing to the Johnson–Lindenstrauss lemma, which says that the pairwise distances of a finite collection of points are nearly preserved (for example, JL implies RIP). In other cases, we might call a projection “good” if its null space avoids a certain subset of $\mathbb{R}^n$ (think the null space property), but how might we show that a random projection is good? By appealing to Gordon’s escape through a mesh theorem, which says that a random subspace avoids a subset (“escapes a mesh”) provided the subset is small in some sense. The purpose of this blog entry is to prove this theorem and provide some intuition.

Throughout, we take $g_k$ to be a $k$-dimensional vector with iid $N(0,1)$ entries, and we denote $a_k:=\mathbb{E}\|g_k\|_2$, which satisfies $k/\sqrt{k+1}\leq a_k\leq\sqrt{k}$. Given a closed subset $S\subseteq\mathbb{R}^n$, its Gaussian width is defined as

$\displaystyle{w(S):=\mathbb{E}\Big[\max_{x\in S}g_n^\top x\Big]}$.
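As a quick sanity check on this definition, one can estimate $w(S)$ by Monte Carlo. The sketch below (my own illustration, not from the paper) does this for the finite set $S=\{\pm e_1,\ldots,\pm e_n\}$, the vertices of the unit L1 ball, and compares against the standard upper bound $\sqrt{2\log(2n)}$.

```python
import numpy as np

# A Monte Carlo sketch (illustration only) estimating the Gaussian width
# w(S) = E[ max_{x in S} <g, x> ] for the finite set S = {+/- e_1, ..., +/- e_n},
# for which max_{x in S} <g, x> = max_i |g_i|.
rng = np.random.default_rng(0)
n, trials = 20, 10000

S = np.vstack([np.eye(n), -np.eye(n)])     # each row is one point of S
g = rng.standard_normal((trials, n))       # iid N(0,1) directions
widths = (g @ S.T).max(axis=1)             # max_{x in S} <g, x> for each draw
print("Monte Carlo estimate of w(S):", widths.mean())
print("sqrt(2 log(2n)) for comparison:", np.sqrt(2 * np.log(2 * n)))
```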

As we will see, Gaussian width is an important measure of the size of a set. Intuitively, the Gaussian width of $S$ is more or less the expected width of $S$ when measured in a random direction (it’s actually half of the mean width, as defined here), and it plays a crucial role in several results, including the following:

# My research explained, using only the 1000 most common English words

[Inspired by Scott, Afonso, and Joel, using the Up-Goer Five Text Editor, which in turn was inspired by this xkcd. I actually only use 261 distinct words.]

Let’s say you have a picture, a piece of music, or a movie that you want to store on your computer. You can do this without taking up your entire hard drive, but why? Because there’s a way to look at each of these things so that they appear very simple: Imagine someone is making a movie of you reading this. You’re just sitting there. Maybe a fly is flying around the room, but not much is changing. That means each moment of the movie looks a lot like the one right before, and this makes it very easy to store the entire movie on a computer.

The fact that pictures and such are so simple allows you to do other cool stuff. Let’s say you find your favorite movie in the back of a second-hand store, but when you watch it at home, different marks pop up every now and then. Since movies are so simple, you can use a computer to fill in what you can’t see, and make it good as new.

# Deterministic RIP matrices: Breaking the square-root bottleneck, III

This blog entry is the third in a series which seeks to maximize $\epsilon_0$ such that there exist explicit $M\times N$ matrices of arbitrarily large aspect ratio $N/M$ which are RIP for $K=\Omega(M^{1/2+\epsilon_0})$. Continuing our study of the breakthrough paper by Bourgain et al., this entry will identify what the set $\mathcal{A}$ needs to satisfy and how to construct it, and then use this set to complete the proof of the main result. In particular, we will prove a general version of the result which allows for optimizing constants, and then we will perform an optimization routine that improves $\epsilon_0$ by a few orders of magnitude.

— Everything you need to know about $\mathcal{A}$ —

The main result will use an even number $m$ as a free parameter, which in turn determines another parameter $\alpha=\alpha(m)$. For a given choice of $m$, the proof of the main result requires a subset $\mathcal{A}\subseteq\mathbb{F}_p$ such that (i)

$|\mathcal{A}|\leq p^\alpha \quad (2.3)$

(i.e., $\mathcal{A}$ is sufficiently small), and (ii) for any $a\in\mathcal{A}$, the only way for $a_1,\ldots,a_{2m}\in\mathcal{A}\setminus\{a\}$ to satisfy

$\displaystyle{\sum_{j=1}^m\frac{1}{a-a_j}=\sum_{j=m+1}^{2m}\frac{1}{a-a_j} \quad (2.4)}$

is by taking $(a_1,\ldots,a_m)$ and $(a_{m+1},\ldots,a_{2m})$ to be permutations of each other. Here, division (and addition) is taken in the field $\mathbb{F}_p$.
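For very small parameters, condition (2.4) can be checked by brute force. The following sketch (illustrative only, with a hypothetical candidate set and a small prime of my choosing) does exactly that.

```python
from itertools import product

# A brute-force sketch (tiny, illustrative parameters only) that tests whether a
# candidate set A in F_p satisfies condition (2.4) for a given even m: for every
# a in A, any solution (a_1, ..., a_{2m}) with entries in A \ {a} must have its
# two halves be permutations of each other.
def satisfies_condition_24(A, p, m):
    for a in A:
        rest = [b for b in A if b != a]
        inv = {b: pow(a - b, p - 2, p) for b in rest}   # 1/(a - b) in F_p via Fermat
        for tup in product(rest, repeat=2 * m):
            lhs = sum(inv[b] for b in tup[:m]) % p
            rhs = sum(inv[b] for b in tup[m:]) % p
            if lhs == rhs and sorted(tup[:m]) != sorted(tup[m:]):
                return False
    return True

# Hypothetical example: a three-element set modulo the prime 31, with m = 2.
print(satisfies_condition_24({1, 2, 5}, p=31, m=2))
```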

# Matheon Workshop 2013: Compressed Sensing and its Applications

A couple of weeks ago, I was in Berlin for this workshop, and the talks were amazing — the organizers really did a fantastic job. For most of the talks, the slides are available here. There was an interesting mix of speakers; some had big problems in certain applications with the hope that compressed sensing might help, and others presented various generalizations and unifications of the standard compressed sensing theory. Overall, I noticed a few trends in the content of the talks:

• applying CS theory to imaging and video
• CS problems over the continuum or in a coherent dictionary
• nonlinear CS and phase retrieval
• dimension reduction for classification and machine learning
• measurement derandomization

The rest of this blog entry summarizes each talk. Some summaries are briefer than others, depending mostly on how fervently I was taking notes during the talk.

# Deterministic RIP matrices: Breaking the square-root bottleneck, II

In my previous blog entry, I kicked off an online project to make progress on the deterministic RIP matrix problem by optimizing the analysis in a breakthrough paper by Bourgain et al. Since then, I launched a wiki to maintain a “scoreboard” of world records for this problem, and Afonso Bandeira made his own contribution to the project. The purpose of this entry is to introduce some ideas from additive combinatorics and to develop some understanding of how they are leveraged to prove RIP for a particular matrix construction. While this entry does not immediately produce a new world record, it does optimize the analysis of Corollaries 3 and 4 of the paper. I plan to generalize Lemmas 9 and 10 in a future blog post, which will lead to an explicit optimization problem that incorporates the work in this entry (and a new world record, hopefully).

— A brief introduction to additive combinatorics —

In this section, we briefly detail some key ideas from additive combinatorics. Given an additive group $G$ and finite sets $A,B\subseteq G$, we can define the sumset

$A+B:=\{a+b:a\in A,~b\in B\}$,

the difference set

$A-B:=\{a-b:a\in A,~b\in B\}$

(not to be confused with a difference set), and the additive energy

$E(A,B):=\#\big\{(a_1,a_2,b_1,b_2)\in A^2\times B^2:a_1+b_1=a_2+b_2\big\}$.
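For concreteness, here is a small sketch (with example sets of my choosing) that computes these three quantities for an arithmetic progression and for a geometric progression, contrasting a set with lots of additive structure against one with very little.

```python
from itertools import product

# A small sketch computing the sumset, the difference set, and the additive
# energy defined above, over the integers.
def sumset(A, B):
    return {a + b for a, b in product(A, B)}

def difference_set(A, B):
    return {a - b for a, b in product(A, B)}

def additive_energy(A, B):
    return sum(1 for a1, a2, b1, b2 in product(A, A, B, B) if a1 + b1 == a2 + b2)

AP = set(range(1, 11))            # arithmetic progression: small sumset, large energy
GP = {2 ** k for k in range(10)}  # geometric progression: large sumset, small energy
for name, A in (("AP", AP), ("GP", GP)):
    print(name, len(sumset(A, A)), len(difference_set(A, A)), additive_energy(A, A))
```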

These definitions are useful in quantifying the additive structure of a set. In particular, consider the following:

# Deterministic RIP matrices: Breaking the square-root bottleneck

Unless you’ve been living under a rock, you probably know about Polymath8, and how it has been a dramatic success. In May, Yitang Zhang published a proof that there are infinitely many pairs of primes which differ by 70 million or less, and since then, the Polymath8 project has decreased this bound considerably. In a recent blog entry, Terry Tao took some time to reflect on why this Polymath project received so much attention. Here is one of the reasons he points out:

One could summarise the current state of progress by a single natural number $H$, which implied by infinite descent that the project was guaranteed to terminate at some point, but also made it possible to set up a “scoreboard” that could be quickly and easily updated.

In reading this, I was reminded of another big open problem — perhaps the biggest open problem in compressed sensing:

The deterministic RIP matrix problem. Construct an infinite family of deterministic matrices $\{\Phi_M\}$, where $\Phi_M$ is $M\times N(M)$ and $M$ and $N(M)/M$ are both arbitrarily large, such that each $\Phi_M$ satisfies the $(K,\delta)$-restricted isometry property (RIP) with $K=\Omega(M^{1-\epsilon})$ for all $\epsilon>0$ and with $\delta<1/2$.
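To fix ideas, the restricted isometry constant $\delta_K$ can be computed by brute force for tiny matrices. The sketch below (a hypothetical random example, certainly not how one would certify RIP at scale, and not a deterministic construction) simply exercises the definition.

```python
import numpy as np
from itertools import combinations

# A brute-force sketch of the restricted isometry constant delta_K, feasible only
# for tiny matrices: delta_K is the max over K-column submatrices Phi_S of the
# spectral norm ||Phi_S^T Phi_S - I||_2.
def rip_constant(Phi, K):
    _, N = Phi.shape
    delta = 0.0
    for S in combinations(range(N), K):
        G = Phi[:, list(S)].T @ Phi[:, list(S)]      # K x K Gram matrix
        eigs = np.linalg.eigvalsh(G)
        delta = max(delta, np.abs(eigs - 1.0).max()) # spectral norm of G - I
    return delta

# Hypothetical example: a small Gaussian matrix (not deterministic!) just to
# exercise the definition; the open problem above asks for explicit constructions.
rng = np.random.default_rng(0)
M, N, K = 8, 20, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)       # columns roughly unit norm
print("delta_K =", rip_constant(Phi, K))
```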

# Phase retrieval from coded diffraction patterns

In a previous post, I described a paper I wrote with Afonso and Yutong about how to design $O(\log n)$ masked illuminations that enable efficient phase retrieval of $n$-dimensional signals for X-ray crystallography and related applications. This masked-illumination methodology was originally proposed in this paper, and our phase retrieval guarantee was based on a recovery method known as polarization. This week, the following paper was posted online, and it gives the first guarantee of this kind for a more popular recovery method called PhaseLift:

Phase retrieval from coded diffraction patterns

Emmanuel J. Candes, Xiaodong Li, Mahdi Soltanolkotabi

In particular, this paper shows that $O(\log^4 n)$ masked illuminations enable PhaseLift recovery, and their result actually holds for a wide assortment of masks. Since this paper is so related to my research, I decided to interview one of the authors (Mahdi Soltanolkotabi). I’ve lightly edited his responses for formatting and hyperlinks:

# A geometric intuition for the null space property

Solving underdetermined linear systems is impossible unless you’re given additional information about the solution. The goal of compressed sensing is to solve these systems by assuming the desired solution is sparse. In this vein, one common approach is L1 minimization: Find the minimizer of $\|z\|_1$ subject to $Az=Ax$, where $x$ is the true solution (and so $Ax$ is the available data). Here is the standard picture to illustrate why L1 minimization works:

In this case, we applied the sensing matrix $A=[1,3]$, whose null space is the set of scalar multiples of $[3;-1]$. The vector we are sensing is $x=[0;1]$, and the red line denotes the null space of $A$ shifted by $x$ (i.e., the set of all $z$ such that $Az=Ax$). The blue diamond illustrates the smallest L1-ball which intersects this shifted subspace, and we note that this intersection occurs at our signal $x$. As such, L1 minimization encouraged sparsity so well that it recovered the desired signal exactly.
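The picture is easy to reproduce numerically. Here is a short sketch (using scipy's linear programming routine and the standard split $z=u-v$ reformulation of L1 minimization) that recovers $x=[0;1]$ from the single measurement $Ax$.

```python
import numpy as np
from scipy.optimize import linprog

# A quick numerical reproduction of the picture: minimize ||z||_1 subject to
# Az = Ax, using the standard split z = u - v with u, v >= 0 so that the
# problem becomes a linear program.
A = np.array([[1.0, 3.0]])
x = np.array([0.0, 1.0])
b = A @ x

n = A.shape[1]
c = np.ones(2 * n)                 # objective sum(u) + sum(v) equals ||z||_1
A_eq = np.hstack([A, -A])          # enforces A(u - v) = Ax
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
z = res.x[:n] - res.x[n:]
print(z)                           # close to [0, 1]: L1 minimization recovers x
```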

The purpose of this blog entry is to gain a deeper intuition for why L1 minimization works so well. The following (rather obvious) lemma lies at the heart of this intuition:

# A fully automatic problem solver with human-style output

I think most would agree that the way we do math research has completely changed with technology in the last few decades. Today, I type research notes in LaTeX, I run simulations in MATLAB and Mathematica, I email with collaborators on a daily basis, I read the arXiv and various math blogs to keep in the know, and when I get stuck on something that’s a little outside my expertise, I ask a question on MathOverflow. With this in mind, can you think of another step we can take with technology that will revolutionize the way we conduct math research? It might sound ambitious, but the following paper is looking to make one such step:

A fully automatic problem solver with human-style output

M. Ganesalingam, W. T. Gowers

The vision of this paper is to make automated provers extremely mathematician-friendly so that they can be used on a day-to-day basis to help prove various lemmas and theorems. Today, we might use computers as a last resort to verify a slew of cases (e.g., to prove the four color theorem or the Kepler conjecture). The hope is that in the future, we will be able to seamlessly interface with computers to efficiently implement human-type logic (imagine HAL 9000 as a collaborator).

The paper balances its ambitious vision with a modest scope: The goal is to produce an algorithm which emulates the way human mathematicians (1) prove some of the simplest results that might appear in undergraduate-level math homework, and (2) write the proofs in LaTeX. To be fair, the only thing modest about this scope is the hardness of the results that are attempted, and as a first step toward the overall vision, this simplification is certainly acceptable. (Examples of attempted results include “a closed subset of a complete metric space is complete” and “the intersection of open sets is open.”)