# Robust width: A characterization of uniformly stable and robust compressed sensing

Jameson Cahill and I recently posted a new paper on the arXiv that we’re pretty excited about. Suppose you want to do compressed sensing with L1 minimization. That is, you get $y=\Phi x^\natural+e$ for some nearly $K$-sparse vector $x^\natural$ and some noise $e$ satisfying $\|e\|_2\leq\epsilon$, and then you attempt to reconstruct $x^\natural$ by taking

$\arg\min \quad \|x\|_1 \quad\mbox{ subject to } \quad\|\Phi x-y\|_2\leq\epsilon.$

One of the main results of compressed sensing is that the optimizer $x^\star$ of the above program satisfies

$\displaystyle{\|x^\star-x^\natural\|_2\leq \frac{C_0}{\sqrt{K}}\|x^\natural-x^\natural_K\|_1+C_1\epsilon}$

provided $\Phi$ satisfies the $(2K,\delta)$-restricted isometry property (RIP) for some $\delta<\sqrt{2}-1$. Here, $x^\natural_K$ denotes the best $K$-term approximation of $x^\natural$, and so $\|x^\natural-x^\natural_K\|_1$ captures how close $x^\natural$ is to being $K$-sparse. Jameson and I wondered if RIP was necessary to enjoy this uniformly stable and robust performance (uniform in the sense that the fixed matrix works for all $x^\natural$ simultaneously, stable because we don’t require $x^\natural$ to be exactly $K$-sparse, and robust because we allow noise). In our paper, we completely characterize the matrices that offer this performance, and we show that RIP matrices form a strict subset.

Before introducing the property, consider the following illustration of a high-dimensional L1 ball:

(This is taken from slides of Roman Vershynin; forgive the lack of symmetry that you’d expect in an L1 ball.) While this depiction is obviously not convex, it captures an important property of high-dimensional convex sets: They are composed of a round bulk, which accounts for the vast majority of the volume, and of a few outliers (the extreme points), which contribute a few long tentacles of negligible volume. In the case of the L1 ball, the extreme points correspond to sparse vectors. In order for $\Phi$ to distinguish any pair of $K$-sparse vectors, we require its nullspace to avoid all $2K$-sparse vectors. One way to enforce this is to require the nullspace $E$ to intersect the L1 ball with sufficiently small width, since this in turn would imply that every $2K$-sparse tentacle is avoided:

As Kashin and Temlyakov remark, this so-called width property is equivalent to having uniform stability in the L1 minimizer when $\epsilon=0$. Furthermore, by Milman’s low M* estimate, a randomly selected nullspace will satisfy the width property provided its dimension is appropriately small (“a random subspace passes through the bulk”). This is one way to explain why random matrices perform so well in compressed sensing.

So how does one account for noise? To do this, our paper introduces the robust width property: We say $\Phi$ satisfies the $(K,c_0,c_1)$-robust width property (RWP) if

$\displaystyle{\|x\|_2\leq\frac{c_0}{\sqrt{K}}\|x\|_1}$

for every $x$ such that $\|\Phi x\|_2. Our main result says that having uniformly stable and robust reconstruction with constants $K$, $C_0$ and $C_1$ is equivalent to having $(K,c_0,c_1)$-RWP with $c_0\asymp C_0$ and $c_1\asymp C_1^{-1}$ (we lose a factor of 2 or 4 in both directions of the equivalence). In the special case where the rows of $\Phi$ are orthonormal, RWP means that every slight perturbation of the nullspace satisfies the width property (here, perturbation is measured in the gap metric, and $c_1$ bounds the size of the acceptable perturbations). Also observe the contrapositive of RWP: that $\|\Phi x\|_2\geq c_1\|x\|_2$ for every “nearly $K$-sparse” $x$, that is for every $x$ satisfying $\|x\|_1/\|x\|_2; this is reminiscent of the lower RIP bound. Finally, in the last section of the paper, we use ideas from this paper to find a matrix which satisfies RWP, but not RIP, meaning the containment is strict.

In recent years, several instances of compressed sensing have emerged, for example, in which one seeks weighted sparsity, block sparsity, gradient sparsity, or low-rank matrices. For each of these instances, there is a convex program which has been proven to provide uniformly stable and robust reconstruction. Our paper provides a general robust width property which captures all of these instances simultaneously.