I’ve blogged a couple of times about the restricted isometry property (RIP), but this is my first entry that really digs into why this property is so important to compressed sensing. A short, fat matrix is -RIP if for every -sparse vector ,
The main point here is that RIP is sufficient to guarantee sparse reconstruction by L1-minimization (we make this more explicit in a theorem statement below). It is known that L1-minimization reconstructs every sparse signal precisely when the sensing matrix satisfies the null space property (NSP), and so one way to prove that RIP is sufficient is to show that RIP implies NSP. The following paper bypasses the NSP analysis by giving a direct result for RIP:
Emmanuel J. Candes
As a disclaimer, some have argued that the RIP guarantees are far from the fundamental limits of sparse recovery. This is not too surprising, considering the number of approximations which are used in the proof of the RIP guarantees (see below). Regardless, there is no question that the RIP framework has given the community a solid intuition for good sensing matrices; we want short, fat matrices for which columns are incoherent—not only pairwise incoherent, but -wise independent. Also, RIP provides guarantees which are, perhaps surprisingly, on the order of what’s optimal; it’s just a matter of application whether you need to support larger sparsity levels.
As promised, the following is the main result of the paper:
Theorem (Candes). Suppose the matrix is -RIP with , and take measurements of a (nearly -sparse) vector of the form
where is some noise term with . There exist constants and such that the estimate
where is the best -sparse approximation of .
While the proof as written in the paper is good (and in my opinion, easier to read than many of its predecessors), I would prefer if the logic flowed differently. As a specific example, I don’t think his Lemma 2.1 is technical or long enough to warrant being a separate lemma. It’s cool that RIP is related to restricted orthogonality (which this lemma proves), but restricted orthogonality isn’t explicitly mentioned in the paper, so this relationship is lost on the uninformed reader. Since the lemma’s proof is so short, I’d rather see this logic hard-coded in the proof of the main result to better motivate that part of the proof. Below, I provide a remix of Candes’s proof, which is easier for me to follow. Your mileage may vary, but feel free to read my proof and cite Candes in your papers. 🙂
Proof of Theorem (remix): To see how close the estimate is to , we consider their difference, . Next, partition into size- subsets such that corresponds to the largest entries of , and for each , corresponds to the largest entries of in .
To show that is small, we bound and separately. The first bound uses standard inequalities, while the second uses standard inequalities in concert with RIP. For the first bound, the triangle inequality gives
Next, we bound each term in this sum using the definitions of and :
where the last inequality follows from the minimum being smaller than the average. Applying this inequality to the original expression then gives
Next, we bound using lower-RIP:
We continue this bound by analyzing the two terms separately. Cauchy-Schwarz and upper-RIP give
and furthermore, the triangle inequality gives
where the last step follows from the fact that and are both feasible solutions of the program described in the theorem. For the second term of , we note that
For notational simplicity, let’s denote normalized vectors with hats:
Then the parallelogram identity and RIP together give
where the last equality follows from the fact that and are unit vectors with disjoint support. Since this analysis also holds for , we substitute the bounds into accordingly:
To summarize, we have the following bound on :
which, after applying , can be rearranged to have the form
where and are only dependent on .
At this point, we have the following bounds:
both of which are in terms of . To analyze this, we apply the reverse triangle inequality:
and rearranging gives
where the second inequality applies Cauchy-Schwarz to the all-ones vector, along with the identification . With this, becomes
Isolating in this last inequality then gives
Finally, we use to bound in terms of :
and then gives the result.