I’ve blogged a couple of times about the restricted isometry property (RIP), but this is my first entry that really digs into why this property is so important to compressed sensing. A short, fat matrix $\Phi$ is $(k,\delta)$-RIP if for every $k$-sparse vector $x$,

$(1-\delta)\|x\|_2^2 \leq \|\Phi x\|_2^2 \leq (1+\delta)\|x\|_2^2$.
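To make the definition concrete, here is a small numerical sketch (my own, not from the original post) that spot-checks the RIP inequalities for a random Gaussian matrix on randomly drawn $k$-sparse vectors. Note that this is only a Monte Carlo sanity check; it gives a lower bound on the true RIP constant, and certifying RIP for a given matrix is computationally hard in general.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 60, 200, 5                             # short, fat matrix: m << n
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # columns have unit norm in expectation

ratio_lo, ratio_hi = np.inf, 0.0
for _ in range(10000):
    # draw a random k-sparse vector
    support = rng.choice(n, size=k, replace=False)
    x = np.zeros(n)
    x[support] = rng.standard_normal(k)
    ratio = np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2
    ratio_lo, ratio_hi = min(ratio_lo, ratio), max(ratio_hi, ratio)

# If Phi were (k, delta)-RIP, every ratio would lie in [1 - delta, 1 + delta],
# so the observed extremes give a (loose) lower bound on the true delta.
print("observed ||Phi x||^2 / ||x||^2 in [%.3f, %.3f]" % (ratio_lo, ratio_hi))
print("so delta_k is at least %.3f" % max(1 - ratio_lo, ratio_hi - 1))
```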
The main point here is that RIP is sufficient to guarantee sparse reconstruction by L1-minimization (we make this more explicit in a theorem statement below). It is known that L1-minimization reconstructs every sparse signal precisely when the sensing matrix satisfies the null space property (NSP), and so one way to prove that RIP is sufficient is to show that RIP implies NSP. The following paper bypasses the NSP analysis by giving a direct result for RIP:
The restricted isometry property and its implications for compressed sensing
Emmanuel J. Candes
As a disclaimer, some have argued that the RIP guarantees are far from the fundamental limits of sparse recovery. This is not too surprising, considering the number of approximations which are used in the proof of the RIP guarantees (see below). Regardless, there is no question that the RIP framework has given the community a solid intuition for good sensing matrices; we want short, fat matrices whose columns are not only pairwise incoherent, but $k$-wise independent. Also, RIP provides guarantees which are, perhaps surprisingly, on the order of what’s optimal; it’s just a matter of application whether you need to support larger sparsity levels.
As promised, the following is the main result of the paper:
Theorem (Candes). Suppose the matrix $\Phi$ is $(2k,\delta)$-RIP with $\delta < \sqrt{2}-1$, and take measurements of a (nearly $k$-sparse) vector $x$ of the form

$y = \Phi x + z$,

where $z$ is some noise term with $\|z\|_2 \leq \epsilon$. There exist constants $C_0$ and $C_1$ such that the estimate

$x^\star := \arg\min_{\tilde{x}} \|\tilde{x}\|_1 \quad \text{subject to} \quad \|\Phi \tilde{x} - y\|_2 \leq \epsilon$

satisfies

$\|x^\star - x\|_2 \leq C_0 k^{-1/2}\|x - x_k\|_1 + C_1 \epsilon$,

where $x_k$ is the best $k$-sparse approximation of $x$.
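For the record, here is a minimal numerical sketch of the recovery program in the theorem (noise-aware $L_1$ minimization). This is my own illustration rather than anything from the paper, and it assumes the cvxpy package is available to solve the convex program; the dimensions and noise level are arbitrary choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, k = 80, 256, 8
eps = 1e-2

Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
z = rng.standard_normal(m)
z *= eps / np.linalg.norm(z)          # noise with ||z||_2 <= eps
y = Phi @ x + z

# L1 minimization subject to the data-fidelity constraint from the theorem
x_tilde = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm(x_tilde, 1)),
                     [cp.norm(Phi @ x_tilde - y, 2) <= eps])
problem.solve()
x_star = x_tilde.value

print("reconstruction error:", np.linalg.norm(x_star - x))
print("noise level eps:     ", eps)
```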
While the proof as written in the paper is good (and in my opinion, easier to read than many of its predecessors), I would prefer if the logic flowed differently. As a specific example, I don’t think his Lemma 2.1 is technical or long enough to warrant being a separate lemma. It’s cool that RIP is related to restricted orthogonality (which this lemma proves), but restricted orthogonality isn’t explicitly mentioned in the paper, so this relationship is lost on the uninformed reader. Since the lemma’s proof is so short, I’d rather see this logic hard-coded in the proof of the main result to better motivate that part of the proof. Below, I provide a remix of Candes’s proof, which is easier for me to follow. Your mileage may vary, but feel free to read my proof and cite Candes in your papers. 🙂
Proof of Theorem (remix): To see how close the estimate $x^\star$ is to $x$, we consider their difference, $h := x^\star - x$. Next, partition $\{1,\ldots,n\}$ into size-$k$ subsets $T_0, T_1, T_2, \ldots$ such that $T_0$ corresponds to the $k$ largest entries of $x$, and for each $j \geq 1$, $T_j$ corresponds to the $k$ largest entries of $h$ in $(T_0 \cup \cdots \cup T_{j-1})^{\mathrm{c}}$.
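If it helps to see the bookkeeping, here is a short sketch of how one might build the index sets $T_0, T_1, T_2, \ldots$ in code. The helper name partition_indices is mine, not the paper’s; the construction just sorts by magnitude and chops into blocks of size $k$.

```python
import numpy as np

def partition_indices(x, h, k):
    """Return [T_0, T_1, T_2, ...]: T_0 holds the k largest entries of x (in
    magnitude), and each subsequent T_j holds the k largest remaining entries
    of h, working outward from T_0."""
    n = len(x)
    T0 = np.argsort(-np.abs(x))[:k]
    remaining = np.setdiff1d(np.arange(n), T0)
    # sort the remaining indices by |h|, largest first, then chop into blocks of size k
    remaining = remaining[np.argsort(-np.abs(h[remaining]))]
    return [T0] + [remaining[i:i + k] for i in range(0, len(remaining), k)]

rng = np.random.default_rng(2)
n, k = 30, 5
x, h = rng.standard_normal(n), rng.standard_normal(n)
for j, T in enumerate(partition_indices(x, h, k)):
    print("T_%d:" % j, np.sort(T))
```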
To show that $h$ is small, we bound $\|h_{(T_0\cup T_1)^{\mathrm{c}}}\|_2$ and $\|h_{T_0\cup T_1}\|_2$ separately. The first bound uses standard inequalities, while the second uses standard inequalities in concert with RIP. For the first bound, the triangle inequality gives

$\|h_{(T_0\cup T_1)^{\mathrm{c}}}\|_2 = \big\|\textstyle\sum_{j\geq 2} h_{T_j}\big\|_2 \leq \sum_{j\geq 2} \|h_{T_j}\|_2$.

Next, we bound each term in this sum using the definitions of $T_{j-1}$ and $T_j$:

$\|h_{T_j}\|_2 \leq k^{1/2}\|h_{T_j}\|_\infty \leq k^{1/2}\min_{i\in T_{j-1}}|h_i| \leq k^{-1/2}\|h_{T_{j-1}}\|_1, \qquad (1)$

where the last inequality follows from the minimum being smaller than the average. Applying this inequality to the original expression then gives

$\|h_{(T_0\cup T_1)^{\mathrm{c}}}\|_2 \leq \sum_{j\geq 2}\|h_{T_j}\|_2 \leq k^{-1/2}\sum_{j\geq 1}\|h_{T_j}\|_1 = k^{-1/2}\|h_{T_0^{\mathrm{c}}}\|_1. \qquad (2)$
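As a sanity check on this chain of inequalities (the $\sqrt{k}$ factors here are exactly what the first comment below asks about), here is a quick numerical verification, on a random vector, that $\|h_{T_j}\|_2 \leq k^{-1/2}\|h_{T_{j-1}}\|_1$ for each $j \geq 2$ and that the summed bound (2) holds. This is my own check, not the paper’s, and for simplicity it sorts $h$ itself rather than building $T_0$ from $x$; the per-block inequalities only depend on the blocks being in decreasing order of magnitude.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 5
h = rng.standard_normal(n)

# T_0, T_1, T_2, ...: blocks of k indices, sorted by |h|, largest first
order = np.argsort(-np.abs(h))
blocks = [order[i:i + k] for i in range(0, n, k)]

# per-block inequality (1): ||h_{T_j}||_2 <= k^{-1/2} ||h_{T_{j-1}}||_1 for j >= 2
for j in range(2, len(blocks)):
    lhs = np.linalg.norm(h[blocks[j]])
    rhs = np.linalg.norm(h[blocks[j - 1]], 1) / np.sqrt(k)
    assert lhs <= rhs + 1e-12

# summed bound (2): ||h on (T_0 u T_1)^c||_2 <= k^{-1/2} ||h on T_0^c||_1
tail = np.concatenate(blocks[2:])
outside_T0 = np.concatenate(blocks[1:])
print(np.linalg.norm(h[tail]), "<=", np.linalg.norm(h[outside_T0], 1) / np.sqrt(k))
```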
Next, we bound $\|h_{T_0\cup T_1}\|_2$ using lower-RIP:

$(1-\delta)\|h_{T_0\cup T_1}\|_2^2 \leq \|\Phi h_{T_0\cup T_1}\|_2^2 = \langle \Phi h_{T_0\cup T_1}, \Phi h\rangle - \big\langle \Phi h_{T_0\cup T_1}, \textstyle\sum_{j\geq 2}\Phi h_{T_j}\big\rangle. \qquad (3)$

We continue this bound by analyzing the two terms separately. Cauchy-Schwarz and upper-RIP give

$|\langle \Phi h_{T_0\cup T_1}, \Phi h\rangle| \leq \|\Phi h_{T_0\cup T_1}\|_2 \|\Phi h\|_2 \leq (1+\delta)^{1/2}\|h_{T_0\cup T_1}\|_2 \|\Phi h\|_2$,

and furthermore, the triangle inequality gives

$\|\Phi h\|_2 \leq \|\Phi x^\star - y\|_2 + \|\Phi x - y\|_2 \leq 2\epsilon$,

where the last step follows from the fact that $x^\star$ and $x$ are both feasible solutions of the program described in the theorem. For the second term of (3), we note that

$\big|\big\langle \Phi h_{T_0\cup T_1}, \textstyle\sum_{j\geq 2}\Phi h_{T_j}\big\rangle\big| \leq \sum_{j\geq 2}\big(|\langle \Phi h_{T_0}, \Phi h_{T_j}\rangle| + |\langle \Phi h_{T_1}, \Phi h_{T_j}\rangle|\big).$
For notational simplicity, let’s denote normalized vectors with hats:

$\hat{h}_{T_j} := h_{T_j}/\|h_{T_j}\|_2$.

Then the parallelogram identity and RIP together give

$|\langle \Phi \hat{h}_{T_0}, \Phi \hat{h}_{T_j}\rangle| = \tfrac{1}{4}\big|\|\Phi(\hat{h}_{T_0}+\hat{h}_{T_j})\|_2^2 - \|\Phi(\hat{h}_{T_0}-\hat{h}_{T_j})\|_2^2\big| \leq \tfrac{1}{4}\big((1+\delta)\|\hat{h}_{T_0}+\hat{h}_{T_j}\|_2^2 - (1-\delta)\|\hat{h}_{T_0}-\hat{h}_{T_j}\|_2^2\big) = \delta$,

where the last equality follows from the fact that $\hat{h}_{T_0}$ and $\hat{h}_{T_j}$ are unit vectors with disjoint support, so that $\|\hat{h}_{T_0}\pm\hat{h}_{T_j}\|_2^2 = 2$. Since this analysis also holds for $\hat{h}_{T_1}$, we substitute the bounds into (3) accordingly:

$\big|\big\langle \Phi h_{T_0\cup T_1}, \textstyle\sum_{j\geq 2}\Phi h_{T_j}\big\rangle\big| \leq \delta\big(\|h_{T_0}\|_2 + \|h_{T_1}\|_2\big)\sum_{j\geq 2}\|h_{T_j}\|_2 \leq \sqrt{2}\,\delta\,\|h_{T_0\cup T_1}\|_2 \sum_{j\geq 2}\|h_{T_j}\|_2$,

where the last step uses $\|h_{T_0}\|_2 + \|h_{T_1}\|_2 \leq \sqrt{2}\,\|h_{T_0\cup T_1}\|_2$, which holds since $T_0$ and $T_1$ are disjoint.
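Since the parallelogram identity is doing the heavy lifting here (it is also the subject of the second comment below), here is a small numerical sketch, again my own, verifying $\langle \Phi u, \Phi v\rangle = \tfrac{1}{4}\big(\|\Phi(u+v)\|_2^2 - \|\Phi(u-v)\|_2^2\big)$ and showing that, for a random Gaussian matrix, the resulting inner product between disjointly supported unit vectors is indeed small.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 60, 200, 5
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# two unit vectors with disjoint supports of size k
u, v = np.zeros(n), np.zeros(n)
u[:k] = rng.standard_normal(k)
v[k:2 * k] = rng.standard_normal(k)
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)

# polarization / parallelogram identity
lhs = (Phi @ u) @ (Phi @ v)
rhs = 0.25 * (np.linalg.norm(Phi @ (u + v)) ** 2 - np.linalg.norm(Phi @ (u - v)) ** 2)
print("parallelogram identity:", lhs, "=", rhs)

# The proof bounds |<Phi u, Phi v>| by delta; for a random Gaussian Phi this
# "restricted orthogonality" quantity is small with high probability.
print("|<Phi u, Phi v>| =", abs(lhs))
```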
To summarize, we have the following bound on $\|h_{T_0\cup T_1}\|_2$:

$(1-\delta)\|h_{T_0\cup T_1}\|_2^2 \leq \|h_{T_0\cup T_1}\|_2\Big(2(1+\delta)^{1/2}\epsilon + \sqrt{2}\,\delta\sum_{j\geq 2}\|h_{T_j}\|_2\Big)$,

which, after applying (2), can be rearranged to have the form

$\|h_{T_0\cup T_1}\|_2 \leq \alpha\epsilon + \rho\, k^{-1/2}\|h_{T_0^{\mathrm{c}}}\|_1, \qquad (4)$

where $\alpha := 2(1+\delta)^{1/2}/(1-\delta)$ and $\rho := \sqrt{2}\,\delta/(1-\delta)$ are only dependent on $\delta$.
At this point, we have the bounds (2) and (4), both of which are in terms of $\|h_{T_0^{\mathrm{c}}}\|_1$. To analyze this, note that $x^\star = x + h$ minimizes the 1-norm over the feasible set while $x$ is feasible, and so the reverse triangle inequality gives

$\|x\|_1 \geq \|x+h\|_1 = \sum_{i\in T_0}|x_i+h_i| + \sum_{i\in T_0^{\mathrm{c}}}|x_i+h_i| \geq \|x_{T_0}\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^{\mathrm{c}}}\|_1 - \|x_{T_0^{\mathrm{c}}}\|_1$,

and rearranging gives

$\|h_{T_0^{\mathrm{c}}}\|_1 \leq \|h_{T_0}\|_1 + 2\|x_{T_0^{\mathrm{c}}}\|_1 \leq k^{1/2}\|h_{T_0}\|_2 + 2\|x - x_k\|_1 \leq k^{1/2}\|h_{T_0\cup T_1}\|_2 + 2\|x - x_k\|_1$,

where the second inequality applies Cauchy-Schwarz to the all-ones vector, along with the identification $x_{T_0} = x_k$. With this, (2) becomes

$\|h_{(T_0\cup T_1)^{\mathrm{c}}}\|_2 \leq \|h_{T_0\cup T_1}\|_2 + 2k^{-1/2}\|x - x_k\|_1 \qquad (5)$

and (4) becomes

$\|h_{T_0\cup T_1}\|_2 \leq \alpha\epsilon + \rho\|h_{T_0\cup T_1}\|_2 + 2\rho\, k^{-1/2}\|x - x_k\|_1$.

Isolating $\|h_{T_0\cup T_1}\|_2$ in this last inequality then gives

$\|h_{T_0\cup T_1}\|_2 \leq (1-\rho)^{-1}\big(\alpha\epsilon + 2\rho\, k^{-1/2}\|x - x_k\|_1\big), \qquad (6)$

where the division by $1-\rho$ is legitimate because $\rho < 1$ precisely when $\delta < \sqrt{2}-1$.
Finally, we use (5) to bound $\|h\|_2$ in terms of $\|h_{T_0\cup T_1}\|_2$:

$\|h\|_2 \leq \|h_{T_0\cup T_1}\|_2 + \|h_{(T_0\cup T_1)^{\mathrm{c}}}\|_2 \leq 2\|h_{T_0\cup T_1}\|_2 + 2k^{-1/2}\|x - x_k\|_1$,

and then (6) gives the result.
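To close the loop on the constants: if I track the bookkeeping through (5) and (6), I get $C_0 = 2(1+\rho)/(1-\rho)$ and $C_1 = 2\alpha/(1-\rho)$, with $\alpha$ and $\rho$ as defined after (4). Here is a small sketch (my own, based only on that algebra) that prints these constants as $\delta$ approaches $\sqrt{2}-1$; they stay finite exactly while $\rho < 1$.

```python
import numpy as np

def constants(delta):
    """Constants tracked through the proof above: alpha, rho from (4), and
    C_0, C_1 in ||h||_2 <= C_0 k^{-1/2} ||x - x_k||_1 + C_1 eps."""
    alpha = 2 * np.sqrt(1 + delta) / (1 - delta)
    rho = np.sqrt(2) * delta / (1 - delta)
    C0 = 2 * (1 + rho) / (1 - rho)
    C1 = 2 * alpha / (1 - rho)
    return alpha, rho, C0, C1

for delta in [0.1, 0.2, 0.3, 0.4, np.sqrt(2) - 1 - 1e-3]:
    alpha, rho, C0, C1 = constants(delta)
    print("delta=%.3f  rho=%.3f  C0=%.1f  C1=%.1f" % (delta, rho, C0, C1))
```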
1. Please give some more explanation of the inequalities in Eq. (1). Is the factor $\sqrt{k}$ taken only for the sake of the proof, or is there another reason?
2. In the parallelogram identity, how does the factor $(1-\delta)$ come into the equation?
You need the $\sqrt{k}$ in order to bound the 2-norm in terms of the infinity-norm:
http://en.wikipedia.org/wiki/Norm_%28mathematics%29
The factor of $(1-\delta)$ is used to give an upper bound on the negative term (i.e., a lower bound on the thing being negated).
$\|h_{T_0}\|_1 \leq \sqrt{k}\,\|h_{T_0}\|_2$. How?
This is how to bound the 1-norm in terms of the 2-norm:
http://en.wikipedia.org/wiki/Norm_%28mathematics%29
You can prove this using Cauchy-Schwarz: for any vector $v$ with $k$ entries, $\|v\|_1 = \langle |v|, \mathbf{1}\rangle \leq \|v\|_2\,\|\mathbf{1}\|_2 = \sqrt{k}\,\|v\|_2$.
What is the difference between $\delta_s$ and $\delta_{2s}$? What do they indicate?
Why is the value of $\delta_{2s}$ chosen to be less than $\sqrt{2}-1$ in particular?
How does $\delta_{2s}=1$ imply linearly dependent vectors, and how does $\delta_{2s}<1$ imply a unique solution?
If $\delta_{2s}<1$ gives a unique solution, then why do we select $\delta_{2s}<\sqrt{2}-1$?
Is there any article written by you on RIP for random matrices?