The restricted isometry property and its implications for compressed sensing

I’ve blogged a couple of times about the restricted isometry property (RIP), but this is my first entry that really digs into why this property is so important to compressed sensing. A short, fat matrix \Phi is (K,\delta)-RIP if for every K-sparse vector x,

\displaystyle{ (1-\delta)\|x\|_2^2\leq\|\Phi x\|_2^2\leq(1+\delta)\|x\|_2^2}.
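
Equivalently, the smallest such \delta (the restricted isometry constant \delta_K) is the largest spectral deviation of \Phi_T^*\Phi_T from the identity over all size-K supports T. Below is a minimal Python sketch (numpy assumed; it brute-forces over supports, so it is only sensible for toy sizes, and the names are mine) that computes \delta_K this way:

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    """Brute-force restricted isometry constant delta_K of Phi.

    delta_K is the largest value of |lambda - 1| over all eigenvalues lambda
    of Phi_T^T Phi_T, ranging over all supports T of size K; this is exactly
    the smallest delta satisfying the RIP inequalities above.
    """
    _, N = Phi.shape
    delta = 0.0
    for T in itertools.combinations(range(N), K):
        cols = list(T)
        gram = Phi[:, cols].T @ Phi[:, cols]   # K x K Gram matrix on support T
        eigs = np.linalg.eigvalsh(gram)        # eigenvalues in ascending order
        delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta

# Toy example: a 10 x 20 Gaussian matrix with variance-1/M entries
rng = np.random.default_rng(0)
M, N, K = 10, 20, 2
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
print(rip_constant(Phi, K))
```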

The main point here is that RIP is sufficient to guarantee sparse reconstruction by L1-minimization (we make this more explicit in a theorem statement below). It is known that L1-minimization reconstructs every K-sparse signal precisely when the sensing matrix satisfies the null space property (NSP) of order K, and so one way to prove that RIP is sufficient is to show that RIP implies NSP. The following paper bypasses the NSP analysis by giving a direct result for RIP:

The restricted isometry property and its implications for compressed sensing

Emmanuel J. Candes

As a disclaimer, some have argued that the RIP guarantees are far from the fundamental limits of sparse recovery. This is not too surprising, considering the number of approximations which are used in the proof of the RIP guarantees (see below). Regardless, there is no question that the RIP framework has given the community a solid intuition for good sensing matrices; we want short, fat matrices whose columns are incoherent, and not merely pairwise incoherent: every collection of K columns should be nearly orthogonal. Also, RIP provides guarantees which are, perhaps surprisingly, on the order of what's optimal; it's just a matter of application whether you need to support larger sparsity levels.

As promised, the following is the main result of the paper:

Theorem (Candes). Suppose the M\times N matrix \Phi is (2K,\delta)-RIP with \delta<\sqrt{2}-1, and take measurements of a (nearly K-sparse) vector x\in\mathbb{R}^N of the form

\displaystyle{y=\Phi x+z},

where z is some noise term with \|z\|_2\leq\varepsilon. There exist constants C_0 and C_1 such that the estimate

\displaystyle{x^\star:=\arg\min_{\tilde{x}\in\mathbb{R}^N}\|\tilde{x}\|_1\quad \mbox{subject to}\quad \|y-\Phi \tilde{x}\|_2\leq \varepsilon}

satisfies

\displaystyle{\|x^\star-x\|_2\leq C_0\varepsilon+\frac{C_1}{\sqrt{K}}\|x-x_K\|_1},

where x_K is the best K-sparse approximation of x.
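
To make the estimator concrete, here is a minimal Python sketch of the noise-aware L1 program from the theorem. It leans on the cvxpy modeling library purely as a convenience (an assumption on my part; any second-order cone solver would do), and all variable names are mine:

```python
import numpy as np
import cvxpy as cp

def l1_recover(Phi, y, eps):
    """Solve: minimize ||x||_1 subject to ||y - Phi x||_2 <= eps."""
    x = cp.Variable(Phi.shape[1])
    constraints = [cp.norm(y - Phi @ x, 2) <= eps]
    cp.Problem(cp.Minimize(cp.norm1(x)), constraints).solve()
    return x.value

# Toy example: a K-sparse signal, Gaussian measurements, noise of norm eps
rng = np.random.default_rng(1)
M, N, K, eps = 40, 100, 5, 1e-3
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
z = rng.normal(size=M)
z *= eps / np.linalg.norm(z)            # ensure ||z||_2 <= eps
y = Phi @ x_true + z
x_hat = l1_recover(Phi, y, eps)
print(np.linalg.norm(x_hat - x_true))   # recovery error
```

When \Phi satisfies the required (2K,\delta)-RIP, the theorem bounds the printed error by C_0\varepsilon, since x_true here is exactly K-sparse.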

While the proof as written in the paper is good (and in my opinion, easier to read than many of its predecessors), I would prefer if the logic flowed differently. As a specific example, I don’t think his Lemma 2.1 is technical or long enough to warrant being a separate lemma. It’s cool that RIP is related to restricted orthogonality (which this lemma proves), but restricted orthogonality isn’t explicitly mentioned in the paper, so this relationship is lost on the uninformed reader. Since the lemma’s proof is so short, I’d rather see this logic hard-coded in the proof of the main result to better motivate that part of the proof. Below, I provide a remix of Candes’s proof, which is easier for me to follow. Your mileage may vary, but feel free to read my proof and cite Candes in your papers. 🙂

Proof of Theorem (remix): To see how close the estimate x^\star is to x, we consider their difference, h:=x^\star-x. Next, partition \{1,\ldots,N\} into size-K subsets \{T_j\}_{j\geq0} such that T_0 corresponds to the K largest entries of x in magnitude, and for each j\geq 1, T_j corresponds to the K largest entries of h in magnitude over \{1,\ldots,N\}\setminus\bigcup_{\ell=0}^{j-1} T_\ell.
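
As a sanity check on this construction, here is a small Python sketch (numpy assumed; the function and variable names are mine) that builds the index sets exactly as described, with T_0 taken from the largest-magnitude entries of x and each subsequent T_j taken greedily from the largest-magnitude remaining entries of h:

```python
import numpy as np

def build_partition(x, h, K):
    """Partition {0, ..., N-1} into blocks T_0, T_1, ... as in the proof.

    T_0 indexes the K largest-magnitude entries of x; for j >= 1, T_j indexes
    the K largest-magnitude entries of h among the indices not yet used.
    """
    T0 = set(np.argsort(-np.abs(x))[:K])
    # Remaining indices, ordered by |h| from largest to smallest.
    rest = [n for n in np.argsort(-np.abs(h)) if n not in T0]
    blocks = [sorted(T0)]
    for start in range(0, len(rest), K):
        blocks.append(sorted(rest[start:start + K]))
    return blocks  # blocks[j] is T_j; the last block may have fewer than K indices
```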

To show that h is small, we bound \|h_{(T_0\cup T_1)^\mathrm{c}}\|_2 and \|h_{T_0\cup T_1}\|_2 separately. The first bound uses standard inequalities, while the second uses standard inequalities in concert with RIP. For the first bound, the triangle inequality gives

\displaystyle{\|h_{(T_0\cup T_1)^\mathrm{c}}\|_2=\bigg\|\sum_{j\geq 2}h_{T_j}\bigg\|_2\leq\sum_{j\geq 2}\|h_{T_j}\|_2}.

Next, we bound each term in this sum using the definitions of T_j and T_{j-1}:

\displaystyle{\|h_{T_j}\|_2\leq\sqrt{K}\max_{n\in T_j}|h[n]|\leq\sqrt{K}\min_{n\in T_{j-1}}|h[n]|\leq\frac{1}{\sqrt{K}}\|h_{T_{j-1}}\|_1,\qquad(1) }

where the last inequality follows from the minimum being smaller than the average. Applying this inequality to the original expression then gives

\displaystyle{\|h_{(T_0\cup T_1)^\mathrm{c}}\|_2\leq\frac{1}{\sqrt{K}}\|h_{T_0^\mathrm{c}}\|_1. \qquad (A)}
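
Spelling out that last step: summing (1) over j\geq 2 and using the fact that the sets \{T_j\}_{j\geq 1} partition T_0^\mathrm{c} gives

\displaystyle{\sum_{j\geq 2}\|h_{T_j}\|_2\leq\frac{1}{\sqrt{K}}\sum_{j\geq 2}\|h_{T_{j-1}}\|_1=\frac{1}{\sqrt{K}}\sum_{j\geq 1}\|h_{T_j}\|_1=\frac{1}{\sqrt{K}}\|h_{T_0^\mathrm{c}}\|_1}.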

Next, we bound \|h_{T_0\cup T_1}\|_2 using lower-RIP (which applies since h_{T_0\cup T_1} is 2K-sparse):

\displaystyle{ (1-\delta)\|h_{T_0\cup T_1}\|_2^2\leq\|\Phi h_{T_0\cup T_1}\|_2^2=\langle\Phi h_{T_0\cup T_1}, \Phi h\rangle-\sum_{j\geq 2}\langle\Phi h_{T_0\cup T_1},\Phi h_{T_j}\rangle}

\displaystyle{\leq\langle\Phi h_{T_0\cup T_1}, \Phi h\rangle+\sum_{j\geq 2}|\langle\Phi h_{T_0\cup T_1},\Phi h_{T_j}\rangle|.\qquad(2)}

We continue this bound by analyzing the two terms separately. Cauchy-Schwarz and upper-RIP give

\displaystyle{\langle\Phi h_{T_0\cup T_1}, \Phi h\rangle\leq\|\Phi h_{T_0\cup T_1}\|_2\|\Phi h\|_2\leq\sqrt{1+\delta}\|h_{T_0\cup T_1}\|_2\|\Phi h\|_2},

and furthermore, the triangle inequality gives

\displaystyle{\|\Phi h\|_2=\|\Phi (x^\star-x)\|_2=\|\Phi x^\star-y+y-\Phi x\|_2\leq\|\Phi x^\star-y\|_2+\|y-\Phi x\|_2\leq 2\varepsilon},

where the last step follows from the fact that x and x^\star are both feasible solutions of the program described in the theorem. For the second term of (2), we note that

\displaystyle{|\langle\Phi h_{T_0\cup T_1},\Phi h_{T_j}\rangle|=|\langle\Phi h_{T_0},\Phi h_{T_j}\rangle+\langle\Phi h_{T_1},\Phi h_{T_j}\rangle|}

\displaystyle{\leq|\langle\Phi h_{T_0},\Phi h_{T_j}\rangle|+|\langle\Phi h_{T_1},\Phi h_{T_j}\rangle|.\qquad (3)}

For notational simplicity, let’s denote normalized vectors with hats:

\displaystyle{|\langle\Phi h_{T_0},\Phi h_{T_j}\rangle|=|\langle\Phi \widehat{h_{T_0}},\Phi \widehat{h_{T_j}}\rangle|\|h_{T_0}\|_2\|h_{T_j}\|_2}.

Then the parallelogram identity and RIP together give

\displaystyle{|\langle\Phi \widehat{h_{T_0}},\Phi \widehat{h_{T_j}}\rangle|=\frac{1}{4}\Big|\|\Phi \widehat{h_{T_0}}+\Phi \widehat{h_{T_j}}\|_2^2-\|\Phi \widehat{h_{T_0}}-\Phi \widehat{h_{T_j}}\|_2^2\Big|}

\displaystyle{\leq\frac{1}{4}\Big|(1+\delta)\|\widehat{h_{T_0}}+\widehat{h_{T_j}}\|_2^2-(1-\delta)\|\widehat{h_{T_0}}-\widehat{h_{T_j}}\|_2^2\Big|=\delta},

where the last equality follows from the fact that \widehat{h_{T_0}} and \widehat{h_{T_j}} are unit vectors with disjoint support, so that \|\widehat{h_{T_0}}\pm\widehat{h_{T_j}}\|_2^2=2. Since this analysis also holds for h_{T_1}, we substitute the bounds into (3) accordingly:

\displaystyle{|\langle\Phi h_{T_0\cup T_1},\Phi h_{T_j}\rangle|\leq\delta\big(\|h_{T_0}\|_2+\|h_{T_1}\|_2\big)\|h_{T_j}\|_2\leq\sqrt{2}\delta\|h_{T_0\cup T_1}\|_2\|h_{T_j}\|_2}.
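
The last inequality above is Cauchy-Schwarz applied to the all-ones vector in \mathbb{R}^2, together with the disjointness of T_0 and T_1:

\displaystyle{\|h_{T_0}\|_2+\|h_{T_1}\|_2\leq\sqrt{2}\big(\|h_{T_0}\|_2^2+\|h_{T_1}\|_2^2\big)^{1/2}=\sqrt{2}\|h_{T_0\cup T_1}\|_2}.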

To summarize, we have the following bound on (2):

\displaystyle{(1-\delta)\|h_{T_0\cup T_1}\|_2^2\leq\|h_{T_0\cup T_1}\|_2\bigg(2\varepsilon\sqrt{1+\delta}+\sqrt{2}\delta\sum_{j\geq 2}\|h_{T_j}\|_2\bigg)},

which, after applying (1), can be rearranged to have the form

\displaystyle{\|h_{T_0\cup T_1}\|_2\leq\alpha\varepsilon+\frac{\rho}{\sqrt{K}}\|h_{T_0^\mathrm{c}}\|_1,\qquad(B)}

where \alpha:=\frac{2\sqrt{1+\delta}}{1-\delta} and \rho:=\frac{\sqrt{2}\delta}{1-\delta} depend only on \delta. Note that the hypothesis \delta<\sqrt{2}-1 is exactly what makes \rho<1, which we will need shortly.

At this point, we have the following bounds:

\displaystyle{\|h_{(T_0\cup T_1)^\mathrm{c}}\|_2\leq\frac{1}{\sqrt{K}}\|h_{T_0^\mathrm{c}}\|_1, \qquad (A)}

\displaystyle{\|h_{T_0\cup T_1}\|_2\leq\alpha\varepsilon+\frac{\rho}{\sqrt{K}}\|h_{T_0^\mathrm{c}}\|_1,\qquad(B)}

both of which are in terms of \|h_{T_0^\mathrm{c}}\|_1. To analyze this, note that x is feasible for the program in the theorem while x^\star is its minimizer, so \|x^\star\|_1\leq\|x\|_1; the reverse triangle inequality then gives

\displaystyle{\|x\|_1\geq\|x^\star\|_1=\|x+h\|_1=\|x_{T_0}+h_{T_0}\|_1+\|x_{T_0^\mathrm{c}}+h_{T_0^\mathrm{c}}\|_1}

\displaystyle{\geq \big(\|x_{T_0}\|_1-\|h_{T_0}\|_1\big)+\big(\|h_{T_0^\mathrm{c}}\|_1-\|x_{T_0^\mathrm{c}}\|_1\big)},

and rearranging gives

\displaystyle{\|h_{T_0^\mathrm{c}}\|_1\leq \|h_{T_0}\|_1+2\|x_{T_0^\mathrm{c}}\|_1\leq\sqrt{K}\|h_{T_0}\|_2+2\|x-x_K\|_1}

\displaystyle{\leq \sqrt{K}\|h_{T_0\cup T_1}\|_2+2\|x-x_K\|_1},

where the second inequality applies Cauchy-Schwarz to the all-ones vector (so that \|h_{T_0}\|_1\leq\sqrt{K}\|h_{T_0}\|_2), along with the identification x_{T_0}=x_K, and the last inequality uses \|h_{T_0}\|_2\leq\|h_{T_0\cup T_1}\|_2. With this, (A) becomes

\displaystyle{\|h_{(T_0\cup T_1)^\mathrm{c}}\|_2\leq\|h_{T_0\cup T_1}\|_2+\frac{2}{\sqrt{K}}\|x-x_K\|_1, \qquad (A')}

and (B) becomes

\displaystyle{\|h_{T_0\cup T_1}\|_2\leq\alpha\varepsilon+\rho\bigg(\|h_{T_0\cup T_1}\|_2+\frac{2}{\sqrt{K}}\|x-x_K\|_1\bigg)}.

Isolating \|h_{T_0\cup T_1}\|_2 in this last inequality (which is possible since \rho<1) then gives

\displaystyle{\|h_{T_0\cup T_1}\|_2\leq\frac{1}{1-\rho}\bigg(\alpha\varepsilon+\frac{2\rho}{\sqrt{K}}\|x-x_K\|_1\bigg). \qquad(B')}

Finally, we use (A') to bound \|h\|_2 in terms of \|h_{T_0\cup T_1}\|_2:

\displaystyle{\|h\|_2\leq\|h_{(T_0\cup T_1)^\mathrm{c}}\|_2+\|h_{T_0\cup T_1}\|_2\leq2\|h_{T_0\cup T_1}\|_2+\frac{2}{\sqrt{K}}\|x-x_K\|_1},

and then (B') gives the result.     \Box
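
For the record, chaining (B') into this last display produces explicit constants:

\displaystyle{\|h\|_2\leq\frac{2\alpha}{1-\rho}\varepsilon+\frac{2(1+\rho)}{1-\rho}\cdot\frac{1}{\sqrt{K}}\|x-x_K\|_1},

so with the values of \alpha and \rho above, the theorem holds with C_0=\frac{4\sqrt{1+\delta}}{1-(1+\sqrt{2})\delta} and C_1=\frac{2(1+(\sqrt{2}-1)\delta)}{1-(1+\sqrt{2})\delta}.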

8 thoughts on “The restricted isometry property and its implications for compressed sensing”

  1. 1. Please give some more explanation of the inequalities in Eq. (1). Is the factor sqrt(K) there only for the sake of the proof, or is there another reason?

     2. In the parallelogram identity, how does the factor (1-delta) enter the equation?

  2. What is the difference between delta_s and delta_2s? What does each indicate?

     Why, in particular, is the value of delta_2s chosen to be less than sqrt(2)-1?

     How does delta_2s = 1 imply linearly dependent vectors, and delta_2s < 1 imply a unique solution?

     If delta_2s < 1 gives a unique solution, then why do we select delta_2s < sqrt(2)-1?
