Solving underdetermined linear systems is impossible unless you’re given additional information about the solution. The goal of compressed sensing is to solve these systems by assuming the desired solution is sparse. In this vein, one common approach is L1 minimization: Find the minimizer of $\|x\|_1$ subject to $\Phi x = \Phi x^\natural$, where $x^\natural$ is the true solution (and so $\Phi x^\natural$ is the available data). Here is the standard picture to illustrate why L1 minimization works:
In this case, we applied a sensing matrix $\Phi$ whose null space is the set of scalar multiples of a single vector. The vector we are sensing is $x^\natural$, and the red line denotes the null space of $\Phi$ shifted by $x^\natural$ (i.e., the set of all $x$ such that $\Phi x = \Phi x^\natural$). The blue diamond illustrates the smallest L1-ball which intersects this shifted subspace, and we note that this intersection occurs at our signal $x^\natural$. As such, L1 minimization encouraged sparsity so well that it recovered the desired signal exactly.
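To make this concrete, here is a minimal sketch of L1 minimization as a linear program, solved with SciPy. The 1-by-2 sensing matrix and 1-sparse signal below are my own illustrative choices (not the values behind the figure); the standard trick introduces slack variables $t$ with $-t \le x \le t$ and minimizes their sum.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(Phi, y):
    """Solve min ||x||_1 subject to Phi x = y as a linear program in (x, t)."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])  # objective: sum of t
    A_eq = np.hstack([Phi, np.zeros((M, N))])      # measurement constraint
    I = np.eye(N)
    A_ub = np.vstack([np.hstack([I, -I]),          # x - t <= 0
                      np.hstack([-I, -I])])        # -x - t <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

# Hypothetical sensing matrix with a one-dimensional null space,
# and a 1-sparse signal to recover from a single measurement:
Phi = np.array([[1.0, 2.0]])
x_true = np.array([0.0, 1.0])
x_hat = l1_minimize(Phi, Phi @ x_true)
print(np.round(x_hat, 6))
```

Here the feasible line meets the smallest L1 ball only at the sparse signal, so the program returns it exactly (up to solver tolerance).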
The purpose of this blog entry is to gain a deeper intuition for why L1 minimization works so well. The following (rather obvious) lemma lies at the heart of this intuition:
Lemma. Fix some sensing matrix $\Phi$, and for every signal $x^\natural$, consider the corresponding feasible set
$$F(x^\natural) := \{x : \Phi x = \Phi x^\natural\}.$$
Then $x^\natural$ uniquely minimizes $\|x\|_1$ subject to $x \in F(x^\natural)$ if and only if $F(x^\natural) \cap \|x^\natural\|_1 B_1 = \{x^\natural\}$, where $B_1$ denotes the closed unit L1-ball.
The beauty of this lemma is that it recasts exact recovery from L1 minimization purely in terms of convex geometry: A good null space for $\Phi$ will be oriented in such a way that its shift by $x^\natural$ (namely, $F(x^\natural)$) will kiss $\|x^\natural\|_1 B_1$ uniquely at $x^\natural$. We will apply this understanding to relate the following two concepts:
(i) An $M\times N$ sensing matrix $\Phi$ is said to satisfy the exact recovery property for $K \subseteq \{1,\ldots,N\}$ (or $K$-ERP) if every $x^\natural$ supported on $K$ uniquely minimizes $\|x\|_1$ subject to $\Phi x = \Phi x^\natural$.
(ii) An $M\times N$ sensing matrix $\Phi$ is said to satisfy the null space property for $K$ (or $K$-NSP) if every nonzero $v$ in the null space of $\Phi$ satisfies $\|v_K\|_1 < \|v_{K^c}\|_1$ (here, $v_K$ denotes the vector which agrees with $v$ on the entries indexed by $K$ and is zero elsewhere).
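Since the NSP inequality is invariant under scaling, a matrix whose null space is one-dimensional can be tested by examining a single spanning null vector. A small sketch of such a check (the matrix below is a hypothetical example of my own):

```python
import numpy as np

def nsp_check_1d(Phi, K, tol=1e-10):
    """Check the null space property for K when null(Phi) is one-dimensional."""
    _, s, Vt = np.linalg.svd(Phi)
    nullity = Phi.shape[1] - np.sum(s > tol)
    assert nullity == 1, "this check only handles a one-dimensional null space"
    v = Vt[-1]                       # spans null(Phi)
    on_K = np.zeros(Phi.shape[1], dtype=bool)
    on_K[list(K)] = True
    return np.abs(v[on_K]).sum() < np.abs(v[~on_K]).sum()

Phi = np.array([[1.0, 2.0]])     # null space spanned by (2, -1)
print(nsp_check_1d(Phi, K={1}))  # True:  |-1| < |2|
print(nsp_check_1d(Phi, K={0}))  # False: |2| is not < |-1|
```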
The reader who is familiar with compressed sensing is probably aware that a matrix is $K$-ERP for every $K$ of size $k$ if and only if it’s $K$-NSP for every $K$ of size $k$. The following theorem is a more specific version of this result:
Theorem. A matrix is $K$-ERP if and only if it’s $K$-NSP.
Before proving this theorem, we note that in our example above, $\Phi$ is evidently $K$-ERP for $K = \mathrm{supp}(x^\natural)$, and indeed, $\Phi$ is also $K$-NSP since every member of the null space has the form $\alpha v$ for a fixed vector $v$ with $\|v_K\|_1 < \|v_{K^c}\|_1$, which satisfies $\|(\alpha v)_K\|_1 < \|(\alpha v)_{K^c}\|_1$ whenever $\alpha \neq 0$.
Proof: We will obtain the contrapositive of each direction.
($\Rightarrow$) Suppose $\Phi$ is not $K$-NSP. Then there exists a nonzero $v$ in the null space of $\Phi$ such that $\|v_K\|_1 \geq \|v_{K^c}\|_1$. In other words, setting $x^\natural := v_K$ and $x' := -v_{K^c}$ gives that $\|x'\|_1 \leq \|x^\natural\|_1$. Also, $x' \in F(x^\natural)$ since $x^\natural - x' = v_K + v_{K^c} = v$ lies in the null space of $\Phi$, and $x' \neq x^\natural$ since $v \neq 0$. The following illustrates this situation:
In this figure, $F(x^\natural)$ is denoted by a dotted red line. Overall, we have $F(x^\natural) \cap \|x^\natural\|_1 B_1 \neq \{x^\natural\}$ (since it must also contain $x'$), and so $\Phi$ is not $K$-ERP by the lemma.
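This construction is easy to verify numerically. Below is a sketch with a hypothetical matrix that fails the null space property for $K$: splitting the offending null vector into its part on $K$ and the negation of its part off $K$ produces two distinct vectors with identical measurements, where the competitor has no larger L1 norm.

```python
import numpy as np

# Hypothetical matrix that is not {1}-NSP: its null space is spanned by
# v = (1, -2), and |v_1| = 2 >= 1 = |v_0|.
Phi = np.array([[2.0, 1.0]])
v = np.array([1.0, -2.0])
assert np.allclose(Phi @ v, 0)

K, Kc = [1], [0]
x_nat = np.zeros(2); x_nat[K] = v[K]     # signal supported on K (part of v on K)
x_alt = np.zeros(2); x_alt[Kc] = -v[Kc]  # competitor: minus the part of v off K

print(np.allclose(Phi @ x_alt, Phi @ x_nat))       # same measurements
print(np.abs(x_alt).sum() <= np.abs(x_nat).sum())  # no larger L1 norm
```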
($\Leftarrow$) Suppose $\Phi$ is not $K$-ERP. Then by the lemma, there exists a vector $x^\natural$ with support in $K$ such that $F(x^\natural) \cap \|x^\natural\|_1 B_1$ contains some $x' \neq x^\natural$. Pick $v := x^\natural - x'$. Since $x' \in F(x^\natural)$, we have that $v$ is in the null space of $\Phi$, and furthermore, $v \neq 0$ since $x' \neq x^\natural$. The following illustrates this situation:
Evidently, $\|v_{K^c}\|_1$ is no larger than $\|v_K\|_1$, and this can be established in general by adding a convenient form of zero, and then applying the triangle inequality along with the facts that $x^\natural$ is supported on $K$ and $\|x'\|_1 \leq \|x^\natural\|_1$:
$$\|v_{K^c}\|_1 = \|(x^\natural - x')_{K^c}\|_1 = \|x'_{K^c}\|_1 = \|x'\|_1 - \|x'_K\|_1 \leq \|x^\natural\|_1 - \|x'_K\|_1 = \|x^\natural_K - x'_K + x'_K\|_1 - \|x'_K\|_1 \leq \|x^\natural_K - x'_K\|_1 = \|v_K\|_1.$$
As such, $\Phi$ is not $K$-NSP.
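The reverse construction can be checked the same way. In the sketch below (again with hypothetical numbers of my own), a signal supported on $K$ fails to be the unique L1 minimizer, and the difference between it and a shorter feasible point is a null vector violating the NSP inequality:

```python
import numpy as np

Phi = np.array([[1.0, 2.0]])   # hypothetical sensing matrix
x_nat = np.array([2.0, 0.0])   # supported on K = {0}
x_alt = np.array([0.0, 1.0])   # feasible point with strictly smaller L1 norm
assert np.allclose(Phi @ x_alt, Phi @ x_nat)
assert np.abs(x_alt).sum() < np.abs(x_nat).sum()

v = x_nat - x_alt              # the proof's witness
print(np.allclose(Phi @ v, 0))                       # v lies in the null space
print(np.abs(v[[0]]).sum() >= np.abs(v[[1]]).sum())  # NSP fails for K = {0}
```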
As an aside, let’s consider how to build $K$-NSP matrices. To this end, the restricted isometry property is commonly used as a sufficient condition for the null space property. As you might expect, there’s an analogous condition for $K$-NSP:
Definition. An $M\times N$ sensing matrix $\Phi$ is said to satisfy the restricted isometry property for $K$ and $k$ with constant $\delta$ (or $(K,k)$-RIP) if
$$(1-\delta)\|x-y\|_2^2 \leq \|\Phi x - \Phi y\|_2^2 \leq (1+\delta)\|x-y\|_2^2$$
for every $x$ with support in $K$ and every $y$ with support size at most $k$.
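For small dimensions, the best constant in this definition can be computed by brute force: a difference $x - y$ with $x$ supported on $K$ and $y$ supported on some $T$ of size $k$ is exactly a vector supported on $K \cup T$, so it suffices to examine the extreme eigenvalues of each Gram submatrix. A sketch (the Gaussian matrix is a hypothetical example):

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(Phi, K, k):
    """Smallest delta for the (K, k) restricted isometry property, by brute
    force over all supports S = K union T with |T| = k."""
    N = Phi.shape[1]
    delta = 0.0
    for T in combinations(range(N), k):
        S = sorted(set(K) | set(T))
        eigs = np.linalg.eigvalsh(Phi[:, S].T @ Phi[:, S])
        delta = max(delta, np.max(np.abs(eigs - 1.0)))
    return delta

rng = np.random.default_rng(0)
M, N = 10, 12
Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # hypothetical Gaussian matrix
print(restricted_isometry_constant(Phi, K={0, 1}, k=2))
```

A matrix with orthonormal columns has constant zero for every choice of $K$ and $k$, which makes a convenient sanity check.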
In words, $(K,k)$-RIP matrices preserve distances between vectors supported on $K$ and other sparse vectors. By taking $x = 0$, we see that this version of RIP implies the classical version of RIP with sparsity level $k$. As such, this is not much of a weakening of RIP. Regardless, the following result follows from the analysis in the classical proof that RIP implies NSP:
Theorem. If $\Phi$ is $(K,2|K|)$-RIP with $\delta < \sqrt{2}-1$, then $\Phi$ is $K$-NSP.
Set $k := |K|$. Since $(K,2k)$-RIP implies the classical $2k$-RIP, a random matrix will need on the order of $k\log(N/k)$ rows to satisfy $(K,2k)$-RIP. As such, we’re not saving much in the number of measurements compared to drawing a $2k$-RIP matrix to get NSP (i.e., $K$-NSP for every $K$ of size $k$ simultaneously). However, we can certainly lose the log factor, for example, by measuring with identity basis elements: taking the rows of $\Phi$ to be $\{e_i\}_{i\in K}$ gives only $k$ rows, and the null space is then the set of vectors supported on $K^c$, which makes $K$-NSP immediate. This disparity leads to the following problem:
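The identity-measurement observation is worth spelling out: if the rows of the sensing matrix are the identity basis elements indexed by a known support, the null space consists precisely of the vectors supported off that set, so the NSP inequality holds trivially. A quick sketch with hypothetical dimensions:

```python
import numpy as np

N = 8
K = [2, 5]          # hypothetical known support
Phi = np.eye(N)[K]  # M = |K| = 2 rows: the identity basis elements e_i, i in K

# Any member of null(Phi) vanishes on K, so ||v_K||_1 = 0 < ||v_{K^c}||_1
# for every nonzero null vector, which is exactly K-NSP.
rng = np.random.default_rng(1)
v = rng.standard_normal(N)
v[K] = 0.0          # a generic null space member
print(np.allclose(Phi @ v, 0))               # True
print(np.abs(v[K]).sum() < np.abs(v).sum())  # True: 0 < ||v_{K^c}||_1
```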
Problem. Given a collection $\mathcal{S}$ of subsets $K \subseteq \{1,\ldots,N\}$, how large must $M$ be for there to exist an $M\times N$ matrix $\Phi$ which is $K$-NSP for every $K \in \mathcal{S}$?
Depending on $\mathcal{S}$, the answer can range from $k$ to on the order of $k\log(N/k)$, where $k$ denotes the size of the largest member of $\mathcal{S}$, and even though the log factor might be negligible, some (real-world) choices for $\mathcal{S}$ might lead to a much smaller constant (which could be important for certain applications).