This post will discuss an intuitive and fast reconstruction algorithm for compressed sensing called one-step thresholding. Since these are the first notes I’ve written about compressed sensing, I’ll take some time to review the basics:
Compressed sensing is a field where you want to capture all of a signal's information in as few non-adaptive linear measurements as possible. Assuming there is no noise in the measurement process, each measurement is an entry of $y = \Phi x$, where $\Phi$ is a short, fat $M \times N$ matrix ($M < N$) and $x \in \mathbb{R}^N$ is the signal. Of course, since the rank of $\Phi$ is smaller than the dimension of $x$, we know $y$ will fail to completely describe $x$ unless we have additional information about $x$. Oftentimes, signals are known to be sparse in some orthonormal basis, and so this is the prior that is typically used.
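To make the setup concrete, here is a minimal sketch of the measurement model in Python; the dimensions $M$, $N$, $K$ and the Gaussian construction are just illustrative choices, not requirements of anything above.

```python
import numpy as np

# Hypothetical dimensions for illustration: N-dimensional signal,
# M non-adaptive linear measurements, K nonzero entries.
N, M, K = 256, 64, 5

rng = np.random.default_rng(0)

# Short, fat sensing matrix with independent Gaussian entries,
# columns normalized to unit norm.
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)

# K-sparse signal: K nonzero entries at random locations.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Noiseless measurements: each entry of y is one linear measurement of x.
y = Phi @ x
```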
So if we know $x$ is a sparse vector, that is, it has few nonzero entries, then it makes sense to reconstruct $x$ from $y$ by looking for the sparsest vector in the preimage of $y$. However, finding this vector is known to be NP-hard. Instead of looking for the sparsest vector, aka the vector of minimal $\ell_0$-norm, it has become a standard trick to convexify the $\ell_0$ objective function into the $\ell_1$-norm. In fact, both objective functions have the same unique global minimizer precisely when $\Phi$ satisfies the null space property. Also, matrices with the restricted isometry property (RIP) necessarily have the null space property, and these matrices are easy to construct by taking independent sub-Gaussian entries. We can use L1-minimization to reconstruct $K$-sparse vectors $x$ after applying a random $M \times N$ matrix $\Phi$ with only $M = O(K \log(N/K))$ rows. This is a considerable improvement over the previous sparse approximation results, which are based on the worst-case coherence of the columns of $\Phi$; these coherence-based guarantees require many more rows: $M = \Omega(K^2)$.
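As a rough illustration of L1-minimization (not how you would solve it in practice, where a dedicated solver is preferable), here is a sketch that recasts basis pursuit as a linear program over split variables $u, v \geq 0$ with $x = u - v$, reusing the $\Phi$, $x$, $y$ from the sketch above.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 subject to Phi x = y by splitting x = u - v, u, v >= 0."""
    _, N = Phi.shape
    c = np.ones(2 * N)                  # objective: sum(u) + sum(v) = ||x||_1 at the optimum
    A_eq = np.hstack([Phi, -Phi])       # constraint: Phi u - Phi v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

# Reusing Phi, x, y from the sketch above; exact recovery is expected
# for sufficiently sparse x.
# x_hat = basis_pursuit(Phi, y)
# print(np.max(np.abs(x_hat - x)))
```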
While L1-minimization has played an important role in the history of compressed sensing, it is by no means the most efficient method for reconstruction. Certainly, it is faster than minimizing the $\ell_0$-norm, but many of the previously known sparse approximation procedures are very fast, and since the introduction of RIP, it has been applied to these older procedures to demonstrate better performance. Perhaps the most computationally cheap reconstruction algorithm is one-step thresholding (OST), which finds the columns of $\Phi$ that look the most like the measurement vector $y$ to identify the support of $x$, and then recovers $x$ by applying the pseudoinverse of the identified submatrix of $\Phi$. But there are two additional pieces of information you need about $x$ in order for OST to work properly:
(i) the norm of the sparse signal, and
(ii) that the nonzero entries of the signal are of similar size.
In many cases, determining the norm of $x$ is easy (such as when $\Phi$ is RIP), and so the real distinction for OST is (ii): it forces us to require that $x$ is not only sparse, but also has nonzero entries of sufficiently equal size. Intuitively, this is a reasonable assumption if there is noise in $y$, since the relatively small nonzero entries could then be buried in the noise.
To illustrate why OST requires (i) and (ii), let’s prove the coherence-based performance guarantee for OST:
Theorem. Pick an $M \times N$ sensing matrix $\Phi$ with unit-norm columns and worst-case coherence $\mu$, and suppose $x$ is $K$-sparse with support $S$ and sufficiently large nonzero entries:
$|x_n| > 2\mu\|x\|_1$ for every $n \in S$.
Then given measurements $y = \Phi x$, we can recover the support of $x$ from back-projection:
$S = \{\, n : |(\Phi^* y)_n| > \mu\|x\|_1 \,\}$.
Proof: Taking $z = \Phi^* y = \Phi^* \Phi x$, we have that each entry of $z$ is of the form
$z_n = \sum_{m \in S} \langle \varphi_m, \varphi_n \rangle x_m$.
For each $n \in S$, the reverse triangle and triangle inequalities then give
$|z_n| \geq |x_n| - \sum_{m \in S \setminus \{n\}} |\langle \varphi_m, \varphi_n \rangle|\,|x_m| \geq |x_n| - \mu\|x\|_1 > \mu\|x\|_1$.
Similarly, for each $n \notin S$, we have
$|z_n| \leq \sum_{m \in S} |\langle \varphi_m, \varphi_n \rangle|\,|x_m| \leq \mu\|x\|_1$.
For this theorem, the necessary ratio between the smallest and average size of the nonzero entries of $x$ satisfies
$\frac{\min_{n \in S} |x_n|}{\|x\|_1 / K} > 2\mu K$,
and since this ratio can never exceed $1$, while the Welch bound keeps $\mu$ on the order of $1/\sqrt{M}$ or larger, the number of rows of $\Phi$ must be
$M = \Omega(K^2)$, as indicated earlier. Also notice that the norm of $x$ (the $\ell_1$-norm in this case) is used to determine the threshold, but to recover $S$, it suffices to know the size of $x$'s support (just pick the $K$ largest entries in the back-projection).
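Here is a minimal sketch of OST in this "pick the $K$ largest" form (assuming $K$ is known), again reusing $\Phi$, $y$, $x$, $K$ from the earlier sketch:

```python
import numpy as np

def one_step_thresholding(Phi, y, K):
    """Estimate the support from the K largest entries of the back-projection
    Phi^T y, then recover x with the pseudoinverse of that submatrix."""
    z = Phi.T @ y                                     # back-projection (Phi^* y for real Phi)
    support = np.sort(np.argsort(np.abs(z))[-K:])     # indices of the K largest entries
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = np.linalg.pinv(Phi[:, support]) @ y
    return x_hat, support

# Reusing Phi, y, x, K from the earlier sketch:
# x_hat, S_hat = one_step_thresholding(Phi, y, K)
# print(np.array_equal(S_hat, np.sort(np.flatnonzero(x))))
```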
To decrease the number of rows required in the sensing matrix $\Phi$, there has been some work to apply RIP structure. In many applications, deterministic sensing matrices are used, but to date, only random RIP constructions are known to perform much better than $M = \Omega(K^2)$. Moreover, it's difficult to verify higher levels of RIP. As an alternative, other work has managed to find $M = O(K \log N)$ performance from OST with much weaker (verifiable) conditions on $\Phi$: a bound on the worst-case coherence,
$\mu = O\big(\tfrac{1}{\log N}\big)$,
and a bound on the average coherence of the columns,
$\nu = O\big(\tfrac{\mu}{\sqrt{M}}\big)$.
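Part of the appeal is that both quantities are cheap to compute. Here is a sketch, assuming the usual definitions: worst-case coherence is the largest off-diagonal entry of the Gram matrix in absolute value, and average coherence is the largest (in absolute value) average inner product of one column against all the others.

```python
import numpy as np

def coherences(Phi):
    """Worst-case and average coherence of a matrix with unit-norm columns."""
    N = Phi.shape[1]
    G = Phi.T @ Phi                                   # Gram matrix of the columns
    off = G - np.diag(np.diag(G))                     # zero out the diagonal
    mu = np.max(np.abs(off))                          # worst-case coherence
    nu = np.max(np.abs(off.sum(axis=1))) / (N - 1)    # average coherence
    return mu, nu

# Reusing the unit-norm-column Phi from the first sketch:
# mu, nu = coherences(Phi)
```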
The difference here is that OST will only identify the correct support with high probability (assuming the support is drawn uniformly at random).
You might wonder if the uniform-random support assumption is valid. Suppose it isn't, but I have the luxury of changing my sensing matrix before every measurement. Then I can randomly permute the columns of my sensing matrix and get the desired effect of random support in the sparse signal. Beyond the uniform-randomness assumption, the overall philosophy of having randomness in the performance is probably better than the current philosophy in state-of-the-art compressed sensing; engineers seem to prefer a fixed machine that is guaranteed to work 99% of the time over a machine drawn at random that, 1% of the time, might never work. To be more fair, since the weaker matrix conditions of OST are verifiable, they could be used in addition to state-of-the-art compressed sensing: draw a random matrix, which will be perfect with high probability, and then test whether it will be good for OST to see if it's at least near-perfect. Since verifying RIP is difficult, this might be the best indication you can get in a reasonable amount of time.
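Returning to the permutation trick above, here is a minimal sketch of what it looks like; the helper name and interface are made up for illustration.

```python
import numpy as np

def permute_then_sense(Phi, x, rng):
    """Sense with randomly permuted columns, so that from the fixed matrix's
    point of view the sparse signal's support is uniformly random."""
    perm = rng.permutation(Phi.shape[1])   # random column permutation
    y = Phi[:, perm] @ x                   # same as Phi @ v, where v[perm] = x
    return y, perm

# If OST on (Phi, y) returns an estimated support S_hat (in the permuted
# coordinates), the original support is np.flatnonzero(np.isin(perm, S_hat)).
```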
For some reason, you tend to call out engineers a lot for their weird quirks. Can I help it if I, like most other engineers, prefer to live pragmatically with our 99% guarantee instead of that esoteric 1% shot in the dark?
P.S. Do you plan on writing MATLAB code for this? I can easily see making a MATLAB screensaver that implements your sensing algorithm, but I am not sure what I would input into the matrix that I would be interested in.
For the record, I appreciate the quirks. As for your second comment, MATLAB is probably the easiest way to implement the algorithm, since it only requires matrix-vector products. Also, if you’re interested in stopping online piracy, the following should lead to a worthy input for the matrix:
http://arxiv.org/abs/1111.3376
The anti-piracy algorithm is quite intriguing. I miss working with you and your math stuff.
Hi! For an arbitrary vector/matrix, it seems the 1-norm is always larger than each of its entries? It also seems it doesn't matter whether the 1-norm or a constant is used in the proof? I'm a little confused here, since the coherence is always ... Thanks for your response!
Sorry for the typo above! I meant that the 1-norm should always be larger than each of its entries. Thank you!
Not sure I understand your question. Coherence is actually between 0 and 1 (my columns each have unit norm), and yes, the 1-norm is the sum of the entry sizes, and hence larger than each entry.