One-step thresholding

This post will discuss an intuitive and fast reconstruction algorithm for compressed sensing called one-step thresholding.  Since these are the first notes I’ve written about compressed sensing, I’ll take some time to review the basics:

Compressed sensing is a field where you want to capture all of a signal’s information in as few non-adaptive linear measurements as possible. Assuming there is no noise in the measurement process, each measurement is an entry of $y=\Phi x$, where $\Phi$ is a short, fat matrix. Of course, since the rank of $\Phi$ is smaller than the dimension of $x$, we know $y$ will fail to completely describe $x$ unless we have additional information about $x$. Oftentimes, signals are known to be sparse in some orthonormal basis, and so this is the prior that is typically used.

So if we know $x$ is a sparse vector, that is, it has few nonzero entries, then it makes sense to reconstruct $x$ from $y=\Phi x$ by looking for the sparsest vector in the preimage of $y$. However, finding this vector is known to be NP-hard. Instead of looking for the sparsest vector, aka the vector of minimal $0$-norm, it has become a standard trick to convexify the $0$-norm objective function into the $1$-norm. In fact, both objective functions have the same unique global minimizer precisely when $\Phi$ satisfies the null space property. Also, matrices with the restricted isometry property (RIP) necessarily have the null space property, and these matrices are easy to construct by taking independent sub-Gaussian entries. We can use L1-minimization to reconstruct $K$-sparse vectors $x$ after applying a random $M\times N$ matrix $\Phi$ with only $M=\Omega(K\log(N/K))$ rows. This is a considerable improvement over the previous sparse approximation results, which are based on the worst-case coherence of the columns of $\Phi$; these coherence-based guarantees require many more rows: $M=\Omega(K^2)$.

While L1-minimization has played an important role in the history of compressed sensing, it is by no means the most efficient method for reconstruction. Certainly, it is faster than minimizing the $0$-norm, but many of the previously known sparse approximation procedures are very fast, and since the introduction of RIP, it has been applied to the older procedures to demonstrate better performance. Perhaps the most computationally cheap reconstruction algorithm is one-step thresholding (OST), which finds the columns of $\Phi$ that look the most like the measurement vector $y$ to identity the support of $x$, and then recovers $x$ by applying the pseudoinverse of the identified submatrix of $\Phi$. But there are two additional pieces of information you need about $x$ in order for OST to work properly:

(i) the norm of the sparse signal, and

(ii) that the nonzero entries of the signal have similar orders of size.

In many cases, determining the norm of $x$ is easy (such as when $\Phi$ is RIP), and so the real distinction for OST is (ii): it forces us to require that $x$ is not only sparse, but also has nonzero entries of sufficiently equal size. Intuitively, this is a reasonable assumption if there is noise in $x$, since the relatively small nonzero entries could then be buried in the noise.

To illustrate why OST requires (i) and (ii), let’s prove the coherence-based performance guarantee for OST:

Theorem. Pick an $M\times N$ sensing matrix $\Phi=[\varphi_1\cdots\varphi_N]$ with worst-case coherence

$\displaystyle{\mu:=\max_{\substack{i,j\in\{1,\ldots,N\}\\i\neq j}}|\langle \varphi_i,\varphi_j\rangle|}$,

and suppose $x$ is $K$-sparse with support $\mathcal{K}\subseteq\{1,\ldots,N\}$ and sufficiently large nonzero entries:

$\displaystyle{\min_{k\in\mathcal{K}}|x_k|\geq 2\mu\|x\|_1}$.

Then given measurements $y=\Phi x$, we can recover the support of $x$ from back-projection:

$\displaystyle{\mathcal{K}=\big\{n:|(\Phi^*y)[n]|>\mu\|x\|_1\big\}}$.

Proof: Taking $\hat{x}:=\Phi^*y=\Phi^*\Phi x$, we have that each entry of $\hat{x}$ is of the form

$\displaystyle{\hat{x}_n=x_n+\sum_{\substack{n'=1\\n'\neq n}}^N\langle\varphi_{n'},\varphi_n\rangle x_{n'}}$.

For each $n\in\mathcal{K}$, the reverse triangle and triangle inequalities then give

$\displaystyle{|\hat{x}_n| \geq|x_n|-\sum_{\substack{n'=1\\n'\neq n}}^N|\langle\varphi_{n'},\varphi_n\rangle| |x_{n'}| >2\mu\|x\|_1-\mu\|x\|_1 =\mu\|x\|_1}$.

Similarly, for each $n\not\in\mathcal{K}$, we have $|\hat{x}_n|\leq\mu\|x\|_1$.      $\Box$

For this theorem, the necessary ratio between the smallest and average size of the nonzero entries of $x$ satisfies

$\displaystyle{1\geq\frac{\min_{k\in\mathcal{K}}|x_k|}{\|x\|_1/K}\geq2\mu K\geq\frac{2K}{\sqrt{M}}}$,

and so the number of rows of $\Phi$ must be $M=\Omega(K^2)$, as indicated earlier. Also notice that the norm of $x$ (the $1$-norm in this case) is used to determine the threshold, but to recover $x$, it suffices to know the size of $x$‘s support (just pick the $K$ largest entries in the back-projection).

To decrease the number of rows required in the sensing matrix $\Phi$, there has been some work to apply RIP structure. In many applications, deterministic sensing matrices are used, but to date, only random RIP constructions are known to perform much better than $M\sim K^2$. Moreover, it’s difficult to verify higher levels of RIP. As an alternative, other work has managed to find $M\sim K\log(N/K)$ performance from OST with much weaker (verifiable) conditions on $\Phi=[\varphi_1\cdots\varphi_N]$:

$\displaystyle{\mu:=\max_{\substack{i,j\in\{1,\ldots,N\}\\i\neq j}}|\langle \varphi_i,\varphi_j\rangle|\leq\frac{C}{\log N}}$,

$\displaystyle{\nu:=\frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\sum_{\substack{j=1\\j\neq i}}^N\langle \varphi_i,\varphi_j\rangle\bigg|\leq\frac{\mu}{\sqrt{M}}}$.

The difference here is that OST will only identify the correct support with high probability (assuming the support is drawn uniformly at random).

You might wonder if the uniform-random support assumption is valid. Suppose it isn’t, but I have the luxury of changing my sensing matrix before every measurement. Then I can randomly permute the columns of my sensing matrix and get the desired effect of random support in the sparse signal. Beyond the uniform-randomness assumption, the overall philosophy of having randomness in the performance is probably better than the current philosophy in state-of-the-art compressed sensing; engineers seem to prefer a fixed machine that is guaranteed to work 99% of the time over a machine drawn at random that might never work 1% of the time. To be more fair, since the weaker matrix conditions of OST are verifiable, they could be used in addition to state-of-the-art compressed sensing: draw a random matrix, which will be perfect with high probability, and then test for whether it will be good for OST to see if it’s at least near-perfect. Since verifying RIP is difficult, this might be the best indication you can get in a reasonable amount of time.

6 thoughts on “One-step thresholding”

1. For some reason, you tend to call out engineers a lot for their weird quirks. Can I help it if I, like most other engineers, prefer to live pragmatically with our 99% guarantee instead of that esoteric 1% shot in the dark?

P.S. Do you plan on writing MATLab code for this? I can easily see making a MATLab screensaver that implements your sensing algorithm, but I am not sure what I would input into the matrix that I would be interested in.

1. For the record, I appreciate the quirks. As for your second comment, MATLAB is probably the easiest way to implement the algorithm, since it only requires matrix-vector products. Also, if you’re interested in stopping online piracy, the following should lead to a worthy input for the matrix:

http://arxiv.org/abs/1111.3376

1. The anti-piracy algorithm is quite intriguing. I miss working with you and your math stuff.

2. anson says:

Hi!
I have a little confuse here. Since the coherence always $\ge 1$, for arbitrary vector/matrix $x$, it seems $\norm_1(x)$ always larger than $x$? It also seems no matter the $\ell_1$ norm or a constant in the proof?
Thanks for your response!

3. anson says:

Sorry for the above typo! I mean the $\ell_1$ norm should always larger than its element. Thank you!

1. Not sure I understand your question. Coherence is actually between 0 and 1 (my columns each have unit norm), and yes, the 1-norm is the sum of the entry sizes, and hence larger than each entry.