A variant on the compressed sensing of Yves Meyer

Last time, I blogged about constructing harmonic frames using difference sets. We proved that such harmonic frames are equiangular tight frames, thereby having minimal coherence between columns. I concluded the entry by conjecturing that incoherent harmonic frames are as good for compressed sensing as harmonic frames whose rows were randomly drawn from the discrete Fourier transform (DFT) matrix. In response, Igor Carron commented that recent work of Yves Meyer might be relevant:

These papers are interesting because their approach to compressed sensing is very different. Specifically, their sparse vectors are actually functions of compact support with sufficiently small Lebesgue measure. As such, concepts like conditioning are replaced with that of stable sampling, and the results must be interpreted in the context of functional analysis. The papers demonstrate that sampling frequencies according to a (deterministic) simple quasicrystal will uniquely determine sufficiently sparse functions, and furthermore, the sparsest function in the preimage can be recovered by L1-minimization provided it’s nonnegative.

Igor views these papers as “UFOs” in the literature because they are so different from conventional compressed sensing in so many ways. In this blog entry, I will present Meyer’s results (and proofs) in the analogous context of finite-dimensional harmonic frames, and I’ll make some connections with more familiar approaches.

Specifically, we have the following main result:

Theorem. Take $N$ prime and build a $(2K+1)\times N$ matrix $\Phi$ by collecting rows of the $N\times N$ DFT which are indexed by $\{-K,\ldots,K\}\subseteq\mathbb{Z}_N$. Suppose $x$ is nonnegative and $K$-sparse. If we take measurements of the form $y=\Phi x$, then $x$ is the unique solution of the following program:

$\displaystyle{\mbox{minimize } \|\hat{x}\|_1\quad\mbox{subject to}\quad y=\Phi\hat{x}}$.

Before proving this theorem, we note that similar results have been reported by Fuchs. In fact, his results apply to a variety of short, fat matrices, including Vandermonde matrices and real Fourier matrices. However, his recovery program enforces nonnegativity as a constraint, and so the Meyer-inspired result above is stronger. Also, our theorem statement is not limited to the particular matrix construction described above, but we’ll focus on this construction for the sake of simplicity.

To prove the above theorem, we first highlight the significance of the all-ones row in $\Phi$, that is, the DFT row indexed by zero. As we will see, this row guarantees two things:

• that $x$ is a solution to the program, and
• that every solution to the program is nonnegative.

To see this, consider a competitor $z$ such that $\Phi z=y$ and $\|z\|_1\leq\|x\|_1$, and decompose the vector into its positive, negative and imaginary parts: $z=z_+-z_-+iz_i$. Then we have

$\displaystyle{1^*z_+-1^*z_-+i1^*z_i=1^*z=y[0]=1^*x=\|x\|_1\geq\|z\|_1}$
$\displaystyle{\geq\|z_+-z_-\|_1=\|z_+\|_1+\|z_-\|_1=1^*z_++1^*z_-}$.

Since $1^*z_+-1^*z_-+i1^*z_i=1^*x$ is real, we know $1^*z_i=0$. Next, rearranging $1^*z_+-1^*z_-\geq1^*z_++1^*z_-$ gives $0\geq 1^*z_-$, but $z_-$ is nonnegative, and so $z_-$ and $1^*z_-$ are zero. This simplifies our string of inequalities:

$\displaystyle{1^*z_+=\|x\|_1\geq\|z\|_1\geq\|z_+-z_-\|_1=1^*z_+}$.

Since we have equality in the left- and right-hand sides, equality is achieved in both inequalities above. Equality in the first implies that $x$ is a minimizer (i.e., a solution to our program), and equality in the second implies that $z_i=0$, and so $z$ (i.e., any other solution) is nonnegative since $z_-$ is also zero.

Now that we know $x$ is a solution to the program, it remains to establish uniqueness. Of course, this would be easy to guarantee if we knew $\Phi$ satisfied the null space property (NSP): If $z\neq x$ were a solution, then $x-z$ is a nontrivial vector in the null space of $\Phi$, but the positive part of this vector is $K$-sparse (since $z$ must be nonnegative) and $\|(x-z)_+\|_1=\frac{1}{2}\|x-z\|_1$ since $1^*(x-z)=0$, violating NSP. Unfortunately, NSP is difficult to demonstrate in general, and in this case, it’s overkill. Since $\Phi$ has an all-ones row and every solution of the program is necessarily nonnegative, it’s relatively straightforward to prove the necessity and sufficiency of a certain cone property: that $\mathrm{Cone}(\Phi_\mathcal{K})\cap\mathrm{Cone}(\Phi_{\mathcal{K}^\mathrm{c}})=\{0\}$ for every $\mathcal{K}\subseteq\{1,\ldots,N\}$ of size $\leq K$. I won’t bother dwelling on the details of this property; while it might come in handy to prove similar theorems about frames that are not harmonic, it seems easier to exploit the fact that the other rows of $\Phi$ come from the DFT in our case.

We claim it suffices to show that for every $\mathcal{K}\subseteq\{1,\ldots,N\}$ of size $K$ and every $n\not\in\mathcal{K}$, there exists a vector $\sigma$ with the following properties:

• it’s nonnegative,
• it’s zero on $\mathcal{K}$,
• it’s nonzero at $n$, and
• it’s Fourier transform is supported on the row indices selected, i.e., $\{-K,\ldots,K\}$.

Let’s prove our claim: Take $\mathcal{K}$ to contain the support of $x$. Then for every nonnegative $z$ with $\Phi z=\Phi x$, we can take Fourier transforms to get

$\displaystyle{0=\langle x,\sigma\rangle=\langle Fx,F\sigma\rangle=\sum_{m=-K}^{K}(Fx)[m]\overline{(F\sigma)[m]}}$
$\displaystyle{=\sum_{m=-K}^{K}(Fz)[m]\overline{(F\sigma)[m]}=\langle Fz,F\sigma\rangle=\langle z,\sigma\rangle}$.

Since $z$ and $\sigma$ are nonnegative and orthogonal, they must have disjoint support, i.e., $z[n]=0$. Furthermore, since the choice of $n$ was arbitrary, $z$ is not supported outside of $\mathcal{K}$. Finally since $N$ is prime, we know $\Phi$ is full spark by Chebotarev’s theorem, meaning it maps $K$-sparse vectors injectively. Thus $\Phi z=\Phi x$ implies $z=x$, and so $x$ is the unique solution. So how do we ensure that such a $\sigma$ always exists?

One of the key things we have to ensure is that $\sigma$ is nonnegative. This can be difficult in general, but Fourier analysis will make it easier in our case. Specifically, we will first pick a vector $s$ and then define $\sigma$ entrywise by $\sigma[n']=|s[n']|^2$ for all $n'\in\mathbb{Z}_N$. As such, the support of $\sigma$ is precisely the support of $s$, but we also need to control the support of the Fourier transform of $\sigma$. By the definition of $\sigma$, we know that $F\sigma$ will be the autocorrelation of $Fs$, and so the support of $F\sigma$ will be the set of all differences between members of the support of $Fs$. Since $F\sigma$ is to be supported on the selected row entries, and since we generally want to have few rows in our sensing matrix, this means the support of $Fs$ should have a lot of additive structure. This sort of structure is precisely the subject of study in additive combinatorics, and this seems to correspond analogously to Meyer’s definition of model sets.

For the particular row indices of this example, namely $\{-K,\ldots,K\}$, it suffices to have $Fs$ supported on $\{0,\ldots,K\}$, which has a lot of additive structure, being an arithmetic progression. To establish the existence of such an $s$, we identify our requirements:

• $s[n']=0$ for all $n'\in\mathcal{K}$,
• $s[n]$ is nonzero, say $s[n]=1$, and
• $(Fs)[n']=0$ for all $n'\not\in\{0,\ldots,K\}$.

This is a total of $N$ linear constraints, and so the $N$-dimensional vector $s$ exists if the constraints are linearly independent. Indeed, the square matrix corresponding to these constraints has distinct rows from the identity and DFT matrices, and the determinant is easily shown to be nonzero using Chebotarev’s theorem.

Having proved the main result of this blog entry, I’d like to make a couple of observations. First, there appears to be a disconnect between the harmonic frame constructions suggested by Meyer’s work and the harmonic frames which are known to be good for compressed sensing in general. To be clear, the row entries of Meyer’s harmonic frames have a lot of additive structure, as opposed to randomly drawn row entries which have minimal additive structure. You can tell from the example $\{-K,\ldots,K\}$ that the row entries are far from random. The reason behind this contrast might lie in the nonnegativity assumption on the sparse signal. Note that in our proof, the additive structure only played a role when establishing uniqueness of the (nonnegative) solution.

This leads me to my second observation: The nonnegativity assumption appears rather strong. Not only are we able to find a deterministic construction with ease, the construction uses $\sim2K$ rows—the number you’d need to guarantee mere injectivity in general. Without the nonnegativity assumption, we are forced to take $\Omega(K\log(N/K))$ rows for L1-minimization to work; granted, the log factor is small, but it does help to distinguish from Meyer’s result. Of course, there are applications for which nonnegativity is an appropriate assumption, but it would be interesting to find other applications (and determine whether this harmonic construction fits the bill).