# Compressed Sensing 3.0

I’m attending an NSF program review, and Thomas Strohmer gave an awesome talk yesterday. Imagine a scenario in which you are trying to solve a linear inverse problem, but you don’t have complete information about the linear operator. Apparently, this is common when working with small sensors or with high-resolution data. Here’s the problem formulation: You receive

$y=A(\theta)x$,

where $\theta\in\Theta$ is an unknown parameter, $A\colon\Theta\rightarrow\mathbb{R}^{M\times N}$ is a known sensing matrix function (i.e., if you knew $\theta$, then you could determine the sensing matrix $A(\theta)$), and $x\in\mathbb{R}^N$ is an unknown signal exhibiting some notion of sparsity or structure. So what are the conditions that allow one to recover $\theta$ and $x$ from $y$?

In the case where $\theta$ is known, this is simply compressed sensing. Actually, you might call it

• Compressed Sensing 1.0 if $x$ is sparse, i.e., $\#\mathrm{supp}(x)$ is small, or
• Compressed Sensing 2.0 if $x$ has low rank (when reshaped into a matrix).

The problem of finding the appropriate parameter $\theta$ for the sensing matrix is called calibration, and we would like systematic approaches for solving this problem (i.e., “self-calibration”). This seems most feasible when $A$ is a linear function of $\theta$. In this case, each entry of the data $B_m(\theta,x):=(A(\theta)x)[m]$ is a bilinear form, and so it might be helpful to express the inverse problem as finding sparse or structured $\theta$ and $x$ such that

$y[m]=\theta^\top B_m x\qquad \forall m\in\{1,\ldots,M\}$,

where $B_m$ is the matrix representation of the bilinear form $B_m(\cdot,\cdot)$Following a suggestion of Lee Potter, Thomas proposed the following buzzword:

Compressed Sensing 3.0 — Bilinear optimization problems combined with sparsity and tensor structures.

In the case where $A$ is a linear function of $\theta$, calibration is an instance of CS 3.0. Motivated by real-world calibration problems, Thomas considers the case where $\theta$ belongs to a known subspace and $x$ is sparse. For the rest of this blog entry, I will discuss three other instances of CS 3.0 that have been studied to date. One is Peter Jung‘s work, which considers the case where $\theta$ and $x$ are both sparse (I summarized this work here). Specifically, $A(\theta)=\Phi\Psi(\theta)$, where $\Phi$ is a measurement matrix and $\Psi(\theta)$ is an unknown sparse combination of dictionaries $\{\Psi_i\}$, i.e, $\Psi=\sum_i\theta_i\Psi_i$ and $\theta$ is sparse. Apparently, this problem arises from machine-type sporadic communication.

Another instance of CS 3.0 was studied by Ali Ahmed, Ben Recht and Justin Romberg (the paper is here). In this case, $\theta$ and $x$ both belong to known subspaces, and you receive the circular convolution $y=\theta*x$ — this seems to be a fundamental problem. Let $F$ denote the discrete Fourier transform. If we denote $D_\theta:=\mathrm{diag}(F\theta)$, then we can write $y=A(\theta)x$, where $A(\theta)=F^*D_\theta F$. To solve this problem, it is more useful to observe the definition of the circular convolution:

$\displaystyle{(\theta*x)[n]:=\sum_{n'\in\mathbb{Z}_N}\theta[n']x[n-n']}$.

If we define $B_n$ to be the $N\times N$ matrix such that $B_n[i,j]=1$ when $i+j\equiv n\bmod N$ (and zero otherwise), then the received data can be expressed as linear measurements of $X=x \theta^\top$:

$y[n]=(\theta*x)[n]=\langle B_n,X\rangle_\mathrm{HS}$.

Indeed, this can be viewed as another instance of the trace trick:

$y[n]=\theta^\top B_n x=\mathrm{Tr}[\theta^\top B_n x]=\mathrm{Tr}[B_n x\theta^\top]=\langle B_n,X\rangle_\mathrm{HS}$.

Leaning on our experience with CS 2.0, the knee-jerk solution is to hunt for the rank-1 matrix $X$ satisfying these linear measurements by minimizing a convexified objective (the nuclear norm $\|X\|_*$). Ahmed, Recht and Romberg give conditions on the subspaces for which this relaxation gives exact recovery and stability.

Finally, last year found yet another instance of CS 3.0 from Atsushi Ito, Aswin Sankaranarayanan, Ashok Veeraraghavan and Rich Baraniuk (the paper is here). Imagine you take multiple pictures of the same scene, but perhaps you had too much coffee, so every picture is blurred. Can you reconstruct the underlying image? Here, the $i$th picture is of the form $y_i=k_i*x$, where $k_i$ is the blur kernel that describes how the camera moved during the $i$th exposure. As such, you can take $\theta:=(k_1,\ldots,k_Q)$, and attempt to find each $k_i$ under the assumption that it’s sparse, along with $x$, which will have small TV norm since it’s a natural image. This leads to a multi-image deblurring algorithm called BlurBurst (“burst” because you should think of the images as having been taking in rapid succession, or “burst mode”). As you might expect, this algorithm completely destroys image deblurring algorithms that only accept a single blurred image as input (see section 6 of the paper for some stunning comparisons).

I’m interested to see more theory and applications of CS 3.0!