# Sparse Representations, Numerical Linear Algebra, and Optimization Workshop

A couple of weeks ago, I attended the “Sparse Representations, Numerical Linear Algebra, and Optimization Workshop.” It was my first time at Banff, and I was thoroughly impressed by the weather, the facility, and the workshop organization. A few of the talks were recorded and are available here. Check out this good-looking group of participants: I wanted to briefly outline some of the problems that were identified throughout the workshop.

Off-the-grid compressed sensing. Suppose you are given a sum of sinusoids over an interval. How do you determine the sinusoids? You might be inclined to take the Fourier transform, but you are only given the function over an interval, and you don’t know how to extend it (the sinusoids are not necessarily periodic over the interval). If you assume the frequencies lie on some dilation of the integer lattice, you can express the sum of sinusoids as a matrix-vector product, where the matrix has a continuum of rows, each column is a prospective sinusoid, and the vector implicates which columns are active in the combination. Of course, in the worst-case scenario, the active frequencies would be better approximated by a more dense dilation of the integer lattice, but doing so will invariably increase the coherence between columns of the matrix, which in turn makes traditional compressed sensing impossible. This leads to an impasse: Either sample too coarsely and fail to capture the true signal model, or sample too finely and fail to reconstruct the desired signal. Thomas Strohmer suggested looking for some sort of multiscale-type algorithm in which you start with a more coarse representation to have an idea of where the support is, and then “zoom in” to refine the support. Of course, this particular problem was solved in this paper by passing to the dual and solving using an SDP, but perhaps a faster solver is available in this alternative approach.

Fast alternatives to lifting. Perhaps the most beautiful trick in phase retrieval is lifting, in which nonlinear intensity measurements $|\langle x,\varphi_i\rangle|^2$ are converted to linear measurements in operator space $\langle xx^*,\varphi_i\varphi_i^*\rangle_\mathrm{HS}$. With this identification, PhaseLift proposes to perform phase retrieval by finding a low-rank matrix $X$ that matches the linear data $\langle X,\varphi_i\varphi_i^*\rangle_\mathrm{HS}$. Here, rank minimization is naturally relaxed to trace minimization subject to a positive semidefinite constraint, thereby allowing for tractable recovery for sufficiently good measurement vectors. Unfortunately, this is tractable in a polynomial-time sense, but it will still take a while for a computer to churn out the recovered vector. As an alternative, one can “back project” to get $xx^*\approx \sum_i \langle xx^*,\varphi_i\varphi_i^*\rangle_\mathrm{HS} \varphi_i\varphi_i^*.$

In particular, under reasonable assumptions on the distribution of $\varphi_i$, one can show the leading eigenvector of the right-hand side above is a good approximation of $x$. Notice that this back-projection is particularly easy to calculate. If you then locally minimize the error of intensity measurements, you can recover provided your approximation of $x$ is good enough to land in a convex neighborhood of the global minimum. This is the main idea behind phase retrieval via Wirtinger flow, and similar methods ought to be attempted for the other instances of “Compressed Sensing 3.0“.

Low-rank matrix recovery by alternating projections. In low-rank matrix recovery, we seek to minimize the rank of a matrix subject to linear data, and the obvious relaxation is to minimize the nuclear norm instead. Recently, the phase transition of this relaxation was completely characterized. An alternative to nuclear norm minimization is a lot easier: iterative hard thresholding, aka alternating projections. Essentially, you start at the origin, and then you alternate between projecting onto the data (which forms a shifted null space) and projecting onto the set of rank- $r$ matrices. The latter is not a convex set, so analysis of this needs to be delicate. It is known that if the measurements satisfy a restricted isometry property, then alternating projections works regardless of the rank- $r$ ground truth. But the average-case performance is a bit more interesting: Empirically, it appears as though alternating projections significantly outperforms nuclear norm minimization. Check out these phase transitions (taken from this paper): Unfortunately, there is no theory to explain why the performances are so different. As a side note, it is interesting that the first step of alternating projections effectively performs the “back projection” that forms the first step of the alternative to lifting. Perhaps one can interpret the remainder of alternating projections as a local method of sorts. This seems to suggest that a single theory might solve both problems.

Tail bounds for linear combinations of chi-squared random variables. Fred Roosta gave a great talk about a Monte-Carlo method for estimating the trace of a symmetric operator $A$ when you don’t get to see $A$, but you get to choose inputs and observe outputs. The main trick he used is that $\mathbb{E}(xx^\top)=I$ implies $\mathrm{tr}(A)=\mathrm{tr}(AI)=\mathrm{tr}(A\mathbb{E}(xx^\top))=\mathbb{E}(x^\top Ax).$

As such, he can fix an isotropic distribution, draw inputs randomly, and then take inner products with outputs. The sample mean of this will then approximate the trace. If only we knew how well the sample mean concentrates, we could establish confidence intervals for the desired trace. Fred focused on the special case where $x$ is distributed $N(0,I)$, and in this case, $x^\top Ax$ is simply a linear combination of iid chi-squared random variables, where the scalars in the combination are the eigenvalues of $A$. I personally appeal to Lemma 1 of this paper whenever I need to control such a combination, but Fred wants more. He showed that if the spectrum of one operator majorizes the spectrum of another, then the tail is bigger. Unfortunately, not all pairs of spectra are comparable in the majorization order. Is the tail completely determined by, say, the entropy of the spectrum?

Low-rank matrix completion from structured random samples. Felix Herrmann wants to find oil to drill, so he collects a bunch of seismic data and seeks to solve the corresponding inverse problem. There are a lot of challenges associated with this. Here’s one: You get a lot of data. It is naturally structured in a tensor. If one of your sensors fails (and they will), then you have holes in your tensor that you need to fill. A natural trick is to matricize the tensor and pray that the underlying matrix has low rank, but this won’t always be the case. In one instance, the matrix appears to have diagonal dominance, and so the matrix is far from low rank. But Felix is clever — he “rotates” the matrix so that the diagonals are converted to columns (I personally think a shear would be a more natural operation here). This in turn makes the matrix have low rank; in Felix’s words, this is a “rank-revealing transformation.” At this point, the holes in the tensor become holes in a low-rank matrix, which we intend to fill by matrix completion. It certainly works in practice, but this is not explained by the current theory, which requires the entries to be drawn uniformly at random. In his application, there is a lot of probabilistic dependency between whether certain entries are selected, since several entries correspond to a common sensor.

Consistent notation and grammar. Toward the end of the workshop, Michael Saunders provided a charming and humorous discussion of various instances of bad notation he spotted in the preceding talks: For example, ever since they were defined by Duffin and Schaeffer, the frame bounds of a frame have been denoted by $A$ and $B$, but Michael notes that Householder notation would use Greek letters for scalars and save capital letters for matrices. He also took some time to point out bad grammar and bad LaTeX practices. I was personally shocked to learn about dangling participles.