# Applied Harmonic Analysis and Sparse Approximation

Last week, I attended this workshop at Oberwolfach. The weather was great, and I was rather impressed by the quality of the talks. Throughout the workshop, I paid special attention to the various problems that different people are thinking about. In what follows, I briefly discuss several of these problems.

Karlheinz Gröchenig gave a chalk talk about some recent work with Andreas Klotz in which he proves sampling theorems for functions of variable bandwidth. His definition of variable bandwidth is naturally motivated in terms of spectral projection, but it was identified during the questions portion of the talk that, if a function’s maximum bandwidth is $W$, then the function does not necessarily have bandwidth $\leq W$. (Indeed, there are apparently non-smooth functions of variable bandwidth.) This paradox suggests the following problem:

Problem 1. Recall that $\mathrm{PW}(w)$ denotes the “Paley–Wiener” function class of functions with (fixed) bandwidth $w$. Find a reasonable notion of variable bandwidth $w\colon\mathbb{R}\rightarrow\mathbb{R}_+$ such that the corresponding function classes satisfy $\mathrm{PW}(w_1)\subseteq\mathrm{PW}(w_2)$ whenever $w_1\leq w_2$ pointwise.

Massimo Fornasier gave a talk based on his recent paper with Jan-Christian Huetter regarding the approximation of probability measures over $\mathbb{R}^d$ with atoms. In the paper, they introduce an attraction force that encourages atoms to collect in the high-probability regions, along with a repulsion force that encourages atoms to stay away from each other. During his chalk talk, Massimo posed the following problem:

Problem 2. Consider the arrangement of $n$ atoms in $\mathbb{R}^d$ that optimally approximates the $d$-dimensional Gaussian distribution in terms of the Fornasier–Huetter potential. List the locations of these atoms as columns of a $d\times n$ matrix. Does this matrix satisfy the restricted isometry property?

For this problem, Afonso and I discussed proof techniques for both possibilities. To prove that the matrix is RIP, start with a Gaussian random matrix (which is RIP). Then minimizing the Fornasier–Huetter potential probably won’t change the matrix too much, and so the new matrix will still be RIP to some extent. On the other hand, to see why the matrix ought not be RIP, consider the uniform distribution over the unit sphere. Random matrices whose columns come from this distribution are certainly RIP, but the optimal arrangement of points on the sphere will often exhibit linear dependencies (in fact, you might get two points to be antipodal, as in the case of Platonic solids). In the absence of a complete proof, the jury is out on which is actually true.

At the end of the first day of the workshop, the organizers had everyone give a 3-minute talk on something they’re interested in. I took the opportunity to discuss Strohmer’s problem, as well as a couple of other problems I find interesting:

Problem 3. Given an image $y\colon\mathbb{Z}_n^2\rightarrow[0,1]$ and a bandlimited window $g\colon\mathbb{Z}_n^2\rightarrow\mathbb{R}$, find a lower bound on the quantity

$\min\quad\|g*(x-y)\|_2^2\quad\mbox{s.t.}\quad x\colon\mathbb{Z}_n^2\rightarrow\{0,1\}.$

The reason to care about this problem is that at this point, there are several methods of producing a good halftone quantized approximation of a given image (for the sake of black-and-white printing, say), but it would be nice to have a lower bound so as to report how close to optimal a given scheme performs. Interestingly, this problem is very related to Massimo’s talk, as established in this paper. (Also, Claire Boyer showed me a stunning video that illustrates the quality of halftoning with these methods.)

The other problem I discussed in my 3-minute talk is essentially Problem 10 from my recent paper with Jesse Peterson:

Problem 4. Let $W$ denote the $2^n\times2^n$ Walsh–Hadamard transform matrix, and let $z$ be some $k$-sparse vector in $\mathbb{R}^{2^n}$. Given a sample of $\ell$ entries of $\mathrm{sign}(Wz)$, find a $k$-sparse vector $\hat{z}$ such that $\mathrm{sign}(W\hat{z})$ nearly matches these samples in $\mathrm{poly}(n,k,\ell)$ time.

To me, this seems a lot like one-bit compressed sensing and sparse FFT. The main difficulty is that $W$ is exponential in $n$, and so a solution to this problem will necessarily have to exploit the structure of Walsh–Hadamard.

Ingrid Daubechies used her 3-minute talk to discuss recent advances in the following problem:

Problem 5. Consider a signal of the form

$\displaystyle{y(t)=\sum_{i=1}^k a_i(t)\sin(\omega_i(t)t),}$

where $k$ is small and $a$ and $\omega$ are slowly varying. Reconstruct the $a_i$‘s and $\omega_i$‘s from noisy samples of $y$.

Ingrid personally introduced me to this problem when I first visited Princeton as a prospective graduate student in early 2009, and I’ve been fascinated with it ever since. Empirical mode decomposition gives some reason to suspect that a solution ought to be possible, but one is left yearning for a performance guarantee. This led to the development of synchrosqueezing, along with the more recent development called ConceFT. Apparently, ConceFT considers the synchrosqueezing that occurs with a random wavelet, and then averages the result over multiple random wavelets so as to cancel out any undesired artifacts. In the end, this “boosted” version of synchrosqueezing is much more robust to noise.

Rachel gave a very nice talk on convex relaxations of different clustering objectives. Specifically, she focused on an LP relaxation of k-medoids and an SDP relaxation of k-means. To analyze both of these relaxations, she leveraged something we now call the stochastic ball model (a salute to the stochastic block model), in which data points are drawn randomly from balls in euclidean space. The idea is to find mild conditions on the positioning on the balls for the random data points to enable the relaxed program to exactly recover the planted clusters. However, Rachel pointed out that in practice, the k-medoids LP recovers optimal clusters even when they aren’t carefully planted. This observation leads to a very interesting problem.

For a given collection of $n$ points in $\mathbb{R}^d$, consider the corresponding $d\times n$ matrix $A$. Then $k$-medoids is a function that takes $A$ as input and outputs a partition of the set $[n]$ into $k$ subsets. Likewise, the $k$-medoids LP is a function that takes $A$ as input and outputs either the desired partition or the statement “NOT TIGHT” (it actually outputs something you might round, but we are only concerned with whether the LP is tight). Observe that the output is the same if the input $A$ is replaced by $cA$ for some nonzero scalar $c$. As such, we may assume $\|A\|_F=1$ without loss of generality.

Problem 6. Draw $A$ uniformly from the set of $d\times n$ matrices of unit Frobenius norm. What is the probability that the $k$-medoids LP is not tight? Is it vanishing with $d$ and $n$?

When comparing these relaxations with Lloyd’s algorithm, there is an apparent tradeoff: Either your algorithm is screaming fast (Lloyd), or it enjoys a performance guarantee in terms of the stochastic ball model (relaxations). Why can’t we have both?

Problem 7. Is there a fast k-means clustering algorithm that enjoys a performance guarantee in terms of the stochastic ball model? In particular, does Lloyd’s algorithm (suitably initialized) enjoy such a guarantee?

Mauro Maggioni discussed geometric multiresolution analysis (GMRA) for the efficient learning of manifolds, and his talk illustrated some applications to chemistry (e.g., finding useful parameters to analyze how a complicated molecule twists over time). During lunch, I talked with him about his research, and he pointed out a basic subproblem of his work that might be worth investigating:

Problem 8. Let $x$ be an unknown vector in $\mathbb{R}^n$, and let $G$ be an unknown connected subgroup of $O(n)$. Given noisy samples of the orbit $G(x)=\{gx : g\in G\}$, estimate the orbit $G(x)$.

The desired algorithm can be thought of as a twisted version of PCA, and it appears to be similarly fundamental. For example, identifying the invariants of a given data set will substantially improve the performance of any machine learning task. I briefly thought about this problem with Afonso while hiking towards Black Forest Chocolate Cake on Wednesday, and there seems to be some interesting math in this problem.

Stefan Steinerberger gave a talk on his recent work with Ronald Coifman about a nonlinear version of Fourier series. The idea revolved around something called the Blaschke factorization, which writes $F=BG$, where $F$ is a sufficiently nice function over the complex plane, and $G$ has no roots in the unit disk. Here, $B$ is chosen to be a rational function that sends roots of $G$ inside the disk to match those of $F$. While $G$ doesn’t have roots in the disk, one may subtract $G(0)$ to produce at least one root, and then factor $G-G(0)$. This leads to an iterative scheme that apparently mimics the Fourier transform. The goal of the paper is to prove results that establish how $G$ becomes simpler in some sense as you continue the iteration. Overall, the math surrounding these results is rather pretty, and so one is naturally encouraged to seek out applications:

Problem 9. Find applications of nonlinear phase unwinding (iterative Blaschke factorization).

As you probably deduced from this blog entry, my week at Oberwolfach was particularly stimulating. I’m looking forward to whenever I return!