## The HRT Conjecture

A couple of weeks ago, I attended a workshop hosted by Darrin Speegle on the HRT Conjecture. First posed twenty years ago by Chris Heil, Jay Ramanathan, and Pankaj Topiwala in this paper, the conjecture states that every finite collection of time-frequency shifts of any nonzero function in $L^2(\mathbb{R})$ is linearly independent. In this post, I will discuss some of the key ideas behind various attempts at chipping away at HRT, and I’ll describe what appear to be fundamental barriers to a complete solution. For more information, see these survey papers: one and two.

First, some notation: Denote the translation-by-$a$ and modulation-by-$b$ operators by

$(T_af)(x):=f(x-a),\qquad (M_bf)(x):=e^{2\pi i bx}f(x),$

respectively. Then the formal conjecture statement is as follows:

The HRT Conjecture. For every $0\neq f\in L^2(\mathbb{R})$ and every finite $\Lambda\subset\mathbb{R}^2$, the collection $\{M_bT_af\}_{(a,b)\in\Lambda}$ is linearly independent.
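The conjecture statement can be made concrete with a quick numerical experiment. The sketch below is only a discretized sanity check on an arbitrary grid with an arbitrary choice of $\Lambda$ (it says nothing about the conjecture itself): sample a Gaussian, apply a few time-frequency shifts, and check linear independence via the rank of the stacked matrix.

```python
import numpy as np

# Discretized sanity check, not evidence toward the conjecture:
# sample a Gaussian window on a finite grid, apply time-frequency
# shifts, and verify that the resulting vectors are independent.
n = 256
x = np.linspace(-8, 8, n)

def tf_shift(a, b):
    """Discretization of M_b T_a f for the Gaussian f(x) = e^{-pi x^2}."""
    return np.exp(2j * np.pi * b * x) * np.exp(-np.pi * (x - a) ** 2)

Lambda = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.5, 2.0)]  # arbitrary choice
V = np.column_stack([tf_shift(a, b) for (a, b) in Lambda])
print(np.linalg.matrix_rank(V))  # 4, i.e., these four vectors are independent
```

Of course, full rank on a grid is exactly what one expects here; the interest of HRT is that independence must hold for *every* nonzero $f$ and *every* finite $\Lambda$.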

What follows are some of the popular methods for tackling HRT:

## Clustering noisy data with semidefinite relaxations

Soledad Villar recently posted our latest paper on the arXiv (this one coauthored by her advisor, Rachel Ward). The paper provides guarantees for the k-means SDP when the points are drawn from a subgaussian mixture model. This blog entry will discuss one of the main ideas in our analysis, which we borrowed from Guedon and Vershynin’s recent paper.

The first application comes from graph clustering. Consider the stochastic block model, in which the $n$ vertices are secretly partitioned into two communities, each of size $n/2$; edges between vertices of a common community are drawn iid with some probability $p$, and all other edges are drawn with probability $q$. The goal of community estimation is to estimate the communities given a random draw of the graph. For this task, you might be inclined to find the maximum likelihood estimator for this model, but this results in an integer program. Relaxing the program leads to a semidefinite program, and amazingly, this program is tight and recovers the true communities with high probability when $p=(\alpha\log n)/n$ and $q=(\beta\log n)/n$ for good choices of $(\alpha,\beta)$. (See this paper.) These edge probabilities scale like the connectivity threshold for Erdos–Renyi graphs, which makes sense, since we wouldn’t know how to assign vertices in isolated components. If instead the probabilities scaled like $1/n$, we would be in the “giant component” regime, so we’d still expect enough signal to correctly assign a good fraction of the vertices, but the SDP is not tight in this regime.
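To make the model concrete, here is a minimal simulation (sizes and $(\alpha,\beta)$ are arbitrary choices, and the recovery step is a quick spectral heuristic, not the k-means SDP from the paper): draw a two-community stochastic block model in the $\log n / n$ regime and check how well the second eigenvector of the adjacency matrix tracks the hidden communities.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                      # illustrative size
p = 5 * np.log(n) / n        # within-community edge probability
q = 1 * np.log(n) / n        # between-community edge probability
labels = np.repeat([1, -1], n // 2)

# Draw the adjacency matrix of a two-community stochastic block model.
probs = np.where(np.outer(labels, labels) > 0, p, q)
A = np.triu(rng.random((n, n)) < probs, 1).astype(float)
A = A + A.T

# Quick spectral heuristic (not the SDP): the sign pattern of the
# second eigenvector of A correlates with the hidden communities.
w, V = np.linalg.eigh(A)
guess = np.sign(V[:, -2])
overlap = abs(guess @ labels) / n
print(f"fraction consistent with labels: {(1 + overlap) / 2:.2f}")
```

In this regime the spectral estimate already assigns most vertices correctly; the point of the SDP analysis is the stronger guarantee of exact recovery with high probability.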

## Compressed Sensing and its Applications 2015

Last week, I attended this conference in Berlin, and much like the last CSA conference, it was very nice. This year, most of the talks followed one of three themes:

• Application-driven compressed sensing
• Quadratic or bilinear problems
• Clustering in graphs or Euclidean space

Examples of application-driven CS include theoretical results for radar-inspired sensing matrices and model-based CS for quantitative MRI. Readers of this blog are probably familiar with the prototypical quadratic problem (phase retrieval), and bilinear problems include blind deconvolution and self-calibration. Recently, I have blogged quite a bit about clustering in Euclidean space (specifically, k-means clustering), but I haven’t written much about clustering in graphs (other than its application to phase retrieval). For the remainder of this entry, I will discuss two of the talks from CSA2015 that covered different aspects of graph clustering.

## Zero to One: Notes on Startups, or How to Build the Future

I’ve been thinking a lot about my place in the world lately. I’m interested in doing math that makes a difference, and considering that many of the breakthroughs in our society have come from startups, I decided to investigate startup culture. How might academia benefit from it? One could easily imagine a hip research environment adorned with beanbag chairs and foosball tables, but these perks aren’t the stuff that makes a startup successful. To catch a glimpse, I turned to a book recently written by Peter Thiel (of PayPal fame):

## Probably certifiably correct algorithms

This post is based on two papers (one and two). The task is to quickly solve typical instances of a given problem and to quickly produce a certificate of the solution’s optimality. Problems of interest are generally NP-hard, and so we consider a random distribution on problem instances with the philosophy that real-world instances might mimic this distribution. In my community, it is common to consider NP-hard optimization problems:

minimize $f(x)$ subject to $x\in S$.     (1)

In some cases, $f$ is convex but $S$ is not, and so one might relax accordingly:

minimize $f(x)$ subject to $x\in T$,     (2)

where $T\supseteq S$ is some convex set. If the minimizer of (2) happens to be a member of $S$, then it’s also a minimizer of (1) — when this happens, we say the relaxation is tight. For some problems (and distributions on instances), the relaxation is typically tight, which means that (1) can be typically solved by instead solving (2); for example, this phenomenon occurs in phase retrieval, in community detection, and in geometric clustering. Importantly, strong duality ensures that solving the dual of the convex relaxation provides a certificate of optimality.
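The tightness phenomenon is easy to see in a toy case (my own illustrative example, not one from the papers): minimizing a linear function over the hypercube vertices $S=\{-1,1\}^n$ after relaxing to the solid cube $T=[-1,1]^n$. An LP attains its optimum at a vertex, so for generic $c$ the relaxed minimizer already lies in $S$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy tight relaxation: minimize <c, x> over the nonconvex set
# S = {-1, 1}^n by relaxing to the convex set T = [-1, 1]^n.
rng = np.random.default_rng(1)
n = 5
c = rng.standard_normal(n)
res = linprog(c, bounds=[(-1, 1)] * n)   # solve the convex relaxation (2)
print(np.allclose(np.abs(res.x), 1))     # True: minimizer lies in S, solving (1)
```

Here tightness is automatic; for the SDPs in the papers, it is a probabilistic statement about typical instances, which is exactly what the dual certificate is used to verify.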

## Recent advances in mathematical data science

Part of the experience of giving a talk at Oberwolfach is documentation. First, they ask you to handwrite the abstract of your talk into a notebook of sorts for safekeeping. Later, they ask you to tex up an extended abstract for further documentation. This time, I gave a longer version of my SPIE talk (here are the slides). Since I posted my extended abstract on my blog last time, I figured I’d do it again:

This talk describes recent work on three different problems of interest in mathematical data science, namely, compressive classification, $k$-means clustering, and deep learning. (Based on three papers: one, two, three.)

First, compressive classification is a problem that comes on the heels of compressive sensing. In compressive sensing, one exploits the underlying structure of a signal class in order to exactly reconstruct any signal from the class given very few linear measurements of the signal. However, many applications do not require an exact reconstruction of the image, but rather a classification of that image (for example, is this a picture of a cat, or of a dog?). As such, it makes intuitive sense that the classification task might succeed given far fewer measurements than are necessary for compressive sensing.
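This intuition is easy to demonstrate with a toy sketch (all parameters below are my own arbitrary assumptions, not the setup from the talk): two well-separated Gaussian classes in $\mathbb{R}^d$ remain separable after $m \ll d$ random linear measurements, even though $m$ is far too small to permit exact reconstruction.

```python
import numpy as np

# Toy sketch: classify from m << d random measurements.
rng = np.random.default_rng(0)
d, m, n = 500, 20, 200
mu = np.zeros(d)
mu[0] = 10.0                                   # class means differ in one coordinate
X0 = rng.standard_normal((n, d)) - mu          # class 0: centered at -mu
X1 = rng.standard_normal((n, d)) + mu          # class 1: centered at +mu
A = rng.standard_normal((m, d)) / np.sqrt(m)   # random measurement matrix

# Nearest-centroid classification in the compressed domain.
Y0, Y1 = X0 @ A.T, X1 @ A.T
c0, c1 = Y0.mean(axis=0), Y1.mean(axis=0)
errs0 = sum(np.linalg.norm(y - c0) >= np.linalg.norm(y - c1) for y in Y0)
errs1 = sum(np.linalg.norm(y - c1) >= np.linalg.norm(y - c0) for y in Y1)
acc = 1 - (errs0 + errs1) / (2 * n)
print(f"accuracy with {m} of {d} measurements: {acc:.2f}")
```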

## Applied Harmonic Analysis and Sparse Approximation

Last week, I attended this workshop at Oberwolfach. The weather was great, and I was rather impressed by the quality of the talks. Throughout the workshop, I paid special attention to the various problems that different people are thinking about. In what follows, I briefly discuss several of these problems.

## Frames and Combinatorial & Algebraic Geometry

This last week, I attended a workshop at Universitat Bremen on Frames and Combinatorial & Algebraic Geometry. The organizers (Emily King and Eva Maria Feichtner) did a phenomenal job, and I was impressed by the quality of the talks (see this page for the slides). Below are a few of the things that we discussed throughout the workshop:

— Strohmer’s Problem —

At this year’s SampTA conference, Thomas Strohmer posed an interesting frame theory problem during his plenary talk. On the first day of the workshop, Ole Christensen suggested that we study the problem, and it ended up being one of our main focuses. Here’s a technical statement of the problem sans frame-theoretic jargon:

Strohmer’s Problem. Let $A$ be a known $n\times d$ matrix, let $D$ be an unknown $n\times n$ diagonal matrix, and let $X$ be an unknown $d\times m$ matrix. Under what conditions does $y=DAX$ uniquely determine $D$ and/or $X$?

This problem is motivated by the real-world problem of self-calibration (see for example this blog entry). In particular, the columns of $X$ represent unknown test signals, whereas $D$ represents unknown parameters of a linear sensor $DA$. The goal of self-calibration is to calibrate your sensor without requiring knowledge of the input (this is particularly important for applications like deep-space imaging).
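Note that "uniquely determine" can only be meant up to an inherent scaling ambiguity: for any nonzero scalar $c$, the pairs $(D, X)$ and $(cD, X/c)$ produce the same data. A quick numerical illustration (dimensions chosen arbitrarily):

```python
import numpy as np

# The scaling ambiguity in y = DAX: (cD, X/c) yields the same data.
rng = np.random.default_rng(0)
n, d, m = 6, 4, 3
A = rng.standard_normal((n, d))
D = np.diag(rng.standard_normal(n))
X = rng.standard_normal((d, m))

y = D @ A @ X
c = 2.5
y_alt = (c * D) @ A @ (X / c)
print(np.allclose(y, y_alt))   # True
```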

## Cone Programming Cheat Sheet

I’m pretty excited about Afonso‘s latest research developments (namely, this and that), and I’ve been thinking with Jesse Peterson about various extensions, but we first wanted to sort out the basics of linear and semidefinite programming. Jesse typed up some notes and I’m posting them here for easy reference:

Many optimization problems can be viewed as a special case of cone programming. Given a closed convex cone $K$ in $\mathbb{R}^n$, we define the dual cone as

$K^*=\{y : \langle x,y\rangle\geq 0~\forall x\in K\}.$

Examples: The positive orthant and the positive semidefinite cone are both self-dual, and the dual of a subspace is its orthogonal complement. Throughout, we assume we have $c\in\mathbb{R}^n$, $b\in\mathbb{R}^m$, closed convex cones $K\subseteq\mathbb{R}^n$ and $L\subseteq\mathbb{R}^m$, and a linear operator $A\colon\mathbb{R}^n\rightarrow\mathbb{R}^m$. We then have the primal and dual programs

$\mbox{(P)} \quad \max \quad \langle c,x\rangle \quad \mbox{s.t.} \quad b-Ax \in L,\quad x \in K$

$\mbox{(D)} \quad \min \quad \langle b,y\rangle \quad \mbox{s.t.} \quad A^\top y-c\in K^*,\quad y\in L^*$

Notice that when $L$ and $K$ are both the positive orthant, this is a standard linear program. Further, considering the space of $n\times n$ real symmetric matrices as $\mathbb{R}^{n(n+1)/2}$, when $L=\{0\}$ and $K$ is the positive semidefinite cone, this is a standard semidefinite program. We consider several standard results (weak duality, strong duality, complementary slackness) in terms of the general cone program.
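The linear-program specialization is easy to check numerically. In the sketch below (problem data are arbitrary illustrative choices), taking $K$ and $L$ to be positive orthants reduces (P) and (D) to the familiar LP pair $\max\,\langle c,x\rangle$ s.t. $Ax\leq b$, $x\geq0$ and $\min\,\langle b,y\rangle$ s.t. $A^\top y\geq c$, $y\geq0$, and solving both exhibits weak duality holding with equality at the optimum.

```python
import numpy as np
from scipy.optimize import linprog

# Cone program with K, L both positive orthants = standard LP pair.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0], [2.0, 1.0]])
b = np.array([4.0, 5.0])

primal = linprog(-c, A_ub=A, b_ub=b)      # linprog minimizes, so negate c
dual = linprog(b, A_ub=-A.T, b_ub=-c)     # A^T y >= c  <=>  -A^T y <= -c
p_val, d_val = -primal.fun, dual.fun
print(p_val, d_val)   # both 9.0: weak duality p_val <= d_val is tight here
```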

## An intuition for concentration inequalities

I use concentration inequalities a lot in my research, and it’s about time I write a post about them. Throughout this post, $\{X_i\}$ will denote independent realizations of a common real-valued random variable $X$. We assume $\mu:=\mathbb{E}X<\infty$ and $\sigma^2:=\mathrm{Var}(X)<\infty$, and define

$\displaystyle{\overline{X}_n:=\frac{1}{n}\sum_{i=1}^nX_i.}$

The central limit theorem (CLT) essentially states that

$\displaystyle{\lim_{n\rightarrow\infty}\mathrm{Pr}\bigg(\big|\overline{X}_n-\mu\big|\geq\frac{t\sigma}{\sqrt{n}}\bigg)=2Q(t)\leq e^{-t^2/2} \qquad \forall t\geq0,}$

where the Q-function $Q(\cdot)$ denotes the tail probability of the standard normal distribution. Unfortunately, this only ensures pointwise convergence for each $t$, meaning the CLT provides no uniform tail bound for any fixed $n$. The purpose of a concentration inequality is to provide such a bound.
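A small simulation makes the Gaussian approximation and the bound concrete (the choice of Uniform$[0,1]$ draws and of $t$ is just for illustration): the empirical tail of the standardized sample mean lands near $2Q(t)$, safely under $e^{-t^2/2}$.

```python
import numpy as np
from math import erfc, exp, sqrt

# Tail of the standardized sample mean of iid Uniform[0,1] draws,
# compared against the Gaussian tail 2Q(t) and the bound e^{-t^2/2}.
rng = np.random.default_rng(0)
n, trials, t = 100, 20000, 2.0
mu, sigma = 0.5, sqrt(1 / 12)              # mean and std of Uniform[0,1]
means = rng.random((trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - mu) >= t * sigma / sqrt(n))
gaussian = erfc(t / sqrt(2))               # 2Q(t), since Q(t) = erfc(t/sqrt(2))/2
print(f"empirical {empirical:.4f}, 2Q(t) {gaussian:.4f}, bound {exp(-t**2 / 2):.4f}")
```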