Talks from the Summer of ’17

This summer, I participated in several interesting conferences. This entry documents my slides and describes a few of my favorite talks from the summer. Here are links to my talks:

Now for my favorite talks from FoCM, ILAS, SIAM AG17 and SPIE:

Ben RechtUnderstanding deep learning requires rethinking generalization

In machine learning, you hope to fit a model so as to be good at prediction. To do this, you fit to a training set and then evaluate with a test set. In general, if a simple model fits a large training set pretty well, you can expect the fit to generalize, meaning it will also fit the test set. By conventional wisdom, if the model happens to fit the training set exactly, then your model is probably not simple enough, meaning it will not fit the test set very well. According to Ben, this conventional wisdom is wrong. He demonstrates this by presenting some observations he made while training neural nets. In particular, he allowed the number of parameters to far exceed the size of the training set, and in doing so, he fit the training set exactly, and yet he still managed to fit the test set well. He suggested that generalization was successful here because stochastic gradient descent implicitly regularizes. For reference, in the linear case, stochastic gradient descent (aka the randomized Kaczmarz method) finds the solution of minimal 2-norm, and it converges faster when the optimal solution has smaller 2-norm. Along these lines, Ben has some work to demonstrate that even in the nonlinear case, fast convergence implies generalization.

SOFT 2016: Summer of Frame Theory

This month, several experts in frame theory will be visiting my department, and so Matt Fickus and I decided to organize a workshop in the style of AIM. Considering the recent progress we’ve made on equiangular tight frames (ETFs) — namely, one, two, three, and four — we are hoping this workshop will spur further progress in this area. To kick off the month, I asked a few people to prepare hour-long chalk talks, and what follows are the extended abstracts:

1. Introduction to ETFs (Dustin G. Mixon)

Given a $d$-dimensional Hilbert space space $\mathbb{H}_d$ and a positive integer $n$, we are interested in packing $n$ lines through the origin so that the interior angle between any two is as large as possible. It is convenient to represent each line by a unit vector that spans the line, and in doing so, the problem amounts to finding unit vectors $\{\varphi_i\}_{i=1}^n$ that minimize coherence:

$\displaystyle{\mu\Big(\{\varphi_i\}_{i=1}^n\Big)=\max_{i\neq j}|\langle \varphi_i,\varphi_j\rangle|.}$

This minimization amounts to a nonconvex optimization problem. To construct provably optimal packings, one must prove a lower bound on $\mu$ for a given $n$ and spatial dimension $d$, and then construct an ensemble which meets equality in that bound. To date, we know of three bounds that are sharp:

The HRT Conjecture

A couple of weeks ago, I attended a workshop hosted by Darrin Speegle on the HRT Conjecture. First posed twenty years ago by Chris Heil, Jay Ramanathan, and Pankaj Topiwala in this paper, the conjecture states that every finite collection of time-frequency shifts of any nonzero function in $L^2(\mathbb{R})$ is linearly independent. In this post, I will discuss some of the key ideas behind some of the various attempts at chipping away at HRT, and I’ll describe what appear to be fundamental barriers to a complete solution. For more information, see these survey papers: one and two.

First, some notation: Denote the translation-by-$a$ and modulation-by-$b$ operators by

$(T_af)(x):=f(x-a),\qquad (M_bf)(x)=e^{2\pi i bx}f(x),$

respectively. Then the formal conjecture statement is as follows:

The HRT Conjecture. For every $0\neq f\in L^2(\mathbb{R})$ and every finite $\Lambda\subset\mathbb{R}^2$, the collection $\{M_bT_af\}_{(a,b)\in\Lambda}$ is linearly independent.

What follows are some of the popular methods for tackling HRT:

Compressed Sensing and its Applications 2015

Last week, I attended this conference in Berlin, and much like the last CSA conference, it was very nice. This year, most of the talks followed one of three themes:

• Application-driven compressed sensing
• Clustering in graphs or Euclidean space

Examples of application-driven CS include theoretical results for radar-inspired sensing matrices and model-based CS for quantitative MRI. Readers of this blog are probably familiar with the prototypical quadratic problem (phase retrieval), and bilinear problems include blind deconvolution and self-calibration. Recently, I have blogged quite a bit about clustering in Euclidean space (specifically, k-means clustering), but I haven’t written much about clustering in graphs (other than its application to phase retrieval). For the remainder of this entry, I will discuss two of the talks from CSA2015 that covered different aspects of graph clustering.

Recent advances in mathematical data science

Part of the experience of giving a talk at Oberwolfach is documentation. First, they ask you to handwrite the abstract of your talk into a notebook of sorts for safekeeping. Later, they ask you to tex up an extended abstract for further documentation. This time, I gave a longer version of my SPIE talk (here are the slides). Since I posted my extended abstract on my blog last time, I figured I’d do it again:

This talk describes recent work on three different problems of interest in mathematical data science, namely, compressive classification, $k$-means clustering, and deep learning. (Based on three papers: one, two, three.)

First, compressive classification is a problem that comes on the heels of compressive sensing. In compressive sensing, one exploits the underlying structure of a signal class in order to exactly reconstruct any signal from the class given very few linear measurements of the signal. However, many applications do not require an exact reconstruction of the image, but rather a classification of that image (for example, is this a picture of a cat, or of a dog?). As such, it makes intuitive sense that the classification task might succeed given far fewer measurements than are necessary for compressive sensing.

Applied Harmonic Analysis and Sparse Approximation

Last week, I attended this workshop at Oberwolfach. The weather was great, and I was rather impressed by the quality of the talks. Throughout the workshop, I paid special attention to the various problems that different people are thinking about. In what follows, I briefly discuss several of these problems.

Frames and Combinatorial & Algebraic Geometry

This last week, I attended a workshop at Universitat Bremen on Frames and Combinatorial & Algebraic Geometry. The organizers (Emily King and Eva Maria Feichtner) did a phenomenal job, and I was impressed by the quality of the talks (see this page for the slides). Below are a few of the things that we discussed throughout the workshop:

— Strohmer’s Problem —

At this year’s SampTA conference, Thomas Strohmer posed an interesting frame theory problem during his plenary talk. On the first day of the workshop, Ole Christensen suggested that we study the problem, and it ended up being one of our main focuses. Here’s a technical statement of the problem sans frame-theoretic jargon:

Strohmer’s Problem. Let $A$ be a known $n\times d$ matrix, let $D$ be an unknown $n\times n$ diagonal matrix, and let $X$ be an unknown $d\times m$ matrix. Under what conditions does $y=DAX$ uniquely determine $D$ and/or $X$?

This problem is motivated by the real-world problem of self-calibration (see for example this blog entry). In particular, the columns of $X$ represent unknown test signals, whereas $D$ represents unknown parameters of a linear sensor $DA$. The goal of self-calibration is to calibrate your sensor without requiring knowledge of the input (this is particularly important for applications like deep-space imaging).