## Some news regarding the Paley graph

Let $\mathbb{F}_p$ denote the field with $p$ elements, and let $Q_p$ denote the multiplicative subgroup of quadratic residues. For each prime $p\equiv 1\bmod 4$, we consider the Paley graph $G_p$ with vertex set $\mathbb{F}_p$, where two vertices are adjacent whenever their difference resides in $Q_p$. For example, the following illustration from Wikipedia depicts $G_{13}$:

The purpose of this blog entry is to discuss recent observations regarding the Paley graph.

## A few paper announcements

This last semester, I was a long-term visitor at the Simons Institute for the Theory of Computing. My time there was rather productive, resulting in a few (exciting!) arXiv preprints, which I discuss below.

1. SqueezeFit: Label-aware dimensionality reduction by semidefinite programming.

Suppose you have a bunch of points in high-dimensional Euclidean space, some labeled “cat” and others labeled “dog,” say. Can you find a low-rank projection such that after projection, cats and dogs remain separated? If you can implement such a projection as a sensor, then that sensor collects enough information to classify cats versus dogs. This is the main idea behind compressive classification.

## Talks from the Summer of ’17

This summer, I participated in several interesting conferences. This entry documents my slides and describes a few of my favorite talks from the summer. Here are links to my talks:

UPDATE: SIAM AG17 just posted a video of my talk.

Now for my favorite talks from FoCM, ILAS, SIAM AG17 and SPIE:

Ben RechtUnderstanding deep learning requires rethinking generalization

In machine learning, you hope to fit a model so as to be good at prediction. To do this, you fit to a training set and then evaluate with a test set. In general, if a simple model fits a large training set pretty well, you can expect the fit to generalize, meaning it will also fit the test set. By conventional wisdom, if the model happens to fit the training set exactly, then your model is probably not simple enough, meaning it will not fit the test set very well. According to Ben, this conventional wisdom is wrong. He demonstrates this by presenting some observations he made while training neural nets. In particular, he allowed the number of parameters to far exceed the size of the training set, and in doing so, he fit the training set exactly, and yet he still managed to fit the test set well. He suggested that generalization was successful here because stochastic gradient descent implicitly regularizes. For reference, in the linear case, stochastic gradient descent (aka the randomized Kaczmarz method) finds the solution of minimal 2-norm, and it converges faster when the optimal solution has smaller 2-norm. Along these lines, Ben has some work to demonstrate that even in the nonlinear case, fast convergence implies generalization.

## Compressed Sensing and its Applications 2015

Last week, I attended this conference in Berlin, and much like the last CSA conference, it was very nice. This year, most of the talks followed one of three themes:

• Application-driven compressed sensing
• Quadratic or bilinear problems
• Clustering in graphs or Euclidean space

Examples of application-driven CS include theoretical results for radar-inspired sensing matrices and model-based CS for quantitative MRI. Readers of this blog are probably familiar with the prototypical quadratic problem (phase retrieval), and bilinear problems include blind deconvolution and self-calibration. Recently, I have blogged quite a bit about clustering in Euclidean space (specifically, k-means clustering), but I haven’t written much about clustering in graphs (other than its application to phase retrieval). For the remainder of this entry, I will discuss two of the talks from CSA2015 that covered different aspects of graph clustering.

## Recent advances in mathematical data science

Part of the experience of giving a talk at Oberwolfach is documentation. First, they ask you to handwrite the abstract of your talk into a notebook of sorts for safekeeping. Later, they ask you to tex up an extended abstract for further documentation. This time, I gave a longer version of my SPIE talk (here are the slides). Since I posted my extended abstract on my blog last time, I figured I’d do it again:

This talk describes recent work on three different problems of interest in mathematical data science, namely, compressive classification, $k$-means clustering, and deep learning. (Based on three papers: one, two, three.)

First, compressive classification is a problem that comes on the heels of compressive sensing. In compressive sensing, one exploits the underlying structure of a signal class in order to exactly reconstruct any signal from the class given very few linear measurements of the signal. However, many applications do not require an exact reconstruction of the image, but rather a classification of that image (for example, is this a picture of a cat, or of a dog?). As such, it makes intuitive sense that the classification task might succeed given far fewer measurements than are necessary for compressive sensing.

## Applied Harmonic Analysis and Sparse Approximation

Last week, I attended this workshop at Oberwolfach. The weather was great, and I was rather impressed by the quality of the talks. Throughout the workshop, I paid special attention to the various problems that different people are thinking about. In what follows, I briefly discuss several of these problems.

## Three Paper Announcements

I’ve been pretty busy lately with writing and researching with visitors. These announcements serve as a quick summary of what I’ve been up to:

1. Tables of the existence of equiangular tight frames (with Matthew Fickus). Today, there’s quite a bit known about equiangular tight frames (ETFs), but what is known seems to be scattered across different papers. This paper surveys everything that is known, and tabulates all of the known real and complex ETFs with sufficiently few vectors in sufficiently small dimension. The tables were generated by coding up existence theorems in MATLAB so as to minimize errors. This serves as a “solution” to problem 21 in this documentation of the open problems discussed at the AIM workshop Frame theory intersects geometry. Recently, Matt and I have made a few ETF discoveries with John and Jesse, so you can expect this table to be updated after we announce these discoveries in the coming months.

## A conditional construction of restricted isometries

I have a recent paper on the arXiv with Afonso Bandeira and Joel Moreira that provides a deterministic RIP matrix which breaks the square-root bottleneck, conditional on a folklore conjecture in number theory.

Here’s the construction (essentially): Let $p$ be a prime which is 1 mod 4. Consider the $p\times p$ DFT matrix, and grab rows corresponding to the quadratic residues (i.e., perfect squares) modulo $p$. This construction was initially suggested in this paper. We already know that partial Fourier matrices break the square-root bottleneck if the rows are drawn at random, so our result corresponds to the intuition that quadratic residues exhibit some notion of pseudorandomness.

There are actually two folklore conjectures at play in our paper. Conjecture A implies that this matrix breaks the square-root bottleneck, which in turn implies Conjecture B:

## Sparse Representations, Numerical Linear Algebra, and Optimization Workshop

A couple of weeks ago, I attended the “Sparse Representations, Numerical Linear Algebra, and Optimization Workshop.” It was my first time at Banff, and I was thoroughly impressed by the weather, the facility, and the workshop organization. A few of the talks were recorded and are available here. Check out this good-looking group of participants:

I wanted to briefly outline some of the problems that were identified throughout the workshop.

## Robust width: A characterization of uniformly stable and robust compressed sensing

Jameson Cahill and I recently posted a new paper on the arXiv that we’re pretty excited about. Suppose you want to do compressed sensing with L1 minimization. That is, you get $y=\Phi x^\natural+e$ for some nearly $K$-sparse vector $x^\natural$ and some noise $e$ satisfying $\|e\|_2\leq\epsilon$, and then you attempt to reconstruct $x^\natural$ by taking

$\arg\min \quad \|x\|_1 \quad\mbox{ subject to } \quad\|\Phi x-y\|_2\leq\epsilon.$

One of the main results of compressed sensing is that the optimizer $x^\star$ of the above program satisfies

$\displaystyle{\|x^\star-x^\natural\|_2\leq \frac{C_0}{\sqrt{K}}\|x^\natural-x^\natural_K\|_1+C_1\epsilon}$

provided $\Phi$ satisfies the $(2K,\delta)$-restricted isometry property (RIP) for some $\delta<\sqrt{2}-1$. Here, $x^\natural_K$ denotes the best $K$-term approximation of $x^\natural$, and so $\|x^\natural-x^\natural_K\|_1$ captures how close $x^\natural$ is to being $K$-sparse. Jameson and I wondered if RIP was necessary to enjoy this uniformly stable and robust performance (uniform in the sense that the fixed matrix works for all $x^\natural$ simultaneously, stable because we don’t require $x^\natural$ to be exactly $K$-sparse, and robust because we allow noise). In our paper, we completely characterize the matrices that offer this performance, and we show that RIP matrices form a strict subset.