An impossibility theorem for gerrymandering

I just posted my latest paper on the arXiv, this one co-authored with Boris Alexeev. In this paper, we study a new technique that is currently being reviewed by the Supreme Court to detect unconstitutional gerrymandering. Recall that gerrymandering refers to the process whereby electoral district boundaries are manipulated in order to reduce or increase the voting power of a certain group of people. Gerrymandered districts are frequently detected by their bizarre shape. Here are some noteworthy examples (from Wikipedia):

Last year, the Supreme Court found NC-1 and NC-12 (the first two districts above) to be the result of unconstitutional racial gerrymandering. We’ll discuss IL-4 (the district on the right) later.

Monte Carlo approximation certificates for k-means clustering

This week, I visited Afonso Bandeira at NYU to give a talk in the MaD seminar on the semidefinite relaxation of k-means. Here are the slides. The last part of the talk is very new; I worked it out with Soledad Villar while she visited me a couple weeks ago, and our paper just hit the arXiv. In this blog entry, I’ll briefly summarize the main idea of the paper.

Suppose you are given data points $\{x_i\}_{i\in T}\subseteq\mathbb{R}^m$, and you are tasked with finding the partition $C_1\sqcup\cdots\sqcup C_k=T$ that minimizes the k-means objective

$\displaystyle{\frac{1}{|T|}\sum_{t\in[k]}\sum_{i\in C_t}\bigg\|x_i-\frac{1}{|C_t|}\sum_{j\in C_t}x_j\bigg\|^2\qquad(T\text{-IP})}$

(Here, we normalize the objective by $|T|$ for convenience later.) To do this, you will likely run MATLAB’s built-in implementation of k-means++, which randomly selects $k$ of the data points (with an intelligent choice of random distribution), and then uses these data points as proto-centroids to initialize Lloyd’s algorithm. In practice, this works very well: After running it a few times, you generally get a very nice clustering. But when do you know to stop looking for an even better clustering?

Optimal line packings from finite group actions

Joey Iverson recently posted our latest paper with John Jasper on the arXiv. This paper can be viewed as a sequel of sorts to our previous paper, in which we introduced the idea of hunting for Gram matrices of equiangular tight frames (ETFs) in the adjacency algebras of association schemes, specifically group schemes. In this new paper, we focus on the so-called Schurian schemes. This proved to be a particularly fruitful restriction: We found an alternate construction of Hoggar’s lines, we found an explicit representation of the “elusive” $7\times 36$ packing from the real packings paper (based on a private tip from Henry Cohn), we found an $11\times 66$ packing involving the Mathieu group $M_{11}$ (this one beating the corresponding packing in Sloane’s database), we found some low-dimensional mutually unbiased bases, and we recovered nearly all small sized ETFs. In addition, we constructed the first known infinite family of ETFs with Heisenberg symmetry; while these aren’t SIC-POVMs, we suspect they are related to the objects of interest in Zauner’s conjecture (as in this paper, for example). This blog entry briefly describes the main ideas in the paper.

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

Marco Mondelli recently posted his latest paper on the arXiv (joint work with Andrea Montanari). This paper proves sharp guarantees for weak recovery in phase retrieval. In particular, given phaseless measurements against Gaussian vectors, they demonstrate that a properly tuned spectral estimate exhibits correlation with the ground truth, even when the sampling rate is at the information-theoretic limit. In addition, they show that their spectral estimate empirically performs well even when the measurements follow a more realistic coded diffraction model. I decided to reach out to Marco to learn more, and what follows is my interview. I’ve lightly edited his responses for formatting and hyperlinks:

DGM: Judging by your website, this project in phase retrieval appears to be a departure from your coding theory background. How did this project come about?

MM: Many of the tools employed in information and coding theory are very general and they prove useful also to solve problems in other fields, such as, compressed sensing, machine learning or data analysis. So this is the general philosophy that motivated my “detour”.

Talks from the Summer of ’17

This summer, I participated in several interesting conferences. This entry documents my slides and describes a few of my favorite talks from the summer. Here are links to my talks:

UPDATE: SIAM AG17 just posted a video of my talk.

Now for my favorite talks from FoCM, ILAS, SIAM AG17 and SPIE:

Ben RechtUnderstanding deep learning requires rethinking generalization

In machine learning, you hope to fit a model so as to be good at prediction. To do this, you fit to a training set and then evaluate with a test set. In general, if a simple model fits a large training set pretty well, you can expect the fit to generalize, meaning it will also fit the test set. By conventional wisdom, if the model happens to fit the training set exactly, then your model is probably not simple enough, meaning it will not fit the test set very well. According to Ben, this conventional wisdom is wrong. He demonstrates this by presenting some observations he made while training neural nets. In particular, he allowed the number of parameters to far exceed the size of the training set, and in doing so, he fit the training set exactly, and yet he still managed to fit the test set well. He suggested that generalization was successful here because stochastic gradient descent implicitly regularizes. For reference, in the linear case, stochastic gradient descent (aka the randomized Kaczmarz method) finds the solution of minimal 2-norm, and it converges faster when the optimal solution has smaller 2-norm. Along these lines, Ben has some work to demonstrate that even in the nonlinear case, fast convergence implies generalization.

Packings in real projective spaces

There has been a lot of work recently on constructing line packings that achieve equality in either the Welch bound or the orthoplex bound. It has proven much harder to pack in regimes where these bounds are not tight. To help fill this void, about a month ago, I posted a new paper with Matt Fickus and John Jasper on the arXiv. We provide a few results to approach this case, which I outline below.

1. The minimal coherence of 6 unit vectors in $\mathbb{R}^4$ is 1/3.

The Welch bound is known to not be tight whenever $n$ lies strictly between $d+1$ and $d+\sqrt{2d+1/4}+1/2$ (see the next section for a proof sketch). As such, new techniques are required to prove optimality in this range. We leverage ideas from real algebraic geometry to show how to solve the case of $d+2$ vectors in $\mathbb{R}^d$ for all sufficiently small $d$. For example, our method provides a new proof of the optimality of 5 non-antipodal vertices of the icosahedron in $\mathbb{R}^3$, as well as the optimality of Sloane’s packing of 6 lines in $\mathbb{R}^4$.

Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

Vlad Voroninski recently posted an arXiv preprint with Paul Hand that provides compressed sensing guarantees using a neural net-based generative signal model. This offers some theoretical justification for the shocking empirical results presented in the “Compressed sensing using generative models” paper, which demonstrates compressed sensing with 10 times fewer measurements than conventional compressed sensing (the source code is available here). I was especially excited to see this paper, having recently read Michael Elad’s editorial on deep learning. To learn more, I interviewed Vlad (see below); I’ve lightly edited his responses for formatting and hyperlinks:

DGM: What is the origin story of this project? Were you and Paul inspired by the “Compressed sensing using generative models” paper?

VV: I have been working extensively with applied deep learning for the last year or so, and have been inspired by recent applications of deep generative image priors to classical inverse problems, such as the super resolution work by Fei Fei Li et al. Moreover, recent work on regularizing with deep generative priors for synthesizing the preferred inputs to neural activations, by Yosinski et al., made me optimistic that GAN-based generative priors are capturing sophisticated natural image structure (the synthetic images obtained in this paper look incredibly realistic).