The following tweet has been making the rounds on social media:

[embedded tweet: results of a statewide Alabama election, won by the Democrats, broken down across the state’s seven congressional districts]
Notice that since the Democrats won the statewide vote, it would be impossible for Republicans to win all seven districts, no matter how the map was drawn. As such, you might say that Alabama has been gerrymandered as effectively as possible. Sure enough, you can find the usual tell-tale signs of gerrymandering: some of the district boundaries exhibit strange jagged portions, which undoubtedly maneuver to include certain parts of the map while avoiding others.
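To spell out the impossibility (a quick back-of-the-envelope argument of my own, assuming a two-party count in which every ballot is cast in exactly one district): if $R_i$ and $D_i$ denote the Republican and Democratic votes in district $i$, then winning all seven districts would mean $R_i > D_i$ for every $i$, whence

$$\sum_{i=1}^{7} R_i > \sum_{i=1}^{7} D_i,$$

contradicting the Democrats’ statewide win. So any map leaves the Republicans with at most six of the seven districts.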
Continue reading Partisan gerrymandering with geographically compact districts
I just posted my latest paper on the arXiv, this one co-authored with Boris Alexeev. In this paper, we study a new technique for detecting unconstitutional gerrymandering that is currently under review by the Supreme Court. Recall that gerrymandering refers to the process whereby electoral district boundaries are manipulated in order to reduce or increase the voting power of a certain group of people. Gerrymandered districts are frequently detected by their bizarre shapes. Here are some noteworthy examples (from Wikipedia):

[maps of gerrymandered districts, including NC-1, NC-12, and IL-4]
Last year, the Supreme Court found NC-1 and NC-12 (the first two districts above) to be the result of unconstitutional racial gerrymandering. We’ll discuss IL-4 (the district on the right) later.
Continue reading An impossibility theorem for gerrymandering
This week, I visited Afonso Bandeira at NYU to give a talk in the MaD seminar on the semidefinite relaxation of k-means. Here are the slides. The last part of the talk is very new; I worked it out with Soledad Villar while she visited me a couple of weeks ago, and our paper just hit the arXiv. In this blog entry, I’ll briefly summarize the main idea of the paper.
Suppose you are given data points $x_1, \ldots, x_N \in \mathbb{R}^d$, and you are tasked with finding the partition $C_1 \sqcup \cdots \sqcup C_k = \{1, \ldots, N\}$ that minimizes the k-means objective

$$\frac{1}{N} \sum_{t=1}^{k} \sum_{i \in C_t} \left\| x_i - \frac{1}{|C_t|} \sum_{j \in C_t} x_j \right\|^2.$$

(Here, we normalize the objective by $1/N$ for convenience later.) To do this, you will likely run MATLAB’s built-in implementation of k-means++, which randomly selects $k$ of the data points (with an intelligent choice of random distribution), and then uses these data points as proto-centroids to initialize Lloyd’s algorithm. In practice, this works very well: after running it a few times, you generally get a very nice clustering. But how do you know when to stop looking for an even better clustering?
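To make the restart loop concrete, here is a minimal Python sketch (my own illustration; I use scikit-learn’s k-means++ in place of MATLAB’s, and `inertia_` is scikit-learn’s name for the unnormalized objective):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, d, k = 600, 2, 3
# Synthetic data: three well-separated Gaussian blobs
X = np.concatenate([rng.normal(loc=c, scale=1.0, size=(N // k, d))
                    for c in (0.0, 5.0, 10.0)])

best = np.inf
for trial in range(10):
    # k-means++ seeding followed by Lloyd's algorithm (one restart per trial)
    km = KMeans(n_clusters=k, init="k-means++", n_init=1,
                random_state=trial).fit(X)
    obj = km.inertia_ / len(X)  # the normalized k-means objective above
    best = min(best, obj)
    print(f"trial {trial}: objective {obj:.4f}, best so far {best:.4f}")
# ...but when is `best` close enough to optimal that we can stop?
```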
Continue reading Monte Carlo approximation certificates for k-means clustering
Joey Iverson recently posted our latest paper with John Jasper on the arXiv. This paper can be viewed as a sequel of sorts to our previous paper, in which we introduced the idea of hunting for Gram matrices of equiangular tight frames (ETFs) in the adjacency algebras of association schemes, specifically group schemes. In this new paper, we focus on the so-called Schurian schemes. This proved to be a particularly fruitful restriction: we found an alternate construction of Hoggar’s lines, we found an explicit representation of the “elusive” packing from the real packings paper (based on a private tip from Henry Cohn), we found a packing involving the Mathieu group (this one beating the corresponding packing in Sloane’s database), we found some low-dimensional mutually unbiased bases, and we recovered nearly all small-sized ETFs. In addition, we constructed the first known infinite family of ETFs with Heisenberg symmetry; while these aren’t SIC-POVMs, we suspect they are related to the objects of interest in Zauner’s conjecture (as in this paper, for example). This blog entry briefly describes the main ideas in the paper.
Continue reading Optimal line packings from finite group actions
Marco Mondelli recently posted his latest paper on the arXiv (joint work with Andrea Montanari). This paper proves sharp guarantees for weak recovery in phase retrieval. In particular, given phaseless measurements against Gaussian vectors, they demonstrate that a properly tuned spectral estimate exhibits nontrivial correlation with the ground truth, even when the sampling rate is at the information-theoretic limit. In addition, they show that their spectral estimate performs well empirically even when the measurements follow a more realistic coded diffraction model. I decided to reach out to Marco to learn more, and what follows is my interview. I’ve lightly edited his responses for formatting and hyperlinks:
DGM: Judging by your website, this project in phase retrieval appears to be a departure from your coding theory background. How did this project come about?
MM: Many of the tools employed in information and coding theory are very general, and they also prove useful for solving problems in other fields, such as compressed sensing, machine learning, and data analysis. This is the general philosophy that motivated my “detour”.
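As an aside for readers new to spectral methods: estimators of this kind weight each measurement vector by a preprocessing function of its measurement and then take a top eigenvector. Here is a minimal real-valued Python sketch (my own illustration with the naive weighting $T(y) = y$, not the optimally tuned preprocessing from Marco’s paper):

```python
import numpy as np

def spectral_estimate(A, y):
    """Top eigenvector of (1/m) * sum_i T(y_i) a_i a_i^T, with T(y) = y."""
    m = A.shape[0]
    D = (A * y[:, None]).T @ A / m     # weighted second-moment matrix
    _, eigvecs = np.linalg.eigh(D)     # eigenvalues in ascending order
    return eigvecs[:, -1]              # eigenvector of the largest eigenvalue

rng = np.random.default_rng(0)
d, m = 100, 800                        # sampling rate m/d = 8
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                 # ground truth, unit norm
A = rng.standard_normal((m, d))        # Gaussian measurement vectors (rows)
y = (A @ x) ** 2                       # phaseless measurements
xhat = spectral_estimate(A, y)
print(abs(xhat @ x))                   # correlation with truth (sign ambiguity)
```

For Gaussian $a_i$ and unit-norm $x$, the weighted matrix concentrates around $I + 2xx^\top$, whose top eigenvector is $x$; the paper’s contribution is the choice of preprocessing that makes this work all the way down to the information-theoretic sampling rate.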
Continue reading Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
This summer, I participated in several interesting conferences. This entry documents my slides and describes a few of my favorite talks from the summer. Here are links to my talks:
UPDATE: SIAM AG17 just posted a video of my talk.
Now for my favorite talks from FoCM, ILAS, SIAM AG17 and SPIE:
Ben Recht — Understanding deep learning requires rethinking generalization
In machine learning, you fit a model in the hope that it will be good at prediction. To do this, you fit the model to a training set and then evaluate it on a test set. In general, if a simple model fits a large training set pretty well, you can expect the fit to generalize, meaning it will also fit the test set. By conventional wisdom, if the model happens to fit the training set exactly, then your model is probably not simple enough, meaning it will not fit the test set very well. According to Ben, this conventional wisdom is wrong. He demonstrated this by presenting some observations he made while training neural nets. In particular, he allowed the number of parameters to far exceed the size of the training set, and in doing so, he fit the training set exactly, and yet he still managed to fit the test set well. He suggested that generalization succeeds here because stochastic gradient descent implicitly regularizes. For reference, in the linear case, stochastic gradient descent (aka the randomized Kaczmarz method), initialized at the origin, finds the solution of minimal 2-norm, and it converges faster when the optimal solution has smaller 2-norm. Along these lines, Ben has some work demonstrating that, even in the nonlinear case, fast convergence implies generalization.
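To illustrate the linear case, here is a minimal Python sketch (my own, not from Ben’s talk): run randomized Kaczmarz from the origin on an underdetermined consistent system and check that it finds the minimum 2-norm solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 100                      # more parameters than equations
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)          # consistent since A has full row rank

# Randomized Kaczmarz: SGD over the rows of Ax = b, started from zero
x = np.zeros(n)
for _ in range(20000):
    i = rng.integers(m)
    a = A[i]
    x += (b[i] - a @ x) / (a @ a) * a   # project onto {x : <a, x> = b_i}

x_min = np.linalg.pinv(A) @ b           # minimum 2-norm solution
print(np.linalg.norm(x - x_min))        # ~0: Kaczmarz found the min-norm solution
```

Because each update adds a multiple of a row of $A$, the iterates never leave the row space, and starting from zero pins down the minimum-norm solution; this is the implicit regularization at play.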
Continue reading Talks from the Summer of ’17