Talks from the Summer of ’17

This summer, I participated in several interesting conferences. This entry documents my slides and describes a few of my favorite talks from the summer. Here are links to my talks:

Now for my favorite talks from FoCM, ILAS, SIAM AG17 and SPIE:

Ben Recht: Understanding deep learning requires rethinking generalization

In machine learning, you hope to fit a model so as to be good at prediction. To do this, you fit to a training set and then evaluate with a test set. In general, if a simple model fits a large training set pretty well, you can expect the fit to generalize, meaning it will also fit the test set. By conventional wisdom, if the model happens to fit the training set exactly, then your model is probably not simple enough, meaning it will not fit the test set very well. According to Ben, this conventional wisdom is wrong. He demonstrates this by presenting some observations he made while training neural nets. In particular, he allowed the number of parameters to far exceed the size of the training set, and in doing so, he fit the training set exactly, and yet he still managed to fit the test set well. He suggested that generalization was successful here because stochastic gradient descent implicitly regularizes. For reference, in the linear case, stochastic gradient descent (aka the randomized Kaczmarz method) finds the solution of minimal 2-norm, and it converges faster when the optimal solution has smaller 2-norm. Along these lines, Ben has some work to demonstrate that even in the nonlinear case, fast convergence implies generalization.
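To illustrate the linear case, here’s a minimal numpy sketch (my own illustration, not Ben’s experiment): with far more parameters than samples, randomized Kaczmarz started from zero fits the training data exactly, and since the iterates stay in the row space of the data matrix, the method lands on the minimum-2-norm interpolator.

```python
import numpy as np

# A minimal sketch of the linear case (an illustration, not Ben's experiment):
# SGD on least squares, started at zero, stays in the row space of A and
# therefore converges to the minimum-2-norm solution that fits the data.
rng = np.random.default_rng(0)
m, n = 20, 100                          # 100 parameters, only 20 samples
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.zeros(n)
for _ in range(20000):
    i = rng.integers(m)                 # draw one training example
    r = A[i] @ x - b[i]                 # residual on that example
    x -= (r / (A[i] @ A[i])) * A[i]     # randomized Kaczmarz step

x_star = np.linalg.pinv(A) @ b          # the minimum-2-norm interpolator
print(np.linalg.norm(A @ x - b))        # ~0: the training set is fit exactly
print(np.linalg.norm(x - x_star))       # ~0: SGD found the min-norm solution
```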

Continue reading Talks from the Summer of ’17

Notes on Zauner’s conjecture

Last week, I visited Joey Iverson at the University of Maryland, and we spent a lot of time working through different approaches to Zauner’s conjecture. In general, my relationship with this problem is very similar to Steve Flammia’s description, as paraphrased by smerkel on Physics Stack Exchange:

[Flammia described] the SIC-POVM problem as a “heartbreaker” because every approach you take seems super promising but then inevitably fizzles out without really giving you a great insight as to why.

Case in point, Joey and I identified a promising approach involving ideas from our association schemes paper. We were fairly optimistic, and Joey even bet me $5 that our approach would work. Needless to say, I now have this keepsake from Joey:

[Image: the $5 bill from Joey]

While our failure didn’t offer any great insights (as Flammia predicted), the experience forced me to review the literature on Zauner’s conjecture a lot more carefully. A few things caught my eye, and I’ll discuss them here. Throughout, SIC denotes “symmetric informationally complete line set” and WH denotes “the Weyl-Heisenberg group.”

Continue reading Notes on Zauner’s conjecture

Zauner’s conjecture is true in dimensions 18, 20, 21, 30, 31, 37, 39 and 43

Two years ago, I blogged about Tuan-Yow Chien’s PhD thesis, which proved Zauner’s conjecture in dimension 17. The idea was to exploit certain conjectures on the field structure of SIC-POVM fiducial vectors so as to round numerical solutions to exact solutions. This week, the arXiv announced Chien’s latest paper (coauthored with Appleby, Flammia and Waldron), which extends this work to find exact solutions in 8 new dimensions.

The following line from the introduction caught my eye:

For instance the print-out for exact fiducial 48a occupies almost a thousand A4 pages (font size 9 and narrow margins).

As my previous blog entry illustrated, the description length of SIC-POVM fiducial vectors appears to grow rapidly with d. However, it seems that the rate of growth is much better than I originally thought. Here’s a plot of the description lengths of the known fiducial vectors (the new ones, due to ACFW17 and available here, appear in red):

Continue reading Zauner’s conjecture is true in dimensions 18, 20, 21, 30, 31, 37, 39 and 43

Optimal line packings from association schemes

Joey Iverson recently posted our paper with John Jasper on the arXiv. As the title suggests, this paper explains how to construct optimal line packings (specifically, equiangular tight frames) using association schemes. In particular, we identify many schemes whose adjacency algebra contains the Gram matrix of an ETF. This unifies several existing constructions, and also enabled us to construct the first known infinite family of ETFs that arise from nonabelian groups.

John is on the job market this year, and when reading his research statement, I was struck by his discussion of our paper, so I asked him to expand his treatment into a full-blown blog entry. Without further ado, here is John’s guest blog post (which I’ve lightly edited for hyperlinks and formatting):

Continue reading Optimal line packings from association schemes

Recent developments in equiangular tight frames, II

Equiangular tight frames (ETFs) are optimal packings of lines through the origin. At the moment, they are the subject of a rapidly growing literature. In fact, there have been quite a few updates since my last post on this subject (less than five months ago), and I’ve revamped the table of ETFs accordingly. What follows is a brief discussion of the various developments:

1. There is an ETF of 76 vectors in \mathbb{C}^{19}

See this paper. Last time, I mentioned a recent proof that there is no ETF of 76 vectors in \mathbb{R}^{19}. It turns out that a complex ETF of this size does exist. To prove this, it actually seems more natural to view the vectors as columns of a 20\times 76 matrix whose row vectors sum to zero. As a lower-dimensional example, consider the following matrix:

\displaystyle{\left[\begin{array}{cccccccccc}+&+&+&+&+&+&+&+&+&+\\+&+&+&+&-&-&-&-&-&-\\+&-&-&-&+&+&+&-&-&-\\-&+&-&-&+&-&-&+&+&-\\-&-&+&-&-&+&-&+&-&+\\-&-&-&+&-&-&+&-&+&+\end{array}\right]}
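As a quick sanity check (my own, using nothing beyond the matrix above), one can verify numerically that the six rows sum to the zero vector and that the ten normalized columns are equiangular, with pairwise inner products of magnitude 1/3:

```python
import numpy as np

# Columns of the sign matrix above, normalized to unit norm
rows = ["++++++++++",
        "++++------",
        "+---+++---",
        "-+--+--++-",
        "--+--+-+-+",
        "---+--+-++"]
M = np.array([[1 if c == "+" else -1 for c in r] for r in rows], dtype=float)

print(M.sum(axis=0))                        # all zeros: the rows sum to zero
G = (M / np.sqrt(6)).T @ (M / np.sqrt(6))   # Gram matrix of unit columns
off = np.abs(G[~np.eye(10, dtype=bool)])
print(off.min(), off.max())                 # both 1/3: the columns are equiangular
```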

Continue reading Recent developments in equiangular tight frames, II

SOFT 2016: Summer of Frame Theory

This month, several experts in frame theory will be visiting my department, and so Matt Fickus and I decided to organize a workshop in the style of AIM. Considering the recent progress we’ve made on equiangular tight frames (ETFs) — namely, one, two, three, and four — we are hoping this workshop will spur further progress in this area. To kick off the month, I asked a few people to prepare hour-long chalk talks, and what follows are the extended abstracts:

1. Introduction to ETFs (Dustin G. Mixon)

Given a d-dimensional Hilbert space \mathbb{H}_d and a positive integer n, we are interested in packing n lines through the origin so that the interior angle between any two is as large as possible. It is convenient to represent each line by a unit vector that spans the line, and in doing so, the problem amounts to finding unit vectors \{\varphi_i\}_{i=1}^n that minimize coherence:

\displaystyle{\mu\Big(\{\varphi_i\}_{i=1}^n\Big)=\max_{i\neq j}|\langle \varphi_i,\varphi_j\rangle|.}
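To make the definition concrete, here’s a small numpy sketch (my own illustration, not from the talk) that computes the coherence of a set of unit-norm columns. It checks the classical example of three unit vectors spaced 120 degrees apart in \mathbb{R}^2, which achieves coherence 1/2, the Welch bound for n=3 and d=2:

```python
import numpy as np

def coherence(Phi):
    """Max |<phi_i, phi_j>| over i != j for unit-norm columns of Phi."""
    G = Phi.conj().T @ Phi                    # Gram matrix of the columns
    n = G.shape[1]
    return np.abs(G[~np.eye(n, dtype=bool)]).max()

# Three unit vectors spaced 120 degrees apart in R^2 (Mercedes-Benz frame)
angles = np.array([0, 2 * np.pi / 3, 4 * np.pi / 3])
Phi = np.vstack([np.cos(angles), np.sin(angles)])
print(coherence(Phi))   # 0.5 = sqrt((n-d)/(d(n-1))), the Welch bound
```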

This minimization amounts to a nonconvex optimization problem. To construct provably optimal packings, one must prove a lower bound on \mu for a given n and spatial dimension d, and then construct an ensemble which meets equality in that bound. To date, we know of three bounds that are sharp:

Continue reading SOFT 2016: Summer of Frame Theory

The Voronoi Means Conjecture

UPDATE (July 26, 2016): Boris Alexeev recently disproved the Voronoi Means Conjecture! In particular, he found that certain stable isogons fail to exhibit the conjectured behavior, and his solution suggests a certain refinement of the conjecture. I asked him to write a guest blog entry about his solution, so expect to hear more in the coming weeks.

Suppose you’re given a sample from an unknown balanced mixture of k spherical Gaussians of equal variance in dimension d:

[Image: sample from a balanced mixture of three spherical Gaussians in the plane]

In the above example, k=3 and d=2. How do you estimate the centers \{\gamma_i\}_{i=1}^k of each Gaussian from the data? In this paper, Dasgupta provides an algorithm in which you project the data onto a randomly drawn subspace of some carefully selected dimension so as to concentrate the data points towards their respective centers. After doing so, there will be k extremely popular regions of the subspace, and for each region, you can average the corresponding points in the original dataset to estimate the corresponding Gaussian center. With this algorithm, Dasgupta proved that

\displaystyle{\mathrm{MSE}:=\frac{1}{k}\sum_{i=1}^k\|\hat{\gamma}_i-\gamma_i\|^2\lesssim d\sigma^2 \qquad\text{whp}}

provided \mathrm{SNR}:=\frac{1}{\sigma^2}\min_{i\neq j}\|\gamma_i-\gamma_j\|^2\gtrsim d.
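To make the pipeline concrete, here’s a toy numpy sketch of the project-then-average idea (my own simplification: the projection dimension and the clustering of the projected points are chosen naively, not as prescribed in Dasgupta’s paper):

```python
import numpy as np
from itertools import permutations

# Toy version of project-then-average (a naive simplification, not
# Dasgupta's actual algorithm): draw a balanced mixture, project onto a
# random low-dimensional subspace, cluster there, then average the
# corresponding *original* points to estimate each Gaussian center.
rng = np.random.default_rng(1)
k, d, sigma, N = 3, 50, 1.0, 3000
centers = 12 * rng.standard_normal((k, d))           # true Gaussian centers
X = centers[rng.integers(k, size=N)] + sigma * rng.standard_normal((N, d))

p = 5                                                # ad hoc projection dimension
P = np.linalg.qr(rng.standard_normal((d, p)))[0]     # random p-dim subspace
Y = X @ P                                            # projected data

# Seed one center per popular region (farthest-point init), then run Lloyd
C = [Y[rng.integers(N)]]
for _ in range(k - 1):
    C.append(Y[np.argmax(np.min([((Y - c)**2).sum(1) for c in C], axis=0))])
C = np.array(C)
for _ in range(25):
    assign = np.argmin(((Y[:, None, :] - C[None])**2).sum(-1), axis=1)
    C = np.array([Y[assign == j].mean(0) for j in range(k)])

# Average the original points in each cluster; MSE up to relabeling
est = np.array([X[assign == j].mean(0) for j in range(k)])
mse = min(((est[list(perm)] - centers)**2).sum() / k
          for perm in permutations(range(k)))
print(mse, d * sigma**2)                             # MSE well below d*sigma^2
```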

Continue reading The Voronoi Means Conjecture