Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

Vlad Voroninski recently posted an arXiv preprint with Paul Hand that provides compressed sensing guarantees using a neural net-based generative signal model. This offers some theoretical justification for the shocking empirical results presented in the “Compressed sensing using generative models” paper, which demonstrates signal recovery from roughly 10 times fewer measurements than conventional compressed sensing requires. I was especially excited to see this paper, having recently read Michael Elad’s editorial on deep learning. To learn more, I interviewed Vlad (see below); I’ve lightly edited his responses for formatting and hyperlinks:

DGM: What is the origin story of this project? Were you and Paul inspired by the “Compressed sensing using generative models” paper?

VV: I have been working extensively with applied deep learning for the last year or so, and have been inspired by recent applications of deep generative image priors to classical inverse problems, such as the super-resolution work by Fei-Fei Li et al. Moreover, recent work by Yosinski et al. on regularizing with deep generative priors for synthesizing the preferred inputs to neural activations made me optimistic that GAN-based generative priors are capturing sophisticated natural image structure (the synthetic images obtained in that paper look incredibly realistic).

Continue reading Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

A polynomial-time relaxation of the Gromov-Hausdorff distance

Soledad Villar recently posted her latest paper on the arXiv (joint work with Afonso Bandeira, Andrew Blumberg and Rachel Ward). The paper reduces a problem at the cutting edge of data science (specifically, shape matching and point-cloud comparison) to a semidefinite program, and then investigates fast solvers based on non-convex local methods. (Check out her blog for an interactive illustration of the results.) Soledad is on the job market this year, and I read about this paper in her research statement. I wanted to learn more, so I decided to interview her. I’ve lightly edited her responses for formatting and hyperlinks:

Continue reading A polynomial-time relaxation of the Gromov-Hausdorff distance

The Voronoi Means Conjecture

UPDATE (July 26, 2016): Boris Alexeev recently disproved the Voronoi Means Conjecture! In particular, he found that certain stable isogons fail to exhibit the conjectured behavior, and his solution suggests a certain refinement of the conjecture. I asked him to write a guest blog entry about his solution, so expect to hear more in the coming weeks.

Suppose you’re given a sample from an unknown balanced mixture of k spherical Gaussians of equal variance in dimension d:

[Figure: a sample drawn from a Gaussian mixture model.]

In the above example, k=3 and d=2. How do you estimate the centers \{\gamma_i\}_{i=1}^k of the Gaussians from the data? In this paper, Dasgupta provides an algorithm in which you project the data onto a randomly drawn subspace of some carefully selected dimension so as to concentrate the data points towards their respective centers. After doing so, there will be k extremely popular regions of the subspace, and for each region, you can average the corresponding points in the original dataset to estimate that region's Gaussian center. With this algorithm, Dasgupta proved that

\displaystyle{\mathrm{MSE}:=\frac{1}{k}\sum_{i=1}^k\|\hat{\gamma}_i-\gamma_i\|^2\lesssim d\sigma^2 \qquad\text{whp}}

provided \mathrm{SNR}:=\frac{1}{\sigma^2}\min_{i\neq j}\|\gamma_i-\gamma_j\|^2\gtrsim d.
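To make the procedure concrete, here is a minimal sketch in Python. This is my own illustrative code, not Dasgupta's algorithm verbatim: in particular, I substitute plain Lloyd iterations (with a farthest-point initialization) for his popular-region step, and proj_dim stands in for the carefully selected projection dimension; the names estimate_centers and _lloyd are mine.

import numpy as np

def estimate_centers(X, k, proj_dim, seed=0):
    """Cluster a random projection of the data, then average the ORIGINAL
    points within each cluster to estimate the Gaussian centers."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    Y = X @ G                                   # project onto a random subspace
    labels = _lloyd(Y, k, rng)                  # find k "popular regions" (substitute step)
    return np.array([X[labels == i].mean(axis=0) for i in range(k)])

def _lloyd(Y, k, rng, iters=50):
    """Vanilla Lloyd iterations on the projected data, seeded with a
    farthest-point initialization to avoid collapsing clusters."""
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(k - 1):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[int(d2.argmax())])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):             # keep the old center if a cluster empties
                centers[i] = Y[labels == i].mean(axis=0)
    return labels

# Toy usage: k=3 well-separated spherical Gaussians in d=50 dimensions.
rng = np.random.default_rng(1)
true_centers = 10 * rng.standard_normal((3, 50))
X = np.vstack([c + rng.standard_normal((200, 50)) for c in true_centers])
est = estimate_centers(X, k=3, proj_dim=5)
print(est.shape)   # (3, 50); should match true_centers up to reordering of the clusters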

Continue reading The Voronoi Means Conjecture

On the low-rank approach for semidefinite programs arising in synchronization and community detection

From MaxCut to PhaseLift, semidefinite programming has proven to be rather powerful, especially for convex relaxation. SDP solvers take polynomial time, but the exponent is large, and anyone who’s run an SDP on CVX has experienced some frustration with the runtime. In practice, the SDP-optimal matrix tends to have extremely low rank, and so one may restrict the search to low-rank matrices (say, by factoring the decision variable as YY^\top with Y having only a few columns) to facilitate the search for the SDP’s solution. This heuristic was first introduced by Burer and Monteiro, and it works well in practice, but the resulting rank-constrained program is nonconvex and the theory is scant. Recently, the theory gap started to close with this paper:

On the low-rank approach for semidefinite programs arising in synchronization and community detection

Afonso S. Bandeira, Nicolas Boumal, Vladislav Voroninski
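Before discussing the paper, here is a minimal sketch in Python of the Burer-Monteiro heuristic applied to the \mathbb{Z}_2 synchronization SDP \max\{\langle C,X\rangle : X\succeq 0,\ X_{ii}=1\}. This is my own illustrative code, not from the paper: it factors X=YY^\top with unit-norm rows of Y, runs projected gradient ascent with a step size and rank p that I chose for the demo, and rounds via the top eigenvector; see the paper (and the interview below) for which ranks actually come with guarantees.

import numpy as np

def burer_monteiro_sync(C, p=3, iters=500):
    """Burer-Monteiro factorization for Z_2 synchronization:
    maximize <C, Y Y^T> over Y in R^{n x p} with unit-norm rows."""
    n = C.shape[0]
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((n, p))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)       # rows on the unit sphere
    step = 1.0 / np.linalg.norm(C, 2)                   # conservative step size
    for _ in range(iters):
        Y += step * 2 * C @ Y                           # gradient of <C, YY^T> is 2CY
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # project rows back to the sphere
    # Round to a +/-1 labeling via the top eigenvector of X = YY^T.
    top = np.linalg.eigh(Y @ Y.T)[1][:, -1]
    return np.sign(top)

# Toy usage: C = z z^T + noise for a hidden sign vector z.
n = 200
rng = np.random.default_rng(1)
z = np.sign(rng.standard_normal(n))
W = rng.standard_normal((n, n))
C = np.outer(z, z) + 2.0 * (W + W.T) / np.sqrt(2)
z_hat = burer_monteiro_sync(C)
print("agreement:", abs(z_hat @ z) / n)                 # close to 1 indicates recovery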

As the title suggests, this paper provides strong performance guarantees for the Burer-Monteiro heuristic in the particular cases of \mathbb{Z}_2 synchronization and community detection. I was very excited to see this paper, and so I interviewed one of the authors (Nicolas Boumal). I’ve lightly edited his responses for formatting and hyperlinks:

Continue reading On the low-rank approach for semidefinite programs arising in synchronization and community detection

Clustering noisy data with semidefinite relaxations

Soledad Villar recently posted our latest paper on the arXiv (this one coauthored by her advisor, Rachel Ward). The paper provides guarantees for the k-means SDP when the points are drawn from a subgaussian mixture model. This blog entry will discuss one of the main ideas in our analysis, which we borrowed from Guedon and Vershynin’s recent paper.
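For reference (my summary, not an excerpt from the paper): the k-means SDP in question, in the Peng-Wei form, takes the matrix D of squared distances D_{ij}=\|x_i-x_j\|^2 and solves

\displaystyle{\text{minimize}\quad \langle D,Z\rangle \quad\text{subject to}\quad Z\succeq 0,\quad Z\geq 0\text{ entrywise},\quad Z\mathbf{1}=\mathbf{1},\quad \mathrm{tr}(Z)=k.}

A partition into k clusters corresponds to the feasible Z whose (i,j) entry is 1/|\text{cluster}| when i and j share a cluster and 0 otherwise, and roughly speaking, the analysis bounds how far the SDP's optimizer can be from this planted Z.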

Let’s start with two motivating applications:

The first application comes from graph clustering. Consider the stochastic block model, in which n vertices are secretly partitioned into two communities, each of size n/2; edges between vertices of a common community are drawn iid with some probability p, and all other edges are drawn with probability q<p. The goal is to estimate the communities given a single random draw of the graph. For this task, you might be inclined to find the maximum likelihood estimator for this model, but this results in an integer program. Relaxing the program leads to a semidefinite program, and amazingly, this program is tight and recovers the true communities with high probability when p=(\alpha\log n)/n and q=(\beta\log n)/n for good choices of (\alpha,\beta). (See this paper.) These edge probabilities scale like the connectivity threshold for Erdos-Renyi graphs, and this makes sense, since we wouldn’t know how to assign vertices in isolated components. If instead the probabilities were to scale like 1/n, then we would be in the “giant component” regime, so we’d still expect enough signal to correctly assign a good fraction of the vertices, but the SDP is not tight in this regime.
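To spell out the relaxation (my summary; the linked paper may use a slightly different normalization): encode the partition as x\in\{\pm 1\}^n with \mathbf{1}^\top x=0. Since the total number of edges is fixed by the graph, maximizing the likelihood amounts to maximizing the number of within-community edges, i.e.

\displaystyle{\text{maximize}\quad x^\top Ax\quad\text{subject to}\quad x\in\{\pm1\}^n,\quad \mathbf{1}^\top x=0,}

where A is the adjacency matrix. Writing X=xx^\top makes the objective linear, \langle A,X\rangle, and dropping the rank-one constraint on X yields the semidefinite relaxation

\displaystyle{\text{maximize}\quad \langle A,X\rangle\quad\text{subject to}\quad X\succeq 0,\quad X_{ii}=1\ \forall i,\quad \langle \mathbf{1}\mathbf{1}^\top,X\rangle=0.}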

Continue reading Clustering noisy data with semidefinite relaxations

Compressed Sensing and its Applications 2015

Last week, I attended this conference in Berlin, and much like the last CSA conference, it was very nice. This year, most of the talks followed one of three themes:

  • Application-driven compressed sensing
  • Quadratic or bilinear problems
  • Clustering in graphs or Euclidean space

Examples of application-driven CS include theoretical results for radar-inspired sensing matrices and model-based CS for quantitative MRI. Readers of this blog are probably familiar with the prototypical quadratic problem (phase retrieval), and bilinear problems include blind deconvolution and self-calibration. Recently, I have blogged quite a bit about clustering in Euclidean space (specifically, k-means clustering), but I haven’t written much about clustering in graphs (other than its application to phase retrieval). For the remainder of this entry, I will discuss two of the talks from CSA2015 that covered different aspects of graph clustering.

Continue reading Compressed Sensing and its Applications 2015

Probably certifiably correct algorithms

This post is based on two papers (one and two). The task is to quickly solve typical instances of a given problem, and to quickly produce a certificate of that solution's optimality. Generally, problems of interest are NP-hard, and so we consider a random distribution on problem instances with the philosophy that real-world instances might mimic this distribution. In my community, it is common to consider NP-hard optimization problems:

\displaystyle{\text{minimize}\quad f(x)\quad\text{subject to}\quad x\in S. \qquad (1)}

In some cases, f is convex but S is not, and so one might relax accordingly:

\displaystyle{\text{minimize}\quad f(x)\quad\text{subject to}\quad x\in T, \qquad (2)}

where T\supseteq S is some convex set. If the minimizer of (2) happens to be a member of S, then it's also a minimizer of (1); when this happens, we say the relaxation is tight. For some problems (and distributions on instances), the relaxation is typically tight, meaning that (1) can typically be solved by instead solving (2); for example, this phenomenon occurs in phase retrieval, in community detection, and in geometric clustering. Importantly, strong duality ensures that solving the dual of the convex relaxation provides a certificate of optimality.
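To spell out the certificate logic (my paraphrase of the standard argument): let g denote the Lagrangian dual function of the relaxation (2). Weak duality gives g(\lambda)\leq f(x) for every dual-feasible \lambda and every x\in T\supseteq S. So if we can exhibit a point \hat{x}\in S and a dual-feasible \hat{\lambda} with

\displaystyle{f(\hat{x})=g(\hat{\lambda}),}

then \hat{x} minimizes both (2) and (1), and the pair (\hat{x},\hat{\lambda}) serves as a quickly verifiable certificate of optimality. Strong duality is what guarantees that such a \hat{\lambda} exists whenever the relaxation is tight.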

Continue reading Probably certifiably correct algorithms