The Voronoi Means Conjecture

UPDATE (July 26, 2016): Boris Alexeev recently disproved the Voronoi Means Conjecture! In particular, he found that certain stable isogons fail to exhibit the conjectured behavior, and his solution suggests a certain refinement of the conjecture. I asked him to write a guest blog entry about his solution, so expect to hear more in the coming weeks.

Suppose you’re given a sample from an unknown balanced mixture of k spherical Gaussians of equal variance in dimension d:


In the above example, k=3 and d=2. How do you estimate the centers \{\gamma_i\}_{i=1}^k of each Gaussian from the data? In this paper, Dasgupta provides an algorithm in which you project the data onto a randomly drawn subspace of some carefully selected dimension so as to concentrate the data points towards their respective centers. After doing so, there will be k extremely popular regions of the subspace, and for each region, you can average the corresponding points in the original dataset to estimate the corresponding Gaussian center. With this algorithm, Dasgupta proved that

\displaystyle{\mathrm{MSE}:=\frac{1}{k}\sum_{i=1}^k\|\hat{\gamma}_i-\gamma_i\|^2\lesssim d\sigma^2 \qquad\text{whp}}

provided \mathrm{SNR}:=\frac{1}{\sigma^2}\min_{i\neq j}\|\gamma_i-\gamma_j\|^2\gtrsim d.

By contrast, if you ask a data scientist to solve this problem, he might instinctively use the k-means algorithm. How does this alternative perform? As I mentioned in a previous blog post, I have a recent paper with Soledad Villar and Rachel Ward that analyzes an SDP relaxation of the k-means problem. It turns out that the solution to the relaxation can be interpreted as a denoised version of the original dataset, sending many of the data points very close to what appear to be k-means-optimal centroids. This is similar in spirit to Dasgupta’s random projection, and we use a similar rounding scheme to estimate the optimal k-means-centroids. Using these centroid estimates as Gaussian center estimates, we are able to prove performance bounds of the form \mathrm{MSE}\lesssim k^2\sigma^2 when \mathrm{SNR}\gtrsim k^2, meaning the performance doesn’t depend on the dimension, but rather the model order.

But does it make sense that the performance should even depend on the model order? This blog entry shows that it does seem to make sense, provided you’re using k-means to estimate the Gaussian centers.

To do so, we need to introduce a couple of new concepts. First, a stable isogon is a finite subset \Gamma of \mathbb{R}^d such that

  • |\Gamma|>1,
  • the symmetry group G\leq O(d) acts transitively on \Gamma, and
  • for each \gamma\in\Gamma, the stabilizer G_\gamma has the property that

\big\{x\in\mathbb{R}^d:Qx=x~~\forall Q\in G_\gamma\big\}=\mathrm{span}\{\gamma\}

Examples include the Platonic solids:

For this blog entry, the most important example will be the extreme points of the 1-ball (i.e., the orthoplex).

For the second new concept, consider a balanced mixture \mathcal{D} of spherical Gaussians of equal variance centered at the points of a stable isogon. For each \gamma\in\Gamma, let V_\gamma denote the corresponding Voronoi region. Then the Voronoi means are given by

\displaystyle{\mu_\gamma:=\mathop{\mathbb{E}}_{X\sim\mathcal{D}}[X|X\in V_\gamma]}

As an example, take \Gamma to be the standard orthoplex in two dimensions (each point having unit distance from the origin). Varying \sigma\in\{1,2,4\} and plotting the corresponding Voronoi means with circles gives the following:

Apparently, as the variance grows, the Voronoi means are pushed away from the true Gaussian centers. Notice that when \sigma=4, the mixture’s density function looks a lot like a single Gaussian centered at the center. This suggests that there isn’t much signal in the density for determining the true Gaussian centers. Indeed, if you sample 100 points from each Gaussian and then run k-means on all 400 points, then the results will vary. I ran this experiment 30 times and plotted the results:


Here, the Voronoi means are plotted in red. If we ramp up the number of points per Gaussian from 100 to 1000, then the k-means output varies quite a bit less:


Taking this to the extreme, we replace 1000 with 10,000 to get


Apparently, the k-means-optimal centroids converge to the Voronoi means in this case. We conjecture that this behavior is exhibited for general stable isogons:

The Voronoi Means Conecture. Draw N points from a balanced mixture of spherical Gaussians of equal variance centered at points in a stable isogon. Then the k-means-optimal centroids converge in probability to the Voronoi means as N\rightarrow\infty.

See the paper for the technical version of this conjecture statement. With this conjecture in mind, we return to our original task: Show the necessity of k-dependence in the performance of k-means-optimal centroids as estimates for Gaussian centers. To this end, VMC establishes that the Voronoi means serve as a worthy proxy for the k-means-optimal centroids, and so it suffices to show how Voronoi means differ from the Gaussian centers. As we saw above, they differ more as \sigma grows, and so we should expect some \sigma-dependence in the MSE. In the above example, the Voronoi means were positive scalar multiples of the Gaussian centers, and this is actually a consequence of \Gamma being a stable isogon:

Theorem. For each stable isogon \Gamma, there exists \alpha>0 such that \mu_\gamma=\alpha\frac{\gamma}{\|\gamma\|} for every \gamma\in\Gamma.

In the special case where \Gamma is an orthoplex in the first k/2 dimensions of \mathbb{R}^d, then one may write out a formula for \alpha as a (nasty) integral. In analyzing this integral, one can show that if the points in the orthoplex have norm c, then \alpha is a monotonically increasing function of c, and when c=0, \alpha\gtrsim\sigma\sqrt{\log k}. (This last bit essentially comes from the infinity norm of a Gaussian vector with k iid components.) With the help of this comparison, one can prove the following:

Theorem. If \Gamma is a (k/2)-dimensional orthoplex, then for every \sigma>0, either

\displaystyle{\min_{\gamma\in\Gamma}\|\mu_\gamma-\gamma\|\gtrsim\sigma\sqrt{\log k}\quad\text{or}\quad\min_{\gamma\neq\gamma'}\|\gamma-\gamma'\|\gtrsim\sigma\sqrt{\log k}}.

Overall, VMC implies that either MSE or SNR exhibits k-dependence. Of course, the necessary dependence is logarithmic in k, whereas the current theory with the SDP exhibits dependence which is polynomial in k, so it remains to determine the exact nature of this dependence. Of course, this theory depends on VMC, so it would be nice to know if this conjecture is even true:

I am offering a US$100 award for either a proof of the Voronoi Means Conjecture or strong evidence against it.

Similar to my previous award announcement, I have the following disclaimer:

This award has no time limit other than my death, and is entirely at my discretion. I might, also at my discretion, decide to split the award among several people or groups, or give a smaller award for partial progress. I don’t promise to read every claimed proof that’s emailed to me. The prize amount will not be adjusted for inflation.

For reference, a large class of stable isogons were studied by Broome and Waldron in this paper (they focused on the case where the symmetry group is irreducible). What other stable isogons are there? Also, do other stable isogons yield a stronger k-dependence? When investigating other stable isogons, it would be straightforward to numerically check the validity of VMC, and perhaps this will lead to strong evidence against VMC (and a hundred bucks!).


2 thoughts on “The Voronoi Means Conjecture

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s