Sudakov minoration: From Gaussian widths to epsilon nets

In a previous post, I went through Gordon’s escape through a mesh theorem. This theorem is unique because it leverages certain properties of Gaussian processes (and a quantification of size called Gaussian width) instead of passing to an epsilon net. The escape theorem is particularly important to areas like compressed sensing, and provides optimal guarantees in light of phase transitions. However, real-world implementations of compressed sensing tend to not apply Gaussian transforms, but rather random matrices with other distributions (such as Bernoulli or partial Fourier). As such, it is desirable to relax the hypothesis on the transform for the escape theorem to hold.

It is evident that Gaussian width is the correct measure of a set for phase transitions in the Gaussian setting, and we expect similar phase transitions in other settings. As detailed in this paper, an important measure of a set for the other settings is the size of an epsilon net. This leads to the following fundamental question:

What does the Gaussian width of a set say about the smallest possible size of an epsilon net for that set?

It is unclear to me how much is known about this, but I do know that something called Sudakov minoration provides a bound. In particular, Sudakov minoration guarantees the existence of a maximal \epsilon-packing of a certain size, and such a packing is necessarily a (2\epsilon)-covering. (why?) For the record, this blog post is based in part on these lecture notes. The following lemma is the main ingredient to Sudakov minoration:

Slepian’s Lemma. Let \{X_i\}_{i=1}^n and \{Y_i\}_{i=1}^n be Gaussian random variables (not necessarily iid) with mean zero. If


for every i and j, then \mathbb{E}\max_iX_i\geq\mathbb{E}\max_iY_i.

Sudakov Minoration Principle. Let \mathcal{P}_\epsilon denote a maximal \epsilon-packing of a compact set S\subseteq\mathbb{R}^N. Then

\log\displaystyle{|\mathcal{P}_\epsilon|\leq C\Big(\frac{w(S)}{\epsilon}\Big)^2},

where w denotes Gaussian width and C is a universal constant.

Proof: For every distinct x_i,x_j\in\mathcal{P}_\epsilon, we have

\mathbb{E}\Big(g_N^\top x_i-g_N^\top x_j\Big)^2=\|x_i-x_j\|^2\geq\epsilon^2=\mathbb{E}(Y_i-Y_j)^2,

where \{Y_i\}_{i=1}^{|\mathcal{P}_\epsilon|} are iid N(0,\epsilon^2/2). Then Slepian’s Lemma gives

\displaystyle{w(S)=\mathbb{E}\Big[\max_{x\in S}g_N^\top x\Big]\geq\mathbb{E}\Big[\max_{x\in \mathcal{P}_\epsilon}g_N^\top x\Big]\geq\mathbb{E}\Big[\max_{i=1,\ldots,|\mathcal{P}_\epsilon|}Y_i\Big]=\frac{\epsilon}{\sqrt{2}}\mathbb{E}\Big[\max_{i=1,\ldots,|\mathcal{P}_\epsilon|}Z_i\Big]},

where \{Z_i\}_{i=1}^{|\mathcal{P}_\epsilon|} are iid N(0,1). It remains to show that the expected value of the maximum of n iid standard Gaussians is like \sqrt{\log n}, which is verified in a lemma we save for the end of this post.     \Box

It should be pointed out that this lemma leaves a bit to be desired. For example, take a maximal \epsilon-packing P_\epsilon in an (M-1)-dimensional unit sphere, and consider the (disjoint) union of open balls of radius \epsilon/2 centered at each point in the packing. The total volume of these balls is given by |P_\epsilon|\cdot C\epsilon^M, for some constant C depending on M, whereas the ball of radius 1+\epsilon/2 which contains the entire union has volume C(1+\epsilon/2)^M. As such, we can compare volumes and isolate |P_\epsilon| to get the following bound:


and so \log|P_\epsilon|=O(M\log(1/\epsilon)). For comparison, the Gaussian width of this sphere is like \sqrt{M}, and so the bound that Sudakov minoration provides is particularly weak when \epsilon is small: \log|P_\epsilon|\leq O(M/\epsilon^2).

Here’s the technical lemma we used in the proof of Sudakov minoration:

Lemma. If \{Z_i\}_{i=1}^{n} are iid N(0,1), then \displaystyle{\mathbb{E}\Big[\max_{i=1,\ldots,n}Z_i\Big]=\Theta(\sqrt{\log n})}.

Proof: The upper bound is not so important in the proof of Sudakov minoration, but the argument is given here. The idea is to make a Chernoff-type argument in terms of the moment generating function of a Gaussian. For the lower bound, let Z' denote the random maximum of interest, take z such that \mathrm{Pr}(Z_i\geq z)=1/n, and condition on the event that every Z_i<z (this occurs with probability (1-1/n)^n):

\displaystyle{\mathbb{E}Z'=\mathbb{E}\big[Z'\big|Z_i<z~\forall i\big]\Big(1-\frac{1}{n}\Big)^n+\mathbb{E}\big[Z'\big|~\exists i\mbox{ s.t. }Z_i\geq z\big]\bigg(1-\Big(1-\frac{1}{n}\Big)^n\bigg)}

\displaystyle{\geq\mathbb{E}\big[Z'\big|Z_i<0~\forall i\big]+z\bigg(1-\Big(1-\frac{1}{n}\Big)^n\bigg)}.

Next, since Z'\geq\frac{1}{n}\sum_{i=1}^nZ_i, we can continue the bound:


At this point, we seek a lower bound on z. This can be achieved by rearranging

\displaystyle{\frac{1}{n}=\mathrm{Pr}(Z_i\geq z)\geq\frac{1}{3}e^{-z^2}},

i.e., z\geq\sqrt{\log(n/3)}. Substituting this into the previous bound then gives \mathbb{E}Z'=\Omega(\sqrt{\log n}), as desired.     \Box


2 thoughts on “Sudakov minoration: From Gaussian widths to epsilon nets

  1. Hello Dustin,

    Thank you for this interesting post.
    I guess there is a missing log() on the net size in the Sudakov minoration above.
    Should be

    Log |P_epsilon| <= C (w(S))^2/epsilon^2


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s