Gordon’s escape through a mesh theorem

In many applications (such as compressed sensing), we use a random projection to capture a subset of \mathbb{R}^n. Sometimes, you can show that a random projection performs well by appealing to the Johnson–Lindenstrauss lemma, which says that the pairwise distances of a finite collection of points are nearly preserved (for example, JL implies RIP). In other cases, we might call a projection “good” if its null space avoids a certain subset of \mathbb{R}^n (think the null space property), but how might we show that a random projection is good? By appealing to Gordon’s escape through a mesh theorem, which says that a random subspace avoids a subset (“escapes a mesh”) provided the subset is small in some sense. The purpose of this blog entry is to prove this theorem and provide some intuition.

Throughout, we take g_k to be a k-dimensional random vector with iid N(0,1) entries, and we denote a_k:=\mathbb{E}\|g_k\|_2, which satisfies k/\sqrt{k+1}<a_k<\sqrt{k}. Given a closed subset S\subseteq\mathbb{R}^n, its Gaussian width is defined as

\displaystyle{w(S):=\mathbb{E}\Big[\max_{x\in S}g_n^\top x\Big]}.
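Two quick sanity checks on these definitions (a sketch in Python, assuming numpy is available; the closed form a_k=\sqrt{2}\,\Gamma((k+1)/2)/\Gamma(k/2) is the standard chi-distribution mean, a fact not stated above):

```python
import math
import numpy as np

def a(k):
    # Closed form for a_k = E||g_k||_2, the mean of a chi distribution
    # with k degrees of freedom (a standard fact, assumed here).
    return math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)

# Check the stated bounds k/sqrt(k+1) < a_k < sqrt(k).
bounds_ok = all(k / math.sqrt(k + 1) < a(k) < math.sqrt(k)
                for k in range(1, 101))

# For S the unit sphere, max_{x in S} g^T x = ||g||_2, so w(S) = a_n;
# a Monte Carlo estimate of the width should land near a(n).
rng = np.random.default_rng(0)
n = 20
w_est = np.linalg.norm(rng.standard_normal((100_000, n)), axis=1).mean()
```

The unit sphere is the extreme case: any closed subset of it has Gaussian width at most a_n.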

As we will see, Gaussian width is an important measure of the size of a set. Intuitively, the Gaussian width of S is more or less the expected width of S when measured in a random direction (it’s actually half of the mean width, as defined here), and it plays a crucial role in several results, including the following:

Gordon’s Comparison Theorem (Corollary 1.2 in Gordon’s paper). Take a closed subset S of the unit sphere in \mathbb{R}^n, and draw a k\times n matrix G with iid N(0,1) entries. Then

\displaystyle{\mathbb{E}\Big[\min_{x\in S}\|Gx\|_2\Big]\geq a_k-w(S)},


\displaystyle{\mathbb{E}\Big[\max_{x\in S}\|Gx\|_2\Big]\leq a_k+w(S)}.

Note that for any fixed x, \|Gx\|_2 has the same distribution as \|g_k\|_2, and so Gordon’s comparison theorem says you can expect \min_{x\in S}\|Gx\|_2 and \max_{x\in S}\|Gx\|_2 to be no more than w(S) away from a_k=\mathbb{E}\|g_k\|_2=\mathbb{E}\|Gx\|_2. This sort of guarantee is much weaker than the kind you’d get by taking a union bound over an epsilon net and appealing to concentration of measure. Still, this result is very useful because it’s often strong enough, and for some sets, the Gaussian width is easier to control than the size of the requisite epsilon net. For example, this paper uses linear programming duality to estimate the Gaussian width of the set of vectors that the null space must avoid in the null space property, and Proposition 3.6 in this paper similarly bounds the Gaussian width of sets in terms of an expected distance from a dual cone.
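The comparison theorem can be sanity-checked by Monte Carlo on a toy set. Below, S is the hypothetical two-point set \{e_1,e_2\}, for which w(S)=\mathbb{E}\max(g_1,g_2)=1/\sqrt{\pi} (a standard fact about two iid standard normals); this is a numerical sketch assuming numpy, not part of the proof:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 30, 10, 20_000

# Toy set S = {e_1, e_2}: two points on the unit sphere,
# with Gaussian width w(S) = E max(g_1, g_2) = 1/sqrt(pi).
S = np.eye(n)[:2]                        # shape (2, n)
G = rng.standard_normal((trials, k, n))
norms = np.linalg.norm(G @ S.T, axis=1)  # (trials, 2): ||G e_1||, ||G e_2||

min_mean = norms.min(axis=1).mean()      # estimates E min_{x in S} ||Gx||
max_mean = norms.max(axis=1).mean()      # estimates E max_{x in S} ||Gx||

a_k = math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)
w_S = 1 / math.sqrt(math.pi)
```

With these numbers, the estimates sit strictly inside the interval [a_k - w(S), a_k + w(S)], as the theorem predicts.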

To prove Gordon’s escape through a mesh theorem, we will make use of Gordon’s comparison theorem as well as the following deviation inequality:

Proposition 1 (Corollary 2.3 in these notes). If f\colon\mathbb{R}^n\rightarrow\mathbb{R} is \sigma-Lipschitz, i.e., |f(x)-f(y)|\leq\sigma\|x-y\|_2 for every x,y\in\mathbb{R}^n, then

\mathrm{Pr}\Big(|f(g_n)-\mathbb{E}f(g_n)|>\lambda\Big)\leq\exp\big(-\lambda^2/(2\sigma^2)\big) \quad \forall \lambda>0.
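For instance, f(x)=\|x\|_2 is 1-Lipschitz, so Proposition 1 applies with \sigma=1; the empirical tails of \|g_n\|_2 should then sit below \exp(-\lambda^2/2). A quick numerical check (assuming numpy; the particular n and \lambda values are arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 25

# f(x) = ||x||_2 is 1-Lipschitz, so Proposition 1 applies with sigma = 1.
g = rng.standard_normal((200_000, n))
f_vals = np.linalg.norm(g, axis=1)
dev = np.abs(f_vals - f_vals.mean())

# Empirical tail probabilities should sit below exp(-lambda^2/2).
tails_ok = all(np.mean(dev > lam) <= math.exp(-lam ** 2 / 2)
               for lam in (0.5, 1.0, 2.0))
```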

The meat of the argument is in the proof of the following proposition, which says that a random subspace escapes an \epsilon-thickened mesh provided the original mesh is contained in the sphere and has a sufficiently small Gaussian width. The original (and more general) version of this result is Theorem 3.3 in Gordon’s paper, but I think the generality makes the proof needlessly difficult to parse.

Proposition 2. Take a closed subset S of the unit sphere in \mathbb{R}^n, and let B_\epsilon denote the closed ball centered at the origin of radius \epsilon>0. If w(S)<(1-\epsilon)a_k-\epsilon a_n, then an (n-k)-dimensional subspace Y drawn uniformly from the Grassmannian satisfies

\displaystyle{\mathrm{Pr}\Big(Y\cap(S+B_\epsilon)=\emptyset\Big)\geq1-\frac{7}{2}\exp\bigg(-\frac{1}{2}\bigg(\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{3+\epsilon+\epsilon a_n/a_k}\bigg)^2\bigg)}.

Proof: We will take Y to be the null space of a k\times n matrix G with iid N(0,1) entries; indeed, this random subspace is uniformly distributed on the Grassmannian. To estimate the desired probability

\displaystyle{P:=\mathrm{Pr}\Big(Y\cap(S+B_\epsilon)=\emptyset\Big)=\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|x-y\|_2>\epsilon\bigg),}

we first note that

\displaystyle{Q:=\mathrm{Pr}\Big(\min_{x\in S}\|Gx\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}

\displaystyle{=\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|G(x-y)\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\bigg)}

\displaystyle{\leq\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|G\|_2\|x-y\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\bigg)}

\displaystyle{\leq\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)+\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|x-y\|_2>\epsilon\bigg)},

where the last step follows from a union bound (to see this, consider the contrapositive). Note that the second term above is precisely P. We will use Proposition 1 to bound the first term:

Given k\times n matrices T and T', note that

\displaystyle{\|T\|_2\leq\|T'\|_2+\|T-T'\|_2\leq\|T'\|_2+\|T-T'\|_F,}

and similarly \|T'\|_2\leq\|T\|_2+\|T'-T\|_F. Putting these together then gives

\displaystyle{\Big|\|T\|_2-\|T'\|_2\Big|\leq\|T-T'\|_F,}

i.e., the mapping T\mapsto\|T\|_2 is 1-Lipschitz with respect to the Frobenius norm. As such, by Proposition 1, a k\times n matrix G with iid N(0,1) entries satisfies

\displaystyle{\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)\leq\exp\Big(-\delta^2(\mathbb{E}\|G\|_2)^2/2\Big)}

for any \delta>0. Next, monotonicity of expectation gives that

\displaystyle{\mathbb{E}\|G\|_2\geq\mathbb{E}\|Ge_1\|_2=a_k,}

where e_1 is the first identity basis element. As such, we have

\displaystyle{\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)\leq\exp(-\delta^2a_k^2/2).}

Overall, we have Q\leq\exp(-\delta^2a_k^2/2)+P, and rearranging gives a lower bound:

P\geq Q-\exp(-\delta^2a_k^2/2).\qquad(1)

As such, it remains to find a lower bound on Q. [At this point, you might be inclined to hunt for an epsilon net of S and do a union bound, but this is not what we do here. (!)] Taking Z to be N(0,1) and independent of G, a union bound gives

\displaystyle{\mathrm{Pr}\Big(\|Gx\|_2+Z\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)}

\displaystyle{\leq Q+\mathrm{Pr}(Z\geq \delta a_k)\leq Q+\frac{1}{2}\exp(-\delta^2a_k^2/2)},

where the last step follows from the Chernoff bound. Furthermore (and this is the most interesting part of the proof), Lemma 3.1 in Gordon’s paper gives the following lower bound on the left-hand side above:

\displaystyle{R:=\mathrm{Pr}\Big(\|g_k\|_2+g_n^\top x\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)}

\displaystyle{\leq\mathrm{Pr}\Big(\|Gx\|_2+Z\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)},

where g_k and g_n are independent. Note that \|g_k\|_2 and \|Gx\|_2 have the same distribution for each x\in S, as do g_n^\top x and Z, but it is not at all obvious how the above probabilities should be related (due to the dependence on x). It turns out that the second mixed moments of y^\top Gx+Z for x\in S and y\in\mathbb{S}^{k-1} can be bounded in terms of the corresponding moments of y^\top g_k+g_n^\top x, which in turn leads to the probability estimate given above (thanks to Theorem B in Gordon’s paper). [This actually bears some resemblance to the proof of Gordon’s comparison theorem, which similarly bounds second mixed moments and appeals to Theorem A.]

At this point, we have

\displaystyle{Q\geq R-\frac{1}{2}\exp(-\delta^2a_k^2/2)},\qquad(2)

so it remains to find a lower bound on R. To this end, a union bound gives

\displaystyle{1-R=\mathrm{Pr}\Big(\exists x\in S \mbox{ s.t. }\|g_k\|_2+g_n^\top x<\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\Big)}

\displaystyle{\leq\mathrm{Pr}\Big(\|g_k\|_2<(1-\delta)a_k\Big)+\mathrm{Pr}\Big(\exists x\in S \mbox{ s.t. }g_n^\top x\leq\epsilon(1+\delta)\mathbb{E}\|G\|_2-(1-2\delta)a_k\Big)}.

Since x\mapsto\|x\|_2 is 1-Lipschitz in the 2-norm, Proposition 1 can be used to bound the first term above. Also, the second term can be reexpressed by noting that g_n has the same distribution as -g_n. Indeed,

\displaystyle{1-R\leq\exp(-\delta^2a_k^2/2)+\mathrm{Pr}\Big(\max_{x\in S}g_n^\top x\geq(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}.

To bound the second term above, we first claim that f\colon g\mapsto \max_{x\in S}g^\top x is 1-Lipschitz. To see this, given a,b\in\mathbb{R}^n, suppose without loss of generality that f(a)\geq f(b), and that a^\top x is maximized at x=x_0. Then

|f(a)-f(b)|=f(a)-f(b)\leq a^\top x_0-b^\top x_0=(a-b)^\top x_0\leq\|a-b\|_2.
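This Lipschitz bound is easy to spot-check numerically; below, S is a hypothetical cloud of 50 random points on the unit sphere (any closed subset of the sphere would do), and the check assumes numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 15

# Hypothetical S: 50 random points on the unit sphere.
S = rng.standard_normal((50, n))
S /= np.linalg.norm(S, axis=1, keepdims=True)

def f(g):
    # f(g) = max_{x in S} g^T x
    return (S @ g).max()

# Check |f(a) - f(b)| <= ||a - b||_2 on random pairs (a, b).
pairs = rng.standard_normal((100, 2, n))
lipschitz_ok = all(abs(f(a) - f(b)) <= np.linalg.norm(a - b) + 1e-12
                   for a, b in pairs)
```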

Thus, Proposition 1 gives

\displaystyle{\mathrm{Pr}\Big(\max_{x\in S}g_n^\top x\geq(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}

\displaystyle{\leq\exp\bigg(-\frac{1}{2}\Big((1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)\Big)^2\bigg),}

provided (1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)>0. To satisfy this positivity condition, first note that the Gordon comparison theorem (with S taken to be the unit sphere, so that \max_{x\in S}\|Gx\|_2=\|G\|_2 and w(S)=a_n) gives \mathbb{E}\|G\|_2\leq a_k+a_n. As such,

\displaystyle{(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)\geq(1-2\delta)a_k-\epsilon(1+\delta)(a_k+a_n)-w(S)}

\displaystyle{\geq\Big((1-\epsilon)a_k-\epsilon a_n-w(S)\Big)-\delta\Big((2+\epsilon)a_k+\epsilon a_n\Big)=:\sigma,}

and since the first term of \sigma is strictly positive by assumption, it suffices to pick \delta such that

\displaystyle{\delta<\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(2+\epsilon)a_k+\epsilon a_n}}.

Overall, we have

\displaystyle{R\geq1-\exp(-\delta^2a_k^2/2)-\exp(-\sigma^2/2).\qquad(3)}

Combining (1), (2) and (3) then gives

\displaystyle{P\geq 1-\frac{5}{2}\exp(-\delta^2a_k^2/2)-\exp(-\sigma^2/2)}.

We now pick \delta so as to balance and combine the terms on the right-hand side. Specifically, we take \delta such that \delta a_k=\sigma, which, by the definition of \sigma, explicitly means

\displaystyle{\delta:=\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(3+\epsilon)a_k+\epsilon a_n}<\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(2+\epsilon)a_k+\epsilon a_n}}.
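As a sanity check on the algebra, one can verify numerically that this choice of \delta indeed satisfies \delta a_k=\sigma; the parameter values below are hypothetical, chosen only so that the positivity condition holds:

```python
import math

def a(k):
    # Closed form for a_k = E||g_k||_2 (chi-distribution mean).
    return math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)

# Hypothetical parameters with (1 - eps)*a_k - eps*a_n - w > 0.
k, n, w, eps = 50, 200, 3.0, 0.05
ak, an = a(k), a(n)

delta = ((1 - eps) * ak - eps * an - w) / ((3 + eps) * ak + eps * an)
sigma = ((1 - eps) * ak - eps * an - w) - delta * ((2 + eps) * ak + eps * an)
```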

Then P\geq 1-\frac{7}{2}\exp(-\delta^2a_k^2/2), completing the proof.     \Box

Now we essentially take the limit as \epsilon\rightarrow0 to get the main theorem:

Gordon’s Escape Through a Mesh Theorem (Corollary 3.4 in Gordon’s paper). Take a closed subset S of the unit sphere in \mathbb{R}^n. If w(S)<a_k, then an (n-k)-dimensional subspace Y drawn uniformly from the Grassmannian satisfies

\displaystyle{\mathrm{Pr}\Big(Y\cap S=\emptyset\Big)\geq1-\frac{7}{2}\exp\bigg(-\frac{1}{18}\Big(a_k-w(S)\Big)^2\bigg)}.

Proof: First note that for every x\in S, the closest y\in Y has norm \leq1 since it’s an orthogonal projection of x. As such, Y\cap S=\emptyset is equivalent to having \|x-y\|_2>0 for every x\in S and y\in Y\cap B_1, which in turn implies that Y\cap (S+B_\epsilon)=\emptyset when

\displaystyle{\epsilon=\frac{1}{2}\min_{\substack{x\in S\\y\in Y\cap B_1}}\|x-y\|_2}.

Indeed, the minimum is achieved (and thus \epsilon>0) since S\times(Y\cap B_1) is compact. As such, we have

\displaystyle{\mathrm{Pr}\Big(Y\cap S=\emptyset\Big)=\mathrm{Pr}\Big(\exists \epsilon>0 \mbox{ s.t. }Y\cap (S+B_\epsilon)=\emptyset\Big)}

\displaystyle{\geq\sup_{\epsilon>0}\mathrm{Pr}\Big(Y\cap (S+B_\epsilon)=\emptyset\Big)=\lim_{\epsilon\rightarrow 0^+}\mathrm{Pr}\Big(Y\cap (S+B_\epsilon)=\emptyset\Big)},

where the last step follows from the fact that Y\cap(S+B_\epsilon)=\emptyset implies Y\cap(S+B_{\epsilon'})=\emptyset for every \epsilon'<\epsilon. Since w(S)<a_k by assumption, then we also have w(S)<(1-\epsilon)a_k-\epsilon a_n whenever

\displaystyle{\epsilon<\frac{a_k-w(S)}{a_k+a_n}.}

At this point, we appeal to Proposition 2 to bound \mathrm{Pr}(Y\cap (S+B_\epsilon)=\emptyset) for such \epsilon and take the limit to get the claimed bound.     \Box
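To see the bound in action, here is a quick evaluation of the guaranteed escape probability; the values of k and w(S) are purely illustrative:

```python
import math

def a(k):
    # Closed form for a_k = E||g_k||_2 (chi-distribution mean).
    return math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)

def escape_bound(k, w):
    # Lower bound on Pr(Y intersect S is empty) from the theorem;
    # meaningful only when w(S) < a_k.
    assert w < a(k)
    return 1 - 3.5 * math.exp(-(a(k) - w) ** 2 / 18)

# With k = 100 (so a_k is about 10) and a set of width w(S) = 2,
# the theorem already guarantees escape with high probability.
p = escape_bound(100, 2.0)
```

Note how the guarantee kicks in only once a_k comfortably exceeds w(S): the 7/2 prefactor makes the bound vacuous unless a_k - w(S) is a moderately large constant.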
