# Gordon’s escape through a mesh theorem

In many applications (such as compressed sensing), we use a random projection to capture a subset of $\mathbb{R}^n$. Sometimes, you can show that a random projection performs well by appealing to the Johnson–Lindenstrauss lemma, which says that the pairwise distances of a finite collection of points are nearly preserved (for example, JL implies RIP). In other cases, we might call a projection “good” if its null space avoids a certain subset of $\mathbb{R}^n$ (think the null space property), but how might we show that a random projection is good? By appealing to Gordon’s escape through a mesh theorem, which says that a random subspace avoids a subset (“escapes a mesh”) provided the subset is small in some sense. The purpose of this blog entry is to prove this theorem and provide some intuition.

Throughout, we take $g_k$ to be a $k$-dimensional vector with iid $N(0,1)$ entries, and we denote $a_k:=\mathbb{E}\|g_k\|_2$, which satisfies $k/\sqrt{k+1}\leq a_k\leq\sqrt{k}$. Given a closed subset $S\subseteq\mathbb{R}^n$, its Gaussian width is defined as

$\displaystyle{w(S):=\mathbb{E}\Big[\max_{x\in S}g_n^\top x\Big]}$.

As we will see, Gaussian width is an important measure of the size of a set. Intuitively, the Gaussian width of $S$ is more or less the expected width of $S$ when measured in a random direction (it’s actually half of the mean width, as defined here), and it plays a crucial role in several results, including the following:
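As a quick sanity check (my own illustration, not part of the original development), both quantities are easy to compute: $a_k$ has the closed form $\sqrt{2}\,\Gamma(\frac{k+1}{2})/\Gamma(\frac{k}{2})$, and for a finite set, $w(S)$ can be estimated by Monte Carlo. The function names below are mine:

```python
import numpy as np
from math import exp, lgamma, pi, sqrt

def a(k):
    """Exact a_k = E||g_k||_2 = sqrt(2) * Gamma((k+1)/2) / Gamma(k/2)."""
    return sqrt(2) * exp(lgamma((k + 1) / 2) - lgamma(k / 2))

def gaussian_width(points, trials=100_000, seed=0):
    """Monte Carlo estimate of w(S) for a finite set S given as rows of `points`."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, points.shape[1]))  # draws of g_n
    return (g @ points.T).max(axis=1).mean()            # E max_{x in S} g^T x

# the bounds k/sqrt(k+1) <= a_k <= sqrt(k) from above
for k in range(1, 30):
    assert k / sqrt(k + 1) <= a(k) <= sqrt(k)

# S = {e_1, -e_1} in R^3 has w(S) = E|g_1| = sqrt(2/pi), about 0.80
S = np.array([[1.0, 0, 0], [-1.0, 0, 0]])
print(gaussian_width(S))  # close to sqrt(2/pi)
```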

Gordon’s Comparison Theorem (Corollary 1.2 in Gordon’s paper). Take a closed subset $S$ of the unit sphere in $\mathbb{R}^n$, and draw a $k\times n$ matrix $G$ with iid $N(0,1)$ entries. Then

$\displaystyle{\mathbb{E}\Big[\min_{x\in S}\|Gx\|_2\Big]\geq a_k-w(S)}$,

and

$\displaystyle{\mathbb{E}\Big[\max_{x\in S}\|Gx\|_2\Big]\leq a_k+w(S)}$.

Note that for any fixed $x$, $\|Gx\|_2$ has the same distribution as $\|g_k\|_2$, and so Gordon’s comparison theorem says you can expect $\min_{x\in S}\|Gx\|_2$ and $\max_{x\in S}\|Gx\|_2$ to be no more than $w(S)$ away from $a_k=\mathbb{E}\|g_k\|_2=\mathbb{E}\|Gx\|_2$. This sort of guarantee is much weaker than the kind you’d get by taking a union bound over an epsilon net and appealing to concentration of measure. Still, this result is very useful because it’s often strong enough, and for some sets, the Gaussian width is easier to control than the size of the requisite epsilon net. For example, this paper uses linear programming duality to estimate the Gaussian width of the set of vectors that must be avoided for the null space property, and Proposition 3.6 in this paper similarly bounds the Gaussian width of sets in terms of an expected distance from a dual cone.
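To illustrate (again, my own sketch with assumed parameter choices), take $S=\{e_1,e_2\}$, for which $w(S)=\mathbb{E}\max(g_1,g_2)=1/\sqrt{\pi}$; empirically, the means of $\min_{x\in S}\|Gx\|_2$ and $\max_{x\in S}\|Gx\|_2$ land inside the promised interval:

```python
import numpy as np
from math import exp, lgamma, pi, sqrt

k, n, trials = 5, 10, 20_000
a_k = sqrt(2) * exp(lgamma((k + 1) / 2) - lgamma(k / 2))  # E||g_k||_2
w = 1 / sqrt(pi)  # w({e_1, e_2}) = E max(g_1, g_2)

rng = np.random.default_rng(0)
G = rng.standard_normal((trials, k, n))       # iid draws of a k x n Gaussian matrix
norms = np.linalg.norm(G[:, :, :2], axis=1)   # ||G e_1||, ||G e_2|| per draw
mean_min = norms.min(axis=1).mean()           # estimates E min_{x in S} ||Gx||
mean_max = norms.max(axis=1).mean()           # estimates E max_{x in S} ||Gx||

assert a_k - w <= mean_min  # lower bound from the comparison theorem
assert mean_max <= a_k + w  # upper bound from the comparison theorem
```

In this example both inequalities hold with noticeable slack, consistent with the comment above that the comparison theorem trades sharpness for broad applicability.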

To prove Gordon’s escape through a mesh theorem, we will make use of Gordon’s comparison theorem as well as the following deviation inequality:

Proposition 1 (Corollary 2.3 in these notes). If $f\colon\mathbb{R}^n\rightarrow\mathbb{R}$ is $\sigma$-Lipschitz, i.e., $|f(x)-f(y)|\leq\sigma\|x-y\|_2$ for every $x,y\in\mathbb{R}^n$, then

$\mathrm{Pr}\Big(|f(g_n)-\mathbb{E}f(g_n)|>\lambda\Big)\leq\exp(-\lambda^2/2\sigma^2) \quad \forall \lambda>0$.
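For example, $x\mapsto\|x\|_2$ is $1$-Lipschitz, so $\|g_n\|_2$ concentrates around its mean $a_n$. Here is a quick empirical check of the deviation inequality (the parameter choices are mine):

```python
import numpy as np
from math import exp

n, lam, trials = 10, 1.0, 100_000
rng = np.random.default_rng(0)
norms = np.linalg.norm(rng.standard_normal((trials, n)), axis=1)
a_n = norms.mean()  # Monte Carlo estimate of E||g_n||_2

# empirical tail vs. the stated bound (sigma = 1 for the 2-norm)
tail = np.mean(np.abs(norms - a_n) > lam)
assert tail <= exp(-lam**2 / 2)
```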

The meat of the argument is in the proof of the following proposition, which says that a random subspace escapes an $\epsilon$-thickened mesh provided the original mesh is contained in the sphere and has a sufficiently small Gaussian width. The original (and more general) version of this result is Theorem 3.3 in Gordon’s paper, but I think the generality makes the proof needlessly difficult to parse.

Proposition 2. Take a closed subset $S$ of the unit sphere in $\mathbb{R}^n$, and let $B_\epsilon$ denote the closed ball centered at the origin of radius $\epsilon>0$. If $w(S)<(1-\epsilon)a_k-\epsilon a_n$, then an $(n-k)$-dimensional subspace $Y$ drawn uniformly from the Grassmannian satisfies

$\displaystyle{\mathrm{Pr}\Big(Y\cap(S+B_\epsilon)=\emptyset\Big)\geq1-\frac{7}{2}\exp\bigg(-\frac{1}{2}\bigg(\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{3+\epsilon+\epsilon a_n/a_k}\bigg)^2\bigg)}$.

Proof: We will take $Y$ to be the null space of a $k\times n$ matrix $G$ with iid $N(0,1)$ entries; indeed, this random subspace is uniformly distributed on the Grassmannian. To estimate the desired probability

$\displaystyle{P:=\mathrm{Pr}\Big(\mathrm{null}(G)\cap(S+B_\epsilon)=\emptyset\Big)}$,

we first note that

$\displaystyle{Q:=\mathrm{Pr}\Big(\min_{x\in S}\|Gx\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}$

$\displaystyle{=\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|G(x-y)\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\bigg)}$

$\displaystyle{\leq\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|G\|_2\|x-y\|_2\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2\bigg)}$

$\displaystyle{\leq\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)+\mathrm{Pr}\bigg(\min_{\substack{x\in S\\y\in\mathrm{null}(G)}}\|x-y\|_2>\epsilon\bigg)}$,

where the last step follows from a union bound (to see this, consider the contrapositive). Note that the second term above is precisely $P$. We will use Proposition 1 to bound the first term:

Given $k\times n$ matrices $T$ and $T'$, note that

$\|T\|_2=\|T'+T-T'\|_2\leq\|T'\|_2+\|T-T'\|_2\leq\|T'\|_2+\|T-T'\|_F$,

and similarly $\|T'\|_2\leq\|T\|_2+\|T'-T\|_F$. Putting these together then gives

$\big|\|T\|_2-\|T'\|_2\big|\leq\|T-T'\|_F$,

i.e., the mapping $T\mapsto\|T\|_2$ is $1$-Lipschitz with respect to the Frobenius norm. As such, by Proposition 1, a $k\times n$ matrix $G$ with iid $N(0,1)$ entries satisfies

$\displaystyle{\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)\leq\exp\bigg(-\frac{1}{2}\Big(\delta\mathbb{E}\|G\|_2\Big)^2\bigg)}$

for any $\delta>0$. Next, monotonicity of expectation gives

$\mathbb{E}\|G\|_2\geq\mathbb{E}\|Ge_1\|_2=\mathbb{E}\|g_k\|_2=a_k$,

where $e_1$ is the first identity basis element. As such, we have

$\displaystyle{\mathrm{Pr}\Big(\|G\|_2\geq(1+\delta)\mathbb{E}\|G\|_2\Big)\leq\exp(-\delta^2a_k^2/2)}$.
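In practice this tail bound is quite generous. The sketch below (parameters mine) estimates $\mathbb{E}\|G\|_2$ empirically, checks the tail bound, and also checks the sandwich $a_k\leq\mathbb{E}\|G\|_2\leq a_k+a_n$ (the upper bound comes from Gordon’s comparison theorem and reappears later in the proof):

```python
import numpy as np
from math import exp, lgamma, sqrt

def a(k):  # a_k = E||g_k||_2 in closed form
    return sqrt(2) * exp(lgamma((k + 1) / 2) - lgamma(k / 2))

k, n, delta, trials = 5, 10, 0.5, 5_000
rng = np.random.default_rng(0)
ops = np.array([np.linalg.norm(rng.standard_normal((k, n)), ord=2)
                for _ in range(trials)])  # spectral norms of iid Gaussian matrices

E_op = ops.mean()                     # estimates E||G||_2
assert a(k) <= E_op <= a(k) + a(n)    # sandwich on E||G||_2

tail = np.mean(ops >= (1 + delta) * E_op)
assert tail <= exp(-delta**2 * a(k)**2 / 2)
```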

Overall, we have $Q\leq\exp(-\delta^2a_k^2/2)+P$, and rearranging gives a lower bound:

$P\geq Q-\exp(-\delta^2a_k^2/2).\qquad(1)$

As such, it remains to find a lower bound on $Q$. [At this point, you might be inclined to hunt for an epsilon net of $S$ and do a union bound, but this is not what we do here. (!)] Taking $Z$ to be $N(0,1)$ and independent of $G$, then a union bound gives

$\displaystyle{\mathrm{Pr}\Big(\|Gx\|_2+Z\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)}$

$\displaystyle{\leq Q+\mathrm{Pr}(Z\geq \delta a_k)\leq Q+\frac{1}{2}\exp(-\delta^2a_k^2/2)}$,

where the last step follows from the Chernoff bound. Furthermore (and this is the most interesting part of the proof), Lemma 3.1 in Gordon’s paper gives the following lower bound on the left-hand side above:

$\displaystyle{R:=\mathrm{Pr}\Big(\|g_k\|_2+g_n^\top x\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)}$

$\displaystyle{\leq\mathrm{Pr}\Big(\|Gx\|_2+Z\geq\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\quad \forall x\in S\Big)}$,

where $g_k$ and $g_n$ are independent. Note that $\|g_k\|_2$ and $\|Gx\|_2$ have the same distribution for each $x\in S$, as do $g_n^\top x$ and $Z$, but it is not at all obvious how the above probabilities should be related (due to the dependence on $x$). It turns out that the second mixed moments of $y^\top Gx+Z$ for $x\in S$ and $y\in\mathbb{S}^{k-1}$ can be bounded in terms of the corresponding moments of $y^\top g_k+g_n^\top x$, which in turn leads to the probability estimate given above (thanks to Theorem B in Gordon’s paper). [This actually bears some resemblance to the proof of Gordon’s comparison theorem, which similarly bounds second mixed moments and appeals to Theorem A.]

At this point, we have

$\displaystyle{Q\geq R-\frac{1}{2}\exp(-\delta^2a_k^2/2)},\qquad(2)$

so it remains to find a lower bound on $R$. To this end, a union bound gives

$\displaystyle{1-R=\mathrm{Pr}\Big(\exists x\in S \mbox{ s.t. }\|g_k\|_2+g_n^\top x<\epsilon(1+\delta)\mathbb{E}\|G\|_2 +\delta a_k\Big)}$

$\displaystyle{\leq\mathrm{Pr}\Big(\|g_k\|_2<(1-\delta)a_k\Big)+\mathrm{Pr}\Big(\exists x\in S \mbox{ s.t. }g_n^\top x\leq\epsilon(1+\delta)\mathbb{E}\|G\|_2-(1-2\delta)a_k\Big)}$.

Since $x\mapsto\|x\|_2$ is $1$-Lipschitz in the $2$-norm, Proposition 1 can be used to bound the first term above. Also, the second term can be reexpressed by noting that $g_n$ has the same distribution as $-g_n$. Indeed,

$\displaystyle{1-R\leq\exp(-\delta^2a_k^2/2)+\mathrm{Pr}\Big(\max_{x\in S}g_n^\top x\geq(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}$.

To bound the second term above, we first claim that $f\colon g\mapsto \max_{x\in S}g^\top x$ is $1$-Lipschitz. To see this, given $a,b\in\mathbb{R}^n$, suppose without loss of generality that $f(a)\geq f(b)$, and let $x_0\in S$ maximize $x\mapsto a^\top x$ (the maximum is attained since $S$ is compact). Then

$|f(a)-f(b)|=f(a)-f(b)\leq a^\top x_0-b^\top x_0=(a-b)^\top x_0\leq\|a-b\|_2$.
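This Lipschitz claim is easy to spot-check numerically (illustration mine; the inequality holds for any $S$ contained in the unit ball):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((50, 8))
S /= np.linalg.norm(S, axis=1, keepdims=True)  # 50 points on the unit sphere in R^8

f = lambda g: (S @ g).max()  # f(g) = max_{x in S} g^T x

# |f(a) - f(b)| <= ||a - b|| for random pairs of inputs
for _ in range(1000):
    u, v = rng.standard_normal(8), rng.standard_normal(8)
    assert abs(f(u) - f(v)) <= np.linalg.norm(u - v) + 1e-12
```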

Thus, Proposition 1 gives

$\displaystyle{\mathrm{Pr}\Big(\max_{x\in S}g_n^\top x\geq(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2\Big)}$

$\displaystyle{\leq\exp\bigg(-\frac{1}{2}\Big((1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)\Big)^2\bigg)}$,

provided $(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)>0$. To satisfy this positivity condition, first note that Gordon’s comparison theorem (with $S$ taken to be the entire unit sphere, whose Gaussian width is $a_n$) gives $\mathbb{E}\|G\|_2\leq a_k+a_n$. As such,

$(1-2\delta)a_k-\epsilon(1+\delta)\mathbb{E}\|G\|_2-w(S)$

$\geq\Big((1-\epsilon)a_k-\epsilon a_n-w(S)\Big)-\delta\Big((2+\epsilon)a_k+\epsilon a_n\Big)=:\sigma$,

and since the first term of $\sigma$ is strictly positive by assumption, it suffices to pick $\delta$ such that

$\displaystyle{\delta<\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(2+\epsilon)a_k+\epsilon a_n}}$.

Overall, we have

$\displaystyle{1-R\leq\exp(-\delta^2a_k^2/2)+\exp(-\sigma^2/2)}.\qquad(3)$

Combining (1), (2) and (3) then gives

$\displaystyle{P\geq 1-\frac{5}{2}\exp(-\delta^2a_k^2/2)-\exp(-\sigma^2/2)}$.

We now pick $\delta$ so as to balance and combine the terms on the right-hand side. Specifically, we take $\delta$ such that $\delta a_k=\sigma$, which by the definition of $\sigma$, explicitly means

$\displaystyle{\delta:=\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(3+\epsilon)a_k+\epsilon a_n}<\frac{(1-\epsilon)a_k-\epsilon a_n-w(S)}{(2+\epsilon)a_k+\epsilon a_n}}$.
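The algebra behind this choice is easy to verify on arbitrary numbers (the values below are mine, picked so that the positivity assumption holds):

```python
# check that delta * a_k = sigma for this choice of delta
a_k, a_n, eps, w = 2.0, 3.0, 0.1, 0.5    # arbitrary; (1-eps)*a_k - eps*a_n - w > 0
A = (1 - eps) * a_k - eps * a_n - w      # first term of sigma
B = (2 + eps) * a_k + eps * a_n          # coefficient of delta in sigma
delta = A / ((3 + eps) * a_k + eps * a_n)
sigma = A - delta * B

assert A > 0                 # positivity assumption
assert abs(delta * a_k - sigma) < 1e-12
assert delta < A / B         # the required upper bound on delta
```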

Then $P\geq 1-\frac{7}{2}\exp(-\delta^2a_k^2/2)$, which completes the proof.     $\Box$

Now we essentially take the limit as $\epsilon\rightarrow0$ to get the main theorem:

Gordon’s Escape Through a Mesh Theorem (Corollary 3.4 in Gordon’s paper). Take a closed subset $S$ of the unit sphere in $\mathbb{R}^n$. If $w(S)<a_k$, then an $(n-k)$-dimensional subspace $Y$ drawn uniformly from the Grassmannian satisfies

$\displaystyle{\mathrm{Pr}\Big(Y\cap S=\emptyset\Big)\geq1-\frac{7}{2}\exp\bigg(-\frac{1}{18}\Big(a_k-w(S)\Big)^2\bigg)}$.
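To see the theorem in action, here is a small simulation (all parameter choices are mine). Take $S=\{\pm e_i\}_{i=1}^n$, the unit-norm $1$-sparse vectors in $\mathbb{R}^n$; then $w(S)=\mathbb{E}\max_i|g_i|\leq\sqrt{2\ln(2n)}\approx 2.72$, which is less than $a_{10}\approx 3.08$ for $n=20$, $k=10$, so the theorem predicts escape (the explicit probability bound is vacuous at this small size, but escape is exactly what we observe):

```python
import numpy as np

n, k, trials = 20, 10, 200
rng = np.random.default_rng(0)

min_dists = []
for _ in range(trials):
    G = rng.standard_normal((k, n))
    # Y = null(G) is a uniformly random (n-k)-dimensional subspace, and
    # dist(x, Y) = ||P x|| where P projects onto the row space of G.
    Q, _ = np.linalg.qr(G.T)      # columns of Q: orthonormal basis of row(G)
    P = Q @ Q.T
    # for x = +/- e_i, dist(x, Y) is the norm of the i-th column of P
    min_dists.append(np.linalg.norm(P, axis=0).min())

print(min(min_dists))  # bounded well away from zero in every trial
```

In every trial, the random subspace stays a positive distance from all of $S$, i.e., it escapes the mesh.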

Proof: First note that for every $x\in S$, the closest $y\in Y$ has norm $\leq1$ since it’s an orthogonal projection of $x$. As such, $Y\cap S=\emptyset$ is equivalent to having $\|x-y\|_2>0$ for every $x\in S$ and $y\in Y\cap B_1$, which in turn implies that $Y\cap (S+B_\epsilon)=\emptyset$ when

$\displaystyle{\epsilon=\frac{1}{2}\min_{\substack{x\in S\\y\in Y\cap B_1}}\|x-y\|_2}$.

Indeed, the minimum is achieved (and thus $\epsilon>0$) since $S\times(Y\cap B_1)$ is compact. As such, we have

$\displaystyle{\mathrm{Pr}\Big(Y\cap S=\emptyset\Big)=\mathrm{Pr}\Big(\exists \epsilon>0 \mbox{ s.t. }Y\cap (S+B_\epsilon)=\emptyset\Big)}$

$\displaystyle{\geq\sup_{\epsilon>0}\mathrm{Pr}\Big(Y\cap (S+B_\epsilon)=\emptyset\Big)=\lim_{\epsilon\rightarrow 0^+}\mathrm{Pr}\Big(Y\cap (S+B_\epsilon)=\emptyset\Big)}$,

where the last step follows from the fact that $Y\cap (S+B_\epsilon)=\emptyset$ implies $Y\cap (S+B_{\epsilon'})=\emptyset$ for every $\epsilon'<\epsilon$. Since $w(S)<a_k$ by assumption, we also have $w(S)<(1-\epsilon)a_k-\epsilon a_n$ whenever

$\displaystyle{\epsilon<\frac{a_k-w(S)}{a_k+a_n}}$.

At this point, we appeal to Proposition 2 to bound $\mathrm{Pr}(Y\cap (S+B_\epsilon)=\emptyset)$ for such $\epsilon$ and take the limit to get the claimed bound.     $\Box$