Living on the edge: A geometric theory of phase transitions in convex optimization

Phase transitions are very common in modern data analysis (see this paper, for example). The idea is that, for a given task whose inputs are random linear measurements, there is often a magic number of measurements M^* such that if you input M\ll M^* measurements, the task will typically fail, but if you input M\gg M^* measurements, the task will typically succeed. As a toy example, suppose the task is to reconstruct an arbitrary vector in \mathbb{R}^N from its inner products with M random vectors; of course, M^*=N in this case. There are many tasks possible with data analysis, signal processing, etc., and it’s interesting to see what phase transitions emerge in these cases. The following paper introduces some useful techniques for exactly characterizing phase transitions of tasks involving convex optimization:

Living on the edge: A geometric theory of phase transitions in convex optimization

Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp

Along with this paper, I highly recommend watching this lecture by Joel Tropp on the same topic. This blog entry is based on both the paper and the lecture.

Perhaps the most basic application of convex geometry to modern signal processing is that of denoising. Suppose you want to estimate a true signal x from a noisy version x':=x+w. (Say w has iid N(0,\sigma^2) entries.) To do so, you need to leverage additional information about x. For example, if you know x lies on a shifted subspace, then you can project x' onto that shifted subspace to form an estimate \hat{x}, and the portion of the noise that lies in the orthogonal complement of the subspace will be annihilated. This is how least-squares estimation works.

Now suppose the information you get about the signal is that it’s simple in some sense (e.g., sparse or low-rank). The simplicity of the signal might be expressible in terms of a convex function f (e.g., the 1-norm or trace norm). So let’s say you’re told that f(x) is at most some value f^*. Then you might denoise x' by projecting onto the convex set S of points of value \leq f^*. But how well does this denoiser perform? Say the final estimator is \hat{x}. One way to judge the quality of this estimator is to consider the mean squared error \mathbb{E}[\|\hat{x}-x\|^2] relative to \sigma^2. It has been empirically observed (e.g., here) that the worst-case relative mean squared error in denoising \sup_{\sigma>0}\frac{1}{\sigma^2}\mathbb{E}[\|\hat{x}-x\|^2] is apparently identical to the phase transition for convex recovery using the same convex function f. Recently, Oymak and Hassibi found an important geometric characterization of this worst-case error:

\displaystyle{\sup_{\sigma>0}\tfrac{1}{\sigma^2}\mathbb{E}[\|\hat{x}-x\|^2]=\delta(\mathcal{D}(f,x))},

where \mathcal{D}(f,x) is the cone generated by points in S-x (this is called the descent cone of f at x), and \delta is the statistical dimension, defined by

\delta(C):=\mathbb{E}[\|P_C(g)\|^2];

here, P_C is the projection onto the convex cone C and g is a vector with iid N(0,1) entries.

It turns out that statistical dimension is particularly natural as a measure of the size of a cone. For example, a k-dimensional subspace has statistical dimension k. Also, the statistical dimension has been calculated for various cones of interest, such as the positive semidefinite cone (see Table 4.1 in the paper). Since statistical dimension characterizes the worst-case relative mean-squared error in denoising, and since this error has been empirically observed to be identical to a corresponding phase transition, it is perhaps not surprising that statistical dimension also provably characterizes phase transitions. This is the main result of the paper:

Theorem (Approximate kinematic formula). Fix a tolerance \eta\in(0,1). Suppose C,K\subseteq\mathbb{R}^N are closed convex cones, one of which is not a subspace. Draw an N\times N orthogonal matrix Q uniformly at random. Then

\delta(C)+\delta(K)\leq N-\sqrt{16N\log(4/\eta)}\quad \Longrightarrow \quad \mathrm{Pr}[C\cap QK=\{0\}]\geq1-\eta,

\delta(C)+\delta(K)\geq N+\sqrt{16N\log(4/\eta)}\quad \Longrightarrow \quad \mathrm{Pr}[C\cap QK=\{0\}]\leq\eta.

Now suppose K is a subspace of dimension N-M. Then QK can be realized as the null space of an M\times N matrix A with iid N(0,1) entries. Also, if Ax=y, then x is the unique minimizer of f(z) subject to Az=y precisely when the null space of A trivially intersects the descent cone \mathcal{D}(f,x). As such, this minimization routine reconstructs x with a phase transition at \delta(\mathcal{D}(f,x))+(N-M)=N, i.e., at M=\delta(\mathcal{D}(f,x)). Indeed, the above result proves the conjecture that identifies this phase transition with the denoising error. And it does so while offering some helpful geometric perspective.

As you might expect, this result finds applications in compressed sensing and random demixing problems, and it also offers some insight into something called random cone programs: How large must M be for there to exist an x in a fixed cone C such that Ax=b, where b is fixed and A has iid N(0,1) entries? Go figure, this also exhibits a phase transition at M=\delta(\mathcal{D}(f,x)).

I want to close by noting the similarity between the first part of the approximate kinematic formula and Gordon’s escape through a mesh theorem. It turns out that the statistical dimension of a cone C is more or less the square of the Gaussian width of the intersection between C and the unit sphere:

w(C\cap \mathbb{S}^{N-1})^2\leq\delta(C)\leq w(C\cap \mathbb{S}^{N-1})^2+1.

(See Proposition 10.1 in the paper.) Recall that Gordon’s escape through a mesh theorem says that an (N-M)-dimensional subspace hits a set S\subseteq\mathbb{S}^{N-1} with probability decaying exponentially with (\sqrt{M}-w(S))^2. This corresponds exactly with how, by the approximate kinematic formula, hitting the set S exhibits a phase transition at M=\delta(C)\approx w(S)^2, where C is the cone generated by S. As such, this formula establishes that the escape theorem is tight in some sense.

3 thoughts on “Living on the edge: A geometric theory of phase transitions in convex optimization

Leave a comment