Phase transitions are very common in modern data analysis (see this paper, for example). The idea is that, for a given task whose inputs are random linear measurements, there is often a magic number of measurements $m^*$ such that if you input $m < m^*$ measurements, the task will typically fail, but if you input $m > m^*$ measurements, the task will typically succeed. As a toy example, suppose the task is to reconstruct an arbitrary vector in $\mathbb{R}^n$ from its inner products with $m$ random vectors; of course, $m^* = n$ in this case. There are many tasks possible with data analysis, signal processing, etc., and it's interesting to see what phase transitions emerge in these cases. The following paper introduces some useful techniques for exactly characterizing phase transitions of tasks involving convex optimization:
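For the toy example, here is a tiny numpy sanity check (my own illustration; the parameters are arbitrary): with $m = n$ random measurements the signal is pinned down, while with $m = n - 1$ it is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x_true = rng.standard_normal(n)

for m in (n - 1, n):  # just below and exactly at the magic number m* = n
    A = rng.standard_normal((m, n))   # m random measurement vectors as rows
    y = A @ x_true                    # the m inner products
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Reconstruction succeeds iff A has rank n, which happens a.s. iff m >= n.
    print(m, np.allclose(x_hat, x_true))
```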
Living on the edge: A geometric theory of phase transitions in convex optimization
Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp
Along with this paper, I highly recommend watching this lecture by Joel Tropp on the same topic. This blog entry is based on both the paper and the lecture.
Perhaps the most basic application of convex geometry to modern signal processing is that of denoising. Suppose you want to estimate a true signal $x_0$ from a noisy version $y = x_0 + z$. (Say $z$ has iid $\mathcal{N}(0, \sigma^2)$ entries.) To do so, you need to leverage additional information about $x_0$. For example, if you know $x_0$ lies on a shifted subspace, then you can project $y$ onto that shifted subspace to form an estimate $\hat{x}$, and the portion of the noise that lies in the orthogonal complement of the subspace will be annihilated. This is how least-squares estimation works.
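Here is a tiny numerical illustration of the subspace case (my own sketch; I use an ordinary subspace rather than a shifted one, and the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 200, 5, 1.0

# Side information: the signal lies in a known k-dimensional subspace span(U).
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis for the subspace
x0 = U @ rng.standard_normal(k)
y = x0 + sigma * rng.standard_normal(n)            # noisy observation

x_hat = U @ (U.T @ y)   # least-squares estimate: project y onto span(U)

# Only the component of the noise inside the subspace survives the projection.
print(np.linalg.norm(y - x0) ** 2)      # roughly sigma^2 * n
print(np.linalg.norm(x_hat - x0) ** 2)  # roughly sigma^2 * k
```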
Now suppose the information you get about the signal is that it's simple in some sense (e.g., sparse or low-rank). The simplicity of the signal might be expressible in terms of a convex function $f$ (e.g., the 1-norm or trace norm). So let's say you're told that $f(x_0)$ is at most some value $\alpha$. Then you might denoise $y$ by projecting onto the convex set $\{x : f(x) \le \alpha\}$ of points of value at most $\alpha$. But how well does this denoiser perform? Say the final estimator is $\hat{x}$. One way to judge the quality of this estimator is to consider the mean squared error $\mathbb{E}\,\|\hat{x} - x_0\|^2$ relative to the noise level $\sigma^2$. It has been empirically observed (e.g., here) that the worst-case relative mean squared error in denoising with $f$ is apparently identical to the phase transition for convex recovery using the same convex function $f$. Recently, Oymak and Hassibi found an important geometric characterization of this worst-case error:
$$\sup_{\sigma > 0} \frac{\mathbb{E}\,\|\hat{x} - x_0\|^2}{\sigma^2} = \delta(\mathcal{D}(f, x_0)),$$

where $\mathcal{D}(f, x_0)$ is the cone generated by points in $\{u - x_0 : f(u) \le f(x_0)\}$ (this is called the descent cone of $f$ at $x_0$), and $\delta(\cdot)$ is the statistical dimension, defined by

$$\delta(C) := \mathbb{E}\,\|\Pi_C(g)\|^2;$$

here, $\Pi_C$ is the projection onto the convex cone $C$ and $g$ is a vector with iid $\mathcal{N}(0, 1)$ entries.
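Since the definition only asks you to project a Gaussian vector onto the cone, the statistical dimension is easy to estimate by Monte Carlo whenever the projection is cheap. A quick sketch (my own illustration, not code from the paper) for two cones with explicit projections:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 100, 30, 20000

g = rng.standard_normal((trials, n))   # iid N(0,1) vectors

# k-dimensional coordinate subspace: projection keeps the first k coordinates.
proj_sub = np.zeros_like(g)
proj_sub[:, :k] = g[:, :k]

# Nonnegative orthant: projection clips negative coordinates to zero.
proj_orth = np.maximum(g, 0)

print(np.mean(np.sum(proj_sub ** 2, axis=1)))   # ~ k   (a k-dimensional subspace)
print(np.mean(np.sum(proj_orth ** 2, axis=1)))  # ~ n/2 (the nonnegative orthant)
```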
It turns out that statistical dimension is particularly natural as a measure of the size of a cone. For example, a $k$-dimensional subspace has statistical dimension $k$. Also, the statistical dimension has been calculated for various cones of interest, such as the positive semidefinite cone (see Table 4.1 in the paper). Since statistical dimension characterizes the worst-case relative mean-squared error in denoising, and since this error has been empirically observed to be identical to a corresponding phase transition, it is perhaps not surprising that statistical dimension also provably characterizes phase transitions. This is the main result of the paper:
Theorem (Approximate kinematic formula). Fix a tolerance $\eta \in (0, 1)$. Suppose $C, K \subseteq \mathbb{R}^n$ are closed convex cones, one of which is not a subspace. Draw an $n \times n$ orthogonal matrix $Q$ uniformly at random. Then

$$\delta(C) + \delta(K) \le n - a_\eta \sqrt{n} \quad \Longrightarrow \quad \Pr\{C \cap QK \ne \{0\}\} \le \eta,$$
$$\delta(C) + \delta(K) \ge n + a_\eta \sqrt{n} \quad \Longrightarrow \quad \Pr\{C \cap QK \ne \{0\}\} \ge 1 - \eta,$$

where $a_\eta := \sqrt{8 \log(4/\eta)}$.
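The theorem is easy to test numerically whenever the intersection test reduces to a linear program. For example (my own sketch, with arbitrary parameters), take $C$ to be the nonnegative orthant, so $\delta(C) = n/2$, and $K$ a fixed $k$-dimensional subspace, so $\delta(K) = k$; the theorem predicts that a randomly rotated copy of $K$ starts hitting $C$ nontrivially once $k$ exceeds $n/2$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, trials = 40, 50

# C = nonnegative orthant, QK = uniformly random k-dimensional subspace
# (realized as the column span of an n x k Gaussian matrix).
for k in range(4, n, 4):
    hits = 0
    for _ in range(trials):
        G = rng.standard_normal((n, k))
        # C and span(G) intersect nontrivially iff some w satisfies
        # G w >= 0 (entrywise) and sum(G w) = 1 (which rules out the zero vector).
        res = linprog(np.zeros(k),
                      A_ub=-G, b_ub=np.zeros(n),
                      A_eq=G.sum(axis=0, keepdims=True), b_eq=[1.0],
                      bounds=(None, None), method="highs")
        hits += (res.status == 0)
    print(f"k = {k:2d}: fraction with nontrivial intersection = {hits / trials:.2f}")
```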
Now suppose $K$ is a subspace of dimension $n - m$. Then $K$ can be realized as the null space of an $m \times n$ matrix $\Phi$ with iid $\mathcal{N}(0, 1)$ entries. Also, if $y = \Phi x_0$, then $x_0$ is the unique minimizer of $f(x)$ subject to $\Phi x = y$ precisely when the null space of $\Phi$ trivially intersects the descent cone $\mathcal{D}(f, x_0)$. As such, this minimization routine reconstructs $x_0$ with a phase transition at $\delta(\mathcal{D}(f, x_0)) + \delta(K) = n$, i.e., at $m = \delta(\mathcal{D}(f, x_0))$ (recall that a subspace of dimension $n - m$ has statistical dimension $n - m$). Indeed, the above result proves the conjecture that identifies this phase transition with the denoising error. And it does so while offering some helpful geometric perspective.
As you might expect, this result finds applications in compressed sensing and random demixing problems, and it also offers some insight into something called random cone programs: How large can $m$ be for there to exist an $x$ in a fixed cone $C$ such that $Ax = b$, where $b \in \mathbb{R}^m$ is fixed and the $m \times n$ matrix $A$ has iid $\mathcal{N}(0, 1)$ entries? Go figure, this also exhibits a phase transition at $m = \delta(C)$.
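As a sanity check (again my own sketch), take $C$ to be the nonnegative orthant, so $\delta(C) = n/2$; feasibility of $\{x \ge 0 : Ax = b\}$ is a linear program, and the feasible fraction should drop from one to zero as $m$ passes $n/2$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, trials = 40, 50

for m in range(5, n, 5):
    feasible = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))   # iid N(0,1) entries
        b = rng.standard_normal(m)        # by rotational invariance, a fresh b
                                          # behaves like any fixed (generic) b
        res = linprog(np.ones(n), A_eq=A, b_eq=b, bounds=(0, None), method="highs")
        feasible += (res.status == 0)
    print(f"m = {m:2d}: feasible fraction = {feasible / trials:.2f}   (n/2 = {n // 2})")
```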
I want to close by noting the similarity between the first part of the approximate kinematic formula and Gordon's escape through a mesh theorem. It turns out that the statistical dimension of a cone $C$ is more or less the square of the Gaussian width of the intersection between $C$ and the unit sphere:

$$w(C \cap S^{n-1})^2 \le \delta(C) \le w(C \cap S^{n-1})^2 + 1.$$
(See Proposition 10.1 in the paper.) Recall that Gordon's escape through a mesh theorem says that a random $(n - m)$-dimensional subspace hits a set $S \subseteq S^{n-1}$ with probability decaying exponentially in $(\sqrt{m} - w(S))^2$, provided $\sqrt{m} > w(S)$. This corresponds exactly with how, by the approximate kinematic formula, hitting the set $S$ exhibits a phase transition at $m = \delta(\mathrm{cone}(S)) \approx w(S)^2$, where $\mathrm{cone}(S)$ is the cone generated by $S$. As such, this formula establishes that the escape theorem is tight in some sense.
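By the way, both sides of this comparison are easy to estimate by Monte Carlo: for a cone like the nonnegative orthant, the supremum defining the Gaussian width is (outside an exponentially rare event) attained at the normalized projection of $g$ onto the cone, so $w(C \cap S^{n-1}) \approx \mathbb{E}\,\|\Pi_C(g)\|$ while $\delta(C) = \mathbb{E}\,\|\Pi_C(g)\|^2$. A quick check (my own sketch) that the gap is indeed less than 1:

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 100, 50000

# Projection onto the nonnegative orthant: clip negative coordinates to zero.
norms = np.linalg.norm(np.maximum(rng.standard_normal((trials, n)), 0), axis=1)

stat_dim = np.mean(norms ** 2)     # delta(C) = E ||Pi_C(g)||^2      (~ n/2)
width_sq = np.mean(norms) ** 2     # w(C cap S^{n-1})^2 ~ (E ||Pi_C(g)||)^2

print(stat_dim, width_sq, stat_dim - width_sq)   # the gap lies between 0 and 1
```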