# Probably certifiably correct algorithms

This post is based on two papers (one and two). The task is to quickly solve typical instances of a given problem, and to quickly produce a certificate of that solution. Generally, problems of interest are NP-hard, and so we consider a random distribution on problem instances with the philosophy that real-world instances might mimic this distribution. In my community, it is common to consider NP-hard optimization problems:

minimize $f(x)$ subject to $x\in S$.     (1)

In some cases, $f$ is convex but $S$ is not, and so one might relax accordingly:

minimize $f(x)$ subject to $x\in T$,     (2)

where $T\supseteq S$ is some convex set. If the minimizer of (2) happens to be a member of $S$, then it’s also a minimizer of (1) — when this happens, we say the relaxation is tight. For some problems (and distributions on instances), the relaxation is typically tight, which means that (1) can be typically solved by instead solving (2); for example, this phenomenon occurs in phase retrieval, in community detection, and in geometric clustering. Importantly, strong duality ensures that solving the dual of the convex relaxation provides a certificate of optimality.
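To make the tightness phenomenon concrete, here is a minimal toy sketch (much simpler than any of the problems above, and not from the papers): minimize the convex $f(x)=\|x-b\|^2$ over the nonconvex $S=\{0,1\}^n$, relaxed to the box $T=[0,1]^n$. The relaxed minimizer is the projection of $b$ onto the box, and the relaxation is tight precisely when that projection is already binary:

```python
import numpy as np

def relaxed_minimizer(b):
    """Minimize ||x - b||^2 over the box T = [0, 1]^n (the convex relaxation);
    the minimizer is the entrywise projection of b onto [0, 1]."""
    return np.clip(b, 0.0, 1.0)

def is_tight(b, tol=1e-12):
    """The relaxation is tight when the relaxed minimizer lies in S = {0, 1}^n."""
    x = relaxed_minimizer(b)
    return bool(np.all((x < tol) | (x > 1 - tol)))

# Every entry of b lies outside (0, 1), so projecting lands in S: tight.
print(is_tight(np.array([-0.3, 1.7, 2.0])))   # True
# A fractional entry survives the projection: not tight.
print(is_tight(np.array([0.5, 1.2])))         # False
```

For a distribution on $b$ whose entries rarely fall in $(0,1)$, the relaxation would be *typically* tight, which is the shape of the phenomenon in the real examples.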

At this point, it seems that our original task has been accomplished, provided the convex optimization is sufficiently fast. In the cases of phase retrieval, community detection and k-means clustering, the program (2) amounts to a semidefinite program, which can certainly be solved in polynomial time, but the runtime scales much worse than that of alternatives like Wirtinger flow, degree-profiling and Lloyd’s algorithm. Unfortunately, these fast alternatives fail to produce certificates of optimality.

To get around this, we first recall how to prove that (2) is typically tight. First, there is a planted solution $x_0\in S$ that solves (1) with high probability (the randomness is in $f$ and/or $S$); for example, in the case of phase retrieval, $x_0$ is the ground truth. The goal is to prove that $x_0$ is typically the unique solution of the relaxation (2). To do so, construct a function $g$ such that $g(x_0;f,S)$ typically certifies the optimality of $x_0$. In phase retrieval, $g$ leverages the golfing scheme; in community detection, $g$ is completely determined by complementary slackness; but in k-means clustering, $g$ is only partially determined by complementary slackness (the rest is chosen by an art form of sorts; note the difference between this choice and that choice). Once we have $g$, it remains to show that for each $x_0$, $g(x_0;f,S)$ is typically dual feasible, and for semidefinite programs, this amounts to checking whether a certain random matrix is positive semidefinite (so spectral estimates like these become useful here).
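In code, that final positive semidefiniteness check is just an eigenvalue computation. A minimal sketch (the matrices below are generic stand-ins for whatever certificate matrix $g$ produces, not the papers' constructions):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Dual feasibility check: a symmetric M is positive semidefinite iff its
    smallest eigenvalue is nonnegative (up to numerical tolerance)."""
    M = (M + M.T) / 2                 # symmetrize to guard against round-off
    return bool(np.linalg.eigvalsh(M)[0] >= -tol)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
print(is_psd(A @ A.T))    # Gram matrices are always PSD: True
print(is_psd(A + A.T))    # a random symmetric matrix: typically not PSD
```

`eigvalsh` computes the full spectrum in $O(n^3)$ time; the next section's point is that this can be replaced by something cheaper.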

At this point, it seems pretty easy to overcome the fact that fast solvers like Wirtinger flow fail to produce optimality certificates: Take the fast solver’s solution, plug it into the function $g(\cdot;f,S)$, and then check that the output is dual feasible. Since dual feasibility can be tested by spectral methods, this should be particularly fast. This is Afonso’s main point in this paper.
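A sketch of the spectral test, assuming only that the certificate check reduces to verifying $\lambda_{\min}(M)\ge 0$ for a symmetric matrix $M$; the shift $c=\|M\|_F$ (which always upper-bounds $\lambda_{\max}(M)$) and the function name are illustrative choices, not taken from the papers:

```python
import numpy as np

def min_eig_estimate(M, n_iter=500, rng=None):
    """Estimate lambda_min of a symmetric matrix M via the power method.

    Power iteration on the shifted matrix B = c*I - M (PSD shift, since
    c = ||M||_F >= lambda_max(M)) converges to the eigenvector of
    lambda_max(B) = c - lambda_min(M), i.e., the bottom eigenvector of M.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = np.linalg.norm(M, "fro")
    v = rng.standard_normal(M.shape[0])   # random initialization
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = c * v - M @ v                 # apply B = c*I - M
        v = w / np.linalg.norm(w)
    return v @ (M @ v)                    # Rayleigh quotient ~= lambda_min(M)

print(min_eig_estimate(np.diag([1.0, 2.0, 5.0]), rng=np.random.default_rng(0)))  # ≈ 1.0
```

Each iteration costs one matrix–vector product, so checking nonnegativity of this estimate is far cheaper than a full eigendecomposition when $M$ is large or sparse.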

The only difficulty here is that the power method (for example) uses a random initialization. As such, while the power method will quickly converge with high probability, there is a small probability that it will take a long time to converge. Since we can only run finitely many iterations of the power method, this naturally leads one to treat the power method’s output as a test statistic for a hypothesis test (see section 3.1 in this paper). Overall, one may compute $g(x_0;f,S)$, run the power method, and then perform a hypothesis test. In the end, one may quickly say something like “$x_0$ is the unique solution to (1) with confidence level 0.9999”. Indeed, unlike traditional statistics, it’s rather inexpensive to produce high levels of confidence here. For example, running the power method twice changes the confidence level from $1-\epsilon$ to $1-\epsilon^2$.
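Under the simplified (and hypothetical) framing that a single power-method run falsely certifies a non-PSD matrix with probability at most $\epsilon$ — the Rayleigh quotient always upper-bounds $\lambda_{\min}$, so a truly PSD certificate always passes — demanding that $k$ independent runs all certify drives the error down to $\epsilon^k$. A sketch:

```python
import numpy as np

def power_method_certifies(M, n_iter=500, tol=1e-8, rng=None):
    """One run: power iteration on B = c*I - M from a random start;
    certify if the resulting estimate of lambda_min(M) is nonnegative."""
    rng = np.random.default_rng() if rng is None else rng
    c = np.linalg.norm(M, "fro")          # c >= lambda_max(M), so B is PSD
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = c * v - M @ v
        v = w / np.linalg.norm(w)
    return bool(v @ (M @ v) >= -tol)

def certify(M, runs=2, rng=None):
    """Require all independent runs to certify: if one run falsely certifies
    with probability at most eps, this shrinks the error to eps**runs."""
    rng = np.random.default_rng() if rng is None else rng
    return all(power_method_certifies(M, rng=rng) for _ in range(runs))
```

Note the asymmetry: a PSD matrix is never rejected (the Rayleigh quotient can't dip below $\lambda_{\min}\ge 0$), so repetition only tightens the guarantee against false certificates, matching the $1-\epsilon \to 1-\epsilon^2$ claim above.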