# Phase retrieval from coded diffraction patterns

In a previous post, I described a paper I wrote with Afonso and Yutong about how to design $O(\log n)$ masked illuminations that enable efficient phase retrieval of $n$-dimensional signals for X-ray crystallography and related applications. This masked-illumination methodology was originally proposed in this paper, and our phase retrieval guarantee was based on a recovery method known as polarization. This week, the following paper was posted online, and it gives the first guarantee of this kind for a more popular recovery method called PhaseLift:

Phase retrieval from coded diffraction patterns

Emmanuel J. Candes, Xiaodong Li, Mahdi Soltanolkotabi

In particular, this paper shows that $O(\log^4 n)$ masked illuminations enable PhaseLift recovery, and their result actually holds for a wide assortment of masks. Since this paper is so closely related to my research, I decided to interview one of the authors, Mahdi Soltanolkotabi. I’ve lightly edited his responses for formatting and hyperlinks:
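For concreteness, here is a minimal numpy sketch of the coded-diffraction measurement model (my own illustration, not code from the paper): for each mask $d_\ell$, one observes the squared magnitudes of the DFT of the masked signal $d_\ell \circ x$. The random-phase masks below are an illustrative assumption, not necessarily one of the admissible mask families in the paper.

```python
import numpy as np

def coded_diffraction_measurements(x, masks):
    """Return |F(d_l * x)|^2 for each mask d_l (* is the entrywise product)."""
    return np.stack([np.abs(np.fft.fft(d * x)) ** 2 for d in masks])

rng = np.random.default_rng(1)
n, L = 32, 8                     # signal length, number of masks
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Toy masks with uniformly random phases (an assumption for illustration only).
masks = np.exp(2j * np.pi * rng.random((L, n)))

y = coded_diffraction_measurements(x, masks)   # shape (L, n): m = L*n real measurements
```

Note that the measurements are invariant under a global phase, $x \mapsto e^{i\theta}x$, which is exactly why one can only hope to recover $xx^*$ rather than $x$ itself.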

DGM: I’ve personally thought a bit about this problem, and it seems that the biggest difficulty is working with Fourier-type measurements in the space of self-adjoint matrices. What techniques do you use to address this difficulty, how do you apply them, and where can I go to learn more about them?

MS: The main challenge in the Fourier model is that we have much less randomness to work with. With Gaussian measurements, one could imagine a few alternative proof methods based on expected-value calculations (I’m going to refer to this as average-case analysis), then use concentration inequalities based on bounding higher-order moments to upgrade the “on average” results to high-probability results. In the Fourier model the options are much more limited. First, even the average-case analysis (the expected-value arguments) is not so trivial, since calculating means of random variables can be quite messy in this model. Second, most average-case analyses do not necessarily translate into high-probability results, since calculating moments of the relevant random variables can become rather involved in the Fourier model. Furthermore, if the problem/proof is not formulated in the correct way, there may be unwanted dependencies among the random variables that further complicate the concentration arguments. To give you a concrete example, in early drafts of this paper we used a different argument that required bounding second-order moments of some random variables. Calculating those second-order moments was rather tedious and took more than 9 edited LaTeX pages. As you can see in the final version of the paper, we were finally able to avoid such arguments. This is why the Fourier model is delicate: if you don’t formulate or think about each line of the proof in the “correct” way, the argument may become impossible or very tedious.

Our proof has three main steps, which are also the main subsections in the proofs section of our paper.

Step 1: Robust injectivity.

Roughly stated, this result says that our measurement process ($\mathcal{A}:\mathcal{S}^{n\times n}\rightarrow\mathbb{R}^m$) is not only injective at the true signal ($xx^*$) but also well conditioned on certain signals, specifically all signals of the form $xy^*+yx^*$. This result essentially says that the range of our operator has the right “angle” (I will say more about this later).

This is a natural property that any “robust” algorithm would need to have, and it is the most involved and delicate part of our proof. (I will explain a bit later how it fits into our general proof.)

Here, the three main challenges are:

1. First, one has to formulate robust injectivity in the “correct” way; otherwise it becomes very difficult to prove, as I mentioned before.

2. Second, we need to realize that we must first prove the result for $y$ such that $x^*y$ is real, and then show that the result holds more generally. The reason, as discussed in our paper, is that for any $y \in \mathbb{C}^n$ we can find $\lambda \in \mathbb{R}$ such that $x^*y-i\lambda x^*x=x^*(y-i\lambda x)\in\mathbb{R}$, while

$x(y-i\lambda x)^*+(y-i\lambda x)x^*=xy^*+yx^*.$

This may look easy in hindsight, but I think it is rather important, because without this argument one might incorrectly think that robust injectivity cannot hold in the Fourier model. Indeed, the concentration arguments we use do not necessarily hold without assuming $x^*y$ is real.

3. Finally, the main challenge is to massage the problem so as to put it in the form of bounding the spectral norm of a random matrix that is a sum of certain i.i.d. random matrices with “good” properties. If we manage to do this, then we can apply matrix concentration inequalities to finish the proof (for example, we used the matrix Hoeffding inequality). This helps us avoid a lot of combinatorial arguments. (For matrix concentration inequalities, I would recommend the tutorial by Tropp and references therein.)
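As a quick sanity check on the reduction in item 2 (my own numerical illustration, not from the paper): choosing $\lambda = \operatorname{Im}(x^*y)/\|x\|^2$ makes $x^*(y - i\lambda x)$ real while leaving the matrix $xy^* + yx^*$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Choose lambda so that x^*(y - i*lambda*x) is real.
lam = np.imag(np.vdot(x, y)) / np.vdot(x, x).real
y2 = y - 1j * lam * x

# x^* y2 is (numerically) real ...
assert abs(np.imag(np.vdot(x, y2))) < 1e-12

# ... and the rank-2 Hermitian matrix x y^* + y x^* is unchanged.
M1 = np.outer(x, y.conj()) + np.outer(y, x.conj())
M2 = np.outer(x, y2.conj()) + np.outer(y2, x.conj())
assert np.allclose(M1, M2)
```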

Step 2: Building an approximate dual certificate via the golfing scheme

Here we need to find a matrix that is in the range of the adjoint of our measurement operator and is “close enough” to the set of valid dual certificates. For this purpose we use a golfing scheme argument. As mentioned in the paper, the golfing scheme was introduced in the work of Gross, and modifications of this technique have subsequently been used in many other papers (e.g., one, two, three). Of course, the golfing scheme is a general proof methodology, and it requires a lot of calculation and customization when applied to a particular problem.

The key ingredient in this part of our argument is that we need to somehow use our measurements and measurement operator to construct a matrix which is, in some sense, close to $xx^*$. The way we go about this is to build such a matrix in expectation and then show that it is also close with high probability. Luckily, we can reuse some of the calculations from our robust injectivity result to transition from an “in expectation” argument to a “high probability” one. Of course, this is a “high level” description.
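To convey the golfing idea, here is a toy vector-space sketch of my own (much simpler than the matrix construction in the paper, and with generic Gaussian batches as an illustrative assumption): to build a point in the span of the measurement adjoints that is close to a target $g$ in a subspace $T$, use a fresh batch of measurements at each round to knock down the residual left in $T$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, batches, m_j = 60, 5, 8, 200

# Target g lives in a k-dimensional coordinate subspace T.
P_T = np.zeros(n); P_T[:k] = 1.0      # (diagonal) projector onto T
g = np.zeros(n); g[:k] = rng.standard_normal(k)

y = np.zeros(n)      # iterate, built from the ranges of the B_j^T
r = g.copy()         # residual, always kept inside T
for _ in range(batches):
    B = rng.standard_normal((m_j, n)) / np.sqrt(m_j)   # scaled so E[B^T B] = I
    step = B.T @ (B @ r)     # lies in range(B^T) and approximates r
    y += step
    r = P_T * (r - step)     # new residual: the part of T not yet explained

# By construction P_T(y) = g - r, and ||r|| shrinks geometrically with each
# fresh batch, so P_T(y) ends up very close to g.
```

Each round only needs the residual to shrink by a constant factor, which is why fairly crude concentration per batch suffices; independence across batches is what makes the argument go through.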

Step 3: Put everything together to prove exact recovery

Finally, we show that a combination of Steps 1 and 2 yields exact recovery.

Define $T=\{xy^*+yx^*: y\in\mathbb{C}^n\}$.

By a duality argument, we know that we have exact recovery of $xx^*$ by the SDP feasibility problem if we can find a matrix $Z=\mathcal{A}^*(\lambda)$ for some $\lambda\in\mathbb{R}^m$ (called a dual certificate) such that

(1) $P_{T^\perp}(Z)\preceq 0$, and
(2) $P_{T}(Z)=0$.

Now the idea is to impose a more stringent condition on (1), requiring that $P_{T^\perp}(Z)\preceq -I_{T^\perp}$, and a more relaxed condition on (2), requiring only that $P_{T}(Z)$ be close to $0$. We then essentially show that having the approximate versions of (1) and (2) stated above, together with robust injectivity, allows us to find an exact dual certificate and therefore guarantee exact recovery.
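To make conditions (1) and (2) concrete, here is a small numpy sketch (my own illustration) of the projections onto $T$ and $T^\perp$. For unit-norm $x$ and $X = xx^*$, the projection of a Hermitian $Z$ onto $T$ can be written as $P_T(Z) = XZ + ZX - XZX$, a standard formula in the PhaseLift literature.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)
X = np.outer(x, x.conj())          # X = x x^*

def P_T(Z):
    """Projection of a Hermitian Z onto T = {x y^* + y x^* : y in C^n}."""
    return X @ Z + Z @ X - X @ Z @ X

def P_Tperp(Z):
    return Z - P_T(Z)

# A random Hermitian test matrix.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Z = (A + A.conj().T) / 2

# Sanity checks: matrices of the form x y^* + y x^* are fixed by P_T,
# and P_T is idempotent, so P_T and P_Tperp split Z into two pieces.
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
M = np.outer(x, y.conj()) + np.outer(y, x.conj())
assert np.allclose(P_T(M), M)
assert np.allclose(P_T(P_T(Z)), P_T(Z))
assert np.allclose(P_T(P_Tperp(Z)), 0)
```

Conditions (1) and (2) are then statements about the two pieces $P_{T^\perp}(Z)$ and $P_T(Z)$ of a candidate certificate $Z$.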

To see why this argument makes sense, let me give a more tangible example. Imagine you want to prove that a 2D plane passing through the origin also passes through a face of the cube in $\mathbb{R}^3$; more specifically, the face $x=1$, $|y|\le 1$, $|z|\le 1$. Before I move on, let me say that in this example,

• the plane is analogous to the range of $\mathcal{A}^*$,
• $|y|\le 1$ and $|z|\le 1$ are analogous to condition (1) above, and
• $x=1$ is analogous to condition (2) above.

Now, one way to prove that the plane passes through this face is to find a point in the intersection of the plane and the face of the cube (this is basically analogous to finding the exact dual certificate stated above). However, another way to go about this is to find a point on the plane that is very close to the center of the face of the cube but not necessarily on the face. More specifically,

• close to the center of the face, say $|y|\le 1/4$ and $|z|\le 1/4$, and
• $x\approx 1$.

Then one shows that the angle at which the plane hits the face is “good” (this is analogous to robust injectivity in our proof). It is easy to see that by combining these conditions, the plane must hit that face of the cube. At a high level, this is what we are doing in our proof. The reason we take such an approach is that it may be easier to build this “close enough” point (in our case, an approximate dual certificate) than to find an actual point in the intersection (in our case, an exact dual certificate).

DGM: In previous PhaseLift papers, there have been subtle differences in the recovery guarantees. In this paper, which of the following do you show?

(a) Fix a signal. Then with high probability, a random ensemble of masks enables PhaseLift recovery of that signal.

(b) Draw a random ensemble of masks. Then with high probability, that ensemble simultaneously enables PhaseLift recovery of every possible signal.

If (a), do you think (b) is within reach? If (b), was this particularly difficult to obtain? What are your thoughts on the differences between these types of guarantees?

MS: We work with model (a). We fix a signal. Then with high probability, a random ensemble of masks enables PhaseLift recovery of that signal.

We certainly believe that (b) holds. We have not spent much time trying to get (b), but I expect it would require non-trivial modifications to our current arguments. For example, one would need to prove that a variation of our robust injectivity result holds universally for all $x$, rather than for a fixed $x$. Off the top of my head, I do not know of an easy way to do this, as the typical covering arguments used in these types of situations do not seem to apply. Although I do have some ideas on how one could use more sophisticated arguments here.

As to your question about which is more reasonable: of course (b) is stronger, but from my point of view (a) is a valid model to study when discussing such problems.

DGM: Typically in imaging applications, one must work with a 2D Fourier transform. Do you think your methods can be adapted to tackle the 2D case?

MS: You bring up a very good point. In fact, we have briefly discussed this in the new version of our paper. Indeed, we believe our arguments extend to unstructured 2D codes. However, explaining the proof becomes a bit notation heavy, as we would need to work with outer products of 2D arrays, which results in 4D tensors. Furthermore, one can also get 2D results immediately by dividing the problem into smaller 1D problems by means of a “tensorizing” trick, as explained in Appendix B of the new version of our paper.
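The 2D measurement model is the natural analogue of the 1D one, with the 1D DFT replaced by a 2D FFT. Here is a toy sketch of my own (again with illustrative random-phase masks, not necessarily the paper's admissible families):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, L = 16, 16, 6
x = rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))

# Toy 2D masks with uniformly random phases (an illustrative assumption).
masks = np.exp(2j * np.pi * rng.random((L, n1, n2)))

# One coded diffraction pattern per mask: |F_2D(d_l * x)|^2.
patterns = np.stack([np.abs(np.fft.fft2(d * x)) ** 2 for d in masks])
```

As in 1D, the patterns are invariant under a global phase of $x$, and the total measurement count is $m = L \cdot n_1 n_2$.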

DGM: This version of the paper looks very little like the original version you posted on the arXiv last month. (Even the title has changed!) How does the content of this version of your paper compare to the preliminary draft?

MS: We have made many changes to the paper, so the title, introduction, and text are quite different. In the edited version of the paper we also work with a less restrictive model, as we no longer need a “special” identity mask. This required non-trivial modifications to our previous construction of the approximate dual certificate. Finally, we have a revamped numerical section, which studies phase transitions for different models and shows how the algorithm operates in the presence of Poisson noise.