The following line from the introduction caught my eye:

For instance the print-out for exact fiducial 48a occupies almost a thousand A4 pages (font size 9 and narrow margins).

As my previous blog entry illustrated, the description length of SIC-POVM fiducial vectors appears to grow rapidly with the dimension $d$. However, it seems that the rate of growth is much better than I originally thought. Here’s a plot of the description lengths of the known fiducial vectors (the new ones due to ACFW17 — available here — appear in red):

Note that the vertical axis has a logarithmic scale. Unlike my interpretation from two years ago, the description lengths appear to exhibit subexponential growth in $d$. Putting the horizontal axis in log scale says even more:

The dotted line depicts a polynomial trend. This suggests that the description length scales with the number of entries in the Gram matrix.

For context, let’s consider the more general problem of constructing equiangular tight frames (ETFs) of $n$ vectors in dimension $d$; see this paper for a survey. In the real case, it suffices to determine the sign pattern of an ETF’s Gram matrix, which can be naively described in $\binom{n}{2}$ bits. However, there are several infinite families of real ETFs with much shorter description length. Indeed, the sign patterns are determined by certain strongly regular graphs, many of which enjoy a straightforward algebro-combinatorial construction.

In the case of SIC-POVMs, the Gram matrix is complex, so it doesn’t correspond to a strongly regular graph in the same way, but the conjectures used in ACFW17 suggest that the Gram matrix may be selected so as to satisfy certain group and number theoretic properties. But even after reducing to such specific structure, the description length appears to scale with the size of the Gram matrix (i.e., the naive scaling in the real case). As such, an infinite family of explicit SIC-POVMs will likely require the identification of additional structure. This is shocking, considering the conjectured structures that are currently used already seem miraculous.


**DGM:** How were you introduced to this problem? Do you have any particular applications of shape matching or point-cloud comparison in mind with this research?

**SV:** This problem was introduced to me by Andrew Blumberg in the context of topological data science. Andrew is an algebraic topologist who is also interested in applications, in particular in computational topology. There is a vast literature on the registration problem for 3D shapes and surfaces, but these methods are usually tailored to the geometric properties of the space and rely on strong geometric assumptions. Our goal was to study this problem in an abstract setting that could have potential impact in spaces with unusual geometry. In particular, we are thinking of spaces of phylogenetic trees, protein-protein interaction data, and text processing. We don’t have experimental results for those problems yet, but we are working on it.

A reason why it is so hard to obtain meaningful results for these “real data” problems is that it is hard to validate whether the method produces a meaningful result. A simple way for a mathematician like me to validate the performance of our methods and algorithms is to compare with problems where the ground truth solution is known (like the teeth classification and shape matching), and this is what we did in the paper.

For future scientific applications, I’m working with Bianca Dumitrascu, who is a graduate student in computational biology at Princeton. Bianca works with large datasets of protein-protein interaction information. She has the intuition that the existence of isometries between protein interaction measurements in different biological systems should be correlated with similar roles for the corresponding proteins. However, such behavior is very hard to test in real data because of scalability issues, the large amount of noise present in the data, and the lack of a theoretical ground truth in most cases.

**DGM:** Do you have any intuition for why your polynomial-time lower bound on Gromov-Hausdorff distance satisfies the triangle inequality?

**SV:** The intuitive answer: I think this is a phenomenon aligned with the “data is not the enemy” philosophy. The Gromov-Hausdorff distance is NP-hard to compute in the worst case, but it is actually computable in polynomial time for a generic set of metric spaces. Since our relaxed distance coincides with the Gromov-Hausdorff distance at small scales, we can intuitively expect it to be an actual distance (and therefore to satisfy the triangle inequality).

The practical answer: Considering the relations that realize the two Gromov-Hausdorff distances, there is a straightforward way to define a relation between the outer pair of metric spaces so that the Gromov-Hausdorff objective value for that relation is less than or equal to the sum of the two. Just consider the composition! If the result of our semidefinite program is interpreted as a soft assignment between points from one metric space to another, then it is natural to ask what the composition of soft assignments is, whether it is feasible for the semidefinite program, and whether its objective value is bounded by the sum. This is basically why the triangle inequality holds.

**DGM:** You proved that generic finite metric spaces enjoy a neighborhood of spaces whose Gromov-Hausdorff distance from the original space equals your lower bound (i.e., your bound is generically tight for small perturbations). However, the size of the allowable perturbation seems quite small. Later, you mention that you frequently observe tightness in practice. Do you think that tightness occurs for much larger perturbations in the average case over some reasonable distribution?

**SV:** I think tightness occurs for relatively large perturbations of the isometric case provided that the data is well conditioned. However, in an extreme case, if all pairwise distances are the same, then the solution of the semidefinite program is not unique, and therefore tightness will not occur. When studying the distance from the topological point of view, a result of the form “there exists a local neighborhood such that the distances coincide” is relevant. From an applied mathematics perspective, it would be interesting to quantify how large a perturbation can be while the semidefinite program remains tight. The techniques I know for obtaining such a result rely on the construction of dual certificates. The dual certificates I managed to construct also had a dependency on the minimum nonzero entry of the distance matrix due to degeneracy issues. I think it should be possible to obtain a tightness result for larger perturbations, but I think it may be a hard problem. The way I would start thinking about this is with numerical experiments and a conjectured phase transition for tightness of the semidefinite program as a function of noise, for different problem instances.

**DGM:** How frequently does your local method GHMatch recover the Gromov-Hausdorff distance in practice? Is there a way to leverage the smallest eigenvector to get a better initialization (à la Wirtinger Flow for phase retrieval)?

**SV:** The algorithm GHMatch often gets stuck in local minima. In many non-convex optimization problems, good initialization is enough to guarantee convergence to a global optimum after some steps of gradient descent. However, our optimization problem has nonnegativity constraints, which makes it significantly harder because the variable needs to be (at least) thresholded to be nonnegative after each iteration. There is a class of algorithms that attempts to do such things for synchronization problems, such as projected power methods (see for example this paper). But the right approach is not to project naively; rather, one should weight the projection carefully with Approximate Message Passing, as is done for example in this paper.


John is on the job market this year, and when reading his research statement, I was struck by his discussion of our paper, so I asked him to expand his treatment to a full-blown blog entry. Without further ado, here is John’s guest blog post (which I’ve lightly edited for hyperlinks and formatting):

An equiangular tight frame (ETF) is a set of $n$ unit vectors $\varphi_1, \dots, \varphi_n$ in $\mathbb{C}^d$ that achieves equality in the so-called *Welch bound*:

$$\max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle| \geq \sqrt{\frac{n-d}{d(n-1)}}.$$

One elegant construction of ETFs relies on a group-theoretic/combinatorial object called a difference set. If $G$ is a finite group and $D \subseteq G$, then we say that $D$ is a *difference set* if the number of ways to write $g = d_1 d_2^{-1}$ with $d_1, d_2 \in D$ is independent of the choice of nonidentity $g \in G$.

The construction of so-called *harmonic ETFs* is as follows. Let $G$ be a finite abelian group, and let $\hat{G}$ be the group of characters on $G$. If $D \subseteq \hat{G}$ is a difference set and $\mathbb{1}_D$ is the indicator function of $D$, then the orbit $\{L_g f\}_{g \in G}$ is an ETF for its span, where $L$ denotes the left regular representation of $G$, and $f$ is the inverse Fourier transform of $\mathbb{1}_D$. A frame generated by the orbit of a vector under the action of a group representation is called a *group frame*.
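As a quick numerical sanity check of this construction (a minimal sketch, using the classical quadratic-residue difference set $\{1,2,4\}$ in $\mathbb{Z}/7$; the variable names are my own):

```python
import numpy as np

# Difference set D = {1, 2, 4} (the quadratic residues) in the abelian group Z/7.
G, D = 7, [1, 2, 4]

# Harmonic ETF: restrict the 7x7 character table (DFT) to the rows indexed by D.
# Each column then becomes a unit vector in C^3 after normalization.
F = np.exp(-2j * np.pi * np.outer(D, np.arange(G)) / G)
vecs = F / np.sqrt(len(D))

# Equiangular: all off-diagonal Gram entries have the same modulus,
# which is exactly the Welch bound value.
gram = vecs.conj().T @ vecs
off = np.abs(gram[~np.eye(G, dtype=bool)])
welch = np.sqrt((G - len(D)) / (len(D) * (G - 1)))
assert np.allclose(off, welch)

# Tight: the frame operator is a multiple of the identity.
S = vecs @ vecs.conj().T
assert np.allclose(S, (G / len(D)) * np.eye(len(D)))
print("7 vectors in C^3 form an ETF; coherence =", off.max())
```

The same check works for any difference set: the Gram moduli collapse to a single value precisely because every nonidentity group element arises as a difference the same number of times.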

It is natural to ask whether nonabelian groups can be used in a similar way to construct ETFs. Indeed, the most famous open problem in the theory of ETFs, Zauner’s conjecture, asks for an ETF of $d^2$ vectors in $\mathbb{C}^d$ given as a group frame under the action of the Heisenberg group. The main obstruction is the fact that for nonabelian groups, the Pontryagin dual does not naturally form a group. Thus, it is not immediately clear what could stand in for a difference set. Our approach involves generalizing from abelian groups to association schemes.

A collection $\mathcal{A} = \{A_1, \dots, A_m\}$ of 0-1 matrices is called an *association scheme* if the identity matrix is a member of $\mathcal{A}$, the sum $\sum_i A_i$ is the all-ones matrix, and $\mathcal{A}$ spans a commutative $*$-algebra.

Given a finite abelian group $G$, we can construct an association scheme by taking $A_g$ to be the translation operator by $g$ on $\mathbb{C}[G]$ for each $g \in G$. We will refer to $\{A_g\}_{g \in G}$ as an *abelian group scheme*. Not all association schemes arise in this way, and thus association schemes can be seen as a generalization of abelian groups. Moreover, we can do harmonic analysis on association schemes, as we shall see.

Since the span of $\mathcal{A}$ is a commutative $*$-algebra, by the spectral theorem there is another basis for this span consisting of the projections $\{P_j\}$ onto the maximal common eigenspaces. If we expand a given matrix $M$ in both bases,

$$M = \sum_i a_i A_i = \sum_j b_j P_j,$$

then we can think of $(b_j)$ as the *Fourier transform* of $(a_i)$. For an abelian group scheme as described above, the indexing sets of $\{A_i\}$ and $\{P_j\}$ can be taken to be $G$ and $\hat{G}$, respectively. With this identification, the current definition of the Fourier transform is identical to the usual one. Thus, in a general association scheme, we can think of the 0-1 matrices as playing the role of the group, while the projection matrices play the role of the dual group.

Since the set $\{P_j\}$ generates the algebra spanned by $\mathcal{A}$, it naturally carries the algebraic structure that the Pontryagin dual lacks when the group is nonabelian. This allows us to define a *hyperdifference set*, which generalizes the concept of a difference set to the setting of association schemes. For any subset $J$ of the indexing set, we define

$$P_J = \sum_{j \in J} P_j.$$

Since the projections are mutually orthogonal, it is immediate that $P_J$ is a projection. If (a suitable scalar multiple of) $P_J$ is the Gram matrix of an ETF, then we call $J$ a hyperdifference set. See our recent paper for all the details on the construction of ETFs from association schemes.

In the case of abelian group schemes, hyperdifference sets are exactly difference sets, but association schemes and hyperdifference sets generalize more than just harmonic ETFs. Indeed, any real ETF with centroidal symmetry (see this paper) is an instance of this construction, as are the ETFs constructed by Renes and Strohmer. But our goal is to use nonabelian groups to make ETFs, so let’s get on with it.

Given a finite *nonabelian* group $G$, one can construct an association scheme called the *group scheme* by taking the matrices $A_C = \sum_{g \in C} L_g$, where $L$ is the left regular representation and $C$ ranges over the conjugacy classes of $G$.

In this case, the matrix is the Gram matrix of an ETF with vectors in a space of dimension .

There are several known infinite families of difference sets in abelian groups, and thus there are infinite families of harmonic ETFs. However, until our recent paper, there was no known infinite family of ETFs constructed as group frames generated by a nonabelian group. In our construction, the groups are formed by a “twisted” cross product of two vector spaces over the field with two elements. It turns out that these are instances of *Suzuki 2-groups* (see this paper). We then construct a set of irreducible characters satisfying the condition above, thus giving a hyperdifference set in the group scheme. In the end, this gives us an infinite family of ETFs, one for each positive integer.


It’s hard to pin down what exactly the polynomial method is. It’s a technique in algebraic extremal combinatorics, where the goal is to provide bounds on the sizes of objects with certain properties. The main idea is to identify the desired cardinality with some complexity measure of an algebraic object (e.g., the dimension of a vector space, the degree of a polynomial, or the rank of a tensor), and then use algebraic techniques to estimate that complexity measure. If at some point you use polynomials, then you might say you applied the polynomial method.

What follows is a series of instances of this meta-method.

**— Linear algebraic bounds —**

In this section, we identify certain combinatorial structures with vectors in a vector space. After identifying that these vectors must be linearly independent, we conclude an upper bound on the cardinality of these structures (namely, the dimension of the vector space).

**Problem (from the Tricki).** Suppose $n$ is even. What is the maximum number of subsets of $\{1, \dots, n\}$ with the property that each has odd size and the intersection between any two has even size?

For each set $A$, consider its indicator function $\mathbb{1}_A \in \mathbb{F}_2^n$. (We let our indicator functions take values in the field of order 2 since parity plays a leading role in this problem.) We seek indicator functions of odd support that are pairwise orthogonal. In this case, orthogonality implies linear independence (why?), and so we know that the maximum possible number is at most $n$. Furthermore, we can saturate this bound by selecting all singletons, and so the bound is sharp.
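The argument can be checked in a few lines (a sketch over GF(2); `gf2_rank` is a helper name of my own):

```python
import numpy as np

def gf2_rank(M):
    """Rank of a 0-1 matrix over GF(2) via Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        piv = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] ^= M[rank]
        rank += 1
    return rank

n = 6  # n even
# The family of all n singletons: each has odd size 1, and any two intersect
# in the empty set (even size).
F = np.eye(n, dtype=int)

# Over GF(2), the Gram matrix <1_A, 1_B> = |A ∩ B| mod 2 is the identity,
# which forces linear independence of the indicators, so no valid family
# can have more than n sets.
assert np.array_equal((F @ F.T) % 2, np.eye(n, dtype=int))
assert gf2_rank(F) == n
```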

Next, consider a $(v, k, \lambda)$-balanced incomplete block design, that is, a collection of $b$ size-$k$ subsets of $\{1, \dots, v\}$, called blocks, with the property that every point belongs to exactly $r$ blocks, and every pair of distinct points is contained in exactly $\lambda$ blocks. We take $k < v$ to prevent degenerate cases.

**Fisher’s inequality.** A $(v, k, \lambda)$-balanced incomplete block design exists only if $b \geq v$.

Consider the $b \times v$ matrix $N$ whose $i$th row is the indicator function of the $i$th block. (Here, we view the entries as real numbers.) First, $k < v$ implies $r > \lambda$ (why?). Next, the definition gives $N^\top N = (r - \lambda) I + \lambda J$, where $J$ denotes the all-ones matrix. Since $N^\top N$ is positive definite, it has rank $v$, which in turn implies that $N$ has rank $v$, which is only possible if $b \geq v$.

Notice that in this case, the linearly independent vectors are the columns of $N$, as opposed to the blocks’ indicator functions. The block designs which achieve equality in Fisher’s inequality are called symmetric designs. The Fano plane is an example.
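To make the argument concrete, here is a quick numerical check on the Fano plane (a sketch; the block labeling below is one standard choice):

```python
import numpy as np

# Fano plane: v = 7 points, b = 7 blocks (lines) of size k = 3, with r = 3
# and lambda = 1 -- a symmetric design, so equality in Fisher's inequality.
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
N = np.zeros((7, 7), dtype=int)
for i, blk in enumerate(blocks):
    N[i, list(blk)] = 1

# Fisher's argument: N^T N = (r - lambda) I + lambda J is positive definite,
# so N has full column rank, forcing b >= v.
r, lam = 3, 1
expected = (r - lam) * np.eye(7, dtype=int) + lam * np.ones((7, 7), dtype=int)
assert np.array_equal(N.T @ N, expected)
assert np.linalg.matrix_rank(N) == 7
```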

The following result has a similar proof:

**The Gerzon bound.** Take unit-norm vectors $\varphi_1, \dots, \varphi_n \in \mathbb{C}^d$ such that $|\langle \varphi_i, \varphi_j \rangle|$ is constant whenever $i \neq j$. Then $n \leq d^2$.

Here, the matrices $\{\varphi_i \varphi_i^*\}_{i=1}^n$ are linearly independent in the $d^2$-dimensional real vector space of self-adjoint $d \times d$ matrices, which can be seen from their Gram matrix. The ensembles which achieve equality in the Gerzon bound are called symmetric informationally complete positive operator–valued measures (SIC-POVMs), and they are conjectured to exist for every dimension $d$.

**— Simple polynomials have simple zero sets —**

In this section, the main idea is that simple polynomials have simple zero sets. The following is the most basic instance of this idea:

**Theorem.** Let $\mathbb{F}$ be a field. A nonzero polynomial $p \in \mathbb{F}[x]$ of degree at most $k$ has at most $k$ roots.

This immediately produces a similar instance for polynomials of multiple variables:

**Corollary.** Let $\mathbb{F}$ be a field. If $p \in \mathbb{F}[x_1, \dots, x_n]$ has degree at most $k$ and vanishes on more than $k$ points of a line in $\mathbb{F}^n$, then it vanishes on the entire line.

To see this, parameterize the line in terms of a variable $t$, and then plug this parameterization into $p$ to get a polynomial in $t$ of degree at most $k$. Then by the previous theorem, this polynomial has more than $k$ roots only if it is identically zero, which in turn establishes that $p$ vanishes on the entire line. The following provides yet another example of simple multivariate polynomials having simple zero sets:

**Theorem (Alon’s Combinatorial Nullstellensatz).** Let $\mathbb{F}$ be a field, and suppose $p \in \mathbb{F}[x_1, \dots, x_n]$ has total degree $k_1 + \cdots + k_n$. If the coefficient of $x_1^{k_1} \cdots x_n^{k_n}$ is nonzero and $S_1, \dots, S_n \subseteq \mathbb{F}$ satisfy $|S_i| > k_i$ for each $i$, then there exists $s \in S_1 \times \cdots \times S_n$ such that $p(s) \neq 0$.

As an application of this result, consider the following:

**Problem (Problem 6 from IMO 2007).** Let $n$ be a positive integer, and consider

$$S = \{(x, y, z) : x, y, z \in \{0, 1, \dots, n\},\ x + y + z > 0\}$$

as a set of points in $\mathbb{R}^3$. Determine the smallest number of planes, the union of which contains $S$ but does not include $(0, 0, 0)$.

There are a couple of obvious choices of $3n$ planes, for example shifts of each coordinate plane, or shifts of the orthogonal complement of the all-ones vector. We will show that $3n$ is the smallest possible number of planes by showing that $3n - 1$ planes are insufficient. In particular, having only $3n - 1$ planes will lead to a polynomial that is too simple for its zero set to satisfy the desired constraints.

Suppose to the contrary that $3n - 1$ planes are sufficient, and let $a_i x + b_i y + c_i z = d_i$ denote the $i$th plane. Put

$$f(x, y, z) = \prod_{i=1}^{3n-1} (a_i x + b_i y + c_i z - d_i).$$

Then the zero set of $f$ is the union of the planes, and so $f$ vanishes on $S$ but not at the origin by assumption. Since we want to apply Alon’s Combinatorial Nullstellensatz, we want a polynomial whose zero set contains an entire grid, so we will modify $f$ so as to vanish over all of $\{0, 1, \dots, n\}^3$. To this end, we know that

$$g(x, y, z) = \prod_{k=1}^{n} (x - k)(y - k)(z - k)$$

is also zero on $S$, and furthermore, $g(0, 0, 0) = (-1)^n (n!)^3 \neq 0$. As such,

$$h = f - \frac{f(0, 0, 0)}{g(0, 0, 0)}\, g$$

will serve as the desired modification. This polynomial has total degree $3n$, and the coefficient of $x^n y^n z^n$ is $-f(0,0,0)/g(0,0,0) \neq 0$. Furthermore, taking $S_1 = S_2 = S_3 = \{0, 1, \dots, n\}$ satisfies the hypothesis of Alon’s Combinatorial Nullstellensatz, which then gives that $h$ fails to vanish on all of $\{0, 1, \dots, n\}^3$, a contradiction.

**— Simple sets are zero sets of simple polynomials —**

Given a sufficiently small set, there is a low-degree nonzero polynomial that vanishes on that set. For example:

**Theorem.** Let $\mathbb{F}$ be a field, and take $S \subseteq \mathbb{F}$ of size at most $k$. Then there exists a nonzero polynomial of degree at most $k$ that vanishes on $S$.

This can be proven explicitly or implicitly. For the explicit version, just take $p(x) = \prod_{s \in S} (x - s)$. For the implicit version, take the linear operator that maps a given polynomial to $\mathbb{F}^S$ by evaluating it at each point in $S$. Since the subspace of polynomials in $\mathbb{F}[x]$ of degree at most $k$ has dimension $k + 1 > |S|$, this mapping has a nontrivial nullspace, thereby implicating the desired nonzero polynomial. This latter proof can be used to obtain a more general lemma:

**Theorem.** Let $\mathbb{F}$ be a field, and take $S \subseteq \mathbb{F}^n$ of size strictly less than $\binom{n+k}{n}$. Then there exists a nonzero polynomial of degree at most $k$ that vanishes on $S$.

Indeed, the dimension of the subspace of $\mathbb{F}[x_1, \dots, x_n]$ of polynomials with degree at most $k$ is the number of monomials of degree at most $k$, which equals the number of $n$-tuples of nonnegative integers whose sum is at most $k$. A stars and bars argument then gives that this dimension is

$$\sum_{j=0}^{k} \binom{n+j-1}{n-1} = \binom{n+k}{n},$$

where the last step follows from Pascal’s identity by induction on $k$. From this, one may conclude (for example) that any two points in $\mathbb{F}^2$ lie on a line, any five points lie on a (possibly degenerate) conic section, and any nine points lie on a (possibly degenerate) cubic curve.
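The implicit argument can be carried out numerically. Here is a sketch that finds a conic through five points by computing a nullspace vector of the evaluation map (the six monomials of degree at most 2 in two variables, against five evaluation constraints):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.standard_normal((5, 2))  # any five points in R^2

def monomials(x, y):
    # Basis of the 6-dimensional space of polynomials of degree <= 2.
    return np.array([1.0, x, y, x * x, x * y, y * y])

# The evaluation map sends a conic (6 coefficients) to its values at the
# 5 points; a 5x6 matrix always has a nontrivial nullspace.
M = np.array([monomials(x, y) for x, y in pts])
_, _, Vt = np.linalg.svd(M)
coeffs = Vt[-1]  # a nullspace vector: a nonzero conic through all 5 points

residuals = M @ coeffs
assert np.allclose(residuals, 0, atol=1e-10)
```

The same recipe works in any dimension and degree: build the matrix of monomial evaluations and take any vector in its nullspace.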

This theorem was used to prove the following:

**Theorem (Finite field Kakeya conjecture, Dvir 2008).** Let $\mathbb{F}$ be a finite field, and suppose $K \subseteq \mathbb{F}^n$ contains a line in every direction (i.e., $K$ is a Kakeya set). Then $K$ has size at least $c_n |\mathbb{F}|^n$, where $c_n > 0$ depends only on $n$ (and not on $\mathbb{F}$).

More explicitly, we will show that $K$ has size at least

$$\binom{|\mathbb{F}| + n - 2}{n}.$$

(Note that this differs slightly from Terry Tao’s exposition.) To do this, we suppose to the contrary that $|K| < \binom{|\mathbb{F}| + n - 2}{n}$. Then by the previous result, there exists a nonzero polynomial $p \in \mathbb{F}[x_1, \dots, x_n]$ of degree $k \leq |\mathbb{F}| - 2$ that vanishes on $K$. Use this to form a homogeneous polynomial $P \in \mathbb{F}[x_1, \dots, x_n, s]$ of degree $k$ by multiplying each term of degree $j$ by $s^{k-j}$. Put $q = |\mathbb{F}|$. Then $P(x, 1) = p(x)$. By assumption, we have that for every direction $y \neq 0$, there exists $x$ such that $x + ty$ is in the zero set of $p$ for every $t \in \mathbb{F}$. This then implies that $(x + ty, 1)$ is in the zero set of $P$. Note that for every $u \neq 0$, the homogeneity of $P$ gives

$$P(y + ux, u) = u^k\, p(x + u^{-1} y) = 0.$$

That is, the restriction of $P$ to the line $\{(y + ux, u) : u \in \mathbb{F}\}$ in $\mathbb{F}^{n+1}$ has degree at most $k$ and is zero at $(y + ux, u)$ for every nonzero $u$. Since this makes up $q - 1 > k$ different points on a line, the corollary of the previous section gives that $P$ is also zero when $u = 0$, namely, at $(y, 0)$. Since the choice of $y$ was arbitrary, this means $P(y, 0) = 0$ whenever $y \neq 0$. But $P(\,\cdot\,, 0)$ is the polynomial made up of the terms of maximum degree in $p$, which cannot be identically zero (by Alon’s Combinatorial Nullstellensatz, for example, though this is overkill).

**— Capset bounds —**

A particularly recent application of the polynomial method made a surprising contribution to the following open problem:

**The Capset Problem.** How large can a subset $A \subseteq \mathbb{F}_3^n$ be if it contains no lines?

Such a set is known as a *capset*. The obvious bounds on the maximum size are $2^n$ and $3^n$, and with a bit of extra work, both bounds were improved. On his blog, Terry Tao suspected that the lower bound could be improved considerably further, and in one of his MathOverflow answers, he suspected that the polynomial method might be leveraged to improve the known bounds. He was recently proven both wrong and right when the polynomial method was used to establish the following:

**Theorem (Ellenberg–Gijswijt 2016).** If $A \subseteq \mathbb{F}_3^n$ contains no lines, then $|A| = O(2.756^n)$.

The proof uses ideas from Croot–Lev–Pach 2016. First observe the following identity for $x, y, z \in A$:

$$\prod_{i=1}^{n} \left(1 - (x_i + y_i + z_i)^2\right) = \begin{cases} 1 & \text{if } x = y = z, \\ 0 & \text{otherwise.} \end{cases}$$

Indeed, $x + y + z = 0$ precisely when either $\{x, y, z\}$ forms a line in $\mathbb{F}_3^n$ or $x = y = z$. Since $A$ contains no lines by assumption, both sides of the above identity are zero unless $x = y = z$, in which case both sides are 1.
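The identity can be verified directly on a small capset (a brute-force sketch in $\mathbb{F}_3^2$; the particular set below is one of many choices):

```python
from itertools import product

# A capset in F_3^2: no three distinct points sum to zero coordinatewise.
A = [(0, 0), (0, 1), (1, 0), (1, 1)]

def contains_line(S):
    # A line in F_3^n is exactly a triple of three distinct points with x+y+z = 0.
    return any(
        len({x, y, z}) == 3
        and all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z))
        for x, y, z in product(S, repeat=3)
    )

assert not contains_line(A)

# Left-hand side (indicator of x+y+z = 0 over F_3) equals the diagonal
# tensor delta_{x=y=z} once restricted to A^3.
for x, y, z in product(A, repeat=3):
    lhs = int(all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)))
    assert lhs == int(x == y == z)
```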

The above identity equates functions of the form $A \times A \times A \to \mathbb{F}_3$. View each side as a 3-way tensor whose entries lie in $\mathbb{F}_3$ and whose rows, columns and tubes are indexed by members of $A$. We will estimate a certain complexity measure of 3-way tensors known as *slice rank*: Here, a tensor is said to be decomposable if it can be expressed as

$$T(x, y, z) = f(x)\, g(y, z), \qquad T(x, y, z) = f(y)\, g(x, z), \qquad \text{or} \qquad T(x, y, z) = f(z)\, g(x, y),$$

and a tensor $T$ has slice rank $r$ if there are $r$ decomposable tensors that sum to $T$ and no fewer. As one might expect, the slice rank of a diagonal tensor is the number of nonzero diagonal entries, and so the right-hand side of the above identity has slice rank $|A|$. As such, it suffices to bound the slice rank of the left-hand side.

To this end, we view the left-hand side as a subtensor of the tensor defined by the same formula over all $x, y, z \in \mathbb{F}_3^n$, whose slice rank is an upper bound on the desired slice rank. The following lemma estimates the slice rank of this larger tensor:

**Lemma.** The slice rank of $\prod_{i=1}^{n} (1 - (x_i + y_i + z_i)^2)$ over $\mathbb{F}_3^n$ is at most $3N$, where

$$N = \#\left\{(a_1, \dots, a_n) \in \{0, 1, 2\}^n : a_1 + \cdots + a_n \leq \tfrac{2n}{3}\right\}.$$

The result then follows from analyzing the asymptotic behavior of $N$ (see this post for details). The proof of the lemma is perhaps more interesting, considering this is where polynomials actually play a role. The first step is to express the tensor as a polynomial. To do this, note that over $\mathbb{F}_3$, the function $t \mapsto 1 - t^2$ is the indicator of $t = 0$. As such, the tensor equals

$$\prod_{i=1}^{n} \left(1 - (x_i + y_i + z_i)^2\right).$$

Bounding the slice rank then amounts to decomposing the above polynomial into only a few terms, where either the $x$-, $y$-, or $z$-dependence can be factored out of each term.

To this end, consider the $x$-, $y$- and $z$-degrees of each monomial, the sum of which gives the total degree of that monomial. Observe from the polynomial’s definition that each monomial has total degree at most $2n$, implying that one of the $x$-, $y$- or $z$-degrees is at most $2n/3$. Partition the monomials according to which degree is smallest (breaking ties arbitrarily). For the moment, let’s focus on the monomials for which the $x$-degree is smallest. Then we combine like terms according to the contribution from the $x$ variables, resulting in combined terms of the form $x_1^{a_1} \cdots x_n^{a_n}\, g(y, z)$. How many such terms are there? Considering the polynomial’s definition, we know that each $a_i$ lies in $\{0, 1, 2\}$ and that $a_1 + \cdots + a_n \leq 2n/3$. Then letting $n_0$, $n_1$ and $n_2$ denote the numbers of $a_i$’s that are 0, 1 and 2, respectively, it follows that $N$ counts the total number of possible combined terms that were combined according to $x$. Since the same number arises from combining terms according to $y$ or $z$, we end up with the bound $3N$.
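The quantity $N$ in the lemma is easy to compute exactly by dynamic programming, which makes the subexponential savings visible (a sketch; `monomial_count` is a helper name of my own):

```python
def monomial_count(n):
    # Number of monomials in n variables with each individual degree in {0, 1, 2}
    # and total degree at most 2n/3 -- the quantity N in the lemma.
    limit = (2 * n) // 3
    dp = [1] + [0] * limit  # dp[t] = number of exponent tuples so far with total t
    for _ in range(n):
        new = [0] * (limit + 1)
        for t in range(limit + 1):
            for e in (0, 1, 2):
                if t + e <= limit:
                    new[t + e] += dp[t]
        dp = new
    return sum(dp)

# The bound 3N grows like a constant times 2.755^n, well below 3^n:
for n in (30, 60, 90):
    print(n, (3 * monomial_count(n)) ** (1 / n))
```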


**1. There is an ETF of 76 vectors in $\mathbb{C}^{19}$**

See this paper. Last time, I mentioned a recent proof that there is no ETF of 76 vectors in $\mathbb{R}^{19}$. It turns out that a complex ETF of this size does exist. To prove this, it actually seems more natural to view the vectors as columns of a matrix whose row vectors sum to zero. As a lower-dimensional example, consider the following matrix:

Here, the columns are all possible vectors of $\pm 1$’s that sum to zero (modulo antipodes), and in this case, they happen to form an ETF for their span, namely, the orthogonal complement of the all-ones vector. ETFs like this (where the entries are all $\pm 1$’s and the row vectors sum to zero) are particularly well-suited as *supersaturated designs*. Unfortunately, the naive generalization of this construction fails to produce ETFs. However, a generalization of sorts does exist: There is a Steiner-type construction with the incidence matrix of any finite projective plane that contains something called a *hyperoval*.

**2. Certain generalized quadrangles lead to new complex ETFs**

See this paper. For this construction, first imagine taking any incidence matrix of a Steiner system, for example:

(The blank entries denote zeros.) Here, each of the 12 rows is the indicator function of a line containing 3 points, and there are a total of 9 points (columns). This particular example is the incidence matrix of an affine plane of order 3. Every point is contained in 4 lines, and so each column has squared norm 4. Also, two points determine a line, so the supports of every pair of columns overlap in exactly one entry. As such, you can think of the columns as being equal-norm and equiangular, even if you replace each 1 with an arbitrary unimodular constant. With this freedom, we can attempt to design unimodular constants in such a way that the columns of the resulting matrix form an ETF for their span. Amazingly, this is possible:

Indeed, the above columns form an ETF for a 6-dimensional subspace of . In general, one may remove the spread from something called an * abelian generalized quadrangle* and use the remaining incidence structure as instructions for producing such an ETF. This results in an infinite family of ETFs, most of which are real (and whose strongly regular graphs were previously discovered by Godsil). However, the complex ETFs in this infinite family are new.
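The combinatorial properties claimed for the unphased incidence matrix can be generated and checked in a few lines (a sketch; the lines of the affine plane of order 3 are built as cosets of the four directions in $\mathbb{Z}_3^2$):

```python
import numpy as np
from itertools import product

# Affine plane of order 3: points are Z_3^2, and the 12 lines are the cosets
# of the 4 one-dimensional subspaces (one per direction).
points = list(product(range(3), repeat=2))
directions = [(0, 1), (1, 0), (1, 1), (1, 2)]
lines = set()
for d in directions:
    for p in points:
        lines.add(frozenset(((p[0] + t * d[0]) % 3, (p[1] + t * d[1]) % 3)
                            for t in range(3)))
lines = list(lines)

M = np.zeros((len(lines), 9), dtype=int)
idx = {p: j for j, p in enumerate(points)}
for i, line in enumerate(lines):
    for p in line:
        M[i, idx[p]] = 1

assert len(lines) == 12
# Every point lies on 4 lines, so each column has squared norm 4 ...
assert (M.sum(axis=0) == 4).all()
# ... and two points determine a line, so supports overlap in exactly one entry.
overlaps = M.T @ M
assert all(overlaps[j, k] == 1 for j in range(9) for k in range(9) if j != k)
```

Phasing the 1’s with unimodular constants preserves both of these properties, which is what makes the columns equal-norm and equiangular before tightness is even considered.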

**3. There is no ETF of 96 vectors in $\mathbb{R}^{20}$**

See this paper. Last time, I pointed to a similar paper which disproved the existence of a real ETF of 76 vectors in $\mathbb{R}^{19}$. This new paper uses similar techniques (and again, lots of computation) to establish that no real ETF of 96 vectors in $\mathbb{R}^{20}$ exists. This is the third nonexistence result for real ETFs that goes beyond the necessary integrality conditions and the Gerzon bound. Considering they are so hard to come by, I’m always happy to learn of new necessary conditions like this.

**4. There are new line packings that meet the orthoplex and lifted Toth bounds (!)**

See this paper and that paper. This is a slight stretch, since neither of these are ETF constructions, but they are similarly important because they are provably optimal packings of lines through the origin.

Recall that ETFs are known to be optimal line packings because they meet the Welch bound. There are actually a few lower bounds on coherence that packings might meet. For example, maximal mutually unbiased bases are known to be optimal packings because they meet the orthoplex bound. Recently, Bodmann and Haas used a completely different approach to find infinite families of packings that meet this bound in totally new dimensions. Their main idea is to take a large ETF with all unimodular entries and union it with an identity basis. Such a packing will be too large for the Welch bound to be sharp (due to the Gerzon bound), but it is straightforward to show that the packing meets the orthoplex bound.
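Here is a toy instance in the spirit of that idea (a sketch of my own, not claimed to be an example from their paper): the harmonic ETF of 7 vectors in $\mathbb{C}^3$ has unimodular entries, and its union with the identity basis gives $10 > 9 = d^2$ lines whose coherence is exactly $1/\sqrt{3}$.

```python
import numpy as np

# Unimodular-entry harmonic ETF: rows {1, 2, 4} of the 7-point DFT give
# 7 unit-norm columns in C^3 (quadratic-residue difference set in Z/7).
D, n = [1, 2, 4], 7
F = np.exp(-2j * np.pi * np.outer(D, np.arange(n)) / n) / np.sqrt(3)

# Union with the identity basis: 10 lines in C^3 exceed the Gerzon bound
# d^2 = 9, so the Welch bound cannot be sharp here.
P = np.hstack([F, np.eye(3)])
G = np.abs(P.conj().T @ P)
np.fill_diagonal(G, 0)

# ETF-ETF inner products have modulus sqrt(2)/3 ~ 0.471, while ETF-identity
# inner products have modulus 1/sqrt(3) ~ 0.577: the orthoplex bound value.
coherence = G.max()
assert np.isclose(coherence, 1 / np.sqrt(3))
```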

One way to prove the Welch and orthoplex bounds is to lift the vectors to the space of self-adjoint matrices, and then project onto the orthogonal complement of the identity matrix. Such a mapping sends each unit vector to a “lifted traceless” real space of dimension $d^2 - 1$, and the squared modulus of a given inner product in the original space can be expressed in terms of an inner product in the new space (no modulus). Through this mapping, line packings are converted into spherical codes, and so the Rankin bound may be applied to the lifted traceless space to produce bounds on line packings. In the special case where $d = 2$, the lifted traceless space is 3-dimensional, and spherical codes in this dimension also satisfy the so-called *Toth bound*. This bound is known to be sharp in the case of the equilateral triangle, the regular tetrahedron, the regular octahedron, and the regular icosahedron. Furthermore, each of these corresponds to an optimal line packing in $\mathbb{C}^2$ (the last of these was recently established by Casazza and Haas).


**1. Introduction to ETFs (Dustin G. Mixon)**

Given a $d$-dimensional Hilbert space and a positive integer $n$, we are interested in packing $n$ lines through the origin so that the interior angle between any two is as large as possible. It is convenient to represent each line by a unit vector that spans the line, and in doing so, the problem amounts to finding unit vectors $\varphi_1, \dots, \varphi_n$ that minimize *coherence*:

$$\mu := \max_{i \neq j} |\langle \varphi_i, \varphi_j \rangle|.$$

This minimization amounts to a nonconvex optimization problem. To construct provably optimal packings, one must prove a lower bound on $\mu$ for a given number of lines $n$ and spatial dimension $d$, and then construct an ensemble which meets equality in that bound. To date, we know of three bounds that are sharp:

- **Trivial bound.** $\mu \geq 0$, sharp only if $n \leq d$.
- **Welch bound.** $\mu \geq \sqrt{\frac{n-d}{d(n-1)}}$, sharp only if $n \leq d^2$.
- **Orthoplex bound.** $\mu \geq \frac{1}{\sqrt{d}}$, sharp only if $n \leq 2(d^2 - 1)$.

Of course, equality in the trivial bound occurs precisely when the vectors are orthogonal. It turns out that equality in the Welch bound occurs precisely when there exist constants $\alpha$ and $\beta$ such that

$$|\langle \varphi_i, \varphi_j \rangle| = \alpha \quad \text{for all } i \neq j, \qquad \sum_{i=1}^{n} \varphi_i \varphi_i^* = \beta \cdot \mathrm{Id}.$$

In words, the ensemble is $\alpha$-equiangular and $\beta$-tight, and so we call the ensemble an *equiangular tight frame* (ETF). Far less is known about ensembles that achieve equality in the orthoplex bound, though Bodmann and Haas have recently made an important stride in this direction.

ETFs were first introduced by Strohmer and Heath in 2003, and in the time since, they have proven to be notoriously difficult to construct. Still, they have found interesting connections with various combinatorial designs, and these connections have been particularly fruitful for constructing new infinite families of ETFs. In this talk we will discuss some of these success stories, paying particular attention to the research programs that led to their discovery.

**2. Nonabelian Harmonic ETFs (Joseph W. Iverson)**

A classic construction of equiangular tight frames relies on the discrete Fourier transform (DFT) matrix of a finite abelian group $G$. Each column of the DFT corresponds to an element of $G$, and each row gives the values of a homomorphism from $G$ to the unit circle, also called a *character*. In particular, the entries of the DFT are unimodular. The DFT is a scalar multiple of a unitary matrix, so if we pull out any subset of rows, the resulting short, fat matrix will be an equal-norm tight frame. We obtain an ETF, called a *harmonic ETF*, precisely when the chosen rows correspond to a difference set.

Recent papers by the speaker (here) and by Thill and Hassibi (there) suggest a generalization of this procedure for a *non*abelian group $G$. In this setting, we replace characters with irreducible unitary representations $\pi \colon G \to U(d_\pi)$, where $d_\pi$ is the dimension of the representation and $U(d_\pi)$ is the group of $d_\pi \times d_\pi$ unitary matrices. As in the abelian case, we can list the values of the irreducible representations in one, big matrix. The only difference is that now it will be convenient to scale the values of each $\pi$ by $\sqrt{d_\pi}$. When $G$ is nonabelian, for instance, we get a matrix like the following:

Like before, we pull out any set of rows:

Once we collapse the columns, we get an equal-norm tight frame:

Thill and Hassibi have a way of picking the rows in this process that ensures the resulting frame has low coherence. Let $\hat{G}$ be the set of irreducible representations of $G$, up to unitary equivalence. Any group of automorphisms of $G$ acts on $\hat{G}$ by precomposition:

Fix an irreducible representation $\pi$, and let $\mathcal{O}$ be the orbit of $\pi$ under this action. Now choose *all* of the rows of the matrix that correspond to the representations in the orbit $\mathcal{O}$, and make the resulting tight frame. The main result of Thill and Hassibi puts an upper bound on the coherence of this frame. In general, these will not be equiangular tight frames, but in at least one (abelian) example, Thill and Hassibi recover an ETF.

**3. Polyphase ETFs and abelian generalized quadrangles (Matthew Fickus)**

We discuss a new way to construct ETFs which involves signing/phasing the incidence matrix of a balanced incomplete block design (BIBD). As we’ll see, these phased BIBD ETFs are naturally represented as the columns of a rank-deficient, tall-skinny matrix. In this form, it will be obvious that these vectors are equiangular but hard to see that they form a tight frame for their span. This contrasts with many other known constructions of ETFs, such as harmonic ETFs and Steiner ETFs, where tightness is obvious but equiangularity is not.

To date, we have constructed three infinite families of phased BIBD ETFs, and one of these contains an infinite number of new complex ETFs. For all of these, what we actually construct is a matrix obtained from a BIBD’s incidence matrix by replacing each of its nonzero entries with a monomial in the ring of polynomials (the convolution algebra) over a finite abelian group. Evaluating the matrices at any nontrivial character of this group produces a phased BIBD ETF.

Being matrices with polynomial entries, any such generalized ETF is an example of a polyphase matrix of a filter bank. As we will explain, the filter banks corresponding to our polyphase BIBD ETFs are closely related to special types of combinatorial designs known as generalized quadrangles (GQs). GQs have a rich literature, and we explain how each of our three infinite families of phased BIBD ETFs relate to it.

Our construction is also related to another recently introduced method for constructing complex ETFs, namely the one given in the recent paper “Equiangular lines and covers of the complete graph” by Coutinho, Godsil, Shirazi and Zhan. Their construction generalizes a well-known connection between real ETFs and strongly regular graphs (SRGs), identifying certain types of complex ETFs with abelian distance-regular antipodal covers of complete graphs (DRACKNs).

Overall, we will see that certain special types of filter banks simultaneously yield ETFs, abelian GQs and abelian DRACKNs. We also discuss some partial converses to these results. For example, the existence of a certain type of phased BIBD ETF actually implies the existence of such a filter bank, which in turn implies the existence of certain GQs and DRACKNs. This opens up some exciting new possibilities for future research: for a long time, frame theory has been leveraging the rich literature of combinatorial designs in order to construct ETFs; these results allow us to use frame theory to prove new results in combinatorial design.

**4. Maximal ETFs by combinatorial techniques (John Jasper)**

The most famous open problem in the study of equiangular tight frames (ETFs) concerns *maximal ETFs*, that is, ETFs with vectors in . To those studying quantum information theory, the collection of outer products of a maximal ETF is called a symmetric, informationally complete positive operator-valued measure (SIC-POVM).

Zauner’s original conjecture was actually more detailed regarding how one can obtain a SIC-POVM. In particular, he conjectured that for each there is a maximal ETF formed by taking the orbit of a single vector, called a fiducial vector, under the action of the Heisenberg group. A great deal of effort has been put into proving Zauner’s conjecture. Exact solutions are known for dimensions , and numerical solutions are known for . With one exception, these solutions are all obtained by finding a fiducial vector for the Heisenberg group.

There is some good evidence that the group theoretic approach suggested by Zauner may have some fundamental limitations (see this blog entry). Looking at the study of ETFs in general, we see that ETFs generated by a group are important, for example, harmonic ETFs. However, there are several more constructions that are more purely combinatorial in nature. These include Steiner ETFs, Kirkman ETFs, Tremain ETFs, and many more. In this talk, we will discuss some alternate approaches to finding maximal ETFs. One of the smallest examples of a Steiner ETF is actually a maximal ETF in dimension 3, and this is the only instance where these combinatorial constructions have yielded a maximal ETF so far. However, we have recently seen some tantalizing evidence that some previously unknown combinatorial structure is lurking inside of maximal ETFs, illuminating new connections to existing constructions. Our hope is that with some work, we can leverage this combinatorial structure into new constructions of ETFs, perhaps even new maximal ones.

**5. Achieving the orthoplex bound and constructing weighted complex projective 2-designs with Singer sets (Nathaniel Hammen)**

(Based on this paper by Bernhard G. Bodmann and John Haas)

In many situations, we desire a unit-norm frame that has a small maximum magnitude among pairwise inner products. According to a bound by Welch, equiangular tight frames are the minimizers for the maximum magnitude of pairwise inner products. However, in a dimensional Hilbert space, the Welch bound is only achievable if the number of vectors in the frame is at most . If the number of vectors in the frame is larger than this, then the orthoplex bound serves as an alternative to the Welch bound.

In analogy with the Welch bound, the orthoplex bound is only achievable if the number of vectors in the frame is at most . In this talk we show that if a unit-norm frame has a maximum magnitude of pairwise inner products that is less than or equal to the orthoplex bound and there exists a basis such that the inner products between a frame vector and a basis vector are all identical, then the frame formed by the union of these two sets satisfies the orthoplex bound. If the initial frame is tight, then the orthoplectic frame will also be tight. In particular, the union of an equiangular tight frame made up of at least vectors with such a basis will always form a tight orthoplectic frame.
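To make this concrete (an illustrative sketch with d = 3, where the orthoplex bound is 1/sqrt(d)): unioning the 7-vector harmonic ETF in C^3 with the standard basis gives 10 > d^2 = 9 unit vectors whose coherence meets the orthoplex bound, and the union remains tight.

```python
import numpy as np

d, n = 3, 7
D = [1, 2, 4]  # difference set in Z_7, giving a 7-vector ETF in C^3
omega = np.exp(2j * np.pi / n)
Phi = omega ** np.outer(D, np.arange(n)) / np.sqrt(d)  # unimodular entries / sqrt(d)

# union with the standard basis: 10 > d^2 = 9 vectors, so the Welch bound
# is out of reach and the orthoplex bound 1/sqrt(d) is the benchmark
U = np.hstack([Phi, np.eye(d)])

G = np.abs(U.conj().T @ U)
np.fill_diagonal(G, 0)
coherence = G.max()
assert np.isclose(coherence, 1 / np.sqrt(d))  # orthoplex bound is achieved

# the union is still a tight frame
assert np.allclose(U @ U.conj().T, (10 / d) * np.eye(d))
```

Here the ETF's internal coherence sqrt(2)/3 is smaller than the cross inner products 1/sqrt(3), which all have identical modulus since the harmonic ETF has unimodular entries, exactly as the theorem requires.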

Two families of such orthoplectic frames are constructed using cyclic frames generated by difference sets and relative difference sets. When is a prime power, we obtain a tight frame with vectors, and when is a prime power, we obtain a tight frame with vectors. In addition, the orthoplectic frames that are constructed can also be shown to be weighted 2-designs. These are useful in quantum state tomography.

**6. Optimal subspace packings (Dustin G. Mixon)**

While the previous talks have discussed packing 1-dimensional subspaces in a finite-dimensional Hilbert space, in this talk, we will pack higher-dimensional subspaces. In the 1-dimensional case, we considered the interior angle between two lines, also known as the *principal angle*. When the subspaces are higher-dimensional, which angle shall we attempt to maximize?

In the higher-dimensional case, there are actually several principal angles to work with. Given subspaces $\mathcal{U}$ and $\mathcal{V}$, the principal angles are defined iteratively: find the unit vectors $u_1 \in \mathcal{U}$ and $v_1 \in \mathcal{V}$ that maximize $\langle u, v \rangle$, and then for each $j \geq 2$, find a unit vector $u_j \in \mathcal{U}$ orthogonal to $u_1, \ldots, u_{j-1}$ and a unit vector $v_j \in \mathcal{V}$ orthogonal to $v_1, \ldots, v_{j-1}$ that together maximize $\langle u_j, v_j \rangle$. Then the $j$th principal angle between $\mathcal{U}$ and $\mathcal{V}$ is $\theta_j = \arccos \langle u_j, v_j \rangle$.

Now that we have principal angles, we can use them to define a worthy packing objective. When Conway, Hardin and Sloane first wrestled with this, they discussed several alternatives, and found that the so-called *chordal distance* was the easiest to work with theoretically:

$$\operatorname{dist}_c(\mathcal{U}, \mathcal{V}) = \Big( \sum_{j=1}^{k} \sin^2 \theta_j \Big)^{1/2},$$

where $\mathcal{U}$ and $\mathcal{V}$ are $k$-dimensional and $\theta_1, \ldots, \theta_k$ are the principal angles. Later, Dhillon, Heath, Strohmer and Tropp introduced another notion of distance, called the *spectral distance*:

$$\operatorname{dist}_s(\mathcal{U}, \mathcal{V}) = \min_{1 \leq j \leq k} \sin \theta_j.$$

When this latter notion of distance is large for all pairs in a given ensemble of subspaces, the ensemble is particularly well-suited for the compressed sensing of block-sparse signals (see this paper).
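In code (a small sketch over real subspaces; the complex case would use the conjugate transpose), the principal angles come from the singular values of the cross-Gram matrix of orthonormal bases:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (ascending) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # descending cosines
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def chordal_dist(A, B):
    return np.sqrt(np.sum(np.sin(principal_angles(A, B)) ** 2))

def spectral_dist(A, B):
    return np.min(np.sin(principal_angles(A, B)))

e = np.eye(4)
A = e[:, :2]                              # span{e1, e2}
B = e[:, 2:]                              # span{e3, e4}
C = np.stack([e[:, 0], e[:, 2]], axis=1)  # span{e1, e3}

assert np.isclose(chordal_dist(A, B), np.sqrt(2))  # two right angles
assert np.isclose(spectral_dist(A, B), 1.0)
assert np.isclose(chordal_dist(A, C), 1.0)         # shared line: angles 0 and 90 degrees
assert np.isclose(spectral_dist(A, C), 0.0)
```

Note how the spectral distance vanishes the moment two subspaces share a line, while the chordal distance still registers the remaining right angle.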

In this talk, we will discuss subspace packings which are chordal- and spectral-distance optimal. In particular, we will discuss generalizations of the Welch bound, as well as known constructions.


Suppose you’re given a sample from an unknown balanced mixture of spherical Gaussians of equal variance in dimension :

In the above example, and . How do you estimate the centers of each Gaussian from the data? In this paper, Dasgupta provides an algorithm in which you project the data onto a randomly drawn subspace of some carefully selected dimension so as to concentrate the data points towards their respective centers. After doing so, there will be extremely popular regions of the subspace, and for each region, you can average the corresponding points in the original dataset to estimate the corresponding Gaussian center. With this algorithm, Dasgupta proved that

provided .

By contrast, if you ask a data scientist to solve this problem, he might instinctively use the k-means algorithm. How does this alternative perform? As I mentioned in a previous blog post, I have a recent paper with Soledad Villar and Rachel Ward that analyzes an SDP relaxation of the k-means problem. It turns out that the solution to the relaxation can be interpreted as a denoised version of the original dataset, sending many of the data points very close to what appear to be k-means-optimal centroids. This is similar in spirit to Dasgupta’s random projection, and we use a similar rounding scheme to estimate the optimal k-means-centroids. Using these centroid estimates as Gaussian center estimates, we are able to prove performance bounds of the form when , meaning the performance doesn’t depend on the dimension, but rather the model order.
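For a feel of the k-means route (a minimal Lloyd's-algorithm sketch with a greedy farthest-point initialization and illustrative parameters, not the SDP from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
k, dim, m = 4, 10, 500                      # hypothetical model order, dimension, points per Gaussian
centers = 8.0 * np.eye(dim)[:k]             # well-separated centers (illustrative)
X = np.vstack([c + rng.standard_normal((m, dim)) for c in centers])

# plain Lloyd's algorithm with greedy farthest-point initialization
C = [X[0]]
for _ in range(k - 1):
    d2 = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
    C.append(X[np.argmax(d2)])
C = np.array(C)
for _ in range(50):
    labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(axis=-1), axis=1)
    C = np.array([X[labels == j].mean(axis=0) for j in range(k)])

# each true Gaussian center should be well approximated by some centroid
err = max(np.min(np.linalg.norm(C - c, axis=1)) for c in centers)
assert err < 1.0
```

With this much separation the centroids land near the true centers; the interesting regime in the post is what happens as the separation shrinks relative to the noise.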

But does it make sense that the performance should even depend on the model order? This blog entry shows that it does seem to make sense, provided you’re using k-means to estimate the Gaussian centers.

To do so, we need to introduce a couple of new concepts. First, a **stable isogon** is a finite subset of such that

- ,
- the symmetry group acts transitively on , and
- for each , the stabilizer has the property that

Examples include the Platonic solids:

For this blog entry, the most important example will be the extreme points of the unit $\ell_1$-ball (i.e., the **orthoplex**).

For the second new concept, consider a balanced mixture of spherical Gaussians of equal variance centered at the points of a stable isogon. For each , let denote the corresponding Voronoi region. Then the **Voronoi means** are given by

As an example, take to be the standard orthoplex in two dimensions (each point having unit distance from the origin). Varying and plotting the corresponding Voronoi means with circles gives the following:
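(The original plot is not reproduced here; the following is a Monte Carlo sketch of the same experiment, with illustrative parameter values.)

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])  # 2-d orthoplex
sigma = 1.0      # illustrative; the original post varies this
m = 50000        # Monte Carlo samples per component

X = np.vstack([c + sigma * rng.standard_normal((m, 2)) for c in centers])
# assign each sample to the Voronoi region of the nearest center
labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
vmeans = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])

# by symmetry, each Voronoi mean is a positive scalar multiple of its center
scales = np.array([vmeans[j] @ centers[j] for j in range(len(centers))])
assert scales.min() > 0
assert np.allclose(vmeans, scales[:, None] * centers, atol=0.05)
```

Rerunning this for a range of sigma values traces out how the common scalar drifts with the variance.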

Apparently, as the variance grows, the Voronoi means are pushed away from the true Gaussian centers. Notice that when , the mixture’s density function looks a lot like a single Gaussian centered at the origin. This suggests that there isn’t much signal in the density for determining the true Gaussian centers. Indeed, if you sample 100 points from each Gaussian and then run k-means on all 400 points, then the results will vary. I ran this experiment 30 times and plotted the results:

Here, the Voronoi means are plotted in red. If we ramp up the number of points per Gaussian from 100 to 1000, then the k-means output varies quite a bit less:

Taking this to the extreme, we replace 1000 with 10,000 to get

Apparently, the k-means-optimal centroids converge to the Voronoi means in this case. We conjecture that this behavior is exhibited for general stable isogons:

**The Voronoi Means Conjecture.** Draw points from a balanced mixture of spherical Gaussians of equal variance centered at points in a stable isogon. Then the k-means-optimal centroids converge in probability to the Voronoi means as .

See the paper for the technical version of this conjecture statement. With this conjecture in mind, we return to our original task: Show the necessity of -dependence in the performance of k-means-optimal centroids as estimates for Gaussian centers. To this end, VMC establishes that the Voronoi means serve as a worthy proxy for the k-means-optimal centroids, and so it suffices to show how Voronoi means differ from the Gaussian centers. As we saw above, they differ more as grows, and so we should expect some -dependence in the MSE. In the above example, the Voronoi means were positive scalar multiples of the Gaussian centers, and this is actually a consequence of being a stable isogon:

**Theorem.** For each stable isogon , there exists such that for every .

In the special case where is an orthoplex in the first dimensions of , then one may write out a formula for as a (nasty) integral. In analyzing this integral, one can show that if the points in the orthoplex have norm , then is a monotonically increasing function of , and when , . (This last bit essentially comes from the infinity norm of a Gaussian vector with iid components.) With the help of this comparison, one can prove the following:

**Theorem.** If is a -dimensional orthoplex, then for every , either

Overall, VMC implies that either MSE or SNR exhibits -dependence. Of course, the necessary dependence is logarithmic in , whereas the current theory with the SDP exhibits dependence which is polynomial in , so it remains to determine the exact nature of this dependence. Of course, this theory depends on VMC, so it would be nice to know if this conjecture is even true:

**I am offering a US$100 award for either a proof of the Voronoi Means Conjecture or strong evidence against it.**

Similar to my previous award announcement, I have the following disclaimer:

*This award has no time limit other than my death, and is entirely at my discretion. I might, also at my discretion, decide to split the award among several people or groups, or give a smaller award for partial progress. I don’t promise to read every claimed proof that’s emailed to me. The prize amount will not be adjusted for inflation.*

For reference, a large class of stable isogons were studied by Broome and Waldron in this paper (they focused on the case where the symmetry group is irreducible). What other stable isogons are there? Also, do other stable isogons yield a stronger -dependence? When investigating other stable isogons, it would be straightforward to numerically check the validity of VMC, and perhaps this will lead to strong evidence against VMC (and a hundred bucks!).


First, some notation: Denote the translation-by-$a$ and modulation-by-$b$ operators by

$$(T_a f)(x) = f(x - a), \qquad (M_b f)(x) = e^{2\pi i b x} f(x),$$

respectively. Then the formal conjecture statement is as follows:

**The HRT Conjecture.** For every nonzero $f \in L^2(\mathbb{R})$ and every finite $\Lambda \subseteq \mathbb{R}^2$, the collection $\{ M_b T_a f : (a,b) \in \Lambda \}$ is linearly independent.

What follows are some of the popular methods for tackling HRT:

**— Transform methods —**

Consider the set of nonzero-scaled time-frequency shift operators . Now consider the group of linear transformations such that , that is, conjugating with permutes the members of . These are called *metaplectic transforms*. Trivial examples include translation and modulation operators. Then for any counterexample to HRT

one may apply to both sides to get another counterexample:

As such, applying will modify the function , the time-frequency shifts , as well as the scalar multiples in the combination . It is helpful to visualize as an operation on the plane of pairs , called *phase space*. Here, shifts the plane to the right by , whereas shifts the plane up by . Since the Fourier transform converts modulations to translations (and translations to inverse modulations), it is also an example of a metaplectic transform, and it acts on phase space by a 90-degree rotation. In fact, any rotation can be achieved by a fractional Fourier transform. Also, dilation will stretch phase space along the translation axis, and produce the inverse stretch along the modulation axis. Finally, chirp modulation shears phase space. Overall, metaplectic transforms can be leveraged to apply any area-preserving affine transformation of phase space.
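Under standard conventions (a sketch; normalizations and signs vary by author), these actions on phase space read:

```latex
\begin{align*}
\mathcal{F} &: (a,b) \mapsto (b, -a) && \text{(Fourier transform: 90-degree rotation)} \\
D_c &: (a,b) \mapsto (c\,a,\; b/c) && \text{(dilation: stretch along time, compress along frequency)} \\
C_c &: (a,b) \mapsto (a,\; b + c\,a) && \text{(chirp modulation: shear)}
\end{align*}
```

Together with translations and modulations, these generate all area-preserving affine maps of the plane.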

As an example application of metaplectic transforms, we will show that translations of any nonzero function in are linearly independent. Suppose

Then applying the Fourier transform gives

At this point, we can factor out and restrict our attention to the support of , which has positive measure since . As such, the trigonometric polynomial must be zero on a set of positive measure, which is only possible if the trigonometric polynomial is trivial.
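Assembled in symbols (a reconstruction of the stripped displays, assuming the standard conventions above):

```latex
\sum_{k=1}^{N} c_k \, f(x - a_k) = 0
\quad \xrightarrow{\;\mathcal{F}\;} \quad
\hat{f}(\xi) \sum_{k=1}^{N} c_k \, e^{-2\pi i a_k \xi} = 0 .
```

Since $\hat{f} \neq 0$ on a set of positive measure while a nontrivial exponential sum of this form vanishes only on a null set, every $c_k$ must be zero.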

Metaplectic transforms are also useful for making “without loss of generality” statements. For example, if you want to prove HRT for any configuration of four points in phase space, you may apply metaplectic transforms to have without loss of generality.

Before moving on to the next section, I want to discuss two proofs that are related to the above proof. First, it is known that HRT holds whenever is a subset of a lattice in . (In particular, HRT holds for any three points.) In the special case where the lattice is the integer lattice, one may apply the Zak transform to convert time-frequency shifts into two-dimensional modulations. In fact, the proof of this case is nearly identical to the above proof that pure translations are independent.

We’ve been restricting the set of time-frequency shifts in a couple of ways in order to chip away at HRT, but you can also restrict the function class instead. For example, suppose has compact support and

If the translations are all the same in , then we are done by the previous analysis. Otherwise, let and be the two largest distinct translations in , and let denote the essential supremum of the support of . Then we can isolate the terms of the form by restricting the support to . These terms are all modulations of , and so the scalars are all zero. Repeating inductively then gives that HRT holds for functions of compact support.

Intuitively, the proof for compactly supported functions has the look and feel of row operations of a matrix, and as such, the proof is very much married to the time domain. Perhaps not surprisingly, this proof does not generalize to much broader classes of functions, but there are other time-domain methods that have found substantial success.

**— Time-domain methods —**

To illustrate the main idea here, let’s prove that two translations are linearly independent, but let’s do so without appealing to the Fourier transform. For simplicity (and without loss of generality), we consider translation by -1:

Intuitively, this means that for each , the samples with form an exponential. As such, either blows up to the left or to the right, or has constant modulus, but it’s certainly not in $L^2$ in any case.
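Presumably the stripped display reads as follows, for scalars $c_0, c_1$ not both zero (if $c_1 = 0$ then $f = 0$ immediately):

```latex
c_0 f(x) + c_1 f(x+1) = 0
\quad \Longrightarrow \quad
f(x+n) = \Big( -\frac{c_0}{c_1} \Big)^{\! n} f(x) \quad \text{for all } n \in \mathbb{Z} .
```

Hence $|f(x+n)| = |c_0/c_1|^n \, |f(x)|$ either blows up in one direction or is constant along integer shifts, which is incompatible with $0 \neq f \in L^2(\mathbb{R})$.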

Perhaps surprisingly, slight generalizations of this particular analysis seem rather difficult: Take two parallel lines in phase space, and consider two points on each line. Then you can write a similar recurrence relation to the above, but instead of , you get a quotient of trigonometric polynomials. It turns out that this quotient is sufficiently well behaved that you can consider it to be “essentially constant” in an ergodic sense, and so the desired conclusion can be made. See this paper for the (rather technical) details.

In general, it is difficult to analyze the exact recurrence relations that come from a dependency between time-frequency shifts. It is often much easier to analyze a “recurrence inequality” that comes from taking absolute values of both sides of the recurrence relation, and then passing through the triangle inequality. The following argument serves as a helpful example:

Suppose has the property that for each , is ultimately decreasing (examples include Gaussians and other Hermite functions). Importantly, this implies that whenever , then for all sufficiently large , and so

for all sufficiently large . In fact, since is arbitrary, we may conclude that

Now that we understand the function class that belongs to, suppose

and let be the distinct translations in . Then there exist trigonometric polynomials such that

Since is ultimately nonzero, we may divide and take absolute values to get a “recurrence inequality”:

for almost every sufficiently large . By , and since the trigonometric polynomials are bounded, we see that the right-hand side goes to zero on a set of full measure as gets large, which is not the sort of behavior exhibited by a nontrivial trigonometric polynomial. As such, we conclude that , and applying this argument inductively then gives that the entire linear combination is trivial.

Analyzing a recurrence inequality is fundamentally different from analyzing recurrence relations, since the analysis is inherently one-sided: we only take to be large (due to the inequality going one way) instead of also seeing what happens when gets large. As such, instead of identifying that must exhibit some sort of exponential blowup in one direction or another, we are forced to merely establish that at best, decays exponentially. This would be fine if we were to assume that belongs to a function class with faster-than-exponential decay, since this would allow us to derive a contradiction. However, this appears to be a fundamental barrier with this proof technique. I don’t think it will enable us to tackle all of , or even Schwartz space.

To be clear, we don’t yet have a proof that HRT holds for all functions of faster-than-exponential decay. Above, we showed that HRT holds for the monotone functions of such decay, and this paper tackles all functions with slightly faster decay: for every , as .

**— Other methods —**

In this section, we discuss a few other methods that have been used to tackle HRT, and could very well lead to a full proof of HRT.

**1. Spectral methods.** Notice that HRT is a statement about the kernels of finite linear combinations of time-frequency shift operators, since

precisely when is in the kernel of . But suppose this combination of operators exhibits another eigenvalue . Then

meaning the time-frequency shifts produce an HRT counterexample. As such, HRT is equivalent to every finite linear combination of time-frequency shift operators having an empty point spectrum. This suggests an operator-theoretic approach to HRT. Some results along these lines are available here and there.
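Concretely (a sketch of the stripped displays): since the identity operator is itself a time-frequency shift, namely $M_0 T_0$, any nonzero eigenvalue yields a dependence:

```latex
\Big( \sum_{j=1}^{N} c_j \, M_{b_j} T_{a_j} \Big) f = \lambda f
\quad \Longrightarrow \quad
\sum_{j=1}^{N} c_j \, M_{b_j} T_{a_j} f \;-\; \lambda \, M_0 T_0 f = 0 ,
```

which is a nontrivial linear combination of time-frequency shifts of $f$ (after adjoining the point $(0,0)$ if it is not already present).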

**2. Perturbation methods.** Suppose HRT holds for a given and for a given collection of points in phase space. Then it is known that sufficiently small perturbations of also satisfy HRT with , and furthermore, sufficiently small Euclidean perturbations of satisfy HRT with . As such, one could conceivably prove HRT by establishing explicit bounds on the sizes of allowable perturbations. For example, when perturbing , it would help to have control of the lower Riesz bound of .

**3. Passing to .** Let denote the multiplicative group

It is helpful to identify . Consider functions in and translation operator given by for every . It is a folklore theorem that HRT is equivalent to translations of nonzero functions in being linearly independent. As part of the workshop, we ironed out the extent to which these statements are equivalent (they are “nearly equivalent”), and passing to appears to be a reasonable strategy to tackle HRT. Considering the role that the Fourier transform plays in demonstrating the linear independence of pure translations over , this suggests that one ought to study a Fourier transform over .

**— HRT subproblems —**

To help illustrate the current boundary of modern techniques, this section provides a few subproblems, each of which is implied by HRT.

**Problem 1.** Is HRT true for ?

This problem highlights the fact that we don’t yet have a proof of HRT for the special case where . The cases where can be handled by the fact that HRT holds whenever is a finite subset of a lattice in . By contrast, when , we only have results for very specific (and notably non-generic) configurations.

**Problem 2.** Is HRT true for every finite ?

To solve this problem, one is inclined to exploit time-domain techniques, specifically recurrence relations (not recurrence inequalities!). By dilation tricks, this would imply HRT for arbitrary rectangular lattices (which is a known result, but not particularly easy to prove).

**Problem 3.** Is HRT true for functions with faster-than-exponential decay?

Recall that recurrence inequalities appear to be incapable of proving HRT for functions of exponential-or-slower decay. It would be nice to know if these techniques can “live up to their potential.”

**Problem 4.** Is HRT true for hyperbolic secant?

Recall that . Interestingly, the Fourier transform of hyperbolic secant is a dilated version of itself. As such, this function has exponential decay in both time and frequency, and is therefore a natural candidate to illustrate the boundary of the time-domain methods.


Afonso S. Bandeira, Nicolas Boumal, Vladislav Voroninski

As the title suggests, this paper provides strong performance guarantees for the Burer-Monteiro heuristic in the particular cases of synchronization and community detection. I was very excited to see this paper, and so I interviewed one of the authors (Nicolas Boumal). I’ve lightly edited his responses for formatting and hyperlinks:

**DGM:** The Burer-Monteiro heuristic for solving SDPs has been around for well over a decade. Why has it taken so long for the theory to catch up? Are the techniques you leverage particularly new?

**NB:** One part may be that the original SDPLR papers by Burer-Monteiro aimed at solving general SDP’s with few equality constraints (even though the numerical experiments focused on a few specific types of problems, among which the Max-Cut SDP shines particularly). The rank-2 phenomenon that we observe in the paper is not general. So perhaps the first audience of these papers was not that interested in this result.

As for execution, we rely mainly on two existing assets. Both are fairly recent.

For insight, we use the 2010 paper by Journée, Bach, Absil, and Sepulchre, who observed the importance (both algorithmic and theoretical) of the smooth manifold structure of the rank restricted search space; that is: when the convex search space of the SDP is restricted to matrices of bounded rank, the resulting (nonconvex) set can be parameterized by a smooth manifold. This is not true for general SDP’s, but it is true here, and we believe this is a great facilitator. (It is perhaps even required in some sense, but it is too soon to tell.) Of course, in our setting, the manifold turns out to be particularly simple (it is a product of circles), and it doesn’t take a lot of technical know-how to exploit this structure; it was used to some extent before 2010 of course. But I do believe that framing the nonconvex optimization problem in the larger setting of optimization on manifolds, despite the simplicity of the specific manifold at hand, has helped us get better intuition; this wider vision we credit to Journée et al. We also think this can help identify a more general setup in which SDPLR is likely to work well; something that is missing for now.

For the more technical steps, such as establishing strong correlation of the ground truth with second-order critical points, we relied mostly on the proof program successful in a previous 2014 paper where we show tightness of the SDP relaxation for phase synchronization, and also more recently in 2016 for a nonconvex approach to that one: it is the same SDP as the one we treat here, but with complex numbers. It is interesting that the same proof ingredients work both to study tightness of the SDP and to establish that nonconvex approaches to solving it can be successful.

**DGM:** Your guarantees give conditions under which all second-order critical points are good-enough solutions to synchronization or community detection. What are the underlying proof ideas for these results?

**NB:** The nonconvex search space is the manifold of matrices of size whose rows each have unit norm. This indeed ensures that is positive semidefinite with unit diagonal: the constraints of the SDP; and also that the rank is bounded by 2.

Assuming further satisfies first- and second-order optimality conditions, and calling the ground truth signal, we wanted to show that is large (much larger than what one would get by random chance). The necessary optimality conditions for the optimization problem on the manifold are simple: the Riemannian (or projected) gradient of the cost function must vanish (first order), and the Riemannian Hessian must be negative semidefinite since we are maximizing (second order). One thing which is clear is that both conditions must be used to reach a strong conclusion. The manifold structure makes these conditions particularly simple to obtain and manipulate. (In all generality, second-order KKT conditions are rather painful to work with…)

We start by showing that second-order conditions alone already imply some (weak) correlation. One way to understand this is that the second-order condition, even though it is purely local, turns into a more global statement in the following way. By Lemma 7, if satisfies second-order conditions, , where is the cost matrix. Take the inner product of this matrix inequality with any matrix whose diagonal is (this is the case for all feasible points of the SDP), and you get an inequality of the type . This readily implies that the cost at is not too bad, and hence that must correlate with the dominant component of , namely, .

The measurement model is , where is the ground truth vector and is the perturbation. For -synchronization, the perturbation is assumed to be unrelated to and small enough, so the dominant eigenvector of relates to and the above explanation works out easily. For community detection on the other hand, the perturbation relates to and could have large operator norm. It is then important to recognize that the components of which grossly violate the constraints on can easily be discarded, as described in the paper. This is done automatically by the SDP, and also by the nonconvex program.

Then, to further obtain truly strong correlation of the second-order critical points with , we show that points which satisfy second-order conditions *and* first-order conditions must be close to rank 1. When they are exactly rank 1, they are globally optimal, as we show in the exact recovery parts of the paper. Being approximately rank 1 turns out to already help control for high correlation.
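As a toy illustration of the rank-2 approach discussed above (a hypothetical numpy sketch with illustrative parameters, not the authors' code): projected gradient ascent over matrices with unit-norm rows, followed by a spectral rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
z = rng.choice([-1.0, 1.0], size=n)                  # ground truth signs
C = np.outer(z, z) + 0.4 * rng.standard_normal((n, n))
C = (C + C.T) / 2                                    # symmetric cost matrix

# Burer-Monteiro: maximize <C, YY^T> over n-by-2 matrices with unit-norm rows
Y = rng.standard_normal((n, 2))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
step = 0.5 / np.linalg.norm(C, 2)
for _ in range(1000):
    Y = Y + step * (2 * C @ Y)                       # Euclidean gradient step
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # retract to the product of circles

# round via the leading eigenvector of YY^T and compare with the ground truth
w, V = np.linalg.eigh(Y @ Y.T)
x_hat = np.sign(V[:, -1])
corr = abs(x_hat @ z) / n
assert corr > 0.9
```

At this noise level the critical point reached by plain ascent already correlates almost perfectly with the ground truth, consistent with the benign-landscape message of the paper.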

**DGM:** Your nontrivial correlation–type results require the SNR to be bigger than 8, and your exact-recovery result under the stochastic block model requires SNR to scale like . What do you think are the true transitions, and why is it difficult to take the theory down to these transitions?

**NB:** The term for the exact recovery result is definitely a limitation of our proof technique. It is the same limitation that arises in this and that paper, mentioned earlier. In the present paper, the culprit is isolated in Lemma 13. Numerical experiments both for the exact recovery result and for the isolated lemma suggest that the real transition should be a polylog function. In our proofs, we only ever use two properties of the perturbation matrix , namely, a bound on the value of the SDP if the cost matrix is replaced by (which can be replaced by the operator norm of for -synchronization) and a bound on the -norm of the alignment of with the ground truth , that is, . But in our experiments, noise is generated following a much more structured random model. Probably, the true transitions can only be established if the random structure is exploited more finely.

As for community detection in the stochastic block model with constant average degree, it is already known that for it is not possible to get nontrivial correlation with the ground truth (this is the information theoretic limit). It is also known, and these are recent results (see here and here), that the SDP does provide nontrivial correlation as soon as . It is quite remarkable that there is no gap there. Now, it is tempting to believe that the Burer-Monteiro approach should work when the SDP works… But our current proof falls a bit short of that, and requires . Our sentiment at this stage is that it might be reasonably easy to reduce the constant to 4, or maybe even 2 with more care in the inequalities. But getting all the way down to 1 may prove to be a bigger challenge.

**DGM:** Are there any barriers to applying your proof ideas to other SDPs?

**NB:** For now, our analysis relies rather strongly on the manifold structure of the rank-restricted search space. While this structure is present in a number of important SDPs, it is definitely not a mild condition. This also means we cannot accommodate inequality constraints at the moment. In practice, methods such as the original SDPLR algorithm by Burer and Monteiro can be applied even without the manifold structure, but there are no guarantees. Empirical results are also mixed, so it is not clear whether this is a limitation of the proof techniques, of the algorithms, or a genuine limitation of the approach. Down the line, we would also like to handle nonlinear cost functions, but then we lose the guarantee that there will be an optimal extreme point. That's problematic, because the most general tool we know of to control the rank of optimal solutions is the Pataki-Barvinok theorems, which control the rank of extreme points. For applications where there are other reasons for the solutions to be low-rank nonetheless (as is the case here), extra work is needed. A number of things have to line up well for this to be within reach.

On the bright side, these results should extend rather directly to synchronization of rotations in $\mathrm{SO}(d)$, which is similar to $\mathbb{Z}_2$-synchronization but with the constraint that diagonal *blocks* of $X$ are $d \times d$ identity matrices. This paper gives deterministic results for the low-rank approach to work there. The deterministic result gets nowhere near allowing for a mere rank relaxation, but numerical experiments with a Gaussian noise model show that this is indeed sufficient even for large noise, and I expect the proof techniques laid out in the present paper to extend to that setting as well. (In fact, we know that most parts do.)


Let’s start with two motivating applications:

The first application comes from graph clustering. Consider the stochastic block model, in which the $n$ vertices are secretly partitioned into two communities, each of size $n/2$, and edges between vertices of a common community are drawn iid with some probability $p$, and all other edges are drawn with probability $q < p$. The goal of community estimation is to estimate the communities given a random draw of the graph. For this task, you might be inclined to find the maximum likelihood estimator for this model, but this results in an integer program. Relaxing the program leads to a semidefinite program, and amazingly, this program is tight and recovers the true communities with high probability when $p = \alpha\log(n)/n$ and $q = \beta\log(n)/n$ for good choices of $\alpha$ and $\beta$. (See this paper.) These edge probabilities scale like $\log(n)/n$, the threshold for connectivity of Erdos-Renyi graphs, and this makes sense since we wouldn't know how to assign vertices in isolated components. If instead, the probabilities were to scale like $1/n$, then we would be in the "giant component" regime, so we'd still expect enough signal to correctly assign a good fraction of the vertices, but the SDP is not tight in this regime.
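To make the model concrete, here is a quick sketch that draws a two-community graph and estimates the partition; for brevity it uses the leading eigenvector of the centered adjacency matrix as a cheap stand-in for the SDP, and the parameters are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a two-community stochastic block model (illustrative parameters).
n, p, q = 200, 0.5, 0.05
x = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])  # planted communities
P = np.where(np.outer(x, x) > 0, p, q)                   # edge probabilities
U = rng.random((n, n))
A = np.triu((U < P).astype(float), k=1)
A = A + A.T                                              # symmetric adjacency, zero diagonal

# Spectral stand-in for the SDP: sign pattern of the leading eigenvector
# of the centered adjacency matrix.
B = A - (p + q) / 2
w, V = np.linalg.eigh(B)
xhat = np.sign(V[:, -1])

# Agreement with the planted partition (up to a global sign flip).
agreement = max(np.mean(xhat == x), np.mean(xhat == -x))
print(agreement)
```

With this much separation between $p$ and $q$, the estimate agrees with the planted partition on nearly every vertex.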

As a second application, consider geometric clustering with the k-means objective. This can be formulated as an integer program, which can in turn be relaxed to an SDP. Recently, two papers (one and two) have investigated the performance of this SDP under the so-called stochastic ball model, which draws points from a mixture of $k$ different translates of a rotation-invariant distribution. In particular, if the distribution has compact support and the different translates have at least a little space between their supports, then the SDP relaxation recovers the planted clustering with high probability (exactly how little space suffices remains an open problem). Unfortunately, real-world data may not match such a nice model. To tackle messier data, one might study the performance of the SDP under a Gaussian mixture model, but the SDP is not tight for this model.
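Here is a small sketch of the stochastic ball model with two unit balls in the plane; Lloyd's algorithm is used as a cheap stand-in for the k-means SDP, and the separation between the centers is an illustrative choice, not the open-problem threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ball(center, m):
    """Draw m points uniformly from the unit ball centered at `center`."""
    pts = rng.normal(size=(m, 2))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # uniform directions
    radii = rng.random(m) ** 0.5                        # uniform in the disk
    return pts * radii[:, None] + center

# Two unit balls whose centers are 4 apart, so the supports have a gap of 2.
X = np.vstack([sample_ball(np.array([0.0, 0.0]), 50),
               sample_ball(np.array([4.0, 0.0]), 50)])
labels_true = np.repeat([0, 1], 50)

# Lloyd's algorithm as a stand-in for the k-means SDP relaxation.
centers = X[[0, 50]].copy()                             # one seed per ball
for _ in range(20):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# Agreement with the planted clustering (up to relabeling the two clusters).
agreement = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
print(agreement)
```

With supports this well separated, the planted clustering is recovered exactly.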

In both applications, the optimization has the following form:

maximize $\langle \text{data}, x \rangle$ subject to $x$ feasible

The problem is that when the data is noisy, the SDP relaxation fails to be tight. While this phenomenon may seem foreign to folks who study compressed sensing or phase retrieval, it is quite common in operations research (e.g., try solving the travelling salesman problem). The fix is to round to an integer solution, which in turn produces an approximation ratio: The optimal integer solution may be better than the rounded solution, but it’s definitely no better than the relaxed solution. As such, you have an approximation of the form

val(SDP) $\geq$ val(IP) $\geq$ val(rounded)
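This sandwich is easy to see on a toy instance. The sketch below uses a hypothetical max-cut-style problem, with a brute-forced integer program, the eigenvalue bound standing in for the relaxation value, and sign rounding of the top eigenvector.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Toy "relax and round": maximize x' L x / 4 over x in {-1, +1}^n.
# The eigenvalue bound n * lambda_max(L) / 4 plays the role of the
# relaxation value here.
n = 8
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # random graph adjacency
L = np.diag(A.sum(axis=1)) - A               # graph Laplacian

def cut(x):
    return x @ L @ x / 4                     # size of the cut induced by x

# Integer program by brute force over all 2^n sign vectors.
val_ip = max(cut(np.array(x)) for x in itertools.product([-1, 1], repeat=n))

# Relaxation value and a rounded feasible point.
w, V = np.linalg.eigh(L)
val_relax = n * w[-1] / 4
x_round = np.sign(V[:, -1])
x_round[x_round == 0] = 1
val_round = cut(x_round)

print(val_round <= val_ip <= val_relax)      # True
```

The rounded point is feasible for the integer program, and every integer point is feasible for the relaxation, so the chain holds by construction.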

However, our clustering tasks ask for a different sort of approximation guarantee: We want our rounded solution (clustering) to be close in some sense to the planted clustering. Since our rounded solution comes from the relaxed solution, and since we know the SDP recovers the planted clustering when the data is less noisy, this means we actually want the following sort of Lipschitz-type approximation:

arg(SDP with noisy data) $\approx$ arg(SDP with less noisy data)

The only guarantee of this sort that I know of comes from Guedon and Vershynin’s paper, which analyzes the minimum bisection SDP under the stochastic block model in the regime where $p$ and $q$ scale like $1/n$. In this blog entry, I boil down the main ideas so as to facilitate their use in future applications. (Also, this captures the main ideas in the approximation portion of our k-means analysis.)

Let $K$ be a convex set in some Euclidean space $E$, and for each linear objective $z \in E$, put

$x_z := \arg\max_{x \in K} \langle x, z \rangle.$

We’d like to show that if $z$ and $\tilde{z}$ are close in some $K$-specific sense, then $x_z$ is close to $x_{\tilde{z}}$. Here, $z$ represents noisy data, $\tilde{z}$ is less noisy, and so $x_z$ is the solution we get, whereas $x_{\tilde{z}}$ is the planted solution we wish we got. It turns out that the following norm provides a useful notion of distance:

$\|w\|_K := \sup_{x \in K} |\langle x, w \rangle|.$

This is called the support function of the convex hull of $K \cup -K$. It’s a norm on the span of $K$, and it’s the dual of the atomic norm generated by $K$. What follows is the approximation we use:
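When $K$ is the convex hull of finitely many atoms, both the maximizer $x_z$ and the norm $\|z\|_K$ reduce to scans over the atoms (a linear functional over a polytope is maximized at a vertex). A minimal sketch, using the unit 1-ball in the plane as the illustrative choice of $K$:

```python
import numpy as np

# K = unit 1-ball in R^2, i.e., the convex hull of these four atoms.
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])

def x_of(z):
    """A maximizer of <x, z> over K (attained at an atom)."""
    return atoms[np.argmax(atoms @ z)]

def K_norm(z):
    """||z||_K = sup_{x in K} |<x, z>|, the support function of conv(K u -K)."""
    return np.max(np.abs(atoms @ z))

z = np.array([2.0, 1.0])
print(x_of(z))    # [1. 0.]
print(K_norm(z))  # 2.0
```

For this choice of $K$, the atomic norm is the 1-norm and $\|\cdot\|_K$ comes out as its dual, the sup-norm.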

**Theorem (cf. Guedon–Vershynin).** For every $z$ and $\tilde{z}$ with $\|x_z\|_2 \leq \|x_{\tilde{z}}\|_2$, we have

$\|x_z - x_{\tilde{z}}\|_2^2 \leq 4 \inf_{c > 0} \|cz - x_{\tilde{z}}\|_K.$

Strangely, this is somewhat different from the Lipschitz-type property we wanted: The result states that $x_z$ is close to $x_{\tilde{z}}$ in Euclidean norm if some scalar multiple of $z$ is close to $x_{\tilde{z}}$ in $K$-norm (and if $x_{\tilde{z}}$ is more “geometrically extreme” than $x_z$). In both applications, geometric extremeness holds whenever $x_{\tilde{z}}$ is integral, and this makes intuitive sense, since this is saying that $x_{\tilde{z}}$ lies in a more pointy portion of $K$ than $x_z$ does (assuming $K$ is centered at zero, which it certainly is if we pass to $\mathrm{conv}(K \cup -K)$). It’s instructive to see how this bound behaves in the cases where $K$ is the unit 1- or 2-ball in two dimensions. In both cases, the bound is shockingly loose when $z$ and $\tilde{z}$ are close. As such, it may be interesting to investigate an alternative bound that better captures the left-hand side in the small-deviation regime. Regardless, this bound produces good results for both graph clustering and k-means clustering. In these applications, it is convenient to bound the right-hand side with a norm that is more accessible by random matrix–type methods (e.g., Guedon and Vershynin apply a PSD version of Grothendieck’s inequality to pass to the $\ell_\infty \to \ell_1$ induced norm, and then apply a union-bound estimate).
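The looseness is easy to see numerically for the 2-ball, where $x_z = z/\|z\|_2$ and $\|\cdot\|_K$ is the Euclidean norm. The sketch below assumes the bound takes the form $\|x_z - x_{\tilde{z}}\|_2^2 \leq 4 \inf_{c>0} \|cz - x_{\tilde{z}}\|_K$ (the constant 4 is illustrative); for unit vectors at angle $\theta$, the left-hand side scales like $\theta^2$ while the right-hand side scales like $\theta$.

```python
import numpy as np

# K = unit 2-ball in the plane: x_z = z / ||z||, and ||.||_K = ||.||_2.
def lhs_rhs(theta):
    z = np.array([1.0, 0.0])
    zt = np.array([np.cos(theta), np.sin(theta)])
    x_z, x_zt = z, zt                   # unit vectors are their own maximizers
    lhs = np.sum((x_z - x_zt) ** 2)     # = 2 - 2 cos(theta), about theta^2
    rhs = 4 * abs(np.sin(theta))        # 4 * inf_c ||c z - x_zt||_2 = 4 |sin(theta)|
    return lhs, rhs

for theta in [0.5, 0.1, 0.01]:
    lhs, rhs = lhs_rhs(theta)
    print(theta, lhs, rhs, rhs / lhs)   # ratio blows up as theta -> 0
```

The infimum over $c$ is just the distance from $x_{\tilde{z}}$ to the line spanned by $z$, which is $|\sin\theta|$; so the ratio between the two sides grows like $4/\theta$ in the small-deviation regime.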

*Proof of Theorem:* Fix $c > 0$. Since $x_z$ and $x_{\tilde{z}}$ both lie in $K$, the definition of the $K$-norm gives

$\langle x_{\tilde{z}} - cz, x_{\tilde{z}} - x_z \rangle \leq |\langle x_{\tilde{z}} - cz, x_{\tilde{z}} \rangle| + |\langle x_{\tilde{z}} - cz, x_z \rangle| \leq 2\|cz - x_{\tilde{z}}\|_K.$

Next, since $x_z$ maximizes $\langle x, z \rangle$ over $x \in K$ and $x_{\tilde{z}} \in K$, we have $\langle cz, x_{\tilde{z}} - x_z \rangle \leq 0$, and so

$\langle x_{\tilde{z}}, x_{\tilde{z}} - x_z \rangle \leq \langle x_{\tilde{z}} - cz, x_{\tilde{z}} - x_z \rangle \leq 2\|cz - x_{\tilde{z}}\|_K.$

Finally, the hypothesis $\|x_z\|_2 \leq \|x_{\tilde{z}}\|_2$ gives

$\|x_z - x_{\tilde{z}}\|_2^2 = \|x_z\|_2^2 - 2\langle x_z, x_{\tilde{z}} \rangle + \|x_{\tilde{z}}\|_2^2 \leq 2\|x_{\tilde{z}}\|_2^2 - 2\langle x_z, x_{\tilde{z}} \rangle = 2\langle x_{\tilde{z}}, x_{\tilde{z}} - x_z \rangle \leq 4\|cz - x_{\tilde{z}}\|_K.$

Taking the infimum over $c > 0$ gives the result. $\Box$
