Perhaps you heard on Gil Kalai’s blog, but the 54-year-old Kadison-Singer problem was recently solved by Adam Marcus, Dan Spielman and Nikhil Srivastava. The paper is fantastically short, and comes as a sequel to another recent paper which used similar methods to solve another important problem: For every degree , there exists an infinite family of -regular bipartite Ramanujan graphs. Considering the gravity of these results, it’s probably wise to get a grasp of the techniques that are used to prove them. The main technique they use (“the method of interlacing families”) is actually new, and so I decided to write a blog post to discuss it (and force myself to understand it).
— 1. A new probabilistic method —
The main idea is that you’re given a list of polynomials, and you’re asked to establish that the largest real root of one of these polynomials is smaller than something. This is very similar to the sort of problem that the probabilistic method could be used to solve: Suppose you have a huge collection of real numbers, and you want to see how small these numbers can be. Define a random variable to be uniformly distributed over and compute the expected value. Then you know that there is some member of which is no larger than . Another way to say this is the minimum is always less than or equal to the average. Functional analysts use this pigeonhole principle-type argument all the time without consciously acknowledging that they are using the probabilistic method. Of course, the probabilistic method can be formulated as “just counting,” but oftentimes, it is easier to think in terms of probability.
Now if you’re given a collection of polynomials, you might be inclined to define a uniform distribution over this collection, and then find the “average” polynomial, that is, the polynomial whose th coefficient is the average of the th coefficients. But you’re asked to establish that there is a polynomial in the collection whose largest real root is smaller than something. How can you relate this to the roots of the average polynomial? In general, there is no correspondence, but the following lemma reveals an interesting case in which something can actually be said:
Lemma (Lemma 4.2 in Part I, essentially). Take a finite collection of polynomials , each with all real coefficients, a positive leading coefficient, and at least one real root. Let and be the largest and second largest real roots (respectively) of ; if only has one real root, take to be . If
then has a real root (say, ) which satisfies .
Note that the average polynomial is just a scaled version of , which will have the same roots as . In practice, you will use the largest real root of as the upper bound on . The coolest thing about this result is you could assign the proof as homework in an basic real analysis class.
Proof of Lemma: View each polynomial as a function over the real line. Since the leading coefficient of is positive, there exists such that for every and every . Taking , we then have . We claim that . Suppose otherwise, and consider two cases:
Case I: There exists such that . Then either is a root of in , or has a root in by the intermediate value theorem. This contradicts the fact that the second largest root of satisfies .
Case II: is strictly positive over . Note that is also strictly positive over , since otherwise implies that is not the largest root by the intermediate value theorem. Since is differentiable, exists, and we must have , since otherwise there is a point in a neighborhood of for which . Writing out the definition of the derivative, and taking to be the polynomial such that , we have
As such, is also a root of , and so . Combined with , we then have that , which contradicts the assumption that .
To summarize up to this point, we have for every , where , and also for every . By addition, it then follows that , meaning has a root by the intermediate value theorem.
Note that I presented this result differently from the original, mostly because I wanted to weaken the hypothesis to improve my understanding. The main takeaway is you want there to be a “caste system” on the top roots in order for this new probabilistic method to work.
— 2. A useful condition —
Consider the following definition:
Definition. Take a finite collection of polynomials of degree with all real roots. Let denote the roots of . We say the polynomials have a common interlacing if there exists such that
for every .
Note that having a common interlacing implies that , and so such polynomials satisfy the hypothesis of the previous lemma (provided the leading coefficients are positive). Moreover, such polynomials are rather common, at least in linear algebra. Theorem 4.3.10 of Horn and Johnson’s Matrix Analysis gives the following example: Consider all self adjoint matrices such that deleting the last row and column produces the same matrix . Then the characteristic polynomials of these matrices have a common interlacing (the ‘s in this case can be taken to be the eigenvalues of ).
Not only does this notion of common interlacing naturally occur in linear algebra, it also can be verified by testing the hypothesis in the following lemma:
Lemma (Proposition 1.35 in this paper, essentially). Take polynomials and of degree with positive leading coefficients such that, for every , the polynomial has real roots. Then and have a common interlacing.
Proof: We only treat the special case where the roots of and are all distinct (the degenerate cases can be treated using basic calculus-type arguments). By assumption, and each have real roots, which form points on the real line, and their complement in the real line forms open intervals. Viewing and as functions over the real line, note that their signs are constant over each of these open intervals (this follows from the intermediate value theorem). In fact, ordering these intervals from right to left as , then since the polynomials have positive leading coefficients, both and are positive on . Next, the largest of the roots belongs to either or (but not both), and so and have opposite sign over . Continuing in this way, and have common sign over even-indexed intervals and opposite sign over odd-indexed intervals. As such, addition gives that for any , every real root of must lie in an odd-indexed interval. Moreover, the (real) roots are continuous in , and so the boundary of each odd-indexed interval had better match a root of with a root of . Picking a point from each of the even intervals then produces a common interlacing of and .
Corollary. Take a finite collection of polynomials of degree with positive leading coefficients such that every convex combination of these polynomials has real roots. Then the polynomials have a common interlacing.
Proof: By the lemma, any pair has a common interlacing. This in mind, suppose does not have a common interlacing. Then there is some such that the th largest root of is smaller than the st largest root of , contradicting the fact that and have a common interlacing.
In discerning whether the new probabilistic method is valid for a given sequence of polynomials, the above result reduces this problem to verifying the real-rootedness of a continuum of polynomials. While this may seem particularly cumbersome at first glance, Marcus, Spielman and Srivastava have found success in verifying this (in both papers) using a theory of real stable polynomials. This leads to the most technical portions of both papers, and I plan to study these portions more carefully in the future.
— 3. The method of interlacing families —
In practice, the polynomials you are given may not have a common interlacing, but you might be able to organize them into something called an interlacing family.
Definition. Let be finite sets, and use the -tuples in to index polynomials. In particular, let each be a real-rooted degree- polynomial with a positive leading coefficient. For every and , define
For the degenerate case where , we take to be the sum of all of the ‘s. We say the polynomials form an interlacing family if for every and , the polynomials
have a common interlacing.
If you think of as being a random polynomial drawn uniformly from the collection , then
The other polynomials defined above can be thought of in terms of conditional expectation. That is, suppose you are told that the first indices of are . Then
Unfortunately, this is the extent of my intuition with interlacing families. I would like to relate them to eigensteps, for example, but interlacing families are very different because they involve partial sums of polynomials (as opposed to characteristic polynomials of partial sums of rank-1 matrices). Regardless, by the following result, interlacing families are just as useful as polynomials with a common interlacing:
Theorem (Theorem 4.4 in Part I). Suppose form an interlacing family. Then there exists an -tuple such that the largest root of is at most the largest root of .
Proof: By assumption, has a common interlacing, and so by the first lemma, there is a polynomial (, say) whose largest root is at most the largest root of
Next, we know has a common interlacing, and so by the first lemma, there is a polynomial (, say) whose largest root is at most the largest root of
which in turn is no larger than the largest root of . The result follows by induction.
At this point, we provide the two-step “method of interlacing families” which MSS use in both papers:
- Prove that the given polynomials form an interlacing family (using real stable polynomials).
- Find and produce an upper bound on its largest root.
— 4. How to prove Kadison-Singer —
It was a little disingenuous of me to suggest in the previous section that the polynomials are “given” to you. Indeed, there is a bit on ingenuity that goes in to actually finding the polynomials of interest. For the sake of example (and because it motivated me to write this blog entry in the first place), I will sketch how Marcus, Spielman and Srivastava use the method of interlacing families to prove Kadison-Singer.
Actually, they prove an equivalent version of Kadision-Singer called Weaver’s conjecture: There exists , and such that every finite-dimensional -tight frame with frame elements of norm at most 1 can be partitioned into subcollections of vectors such that the frame operator of each subcollection has operator norm at most . Apparently, it suffices to take , and . This follows from the following result:
Theorem (Corollary 1.3 in Part II). Let be column vectors in such that and for all . Then there exists a partition such that
for both and .
Even after seeing the result they proved, it’s not obvious what the polynomials are, but we’re getting there. Let be independent random column vectors in . Each has entries, so think of the top and bottom halves of this vector to define the distribution: With probability , the top half is and the bottom half is all zeros, and the rest of the time the top half is all zeros and the bottom half is . In this way, each vector is randomly lifted to a doubly-dimensional vector (qualitatively, this bears some resemblance to the random 2-lifts of graphs described in Part I).
Notice that is self adjoint and positive semidefinite, and so its operator norm is precisely the largest root of its characteristic polynomial. Since there are possible realizations of , we have a total of characteristic polynomials to analyze; these are the polynomials we are “given.” In Section 4 of Part II, they use real stable polynomials to show that these polynomials form an interlacing family. The most technical portion of Part II (namely, Section 5) is dedicated to bounding the largest root of in this case, and they manage to bound it by .
With this, the last theorem implies that the largest root of one of the polynomials is at most . In other words, there exists a subset such that
As such, the result follows by taking .