An introduction to interlacing families

Perhaps you heard it on Gil Kalai’s blog: the 54-year-old Kadison-Singer problem was recently solved by Adam Marcus, Dan Spielman and Nikhil Srivastava. The paper is fantastically short, and comes as a sequel to another recent paper which used similar methods to solve another important problem: For every degree d>2, there exists an infinite family of d-regular bipartite Ramanujan graphs. Considering the gravity of these results, it’s probably wise to get a grasp of the techniques that are used to prove them. The main technique they use (“the method of interlacing families”) is actually new, and so I decided to write a blog post to discuss it (and force myself to understand it).

— 1. A new probabilistic method —

The main idea is that you’re given a list of polynomials, and you’re asked to establish that the largest real root of one of these polynomials is smaller than something. This is very similar to the sort of problem that the probabilistic method could be used to solve: Suppose you have a huge collection S of real numbers, and you want to see how small these numbers can be. Define a random variable X to be uniformly distributed over S and compute the expected value. Then you know that there is some member of S which is no larger than \mathbb{E}[X]. Another way to say this is the minimum is always less than or equal to the average. Functional analysts use this pigeonhole principle-type argument all the time without consciously acknowledging that they are using the probabilistic method. Of course, the probabilistic method can be formulated as “just counting,” but oftentimes, it is easier to think in terms of probability.

Now if you’re given a collection of polynomials, you might be inclined to define a uniform distribution over this collection, and then find the “average” polynomial, that is, the polynomial whose kth coefficient is the average of the kth coefficients. But you’re asked to establish that there is a polynomial in the collection whose largest real root is smaller than something. How can you relate this to the roots of the average polynomial? In general, there is no correspondence, but the following lemma reveals an interesting case in which something can actually be said:

Lemma (Lemma 4.2 in Part I, essentially). Take a finite collection of polynomials \{f_i(x)\}_i, each with real coefficients, a positive leading coefficient, and at least one real root.  Let R_i and r_i be the largest and second largest real roots (respectively) of f_i(x); if f_i(x) has only one real root, take r_i to be -\infty. If

\displaystyle{\max_j r_j\leq \min_j R_j},

then f_\emptyset(x):=\sum_j f_j(x) has a real root (say, R) which satisfies \min_j R_j \leq R.

Note that the average polynomial is just a scaled version of f_\emptyset(x), which will have the same roots as f_\emptyset(x). In practice, you will use the largest real root of f_\emptyset(x) as the upper bound on \min_j R_j.  The coolest thing about this result is you could assign the proof as homework in a basic real analysis class.
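Before the proof, here’s a quick numerical sanity check one could run (Python with numpy assumed; the two quadratics are my own toy example, chosen so that the hypothesis holds):

import numpy as np

# Two hypothetical quadratics with a "caste system" on their top roots:
# second-largest roots {1, 2} all sit below the largest roots {5, 6},
# so max_j r_j = 2 <= 5 = min_j R_j and the lemma applies.
f1 = np.polynomial.Polynomial.fromroots([1, 5])   # (x-1)(x-5)
f2 = np.polynomial.Polynomial.fromroots([2, 6])   # (x-2)(x-6)
f_sum = f1 + f2                                   # f_emptyset

R = max(f_sum.roots().real)
print(R)               # about 5.44, a real root of f_emptyset
print(min(5, 6) <= R)  # True: min_j R_j <= R, as the lemma promises

Sure enough, f_1(x) has largest root 5, which is at most the largest root (about 5.44) of the sum.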

Proof of Lemma: View each polynomial f_i(x) as a function over the real line. Since the leading coefficient of f_i(x) is positive, there exists y_1 such that f_i(y)>0 for every y\geq y_1 and every i.  Taking y_0:=\min_j R_j, we then have r_i\leq y_0 \leq R_i\leq y_1. We claim that f_i(y_0)\leq 0. Suppose otherwise, and consider two cases:

Case I: There exists y\in(y_0,R_i) such that f_i(y)\leq 0. Then either y is a root of f_i(x) in (y_0,R_i), or f_i(x) has a root in (y_0,R_i) by the intermediate value theorem. This contradicts the fact that the second largest root of f_i(x) satisfies r_i\leq y_0.

Case II: f_i is strictly positive over (y_0,R_i). Note that f_i is also strictly positive over (R_i,\infty), since otherwise f_i(y_1)>0 implies that R_i is not the largest root by the intermediate value theorem. Since f_i is differentiable, f_i'(R_i) exists, and we must have f_i'(R_i)=0, since otherwise there is a point y in a neighborhood of R_i for which f_i(y)<0. Writing out the definition of the derivative, and taking g_i(x) to be the polynomial such that f_i(x)=(x-R_i)g_i(x), we have

\displaystyle{0=f_i'(R_i)=\lim_{y\rightarrow R_i}\frac{f_i(y)-f_i(R_i)}{y-R_i}=\lim_{y\rightarrow R_i}\frac{f_i(y)}{y-R_i}=g_i(R_i)}.

As such, R_i is also a root of g_i(x), and so r_i=R_i.  Combined with r_i\leq y_0\leq R_i, we then have that y_0=R_i, which contradicts the assumption that f_i(y_0)>0.

To summarize up to this point, we have f_i(y_0)\leq 0 for every i, where y_0:=\min_j R_j, and also f_i(y_1)>0 for every i.  By addition, it then follows that f_\emptyset(y_0)\leq 0<f_\emptyset(y_1), meaning f_\emptyset(x) has a root R\in[y_0,y_1) by the intermediate value theorem.     \Box

Note that I presented this result differently from the original, mostly because I wanted to weaken the hypothesis to improve my understanding. The main takeaway is you want there to be a “caste system” on the top roots in order for this new probabilistic method to work.

— 2. A useful condition —

Consider the following definition:

Definition. Take a finite collection of polynomials \{f_i(x)\}_i of degree n with all real roots. Let \beta_1^{(i)}\leq\cdots\leq\beta_n^{(i)} denote the roots of f_i(x). We say the polynomials \{f_i(x)\}_i have a common interlacing if there exists \{\alpha_k\}_{k=1}^{n-1} such that

\displaystyle{\beta_1^{(i)}\leq \alpha_1\leq \beta_2^{(i)}\leq \alpha_2\leq \cdots\leq \alpha_{n-1}\leq \beta_n^{(i)}}

for every i.

Note that \{f_i(x)\}_i having a common interlacing implies that \max_j\beta_{n-1}^{(j)}\leq \alpha_{n-1}\leq \min_j\beta_n^{(j)}, and so such polynomials satisfy the hypothesis of the previous lemma (provided the leading coefficients are positive). Moreover, such polynomials are rather common, at least in linear algebra. Theorem 4.3.10 of Horn and Johnson’s Matrix Analysis gives the following example: Consider all self-adjoint n\times n matrices for which deleting the last row and column produces the same (n-1)\times(n-1) matrix A. Then the characteristic polynomials of these matrices have a common interlacing (the \alpha_k's in this case can be taken to be the eigenvalues of A).
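Here is a quick numerical illustration of this example (numpy assumed; the matrices are random and hypothetical):

import numpy as np

rng = np.random.default_rng(1)

# A fixed symmetric 3x3 block A; each trial extends it to a symmetric 4x4
# matrix with an arbitrary last row and column.  The eigenvalues of A act
# as the alpha_k's in a common interlacing of the extensions' spectra.
n = 4
A = rng.standard_normal((n - 1, n - 1))
A = (A + A.T) / 2
alphas = np.linalg.eigvalsh(A)           # alpha_1 <= ... <= alpha_{n-1}

for trial in range(3):
    B = np.zeros((n, n))
    B[:n-1, :n-1] = A
    B[n-1, :] = rng.standard_normal(n)   # new last row...
    B[:, n-1] = B[n-1, :]                # ...mirrored to keep B symmetric
    betas = np.linalg.eigvalsh(B)        # beta_1 <= ... <= beta_n
    print(all(betas[k] <= alphas[k] <= betas[k+1] for k in range(n - 1)))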

Not only does this notion of common interlacing naturally occur in linear algebra, it also can be verified by testing the hypothesis in the following lemma:

Lemma (Proposition 1.35 in this paper, essentially). Take polynomials f(x) and g(x) of degree n with positive leading coefficients such that, for every \lambda\in[0,1], the polynomial \lambda f(x)+(1-\lambda)g(x) has n real roots. Then f(x) and g(x) have a common interlacing.

Proof: We only treat the special case where the 2n roots of f(x) and g(x) are all distinct (the degenerate cases can be treated using basic calculus-type arguments).  By assumption, f(x) and g(x) each have n real roots, which form 2n points on the real line, and their complement in the real line forms 2n+1 open intervals. Viewing f and g as functions over the real line, note that their signs are constant over each of these open intervals (this follows from the intermediate value theorem). In fact, ordering these intervals from right to left as \{I_k\}_{k=0}^{2n}, then since the polynomials have positive leading coefficients, both f and g are positive on I_0. Next, the largest of the 2n roots belongs to either f(x) or g(x) (but not both), and so f and g have opposite signs over I_1. Continuing in this way, f and g have a common sign over even-indexed intervals and opposite signs over odd-indexed intervals. As such, addition gives that for any \lambda\in(0,1), every real root of \lambda f(x)+(1-\lambda)g(x) must lie in an odd-indexed interval. Moreover, the (real) roots are continuous in \lambda, and so the boundary of each odd-indexed interval had better match a root of f(x) with a root of g(x). Picking a point from each of the bounded even-indexed intervals then produces a common interlacing of f(x) and g(x).     \Box
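In practice, one might test the lemma’s hypothesis numerically along these lines (numpy assumed; the grid of \lambda’s and the tolerance are ad hoc choices of mine, so this is a heuristic check rather than a proof):

import numpy as np

P = np.polynomial.Polynomial

def convex_combos_real_rooted(f, g, grid=101, tol=1e-8):
    # Heuristic test of the hypothesis: every lambda*f + (1-lambda)*g on a
    # grid of lambdas should have (numerically) real roots.
    for lam in np.linspace(0, 1, grid):
        if np.max(np.abs((lam * f + (1 - lam) * g).roots().imag)) > tol:
            return False
    return True

f = P.fromroots([0, 2, 4])   # the roots of f and g alternate,
g = P.fromroots([1, 3, 5])   # so a common interlacing exists
print(convex_combos_real_rooted(f, g))   # True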

Corollary. Take a finite collection of polynomials \{f_i(x)\}_i of degree n with positive leading coefficients such that every convex combination of these polynomials has n real roots. Then the polynomials \{f_i(x)\}_i have a common interlacing.

Proof: By the lemma, any pair has a common interlacing. With this in mind, suppose \{f_i(x)\}_i does not have a common interlacing. Then there are some i,j,k such that the kth largest root of f_i(x) is smaller than the (k+1)st largest root of f_j(x), contradicting the fact that f_i(x) and f_j(x) have a common interlacing.     \Box

The above result reduces the problem of discerning whether the new probabilistic method is valid for a given collection of polynomials to that of verifying the real-rootedness of a continuum of polynomials. While this may seem particularly cumbersome at first glance, Marcus, Spielman and Srivastava have found success in verifying this (in both papers) using a theory of real stable polynomials. This leads to the most technical portions of both papers, and I plan to study these portions more carefully in the future.

— 3. The method of interlacing families —

In practice, the polynomials you are given may not have a common interlacing, but you might be able to organize them into something called an interlacing family.

Definition. Let S_1,\ldots,S_m be finite sets, and use the m-tuples in S_1\times \cdots\times S_m to index |S_1|\cdots|S_m| polynomials. In particular, let each f_{s_1,\ldots,s_m}(x) be a real-rooted degree-n polynomial with a positive leading coefficient. For every k<m and s_1\in S_1,\ldots, s_k\in S_k, define

\displaystyle{f_{s_1,\ldots,s_k}(x):=\sum_{s_{k+1}\in S_{k+1}}\cdots\sum_{s_m\in S_m}f_{s_1,\ldots,s_m}(x)}.

For the degenerate case where k=0, we take f_\emptyset(x) to be the sum of all of the f_{s_1,\ldots,s_m}(x)'s. We say the polynomials \{f_{s_1,\ldots,s_m}(x) \}_{s_1,\ldots,s_m} form an interlacing family if for every k<m and s_1\in S_1,\ldots, s_k\in S_k, the polynomials

\displaystyle{\{f_{s_1,\ldots,s_k,t}(x) \}_{t\in S_{k+1}}}

have a common interlacing.

If you think of f(x) as being a random polynomial drawn uniformly from the collection \{f_{s_1,\ldots,s_m}(x) \}_{s_1,\ldots,s_m}, then

\displaystyle{\mathbb{E}[f(x)]=\frac{1}{|S_1|\cdots|S_m|}f_\emptyset(x)}.

The other polynomials defined above can be thought of in terms of conditional expectation. That is, suppose you are told that the first k indices of f(x) are s_1,\ldots,s_k. Then

\displaystyle{\mathbb{E}[f(x)|s_1,\ldots,s_k]=\frac{1}{|S_{k+1}|\cdots|S_m|}f_{s_1,\ldots,s_k}(x)}.
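To keep the indexing straight, here is a small bookkeeping sketch (numpy assumed). The random leaf polynomials are hypothetical and won’t generally form an interlacing family; the point is just to illustrate the partial sums and the expectation identity above:

import itertools
import numpy as np

P = np.polynomial.Polynomial
rng = np.random.default_rng(7)

# Index sets S_1 and S_2, and one hypothetical real-rooted cubic per leaf.
S = [(0, 1), (0, 1)]
leaves = {idx: P.fromroots(rng.uniform(0, 10, 3))
          for idx in itertools.product(*S)}

def partial(prefix):
    # f_{s_1,...,s_k}: sum the leaves over all completions of the prefix.
    return sum(leaves[prefix + rest]
               for rest in itertools.product(*S[len(prefix):]))

f_empty = partial(())
avg = sum(leaves.values()) / len(leaves)      # the "average polynomial" E[f]
print(np.allclose(f_empty.coef, (len(leaves) * avg).coef))   # True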

Unfortunately, this is the extent of my intuition with interlacing families. I would like to relate them to eigensteps, for example, but interlacing families are very different because they involve partial sums of polynomials (as opposed to characteristic polynomials of partial sums of rank-1 matrices). Regardless, by the following result, interlacing families are just as useful as polynomials with a common interlacing:

Theorem (Theorem 4.4 in Part I). Suppose \{f_{s_1,\ldots,s_m}(x)\}_{s_1,\ldots,s_m} form an interlacing family. Then there exists an m-tuple (s_1,\ldots,s_m) such that the largest root of f_{s_1,\ldots,s_m}(x) is at most the largest root of f_\emptyset(x).

Proof: By assumption, \{f_t(x)\}_{t\in S_1} has a common interlacing, and so by the first lemma, there is a polynomial (f_{s_1}(x), say) whose largest root is at most the largest root of

\displaystyle{\sum_{t\in S_1}f_t(x)=f_\emptyset(x)}.

Next, we know \{f_{s_1,t}(x)\}_{t\in S_2} has a common interlacing, and so by the first lemma, there is a polynomial (f_{s_1,s_2}(x), say) whose largest root is at most the largest root of

\displaystyle{\sum_{t\in S_2}f_{s_1,t}(x)=f_{s_1}(x)},

which in turn is no larger than the largest root of f_\emptyset(x). The result follows by induction.     \Box
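In effect, the proof is a greedy walk down the index tree, which one might sketch as follows (leaves maps full index tuples to numpy polynomials, as in the previous sketch; the stated guarantee only holds when the family is genuinely interlacing):

import itertools

def top_root(f):
    # Largest real root of a real-rooted numpy Polynomial.
    return max(f.roots().real)

def greedy_leaf(S, leaves):
    # Descend the index tree: at each level keep the child t whose partial
    # sum f_{s_1,...,s_k,t} has the smallest largest root.  For an
    # interlacing family, the first lemma says each step cannot increase
    # the top root, so the chosen leaf's top root is at most f_emptyset's.
    def partial(prefix):
        return sum(leaves[prefix + rest]
                   for rest in itertools.product(*S[len(prefix):]))
    prefix = ()
    for k in range(len(S)):
        prefix += (min(S[k], key=lambda t: top_root(partial(prefix + (t,)))),)
    return prefix, top_root(leaves[prefix]), top_root(partial(()))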

At this point, we provide the two-step “method of interlacing families” which MSS use in both papers:

  1. Prove that the given polynomials form an interlacing family (using real stable polynomials).
  2. Find f_\emptyset(x) and produce an upper bound on its largest root.

— 4. How to prove Kadison-Singer —

It was a little disingenuous of me to suggest in the previous section that the polynomials are “given” to you. Indeed, there is a bit of ingenuity that goes into actually finding the polynomials of interest. For the sake of example (and because it motivated me to write this blog entry in the first place), I will sketch how Marcus, Spielman and Srivastava use the method of interlacing families to prove Kadison-Singer.

Actually, they prove an equivalent version of Kadison-Singer called Weaver’s conjecture: There exist r\geq 2, \eta\geq 2 and \theta>0 such that every finite-dimensional \eta-tight frame with frame elements of norm at most 1 can be partitioned into r subcollections of vectors such that the frame operator of each subcollection has operator norm at most \eta-\theta. Apparently, it suffices to take r=2, \eta=18 and \theta=2. This follows from the following result:

Theorem (Corollary 1.3 in Part II). Let u_1,\ldots, u_m be column vectors in \mathbb{C}^d such that \sum_{i=1}^m u_iu_i^*=I and \|u_i\|^2\leq\alpha for all i. Then there exists a partition T_1\sqcup T_2=\{1,\ldots,m\} such that

\displaystyle{\bigg\|\sum_{i\in T_j}u_iu_i^*\bigg\|\leq\frac{(1+\sqrt{2\alpha})^2}{2}}

for both j=1 and j=2.
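To see how the numbers fit together, note that an 18-tight frame with \|u_i\|\leq 1 can be rescaled by 1/\sqrt{18} to produce vectors satisfying the theorem’s hypothesis with \alpha=1/18, in which case

\displaystyle{\frac{(1+\sqrt{2\alpha})^2}{2}=\frac{(1+\sqrt{1/9})^2}{2}=\frac{(4/3)^2}{2}=\frac{8}{9}}.

Undoing the rescaling multiplies this bound by 18, giving 18\cdot\frac{8}{9}=16=18-2, which is \eta-\theta with \eta=18 and \theta=2.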

Even after seeing the result they proved, it’s not obvious what the polynomials are, but we’re getting there. Let v_1,\ldots,v_m be independent random column vectors in \mathbb{C}^{2d}. Each v_i has 2d entries, so think of the top and bottom halves of this vector to define the distribution: With probability 1/2, the top half is \sqrt{2}u_i and the bottom half is all zeros, and the rest of the time the top half is all zeros and the bottom half is \sqrt{2}u_i. In this way, each vector u_i is randomly lifted to a vector of twice the dimension (qualitatively, this bears some resemblance to the random 2-lifts of graphs described in Part I).

Notice that \sum_{i=1}^m v_iv_i^* is self-adjoint and positive semidefinite, and so its operator norm is precisely the largest root of its characteristic polynomial. Since there are 2^m possible realizations of \sum_{i=1}^m v_iv_i^*, we have a total of 2^m characteristic polynomials to analyze; these are the polynomials we are “given.” In Section 4 of Part II, they use real stable polynomials to show that these 2^m polynomials form an interlacing family. The most technical portion of Part II (namely, Section 5) is dedicated to bounding the largest root of f_\emptyset(x) in this case, and they manage to bound it by (1+\sqrt{2\alpha})^2.
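For concreteness, here is a toy enumeration one might run in Python (numpy assumed; the tiny instance, the whitening step and all names are my own, not from Part II):

import itertools
import numpy as np

rng = np.random.default_rng(0)

# A toy instance: m = 4 random vectors in C^2, whitened so that
# sum_i u_i u_i^* = I; alpha records the largest squared norm.
d, m = 2, 4
U = rng.standard_normal((d, m)) + 1j * rng.standard_normal((d, m))
U = np.linalg.inv(np.linalg.cholesky(U @ U.conj().T)) @ U
alpha = max(np.linalg.norm(U[:, i])**2 for i in range(m))

def lift(u, top):
    # Place sqrt(2)*u in the top or bottom half of a vector in C^{2d}.
    v = np.zeros(2 * len(u), dtype=complex)
    v[:len(u)] = np.sqrt(2) * u if top else 0
    v[len(u):] = 0 if top else np.sqrt(2) * u
    return v

# Enumerate all 2^m realizations of sum_i v_i v_i^*.
polys, tops = [], []
for signs in itertools.product([True, False], repeat=m):
    A = sum(np.outer(lift(U[:, i], s), lift(U[:, i], s).conj())
            for i, s in enumerate(signs))
    polys.append(np.poly(A))                  # characteristic polynomial
    tops.append(max(np.linalg.eigvalsh(A)))   # operator norm = largest root

f_empty_top = max(np.roots(np.mean(polys, axis=0)).real)
print(min(tops) <= f_empty_top)                  # the interlacing-family guarantee
print(f_empty_top, (1 + np.sqrt(2 * alpha))**2)  # MSS bound the former by the latter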

With this, the last theorem implies that the largest root of one of the 2^m polynomials is at most (1+\sqrt{2\alpha})^2. In other words, there exists a subset T_1\subseteq\{1,\ldots,m\} such that

\displaystyle{2\max\bigg\{\bigg\|\sum_{i\in T_1}u_iu_i^*\bigg\|,\bigg\|\sum_{i\not\in T_1}u_iu_i^*\bigg\|\bigg\}=\bigg\|\sum_{i\in T_1}\binom{\sqrt{2}u_i}{0}\binom{\sqrt{2}u_i}{0}^*+\sum_{i\not\in T_1}\binom{0}{\sqrt{2}u_i}\binom{0}{\sqrt{2}u_i}^*\bigg\|\leq(1+\sqrt{2\alpha})^2}.

As such, the result follows by taking T_2:=\{1,\ldots,m\}\setminus T_1 (the equality above holds because the lifted matrix is block diagonal with blocks 2\sum_{i\in T_1}u_iu_i^* and 2\sum_{i\not\in T_1}u_iu_i^*).
