# An introduction to interlacing families

Perhaps you already heard it on Gil Kalai’s blog: the 54-year-old Kadison-Singer problem was recently solved by Adam Marcus, Dan Spielman and Nikhil Srivastava. The paper is fantastically short, and comes as a sequel to another recent paper which used similar methods to solve another important problem: for every degree $d>2$, there exists an infinite family of $d$-regular bipartite Ramanujan graphs. Considering the gravity of these results, it’s probably wise to get a grasp of the techniques that are used to prove them. The main technique they use (“the method of interlacing families”) is actually new, and so I decided to write a blog post to discuss it (and force myself to understand it).

— 1. A new probabilistic method —

The main idea is that you’re given a list of polynomials, and you’re asked to establish that the largest real root of one of these polynomials is smaller than something. This is very similar to the sort of problem that the probabilistic method could be used to solve: Suppose you have a huge collection $S$ of real numbers, and you want to see how small these numbers can be. Define a random variable $X$ to be uniformly distributed over $S$ and compute the expected value. Then you know that there is some member of $S$ which is no larger than $\mathbb{E}[X]$. Another way to say this is the minimum is always less than or equal to the average. Functional analysts use this pigeonhole principle-type argument all the time without consciously acknowledging that they are using the probabilistic method. Of course, the probabilistic method can be formulated as “just counting,” but oftentimes, it is easier to think in terms of probability.
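The “minimum is at most the average” principle is easy to check by hand; here is a minimal sketch (the set $S$ below is an arbitrary example of my own, not from the papers):

```python
def average(values):
    """Expected value of a random variable distributed uniformly over `values`."""
    return sum(values) / len(values)

# A small collection S of real numbers: some member of S is no larger
# than E[X], i.e., the minimum never exceeds the average.
S = [3.2, 1.7, 4.1, 2.9]
assert min(S) <= average(S)
```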

Now if you’re given a collection of polynomials, you might be inclined to define a uniform distribution over this collection, and then find the “average” polynomial, that is, the polynomial whose $k$th coefficient is the average of the $k$th coefficients. But you’re asked to establish that there is a polynomial in the collection whose largest real root is smaller than something. How can you relate this to the roots of the average polynomial? In general, there is no correspondence, but the following lemma reveals an interesting case in which something can actually be said:

Lemma (Lemma 4.2 in Part I, essentially). Take a finite collection of polynomials $\{f_i(x)\}_i$, each with all real coefficients, a positive leading coefficient, and at least one real root.  Let $R_i$ and $r_i$ be the largest and second largest real roots (respectively) of $f_i(x)$; if $f_i(x)$ only has one real root, take $r_i$ to be $-\infty$. If

$\displaystyle{\max_j r_j\leq \min_j R_j}$,

then $f_\emptyset(x):=\sum_j f_j(x)$ has a real root (say, $R$) which satisfies $\min_j R_j \leq R$.

Note that the average polynomial is just a scaled version of $f_\emptyset(x)$, and so it has the same roots as $f_\emptyset(x)$. In practice, you will use the largest real root of $f_\emptyset(x)$ as the upper bound on $\min_j R_j$.  The coolest thing about this result is that you could assign the proof as homework in a basic real analysis class.
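To make the lemma concrete, here is a small numerical check with two quadratics of my own choosing (not from the papers). Their second-largest roots ($1$ and $2$) both sit below their largest roots ($3$ and $4$), so the hypothesis holds, and the largest root of the sum upper-bounds $\min_j R_j$:

```python
from math import sqrt

def quadratic_roots(a, b, c):
    """Both real roots of a*x^2 + b*x + c (assumes a nonnegative discriminant)."""
    disc = sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

# f_1 = (x-1)(x-3) and f_2 = (x-2)(x-4), stored as coefficient triples (a, b, c).
f1 = (1.0, -4.0, 3.0)
f2 = (1.0, -6.0, 8.0)
f_sum = tuple(x + y for x, y in zip(f1, f2))  # f_emptyset = f_1 + f_2

R = max(quadratic_roots(*f_sum))  # largest root of the sum (about 3.366)
min_R_j = min(max(quadratic_roots(*f1)), max(quadratic_roots(*f2)))  # = 3.0
assert min_R_j <= R  # some f_i has largest root at most the largest root of f_emptyset
```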

Proof of Lemma: View each polynomial $f_i(x)$ as a function over the real line. Since the leading coefficient of $f_i(x)$ is positive, there exists $y_1$ such that $f_i(y)>0$ for every $y\geq y_1$ and every $i$.  Taking $y_0:=\min_j R_j$, we then have $r_i\leq y_0 \leq R_i\leq y_1$. We claim that $f_i(y_0)\leq 0$. Suppose otherwise, and consider two cases:

Case I: There exists $y\in(y_0,R_i)$ such that $f_i(y)\leq 0$. Then either $y$ is a root of $f_i(x)$ in $(y_0,R_i)$, or $f_i(x)$ has a root in $(y_0,R_i)$ by the intermediate value theorem. This contradicts the fact that the second largest root of $f_i(x)$ satisfies $r_i\leq y_0$.

Case II: $f_i$ is strictly positive over $(y_0,R_i)$. Note that $f_i$ is also strictly positive over $(R_i,\infty)$, since otherwise $f_i(y_1)>0$ implies that $R_i$ is not the largest root by the intermediate value theorem. Since $f_i$ is differentiable, $f_i'(R_i)$ exists, and we must have $f_i'(R_i)=0$, since otherwise there is a point $y$ in a neighborhood of $R_i$ for which $f_i(y)<0$. Writing out the definition of the derivative, and taking $g_i(x)$ to be the polynomial such that $f_i(x)=(x-R_i)g_i(x)$, we have

$\displaystyle{0=f_i'(R_i)=\lim_{y\rightarrow R_i}\frac{f_i(y)-f_i(R_i)}{y-R_i}=\lim_{y\rightarrow R_i}\frac{f_i(y)}{y-R_i}=g_i(R_i)}$.

As such, $R_i$ is also a root of $g_i(x)$, and so $r_i=R_i$.  Combined with $r_i\leq y_0\leq R_i$, we then have that $y_0=R_i$, which contradicts the assumption that $f_i(y_0)>0$.

To summarize up to this point, we have $f_i(y_0)\leq 0$ for every $i$, where $y_0:=\min_j R_j$, and also $f_i(y_1)>0$ for every $i$.  By addition, it then follows that $f_\emptyset(y_0)\leq 0$, meaning $f_\emptyset(x)$ has a root $R\in[y_0,y_1)$ by the intermediate value theorem.     $\Box$

Note that I presented this result differently from the original, mostly because I wanted to weaken the hypothesis to improve my understanding. The main takeaway is you want there to be a “caste system” on the top roots in order for this new probabilistic method to work.

— 2. A useful condition —

Consider the following definition:

Definition. Take a finite collection of polynomials $\{f_i(x)\}_i$ of degree $n$ with all real roots. Let $\beta_1^{(i)}\leq\cdots\leq\beta_n^{(i)}$ denote the roots of $f_i(x)$. We say the polynomials $\{f_i(x)\}_i$ have a common interlacing if there exists $\{\alpha_k\}_{k=1}^{n-1}$ such that

$\displaystyle{\beta_1^{(i)}\leq \alpha_1\leq \beta_2^{(i)}\leq \alpha_2\leq \cdots\leq \alpha_{n-1}\leq \beta_n^{(i)}}$

for every $i$.

Note that $\{f_i(x)\}_i$ having a common interlacing implies that $\max_j\beta_{n-1}^{(j)}\leq \alpha_{n-1}\leq \min_j\beta_n^{(j)}$, and so such polynomials satisfy the hypothesis of the previous lemma (provided the leading coefficients are positive). Moreover, such polynomials are rather common, at least in linear algebra. Theorem 4.3.10 of Horn and Johnson’s Matrix Analysis gives the following example: Consider all self-adjoint $n\times n$ matrices such that deleting the last row and column produces the same $(n-1)\times(n-1)$ matrix $A$. Then the characteristic polynomials of these matrices have a common interlacing (the $\alpha_k$’s in this case can be taken to be the eigenvalues of $A$).
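As a toy instance of the Horn and Johnson example (the numbers are my own), border the $1\times 1$ matrix $A=(2)$ in several ways; Cauchy interlacing forces every bordered matrix’s eigenvalues to straddle the eigenvalue of $A$, so $\alpha_1=2$ is a common interlacing:

```python
from math import sqrt

# Bordered self-adjoint matrices [[2, b], [b, c]]: deleting the last row and
# column always leaves A = (2).  The characteristic polynomial is
# x^2 - (2 + c)x + (2c - b^2); its roots must satisfy lam_min <= 2 <= lam_max.
A_EIGENVALUE = 2.0

for b, c in [(1.0, 0.0), (0.5, 5.0), (3.0, -1.0)]:
    trace, det = 2.0 + c, 2.0 * c - b * b
    disc = sqrt(trace * trace - 4 * det)  # real, since the matrix is symmetric
    lam_min, lam_max = (trace - disc) / 2, (trace + disc) / 2
    assert lam_min <= A_EIGENVALUE <= lam_max
```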

Not only does this notion of common interlacing naturally occur in linear algebra, it also can be verified by testing the hypothesis in the following lemma:

Lemma (Proposition 1.35 in this paper, essentially). Take polynomials $f(x)$ and $g(x)$ of degree $n$ with positive leading coefficients such that, for every $\lambda\in[0,1]$, the polynomial $\lambda f(x)+(1-\lambda)g(x)$ has $n$ real roots. Then $f(x)$ and $g(x)$ have a common interlacing.

Proof: We only treat the special case where the $2n$ roots of $f(x)$ and $g(x)$ are all distinct (the degenerate cases can be treated using basic calculus-type arguments).  By assumption, $f(x)$ and $g(x)$ each have $n$ real roots, which form $2n$ points on the real line, and their complement in the real line forms $2n+1$ open intervals. Viewing $f$ and $g$ as functions over the real line, note that their signs are constant over each of these open intervals (this follows from the intermediate value theorem). In fact, ordering these intervals from right to left as $\{I_k\}_{k=0}^{2n}$, then since the polynomials have positive leading coefficients, both $f$ and $g$ are positive on $I_0$. Next, the largest of the $2n$ roots belongs to either $f(x)$ or $g(x)$ (but not both), and so $f$ and $g$ have opposite sign over $I_1$. Continuing in this way, $f$ and $g$ have common sign over even-indexed intervals and opposite sign over odd-indexed intervals. As such, addition gives that for any $\lambda\in(0,1)$, every real root of $\lambda f(x)+(1-\lambda)g(x)$ must lie in an odd-indexed interval (over the even-indexed intervals, $f$ and $g$ share a strict sign, so no such combination can vanish there). Moreover, the (real) roots are continuous in $\lambda$, and so the boundary of each odd-indexed interval had better match a root of $f(x)$ with a root of $g(x)$. Picking a point from each of the even intervals then produces a common interlacing of $f(x)$ and $g(x)$.     $\Box$
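Here is a quick numerical check of this lemma with two quadratics of my own choosing: every convex combination has positive discriminant (hence is real-rooted), and $\alpha_1=2.5$ turns out to be a common interlacing point:

```python
# f = (x-1)(x-3) = x^2 - 4x + 3 and g = (x-2)(x-4) = x^2 - 6x + 8.
# The combination lam*f + (1-lam)*g equals x^2 - (6 - 2*lam)x + (8 - 5*lam),
# with discriminant 4*(lam^2 - lam + 1) > 0 for every lam in [0, 1],
# so every convex combination has 2 real roots.
for k in range(101):
    lam = k / 100
    b, c = -(6 - 2 * lam), 8 - 5 * lam
    assert b * b - 4 * c > 0

# Common interlacing predicted by the lemma: alpha_1 = 2.5 sits between the
# roots of f (1 and 3) and between the roots of g (2 and 4).
assert 1 <= 2.5 <= 3 and 2 <= 2.5 <= 4
```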

Corollary. Take a finite collection of polynomials $\{f_i(x)\}_i$ of degree $n$ with positive leading coefficients such that every convex combination of these polynomials has $n$ real roots. Then the polynomials $\{f_i(x)\}_i$ have a common interlacing.

Proof: By the lemma, any pair has a common interlacing. With this in mind, suppose $\{f_i(x)\}_i$ does not have a common interlacing. Then there exist some $i,j,k$ such that the $k$th largest root of $f_i(x)$ is smaller than the $(k+1)$st largest root of $f_j(x)$, contradicting the fact that $f_i(x)$ and $f_j(x)$ have a common interlacing.     $\Box$

In discerning whether the new probabilistic method is valid for a given sequence of polynomials, the above result reduces this problem to verifying the real-rootedness of a continuum of polynomials. While this may seem particularly cumbersome at first glance, Marcus, Spielman and Srivastava have found success in verifying this (in both papers) using a theory of real stable polynomials. This leads to the most technical portions of both papers, and I plan to study these portions more carefully in the future.

— 3. The method of interlacing families —

In practice, the polynomials you are given may not have a common interlacing, but you might be able to organize them into something called an interlacing family.

Definition. Let $S_1,\ldots,S_m$ be finite sets, and use the $m$-tuples in $S_1\times \cdots\times S_m$ to index $|S_1|\cdots|S_m|$ polynomials. In particular, let each $f_{s_1,\ldots,s_m}(x)$ be a real-rooted degree-$n$ polynomial with a positive leading coefficient. For every $k<m$ and $s_1\in S_1,\ldots, s_k\in S_k$, define

$\displaystyle{f_{s_1,\ldots,s_k}(x):=\sum_{s_{k+1}\in S_{k+1}}\cdots\sum_{s_m\in S_m}f_{s_1,\ldots,s_m}(x)}$.

For the degenerate case where $k=0$, we take $f_\emptyset(x)$ to be the sum of all of the $f_{s_1,\ldots,s_m}(x)$’s. We say the polynomials $\{f_{s_1,\ldots,s_m}(x) \}_{s_1,\ldots,s_m}$ form an interlacing family if for every $k<m$ and $s_1\in S_1,\ldots, s_k\in S_k$, the polynomials

$\displaystyle{\{f_{s_1,\ldots,s_k,t}(x) \}_{t\in S_{k+1}}}$

have a common interlacing.

If you think of $f(x)$ as being a random polynomial drawn uniformly from the collection $\{f_{s_1,\ldots,s_m}(x) \}_{s_1,\ldots,s_m}$, then

$\displaystyle{\mathbb{E}[f(x)]=\frac{1}{|S_1|\cdots|S_m|}f_\emptyset(x)}$.

The other polynomials defined above can be thought of in terms of conditional expectation. That is, suppose you are told that the first $k$ indices of $f(x)$ are $s_1,\ldots,s_k$. Then

$\displaystyle{\mathbb{E}[f(x)|s_1,\ldots,s_k]=\frac{1}{|S_{k+1}|\cdots|S_m|}f_{s_1,\ldots,s_k}(x)}$.
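The partial-sum bookkeeping is easy to set up in code. Below is a toy family of my own (it is not from the papers): $m=2$, $S_1=S_2=\{0,1\}$, with each polynomial stored as a coefficient list $[c_0,c_1,c_2]$, and the conditional-expectation identity checked coefficient by coefficient:

```python
# Leaf polynomials f_{s1,s2}(x) = c0 + c1*x + c2*x^2, indexed by (s1, s2).
leaves = {
    (0, 0): [4.0, -5.0, 1.0],    # (x-1)(x-4)
    (0, 1): [5.25, -5.0, 1.0],   # (x-1.5)(x-3.5)
    (1, 0): [2.25, -5.0, 1.0],   # (x-0.5)(x-4.5)
    (1, 1): [6.0, -5.0, 1.0],    # (x-2)(x-3)
}

def poly_sum(polys):
    """Add polynomials coefficientwise."""
    return [sum(cs) for cs in zip(*polys)]

# f_{s1} sums over the remaining index; f_emptyset sums over everything.
f_s1 = {s1: poly_sum([leaves[(s1, t)] for t in (0, 1)]) for s1 in (0, 1)}
f_empty = poly_sum(leaves.values())

# E[f(x) | s_1 = 0] = f_0(x) / |S_2|, i.e., the uniform average of the
# leaves consistent with s_1 = 0.
expected = [c / 2 for c in f_s1[0]]
uniform_avg = [(leaves[(0, 0)][k] + leaves[(0, 1)][k]) / 2 for k in range(3)]
assert expected == uniform_avg
```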

Unfortunately, this is the extent of my intuition with interlacing families. I would like to relate them to eigensteps, for example, but interlacing families are very different because they involve partial sums of polynomials (as opposed to characteristic polynomials of partial sums of rank-1 matrices). Regardless, by the following result, interlacing families are just as useful as polynomials with a common interlacing:

Theorem (Theorem 4.4 in Part I). Suppose $\{f_{s_1,\ldots,s_m}(x)\}_{s_1,\ldots,s_m}$ form an interlacing family. Then there exists an $m$-tuple $(s_1,\ldots,s_m)$ such that the largest root of $f_{s_1,\ldots,s_m}(x)$ is at most the largest root of $f_\emptyset(x)$.

Proof: By assumption, $\{f_t(x)\}_{t\in S_1}$ has a common interlacing, and so by the first lemma, there is a polynomial ($f_{s_1}(x)$, say) whose largest root is at most the largest root of

$\displaystyle{\sum_{t\in S_1}f_t(x)=f_\emptyset(x)}$.

Next, we know $\{f_{s_1,t}(x)\}_{t\in S_2}$ has a common interlacing, and so by the first lemma, there is a polynomial ($f_{s_1,s_2}(x)$, say) whose largest root is at most the largest root of

$\displaystyle{\sum_{t\in S_2}f_{s_1,t}(x)=f_{s_1}(x)}$,

which in turn is no larger than the largest root of $f_\emptyset(x)$. The result follows by induction.     $\Box$
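The induction in this proof is really a greedy walk down the index tree: at each level, pick the child whose partial-sum polynomial has the smallest largest root. Here is a sketch on a toy family of quadratics of my own construction (all roots straddle $2.5$, so every relevant subfamily has a common interlacing):

```python
from math import sqrt

def largest_root(coeffs):
    """Largest root of c0 + c1*x + c2*x^2 (assumes two real roots)."""
    c0, c1, c2 = coeffs
    return (-c1 + sqrt(c1 * c1 - 4 * c2 * c0)) / (2 * c2)

def poly_sum(polys):
    """Add polynomials coefficientwise."""
    return [sum(cs) for cs in zip(*polys)]

# Leaves f_{s1,s2}: (x-1)(x-4), (x-1.5)(x-3.5), (x-0.5)(x-4.5), (x-2)(x-3).
leaves = {
    (0, 0): [4.0, -5.0, 1.0],
    (0, 1): [5.25, -5.0, 1.0],
    (1, 0): [2.25, -5.0, 1.0],
    (1, 1): [6.0, -5.0, 1.0],
}
f_empty = poly_sum(leaves.values())

# Greedy descent from the theorem's proof: at each level, choose the branch
# whose partial-sum polynomial has the smallest largest root.
s1 = min((0, 1), key=lambda t: largest_root(poly_sum([leaves[(t, u)] for u in (0, 1)])))
s2 = min((0, 1), key=lambda t: largest_root(leaves[(s1, t)]))

# The chosen leaf's largest root is at most the largest root of f_emptyset.
assert largest_root(leaves[(s1, s2)]) <= largest_root(f_empty)
```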

At this point, we provide the two-step “method of interlacing families” which MSS use in both papers:

1. Prove that the given polynomials form an interlacing family (using real stable polynomials).
2. Find $f_\emptyset(x)$ and produce an upper bound on its largest root.

— 4. How to prove Kadison-Singer —

It was a little disingenuous of me to suggest in the previous section that the polynomials are “given” to you. Indeed, there is a bit of ingenuity that goes into actually finding the polynomials of interest. For the sake of example (and because it motivated me to write this blog entry in the first place), I will sketch how Marcus, Spielman and Srivastava use the method of interlacing families to prove Kadison-Singer.

Actually, they prove an equivalent version of Kadison-Singer called Weaver’s conjecture: There exist $r\geq 2$, $\eta\geq 2$ and $\theta>0$ such that every finite-dimensional $\eta$-tight frame with frame elements of norm at most 1 can be partitioned into $r$ subcollections of vectors such that the frame operator of each subcollection has operator norm at most $\eta-\theta$. Apparently, it suffices to take $r=2$, $\eta=18$ and $\theta=2$. This follows from the following result:
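To sanity-check these constants (this is my own back-of-the-envelope arithmetic, assuming the standard rescaling in the reduction): given an $\eta$-tight frame $\{u_i\}_{i=1}^m$ with $\|u_i\|\leq 1$, the vectors $w_i:=u_i/\sqrt{\eta}$ satisfy $\sum_{i=1}^m w_iw_i^*=I$ and $\|w_i\|^2\leq 1/\eta$, so applying the theorem below with $\alpha=1/\eta$ and rescaling back gives

$\displaystyle{\bigg\|\sum_{i\in T_j}u_iu_i^*\bigg\|=\eta\bigg\|\sum_{i\in T_j}w_iw_i^*\bigg\|\leq\frac{\eta}{2}\Big(1+\sqrt{2/\eta}\Big)^2=9\Big(\frac{4}{3}\Big)^2=16=18-2\qquad\text{when }\eta=18},$

which is exactly $\eta-\theta$ with $\theta=2$.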

Theorem (Corollary 1.3 in Part II). Let $u_1,\ldots, u_m$ be column vectors in $\mathbb{C}^d$ such that $\sum_{i=1}^m u_iu_i^*=I$ and $\|u_i\|^2\leq\alpha$ for all $i$. Then there exists a partition $T_1\sqcup T_2=\{1,\ldots,m\}$ such that

$\displaystyle{\bigg\|\sum_{i\in T_j}u_iu_i^*\bigg\|\leq\frac{(1+\sqrt{2\alpha})^2}{2}}$

for both $j=1$ and $j=2$.

Even after seeing the result they proved, it’s not obvious what the polynomials are, but we’re getting there. Let $v_1,\ldots,v_m$ be independent random column vectors in $\mathbb{C}^{2d}$. Each $v_i$ has $2d$ entries, so think of the top and bottom halves of this vector to define the distribution: with probability $1/2$, the top half is $\sqrt{2}u_i$ and the bottom half is all zeros, and the rest of the time the top half is all zeros and the bottom half is $\sqrt{2}u_i$. In this way, each vector $u_i$ is randomly lifted to a vector of twice the dimension (qualitatively, this bears some resemblance to the random 2-lifts of graphs described in Part I).

Notice that $\sum_{i=1}^m v_iv_i^*$ is self adjoint and positive semidefinite, and so its operator norm is precisely the largest root of its characteristic polynomial. Since there are $2^m$ possible realizations of $\sum_{i=1}^m v_iv_i^*$, we have a total of $2^m$ characteristic polynomials to analyze; these are the polynomials we are “given.” In Section 4 of Part II, they use real stable polynomials to show that these $2^m$ polynomials form an interlacing family. The most technical portion of Part II (namely, Section 5) is dedicated to bounding the largest root of $f_\emptyset(x)$ in this case, and they manage to bound it by $(1+\sqrt{2\alpha})^2$.

With this, the last theorem implies that the largest root of one of the $2^m$ polynomials is at most $(1+\sqrt{2\alpha})^2$. In other words, there exists a subset $T_1\subseteq\{1,\ldots,m\}$ such that

$\displaystyle{2\max\bigg\{\bigg\|\sum_{i\in T_1}u_iu_i^*\bigg\|,\bigg\|\sum_{i\not\in T_1}u_iu_i^*\bigg\|\bigg\}=\bigg\|\sum_{i\in T_1}\binom{\sqrt{2}u_i}{0}\binom{\sqrt{2}u_i}{0}^*+\sum_{i\not\in T_1}\binom{0}{\sqrt{2}u_i}\binom{0}{\sqrt{2}u_i}^*\bigg\|\leq(1+\sqrt{2\alpha})^2}$.

As such, the result follows by taking $T_2:=\{1,\ldots,m\}\setminus T_1$.
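To see the block identity in the display above at work, here is a tiny numerical check with $d=1$ (the scalars $u_i$ are my own example): the lifted sum is block diagonal, so its operator norm is twice the larger of the two subcollection sums.

```python
from math import sqrt

# d = 1 toy example: the u_i are scalars with sum of squares 1
# (i.e., sum u_i u_i^* = I), and alpha bounds the largest squared norm.
u = [0.6, 0.8]
alpha = max(x * x for x in u)  # 0.64
assert abs(sum(x * x for x in u) - 1.0) < 1e-12

# The lifting sends u_i to (sqrt(2)*u_i, 0) or (0, sqrt(2)*u_i).  For the
# partition T1 = {0}, T2 = {1}, the sum of outer products is the diagonal
# 2x2 matrix diag(2*sum_{T1} u_i^2, 2*sum_{T2} u_i^2), with norm = max entry.
T1, T2 = [0], [1]
block1 = 2 * sum(u[i] * u[i] for i in T1)
block2 = 2 * sum(u[i] * u[i] for i in T2)
lifted_norm = max(block1, block2)

# This equals 2*max of the two subcollection norms, and the MSS bound holds.
assert lifted_norm == 2 * max(sum(u[i] ** 2 for i in T1), sum(u[i] ** 2 for i in T2))
assert lifted_norm <= (1 + sqrt(2 * alpha)) ** 2
```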