# Deterministic RIP matrices: Breaking the square-root bottleneck, II

In my previous blog entry, I kicked off an online project to make progress on the deterministic RIP matrix problem by optimizing the analysis in a breakthrough paper by Bourgain et al. Since then, I launched a wiki to maintain a “scoreboard” of world records for this problem, and Afonso Bandeira made his own contribution to the project. The purpose of this entry is to introduce some additive combinatorics and to develop some understanding of how they are leveraged to prove RIP for a particular matrix construction. While this entry does not immediately produce a new world record, it does optimize the analysis of Corollaries 3 and 4 of the paper. I plan to generalize Lemmas 9 and 10 in a future blog post, which will lead to an explicit optimization problem that incorporates the work in this entry (and a new world record, hopefully).

— A brief introduction to additive combinatorics —

In this section, we briefly detail some key ideas from additive combinatorics. Given an additive group $G$ and finite sets $A,B\subseteq G$, we can define the sumset

$A+B:=\{a+b:a\in A,~b\in B\}$,

the difference set

$A-B:=\{a-b:a\in A,~b\in B\}$

(not to be confused with a difference set), and the additive energy

$E(A,B):=\#\big\{(a_1,a_2,b_1,b_2)\in A^2\times B^2:a_1+b_1=a_2+b_2\big\}$.

These definitions are useful in quantifying the additive structure of a set. In particular, consider the following:

Lemma. A nonempty subset $A$ of some additive group $G$ satisfies the following inequalities:

(i) $|A+A|\geq|A|$

(ii) $|A-A|\geq|A|$

(iii) $E(A,A)\leq|A|^3$

with equality precisely when $A$ is a translate of some subgroup of $G$.

Proof: For (i), pick $a\in A$. Then $|A+A|\geq|A+a|=|A|$. Considering

$\displaystyle{A+A=\bigcup_{a\in A}(A+a)}$,

we have equality in (i) precisely when $A+A=A+a$ for every $a\in A$. Equivalently, given $a_0\in A$, then for every $a\in A$, addition by $a-a_0$ permutes the members of $A+a_0$. This is further equivalent to the following: Given $a_0\in A$, then for every $a\in A$, addition by $a-a_0$ permutes the members of $A-a_0$. It certainly suffices for $H:=A-a_0$ to be a group, and it is a simple exercise to verify that this is also necessary.

The proof for (ii) is similar.

For (iii), we note that

$E(A,A)=\#\big\{(a,b,c)\in A^3:a+b-c\in A\big\}\leq|A|^3$,

with equality precisely when $A$ has the property that $a+b-c\in A$ for every $a,b,c\in A$. Again, it clearly suffices for $A-a_0$ to be a group, and necessity is a simple exercise.     $\Box$

The notion of additive structure is somewhat intuitive. You should think of a translate of a subgroup as having maximal additive structure. When the bounds (i), (ii) and (iii) are close to being achieved by $A$ (e.g., $A$ is an arithmetic progression), you should think of $A$ as having a lot of additive structure. Interestingly, while there are different measures of additive structure (e.g., $|A-A|$ and $E(A,A)$), they often exhibit certain relationships (perhaps not surprisingly). The following is an example of such a relationship which is used throughout the paper by Bourgain et al.:

Lemma (Corollary 1). If $E(A,A)\geq|A|^3/K$, then there exists a set $A'\subseteq A$ such that $|A'|\geq|A|/(20K)$ and $|A'-A'|\leq10^7K^9|A|$.

In words, a set with a lot of additive energy necessarily has a large subset with a small difference set. This is proved using a version of the Balog–Szemeredi–Gowers lemma.

If translates of subgroups have maximal additive structure, then which sets have minimal additive structure? It turns out that random sets tend to (nearly) have this property, and one way to detect low additive structure is Fourier bias:

$\displaystyle{\|A\|_u:=\max_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|}$,

where the Fourier transform here has a $\frac{1}{|G|}$ factor out front (it’s not unitary). Indeed, $\|A\|_u$ captures how far $E(A,A)$ is from its minimal value $|A|^4/|G|$:

Lemma. For any subset $A$ of a finite additive group $G$, we have

(i) $\displaystyle{E(A,A)\geq\frac{|A|^4}{|G|}}$, and

(ii) $\displaystyle{\|A\|_u^4\leq\frac{1}{|G|^3}\Big(E(A,A)-\frac{|A|^4}{|G|}\Big)\leq\frac{|A|}{|G|}\|A\|_u^2}$.

Proof: Define $\lambda_x:=\#\{(a,a')\in A^2:a-a'=x\}$. Then (i) follows from Cauchy-Schwarz:

$\displaystyle{|A|^2=\sum_{x\in G}\lambda_x\leq|G|^{1/2}\|\lambda\|_2=\big(|G|E(A,A)\big)^{1/2}}.$

We will prove (ii) assuming $G=\mathbb{Z}/n\mathbb{Z}$, but the proof generalizes. Denote $e_n(x):=e^{2\pi ix/n}$. For the left-hand inequality, we consider

$\displaystyle{\sum_{\theta\in G}|\widehat{1_A}(\theta)|^4=\sum_{\theta\in G}\bigg|\frac{1}{|G|}\sum_{a\in A}e_n(-\theta a)\bigg|^4=\frac{1}{|G|^4}\sum_{\theta\in G}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2}$,

where the last step is by expanding $|w|^2=w\overline{w}$. Then Parseval’s identity simplifies this to $\frac{1}{|G|^3}\|\lambda\|_2^2=\frac{1}{|G|^3}E(A,A)$. We use this to bound $\|A\|_u^4$:

$\displaystyle{\|A\|_u^4=\max_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|^4\leq\sum_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|^4=\frac{1}{|G|^3}E(A,A)-\frac{|A|^4}{|G|^4}}$.

For the right-hand inequality, we apply Parseval’s identity:

$\displaystyle{E(A,A)=\sum_{x\in G}\lambda_x^2=\frac{1}{|G|}\sum_{\theta\in G}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2=\frac{|A|^4}{|G|}+\frac{1}{|G|}\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2}$

From here, we apply the expansion $|w|^2=w\overline{w}$

$\displaystyle{\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2=\sum_{x\in G}\lambda_xe_n(-\theta x)}$

to continue:

$\displaystyle{\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2=\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^4\leq\sum_{\substack{\theta\in G\\\theta\neq0}}\big(|G|\|A\|_u\big)^2\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2}$.

Applying Parseval’s identity then gives

$\displaystyle{E(A,A)\leq\frac{|A|^4}{|G|}+\big(|G|\|A\|_u\big)^2\cdot\frac{1}{|G|}\sum_{\theta\in G}\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2=\frac{|A|^4}{|G|}+|G|^2\|A\|_u^2|A|}$,

which is a rearrangement of the right-hand inequality.     $\Box$

— The matrix construction —

Now that we’ve covered the basics of additive combinatorics, let’s consider how these ideas are useful to the pursuit of deterministic RIP matrices. The main idea is that we desire a Gram matrix $\Phi^*\Phi$ whose entries exhibit cancellations for weak flat RIP (see my previous post). By the last lemma, we can control cancellations of complex exponentials

$\displaystyle{\bigg|\sum_{a\in A}e_n(\theta a)\bigg|\leq|G|\|A\|_u,\qquad\theta\neq0}$

in terms of the additive energy of the index set $A$. This motivates us to pursue a Gram matrix whose entries are complex exponentials. To this end, consider the following vector:

$\displaystyle{u_{a,b}:=\frac{1}{\sqrt{p}}\Big(e_p(ax^2+bx)\Big)_{x\in\mathbb{F}_p}}$,

where $p$ is prime and $\mathbb{F}_p$ denotes the field of size $p$. Such vectors are called chirps, and they are used in a variety of applications including radar. Here, we are mostly interested in the form of their inner products. If $a_1=a_2$, then $\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\delta_{b_1,b_2}$ by the geometric sum formula. Otherwise, the inner product is much more interesting:

$\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{1}{p}\sum_{x\in\mathbb{F}_p}e_p\Big((a_1-a_2)x^2+(b_1-b_2)x\Big)}$.

Since $a_1-a_2\neq0$, we can complete the square in the exponent, and changing variables to $y:=x+(b_1-b_2)/(2(a_1-a_2))$ gives

$\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{1}{p}e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)\sum_{y\in\mathbb{F}_p}e_p\Big((a_1-a_2)y^2\Big)}$.

Finally, this can be simplified using a quadratic Gauss sum formula:

$\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{\sigma_p}{\sqrt{p}}\bigg(\frac{a_1-a_2}{p}\bigg)e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)}$,

where $\sigma_p$ is $1$ or $i$ (depending on whether $p$ is $1$ or $3\bmod 4$) and $(\frac{a_1-a_2}{p})$ is a Legendre symbol, taking value $\pm1$ depending on whether $a_1-a_2$ is a perfect square $\bmod p$. Modulo these factors, the above inner product is a complex exponential, and since we want these in our Gram matrix $\Phi^*\Phi$, we will take $\Phi$ to have columns of the form $u_{a,b}$ — in fact, the columns will be $\{u_{a,b}\}_{(a,b)\in\mathcal{A}\times\mathcal{B}}$ for some well-designed sets $\mathcal{A},\mathcal{B}\subseteq\mathbb{F}_p$.

For weak flat RIP, we want to bound the following quantity for $\Omega_1,\Omega_2\subseteq\mathcal{A}\times\mathcal{B}$ with $|\Omega_1|,|\Omega_2|\leq\sqrt{p}$:

$\displaystyle{\bigg|\bigg\langle\sum_{(a_1,b_1)\in\Omega_1}u_{a_1,b_1},\sum_{(a_2,b_2)\in\Omega_2}u_{a_2,b_2}\bigg\rangle\bigg|}$.

For $i=1,2$, define

• $A_i:=\{a:\exists b\mbox{ s.t. }(a,b)\in\Omega_i\}$ and
• $\Omega_i(a):=\{b:(a,b)\in\Omega_i\}$.

These provide an alternate expression for the quantity of interest:

$\displaystyle{\bigg|\sum_{(a_1,b_1)\in\Omega_1}\sum_{(a_2,b_2)\in\Omega_2}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|=\bigg|\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|}$,

which admits the following bound by the triangle inequality:

$\displaystyle{\leq\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\bigg|\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|=\frac{1}{\sqrt{p}}\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\bigg|\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)\bigg|}$.

Pleasingly, it now suffices to bound a sum of complex exponentials, which we feel equipped to do using additive combinatorics. The following lemma does precisely this (it can be viewed as an analog of the previous lemma).

Lemma (Lemma 9). For every $\theta\in\mathbb{F}_p^*$ and $B_1,B_2\subseteq\mathbb{F}_p$, we have

$\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|\leq|B_1|^{1/2}E(B_1,B_1)^{1/8}|B_2|^{1/2}E(B_2,B_2)^{1/8}p^{1/8}}$.

Proof: First, Cauchy-Schwarz gives

$\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2=\bigg|\sum_{b_1\in B_1}1\cdot\sum_{b_2\in B_2}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2}$

$\displaystyle{\leq|B_1|\sum_{b_1\in B_1}\bigg|\sum_{b_2\in B_2}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2}$.

Expanding $|w|^2=w\overline{w}$ and rearranging then gives an alternate expression for the right-hand side:

$\displaystyle{|B_1|\sum_{b_2,b_2'\in B_2}e_p\Big(\theta(b_2^2-(b_2')^2)\Big)\overline{\sum_{b_1\in B_1}e_p\Big(\theta(2b_1(b_2-b_2'))\Big)}}$.

Applying Cauchy-Schwarz again, we then have

$\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^4\leq|B_1|^2|B_2|^2\sum_{b_2,b_2'\in B_2}\bigg|\sum_{b_1\in B_1}e_p\Big(\theta(2b_1(b_2-b_2'))\Big)\bigg|^2}$,

and expanding $|w|^2=w\overline{w}$ this time gives

$\displaystyle{|B_1|^2|B_2|^2\sum_{\substack{b_1,b_1'\in B_1\\b_2,b_2'\in B_2}}e_p\Big(2\theta(b_1-b_1')(b_2-b_2')\Big)}$.

At this point, it is convenient to change variables, namely, $x=b_1-b_1'$ and $y=b_2-b_2'$:

$\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^4\leq |B_1|^2|B_2|^2\sum_{x,y\in\mathbb{F}_p}\lambda_x\mu_ye_p(2\theta xy),\qquad(1)}$

where $\lambda_x:=\#\{(b_1,b_1')\in B_1^2:b_1-b_1'=x\}$ and similarly for $\mu_y$ in terms of $B_2$. We now apply Cauchy-Schwarz again to bound the sum in (1):

$\displaystyle{\bigg|\sum_{x\in\mathbb{F}_p}\lambda_x\sum_{y\in\mathbb{F}_p}\mu_ye_p(2\theta xy)\bigg|^2\leq\|\lambda\|_2^2\sum_{x\in\mathbb{F}_p}\bigg|\sum_{y\in\mathbb{F}_p}\mu_ye_p(2\theta xy)\bigg|^2}$,

and changing variables $x':=-2\theta x$ (this change is invertible since $\theta\neq0$), we see that the right-hand side is a sum of squares of the Fourier coefficients of $\mu$. As such, Parseval’s identity gives the following simplification:

$\displaystyle{\bigg|\sum_{x,y\in\mathbb{F}_p}\lambda_x\mu_ye_p(2\theta xy)\bigg|^2\leq p\|\lambda\|_2^2\|\mu\|_2^2=pE(B_1,B_1)E(B_2,B_2)}$.

Applying this bound to (1) gives the result.     $\Box$

— How to construct $\mathcal{B}$

The previous lemma enables us to prove weak-flat-RIP-type cancellations in cases where $\Omega_1(a_1),\Omega_2(a_2)\subseteq\mathcal{B}$ both lack additive structure. Indeed, the method of this paper is to do precisely this, and the remaining cases (where either $\Omega_1(a_1)$ or $\Omega_2(a_2)$ has more additive structure) will find cancellations by accounting for the dilation weights $1/(a_1-a_2)$. This seems to contradict the paragraph after (2.9) in the journal version of the paper (this paragraph appears to suffer from several typos) — here is what I suspect should be the corrected version:

… To prove the cancellation in (2.8), we basically split it into two cases: (i) when $B'=\Omega_i(a_i)$ has less additive structure (i.e., $E(B',B')$ is small), the cancellation comes from the sum over $b_1$, $b_2$ (with $a_1$, $a_2$ fixed), and (ii) when $B'$ has more additive structure, one gets dispersion of the phases from the dilation weights $1/(a_1-a_2)$ (taking a large moment and using (2.4)). Incidentally, oscillations of the factor $(\frac{a_1-a_2}{p})$ play no role in the argument.

Overall, we will be very close to proving that $\Phi$ is RIP if most subsets of $\mathcal{B}$ lack additive structure. To this end, the authors actually prove something much stronger: They design $\mathcal{B}$ in such a way that all sufficiently large subsets have low additive structure. The following theorem is the first step in the design:

Theorem (Theorem 5). Fix $r,M\in\mathbb{N}$, $M\geq2$, and define the cube $\mathcal{C}:=\{0,\ldots,M-1\}^r\subseteq\mathbb{Z}^r$. Let $\tau$ denote the solution to the equation

$\displaystyle{\Big(\frac{1}{M}\Big)^{2\tau}+\Big(\frac{M-1}{M}\Big)^{\tau}=1}$.

Then for any subsets $A,B\subseteq\mathcal{C}$, we have

$|A+B|\geq\big(|A||B|\big)^\tau$.

As a consequence of this theorem (taking $A=B$), we have $|A+A|\geq|A|^{2\tau}$ for every $A\subseteq\mathcal{C}$, and since $\tau>1/2$, this means that large subsets $A$ have $|A+A|\gg|A|$, indicating low additive structure. However, $\mathcal{C}$ is a subset of the group $\mathbb{Z}^r$, whereas we need to construct a subset $\mathcal{B}$ of $\mathbb{F}_p$. The trick here is to pick $\mathcal{B}$ so that it inherits the additive structure of $\mathcal{C}$, and we use a Freiman isomorphism to accomplish this. In particular, we want a mapping $\varphi\colon\mathcal{C}\rightarrow\mathbb{F}_p$ such that $c_1+c_2=c_3+c_4$ if and only if $\varphi(c_1)+\varphi(c_2)=\varphi(c_3)+\varphi(c_4)$, and we will take $\mathcal{B}:=\varphi(\mathcal{C})$ — this is what it means for $\mathcal{C}$ and $\mathcal{B}$ to be Freiman isomorphic, and it’s easy to see that Freiman isomorphic sets have the same sized sumsets, difference sets and additive energy. In this case, it suffices to take

$\displaystyle{\mathcal{B}:=\bigg\{\sum_{j=1}^{r}x_j(2M)^{j-1}:x_1,\ldots,x_r\in\{0,\ldots,M-1\}\bigg\}}$.

Indeed, the $2M$-ary expansion of $b_1,b_2\in\mathcal{B}$ reveals the $c_1,c_2\in\mathcal{C}$ such that $\varphi(c_1)=b_1$ and $\varphi(c_2)=b_2$. Also, adding $b_1$ and $b_2$ incurs no carries, so the expansion of $b_1+b_2$ reveals $c_1+c_2$ (even when $c_1+c_2\not\in\mathcal{C}$).

We already know that large subsets of $\mathcal{C}$ (and $\mathcal{B}$) exhibit low additive structure, but the above theorem only gives this in terms of the sumset, whereas Lemma 9 requires low additive structure in terms of additive energy. As such, we will first convert the above theorem into a statement about difference sets, and then apply Corollary 1 to further convert it in terms of additive energy:

Corollary (essentially Corollary 3). Fix $r$, $M$ and $\tau$ according to Theorem 5, take $\mathcal{B}$ as defined above, and pick $s$ and $t$ such that $(2\tau-1)s\geq t$. Then every subset $B\subseteq\mathcal{B}$ such that $|B|>p^s$ satisfies $|B-B|>p^t|B|$.

Proof: First note that $-B$ is a translate of some other set $B'\subseteq\mathcal{B}$. Explicitly, if $b_0=\sum_{j=1}^r(M-1)(2M)^{j-1}$, then we can take $B':=b_0-B$. As such, Theorem 5 gives

$|B-B|=|B+B'|\geq|B|^{2\tau}=|B|^{2\tau-1}|B|>p^{(2\tau-1)s}|B|\geq p^t|B|.\quad\Box$

Corollary (essentially Corollary 4). Fix $r$, $M$ and $\tau$ according to Theorem 5, take $\mathcal{B}$ as defined above, and pick $\gamma$ and $\ell$ such that $(2\tau-1)(\ell-\gamma)\geq 10\gamma$. Then for every $\epsilon>0$, there exists $P$ such that for every $p\geq P$, every subset $S\subseteq\mathcal{B}$ with $|S|>p^\ell$ satisfies $E(S,S).

Proof: Suppose to the contrary that there exists $\epsilon>0$ such that there are arbitrarily large $p$ for which there is a subset $S\subseteq\mathcal{B}$ with $|S|>p^\ell$ and $E(S,S)\geq p^{-\gamma+\epsilon}|S|^3$. Writing $E(S,S)=|S|^3/K$, then $K\leq p^{\gamma-\epsilon}$. By Corollary 1, there exists $B\subseteq S$ such that, for sufficiently large $p$,

$|B|\geq|S|/(20K)>\frac{1}{20}p^{\ell-\gamma+\epsilon}>p^{\ell-\gamma}$,

and

$|B-B|\leq10^7K^9|S|\leq10^7K^9(20K|B|)\leq10^7\cdot20\cdot p^{10(\gamma-\epsilon)}|B|.

However, this contradicts the previous corollary with $s=\ell-\gamma$ and $t=10\gamma$.     $\Box$

Notice that we can weaken our requirements on $\gamma$ and $\ell$ if we had a version of Corollary 1 with a smaller exponent on $K$. This exponent comes from a version of the Balog–Szemeredi–Gowers lemma (Lemma 6), which follows from the proof of Lemma 2.2 of this paper. (Specifically, take $A=B$, and you need to change $A-_EB$ to $A+_EB$, but this change doesn’t affect the proof.) The authors indicate that it would be desirable to prove a better version of this lemma, but I’m not sure how easy that will be. If anything, one should probably try to prove Corollary 1 directly (instead of going through Lemma 6).