Deterministic RIP matrices: Breaking the square-root bottleneck, II

In my previous blog entry, I kicked off an online project to make progress on the deterministic RIP matrix problem by optimizing the analysis in a breakthrough paper by Bourgain et al. Since then, I have launched a wiki to maintain a “scoreboard” of world records for this problem, and Afonso Bandeira made his own contribution to the project. The purpose of this entry is to introduce some additive combinatorics and to develop some understanding of how it is leveraged to prove RIP for a particular matrix construction. While this entry does not immediately produce a new world record, it does optimize the analysis of Corollaries 3 and 4 of the paper. I plan to generalize Lemmas 9 and 10 in a future blog post, which will lead to an explicit optimization problem that incorporates the work in this entry (and, hopefully, a new world record).

— A brief introduction to additive combinatorics —

In this section, we briefly detail some key ideas from additive combinatorics. Given an additive group G and finite sets A,B\subseteq G, we can define the sumset

A+B:=\{a+b:a\in A,~b\in B\},

the difference set

A-B:=\{a-b:a\in A,~b\in B\}

(not to be confused with a difference set in the design-theoretic sense), and the additive energy

E(A,B):=\#\big\{(a_1,a_2,b_1,b_2)\in A^2\times B^2:a_1+b_1=a_2+b_2\big\}.

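For concreteness, here is a small brute-force sketch of these three quantities for subsets of \mathbb{Z}/n\mathbb{Z} (my own illustration, nothing from the paper; it computes everything straight from the definitions, so it is only sensible for small sets):

```python
# Brute-force sumset, difference set, and additive energy in Z/nZ.
from itertools import product

def sumset(A, B, n):
    """A + B = {a + b : a in A, b in B} inside Z/nZ."""
    return {(a + b) % n for a, b in product(A, B)}

def difference_set(A, B, n):
    """A - B = {a - b : a in A, b in B} inside Z/nZ."""
    return {(a - b) % n for a, b in product(A, B)}

def additive_energy(A, B, n):
    """E(A,B) = #{(a1,a2,b1,b2) in A^2 x B^2 : a1 + b1 = a2 + b2}."""
    return sum(1 for a1, a2, b1, b2 in product(A, A, B, B)
               if (a1 + b1) % n == (a2 + b2) % n)

n = 20
A = {0, 1, 2, 3, 4}                     # an arithmetic progression in Z/20Z
print(len(sumset(A, A, n)))             # 9  (= 2|A| - 1: lots of additive structure)
print(len(difference_set(A, A, n)))     # 9
print(additive_energy(A, A, n))         # 85 (compare with |A|^3 = 125)
```
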
These definitions are useful in quantifying the additive structure of a set. In particular, consider the following:

Lemma. A nonempty finite subset A of an additive group G satisfies the following inequalities:

(i) |A+A|\geq|A|

(ii) |A-A|\geq|A|

(iii) E(A,A)\leq|A|^3

with equality precisely when A is a translate of some subgroup of G.

Proof: For (i), pick a\in A. Then |A+A|\geq|A+a|=|A|. Considering

\displaystyle{A+A=\bigcup_{a\in A}(A+a)},

we have equality in (i) precisely when A+A=A+a for every a\in A. Equivalently, for any fixed a_0\in A and every a\in A, addition by a-a_0 permutes the members of A+a_0. Since A-a_0 is a translate of A+a_0, this is further equivalent to the following: for any fixed a_0\in A and every a\in A, addition by a-a_0 permutes the members of A-a_0. It certainly suffices for H:=A-a_0 to be a group, and it is a simple exercise to verify that this is also necessary.

The proof for (ii) is similar.

For (iii), we note that

E(A,A)=\#\big\{(a,b,c)\in A^3:a+b-c\in A\big\}\leq|A|^3,

with equality precisely when A has the property that a+b-c\in A for every a,b,c\in A. Again, it clearly suffices for A-a_0 to be a group, and necessity is a simple exercise.     \Box
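
To see the equality cases in action, here is a quick numerical check (again my own sketch): in \mathbb{Z}/12\mathbb{Z}, a coset of a subgroup attains all three bounds with equality, an arithmetic progression comes close, and a generic set of the same size is further away:

```python
# Checking the lemma's equality cases in Z/12Z.
from itertools import product

def stats(A, n):
    AplusA = {(a + b) % n for a, b in product(A, A)}
    AminusA = {(a - b) % n for a, b in product(A, A)}
    E = sum(1 for a1, a2, b1, b2 in product(A, A, A, A)
            if (a1 + b1) % n == (a2 + b2) % n)
    return len(AplusA), len(AminusA), E

n = 12
coset = {1, 4, 7, 10}       # 1 + {0, 3, 6, 9}, a coset of the subgroup 3Z/12Z
ap = {0, 1, 2, 3}           # an arithmetic progression
generic = {1, 2, 5, 11}     # no particular structure
for A in (coset, ap, generic):
    s, d, E = stats(A, n)
    print(sorted(A), ": |A+A| =", s, ", |A-A| =", d, ", E(A,A) =", E, ", |A|^3 =", len(A) ** 3)
# coset:   |A+A| = 4,  |A-A| = 4,  E = 64 = |A|^3   (all three bounds tight)
# AP:      |A+A| = 7,  |A-A| = 7,  E = 44           (close to maximal structure)
# generic: |A+A| = 8,  |A-A| = 10, E = 34           (least structure of the three)
```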

The notion of additive structure is somewhat intuitive. You should think of a translate of a subgroup as having maximal additive structure. When the bounds (i), (ii) and (iii) are close to being achieved by A (e.g., A is an arithmetic progression), you should think of A as having a lot of additive structure. Interestingly, while there are different measures of additive structure (e.g., |A-A| and E(A,A)), they often exhibit certain relationships (perhaps not surprisingly). The following is an example of such a relationship which is used throughout the paper by Bourgain et al.:

Lemma (Corollary 1). If E(A,A)\geq|A|^3/K, then there exists a set A'\subseteq A such that |A'|\geq|A|/(20K) and |A'-A'|\leq10^7K^9|A|.

In words, a set with a lot of additive energy necessarily has a large subset with a small difference set. This is proved using a version of the Balog–Szemerédi–Gowers lemma.

If translates of subgroups have maximal additive structure, then which sets have minimal additive structure? It turns out that random sets tend to (nearly) have this property, and one way to detect low additive structure is Fourier bias:

\displaystyle{\|A\|_u:=\max_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|},

where the Fourier transform here has a \frac{1}{|G|} factor out front (it’s not unitary). Indeed, \|A\|_u captures how far E(A,A) is from its minimal value |A|^4/|G|:

Lemma. For any subset A of a finite additive group G, we have

(i) \displaystyle{E(A,A)\geq\frac{|A|^4}{|G|}}, and

(ii) \displaystyle{\|A\|_u^4\leq\frac{1}{|G|^3}\Big(E(A,A)-\frac{|A|^4}{|G|}\Big)\leq\frac{|A|}{|G|}\|A\|_u^2}.

Proof: Define \lambda_x:=\#\{(a,a')\in A^2:a-a'=x\}. Then (i) follows from Cauchy-Schwarz:

\displaystyle{|A|^2=\sum_{x\in G}\lambda_x\leq|G|^{1/2}\|\lambda\|_2=\big(|G|E(A,A)\big)^{1/2}}.

We will prove (ii) assuming G=\mathbb{Z}/n\mathbb{Z}, but the proof generalizes. Denote e_n(x):=e^{2\pi ix/n}. For the left-hand inequality, we consider

\displaystyle{\sum_{\theta\in G}|\widehat{1_A}(\theta)|^4=\sum_{\theta\in G}\bigg|\frac{1}{|G|}\sum_{a\in A}e_n(-\theta a)\bigg|^4=\frac{1}{|G|^4}\sum_{\theta\in G}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2},

where the last step is by expanding |w|^2=w\overline{w}. Then Parseval’s identity simplifies this to \frac{1}{|G|^3}\|\lambda\|_2^2=\frac{1}{|G|^3}E(A,A). We use this to bound \|A\|_u^4:

\displaystyle{\|A\|_u^4=\max_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|^4\leq\sum_{\substack{\theta\in G\\\theta\neq0}}|\widehat{1_A}(\theta)|^4=\frac{1}{|G|^3}E(A,A)-\frac{|A|^4}{|G|^4}}.

For the right-hand inequality, we apply Parseval’s identity:

\displaystyle{E(A,A)=\sum_{x\in G}\lambda_x^2=\frac{1}{|G|}\sum_{\theta\in G}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2=\frac{|A|^4}{|G|}+\frac{1}{|G|}\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2}

From here, we apply the expansion |w|^2=w\overline{w}

\displaystyle{\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2=\sum_{x\in G}\lambda_xe_n(-\theta x)}

to continue:

\displaystyle{\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{x\in G}\lambda_xe_n(-\theta x)\bigg|^2=\sum_{\substack{\theta\in G\\\theta\neq0}}\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^4\leq\sum_{\substack{\theta\in G\\\theta\neq0}}\big(|G|\|A\|_u\big)^2\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2}.

Applying Parseval’s identity then gives

\displaystyle{E(A,A)\leq\frac{|A|^4}{|G|}+\big(|G|\|A\|_u\big)^2\cdot\frac{1}{|G|}\sum_{\theta\in G}\bigg|\sum_{a\in A}e_n(-\theta a)\bigg|^2=\frac{|A|^4}{|G|}+|G|^2\|A\|_u^2|A|},

which is a rearrangement of the right-hand inequality.     \Box
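
Before moving on, here is a numerical sanity check of this lemma (my own sketch, taking G=\mathbb{Z}/n\mathbb{Z} and using the \frac{1}{|G|}-normalized Fourier transform from above):

```python
# Check (i) and both inequalities in (ii) for a random subset of Z/nZ.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
n = 499
A = rng.choice(n, size=60, replace=False)     # a "random" subset of Z/nZ

ind = np.zeros(n)
ind[A] = 1.0
hat = np.fft.fft(ind) / n                     # \hat{1_A}(theta) with the 1/|G| normalization
bias = np.max(np.abs(hat[1:]))                # ||A||_u  (theta = 0 excluded)

lam = Counter((int(a) - int(b)) % n for a in A for b in A)   # lambda_x
E = sum(v * v for v in lam.values())                         # E(A,A) = ||lambda||_2^2

print("(i) :", E, ">=", len(A) ** 4 / n)
print("(ii):", bias ** 4, "<=", (E - len(A) ** 4 / n) / n ** 3, "<=", (len(A) / n) * bias ** 2)
```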

— The matrix construction —

Now that we’ve covered the basics of additive combinatorics, let’s consider how these ideas are useful to the pursuit of deterministic RIP matrices. The main idea is that we desire a Gram matrix \Phi^*\Phi whose entries exhibit cancellations for weak flat RIP (see my previous post). By the last lemma, we can control cancellations of complex exponentials

\displaystyle{\bigg|\sum_{a\in A}e_n(\theta a)\bigg|\leq|G|\|A\|_u,\qquad\theta\neq0}

in terms of the additive energy of the index set A. This motivates us to pursue a Gram matrix whose entries are complex exponentials. To this end, consider the following vector:

\displaystyle{u_{a,b}:=\frac{1}{\sqrt{p}}\Big(e_p(ax^2+bx)\Big)_{x\in\mathbb{F}_p}},

where p is prime and \mathbb{F}_p denotes the field of size p. Such vectors are called chirps, and they are used in a variety of applications including radar. Here, we are mostly interested in the form of their inner products. If a_1=a_2, then \langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\delta_{b_1,b_2} by the geometric sum formula. Otherwise, the inner product is much more interesting:

\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{1}{p}\sum_{x\in\mathbb{F}_p}e_p\Big((a_1-a_2)x^2+(b_1-b_2)x\Big)}.

Since a_1-a_2\neq0, we can complete the square in the exponent, and changing variables to y:=x+(b_1-b_2)/(2(a_1-a_2)) gives

\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{1}{p}e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)\sum_{y\in\mathbb{F}_p}e_p\Big((a_1-a_2)y^2\Big)}.

Finally, this can be simplified using a quadratic Gauss sum formula:

\displaystyle{\langle u_{a_1,b_1},u_{a_2,b_2}\rangle=\frac{\sigma_p}{\sqrt{p}}\bigg(\frac{a_1-a_2}{p}\bigg)e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)},

where \sigma_p is 1 or i (depending on whether p is 1 or 3\bmod 4) and (\frac{a_1-a_2}{p}) is a Legendre symbol, taking value \pm1 according to whether or not a_1-a_2 is a quadratic residue \bmod p. Modulo these factors, the above inner product is a complex exponential, and since we want these in our Gram matrix \Phi^*\Phi, we will take \Phi to have columns of the form u_{a,b} — in fact, the columns will be \{u_{a,b}\}_{(a,b)\in\mathcal{A}\times\mathcal{B}} for some well-designed sets \mathcal{A},\mathcal{B}\subseteq\mathbb{F}_p.
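
As a sanity check on this closed form, here is a short sketch (mine, with an arbitrarily chosen prime and parameters) that builds two chirps and compares the direct inner product against the Gauss sum formula:

```python
# Verify the chirp inner product formula numerically.
import numpy as np

p = 31                                    # any odd prime; 31 = 3 mod 4, so sigma_p = i

def e_p(x):
    return np.exp(2j * np.pi * (x % p) / p)

def chirp(a, b):
    x = np.arange(p)
    return e_p(a * x * x + b * x) / np.sqrt(p)

def legendre(x):
    """Legendre symbol (x/p) for x not divisible by p, via Euler's criterion."""
    return 1 if pow(int(x) % p, (p - 1) // 2, p) == 1 else -1

sigma_p = 1 if p % 4 == 1 else 1j

(a1, b1), (a2, b2) = (3, 7), (11, 20)     # any pair with a1 != a2
u1, u2 = chirp(a1, b1), chirp(a2, b2)
direct = np.sum(u1 * np.conj(u2))         # <u_{a1,b1}, u_{a2,b2}> computed directly

da, db = a1 - a2, b1 - b2
phase = (-db * db * pow(4 * da % p, p - 2, p)) % p    # -(b1-b2)^2 / (4(a1-a2)) mod p
closed = (sigma_p / np.sqrt(p)) * legendre(da) * e_p(phase)

print(direct, closed, np.abs(direct - closed) < 1e-10)
```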

For weak flat RIP, we want to bound the following quantity for \Omega_1,\Omega_2\subseteq\mathcal{A}\times\mathcal{B} with |\Omega_1|,|\Omega_2|\leq\sqrt{p}:

\displaystyle{\bigg|\bigg\langle\sum_{(a_1,b_1)\in\Omega_1}u_{a_1,b_1},\sum_{(a_2,b_2)\in\Omega_2}u_{a_2,b_2}\bigg\rangle\bigg|}.

For i=1,2, define

  • A_i:=\{a:\exists b\mbox{ s.t. }(a,b)\in\Omega_i\} and
  • \Omega_i(a):=\{b:(a,b)\in\Omega_i\}.

These provide an alternate expression for the quantity of interest:

\displaystyle{\bigg|\sum_{(a_1,b_1)\in\Omega_1}\sum_{(a_2,b_2)\in\Omega_2}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|=\bigg|\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|},

which admits the following bound by the triangle inequality:

\displaystyle{\leq\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\bigg|\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}\langle u_{a_1,b_1},u_{a_2,b_2}\rangle\bigg|=\frac{1}{\sqrt{p}}\sum_{\substack{a_1\in A_1\\a_2\in A_2}}\bigg|\sum_{\substack{b_1\in\Omega_1(a_1)\\b_2\in\Omega_2(a_2)}}e_p\bigg(-\frac{(b_1-b_2)^2}{4(a_1-a_2)}\bigg)\bigg|}.

Pleasingly, it now suffices to bound a sum of complex exponentials, which we feel equipped to do using additive combinatorics. The following lemma does precisely this (it can be viewed as an analog of the previous lemma).

Lemma (Lemma 9). For every \theta\in\mathbb{F}_p^* and B_1,B_2\subseteq\mathbb{F}_p, we have

\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|\leq|B_1|^{1/2}E(B_1,B_1)^{1/8}|B_2|^{1/2}E(B_2,B_2)^{1/8}p^{1/8}}.

Proof: First, Cauchy-Schwarz gives

\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2=\bigg|\sum_{b_1\in B_1}1\cdot\sum_{b_2\in B_2}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2}

\displaystyle{\leq|B_1|\sum_{b_1\in B_1}\bigg|\sum_{b_2\in B_2}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^2}.

Expanding |w|^2=w\overline{w} and rearranging then gives an alternate expression for the right-hand side:

\displaystyle{|B_1|\sum_{b_2,b_2'\in B_2}e_p\Big(\theta(b_2^2-(b_2')^2)\Big)\overline{\sum_{b_1\in B_1}e_p\Big(\theta(2b_1(b_2-b_2'))\Big)}}.

Applying Cauchy-Schwarz again, we then have

\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^4\leq|B_1|^2|B_2|^2\sum_{b_2,b_2'\in B_2}\bigg|\sum_{b_1\in B_1}e_p\Big(\theta(2b_1(b_2-b_2'))\Big)\bigg|^2},

and expanding |w|^2=w\overline{w} this time gives

\displaystyle{|B_1|^2|B_2|^2\sum_{\substack{b_1,b_1'\in B_1\\b_2,b_2'\in B_2}}e_p\Big(2\theta(b_1-b_1')(b_2-b_2')\Big)}.

At this point, it is convenient to change variables, namely, x=b_1-b_1' and y=b_2-b_2':

\displaystyle{\bigg|\sum_{\substack{b_1\in B_1\\b_2\in B_2}}e_p\Big(\theta(b_1-b_2)^2\Big)\bigg|^4\leq |B_1|^2|B_2|^2\sum_{x,y\in\mathbb{F}_p}\lambda_x\mu_ye_p(2\theta xy),\qquad(1)}

where \lambda_x:=\#\{(b_1,b_1')\in B_1^2:b_1-b_1'=x\} and similarly for \mu_y in terms of B_2. We now apply Cauchy-Schwarz again to bound the sum in (1):

\displaystyle{\bigg|\sum_{x\in\mathbb{F}_p}\lambda_x\sum_{y\in\mathbb{F}_p}\mu_ye_p(2\theta xy)\bigg|^2\leq\|\lambda\|_2^2\sum_{x\in\mathbb{F}_p}\bigg|\sum_{y\in\mathbb{F}_p}\mu_ye_p(2\theta xy)\bigg|^2},

and changing variables x':=-2\theta x (this change is invertible since \theta\neq0), we see that the right-hand side is a sum of squares of the Fourier coefficients of \mu. As such, Parseval’s identity gives the following simplification:

\displaystyle{\bigg|\sum_{x,y\in\mathbb{F}_p}\lambda_x\mu_ye_p(2\theta xy)\bigg|^2\leq p\|\lambda\|_2^2\|\mu\|_2^2=pE(B_1,B_1)E(B_2,B_2)}.

Applying this bound to (1) gives the result.     \Box
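
Here is a quick numerical illustration of Lemma 9 (my own sketch with arbitrary parameters): for the random sets below, the exponential sum sits comfortably below the stated bound.

```python
# Compare the exponential sum in Lemma 9 with the stated bound.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
p = 499
theta = 17                                # any nonzero element of F_p
B1 = rng.choice(p, size=40, replace=False)
B2 = rng.choice(p, size=55, replace=False)

def e_p(x):
    return np.exp(2j * np.pi * (x % p) / p)

def energy(B):
    """E(B,B) via the difference-count representation E = sum_x lambda_x^2."""
    lam = Counter((int(b) - int(b2)) % p for b in B for b2 in B)
    return sum(v * v for v in lam.values())

lhs = abs(sum(e_p(theta * (int(b1) - int(b2)) ** 2) for b1 in B1 for b2 in B2))
rhs = (len(B1) ** 0.5 * energy(B1) ** 0.125 *
       len(B2) ** 0.5 * energy(B2) ** 0.125 * p ** 0.125)
print(f"|sum| = {lhs:.2f} <= bound = {rhs:.2f}")
```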

— How to construct \mathcal{B} —

The previous lemma enables us to prove weak-flat-RIP-type cancellations in the cases where \Omega_1(a_1),\Omega_2(a_2)\subseteq\mathcal{B} both lack additive structure. Indeed, the method of this paper is to do precisely this, and in the remaining cases (where either \Omega_1(a_1) or \Omega_2(a_2) has more additive structure), the cancellations instead come from the dilation weights 1/(a_1-a_2). This seems to contradict the paragraph after (2.9) in the journal version of the paper (this paragraph appears to suffer from several typos) — here is what I suspect should be the corrected version:

… To prove the cancellation in (2.8), we basically split it into two cases: (i) when B'=\Omega_i(a_i) has less additive structure (i.e., E(B',B') is small), the cancellation comes from the sum over b_1, b_2 (with a_1, a_2 fixed), and (ii) when B' has more additive structure, one gets dispersion of the phases from the dilation weights 1/(a_1-a_2) (taking a large moment and using (2.4)). Incidentally, oscillations of the factor (\frac{a_1-a_2}{p}) play no role in the argument.

Overall, we will be very close to proving that \Phi is RIP if most subsets of \mathcal{B} lack additive structure. To this end, the authors actually prove something much stronger: They design \mathcal{B} in such a way that all sufficiently large subsets have low additive structure. The following theorem is the first step in the design:

Theorem (Theorem 5). Fix r,M\in\mathbb{N}, M\geq2, and define the cube \mathcal{C}:=\{0,\ldots,M-1\}^r\subseteq\mathbb{Z}^r. Let \tau denote the solution to the equation

\displaystyle{\Big(\frac{1}{M}\Big)^{2\tau}+\Big(\frac{M-1}{M}\Big)^{\tau}=1}.

Then for any subsets A,B\subseteq\mathcal{C}, we have

|A+B|\geq\big(|A||B|\big)^\tau.
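
The exponent \tau is only defined implicitly, but it is easy to compute numerically. Here is a small sketch (mine) that solves for \tau by bisection; since the left-hand side of the defining equation is decreasing in \tau, exceeds 1 at \tau=1/2, and falls below 1 at \tau=1, the solution always lies strictly between 1/2 and 1:

```python
# Solve (1/M)^{2 tau} + ((M-1)/M)^tau = 1 for tau by bisection.
def tau(M, iters=100):
    f = lambda t: M ** (-2 * t) + ((M - 1) / M) ** t   # decreasing in t
    lo, hi = 0.5, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
    return (lo + hi) / 2

for M in (2, 3, 5, 10, 100):
    print(M, round(tau(M), 6))
# M = 2 gives tau = log_2((1 + sqrt 5)/2) ~ 0.694242; tau decreases toward 1/2 as M grows.
```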

As a consequence of this theorem (taking A=B), we have |A+A|\geq|A|^{2\tau} for every A\subseteq\mathcal{C}, and since \tau>1/2, this means that large subsets A have |A+A|\gg|A|, indicating low additive structure. However, \mathcal{C} is a subset of the group \mathbb{Z}^r, whereas we need to construct a subset \mathcal{B} of \mathbb{F}_p. The trick here is to pick \mathcal{B} so that it inherits the additive structure of \mathcal{C}, and we use a Freiman isomorphism to accomplish this. In particular, we want a mapping \varphi\colon\mathcal{C}\rightarrow\mathbb{F}_p such that c_1+c_2=c_3+c_4 if and only if \varphi(c_1)+\varphi(c_2)=\varphi(c_3)+\varphi(c_4), and we will take \mathcal{B}:=\varphi(\mathcal{C}) — this is what it means for \mathcal{C} and \mathcal{B} to be Freiman isomorphic, and it’s easy to see that Freiman isomorphic sets have sumsets, difference sets and additive energies of the same size. In this case, it suffices to take

\displaystyle{\mathcal{B}:=\bigg\{\sum_{j=1}^{r}x_j(2M)^{j-1}:x_1,\ldots,x_r\in\{0,\ldots,M-1\}\bigg\}}.

Indeed, the 2M-ary expansion of b_1,b_2\in\mathcal{B} reveals the c_1,c_2\in\mathcal{C} such that \varphi(c_1)=b_1 and \varphi(c_2)=b_2. Also, adding b_1 and b_2 incurs no carries, so the expansion of b_1+b_2 reveals c_1+c_2 (even when c_1+c_2\not\in\mathcal{C}).
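
Here is a sketch of this construction (mine, with arbitrary small parameters), together with a check that a random subset of \mathcal{C} and its image in \mathcal{B} have the same sumset size and additive energy. Note that the check implicitly assumes p exceeds the largest possible sum of two elements of \mathcal{B}, so that nothing wraps around \bmod p:

```python
# Build B from the cube C = {0,...,M-1}^r via base-(2M) expansions and verify that
# the map preserves sumset sizes and additive energy (the Freiman isomorphism property).
from itertools import product
import random

r, M = 3, 4
p = 1009                        # prime, larger than (2M)^r = 512, so sums never wrap around mod p

def phi(c):
    """The Freiman map: digits c = (x_1, ..., x_r) -> sum_j x_j (2M)^(j-1) mod p."""
    return sum(x * (2 * M) ** j for j, x in enumerate(c)) % p

cube = list(product(range(M), repeat=r))
random.seed(0)
C = random.sample(cube, 20)     # a random subset of the cube
B = [phi(c) for c in C]         # its image in F_p

def sumset_size(S, add):
    return len({add(s, t) for s in S for t in S})

def energy(S, add):
    return sum(1 for a, b, c, d in product(S, repeat=4) if add(a, b) == add(c, d))

add_Zr = lambda s, t: tuple(x + y for x, y in zip(s, t))
add_Fp = lambda s, t: (s + t) % p

print(sumset_size(C, add_Zr), sumset_size(B, add_Fp))   # equal
print(energy(C, add_Zr), energy(B, add_Fp))             # equal
```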

We already know that large subsets of \mathcal{C} (and \mathcal{B}) exhibit low additive structure, but the above theorem expresses this only in terms of the sumset, whereas Lemma 9 requires low additive structure in terms of additive energy. As such, we will first convert the above theorem into a statement about difference sets, and then apply Corollary 1 to further convert it into a statement about additive energy:

Corollary (essentially Corollary 3). Fix r, M and \tau according to Theorem 5, take \mathcal{B} as defined above, and pick s and t such that (2\tau-1)s\geq t. Then every subset B\subseteq\mathcal{B} such that |B|>p^s satisfies |B-B|>p^t|B|.

Proof: First note that -B is a translate of some other set B'\subseteq\mathcal{B}. Explicitly, if b_0=\sum_{j=1}^r(M-1)(2M)^{j-1}, then we can take B':=b_0-B. As such, Theorem 5 gives

|B-B|=|B+B'|\geq|B|^{2\tau}=|B|^{2\tau-1}|B|>p^{(2\tau-1)s}|B|\geq p^t|B|.\quad\Box

Corollary (essentially Corollary 4). Fix r, M and \tau according to Theorem 5, take \mathcal{B} as defined above, and pick \gamma and \ell such that (2\tau-1)(\ell-\gamma)\geq 10\gamma. Then for every \epsilon>0, there exists P such that for every p\geq P, every subset S\subseteq\mathcal{B} with |S|>p^\ell satisfies E(S,S)<p^{-\gamma+\epsilon}|S|^3.

Proof: Suppose to the contrary that there exists \epsilon>0 such that there are arbitrarily large p for which there is a subset S\subseteq\mathcal{B} with |S|>p^\ell and E(S,S)\geq p^{-\gamma+\epsilon}|S|^3. Writing E(S,S)=|S|^3/K, then K\leq p^{\gamma-\epsilon}. By Corollary 1, there exists B\subseteq S such that, for sufficiently large p,

|B|\geq|S|/(20K)>\frac{1}{20}p^{\ell-\gamma+\epsilon}>p^{\ell-\gamma},

and

|B-B|\leq10^7K^9|S|\leq10^7K^9(20K|B|)\leq10^7\cdot20\cdot p^{10(\gamma-\epsilon)}|B|<p^{10\gamma}|B|.

However, this contradicts the previous corollary with s=\ell-\gamma and t=10\gamma.     \Box

Notice that we could weaken our requirements on \gamma and \ell if we had a version of Corollary 1 with a smaller exponent on K. This exponent comes from a version of the Balog–Szemerédi–Gowers lemma (Lemma 6), which follows from the proof of Lemma 2.2 of this paper. (Specifically, take A=B, and you need to change A-_EB to A+_EB, but this change doesn’t affect the proof.) The authors indicate that it would be desirable to prove a better version of this lemma, but I’m not sure how easy that will be. If anything, one should probably try to prove Corollary 1 directly (instead of going through Lemma 6).
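
To put some rough numbers on this: the hypothesis of Corollary 4 rearranges to \gamma\leq\frac{2\tau-1}{2\tau+9}\ell, and if Corollary 1 held with exponent k on K in place of 9, then (as far as I can tell) running the same argument would relax this to \gamma\leq\frac{2\tau-1}{2\tau+k}\ell. The following back-of-the-envelope sketch (mine, not a claim from the paper) quantifies how much room such an improvement would buy:

```python
# How the admissible ratio gamma/ell in Corollary 4 would depend on the exponent k
# in a hypothetical improved Corollary 1 with |A' - A'| <= C * K^k * |A|.
# The (extrapolated) requirement (2 tau - 1)(ell - gamma) >= (k+1) gamma rearranges
# to gamma/ell <= (2 tau - 1)/(2 tau + k); k = 9 is the exponent currently available.
def tau(M, iters=100):
    f = lambda t: M ** (-2 * t) + ((M - 1) / M) ** t
    lo, hi = 0.5, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
    return (lo + hi) / 2

M = 2
t = tau(M)
for k in (9, 5, 3, 1):
    print(f"k = {k}:  gamma/ell <= {(2 * t - 1) / (2 * t + k):.4f}")
# With M = 2 (tau ~ 0.6942), k = 9 forces gamma/ell <= 0.0374; a smaller k would
# relax this considerably.
```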
