# Sparse Representations, Numerical Linear Algebra, and Optimization Workshop

A couple of weeks ago, I attended the “Sparse Representations, Numerical Linear Algebra, and Optimization Workshop.” It was my first time at Banff, and I was thoroughly impressed by the weather, the facility, and the workshop organization. A few of the talks were recorded and are available here. Check out this good-looking group of participants:

I wanted to briefly outline some of the problems that were identified throughout the workshop.

# An intuition for concentration inequalities

I use concentration inequalities a lot in my research, and it’s about time I write a post about them. Throughout this post, $\{X_i\}$ will denote independent realizations of a common random variable $X$ taking values in the real line. We will take $\mu:=\mathbb{E}X<\infty$, $\sigma^2:=\mathrm{Var}(X)<\infty$, and

$\displaystyle{\overline{X}_n:=\frac{1}{n}\sum_{i=1}^nX_i.}$

The central limit theorem (CLT) essentially states that

$\displaystyle{\lim_{n\rightarrow\infty}\mathrm{Pr}\bigg(\big|\overline{X}_n-\mu\big|\geq\frac{t\sigma}{\sqrt{n}}\bigg)=2Q(t)\leq e^{-t^2/2} \qquad \forall t\geq0,}$

where the Q-function $Q(\cdot)$ denotes the tail probability of the standard normal distribution. Unfortunately, this only ensures pointwise convergence for each $t$, and so the CLT provides no tail bound that holds for any fixed $n$. The purpose of a concentration inequality is to provide such a bound.
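As a quick sanity check (a simulation sketch of ours, not part of the original argument), we can compare the empirical tail probability of $\overline{X}_n$ against the Gaussian-type bound $e^{-t^2/2}$; the distribution, sample size, and threshold below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the text): X uniform on [-1, 1],
# so mu = 0 and sigma^2 = 1/3.
n, trials, t = 400, 20000, 2.0
mu, sigma = 0.0, np.sqrt(1.0 / 3.0)

samples = rng.uniform(-1.0, 1.0, size=(trials, n))
means = samples.mean(axis=1)

# Empirical estimate of Pr(|X_bar_n - mu| >= t * sigma / sqrt(n))
empirical = np.mean(np.abs(means - mu) >= t * sigma / np.sqrt(n))
gaussian_bound = np.exp(-t**2 / 2)  # the e^{-t^2/2} envelope above

print(empirical, gaussian_bound)
```

For $t=2$ the limiting value $2Q(2)\approx0.046$ sits well below $e^{-2}\approx0.135$, and the simulated frequency should land near the former.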

# How Not to Be Wrong: The Power of Mathematical Thinking

I recently read this book by Jordan Ellenberg, and I wanted to briefly discuss it. My first impressions:

• Is that really where “not” belongs in the title? (Answer: yes.)
• This is nothing like The Grasshopper King.
• The footnotes are fun.

Jordan introduces the book with a college student asking why math is so important. This book is an answer of sorts: It provides a bunch of simple yet profound morsels of mathematical thinking. Actually, most of Jordan’s examples reveal how to properly think about the sort of math you might encounter in a newspaper (e.g., statistical significance that balding signifies future prostate cancer). I wonder if this book could form the basis of a “Math for the Humanities” type of class. Such a class might have more to offer the non-technical college student than calculus would. Overall, I highly recommend this book to everyone (including my mathematically disinclined wife).

# Robust width: A characterization of uniformly stable and robust compressed sensing

Jameson Cahill and I recently posted a new paper on the arXiv that we’re pretty excited about. Suppose you want to do compressed sensing with L1 minimization. That is, you get $y=\Phi x^\natural+e$ for some nearly $K$-sparse vector $x^\natural$ and some noise $e$ satisfying $\|e\|_2\leq\epsilon$, and then you attempt to reconstruct $x^\natural$ by taking

$\underset{x}{\arg\min} \quad \|x\|_1 \quad\mbox{ subject to } \quad\|\Phi x-y\|_2\leq\epsilon.$

One of the main results of compressed sensing is that the optimizer $x^\star$ of the above program satisfies

$\displaystyle{\|x^\star-x^\natural\|_2\leq \frac{C_0}{\sqrt{K}}\|x^\natural-x^\natural_K\|_1+C_1\epsilon}$

provided $\Phi$ satisfies the $(2K,\delta)$-restricted isometry property (RIP) for some $\delta<\sqrt{2}-1$. Here, $x^\natural_K$ denotes the best $K$-term approximation of $x^\natural$, and so $\|x^\natural-x^\natural_K\|_1$ captures how close $x^\natural$ is to being $K$-sparse. Jameson and I wondered if RIP was necessary to enjoy this uniformly stable and robust performance (uniform in the sense that the fixed matrix works for all $x^\natural$ simultaneously, stable because we don’t require $x^\natural$ to be exactly $K$-sparse, and robust because we allow noise). In our paper, we completely characterize the matrices that offer this performance, and we show that RIP matrices form a strict subset.
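The constrained program above is typically handed to a general-purpose convex solver; as a self-contained illustration (our sketch, not the paper's method), here is iterative soft-thresholding (ISTA) applied to the closely related unconstrained lasso $\min_x \tfrac12\|\Phi x-y\|_2^2+\lambda\|x\|_1$, with a Gaussian $\Phi$ and a hand-picked $\lambda$, both of which are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, K = 200, 80, 5            # ambient dim, measurements, sparsity (illustrative)
x_true = np.zeros(n)
support = rng.choice(n, size=K, replace=False)
x_true[support] = rng.standard_normal(K)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
y = Phi @ x_true                                  # noiseless for simplicity

# ISTA iteration: x <- soft_threshold(x - step * Phi^T(Phi x - y), step * lam)
lam = 0.01
step = 1.0 / np.linalg.norm(Phi, 2) ** 2          # 1 / ||Phi||_2^2 guarantees convergence
x = np.zeros(n)
for _ in range(5000):
    z = x - step * (Phi.T @ (Phi @ x - y))
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print(np.linalg.norm(x - x_true))  # small: the sparse vector is recovered
```

With $m=80$ random measurements of a $5$-sparse vector in $\mathbb{R}^{200}$, the recovery error is driven down to the small bias induced by $\lambda$.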

# Introduction to fast matrix multiplication

Recall how to multiply $n\times n$ matrices $A$ and $B$:

$\displaystyle{(AB)_{ik}=\sum_{j=1}^n A_{ij}B_{jk}}$

For each $(i,k)$, it takes $O(n)$ operations to calculate this sum, and so naive matrix multiplication takes $O(n^3)$ operations total. Let $\omega$ denote the infimum of $\alpha$ for which there exists an algorithm that computes the product of any two $n\times n$ matrices in $O(n^{\alpha})$ operations.
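The triple sum translates directly into code; a minimal Python sketch of the naive $O(n^3)$ algorithm, checked against a library routine, might look like:

```python
import numpy as np

def naive_matmul(A, B):
    """Naive O(n^3) matrix product via the defining triple loop."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for k in range(n):
            # (AB)_{ik} = sum_j A_{ij} B_{jk}: O(n) work per entry
            for j in range(n):
                C[i, k] += A[i, j] * B[j, k]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(np.allclose(naive_matmul(A, B), A @ B))  # True
```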

Conjecture. $\omega=2$.

A 3-way tensor is a complex function of index triples, i.e., $T\colon I\times J\times K\rightarrow\mathbb{C}$. A tensor has rank 1 if it enjoys a factorization of the form

$\displaystyle{T_{ijk}=x_iy_jz_k \qquad \forall (i,j,k)\in I\times J\times K.}$

The rank of a tensor is the smallest number of rank-1 tensors that sum to it, in analogy with matrix rank.
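To connect tensor rank back to $\omega$ (this example is our addition): Strassen's algorithm amounts to a rank-7 decomposition of the $2\times2$ matrix multiplication tensor, so $2\times2$ blocks can be multiplied with 7 multiplications instead of 8, giving $O(n^{\log_2 7})$ by recursion. A sketch for power-of-two sizes:

```python
import numpy as np

def strassen(A, B):
    """Strassen's recursive multiplication for n x n matrices, n a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive products (instead of eight): a rank-7 decomposition
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(np.allclose(strassen(A, B), A @ B))  # True
```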

# Derandomizing restricted isometries via the Legendre symbol

I recently finished writing the paper based on a talk I gave at MATHEON at TU Berlin last year. (The talk was very fun to give — halfway through, it featured a game show in which Felix Krahmer won a bottle of Tabasco.) This work was a collaboration with Afonso Bandeira, Matt Fickus and Joel Moreira, and in the original version (presented in my talk), we mixed two ingredients to derandomize Bernoulli RIP matrices:

(i) a strong version of the Chowla conjecture, and

(ii) the flat RIP trick from this paper.

The Chowla conjecture essentially states that as $n$ gets large, if you randomly draw $X$ from $\{1,\ldots,n\}$, then consecutive entries of the Liouville function $\lambda(X+1),\ldots,\lambda(X+r)$ are asymptotically independent. The strong version of the Chowla conjecture that we used in (i) not only gave asymptotic independence, but also provided a rate of convergence. The intuition is that consecutive Liouville function entries behave very similarly to iid Bernoulli $\pm1$'s, and so we should expect that populating a matrix with these entries will yield RIP. Indeed, we used flat RIP to demonstrate random-like cancellations that prove RIP. Unfortunately, the strong version of the Chowla conjecture that we used implies the Chowla conjecture, which has been open for almost 50 years.
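For concreteness (a small illustration of ours, not from the paper), the Liouville function is $\lambda(n)=(-1)^{\Omega(n)}$, where $\Omega(n)$ counts prime factors with multiplicity. A short sketch computes it by trial division and checks that the partial sums stay small, consistent with the pseudorandom-sign intuition:

```python
def liouville(n):
    """Liouville function lambda(n) = (-1)^Omega(n), Omega counted with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:          # leftover prime factor
        count += 1
    return -1 if count % 2 else 1

vals = [liouville(n) for n in range(1, 10001)]
partial = sum(vals)
print(partial)  # small in magnitude relative to 10000 if the signs look random
```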

# Compressed sensing: Variations on a theme

A couple of months ago, I attended a workshop at Oberwolfach (my first!) called “Mathematical Physics meets Sparse Recovery.” I had a great time. I was asked to give the first talk of the week to get everyone on the same page with respect to sparse recovery. Here are the slides from my talk. What follows is an extended abstract (I added hyperlinks throughout for easy navigation):

Compressed sensing has been an exciting subject of research over the last decade, and the purpose of this talk was to provide a brief overview of the subject. First, we considered certain related topics (namely, image compression and denoising) that led up to the rise of compressed sensing. In particular, wavelets provide a useful model for images: natural images tend to be well approximated by linear combinations of relatively few wavelets. This sparsity model has enabled JPEG2000 to provide particularly efficient image compression with negligible distortion. Additionally, the same model has been leveraged to remove random noise from natural images.

Given that natural images enjoy such a useful model, one may ask whether the model can be leveraged to decrease the number of measurements needed to completely determine an image. For example, an MRI scan might require up to 2 hours of exposure time, after which the image might be compressed with JPEG2000, meaning most of the measurements are effectively discarded. So is it possible to measure only the important parts of the image and not waste time in the acquisition process? This is the main idea underlying compressed sensing, as introduced by Candes, Romberg and Tao, and by Donoho.