Two chapters in Finite Frames: Theory and Applications

Last year, I helped write a couple of chapters in the book Finite Frames: Theory and Applications.

Earlier this year, I was invited to become a reviewer for Mathematical Reviews, and my first assignment was to review two chapters from this book. I decided to post my reviews here, and I highly recommend reading these chapters first-hand (both are very well written, and they provide fantastic context within the literature).

Quantization and finite frames

Alexander M. Powell, Rayan Saab, Özgür Yılmaz

This chapter surveys the finite frames literature on quantization. Specifically, it focuses on the following “analysis” formulation:

  1. First, a vector x\in\mathbb{R}^N is encoded into frame coefficients with respect to a frame \Phi, yielding y=\Phi^*x\in\mathbb{R}^M.
  2. Next, a quantizer \mathcal{Q} is applied to y to get q=\mathcal{Q}(y)\in\mathcal{A}^M; here, \mathcal{A} is some finite alphabet.
  3. Finally, a decoder \mathcal{G} is applied to q to form an estimate \widetilde{x}=\mathcal{G}(q)\in\mathbb{R}^N of the original vector x.

With this formulation, the quantization problem is the following: Given some (bounded) signal class \mathcal{B}\subseteq\mathbb{R}^N, an N\times M frame \Phi, and an alphabet size, design an efficient quantizer and decoder such that the distortion \|x-\mathcal{G}(\mathcal{Q}(\Phi^*x))\| is small over \mathcal{B}. Throughout, two particular types of quantizers are discussed: the memoryless scalar quantizer (MSQ) and various Sigma-Delta (\Sigma\Delta) quantizers.
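
To fix ideas, here is a minimal numerical sketch of the three steps above (my own illustration, not code from the chapter), using a random unit norm frame, an assumed uniform alphabet with step \delta, and the canonical dual as a linear decoder:

```python
import numpy as np

# A toy end-to-end run of the three steps: analysis, quantization, decoding.
np.random.seed(0)
N, M = 4, 16                              # signal dimension and number of frame vectors
Phi = np.random.randn(N, M)
Phi /= np.linalg.norm(Phi, axis=0)        # unit norm frame (columns of Phi)

x = np.random.randn(N)
x /= 2 * np.linalg.norm(x)                # keep x inside a bounded signal class

y = Phi.T @ x                             # step 1: frame coefficients y = Phi^* x
delta = 0.1                               # assumed alphabet: integer multiples of delta
q = delta * np.round(y / delta)           # step 2: a memoryless scalar quantizer
Psi = np.linalg.pinv(Phi.T)               # step 3: linear decoder via the canonical dual
x_tilde = Psi @ q

print("distortion:", np.linalg.norm(x - x_tilde))
```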

MSQ individually quantizes each entry of y according to the same function: Q(y_i)\in\mathcal{A}. While MSQ is particularly efficient, it can produce undesirable levels of distortion. This is indicated by Figure 8.2, which illustrates that the cells of MSQ, namely the sets of vectors in \mathbb{R}^N that share the same quantized frame coefficients, can be rather large and few in number. MSQ distortion is more rigorously analyzed in Subsection 8.2.1. Here, \mathcal{A}^M is taken to be a subset of “rounded” frame coefficients in \mathbb{R}^M, and a dual frame \Psi of \Phi is used to reconstruct: \widetilde{x}=\Psi q. With this decoder, distortion is bounded, but not in terms of the frame’s redundancy, indicating that MSQ fails to adequately exploit this redundancy. In Subsection 8.2.2, MSQ is also evaluated in terms of average distortion \mathrm{MSE}=\mathbb{E}\|x-\widetilde{x}\|^2 using a uniform noise model. Specifically, equation (8.15) gives that MSE is proportional to the squared Frobenius norm of the dual frame \Psi used in reconstruction, which in turn is minimized when \Psi is the canonical dual of \Phi. If we fix the signal dimension N and \Phi is any N\times M unit norm tight frame, then \mathrm{MSE}=\Theta(1/M). For comparison, it is known that \mathrm{MSE}=\Omega(1/M^2) for a general MSQ decoder. Figure 8.2 illustrates that a linear decoder can be inconsistent in the sense that \mathcal{Q}(\Phi^*\widetilde{x})\neq\mathcal{Q}(\Phi^*x), and so a consistent alternative is proposed in Subsection 8.2.3, which has asymptotically minimal \mathrm{MSE}=O(1/M^2).
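
The uniform noise model behind this MSE analysis is easy to simulate: the quantization error is modeled as i.i.d. uniform noise on [-\delta/2,\delta/2], in which case \mathrm{MSE}=\mathbb{E}\|\Psi e\|^2=(\delta^2/12)\|\Psi\|_\mathrm{Frob}^2, the shape of the relation summarized by equation (8.15). A quick check of this identity (my own sketch, with an assumed step size \delta):

```python
import numpy as np

np.random.seed(1)
N, M, delta, trials = 4, 32, 0.05, 20000

Phi = np.random.randn(N, M)
Phi /= np.linalg.norm(Phi, axis=0)           # unit norm frame
Psi = np.linalg.pinv(Phi.T)                  # canonical dual (minimizes the Frobenius norm)

# Uniform noise model: quantization error is i.i.d. uniform on [-delta/2, delta/2].
E = np.random.uniform(-delta / 2, delta / 2, size=(trials, M))
errors = E @ Psi.T                           # reconstruction error Psi e, one row per trial
mse_empirical = np.mean(np.sum(errors**2, axis=1))
mse_predicted = (delta**2 / 12) * np.linalg.norm(Psi, 'fro')**2

print(mse_empirical, mse_predicted)          # the two numbers should nearly agree
```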

Compared to MSQ, \Sigma\Delta quantization offers less distortion at the price of (slightly) less efficiency. Here, \mathcal{Q} quantizes the frame coefficients in order, each time compensating for any quantization errors incurred with the previous coefficient(s). Specifically, the rth order \Sigma\Delta quantizer iteratively compensates for the previous r coefficients; see equations (8.28) and (8.37) for explicit definitions. Again applying a dual frame \Psi of \Phi as the decoder, Corollary 8.2 bounds the distortion in rth order \Sigma\Delta quantization in terms of the operator norm of \Psi D^r, where D is the finite difference operator. While \|\Psi\|_\mathrm{Frob} was used to evaluate duals for MSQ reconstruction (and naturally led to the canonical dual), \|\Psi D^r\|_\mathrm{op} is used to evaluate duals for \Sigma\Delta reconstruction, and as Proposition 8.3 reveals, the rth order Sobolev dual of \Phi minimizes this quantity. In pursuit of an explicit bound on distortion, this quantity is analyzed for Sobolev duals of different types of frames. First, Theorem 8.3 considers a frame path, which is a parameterized curve in \mathbb{R}^N with the property that regularly sampling this curve produces a frame. In this case, sampling at M points to get \Phi and reconstructing with the rth order Sobolev dual incurs distortion \|x-\widetilde{x}\|=O(1/M^r); when r>2, this is a substantial improvement over MSQ, even compared to MSQ’s average-case distortion. Next, Theorem 8.4 gives that if \Phi is Gaussian random with sufficient redundancy \lambda=M/N, then rth order Sobolev reconstruction has distortion of the form \|x-\widetilde{x}\|=O(1/\lambda^{cr}) with high probability. Finally, in the cases where \Phi is Sobolev self-dual or harmonic, Section 8.5 optimizes the order r as a function of redundancy \lambda to get root-exponential distortion: \|x-\widetilde{x}\|=O(e^{-c\sqrt{\lambda}}). Simulations in Figure 8.6 corroborate these distortion rates, suggesting that exponential decay is not possible (at least with such frames under \Sigma\Delta quantization).
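
As a rough illustration of the mechanism (a first-order sketch of my own, not the chapter’s equations (8.28) or (8.37)), the quantizer below carries the accumulated error forward in a state variable. Reconstruction uses the canonical dual of a sampled frame path rather than a Sobolev dual, so it only hints at the improved decay in M:

```python
import numpy as np

def sigma_delta_1(y, delta):
    """First-order Sigma-Delta: quantize y_i plus the running error u, then update u."""
    q = np.zeros_like(y)
    u = 0.0
    for i, yi in enumerate(y):
        q[i] = delta * np.round((yi + u) / delta)
        u = u + yi - q[i]
    return q

delta = 0.2
x = np.array([0.3, -0.4])                            # a fixed signal in R^2

for M in (16, 64, 256):
    t = np.arange(M) / M
    Phi = np.vstack([np.cos(2 * np.pi * t),          # sample a frame path on the circle
                     np.sin(2 * np.pi * t)])
    y = Phi.T @ x
    Psi = np.linalg.pinv(Phi.T)                      # canonical dual
    q_msq = delta * np.round(y / delta)              # memoryless scalar quantization
    q_sd = sigma_delta_1(y, delta)                   # first-order Sigma-Delta
    print(M, np.linalg.norm(x - Psi @ q_msq), np.linalg.norm(x - Psi @ q_sd))
```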

Finite frames for sparse signal processing

Waheed U. Bajwa, Ali Pezeshki

This chapter surveys the sparse signal processing literature as it relates to finite frames. Overall, the authors describe what can be said about sparse vectors x given y=\Phi x+n, where \Phi is N\times M with N\ll M and n is some noise vector. As an overarching theme, they consider how the pursuit of different types of information about x imposes different verifiable geometric properties on \Phi=[\varphi_1\cdots\varphi_M]. Specifically, the following matrix parameters are often desired to be small (a short numerical sketch of these quantities follows the list):

  • spectral norm: \displaystyle\|\Phi\|_\mathrm{op}=\sup_{x\neq 0}\frac{\|\Phi x\|}{\|x\|}
  • worst-case coherence: \displaystyle\mu_\Phi=\max_{i\neq j}\frac{|\langle\varphi_i,\varphi_j\rangle|}{\|\varphi_i\|\|\varphi_j\|}
  • average coherence: \displaystyle\nu_\Phi=\max_i\bigg|\sum_{j\neq i}\frac{\langle\varphi_i,\varphi_j\rangle}{\|\varphi_i\|\|\varphi_j\|}\bigg|
  • sum coherence: \displaystyle\sum_{i}\sum_{j<i}\frac{|\langle\varphi_i,\varphi_j\rangle|}{\|\varphi_i\|\|\varphi_j\|}
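
For concreteness, all four quantities can be read off the Gram matrix of the normalized columns; the sketch below (my own) follows the formulas exactly as listed above (note that some references also normalize the average coherence by 1/(M-1)):

```python
import numpy as np

def frame_parameters(Phi):
    """Spectral norm, worst-case, average, and sum coherence of Phi = [phi_1 ... phi_M]."""
    U = Phi / np.linalg.norm(Phi, axis=0)            # normalize the columns
    G = U.T @ U                                      # Gram matrix of normalized columns
    off = G - np.diag(np.diag(G))                    # keep only the off-diagonal entries
    spectral_norm = np.linalg.norm(Phi, 2)
    worst_case_coherence = np.max(np.abs(off))
    average_coherence = np.max(np.abs(off.sum(axis=1)))   # some references divide by M - 1
    sum_coherence = np.sum(np.abs(off[np.tril_indices_from(off, -1)]))
    return spectral_norm, worst_case_coherence, average_coherence, sum_coherence

np.random.seed(0)
print(frame_parameters(np.random.randn(8, 32)))      # a random 8 x 32 frame
```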

Intuitively, having any one of these parameters be small ensures that, in some sense, the column vectors of \Phi are well distributed in \mathbb{R}^N. Throughout, the authors show that when these parameters are small, sparse signal processing can be rather successful. Section 9.2 focuses on how to determine x from y. Starting with the noiseless setting (where n=0), it is established in Theorem 9.1 that every K-sparse vector can be recovered by \ell_0 minimization provided \Phi satisfies the unique representation property: that every subcollection of 2K columns is linearly independent. Using the Gershgorin circle theorem, it then suffices for \Phi to have unit-norm columns and K<\frac{1}{2}(1+1/\mu_\Phi). Unfortunately, this upper bound is O(\sqrt{N}) by the Welch bound, whereas drawing the columns of \Phi at random from the unit sphere allows the sparsity level K to be as large as N/2 with probability 1.
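
The Gershgorin step can be checked numerically: for unit-norm columns, the Gram matrix of any 2K columns has all eigenvalues at least 1-(2K-1)\mu_\Phi, which is positive precisely when K<\frac{1}{2}(1+1/\mu_\Phi). A small sketch of my own, using a spikes-and-sines dictionary purely because its coherence 1/\sqrt{N} is easy to pin down:

```python
import numpy as np

N, K = 64, 4
F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # orthonormal DFT basis
Phi = np.hstack([np.eye(N), F])                 # spikes-and-sines dictionary, M = 2N

G = Phi.conj().T @ Phi
mu = np.max(np.abs(G - np.eye(2 * N)))          # worst-case coherence, here 1/sqrt(N)
print("coherence bound on K:", 0.5 * (1 + 1 / mu))   # = 4.5, so K = 4 qualifies

# Gershgorin: any 2K columns have Gram eigenvalues >= 1 - (2K-1)*mu > 0,
# so every 2K-column submatrix has full rank (the unique representation property).
rng = np.random.default_rng(0)
S = rng.choice(2 * N, 2 * K, replace=False)     # a random support of size 2K
eigs = np.linalg.eigvalsh(Phi[:, S].conj().T @ Phi[:, S])
print(eigs.min(), ">=", 1 - (2 * K - 1) * mu)
```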

This reveals a square-root bottleneck that comes with using worst-case coherence; the authors address this bottleneck later in the chapter. First, tractable alternatives to \ell_0 minimization are identified, namely \ell_1 minimization (basis pursuit) and orthogonal matching pursuit, and Theorems 9.4 and 9.5 guarantee recovery, again provided K<\frac{1}{2}(1+1/\mu_\Phi).
In the noisy case (where n\neq0), Theorem 9.6 guarantees stable reconstruction using basis pursuit with an inequality constraint, provided K=O(1/\mu_\Phi). Theorem 9.7 gives a similar guarantee for orthogonal matching pursuit, except the stability also depends on the size of the smallest nonzero entry of x (this is an artifact of the reconstruction algorithm). In the case where the noise is stochastic, the least absolute shrinkage and selection operator (LASSO) produces a stable estimate of x provided K=O(1/\mu_\Phi) (Theorem 9.8), as does a modified version of orthogonal matching pursuit (Theorem 9.9), though this requires the smallest nonzero entry of x to be sufficiently large.
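
Orthogonal matching pursuit itself is only a few lines; the sketch below (my own, assuming the sparsity level K is known to the decoder) greedily selects the column most correlated with the current residual and then re-solves a least-squares problem on the selected support:

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal matching pursuit: pick the column most correlated with the
    residual, then re-fit the coefficients by least squares on the chosen support."""
    M = Phi.shape[1]
    support, residual = [], y.copy()
    for _ in range(K):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x_hat = np.zeros(M)
    x_hat[support] = coeffs
    return x_hat

np.random.seed(4)
N, M, K = 64, 128, 4
Phi = np.random.randn(N, M)
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm columns
x = np.zeros(M)
x[np.random.choice(M, K, replace=False)] = np.random.randn(K)
y = Phi @ x + 0.01 * np.random.randn(N)          # noisy measurements
print(np.linalg.norm(x - omp(Phi, y, K)))
```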

To address the square-root bottleneck, the authors note that these reconstruction algorithms perform well when K=\Theta(N) provided \Phi satisfies the restricted isometry property. However, this combinatorial property is particularly difficult to check, and so the authors shift their focus to an alternative notion of performance: Instead of seeking guarantees for all sparse signals simultaneously, they seek to perform well for all but a small fraction of the sparse signals. Section 9.3 is dedicated to such typical guarantees, which use the other matrix parameters above to achieve performance for most K-sparse vectors with K=O(N/\log M). First, Theorems 9.10 and 9.11 give that \ell_0 and \ell_1 minimization recover most K-sparse vectors x when n=0, provided \|\Phi\|_\mathrm{op} and \mu_\Phi are small. Next, n is taken to be stochastic, and the authors pursue an estimate \hat{x} that makes regression error \|\Phi x-\Phi\hat{x}\|_2 small. To this end, Theorem 9.12 gives that LASSO succeeds for most K-sparse vectors x provided \|\Phi\|_\mathrm{op} and \mu_\Phi are small; moreover, the requirement on \mu_\Phi here is much weaker than those in Theorems 9.10 and 9.11. Also, for most K-sparse vectors x with sufficiently large nonzero entries, Theorem 9.13 gives that LASSO determines the support of x provided \|\Phi\|_\mathrm{op} and \mu_\Phi are small.
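
The LASSO estimate is the minimizer of \frac{1}{2}\|y-\Phi x\|_2^2+\lambda\|x\|_1; one minimal way to compute it is iterative soft thresholding. The sketch below is my own, and the choice of \lambda is an assumed rough scaling, not the tuning from Theorem 9.12:

```python
import numpy as np

def lasso_ista(Phi, y, lam, iters=500):
    """Minimize 0.5*||y - Phi x||_2^2 + lam*||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2     # step size from the spectral norm
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        z = x + step * Phi.T @ (y - Phi @ x)     # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return x

np.random.seed(5)
N, M, K, sigma = 64, 256, 5, 0.05
Phi = np.random.randn(N, M) / np.sqrt(N)         # columns of roughly unit norm
x = np.zeros(M)
x[np.random.choice(M, K, replace=False)] = np.random.choice([-1, 1], K)
y = Phi @ x + sigma * np.random.randn(N)

x_hat = lasso_ista(Phi, y, lam=2 * sigma * np.sqrt(np.log(M)))   # assumed lambda scaling
print("regression error:", np.linalg.norm(Phi @ (x - x_hat)))
```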

For each of these typical guarantees, the notion of “most” K-sparse vectors x has been based on a model in which the support of x is drawn randomly, and then the nonzero entries are drawn according to some distribution. The remaining typical guarantees remove the randomness from the nonzero entry values by working for most permutations of all K-sparse vectors, and the noise is taken to be stochastic. Theorem 9.14 gives that one-step thresholding determines the support of most K-sparse vectors x with sufficiently large nonzero entries (relative to \|x\|_2) provided \mu_\Phi and \nu_\Phi are small; the fact that \|\Phi\|_\mathrm{op} need not be small is an artifact of the algorithm, and the introduction of \nu_\Phi compensates for the removal of randomness from the signal model. Next, Theorem 9.15 gives that one-step thresholding reconstructs the same sort of K-sparse vectors provided \|\Phi\|_\mathrm{op}, \mu_\Phi and \nu_\Phi are small.
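
One-step thresholding is the simplest of these procedures: correlate y with each frame vector once and keep the indices whose correlation clears a threshold. A sketch of the idea (my own; the threshold below is a crude placeholder, not the constant from Theorem 9.14):

```python
import numpy as np

def one_step_thresholding(Phi, y, tau):
    """Estimate the support of x from a single pass of correlations Phi^T y."""
    return np.flatnonzero(np.abs(Phi.T @ y) > tau)

rng = np.random.default_rng(7)
N, M, K, sigma = 512, 2048, 3, 0.05
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm columns
support = rng.choice(M, K, replace=False)
x = np.zeros(M)
x[support] = 1.0                                 # nonzero entries large relative to the noise
y = Phi @ x + sigma * rng.standard_normal(N)

estimate = one_step_thresholding(Phi, y, tau=0.5)   # crude threshold for this sketch
print(sorted(support), sorted(estimate))
```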

The chapter concludes with Section 9.4, which focuses on sparse signal detection. Here, y is either \Phi n or \Phi(x+n), where n is stochastic and x is K-sparse, and the goal is to determine which model holds for a given instance. The authors take the approach of finding the \Phi's which are optimal in some sense for K=1, then optimizing over these for K=2, and so on. They first perform this lexicographic optimization in terms of the worst-case signal-to-noise ratio. For this case, Theorem 9.16 gives that the K=1 optimizers are equal norm tight frames, Theorem 9.17 gives that the K=2 optimizers are Grassmannian equal norm tight frames, and Theorem 9.18 bounds the K>2 optimizers in terms of worst-case coherence \mu_\Phi. Next, they optimize in terms of average-case signal-to-noise ratio; here, Theorem 9.19 gives that the K=1 optimizers are tight frames, Theorem 9.20 gives that the K=2 optimizers minimize sum coherence, and Lemma 9.5 bounds the K>2 optimizers in terms of worst-case coherence \mu_\Phi.
