This spring, I’m teaching a graduate-level special topics course called “Mathematics of Data Science” at the Ohio State University. This will be a research-oriented class, and in lecture, I plan to cover some of the important ideas from convex optimization, probability, dimensionality reduction, clustering, and sparsity.
Click here for a draft of my lecture notes.
The current draft consists of a chapter on convex optimization. I will update the above link periodically. Feel free to comment below.
UPDATE #1: Lightly edited Chapter 1 and added a chapter on probability.
UPDATE #2: Lightly edited Chapter 2 and added a section on PCA.
UPDATE #3: Added a section on random projection.
Two years ago, Boris Alexeev emailed me a problem:
Let . Suppose you have distinct numbers in some field. Is it necessarily possible to arrange the numbers into an matrix of full rank?
Boris’s problem was originally inspired by a linear algebra exam problem at Princeton: Is it possible arrange four distinct prime numbers in a rank-deficient matrix? (The answer depends on whether you consider to be prime.) Recently, Boris reminded me of his email, and I finally bothered to solve it. His hint: Apply the combinatorial nullstellensatz. The solve was rather satisfying, and if you’re reading this, I highly recommend that you stop reading here and enjoy the solve yourself.
Continue reading A neat application of the polynomial method
Last week, I visited Joey Iverson at the University of Maryland, and we spent a lot of time working through different approaches to Zauner’s conjecture. In general, my relationship with this problem is very similar to Steve Flammia‘s description, as paraphrased by smerkel on Physics Stack Exchange:
[Flammia described] the SIC-POVM problem as a “heartbreaker” because every approach you take seems super promising but then inevitably fizzles out without really giving you a great insight as to why.
Case in point, Joey and I identified a promising approach involving ideas from our association schemes paper. We were fairly optimistic, and Joey even bet me $5 that our approach would work. Needless to say, I now have this keepsake from Joey:
While our failure didn’t offer any great insights (as Flammia predicted), the experience forced me to review the literature on Zauner’s conjecture a lot more carefully. A few things caught my eye, and I’ll discuss them here. Throughout, SIC denotes “symmetric informationally complete line set” and WH denotes “the Weyl-Heisenberg group.”
Continue reading Notes on Zauner’s conjecture
The polynomial method has made some waves recently (see this and that, for instance), and last week, Boris Alexeev gave a very nice talk on various applications of this method. This post is loosely based on his talk. All errors are my own.
It’s hard to pin down what exactly the polynomial method is. It’s a technique in algebraic extremal combinatorics, where the goal is to provide bounds on the sizes of objects with certain properties. The main idea is to identify the desired cardinality with some complexity measure of an algebraic object (e.g., the dimension of a vector space, the degree of a polynomial, or the rank of a tensor), and then use algebraic techniques to estimate that complexity measure. If at some point you use polynomials, then you might say you applied the polynomial method.
What follows is a series of instances of this meta-method.
Continue reading Introduction to the polynomial method (and other similar things)
A couple of weeks ago, I attended a workshop hosted by Darrin Speegle on the HRT Conjecture. First posed twenty years ago by Chris Heil, Jay Ramanathan, and Pankaj Topiwala in this paper, the conjecture states that every finite collection of time-frequency shifts of any nonzero function in is linearly independent. In this post, I will discuss some of the key ideas behind some of the various attempts at chipping away at HRT, and I’ll describe what appear to be fundamental barriers to a complete solution. For more information, see these survey papers: one and two.
First, some notation: Denote the translation-by- and modulation-by- operators by
respectively. Then the formal conjecture statement is as follows:
The HRT Conjecture. For every and every finite , the collection is linearly independent.
What follows are some of the popular methods for tackling HRT:
Continue reading The HRT Conjecture
Soledad Villar recently posted our latest paper on the arXiv (this one coauthored by her advisor, Rachel Ward). The paper provides guarantees for the k-means SDP when the points are drawn from a subgaussian mixture model. This blog entry will discuss one of the main ideas in our analysis, which we borrowed from Guedon and Vershynin’s recent paper.
Let’s start with two motivating applications:
The first application comes from graph clustering. Consider the stochastic block model, in which the vertices are secretly partitioned into two communities, each of size , and edges between vertices of a common community are drawn iid with some probability , and all other edges are drawn with probability . The goal of community estimation is to estimate the communities given a random draw of the graph. For this task, you might be inclined to find the maximum likelihood estimator for this model, but this results in an integer program. Relaxing the program leads to a semidefinite program, and amazingly, this program is tight and recovers the true communities with high probability when and for good choices of . (See this paper.) These edge probabilities scale like the threshold for connected Erdos-Renyi graphs, and this makes sense since we wouldn’t know how to assign vertices in isolated components. If instead, the probabilities were to scale like , then we would be in the “giant component” regime, so we’d still expect enough signal to correctly assign a good fraction of the vertices, but the SDP is not tight in this regime.
Continue reading Clustering noisy data with semidefinite relaxations
Last week, I attended this conference in Berlin, and much like the last CSA conference, it was very nice. This year, most of the talks followed one of three themes:
- Application-driven compressed sensing
- Quadratic or bilinear problems
- Clustering in graphs or Euclidean space
Examples of application-driven CS include theoretical results for radar-inspired sensing matrices and model-based CS for quantitative MRI. Readers of this blog are probably familiar with the prototypical quadratic problem (phase retrieval), and bilinear problems include blind deconvolution and self-calibration. Recently, I have blogged quite a bit about clustering in Euclidean space (specifically, k-means clustering), but I haven’t written much about clustering in graphs (other than its application to phase retrieval). For the remainder of this entry, I will discuss two of the talks from CSA2015 that covered different aspects of graph clustering.
Continue reading Compressed Sensing and its Applications 2015