This spring, I’m teaching a graduate-level special topics course called “Mathematics of Data Science” at the Ohio State University. This will be a research-oriented class, and in lecture, I plan to cover some of the important ideas from convex optimization, probability, dimensionality reduction, clustering, and sparsity.
Click here for a draft of my lecture notes.
The current draft consists of a chapter on convex optimization. I will update the above link periodically. Feel free to comment below.
UPDATE #1: Lightly edited Chapter 1 and added a chapter on probability.
This last semester, I was a long-term visitor at the Simons Institute for the Theory of Computing. My time there was rather productive, resulting in a few (exciting!) arXiv preprints, which I discuss below.
1. SqueezeFit: Label-aware dimensionality reduction by semidefinite programming.
Suppose you have a bunch of points in high-dimensional Euclidean space, some labeled “cat” and others labeled “dog,” say. Can you find a low-rank projection such that after projection, cats and dogs remain separated? If you can implement such a projection as a sensor, then that sensor collects enough information to classify cats versus dogs. This is the main idea behind compressive classification.
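The paper's actual method solves a semidefinite program; the following is only a toy numpy illustration of the underlying idea (the data and all names here are mine, not from the paper): two well-separated classes admit a rank-1 projection, onto the difference of class means, after which they remain separated.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200  # ambient dimension, points per class

# Toy data: "cats" and "dogs" separated along one hidden direction.
cats = rng.normal(0.0, 0.1, (n, d)); cats[:, 0] += 1.0
dogs = rng.normal(0.0, 0.1, (n, d)); dogs[:, 0] -= 1.0

# A rank-1 projection onto the difference of the class means.
v = cats.mean(axis=0) - dogs.mean(axis=0)
v /= np.linalg.norm(v)
P = np.outer(v, v)

# After projection, the classes are still linearly separated, so a
# sensor measuring only the coordinate <v, x> suffices to classify.
margin = (cats @ P @ v).min() - (dogs @ P @ v).max()
print(f"separation margin after rank-1 projection: {margin:.2f}")
```

SqueezeFit finds such a projection by optimizing over all low-rank candidates via an SDP; the mean-difference direction above is just the simplest stand-in for this toy data.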
Two years ago, Boris Alexeev emailed me a problem:
Let $n \geq 2$. Suppose you have $n^2$ distinct numbers in some field. Is it necessarily possible to arrange the numbers into an $n \times n$ matrix of full rank?
Boris’s problem was originally inspired by a linear algebra exam problem at Princeton: Is it possible to arrange four distinct prime numbers in a rank-deficient $2 \times 2$ matrix? (The answer depends on whether you consider $-2$ to be prime.) Recently, Boris reminded me of his email, and I finally bothered to solve it. His hint: apply the Combinatorial Nullstellensatz. The solve was rather satisfying, and if you’re reading this, I highly recommend that you stop reading here and enjoy the solve yourself.
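As a quick sanity check on the exam problem (for ordinary positive primes only), a brute-force search confirms that no arrangement of four distinct primes gives a singular $2 \times 2$ matrix, exactly as unique factorization predicts:

```python
from itertools import permutations

# First 10 primes; check every ordered choice of four distinct ones.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# det [[a, b], [c, d]] = a*d - b*c.  For distinct positive primes,
# a*d = b*c would contradict unique factorization.
singular = [t for t in permutations(primes, 4) if t[0] * t[3] == t[1] * t[2]]
print(f"singular arrangements found: {len(singular)}")  # → 0
```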
This is the eleventh “research” thread of the Polymath16 project to make progress on the Hadwiger–Nelson problem, continuing this post. This project is a follow-up to Aubrey de Grey’s breakthrough result that the chromatic number of the plane is at least 5. Non-research discussion of the project should continue on the Polymath proposal page. We will summarize progress on the Polymath wiki page.
Here’s a brief summary of the progress made in the previous thread:
– Let $w(k)$ denote the supremum of $w$ such that the strip $\mathbb{R} \times [0,w]$ is $k$-colorable. Then of course $w(k) \leq w(k+1)$ for every $k$, and $w(k) = \infty$ for every $k \geq 7$, since the plane itself is 7-colorable. Furthermore, we have nontrivial lower bounds on $w(k)$ for small $k$; colorings that produce these lower bounds are depicted here. The upper bound for $k=3$ is given here.
– The largest known k-colorable disks for k=2,3,4,5 are depicted here.
Presumably, we can obtain decent upper bounds on $w(4)$ by restricting (a finite subset of) the ring to an infinite strip.
This is the fifth (and final) entry to summarize talks in the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Friday, we heard talks from Ilya Razenshteyn and Michael Kapralov. Below, I link videos and provide brief summaries of their talks.
Ilya Razenshteyn — Nearest Neighbor Methods
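To give a flavor of the area (this is my own sketch, not a summary of the talk), here is a minimal random-hyperplane locality-sensitive hashing scheme for approximate nearest neighbors under angular distance: nearby vectors tend to land in the same hash bucket, so a query only needs to be compared against its bucket.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 64, 1000, 16  # dimension, database size, hash bits

data = rng.normal(size=(n, d))
planes = rng.normal(size=(k, d))  # k random hyperplanes through 0

def hash_vec(x):
    # Sign pattern against the k hyperplanes -> a k-bit signature.
    return tuple((planes @ x > 0).astype(int))

# Bucket the database by signature.
buckets = {}
for i, x in enumerate(data):
    buckets.setdefault(hash_vec(x), []).append(i)

# A query at a small angle to data[0] likely lands in the same bucket,
# so the candidate set to scan is much smaller than the database.
query = data[0] + 0.05 * rng.normal(size=d)
candidates = buckets.get(hash_vec(query), [])
print(f"candidates to scan: {len(candidates)} out of {n}")
```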
This is the fourth entry to summarize talks in the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Thursday, we heard talks from Santosh Vempala and Ilias Diakonikolas. Below, I link videos and provide brief summaries of their talks.
Santosh Vempala — High Dimensional Geometry and Concentration
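One headline phenomenon from this area (a standard fact, not a summary of the talk): the Euclidean norm of a standard Gaussian vector in $\mathbb{R}^d$ concentrates tightly around $\sqrt{d}$, with relative fluctuations of order $1/\sqrt{d}$. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 10_000, 200

# Norms of standard Gaussian vectors in R^d concentrate around sqrt(d).
norms = np.linalg.norm(rng.normal(size=(trials, d)), axis=1)
rel = norms / np.sqrt(d)
print(f"relative norms in [{rel.min():.3f}, {rel.max():.3f}]")
```

With $d = 10{,}000$, every one of the 200 sampled norms lands within a few percent of $\sqrt{d}$.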
This is the third entry to summarize talks in the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Wednesday, we heard talks from Fred Roosta and Will Fithian. Below, I link videos and provide brief summaries of their talks.
Fred Roosta — Stochastic Second-Order Optimization Methods
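As a rough illustration of the flavor of such methods (my own sketch under my own choices, not from the talk): a Newton-type method for $\ell_2$-regularized logistic regression where the gradient is exact but the Hessian is estimated from a random subsample of the data, trading a little curvature accuracy for a much cheaper iteration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, s, lam = 2000, 10, 200, 1e-2  # data size, dim, subsample, ridge

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

w = np.zeros(d)
for _ in range(20):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n + lam * w          # full gradient
    idx = rng.choice(n, size=s, replace=False)  # subsample for curvature
    Xs, ps = X[idx], p[idx]
    H = Xs.T @ (Xs * (ps * (1 - ps))[:, None]) / s + lam * np.eye(d)
    w -= np.linalg.solve(H, grad)               # approximate Newton step

accuracy = np.mean((sigmoid(X @ w) > 0.5) == (y == 1.0))
print(f"training accuracy: {accuracy:.3f}")
```

Here each step solves a $d \times d$ system built from only $s = 200$ of the $n = 2000$ rows; with enough rows in the subsample, the estimated Hessian is close enough to the true one that the iteration still converges quickly.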