MATH 8610: Mathematics of Data Science

This spring, I’m teaching a graduate-level special topics course called “Mathematics of Data Science” at the Ohio State University. This will be a research-oriented class, and in lecture, I plan to cover some of the important ideas from convex optimization, probability, dimensionality reduction, clustering, and sparsity.

Click here for a draft of my lecture notes.

The current draft consists of a chapter on convex optimization. I will update the above link periodically. Feel free to comment below.

UPDATE #1: Lightly edited Chapter 1 and added a chapter on probability.

A few paper announcements

This past semester, I was a long-term visitor at the Simons Institute for the Theory of Computing. My time there was rather productive, resulting in a few (exciting!) arXiv preprints, which I discuss below.

1. SqueezeFit: Label-aware dimensionality reduction by semidefinite programming.

Suppose you have a bunch of points in high-dimensional Euclidean space, some labeled “cat” and others labeled “dog,” say. Can you find a low-rank projection under which the cats and dogs remain separated? If you can implement such a projection as a sensor, then that sensor collects enough information to classify cats versus dogs. This is the main idea behind compressive classification.
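
To make this concrete, here is a minimal sketch of one way to hunt for such a projection with a semidefinite program, in the spirit of SqueezeFit: relax the projection to a symmetric matrix M with 0 ⪯ M ⪯ I, require every cat-dog pair to stay separated under M, and minimize the trace to encourage low rank. The toy data, separation threshold, and eigenvalue rounding below are illustrative choices on my part, not necessarily the paper’s exact formulation.

```python
import cvxpy as cp
import numpy as np

# Toy stand-in data: two labeled point clouds in R^5 (hypothetical).
rng = np.random.default_rng(0)
d = 5
offset = np.eye(d)[0]  # classes are separated along the first coordinate
cats = rng.normal(scale=0.3, size=(10, d)) + offset
dogs = rng.normal(scale=0.3, size=(10, d)) - offset

# SDP relaxation: a symmetric M with 0 <= M <= I (in the PSD order) stands
# in for a projection. Each cat-dog pair must stay at squared distance at
# least 1 under M, while minimizing trace(M) encourages low rank.
M = cp.Variable((d, d), symmetric=True)
constraints = [M >> 0, np.eye(d) - M >> 0]
for x in cats:
    for y in dogs:
        v = x - y
        constraints.append(v @ M @ v >= 1)  # v^T M v, affine in M
prob = cp.Problem(cp.Minimize(cp.trace(M)), constraints)
prob.solve()

# Round the relaxation to an honest low-rank map: keep the eigenvectors
# whose eigenvalues are closer to 1 than to 0.
eigvals, eigvecs = np.linalg.eigh(M.value)
P = eigvecs[:, eigvals > 0.5]  # d-by-k, projects into k dimensions
print("trace(M) =", round(float(np.trace(M.value)), 3))
print("projected dimension:", P.shape[1])
print("cats @ P shape:", (cats @ P).shape, "| dogs @ P shape:", (dogs @ P).shape)
```

On this toy data, the spectrum of M should concentrate on roughly one eigenvector, recovering the separating direction; the 0.5 threshold is an ad hoc rounding, and any actual rank guarantee would require the analysis in the paper.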

Foundations of Data Science Boot Camp, V

This is the fifth (and final) entry summarizing talks from the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Friday, we heard talks from Ilya Razenshteyn and Michael Kapralov. Below, I link videos and provide brief summaries of their talks.

Ilya Razenshteyn — Nearest Neighbor Methods

Foundations of Data Science Boot Camp, IV

This is the fourth entry summarizing talks from the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Thursday, we heard talks from Santosh Vempala and Ilias Diakonikolas. Below, I link videos and provide brief summaries of their talks.

Santosh Vempala — High Dimensional Geometry and Concentration

Foundations of Data Science Boot Camp, III

This is the third entry summarizing talks from the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Wednesday, we heard talks from Fred Roosta and Will Fithian. Below, I link videos and provide brief summaries of their talks.

Fred Roosta — Stochastic Second-Order Optimization Methods

Foundations of Data Science Boot Camp, II

This is the second entry summarizing talks from the “boot camp” week of the program on Foundations of Data Science at the Simons Institute for the Theory of Computing, continuing this post. On Tuesday, we heard talks from Ken Clarkson, Rachel Ward, and Michael Mahoney. Below, I link videos and provide brief summaries of their talks.

Ken Clarkson — Sketching for Linear Algebra: Randomized Hadamard, Kernel Methods

Foundations of Data Science Boot Camp

I’m spending the semester at the Simons Institute for the Theory of Computing as part of the program on Foundations of Data Science. Today was the first day of the “boot camp” week, which is organized to acquaint participants with the program’s key themes. We heard talks from Ravi Kannan and David Woodruff. Below, I link videos and provide brief summaries of their talks.

Ravi Kannan — Foundations of Data Science
