STATS 231C: Theories of Machine Learning
Spring 2022
The course provides an introduction to the theoretical analysis of machine learning methods, with an emphasis on prediction problems. Both the statistical and the computational aspects of these problems will be considered. The course covers approaches such as local averaging, kernel methods, convex optimization, ensemble methods, and neural networks, as well as online approaches. The theory of kernel methods will be covered in some depth.
Logistics
- Lectures: TR 9:30am-10:45am, BUNCHE 3164. (Also online via Zoom)
- Instructor: Arash A. Amini
- Office Hours: None.
- Special Reader: Gautem Dudeja
- Announcements will be posted on Campuswire. Please sign up using code 1926.
- Slides, homework, etc. will be posted in the Box folder.
- Homework should be submitted on Gradescope. Please sign up using code RW85JD.
Lecture videos
- Lecture videos are available in the following YouTube playlist.
Tentative Syllabus
- Probabilistic framework for prediction problems
- Local averaging and Stone’s theorem
- Kernel methods and Reproducing Kernel Hilbert Spaces (RKHS)
- Empirical risk minimization
- Risk bounds, uniform laws of large numbers, Rademacher averages, and VC dimension
- Bandit algorithms
- Ensemble methods
Recommended Books
There is no official text for the course, but you might find the following books useful:
- [UML] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.
- [DFT] L. Györfi, M. Kohler, A. Krzyżak and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, 2002.
- [PTP] L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.
- [HDS] M. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Cambridge University Press, 2019.
- [ESL] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. 2nd Ed., Springer, 2009.
- [BA] T. Lattimore and C. Szepesvári, Bandit Algorithms, Cambridge University Press, 2020.
- [SVM] I. Steinwart and A. Christmann, Support Vector Machines, Springer, 2008.
- F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, 2007.
- V. Koltchinskii, Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems, Springer, 2011.
- [HDP] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge University Press, 2018.
- [MFI] E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models, Cambridge University Press, 2016.
Prerequisites
- STATS 200A/B and preferably 200C (concurrently). Alternatively, graduate-level probability theory and theoretical statistics.
- STATS 231A/B. Alternatively, basic knowledge of machine learning algorithms.
- Mathematical maturity is also expected.
Grading
The grade will be based 50% on homework and 50% on the final project.
- Homework. Problem sets will be assigned roughly every two weeks. Students are encouraged to discuss the problems together, but must produce and submit their solutions independently. Use of LaTeX in preparing solutions is strongly encouraged.
- Project. The final project can be in any area related to the course. You might:
- extend a theoretical result,
- summarize existing results (a survey paper),
- develop a new method or run experiments on existing methods,
- or do a combination of these.
Having a theory component in the project is highly encouraged. You must submit a brief written report and give a presentation in class during the last week. You may work on the project in groups of two.