(CS/CNS/EE 155) Machine Learning & Data Mining


2019/2020 Winter Term (previous year)

Course Description

Prerequisite: background in algorithms, linear algebra, calculus, probability, and statistics (CS/CNS/EE/NB 154 or CS/CNS/EE 156a or instructor’s permission)

This course will cover popular methods in machine learning and data mining, with an emphasis on developing a working understanding of how to apply these methods in practice. It will also cover the core foundational concepts that underpin and motivate modern machine learning and data mining approaches, as well as some recent research developments.

Course Details

Late Homework Policy

Assignments will be due at 9pm on Friday via Gradescope. Students are allowed to use up to 48 late hours, which must be used in whole-hour units; specify the number of late hours used when turning in an assignment. Late hours cannot be used on the final exam. There will be no TA support over weekends.

Collaboration Policy

Detailed policy available here

Instructor

Yisong Yue   yyue@caltech.edu

Teaching Assistants

Matthew Levine   mlevine@caltech.edu
Alex Cui   acui@caltech.edu
James Deacon   jdeacon@caltech.edu
Alex Guerra   aguerra@caltech.edu
Alice Jin   qjin@caltech.edu
Frank Kou   fkou@caltech.edu
Marcus Dominguez-Kuhne   mddoming@caltech.edu
Karthik Nair   knair@caltech.edu
Jessica Wang   jessicawang@caltech.edu
Sherry Wang   shuxian@caltech.edu
Erika Shuyue Yu   syu5@caltech.edu
Albert Zhai   albertz@caltech.edu
Jim Zhang   jim@caltech.edu
Eric Zhao   elzhao@caltech.edu

Office Hours

Additional Textbooks and Resources

  • Machine Learning: a Probabilistic Perspective, by Kevin Murphy
  • Convex Optimization: Algorithms and Complexity (Free Version), by Sebastien Bubeck
  • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • A Course in Machine Learning, by Hal Daume III
  • Matrix Cookbook
  • Probability Review
  • Maximum Entropy and Logistic Regression
  • A Beginner's Guide to Recurrent Networks and LSTMs [link]
  • Stochastic Gradient Descent Tricks [link]
  • Practical Bayesian Optimization for Efficient Grid Search of Tuning Parameters. [paper][software]
  • Overview of Topic Models. [paper]
  • Tutorial on Learning Reductions. [pdf]
  • Learning Reductions Overview. [paper]
Assignments

Lectures & Recitation Schedule

Note: schedule is subject to change.

1/07/2020 Lecture: Administrivia, Basics, Bias/Variance, Overfitting [slides]
1/09/2020 Lecture: Perceptron, Gradient Descent [slides]
    Further reading: Daume Chapter 3; Mistake Bounds for Perceptron [link]; Stochastic Gradient Descent Tricks [link]; Bubeck Chapter 3
1/09/2020 Recitation: Introduction to Python for Machine Learning [materials]
1/14/2020 Lecture: SVMs, Logistic Regression, Neural Nets, Loss Functions, Evaluation Metrics [slides]
    Further reading: Bounds on Error Expectation for SVMs [link]
1/16/2020 Lecture: No lecture
1/16/2020 Recitation: Linear Algebra [slides]
    Further reading: The Matrix Cookbook [link]
1/21/2020 Lecture: Regularization, Lasso [slides]
    Further reading: Murphy 13.3
1/23/2020 Lecture: Decision Trees, Bagging, Random Forests [slides]
    Further reading: Overview of Decision Trees [pdf]; Overview of Bagging [pdf]; Overview of Random Forests [pdf]
1/28/2020 Lecture: Boosting, Ensemble Selection
    Further reading: Schapire's Overview of Boosting [pdf]; Papers on Ensemble Selection [paper1][paper2]
1/30/2020 Lecture: Deep Learning
    Further reading: Deep Learning Book [html]
1/30/2020 Recitation: PyTorch Tutorial
2/04/2020 Lecture: Deep Learning Part 2
2/06/2020 Lecture: Unsupervised Learning, Clustering, Dimensionality Reduction
2/11/2020 Lecture: Latent Factor Models, Non-Negative Matrix Factorization
    Further reading: Original Netflix Paper [link]
2/13/2020 Lecture: Embeddings
    Further reading: Locally Linear Embedding [link]; Playlist Embedding [link]; word2vec [link]
2/18/2020 Lecture: Recent Applications
2/20/2020 Lecture: Probabilistic Models, Naive Bayes
    Further reading: Murphy 3.5
2/20/2020 Recitation: Probability & Sampling
2/25/2020 Lecture: Hidden Markov Models
    Further reading: Murphy 17.3–17.5
2/27/2020 Lecture: Hidden Markov Models Part 2
2/27/2020 Recitation: Dynamic Programming
3/04/2020 Lecture: Recent Applications: Deep Generative Models
3/06/2020 Lecture: Survey of Advanced Topics
3/11/2020 Lecture: Review & Q/A