(CS/CNS/EE 155) Machine Learning & Data Mining


2016/2017 Winter Term

Course Description

Prerequisite: background in algorithms, linear algebra, calculus, probability, and statistics (CS/CNS/EE/NB 154 or CS/CNS/EE 156a or instructor’s permission)

This course covers popular methods in machine learning and data mining, with an emphasis on developing a working understanding of how to apply them in practice. It also covers the core foundational concepts that underpin and motivate modern machine learning and data mining approaches, as well as some recent research developments.

Course Details

Late Homework Policy

**Updated January 10th 2017** Assignments will be due at 9pm on Friday via Moodle. Students are allowed to use up to three late tokens. Using a late token extends the due date to the following Monday at 9pm. Students cannot use more than one late token per assignment. Late tokens cannot be used for the final exam. There will be no TA support over the weekends.

Instructor

Yisong Yue yyue@caltech.edu

Teaching Assistants

Milan Cvitkovic mcvitkov@caltech.edu
Jagriti Agrawal jagrawal@caltech.edu
Avi Dutta adutta@caltech.edu
Andrew Kang akang@caltech.edu
Emily Mazo emazo@caltech.edu
Sidd Murching smurching@caltech.edu
Suraj Nair snair@caltech.edu
Sarthak Sahu ssahu@caltech.edu

Office Hours

Optional Textbooks

  • Machine Learning: a Probabilistic Perspective, by Kevin Murphy
  • Convex Optimization: Algorithms and Complexity (Free Version), by Sebastien Bubeck
  • A Course in Machine Learning, by Hal Daume III

Because this is an advanced-level course, all relevant material can be learned from research papers and supplementary lecture notes. However, these books are excellent references, and I will refer to various chapters throughout the course.

Assignments

Lectures & Recitation Schedule

    Note: schedule is subject to change.

    Each entry lists the date, session, and materials, followed by further reading where available.
    1/05/2017 Lecture: Administrivia, Basics, Bias/Variance, Overfitting [slides]
    1/05/2017 Recitation: Introduction to Python for Machine Learning [slides]
    1/10/2017 Lecture: Perceptron, Gradient Descent [slides] Daume Chapter 3
        Mistake Bounds for Perceptron [link]
        AdaGrad [link]
        Stochastic Gradient Descent Tricks [link]
        Bubeck Chapter 3
    1/12/2017 Lecture: SVMs, Logistic Regression, Neural Nets, Loss Functions [slides]
    1/12/2017 Recitation: Linear Algebra [slides][iPython] The Matrix Cookbook [link]
    1/17/2017 Lecture: Regularization, Lasso [slides] Murphy 13.3
    1/19/2017 Lecture: Decision Trees, Bagging, Random Forests [slides] Overview of Decision Trees [pdf]
        Overview of Bagging [pdf]
        Overview of Random Forests [pdf]
    1/19/2017 Recitation: NO RECITATION
    1/24/2017 Lecture: Boosting, Ensemble Selection [slides] Schapire's Overview of Boosting [pdf]
    1/26/2017 Lecture: Deep Learning (taught by Joe Marino) [slides] Deep Learning Book [html]
        A Brief Overview of Deep Learning [link]
    1/26/2017 Recitation: Keras Tutorial [slides] [link]
    1/31/2017 Lecture: Deep Learning Part 2 (taught by Joe Marino) [slides]
    2/2/2017 Lecture: Recent Applications [slides] Edge Detection [paper]
        Visual Speech [project][paper]
    2/2/2017 Recitation: Probability & Sampling [slides]
    2/7/2017 Lecture: Probabilistic Models, Naive Bayes [slides] Murphy 3.5
    2/9/2017 Lecture: Hidden Markov Models [slides][notes] Murphy 17.3--17.5
    2/9/2017 Recitation: NO RECITATION
    2/14/2017 Lecture: **CANCELLED** Deep Generative Models
    2/14/2017 (Tuesday, 7-8pm) Recitation: Dynamic Programming [slides]
    2/16/2017 Lecture: Unsupervised Learning, Clustering, Dimensionality Reduction [slides]
    2/21/2017 Lecture: Latent Factor Models, Non-Negative Matrix Factorization [slides] Original Netflix Paper [link]
    2/23/2017 Lecture: Embeddings [slides] Locally Linear Embedding [link]
        Playlist Embedding [link]
        word2vec [link]
    2/23/2017 Recitation: NO RECITATION
    2/28/2017 Lecture: Recent Applications [slides] Lasso for cancer detection [paper]
        Badge dictionary learning from Twitter [paper]
        Deep learning for visual style [paper]
    3/2/2017 Lecture: Deep Generative Models (taught by Taehwan Kim) [slides]
    3/2/2017 Recitation: NO RECITATION
    3/7/2017 Lecture: Survey of Advanced Topics [slides]
    3/9/2017 Lecture: Review & Q/A

Additional References