2016/2017 Winter Term (previous year)

Prerequisite: background in algorithms, linear algebra, calculus, probability, and statistics (CS/CNS/EE/NB 154 or CS/CNS/EE 156a or instructor's permission)

This course covers popular methods in machine learning and data mining, with an emphasis on developing a working understanding of how to apply them in practice. It also covers the core foundational concepts that underpin and motivate modern machine learning and data mining approaches, as well as some recent research developments.

- Lectures on Tu/Th at 2:30pm-4pm in Annenberg 105
- Recitations on Th at 7:30pm-9pm (usually lasting 1 hour), in Annenberg 105
- We will be using Moodle for managing homeworks and grades [link]
- We will be using Piazza for discussion forums and announcements [link]
- Lectures will be video recorded [link]
- Lecture videos from previous year are available here.
- 6 Homeworks (worth approximately 60% of final grade)
- 3 Miniprojects (worth approximately 30% of final grade)
- Final Exam (worth approximately 10% of final grade)

**Updated January 10th 2017:** Assignments will be due at 9pm on Friday via Moodle. Students are allowed to use up to three late tokens. Using a late token extends the due date to the following Monday at 9pm. Students cannot use more than one late token per assignment. Late tokens cannot be used for the final exam. There will be no TA support over the weekends.

Yisong Yue | yyue@caltech.edu |

Milan Cvitkovic | mcvitkov@caltech.edu |

Jagriti Agrawal | jagrawal@caltech.edu |

Avi Dutta | adutta@caltech.edu |

Andrew Kang | akang@caltech.edu |

Emily Mazo | emazo@caltech.edu |

Sidd Murching | smurching@caltech.edu |

Suraj Nair | snair@caltech.edu |

Sarthak Sahu | ssahu@caltech.edu |

- Homework LaTeX template (could be useful for writing homework solutions) [zip]
- **Updated Due Time!!** Homework 1, due 9pm on Jan 13th via Moodle [assignment][data]
- Homework 2, due 9pm on Jan 20th via Moodle [assignment][data]
- Homework 3, due 9pm on Jan 27th via Moodle [assignment]
- Homework 4, due 9pm on Feb 3rd via Moodle [assignment][code]
- Kaggle Miniproject, deadline 2pm on Feb 9th, report due 9pm on Feb 13th **(UPDATED)** via Moodle [info]
- Homework 5, due 9pm on Feb 17th via Moodle [assignment][code]
- Miniproject 2, poem submission due 9pm on Feb 24th via Piazza, report due 9pm Feb 27th via Moodle [assignment][dataset]
- Homework 6, due 9pm on March 3rd via Moodle [assignment][data/code]
- Miniproject 3, due 9pm on March 10th via Moodle [assignment][data][guide]

Note: schedule is subject to change.

| | | | Further Reading: |

1/05/2017 | Lecture: | Administrivia, Basics, Bias/Variance, Overfitting | [slides] | |

1/05/2017 | Recitation: | Introduction to Python for Machine Learning | [slides] | |

1/10/2017 | Lecture: | Perceptron, Gradient Descent | [slides] | Daume Chapter 3; Mistake Bounds for Perceptron [link]; AdaGrad [link]; Stochastic Gradient Descent Tricks [link]; Bubeck Chapter 3 |

1/12/2017 | Lecture: | SVMs, Logistic Regression, Neural Nets, Loss Functions | [slides] | |

1/12/2017 | Recitation: | Linear Algebra | [slides][iPython] | The Matrix Cookbook [link] |

1/17/2017 | Lecture: | Regularization, Lasso | [slides] | Murphy 13.3 |

1/19/2017 | Lecture: | Decision Trees, Bagging, Random Forests | [slides] | Overview of Decision Trees [pdf] Overview of Bagging [pdf] Overview of Random Forests [pdf] |

1/19/2017 | Recitation: | NO RECITATION | ||

1/24/2017 | Lecture: | Boosting, Ensemble Selection | [slides] | Schapire's Overview of Boosting [pdf] |

1/26/2017 | Lecture: | Deep Learning (taught by Joe Marino) | [slides] | Deep Learning Book [html] A Brief Overview of Deep Learning. [link] |

1/26/2017 | Recitation: | Keras Tutorial | [slides] | [link] |

1/31/2017 | Lecture: | Deep Learning Part 2 (taught by Joe Marino) | [slides] | |

2/2/2017 | Lecture: | Recent Applications | [slides] | Edge Detection [paper] Visual Speech [project][paper] |

2/2/2017 | Recitation: | Probability & Sampling | [slides] | |

2/7/2017 | Lecture: | Probabilistic Models, Naive Bayes | [slides] | Murphy 3.5 |

2/9/2017 | Lecture: | Hidden Markov Models | [slides][notes] | Murphy 17.3--17.5 |

2/9/2017 | Recitation: | NO RECITATION | ||

2/14/2017 | Lecture: | **CANCELLED** Deep Generative Models | | |

2/14/2017 TUESDAY 7-8pm | Recitation: | Dynamic Programming | [slides] | |

2/16/2017 | Lecture: | Unsupervised Learning, Clustering, Dimensionality Reduction | [slides] | |

2/21/2017 | Lecture: | Latent Factor Models, Non-Negative Matrix Factorization | [slides] | Original Netflix Paper [link] |

2/23/2017 | Lecture: | Embeddings | [slides] | Locally Linear Embedding [link] Playlist Embedding [link] word2vec [link] |

2/23/2017 | Recitation: | NO RECITATION | ||

2/28/2017 | Lecture: | Recent Applications | [slides] | Lasso for cancer detection [paper] Badge dictionary learning from twitter [paper] Deep learning for visual style [paper] |

3/2/2017 | Lecture: | Deep Generative Models (taught by Taehwan Kim) | [slides] | |

3/2/2017 | Recitation: | NO RECITATION | ||

3/7/2017 | Lecture: | Survey of Advanced Topics | [slides] | |

3/9/2017 | Lecture: | Review & Q/A |

- Stochastic Gradient Descent Tricks [link]
- Papers on Ensemble Selection. [paper1][paper2][KDD Cup Report]
- Practical Bayesian Optimization for Efficient Grid Search of Tuning Parameters. [paper][software]
- Reasonably Accessible Paper on Regularized Multi-Task Learning. [paper]
- Overview of Topic Models. [paper]
- Overview of Structural SVMs. [paper]
- A Brief Overview of Deep Learning. [link]
- Tutorial on Learning Reductions. [pdf]
- The Matrix Cookbook (a lot of useful properties of matrices). [link]
- Learning Reductions Overview. [paper]