Statistical Machine Learning (W4400) • Spring 2014

This class provides an introduction to Machine Learning and its core algorithms.
Course Slides

Here is the complete set of course slides (version: 1 May) as a single file.

Teaching Assistants

Lu Meng (lumeng@stat.columbia.edu)
Office hours: Tue 5:30-7:30pm, 1025 SSW (tenth floor, Department of Statistics)
Jingjing Zou (jingjing@stat.columbia.edu)
If you have questions on how your homework was graded, please address them to Jingjing.

Textbooks

The course is not based on a specific textbook. The relevant course materials are the slides.

First half of the class

If you would like to complement the lectures and slides with further reading, probably the best reference for the first half of the class (roughly up to the midterm) is:
  • The Elements of Statistical Learning
    T. Hastie, R. Tibshirani and J. Friedman.
    Second Edition, Springer, 2009.

    [Available online]
Here are some pointers to specific chapters:

Topic                                  Chapter
Linear classifiers, Perceptron         4.1, 4.5
Maximum margin classifiers, SVMs       12.1, 12.2
Kernels                                12.3
Model selection and cross validation   7, in particular 7.10
Trees                                  9.2
Boosting                               10.1, 10.8
Bagging                                8.7
Random forests                         15
Linear regression                      3.2
Shrinkage                              3.4

Second half of the class

Unfortunately, no single book covers all topics in the second half of the class well, but some useful sources are:
  • Pattern Recognition and Machine Learning.
    Christopher M. Bishop.
    Springer, 2006.
  • Machine Learning: A Probabilistic Perspective.
    Kevin P. Murphy.
    MIT Press, 2012.
  • Bayesian Reasoning and Machine Learning.
    David Barber.
    Cambridge University Press, 2012.

    [Available online]

Other references

  • Information Theory, Inference, and Learning Algorithms.
    David J. C. MacKay.
    Cambridge University Press, 2003.

    [Available online]
  • Pattern Classification.
    Richard O. Duda, Peter E. Hart, David G. Stork.
    Wiley, 2001.
  • Convex Optimization.
    Stephen Boyd and Lieven Vandenberghe.
    Cambridge University Press, 2004.

    [Available online]

Syllabus

There will be five or six homework assignments; you will usually have two weeks to complete each one. The final grade will be computed as

  40% homework + 30% midterm + 30% final exam

The midterm will cover the material of the first half of the class. The final will cover only the material taught after the midterm, so you will not be examined on the first half twice.
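The weighted-average formula above can be sketched as a short calculation (the scores used here are made-up examples, not real course data):

```python
# Final grade = 40% homework + 30% midterm + 30% final exam.
weights = {"homework": 0.40, "midterm": 0.30, "final": 0.30}

# Hypothetical component scores, each out of 100.
scores = {"homework": 85.0, "midterm": 78.0, "final": 90.0}

grade = sum(weights[k] * scores[k] for k in weights)
print(grade)  # 0.4*85 + 0.3*78 + 0.3*90 = 84.4
```

The weights sum to 1, so the result stays on the same 0-100 scale as the component scores.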

Preliminary list of topics

Week  Content
1     Introduction; review of basic concepts: maximum likelihood, Gaussian distributions, etc.
2     Classification basics: loss functions, naive Bayes, linear classifiers
3     Support vector machines, convex optimization
4     Kernels; model selection and cross validation
5     Ensemble methods: boosting, bagging, random forests
6     Regression: linear regression, regularization, ridge regression
7     Linear algebra review; high-dimensional and sparse regression
8     Dimension reduction, data visualization, principal component analysis
9     Clustering, mixture models and EM algorithms
10    Information theory; text analysis
11    Markov models, PageRank
12    Hidden Markov models, speech recognition
13    Bayesian models
14    Sampling algorithms and MCMC