This course was given in November 2008 at the Computational Intelligence and Learning doctoral school. It provides a fairly complete and self-contained introduction to statistical learning theory for students and researchers with a solid background in probability and statistics and some familiarity with machine learning.

Slides are available here.

This course was prepared largely with the help of the excellent book by Luc
Devroye, László Györfi, and Gábor Lugosi, *A Probabilistic Theory of
Pattern Recognition*, published by Springer in 1996. Despite its age, I
believe this book remains an ideal reference on the subject because of its
extraordinary writing quality.

I've also used many articles to prepare the course; most of them are freely available online:

- General articles:
  - S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323-375, November 2005.
  - O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In Advanced Lectures on Machine Learning, volume 3176 of LNAI, pages 169-207. Springer-Verlag, 2004.
  - S. Kulkarni, G. Lugosi, and S. Venkatesh. Learning pattern classification - a survey. IEEE Transactions on Information Theory, 44(6):2178-2206, October 1998.

- Some specific results:
  - Regression: G. Lugosi and K. Zeger. Nonparametric estimation via empirical risk minimization. IEEE Transactions on Information Theory, 41(3):677-687, May 1995.
  - Structural risk minimization: G. Lugosi and K. Zeger. Concept learning using complexity regularization. IEEE Transactions on Information Theory, 42(1):48-54, January 1996.
  - SVM: I. Steinwart. Consistency of support vector machines and other regularized kernel machines. IEEE Transactions on Information Theory, 51(1):128-142, January 2005.