Last updated in 2019.


This course is given in English in the MMMEF master. It starts with an introduction to the Big Data phenomenon and then focuses on the predictive methods of data science (a.k.a. machine learning methods).

Lecture notes/slides

Please note that the R scripts below have been extracted automatically from the knitr sources of the slides. They must be adapted to run properly: paths to data files must be modified and the opt_chunk related code must be removed. The code was developed under GNU/Linux and frequently uses the doMC package, which is not available under MS Windows. On that platform, doMC should be replaced by the doParallel package (and the code adapted accordingly).
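As a rough illustration of the doMC to doParallel adaptation mentioned above, here is a minimal sketch. It is not taken from the course scripts: the worker count and the toy loop are assumptions chosen for the example, and the actual scripts may register their parallel backend differently.

```r
# Hypothetical doMC setup as it might appear in a script:
#   library(doMC)
#   registerDoMC(cores = 2)
# A possible doParallel equivalent that also works under MS Windows:
library(doParallel)            # also attaches foreach and parallel

n_workers <- 2                 # assumed worker count, adjust to your machine
cl <- makeCluster(n_workers)   # socket cluster of separate R processes
registerDoParallel(cl)         # make %dopar% dispatch to this cluster

# Toy parallel loop standing in for the loops of the course scripts
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)                # release the worker processes
```

Unlike the forked workers used by doMC, the workers of a socket cluster start as fresh R sessions, so packages needed inside the loop body usually have to be listed in the `.packages` argument of `foreach`.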


  • lecture notes for the introductory course: in English
  • slides for the data science introductory course: in English
  • slides on an introduction to machine learning: in English (R code)
  • a short introduction to computational complexity: in English


Supervised learning models

  • slides on decision trees: in English (R code)
  • slides on optimal models and naive bayes: in English
  • slides on support vector machines and kernel methods: in English (R code)
  • slides on ensemble methods (including random forests and boosting): in English (R code)

Tools for supervised learning

Unsupervised learning


  • slides on empirical risk minimization: in English
  • slides on regularization and capacity control: in English
  • a more advanced and more thorough presentation of the same concepts is available in my slides on learning theory: in French and in English


Recommended reading/viewing

General papers

Relational databases (and SQL)

Machine Learning

Full course

Tom Mitchell's and Nina Balcan's machine learning course:

Selected topics