Evaluation in machine learning

My PhD student Arnaud de Myttenaere (jointly advised by Prof. Bénédicte Le Grand) worked during his thesis on evaluation in machine learning. This CIFRE thesis took place at Viadeo, a professional online social network. From the concrete problems he faced in his day-to-day data scientist position at Viadeo, Arnaud extracted a collection of theoretical problems revolving around quality evaluation in machine learning.

The first category of problems we studied concerns discrepancies between the learning set and the real data distribution the model will face once deployed. A typical framework for handling this problem is covariate shift. Arnaud studied situations that are not covered by this framework and introduced several specialized solutions based on instance weighting.
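To make the instance-weighting idea concrete, here is a minimal, self-contained sketch (the distributions and the estimator below are my toy example, not one of the specific solutions from the thesis): when training data follow one distribution but the deployed model faces another, reweighting each training point by the density ratio corrects empirical averages.

```python
import numpy as np

# Toy importance-weighting example (my illustration, not Arnaud's estimators).
# Training data follow p_train = N(0, 1); the deployment distribution is
# p_test = N(1, 1). Weighting each training point by the density ratio
# w(x) = p_test(x) / p_train(x) = exp(x - 1/2) corrects the empirical average.

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, 20000)   # sampled from the training distribution

w = np.exp(x_train - 0.5)               # density ratio N(1,1) / N(0,1)

naive = np.mean(x_train)                          # estimates E[X] under N(0, 1)
reweighted = np.mean(w * x_train) / np.mean(w)    # estimates E[X] under N(1, 1)
print(naive, reweighted)
```

Dividing by the mean weight (self-normalization) keeps the estimate stable when the density ratio is only known up to a multiplicative constant.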

The second category of problems relates to non-standard evaluation metrics, especially the Mean Absolute Percentage Error (MAPE). This measure is used in application contexts where the values to predict are always far from zero (for instance prices or consumption of goods) and where mistakes are judged relative to the value to predict. Interestingly, the MAPE does not fulfill the standard assumptions generally made about loss functions. Moreover, models are in general fitted with another loss function and then evaluated with the MAPE. Arnaud derived a full framework for the MAPE, from its actual minimization in a learning procedure to the consistency of empirical risk minimization when using it.
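As a minimal illustration of what the MAPE measures (this sketch is mine, not code from the thesis), here is how it judges the same absolute error very differently depending on the magnitude of the true value:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error; only meaningful when y_true stays away from 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# An absolute error of 5 is a tiny relative mistake on a large value
# but a huge one on a small value:
print(mape([1000.0], [1005.0]))  # 0.005
print(mape([10.0], [15.0]))      # 0.5
```

This is why the MAPE fits contexts such as price prediction, and why it blows up when targets approach zero.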

This work is covered by the following publications:

  • Mean Absolute Percentage Error for regression models (2016) Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand and Fabrice Rossi. Neurocomputing, volume 192, pages 38-48, June 2016.
  • Study of a bias in the offline evaluation of a recommendation algorithm (2015) Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand and Fabrice Rossi. In Advances in Data Mining (proceedings of the 11th Industrial Conference on Data Mining, ICDM 2015), edited by Petra Perner, pages 57-70, Hamburg, Germany, July 2015.
  • Consistance de la minimisation du risque empirique pour l'optimisation de l'erreur relative moyenne (2015) Arnaud De Myttenaere, Bénédicte Le Grand and Fabrice Rossi. In Actes des 47èmes Journées de Statistique de la SFdS, Lille, France, June 2015.
  • Using the Mean Absolute Percentage Error for Regression Models (2015) Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand and Fabrice Rossi. In Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015), pages 113-118, Bruges, Belgium, April 2015.
  • Reducing offline evaluation bias of collaborative filtering algorithms (2015) Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand and Fabrice Rossi. In Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015), pages 137-142, Bruges, Belgium, April 2015.
  • Reducing Offline Evaluation Bias in Recommendation Systems (2014) Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand and Fabrice Rossi. In Proceedings of the 23rd annual Belgian-Dutch Conference on Machine Learning (Benelearn 2014), edited by Benoît Frénay, Michel Verleysen and Pierre Dupont, pages 55-62, Brussels, Belgium, June 2014.

The defense

It took place on the 4th of November. Arnaud gave an excellent presentation in front of the following jury:

  • Prof. Patrick Gallinari, Université Pierre et Marie Curie, referee
  • Prof. Nicolas Vayatis, ENS Paris-Saclay, referee
  • Dr. Boris Golden, Partech Ventures
  • Dr. Mathilde Mougeot, Université Paris Diderot
  • Prof. Gérard Biau, Université Pierre et Marie Curie, president of the jury
  • Prof. Bénédicte Le Grand, CRI, co-adviser

and myself.

The summary of the thesis follows:

Offline evaluation makes it possible to estimate the quality of a predictive model on historical data before deploying the model in production. To be reliable, the data used to compute the offline evaluation must be representative of the real data. In this thesis we study the case where the historical data are biased. Through experiments carried out at Viadeo (a French professional social network), we propose a new offline evaluation procedure to estimate the quality of a recommendation algorithm when the data are biased. We then introduce the concept of explanatory shift, a particular case of bias, and propose a new approach to build an efficient model under explanatory shift. In the second part of the thesis we discuss the importance of the loss function used to select a model via empirical risk minimization (ERM), and we study in detail the particular case of the Mean Absolute Percentage Error (MAPE). We first analyze conditions needed to ensure that the risk is well defined, and we then show that the model obtained by ERM is consistent under some assumptions.

Arnaud de Myttenaere, Évaluation hors-ligne d'un modèle prédictif : application aux algorithmes de recommandation et à la minimisation de l'erreur relative moyenne
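For reference, the objects mentioned in the summary can be written as follows (the notation is mine, a standard ERM formulation, not necessarily the one used in the thesis): the MAPE risk of a predictor $g$, its empirical counterpart on a sample $(x_1, y_1), \ldots, (x_n, y_n)$, and the ERM estimator over a model class $\mathcal{G}$ are

```latex
L_{\mathrm{MAPE}}(g) = \mathbb{E}\left[\frac{|Y - g(X)|}{|Y|}\right],
\qquad
\widehat{L}_n(g) = \frac{1}{n}\sum_{i=1}^{n} \frac{|y_i - g(x_i)|}{|y_i|},
\qquad
\widehat{g}_n = \operatorname*{arg\,min}_{g \in \mathcal{G}} \widehat{L}_n(g).
```

Consistency then means that $L_{\mathrm{MAPE}}(\widehat{g}_n)$ converges to $\inf_{g \in \mathcal{G}} L_{\mathrm{MAPE}}(g)$ as $n$ grows, which requires in particular that the risk be well defined despite the division by $|Y|$.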

The thesis is available on TEL here (it is written in French).