My PhD student Tsirizo Rabenoro (jointly advised by Prof. Marie Cottrell) worked during his thesis on aircraft engine health monitoring. This Cifre thesis took place at Snecma, one of the two world leaders in aircraft engines. The world of aircraft engines is representative of the constraints we constantly face when doing data science in industrial contexts. The ultimate goal is to build explainable and auditable classifiers. In the health monitoring context, this means detecting early signs of possible failures in a way that can be explained to the field experts, allowing them to accept (or not!) the automated decision.
Tsirizo's strategy was to use a simple classifier (the naive Bayes classifier) and to combine it with low-level detectors defined by field experts. The key idea is that experts are able to describe, in a rough way, early signs of failures. However, those descriptions need to be tuned to the actual data (for instance, some thresholds have to be set). In addition, the predictive power of each detector is generally rather low. Therefore we need to use a large number of those low-level detectors and to combine them using a basic classifier. A direct combination of that many detectors works only with machine learning methods that can handle high-dimensional noisy inputs, e.g., random forests. Unfortunately, such methods belong to the black box class of models and are thus unacceptable to field experts. Tsirizo showed that, using advanced feature selection, it was possible to significantly reduce the number of low-level tests needed to achieve acceptable performance while using an easy to understand white box classifier, the naive Bayes one. This work is covered by the following publications:
The defense took place on the 18th of September. Tsirizo gave an excellent speech in front of the following jury:
The summary of the thesis follows:
Identifying early signs of failures in a complex industrial system is one of the main goals of preventive maintenance. It makes it possible to avoid failures and to reduce the degradation of a component by performing a maintenance operation earlier. Health monitoring for aircraft engines is one of the industrial fields in which this anomaly detection is very important and meaningful. Aircraft engine manufacturers such as Snecma collect large amounts of engine-related data during each flight. The idea is to be able to automatically detect when the engine is deviating from its normal behavior. Thus Snecma is developing applications that allow people to prevent engine failures by detecting early signs of anomalies. This doctoral thesis first presents how the experts' knowledge is used to process this engine-related data. This first step points out the difficulties in handling the data, whether they relate to storage or to the processing algorithms themselves. After that, this thesis offers a method to combine experts' knowledge with machine learning processes that meet Snecma's needs, such as the combination of various pieces of information, error control, and the interpretability of diagnostic results. To do that, the method focuses directly on the data produced by the algorithms developed by the experts themselves. This is done by homogenizing the data and then merging them. This step allows for the use of supervised classification algorithms, whose goal is to group items of a similar nature (here the engines) in the same class without losing the temporal component of the information. The homogenization of the data also allows the use of monitoring applications developed by experts in order to detect anomalies. Before merging the data, a selection algorithm is used. This thesis describes how the selection process allows the monitoring algorithms to calibrate themselves.
Moreover, this selection satisfies the first constraint imposed by Snecma, concerning the interpretability of the results. Finally, the method introduced in this thesis aims at helping Snecma make the anomaly labels converge across all its users. It also aims at encouraging the gathering of all the data in a single database containing the raw and the processed data from the engine, as well as engine-related data that could be useful, such as the results of expert analyses. Using this database, this thesis can then offer a labeling tool that can be used to improve selection and classification algorithms.

Tsirizo Rabenoro, Outils statistiques de traitement d'indicateurs pour le diagnostic et le pronostic des moteurs d'avions
The thesis is available on TEL here (it is written in French).
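
To make the strategy described earlier a bit more concrete (thresholded expert detectors, a selection step, then a naive Bayes classifier), here is a minimal sketch in plain Python. It is not the code from the thesis: the synthetic signals, the candidate thresholds, the function names, and the use of a simple mutual information ranking as the selection step are all assumptions made for illustration. The selection method actually used in the thesis is more advanced.

```python
import math
import random


def apply_detectors(signals, detectors):
    """Evaluate each (channel, threshold) detector on each engine: 1 if it fires, else 0."""
    return [[int(x[c] > t) for c, t in detectors] for x in signals]


def mutual_information(feature, labels):
    """Mutual information (in nats) between a binary feature and binary labels."""
    n = len(labels)
    mi = 0.0
    for f in (0, 1):
        for y in (0, 1):
            joint = sum(1 for a, b in zip(feature, labels) if a == f and b == y) / n
            if joint > 0:
                pf = sum(1 for a in feature if a == f) / n
                py = sum(1 for b in labels if b == y) / n
                mi += joint * math.log(joint / (pf * py))
    return mi


def fit_naive_bayes(rows, labels):
    """Bernoulli naive Bayes with Laplace smoothing.

    Returns {class: (log prior, [P(detector j fires | class)])}.
    """
    model = {}
    for y in (0, 1):
        subset = [r for r, l in zip(rows, labels) if l == y]
        prior = (len(subset) + 1) / (len(rows) + 2)
        probs = [(sum(r[j] for r in subset) + 1) / (len(subset) + 2)
                 for j in range(len(rows[0]))]
        model[y] = (math.log(prior), probs)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row of detector outputs."""
    scores = {}
    for y, (log_prior, probs) in model.items():
        scores[y] = log_prior + sum(math.log(p if v else 1.0 - p)
                                    for v, p in zip(row, probs))
    return max(scores, key=scores.get)


def demo(seed=0, n_per_class=200, k=2):
    """Synthetic illustration: only channel 0 carries information about anomalies."""
    rng = random.Random(seed)
    signals, labels = [], []
    for y in (0, 1):  # 0 = normal engine, 1 = anomalous engine (made-up data)
        for _ in range(n_per_class):
            signals.append([rng.gauss(3.0 * y, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
            labels.append(y)
    # Candidate low-level detectors: one per (channel, threshold) pair.
    candidates = [(c, t) for c in range(3) for t in (1.0, 1.5, 2.0)]
    fired = apply_detectors(signals, candidates)
    # Selection step (assumption): keep the k detectors with the highest mutual information.
    scores = [mutual_information([r[j] for r in fired], labels)
              for j in range(len(candidates))]
    keep = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)[:k]
    selected = [candidates[j] for j in keep]
    rows = apply_detectors(signals, selected)
    model = fit_naive_bayes(rows, labels)
    accuracy = sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(labels)
    return selected, accuracy


selected, accuracy = demo()
print("selected detectors:", selected)
print("training accuracy:", round(accuracy, 3))
```

The point of the sketch is the shape of the pipeline: the final model only contains a handful of detectors that an expert can read as "signal above threshold", together with the probability that each one fires for normal and anomalous engines. The accuracy printed here is measured on the training data, so it says nothing about generalization.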