Research Activities

I'm working on data mining (in a very broad sense), with a focus on statistical machine learning methods, kernel methods (such as support vector machines) and artificial neural networks (most of my publications are available online). My current research interests center on non-standard data, namely functional data and non-vector data (described by dissimilarity matrices). When I work on a particular application field, I try to compare neural methods with other methods and to provide practical solutions without prior preferences. I've worked in particular on spectrometric problems and, more recently, on web usage mining.

My research activities also include reviewing for conferences and journals, as well as research contracts. I've also supervised several PhD theses.

Research themes

Functional Data

An important part of my research activity focuses on functional data analysis (FDA). In this framework, data are not finite-dimensional vectors but functions from an infinite-dimensional space. This raises both theoretical and practical problems. My contribution to FDA has been to show that neural networks and support vector machines are as effective for this type of data as they are for standard vector data. I've provided both practical and theoretical evidence to support this claim.
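To make the functional point of view concrete, here is a minimal sketch of one basic building block: the L2 distance between two sampled functions, approximated by the trapezoidal rule. The function names (`trapezoid`, `l2_distance`) and the toy functions are illustrative assumptions, not code taken from my publications.

```python
import math


def trapezoid(values, grid):
    """Approximate the integral of a sampled function with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(grid)):
        total += 0.5 * (values[i] + values[i - 1]) * (grid[i] - grid[i - 1])
    return total


def l2_distance(f_values, g_values, grid):
    """L2 distance between two functions observed on the same sampling grid."""
    squared_gap = [(a - b) ** 2 for a, b in zip(f_values, g_values)]
    return math.sqrt(trapezoid(squared_gap, grid))


# Toy example (assumed): f(t) = t and g(t) = 0, sampled on [0, 1].
grid = [i / 100 for i in range(101)]
f = [t for t in grid]
g = [0.0 for _ in grid]
dist = l2_distance(f, g, grid)
```

Unlike the plain Euclidean distance between the two vectors of samples, this quantity approximates an integral, so it stays roughly the same when the sampling grid is refined.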

Non vector data

The second important part of my research activity concerns non-vector data described by similarity or dissimilarity matrices. In this framework, the only available knowledge on the N input data consists of an N×N matrix that contains the pairwise (dis)similarities between all the data. With some colleagues, I've defined new versions of Prof. Kohonen's Self Organizing Map (SOM) that can handle such data (variations of the Median SOM introduced by Prof. Kohonen and Dr. Somervuo, as well as an algorithm inspired by the relational approaches introduced by Hathaway, Davenport and Bezdek for k-means and its variants). Our aim is both to improve clustering quality and to reduce the computational cost of dissimilarity-based methods.
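The "median" principle behind these methods can be sketched in a few lines: since no vector mean is available, each prototype is constrained to be one of the observations, chosen to minimize the summed dissimilarity to the members of its cluster. The code below is a simplified illustration (a plain median clustering without the SOM neighbourhood structure), not the algorithms from the papers; all function names and the toy data are assumptions.

```python
def medoid(dissim, members):
    """Return the member with the smallest summed dissimilarity to all members."""
    return min(members, key=lambda c: sum(dissim[c][j] for j in members))


def assign(dissim, prototypes):
    """Assign every observation to the index of its closest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: dissim[i][prototypes[k]])
        for i in range(len(dissim))
    ]


def median_clustering(dissim, prototypes, n_iter=10):
    """Alternate assignment and medoid update until the prototypes are stable."""
    prototypes = list(prototypes)
    for _ in range(n_iter):
        labels = assign(dissim, prototypes)
        updated = []
        for k, current in enumerate(prototypes):
            members = [i for i, lab in enumerate(labels) if lab == k]
            # Keep the old prototype if its cluster became empty.
            updated.append(medoid(dissim, members) if members else current)
        if updated == prototypes:
            break
        prototypes = updated
    return prototypes, assign(dissim, prototypes)


# Toy data (assumed): six points on a line, known only through |x_i - x_j|.
points = [0, 1, 2, 10, 11, 12]
dissim = [[abs(a - b) for b in points] for a in points]
prototypes, labels = median_clustering(dissim, prototypes=[0, 1])
```

The exhaustive medoid search is quadratic in the cluster size, which gives an idea of why computational cost is a concern for dissimilarity-based methods.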

I've also explored some simple vector representation methods for some structured non vector data, such as interval data.

Feature selection

I've also investigated the very important theme of feature selection. My first work in this field used derivatives of the regression function, as estimated by a multi-layer perceptron, to assess the predictive power of a feature. More recently, I've studied the k-NN based estimators of mutual information proposed by Kraskov, Stögbauer and Grassberger. I'm particularly interested in assessing the actual gain provided by a feature via resampling techniques.
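The derivative-based idea can be illustrated as follows: a feature whose partial derivative is close to zero over the whole data set has little influence on the estimated regression function. This sketch uses central finite differences and a hand-written `toy_model` as a stand-in for a trained multi-layer perceptron; both are assumptions made for illustration, and the exact criterion used in my papers may differ.

```python
def sensitivities(model, samples, eps=1e-5):
    """Mean absolute partial derivative of `model` for each input feature.

    Derivatives are approximated by central finite differences.
    """
    n_features = len(samples[0])
    scores = [0.0] * n_features
    for x in samples:
        for j in range(n_features):
            up, down = list(x), list(x)
            up[j] += eps
            down[j] -= eps
            scores[j] += abs(model(up) - model(down)) / (2 * eps)
    return [s / len(samples) for s in scores]


def toy_model(x):
    """Hypothetical fitted regression function; feature 1 is irrelevant."""
    return 3.0 * x[0] + 0.0 * x[1] + x[2] ** 2


samples = [[0.0, 1.0, 1.0], [1.0, -1.0, 2.0], [2.0, 0.5, -1.0]]
scores = sensitivities(toy_model, samples)
```

Here the irrelevant feature gets a score of zero, while the two others get clearly positive scores. In practice such scores should be checked by resampling before a feature is discarded.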

Application fields

Spectrometry/Chemometrics

One of the main application fields of functional data analysis is spectrometry. In this field, observations consist of spectra, which are smooth functions sampled on a fine grid (for instance, 1000 samples per spectrum). A spectrum maps wavelengths to some response, such as the absorbance in near-infrared spectrometry.

I've applied neural models and support vector machines to spectrometric problems, using the functional approach. The results were very good and showed that the functional framework copes well with the problems induced by the high dimensionality of the spectra.

As an alternative to functional methods, I've also worked on variable selection applied to spectrometry, combined with nonlinear neural models.

Web mining

Web mining provides some very interesting non-vector data. Web content mining focuses on the content of web sites and therefore deals with texts, images and other similar content. I'm more interested in web usage mining. In this application field, data consist of descriptions of user activity on a web server, obtained from the server's log files. I've applied an adapted Self Organizing Map to this type of data, in order to cluster and visualize the content of a web site using the usage data only. The usage data are used to define a dissimilarity between the content items of the site.
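As an illustration of how usage data alone can induce a dissimilarity between pages, the sketch below compares two pages through the sets of sessions in which they were visited, using the Jaccard dissimilarity. The choice of the Jaccard index, the function names and the toy sessions are assumptions made for this example; it is not necessarily the dissimilarity used in my work.

```python
def page_visitors(sessions):
    """Map each page to the set of session indices in which it was visited."""
    visitors = {}
    for session_id, pages in enumerate(sessions):
        for page in pages:
            visitors.setdefault(page, set()).add(session_id)
    return visitors


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy sessions (assumed), as they might be reconstructed from server logs.
sessions = [
    ["/home", "/news"],
    ["/home", "/news", "/contact"],
    ["/contact"],
]
visitors = page_visitors(sessions)
d_home_news = jaccard_dissimilarity(visitors["/home"], visitors["/news"])
d_home_contact = jaccard_dissimilarity(visitors["/home"], visitors["/contact"])
```

Pages that are always visited in the same sessions end up at dissimilarity zero. Computing this quantity for every pair of pages yields the kind of matrix that a dissimilarity-based SOM takes as input.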

As an alternative to neural models, I've also applied graph-based visualization methods from bibliographic analysis to the same data. The two kinds of visualization provide complementary views of the data.