This page, originally published in late October 2012, gives a summary of my research activity. All the corresponding publications are available on my publication page and directly below.
Most of my work published in 2012 has been dedicated to clustering. I've been working with PhD student Matthieu Durut on cloud implementations of online versions of k-means, and with my former PhD student Brieuc Conan-Guez on clustering of dissimilarity data. For the latter, we leveraged our earlier studies on graph clustering to apply multilevel refinement tricks in the dissimilarity context. I hope to keep working in this direction in the coming months.
An example of multilevel refinement: the two-class top-level partition is modified using sub-clusters at different levels of the clustering hierarchy.
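To make the term "online k-means" concrete, here is a minimal sketch of the textbook sequential variant, in which each incoming point moves its nearest centroid slightly toward it. This is only an illustration under my own assumptions: the function name `online_kmeans`, the random initialisation and the 1/n learning rate are choices made for the example, and the sketch says nothing about how the cloud implementation distributes the work.

```python
import random


def online_kmeans(points, k, seed=0):
    """Sequential k-means: a single pass, updating one centroid per point.

    Illustrative sketch only (hypothetical helper, not the cloud version).
    """
    rng = random.Random(seed)
    # Assumption: centroids are seeded with k distinct points of the stream.
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points assigned to each centroid so far

    for p in points:
        # Find the nearest centroid (squared Euclidean distance).
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], p)),
        )
        counts[j] += 1
        eta = 1.0 / counts[j]  # with this rate, centroid j is a running mean
        centroids[j] = [c + eta * (a - c) for c, a in zip(centroids[j], p)]
    return centroids
```

With the 1/n rate each centroid is exactly the mean of the points assigned to it so far, so no point needs to be stored once it has been processed, which is what makes the online form attractive for large data sets.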
I've been advising Mohamed Khalil El Mahrsi since early 2011. We work on trajectory clustering with a focus on objects (e.g., cars) moving on a constrained network. We had a series of publications in 2012:
We are now working on co-clustering of this type of data.
Road occupancy is color-coded using a standard heat colormap (red for saturated roads, grey for unused ones). The width of each road is also proportional to its occupancy.
The display uses a similar colormap but with local weighting: here, red corresponds to the roads most used by the trajectories of this cluster.
I've been working on functional data for quite a long time (since 2000, roughly). My most recent paper on the subject is with my PhD student Romain Guigourès and his adviser at Orange Labs, Marc Boullé. It's an application of Marc's MODL method to unsupervised learning, more precisely to clustering functional data. The method is highly efficient and provides density estimates in a completely parameter-free way. The flip side of MODL's ability to find subtle patterns is that these patterns are not always easy to interpret. Romain has been working on ways to circumvent this problem and to enable exploratory analysis based on MODL's proven density estimation quality.
We applied these simplification techniques to the electric power consumption data set available here and obtained the following results, among others.
A complete representation of the power consumption data set over the year. Each line is a month, each column a day in the month.
Four clusters of dates according to MODL. The clusters are mostly dominated by average consumption effects, but with some subtlety. For instance, the purple cluster gathers days with rather high average power consumption but with a distinctive aspect (compared to the blue cluster): the corresponding days have a high power consumption at night (from midnight to 6 am), a rare occurrence in the data set (see May 6th for a concrete example).
We have also applied MODL to graph clustering. We consider in particular the case of temporal graphs: we have a fixed set of actors (the vertices of the graph) that interact at certain times. Each (directed) interaction is timestamped, and we allow as many interactions as needed between two given actors; that is, we work with directed multigraphs. Using MODL, we build a block model for temporal graphs which clusters actors as sources of interaction and actors as receivers of interaction, and simultaneously segments the timeline into intervals. This work was presented in English at the Co-Clustering workshop of IEEE ICDM 2012 in December 2012, and in French at the Marami conference in October 2012.
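The data structure described above can be sketched in a few lines. The example below assumes that interactions are stored as (source, receiver, timestamp) triples and that a co-clustering is already given as two actor-to-cluster mappings plus a sorted list of time cut points; it only counts interactions per block. The function name `block_counts` and the toy data are hypothetical, and the sketch does not implement the MODL criterion or its optimisation, which are what actually select the clusters and the intervals.

```python
from bisect import bisect_right
from collections import Counter


def block_counts(interactions, source_cluster, receiver_cluster, cuts):
    """Count interactions per (source cluster, receiver cluster, interval).

    interactions: iterable of (source, receiver, timestamp) triples; repeated
        pairs are allowed, since the graph is a directed multigraph.
    source_cluster / receiver_cluster: dicts mapping an actor to a cluster id
        (the same actor may sit in different clusters in the two roles).
    cuts: sorted cut points; interval i covers [cuts[i-1], cuts[i]).
    """
    counts = Counter()
    for src, dst, t in interactions:
        interval = bisect_right(cuts, t)  # index of the interval containing t
        counts[(source_cluster[src], receiver_cluster[dst], interval)] += 1
    return counts


# Toy example (made-up data): three timestamped interactions, one cut at t = 5.
toy = [("a", "b", 1.0), ("a", "b", 2.5), ("c", "a", 7.0)]
blocks = block_counts(
    toy,
    source_cluster={"a": 0, "c": 1},
    receiver_cluster={"b": 0, "a": 1},
    cuts=[5.0],
)
```

The resulting three-way table of counts is the summary that a block model of this kind works with: one cell per combination of source cluster, receiver cluster and time interval.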
I've been working since 2008 on the analysis of a large historical data set extracted from notarial contracts, dated from 1200 to 1500, from a small Seigneurie in the south of France. This data set is so unique that my initial work on the subject was even mentioned in Nature News! In 2012, I kept working on these data in two directions:
Extended versions of those abstracts will be published in 2013 (the geographical analysis is already available).
I haven't published anything in information visualization this year (even though the work with Romain includes nice and rather original visual representations of our results), but in February I co-organized the Dagstuhl seminar on Information Visualization, Visual Data Mining and Machine Learning. This was a unique event which gathered specialists of information visualization and of machine learning in the perfect Dagstuhl environment. A summary of our work is available as a Dagstuhl report and the seminar was presented in Informatik-Spektrum.
I also published in 2012 a chapter on information propagation in social networks. This work is quite unique in that it is based on polls and interviews conducted in France, with a longitudinal study of a very large panel of 4500 people. These data enabled us to design a local propagation model that was then implemented on random graphs to study its global properties.
Finally, I had the opportunity to participate in a survey article on neural networks for complex data, in which I described, among other things, my work on functional data and on dissimilarity data.