Slides of my recent talks are available below.
Making sense of medium to large data sets remains a difficult challenge, especially when both the number of objects and the number of variables are large. The classical way of exploring such data sets is still a combination of clustering methods and low-dimensional visual representations. Clustering methods group similar objects, while low-dimensional visual representations let the analyst make sense of the relationships between clusters. However, truly high-dimensional data sets cannot be represented faithfully in low dimension, which strongly limits the practical usefulness of this standard methodology on modern data sets. A potential solution is offered by the co-clustering framework, in which both objects and variables are summarized. The main advantage of clustering variables, rather than trying to build a low-dimensional representation, is that clustering scales easily to complex data with high intrinsic dimension. However, most co-clustering methods cannot handle large data sets or mixed data sets (with both numerical and categorical variables). In this talk I will present a general principle based on grid modeling, which can be used in particular to circumvent these limitations of co-clustering and thus to explore medium- to large-scale data sets. I will first present the general idea of generative modeling, then introduce our nonparametric generative model. I will then give examples of how the general idea can be adapted to different settings. The last part of the talk will focus on the co-clustering case.