My Phd student Marco Corneli (jointly advised by Pierre Latouche) worked during his thesis on dynamic graphs.
The core idea of the thesis was to avoid the snapshotting trick which is generally used to turn interaction data with continuous time stamps into a time series of interaction graphs. While this is a nice way to switch to a discrete time view, one needs to have an idea of the natural time scale of the interaction process. We need also to assume that this time scale is somewhat stationary. My first incursion in this non snapshotting idea was during the Phd thesis of Romain Guigoures. We studied at that time a variant of the MODL approach. In the present thesis, we focused on generative models.
The key idea was to consider that the flow of interactions between two actors follows a non homogeneous Poisson point process. Then we assumed the intensity function to depend on pairwise hidden latent variables. In other words the temporal behavior of the intensity between two actors depend only on the classes of those actors.
Using this idea as a guide, Marco was able to develop several models for dynamic graphs. Some of them still use a snapshotting trick, especially the quite advanced one that mixes text and dynamic graphs, but one of them achieve the initial goal of the thesis by enabling us to analyze dynamic interactions in continuous time.
This work is covered by the following publications:
took place on the 17th of November. Marco gave a brilliant speech in front of the following jury:
and myself.
The summary of the thesis follows:
Graphs are mathematical structures very suitable to model interactions between objects or actors of interest. Several real networks such as communication networks, financial transaction networks, mobile telephone networks and social networks (Facebook, Linkedin, etc.) can be modeled via graphs. When observing a network, the time variable comes into play in two different ways: we can study the time dates at which the interactions occur and/or the interaction time spans. This thesis only focuses on the first time dimension and each interaction is assumed to be instantaneous, for simplicity. Hence, the network evolution is given by the interaction time dates only. In this framework, graphs can be used in two different ways to model networks: 1) Discrete time. A network is observed at several times and a graph is associated with each observation time. Two nodes of a graph are connected if one or more interactions occurred between them in the corresponding time frame. Thus, interactions are aggregated between two consecutive observation times and the exact interaction dates are lost. In this context, a dynamic network is represented by a sequence of graphs. 2) Continuous Time. Several edges are allowed to connect the nodes of a graph at different times. One edge is uniquely associated with a pair of nodes and a time point. No aggregation is required and interaction times are never lost. Therefore, a dynamic network is represented by a single multiple graph whose edges are labeled by the interaction times. In this thesis both these perspectives are adopted, alternatively. We consider new unsupervised methods to cluster the nodes of a graph into groups of homogeneous connection profiles. In this manuscript, the node groups are assumed to be time invariant to avoid possible identifiability issues. Moreover, the approaches that we propose aim to detect structural changes in the way the node clusters interact with each other. The building block of this thesis is the stochastic block model (SBM), a probabilistic approach initially used in social sciences. The standard SBM assumes that the nodes of a graph belong to hidden (disjoint) clusters and that the probability of observing an edge between two nodes only depends on their clusters. Since no further assumption is made on the connection probabilities, SBM is a very flexible model able to detect different network topologies (hubs, stars, communities, etc.). By adapting the block modeling perspective of SBM to dynamic graphs, the main contributions of this thesis are the following: 1. We introduce a new extension of SBM for dynamic graphs. The proposed approach, called dSBM, adopts non homogeneous Poisson processes to model the interaction times between pairs of nodes in dynamic graphs, either in discrete or continuous time. The intensity functions of the processes only depend on the node clusters, in a block modeling perspective. Moreover, all the intensity functions share some regularity properties on hidden time intervals that need to be estimated. 2. A recent estimation algorithm for SBM, based on the greedy maximization of an exact criterion (exact ICL) is adopted for inference and model selection in dSBM. To the best of our knowledge, this is the first time this algorithm is adopted for inference in dynamic stochastic block models. 3. An exact algorithm for change point detection in time series, the "pruned exact linear time" (PELT) method is extended to deal with dynamic graph data modeled via dSBM. The approach we propose can be used for change point analysis in graph data. 4. A further extension of dSBM is developed to analyze dynamic networks with textual edges (like social networks, for instance). In this context, the graph edges are associated with documents exchanged between the corresponding nodes. The textual content of the documents can provide additional information about the dynamic graph topological structure. The new model we propose is called "dynamic stochastic topic block model" (dSTBM). This manuscript is organized as follows. In the first chapter, we pass through the main notions of graph theory and review some stylized facts about real networks. Two formal definitions of dynamic graph are provided. Then, the main existing generative models for static and dynamic random graphs are presented along with their associated inference procedures. Finally, some statistical tools not necessarily related with network analysis are described in detail since they are used in later chapters. In the second chapter, two versions of dSBM are introduced, both dealing with discrete time dynamic graphs. The corresponding inference procedure aims to maximize the complete data integrated log-likelihood, thus allowing us to learn the model parameters and select the number of clusters at the same time. In the third chapter, we model continuous time dynamic graphs via dSBM and focus on clustering and change point analysis in graph data. A standard variational approach is adopted for the inference and one step of the estimation algorithm relies on the PELT method. Finally, the fourth chapter introduces the dSTBM for discrete time dynamic graph with textual edges. The inference procedure is detailed and a model selection criterion is formally obtained. The last part of each chapter is devoted to experiments on both simulated and real data. These experiments allow us to highlight the features of the proposed approaches and to compare them with alternative methods. Marco Corneli, Dynamic stochastic block models, clustering and segmentation in dynamic graphs
The thesis is available on TEL here.
Published
17 November 2017
Tags