The java programs provided on this web page implement a graph clustering and visualization method described in the following papers:
The method is based on two main components implement by two different standalone programs: a graph clustering program and a graph layout calculation program. They operate on a basic file format for graphs and handle only undirected (but possibly weighted) graphs. A graph is given by the list of its edges. Each line of the file contains one edge described by the identifiers of the two vertices connected and by an optional weight. Those two or three elements must be separated by white spaces or tabulations. Vertex identifiers can be any sequence of symbols without white spaces or tabulations.
My programs are implemented in Java and tested under Linux with Oracle Java
Hotspot 64-Bit Server VM (1.6.x and 1.7.x). The implementation is quite
memory hungry and you might need to allow the JVM to use a large amount of
memory for large graphs. This is done using the non standard options, such as
java -Xmx1G -jar something.jar to allow the JVM to use up to 1GB
The programs provided in this page have been implemented using two open source libraries:
Jar files are built using the Autojar open source tool.
You can use my programs for whatever purpose, but if you do so, I request a citation of the French paper above and a link to the present page. I will gladly list papers that use my code below. If you are interested in the source code, please contact me.
The clustering method implemented in the program has been used in the following papers:
Reformatted for the tools available here, the graph is described by two files:
The first step of the method consists in clustering the vertices of the graph. This is done with the following command line (assuming all needed files are in the working directory):
java -jar clustering.jar -graph polbooks-edges.txt -mod polbooks-modularities.txt -part polbooks-clustering.txt -random 100 -recursive
On a Intel Core 2 Duo P8600, this invocation take less than 10 seconds and produces two result files:
The second step of the method produces several layouts of the graph using the clustering results. This is done with the following command line:
java -jar hlayout.jar -graph polbooks-edges.txt -part polbooks-clustering.txt -nlayout 100 -outgraph polbooks -layout polbooks -saveall
In the particular case of the political books graph, this produces four layouts, from the coarsest partition with four clusters to the finest one with 14 clusters. Results can be downloaded here.
The visual results are given below. The first representation is the graph of the clusters of the best partition (in terms of modularity). Each subsequent representation is obtained by splitting a cluster into sub-clusters, using the same clustering technique. Layouts satisfy the minimum surprise principle as the sub-clusters are drawn at the position previously occupied by their main cluster.
The method is based on maximal modularity clustering. It implements a variant of the multi-level algorithms studied in Multi-level Algorithms for Modularity Clustering. Andreas Noack and Randolf Rotta. Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268, 2009. The main variation is that final clusters are guaranteed to be connected.
In addition to classical graph clustering, the implementation provides two important features:
The output file formats are described in the documentation file.
The visualization method leverage the hierarchical clustering results of the other program. The main idea consists in representing the graph of the clusters in a way that enables expanding clusters into sub-clusters. The program does not provide a visualization but only a layout (or several layouts) of the graph, that is vertices coordinates. Those coordinates can be used in R, for instance, in order to build an actual visualization.