Assess the significance of modularity (community structure)

Hi There,
I am working with a matrix of occurrences (abundances) of sites x species (dim: 30 x 25), and I was implementing a modularity analysis with the cluster_walktrap and cluster_louvain functions from the igraph package. First of all, I wanted to know if it is correct to apply these algorithms to this matrix with the characteristics that it is a very small and bipartite matrix, otherwise which algorithm would be the most correct. Secondly, when I go to evaluate the value of modularity, it is not clear to me if I can already assume the significance or should I offset it against a null model, in the latter case, what do you suggest to implement? e.g: null models and then a z-score?
Lastly, I do not undertsand why I have several modularity values: for the cluster_waltrap function I get 52 values (when running …$modularity), and for cluster_louvain I get 3 modularity values, that i belive its corresponds to three possible divisions of nodes into clusters.

Hi @mar-iana,

Just for completeness, the easiest way to construct the bipartite graph based on the bipartite matrix M would be:

G <- graph_from_incidence_matrix(M)

The size of the graph is irrelevant for the detection of clusters. However, the fact that the graph is bipartite will affect the results. For that reason, it would be better to choose an algorithm that is specifically adapted to bipartite graphs. See here for one method that allows you to detect clusters in bipartite graphs: https://cloud.r-project.org/web/packages/leiden/vignettes/run_bipartite.html .

The cluster_walktrap and cluster_louvain algorithm returns a full dendogram of “merges”. The $modularity returns the list of the modularity values across all “merges”. The highest modularity is the merge for that resulted in the returned $membership. See Community structure via short random walks — cluster_walktrap • igraph and Finding community structure by multi-level optimization of modularity — cluster_louvain • igraph for more details.

With regards to the question of the significance: no, you cannot assume that whatever modularity value you get would be significant. In particular, sparse graphs can have quite high modularity values. Indeed, if you construct a random graph, and then run some modularity-based community detection algorithm on it, it will typically find a positive modularity value. For that reason, if you are interested in establishing some sort of significance, you should compare the observed modularity value with a distribution of modularity values of random graphs (for which you might keep certain things constants, such as the degree, of the bipartite type, in this case).

Thank you so much @vtraag for your attention.
The answer was really helpful to clarify some issues about the estimation of modularity.

Just to continue completing this topic, to check for significance I implement 1000 random graph using a configurational model, typically the “sample_degseq” function, but with a tricky modification for bipartite graphs. It would bee a good idea include in this function the variant for bipartite, to the best of my knowledge it is not yet possible.
Finally, I compared the distribution of the modularity of the random graph with the modularity of the observed graph.

Thanks again
Greetings

If you have some code that is doing this, you might consider contributing it to igraph itself? Have you coded this yourself in R, or in C?

The issue for this is here: Sampler for bipartite graphs with given degrees · Issue #341 · igraph/igraph · GitHub Note that this must be done carefully, as for an application like yours we need to guarantee uniform sampling.

The straightforward generalization of the configuration model to bipartite graphs is to take two stub vectors for the two partitions, and shuffle one (i.e. produce a random permutation), then connect stubs that are in the same position in the two vectors. Just like with the standard configuration model, it can be shown that this will sample simple graphs uniformly, but not multigraphs. The result can be restricted to simple graphs by restarting the generation whenever a multi-edge is created.

The bipartite configuration model could be implemented as a new function called igraph_bidegree_sequence_game().

All this said, for applications like yours it is more practical and much faster to use rewiring, see Randomly rewire graph while keeping bipartite degree sequencce · Issue #2319 · igraph/igraph · GitHub