Details for cluster_louvain local moving heuristic (for R users)

Hi

I’d be interested in gaining a better understanding of how cluster_louvain specifically handles the local moving heuristic, i.e. the first stage of the standard two-step procedure as per Blondel et al., J. Stat. Mech. Theor. Exp. (2008).

I’m conscious of the following:

  1. A detailed description of cluster_louvain for R users is unavailable, as it relies on functions implemented in the C layer
  2. Similar questions have been previously raised in the mailing list and on this forum, namely:
    2.1) This post (https://lists.nongnu.org/archive/html/igraph-help/2012-09/msg00079.html) raised some concerns about different results obtained by cluster_louvain and a MATLAB implementation seemingly ‘endorsed’ by the Louvain developers.
    2.2) This post (Modularity (Q) based on the Louvain split, unexpected values) mentions another benchmark for cluster_louvain, the Brain Connectivity Toolbox, also in MATLAB; a pretty good match, it seems.
  3. There are some useful, language-agnostic pseudocodes out there, although they aim to describe Louvain as a benchmark for improved algorithms; see e.g.:
    3.1) Traag et al., Sci. Rep. 9, 5233 (2019), esp. the supplementary materials;
    3.2) Waltman, L., van Eck, N.J. Eur. Phys. J. B 86, 471 (2013).

Long story short, none of the above helps me unpick the cluster_louvain function specifically; nor is any of it meant chiefly as a resource for R users.
Naturally, I’ve attempted my own implementation in R from scratch, relying solely on the ‘basics’ as I understand them. Unsurprisingly, I’m getting mixed results: a good match for small toy examples (i.e. the same number of communities and modularity score), but my implementation then diverges from cluster_louvain on slightly larger examples (not even 200 arcs…).

I’d be grateful if anyone could point me in the right direction regarding what’s underneath the local moving heuristic in the cluster_louvain function, and whether that might be made digestible to someone as C-illiterate as myself.

Many thanks,
Ettore

Hi Ettore (or ciao),

Welcome to the forum!

Vincent is the first author of Leiden, so he knows the inner workings of Louvain, smart local move, faster versions thereof, and eventually Leiden. He’s on holiday but will be back in a few days.

Since I have a little knowledge too, I’ll attempt an answer in the meantime. I’m not an R user myself (I use Python instead), but I’m happy to help if I can.

First, why are you looking at Louvain in the first place? It’s an old, slow, and flawed algorithm, which is precisely what motivated the later, improved algorithms. Do you have an actual graph you are trying to cluster, for which you have observed different results between R-igraph and another library?

Second, the original paper and, more recently, the Leiden paper, but also the SLM paper from 2013, are pretty good at describing how the clustering happens, including the first stage. I myself started there and then went down into the Leiden code without too much trouble. I’m not sure you need an R-specific explanation of the algorithm; R developers should be able to follow those papers just fine.
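To be concrete about that first stage: what it locally optimizes is the modularity gain from moving a node i (temporarily treated as isolated in its own community) into a neighbouring community C. After simplification it can be written as below; this is my paraphrase, so do check it against Eq. (2) of the Blondel et al. paper, whose Σ_in convention trips people up:

$$
\Delta Q = \frac{k_{i,\mathrm{in}}}{m} - \frac{\Sigma_{\mathrm{tot}}\, k_i}{2m^2}
$$

where $k_{i,\mathrm{in}}$ is the total weight of the edges between i and nodes in C, $\Sigma_{\mathrm{tot}}$ is the total (weighted) degree of the nodes in C, $k_i$ is the (weighted) degree of i, and $m$ is the total edge weight of the graph. The node goes to the neighbouring community with the largest positive gain, and stays where it is if no gain is positive.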

Third, if you have an actual graph you are working on, it would be simpler if you could just share it with us, along with the expected clustering and the corresponding modularity. That way we could see whether there is an inconsistency in the implementation.

Cheers,
Fabio

Ciao Fabio (well spotted…)

Thank you for the heads-up, I really appreciate it. Below, I try to address the questions you raised in response to mine:

1 - I am only interested in ‘Louvain’ for the sake of self-education. I chiefly deploy it to cluster text documents and, not being a developer, I could have happily lived trusting whatever happens to be built into applications such as Gephi, VOSviewer, and obviously igraph. Lately, though, I’ve been having some issues interpreting apparently small differences between the clusters generated by igraph and Gephi (both using Louvain), prompting me to try to replicate what’s going on in there from scratch.

2 - Before I share a fully-fledged graph, it’s enough to consider a very simple example like the one in this post (https://medium.com/walmartlabs/demystifying-louvains-algorithm-and-its-implementation-in-gpu-9a07cdd3b010). Below is a ‘dry run’ leading to different results (an R sketch for reproducing the comparison follows the list):
a - blogger answer: A = {1, 2, 3}; B = {4}; C = {5, 6, 7}; D = {8, 9}
b - igraph cluster_louvain answer (modularity: 0.44): A = {1, 2, 3}; B = {4, 5, 6}; C = {7, 8, 9}
c - my ‘from scratch’ answer (modularity: 0.401):
---- c.1) first pass: A = {1, 4}; B = {2, 3}; C = {5, 6}; D = {7}; E = {8, 9}
---- c.2) second pass: A = {1, 2, 3, 4}; B = {5, 6}; C = {7, 8, 9}
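To make the comparison easy to reproduce, here is a minimal R sketch of how I set it up (the edge list below is only a stand-in, since I have not copied the actual edges from the blog post’s figure; substitute those before running):

    library(igraph)

    ## Stand-in edge list: replace with the 9-node example from the blog post.
    edges <- matrix(c(1,2, 2,3, 3,4, 4,5, 5,6, 6,7, 7,8, 8,9),
                    ncol = 2, byrow = TRUE)
    g <- graph_from_edgelist(edges, directed = FALSE)

    ## igraph's built-in Louvain (answer "b")
    cl <- cluster_louvain(g)
    membership(cl)
    modularity(g, membership(cl))

    ## my 'from scratch' partition (answer "c", second pass), for comparison
    mine <- c(1, 1, 1, 1, 2, 2, 3, 3, 3)   # community of nodes 1..9
    modularity(g, mine)

With the real edge list in place, the two modularity calls should reproduce the 0.44 vs 0.401 figures above.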

The core of my query is therefore: how wrong am I in arriving at answer “c”? So far, I have reasoned as follows regarding the ‘first pass’:

for each node i
    set ‘best modularity gain’ to zero
    for each neighbouring node j’s community c_j
        remove node i from the community it is currently assigned to
        compute the change in modularity due to moving i into c_j
        if change in modularity > best modularity gain
            call it c_j*
            re-assign node i to c_j* (until next round)
            best modularity gain <- modularity gain at this iteration
        else
            restore previously assigned community
    end
end
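For completeness, below is a condensed sketch of how I read that pass in R (not my full script; the helper name local_moving_pass is purely illustrative, the graph is assumed unweighted and undirected, and the gain formula is my own simplification of Eq. (2) in Blondel et al., which may well be where I go wrong):

    library(igraph)

    ## One local-moving pass, following the pseudo-code above.
    ## 'memb' is the current community membership vector (one entry per node).
    local_moving_pass <- function(g, memb) {
      m   <- ecount(g)          # total number of edges
      deg <- degree(g)          # k_i for every node
      for (i in seq_len(vcount(g))) {
        home      <- memb[i]
        best_comm <- home
        best_gain <- 0
        memb[i]   <- NA                           # temporarily remove i from its community
        nb        <- as.integer(neighbors(g, i))  # neighbours of i
        for (cj in unique(na.omit(memb[nb]))) {
          k_i_in  <- sum(memb[nb] == cj, na.rm = TRUE)  # edges from i into community cj
          sig_tot <- sum(deg[which(memb == cj)])        # total degree of cj (i excluded)
          gain    <- k_i_in / m - (sig_tot * deg[i]) / (2 * m^2)
          if (gain > best_gain) {                       # strictly better only
            best_gain <- gain
            best_comm <- cj
          }
        }
        memb[i] <- if (best_gain > 0) best_comm else home
      }
      memb
    }

    ## e.g. start from singleton communities and run one pass:
    ## memb1 <- local_moving_pass(g, seq_len(vcount(g)))

Even in this condensed form, one thing jumps out: with ‘strictly better only’, the first community considered keeps a tie, so the iteration order over neighbouring communities ends up deciding the outcome, which is presumably related to the ties visible in the tableau below.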

I produced a tableau (that cannot be attached, but see the picture below) to illustrate how the above pseudo-code runs in my ‘from scratch’ replication, generating answer “c”. One might notice the presence of ‘ties’ in modularity gains on several occasions.

[image: tableau of the per-node modularity-gain calculations from the dry run]

Since R doesn’t seem very popular on here, I’ll spare you the full R script beyond the sketches above, but obviously I’m happy to share it if that would help.

Hopefully the example above makes enough sense despite the apparent triviality of the issue?

Thanks,
Ettore