Parallel algorithms for community detection

remysucre · 14 August 2020 10:13

There are a handful parallel algorithms for community detection, for example this paper and this paper. Are there plans to implement any?

szhorvat · 17 August 2020 09:37

Our member who is most closely involved with community detection methods is not around at the moment. I suspect he might respond when he is back.

I assume that the reason you are asking is that you need speed: handle larger networks faster. Can you explain what limits you ran into? How big (and of what sort) is the network which you could not handle with existing methods? Which methods did you try?

remysucre · 17 August 2020 18:48

I don’t really have a hard block. I’ve been running experiments with the SNAP datasets, and in the future I will likely run more of size similar to the livejournal data. I basically tried every single algorithm from igraph and networkx, and only infomap, louvain, leiden and fluid community could finish overnight. The quality of the detection was not good enough, so I will need to rerun with different parameters, or just find better datasets. I understand parallelism, even if possible, may not help other algorithms because of the computational complexity. But if only the tractable algorithms can be sped up it’d accelerate my experiments.

remysucre · 17 August 2020 18:51

Maybe label propagation also finished, but iirc the result quality was really bad (a few huge communities and tons of single-node communities).

vtraag · 20 August 2020 09:02

Hi @remysucre, there are no plans to introduce parallelizations of community detection algorithms, at least not from my side. Although the performance may increase, as you indicate, there are a number of algorithms that do finish (even if it is over night). I believe that the Leiden algorithm should be one of the fastest among those (disclaimer: I am one of the authors of that algorithm).

What “quality” was not good enough?

The Leiden algorithm supports also a different objective function (CPM, instead of Modularity), and you may also tune the resolution parameter. If your problem is that you need to explore many different resolution parameters, that is something that is actually trivial to parallelize. This will also be a much more efficient gain compared to trying to parallelize the algorithms themselves.

I am not sure what the goal is of your analysis? Of course, one dataset is not better than another dataset just because a community detection algorithm works “better” somehow.

remysucre · 20 August 2020 16:00

For the context, I’m trying to exploit community structure to improve data analytics. I can share more details offline (via email etc.), but here are my main constraints:

There should be few cross-community edges compared to intra-community edges. Between any pair of communities, there should be far fewer edges (like 1000X) than within a community.
There are a few communities of similar size, instead of many communities that differ too much in size. Even better if I can specify the number of communities. Usually this number is under 100 and can go under 10.

Note that 2 can be achieved even if the “ground truth” is many communities of different sizes. In that case just merge small ones and sacrifice the intra-community density (I’m fine with that). Given these, I’ve been looking into tools that perform k-partition on graphs, e.g. KaHyPar, which seem to fit my needs better. But I will still consider community detection.

And now to answer your questions @vtraag:

Leiden is indeed the fastest, awesome work!

The implemented algorithms usually return a few giant communities together with many many tiny communities, which doesn’t meet my constraints. Also there are just too many cross edges.

That’s absolutely right.

Since my goal is to leverage community structure for data analytics, it might be the case that the datasets I have simply don’t exhibit strong enough community structure. I got some good results from taking the ground truth from the synthetic LFR benchmarks which have really strong communities.

Topic		Replies	Views
Community detection at various resolutions Showcase and Applications	0	1372	4 June 2020
Overlapping and non-overlapping algorithms detecting communities in igrpah Usage R	0	286	12 May 2021
Implementing the LDBC Graphalytics benchmark Usage C	14	1193	6 February 2021
How do we use igrah for large graph Usage Python	3	262	13 November 2022
Optimal Community Detection Usage Python	4	1183	17 December 2022

Parallel algorithms for community detection

Related topics