infomap clustering issues

lizerlfunk · 29 April 2021 01:28

I am using the cluster_infomap algorithm for community detection on a directed graph with 5602 vertices and 42752 links. I have read that this algorithm is not necessarily ideal for a graph of this size, but since our graph is directed, our options are limited.

I ran the algorithm a handful of times, and while I got slightly different results each time, they were results that made sense in the context of our data. Then RStudio crashed, and I lost the communities I’d gotten before. When I ran the algorithm again, all of a sudden the communities were completely different. (We are dealing with musical influence data. In our initial communities, the artists clustered together were clearly similar: as I scrolled through the list sorted by community membership, I could tell immediately “okay, this is now a new community”. When I ran it again, that was no longer the case.)

I know that the only way to ensure that we are able to replicate the exact communities each time is to set a specific seed, but I’m really perplexed about how our community structure has changed so drastically. Does anyone have any input, or suggestions about how to get the original community structure back? Thank you!

szhorvat · 29 April 2021 18:52

If I understand you correctly, you ran a numerical experiment multiple times. While the results were slightly different, they were all consistent with each other (let’s call them “type A”).

Then, unfortunately, you lost some of your work and had to re-create it. After that, the results were still somewhat variable and consistent with each other (“type B”), but clearly different from the “type A” results.
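One way to put a number on "consistent with each other" is to compare the partitions from repeated runs pairwise. The following is only a sketch: the stochastic block model graph is a hypothetical stand-in for the real influence network, and normalized mutual information (NMI) is just one of the measures that igraph's `compare()` offers.

```r
library(igraph)

# Hypothetical stand-in for the real data: a directed graph with three
# planted blocks (dense inside each block, sparse between blocks).
set.seed(1)
pm <- matrix(0.01, 3, 3)
diag(pm) <- 0.2
g <- sample_sbm(150, pref.matrix = pm, block.sizes = c(50, 50, 50),
                directed = TRUE)

# Run Infomap several times without resetting the seed in between,
# so each run may start from a different random state.
runs <- lapply(1:5, function(i) cluster_infomap(g))

# Pairwise NMI between the runs: 1 means identical partitions,
# values near 0 mean the partitions are essentially unrelated.
n <- length(runs)
nmi <- matrix(1, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    nmi[i, j] <- compare(runs[[i]], runs[[j]], method = "nmi")
  }
}
round(nmi, 2)
```

If all runs within one session score high against each other but low against a saved "type A" result, that points to a change in the input rather than to ordinary run-to-run variation.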

If so, the only explanation is that you did not succeed in recreating the experiment in the same way. You must have (inadvertently) done something differently.

I understand that this can be very frustrating. It happens to all of us at some point. On the upside, it can also be a good thing: if you go through your code, and find what the difference was, you will have more confidence in your results. Maybe there is a difference you didn’t even realize to be important, but this mishap will reveal it.

All we can really say is that the result that cluster_infomap (formerly infomap.community) returns depends on only two things: (1) the input you pass to the function, and (2) the state of the random number generator.
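A minimal sketch of what this means in practice, assuming (as is the case in the R interface, to my knowledge) that igraph draws its random numbers from R's own generator, so that `set.seed()` controls it. The graph and the seed value below are hypothetical stand-ins.

```r
library(igraph)

# Hypothetical stand-in for the real network: a directed graph with
# three planted blocks.
set.seed(42)
pm <- matrix(0.01, 3, 3)
diag(pm) <- 0.2
g <- sample_sbm(150, pref.matrix = pm, block.sizes = c(50, 50, 50),
                directed = TRUE)

# Same input + same RNG state: the two runs should agree exactly.
set.seed(2021)
first <- cluster_infomap(g, nb.trials = 10)

set.seed(2021)
second <- cluster_infomap(g, nb.trials = 10)

same <- identical(as.integer(membership(first)),
                  as.integer(membership(second)))
same
```

If `same` comes out `FALSE` on your own data even with the seed fixed, then something about the input changed between the two calls.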

(2) cannot explain the difference you describe, so it must be (1). Find what you did differently.
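To hunt for a difference in (1), it can help to write down everything about the input that could influence the result, and to save results to disk so that a crash does not lose them. The sketch below uses a hypothetical helper, `describe_input()`, and a toy ring graph in place of the real data. Note that `cluster_infomap` picks up `weight` edge and vertex attributes by default when they are present, so a weight attribute that existed in one session but not in the other would be one possible culprit.

```r
library(igraph)

# Hypothetical helper: summarise the properties of the input graph that
# could change what cluster_infomap() returns.
describe_input <- function(g) {
  list(
    vertices     = vcount(g),
    edges        = ecount(g),
    directed     = is_directed(g),
    simple       = is_simple(g),            # FALSE if multi-edges or loops
    edge_attrs   = edge_attr_names(g),      # look for "weight" here
    vertex_attrs = vertex_attr_names(g)     # and here
  )
}

g <- make_ring(10, directed = TRUE)  # toy stand-in for the real graph
info <- describe_input(g)
str(info)

# Save the edge list together with the memberships, so the exact input
# and output of a run can be inspected again later.
comm <- cluster_infomap(g)
path <- tempfile(fileext = ".rds")   # use a permanent path in real work
saveRDS(list(edges      = as_data_frame(g, what = "edges"),
             membership = as.integer(membership(comm))),
        path)
restored <- readRDS(path)
```

Comparing the output of `describe_input()` between the old and the new workflow (direction, duplicate edges, weights) is a quick first check before digging through the data preparation code line by line.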
