infomap clustering issues

I am using the cluster_infomap algorithm for community detection on a directed graph with 5602 vertices and 42752 links. I have read that this algorithm is not necessarily ideal for a graph of this size, but since our graph is directed, our options are limited.

I ran the algorithm a handful of times, and while I got slightly different results each time, they were results that made sense in the context of our data. Then RStudio crashed, and I lost the communities I’d gotten before. When I ran the algorithm again, all of a sudden the communities were completely different. (We are dealing with musical influence data. Our initial communities very clearly grouped similar artists together: as I scrolled through the list sorted by community membership, I could tell immediately “okay, this is now a new community”. When I ran it again, that was no longer the case.)

I know that the only way to ensure that we are able to replicate the exact communities each time is to set a specific seed, but I’m really perplexed about how our community structure has changed so drastically. Does anyone have any input, or suggestions about how to get the original community structure back? Thank you!

If I understand you correctly, you ran a numerical experiment multiple times. While the results were slightly different, they were all consistent with each other (let’s call them “type A”).

Then, unfortunately, you lost some of your work and had to re-create it. After that, the results were still somewhat variable and consistent with each other (“type B”), but clearly different from the “type A” results.

If so, the only explanation is that you did not succeed in recreating the experiment in the same way. You must have (inadvertently) done something differently.

I understand that this can be very frustrating. It happens to all of us at some point. On the upside, it can also be a good thing: if you go through your code, and find what the difference was, you will have more confidence in your results. Maybe there is a difference you didn’t even realize to be important, but this mishap will reveal it.

All we can really say is that the result the function returns depends on only two things: (1) the input you pass to the function, and (2) the state of the random number generator.
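A minimal sketch of that point, assuming the R igraph package. The random directed graph `g` and the seed values are made up for illustration and stand in for your real data. With the same input and the same RNG state, the two runs should give the same membership vector; with a different seed, `compare()` gives a rough measure of how far apart two partitions are.

```r
library(igraph)

# Stand-in input: a random directed graph (hypothetical; use your own graph here).
set.seed(1)
g <- sample_gnm(200, 1500, directed = TRUE)

# Same input + same RNG state: the two runs should agree exactly.
set.seed(42)
run1 <- cluster_infomap(g)
set.seed(42)
run2 <- cluster_infomap(g)
same <- identical(as.vector(membership(run1)), as.vector(membership(run2)))
print(same)

# Same input, different RNG state: the partition may differ somewhat.
# Normalized mutual information (1 = identical partitions) quantifies by how much.
set.seed(7)
run3 <- cluster_infomap(g)
print(compare(run1, run3, method = "nmi"))
```

Running the last comparison between an old “type A” membership vector (if you still have one saved) and a new “type B” one would put a number on how different they really are.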

(2) can account for the slight run-to-run variation, but not for the drastic difference you describe, so it must be (1). Find what you did differently.
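As a starting point for that search, here is a hedged sketch of input properties worth comparing between the old and the new script, again assuming R igraph. The tiny `edges` data frame is a hypothetical stand-in for your edge list. Note that `cluster_infomap` picks up a `weight` edge attribute by default if one is present, so a graph built with or without weights is a different input. The last lines save the input, the seed, and the result together, so that a crash does not lose them.

```r
library(igraph)

# Hypothetical stand-in for the real edge list (columns: from, to).
edges <- data.frame(from = c("a", "b", "c", "a"),
                    to   = c("b", "c", "a", "b"))
g <- graph_from_data_frame(edges, directed = TRUE)

# Properties of the input that change what the algorithm sees.
checks <- list(
  directed   = is_directed(g),                   # built as directed both times?
  n_vertices = vcount(g),                        # same counts as before?
  n_edges    = ecount(g),
  multi      = any_multiple(g),                  # duplicate edges kept or removed?
  loops      = any(which_loop(g)),               # self-loops kept or removed?
  weighted   = "weight" %in% edge_attr_names(g)  # 'weight' is used by default
)
str(checks)

# Keep the input, the seed, and the result together on disk.
seed <- 42
set.seed(seed)
comm <- cluster_infomap(g)
path <- tempfile(fileext = ".rds")  # illustrative; use a permanent path in practice
saveRDS(list(edges = edges, seed = seed,
             membership = as.vector(membership(comm))), path)
restored <- readRDS(path)
```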