cluster_infomap results not reproducible despite set.seed()

As Infomap incorporates a random walker, different results are expected between runs. However, if I use set.seed from R, as suggested in this post, the clusterings still differ. The algorithm cluster_leiden does seem to respect a set seed according to page 82 of the manual. The old Infomap code offered a --seed parameter. Link

I’d appreciate reproducible results from a clustering algorithm. I’m aware this is not trivial and the Infomap code is old. Is there anything you could do? Or should I use the modern implementation from mapequation.org? Thank you

I cannot reproduce this issue. If I set the same seed, I get the same clustering. Can you provide a reproducible example that demonstrates the problem? Run the example in a fresh R session and include the output of sessionInfo() in your report.

Example:

> library(igraph)

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union

> set.seed(123)
> g<-sample_gnm(100,200)

Without setting a seed, the results are different for each run:

> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

With a seed, the results are deterministic:

> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988

Session info:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] igraph_1.5.0

loaded via a namespace (and not attached):
[1] compiler_4.3.1    magrittr_2.0.3    cli_3.6.1        
[4] tools_4.3.1       rstudioapi_0.15.0 rlang_1.1.1      
[7] pkgconfig_2.0.3  

It was helpful to see that you executed set.seed every time before clustering. I had understood from the R function help that setting it once was sufficient.
Thank you!

If you set the seed at the top of a script (a sequence of commands), that script will execute identically every time. But that does not mean that multiple calls of the same command within the script yield the same result.

Example:

> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

As you can see, the three runs after set.seed give different results. But after a repeated set.seed (with the same seed), the same three runs give the same three results again.
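The same behaviour can be seen with base R alone, without igraph. The following is only a minimal sketch that uses runif() as a stand-in for any function that consumes random numbers; the variable names are made up for illustration.

```r
# Seed once, then draw twice: each draw advances the RNG state,
# so the two values differ.
set.seed(123)
first  <- runif(1)
second <- runif(1)

# Re-seeding with the same value rewinds the RNG to the same state,
# so the same sequence of draws is produced again.
set.seed(123)
first_again  <- runif(1)
second_again <- runif(1)

first == second                  # FALSE: consecutive draws differ
identical(first, first_again)    # TRUE: same position in the sequence
identical(second, second_again)  # TRUE
```

Every call that draws random numbers moves the generator forward, which is why set.seed has to be repeated immediately before each call whose result should be identical.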

What I did was set a seed, register a parallel compute backend, and then use the foreach package to cluster the same data on several cores with what I assumed to be the same seed. Now I know better. Thanks again. As far as I understand, the state of the RNG changes after each function call, so the user-provided seed takes effect only once.

Interestingly, the codelength was the same for all my clustering solutions but modularity value and community count differed.
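For the parallel case described above, one possible approach is to set the seed inside each task rather than once in the main session. The sketch below is an assumption-laden illustration, not the original foreach setup: it uses the base parallel package instead of foreach, runif() stands in for cluster_infomap(g), and run_once is a hypothetical helper name.

```r
library(parallel)

# Hypothetical helper: seed the RNG inside the task itself, then do the
# random work. runif(1) is a stand-in for e.g. cluster_infomap(g)$codelength.
run_once <- function(i, seed) {
  set.seed(seed)
  runif(1)
}

# Worker processes are separate R sessions, so a set.seed() call in the
# main session does not seed them. Seeding inside run_once does.
cl <- makeCluster(2)
values <- unlist(parLapply(cl, 1:4, run_once, seed = 345))
stopCluster(cl)

length(unique(values)) == 1  # TRUE: every task started from the same seed
```

If the goal is instead to have independent but reproducible random streams on each worker, parallel::clusterSetRNGStream(cl, iseed) is the usual tool; in that case results differ between workers but repeat from run to run.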