As Infomap incorporates a random walker, different results are expected. However, if I use set.seed from R, as suggested in this post, the clusterings still differ. The algorithm cluster_leiden does seem to respect a set seed according to the manual, page 82. The old Infomap code offered a --seed parameter. Link
I’d appreciate reproducible results from a clustering algorithm. I’m aware this is not trivial and the Infomap code is old. Is there anything you could do? Or should I use the modern implementation from mapequation.org? Thank you
I cannot reproduce this issue. If I set the same seed, I get the same clustering. Can you provide a reproducible example that demonstrates the problem? Run the example in a fresh R session and include the output of sessionInfo() in your report.
Example:
> library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
> set.seed(123)
> g<-sample_gnm(100,200)
Without setting a seed, the results are different for each run:
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068
With a seed, the results are deterministic:
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
Session info:
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] igraph_1.5.0
loaded via a namespace (and not attached):
[1] compiler_4.3.1 magrittr_2.0.3 cli_3.6.1
[4] tools_4.3.1 rstudioapi_0.15.0 rlang_1.1.1
[7] pkgconfig_2.0.3
It was helpful to see that you executed set.seed every time before clustering. I had read the R help for set.seed as saying that setting it once was sufficient.
Thank you!
If you set the seed at the top of a script (a sequence of commands), that script will execute identically every time. But that does not mean that multiple calls of the same command within the script yield the same result.
Example:
> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068
> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068
As you can see, the three runs after set.seed give different results. But after a repeated set.seed (with the same seed), the same three runs give the same three results again.
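If every single call should return the same clustering, one option is to reseed immediately before each call. The following is only a minimal sketch: cluster_infomap_seeded is a hypothetical helper name, not part of igraph, and the seed value is arbitrary.

```r
library(igraph)

# Hypothetical helper (not an igraph function): reset the RNG to a known
# state right before clustering, so every call starts from the same state.
cluster_infomap_seeded <- function(graph, seed, ...) {
  set.seed(seed)
  cluster_infomap(graph, ...)
}

set.seed(123)
g <- sample_gnm(100, 200)

a <- cluster_infomap_seeded(g, seed = 345)
b <- cluster_infomap_seeded(g, seed = 345)

# Both calls started from the same RNG state, so the memberships should match
same_clustering <- identical(as.vector(membership(a)), as.vector(membership(b)))
print(same_clustering)
```

Note that the helper leaves the global RNG in a changed state afterwards, which may matter if other random draws follow in the same script.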
1 Like
What I did was set a seed, register a parallel compute backend, and then use the foreach package to cluster the same data on several cores with what I assumed to be the same seed. Now I know better. Thanks again. Apparently (to my limited understanding) each function call changes the state of the RNG, so the user-provided seed only fixes the starting point and is used once.
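For the parallel case described here, a seed set in the main session does not by itself give every worker the same RNG state. A minimal sketch of one workaround follows, using base R's parallel package instead of foreach (a substitution made only to keep the example dependency-free); the worker count, task count, and seed are arbitrary assumptions.

```r
library(igraph)
library(parallel)

set.seed(123)
g <- sample_gnm(100, 200)

cl <- makeCluster(2)  # two worker processes; adjust as needed

# Each task seeds the RNG of its own worker right before clustering,
# so every task starts from the same RNG state.
res <- parLapply(cl, 1:4, function(i, graph) {
  set.seed(345)
  as.vector(igraph::membership(igraph::cluster_infomap(graph)))
}, graph = g)

stopCluster(cl)

# TRUE if all tasks returned the same membership vector
all_equal <- all(vapply(res, identical, logical(1), res[[1]]))
print(all_equal)
```

If the goal is instead to get different but reproducible results per worker, parallel::clusterSetRNGStream(cl, iseed) assigns each worker its own reproducible random stream.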
Interestingly, the codelength was the same for all my clustering solutions, but the modularity value and community count differed.
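To check how these three quantities vary across runs on a given graph, the results can be tabulated per seed. This is a small sketch on the random graph from the examples above, with arbitrarily chosen seeds; whether codelength stays constant while modularity and community count change will depend on the graph.

```r
library(igraph)

set.seed(123)
g <- sample_gnm(100, 200)

seeds <- 1:5  # arbitrary seeds, one clustering run per seed

rows <- lapply(seeds, function(s) {
  set.seed(s)
  cl <- cluster_infomap(g)
  data.frame(
    seed        = s,
    codelength  = cl$codelength,   # description length reported by Infomap
    modularity  = modularity(cl),  # modularity of the found partition
    communities = length(cl)       # number of communities
  )
})

runs <- do.call(rbind, rows)
print(runs)
```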