cluster_infomap results not reproducible despite set.seed()

astruck · 24 July 2023 14:30

As Infomap incorporates a random walker, results are expected to differ between runs. However, if I use set.seed from R – as suggested in this post – the clusterings still differ. The algorithm cluster_leiden does seem to respect a set seed, according to page 82 of the manual. The old Infomap code offered a parameter --seed. Link

I’d appreciate reproducible results from a clustering algorithm. I’m aware this is not trivial and the Infomap code is old. Is there anything you could do? Or should I use the modern implementation from mapequation.org? Thank you

szhorvat · 24 July 2023 16:39

I cannot reproduce this issue. If I set the same seed, I get the same clustering. Can you provide a reproducible example that demonstrates the problem? Run the example in a fresh R session and include the output of sessionInfo() in your report.

szhorvat · 24 July 2023 16:47

Example:

> library(igraph)

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union

> set.seed(123)
> g<-sample_gnm(100,200)

Without setting a seed, the results are different for each run:

> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

With a seed, the results are deterministic:

> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
> set.seed(345); cluster_infomap(g)$codelength
[1] 6.036988
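
To make the reseeding harder to forget, the seed and the clustering call can be wrapped in one function. This is only a minimal sketch: `cluster_infomap_seeded` is a hypothetical helper written for this illustration, not part of igraph.

```r
library(igraph)

# Hypothetical helper (not an igraph function): reseed R's RNG
# immediately before every Infomap run.
cluster_infomap_seeded <- function(graph, seed, ...) {
  set.seed(seed)
  cluster_infomap(graph, ...)
}

set.seed(123)
g <- sample_gnm(100, 200)

c1 <- cluster_infomap_seeded(g, seed = 345)
c2 <- cluster_infomap_seeded(g, seed = 345)

# Same graph and same seed: check that the two partitions match.
identical(as.vector(membership(c1)), as.vector(membership(c2)))
```

Note that the helper resets the global RNG state as a side effect, so any random draws that follow it in a script are affected too.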

Session info:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] igraph_1.5.0

loaded via a namespace (and not attached):
[1] compiler_4.3.1    magrittr_2.0.3    cli_3.6.1        
[4] tools_4.3.1       rstudioapi_0.15.0 rlang_1.1.1      
[7] pkgconfig_2.0.3

astruck · 25 July 2023 12:19

It was helpful to see that you executed set.seed every time before clustering. I had understood the R help page to mean that setting it once was sufficient.
Thank you!

szhorvat · 25 July 2023 16:35

If you set the seed at the top of a script (sequence of commands), that script will execute identically every time. But that does not mean that multiple calls of a command within a script yield the same result.

Example:

> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

> set.seed(123)
> g<-sample_gnm(100,200)
> cluster_infomap(g)$codelength
[1] 6.0551
> cluster_infomap(g)$codelength
[1] 6.054886
> cluster_infomap(g)$codelength
[1] 6.063068

As you can see, the three runs after set.seed give different results. But after a repeated set.seed (with the same seed), the same three runs give the same three results again.
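
The same behaviour can be seen without igraph at all. The sketch below uses only base R's `runif` to show that each random draw advances the RNG state, so a seed fixes the whole sequence of draws rather than each individual call:

```r
# The seed fixes the starting point of the random number stream.
set.seed(123)
first  <- runif(3)   # consumes the first three numbers of the stream
second <- runif(3)   # continues where the previous call stopped

# Resetting the seed rewinds the stream to the same starting point.
set.seed(123)
again <- runif(3)

identical(first, again)    # same position in the stream
identical(first, second)   # different positions in the stream
```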

astruck · 26 July 2023 06:49

What I did was set a seed, register a parallel compute backend, and then use the foreach package to cluster the same data on several cores with what I assumed to be the same seed. Now I know better. Thanks again. Apparently (to my limited understanding), each function call changes the state of the RNG, so the user-provided seed only determines the starting point.
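
For the parallel case, one option is to set the seed inside each task, because every worker process keeps its own RNG state. The following is a rough sketch under some assumptions: it uses base R's parallel package instead of foreach to keep dependencies small, and the worker count, number of tasks and seed value are arbitrary.

```r
library(igraph)
library(parallel)

set.seed(123)
g <- sample_gnm(100, 200)

workers <- makeCluster(2)  # arbitrary number of worker processes

# A seed set in the main session is not passed on to the workers,
# so each task sets it explicitly just before clustering.
codelengths <- parLapply(workers, 1:4, function(i, graph, seed) {
  set.seed(seed)
  igraph::cluster_infomap(graph)$codelength
}, graph = g, seed = 345)

stopCluster(workers)

# Number of distinct codelengths across the four tasks.
length(unique(unlist(codelengths)))
```

If the goal is instead to get different but reproducible runs on each worker, `clusterSetRNGStream()` from the parallel package may be a better fit than a hard-coded seed per task.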

Interestingly, the codelength was the same for all my clustering solutions but modularity value and community count differed.
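
A quick way to check this kind of observation is to tabulate the three quantities per seed. This is a small sketch on the random graph used earlier in the thread, with arbitrarily chosen seeds; whether the values coincide or differ depends on the graph.

```r
library(igraph)

set.seed(123)
g <- sample_gnm(100, 200)

seeds <- c(1, 2, 3)  # arbitrary seeds for illustration

# One Infomap run per seed, reseeding right before each run.
runs <- lapply(seeds, function(s) {
  set.seed(s)
  cluster_infomap(g)
})

# Codelength, modularity and number of communities side by side.
comparison <- data.frame(
  seed        = seeds,
  codelength  = sapply(runs, function(x) x$codelength),
  modularity  = sapply(runs, modularity),
  communities = sapply(runs, length)
)
comparison
```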
