Largest_weighted_cliques leads to R crash

Hello,

I’m using igraph in R to infer the largest fully connected clique with the lowest mean weight. To do this, I’ve written the following code for a network of 113 nodes and 6156 weighted edges between 0.11 and 0.25:

library(igraph)

# EDIT: example dataset
vertices <- paste0("id_", seq(from = 1, to = 113))
subset <- as.data.frame(t(combn(vertices, 2)))
names(subset) <- c("ID1", "ID2")
subset$kinship_coefficient <- runif(6328, min=0.1, max=0.25)

# create network
net <- graph_from_data_frame(subset, directed = FALSE) # subset is the edge list data frame
net <- set_edge_attr(net, "weight", value = subset$kinship_coefficient)
summary(net) # to check if weight is correct + nr. of edges and vertices

# find largest weighted cliques
groups <- largest_weighted_cliques(net)

# look for mean edge weights per clique
mean_edge_weights <- sapply(groups, function(clique) {
  subgraph <- induced_subgraph(net, clique)
  mean(E(subgraph)$weight)
})

# Find the index of the clique with the lowest mean edge weight
min_mean_weight_index <- which.min(mean_edge_weights)

# Get the clique with the lowest mean edge weight
min_mean_weight_clique <- groups[[min_mean_weight_index]]

However, as soon as I use ‘largest_weighted_cliques’ on this network, my R crashes. This also happens when I try using it on my VM on my server. When I use it on a subset of the network, e.g. with 3500 edges, it does not crash. I do need the whole network with all edges to get the result I am looking for.

Does anyone have an idea why it might be crashing? It should not be a memory issue, since the memory limit gives ‘Inf’ (but who knows maybe it is…).

Thanks a lot in advance!

Please share all information that’s needed to reproduce this issue and we will look into it:

  • R version
  • igraph version, and how you installed igraph
  • details of your system (OS, OS version, architecture, etc.)
  • your dataset, which is the most important piece of this puzzle

Note that this function works with vertex weights and not edge weights. Furthermore, only integer weights are supported, and the function will give a warning if you pass in non-integral values.

That said, the crash should be fixed.

P.S. I see that the docs state,

The weight of a clique is the sum of the weights of its edges.

This is an error, the weight of a clique is the sum of the weights of its vertices. I just updated the docs to correct this.
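
To illustrate the distinction, here is a minimal sketch on a made-up six-vertex graph (two triangles joined by one edge; the graph and the weights are invented for illustration only). It passes integer vertex weights through the `vertex.weights` argument of `largest_weighted_cliques()`:

```r
library(igraph)

# Toy graph (hypothetical): triangles {1,2,3} and {4,5,6}, joined by the edge 3-4.
g <- make_graph(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), directed = FALSE)

# Integer vertex weights: the second triangle is heavier.
vw <- c(1, 1, 1, 2, 2, 2)

# The clique weight is the sum of the weights of its vertices.
cl <- largest_weighted_cliques(g, vertex.weights = vw)
heaviest <- sort(as.integer(cl[[1]]))
print(heaviest)
```

Both triangles have the same size, so only the vertex weights decide which one is returned. Edge attributes play no role in this call.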

EDIT: setting the vertex weights to ‘NULL’ but increasing the network size to my desired size generated a crash again (apologies for the previous edit, I thought I had generated the right size of the network but it was smaller than I needed it to be)

Thanks a lot for the reply. Here is the information you asked for:

Igraph version: 1.4.1
R version: 4.2.1 (2022-06-23)
System details: macOS Monterey, Apple M1 Pro Chip, 8‐Core CPU, 16GB (On-Board), 512 GB SSD, 14‐Core GPU.

I have provided an example of the dataset, which on my laptop still generates the same issue, in my original question.

Regarding your second comment, as I stated, the function works fine when the network contains 2000 fewer edges than my dataset has, so I don’t think the issue is caused by the function working with vertex weights (I only look at the edge weights after calling the function).

Thanks a lot for your time!

I agree, the bug is not related to this, and we would like to investigate it.

Where can we access the dataset? Did you perhaps forget to attach it or link to it?

Although in this case it is unlikely to make a difference, as a general practice, always test with the latest package version, in this case igraph 1.5.1.

Here I attached the example dataset; please note that the first column of row names should be deleted.
(I attached it via google since I cannot yet share here as a new user)

Thank you!

I do not see a crash with the code above.

Can you double-check that this code triggers the crash on your machine?

If yes, can you try with the latest igraph, version 1.5.1?

Did you see a crash with the code using the google drive file? It has slightly different (non-random) edge weights. I did not get a crash with the random edge weights generated with runif, but did with the actual ones.

I still get the crash with igraph 1.5.1 (without any error messages or memory limit warnings)

Can you provide the exact code that produces the crash, using your datafile? The following works fine for me:

library(igraph)
df <- read.csv('subset.csv')
g <- graph_from_data_frame(df, directed = FALSE)
g <- set_edge_attr(g, "weight", value = df$kinship_coefficient)
largest_weighted_cliques(g)

The edge weights should not matter since they are not used by largest_weighted_cliques, but I set them anyway.
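
One way to check this claim locally is to run the function on the same graph with and without an edge `weight` attribute and compare the results. The sketch below uses a small random graph as a stand-in (its size, density and seed are arbitrary assumptions, and `canon` is a helper name invented here):

```r
library(igraph)

set.seed(1)  # arbitrary seed so the toy graph is reproducible
g_plain <- sample_gnp(30, 0.6)  # hypothetical test graph without attributes
g_weighted <- set_edge_attr(g_plain, "weight",
                            value = runif(ecount(g_plain), min = 0.1, max = 0.25))

# Hypothetical helper: turn a list of cliques into a sorted vector of labels,
# so the comparison does not depend on the order in which cliques are returned.
canon <- function(cliques) {
  sort(vapply(cliques,
              function(v) paste(sort(as.integer(v)), collapse = "-"),
              character(1)))
}

same_result <- identical(canon(largest_weighted_cliques(g_plain)),
                         canon(largest_weighted_cliques(g_weighted)))
print(same_result)
```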

Yes, this is the exact code that produces a crash, both on my laptop and on my server. It is simply the command ‘largest_weighted_cliques’ that leads to the crash, although ‘maximal_cliques’ and ‘largest_cliques’ do so too. That is, with the dataset that I provide via the google drive, probably because it is a very densely connected network.

If you are able to reproduce it without an error, then I suppose it would be my computer that is the issue. Would you expect it to be a memory allocation issue in this case?

We cannot reproduce the crash on our side, not even with AddressSanitizer, which would reveal any problems.

We must have a misunderstanding. What you sent me is very sparse:

> df <- read.csv('~/Downloads/subset.csv')
> g <- graph_from_data_frame(df,directed=F)
> vcount(g)
[1] 6440
> ecount(g)
[1] 6328

Note that it has fewer edges than vertices.
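
One possible explanation for counts like these, offered as an assumption and not a confirmed diagnosis: `graph_from_data_frame()` treats the first two columns as edge endpoints, so if the CSV still contains a leading row-name column, every row name becomes a vertex. The sketch below reproduces that effect on an invented 5-vertex edge list written to a temporary file:

```r
library(igraph)

# Hypothetical 5-vertex complete edge list, saved the way write.csv() does
# by default, i.e. with a leading column of row names.
ids <- paste0("id_", 1:5)
edges <- as.data.frame(t(combn(ids, 2)))
names(edges) <- c("ID1", "ID2")
edges$kinship_coefficient <- 0.2
path <- tempfile(fileext = ".csv")
write.csv(edges, path)

# Row-name column kept: it is used as the first endpoint column.
g_raw <- graph_from_data_frame(read.csv(path), directed = FALSE)
# Row-name column consumed by row.names = 1: ID1 and ID2 are the endpoints.
g_clean <- graph_from_data_frame(read.csv(path, row.names = 1), directed = FALSE)

counts <- c(raw = vcount(g_raw), clean = vcount(g_clean))
print(counts)
print(edge_density(g_clean))
```

In this toy case the raw read yields one vertex per row plus the distinct `ID1` values, while the cleaned read yields the intended five vertices and a fully connected graph.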

If you see a crash, then there’s a problem which we would like to fix. We need a minimal reproducible example for this. Can you double check the steps you are using and create such an example using the guidelines here?

  • If you can come up with something that does not depend on external data, that will be the best, but use a datafile if needed.
  • If you use random numbers, set a seed first (set.seed)
  • Remove any code that is not necessary for reproducing the issue. I expect edge weights are irrelevant here.
  • Run the example in a fresh R session, i.e. immediately after (re)starting R and without restoring your workspace.
  • Do use the latest version of igraph (1.5.1) and if possible, use the latest R (4.3.1).
  • Show us the output of sessionInfo()
  • I am assuming you downloaded R from CRAN (i.e. https://cran.r-project.org/) and you installed igraph using the standard install.packages('igraph') command. If this is not the case, let us know how you obtained R and igraph, as this may be relevant. In particular, if you got R through Anaconda or Homebrew, let us know.
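
A skeleton that follows the checklist above might look like the sketch below. The random graph is only a hypothetical stand-in (its size, density and seed are arbitrary), so it is not expected to trigger the crash; the point is the shape of the script: a fixed seed, no external data, no unrelated code, and the session details printed at the end.

```r
library(igraph)

set.seed(42)                # fixed seed (the value itself is arbitrary)
g <- sample_gnp(30, 0.8)    # hypothetical stand-in; replace with the real graph
cat("vertices:", vcount(g), "edges:", ecount(g), "\n")

cl <- largest_weighted_cliques(g)  # the call under investigation

# Version details to paste into the report.
print(packageVersion("igraph"))
print(sessionInfo())
```

Running such a script immediately after restarting R, without restoring a workspace, keeps the session state out of the picture.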

That’s weird, vcount should give 113. I must have done something wrong when uploading the file. Could you try again with the following file?

My apologies for the mess up.

Can you give a complete example, including every step necessary to reproduce the crash, instead of just a datafile? See the link I posted above. Without an example we are playing an unproductive guessing game.

It is the same code I listed in the original post. Here it is again, but with a line that takes the file from the google drive:

library(igraph)

subset <- read.csv("Downloads/subset.csv", row.names=1)
net <- graph_from_data_frame(subset, directed = FALSE) 
net <-  set_edge_attr(net, "weight", value= subset$kinship_coefficient)
summary(net) # to check if weight is correct + nr. of edges and vertices, should be 113 and 6328
groups <- largest_weighted_cliques(net) # This step results in a crash

Thanks for the clarity. I still cannot reproduce the crash using the following setup, or with a custom AddressSanitizer build. Also, the memory usage of R stays low.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Atlantic/Reykjavik
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.1

We need to figure out what the difference is between our setups. Can you please address these four points from my previous message?

Of course, here is the requested info:

  • Running the example in a fresh R session makes no difference unfortunately.
  • I run igraph 1.5.1 and R 4.2.1 (I generally prefer to not update R too often, as it always leads to issues).
  • R was downloaded from CRAN and I installed igraph in the way you specified.
  • Output of sessionInfo():
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] adegenet_2.1.10 ade4_1.7-22    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       later_1.3.1       pillar_1.9.0      compiler_4.2.1    plyr_1.8.8        tools_4.2.1       digest_0.6.33    
 [8] lifecycle_1.0.3   tibble_3.2.1      gtable_0.3.4      nlme_3.1-163      lattice_0.21-8    mgcv_1.9-0        pkgconfig_2.0.3  
[15] rlang_1.1.1       Matrix_1.5-4      igraph_1.5.1      DBI_1.1.3         shiny_1.7.5       cli_3.6.1         rstudioapi_0.15.0
[22] parallel_4.2.1    fastmap_1.1.1     cluster_2.1.4     colorblindr_0.1.0 dplyr_1.1.3       stringr_1.5.0     generics_0.1.3   
[29] vctrs_0.6.3       grid_4.2.1        cowplot_1.1.2     tidyselect_1.2.0  glue_1.6.2        ggnewscale_0.4.9  R6_2.5.1         
[36] fansi_1.0.4       sessioninfo_1.2.2 seqinr_4.2-30     ggplot2_3.4.3     reshape2_1.4.4    magrittr_2.0.3    splines_4.2.1    
[43] ellipsis_0.3.2    promises_1.2.1    htmltools_0.5.6   scales_1.2.1      MASS_7.3-60       permute_0.9-7     xtable_1.8-4     
[50] mime_0.12         colorspace_2.1-0  ape_5.7-1         httpuv_1.6.11     utf8_1.2.3        stringi_1.7.12    munsell_0.5.0    
[57] vegan_2.6-4

What seems weird to me is that, with the same number of edges and nodes but a different distribution of the network’s edges, I sometimes get a crash and sometimes do not. Based on my limited knowledge, it seems that a small subset of these versions of the network (which might be related to the density of a specific try-out) leads to a crash whilst the rest do not. For instance, this is a different version that in my case also leads to a crash. Would you be able to check whether it does so for you as well? If it doesn’t, then there must be something wrong with my set-up on multiple devices.

Here’s that dataset: subset_check.csv - Google Drive
It can be run with the same code as before:

library(igraph)
subset <- read.csv("Downloads/subset_check.csv", row.names=1)
g <- graph_from_data_frame(subset, directed = FALSE) 
g <- set_edge_attr(g, "weight", value = subset$kinship_coefficient)
summary(g) # to check if weight is correct + nr. of edges and vertices, should be 113 and 6328
groups <- largest_weighted_cliques(g) # This step results in a crash

I think at this point we need to give up. Thanks for posting all this information so far. I am quite confident that there is no issue with igraph. If there was one, AddressSanitizer would very likely catch it. Why it crashes for you is a mystery to me. Some comments below.

It’s probably a good idea to update to the latest R, which is 4.3.

You are running x86_64 binaries on an arm64 system. This is limiting the performance of your R. I recommend you install an R compiled for arm64.
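
A quick way to see which architecture an R build targets is to inspect `R.version`, as in the sketch below (the variable names are arbitrary). On an Apple silicon Mac, a value of `x86_64` would suggest an Intel build running under translation, while `aarch64` indicates a native arm64 build.

```r
# Architecture this R build was compiled for, e.g. "x86_64" or "aarch64".
r_arch <- R.version$arch
# Full platform string, the same one sessionInfo() prints after "Platform:".
r_platform <- R.version$platform

cat("R build architecture:", r_arch, "\n")
cat("R build platform:    ", r_platform, "\n")
```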

This should not matter, as largest_weighted_cliques() does not use edge weights. It would be good to know if the crash occurs when you omit weights.

This is typical with crashes, just like with the flu. You may carry the disease yet not have symptoms. The tool called AddressSanitizer is designed to help with this and force the symptom out. Yet we still can’t see the issue with it.

It might be interesting to run R in a terminal (not GUI, not RStudio) and see if there’s any terminal output at the point of the crash.

It might also be interesting to open your Console.app and see if there’s a crashlog for R. If yes, share it.

All that said, I don’t have high hopes anymore and I do not need you to continue investigating.

Ok, that makes sense.

Thanks a lot for all the help and suggestions!