I’m using igraph in R to find the largest clique (fully connected subgraph) with the lowest mean edge weight. To do this, I’ve written the following code for a network of 113 nodes and 6156 weighted edges, with weights between 0.11 and 0.25:
library(igraph)
# EDIT: example dataset
vertices <- paste0("id_", seq(from = 1, to = 113))
subset <- as.data.frame(t(combn(vertices, 2)))
names(subset) <- c("ID1", "ID2")
subset$kinship_coefficient <- runif(nrow(subset), min = 0.1, max = 0.25) # one weight per edge
# create network
net <- graph_from_data_frame(subset, directed = FALSE) #subset is edgelist df
net <- set_edge_attr(net, "weight", value= subset$kinship_coefficient)
summary(net) # to check if weight is correct + nr. of edges and vertices
# find largest weighted cliques
groups <- largest_weighted_cliques(net)
# look for mean edge weights per clique
mean_edge_weights <- sapply(groups, function(clique) {
  subgraph <- induced_subgraph(net, clique)
  mean(E(subgraph)$weight)
})
# Find the index of the clique with the lowest mean edge weight
min_mean_weight_index <- which.min(mean_edge_weights)
# Get the clique with the lowest mean edge weight
min_mean_weight_clique <- groups[[min_mean_weight_index]]
However, as soon as I run ‘largest_weighted_cliques’ on this network, R crashes. This also happens on my VM on my server. When I run it on a subset of the network, e.g. with 3500 edges, it does not crash. I do need the whole network with all edges to get the result I am looking for.
Does anyone have an idea why it might be crashing? It should not be a memory issue, since the memory limit returns ‘Inf’ (but who knows, maybe it is…).
Note that this function works with vertex weights and not edge weights. Furthermore, only integer weights are supported, and the function will give a warning if you pass in non-integral values.
That said, the crash should be fixed.
P.S. I see that the docs state,
The weight of a clique is the sum of the weights of its edges.
This is an error, the weight of a clique is the sum of the weights of its vertices. I just updated the docs to correct this.
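To illustrate the two points above (vertex weights must be integers, and a clique’s weight is the sum of its vertex weights), here is a minimal sketch on a small random graph. The graph size, edge probability, seed and weight range are arbitrary illustrative choices, not taken from the thread. The last step cross-checks the result against a brute-force maximum over all maximal cliques, which should agree as long as all weights are positive.

```r
library(igraph)

set.seed(1)
# Small random graph; size and edge probability are arbitrary illustration values
g <- sample_gnp(30, 0.4)

# Integer vertex weights (non-integer values would trigger a warning)
w <- sample(1:10, vcount(g), replace = TRUE)

# Cliques with the largest total vertex weight
heaviest <- largest_weighted_cliques(g, vertex.weights = w)

# Weight of a clique = sum of the weights of its vertices
clique_weight <- function(cl) sum(w[as.integer(cl)])
sapply(heaviest, clique_weight)

# With positive weights, a heaviest clique is always maximal, so the maximum
# over all maximal cliques (brute force) should match the weight above
brute_max <- max(sapply(max_cliques(g), clique_weight))
brute_max
```

The brute-force check relies on positivity: if a heaviest clique were not maximal, adding one more vertex would increase its weight, which is a contradiction.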
EDIT: setting the vertex weights to ‘NULL’ but increasing the network to my desired size caused a crash again (apologies for the previous edit; I thought I had generated a network of the right size, but it was smaller than I needed).
Thanks a lot for the reply. Here is the information you asked for:
Igraph version: 1.4.1
R version: 4.2.1 (2022-06-23)
System details: macOS Monterey, Apple M1 Pro chip, 8-core CPU, 16 GB (on-board), 512 GB SSD, 14-core GPU.
In my original question I have provided an example dataset, which still reproduces the issue on my laptop.
Regarding your second comment: as I stated, the function works fine when the network contains 2000 fewer edges than my dataset, so I don’t think the issue is caused by the function working with vertex weights (I only use the edge weights after calling the function).
I agree, the bug is not related to this, and we would like to investigate it.
Where can we access the dataset? Did you perhaps forget to attach it or link to it?
Although in this case it is unlikely to make a difference, as a general practice, always test with the latest package version, in this case igraph 1.5.1.
Here I attached the example dataset; please note that the first column (row names) should be dropped when reading it.
(I shared it via Google Drive since, as a new user, I cannot yet attach files here.)
Did you see a crash with the code using the Google Drive file? It has slightly different (non-random) edge weights. I did not get a crash with the random edge weights generated with runif, but did with the actual ones.
I still get the crash with igraph 1.5.1 (without any error messages or memory limit warnings)
Yes, this is the exact code that produces a crash, both on my laptop and on my server. The ‘largest_weighted_cliques’ call alone causes the crash, and ‘maximal_cliques’ and ‘largest_cliques’ crash too, at least with the dataset I provided via Google Drive, probably because it is a very densely connected network.
If you are able to run it without an error, then I suppose my computer is the issue. Would you expect it to be a memory allocation issue in this case?
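How dense the network is can be checked directly. An undirected graph without loops or multi-edges on n vertices has at most n(n − 1)/2 edges; for n = 113 that is 113 × 112 / 2 = 6328, the same edge count quoted in this thread. The sketch below is an illustrative diagnostic, not code from the thread: it builds a stand-in edge list the same way as the example dataset (every pair of 113 IDs) and reports density, loops and duplicate edges. To check the real data, replace `edges` with the data frame read from the CSV file.

```r
library(igraph)

# Stand-in edge list: every pair of 113 vertices, as in the example dataset
vertices <- paste0("id_", 1:113)
edges <- as.data.frame(t(combn(vertices, 2)))
names(edges) <- c("ID1", "ID2")

g <- graph_from_data_frame(edges, directed = FALSE)

# Maximum number of edges in a simple undirected graph on n vertices
n <- vcount(g)
max_edges <- n * (n - 1) / 2

cat("vertices:", n, " edges:", ecount(g), " possible:", max_edges, "\n")
cat("density:", edge_density(g), "\n")   # 1 means the graph is complete
cat("simple graph:", is_simple(g), "\n") # FALSE if loops or multi-edges exist
cat("multi-edges:", any_multiple(g), " loops:", any_loop(g), "\n")
```

If the real data reports a density of 1 and no multi-edges, the graph is complete and its only largest clique is the full vertex set; if it reports multi-edges, the 6328 edges do not all connect distinct vertex pairs.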
If you see a crash, then there’s a problem which we would like to fix. We need a minimal reproducible example for this. Can you double check the steps you are using and create such an example using the guidelines here?
If you can come up with something that does not depend on external data, that would be best, but use a data file if needed.
If you use random numbers, set a seed first (with set.seed()).
Remove any code that is not necessary for reproducing the issue. I expect edge weights are irrelevant here.
Run the example in a fresh R session, i.e. immediately after (re)starting R and without restoring your workspace.
Do use the latest version of igraph (1.5.1) and if possible, use the latest R (4.3.1).
Show us the output of sessionInfo()
I am assuming you downloaded R from CRAN (i.e. https://cran.r-project.org/) and installed igraph using the standard install.packages('igraph') command. If this is not the case, let us know how you obtained R and igraph, as this may be relevant. In particular, if you got R through Anaconda or Homebrew, let us know.
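Following these guidelines, a self-contained reproducible example could take roughly the shape sketched below: no external data, a fixed seed, no edge or vertex weights, and sessionInfo() at the end. The random graph model and its parameters (113 vertices, edge probability 0.5) are placeholders chosen for illustration, not the poster’s data; they would need adjusting until the crash appears.

```r
# Run in a fresh session, e.g. `Rscript repro.R`, without restoring a workspace
library(igraph)

set.seed(123) # fixed seed so the random graph is reproducible

# Placeholder graph: 113 vertices, edge probability 0.5 (adjust as needed).
# No weights are set, since the clique search does not use edge weights.
g <- sample_gnp(113, 0.5)
summary(g)

# The call under investigation
cliques <- largest_weighted_cliques(g)
cat("number of largest cliques:", length(cliques), "\n")
cat("clique size:", length(cliques[[1]]), "\n")

# Version and platform details requested above
print(packageVersion("igraph"))
print(sessionInfo())
```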
Can you give a complete example, including every step necessary to reproduce the crash, instead of just a datafile? See the link I posted above. Without an example we are playing an unproductive guessing game.
It is the same code I listed in the original post. Here it is again, with a line that reads the file from Google Drive:
library(igraph)
subset <- read.csv("Downloads/subset.csv", row.names=1)
net <- graph_from_data_frame(subset, directed = FALSE)
net <- set_edge_attr(net, "weight", value= subset$kinship_coefficient)
summary(net) # to check if weight is correct + nr. of edges and vertices, should be 113 and 6328
groups <- largest_weighted_cliques(net) # This step results in a crash
Thanks for the clarity. I still cannot reproduce the crash using the following setup, or with a custom AddressSanitizer build. Also, the memory usage of R stays low.
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Atlantic/Reykjavik
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.1
We need to figure out what the difference is between our setups. Can you please address these four points from my previous message?
What is strange to me is that with a different distribution of the network’s edges, but the same number of edges and nodes, I sometimes get a crash and sometimes do not. Based on my limited knowledge, it seems that only a small subset of these versions of the network (perhaps related to the density of a particular try-out) leads to a crash, while the rest do not. For instance, here is a different version that in my case also leads to a crash. Would you be able to check whether it crashes for you as well? If it doesn’t, then there must be something wrong with my setup on multiple devices.
library(igraph)
subset <- read.csv("Downloads/subset_check.csv", row.names=1)
g <- graph_from_data_frame(subset, directed = FALSE)
g <- set_edge_attr(g, "weight", value= subset$kinship_coefficient)
summary(g) # to check if weight is correct + nr. of edges and vertices, should be 113 and 6328
groups <- largest_weighted_cliques(g) # This step results in a crash
I think at this point we need to give up. Thanks for posting all this information so far. I am quite confident that there is no issue with igraph. If there was one, AddressSanitizer would very likely catch it. Why it crashes for you is a mystery to me. Some comments below.
It’s probably a good idea to update to the latest R, which is 4.3.
You are running x86_64 binaries on an arm64 system. This is limiting the performance of your R. I recommend you install an R compiled for arm64.
The edge weights should not matter, as largest_weighted_cliques() does not use them. It would be good to know whether the crash still occurs when you omit the weights.
It is typical for a crash to appear with some inputs but not others, just like with the flu: you may carry the disease yet not have symptoms. The tool called AddressSanitizer is designed to help with this and force the symptom out. Yet we still can’t see the issue with it.
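One quick way to confirm that edge weights play no role is to run the clique search on the same graph with and without an edge weight attribute and compare the results. This is a sketch on a small random graph; the size, edge probability, seed and uniform weights in [0.1, 0.25] are illustrative choices loosely mirroring the example dataset, not the poster’s data. The helper `canon` is a hypothetical name introduced here to put a list of cliques into a comparable form.

```r
library(igraph)

set.seed(7)
g_plain <- sample_gnp(40, 0.5)

# Same topology, plus a numeric edge weight attribute
g_weighted <- set_edge_attr(g_plain, "weight",
                            value = runif(ecount(g_plain), min = 0.1, max = 0.25))

# Hypothetical helper: turn each clique into a string of sorted vertex IDs,
# then sort the strings so two clique lists can be compared with identical()
canon <- function(cl) sort(sapply(cl, function(v) paste(sort(as.integer(v)), collapse = "-")))

# Edge weights should not change the result
same_result <- identical(canon(largest_weighted_cliques(g_plain)),
                         canon(largest_weighted_cliques(g_weighted)))

# Without vertex weights, every vertex counts as 1, so the result should be
# the same set of cliques as largest_cliques()
same_as_unweighted <- identical(canon(largest_weighted_cliques(g_plain)),
                                canon(largest_cliques(g_plain)))
same_result
same_as_unweighted
```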
It might be interesting to run R in a terminal (not GUI, not RStudio) and see if there’s any terminal output at the point of the crash.
It might also be interesting to open your Console.app and see if there’s a crashlog for R. If yes, share it.
All that said, I don’t have high hopes anymore and I do not need you to continue investigating.