I have a very large pairs list composed of many distinct communities/cliques, which I evaluate one by one after building a large graph. I would like to prepare for the eventuality of not being able to build that graph in R due to memory constraints.
A very simplified version of my workflow would look like:
    library(igraph)

    df <- data.frame("p1" = c("a", "a", "d", "d"),
                     "p2" = c("b", "c", "e", "f"),
                     "val" = c(0.5, 0.75, 0.25, 0.35))

    g <- graph_from_data_frame(d = df, directed = FALSE)

    sg <- groups(components(g))
    sg <- sapply(sg,
                 function(x) induced_subgraph(graph = g, vids = x),
                 USE.NAMES = FALSE,
                 simplify = FALSE)

    # do other stuff with the contents of sg
df is incredibly large, on the scale of hundreds of millions to tens of billions of rows. Is there a way for me to extract individual elements of sg without having to build g in its entirety? It's relatively easy for me to store representations of df outside of R, either as a compressed text file or as a SQLite database that can be programmatically queried. For reference, if g has fewer than 100 million edges (rows of df), it can generally be constructed in R, but as it approaches ~600 million rows, the OS will kill R during the construction of
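
To make the goal concrete, here is a minimal sketch of one possible direction, not a tested solution at this scale: compute component membership with a union-find (disjoint-set) pass over the edge list, reading it in chunks so that g is never built. Everything in it is an assumption made for illustration: the helper names find_root, union_chunk and component_labels are hypothetical, the edge list is assumed to be a tab-separated text file with a header row and the vertex names in its first two columns, and the set of vertices (though not the edges) is assumed to fit in memory.

```r
# Hypothetical helpers: union-find over vertex names stored in an environment.
# `parent` maps each vertex name to its parent's name; roots point to themselves.
find_root <- function(parent, v) {
  if (is.null(parent[[v]])) parent[[v]] <- v    # first time this vertex is seen
  root <- v
  while (parent[[root]] != root) root <- parent[[root]]
  while (parent[[v]] != root) {                 # path compression
    nxt <- parent[[v]]
    parent[[v]] <- root
    v <- nxt
  }
  root
}

# Merge the components of the endpoints of every edge in one chunk.
union_chunk <- function(parent, p1, p2) {
  for (i in seq_along(p1)) {
    r1 <- find_root(parent, p1[i])
    r2 <- find_root(parent, p2[i])
    if (r1 != r2) parent[[r2]] <- r1
  }
  invisible(parent)
}

# Stream the edge file chunk by chunk and return a named character vector:
# names are vertices, values are the root vertex labelling their component.
component_labels <- function(path, chunk_size = 1e6) {
  parent <- new.env(hash = TRUE)
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1)                         # skip the header row
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    fields <- strsplit(lines, "\t", fixed = TRUE)
    p1 <- vapply(fields, `[`, character(1), 1)
    p2 <- vapply(fields, `[`, character(1), 2)
    union_chunk(parent, p1, p2)
  }
  verts <- ls(parent, all.names = TRUE)
  vapply(verts, function(v) find_root(parent, v), character(1))
}

# Toy data from the workflow above, written to a temporary file and read
# back two rows at a time to imitate chunked processing.
df <- data.frame(p1 = c("a", "a", "d", "d"),
                 p2 = c("b", "c", "e", "f"),
                 val = c(0.5, 0.75, 0.25, 0.35))
path <- tempfile(fileext = ".tsv")
write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)

labels <- component_labels(path, chunk_size = 2)
members <- split(names(labels), labels)         # vertices grouped by component
```

With the membership in hand, the rows belonging to a single component could then be pulled from the external store (for example by querying the SQLite database for those vertex names) and passed to graph_from_data_frame one component at a time. A pure-R loop like this would likely be slow for billions of rows, so it is meant only to show the shape of the approach.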