I have a very large pairs list composed of many unique communities/cliques, which I evaluate one by one after building a large graph. However, I would like to prepare for the eventuality of not being able to build that large graph in R due to memory constraints.
A very simplified version of my workflow would look like:
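library(igraph)

# Toy edge list: two components, {a, b, c} and {d, e, f}
df <- data.frame("p1" = c("a", "a", "d", "d"),
                 "p2" = c("b", "c", "e", "f"),
                 "val" = c(0.5, 0.75, 0.25, 0.35))

# Build the full graph (this is the step that runs out of memory at scale)
g <- graph_from_data_frame(d = df,
                           directed = FALSE)

# Vertex names grouped by connected component
sg <- groups(components(g))

# One induced subgraph per component
sg <- sapply(sg,
             function(x) induced_subgraph(graph = g,
                                          vids = x),
             USE.NAMES = FALSE,
             simplify = FALSE)
# do other stuff with the contents of sg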
If df is incredibly large, on the scale of hundreds of millions to tens of billions of rows, is there a way for me to extract individual positions of sg without having to build g in its entirety? It's relatively easy for me to store representations of df outside of R, either as a compressed txt file or as a SQLite database that can be programmatically queried. For reference, if g has fewer than 100 million edges (rows of df), it can generally be constructed in R, but as it approaches ~600 million, the OS will kill R during the construction of g.
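To make the goal concrete, here is a rough sketch of the kind of thing I have in mind: label components with a union-find structure while reading the edge list in chunks, then pull out the rows for one component at a time. This is only an illustration under assumptions, not a tested solution at scale. make_union_find, unite, labels and chunk_size are hypothetical names, the in-memory df stands in for chunks read from the txt file or database, and the node table still has to fit in memory even though the edges do not.

```r
# Hypothetical helper: a minimal union-find keyed by node name.
# An environment serves as a mutable hash map from node to parent.
make_union_find <- function() {
  parent <- new.env(hash = TRUE)

  find <- function(x) {
    if (is.null(parent[[x]])) assign(x, x, envir = parent)
    root <- x
    while (parent[[root]] != root) root <- parent[[root]]
    # Path compression: point every node on the path directly at the root.
    while (parent[[x]] != root) {
      nxt <- parent[[x]]
      assign(x, root, envir = parent)
      x <- nxt
    }
    root
  }

  unite <- function(a, b) {
    ra <- find(a)
    rb <- find(b)
    if (ra != rb) assign(ra, rb, envir = parent)
    invisible(NULL)
  }

  # Named character vector: node name -> representative of its component.
  labels <- function() {
    nodes <- ls(parent, all.names = TRUE)
    vapply(nodes, find, character(1))
  }

  list(unite = unite, labels = labels)
}

# Toy stand-in for the on-disk edge list.
df <- data.frame(p1 = c("a", "a", "d", "d"),
                 p2 = c("b", "c", "e", "f"),
                 val = c(0.5, 0.75, 0.25, 0.35),
                 stringsAsFactors = FALSE)

# Pass 1: stream the edges in chunks and merge endpoints.
uf <- make_union_find()
chunk_size <- 2L  # assumption: would be much larger in practice
for (start in seq(1L, nrow(df), by = chunk_size)) {
  chunk <- df[start:min(start + chunk_size - 1L, nrow(df)), ]
  for (i in seq_len(nrow(chunk))) uf$unite(chunk$p1[i], chunk$p2[i])
}
membership <- uf$labels()

# Pass 2: collect the edges of each component separately.
# Both endpoints of an edge share a component, so filtering on p1 suffices.
node_sets <- split(names(membership), membership)
edges_by_component <- lapply(node_sets, function(nodes) df[df$p1 %in% nodes, ])
```

Each element of edges_by_component could then be passed to graph_from_data_frame(directed = FALSE) on its own, so only one component's graph exists in memory at a time. With the SQLite representation, I assume pass 2 would become a query per component rather than a data frame filter.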