I have a directed graph with 55000 nodes and 100000 of edges. I have to compute all the shortest path among the nodes (3.025 billion paths). Then, I have to get and manipulate each path attribute. While doing this in R in my Windows PC, deducing from small sample, it may take me 180 days to compete the computation. A 2011 internet conversation mentioned that igraph in C is 1000 times faster than igraph in R . Will the same benefit happens to me if I switch to C?
It is hard to tell if your problem would benefit from programming in C. Can you show a small code example that illustrates what you are doing?
igraphâs shortest path finding is implemented in C, and it is fast (IMO no other library will give a significant speedup). R/igraph uses this C implementation. However, your R code may be performing other operations which may be responsible for the performance issue. In short, we need to see the code.
Finally, it is good to note that storing all shortest path lengths for a graph on 55000 vertices will take ~22.5 GB of memory. Should the code ever trigger copying this matrix, that would double the memory requirements. So itâs hopeless without at least 64 GB of memory in your computer. It wasnât clear to me if you compute path length or paths. It sounded like the latter, which takes even more memory. In such cases, one would want to avoid storing all these paths. It is better to process them incrementally, as you find them, then discard each one as soon as itâs processed. To do this efficiently, programming in C would indeed be necessary.
Here g2, is a directed graph, with complete edges attributes: time and distance
g2
IGRAPH 7bb4b20 DN-- 55016 84912 --
+ attr: name (v/c), geometry (e/x), distance (e/n),
| time (e/n)
+ edges from 7bb4b20 (vertex names):
[1] 29210x10505->29285x10443 47992x34505->48095x34529
[3] 32419x14523->31723x13736 34564x15333->35084x15493
[5] 32703x15115->33106x15753 31788x16008->31866x16097
[7] 29798x10045->29494x9483 30768x8040 ->30368x8599
[9] 29443x9341 ->29259x9059 33499x6314 ->33436x6380
[11] 34128x5187 ->34008x5231 32836x5065 ->32885x5086
[13] 32195x4616 ->32386x4846 33467x4328 ->33543x4453
+ ... omitted several edges>
I would like to create a distance table (in meter), but based on the shortest travelling time. So igraph::distance_table wonât work, as it will output the shortest time. One way, I can think of is by computing all the shortest path between all nodes with time as weight, and then summing up the edges distance.
However, a sample of 10000 shortest path:
system.time({path â shortest_paths(g2,V(g2)[1],V(g2)[1:10000],weights=E(g2)$time,output=âepathâ)})
user system elapsed
461.52 36.70 507.63
It will take me around 507.63* 5.5 * 55000s = 1777 days for just calculating 3 billion shortest paths.
system.time({distance1 â sapply(path$epath,function(x) sum(get.edge.attribute(g2,âdistanceâ,x)))})
user system elapsed
499.38 0.37 512.07
And another 1777 days to summing the distance for all paths. Iâm planning to do for loops and write it in HDD for each iteration. Besides, we have 1TB workstation. So memory not a problem. However, 10 years of computation wonât do it for me. Really appreciate your responses.
I think the overhead of the R interface should be minimal. However, perhaps you can simply try it in C and time the results for your graph?
You can get the shortest paths using igraph_get_shortest_paths_dijkstra
. This function returns all shortest paths from one single node to all other nodes (or a selection thereof). You can easily read graphs from a variety of input also in C.
The C program would then look something like this:
#include <igraph.h>
#include <time.h>
int main() {
igraph_t graph;
int i, n;
clock_t t;
double duration;
n = 100;
FILE *infile;
infile = fopen("FILENAME", "r");
igraph_read_graph_edgelist(&g, infile, 0, 1);
fclose(infile);
igraph_vector_ptr_t edges;
/* Initialize pointer vector for storing edge paths */
igraph_vector_ptr_init(&edges, n);
for (i = 0; i < igraph_vector_ptr_size(&edges); i++) {
VECTOR(edges)[i] = calloc(1, sizeof(igraph_vector_t));
igraph_vector_init(VECTOR(edges)[i], 0);
}
t = clock();
igraph_get_shortest_paths_dijkstra(&graph, /*vertices=*/ NULL,
/*edges=*/ &edges, /*from=*/ 0, /*to=*/ igraph_vss_all(),
/*weights=*/ NULL, /*mode=*/ IGRAPH_OUT,
/*predecessors=*/ NULL,
/*inbound_edges=*/ NULL);
duration = ((double)clock() - t)/CLOCKS_PER_SEC * 1000;
printf("Took %f ms to get shortest paths.\n", duration);
/* Free edge vector */
for (i = 0; i < igraph_vector_ptr_size(&edges); i++) {
igraph_vector_destroy(VECTOR(edges)[i]);
free(VECTOR(edges)[i]);
}
igraph_vector_ptr_destroy(&edges);
igraph_destroy(&graph);
}
Iâm not sure what format your graph is, so this will have to be adjusted of course. It now also doesnât use any weights.