Cost of least cost path using shortest_paths()

I am using the shortest_paths() function in the R version of igraph with a directed graph of points and distances. The calculation takes a while to run, which is fine, it’s a big job for my little laptop. This function returns the paths, but it does not return the costs.
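To illustrate with a toy graph (not my real data, just a sketch), the list that comes back contains the paths but nothing holding the summed weights:

library(igraph)
g <- make_ring(5)
E(g)$weight <- 1:5
res <- shortest_paths(g, from = 1, to = 3, output = "both")
# res$vpath and res$epath hold the paths themselves, but there is no
# element containing the accumulated edge weight of each path
str(res, max.level = 1)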

I expect the function has the cost of each path to hand at the point where it identifies “yes, this is the shortest path between these two nodes” so it is surprising that it does not return these values - perhaps there is a reason?

Is there a (documented or undocumented) way of obtaining that information at the same time?

I could find none so I calculated the costs after extracting the paths with this code fragment:

sum(igraph::E(cost_graph)$weight[epath])

This ran rather slowly considering it is only extracting and summing data. I found that using the vertices to sum edge costs from the adjacency matrix was often quicker.
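For reference, the adjacency-matrix version I mention is along these lines (a sketch rather than my exact code; adj is the weighted adjacency matrix and vpath one of the vertex paths):

adj <- igraph::as_adjacency_matrix(cost_graph, attr = "weight", sparse = TRUE)

path_cost_from_vertices <- function(vpath) {
  v <- as.integer(vpath)
  # sum the weights of the consecutive (from, to) pairs along the path
  sum(adj[cbind(v[-length(v)], v[-1])])
}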

Has anybody suggestions please?

And would the developers consider adding output of each path’s costs to the shortest_paths() function’s return values?

Do you mind sharing example code? As far as I understand your problem, I cannot reproduce any issue. See below:

library(igraph)
set.seed(100)
g <- sample_gnp(5000, 0.01, directed = TRUE)
E(g)$weight <- floor(runif(ecount(g), min = 1, max = 10))
path <- shortest_paths(g, 1, 100, output = "epath")
bench::mark(check = FALSE,
  shortest_paths(g, 1, 100, output = "epath"),
  sum(E(g)$weight[path$epath[[1]]]),
  sum(E(g)$weight[E(g)]),
  sum(E(g)$weight[1:ecount(g)])
)
#> # A tibble: 4 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 "shortest_paths(g, 1, 100, outpu… 886.5µs 964.9µs     1011.    1.91MB     54.2
#> 2 "sum(E(g)$weight[path$epath[[1]]…  46.7µs  89.6µs    11671.    1.91MB    759. 
#> 3 "sum(E(g)$weight[E(g)])"            694µs 839.3µs     1176.    4.78MB    144. 
#> 4 "sum(E(g)$weight[1:ecount(g)])"   677.7µs   816µs     1229.    4.78MB    144.

sum(E(g)$weight[E(g)]) calculates the sum of the weights of all edges (~250000 of them) in under 1 ms. I consider that quite fast, but maybe I am missing something.

Thanks David,

I appreciate your thoughtful reply. I too expected it to be fast, so perhaps I have messed up and turned it into an n-squared problem or something like that.

The relevant fragment of code, including the main call, is:

sp_result <- cost_graph |>
  igraph::shortest_paths(
    from = from_cell,
    to = to_cells,
    mode = "out",
    algorithm = "dijkstra",
    output = "both")
path_edges <- lapply(sp_result$epath, \(es) as.integer(es))

path_weight_from_edges <- function(epath) {
  sum(igraph::E(cost_graph)$weight[epath])
}
path_costs <- path_edges |>
  sapply(path_weight_from_edges)

So, path_edges is a list containing integer vectors of edge indices.

It seems to give the correct answers, but perhaps I am accidentally computing it multiple times?

Hmm, I don’t see anything obviously wrong with the code. Can you tell me how big cost_graph is (nodes and edges), and how many nodes are in to_cells?

For the example graph above, your code ran in under 1 ms on my machine.

Thanks for trying. I wondered if I had somehow made it calculate multiple times, or made some other R-related error.

For context, the cost graph is built from a raster for a geographic problem, so each vertex represents a cell in the raster and is typically connected to up to 8 of its neighbours (there are some longer distance connections, but they are unimportant for this issue).

The adjacency matrix for my current run - other runs are sometimes larger - is 582360 by 582360, but only 498937 rows are non-zero. This is because some parts of the raster are not included in the calculation but are retained in the matrix so the numbering (raster cell indices and vertex indices) lines up.

So the cost graph also has 582360 vertices, 83423 of which are not connected, and 3955134 edges, an average of just under 8 edges per vertex.

The path lengths are typically hundreds of edges, but can be longer.

My timings (not benchmarking, just elapsed times from runs) suggest that the cost calculation can take as long as, or longer than, the least cost path calculation. If I write my own code to calculate the cost by plugging the vertices into the adjacency matrix, that runs faster than the edge weight calculation.

The actual shortest path calculation is pleasingly fast. It’s a shame it doesn’t feed the path weight back automatically - I’m surprised that that hasn’t been requested before.

OK, your graph is really big, and now I finally understand the whole problem.
Regarding your question on why this is not implemented: while I have been using igraph for a long time, I only recently joined development on the R side, so I hope I represent this correctly. shortest_paths() is really only meant to give you the actual paths; obviously, in the unweighted case you get the length almost for free.

The function distances(), on the other hand, is meant to give you the length of the shortest paths (or the sum of weights). So your way is correct, but cumbersome. Here is a reproducible example with a big lattice graph that shows how much faster it is to use distances(), and that it still produces the same result. That being said, I think I also found something internally that we could optimize. Hope this helps, and feel free to ask follow-up questions!

library(igraph)
set.seed(100)
cost_graph <- make_lattice(c(700, 700))
E(cost_graph)$weight <- floor(runif(ecount(cost_graph), 2, 10))
vcount(cost_graph)
#> [1] 490000
ecount(cost_graph)
#> [1] 978600
from_cell <- 1
to_cells <- sample(seq_len(vcount(cost_graph)), 5000)
sp_result <- cost_graph |>
  igraph::shortest_paths(
    from = from_cell,
    to = to_cells,
    mode = "out",
    algorithm = "dijkstra",
    output = "both")
path_edges <- lapply(sp_result$epath, \(es) as.integer(es))

path_weight_from_edges <- function(epath) {
  sum(igraph::E(cost_graph)$weight[epath])
}
bench::mark(
  path_costs1 <- path_edges |> sapply(path_weight_from_edges),
  path_costs2 <- as.vector(distances(cost_graph,from_cell,to_cells))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 path_costs1 <- sapply(path_edg…    3.34s    3.34s     0.300    36.5GB     207.
#> 2 path_costs2 <- as.vector(dista… 143.92ms 146.36ms     6.26      7.7MB       0

all(path_costs1 == path_costs2)
#> [1] TRUE

Thanks again! If I had only wanted distances (costs, summed weights) I would certainly have used the distances() function! But I wanted both: the paths and the summed weights. It seems strange that using the distances() function - with all the searching involved - should be quicker (if it is) than summing the weights, which is a simple look-up task. Perhaps there is something in the way that sum(E(cost_graph)$weight[epath]) is executed by R that causes it?

Ah! Maybe I need to extract E(cost_graph)$weight first and then poke the array indices into it? I’ll give it a go. The cost of doing one path is not a problem; part of the issue is that I am calculating lots of paths on a large raster.
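Something like this, I think (untested; w is just a local copy of the weights, otherwise the same names as my snippet above):

w <- igraph::E(cost_graph)$weight   # pull the weight vector out once
path_costs <- path_edges |>
  sapply(\(epath) sum(w[epath]))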

Hurrah!! Extracting E(graph)$weight once, up front, made a huge difference. My code doesn’t have proper benchmarking in it, but it does sum elapsed times, and for two identical runs, each of 5000 paths, I had:

Run 1 with code as above (so extracting weights each time):
short paths calculation: 36.15 s
cost calculation: 88.48 s

Run 2 with weights extracted once:
short paths calculation: 41.62 s (should be same as run 1!)
cost calculation: 0.04 s (is that all?!?!?)

The variation in shortest paths times just shows that I was running the tests while also fiddling with my laptop, so the results are not super-reliable. As I said, not benchmarking.

Between trying to plug terra and igraph together, with sf also confusing me, and not being an R expert, I initially struggled to get anything to work at all, so optimisation was not a priority.

Thank you for looking into this - this definitely seems to be the solution. However, I think it would still be good and complete if shortest_paths() also returned a vector of summed weights (distances or whatever you wish to call them) in its return data structure. I glanced at the C code, but, like any high-performance package, it doesn’t reveal its secrets to a casual observer, so perhaps there is a reason why this is less trivial than I guessed.
