Hi,
I am having an issue using all_simple_path function (I use a PC Windows 11, Rstudio 2023.03.0+386, igraph v.1.4.2).
I am working with not so big networks (<100 edges) but the calculation of all simple paths of all edges can take up to 10 hours.
Actually, I don’t really need to compute all simple paths and I would like to divide the calculation and get all simple paths of a specific length, e.g. 3 vertices only, then 4 vertices only, then 5 vertices only, etc.
However, the cutoff argument of the function does not allow this since its value represents the maximum length of the alternative paths. So if I want to get alternative paths of e.g. 5 vertices, all_simple_path will also compute alternative paths of 3 vertices and 4 vertices, and this adds time of calculation.
Is there a way to fix this issue? Did I miss something in igraph?
Thank you for your help,
Pauline
ps: the reason why I want to break up the computation of the simple paths would be a bit long to explain so I hope I was clear enough!
At the moment, igraph does not provide a feature to get only paths of a given length. However, it would be useful for us to understand your use case a bit better, to see how we can improve this function in the future.
What are you saying here about performance is not quite accurate. This function basically enumerates all paths rooted in a source (to all possible targets), and generates paths of length n+1 as extensions of paths of length n. Thus, generating only paths with a length of exactly 5 is not going to be faster than generating paths of length up to 5.
However, it will take more memory, and filtering out paths you don’t want may be slower in R than it would be in C.
The long-term solution we are planning for issues of this type, i.e. generating a potentially very large number of objects, is to use iterators. In other words, paths would be generated one by one, giving you the chance to process each path right then without storing it. This is a long-term project though and won’t come anytime soon. It’s also good to note that due to how path enumeration works, it’s not possible to generate paths in order of increasing lengths, so length 1 first, then length 2, length 3, etc. They are going to appear in arbitrary order.
In the shorter term, two alternative solutions could be implemented, if real use cases support them, and if there is enough demand.
- Instead of storing all paths, invoke a callback function for each path. This allows you to process paths one by one, potentially terminating the search at each step. But unlike the iterator solution, once the search is terminated, it cannot be resumed again.
- Filter paths by both minimum and maximum length in the C code, which is more memory efficient, and potentially (but not certainly) faster than doing the filtering in R.
But to be clear, with any of these solutions, generating up to length-n or only length-n will take the same amount of time. It’s only the filtering of results that can be done for you, but this is likely only bringing memory use benefits, not computation time benefits.
Thank you for answering so fast!
I understand now that what I am looking for will not help regarding time of calculation.
I won’t explain what I am doing in deep because I will probably contact you in the next weeks when the whole script will be finalised.
But to be brief, I have a complex network in which each edge holds a bootstrap support. I remove edges that has a weak bootstrap support i.e. there is an alternative path which the product of the bootstrap supports of its edges is higher than the bootstrap support of the original edge.
That is why I use all_simple_paths.
The potential alternative path that bears a higher bootstrap support than the original edge might be a short path with max. 5-6 vertices. So to save time, I made a loop for (i in 2:n) with n = maximum number of edges in an alternative path. Thank to this loop I compare the original edge to groups of alternative paths separately : first paths with 3 vertices, then paths with 3+4 vertices, then 3+4+5 vertices, etc. So I don’t need to compute all possible alternative path as the process stops as soon as an alternative path with a higher bootstrap support is found.
Sorry if I am not clear, but I will surely contact you again about this script.
Thank you again,
Pauline