I have a 60-node graph with one connected component, in which there is a path between every pair of nodes. I am trying to enumerate all simple paths between nodes. I expect the longest simple path to be of length 60 after iterating through all nodes and using get_all_simple_paths with a cutoff=-1, but instead the longest simple path is only of length 27. Might I be doing something wrong here / thinking about this incorrectly?
To reproduce
from tqdm import tqdm
from igraph import Graph
import itertools
G = Graph.Load('graph60.gml', format='gml')
sp_dicts = {}
sp_list = []
end_num = 60
for i, j in tqdm(itertools.combinations(range(end_num), 2)):
simple_path_lists = G.get_all_simple_paths(i, to=j, mode="OUT", cutoff=-1)
sp_dicts[f'{i}_{j}'] = [simple_path for simple_path in simple_path_lists]
for path_list in simple_path_lists:
sp_list.append(sorted(path_list))
# Convert lists to tuples and then to a set to get unique tuples
unique_tuples = set(tuple(lst) for lst in sp_list)
# Convert unique tuples back to lists
unique_lists = [sorted(list(tpl)) for tpl in unique_tuples]
n_mer_list = sorted(unique_lists, key=lambda x: (len(x), x))
print(f'length of longest path: {len(n_mer_list[-1])}')
Version information
‘0.11.3’ of python-igraph from pip
Thanks for the responses – it helped me realize that I made a mistake when generating the graph. The corrected graph should be undirected. I believe that a path should exist that visits every vertex since the graph is undirected and since it has only one connected component.
As a follow-up: the docs mention that get_all_simple_paths may run out of memory for exponentially many paths between two vertices. I noticed that on my newly generated, undirected graph, this same calculation appears to be suffering from this limitation. Do you have any suggestions to speed up this calculation of simple paths between nodes?
I have tried reducing the cutoff parameter of get_all_simple_paths to half of the node length, because I can intuit the paths that I need from only this truncation, but the calculation is still very slow. I suspect that part of the slowdown comes from creating the new, undirected graph with the full adjacency matrix, whereas the other (accidentally) directed graph was created with only the upper triangular adjacency matrix and was definitely sparser. Any suggestions are welcomed.
This function is already very efficient, and it is unlikely that it could be sped up significantly.
There is a feature request to add a version which does not store results, but instead calls a function with each path that is found. This won’t speed up the calculation, in fact it will slow it down. But it will eliminate the need to store all paths at the same time. You can instead process them one by one as they are found. If this will help your use case, add your vote here:
If you know C, a pull request is very welcome. This is not a difficult task.
You can add your vote to the following two feature requests if they help your use case. These would make it possible to filter paths by length without needing a (slow) callback.