Suggestions for implementing parallel runs of get_shortest_path in igraph Python

Hi!

I’m using igraph Python to calculate shortest paths between a large set of nodes. The graph is around 2.2 mil. nodes and 4 mil. edges.

I need to calculate for more than 100.000 sets of start and end nodes, the k shortest paths. Igraph has proved to be relatively fast to do this, but I’m looking to further speed up the calculations.

I’m rather new to parallelization, but to me this problem seems to be embarrasingly parallel (as for each set of start, end node is an independent calculation). However, experimenting with threading (joblib, dask, processing) did not yield speedup.

I think there might be two reasons:

  • The memory should be shared among the workers, but I did not manage to do that.
  • The underlying C code of Igraph makes it impossible to run in separate threads by Python.

Has anyone experience with doing this kind of parallelization? Or does anyone has advice how to approach this?

Any help is much appreciated. Thanks! :slight_smile:

Unfortunately the core C library of igraph is not guaranteed to be thread-safe so we cannot release the Python Global Interpreter Lock while an igraph function is running. This is the biggest obstacle to parallelization. You should probably use multiple processes instead of multiple threads within the same process. Check the multiprocessing library in Python. The downside is that you will have to read the same graph into memory for each process separately.