Interoperability between python-igraph and other Python network packages

This thread is for continuing the discussion in:

about interoperability between igraph and networkx, and extending it to other Python graph packages starting with graph-tool (for which we already have a simple converter).

Fabio’s PR is also on the same topic:

Re-posting my “executive summary” so people can easily quite and respond to the various points:


I wrote a lot (= too much) in the above comment, here’s the executive summary:

Before arriving to a final decision, we must test our ideas on practical use cases. These must work well:

  • Work primarily in igraph, and use a networkx function to compute node-specific or edge-specific information on my graph. I need to be able to associate the result with the appropriate igraph node or edge indices.
  • Work primarily in networkx and use igraph functions to compute edge specific information. I need to be able to associate the result with (u,v,key) networkx multigraph edge specifications. The same goes for node-specific information.

Specific proposals I made (and a few I didn’t mention above):

  • to_networkx should have a create_using argument to choose which nx class it should use. All nx graph generators have this argument.
  • The default create_using should be “automatic” and behave just as it does right now, except use ordered graphs (see below).
  • Round tripping either as nx -> igraph -> nx or as igraph -> nx -> igraph should be reasonably lossless. Priority should be on igraph -> nx -> igraph as we first want to support users who work in igraph and reach out to nx.
  • Achieve nx -> igraph -> nx round-tripping by:
    • Preserve and restore nx vertex name in igraph vertex attribute
    • Preserve and restore nx multi-edge keys in igraph edge attribute
    • Preserve and restore original nx graph class in igraph graph attribute
  • Achieve igraph -> nx -> igraph round-tripping by:
    • Using ordered nx graph classes in to_networkx by default in order to be able to preserve the igraph edge index (edge ordering). We must investigate (1) if this has any downsides such as performance (2) if this fully solves the problem in practice.
  • The above two bullet points are not fully consistent with each other—any suggestion to reconcile them?

Specific TODO item:

  • Create small example programs or notebooks where we use an igraph/networkx graph with a networkx/igraph algorithms and bring the results back to the original system. What were the challenges?

I finally read the notes on the Ordered networkx classes more carefully:

It looks like these are not a good choice: they are not recommended, ordering is present in Python 3.6+ anyway, but ordering apparently cannot be fully controlled (??).

So we need to look for different ways to preserve the igraph edge index when transferring graphs over to networkx. Perhaps an attribute?

I guess that the Ordered... classes in NetworkX are relevant only if you are stuck on Python < 3.6 where the dictionary classes are not ordered. In Python 3.6, the internal data structure behind the dict class was redesigned and now it is guaranteed to have a consistent ordering, which is the order of insertion.

However, note the caveats:

  • The Python language spec does not mandate that the dictionary class is ordered, it only happens to be that the CPython implementation committed itself to having an ordered dictionary class (and even documented it). I’m not sure whether such guarantees exist for PyPy, which we support in a half-hearted way as I don’t really see much point in it. PyPy does not use reference counting (like CPython), but it simulates reference counts in order to be able to run CPython C extensions, and the simulation actually makes igraph much slower in PyPy than in CPython.

  • Even if the CPython dict is ordered, we would still be at the perils of NetworkX’s own internal implementation as we have to trust that NetworkX inserts the edges in the order we feed them from igraph.

So, yes, using an edge attribute is probably the way to go.

Which, according to the link I posted, is not guaranteed to be the case.

:+1:

Thanks guys for initiating this discussion. I’ve been waiting for this conversation to happen before working more on the transparent API, lest I dash ahead in a totally unsupported direction :wink:

Agreed with adding a single edge hidden attribute, tentatively called _nx_multiedge_key, when converting nx -> ig, and using it to restore the nx multiedge order on the way back. Of course, there are some small issues, e.g. if the user purposefully changes the order of edges in igraph and then goes back, what are we to do then?

This ties in the transaprent API more than it appears at first sight. If you are using networkx and all you need is the result of a computation in igraph,it would be expected that using the API is safer than doing the conversions manually. Because the converting infrastructure keeps a reference to both the original nx network as well as the igraph network throughout the igraph computation, it can deal with this bookkeeping more easily.

The point of friction of the transparent API is to limit and organise the kinds of arguments/returns that the API accepts as input/output:

  • single value (e.g. graph diameter, single vertex attribute): no conversion needed
  • single vertex: convert the igraph index to nx hashable
  • single edge: same, but with some care for multiedges
  • list of vertices: list of hashables
  • list of edges: list of whatever we do for a single edge
  • single attribute for a list of vertices: convert to a dictionary with vertex hashables as keys
  • single attribute for a list of edges: as for vertices, plus caveat for multiedges
  • several attributes for a single vertex: dictionary with attribute names as keys (I guess?)
  • several attributes for a single edge: ditto
  • several attributes for several vertices; we should decide what to do (dict of dicts? in what order?)
  • several attributes for several edges: ditto
  • whole graph (changes in place): as above, but modify the input nx graph directly
  • subgraph (e.g. thinning of some kind): this one is tricky to preserve order and attributes correctly

For everything except the last case, we don’t need to mess with internal vertex/edge attributes too much, so that simplifies things quite a bit I think.

What do you guys think?

Correction: There is no edge order in networkx, only edge keys (which are not necessarily integers, nor anything orderable, merely something hashable).

igraph has edge (and vertex) ordering, and correspondingly, edge (and vertex) indices.

networkx has no edge (or vertex) ordering that one can rely on. It refers to edges based on endpoints. Multi-edges are distinguished using edge keys (as based on their endpoints, they are indistiguishable).

I may be misunderstanding here, please correct me if I am. If we’re talking about preserving data when doing an nx → ig → nx conversion, then the edge order in igraph is irrelevant. _nx_edge_key will preserve the edge keys for networkx, but this is independent of edge ordering in igraph.


Re the transparent API: This can be a very useful thing that makes it easier to work with multiple network libraries. But it is also a very ambitious project, that requires a lot of more details to be worked out that a relatively lossless nx → ig → nx and ig → nx → ig conversion.

It would be useful to first find a way to handle conversions in a way that is perhaps not perfect, but will support most common use cases with reasonable performance. This is a much smaller project than a full transparent API.

But since it does naturally tie into the transparent API, and lessons learned from the transparent API may prompt us to modify the conversion functions later, it might be a good idea to mark the conversion functions as “experimental” for the moment.