Does igraph accept JSON format, and why are nodes represented as numbers?

Hello, I am very new to igraph and have a few questions.

Does igraph accept a dictionary of lists as input? For example, networkx can convert from a dictionary of lists. I have a large JSON file with 24,000 nodes and 52,000 edges, in which the value for each key is a list. An example of a key in my dictionary is shown below. Does igraph accept formats like this, or will I need to rewrite my entire dictionary?

Example:

"tensor_util": [
    "tf_export.tf_export",
    "tensor_util._generate_isinstance_check",
    "<builtin>.frozenset"
  ],

Update:
I found out that igraph has from_networkx, so I tried to use that, but now my nodes are represented by numbers instead of text. Is there a way to search for a node by its text, since all my nodes are now numbers?

Hello!

python-igraph does not yet have a direct constructor for the dictionary-of-lists representation, but it’s an easy one-liner to convert it into a list-of-pairs representation, which you can then pass to Graph.TupleList():

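from igraph import Graph

# Adjacency dictionary: each key is a source vertex, each value lists its targets
adjlist = {"a": ["b", "c", "d"], "b": ["d"], "e": ["a"]}
# Flatten the dictionary into a single list of (source, target) pairs
pairs = sum(([(source, target) for target in targets] for source, targets in adjlist.items()), [])
g = Graph.TupleList(pairs)
print(g.vs["name"])
print(g.get_edgelist())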
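
For a file the size mentioned in the question, the flattening step can also be written as a single flat comprehension. This is only a sketch: the JSON text below is a made-up sample standing in for the real file (the second key is hypothetical), and the comprehension is an alternative to the sum(..., []) one-liner, which re-copies the accumulated list for every key.

```python
import json

# Hypothetical sample standing in for the contents of the large JSON file.
raw = """
{
  "tensor_util": [
    "tf_export.tf_export",
    "tensor_util._generate_isinstance_check",
    "<builtin>.frozenset"
  ],
  "tf_export.tf_export": ["<builtin>.frozenset"]
}
"""

# For a real file, json.load() on an open file handle would be used instead.
adjlist = json.loads(raw)

# One pass over the dictionary: one (source, target) pair per list entry.
pairs = [
    (source, target)
    for source, targets in adjlist.items()
    for target in targets
]

print(len(pairs))
print(pairs[0])
```

The resulting pairs list can be passed to Graph.TupleList(pairs) exactly as above. If the direction of the edges matters in your data (an assumption about the file), Graph.TupleList(pairs, directed=True) keeps it.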

now my nodes are represented by numbers instead of text. Is there a way to search for a text since all my nodes are now numbers?

Vertices in an igraph graph always have numeric identifiers from 0 to |V|-1; this is because igraph is rooted in C where it is much easier to work with contiguous ranges of numbers than arbitrary strings. Typically, the string names of the vertices are stored in the name vertex attribute of the graph, which you can retrieve with:

g.vs["name"]

This gives you a list where the i-th element is the name of the i-th node. You can turn this into a lookup table if you need the reverse mapping:

>>> name_to_index = {v: k for k, v in enumerate(g.vs["name"])}
>>> print(name_to_index)
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

However, in 99% of cases you won’t need the mapping, because any python-igraph function that needs a vertex ID can also accept a name instead. It will automatically look up the ID of the vertex behind the scenes (and it caches a mapping similar to the one I constructed manually above). E.g., in the example graph from my first code snippet, you can get the degree of node c as follows:

>>> g.degree("c")
1

This is the same as referring to the node by its index:

>>> g.degree("c") == g.degree(2)
True

Just made a PR for this: Construct graph from adjacency dict of sequences (e.g. of lists) by iosonofabio · Pull Request #434 · igraph/python-igraph · GitHub

Hi, I also have the same questions. I can understand that:

Vertices in an igraph graph always have numeric identifiers from 0 to |V|-1; this is because igraph is rooted in C where it is much easier to work with contiguous ranges of numbers than arbitrary strings.

However, I do not think it is a good decision to ask users to find the numeric id before getting the handle of a vertex, although it is true that in “99%” of cases we can use the name (a string) of a vertex instead of its numeric id. The reasons are as follows:

  1. The numeric id will change if some vertices are removed from the graph, so users cannot save numeric ids for later use if the graph may change in the future. So far, I have found that the numeric id is only useful when calling functions, which gives users very little information.

  2. There is no need to let users know the internal numeric ids of vertices. They only care about the “name” and other attributes of each vertex. It is enough to let users use the “name” to get the handle of each vertex.

Users can get the handle of a vertex from its “name” attribute by using vertex = g.vs.find(name). However, it is one more step than vertex = g.vs[numeric_id].

Are there special considerations behind the current design of python-igraph?

Yes, there are. This is not an arbitrary and meaningless “numeric id”. It is the position of the vertex in the vertex list. Almost all igraph functions that compute some property associated with vertices (such as vertex betweenness) return a vector ordered identically to the vertex list.

Also, remember that igraph is a C library that has interfaces to multiple high-level languages, including Python, R, and Mathematica. It is not designed with only Python in mind. igraph manages to be much faster than pure-Python libraries such as networkx precisely because it is written in C. While it would in principle be possible to return these results in a different format (e.g. a dictionary mapping from vertex “names” to values), this would be a source of inefficiency. In my mind, enforcing an inefficiency is not acceptable for igraph. Providing slow-but-convenient solutions is useful, but providing only these and denying users access to the fast internals is not okay.


Just my 2 cents, hoping they’re useful.

igraph’s user base is quite broad, from beginners to decent coders. Beginners tend to not care much about speed, but more advanced users sometimes benefit from knowing a little bit about the internals of the library to optimise properly.

Internally, igraph uses ordered lists of vertices and does not require vertices to have any attribute, even a name. If you know that, you can exploit it to optimise your algorithms, so we try to let the (advanced) users know about it.

That said, there’s no denying that a dict-like approach to graphs, such as the one networkx takes, is convenient in many cases. I can imagine igraph’s Python interface slowly drifting towards more common use of name-based patterns (functions, data structures, etc.) for simplicity’s sake, and users’ opinions and examples influence how likely that is to happen. But internally, vertices will stay ordered with an integer id for a while, I think, because of the C constraints Szabolcs just mentioned.
