most efficient way to refer to a vertex in a graph

As stated in the tutorial, each vertex has a unique number as its id, starting from 0 continuously to N-1, where N is the number of vertices in the graph.

However, the id for a vertex p may change if one vertex prior to it was deleted from the graph. So what is the most efficient way to refer to p after the delete operation?

Currently, I can only define a unique “name” attribute for each vertex beforeahead, and then use seq = g.vs(name_eq = unique_name_of_p) and seq[0] to get the reference of vertex p, which I intuitively think may be really inefficient, especially when dealing with a large graph.

I wonder whether igraph provides a more direct way to refer to a specific vertex, like a static and unique key so that the vertex can be referred directly?

1 Like

The name attribute is treated in a special way by the internals of the Python interface; in particular, the name attribute is indexed internally in a Python dict so looking up the ID of a node given its name attribute is an O(1) operation (on average). The index is invalidated if you mutate the graph or change the name attribute. This happens here and here in the C implementation of the Python interface.

Furthermore, there are dedicated code paths for handling g.vs.find(name="something"), which is the idiomatic way to get a handle for the Vertex object corresponding to the given name. (Note that Vertex objects are ephemeral, though, they are essentially a container for a graph and a vertex index, so they become invalid when you mutate the graph). So, g.vs.find(name="something") should be able to look up a vertex with a given name in constant amortized time.

Also, any method of the Graph object that accepts a single vertex index as an input should also accept a vertex name instead, and it will look up the index of the vertex corresponding the given name using the index. So, if you have a vertex with name equal to "A" and you want to get its degreee, you can simply call g.degree("A"). For example:

>>> from igraph import Graph
>>> g=Graph.GRG(25, 0.4)
>>> g.vs["name"] = list("ABCDEFGHIJKLMNOPQRSTUVWXY")
>>> g.vs.find("I").index
8
>>> g.degree("I")
9
>>> g.degree(8) == g.degree("I")
True

Note that the existence of the index assumes that the vertex names are unique, but this is not enforced in any way If you accidentally assign the same name to two vertices, the index will map the name to one of the vertices but it’s unspecified which one will be the “winner” (and you won’t get any warnings either).

1 Like

Thanks for your comprehensive reply. This solved my problem.

By the way, I find that there is the relative explanation in the tutorial. Maybe I missed the part when I read the tutorial. Anyway, thanks for your time and patience.

A further question: can we get the handle of an edge in a graph by using its name attribute in O(1)?

I failed to get the handle of an edge by the following code:

edge = g.es.find(edge_name)

However, the following line works, which I do not know whether it is executed in O(1).

edge = g.es.select(name_eq = edge_name)

Compared with getting the handle of vertex, I think it is OK if igraph does not support getting the handle of an edge directly by name in O(1). Currently I use edge = g.get_eid(v1, v2) to get the handle of an edge. I just want to confirm the inner mechanism.

The name attribute of edges is treated the same way as any other attribute, so no, there is no indexing behind that and g.es.select(name_eq=edge_name) would make a full scan. In theory it wouldn’t be too hard to add indexing for the edges as well but so far it has been a relatively uncommon scenario. g.get_eid(v1, v2) is definitely faster; O(1) to find the vertices by name and O(log(d)) for finding the edge among the edges incident on v1 (or v2).

If you feel like your code could really benefit from constant-time edge lookups by name, let me know and we’ll figure out something.

Thanks for the reply. I agree with you that “add indexing for the edges as well but so far it has been a relatively uncommon scenario”. Currently g.get_eid(v1, v2) is enough for me.

This story about edge names will become much more interesting if we start working with multigraphs and remove edges at the same time. Consider the following graph:

Distinguishing the two edges between 1 and 3 is important because removing one merges the green/orange, removing the other merges the blue/orange faces. Currently, we can only distinguish edges by index. If we repeatedly do such removals (face mergers), the edge indices will change. Thus the only reliable way to refer to edges consistently is to give them names. Then it follows that fast lookup by name might be a good idea.

Another application is moving between the original graph and its line graph.

I think we’re not quite there yet though, it’s not yet time to implement fast edge lookup by name. We need a more practical use case first.

2 Likes

A post was split to a new topic: Check if vertex exists, create it if it doesn’t

Can the special name code paths handle arbitrary Python objects? I could not tell from the docs or this thread.
I have some classes that define __eq__ and __hash__ (so they can be used as keys in a dict) that represent vertices in a graph.

It seems like there is some support for arbitrary hashable objects (I can create a graph, and here in the source it looks like it’ll work because it is being just used as a dict key)

But then if I try to do g.neighbors(obj) (an many other operations expecting a vertex ID) I get this error.

~Most critically, for operations that accept an iterable of vertices like delete_vertices this means we’d have to do a linear scan lookup in Python (either manually or via another attribute and a select), which I think would make this sort of thing a lot slower than it needs to be.~

Ah I re-read above again, I see

Any method of the Graph object that accepts a single vertex index as an input should also accept a vertex name instead…

So it is expected that methods that accept iterables of vertices won’t work.
That’s a bummer because the point about performance still stands.
It’s even more critical for iterables of vertices than the single vertex car since that means doing the scan in Python, which is going to be slow.

No, they cannot. Well, in theory they could because we use an internal Python dictionary to map from vertex names to indices; the part where it breaks is when we handle method calls like g.degree(some_obj); here we need a way to convert some_obj into a vertex index or a list of vertex indices, and the rules right now are roughly as follows:

  • If some_obj is an integer, it is used as a vertex index as is.
  • If some_obj is a string, it is looked up in the name-to-index mapping.
  • If some_obj is an object of type Vertex, its index is retrieved from its index property.
  • In all other cases, we attempt to cast it into a numeric index using the __index__ magic method, and throw an error if it fails.

There is an escape hatch, though: the VertexSeq object has a property named _name_index, which contains the name-to-index mapping dict if the index is valid and None if the name-to-index mapping is currently invalid (has not been generated yet or has been invalidated recently due to a mutation on the graph). VertexSeq also has a hidden method named _reindex_names(), which forces the index to be (re-)generated. So, if you promise not to ever modify _name_index in a way that becomes inconsistent with the graph itself, you can use it to look up a vertex given its non-string name as follows:

from igraph import Graph
from typing import Any

def find_by_name(g: Graph, name: Any) -> int:
    mapping = g.vs._name_index
    if mapping is None:
        g.vs._reindex_names()
        mapping = g.vs._name_index
    return mapping[name]

E.g.:

>>> g = Graph.Full(9)
>>> g.vs["name"] = [divmod(vertex.index, 3) for vertex in g.vs]
>>> find_by_name(g, (2, 0))
6
1 Like

Interesting, that is useful.

One unfortunate thing is that even with that, if I want to do an operation on a sequence of vertices I still have to iterate over them and get their indices in Python. It would be nice to push that iteration into C since I think that would be faster.

Other than that, it would also be interesting to think of how the interface can be simplified, maybe I didn’t spend enough time with it to understand or see the advantages but that rule-set feels a bit hard to understand as a user.