Subgraph vs g.vs.select behavior

JBum21 · 19 December 2021 22:37

Hello,
In my usage of igraph, I consistently extract attribute information from vertices in my graphs. These graphs can have an upwards of 1e6 vertices, with typically a degree of 2-3 per vertex.

I’ve found that it’s faster to extract vertex attribute information from vertices by first creating a subgraph of the vertices and then pulling information from them from that subgraph. Mainly, I’m posting because I want to ask if this observation in typical? Am I doing something incorrectly to be causing the vertex retrieval to be so slow?

See a minimally reproducible example below.

import igraph as ig
import numpy as np
from time import perf_counter as pf

### Graph setup ### 
vs = 1000000
es = 1000000

edges = np.random.randint(0, vs, (es, 2))

g = ig.Graph()
g.add_vertices(vs)
g.vs['radius'] = np.random.randint(0, 20, vs)
g.add_edges(edges)
g.vs['coords'] = np.random.randint(0,1000, (vs, 3))

components = g.components()

### V selection attribute extraction ### 
t = pf()
for component in components:
    vs = g.vs[component]
    coords = vs['coords']
print (f"Vertex isolation speed: {pf() - t:.2f} s")

### Subgraph attribute extraction ### 
t = pf()
for component in components:
    c_graph = g.subgraph(component)
    coords = c_graph.vs['coords']
print (f"Subgraph isolation speed: {pf() - t:.2f} s")

And an example speed test on my computer.

Vertex isolation speed: 169.08 s
Subgraph isolation speed: 33.84 s

tamas · 20 December 2021 20:13

I’ll take a look at this soon (probably tomorrow) and get back to you. It does not seem normal to me at first glance.

tamas · 26 December 2021 20:38

Sorry, Christmas season struck and I dropped the ball on this one. I’ll get back to this as soon as possible, hopefully some time next week. Thanks for your patience!

tamas · 26 December 2021 21:50

Okay, I think I’ve found the bottleneck deep in the Python-C glue code; g.vs[component] unnecessarily constructs a vector in C that contains all the integers from 0 to g.vcount()-1 and then indexes that; that’s why g.vs[component] is so slow. I will provide a fix for this in the next patch release. Thanks for reporting!

You can follow the issue in the issue tracker here.

JBum21 · 27 December 2021 17:35

Thanks for the update tamas - will definitely be keeping an eye out for this update!

JBum21 · 27 December 2021 17:35

For the time being - is this a problem with a recent update? I.e., do older versions of igraph have faster vertex access? If so, I might downgrade for the moment for the sake of speed. Thanks!

tamas · 5 January 2022 07:50

Sorry for the late reply, Christmas holidays I don’t think it’s a recent problem, older versions are likely to have the same issue. However, I plan to fix this for the next patch version in a few days so stay tuned.

The planned schedule is as follows:

Version 0.9.6 of the C core is released later today (if everything goes well with the CI tests of the various higher-level interfaces).
Then I’ll fix this bug, hopefully this week.
A new version of python-igraph is then expected towards the end of this week.

tamas · 5 January 2022 09:07

Okay, so now I have a patch that brings down the time of the “vertex isolation” case to 0.22s from 88.57s with 1 million vertices. I’ll test it a bit more, then fix the same thing in g.es[...] and then this will go into 0.9.9.

JBum21 · 17 January 2022 20:01

Awesome. I’ll be looking forward to the new release. Thanks tamas!

szhorvat · 17 January 2022 21:00

0.9.9 is already out. You can install it right now.

Topic		Replies	Views
Select based on search within vertex attribute (list) Usage Python	0	121	29 November 2023
Attributes in the linegraph function Usage Python	4	181	16 March 2023
Filtering of cluster sizes Usage Python	4	705	11 April 2021
igraph is much slower than networkx when generating a graph Usage Python	10	2849	10 September 2021
Select vertices on the basis of a primary and secondary attribute Usage Python	0	404	22 April 2020

Subgraph vs g.vs.select behavior

Related topics