igraph is much slower than networkx when generating a graph

I use the following code to compare the efficiency of igraph and networkx when generating a graph. The results are shown at the bottom, which indicate that igraph is hundreds of times slower than networkx when the number of vertices exceeds 5000.

You can run the following code directly in Python 3.x to get my results.

import random
import time
import igraph as ig
import networkx as nx


def ig_vs_nx(N):
    g1 = ig.Graph(directed=True)
    g2 = nx.DiGraph()

    node_list = ["p"+str(i) for i in range(N)]
    edge_set = set()
    for j in range(2*N):
        node1 = "p" + str(random.randint(0, N-1))
        node2 = "p" + str(random.randint(0, N-1))
        edge_set.add((node1, node2))

    time_start = time.time()
    # add nodes and edges to g1
    for node in node_list:
        g1.add_vertices(node)
    for node1, node2 in edge_set:
        g1.add_edges([(node1, node2)])
    time_ig = time.time()

    # add nodes and edges to g2
    for node in node_list:
        g2.add_node(node, name=node)
    for node1, node2 in edge_set:
        g2.add_edge(node1, node2)
    time_nx = time.time()

    print("------------------------------------------------------")
    print(f"Node num: {len(node_list)}; Edge num: {len(edge_set)}")
    print(f"ig time: {time_ig - time_start}s")
    print(f"nx time: {time_nx - time_ig}s")
    print(f"ig/nx: {(time_ig - time_start)/(time_nx - time_ig)}")


if __name__ == "__main__":
    N_list = [1000, 5000, 10000, 15000, 20000]
    for N in N_list:
        ig_vs_nx(N)

Results:

igraph’s internal C data structure is an indexed edge list; it facilitates quick lookups and queries in most cases while being expensive when it comes to mutating the graph. The net result is that adding a single new edge to the graph is almost as expensive as adding many of them (because the internal indices have to be re-calculated anyway, no matter whether you are adding one edge or one million edges in a single batch). To prove my point, I’m attaching a script that adds 10K edges to an empty graph, first with add_edge(), then with add_edges(); this is the output:

Edges one by one, v1: 1.646003007888794s
Edges one by one, v2: 1.5165398120880127s
All edges at once: 0.0014331340789794922s

And this is the script:

Script for testing edge addition performance
from contextlib import contextmanager
from igraph import Graph
from random import randint
from time import time


@contextmanager
def measure(message):
    start = time()
    try:
        yield
    finally:
        dt = time() - start
        print(f"{message}: {dt}s")


def add_edges_one_by_one_v1(n, edges):
    g = Graph(n)
    for edge in edges:
        g.add_edge(*edge)


def add_edges_one_by_one_v2(n, edges):
    g = Graph(n)
    for edge in edges:
        g.add_edges([edge])


def add_edges_at_once(n, edges):
    g = Graph(n)
    g.add_edges(edges)


def main():
    n, m = 1000, 10000
    edges = [
        (randint(0, n-1), randint(0, n-1)) for _ in range(m)
    ]

    with measure("Edges one by one, v1"):
        add_edges_one_by_one_v1(n, edges)

    with measure("Edges one by one, v2"):
        add_edges_one_by_one_v2(n, edges)

    with measure("All edges at once"):
        add_edges_at_once(n, edges)


if __name__ == "__main__":
    main()

Thanks for your reply. I think it would be better to use networkx first when generating a graph, which requires frequent operation of adding vertices or edges. And then add all the nodes and edges from networkx graph to igraph graph at once for subsequent calculation.

In my experience, in the majority of applications, you can simply collect edges into a list first, then build the graph in one step. Most of the time, there is no need to use a graph data structure when building a graph step-by-step. A list of edges will do just fine.

This applies not just to igraph, but to most graph libraries.

Can you explain why you feel you need to maintain an actual graph (as opposed to list of edges) at each step of the construction?

@tamas @iosonofabio I notice that our tutorial shows functions like add_vertices and add_edges at the very beginning: Tutorial. This makes the impression that a procedural, step-by-step construction is normal or even preferable, which in fact it’s the worst thing one can do. We should change this.

The situation is made worse by the fact that this is Python, a highly procedural language. In R or Mathematica it would be obvious to to any non-novice user not to do this, but not in Python.

There’s nothing wrong with add_edges, I think. But if one uses it in a for loop, feeding a made-up list of one edge at a time, that user hasn’t understood how add_edges is designed to work, and we should clarify that in the tutorial and docstring, agreed.

I agree with @iosonofabio; I think there’s nothing wrong with showing add_vertices() and add_edges() in the tutorial early at the beginning, but then we should probably add a warning an an admonition box that explains this pitfall. This is similar to how Python’s strings work; in Python, string concatenation is a slow operation because strings are immutable, so building a long string character by character is an O(n^2) operation. However, it is perfectly fine to use string concatenation for smaller strings. You simply need to be aware that if you are building up a string from thousands of small chunks, then the idiomatic way of doing that is to collect the chunks in a list and then run "".join(chunks).

Would it not be better if the first few code lines in the tutorial showed the recommended and efficient way of constructing graphs?

What is the first thing you will do when learning a graph library? Construct a graph. Implicitly, the tutorial suggests that this is done by creating an empty graph, then “adding” vertices and edges in small steps. That’s just wrong. There is zero advantage to doing that, and many disadvantages.

We can add extra explanations to the tutorial, which will make it longer and more tedious to read. Or we can start immediately with the right way, no extra explanations needed.

The difference is that this computationally inefficient way of constructing a string is efficient in terms of writing code (it’s often simple). This does not apply to constructing graphs. The step-by-step method is both slow and more complicated to write.

EDIT: Thinking more about it, creating and mutating graphs should be separate topics in the tutorial.

1 Like

Great ideas Szabolcs, go ahead and make a PR when you have time and I’ll be happy to take a look at your changes, overall it sounds promising!

Just to be clear, I won’t have time for this for quite a while. I have a long TODO list for igraph with higher-priority items (such as finally updating the Mma interface to use 0.9).