Filtering of cluster sizes

Hi there,
I would like to filter all clusters below a certain size in my graph.

I have the current code implemented:

# g is a generated graph containing a number of vertices and edges

   clusters = g.clusters()
   clms = clusters.membership

   vertices_togo = []

   for cluster in clusters:
       if len(cluster) < 5:
           vertices_togo += cluster

   g.delete_vertices(vertices_togo)

Is there a more efficient implementation?

Thanks,
Jacob

I think this is about as fast as it gets, at least for the accumulation of the vertices that you would like to remove from the graph.

Depending on the number of nodes that you want to remove from the graph it might be cheaper to construct a new graph consisting of only the nodes that you want to include using induced_subgraph, at least if you are also retaining a copy of the original graph g somewhere. If you don’t care about the original graph, then simply removing the vertices is indeed best I think.

Is this part forming a bottle-neck for you?

Thanks for the response vtraag, and sorry for my delay.

This process is not inefficient for me - I just wanted to make sure that there wasn’t some cleaner syntax implemented into igraph already.

On a tangent, could you explain why g.delete_edges(edge_list) is faster to call a single time rather than repeatedly? For example, as with above, if we called g.delete_edges under each loop rather than outside of the loop after it has run?

It took me forever to realize that calling it repeatedly was bottlenecking my speed, but my function dropped from ~4 seconds to ~0.05 seconds after moving it out of the loop.

Again, sorry for the tangent, just curious to hear the rationale behind this.

Currently, each call to delete_edges will recreate the entire graph. The amount of data that is moved around is not proportional to the number of edges you delete. Instead, it is always the entire graph.

There are compromises to be made when choosing a way to represent graphs on a computer. Efficient modification is in conflict with efficient access. There are plans to review igraph’s current representation, and its tradeoffs, but this is a long-term project.

Makes much more sense now…

I’m glad I messed around with the code enough to figure out that delete_edges was slowing down my program, but I had no trouble appending the target edges to a list for deletion after my analysis loop.

Thanks