Filtering of cluster sizes

JBum21 · 1 April 2021 20:32

Hi there,
I would like to filter all clusters below a certain size in my graph.

I have the current code implemented:

   # g is a generated graph containing a number of vertices and edges

   clusters = g.clusters()

   # Collect the vertices of every cluster with fewer than 5 members
   vertices_togo = []

   for cluster in clusters:
       if len(cluster) < 5:
           vertices_togo.extend(cluster)

   # Delete all collected vertices in a single call
   g.delete_vertices(vertices_togo)

Is there a more efficient implementation?

Thanks,
Jacob

vtraag · 6 April 2021 09:20

I think this is about as fast as it gets, at least for the accumulation of the vertices that you would like to remove from the graph.

Depending on how many nodes you want to remove, it might be cheaper to construct a new graph containing only the nodes you want to keep, using induced_subgraph. That applies mainly if you are also retaining a copy of the original graph g somewhere. If you don’t need the original graph, then simply removing the vertices is indeed best, I think.

Is this part forming a bottleneck for you?

JBum21 · 9 April 2021 05:29

Thanks for the response vtraag, and sorry for my delay.

This process is not inefficient for me; I just wanted to make sure that there wasn’t some cleaner syntax already built into igraph.

On a tangent, could you explain why g.delete_edges(edge_list) is faster when called once rather than repeatedly? For example, as in the code above, what if we called g.delete_edges inside the loop on every iteration, rather than once after the loop has finished?

It took me forever to realize that calling it repeatedly was bottlenecking my speed, but my function dropped from ~4 seconds to ~0.05 seconds after moving it out of the loop.

Again, sorry for the tangent, just curious to hear the rationale behind this.

szhorvat · 9 April 2021 06:54

Currently, each call to delete_edges will recreate the entire graph. The amount of data that is moved around is not proportional to the number of edges you delete. Instead, it is always the entire graph.

There are compromises to be made when choosing a way to represent graphs on a computer. Efficient modification is in conflict with efficient access. There are plans to review igraph’s current representation, and its tradeoffs, but this is a long-term project.

JBum21 · 11 April 2021 21:39

Makes much more sense now…

I’m glad I messed around with the code enough to figure out that delete_edges was slowing down my program, and I had no trouble appending the target edges to a list for deletion after my analysis loop.

Thanks
