Remove singleton clusters using induced subgraph

MOutram · 10 November 2022 06:20

Hi there,

I apologise in advance as I am relatively new to igraph, and coding in general, and I hope my question is okay. But a bit of background - I have used igraph to create a graph with vertices and edges and then used the leiden algorithm to calculate my network.

My graph/network contains ~5800 vertices/nodes that are subsequently clustered into 2739 clusters, though the majority of these are singletons (~2000 - and I point out that I am not concerned by the number/think it is a true reflection of the data). For the purposes of clarity I would like to not plot these clusters i.e. if cluster <2 don’t show on final plot.

I believe to do this I want to create an induced subgraph and plot that? from here igraph R manual pages (please correct me if I’m wrong), but I am unsure how to reflect keeping only clusters above 2 in the vids argument (Numeric vector, the vertices of the original graph which will form the subgraph.). I am working in python for reference but am thinking to move to R for visualising…

Thanks in advance for any help!

KeesP · 10 November 2022 20:00

In r+igraph, a solution might look like this:

require(igraph)
## create graph with named vertices (g)
## keep vertices with degree > 0
## plot result
g <- sample_gnm(20, 12)
V(g)$names <- as.character(V(g))
plot(g)
g2 <- subgraph(g, which(degree(g) > 0))
V(g2)$label <- V(g2)$names
dev.new()
plot(g2)

MOutram · 11 November 2022 00:24

Thank you so much - this did exactly what I need!

KeesP · 11 November 2022 10:09

Using subgraph() may not be the most obvious solution for removing singletons from a graph.
That is directly removing the unwanted vertices. As in this example.


require(igraph)
require(microbenchmark)
set.seed(1L)
g <- sample_gnm(5000L, 10000L)
V(g)$names <- as.character(V(g))
hist(degree(g))

## help for minus: help("-.igraph")
  microbenchmark( g2 <- subgraph(g, which(degree(g) > 0L))
                , g3 <- g - V(g)[which(degree(g) == 0L)]
                , g4 <- delete_vertices(g, V(g)[which(degree(g) == 0L)])
)
isomorphic(g2, g3)
isomorphic(g2, g4)

KeesP · 11 November 2022 12:55

In addition to singletons, one could remove small weakly connected components as follows:

require(igraph)
set.seed(1L)
g <- sample_gnm(5000L, 10000L)
V(g)$names <- as.character(V(g))
min_wcc <- 2

## Calculate weakly connected components (see Wikipedia) and
## delete all weakly connected components with less then min_wcc members.
## Or feel free to apply a different fiter.
## Filter min_wcc equal to 2, will remove singletons only.
wcc <- components(g, mode='weak')
wcg <- groups(wcc)
sz <- table("Community sizes" = wcc$membership)
table("Community frequency, dimnames by size" = sz)
filter_clusters <- ( which(sz < min_wcc) )
V_to_delete <- unlist(wcg[filter_clusters])
g5 <- delete_vertices(g, V_to_delete)
table("Community sizes, indexed by community (after cleansing)" =
      components(g5, mode = "weak")$membership)

Output:

Community frequency, dimnames by size
   1    2 4893 
 101    3    1 

Community sizes, indexed by community (after cleansing)
   1    2    3    4 
4893    2    2    2

MOutram · 16 November 2022 06:10

This has been incredibly helpful thank you so much for your time!

To ask a follow up - with your example to remove weakly connected components I get the below error (and same error when I add in my actual data). I was under the impression that wcc is a list? Am I missing something?

require(igraph)
set.seed(1L)
g ← sample_gnm(5000L, 10000L)
V(g)$names ← as.character(V(g))
min_wcc ← 2

wcc ← components(g, mode=‘weak’)
wcg ← groups(wcc)
Error in UseMethod(“groups”) :
no applicable method for ‘groups’ applied to an object of class “list”

Could I additionally ask one more question which is perhaps the reverse of my original question. Is it possible to select all the nodes and edges in a single cluster or set of of clusters (e.g. clusters 1:10) to subsequently graph as well…

KeesP · 16 November 2022 20:43

When I try

require(igraph)
help(groups)

I see the following example:

g <- make_graph("Zachary")
fgc <- cluster_fast_greedy(g)
groups(fgc)

g2 <- make_ring(10) + make_full_graph(5)
groups(components(g2))

If this does not work on your system, perhaps the groups() function has been redefined. To avoid confusion you could try: igraph::groups(). This retrieves the function groups() from the igraph library.

If this does not remedy the problem then a separate question on this forum is worthwhile.

On my system groups() is defined as:

>groups
function (x) 
UseMethod("groups")
<bytecode: 0x00000000176d83e0>
<environment: namespace:igraph>
>

Version of igraph:

> igraph_version()
[1] "1.3.1.9037"
>

KeesP · 17 November 2022 09:07

Because an example paints a thousand words.


require(igraph)
g <- make_graph(c(),2, directed=FALSE)
g <- g + make_ring(3) 
g <- g + make_tree(8+1, 2, mode="undirected")
g <- g + make_full_graph(16)
V(g)$names <- as.character(V(g))
dev.new(); plot(g)
min_wcc <- 2
topx    <- 3

## Calculate weakly connected components and
## delete connected components with less then min_wcc members.
## Or feel free to apply a different fiter (examples provided).
## Note wcg is indexed by cluster id.
## Filter min_wcc equal to 2, will remove singletons only.
wcc <- components(g)
wcg <- groups(wcc)
wcg

## names(sz), component ids.
## sz         corresponding number of vertices.
sz <- table("Component sizes, indexed by component id" = wcc$membership)
sz
table("Number of components, indexed by frequency" = sz)

## examples of how to filter components.
##
##   components by component id.   
##   components with maximal number of vertices. 
##   components in top 3 by cluster size.
##   components with singletons removed.
##
## filter_components <- c("3", "4")
## filter_components <- which(sz == max(sz) )   
## filter_components <- which(rank(-sz, ties.method= "min") <= topx)
## filter_components <- which(sz > min_wcc)

## Clear graph.
filter_components <- which(rank(-sz, ties.method= "min") <= topx)
V_to_keep <- unlist(wcg[filter_components])
g2 <- subgraph(g, V_to_keep)
table("Component sizes, indexed by component id (after cleansing)" =
      components(g2)$membership)

## Leiden communities.
## Note, same resolution across several components.
resolution <- quantile(degree(g2))[3] / (ecount(g2) - 1)
lc <- cluster_leiden(g2,  resolution = resolution)
dev.new(); plot(g2, mark.groups=lc)

Topic		Replies	Views
How to remove isolates in Graph object in R Usage R	3	504	16 November 2022
Filtering of cluster sizes Usage Python	4	706	11 April 2021
Extracting single linkage clusters from a very large pairs list Usage R	0	330	22 January 2021
SubGraphs on basis of Node Degree Usage R	4	249	23 November 2022
Unexpected results with the subtraction operator / difference function Usage Python	2	226	13 May 2023

Remove singleton clusters using induced subgraph

Related topics