Betweenness and communities (cluster_edge_betweenness)

Could someone explain how communities are determined based on R’s cluster_edge_betweenness function? I would like to know why, in the attached graph, p14 and p15 are grouped with the g6 community and not with the other one (g3 + g5). And how can I force these two nodes to be clustered with g3/g5? I tried using ‘weights’ but no luck.

library(igraph)
library(dplyr)

mydf <- data.frame(id=LETTERS, locus=c(rep("alpha",14), rep("beta",12)), 
     group=c(rep(1,2), rep(2,6), rep(3,6), rep(4,4), rep(5,4), rep(6,4)),
                   pair=c(1:12,14,15,3,4,6,8,9:16))

mydf <- filter(mydf, group %in% c(3,5,6))
mydf

g <- mydf %>%
  dplyr::mutate(sg = paste0("g", group), sp = paste0("p", pair)) %>%
  dplyr::select(sg, sp, locus) %>%
  graph_from_data_frame(directed=FALSE) 

g
#IGRAPH f429061 UN-- 11 14 -- 
#+ attr: name (v/c), locus (e/c)
#+ edges from f429061 (vertex names):
# [1] g3--p9  g3--p10 g3--p11 g3--p12 g3--p14 g3--p15 g5--p9  g5--p10 g5--p11 g5--p12 g6--p13
#[12] g6--p14 g6--p15 g6--p16

ceb <- cluster_edge_betweenness(g)
  
plot(ceb, g, 
     vertex.shape=c(rep("square",3), rep("circle",8)),
     col="yellow", 
     edge.color = as.factor(E(ig)$locus), 
     edge.width=2 + (E(ig)$locus=="alpha") * 2)

There is a reference in the documentation to the original paper by Girvan and Newman, which you can look at.

This is a well-known community detection algorithm that today is usually discussed only for educational purposes. It does not do very well, in general. There is a (somewhat mediocre) Wikipedia article about how it works which you can also look at: Girvan–Newman algorithm - Wikipedia

Because g3--p15 and g3--p14 are the highest betweenness edges, thus they are removed first.

If you just want to plot manually specified groups, you can enter your own membership vector.

Thank you szhorvat! That’s very helpful.

I was hoping to find a method that groups g3–p15 and g3–p14 in the same community as {p9, p10, p11, p12}. Can I use “weights” to reduce the betweenness for those two particular edges, or any other way?

If you want to use a certain partitioning, why are you looking for a method that returns it? You already have the partitioning.

I know the partitioning for only those two edges. I don’t have the partitioning for all cases, that’s why I need a method to do it in general. I thought there may be a way to use edge attributes (which I have - the locus variable) to make the algorithm choose one or the other community for certain edges that fall in-between clusters. :slight_smile: