Issue verifying betweenness centrality for a directed graph in R

Consider a simple R object containing the following edge list for a directed graph:

myArcs_igraph

   source_node_ISO target_node_ISO
1               FR              BE
2               NL              BE
3               ES              BE
4               BE              FR
5               IT              FR
6               ES              FR
7               BE              DE
8               FR              DE
9               IT              DE
10              NL              DE
11              PL              DE
12              ES              DE
13              FR              IT
14              DE              IT
15              NL              IT
16              PL              IT
17              ES              IT
18              BE              NL
19              FR              NL
20              DE              NL
21              IT              NL
22              ES              NL
23              GB              NL
24              FR              PL
25              DE              PL
26              IT              PL
27              NL              PL
28              ES              PL
29              GB              PL
30              FR              ES
31              NL              ES
32              FR              GB
33              NL              GB
34              PL              GB
35              ES              GB

In R I wish to verify the betweenness centrality for the above graph but something doesn’t feel right.

I proceed as follows:

library(igraph)

myArcs_igraph <- as.matrix(myArcs_igraph)
myGraph <- graph_from_edgelist(myArcs_igraph, directed = TRUE)
test_bc <- centr_betw(myGraph, directed = TRUE)$res

or equivalently

test_bc ← betweenness(myGraph, directed = TRUE, normalized = FALSE)

The output I get is as follows:

names(test_bc) <- names(V(myGraph))
as.data.frame(test_bc)

      test_bc
FR  3.5000000
BE  0.5833333
NL 10.7500000
ES  0.5833333
IT  4.1666667
DE  1.5000000
PL  2.0833333
GB  0.833333

Yet something seems off: some nodes never fall on the geodesics connecting other nodes, yet igraph assings to every node a non-zero betweenness

To verify the statement above I proceed as follows (using only igraph’s stock functions)

myGeod <- distances(myGraph, mode = “out”)

node_in_paths ← lapply(names(V(myGraph)),function(z){
chosen_node_bc ← z
chosen_node_bc_idx ← which(names(V(myGraph)) %in% chosen_node_bc)
myGeod_chosen_node ← myGeod[-chosen_node_bc_idx,-chosen_node_bc_idx]
nodename_test ← colnames(myGeod)[chosen_node_bc_idx]
myGeod_chosen_node_detail ← lapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(x){
sublist_x ← shortest_paths(myGraph, x, V(myGraph)[-chosen_node_bc_idx], mode = “out”)$vpath
temp_v ← sapply(sublist_x, function(y){
path_y ← c(names(y))
length(which(path_y %in% nodename_test))
})
temp_v ← t(as.matrix(temp_v))
colnames(temp_v) ← names(V(myGraph)[-chosen_node_bc_idx])
rownames(temp_v) ← names(V(myGraph)[x])
temp_v
})
myGeod_chosen_node_detail ← do.call(“rbind.data.frame”, myGeod_chosen_node_detail)
myGeod_chosen_node_ratio ← as.matrix(myGeod_chosen_node_detail/myGeod_chosen_node)
myGeod_chosen_node_ratio[which(!is.finite(myGeod_chosen_node_ratio))] ← 0
myGeod_chosen_node_ratio
})
names(node_in_paths) ← names(V(myGraph))

Then summing the matrices in the list

sapply(node_in_paths, function(z){
sum(z)
})

  FR        BE        NL        ES        IT        DE        PL        GB 
4.1666667 0.8333333 3.8333333 0.0000000 2.1666667 0.0000000 0.0000000 0.0000000 

From the above we get that ES, DE, PL, GB never fall into other nodes’ geodesics. Yet, as mentioned earlier, igraph returns a non-zero betwenness metric for these nodes.

How come? Am I missing something?

Thanks in advance for addressing my query

I believe I have found the main mistake in my query above - although a second issue concerns normalisation: both need to be addressed to fully replicate igraph’s betweenness centrality returned by the function centr_betw or equivalently betweenness(myGraph, directed = TRUE, normalized = FALSE)

The main mistake is that I have erroneously used the length of the geodesics instead of the count of geodesics: in other words I should have used igraph’s function all_shortest_paths instead of shortest_paths

Specifically the code in my original post should be modified as follows

node_in_paths <- lapply(names(V(myGraph)),function(z){
chosen_node_bc <- z
chosen_node_bc_idx <- which(names(V(myGraph)) %in% chosen_node_bc)
myGeod_chosen_node <- myGeod[-chosen_node_bc_idx,-chosen_node_bc_idx]
nodename_test <- chosen_node_bc
myGeod_chosen_node_ratio <- lapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(x){
fract_all_geodesics_with_x <- sapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(q){
sublist_x <- all_shortest_paths(myGraph, x, as.vector(q), mode = "out")$vpath
count_shortes_paths <- length(sublist_x)
temp_v <- sapply(sublist_x, function(y){
path_y <- c(names(y))
length(which(path_y %in% nodename_test))
})
sum(temp_v)/count_shortes_paths
})
names(fract_all_geodesics_with_x) <- names(V(myGraph))[seq(1:nrow(myGeod))[-chosen_node_bc_idx]]
fract_all_geodesics_with_x
})
names(myGeod_chosen_node_ratio ) <- names(V(myGraph))[seq(1:nrow(myGeod))[-chosen_node_bc_idx]]
myGeod_chosen_node_ratio <- do.call("rbind", myGeod_chosen_node_ratio )
})
names(node_in_paths) <- names(V(myGraph))

Regarding normalisation: I should have not normalised i.e.

#norm_factor <- ((nrow(myGeod)-1)*(nrow(myGeod)-2))/2
norm_factor <- 1
my_between_centr <- sapply(node_in_paths, function(z){
sum(z)/norm_factor
})

which returns:

my_between_centr

FR BE NL ES IT DE PL GB
3.5000000 0.5833333 10.7500000 0.5833333 4.1666667 1.5000000 2.0833333 0.8333333

This is the same result one obtains with: centr_betw(myGraph, directed = TRUE)$res