Issue verifying betweenness centrality for a directed graph in R

Consider a simple R object containing the following edge list for a directed graph:

myArcs_igraph

   source_node_ISO target_node_ISO
1               FR              BE
2               NL              BE
3               ES              BE
4               BE              FR
5               IT              FR
6               ES              FR
7               BE              DE
8               FR              DE
9               IT              DE
10              NL              DE
11              PL              DE
12              ES              DE
13              FR              IT
14              DE              IT
15              NL              IT
16              PL              IT
17              ES              IT
18              BE              NL
19              FR              NL
20              DE              NL
21              IT              NL
22              ES              NL
23              GB              NL
24              FR              PL
25              DE              PL
26              IT              PL
27              NL              PL
28              ES              PL
29              GB              PL
30              FR              ES
31              NL              ES
32              FR              GB
33              NL              GB
34              PL              GB
35              ES              GB

In R I wish to verify the betweenness centrality for the above graph but something doesn’t feel right.

I proceed as follows:

library(igraph)

myArcs_igraph <- as.matrix(myArcs_igraph)
myGraph <- graph_from_edgelist(myArcs_igraph, directed = TRUE)
test_bc <- centr_betw(myGraph, directed = TRUE)$res

or equivalently

test_bc ← betweenness(myGraph, directed = TRUE, normalized = FALSE)

The output I get is as follows:

names(test_bc) <- names(V(myGraph))
as.data.frame(test_bc)

      test_bc
FR  3.5000000
BE  0.5833333
NL 10.7500000
ES  0.5833333
IT  4.1666667
DE  1.5000000
PL  2.0833333
GB  0.833333

Yet something seems off: some nodes never fall on the geodesics connecting other nodes, yet igraph assings to every node a non-zero betweenness

To verify the statement above I proceed as follows (using only igraph’s stock functions)

myGeod <- distances(myGraph, mode = “out”)

node_in_paths ← lapply(names(V(myGraph)),function(z){
chosen_node_bc ← z
chosen_node_bc_idx ← which(names(V(myGraph)) %in% chosen_node_bc)
myGeod_chosen_node ← myGeod[-chosen_node_bc_idx,-chosen_node_bc_idx]
nodename_test ← colnames(myGeod)[chosen_node_bc_idx]
myGeod_chosen_node_detail ← lapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(x){
sublist_x ← shortest_paths(myGraph, x, V(myGraph)[-chosen_node_bc_idx], mode = “out”)$vpath
temp_v ← sapply(sublist_x, function(y){
path_y ← c(names(y))
length(which(path_y %in% nodename_test))
})
temp_v ← t(as.matrix(temp_v))
colnames(temp_v) ← names(V(myGraph)[-chosen_node_bc_idx])
rownames(temp_v) ← names(V(myGraph)[x])
temp_v
})
myGeod_chosen_node_detail ← do.call(“rbind.data.frame”, myGeod_chosen_node_detail)
myGeod_chosen_node_ratio ← as.matrix(myGeod_chosen_node_detail/myGeod_chosen_node)
myGeod_chosen_node_ratio[which(!is.finite(myGeod_chosen_node_ratio))] ← 0
myGeod_chosen_node_ratio
})
names(node_in_paths) ← names(V(myGraph))

Then summing the matrices in the list

sapply(node_in_paths, function(z){
sum(z)
})

  FR        BE        NL        ES        IT        DE        PL        GB 
4.1666667 0.8333333 3.8333333 0.0000000 2.1666667 0.0000000 0.0000000 0.0000000 

From the above we get that ES, DE, PL, GB never fall into other nodes’ geodesics. Yet, as mentioned earlier, igraph returns a non-zero betwenness metric for these nodes.

How come? Am I missing something?

Thanks in advance for addressing my query

I believe I have found the main mistake in my query above - although a second issue concerns normalisation: both need to be addressed to fully replicate igraph’s betweenness centrality returned by the function centr_betw or equivalently betweenness(myGraph, directed = TRUE, normalized = FALSE)

The main mistake is that I have erroneously used the length of the geodesics instead of the count of geodesics: in other words I should have used igraph’s function all_shortest_paths instead of shortest_paths

Specifically the code in my original post should be modified as follows

node_in_paths <- lapply(names(V(myGraph)),function(z){
chosen_node_bc <- z
chosen_node_bc_idx <- which(names(V(myGraph)) %in% chosen_node_bc)
myGeod_chosen_node <- myGeod[-chosen_node_bc_idx,-chosen_node_bc_idx]
nodename_test <- chosen_node_bc
myGeod_chosen_node_ratio <- lapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(x){
fract_all_geodesics_with_x <- sapply(seq(1:nrow(myGeod))[-chosen_node_bc_idx], function(q){
sublist_x <- all_shortest_paths(myGraph, x, as.vector(q), mode = "out")$vpath
count_shortes_paths <- length(sublist_x)
temp_v <- sapply(sublist_x, function(y){
path_y <- c(names(y))
length(which(path_y %in% nodename_test))
})
sum(temp_v)/count_shortes_paths
})
names(fract_all_geodesics_with_x) <- names(V(myGraph))[seq(1:nrow(myGeod))[-chosen_node_bc_idx]]
fract_all_geodesics_with_x
})
names(myGeod_chosen_node_ratio ) <- names(V(myGraph))[seq(1:nrow(myGeod))[-chosen_node_bc_idx]]
myGeod_chosen_node_ratio <- do.call("rbind", myGeod_chosen_node_ratio )
})
names(node_in_paths) <- names(V(myGraph))

Regarding normalisation: I should have not normalised i.e.

#norm_factor <- ((nrow(myGeod)-1)*(nrow(myGeod)-2))/2
norm_factor <- 1
my_between_centr <- sapply(node_in_paths, function(z){
sum(z)/norm_factor
})

which returns:

my_between_centr

FR BE NL ES IT DE PL GB
3.5000000 0.5833333 10.7500000 0.5833333 4.1666667 1.5000000 2.0833333 0.8333333

This is the same result one obtains with: centr_betw(myGraph, directed = TRUE)$res

Update: The tl;dr is that there can be more than one shortest path between two endpoints.


I only saw your post today. Let me have a look.

When you say “verify”, do you simply mean “computer”, or do you wish to compare against a result obtained by different means?

centr_betw computes centralization, not betweenness centrality. That is a distinct, somewhat niche, and in my opinion dubious concept. If you’re not sure what it is, I suggest to just forget it.

That is not equivalent to centr_betw(). betweenness() is the correct function to compute betweenness centrality.

Why do you think so? This is in fact incorrect. There is a shortest path passing through all vertices of this graph. Which vertex do you think does not lie on any shortest path? [Edit: I saw your answer to this question as I read further.]

Here’s a little table I made that shows, for each node, the endpoints of the shortest paths it lies on. Keep in mind that there may be more than one shortest path between the same two endpoints.

Regarding your R code block: I’m sorry, R is not my primary language, and I don’t have the time to parse through it and find the error. I do however notice that you seem to use shortest_paths(), which finds a single shortest path between two endpoints. Try all_shortest_paths() instead.

For example, between NL and FR there are three shortest paths, namely:

{{"NL", "ES", "FR"}, {"NL", "IT", "FR"}, {"NL", "BE", "FR"}}

I’m sorry, I must be tired today and not reading carefully. Of course you’ve already found the answer :slight_smile:

Hi I really appreciate you took the time to review my query - the only reason I “replied” to myself rather than “edit” further the question is that sometimes it might be useful to leave a trail as it were, just in case someone else is having similar silly doubts.

I found your clarifications very instructive - although the two functions centr_betw and betweeness can be used interchangeably to generate the same metrics, which some of us might find indeed confusing.

Regarding the term “verification” - it’s just a way to say I’m interested in understanding a bit better how a function arrives at specific numerical outputs. My R code was just an attempt to replicate the logic for my own benefit.

Thanks for the great work with igraph - it’s an excellent library with an impressive range of very useful functions - the all_shortest_paths function is a little gem that helped me correct my initial mistake.

Maybe one minor, academic comment concerning normalisation for betweeness centrality: I cannot help but notice that this igraph documentations page quotes a formula from literature that refers to undirected graphs. Although the computations are carried out correctly in all cases under the hood, I have been wondering whether it might be worth citing explicitly work that refers to directed graphs explicitly (and that seems to provide the underlying normalisation implemented in igraph) i.e.,

White, D. R., & Borgatti, S. P. (1994). Betweenness centrality measures for directed graphs. Social Networks, 16(4), 335–346. Redirecting

All the best