Clarification on Eigenvector Centralization Calculation in igraph

Dear igraph Development Team,

I hope this email finds you well. I’m writing to seek clarification on the eigenvector centralization calculation in the igraph package, specifically regarding the ‘scale’ parameter in the centr_eigen() function.

I’ve noticed some inconsistencies in the results when comparing igraph’s output with that of the sna package. My observations are as follows:

  1. When calculating eigenvector centrality using eigen_centrality(), setting ‘scale = TRUE’ produces results consistent with sna’s evcent() function.

  2. However, for eigenvector centralization using centr_eigen(), setting ‘scale = FALSE’ yields results consistent with sna’s centralization() function.

This discrepancy raises two important questions:

  1. What exactly does the ‘scale’ parameter do in each of these functions, and why does it seem to have opposite effects?

  2. Which setting should be used for the most accurate and comparable results across different network analysis packages?

Understanding these nuances is crucial for ensuring the accuracy and comparability of network analysis results. I would greatly appreciate any insights you can provide on this matter.

Thank you for your time and expertise.

Best regards,

Chuding

Below is the annotated code:

## Step 1: Create a 5x5 undirected and valued matrix

set.seed(123)  # Set seed for reproducibility
matrix_values <- sample(1:4, 15, replace = TRUE)  # Generate random weights for the upper triangle
undirected_matrix <- matrix(0, 5, 5)  # Initialize a 5x5 matrix

# Fill the upper triangle with random values

undirected_matrix[upper.tri(undirected_matrix)] <- matrix_values

# Make the matrix symmetric to represent an undirected graph

undirected_matrix <- undirected_matrix + t(undirected_matrix)

undirected_matrix

## Step 2: Convert the matrix to an igraph object

library(igraph)

undirected_matrix_graph <- graph_from_adjacency_matrix(undirected_matrix, 
                                                       mode = "undirected", 
                                                       weighted = TRUE, 
                                                       diag = FALSE)

## Step 3: Calculate eigenvector centrality

Eigenvector_centrality_from_igraph <- eigen_centrality(
  undirected_matrix_graph,
  directed = FALSE,
  scale = TRUE,
  weights = NULL)

Eigenvector_centrality_from_igraph <- data.frame(Eigenvector_centrality_from_igraph = Eigenvector_centrality_from_igraph$vector)

library(sna)

Eigenvector_centrality_from_sna <- evcent(undirected_matrix, 
                   g=1, 
                   nodes=NULL, 
                   gmode="graph", 
                   diag=FALSE,
                   tmaxdev=FALSE, 
                   rescale=FALSE, 
                   ignore.eval=FALSE, 
                   tol=1e-10,
                   maxiter=1e5, 
                   use.eigen=FALSE)

detach("package:sna", unload=TRUE)

Eigenvector_centrality_from_sna <- Eigenvector_centrality_from_sna / max(Eigenvector_centrality_from_sna)

Eigenvector_centrality_from_sna <- data.frame(Eigenvector_centrality_from_sna = Eigenvector_centrality_from_sna)

# Combine and compare the results of eigenvector centrality from igraph and sna

CombinedResults <- cbind(Eigenvector_centrality_from_igraph, Eigenvector_centrality_from_sna)

CombinedResults

cor(CombinedResults)

## Step 4: Calculate eigenvector centralization

# Changing "scale = TRUE" to "scale = FALSE" below makes the igraph result consistent with sna.
# However, for eigenvector centrality (Step 3), "scale = TRUE" was the setting
# that produced results consistent with sna.

Eigenvector_centralization_from_igraph <- centr_eigen(
  undirected_matrix_graph,
  directed = FALSE,
  scale = TRUE, 
  options = arpack_defaults(),
  normalized = TRUE
)

Eigenvector_centralization_from_igraph <- Eigenvector_centralization_from_igraph$centralization

library(sna)

Eigenvector_centralization_from_sna <- centralization(undirected_matrix, 
                           FUN=evcent, 
                           g=NULL, 
                           mode="graph", 
                           diag=FALSE, 
                           normalize=TRUE)

detach("package:sna", unload=TRUE)

# Combine and compare the results of eigenvector centralization from igraph and sna

CombinedResults <- cbind(Eigenvector_centralization_from_igraph, Eigenvector_centralization_from_sna)

CombinedResults

Eigenvector centralities don’t have an absolute scale. Values are only meaningful relative to each other, e.g. vertex A may have a centrality twice as large as vertex B in the same graph. scale=TRUE is just a convenience normalization, but makes no difference in the interpretation of the value.

In fact, this parameter will go away in future versions of igraph, and the current scale=TRUE will be the default behaviour.

In your code, you manually normalize the results from sna (dividing by the largest score) in the same way as igraph’s scale=TRUE does.

Again, scaling with a constant factor does not matter. We can say that two packages produce the same result if the output from one is a constant multiple of the output from the other.
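To illustrate the point about constant multiples, here is a minimal base-R sketch. The matrix `A` is an arbitrary illustrative example (not the one from the question), and the leading eigenvector is taken from `eigen()` rather than from igraph or sna.

```r
# Small symmetric weighted adjacency matrix (illustrative values only)
A <- matrix(c(0, 3, 1, 0, 2,
              3, 0, 2, 1, 0,
              1, 2, 0, 4, 1,
              0, 1, 4, 0, 3,
              2, 0, 1, 3, 0), nrow = 5, byrow = TRUE)

# Leading eigenvector; abs() removes the arbitrary overall sign
v <- abs(eigen(A, symmetric = TRUE)$vectors[, 1])

ev_max <- v / max(v)          # maximum norm: largest score is 1
ev_l2  <- v / sqrt(sum(v^2))  # Euclidean norm: unit length
ev_sum <- v / sum(v)          # 1-norm: scores sum to 1

# Element-wise ratios are constant, so the three vectors
# carry exactly the same information about the vertices
ratio_l2  <- ev_max / ev_l2
ratio_sum <- ev_max / ev_sum
print(ratio_l2)
print(ratio_sum)

# Consequently the correlation between any two of them is 1
print(cor(ev_max, ev_l2))
```

Any two of these vectors differ only by a single multiplicative constant, which is why a correlation of 1 between the outputs of two packages indicates agreement.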

This should answer your question, “Which setting should be used for the most accurate and comparable results across different network analysis packages?”


As for “centralization”, personally I am not a fan of this graph measure. It seems to me that it is defined in a rather ad-hoc way, without a principled justification.

It is defined in terms of differences between centrality values. This presumes that there is some fixed, absolute scale for the centrality scores we are using. There is no natural scale for eigenvector centrality. If we are to define centralization consistently, we must choose some scale first, i.e. we must normalize the scores. The results will be quite different depending on which norm we choose (e.g. Euclidean norm vs maximum norm), even if the centralization is then normalized again using its theoretical maximum value.

Now if “centralization” were defined in terms of ratios, instead of in terms of differences, then this problem wouldn’t occur. This shows how arbitrary the choice of differences is …
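The dependence on the chosen norm can be checked with a short base-R sketch. Everything here is illustrative: `A` is an arbitrary example matrix, `raw_centralization()` and `leading_vec()` are hypothetical helper names, and the star graph `S` is only a stand-in reference graph for the normalization step. It is not claimed to be the theoretical maximum used by igraph or sna.

```r
# Difference-based centralization: sum of gaps to the largest score
raw_centralization <- function(x) sum(max(x) - x)

# Leading eigenvector of a symmetric matrix, with the sign fixed by abs()
leading_vec <- function(M) abs(eigen(M, symmetric = TRUE)$vectors[, 1])

# Graph under study (illustrative values only)
A <- matrix(c(0, 3, 1, 0, 2,
              3, 0, 2, 1, 0,
              1, 2, 0, 4, 1,
              0, 1, 4, 0, 3,
              2, 0, 1, 3, 0), nrow = 5, byrow = TRUE)

# Reference graph: a star on 5 vertices (an ASSUMED reference, see above)
S <- matrix(0, 5, 5)
S[1, -1] <- 1
S[-1, 1] <- 1

norms <- list(
  max    = function(x) x / max(x),
  euclid = function(x) x / sqrt(sum(x^2))
)

# Centralization of A relative to the reference graph, under each norm
res <- sapply(norms, function(f) {
  raw_centralization(f(leading_vec(A))) / raw_centralization(f(leading_vec(S)))
})
print(res)
```

The two entries of `res` differ, because switching norms rescales the scores of `A` and of the reference graph by different constants, and a measure built on differences does not cancel them.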


What does igraph compute? When using scale=TRUE, it normalizes eigenvector centrality scores using the maximum norm. When using scale=FALSE, igraph does not guarantee any kind of normalization. While you may notice that usually it produces results normalized using the Euclidean norm, this is merely an artefact of the eigenvector computation method. It is not guaranteed, and it is not always the case.

Since centralization assumes the use of some scale (some normalization), passing scale=FALSE to centr_eigen() in igraph is invalid. I opened an issue for dealing with this, but note that there are already plans for removing this parameter from igraph (the result will always be normalized using the maximum norm). Thanks for bringing up this topic, which made us aware of this issue.

If you want to use the Euclidean norm when computing eigenvector centralization, you will need to normalize the centralities manually and then compute centralization using centralize().
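A minimal sketch of this manual route, using an arbitrary illustrative matrix rather than the one from the question. The value `tmax_l2` is an assumption: it is the centralization of a graph with a single edge, whose unit-length scores would be (1, 1, 0, ..., 0)/sqrt(2). Whether that is the correct theoretical maximum under the Euclidean norm, and whether it matches what sna uses, should be verified before relying on it.

```r
library(igraph)

# Illustrative symmetric weighted adjacency matrix
A <- matrix(c(0, 3, 1, 0, 2,
              3, 0, 2, 1, 0,
              1, 2, 0, 4, 1,
              0, 1, 4, 0, 3,
              2, 0, 1, 3, 0), nrow = 5, byrow = TRUE)

g <- graph_from_adjacency_matrix(A, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Default scaling: maximum norm (largest score equals 1)
ev_max <- eigen_centrality(g)$vector

# Manual rescaling to unit Euclidean length
ev_l2 <- ev_max / sqrt(sum(ev_max^2))

# Unnormalized centralization: sum of differences from the largest score
raw_l2 <- centralize(ev_l2, normalized = FALSE)

# ASSUMPTION: theoretical maximum taken from a single-edge graph
# with unit-length scores (1, 1, 0, ..., 0) / sqrt(2)
n <- vcount(g)
tmax_l2 <- (n - 2) / sqrt(2)

cent_l2 <- centralize(ev_l2, theoretical.max = tmax_l2, normalized = TRUE)
print(c(raw = raw_l2, normalized = cent_l2))
```

If a different norm is wanted, only the rescaling line and the theoretical maximum need to change; centralize() itself just sums the differences and divides by whatever maximum it is given.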

What does sna compute? I looked at the sna documentation. It seems to me that with rescale=TRUE, it uses the 1-norm. There is no statement about what it uses with rescale=FALSE, which is the default. Since the chosen normalization affects the results of the centralization computation, not stating this is a problem. Presumably it’s the Euclidean norm, but is this guaranteed, or merely a side effect of how the eigenvector happened to be computed?

I would recommend that you contact the sna folks about this issue, and point them to this thread. Their input will be valuable for improving igraph as well.


Summary:

  • Eigenvector centrality has no natural scale. Any sort of interpretation that relies on such a scale is problematic.
  • Centralization assumes the existence of such a scale. If you want to use centralization, you need to choose a scale, i.e. a normalization method yourself. Be aware that different methods can yield very different results, even if centralization itself is normalized using its theoretical maximum.
  • Please only use igraph::centr_eigen() with scale=TRUE. This will be the only available behaviour in the future. It uses the maximum norm to normalize eigenvector centrality scores. If you need anything different from this, you must perform normalization on your own.

Dear Prof. Szabolcs Horvát,

Thank you for your suggestions. I believe I have understood most of them. Following your advice, I also raised the issue with the developers of ‘sna’. Here is their response:

Eigenvectors (and hence eigenvector centrality) are defined only up to a nonzero scalar multiple, so their lengths are inherently arbitrary; evcent() follows the behavior of eigen(), which in turn follows the very common convention of taking eigenvectors to be of unit length. If rescale=TRUE, then the scores are rescaled so that the sum of scores is 1 (which does not, in general, yield a unit-length vector). However, this is again cosmetic, and you can make them sum to 5, to π, or to your birthdate if you like, without changing the meaning of the index. (You can also flip their signs, if you like. And, indeed, eigen()-based calculation can give you negative-sign solutions. An all-negative solution is equivalent to an all-positive solution, because only the products of values matter.)
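The statements in this response can be checked directly in base R. This is only a sketch with an arbitrary illustrative matrix; `is_eigvec()` is a hypothetical helper, and the code calls `eigen()` itself rather than sna's evcent().

```r
# Illustrative symmetric weighted adjacency matrix
A <- matrix(c(0, 3, 1, 0, 2,
              3, 0, 2, 1, 0,
              1, 2, 0, 4, 1,
              0, 1, 4, 0, 3,
              2, 0, 1, 3, 0), nrow = 5, byrow = TRUE)

e <- eigen(A, symmetric = TRUE)
lambda <- e$values[1]    # largest eigenvalue
v <- e$vectors[, 1]      # corresponding eigenvector (sign is arbitrary)

# eigen() returns eigenvectors of unit Euclidean length
print(sqrt(sum(v^2)))

# Helper: does x satisfy A x = lambda x (up to numerical tolerance)?
is_eigvec <- function(x) isTRUE(all.equal(as.vector(A %*% x), lambda * x))

# The original vector, its sign flip, and rescaled versions all qualify
checks <- c(original   = is_eigvec(v),
            flipped    = is_eigvec(-v),
            sum_to_one = is_eigvec(v / sum(v)),
            sum_to_pi  = is_eigvec(pi * v / sum(v)))
print(checks)
```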

Although I believe I understand their answer, I am still unsure how to modify the code, either in ‘igraph’ or ‘sna’, to obtain comparable results for eigenvector centralization. Could you please offer some suggestions? Thank you!