HITS algorithm inconsistent across R/igraph, networkx and Mathematica

Christian-T · 14 July 2021 11:47

Hi everyone ,

I am applying the HITS algorithm (Kleinberg, 1999) in my research and came across a strange issue: The results of the HITS algorithm differ between R, Python and Mathematica. Importantly, the results differ not only in their relative weight, but also in which nodes get assigned which weight.

Below, you find an example of how the code and the output look like (scroll to the end for code as text).

I am not astonished, that the weights are different between the three software packages, but also the relative weights are different!
Do you have any idea, where these differences might come from? Are the implementations of the algorithm different between the packages?

Would that mean that scientific results using the HITS algorithm are likely software specific?

Please note that I am not using the igraph implementation in python or mathematica.

Thanks a lot for clarifying this issue to me

I am working on a Microsoft Windows 10; Version 10.0.19042
The igraph version is 1.2.6.; R is in version 4.0.3 (2020-10-10) “Bunny-Wunnies Freak Out”

Code as text:

Mathematica:

HITSCentrality[Graph[{1 -> 3, 2 -> 1, 3 -> 6, 4 -> 6, 1 -> 5, 5 -> 4, 6 -> 1}]]

R:

library(igraph); 
g2 <- graph_from_literal(2-+1-+5-+4-+6-+1,1-+3-+6);
plot(g2); 
hub_score(g2, scale=TRUE)$vector;
authority_score(g2)$vector

Python:

import networkx as nx   
from networkx.algorithms import community   
G = nx.DiGraph()   
G.add_nodes_from([1,2,3,4,5,6])   
G.add_edges_from([(1,3),(2,1),(3,6),(4,6),(1,5), (5,4), (6,1)])   
print(nx.hits(G))

szhorvat · 14 July 2021 12:09

This is not a (possible) difference between R, Python and Mathematica, but R/igraph, networkx and Mathematica’s built-in HITS algorithm. I changed the title to avoid confusion. Note that there is more than one network analysis package for all these languages. Furthermore, igraph itself is exposed to all three of these languages.

Additionally, the Mathematica code you posted was invalid due to the use of Unicode arrows. I fixed this as well.

Christian-T · 14 July 2021 12:13

Thanks for naming my topic more precisely and for fixing the code

szhorvat · 14 July 2021 12:39

First, let’s look at your Mathematica code.

g = Graph[{1 -> 3, 2 -> 1, 3 -> 6, 4 -> 6, 1 -> 5, 5 -> 4, 6 -> 1}]
HITSCentrality[g]

You should be aware that this does not return the hub and authority scores for vertices 1, 2, 3, 4, 5, 6 but for 1, 3, 2, 6, 4, 5, in this order. Why? Because these numbers are not vertex indices but vertex names. You can get the vertex list using VertexList[g].

You probably want this instead:

g = Graph[{1,2,3,4,5,6}, {1 -> 3, 2 -> 1, 3 -> 6, 4 -> 6, 1 -> 5, 5 -> 4, 6 -> 1}]

This way vertex 1 is the 1st vertex, vertex 2 is the second, i.e. names and indices coincide.

The same applies to R/igraph as well, where vertex names and indices are not the same thing.

Hub and authority scores are the leading eigenvectors of AA^T and A^TA where A is the adjacency matrix. Note that the leading eigenvalue may be degenerate, so the solution may not be unique. This is precisely the case with your graph.

Taking your graph in Mathematica,

In[314]:= 
am = AdjacencyMatrix[g];
Eigensystem[am . Transpose[am]]

Out[315]= 
{{2, 2, 2, 1, 0, 0}, 
{{0, 1, 0, 0, 0, 1}, {0, 0, 1, 1, 0, 0}, {1, 0, 0, 0, 0, 0}, 
 {0,0, 0, 0, 1, 0}, {0, -1, 0, 0, 0, 1}, {0, 0, -1, 1, 0, 0}}}

As you can see, the first three eigenvectors all correspond to eigenvalue 2. Thus any linear combination of them is a valid hub score vector.

In short, the results from all three systems are correct.

You should be aware that the current version of igraph has a small bug: while both the hub and authority score vectors are correct, they may not be a matching pair in degenerate cases like yours. For more details, see

github.com/igraph/igraph

Hub and authority scores are not computed as a matching pair

opened 12:21PM - 01 Jul 20 UTC

szhorvat

todo

Let us denote hub and authority scores by `h`, `a` and the adjacency matrix and …its transpose by `A`, `At`. Then `h` and `a` are defined as - `A.At.h = alpha h` - `At.A.a = beta a` I.e. they're the leading eigenvectors of `A.At` and `At.A`, respectively. Warning: If you are looking at the book MEJ Newman: Networks, 2nd ed., the definitions are interchanged there! (It's a mistake in the text.) This is precisely how igraph computes them, using ARPACK. However, there is an additional expectation that `a` and `h` "match" each other in that `At.h ~ a` and `A.a ~ h`, where I used `~` to denote proportionality. If the leading eigenvalue is degenerate, and thus there are many possible solutions for `a` and `h`, then the results returned by igraph may not match, i.e. `At.h` may not be proportional to `a` and `A.a` may not be proportional to `h`. @vtraag Do you have practical experience with these scores? Are they still useful for anything if degeneracy arises? Still, it would make sense for igraph to guarantee returning a matching pair. Small example graph with degeneracy and its edge list: ![image](https://user-images.githubusercontent.com/1212871/86243162-2c466e80-bba6-11ea-9588-8007d1348629.png) {{1, 2}, {2, 1}, {2, 3}, {3, 2}, {3, 4}, {4, 1}}

The workaround is to compute the hub scores h first and use a = A^t h as the authority scores (instead of computing them separately).

szhorvat · 14 July 2021 13:47

The good news is that such degeneracy is unlikely to occur in large connected graphs constructed from empirical data.

Christian-T · 14 July 2021 15:12

Thanks for the fast and helpful answer! So it’s an issue of the specific graph and not of the software packages? That would be great.
I used this specific graph, as it is used in various examples in the Mathematica manual, but then this is maybe not the best example to try replicate the results.

Would the appropriate calculation of the eigenfactors in R look like this?

g2 <- graph_from_literal(2-+1-+5-+4-+6-+1,1-+3-+6)
am <- matrix(as_adjacency_matrix(g2), ncol=6)
am
#hub:  
eigen(am %*% t(am))
#auth:
eigen(t(am) %*% am)

Allow me two follow-up questions on some peculiarities that I noticed with Mathematica:

For this even simpler graph, the values for hub- and authority all become 0 in Mathematica, although one node clearly is different from the others. (In R/igraph I get a more realistic solution)

In:=    
g = Graph[{1,2,3},{1->3,2->3}]
HITSCentrality[g]

Out:= 
{{0., 0., 0.}, {0., 0., 0.}}

For undirected graphs, hub- and authority values should be the same, according to the igraph manual. Yet, somehow, Mathematica yields different values for hubs and authority:

In:=
g = Graph[{1, 2, 3, 4, 5, 6}, {1 -> 3, 2 -> 1, 3 -> 6, 4 -> 6, 1 -> 5, 5 -> 4, 6 -> 1}]
gundirected = UndirectedGraph[g]
HITSCentrality[gundirected]
Out:=
 {{0.239686, 0.09222, 0.173757, 0.137351, 0.145066, 
  0.21192}, {0.622962, 0.239686, 0.451606, 0.356986, 0.377038, 
  0.550794}}

The reason that I ask is that I want to replicate results from a research group who applied Mathematica on their data, by applying R/igraph to it now.

szhorvat · 14 July 2021 16:21

Those two results that Mathematica shows for gundirected are in fact the same, one is just scaled relative to the other. You can divide one by the other: the result vector will be constant.

In[415]:= Divide @@ HITSCentrality[gundirected]
Out[415]= {0.384753, 0.384753, 0.384753, 0.384753, 0.384753, 0.384753}

Since hub and authority scores are eigenvectors, they do not have a natural scaling. The values make sense relative to each other, but not as absolute numbers. In other works, you may choose your favourite way to normalize the scores.

It will be useful for you to also look at IGraph/M, which is the igraph interface of Mathematica. If you ever find a difference between the result produced by Mathematica’s built-ins vs IGraph/M, do let me know.

Do you mind linking to the paper you are trying to reproduce?

I don’t know what’s up with that. It looks like a bug. I suggest you report it to Wolfram Support.

I don’t really use R, so I’m in the same boat as you: I would need to check the documentation. However, you might use

if you want the vertex names and indices to coincide.

as_adjacency_matrix accepts a sparse=F option if you want a dense matrix.

I would look for ways to compute only the leading eigenvector of sparse matrices, as this will be much more efficient. I expect R must have a way, like all other numerically focused packages.

Christian-T · 15 July 2021 09:44

Ah great, thanks for clarifying that the vectors are scaled relative to each other.

The paper which I try to replicate is this one:
https://bera-journals.onlinelibrary.wiley.com/doi/full/10.1111/bjet.12938

It contains interesting suggestions on how to model learner’s knowledge structures using indices based on the HITS algorithm.

I wrote the Wolfram Mathematica support team about the bug. And I will try to get the sparse eigenvector thing running in R

szhorvat · 15 July 2021 13:42

Generally, if you have doubt about the validity of a hub/authority score result, you can verify it by checking that it is truly an eigenvector: multiply it by AA^T or A^TA and check that the result is the same as the original, scaled by some number. (The exception being that all-zeros is not an acceptable result.)

Topic		Replies	Views
eigen_centrality for a directed graph (R) Usage R	13	2174	27 August 2022
Eivenvector centralities & connected components Usage Python	28	2361	6 October 2020
Implementing the LDBC Graphalytics benchmark Usage C	14	1166	6 February 2021
C/igraph 0.9.0 Announcements C	0	777	16 February 2021
the result is different neo4j for eigenvector_centrality? Usage C	2	189	6 December 2023

HITS algorithm inconsistent across R/igraph, networkx and Mathematica

Related topics