Eivenvector centralities & connected components

Hi,
I have recently encountered a somewhat weird behaviour when calculating eigenvector centralities (with the evcent function). I have a graph with ~600k edges and ~6k nodes, and I use the Lgl format as my input. I noticed a strange difference in eigenvector centralities depending on minor differences in the format.

If I specify my edges as follows:

'# node1'
'node2   weight2'  
'node3   weight3'
...

than, not surprisingly, many of my nodes will have zero eigenvector centrality.

However, if I put no space between # and node 1 (’#node1’), than most of the previously null values will be approx. 2.7e-18. Other centralities (betweenness, closeness, degree, transitivity) are unaffected, and in case of small, “test” graphs this does not happen.

Another issue is that I have noticed that the eigenvector centralities are calculated somewhat differently by iGraph than by, for example NetworkX. In the following graph (see below, Lgl format) node 3 is disconnected from the rest. NetworkX assigns an eigenvector centrality 0 to this node, while iGraph ~0.35, and I wonder what might be the cause of the difference.
(g.evcent(directed=False, scale=True, weights=Weights, return_eigenvalue=False))

I would greatly appreciete any help or suggestion with these issues.

Best wishes,
George

#----------------
#3
#0
1	
2	
4	
#1
2	
4	
#2
5	
#4
5

A few more details: I use macOS 10.15 and python igraph 0.82.

Fore the small example graph (NetworkX vs iGraph) I used the following code:

g = Graph.Read_Lgl(sys.argv[1], names=True, weights="if_present", directed=False)
Weights = g.es["weight"]  
EVcent = g.evcent(directed=False, scale=True, weights=Weights, return_eigenvalue=False)

(For each edge the weigth was set as 1.)

I also mixed up the outputs for the graph with 600k edges: ‘#node’ results in zero EV centralities for some nodes, while ‘# node’ (with space) results in values ~2e-18 for the same nodes, which have typically either very few, or no edges. However, only EV centralities are affected, the other node characteristics I tested are not.

Best wishes,
George

Perhaps we should first establish whether the graphs are correctly established. The provided example:

#----------------
#3
#0
1
2
4
#1
2
4
#2
5
#4
5

leads to the following graph:

As you see, the first line is also picked up as a node. I am fairly sure that this is not what you intend?

Another point of attention is that the node identifiers are parsed as names, not as integers. That is,

>>> G.vs['name']
['----------------', '3', '0', '1', '2', '4', '5']

Secondly, the eigenvector centrality of node with name '3' does have an eigenvector centrality of 0. However, it is the second node in terms of integer identifier:

>>> G.evcent()
[0.0,
 0.0,
 1.0,
 1.0,
 0.9278862533179941,
 0.9278862533179941,
 0.6498320515110046]

Does this clarify things for you, or does a problem remain?

Hi,
Thank you; I didn’t use the “#--------” as a node, I just put it as a separator from the text. There is one thing that is stil unclear however, if every edge has weight 1 (I actually used a graph with weights), than the centrality changes.

g = Graph.Read_Lgl(sys.argv[1], names=True, weights="if_present", directed=False)
print g.evcent()

=>  [0.0, 1.0, 1.0, 0.9278862533179943, 0.9278862533179946, 0.6498320515110049]

(node 3 has EVcent 0.0)

adding weights:

Weights = g.es["weight"]
print g.evcent(weights=Weights) 

=>  [0.35046675862221427, 0.9999995855898569, 1.0, 0.9265038303219337, 0.9265034848321988, 0.6495322360646949]

(node 3 has EVcent ~0.35)

Node 3 is still disconnected, and I assumed that no edge weigths equals weights = 1, but apparently this is not the case; I wonder why.

Best wishes,
George

I don’t understand the graph format you are quoting here. Can you explain it?

Also, please put all code, and all graph specifications into code blocks for readability.

It’s good to note that eigenvector centrality is not really meaningful in non-connected graphs. The largest component will dominate, and any result for the other components will be meaningless (usually zero).

You might decompose the graph into components and compute eigenvector centrality for each component separately. Vertex centralities will not be comparable between components, but they will be within components.

Regarding networkx, as far as I know it uses power iteration to compute eigenvector centralities. igraph uses ARPACK (i.e. more robust methods). For connected undirected graphs with all-positive weights, the results should match. In pathological cases, the result from a power iteration may be unreliable.

Regarding the file format @szhorvat, it is the LgL format.

The results are indeed somewhat unexepected, I am not yet sure what is happening. Just working with a small example:

G = ig.Graph.Formula('a - b - c - d - a, e')

yields eigenvector centralities

[1.0, 0.9999999999999998, 0.9999999999999996, 1.0, 0.0]

and weighted eigenvector centralities (with weights of 1 everywhere)

[0.9999988952517365, 1.0, 0.9999989940097329, 0.9999991131820248, 0.4999995532246091]

Note the 0.5 for node e. In principle I would indeed expect the two to be identical as well.

Do you have any immediate idea @szhorvat?

Thanks; I’m using LGL format, with weights (see below), and I assumed that having edges with weights 1 is the same as not having weights. But this is not the case, the EV centrality of the disconected node 3 is 0.0 without weigths and ~0.35 with weights, which I find somewhat counterintuitive.

#3
#0
1 1
2 1
4 1
#1
2 1
4 1
#2
5 1
#4
5 1

This line (i.e. igraph_inclist_remove_duplicate) is suspect:

I thought it should not be there for different reasons, so I removed it, and it turns the 0.5 into 0.

I don’t have time to investigate this further today, but I’ll tell you why I though that line was wrong:

I assume it simplifies multigraphs (didn’t take the time to understand it fully). But then there may be a mismatch between the weights vector and the edge indices …

We need to fix this before 0.8.3 (so please don’t go ahead with 0.8.3 @tamas)

As far as I can tell from here:

removing that line does not break anything, even for multigraphs. It will simply effectively add up the weights of parallel edges, which is fine, and which can be documented.

But we need to figure out what to do with self-loops (those are removed in the unweighted case).

1 Like

If that is the case, can we simply say that for non-connected graphs, it is enough to calculate the eigenvector centrality for the largest component and return all-zeros for the rest?

No, they aren’t. What that line does is that it removes one stub of each loop edge in the adjacency list for each vertex while keeping the other. So, if node 2 has a self-loop, it will appear twice in its own adjacency list, and the function keeps one. If node 2 has two self-loops, it will appear four times, and the function keeps two. So I think it’s correct.

1 Like

I don’t think so. I need to take the time to fully understand the theory.

If the graph is disconnected, the adjaceny matrix is block-diagonal. The eigenvalues/vectors of the two blocks are essentially independent. Technically, eigenvector centrality is the eigenvector of the leading eigenvalue, which usually, but not always belongs to the largest block.

For example, let’s take this graph:

The K_4 complete graph has eigenvalue 3, while the other component has eigenvalue 2.85, even though it has more than 4 vertices. So in this case, the leading eigenvector belongs to the smaller component of the two.


All that said, I think that in practice it is simply not useful to compute eigenvector centralities for disconnected graphs. What that effectively does is to select just one component, and discard the rest. Why should the rest be discarded? The correct way to do it, in my opinion, is to decompose the graph and do the computation for each component separately. But this should be left to the user—igraph should not do this automatically. The docs of the C interface do already suggest doing this.

Should we perhaps simply throw an error and say that eigenvector centrality is not supported for disconnected graphs? As you explain, it simply does not make much sense. It is no problem to break it into different components and getting the centralities separately, so why not simply enforce this?

So the eigenvalues of the entire adjacency matrix simply consists of the eigenvalues of all individual blocks. It then comes down to the question which block, i.e. connected component, has the largest eigenvalue. There are quite some bounds on the largest eigenvalue of a graph, see https://dx.doi.org/10.1080/03081089008818026. However, I don’t think you can say in advance which connected component will have the larger eigenvalue in general. Of course, given these bounds, you can say something for some specific cases.

1 Like

I’m not sure whether we should really leave it to the user, so let me phrase an argument for both sides:

  • Decomposing the graph automatically would automatically provide the “expected” result for the average user. Whether the decomposition should be left for higher level interfaces is up for debate; it is definitely harder to do it in the C layer, but not doing so will inevitably mean that some higher level interfaces will decompose automatically while others won’t, which might lead to confusion.

  • On the other hand, simply calculating the eigenvector centrality component by component can be misleading is there is one very large component (with, say, thousands of vertices) and a very small one with two vertices and one edge only. The centralities in the small component will be set to sqrt(2) / 2 if you calculate the eigenvector centrality for it on its own, which will probably outshadow the eigenvector centralities from the very large component - giving the impression that the vertices in the small component are somehow more important.

Some random thoughts, some of which try to address your comments (both of you) above:

  • There is also the directed case, which is a bit more complicated. We’ll need to take the time to fully understand that case.

  • I am not in favour of auto-decomposing: (1) people will be confused about why the result is different from just about every other package’s result (2) users will naively assume that centrality values are comparable across componets—they’re not (this relates to Tamás’s 2nd point above).

  • Throwing an error for the disconnected case: If we were to do this, it’s probably better to just issue a warning. Do not deny people to see the result, even if it may not be meaningful.

  • If some weights are zero that may create an “effectively disconnected” graph. I am not sure it’s worth going to the trouble to detect this case.

  • While it’s good to try to save users from shooting themselves in the foot, there is a limit to how much hand-holding is productive. One should have a minimum level of understanding of a concept before using that concept. We should trust users to take the time to learn about the concepts before using them (even though it is inevitable that some will not). Personally, I think that the best approach is to have good documentation instead of having functions that try to detect and prevent any misuse. A typical package might simply say “This function computes the Foobar”. We should say, “This function computes the Foobar. Foobar is defined as xyz. igraph normalizes Foobar in such and such way, which may be different from other packages. Foobar is meaningful to use in such and such situations.”

The biggest missing piece for me is fully understanding the directed case.

The networkx documentation is pretty good, actually:

But it gets sloppy about the Perron-Frobenius theorem: it does not address when the theorem doesn’t apply.

It turned out that it was a bug in a lower-level routine that was triggered for the weighted case, and it mistakenly created a loop edge on isolated nodes; that’s the reason why the results were different. The bug is now fixed here. Nevertheless, it is still true that calculating eigenvector centrality for disconnected networks has its caveats, and you should decompose the graph into connected components first and calculate the centralities component by component.

Thank you. A comment from the perspective of the user: if people will have to decompose their graphs themselves into disconnected units to calculate EV centralities, they will simply use libraries that calculate EV centrality for their entire graph, even if this has theoretical problems. On the other hand, providing EVcent values that are not comparable between different components of the graph is something that can potetially trigger lots of mistakes in interpretation, so the output format should make it very clear which values can be compared and which cannot. I wouldn’t worry about being different from other packages, especially if iGraph does it in the correct way …