Hi,
I have recently encountered some odd behaviour when calculating eigenvector centralities (with the evcent function). I have a graph with ~600k edges and ~6k nodes, and I use the LGL format as my input. I noticed a strange difference in eigenvector centralities depending on minor differences in the format.
If I specify my edges as follows:
'# node1'
'node2 weight2'
'node3 weight3'
...
then, not surprisingly, many of my nodes will have zero eigenvector centrality.
However, if I put no space between # and node1 ('#node1'), then most of the previously null values will be approx. 2.7e-18. Other centralities (betweenness, closeness, degree, transitivity) are unaffected, and in the case of small 'test' graphs this does not happen.
Another issue is that I have noticed that the eigenvector centralities are calculated somewhat differently by igraph than by, for example, NetworkX. In the following graph (see below, LGL format) node 3 is disconnected from the rest. NetworkX assigns an eigenvector centrality of 0 to this node, while igraph assigns ~0.35, and I wonder what might be the cause of the difference.
(g.evcent(directed=False, scale=True, weights=Weights, return_eigenvalue=False))
I would greatly appreciate any help or suggestion with these issues.
I also mixed up the outputs for the graph with 600k edges: '#node' results in zero EV centralities for some nodes, while '# node' (with a space) results in values ~2e-18 for the same nodes, which typically have either very few or no edges. However, only the EV centralities are affected; the other node characteristics I tested are not.
Secondly, the node with name '3' does have an eigenvector centrality of 0. However, it is the second node in terms of integer identifier:
Hi,
Thank you; I didn't use the '#--------' as a node, I just put it in as a separator from the text. There is one thing that is still unclear, however: if every edge has weight 1 (I actually used a graph with weights), then the centrality changes.
It's good to note that eigenvector centrality is not really meaningful in non-connected graphs. The largest component will dominate, and any result for the other components will be meaningless (usually zero).
You might decompose the graph into components and compute eigenvector centrality for each component separately. Vertex centralities will not be comparable between components, but they will be within components.
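A minimal sketch of that per-component approach, using only numpy; the adjacency-dict representation and the names `connected_components` and `evcent_per_component` are illustrative, not part of any igraph API:

```python
import numpy as np
from collections import deque

def connected_components(adj):
    """adj: dict mapping vertex -> set of neighbours (undirected).
    Returns a list of components, each a list of vertices."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:                      # plain BFS
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def evcent_per_component(adj):
    """Unit-norm leading eigenvector of each component's adjacency
    matrix, keyed by vertex. Values are comparable only within a
    component, not across components."""
    result = {}
    for comp in connected_components(adj):
        idx = {v: i for i, v in enumerate(comp)}
        A = np.zeros((len(comp), len(comp)))
        for v in comp:
            for w in adj[v]:
                A[idx[v], idx[w]] = 1.0
        vals, vecs = np.linalg.eigh(A)           # A is symmetric
        lead = np.abs(vecs[:, np.argmax(vals)])  # fix the sign of the Perron vector
        for v in comp:
            result[v] = lead[idx[v]]
    return result

# Two components: a triangle {0, 1, 2} and a single edge {3, 4}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
print(evcent_per_component(adj))
```

Within the triangle every vertex gets 1/sqrt(3) and within the edge every vertex gets 1/sqrt(2), which illustrates the point above: the two sets of numbers are each internally meaningful but not comparable to each other.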
Regarding networkx, as far as I know it uses power iteration to compute eigenvector centralities. igraph uses ARPACK (a more robust method). For connected undirected graphs with all-positive weights, the results should match. In pathological cases, the result from a power iteration may be unreliable.
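For illustration, here is roughly what such a power iteration looks like; this is a numpy sketch, not NetworkX's or igraph's actual code. Iterating with A + I rather than A is a common trick: it shifts all eigenvalues up by one without changing the eigenvectors, which prevents the oscillation that plain power iteration exhibits on bipartite graphs (one of the pathological cases alluded to above).

```python
import numpy as np

def power_iteration_evcent(A, tol=1e-12, max_iter=10000):
    """Eigenvector centrality of a connected undirected graph
    via power iteration on A + I (unit-norm result)."""
    n = A.shape[0]
    M = A + np.eye(n)                    # shift avoids bipartite oscillation
    x = np.ones(n) / np.sqrt(n)          # uniform, all-positive start
    for _ in range(max_iter):
        x_new = M @ x
        x_new /= np.linalg.norm(x_new)   # renormalise every step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# A "kite": triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(power_iteration_evcent(A))   # vertex 2 comes out most central
```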
Thanks; I'm using the LGL format, with weights (see below), and I assumed that having edges with weight 1 is the same as not having weights. But this is not the case: the EV centrality of the disconnected node 3 is 0.0 without weights and ~0.35 with weights, which I find somewhat counterintuitive.
This line (i.e. igraph_inclist_remove_duplicate) is suspect:
I thought it should not be there for different reasons, so I removed it, and it turns the 0.5 into 0.
I don't have time to investigate this further today, but I'll tell you why I thought that line was wrong:
I assume it simplifies multigraphs (I didn't take the time to understand it fully). But then there may be a mismatch between the weights vector and the edge indices ...
We need to fix this before 0.8.3 (so please don't go ahead with 0.8.3 @tamas)
Removing that line does not break anything, even for multigraphs. It will simply, in effect, add up the weights of parallel edges, which is fine and which can be documented.
But we need to figure out what to do with self-loops (those are removed in the unweighted case).
If that is the case, can we simply say that for non-connected graphs it is enough to calculate the eigenvector centrality for the largest component and return all zeros for the rest?
No, they aren't. What that line does is remove one stub of each loop edge in the adjacency list of each vertex while keeping the other. So if node 2 has a self-loop, it will appear twice in its own adjacency list, and the function keeps one occurrence. If node 2 has two self-loops, it will appear four times, and the function keeps two. So I think it's correct.
I don't think so. I need to take the time to fully understand the theory.
If the graph is disconnected, the adjacency matrix is block-diagonal. The eigenvalues/eigenvectors of the blocks are essentially independent. Technically, eigenvector centrality is the eigenvector of the leading eigenvalue, which usually, but not always, belongs to the largest block.
The K_4 complete graph has leading eigenvalue 3, while the other component has leading eigenvalue 2.85, even though it has more than 4 vertices. So in this case, the leading eigenvector belongs to the smaller of the two components.
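This is easy to check numerically with numpy. Since the exact graph with eigenvalue 2.85 is not given here, a 6-cycle (leading eigenvalue 2) stands in for the larger component; the point is the same: the spectrum of the disconnected union is the union of the blocks' spectra, and the leading eigenvector is supported entirely on the K_4 block.

```python
import numpy as np

# K_4 block: leading eigenvalue 3 (the graph is 3-regular).
K4 = np.ones((4, 4)) - np.eye(4)

# C_6 block: a 6-cycle, leading eigenvalue 2, despite having more vertices.
C6 = np.zeros((6, 6))
for i in range(6):
    C6[i, (i + 1) % 6] = C6[(i + 1) % 6, i] = 1.0

# Block-diagonal adjacency matrix of the disconnected union.
A = np.zeros((10, 10))
A[:4, :4] = K4
A[4:, 4:] = C6

vals, vecs = np.linalg.eigh(A)
lead = np.abs(vecs[:, np.argmax(vals)])
print(vals.max())   # close to 3, coming from the K_4 block
print(lead[4:])     # essentially zero on the cycle's vertices
```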
All that said, I think that in practice it is simply not useful to compute eigenvector centralities for disconnected graphs. What that effectively does is to select just one component, and discard the rest. Why should the rest be discarded? The correct way to do it, in my opinion, is to decompose the graph and do the computation for each component separately. But this should be left to the user; igraph should not do this automatically. The docs of the C interface do already suggest doing this.
Should we perhaps simply throw an error and say that eigenvector centrality is not supported for disconnected graphs? As you explain, it simply does not make much sense. It is no problem to break the graph into components and get the centralities separately, so why not simply enforce this?
So the eigenvalues of the entire adjacency matrix simply consist of the eigenvalues of all the individual blocks. It then comes down to the question of which block, i.e. which connected component, has the largest eigenvalue. There are quite a few bounds on the largest eigenvalue of a graph, see https://dx.doi.org/10.1080/03081089008818026. However, I don't think you can say in advance which connected component will have the larger eigenvalue in general. Of course, given these bounds, you can say something for some specific cases.
I'm not sure whether we should really leave it to the user, so let me phrase an argument for both sides:
Decomposing the graph automatically would provide the 'expected' result for the average user. Whether the decomposition should be left to higher-level interfaces is up for debate; it is definitely harder to do in the C layer, but not doing so will inevitably mean that some higher-level interfaces will decompose automatically while others won't, which might lead to confusion.
On the other hand, simply calculating the eigenvector centrality component by component can be misleading if there is one very large component (with, say, thousands of vertices) and a very small one with only two vertices and one edge. The centralities in the small component will be sqrt(2) / 2 if you calculate the eigenvector centrality for it on its own, which will probably overshadow the eigenvector centralities from the very large component, giving the impression that the vertices in the small component are somehow more important.
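That effect is easy to reproduce numerically; here is a small numpy check, with the complete graph K_100 standing in for the "very large" component:

```python
import numpy as np

# Unit-norm eigenvector centrality within each component, computed separately.

# Large component: K_100. Its leading eigenvector is uniform, so every
# vertex gets 1 / sqrt(100) = 0.1.
A_big = np.ones((100, 100)) - np.eye(100)
vals_b, vecs_b = np.linalg.eigh(A_big)
big = np.abs(vecs_b[:, np.argmax(vals_b)])

# Small component: a single edge. Both vertices get sqrt(2) / 2 ~ 0.707.
A_small = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
vals_s, vecs_s = np.linalg.eigh(A_small)
small = np.abs(vecs_s[:, np.argmax(vals_s)])

# The two-vertex component "looks" far more central than anything in K_100.
print(big[0], small[0])
```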
Some random thoughts, some of which try to address your comments (both of you) above:
There is also the directed case, which is a bit more complicated. We'll need to take the time to fully understand that case.
I am not in favour of auto-decomposing: (1) people will be confused about why the result is different from just about every other package's result; (2) users will naively assume that centrality values are comparable across components, and they're not (this relates to Tamás's 2nd point above).
Throwing an error for the disconnected case: if we were to do this, it's probably better to just issue a warning. Don't prevent people from seeing the result, even if it may not be meaningful.
If some weights are zero, that may create an 'effectively disconnected' graph. I am not sure it's worth going to the trouble of detecting this case.
While it's good to try to save users from shooting themselves in the foot, there is a limit to how much hand-holding is productive. One should have a minimum level of understanding of a concept before using that concept. We should trust users to take the time to learn about the concepts before using them (even though it is inevitable that some will not). Personally, I think that the best approach is to have good documentation instead of having functions that try to detect and prevent any misuse. A typical package might simply say 'This function computes the Foobar'. We should say, 'This function computes the Foobar. Foobar is defined as xyz. igraph normalizes Foobar in such and such a way, which may be different from other packages. Foobar is meaningful to use in such and such situations.'
The biggest missing piece for me is fully understanding the directed case.
It turned out that it was a bug in a lower-level routine that was triggered for the weighted case, and it mistakenly created a loop edge on isolated nodes; that's the reason why the results were different. The bug is now fixed here. Nevertheless, it is still true that calculating eigenvector centrality for disconnected networks has its caveats, and you should decompose the graph into connected components first and calculate the centralities component by component.
Thank you. A comment from the perspective of the user: if people have to decompose their graphs into disconnected units themselves to calculate EV centralities, they will simply use libraries that calculate EV centrality for the entire graph, even if this has theoretical problems. On the other hand, providing EV centrality values that are not comparable between different components of the graph is something that can potentially trigger many mistakes in interpretation, so the output format should make it very clear which values can be compared and which cannot. I wouldn't worry about being different from other packages, especially if igraph does it the correct way ...