User-defined distance for unconnected node in closeness, and harmonic centrality

Hi there,
recently, I have been comparing implementations of various centralities in Python graph libraries and it has been surprisingly difficult to obtain identical results.

I particular, I would like to obtain similar results for closeness centrality for directed graphs, where some nodes may have zero in or out-degrees.
Depending on the definition, one can decide to have undefined (nan) or zero centralities for such nodes.
However, igraph currently replaces the distance of these nodes by the number of vertices in the graph. While this is one choice among others, it does not necessarily fit the user’s needs.

How difficult would it be to propose a user-defined choice for this length? (e.g. between nan, 0, and using the number of vertices).

I have given a quick look at the implementation but I must say that, for somebody discovering igraph’s internals for the first time, the algorithm is not very easy to read…
In principle, I do not think the proposed change is difficult to implement and I may be able to propose an implementation if somebody can give me a brief overview of the algorithm but I’m afraid I do not have time to try to figure it out on my own.

In addition to this, a similar question would be whether it is possible to implement the harmonic definition of the closeness centrality:

c_i = \frac{1}{N-1} \sum_{j=1}^N\frac{1}{d_{ij}}

EDIT: to contextualize, I think this feature would provide several benefits,

  • more versatile
  • provide ways to get similar results between graph-tool, igraph, and networkx in python
  • eventually remove the need for the runtimewarning for “disconnected graph” (they are generally not) if the closeness returned for nodes with zero in/out-degree is NaN by default (could be done in a few releases if deemed desirable); a warning can be overlooked while a NaN tends to stand out

I fully agree that the handling of non-connected graphs is currently not very good. We are looking for solutions to this, and we welcome ideas (and references) on how to handle such graphs.

If I understand you correctly, you are talking only about the case of zero-degree vertices (or zero in- or out-degree ones). This is just one special case of a non-connected graph, and I do not see why it should get special treatment. A proper solution would address all non-connected graphs.

Can you clarify why you are looking to treat zero-degree vertices specially?

Personally, I would say that closeness centrality simply does not make sense for non-connected graphs. No matter what distance is chosen for non-connected pairs of vertices, the interpretation of the result would be dubious.

Can you provide references for this concept?

I already have a 95% done pull request for global efficiency defined as

E = \frac{1}{N(N-1)} \sum_{i,j} \frac{1}{d_{ij}}

Adding what you mention would be easy.

Hi @szhorvat, thanks for the quick answer!

Non-connected graphs and zero-degree nodes:
I’m not really talking about ‘special-treatment’ for zero degree nodes; properly defining the distance value for disconnected nodes is a large part of how non-connected graphs are handled with respect to closeness.
Both networkx and graph-tool use, by definition, d_{ij} = 0 if there is no path for i to j, unfortunately networkx does not follow the logic to the end and sets the centrality of fully disconnected nodes to zero, while graph-tool consistently sets there centralities to NaN.
With this choice of distance, the last question is how to normalize, which both nx and gt do via the size out the node’s out-component.
However, networkx does provides a better normalization procedure via the wf_improved argument, which uses the the Wasserman and Faust formula.

Harmonic closeness:
I think the easiest is just to look at the wikipedia section which has all relevant references.
To me, the harmonic implementation is much more intuitive and clean since, taking d_{ij} = \infty if there is no path for i to j, there is no special case and there is a single way to normalize, which is by N - 1, regardless of the number of components.

Can you please open a new issue for harmonic centrality, and link back here?

Since I’m not supposed to open feature requests there, I would open an issue regarding closeness centrality in graphs with “non-connected” graphs and propose the different definition of distance, the networkx renormalization, and the harmonic centrality as potential solutions, is that what you meant?

It’s okay, I created an issue: https://github.com/igraph/igraph/issues/1373

The issue for closeness in disconnected graphs is here:

Since you are already looking at what various other packages do, it would be useful if you could summarize your findings and post them either here or in that issue. You already did that for networkx and graph-tool. Thanks!


Just a small note: Regarding your description of what networkx does. You focus on what the distance to non-reachable nodes should be. I found this way of putting it quite confusing. It does not take the distances to non-reachable nodes to be zero. Instead, it simply ignores non-reachable nodes. There is no distance to those included in the calculation.

This is one reason why I think that asking “what the distance to non-reachable nodes should be” is the wrong approach.

BTW Mathematica does the same: it ignores non-reachable vertices. Of course, this is not a very useful thing to do because it results in large centralities for vertices in small connected components. The Wasserman paper tries to address this, but what they do seems rather ad-hoc … I should read the paper to see if they have any well-founded justification.

OK, I’ll update them with some data; I’m afraid I’m only using networx, igraph, and graph-tool, though, so i already told you all I know regarding what happens in other libraries.

Regarding you’re remark on the distance, for the arithmetic definition of closeness, ignoring the distance or setting it to zero is almost exactly the same, except for the fact that defining the distance gives a non-arbitrary value of the closeness for fully disconnected nodes (NaN), which I think provides a consistent answer the “non-connected” issue.

As for the normalization, I agree, the formula is somehow ad-hoc, however (at least to me) the whole arithmetic definition of closeness is somewhat “awkward”, and this is the best option I’ve found so far, apart from moving to the harmonic definition altogether… which would be the better option as far as I’m concerned.

I think we are not on the same page.

Consider the disjoint union of two cycle graphs of size 3 and size 11:

image

If we ignore non-reachable nodes, the closeness in the small component is 1, in the big one it is 1/3, for each vertex.

If we set the distance to non-reachable nodes to zero, the values are 13/2 and 13/30, very different.

Ah, maybe my formulation was not clear enough: for the arithmetic definition, the normalization is done by n - 1 where n is the number of nodes in the out/in-component of the nodes (depending on the mode).
In your examples, that would be respectively 2 and 10.
Since this is not necessarily meaningful, the Wasserman and Faust formula tries to improve this slightly.
But I agree that all of this leads to convoluted and somewhat awkward definitions.
Yet, this is, to my knowledge, the least inconsistent way of defining arithmetic closeness.

The harmonic definition, however, directly uses N - 1, where N is the total number of nodes in the graph (here, indeed, non-connected graphs and two separate graphs give different centralities, which is normal to me).

I updated the issues on GitHub, including examples, hopefully this should clarify things.

1 Like