Replicating 4 directed assortativities in the literature with knn "mode"

Four types of directed assortativity were proposed in a Foster et al. PNAS paper (http://www.pnas.org/cgi/doi/10.1073/pnas.0912671107, see Figure 1) and are also summarized in Barabasi’s discussion of knn, box 7.3 of his text (Network Science by Albert-László Barabási; he cites Foster et al. in the printed text but not the online book). (Both figures are appended)

The types are in-in, in-out, out-in, and out-out, defining whether one is correlating the in-degree or out-degree of the source vertex with the in- or out-degree of the target vertex. In all cases, their figures and mathematics indicate that the target vertex is found by following edges out of the source in the direction of the edge.

When I first saw the igraph knn documentation (igraph R manual pages), I thought that “mode” would specify whether the in-degree or out-degree of the source vertex is used, and “neighbor.degree.mode” would specify whether the in- or out-degree of the target vertex is used for the correlations, so it would be simple to replicate the 4 cases of the above publications. But it turns out this is not quite right, as “mode” also specifies how to reach neighbors.

For mode: “the type of neighbors to consider in directed graphs. out considers out-neighbors, ⁠in⁠ considers in-neighbors”: I read this to mean “mode” is both how to find the neighbors AND which degree of the source vertex will be used for the k of knn(k). We can’t separate the choice of how to find neighbors and what is used for ego node k.

For neighbor.degree.mode: “The type of degree to average in directed graphs. out averages out-degrees, ⁠in⁠ averages in-degrees and all ignores edge directions for the degree calculation.”: This is clearly which degrees will be used to compute the knn(k) once the neighbors are reached: no problem here.

Given that these readings are correct, I’m trying to figure out how to specify the 4 assortativities of Foster et al. I’ve got 3 of them but not the 4th, which is hampered by the conflation of how to find neighbors with what degree to use for the source node. The cases:

Pattern “out-in” is of interest because it is the situation specified by Newman’s Mixing Patterns in Networks, Physical Review 2003, formula 25, and the documentation of igraph::assortativity. We want to see whether the in-degree of vertices reached by following out-edges are correlated with the originating vertices’ out-degree. So we give knn mode=“out” to both reach the desired neighbors and use out-degree for “k” in knn(k), and neighbor.degree.mode=“in” to specify that the in-degree of neighbors is to be computed for knn. No problem here.

Pattern “out-out” is also straightforward: mode=“out” both finds the out-degree neighbors and uses out-degree for k, while neighbor.degree.mode=“out” averages the neighbors’ out-degree.

The patterns starting with “in” puzzled me because the figures in the above mentioned papers imply one is finding neighbors by following links in the out direction in all 4 cases. But I made some progress when I realized that the “in-in” situation is symmetric: mode=“in” will take us from the right hand node to the left hand node in the figures, going backwards over the link (as well as specify the correct k), and neighbor-degree.mode=“in” computes the average in-degree of the left hand nodes so reached.

The one I am stuck on is “in-out”: Their figures show that we are reaching neighbors via out-degree but we want to use the in-degree of the ego nodes we started at for the “k” of knn(k). Yet a single parameter “mode” specifies both how to reach neighbors and what is the k, and it can’t have two values. If I use mode=“out” to get to the neighbors in the direction of the links I am using out-degree rather than in-degree for k. If instead I use the reversed strategy and set mode=“in” to take us from the right hand mode to the left hand node, that implies we are using in-degree on the right hand side, where we want out as shown in the figure. Specifying (in, out) creates a situation not actually shown in their figures.

In summary, there is a conflict between how we reach neighbors and which degree to use for the “k” of knn(k). There does not seem to be a way to get neighbors via out-going edges but then correlate ego node in-degree with these neighbors.

My question is: Am I correct in this analysis, or am I missing something (after hours of puzzling over it)? Is there a way to get the “(in, out)” situation of Foster et al. with igraph::knn?

Thanks for your consideration,
Dan Suthers

This is a long post, so for now I’ll only answer this paragraph. If you are just looking to compute these assortativities, it is possible using assortativity().

Just set type1 and type2 to various combinations of in/out degrees as appropriate.

“type” here is a misnomer, and this will be corrected in a future version. These are not types (i.e. not categorical variables), but vertex values (i.e. continuous variables). It is the correlation of these values between directionally connected vertices that the function computes.

The C library has better documentation on this, and the R package is in the process of being updated as well. See igraph Reference Manual

I’ll look at the rest of your question later.

1 Like

@krlmlr It is strange that we already changed the type1 and type2 parameter names to values and values.in in the igraph-0.10 branch (through auto-generation), which is a breaking change. Yet somehow this did not cause any revdepcheck failures. Is no one using this?

I think your analysis is correct, so the answer to this question is “no”:

Is there a way to get the “(in, out)” situation of Foster et al. with igraph::knn?

But if you only need the assortativities, and not the k_{nn}(k) functions, then this can be accomplished with assortativity(), as I said above.


As you say, there could be three relevant “modes” for k_{nn}(k): how we reach neighbours (mode), how we compute the degree of neighbours (neighbor.mode), and how we compute the degree of the source (this is now always the same as mode).

1 Like

As I’m looking at this more carefully, I’m a bit confused about whether we are doing the weighted calculation of k_{nn}(k) correctly, @vtraag. Can you take a look?

My understanding of k_{nn}(k) is this:

We take all edges whose source is a degree-k vertex, and record the degree of the target. Then we average these target degrees.

What igraph actually does is compute the mean neighbour degree of each individual vertex i. This way we obtain k_{nn,i} for vertex i. Then it averages the mean neighbour degrees of vertices with degree k. This happens to be equivalent with the above definition of k_{nn}(k) because each source vertex of degree k has k neighbours.

However, if we implemented the generalization proposed by @dan_suthers, then the type of degree we consider for the source may not be the same as the number of its neighbours we looked at. So the above two approaches would no longer be equivalent.

We do have the same problem with the weighted generalization. It computes a weighted average of neighbour degrees, based on the weights of edges pointing to them. So once again the weighted average of target vertex degrees (taking the average over all directed edges) is not equal to the mean of weighted average neighbour degrees. igraph computes the latter. I think the correct way would be this:

In pseudocode code, igraph does this for the weighted case:

for all vertices u:
    strength = 0
    sum = 0
    for all neighbors v of u:
        strength += weight(u -> v)
        sum += degree(v)
    knn[u] = sum/strength

    du = degree(u)

    # the difference is here:
    knnk[du] = += knn[u]
    deghist[du] += 1

for all degrees d:
    knnk[d] = knnk[d] / deghist[d]

This seems wrong to me.

I think what it should be doing is:

for all vertices u:
    strength = 0
    sum = 0
    for all neighbors v of u:
        strength += weight(u -> v)
        sum += degree(v)
    knn[u] = sum/strength

    du = degree(u)

    # the difference is here:
    knnk[du] = += sum  # this is just knn[u]*strength
    deghist[du] += strength

for all degrees d:
    knnk[d] = knnk[d] / deghist[d]

Do you agree @dan_suthers and @vtraag ?

I convinced myself that:

  • The weighted calculation is buggy
  • knn and knnk should really be computed separately (not in a single function), and for knnk the relevant “modes” are the source and target degrees. For the edge connecting them we might choose whether we want to take the direction into account

Expect two PRs for these once I get the time (probably once the teaching part of the semester is over).

Thanks @szhorvat for your replies. I’m well aware of assortativity (had just taught it in the same class), but it had not occurred to me to point out to students that we can get the 4 assortativities of the Foster et al. and Barabasi figures with that by retrieving the appropriate degree sequences ourselves and supplying these as the values. (Good idea to rename “type1” and “type2” to values. I appreciate your continued work on improving igraph.)

I was focused on knn, because it supports succinct plots of degree correlations, and of course gets all the data for you. We were trying to use it to plot all 4 assortativities of a High Energy Physics citation data set I use for examples, and I was stumbling on explaining the “in” cases when the figures show following an outgoing link. Thanks for confirming that my analysis is correct concerning knn mode, and in the future I’ll use assortativity rather than knn when showing students the 4 assortativities (or at least be able to point out this limitation of knn with greater confidence). – Dan

@krlmlr It is strange that we already changed the type1 and type2 parameter names to values and values.

Maybe “values1” and “values2” would be clearer and more consistent.

I am not sure I am following the suggested change to weight computation (which I have not thought about much before now). I probably need more time to study this, but what it reminds me of is Newman’s (textbook) discussion of how Watt-Strogatz clustering or average local CC is node-centric (overweighting low degree nodes), while the triangle/two-paths definition of transitivity is edge-centric, weighting all edges equally. Depends on what you want to emphasize?

See fix: correct calculation of knn(k) in avg_nearest_neighbor_degree() by szhorvat · Pull Request #2419 · igraph/igraph · GitHub