I have a question regarding the best practices for calculating centrality metrics and community structure in weighted, undirected networks that contain both positive and negative edge weights (representing correlations).
My goal is to analyze a microbial co-occurrence network where edges represent SparCC correlations, ranging from -1 to +1.
When I try to calculate metrics on my graph object g, I encounter errors with several key functions:
- Betweenness Centrality: betweenness(g) fails with the error "Weight vector must be positive". This seems to be triggered by negative cycles in the graph.
- Modularity: cluster_fast_greedy(g) fails with the error "Weights must not be negative".
- Closeness Centrality: Even after transforming weights to a distance metric like 1 - abs(E(g)$weight), closeness(g, weights = …) can fail with "Weight vector must be positive" if any correlation has an absolute value of exactly 1, resulting in a zero-weight edge.
This leads to my main questions:
1. What is the canonical igraph approach for calculating betweenness() on a graph with meaningful negative weights?
* Is it standard practice to use the absolute value of the weights, like betweenness(g, weights = abs(E(g)$weight)), to represent connection strength? Does this approach have any known theoretical drawbacks I should be aware of?
2. How should cluster_fast_greedy() (or other modularity functions) handle networks with negative (antagonistic) relationships?
* Should negative edges be removed before calculating modularity (e.g., by creating a subgraph with only positive weights)?
* Or is there a specific community detection algorithm within igraph that is designed to handle signed networks correctly?
3. What is the robust way to calculate closeness() with weights derived from correlations?
* Is transforming weights using 1 - abs(E(g)$weight) the correct conceptual approach to represent “distance”?
* If so, how should the “zero-distance” edge case (when abs(r) = 1) be handled to avoid the error? Is adding a small epsilon, like (1 - abs(E(g)$weight)) + 1e-10, a valid and accepted workaround?
I suggest thinking about this not as “how to calculate centrality X with negative weights”, but “does centrality X make any sense with negative weights”.
Betweenness is defined based on shortest paths, which make no sense with negative weights. You can try to transform weights into the positive range, but the interpretation of betweenness (or any centrality!) with a given weight transformation is not a trivial question …
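To make the point about weight transformations concrete, here is a minimal sketch in R. The toy edge list and its correlation values are made up for illustration, and the transformation 1 - abs(r) is the one proposed in the question, not a recommendation. It shows that betweenness() reads weights as path lengths, so the chosen transformation changes which vertex looks central.

```r
library(igraph)

# Toy signed network; the edge list and correlation values are made up.
edges <- data.frame(
  from   = c("A", "A", "B", "C", "D"),
  to     = c("B", "C", "C", "D", "E"),
  weight = c(0.9, -0.8, 0.3, 0.6, -0.7)
)
g <- graph_from_data_frame(edges, directed = FALSE)

r <- E(g)$weight

# Unweighted: only the topology counts (weights = NA ignores the "weight" attribute).
b_topology <- betweenness(g, weights = NA)

# Weighted: igraph reads weights as path lengths, so map strong |r| to short edges.
d <- 1 - abs(r)
b_distance <- betweenness(g, weights = d)

# Passing abs(r) directly treats strong correlations as *long* edges instead.
b_strength_as_length <- betweenness(g, weights = abs(r))

print(rbind(b_topology, b_distance, b_strength_as_length))
```

In this toy graph, vertex A lies on shortest paths only under the 1 - abs(r) transformation, because the weak B-C correlation then becomes a long edge that paths avoid. Whether that reading is meaningful for a given data set is exactly the interpretation question raised above.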
igraph’s closeness functions currently disallow zero weights (the error message is correct: zero is not positive). To be honest, I do not recall why this decision was made. Feel free to open an issue and suggest allowing zero weights. This could potentially result in infinite closeness values but that is not necessarily a problem. Using tiny weights instead of zeros is a fine workaround for now. I recommend 10^{-16}.
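The tiny-weight workaround can be sketched as follows. The toy network and the names d_safe and eps are made up for illustration; the sketch assumes distances of the form 1 - abs(r) as in the question, and replaces only the exact zeros so that all other edges keep their original values.

```r
library(igraph)

# Toy network with one perfect correlation (values are made up).
edges <- data.frame(
  from   = c("A", "B", "C"),
  to     = c("B", "C", "D"),
  weight = c(1.0, -0.5, 0.4)
)
g <- graph_from_data_frame(edges, directed = FALSE)

d <- 1 - abs(E(g)$weight)   # |r| = 1 gives a distance of exactly 0

# Replace only the non-positive distances with a tiny positive value;
# every other edge keeps its original distance.
eps <- 1e-16
d_safe <- ifelse(d <= 0, eps, d)

cl <- closeness(g, weights = d_safe)
print(cl)
```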
As for cluster_fast_greedy(), I do not recommend this function, as igraph implements more effective methods that also maximize modularity, such as cluster_louvain() and cluster_leiden(). I would only use cluster_fast_greedy() if the other modularity maximization methods are too slow by comparison.
As for signed networks, I am not experienced with these and will refrain from giving advice. igraph has no community detection methods that are specifically designed for signed networks, although the CPM objective function of cluster_leiden() does support negative weights (but note that this maximizes a different objective function than modularity!). The classic modularity doesn’t really make sense for negative weights. Gomez et al.’s signed modularity is not currently implemented in igraph. Perhaps @vtraag or @schochastics can give more advice here.
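For reference, the positive-only subgraph option mentioned in the question could look like the sketch below. This is one possible workaround and not an endorsed method for signed networks: it simply discards the antagonistic edges. The toy edge list and its values are made up for illustration.

```r
library(igraph)

# Toy signed network: two positively correlated triangles joined by
# negative correlations (all values are made up).
edges <- data.frame(
  from   = c("A", "A", "B", "D", "D", "E", "C", "A"),
  to     = c("B", "C", "C", "E", "F", "F", "D", "F"),
  weight = c(0.8, 0.7, 0.9, 0.6, 0.8, 0.7, -0.9, -0.5)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Drop non-positive edges but keep every vertex.
g_pos <- delete_edges(g, E(g)[weight <= 0])

# Community detection reads weights as connection strengths (larger = tighter),
# the opposite of the path-length reading used by betweenness() and closeness().
comm <- cluster_louvain(g_pos)   # uses the "weight" edge attribute by default
print(membership(comm))
print(modularity(comm))
```

Note that the correlations are passed as they are here, without a distance transformation such as 1 - abs(r), because modularity-based methods treat larger weights as stronger ties.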
They are really helpful suggestions!
But I still have some problems.
When using betweenness() or closeness(), may I define the weight as (1 - abs(E(g)$weight)) + 1e-16? Or should I use abs(E(g)$weight) + 1e-16 instead? Or is this approach completely wrong? What would you do with a signed network that has both positive and negative weights?
I’m sorry to bother you again, but I’m really confused!
Sorry! I think I misunderstood the meaning of “weight” in betweenness(). I have now read some articles about it, and I suppose that many biological papers set it to NA because they only want to know how central certain genes or microbes are. In that case only the connections between them need to be kept, and the distances between them do not matter.
However, I’m still confused about how to use cluster_fast_greedy() or cluster_leiden(), and I’m not sure how to supply “weight” to these functions. Does it have the same meaning as in betweenness()?
Thank you for your time!