I am trying to use the `subgraph_centrality` function on a protein–protein interaction network of about 11,000 nodes and 400,000 edges. It has been running for a couple of hours and hasn't finished. I have read in the documentation that the algorithm is only suitable for small graphs, but I haven't had problems with other centrality measures. How can I handle this problem?
Thank you very much in advance!
The computational costs of different centrality measures have nothing to do with each other, so it is not possible to predict how long computing one of them will take based on another. Some are inherently slow to compute, and subgraph centrality is one of them; hence the warning in the documentation.
My first advice is to look for other centrality measures.
If you really need this, make sure you are using an undirected graph. I just looked at the implementation. It is based on diagonalizing the adjacency matrix, so it only works for undirected graphs. In fact this measure was proposed specifically for undirected graphs.
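For example, a quick check and conversion before calling the function (a minimal sketch; `g` stands in for your graph object):

```r
library(igraph)

if (is_directed(g)) {
  # collapse reciprocal edge pairs into single undirected edges
  g <- as.undirected(g, mode = "collapse")
}
sc <- subgraph_centrality(g)
```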
If that doesn’t help, try to compute it with a system other than R. Mathematically, the definition is simple: it’s just the diagonal of the matrix exponential of the adjacency matrix. For symmetric matrices, this can be done efficiently by diagonalization, which is precisely what igraph does (evaluate `subgraph_centrality` to see the source code). The bottleneck is the call to R’s `eigen()` function, which seems a bit slow on my machine compared to other systems like Mathematica. Most of these systems will just use LAPACK for computing eigenvalues/vectors, so I suspect the difference is down to CRAN’s R using a slow reference BLAS/LAPACK by default. Using an R distribution that links to a fast BLAS/LAPACK, such as OpenBLAS or Apple Accelerate, might help tremendously. Alternatively, you can use a different system than R if you know that it links to a fast BLAS/LAPACK on your machine (NumPy, MATLAB, and Mathematica are all good bets).
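If you want to see where the time goes, here is a minimal sketch of the same computation done by hand (assuming `g` is your undirected graph). It uses the identity that if $A = V \Lambda V^T$, then the subgraph centrality of vertex $i$ is $\sum_j V_{ij}^2 e^{\lambda_j}$:

```r
library(igraph)

A <- as_adjacency_matrix(g, sparse = FALSE)  # dense symmetric adjacency matrix
es <- eigen(A, symmetric = TRUE)             # the expensive O(n^3) step
# subgraph centrality of vertex i: sum over j of V[i,j]^2 * exp(lambda_j)
sc <- as.vector(es$vectors^2 %*% exp(es$values))
```

Since essentially all the time is spent inside `eigen()`, linking a fast BLAS/LAPACK speeds up exactly that line.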
I expect this to be computable in under an hour with a good BLAS/LAPACK, but the size of your graph is at the very limit of what I expect can be handled. Note that the number of edges doesn’t matter for this computation; the number of vertices alone determines how long it will take.
Another idea is to compute an approximation through the Taylor expansion:
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \dots$$
You can take as many terms as you have computational power for, and check whether the last term you added made a significant contribution to the result.
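A minimal sketch of this approach (the function name `sc_taylor` is made up for illustration; it assumes a dense adjacency matrix `A`):

```r
# Approximate subgraph centrality by truncating the Taylor series of exp(A).
# Each iteration costs one dense n x n matrix product, so this only pays off
# if the series converges in few terms.
sc_taylor <- function(A, max_terms = 30, tol = 1e-8) {
  term <- diag(nrow(A))        # A^0 / 0! = I
  sc <- diag(term)             # running diagonal of the partial sum
  for (k in seq_len(max_terms)) {
    term <- (term %*% A) / k   # A^k / k! from the previous term
    contrib <- diag(term)
    sc <- sc + contrib
    if (max(abs(contrib)) < tol * max(abs(sc))) break  # last term negligible
  }
  sc
}
```

Be warned that for a graph this dense, the terms keep growing for a while before the factorials win out, so the series may need many terms, and each one costs as much as a full matrix product.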
In the meantime, I computed the eigensystem of an 11,000 by 11,000 symmetric matrix with Mathematica and it took about 400 seconds (7 minutes) on an Apple M2. With R (I use the CRAN distribution for macOS) it takes longer than I care to wait (and it uses only one core). Once again, this performance difference is not down to igraph, and not even down to R vs Mathematica, but very likely due to the BLAS/LAPACK they are using.
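If you want to gauge what your own setup can do before committing to the full computation, you can time `eigen()` on a smaller random symmetric matrix (dense eigendecomposition scales roughly as $n^3$, so extrapolate accordingly):

```r
n <- 2000                        # start small; running time grows roughly as n^3
M <- matrix(rnorm(n * n), n, n)
M <- (M + t(M)) / 2              # symmetrize
system.time(eigen(M, symmetric = TRUE))
```

`sessionInfo()` also reports which BLAS/LAPACK libraries your R is linked against.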