It takes much more time to calculate the weighted betweenness than the unweighted one. For example, if the unweighted betweenness takes 5 hours, the weighted one takes several days. In fact, I am still waiting for the results after 2 days. Admittedly, I have a huge dataset.
This happens in both the R and Python versions of igraph. I guess that the underlying code is different for the weighted and unweighted calculations. Is there any way to make the calculation time shorter?
I hope that this issue can be solved soon.
I would take the glass-half-full view and say that there is no issue or problem. What we have is a great feature:
Even though unweighted betweenness is a special case of the weighted one, so the same implementation could be used for both, we don’t re-use the weighted code. Instead, we have a specialized, fast implementation for the unweighted case, so you don’t have to wait as long on unweighted graphs. Keep in mind that the weighted calculation is inherently slower: in the unweighted case, the shortest paths from each source can be explored with a simple breadth-first search, while the weighted case needs a Dijkstra-style traversal with a priority queue, which is more expensive per source vertex.
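For a rough sense of the asymptotic gap, these are the standard bounds for Brandes-style betweenness algorithms (the constants in igraph's actual implementation may of course differ):

$$
T_{\text{unweighted}} = O(VE) \quad \text{(one BFS per source)}, \qquad
T_{\text{weighted}} = O\!\left(VE + V^2 \log V\right) \quad \text{(one Dijkstra per source)}.
$$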
That said, we are working on features that make various betweenness approximation schemes easy to implement. We already have functions for range-limited betweenness; see this paper for how to use it for approximation: Phys. Rev. Lett. 105, 038701 (2010) - Centrality Scaling in Large Networks. Version 0.10 of C/igraph will also have subset betweenness, which makes other approximation schemes possible; see e.g. Brandes & Pich: Centrality estimation in large networks.
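In python-igraph, range-limited betweenness is exposed through the `cutoff` parameter of `betweenness()`. Here is a minimal sketch, assuming (hypothetically) that your weights live in an edge attribute named `"weight"`:

```python
import igraph as ig

# Stand-in graph; in practice, use your own network. We assume the edge
# weights are stored in an edge attribute named "weight".
g = ig.Graph.Barabasi(n=10_000, m=5)
g.es["weight"] = [1.0] * g.ecount()

# Range-limited betweenness: only shortest paths whose (weighted) length is
# at most `cutoff` are counted. Per the paper above, scores obtained with a
# well-chosen cutoff correlate strongly with the exact values, at a fraction
# of the cost.
approx = g.betweenness(weights="weight", cutoff=4)
```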
What I suggest for you right now is to compare the timings of the weighted and unweighted cases on small graphs that are representative of your network’s structure. This way you can get an estimate of how long the computation may take on your large network, and make a decision about whether to wait for it.
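For example, along these lines (a rough sketch in python-igraph; the random graph and the `"weight"` attribute name are stand-ins for your data):

```python
import time
import igraph as ig

def time_betweenness(g, weights=None):
    start = time.perf_counter()
    g.betweenness(weights=weights)
    return time.perf_counter() - start

# Stand-in for "a small graph representative of your network's structure";
# ideally take an induced subgraph of your real data, e.g.
# g = full_graph.induced_subgraph(some_vertex_sample), instead of this model.
g = ig.Graph.Erdos_Renyi(n=5_000, m=150_000)
# Unit weights exercise the weighted (Dijkstra-based) code path while keeping
# the results comparable to the unweighted run.
g.es["weight"] = [1.0] * g.ecount()

t_fast = time_betweenness(g)
t_slow = time_betweenness(g, weights="weight")
print(f"unweighted: {t_fast:.2f} s, weighted: {t_slow:.2f} s, "
      f"slowdown: {t_slow / t_fast:.1f}x")
```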
If you have evidence that igraph’s calculation is too slow (e.g. by noticing that it doesn’t scale as it should with the network size, or that it is several times slower than a non-parallelized implementation in some other library), let us know.
You mentioned that the comparison should be based on a non-parallelized implementation. Is there any plan to parallelize the igraph computation? I guess that would reduce the time greatly.
There is no immediate plan.
Of course, it might happen in the future, but in my opinion there are higher priority tasks at the moment. As an example, it makes sense to base a parallelized implementation on subset betweenness, a feature which will be part of the next release. Thus, there are milestones to pass before we get there.
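To make that connection concrete, here is a rough, plain-Python sketch (not igraph's internals) of why subset betweenness is a natural unit of parallel work: Brandes-style dependency accumulation from each source vertex is independent of all the others, so chunks of sources can be farmed out to workers and the per-vertex scores summed at the end.

```python
import random
from collections import deque

import igraph as ig

def betweenness_contributions(g, sources):
    """Sum of Brandes-style dependency contributions from `sources`
    (unweighted/BFS variant; the weighted case would use Dijkstra)."""
    n = g.vcount()
    adj = [g.neighbors(v) for v in range(n)]
    score = [0.0] * n
    for s in sources:
        dist = [-1] * n           # BFS distances from s
        sigma = [0] * n           # number of shortest paths from s
        preds = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):  # accumulate dependencies bottom-up
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Brandes & Pich style estimate: sample k sources and rescale by n/k.
# For an undirected graph, halve the result to match igraph's convention
# of counting each vertex pair once.
g = ig.Graph.Barabasi(n=2_000, m=5)
k = 100
sample = random.sample(range(g.vcount()), k)
contrib = betweenness_contributions(g, sample)
estimate = [x * g.vcount() / (2 * k) for x in contrib]
```

Splitting the sampled sources among processes and summing the resulting score vectors would parallelize this with no coordination beyond the final reduction.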
Also, I’m sure you’re aware, but it’s always good to note that igraph is primarily a volunteer project, just like most other open-source software. Therefore, as always, contributions are very welcome!
Even if you don’t feel up to contributing a parallelized betweenness implementation (which is certainly not a trivial thing to do!), you can help out in many other ways, from reporting problems through improving documentation to answering other people’s questions on this forum. Simply saving time for the core igraph contributors will help igraph move forward at a faster pace.
@logion2000 Just a heads up that we are looking into possibilities to speed up such calculations. It would be helpful if you could let me know the size of your graph (vertex count and edge count), and whether it is directed or undirected.
My graph has 335,053 nodes and 10,250,346 edges. It is an undirected graph. Yes, it is a huge graph.
When can we expect to see this improvement? I just can’t wait!
This is not an easy task. We still don’t know how much speedup is possible, if any at all, but we are investigating. If anything can be done, it will require changes to fundamental parts of the library, so I cannot give you a timeline. I would say that it is unlikely that any speedups will be implemented for version 0.10 of C/igraph, but it is not impossible.