Extracting ground-truth clusters from sample_sbm result

Hello!

Firstly, really, really, really wonderful package; thank you so much for creating/maintaining it!

I’ve run into an issue using igraph on R that I can’t quite figure out.

I generate a graph using sample_sbm with 100 nodes, 2 clusters, and intra-cluster edge probability greater than inter-cluster edge probability. That is, I tell sample_sbm to generate a graph with two clusters by passing a 2-by-2 matrix to the pref.matrix parameter. My resulting graph looks like this: [plot of the sampled graph omitted]
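
For reference, a call along these lines matches the setup described; the specific probabilities and block sizes below are illustrative placeholders rather than the original values:

library(igraph)

# 2-by-2 preference matrix: the diagonal (intra-cluster) probabilities are
# larger than the off-diagonal (inter-cluster) probability
P <- matrix(c(0.30, 0.02,
              0.02, 0.30), nrow = 2)

# 100 nodes split into two blocks of 50 each
g <- sample_sbm(100, pref.matrix = P, block.sizes = c(50, 50))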

Is there some way to recover the ground-truth clusters (i.e., the clusters that sample_sbm presumably creates using pref.matrix)? It seems using cluster_leading_eigen worked for this particular case, but I was wondering if it’d be possible to recover the information used to generate the network in the first place. I’m having some trouble finding a solution in the documentation, etc.

If this is not possible, which of the clustering algorithms that igraph provides would be best suited (in terms of accuracy and time)? I’m working with larger networks with more clusters as well, so any way to recover this information efficiently would be fantastic!

Thank you!

If I understand the documentation correctly, the nodes are simply assigned to the blocks in consecutive order. That is, nodes 1 to block.sizes[1] are in block 1, nodes block.sizes[1] + 1 to block.sizes[1] + block.sizes[2] are in block 2, and so on. So, in your example, nodes 1-50 should be in block 1 while nodes 51-100 are in block 2. Does that answer your question?
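
In other words, the ground-truth membership can be reconstructed directly from the block sizes that were passed to sample_sbm; a minimal sketch, assuming the 100-node, two-block example above:

# Block sizes used when generating the graph (assumed here: 50 and 50)
block.sizes <- c(50, 50)

# Nodes are assigned to blocks consecutively, so the ground-truth
# membership vector is 50 ones followed by 50 twos
ground_truth <- rep(seq_along(block.sizes), times = block.sizes)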

For extracting communities from the graph there are quite a number of options; see all the cluster_* methods. In my experience, cluster_infomap and cluster_louvain work quite well. Hopefully the cluster_leiden method will soon become available in R as well; it improves on the cluster_louvain method (disclaimer: I am the author of the Leiden algorithm).
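
To check how well any of these methods recovers the planted partition, the detected membership can be compared against the ground-truth labels with igraph's compare(); a rough sketch, assuming g and ground_truth from the snippets above:

cl <- cluster_louvain(g)      # or cluster_infomap(g), cluster_leading_eigen(g), ...
membership(cl)                # detected community label of each node
compare(membership(cl), ground_truth, method = "nmi")  # 1 means perfect recovery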

Ah, thank you!

Are you getting that from the description of the block.sizes argument?

Using cluster_louvain on another dataset, this indeed seems to be the case:

   [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  [35]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2
  [69]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
 [103]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
 [137]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2 11 11 11
 [171] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
 [205] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
 [239]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
 [273]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  5  5  5  5
 [307]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
 [341]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5

Thank you so much for your help!

Well, I must admit that the documentation is a bit unclear at this point, which should be improved. I just opened an issue for that on GitHub.

The C documentation and the Python documentation do mention the consecutive node order.

Ah, amazing; the documentation for Python and C clears things up significantly. I should have thought to look there. Thank you so much! Looking forward to the R implementation of cluster_leiden!
