igraph:::length.communities (sometimes) produces the wrong length

# As expected.
library(igraph)
g <- make_full_bipartite_graph(4, 4)
clu <- structure(list(membership = c(1, 1, 1, 1, 2, 2, 2, 2),
                      algorithm = "onto"), class = "communities")
length(clu)
# [1] 2
clu[[2]]
# [1] 5 6 7 8

# Not as expected when the membership vector in the communities object is corrupted.
g <- make_full_bipartite_graph(4, 4)
clu <- structure(list(membership = c(1, 1, 1, 1, 4, 4, 4, 4),
                      algorithm = "outside (1,2)"), class = "communities")
length(clu)
# [1] 4

This can break code that indexes communities with ‘[[’ up to length(clu), e.g. plot:

Error in groups(x)[[i]] : subscript out of bounds.

For the record: this is due to a bug in the C core, and it will be resolved automatically once the fix is introduced there:

> g <- make_empty_graph(1, directed = FALSE)
> clu <- cluster_leiden(g)
> length(clu)
[1] 1
> clu$membership
[1] 1

The point I would like to make: if the groups in the $membership vector are not numbered consecutively starting at 1, then max(clu$membership) gives the wrong number of communities. A safer alternative is length(unique(clu$membership)). But that’s a lot of overhead for an unlikely situation. Is that worth it?
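
To make the difference concrete, here is a minimal base-R sketch (no igraph needed) that compares the two counts on the membership vector from the failing example. The variable names are only illustrative.

```r
# Membership vector from the failing example: labels 1 and 4, nothing in between.
membership <- c(1, 1, 1, 1, 4, 4, 4, 4)

# Count based on the largest label; only correct if the labels are exactly 1..k.
n_by_max <- max(membership)

# Count based on distinct labels; makes no assumption about the numbering.
n_by_unique <- length(unique(membership))

print(c(by_max = n_by_max, by_unique = n_by_unique))
```

Here the first count is 4 and the second is 2, which is the mismatch that makes ‘[[’ fail for indices 3 and 4.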

In theory, the C core of igraph will always return membership vectors where the communities are numbered consecutively and there are no empty communities. We have a function for that in the C core called igraph_reindex_membership(). I think that all community detection functions in the C core call this function in the end, and if not, that’s a bug in the C core and should be fixed there.
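
As an R-level illustration of what such a reindexing step does, the labels can be mapped onto 1..k. This is only a sketch, not the C implementation: reindex_membership is a hypothetical helper name, and numbering the communities in order of first appearance is an assumption made here (the C function may order them differently).

```r
# Hypothetical helper: relabel communities as 1..k, in order of first appearance.
reindex_membership <- function(membership) {
  # match() returns, for each element, the position of its label
  # among the distinct labels, which is a gap-free 1..k numbering.
  match(membership, unique(membership))
}

fixed <- reindex_membership(c(1, 1, 1, 1, 4, 4, 4, 4))
print(fixed)
```

After this step max() and length(unique()) agree again, so the cheaper max() is safe to use.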

I added a test to the C core to verify that membership vectors use proper indexing when detecting communities in one particular random graph.
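
A check of this kind could look roughly like the following at the R level. This is only a sketch with a hypothetical helper name, not the test that was added to the C core, and it assumes the 1-based labels used on the R side.

```r
# Hypothetical helper: TRUE if the labels are exactly 1..k with no gaps.
is_properly_indexed <- function(membership) {
  labels <- sort(unique(membership))
  all(labels == seq_along(labels))
}

ok  <- is_properly_indexed(c(1, 1, 2, 2))
bad <- is_properly_indexed(c(1, 1, 4, 4))
print(c(ok = ok, bad = bad))
```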