Improve cluster_leiden doc.

KeesP · 21 July 2022 08:13

g <- graph.famous("Zachary")
# By default CPM is used
g <- cluster_leiden(g, resolution_parameter=0.06)

The example above can easily mislead the casual reader. The result of cluster_leiden() is not a graph.

My suggestion is to modify the documentation and include a plot to show the Leiden partition:

g <- graph.famous("Zachary")  # By default CPM is used
ldc <- cluster_leiden(g, resolution_parameter=0.06)
ldc
plot(g, mark.groups = ldc)

szhorvat · 21 July 2022 08:48

Good point. Why don’t you open a pull request? We could certainly use help improving the documentation and examples. The example is located here in the code: rigraph/community.R at dev · igraph/rigraph · GitHub

KeesP · 21 July 2022 09:45

I need to familiarize myself with GitHub. But this is a safe topic to start with.

szhorvat · 21 July 2022 10:39

Yes, this is safe. You cannot do any damage, as someone will need to approve changes before they go live. So don’t worry about that.

The simplest way to get started is to make the changes directly in your browser. You can find the file on GitHub (rigraph/community.R at dev · igraph/rigraph · GitHub) and use the “edit” button (pencil icon). I suggest you use this method for your first PR.

In the long term, if you plan to make more contributions (which I would encourage you to do), it will pay off to check out the git repository on your computer and work with it locally (instead of in the browser). If you are not very comfortable with git yet, GitHub Desktop may be the easiest way to get started. https://desktop.github.com/

szhorvat · 21 July 2022 12:24

There’s another thing, in addition to the documentation, which you could help with, if you like.

At one time you did some tests, verifying if functions behave well when they are passed a very large number of vertices. We are working on this, and it will take time to resolve.

However, other types of invalid input would be useful to test for. Specifically:

Do all functions behave well with non-simple graphs (multigraphs, graphs with self-loops)?
Do functions generally behave well with edge cases such as the zero-vertex graph, the one-vertex graph (either with zero edges or self-loops), disconnected graphs, etc.?
Do functions reject invalid input such as a negative number of vertices for a graph generator?

To my knowledge, all these should work well. If there is a function that does not handle such inputs, it should be fixed.

Once again: What we cannot fix right now is very large numbers as input. We know that this fails and there is ongoing work to deal with this.

Doing such testing is of course rather boring, so I don’t expect that you would want to do it. But given that you have reported similar issues in the past, I wanted to let you know what kind of testing is the most useful at the moment.

KeesP · 21 July 2022 12:49

I don’t mind testing. However, to do it systematically over a set of functions, it helps to have a list of functions, or a simple procedure to create the list yourself.

KeesP · 22 July 2022 19:50

I carried out a few tests:

func <-  c( "cluster_walktrap(g, steps=4)"
          , "if (components(g)$no == 1) cluster_spinglass(g)"
          , "cluster_infomap(g)"
          , "if (components(g)$no == 1) cluster_fluid_communities(as.undirected(simplify(g)), no.of.communities=10)"
          , "cluster_leading_eigen(as.undirected(g))"
          , "cluster_edge_betweenness(g)"
          , "cluster_fast_greedy(as.undirected(g))"
          , "cluster_label_prop(g)"
          , "cluster_louvain(as.undirected(g))"
          , "if (gorder(g)< 50) cluster_optimal(g)"
          , "if (components(g)$no == 1) clusters(g)"
          )

List of graphs:

g <- make_empty_graph(0L, directed=FALSE)
g <- make_empty_graph(0L, directed=TRUE)
g <- make_empty_graph(1L, directed=FALSE)
g <- make_empty_graph(1L, directed=TRUE)
g <- make_empty_graph(2L, directed=FALSE)
g <- make_empty_graph(2L, directed=TRUE)
g <- graph_from_literal(1-1)
g <- graph_from_literal(1-+1)
g <- make_de_bruijn_graph(2,10) # 2 = alphabet size, 10 = all unique 10-sequences
g <- sample_gnm(n, n/2)
g <- make_de_bruijn_graph(2,10)
g <- make_de_bruijn_graph(2,8); g <- g + g
g <- make_de_bruijn_graph(2,8); g <- add_edges(g, c(t(get.edgelist(g))))

I found no irregularities.
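
A minimal sketch of how checks like these could be run over several graphs at once. It assumes the expression strings are evaluated with eval(parse()) in an environment where g is bound to the graph under test; the helper name run_checks and the shortened lists of expressions and graphs are hypothetical and only serve as an illustration.

```r
library(igraph)

# Hypothetical helper (not part of igraph): evaluate every expression string
# against every graph and record whether it finished, warned, or failed.
run_checks <- function(exprs, graphs) {
  results <- expand.grid(expr = exprs, graph = names(graphs),
                         stringsAsFactors = FALSE)
  results$status <- NA_character_
  for (i in seq_len(nrow(results))) {
    # Bind the current graph to the name `g` used inside the expressions.
    env <- new.env()
    assign("g", graphs[[results$graph[i]]], envir = env)
    results$status[i] <- tryCatch({
      eval(parse(text = results$expr[i]), envir = env)
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e)))
  }
  results
}

# A named list of edge-case graphs (shortened for illustration).
graphs <- list(
  null      = make_empty_graph(0L, directed = FALSE),
  singleton = make_empty_graph(1L, directed = FALSE),
  self_loop = make_graph(c(1, 1), directed = FALSE)
)
exprs <- c("cluster_walktrap(g, steps = 4)", "cluster_label_prop(g)")

res <- run_checks(exprs, graphs)
print(res)
```

A status of "ok" only means the call returned without a warning or error. The returned object still has to be inspected, for example with print.default(), to spot inconsistent fields.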

KeesP · 23 July 2022 20:56

On closer inspection I found an issue with Spinglass and the length function.

g <- make_empty_graph(1L)
clu <- cluster_spinglass(g)
print.default(clu)

$membership
[1] 2

$csize
numeric(0)

$modularity
[1] NaN

$temperature
[1] 0.01

$algorithm
[1] "spinglass"

$vcount
[1] 1

attr(,"class")
[1] "communities"
 length(clu)
[1] 2

Community IDs must start at 1. Here the $membership vector assigns the single vertex to community 2, which does not exist, $csize is numeric(0) where a single community of size 1 is expected, and length() incorrectly reports 2.

tamas · 25 July 2022 11:53

Thanks, this seems like a bug in the C core of igraph, which contains a separate branch for handling null and singleton input graphs. I’ll look into this.

tamas · 25 July 2022 12:02

Bug now fixed in the master and develop branches of the C core; the fix will be released in the next patch version of the R interface. Until then, I’m afraid you’ll need to special-case null and singleton graphs (i.e. if vcount(g) < 2) when working with the spinglass clustering.
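
Such a guard could look roughly like the sketch below. The wrapper name spinglass_or_null is hypothetical, and returning NULL for graphs with fewer than two vertices is just one possible choice, not something igraph prescribes.

```r
library(igraph)

# Hypothetical wrapper (not part of igraph): skip the spinglass clustering
# for null and singleton graphs and return NULL instead.
spinglass_or_null <- function(g, ...) {
  if (vcount(g) < 2) {
    return(NULL)
  }
  cluster_spinglass(g, ...)
}

# Singleton graph: the clustering is skipped.
clu_single <- spinglass_or_null(make_empty_graph(1L, directed = FALSE))

# Connected graph with more than one vertex: spinglass runs as usual.
clu_zachary <- spinglass_or_null(make_graph("Zachary"))
```

Callers then need to check for NULL before using the result, which keeps the special case visible until the fixed version is released.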

KeesP · 25 July 2022 13:28

As you can see in the example, the length is also wrong (2).
For clarity I will make a separate item.

tamas · 28 July 2022 18:59

The problem with length() will probably be solved by the fix in the C core as well because length(clu) is essentially max(clu$membership) and it’s the membership vector that’s incorrect. This happens with my R after fixing the C core:

> g <- make_empty_graph(1, directed=F)
> clu <- cluster_leiden(g)
> length(clu)
[1] 1
> clu$membership
[1] 1

KeesP · 1 August 2022 12:34

I would like to clarify the examples in help("-.igraph"). How do I find the source file so that I can edit it?
