Hello everyone,

This summer I was looking for an R package with a function returning all the elementary circuits of a directed graph but didn’t find any.

That’s why I eventually implemented an algorithm to perform this task (Johnson’s algorithm).

Do you think it would be useful to add such a function to the package igraph?

Best regards,

Olivier Givaudan

1 Like

Yes, contributions are absolutely welcome.

Basic algorithms are implemented in igraph’s C core. It would be a welcome addition there, if you are familiar with C.

I was wondering if you implemented it in R or in C.

For reference, I believe you mean this algorithm: https://epubs.siam.org/doi/abs/10.1137/0204007

It finds all simple cycles, i.e. closed paths with no repeating vertices.

In general, there can be a very large number of such cycles. The paper gives the count in a complete graph as:

Thus a useful implementation would make it possible to output only some of the cycles (either implement it as an *iterator* like in Python, or allow for a callback function to terminate the enumeration early, as with some existing functions)

I implemented it in R (I posted on purpose here in the R section).

I am familiar with C, although I didn’t code in this language for some years now.

Yes, indeed, I should have posted the exact reference.

I meant this algorithm, yes: Donald B. Johnson: Finding All the Elementary Circuits of a Directed Graph. SIAM Journal on Computing. Volumne 4, Nr. 1 (1975), pp. 77-84.

Another link to the article (for free, and legally): https://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF

It relies on Tarjan’s algorithm to find the strongly connected components (as the elementary circuits are found within each of the strongly connected components):

Robert Tarjan: Depth-first search and linear graph algorithms. In: SIAM Journal on Computing. Volume 1, Nr. 2 (1972), pp. 146-160.

A link to the article (for free, and legally): https://www.semanticscholar.org/paper/Depth-First-Search-and-Linear-Graph-Algorithms-Tarjan/385742fffcf113656f0d3cf6c06ef95cb8439dc6

For my particular use I didn’t encounter performance/memory issues related to the number of elementary circuits (I typically have 1000+ vertices and 1000+ edges). So it was possible to output the complete list.

I agree there should be some restrictions for bigger graphs: on the length of the circuits perhaps?

[EDIT]: OK, that’s what is already done in `igraph_cliques_callback`

.

So, what would be the next step?

As far as I am concerned, this would be a very welcome addition to igraph’s C core. If you would like to give it a go, the first step is to get familiar with igraph and its quirks: compile the development version and make sure you can compile small programs that use igraph. If you need help, feel free to ask (I suggest opening a new thread in the Development category). I’m sure you’ll have questions. There are also useful resources in the Wiki. Once you got started with the basics, you can open a PR against the `develop`

branch on GitHub and we can provide guidance there as you are proceeding with the implementation.

Implementing it in C is of course going to be quite a bit more work than doing it in R, so you should feel no obligation to do it. However, if you do decide to try, this would be a welcome addition.

As for incorporating an existing R implementation into the R interface, I can’t really comment on that as I am not deeply involved with that interface. We are a bit short on resources, and short on R developers. To be frank, the first priority is to simply update the R interface to use the latest C core, and get a release out. This is already taking up a lot of time. While I think it would be a good addition, I can’t personally make it happen, and I am not sure we have the resources *right now* do deal with it. Of course, it is possible that another maintainer would step up and be willing to incorporate the R implementation—however, there has no been any response so far. Therefore, *for the moment* I would suggest publishing the R implementation as a separate package that depends on igraph (if you are looking to publish it). Once again, this is just my opinion, and I am saying it because we are really short on resources right now.

Many thanks, Szabolcs, for your detailed answer.

OK, let’s see how much available time I have in the next weeks/months.

1 Like

As I wrote 3 years ago, I implemented this algorithm in R without any issue and did the same in C (that could be later integrated into igraph) but I now have only one issue with the latter implementation.

Precisely I am struggling to understand why the B-lists (the ones that keep track of the nodes that were already visited and that did not yield any circuit) don’t have the same structure as the adjacency list of the initial (i.e. not the subgraphs thereafter) directed graph. With same structure I mean the number of neighbours of each node in the B-lists should be at most the number of neighbours of each node in this initial adjacency list.

However in practice I noticed this is not the case (overflow errors and not the right number of circuits) but I don’t understand why (theoretical problem).

As a consequence, my initial implementation (storing the B-lists on the stack as a unique big array of length equal to the total number of edges from the initial graph) doesn’t work (pratical problem) but I have first to understand the underlying theory so that I can implement it in a correct way. Of course knowning the maximum size of the big array beforehand would be much better to avoid dynamic memory management in C (reallocations) and the related loss in execution speed.

BTW in R I didn’t encounter/notice this issue because you don’t have to worry about memory management issues in this language - the B-lists could grow and shrink dynamically without any problem.

My question is: Do you know whether there exists a non-trivial (a trivial one being the number of distinct nodes for each individual B-list) upper bound either for the length of each B-list (individual upper bound) or for the sum of all B-lists (i.e. over all the vertices - collective upper bound) and why is this collective upper bound not equal to the number of edges of the initial graph?