It finds all simple cycles, i.e. closed paths with no repeating vertices.
In general, there can be a very large number of such cycles. The paper gives the count in a complete directed graph on $n$ vertices as
$$\sum_{i=1}^{n-1} \binom{n}{n-i+1}\,(n-i)!$$
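For example, for $n = 4$ this gives $\binom{4}{4}\,3! + \binom{4}{3}\,2! + \binom{4}{2}\,1! = 6 + 8 + 6 = 20$ elementary circuits.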
Thus a useful implementation would make it possible to output only some of the cycles: either implement it as an iterator (as in Python), or allow for a callback function that can terminate the enumeration early, as with some existing functions.
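As a rough sketch, such a callback in C could follow igraph’s usual handler pattern, where returning IGRAPH_STOP ends the enumeration early; the handler name, the counter_t struct and the exact parameter list below are hypothetical, not an actual igraph API:

```c
#include <igraph.h>

/* Hypothetical user data: stop after max_count cycles have been seen. */
typedef struct {
    igraph_integer_t seen;
    igraph_integer_t max_count;
} counter_t;

/* Hypothetical handler, called once per cycle found; the parameter list
   is illustrative. Returning IGRAPH_STOP terminates the enumeration
   early, as with igraph's existing callback functions. */
static igraph_error_t count_and_stop(const igraph_vector_int_t *vertices,
                                     void *arg) {
    counter_t *c = (counter_t *) arg;
    (void) vertices;  /* a real handler would record or process the cycle */
    if (++c->seen >= c->max_count) {
        return IGRAPH_STOP;
    }
    return IGRAPH_SUCCESS;
}
```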
Yes, indeed, I should have posted the exact reference.
I meant this algorithm, yes: Donald B. Johnson: Finding All the Elementary Circuits of a Directed Graph. SIAM Journal on Computing, Vol. 4, No. 1 (1975), pp. 77–84.
Another link to the article (for free, and legally): https://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
For my particular use case, I didn’t encounter performance or memory issues related to the number of elementary circuits (my graphs typically have 1000+ vertices and 1000+ edges), so it was possible to output the complete list.
I agree there should be some restrictions for bigger graphs: on the length of the circuits perhaps?
[EDIT]: OK, that’s what is already done in igraph_cliques_callback.
As far as I am concerned, this would be a very welcome addition to igraph’s C core. If you would like to give it a go, the first step is to get familiar with igraph and its quirks: compile the development version and make sure you can compile small programs that use igraph. If you need help, feel free to ask (I suggest opening a new thread in the Development category). I’m sure you’ll have questions. There are also useful resources in the Wiki. Once you have gotten started with the basics, you can open a PR against the develop branch on GitHub and we can provide guidance there as you proceed with the implementation.
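For example, a minimal smoke test (a sketch assuming the current 0.10-style C API; the graph and printed values are arbitrary) could be:

```c
#include <stdio.h>
#include <igraph.h>

int main(void) {
    igraph_t graph;

    /* A small directed circular ring: 5 vertices, 5 edges. */
    igraph_ring(&graph, 5, IGRAPH_DIRECTED, /* mutual= */ 0, /* circular= */ 1);

    printf("Vertices: %ld, edges: %ld\n",
           (long) igraph_vcount(&graph),
           (long) igraph_ecount(&graph));

    igraph_destroy(&graph);
    return 0;
}
```

If this compiles, links (e.g. with `gcc test.c $(pkg-config --cflags --libs igraph)`) and prints the expected counts, your setup is ready.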
Implementing it in C is of course going to be quite a bit more work than doing it in R, so you should feel no obligation to do it. However, if you do decide to try, this would be a welcome addition.
As for incorporating an existing R implementation into the R interface, I can’t really comment on that, as I am not deeply involved with that interface. We are a bit short on resources, and short on R developers. To be frank, the first priority is to simply update the R interface to use the latest C core and get a release out. This is already taking up a lot of time. While I think it would be a good addition, I can’t personally make it happen, and I am not sure we have the resources right now to deal with it. Of course, it is possible that another maintainer would step up and be willing to incorporate the R implementation; however, there has been no response so far. Therefore, for the moment I would suggest publishing the R implementation as a separate package that depends on igraph (if you are looking to publish it). Once again, this is just my opinion, and I am saying it because we are really short on resources right now.
As I wrote 3 years ago, I implemented this algorithm in R without any issue and did the same in C (which could later be integrated into igraph), but I now have only one issue left with the latter implementation.
More precisely, I am struggling to understand why the B-lists (the ones that keep track of the nodes that were already visited and did not yield any circuit) don’t have the same structure as the adjacency list of the initial directed graph (i.e. the original graph, not the subgraphs derived from it afterwards). By “same structure” I mean that the number of neighbours of each node in the B-lists should be at most the number of neighbours of that node in the initial adjacency list.
However, in practice I noticed this is not the case (overflow errors and an incorrect number of circuits), but I don’t understand why (the theoretical problem).
As a consequence, my initial implementation (storing the B-lists on the stack as a single big array whose length equals the total number of edges of the initial graph) doesn’t work (the practical problem), but I first have to understand the underlying theory so that I can implement it correctly. Of course, knowing the maximum size of the big array beforehand would be much better, as it would avoid dynamic memory management in C (reallocations) and the related loss in execution speed.
BTW, in R I didn’t encounter/notice this issue because you don’t have to worry about memory management in that language: the B-lists could grow and shrink dynamically without any problem.
My question is: do you know whether there exists a non-trivial upper bound (a trivial one being the number of distinct nodes, for each individual B-list) either on the length of each B-list (an individual upper bound) or on the sum of the lengths of all B-lists over all vertices (a collective upper bound)? And why is this collective upper bound not equal to the number of edges of the initial graph?
I found a practical solution already in December 2023 but never took the time to post it here.
It might not be the best solution.
As mentioned on page 4 of Johnson’s original paper (the link was posted in my previous message), “B-lists are used to remember information obtained from searches of portions of the graph which do not yield an elementary circuit.”
Practically, the object ‘B’ (a collection of B-lists, one for each vertex) can be implemented as an adjacency list, i.e. as a two-dimensional list: a list of n vertices and, for each of these n vertices, a list of that vertex’s neighbours.
Heuristically, I replicated the structure of the adjacency list of the initial graph (not of the subsequent subgraphs) to instantiate and initialise the object ‘B’, and, while building ‘B’, I kept track of both the physical length of each B-list (initially equal to the length of the corresponding ‘branch’ of the adjacency list for a given vertex) and its logical length (i.e. the number of items actually stored at each vertex).
Whenever the logical length of a B-list becomes larger than its physical length, the B-list is reallocated so that it contains as many items as there are vertices (i.e. n); since the number of vertices n is a (trivial) upper bound, this reallocation, if it happens at all (in my sample of 300+ graphs, it happened on fewer than 10 of them), occurs at most once per B-list.
The alternative, which would avoid these dynamic reallocations entirely, would be to allocate B at once as an n × n table (i.e. n B-lists with n items each), but that would be far too memory-intensive for graphs with a large number of vertices (O(n²) storage space).
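For concreteness, here is a minimal C sketch of this scheme under the assumptions above; blist_t, blist_init and blist_push are hypothetical names chosen for illustration, not igraph API:

```c
#include <stdlib.h>

/* One B-list per vertex: it starts with the capacity of that vertex's
   adjacency list in the initial graph and grows (at most once) to n. */
typedef struct {
    long *items;    /* vertex ids stored in this B-list */
    long physical;  /* allocated capacity               */
    long logical;   /* number of items currently stored */
} blist_t;

/* Initialise B[v] with the same capacity as v's initial adjacency list. */
static int blist_init(blist_t *b, long degree) {
    b->physical = degree > 0 ? degree : 1;
    b->logical = 0;
    b->items = malloc((size_t) b->physical * sizeof *b->items);
    return b->items != NULL ? 0 : -1;
}

/* Append w; on overflow, reallocate once to the trivial upper bound n
   (the number of vertices), so no B-list is reallocated more than once. */
static int blist_push(blist_t *b, long w, long n) {
    if (b->logical == b->physical) {
        long *tmp = realloc(b->items, (size_t) n * sizeof *tmp);
        if (tmp == NULL) return -1;
        b->items = tmp;
        b->physical = n;
    }
    b->items[b->logical++] = w;
    return 0;
}
```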
I know, I even gave a few examples of graphs with self-loops to test the self-loop detection on GitHub in early November (issue #2692).
Looking forward to comparing the performance on my graphs database.