Data structure type usage and visibility

There is a type called igraph_array3_t, but it does not seem to be used anywhere. Is there still any use for it? It is only referenced in a test file, and nowhere else in the codebase. It is mentioned in the documentation, but only the type is mentioned, and no further methods are included.

I propose that, if there is no use for it, we simply remove it from the project.

I would not remove it from the library. I don’t think removal provides any immediate benefit. I am okay with removing it from the documentation though.

Reasons:

  • It is documented now. Existing code may be using it (not igraph itself, but things depending on igraph). Let’s not break stuff for no benefit.

  • The lack of data structures in igraph, compared to the C++ standard library, is a recurring frustration for me. We may find a need for this data structure in the future. Let’s not have to spend time re-implementing it then.

In short: it does no harm and it is potentially useful, so leave it be for now.

Well, it just cleans up the codebase. Having unused code lingering around may invite people to start using it, and if it is not clear whether the implementation is fully functional and correct, that may just lead to bugs and frustration. This is also a good reason to hide non-public symbols, so that people cannot start building on things that are not meant to be public. I would say that whatever we make available, we should also maintain. Perhaps there is a clear use for array3, but from what I’ve seen it is quite straightforward, and there’s little added value in that particular data structure.

It is not quite documented. The type is only mentioned in passing, and none of its functions are actually properly documented.

That brings me to another point, and perhaps one that is more relevant indeed. There are many internal data types that are not documented (and some not used). One example is igraph_hashtable_t, which is private (defined in igraph_internal_types.h), but it is not used anywhere. However, I thought that maybe a hashtable would in fact be useful (as opposed to the array3). There are actually quite a number of internal types:

  • igraph_indheap_t
  • igraph_d_indheap_t
  • igraph_2wheap_t
  • igraph_trie_node_t
  • igraph_trie_t
  • igraph_2dgrid_t
  • igraph_hashtable_t
  • igraph_buckets_t
  • igraph_dbuckets_t
  • igraph_i_cutheap_t
  • igraph_set_t
  • igraph_fixed_vectorlist_t

Some of them can be more generally useful (e.g. igraph_set_t, igraph_trie_t, igraph_indheap_t, igraph_2wheap_t) and it may be worthwhile to make them public (and include proper documentation).

What data structure from the C++ STL are you missing?

For example, linked lists. Generally, it’s harder to experiment with algorithms because not nearly as much is available.

However, the biggest pain is that igraph’s data structures can’t be flexibly templated on the element type. What if I want a set that holds not numbers but pairs of numbers? Something could be done with pointers, but allowing pointers alone is not sufficient. Some data structures need certain operations on their elements, such as <, >, == comparisons or hashing. Providing these through a callback is both inconvenient and likely a major performance hit compared to what C++ offers through inline functions and templates.
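To make the callback point concrete, here is a minimal sketch in plain C of a "set of pairs" built on the standard library's qsort and bsearch. It does not use any igraph API; pair_t, pair_cmp, pair_set_build and pair_set_contains are hypothetical names made up for this illustration. Note that every comparison goes through a function pointer taking void pointers, which is the inconvenience (and the potential cost) described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type: a pair of integers. */
typedef struct {
    int first;
    int second;
} pair_t;

/* Lexicographic comparison, supplied to the container as a callback.
 * The void pointers are what makes this generic, and also what makes
 * it awkward compared to a C++ template parameter. */
static int pair_cmp(const void *a, const void *b) {
    const pair_t *p = a;
    const pair_t *q = b;
    if (p->first != q->first) {
        return (p->first > q->first) - (p->first < q->first);
    }
    return (p->second > q->second) - (p->second < q->second);
}

/* Turn an array of pairs into a sorted set in place:
 * sort, then drop duplicates. Returns the number of unique pairs. */
static size_t pair_set_build(pair_t *items, size_t n) {
    size_t i, last = 0;
    assert(items != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    qsort(items, n, sizeof(pair_t), pair_cmp);
    for (i = 1; i < n; i++) {
        if (pair_cmp(&items[last], &items[i]) != 0) {
            items[++last] = items[i];
        }
    }
    return last + 1;
}

/* Membership test by binary search; again every comparison is an
 * indirect call through pair_cmp. */
static int pair_set_contains(const pair_t *set, size_t n, pair_t key) {
    return bsearch(&key, set, n, sizeof(pair_t), pair_cmp) != NULL;
}
```

In C++ the same thing is a one-liner with std::set<std::pair<int, int>>, and the comparison can be inlined by the compiler.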

In the end, the system just can’t be made nearly as convenient as C++ (and it’s likely slower too).

This was a lot easier to write in C++, not only because it currently uses data structures that igraph doesn’t have, but also because it was much easier to change things and experiment while developing it: https://github.com/igraph/igraph/blob/master/src/degree_sequence.cpp (But frankly, I don’t remember why I ended up with linked lists.)


Anyway, back to array3: For some time now I wanted a function that converts binary images to graphs. This is very useful with spatial networks and microscopy data, some of which is 3D. I am not 100% happy with any of the tools that exist for this. It would be a good addition and it could be done for 3D images as well.
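As a rough sketch of what I mean by converting a binary image to a graph: treat every foreground voxel as a vertex and connect voxels that share a face (6-connectivity). The code below is plain C and deliberately does not use igraph_array3_t or any other igraph function; voxel_index and image_to_edges are hypothetical names, and the memory layout (x varying fastest) and the choice of 6-connectivity are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linear index of voxel (x, y, z) in an
 * nx * ny * nz image stored as a flat array, x varying fastest. */
static size_t voxel_index(size_t x, size_t y, size_t z,
                          size_t nx, size_t ny) {
    return x + nx * (y + ny * z);
}

/* Emit one undirected edge per pair of face-adjacent foreground voxels.
 * Edges are written as consecutive (from, to) voxel indices into
 * `edges`, which must have room for 2 * (number of edges) entries.
 * Pass edges == NULL to only count. Returns the number of edges. */
static size_t image_to_edges(const unsigned char *img,
                             size_t nx, size_t ny, size_t nz,
                             size_t *edges) {
    size_t x, y, z, k, count = 0;
    size_t step[3];
    assert(img != NULL);
    step[0] = 1;        /* offset of the +x neighbour */
    step[1] = nx;       /* offset of the +y neighbour */
    step[2] = nx * ny;  /* offset of the +z neighbour */
    for (z = 0; z < nz; z++) {
        for (y = 0; y < ny; y++) {
            for (x = 0; x < nx; x++) {
                size_t v = voxel_index(x, y, z, nx, ny);
                int inside[3];
                if (!img[v]) {
                    continue;  /* background voxel: no vertex */
                }
                /* Only look in the positive directions, so that each
                 * undirected edge is produced exactly once. */
                inside[0] = x + 1 < nx;
                inside[1] = y + 1 < ny;
                inside[2] = z + 1 < nz;
                for (k = 0; k < 3; k++) {
                    if (inside[k] && img[v + step[k]]) {
                        if (edges != NULL) {
                            edges[2 * count] = v;
                            edges[2 * count + 1] = v + step[k];
                        }
                        count++;
                    }
                }
            }
        }
    }
    return count;
}
```

A 2D image is just the nz = 1 case. The vertex IDs here are raw voxel indices, so background voxels would show up as isolated vertices unless the indices are compacted afterwards.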

When, if ever, I get to it, I don’t know. My point is that stripping out pieces of code that might find use in the future should not be done lightly. “Clean codebase” is to some extent subjective. We need to know what tangible benefits we are getting. Don’t want to maintain it? Don’t actually have to right now. Worried that people will use it and it might have bugs? Remove it from the docs.

Hi guys,

I can see both points of view.

But there is one aspect of the problem that is asymmetric. If nobody is using that data type right now, we can remove it if we want. Once people start using it, deprecating it will be a mess.

@szhorvat I’ve been in your shoes several times, especially during my postdoc. I slowly realized - the hard way, alas! - that there is a huge difference between “I need this data structure right now because I have a paper in review that uses it” and “I’m thinking of perhaps starting a project that could possibly benefit from that data structure”. The big difference is iterating through 15 wrong prototypes of the data structures you need and finally requesting the exact one you want.

So I’m with @vtraag to remove (for now) but I’ll be supporting @szhorvat in the future if he wants to reintroduce it for an application.

I’m also for adding some docs on the useful data types, as mentioned above by others.

As I said, there is a specific addition I meant to implement for a while which requires array3.

What practical gains are there from removing all the code (as opposed to removing it from the documentation)?

The practical gain is as above: if people start using it (with or without docs), we are in for a sea of trouble…

If it’s made clear that only documented functions are supported, I think there won’t be trouble.

I am asking not to remove array3 specifically because I will likely need it in the future, and because it is not as trivial as it may seem. For example, it already seems to be hooked up to the R interface. Not being very familiar with the R interface, I would not be able to easily do this if I had to add my own 3D array in the future.

If you really need it, of course I’m not against keeping it.

Maintaining a project like igraph is like walking a wire for us: to keep it alive we need to infuse some of our passion and requirements, but we also have to strive for a streamlined interface beyond our own needs. Therefore, let’s try to not keep too many of these loose threads or we’ll get tangled…