Data structure type usage and visibility

vtraag · 30 June 2020 14:27

There is a type called igraph_array3_t, but it does not seem to be used anywhere. Is there still any use for it? It is only referenced in a test file, and nowhere else in the codebase. It is mentioned in the documentation, but only the type is mentioned, and no further methods are included.

I propose that, if there is no use for it, we simply remove it from the project.

szhorvat · 1 July 2020 06:13

I would not remove it from the library. I don’t think removal provides any immediate benefit. I am okay with removing it from the documentation though.

Reasons:

It is documented now. Existing code may be using it (not igraph itself, but things depending on igraph). Let’s not break stuff for no benefit.
The lack of data structures in igraph, compared to the C++ standard library, is a recurring frustration for me. We may find a need for this data structure in the future. Let’s not have to spend time re-implementing it then.

In short: it does no harm and it is potentially useful, so leave it be for now.

vtraag · 1 July 2020 07:05

Well, it just cleans up the codebase. Having unused code lingering around may invite people to start using it, and if it is not clear whether the implementation is fully functional and correct, that may just lead to bugs and frustration. This is also a good reason to hide non-public symbols, so that people cannot start building on things that are not meant to be public. I would say that whatever we make available, we should also maintain. Perhaps there is a clear use for array3, but from what I’ve seen it is quite straightforward, and there’s little added value in that particular data structure.

As for it being documented now: not quite. The type is only mentioned in passing, and none of its functions are actually properly documented.

That brings me to another point, and perhaps one that is more relevant indeed. There are many internal data types that are not documented (and some not used). One example is igraph_hashtable_t, which is private (defined in igraph_internal_types.h), but it is not used anywhere. However, I thought that maybe a hashtable would in fact be useful (as opposed to the array3). There are actually quite a number of internal types:

igraph_indheap_t
igraph_d_indheap_t
igraph_2wheap_t
igraph_trie_node_t
igraph_trie_t
igraph_2dgrid_t
igraph_hashtable_t
igraph_buckets_t
igraph_dbuckets_t
igraph_i_cutheap_t
igraph_set_t
igraph_fixed_vectorlist_t

Some of them can be more generally useful (e.g. igraph_set_t, igraph_trie_t, igraph_indheap_t, igraph_2wheap_t) and it may be worthwhile to make them public (and include proper documentation).

Which data structures from the C++ STL are you missing?

szhorvat · 1 July 2020 09:36

For example, linked lists. Generally, it’s harder to experiment with algorithms because not nearly as much is available.

However, the biggest pain is that igraph’s data structures can’t be flexibly templated on the element type. What if I want a set that holds not numbers but pairs of numbers? Something could be done with pointers, but it’s not sufficient to allow pointers. Some data structures need certain operations such as <, >, == comparisons or hashing. Providing these through a callback is both inconvenient and is likely a major performance hit compared to what C++ offers through inline functions and templates.

In the end, the system just can’t be made nearly as convenient as C++ (and it’s likely slower too).
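To make the callback point concrete, here is a minimal sketch in plain C of a "set of pairs" built on the standard qsort and bsearch functions. The names pair_t, pair_cmp, pair_set_build and pair_set_contains are hypothetical and are not part of igraph. The comparison is supplied as a function pointer, which is the pattern described above: it works, but every comparison goes through an indirect call that the compiler usually cannot inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type: a pair of integers (not an igraph type). */
typedef struct {
    int first;
    int second;
} pair_t;

/* Lexicographic comparison, handed to the library as a callback. */
static int pair_cmp(const void *a, const void *b) {
    const pair_t *p = (const pair_t *) a;
    const pair_t *q = (const pair_t *) b;
    if (p->first != q->first) {
        return (p->first < q->first) ? -1 : 1;
    }
    if (p->second != q->second) {
        return (p->second < q->second) ? -1 : 1;
    }
    return 0;
}

/* Sort the pairs and drop duplicates in place; returns the new length. */
static size_t pair_set_build(pair_t *items, size_t n) {
    size_t i, out = 0;
    if (n == 0) {
        return 0;
    }
    qsort(items, n, sizeof(pair_t), pair_cmp);
    for (i = 1; i < n; i++) {
        if (pair_cmp(&items[out], &items[i]) != 0) {
            items[++out] = items[i];
        }
    }
    return out + 1;
}

/* Membership test on a sorted, duplicate-free array of pairs. */
static int pair_set_contains(const pair_t *items, size_t n, pair_t key) {
    return bsearch(&key, items, n, sizeof(pair_t), pair_cmp) != NULL;
}
```

Calling pair_set_build on an array of pairs and then pair_set_contains with a key gives set-like behaviour, but each new element type needs its own hand-written comparator, whereas a C++ std::set<std::pair<int, int>> gets one for free.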

This was a lot easier to write in C++, not only because it currently uses data structures that igraph doesn’t have, but also because it was much easier to change things and experiment while developing it: https://github.com/igraph/igraph/blob/master/src/degree_sequence.cpp (But frankly, I don’t remember why I ended up with linked lists.)

Anyway, back to array3: For some time now I have wanted a function that converts binary images to graphs. This is very useful with spatial networks and microscopy data, some of which is 3D. I am not 100% happy with any of the existing tools for this. It would be a good addition, and it could be done for 3D images as well.
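A rough sketch of what such a conversion could look like, assuming 6-connectivity (voxels that share a face) and a dense array stored with the first index varying fastest. The image3_t type and both functions are hypothetical stand-ins written in plain C; they are not the igraph_array3_t API and not an existing igraph function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dense 3D array of 0/1 voxels; NOT the igraph_array3_t API. */
typedef struct {
    size_t n1, n2, n3;
    const unsigned char *data;   /* n1 * n2 * n3 entries */
} image3_t;

/* Flat index of voxel (i, j, k); the first index varies fastest. */
static size_t image3_index(const image3_t *img, size_t i, size_t j, size_t k) {
    return i + img->n1 * (j + img->n2 * k);
}

/* Connect every pair of face-adjacent voxels that are both set.
 * Edges are written as consecutive (from, to) flat indices into `edges`,
 * which must have room for 2 * 3 * n1 * n2 * n3 entries.
 * Returns the number of edges written. */
static size_t image3_edges(const image3_t *img, size_t *edges) {
    size_t i, j, k, count = 0;
    for (k = 0; k < img->n3; k++) {
        for (j = 0; j < img->n2; j++) {
            for (i = 0; i < img->n1; i++) {
                size_t v = image3_index(img, i, j, k);
                if (!img->data[v]) {
                    continue;
                }
                /* Only look "forward" so each edge is emitted once. */
                if (i + 1 < img->n1 && img->data[image3_index(img, i + 1, j, k)]) {
                    edges[2 * count] = v;
                    edges[2 * count + 1] = image3_index(img, i + 1, j, k);
                    count++;
                }
                if (j + 1 < img->n2 && img->data[image3_index(img, i, j + 1, k)]) {
                    edges[2 * count] = v;
                    edges[2 * count + 1] = image3_index(img, i, j + 1, k);
                    count++;
                }
                if (k + 1 < img->n3 && img->data[image3_index(img, i, j, k + 1)]) {
                    edges[2 * count] = v;
                    edges[2 * count + 1] = image3_index(img, i, j, k + 1);
                    count++;
                }
            }
        }
    }
    return count;
}
```

The resulting flat edge list could then be handed to whatever graph constructor is in use. Unset voxels keep their indices here, so they would show up as isolated vertices unless the indices are compacted first.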

When, if ever, I get to it, I don’t know. My point is that stripping out pieces of code that might find use in the future should not be done lightly. “Clean codebase” is to some extent subjective. We need to know what tangible benefits we are getting. Don’t want to maintain it? Don’t actually have to right now. Worried that people will use it and it might have bugs? Remove it from the docs.

iosonofabio · 5 July 2020 08:43

Hi guys,

I can see both points of view.

But there is one aspect of the problem that is asymmetric. If nobody is using that data type right now, we can remove it if we want. Once people start using it, deprecating will be a mess.

@szhorvat I’ve been in your shoes several times, especially during my postdoc. I slowly realized - the hard way, alas! - that there is a huge difference between “I need this data structure right now because I have a paper in review that uses it” and “I’m thinking of perhaps starting a project that could possibly benefit from that data structure”. The big difference is iterating through 15 wrong prototypes of the data structures you need and finally requesting the exact one you want.

So I’m with @vtraag to remove (for now) but I’ll be supporting @szhorvat in the future if he wants to reintroduce it for an application.

I’m also for adding some docs on the useful data types, as mentioned above by others.

szhorvat · 5 July 2020 10:12

As I said, there is a specific addition that I have meant to implement for a while, and it requires array3.

What practical gains are there from removing all the code (as opposed to removing it from the documentation)?

iosonofabio · 5 July 2020 10:13

The practical gain is as above: if people start using it (with or without docs), we are in for a sea of trouble…

szhorvat · 8 July 2020 07:30

If it’s made clear that only documented functions are supported, I think there won’t be trouble.

I am asking not to remove array3 specifically because I will likely need it in the future, and because it is not as trivial as it may seem. For example, it already seems to be hooked up to the R interface. Not being very familiar with the R interface, I would not be able to easily do this if I had to add my own 3D array in the future.

iosonofabio · 8 July 2020 07:34

If you really need it, of course I’m not against keeping it.

Maintaining a project like igraph is like walking a wire for us: to keep it alive we need to infuse some of our passion and requirements, but we also have to strive for a streamlined interface beyond our own needs. Therefore, let’s try to not keep too many of these loose threads or we’ll get tangled…
