igraph integer transition

At the moment, various integer types are being used in igraph, ranging from int to long int to igraph_integer_t. This post describes a proposal to clean up the integer types, and in particular, support 64-bit integers properly. This is a quite disruptive breaking change with respect to previous major releases. We therefore want to do this only once, so we want to make sure that we are doing this right. We therefore also solicit feedback from users about this proposal. We plan to do this for release 0.10.

Basic data types

We intend to have four basic (i.e. atomic) data types:

  • igraph_real_t. This will always be a double, i.e. a 64-bit floating point number.
  • igraph_integer_t. This will be an integer type that by default is 64-bit on 64-bit systems and 32-bit on 32-bit systems. It will be configurable to be a 32-bit integer type, even on 64-bit systems, in order to support the R interface, which unfortunately does not support 64-bit integers.
  • igraph_bool_t. This datatype will represent a boolean. Typically, we can store this in a char data type (8-bit). In order to support the R interface, this type is configurable to use a 32-bit integer instead.
  • igraph_complex_t. This datatype will represent a complex number, consisting of a real part and a complex part, both of which are represented as igraph_real_t.

In public headers, no other basic data types are allowed. All data types used in igraph should be defined on the basis of these four basic data types. Internally, it may be necessary to sometimes use other types, for example to interface with external libraries, or to support specific implementations of random number generators. This usage should be limited to what is strictly necessary.

In particular, this means that there should be (nearly) no int or long int being used, but only igraph_integer_t. Similarly, (almost) no float or double should be used, only igraph_real_t.

Array-like data structures

At the moment there are various array-like data structures that are defined on several basic data types. In particular, these are vector, matrix, array3, dqueue, heap,and stack. We intend to limit these data structures to the four basic data types that are introduced above. For example, for vector, we would only support the following types:

  • igraph_vector_real_t, defined as a vector of igraph_real_t elements.
  • igraph_vector_integer_t, defined as a vector of igraph_integer_t elements.
  • igraph_vector_bool_t, defined as a vector of igraph_bool_t elements.
  • igraph_vector_complex_t, defined as a vector of igraph_complex_t elements.

In addition, we will have a “default” type:

  • igraph_vector_t will be identical to an igraph_vector_real_t.

This default is consistent with the current implementation.

For the other data structures, we would similarly support only the indicated four basic data types, and make the “default” be identical to the data type defined on an igraph_real_t.

Similarly to the basic data types, in public headers, no other array-like data structures are allowed. All data types used in igraph should be defined on the basic of these four basic data types. Internally, it may be necessary to sometimes use other types, for example to interface with external libraries. This usage should be limited to what is strictly necessary. Although we may develop some helper data types for this specific purpose, these data types will be strictly private, and only limited to these specific purposes.

In particular, this means that we will remove igraph_vector_char_t, igraph_vector_long_t, igraph_vector_int_t and igraph_vector_float_t from the library, with of course first a deprecation of these types. Similarly, these types would also be remove for the other array-like data structures.

Errors

At the moment, most functions return an int for signalling an error condition. We will change this so that functions that return an error will return an igraph_error_t instead, which will be typedef’d to an int. This does not change the return data type of the functions, but it will clarify the semantics of the function definitions. It also distinguishes it from other functions that do return an actual integer, such as igraph_vcount, and not an error code.

Configurability

The R interface only supports double and int values. Hence, in order to keep supporting the R interface, we need to make the igraph_integer_t and the igraph_bool_t configurable to both be an int. Since any array-like data structures are defined on the basis of these basic data types, they should automatically change along with it. This should be strictly limited to the R interface only, and should not be used in other settings.

Graph representation

At the moment, the graph is represented using igraph_vector_t, i.e. using doubles. At some point, we will change this to work with the integer type, i.e. igraph_vector_integer_t. This will break much of the existing code, because the interaction with the graph is usually based on handling igraph_vector_t.

Previous discussions

This topic is further discussed in various places, starting at Put integer types into order · Issue #1450 · igraph/igraph · GitHub, with some follow up discussions in Fix integer types by vtraag · Pull Request #1626 · igraph/igraph · GitHub, including some more technical details.

Is the primary purpose of these types to have more flexibility, like making bool a smaller type, and changing the integer size? Do we want to keep this flexibility for the igraph_real_t? It seems like it will be a double forever, and I find double clearer than igraph_real_t.

Why would we keep the default igraph_vector_t, instead of only using igraph_vector_real_t? Because it’s used everywhere? I find it confusing, and noticed really late that I was using doubles as integers in lots of places. This does relate to the graph representation part.

Changing the error return types seems like it’s independent of the other parts. Since this is such a huge project, maybe it’s nice to split it up as much as possible, and see the error return types as a separate issue.

Do we discuss the plans here, and the execution at Fix integer types by vtraag · Pull Request #1626 · igraph/igraph · GitHub? I’m not completely sure what’s the status of that pull request. In general this seems like an issue where I could spend a lot of time on to fix things, but I don’t completely understand the plans.

One argument for having an igraph_real_t is that people will keep on asking why we don’t have it if we have igraph_integer_t, especially beause we used to have it in the past. It might also open up the possibility to port igraph to low-end embedded CPUs (like the STM32F4/F7) where one can save space in flash memory by not using doubles (a clever compiler can strip out the entire double-precision part of the math library if the whole project uses floats). I’m not saying that it will happen as we haven’t really had embedded devices in mind when we started igraph, but it’s a possibility, and I can see a few potential applications there (for instance, doing obstacle avoidance with drones in static environments can make use of a visibility graph of the obstacles for path planning).

I find it confusing, and noticed really late that I was using doubles as integers in lots of places.

Yes, that’s confusing, and it comes from igraph’s legacy as it was originally meant to be an R library only. In R, there are no integer vectors/arrays, numeric vectors are always stored as doubles, and it made it possible to use the exact same graph representation in R and in C with zero copying when crossing from C-world to R-world or vice versa.

Changing the error return types seems like it’s independent of the other parts. Since this is such a huge project, maybe it’s nice to split it up as much as possible, and see the error return types as a separate issue.

Yes, and actually, I wanted to do this part separately for the whole of this week and the previous one, but I have never gotten around to doing it :slight_smile: But I totally agree with you, we can first change the error return types, commit and test it, and then move on to the actual integer-related problems.

Re Vincent’s pull request: I don’t think it will be merged as-is, but don’t spend time on fixing it either. There was lots of search-and-replace stuff in that PR and it replaced ints with other types even in places where it didn’t make sense to do so. I’d probably rather take a bottom-up approach where we carefully fix the core data types first to behave nicely with 64-bit integers, and then slowly move up and fix the higher-level algorithms as well.

One difficulty that comes to my mind (and that I don’t really know what to do with at the moment) is to be able to construct test cases with large graphs (more than 2^32 vertices) so we can spot overflow errors in the algorithms that become apparent only when the graph is large. It is going to be quite tedious to construct graphs that are large enough to make a proper test case for the 32-bit to 64-bit transition but we can still compute the exact, expected results for them by other means.