A stable API for igraph 1.0?

@vtraag and I talked a bit about breaking the API this morning. I think it’s good to write down some of my thoughts to share with others as well.

We are aiming for a “stable” 1.0 release, but what does stable mean? Also, does it apply to source or binary compatibility? Stability is the enemy of flexibility and quick progress. It is useful to have a balance between stability and flexibility. This post is about how to achieve a good balance, and get the advantages from both.

Marking functions as experimental

Something I proposed in the past was to have “experimental” functions for which we do not make any promises about stability, i.e. they may be changed in a 1.x release (not only in 2.0). This is generally useful, since we may not have the clearest picture of what the best API would be when implementing some new functionality for the first time. It would also be perfectly fine for “direct” users who program with C/igraph.

But what about the expectations of distro/package maintainers and binary compatibility? What about “indirect” users who do not program igraph, but use software that depends on igraph?

Distro/package maintainers will likely expect that an igraph 1.0 binary (the shared library) can safely be replaced by an igraph 1.1 binary within a system. However, this will break software that used such “experimental” functions. One could blame this on that software (“you knew it was unstable API”) but there is very good reason for the high-level interfaces to make use of experimental functions: If we don’t actually use these experimental functions, we will not be able to accumulate the experience that will let us decide when the API has become good enough and graduate these functions to “stable” ones.

Breaking up igraph into several packages

Another option to deal with API stability is to break up igraph into several sub-packages, each with its own versioning and stability promises. The closer a sub-package is to the “core”, the more stable it can be. This was proposed before (I think both by Tamás and Vincent?): e.g. the SCG functionality is trouble, and wouldn’t it be nice to break it out into an auxiliary package?

However, this approach will only solve distro maintainers’ problems if the library is literally broken up into multiple shared libraries. I don’t love this idea, personally, because (1) the task seems daunting, and (2) it might make igraph more complicated to use. The target users are academics, i.e. not experienced programmers. igraph should strive to be as easy to use as possible. In fact, it already is, compared to other C or C++ libraries. For me, this is one of the main selling points.

Designing APIs with care

In this section, I do not propose a third possible solution. Instead I will talk a bit about one specific issue that repeatedly comes up, and which has impact on designing stable APIs.

Some graph theory problems can be solved with multiple different methods. Current examples in igraph are shortest paths, feedback arc set, and sampling graphs with given degrees. Typically, igraph functions have an enum parameter that selects the method. This is too rigid because different methods may have different sub-parameters, which cannot conveniently be exposed through a single function in C (in most high-level languages they can, but not in C). For example, igraph_degree_sequence_game has a method parameter, which may be IGRAPH_DEGSEQ_VL. The VL method relies on certain heuristics, which can be adjusted. This adjustability is not currently exposed, and cannot be exposed with such a rigid API.

Proposal: In such cases, let us have a main function in which one can select a method, but not adjust method parameters; in addition, let us have a separate “lower level” function for each specific method. Each will have its own special parameters.

Concrete example: This came up while implementing the minimum cycle basis calculation. The method I implemented has certain special parameters. I do not know at this point if the same parameters would make sense for other methods as well. Currently, I have igraph_minimum_cycle_basis(graph, result, param1, param2). It would be less restrictive to have igraph_minimum_cycle_basis(graph, result, method) and igraph_minimum_cycle_basis_method1(graph, result, param1, param2).
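To make the shape of this proposal concrete, here is a minimal sketch in C. It uses a toy problem (summing an array) rather than a graph algorithm, and every name in it (demo_sum, demo_sum_blocked, the block_size parameter, the default of 64) is hypothetical and not part of igraph. The point is only the layering: a main function that selects a method with fixed defaults, and one lower-level function per method that exposes that method's own parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is igraph API. */
typedef enum {
    DEMO_SUM_NAIVE,   /* plain left-to-right summation, no parameters */
    DEMO_SUM_BLOCKED  /* summation in blocks, has a tunable block size */
} demo_sum_method_t;

/* Method-specific function without special parameters. Returns 0 on success. */
int demo_sum_naive(const double *data, size_t n, double *result) {
    if (data == NULL || result == NULL) return 1;
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += data[i];
    *result = total;
    return 0;
}

/* Method-specific function: exposes the parameter only this method has. */
int demo_sum_blocked(const double *data, size_t n, double *result,
                     size_t block_size) {
    if (data == NULL || result == NULL || block_size == 0) return 1;
    double total = 0.0;
    size_t start = 0;
    while (start < n) {
        /* Clamp the last block to the end of the array. */
        size_t end = (n - start < block_size) ? n : start + block_size;
        assert(end > start); /* the loop always makes progress */
        double partial = 0.0;
        for (size_t i = start; i < end; i++) partial += data[i];
        total += partial;
        start = end;
    }
    *result = total;
    return 0;
}

/* Main function: selects a method, but does not expose method parameters. */
int demo_sum(const double *data, size_t n, double *result,
             demo_sum_method_t method) {
    switch (method) {
    case DEMO_SUM_NAIVE:
        return demo_sum_naive(data, n, result);
    case DEMO_SUM_BLOCKED:
        return demo_sum_blocked(data, n, result, 64); /* assumed default */
    }
    return 1; /* unknown method */
}
```

A new method with entirely different parameters can then be added as another lower-level function plus one enum value, without touching the signature of the main function.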

About “semantic versioning”

I am leaning towards the conclusion that rigidly adhering to semantic versioning is not the best choice for igraph.

I like the idea of having “experimental” functions very much (i.e. giving ourselves time, and soliciting user feedback, before settling on a final API). But this is not going to play well with semver, as it would require incrementing the major version frequently. We could still use x.y.z versions, but have somewhat different rules for compatibility than semver does: Source and binary compatibility are guaranteed when only z changes. Source compatibility is guaranteed for all except experimental functions when y changes. We should make it clear to distro maintainers that they can’t blindly swap out an x.y binary for an x.(y+1) one. When x changes, all bets are off and anything can change. This should happen as rarely as possible.
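The three levels can be written down as a small decision function. This is only a sketch of the rules as stated in this post, not an existing igraph facility; the type and function names (demo_version_t, demo_compat, demo_can_swap_binary) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical x.y.z version triple. */
typedef struct {
    int x, y, z;
} demo_version_t;

typedef enum {
    DEMO_COMPAT_NONE,        /* no guarantee at all */
    DEMO_COMPAT_SOURCE_ONLY, /* recompiling is expected to work */
    DEMO_COMPAT_FULL         /* source and binary compatible */
} demo_compat_t;

/* What a dependent built against `built` may expect from `installed`,
 * given whether it uses experimental functions. */
demo_compat_t demo_compat(demo_version_t built, demo_version_t installed,
                          bool uses_experimental) {
    assert(built.x >= 0 && installed.x >= 0);
    if (built.x != installed.x) {
        return DEMO_COMPAT_NONE; /* x changed: all bets are off */
    }
    if (built.y != installed.y) {
        /* y changed: source compatible, except for experimental functions */
        return uses_experimental ? DEMO_COMPAT_NONE : DEMO_COMPAT_SOURCE_ONLY;
    }
    return DEMO_COMPAT_FULL; /* only z changed */
}

/* A shared library may be swapped blindly only within the same x.y series. */
bool demo_can_swap_binary(demo_version_t built, demo_version_t installed) {
    return built.x == installed.x && built.y == installed.y;
}
```

Under these rules, going from 1.0.0 to 1.0.3 is a drop-in replacement, while going from 1.0.0 to 1.1.0 requires at least a rebuild, and possibly code changes if experimental functions are in use.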

We don’t need to use this three-level version number scheme (I don’t care about what a version specification looks like), but it would be nice to have these three increasingly disruptive levels of changes.

The main difference from rigid semver would be that an x.y shared library could not be safely replaced by an x.(y+1) library. In practice, there would not be major differences for users of the source code (who understand when they use experimental features), only for users of the binaries.

Should high-level interfaces bundle the C core?

The primary users of the C core are the high-level interfaces. If each of them were to bundle a matching version of the C core, many of the compatibility problems I described above would go away. It is the high-level interfaces that will use most igraph C functions, including experimental ones. Other dependents of C/igraph are less likely to use them.

Bundling a matching C core with each high-level interface (either by static linking, or by dynamically linking to a private version of the C core) will give us a lot of flexibility, allow us to move faster and spend resources on implementing new functionality rather than maintenance (and fussing with technical details), and will make it much easier to ensure that users will have a consistent experience.

Distro maintainers will always complain about “not duplicating code”, but that’s a philosophical argument, while what I said above is a real practical consideration for us.

(BTW: I am quite concerned about the incessant insistence of distro maintainers to “unbundle” libraries without understanding why they’re bundled in the first place, and without asking us. See here, this is dangerous. In that case: plfit needed to be adapted (RNG, error handling), prpack is abandoned and has important bugfixes in igraph, cliquer will likely not see another release according to my email exchange with its author, and again has bugfixes in igraph in response to actual user complaints. bliss could now be unbundled—thanks to the very helpful changes by its author after our email exchanges—, but I would not say that its API has stabilized yet, and resources spent on unbundling it for no immediate gain could be much better spent elsewhere.)

Versioning and compatibility of high-level interfaces

What I wrote above concerns the C core. It seems to me that it is easier to avoid breaking the API in high-level interfaces, because the high-level languages that igraph is exposed to all have very flexible syntax. Furthermore, I expect the vast majority of users of the high-level interfaces to be “source users” (i.e. people who program igraph), not “app users” / “binary users”. This is not true for the C core.

I think the proposed design (a main function plus separate per-method functions) is already what we have been doing with isomorphism functions; i.e. the top-level igraph_isomorphic() function provides a simple interface with no knobs to twist, while power users can make use of igraph_isomorphic_vf2() or igraph_isomorphic_bliss() and their many variants.

Regarding versioning: if we want igraph to be considered as a package in major Linux distributions, we must ensure some kind of ABI stability once we get past version 0.x, otherwise all the maintainers would come complaining to us (and rightly so). A good convention to have (IMHO) is to treat the major version number of igraph as the ABI version number as well. Debian currently has libigraph0 and libigraph0-dev because we are at ABI version 0 where everything is still possible and anything may break at any time. Once we reach 1.x and these packages get turned into libigraph1 and libigraph1-dev, we will have to be much more careful about the ABI. Maybe it would be worth reading the chapter in the Debian Policy Manual about shared libraries to see what the expectations are – if we can satisfy Debian, chances are that we can satisfy any other distro as well. One thing that I worry about is that distros like Debian may have scripts in place that compare the list of symbols from one particular ABI version to future releases and complain loudly if something changes that would break the ABI.
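One way a dependent could guard against an ABI mismatch under the "major version equals ABI version" convention is a check at startup. The sketch below is an assumption about how this might look, with made-up names (DEMO_ABI_VERSION, demo_runtime_abi_version, demo_abi_matches); it does not describe anything igraph currently provides.

```c
#include <assert.h>
#include <stdbool.h>

/* ---- would live in the public header (hypothetical) ---- */
#define DEMO_VERSION_MAJOR 1
#define DEMO_VERSION_MINOR 2
/* Convention from the text: the ABI version is the major version. */
#define DEMO_ABI_VERSION DEMO_VERSION_MAJOR

/* ---- would live in the shared library ---- */
/* Reports the ABI version the library itself was built with. */
int demo_runtime_abi_version(void) {
    return DEMO_ABI_VERSION;
}

/* ---- would live in the dependent program ---- */
/* `compiled_against` is the header value seen when the dependent was built. */
bool demo_abi_matches(int compiled_against) {
    assert(compiled_against >= 0);
    return compiled_against == demo_runtime_abi_version();
}
```

In a single translation unit the two values trivially agree; the check only becomes meaningful when the header and the shared library come from different builds.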

It seems we are in luck with respect to the experimental idea of @szhorvat. According to Debian policy:

An ABI change is backward-compatible if any reasonable program or library that was linked with the previous version of the shared library will still work correctly with the new version of the shared library. […] An example of an “unreasonable” program is one that uses library interfaces that are documented as internal and unsupported. If the only programs or libraries affected by a change are “unreasonable” ones, other techniques, such as declaring Breaks relationships with affected packages or treating their usage of the library as bugs in those packages, may be appropriate instead of changing the SONAME. However, the default approach is to change the SONAME for any change to the ABI that could break a program.

This does give some room to have ABI breaking functionality even in a stable major version, as long as it is clearly marked as such (and probably should be accompanied by a warning).

I don’t know what this project is about, but they seem to have a fairly comprehensive ABI policy that also seems to deal with experimental APIs:

https://doc.dpdk.org/guides/contributing/abi_policy.html

Also, this tool seems useful:

https://sourceware.org/libabigail/manual/abidiff.html

One thing that seems clear (from the Debian policy as well) is that we need some kind of a marker macro in the headers for experimental functions, as well as an \experimental marker in the comments (if we don’t have it yet) that we can use to clearly indicate which functions are experimental.
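A marker macro of this kind could look roughly like the sketch below. The macro names (DEMO_EXPERIMENTAL, DEMO_ALLOW_EXPERIMENTAL) and both functions are hypothetical. The sketch relies on the deprecated attribute of GCC and Clang to produce a compile-time warning when an experimental function is used without opting in; on other compilers the marker expands to nothing.

```c
#include <assert.h>

/* A client that accepts the instability opts in before including the header.
 * Remove this line to get a warning at every use of an experimental function. */
#define DEMO_ALLOW_EXPERIMENTAL

/* ---- would live in the public header (hypothetical) ---- */
#if defined(DEMO_ALLOW_EXPERIMENTAL)
#  define DEMO_EXPERIMENTAL
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_EXPERIMENTAL \
       __attribute__((deprecated("experimental API, may change in a minor release")))
#else
#  define DEMO_EXPERIMENTAL
#endif

/* Stable function: number of edges in a complete graph on n vertices. */
int demo_complete_graph_edges(int n);

/* Experimental function: cycle rank m - n + c of a graph with
 * n vertices, m edges and c connected components. */
DEMO_EXPERIMENTAL int demo_cycle_rank(int n, int m, int c);

/* ---- would live in the library ---- */
int demo_complete_graph_edges(int n) {
    assert(n >= 0);
    return n * (n - 1) / 2;
}

int demo_cycle_rank(int n, int m, int c) {
    assert(n >= 0 && m >= 0 && c >= 0);
    return m - n + c;
}
```

The same macro gives documentation tooling and symbol-listing scripts a single thing to search for when deciding which functions are exempt from the stability promise.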

One more thing that came to my mind. Python’s C headers make use of a macro named Py_LIMITED_API. If this macro is defined when compiling a Python extension, the headers make sure to hide any function that is not considered to be part of Python’s stable C ABI so one is forced to use only the stable part. This makes it possible to have C extensions that are compiled once against the stable ABI (which was introduced in Python 3.2) and that are expected to keep working in later versions of Python 3.x.
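Applied to a library like igraph, the same idea would mean hiding the declarations of non-stable functions behind a macro. The sketch below is only an illustration of that mechanism; DEMO_LIMITED_API and all function names are invented for the example and are not part of igraph or Python.

```c
#include <assert.h>

/* A client that wants only the stable subset defines this before
 * including the header. */
#define DEMO_LIMITED_API

/* ---- would live in the public header (hypothetical) ---- */
/* Stable: sum of degrees of a graph with the given number of edges. */
int demo_degree_sum(int edges);

#ifndef DEMO_LIMITED_API
/* Not stable: invisible to clients that define DEMO_LIMITED_API, so any
 * use of it in such a client fails at compile time. */
int demo_tuned_degree_sum(int edges, int heuristic_level);
#endif

/* ---- would live in the library ---- */
int demo_degree_sum(int edges) {
    assert(edges >= 0);
    return 2 * edges;
}

#ifndef DEMO_LIMITED_API
int demo_tuned_degree_sum(int edges, int heuristic_level) {
    (void) heuristic_level; /* placeholder: the toy result does not depend on it */
    return 2 * edges;
}
#endif

/* ---- would live in the client ---- */
int demo_client(void) {
#ifdef DEMO_LIMITED_API
    return demo_degree_sum(3);          /* only the stable API is visible */
#else
    return demo_tuned_degree_sum(3, 1); /* full API is available */
#endif
}
```

Unlike a warning-based marker, this approach turns accidental use of unstable functions into a hard compile error for clients that opted in to the limited API.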


Why do we have igraph_degree_sequence_game, instead of just exposing all the variants directly?

I am tempted to say that we don’t need the main function, but there are some good arguments for it:

One of the design goals for igraph is to be easy to use. This is definitely true for the high-level interfaces, but is also reflected in the C core. Whether we like it or not, many people coming to igraph do not yet understand the implemented graph and network concepts. They learn them through igraph. A typical user will arrive wanting to generate a random graph with given degrees. They want a function that “just does it”, with reasonable defaults. The next step is thinking about which method to use, and what the advantages or disadvantages of different methods are. Here, ideally each method will come with reasonable defaults again. The final step is to tune the parameters of a given method.

It is nice to allow people to go through this progression naturally. This argument fits the high-level interfaces much more than the C core though, which is why I said that I am tempted to say that we don’t need the main function. On the other hand: when someone implements a new high-level interface, they will need the same kind of guidance. The presence of a main function in the C core immediately makes it clear that they should ideally expose this functionality as a single function in the high-level language as well.

P.S. I am most definitely not arguing for becoming teachers with igraph. But in practice, we can’t completely eschew that responsibility. A design that gently shepherds users onto the right path is a good design. “You should learn graph theory properly before starting to use igraph” stops being a good (or a fair) argument at some point.

P.P.S. I was not around when this design was set down for degree_sequence_game or some other functions. The actual motivation might have been something different.
