We were discussing breaking the API with @vtraag this morning, and I think it is worth writing down some of my thoughts to share with others as well.
We are aiming for a “stable” 1.0 release, but what does stable mean? And does it apply to source or to binary compatibility? Stability is the enemy of flexibility and quick progress, so we need a balance between the two. This post is about how to strike that balance and gain the advantages of both.
Marking functions as experimental
Something I proposed in the past was to have “experimental” functions for which we make no stability promises, i.e. they may change in any 1.x release (not only in 2.0). This is generally useful, since we may not have a clear picture of what the best API would be when implementing new functionality for the first time. It would also be perfectly fine for “direct” users who program with C/igraph.
But what about the expectations of distro/package maintainers and binary compatibility? What about “indirect” users who do not program igraph, but use software that depends on igraph?
Distro/package maintainers will likely expect that an igraph 1.0 binary (the shared library) can safely be replaced by an igraph 1.1 binary within a system. However, this will break software that used such “experimental” functions. One could blame this on that software (“you knew it was an unstable API”), but there is a very good reason for the high-level interfaces to use experimental functions: if we don’t actually use them, we will never accumulate the experience that lets us decide when the API has become good enough to graduate these functions to “stable” status.
Breaking up igraph into several packages
Another option for dealing with API stability is to break igraph up into several sub-packages, each with its own versioning and stability promises. The closer a sub-package is to the “core”, the more stable it can be. This was proposed before (I think by both Tamás and Vincent?): e.g. the SCG functionality is troublesome; wouldn’t it be nice to break it out into an auxiliary package?
However, this approach only solves distro maintainers’ problems if the library is literally split into multiple shared libraries. Personally, I don’t love this idea because (1) the task seems daunting, and (2) it might make igraph more complicated to use. The target users are academics, i.e. not experienced programmers, so igraph should strive to be as easy to use as possible. In fact, it already is, compared to other C or C++ libraries; for me, this is one of its main selling points.
Designing APIs with care
In this section, I do not propose a third possible solution. Instead, I want to discuss one specific issue that comes up repeatedly and affects the design of stable APIs.
Some graph theory problems can be solved with multiple different methods. Current examples in igraph are shortest paths, feedback arc sets, and sampling graphs with given degrees. Typically, igraph functions have an `enum` parameter that selects the method. This is too rigid, because different methods may have different sub-parameters, which cannot conveniently be exposed through a single function in C (most high-level languages can do this, but C cannot). For example, `igraph_degree_sequence_game` has a `method` parameter, which may be `IGRAPH_DEGSEQ_VL`. The VL method relies on certain heuristics which can be adjusted, but this adjustability is not currently exposed, and cannot be exposed with such a rigid API.
Proposal: In such cases, let us have a main function in which one can select a method, but not adjust method parameters; in addition, let us have a separate “lower level” function for each specific method. Each will have its own special parameters.
Concrete example: this came up while implementing the minimum cycle basis calculation. The method I implemented has certain special parameters, and I do not know at this point whether the same parameters would make sense for other methods as well. Currently, I have `igraph_minimum_cycle_basis(graph, result, param1, param2)`. It would be less restrictive to have `igraph_minimum_cycle_basis(graph, result, method)` together with `igraph_minimum_cycle_basis_method1(graph, result, param1, param2)`.
About “semantic versioning”
I am leaning towards the conclusion that rigidly adhering to semantic versioning is not the best choice for igraph.
I like the idea of having “experimental” functions very much (i.e. giving ourselves time, and soliciting user feedback, before settling on a final API). But this does not play well with semver, as it would require incrementing the major version frequently. We could still use `x.y.z` versions, but with somewhat different compatibility rules than semver’s: source and binary compatibility is guaranteed when `z` changes. Source compatibility is guaranteed for everything except experimental functions when `y` changes. We should make it clear to distro maintainers that they cannot blindly swap an `x.y` binary for an `x.(y+1)` one. When `x` changes, all bets are off and anything can change; this should happen as rarely as possible.
We don’t need to use this three-level version number scheme (I don’t care about what a version specification looks like), but it would be nice to have these three increasingly disruptive levels of changes.
The main difference from rigid semver would be that an `x.y` shared library could not be safely replaced by an `x.(y+1)` one. In practice, there would not be major differences for users of the source code (who understand when they use experimental features), only for users of the binaries.
Should high-level interfaces bundle the C core?
The primary users of the C core are the high-level interfaces. If each of them bundled a matching version of the C core, many of the compatibility problems I described above would go away. It is the high-level interfaces that use most igraph C functions, including experimental ones; other dependents of C/igraph are less likely to use them.
Bundling a matching C core with each high-level interface (either by static linking, or by dynamically linking to a private copy of the C core) will give us a lot of flexibility, allow us to move faster and spend resources on implementing new functionality rather than on maintenance (and fussing with technical details), and will make it much easier to ensure that users have a consistent experience.
Distro maintainers will always complain about “not duplicating code”, but that’s a philosophical argument, while what I said above is a real practical consideration for us.
(BTW: I am quite concerned about distro maintainers’ incessant insistence on “unbundling” libraries without understanding why they are bundled in the first place, and without asking us. See here; this is dangerous. In our case:
plfit needed to be adapted (RNG, error handling),
prpack is abandoned and has important bugfixes in igraph,
cliquer will likely not see another release according to my email exchange with its author, and again has bugfixes in igraph in response to actual user complaints.
bliss could now be unbundled, thanks to the very helpful changes made by its author after our email exchanges, but I would not say that its API has stabilized yet, and resources spent on unbundling it for no immediate gain would be much better spent elsewhere.)
Versioning and compatibility of high-level interfaces
What I wrote above concerns the C core. It seems to me that it is easier to avoid breaking the API in the high-level interfaces, because the high-level languages igraph is exposed in all have very flexible syntax. Furthermore, I expect the vast majority of high-level interface users to be “source users” (i.e. people who program with igraph), not “app users” / “binary users”. This is not true for the C core.