This discussion is brought over here from our Slack channel.
Original message by yours truly:
I’ve noticed an odd behaviour of igraph’s C random number generator API and I thought it would be nice to discuss it here to see whether there’s a need to change the API.
To put things into context, the API looks like this:
- We have
igraph_rng_type_t, which is a struct that hosts a bunch of methods for a particular RNG (e.g.,get(), which generates a random number,get_real(), which generates a uniformly distributed value between 0 and 1 etc). - We have a bunch of “implementations” of
igraph_rng_type_tsuch asigraph_rngtype_rand()(relies on therand()function of the standard C library),igraph_rngtype_mt19937(a portable Mersenne Twister implementation that behaves the same way on all platforms, irrespectively of the behaviour of the standard C library),igraph_rngtype_Python(specific to the Python interface, it calls into Python to generate random numbers) etc - We then have
igraph_rng_t, which is a struct that holds a pointer to anigraph_rng_type_t(i.e. the “method table”), another pointer to a memory area holding the state of the RNG, and a boolean flag nameddef, which is 1 if the RNG is igraph’s default RNG. - Finally, we have
igraph_i_rng_default, which is a static thread-local variable of typeigraph_rng_t, and this RNG is used by igraph functions when you call stuff likeRNG_INT31(),RNG_UNIF01()and so on.
Stuff that I find problematic or confusing:
-
igraph_i_rng_defaultis thread-local, but its state is not (because it points to the same memory area,igraph_i_rng_default_state, from all threads). This basically means that without additional mutexing around the state, our RNG is not thread-safe. Maybe makingigraph_i_rng_default_statethread-local would be enough, but this needs to be tested. (But how?). -
igraph_rng_default()returns the address ofigraph_i_rng_default, which is sort-of-okay-ish.igraph_rng_set_default()also takes a pointer — but here’s the catch:igraph_rng_set_default()does not actually set the pointer returned byigraph_rng_default()but it copies theigraph_rng_tobject given to it. It is not documented thatigraph_rng_set_default()takes a copy, and neither it is documented (I think) that it works on a per-thread basis. I’ve spent quite a bit of time today debugging why I cannot restore the default RNG in the Python interface, only to realize that after callingigraph_rng_set_default(something), it is not true theigraph_rng_default() == something. I cannot decide whether it’s logical (and there’s a reason why it’s done this way) or it’s totally contrary to expectations. - I don’t quite understand what the
defmember of anigraph_rng_tis good for. All that I’ve found in the code is thatigraph_i_rng_defaultstarts withdef = 1, and seeding an RNG always setsdefto 0, but I don’t see this member being read anywhere.
Response from @szhorvat, copied here:
Regarding the thread-local state: I’m not sure it can be made thread-local without having some initialization code. When I looked at it (and I have to say I don’t recall all the details), my conclusions were that we have these options:
- Require explicit initialization
- Try to make initialization implicit by hiding it in something like
RNG_BEGIN()(I don’t like it) - Use C++ for initialization. Constructors of thread-local variables will be run on thread creation. It’s a hack, I don’t like it.
There’s also the question of:
- What integer type to use for the RNG return value (
igraph_integer_t,int64_t,int,long, etc.) - How many bits to actually use for the RNG return value.
Since igraph_integer_t might only hold 32 bits, and since int64_t may (??) be slow on 32-bit platforms, perhaps we should make it explicit that only 32 or 31 bits are used. Would we ever want more in the future? Why/why not?
Do we want consistent RNG behaviour on 32- and 64-bit platforms? I.e. should it return the same sequence of numbers for the same seed? I think yes.