This discussion is brought over here from our Slack channel.
Original message by yours truly:
I’ve noticed an odd behaviour of igraph’s C random number generator API and I thought it would be nice to discuss it here to see whether there’s a need to change the API.
To put things into context, the API looks like this:
- We have
igraph_rng_type_t, which is a struct that hosts a bunch of methods for a particular RNG (e.g.,
get(), which generates a random number,
get_real(), which generates a uniformly distributed value between 0 and 1 etc).
- We have a bunch of “implementations” of
igraph_rngtype_rand()(relies on the
rand()function of the standard C library),
igraph_rngtype_mt19937(a portable Mersenne Twister implementation that behaves the same way on all platforms, irrespectively of the behaviour of the standard C library),
igraph_rngtype_Python(specific to the Python interface, it calls into Python to generate random numbers) etc
- We then have
igraph_rng_t, which is a struct that holds a pointer to an
igraph_rng_type_t(i.e. the “method table”), another pointer to a memory area holding the state of the RNG, and a boolean flag named
def, which is 1 if the RNG is igraph’s default RNG.
- Finally, we have
igraph_i_rng_default, which is a static thread-local variable of type
igraph_rng_t, and this RNG is used by igraph functions when you call stuff like
RNG_UNIF01()and so on.
Stuff that I find problematic or confusing:
igraph_i_rng_defaultis thread-local, but its state is not (because it points to the same memory area,
igraph_i_rng_default_state, from all threads). This basically means that without additional mutexing around the state, our RNG is not thread-safe. Maybe making
igraph_i_rng_default_statethread-local would be enough, but this needs to be tested. (But how?).
igraph_rng_default()returns the address of
igraph_i_rng_default, which is sort-of-okay-ish.
igraph_rng_set_default()also takes a pointer — but here’s the catch:
igraph_rng_set_default()does not actually set the pointer returned by
igraph_rng_default()but it copies the
igraph_rng_tobject given to it. It is not documented that
igraph_rng_set_default()takes a copy, and neither it is documented (I think) that it works on a per-thread basis. I’ve spent quite a bit of time today debugging why I cannot restore the default RNG in the Python interface, only to realize that after calling
igraph_rng_set_default(something), it is not true the
igraph_rng_default() == something. I cannot decide whether it’s logical (and there’s a reason why it’s done this way) or it’s totally contrary to expectations.
- I don’t quite understand what the
defmember of an
igraph_rng_tis good for. All that I’ve found in the code is that
def = 1, and seeding an RNG always sets
defto 0, but I don’t see this member being read anywhere.
Response from @szhorvat, copied here:
Regarding the thread-local state: I’m not sure it can be made thread-local without having some initialization code. When I looked at it (and I have to say I don’t recall all the details), my conclusions were that we have these options:
- Require explicit initialization
- Try to make initialization implicit by hiding it in something like
RNG_BEGIN()(I don’t like it)
- Use C++ for initialization. Constructors of thread-local variables will be run on thread creation. It’s a hack, I don’t like it.
There’s also the question of:
- What integer type to use for the RNG return value (
- How many bits to actually use for the RNG return value.
igraph_integer_t might only hold 32 bits, and since
int64_t may (??) be slow on 32-bit platforms, perhaps we should make it explicit that only 32 or 31 bits are used. Would we ever want more in the future? Why/why not?
Do we want consistent RNG behaviour on 32- and 64-bit platforms? I.e. should it return the same sequence of numbers for the same seed? I think yes.