This discussion is brought over here from our Slack channel.
Original message by yours truly:
I’ve noticed an odd behaviour of igraph’s C random number generator API and I thought it would be nice to discuss it here to see whether there’s a need to change the API.
To put things into context, the API looks like this:
- We have
igraph_rng_type_t
, which is a struct that hosts a bunch of methods for a particular RNG (e.g.,get()
, which generates a random number,get_real()
, which generates a uniformly distributed value between 0 and 1 etc). - We have a bunch of “implementations” of
igraph_rng_type_t
such asigraph_rngtype_rand()
(relies on therand()
function of the standard C library),igraph_rngtype_mt19937
(a portable Mersenne Twister implementation that behaves the same way on all platforms, irrespectively of the behaviour of the standard C library),igraph_rngtype_Python
(specific to the Python interface, it calls into Python to generate random numbers) etc - We then have
igraph_rng_t
, which is a struct that holds a pointer to anigraph_rng_type_t
(i.e. the “method table”), another pointer to a memory area holding the state of the RNG, and a boolean flag nameddef
, which is 1 if the RNG is igraph’s default RNG. - Finally, we have
igraph_i_rng_default
, which is a static thread-local variable of typeigraph_rng_t
, and this RNG is used by igraph functions when you call stuff likeRNG_INT31()
,RNG_UNIF01()
and so on.
Stuff that I find problematic or confusing:
-
igraph_i_rng_default
is thread-local, but its state is not (because it points to the same memory area,igraph_i_rng_default_state
, from all threads). This basically means that without additional mutexing around the state, our RNG is not thread-safe. Maybe makingigraph_i_rng_default_state
thread-local would be enough, but this needs to be tested. (But how?). -
igraph_rng_default()
returns the address ofigraph_i_rng_default
, which is sort-of-okay-ish.igraph_rng_set_default()
also takes a pointer — but here’s the catch:igraph_rng_set_default()
does not actually set the pointer returned byigraph_rng_default()
but it copies theigraph_rng_t
object given to it. It is not documented thatigraph_rng_set_default()
takes a copy, and neither it is documented (I think) that it works on a per-thread basis. I’ve spent quite a bit of time today debugging why I cannot restore the default RNG in the Python interface, only to realize that after callingigraph_rng_set_default(something)
, it is not true theigraph_rng_default() == something
. I cannot decide whether it’s logical (and there’s a reason why it’s done this way) or it’s totally contrary to expectations. - I don’t quite understand what the
def
member of anigraph_rng_t
is good for. All that I’ve found in the code is thatigraph_i_rng_default
starts withdef = 1
, and seeding an RNG always setsdef
to 0, but I don’t see this member being read anywhere.
Response from @szhorvat, copied here:
Regarding the thread-local state: I’m not sure it can be made thread-local without having some initialization code. When I looked at it (and I have to say I don’t recall all the details), my conclusions were that we have these options:
- Require explicit initialization
- Try to make initialization implicit by hiding it in something like
RNG_BEGIN()
(I don’t like it) - Use C++ for initialization. Constructors of thread-local variables will be run on thread creation. It’s a hack, I don’t like it.
There’s also the question of:
- What integer type to use for the RNG return value (
igraph_integer_t
,int64_t
,int
,long
, etc.) - How many bits to actually use for the RNG return value.
Since igraph_integer_t
might only hold 32 bits, and since int64_t
may (??) be slow on 32-bit platforms, perhaps we should make it explicit that only 32 or 31 bits are used. Would we ever want more in the future? Why/why not?
Do we want consistent RNG behaviour on 32- and 64-bit platforms? I.e. should it return the same sequence of numbers for the same seed? I think yes.