Possible shortcomings of the existing RNG API

This discussion is brought over here from our Slack channel.


Original message by yours truly:

I’ve noticed an odd behaviour of igraph’s C random number generator API and I thought it would be nice to discuss it here to see whether there’s a need to change the API.

To put things into context, the API looks like this:

  1. We have igraph_rng_type_t , which is a struct that hosts a bunch of methods for a particular RNG (e.g., get() , which generates a random number, get_real() , which generates a uniformly distributed value between 0 and 1 etc).
  2. We have a bunch of “implementations” of igraph_rng_type_t such as igraph_rngtype_rand() (relies on the rand() function of the standard C library), igraph_rngtype_mt19937 (a portable Mersenne Twister implementation that behaves the same way on all platforms, irrespectively of the behaviour of the standard C library), igraph_rngtype_Python (specific to the Python interface, it calls into Python to generate random numbers) etc
  3. We then have igraph_rng_t , which is a struct that holds a pointer to an igraph_rng_type_t (i.e. the “method table”), another pointer to a memory area holding the state of the RNG, and a boolean flag named def , which is 1 if the RNG is igraph’s default RNG.
  4. Finally, we have igraph_i_rng_default , which is a static thread-local variable of type igraph_rng_t , and this RNG is used by igraph functions when you call stuff like RNG_INT31() , RNG_UNIF01() and so on.

Stuff that I find problematic or confusing:

  • igraph_i_rng_default is thread-local, but its state is not (because it points to the same memory area, igraph_i_rng_default_state , from all threads). This basically means that without additional mutexing around the state, our RNG is not thread-safe. Maybe making igraph_i_rng_default_state thread-local would be enough, but this needs to be tested. (But how?).
  • igraph_rng_default() returns the address of igraph_i_rng_default , which is sort-of-okay-ish. igraph_rng_set_default() also takes a pointer — but here’s the catch: igraph_rng_set_default() does not actually set the pointer returned by igraph_rng_default() but it copies the igraph_rng_t object given to it. It is not documented that igraph_rng_set_default() takes a copy, and neither it is documented (I think) that it works on a per-thread basis. I’ve spent quite a bit of time today debugging why I cannot restore the default RNG in the Python interface, only to realize that after calling igraph_rng_set_default(something) , it is not true the igraph_rng_default() == something . I cannot decide whether it’s logical (and there’s a reason why it’s done this way) or it’s totally contrary to expectations.
  • I don’t quite understand what the def member of an igraph_rng_t is good for. All that I’ve found in the code is that igraph_i_rng_default starts with def = 1 , and seeding an RNG always sets def to 0, but I don’t see this member being read anywhere.

Response from @szhorvat, copied here:

Regarding the thread-local state: I’m not sure it can be made thread-local without having some initialization code. When I looked at it (and I have to say I don’t recall all the details), my conclusions were that we have these options:

  • Require explicit initialization
  • Try to make initialization implicit by hiding it in something like RNG_BEGIN() (I don’t like it)
  • Use C++ for initialization. Constructors of thread-local variables will be run on thread creation. It’s a hack, I don’t like it.

There’s also the question of:

  • What integer type to use for the RNG return value ( igraph_integer_t , int64_t , int , long , etc.)
  • How many bits to actually use for the RNG return value.

Since igraph_integer_t might only hold 32 bits, and since int64_t may (??) be slow on 32-bit platforms, perhaps we should make it explicit that only 32 or 31 bits are used. Would we ever want more in the future? Why/why not?

Do we want consistent RNG behaviour on 32- and 64-bit platforms? I.e. should it return the same sequence of numbers for the same seed? I think yes.

Sounds logical to me as well (after all, that’s why we have our own Mersenne Twister implementation as the default RNG instead of relying on rand() from the standard library). In that case, the return value of the RNG function should not depend on whether the platform is 32-bit or 64-bit; it has to be the same data type, preferably an uint32_t.

The choice of the data type constrains our hand a bit in terms of what RNG implementations we can accommodate easily. The most common Mersenne Twister implementation has a word length of 32 bits, so that fits in nicely with an uint32_t return type. Other RNGs might use a different word length so we should probably do our homework a bit and see what word length the state-of-the-art RNGs use.

At the same time, we should probably get rid of the seemingly awkward RNG_INT31() function as well. I mean, the part that is being awkward here is the bit length (31 bits); we only used 31 bits so we can safely store it in a signed long int in the Viger-Latapy graph generator, but there’s absolutely no need for that; we could just as well use 32 bits – and if the user really needs a smaller number of random bits at some point, there’s always the possibility of shifting the result down by a few bits. (That’s what Python’s getrandbits() function does if you ask for less than 32 bits – it gets the next 32 bits from the RNG and then discards a few).

Let me continue the brainstorming.

The picture I describe below seems nice to me. Let’s think about whether it’s feasible for igraph.

We can have two types of functions:

  • A low-level function that essentially returns random bits. This could for example return a fixed type of uint32_t on all platforms.
  • High-level sampling functions that returns igraph’s native types, namely igraph_integer_t and igraph_real_t.

In the current implementation, the “high-level functions” are igraph_rng_get_integer(), igraph_rng_get_unif(), igraph_rng_get_normal(), etc.

It would be nice if the high-level functions would completely fill their return types with random bits. For example, if igraph_integer_t is 64-bit, it would be great if igraph_rng_get_integer(rng, -2^63, 2^63-1) would actually return 64 random bits instead of stretching a 32-bit integer to fill its 64-bit range. This is not currently implemented. The same goes for igraph_real_t which on most of our target platforms holds 53 bits (i.e. many more than 32).

Right now, we rely on rand_max() being large enough to give a reasonable resolution, and do not try to ensure that all bits are random.

In the current implementation, it is not so clear what the “low-level” function returning random bits is. Sometimes rng->get() is used and sometimes rng->get_real() is used. I assume this is because get_real() is the “primary” random function of R? If it is only because of R, I think we can do away with this, and always rely only on get(). I am saying this because the default implementations of the high-level functions are not used by the R interface anyway. Instead, R’s own sampling functions (for uniform, normal, etc. distributions) are exposed directly.


Concrete proposal

This is a tentative proposal only, and very much debatable. I expect that the parts about rng_max() may be controversial.

  • rng->get() should be changed to return an unsigned 32-bit integer in an uint32_t.
  • We get rid of rng_min() (already deprecated) and also rng_max(). Instead of using rng_max(), we ensure that rng->get() always returns 32 random bits, whatever that takes. If the underlying generator outputs only 16 bits, then two such numbers will be combined to get 32 bits. If the underlying generator outputs 64 bits, then the first call returns the first 32 of them, then next one the remaining 32.
  • The usage of rng->get_real() should be banished from the default implementations of high-level sampling functions.
  • The default implementation of the uniform real generator (i.e. igraph_rng_get_unif01()) should use not one, but two raw 32-bit values in order to be able to fill all 52 bits. This can then be used as a building block for other distributions over the reals.
  • The default implementation of the uniform integer generator (i.e. igraph_rng_get_integer()) should conditionally use two raw 32-bit values instead of one if the requested integer range is large enough to necessitate this (only possible in 64-bit mode).
  • We keep all current high-level samplers, except for igraph_rng_get_int31() which will be removed. Those who need raw values should call rng->get() directly.

Fixing the raw values to be 32-bit on all platforms, and using precisely two raw values to generate real numbers on all platforms (even if that platform has an unusual floating-point representation) should ensure that if the same seed and same RNG is set, igraph will behave the same way, regardless of the platform.


Problems?

I think that there is one point in my proposal that may be controversial, namely enforcing that we always get precisely 32 random bits (or maybe only 31 if that fits the majority of practical generators better) instead of allowing the flexibility of having an arbitrary rand_max().

We should probably look into whether any other system does this, or if anything has been written about this approach.

Any thoughts?

How many bits should the raw value have?

Is 32 the best choice? More than 32 is problematic on 32-bit systems, so it should not be more.

What do the existing generators produce?

igraph_rngtype_glibc2 and igraph_rngtype_rand both produce 31 bit only. Both of these are kept only for backwards compatibility (to be able to reproduce calculations done with old igraph). We might want to get rid of them.

igraph_rngtype_R also produces 31 bits, but this is easily changed, this it uses the real generator as the primary one. Furthermore, with R, none of the default samplers would be used.

What about the Python one @tamas ? Can it produce 32 bits, or is 31 better?

I have to double-check this, but I think that the Mathematica one can return 32 bits on 32-bit platforms (and 64 bits on 64-bit ones). It does return a signed integer, but I think it can utilize the negative range as well.

Overall, going with only 31 bits seems safer when it comes to integrating igraph into other systems. On the other hand, we can’t completely fill a 64-bit igraph_integer_t with only two 31-bit raw values.

What would we gain from my proposal above?

  • Simplicity
  • More randomness, and full utilization of all 64 bits that we will soon have in igraph_integer_t. We also don’t have to rely on the (potentially bad) resolution of the RNG when generating random reals.

In what other ways can we get these advantages?

Another possible solution is to use random reals as “raw” values, just like R does. This will only give us 52 bits instead of the full 64, but that is a good enough approximation for most applications. I don’t see anyone wanting to generate random integers with a greater range than 2^{52} \approx 4.5\times10^{15}. For reference, 10^15 bytes is 1000 terabytes. We will probably not need indices this large even 20 years from now.

Agree, with a possible caveat. igraph’s built-in glibc2 RNG has a declared purpose of being compatible with the RNG in glibc2. This particular RNG returns 31 bits for some reason; see igraph_i_rng_glibc2_get(), which is probably lifted verbatim from glibc2. But, to simplify our lives a bit, we coudl actually assume that all RNGs return a given number of random bits, and instead of rng->max(), we could have rng->num_bits(), which returns the number of random bits that a single invocation of rng->get() returns. rng->max() could then be kept and changed to return pow(2, rng->num_bits()) - 1.

See above regarding a potential case where we cannot control how many random bits we get from rng->get().

Can’t we actually declare rng->get_real() a higher-level sampling function as well? Both the glibc2 and the mt19937 implementation of get_real() in igraph’s C core simply falls back to rng->get() followed by a division.

Otherwise I agree with the rest. Maybe it’s also worth looking at the RNG interface of the GNU GSL for inspiration.

The RNG in python-igraph uses Python’s random.random() for ->get_real(), and uses random.getrandbits(32) for ->get(), but it’s easy to change.

Yes, that was what I meant all along.

There are the rng->get_... functions, which should all be optional, and serve to override the default implementations. Only rng->get() should be mandatory and relied on.