how to properly test random functions

GroteGnoom · 14 January 2021 12:20

@szhorvat
For example igraph_growing_random_game. Should I only test things that are the same for each call (e.g. the number of vertices) ?

I tried

#undef RNG_BEGIN
#define RNG_BEGIN() igraph_rng_seed(igraph_rng_default(), 0);	\
	igraph_rng_default()->def=2;}

to fix the seed, but that didn’t work out, and I am not sure that’s the right way to go anyway.

vtraag · 14 January 2021 13:08

You can set the seed using igraph_rng_seed(igraph_rng_default(), 0) as you already found out. There is no need to redefine RNG_BEGIN. Is there any specific that is not working as expected when setting the seed?

There are quite some other functions that are stochastic, or use a random graph, and hence depend on the rng seed. You could also take a look at those if you want to see how they are dealing with this situation, e.g. tests/unit/igraph_coloring.c to name just one example.

szhorvat · 14 January 2021 17:33

Many random functions are not tested, and we don’t have proper guidelines on how to do it. Ideas are welcome. Here are a few from the top of my head:

Do set a seed explicitly, for example igraph_rng_seed(igraph_rng_default(), 42). (Note: using 0 is likely fine, personally I’m averse to it … with some not very well implemented RNGs, certain seeds have been known to produce suboptimal output)
Do not use RNG_BEGIN in user code (it’s only for internal use to ensure that he default generator was initialized).
Think about what properties should be the same for all generated instances.

For example, with igraph_growing_random_game, the vertex and edge counts are fixed. If citation=true and m=1, then the graph should be a tree (see igraph_is_tree). Now that I look at it, I notice that the documentation is not comprehensive: it does not state that with citation=true, not only do edges originate from the newly added vertex, but they target an old vertex (i.e. no self-loops are created). When discovering such things, it’s good to amend/correct the documentation.

As you say, common things that are invariant are vertex count and edge count. Yet another things is if the graph lacks multi-edges or self-loops (with growing_random_game, it does not).
It is quite tedious, but it is useful to check some edge cases. E.g., what if n=0? (I fixed lots of edge-case failures in the past.) It is also useful to have a glance at the source code and make sure that the function does check its input arguments for validity. If it doesn’t, add it to All graph generators should check their input for validity · Issue #1322 · igraph/igraph · GitHub
In some cases, even though the function is random, for some input parameters there is only one valid output. One such example is threshold degree sequences with igraph_degree_sequence_game—there is precisely one simple graph realization of these sequences. It’s useful to add such tests.
Even if the test is fairly limited, it is still useful, as it exercises the function, and gives AddressSanitizer a chance to find problems. IMO this is the most important reason to add these tests.

Topic		Replies	Views
Possible shortcomings of the existing RNG API Development C	6	449	11 June 2021
How can I reproduce the results of sample_tree() ? Usage R	2	246	26 September 2022
help with using multiple rng's in C Usage C	3	292	10 May 2022
Consistency of RNG_BEGIN and RNG_END Development C , R	17	555	25 February 2021
The unit tests thread Development C	56	1908	26 April 2021

how to properly test random functions

Related topics