# arc4random_uniform and avoiding modulo bias when using a random number generator

Using the `arc4random_uniform`

function is recommend over using `arc4random`

:
Former is advertised *as not having "modulo bias"* issues, see ```
man
arc4random
```

. (So I got curious, searched the web a bit, found
this blog post,
this answer and
this link to the source code of
one version used by Apple's.
It took me a bit to figure things out but when I got it I was surprised how
simple the solution is and the explanation could be. So here is my take at a
better explanation.) Say we have a random number generator that produces
numbers from 0 to 7 (i.e. 8 different values). Now in our application, we need
3 different values (i.e. 0 to 2, for simplicity). So the maximum source value
`SRC_MAX`

is 7, the maximum destination value `DST_MAX`

is 2. If we calculate
random values like

r = source_generator() % (DST_MAX + 1);

we end up with 0s and 1s a lot more than 2s — some people call that *modulo
bias* , because there is bias for/against certain values. In the semi-visual
world, the scenario looks like this ("XXX" marking troublemakers, i.e. values
causing bias):

/---------\ /---------\ /---------\ | | | | | | |XXX|XXX| 0 1 DST_ SRC_ MAX MAX

Now to solve the modulo bias one could pick *some way* so that troublemakers
are skipped, not taken into account. With troublemakers removed, we return to
uniform distribution. To do that, when we hit a troublemaker we just *roll the
dice again* until we no longer have a troublemaker. If our source generator is
producing uniformly distributed output, that's guaranteed to terminate,
quickly. In code, it could be a loop like this:

for (;;) { src = source_generator(); // [0 .. SRC_MAX] if (... no skip needed ...) break; } return src % (DST_MAX + 1); // [0 .. DST_MAX]

How many (and which values) need to be skipped? The number of troublemakers calculates as

trouble_count = (SRC_MAX + 1) % (DST_MAX + 1);

e.g. 2 for our case because (7+1) % (2+1) = 8 % 3 = 2 because 2*3 + 2 = 8. Our
input is a range ([0 .. `SRC_MAX`

]) and our output is a (smaller) range ([0
.. `DST_MAX`

]). There are two easy places where to skip values:
*(a) at the beginning* or *(b) at the end*.

a) Beginning > for (;;) { /---------\ /---------\ > src = source_generator(); |XXX|XXX| | | | | | | > if (src >= trouble_count) 0 1 DST_ SRC_ > break; MAX MAX > } > return src % (DST_MAX + 1); b) End > for (;;) { /---------\ /---------\ > src = source_generator(); | | | | | | |XXX|XXX| > if (src <= SRC_MAX - trouble_count) 0 1 DST_ SRC_ > break; MAX MAX > } > return src % (DST_MAX + 1);

To me (b) seems more intuitive, the
`arc4random_uniform`

sources I looked at
went for (a). Now the only tricky part left is how to calculate

trouble_count = (SRC_MAX + 1) % (DST_MAX + 1);

for `SRC_MAX`

= `0xffffffff`

= 2³²-1 without `(SRC_MAX + 1)`

wrapping around
to 0. I would go with

trouble_count = (SRC_MAX - (DST_MAX + 1) + 1) % (DST_MAX + 1);

The idea is that `(k - n) % n`

equals `k % n`

as we are dealing with modulus.
That's all.