Single-use PRNG seedable with consecutive seeds - random

I need to make a pseudorandom number generator with a particular twist. Instead of generating numbers serially by using the seed from the previous generation for the new generation of a random number as it is usually done, I need a sequence of pseudorandom numbers generated in parallel from a consecutive sequence of seeds.
It would work like this, executed in parallel, each thread producing only a single number, with nothing shared or stored between threads:
thread #0: my_prng(1000) -> 1455191155 -> array[0]
thread #1: my_prng(1001) -> 2432152707 -> array[1]
thread #2: my_prng(1002) -> 185188134 -> array[2]
It's for generating image noise in parallel from a GPU (using OpenCL) so:
it should run fast, using just a few operations
it doesn't need to be cryptographically secure; it's just for graphics, so it only needs to look about random
low periods are just fine, even 2^24 would do
it only needs to make 32-bit integers
it shouldn't use any memory, no buffers, and not store anything in a variable other than the result (the resulting new seed if there were one would go unused anyway)
it cannot rely on calls to rand() as it's not available in OpenCL or rely on any library
it shouldn't loop to emulate serial generation (for instance looping 60 times just to make the 60th number)
it literally just needs to make a good pseudorandom number from a seed like 1000 that doesn't share a pattern with numbers made from adjacent seeds
None of the typical PRNG algorithms that I've tried could produce sequences from adjacent seeds that looked even remotely random, they're not meant to be seeded and used that way.

If you want a 32-bit -> 32-bit RNG, the period would be 2^32, and with 2^24 numbers in each stream you're limited to 2^8 streams.
Having said that, you might want to look into an LCG RNG with the following twist: implement fast skip-ahead as described in F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
Thus, you start with seed 1, and each subsequent seed just skips ahead by 2^24 along the sequence:
uint32_t stream = 1u << 24; // numbers per stream
rng.set_seed(uint32_t seed) {
    rng.skip_ahead(seed * stream); // unsigned, so seeds up to 2^8 - 1 don't overflow
}
Thus, you're guaranteed to get non-overlapping streams covering your whole period.
Code which implements this idea for a 63-bit generator is here
UPDATE
F. Brown's skip-ahead is logarithmic in N, i.e. O(log2(N)).
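To make the logarithmic skip-ahead concrete, here is a minimal Python sketch of the usual doubling scheme for an LCG x -> (a*x + c) mod m. The function names, the stride helper, and the 32-bit constants (the Numerical Recipes LCG) are illustrative assumptions, not taken from the answer or from Brown's paper.

```python
def lcg_skip(x, n, a, c, m):
    """Return the LCG state reached after n steps of x -> (a*x + c) % m."""
    big_a, big_c = 1, 0              # accumulated map: x -> big_a*x + big_c
    while n > 0:
        if n & 1:                    # fold the current 2^k-step map into the result
            big_a = (big_a * a) % m
            big_c = (big_c * a + c) % m
        c = (c * (a + 1)) % m        # square the map: 2^k steps -> 2^(k+1) steps
        a = (a * a) % m
        n >>= 1
    return (big_a * x + big_c) % m


# Illustrative 32-bit parameters (assumed, not from the answer).
A, C, M = 1664525, 1013904223, 2**32


def stream_start(seed, stride=1 << 24):
    """Hypothetical helper: state at the start of stream number `seed`."""
    return lcg_skip(1, seed * stride, A, C, M)
```

The loop runs once per bit of n, which is where the O(log2(N)) cost comes from.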

Following Severin Pappadeux's answer I looked into fast skipping of LCGs and found that it is actually very simple to adapt the MINSTD algorithm for this using a simple modular exponentiation.
With MINSTD being minstd(n+1) = 16807*minstd(n) mod 2147483647, and starting from minstd(0) = 1, we get minstd(n) = 16807^n mod 2147483647.
Here's my resulting algorithm in OpenCL:
uint pow_mod(uint base, uint expon, uint mod)
{
    // 64-bit intermediates: a 31-bit x 31-bit product overflows a 32-bit int
    ulong x = 1, power = base % mod;
    for (; expon > 0; expon >>= 1)
    {
        if (expon & 1)
            x = (x * power) % mod;
        power = (power * power) % mod;
    }
    return (uint)x;
}
uint rand16(uint pos)
{
return pow_mod(16807, pos, 2147483647) >> 13 & 0xFFFF;
}
uint rand32(uint pos)
{
return rand16(pos) << 16 | rand16(pos + 0x80000000);
}
MINSTD produces 31 bits (but never the value 2^31-1). However, I found bad patterns in the 11 least significant bits, so I take 16 of the 20 good bits and make a good 32-bit random number out of two of those.
pos would be a seed plus an offset, representing a position in the sequence of MINSTD outputs.
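For checking the idea on a CPU, the same computation can be sketched in Python, where the built-in three-argument pow does the modular exponentiation and integers never overflow. This is only a reference sketch: the name minstd_at is made up here, and the masking of pos + 0x80000000 is an assumption that mimics 32-bit unsigned wrap-around.

```python
MINSTD_A = 16807
MINSTD_M = 2147483647  # 2^31 - 1


def minstd_at(pos):
    """MINSTD value after `pos` steps from seed 1, computed directly."""
    return pow(MINSTD_A, pos, MINSTD_M)


def rand16(pos):
    # Drop the 13 lowest bits and keep the next 16.
    return (minstd_at(pos) >> 13) & 0xFFFF


def rand32(pos):
    # Emulate the 32-bit unsigned wrap-around of `pos + 0x80000000`.
    other = (pos + 0x80000000) & 0xFFFFFFFF
    return (rand16(pos) << 16) | rand16(other)


# One value per consecutive seed, as a GPU thread would compute it.
noise = [rand32(1000 + i) for i in range(4)]
```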

Related

The fastest random number Generator

I'm intending to implement a random number generator via Swift 3. I have three different methods for generating an integer (between 0 and 50000) ten thousand times non-stop.
Do these generators use the same math principles of generating a value or not?
Which generator is less CPU- and RAM-intensive at runtime (with 10000 iterations)?
method A:
var generator: Int = random() % 50000
method B:
let generator = Int(arc4random_uniform(50000))
method C:
import GameplayKit
let number: [Int] = Array(0...50000)
func generator() -> Int {
    let random = GKRandomSource.sharedRandom().nextInt(upperBound: number.count)
    return number[random]
}
All of these are pretty well documented, and most have published source code.
var generator: Int = random() % 50000
Well, first of all, this is modulo biased, so it certainly won't be equivalent to a proper uniform random number. The docs for random explain it:
The random() function uses a non-linear, additive feedback, random number generator, employing a default table of size 31 long integers. It returns successive pseudo-random numbers in the range from 0 to (2**31)-1. The period of this random number generator is very large, approximately 16*((2**31)-1).
But you can look at the full implementation and documentation in Apple's source code for libc.
Contrast the documentation for arc4random_uniform (which does not have modulo bias):
These functions use a cryptographic pseudo-random number generator to generate high quality random bytes very quickly. One data pool is used for all consumers in a process, so that consumption under program flow can act as additional stirring. The subsystem is re-seeded from the kernel random number subsystem on a regular basis, and also upon fork(2).
And the source code is also available. The important thing to note from arc4random_uniform is that it avoids modulo bias by adjusting the modulo correctly and then generating random numbers until it is in the correct range. In principle this could require generating an unlimited number of random values; in practice it is incredibly rare that it would need to generate more than one, and rare-to-the-point-of-unbelievable that it would generate more than that.
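As a toy illustration of modulo bias and of the rejection fix described above, here is a small Python sketch. The 8-bit source and the bound of 100 are made-up values chosen so the counts can be checked by hand; this is not Apple's implementation.

```python
from collections import Counter

SOURCE_SIZE = 256   # pretend the raw generator yields 0..255 uniformly
BOUND = 100

# Plain modulo: residues 0..55 are hit 3 times, residues 56..99 only twice.
biased = Counter(v % BOUND for v in range(SOURCE_SIZE))

# Rejection: discard the first SOURCE_SIZE % BOUND raw values, so the accepted
# range has a size that is an exact multiple of BOUND.
threshold = SOURCE_SIZE % BOUND          # 56
unbiased = Counter(v % BOUND for v in range(SOURCE_SIZE) if v >= threshold)
```

With a 32-bit source and a bound of 50000 the rejected fraction is tiny, which is why the retry loop almost never runs more than once.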
GKRandomSource.sharedRandom() is also well documented:
The system random source shares state with the arc4random family of C functions. Generating random numbers with this source modifies the outcome of future calls to those functions, and calling those functions modifies the sequence of random values generated by this source. As such, this source is neither deterministic nor independent—use it only for trivial gameplay mechanics that do not rely on those attributes.
For performance, you would expect random() to be fastest since it never seeds itself from the system entropy pool, and so it also will not reduce the entropy in the system (though arc4random only does this periodically, I believe around every 1.5MB or so of random bytes generated; not for every value). But as with all things performance, you must profile. Of course since random() does not reseed itself it is less random than arc4random, which is itself less random than the source of entropy in the system (/dev/random).
When in doubt, if you have GameplayKit available, use it. Apple selected the implementation of sharedRandom() based on what they think is going to work best in most cases. Otherwise use arc4random. But if you really need to minimize impact on the system for "pretty good" (but not cryptographic) random numbers, look at random. If you're willing to take "kinda random if you don't look at them too closely" numbers and have even less impact on the system, look at rand. And if you want almost no impact on the system (guaranteed O(1), inlineable), see XKCD's getRandomNumber().
Xorshift generators are among the fastest non-cryptographically-secure random number generators, requiring very small code and state.
An example Swift implementation of xorshift128+:
func xorshift128plus(seed0 : UInt64, seed1 : UInt64) -> () -> UInt64 {
var s0 = seed0
var s1 = seed1
if s0 == 0 && s1 == 0 {
s1 = 1 // The state must be seeded so that it is not everywhere zero.
}
return {
var x = s0
let y = s1
s0 = y
x ^= x << 23
x ^= x >> 17
x ^= y
x ^= y >> 26
s1 = x
return s0 &+ s1
}
}
// create random generator, seed as needed!!
let random = xorshift128plus(seed0: 0, seed1: 0)
for _ in 0..<100 {
// and use it later
random()
}
To avoid modulo bias, you could use:
func random_uniform(bound: UInt64)->UInt64 {
var u: UInt64 = 0
let b: UInt64 = (u &- bound) % bound
repeat {
u = random()
} while u < b
return u % bound
}
In your case:
let r_number = random_uniform(bound: 50000) // r_number from interval 0..<50000

random number generator with x,y coordinates as seed

I'm looking for an efficient, uniformly distributed PRNG that generates one random integer for any integer point in the plane, with the coordinates x and y as input to the function.
int rand(int x, int y)
It has to deliver the same random number each time you input the same coordinate.
Do you know of algorithms that can be used for this kind of problem, and also in higher dimensions?
I already tried to use normal PRNGs like an LFSR, merging the x,y coordinates together to use as a seed value. Something like this:
int seed = x << 16 | (y & 0xFFFF);
The obvious problem with this method is that the seed is not iterated over multiple times but is initialized again for every x,y-point. This results in very ugly non random patterns if you visualize the results.
I already know of the method which uses shuffled permutation tables of some size like 256 and you get a random integer out of it like this.
int r = P[x + P[y & 255] & 255];
But I don't want to use this method because of the very limited range, restricted period length and high memory consumption.
Thanks for any helpful suggestions!
I found a very simple, fast and sufficient hash function based on the xxhash algorithm.
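// cash stands for chaos hash :D
// seed is assumed to be a global defined elsewhere
int cash(int x, int y){
    // unsigned arithmetic: wraps on overflow and shifts in zeros, as in xxhash
    unsigned int h = seed + (unsigned int)x*374761393u + (unsigned int)y*668265263u; //all constants are prime
    h = (h^(h >> 13))*1274126177u;
    return (int)(h^(h >> 16));
}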
It is now much faster than the lookup table method I described above and it looks equally random. I don't know if the random properties are good compared to xxhash but as long as it looks random to the eye it's a fair solution for my purpose.
This is what it looks like with the pixel coordinates as input: [image not preserved]
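For experimenting outside C, the hash above can be sketched in Python with explicit 32-bit masking. Two things here are assumptions: seed is passed as a parameter instead of being a global, and the shifts are treated as unsigned, so results can differ from a C build that uses signed int.

```python
MASK32 = 0xFFFFFFFF


def cash(x, y, seed=0):
    """32-bit coordinate hash; `seed` is an assumed parameter."""
    h = (seed + x * 374761393 + y * 668265263) & MASK32
    h = ((h ^ (h >> 13)) * 1274126177) & MASK32
    return h ^ (h >> 16)


# A small grid of 8-bit noise values, one per pixel coordinate.
WIDTH, HEIGHT = 8, 4
grid = [[cash(x, y) & 0xFF for x in range(WIDTH)] for y in range(HEIGHT)]
```

Because every step after the first line is invertible on 32-bit values, two coordinates only collide if their initial linear combination collides.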
My approach
In general I think you want some hash function (most of these are designed to output randomness: the avalanche effect for RNGs, explicitly required randomness for crypto PRNGs). Compare with this thread.
The following code uses this approach:
1) build something hashable from your input
2) hash -> random-bytes (non-cryptographically)
3) somehow convert these random-bytes to your integer range (hard to do correctly/uniformly!)
The last step is done by this approach, which seems to be not that fast but has strong theoretical guarantees (the selected answer was used).
The hash function I used supports seeds, which will be used in step 3!
import xxhash
import math
import numpy as np
import matplotlib.pyplot as plt
import time

def rng(a, b, maxExclN=100):
    # preprocessing
    bytes_needed = int(math.ceil(maxExclN / 256.0))
    smallest_power_larger = 2
    while smallest_power_larger < maxExclN:
        smallest_power_larger *= 2
    counter = 0
    while True:
        random_hash = xxhash.xxh32(str((a, b)).encode('utf-8'), seed=counter).digest()
        random_integer = int.from_bytes(random_hash[:bytes_needed], byteorder='little')
        if random_integer < 0:
            counter += 1
            continue  # inefficient but safe; could be improved
        random_integer = random_integer % smallest_power_larger
        if random_integer < maxExclN:
            return random_integer
        else:
            counter += 1

test_a = rng(3, 6)
test_b = rng(3, 9)
test_c = rng(3, 6)
print(test_a, test_b, test_c)  # OUTPUT: 90 22 90

random_as = np.random.randint(100, size=1000000)
random_bs = np.random.randint(100, size=1000000)
start = time.time()
rands = [rng(*x) for x in zip(random_as, random_bs)]
end = time.time()
plt.hist(rands, bins=100)
plt.show()
print('needed secs: ', end-start)
# OUTPUT: needed secs: 15.056888341903687 -> about 0.000015 s (15 microseconds) per sample
# -> possibly heavy dependence on range of output
Possible improvements
Add additional entropy from some source (urandom; could be put into str)
Make a class and initialize to memorize preprocessing (costly if done for each sampling)
Handle negative integers; maybe just use abs(x)
Assumptions:
the output range is [0, N) -> just shift for others!
the output range is smaller (in bits) than the hash output (may use xxh64)
Evaluation:
Check randomness/uniformity
Check if deterministic regarding input
You can use various randomness extractors to achieve your goals. There are at least two sources you can look for a solution.
Dodis et al., "Randomness Extraction and Key Derivation Using the CBC, Cascade and HMAC Modes"
NIST SP800-90, "Recommendation for the Entropy Sources Used for Random Bit Generation"
All in all, you can preferably use:
AES-CBC-MAC using a random key (may be fixed and reused)
HMAC, preferably with SHA2-512
SHA-family hash functions (SHA1, SHA256 etc); using a random final block (eg use a big random salt at the end)
Thus, you can concatenate your coordinates, get their bytes, add a random key (for AES and HMAC) or a salt for SHA and your output has an adequate entropy.
According to NIST, the output entropy relies on the input entropy:
Assuming you use SHA1; thus n = 160bits. Let's suppose that m = input_entropy (your coordinates' entropy)
if m >= 2n then output_entropy=n=160 bits
if n <= m < 2n then maximum output_entropy=n (but full entropy is not guaranteed).
if m < n then maximum output_entropy=m (this is your case)
see NIST sp800-90c (page 11)
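The HMAC option can be sketched with Python's standard library. This is only an illustration under assumptions: the function name, the packing of the coordinates as two signed 64-bit integers, the 8-byte output width, and the placeholder key are all choices made here, not part of the answer.

```python
import hashlib
import hmac
import struct


def coord_random(x, y, key, n_bytes=8):
    """Deterministic pseudorandom integer for (x, y) under a fixed key."""
    message = struct.pack("<qq", x, y)           # two signed 64-bit coordinates
    digest = hmac.new(key, message, hashlib.sha512).digest()
    return int.from_bytes(digest[:n_bytes], "little")


# Placeholder key for the example; in practice draw it once (e.g. os.urandom)
# and reuse it so the same coordinate always maps to the same value.
KEY = b"example-fixed-key"

value = coord_random(3, 6, KEY)
```

This is far heavier than a small integer hash, so it mainly makes sense when the stronger guarantees matter more than speed.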

Generate random numbers without repetition (or vanishing probability of repetition) without storing full list of past generated numbers?

I need to generate random numbers in a very large range, 128-bit integers, and I will generate very many of them. I'll generate so many of them that I cannot fit a list of the generated numbers into memory.
I also have the requirement that the generated numbers do not repeat, or at least that the probability of repetition is vanishingly small.
Is there an algorithm that does this?
Build a 128 bit linear congruential generator or linear feedback shift register generator. With properly chosen coefficients either of those will achieve full cycle, meaning no repeats until you've exhausted all outcomes.
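As a sketch of what "properly chosen coefficients" means for a power-of-two modulus: by the Hull-Dobell theorem, x -> (a*x + c) mod 2^k has full period exactly when c is odd and a % 4 == 1. The Python below checks this by brute force at 16 bits and then applies the same conditions at 128 bits. All constants are arbitrary illustrative values that satisfy only the period conditions; they are not vetted for statistical quality.

```python
def lcg_cycle_length(a, c, bits):
    """Count steps until x -> (a*x + c) mod 2^bits, started at 0, returns to 0."""
    mask = (1 << bits) - 1
    x, steps = 0, 0
    while True:
        x = (a * x + c) & mask
        steps += 1
        if x == 0:
            return steps


full = lcg_cycle_length(4937, 12345, 16)   # a % 4 == 1, c odd: visits all 2^16 states
short = lcg_cycle_length(3, 1, 16)         # a % 4 == 3: cycle is shorter

MASK128 = (1 << 128) - 1
A128 = 0xDEADBEEFCAFEBABE0123456789ABCDE5  # arbitrary; only A128 % 4 == 1 matters here
C128 = 1                                   # any odd increment


def lcg128_next(x):
    """One step of a full-period 128-bit LCG: no state repeats for 2^128 steps."""
    return (A128 * x + C128) & MASK128
```

The low bits of such a generator still follow short cycles, so the raw state is unique but not very random-looking.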
Any full-period PRNG with a 128-bit state will do what you need in principle. Unfortunately many of these generators tend to produce only 32 or 64 bits per iteration while the rest of the state goes through a predictable permutation (LFSRs being the worst case, producing only 1 bit per iteration). Each 128-bit state is unique, but many of its bits would show a trivial relation to the previous state.
This can be overcome with tempering -- taking your questionable-quality PRNG state with a known-good period, and permuting it through a 1:1 transform to hide the not-so-random factors.
For example, borrowing from the example xorshift+ shown on Wikipedia:
static uint64_t s[2] = { 1, 0 };
void random128(uint64_t result[]) {
uint64_t x = s[0];
uint64_t y = s[1];
x ^= x << 23;
x ^= y ^ (x >> 17) ^ (y >> 26);
s[0] = y;
s[1] = x;
At this point we know that s[0] is just the old value of s[1], which would be a terrible PRNG if all 128 bits were exposed (normally only s[1] is exposed). To overcome this we permute the result to disguise that relationship (following the same principle as a feistel network to ensure that the transform is 1:1).
y += x * 1630144151483159999;
x ^= y >> 3;
result[0] = x;
result[1] = y;
}
This seems to be sufficient to pass diehard. So long as the original generator has full(ish) period, the whole generator should be full period too.
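One way to convince yourself that the final permutation step is 1:1 is to write its inverse. The Python sketch below mirrors the two tempering lines with explicit 64-bit masking; the names temper and untemper are made up for this example.

```python
MASK64 = (1 << 64) - 1
K = 1630144151483159999


def temper(x, y):
    """The two mixing lines from the generator, on 64-bit words."""
    y = (y + x * K) & MASK64
    x ^= y >> 3
    return x, y


def untemper(x, y):
    """Inverse of temper: undo the xor first, then the multiply-add."""
    x ^= y >> 3
    y = (y - x * K) & MASK64
    return x, y
```

Since every tempered pair maps back to exactly one input pair, distinct 128-bit states always give distinct 128-bit outputs.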
The logical conclusion to tempering a low-quality generator is to use AES-128 in counter mode. Simply run a counter from 0 to 2**128-1 (an extremely low-quality generator), and encrypt each value using AES-128 and a consistent key (an ideal temper) for your final output.
If you do this, don't get distracted by full cryptographic RNG requirements. Those involve re-seeding and consequently can produce the same number more than once (which is more random, but it's what you want to avoid).

Pseudo-random number generator for cluster environment

How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I can have many compute nodes (e.g. 100), and I need to generate millions of numbers on each node. I need a guarantee that a PRN sequence on one node will not overlap the PRN sequence on another node.
I could generate all PRNs on the root node, then send them to other nodes. But it would be far too slow.
I could jump to a known distance in the sequence on each node. But is there such an algorithm for Mersenne Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?
I could use different generators on each node. But is it possible with good generators like Mersenne Twister? How could it be done?
Any other thoughts?
You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistical quality.
Fortunately, Mersenne Twister (MT) will help you in your distribution task. Using its dedicated algorithm, called Dynamic Creator (DC hereafter), you can create independent random number generators that will produce highly independent random streams.
Each stream will be created on the node that will be using it. Basically, think of DC as a constructor in object oriented paradigm that creates different instances of MT. Each different instance is designed to produce highly independent random sequences.
You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It's quite straightforward to use, and you'll be able to fix different parameters such as the number of different MT instances you want to obtain or the period of these MTs. Depending on its input parameters, DC's runtime will change.
In addition to the README that comes with DC, take a look at the file example/new_example2.c in the DC archive. It shows example calls to get independent sequences given different input identifiers, which is basically what you need to identify cluster jobs.
Finally, if you intend to learn more about how to use PRNGs in parallel or distributed environments, I suggest you read this scientific article:
Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010
Okay, answer #2 ;-)
I am going to say ... keep it simple. Just use a "short" seed to prime the MT (imagine that this seed has 2^32 possible values, for lack of a better restriction). This assumes that the short seed generates "sufficiently distributed" MT starting states (e.g. init_genrand in the code in my other answer, hopefully). This doesn't guarantee an equally distributed starting state but rather goes for "good enough", see below.
Each node will use its own sequence of seeds which are pre-selected (either a list of random seeds which is transmitted or a formula like number_nodes * node_number * iteration). The important thing is that the initial "short" seed will never be re-used across nodes.
Each node will then use an MT PRNG initialized with this seed n times, where n <<< MT_period / max_value_of_short_seed (the period of TT800 is 2^800-1 and that of MT19937 is 2^19937-1, so n can still be a very large number). After n times, the node moves on to the next seed in the chosen list.
While I do not (nor can I) provide a "guarantee" that no node will ever have a duplicate sequence at the same time (or at all), here is what AMD says about Using Different Seeds (obviously the initial seeding algorithm plays a role):
Of the four methods for creating multiple streams described here, this is the least satisfactory ... For example, sequences generated from different starting points may overlap if the initial values are not far enough apart. The potential for overlapping sequences is reduced if the period of the generator being used is large. Although there is no guarantee of the independence of the sequences, due to its extremely large period, using the Mersenne Twister with random starting values is unlikely to lead to problems, especially if the number of sequences required is small ...
Happy coding.
I could jump to a known distance in the sequence, on each node. But is there such an algorithm for Mersenne-Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?
Yes, see http://theo.phys.sci.hiroshima-u.ac.jp/~ishikawa/PRNG/mt_stream_en.html. This is an excellent solution for obtaining independent random number streams. By making jumps that are larger than the number of random numbers needed from each stream to create the starts of each stream, the streams won't overlap.
Disclaimer: I am not sure what guarantee MT has in terms of cycle overlap when starting from an arbitrary "uint" (or x, where x is a smaller arbitrary but unique value) seed, but that may be worth looking into, as if there is a guarantee then it may be sufficient to just start each node on a different "uint" seed and the rest of this post becomes largely moot. (The cycle length/period of MT is staggering and dividing out UINT_MAX still leaves an incomprehensible -- except on paper -- number.)
Well, here goes my comments to answer...
I like approach #2 with a pre-generated set of states; the MT in each node is then initialized with a given starting state.
Only the initial states must be preserved, of course, and once generated these states can:
Be re-used indefinitely, if requirements are met, or;
The next states can be generated forward on an external fast box while the simulation is running, or;
The nodes can report back the end-state (if reliable messaging, and if sequence is used at same rate among nodes, and meets requirements, etc.)
Considering that MT is fast to generate, I would not recommend #3 from above as it's just complicated and has a number of strings attached. Option #1 is simple, but might not be dynamic enough.
Option #2 seems like a very good possibility. The server (a "fast machine", not necessarily a node) only needs to transmit the starting state of the next "unused sequence block" (say, one billion cycles) -- the node would use the generator for one billion cycles before asking for a new block. This would make it a hybrid of #1 in the post with very infrequent messaging.
On my system, a Core2 Duo, I can generate one billion random numbers in 17 seconds using the code provided below (it runs in LINQPad). I am not sure what MT variant this is.
void Main()
{
var mt = new MersenneTwister();
var start = DateTime.UtcNow;
var ct = 1000000000;
uint n = 0; // genrand_int32() returns uint, which does not convert implicitly to int
for (var i = 0; i < ct; i++) {
n = mt.genrand_int32();
}
var end = DateTime.UtcNow;
(end - start).TotalSeconds.Dump();
}
// From ... and modified (stripped) to work in LINQPad.
// http://mathnet-numerics.googlecode.com/svn-history/r190/trunk/src/Numerics/Random/MersenneTwister.cs
// See link for license and copyright information.
public class MersenneTwister
{
private const uint _lower_mask = 0x7fffffff;
private const int _m = 397;
private const uint _matrix_a = 0x9908b0df;
private const int _n = 624;
private const double _reciprocal = 1.0/4294967295.0;
private const uint _upper_mask = 0x80000000;
private static readonly uint[] _mag01 = {0x0U, _matrix_a};
private readonly uint[] _mt = new uint[624];
private int mti = _n + 1;
public MersenneTwister() : this((int) DateTime.Now.Ticks)
{
}
public MersenneTwister(int seed)
{
init_genrand((uint)seed);
}
private void init_genrand(uint s)
{
_mt[0] = s & 0xffffffff;
for (mti = 1; mti < _n; mti++)
{
_mt[mti] = (1812433253*(_mt[mti - 1] ^ (_mt[mti - 1] >> 30)) + (uint) mti);
_mt[mti] &= 0xffffffff;
}
}
public uint genrand_int32()
{
uint y;
if (mti >= _n)
{
int kk;
if (mti == _n + 1) /* if init_genrand() has not been called, */
init_genrand(5489); /* a default initial seed is used */
for (kk = 0; kk < _n - _m; kk++)
{
y = (_mt[kk] & _upper_mask) | (_mt[kk + 1] & _lower_mask);
_mt[kk] = _mt[kk + _m] ^ (y >> 1) ^ _mag01[y & 0x1];
}
for (; kk < _n - 1; kk++)
{
y = (_mt[kk] & _upper_mask) | (_mt[kk + 1] & _lower_mask);
_mt[kk] = _mt[kk + (_m - _n)] ^ (y >> 1) ^ _mag01[y & 0x1];
}
y = (_mt[_n - 1] & _upper_mask) | (_mt[0] & _lower_mask);
_mt[_n - 1] = _mt[_m - 1] ^ (y >> 1) ^ _mag01[y & 0x1];
mti = 0;
}
y = _mt[mti++];
/* Tempering */
y ^= (y >> 11);
y ^= (y << 7) & 0x9d2c5680;
y ^= (y << 15) & 0xefc60000;
y ^= (y >> 18);
return y;
}
}
Happy coding.
TRNG is a random number generator built specifically with parallel cluster environments in mind (specifically it was built for the TINA supercomputer in Germany). Hence it is very easy to create independent random number streams and also to generate non-standard distributions. There is a tutorial on how to set it up here:
http://www.lindonslog.com/programming/parallel-random-number-generation-trng/

random permutation

I would like to generate a random permutation as fast as possible.
The problem: the Knuth shuffle, which is O(n), involves generating n random numbers, and generating random numbers is quite expensive.
I would like to find an O(n) function involving a fixed O(1) amount of random numbers.
I realize that this question has been asked before, but I did not see any relevant answers.
Just to stress a point: I am not looking for anything less than O(n), just an algorithm involving less generation of random numbers.
Thanks
Create a 1-1 mapping of each permutation to a number from 1 to n! (n factorial). Generate a random number in 1 to n!, use the mapping, get the permutation.
For the mapping, perhaps this will be useful: http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
Of course, this would get out of hand quickly, as n! can become really large soon.
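The mapping from a number to a permutation can be sketched with the factorial number system. This is a minimal Python illustration, not taken from the linked article; it uses 0-based indices (0 to n!-1) and costs O(n^2) because of the list pops, on top of the big-integer arithmetic the answer warns about.

```python
from math import factorial


def nth_permutation(items, index):
    """Return permutation number `index` (0 <= index < n!) in lexicographic order."""
    pool = list(items)
    result = []
    for size in range(len(pool), 0, -1):
        block = factorial(size - 1)          # permutations sharing one first element
        pos, index = divmod(index, block)
        result.append(pool.pop(pos))
    return result


first = nth_permutation("abc", 0)   # ['a', 'b', 'c']
last = nth_permutation("abc", 5)    # ['c', 'b', 'a']
```

A single draw such as random.randrange(factorial(n)) then selects the whole permutation with one random number.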
Generating a random number takes a long time, you say? The implementation of Java's Random.nextInt is roughly:
oldseed = seed;
nextseed = (oldseed * multiplier + addend) & mask;
return (int)(nextseed >>> (48 - bits));
Is that too much work to do for each element?
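To show how little work that is, here is a Python sketch of the same 48-bit LCG step using the constants documented for java.util.Random. The class and method names are made up here, and only nextInt() without a bound is modeled.

```python
MULTIPLIER = 0x5DEECE66D
ADDEND = 0xB
MASK = (1 << 48) - 1


class JavaLikeRandom:
    """Sketch of the java.util.Random LCG (unbounded nextInt only)."""

    def __init__(self, seed):
        self.seed = (seed ^ MULTIPLIER) & MASK   # initial seed scramble

    def next_bits(self, bits):
        self.seed = (self.seed * MULTIPLIER + ADDEND) & MASK
        value = self.seed >> (48 - bits)
        if value >= 1 << 31:                     # reinterpret as signed 32-bit
            value -= 1 << 32
        return value

    def next_int(self):
        return self.next_bits(32)


first_value = JavaLikeRandom(0).next_int()   # -1155484576
```

Each call is one multiply, one add, one mask, and one shift.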
See https://doi.org/10.1145/3009909 for a careful analysis of the number of random bits required to generate a random permutation. (It's open-access, but it's not easy reading! Bottom line: if carefully implemented, all of the usual methods for generating random permutations are efficient in their use of random bits.)
And... if your goal is to generate a random permutation rapidly for large N, I'd suggest you try the MergeShuffle algorithm. An article published in 2015 claimed a factor-of-two speedup over Fisher-Yates in both parallel and sequential implementations, and a significant speedup in sequential computations over the other standard algorithm they tested (Rao-Sandelius).
An implementation of MergeShuffle (and of the usual Fisher-Yates and Rao-Sandelius algorithms) is available at https://github.com/axel-bacher/mergeshuffle. But caveat emptor! The authors are theoreticians, not software engineers. They have published their experimental code to github but aren't maintaining it. Someday, I imagine someone (perhaps you!) will add MergeShuffle to GSL. At present gsl_ran_shuffle() is an implementation of Fisher-Yates, see https://www.gnu.org/software/gsl/doc/html/randist.html?highlight=gsl_ran_shuffle.
Not exactly what you asked, but if the provided random number generator doesn't satisfy you, maybe you should try something different. In general, pseudorandom number generation can be very simple.
Probably the best-known algorithm:
http://en.wikipedia.org/wiki/Linear_congruential_generator
More
http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators
As other answers suggest, you can make a random integer in the range 0 to N! and use it to produce a shuffle. Although theoretically correct, this won't be faster in general since N! grows fast and you'll spend all your time doing bigint arithmetic.
If you want speed and you don't mind trading off some randomness, you will be much better off using a less good random number generator. A linear congruential generator (see http://en.wikipedia.org/wiki/Linear_congruential_generator) will give you a random number in a few cycles.
Usually the full range of the next random value isn't needed, so to use exactly the same amount of randomness you can use the following approach (which is almost like random(0,N!), I guess):
// ...
m = 1; // range of random buffer (single variant)
r = 0; // random buffer (number zero)
// ...
for(/* ... */) {
while (m < n) { // range of our buffer is too narrow for "n"
r = r*RAND_MAX + random(); // add another random to our random-buffer
m *= RAND_MAX; // update range of random-buffer
}
x = r % n; // pull-out next random with range "n"
r /= n; // remove it from random-buffer
m /= n; // fix range of random-buffer
// ...
}
P.S. Of course there will be some errors related to division by values other than 2^n, but they will be distributed among the resulting samples.
Before doing the computation, generate N numbers (N less than the number of random numbers you need) with your slow but good random generator and store them in an array as data; then pick a number by simply incrementing an index into the array inside your computing loop. If you need different seeds, create multiple tables.
Are you sure that your mathematical and algorithmical approach to the problem is correct?
I hit exactly the same problem, where the Fisher–Yates shuffle becomes the bottleneck in corner cases. But for me the real problem is a brute-force algorithm that doesn't scale well to all problems. The following story explains the problem and the optimizations that I have come up with so far.
Dealing cards for 4 players
The number of possible deals is a 96-bit number. That puts quite a stress on the random number generator to avoid statistical anomalies when selecting a play plan from the generated sample set of deals. I chose to use 2x mt19937_64 seeded from /dev/random because of the long period and heavy advertisement on the web that it is good for scientific simulations.
The simple approach is to use the Fisher–Yates shuffle to generate deals and filter out deals that don't match already collected information. The Knuth shuffle takes ~1400 CPU cycles per deal, mostly because I have to generate 51 random numbers and swap entries in the table 51 times.
That doesn't matter for normal cases where I would only need to generate 10000-100000 deals in 7 minutes. But there are extreme cases where filters may select only a very small subset of hands, requiring a huge number of deals to be generated.
Using single number for multiple cards
When profiling with callgrind (valgrind) I noticed that main slow down was C++ random number generator (after switching away from std::uniform_int_distribution that was first bottleneck).
Then I came up with idea that I can use single random number for multiple cards. The idea is to use least significant information from the number first and then erase that information.
int number = uniform_rng(0, 52*51*50*49);
int card1 = number % 52;
number /= 52;
int card2 = number % 51;
number /= 51;
......
Of course that is only minor optimization because generation is still O(N).
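The decoding step can be sketched in Python as a mixed-radix expansion. The function name and parameters are made up for this example; each extracted digit is treated as an index into the cards still remaining, which is one way to turn the remainders into distinct cards.

```python
def draw_cards(number, deck_size=52, count=4):
    """Decode `number` into `count` distinct card indices.

    `number` is assumed to be uniform in
    range(deck_size * (deck_size - 1) * ... * (deck_size - count + 1)).
    """
    remaining = list(range(deck_size))
    cards = []
    for left in range(deck_size, deck_size - count, -1):
        number, pick = divmod(number, left)   # use the low "digit", then erase it
        cards.append(remaining.pop(pick))
    return cards


hand = draw_cards(0)   # [0, 1, 2, 3]
```

Every number in the valid range decodes to a different ordered draw, so a uniform number gives a uniform draw.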
Generation using bit permutations
The next idea was exactly the solution asked for here, but I still ended up with O(N), and at a larger cost than the original shuffle. But let's look at the solution and why it fails so miserably.
I decided to use the idea from Dealing All the Deals by John Christman:
void Deal::generate()
{
// 52:26 split, 52!/(26!)**2 = 495,918,532,948,104
max = 495918532948104LU;
partner = uniform_rng(eng1, max);
// 2x 26:13 splits, (26!/(13!)**2)**2 = 10,400,600**2
max = 10400600LU*10400600LU;
hands = uniform_rng(eng2, max);
// Create 104 bit presentation of deal (2 bits per card)
select_deal(id, partner, hands);
}
So far so good and pretty good looking, but the select_deal implementation is a PITA.
void select_deal(Id &new_id, uint64_t partner, uint64_t hands)
{
unsigned idx;
unsigned e, n, ns = 26;
e = n = 13;
// Figure out partnership who owns which card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx > 0; ) {
uint64_t cut = ncr(idx - 1, ns);
if (partner >= cut) {
partner -= cut;
// Figure out if N or S holds the card
ns--;
cut = ncr(ns, n) * 10400600LU;
if (hands > cut) {
hands -= cut;
n--;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
} else
new_id[idx%NUM_SUITS + NUM_SUITS] |= 1 << (idx/NUM_SUITS);
idx--;
}
unsigned ew = 26;
// Figure out if E or W holds a card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx-- > 0; ) {
if (new_id[idx%NUM_SUITS + NUM_SUITS] & (1 << (idx/NUM_SUITS))) {
uint64_t cut = ncr(--ew, e);
if (hands >= cut) {
hands -= cut;
e--;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
}
}
}
Now that I had the O(N) permutation solution done to prove the algorithm could work, I started searching for an O(1) mapping from a random number to a bit permutation. Too bad it looks like the only solution would be using huge lookup tables that would kill CPU caches. That doesn't sound like a good idea for an AI that will be using a very large amount of cache for the double dummy analyzer.
Mathematical solution
After all the hard work to figure out how to generate random bit permutations, I decided to go back to maths. It is entirely possible to apply filters before dealing cards. That requires splitting deals into a manageable number of layered sets and selecting between sets based on their relative probabilities after filtering out impossible sets.
I don't yet have code ready for that to test how many cycles I'm wasting in the common case where the filter selects a major part of the deals. But I believe this approach gives the most stable generation performance, keeping the cost less than 0.1%.
Generate a 32 bit integer. For each index i (maybe only up to half the number of elements in the array), if bit i % 32 is 1, swap i with n - i - 1.
Of course, this might not be random enough for your purposes. You could probably improve this by not swapping with n - i - 1, but rather by another function applied to n and i that gives better distribution. You could even use two functions: one for when the bit is 0 and another for when it's 1.

Resources