Generate random numbers without repetition (or vanishing probability of repetition) without storing full list of past generated numbers? - random

I need to generate random numbers in a very large range (128-bit integers), and I will generate a great many of them: so many that I cannot fit a list of the numbers already generated into memory.
I also have the requirement that the generated numbers do not repeat, or at least that the probability of repetition is vanishingly small.
Is there an algorithm that does this?

Build a 128-bit linear congruential generator (LCG) or a linear feedback shift register (LFSR) generator. With properly chosen coefficients, either of those will achieve a full cycle, meaning no repeats until you've exhausted all possible outcomes.
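The full-cycle property is easy to demonstrate at a smaller scale. The sketch below (constants are illustrative, chosen only to satisfy the Hull-Dobell conditions, not recommended production values) visits every 16-bit state exactly once; the same construction works at 128 bits with m = 2**128:

```python
# Small-scale sketch of the full-cycle property of an LCG.
# Hull-Dobell conditions for m = 2**16: c odd (coprime to m),
# and a - 1 divisible by 4 (all prime factors of m, plus 4).

M = 2 ** 16
A = 4 * 12345 + 1   # a - 1 divisible by 4
C = 2 * 12345 + 1   # c odd, hence coprime to 2**16

def lcg_next(x):
    return (A * x + C) % M

seen = set()
x = 0
for _ in range(M):
    seen.add(x)
    x = lcg_next(x)

print(len(seen) == M)  # True: all 2**16 states visited exactly once
```

Because the period is exactly m, the state also returns to its starting value after m steps.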

Any full-period PRNG with a 128-bit state will do what you need in principle. Unfortunately many of these generators tend to produce only 32 or 64 bits per iteration while the rest of the state goes through a predictable permutation (LFSRs being the worst case, producing only 1 bit per iteration). Each 128-bit state is unique, but many of its bits would show a trivial relation to the previous state.
This can be overcome with tempering -- taking your questionable-quality PRNG state with a known-good period, and permuting it through a 1:1 transform to hide the not-so-random factors.
For example, borrowing from the example xorshift+ shown on Wikipedia:
static uint64_t s[2] = { 1, 0 };

void random128(uint64_t result[]) {
    uint64_t x = s[0];
    uint64_t y = s[1];
    x ^= x << 23;
    x ^= y ^ (x >> 17) ^ (y >> 26);
    s[0] = y;
    s[1] = x;
At this point we know that s[0] is just the old value of s[1], which would make a terrible PRNG if all 128 bits were exposed (normally only s[1] is exposed). To overcome this we permute the result to disguise that relationship, following the same principle as a Feistel network to ensure that the transform is 1:1.
    y += x * 1630144151483159999;
    x ^= y >> 3;
    result[0] = x;
    result[1] = y;
}
This seems to be sufficient to pass Diehard. As long as the original generator has a full(ish) period, the whole generator should be full period too.
The logical conclusion to tempering a low-quality generator is to use AES-128 in counter mode. Simply run a counter from 0 to 2**128-1 (an extremely low-quality generator), and encrypt each value using AES-128 and a consistent key (an ideal temper) for your final output.
If you do this, don't get distracted by full cryptographic RNG requirements. Those involve re-seeding, and consequently can produce the same number more than once (which is more random, but exactly what you want to avoid).
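The counter-plus-cipher idea can be sketched without a crypto library: any Feistel network over the 128-bit value is a 1:1 permutation, so distinct counter values can never collide. The round function and key below are illustrative stand-ins for AES-128, not a secure cipher:

```python
import hashlib

# Sketch of the "encrypt a counter" idea: a few Feistel rounds over a
# 128-bit value form a bijection, so distinct counters never produce
# the same output. AES-128 in CTR mode is the production-grade version
# of this construction; the key and round function here are placeholders.

KEY = b"illustrative-key"  # placeholder, not a real key

def round_fn(half, rnd):
    # 64-bit pseudorandom function of one half plus the round index
    h = hashlib.sha256(KEY + bytes([rnd]) + half.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:8], "big")

def permute128(counter):
    left, right = counter >> 64, counter & (2 ** 64 - 1)
    for rnd in range(4):
        left, right = right, left ^ round_fn(right, rnd)
    return (left << 64) | right

outputs = {permute128(i) for i in range(10000)}
print(len(outputs))  # 10000: no repeats among distinct counter values
```

Since every Feistel round is invertible regardless of the round function, the whole transform is a permutation of the 128-bit space.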

Related

low-memory pseudo-random shuffle for fixed arbitrary length array

Context: I'm writing an external SRAM tester for a microcontroller-based embedded system. No security, no cryptography involved. For reproducible access to "as-non-consecutive-as-possible" memory locations, I'm looking for an implementation of
y = shuffle(x), taking and returning an integer between 0 and a fixed N = 2^16 - 1
It may not use much RAM itself, as e.g. a naively shuffled array of addresses would. On the upside, it is allowed to be slow. There's no strict definition of non-consecutive; it's about bumping address lines up and down, hunting for soldering faults and other defects of the printed circuit board. Suggestions?
So far I have found the Knuth shuffle, a.k.a. the Fisher-Yates shuffle.
A late EDIT: I think I'm looking to maximize the Hamming distance. "Anti-Gray code"?
I agree with Jim Mischel that xorshift is a good candidate for a fast non-crypto PRNG. Coefficients need to be carefully chosen to achieve full cycle, which includes all values except 0.
Since you fixed the problem at 16 bits in your question, here's a 16-bit implementation written in Ruby, easily ported to anything else you like:
# << is shift left, >> is shift right, ^ is bitwise XOR, & is bitwise AND
MASK_16 = (1 << 16) - 1
def xorshift(x)
  x ^= x << 7 & MASK_16
  x ^= x >> 9
  x ^= x << 8 & MASK_16
  return x
end
counter = 0
x = 1
y = 1
# Floyd's "Tortoise and Hare" cycle-finding algorithm shows
# the generator to be full cycle for 16 bits, excluding zero.
loop do
  counter += 1
  x = xorshift(x)
  y = xorshift(xorshift(y))
  break if x == y
end
puts counter # => 65535
I'd suggest implementing Xorshift, or something similar. They are fast, require little memory, can be constructed to have a long period, and satisfy many tests of randomness.
Another way to do it would be to uniquely map every number in the range 0..(n-1) to another number within that range. That's easy enough to do using a modular multiplicative inverse, as I describe in this answer.
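A minimal sketch of that mapping idea, assuming a range of 2^16 and an arbitrary odd multiplier (any constant coprime to the range size gives a bijection, and the modular multiplicative inverse undoes it):

```python
# Multiplication by a constant coprime to N is a bijection on 0..N-1;
# multiplying by its modular multiplicative inverse maps back. The
# multiplier below is illustrative, chosen only to be odd (hence
# coprime to 2**16).

N = 2 ** 16
A = 40503              # odd, so invertible mod 2**16
A_INV = pow(A, -1, N)  # modular multiplicative inverse (Python 3.8+)

def shuffle(x):
    return (x * A) % N

def unshuffle(y):
    return (y * A_INV) % N

print(unshuffle(shuffle(12345)))            # 12345: round-trips exactly
print(len({shuffle(x) for x in range(N)}))  # 65536: a permutation
```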

random number generator with x,y coordinates as seed

I'm looking for an efficient, uniformly distributed PRNG that generates one random integer for any whole-number point in the plane, with coordinates x and y as input to the function.
int rand(int x, int y)
It has to deliver the same random number each time you input the same coordinate.
Do you know of algorithms, that can be used for this kind of problem and also in higher dimensions?
I already tried using normal PRNGs like an LFSR, merging the x,y coordinates together to use as a seed value. Something like this:
int seed = x << 16 | (y & 0xFFFF)
The obvious problem with this method is that the seed is not iterated over multiple times but is re-initialized for every x,y point. This results in very ugly, non-random patterns if you visualize the results.
I already know of the method that uses shuffled permutation tables of some size, like 256, where you get a random integer out like this:
int r = P[x + P[y & 255] & 255];
But I don't want to use this method because of the very limited range, restricted period length and high memory consumption.
Thanks for any helpful suggestions!
I found a very simple, fast and sufficient hash function based on the xxhash algorithm.
// cash stands for chaos hash :D
int cash(int x, int y){
    int h = seed + x*374761393 + y*668265263;  // all constants are prime
    h = (h ^ (h >> 13)) * 1274126177;
    return h ^ (h >> 16);
}
It is now much faster than the lookup table method I described above and it looks equally random. I don't know if the random properties are good compared to xxhash but as long as it looks random to the eye it's a fair solution for my purpose.
This is what it looks like with the pixel coordinates as input.
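For reference, a rough Python port of cash() above. The 32-bit wrap-around of the C arithmetic is emulated with a mask, and the port assumes the intermediate values are treated as unsigned (the C version relies on signed overflow, which is formally undefined behavior):

```python
# Rough Python port of the cash() hash above. Python ints are unbounded,
# so 32-bit wrap-around is emulated by masking after each operation;
# the shifts then behave like logical (unsigned) shifts.

M32 = 0xFFFFFFFF
SEED = 0  # the C version reads a global `seed`; fixed here for the sketch

def cash(x, y, seed=SEED):
    h = (seed + x * 374761393 + y * 668265263) & M32  # constants are prime
    h = ((h ^ (h >> 13)) * 1274126177) & M32
    return (h ^ (h >> 16)) & M32

print(cash(1, 2) == cash(1, 2))  # True: deterministic for equal inputs
```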
My approach
In general I think you want some hash function (nearly all of these are designed to produce random-looking output: the avalanche effect for ordinary hashes, explicitly required randomness for cryptographic PRNGs). Compare with this thread.
The following code uses this approach:
1) build something hashable from your input
2) hash -> random-bytes (non-cryptographically)
3) somehow convert these random-bytes to your integer range (hard to do correctly/uniformly!)
The last step is done by this approach, which seems not that fast but has strong theoretical guarantees (the selected answer was used).
The hash function I used supports seeds, which are used in step 3!
import xxhash
import math
import numpy as np
import matplotlib.pyplot as plt
import time

def rng(a, b, maxExclN=100):
    # preprocessing
    bytes_needed = int(math.ceil(maxExclN / 256.0))
    smallest_power_larger = 2
    while smallest_power_larger < maxExclN:
        smallest_power_larger *= 2

    counter = 0
    while True:
        random_hash = xxhash.xxh32(str((a, b)).encode('utf-8'), seed=counter).digest()
        random_integer = int.from_bytes(random_hash[:bytes_needed], byteorder='little')
        if random_integer < 0:
            counter += 1
            continue  # inefficient but safe; could be improved
        random_integer = random_integer % smallest_power_larger
        if random_integer < maxExclN:
            return random_integer
        else:
            counter += 1
test_a = rng(3, 6)
test_b = rng(3, 9)
test_c = rng(3, 6)
print(test_a, test_b, test_c)  # OUTPUT: 90 22 90

random_as = np.random.randint(100, size=1000000)
random_bs = np.random.randint(100, size=1000000)

start = time.time()
rands = [rng(*x) for x in zip(random_as, random_bs)]
end = time.time()

plt.hist(rands, bins=100)
plt.show()
print('needed secs: ', end - start)
# OUTPUT: needed secs: 15.056888341903687 -> ~15 microseconds per sample
# -> possibly heavy dependence on the range of the output
Possible improvements
Add additional entropy from some source (urandom; could be put into the str)
Make a class and initialize it to memoize the preprocessing (costly if done for each sample)
Handle negative integers; maybe just use abs(x)
Assumptions:
the output range is [0, N) -> just shift for others!
the output range is smaller (in bits) than the hash output (otherwise use xxh64)
Evaluation:
Check randomness/uniformity
Check that the result is deterministic with respect to the input
You can use various randomness extractors to achieve your goal. There are at least two sources you can look to for a solution.
Dodis et al., "Randomness Extraction and Key Derivation Using the CBC, Cascade and HMAC Modes"
NIST SP 800-90, "Recommendation for the Entropy Sources Used for Random Bit Generation"
All in all, you can preferably use:
AES-CBC-MAC, using a random key (the key may be fixed and reused)
HMAC, preferably with SHA2-512
SHA-family hash functions (SHA1, SHA256, etc.), using a random final block (e.g. a big random salt at the end)
Thus, you concatenate your coordinates, get their bytes, add a random key (for AES and HMAC) or a salt (for SHA), and your output will have adequate entropy.
According to NIST, the output entropy relies on the input entropy:
Assuming you use SHA1, n = 160 bits. Suppose m = input_entropy (your coordinates' entropy):
if m >= 2n, then output_entropy = n = 160 bits
if n < m < 2n, then maximum output_entropy = m (but full entropy is not guaranteed)
if m < n, then maximum output_entropy = m (this is your case)
See NIST SP 800-90C (page 11).
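The salted-hash recipe can be sketched with a standard-library hash; the salt below is a placeholder for a value drawn once from a good entropy source and then reused:

```python
import hashlib

# Sketch of the SHA-based extractor suggested above: concatenate the
# coordinate bytes with a fixed random salt and hash. SALT here is a
# placeholder; in practice it would come from a real entropy source.

SALT = b"\x13\x37" * 8  # placeholder salt

def coord_rand(x, y, bits=32):
    data = (x.to_bytes(8, "big", signed=True)
            + y.to_bytes(8, "big", signed=True)
            + SALT)
    digest = hashlib.sha256(data).digest()
    # take the top `bits` bits of the 256-bit digest
    return int.from_bytes(digest, "big") >> (256 - bits)

print(coord_rand(3, 6) == coord_rand(3, 6))  # True: deterministic
```

Negative coordinates are handled by the signed byte encoding, and higher dimensions just mean concatenating more coordinate bytes.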

Single-use PRNG seedable with consecutive seeds

I need to make a pseudorandom number generator with a particular twist. Instead of generating numbers serially, using the output of the previous generation as the seed for the next as is usually done, I need a sequence of pseudorandom numbers generated in parallel from a consecutive sequence of seeds.
It would work like this, executed in parallel, each thread producing only a single number, with nothing shared or stored between threads:
thread #0: my_prng(1000) -> 1455191155 -> array[0]
thread #1: my_prng(1001) -> 2432152707 -> array[1]
thread #2: my_prng(1002) -> 185188134 -> array[2]
It's for generating image noise in parallel on a GPU (using OpenCL), so:
it should run fast, as in using just a few operations
it needn't be cryptographically secure; it's just for graphics and only needs to look about random
low periods are just fine, even 2^24 would do
it only needs to make 32-bit integers
it shouldn't use any memory: no buffers, and no storing anything in a variable other than the result (the resulting new seed, if there were one, would go unused anyway)
it cannot rely on calls to rand(), which isn't available in OpenCL, or on any library
it shouldn't fall back to serial iteration (for instance, looping 60 times just to make the 60th number)
it literally just needs to make a good pseudorandom number from a seed like 1000 that doesn't share a pattern with numbers made from adjacent seeds
None of the typical PRNG algorithms that I've tried could produce sequences from adjacent seeds that looked even remotely random; they're not meant to be seeded and used that way.
If you want a 32-bit -> 32-bit RNG, then the period would be 2^32, and with 2^24 in each stream you're limited to 2^8 streams.
Having said that, you might want to look into an LCG RNG with the following twist: implement fast skip-ahead as described in F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
Thus, you start with seed 1, and each consecutive seed just skips ahead by 2^24 along the sequence:
int32_t stream = 1 << 24;

rng.set_seed(int32_t seed) {
    rng.skip_ahead(seed * stream);
}
Thus, you're guaranteed to get non-overlapping streams covering your whole period.
Code which implements this idea for a 63-bit generator is here.
UPDATE
F. Brown showed that skip-ahead by a stride of N is logarithmic, i.e. O(log2 N) operations.
Following Severin Pappadeux's answer, I looked into fast skip-ahead for LCGs and found that it is actually very simple to adapt the MINSTD algorithm for this using modular exponentiation.
With MINSTD being minstd(n+1) = 16807*minstd(n) mod 2147483647, starting from minstd(0) = 1 we get the closed form minstd(n) = 16807^n mod 2147483647.
Here's my resulting algorithm in OpenCL:
int pow_mod(int base, uint expon, uint mod)
{
    // 64-bit intermediates: the product of two 31-bit values
    // would overflow a 32-bit int
    long x = 1, power = base % mod;

    for (; expon > 0; expon >>= 1)
    {
        if (expon & 1)
            x = (x * power) % mod;
        power = (power * power) % mod;
    }

    return (int)x;
}

uint rand16(uint pos)
{
    return pow_mod(16807, pos, 2147483647) >> 13 & 0xFFFF;
}

uint rand32(uint pos)
{
    return rand16(pos) << 16 | rand16(pos + 0x80000000);
}
MINSTD produces 31 bits (though the value 2^31 - 1 never occurs); however, I found bad patterns in the 11 least significant bits, so I take 16 of the 20 good bits and make a good 32-bit random number out of two of those.
pos would be a seed plus an offset, representing a position in the sequence of MINSTD outputs.
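The closed form behind this answer is easy to sanity-check in Python, where the built-in pow performs the modular exponentiation:

```python
# Check that MINSTD skip-ahead agrees with plain iteration: starting
# from seed 1, n applications of x -> 16807*x mod 2147483647 equal
# 16807**n mod 2147483647.

M = 2147483647   # Mersenne prime 2**31 - 1
A = 16807        # MINSTD multiplier

def minstd_iterate(n, x=1):
    for _ in range(n):
        x = (A * x) % M
    return x

n = 1000
print(minstd_iterate(n) == pow(A, n, M))  # True: the closed form matches
```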

Efficiently Get Random Numbers in Range on GPU

Given a uniformly distributed random number generator in the range [0, 2^64), is there any efficient way (on a GPU) to build a random number generator for the range [0, k) for some k < 2^64?
Some solutions that don't work:
// not uniformly distributed in [0, k)
myRand(rng, k) = rng() % k;

// way too much branching to run efficiently on a gpu
myRand(rng, k) =
    uint64_t ret;
    while ((ret = rng() & (nextPow2(k) - 1)) >= k);
    return ret;

// only 53 bits of random data, not 64. Also I
// have no idea how to reason about how "uniform"
// this distribution is.
myRand(doubleRng, k) =
    double r = doubleRng();  // generates a random number in [0, 1)
    return (uint64_t)floor(r * k);
I'd be willing to accept some non-uniformity if the difference is sufficiently small (say, within 1/2^64).
There are only two options: do the modulus (or the floating-point equivalent) and accept the non-uniformity, or do rejection sampling with a loop. There really isn't a third option. Which one is better depends on your application.
If your k is typically very small (say, you're shuffling cards, so k is on the order of 100), then the non-uniformity is so small that it's probably OK, even at 32 bits. At 64 bits, a k on the order of millions will still give you vanishingly small non-uniformity. No, it won't be on the order of 1/2^64, but I can't imagine a real-world application where non-uniformity on the order of 1/2^20 is noticeable. When I wrote the test suite for my RNG library, I deliberately ran it against a known-bad mod implementation, and it had a really hard time detecting the error even at 32 bits.
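The size of the modulo non-uniformity can be computed exactly rather than estimated; a quick sketch:

```python
# Exact size of the modulo bias: reducing a uniform 0..2**bits-1 value
# mod k maps either floor(2**bits / k) or floor(2**bits / k) + 1 inputs
# to each residue, so the worst-case relative deviation from uniform is
# at most about k / 2**bits.

def mod_bias(bits, k):
    total = 2 ** bits
    if total % k == 0:
        return 0.0           # k divides the range: no bias at all
    heavy = total // k + 1   # inputs mapping to each over-represented residue
    # relative deviation of a heavy residue from the uniform ideal total/k
    return heavy * k / total - 1.0

print(mod_bias(32, 52))      # 2**-30, about 1e-9: card deck at 32 bits
print(mod_bias(64, 10 ** 6)) # far smaller still at 64 bits
```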
If you really have to be perfectly uniform, then you're just going to have to sample and reject. This can be done pretty fast, and you can even get rid of the division (calculate the nextPow2() mask outside the rejection loop; that's how I do it in ojrandlib). FYI, the fastest way to compute the next-power-of-two mask is this:
mask = k - 1;
mask |= mask >> 1;
mask |= mask >> 2;
mask |= mask >> 4;
mask |= mask >> 8;
mask |= mask >> 16;
mask |= mask >> 32;
If you have a function that returns 53 bits of random data, but you need 64, call it twice, use the bottom 32 bits of the first call for the top 32 bits of your result, and the bottom 32 bits of the second call for the bottom 32 bits of your result. If your original function was uniform, this one is too.
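That combination step can be sketched as follows, with random.getrandbits(53) standing in for the 53-bit source; each 32-bit half is uniform because it is the low part of a uniform 53-bit value:

```python
import random

# Combine two 53-bit uniform draws into one uniform 64-bit value by
# taking the bottom 32 bits of each call. random.getrandbits(53) is a
# stand-in for the 53-bit source described above.

def rand64(source=lambda: random.getrandbits(53)):
    hi = source() & 0xFFFFFFFF   # bottom 32 bits of the first call
    lo = source() & 0xFFFFFFFF   # bottom 32 bits of the second call
    return (hi << 32) | lo

random.seed(1)
v = rand64()
print(0 <= v < 2 ** 64)  # True: always within the 64-bit range
```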

random permutation

I would like to generate a random permutation as fast as possible.
The problem: the Knuth shuffle, which is O(n), involves generating n random numbers, and generating random numbers is quite expensive.
I would like to find an O(n) algorithm involving a fixed O(1) number of generated random numbers.
I realize that this question has been asked before, but I did not see any relevant answers.
Just to stress a point: I am not looking for anything less than O(n), just an algorithm involving less generation of random numbers.
Thanks
Create a 1-1 mapping of each permutation to a number from 1 to n! (n factorial). Generate a random number from 1 to n!, apply the mapping, and get the permutation.
For the mapping, perhaps this will be useful: http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
Of course, this quickly gets out of hand, as n! becomes really large very soon.
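For small n, the numbering can be sketched with the factorial number system (the Lehmer code), decoding an integer in 0..n!-1 into a permutation:

```python
import math

# Sketch of the 1-1 numbering idea: decode an integer in 0..n!-1 into a
# permutation via the factorial number system (Lehmer code). A single
# random draw in that range therefore selects a permutation uniformly.

def nth_permutation(k, n):
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        f = math.factorial(i - 1)
        idx, k = divmod(k, f)     # which remaining item comes next
        perm.append(items.pop(idx))
    return perm

print(nth_permutation(0, 4))   # [0, 1, 2, 3], the first permutation
print(nth_permutation(23, 4))  # [3, 2, 1, 0], the last of 4! = 24
```

As the answer notes, the bigint arithmetic this needs for large n is exactly what makes the approach impractical at scale.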
Generating a random number takes a long time, you say? The implementation of Java's Random.nextInt is roughly:
oldseed = seed;
nextseed = (oldseed * multiplier + addend) & mask;
return (int)(nextseed >>> (48 - bits));
Is that too much work to do for each element?
See https://doi.org/10.1145/3009909 for a careful analysis of the number of random bits required to generate a random permutation. (It's open-access, but it's not easy reading! Bottom line: if carefully implemented, all of the usual methods for generating random permutations are efficient in their use of random bits.)
And... if your goal is to generate a random permutation rapidly for large N, I'd suggest you try the MergeShuffle algorithm. An article published in 2015 claimed a factor-of-two speedup over Fisher-Yates in both parallel and sequential implementations, and a significant speedup in sequential computations over the other standard algorithm they tested (Rao-Sandelius).
An implementation of MergeShuffle (and of the usual Fisher-Yates and Rao-Sandelius algorithms) is available at https://github.com/axel-bacher/mergeshuffle. But caveat emptor! The authors are theoreticians, not software engineers. They have published their experimental code to github but aren't maintaining it. Someday, I imagine someone (perhaps you!) will add MergeShuffle to GSL. At present gsl_ran_shuffle() is an implementation of Fisher-Yates, see https://www.gnu.org/software/gsl/doc/html/randist.html?highlight=gsl_ran_shuffle.
Not exactly what you asked, but if the provided random number generator doesn't satisfy you, maybe you should try something different. Generally, pseudorandom number generation can be very simple.
Probably the best-known algorithm:
http://en.wikipedia.org/wiki/Linear_congruential_generator
More:
http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators
As other answers suggest, you can make a random integer in the range 0 to N! and use it to produce a shuffle. Although theoretically correct, this won't be faster in general since N! grows fast and you'll spend all your time doing bigint arithmetic.
If you want speed and you don't mind trading off some randomness, you will be much better off using a less good random number generator. A linear congruential generator (see http://en.wikipedia.org/wiki/Linear_congruential_generator) will give you a random number in a few cycles.
Usually there is no need for the full range of the next random value, so to use exactly the same amount of randomness you can use the following approach (which is almost like random(0, N!), I guess):
// ...
m = 1;  // range of random buffer (single variant)
r = 0;  // random buffer (number zero)
// ...
for (/* ... */) {
    while (m < n) {                 // range of our buffer is too narrow for "n"
        r = r*RAND_MAX + random();  // add another random to our random buffer
        m *= RAND_MAX;              // update range of random buffer
    }
    x = r % n;  // pull out next random with range "n"
    r /= n;     // remove it from the random buffer
    m /= n;     // fix range of random buffer
    // ...
}
P.S. Of course there will be some bias related to division by values other than 2^n, but it will be distributed among the resulting samples.
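Fleshed out into a complete Fisher-Yates-style shuffle, the buffered-randomness loop looks like this (a sketch in Python, whose big integers make the buffer arithmetic exact; RAND_MAX is replaced by 2^32 for convenience):

```python
import random

# Complete sketch of the buffered-randomness idea above: keep one big
# integer as a pool, top it up with 32 random bits only when its range
# m is too narrow for the next draw, and peel indices off with % and //.
# As the P.S. notes, the % extraction carries a small bias.

RAND_SIZE = 2 ** 32

def buffered_shuffle(items):
    m, r = 1, 0                       # range and contents of the pool
    for n in range(len(items), 1, -1):
        while m < n:                  # pool too narrow for a 0..n-1 draw
            r = r * RAND_SIZE + random.getrandbits(32)
            m *= RAND_SIZE
        x = r % n                     # next swap index (slightly biased)
        r //= n                       # remove it from the pool
        m //= n                       # shrink the pool's range
        items[n - 1], items[x] = items[x], items[n - 1]
    return items

random.seed(0)
print(sorted(buffered_shuffle(list(range(8)))) == list(range(8)))  # True
```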
Generate N numbers (where N is smaller than the number of random numbers you need) ahead of the computation, or store them in an array as data, with your slow but good random generator; then pick numbers by simply incrementing an index into the array inside your computing loop. If you need different seeds, create multiple tables.
Are you sure that your mathematical and algorithmical approach to the problem is correct?
I hit exactly the same problem, where the Fisher-Yates shuffle was the bottleneck in corner cases. But for me the real problem was a brute-force algorithm that doesn't scale well to all problems. The following story explains the problem and the optimizations I have come up with so far.
Dealing cards for 4 players
The number of possible deals is a 96-bit number. That puts quite a stress on the random number generator to avoid statistical anomalies when selecting a play plan from a generated sample set of deals. I chose to use 2x mt19937_64 seeded from /dev/random because of the long period and the heavy advertisement on the web that it is good for scientific simulations.
The simple approach is to use the Fisher-Yates shuffle to generate deals and filter out deals that don't match already-collected information. The Knuth shuffle takes ~1400 CPU cycles per deal, mostly because I have to generate 51 random numbers and perform 51 swaps in the table.
That doesn't matter for normal cases, where I would only need to generate 10000-100000 deals in 7 minutes. But there are extreme cases where the filters may select only a very small subset of hands, requiring a huge number of deals to be generated.
Using single number for multiple cards
When profiling with callgrind (valgrind) I noticed that the main slowdown was the C++ random number generator (after switching away from std::uniform_int_distribution, which was the first bottleneck).
Then I came up with the idea that I can use a single random number for multiple cards. The idea is to use the least significant information from the number first and then erase that information.
int number = uniform_rng(0, 52*51*50*49);
int card1 = number % 52;
number /= 52;
int card2 = number % 51;
number /= 51;
......
Of course that is only a minor optimization, because generation is still O(N).
Generation using bit permutations
The next idea was exactly the solution asked for here, but I still ended up with O(N), and with a larger cost than the original shuffle. Let's look into the solution and why it fails so miserably.
I decided to use the idea from Dealing All the Deals by John Christman.
void Deal::generate()
{
    // 52:26 split, 52!/(26!)**2 = 495,918,532,948,104
    max = 495918532948104LU;
    partner = uniform_rng(eng1, max);

    // 2x 26:13 splits, (26!/(13!*13!))**2 = 10,400,600**2
    max = 10400600LU*10400600LU;
    hands = uniform_rng(eng2, max);

    // Create 104 bit presentation of deal (2 bits per card)
    select_deal(id, partner, hands);
}
So far so good, and it looks pretty reasonable, but the select_deal implementation is a PITA.
void select_deal(Id &new_id, uint64_t partner, uint64_t hands)
{
    unsigned idx;
    unsigned e, n, ns = 26;
    e = n = 13;

    // Figure out which partnership owns which card
    for (idx = CARDS_IN_SUIT*NUM_SUITS; idx > 0; ) {
        uint64_t cut = ncr(idx - 1, ns);
        if (partner >= cut) {
            partner -= cut;
            // Figure out if N or S holds the card
            ns--;
            cut = ncr(ns, n) * 10400600LU;
            if (hands > cut) {
                hands -= cut;
                n--;
            } else
                new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        } else
            new_id[idx%NUM_SUITS + NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        idx--;
    }

    unsigned ew = 26;
    // Figure out if E or W holds a card
    for (idx = CARDS_IN_SUIT*NUM_SUITS; idx-- > 0; ) {
        if (new_id[idx%NUM_SUITS + NUM_SUITS] & (1 << (idx/NUM_SUITS))) {
            uint64_t cut = ncr(--ew, e);
            if (hands >= cut) {
                hands -= cut;
                e--;
            } else
                new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        }
    }
}
Now that I had the O(N) permutation solution done to prove the algorithm could work, I started searching for an O(1) mapping from a random number to a bit permutation. Unfortunately, it looks like the only solution would be huge lookup tables that would kill the CPU caches. That doesn't sound like a good idea for an AI that will be using a very large amount of cache for the double dummy analyzer.
Mathematical solution
After all the hard work figuring out how to generate random bit permutations, I decided to go back to the maths. It is entirely possible to apply the filters before dealing cards. That requires splitting the deals into a manageable number of layered sets and selecting between the sets based on their relative probabilities after filtering out the impossible sets.
I don't yet have code ready to test how many cycles I'm wasting in the common case, where the filter selects the major part of the deals. But I believe this approach gives the most stable generation performance, keeping the cost below 0.1%.
Generate a 32 bit integer. For each index i (maybe only up to half the number of elements in the array), if bit i % 32 is 1, swap i with n - i - 1.
Of course, this might not be random enough for your purposes. You could probably improve it by swapping not with n - i - 1 but with another function of n and i that gives a better distribution. You could even use two functions: one for when the bit is 0 and another for when it's 1.
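A sketch of the bit-driven swap, which shows both its speed and its limitation (one 32-bit draw can select at most 2^32 of the n! permutations):

```python
import random

# Sketch of the answer's bit-driven swap: one 32-bit integer decides,
# for each index in the first half, whether to swap it with its mirror
# n - i - 1. O(n) with a single random draw, but it can reach at most
# 2**32 of the n! permutations, as the answer acknowledges.

def bit_swap_permute(items, bits):
    n = len(items)
    for i in range(n // 2):
        if (bits >> (i % 32)) & 1:
            items[i], items[n - i - 1] = items[n - i - 1], items[i]
    return items

random.seed(42)
out = bit_swap_permute(list(range(10)), random.getrandbits(32))
print(sorted(out) == list(range(10)))  # True: still a permutation
```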
