For an implementation of Perlin noise, I need to select a vector from a static list of n vectors for each integer coordinate in 3D space. This boils down to generating a pseudo random number in 1..n from four signed integer values x, y, z and seed.
unsigned int pseudo_random_number(int x, int y, int z, int seed);
The algorithm should be stateless, i.e., return the same number each time it is called with the same input values.
An existing Perlin noise implementation I looked at multiplies each integer with a large prime, adds the results, does some bit manipulation on the sum, and takes the remainder of a division by n. I don't want to just copy this because I don't understand a few things about it:
How are the primes selected?
Why is the additional bit manipulation done?
How do I know if this is „sufficiently pseudo-random“ to generate a visually pleasing result?
I looked for explanations of how a PRNG works but I couldn't find anything about multiple input values.
If you have arbitrary precision pseudo-random number generation then you can just concatenate the four inputs (x,y,z,seed) and call your pseudo-random number generator function on this input to get the "next" pseudo-random number which will serve as your random number. (and then take the appropriate number of high bits if you want to have a random number between 1 and n).
The implementation you mentioned uses the fact that different large prime numbers, modulo n, produce essentially uncorrelated results (modulo n) when multiplied with input integers. Of course, you need your input integers to not all share a common divisor with n for this to work. This is why the additional bit manipulation is done: if all of your input integers are divisible by k and n is divisible by k, the remainder modulo n will not automatically be divisible by k as well. At any rate, people have put a lot of thought into established pseudo-random number generators, so my advice is to trust that they considered the potential issues and that a generator is "good" if a large crowd uses it without complaints.
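A minimal sketch of the multiply, combine, mix, and reduce pattern described above. The multipliers and the mixing steps are arbitrary illustrative constants (assumptions, not taken from any particular Perlin noise implementation), and n is passed as an extra parameter for convenience.

```python
MASK32 = 0xFFFFFFFF  # keep intermediate values in 32 bits, like a C unsigned int


def pseudo_random_number(x: int, y: int, z: int, seed: int, n: int) -> int:
    """Stateless map from four signed integers to a value in 1..n."""
    # Multiply each input by a different large odd constant and combine.
    # The constants are illustrative assumptions, not tuned values.
    h = (x * 73856093) ^ (y * 19349663) ^ (z * 83492791) ^ (seed * 2654435761)
    h &= MASK32  # also turns negative Python ints into unsigned 32-bit values
    # Bit mixing: fold high bits into low bits so that regular patterns in
    # the inputs do not survive directly into the remainder modulo n.
    h ^= h >> 16
    h = (h * 0x45D9F3B) & MASK32
    h ^= h >> 16
    return h % n + 1
```

Calling it twice with the same arguments returns the same value, which is the stateless behaviour asked for. Whether the result looks good enough for noise is best judged by rendering it.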
I want a simple (non-cryptographic) random number generation algorithm where I can freely choose the period.
One candidate would be a special instance of LCG:
X(n+1) = (aX(n)+c) mod m (m,c relatively prime; (a-1) divisible by all prime factors of m and also divisible by 4 if m is).
This has period m and does not restrict possible values of m.
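As a sketch of how the conditions above can be satisfied for an arbitrary m, the following picks one valid (a, c) pair and iterates the generator. The helper names are hypothetical, and the particular choice of c (the first value above m/2 that is coprime to m) is just an assumption made for illustration.

```python
from math import gcd


def full_period_lcg_params(m: int) -> tuple[int, int]:
    """Return one (a, c) pair meeting the full-period conditions for modulus m."""
    if m < 1:
        raise ValueError("m must be positive")
    # Product of the distinct prime factors of m (trial division).
    rad, k, p = 1, m, 2
    while p * p <= k:
        if k % p == 0:
            rad *= p
            while k % p == 0:
                k //= p
        p += 1
    if k > 1:
        rad *= k
    if m % 4 == 0:
        rad *= 2  # rad is already even here, so a - 1 becomes divisible by 4
    a = rad + 1
    c = m // 2 + 1
    while gcd(c, m) != 1:  # c must be relatively prime to m
        c += 1
    return a % m, c % m


def lcg_cycle(m: int, seed: int = 0):
    """Yield m successive outputs of X(n+1) = (a*X(n) + c) mod m."""
    a, c = full_period_lcg_params(m)
    x = seed % m
    for _ in range(m):
        x = (a * x + c) % m
        yield x
```

Note that for some moduli the conditions leave little freedom: with m = 20, a - 1 must be divisible by 2, 5, and 4, so a is forced to 1 modulo m and the generator just steps by c.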
I intend to use this RNG to create a permutation of an array by generating indices into it. I tried the LCG and it might be OK. However, it may not be "random enough" in that distances between adjacent outputs have very few possible values (i.e., plotting x(n) vs n gives a wrapped line). The arrays I want to index into have some structure that is related to this distance, and I want to avoid potential issues with this.
Of course, I could use any good PRNG to shuffle (using e.g. Fisher–Yates) an array [1,..., m]. But I don't want to have to store this array of indices. Is there some way to capture the permuted indices directly in an algorithm?
I don't really mind the method ending up biased w.r.t choice of RNG seed. Only the period matters and the permuted sequence (for a given seed) being reasonably random.
Encryption is a one-to-one operation. If you encrypt a range of numbers, you will get the same count of apparently random numbers back. In this case the period will be the size of the chosen range. So for a period of 20, encrypt the numbers 0..19.
If you want the output numbers to be in a specific range, then pick a block cipher with an appropriately sized block and use Format Preserving Encryption if needed, as @David Eisenstat suggests.
It is not difficult to set up a cipher with almost any reasonable block size, so long as it is an even number of bits, using the Feistel structure. If you don't require cryptographic security then four or six Feistel rounds should give you enough randomness.
Changing the encryption key will give you a different ordering of the numbers.
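A rough sketch of this idea, assuming a four-round Feistel network with a SHA-256 based round function and "cycle walking" (re-encrypting until the result falls inside the range) as a simple stand-in for format-preserving encryption. None of this is cryptographically vetted; the function names and the round function are illustrative choices.

```python
import hashlib


def _round_function(value: int, key: int, rnd: int, half_bits: int) -> int:
    # Illustrative round function: hash key, round number, and half-block.
    digest = hashlib.sha256(f"{key}:{rnd}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def permute(index: int, period: int, key: int, rounds: int = 4) -> int:
    """Map index in 0..period-1 to a unique value in the same range.

    Assumes period <= 2**128 so that each half-block fits in 64 bits.
    """
    if not 0 <= index < period:
        raise ValueError("index out of range")
    # Smallest even-width block that covers the range.
    half_bits = max(1, ((period - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    x = index
    while True:
        left, right = x >> half_bits, x & mask
        for rnd in range(rounds):
            left, right = right, left ^ _round_function(right, key, rnd, half_bits)
        x = (left << half_bits) | right
        if x < period:  # cycle walking: retry until we land inside the range
            return x
```

For example, `[permute(i, 20, key=42) for i in range(20)]` visits every value in 0..19 exactly once without storing the permutation, and a different key should give a different ordering.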
Let's say that we have a random number generator that can generate random 32 or 64 bit integers (like rand.Rand in the standard library)
Generating a random int64 in a given range [a,b) is fairly easy:
rand.Seed(time.Now().UnixNano())
n := rand.Int63n(b-a) + a
Is it possible to generate random 128 bit decimal (as defined in specification IEEE 754-2008) in a given range from a combination of 32 or 64 bit random integers?
It is possible, but the solution is far from trivial. For a correct solution, there are several things to consider.
For one thing, values with exponent E are 10 times more likely than values with exponent E - 1.
Other issues include subnormal numbers and ranges that straddle zero.
I am aware of the Rademacher Floating-Point Library, which tackled this problem for binary floating-point numbers, but the solution there is complicated and its author has not yet written up how his algorithm works.
EDIT (May 11):
I have now specified an algorithm for generating random "uniform" floating-point numbers—
In any range,
with full coverage, and
regardless of the digit base (such as binary or decimal).
Possible, but by no means easy. Here is a sketch of a solution that might be acceptable — writing and debugging it would probably be at least a day of concerted effort.
Let min and max be primitive.Decimal128 objects from go.mongodb.org/mongo-driver/bson. Let MAXBITS be a multiple of 32; 128 is likely to be adequate.
1. Get the significand (as big.Int) and exponent (as int) of min and max using the BigInt method.
2. Align min and max so that they have the same exponent. As far as possible, left-justify the value with the larger exponent by decreasing its exponent and adding a corresponding number of zeroes to the right side of its significand. If this would cause the absolute value of the significand to become >= 2**(MAXBITS-1), then either
   (a) Right-shift the value with the smaller exponent by dropping digits from the right side of its significand and increasing its exponent, causing precision loss.
   (b) Dynamically increase MAXBITS.
   (c) Throw an error.
3. At this point both exponents will be the same, and both significands will be aligned big integers. Set aside the exponents for now, and let range (a new big.Int) be maxSignificand - minSignificand. It will be between 0 and 2**MAXBITS.
4. Turn range into MAXBITS/32 uint32s using the Bytes or DivMod methods, whatever is easier.
5. If the highest word of range is equal to math.MaxUint32 then set a flag limit to false, otherwise true.
6. For n from 0 to MAXBITS/32:
   If limit is true, use rand.Int63n (!, not rand.Int31n or rand.Uint32) to generate a value between 0 and the nth word of range, inclusive, cast it to uint32, and store it as the nth word of the output. If the value generated is equal to the nth word of range (i.e. if we generated the maximum possible random value for this word) then let limit remain true, otherwise set it false.
   If limit is false, use rand.Uint32 to generate the nth word of the output. limit remains false regardless of the generated value.
7. Combine the generated words into a big.Int by building a []byte and using big.Int.SetBytes or multiplication and addition, as convenient.
8. Add the generated value to minSignificand to obtain the significand of the result.
9. Use ParseDecimal128FromBigInt with the result significand and the exponent from steps 2-3 to obtain the result.
The heart of the algorithm is step 6, which generates a uniform random unsigned integer of arbitrary length 32 bits at a time. The alignment in step 2 reduces the problem from a floating-point to an integer one, and the subtraction in step 3 reduces it to an unsigned one, so that we only have to think about one bound instead of 2. The limit flag records whether we're still dealing with that bound, or whether we've already narrowed the result down to an interval that doesn't include it.
Caveats:
I haven't written this, let alone tested it. I may have gotten it quite wrong. A sanity check by someone who does more numerical computation work than me would be welcome.
Generating numbers across a large dynamic range (including crossing zero) will lose some precision and omit some possible output values with smaller exponents unless a ludicrously large MAXBITS is used; however, 128 bits should give a result at least as good as a naive algorithm implemented in terms of decimal128.
The performance is probably pretty bad.
Go has a large number package that can do arbitrary length integers: https://golang.org/pkg/math/big/
It has a pseudo random number generator https://golang.org/pkg/math/big/#Int.Rand, and the crypto package also has https://golang.org/pkg/crypto/rand/#Int
You'd want to specify the max using https://golang.org/pkg/math/big/#Int.Exp as 2^128.
Can't speak to performance, though, or whether this is compliant with the IEEE standard, but large random numbers like what you'd use for UUIDs are possible.
It depends on how many values you want to generate. If it's enough to have no more than 10^34 values in a specified range, it's quite simple.
As I see the problem, a random value in the range min..max can be calculated as random(0..1)*(max-min)+min
It looks like we only need to generate a decimal128 value in the range 0..1. That is a random integer in the range 0..10^34-1 with exponent -34. This value can be generated with Go's standard random package.
To multiply, add, and subtract decimal128 values, you can use Go's math/big package, normalizing the values as you go.
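To illustrate the arithmetic only, here is the same idea sketched in Python rather than Go, using the standard decimal module with a 34-digit context as a stand-in for decimal128 (an assumption: it mimics the precision and exponent limits, not the binary encoding). The function name is hypothetical.

```python
import random
from decimal import Context, Decimal

# 34 significant digits and the decimal128 exponent limits.
CTX = Context(prec=34, Emin=-6143, Emax=6144)


def random_decimal(lo: Decimal, hi: Decimal, rng: random.Random) -> Decimal:
    """Return random(0..1) * (hi - lo) + lo, rounded to 34 digits."""
    # A uniform integer in 0..10^34-1 with exponent -34 is a value in [0, 1).
    unit = CTX.scaleb(Decimal(rng.randrange(10**34)), -34)
    span = CTX.subtract(hi, lo)
    return CTX.add(CTX.multiply(unit, span), lo)
```

Because each operation rounds to 34 digits, the result can land exactly on hi, and for wide ranges many representable values near zero are never produced. That is the "no more than 10^34 values" limitation mentioned above.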
This is definitely what you are looking for.
I want to create a random value between 0 and n. As source I receive a random 256-bit value from an oracle.
Just doing
source % (n + 1)
introduces modulo bias. Solutions to modulo bias I've seen are doing a while loop until the value lies within the largest multiple of n that fits the source range - but in my case I can draw a random source value only once.
My n is much smaller though, let's say smaller than 2^32. Is there any way I can generate a random value that has no modulo bias in this scenario?
Under the assumption that your random number generator is uniform, there is no way to map it to an unbiased generator on 0 to n inclusive unless n+1 is a divisor of 2^256. This is because it is impossible to partition a set of size 2^256 into n+1 equally probable subsets unless n+1 divides 2^256. Having said that, if n is much smaller than 2^256, the departure from uniformity in doing something like int(source * (n+1) / 2^256) will be negligible.
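A small sketch of the scaling approach with exact integer arithmetic. It divides by 2^256 (as a right shift) so that the result can never exceed n; the function name is hypothetical.

```python
def scale_to_range(source: int, n: int) -> int:
    """Map a uniform 256-bit integer to 0..n with negligible bias for small n."""
    if not 0 <= source < 1 << 256:
        raise ValueError("source must be a 256-bit unsigned integer")
    # floor(source * (n + 1) / 2**256), computed without floating point.
    return (source * (n + 1)) >> 256
```

Each output value receives either floor(2^256 / (n+1)) or one more source values, so for n below 2^32 the relative difference between any two outcomes is at most about 2^-224.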
Assuming I can generate random bytes of data, how can I use that to choose an element out of an array of n elements?
If I have 256 elements I can generate 1 byte of entropy (8 bits), and then use that to pick my element simply by converting it to an integer.
If I have 2 elements I can generate 1 byte, discard 7 bits and use the remaining bit to select my element.
But what if I have 3 elements? 1 bit is too few and 2 is too many. How would I randomly select 1 of the 3 elements with equal probability?
Here is a survey of algorithms to generate uniform random integers from random bits.
J. Lumbroso's Fast Dice Roller in "Optimal Discrete Uniform Generation from Coin Flips, and Applications", 2013. See also the implementation at the end of this answer.
The Math Forum, 2004. See also "Bit Recycling for Scaling Random Number Generators".
D. Lemire, "A Fast Alternative to the Modulo Reduction".
M. O'Neill, "Efficiently Generating a Number in a Range".
Some of these algorithms are "constant-time", others are unbiased, and still others are "optimal" in terms of the number of random bits they use on average. In the rest of this answer we will assume we have a "true" random generator that can produce unbiased and independent random bits.
For further discussion, see the following answer of mine:
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
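As an illustration of the first algorithm in the survey, here is a sketch of the Fast Dice Roller written from its published description. This is not the implementation the answer refers to; the `flip` parameter is an assumed callable returning one unbiased bit.

```python
import secrets


def fast_dice_roller(n: int, flip=lambda: secrets.randbits(1)) -> int:
    """Return a uniform integer in 0..n-1 using one random bit at a time."""
    if n < 1:
        raise ValueError("n must be positive")
    v, c = 1, 0  # c is uniform in 0..v-1 at every step
    while True:
        v = 2 * v
        c = 2 * c + flip()
        if v >= n:
            if c < n:
                return c
            # Reject, but keep the leftover randomness: c - n is uniform
            # in 0..v-n-1, so later flips are not wasted.
            v -= n
            c -= n
```

For three elements, `fast_dice_roller(3)` returns 0, 1, or 2 with equal probability, assuming the bit source is unbiased.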
You can generate the proper distribution by simply rejecting values that fall outside the necessary range. If you have N elements, generate K = ceiling(log2(N)) random bits and draw again whenever the result is not a valid index. Doing so is inefficient, but it works as long as the K bits are generated randomly.
In your example with N=3, you need at least K=2 bits, which gives the outcomes [00, 01, 10, 11] with equal probability. To map this into the proper range, just ignore one of the outcomes, such as the last one. Think of this as creating a new joint probability distribution, p(x_1, x_2), over the two bits where p(x_1=1, x_2=1) = 0, while for each of the others it will be 1/3 due to renormalization (i.e., (1/4)/(3/4) = 1/3).
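A minimal sketch of this "ignore the extra outcomes" approach, usually called rejection sampling. The `randbits` parameter is an assumed callable that returns k random bits as an integer, defaulting to `secrets.randbits`.

```python
import secrets


def pick_index(n: int, randbits=secrets.randbits) -> int:
    """Return a uniform index in 0..n-1 by drawing K bits and rejecting overshoots."""
    if n < 1:
        raise ValueError("n must be positive")
    k = max(1, (n - 1).bit_length())  # K = ceiling(log2(n)), at least 1
    while True:
        value = randbits(k)
        if value < n:  # outcomes >= n are ignored and redrawn
            return value
```

With n = 3 this draws two bits at a time and retries on 11, so on average it needs 4/3 draws per result.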
I understand 3D hyperplanes can represent numbers generated by linear congruential generator. But I don't get how it determines the location for each number or point. Especially in a 3D cube? I mean, doesn't a point have to have X, Y, and Z values to be in there?! What if one of the numbers generated is "8"? It's just "8"... how would I know XYZ for that? (I hope you know what I'm talking about... couldn't post an image, sorry :/)
Suppose you generate batches of three pseudo-random numbers in a sequence from your linear congruential generator and use the first number in each batch as the x-dimension, the next as the y-dimension and the last as the z-dimension, you can then plot each batch of three pseudo-random numbers in a x-y-z cube. A similar argument goes for generating batches of n (n > 3) numbers, except you'll plot them in a hypercube.
Assume that you are generating each of those pseudo-random numbers with b bits. There are then 2^(nb) possible points that would have to be generated to fill the (hyper)cube (which will be a very large number, for any typical value of b). However, if the generator has a period of less than 2^(nb) (which will almost always be the case for practical purposes), it won't fill all the available spaces in the cube (or hypercube, if n > 3). It will only fill some of the spaces.
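To make the counting argument concrete, here is a small sketch that counts how many cells of a 2D grid a tiny LCG can ever reach. The parameters a = 5, c = 5, m = 256 are chosen purely for illustration (they give the full period of 256), and overlapping pairs (X(i), X(i+1)) are used instead of disjoint batches.

```python
def lcg(x: int, a: int, c: int, m: int) -> int:
    """One step of a linear congruential generator."""
    return (a * x + c) % m


def reachable_pairs(a: int, c: int, m: int, seed: int = 0) -> set:
    """Collect every consecutive pair (X(i), X(i+1)) over m steps."""
    pairs = set()
    x = seed
    for _ in range(m):
        nxt = lcg(x, a, c, m)
        pairs.add((x, nxt))
        x = nxt
    return pairs


pairs = reachable_pairs(5, 5, 256)
filled_fraction = len(pairs) / 256**2  # share of the 256 x 256 grid ever hit
```

Here only 256 of the 65536 possible cells are ever produced, a fraction of 1/256, because each value has exactly one successor.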
What's more, the filled spaces may be located in planes (or hyperplanes, if n > 3) passing through the (hyper)cube, with spaces in-between the (hyper)planes that represent numbers that the generator will never produce because it repeats its cycle without ever producing such a number. This occurs because the pseudo-random numbers are serially correlated. You can see this behaviour at any dimensionality but the number of (hyper)planes on which the pseudo-random numbers are located reduces as the dimensionality n increases, so the behaviour becomes much more obvious as n gets larger.
This can be a particular problem when using the generated pseudo-random numbers as input to a simulation because the simulation can then produce output that is more an artefact of the imperfections of the pseudo-random numbers than a consequence of the simulated model.
The Wikipedia article on Linear congruential generator is excellent.
(EDITED TO ADD AN EXAMPLE)
Here is a linear congruential generator (with very poor parameters selected deliberately) implemented in Python. Pseudo-random numbers with an even index are assigned to x values and those with an odd index are assigned to y values.
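import matplotlib.pyplot as plt

def lcg(X, a, c, m):
    # One step of the linear congruential generator.
    return (a * X + c) % m

x = []
y = []
X = 0
for i in range(1000):
    X = lcg(X, 43, 5, 256)
    # Even-indexed outputs become x coordinates, odd-indexed ones y.
    if i % 2 == 0:
        x.append(X)
    else:
        y.append(X)
plt.scatter(x, y)
plt.show()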
This script produces the following output:
You can see that the resulting (x,y) pairs are all found on a small number of straight lines and pairs that appear in-between the lines can never be produced by the generator. The same thing can be done in three or more dimensions to see how generators with better parameters than I've used here still produce outputs that sit on lines, planes or hyperplanes in 2, 3, or n-dimensional space.