Non-recursive pseudo-random number generation with discrete uniform distribution - random

I wonder is there any cheap and effective function to generate pseudo-random numbers by their indices? With something like that implementation:
var rand = new PseudoRandom(seed); // all sequences for same seeds are equal
trace(rand.get(index1)); // get int number by index1, for example =0x12345678
trace(rand.get(index2));
...
trace(rand.get(index1)); // must return the SAME number, =0x12345678
Probably it isn't about randomness but about good (fast and close as much as possible to uniform distribution) hashing where initial seed used as salt.

You could build such random number generator out of the stream cipher Salsa20. One of the nice features of Salsa20 is that you can jump ahead to any offset very cheaply. And Salsa20 is fast, typically less than 20 cycles per byte. Since the cipher is indistinguishable from a truly random stream, uniformity should be excellent.
Since you probably don't need cryptographically secure random numbers, you could even reduce the number of rounds to something like 8 instead of the usual 20 rounds.
Another option is to just use the ideas behind Salsa20, how to mix up a state array (Bernstein calls that a hashing function), to build your own random number generator.

Related

Right way to permute [duplicate]

I've been using Random (java.util.Random) to shuffle a deck of 52 cards. There are 52! (8.0658175e+67) possibilities. Yet, I've found out that the seed for java.util.Random is a long, which is much smaller at 2^64 (1.8446744e+19).
From here, I'm suspicious whether java.util.Random is really that random; is it actually capable of generating all 52! possibilities?
If not, how can I reliably generate a better random sequence that can produce all 52! possibilities?
Selecting a random permutation requires simultaneously more and less randomness than what your question implies. Let me explain.
The bad news: need more randomness.
The fundamental flaw in your approach is that it's trying to choose between ~2226 possibilities using 64 bits of entropy (the random seed). To fairly choose between ~2226 possibilities you're going to have to find a way to generate 226 bits of entropy instead of 64.
There are several ways to generate random bits: dedicated hardware, CPU instructions, OS interfaces, online services. There is already an implicit assumption in your question that you can somehow generate 64 bits, so just do whatever you were going to do, only four times, and donate the excess bits to charity. :)
The good news: need less randomness.
Once you have those 226 random bits, the rest can be done deterministically and so the properties of java.util.Random can be made irrelevant. Here is how.
Let's say we generate all 52! permutations (bear with me) and sort them lexicographically.
To choose one of the permutations all we need is a single random integer between 0 and 52!-1. That integer is our 226 bits of entropy. We'll use it as an index into our sorted list of permutations. If the random index is uniformly distributed, not only are you guaranteed that all permutations can be chosen, they will be chosen equiprobably (which is a stronger guarantee than what the question is asking).
Now, you don't actually need to generate all those permutations. You can produce one directly, given its randomly chosen position in our hypothetical sorted list. This can be done in O(n2) time using the Lehmer[1] code (also see numbering permutations and factoriadic number system). The n here is the size of your deck, i.e. 52.
There is a C implementation in this StackOverflow answer. There are several integer variables there that would overflow for n=52, but luckily in Java you can use java.math.BigInteger. The rest of the computations can be transcribed almost as-is:
public static int[] shuffle(int n, BigInteger random_index) {
int[] perm = new int[n];
BigInteger[] fact = new BigInteger[n];
fact[0] = BigInteger.ONE;
for (int k = 1; k < n; ++k) {
fact[k] = fact[k - 1].multiply(BigInteger.valueOf(k));
}
// compute factorial code
for (int k = 0; k < n; ++k) {
BigInteger[] divmod = random_index.divideAndRemainder(fact[n - 1 - k]);
perm[k] = divmod[0].intValue();
random_index = divmod[1];
}
// readjust values to obtain the permutation
// start from the end and check if preceding values are lower
for (int k = n - 1; k > 0; --k) {
for (int j = k - 1; j >= 0; --j) {
if (perm[j] <= perm[k]) {
perm[k]++;
}
}
}
return perm;
}
public static void main (String[] args) {
System.out.printf("%s\n", Arrays.toString(
shuffle(52, new BigInteger(
"7890123456789012345678901234567890123456789012345678901234567890"))));
}
[1] Not to be confused with Lehrer. :)
Your analysis is correct: seeding a pseudo-random number generator with any specific seed must yield the same sequence after a shuffle, limiting the number of permutations that you could obtain to 264. This assertion is easy to verify experimentally by calling Collection.shuffle twice, passing a Random object initialized with the same seed, and observing that the two random shuffles are identical.
A solution to this, then, is to use a random number generator that allows for a larger seed. Java provides SecureRandom class that could be initialized with byte[] array of virtually unlimited size. You could then pass an instance of SecureRandom to Collections.shuffle to complete the task:
byte seed[] = new byte[...];
Random rnd = new SecureRandom(seed);
Collections.shuffle(deck, rnd);
In general, a pseudorandom number generator (PRNG) can't choose from among all permutations of a 52-item list if its maximum cycle length is less than 226 bits.
java.util.Random implements an algorithm with a modulus of 248 and a maximum cycle length of not more than that, so much less than 2226 (corresponding to the 226 bits I referred to). You will need to use another PRNG with a bigger cycle length, specifically one with a maximum cycle length of 52 factorial or greater.
See also "Shuffling" in my article on random number generators.
This consideration is independent of the nature of the PRNG; it applies equally to cryptographic and noncryptographic PRNGs (of course, noncryptographic PRNGs are inappropriate whenever information security is involved).
Although java.security.SecureRandom allows seeds of unlimited length to be passed in, the SecureRandom implementation could use an underlying PRNG (e.g., "SHA1PRNG" or "DRBG"). And it depends on that PRNG's maximum cycle length whether it's capable of choosing from among 52 factorial permutations.
Let me apologize in advance, because this is a little tough to understand...
First of all, you already know that java.util.Random is not completely random at all. It generates sequences in a perfectly predictable way from the seed. You are completely correct that, since the seed is only 64 bits long, it can only generate 2^64 different sequences. If you were to somehow generate 64 real random bits and use them to select a seed, you could not use that seed to randomly choose between all of the 52! possible sequences with equal probability.
However, this fact is of no consequence as long as you're not actually going to generate more than 2^64 sequences, as long as there is nothing 'special' or 'noticeably special' about the 2^64 sequences that it can generate.
Lets say you had a much better PRNG that used 1000-bit seeds. Imagine you had two ways to initialize it -- one way would initialize it using the whole seed, and one way would hash the seed down to 64 bits before initializing it.
If you didn't know which initializer was which, could you write any kind of test to distinguish them? Unless you were (un)lucky enough to end up initializing the bad one with the same 64 bits twice, then the answer is no. You could not distinguish between the two initializers without some detailed knowledge of some weakness in the specific PRNG implementation.
Alternatively, imagine that the Random class had an array of 2^64 sequences that were selected completely and random at some time in the distant past, and that the seed was just an index into this array.
So the fact that Random uses only 64 bits for its seed is actually not necessarily a problem statistically, as long as there is no significant chance that you will use the same seed twice.
Of course, for cryptographic purposes, a 64 bit seed is just not enough, because getting a system to use the same seed twice is computationally feasible.
EDIT:
I should add that, even though all of the above is correct, that the actual implementation of java.util.Random is not awesome. If you are writing a card game, maybe use the MessageDigest API to generate the SHA-256 hash of "MyGameName"+System.currentTimeMillis(), and use those bits to shuffle the deck. By the above argument, as long as your users are not really gambling, you don't have to worry that currentTimeMillis returns a long. If your users are really gambling, then use SecureRandom with no seed.
I'm going to take a bit of a different tack on this. You're right on your assumptions - your PRNG isn't going to be able to hit all 52! possibilities.
The question is: what's the scale of your card game?
If you're making a simple klondike-style game? Then you definitely don't need all 52! possibilities. Instead, look at it like this: a player will have 18 quintillion distinct games. Even accounting for the 'Birthday Problem', they'd have to play billions of hands before they'd run into the first duplicate game.
If you're making a monte-carlo simulation? Then you're probably okay. You might have to deal with artifacts due to the 'P' in PRNG, but you're probably not going to run into problems simply due to a low seed space (again, you're looking at quintillions of unique possibilities.) On the flip side, if you're working with large iteration count, then, yeah, your low seed space might be a deal-breaker.
If you're making a multiplayer card game, particularly if there's money on the line? Then you're going to need to do some googling on how the online poker sites handled the same problem you're asking about. Because while the low seed space issue isn't noticeable to the average player, it is exploitable if it's worth the time investment. (The poker sites all went through a phase where their PRNGs were 'hacked', letting someone see the hole cards of all the other players, simply by deducing the seed from exposed cards.) If this is the situation you're in, don't simply find a better PRNG - you'll need to treat it as seriously as a Crypto problem.
Short solution which is essentially the same of dasblinkenlight:
// Java 7
SecureRandom random = new SecureRandom();
// Java 8
SecureRandom random = SecureRandom.getInstanceStrong();
Collections.shuffle(deck, random);
You don't need to worry about the internal state. Long explanation why:
When you create a SecureRandom instance this way, it accesses an OS specific
true random number generator. This is either an entropy pool where values are
accessed which contain random bits (e.g. for a nanosecond timer the nanosecond
precision is essentially random) or an internal hardware number generator.
This input (!) which may still contain spurious traces are fed into a
cryptographically strong hash which removes those traces. That is the reason those CSPRNGs are used, not for creating those numbers themselves! The SecureRandom has a counter which traces how many bits were used (getBytes(), getLong() etc.) and refills the SecureRandom with entropy bits when necessary.
In short: Simply forget objections and use SecureRandom as true random number generator.
If you consider the number as just an array of bits (or bytes) then maybe you could use the (Secure)Random.nextBytes solutions suggested in this Stack Overflow question, and then map the array into a new BigInteger(byte[]).
A very simple algorithm is to apply SHA-256 to a sequence of integers incrementing from 0 upwards. (A salt can be appended if desired to "get a different sequence".) If we assume that the output of SHA-256 is "as good as" uniformly distributed integers between 0 and 2256 - 1 then we have enough entropy for the task.
To get a permutation from the output of SHA256 (when expressed as an integer) one simply needs to reduce it modulo 52, 51, 50... as in this pseudocode:
deck = [0..52]
shuffled = []
r = SHA256(i)
while deck.size > 0:
pick = r % deck.size
r = floor(r / deck.size)
shuffled.append(deck[pick])
delete deck[pick]
My Empirical research results are Java.Random is not totally truly random. If you try yourself by using Random class "nextGaussian()"-method and generate enough big sample population for numbers between -1 and 1, the graph is normal distbruted field know as Gaussian Model.
Finnish goverment owned gambling-bookmarker have a once per day whole year around every day drawn lottery-game where winning table shows that the Bookmarker gives winnings in normal distrbuted way. My Java Simulation with 5 million draws shows me that with nextInt() -methdod used number draw, winnings are normally distributed same kind of like the my Bookmarker deals the winnings in each draw.
My best picks are avoiding numbers 3 and 7 in each of ending ones and that's true that they are rarely in winning results. Couple of times won five out of five picks by avoiding 3 and 7 numbers in ones column in Integer between 1-70 (Keno).
Finnish Lottery drawn once per week Saturday evenings If you play System with 12 numbers out of 39, perhaps you get 5 or 6 right picks in your coupon by avoiding 3 and 7 values.
Finnish Lottery have numbers 1-40 to choose and it takes 4 coupon to cover all the nnumbers with 12 number system. The total cost is 240 euros and in long term it's too expensive for the regural gambler to play without going broke. Even if you share coupons to other customers available to buy still you have to be quite a lucky if you want to make profit.

Can a pseudorandom generator ever be true random?

I can understand how using a seed for a pseudorandom gen such as the time does not make it truly random; but when a pseudorandom generator gets its seed from a hardware random number generator, doesn't the pseudorandom generator then become True Random, as its seed is gathered from a TRNG?
First of all, realize that individual numbers are not random or non-random: only large sets of numbers.
If you seed a PRNG from a truly random source, and then just keep calling the PRNG to get more numbers, then you will just have a pseudorandom sequence of numbers, albeit well seeded.
If you seed a PRNG with a truly random source and then fetch only one value from the PRNG, then you have a hash of a truly random number. If the PRNG's seed hashing function is good, this will be just as random as its input. If it's not, it might be more predictable (for example, a PRNG with only 64 bits of internal state will only produce 2^64 different values, regardless of how many bits you seed it with).
That's not to say that it's a bad idea--game simulations and Monte Carlo systems should use a fast PRNG seeded from a TRNG source to get the best compromise of speed and quality. But cryptographic applications need cryptographically secure random values, and that's trickier.
No
Good seeds are necessity, but they won't change the nature (and flaws) of the PRNG.
For example, even with good absolutely true random seed RNG such as LCG will still experience correlated sampling at high dimensions

"Resetting" pseudo-random number generator seed multiple times?

Today, my friend had a thought that setting the seed of a pseudo-random number generator multiple times using the pseudo-random number generated to "make things more randomized".
An example in C#:
// Initiate one with a time-based seed
Random rand = new Random(milliseconds_since_unix_epoch());
// Then loop for a_number_of_times...
for (int i = 0; i < a_number_of_times; i++)
{
// ... to initiate with the next random number generated
rand = new Random(rand.Next());
}
// So is `rand` now really random?
assert(rand.Next() is really_random);
But I was thinking that this could probably increase the chance of getting a repeated seed being used for the pseudo-random number generator.
Will this
make things more randomized,
making it loop through a certain number of seeds used, or
does nothing to the randomness (i.e. neither increase nor decrease)?
Could any expert in pseudo-random number generators give some detailed explanations so that I can convince my friend? I would be happy to see answers explaining further detail in some pseudo-random number generator algorithm.
There are three basic levels of use for pseudorandom numbers. Each level subsumes the one below it.
Unexpected numbers with no particular correlation guarantees. Generators at this level typically have some hidden correlations that might matter to you, or might not.
Statistically-independent number with known non-correlation. These are generally required for numerical simulations.
Cryptographically secure numbers that cannot be guessed. These are always required when security is at issue.
Each of these is deterministic. A random number generator is an algorithm that has some internal state. Applying the algorithm once yields a new internal state and an output number. Seeding the generator means setting up an internal state; it's not always the case that the seed interface allows setting up every possible internal state. As a good rule of thumb, always assume that the default library random() routine operates at only the weakest level, level 1.
To answer your specific question, the algorithm in the question (1) cannot increase the randomness and (2) might decrease it. The expectation of randomness, thus, is strictly lower than seeding it once at the beginning. The reason comes from the possible existence of short iterative cycles. An iterative cycle for a function F is a pair of integers n and k where F^(n) (k) = k, where the exponent is the number of times F is applied. For example, F^(3) (x) = F(F(F(x))). If there's a short iterative cycle, the random numbers will repeat more often than they would otherwise. In the code presented, the iteration function is to seed the generator and then take the first output.
To answer a question you didn't quite ask, but which is relevant to getting an understanding of this, seeding with a millisecond counter makes your generator fail the test of level 3, unguessability. That's because the number of possible milliseconds is cryptographically small, which is a number known to be subject to exhaustive search. As of this writing, 2^50 should be considered cryptographically small. (For what counts as cryptographically large in any year, please find a reputable expert.) Now the number of milliseconds in a century is approximately 2^(41.5), so don't rely on that form of seeding for security purposes.
Your example won't increase the randomness because there is no increase in entropy. It is simply derived from the execution time of the program.
Instead of using something based of the current time, computers maintain an entropy pool, and build it up with data that is statistically random (or at least, unguessable). For example, the timing delay between network packets, or key-strokes, or hard-drive read times.
You should tap into that entropy pool if you want good random numbers. These are known as Cryptographically secure pseudorandom number generators.
In C#, see the Cryptography.RandomNumberGenerator Class for the right way to get a secure random number.
This will not make things more "random".
Our seed determines the random looking but completely determined sequence of numbers that rand.next() gives us.
Instead of making things more random, your code defines a mapping from your initial seed to some final seed, and, given the same initial seed, you will always end up with the same final seed.
Try playing with this code and you will see what I mean (also, here is a link to a version you can run in your browser):
int my_seed = 100; // change my seed to whatever you want
Random rand = new Random(my_seed);
for (int i = 0; i < a_number_of_times; i++)
{
rand = new Random(rand.Next());
}
// does this print the same number every run if we don't change the starting seed?
Console.WriteLine(rand.Next()); // yes, it does
The Random object with this final seed is just like any other Random object. It just took you more time then necessary to create it.

I'm looking for a good psuedo random number generator, that takes two inputs instead of one

I'm looking for a determenistic psuedo random generator that takes two inputs and always returns the same output. I'm looking for things like uniform distribution, unpredictable as possible, and doesn't repeat for a long long time. Ideally the function doesn't rely on previous values. The reason that is a problem is I'm generating terrain data for an extremely large procedurely generated world and can't afford to store previous values.
Any help is appreciated.
i think what you're looking for is perlin noise - it's a way of generating "random" values in 2d (typically) that look like terrain / clouds / etc.
note that this doesn't have much to do with cryptography etc, but a "real" random number source is probably not what you want for synthetic terrain (it looks too noisy/spikey).
there's a good article on perlin noise here.
the implementation of perlin noise does use a source of random numbers, but typically you can use whatever is present on your system (starting with a known seed if you want to reproduce it later).
Is the problem deciding on a PRNG algorithm to use or an algorithm that accepts 2 inputs?
If it's the former, why not use the built in random class - such as Random class in .NET - since it strives for uniform distribution and long cycles. Also, given the same seed it will generate the same sequence of numbers.
If it's the latter, what you can do is map the 2 inputs to a single ouput and use that as a seed to your random algorithm. You can define a simple hash function that takes a string and calculates an integer from it:
s[0] + s[1]^1 + s[2]^2 + ... s[n]^n = seed
Combination of two inputs (by concatenating each other, provided the inputs are binary integers) into one seed will do, for a PRNG, such as Mersenne Twister.

How exactly does PC/Mac generates random numbers for either 0 or 1?

This question is NOT about how to use any language to generate a random number between any interval. It is about generating either 0 or 1.
I understand that many random generator algorithm manipulate the very basic random(0 or 1) function and take seed from users and use an algorithm to generate various random numbers as needed.
The question is that how the CPU generate either 0 or 1? If I throw a coin, I can generate head or tailer. That's because I physically throw a coin and let the nature decide. But how does CPU do it? There must be an action that the CPU does (like throwing a coin) to get either 0 or 1 randomly, right?
Could anyone tell me about it?
Thanks
(This has several facets and thus several algorithms. Keep in mind that there are many different forms of randomness used for different purposes, but I understand your question in the way that you are interested in actual randomness used for cryptography.)
The fundamental problem here is that computers are (mostly) deterministic machines. Given the same input in the same state they always yield the same result. However, there are a few ways of actually gathering entropy:
User input. Since users bring outside input into the system you can take that to derive some bits from that. Similar to how you could use radioactive decay or line noise.
Network activity. Again, an outside source of stuff.
Generally interrupts (which kinda include the first two).
As alluded to in the first item, noise from peripherals, such as audio input or a webcam can be used.
There is dedicated hardware that can generate a few hundred MiB of randomness per second. Usually they give you random numbers directly instead of their internal entropy, though.
How exactly you derive bits from that is up to you but you could use time between events, or actual content from the events, etc. – generally eliminating bias from entropy sources isn't easy or trivial and a lot of thought and algorithmic work goes into that (in the case of the aforementioned special hardware this is all done in hardware and the code using it doesn't need to care about it).
Once you have a pool of actually random bits you can just use them as random numbers (/dev/random on Linux does that). But this has downsides, since there is usually little actual entropy and possibly a higher demand for random numbers. So you can invent algorithms to “stretch” that initial randomness in a manner that makes it still impossible or at least very difficult to predict anything about following numbers (/dev/urandom on Linux or both /dev/random and /dev/urandom on FreeBSD do that). Fortuna and Yarrow are so-called cryptographically secure pseudo-random number generators and designed with that in mind. You still have a very good guarantee about the quality of random numbers you generate, but have many more before your entropy pool runs out.
In any case, the CPU itself cannot give you a random 0 or 1. There's a lot more involved and this usually includes the complete computer system or special hardware built for that purpose.
There is also a second class of computational randomness: Plain vanilla pseudo-random number generators (PRNGs). What I said earlier about determinism – this is the embodiment of it. Given the same so-called seed a PRNG will yield the exact same sequence of numbers every time¹. While this sounds idiotic it has practical benefits.
Suppose you run a simulation involving lots of random numbers, maybe to simulate interaction between molecules or atoms that involve certain probabilities and unpredictable behaviour. In science you want results anyone can independently verify, given the same setup and procedure (or, with computing, the same algorithms). If you used actual randomness the only option you have would be to save every single random number used to make sure others can replicate the results independently.
But with a PRNG all you need to save is the seed and remember what algorithm you used. Others can then get the exact same sequence of pseudo-random numbers independently. Very nice property to have :-)
Footnotes
¹ This even includes the CSPRNGs mentioned above, but they are designed to be used in a special way that includes regular re-seeding with entropy to overcome that problem.
A CPU can only generate a uniform random number, U(0,1), which happens to range from 0 to 1. So mathematically, it would be defined as a random variable U in the range [0,1]. Examples of random draws of a U(0,1) random number in the range 0 to 1 would be 0.28100002, 0.34522, 0.7921, etc. The probability of any value between 0 and 1 is equal, i.e., they are equiprobable.
You can generate binary random variates that are either 0 or 1 by setting a random draw of U(0,1) to a 0 if U(0,1)<=0.5 and 1 if U(0,1)>0.5, since in theory there will be an equal number of random draws of U(0,1) below 0.5 and above 0.5.

Resources