Characteristics of the Mersenne Twister - 19937 - random

I have a fairly simple question:
When we take the Mersenne Twister 19937 generator and fix the seed, every call to the generator produces a sequence of numbers with certain characteristics (independence, uniform distribution). What matters here is the independence (or low correlation between two consecutive calls).
Now, what happens if I have two instances of Mersenne Twister 19937 with different (but fixed) seeds and I call each generator once? What is the independence or correlation structure of the two sets of random numbers I get in this case?
Many thanks

The "guarantee" isn't there anymore. It's quite possible to have a random generator that produces exactly the same values for two different seeds.
This isn't an issue unless you depend on some behaviour of the randomness. The main case is of course cryptography: cryptographic random number generators try very hard to be very random even when you, e.g., run 10 generators in parallel. However, that kind of defeats the purpose of repeatability (e.g. procedural generation).
However, the two generators do keep their guarantees independently. This means that as long as they don't "interact" (e.g. two zones in a game, each with its own generator), the randomness will be preserved.
A good rule of thumb is to test it (unless the randomness is critical, then it's math all the way :)). Plot graphs. Find out in the real world :)
EDIT: Since you've added the specific algorithm, let me expand the answer a bit. The Mersenne Twister is quite random. However, the randomness depends very much on the initial value. For some seeds, it can produce very random values even with a few parallel generators. For other seeds, the results are very close to each other. As Wikipedia points out:
A consequence of this is that two instances of the generator, started with initial states that are almost the same, will output nearly the same sequence for many iterations before eventually diverging.
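The "test it" advice above can be sketched roughly as follows. This is only an illustration, assuming CPython, whose `random.Random` is documented as a Mersenne Twister; the helper names `pearson` and `stream` are made up for this example. Note that, as far as I know, CPython expands an integer seed into the full state before use, so seeds like 1 and 2 do not reproduce the "almost the same initial state" case from the quote.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def stream(seed, n=10_000):
    """Draw n uniform floats from its own Mersenne Twister instance."""
    rng = random.Random(seed)  # separate generator object per seed
    return [rng.random() for _ in range(n)]

cross = pearson(stream(1), stream(2))  # two different fixed seeds
same = pearson(stream(1), stream(1))   # identical seeds, for contrast
print(f"different seeds: {cross:+.4f}   same seed: {same:+.4f}")
```

A correlation near zero for one pair of seeds is evidence, not proof; it says nothing about other seed pairs or about higher-order dependence.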

Related

Random number understanding [duplicate]

I can't understand how a computer can make random numbers.
I mean, what piece of hardware can do this? And does the computer have only one source for this that all programming languages use?
Thanks in advance.
The short answer is that computers can't easily make truly random numbers. There are a couple ways to generate random numbers, though, some fast but not random, and some slow but true...
Pseudo-Random Generators
Most low-level languages (namely C) have built-in functionality for generating pseudo-random numbers, but this is not true random number generation. It works by starting with a "seed" value, an initial string of numbers, and then modifying this seed, over and over again, to create a "random" string.
They fall short in that, with the right seed and factors, conditions can be created to force a certain number to be generated. Also, due to the nature of the generation, when graphed, the results will not be evenly distributed. As mentioned by the above answerer, there are things a programmer can do to make it more random, but the method can not be truly random, for the above reasons. An example is the random number generator in most programming languages. It is hard-coded, and is performed in the CPU.
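To make the "start with a seed and keep modifying it" idea concrete, here is a minimal sketch of a linear congruential generator. The constants are the commonly quoted "quick and dirty" ones from Numerical Recipes and are my choice for illustration; real library generators differ, and the helper names `lcg` and `take` are made up.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream from the update x -> (a*x + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m  # the seed is modified over and over
        yield state

def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]

first = take(lcg(42), 5)
again = take(lcg(42), 5)  # same seed: exactly the same "random" numbers
other = take(lcg(43), 5)  # different seed: a different sequence
print(first == again, first == other)
```

The same seed always reproduces the same sequence, which is exactly why this is pseudo-random rather than random.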
Entropy Generators
Entropy-based generators work by measuring a type of entropy (disorder, or, as I have heard it defined, chaos; @duffymo has informed me that chaos is not a good synonym. Sorry!) that is presumed to be random. Atmospheric and thermal noise are commonly measured. They are generally considered "better" than the above choice, as they are, for the most part, closer to true randomness. One issue is that they are slow: numbers cannot be generated unless enough entropy has been harvested. An example is random.org, an atmospheric noise entropical random number generator (say that 10 times fast!). The generation is performed by whatever piece of hardware measures the entropy.
Quantum Generators
A subset of entropy generators, quantum generators measure quantum factors (factors not used in classical physics), such as the spin of particles to determine a number. A downside is that true quantum generators are expensive. An example is this piece of hardware which uses the path of a photon to determine a number.
Hope this helps!
It can be hardware, but most languages like Java and C# use a software construct best explained by Donald Knuth in his opus "The Art of Computer Programming": the linear congruential generator.
As you can imagine, there are problems with these approaches.
There are attempts to improve it (e.g. Mersenne Twister).
There are extensive statistical tests to assess a given random number generation algorithm called the Diehard Tests. (I always picture big vehicles in a snowstorm being cranked in the cold by honking batteries when I hear about those tests.)
I'd be willing to bet that the period on these pseudo random number generators is more than adequate for your applications.
The best way to generate a truly random number is to use a quantum process from nature in hardware.

Is there a somewhat-reliable way to detect that a list of integers came from a common PRNG?

Basically I'm looking for a detective function. I pass it a list of integers (probably between 20 and 100 integers) and it tells me "Yeah, 84% chance this came from a PRNG, I tested it against the main ones that most modern programming languages use", or "No, only 12% chance this came from a well-known PRNG".
If it helps (or hinders), the integers will always be between 1 and 999.
Does this exist?
Unless you are prepared to break new ground in number theory, you would only be able to detect obsolete, badly designed, or poorly seeded PRNGs. Good PRNGs are explicitly designed to prevent what you are trying to do. Random number generation is a critical part of digital cryptography, so a lot of effort goes into producing random numbers that meet all known tests.
There are batteries of tests to profile PRNGs. See for example this NIST page.
As the comments point out, the first two sentences are overstated and are only strictly true for PRNGs that may be used in cryptography. Weaker (i.e. more predictable) PRNGs might be chosen for other domains in order to improve time or space performance.
You can write a battery of tests for a list of candidate generators, but there are a lot of generators, and some have enormous state, so adjacent values of a well-seeded generator will reveal nothing useful and you'll have to wait a long time before you get two data points with an informative relationship.
On the plus side; while the list of random number generators that you might encounter is vast, there are telltale signs that will help you identify some classes of simple generators quickly and then you can perform focussed analysis to derive the specific configuration.
Unfortunately even a simple generator like KISS shows that while the generator can be trivially broken when you know its configuration, it can hide its signature from anything that does not know its configuration, leaving you in a situation where you have to individually test for every possible configuration.
There are quality tests like dieharder and TestU01 which will consume many megabytes of data to identify any weakness in a generator; however, these can also identify weaknesses in real RNGs, so they could give a strong false positive.
To work with only 100 integers you would really need to have a list of generators in mind. For example, to detect an LCG used inappropriately, you simply test whether the bottom three bits cycle through a repeating pattern of 8 values, but this is by far the easiest case.
If you had a sequence of 625 or more 32-bit integers, you could detect with high confidence whether it was from consecutive calls to Mersenne Twister. That is because it leaks state information in the output values.
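The low-bits check can be sketched like this. It is a rough illustration only: it assumes you see the raw outputs of a power-of-two-modulus LCG (here with illustrative Numerical Recipes constants), not values already reduced to 1..999, and the function names are made up.

```python
import random

def lcg_outputs(seed, n, a=1664525, c=1013904223, m=2**32):
    """First n raw outputs of an LCG with modulus 2**32."""
    out, state = [], seed
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state)
    return out

def low_bits_cycle(values, bits=3):
    """True if the lowest `bits` bits repeat with period 2**bits throughout."""
    period = 1 << bits
    mask = period - 1
    if len(values) <= period:
        return False  # too short to observe even one repetition
    return all((values[i] & mask) == (values[i + period] & mask)
               for i in range(len(values) - period))

suspect = lcg_outputs(2024, 100)
mt = random.Random(7)
control = [mt.getrandbits(32) for _ in range(100)]  # Mersenne Twister output
print(low_bits_cycle(suspect), low_bits_cycle(control))
```

A positive result is strong evidence of this generator class; a negative result rules out very little.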
For an example of how it is done, see this blog entry.
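As a rough sketch of that state leak (not necessarily how the linked blog does it): each 32-bit Mersenne Twister output is an invertible "tempering" of one state word, so 624 consecutive outputs reveal the whole state. The code below assumes CPython's `random.Random`, full 32-bit outputs from `getrandbits(32)`, and the `(version, state, gauss)` tuple layout that `getstate()` returns in Python 3; the function names are mine.

```python
import random

def undo_right(y, shift):
    """Invert x ^ (x >> shift) on a 32-bit word by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left(y, shift, mask):
    """Invert x ^ ((x << shift) & mask) on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    """Recover the internal state word behind one MT19937 output."""
    y = undo_right(y, 18)
    y = undo_left(y, 15, 0xEFC60000)
    y = undo_left(y, 7, 0x9D2C5680)
    return undo_right(y, 11)

def clone_from_outputs(outputs):
    """Build a generator that continues the observed 32-bit output stream."""
    state = tuple(untemper(v) for v in outputs[-624:])
    clone = random.Random()
    clone.setstate((3, state + (624,), None))  # index 624: regenerate next
    return clone

victim = random.Random(12345)
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_from_outputs(observed)
predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
print(predicted == actual)
```

If the clone predicts further observed values correctly, the sequence almost certainly came from this generator; that confirmation step is what turns state recovery into detection.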
Similar results are in theory possible when you don't have ideal data such as full 32-bit integers, but you would need a longer sequence and the maths gets harder. You would also need to know - or perhaps guess by trying obvious options - how the numbers were being reduced from the larger range to the smaller one.
Similar results are possible from other PRNGs, but generally only the non-cryptographic ones.
In principle you could identify specific PRNG sequences with very high confidence, but even simple barriers such as missing numbers from the strict sequence can make it a lot harder. There will also be many PRNGs that you will not be able to reliably detect, and typically you will either have close to 100% confidence of a match (to a hackable PRNG) or 0% confidence of any match.
Whether or not a PRNG is hackable (and therefore could be detected by the numbers it emits) is not a general indicator of PRNG quality. Obviously, "hackable" is the opposite of a requirement for "secure", so don't consider Mersenne Twister for creating unguessable codes. However, do consider it as a source of randomness for e.g. neural networks, genetic algorithms, Monte Carlo simulations and other places where you need a lot of statistically random-looking data.

How would one know if one saw a random number generator?

I have been reading various articles about random numbers and their generators. There are usually 3 important conclusions that I draw from them:
Random numbers are not truly random
Much of the time they have a bias (modulo bias)
Humans are incapable of being random number generators, when they are trying to "act randomly"
So, with the latter-most of these observations in mind, how would we be able to
Tell if a sequence of numbers that we see is truly random, and more importantly
Is there some way we can prove that said sequence is really random?
I'm tempted to say that so long as you generate a sufficiently large sample set (1,000,000+), you should see a more or less uniform dispersion of (pseudo)random numbers. However, I'm sure some maths genius has a way of discrediting this, because surely by the laws of probability a run of one number is just as likely as any other sequence.
From what I have read, if you really need random numbers it's best to reuse what cryptographic libraries use. The field of cryptography is obviously complex and relies on random numbers for key generation. The section of OWASP's guide titled "Reversible Authentication Tokens" says this:
The only way to generate secure authentication tokens is to ensure
there is no way to predict their sequence. In other words: true random
numbers.
It could be argued that computers can not generate true random
numbers, but using new techniques such as reading mouse movements and
key strokes to improve entropy has significantly increased the
randomness of random number generators. It is critical that you do not
try to implement this on your own; use of existing, proven
implementations is highly desirable.
Most operating systems include functions to generate random numbers
that can be called from almost any programming language.
My take is that unless you're coding cryptographic libraries yourself, put your trust in those who are (e.g. use the Java Cryptography Extension) so you don't have to prove it yourself.
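As an illustration of leaning on an existing implementation, here is a minimal Python sketch using the standard library's `secrets` module, which draws from the operating system's random source. The answer above names the Java Cryptography Extension; Python is used here only as an analogue, and `new_session_token` is a made-up helper name.

```python
import secrets

def new_session_token(num_bytes=32):
    """Return an unpredictable token as a hex string (2 chars per byte)."""
    return secrets.token_hex(num_bytes)

token = new_session_token()
print(len(token), "hex characters")
```

Nothing here is seeded by the program, which is the point: reproducibility is exactly what you do not want for authentication tokens.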
Pretty Simple Test:
If you really want to get into testing random numbers, you could simulate a program that outputs random numbers from 1-100 100 times as an example.
Then look at those numbers and see if there are any patterns. Then follow that test by restarting the program several times and repeating the process.
Examine all data to figure out if random numbers are always random, just random during individual tests, or never. :P
Testing a random number generator is probably mostly up to what you want to look for. Even pure non-repeatability is no guarantee of randomness.
There are some companies that will test a random number generator for the purposes of certification (e.g. online casinos). One that I found quickly is called iTech Labs, though their testing methodology page leaves a lot to be desired in terms of technical detail.
Other testers and certification bodies publish the required data for a certification; there's more specific detail here but not as much as you want.
You could potentially do a statistical analysis and compare the results of your random number generator to a "true" random source but the argument could be made for bias from trying to translate the true random source into your possibility space anyway.
Randomness tests verify the mathematical properties of the sequence. For example entry frequencies (all symbols are expected to have the same frequency), local variance, sequence analysis (the probability of a symbol must not depend on the previous ones).
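The frequency check mentioned above can be sketched as a chi-square statistic against a uniform expectation. This is a minimal sketch assuming the symbols are integers in `range(num_bins)`; the function name is made up, and a real test suite would also turn the statistic into a p-value.

```python
import random

def chi_square_uniform(values, num_bins):
    """Chi-square statistic of observed symbol counts vs. a uniform expectation."""
    counts = [0] * num_bins
    for v in values:
        counts[v] += 1  # each value must be an int in range(num_bins)
    expected = len(values) / num_bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(2024)
sample = [rng.randrange(10) for _ in range(10_000)]
stat = chi_square_uniform(sample, 10)
print(f"chi-square statistic with 9 degrees of freedom: {stat:.2f}")
```

With 10 bins there are 9 degrees of freedom, and the usual 5% critical value is about 16.92. A truly random source exceeds it roughly one time in twenty, so a single failure proves nothing.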
A definite proof does not exist, but there is a quality factor - the probability of a sequence to really be random.
Another criterion could be based on compressibility: true randomness has maximum entropy and can not therefore be compressed.
This test is not reliable for randomness, of course, but allows quick and dirty testing with ready tools such as zlib.
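A quick and dirty version of the compressibility check might look like the sketch below. The helper name `compression_ratio` and the test inputs are made up for illustration. Note that a good PRNG also produces incompressible output, so this only catches obvious structure.

```python
import random
import zlib

def compression_ratio(data):
    """Compressed size divided by original size (about 1.0 = incompressible)."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(1)
noisy = bytes(rng.getrandbits(8) for _ in range(100_000))  # PRNG bytes
patterned = b"0123456789" * 10_000                         # obvious structure
print(f"noisy: {compression_ratio(noisy):.3f}  "
      f"patterned: {compression_ratio(patterned):.3f}")
```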

Generating "too perfect" random numbers

A good RNG ought to pass several statistical tests of randomness. For example, uniform real values in the range 0 to 1 can be binned into a histogram with roughly equal counts in each bin, give or take some due to statistical fluctuations. These counts obey some distribution, I don't recall offhand if it's Poisson or binomial or what, but in any case these distributions have tails. Same idea applies to tests for correlations, subtle periodicities etc.
A high quality RNG will occasionally fail a statistical test. It is good advice to be suspicious of RNGs that look too perfect.
Well, I'm crazy and would like to generate (reproducibly) "too perfect" random numbers, ones suspiciously lacking in those random fluctuations in statistical measures. Histograms come out too flat, variances of moving-box averages come out too small, correlations suspiciously close to zero, etc. Looking for RNGs that pass all statistical tests too cleanly. What known RNGs are like this? Is there published research on this idea?
One unacceptable answer: some of the poorer linear congruential counter generators have too flat a distribution, but totally flunk most tests of randomness.
Related to this is the generation of random number streams with a known calibrated amount of imperfection. A lump in the distribution is easy - just generate a nonuniform distribution approximating the idea (e.g see Generating non-uniform random numbers) but what about introducing calibrated amounts of higher order correlations while maintaining a correct, or too perfect, distribution?
Apparently the Mersenne Twister, a commonly used random number generator, fails the DieHarder tests by being "too random". In other words, certain tests consistently come too close to their expected value under true randomness.
You can't. If it is flat in one test this will mean failure in another one, since the flatness shows it is not random.
You could try something like:
import random
numbers = [1, 2, 3, 4, 5, 6] * 100
random.shuffle(numbers)
to get a random sequence with a perfect uniform distribution.
I think what you're looking for may be a quasi-random sequence. A quasi-random sequence jitters around but in a self-avoiding way, not clumping as much as a random sequence. When you look at how many points fall in different bins, the distribution will work out "too well" compared to a random sequence.
Also, this article may be relevant: When people ask for a random sequence, they’re often disappointed with what they get.
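To illustrate the quasi-random suggestion, here is a small sketch using the base-2 van der Corput sequence, one of the simplest low-discrepancy sequences. The choice of sequence and the helper names are mine. Its histogram comes out exactly flat where a pseudorandom one fluctuates.

```python
import random

def van_der_corput(i, base=2):
    """i-th point of the van der Corput sequence: digits of i mirrored about the radix point."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def bin_counts(points, num_bins):
    """Histogram of points in [0, 1) using equal-width bins."""
    counts = [0] * num_bins
    for p in points:
        counts[int(p * num_bins)] += 1
    return counts

quasi = bin_counts([van_der_corput(i) for i in range(1024)], 16)
rng = random.Random(0)
pseudo = bin_counts([rng.random() for _ in range(1024)], 16)
print("quasi :", quasi)
print("pseudo:", pseudo)
```

The catch, as other answers note, is that such a sequence fails any test that looks at the order of the points rather than just their distribution.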
If you wish to generate a set of random numbers while tied to a set a correlation, you may want to investigate the Cholesky decomposition. I suspect from there you would just need a simple transformation to generate your "too perfect" random numbers.
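For the two-variable case the Cholesky idea reduces to a couple of lines: the factor of the correlation matrix [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]]. The sketch below assumes standard normal inputs and uses made-up helper names; larger correlation matrices would normally go through a linear algebra library.

```python
import math
import random

def correlated_normals(rho, n, seed=0):
    """Generate n pairs of standard normals with target correlation rho."""
    rng = random.Random(seed)
    l21, l22 = rho, math.sqrt(1.0 - rho * rho)  # second row of the Cholesky factor
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent inputs
        pairs.append((z1, l21 * z1 + l22 * z2))
    return pairs

def sample_correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)

pairs = correlated_normals(0.6, 20_000)
print(f"target 0.6, measured {sample_correlation(pairs):.3f}")
```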
By definition, a PRNG (pseudorandom number generator) cannot generate truly random numbers. No matter what trick you use to generate your pseudorandom sequence, there exists a test that will expose the trick, by showing the actual nonrandomness.
The folks at the National Institute of Standards and Technology Computer Security Division have an abiding interest in RNGs and being able to measure the degree of randomness. I found this while looking for the old DIEHARD suite of PRNG tests.
The folks at the National Security Agency have an abiding interest in RNGs also, but they aren't going to tell you much.

How different do random seeds need to be?

Consider code like this (Python):
import random

for i in [1, 2, 3, 4]:
    random.seed(i)
    # initialize a list with 100 random numbers
    randNumbers = [random.random() for _ in range(100)]
    doStuff(randNumbers)
I want to make sure that randNumbers differ significantly from one call to another. Do I need to make sure the seed numbers differ significantly between the subsequent calls, or is it sufficient that the seeds are different (no matter how)?
To the pedants: please realize the above code is super-over-simplified
Short answer: Avoid the re-seeding, as it doesn't buy you anything here. Long answer below.
That all depends on what exactly you need. Common defects in initialization of pseudorandom number generators outlines that linearly dependent seeds (which 1, 2, 3, 4 definitely are) are a bad choice for initializing multiple PRNGs, at least when they are used for simulation and uncorrelated results are desired.
If all you do is roll a few dice, or generate some pseudo-random input for something uncritical, then it very likely doesn't matter.
Note also that using some classes of PRNG to generate the seeds has the same problem of producing linearly dependent numbers (LCGs spring to mind).
If your random number generator is high quality, it shouldn't matter how you seed it. In fact, the best practice would be to seed it only once. Random number generators are designed to have certain statistical behavior once they're started. Frequently reseeding effectively creates a different random number generator, one that may not be as good.
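A minimal sketch of the "seed only once" approach, applied to the question's four batches of 100 numbers. The function name and the idea of returning all batches at once are my own framing, not the asker's code.

```python
import random

def batches_from_one_seed(seed, num_batches=4, size=100):
    """Seed a single generator once and draw consecutive batches from it."""
    rng = random.Random(seed)  # the only seeding in the whole run
    return [[rng.random() for _ in range(size)] for _ in range(num_batches)]

run_a = batches_from_one_seed(1)
run_b = batches_from_one_seed(1)
print(run_a == run_b)  # still fully reproducible from the one seed
```

Each batch is a different stretch of the same stream, so the batches differ from one another while the run as a whole stays repeatable.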
Randomly selecting seeds sounds like a good idea, but it isn't. In fact, because of the "birthday paradox," there's a surprisingly high probability that you'll pick the same seed twice.
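The birthday-paradox claim is easy to check numerically. The sketch below computes the exact chance that at least two of `num_draws` uniformly chosen seeds coincide; the function name is made up, and the 32-bit seed space is just an example size.

```python
def collision_probability(num_draws, num_values):
    """Probability that num_draws uniform picks from num_values contain a repeat."""
    p_distinct = 1.0
    for k in range(num_draws):
        p_distinct *= (num_values - k) / num_values  # k-th pick avoids earlier ones
    return 1.0 - p_distinct

print(f"23 people, 365 days:        {collision_probability(23, 365):.3f}")
print(f"77163 seeds, 32-bit space:  {collision_probability(77163, 2**32):.3f}")
```

In other words, roughly 77,000 randomly chosen 32-bit seeds already give about an even chance of a duplicate, even though the seed space has over four billion values.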
Generally speaking, you only seed your random number generator when you need the random numbers to be generated in identical fashion each time through. This is useful when you have a random component to your processing, but need to test it and therefore want it to be consistent between tests. Otherwise, you let the system seed the generator itself.
In other words, by seeding the random number generator with specific pre-defined seeds, you are actually reducing the randomness of the system as a whole. The random numbers generated with a seed of 1 are indeed pseudo-randomly different from those generated with a seed of 2, but a hard-coded seed will result in the same random sequences in each run of the program.
You seem to want pseudo-random numbers that aren't pseudo-random, with a higher probability of consecutive numbers being 'significantly' different than pseudo-randomness requires. I doubt that any common PRNG will do this, whatever your seeding strategy.
The seeds themselves should be random so that the output is unpredictable. There can be problems if the seeds differ only in one or two bits (as this question demonstrates).
It depends upon the application for which you're using the PRNG. If you're using something that needs to be cryptographically sound, then the seeds generally need to be extremely difficult to deduce based on the output, different every time the application runs, difficult to simply guess, and impossible to determine by reverse engineering the application (i.e. they can't be hard coded).
If your goal is a game, your requirements may be different. For example, if you're controlling computer strategy, but the computer's strategy remains the same for all runs of the game, you may have an easily beatable game. Then again, you may want that for "easy" mode.
