It's not clear from the documentation whether the sequence of integers produced by these PRNGs is uniformly distributed.
Also, it looks like there is a whole family of RNG algorithms with similar names. Personally I'm mostly interested in xoshiro256**, but I can't find any information about its distribution uniformity either.
We're all familiar with not really random, human-chosen variables such as the disproportionate appearance of 37 when humans are asked to choose a number between 1 and 100, and for other cases (disproportionate selection of a particular one of four quadrants of a 2x2 grid, etc). I'm sure I'd once read that these are called "[something] Variables."
Can anyone please provide me with the term for these variables? Many thanks!
Random behavior is described by probability distributions. The Uniform(0,1) distribution is particularly important in computing because we have a variety of techniques to transform U(0,1)'s into other distributions. It's also easy to transform independent observations into non-independence via distributional conditioning. I don't know of any general solution to go the other way. Bottom line is that it's much easier to deal with independent uniforms.
Because of those two observations, the gold standard for Pseudo-Random Number Generators (PRNGs) is to produce uniformly distributed values which appear to be independent. (I say "appear" because if each value follows its predecessor based on deterministic calculations, clearly they must be dependent in some fashion.) Humans do a terrible job in both regards. To actually answer your question, values chosen by humans can be described as "non-uniform" and "not independent."
Some people might be tempted to say "uncorrelated" rather than "not independent," but they're not the same thing. While independence always produces zero correlation, there are counter-examples where dependent random variables can have zero correlation. In fact, most PRNGs are pretty good at producing uncorrelated values.
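As a small illustration of the earlier point about transforming U(0,1) draws into other distributions, here is a minimal inverse-CDF sketch in Python (the exponential distribution and its rate are just an example, not anything specific to a particular PRNG):

import math
import random

def exponential_from_uniform(rate=1.0):
    # Inverse-CDF transform: if U ~ Uniform(0,1), then -ln(1 - U) / rate ~ Exponential(rate).
    u = random.random()
    return -math.log(1.0 - u) / rate

samples = [exponential_from_uniform(rate=2.0) for _ in range(10)]

The same pattern works for any distribution whose inverse CDF you can evaluate, which is why uniform output is such a convenient baseline for a PRNG.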
I'm in the process of evaluating some PRNGs, both in terms of speed and quality. One aspect of quality I want to test is multidimensional distribution and bias.
I know of TestU01's batteries, and I plan on using them (and, perhaps, others that the NIST suggests).
But what about testing multidimensional bias? Boost's PRNGs have some comments, and the Mersenne Twister is known to be uniform in several hundred dimensions, while the Hellekalek PRNG has good uniform distribution in "several" dimensions (however many that means...).
I imagine the runtime complexity of a battery testing for multidimensional bias would increase with each dimension, so it's possible there isn't a suitable battery for this test. However, I haven't confirmed that suspicion.
Is there a known way to test PRNGs for multidimensional bias? I'd even be okay if the test is limited to 2, 3, or 4 dimensions; that would be better than no test at all.
TestU01 is good. PractRand is arguably better (full disclosure: I wrote PractRand). For some categories of PRNGs, RaBiGeTe is also decent. There are other options which are not good (NIST STS, Diehard, and Dieharder are well known but ineffective).
Any good test suite will test a wide variety of numbers of "dimensions", though fundamentally it is easier to do comprehensive testing for shorter range correlations, so a better job is done on smaller numbers of dimensions.
Generally, anything that passes the TestU01 BigCrush battery and/or one terabyte of PractRand standard battery is likely to be fine for real-world non-cryptographic usage. This kind of testing cannot identify some categories of problems however, particularly inter-seed correlation issues.
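To give a rough feel for what a low-dimensional equidistribution check does, here is a toy two-dimensional serial test in Python; it is nowhere near a substitute for TestU01 or PractRand, and the function name and parameters are purely illustrative:

import random
from collections import Counter

def serial_test_2d(n_pairs=100_000, bins=10, rng=random.random):
    # Bin consecutive (u1, u2) pairs into a bins x bins grid and compute the
    # chi-square statistic against the uniform expectation for each cell.
    counts = Counter()
    for _ in range(n_pairs):
        u1, u2 = rng(), rng()
        counts[(int(u1 * bins), int(u2 * bins))] += 1
    expected = n_pairs / (bins * bins)
    chi_sq = sum((counts[(i, j)] - expected) ** 2 / expected
                 for i in range(bins) for j in range(bins))
    # Compare against a chi-square distribution with bins*bins - 1 degrees of
    # freedom; a wildly inflated value suggests pairwise (2-D) bias.
    return chi_sq

print(serial_test_2d())

Extending this to 3 or 4 dimensions just means binning triples or quadruples, which is also why the cost of this style of testing grows quickly with the number of dimensions.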
I have found automatic differentiation to be extremely useful when writing mathematical software. I now have to work with random variables and functions of the random variables, and it seems to me that an approach similar to automatic differentiation could be used for this, too.
The idea is to start with a basic random vector with a given multivariate distribution and then work with the implied probability distributions of functions of the components of that random vector. You would define operators that automatically combine two probability distributions appropriately when you add, multiply, or divide two random variables, and that transform a distribution appropriately when you apply a scalar function such as exponentiation. You could then combine these to build any function you need of the original random variables and automatically have the corresponding probability distribution available.
Does this sound feasible? If not, why not? If so, and since it's not a particularly original thought, could someone point me to an existing implementation, preferably in C?
There has been a lot of work on probabilistic programming. One issue is that as your distribution gets more complicated you start needing more complex techniques to sample from it.
There are a number of ways this is done. Probabilistic graphical models give one vocabulary for expressing these models, and you can then sample from them using various Metropolis-Hastings-style methods. Here is a crash course.
Another approach is probabilistic programming, which can be done directly through an embedded domain-specific language; Oleg Kiselyov's HANSEI is an example of this approach. Once the program is expressed that way, the system can inspect the tree of decisions and expand it out by a form of importance sampling to gain the most information possible at each step.
You may also want to read "Nonstandard Interpretations of Probabilistic Programs for Efficient Inference" by Wingate et al., which describes one way to use extra information about the derivative of your distribution to accelerate Metropolis-Hastings-style sampling techniques. I personally use automatic differentiation to calculate those derivatives, which brings the topic back to automatic differentiation. ;)
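To make the sampling-based view concrete, here is a crude Python sketch in which each random variable is represented purely by a sampler and arithmetic composes samplers; the class and its names are illustrative, not an existing library:

import random

class RandomVariable:
    # Toy random variable represented by a sampling function.
    def __init__(self, sampler):
        self.sampler = sampler

    def sample(self):
        return self.sampler()

    def __add__(self, other):
        return RandomVariable(lambda: self.sample() + other.sample())

    def __mul__(self, other):
        return RandomVariable(lambda: self.sample() * other.sample())

x = RandomVariable(lambda: random.gauss(0.0, 1.0))
y = RandomVariable(lambda: random.expovariate(1.0))
z = x + y   # the distribution of z is only available through sampling

draws = [z.sample() for _ in range(100_000)]
# Caveat: every appearance of a variable resamples independently, so x + x does
# not behave like twice the same draw; tracking that dependence is one of the
# hard parts the probabilistic-programming approaches above have to address.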
A good RNG ought to pass several statistical tests of randomness. For example, uniform real values in the range 0 to 1 can be binned into a histogram with roughly equal counts in each bin, give or take some due to statistical fluctuations. These counts obey some distribution, I don't recall offhand if it's Poisson or binomial or what, but in any case these distributions have tails. Same idea applies to tests for correlations, subtle periodicities etc.
A high quality RNG will occasionally fail a statistical test. It is good advice to be suspicious of RNGs that look too perfect.
Well, I'm crazy and would like to generate (reproducibly) "too perfect" random numbers, ones suspiciously lacking in those random fluctuations in statistical measures. Histograms come out too flat, variances of moving-box averages come out too small, correlations suspiciously close to zero, etc. Looking for RNGs that pass all statistical tests too cleanly. What known RNGs are like this? Is there published research on this idea?
One unacceptable answer: some of the poorer linear congruential counter generators have too flat a distribution, but totally flunk most tests of randomness.
Related to this is the generation of random number streams with a known, calibrated amount of imperfection. A lump in the distribution is easy - just generate a nonuniform distribution approximating the shape you want (e.g. see Generating non-uniform random numbers) - but what about introducing calibrated amounts of higher-order correlations while maintaining a correct, or too-perfect, distribution?
Apparently the Mersenne Twister, a commonly used random number generator, fails the DieHarder tests by being "too random". In other words, certain tests consistently come too close to their expected value under true randomness.
You can't. If the output is too flat in one test, that will mean failure in another, since the flatness itself shows the sequence is not random.
You could try something like:
import random

numbers = [1, 2, 3, 4, 5, 6] * 100
random.shuffle(numbers)
to get a random sequence with a perfect uniform distribution.
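(By construction this gives exactly 100 of each value, so the histogram comes out perfectly flat, while the ordering is still a uniformly random permutation, so other statistics remain those of a genuinely shuffled sequence.)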
I think what you're looking for may be a quasi-random sequence. A quasi-random sequence jitters around but in a self-avoiding way, not clumping as much as a random sequence. When you look at how many points fall in different bins, the distribution will work out "too well" compared to a random sequence.
Also, this article may be relevant: When people ask for a random sequence, they’re often disappointed with what they get.
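For a concrete example, a Halton sequence (one common quasi-random construction) can be generated in a few lines of Python; the bases 2 and 3 below are the usual choice for two dimensions, and the helper name is just illustrative:

def halton(index, base):
    # index-th element (1-based) of the van der Corput sequence in the given base.
    result, f, i = 0.0, 1.0 / base, index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result

# Bases 2 and 3 give a low-discrepancy point set in the unit square, which
# fills histogram bins far more evenly than independent uniform draws would.
points = [(halton(i, 2), halton(i, 3)) for i in range(1, 101)]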
If you wish to generate a set of random numbers tied to a given set of correlations, you may want to investigate the Cholesky decomposition. I suspect from there you would just need a simple transformation to generate your "too perfect" random numbers.
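A minimal sketch of that Cholesky approach, assuming NumPy is available (the correlation values here are purely illustrative):

import numpy as np

rng = np.random.default_rng(12345)

# Target correlation matrix.
corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])

# Cholesky factor L satisfies L @ L.T == corr.
L = np.linalg.cholesky(corr)

# Independent standard normals, one row per variable.
z = rng.standard_normal((2, 100_000))

# Correlated samples: each column is a draw with (approximately) the target correlation.
x = L @ z
print(np.corrcoef(x))   # should be close to corr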
By definition, a PRNG (pseudorandom number generator) cannot generate truly random numbers. No matter what trick you use to generate your pseudorandom sequence, there exists a test that will expose the trick, by showing the actual nonrandomness.
The folks at the National Institute of Standards and Technology Computer Security Division have an abiding interest in RNGs and being able to measure the degree of randomness. I found this while looking for the old DIEHARD suite of PRNG tests.
The folks at the National Security Agency have an abiding interest in RNGs also, but they aren't going to tell you much.
Every language has a random() function or something similar to generate a pseudo-random number. I am wondering what happens underneath to generate these numbers? I am not programming anything that makes this knowledge necessary, just trying to satisfy my own curiosity.
The entire first chapter of Donald Knuth's seminal work Seminumerical Algorithms is taken up with the subject of random number generation. I really don't think an SO answer is going to come close to describing the issues involved. Read the book.
It turns out to be surprisingly easy to get half-way-decent pseudorandom numbers. For decades the gold standard was a remarkably simple algorithm: keep state x, multiply by a constant A (32x32 => 64 bits), then add a constant B, then return the low 32 bits, which also become the new x. If A and B are chosen carefully this actually works fairly well.
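A toy Python sketch of that multiply-and-add scheme (the constants are the well-known Numerical Recipes LCG parameters, used here purely for illustration, not as a recommendation):

class TinyLCG:
    # Toy 32-bit linear congruential generator: x <- (A*x + B) mod 2**32,
    # i.e. multiply, add, and keep only the low 32 bits as both output and new state.
    A = 1664525
    B = 1013904223

    def __init__(self, seed):
        self.x = seed & 0xFFFFFFFF

    def next_u32(self):
        self.x = (self.A * self.x + self.B) & 0xFFFFFFFF
        return self.x

gen = TinyLCG(seed=42)
values = [gen.next_u32() for _ in range(5)]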
Pseudorandom numbers need to be repeatable, too, in order to reproduce behavior during debugging. So, seeding the generator (initializing x with, say, the time-of-day) is typically avoided during debugging.
In recent years, and with more compute cycles available to burn, more sophisticated algorithms are available, some of them invented since the publication of the otherwise quite authoritative Seminumerical Algorithms. Operating systems are also starting to provide hardware- and network-derived entropy bits for specialized cryptographic purposes.
The Wikipedia page is a good reference.
The actual algorithm used is going to be dependent on the language and the implementation of the language.
random() is a so-called pseudorandom number generator (PRNG). random() is commonly implemented as a linear congruential generator, i.e. a recurrence of the form X(n+1) = (a * X(n) + c) mod m, where X(n) is the sequence of generated pseudorandom numbers. The generated sequence of numbers is easily guessable, so this algorithm can't be used as a cryptographically secure PRNG.
Wikipedia:Linear congruential generator
And take a look at the diehard tests for PRNG
PRNG Diehard Tests
To answer your question exactly: the random function is usually provided by the operating system.
But how the operating system creates these random numbers is a specialized area of computer science. See, for example, the wiki page posted in the answers above.
One thing you might want to examine is the family of random devices available on some Unix-like OSes like Linux and Mac OS X. For example, on Linux, the kernel gathers entropy from a variety of sources into a pool which it then uses to seed its pseudo-random number generator. The entropy can come from a variety of sources, the most notable being device-driver jitter from keypresses, network events, hard disk activity and (most of all) mouse movements. Aside from this, there are other techniques to gather entropy, some of them even implemented totally in hardware. There are two character devices you can get random bytes from, and on Linux they behave in the following way:
/dev/urandom gives you a constant stream of bytes which is very random but not cryptographically safe because it reuses whatever entropy is available in the pool.
/dev/random gives you cryptographically safe random numbers but it won't give you a constant stream as it uses the entropy available in the pool and then blocks while more entropy is collected.
Note that while Mac OS X uses a different method for its PRNG and therefore does not block, my personal benchmarks (done in college) have shown it to be ever-so-slightly less random than the Linux kernel's. Certainly good enough, though.
So, in my projects, when I need randomness, I typically go for reading from one of the random devices, at least for the seed for an algorithm in my program.
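In Python, for example, that seeding step might look like this (it assumes a Unix-like system that exposes /dev/urandom):

import random

# Read a few bytes of OS-gathered entropy and use them to seed an ordinary
# (non-cryptographic) PRNG for the rest of the program.
with open("/dev/urandom", "rb") as dev:
    seed = int.from_bytes(dev.read(8), "big")

rng = random.Random(seed)
print(rng.random())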
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),[1] is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.[2]
PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.
Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, and joked that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."[3]
You can check out the Wikipedia page for more here.