Most suitable pseudo-random number generators for Metropolis–Hastings MCMC

I am doing a lot of Metropolis-Hastings Markov chain Monte Carlo (MCMC).
Most of the codes I use rely on the Mersenne Twister (MT) as their pseudo-random number generator (PRNG).
However, I recently read that MT is outdated and probably shouldn't be used anymore, as it fails some statistical tests and is relatively slow. So I am willing to switch.
NumPy now defaults to PCG (https://www.pcg-random.org/), which claims to be good. Other sites are rather critical, e.g. http://pcg.di.unimi.it/pcg.php.
It seems everyone praises their own work.
There is already some good information here: Pseudo-random number generator
But many of the answers are a bit dated, and I want to formulate my question more specifically.
As I said, the main use case is Metropolis-Hastings MCMC.
Therefore, I need:
uniformly distributed numbers in half-open and open intervals
around 2^50 samples; apparently, as a rule of thumb, the PRNG should then have a period of at least 2^128
sufficient quality of random numbers (whatever this might mean)
a reasonably fast PRNG (for a fixed runtime, faster code means more accurate MCMC results)
I do not need:
cryptographic security
As I am by no means an expert, usability also counts. So I would welcome an existing C++ implementation (this seems to be the standard) that is easy enough for a novice to use.

Related

Better Random Number Testing Suites

I wanted to generate multiple sets of random numbers and test them using conventional random number testing suites. I came across the NIST test suite, the Diehard suite and a few other commonly mentioned ones.
These seem to be pretty old. Are there any newer test suites that are considered better for modern generation schemes? If not, which of these are the better choices for testing relatively small sets of pseudo-random numbers?
Thanks.
It depends a bit on what you want the random numbers for. Assuming you are doing simulations, Monte Carlo methods, or most other non-security uses, TestU01, as mentioned in another answer, is very good. So are PractRand (http://pracrand.sourceforge.net/)
and gjrand (http://gjrand.sourceforge.net/).
There are lots of caveats. None of them is really easy to use out of the box, and some C or C++ experience helps. None can prove that a PRNG is good; they can only find certain classes of faults if those exist (and lots of popular PRNGs have faults that at least one of these three suites can find).
TestU01 from Prof. P. L'Ecuyer seems to be a pretty good choice, and it is not as old as you imply: http://simul.iro.umontreal.ca/testu01/tu01.html

Pipelining the Ziggurat Random Number Generator

I'm currently implementing a version of George Marsaglia's Ziggurat random number generator. Although it is supposedly one of the fastest ways to generate good-quality normally distributed random numbers, it is full of control code (i.e. return statements in the middle of a loop, if-statements, branches, etc.) and it makes several calls to standard C functions like exp() and log(). Not to mention the infinite loop.
This makes for code that the compiler cannot pipeline. I feel that a basic approach, such as using the central limit theorem directly, might ultimately be faster, since it can be pipelined easily. Unfortunately, it is not suitable for the tails of the Gaussian distribution, so it is not acceptable for my application.
Does anybody here have ideas on how the control code and function calls might be reduced? I am currently using Colin Green's implementation of the algorithm, which I ported to C. My underlying uniform generator is the Tiny Mersenne Twister (so please don't tell me to use MT, as I've seen other people do; I'm already there. This discussion is about normally distributed RNGs, not uniform RNGs).
You might take a look at my C implementation here. The main function is only 20-something lines of code, so it should be easy to unroll the loop a bit. It also gives you the choice of using integer or float comparisons, whichever is faster on your machine. You can plug in any back-end RNG.

Random number generation / which algorithm?

We need to migrate to a better RNG or RBG (random bit generator) for generating key values, which will then be used to encrypt data.
Which algorithm would be most suitable? Should I consult the NIST documents for this?
Any pseudo-random number generator that produces uniformly distributed output with a wide range (say at least 32 bits) should be enough for creating keys. It's up to you to determine your needs and then find a matching RNG.
For more info, see http://www.random.org/randomness.
Depending on the language you choose to implement this in, I'm sure you can find source code for a pseudo-RNG on the Web, if the one built into your system isn't good enough.
Since this is a programming site, I would seriously look at the secure random number generators available in your particular runtime environment. In general, you will have to rely on system resources to generate random numbers, at least to seed the pseudo-random number generator. The only possible exception is CPU-specific random instructions, such as the ones on the latest Intel CPUs (hopefully well-tested secure RNGs will become a standard feature of CPUs).
In many programming environments there is very little choice but to use OpenSSL or /dev/random for seeding. In general, it is hard to find useful information about the random number generator. Sometimes the RNG is really not suitable at all (e.g. the native PHP version).
If possible, try to find something that conforms to NIST requirements.

Is Mersenne Twister a good binary RNG?

I'm trying to find an RNG to generate a stream of pseudorandom bits. I have found that the Mersenne Twister (MT19937) is a widely used RNG that generates good 32-bit unsigned integers, and that implementations have been written to generate apparently good double-precision floats (by generating a 53-bit integer). But I can't seem to find any references on whether it is well-behaved at the level of individual bits.
Marsaglia expressed some concerns about the randomness of the Mersenne Twister that make me hesitant to use it.
Does anybody know whether the Mersenne Twister has a significant bias when used to generate pseudorandom bits? If so, does anyone know a good pseudorandom bit generator?
All pseudorandom generators strive to produce a high degree of unpredictability per bit. There is currently no way to predict a bit of Mersenne Twister output substantially better than random chance until you have observed 624 values.
All questions of the form "is RNG X good?" must be answered with: "What are you doing with it?" The Mersenne Twister has had GREAT success in simulations because of its excellent frequency distributions. In cryptographic settings, it is completely and utterly devoid of value. The internal state can be recovered by observing any 624 consecutive outputs. Blum Blum Shub has been very strong in cryptographic settings, but it runs unacceptably slowly for use in simulations.
No.
Nobody should be choosing a Mersenne Twister to generate randomness unless it's built-in, and if you are using randomness extensively you should be replacing it anyway. The Mersenne Twister fails basic statistical randomness tests that far simpler, far faster algorithms do not, and is generally just a bit disappointing.
The insecure, non-cryptographic pseudo-random number generators I recommend nowadays are xoroshiro+ and the PCG family. xoroshiro+ is faster and purported to be of slightly higher quality, but the PCG family comes with a more complete library and fills more roles.
However, modern cryptographic randomness can get more than fast enough. Rust's rand library uses ISAAC by default, and other choices exist. This should be your default choice in all but the most exceptional cases.

Safe mixing of entropy sources

Let us assume we're generating very large (e.g. 128- or 256-bit) numbers to serve as keys for a block cipher.
Let us further assume that we wear tinfoil hats (at least when outside).
Being so paranoid, we want to be sure of our available entropy, but we don't entirely trust any particular source. Maybe the government is rigging our coins. Maybe these dice are ever so subtly weighted. What if the hardware interrupts feeding into /dev/random are just a little too consistent? (Besides being paranoid, we're lazy enough that we don't want to generate it all by hand...)
So, let's mix them all up.
What are the secure method(s) for doing this? Presumably just concatenating a few bytes from each source isn't entirely secure -- if one of the sources is biased, it might, in theory, lend itself to such things as a related-key attack, for example.
Is running SHA-256 over the concatenated bytes sufficient?
(And yes, at some point soon I am going to pick up a copy of Cryptography Engineering. :))
Since you mention /dev/random -- on Linux at least, /dev/random is fed by an algorithm that does very much what you're describing. It takes several variously-trusted entropy sources and mixes them into an "entropy pool" using a polynomial function -- for each new byte of entropy that comes in, it's xor'd into the pool, and then the entire pool is stirred with the mixing function. When it's desired to get some randomness out of the pool, the entire pool is hashed with SHA-1 to get the output, then the pool is mixed again (and actually there's some more hashing, folding, and mutilating going on to make sure that reversing the process is about as hard as reversing SHA-1). At the same time, there's a bunch of accounting going on -- each time some entropy is added to the pool, an estimate of the number of bits of entropy it's worth is added to the account, and each time some bytes are extracted from the pool, that number is subtracted, and the random device will block (waiting on more external entropy) if the account would go below zero. Of course, if you use the "urandom" device, the blocking doesn't happen and the pool simply keeps getting hashed and mixed to produce more bytes, which turns it into a PRNG instead of an RNG.
Anyway... it's actually pretty interesting and pretty well commented, so you might want to study it: drivers/char/random.c in the linux-2.6 tree.
Using a hash function is a good approach - just make sure you underestimate the amount of entropy each source contributes, so that if you are right about one or more of them being less than totally random, you haven't weakened your key unduly.
This isn't dissimilar to the approach used in key stretching (though you have no need for multiple iterations here).
I've done this before, and my approach was just to XOR them, byte-by-byte, against each other.
Running them through some other algorithm, like SHA-256, is terribly inefficient, so it's not practical, and I think it would not really be useful and could even be harmful.
If you do happen to be incredibly paranoid and have a tiny bit of money, it might be fun to buy a "true" (depending on how convinced you are by quantum mechanics) quantum random number generator.
-- Edit:
FWIW, I think the method I describe above (or something similar) is effectively a one-time pad from the point of view of either source, assuming one of them is random, and therefore unattackable, assuming they are independent and out to get you. I'm happy to be corrected on this if someone takes issue with it, and I encourage anyone not taking issue with it to question it anyway and find out for themselves.
If you have a source of randomness but you're not sure whether it is biased, there are many different algorithms for removing the bias. Depending on how much work you want to do, the amount of entropy you waste from the original source differs.
The easiest algorithm is the (improved) von Neumann algorithm. You can find the details in this PDF, on page 27:
http://security1.win.tue.nl/~bskoric/physsec/files/PhysSec_LectureNotes.pdf
I also recommend reading this document if you're interested in how to produce uniform randomness from a given source, how true random number generators work, etc.!
