Is Mersenne Twister a good binary RNG? - random

I'm trying to find an RNG to generate a stream of pseudorandom bits. I have found that the Mersenne Twister (MT19937) is a widely used RNG that generates good 32-bit unsigned integers, and that implementations exist that generate apparently good double-precision floats (from a 53-bit integer). But I can't seem to find any references to it being well-behaved on the bit side of things.
Marsaglia expressed some concerns about the randomness of the Mersenne Twister that make me hesitant to use it.
Does anybody know if the Mersenne Twister has a significant bias when used to generate pseudorandom bits? If so, does anyone know a good pseudorandom bit generator?

All pseudorandom generators strive to generate a high degree of unpredictability per bit. There is currently no way to predict a bit from the Mersenne Twister substantially better than random chance, at least until you have observed 624 outputs.
All questions in the form of "is X RNG good" must be answered with: "what are you doing with it?" The Mersenne Twister has had GREAT success in simulations because of its excellent frequency distributions. In cryptographic situations, it is completely and utterly devoid of all value whatsoever. The internal state can be identified by looking at any 624 contiguous outputs. Blum Blum Shub has been very strong in cryptographic situations, but it runs unacceptably slowly for use in simulations.
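The state-recovery attack both answers allude to hinges on MT19937's output being only a reversible "tempering" of each 624-word state entry. A minimal sketch of that inversion (recovering the full state from 624 consecutive outputs then follows word by word):

```cpp
#include <cassert>
#include <cstdint>

// MT19937 tempering, as applied to each 32-bit word of internal state.
uint32_t temper(uint32_t y) {
    y ^= y >> 11;
    y ^= (y << 7)  & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

// Invert one "y ^= y >> shift" step by iterating to the fixed point:
// each pass recovers another `shift` correct bits from the top down.
uint32_t undo_rshift_xor(uint32_t y, int shift) {
    uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ (x >> shift);
    return x;
}

// Invert one "y ^= (y << shift) & mask" step the same way, bottom up.
uint32_t undo_lshift_xor(uint32_t y, int shift, uint32_t mask) {
    uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ ((x << shift) & mask);
    return x;
}

// Undo the tempering steps in reverse order to recover the raw state word.
uint32_t untemper(uint32_t y) {
    y = undo_rshift_xor(y, 18);
    y = undo_lshift_xor(y, 15, 0xefc60000u);
    y = undo_lshift_xor(y, 7,  0x9d2c5680u);
    y = undo_rshift_xor(y, 11);
    return y;
}
```

Feeding 624 consecutive outputs through untemper reconstructs the entire internal state, after which every future output can be predicted exactly.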

No.
Nobody should be choosing a Mersenne Twister to generate randomness unless it's built-in, and if you are using randomness extensively you should be replacing it anyway. The Mersenne Twister fails basic statistical randomness tests that far simpler, far faster algorithms do not, and is generally just a bit disappointing.
The insecure, non-cryptographic pseudo-random number generators I recommend nowadays are xoroshiro+ and the PCG family. xoroshiro+ is faster and purported to be slightly higher quality, but the PCG family comes with a more complete library and fills more roles.
However, modern cryptographic randomness can be more than fast enough. Rust's rand library uses ISAAC by default, and other choices exist. This should be your default choice in all but the most exceptional cases.

Related

Most suitable pseudo random number generators for Metropolis–Hastings MCMC

I am doing a lot of Metropolis-Hastings Markov chain Monte Carlo (MCMC).
Most codes I have in use, use Mersenne Twister (MT) as pseudo random number generator (PRNG).
However, I recently read that MT is outdated and probably shouldn't be used anymore, as it fails some tests and is relatively slow. So I am willing to switch.
Numpy now defaults to PCG (https://www.pcg-random.org/), which claims to be good. Other sites are rather critical. E.g. http://pcg.di.unimi.it/pcg.php.
It seems everyone praises their own work.
There is some good information already here: Pseudo-random number generator
But many answers are already a bit dated and I want to formulate my question more specifically.
As I said: the main use case is Metropolis-Hastings MCMC.
Therefore, I need:
uniformly distributed numbers in half-open and open intervals
around 2^50 samples; by the usual rule of thumb the PRNG should then have a period of at least 2^128
sufficient quality of random numbers (whatever this might mean)
a reasonably fast PRNG (for a fixed runtime, faster code means more accuracy for the MCMC)
I do not need
cryptographic security
As I am by no means an expert, usability of course also counts. So I would welcome an available C++ implementation (this seems to be the standard), which is sufficiently easy to use for a novice.
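For the half-open and open intervals requirement, the standard trick is to map the generator's top 53 bits onto the double grid. A sketch, with splitmix64 as a stand-in for whatever PRNG is eventually chosen (PCG64, a xoshiro variant, std::mt19937_64, etc.):

```cpp
#include <cassert>
#include <cstdint>

// splitmix64: a tiny 64-bit generator, used here only as a stand-in;
// plug in the real PRNG of your choice in its place.
uint64_t splitmix64(uint64_t& state) {
    uint64_t z = (state += 0x9e3779b97f4a7c15ull);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

// Half-open [0, 1): take the top 53 bits and scale by 2^-53.
double uniform_halfopen(uint64_t& state) {
    return (splitmix64(state) >> 11) * 0x1.0p-53;
}

// Open (0, 1): shift a 52-bit integer up by half a ulp so 0 is excluded
// and the maximum stays strictly below 1.
double uniform_open(uint64_t& state) {
    return ((splitmix64(state) >> 12) + 0.5) * 0x1.0p-52;
}
```

Both mappings produce exactly representable doubles, which avoids the subtle bias that dividing a 32-bit integer by 2^32 introduces.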

Is there any legitimate use for Intel's RDRAND?

Today I thought: well, even if there is great suspicion about Intel's RDRAND implementation of NIST SP 800-90A, it is still a hardware implementation of a pseudo-random number generator (PRNG) that must be good enough for non-sensitive applications. So I thought of using it in my game instead of the Mersenne Twister.
So, to see if there was any performance gain in using the instruction, I compared the running times of the two following programs:
// test.cpp
#include <cstdio>

int main()
{
    unsigned int rnd = 0;
    for (int i = 0; i < 10000000; ++i) {
        __builtin_ia32_rdrand32_step(&rnd);
    }
    printf("%x\n", rnd);
}
and
// test2.cpp
#include <cstdio>
#include <random>

int main()
{
    unsigned int rnd = 0;
    __builtin_ia32_rdrand32_step(&rnd);
    std::mt19937 gen(rnd);
    for (int i = 0; i < 10000000; ++i) {
        rnd ^= gen();
    }
    printf("%x\n", rnd);
}
and by running the two I get:
$ time ./test
d230449a
real 0m0.361s
user 0m0.358s
sys 0m0.002s
$ time ./test2
bfc4e472
real 0m0.051s
user 0m0.050s
sys 0m0.002s
So, the Mersenne Twister is much faster than RDRAND on my CPU. Well, I was disappointed, and ruled it out of my game. But RDRAND is a cryptographically secure PRNG (CSPRNG), so it does much more behind the scenes; a fairer comparison would be against another CSPRNG. So I took my Rabbit implementation (a plain translation of the RFC to C, no fancy tricks for performance) and wrote the following test:
// test3.cpp
#include <cstdio>

extern "C"
{
#include "rabbit.h"
}

int main()
{
    rabbit_state s;
    unsigned long long buf[2];
    __builtin_ia32_rdrand64_step(&buf[0]);
    __builtin_ia32_rdrand64_step(&buf[1]);
    rabbit_init_key(&s, (uint8_t*)&buf[0]);
    for (int i = 0; i < 10000000; ++i) {
        rabbit_extract(&s, (uint8_t*)&buf[0]);
    }
    printf("%llx\n", buf[0]);
}
And to my surprise, while generating twice as much pseudorandom data as the first two, it got a better time than RDRAND:
$ time ./test3
8ef9772277b70aba
real 0m0.344s
user 0m0.341s
sys 0m0.002s
All three were compiled with optimization enabled.
So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography. Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG, Mersenne Twister, is much faster than RDRAND. Finally, we have open-source auditable software entropy pools, like /dev/random and /dev/urandom, that are not hidden behind twofold scrambler layers of AES, like RDRAND.
So, the question: should people be using RDRAND? Is there any legitimate use for it? Or should we stop using it altogether?
As pointed out in the other answer, RDRAND is seeded with true randomness. In particular, it frequently reseeds its internal CSPRNG with 128 bits of hardware-generated randomness, guaranteeing a reseed at least once every 511 * 128 bits. See section 4.2.5 of this doc:
https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide
So in your examples, you used a single 128-bit seed to generate 10 million random draws from rabbit_extract. In the RDRAND version, you had the equivalent of 2.5 million 128-bit draws, meaning that the CSPRNG was reseeded at least 2,500,000/511 = 4,892 times.
So instead of 128 bits of entropy going into your rabbit example, there were at least 4,892*128 = 626,176 bits of entropy going into the RDRAND example.
That's much, much more entropy than you're going to get in 0.361 seconds without hardware support. That could matter if you're doing stuff where lots of real randomness is important. One example is Shamir secret sharing of large quantities of data -- not sure if there are others.
So in conclusion -- it's not for speed, it's for high security. The question of whether it's backdoored is troubling, of course, but you can always XOR it with other sources, and at the very least it's not hurting you.
RDRAND is not just a PRNG. It is a whitened TRNG that is FIPS compliant. The difference is that you can rely on RDRAND to contain quite a lot of actual entropy directly retrieved from the CPU. So the main use of RDRAND is to supply entropy to OS/libraries/applications.
The only other good way for applications to retrieve entropy is usually an OS-supplied entropy source such as /dev/random or /dev/urandom (both draw from the same kernel entropy pool). However, the OS also has to find entropy somewhere. Usually tiny differences in disk and network access times are used for this (plus other semi-random input). These devices are not always present, and are not designed as sources of entropy; they are often not very good sources, nor are they very fast. So on systems that support it, RDRAND is often used as an entropy source for the cryptographically secure random number generator of the operating system.
With regard to speed, especially for games, it is completely valid to use a (non-secure) PRNG. If you want a reasonable random seed, then seeding it with the result of RDRAND may be a good idea, although seeding it from the OS-supplied RNG may be a more portable and even more secure option (in case you don't fully trust Intel or the US).
Note that currently RDRAND is implemented using (AES) CTR_DRBG rather than a less well analysed stream cipher designed for speed such as Rabbit, so it should come as no surprise that Rabbit is faster. Even more importantly, RDRAND also has to retrieve the entropy from the entropy source within the CPU before it can run.
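In portable C++ that seeding advice looks like the sketch below; std::random_device is typically backed by the OS entropy source (and on some toolchains by RDRAND itself):

```cpp
#include <cassert>
#include <random>

// Seed a fast, non-cryptographic PRNG from the OS entropy source.
// std::random_device usually wraps /dev/urandom or a hardware RNG,
// so this is the portable form of "seed from the OS-supplied RNG".
std::mt19937_64 make_seeded_engine() {
    std::random_device rd;
    // Gather enough entropy for the engine's large state, not just 32 bits.
    std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    return std::mt19937_64(seq);
}
```

Any engine from `<random>` (or a third-party PCG/xoshiro type with a seed_seq constructor) can be substituted for std::mt19937_64 here.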
Is there any legitimate use for Intel's RDRAND?
Yes.
Consider a Monte-Carlo simulation. It has no cryptographic needs, so it does not matter if it's backdoored by the NSA.
Or should we stop using it altogether?
We can't answer that. That's a confluence of use cases, requirements and personal preferences.
"... Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG ..."
The Mersenne Twister may be faster for one word at a time after initialization and between twists, because it's just returning a word from the state array. But I doubt it's as fast as RDRAND for a continuous stream. I know RDRAND can achieve theoretical limits based on bus width in a continuous stream.
According to David Johnston of Intel (who designed the circuit), that's something like 800+ MB/s. See DJ's answer at What is the latency and throughput of the RDRAND instruction on Ivy Bridge?.
So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography.
Paranoid folks have at least two choices. First, they can forgo using RDRAND and RDSEED. Second, they can use the output of RDRAND and RDSEED to seed another generator, and then use the output of the second generator. I believe the Linux kernel takes the second approach.
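The combining step behind that second choice can be sketched as follows; hw_word stands in for an RDRAND result and os_word for a draw from the OS pool, kept as plain parameters so the sketch stays portable:

```cpp
#include <cassert>
#include <cstdint>

// The splitmix64 finalizer, used here only to spread input bits around.
uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

// XOR-combining independent entropy sources: if at least one input word
// is uniform and independent of the others, the XOR is uniform too, so
// a suspect source cannot make the combined output worse.
uint64_t combine_entropy(uint64_t hw_word, uint64_t os_word) {
    return mix64(hw_word) ^ mix64(os_word);
}
```

The combined word can then seed whatever generator the application actually uses, which is essentially the structure the text attributes to the Linux kernel.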
There is an astrophysics research paper here (http://iopscience.iop.org/article/10.3847/1538-4357/aa7ede/meta;jsessionid=A9DA9DDB925E6522D058F3CEEC7D0B21.ip-10-40-2-120), (non-paywalled) version here (https://arxiv.org/abs/1707.02212) that gives a legitimate use for RdRand.
It examines the effects of RdRand on a Monte Carlo simulator, as an earlier post advised. But the author didn't find any statistical difference in the results that use or do not use RdRand. From the performance standpoint, it looks like the Mersenne Twister is much faster. I think Sections 2.2.1 and 5 have all of the details.
True Randomness vs Pseudorandom.
rdrand is a hardware source of true entropy, and quite a powerful one at that. (It is not Intel-exclusive either; AMD offers the same instruction. Why single out Intel, then? I'm concentrating on Intel's implementation because they offer a bit more information and have vastly higher throughput, compared even to Zen 2 [Ryzen 3xxx].)
Forget about clock-cycle throughput; that is a misleading metric here because it is not really related. The throughput is limited by the round-trip delay for a single thread and by the actual hardware implementation, which runs at 800 MHz, takes a fixed 8 cycles, and can deliver a 64-bit random value per transaction.
Looking just at the clock cycles taken by the thread running the code will vary with the current clock rate: idling at 800 MHz, it would seem as if RDRAND were 6 times faster than when running at 4.8 GHz.
But the delay between the cores and the DRNG is on the order of the inter-core latency, and you pay it twice, so about 80 ns. That works out to roughly 12.5 million transactions per second at 8 bytes each, i.e. ~100 MB/s per thread, with a maximum of 800 MB/s overall.
One other benefit of this: it does not use up your CPU resources the way most PRNGs do. So while other generators can have considerably higher throughput (1-2 orders of magnitude), they also require more resources. In situations where you need a lot of random numbers inside a tight, computationally heavy loop, rdrand might actually give better performance, but that of course depends heavily on the exact scenario and hardware.
If you just need a lot of random numbers for some simulation, or need quasi-random numbers for games, then rdrand is likely not the best choice. If you need truly secure random numbers, use it, and combine it with other sources of entropy if you are worried about backdoors.

Predictability of Pseudo Random Generators by Quantum computers

Might classical pseudorandom generators be predictable by powerful quantum computers in the future, or is it proven that this is not possible?
If they are predictable, do scientists know whether there exist PRGs that are unpredictable by quantum computers?
The security of a classical Cryptographic Pseudo-Random Number Generator (CPRNG) is always based on some hardness assumption, such as "factoring is hard" or "colliding the SHA-256 function is hard".
Quantum computers make some computational problems easier. That violates some of the old hardness assumptions. But not all of them.
For example, Blum Blum Shub is likely broken by quantum computers, but no one knows how to break lattice-based cryptography with quantum computers. Showing you can break all classical CPRNGs with quantum computers is tantamount to showing that BQP=NP, which is not expected to be the case.
Even if quantum computers did break all classical CPRNGs, they happen to also fill that hole. They enable the creation of "Einstein-certified" random numbers.

Extremely fast method for modular exponentiation with modulus and exponent of several million digits

As a hobby project I'm taking a crack at finding really large prime numbers. The primality tests for this involve modular exponentiation calculations, i.e. a^e mod n. Let's call this the modpow operation to keep the explanation simple. I want to speed up this particular calculation.
Currently I am using GMP's mpz_pown function, but it is kind of slow. The reason I think it's too slow is that a single call to GMP's modpow is slower than a full-blown primality test in the software called PFGW for the same large number. (To be clear, I am comparing just GMP's modpow part, not my whole custom primality testing routine.) PFGW is considered the fastest in its field, and for my use case it uses a Brillhart-Lehmer-Selfridge primality test, which also uses the modpow procedure, so it's not because of mathematical cleverness that PFGW is faster in that respect (please correct me if I'm wrong here). It looks like the bottleneck in GMP is the modpow operation. An example runtime for numbers with a little over 20,000 digits: GMP's modpow operation takes about 45 seconds, while PFGW finishes the whole primality test (involving a modpow) in 9 seconds flat. The difference gets even more impressive with bigger numbers. GMP uses FFT multiplication and Montgomery reduction for this test comparison; see comments on this post below.
I did some research. So far I understand that the modpow algorithm uses exponentiation by squaring, integer multiplication and modular reduction, all of which sound very familiar to me. Several helper methods can improve the running time of the integer multiplication:
Montgomery multiplication
FFT multiplication
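For context, the skeleton these helpers slot into is plain square-and-multiply. A minimal 64-bit sketch (using the GCC/Clang __uint128_t extension; a million-digit version replaces the % operator by Montgomery or Barrett reduction and the products by FFT multiplication):

```cpp
#include <cassert>
#include <cstdint>

// Right-to-left square-and-multiply: computes a^e mod n for 64-bit n.
// __uint128_t avoids overflow in the 64x64-bit products; real
// million-digit work needs bignum limbs and fast reduction instead.
uint64_t modpow(uint64_t a, uint64_t e, uint64_t n) {
    uint64_t result = 1 % n;    // handles n == 1 correctly
    a %= n;
    while (e > 0) {
        if (e & 1)                                // multiply for each set bit
            result = (__uint128_t)result * a % n;
        a = (__uint128_t)a * a % n;               // square for every bit
        e >>= 1;
    }
    return result;
}
```

The loop runs once per exponent bit, so for a k-bit exponent the cost is about 1.5k big multiplications; everything listed above attacks either the multiplication or the reduction inside this loop.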
To improve the running time of the exponentiation-by-squaring part, one may use a signed-digit representation to reduce the number of multiplications (i.e. bits are represented as 0, 1 or -1, and the bit string is rewritten so that it contains many more zeros than its original base-2 representation; this reduces the running time of exponentiation by squaring).
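The signed-digit idea can be made concrete with the non-adjacent form (NAF), in which on average only one third of the digits are nonzero, versus one half for plain binary:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Non-adjacent form: digits in {-1, 0, 1}, least significant first,
// with no two adjacent nonzero digits. Fewer nonzero digits means
// fewer multiply steps in exponentiation, at the cost of also needing
// a^-1 (cheap in the modular setting via one precomputed inverse).
std::vector<int> to_naf(uint64_t n) {
    std::vector<int> digits;
    while (n > 0) {
        if (n & 1) {
            int d = 2 - (int)(n & 3);   // +1 if n % 4 == 1, -1 if n % 4 == 3
            digits.push_back(d);
            if (d == 1) n -= 1; else n += 1;
        } else {
            digits.push_back(0);
        }
        n >>= 1;
    }
    return digits;
}
```

For example 7 = 8 - 1 becomes the digits (-1, 0, 0, 1), needing one multiply and one divide instead of three multiplies for binary 111.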
For optimizing the modulo part of the operation, I know of these methods:
Montgomery reduction
So here is the 150,000 dollar question: is there a software library available to do a modpow operation efficiently given a very large base, exponent and modulus? (I'm aiming for several million digits.) If you suggest an option, please try to explain the inner workings of the algorithm for the case of millions of digits for the base, modulus and exponent, as some libraries use different algorithms based on the number of digits. Basically I am looking for a library which supports the techniques mentioned above (or possibly more clever ones) and performs well while running the algorithm (better than GMP, at least). So far I've searched, found and tried GMP and PFGW, but didn't find them satisfactory (PFGW is fast, but I'm only interested in the modpow operation and there is no direct programming interface to it). I'm hoping that an expert in the field can suggest a library with these capabilities, as there seem to be very few that can handle these requirements.
Edit: made the question more concise, as it was marked too broad.
First off, re. the Answer 1 writer's comment "I do not use GMP but I suspect when they wrote they use FFT they really mean the NTT" -- no, when GMP says "FFT" it means a floating-point FFT. IIRC they also have some NTT-based routines, but for bignum mul those are uncompetitive with FFT.
The reason a well-tuned FFT-mul beats any NTT is that the slight loss of per-word precision due to roundoff error accumulation is more than made up for by the vastly superior floating-point capabilities of modern CPU offerings, especially when one considers high-performance implementations which make use of the vector-math capabilities of CPUs such as the x86_64 family, the current iterations of which - Intel Haswell, Broadwell and Skylake - have massive vector floating-point capability. (I don't cite AMD in this regard because their AVX offerings have lagged far behind Intel's; their high-water mark was circa 2002 and since then Intel has been beating the pants off them in progressively-worse fashion each year.) The reason GMP disappoints in this area is that GMP's FFT is, relatively speaking, crap. I have great respect for the GMP coders overall, but FFT timings are FFT timings, you don't get points for effort or e.g. having a really good bignum add. Here is a paper detailing a raft of GMP FFT-mul improvements:
Pierrick Gaudry, Alex Kruppa, Paul Zimmerman: "A GMP-based Implementation of Schönhage-Strassen's Large Integer Multiplication Algorithm" [http://www.loria.fr/~gaudry/publis/issac07.pdf]
This is from 2007, but AFAIK the performance gap noted in the snippet below has not been narrowed; if anything it has widened. The paper is excellent for detailing various mathematical and algorithmic improvements which can be deployed, but let's cut to the money quote:
"A program that implements a complex floating-point FFT for integer multiplication is George Woltman's Prime95. It is written mainly for testing large Mersenne numbers 2^p − 1 for primality in the Great Internet Mersenne Prime Search [24]. It uses a DWT for multiplication mod a*2^n ± c, with a and c not too large, see [17]. We compared multiplication modulo 2^2wn − 1 in Prime95 version 24.14.2 with multiplication of n-word integers using our SSA implementation on a Pentium 4 at 3.2 GHz, and on an Opteron 250 at 2.4 GHz, see Figure 4. It is plain that Prime95 beats our implementation by a wide margin, in fact usually by more than a factor of 10 on a Pentium 4, and by a factor between 2.5 and 3 on the Opteron."
The next few paragraphs are a raft of face-saving spin. (And again, I am personally acquainted with 2 of the 3 authors, and they are all top guys in the field of computational number theory.)
Note that the aforementioned George Woltman, whose Prime95 code has discovered all of the world-record primes since shortly after its debut 20 years ago, has made his core bignum routines available in a general API-ized form called the GWNUM library. You mentioned how much faster PFGW is than GMP for FFT-mul - that's because PFGW uses GWNUM for the core 'heavy lifting' arithmetic, that's where the 'GW' in PFGW comes from.
My own FFT implementation, which has generic-C build support but like George's uses reams of x86 vector-math assembler for high performance on that CPU family, is roughly 60-70% slower than George's on current Intel processor families. I believe that makes it the world's 2nd-fastest bignum-mul code on x86. By way of example, my code is currently running a primality test on a number with roughly 2^29 bits using a 30-Mdouble-length FFT (30*2^20 doubles); thus a little more than 17 bits per input word. Using all four of my 3.3 GHz Haswell 4670 quad's cores it takes ~90 ms per modmul.
BTW, many (if not most) of the world's top bignum-math coders hang out at mersenneforum.org; I encourage you to check it out and ask your questions to the broader (at least in this particular area) expert audience there. I appear under the same handle there as here; George Woltman appears as "Prime95", and PFGW's Mark Rodenkirch goes as "rogue".
I do not use GMP at all, so bear that in mind.
I use NTT instead of FFT for multiplication:
it removes the rounding errors, and compared with my FFT implementations optimized to the same point, it is faster
C++ NTT implementation
C++ NTT sqr(mul) implementation
As I mentioned, I do not use GMP, but I suspect that when they write FFT they really mean the NTT (finite-field Fourier transform).
The speed difference between your test and the GMP primality test may be caused by how the modpow call is used.
If there are too many calls to it, the resulting heap/stack thrashing slows things down considerably, especially for bignums. Try to avoid heap thrashing: eliminate as much data from operands and return values as possible for frequently called functions. It also sometimes helps to eliminate a bottleneck call by copying the function's source code directly into your code (or using macros instead of a call), with local variables only.
GMP's source code is published, so finding their implementation of modpow should not be too hard. You just have to use it correctly.
Just to be clear:
You are using big numbers of 20,000+ decimal digits, which means ~8.4 KB per number. Any return value or non-pointer operand means copying that amount of data to/from the heap/stack. This not only takes time but usually also invalidates the CPU cache, which kills performance too.
Now multiply this by the number of algorithm iterations and you get the idea. I had similar problems while tweaking many of my bignum functions, and the speedup is often more than 10,000% (100 times) even with no change in the algorithm used, just by limiting/eliminating heap/stack thrashing.
So I do not think you need a better modpow implementation, just better usage of it. But of course I may be wrong; without the code you are using, it is hard to deduce more.

Can quantum algorithms be used for encryption?

Can quantum algorithms be useful?
Has any one been successful in putting quantum algorithms to any use?
"Quantum algorithms" are algorithms to be run on quantum computers.
There are things that can be done quickly in the quantum computation model that are not known (or believed) to be possible with classical computation: Discrete logarithm and Integer factorisation (see Shor's algorithm) are in BQP, but not believed to be in P (or BPP). Thus when/if a quantum computer is built, it is known that it can break RSA and most current cryptography.
However,
quantum computers cannot (are not believed to, I mean) solve NP-complete problems in polynomial time, and more importantly,
no one has built a quantum computer yet, and it is not even clear if it will be possible to build one -- avoiding decoherence, etc. (There have been claims of quantum computers with a limited number of qubits -- 5 to 10, but clearly they are not useful for anything much.)
"Well, there's a quantum computer that can factor 15, so those of you using
4-bit RSA should worry." -- Bruce Schneier
[There is also the idea of quantum cryptography, which is cryptography over a quantum channel, and is something quite different from quantum computation.]
The only logical answer is that they are both useful and not useful. ;-)
My understanding is that current quantum computing capabilities can be used to exchange keys securely. The exchanged keys can then be used to perform traditional cryptography.
From what I know about quantum computing and algorithms, quantum algorithms see quite a lot of use in cryptography; if you are really interested in cryptography, do check those out. Basically it all comes down to how well you know the basics of quantum mechanics and discrete mathematics. E.g. consider a difficult algorithm like Shor's algorithm, which is basically integer factorization. There are classical algorithms for integer factorization (algebraic-group factorization, Fermat's factorization method, etc.), but when it comes to quantum computing things are totally different: you are running on a quantum computer, so the algorithm changes and you have to use algorithms like Shor's.
Basically, build a good understanding of quantum computing first, and then look at quantum algorithms.
There is also some research into whether quantum computing can be used to solve hard problems, such as factoring large numbers (if this was feasible it would break current encryption techniques).
Stackoverflow runs on a quantum computer of sorts.
Feynman has implied the possibility that quantum probability is the source of human creativity.
Individuals in the crowd present answers and vote on them with only a probability of being correct. Only by sampling the crowd many times can the probability be raised to a confident level.
So maybe Stackoverflow exemplifies a successful quantum algorithm implementation.
What do you think?
One good use of a quantum device that can be done in current technology is a random number generator.
Generating truly random bits is an important cryptographic primitive, used for example in the RSA algorithm to generate the private key. The "random number generator" in our PCs isn't really random at all, in the sense that the source has no entropy in it.
