Is it possible for a RNG skewed by seed? - random

As the question asks, Is it possible for a badly programmed Pseudo RNG to be skewed by the seed used to generate the random number? If yes, what are the common mistakes that causes this to happen?

A few examples:
https://banu.com/blog/42/openbsd-bug-in-the-random-function/
the old seeding procedure used by Mersenne Twister (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html links to a clear explanation)
the old super duper (can't find the link atm) was supposed to have period ~2^64, but some seeds gave it period ~2^5x
Also, even if it's not really a bug, many, many generators initialize themselves with something like
if seed == 0:
# comment explaining that zero is bad
seed = nonzero
or
// Due to (reasons) we only really allow seeds in [0,M)
seed = seed % M
meaning that there will be at least two seeds giving identical results.
Examples: xorgens (zero=>nonzero), java's Random (seed is 64 bits, internal state is 48 bits...), go's math/rand (it does both things, see the first lines of the Seed() method at https://golang.org/src/math/rand/rng.go)

Related

urandom_range(), urandom(), random() in verilog

I am confused between these three functions and I was wondering for some explanation. If I set the range how do I make the range exclusive or inclusive? Are the ranges inclusive or exclusive if I don't specify the range?
In addition to the answer from #dave_59, there are other important differences:
i) $random returns a signed 32-bit integer; $urandom and $urandom_range return unsigned 32-bit integers.
ii) The random number generator for $random is specified in IEEE Std 1800-2012. With the same seed you will get exactly the same sequence of random numbers in any SystemVerilog simulator. That is not the case for $urandom and $urandom_range, where the design of the random number generator is up to the EDA vendor.
iii) Each thread has its own random number generator for $urandom and $urandom_range, whereas there is only one random number generator for $random shared between all threads (ie only one for the entire simulation). This is really important, because having separate random number generators for each thread helps you simulation improve a property called random stability. Suppose you are using a random number generator to generate random stimulus. Suppose you find a bug and fix it. This could easily change the order in which threads (ie initial and always blocks) are executed. If that change changed the order in which random numbers were generated then you would never know whether the bug had gone away because you'd fixed it or because the stimulus has changed. If you have a random number generator for each thread then your testbench is far less vulnerable to such an effect - you can be far more sure that the bug has disappeared because you fixed it. That property is called random stability.
So, As #dave_59 says, you should only be using $urandom and $urandom_range.
You should only be using $urandom and $urandom_range. These two functions provide better quality random numbers and better seed initialization and stability than $random. The range specified by $urandom_range is always inclusive.
Although $random generates the exact same sequence of random numbers for each call, it is extremely difficult to keep the same call ordering as soon as any change is made to the design or testbench. Even more difficult when multiple threads are concurrently generating random numbers.

C++ random_shuffle() how does it work?

I have a Deck vector with 52 Card, and I want to shuffle it.
vector<Card^> cards;
So I used this:
random_shuffle(cards.begin(), cards.end());
The problem was that it gave me the same result every time, so I used srand to randomize it:
srand(unsigned(time(NULL)));
random_shuffle(cards.begin(),cards.end());
This was still not truly random. When I started dealing cards, it was the same as in the last run. For example: "1. deal: A,6,3,2,K; 2. deal: Q,8,4,J,2", and when I restarted the program I got exactly the same order of deals.
Then I used srand() and random_shuffle with its 3rd parameter:
int myrandom (int i) {
return std::rand()%i;
}
srand(unsigned(time(NULL)));
random_shuffle(cards.begin(),cards.end(), myrandom);
Now it's working and always gives me different results on re-runs, but I don't know why it works this way. How do these functions work, what did I do here?
This answer required some investigation, looking at the C++ Standard Library headers in VC++ and looking at the C++ standard itself. I knew what the standard said, but I was curious about VC++ (including C++CLI) did their implementation.
First what does the standard say about std::random_shuffle . We can find that here. In particular it says:
Reorders the elements in the given range [first, last) such that each possible permutation of those elements has equal probability of appearance.
1) The random number generator is implementation-defined, but the function std::rand is often used.
The bolded part is key. The standard says that the RNG can be implementation specific (so results across different compilers will vary). The standard suggests that std::rand is often used. But this isn't a requirement. So if an implementation doesn't use std::rand then it follows that it likely won't use std::srand for a starting seed. An interesting footnote is that the std::random_shuffle functions are deprecated as of C++14. However std::shuffle remains. My guess is that since std::shuffle requires you to provide a function object you are explicitly defining the behavior you want when generating random numbers, and that is an advantage over the older std::random_shuffle.
I took my VS2013 and looked at the C++ standard library headers and discovered that <algorithm> uses template class that uses a completely different pseudo-rng (PRNG) than std::rand with an index (seed) set to zero. Although this may vary in detail between different versions of VC++ (including C++/CLI) I think it is probable that most versions of VC++/CLI do something similar. This would explain why each time you run your application you get the same shuffled decks.
The option I would opt for if I am looking for a Pseudo RNG and I'm not doing cryptography is to use something well established like Mersenne Twister:
Advantages The commonly-used version of Mersenne Twister, MT19937, which produces a sequence of 32-bit integers, has the following desirable properties:
It has a very long period of 2^19937 − 1. While a long period is not a guarantee of quality in a random number generator, short periods (such as the 2^32 common in many older software packages) can be problematic.
It is k-distributed to 32-bit accuracy for every 1 ≤ k ≤ 623 (see definition below).
It passes numerous tests for statistical randomness, including the Diehard tests.
Luckily for us C++11 Standard Library (which I believe should work on VS2010 and later C++/CLI) includes a Mersenne Twister function object that can be used with std::shuffle Please see this C++ documentation for more details. The C++ Standard Library reference provided earlier actually contains code that does this:
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(v.begin(), v.end(), g);
The thing to note is that std::random_device produces non-deterministic (non repeatable) unsigned integers. We need non-deterministic data if we want to seed our Mersenne Twister (std::mt19937) PRNG with. This is similar in concept to seeding rand with srand(time(NULL)) (The latter not being an overly good source of randomness).
This looks all well and good but has one disadvantage when dealing with card shuffling. An unsigned integer on the Windows platform is 4 bytes (32 bits) and can store 2^32 values. This means there are only 4,294,967,296 possible starting points (seeds) therefore only that many ways to shuffle the deck. The problem is that there are 52! (52 factorial) ways to shuffle a standard 52 card deck. That happens to be 80658175170943878571660636856403766975289505440883277824000000000000 ways, which is far bigger than the number of unique ways we can get from setting a 32-bit seed.
Thankfully, Mersenne Twister can accept seeds between 0 and 2^19937-1. 52! is a big number but all combinations can be represented with a seed of 226 bits (or ~29 bytes). The Standard Library allow std::mt19937 to accept a seed up to 2^19937-1 (~624 bytes of data) if we so choose. But since we need only 226 bits the following code would allow us to create 29 bytes of non-deterministic data to be used as a suitable seed for std::mt19937:
// rd is an array to hold 29 bytes of seed data which covers the 226 bits we need */
std::array<unsigned char, 29> seed_data;
std::random_device rd;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
// Set the seed for Mersenne *using the 29 byte sequence*
std::mt19937 g(seq);
Then all you need to do is call shuffle with code like:
std::shuffle(cards.begin(),cards.end(), g);
On Windows VC++/CLI you will get a warning that you'll want to suppress with the code above. So at the top of the file (before other includes) you can add this:
#define _SCL_SECURE_NO_WARNINGS 1

Is there anything wrong with a random seed of 0, 1

Is there anything particularly wrong with a random seed of 0, 1... or any other small integer, when using Mersenne Twister?
(I definitely want a repeatable pseudorandom sequence hence not seeding from time() or such like).
No problem whatsoever.
The point of seeding via a single integer is programmers's convenience:
You don't need to remember the rules for admissible seeds ("need 55 numbers, not all even; need < 256 bytes; ...");
You can (hopefully) easily get either different streams, or sequences which are at least far away from one another (some generators can prove this, not sure about MT); the seed is just (conceptually!) an index into a list of possible sequences.
The actual value of the seed is irrelevant -- 0 isn't more random than 1391202260 :)

"Resetting" pseudo-random number generator seed multiple times?

Today, my friend had a thought that setting the seed of a pseudo-random number generator multiple times using the pseudo-random number generated to "make things more randomized".
An example in C#:
// Initiate one with a time-based seed
Random rand = new Random(milliseconds_since_unix_epoch());
// Then loop for a_number_of_times...
for (int i = 0; i < a_number_of_times; i++)
{
// ... to initiate with the next random number generated
rand = new Random(rand.Next());
}
// So is `rand` now really random?
assert(rand.Next() is really_random);
But I was thinking that this could probably increase the chance of getting a repeated seed being used for the pseudo-random number generator.
Will this
make things more randomized,
making it loop through a certain number of seeds used, or
does nothing to the randomness (i.e. neither increase nor decrease)?
Could any expert in pseudo-random number generators give some detailed explanations so that I can convince my friend? I would be happy to see answers explaining further detail in some pseudo-random number generator algorithm.
There are three basic levels of use for pseudorandom numbers. Each level subsumes the one below it.
Unexpected numbers with no particular correlation guarantees. Generators at this level typically have some hidden correlations that might matter to you, or might not.
Statistically-independent number with known non-correlation. These are generally required for numerical simulations.
Cryptographically secure numbers that cannot be guessed. These are always required when security is at issue.
Each of these is deterministic. A random number generator is an algorithm that has some internal state. Applying the algorithm once yields a new internal state and an output number. Seeding the generator means setting up an internal state; it's not always the case that the seed interface allows setting up every possible internal state. As a good rule of thumb, always assume that the default library random() routine operates at only the weakest level, level 1.
To answer your specific question, the algorithm in the question (1) cannot increase the randomness and (2) might decrease it. The expectation of randomness, thus, is strictly lower than seeding it once at the beginning. The reason comes from the possible existence of short iterative cycles. An iterative cycle for a function F is a pair of integers n and k where F^(n) (k) = k, where the exponent is the number of times F is applied. For example, F^(3) (x) = F(F(F(x))). If there's a short iterative cycle, the random numbers will repeat more often than they would otherwise. In the code presented, the iteration function is to seed the generator and then take the first output.
To answer a question you didn't quite ask, but which is relevant to getting an understanding of this, seeding with a millisecond counter makes your generator fail the test of level 3, unguessability. That's because the number of possible milliseconds is cryptographically small, which is a number known to be subject to exhaustive search. As of this writing, 2^50 should be considered cryptographically small. (For what counts as cryptographically large in any year, please find a reputable expert.) Now the number of milliseconds in a century is approximately 2^(41.5), so don't rely on that form of seeding for security purposes.
Your example won't increase the randomness because there is no increase in entropy. It is simply derived from the execution time of the program.
Instead of using something based of the current time, computers maintain an entropy pool, and build it up with data that is statistically random (or at least, unguessable). For example, the timing delay between network packets, or key-strokes, or hard-drive read times.
You should tap into that entropy pool if you want good random numbers. These are known as Cryptographically secure pseudorandom number generators.
In C#, see the Cryptography.RandomNumberGenerator Class for the right way to get a secure random number.
This will not make things more "random".
Our seed determines the random looking but completely determined sequence of numbers that rand.next() gives us.
Instead of making things more random, your code defines a mapping from your initial seed to some final seed, and, given the same initial seed, you will always end up with the same final seed.
Try playing with this code and you will see what I mean (also, here is a link to a version you can run in your browser):
int my_seed = 100; // change my seed to whatever you want
Random rand = new Random(my_seed);
for (int i = 0; i < a_number_of_times; i++)
{
rand = new Random(rand.Next());
}
// does this print the same number every run if we don't change the starting seed?
Console.WriteLine(rand.Next()); // yes, it does
The Random object with this final seed is just like any other Random object. It just took you more time then necessary to create it.

Does Kernel::srand have a maximum input value?

I'm trying to seed a random number generator with the output of a hash. Currently I'm computing a SHA-1 hash, converting it to a giant integer, and feeding it to srand to initialize the RNG. This is so that I can get a predictable set of random numbers for an set of infinite cartesian coordinates (I'm hashing the coordinates).
I'm wondering whether Kernel::srand actually has a maximum value that it'll take, after which it truncates it in some way. The docs don't really make this obvious - they just say "a number".
I'll try to figure it out myself, but I'm assuming somebody out there has run into this already.
Knowing what programmers are like, it probably just calls libc's srand(). Either way, it's probably limited to 2^32-1, 2^31-1, 2^16-1, or 2^15-1.
There's also a danger that the value is clipped when cast from a biginteger to a C int/long, instead of only taking the low-order bits.
An easy test is to seed with 1 and take the first output. Then, seed with 2i+1 for i in [1..64] or so, take the first output of each, and compare. If you get a match for some i=n and all greater is, then it's probably doing arithmetic modulo 2n.
Note that the random number generator is almost certainly limited to 32 or 48 bits of entropy anyway, so there's little point seeding it with a huge value, and an attacker can reasonably easily predict future outputs given past outputs (and an "attacker" could simply be a player on a public nethack server).
EDIT: So I was wrong.
According to the docs for Kernel::rand(),
Ruby currently uses a modified Mersenne Twister with a period of 2**19937-1.
This means it's not just a call to libc's rand(). The Mersenne Twister is statistically superior (but not cryptographically secure). But anyway.
Testing using Kernel::srand(0); Kernel::sprintf("%x",Kernel::rand(2**32)) for various output sizes (2*16, 2*32, 2*36, 2*60, 2*64, 2*32+1, 2*35, 2*34+1), a few things are evident:
It figures out how many bits it needs (number of bits in max-1).
It generates output in groups of 32 bits, most-significant-bits-first, and drops the top bits (i.e. 0x[r0][r1][r2][r3][r4] with the top bits masked off).
If it's not less than max, it does some sort of retry. It's not obvious what this is from the output.
If it is less than max, it outputs the result.
I'm not sure why 2*32+1 and 2*64+1 are special (they produce the same output from Kernel::rand(2**1024) so probably have the exact same state) — I haven't found another collision.
The good news is that it doesn't simply clip to some arbitrary maximum (i.e. passing in huge numbers isn't equivalent to passing in 2**31-1), which is the most obvious thing that can go wrong. Kernel::srand() also returns the previous seed, which appears to be 128-bit, so it seems likely to be safe to pass in something large.
EDIT 2: Of course, there's no guarantee that the output will be reproducible between different Ruby versions (the docs merely say what it "currently uses"; apparently this was initially committed in 2002). Java has several portable deterministic PRNGs (SecureRandom.getInstance("SHA1PRNG","SUN"), albeit slow); I'm not aware of something similar for Ruby.

Resources