I am using a C# implementation of Mersenne Twister I downloaded from CenterSpace. I have two problems with it:
No matter how I seed the algorithm, it does not pass the DieHard tests; by that I mean I get quite a lot of p-values that are exactly 1 or 0. Also, the KS test on my 269 p-values is 0. Well, I cannot quite interpret a p-value, but I think a bunch of 1s and 0s in the results is bad news.
I have been asked to visually show the randomness of the numbers, so I plot the numbers as they are generated, and this does not seem random at all. Here are two screenshots of the result: one after a few seconds and one a few seconds later. As you can see in the second screenshot, the numbers fall on some parallel lines. I have tried different algorithms to map numbers to points; they all result in parallel lines, but with different angles! This is how I mapped numbers to points for these screenshots: new Point(number % _canvasWidth, number % _canvasHeight). As you may guess, the visual result depends on the form's width and height, which is a disastrous result.
Here are a few ways I have tried to seed the algorithm:
User entry. I enter some numbers to seed the algorithm as an int array.
Random numbers generated by the algorithm itself!!
An array of new Guid().GetHashCode()
What am I missing here? How should I seed the algorithm? How can I get it to pass DieHard?
While I cannot speak to your first point, the second problem has to do with how you are computing the points to draw on. Specifically,
x = number % _canvasWidth;
y = number % _canvasHeight;
will give you a "pattern" that corresponds somewhat to the aspect ratio of the window you are drawing to. For example, if _canvasWidth and _canvasHeight were equal, you would always draw on a single diagonal line as x and y would always be the same. This graphical representation wouldn't be appropriate in this case, then.
What about taking the N bits of the RNG output and using half for the x coordinate and the other half for the y coordinate? For those bits that fall out of the bounds of your window you might want to consider two options:
Don't draw them (or draw them offscreen)
Perform a linear interpolation to map the range of bits to the width/height of your window
Either option should give you a more representative picture of the bits you are getting out of your random number generator. Good luck!
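As a rough sketch of the second option (Python for brevity; the 32-bit output width and the canvas size are assumptions, and Python's random module, itself a Mersenne Twister, stands in for your generator):

import random

WIDTH, HEIGHT = 640, 480   # hypothetical canvas size

def next_point(rng=random):
    # Take one 32-bit output and split it: high 16 bits -> x, low 16 bits -> y.
    n = rng.getrandbits(32)
    hi, lo = n >> 16, n & 0xFFFF
    # Linearly map each 16-bit half onto the canvas (the interpolation option).
    return hi * WIDTH // 0x10000, lo * HEIGHT // 0x10000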
Your stripy point-plotting problem should easily be fixed by generating a new random number for each of the x and y coordinates. Trying to reuse a single generated number for x and y is basically premature optimization, but if you do go down that route, make sure you extract different bits for each from the number; as is, x=n%width;y=n%height gives you enormous correlation between x and y, as can be seen in your images.
I've been using various C++ Mersenne Twister implementations for years (most recently boost's) to generate random points and had no difficulties with it (seed related or otherwise). It really is a superb generator.
True random number generation cannot be done with a mathematical function. If it's important to have truly random numbers, get a hardware random number generator. I've developed real money online poker games—such hardware is the only way to be confident there are no patterns in the numbers.
If targeting a Linux environment, the /dev/random and /dev/urandom pseudo-devices do a lot better than a mathematical generator, since they incorporate entropy gathered from hardware activity.
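If you just want OS-gathered entropy from code rather than a dedicated device, a minimal Python sketch (os.urandom reads from /dev/urandom on Linux):

import os

raw = os.urandom(16)                                   # 16 bytes from the OS entropy pool
as_int = int.from_bytes(raw, "big")                    # the same bytes as one large integer
unit = int.from_bytes(os.urandom(7), "big") / 2**56    # 56 random bits -> float in [0, 1)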
Related
Let's say I have three pseudo-random numbers from different pseudo-random number generators.
Since each generator reflects only a part of the real random number generating process, I believe one way to get a number closer to truly random might be to somehow take a "center" of the three pseudo-random numbers.
An easy way to get that "center" would be to take the average, median, or mode (if any) of them.
I am wondering if there's a more sophisticated way, given that they are supposed to represent random numbers.
Well, there is an approach, called entropy extractor, which allows to get (good) random numbers from not quite random source(s).
If you have three independent but somewhat low quality (biased) RNGs, you could combine them together into uniform source.
Suppose you have three generators giving you a single byte each; then a uniform output would be
t = X*Y + Z
where addition and multiplication are done over the finite field GF(2^8).
Some code (Python)
from pyfinite import ffield

def RNG1():
    return ...  # single random byte

def RNG2():
    return ...  # single random byte

def RNG3():
    return ...  # single random byte

def muRNG():
    # t = X*Y + Z, with arithmetic over GF(2^8)
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    GF = ffield.FField(8)
    return GF.Add(GF.Multiply(X, Y), Z)
Paper where this idea was stated
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values give more randomness, wouldn't more be even better? It turns out that if you take either the average or the median of n Uniform(0,1) values, both converge to 0.5, a single point, as n→∞. More generally, replacing distributions with a "representative" constant is a bad idea if you want to understand stochastic systems.

As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively longer on average. If the arrival and service distributions are constant, however, queues remain at zero length until the arrival rate exceeds the service rate, at which point they go to infinity. When the two rates are equal, the stochastic system has an infinite expected queue length, while the deterministic one stays at its initial length (usually assumed to be zero). Infinity and zero are about as wildly different as you can get, illustrating that replacing the distributions in a queueing model with their means tells you nothing about how queues actually behave.
Next, empirical evidence. Histograms of the medians and averages constructed from 10,000 samples of three uniforms have different distribution shapes, but both are clearly no longer uniform: values bunch up in the middle and become progressively rarer towards the endpoints of the range (0, 1).
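A small Python sketch to reproduce this experiment (bin counts stand in for the histograms; a uniform sample would put roughly 1,000 values in each bin):

import random
import statistics

N = 10_000
medians, averages = [], []
for _ in range(N):
    u = [random.random() for _ in range(3)]
    medians.append(statistics.median(u))
    averages.append(sum(u) / 3)

# Crude histogram: count how many values fall in each tenth of (0, 1).
for name, data in (("median", medians), ("average", averages)):
    bins = [0] * 10
    for v in data:
        bins[min(int(v * 10), 9)] += 1
    print(name, bins)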
The uniform distribution has maximum entropy for continuous distributions on a closed interval, so both of these alternatives, being non-uniform, are clearly lower entropy, i.e., more predictable.
To get good random numbers, it's advisable to gather some bits of entropy. Depending on whether or not they are used for security purposes, you could just take the time from the system clock as a seed for a random number generator, or use more sophisticated means. The PWGen project (on SourceForge.net) is open source and monitors Windows events as a source of random bits of entropy.
You can find more info on how to generate random numbers in C++ in this SO question too: Random number generation in C++11: how to generate, how does it work?. It turns out C++'s random numbers aren't always all that random: Everything You Never Wanted to Know about C++'s random_device. So look for a good way to seed; passing the time in ms to srand() and then calling rand() may be a quick and dirty way to go.
We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example via rand() < 0.5) and decide, according to these, in which subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall. Then we use one rand() to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to just choosing the binary representation of the output 0.b_1 b_2 b_3 ... one digit b_i at a time, and the final step appends the representation of rand() to the end of the output.
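A minimal Python sketch of the construction described above, with random.random() standing in for rand():

import random

def uniform_via_bits(k):
    # Choose the subinterval: k random booleans become the bits of m.
    m = 0
    for _ in range(k):
        m = (m << 1) | (random.random() < 0.5)
    # One final rand() fills in the rest: m*2^{-k} + rand()*2^{-k}.
    return (m + random.random()) / 2 ** k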
It depends on the definition of "more random". If you use more generators, you have more random state, which means the cycle length will be greater. But cycle length is just one property of a random generator. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, as in some kinds of simulation).
However, if you combine two bad random generators, the result doesn't necessarily become better; you have to analyze it. But there are generators which do work this way. KISS, for example, combines three not-too-good generators, and the result is a good generator.
For card shuffling, you'll need a cryptographic RNG. Even a very good but non-cryptographic RNG is inadequate for this purpose. For example, the Mersenne Twister, while a good RNG, is not suitable for secure card shuffling: by observing its output it is possible to figure out its internal state, so the shuffle result can be predicted.
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
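For a sense of scale: a 52-card deck has 52! ≈ 8×10^67 possible orderings, so a generator with, say, n = 32 bits of state can reach at most 2^32 ≈ 4.3×10^9 of them, and even 2^64 ≈ 1.8×10^19 covers only a vanishing fraction.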
Tinkering with a random number generator, as you are doing by using only one bit of its output at a time and then calling it iteratively, usually weakens its random properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining the results.
First, for context: I am working on a game where doing something good earns you positive credits and doing something bad earns you negative credits. Each credit corresponds to flipping a biased coin: if you get heads then something happens (good if it's a positive credit, bad if it's a negative credit), and otherwise nothing happens.
The deal is that I want to handle multiple credits and fractional credits, and I would like flips to use up credits, so that if something good/bad happens then the leftover credits carry over. A straightforward way of doing this is to just perform a bunch of trials; in particular, for fractional credits we can multiply the number of credits by X and the likelihood of something happening by 1/X (the distribution has the same expectation but slightly different weights). Unfortunately, this places a practical limit on how many credits the user can get, and also on how many decimal places the number of credits can have, since it results in an unbounded amount of work.
What I would like to do is take advantage of the fact that I am sampling the continuous negative binomial distribution, which is the distribution of how many trials it takes to get heads: if f(X) is the distribution, then f(X) gives the probability that there will be X tails before we run into a heads, where X need not be an integer. If I can sample this distribution, then I can check whether X is greater or less than the number of credits: if it is greater, we use up all of the credits but nothing happens; if it is less than or equal, something good happens and we subtract X from the number of credits. Furthermore, because the distribution is continuous, I can easily handle fractional credits.
Does anyone know of a way for me to be able to efficiently sample the continuous negative binomial distribution (that is, a function that generates random numbers from this distribution)?
This question may be better answered on StatsExchange, but here I will take a stab at it.
You are correct that trying to compute this directly will be computationally expensive, as you cannot avoid the beta and/or gamma function dependencies. The only statistically valid approximation I'm aware of applies when the required number of successes s is large and p is neither very small nor very large; then you can approximate the distribution with a normal distribution using special values for the mean and variance. You can read more here, but I'm guessing this approximation will not be generally applicable for you.
The negative binomial distribution can also be approximated as a mixture of Poisson distributions, but this doesn't save you from the gamma function dependency.
The only efficient class of negative binomial samplers that I'm aware of uses optimized accept-reject techniques. Pages 10-11 of this PDF here describe the concept behind the method. Page 6 (page 295 internally) of this PDF here contains source code for sampling binomial deviates using related techniques. Note that even these methods still require uniform random deviates as well as sqrt(), log(), and gammln() calls. For small numbers of trials (less than 100, maybe?) I wouldn't be surprised at all if just simulating the trials with a fast random number generator is faster than even the accept-reject techniques. Definitely start by getting a fast PRNG; they are not all created equal.
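For reference, the "just simulate the trials" baseline mentioned above is tiny. A Python sketch for the single-event case (r = 1), where p is the per-trial probability of the "heads" outcome from the question:

import random

def trials_before_event(p):
    # Flip the biased coin until it comes up; return how many misses preceded it.
    misses = 0
    while random.random() >= p:
        misses += 1
    return misses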
Edit:
The following pseudo-code should be fairly efficient for drawing a random discrete negative-binomial-distributed value, as long as p is not too large (too close to 1.0). It returns the number of trials required before reaching your first "desired" outcome (which is actually the first "failure" in terms of the distribution):
// assume p and r are the parameters of the neg. binomial dist.
// r = number of failures (you'll set it to one for your purpose)
// p = probability of a "success"
double rnd = _rnd.nextDouble();        // uniform in [0.0, 1.0)
int k = 0;                             // # of successes that occur before the 1st failure
double lastPmf = Math.pow(1 - p, r);   // PMF at k = 0
double cdf = lastPmf;
while (cdf < rnd)
{
    lastPmf *= p * (k + r) / (k + 1);  // recurrence: PMF(k+1) from PMF(k)
    cdf += lastPmf;
    k++;
}
return k;
// or return (k + 1) to also count the trial on which the failure occurred
Using the recurrence relation saves recomputing the factorials independently at each step. I think using this, combined with limiting your fractional precision to 1 or 2 decimal places (so you only need to multiply by 10 or 100 respectively), might work for your purposes. You are drawing only one random number and the rest is just multiplications--it should be quite fast.
I want to use rejection sampling to generate random numbers from a given distribution. I want to be quite general, so I don't want to rely on things like the Box-Muller transform, which can generate only normally distributed random numbers. I am using a linear congruential generator to produce a random sequence between 0 and 1 with uniform distribution. To use rejection sampling, I need two sequences of random numbers so that I can generate uniform points inside a 2-D region (one sequence for the x coordinate and another for the y coordinate). I searched the Internet, but nowhere did I see how to make sure that these two sequences are really uncorrelated. Is there any way to choose seeds for them such that the sequences are uncorrelated? If I pick seeds arbitrarily, the final distribution of the numbers is not quite what I am looking for.
Thank you
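For context, the basic 2-D rejection step being described looks roughly like this (a Python sketch; random.Random instances stand in for the two LCG streams, and pdf, lo, hi, pdf_max are whatever target density and bounding box you have in mind):

import random

def rejection_sample(pdf, lo, hi, pdf_max, rng_x, rng_y):
    # Draw (x, y) uniformly in the box [lo, hi] x [0, pdf_max], using one
    # uniform sequence per coordinate; keep x when the point is under the curve.
    while True:
        x = lo + (hi - lo) * rng_x.random()
        y = pdf_max * rng_y.random()
        if y < pdf(x):
            return x

# e.g. two independently seeded generators, one per coordinate
sample = rejection_sample(lambda x: 2 * x, 0.0, 1.0, 2.0,
                          random.Random(12345), random.Random(67890))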
This is more a mathematical problem; nonetheless, I am looking for an algorithm in pseudocode to solve it.
Given is a one-dimensional coordinate system with a number of points. The coordinates of the points may be floating point.
Now I am looking for a factor that scales this coordinate system so that all points land on fixed numbers (i.e. integer coordinates).
If I am not mistaken, there should be a solution to this problem as long as the number of points is not infinite.
If I am wrong and there is no analytical solution, I am interested in an algorithm that approximates the solution as closely as possible (i.e. the coordinates will look like 15.0001).
If you are interested in the concrete problem:
I would like to overcome the well-known pixel-snapping problem in Adobe Flash, which cuts off half-pixels at the border of bitmaps if the whole stage is scaled. I would like to find an ideal scaling factor for the stage that places my bitmaps on whole (screen-)pixel coordinates.
Since I am placing two bitmaps on the stage, the number of points will be 4 in each direction (x, y).
thanks!
As suggested, you have to convert your floating point numbers to rational ones. Fix a tolerance epsilon, and for each coordinate, find its best rational approximation within epsilon.
An algorithm and the relevant definitions are outlined in this section.
Once you have converted all the coordinates into rational numbers, the scaling is given by the least common multiple of the denominators.
Note that this latter number can become quite huge, so you may want to experiment with epsilon to keep the denominators under control.
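A minimal Python sketch of this approach, using fractions.Fraction.limit_denominator for the best-rational-approximation step and math.lcm (Python 3.9+) for the scale factor; the tolerance is controlled indirectly through max_denominator, and the example coordinates are made up:

from fractions import Fraction
from math import lcm

def integer_scale(coords, max_denominator=10_000):
    # Best rational approximation of each coordinate with a bounded denominator;
    # the least common multiple of the denominators is then the scale factor.
    fracs = [Fraction(c).limit_denominator(max_denominator) for c in coords]
    return lcm(*(f.denominator for f in fracs))

print(integer_scale([15.0001, 7.25, 3.5]))   # -> 10000 for these made-up values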
My own inclination, if I were in your situation, would be to work with rational numbers rather than floating point.
And the algorithm you are looking for is finding the lowest common denominator.
A floating point number is an integer, multiplied by a power of two (the power might be negative).
So, find the largest necessary power of two among your inputs, and that gives you a scale factor that will work. The power of two isn't just -1 times the exponent of the float; it's a few more than that (depending on where the least significant 1 bit sits in the significand).
It's also optimal: if x times a power of 2 is an odd integer, then x in its float representation was already in simplest rational form, and there is no smaller integer you can multiply x by to get an integer.
Obviously, if you have a mixture of large and small values among your inputs, the resulting integers will tend to be bigger than 64 bits. So there is an analytical solution, but perhaps not a very good one given what you want to do with the results.
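A Python sketch of this power-of-two approach, using float.as_integer_ratio() in place of manual bit twiddling (the example values are made up):

def power_of_two_scale(coords):
    # as_integer_ratio() returns (numerator, denominator) with the denominator a
    # power of two, so the largest denominator already makes every value integral.
    return max(x.as_integer_ratio()[1] for x in coords)

print(power_of_two_scale([0.5, 0.75, 3.140625]))   # -> 64; 3.140625 is exactly 201/64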
Note that this approach treats floats as being precise representations, which they are not. You may get more sensible results by representing each float as a rational number with smaller denominator (within some defined tolerance), then taking the lowest common multiple of all the denominators.
The problem there, though, is the approximation process: if the input float is 0.334[*], then I can't in general be sure whether the person who gave it to me really meant 0.334, or whether it's 1/3 with some inaccuracy. I therefore don't know whether to use a scale factor of 3 and say the scaled result is 1, or a scale factor of 500 and say the scaled result is 167. And that's with just 1 input, never mind a bunch of them.
With 4 inputs and an allowed final tolerance of 0.0001, you could perhaps find the 10 closest rationals to each input up to a certain maximum denominator, then try the 10^4 combinations and see whether the resulting scale factor gives you any values that are too far from an integer. Brute force seems nasty, but you might at least be able to bound the search a bit as you go. Also, "maximum denominator" might be better expressed in terms of the primes present in its factorization rather than just its magnitude, since if you can find a lot of common factors among the denominators then they'll have a smaller lcm, and hence a smaller deviation from integers after scaling.
[*] Not that 0.334 is an exact float value, but that sort of thing. Decimal examples are easier.
If you are talking about single precision floating point numbers, then the number can be expressed like this according to wikipedia:
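The formula in question is the standard IEEE 754 single-precision decomposition; for normal numbers it is essentially value = (-1)^s * 2^(e-127) * (1 + m/2^23), where s is the sign bit, e the 8-bit exponent field, and m the 23-bit fraction field read as an integer.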
From this formula you can deduce that you always get an integer if you multiply by 2^(127+23). (Actually, when e is 0 you have to use another formula for the special range of "subnormal" numbers, so 2^(126+23) is sufficient. See the linked wikipedia article for details.)
To do this in code you will probably need to do some bit twiddling to extract the factors in the above formula from the bits in the floating point value. And then you will need some kind of support for unlimited size numbers to express the integer result of the scaling (e.g. BigInteger in .NET). Normal primitive types in most languages/platforms are typically limited to much smaller sizes.
It's really a problem in statistical inference combined with noise reduction. This is the method I'm going to try out soon. I'm assuming you're trying to get a regularly spaced 2-D grid but a similar method could work on a regularly spaced grid of 3 or more dimensions.
First, tabulate all the differences, noting that (dx,dy) and (-dx,-dy) denote the same displacement, so there's an equivalence relation. Group those differences that are within a pre-assigned threshold (epsilon) of one another. Epsilon should be large enough to capture measurement errors due to random noise or lack of image resolution, but small enough not to accidentally merge distinct clusters.
Sort the clusters by their average size, dr = sqrt(dx^2 + dy^2).
If the original grid was indeed regularly spaced and generated by two independent basis vectors, then the two smallest linearly independent clusters will reveal them. The smallest cluster is the one centered on (0, 0). The next smallest cluster, (dx0, dy0), gives the first basis vector up to sign (recall that (-dx0, -dy0) denotes the same displacement).
The next smallest clusters may be linearly dependent on this (up to the threshold epsilon) by virtue of being multiples of (dx0, dy0). Find the smallest cluster which is NOT a multiple of (dx0, dy0). Call this (dx1, dy1).
Now you have enough to tag the original vectors. Sort the vectors in increasing lexicographic order ((x,y) > (x',y') if x > x', or x = x' and y > y'). Take the smallest, (x0,y0), and assign the integer pair (0, 0) to it. For each of the others, (x,y), find the decomposition (x,y) = (x0,y0) + M0(x,y)*(dx0,dy0) + M1(x,y)*(dx1,dy1) and assign it the integer pair (m0(x,y), m1(x,y)) = (round(M0), round(M1)).
Now do a least-squares fit of the integer labels to the vectors, using the equations
(x,y) = (ux,uy) + m0(x,y)*(u0x,u0y) + m1(x,y)*(u1x,u1y)
to find (ux,uy), (u0x,u0y) and (u1x,u1y). This identifies the grid.
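A sketch of that least-squares step in Python (assuming numpy, and that the integer labels m0, m1 have already been assigned as above):

import numpy as np

def fit_grid(points, m0, m1):
    # points: (N, 2) array of observed (x, y); m0, m1: length-N integer labels.
    # Each row of the design matrix is [1, m0_i, m1_i]; solving in the least-squares
    # sense gives the origin (ux, uy) and the basis vectors (u0x, u0y), (u1x, u1y).
    A = np.column_stack([np.ones(len(points)), m0, m1])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(points, dtype=float), rcond=None)
    u, u0, u1 = coeffs
    return u, u0, u1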
Test this match to determine whether or not all the points are within a given threshold of this fit (maybe using the same threshold epsilon for this purpose).
The 1-D version of this same routine should also work on a spectrograph to identify the fundamental frequency of a voice print. In that case, the assumed value for ux (which replaces (ux,uy)) is just 0, and one is only looking for a fit to the homogeneous equation x = m0(x)*u0x.