I am writing an application where I need a persistent random integer within a range for each day. The number should be different but persistent for each passing day. The output should be as uniform as possible, but distribution quality doesn't have to be the best. I'd prefer a simple and "good enough" solution to this problem.
What kind of algorithm can I use for this?
Input: Current day (for example, an integer denoting days since some epoch)
Output: A random integer between X and Y
Thank you.
Edit: The platform I'm working in does not have a seeded PRNG implementation.
Algorithm:
Seed RNG with current day
Generate one random number
Take the result mod (y - x + 1), then add x
Replace step three with a smarter algorithm if you want uniform probabilities.
EDIT: ok, you don't have a PRNG. Then you might want to apply some hash algorithm to the current date and treat that as a random number.
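The hash idea can be sketched as follows in Python. This is only an illustration under assumptions: SHA-256 is an arbitrary choice of hash (any well-mixing hash your platform offers should do), and the function name daily_number is made up for this example.

```python
import hashlib

def daily_number(day: int, low: int, high: int) -> int:
    """Map a day index to a repeatable integer in [low, high] (inclusive)."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Hash the day index; SHA-256 is an assumption, any well-mixing hash works.
    digest = hashlib.sha256(str(day).encode("ascii")).digest()
    value = int.from_bytes(digest[:8], "big")  # take 64 bits of the hash
    span = high - low + 1
    # Modulo bias is negligible while span is tiny compared with 2**64.
    return low + value % span
```

Calling daily_number(days_since_epoch, X, Y) returns the same value all day and an unrelated-looking value the next day, with no PRNG state to keep.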
I'm not sure if you want to write the algorithm yourself or just need a programming solution.
For the latter, you could use something along these lines:
new Random((DateTime.Today - new DateTime(1970,1,1)).Days)
.Next(min, max)
That's in C# but you get the idea:
use a fixed beginning date
count the days since then
use that number of days as the seed for the random number generator
get a number within your bounds using a utility function, e.g. Next(min, max)
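For comparison, the same steps might look like this in Python, assuming a platform that does have a seedable generator. The name number_for is hypothetical. Note that randint includes the upper bound, whereas Next(min, max) in C# excludes it.

```python
import datetime
import random

EPOCH = datetime.date(1970, 1, 1)  # fixed beginning date, as in the C# snippet

def number_for(day: datetime.date, low: int, high: int) -> int:
    """Return the same integer in [low, high] for every call on a given date."""
    days_since_epoch = (day - EPOCH).days
    rng = random.Random(days_since_epoch)  # seed with the day count
    return rng.randint(low, high)          # inclusive on both ends
```

A typical call would be number_for(datetime.date.today(), 1, 100).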
Let's say I got three pseudo random numbers from different pseudo random number generators.
Since the generators would reflect only a part of the real random number generating process, I believe that one way to get a number closer to real random might be to somehow get a "center" of the three pseudo random numbers.
An easy way to get that "center" would be to take average, median or mode (if any) of them.
I am wondering if there's a more sophisticated way due to the fact that they should represent random numbers.
Well, there is an approach called an entropy extractor, which lets you get (good) random numbers from not-quite-random source(s).
If you have three independent but somewhat low quality (biased) RNGs, you could combine them together into uniform source.
Suppose you have three generators giving you a single byte each, then uniform output would be
t = X*Y + Z
where addition and multiplication are done over the finite field GF(2^8).
Some code (Python)
def RNG1():
    return ... # single random byte
def RNG2():
    return ... # single random byte
def RNG3():
    return ... # single random byte
from pyfinite import ffield
def muRNG():
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    GF = ffield.FField(8)
    return GF.Add(GF.Multiply(X, Y), Z)
Paper where this idea was stated
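If pulling in a library is not an option, the field arithmetic can be written out by hand. The sketch below assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B); this may differ from the polynomial pyfinite picks by default, and the names gf256_mul and combine are made up for this example.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8) modulo the given irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce once the degree reaches 8
        b >>= 1
    return result

def combine(x: int, y: int, z: int) -> int:
    """Return x*y + z over GF(2^8) for three bytes."""
    return gf256_mul(x, y) ^ z
```

Here combine(RNG1(), RNG2(), RNG3()) plays the role of muRNG() above.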
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values gives more randomness, wouldn't more be even better? It turns out that if you take either the average or median of n Uniform(0,1) values, as n→∞ these both converge to 0.5, a point. It also happens to be the case that replacing distributions with a "representative" constant is generally a bad idea if you want to understand stochastic systems. As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively larger on average. However, if the arrival and service distributions are constant, queues remain at zero length until the arrival rate exceeds the service rate, at which point they go to infinity. When the rates are equal, the stochastic queue would have infinite queues, while the deterministic queue would remain at its initial length (usually assumed to be zero). Infinity and zero are about as wildly different as you can get, illustrating that replacing distributions in a queueing model with their means would give you no understanding of how queues actually work.
Next, empirical evidence. Below are histograms of the medians and averages constructed from 10,000 samples of three uniforms. As you can see, they have different distribution shapes but are clearly no longer uniform. Values bunch in the middle and are progressively rarer towards the endpoints of the range (0,1).
The uniform distribution has maximum entropy for continuous distributions on a closed interval, so both of these alternatives, being non-uniform, are clearly lower entropy, i.e., more predictable.
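The experiment is easy to repeat. The following Python sketch draws 10,000 triples of uniforms and prints coarse five-bin counts for a single uniform, the median and the average. The seed, the bin count and the helper name bin_counts are arbitrary choices for illustration.

```python
import random
import statistics

def bin_counts(values, bins=5):
    """Count how many values in [0, 1) fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

rng = random.Random(12345)  # arbitrary seed so the run is repeatable
triples = [(rng.random(), rng.random(), rng.random()) for _ in range(10_000)]

single_counts = bin_counts([t[0] for t in triples])
median_counts = bin_counts([statistics.median(t) for t in triples])
mean_counts = bin_counts([sum(t) / 3 for t in triples])

print("single :", single_counts)
print("median :", median_counts)
print("average:", mean_counts)
```

The single-uniform counts should come out roughly flat, while the median and average counts pile up in the middle bin.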
To get good random numbers, it's advisable to get some bits of entropy. Depending on whether they are used for security purposes or not, you could just get the time from the system clock as a seed for a random number generator, or use more sophisticated means. The PWGen project on SourceForge is open source and monitors Windows events as a source of random bits of entropy.
You can find more info on how to generate random numbers in C++ in this SO question too: Random number generation in C++11: how to generate, how does it work? It turns out C++'s random numbers aren't always all that random (see Everything You Never Wanted to Know about C++'s random_device), so if you are looking for a quick and dirty way to seed, passing the time in ms to srand() and then calling rand() might do.
We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example by rand()<0.5) and decide according to these on what subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall. Then we use one rand() to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to just choosing the binary representation of the output 0. b_1 b_2 b_3... one digit b_i at a time and the final step is adding the representation of rand() to the end of the output.
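To make the construction concrete, here is a small Python sketch of it, using double-precision floats rather than arbitrary precision. The name refined_rand is hypothetical, and rng.random() stands in for rand().

```python
import random

def refined_rand(k: int, rng: random.Random) -> float:
    """Pick one of 2**k subintervals bit by bit, then place a point inside it."""
    m = 0
    for _ in range(k):
        bit = 1 if rng.random() < 0.5 else 0   # one random boolean per binary digit
        m = (m << 1) | bit
    width = 2.0 ** -k
    # m * width is the left edge of the chosen subinterval.
    return m * width + rng.random() * width
```

With an ideal rand(), the result is again uniform on the interval, so the open question is only whether this helps with an imperfect generator.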
It depends on the definition of "more random". If you use more random generators, you have more random state, and so the cycle length will be greater. But cycle length is just one property of random generators. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, as in some kinds of simulation).
However, if you combine two bad random generators, they don't necessarily become better; you have to analyze the result. But there are generators which do work this way. KISS is one example: it combines three not-too-good generators, and the result is a good generator.
For card shuffling, you'll need a cryptographic RNG. Even a very good but non-cryptographic RNG is inadequate for this purpose. For example, Mersenne Twister, which is a good RNG, is not suitable for secure card shuffling! This is because, by observing enough of its output numbers, it is possible to figure out its internal state, so the shuffle result can be predicted.
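As an illustration only, a shuffle driven by the operating system's cryptographic generator could look like this in Python. The function name secure_shuffle is made up here; secrets.randbelow is part of the standard library.

```python
import secrets

def secure_shuffle(cards):
    """Return a shuffled copy of cards using the OS's cryptographic RNG."""
    deck = list(cards)
    # Fisher-Yates shuffle, drawing each index from a cryptographic source.
    for i in range(len(deck) - 1, 0, -1):
        j = secrets.randbelow(i + 1)   # uniform integer in [0, i]
        deck[i], deck[j] = deck[j], deck[i]
    return deck
```

For a 52-card deck this would be called as secure_shuffle(range(52)).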
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
Tinkering with a random number generator, as you are doing by using only one bit of random space and then calling it iteratively, usually weakens its random properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining them.
I'm working on a project with multiple functions. I call a pseudorandom number several times, each in different functions, and then do some math on it. For example:
f(i,j)*random(i,j)
I assume that in the different functions the pseudorandom number isn't equal to the pseudorandom number in another function at a given i and j. Is that a correct assumption? If so, how is it possible to change that?
If it matters, the language I'm using is Xojo, which is similar to VB6.
I'm not really sure what the question is, but hopefully giving some basics of pseudo-random number generators (PRNG) will answer it:
This is more of a language feature, but usually calling the same function (i.e. random) is independent of where you call it from (there may be other determining factors).
random(i,j) may or may not return the same number twice in a row or after some time. It's (pseudo-)random, we just don't know whether it will.
If you want random(i,j) to always return the same value, you can consider writing your own function that maps some value of i and j to another value using some formula, or you can store all previous generated numbers in a map, and simply return this value if it exists.
If you want random(i,j) to never return the same value, consider generating numbers from i to j and shuffling them and simply returning the next value in the list repeatedly.
You can usually set the seed of a PRNG. If you get some sequence after setting the seed to some value, you will get the same sequence when you set the seed to the same value at some other time. This doesn't really serve much of a practical purpose (that I can think of) beyond giving you the capability to reproduce previous results exactly.
If you want random(i,j) to return the same random number each time it's called with the same i,j you could simply save the state.
One approach would be to store the state in an n x n matrix R (where n is the range of i,j). On the first call to random(i,j) set R(i,j) = rand(). On subsequent calls retrieve the existing value.
If the range of i,j is very large and the values sparse use a hash table for R instead of a matrix.
The awk manual says srand "sets the seed (starting point) for rand()". I used srand(5) with the following code:
awk 'BEGIN {srand(5);while(1)print rand()}'> /var/tmp/rnd
It generates numbers like:
0.177399
0.340855
0.0256178
0.838417
0.0195347
0.29598
Can you explain how srand(5) generates the "starting point" with the above output?
The starting point is called the seed. It is given to the first iteration of the rand function. After that, rand uses the value it produced when calculating the previous number to generate the next number. Using a prime number for the seed is a good idea.
PRNGs (pseudo-random number generators) produce random values by keeping some kind of internal state which can be advanced through a series of values whose repeating period is very large, and whose successive values have very few apparent statistical correlations as long as we use far fewer of them than the period. But nonetheless, the values form a deterministic sequence.
"Seeding" a PRNG is basically selecting what point in the deterministic sequence to start at. The algorithm will take the number passed as the seed and compute (in some algorithm-specific way) where to start in the sequence. The actual value of the seed is irrelevant--the algorithm should not depend on it in any way.
But, although the seed value itself does not directly participate in the PRNG algorithm, it does uniquely identify the starting point in the sequence, so if you give a particular seed and then generate a sequence of values, seeding again with the same value should cause the PRNG to generate the same sequence of values.
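A toy generator makes this visible. The Python sketch below is a plain linear congruential generator using the constants published in Numerical Recipes. It is emphatically not the algorithm awk uses, so it will not reproduce the numbers in the question; it only shows how a seed fixes the starting state.

```python
class TinyLCG:
    """Toy linear congruential generator; NOT the algorithm awk uses."""
    MODULUS = 2 ** 32
    MULTIPLIER = 1664525       # constants from Numerical Recipes
    INCREMENT = 1013904223

    def __init__(self, seed: int):
        self.state = seed % self.MODULUS   # the seed picks the starting state

    def rand(self) -> float:
        # Each output is computed from the previous internal state.
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state / self.MODULUS

first = TinyLCG(5)
second = TinyLCG(5)
run_a = [first.rand() for _ in range(5)]
run_b = [second.rand() for _ in range(5)]
print(run_a == run_b)   # same seed, same sequence
```

Seeding with a different value, such as TinyLCG(6), starts the same deterministic sequence from a different point.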
I am using a C# implementation of Mersenne Twister I downloaded from CenterSpace. I have two problems with it:
No matter how I seed the algorithm it does not pass DieHard tests, and by that I mean I get quite a lot of 1s and 0s for p-value. Also my KStest on 269 p-values is 0. Well, I cannot quite interpret p-value, but I think a few 1s and 0s in the result is bad news.
I have been asked to visually show the randomness of the numbers. So I plot the numbers as they are generated, and this does not seem random at all. Here are two screenshots of the result after a few seconds and a few seconds later. As you can see in the second screenshot, the numbers fall on some parallel lines. I have tried different algorithms to map numbers to points. They all result in parallel lines, but with different angles! This is how I mapped numbers to points for these screenshots: new Point(number % _canvasWidth, number % _canvasHeight). As you may guess, the visual result depends on the form's width and height, and this is a disastrous result.
Here are a few ways I tried to seed the algorithm:
User entry. I enter some numbers to seed the algorithm as an int array.
Random numbers generated by the algorithm itself!!
An array of new Guid().GetHashCode()
What am I missing here? How should I seed the algorithm? How can I get it to pass the DieHard tests?
While I cannot speak to your first point, the second problem has to do with how you are computing the points to draw on. Specifically,
x = number % _canvasWidth;
y = number % _canvasHeight;
will give you a "pattern" that corresponds somewhat to the aspect ratio of the window you are drawing to. For example, if _canvasWidth and _canvasHeight were equal, you would always draw on a single diagonal line as x and y would always be the same. This graphical representation wouldn't be appropriate in this case, then.
What about taking the N bits of the RNG output and using half for the x coordinate and the other half for the y coordinate? For those bits that fall out of the bounds of your window you might want to consider two options:
Don't draw them (or draw them offscreen)
Perform a linear interpolation to map the range of bits to the width/height of your window
Either option should give you a more representative picture of the bits you are getting out of your random number generator. Good luck!
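A rough sketch of the split-and-scale idea, written in Python for brevity and assuming the generator returns 32-bit unsigned values (to_point is a hypothetical helper name):

```python
def to_point(number: int, width: int, height: int) -> tuple:
    """Map a 32-bit value to a pixel using separate bits for x and y."""
    high = (number >> 16) & 0xFFFF     # upper 16 bits drive x
    low = number & 0xFFFF              # lower 16 bits drive y
    x = high * width // 0x10000        # linear scaling into [0, width)
    y = low * height // 0x10000        # linear scaling into [0, height)
    return x, y
```

Because x and y come from disjoint bits, the window's aspect ratio no longer creates a relationship between them.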
Your stripy point-plotting problem should easily be fixed by generating a new random number for each of the x and y coordinates. Trying to reuse a single generated number for x and y is basically premature optimization, but if you do go down that route, make sure you extract different bits for each from the number; as is, x=n%width;y=n%height gives you enormous correlation between x and y, as can be seen in your images.
I've been using various C++ Mersenne Twister implementations for years (most recently boost's) to generate random points and had no difficulties with it (seed related or otherwise). It really is a superb generator.
True random number generation cannot be done with a mathematical function. If it's important to have truly random numbers, get a hardware random number generator. I've developed real money online poker games—such hardware is the only way to be confident there are no patterns in the numbers.
If targeting a Linux environment, the /dev/random and /dev/urandom pseudo devices do a lot better than a mathematical generator, since they incorporate random numbers representing hardware activity.
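From application code, the kernel's generator is usually reached through a wrapper rather than by opening the device file. In Python, for instance, os.urandom draws from the operating system's source (on Linux, the same kernel generator that backs /dev/urandom). The helper below is a hypothetical sketch that turns those bytes into an unbiased integer below a limit.

```python
import os

def os_random_below(limit: int) -> int:
    """Return an integer in [0, limit) using bytes from the OS generator."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    nbytes = (limit.bit_length() + 7) // 8
    top = 256 ** nbytes
    cutoff = top - top % limit      # values at or above this would add bias
    while True:
        value = int.from_bytes(os.urandom(nbytes), "big")
        if value < cutoff:          # otherwise reject and draw again
            return value % limit
```

For example, os_random_below(52) picks a card index.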