The awk manual says srand "sets the seed (starting point) for rand()". I used srand(5) with the following code:
awk 'BEGIN {srand(5);while(1)print rand()}'> /var/tmp/rnd
It generates numbers like:
0.177399
0.340855
0.0256178
0.838417
0.0195347
0.29598
Can you explain how srand(5) generates the "starting point" with the above output?
The starting point is called the seed. It is fed into the first invocation of the rand function; after that, rand uses the value it previously produced as the input for generating the next number. (Some people suggest using a prime number for the seed, but for most generators the particular value you choose should not matter, as explained below.)
PRNGs (pseudo-random number generators) produce random-looking values by keeping some kind of internal state which can be advanced through a series of values whose repeating period is very large, and whose successive values show very few apparent statistical correlations as long as we draw far fewer values than the period. Nonetheless, the values form a deterministic sequence.
"Seeding" a PRNG essentially selects the point in that deterministic sequence at which to start. The algorithm takes the number passed as the seed and computes (in some algorithm-specific way) where in the sequence to begin. The actual value of the seed is irrelevant: the quality of the output should not depend on it in any way.
But, although the seed value itself does not directly participate in the PRNG algorithm, it does uniquely identify the starting point in the sequence, so if you give a particular seed and then generate a sequence of values, seeding again with the same value should cause the PRNG to generate the same sequence of values.
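As a minimal Python sketch of this reproducibility property (Python's random module stands in for awk's rand/srand here):

import random

gen = random.Random()
gen.seed(5)                         # same role as srand(5) in the awk example
first = [gen.random() for _ in range(3)]

gen.seed(5)                         # re-seed with the same value
second = [gen.random() for _ in range(3)]

assert first == second              # identical seed, identical sequence
print(first)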
Related
Let's say I got three pseudo random numbers from different pseudo random number generators.
Since the generators would reflect only a part of the real random number generating process, I believe that one way to get a number closer to real random might be to somehow get a "center" of the three pseudo random numbers.
An easy way to get that "center" would be to take average, median or mode (if any) of them.
I am wondering if there's a more sophisticated way due to the fact that they should represent random numbers.
Well, there is an approach, called an entropy extractor, which lets you get good random numbers from not-quite-random sources.
If you have three independent but somewhat low-quality (biased) RNGs, you can combine them into a single close-to-uniform source.
Suppose each of the three generators gives you a single byte; then a uniform output would be
t = X*Y + Z
where addition and multiplication are done over the finite field GF(2^8).
Some code (Python)
from pyfinite import ffield

def RNG1():
    return ...  # single random byte

def RNG2():
    return ...  # single random byte

def RNG3():
    return ...  # single random byte

def muRNG():
    # combine the three sources: X*Y + Z over GF(2^8)
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    GF = ffield.FField(8)
    return GF.Add(GF.Multiply(X, Y), Z)
Paper where this idea was stated
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values give more randomness, wouldn't more be even better? It turns out that if you take either the average or median of n Uniform(0,1) values, as n → ∞ both converge to 0.5, a single point. More generally, replacing distributions with a "representative" constant is a bad idea if you want to understand stochastic systems. As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively longer on average. However, if the arrival and service distributions are constant, queues remain at zero length until the arrival rate exceeds the service rate, at which point they grow to infinity. When the rates are equal, the stochastic queue grows without bound, while the deterministic queue remains at its initial length (usually assumed to be zero). Infinity and zero are about as different as you can get, illustrating that replacing the distributions in a queueing model with their means would give you no understanding of how queues actually work.
Next, empirical evidence. Histograms of the medians and averages constructed from 10,000 samples of three uniforms show different distribution shapes, but both are clearly no longer uniform: values bunch in the middle and become progressively rarer toward the endpoints of the range (0, 1).
The uniform distribution has maximum entropy among continuous distributions on a closed interval, so both of these alternatives, being non-uniform, have strictly lower entropy, i.e., they are more predictable.
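If you want to reproduce that experiment, here is a quick Python sketch along the same lines (10,000 samples of three Uniform(0,1) draws each; the actual histogram plotting is left out):

import random
import statistics

N = 10_000
medians, averages = [], []
for _ in range(N):
    xs = [random.random() for _ in range(3)]   # three Uniform(0,1) draws
    medians.append(statistics.median(xs))
    averages.append(sum(xs) / 3)

# Both spreads are visibly smaller than the standard deviation of a
# single uniform (1/sqrt(12) ~ 0.2887), i.e. values bunch in the middle.
print(statistics.stdev(medians), statistics.stdev(averages))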
To get good random numbers, it's advisable to gather some bits of entropy. Depending on whether or not they are used for security purposes, you could just take the time from the system clock as a seed for a random number generator, or use more sophisticated means. The PWGen project on SourceForge is open source and monitors Windows events as a source of random bits of entropy.
You can find more on how to generate random numbers in C++ in this SO question: Random number generation in C++11: how to generate, how does it work?. It turns out C++'s random numbers aren't always all that random: Everything You Never Wanted to Know about C++'s random_device. So looking for a good way to seed, e.g. by passing the time in ms to srand() and calling rand(), might be a quick and dirty way to go.
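For what it's worth, in Python you can skip the clock entirely and seed from the OS entropy pool (a sketch; os.urandom is the standard cross-platform call):

import os
import random

seed = int.from_bytes(os.urandom(8), "big")   # 64 bits from the OS pool
random.seed(seed)
print(random.random())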
We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example by rand() < 0.5) and use them to decide in which subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall. Then we use one rand() to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to just choosing the binary representation of the output 0.b_1 b_2 b_3 ... one digit b_i at a time, and the final step appends the representation of rand() to the end of the output.
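For concreteness, the construction described above looks like this in Python (random.random() stands in for rand(); ordinary floats rather than arbitrary precision):

import random

def bitwise_uniform(k):
    # k random booleans choose the subinterval index m in [0, 2^k)
    m = 0
    for _ in range(k):
        bit = 1 if random.random() < 0.5 else 0
        m = (m << 1) | bit
    # final output: m*2^-k + rand()*2^-k, i.e. (m + rand()) / 2^k
    return (m + random.random()) / 2**k

print(bitwise_uniform(8))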
It depends on the definition of "more random". If you use more random generators, you have more random state, which means the cycle length will be greater. But cycle length is just one property of a random generator. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, as in some kinds of simulation).
However, if you combine two bad random generators, the result doesn't necessarily get better; you have to analyze it. That said, there are generators that do work this way. KISS, for example, combines three not-too-good generators, and the result is a good generator.
For card shuffling, you'll need a cryptographic RNG. Even a very good, but not cryptographic, RNG is inadequate for this purpose. For example, Mersenne Twister, which is a good RNG, is not suitable for secure card shuffling: by observing output numbers it is possible to recover its internal state, so the shuffle result can be predicted.
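In Python terms, that means shuffling with the OS-backed CSPRNG instead of the default Mersenne Twister generator (a minimal sketch, not a full card engine):

import random

deck = list(range(52))            # card indices 0..51
csprng = random.SystemRandom()    # OS entropy, not Mersenne Twister
csprng.shuffle(deck)              # shuffle an observer cannot predict
print(deck)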
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
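A minimal sketch of the two-generator idea applied to the construction in the question (the seed values 12345 and 67890 are arbitrary placeholders):

import random

gen_bits = random.Random(12345)   # one generator for the k leading bits
gen_tail = random.Random(67890)   # a second, independently seeded generator

def two_source_uniform(k):
    m = gen_bits.getrandbits(k)              # choose the subinterval
    return (m + gen_tail.random()) / 2**k    # fill in the remaining bits

print(two_source_uniform(8))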
Tinkering with a random number generator, as you are doing by using only one bit of its output at a time and then calling it iteratively, usually weakens its statistical properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining them.
I have a homework problem on running a simulation where I generate 100 random numbers and perform a calculation on each outcome. The next question asks me to repeat the previous question but with a different stream of pseudorandom numbers. The side note tells me to perform the two computations within one call to the program, because changing the seed/state arbitrarily can lead to overlapping streams.
Can someone explain to me what this means? Why do I have to do it through 1 loop?
Why can't I just call the same code twice using a different seed each time?
Pseudo-random number generators (PRNGs) work by iterating through a deterministic set of calculations on some internal information known as the generator's state, and then handing you back a value which is based on the state. There's a finite amount of state information that determines what the next state, and thus the next outcome, will be. Since it's finite, eventually the generator will revisit a state that it used before, and from that point forward all values will be exact duplicates of the sequence you've already seen. The PRNG is said to have cycled. "Seeding" a random number generator sets the starting point for the state, so it effectively corresponds to choosing an entry point to the cycle.
If a human intervenes by changing the seed arbitrarily, there's a chance that doing so puts the state back to a point where some portion of the output sequence gets repeated. This is referred to as overlapping streams. The solution is to seed your PRNG once, and then leave it alone so it can traverse its full cycle.
In your case it means that the values and ordering of your first set of 100 numbers will be distinct from the values and ordering of your second set of 100.
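In Python the advice boils down to something like this (the seed value 42 is an arbitrary placeholder):

import random

random.seed(42)   # seed exactly once, at the top of the run
run1 = [random.random() for _ in range(100)]   # first replication
run2 = [random.random() for _ in range(100)]   # continues the same stream

# run1 and run2 are consecutive, non-overlapping segments of one cycle;
# re-seeding between them could have dropped run2 back inside run1.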
I'm working on a project with multiple functions. I call a pseudorandom number several times, each in different functions, and then do some math on it. For example:
f(i,j)*random(i,j)
I assume that in the different functions the pseudorandom number isn't equal to the pseudorandom number in another function at a given i and j. Is that a correct assumption? If so, how is it possible to change that?
If it matters, the language I'm using is Xojo, which is similar to VB6.
I'm not really sure what the question is, but hopefully giving some basics of pseudo-random number generators (PRNG) will answer it:
This is more of a language feature, but usually calling the same function (i.e. random) is independent of where you call it from (there may be other determining factors).
random(i,j) may or may not return the same number twice in a row or after some time. It's (pseudo-)random, we just don't know whether it will.
If you want random(i,j) to always return the same value, you can consider writing your own function that maps some value of i and j to another value using some formula, or you can store all previous generated numbers in a map, and simply return this value if it exists.
If you want random(i,j) to never return the same value, consider generating numbers from i to j and shuffling them and simply returning the next value in the list repeatedly.
You can usually set the seed of a PRNG. This means that if you set the seed to some value and observe some sequence, you will get the same sequence whenever you set the seed to that same value again. This doesn't serve much of a practical purpose (that I can think of) beyond giving you the ability to reproduce previous results exactly.
If you want random(i,j) to return the same random number each time it's called with the same i,j you could simply save the state.
One approach would be to store the state in an n x n matrix R (where n is the range of i,j). On the first call to random(i,j), set R(i,j) = rand(). On subsequent calls, retrieve the existing value.
If the range of i,j is very large and the values are sparse, use a hash table for R instead of a matrix.
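A sketch of the hash-table variant in Python (a dict keyed on the tuple (i, j); random_ij is a made-up name for the wrapper):

import random

_cache = {}

def random_ij(i, j):
    # return the same pseudorandom value every time (i, j) repeats
    if (i, j) not in _cache:
        _cache[(i, j)] = random.random()
    return _cache[(i, j)]

assert random_ij(2, 3) == random_ij(2, 3)   # stable across calls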
Is it possible to generate two different pseudorandom numbers on two separate program runs without using time as the seed? I.e., using the same seed on both runs, is it possible to get two different numbers?
In general, it is not possible to get different pseudorandom numbers using the same seed.
Pseudorandom numbers are, by definition, not truly random and therefore are not composed from sources of entropy. Or, if the numbers do contain some entropy input, the input is not enough for the sequence to qualify as statistically "random." (One property such a sequence should have is that runs of 1-bits of length n occur with probability 2^(-n), among many other properties of statistical randomness. The definition of statistical randomness becomes more sophisticated, in a sense more "actual" or closer to nature, as the mathematics around randomness improves. This is another way of saying that, at any given time, the current definitions of statistical randomness are on their way to becoming outdated.)
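As a rough illustration of that runs property in Python: the chance that n consecutive bits are all ones should be about 2^(-n), which a quick empirical check confirms:

import random

n, trials = 5, 100_000
hits = sum(random.getrandbits(n) == (1 << n) - 1 for _ in range(trials))
print(hits / trials, 2**-n)   # both should be close to 0.03125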
In any case, the vast majority of pseudorandom number generators are, in fact, completely deterministic.
The canonical[1] example of a pseudorandom number generator is the linear feedback shift register (LFSR). An LFSR can be implemented as a digital logic circuit containing a register which holds N bits, plus some number M of gates, where M is much less than N (e.g., M=1 or M=2). These are usually XOR gates, and they "feed back" into the register at certain "tap" bits. There is a lot about this on the web.
Given the same seed input, the LFSR will always generate the same sequence.
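A minimal Python sketch of an LFSR (a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11, a common maximal-length choice; the seed just needs to be nonzero):

def lfsr16(seed, count):
    # 16-bit Fibonacci LFSR; taps 16/14/13/11 give a maximal-length cycle
    state = seed & 0xFFFF
    out = []
    for _ in range(count):
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        out.append(state & 1)
    return out

# the same seed always yields the same bit sequence
assert lfsr16(0xACE1, 32) == lfsr16(0xACE1, 32)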
It is possible, using Walsh-Hadamard matrices (also called "M-matrices" or the "sequency transform"), to sample the output of an LFSR and determine not only that the sequence is in fact from an LFSR, but also the structure of its gates and taps and the current register contents. From this information all sequence values are known, and it is possible to work backwards to the seed values that could have been used as input. For these reasons, LFSRs are not suitable for security purposes such as random tokens for authentication.
[1] By canonical, I am referring to Don Knuth's use of the LFSR as an example, as well as the timeless tradition which has ensued therefrom.
Not sure if you want to generate 2 different random numbers from the same seed, or to avoid doing so! But if you really do want that, then, similar to LFSRs, LCGs (linear congruential generators) are often used to generate deterministic pseudorandom numbers. You can 'easily' create 2 simple LCGs using different constants, which will generate 2 different pseudorandom numbers for the same seed.
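For example (a sketch; the two parameter sets below are the widely quoted glibc-style and Numerical Recipes constants), two LCGs fed the same seed produce different streams:

def lcg(a, c, m, seed):
    # x_{n+1} = (a * x_n + c) mod m
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

g1 = lcg(1103515245, 12345, 2**31, seed=1)       # glibc-style constants
g2 = lcg(1664525, 1013904223, 2**32, seed=1)     # Numerical Recipes constants

print(next(g1), next(g2))   # same seed, two different pseudorandom numbers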