More random numbers - algorithm

So I get that all the built-in functions only return pseudo-random numbers, as they use the clock or some other hardware source to get the number.
So here's my idea: if I take two pseudo-random numbers and bitwise them together, would the result still be pseudo-random, or would it be closer to truly random?
I figured that if I fiddled about with the numbers a bit the result would be less replicable, or am I getting this wrong?
On a side note, why is pseudo-randomness a problem?

It will not be more random, but there is a big risk that the result will be less random (less uniformly distributed). Which bitwise operator were you thinking of?
Let's assume the 4-bit random numbers 0101 and 1000. ORed together they give 1101. With OR there would be a clear bias towards 1111, and with AND towards 0000 (a 75 % chance of getting a 1 or a 0, respectively, in each bit position).
I don't think XOR or XNOR would be biased. But you also wouldn't get any more randomness out of it (see Pavium's answer).
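The bias is easy to check empirically. Below is a minimal sketch (plain standard-library Python, not from the original thread) that combines pairs of uniform 4-bit values with OR, AND and XOR and counts the fraction of set bits; OR and AND drift towards roughly 75 % and 25 % ones, while XOR stays near 50 %:
import random
def bit_fraction(op, trials=100_000):
    """Fraction of set bits in op(a, b) over random 4-bit a and b."""
    ones = 0
    for _ in range(trials):
        a = random.getrandbits(4)
        b = random.getrandbits(4)
        ones += bin(op(a, b)).count("1")
    return ones / (4 * trials)
print("OR :", bit_fraction(lambda a, b: a | b))   # ~0.75
print("AND:", bit_fraction(lambda a, b: a & b))   # ~0.25
print("XOR:", bit_fraction(lambda a, b: a ^ b))   # ~0.50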

Algorithms executed by computers are deterministic.
You can only generate truly random numbers if there's a non-deterministic input.
Pseudo-random numbers follow a repeating sequence. It may be a long sequence, but the repetition makes them predictable and therefore not truly random.
You can't generate truly random numbers from two pseudo-random numbers.
EDITED: to put the sentences in a more logical order.

Related

How to get a representative random number from a set of pseudo random numbers?

Let's say I have three pseudo-random numbers from different pseudo-random number generators.
Since each generator reflects only part of a real random process, I believe one way to get a number closer to truly random might be to somehow take a "center" of the three pseudo-random numbers.
An easy way to get that "center" would be to take their average, median, or mode (if any).
I am wondering if there is a more sophisticated way, given that they are supposed to represent random numbers.
Well, there is an approach called an entropy extractor which allows one to get (good) random numbers from not-quite-random source(s).
If you have three independent but somewhat low-quality (biased) RNGs, you can combine them into a uniform source.
Suppose you have three generators giving you a single byte each; then a uniform output would be
t = X*Y + Z
where addition and multiplication are done over the finite field GF(2^8).
Some code (Python):
from pyfinite import ffield
def RNG1():
    return ...  # single random byte from the first source
def RNG2():
    return ...  # single random byte from the second source
def RNG3():
    return ...  # single random byte from the third source
def muRNG():
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    GF = ffield.FField(8)                # arithmetic over GF(2^8)
    return GF.Add(GF.Multiply(X, Y), Z)  # t = X*Y + Z
Paper where this idea was stated
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values give more randomness, wouldn't more be even better? It turns out that if you take either the average or the median of n Uniform(0,1) values, both converge to 0.5, a single point, as n → ∞.
It also happens to be the case that replacing distributions with a "representative" constant is generally a bad idea if you want to understand stochastic systems. As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively longer on average. However, if the arrival and service distributions are constant, the queue stays at zero length until the arrival rate exceeds the service rate, at which point it grows to infinity. When the rates are equal, the stochastic queue grows without bound, while the deterministic queue stays at its initial length (usually assumed to be zero). Infinity and zero are about as wildly different as you can get, illustrating that replacing the distributions in a queueing model with their means would give you no understanding of how queues actually work.
Next, empirical evidence. Below are histograms of the medians and averages constructed from 10,000 samples of three uniforms each. As you can see, they have different distribution shapes but are clearly no longer uniform: values bunch in the middle and become progressively rarer towards the endpoints of the range (0, 1).
The uniform distribution has maximum entropy for continuous distributions on a closed interval, so both of these alternatives, being non-uniform, are clearly lower entropy, i.e., more predictable.
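The experiment is easy to reproduce. Here is a short sketch (standard-library Python, helper name made up) that takes the median and mean of three Uniform(0,1) draws 10,000 times and bins the results; the bin counts pile up in the middle instead of staying flat:
import random
import statistics
def histogram(values, bins=10):
    """Counts per equal-width bin on [0, 1)."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts
samples = [[random.random() for _ in range(3)] for _ in range(10_000)]
medians = [statistics.median(s) for s in samples]
means = [statistics.mean(s) for s in samples]
print("uniform:", histogram([random.random() for _ in range(10_000)]))  # roughly flat
print("medians:", histogram(medians))   # peaked towards the middle bins
print("means  :", histogram(means))     # peaked even more strongly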
To get good random numbers, it's advisable to gather some bits of entropy. Depending on whether they are used for security purposes or not, you could just use the time from the system clock as a seed for a random number generator, or use more sophisticated means. The project PWGen (on SourceForge) is open source and monitors Windows events as a source of random bits of entropy.
You can find more info on how to generate random numbers in C++ in this SO question too: Random number generation in C++11: how to generate, how does it work? [closed]. It turns out C++'s random numbers aren't always all that random (see Everything You Never Wanted to Know about C++'s random_device), so you need a good way to seed; passing the time in ms to srand() and calling rand() might be a quick and dirty way to go.
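The advice above is phrased for C++, but the same idea in Python (the language used elsewhere in this thread) looks roughly like this; the variable names are made up for illustration:
import os
import random
import secrets
import time
quick = random.Random(int(time.time() * 1000))  # quick and dirty: seed from the clock in ms
better = random.Random(os.urandom(16))          # better: seed from OS-provided entropy
print(quick.random(), better.random())
print(secrets.randbits(128))                    # for security purposes, use a CSPRNG directly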

Combining PRNG and 'true' random, fast and (perhaps) dumb way

Take a fast PRNG like xoroshiro or xorshift and a 'true' entropy-based generator like /dev/random.
Seed the PRNG with 'true' randomness, but also get a single number from the 'true' source and XOR it with every result from the PRNG to produce the final output.
Then replace this number once in a while (e.g. after 10,000 random numbers have been generated).
Perhaps this is naive, but I would hope it improves some aspects of the PRNG, like period size, with negligible impact on speed. What am I getting wrong?
The use case I am concerned about is generating UUIDs (fast), which are basically 128-bit numbers that should be "really unique". My concern is that with modern PRNGs like the xorshift family, whose periods are 'just' around 2^128, the chance of a collision from an entropy-seeded PRNG is not as negligible as it would be with truly random numbers.
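For concreteness, here is a naive sketch of the proposed scheme (not from the original post): a plain 64-bit xorshift seeded from os.urandom, with a true-random XOR mask that is replaced every 10,000 outputs. The class and method names are made up:
import os
MASK64 = (1 << 64) - 1
class MixedRNG:
    def __init__(self, refresh_every=10_000):
        self.state = int.from_bytes(os.urandom(8), "big") | 1  # seed the PRNG from 'true' random (nonzero)
        self.mask = int.from_bytes(os.urandom(8), "big")       # single 'true' random number used as XOR mask
        self.refresh_every = refresh_every
        self.count = 0
    def next(self):
        # one xorshift64 step (Marsaglia's shift constants 13, 7, 17)
        x = self.state
        x ^= (x << 13) & MASK64
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self.state = x
        out = x ^ self.mask                                    # mask the PRNG output
        self.count += 1
        if self.count % self.refresh_every == 0:               # replace the mask once in a while
            self.mask = int.from_bytes(os.urandom(8), "big")
        return out
rng = MixedRNG()
print([hex(rng.next()) for _ in range(3)])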
The improvements are only minor compared to the plain PRNG. For example, the single true random number used for masking the results can be eliminated by taking the XOR of successive outputs: that value is exactly the same as the XOR of the successive plain PRNG numbers. So if you can predict the PRNG, it is not much harder to do the same for the 'improved' sequence.
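That cancellation argument takes one line to verify: within any stretch where the mask K stays constant, out_i = prng_i XOR K, so out_i XOR out_{i+1} equals prng_i XOR prng_{i+1} and the mask contributes nothing. A tiny check with made-up values:
prng = [0x1234, 0xBEEF, 0x0F0F]                 # any three PRNG outputs
K = 0xCAFE                                      # the single 'true random' mask
out = [x ^ K for x in prng]                     # the 'improved' outputs
assert [a ^ b for a, b in zip(out, out[1:])] == [a ^ b for a, b in zip(prng, prng[1:])]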

Random number from many other random numbers, is it more random?

We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example by rand() < 0.5) and use them to decide which subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall into. Then we use one rand() call to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to choosing the binary representation of the output, 0.b_1 b_2 b_3..., one digit b_i at a time, and the final step appends the representation of rand() to the end of the output.
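In code, the construction described above might look like the following sketch (standard-library Python, function name made up); random.random() stands in for rand():
import random
def layered_rand(k=8):
    """Pick the subinterval with k coin flips, then fill in the rest with one rand() call."""
    m = 0
    for _ in range(k):
        m = (m << 1) | (random.random() < 0.5)   # one random bit b_i per level
    return (m + random.random()) * 2 ** -k       # = m*2^-k + rand()*2^-k
print(layered_rand())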
It depends on the definition of "more random". Using more random generators means more random state, which means the cycle length will be greater. But cycle length is just one property of a random generator. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, as in some kinds of simulation).
However, if you combine two bad random generators they don't necessarily become better; you have to analyze the combination. But there are generators which do work this way. KISS is an example: it combines three not-too-good generators, and the result is a good generator.
For card shuffling, you'll need a cryptographic RNG. Even a very good, but non-cryptographic, RNG is inadequate for this purpose. For example, Mersenne Twister, which is a good RNG, is not suitable for secure card shuffling: by observing its output numbers it is possible to figure out its internal state, so the shuffle result can be predicted.
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
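As a quick back-of-the-envelope check of that counting argument (assuming a standard 52-card deck, which the answer doesn't specify): a deck has 52! orderings, which takes about 226 bits to index, so a generator with, say, 64 bits of state can only ever reach a vanishingly small fraction of all shuffles:
import math
print(math.log2(math.factorial(52)))   # ~225.6 bits needed to index every possible shuffle
print(2 ** 64 / math.factorial(52))    # fraction reachable with 64 bits of state: ~2e-49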
Tinkering with a random number generator, as you are doing by using only one bit of random space and then calling it iteratively, usually weakens its random properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining them.

Diehard random number tester with a very small amount of numbers

I am trying to test 100 different sets of 100 human-generated random numbers for randomness, in comparison to the randomness of 100 different sets of 100 computer-generated random numbers, but the Diehard program wants a single set of around 100,000 numbers.
I was wondering if it is possible to combine the human sets into a block of 100,000 numbers by using the human numbers as seeds for a pseudo-random number generator and using its output as the numbers to test with Diehard. I would do the same with the computer sets, using the same pseudo-random generator. Would this actually change the randomness result, given that all I am trying to show is that computer-generated numbers are more random than human-generated numbers?
You can try just concatenating the numbers. I wouldn't expect any particular combination to be consistently much better than another. Any way of combining the numbers will cause them to lose some properties (possibly including being classified as 'random' by some test), some combinations more than others in certain cases, but if we're dealing with random numbers you can't really predict much.
I'm not sure why you would want to use the numbers as seeds for another random number generator (if I understand you correctly); that will not yield any useful results. If you use a random number generator, you get a sequence of numbers from a pseudo-random set, and the seed only determines where in that set you start; starting with one seed should produce results just as random as starting with any other.
Any alleged test for randomness can, at best, say that some set is probably random. No test can measure true randomness accurately; that would probably contradict the definition of randomness.

How to adjust the distribution of values in a random data stream?

Given an infinite stream of random 0s and 1s from a biased (e.g. 1s are more common than 0s by a known factor) but otherwise ideal random number generator, I want to convert it into a (shorter) infinite stream that is just as ideal but also unbiased.
Looking up the definition of entropy gives a graph showing how many bits of output I should, in theory, be able to get from each bit of input.
The question: Is there any practical way to actually implement a converter that is nearly ideally efficient?
There is a well-known device due to Von Neumann for turning an unfair coin into a fair coin. We can use this device to solve our problem here.
Repeatedly draw two bits from your biased source until you obtain a pair in which the bits are different. Return the first bit and discard the second. This produces an unbiased source. It works because, regardless of the source's bias, the probability of a 01 is the same as the probability of a 10 (both are p(1-p) for independent bits). Therefore the probability of a 0, conditional on seeing 01 or 10, is 1/2, and likewise the probability of a 1 is 1/2.
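Here is a minimal implementation of that extractor in Python (biased_bits is a stand-in for the biased source, with an assumed 70 % bias towards 1):
import random
from itertools import islice
def biased_bits(p_one=0.7):
    """Stand-in for the biased but otherwise ideal source: 1 with probability p_one, forever."""
    while True:
        yield 1 if random.random() < p_one else 0
def von_neumann(bits):
    """Von Neumann extractor: read the stream in pairs, keep the first bit of each unequal pair."""
    it = iter(bits)
    for a, b in zip(it, it):          # consume two bits at a time
        if a != b:
            yield a                   # 01 and 10 are equally likely, so this bit is fair
out = list(islice(von_neumann(biased_bits()), 10_000))
print(sum(out) / len(out))            # ~0.5 despite the 70 % bias of the input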
Please see
http://en.wikipedia.org/wiki/Randomness_extractor
http://en.wikipedia.org/wiki/Whitening_transform
http://en.wikipedia.org/wiki/Decorrelation
Huffman-encode the input.
Given that the input has a known bias, you can compute a probability distribution for each n-bit segment. From that, construct a Huffman code and then just encode the sequence.
I'm not sure but one potential problem is that this might introduce some correlation between sequential bits.
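A rough sketch of that idea for 2-bit segments (n = 2), using only the standard library; as noted above, this compresses the stream but is not guaranteed to produce perfectly unbiased, uncorrelated bits:
import heapq
def huffman_code(probs):
    """probs: {symbol: probability} -> {symbol: bit string} forming a prefix-free code."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, i1, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}        # prefix 0 for the lower-probability subtree
        merged.update({s: "1" + c for s, c in c1.items()})  # prefix 1 for the higher-probability subtree
        heapq.heappush(heap, (p0 + p1, i1, merged))
    return heap[0][2]
p = 0.7                                                     # known bias: P(bit == 1)
block_probs = {"00": (1 - p) ** 2, "01": (1 - p) * p, "10": p * (1 - p), "11": p * p}
code = huffman_code(block_probs)
print(code)                                                 # the common block "11" gets the shortest codeword
stream = "1110110111"                                       # some biased input bits
print("".join(code[stream[i:i + 2]] for i in range(0, len(stream), 2)))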
