Combining PRNG and 'true' random, fast and (perhaps) dumb way - algorithm

Take a fast PRNG like xoroshiro or xorshift and a 'true' entropy-based generator like /dev/random.
Seed the PRNG with 'true' random, but also get a single number from the 'true' random source and XOR it with every result from the PRNG to produce the final output.
Then, replace this number once in a while (e.g. after 10000 random numbers have been generated).
Perhaps this is naive, but I would hope it improves some aspects of the PRNG, like period size, with negligible impact on speed. What am I getting wrong?
My use case is generating UUIDs (fast), which are basically 128-bit numbers that should be "really unique". My concern is that with a modern PRNG like the xorshift family, whose period is 'just' around 2^128, the chance of a collision from an entropy-seeded PRNG is not as negligible as it would be with truly random numbers.
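To make the idea concrete, here is a rough sketch in Python of what I have in mind (xorshift64 is just a stand-in for the fast PRNG, and the function names are only illustrative):

    import os

    MASK64 = 0xFFFFFFFFFFFFFFFF

    def xorshift64(state):
        # One step of a 64-bit xorshift PRNG (Marsaglia's xorshift64).
        state ^= (state << 13) & MASK64
        state ^= state >> 7
        state ^= (state << 17) & MASK64
        return state & MASK64

    def masked_stream(n, refresh_every=10000):
        # Seed the PRNG from the OS entropy source, and XOR every output with a
        # 'true' random mask that is replaced every refresh_every outputs.
        state = int.from_bytes(os.urandom(8), "big") or 1   # xorshift state must be non-zero
        mask = int.from_bytes(os.urandom(8), "big")
        for i in range(n):
            if i and i % refresh_every == 0:
                mask = int.from_bytes(os.urandom(8), "big")  # refresh the mask occasionally
            state = xorshift64(state)
            yield state ^ mask

    print(["%016x" % v for v in masked_stream(3)])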

The improvements are only minor compared to the plain PRNG. For example, the single true random number used to mask the results can be eliminated by taking the XOR of successive outputs: that XOR is exactly the same as the XOR of the successive plain PRNG outputs. So if you can predict the PRNG, it is not much harder to do the same to the 'improved' sequence.
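To see why, here is a tiny Python check; the PRNG outputs are arbitrary placeholder values, since the argument does not depend on which PRNG produced them:

    import secrets

    # Stand-in PRNG outputs; any fixed sequence works for the point being made.
    x = [0x243F6A8885A308D3, 0x13198A2E03707344, 0xA4093822299F31D0]
    mask = secrets.randbits(64)             # the single 'true' random masking value
    y = [v ^ mask for v in x]               # the "improved" outputs

    # The mask cancels out of the XOR of neighbouring outputs, so an observer
    # learns the same differences they would learn from the raw PRNG stream.
    assert all((y[i] ^ y[i + 1]) == (x[i] ^ x[i + 1]) for i in range(len(x) - 1))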

Related

Random number from many other random numbers, is it more random?

We want to generate a uniform random number from the interval [0, 1].
Let's first generate k random booleans (for example by rand() < 0.5) and use them to decide which subinterval [m*2^{-k}, (m+1)*2^{-k}] the number will fall into. Then we use one rand() to get the final output as m*2^{-k} + rand()*2^{-k}.
Let's assume we have arbitrary precision.
Will a random number generated this way be 'more random' than the usual rand()?
PS. I guess the subinterval picking amounts to just choosing the binary representation of the output 0.b_1 b_2 b_3... one digit b_i at a time, and the final step appends the representation of rand() to the end of the output.
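For reference, here is a literal Python version of the construction, using random.random() in place of rand() (the function name is just mine):

    import random

    def uniform_by_bits(k):
        # Pick the subinterval [m*2**-k, (m+1)*2**-k] with k random booleans,
        # then place the final value inside it with one more call to random().
        m = 0
        for _ in range(k):
            bit = 1 if random.random() < 0.5 else 0   # the rand() < 0.5 booleans
            m = (m << 1) | bit
        return m * 2.0 ** -k + random.random() * 2.0 ** -k

    print(uniform_by_bits(8))   # a value in [0, 1)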
It depends on the definition of "more random". If you use more random generators, you have more random state, which means the cycle length will be greater. But cycle length is just one property of a random generator. A cycle length of 2^64 is usually OK for almost any purpose (the only exception I know of is when you need a lot of different, long sequences, e.g. for some kinds of simulation).
However, if you combine two bad random generators, they don't necessarily become better; you have to analyze the combination. But there are generators which do work this way. KISS is an example: it combines three not-too-good generators, and the result is a good generator (a rough sketch follows).
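For illustration, here is a rough Python sketch of such a KISS-style combination; the constants roughly follow one of Marsaglia's published variants, but treat this as a toy, not a vetted implementation:

    MASK32 = 0xFFFFFFFF

    class KissLike:
        # A KISS-style combined generator: an LCG, a 32-bit xorshift and a
        # multiply-with-carry, added together. The point is the combination of
        # three individually weak generators, not the exact parameters.

        def __init__(self, seed=123456789):
            self.x = seed & MASK32          # LCG state
            self.y = 362436069              # xorshift state (must stay non-zero)
            self.z = 521288629              # MWC state
            self.c = 7654321                # MWC carry

        def next32(self):
            self.x = (69069 * self.x + 12345) & MASK32   # LCG step
            self.y ^= (self.y << 13) & MASK32            # xorshift step
            self.y ^= self.y >> 17
            self.y ^= (self.y << 5) & MASK32
            t = 698769069 * self.z + self.c              # multiply-with-carry step
            self.c = t >> 32
            self.z = t & MASK32
            return (self.x + self.y + self.z) & MASK32

    g = KissLike(42)
    print([g.next32() for _ in range(3)])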
For card shuffling, you'll need a cryptographic RNG. Even a very good, but non-cryptographic, RNG is inadequate for this purpose. For example, the Mersenne Twister, which is a good RNG, is not suitable for secure card shuffling! This is because, by observing output numbers, it is possible to figure out its internal state, so the shuffle result can be predicted.
This can help, but only if you use a different pseudorandom generator for the first and last bits. (It doesn't have to be a different pseudorandom algorithm, just a different seed.)
If you use the same generator, then you will still only be able to construct 2^n different shuffles, where n is the number of bits in the random generator's state.
If you have two generators, each with n bits of state, then you can produce up to a total of 2^(2n) different shuffles.
Tinkering with a random number generator, as you are doing by using only one bit of random space and then calling it iteratively, usually weakens its random properties. All RNGs fail some statistical tests for randomness, but you are more likely to find that a noticeable cycle crops up if you start making many calls and combining them.

Pseudorandom permutations vs random shuffle

I would like to apply a permutation test to a sequence with 4,000,000 elements. To my knowledge, it is infeasible because the number of possible permutations is ridiculously large (no RNG will generate uniformly distributed values in the range {1, ..., 4000000!}). I've heard of pseudorandom permutations though, and it sounds like something I need, but I can't tell whether it's actually a proper replacement for a random shuffle in my case.
If you are running a permutation test I presume that you want to generate a random sample from the set of all possible permutations, so that you can test some statistic calculated on the real data against the distribution of statistics calculated on the permuted data.
Algorithms for generating random permutations, such as those described at http://en.wikipedia.org/wiki/Random_permutation, typically use many random numbers, so there is no requirement for any single step of the generation process to need numbers as large as 4000000!. The only worry would be that, since the seed used to generate the random numbers is typically much smaller than 4000000!, not all permutations are possible.
There are other statistical tests which consume very large quantities of pseudo-random numbers (e.g. MCMC), so I wouldn't worry about this if you are using a random number generator which is commonly used for statistical tests. If you are worried about this, you could repeat the test with a cryptographically secure random number generator, such as http://docs.oracle.com/javase/6/docs/api/java/security/SecureRandom.html. This will be slower, so you might need to reduce the number of permutations tested, but it is very unlikely that it has any characteristic which would stand out far enough to affect your test results, because any such characteristic would be a security weakness - it would mean that, given a large quantity of random numbers already generated, you would have a slightly better than random chance of guessing the next number correctly.
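To make the first point concrete, here is a sketch of the kind of algorithm those pages describe (a Fisher-Yates shuffle), in Python, with secrets.randbelow standing in for a cryptographically secure source; it consumes one modest-sized random index per element, so no single draw ever has to cover the full 4000000! range:

    import random
    import secrets

    def fisher_yates(seq, randbelow=secrets.randbelow):
        # In-place Fisher-Yates shuffle: one uniform random index per element.
        # Pass random.Random(seed).randrange instead of secrets.randbelow for a
        # faster, seedable (non-cryptographic) source.
        for i in range(len(seq) - 1, 0, -1):
            j = randbelow(i + 1)             # uniform index in [0, i]
            seq[i], seq[j] = seq[j], seq[i]
        return seq

    print(fisher_yates(list(range(10))))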

Diehard random number tester with a very small amount of numbers

I am trying to test 100 different sets of 100 random human-generated numbers for randomness in comparison to the randomness of 100 different sets of 100 random computer-generated numbers, but the diehard program wants a single set of around 100000 numbers.
I was wondering if it's possible to combine the human sets together into a block of 100000 numbers by using the human numbers as a seed for a pseudo-random number generator, and using the output as the numbers to test with the diehard program. I would do the same with the computer set, using the same pseudo-random generator. Would this actually change the result of the randomness test, if all I'm trying to prove is that computer-generated numbers are more random than human-generated numbers?
You can try just concatenating the numbers. I wouldn't expect any particular combination to be consistently much better than another. Any way of combining the numbers will cause them to lose some properties regardless (possibly including the classification of 'random' by some test); some combinations lose more than others in certain cases, but if we're dealing with random numbers, you can't really predict much.
I'm not sure why you'd want to use the numbers as a seed for another random number generator (if I understand you correctly). That will not yield any useful, applicable results. If you use a random number generator, you will get numbers from a pseudo-random sequence; the seed only determines where in that sequence you start, and starting with any seed should produce results just as random as starting with any other seed.
Any alleged test for randomness can, at best, say that some set is probably random. No test can measure true randomness accurately; that would probably contradict the definition of randomness.

More random numbers

So I get that all the built-in functions only return pseudo-random numbers, as they use the clock or some other hardware value to get the number.
So here's my idea: if I take two pseudo-random numbers and combine them with a bitwise operation, would the result still be pseudo-random, or would it be closer to truly random?
I figured that if I fiddled about with the numbers a bit the result would be less replicable, or am I getting this wrong?
On a side note, why is pseudo-random a problem?
It will not be more random, but there is a big risk that the number will be less random (less uniformly distributed). What bitwise operator were you thinking about?
Let's assume the 4-bit random numbers 0101 and 1000. ORed together, you would get 1101. With OR there would be a clear bias towards 1111, with AND towards 0000 (a 75% chance of getting a 1 or a 0, respectively, in each position).
I don't think XOR and XNOR would be biased. But also you wouldn't get any more randomness out of it (see Pavium's answer).
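A quick per-bit check in Python illustrates the bias (exact counts will vary from run to run):

    import random

    N = 100_000
    ones_or = sum(random.getrandbits(1) | random.getrandbits(1) for _ in range(N))
    ones_and = sum(random.getrandbits(1) & random.getrandbits(1) for _ in range(N))
    ones_xor = sum(random.getrandbits(1) ^ random.getrandbits(1) for _ in range(N))

    print(ones_or / N)    # ~0.75: OR is biased towards 1
    print(ones_and / N)   # ~0.25: AND is biased towards 0
    print(ones_xor / N)   # ~0.50: XOR of two independent fair bits stays fair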
Algorithms executed by computers are deterministic.
You can only generate truly random numbers if there's a non-deterministic input.
Pseudo-random numbers follow a repeating sequence. Maybe a long sequence but the repetition makes them predictable and therefore not truly random.
You can't generate truly random numbers from two pseudo-random numbers.
EDITED: to put the sentences in a more logical order.

Generating two different pseudorandom numbers using the same seed

Is it possible to generate two different pseudorandom numbers on two separate program runs without using time as the seed? I.e., using the same seed on both runs, is it possible to get two different numbers?
In general, it is not possible to get different pseudorandom numbers using the same seed.
Pseudorandom numbers are, by definition, not truly random numbers and therefore are not composed from sources of entropy. Or, if the numbers do contain some entropy input, the input is not enough for the sequence to qualify as statistically "random". (An example of a property such a sequence should have is runs of 1-bits of length n occurring with probability 2^(-n), among many other properties of statistical randomness. The definition of statistical randomness becomes more sophisticated, in a sense closer to nature, as the mathematics around randomness improves. This is another way of saying that, at any given time, the current definitions of statistical randomness are about to become outdated or obsolete.)
In any case, the vast majority of pseudorandom number generators are, in fact, completely deterministic.
The canonical[1] example of a pseudorandom number generator is the linear feedback shift register (LFSR). An LFSR can be implemented as a digital logic circuit containing a register which holds N bits, plus a small number M of gates (much less than N, e.g. M=1 or M=2), usually XOR gates, which "feed back" into the register's bits at certain "tap" positions. There is a lot about this on the web.
Given the same seed input, the LFSR will always generate the same sequence.
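As a small illustration, here is a toy 16-bit Fibonacci LFSR in Python, using one commonly cited maximal-length tap set; it is a sketch for demonstration, not production code:

    def lfsr16_bits(seed, n):
        # Fibonacci LFSR over 16 bits with taps at bits 16, 14, 13 and 11
        # (polynomial x^16 + x^14 + x^13 + x^11 + 1). Entirely deterministic:
        # the same non-zero seed always produces the same bit stream.
        state = seed & 0xFFFF
        bits = []
        for _ in range(n):
            bits.append(state & 1)                                   # output bit
            fb = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (fb << 15)                        # shift in the feedback bit
        return bits

    assert lfsr16_bits(0xACE1, 64) == lfsr16_bits(0xACE1, 64)        # same seed, same sequence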
It is possible, using Walsh-Hadamard matrices (also called "M matrices" or the "sequency transform"), to sample the output of an LFSR and determine that the sequence is, in fact, from an LFSR, along with the structure of its gates and taps and the current register content. From this information all sequence values are known, and it is possible to reverse out the possible seed values which were used as input. For these reasons, LFSRs are not suitable for security purposes such as random tokens for authentication.
[1] By canonical, I am referring to Don Knuth's use of the LFSR as an example, as well as the timeless tradition which has ensued therefrom.
Not sure if you want to generate 2 different random numbers from the same seed - or avoid it! But if you really do want that, then, similar to LFSRs, LCGs (linear congruential generators) are often used to generate deterministic pseudo-random numbers. You can 'easily' create 2 simple LCGs using different constants, which will generate 2 different pseudo-random numbers for the same seed.
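For example, here is a minimal Python sketch with two LCGs that share a seed but use different constants (the constants here are only illustrative):

    M = 2 ** 32

    def lcg(seed, a, c, m=M):
        # A minimal linear congruential generator: state = (a*state + c) mod m.
        state = seed % m
        while True:
            state = (a * state + c) % m
            yield state

    # Two LCGs with different constants give different streams from the same seed.
    g1 = lcg(42, a=1103515245, c=12345)
    g2 = lcg(42, a=1664525, c=1013904223)
    print(next(g1), next(g2))   # different numbers from the same seed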

Resources