Distributed System Random Number Generator - random

This may have to be more of a statistical/mathematical question, especially given the general nature of it. I'll move it if need be, but feedback is greatly appreciated.
I am very curious what the potential impact to a random number generator would be in a SOA design. For the sake of argument, let's assume a well seeded and design generator, with no issues and a perfectly random generation capability.
Within the full set of numbers created by this generator we see perfect randomness. But what happens when the consuming applications are distributed, and multiple remote hosts request a random number? Will it still be random to the host?
It seems like there is at least a potential reduction in the variability of the randomness, since each host will be taking a subset of the randomly generated numbers. And as such it could, by chance/coincidence create a more or less structured pattern. The pattern would only exist on a host by host basis, while the overall system would still seem to be generating perfectly random results.

Assuming you'd really generate random numbers, than any subset of those numbers would still be random.
Now since that isn't possible, what happens if you would take the random number generators which are used for security purposes like encryption (since those have more sophisticated algorithms than the simple generators)? I see two scenarios:
Your clients will talk to the server in the same order. So 10 clients means you get every 10th generated number. People could try to take those numbers and figure out the configuration of the random number generator. But they can do it also when they see all the numbers. And if any subset of those generators would be predictable, it wouldn't be useful for encryption anymore, so here you can argue that they still want any subset to be as well random.
Your clients will act completely independently and access the server in random orders. Here you actually get real randomness (if it is for instance based on user input) and therefore even badly designed pseudo random generators could give you a better random distribution on each client. Since those don't know if they are getting the next number of the generator or the 100th.

Related

Securely Use Random Number Generator for Lottery Winning

I want to design a lottery winning mechanism using random number generator. I know that for computer, there is no true randomness but only "pseudorandom". If the system gets hacked and random seed is seen, people will know the sequence of random numbers. In fact, there is news that people did this and won several lotteries. I am thinking about two ways of designing my system:
Use random number generator as a global variable. There is only one
random seed; the sequence is generated when the system starts.
Con:
a. Once the random seed is seen, hackers will know the sequence
easily.
b. Once the system crashes and restarts, the sequence will repeat
itself.
Create a random number generator using timestamp as random seed each
time to generate a number.
Con:
a. Obviously timestamp cannot be directly used. There are some
tricks needed to be done with the timestamp each time. For example,
plus or minus some values each time on the timestamp. What algorithm can I use here to do this kind of modification on timestamp?
b. Is this method even taking advantage of random number generator?
It seems I am just creating a random number by myself...
As we can see, either of the method above is not secure enough. Which way is slightly better? Or is there a better way?
The notion that computers are incapable of truly random numbers hasn't been true for decades. All modern desktop and laptop computers have true hardware-based random number generators. Even most small embedded systems do as well.
That said, it may be the case that your programming language hasn't caught up to the recent hardware, or that even if it has, it's easy to make a mistake with RNGs and get a bad result from a good generator. So it's probably a good idea to use something like random.org unless you know what you're doing.

How would one know if one saw a random number generator?

I have been reading various articles about random numbers and their generators. There are usually 3 important conclusions that I draw from them:
Random numbers are not truly random
Much of the time they have a bias (modulo bias)
Humans are incapable of being random number generators, when they are trying to "act randomly"
So, with the latter-most of these observations in mind, how would we be able to
Tell if a sequence of numbers that we see is truly random, and more importantly
Is there some way we can prove that said sequence is really random?
I'm tempted to say that so long as you generate a sufficiently large enough sample set 1,000,000+, you should see more or less a uniform dispersion of (pseudo)random numbers occur. However, I'm sure some Maths genius has a way of discrediting this, because surely the by laws of probability you could get a run of one number just as likely as any other sequence.
From what I have read, if you really need random numbers its best to try and reuse what cryptographic libraries use. The field of Cryptography is obviously complex and relies on random numbers for key generation. From the section in OWASP's guide titled "Reversible Authentication Tokens" it says this...
The only way to generate secure authentication tokens is to ensure
there is no way to predict their sequence. In other words: true random
numbers.
It could be argued that computers can not generate true random
numbers, but using new techniques such as reading mouse movements and
key strokes to improve entropy has significantly increased the
randomness of random number generators. It is critical that you do not
try to implement this on your own; use of existing, proven
implementations is highly desirable.
Most operating systems include functions to generate random numbers
that can be called from almost any programming language.
My take is that unless you're coding Cryptographic libraries yourself, put trust in those that are (e.g. use Java Cryptography Extension) so you don't have to proove it yourself.
Pretty Simple Test:
If you really want to get into testing random numbers, you could simulate a program that outputs random numbers from 1-100 100 times as an example.
Then look at those numbers and see if there's any patterns. Then follow that test by restarting the program several times and repeating the process.
Examine all data to figure out if random numbers are always random, just random during individual tests, or never. :P
Testing a random number generator is probably mostly up to what you want to look for. Even pure non-repeatability is no guarantee of randomness.
There are some companies that will test a random number generator for the purposes of certification (e.g. online casinos). One that I found quickly is called iTech Labs, though their testing methodology page leaves a lot to be desired in terms of technical detail.
Other testers and certification bodies publish the required data for a certification; there's more specific detail here but not as much as you want.
You could potentially do a statistical analysis and compare the results of your random number generator to a "true" random source but the argument could be made for bias from trying to translate the true random source into your possibility space anyway.
Randomness tests verify the mathematical properties of the sequence. For example entry frequencies (all symbols are expected to have the same frequency), local variance, sequence analysis (the probability of a symbol must not depend on the previous ones).
A definite proof does not exist, but there is a quality factor - the probability of a sequence to really be random.
Another criterion could be based on compressibility: true randomness has maximum entropy and can not therefore be compressed.
This test is not reliable for randomness, of course, but allows quick and dirty testing with ready tools such as zlib.

How different do random seeds need to be?

Consider code like this (Python):
import random
for i in [1, 2, 3, 4]:
random.seed(i)
randNumbers = [random.rand() for i in range(100)] # initialize a list with 100 random numbers
doStuff(randNumbers)
I want to make sure that randNumbers differ significantly from one call to another. Do I need to make sure the seed numbers differ significantly between the subsequent calls, or is it sufficient that the seeds are different (no matter how)?
To the pedants: please realize the above code is super-over-simplified
Short answer: Avoid the re-seeding, as it doesn't buy you anything here. Long answer below.
That all depends on what exactly you need. In Common defects in initialization of pseudorandom number generators it is outlined that linear dependent seeds (which 1, 2, 3, 4 definitely are) are a bad choice for initializing multiple PRNGs, at least when used for simulation and desiring uncorrelated results.
If all you do is rolling a few dice, or generating some pseudo-random input for something uncritical, then it very likely doesn't matter.
Note also that using some classes of a PRNG itself for generating seeds have the same problem in generating linear dependent numbers (LCGs spring to mind).
If your random number generator is high quality, it shouldn't matter how you seed it. In fact, the best practice would be to seed it only once. Random number generators are designed to have certain statistical behavior once they're started. Frequently reseeding effectively creates a different random number generator, one that may not be as good.
Randomly selecting seeds sounds like a good idea, but it isn't. In fact, because of the "birthday paradox," there's a surprisingly high probability that you'll pick the same seed twice.
Generally speaking, you only seed your random number generator when you need the random numbers to be generated in identical fashion each time through. This is useful when you have a random component to your processing, but need to test it and therefore want it to be consistent between tests. Otherwise, you let the system seed the generator itself.
In otherwords, by seeding the random number generator with specific pre-defined seeds, you are actually reducing the randomness of the system as a whole. The random numbers generated when using a seed of 1 are indeed psuedo-randomly different from that with a seed of 2, but a hard coded seed will result in repeated random sequences in each run of the program.
You seem to want pseudo-random numbers that aren't pseudo-random, with a higher probability of consecutive numbers being 'significantly' different than pseudo-randomness requires. I doubt that any common prng will do this, whatever your seeding strategy.
The seeds themselves should be random so that the output is unpredictable. There can be problems if the seeds differ only in one or two bits (as this question demonstrates).
It depends upon the application for which you're using the PRNG. If you're using something that needs to be cryptographically sound, then the seeds generally need to be extremely difficult to deduce based on the output, different every time the application runs, difficult to simply guess, and impossible to determine by reverse engineering the application (i.e. they can't be hard coded).
If your goal is a game, your requirements may be different. For example, if you're controlling computer strategy, but the computer's strategy remains the same for all runs of the game, you may have an easily beatable game. Then again, you may want that for "easy" mode.

Why is it hard for a program to generate random numbers?

My kids asked me this question and I couldn't really give a concise, understandable explanation.
So I'm hoping someone on SO can.
How about, "Because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it's not very random! Imagine trying to give someone instructions on how to choose a random number."
Here's a kid friendly explanation:
Get a Dice (the number of sides doesn't matter)
Write these down on a piece of paper:
Move right
Move up
Move up
Turn the dice over
Move down
Move right
Show them the dice and paper. Explain that the dice represents the computer and the
paper represent the math or algorithm that tells the computer what number it will return.
Now, roll the dice. Tell them that you are "seeding" or asking the computer to start at a random dice position.
Follow each step in the paper (move right) by moving the dice.
Let's say that you threw a 6 sided die and it was seeded at 5. By moving right, you get a 4.
Explain that the computer must start with a starting value. This could be given by any number of sources such as the date or mouse movement. Show them that how they throw the dice determines the starting value.
Explain that the piece of paper is how the computer get the next number. Tell them that the instructions on the paper can be changed as easily as the algorithm for the random generator can be changed by the programmer.
Have fun showing them the various possibilities that is only limited by their imaginations.
Now for the answer to your question:
Tell them that when a good mathematician knows the starting value and what step the computer is currently at, the mathematician can tell what is the next value of the random number.
Ask the child were to hide the paper and throw the dice.
Then ask the child to follow the steps on the paper, you then write down how he gets the next random number.
Afterwards, show them your paper. Now that you have a copy of their random number generator, its easy for anyone else to "guess" the next random to come out.
No matter how creative the child is with their algorithm, you should still be able to deduce their algorithm. Tell your child that in the computer world, nothing is hidden and just by observation, even if its just the numbers that was observed, the random number algorithm can be discovered.
...as a side effect, if the child was able to come up with a good algorithm that confused you, in which you can't deduce the next sequence, then you have a bright child. :D
Here's my attempt at explaining randomness at an approximately eighth-grade level. Hope your kids find it useful!
Surprising as it may seem, a computer is not very smart. Computers must follow their instructions blindly, and are therefore completely predictable. A computer that doesn't follow its instructions in this manner is, in fact, broken! We want computers to do exactly what we tell them.
That's precisely what makes it hard to do things randomly. Computers must be told a sequence of instructions on how to generate random numbers. But that's not really random, because if you gave anybody else the instructions and the same starting point, they could come up with the same answers. So computers can't be truly random just by following instructions.
Ask them to devise a step-by-step method to generate a random number.
And don't accept "pick a number from 1 to 10" as an answer ;)
Trying out a problem should illustrate the difficulty of having to generate random numbers from a set of instructions, just like what computers actually have to do.
Because computers are deterministic machines.
Generating random numbers on a computer is like playing "Eenie meenie miney moe" when choosing who's It first in a game of tag. On the surface it does look random, but when you get into the details, it's completely deterministic. It's hard to make eenie meenie miney moe into a scheme that a person really can't predict the outcome of.
Also there's some difficulties with getting the distribution nice and even.
Because given any input, an algorithm produces the exact same output every single time. And you can't just provide a "random" input, because you're trying to generate the random number in the first place.
"Kids, unless they're broken, computers never lie, and they always do what you tell them to do. Even when we are disappointed by the results, it always turns out that they were doing what they were told to do with complete fidelity. They can only do two things: add one and one, and move a number from one place to another. If you want them to produce random numbers, you need to explain to them how to do that in terms of adding one and one and moving. Once you have explained that, the results will not be random."
Because the only true source of randomness exists at the quantum level. With suitable hardware assists, computers can access this level. for example, they can sample the decay of a radioactve isotope or the noise from a thermionic valve. But your basic PC doesn't come with this cool stuff.
A simple explanation for the children:
The definition of randomness is a philosophical and mathematical question, beyond the scope of this answer, but by definition there is no such thing as a "random" number. In a metaphysical sense, a number is only random in sequential form; however, there is a probability that a sequence follows certain statistical distributions depending on the sample size. A random number generator (in our case a pseudo-random number generator, or PRNG) is simply a device to produce a quasi-random sequence of numbers that we can only estimate (based on the given probability inherent within the sequence) to be random.
You should explain to the children that programs can only mimic these devices using complex mathematical formulas (which guarantee a lack of "randomness" by definition because they are a result of some function, or procedural algorithm). Typically, rigorous statistical analysis is necessary in order to differentiate the use of a quantum hardware PRNG (use this as an opportunity to explain to your kids the Heisenberg Principle!) and that of a strong software PRNG.
Had to be done really
Source: http://xkcd.com/221/
Because there is no such thing as a random number.
Random is a human concept that we use when we cannot comprehend data and do not understand it. If we are to believe that science will ultimately lead to an understanding of how everything works then surely everything is deterministic.
Take away the human and there is no random there is only "this". It happens because it happens, not because it is random.
Because a program is a system and everything in a system is made to run with consistency and regularity. Randomness has no place in a system.
It is hard because given the same sets of inputs and conditions, a program will produce the same result everytime. This by definition is not random.
Algorithms to generate random numbers are inevitably deterministic. They take a small random seed, and use it to obtain a long string of pseudo-random digits.
It's very difficult to do this without introducing subtle patterns into the data. A string of digits can look perfectly random but have repeated patterns which make the distribution innappropriate for applications where randomness is required.
Computers can only execute algorithmic computations, and a truly random number isn't an algorithmic thing. You can get algorithms that produce numbers that behave like random numbers; such algorithms are called 'Pseudo-Random number generators'.
At various times in the past, people have made random number generators from analog-digital converters connected to sources of electronic noise, but this tends to be fairly specialised kit.
Primarily because computers don't have any functions that behave in discrete, non-random ways. A computer is predictable, which allows us to program reliable software. If it wasn't predictable it would be easier to generate a random number (since our software could rely on this unpredictable method).
While it's possible to generate pseudo-random numbers, and numbers that are distributed randomly, you cannot generate truly random numbers without separate hardware. There is hardware that generates truly random numbers based on "quantum" interactions (at least according to the manufacturers). Online poker sites sometimes use these adapters for their generators.
Apparently there are even online services to provide random numbers - random.org for example.
As surprising as it may seem, it is difficult to get a computer to do something by chance. A computer follows its instructions blindly and is therefore completely predictable. (A computer that doesn't follow its instructions in this manner is broken.) There are two main approaches to generating random numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs).
Actually, on most modern computers it's not hard to produce numbers that are "random enough" for most purposes. As others have noted, the critical thing is having a source of randomness. You can't just write a program that will produce randomness algorithmically, but you can observe randomness in the various activities of most computers of reasonable complexity, i.e., the ones we typically think of when writing programs. One such source is timing data of interrupts from various system devices.
At one time many computers had no way to get at this data and could only offer pseudorandomness, that is, a random, but repeatable distribution of numbers based on a particular seed. For many purposes this is sufficient -- choosing a different seed each time results in good enough randomness. For other purposes, such as encryption, this isn't strong enough and you need some randomness to start with that isn't repeatable or predictable. Today, most computers (with the exception of embedded devices, perhaps) are sophisticated enough to have a source of randomness that can generate encryption-strength random numbers. For instance, Linux has /dev/random and the .NET framework supports the cryptographically strong RandomNumberGenerator class which has a number of implementations.
Its probably helpful to distinguish between a number that is hard to predict (which a computer can create) from something that is not deterministic (which is a bit tougher for computers, and theoretically, any physical being).
It's easy to come up with an algorithm that generates unexpected numbers, that appear random in some sense. But to design an algorithm that generates true random numbers, well, that's hard.
Imagine designing an algorithm to simulate a dice roll. You can easily formulate some procedure to generate different numbers on each iteration. But can you guarantee that, in the long run (I mean, up to the infinity), the amount of times that 6 came out will be the same as any other number? When designing a good random number generator, that's the kind of commitment that you have to assume. You have to provide strong guarantees (i.e. mathematical proofs) about the randomness, if the application (e.g. lottery) requires it.
It is relevant to note that humans perform very poorly at generating random numbers. Computers are worse because they just follow a strict set commands. Humans can only generate good (pseudo) random numbers when following an algorithm, a set of commands. Computers are the same.
Although it should be noted that computers can gather entropy from the "environment" connected to it, like keyboard and mouse actions, what aids in generating random numbers (either directly or by seeding a PRNG).
To make the computer generate a random number, the computer has to have a source of randomness to start with.
It has to be feeded a seed that can't be expected or calculated by just looking at the seed, if the seed comes from a clock then it can be predicted or calculated by knowing the time, if the seed comes from like filming a lavalamp and get numbers from the picture stream then it's harder to just look at the seed to know what next number will be.
The computer does not have an built in lava lamp to generate that randomness, thats whats make it hard, we have to substitute real randomness with some input that exists in the computer, maybe by logging passing tcpip-packets or other things, but its not many ways to get that randomness sources in.
Computers just don't have suitable hardware. Ordinary computer's hardware is meant to be deterministic. With suitable hardware like mentioned here random numbers are not a problem at all.
Awhile back I came across the "Dice-O-Matic"
http://GamesByEmail.com/News/DiceOMatic
Kind of interesting real world application of the problem.
Its not hard, here's a couple for free: 12, 1400, 397.6

Do stateless random number generators exist?

Is there a difference between generating multiple numbers using a single random number generator (RNG) versus generating one number per generator and discarding it? Do both implementations generate numbers which are equally random? Is there a difference between the normal RNGs and the secure RNGs for this?
I have a web application that is supposed to generate a list of random numbers on behalf of clients. That is, the numbers should appear to be random from each client's point of view. Does this mean I need retain a separate random RNG per client session? Or can I share a single RNG across all sessions? Or can I create and discard a RNG on a per-request basis?
UPDATE: This question is related to Is a subset of a random sequence also random?
A random number generator has a state -- that's actually a necessary feature. The next "random" number is a function of the previous number and the seed/state. The purists call them pseudo-random number generators. The numbers will pass statistical tests for randomness, but aren't -- actually -- random.
The sequence of random values is finite and does repeat.
Think of a random number generator as shuffling a collection of numbers and then dealing them out in a random order. The seed is used to "shuffle" the numbers. Once the seed is set, the sequence of numbers is fixed and very hard to predict. Some seeds will repeat sooner than others.
Most generators have period that is long enough that no one will notice it repeating. A 48-bit random number generator will produce several hundred billion random numbers before it repeats -- with (AFAIK) any 32-bit seed value.
A generator will only generate random-like values when you give it a single seed and let it spew values. If you change seeds, then numbers generated with the new seed value may not appear random when compared with values generated by the previous seed -- all bets are off when you change seeds. So don't.
A sound approach is to have one generator and "deal" the numbers around to your various clients. Don't mess with creating and discarding generators. Don't mess with changing seeds.
Above all, never try to write your own random number generator. The built-in generators in most language libraries are really good. Especially modern ones that use more than 32 bits.
Some Linux distros have a /dev/random and /dev/urandom device. You can read these once to seed your application's random number generator. These have more-or-less random values, but they work by "gathering noise" from random system events. Use them sparingly so there are lots of random events between uses.
I would recommend using a single generator multiple times. As far as I know, all the generators have a state. When you seed a generator, you set its state to something based on the seed. If you keep spawning new ones, it's likely that the seeds you pick will not be as random as the numbers generated by using just one generator.
This is especially true with most generators I've used, which use the current time in milliseconds as a seed.
Hardware-based, true [1], random number generators are possible, but non-trivial and often have low mean rates. Availablity can also be an issue [2]. Googling for "shot noise" or "radioactive decay" in combination with "random number generator" should return some hits.
These systems do not need to maintain state. Probably not what you were looking for.
As noted by others, software systems are only pseudo-random, and must maintain state.
A compromise is to use a hardware based RNG to provide an entropy pool (stored state) which is made available to seed a PRNG. This is done quite explicitly in the linux implementation of /dev/random [3] and /dev/urandom [4].
These is some argument about just how random the default inputs to the /dev/random entropy pool really are.
Footnotes:
modulo any problems with our understanding of physics
because you're waiting for a random process
/dev/random features direct access to the entropy pool seeded from various sources believed to be really or nearly random, and blocks when the entropy is exhausted
/dev/urandom is like /dev/random, but when the entopy is exhausted a cryptographic hash is employed which makes the entropy pool effectively a stateful PRNG
If you create a RNG and generate a single random number from it then discard the RNG, the number generated is only as random as the seed used to start the RNG.
It would be much better to create a single RNG and draw many numbers from it.
As people have already said, it's much better to seed the PRNG once, and reuse it. A secure PRNG is simply one which is suitable for cryptographic applications. The only way re-seeding each time will give reasonably random results is where it comes from a genuinely random "real world" source - ie specialised hardware. Even then, it's possible that the source is biased and it will still be theoretically better to use the same PRNG over.
Normally seeding a new state takes quite while for a serious PRNG, and making new ones each time won't really help much.
The only case I can think of where you might want more than one PRNG is for different systems, say in a casino game you have one generator for shuffling cards and a separate one to generate comments done by the computer control characters, this way REALLY dedicated users can't guess outcomes based on character behaviors.
A nice solution for seeding is to use this (Random.org) , they supply random numbers generated from the atmospheric noise for free. It could be a better source for seeding than using time.
Edit: In your case, I would definitely use one PRNG per client, if for no other reason than for good programming standards. Anyways if you share one PRNG among clients, you will still be providing pseudo-random values to each, of a quality equal to your PRNG's quality. So that's a viable option but seems like a bad policy for programming
It's worth mentioning that Haskell is a language which attempts to entirely eliminate mutable state. In order to reconcile this goal with hard-requirements like IO (which requires some form of mutability), monads must be used to thread state from one calculation to the next. In this way, Haskell implements its pseudo-random number generator. Strictly speaking, generating random numbers is an inherently stateful operation, but Haskell is able to hide this fact by moving the state "mutation" into the bind (>>=) operation.
This probably sounds a little abstract, and it doesn't really answer your question completely, but I think it is still applicable. From a theoretical standpoint, it is impossible to work with a RNG without involving state. Regardless, there are techniques which can be used to mitigate this interaction and make it appear as if the entire operation is of a stateless nature.
It's generally better to create a single PRNG and pull multiple values from it. Creating multiple instances means you need to ensure that the seeds for the instances are guaranteed unique, which will require incorporating instance-specific information.
As an aside, there are better "true" Random Number Generators, but they usually require specialized hardware which does things like derive random data from electrical signal variance inside the computer. Unless you're really worried about it, I'd say the Pseudo Random Number Generators built into the language libraries and/or OS are probably sufficient, as long as your seed value is not easily predictable.
The use of a secure PRNG depends on your application. What are the random numbers used for?
If they're something of real value (e.g. anything cryptographically related), you wouldn't want to use anything less.
Secure PRNGs are much slower, and may require libraries to do operation of arbitrary precision, and primality testing, etc etc...
Well, as long as they are seeded differently each time they're created, then no, I don't think there'd be any difference; however, if it depended on something like the time, then they'd probably be non-uniform, due to the biased seed.

Resources