Basically I'm looking for a detective function. I pass it a list of integers (probably between 20 and 100 integers) and it tells me "Yeah, 84% chance this came from a PRNG, I tested it against the main ones that most modern programming languages use", or "No, only 12% chance this came from a well-known PRNG".
If it helps (or hinders), the integers will always be between 1 and 999.
Does this exist?
Unless you are prepared to break new ground in number theory, you would only be able to detect obsolete, badly designed, or poorly seeded PRNGs. Good PRNGs are explicitly designed to prevent what you are trying to do. Random number generation is a critical part of digital cryptography, so a lot of effort goes into producing random numbers that meet all known tests.
There are batteries of tests to profile PRNGs. See for example this NIST page.
As the comments point out, the first two sentences are overstated and are only strictly true for PRNGs that may be used in cryptography. Weaker (i.e. more predictable) PRNGs might be chosen for other domains in order to improve time or space performance.
You can write a battery of tests for a list of candidate generators, but there are a lot of generators, and some have enormous state, where adjacent values of a well-seeded generator reveal nothing useful and you'll have to wait a long time before you see two data points with an informative relationship.
On the plus side, while the list of random number generators that you might encounter is vast, there are telltale signs that will help you identify some classes of simple generators quickly, and then you can perform focussed analysis to derive the specific configuration.
Unfortunately even a simple generator like KISS shows that while the generator can be trivially broken when you know its configuration, it can hide its signature from anything that does not know its configuration, leaving you in a situation where you have to individually test for every possible configuration.
There are quality tests like dieharder and TestU01 which will consume many megabytes of data to identify any weakness in a generator; however, these can also identify weaknesses in real RNGs, so they could give a strong false positive.
To work from only 100 integers you would really need to have a list of generators in mind. For example, to detect an LCG used inappropriately (one with a power-of-two modulus whose raw output is used), you simply test to see if the bottom three bits cycle through a repeating pattern of 8 values -- but this is by far the easiest case.
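The low-bits check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generator is an LCG with a power-of-two modulus, and you see its raw outputs (not values already reduced to 1..999). The names low_bits_period and toy_lcg, and the toy_lcg constants, are mine and stand in for whatever generator you suspect.

```python
def low_bits_period(values, bits=3):
    """Return the smallest period (<= 2**bits) of the low `bits` bits, or None."""
    mask = (1 << bits) - 1
    low = [v & mask for v in values]
    max_period = 1 << bits
    if len(low) < 2 * max_period:
        return None  # too short to see even two full cycles
    for period in range(1, max_period + 1):
        # The low bits are periodic if every value equals the one `period` steps back.
        if all(low[i] == low[i - period] for i in range(period, len(low))):
            return period
    return None


def toy_lcg(seed, count, a=1103515245, c=12345, m=2**31):
    """Hypothetical LCG with a power-of-two modulus that emits its raw state."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out
```

A result of None does not mean the data is random; it only rules out this one signature.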
If you had a sequence of 625 or more 32-bit integers, you could detect with high confidence whether it was from consecutive calls to Mersenne Twister. That is because it leaks state information in the output values.
For an example of how it is done, see this blog entry.
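As a rough sketch of the idea (not the blog's code), here is how it could look against Python's own random module, which uses MT19937. It assumes you have 624 consecutive, unmodified 32-bit outputs; untemper and clone_mt are names I made up. Each output is a reversible scrambling ("tempering") of one state word, so 624 outputs give back the whole state, and the 625th value onwards can then confirm or refute the match.

```python
import random


def _undo_right_shift_xor(y, shift):
    """Invert the step y ^= y >> shift on a 32-bit word."""
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ (result >> shift)
    return result


def _undo_left_shift_xor_mask(y, shift, mask):
    """Invert the step y ^= (y << shift) & mask on a 32-bit word."""
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ ((result << shift) & mask)
    return result & 0xFFFFFFFF


def untemper(y):
    """Recover the internal MT19937 state word from one 32-bit output."""
    y = _undo_right_shift_xor(y, 18)
    y = _undo_left_shift_xor_mask(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor_mask(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y


def clone_mt(outputs):
    """Build a random.Random that continues the stream after 624 observed outputs."""
    if len(outputs) < 624:
        raise ValueError("need at least 624 consecutive 32-bit outputs")
    state = tuple(untemper(v) for v in outputs[:624])
    clone = random.Random()
    # Index 624 tells the generator to produce a fresh block on the next call.
    clone.setstate((3, state + (624,), None))
    return clone
```

If the clone's next few values agree with the values you actually observed next, you can be close to certain the source was Mersenne Twister; if they disagree, this test tells you nothing about other generators.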
Similar results are in theory possible when you don't have ideal data such as full 32-bit integers, but you would need a longer sequence and the maths gets harder. You would also need to know - or perhaps guess by trying obvious options - how the numbers were being reduced from the larger range to the smaller one.
Similar results are possible from other PRNGs, but generally only the non-cryptographic ones.
In principle you could identify specific PRNG sequences with very high confidence, but even simple barriers such as missing numbers from the strict sequence can make it a lot harder. There will also be many PRNGs that you will not be able to reliably detect, and typically you will either have close to 100% confidence of a match (to a hackable PRNG) or 0% confidence of any match.
Whether or not a PRNG is hackable (and therefore could be detected by the numbers it emits) is not a general indicator of PRNG quality. Obviously, "hackable" is opposite to a requirement for "secure", so don't consider Mersenne Twister for creating unguessable codes. However, do consider it as a source of randomness for e.g. neural networks, genetic algorithms, Monte Carlo simulations and other places where you need a lot of statistically random-looking data.
I have been reading various articles about random numbers and their generators. There are usually 3 important conclusions that I draw from them:
Random numbers are not truly random
Much of the time they have a bias (modulo bias)
Humans are incapable of being random number generators, when they are trying to "act randomly"
So, with the latter-most of these observations in mind, how would we be able to
Tell if a sequence of numbers that we see is truly random, and more importantly
Is there some way we can prove that said sequence is really random?
I'm tempted to say that so long as you generate a sufficiently large sample set (1,000,000+), you should see a more or less uniform dispersion of (pseudo)random numbers. However, I'm sure some Maths genius has a way of discrediting this, because surely by the laws of probability you could get a run of one number just as likely as any other sequence.
From what I have read, if you really need random numbers it's best to try and reuse what cryptographic libraries use. The field of cryptography is obviously complex and relies on random numbers for key generation. From the section in OWASP's guide titled "Reversible Authentication Tokens" it says this...
The only way to generate secure authentication tokens is to ensure
there is no way to predict their sequence. In other words: true random
numbers.
It could be argued that computers can not generate true random
numbers, but using new techniques such as reading mouse movements and
key strokes to improve entropy has significantly increased the
randomness of random number generators. It is critical that you do not
try to implement this on your own; use of existing, proven
implementations is highly desirable.
Most operating systems include functions to generate random numbers
that can be called from almost any programming language.
My take is that unless you're coding cryptographic libraries yourself, put trust in those that are (e.g. use Java Cryptography Extension) so you don't have to prove it yourself.
Pretty Simple Test:
If you really want to get into testing random numbers, you could write a program that outputs 100 random numbers from 1 to 100, as an example.
Then look at those numbers and see if there are any patterns. Then follow that test by restarting the program several times and repeating the process.
Examine all data to figure out if random numbers are always random, just random during individual tests, or never. :P
Testing a random number generator is probably mostly up to what you want to look for. Even pure non-repeatability is no guarantee of randomness.
There are some companies that will test a random number generator for the purposes of certification (e.g. online casinos). One that I found quickly is called iTech Labs, though their testing methodology page leaves a lot to be desired in terms of technical detail.
Other testers and certification bodies publish the required data for a certification; there's more specific detail here but not as much as you want.
You could potentially do a statistical analysis and compare the results of your random number generator to a "true" random source but the argument could be made for bias from trying to translate the true random source into your possibility space anyway.
Randomness tests verify the mathematical properties of the sequence. Examples are entry frequencies (all symbols are expected to have the same frequency), local variance, and sequence analysis (the probability of a symbol must not depend on the previous ones).
A definite proof does not exist, but there is a quality factor - the probability of a sequence to really be random.
Another criterion could be based on compressibility: true randomness has maximum entropy and can not therefore be compressed.
This test is not reliable for randomness, of course, but allows quick and dirty testing with ready tools such as zlib.
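A quick and dirty version of that check, as a sketch only: compression_ratio is my own name, and the input is assumed to already be a bytes object.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; near 1.0 means 'did not compress'."""
    if not data:
        raise ValueError("need at least one byte")
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    print(compression_ratio(bytes(10_000)))       # all zero bytes: tiny ratio
    print(compression_ratio(os.urandom(10_000)))  # OS randomness: ratio around 1
```

If your data is a list of integers rather than bytes, how you pack it matters: storing values from 1 to 999 in two bytes each leaves unused bits, so even a perfectly random list will compress somewhat. Compare against the ratio you get from a known-good source packed the same way rather than against 1.0.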
Given a pseudo-random binary sequence (e.g.: 00101010010101) of finite values, predict how the sequence will continue. Can someone please tell me the easiest way to do it? Or in case it's too difficult for someone who can barely play solitaire on his computer, can someone tell me where to take my first steps...
PS: can this technique be used to predict the colour of the next electronic roulette number (e.g.: assigning 1 and 0 to red and black respectively)?
Cryptographically secure pseudorandom number generators are intended specifically to make what you want to do impossible. In particular, they satisfy the "next bit test": given k bits of their output, no efficient (polynomial-time) algorithm can guess bit k+1 with probability significantly greater than 1/2.
Plain pseudorandom number generators that do not satisfy the next bit test can be attacked and in fact security vulnerabilities have been discovered in real world systems due to the choice of PRNG. In particular, linear congruential generators are known to be somewhat (or completely) predictable, and some versions of Unix random may use this algorithm. This method is quite math intensive though. If you want to go down this path a search for "linear congruential generator prediction" is a place to start.
Another attack if you are aware of the PRNG implementation is to try to determine the seed used to generate the sequence you are analyzing. The seed is sometimes based on guessable information like time of day, process ID, etc.
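To give a feel for the seed attack, here is a minimal sketch. It assumes the target used Python's random.Random seeded with a whole-second Unix timestamp and drew integers from 1 to 999 with randint; all of that, and the names draw_1_to_999 and find_time_seed, are assumptions for illustration. A real attack has to reproduce the exact generator and the exact way values were drawn.

```python
import random


def draw_1_to_999(rng):
    """How we assume the target turned generator output into a number."""
    return rng.randint(1, 999)


def find_time_seed(observed, earliest, latest, draw=draw_1_to_999):
    """Try every integer seed in [earliest, latest]; return the first that reproduces `observed`."""
    observed = list(observed)
    for seed in range(earliest, latest + 1):
        rng = random.Random(seed)
        if [draw(rng) for _ in range(len(observed))] == observed:
            return seed
    return None
```

With 20 or so observed values a false match is vanishingly unlikely, so a hit is strong evidence about both the generator and its seed, and it lets you predict everything that follows.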
Well, for pseudo-random sequences, the only possibility is to keep count of how many of each possibility have come before. If the 1s outweigh the 0s, it's more likely that the next one will be 0. How much more likely depends on the relative occurrences of each.
Note that this won't work for true randomness since the events are independent, despite what the statisticians tell you :-)
You'll find that out (painfully) the first time you get a run of 13 reds on the table when you're using the double-on-loss method of playing roulette. In any case, the house derives its advantage from 0 (and double-0 on some tables) which are neither red nor black.
This is a decent question but I think if "you can barely play solitaire" it might be out of your reach right now.
You should look into picking up a basic language, and most are going to say PHP but I'm wary of recommending that to a beginner (it's pretty easy to get working though, see: XAMPP). Java is probably an "easy-to-get-running-and-work-with" language but I'm sure there are better threads on here about which language to start with (Python or something probably wins because experienced programmers love it).
By the way, your English is fine (I didn't notice you were a non-native English speaker).
Now, as for your question, if you're looking at true pattern matching, I'd be inclined to convert this idea to code:
"CURRENTPOINT" is end of first letter.
LOOP: Pick letter(s) from Start to "CURRENTPOINT"
Break the rest of your binary string into blocks of the same size.
See if these blocks all equal your picked letters.
If not, move "CURRENTPOINT" along and repeat the LOOP until you run out of letters.
If so, you have your "repeating section."
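The steps above could be turned into Python roughly like this. It is a sketch of that idea only: the function names are mine, and I have assumed the pattern must repeat at least once in full and that a cut-off final block only has to match the start of the pattern.

```python
def find_repeating_section(bits: str):
    """Return the shortest prefix that, repeated, reproduces the whole string, else None."""
    for size in range(1, len(bits) // 2 + 1):
        block = bits[:size]
        # Break the rest of the string into blocks of the same size.
        chunks = [bits[i:i + size] for i in range(size, len(bits), size)]
        # A full-size chunk must equal the block; a short last chunk must be a prefix of it.
        if all(block.startswith(chunk) for chunk in chunks):
            return block
    return None


def predict_next(bits: str):
    """Guess the next symbol from the repeating section, or None if there is none."""
    block = find_repeating_section(bits)
    if block is None:
        return None
    return block[len(bits) % len(block)]
```

On the example from the question, find_repeating_section("00101010010101") finds "0010101" (the string is that block twice), so the guess for the next bit is "0". That only helps if the sequence really is that short a cycle, which no sensible PRNG would be.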
If you're just guessing that the random generator is temporarily biased, and that this bias will re-establish a baseline (balanced 0s and 1s) in the reasonably short term, then you can compare the counts of 0s and 1s and say the other is more likely based on the deviation from your baseline. However, be careful of the Monte Carlo fallacy.
To answer the PS first: No, because roulette spins are independent events so there's nothing predictive in the historical sequence of outcomes.
The general question is hard and interesting.
This website can infer a surprising number of sequences from their initial values:
http://www.research.att.com/~njas/sequences/
Note that it's for arbitrary integer sequences.
I tried it on simple patterns like {0,0,1,1,0,0,1,1,...} and it says the right thing.
I noticed that nobody told you about periodicity.
A pseudo-random sequence always comes from a mathematical operation (until the quantum computer ^^).
A usual way to generate one is to divide two prime numbers (not sure it's the right word but whatever).
for instance
1/3=0.333333.....
9/7=1.2857142857142857142857142857143
Those are fairly small numbers, and what do we notice? Periodicity.
1/3=0.3 3 3 3 3 3.....
9/7=1.2857 142857 142857 142857 142857 143
The bigger the prime number, the longer the repeating sequence (3 and 142857 in these cases) will be.
So if you look at a pseudo-random sequence for a long time you may find a periodicity and be able to "guess" the next number. But that could take a while.
PS: sorry for my English, I’m a bit rusty ^^
What you need to think about is the properties of randomness, study those. For example, "Randomness runs in bunches". Compare a random sequence against a predictable sequence: you won't normally find bunches in the predictable one. To take advantage of bunches wait for the bunch. And with a little luck you will win.
What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?
The function should return a single result, say 0 if the sequence is not at all random, up to, say 1 if perfectly random. It can give something in-between if the sequence is somewhat random, e.g. 0.95 might be a reasonably random sequence, whereas 0.50 might have some non-random parts and some random parts.
If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1. If I passed the sequence 1, 2, ... 100,000 to it, it should return 0.
This way I can easily take 30 sequences of numbers, identify how random each one is, and return information about their relative randomness.
Is there such an animal?
…..
Update 24-Sep-2019: Google may have just ushered in an era of quantum supremacy says:
"Google’s quantum computer was reportedly able to solve a calculation — proving the randomness of numbers produced by a random number generator — in 3 minutes and 20 seconds that would take the world’s fastest traditional supercomputer, Summit, around 10,000 years. This effectively means that the calculation cannot be performed by a traditional computer, making Google the first to demonstrate quantum supremacy."
So obviously there is an algorithm to "prove" randomness. Does anyone know what it is? Could this algorithm also provide a measure of randomness?
Your question answers itself. "If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1", except the digits of Pi are not random numbers so if your algorithm does not recognise a very specific sequence as being non-random then it's not very good.
The problem here is that there are many types of non-randomness:
eg. "121,351,991,7898651,12398469018461" or "33,27,99,3000,63,231" or even "14297141600464,14344872783104,819534228736,3490442496" are definitely not random.
I think what you need to do is identify the aspects of randomness that are important to you:
distribution, distribution of digits, lack of common factors, the expected number of primes, Fibonacci and other "special" numbers etc. etc.
PS. The quick and dirty (and very effective) test of randomness: does the file end up roughly the same size after you gzip it?
It can be done this way:
CAcert Research Lab does a Random Number Generator Analysis.
Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6x8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of "No Problems", "Potentially deterministic" and "Not Random".
So a function can be written that accepts a random sequence and does the 7 tests.
If any of the 7 tests are "Not Random" then the function returns a 0. If all of the 7 tests are "No Problems", then it returns a 1. Otherwise, it can return some number in-between based on how many tests come in as "Potentially Deterministic".
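The combining step could look something like the sketch below. The in-between formula is my own assumption (the description above only says "some number in-between"); I picked one that stays strictly between 0 and 1 whenever at least one test is "Potentially deterministic". The label strings are taken from the results page description, and combined_score is a made-up name.

```python
NO_PROBLEMS = "No Problems"
POTENTIALLY_DETERMINISTIC = "Potentially deterministic"
NOT_RANDOM = "Not Random"


def combined_score(results):
    """Fold a list of per-test labels into a single number from 0 to 1."""
    if not results:
        raise ValueError("need at least one test result")
    if any(r == NOT_RANDOM for r in results):
        return 0.0
    suspicious = sum(1 for r in results if r == POTENTIALLY_DETERMINISTIC)
    # Assumption: each suspicious test removes an equal share, never reaching 0.
    return 1.0 - suspicious / (len(results) + 1)
```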
The only thing missing from this solution is the code for the 7 tests.
You could try to zip-compress the sequence. The better you succeed the less random the sequence is.
Thus, heuristic randomness = length of compressed sequence / length of original sequence
As others have pointed out, you can't directly calculate how random a sequence is but there are several statistical tests that you could use to increase your confidence that a sequence is or isn't random.
The DIEHARD suite is the de facto standard for this kind of testing but it neither returns a single value nor is it simple.
ENT - A Pseudorandom Number Sequence Test Program, is a simpler alternative that combines 5 different tests. The website explains how each of these tests works.
If you really need just a single value, you could pick one of the 5 ENT tests and use that. The Chi-Squared test would probably be the best to use, but that might not meet the definition of simple.
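For what it's worth, a chi-squared frequency test is not much code if you only want the statistic. The sketch below is my own, not ENT's implementation: it assumes the values should be uniform over the integers low..high, and instead of a proper p-value (which needs a statistics library) it reports a rough z-score using the fact that a chi-squared variable with df degrees of freedom has mean df and variance 2*df.

```python
import math
from collections import Counter


def chi_squared_uniform(values, low, high):
    """Return (statistic, degrees_of_freedom, rough_z) for a uniform-frequency test."""
    k = high - low + 1
    if k < 2 or not values:
        raise ValueError("need at least two categories and one value")
    expected = len(values) / k
    counts = Counter(values)
    stat = sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(low, high + 1))
    df = k - 1
    rough_z = (stat - df) / math.sqrt(2 * df)
    return stat, df, rough_z
```

A large positive z means the frequencies are too uneven. A strongly negative z is suspicious too: the sequence 1, 2, ..., 10 repeated over and over has a statistic of exactly 0, which is "too perfect" for a random source. Note that this test looks only at frequencies and ignores the order of the values.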
Bear in mind that a single test is not as good as running several different tests on the same sequence. Depending on which test you choose, it should be good enough to flag up obviously suspicious sequences as being non-random, but might not fail for sequences that superficially appear random but actually exhibit some pattern.
You can treat your 100,000 outputs as possible outcomes of a random variable and calculate its associated entropy. It will give you a measure of uncertainty. (The formula is from Wikipedia, where you can find more information on entropy.) Simply: H(X) = -Σ p(xi) log p(xi), summed over every distinct value xi.
You just need to calculate the frequencies of each number in the sequence. That will give you p(xi) (e.g. if 10 appears 27 times, p(10) = 27/L, where L is 100,000 in your case). This should give you the measure of entropy.
It will not give you a number between 0 and 1, though. 0 will still be minimal uncertainty, but the upper bound will not be 1. You need to normalize the output to achieve that, for example by dividing by the logarithm of the number of possible values.
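A small sketch of that calculation, with the normalization included. The function name and the alphabet_size parameter (how many distinct values are possible, which you have to supply) are my own choices.

```python
import math
from collections import Counter


def normalized_entropy(values, alphabet_size):
    """Shannon entropy of the value frequencies, scaled to the range 0..1."""
    if alphabet_size < 2 or not values:
        raise ValueError("need at least two possible values and one observation")
    n = len(values)
    counts = Counter(values)
    # p * log2(1/p) summed over the distinct values that actually occur.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(alphabet_size)
```

Be aware of what this does and does not measure: it only looks at how often each value occurs. The sequence 1, 2, ..., 1000 with alphabet_size=1000 scores about 1, even though it is as predictable as a sequence can be.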
What you seek doesn't exist, at least not how you're describing it now.
The basic issue is this:
If it's random then it will pass tests for randomness; but the converse doesn't hold -- there's no test that can verify randomness.
For example, one could have very strong correlations between elements far apart and one would generally have to test explicitly for this. Or one could have a flat distribution but generated in a very non-random way. Etc, etc.
In the end, you need to decide on what aspects of randomness are important to you, and test for these (as James Anderson describes in his answer). I'm sure if you think of any that aren't obvious how to test for, people here will help.
Btw, I usually approach this problem from the other side: I'm given some set of data that looks for all I can see to be completely random, but I need to determine whether there's a pattern somewhere. Very non-obvious, in general.
"How random is this sequence?" is a tough question because fundamentally you're interested in how the sequence was generated. As others have said it's entirely possible to generate sequences that appear random, but don't come from sources that we'd consider random (e.g. digits of pi).
Most randomness tests seek to answer a slightly different question, which is: "Is this sequence anomalous with respect to a given model?". If your model is rolling ten-sided dice, then it's pretty easy to quantify how likely a sequence is generated from that model, and the digits of pi would not look anomalous. But if your model is "Can this sequence be easily generated from an algorithm?" it becomes much more difficult.
I want to emphasize here that the word "random" means not only identically distributed, but also independent of everything else (including independent of any other choice).
There are numerous "randomness tests" available, including tests that estimate p-values from running various statistical probes, as well as tests that estimate min-entropy, which is roughly a minimum "compressibility" level of a bit sequence and the most relevant entropy measure for "secure random number generators". There are also various "randomness extractors", such as the von Neumann and Peres extractors, that could give you an idea on how much "randomness" you can extract from a bit sequence. However, all these tests and methods can only be more reliable on the first part of this definition of randomness ("identically distributed") than on the second part ("independent").
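As an illustration of the simplest of these, here is a sketch of the von Neumann extractor (the function name is mine). It reads the bits in non-overlapping pairs, outputs the first bit of each pair whose two bits differ, and discards pairs whose bits are equal. If the input bits are independent with a fixed bias, the output bits are unbiased; if the input bits are not independent, that guarantee is gone, which is exactly the limitation described above.

```python
def von_neumann_extract(bits):
    """Map pairs (0,1) -> 0 and (1,0) -> 1, dropping (0,0) and (1,1)."""
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)
    return out
```

The fraction of bits that survive gives a rough feel for how much "randomness" was there to extract: a heavily biased input yields very few output bits.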
In general, there is no algorithm that can tell, from a sequence of numbers alone, whether the process generated them in an independent and identically distributed way, without knowledge of what that process is. Thus, for example, although you can tell that a given sequence of bits has more zeros than ones, you can't tell whether those bits—
were truly generated independently of any other choice, or
form part of an extremely long periodic sequence that is only "locally random", or
were simply reused from another process, or
were produced in some other way,
...without more information on the process. As one important example, the process of a person choosing a password is rarely "random" in this sense since passwords tend to contain familiar words or names, among other reasons.
Also I should discuss the article added to your question in 2019. That article dealt with the task of sampling from the distribution of bit strings generated by pseudorandom quantum circuits, and doing so with a low rate of error (a task specifically designed to be exponentially easier for quantum computers than for classical computers), rather than the task of "verifying" whether a particular sequence of bits (taken out of its context) was generated "at random" in the sense given in this answer. There is an explanation on what exactly this "task" is in a July 2020 paper.
In Computer Vision when analysing textures, the problem of trying to gauge the randomness of a texture comes up, in order to segment it. This is exactly the same as your question, because you are trying to determine the randomness of a sequence of bytes/integers/floats. The best discussion I could find of image entropy is http://www.physicsforums.com/showthread.php?t=274518 .
Basically, it's the statistical measure of randomness for a sequence of values.
I would also try autocorrelation of the sequence with itself. In the autocorrelation result, if there are no peaks other than the first value, that means there is no periodicity in your input.
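A plain-Python sketch of that autocorrelation check (a library would normally do this faster; the function name and the normalization by the lag-0 value are my choices):

```python
def autocorrelation(values, max_lag):
    """Return normalized autocorrelation for lags 0..max_lag (lag 0 is always 1.0)."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        raise ValueError("a constant sequence has no defined autocorrelation")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance_sum
        for lag in range(max_lag + 1)
    ]
```

For a sequence that repeats every 4 values, the result has a peak close to 1 at lag 4 (and again at lag 8). For data with no periodicity, everything after lag 0 should hover near 0.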
I would use Claude Shannon’s Information Entropy algorithm. You can find the calculation on Youtube easily. I guess it really depends upon why you want this to be measured, and what type of reporting you want to do with the data points you collect.
#JohnFx "... mathematically impossible."
poster states: take a long sequence of integers ...
Thus, just as limits are used in The Calculus, we can take the value as being the value - the study of Chaotics shows us finite limits may 'turn on themselves' producing tensor fields that provide the illusion of absolute(s), and which can be run as long as there is time and energy. Due to the curvature of space-time, there is no perfection - hence the op's "... say 1 if perfectly random." is a misnomer.
{ noted: ample observations on that have been provided - spare me }
According to your position, given two byte[] of a few k, each randomized independently - op could not obtain "a measurement of how random the sequence is". The article at Wiki is informative, and makes definite strides disentangling the matter, but
In comparison to classical physics, quantum physics predicts that the properties of a quantum mechanical system depend on the measurement context, i.e. whether or not other system measurements are carried out.
A team of physicists from Innsbruck,
Austria, led by Christian Roos and
Rainer Blatt, have for the first time
proven in a comprehensive experiment
that it is not possible to explain
quantum phenomena in non-contextual
terms.
Source: Science Daily
Let us consider non-random lizard movements. The source of the stimulus that initiates complex movements in the shed tails of leopard geckos, under your original, corrected hyper-thesis, can never be known. We, the experienced computer scientists, suffer the innocent challenge posed by newbies knowing too well that there - in the context of an un-tainted and pristine mind - are them gems and germinators of feed-forward thinking.
If the thought-field of the original lizard produces a tensor-field ( deal with it folks, this is front-line research in sub-linear physics ) then we could have "the best algorithm to take a long sequence" of civilizations spanning from the Toba Event to present through a Chaotic Inversion". Consider the question whether such a thought-field produced by the lizard, taken independently, is a spooky or knowable.
"Direct observation of Hardy's paradox
by joint weak measurement with an
entangled photon pair," authored by
Kazuhiro Yokota, Takashi Yamamoto,
Masato Koashi and Nobuyuki Imoto from
the Graduate School of Engineering
Science at Osaka University and the
CREST Photonic Quantum Information
Project in Kawaguchi City
Source: Science Daily
( considering the spooky / knowable dichotomy )
I know from my own experiments that direct observation weakens the absoluteness of perceptible tensors, distinguishing between thought and perceptible tensors is impossible using only single focus techniques because the perceptible tensor is not the original thought. A fundamental consequence of quantaeus is that only weak states of perceptible tensors can be reliably distinguished from one another without causing a collapse into a unified perceptible tensor. Try it sometime - work on the manifestation of some desired eventuality, using pure thought. Because an idea has no time or space, it is therefore in-finite. ( not-finite ) and therefore can attain "perfection" - i.e. absoluteness. Just for a hint, start with the weather as that is the easiest thing to influence ( at least as far as is currently known ) then move as soon as can be done to doing a join from the sleep-state to the waking-state with virtually no interruption of sequential chaining.
There is an almost unavoidable blip there when the body wakes up but it is just like when the doorbell rings, speaking of which brings an interesting area of statistical research to funding availability: How many thoughts can one maintain synchronously? I find that duality is the practical working limit, at triune it either breaks on the next thought or doesn't last very long.
Perhaps the work of Yokota et al could reveal the source of spurious net traffic...maybe it's ghosts.
As per Knuth, make sure you test the low-order bits for randomness, since many algorithms exhibit terrible randomness in the lowest bits.
Although this question is old, it does not seem "solved", so here is my 2 cents, showing that it is still an important problem that can be discussed in simple terms.
Consider password security.
The question was about "long" number sequences, "say 100,000", but does not state what the criterion for "long" is. For passwords, 8 characters might be considered long. If those 8 chars were "random", it might be considered a good password, but if it can be easily guessed, a useless password.
Common password rules are to mix upper case, numbers and special characters. But the commonly used "Password1" is still a bad password. (okay, 9-char example, sorry) So however many of the methods from the other answers you apply, you should also check whether the password occurs in several dictionaries, including sets of leaked passwords.
But even then, just imagine the rise of a new Hollywood star. This may lead to a new famous name that will be given to newborns, and may become popular as a password, that is not yet in the dictionaries.
If I am correctly informed, it is pretty much impossible to automatically verify that a password selected by a human is random and not derived with an easy to guess algorithm. And also that a good password system should work with computer-generated random passwords.
The conclusion is that there is no method to verify if an 8-char password is random, let alone a good and simple method. And if you cannot verify 8 characters, why would it be easier to verify 100,000 numbers?
The password example is just one example of how important this question of randomness is; think also about encryption. Randomness is the holy grail of security.
Measuring randomness? In order to do so, you should fully understand its meaning. The problem is, if you search the internet you will reach the conclusion that there is no uniform concept of randomness. For some people it's one thing, for others it's something else. You'll even find some definitions given from a philosophical perspective. One of the most frequent misleading ideas is to test if "it's random or not random". Randomness is not a "yes" or a "no"; it could be anything in between. Although it is possible to measure and quantify "randomness", its concept should remain relative regarding its classification and categorization. So, to say that something is random or not random in an absolute way would be wrong, because it's relative and even subjective for that matter. Accordingly, it is also subjective and relative to say that something follows a pattern or doesn't because, what's a pattern?
In order to measure randomness, you have to start off by understanding its mathematical premise. The premise behind randomness is easy to understand and accept. If all possible outcomes/elements in your sample space have the EXACT same probability of happening, then randomness is achieved to its fullest extent. It's that simple. What is more difficult to understand is linking this concept/premise to a certain sequence/set or a distribution of outcomes of events in order to determine a degree of randomness. You could divide your sample into sets or subsets and they could prove to be relatively random. The problem is that even if they prove to be random by themselves, it could be proven that the sample is not that random if analyzed as a whole. So, in order to analyze the degree of randomness, you should consider the sample as a whole and not subdivided. Conducting several tests to prove randomness will necessarily lead to subjectiveness and redundancy. There are no 7 tests or 5 tests, there is only one.
And that test follows the already mentioned premise and thus determines the degree of randomness based on the outcome distribution type or, in other words, the outcome frequency distribution type of a given sample. The specific sequence of a sample is not relevant. A specific sequence would only be relevant if you decide to divide your sample into subsets, which you shouldn't, as I already explained. If you consider the variables p (the number of possible outcomes/elements in the sample space) and n (the number of trials/events/experiments), you will have a total number of possible sequences of p^n (p to the power of n). If we consider the already mentioned premise to be true, any of these possible sequences has the exact same probability of occurring. Because of this, any specific sequence would be inconclusive for calculating the "randomness" of a sample. What is essential is to calculate the probability of the outcome distribution type of a sample happening. In order to do so, we would have to count all the sequences that are associated with the outcome distribution type of a sample.
So if you consider s = (number of all possible sequences that lead to an outcome distribution type), then s/(p^n) would give you a value between 0 and 1 which should be interpreted as a measurement of randomness for a specific sample, where 1 is 100% random and 0 is 0% random. It should be said that you will never get a 1 or a 0, because even if a sample represents the MOST likely random outcome distribution type it could never be proven as being 100%. And if a sample represents the LEAST likely random outcome distribution type it could never be proven as being 0%. This happens because, since there are several possible outcome distribution types, no single one of them can represent being 100% or 0% random. In order to determine the value of the variable s, you should use the same logic used in multinomial distribution probabilities.
This method applies to any number of possible outcomes/elements in the sample space and to any number of experiments/trials/events. Notice that the bigger your sample is, the more possible outcome frequency distribution types there are, and the lower the degree of randomness that can be proven by each one of them.
Calculating [s/(p^n)]*100 will give you the probability, as a percentage, of the outcome frequency distribution type of a set occurring if the source is truly random. The higher the probability, the more random your set is. To actually obtain a value of randomness, you would have to divide [s/(p^n)] by the highest value of [s/(p^n)] among all possible outcome frequency distribution types and multiply by 100.
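As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes that an "outcome frequency distribution type" means the exact per-outcome counts of the sample, so that s is the multinomial coefficient n!/(k1! k2! ... kp!). It also assumes that the highest value of s belongs to the most evenly balanced counts. The function names are made up for this example.

```python
from collections import Counter
from math import factorial

def sequences_with_counts(counts):
    """Multinomial coefficient: how many sequences have exactly these per-outcome counts."""
    s = factorial(sum(counts))
    for k in counts:
        s //= factorial(k)  # stays an exact integer at every step
    return s

def distribution_probability(sample, p):
    """s / p^n for a sample drawn from p equally likely outcomes."""
    counts = list(Counter(sample).values())
    return sequences_with_counts(counts) / p ** len(sample)

def relative_randomness(sample, p):
    """s divided by the largest possible s for this n and p, times 100."""
    counts = list(Counter(sample).values())
    # Assumption: the most balanced counts maximise s.
    q, r = divmod(len(sample), p)
    most_balanced = [q + 1] * r + [q] * (p - r)
    return 100 * sequences_with_counts(counts) / sequences_with_counts(most_balanced)

print(relative_randomness([1, 2, 3, 4, 5, 6], p=6))  # 100.0
print(relative_randomness([1, 1, 1, 1, 1, 1], p=6))  # far below 100
```

Because only the counts are used, any reordering of the same sample gives the same score, which matches the claim above that the specific sequence is not relevant.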
My kids asked me this question and I couldn't really give a concise, understandable explanation.
So I'm hoping someone on SO can.
How about, "Because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it's not very random! Imagine trying to give someone instructions on how to choose a random number."
Here's a kid friendly explanation:
Get a die (the number of sides doesn't matter)
Write these down on a piece of paper:
Move right
Move up
Move up
Turn the dice over
Move down
Move right
Show them the die and the paper. Explain that the die represents the computer and the paper represents the math, or algorithm, that tells the computer what number it will return.
Now, roll the die. Tell them that you are "seeding" the computer, that is, asking it to start at a random die position.
Follow each step on the paper (for example, "move right") by moving the die.
Let's say that you threw a 6-sided die and it was seeded at 5. By moving right, you get a 4.
Explain that the computer must start with a starting value. This could come from any number of sources, such as the date or mouse movement. Show them that how they throw the die determines the starting value.
Explain that the piece of paper is how the computer gets the next number. Tell them that the instructions on the paper can be changed as easily as a programmer can change the algorithm of the random number generator.
Have fun showing them the various possibilities, which are limited only by their imaginations.
Now for the answer to your question:
Tell them that when a good mathematician knows the starting value and which step the computer is currently at, the mathematician can tell what the next value of the random number will be.
Ask the child to hide the paper and throw the die.
Then ask the child to follow the steps on the paper while you write down how they get each next random number.
Afterwards, show them your paper. Now that you have a copy of their random number generator, it's easy for anyone else to "guess" the next random number to come out.
No matter how creative the child is with their algorithm, you should still be able to deduce it. Tell your child that in the computer world nothing is hidden, and that just by observation, even if only the numbers are observed, the random number algorithm can be discovered.
...as a side effect, if the child was able to come up with a good algorithm that confused you, so that you can't deduce the next number in the sequence, then you have a bright child. :D
Here's my attempt at explaining randomness at an approximately eighth-grade level. Hope your kids find it useful!
Surprising as it may seem, a computer is not very smart. Computers must follow their instructions blindly, and are therefore completely predictable. A computer that doesn't follow its instructions in this manner is, in fact, broken! We want computers to do exactly what we tell them.
That's precisely what makes it hard to do things randomly. Computers must be told a sequence of instructions on how to generate random numbers. But that's not really random, because if you gave anybody else the instructions and the same starting point, they could come up with the same answers. So computers can't be truly random just by following instructions.
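To make the "same instructions, same starting point, same answers" point concrete, here is a small Python sketch. The function `toy_generator` and the constants 21, 7 and 100 are arbitrary choices for illustration only, not a generator anyone should actually use.

```python
def toy_generator(seed, count):
    """Follow one fixed instruction over and over, starting from `seed`."""
    state = seed
    numbers = []
    for _ in range(count):
        # The "instructions": multiply, add, keep the last two digits.
        state = (state * 21 + 7) % 100
        numbers.append(state)
    return numbers

alice = toy_generator(seed=5, count=5)
bob = toy_generator(seed=5, count=5)
print(alice)
print(alice == bob)  # True
```

Two people who run the same instructions from the same seed always end up with identical "random" numbers.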
Ask them to devise a step-by-step method to generate a random number.
And don't accept "pick a number from 1 to 10" as an answer ;)
Trying out a problem should illustrate the difficulty of having to generate random numbers from a set of instructions, just like what computers actually have to do.
Because computers are deterministic machines.
Generating random numbers on a computer is like playing "Eenie meenie miney moe" when choosing who's It first in a game of tag. On the surface it does look random, but when you get into the details, it's completely deterministic. It's hard to make eenie meenie miney moe into a scheme that a person really can't predict the outcome of.
Also, there are some difficulties with getting the distribution nice and even.
Because given any input, an algorithm produces the exact same output every single time. And you can't just provide a "random" input, because you're trying to generate the random number in the first place.
"Kids, unless they're broken, computers never lie, and they always do what you tell them to do. Even when we are disappointed by the results, it always turns out that they were doing what they were told to do with complete fidelity. They can only do two things: add one and one, and move a number from one place to another. If you want them to produce random numbers, you need to explain to them how to do that in terms of adding one and one and moving. Once you have explained that, the results will not be random."
Because the only true source of randomness exists at the quantum level. With suitable hardware assists, computers can access this level. For example, they can sample the decay of a radioactive isotope or the noise from a thermionic valve. But your basic PC doesn't come with this cool stuff.
A simple explanation for the children:
The definition of randomness is a philosophical and mathematical question, beyond the scope of this answer, but by definition there is no such thing as a "random" number. In a metaphysical sense, a number is only random as part of a sequence; however, there is a probability that a sequence follows certain statistical distributions depending on the sample size. A random number generator (in our case a pseudo-random number generator, or PRNG) is simply a device to produce a quasi-random sequence of numbers that we can only estimate (based on the given probability inherent within the sequence) to be random.
You should explain to the children that programs can only mimic these devices using complex mathematical formulas (which guarantee a lack of "randomness" by definition because they are the result of some function, or procedural algorithm). Typically, rigorous statistical analysis is necessary in order to differentiate the output of a quantum hardware RNG (use this as an opportunity to explain the Heisenberg Uncertainty Principle to your kids!) from that of a strong software PRNG.
Had to be done really
Source: http://xkcd.com/221/
Because there is no such thing as a random number.
Random is a human concept that we use when we cannot comprehend data and do not understand it. If we are to believe that science will ultimately lead to an understanding of how everything works then surely everything is deterministic.
Take away the human and there is no random; there is only "this". It happens because it happens, not because it is random.
Because a program is a system and everything in a system is made to run with consistency and regularity. Randomness has no place in a system.
It is hard because, given the same set of inputs and conditions, a program will produce the same result every time. This by definition is not random.
Algorithms to generate random numbers are inevitably deterministic. They take a small random seed, and use it to obtain a long string of pseudo-random digits.
It's very difficult to do this without introducing subtle patterns into the data. A string of digits can look perfectly random but have repeated patterns which make the distribution inappropriate for applications where randomness is required.
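One well-known example of such a subtle pattern is a linear congruential generator whose modulus is a power of two. The Python sketch below uses constants that are often quoted in textbook examples; treat them, and the function name `lcg`, as illustrative assumptions. The full values look scrambled, but the lowest bit simply alternates, because the multiplier and the increment are both odd.

```python
def lcg(seed, count, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator: x -> (a * x + c) mod m."""
    x = seed
    values = []
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values

values = lcg(seed=42, count=10)
low_bits = [v & 1 for v in values]
print(values)    # looks irregular
print(low_bits)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

Anyone using only the lowest bit of this generator, for example to simulate a coin flip, would get a perfectly predictable heads/tails alternation.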
Computers can only execute algorithmic computations, and a truly random number isn't an algorithmic thing. You can get algorithms that produce numbers that behave like random numbers; such algorithms are called 'Pseudo-Random number generators'.
At various times in the past, people have made random number generators from analog-digital converters connected to sources of electronic noise, but this tends to be fairly specialised kit.
Primarily because computers don't have any functions that behave in unpredictable, random ways. A computer is predictable, which allows us to program reliable software. If it weren't predictable, it would be easier to generate a random number (since our software could rely on that unpredictable behavior).
While it's possible to generate pseudo-random numbers, and numbers that are distributed randomly, you cannot generate truly random numbers without separate hardware. There is hardware that generates truly random numbers based on "quantum" interactions (at least according to the manufacturers). Online poker sites sometimes use these adapters for their generators.
Apparently there are even online services to provide random numbers - random.org for example.
As surprising as it may seem, it is difficult to get a computer to do something by chance. A computer follows its instructions blindly and is therefore completely predictable. (A computer that doesn't follow its instructions in this manner is broken.) There are two main approaches to generating random numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs).
Actually, on most modern computers it's not hard to produce numbers that are "random enough" for most purposes. As others have noted, the critical thing is having a source of randomness. You can't just write a program that will produce randomness algorithmically, but you can observe randomness in the various activities of most computers of reasonable complexity, i.e., the ones we typically think of when writing programs. One such source is timing data of interrupts from various system devices.
At one time many computers had no way to get at this data and could only offer pseudorandomness, that is, a random, but repeatable distribution of numbers based on a particular seed. For many purposes this is sufficient -- choosing a different seed each time results in good enough randomness. For other purposes, such as encryption, this isn't strong enough and you need some randomness to start with that isn't repeatable or predictable. Today, most computers (with the exception of embedded devices, perhaps) are sophisticated enough to have a source of randomness that can generate encryption-strength random numbers. For instance, Linux has /dev/random and the .NET framework supports the cryptographically strong RandomNumberGenerator class which has a number of implementations.
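For comparison, here is how the operating system's randomness source can be reached from Python. This is a minimal sketch using the standard library's `os.urandom` and `secrets` modules; the range 1 to 999 is just an example.

```python
import os
import secrets

# 16 bytes taken from the operating system's randomness source.
token = os.urandom(16)

# An integer from 1 to 999, drawn with the OS-backed generator.
value = secrets.randbelow(999) + 1

print(token.hex(), value)
```

Unlike a seeded PRNG, there is no seed to supply here, so the output is not meant to be repeatable from one run to the next.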
It's probably helpful to distinguish a number that is hard to predict (which a computer can create) from something that is not deterministic (which is a bit tougher for computers and, theoretically, for any physical being).
It's easy to come up with an algorithm that generates unexpected numbers, that appear random in some sense. But to design an algorithm that generates true random numbers, well, that's hard.
Imagine designing an algorithm to simulate a dice roll. You can easily formulate some procedure to generate different numbers on each iteration. But can you guarantee that, in the long run (I mean, out to infinity), the number of times that 6 comes up will be the same as for any other number? When designing a good random number generator, that's the kind of commitment that you have to make. You have to provide strong guarantees (i.e. mathematical proofs) about the randomness, if the application (e.g. a lottery) requires it.
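A finite experiment can never prove the long-run guarantee, but it can at least show how far the observed face counts are from an even split. The Python sketch below tallies simulated rolls and computes a chi-square statistic. The helper names, the seed 2024 and the 60,000 rolls are arbitrary assumptions made for this example.

```python
import random
from collections import Counter

def face_counts(rolls):
    """How often each face 1..6 appeared."""
    counts = Counter(rolls)
    return [counts.get(face, 0) for face in range(1, 7)]

def chi_square(counts):
    """Sum of (observed - expected)^2 / expected, assuming all faces are equally likely."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

rng = random.Random(2024)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
counts = face_counts(rolls)
print(counts, chi_square(counts))
```

A statistic near zero means the counts are close to even; a very large one suggests a biased die. Either way, this is statistical evidence about one sample and not the mathematical proof the paragraph above asks for.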
It is relevant to note that humans perform very poorly at generating random numbers. Computers are worse because they just follow a strict set of commands. Humans can only generate good (pseudo) random numbers when following an algorithm, a set of commands. Computers are the same.
It should be noted, though, that a computer can gather entropy from the "environment" connected to it, such as keyboard and mouse actions, which helps in generating random numbers (either directly or by seeding a PRNG).
To make the computer generate a random number, the computer has to have a source of randomness to start with.
It has to be fed a seed that can't be anticipated or calculated. If the seed comes from a clock, then it can be predicted or calculated by knowing the time. If the seed comes from something like filming a lava lamp and taking numbers from the picture stream, then it's much harder to work out the seed and so to know what the next number will be.
The computer does not have a built-in lava lamp to generate that randomness, and that's what makes it hard. We have to substitute real randomness with some input that exists in the computer, maybe by logging passing TCP/IP packets or other things, but there are not many ways to get such sources of randomness in.
Computers just don't have suitable hardware. An ordinary computer's hardware is meant to be deterministic. With suitable hardware, like that mentioned here, random numbers are not a problem at all.
A while back I came across the "Dice-O-Matic"
http://GamesByEmail.com/News/DiceOMatic
Kind of an interesting real-world application of the problem.
It's not hard; here are a couple for free: 12, 1400, 397.6