Randomness of random.randint() vs uuid4 - random

I have used uuid4 since now in my project database (still in development) but I am now thinking about switching to random bigints for PK.
Can someone tell me, how much worse the randomness of randint(1000000000000000, 9007199254740991) (javascript safe) vs uuid4 is?
I bet, I am using the term randomness wrong here. What I want to achieve is, that I do not have continous but of course unique ids in my DB, thats why I used uuid until now. But because I ran into a lot of troubles I would like to switch to integers now and would like to know how much "more random" uuid would have been

Just eyeballing it, 9007199254740991 looks like about 55 bits; UUIDv4 has 122 random bits, which means it has about 2^67 times more randomness. You lose one bit or so by cutting off the low end of the range, which is easy to fix, but it doesn't really matter when the difference is already that stark.

Related

Commercial-grade randomization for Poker game

I need some advice on how to tackle an algorithmic problem (ie. not programming per se). What follows are my needs and how I tried to meet them. Any comments for improvement would be welcome.
Let me first start off by explaining my goal. I would like to play some poker about a billion times. Maybe I'm trying to create the next PokerStars.net, maybe I'm just crazy.
I would like to create a program that can produce better randomized decks of cards, than say the typical program calling random(). These need to be production quality decks created from high quality random numbers. I've heard that commercial-grade poker servers use 64-bit vectors for every card, thus ensuring randomness for all the millions of poker games played daily.
I'd like to keep whatever I write simple. To that end, the program should only need one input to achieve the stated goal. I have decided that whenever the program begins, it will record the current time and use that as the starting point. I realize that this approach would not be feasible for commercial environments, but as long as it can hold up for a few billion games, better than simpler alternatives, I'll be happy.
I began to write pseudo-code to solve this problem, but ran into a thorny issue. It's clear to me, but it might not be to you, so please let me know.
Psuedo-code below:
Start by noting the system time.
Hash the current time (with MD5) around ten times (I chose the ten arbitrarily).
Take the resulting hash, and use it as the seed to the language-dependent random() function.
Call random() 52 times and store the results.
Take the values produced by random() and hash them.
Any hash function that produces at least 64-bits of output will do for this.
Truncate (if the hash is too big) so the hashes will fit inside a 64-bit double.
Find a way to map the 52 doubles (which should be random now, according to my calculations) into 52 different cards, so we can play some poker.
My issue is with the last step. I cannot think of a way to properly map each 64-bit value to a corresponding card, without having to worry about two numbers being the same (unlikely) or losing any randomness (likely).
My first idea was to break 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF into four even sections (to represent the suits). But there is no guarantee that we will find exactly thirteen cards per section, which would be bad.
Now that you know where I am stuck, how would you overcome this challenge?
-- Edited --
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
My real desire is to take something simple and predictable, like the system time, and transform it into a randomized deck of cards. Seeding random() with the system time is a BAD way of going about doing this. Hence the hashing of the time and hashing the values that come out of random().
Hell, if I wanted to, I could hash the bytes from /dev/random, just for shizzles and giggles. Hashing improves the randomness of things, doesn't it? Isn't that why modern password managers store passwords that have been hashed thousands of times?
-- Edit 2 --
So I've read your answers and I find myself confused by the conclusion many of you are implying. I hinted at it in my first edit, but it's really throwing me for a loop. I'd just like to point it out and move on.
Rainbow tables exist which do funky math and clever magic to essentially act as a lookup table for common hashes that map to a particular password. It is my understanding that longer, better passwords are unlikely to show up in these rainbow tables. But the fact still stands that despite how common many user passwords are, the hashed passwords remain safe after being hashed thousands of times.
So is that a case where many deterministic operations have increased the randomness of the original password (or seems to?) I'm not saying I'm right, I'm just saying thats my feeling.
The second thing I want to point out is I'm doing this backwards.
What I mean is that you all are suggesting I take a sorted, predictable, non-random deck of cards and use the Fisher-Yates shuffle on it. I'm sure Fisher-Yates is a fine algorithm, but lets say you couldn't use it for whatever reason.
Could you take a random stream of bytes, say in the neighborhood of 416 bytes (52 cards with 8 bytes per card) and BAM produce an already random deck of cards? The bytes were random, so it shouldn't be too hard to do this.
Most people would start with a deck of 52 cards (random or not) and swap them around a bunch of times (by picking a random index to swap). If you can do that, then you can take 52 random numbers, run through them once, and produce the randomized deck.
As simply as I can describe it,
The algorithm to accepts a stream of randomized bytes and looks at each 8-byte chunk. It maps each chunk to a card.
Ex. 0x123 maps to the Ace of Spades
Ex. 0x456 maps to the King of Diamonds
Ex. 0x789 maps to the 3 of Clubs
.... and so on.
As long as we chose a good model for the mapping, this is fine. No shuffling required. The program will be reduced to two steps.
Step 1: Obtain a sufficient quantity of random bytes from a good source
Step 2: Split this stream of bytes into 52 chunks, one for each card in the deck
Step 2a: Run through the 52 chunks, converting them into card values according to our map.
Does that makes sense?
You are massively overcomplicating the problem. You need two components to solve your problem:
A shuffling algorithm
A sufficiently high-quality random number generator for the shuffling algorithm to use.
The first is easy, just use the Fisher-Yates shuffle algorithm.
For the second, if you want sufficient degrees of freedom to be able to generate every possible permutation (of the 52! possibilities) then you need at least 226 bits of entropy. Using the system clock won't give you more than 32 or 64 bits of entropy (in practice far fewer as most of the bits are predictable), regardless of how many redundant hashes you perform. Find an RNG that uses a 256-bit seed and seed it with 256 random bits (a bootstrapping problem, but you can use /dev/random or a hardware RNG device for this).
You don't mention which OS you're on, but most modern OS's have pre-made sources of high quality entropy. On Linux, it's /dev/random and /dev/urandom, from which you can read as many random bytes as you want.
Writing your own random number generator is highly non-trivial, if you want good randomness. Any homebrew solution is likely to be flawed and could potentially be broken and its outputs predicted.
You will never improve your randomness if you still use a pseudo-random generator, no matter how many deterministic manipulations you do to it. In fact, you are probably making it considerably worse.
I would use a commercial random number generator. Most use hardware solutions, like a Geiger counter. Some use existing user input as a source of entropy, such as background noise into the computer's microphone or latency between keyboard strokes.
Edit:
You mentioned that you also want to know how to map this back to a shuffle algorithm. That part is actually quite simple. One straightforward way is Fisher-Yates shuffle. Basically all you need from your RNG is a random number uniformly distributed between 0 and 51 inclusive. That you can do computationally given any RNG and is usually built into a good library. See the "Potential sources of bias" section of the Wikipedia article.
Great question!
I would strongly discourage you from using the random function that comes built-in with any programming language. This generates pseudorandom numbers that are not cryptographically secure, and so it would be possible for a clever attacker to look at the sequence of numbers coming back out as cards and to reverse-engineer the random number seed. From this, they could easily start predicting the cards that would come out of the deck. Some early poker sites, I've heard, had this vulnerability.
For your application, you will need cryptographically secure random numbers so that an adversary could not predict the sequence of cards without breaking something cryptographically assumed to be secure. For this, you could either use a hardware source of randomness or a cryptographically secure pseudorandom number generator. Hardware random generators can be expensive, so a cryptographically secure PRNG may be a good option.
The good news is that it's very easy to get a cryptographically secure PRNG. If you take any secure block cipher (say, AES or 3DES) and using a random key start encrypting the numbers 0, 1, 2, ..., etc. then the resulting sequence is cryptographically secure. That is, you could use /dev/random to get some random bytes for use as a key, then get random numbers by encrypting the integers in sequence using a strong cipher with the given key. This is secure until you hand back roughly √n numbers, where n is the size of the key space. For a cipher like AES-256, this is 2128 values before you'd need to reset the random key. If you "only" want to play billions of games (240), this should be more than fine.
Hope this helps! And best of luck with the project!
You should definitely read the answer to this question: Understanding "randomness"
Your approach of applying a number of arbitrary transformations to an existing pseudorandom number is very unlikely to improve your results, and in fact risks rendering less random numbers.
You might consider using physically derived random numbers rather than pseudorandom numbers:
http://en.wikipedia.org/wiki/Hardware_random_number_generator
If you are definitely going to use pseudorandom numbers, then you are likely to be best off seeding with your operating system's randomness device, which is likely to include additional entropy from things like disk seek times as well as user IO.
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
Conversion of what? Just take a deck of cards and, using your cryptographically-secure PRNG, shuffle it. This will produce every possible deck of cards with equal probability, with no way for anyone to determine what cards are coming next - that's the best you could possibly do.
Just make sure you implement the shuffling algorithm correctly :)
In terms of actually turning the random numbers into cards(once you follow the advice of others in generating the random numbers), You can map the lowest number to the Ace of diamonds, the 2nd lowest number to the 2 of diamonds, etc.
Basically you assume the actual cards have a natural ordering and then you sort the random numbers and map to the deck.
Edit
Apparently wikipedia lists this method as an alternative to the Fisher-Yates algorithm(which I hadn't previously heard of -Thanks Dan Dyer!). One thing in the wikipedia article that I didn't think of is that you need to be sure that you don't repeat any random numbers if you're using the algorithm I described.
A ready-made, off the shelf poker hand evaluator can be found here. All feedback welcomed at the e-mail address found therein.

when to resize a hash table?

In various hash table implementations, I have seen "magic numbers" for when a mutable hash table should resize (grow). Usually this number is somewhere between 65% to 80% of the values added per allocated slots. I am assuming the trade off is that a higher number will give the potential for more collisions and a lower number less at the expense of using more memory.
My question is how is this number arrived at?
Is it arbitrary? based on testing? based on some other logic?
At a guess, most people at least start from the numbers in a book (e.g., Knuth, Volume 3), which were produced by testing. Depending on the situation, some may carry out testing afterwards, and make adjustments accordingly -- but from what I've seen, these are probably in the minority.
As I outlined in a previous answer, the "right" number also depends heavily on how you resolve collisions. For better or worse, this fact seems to be widely ignored -- people frequently don't pick numbers that are particularly appropriate for the collision resolution they use.
OTOH, the other point I found in my testing is that it only rarely makes a whole lot of difference. You can pick numbers across a fairly broad range and get pretty similar overall speed. The main thing is to be careful to avoid pushing the number too high, especially if you're using something like linear probing for collision resolution.
I think you don't want to consider "how full" the table is (how many "buckets" out of total buckets have values) but rather the number of collisions it might take to find a spot for a new item.
I read some compiler book years ago (can't remember title or authors) that suggested just using linked lists until you have more than 10 to 12 items. That would seem to support more than 10 collisions means time to re-size.
The Design and Implementation of Dynamic. Hashing for Sets and Tables in Icon suggests that an average hash chain length of 5 (in that algorithm, the average number of collisions) is enough to trigger a rehash. Seems supported by testing, but I'm not sure I'm reading the paper correctly.
It looks like the resize condition is mainly the result of testing.
That depends on the keys. If you know that your hash function is perfect for all possible keys (for example, using gperf), then you know that you'll have only few collisions, so the number is higher.
But most of the time, you don't know much about the keys except that they are text. In this case, you have to guess since you don't even have test data to figure out in advance how your hash function is behaving.
So you hope for the best. If you hash function is very bad for the keys, then you will have a lot of collisions and the point of growth will never be reached. In this case, the chosen figure is irrelevant.
If your hash function is adequate, then it should create only a few collisions (less than 50%), so a number between 65% and 80% seems reasonable.
That said: Unless your hash table must be perfect (= huge size or lots of accesses), don't bother. If you have, say, ten elements, considering these issues is a waste of time.
As far as I'm aware the number is a heuristic based on empirical testing.
With a reasonably good distribution of hash values it seems that the magic load factor is -- as you say -- usually around 70%. A smaller load factor means that you're wasting space for no real benefit; a higher load factor means that you'll use less space but spend more time dealing with hash collisions.
(Of course, if you know that your hash values are perfectly distributed then your load factor can be 100% and you'll still have no wasted space and no hash collisions.)
Collisions depend highly on data and used hash function.
Most of numbers based on heuristics or on assumption about normal distribution of hash values. (AFAIK values about 70% are typical for extendible hash tables, but one can always construct such data stream, that you get much more/less collisions)

How does one go about reverse engineering an algorithm?

I'm wondering how does one go about reversing an algorithm such as one for storing logins or pin codes.
Lets say I have an amount of data where:
7262627 -> ? -> 8172
5353773 -> ? -> 1132
etc. This is just an example. Or say a hex string that is tansformed into another.
&h8712 -> &h1283 or something like that.
How do I go about starting to figure out what that algorithm is? Where does one start?
Would you start trying different shifts, xors and hope something stands out? I'm sure there's a better way as this seems like stabbing in the dark.
Is it even practically possible to reverse engineer this kind of algorithm?
Sorry if this is a stupid question. Thanks for your help / pointers.
There are a few things people try:
Get the source code, or disassemble an executable.
Guess, based on the hash functions other people use. For example, a hash consisting of 32 hex digits might well be one or more repetitions of MD5, and if you can get a single input/output pair then it is quite easy to confirm or refute this (although see "salt", below).
Statistically analyze a large number of pairs of inputs and outputs, looking for any kind of pattern or correlations, and relate those correlations to properties of known hash functions and/or possible operations that the designer of the system might have used. This is beyond the scope of a single technique, and into the realms of general cryptanalysis.
Ask the author. Secure systems don't usually rely on the secrecy of the hash algorithms they use (and don't usually stay secure long if they do). The examples you give are quite small, though, and secure hashing of passwords would always involve a salt, which yours apparently don't. So we might not be talking about the kind of system where the author is confident to do that.
In the case of a hash where the output is only 4 decimal digits, you can attack it simply by building a table of every possible 7 digit input, together with its hashed value. You can then reverse the table and you have your (one-to-many) de-hashing operation. You never need to know how the hash is actually calculated. How do you get the input/output pairs? Well, if an outsider can somehow specify a value to be hashed, and see the result, then you have what's called a "chosen plaintext", and an attack relying on that is a "chosen plaintext attack". So a 7 digit -> 4 digit hash would be very weak indeed if it was used in a way which allowed chosen plaintext attacks to generate a lot of input/output pairs. I realise that's just one example, but it's also just one example of a technique to reverse it.
Note that reverse engineering the hash, and actually reversing it, are two different things. You could figure out that I'm using SHA-256, but that wouldn't help you reverse it (i.e., given an output, work out the input value). Nobody knows how to fully reverse SHA-256, although of course there are always rainbow tables (see "salt", above) <conspiracy>At least nobody admits they do, so it's no use to you or me.</conspiracy>
Probably, you can't. Suppose the transformation function is known, something like
function hash(text):
return sha1("secret salt"+text)
But the "secret salt" is not known, and is cryptographically strong (a very large, random integer). You could never brute force the secret salt from even a very large number of plain-text, crypttext pairs.
In fact, if the precise hash function used was known to be one of two equally strong functions, you could never even get a good guess between which one was being used.
Stabbing in the dark will drive you to insanity. There are some algorithms that, given current understanding, you couldn't hope to deduce the inner workings of between now and the [predicted] end of the universe without knowing the exact details (potentially including private keys or internal state). Of course, some of these algorithms are the foundations of modern cryptography.
If you know in advance that there's a pattern to be discovered though, there are sometimes ways of approaching this. For instance, if the dataset contains several input values that differ by 1, compare the corresponding output values:
7262627 -> 8172
7262628 -> 819
7262629 -> 1732
...
7262631 -> 3558
Here it's fairly clear (given a few minutes and a calculator) that when the input increases by 1, the output increases by 913 modulo 8266 (i.e. a simple linear congruential generator).
Differential cryptanalysis is a relatively modern technique used to analyse the strength of cryptographic block ciphers, relying on a similar but more complex idea for where the cipher algorithm is known, but it's assumed the private key isn't. Input blocks differing from each other by a single bit are considered and the effect of that bit is traced through the cipher to deduce how likely each output bit is to "flip" as a result.
Other ways of approaching this kind of problem would be to look at the extremes (maximum, minimum values), distribution (leading to frequency analysis), direction (do the numbers always increase? decrease?) and (if this is allowed) consider the context in which the data sets were found. For instance, some types of PIN codes always contain a repeated digit to make them easier to remember (I'm not saying a PIN code can necessarily be deduced from anything else - just that a repeated digit is one less digit to worry about!).
Is it even practically possible to reverse engineer this kind of algorithm?
It is possible with a flawed algorithm and enough encrypted/unencrypted pairs, but a well designed algorithm can eliminate that possibility of doing it at all.

Pseudo-Random Binary Sequence Prediction

Given a pseudo-random binary sequence (e.g.: 00101010010101) of finite values, predict how the sequence will continue. Can someone please tell me the easiest way to do it? Or in case it's too difficult for someone who can barely play solitaire on its computer, can someone tell me where to get my first steps...
PS: can this technique be used to predict the colour of the next electronic roulette number (e.g.: assigning 1 and 0 to red and black respectively)?
Cryptographically secure pseudorandom number generators are intended specifically to make what you want to do impossible. In particular, they satisfy the "next bit test": given k bits of their output, you cannot guess bit k+1 with probability greater than 1/2.
Plain pseudorandom number generators that do not satisfy the next bit test can be attacked and in fact security vulnerabilities have been discovered in real world systems due to the choice of PRNG. In particular, linear congruential generators are known to be somewhat (or completely) predictable, and some versions of Unix random may use this algorithm. This method is quite math intensive though. If you want to go down this path a search for "linear congruential generator prediction" is a place to start.
Another attack if you are aware of the PRNG implementation is to try to determine the seed used to generate the sequence you are analyzing. The seed is sometimes based on guessable information like time of day, process ID, etc.
Well, for pseudo-random sequences, the only possibility is to keep count how many of each possibility has come before. If the 1s outweigh the 0s, it's more likely that the next one will be 0. How much more likely depends on the relative occurrences of each.
Note that this won't work for true randomness since the events are independent, despite what the statisticians tell you :-)
You'll find that out (painfully) the first time you get a run of 13 reds on the table when you're using the double-on-loss method of playing roulette. In any case, the house derives its advantage from 0 (and double-0 on some tables) which are neither red nor black.
This is a decent question but I think if "you can barely play solitaire" it might be out of your reach right now.
You should look into picking up a basic language, and most are going to say PHP but I'm wary of recommending that to a beginner (it's pretty easy to get working though, see:XAMPP). Java is probably an "easy-to-get-running-and-work-with" language but I'm sure there's better threads on here about which language to start with (Python or something probably wins because experienced programmers love it).
By the way, your English is fine (I didn't notice you were a non-native English speaker).
Now, as for your question, if you're looking at true pattern matching. I'd be inclined to convert this idea to code:
"CURRENTPOINT" is end of first letter.
LOOP: Pick letter(s) from Start to "CURRENTPOINT"
Break the rest of your binary string into blocks of the same size.
See if these blocks all equal your picked letters.
If not, move "CURRENTPOINT" along and repeat the LOOP until you run out of letters.
If so, you have your "repeating section."
If you're just guessing that the random generator is temporarily biased, and that this bias will re-establish a baseline (balanced 0s and 1s) in the reasonably short-term then you can compare the count of each 0s and 1s and say the other is more likely based on the deviation from your baseline. However, be careful of the Monte Carlo fallacy.
To answer the PS first: No, because roulette spins are independent events so there's nothing predictive in the historical sequence of outcomes.
The general question is hard and interesting.
This website can infer a surprising number of sequences from their initial values:
http://www.research.att.com/~njas/sequences/
Note that it's for arbitrary integer sequences.
I tried it on simple patterns like {0,0,1,1,0,0,1,1,...} and it says the right thing.
I noticed that nobody told you about periodicity.
Pseudo-random sequence always works on mathematical operation. (until the quantic computer ^^)
An usual way to generate one is to divide two prime number (not sure it's the right word but whatever).
for instance
1/3=1.333333.....
9/7=1,2857142857142857142857142857143
Those are fairly small number and what do we notice? Periodicity.
1/3=1.3 3 3 3 3 3.....
9/7=1,2857 142857 142857 142857 142857 143
The more big is the prime number the more the sequence in that case: 3 and 142857 will be big
So if you look to a pseudo-random sequence for a long time you may find a periodicity and be able to "guess" the next number. But that could take a while.
PS: sorry for my English, I’m a bit rusty ^^
What you need to think about is the properties of randomness, study those. For example, "Randomness runs in bunches". Compare a random sequence against a predictable sequence: you won't normally find bunches in the predictable one. To take advantage of bunches wait for the bunch. And with a little luck you will win.

Why is it hard for a program to generate random numbers?

My kids asked me this question and I couldn't really give a concise, understandable explanation.
So I'm hoping someone on SO can.
How about, "Because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it's not very random! Imagine trying to give someone instructions on how to choose a random number."
Here's a kid friendly explanation:
Get a Dice (the number of sides doesn't matter)
Write these down on a piece of paper:
Move right
Move up
Move up
Turn the dice over
Move down
Move right
Show them the dice and paper. Explain that the dice represents the computer and the
paper represent the math or algorithm that tells the computer what number it will return.
Now, roll the dice. Tell them that you are "seeding" or asking the computer to start at a random dice position.
Follow each step in the paper (move right) by moving the dice.
Let's say that you threw a 6 sided die and it was seeded at 5. By moving right, you get a 4.
Explain that the computer must start with a starting value. This could be given by any number of sources such as the date or mouse movement. Show them that how they throw the dice determines the starting value.
Explain that the piece of paper is how the computer get the next number. Tell them that the instructions on the paper can be changed as easily as the algorithm for the random generator can be changed by the programmer.
Have fun showing them the various possibilities that is only limited by their imaginations.
Now for the answer to your question:
Tell them that when a good mathematician knows the starting value and what step the computer is currently at, the mathematician can tell what is the next value of the random number.
Ask the child were to hide the paper and throw the dice.
Then ask the child to follow the steps on the paper, you then write down how he gets the next random number.
Afterwards, show them your paper. Now that you have a copy of their random number generator, its easy for anyone else to "guess" the next random to come out.
No matter how creative the child is with their algorithm, you should still be able to deduce their algorithm. Tell your child that in the computer world, nothing is hidden and just by observation, even if its just the numbers that was observed, the random number algorithm can be discovered.
...as a side effect, if the child was able to come up with a good algorithm that confused you, in which you can't deduce the next sequence, then you have a bright child. :D
Here's my attempt at explaining randomness at an approximately eighth-grade level. Hope your kids find it useful!
Surprising as it may seem, a computer is not very smart. Computers must follow their instructions blindly, and are therefore completely predictable. A computer that doesn't follow its instructions in this manner is, in fact, broken! We want computers to do exactly what we tell them.
That's precisely what makes it hard to do things randomly. Computers must be told a sequence of instructions on how to generate random numbers. But that's not really random, because if you gave anybody else the instructions and the same starting point, they could come up with the same answers. So computers can't be truly random just by following instructions.
Ask them to devise a step-by-step method to generate a random number.
And don't accept "pick a number from 1 to 10" as an answer ;)
Trying out a problem should illustrate the difficulty of having to generate random numbers from a set of instructions, just like what computers actually have to do.
Because computers are deterministic machines.
Generating random numbers on a computer is like playing "Eenie meenie miney moe" when choosing who's It first in a game of tag. On the surface it does look random, but when you get into the details, it's completely deterministic. It's hard to make eenie meenie miney moe into a scheme that a person really can't predict the outcome of.
Also there's some difficulties with getting the distribution nice and even.
Because given any input, an algorithm produces the exact same output every single time. And you can't just provide a "random" input, because you're trying to generate the random number in the first place.
"Kids, unless they're broken, computers never lie, and they always do what you tell them to do. Even when we are disappointed by the results, it always turns out that they were doing what they were told to do with complete fidelity. They can only do two things: add one and one, and move a number from one place to another. If you want them to produce random numbers, you need to explain to them how to do that in terms of adding one and one and moving. Once you have explained that, the results will not be random."
Because the only true source of randomness exists at the quantum level. With suitable hardware assists, computers can access this level. for example, they can sample the decay of a radioactve isotope or the noise from a thermionic valve. But your basic PC doesn't come with this cool stuff.
A simple explanation for the children:
The definition of randomness is a philosophical and mathematical question, beyond the scope of this answer, but by definition there is no such thing as a "random" number. In a metaphysical sense, a number is only random in sequential form; however, there is a probability that a sequence follows certain statistical distributions depending on the sample size. A random number generator (in our case a pseudo-random number generator, or PRNG) is simply a device to produce a quasi-random sequence of numbers that we can only estimate (based on the given probability inherent within the sequence) to be random.
You should explain to the children that programs can only mimic these devices using complex mathematical formulas (which guarantee a lack of "randomness" by definition because they are a result of some function, or procedural algorithm). Typically, rigorous statistical analysis is necessary in order to differentiate the use of a quantum hardware PRNG (use this as an opportunity to explain to your kids the Heisenberg Principle!) and that of a strong software PRNG.
Had to be done really
Source: http://xkcd.com/221/
Because there is no such thing as a random number.
Random is a human concept that we use when we cannot comprehend data and do not understand it. If we are to believe that science will ultimately lead to an understanding of how everything works then surely everything is deterministic.
Take away the human and there is no random there is only "this". It happens because it happens, not because it is random.
Because a program is a system and everything in a system is made to run with consistency and regularity. Randomness has no place in a system.
It is hard because given the same sets of inputs and conditions, a program will produce the same result everytime. This by definition is not random.
Algorithms to generate random numbers are inevitably deterministic. They take a small random seed, and use it to obtain a long string of pseudo-random digits.
It's very difficult to do this without introducing subtle patterns into the data. A string of digits can look perfectly random but have repeated patterns which make the distribution innappropriate for applications where randomness is required.
Computers can only execute algorithmic computations, and a truly random number isn't an algorithmic thing. You can get algorithms that produce numbers that behave like random numbers; such algorithms are called 'Pseudo-Random number generators'.
At various times in the past, people have made random number generators from analog-digital converters connected to sources of electronic noise, but this tends to be fairly specialised kit.
Primarily because computers don't have any functions that behave in discrete, non-random ways. A computer is predictable, which allows us to program reliable software. If it wasn't predictable it would be easier to generate a random number (since our software could rely on this unpredictable method).
While it's possible to generate pseudo-random numbers, and numbers that are distributed randomly, you cannot generate truly random numbers without separate hardware. There is hardware that generates truly random numbers based on "quantum" interactions (at least according to the manufacturers). Online poker sites sometimes use these adapters for their generators.
Apparently there are even online services to provide random numbers - random.org for example.
As surprising as it may seem, it is difficult to get a computer to do something by chance. A computer follows its instructions blindly and is therefore completely predictable. (A computer that doesn't follow its instructions in this manner is broken.) There are two main approaches to generating random numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs).
Actually, on most modern computers it's not hard to produce numbers that are "random enough" for most purposes. As others have noted, the critical thing is having a source of randomness. You can't just write a program that will produce randomness algorithmically, but you can observe randomness in the various activities of most computers of reasonable complexity, i.e., the ones we typically think of when writing programs. One such source is timing data of interrupts from various system devices.
At one time many computers had no way to get at this data and could only offer pseudorandomness, that is, a random, but repeatable distribution of numbers based on a particular seed. For many purposes this is sufficient -- choosing a different seed each time results in good enough randomness. For other purposes, such as encryption, this isn't strong enough and you need some randomness to start with that isn't repeatable or predictable. Today, most computers (with the exception of embedded devices, perhaps) are sophisticated enough to have a source of randomness that can generate encryption-strength random numbers. For instance, Linux has /dev/random and the .NET framework supports the cryptographically strong RandomNumberGenerator class which has a number of implementations.
Its probably helpful to distinguish between a number that is hard to predict (which a computer can create) from something that is not deterministic (which is a bit tougher for computers, and theoretically, any physical being).
It's easy to come up with an algorithm that generates unexpected numbers, that appear random in some sense. But to design an algorithm that generates true random numbers, well, that's hard.
Imagine designing an algorithm to simulate a dice roll. You can easily formulate some procedure to generate different numbers on each iteration. But can you guarantee that, in the long run (I mean, up to the infinity), the amount of times that 6 came out will be the same as any other number? When designing a good random number generator, that's the kind of commitment that you have to assume. You have to provide strong guarantees (i.e. mathematical proofs) about the randomness, if the application (e.g. lottery) requires it.
It is relevant to note that humans perform very poorly at generating random numbers. Computers are worse because they just follow a strict set commands. Humans can only generate good (pseudo) random numbers when following an algorithm, a set of commands. Computers are the same.
Although it should be noted that computers can gather entropy from the "environment" connected to it, like keyboard and mouse actions, what aids in generating random numbers (either directly or by seeding a PRNG).
To make the computer generate a random number, the computer has to have a source of randomness to start with.
It has to be feeded a seed that can't be expected or calculated by just looking at the seed, if the seed comes from a clock then it can be predicted or calculated by knowing the time, if the seed comes from like filming a lavalamp and get numbers from the picture stream then it's harder to just look at the seed to know what next number will be.
The computer does not have an built in lava lamp to generate that randomness, thats whats make it hard, we have to substitute real randomness with some input that exists in the computer, maybe by logging passing tcpip-packets or other things, but its not many ways to get that randomness sources in.
Computers just don't have suitable hardware. Ordinary computer's hardware is meant to be deterministic. With suitable hardware like mentioned here random numbers are not a problem at all.
Awhile back I came across the "Dice-O-Matic"
http://GamesByEmail.com/News/DiceOMatic
Kind of interesting real world application of the problem.
Its not hard, here's a couple for free: 12, 1400, 397.6

Resources