I need some advice on how to tackle an algorithmic problem (ie. not programming per se). What follows are my needs and how I tried to meet them. Any comments for improvement would be welcome.
Let me first start off by explaining my goal. I would like to play some poker about a billion times. Maybe I'm trying to create the next PokerStars.net, maybe I'm just crazy.
I would like to create a program that can produce better randomized decks of cards, than say the typical program calling random(). These need to be production quality decks created from high quality random numbers. I've heard that commercial-grade poker servers use 64-bit vectors for every card, thus ensuring randomness for all the millions of poker games played daily.
I'd like to keep whatever I write simple. To that end, the program should only need one input to achieve the stated goal. I have decided that whenever the program begins, it will record the current time and use that as the starting point. I realize that this approach would not be feasible for commercial environments, but as long as it can hold up for a few billion games, better than simpler alternatives, I'll be happy.
I began to write pseudo-code to solve this problem, but ran into a thorny issue. It's clear to me, but it might not be to you, so please let me know.
Psuedo-code below:
Start by noting the system time.
Hash the current time (with MD5) around ten times (I chose the ten arbitrarily).
Take the resulting hash, and use it as the seed to the language-dependent random() function.
Call random() 52 times and store the results.
Take the values produced by random() and hash them.
Any hash function that produces at least 64-bits of output will do for this.
Truncate (if the hash is too big) so the hashes will fit inside a 64-bit double.
Find a way to map the 52 doubles (which should be random now, according to my calculations) into 52 different cards, so we can play some poker.
My issue is with the last step. I cannot think of a way to properly map each 64-bit value to a corresponding card, without having to worry about two numbers being the same (unlikely) or losing any randomness (likely).
My first idea was to break 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF into four even sections (to represent the suits). But there is no guarantee that we will find exactly thirteen cards per section, which would be bad.
Now that you know where I am stuck, how would you overcome this challenge?
-- Edited --
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
My real desire is to take something simple and predictable, like the system time, and transform it into a randomized deck of cards. Seeding random() with the system time is a BAD way of going about doing this. Hence the hashing of the time and hashing the values that come out of random().
Hell, if I wanted to, I could hash the bytes from /dev/random, just for shizzles and giggles. Hashing improves the randomness of things, doesn't it? Isn't that why modern password managers store passwords that have been hashed thousands of times?
-- Edit 2 --
So I've read your answers and I find myself confused by the conclusion many of you are implying. I hinted at it in my first edit, but it's really throwing me for a loop. I'd just like to point it out and move on.
Rainbow tables exist which do funky math and clever magic to essentially act as a lookup table for common hashes that map to a particular password. It is my understanding that longer, better passwords are unlikely to show up in these rainbow tables. But the fact still stands that despite how common many user passwords are, the hashed passwords remain safe after being hashed thousands of times.
So is that a case where many deterministic operations have increased the randomness of the original password (or seems to?) I'm not saying I'm right, I'm just saying thats my feeling.
The second thing I want to point out is I'm doing this backwards.
What I mean is that you all are suggesting I take a sorted, predictable, non-random deck of cards and use the Fisher-Yates shuffle on it. I'm sure Fisher-Yates is a fine algorithm, but lets say you couldn't use it for whatever reason.
Could you take a random stream of bytes, say in the neighborhood of 416 bytes (52 cards with 8 bytes per card) and BAM produce an already random deck of cards? The bytes were random, so it shouldn't be too hard to do this.
Most people would start with a deck of 52 cards (random or not) and swap them around a bunch of times (by picking a random index to swap). If you can do that, then you can take 52 random numbers, run through them once, and produce the randomized deck.
As simply as I can describe it,
The algorithm to accepts a stream of randomized bytes and looks at each 8-byte chunk. It maps each chunk to a card.
Ex. 0x123 maps to the Ace of Spades
Ex. 0x456 maps to the King of Diamonds
Ex. 0x789 maps to the 3 of Clubs
.... and so on.
As long as we chose a good model for the mapping, this is fine. No shuffling required. The program will be reduced to two steps.
Step 1: Obtain a sufficient quantity of random bytes from a good source
Step 2: Split this stream of bytes into 52 chunks, one for each card in the deck
Step 2a: Run through the 52 chunks, converting them into card values according to our map.
Does that makes sense?
You are massively overcomplicating the problem. You need two components to solve your problem:
A shuffling algorithm
A sufficiently high-quality random number generator for the shuffling algorithm to use.
The first is easy, just use the Fisher-Yates shuffle algorithm.
For the second, if you want sufficient degrees of freedom to be able to generate every possible permutation (of the 52! possibilities) then you need at least 226 bits of entropy. Using the system clock won't give you more than 32 or 64 bits of entropy (in practice far fewer as most of the bits are predictable), regardless of how many redundant hashes you perform. Find an RNG that uses a 256-bit seed and seed it with 256 random bits (a bootstrapping problem, but you can use /dev/random or a hardware RNG device for this).
You don't mention which OS you're on, but most modern OS's have pre-made sources of high quality entropy. On Linux, it's /dev/random and /dev/urandom, from which you can read as many random bytes as you want.
Writing your own random number generator is highly non-trivial, if you want good randomness. Any homebrew solution is likely to be flawed and could potentially be broken and its outputs predicted.
You will never improve your randomness if you still use a pseudo-random generator, no matter how many deterministic manipulations you do to it. In fact, you are probably making it considerably worse.
I would use a commercial random number generator. Most use hardware solutions, like a Geiger counter. Some use existing user input as a source of entropy, such as background noise into the computer's microphone or latency between keyboard strokes.
Edit:
You mentioned that you also want to know how to map this back to a shuffle algorithm. That part is actually quite simple. One straightforward way is Fisher-Yates shuffle. Basically all you need from your RNG is a random number uniformly distributed between 0 and 51 inclusive. That you can do computationally given any RNG and is usually built into a good library. See the "Potential sources of bias" section of the Wikipedia article.
Great question!
I would strongly discourage you from using the random function that comes built-in with any programming language. This generates pseudorandom numbers that are not cryptographically secure, and so it would be possible for a clever attacker to look at the sequence of numbers coming back out as cards and to reverse-engineer the random number seed. From this, they could easily start predicting the cards that would come out of the deck. Some early poker sites, I've heard, had this vulnerability.
For your application, you will need cryptographically secure random numbers so that an adversary could not predict the sequence of cards without breaking something cryptographically assumed to be secure. For this, you could either use a hardware source of randomness or a cryptographically secure pseudorandom number generator. Hardware random generators can be expensive, so a cryptographically secure PRNG may be a good option.
The good news is that it's very easy to get a cryptographically secure PRNG. If you take any secure block cipher (say, AES or 3DES) and using a random key start encrypting the numbers 0, 1, 2, ..., etc. then the resulting sequence is cryptographically secure. That is, you could use /dev/random to get some random bytes for use as a key, then get random numbers by encrypting the integers in sequence using a strong cipher with the given key. This is secure until you hand back roughly √n numbers, where n is the size of the key space. For a cipher like AES-256, this is 2128 values before you'd need to reset the random key. If you "only" want to play billions of games (240), this should be more than fine.
Hope this helps! And best of luck with the project!
You should definitely read the answer to this question: Understanding "randomness"
Your approach of applying a number of arbitrary transformations to an existing pseudorandom number is very unlikely to improve your results, and in fact risks rendering less random numbers.
You might consider using physically derived random numbers rather than pseudorandom numbers:
http://en.wikipedia.org/wiki/Hardware_random_number_generator
If you are definitely going to use pseudorandom numbers, then you are likely to be best off seeding with your operating system's randomness device, which is likely to include additional entropy from things like disk seek times as well as user IO.
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
Conversion of what? Just take a deck of cards and, using your cryptographically-secure PRNG, shuffle it. This will produce every possible deck of cards with equal probability, with no way for anyone to determine what cards are coming next - that's the best you could possibly do.
Just make sure you implement the shuffling algorithm correctly :)
In terms of actually turning the random numbers into cards(once you follow the advice of others in generating the random numbers), You can map the lowest number to the Ace of diamonds, the 2nd lowest number to the 2 of diamonds, etc.
Basically you assume the actual cards have a natural ordering and then you sort the random numbers and map to the deck.
Edit
Apparently wikipedia lists this method as an alternative to the Fisher-Yates algorithm(which I hadn't previously heard of -Thanks Dan Dyer!). One thing in the wikipedia article that I didn't think of is that you need to be sure that you don't repeat any random numbers if you're using the algorithm I described.
A ready-made, off the shelf poker hand evaluator can be found here. All feedback welcomed at the e-mail address found therein.
Related
I'm coding an online poker game. The shuffling part is using Fisher Yates algorithm. But I have no idea which random number generator to generate good unpredictable random numbers for the shuffling algorithm to use.
52 cards have 52! ~= 8.065e67 possible sequences.
8e+67 is a lot big number, but it is not very high in data size. It is only 226 bit in data length. 28 bytes.
You may consider using a CSPRNG, a cryptographically strong pseudorandom generator, i.e. an RNG which generates enough strong randomness to be usable for cryptography.
Sometimes also the CPU has a true random number source, it is fast. Here I describe the CSPRNG.
On Linux, you can simply read out the random bytes from the /dev/urandom character device file.
As you point out, you should use one with far more than 52! possible internal states, which means 226 bits of state. There are many PRNGs that exceed this, having 1024 or more bits of state. You also want something fast, so you can simulate millions of hands. The most popular algorithm that meets these criteria is Mersenne Twister. I personally also like variants of Marsaglia's XORshift.
I generally only use hardware true RNGs (that most PCs have nowadays) for cryptography and for seeding these PRNGs, but I am told that some of the better ones are fast enough to produce values even for simulations. You'd have to look that up for your hardware.
I have been reading various articles about random numbers and their generators. There are usually 3 important conclusions that I draw from them:
Random numbers are not truly random
Much of the time they have a bias (modulo bias)
Humans are incapable of being random number generators, when they are trying to "act randomly"
So, with the latter-most of these observations in mind, how would we be able to
Tell if a sequence of numbers that we see is truly random, and more importantly
Is there some way we can prove that said sequence is really random?
I'm tempted to say that so long as you generate a sufficiently large enough sample set 1,000,000+, you should see more or less a uniform dispersion of (pseudo)random numbers occur. However, I'm sure some Maths genius has a way of discrediting this, because surely the by laws of probability you could get a run of one number just as likely as any other sequence.
From what I have read, if you really need random numbers its best to try and reuse what cryptographic libraries use. The field of Cryptography is obviously complex and relies on random numbers for key generation. From the section in OWASP's guide titled "Reversible Authentication Tokens" it says this...
The only way to generate secure authentication tokens is to ensure
there is no way to predict their sequence. In other words: true random
numbers.
It could be argued that computers can not generate true random
numbers, but using new techniques such as reading mouse movements and
key strokes to improve entropy has significantly increased the
randomness of random number generators. It is critical that you do not
try to implement this on your own; use of existing, proven
implementations is highly desirable.
Most operating systems include functions to generate random numbers
that can be called from almost any programming language.
My take is that unless you're coding Cryptographic libraries yourself, put trust in those that are (e.g. use Java Cryptography Extension) so you don't have to proove it yourself.
Pretty Simple Test:
If you really want to get into testing random numbers, you could simulate a program that outputs random numbers from 1-100 100 times as an example.
Then look at those numbers and see if there's any patterns. Then follow that test by restarting the program several times and repeating the process.
Examine all data to figure out if random numbers are always random, just random during individual tests, or never. :P
Testing a random number generator is probably mostly up to what you want to look for. Even pure non-repeatability is no guarantee of randomness.
There are some companies that will test a random number generator for the purposes of certification (e.g. online casinos). One that I found quickly is called iTech Labs, though their testing methodology page leaves a lot to be desired in terms of technical detail.
Other testers and certification bodies publish the required data for a certification; there's more specific detail here but not as much as you want.
You could potentially do a statistical analysis and compare the results of your random number generator to a "true" random source but the argument could be made for bias from trying to translate the true random source into your possibility space anyway.
Randomness tests verify the mathematical properties of the sequence. For example entry frequencies (all symbols are expected to have the same frequency), local variance, sequence analysis (the probability of a symbol must not depend on the previous ones).
A definite proof does not exist, but there is a quality factor - the probability of a sequence to really be random.
Another criterion could be based on compressibility: true randomness has maximum entropy and can not therefore be compressed.
This test is not reliable for randomness, of course, but allows quick and dirty testing with ready tools such as zlib.
It'd be nice to be able, for some purposes, to bypass any sort of algorithmically generated random numbers in favor of natural input---say, dice rolls. Cryptographic key generation, for instance, strikes me as a situation where little enough random data is needed, and the requirement that the data be truly random is high enough, that this might be a feasible and desirable thing to do.
So what I'd like to know, before I go and get my hands dirty, is this: does any software exist for building an entropy pool directly from random digit input? Note that it's not quite enough to simply convert things from radix r to radix 2; since, for instance, 3 and 2 are relatively prime, it's not entirely straightforward to turn a radix-3 (or radix-6) number into binary digits while holding onto maximal entropy in the original input.
The device /dev/random does exactly this on Linux -- maybe it would be worth looking at the source?
EDIT:
As joeytwiddle says, if sufficient randomness is unavailable, /dev/random will block, waiting for entropy to "build up" by monitoring external devices (e.g. mouse, disk drives). This may or may not be what you want. If you'd prefer never to wait and are satisfied with possibly-lower-quality randomness, use /dev/urandom instead -- it's a non-blocking pseudorandom number generator that injects randomness from /dev/random whenever it is available, making it more random than a plain deterministic PRNG. (See man /dev/urandom for further details.)
This paper suggests various approaches with implementation ideas for both UN*X and Windows.
I'm not sure what you're asking for. "Entropy pool" is just a word for "some random numbers", so you could certainly use dice rolls; simply use them as the seen to a pseudorandom number generator that has the characteristics you want.
You can get physically generated random numbers online from, eg, Lavarnd or Hotbits.
Note that the amount of entropy in the pool doesn't necessarily have to be an integer. This should mostly deal with your prime-factors-other-than-2 issue.
Even if you end up using an implementation that does require integer estimates, you need quite a few dice rolls to generate a crypto key. So you could just demand them in bunches. If the user gives you the results of 10 d6 rolls, and you estimate the entropy as 25 bits, you've only lost 0.08 bits per dice roll. Remember to round down ;-)
Btw, I would treat asking the user for TRNG data, rather than drawing it from hardware sources as /dev/random does, to be a fun toy rather than an improvement. It's difficult enough for experts to generate random numbers - you don't want to leave general users at the mercy of their own amateurism. "The generation of random numbers is too important to be left to chance" --Robert Coveyou.
By another way, the authors of BSD argue that since entropy estimation for practical sources on PC hardware isn't all that well understood (being a physics problem, not a math problem), using a PRNG isn't actually that bad an option, provided that it is well-reseeded according to Schneier / Kelsey / Ferguson's Yarrow design. Your dice idea does at least have the advantage over typical sources of entropy for /dev/random, that as long as the user can be trusted to find fair dice and roll them properly, you can confidently put a lower bound the entropy. It has the disadvantage that an observer with a good pair of binoculars and/or a means of eavesdropping on their keyboard (e.g. by its E/M emissions) can break the whole scheme, so really it all depends on your threat model.
My kids asked me this question and I couldn't really give a concise, understandable explanation.
So I'm hoping someone on SO can.
How about, "Because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it's not very random! Imagine trying to give someone instructions on how to choose a random number."
Here's a kid friendly explanation:
Get a Dice (the number of sides doesn't matter)
Write these down on a piece of paper:
Move right
Move up
Move up
Turn the dice over
Move down
Move right
Show them the dice and paper. Explain that the dice represents the computer and the
paper represent the math or algorithm that tells the computer what number it will return.
Now, roll the dice. Tell them that you are "seeding" or asking the computer to start at a random dice position.
Follow each step in the paper (move right) by moving the dice.
Let's say that you threw a 6 sided die and it was seeded at 5. By moving right, you get a 4.
Explain that the computer must start with a starting value. This could be given by any number of sources such as the date or mouse movement. Show them that how they throw the dice determines the starting value.
Explain that the piece of paper is how the computer get the next number. Tell them that the instructions on the paper can be changed as easily as the algorithm for the random generator can be changed by the programmer.
Have fun showing them the various possibilities that is only limited by their imaginations.
Now for the answer to your question:
Tell them that when a good mathematician knows the starting value and what step the computer is currently at, the mathematician can tell what is the next value of the random number.
Ask the child were to hide the paper and throw the dice.
Then ask the child to follow the steps on the paper, you then write down how he gets the next random number.
Afterwards, show them your paper. Now that you have a copy of their random number generator, its easy for anyone else to "guess" the next random to come out.
No matter how creative the child is with their algorithm, you should still be able to deduce their algorithm. Tell your child that in the computer world, nothing is hidden and just by observation, even if its just the numbers that was observed, the random number algorithm can be discovered.
...as a side effect, if the child was able to come up with a good algorithm that confused you, in which you can't deduce the next sequence, then you have a bright child. :D
Here's my attempt at explaining randomness at an approximately eighth-grade level. Hope your kids find it useful!
Surprising as it may seem, a computer is not very smart. Computers must follow their instructions blindly, and are therefore completely predictable. A computer that doesn't follow its instructions in this manner is, in fact, broken! We want computers to do exactly what we tell them.
That's precisely what makes it hard to do things randomly. Computers must be told a sequence of instructions on how to generate random numbers. But that's not really random, because if you gave anybody else the instructions and the same starting point, they could come up with the same answers. So computers can't be truly random just by following instructions.
Ask them to devise a step-by-step method to generate a random number.
And don't accept "pick a number from 1 to 10" as an answer ;)
Trying out a problem should illustrate the difficulty of having to generate random numbers from a set of instructions, just like what computers actually have to do.
Because computers are deterministic machines.
Generating random numbers on a computer is like playing "Eenie meenie miney moe" when choosing who's It first in a game of tag. On the surface it does look random, but when you get into the details, it's completely deterministic. It's hard to make eenie meenie miney moe into a scheme that a person really can't predict the outcome of.
Also there's some difficulties with getting the distribution nice and even.
Because given any input, an algorithm produces the exact same output every single time. And you can't just provide a "random" input, because you're trying to generate the random number in the first place.
"Kids, unless they're broken, computers never lie, and they always do what you tell them to do. Even when we are disappointed by the results, it always turns out that they were doing what they were told to do with complete fidelity. They can only do two things: add one and one, and move a number from one place to another. If you want them to produce random numbers, you need to explain to them how to do that in terms of adding one and one and moving. Once you have explained that, the results will not be random."
Because the only true source of randomness exists at the quantum level. With suitable hardware assists, computers can access this level. for example, they can sample the decay of a radioactve isotope or the noise from a thermionic valve. But your basic PC doesn't come with this cool stuff.
A simple explanation for the children:
The definition of randomness is a philosophical and mathematical question, beyond the scope of this answer, but by definition there is no such thing as a "random" number. In a metaphysical sense, a number is only random in sequential form; however, there is a probability that a sequence follows certain statistical distributions depending on the sample size. A random number generator (in our case a pseudo-random number generator, or PRNG) is simply a device to produce a quasi-random sequence of numbers that we can only estimate (based on the given probability inherent within the sequence) to be random.
You should explain to the children that programs can only mimic these devices using complex mathematical formulas (which guarantee a lack of "randomness" by definition because they are a result of some function, or procedural algorithm). Typically, rigorous statistical analysis is necessary in order to differentiate the use of a quantum hardware PRNG (use this as an opportunity to explain to your kids the Heisenberg Principle!) and that of a strong software PRNG.
Had to be done really
Source: http://xkcd.com/221/
Because there is no such thing as a random number.
Random is a human concept that we use when we cannot comprehend data and do not understand it. If we are to believe that science will ultimately lead to an understanding of how everything works then surely everything is deterministic.
Take away the human and there is no random there is only "this". It happens because it happens, not because it is random.
Because a program is a system and everything in a system is made to run with consistency and regularity. Randomness has no place in a system.
It is hard because given the same sets of inputs and conditions, a program will produce the same result everytime. This by definition is not random.
Algorithms to generate random numbers are inevitably deterministic. They take a small random seed, and use it to obtain a long string of pseudo-random digits.
It's very difficult to do this without introducing subtle patterns into the data. A string of digits can look perfectly random but have repeated patterns which make the distribution innappropriate for applications where randomness is required.
Computers can only execute algorithmic computations, and a truly random number isn't an algorithmic thing. You can get algorithms that produce numbers that behave like random numbers; such algorithms are called 'Pseudo-Random number generators'.
At various times in the past, people have made random number generators from analog-digital converters connected to sources of electronic noise, but this tends to be fairly specialised kit.
Primarily because computers don't have any functions that behave in discrete, non-random ways. A computer is predictable, which allows us to program reliable software. If it wasn't predictable it would be easier to generate a random number (since our software could rely on this unpredictable method).
While it's possible to generate pseudo-random numbers, and numbers that are distributed randomly, you cannot generate truly random numbers without separate hardware. There is hardware that generates truly random numbers based on "quantum" interactions (at least according to the manufacturers). Online poker sites sometimes use these adapters for their generators.
Apparently there are even online services to provide random numbers - random.org for example.
As surprising as it may seem, it is difficult to get a computer to do something by chance. A computer follows its instructions blindly and is therefore completely predictable. (A computer that doesn't follow its instructions in this manner is broken.) There are two main approaches to generating random numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs).
Actually, on most modern computers it's not hard to produce numbers that are "random enough" for most purposes. As others have noted, the critical thing is having a source of randomness. You can't just write a program that will produce randomness algorithmically, but you can observe randomness in the various activities of most computers of reasonable complexity, i.e., the ones we typically think of when writing programs. One such source is timing data of interrupts from various system devices.
At one time many computers had no way to get at this data and could only offer pseudorandomness, that is, a random, but repeatable distribution of numbers based on a particular seed. For many purposes this is sufficient -- choosing a different seed each time results in good enough randomness. For other purposes, such as encryption, this isn't strong enough and you need some randomness to start with that isn't repeatable or predictable. Today, most computers (with the exception of embedded devices, perhaps) are sophisticated enough to have a source of randomness that can generate encryption-strength random numbers. For instance, Linux has /dev/random and the .NET framework supports the cryptographically strong RandomNumberGenerator class which has a number of implementations.
Its probably helpful to distinguish between a number that is hard to predict (which a computer can create) from something that is not deterministic (which is a bit tougher for computers, and theoretically, any physical being).
It's easy to come up with an algorithm that generates unexpected numbers, that appear random in some sense. But to design an algorithm that generates true random numbers, well, that's hard.
Imagine designing an algorithm to simulate a dice roll. You can easily formulate some procedure to generate different numbers on each iteration. But can you guarantee that, in the long run (I mean, up to the infinity), the amount of times that 6 came out will be the same as any other number? When designing a good random number generator, that's the kind of commitment that you have to assume. You have to provide strong guarantees (i.e. mathematical proofs) about the randomness, if the application (e.g. lottery) requires it.
It is relevant to note that humans perform very poorly at generating random numbers. Computers are worse because they just follow a strict set commands. Humans can only generate good (pseudo) random numbers when following an algorithm, a set of commands. Computers are the same.
Although it should be noted that computers can gather entropy from the "environment" connected to it, like keyboard and mouse actions, what aids in generating random numbers (either directly or by seeding a PRNG).
To make the computer generate a random number, the computer has to have a source of randomness to start with.
It has to be feeded a seed that can't be expected or calculated by just looking at the seed, if the seed comes from a clock then it can be predicted or calculated by knowing the time, if the seed comes from like filming a lavalamp and get numbers from the picture stream then it's harder to just look at the seed to know what next number will be.
The computer does not have an built in lava lamp to generate that randomness, thats whats make it hard, we have to substitute real randomness with some input that exists in the computer, maybe by logging passing tcpip-packets or other things, but its not many ways to get that randomness sources in.
Computers just don't have suitable hardware. Ordinary computer's hardware is meant to be deterministic. With suitable hardware like mentioned here random numbers are not a problem at all.
Awhile back I came across the "Dice-O-Matic"
http://GamesByEmail.com/News/DiceOMatic
Kind of interesting real world application of the problem.
Its not hard, here's a couple for free: 12, 1400, 397.6
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Sorry for this not being a "real" question, but Sometime back i remember seeing a post here about randomizing a randomizer randomly to generate truly random numbers, not just pseudo random. I dont see it if i search for it.
Does anybody know about that article?
I have to disagree with a lot of the answers to this question.
It is possible to collect random data on a computer. SSL, SSH and VPNs would not be secure if you couldn't.
The way software random number generator work is there is a pool of random data that is gathered from many different places, such as clock drift, interrupt timings, etc.
The trick to these schemes is in correctly estimating the entropy (the posh name for the randomness). It doesn't matter whether the source is bias, as long as you estimate the entropy correctly.
To illustrate this, the chance of me hitting the letter e in this comment is much higher than that of z , so if I were to use key interrupts as a source of entropy it would be bias - but there is still some randomness to be had in that input. You can't predict exactly which sequence of letters will come next in this paragraph. You can extract entropy from this uncertainty and use it part of a random byte.
Good quality real-random generators like Yarrow have quite sophisticated entropy estimation built in to them and will only emit as many bytes as it can reliably say it has in its "randomness pool."
I believe that was on thedailywtf.com - ie. not something that you want to do.
It is not possible to get a truly random number from pseudorandom numbers, no matter how many times you call randomize().
You can get "true" random numbers from special hardware. You could also collect entropy from mouse movements and things like that.
At the end of the post, I will answer your question of why you might want to use multiple random number generators for "more randomness".
There are philosophical debates about what randomness means. Here, I will mean "indistinguishable in every respect from a uniform(0,1) iid distribution over the samples drawn" I am totally ignoring philosophical questions of what random is.
Knuth volume 2 has an analysis where he attempts to create a random number generator as you suggest, and then analyzes why it fails, and what true random processes are. Volume 2 examines RNGs in detail.
The others recommend you using random physical processes to generate random numbers. However, as we can see in the Espo/vt interaction, these processes can have subtle periodic elements and other non-random elements, in part due to outside factors with deterministic behavior. In general, it is best never to assume randomness, but always to test for it, and you usually can correct for such artifacts if you are aware of them.
It is possible to create an "infinite" stream of bits that appears completely random, deterministically. Unfortunately, such approaches grow in memory with the number of bits asked for (as they would have to, to avoid repeating cycles), so their scope is limited.
In practice, you are almost always better off using a pseudo-random number generator with known properties. The key numbers to look for is the phase-space dimension (which is roughly offset between samples you can still count on being uniformally distributed) and the bit-width (the number of bits in each sample which are uniformally random with respect to each other), and the cycle size (the number of samples you can take before the distribution starts repeating).
However, since random numbers from a given generator are deterministically in a known sequence, your procedure might be exposed by someone searching through the generator and finding an aligning sequence. Therefore, you can likely avoid your distribution being immediately recognized as coming from a particular random number generator if you maintain two generators. From the first, you sample i, and then map this uniformally over one to n, where n is at most the phase dimension. Then, in the second you sample i times, and return the ith result. This will reduce your cycle size to (orginal cycle size/n) in the worst case, but for that cycle will still generate uniform random numbers, and do so in a way that makes the search for alignment exponential in n. It will also reduce the independent phase length. Don't use this method unless you understand what reduced cycle and independent phase lengths mean to your application.
An algorithm for truly random numbers cannot exist as the definition of random numbers is:
Having unpredictable outcomes and, in
the ideal case, all outcomes equally
probable; resulting from such
selection; lacking statistical
correlation.
There are better or worse pseudorandom number generators (PRNGs), i.e. completely predictable sequences of numbers that are difficult to predict without knowing a piece of information, called the seed.
Now, PRNGs for which it is extremely hard to infer the seed are cryptographically secure. You might want to look them up in Google if that is what you seek.
Another way (whether this is truly random or not is a philosophical question) is to use random sources of data. For example, unpredictable physical quantities, such as noise, or measuring radioactive decay.
These are still subject to attacks because they can be independently measured, have biases, and so on. So it's really tricky. This is done with custom hardware, which is usually quite expensive. I have no idea how good /dev/random is, but I would bet it is not good enough for cryptography (most cryptography programs come with their own RNG and Linux also looks for a hardware RNG at start-up).
According to Wikipedia /dev/random, in Unix-like operating systems, is a special file that serves as a true random number generator.
The /dev/random driver gathers environmental noise from various non-deterministic sources including, but not limited to, inter-keyboard timings and inter-interrupt timings that occur within the operating system environment. The noise data is sampled and combined with a CRC-like mixing function into a continuously updating ``entropy-pool''. Random bit strings are obtained by taking a MD5 hash of the contents of this pool. The one-way hash function distills the true random bits from pool data and hides the state of the pool from adversaries.
The /dev/random routine maintains an estimate of true randomness in the pool and decreases it every time random strings are requested for use. When the estimate goes down to zero, the routine locks and waits for the occurrence of non-deterministic events to refresh the pool.
The /dev/random kernel module also provides another interface, /dev/urandom, that does not wait for the entropy-pool to re-charge and returns as many bytes as requested. As a result /dev/urandom is considerably faster at generation compared to /dev/random which is used only when very high quality randomness is desired.
John von Neumann once said something to the effect of "anyone attempting to generate random numbers via algorithmic means is, of course, living in sin."
Not even /dev/random is random, in a mathematician's or a physicist's sense of the word. Not even radioisotope decay measurement is random. (The decay rate is. The measurement isn't. Geiger counters have a small reset time after each detected event, during which time they are unable to detect new events. This leads to subtle biases. There are ways to substantially mitigate this, but not completely eliminate it.)
Stop looking for true randomness. A good pseudorandom number generator is really what you're looking for.
If you believe in a deterministic universe, true randomness doesn't exist. :-) For example, someone has suggested that radioactive decay is truly random, but IMHO, just because scientists haven't yet worked out the pattern, doesn't mean that there isn't a pattern there to be worked out. Usually, when you want "random" numbers, what you need are numbers for encryption that no one else will be able to guess.
The closest you can get to random is to measure something natural that no enemy would also be able to measure. Usually you throw away the most significant bits, from your measurement, leaving numbers with are more likely to be evenly spread. Hard core random number users get special hardware that measures radioactive events, but you can get some randomness from the human using the computer from things like keypress intervals and mouse movements, and if the computer doesn't have direct users, from CPU temperature sensors, and from network traffic. You could also use things like web cams and microphones connected to sound cards, but I don't know if anyone does.
To summarize some of what has been said, our working definition of what a secure source of randomness is is similar to our definition of cryptographically secure: it appears random if smart folks have looked at it and weren't able to show that it isn't completely unpredictable.
There is no system for generating random numbers which couldn't conceivably be predicted, just as there is no cryptographic cipher that couldn't conceivably be cracked. The trusted solutions used for important work are merely those which have proven to be difficult to defeat so far. If anyone tells you otherwise, they're selling you something.
Cleverness is rarely rewarded in cryptography. Go with tried and true solutions.
A computer usually has many readily available physical sources of random noise:
Microphone (hopefully in a noisy place)
Compressed video from a webcam (pointed to something variable, like a lava lamp or a street)
Keyboard & mouse timing
Network packet content and timing (the whole world contributes)
And sometimes
Clock drift based hardware
Geiger counters and other detectors of rare events
All sorts of sensors attached to A/D converters
What's difficult is estimating the entropy of these sources, which is in most cases low despite the high data rates and very variable; but entropy can be estimated with conservative assumptions, or at least not wasted, to feed systems like Yarrow or Fortuna.
It's not possible to obtain 'true' random numbers, a computer is a logical construct that can't possibly create 'truly' random anything, only pseudo-random. There are better and worse pseudo-random algorithms out there, however.
In order to obtain a 'truly' random number you need a physical random source, some gambling machines actually have these built in - often it's a radioactive source, the radioactive decay (which as far as I know is truly random) is used to generate the numbers.
One of the best method to generate a random number is through Clock Drift. This primarily works with two oscillators.
An analogy of how this works is imagine a race car on a simple oval circuit with a while line at the start of the lap and also a while line on one of the tyres. When the car completes a lap, a number will be generated based on the difference between the position of the white line on the road and on the tyre.
Very easy to generate and impossible to predict.