Does any software exist for building entropy pools from user input? - random

It'd be nice, for some purposes, to be able to bypass any sort of algorithmically generated random numbers in favor of natural input, say, dice rolls. Cryptographic key generation, for instance, strikes me as a situation where little enough random data is needed, and the requirement that the data be truly random is strict enough, that this might be a feasible and desirable thing to do.
So what I'd like to know, before I go and get my hands dirty, is this: does any software exist for building an entropy pool directly from random digit input? Note that it's not quite enough to simply convert things from radix r to radix 2; since, for instance, 3 and 2 are relatively prime, it's not entirely straightforward to turn a radix-3 (or radix-6) number into binary digits while holding onto maximal entropy in the original input.

The device /dev/random does exactly this on Linux -- maybe it would be worth looking at the source?
EDIT:
As joeytwiddle says, if sufficient randomness is unavailable, /dev/random will block, waiting for entropy to "build up" by monitoring external devices (e.g. mouse, disk drives). This may or may not be what you want. If you'd prefer never to wait and are satisfied with possibly-lower-quality randomness, use /dev/urandom instead -- it's a non-blocking pseudorandom number generator that injects randomness from /dev/random whenever it is available, making it more random than a plain deterministic PRNG. (See man 4 random for further details.)
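For illustration, here is a minimal sketch (assuming a Linux system) of pulling raw bytes from /dev/urandom; the device is just a file you can read:

#include <fstream>
#include <iomanip>
#include <iostream>

int main() {
    // Open the kernel's non-blocking randomness device (Linux).
    std::ifstream dev("/dev/urandom", std::ios::binary);
    if (!dev) { std::cerr << "cannot open /dev/urandom\n"; return 1; }

    unsigned char buf[16];
    dev.read(reinterpret_cast<char*>(buf), sizeof buf);

    // Print the bytes as hex.
    for (unsigned char b : buf)
        std::cout << std::hex << std::setw(2) << std::setfill('0') << int(b);
    std::cout << '\n';
}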

This paper suggests various approaches with implementation ideas for both UN*X and Windows.

I'm not sure what you're asking for. "Entropy pool" is just a word for "some random numbers", so you could certainly use dice rolls; simply use them as the seed to a pseudorandom number generator that has the characteristics you want.
You can get physically generated random numbers online from, e.g., LavaRnd or HotBits.

Note that the amount of entropy in the pool doesn't necessarily have to be an integer. This should mostly deal with your prime-factors-other-than-2 issue.
Even if you end up using an implementation that does require integer estimates, you need quite a few dice rolls to generate a crypto key. So you could just demand them in bunches. If the user gives you the results of 10 d6 rolls, and you estimate the entropy as 25 bits, you've only lost 0.08 bits per dice roll. Remember to round down ;-)
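To make the batching idea concrete, here is a hedged sketch (my own illustration, not an existing tool) that turns fair d6 rolls into exactly uniform bits by rejection: a pair of rolls has 36 equally likely outcomes, and we keep only the 32 that fit in 5 bits. That yields about 2.22 bits per roll on average, against the theoretical maximum of log2(6) ≈ 2.585:

#include <iostream>
#include <vector>

// Turn fair d6 rolls (values 1..6) into uniform bits.
// Pairs of rolls give 36 equally likely outcomes; we keep the 32
// outcomes below 32 and emit them as 5 bits, rejecting the rest.
std::vector<int> diceToBits(const std::vector<int>& rolls) {
    std::vector<int> bits;
    for (size_t i = 0; i + 1 < rolls.size(); i += 2) {
        int v = (rolls[i] - 1) * 6 + (rolls[i + 1] - 1);  // 0..35, uniform
        if (v >= 32) continue;                            // reject 32..35
        for (int b = 4; b >= 0; --b) bits.push_back((v >> b) & 1);
    }
    return bits;
}

int main() {
    std::vector<int> rolls = {3, 5, 1, 1, 6, 2, 4, 4};  // example input
    for (int b : diceToBits(rolls)) std::cout << b;
    std::cout << "\n";
}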
Btw, I would treat asking the user for TRNG data, rather than drawing it from hardware sources as /dev/random does, as a fun toy rather than an improvement. It's difficult enough for experts to generate random numbers - you don't want to leave general users at the mercy of their own amateurism. "The generation of random numbers is too important to be left to chance" --Robert Coveyou.
On another note, the authors of BSD argue that since entropy estimation for practical sources on PC hardware isn't all that well understood (being a physics problem, not a math problem), using a PRNG isn't actually that bad an option, provided that it is well-reseeded according to Schneier / Kelsey / Ferguson's Yarrow design. Your dice idea does at least have the advantage over typical sources of entropy for /dev/random that, as long as the user can be trusted to find fair dice and roll them properly, you can confidently put a lower bound on the entropy. It has the disadvantage that an observer with a good pair of binoculars and/or a means of eavesdropping on their keyboard (e.g. by its E/M emissions) can break the whole scheme, so really it all depends on your threat model.

Related

What are typical means by which a random number can be generated in an embedded system?

What are typical means by which a random number can be generated in an embedded system? Can you offer advantages and disadvantages for each method, and/or some factors that might make you choose one method over another?
First, you have to ask a fundamental question: do you need unpredictable random numbers?
For example, cryptography requires unpredictable random numbers. That is, nobody must be able to guess what the next random number will be. This precludes any method that seeds a random number generator from common parameters such as the time: you need a proper source of entropy.
Some applications can live with a non-cryptographic-quality random number generator. For example, if you need to communicate over Ethernet, you need a random number generator for the exponential back-off; statistical randomness is enough for this¹.
Unpredictable RNG
You need an unpredictable RNG whenever an adversary might try to guess your random numbers and do something bad based on that guess. For example, if you're going to generate a cryptographic key, or use many other kinds of cryptographic algorithms, you need an unpredictable RNG.
An unpredictable RNG is made of two parts: an entropy source, and a pseudo-random number generator.
Entropy sources
An entropy source kickstarts the unpredictability. Entropy needs to come from an unpredictable source or a blend of unpredictable sources. The sources don't need to be fully unpredictable, they need to not be fully predictable. Entropy quantifies the amount of unpredictability. Estimating entropy is difficult; look for research papers or evaluations from security professionals.
There are three approaches to generating entropy.
Your device may include some non-deterministic hardware. Some devices include a dedicated hardware RNG based on physical phenomena such as unstable oscillators, thermal noise, etc. Some devices have sensors which capture somewhat unpredictable values, such as the low-order bits of light or sound sensors.
Beware that hardware RNGs often have precise usage conditions. Most methods require some time after power-up before their output is truly random. Often environmental factors such as extreme temperatures can affect the randomness. Read the RNG's usage notes very carefully. For cryptographic applications, it is generally recommended to run statistical tests on the HRNG's output and refuse to operate if these tests fail.
Never use a hardware RNG directly. The output is rarely fully unpredictable — e.g. each bit may have a 60% probability of being 1, or the probability of two consecutive bits being equal may be only 48%. Use the hardware RNG to seed a PRNG as explained below.
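For a flavor of the simplest kind of post-processing (a sketch only; real conditioning should use a vetted design as described below), the classic von Neumann debiaser turns independent-but-biased bits into unbiased ones by reading pairs: emit 0 for the pair (0,1), 1 for (1,0), and discard (0,0) and (1,1):

#include <iostream>
#include <vector>

// Von Neumann debiaser: assumes the raw bits are independent but
// possibly biased. Emits 0 for the pair (0,1), 1 for (1,0),
// and discards (0,0) and (1,1), trading throughput for uniformity.
std::vector<int> vonNeumann(const std::vector<int>& raw) {
    std::vector<int> out;
    for (size_t i = 0; i + 1 < raw.size(); i += 2) {
        if (raw[i] != raw[i + 1]) out.push_back(raw[i]);
    }
    return out;
}

int main() {
    // A raw stream with a strong bias toward 1.
    std::vector<int> raw = {1,1,0,1,1,0,1,1,1,0,0,1,1,1,1,0};
    for (int b : vonNeumann(raw)) std::cout << b;
    std::cout << "\n";
}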
You can preload a random seed during manufacturing and use that afterwards. Entropy doesn't wear off when you use it²: if you have enough entropy to begin with, you'll have enough entropy during the lifetime of your device. The danger with keeping entropy around is that it must remain confidential: if the entropy pool accidentally leaks, it's toast.
If your device has a connection to a trusted third party (e.g. a server of yours, or a master node in a sensor network), it can download entropy from that (over a secure channel).
Pseudo-random number generator
A PRNG, also called deterministic random bit generator (DRBG), is a deterministic algorithm that generates a sequence of random numbers by transforming an internal state. The state must be seeded with sufficient entropy, after which the PRNG can run practically forever. Cryptographic-quality PRNG algorithms are based on cryptographic primitives; always use a vetted algorithm (preferably some well-audited third-party code if available).
The PRNG needs to be seeded with entropy. You can choose to inject entropy once during manufacturing, or at each boot, or periodically, or any combination.
Entropy after a reboot
You need to take care that the device doesn't boot twice in the same RNG state: otherwise an observer can repeat the same sequence of RNG calls after a reset and will know the RNG output the second time round. This is an issue for factory-injected entropy (which by definition is always the same) as well as for entropy derived from sensors (which takes time to accumulate).
If possible, save the RNG state to persistent storage. When the device boots, read the RNG state, apply some transformation to it (e.g. by generating one random word), and save the modified state. After this is done, you can start returning random numbers to applications and system services. That way, the device will boot with a different RNG state each time.
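A minimal sketch of that save/advance/restore dance; the file name is hypothetical, and splitmix64 is a non-cryptographic stand-in for a vetted one-way DRBG state-update function:

#include <cstdint>
#include <fstream>

// splitmix64: a NON-cryptographic mixer, standing in here for a
// vetted one-way state-update function.
static uint64_t splitmix64(uint64_t& s) {
    uint64_t z = (s += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

int main() {
    const char* path = "rng_state.bin";       // hypothetical state file
    uint64_t stored = 0x0123456789ABCDEFull;  // factory seed on first boot

    // 1. Restore the state saved by the previous boot, if present.
    if (std::ifstream in{path, std::ios::binary}) {
        in.read(reinterpret_cast<char*>(&stored), sizeof stored);
    }

    // 2. Derive two distinct values: one to persist for the next boot,
    //    one to seed this session. Persist BEFORE serving any numbers,
    //    so a crash right now cannot replay this session's outputs.
    uint64_t t = stored;
    uint64_t nextBoot = splitmix64(t);
    uint64_t session  = splitmix64(t);
    {
        std::ofstream out{path, std::ios::binary | std::ios::trunc};
        out.write(reinterpret_cast<const char*>(&nextBoot), sizeof nextBoot);
    }

    // 3. Only now seed the session RNG and serve applications.
    uint64_t r = splitmix64(session);  // first random word of this boot
    (void)r;
}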
If this is not possible, you need to be very careful. If your device has factory-injected entropy plus a reliable clock, you can mix the clock value into the RNG state to achieve uniqueness; however, beware that if your device loses power and the clock restarts from some fixed origin (blinking twelve), you'll be in a repeatable state.
Predictable RNG state after a reset or at the first boot is a common problem with embedded devices (and with servers). For example, a study of RSA public keys showed that many had been generated with insufficient entropy, resulting in many devices generating the same key³.
Statistical RNG
If you can't achieve a cryptographic quality, you can fall back to a less good RNG. You need to be aware that some applications (including a lot of cryptography) will be impossible.
Any RNG relies on a two-part structure: a unique seed (i.e. an entropy source) and a deterministic algorithm based on that seed.
If you can't gather enough entropy, at least gather as much as possible. In particular, make sure that no two devices start from the same state (this can usually be achieved by mixing the serial number into the RNG seed). If at all possible, arrange for the seed not to repeat after a reset.
The only excuse not to use a cryptographic DRBG is if your device doesn't have enough computing power. In that case, you can fall back to faster algorithms that allow observers to guess some numbers based on the RNG's past or future output. The Mersenne twister is a popular choice, but there have been improvements since its invention.
¹ Even this is debatable: with non-crypto-quality random backoff, another device could cause a denial of service by aligning its retransmission time with yours. But there are other ways to cause a DoS, by transmitting more often.
² Technically, it does, but only at an astronomical scale.
³ Or at least with one factor in common, which is just as bad.
One way to do it would be to create a Pseudo Random Bit Sequence, just a train of zeros and ones, and read the bottom bits as a number.
PRBS can be generated by tapping bits off a shift register, doing some logic on them, and using that logic to produce the next bit shifted in. Seed the shift register with any non-zero number. There's math that tells you which bits you need to tap off of to generate a maximum-length sequence (i.e., 2^N-1 numbers for an N-bit shift register). There are tables out there for 2-tap, 3-tap, and 4-tap implementations. You can find them if you search on "maximal length shift register sequences" or "linear feedback shift register".
from: http://www.markharvey.info/fpga/lfsr/
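As a concrete illustration of the technique quoted above (a toy sketch, not production randomness), here is a 16-bit maximal-length Galois LFSR using the well-known feedback mask 0xB400 (taps at bits 16, 14, 13, 11); it visits all 2^16 - 1 non-zero states before repeating:

#include <cstdint>
#include <iostream>

// 16-bit maximal-length Galois LFSR, feedback mask 0xB400.
uint16_t lfsrStep(uint16_t s) {
    uint16_t lsb = s & 1u;   // the bit shifted out this step
    s >>= 1;
    if (lsb) s ^= 0xB400u;   // apply the feedback taps
    return s;
}

int main() {
    uint16_t s = 0xACE1u;    // any non-zero seed works
    uint16_t start = s;
    uint32_t period = 0;
    do { s = lfsrStep(s); ++period; } while (s != start);
    std::cout << "period = " << period << "\n";  // prints 65535
}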
Horowitz and Hill devote a good part of a chapter to this. Most of the math surrounds the nature of the PRBS and not the number you generate with the bit sequence. There are some papers out there on the best ways to get a number out of the bit sequence and on improving correlation by playing around with masking the bits you use to generate the random number, e.g., Horan and Guinee, Correlation Analysis of Random Number Sequences based on Pseudo Random Binary Sequence Generation, in the Proc. of IEEE ISOC ITW2005 on Coding and Complexity; editor M.J. Dinneen; co-chairs U. Speidel and D. Taylor; pages 82-85.
An advantage would be that this can be achieved simply by bitshifting and simple bit logic operations. A one-liner would do it. Another advantage is that the math is pretty well understood. A disadvantage is that this is only pseudorandom, not random. Also, I don't know much about random numbers, and there might be better ways to do this that I simply don't know about.
How much energy you expend on this would depend on how random you need the number to be. If I were running a gambling site, and needed random numbers to generate deals, I wouldn't depend on Pseudo Random Bit Sequences. In those cases, I would probably look into analog noise techniques, maybe Johnson Noise around a big honking resistor or some junction noise on a PN junction, amplify that and sample it. The advantages of that are that if you get it right, you have a pretty good random number. The disadvantages are that sometimes you want a pseudorandom number where you can exactly reproduce a sequence by storing a seed. Also, this uses hardware, which someone must pay for, instead of a line or two of code, which is cheap. It also uses A/D conversion, which is yet another peripheral to use. Lastly, if you do it wrong -- say make a mistake where 60Hz ends up overwhelming your white noise-- you can get a pretty lousy random number.
What are typical means by which a random number can be generated in an embedded system?
Giles indirectly stated this: it depends on the use.
If you are using the generator to drive a simulation, then all you need is a uniform distribution and a linear congruential generator (LCG) will work fine.
If you need a secure generator, then it's a trickier problem. I'm side-stepping what it means to be secure, but from 10,000 feet think "wrap it in a cryptographic transformation", like a SHA-1/HMAC or SHA-512/HMAC. There are other ways, like sampling random events, but they may not be viable.
When you need secure random numbers, some low resource devices are notoriously difficult to work with. See, for example, Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices and Traffic sensor flaw that could allow driver tracking fixed. And a caveat for Linux 3.0 kernel users: the kernel removed a couple of entropy sources, so entropy depletion and starvation might have gotten worse. See Appropriate sources of entropy on LWN.
If you have a secure generator, then your problem becomes getting your hands on a good seed (or seeds over time). One of the better methods I have seen for environments that are constrained is Hedging. Hedging was proposed for Virtual Machines where a program could produce the same sequence after a VM reset.
The idea of hedging is to extract the randomness provided by your peer, and use it to keep your secure generator fit. For example, in the case of TLS, there is a client_random and a server_random. If the device is a server, then it would stir in the client_random. If the device is a client, then it would stir in the server_random.
You can find the two papers of interest that address hedging at:
When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography
When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine Based IT Environments
Using client_random and server_random is consistent with Peter Gutmann's view on the subject: "mix every entropy source you can get your hands on into your PRNG, including less-than-perfect ones". Gutmann is the author of Engineering Security.
Hedging only solves part of the problem. You will still need to solve other problems, like how to bootstrap the entropy pool, how to regenerate system key pairs when the pool is in a bad state, and how to persist the entropy across reboots when there's no filesystem.
Although it may not be the most complex or sound method, it can be fun to use external stimuli as your seed for random number generation. Consider using analogue input from a photodiode, or a thermistor. Even random noise from a floating pin could potentially yield some interesting results.

Is there a somewhat-reliable way to detect that a list of integers came from a common PRNG?

Basically I'm looking for a detective function. I pass it a list of integers (probably between 20 and 100 integers) and it tells me "Yeah, 84% chance this came from a PRNG, I tested it against the main ones that most modern programming languages use", or "No, only 12% chance this came from a well-known PRNG".
If it helps (or hinders), the integers will always be between 1 and 999.
Does this exist?
Unless you are prepared to break new ground in number theory, you would only be able to detect obsolete, badly designed, or poorly seeded PRNGs. Good PRNGs are explicitly designed to prevent what you are trying to do. Random number generation is a critical part of digital cryptography, so a lot of effort goes into producing random numbers that meet all known tests.
There are batteries of tests to profile PRNGs. See for example this NIST page.
As the comments point out, the first two sentences are overstated and are only strictly true for PRNGs that may be used in cryptography. Weaker (i.e. more predictable) PRNGs might be chosen for other domains in order to improve time or space performance.
You can write a battery of tests for a list of candidate generators, but there are a lot of generators, and some have enormous state, where adjacent values from a well-seeded generator will reveal nothing useful and you'll have to wait a long time before you can get the two data points that have an informative relationship.
On the plus side: while the list of random number generators that you might encounter is vast, there are telltale signs that will help you identify some classes of simple generators quickly, and then you can perform focused analysis to derive the specific configuration.
Unfortunately even a simple generator like KISS shows that while the generator can be trivially broken when you know its configuration, it can hide its signature from anything that does not know its configuration, leaving you in a situation where you have to individually test for every possible configuration.
There are quality tests like dieharder and TestU01 which will consume many megabytes of data to identify any weakness in a generator; however, these can also identify weaknesses in real RNGs, so they could give a strong false positive.
To get by with only 100 integers you would really need to have a list of generators in mind. For example, to detect an LCG used inappropriately, you simply test to see if the bottom three bits cycle through a repeating pattern of 8 values -- but this is by far the easiest case.
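A sketch of that test, here generating the suspect stream with the (real) Numerical Recipes LCG constants and a power-of-two modulus, so the bottom three bits repeat with period 8:

#include <cstdint>
#include <iostream>

int main() {
    // LCG with modulus 2^32 (Numerical Recipes constants).
    uint32_t x = 12345;
    for (int i = 0; i < 24; ++i) {
        x = x * 1664525u + 1013904223u;   // wraps mod 2^32
        std::cout << (x & 7u) << ' ';     // bottom three bits
        if (i % 8 == 7) std::cout << '\n';
    }
    // Every row prints the same 8-value pattern: for any LCG with a
    // power-of-two modulus, the low k bits have period at most 2^k.
}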
If you had a sequence of 625 or more 32-bit integers, you could detect with high confidence whether it came from consecutive calls to the Mersenne Twister. That is because it leaks state information in the output values.
For an example of how it is done, see this blog entry.
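In the same spirit as that blog entry, a self-contained sketch: observe 624 outputs of std::mt19937, invert the tempering to recover the internal state, apply one step of the twist by hand, and predict output 625:

#include <cstdint>
#include <iostream>
#include <random>

// Invert y ^= y >> shift by fixed-point iteration.
static uint32_t undoRightXor(uint32_t y, int shift) {
    uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ (x >> shift);
    return x;
}
// Invert y ^= (y << shift) & mask the same way.
static uint32_t undoLeftXorAnd(uint32_t y, int shift, uint32_t mask) {
    uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ ((x << shift) & mask);
    return x;
}
// Undo the MT19937 tempering steps in reverse order.
static uint32_t untemper(uint32_t y) {
    y = undoRightXor(y, 18);
    y = undoLeftXorAnd(y, 15, 0xEFC60000u);
    y = undoLeftXorAnd(y, 7, 0x9D2C5680u);
    y = undoRightXor(y, 11);
    return y;
}
// The forward tempering, to finish our prediction.
static uint32_t temper(uint32_t y) {
    y ^= y >> 11;
    y ^= (y << 7) & 0x9D2C5680u;
    y ^= (y << 15) & 0xEFC60000u;
    y ^= y >> 18;
    return y;
}

int main() {
    std::mt19937 victim(std::random_device{}());

    // Untemper 624 observed outputs to recover the internal state.
    uint32_t state[624];
    for (int i = 0; i < 624; ++i) state[i] = untemper(victim());

    // One step of the MT19937 "twist", done by hand, gives word 625.
    uint32_t y = (state[0] & 0x80000000u) | (state[1] & 0x7FFFFFFFu);
    uint32_t next = state[397] ^ (y >> 1) ^ ((y & 1u) ? 0x9908B0DFu : 0u);

    std::cout << "predicted: " << temper(next) << "\n";
    std::cout << "actual:    " << victim() << "\n";  // the two lines match
}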
Similar results are in theory possible when you don't have ideal data such as full 32-bit integers, but you would need a longer sequence and the maths gets harder. You would also need to know - or perhaps guess by trying obvious options - how the numbers were being reduced from the larger range to the smaller one.
Similar results are possible from other PRNGs, but generally only the non-cryptographic ones.
In principle you could identify specific PRNG sequences with very high confidence, but even simple barriers such as missing numbers from the strict sequence can make it a lot harder. There will also be many PRNGs that you will not be able to reliably detect, and typically you will either have close to 100% confidence of a match (to a hackable PRNG) or 0% confidence of any match.
Whether or not a PRNG is hackable (and therefore could be detected by the numbers it emits) is not a general indicator of PRNG quality. Obviously, "hackable" is the opposite of "secure", so don't consider the Mersenne Twister for creating unguessable codes. However, do consider it as a source of randomness for e.g. neural networks, genetic algorithms, Monte Carlo simulations and other places where you need a lot of statistically random-looking data.

Commercial-grade randomization for Poker game

I need some advice on how to tackle an algorithmic problem (ie. not programming per se). What follows are my needs and how I tried to meet them. Any comments for improvement would be welcome.
Let me first start off by explaining my goal. I would like to play some poker about a billion times. Maybe I'm trying to create the next PokerStars.net, maybe I'm just crazy.
I would like to create a program that can produce better randomized decks of cards, than say the typical program calling random(). These need to be production quality decks created from high quality random numbers. I've heard that commercial-grade poker servers use 64-bit vectors for every card, thus ensuring randomness for all the millions of poker games played daily.
I'd like to keep whatever I write simple. To that end, the program should only need one input to achieve the stated goal. I have decided that whenever the program begins, it will record the current time and use that as the starting point. I realize that this approach would not be feasible for commercial environments, but as long as it can hold up for a few billion games, better than simpler alternatives, I'll be happy.
I began to write pseudo-code to solve this problem, but ran into a thorny issue. It's clear to me, but it might not be to you, so please let me know.
Pseudo-code below:
Start by noting the system time.
Hash the current time (with MD5) around ten times (I chose the ten arbitrarily).
Take the resulting hash, and use it as the seed to the language-dependent random() function.
Call random() 52 times and store the results.
Take the values produced by random() and hash them.
Any hash function that produces at least 64 bits of output will do for this.
Truncate (if the hash is too big) so the hashes will fit inside a 64-bit double.
Find a way to map the 52 doubles (which should be random now, according to my calculations) into 52 different cards, so we can play some poker.
My issue is with the last step. I cannot think of a way to properly map each 64-bit value to a corresponding card, without having to worry about two numbers being the same (unlikely) or losing any randomness (likely).
My first idea was to break 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF into four even sections (to represent the suits). But there is no guarantee that we will find exactly thirteen cards per section, which would be bad.
Now that you know where I am stuck, how would you overcome this challenge?
-- Edited --
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
My real desire is to take something simple and predictable, like the system time, and transform it into a randomized deck of cards. Seeding random() with the system time is a BAD way of going about doing this. Hence the hashing of the time and hashing the values that come out of random().
Hell, if I wanted to, I could hash the bytes from /dev/random, just for shizzles and giggles. Hashing improves the randomness of things, doesn't it? Isn't that why modern password managers store passwords that have been hashed thousands of times?
-- Edit 2 --
So I've read your answers and I find myself confused by the conclusion many of you are implying. I hinted at it in my first edit, but it's really throwing me for a loop. I'd just like to point it out and move on.
Rainbow tables exist which do funky math and clever magic to essentially act as a lookup table for common hashes that map to a particular password. It is my understanding that longer, better passwords are unlikely to show up in these rainbow tables. But the fact still stands that despite how common many user passwords are, the hashed passwords remain safe after being hashed thousands of times.
So is that a case where many deterministic operations have increased the randomness of the original password (or seem to)? I'm not saying I'm right, I'm just saying that's my feeling.
The second thing I want to point out is I'm doing this backwards.
What I mean is that you all are suggesting I take a sorted, predictable, non-random deck of cards and use the Fisher-Yates shuffle on it. I'm sure Fisher-Yates is a fine algorithm, but let's say you couldn't use it for whatever reason.
Could you take a random stream of bytes, say in the neighborhood of 416 bytes (52 cards with 8 bytes per card) and BAM produce an already random deck of cards? The bytes were random, so it shouldn't be too hard to do this.
Most people would start with a deck of 52 cards (random or not) and swap them around a bunch of times (by picking a random index to swap). If you can do that, then you can take 52 random numbers, run through them once, and produce the randomized deck.
As simply as I can describe it,
The algorithm accepts a stream of randomized bytes and looks at each 8-byte chunk. It maps each chunk to a card.
Ex. 0x123 maps to the Ace of Spades
Ex. 0x456 maps to the King of Diamonds
Ex. 0x789 maps to the 3 of Clubs
.... and so on.
As long as we choose a good model for the mapping, this is fine. No shuffling required. The program will be reduced to two steps.
Step 1: Obtain a sufficient quantity of random bytes from a good source
Step 2: Split this stream of bytes into 52 chunks, one for each card in the deck
Step 2a: Run through the 52 chunks, converting them into card values according to our map.
Does that make sense?
You are massively overcomplicating the problem. You need two components to solve your problem:
A shuffling algorithm
A sufficiently high-quality random number generator for the shuffling algorithm to use.
The first is easy, just use the Fisher-Yates shuffle algorithm.
For the second, if you want sufficient degrees of freedom to be able to generate every possible permutation (of the 52! possibilities) then you need at least 226 bits of entropy. Using the system clock won't give you more than 32 or 64 bits of entropy (in practice far fewer as most of the bits are predictable), regardless of how many redundant hashes you perform. Find an RNG that uses a 256-bit seed and seed it with 256 random bits (a bootstrapping problem, but you can use /dev/random or a hardware RNG device for this).
You don't mention which OS you're on, but most modern OS's have pre-made sources of high quality entropy. On Linux, it's /dev/random and /dev/urandom, from which you can read as many random bytes as you want.
Writing your own random number generator is highly non-trivial, if you want good randomness. Any homebrew solution is likely to be flawed and could potentially be broken and its outputs predicted.
You will never improve your randomness if you still use a pseudo-random generator, no matter how many deterministic manipulations you do to it. In fact, you are probably making it considerably worse.
I would use a commercial random number generator. Most use hardware solutions, like a Geiger counter. Some use existing user input as a source of entropy, such as background noise into the computer's microphone or latency between keyboard strokes.
Edit:
You mentioned that you also want to know how to map this back to a shuffle algorithm. That part is actually quite simple. One straightforward way is Fisher-Yates shuffle. Basically all you need from your RNG is a random number uniformly distributed between 0 and 51 inclusive. That you can do computationally given any RNG and is usually built into a good library. See the "Potential sources of bias" section of the Wikipedia article.
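A minimal sketch of that shuffle; std::random_device stands in for whatever entropy source you settle on, and std::uniform_int_distribution sidesteps the modulo bias the article warns about:

#include <iostream>
#include <random>
#include <vector>

int main() {
    // Cards 0..51; map index to rank/suit however you like.
    std::vector<int> deck(52);
    for (int i = 0; i < 52; ++i) deck[i] = i;

    std::random_device rd;  // stand-in entropy source

    // Fisher-Yates: walk down from the top, swapping each card with a
    // uniformly chosen card at or below it.
    for (int i = 51; i > 0; --i) {
        std::uniform_int_distribution<int> d(0, i);
        std::swap(deck[i], deck[d(rd)]);
    }

    for (int c : deck) std::cout << c << ' ';
    std::cout << '\n';
}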
Great question!
I would strongly discourage you from using the random function that comes built-in with any programming language. This generates pseudorandom numbers that are not cryptographically secure, and so it would be possible for a clever attacker to look at the sequence of numbers coming back out as cards and to reverse-engineer the random number seed. From this, they could easily start predicting the cards that would come out of the deck. Some early poker sites, I've heard, had this vulnerability.
For your application, you will need cryptographically secure random numbers so that an adversary could not predict the sequence of cards without breaking something cryptographically assumed to be secure. For this, you could either use a hardware source of randomness or a cryptographically secure pseudorandom number generator. Hardware random generators can be expensive, so a cryptographically secure PRNG may be a good option.
The good news is that it's very easy to get a cryptographically secure PRNG. If you take any secure block cipher (say, AES or 3DES) and, using a random key, start encrypting the numbers 0, 1, 2, ..., etc., then the resulting sequence is cryptographically secure. That is, you could use /dev/random to get some random bytes for use as a key, then get random numbers by encrypting the integers in sequence using a strong cipher with the given key. This is secure until you hand back roughly √n numbers, where n is the number of possible cipher blocks; for a 128-bit block cipher like AES that is about 2^64 values before you'd need to reset the random key. If you "only" want to play billions of games (2^40), this should be more than fine.
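A sketch of that construction using OpenSSL's EVP interface (assuming OpenSSL is available and you link with -lcrypto): AES-128 in CTR mode encrypting zero bytes hands you the keystream directly, which is exactly the counter-encryption scheme described above:

#include <openssl/evp.h>
#include <openssl/rand.h>
#include <cstdio>

int main() {
    unsigned char key[16], iv[16];
    // Key and counter start come from the OS entropy source.
    if (RAND_bytes(key, sizeof key) != 1 || RAND_bytes(iv, sizeof iv) != 1)
        return 1;

    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), nullptr, key, iv);

    // Encrypting zeros in CTR mode returns the raw keystream: a
    // cryptographically strong pseudorandom byte stream.
    unsigned char zeros[32] = {0}, stream[32];
    int outl = 0;
    EVP_EncryptUpdate(ctx, stream, &outl, zeros, sizeof zeros);

    for (int i = 0; i < outl; ++i) std::printf("%02x", stream[i]);
    std::printf("\n");
    EVP_CIPHER_CTX_free(ctx);
}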
Hope this helps! And best of luck with the project!
You should definitely read the answer to this question: Understanding "randomness"
Your approach of applying a number of arbitrary transformations to an existing pseudorandom number is very unlikely to improve your results, and in fact risks making the numbers less random.
You might consider using physically derived random numbers rather than pseudorandom numbers:
http://en.wikipedia.org/wiki/Hardware_random_number_generator
If you are definitely going to use pseudorandom numbers, then you are likely to be best off seeding with your operating system's randomness device, which is likely to include additional entropy from things like disk seek times as well as user IO.
Reading bytes from /dev/random would work well actually. But that still leaves me lost on how to do the conversion? (assuming I read enough bytes for 52 cards).
Conversion of what? Just take a deck of cards and, using your cryptographically-secure PRNG, shuffle it. This will produce every possible deck of cards with equal probability, with no way for anyone to determine what cards are coming next - that's the best you could possibly do.
Just make sure you implement the shuffling algorithm correctly :)
In terms of actually turning the random numbers into cards (once you follow the advice of others in generating the random numbers), you can map the lowest number to the Ace of Diamonds, the 2nd lowest number to the 2 of Diamonds, etc.
Basically you assume the actual cards have a natural ordering and then you sort the random numbers and map to the deck.
Edit
Apparently Wikipedia lists this method as an alternative to the Fisher-Yates algorithm (which I hadn't previously heard of - thanks Dan Dyer!). One thing in the Wikipedia article that I didn't think of is that you need to be sure that you don't repeat any random numbers if you're using the algorithm I described.
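A sketch of that sort-based shuffle, with a retry to handle the repeated-number problem just mentioned (with 64-bit tags a collision is vanishingly rare, but retrying keeps the result exactly uniform):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

int main() {
    std::mt19937_64 rng(std::random_device{}());
    std::vector<std::pair<std::uint64_t, int>> tagged(52);

    // Tag each card with a random key and sort by the keys. Re-tag if
    // any two keys collide, since ties would bias the resulting order.
    bool ok = false;
    while (!ok) {
        for (int c = 0; c < 52; ++c) tagged[c] = {rng(), c};
        std::sort(tagged.begin(), tagged.end());
        ok = std::adjacent_find(tagged.begin(), tagged.end(),
                 [](const auto& a, const auto& b) { return a.first == b.first; })
             == tagged.end();
    }

    for (const auto& [key, card] : tagged) std::cout << card << ' ';
    std::cout << '\n';
}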
A ready-made, off the shelf poker hand evaluator can be found here. All feedback welcomed at the e-mail address found therein.

Could a truly random number be generated using pings to pseudo-randomly selected IP addresses?

The question posed came about during a 2nd year Comp Science lecture while discussing the impossibility of generating truly random numbers on a deterministic computational device.
This was the only suggestion which didn't depend on non-commodity-class hardware.
Subsequently nobody would put their reputation on the line to argue definitively for or against it.
Anyone care to make a stand for or against? If so, how about a mention of a possible implementation?
No.
A malicious machine on your network could use ARP spoofing (or a number of other techniques) to intercept your pings and reply to them after certain periods. They would then not only know what your random numbers are, but they would also control them.
Of course there's still the question of how deterministic your local network is, so it might not be as easy as all that in practice. But since you get no benefit from pinging random IPs on the internet, you might just as well draw entropy from ethernet traffic.
Drawing entropy from devices attached to your machine is a well-studied principle, and the pros and cons of various kinds of devices and methods of measuring can be e.g. stolen from the implementation of /dev/random.
[Edit: as a general principle, when working in the fundamentals of security (and the only practical needs for significant quantities of truly random data are security-related) you MUST assume that a fantastically well-resourced, determined attacker will do everything in their power to break your system.
For practical security, you can assume that nobody wants your PGP key that badly, and settle for a trade-off of security against cost. But when inventing algorithms and techniques, you need to give them the strongest security guarantees that they could ever possibly face. Since I can believe that someone, somewhere, might want someone else's private key badly enough to build this bit of kit to defeat your proposal, I can't accept it as an advance over current best practice. AFAIK /dev/random follows fairly close to best practice for generating truly random data on a cheap home PC]
[Another edit: it has been suggested in comments that (1) it is true of any TRNG that the physical process could be influenced, and (2) that security concerns don't apply here anyway.
The answer to (1) is that it's possible on any real hardware to do so much better than ping response times, and gather more entropy faster, that this proposal is a non-solution. In CS terms, it is obvious that you can't generate random numbers on a deterministic machine, which is what provoked the question. But then in CS terms, a machine with an external input stream is non-deterministic by definition, so if we're talking about ping then we aren't talking about deterministic machines. So it makes sense to look at the real inputs that real machines have, and consider them as sources of randomness. No matter what your machine, raw ping times are not high on the list of sources available, so they can be ruled out before worrying about how good the better ones are. Assuming that a network is not subverted is a much bigger (and unnecessary) assumption than assuming that your own hardware is not subverted.
The answer to (2) is philosophical. If you don't mind your random numbers having the property that they can be chosen at whim instead of by chance, then this proposal is OK. But that's not what I understand by the term 'random'. Just because something is inconsistent doesn't mean it's necessarily random.
Finally, to address the implementation details of the proposal as requested: assuming you accept ping times as random, you still can't use the unprocessed ping times as RNG output. You don't know their probability distribution, and they certainly aren't uniformly distributed (which is normally what people want from an RNG).
So, you need to decide how many bits of entropy per ping you are willing to rely on. Entropy is a precisely-defined mathematical property of a random variable which can reasonably be considered a measure of how 'random' it actually is. In practice, you find a lower bound you're happy with. Then hash together a number of inputs, and convert that into a number of bits of output less than or equal to the total relied-upon entropy of the inputs. 'Total' doesn't necessarily mean sum: if the inputs are statistically independent then it is the sum, but this is unlikely to be the case for pings, so part of your entropy estimate will be to account for correlation. The sophisticated big sister of this hashing operation is called an 'entropy collector', and all good OSes have one.
If you're using the data to seed a PRNG, though, and the PRNG can use arbitrarily large seed input, then you don't have to hash because it will do that for you. You still have to estimate entropy if you want to know how 'random' your seed value was - you can use the best PRNG in the world, but its entropy is still limited by the entropy of the seed.]
Random numbers are too important to be left to chance.
Or external influence/manipulation.
Short answer
Using ping timing data by itself would not be truly random, but it can be used as a source of entropy which can then be used to generate truly random data.
Longer version
How random are ping times?
By itself, timing data from network operations (such as ping) would not be uniformly distributed. (And the idea of selecting random hosts is not practical - many will not respond at all, and the differences between hosts can be huge, with gaps between ranges of response time - think satellite connections).
However, while the timing will not be well distributed, there will be some level of randomness in the data. Or to put it another way, a level of information entropy is present. It is a fine idea to feed the timing data into a random number generator to seed it. So what level of entropy is present?
For network timing data of say around 50ms, measured to the nearest 0.1ms, with a spread of values of 2ms, you have about 20 values. Rounding down to the nearest power of 2 (16 = 2^4) you have 4 bits of entropy per timing value. If it is for any kind of secure application (such as generating cryptographic keys) then I would be conservative and say it was only 2 or 3 bits of entropy per reading. (Note that I've done a very rough estimate here, and ignored the possibility of attack).
How to generate truly random data
For true random numbers, you need to send the data into something designed along the lines of /dev/random that will collect the entropy, distributing it within a data store (using some kind of hash function, usually a secure one). At the same time, the entropy estimate is increased. So for a 128 bit AES key, 64 ping timings would be required before the entropy pool had enough entropy.
To be more robust, you could then add timing data from the keyboard and mouse usage, hard disk response times, motherboard sensor data (eg temperature), etc. It increases the rate of entropy collection and makes it hard for an attacker to monitor all sources of entropy. And indeed this is what is done with modern systems. The full list of MS Windows entropy sources is listed in the second comment of this post.
More reading
For discussion of the (computer security) attacks on random number generators, and the design of a cryptographically secure random number generator, you could do worse than read the Yarrow paper by Bruce Schneier and John Kelsey. (Yarrow is used by BSD and Mac OS X systems.)
No.
Unplug the network cable (or /etc/init.d/networking stop) and the entropy basically drops to zero.
Perform a Denial-Of-Service attack on the machine it's pinging and you also get predictable results (the ping-timeout value)
I guess you could. A couple things to watch out for:
Even if pinging random IP addresses, the first few hops (from you to the first real L3 router in the ISP network) will be the same for every packet. This puts a lower bound on the round-trip time, even if you ping something in a datacenter in that first Point of Presence. So you have to be careful about normalizing the timing, since there is a hard floor that no measurement will go below.
You'd also have to be careful about traffic shaping in the network. A typical leaky bucket implementation in a router releases N bytes every M microseconds, which effectively perturbs your timing into specific timeslots rather than a continuous range of times. So you might need to discard the low order bits of your timestamp.
However I would disagree with the premise that there are not good sources of entropy in commodity hardware. Many x86 chipsets for the last few years have included random number generators. The ones I am familiar with use relatively sensitive ADCs to measure temperature in two different locations on the die, and subtract them. The low order bits of this temperature differential can be shown (via Chi-squared analysis) to be strongly random. As you increase the processing load on the system the overall temperature goes up, but the differential between two areas of the die remains uncorrelated and unpredictable.
The best source of randomness on commodity hardware I've seen, was a guy who removed a filter or something from his webcam, put opaque glue on the lens, and was then able to easily detect individual white pixels from cosmic rays striking the CCD. These are as close to perfectly random as possible, and are protected from external snooping by quantum effects.
Part of a good random number generator is equal probabilities of all numbers as n -> infinity.
So if you are planning to generate random bytes, then with sufficient data from a good RNG, each byte should have an equal probability of being returned. Further, there should be no pattern or predictability (spikes in probability during certain time periods) in which numbers are returned.
I am not too sure what you would be measuring with ping to get the random variable; is it response time? If so, you can be pretty sure that some response times, or ranges of response times, will be more frequent than others and hence would make a potentially insecure random number generator.
If you want commodity hardware, your sound card should pretty much do it. Just turn up the volume on an analog input and you have a cheap white noise source. Cheap randomness without the need for a network.
The approach of measuring something to generate a random seed appears to be a pretty good one. The O'Reilly book Practical Unix and Internet Security gives a few similar additional methods of determining a random seed, such as asking the user to type a few keystrokes, and then measuring the time between keystrokes. (The book notes that this technique is used by PGP as a source of its randomness.)
I wonder if the current temperature of a system's CPU (measured out to many decimal places) could be a viable component of a random seed. This approach would have the advantage of not needing to access the network (so the random generator wouldn't become unavailable when the network connection goes down).
However, it's probably not likely that a CPU's internal sensor could accurately measure the CPU temperature out to enough decimal places to make the value truly viable as a random number seed; at least, not with "commodity-class hardware," as mentioned in the question!
It's not as good as using atmospheric noise, but it's still truly random since it depends on the characteristics of the network, which is notorious for random non-repeatable behavior.
See Random.org for more on randomness.
Here's an attempt at an implementation:
ips : list = getIpAddresses();
rnd = PseudorandomNumberGenerator(0 to (ips.count - 1));
getTrueRandomNumber() { ping(ips[rnd.nextNumber()]).averageTime }
I would sooner use something like ISAAC as a stronger PRNG before trusting round trip pings as entropy. As others have said, it would just be too easy for someone to not only guess your numbers, but also possibly control them to various degrees.
Other great sources of entropy exist, which others have mentioned. One that was not mentioned (which might not be practical) is sampling noise from the on-board audio device, which is usually going to be a little noisy even if no microphone is connected to it.
I went 9 rounds with trying to come up with a strong (and fast) PRNG for a client/server RPC mechanism I was writing. Both sides had an identical key, consisting of 1024 lines of 32 character ciphers. The client would send AUTH xx, the server would return AUTH yy .. and both sides knew which two lines of the key to use to produce the blowfish secret (+ salt). Server would then send a SHA-256 digest of the entire key (encrypted), client knew it was talking to something that had the correct key .. session continued. Yeah, very weak protection for man in the middle, but a public key was out of the question for how the device was being used.
So, you had a non blocking server that had to handle up to 256 connections .. not only did the PRNG have to be strong, it had to be fast. It wasn't such a hardship to use slower methods to gather entropy in the client, but that could not be afforded in the server.
So, I have to ask regarding your idea .. how practical would it be?
No mathematical computation can produce a random result, but in the "real world" computers don't exactly just crunch numbers... With a little bit of creativity it should be possible to produce random results of the kind where there is no known method of reproducing or predicting exact outcomes.
One of the easiest ideas to implement, which works universally on all systems, is to use static from the computer's sound card line-in/mic port.
Other ideas include thermal noise and low level timing of cache lines. Many modern PCs with TPM chips have encryption quality hardware random number generators already onboard.
My kneejerk reaction to ping (especially if using ICMP) is that you're cheating too blatantly. At that point you might as well whip out a Geiger counter and use background radiation as your random source.
Yes, it's possible, but... the devil's in the details.
If you're going to generate a 32-bit integer, you need to gather more than 32 bits of entropy (and use a sufficient mixing function to get that entropy spread around, but that's known and doable). The big question is:
how much entropy do ping times have?
The answer to this question depends on all sorts of assumptions about the network and your attack model, and there's different answers in different circumstances.
If attackers are able to totally control ping times, you get 0 bits of entropy per ping, and you can't ever total 32-bits of entropy, no matter how much you mix. If they have less than perfect control over ping times, you'll get some entropy, and (if you don't overestimate the amount of entropy you're gathering) will get perfectly random 32-bit numbers.
YouTube shows a device in action: http://www.youtube.com/watch?v=7n8LNxGbZbs
Random means nobody can predict the next state.
Though I can't definitively argue for or against it, this implementation has its issues.
Where are these IP addresses coming from? If they are randomly selected, what happens when they do not reply or are late in replying? Does that mean the random number will be slower to appear?
Also, even if you made a visual graph of 100,000 results and calculated that there are few or no correlations between the numbers, that does not mean it is truly random. As explained by Dilbert :)
It doesn't strike me as a good source of randomness.
What metric would you use -- the obvious one is response time, but the range of values you can reasonably expect is small: a few tens of milliseconds to a few thousand. The response times themselves will follow a bell curve and not be randomly distributed across any interval (how would you choose the interval?) so you would have to try and select a few 'random' bits from the numbers.
The LSB might give you a random bit stream, but you would have to consider clock granularity issues - maybe due to how interrupts work you would always get multiples of 2ms on some systems.
There are probably much better 'interesting' ways of getting random bits -- maybe google for a random word, grab the first page and choose the Nth bit from the page.
Eh, I find that this kind of question leads into discussions about the meaning of 'truly random' pretty quickly.
I think that measuring pings would yield decent-quality random bits, but at an insufficient rate to be of much use (unless you were willing to do some serious DDOSing).
And I don't see that it would be any more random than measuring analogue/mechanical properties of the computer, or the behaviour of the meatbag operating it.
(edit) On a practical note, this approach opens you up to the possibility of someone on your network manipulating your 'random' number generator.
It seems to me that true randomness is ineffable - there is no way to know whether a sequence is random, since by definition it can contain anything no matter how improbable. Guaranteeing a particular distribution pattern reduces the randomness. The word "pattern" is a bit of a giveaway.
I MADE U A RANDOM NUMBER
BUT I EATED IT
Randomness is not a binary property -- it's a value between 0 and 1 that describes how difficult it is to predict the next value in a stream.
Asking "how random can my values be if I base them on pings?" is actually asking "how random are pings?". You can estimate that by gathering a large enough set of data (1 mln pings for example) and mapping their distribution curve and behavior in time. If the distribution is flat and the behavior is difficult to predict, the data seems more random. The more bumpy distribution or predictable behavior suggest lower randomness.
You should also consider the sample resolution. I could imagine the results being rounded in some way to a millisecond, so with pings you could have integer values between 0 and 500. That's not a lot of resolution.
On the practical side, I would recommend against it, since pings can be predicted and manipulated, further reducing their randomness.
Generally, I suggest against "rolling your own" randomness generators, encryption methods and hashing algorithms. As fun as it seems, it's mostly a lot of very intimidating math.
As to how to build a really good entropy generator -- I think that's probably going to have to be a sealed box that outputs some sort of result of interactions on the atomic or sub-atomic level. I mean, if you're using a source of entropy that the enemy can easily read too, he only needs to find out your algorithm. Any form of connection is a possible attack vector, so you should place the source of entropy as close to the service that consumes it as possible.
You can use the XKCD method:
I got some code that creates random numbers with traceroute. I also have a program that does it using ping. I did it over a year ago for a class project. All it does is run traceroute on an address and take the least significant digit of the ms times. It works pretty well at getting random numbers, but I really don't know how close it is to true random.
Here is a list of 8 numbers that I got when I ran it.
455298558263758292242406192
506117668905625112192115962
805206848215780261837105742
095116658289968138760389050
465024754117025737211084163
995116659108459780006127281
814216734206691405380713492
124216749135482109975241865
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // Run traceroute and capture its output in a temporary file.
    system("traceroute -w 5 www.google.com >> trace.txt");

    // Read the output back in, whitespace-separated token by token.
    ifstream in("trace.txt");
    string temp;
    vector<string> tracer;
    while (in >> temp)
        tracer.push_back(temp);
    system("rm trace.txt");

    // The tokens immediately before "ms" are the round-trip times.
    vector<string> numbers;
    const string ms = "ms";
    for (unsigned index = 1; index < tracer.size(); ++index)
        if (tracer[index] == ms)
            numbers.push_back(tracer[index - 1]);

    // Build the output from the least significant digit of each time.
    string rand;
    for (unsigned i = 0; i < numbers.size(); ++i)
        rand += numbers[i][numbers[i].size() - 1];

    cout << rand << endl;
    return 0;
}
Very simply, since networks obey prescribed rules, the results are not random.
The webcam idea sounds (slightly) reasonable. Linux people often recommend simply using the random noise from a soundcard which has no mic attached.
Here is my suggestion:
1- Choose a bunch of websites that are as far away from your location as possible, e.g. if you are in the US, try websites whose server IPs are in Malaysia, China, Russia, India, etc. Servers with high traffic are better.
2- During times of high internet traffic in your country (in my country it is about 7 to 11 pm), ping those websites many, many times, take each ping result (use only the integer value) and calculate its modulus 2 (i.e. from each ping operation you get one bit: either 0 or 1).
3- Repeat the process for several days, recording the results.
4- Collect all the bits you got from all your pings (you will probably have hundreds of thousands of bits) and choose your bits from among them (maybe you want to choose those bits using some data from the same method mentioned above :) ).
BE CAREFUL: in your code you should check for timeouts, etc.

True random number generator [closed]

Sorry for this not being a "real" question, but some time back I remember seeing a post here about randomizing a randomizer randomly to generate truly random numbers, not just pseudorandom ones. I don't see it if I search for it.
Does anybody know about that article?
I have to disagree with a lot of the answers to this question.
It is possible to collect random data on a computer. SSL, SSH and VPNs would not be secure if you couldn't.
The way software random number generators work is that there is a pool of random data gathered from many different places, such as clock drift, interrupt timings, etc.
The trick to these schemes is in correctly estimating the entropy (the posh name for the randomness). It doesn't matter whether the source is biased, as long as you estimate the entropy correctly.
To illustrate this, the chance of me hitting the letter 'e' in this comment is much higher than that of 'z', so if I were to use key interrupts as a source of entropy it would be biased - but there is still some randomness to be had in that input. You can't predict exactly which sequence of letters will come next in this paragraph. You can extract entropy from this uncertainty and use it as part of a random byte.
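To make "some randomness to be had" quantitative, here is a sketch that computes the Shannon entropy H = -Σ p·log2(p) of a biased source; the probabilities are made-up illustrative values, not measured letter frequencies:

#include <cmath>
#include <cstdio>

int main() {
    // Illustrative (made-up) symbol probabilities for a biased source:
    // one symbol dominates, yet the source is not worthless.
    double p[] = {0.60, 0.25, 0.10, 0.04, 0.01};

    double h = 0.0;
    for (double pi : p)
        if (pi > 0) h -= pi * std::log2(pi);

    // A fair 5-symbol source would give log2(5) ≈ 2.32 bits/symbol.
    std::printf("entropy = %.3f bits per symbol\n", h);
}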
Good quality real-random generators like Yarrow have quite sophisticated entropy estimation built into them and will only emit as many bytes as they can reliably say they have in their "randomness pool."
I believe that was on thedailywtf.com - i.e. not something that you want to do.
It is not possible to get a truly random number from pseudorandom numbers, no matter how many times you call randomize().
You can get "true" random numbers from special hardware. You could also collect entropy from mouse movements and things like that.
At the end of the post, I will answer your question of why you might want to use multiple random number generators for "more randomness".
There are philosophical debates about what randomness means. Here, I will mean "indistinguishable in every respect from a uniform(0,1) iid distribution over the samples drawn". I am totally ignoring philosophical questions of what random is.
Knuth volume 2 has an analysis where he attempts to create a random number generator as you suggest, and then analyzes why it fails, and what true random processes are. Volume 2 examines RNGs in detail.
The others recommend you using random physical processes to generate random numbers. However, as we can see in the Espo/vt interaction, these processes can have subtle periodic elements and other non-random elements, in part due to outside factors with deterministic behavior. In general, it is best never to assume randomness, but always to test for it, and you usually can correct for such artifacts if you are aware of them.
It is possible to create an "infinite" stream of bits that appears completely random, deterministically. Unfortunately, such approaches grow in memory with the number of bits asked for (as they would have to, to avoid repeating cycles), so their scope is limited.
In practice, you are almost always better off using a pseudo-random number generator with known properties. The key numbers to look for are the phase-space dimension (roughly, the offset between samples at which you can still count on them being uniformly distributed), the bit-width (the number of bits in each sample which are uniformly random with respect to each other), and the cycle size (the number of samples you can take before the distribution starts repeating).
However, since random numbers from a given generator are deterministically in a known sequence, your procedure might be exposed by someone searching through the generator and finding an aligning sequence. Therefore, you can likely avoid your distribution being immediately recognized as coming from a particular random number generator if you maintain two generators. From the first, you sample i, and then map this uniformly over one to n, where n is at most the phase dimension. Then, from the second you sample i times, and return the i-th result. This will reduce your cycle size to (original cycle size / n) in the worst case, but for that cycle it will still generate uniform random numbers, and do so in a way that makes the search for alignment exponential in n. It will also reduce the independent phase length. Don't use this method unless you understand what reduced cycle and independent phase lengths mean to your application.
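A sketch of the two-generator scheme just described, with arbitrary seeds and n = 8 purely for illustration:

#include <cstdint>
#include <iostream>
#include <random>

int main() {
    std::minstd_rand chooser(42);   // picks how far to step each time
    std::mt19937 source(12345);     // produces the actual samples
    const int n = 8;                // arbitrary; keep at most the phase dimension

    std::uniform_int_distribution<int> step(1, n);
    for (int k = 0; k < 5; ++k) {
        int i = step(chooser);                      // sample i in 1..n
        std::uint32_t r = 0;
        for (int j = 0; j < i; ++j) r = source();   // keep the i-th draw
        std::cout << r << '\n';
    }
}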
An algorithm for truly random numbers cannot exist, as the definition of random numbers is:
Having unpredictable outcomes and, in the ideal case, all outcomes equally probable; resulting from such selection; lacking statistical correlation.
There are better or worse pseudorandom number generators (PRNGs), i.e. completely deterministic sequences of numbers that are difficult to predict without knowing a piece of information, called the seed.
Now, PRNGs for which it is extremely hard to infer the seed are cryptographically secure. You might want to look them up in Google if that is what you seek.
Another way (whether this is truly random or not is a philosophical question) is to use random sources of data. For example, unpredictable physical quantities, such as noise, or measuring radioactive decay.
These are still subject to attacks because they can be independently measured, have biases, and so on. So it's really tricky. This is done with custom hardware, which is usually quite expensive. I have no idea how good /dev/random is, but I would bet it is not good enough for cryptography (most cryptography programs come with their own RNG and Linux also looks for a hardware RNG at start-up).
According to Wikipedia /dev/random, in Unix-like operating systems, is a special file that serves as a true random number generator.
The /dev/random driver gathers environmental noise from various non-deterministic sources including, but not limited to, inter-keyboard timings and inter-interrupt timings that occur within the operating system environment. The noise data is sampled and combined with a CRC-like mixing function into a continuously updating "entropy pool". Random bit strings are obtained by taking an MD5 hash of the contents of this pool. The one-way hash function distills the true random bits from the pool data and hides the state of the pool from adversaries.
The /dev/random routine maintains an estimate of true randomness in the pool and decreases it every time random strings are requested for use. When the estimate goes down to zero, the routine locks and waits for the occurrence of non-deterministic events to refresh the pool.
The /dev/random kernel module also provides another interface, /dev/urandom, that does not wait for the entropy-pool to re-charge and returns as many bytes as requested. As a result /dev/urandom is considerably faster at generation compared to /dev/random which is used only when very high quality randomness is desired.
John von Neumann once said something to the effect of "anyone attempting to generate random numbers via algorithmic means is, of course, living in sin."
Not even /dev/random is random, in a mathematician's or a physicist's sense of the word. Not even radioisotope decay measurement is random. (The decay rate is. The measurement isn't. Geiger counters have a small reset time after each detected event, during which time they are unable to detect new events. This leads to subtle biases. There are ways to substantially mitigate this, but not completely eliminate it.)
Stop looking for true randomness. A good pseudorandom number generator is really what you're looking for.
If you believe in a deterministic universe, true randomness doesn't exist. :-) For example, someone has suggested that radioactive decay is truly random, but IMHO, just because scientists haven't yet worked out the pattern, doesn't mean that there isn't a pattern there to be worked out. Usually, when you want "random" numbers, what you need are numbers for encryption that no one else will be able to guess.
The closest you can get to random is to measure something natural that no enemy would also be able to measure. Usually you throw away the most significant bits from your measurement, leaving numbers which are more likely to be evenly spread. Hard-core random number users get special hardware that measures radioactive events, but you can get some randomness from the human using the computer from things like keypress intervals and mouse movements, and if the computer doesn't have direct users, from CPU temperature sensors and from network traffic. You could also use things like web cams and microphones connected to sound cards, but I don't know if anyone does.
To summarize some of what has been said: our working definition of a secure source of randomness is similar to our definition of cryptographically secure: it appears random if smart folks have looked at it and weren't able to show that it is predictable.
There is no system for generating random numbers which couldn't conceivably be predicted, just as there is no cryptographic cipher that couldn't conceivably be cracked. The trusted solutions used for important work are merely those which have proven to be difficult to defeat so far. If anyone tells you otherwise, they're selling you something.
Cleverness is rarely rewarded in cryptography. Go with tried and true solutions.
A computer usually has many readily available physical sources of random noise:
Microphone (hopefully in a noisy place)
Compressed video from a webcam (pointed to something variable, like a lava lamp or a street)
Keyboard & mouse timing
Network packet content and timing (the whole world contributes)
And sometimes
Clock drift based hardware
Geiger counters and other detectors of rare events
All sorts of sensors attached to A/D converters
What's difficult is estimating the entropy of these sources, which is in most cases low and highly variable despite the high data rates; but entropy can be estimated with conservative assumptions, or at least not wasted, to feed systems like Yarrow or Fortuna.
It's not possible to obtain 'true' random numbers, a computer is a logical construct that can't possibly create 'truly' random anything, only pseudo-random. There are better and worse pseudo-random algorithms out there, however.
In order to obtain a 'truly' random number you need a physical random source. Some gambling machines actually have these built in - often it's a radioactive source, where the radioactive decay (which as far as I know is truly random) is used to generate the numbers.
One of the best methods to generate a random number is through clock drift. This primarily works with two oscillators.
An analogy for how this works: imagine a race car on a simple oval circuit with a white line at the start of the lap and also a white line on one of the tyres. When the car completes a lap, a number is generated based on the difference between the position of the white line on the road and the one on the tyre.
Very easy to generate and impossible to predict.
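A software sketch of the same idea (only an illustration of the principle, not a vetted entropy source): count how many busy-loop iterations fit in a fixed interval of the system clock, and keep each count's least significant bit as a raw candidate bit:

#include <chrono>
#include <iostream>

int main() {
    using clk = std::chrono::steady_clock;

    // Two "oscillators": the CPU's busy-loop rate vs. the system timer.
    // The counts drift against each other from run to run; the least
    // significant bit of each count is the jittery part we keep.
    for (int k = 0; k < 32; ++k) {
        auto start = clk::now();
        unsigned long count = 0;
        while (clk::now() - start < std::chrono::microseconds(200))
            count = count + 1;
        std::cout << (count & 1u);
    }
    std::cout << '\n';  // raw bits: debias/condition before any real use
}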