Related
I know that for most programs, a pseudo-random number is sufficient but there are ways that machines can generate truly random numbers. For example, devices that generate unpredictable processes. But, they tend to be biased somehow. So, is it possible to use or make devices that can generate unpredictable processes and being unbiased?
Yes. You can buy cards or USB devices to plug into your computer that use physical effects, such as quantum mechanics, to generate true random numbers. A simple Geiger counter would be an example, though commercially available devices are more complex with a higher bit rate.
Google "Quantis RNG" for some examples.
What are typical means by which a random number can be generated in an embedded system? Can you offer advantages and disadvantages for each method, and/or some factors that might make you choose one method over another?
First, you have to ask a fundamental question: do you need unpredictable random numbers?
For example, cryptography requires unpredictable random numbers. That is, nobody must be able to guess what the next random number will be. This precludes any method that seeds a random number generator from common parameters such as the time: you need a proper source of entropy.
Some applications can live with a non-cryptographic-quality random number generator. For example, if you need to communicate over Ethernet, you need a random number generator for the exponential back-off; statistic randomness is enough for this¹.
Unpredictable RNG
You need an unpredictable RNG whenever an adversary might try to guess your random numbers and do something bad based on that guess. For example, if you're going to generate a cryptographic key, or use many other kinds of cryptographic algorithms, you need an unpredictable RNG.
An unpredictable RNG is made of two parts: an entropy source, and a pseudo-random number generator.
Entropy sources
An entropy source kickstarts the unpredictability. Entropy needs to come from an unpredictable source or a blend of unpredictable sources. The sources don't need to be fully unpredictable, they need to not be fully predictable. Entropy quantifies the amount of unpredictability. Estimating entropy is difficult; look for research papers or evaluations from security professionals.
There are three approaches to generating entropy.
Your device may include some non-deterministic hardware. Some devices include a dedicated hardware RNG based on physical phenomena such as unstable oscillators, thermal noise, etc. Some devices have sensors which capture somewhat unpredictable values, such as the low-order bits of light or sound sensors.
Beware that hardware RNG often have precise usage conditions. Most methods require some time after power-up before their output is truly random. Often environmental factors such as extreme temperatures can affect the randomness. Read the RNG's usage notes very carefully. For cryptographic applications, it is generally recommended to make statistical tests the HRNG's output and refuse to operate if these tests fail.
Never use a hardware RNG directly. The output is rarely fully unpredictable — e.g. each bit may have a 60% probability of being 1, or the probability of two consecutive bits being equal may be only 48%. Use the hardware RNG to seed a PRNG as explained below.
You can preload a random seed during manufacturing and use that afterwards. Entropy doesn't wear off when you use it²: if you have enough entropy to begin with, you'll have enough entropy during the lifetime of your device. The danger with keeping entropy around is that it must remain confidential: if the entropy pool accidentally leaks, it's toast.
If your device has a connection to a trusted third party (e.g. a server of yours, or a master node in a sensor network), it can download entropy from that (over a secure channel).
Pseudo-random number generator
A PRNG, also called deterministic random bit generator (DRBG), is a deterministic algorithm that generates a sequence of random numbers by transforming an internal state. The state must be seeded with sufficient entropy, after which the PRNG can run practically forever. Cryptographic-quality PRNG algorithms are based on cryptographic primitives; always use a vetted algorithm (preferably some well-audited third-party code if available).
The PRNG needs to be seeded with entropy. You can choose to inject entropy once during manufacturing, or at each boot, or periodically, or any combination.
Entropy after a reboot
You need to take care that the device doesn't boot twice in the same RNG state: otherwise an observer can repeat the same sequence of RNG calls after a reset and will know the RNG output the second time round. This is an issue for factory-injected entropy (which by definition is always the same) as well as for entropy derived from sensors (which takes time to accumulate).
If possible, save the RNG state to persistent storage. When the device boots, read the RNG state, apply some transformation to it (e.g. by generating one random word), and save the modified state. After this is done, you can start returning random numbers to applications and system services. That way, the device will boot with a different RNG state each time.
If this is not possible, you ned to be very careful. If your device has factory-injected entropy plus a reliable clock, you can mix the clock value into the RNG state to achieve unicity; however, beware that if your device loses power and the clock restarts from some fixed origin (blinking twelve), you'll be in a repeatable state.
Predictable RNG state after a reset or at the first boot is a common problem with embedded devices (and with servers). For example, a study of RSA public keys showed that many had been generated with insufficient entropy, resulting in many devices generating the same key³.
Statistical RNG
If you can't achieve a cryptographic quality, you can fall back to a less good RNG. You need to be aware that some applications (including a lot of cryptography) will be impossible.
Any RNG relies on a two-part structure: a unique seed (i.e. an entropy source) and a deterministic algorithm based on that seed.
If you can't gather enough entropy, at least gather as much as possible. In particular, make sure that no two devices start from the same state (this can usually be achieved by mixing the serial number into the RNG seed). If at all possible, arrange for the seed not to repeat after a reset.
The only excuse not to use a cryptographic DRBG is if your device doesn't have enough computing power. In that case, you can fall back to faster algorithm that allow observers to guess some numbers based on the RNG's past or future output. The Mersenne twister is a popular choice, but there have been improvements since its invention.
¹ Even this is debatable: with non-crypto-quality random backoff, another device could cause a denial of service by aligning its retransmission time with yours. But there are other ways to cause a DoS, by transmitting more often.
² Technically, it does, but only at an astronomical scale.
³ Or at least with one factor in common, which is just as bad.
One way to do it would be to create a Pseudo Random Bit Sequence, just a train of zeros and ones, and read the bottom bits as a number.
PRBS can be generated by tapping bits off a shift register, doing some logic on them, and using that logic to produce the next bit shifted in. Seed the shift register with any non zero number. There's a math that tells you which bits you need to tap off of to generate a maximum length sequence (i.e., 2^N-1 numbers for an N-bit shift register). There are tables out there for 2-tap, 3-tap, and 4-tap implementations. You can find them if you search on "maximal length shift register sequences" or "linear feedback shift register.
from: http://www.markharvey.info/fpga/lfsr/
HOROWITZ AND HILL gave a great part of a chapter on this. Most of the math surrounds the nature of the PRBS and not the number you generate with the bit sequence. There are some papers out there on the best ways to get a number out of the bit sequence and improving correlation by playing around with masking the bits you use to generate the random number, e.g., Horan and Guinee, Correlation Analysis of Random Number Sequences based on Pseudo Random Binary Sequence Generation, In the Proc. of IEEE ISOC ITW2005 on Coding and Complexity; editor M.J. Dinneen; co-chairs U. Speidel and D. Taylor; pages 82-85
An advantage would be that this can be achieved simply by bitshifting and simple bit logic operations. A one-liner would do it. Another advantage is that the math is pretty well understood. A disadvantage is that this is only pseudorandom, not random. Also, I don't know much about random numbers, and there might be better ways to do this that I simply don't know about.
How much energy you expend on this would depend on how random you need the number to be. If I were running a gambling site, and needed random numbers to generate deals, I wouldn't depend on Pseudo Random Bit Sequences. In those cases, I would probably look into analog noise techniques, maybe Johnson Noise around a big honking resistor or some junction noise on a PN junction, amplify that and sample it. The advantages of that are that if you get it right, you have a pretty good random number. The disadvantages are that sometimes you want a pseudorandom number where you can exactly reproduce a sequence by storing a seed. Also, this uses hardware, which someone must pay for, instead of a line or two of code, which is cheap. It also uses A/D conversion, which is yet another peripheral to use. Lastly, if you do it wrong -- say make a mistake where 60Hz ends up overwhelming your white noise-- you can get a pretty lousy random number.
What are typical means by which a random number can be generated in an embedded system?
Giles indirectly stated this: it depends on the use.
If you are using the generator to drive a simulation, then all you need is a uniform distribution and a linear congruential generator (LCG) will work fine.
If you need a secure generator, then its a trickier problem. I'm side-stepping what it means to be secure, but from 10,000 feet think "wrap it in a cryptographic transformation", like a SHA-1/HMAC or SHA-512/HMAC. There are others ways, like sampling random events, but they may not be viable.
When you need secure random numbers, some low resource devices are notoriously difficult to work with. See, for example, Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices and Traffic sensor flaw that could allow driver tracking fixed. And a caveat for Linux 3.0 kernel users: the kernel removed a couple of entropy sources, so entropy depletion and starvation might have gotten worse. See Appropriate sources of entropy on LWN.
If you have a secure generator, then your problem becomes getting your hands on a good seed (or seeds over time). One of the better methods I have seen for environments that are constrained is Hedging. Hedging was proposed for Virtual Machines where a program could produce the same sequence after a VM reset.
The idea for hedging is to extract the randomness provided by your peer, and use it to keep you secure generator fit. For example, in the case of TLS, there is a client_random and a server_random. If the device is a server, then it would stir in the client_random. If the device is a client, then it would stir in server_random.
You can find the two papers of interest that address hedging at:
When Good Randomness Goes Bad: Virtual Machine Reset
Vulnerabilities and Hedging Deployed Cryptography
When Virtual is Harder than Real: Resource Allocation Challenges in
Virtual Machine Based IT Environments
Using client_random and a server_random is consistent with Peter Guttman's view on the subject: "mix every entropy source you can get your hands on into your PRNG, including less-than-perfect ones". Gutmann is the author of Engineering Security.
Hedging only solves part of the problem. You will still need to solve other problems, like how to bootstrap the entropy pool, how to regenerate system key pairs when the pool is in a bad state, and how persist the entropy across reboots when there's no filesystem.
Although it may not be the most complex or sound method, it can be fun to use external stimuli as your seed for random number generation. Consider using analogue input from a photodiode, or a thermistor. Even random noise from a floating pin could potentially yield some interesting results.
I have to build an One Time Pad system and for that, I have to build my own TRNG. I want to know how to make record atmospheric noise and use that to generate random numbers. I've tried so far to record a .wav file and read it in Java, but the values don't seem very...random. Any suggestions? I know about Random.org, but I can't really use their generators, I have to build my own, so what I want is some insight into how the folks at Random.org have built their numbers generator, with atmospheric noise as a source of 'randomness'.
Non Real-time solution
What you can do is record the audio surrounding the room before in and save a temporary WAV file. If you know how the WAV file works which is based on the RIFF specification. Then strip the WAV header which is 44 bytes in length. Then read the audio bytes and do the proper conversions depending on whether you want to generate WORDS, DWORDS, or BYTES, it is up to you. Then you should have some random values to work with. Then use those random values accordingly.
Real-time solution
Since I do not know whether you want to program this in Java or some other language. In addition, I do not know the intended platform; so I cannot recommend you any realtime audio processing libraries.
For C# you can use NAudio and you can record the audio in realtime and recieve the audio bytes. Then you can convert the audio bytes into either a DWORD, QWORD, WORD, etc. You should be able to have some random values. Remember to stop recording and to release unmanaged resources when generating random numbers has ceased.
Good Resources On The WAV File Specification
Link to the specification (Easy to understand)
The answer is unknown and probably intentionally so. Although hard to be sure, the site seems to be a combination of charity and for-profit work. Each radio source only produces a few Kbps of random data. How he describes it in many links, I don't see evidence of a CSRNG. It doesn't matter. For OTP purposes, if it's not truly random, it's a glorified stream cipher. (I think that's what Bruce and others have always said.)
I find it hard to recall when a good CSRNG was broken. I'd recommend you use something like ISAAC or a properly implemented block/stream cipher. Perfect Paper Passwords does this. Use a Fortuna construction with the internals of Fortuna using the above ciphers/algorithms to produce the majority of the random data. The Fortuna system can regularly have data injected into it by a TRNG. The very best TRNG on a budget is random.org plus locally generated stuff. The best cheap, hardware solution is a VIA Artigo board with VIA Padlock (TRNG + acceleration for SHA-1, SHA256, AES, & RSA) for $300. They have libraries to help you use things, too. (There's even a pseudo-TRNG that uses processor timing under network load.)
Remember, the crypto is usually the strongest link in the chain. System security exists on many levels: processor, firmware, peripheral firmware (esp DMA), kernel mode code, OS, trusted middleware or OS functions, application. Security as a whole includes users, policy, physical security, EMSEC, etc. Anyone worrying way too much about RNG's is usually wasting effort. Just use an accepted solution or something I mentioned above. Then, focus on the rest. Especially, how people and systems interact. Configuration, patching, choice of OS, policies. Most problems happen there.
I recall an article on random.org that I can't seem to find now. I all remember is that they used the lsb of the noise they were measuring. The MSBs will certainly not be random. Then then generated a string of 1s and 0's based on the lsb. Don't do something silly like a simple binary conversion, that won't work. You maybe have to sample the noise in binary, to make the distribution of the lsb have a more uniform sampling.
The trick they used to ensure an even distribution was to not use this string of 1's and 0's as the random numbers. Instead they would parse the string, 2 bits at a time. Every time the bits matched (ie 00 or 11) they added a 1 to their random string. Every times the bits flipped (ie 01 or 10) they added a 0 to their random string.
If you make your own TRNG, make sure you verify it!
It is hardly possible to get real random numbers out of software. Even the static in your wav file is likely to be influenced by periodic EMI generated by your computer and is therefore not purely random.
Can you use special hardware or are you forced to stick to pure software? Why won't pseudo random numbers satisfy your needs? They will do fine on a relatively small number of random samples. Because you want to use the random numbers in an OTP, I guess you won't be using it in a big scale.
Can you provide a little more detail?
The atmospheric noise approach to generating random numbers is complex because the atmosphere is filled with non-random signals, all of which pollute the entropy you seek. There is an easier way.
Chances are good your CPU already contains a true random number generator, assuming you have an Intel Ivy Bridge-based Core/Xeon processor, which became available in April, 2012. (The new Haswell architecture also has this feature).
Intel's random generator exploits the random effects of thermal noise inside an unstable digital circuit. Thermal noise is just random atomic vibrations, which is pretty much the same underlying physical phenomenon that Random.org uses when it samples atmospheric noise. The sampled random bits go through a sophisticated conditioning and testing process to eliminate pollution from non-random signals. I highly recommend this excellent article on IEEE Spectrum which describes the process in detail.
Intel added a new x86 instruction called RDRAND that allows programs to directly retrieve these random numbers. Although Java does not yet support direct access to RDRAND, it's possible using JNI. This is the approach I took with the drnglib project. For example:
DigitalRandom random = new DigitalRandom();
System.out.println(random.nextInt());
The nextInt() method is implemented as a JNI native call that invokes RDRAND. The performance is pretty good considering the quality of randomness. Using eight threads, I've generated ~760 MB/sec of random data.
True random number generators (TRNGs) are usually from natural sources like seismic signals, non-stationary bio-signals, etc. The two issues faced by these generators are:
1) The data points are non-uniformly distributed
2) It takes very long time to generate large sequence of numbers (specially when the requirement is in millions).
However, the most important advantage on their part is their unpredictable nature. To overcome their issues and to retain its advantage, it is better to fuse the output of TRNG to seed a pseudo-random number generator. For this, you may try using the amplitude values of atmospheric noise at random time points and use it to seed a PRNG.
This will help you to get large numbers of uniformly distributed values. As the seed is unpredictable, the output of PRNG too becomes unpredictable.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Given the extremely high requirements for unpredictability to prevent casinos from going bankrupt, what random number generation algorithm and seeding scheme is typically used in devices like slot machines, video poker machines, etc.?
EDIT: Related questions:
Do stateless random number generators exist?
True random number generator
Alternative Entropy Sources
There are many things that the gaming sites have to consider when choosing/implementing an RNG. Without due dilligence, it can go spectacularly wrong.
To get a licence to operate a gaming site in a particular jurisdiction usually requires that the RNG has been certified by an independent third-party. The third-party testers will analyse the source code and run statistical tests (e.g. Diehard) to ensure that the RNG behaves randomly. Reputable poker sites will usually include details of the certification that their RNG has undergone (for example: PokerStars's RNG page).
I've been involved in a few gaming projects, and for one of them I had to design and implement the RNG part, so I had to investigate all of these issues. Most poker sites will use some hardware device for entropy, but they won't rely on just hardware. Usually it will be used in conjunction with a pseudo-RNG (PRNG). There are two main reasons for this. Firstly, the hardware is slow, it can only extract a certain number of bits of entropy in a given time period from whatever physical process it is monitoring. Secondly, hardware fails in unpredictable ways that software PRNGs do not.
Fortuna is the state of the art in terms of cryptographically strong PRNGs. It can be fed entropy from one or more external sources (e.g. a hardware RNG) and is resilient in the face of attempted exploits or RNG hardware failure. It's a decent choice for gaming sites, though some might argue it is overkill.
Pokerroom.com used to just use Java's SecureRandom (they probably still do, but I couldn't find details on their site). This is mostly good enough, but it does suffer from the degrees of freedom problem.
Most stock RNG implementations (e.g. Mersenne Twister) do not have sufficient degrees of freedom to be able to generate every possible shuffle of a 52-card deck from a given initial state (this is something I tried to explain in a previous blog post).
EDIT: I've answered mostly in relation to online poker rooms and casinos, but the same considerations apply to physical video poker and video slots machines in real world casinos.
For a casino gaming applications, I think the seeding of the algorithm is the most important part to make sure all games "booted" up don't run through the same sequence or some small set of predictable sequences. That is, the source of entropy leading to the seed for the starting position is the critical thing. Beyond that, any good quality random number generator where each bit position as has a ~50/50 probability of being 1/0 and the period is relatively long would be sufficient. For example, something like the Mersenne twister PRNG has such properties.
Using cryptographically secure random generators only becomes important when the actual output of the random generator can be viewed directly. For example, if you were monitoring each number actually generated by the number generator - after viewing many numbers in the sequence - with a non-cryptographic generator information about that sequence can lead to establishing information about all the internal state of the generator. At this point, if you know what the algorithm looks like, you would be able to predict future numbers and that would be bad. The cryptographic generator prevents that reverse engineering back to the internal state so that predicting future numbers becomes "impossible".
However, in the case of a casino game, you would (or should) have no visibility to the actual numbers being generated under the hood. Each time a random number is generated - say a 32-bit number - that number will be used then, for example, mod 52 for a deck shuffling algorithm....no where in that process do you have any idea what numbers were being generated by the algorithm to shuffle that deck. That is, most of the bits of "randomness" is just being thrown out and even the ones being used you have no visibility to. Therefore, no way to reverse engineer the state.
Getting back to a true source of entropy to seed the whole process, that is the hard part. See the Wikipedia entry on entropy for some starting points on techniques.
As an aside, if you did want cryptographically sequence random numbers from a "regular" algorithm, a simple approach is to take a few random numbers in sequence, concatenate them together and then run something like MD5 or SHA-1 on them and the result is just as random and also cryptographically secure. That is, you just made your own "secure" random number generator.
We've been using the Protego R210-USB TRNG (and the non-usb version before that) as random seed generators in casino applications, with java.security.SecureRandom
on top. We had The Swedish National Laboratory of Forensic Science perform a separate audit of the R210, and it passed without a flaw.
You probably need a cryptographically secure pseudo-random generator. There are a lot of variants. Google "Blum-Blum-Shub", for example.
The security properties of these pseudo-random generators will generally be that, even when the attacker can observe polynomially many outputs from such generators, it won't be feasible to guess the next output with a probability much better than random guessing. Also, it is not feasible to distinguish the output of such generators from truly random bits. The security holds even when all the algorithms and parameters are known by the attacker (except for the secret seed).
The security of the generators is often measured with respect to a security parameter. In the case of BBS, it is the size of the modulus. This is no different from other crypto stuff. For example, RSA is secure only when the key is long enough.
Note that, the output of such generators may not be uniform (in fact, can be far away from uniform in statistical sense). But since no one can distinguish the two distributions without infinite computing power, these generators will suffice in most applications that require truly random bits.
Bear in mind, however, that these cryptographically secure pseudo-random generators are usually slow. So if speed is indeed a concern, less rigorous approaches may be more relevant, such as using hash functions, as suggested by Jeff.
Casino slot machines generate random numbers continuously at very high speed and use the most recent result(s) when the user pulls the lever (or hits the button) to spin the reels.
Even a simplistic generator can be used. Even if you knew the algorithm used, you cannot observe where in the sequence it is because nearly all the results are discarded. If somehow you did know where it was in the sequence, you'd have to have millisecond or better timing to take advantage of it.
Modern "mechanical reel" machines use PRNGs and drive the reels with stepper motors to simulate the old style spin-and-brake.
http://en.wikipedia.org/wiki/Slot_machine#Random_number_generators
http://entertainment.howstuffworks.com/slot-machine3.htm
Casinos shouldn't be using Pseudo-random number generators, they should be using hardware ones: http://en.wikipedia.org/wiki/Hardware_random_number_generator
I suppose anything goes for apps and offshore gambling nowadays, but all these other answers are incomplete, at least for Nevada Gaming Control Board licensed machines, which I believe the question is originally about.
The technical specifications for RNGs licensed in Nevada for gaming purposes are laid out in Regulation 14.040(2).
As of May 24, 2012, here is a summary of the rules a RNG must follow:
Static seeds cannot be used. You have to seed the RNG using a millisecond time source or other true entropy source which has no external readout anywhere on the machine. (This helps to reduce the incidence of "magic number" attacks)
The RNG must continue generating numbers in its sequence at least 100 times per second when the game is not being played. (This helps to avoid timing attacks)
RNG outputs cannot be reused; they must be used exactly once if at all and then thrown away.
Multi-system cabinets must use a separate RNG and separate seed for each game.
Games that use RNGs for helping to choose numbers on behalf of the player (such as Lotto Quick Pick) must use a separate RNG for that process.
Games must not roll RNGs until they are actually needed in a game. (i.e. you need to wait until the player chooses to deal or spin before generating RNGs)
The RNG must pass a 95% confidence chi-squared test based on 10,000 trials as apart of a system test. It must display a warning if this test fails, and it must disable play if it fails twice in a row.
It must remember, and be able to report on, the last 10 test results as described in 7.
Every possible game outcome must be generatable by the RNG. As a pessimal example, linear congruential generators don't actually generate every possible output in their range, so they're not very useful for gaming.
Additionally, your machine design has to be submitted to the gaming commission and it has to be approved, which is expensive and takes lots of time. There are a few third-party companies that specialize in auditing your new RNG to make sure it's random. Gaming Laboratories publishes an even stricter set of standards than Nevada does. They go into much greater detail about the limitations of hardware RNGs, and Nevada in particular likes to see core RNGs that it's previously approved. This can all get very expensive, which is why many developers prefer to license an existing previously-approved RNG for new game projects.
Here is a fun list of random number generator attacks to keep you up late at night.
For super-nerds only: the source for most USB hardware RNGs is typically an avalanche diode. However, the thermal noise produced by this type of diode is not quantum-random, and it is possible to influence the randomness of an avalanche diode by significantly lowering the temperature.
As a final note, someone above recommended just using a Mersenne Twister for random number generation. This is a Bad Idea unless you are taking additional entropy from some other source. The plain vanilla Mersenne Twister is highly inappropriate for gaming and cryptographic applications, as described by its creator.
If you want to do it properly you have to get physical - ERNIE the UK national savings number picker uses a shot noise in Neon tubes.
I for sure have seen a german gambling machine that was not allowed to be ran commercially after a given date, so I suppose it was a PNRG with a looong one time pad seed list.
Most poker sites use hardware random number generators. They will also modify the output to remove any scaling bias and often use 'pots' of numbers which can be 'stirred' using entropic events (user activity, serer i/o events etc). Quite often the resultant numbers just index pre-generated decks (starting off as a sorted list of cards).
The question posed came about during a 2nd Year Comp Science lecture while discussing the impossibility of generating numbers in a deterministic computational device.
This was the only suggestion which didn't depend on non-commodity-class hardware.
Subsequently nobody would put their reputation on the line to argue definitively for or against it.
Anyone care to make a stand for or against. If so, how about a mention as to a possible implementation?
No.
A malicious machine on your network could use ARP spoofing (or a number of other techniques) to intercept your pings and reply to them after certain periods. They would then not only know what your random numbers are, but they would also control them.
Of course there's still the question of how deterministic your local network is, so it might not be as easy as all that in practice. But since you get no benefit from pinging random IPs on the internet, you might just as well draw entropy from ethernet traffic.
Drawing entropy from devices attached to your machine is a well-studied principle, and the pros and cons of various kinds of devices and methods of measuring can be e.g. stolen from the implementation of /dev/random.
[Edit: as a general principle, when working in the fundamentals of security (and the only practical needs for significant quantities of truly random data are security-related) you MUST assume that a fantastically well-resourced, determined attacker will do everything in their power to break your system.
For practical security, you can assume that nobody wants your PGP key that badly, and settle for a trade-off of security against cost. But when inventing algorithms and techniques, you need to give them the strongest security guarantees that they could ever possibly face. Since I can believe that someone, somewhere, might want someone else's private key badly enough to build this bit of kit to defeat your proposal, I can't accept it as an advance over current best practice. AFAIK /dev/random follows fairly close to best practice for generating truly random data on a cheap home PC]
[Another edit: it has suggested in comments that (1) it is true of any TRNG that the physical process could be influenced, and (2) that security concerns don't apply here anyway.
The answer to (1) is that it's possible on any real hardware to do so much better than ping response times, and gather more entropy faster, that this proposal is a non-solution. In CS terms, it is obvious that you can't generate random numbers on a deterministic machine, which is what provoked the question. But then in CS terms, a machine with an external input stream is non-deterministic by definition, so if we're talking about ping then we aren't talking about deterministic machines. So it makes sense to look at the real inputs that real machines have, and consider them as sources of randomness. No matter what your machine, raw ping times are not high on the list of sources available, so they can be ruled out before worrying about how good the better ones are. Assuming that a network is not subverted is a much bigger (and unnecessary) assumption than assuming that your own hardware is not subverted.
The answer to (2) is philosophical. If you don't mind your random numbers having the property that they can be chosen at whim instead of by chance, then this proposal is OK. But that's not what I understand by the term 'random'. Just because something is inconsistent doesn't mean it's necessarily random.
Finally, to address the implementation details of the proposal as requested: assuming you accept ping times as random, you still can't use the unprocessed ping times as RNG output. You don't know their probability distribution, and they certainly aren't uniformly distributed (which is normally what people want from an RNG).
So, you need to decide how many bits of entropy per ping you are willing to rely on. Entropy is a precisely-defined mathematical property of a random variable which can reasonably be considered a measure of how 'random' it actually is. In practice, you find a lower bound you're happy with. Then hash together a number of inputs, and convert that into a number of bits of output less than or equal to the total relied-upon entropy of the inputs. 'Total' doesn't necessarily mean sum: if the inputs are statistically independent then it is the sum, but this is unlikely to be the case for pings, so part of your entropy estimate will be to account for correlation. The sophisticated big sister of this hashing operation is called an 'entropy collector', and all good OSes have one.
If you're using the data to seed a PRNG, though, and the PRNG can use arbitrarily large seed input, then you don't have to hash because it will do that for you. You still have to estimate entropy if you want to know how 'random' your seed value was - you can use the best PRNG in the world, but its entropy is still limited by the entropy of the seed.]
Random numbers are too important to be left to chance.
Or external influence/manipulation.
Short answer
Using ping timing data by itself would not be truly random, but it can be used as a source of entropy which can then be used to generate truly random data.
Longer version
How random are ping times?
By itself, timing data from network operations (such as ping) would not be uniformly distributed. (And the idea of selecting random hosts is not practical - many will not respond at all, and the differences between hosts can be huge, with gaps between ranges of response time - think satellite connections).
However, while the timing will not be well distributed, there will be some level of randomness in the data. Or to put it another way, a level of information entropy is present. It is a fine idea to feed the timing data into a random number generator to seed it. So what level of entropy is present?
For network timing data of say around 50ms, measured to the nearest 0.1ms, with a spread of values of 2ms, you have about 20 values. Rounding down to the nearest power of 2 (16 = 2^4) you have 4 bits of entropy per timing value. If it is for any kind of secure application (such as generating cryptographic keys) then I would be conservative and say it was only 2 or 3 bits of entropy per reading. (Note that I've done a very rough estimate here, and ignored the possibility of attack).
How to generate truly random data
For true random numbers, you need to send the data into something designed along the lines of /dev/random that will collect the entropy, distributing it within a data store (using some kind of hash function, usually a secure one). At the same time, the entropy estimate is increased. So for a 128 bit AES key, 64 ping timings would be required before the entropy pool had enough entropy.
To be more robust, you could then add timing data from the keyboard and mouse usage, hard disk response times, motherboard sensor data (eg temperature), etc. It increases the rate of entropy collection and makes it hard for an attacker to monitor all sources of entropy. And indeed this is what is done with modern systems. The full list of MS Windows entropy sources is listed in the second comment of this post.
More reading
For discussion of the (computer security) attacks on random number generators, and the design of a cryptographically secure random number generator, you could do worse than read the yarrow paper by Bruce Schneier and John Kelsey. (Yarrow is used by BSD and Mac OS X systems).
No.
Unplug the network cable (or /etc/init.d/networking stop) and the entropy basically drops to zero.
Perform a Denial-Of-Service attack on the machine it's pinging and you also get predictable results (the ping-timeout value)
I guess you could. A couple things to watch out for:
Even if pinging random IP addresses, the first few hops (from you to the first real L3 router in the ISP network) will be the same for every packet. This puts a lower bound on the round trip time, even if you ping something in a datacenter in that first Point of Presence. So you have to be careful about normalizing the timing, there is a lower bound on the round trip.
You'd also have to be careful about traffic shaping in the network. A typical leaky bucket implementation in a router releases N bytes every M microseconds, which effectively perturbs your timing into specific timeslots rather than a continuous range of times. So you might need to discard the low order bits of your timestamp.
However I would disagree with the premise that there are not good sources of entropy in commodity hardware. Many x86 chipsets for the last few years have included random number generators. The ones I am familiar with use relatively sensitive ADCs to measure temperature in two different locations on the die, and subtract them. The low order bits of this temperature differential can be shown (via Chi-squared analysis) to be strongly random. As you increase the processing load on the system the overall temperature goes up, but the differential between two areas of the die remains uncorrelated and unpredictable.
The best source of randomness on commodity hardware I've seen, was a guy who removed a filter or something from his webcam, put opaque glue on the lens, and was then able to easily detect individual white pixels from cosmic rays striking the CCD. These are as close to perfectly random as possible, and are protected from external snooping by quantum effects.
Part of a good random number generator is equal probabilities of all numbers as n -> infinity.
So if you are planning to generate random bytes, then with sufficient data from a good rng, each byte should have an equal probability of being returned. Further, there should be no pattern or predictibiltiy (spikes in probability during certain time periods) of certain numbers being returned.
I am not too sure with using ping what you would be measuring to get the random variable, is it response time? If so, you can be pretty sure that some response times, or ranges of response times, will be more frequent than others and hence would make a potentially insecure random number generator.
If you want commodity hardware, your sound card should pretty much do it. Just turn up the volume on an analog input and you have a cheap white noise source. Cheap randomness without the need for a network.
The approach of measuring something to generate a random seed appears to be a pretty good one. The O'Reilly book Practical Unix and Internet Security gives a few similar additional methods of determining a random seed, such as asking the user to type a few keystrokes, and then measuring the time between keystrokes. (The book notes that this technique is used by PGP as a source of its randomness.)
I wonder if the current temperature of a system's CPU (measured out to many decimal places) could be a viable component of a random seed. This approach would have the advantage of not needing to access the network (so the random generator wouldn't become unavailable when the network connection goes down).
However, it's probably not likely that a CPU's internal sensor could accurately measure the CPU temperature out to enough decimal places to make the value truly viable as a random number seed; at least, not with "commodity-class hardware," as mentioned in the question!
It's not as good as using atmospheric noise but it's still truly random since it depends on the characteristics of the network which is notorious for random non-repeatable behavior.
See Random.org for more on randomness.
Here's an attempt at an implementation:
#ips : list = getIpAddresses();
#rnd = PseudorandomNumberGenerator(0 to (ips.count - 1));
#getTrueRandomNumber() { ping(ips[rnd.nextNumber()]).averageTime }
I would sooner use something like ISAAC as a stronger PRNG before trusting round trip pings as entropy. As others have said, it would just be too easy for someone to not only guess your numbers, but also possibly control them to various degrees.
Other great sources of entropy exist, which others have mentioned. One that was not mentioned (which might not be practical) is sampling noise from the on board audio device.. which is usually going to be a little noisy even if no microphone is connected to it.
I went 9 rounds with trying to come up with a strong (and fast) PRNG for a client/server RPC mechanism I was writing. Both sides had an identical key, consisting of 1024 lines of 32 character ciphers. The client would send AUTH xx, the server would return AUTH yy .. and both sides knew which two lines of the key to use to produce the blowfish secret (+ salt). Server would then send a SHA-256 digest of the entire key (encrypted), client knew it was talking to something that had the correct key .. session continued. Yeah, very weak protection for man in the middle, but a public key was out of the question for how the device was being used.
So, you had a non blocking server that had to handle up to 256 connections .. not only did the PRNG have to be strong, it had to be fast. It wasn't such a hardship to use slower methods to gather entropy in the client, but that could not be afforded in the server.
So, I have to ask regarding your idea .. how practical would it be?
No mathmatical computation can produce a random result but in the "real world" computers don't exactly just crunch numbers... With a little bit of creativity it should be possible to produce random results of the kind where there is no known method of reproducing or predicting exact outcomes.
One of the easiest to implement ideas I've seen which works universally on all systems is to use static from the computers sound card line in/mic port.
Other ideas include thermal noise and low level timing of cache lines. Many modern PCs with TPM chips have encryption quality hardware random number generators already onboard.
My kneejerk reaction to ping (esp if using ICMP) is that your cheating too blatently. At that point you might as well whip out a giger counter and use background radiation as your random source.
Yes, it's possible, but... the devil's in the details.
If you're going to generate a 32-bit integer, you need to gather >32 bits of entropy (and use a sufficient mixing function to get that entropy spread around, but that's known and doable). The big question that is:
how much entropy do ping times have?
The answer to this question depends on all sorts of assumptions about the network and your attack model, and there's different answers in different circumstances.
If attackers are able to totally control ping times, you get 0 bits of entropy per ping, and you can't ever total 32-bits of entropy, no matter how much you mix. If they have less than perfect control over ping times, you'll get some entropy, and (if you don't overestimate the amount of entropy you're gathering) will get perfectly random 32-bit numbers.
YouTube shows a device in action: http://www.youtube.com/watch?v=7n8LNxGbZbs
Random is, if nobody can predict the next state.
Though i cant definitively site for or against, this implementation has its issues.
Where are these IP Addresses coming from, if they are randomly selected, what happens when they do not reply or are late in replying, does that mean the random number will be slower to appear.
Also, even if you would make a visual graph of 100.000 results and calculated that there are no or few correlations between the numbers, does not mean it is truly random. As explained by dilbert :)
It doesn't strike me as a good source of randomness.
What metric would you use -- the obvious one is response time, but the range of values you can reasonably expect is small: a few tens of milliseconds to a few thousand. The response times themselves will follow a bell curve and not be randomly distributed across any interval (how would you choose the interval?) so you would have to try and select a few 'random' bits from the numbers.
The LSB might give you a random bit stream, but you would have to consider clock granularity issues - maybe due to how interrupts work you would always get multiples of 2ms on some systems.
There are probably much better 'interesting' ways of getting random bits -- maybe google for a random word, grab the first page and choose the Nth bit from the page.
Eh, I find that this kind of question leads into discussions about the meaning of 'truly random' pretty quickly.
I think that measuring pings would yield decent-quality random bits, but at an insufficient rate to be of much use (unless you were willing to do some serious DDOSing).
And I don't see that it would be any more random than measuring analogue/mechanical properties of the computer, or the behaviour of the meatbag operating it.
(edit) On a practical note, this approach opens you up to the possibility of someone on your network manipulating your 'random' number generator.
It seems to me that true randomness is ineffable - there is no way to know whether a sequence is random, since by definition it can contain anything no matter how improbable. Guaranteeing a particular distribution pattern reduces the randomness. The word "pattern" is a bit of a giveaway.
I MADE U A RANDOM NUMBER
BUT I EATED IT
Randomness is not a binary property -- it's a value between 0 and 1 that describes how difficult it is to predict the next value in a stream.
Asking "how random can my values be if I base them on pings?" is actually asking "how random are pings?". You can estimate that by gathering a large enough set of data (1 mln pings for example) and mapping their distribution curve and behavior in time. If the distribution is flat and the behavior is difficult to predict, the data seems more random. The more bumpy distribution or predictable behavior suggest lower randomness.
You should also consider the sample resolution. I could imagine the results being rounded in some way to a milisecond, so with pings you could have integer values between 0 and 500. That's not a lot of resolution.
On the practical side, I would recommend against it, since pings can be predicted and manipulated, further reducing their randomness.
Generally, I suggest against "rolling your own" randomness generators, encryption methods and hashing algorithms. As fun as it seems, it's mostly a lot of very intimidating math.
As to how to build a really good entropy generator -- I think that's probably going to have to be a sealed box that outputs some sort of result of interactions on atomic or sub-atomic level. I mean, if you're using a source of entropy that the enemy can easily read too, he only needs to find out your algorythm. Any form of connection is a possible attack vector, so you should place the source of entropy as close to the service that consumes it as possible.
You can use the XKCD method:
I got some code that creates random numbers with traceroute. I also have a program that does it using ping. I did it over a year ago for a class project. All it does is run traceroute on and address and it takes the least sig digit of the ms times. It works pretty well at getting random numbers but I really don't know how close it is to true random.
Here is a list of 8 numbers that I got when I ran it.
455298558263758292242406192
506117668905625112192115962
805206848215780261837105742
095116658289968138760389050
465024754117025737211084163
995116659108459780006127281
814216734206691405380713492
124216749135482109975241865
#include <iostream>
#include <string>
#include <stdio.h>
#include <cstdio>
#include <stdlib.h>
#include <vector>
#include <fstream>
using namespace std;
int main()
{
system("traceroute -w 5 www.google.com >> trace.txt");
string fname = "trace.txt";
ifstream in;
string temp;
vector<string> tracer;
vector<string> numbers;
in.open(fname.c_str());
while(in>>temp)
tracer.push_back(temp);
system("rm trace.txt");
unsigned index = 0;
string a = "ms";
while(index<tracer.size())
{
if(tracer[index]== a)
numbers.push_back(tracer[index-1]);
++index;
}
std::string rand;
for(unsigned i = 0 ; i < numbers.size() ; ++i)
{
std::string temp = numbers[i];
int index = temp.size();
rand += temp[index - 1];
}
cout<<rand<<endl;
return 0;
}
Very simply, since networks obey prescribed rules, the results are not random.
The webcam idea sounds (slightly) reasonable. Linux people often recommend simply using the random noise from a soundcard which has no mic attached.
here is my suggestion :
1- choose a punch of websites that are as far away from your location as possible. e.g. if you are in US try some websites that have their server IPs in malasia , china , russia , India ..etc . servers with high traffic are better.
2- during times of high internet traffic in your country (in my country it is like 7 to 11 pm) ping those websites many many many times ,take each ping result (use only the integer value) and calculate modulus 2 of it ( i.e from each ping operation you get one bit : either 0 or 1).
3- repeat the process for several days ,recording the results.
4- collect all the bits you got from all your pings (probably you will get hundreds of thousands of bits ) and choose from them your bits . (maybe you wanna choose your bits by using some data from the same method mentioned above :) )
BE CAREFUL : in your code you should check for timeout ..etc