Truly random number generator - random

From what I understand, a PRNG uses a seed to generate a sequence of numbers that is not truly random. Would it be possible to create a truly random number generator by reusing a PRNG over and over with a different seed each time it is used? The seed could be extracted from /dev/random, the current time, or a clock tick. If not, is there a truly random number generator implemented in software?
Thanks

If you re-seed the PRNG every time you need a random number you can just cut out the middle man and use the seed directly as random number.
But what you're talking about is done in practice. Those are so-called cryptographically-secure PRNGs and they are employed in many operating systems to provide random numbers for cryptographic applications. They get re-seeded frequently from the entropy pool and are designed so that it is computationally very hard to figure out the next number from knowing past ones (something that's very trivial to do for an LCG, for example) and also to figure out past numbers from the current one.
The benefit of this approach is that you don't block generating random numbers. Entropy in a system is a limited resource and can only come from outside sources, so by using a CSPRNG you can safely stretch it and not compromise security at the same time.
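As an illustration of using such an OS-provided CSPRNG from application code, here is a minimal Python sketch. It assumes the standard-library secrets module is available; the helper names random_key and random_index are made up for this example.

```python
import secrets

def random_key(num_bytes: int = 32) -> bytes:
    """Return num_bytes drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)

def random_index(upper: int) -> int:
    """Return an integer in [0, upper) drawn from the operating system's CSPRNG."""
    return secrets.randbelow(upper)

if __name__ == "__main__":
    # No seed is managed by the application; the OS handles (re)seeding.
    print(random_key().hex(), random_index(100))
```

The application never touches a seed here, which is the point: seeding and re-seeding stay inside the operating system.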

The simple answer is that there is no such implementation because, as far as I know, it's simply not possible. To generate truly random numbers you need an outside source of entropy like a hardware random number generator.

The clock is not very random, but /dev/random has some randomness - it's actually like a bucket of randomness that you can deplete depending on the rate of randomness production and consumption. If you use /dev/random, then you don't have to use an RNG. Seeding an RNG from /dev/random is redundant.

Intel is working on something that could be truly groundbreaking if it works as advertised. It would practically render hardware PRNGs redundant.

Related

Securely Use Random Number Generator for Lottery Winning

I want to design a lottery winning mechanism using random number generator. I know that for computer, there is no true randomness but only "pseudorandom". If the system gets hacked and random seed is seen, people will know the sequence of random numbers. In fact, there is news that people did this and won several lotteries. I am thinking about two ways of designing my system:
1. Use the random number generator as a global variable. There is only one random seed; the sequence is generated when the system starts.
Con:
a. Once the random seed is seen, hackers will know the sequence easily.
b. Once the system crashes and restarts, the sequence will repeat itself.
2. Create a random number generator using the timestamp as the random seed each time a number is generated.
Con:
a. Obviously the timestamp cannot be used directly. Some tricks need to be done with the timestamp each time, for example adding or subtracting some value. What algorithm can I use here to do this kind of modification on the timestamp?
b. Is this method even taking advantage of the random number generator? It seems I am just creating a random number by myself...
As we can see, neither of the methods above is secure enough. Which way is slightly better? Or is there a better way?
The notion that computers are incapable of truly random numbers hasn't been true for decades. All modern desktop and laptop computers have true hardware-based random number generators. Even most small embedded systems do as well.
That said, it may be the case that your programming language hasn't caught up to the recent hardware, or that even if it has, it's easy to make a mistake with RNGs and get a bad result from a good generator. So it's probably a good idea to use something like random.org unless you know what you're doing.
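If the language runtime does expose the operating system's generator, a draw needs no application-managed seed at all, which sidesteps both designs in the question. A minimal Python sketch, assuming a hypothetical 6-of-49 lottery (the function name and parameters are illustrative only):

```python
import secrets

def draw_winning_numbers(pool_size: int = 49, picks: int = 6) -> list:
    """Draw `picks` distinct numbers from 1..pool_size using the OS random source."""
    rng = secrets.SystemRandom()  # backed by os.urandom, ignores any seed
    return sorted(rng.sample(range(1, pool_size + 1), picks))

if __name__ == "__main__":
    print(draw_winning_numbers())
```

Because there is no stored seed, there is nothing for an attacker to read out of the process and replay after a restart.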

What are typical means by which a random number can be generated in an embedded system?

What are typical means by which a random number can be generated in an embedded system? Can you offer advantages and disadvantages for each method, and/or some factors that might make you choose one method over another?
First, you have to ask a fundamental question: do you need unpredictable random numbers?
For example, cryptography requires unpredictable random numbers. That is, nobody must be able to guess what the next random number will be. This precludes any method that seeds a random number generator from common parameters such as the time: you need a proper source of entropy.
Some applications can live with a non-cryptographic-quality random number generator. For example, if you need to communicate over Ethernet, you need a random number generator for the exponential back-off; statistical randomness is enough for this¹.
Unpredictable RNG
You need an unpredictable RNG whenever an adversary might try to guess your random numbers and do something bad based on that guess. For example, if you're going to generate a cryptographic key, or use many other kinds of cryptographic algorithms, you need an unpredictable RNG.
An unpredictable RNG is made of two parts: an entropy source, and a pseudo-random number generator.
Entropy sources
An entropy source kickstarts the unpredictability. Entropy needs to come from an unpredictable source or a blend of unpredictable sources. The sources don't need to be fully unpredictable, they need to not be fully predictable. Entropy quantifies the amount of unpredictability. Estimating entropy is difficult; look for research papers or evaluations from security professionals.
There are three approaches to generating entropy.
Your device may include some non-deterministic hardware. Some devices include a dedicated hardware RNG based on physical phenomena such as unstable oscillators, thermal noise, etc. Some devices have sensors which capture somewhat unpredictable values, such as the low-order bits of light or sound sensors.
Beware that hardware RNGs often have precise usage conditions. Most methods require some time after power-up before their output is truly random. Often environmental factors such as extreme temperatures can affect the randomness. Read the RNG's usage notes very carefully. For cryptographic applications, it is generally recommended to run statistical tests on the HRNG's output and refuse to operate if these tests fail.
Never use a hardware RNG directly. The output is rarely fully unpredictable — e.g. each bit may have a 60% probability of being 1, or the probability of two consecutive bits being equal may be only 48%. Use the hardware RNG to seed a PRNG as explained below.
You can preload a random seed during manufacturing and use that afterwards. Entropy doesn't wear off when you use it²: if you have enough entropy to begin with, you'll have enough entropy during the lifetime of your device. The danger with keeping entropy around is that it must remain confidential: if the entropy pool accidentally leaks, it's toast.
If your device has a connection to a trusted third party (e.g. a server of yours, or a master node in a sensor network), it can download entropy from that (over a secure channel).
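To make the bias example above concrete: if each raw bit is 1 with probability 60%, it carries less than one bit of unpredictability, so more raw bits must be collected than the seed size. The sketch below is a rough estimate only. It assumes independent bits (real sources are often correlated), uses min-entropy as the measure, and uses SHA-256 as a stand-in conditioning step; the function names are hypothetical.

```python
import hashlib
import math

def min_entropy_per_bit(p_one: float) -> float:
    """Min-entropy of one raw bit that is 1 with probability p_one (independent bits assumed)."""
    return -math.log2(max(p_one, 1.0 - p_one))

def raw_bits_needed(target_bits: int, p_one: float) -> int:
    """Raw bits to collect so the total min-entropy reaches target_bits.

    Raises ZeroDivisionError for a fully predictable source (p_one of 0 or 1).
    """
    return math.ceil(target_bits / min_entropy_per_bit(p_one))

def condition(raw_samples: bytes) -> bytes:
    """Compress raw, biased samples into a 32-byte seed candidate."""
    return hashlib.sha256(raw_samples).digest()

if __name__ == "__main__":
    print(min_entropy_per_bit(0.6))   # about 0.737 bits per raw bit
    print(raw_bits_needed(256, 0.6))  # 348 raw bits for a 256-bit seed
```

In practice a generous safety margin on top of this estimate is sensible, since the independence assumption rarely holds exactly.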
Pseudo-random number generator
A PRNG, also called deterministic random bit generator (DRBG), is a deterministic algorithm that generates a sequence of random numbers by transforming an internal state. The state must be seeded with sufficient entropy, after which the PRNG can run practically forever. Cryptographic-quality PRNG algorithms are based on cryptographic primitives; always use a vetted algorithm (preferably some well-audited third-party code if available).
The PRNG needs to be seeded with entropy. You can choose to inject entropy once during manufacturing, or at each boot, or periodically, or any combination.
Entropy after a reboot
You need to take care that the device doesn't boot twice in the same RNG state: otherwise an observer can repeat the same sequence of RNG calls after a reset and will know the RNG output the second time round. This is an issue for factory-injected entropy (which by definition is always the same) as well as for entropy derived from sensors (which takes time to accumulate).
If possible, save the RNG state to persistent storage. When the device boots, read the RNG state, apply some transformation to it (e.g. by generating one random word), and save the modified state. After this is done, you can start returning random numbers to applications and system services. That way, the device will boot with a different RNG state each time.
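The save-and-transform step could look roughly like the following Python sketch. It assumes a writable file at a hypothetical path holds the saved state and uses SHA-256 with two different prefixes to derive the next stored state and the seed for this boot; a real device would use its DRBG's own state update and its own storage API.

```python
import hashlib
import os
import tempfile

def boot_reseed(state_path: str) -> bytes:
    """Read the saved RNG state, store a transformed state, return this boot's seed."""
    with open(state_path, "rb") as f:
        old_state = f.read()
    # Derive two independent-looking values from the old state.
    next_state = hashlib.sha256(b"next-state" + old_state).digest()
    boot_seed = hashlib.sha256(b"boot-seed" + old_state).digest()
    # Persist the new state before any random output is handed out.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(next_state)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, state_path)
    return boot_seed

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rng_state")
    with open(path, "wb") as f:
        f.write(os.urandom(32))  # stands in for factory-injected entropy
    print(boot_reseed(path).hex())
    print(boot_reseed(path).hex())  # a second "boot" yields a different seed
```

Writing the new state before returning matters: if power is lost after random numbers have been used but before the state is saved, the next boot would repeat them.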
If this is not possible, you need to be very careful. If your device has factory-injected entropy plus a reliable clock, you can mix the clock value into the RNG state to achieve unicity; however, beware that if your device loses power and the clock restarts from some fixed origin (blinking twelve), you'll be in a repeatable state.
Predictable RNG state after a reset or at the first boot is a common problem with embedded devices (and with servers). For example, a study of RSA public keys showed that many had been generated with insufficient entropy, resulting in many devices generating the same key³.
Statistical RNG
If you can't achieve a cryptographic quality, you can fall back to a less good RNG. You need to be aware that some applications (including a lot of cryptography) will be impossible.
Any RNG relies on a two-part structure: a unique seed (i.e. an entropy source) and a deterministic algorithm based on that seed.
If you can't gather enough entropy, at least gather as much as possible. In particular, make sure that no two devices start from the same state (this can usually be achieved by mixing the serial number into the RNG seed). If at all possible, arrange for the seed not to repeat after a reset.
The only excuse not to use a cryptographic DRBG is if your device doesn't have enough computing power. In that case, you can fall back to a faster algorithm that allows observers to guess some numbers based on the RNG's past or future output. The Mersenne twister is a popular choice, but there have been improvements since its invention.
¹ Even this is debatable: with non-crypto-quality random backoff, another device could cause a denial of service by aligning its retransmission time with yours. But there are other ways to cause a DoS, by transmitting more often.
² Technically, it does, but only at an astronomical scale.
³ Or at least with one factor in common, which is just as bad.
One way to do it would be to create a Pseudo Random Bit Sequence, just a train of zeros and ones, and read the bottom bits as a number.
PRBS can be generated by tapping bits off a shift register, doing some logic on them, and using that logic to produce the next bit shifted in. Seed the shift register with any non-zero number. There's math that tells you which bits you need to tap off of to generate a maximum length sequence (i.e., 2^N-1 numbers for an N-bit shift register). There are tables out there for 2-tap, 3-tap, and 4-tap implementations. You can find them if you search on "maximal length shift register sequences" or "linear feedback shift register".
from: http://www.markharvey.info/fpga/lfsr/
HOROWITZ AND HILL gave a great part of a chapter on this. Most of the math surrounds the nature of the PRBS and not the number you generate with the bit sequence. There are some papers out there on the best ways to get a number out of the bit sequence and improving correlation by playing around with masking the bits you use to generate the random number, e.g., Horan and Guinee, Correlation Analysis of Random Number Sequences based on Pseudo Random Binary Sequence Generation, In the Proc. of IEEE ISOC ITW2005 on Coding and Complexity; editor M.J. Dinneen; co-chairs U. Speidel and D. Taylor; pages 82-85
An advantage would be that this can be achieved simply by bitshifting and simple bit logic operations. A one-liner would do it. Another advantage is that the math is pretty well understood. A disadvantage is that this is only pseudorandom, not random. Also, I don't know much about random numbers, and there might be better ways to do this that I simply don't know about.
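As a sketch of how little code this takes, here is a 16-bit Fibonacci LFSR in Python. The tap positions (bits 16, 14, 13 and 11) are one commonly tabulated maximal-length choice and the seed 0xACE1 is arbitrary; both are assumptions of this example rather than anything specific to the sources above.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    # XOR the tapped bits to form the bit that gets shifted in at the top.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count steps until the register returns to its (non-zero) seed."""
    state = lfsr16_step(seed)
    count = 1
    while state != seed:
        state = lfsr16_step(state)
        count += 1
    return count

if __name__ == "__main__":
    print(lfsr16_period())  # 65535, i.e. 2^16 - 1
```

A seed of zero never leaves zero, which is why the register must be seeded with a non-zero value.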
How much energy you expend on this would depend on how random you need the number to be. If I were running a gambling site, and needed random numbers to generate deals, I wouldn't depend on Pseudo Random Bit Sequences. In those cases, I would probably look into analog noise techniques, maybe Johnson Noise around a big honking resistor or some junction noise on a PN junction, amplify that and sample it. The advantages of that are that if you get it right, you have a pretty good random number. The disadvantages are that sometimes you want a pseudorandom number where you can exactly reproduce a sequence by storing a seed. Also, this uses hardware, which someone must pay for, instead of a line or two of code, which is cheap. It also uses A/D conversion, which is yet another peripheral to use. Lastly, if you do it wrong -- say make a mistake where 60Hz ends up overwhelming your white noise-- you can get a pretty lousy random number.
What are typical means by which a random number can be generated in an embedded system?
Giles indirectly stated this: it depends on the use.
If you are using the generator to drive a simulation, then all you need is a uniform distribution and a linear congruential generator (LCG) will work fine.
If you need a secure generator, then it's a trickier problem. I'm side-stepping what it means to be secure, but from 10,000 feet think "wrap it in a cryptographic transformation", like a SHA-1/HMAC or SHA-512/HMAC. There are other ways, like sampling random events, but they may not be viable.
When you need secure random numbers, some low resource devices are notoriously difficult to work with. See, for example, Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices and Traffic sensor flaw that could allow driver tracking fixed. And a caveat for Linux 3.0 kernel users: the kernel removed a couple of entropy sources, so entropy depletion and starvation might have gotten worse. See Appropriate sources of entropy on LWN.
If you have a secure generator, then your problem becomes getting your hands on a good seed (or seeds over time). One of the better methods I have seen for environments that are constrained is Hedging. Hedging was proposed for Virtual Machines where a program could produce the same sequence after a VM reset.
The idea for hedging is to extract the randomness provided by your peer, and use it to keep your secure generator fit. For example, in the case of TLS, there is a client_random and a server_random. If the device is a server, then it would stir in the client_random. If the device is a client, then it would stir in server_random.
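The stirring step might be sketched as below. This only illustrates the idea of folding peer-supplied bytes into the generator's state; it is not the construction from the hedging papers, and the function name stir and the "hedge" label are made up for the example.

```python
import hashlib
import hmac

def stir(state: bytes, peer_random: bytes) -> bytes:
    """Fold peer-supplied randomness into the generator state (illustrative only)."""
    # HMAC keyed with the current state: the result depends on both inputs,
    # and a peer who does not know the state cannot choose the new state.
    return hmac.new(state, b"hedge" + peer_random, hashlib.sha256).digest()

if __name__ == "__main__":
    state = bytes(32)                # stands in for a poorly seeded state
    client_random = b"\xaa" * 32     # would come from the TLS handshake
    print(stir(state, client_random).hex())
```

Mixing in peer data can only add unpredictability from the point of view of third parties; it does not help against the peer itself.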
You can find the two papers of interest that address hedging at:
When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography
When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine Based IT Environments
Using client_random and server_random is consistent with Peter Gutmann's view on the subject: "mix every entropy source you can get your hands on into your PRNG, including less-than-perfect ones". Gutmann is the author of Engineering Security.
Hedging only solves part of the problem. You will still need to solve other problems, like how to bootstrap the entropy pool, how to regenerate system key pairs when the pool is in a bad state, and how to persist the entropy across reboots when there's no filesystem.
Although it may not be the most complex or sound method, it can be fun to use external stimuli as your seed for random number generation. Consider using analogue input from a photodiode, or a thermistor. Even random noise from a floating pin could potentially yield some interesting results.

how does random() actually work?

Every language has a random() function or something similar to generate a pseudo-random number. I am wondering what happens underneath to generate these numbers? I am not programming anything that makes this knowledge necessary, just trying to satisfy my own curiosity.
The entire first chapter of Donald Knuth's seminal work Seminumerical Algorithms is taken up with the subject of random number generation. I really don't think an SO answer is going to come close to describing the issues involved. Read the book.
It turns out to be surprisingly easy to get half-way-decent pseudorandom numbers. For decades the gold standard was a remarkably simple algorithm: keep state x, multiply by constant A (32x32 => 64 bits) then add constant B, then return the low 32-bits, which also become the new x. If A and B are chosen carefully this actually works fairly well.
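A minimal Python sketch of that multiply-and-add scheme follows. The constants A and B are the widely published "Numerical Recipes" pair, chosen here purely for illustration; the answer above does not say which constants it has in mind.

```python
class Lcg32:
    """32-bit linear congruential generator: x = (A * x + B) mod 2^32."""

    A = 1664525       # multiplier (assumed, from Numerical Recipes)
    B = 1013904223    # increment (assumed, from Numerical Recipes)

    def __init__(self, seed: int) -> None:
        self.x = seed & 0xFFFFFFFF

    def next(self) -> int:
        # The low 32 bits of the product-plus-constant are both the
        # output and the new state.
        self.x = (self.A * self.x + self.B) & 0xFFFFFFFF
        return self.x

if __name__ == "__main__":
    gen = Lcg32(0)
    print(gen.next(), gen.next())  # 1013904223 1196435762
```

Using a fixed seed, as in the demo, gives the same sequence on every run, which is the repeatability that debugging relies on.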
Pseudorandom numbers need to be repeatable, too, in order to reproduce behavior during debugging. So, seeding the generator (initializing x with, say, the time-of-day) is typically avoided during debugging.
In recent years, and with more compute cycles available to burn, more sophisticated algorithms are available, some of them invented since the publication of the otherwise quite authoritative Seminumerical Algorithms. Operating systems are also starting to provide hardware and network-derived entropy bits for specialized cryptographic purposes.
The Wikipedia page is a good reference.
The actual algorithm used is going to be dependent on the language and the implementation of the language.
random() is a so-called pseudorandom number generator (PRNG). random() is mostly implemented as a linear congruential generator. This is a function of the form X(n+1) = (a*X(n) + c) mod m, where X(n) is the sequence of generated pseudorandom numbers. The generated sequence is easy to guess, so this algorithm can't be used as a cryptographically safe PRNG.
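To show how easy the guessing is, the sketch below takes the special case c = 0 with a prime modulus (the "MINSTD" parameters m = 2^31 - 1 and a = 48271, assumed here for illustration) and recovers the secret multiplier from just two consecutive outputs. It needs Python 3.8 or later for the modular inverse via pow.

```python
M = 2**31 - 1   # prime modulus (MINSTD, assumed for this example)
A = 48271       # multiplier the "attacker" is not supposed to know

def minstd_next(x: int) -> int:
    """One step of X(n+1) = (a * X(n)) mod m, i.e. the LCG with c = 0."""
    return (A * x) % M

def recover_multiplier(x1: int, x2: int, m: int = M) -> int:
    """Solve x2 = a * x1 (mod m) for a, given two consecutive outputs."""
    return (x2 * pow(x1, -1, m)) % m

def predict_next(x1: int, x2: int, m: int = M) -> int:
    """Predict the output that follows x2 without knowing the multiplier."""
    return (recover_multiplier(x1, x2, m) * x2) % m

if __name__ == "__main__":
    x1 = minstd_next(12345)
    x2 = minstd_next(x1)
    print(recover_multiplier(x1, x2), predict_next(x1, x2) == minstd_next(x2))
```

With a non-zero c, one more output and a subtraction are enough to do the same, so the general form is no harder to break.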
Wikipedia:Linear congruential generator
And take a look at the diehard tests for PRNG
PRNG Diehard Tests
To answer your question exactly, the random function is provided by the operating system (usually).
But how the operating system creates these random numbers is a specialized area in computer science. See for example the wiki page posted in the answers above.
One thing you might want to examine is the family of random devices available on some Unix-like OSes like Linux and Mac OSX. For example, on Linux, the kernel gathers entropy from a variety of sources into a pool which it then uses to seed its pseudo-random number generator. The entropy can come from a variety of sources, the most notable being device driver jitter from keypresses, network events, hard disk activity and (most of all) mouse movements. Aside from this, there are other techniques to gather entropy, some of them even implemented totally in hardware. There are two character devices you can get random bytes from and on Linux, they behave in the following way:
/dev/urandom gives you a constant stream of bytes which is very random but not cryptographically safe because it reuses whatever entropy is available in the pool.
/dev/random gives you cryptographically safe random numbers but it won't give you a constant stream as it uses the entropy available in the pool and then blocks while more entropy is collected.
Note that while Mac OSX uses a different method for its PRNG and therefore does not block, my personal benchmarks (done in college) have shown it to be ever-so-slightly less random than the Linux kernel. Certainly good enough, though.
So, in my projects, when I need randomness, I typically go for reading from one of the random devices, at least for the seed for an algorithm in my program.
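In Python, reading the seed this way could look like the following sketch. It uses os.urandom, which asks the operating system's random source for bytes, rather than opening the device file by hand; the helper name seeded_generator is hypothetical, and random.Random is a statistical generator, not one suitable for cryptography.

```python
import os
import random

def seeded_generator(num_seed_bytes: int = 16) -> random.Random:
    """Return a statistical PRNG whose seed comes from the OS random source."""
    seed = int.from_bytes(os.urandom(num_seed_bytes), "big")
    return random.Random(seed)

if __name__ == "__main__":
    gen = seeded_generator()
    print([gen.randint(1, 6) for _ in range(5)])
```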
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),[1] is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.[2]
PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.
Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, and joked that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."[3]
You can check out the wikipedia page for more here

Does any software exist for building entropy pools from user input?

It'd be nice to be able, for some purposes, to bypass any sort of algorithmically generated random numbers in favor of natural input---say, dice rolls. Cryptographic key generation, for instance, strikes me as a situation where little enough random data is needed, and the requirement that the data be truly random is high enough, that this might be a feasible and desirable thing to do.
So what I'd like to know, before I go and get my hands dirty, is this: does any software exist for building an entropy pool directly from random digit input? Note that it's not quite enough to simply convert things from radix r to radix 2; since, for instance, 3 and 2 are relatively prime, it's not entirely straightforward to turn a radix-3 (or radix-6) number into binary digits while holding onto maximal entropy in the original input.
The device /dev/random does exactly this on Linux -- maybe it would be worth looking at the source?
EDIT:
As joeytwiddle says, if sufficient randomness is unavailable, /dev/random will block, waiting for entropy to "build up" by monitoring external devices (e.g. mouse, disk drives). This may or may not be what you want. If you'd prefer never to wait and are satisfied with possibly-lower-quality randomness, use /dev/urandom instead -- it's a non-blocking pseudorandom number generator that injects randomness from /dev/random whenever it is available, making it more random than a plain deterministic PRNG. (See man /dev/urandom for further details.)
This paper suggests various approaches with implementation ideas for both UN*X and Windows.
I'm not sure what you're asking for. "Entropy pool" is just a word for "some random numbers", so you could certainly use dice rolls; simply use them as the seed to a pseudorandom number generator that has the characteristics you want.
You can get physically generated random numbers online from, eg, Lavarnd or Hotbits.
Note that the amount of entropy in the pool doesn't necessarily have to be an integer. This should mostly deal with your prime-factors-other-than-2 issue.
Even if you end up using an implementation that does require integer estimates, you need quite a few dice rolls to generate a crypto key. So you could just demand them in bunches. If the user gives you the results of 10 d6 rolls, and you estimate the entropy as 25 bits, you've only lost 0.08 bits per dice roll. Remember to round down ;-)
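The arithmetic behind that estimate, and one way around the radix problem, can be sketched in Python as follows. Treating a whole batch of rolls as a single base-6 number keeps the fractional bits together until the final rounding; the function names are made up for this example and fair, independent dice are assumed.

```python
import math

def dice_entropy_bits(num_rolls: int, sides: int = 6) -> int:
    """Whole bits of entropy in a batch of fair rolls, rounded down."""
    return math.floor(num_rolls * math.log2(sides))

def rolls_to_int(rolls, sides: int = 6) -> int:
    """Pack a batch of rolls (each 1..sides) into one base-`sides` integer."""
    value = 0
    for r in rolls:
        if not 1 <= r <= sides:
            raise ValueError("roll out of range: %r" % (r,))
        value = value * sides + (r - 1)
    return value

if __name__ == "__main__":
    batch = [3, 1, 6, 2, 5, 5, 4, 1, 2, 6]
    # 10 rolls carry 10 * log2(6), about 25.85 bits; rounding down gives 25.
    print(dice_entropy_bits(len(batch)), rolls_to_int(batch))
```

Larger batches lose proportionally less to rounding: 100 rolls are credited with 258 of their roughly 258.5 bits.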
Btw, I would treat asking the user for TRNG data, rather than drawing it from hardware sources as /dev/random does, to be a fun toy rather than an improvement. It's difficult enough for experts to generate random numbers - you don't want to leave general users at the mercy of their own amateurism. "The generation of random numbers is too important to be left to chance" --Robert Coveyou.
By another way, the authors of BSD argue that since entropy estimation for practical sources on PC hardware isn't all that well understood (being a physics problem, not a math problem), using a PRNG isn't actually that bad an option, provided that it is well-reseeded according to Schneier / Kelsey / Ferguson's Yarrow design. Your dice idea does at least have the advantage over typical sources of entropy for /dev/random, that as long as the user can be trusted to find fair dice and roll them properly, you can confidently put a lower bound on the entropy. It has the disadvantage that an observer with a good pair of binoculars and/or a means of eavesdropping on their keyboard (e.g. by its E/M emissions) can break the whole scheme, so really it all depends on your threat model.

Do stateless random number generators exist?

Is there a difference between generating multiple numbers using a single random number generator (RNG) versus generating one number per generator and discarding it? Do both implementations generate numbers which are equally random? Is there a difference between the normal RNGs and the secure RNGs for this?
I have a web application that is supposed to generate a list of random numbers on behalf of clients. That is, the numbers should appear to be random from each client's point of view. Does this mean I need to retain a separate RNG per client session? Or can I share a single RNG across all sessions? Or can I create and discard an RNG on a per-request basis?
UPDATE: This question is related to Is a subset of a random sequence also random?
A random number generator has a state -- that's actually a necessary feature. The next "random" number is a function of the previous number and the seed/state. The purists call them pseudo-random number generators. The numbers will pass statistical tests for randomness, but aren't -- actually -- random.
The sequence of random values is finite and does repeat.
Think of a random number generator as shuffling a collection of numbers and then dealing them out in a random order. The seed is used to "shuffle" the numbers. Once the seed is set, the sequence of numbers is fixed and very hard to predict. Some seeds will repeat sooner than others.
Most generators have a period that is long enough that no one will notice it repeating. A 48-bit random number generator will produce on the order of 2^48 (roughly 281 trillion) random numbers before it repeats -- with (AFAIK) any 32-bit seed value.
A generator will only generate random-like values when you give it a single seed and let it spew values. If you change seeds, then numbers generated with the new seed value may not appear random when compared with values generated by the previous seed -- all bets are off when you change seeds. So don't.
A sound approach is to have one generator and "deal" the numbers around to your various clients. Don't mess with creating and discarding generators. Don't mess with changing seeds.
Above all, never try to write your own random number generator. The built-in generators in most language libraries are really good. Especially modern ones that use more than 32 bits.
Some Linux distros have a /dev/random and /dev/urandom device. You can read these once to seed your application's random number generator. These have more-or-less random values, but they work by "gathering noise" from random system events. Use them sparingly so there are lots of random events between uses.
I would recommend using a single generator multiple times. As far as I know, all the generators have a state. When you seed a generator, you set its state to something based on the seed. If you keep spawning new ones, it's likely that the seeds you pick will not be as random as the numbers generated by using just one generator.
This is especially true with most generators I've used, which use the current time in milliseconds as a seed.
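The problem with time-based seeds can be seen in a short Python sketch: two generators created within the same millisecond get the same seed and therefore hand out the same number, while a single shared generator does not. The timestamp value is a made-up stand-in for a real clock reading.

```python
import random

def first_value(seed: int) -> float:
    """Create a fresh generator from `seed` and return its first output."""
    return random.Random(seed).random()

if __name__ == "__main__":
    timestamp_ms = 1_700_000_000_000  # hypothetical clock reading shared by two requests

    # Generator-per-request: both requests see the identical "random" number.
    print(first_value(timestamp_ms) == first_value(timestamp_ms))  # True

    # One shared generator: successive requests get different numbers.
    shared = random.Random(timestamp_ms)
    print(shared.random() == shared.random())  # False
```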
Hardware-based, true [1], random number generators are possible, but non-trivial and often have low mean rates. Availability can also be an issue [2]. Googling for "shot noise" or "radioactive decay" in combination with "random number generator" should return some hits.
These systems do not need to maintain state. Probably not what you were looking for.
As noted by others, software systems are only pseudo-random, and must maintain state.
A compromise is to use a hardware based RNG to provide an entropy pool (stored state) which is made available to seed a PRNG. This is done quite explicitly in the linux implementation of /dev/random [3] and /dev/urandom [4].
There is some argument about just how random the default inputs to the /dev/random entropy pool really are.
Footnotes:
[1] modulo any problems with our understanding of physics
[2] because you're waiting for a random process
[3] /dev/random features direct access to the entropy pool seeded from various sources believed to be really or nearly random, and blocks when the entropy is exhausted
[4] /dev/urandom is like /dev/random, but when the entropy is exhausted a cryptographic hash is employed which makes the entropy pool effectively a stateful PRNG
If you create a RNG and generate a single random number from it then discard the RNG, the number generated is only as random as the seed used to start the RNG.
It would be much better to create a single RNG and draw many numbers from it.
As people have already said, it's much better to seed the PRNG once, and reuse it. A secure PRNG is simply one which is suitable for cryptographic applications. The only way re-seeding each time will give reasonably random results is where it comes from a genuinely random "real world" source - ie specialised hardware. Even then, it's possible that the source is biased and it will still be theoretically better to use the same PRNG over.
Normally seeding a new state takes quite a while for a serious PRNG, and making new ones each time won't really help much.
The only case I can think of where you might want more than one PRNG is for different systems, say in a casino game you have one generator for shuffling cards and a separate one to generate comments done by the computer control characters, this way REALLY dedicated users can't guess outcomes based on character behaviors.
A nice solution for seeding is to use this (Random.org) , they supply random numbers generated from the atmospheric noise for free. It could be a better source for seeding than using time.
Edit: In your case, I would definitely use one PRNG per client, if for no other reason than for good programming standards. Anyway, if you share one PRNG among clients, you will still be providing pseudo-random values to each, of a quality equal to your PRNG's quality. So that's a viable option, but it seems like bad programming policy.
It's worth mentioning that Haskell is a language which attempts to entirely eliminate mutable state. In order to reconcile this goal with hard-requirements like IO (which requires some form of mutability), monads must be used to thread state from one calculation to the next. In this way, Haskell implements its pseudo-random number generator. Strictly speaking, generating random numbers is an inherently stateful operation, but Haskell is able to hide this fact by moving the state "mutation" into the bind (>>=) operation.
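The same idea can be imitated in Python by writing the generator as a pure function that takes a state and returns both a value and the next state, so that nothing is mutated behind the caller's back. This is only an analogy, not how Haskell's library is implemented; the generator used is Marsaglia's xorshift32, chosen for brevity, and the helper take is hypothetical.

```python
def xorshift32(state: int):
    """Pure step function: returns (value, next_state) without mutating anything."""
    x = state & 0xFFFFFFFF
    x ^= (x << 13) & 0xFFFFFFFF
    x ^= x >> 17
    x ^= (x << 5) & 0xFFFFFFFF
    return x, x

def take(n: int, state: int):
    """Thread the state through n steps; return the values and the final state."""
    values = []
    for _ in range(n):
        value, state = xorshift32(state)
        values.append(value)
    return values, state

if __name__ == "__main__":
    values, final_state = take(2, 1)
    print(values)  # [270369, 67634689]
```

The state has not disappeared; it has just become an explicit argument and return value, which is what the monadic bind hides in the Haskell version.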
This probably sounds a little abstract, and it doesn't really answer your question completely, but I think it is still applicable. From a theoretical standpoint, it is impossible to work with a RNG without involving state. Regardless, there are techniques which can be used to mitigate this interaction and make it appear as if the entire operation is of a stateless nature.
It's generally better to create a single PRNG and pull multiple values from it. Creating multiple instances means you need to ensure that the seeds for the instances are guaranteed unique, which will require incorporating instance-specific information.
As an aside, there are better "true" Random Number Generators, but they usually require specialized hardware which does things like derive random data from electrical signal variance inside the computer. Unless you're really worried about it, I'd say the Pseudo Random Number Generators built into the language libraries and/or OS are probably sufficient, as long as your seed value is not easily predictable.
The use of a secure PRNG depends on your application. What are the random numbers used for?
If they're something of real value (e.g. anything cryptographically related), you wouldn't want to use anything less.
Secure PRNGs are much slower, and may require libraries for arbitrary-precision arithmetic, primality testing, and so on.
Well, as long as they are seeded differently each time they're created, then no, I don't think there'd be any difference; however, if it depended on something like the time, then they'd probably be non-uniform, due to the biased seed.
