Generating true random numbers

Generating true random numbers - random

In my cryptography lessons I learned that one should use as many parameters together as possible in order to have the entropy to generate a close to perfect random number. My question is: Let's say I would just measure the temperature of room at a given moment (e.g. 19,6573°C), would the number not be already random enough? I mean an attacker could not possibly guess my room temperature and he could also not measure it afterwards, because he would need to go back in time.
Since cryptographic algorithms would turn my temperature from 19,6573 to a string of 512 characters or more and an attacker would not be able to reproduce the measurement, my random number should be random enough, shouldn't it?

Related

How to get truly random data, not random data fed into a PRNG seed like CSRNG's do?

From what I understand, a CSRNG like RNGCryptoServiceProvider still passes the truly random user data like mouse movement, etc through a PRNG to sort of sanitize the output and make it equal distribution. The bits need to be completely independent.
(this is for a theoretical infinite computing power attacker)
If the CSRNG takes 1KB of true random data and expands it to 1MB, all the attacker has to do is generate every combination of 1KB of data, expand it, and see which 1MB of data generates a one-time pad that returns sensible english output. I read somewhere that if the one-time pad had a PRNG anywhere in the RNG, it is just a glorified stream cipher. I was wondering if the truly random starting data was in large enough numbers to just use instead of cryptographically expanding. I need truly random output for a one-time pad, not just a cryptographically secure RNG. Or perhaps if there were other ways to somehow get truly random data, so that all bits are independent of each other. I was thinking of XOR'ing with the mouse coordinates for a few seconds, then perhaps the last digits of the Environment.TickCount, then maybe getting microphone input (1, 2, 3, 4) as well. However, as some point out on stackoverflow, I should really just let the OS handle it all. Unfortunately that isn't possible since there is an PSRNG used. I would like to avoid a hardware solution, since this is meant to be an easy to use program, and also not utilize RDRAND since it ALSO uses a PRNG (unless RDRAND can return the truly random data before it goes through a PRNG??). Would appreciate any responses if such a thing is even possible; I've been working on this for weeks under the impression that RNGCryptoServiceProvider was sufficient for a one time pad. Thanks.
(Side note: some say for most crypto functions you don't need true entropy, just unpredictability. for a one-time pad, it MUST be random otherwise it is not a one time pad.)

As you know, "truly random" means each of the bits is independent of everything else as well as uniformly distributed. However, this ideal is hard, if not impossible, to achieve in practice. In general, the closest way to get "truly random data" in practice is to gather hard-to-guess bits from nondeterministic sources, then condense those bits into a random block of data.
There are many issues involved with getting this close to "truly random data", including the following:
The sources must be nondeterministic, that is, their output cannot be determined by their inputs. Examples of nondeterministic sources include timings of input devices; thermal noise; and the noise registered by microphone and camera outputs.
The sources' output must be hard to guess. This is more formally known as entropy, such as 32 bits of entropy per 64 bits of output. However, measuring entropy is far from trivial. If you need 1 MB (8 million bits) of truly random data, you need to have data with at least 8 million bits of entropy (which in practice will be many times more than 1 MB long depending on the sources), then condense the data somehow into 1 MB of data while preserving that entropy.
The sources must be independent of each other.
There should be two or more independent sources. This is because it's impossible to extract full randomness from just one source (see McInnes and Pinkas 1990). On the other hand, extracting randomness from three or more independent sources is relatively trivial, but there is still a matter of choosing an appropriate randomness extractor, and a survey of randomness extractors would be beyond the scope of this answer.
In general, for random number generation purposes, the more sources available, the better.
REFERENCES:
McInnes, J. L., & Pinkas, B. (1990, August). On the impossibility of private key cryptography with weakly random keys. In Conference on the Theory and Application of Cryptography (pp. 421-435).

How can a pseudorandom number generator possibly be non-repeating?

My understanding is that PRNG's work by using an input seed and an algorithm that converts it to a very unrelated output, so that the next generated number is as unpredictable as possible. But here's the problem I see with it:
Any pseudorandom number generator that I can imagine has to have a finite number of outcomes. Let's say that I'm using a random number generator that can generate any number between 0 and one hundred billion. If I call for an output one hundred billion and one times, I can be certain that one number has been output more than once. If the same seed will always give the same output when put through an algorithm, then I can be sure that the PRNG will begin a loop. Where is my logic flawed here?
In the case that I am correct, if you know the algorithm for a PRNG, and that PRNG is being used for cryptography, can not this approach be used (and are there any measures in place to prevent it?):
Use the PRNG to generate the entire looping set of numbers possible.
Know the timestamp of when a private key was generated, and know the time AND -output of the PRNG later on
Based on how long it takes to calculate, determine how many numbers are between the known output and the unknown one
Lookup in the pre-generated list to find the generated number

You are absolutely right that in theory that approach can be used to break a PRNG, since, as you noted, given a sufficiently long sequence of outputs, you can start to predict what comes next.
The issue is that "sufficiently long" might be so long that this approach is completely impractical. For example, the Mersenne twister PRNG, which isn't designed for cryptographic use, has a period of 219,937 - 1, which is so long that it's completely infeasible to attempt the attack that you're describing.
Generally speaking, imagine that a pseudorandom generator uses n bits of internal storage. That gives 2n possible internal configurations of those bits, meaning that you may need to see 2n + 1 outputs before you're guaranteed to see a repeat. Given that most cryptographically secure PRNGs use at least 256 bits of internal storage, this makes your attack infeasible.
One detail worth noting is that there's a difference between "the PRNG repeats a number" and "from that point forward the numbers will always be the same." It's possible that a PRNG will repeat an output multiple times before moving on to output a different number next, provided that the internal state is different each time.

You are correct, a PRNG produces a long sequence of numbers and then repeats. For ordinary use this is usually sufficient. Not for cryptographic use, as you point out.
For ideal cryptographic numbers, we need to use a true RNG (TRNG), which generates random numbers from some source of entropy (= randomness in this context). Such a source may be a small piece of radioactive material on a card, thermal noise in a disconnected microphone circuit or other possibilities. A mixture of many different sources will be more resistant to attacks.
Commonly such sources of entropy do not produce enough random numbers to be used directly. That is where PRNGs are used to 'stretch' the real entropy to produce more pseudo random numbers from the smaller amount of entropy provided by the TRNG. The entropy is used to seed the PRNG and the PRNG produces more numbers based on that seed. The amount of stretching allowed is limited, so the attacker never gets a long enough string of pseudo-random numbers to do any worthwhile analysis. After the limit is reached, the PRNG must be reseeded from the TRNG.
Also, the PRNG should be reseeded anyway after every data request, no matter how small. There are various cryptographic primitives that can help with this, such as hashes. For example, after every data request, a further 128 bits of data could be generated, XOR'ed with any accumulated entropy available, hashed and the resulting hash output used to reseed the generator.
Cryptographic RNGs are slower than ordinary PRNGs because they use slow cryptographic primitives and because they take extra precautions against attacks.
For an example of a CSPRNG see Fortuona

It's possible to create truly random number generators on a PC because they are undeterministic machines.
Indeed, with the complexity of the hierarchical memory levels, the intricacy of the CPU pipelines, the coexistence of innumerable processes and threads activated at arbitrary moments and competing for resources, and the asynchronism of the I/O devices, there is no predictable relation between the number of operations performed and the elapsed time.
So looking at the system time every now and then is a perfect source randomness.

Any pseudorandom number generator that I can imagine has to have a finite number of outcomes.
I don't see why that's true. Why can't it have gradually increasing state, failing when it runs out of memory?
Here's a trivial PRNG algorithm that never repeats:
1) Seed with any amount of data unknown to an attacker as the seed.
2) Compute the SHA512 hash of the data.
3) Output the first 256 bits of that hash.
4) Append the last byte of that hash to the data.
5) Go to step 2.
And, for practical purposes, this doesn't matter. With just 128 bits of state, you can generate a PRNG that won't repeat for 340282366920938463463374607431768211456 outputs. If you pull a billion outputs a second for a billion years, you won't get through a billionth of them.

Public source of randomness

I want to set up "public lottery", in which everyone can see the selection is random and fair. If I only needed one bit, I would use, for example, the LSB of the closing Dow Jones index for that day. The problem is, I need 32 bits. I need a source that is:
available daily
visible to the public throughout the world
not manipulable (by me or anyone else)
unbiased
simple
I suppose I could just pick 32 stocks or stock-indices and use the LSB of each, that would be at least difficult to manipulate, and run them through some hash to eliminate any bias toward 0, but that doesn't really qualify as "simple". Other thoughts: some feed of meteorological or seismological data. That would be more difficult to manipulate (much easier to buy a share of stock than to cause an earthquake) but harder to authenticate (since there aren't armies of auditors watching weather data).
Any suggestions?

Check out http://www.random.org/ They have a section for Third-Party Draw Service
The Third-Party Draw Service is useful for people who operate raffles,
sweepstakes, promotional giveaways and other lottery type services
professionally. In a similar fashion to a certified official,
RANDOM.ORG acts as an unbiased third party who conducts the drawings
in a manner that is guaranteed to be fair and truly random. The
drawings are made using true randomness that comes from atmospheric
noise, which for many purposes is better than the pseudo-random number
algorithms typically used in computer programs.
Check out the Public Records for details about recent drawings held
with the service.
This sounds like what you are looking for, but you would end up having to rely on random.org for the numbers.

The part "visible to the public throughout the world" is the trickiest part in my opinion.
An excellent source of really random numbers is the noise on a webcam (or any other CCD camera). This noise is caused by quantum fluctuation of electron temperature on the CCD plate, so it's truly random.
You could use a picture from a publicly available webcam, but it's hard to find one with a closed shutter... You could set one up and make it available yourself, or you could use one that monitors some meteorological event and subtract a time-averaged image every day.
I hope this is simple enough!

Look at the XKCD GeoHashing algorithm.
MD5(Date, Dow Jones Opening)
Depends how "simple" you want.

I would take a large set of unrelated inputs. You could include some or all of these:
Stock prices (preferably from multiple locations, e.g. Last digit of Dow Jones + last digit of FTSE)
Last digit of the reading from a publicly-visible digital thermometer (easy to find in large cities)
The date
MD5 sum of the current google.com logo image
Name of top-billed guest on today's episode of <insert name of TV talk show here>
Other public lotteries
Concatenate all of these into one large string and apply a cryptographic hash function to it.
The hash will not increase the total entropy, but what it will do is make the output harder to manipulate (because the attacker would need to manipulate many inputs simultaneously.)
Now just take the first 32 bits of the hash.

Separate the non deterministic from the random use a third party service that streams random number sets with a sn assigned to each set.
you set up the number of bits and the number of digits in sn.
Now it streams in random sets with assigned sn in a loop the size of your sn. Save it and you get a batch set of numbers that you put out for public record
Now you can chose a smaller number that doesn't need to be random, just non deterministic to pick the single set of numbers

Why are some random() functions deemed "not secure?"

I've heard people being warned all over the place not to rely on a language's random() function to generate a random number or string sequence "for security reasons." Java even has a SecureRandom class. Why is this?

When people talk about predicting the output of a random number generator, they don't even need to get the actual "next number". Even something subtle like noticing that the random numbers aren't evenly distributed, or that they never produce the same number twice in a row, or that "bit 5 is always set", can go a long way towards turning an attack based on guessing a "random" number from taking years, to taking days.
There is a tradeoff, generally, too. Without specific hardware to do it, generating large quantities of random numbers quickly can be really hard, since there isn't enough "randomness" available to the computer so it has to fake it.
If you're not using the randomness for security (cryptography, passwords, etc), but instead for things like simulations or numerical work, then it doesn't matter too much if they're predictable, only that they're statistically random.

Almost every random number generator is 'pseudo random' in that it uses a table of random numbers or a predictable formula. A seed is sometimes used to "start" the random sequence at a specific point, e.g. seedRandom(timer).
This was especially prevalent in the days of BAsIC programming, because it's random number generator always started at exactly the same sequence of numbers, making it unusable for any kind of GUID generation.
Back in the day, the Z-80 microprocessor had a truly random number generator, although it was only a number between 0 and 127. It used a processor cycle function and was unpredictable.

Pseudo-random numbers that can be determined in advance can lead to security holes that are vulnerable to a random number generator attack.

Predictability of a random number is a big issue. Most "random" functions derive their value from time. Given the right set of conditions you could end up with two "random" numbers of a large value that are the same.
In windows .NET world CPRNG (Cryptographically secure pseudo random number generator) can be found in System.Security.Cryptography.RandomNumberGenerator through underlying win32 APIs
In Linux there is a random "device"

What is the difference between a non-secure random number generator and a secure random number generator?

As the title says: What is the difference between a non-secure random number generator and a secure random number generator?

No computationally feasible algorithm should:
recover the seed, or
predict the "next bit"
for a secure random number generator.
Example: a linear feedback shift register produces lots of random numbers out there, but given enough output, the seed can be discovered and all subsequent numbers predicted.

A secure random number should not be predictable even given the list of previously generated random numbers. You'd typically use it for a key to an encryption routine, so you wouldn't want it guessable or predictable. Of course, guessable depends on the context, but you should assume the attacker knows all the things you know and might use to produce your random number.
There are various web sites that generate secure random numbers, one trusted one is hotbits. If you are only doing the random number generation as a one off activity, why not use a lottery draw result, since it's provably random. Of course, don't tell anyone which lottery and which draw, and put those numbers through a suitable mangle to get the range you want.

With just a "random number" one usually means a pseudo random number. Because it's a pseudo random number it can be (easily) predicted by an attacker.
A secure random number is a random number from a truly random data source, ie. involving an entropy pool of some sorts.

Agree with Purfiedeas. There is also nice article about that, called Cheat Online Poker

A random number would probably mean a pseudo random number returned by an algorithm using a 'seed'.
A secure random number would be a true random number returned from a device such as a caesium based random number generator (which uses the decay rate of the caesium to return numbers). This is naturally occurring and can't be predicted.

It probably depends on the context, but when you are comparing them like this, I'd say "random number" is a pseduo random number and a "secure random number" is truly random. The former gives you a number based on a seed and an algorithm, the other on some inherintly random function.

It's like the difference between AES and ROT13.
To be less flippant, there is generally a tradeoff when generating random numbers between how hard it is and how predictable the next one in the sequence is once you've seen a few. A random number returned by your language's built-in rand() will usually be of the cheap, predictable variety.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio