Why are some random() functions deemed "not secure?" - random

I've heard people being warned all over the place not to rely on a language's random() function to generate a random number or string sequence "for security reasons." Java even has a SecureRandom class. Why is this?

When people talk about predicting the output of a random number generator, they don't even need to get the actual "next number". Even something subtle like noticing that the random numbers aren't evenly distributed, or that they never produce the same number twice in a row, or that "bit 5 is always set", can go a long way towards turning an attack based on guessing a "random" number from taking years, to taking days.
There is a tradeoff, generally, too. Without specific hardware to do it, generating large quantities of random numbers quickly can be really hard, since there isn't enough "randomness" available to the computer so it has to fake it.
If you're not using the randomness for security (cryptography, passwords, etc), but instead for things like simulations or numerical work, then it doesn't matter too much if they're predictable, only that they're statistically random.

Almost every random number generator is 'pseudo random' in that it uses a table of random numbers or a predictable formula. A seed is sometimes used to "start" the random sequence at a specific point, e.g. seedRandom(timer).
This was especially prevalent in the days of BAsIC programming, because it's random number generator always started at exactly the same sequence of numbers, making it unusable for any kind of GUID generation.
Back in the day, the Z-80 microprocessor had a truly random number generator, although it was only a number between 0 and 127. It used a processor cycle function and was unpredictable.

Pseudo-random numbers that can be determined in advance can lead to security holes that are vulnerable to a random number generator attack.

Predictability of a random number is a big issue. Most "random" functions derive their value from time. Given the right set of conditions you could end up with two "random" numbers of a large value that are the same.
In windows .NET world CPRNG (Cryptographically secure pseudo random number generator) can be found in System.Security.Cryptography.RandomNumberGenerator through underlying win32 APIs
In Linux there is a random "device"

Related

How can a pseudorandom number generator possibly be non-repeating?

My understanding is that PRNG's work by using an input seed and an algorithm that converts it to a very unrelated output, so that the next generated number is as unpredictable as possible. But here's the problem I see with it:
Any pseudorandom number generator that I can imagine has to have a finite number of outcomes. Let's say that I'm using a random number generator that can generate any number between 0 and one hundred billion. If I call for an output one hundred billion and one times, I can be certain that one number has been output more than once. If the same seed will always give the same output when put through an algorithm, then I can be sure that the PRNG will begin a loop. Where is my logic flawed here?
In the case that I am correct, if you know the algorithm for a PRNG, and that PRNG is being used for cryptography, can not this approach be used (and are there any measures in place to prevent it?):
Use the PRNG to generate the entire looping set of numbers possible.
Know the timestamp of when a private key was generated, and know the time AND -output of the PRNG later on
Based on how long it takes to calculate, determine how many numbers are between the known output and the unknown one
Lookup in the pre-generated list to find the generated number
You are absolutely right that in theory that approach can be used to break a PRNG, since, as you noted, given a sufficiently long sequence of outputs, you can start to predict what comes next.
The issue is that "sufficiently long" might be so long that this approach is completely impractical. For example, the Mersenne twister PRNG, which isn't designed for cryptographic use, has a period of 219,937 - 1, which is so long that it's completely infeasible to attempt the attack that you're describing.
Generally speaking, imagine that a pseudorandom generator uses n bits of internal storage. That gives 2n possible internal configurations of those bits, meaning that you may need to see 2n + 1 outputs before you're guaranteed to see a repeat. Given that most cryptographically secure PRNGs use at least 256 bits of internal storage, this makes your attack infeasible.
One detail worth noting is that there's a difference between "the PRNG repeats a number" and "from that point forward the numbers will always be the same." It's possible that a PRNG will repeat an output multiple times before moving on to output a different number next, provided that the internal state is different each time.
You are correct, a PRNG produces a long sequence of numbers and then repeats. For ordinary use this is usually sufficient. Not for cryptographic use, as you point out.
For ideal cryptographic numbers, we need to use a true RNG (TRNG), which generates random numbers from some source of entropy (= randomness in this context). Such a source may be a small piece of radioactive material on a card, thermal noise in a disconnected microphone circuit or other possibilities. A mixture of many different sources will be more resistant to attacks.
Commonly such sources of entropy do not produce enough random numbers to be used directly. That is where PRNGs are used to 'stretch' the real entropy to produce more pseudo random numbers from the smaller amount of entropy provided by the TRNG. The entropy is used to seed the PRNG and the PRNG produces more numbers based on that seed. The amount of stretching allowed is limited, so the attacker never gets a long enough string of pseudo-random numbers to do any worthwhile analysis. After the limit is reached, the PRNG must be reseeded from the TRNG.
Also, the PRNG should be reseeded anyway after every data request, no matter how small. There are various cryptographic primitives that can help with this, such as hashes. For example, after every data request, a further 128 bits of data could be generated, XOR'ed with any accumulated entropy available, hashed and the resulting hash output used to reseed the generator.
Cryptographic RNGs are slower than ordinary PRNGs because they use slow cryptographic primitives and because they take extra precautions against attacks.
For an example of a CSPRNG see Fortuona
It's possible to create truly random number generators on a PC because they are undeterministic machines.
Indeed, with the complexity of the hierarchical memory levels, the intricacy of the CPU pipelines, the coexistence of innumerable processes and threads activated at arbitrary moments and competing for resources, and the asynchronism of the I/O devices, there is no predictable relation between the number of operations performed and the elapsed time.
So looking at the system time every now and then is a perfect source randomness.
Any pseudorandom number generator that I can imagine has to have a finite number of outcomes.
I don't see why that's true. Why can't it have gradually increasing state, failing when it runs out of memory?
Here's a trivial PRNG algorithm that never repeats:
1) Seed with any amount of data unknown to an attacker as the seed.
2) Compute the SHA512 hash of the data.
3) Output the first 256 bits of that hash.
4) Append the last byte of that hash to the data.
5) Go to step 2.
And, for practical purposes, this doesn't matter. With just 128 bits of state, you can generate a PRNG that won't repeat for 340282366920938463463374607431768211456 outputs. If you pull a billion outputs a second for a billion years, you won't get through a billionth of them.

How can we know if a sequence is pseudo random or true random?

Given a sequence (ex 1 4 3 5 3 6 .....) and its range (ex 1-10 ), knowing that it is generated from a "Random Generator".
How to know whether that "Random Generator" is pseudo random or true random (Assuming the sequence is infinite).
Obviously, you can't. For one thing -- the only thing you can actually observe is a finite sequence of numbers. Every possible observed sequence will have a non-zero probability of occurring even if the sequence is genuinely random. You can observe 20 tails in a row and that is completely consistent with tossing a genuinely fair coin. Conversely, any finite sequence, no matter how random it looks, can be generated by a deterministic process.
Having said that, there are various statistical tests (most famously the Diehard tests developed by George Marsaglia) which can be applied to a sequence. They can't certify a sequence as random or pseudorandom with certainty, but poorly designed pseudorandom number generators will do poorly on these tests. On the other hand, if a sequence does well on these tests then it will be more or less impossible (without knowing the source of the numbers) to tell if it came from a pseudorandom number generator or a genuinely random source. The entire point of 50+ years of research is to ensure that the answer to your question is effectively "No - you can't tell".
To add to the well written answer of John, I would like to add a few remarks.
First, I do believe that out of every "Random Generator", as you name them, none of the random number that you'll get are truly random. Even further, we do not know a way to procude a true "random" sequence of number. The only true random that you can obtain comes from quantic particles as they are not determinist and can be considered at some extent as random. When you have a website or a program that gives you a random number, it comes from a determinist method, which could theoretically be deduced, if we knew all of the initial conditions. Some of the more "random" algorithms, for example, use the variation of the atmosthere as a way to produce a seemingly random result (see this random generator for example). And yet, if we could get all of the parameters used on an instant T, you could theoretically "guess a random number".
What you can do though, if you do not recognise a pattern in your data, is to do a statistical analysis of your data. As John said, there are numerous methods to recognise a correlation between your random values, and you could get some informations about your data. You could use tools on many mathematical programming tools (Matlab, Maple for example ...) to try to analyse your data. But, in the end, you might never be able to tell with a full certainty the veracity of your results.
So, just like John said, NO, you can't.

How do you make an algorithm for a Random Number Generator?

According to a teacher of mine in-order to do this you make two arrays with numbers, possessing several decimals. One positive array and one negative.
Array 1 [0] = e.g 1.5739
Array 2 [0] = e.g -5.31729
Then you find current time
201305220957 or May 22,2013 at 9:57 AM
And use this equation:
(201305211647*1.5739)--5.31729
-Then you use absolute value and round to 1.0 decimal place and you have your number
Is it true that in most generators the value is dependent on time?
Bottom line up front - Generating random numbers is really hard to do right, and has badly burned some seriously smart people (John Von Neumann for one). Normal people shouldn't try to create their own RNG algorithms. It requires expertise in number theory, probability & statistics, and numerical computation. Unless you qualify in all three fields, you're much better off using algorithms developed by people who are. If you want to know how to do it right you can find lots of good info at http://en.wikipedia.org/wiki/Random_number_generation and http://en.wikipedia.org/wiki/Pseudorandom_number_generator.
Speaking bluntly, your teacher is totally clueless about this topic.
If you need to generate random numbers for encryption, or statistical purposes - you need to get a well studied generator. One I like is the Mersenne Twister, that has very nice statistical properties, is fast to run and easy to code.
If you just need a reasonably random generator - for example to make things appear random in a game - you can use the classic "Linear Congruent Generator" which is trivial to write and produces pretty random looking output. ( Not safe for heavy duty computation ).
The LCG generator:
int seed = 0x333; // chose any number.
int random() { seed = ( seed * 69069 ) + 1; return seed; }
There are several numbers you can use instead of 69069. But don't pick your own. Chose one from here if you don't like 69069.
http://en.wikipedia.org/wiki/Linear_congruential_generator
The wikipedia has a lot of great pages devoted to random number generation (RNG). One page is devoted to just listing the various types of random number generators used throughout history. One of the earliest and weakest is known as the Middle Square method - easy to implement in programming and suitable for many low level tasks. Some computers have a linear feedback shift register (LFSR) built into the circuitry for random number generation, but its not very advanced. One of the more modern generators (not the most modern), and which is considered cryptographically secure (in the sense of how unpredictable it is), is known as the Mersenne Twister.
Properly, these are pseudorandom number generators (PRNG), because they arent truly random. They arent truly random because computers are deterministic machines (state machines); no predetermined algorithm can be programmed to generate truly random numbers from a known prior state.
That said, the invention of true random number generator (TRNG) hardware circuitry (typically analog) does exist, and are approached in different ways. From checking ambient conditions such as temperature and pressure, to phenomenon a more nuanced and bit more subject to conditions atomic/quantum, such as what state a multistable circuit with feedback settles in. Most modern personal computers do not use this and, if they do, probably only use it to find the seed value for PRNGs. Then, you also have RNG programs that check through an internet connection, to find random number servers online, most of which use TRNGs. You can even rely on lookup tables from real world phenomenon that have been documented; lookup tables is very old school.
Pseudorandom number generators only have the appearance of randomness, namely, they follow a particular distribution and the ability to predict future values from prior ones is not easy. There are a set of diehard tests which have the sole purpose of testing the quality of a random number generator. In general, though, all a PRNG is, is an algorithm that produces a sequence of integers. Ideally, we are looking for an algorithm that produces a sequence of numbers which are suitably unpredictable (this is the hardest part) from prior terms in the sequence, while also following a particular distribution (uniform, typically), meaning that every value in a range is produced in equal proportion.
In general, its a trivial task to convert one distribution into another, regardless if its TRNG or PRNG. Uniform discrete integer distributions (which is what PRNGs generate) can easily be extended or compressed to span any arbitrary interval of integers using a variety of scaling techniques that preserve uniformity, or converted into uniform floating points distributions by randomly choosing a large integer and scaling it down into a float, etc.
Uniform floating point numbers can easily be converted to any other non-uniform distribution such as the Normal, Chi-Square, exponential, etc., using a variety of methods, such as Inverse Transform sampling, Rejection sampling, or simple algebraic relationships on distributions (e.g. the Chi-Square is simply the sum of the squares of independent normal distributions). Additionally some distributions can be suitably approximated using simple mathematical functions applied to a uniform float.
Ultimately the hardest part and the heart of most studies in the subject is in the generation of those uniformly distributed integers, that form the basis of all other distributions.
There is nothing fundamentally wrong with using clock time to get an initial seed value, for example. Id favor using the lower three or four significant digits of the time expressed in microseconds, however, for something suitably unpredictable. Or using the LFSR in a computer for the same purpose, of getting a more advanced algorithm jump started.

True random vs. Pseudo Random (can you pseudo-random true-randomness)

Ok, so this question involves a bit of a forward. Bear with me.
There's this website random.org (and others like it) that claim to use some sort of quantum process or another to produce true random numbers.
If one were to query this site over and over and develop a massive log of true random numbers. This log is then rearranged by a program to mix it up as randomly as it can. Is the resulting output less random than when it started? By how much?
Any good/cheap further reading on the subject?
Reordering random numbers by a fixed permutation does not change the degree of randomness.
So if you have a perfect random number source, the same bits reshuffled will be equally random. This will be true if whether the "shuffle" is a fixed reordering (e.g. reversing all the bits) or a shuffle generated by a pseudo-random number generator (which is really a very obfuscated way of defining a fixed re-ordering from some initial seed).
This is provable from the underlying maths - if you reorder a set of truly independent identically distributed random variables then the resulting distribution will be the same as the one that you started with. Hence it is equally random.
However, this does not work if the shuffling is dependent on the values of the random bits in some way. If, for example, you sort the bits rather than permuting them then you won't have very good random output :-).
It would depend on how you reorder them. If you used pseudo random function to do it the results will likely be less random. If you use the true random to reorder itself it will not be more random.
One thing that people forget is the reason to use pseudo random function over some true random numbers is repeatedly and testing. If you get some unexpected results using pseudo random function will make looking at the possible problem easer.
If you have a process that needs N 'random' numbers, you can take N from the site, and use them, IN THAT ORDER, and all will be well. If you reshuffle them, you will make them less random.
If you need an ongoing supply of random numbers, then the question is the relative quality of some pseudo-random juggle of these versus what would happen if you had a true random sequence.
Since, however, linux and windows both supply real random numbers by harnessing hardware entropy, why not just use those?

What is the difference between a non-secure random number generator and a secure random number generator?

As the title says: What is the difference between a non-secure random number generator and a secure random number generator?
No computationally feasible algorithm should:
recover the seed, or
predict the "next bit"
for a secure random number generator.
Example: a linear feedback shift register produces lots of random numbers out there, but given enough output, the seed can be discovered and all subsequent numbers predicted.
A secure random number should not be predictable even given the list of previously generated random numbers. You'd typically use it for a key to an encryption routine, so you wouldn't want it guessable or predictable. Of course, guessable depends on the context, but you should assume the attacker knows all the things you know and might use to produce your random number.
There are various web sites that generate secure random numbers, one trusted one is hotbits. If you are only doing the random number generation as a one off activity, why not use a lottery draw result, since it's provably random. Of course, don't tell anyone which lottery and which draw, and put those numbers through a suitable mangle to get the range you want.
With just a "random number" one usually means a pseudo random number. Because it's a pseudo random number it can be (easily) predicted by an attacker.
A secure random number is a random number from a truly random data source, ie. involving an entropy pool of some sorts.
Agree with Purfiedeas. There is also nice article about that, called Cheat Online Poker
A random number would probably mean a pseudo random number returned by an algorithm using a 'seed'.
A secure random number would be a true random number returned from a device such as a caesium based random number generator (which uses the decay rate of the caesium to return numbers). This is naturally occurring and can't be predicted.
It probably depends on the context, but when you are comparing them like this, I'd say "random number" is a pseduo random number and a "secure random number" is truly random. The former gives you a number based on a seed and an algorithm, the other on some inherintly random function.
It's like the difference between AES and ROT13.
To be less flippant, there is generally a tradeoff when generating random numbers between how hard it is and how predictable the next one in the sequence is once you've seen a few. A random number returned by your language's built-in rand() will usually be of the cheap, predictable variety.

Resources