Random number understanding [duplicate] - random

This question already has answers here:
How does a random number generator work?
(9 answers)
how does random() actually work?
(7 answers)
Closed 8 years ago.
I can't understand how could computer make random numbers.
I mean what piece of hardware can do this? and does the computer has only one source to do this and all the programming languages use that?
Thanks in advance.

The short answer is that computers can't easily make truly random numbers. There are a couple ways to generate random numbers, though, some fast but not random, and some slow but true...
Pseudo-Random Generators
Most low-level languages (Namely, C) have built in functionality that allows them to psuedo generate random numbers, but this is not true random number generation. It works by starting with a "seed" value, an initial string of numbers, and then modifying this seed, over and over again, to create a "random" string.
They fall short in that, with the right seed and factors, conditions can be created to force a certain number to be generated. Also, due to the nature of the generation, when graphed, the results will not be evenly distributed. As mentioned by the above answerer, there are things a programmer can do to make it more random, but the method can not be truly random, for the above reasons. An example is the random number generator in most programming languages. It is hard-coded, and is performed in the CPU.
Entropy Generators
Random numbers that work through entropy generation work by measuring a type of entropy (disorder, or, as I have heard it defined, chaos #duffymo has informed me that chaos is not a good synonym. Sorry!) that is presumed to be random. Atmospheric and thermal noise are common things measured. They are generally considered to be "better" than the above choice, as they are, for the most part, closer to true randomness. One issue is that they are slow - numbers can not be generated unless enough entropy is harvested. An example is random.org, an atmospheric noise entropacal random number generator (say that 10 times fast!). It is performed by whatever piece of hardware makes the measurement of entropy.
Quantum Generators
A subset of entropy generators, quantum generators measure quantum factors (factors not used in classical physics), such as the spin of particles to determine a number. A downside is that true quantum generators are expensive. An example is this piece of hardware which uses the path of a photon to determine a number.
Hope this helps!

It can be hardware, but most languages like Java and C# use a software construct best explained by Donald Knuth in his opus "The Art of Computer Programming": linear congruential generator.
As you can imagine, there are problems with these approaches.
There are attempts to improve it (e.g. Mersenne Twister).
There are extensive statistical tests to assess a given random number generation algorithm called the Diehard Tests. (I always picture big vehicles in a snowstorm being cranked in the cold by honking batteries when I hear about those tests.)
I'd be willing to bet that the period on these pseudo random number generators is more than adequate for your applications.
The best way to generate a truly random number is to use a quantum process from nature in hardware.

Related

Is there a somewhat-reliable way to detect that a list of integers came from a common PRNG?

Basically I'm looking for a detective function. I pass it a list of integers (probably between 20 and 100 integers) and it tell me "Yeah, 84% chance this came from a PRNG, I tested it against the main ones that most modern programming languages use", or "No, only 12% chance this came from a well-known PRNG".
If it helps (or hinders), the integers will always be between 1 and 999.
Does this exist?
Unless you are prepared to break new ground in number theory, you would only be able to detect obsolete, badly designed, or poorly seeded PRNGs. Good PRNGs are explicitly designed to prevent what you are trying to do. Random number generation is a critical part of digital cryptography, so a lot of effort goes into producing random numbers that meet all known tests.
There are batteries of tests to profile PRNGs. See for example this NIST page.
As the comments point out, the first two sentences are overstated and are only strictly true for PRNGs that may be used in cryptography. Weaker (i.e. more predictable) PRNGs might be chosen for other domains in order to improve time or space performance.
You can write a battery of tests for a list of candidate generators, but there are a lot of generators, and some have enormous state where adjacent values of a well-seeded generator will reveal nothing useful and you'll have to see wait for a long time before you can get the two data points which will have an informative relationship.
On the plus side; while the list of random number generators that you might encounter is vast, there are telltale signs that will help you identify some classes of simple generators quickly and then you can perform focussed analysis to derive the specific configuration.
Unfortunately even a simple generator like KISS shows that while the generator can be trivially broken when you know its configuration, it can hide its signature from anything that does not know its configuration, leaving you in a situation where you have to individually test for every possible configuration.
There are quality tests like dieharder and TestU01 which will consume many megabytes of data to identify any weakness in a generator; however, these can also identify weaknesses in real RNGs, so they could give a strong false positive.
To consume only a 100 integers you would really need to have a list of generators in mind. For example, to detect LCG used inappropriately, you simply test to see if the bottom three bits cycle through a repeating pattern of 8 values -- but this is by far the easiest case.
If you had a sequence 625 or more 32-bit integers, you could detect with high confidence whether it was from consecutive calls to Mersenne Twister. That is because it leaks state information in the output values.
For an example of how it is done, see this blog entry.
Similar results are in theory possible when you don't have ideal data such as full 32-bit integers, but you would need a longer sequence and the maths gets harder. You would also need to know - or perhaps guess by trying obvious options - how the numbers were being reduced from the larger range to the smaller one.
Similar results are possible from other PRNGs, but generally only the non-cryptographic ones.
In principle you could identify specific PRNG sequences with very high confidence, but even simple barriers such as missing numbers from the strict sequence can make it a lot harder. There will also be many PRNGs that you will not be able to reliably detect, and typically you will either have close to 100% confidence of a match (to a hackable PRNG) or 0% confidence of any match.
Whether or not a PRNG is a hackable (and therefore could be detected by the numbers it emits) is not a general indicator of PRNG quality. Obviously, "hackable" is opposite to a requirement for "secure", so don't consider Mersenne Twister for creating unguessable codes. However, do consider it as a source of randomness for e.g. neural networks, genetic algorithms, monte-carlo simulations and other places where you need a lot of statistically random-looking data.

How would one know if one saw a random number generator?

I have been reading various articles about random numbers and their generators. There are usually 3 important conclusions that I draw from them:
Random numbers are not truly random
Much of the time they have a bias (modulo bias)
Humans are incapable of being random number generators, when they are trying to "act randomly"
So, with the latter-most of these observations in mind, how would we be able to
Tell if a sequence of numbers that we see is truly random, and more importantly
Is there some way we can prove that said sequence is really random?
I'm tempted to say that so long as you generate a sufficiently large enough sample set 1,000,000+, you should see more or less a uniform dispersion of (pseudo)random numbers occur. However, I'm sure some Maths genius has a way of discrediting this, because surely the by laws of probability you could get a run of one number just as likely as any other sequence.
From what I have read, if you really need random numbers its best to try and reuse what cryptographic libraries use. The field of Cryptography is obviously complex and relies on random numbers for key generation. From the section in OWASP's guide titled "Reversible Authentication Tokens" it says this...
The only way to generate secure authentication tokens is to ensure
there is no way to predict their sequence. In other words: true random
numbers.
It could be argued that computers can not generate true random
numbers, but using new techniques such as reading mouse movements and
key strokes to improve entropy has significantly increased the
randomness of random number generators. It is critical that you do not
try to implement this on your own; use of existing, proven
implementations is highly desirable.
Most operating systems include functions to generate random numbers
that can be called from almost any programming language.
My take is that unless you're coding Cryptographic libraries yourself, put trust in those that are (e.g. use Java Cryptography Extension) so you don't have to proove it yourself.
Pretty Simple Test:
If you really want to get into testing random numbers, you could simulate a program that outputs random numbers from 1-100 100 times as an example.
Then look at those numbers and see if there's any patterns. Then follow that test by restarting the program several times and repeating the process.
Examine all data to figure out if random numbers are always random, just random during individual tests, or never. :P
Testing a random number generator is probably mostly up to what you want to look for. Even pure non-repeatability is no guarantee of randomness.
There are some companies that will test a random number generator for the purposes of certification (e.g. online casinos). One that I found quickly is called iTech Labs, though their testing methodology page leaves a lot to be desired in terms of technical detail.
Other testers and certification bodies publish the required data for a certification; there's more specific detail here but not as much as you want.
You could potentially do a statistical analysis and compare the results of your random number generator to a "true" random source but the argument could be made for bias from trying to translate the true random source into your possibility space anyway.
Randomness tests verify the mathematical properties of the sequence. For example entry frequencies (all symbols are expected to have the same frequency), local variance, sequence analysis (the probability of a symbol must not depend on the previous ones).
A definite proof does not exist, but there is a quality factor - the probability of a sequence to really be random.
Another criterion could be based on compressibility: true randomness has maximum entropy and can not therefore be compressed.
This test is not reliable for randomness, of course, but allows quick and dirty testing with ready tools such as zlib.

Recommendation Needed for a PRNG

I'm looking for a Pseudo-Random Number Generation algorithm capable of producing a random 128-/256-bit number. Security and cryptographic integrity are not important; simplicity and performance are valued above all else. Ideally, the algorithm will be usable on modern mobile phone platforms. Can you recommend such an algorithm? Is it feasible? Thanks in advance!
You should try SFMT: SIMD-oriented Fast Mersenne Twister.
This PRNG has been designed to produce 128-bit integers, by taking advantage of vector instructions offered by processors.
For more information about this PRNG, please have a look at another post I answered to by advising SFMT: best pseudo random number generator
For a complete description, see the official page, where you can also download SFMT: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
If simplicity is your top priority, look at the generator in this article. The heart of the generator is just two lines of code. It's not state-of-the-art like Mersenne Twister, but it is simpler and still has good statistical properties.
http://burtleburtle.net/bob/rand/smallprng.html
That is small (128 bits of state) and fast and passes every general purpose statistical test available at this time. Every other PRNG linked to in the responses here so far fails tests rapid - the MWC-based PRNG fail many many tests, while SFMT fails only binary matrix rank / linear complexity type tests.
As others have said, to get 128 bits simply concatenate sequential 32 bit outputs. Do not forcibly extract more bits from a PRNGs state that its normal output function yields - that will generally degrade output quality, sometime by a large amount.

how does random() actually work?

Every language has a random() function or something similar to generate a pseudo-random number. I am wondering what happens underneath to generate these numbers? I am not programming anything that makes this knowledge necessary, just trying to satisfy my own curiosity.
The entire first chapter of Donald Knuth's seminal
work Seminumerical Algorithms is taken up with the subject of random number generation. I really don't think an SO answer is going to come close to describing the issues involved. Read the book.
It turns out to be surprisingly easy to get half-way-decent pseudorandom numbers. For decades the gold standard was a remarkably simple algorithm: keep state x, multiply by constant A (32x32 => 64 bits) then add constant B, then return the low 32-bits, which also become the new x. If A and B are chosen carefully this actually works fairly well.
Pseudorandom numbers need to be repeatable, too, in order to reproduce behavior during debugging. So, seeding the generator (initializing x with, say, the time-of-day) is typically avoided during debugging.
In recent years, and with more compute cycles available to burn, more sophisticated algorithms are available, some of them invented since the publication of the otherwise quite authoritive Seminumerical Algorithms. Operating systems are also starting to provide hardware and network-derived entropy bits for specialized cryptographic purposes.
The Wikipedia page is a good reference.
The actual algorithm used is going to be dependent on the language and the implementation of the language.
random() is a so called pseudorandom number generator (PRNG). random() is mostly implemented as a Linear congruential generator. This is a function of the form X(n+1) (aXn +c) modulo m. Xn is the sequence of generated pseudorandom numbers. The genarated sequence of numbers is easy guessable. This algorithm can't be used as a cryptographically safe PRNG.
Wikipedia:Linear congruential generator
And take a look at the diehard tests for PRNG
PRNG Diehard Tests
To exactly answer you answer, the random function is provided by the operation system (usually).
But how the operating system creates this random numbers is a specialized area in computer science. See for example the wiki page posted in the answers above.
One thing you might want to examine is the family of random devices available on some Unix-like OSes like Linux and Mac OSX. For example, on Linux, the kernel gathers entropy from a variety of sources into a pool which it then uses to seed it's pseudo-random number generator. The entropy can come from a variety of sources, the most notable being device driver jitter from keypresses, network events, hard disk activity and (most of all) mouse movements. Aside from this, there are other techniques to gather entropy, some of them even implemented totally in hardware. There are two character devices you can get random bytes from and on Linux, they behave in the following way:
/dev/urandom gives you a constant stream of bytes which is very random but not cryptographically safe because it reuses whatever entropy is available in the pool.
/dev/random gives you cryptographically safe random numbers but it won't give you a constant stream as it uses the entropy available in the pool and then blocks while more entropy is collected.
Note that while Mac OSX uses a different method for it's PRNG and therefore does not block, my personal benchmarks (done in college) have shown it to be every-so-slightly less random than the Linux kernel. Certainly good enough, though.
So, in my projects, when I need randomness, I typically go for reading from one of the random devices, at least for the seed for an algorithm in my program.
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),1 is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.[2]
PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.
Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, and joked that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."[3]
You can check out the wikipedia page for more here

True random number generator [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Sorry for this not being a "real" question, but Sometime back i remember seeing a post here about randomizing a randomizer randomly to generate truly random numbers, not just pseudo random. I dont see it if i search for it.
Does anybody know about that article?
I have to disagree with a lot of the answers to this question.
It is possible to collect random data on a computer. SSL, SSH and VPNs would not be secure if you couldn't.
The way software random number generator work is there is a pool of random data that is gathered from many different places, such as clock drift, interrupt timings, etc.
The trick to these schemes is in correctly estimating the entropy (the posh name for the randomness). It doesn't matter whether the source is bias, as long as you estimate the entropy correctly.
To illustrate this, the chance of me hitting the letter e in this comment is much higher than that of z , so if I were to use key interrupts as a source of entropy it would be bias - but there is still some randomness to be had in that input. You can't predict exactly which sequence of letters will come next in this paragraph. You can extract entropy from this uncertainty and use it part of a random byte.
Good quality real-random generators like Yarrow have quite sophisticated entropy estimation built in to them and will only emit as many bytes as it can reliably say it has in its "randomness pool."
I believe that was on thedailywtf.com - ie. not something that you want to do.
It is not possible to get a truly random number from pseudorandom numbers, no matter how many times you call randomize().
You can get "true" random numbers from special hardware. You could also collect entropy from mouse movements and things like that.
At the end of the post, I will answer your question of why you might want to use multiple random number generators for "more randomness".
There are philosophical debates about what randomness means. Here, I will mean "indistinguishable in every respect from a uniform(0,1) iid distribution over the samples drawn" I am totally ignoring philosophical questions of what random is.
Knuth volume 2 has an analysis where he attempts to create a random number generator as you suggest, and then analyzes why it fails, and what true random processes are. Volume 2 examines RNGs in detail.
The others recommend you using random physical processes to generate random numbers. However, as we can see in the Espo/vt interaction, these processes can have subtle periodic elements and other non-random elements, in part due to outside factors with deterministic behavior. In general, it is best never to assume randomness, but always to test for it, and you usually can correct for such artifacts if you are aware of them.
It is possible to create an "infinite" stream of bits that appears completely random, deterministically. Unfortunately, such approaches grow in memory with the number of bits asked for (as they would have to, to avoid repeating cycles), so their scope is limited.
In practice, you are almost always better off using a pseudo-random number generator with known properties. The key numbers to look for is the phase-space dimension (which is roughly offset between samples you can still count on being uniformally distributed) and the bit-width (the number of bits in each sample which are uniformally random with respect to each other), and the cycle size (the number of samples you can take before the distribution starts repeating).
However, since random numbers from a given generator are deterministically in a known sequence, your procedure might be exposed by someone searching through the generator and finding an aligning sequence. Therefore, you can likely avoid your distribution being immediately recognized as coming from a particular random number generator if you maintain two generators. From the first, you sample i, and then map this uniformally over one to n, where n is at most the phase dimension. Then, in the second you sample i times, and return the ith result. This will reduce your cycle size to (orginal cycle size/n) in the worst case, but for that cycle will still generate uniform random numbers, and do so in a way that makes the search for alignment exponential in n. It will also reduce the independent phase length. Don't use this method unless you understand what reduced cycle and independent phase lengths mean to your application.
An algorithm for truly random numbers cannot exist as the definition of random numbers is:
Having unpredictable outcomes and, in
the ideal case, all outcomes equally
probable; resulting from such
selection; lacking statistical
correlation.
There are better or worse pseudorandom number generators (PRNGs), i.e. completely predictable sequences of numbers that are difficult to predict without knowing a piece of information, called the seed.
Now, PRNGs for which it is extremely hard to infer the seed are cryptographically secure. You might want to look them up in Google if that is what you seek.
Another way (whether this is truly random or not is a philosophical question) is to use random sources of data. For example, unpredictable physical quantities, such as noise, or measuring radioactive decay.
These are still subject to attacks because they can be independently measured, have biases, and so on. So it's really tricky. This is done with custom hardware, which is usually quite expensive. I have no idea how good /dev/random is, but I would bet it is not good enough for cryptography (most cryptography programs come with their own RNG and Linux also looks for a hardware RNG at start-up).
According to Wikipedia /dev/random, in Unix-like operating systems, is a special file that serves as a true random number generator.
The /dev/random driver gathers environmental noise from various non-deterministic sources including, but not limited to, inter-keyboard timings and inter-interrupt timings that occur within the operating system environment. The noise data is sampled and combined with a CRC-like mixing function into a continuously updating ``entropy-pool''. Random bit strings are obtained by taking a MD5 hash of the contents of this pool. The one-way hash function distills the true random bits from pool data and hides the state of the pool from adversaries.
The /dev/random routine maintains an estimate of true randomness in the pool and decreases it every time random strings are requested for use. When the estimate goes down to zero, the routine locks and waits for the occurrence of non-deterministic events to refresh the pool.
The /dev/random kernel module also provides another interface, /dev/urandom, that does not wait for the entropy-pool to re-charge and returns as many bytes as requested. As a result /dev/urandom is considerably faster at generation compared to /dev/random which is used only when very high quality randomness is desired.
John von Neumann once said something to the effect of "anyone attempting to generate random numbers via algorithmic means is, of course, living in sin."
Not even /dev/random is random, in a mathematician's or a physicist's sense of the word. Not even radioisotope decay measurement is random. (The decay rate is. The measurement isn't. Geiger counters have a small reset time after each detected event, during which time they are unable to detect new events. This leads to subtle biases. There are ways to substantially mitigate this, but not completely eliminate it.)
Stop looking for true randomness. A good pseudorandom number generator is really what you're looking for.
If you believe in a deterministic universe, true randomness doesn't exist. :-) For example, someone has suggested that radioactive decay is truly random, but IMHO, just because scientists haven't yet worked out the pattern, doesn't mean that there isn't a pattern there to be worked out. Usually, when you want "random" numbers, what you need are numbers for encryption that no one else will be able to guess.
The closest you can get to random is to measure something natural that no enemy would also be able to measure. Usually you throw away the most significant bits, from your measurement, leaving numbers with are more likely to be evenly spread. Hard core random number users get special hardware that measures radioactive events, but you can get some randomness from the human using the computer from things like keypress intervals and mouse movements, and if the computer doesn't have direct users, from CPU temperature sensors, and from network traffic. You could also use things like web cams and microphones connected to sound cards, but I don't know if anyone does.
To summarize some of what has been said, our working definition of what a secure source of randomness is is similar to our definition of cryptographically secure: it appears random if smart folks have looked at it and weren't able to show that it isn't completely unpredictable.
There is no system for generating random numbers which couldn't conceivably be predicted, just as there is no cryptographic cipher that couldn't conceivably be cracked. The trusted solutions used for important work are merely those which have proven to be difficult to defeat so far. If anyone tells you otherwise, they're selling you something.
Cleverness is rarely rewarded in cryptography. Go with tried and true solutions.
A computer usually has many readily available physical sources of random noise:
Microphone (hopefully in a noisy place)
Compressed video from a webcam (pointed to something variable, like a lava lamp or a street)
Keyboard & mouse timing
Network packet content and timing (the whole world contributes)
And sometimes
Clock drift based hardware
Geiger counters and other detectors of rare events
All sorts of sensors attached to A/D converters
What's difficult is estimating the entropy of these sources, which is in most cases low despite the high data rates and very variable; but entropy can be estimated with conservative assumptions, or at least not wasted, to feed systems like Yarrow or Fortuna.
It's not possible to obtain 'true' random numbers, a computer is a logical construct that can't possibly create 'truly' random anything, only pseudo-random. There are better and worse pseudo-random algorithms out there, however.
In order to obtain a 'truly' random number you need a physical random source, some gambling machines actually have these built in - often it's a radioactive source, the radioactive decay (which as far as I know is truly random) is used to generate the numbers.
One of the best method to generate a random number is through Clock Drift. This primarily works with two oscillators.
An analogy of how this works is imagine a race car on a simple oval circuit with a while line at the start of the lap and also a while line on one of the tyres. When the car completes a lap, a number will be generated based on the difference between the position of the white line on the road and on the tyre.
Very easy to generate and impossible to predict.

Resources