Input File Format for NIST Test for Randomness

I wish to check the randomness of 32-bit numbers generated by a random number generator. I have 1000 numbers of 32 bits each.
How should I create my ASCII file?
What value should I give for the bitstream length in the .\assess <bitstream> command? Is it 1000*32, 1000, or 32?
Reference:
https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-22r1a.pdf

The individual tests in the NIST suite (there are 15 of them) each take a bit sequence and compute a P-value for that sequence, which is used to decide randomness. Different tests have different lower limits for the sequence length (n): the minimum n ranges from 100 to 1,000,000 across the tests in the suite.
So to run any test at all, you need at least 100 bits in your sequence. You also need multiple sequences per test to decide randomness reliably (e.g., 100 or 1000 sequences of n bits each). Each sequence is tested individually, and the proportion of sequences that passed each test is reported at the end.
Here is an example. I have 100,000,000 (100M) bits randomly generated from some source. I run the NIST test dividing them into 100 sequences, each with 1,000,000 bits. Therefore, the NIST command in my case would be:
./assess 1000000
Later the program will prompt for the number of sequences (bitstreams), and I will enter 100 there.
You can also use different lengths for different tests by running them separately.
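For the 32-bit values in the question, a minimal Python sketch of preparing the input file is shown below. It assumes the suite's ASCII input option, which reads a file of '0' and '1' characters, and it writes each value most-significant bit first; that bit order is an assumption, so keep it consistent with how your generator's output should be read. The helper names are hypothetical.

```python
def ints_to_bitstring(values, width=32):
    """Concatenate each value's fixed-width binary form, most significant bit first."""
    parts = []
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        parts.append(format(v, f"0{width}b"))
    return "".join(parts)


def write_ascii_bits(path, values, width=32):
    """Write the '0'/'1' text file and return the total number of bits written."""
    bits = ints_to_bitstring(values, width)
    with open(path, "w", encoding="ascii") as f:
        f.write(bits)
    return len(bits)


def max_bitstreams(total_bits, length):
    """How many non-overlapping sequences of `length` bits the file can supply."""
    return total_bits // length
```

With 1000 values of 32 bits you have 32,000 bits in total. The argument to ./assess is the length of each sequence, so ./assess 1000 would allow at most max_bitstreams(32000, 1000) == 32 sequences at the prompt, and a length of 1,000,000 is out of reach with this much data.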

Related

Looking for a pseudo random number generation algorithm with specific properties

I'm looking for a pseudo random number generator which has the following properties:
Non-repeating: The returned numbers must be unique until every number from 0 to n has been returned once; only then may each number repeat once more, and so on.
Deterministic: If I used the same seed twice it needs to result in the same sequence.
Few allocations: It should not need to allocate a large block of memory and shuffle it, as generating a full permutation would.
My goal is that I could initialize the random number generator with some seed value and then continuously call its function to generate the next number in the sequence, possibly passing it the previous one.
One possible method is a block cipher. Encrypt the numbers 0, 1, 2, ... with a given key: because encryption under a fixed key is a permutation of the block space, the outputs are guaranteed unique and will only repeat once the counter has passed all 2^b values of a b-bit block. Each key generates a different permutation. You just need to keep track of the key and the last number you encrypted.
DES uses a 64-bit block and AES uses a 128-bit block. If those sizes don't suit you, look at format-preserving encryption for an appropriately sized block.
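If neither block size fits a range 0 to n-1, one way to get a smaller keyed permutation is a Feistel network combined with cycle walking (re-encrypting until the value falls back inside the range), which is the idea behind format-preserving encryption. The Python below is a minimal toy sketch of that idea, using SHA-256 from hashlib as the round function. It is not a standardized FPE mode such as NIST's FF1 and has not been analyzed for security; the class name and parameters are hypothetical.

```python
import hashlib


class FeistelPermutation:
    """Toy keyed bijection on range(n): Feistel network plus cycle walking."""

    def __init__(self, key: bytes, n: int, rounds: int = 4):
        if n < 1:
            raise ValueError("n must be positive")
        self.key, self.n, self.rounds = key, n, rounds
        bits = max(2, (n - 1).bit_length())
        bits += bits % 2                      # even bit count so it splits into two halves
        self.half = bits // 2
        self.mask = (1 << self.half) - 1
        self.nbytes = (self.half + 7) // 8

    def _round(self, r: int, value: int) -> int:
        data = self.key + bytes([r]) + value.to_bytes(self.nbytes, "big")
        return int.from_bytes(hashlib.sha256(data).digest(), "big") & self.mask

    def _encrypt_block(self, x: int) -> int:
        left, right = x >> self.half, x & self.mask
        for r in range(self.rounds):          # each Feistel round is invertible
            left, right = right, left ^ self._round(r, right)
        return (left << self.half) | right

    def __call__(self, i: int) -> int:
        if not 0 <= i < self.n:
            raise ValueError("index out of range")
        x = self._encrypt_block(i)
        while x >= self.n:                    # cycle walking: stay inside range(n)
            x = self._encrypt_block(x)
        return x


perm = FeistelPermutation(b"my key", 1000)
first_ten = [perm(i) for i in range(10)]     # next value = perm(counter)
```

To get the stream described in the question, store only the key and a counter and return perm(0), perm(1), and so on. After n outputs every value in range(n) has appeared exactly once, and a new pass can start with the same or a new key.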
One point to note, a non-repeating generator is not random. As more numbers are generated the pool of unused numbers shrinks, until the last number is fully determined. You need to consider if this is important in your application.

How can a pseudorandom number generator possibly be non-repeating?

My understanding is that PRNGs work by taking an input seed and an algorithm that converts it into a seemingly unrelated output, so that the next generated number is as unpredictable as possible. But here's the problem I see with it:
Any pseudorandom number generator that I can imagine has to have a finite number of outcomes. Let's say that I'm using a random number generator that can generate any number from 0 up to (but not including) one hundred billion. If I call for an output one hundred billion and one times, I can be certain that one number has been output more than once. If the same seed always gives the same output when put through the algorithm, then I can be sure that the PRNG will eventually enter a loop. Where is my logic flawed here?
In the case that I am correct, if you know the algorithm for a PRNG and that PRNG is being used for cryptography, couldn't the following approach be used (and are there any measures in place to prevent it)?
1) Use the PRNG to generate the entire looping set of possible numbers.
2) Know the timestamp of when a private key was generated, and know the time and the output of a later PRNG call.
3) Based on how long each call takes, determine how many numbers lie between the known output and the unknown one.
4) Look up the generated number in the pre-generated list.
You are absolutely right that in theory that approach can be used to break a PRNG, since, as you noted, given a sufficiently long sequence of outputs, you can start to predict what comes next.
The issue is that "sufficiently long" might be so long that this approach is completely impractical. For example, the Mersenne Twister PRNG, which isn't designed for cryptographic use, has a period of 2^19937 - 1, which is so long that the attack you're describing is completely infeasible.
Generally speaking, imagine that a pseudorandom generator uses n bits of internal storage. That gives 2^n possible internal configurations of those bits, meaning that you may need to see 2^n + 1 outputs before you're guaranteed to see the internal state repeat. Given that most cryptographically secure PRNGs use at least 256 bits of internal storage, this makes your attack infeasible.
One detail worth noting is that there's a difference between "the PRNG repeats a number" and "from that point forward the numbers will always be the same." It's possible that a PRNG will repeat an output multiple times before moving on to output a different number next, provided that the internal state is different each time.
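The pigeonhole argument and the difference between a repeated output and a repeated state can both be seen on a toy generator. The Python sketch below uses a hypothetical 16-bit linear congruential generator (constants chosen only because they satisfy the Hull-Dobell full-period conditions) that outputs the top 8 bits of its state: some output must repeat within 257 draws, while the state itself only comes back after all 2^16 = 65,536 steps.

```python
def lcg16_step(state: int) -> int:
    """Toy LCG with 16 bits of state; full period by the Hull-Dobell conditions."""
    return (25173 * state + 13849) & 0xFFFF


def lcg16_output(state: int) -> int:
    """Expose only the top 8 bits of the state as the output."""
    return state >> 8


def state_period(seed: int) -> int:
    """Steps until the internal state repeats (at most 2**16 for 16 bits of state)."""
    seen = {}
    state, step = seed, 0
    while state not in seen:
        seen[state] = step
        state = lcg16_step(state)
        step += 1
    return step - seen[state]


def first_repeated_output(seed: int) -> int:
    """Index of the first output value that already appeared earlier."""
    seen_outputs = set()
    state, index = seed, 0
    while True:
        state = lcg16_step(state)
        out = lcg16_output(state)
        if out in seen_outputs:
            return index
        seen_outputs.add(out)
        index += 1
```

Outputs start repeating long before the sequence as a whole loops, because the same 8-bit output can come from many different 16-bit states.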
You are correct, a PRNG produces a long sequence of numbers and then repeats. For ordinary use this is usually sufficient. Not for cryptographic use, as you point out.
For ideal cryptographic numbers, we need to use a true RNG (TRNG), which generates random numbers from some source of entropy (= randomness in this context). Such a source may be a small piece of radioactive material on a card, thermal noise in a disconnected microphone circuit, or something else. A mixture of many different sources will be more resistant to attacks.
Commonly such sources of entropy do not produce enough random numbers to be used directly. That is where PRNGs are used to 'stretch' the real entropy to produce more pseudo random numbers from the smaller amount of entropy provided by the TRNG. The entropy is used to seed the PRNG and the PRNG produces more numbers based on that seed. The amount of stretching allowed is limited, so the attacker never gets a long enough string of pseudo-random numbers to do any worthwhile analysis. After the limit is reached, the PRNG must be reseeded from the TRNG.
Also, the PRNG should be reseeded anyway after every data request, no matter how small. There are various cryptographic primitives that can help with this, such as hashes. For example, after every data request, a further 128 bits of data could be generated, XOR'ed with any accumulated entropy available, hashed and the resulting hash output used to reseed the generator.
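A minimal Python sketch of the reseeding scheme in the previous paragraph, assuming SHA-256 as the hash and a 128-bit entropy pool. The class and method names are hypothetical, and this illustrates the described steps only; it is not Fortuna or any audited CSPRNG. After each read it generates a further 128 bits that are never returned, XORs them with the accumulated entropy, and reseeds from the hash of the result.

```python
import hashlib


class ReseedingHashGenerator:
    """Toy hash-based generator that reseeds itself after every request."""

    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(seed).digest()
        self.pool = bytes(16)  # accumulated entropy, folded down to 128 bits

    def add_entropy(self, data: bytes) -> None:
        """Fold new entropy (e.g. from a TRNG) into the 128-bit pool."""
        self.pool = hashlib.sha256(self.pool + data).digest()[:16]

    def _block(self, counter: int) -> bytes:
        return hashlib.sha256(self.state + counter.to_bytes(8, "big")).digest()

    def read(self, nbytes: int) -> bytes:
        out = bytearray()
        counter = 0
        while len(out) < nbytes:
            out += self._block(counter)
            counter += 1
        extra = self._block(counter)[:16]     # a further 128 bits, never returned
        mixed = bytes(a ^ b for a, b in zip(extra, self.pool))
        self.state = hashlib.sha256(mixed).digest()  # reseed from the hash
        self.pool = bytes(16)
        return bytes(out[:nbytes])


gen = ReseedingHashGenerator(b"example seed")
first = gen.read(16)
second = gen.read(16)
```

Because the state is replaced after every call, the output of one request gives an attacker no direct handle on the state used for the next one.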
Cryptographic RNGs are slower than ordinary PRNGs because they use slow cryptographic primitives and because they take extra precautions against attacks.
For an example of a CSPRNG, see Fortuna.
It's possible to create truly random number generators on a PC because PCs are nondeterministic machines.
Indeed, with the complexity of the hierarchical memory levels, the intricacy of the CPU pipelines, the coexistence of innumerable processes and threads activated at arbitrary moments and competing for resources, and the asynchronism of the I/O devices, there is no predictable relation between the number of operations performed and the elapsed time.
So looking at the system time every now and then is a perfect source of randomness.
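The idea can be illustrated with a small Python sketch that samples time.perf_counter_ns around a tiny busy loop, keeps the low 8 bits of each timing delta, and hashes them with SHA-256 to condense whatever variation they contain. How much real entropy those deltas carry depends on the machine and is not measured here; the function name and sample counts are arbitrary choices for illustration.

```python
import hashlib
import time


def timing_jitter_bytes(samples: int = 4096) -> bytes:
    """Hash the low bits of timer deltas measured around a tiny busy loop."""
    collected = bytearray()
    last = time.perf_counter_ns()
    for _ in range(samples):
        x = 0
        for i in range(50):                      # small amount of work of varying duration
            x += i * i
        now = time.perf_counter_ns()
        collected.append((now - last) & 0xFF)    # keep only the low 8 bits of each delta
        last = now
    return hashlib.sha256(collected).digest()   # condense to 32 bytes
```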
"Any pseudorandom number generator that I can imagine has to have a finite number of outcomes."
I don't see why that's true. Why can't it have gradually increasing state, failing when it runs out of memory?
Here's a trivial PRNG algorithm that never repeats:
1) Use as the seed any amount of data unknown to an attacker.
2) Compute the SHA512 hash of the data.
3) Output the first 256 bits of that hash.
4) Append the last byte of that hash to the data.
5) Go to step 2.
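The five steps translate almost line for line into a Python generator function. The data passed to SHA-512 gets one byte longer per output, so the internal state can never return to an earlier value; memory use grows by one byte per output, which is the "failing when it runs out of memory" limit. The function name and example seed are hypothetical.

```python
import hashlib
from itertools import islice


def growing_state_prng(seed: bytes):
    """Yield 32-byte outputs; the hashed data grows by one byte per output."""
    data = bytearray(seed)                     # step 1: seed unknown to an attacker
    while True:
        digest = hashlib.sha512(data).digest() # step 2: SHA-512 of the data
        yield digest[:32]                      # step 3: first 256 bits
        data.append(digest[-1])                # step 4: append the last byte
        # step 5: loop back to step 2


outputs = list(islice(growing_state_prng(b"secret seed"), 3))
```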
And for practical purposes, this doesn't matter. With just 128 bits of state, you can build a PRNG that won't repeat for 2^128 = 340282366920938463463374607431768211456 outputs. If you pull a billion outputs a second for a billion years, you won't get through a billionth of them.
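The closing estimate can be checked with a few lines of Python arithmetic, assuming a 365.25-day year: a billion outputs per second for a billion years is about 3.2 * 10^25 outputs, roughly 9 * 10^-14 of 2^128 and far below a billionth.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60     # Julian year

period = 2 ** 128                             # outputs before a 128-bit state must repeat
drawn = 10**9 * 10**9 * SECONDS_PER_YEAR      # a billion per second for a billion years
fraction = drawn / period

print(f"{drawn:.3g} outputs drawn, {fraction:.2g} of the period")
```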

Need an Algorithm to Generate Serial Numbers

I want to generate 16-digit hexadecimal serial numbers like: F204-8BE2-17A2-CFF3.
(This pattern gives 16^16 distinct serial numbers, but I don't need all of them.)
I'd like suggestions for an algorithm that generates these serial numbers randomly with one special property:
any two serial numbers differ in at least 6 digits
(that is, even the two most similar serial numbers must still differ at 6 positions).
I know that an algorithm guaranteeing this would need to remember previously generated serial numbers, and I don't want that.
In fact, I need an algorithm that minimizes the probability of a chosen pair colliding (less than 0.001 seems sufficient).
PS:
I just tried generating 10K strings randomly using MD5 hashes, and they produced similar strings (similar = more than 3 same digits) with probability 0.00018.
It is possible to construct a correct generator without having to remember all previously generated codes. You can generate serial numbers that differ in at least 6 positions by encoding them with an error-correcting code. The classic Hamming code only guarantees a minimum distance of 3, so a distance of 6 calls for a stronger code (for example a BCH or Reed-Solomon code); such a code can be designed to keep any two distinct generated values a chosen distance apart. Obviously, the greater the distance, the more redundancy you will need, resulting in a more complex code and longer numbers.
First, design such a code to your liking, one that encodes a number into a sequence of hexadecimal digits. Then feed it any sequence of distinct numbers (a simple counter, or even the prime numbers). You only need to remember which number was used last and encode the next one.
That being said, if you don't need to strictly guarantee a minimal distance between two serials and would settle for a small error rate, I would expect any half-decent hash function or cipher to produce well spaced-out outputs. So the first thing I would try is to take MD5 or SHA hashes and test-drive them on the numbers 1 to 1000. I expect the results to be quite satisfactory.
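Because a plain Hamming code cannot reach distance 6, the Python sketch below swaps in a different code that does: a Reed-Solomon-style evaluation code over GF(16). Each hex digit is a field element; an 11-digit input number becomes the coefficients of a polynomial of degree at most 10, and the serial is that polynomial evaluated at all 16 field elements. Two distinct such polynomials can agree on at most 10 points (their difference has at most 10 roots), so any two serials differ in at least 16 - 10 = 6 digits, with room for 16^11 serials. The reduction polynomial x^4 + x + 1 and the function names are choices made for this illustration.

```python
K = 11  # payload hex digits; minimum distance = 16 - K + 1 = 6


def gf16_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(16), reducing by x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


def encode_serial(number: int) -> str:
    """Encode an 11-hex-digit number as a 16-digit serial (evaluation form)."""
    if not 0 <= number < 16 ** K:
        raise ValueError("number must fit in 11 hex digits")
    coeffs = [(number >> (4 * i)) & 0xF for i in range(K)]  # degree < K polynomial
    digits = []
    for x in range(16):                 # evaluate at every element of GF(16)
        acc = 0
        for c in reversed(coeffs):      # Horner's rule, highest degree first
            acc = gf16_mul(acc, x) ^ c
        digits.append(acc)
    text = "".join(f"{d:X}" for d in digits)
    return "-".join(text[i:i + 4] for i in range(0, 16, 4))


serials = [encode_serial(n) for n in range(1, 6)]  # a simple counter as input
```

A plain counter is enough as input here, since the distance guarantee comes from the code itself; the only state to keep is the last number used.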
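To test-drive the hash idea, here is a Python sketch that derives serials from SHA-256 of the counters 1 to 1000 (SHA-256 instead of MD5 is a swapped-in choice) and counts pairs that differ in fewer than 6 digits. It also computes the per-pair probability for ideal, uniformly random serials: the number of strings within distance 5 of a given serial, the sum of C(16, k) * 15^k for k = 0..5, divided by 16^16, which comes to roughly 1.8 * 10^-10. The function names are hypothetical.

```python
import hashlib
from itertools import combinations
from math import comb


def serial_from_counter(counter: int) -> str:
    """First 16 hex digits of SHA-256(counter), grouped as XXXX-XXXX-XXXX-XXXX."""
    digits = hashlib.sha256(str(counter).encode()).hexdigest()[:16].upper()
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))


def hamming(a: str, b: str) -> int:
    """Number of digit positions in which two serials differ (dashes ignored)."""
    return sum(x != y for x, y in zip(a.replace("-", ""), b.replace("-", "")))


def close_pair_probability(min_distance: int = 6, length: int = 16, alphabet: int = 16) -> float:
    """Chance that two uniformly random serials differ in fewer than min_distance digits."""
    close = sum(comb(length, k) * (alphabet - 1) ** k for k in range(min_distance))
    return close / alphabet ** length


serials = [serial_from_counter(i) for i in range(1, 1001)]   # the numbers 1 to 1000
too_close = sum(1 for a, b in combinations(serials, 2) if hamming(a, b) < 6)
print(too_close, "pairs differ in fewer than 6 digits")
print("ideal random per-pair probability:", close_pair_probability())
```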
I suggest you look into the ANSI X9.17 pseudorandom bit generator. An algorithmic sketch is given in these slides. ANSI X9.17 generates 64-bit pseudorandom strings, which is what you want.
A revised and enhanced version of this generator was approved by NIST. Please have a look at this page.
Now whether you use ANSI X9.17 generator, another generator, or develop your own, it's a good idea to have the generator pass some statistical tests in order to ensure the quality of its pseudorandom bits.
Example tests include the ENT battery, the DIEHARD battery, and the NIST battery.

Make CURAND generate positive different random numbers less than a specific number

I am trying to use CURAND library to generate random numbers which are completely independent of each other. Hence I want to give different seeds to each thread.
So, Question 1: How do I give a different seed to each thread? (Is there some time function in CUDA that I can use?)
I also want to generate these random numbers within a range, i.e. 0 to 10000. How do I accomplish that?
Currently I am using curand_normal (as I want numbers from a normal distribution), but it's giving me negative and repeated numbers, which I do not want.
Setting different seeds is not a statistically sound way to get independent (non-correlated) random numbers (with any single random number generator). You would be better off selecting different sub-sequences of a single sequence, and most random number libraries will allow you to do that, including cuRAND.
Check out the examples in the CUDA SDK, for example the EstimatePiP or EstimatePiInlineP examples use cuRAND to generate pseudo-random numbers.
For the second part of your question, as mentioned in the cuRAND manual the curand_normal() routines return Normally distributed numbers with mean 0.0 and standard deviation 1.0 (i.e. Standard Normal Distribution). Clearly that means that you will have ~50% negative numbers.
It doesn't make sense to specify a fixed range along with the Normal distribution. You either want some other distribution (e.g. Uniform) with the fixed range, or else you want the Normal distribution with a specific mean and standard deviation. To get from the Standard Normal to your target mean/std.dev., you simply multiply the random draw by the target standard deviation and add the target mean.
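Both transforms are plain arithmetic on the draws, so they can be sketched in Python. Here rng.gauss and rng.random are stand-ins for curand_normal and curand_uniform, and the helper names and the example mean and standard deviation are hypothetical. The clamp in the uniform mapping matters because the cuRAND manual documents curand_uniform as returning values in (0.0, 1.0], so a draw of exactly 1.0 must not map past the top of the range.

```python
import random


def scale_standard_normal(z: float, mean: float, std_dev: float) -> float:
    """Map a draw from N(0, 1) onto N(mean, std_dev**2)."""
    return mean + std_dev * z


def uniform_to_int(u: float, low: int, high: int) -> int:
    """Map a uniform u in [0, 1] to an integer in [low, high], inclusive."""
    span = high - low + 1
    return low + min(int(u * span), span - 1)   # clamp so u == 1.0 maps to high


rng = random.Random(1234)  # stand-in for one thread's generator state
normal_draw = scale_standard_normal(rng.gauss(0.0, 1.0), mean=5000.0, std_dev=1000.0)
uniform_draw = uniform_to_int(rng.random(), 0, 10000)
```

Note that a scaled Normal draw can still fall outside 0 to 10000; only the Uniform mapping guarantees the range.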

How to generate a longer random number from a short random number?

I have a short random number input, let's say int 0-999.
I don't know the distribution of the input. Now I want to generate a random number in range 0-99999 based on the input without changing the distribution shape.
I know one way is to map the input to [0,1] by dividing it by 999 and then multiplying by 99999 to get the result. However, this method doesn't cover all the possible values; for example, 1 will never be hit.
Assuming your input is some kind of source of randomness...
You can take two consecutive inputs and combine them:
input() + 1000*(input()%100)
Be careful though. This relies on the source having plenty of entropy, so that a given input number isn't always followed by the same subsequent input number. If your source is a PRNG designed to cycle between the numbers 0–999 in some fashion, this technique won't work.
With most production entropy sources (e.g., /dev/urandom), this should work fine. On the other hand, with a production entropy source you could fetch a random number between 0–99999 fairly directly.
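A small Python sketch of the two-draw combination, where draw is a hypothetical callable returning the 0-999 input and Python's random module stands in for the real source. Because 1000 is a multiple of 100, draw() % 100 is uniform on 0-99 whenever draw() is uniform on 0-999.

```python
import random


def combine_draws(draw) -> int:
    """Build a value in 0..99999 from two draws in 0..999."""
    low = draw()            # 0..999 -> low three digits
    high = draw() % 100     # 0..99  -> high two digits
    return low + 1000 * high


rng = random.Random(42)     # stand-in for the real 0..999 source
value = combine_draws(lambda: rng.randrange(1000))
```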
You can try something like the following:
(input * 100) + random
where random is a random number between 0 and 99.
The problem is that input only specifies which range of 100 to use. For instance, an input of 50 just says you will get a number between 5000 and 5099 (to keep a similar distribution shape). Which number between 5000 and 5099 to pick is up to you.
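A Python sketch of this second approach, assuming the offset inside each 100-wide bucket is chosen uniformly with Python's random module (that choice is the part the answer leaves up to you); the function name is hypothetical.

```python
import random


def widen(input_value: int, rng: random.Random) -> int:
    """Keep input_value's bucket and pick the offset within it uniformly."""
    if not 0 <= input_value <= 999:
        raise ValueError("input must be in 0..999")
    return input_value * 100 + rng.randrange(100)   # offset 0..99 inside the bucket
```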
