Generating unique random numbers with AES

I'm studying the possibility of generating unique random numbers with AES-128. When the input is unique, the output is also unique after executing the AES algorithm. Executing the AES algorithm produces a vector of bytes; let's call it Rand[].
Let's define the pseudo-random number pingOffset = (Rand[0] + Rand[1] * 256) mod Period (the parameter "Period" is a fixed modulus used for every pingOffset generation).
Can I confirm that pingOffset is unique for every unique input? (The input is unique because it consists of a device address, which is unique.)
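The construction described above can be sketched in a few lines. This is a sketch under assumptions: SHA-256 truncated to 16 bytes stands in for AES-128 (the Python standard library has no AES), and the function and parameter names (`ping_offset`, `PERIOD`) are mine, not from the question.

```python
import hashlib

PERIOD = 1000  # hypothetical periodicity parameter

def ping_offset(device_address: bytes, period: int = PERIOD) -> int:
    """Sketch of the question's construction: derive a 16-byte pseudo-random
    block Rand[] from a unique input, then reduce the first two bytes to
    (Rand[0] + Rand[1] * 256) mod period.

    SHA-256 (truncated to 16 bytes) stands in for AES-128 here; with a fixed
    key, AES would likewise map each unique input to a unique 16-byte block.
    """
    rand = hashlib.sha256(device_address).digest()[:16]
    return (rand[0] + rand[1] * 256) % period

print(ping_offset(b"\x00\x11\x22\x33\x44\x55"))
```

Note that uniqueness of the full 16-byte block does not carry over to pingOffset: only Rand[0] and Rand[1] are used, and the modulo reduces the 65,536 possible 16-bit values to just `period` residues, so by the pigeonhole principle distinct inputs can share a pingOffset.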

Related

Random number generator with freely chosen period

I want a simple (non-cryptographic) random number generation algorithm where I can freely choose the period.
One candidate would be a special instance of LCG:
X(n+1) = (a*X(n) + c) mod m, with m and c relatively prime, (a-1) divisible by all prime factors of m, and (a-1) divisible by 4 if m is (the Hull-Dobell conditions).
This has period m and does not restrict possible values of m.
I intend to use this RNG to create a permutation of an array by generating indices into it. I tried the LCG and it might be OK. However, it may not be "random enough", in that the distances between adjacent outputs take very few possible values (i.e., plotting X(n) against n gives a wrapped line). The arrays I want to index into have some structure related to this distance, and I want to avoid potential issues with it.
Of course, I could use any good PRNG to shuffle (using e.g. Fisher–Yates) an array [1,..., m]. But I don't want to have to store this array of indices. Is there some way to capture the permuted indices directly in an algorithm?
I don't really mind the method ending up biased with respect to the choice of RNG seed. Only the period matters, along with the permuted sequence (for a given seed) being reasonably random.
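The full-period conditions quoted above can be checked empirically. A small sketch (the function name is mine); note how for m = 20 the conditions force a ≡ 1 (mod 20), so the generator degenerates into a stepped counter, which is exactly the "wrapped line" behaviour the question complains about:

```python
def lcg_period(a: int, c: int, m: int, x0: int = 0) -> int:
    """Length of the cycle of x -> (a*x + c) % m, starting from x0."""
    x, steps = (a * x0 + c) % m, 1
    while x != x0:
        x = (a * x + c) % m
        steps += 1
    return steps

# m = 16: c = 1 is coprime to 16, a - 1 = 4 is divisible by 2 and by 4.
print(lcg_period(5, 1, 16))   # 16, the full period

# m = 20: a - 1 must be divisible by 2, 5, and 4, i.e. by 20, so a = 21
# acts like a = 1 and the map is just x -> x + 3 (mod 20).
print(lcg_period(21, 3, 20))  # 20, full period, but a trivially regular walk
```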
Encryption is a one-to-one operation. If you encrypt a range of numbers, you will get the same count of apparently random numbers back. In this case the period will be the size of the chosen range. So for a period of 20, encrypt the numbers 0..19.
If you want the output numbers to be in a specific range, then pick a block cipher with an appropriately sized block and use Format Preserving Encryption if needed, as David Eisenstat suggests.
It is not difficult to set up a cipher with almost any reasonable block size, so long as it is an even number of bits, using the Feistel structure. If you don't require cryptographic security then four or six Feistel rounds should give you enough randomness.
Changing the encryption key will give you a different ordering of the numbers.
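The "encrypt 0..19 for a period of 20" idea can be sketched with a toy Feistel network plus cycle-walking. Everything here is illustrative: the 8-bit block size, the hash-based round function, and the key are my choices, and four rounds of a hash-derived Feistel is nowhere near cryptographically secure, only "random-looking" as the answer suggests.

```python
import hashlib

def _round(half: int, rnd: int, key: bytes) -> int:
    # 4-bit round function derived from a hash; a stand-in, not secure.
    h = hashlib.sha256(key + bytes([rnd, half])).digest()
    return h[0] & 0xF

def feistel8(x: int, key: bytes, rounds: int = 4) -> int:
    """Keyed permutation of 0..255 built as a balanced Feistel network."""
    left, right = x >> 4, x & 0xF
    for r in range(rounds):
        left, right = right, left ^ _round(right, r, key)
    return (left << 4) | right

def permute(x: int, m: int, key: bytes = b"demo-key") -> int:
    """Cycle-walking: re-encrypt until the value lands back inside 0..m-1.
    Since feistel8 is a bijection on 0..255, this yields a permutation of
    0..m-1 (for any m <= 256)."""
    y = feistel8(x, key)
    while y >= m:
        y = feistel8(y, key)
    return y

# "For a period of 20, encrypt the numbers 0..19":
sequence = [permute(i, 20) for i in range(20)]
print(sorted(sequence) == list(range(20)))  # True: a permutation of 0..19
```

Changing `key` reorders the sequence, matching the last point of the answer.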

What is the probability that a UUID, stripped of all of its letters and dashes, is unique?

Say I have a UUID a9318171-2276-498c-a0d6-9d6d0dec0e84.
I then remove all the letters and dashes to get 9318171227649806960084.
What is the probability that this is unique, given a set of ID's that are generated in the same way? How does this compare to a normal set of UUID's?
UUIDs are represented as 32 hexadecimal (base-16) digits, displayed in 5 groups separated by hyphens. The issue with your question is that each hexadecimal digit of a generated UUID can be any value from the set [0-9, A-F].
This leaves us with a dilemma, since we don't know beforehand how many of the hexadecimal digits generated for each UUID will be an alpha character: [A-F]. The only thing we can be certain of is that each generated character of the UUID has a 6/16 (i.e. 3/8) chance of being an alpha character. This makes it impossible to answer the question exactly, since removing the hyphens and alpha characters leaves us with a variable-length string for each generated UUID...
With that being said, to give you something to think about: we know that each UUID is 36 characters long, including the hyphens. If we simplify and drop the hyphens, each UUID is 32 characters long. Building on this, if we further simplify and say that each of the 32 characters can only be a numeric character [0-9], we can now give an exact probability of uniqueness for each generated, simplified UUID (under the above simplifications):
Assume a UUID is represented by 32 characters, where each character is a numeric character from the set [0-9]; then we need to generate 32 digits to create a valid simplified UUID. The chance of selecting any given digit [0-9] is 1/10. Another way to think about it: each digit has an equal opportunity of being generated, and since there are 10 digits, each has a 10% chance of being generated.
Furthermore, each digit is generated independently of the previously generated digits, i.e. no digit depends on the outcome of the previous one. Therefore, each of the 32 numeric characters generated is independent of the others.
Knowing these facts we can take advantage of the Product Rule which states that the probability of the occurrence of two independent events is the product of their individual probabilities. For example, the probability of getting two heads on two coin tosses is 0.5 x 0.5 or 0.25. Therefore, the generation of two identical UUIDs would be:
1/10 * 1/10 * 1/10 * .... * 1/10 where the number of 1/10s would be 32.
Simplifying to 1/(10^32), or in general 1/(10^n), where n is the length of your UUID. So, with all that being said, the probability of generating two identical UUIDs, given our assumptions, is infinitesimally small.
Hopefully that helps!
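Both the stripping step from the question and the simplified probability above can be checked in a few lines of Python (a sketch; the variable names are mine):

```python
import re

u = "a9318171-2276-498c-a0d6-9d6d0dec0e84"
digits = re.sub(r"[^0-9]", "", u)  # drop the hyphens and the hex letters
print(digits)                      # 9318171227649806960084
print(len(digits))                 # 22: the length varies with how many
                                   # hex digits happened to be letters

# Under the simplification above (32 characters, each uniformly 0-9),
# the chance that a second ID matches a given one is:
p = (1 / 10) ** 32
print(p)                           # ~1e-32
```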

Generating non-colliding random numbers from the combination of 2 numbers in a set?

I have a set of 64-bit unsigned integers with length >= 2. I pick 2 random integers, a, b from that set. I apply a deterministic operation to combine a and b into different 64-bit unsigned integers, c_1, c_2, c_3, etc. I add those c_ns to the set. I repeat that process.
What procedure can I use to guarantee that c will practically never collide with an existing bitstring on the set, even after millions of steps?
Since you're generating multiple 64-bit values from a pair of 64-bit numbers, I would suggest that you select two numbers at random, and use them to initialize a 64 bit xorshift random number generator with 128 bits of state. See https://en.wikipedia.org/wiki/Xorshift#xorshift.2B for an example.
However, it's rather difficult to predict the collision probability when you're using multiple random number generators. With a single PRNG, the rule of thumb is that you'll have a 50% chance of a collision after generating the square root of the range. For example, if you were generating 32-bit random numbers, your collision probability reaches 50% after about 70,000 numbers generated. Square root of 2^32 is 65,536.
With a single 64-bit PRNG, you could generate more than a billion random numbers without too much worry about collisions. In your case, you're picking two numbers from a potentially small pool, then initializing a PRNG and generating a relatively small number of values that you add back to the pool. I don't know how to calculate the collision probability in that case.
Note, however, that whatever the probability of collision, the possibility of collision always exists. That "one in a billion" chance does in fact occur: on average once every billion times you run the program. You're much better off saving your output numbers in a hash set or other data structure that won't allow you to store duplicates.
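The suggestion above can be sketched as a Python transcription of the xorshift128+ recurrence (shift constants 23, 17, 26 as in the Wikipedia article linked above), with explicit 64-bit masking since Python integers are unbounded:

```python
MASK64 = (1 << 64) - 1

def xorshift128plus(s0: int, s1: int):
    """Generator form of xorshift128+: 128 bits of state (two 64-bit words),
    64-bit outputs. Seed with the two numbers picked from the set; the two
    seed words must not both be zero."""
    while True:
        x, y = s0, s1
        s0 = y
        x ^= (x << 23) & MASK64
        s1 = x ^ y ^ (x >> 17) ^ (y >> 26)
        yield (s1 + y) & MASK64

g = xorshift128plus(1, 2)
print(next(g))  # deterministic for a given seed pair
```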
I think the best you can do without any other given constraints is to use a pseudo-random function that maps two 64-bit integers to a 64-bit integer. Depending on whether the order of a and b matters for your problem (i.e. whether (3, 5) should map to something different than (5, 3)), you should leave them as-is or sort them first.
The natural choice for a pseudo-random function that maps a larger input to a smaller input is a hash function. You can select any hash function that produces an output of at least 64-bit and truncate it. (My favorite in this case would be SipHash with an arbitrary fixed key, it is fast and has public domain implementations in many languages, but you might just use whatever is available.)
The expected amount of numbers you can generate before you get a collision is determined by the birthday bound, as you are essentially selecting values at random. The linked article contains a table for the probabilities for 64-bit values. As an example, if you generate about 6 million entries, you have a collision probability of one in a million.
I don't think it is possible to beat this approach in the general case, as you could encode an arbitrary amount of information in the sequence of elements you combine while the amount of information in the output value is fixed to 64-bit. Thus you have to consider collisions, and a random function spreads out the probability evenly among all possible sequences.
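A sketch of the keyed-hash approach above. SipHash is not in the Python standard library, so keyed blake2b truncated to 8 bytes serves as a stand-in here; the fixed key and function name are my choices:

```python
import hashlib

FIXED_KEY = b"an arbitrary fixed key"  # any constant key works

def combine(a: int, b: int) -> int:
    """Pseudo-random function mapping a pair of 64-bit ints to one 64-bit int.
    Keyed blake2b truncated to 8 bytes stands in for SipHash. The pair is
    sorted first; drop that line if (a, b) should differ from (b, a)."""
    a, b = min(a, b), max(a, b)
    data = a.to_bytes(8, "big") + b.to_bytes(8, "big")
    digest = hashlib.blake2b(data, digest_size=8, key=FIXED_KEY).digest()
    return int.from_bytes(digest, "big")

print(combine(3, 5) == combine(5, 3))  # True: the pair was sorted first
```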

About Mersenne Twister generator's period

I have read that the Mersenne Twister generator has a period of 2¹⁹⁹³⁷ - 1, but I'm confused about how that can be possible. I see this implementation of the Mersenne Twister algorithm, and in the first comment it clearly says that it produces values in the range 0 to 2³² - 1. Therefore, after it has produced 2³² - 1 different random numbers, it must necessarily come back to the starting point (the seed), so the period can be at most 2³² - 1.
Also (and tell me if I'm wrong, please), a computer can't hold the number (2¹⁹⁹³⁷ - 1) ~ 4.3×10⁶⁰⁰¹, at least in a single block of memory. What am I missing here?
Your confusion stems from thinking that the output number and the internal state of a PRNG have to be the same thing.
Some very old PRNGs did this, such as Linear Congruential Generators: the current output was fed back into the generator for the next step.
However, most PRNGs, including the Mersenne Twister, work from a much larger state, which they update and use to generate a 32-bit number (the order in which this happens doesn't matter for the purposes of this answer).
In fact, the Mersenne Twister does indeed store 624 times 32-bit values, and that is 19968 bits, enough to contain the very long period that you are wondering about. The values are handled separately (as unsigned 32-bit integers), not treated as one giant number in a single-step calculation. The 32-bit random number you get from the output is related to this state, but does not determine the next number by itself.
You are wrong at
Therefore, after it has produced 2³² - 1 different random numbers, it
will necessarily come back to the starting point (the seed)...
It's true that the next number can equal one of the numbers already generated, but the internal state of the random number generator will not be the same. (No one said that every number in the range 0 to 2³² - 1 will have been produced by the (2³² - 1)-th step.) So there is no bijection between the generated random number and the internal state of the generator. The random number can be calculated from the state, but you don't even have to: you can advance the internal state without producing a random number.
And of course, the computer doesn't store the whole number sequence; it calculates each random number from the internal state. Consider a sequence like 1, -1, 1, -1, ...: you can generate the Nth element without storing N elements.
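The state-versus-output distinction is easy to see in CPython, whose `random` module is a Mersenne Twister and exposes its internal state via `getstate()` (the exact state layout shown here is a CPython implementation detail):

```python
import random

rng = random.Random(12345)
first = rng.getrandbits(32)             # one 32-bit output...
version, internal, gauss_next = rng.getstate()

# ...but the internal state is 624 32-bit words plus a position index:
# 625 entries, i.e. 19,968 bits of state, vastly more than the single
# 32-bit number that was output.
print(len(internal))                                   # 625
print(all(0 <= w < 2 ** 32 for w in internal[:624]))   # True
```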

Is there "good" PRNG generating values without hidden state?

I need some good pseudo random number generator that can be computed like a pure function from its previous output without any state hiding. Under "good" I mean:
I must be able to parametrize the generator in such a way that running it for 2^n iterations with any parameters (or with some large subset of them) covers all or almost all values between 0 and 2^n - 1, where n is the number of bits in the output value.
Combined generator output of n + p bits must cover all or almost all values between 0 and 2^(n + p) - 1 if I run it for 2^n iterations for every possible combination of its parameters, where p is the number of bits in parameters.
For example, an LCG can be computed as a pure function and can meet the first condition, but it cannot meet the second one. Say we have a 32-bit LCG with m = 2^32 held constant; then p = 64 (two 32-bit parameters a and c) and n + p = 96, so we must take the output three 32-bit ints at a time to meet the second condition. Unfortunately, that condition cannot be met because of the strictly alternating sequence of odd and even ints in the output. To overcome this, hidden state must be introduced, but that makes the function impure and breaks the first condition (long hidden period).
EDIT: Strictly speaking, I want family of functions parametrized by p bits and with full state of n bits, each generating all possible binary strings of p + n bits in unique "randomish" way, not just continuously incrementing (p + n)-bit int. Parametrization required to select that unique way.
Am I wanting too much?
You can use any block cipher, with a fixed key. To generate the next number, decrypt the current one, increment it, and re-encrypt it. Because block ciphers are 1:1, they'll necessarily iterate through every number in the output domain before repeating.
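The decrypt-increment-encrypt step can be sketched with a toy 64-bit Feistel "cipher" built from SHA-256, standing in for a real block cipher with a fixed key (the construction, key, and round count here are illustrative, not secure):

```python
import hashlib

MASK32 = (1 << 32) - 1

def _round(half: int, rnd: int, key: bytes) -> int:
    h = hashlib.sha256(key + bytes([rnd]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(h[:4], "big")

def encrypt(x: int, key: bytes = b"fixed-key", rounds: int = 6) -> int:
    """Toy 64-bit Feistel block cipher (a stand-in; not secure)."""
    left, right = x >> 32, x & MASK32
    for r in range(rounds):
        left, right = right, left ^ _round(right, r, key)
    return (left << 32) | right

def decrypt(y: int, key: bytes = b"fixed-key", rounds: int = 6) -> int:
    """Inverse of encrypt: undo the Feistel rounds in reverse order."""
    left, right = y >> 32, y & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ _round(left, r, key), left
    return (left << 32) | right

def next_number(v: int) -> int:
    """Decrypt the current value, increment, re-encrypt. Because the cipher
    is a bijection on 64-bit blocks, iterating this walks a full 2^64 cycle
    before repeating."""
    return encrypt((decrypt(v) + 1) % (1 << 64))

v = encrypt(0)
for _ in range(5):
    v = next_number(v)
print(v == encrypt(5))  # True: five steps from E(0) lands on E(5)
```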
Try an LFSR.
All you need is a list of primitive polynomials.
Generating the finite field this way gives a period of 2^n - 1, and you can generalise the procedure to get a period of k^n - 1.
I have not seen this implemented, but all you have to implement is shifting numbers by a small number s > n where gcd(s, 2^n - 1) == 1 (gcd stands for greatest common divisor).
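An LFSR driven by a primitive polynomial can be sketched in a few lines. This is a Galois-form LFSR; the 4-bit width and the tap mask 0b1100 (a maximal-length choice for n = 4) are my example parameters:

```python
def lfsr_cycle(taps: int, seed: int = 1):
    """One full cycle of a Galois LFSR over GF(2), starting from seed.
    With taps from a primitive polynomial the period is 2^n - 1
    (the all-zero state is excluded, since it maps to itself)."""
    state, out = seed, []
    while True:
        out.append(state)
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        if state == seed:
            return out

# n = 4 with a maximal-length tap mask:
cycle = lfsr_cycle(0b1100)
print(len(cycle))                           # 15 == 2**4 - 1
print(sorted(cycle) == list(range(1, 16)))  # every nonzero 4-bit value once
```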
