Why isn't the Random class really random? [closed] - random

I've read on a message board that the Random class isn't really random: its output is produced in a predictable fashion using a mathematical formula.
Is that really true? If so, in what sense is Random not really random?

Because deterministic computers are really bad at generating "true" random numbers by themselves.
Also, a predictable/repeatable random sequence is often surprisingly useful, since it helps in testing.

It's really hard to create something that is absolutely random. See the Wikipedia articles on randomness and pseudo-randomness.

As others have already said, Random creates pseudo-random numbers, depending on some seed value. It may be helpful to know that the .NET class Random has two constructors:
Random(int Seed)
creates a random number generator with a given seed value, helpful if you want reproducible behaviour of your program. On the other hand,
Random()
creates a random number generator with a time-dependent seed value, which means that almost every time you start your program again, it will produce a different sequence of (pseudo-)random numbers.
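The effect of a fixed seed is easy to see with any seeded generator. The sketch below uses Python's random.Random as a stand-in for the .NET class (an assumption made purely for illustration; the two use different algorithms), and the seed 12345 is an arbitrary choice.

```python
import random

# Two independent generators built from the same (arbitrary) seed.
a = random.Random(12345)
b = random.Random(12345)

seq_a = [a.randint(0, 99) for _ in range(5)]
seq_b = [b.randint(0, 99) for _ in range(5)]

# Same seed and same sequence of calls give the same values.
print(seq_a == seq_b)  # True
```

Leaving the seed out (random.Random()) seeds the generator from the operating system or the clock instead, which is the counterpart of the parameterless constructor.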

The sequence is predictable for each starting seed. For different seeds, different sequences of numbers are returned. If the seed used is itself random (such as DateTime.Now.Ticks), then the numbers returned are adequately 'random'.
Alternatively, you can use a cryptographic random number generator such as the RNGCryptoServiceProvider class.

It isn't random; it's a random-like number-generating algorithm, and it is driven by a starting number (the seed). If you set that seed to something like the system time, the numbers are closer to random, but if you use these numbers in, let's say, an encryption algorithm, and the attacker knows WHEN you generated the random numbers and which algorithm you used, then it becomes much more likely that your encryption will be broken.
The only way to generate true random numbers is to measure something natural, for example voltage levels, or to have a microphone picking up sounds somewhere, or something like that.

It is true, but you can always seed the random number generator with some time dependent value, or if you're really prepared to push the boat out, look at www.random.org...
In the case of the Random class though, I think it should be random enough for most requirements... I can't see a method to actually seed it, so I'm guessing it must automatically seed as built in behaviour...

Correct. Class Random is not absolutely, totally random. The important question is whether it is as statistically close to being random as you need it to be. The output from class Random is statistically as nearly random as a reasonable deterministic program can be. The algorithm uses a 48-bit seed modified by a linear congruential formula. If the Random object is created using the parameterless constructor, the 48 low-order bits of the millisecond time get used as the seed. If the Random object is created using the seed parameter (a long), the 48 low-order bits of the long get used as the seed.
If Random is instantiated with the same seed and the exact same sequence of next calls is made on it, the exact same sequence of values will result from that instance. This is deliberate, to allow for predictable software testing and demonstrations. Ordinarily, Random is not used with a constant seed in operational use, since it is usually used to get unpredictable pseudo-random sequences. If two instances of Random are created with the parameterless constructor in the same clock millisecond, both instances will also produce the same sequence. It is important to note that eventually a Random instance will repeat its pattern. Therefore, a Random instance should not be used for enormously long sequences before creating a new instance.
There is no reason not to use the Random class except for high-security cryptographic applications or some special need where some aspect of true randomness is of paramount importance, something that is uncommon. In those cases, you really need a hardware randomizer that uses radioactive decay or infinitesimal molecular-level Brownian motion to generate a random result. Sun SPARC hardware platforms had such hardware installable. Other platforms can have them too, along with the hardware drivers that give access to the randomness they generate.
The algorithm used in class Random is the result of considerable research by some of the best minds in computer science and mathematics. Given the right parameters, it provides remarkable and outstanding results. Other more recent algorithms may be better for some limited applications, but they also have performance or specific application issues that make them less suitable for general-purpose use. The linear congruential algorithm still remains one of the most widely used general-purpose pseudo-random number generators.
The following quote is from Donald Knuth's book, The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Section 3.2.1. The quote describes the linear congruential method and discusses its properties. If you don't know who Donald Knuth is or have never read any of his papers or books, he, amongst other things, showed that there can be no sort faster than Tony Hoare's Quicksort with partition pivot strategies created by Robert Sedgewick. Robert Sedgewick, who suggested the best simple pivot selection strategies for Quicksort, did his doctoral thesis on Quicksort under Donald Knuth's supervision. Knuth's multi-volume work, The Art of Computer Programming, is one of the greatest expositions of the most important theoretical aspects of computing ever assembled, including sorting, searching and randomizing algorithms. There is a lot of discussion in Chapter 3 about what randomness really is, statistically and philosophically, and about software that emulates true randomness to the point where it is statistically nearly indistinguishable from it for very large, but still finite, sequences. What follows is pretty heavy reading:
3.2.1. The Linear Congruential Method
By far the most popular random number generators in use today are special cases of the following scheme, introduced by D. H. Lehmer in 1949. [See Proc. 2nd Symp. on Large-Scale Digital Calculating Machinery (Cambridge, Mass.: Harvard University Press, 1951), 141-146.]
We choose four magic integers:
m, the modulus; 0 < m.
a, the multiplier; 0 <= a < m.
c, the increment; 0 <= c < m.
X[0], the starting value; 0 <= X[0] < m. (equation 1)
The desired sequence of random numbers (X[n]) is then obtained by setting
X[n+1] = (a * X[n] + c) mod m, n >= 0. (equation 2)
This is called a linear congruential sequence. Taking the remainder mod m is somewhat like determining where a ball will land in a spinning roulette wheel.
For example, the sequence obtained when m == 10 and X[0] == a == c == 7 is
7, 6, 9, 0, 7, 6, 9, 0, ... . (example 3)
As this example shows, the sequence is not always "random" for all choices of m, a, c, and X[0]; the principles of choosing the magic numbers appropriately will be investigated carefully in later parts of this chapter.
Example (3) illustrates the fact that the congruential sequences always get into a loop: There is ultimately a cycle of numbers that is repeated endlessly. This property is common to all sequences having the general form X[n+1] = f(X[n]), when f transforms a finite set into itself; see exercise 3.1-6. The repeating cycle is called the period; sequence (3) has a period of length 4. A useful sequence will of course have a relatively long period.
The special case c == 0 deserves explicit mention, since the number generation process is a little faster when c == 0 than it is when c != 0. We shall see later that the restriction c == 0 cuts down the length of the period of the sequence, but it is still possible to make the period reasonably long. Lehmer's original generation method had c == 0, although he mentioned c != 0 as a possibility; the fact that c != 0 can lead to longer periods is due to Thomson [Comp. J. 1 (1958), 83, 86] and, independently, to Rotenberg [JACM 7 (1960), 75-77]. The terms multiplicative congruential method and mixed congruential method are used by many authors to denote linear congruential sequences with c == 0 and c != 0, respectively.
The letters m, a, c, and X[0] will be used throughout this chapter in the sense described above. Furthermore, we will find it useful to define
b = a - 1, (equation 4)
in order to simplify many of our formulas.
We can immediately reject the case a == 1, for this would mean that X[n] = (X[0] + n * c) mod m, and the sequence would certainly not behave as a random sequence. The case a == 0 is even worse. Hence for practical purposes we may assume that
a >= 2, b >= 1. (equation 5)
Now we can prove a generalization of Eq. (2),
X[n+k] = (a^k * X[n] + (a^k - 1) * c / b) mod m, k >= 0, n >= 0, (equation 6)
which expresses the (n+k)th term directly in terms of the nth term. (The special case n == 0 in this equation is worthy of note.) It follows that the subsequence consisting of every kth term of (X[n]) is another linear congruential sequence, having the multiplier a^k mod m and the increment ((a^k - 1) * c / b) mod m.
An important corollary of (6) is that the general sequence defined by m, a, c, and X[0] can be expressed very simply in terms of the special case where c == 1 and X[0] == 0. Let
Y[0] = 0, Y[n+1] = (a * Y[n] + 1) mod m. (equation 7)
According to Eq. (6) we will have Y[k] ≡ (a^k - 1) / b (modulo m), hence the general sequence defined in (2) satisfies
X[n] = (A * Y[n] + X[0]) mod m, where A == (X[0] * b + c) mod m. (equation 8)
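For readers who prefer code to formulas, here is a minimal Python sketch of the recurrence X[n+1] = (a * X[n] + c) mod m, run on the quoted example with m == 10 and X[0] == a == c == 7. The function names lcg and period are made up for this illustration.

```python
from itertools import islice

def lcg(m, a, c, x0):
    """Yield X[0], X[1], ... with X[n+1] = (a * X[n] + c) mod m."""
    x = x0
    while True:
        yield x
        x = (a * x + c) % m

def period(m, a, c, x0):
    """Length of the cycle the sequence eventually enters."""
    seen = {}  # value -> index of its first appearance
    for n, x in enumerate(lcg(m, a, c, x0)):
        if x in seen:
            return n - seen[x]
        seen[x] = n

first_eight = list(islice(lcg(10, 7, 7, 7), 8))
print(first_eight)          # [7, 6, 9, 0, 7, 6, 9, 0]
print(period(10, 7, 7, 7))  # 4
```

Because the state is a single value below m, the period can never exceed m, which is why practical generators use a very large modulus.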

Related

Hashing with the Division Method - Choosing number of slots?

So, in CLRS, there's this quote
A prime not too close to an exact power of 2 is often a good choice for m.
Several Questions...
I understand how a power of 2 will just be the lower order bits of your key...however, say you have keys from a universe of 1 to 1 million, with each key having an equal probability of being any number from universe (which I'm guessing is a common assumption about your universe if given no other data?) then wouldn't taking say the 4 lower order bits result in (2^4) lower order bit patterns that were pretty much equally likely for the keys from 1 to 1 million? How am I thinking about this incorrectly?
Why a prime number? So, if powers of 2 aren't a good idea, why is a prime number a better choice as opposed to a composite number close to a power of 2 (Also why should it be close to a power of 2...lol)?
You are trying to find a hash table that works well for typical input data, and typical input data does things that you wouldn't expect from good random number generators. Very often you get formatted or semi-formatted strings which, when converted to numbers, end up as K, K+A, K+2A, K+3A,.... for some integers K and A. If K+xA and K+yA hash to the same number mod m, then (x-y)A must be 0 mod m. If m is prime, this can only happen if A = 0 mod m or if x = y mod m, so one time in m. But if m=pq and A happens to be divisible by p, then you get a collision every time x-y is divisible by q, which is more often since q < m.
I guess close to a power of 2 because it might be convenient for the memory management system to have blocks of memory of the resulting size - I really don't know. If you really care, and if you have the time, you could try different primes with some representative data and see which of them are best in practice.
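The arithmetic-progression argument can be checked with a few lines of Python. This is only a sketch: the values K = 1000, A = 8 and the two table sizes are made up for the demonstration, and max_bucket_load is a hypothetical helper name.

```python
from collections import Counter

def max_bucket_load(keys, m):
    """Largest number of keys landing in one slot under h(k) = k mod m."""
    return max(Counter(k % m for k in keys).values())

# Keys of the form K + x*A, as produced by semi-formatted input.
K, A = 1000, 8
keys = [K + x * A for x in range(1000)]

# m = 64 = 8 * 8 and A is divisible by 8: only 8 slots are ever used.
print(max_bucket_load(keys, 64))  # 125
# m = 61 is prime: the keys spread over all 61 slots almost evenly.
print(max_bucket_load(keys, 61))  # 17
```

With the composite size, every eighth key collides; with the prime size, two keys collide only when their x values differ by a multiple of 61.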

Generating Diffie-hellman parameters (generator)

I'm trying to implement a diffie-hellman key exchange. Let's say I found a large prime number p - how can I find a generator g?
Restricted by the multiprecision library that I have to use, only a few basic operations (+, *, -, /, pow, modExp, modMult, mod, gcd, isPrime, genRandomPrime, genRandomBits, and a few more) are available.
Would it work to look for a safe prime q, so that every number n with gcd(n,q) == 1 is a generator?
You basically answered your own question. The test gcd(n,q)==1 is not necessary, though, since q is prime: any number n with 0 < n < q has no common factor with q, so gcd(n,q) will always be 1.
You can check whether q = 2p + 1 is a prime number. If so, then ord(Zq) = q-1 = (2p+1)-1 = 2p. Since ord(x) | ord(Zq) for every x in Zq, ord(x)=2 or ord(x)=p or ord(x)=2p. Thus you just need to check whether your randomly chosen element x from {2,...,q-1} is of order 2. If not, then it is of order p or 2p and you can use it as a generator.
As a rule, don't ask programmers questions about cryptography. Cryptography is subtle and, as a result, difficult in invisible ways that lead readily to self-deception about one's own competence. Instead, ask cryptographers (many of whom are also programmers). Stack Exchange has a cryptography board, where this question has already been answered.
https://crypto.stackexchange.com/questions/29926/what-diffie-hellman-parameters-should-i-use
I could quibble with the advice there, but it's basically sound. Unless you really want to learn the relevant mathematics, I'd defer to authorities; they're cited in the answer above.
As to the mathematics question you ask, here's a tiny introduction. The multiplicative group modulo a prime p has size p-1. (See Fermat's Little Theorem.) The order of any element must divide p-1. The most favorable case is where p-1=2q, where q is also prime.
You've already gotten the ritual admonishment not to roll your own crypto if you care at all about your security, so here's how to find a generator mod a safe prime q. A number g in the closed range [2, q - 2] is a generator if and only if g^((q-1)/2) != 1 mod q, which you should compute with the standard algorithm for modular exponentiation. Choose random values of g until one passes the test.
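The test from the last answer fits in a few lines of Python. This is a toy sketch, not production crypto: q = 23 is a deliberately tiny safe prime chosen for illustration, the function names are made up, and the code does not check that q really is a safe prime.

```python
import random

def is_generator(g, q):
    """True if g generates the multiplicative group mod the safe prime q.

    Assumes q = 2p + 1 with p prime and 2 <= g <= q - 2 (not checked).
    """
    return pow(g, (q - 1) // 2, q) != 1

def find_generator(q, rng=random):
    """Try random candidates in [2, q - 2] until one passes the test."""
    while True:
        g = rng.randint(2, q - 2)
        if is_generator(g, q):
            return g

q = 23  # toy safe prime: 23 = 2 * 11 + 1
g = find_generator(q)
# The powers of a generator hit every value in 1..q-1 exactly once.
print(sorted(pow(g, k, q) for k in range(1, q)) == list(range(1, q)))  # True
```

About half of the candidates pass, so the loop normally ends after one or two tries.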

recurrence using (mod 2^32+1)

Where m = 2^32+1 = 641*6700417 the mod function is little more than a single subtract on 32-bit processors. I don’t care that the recurrence
Seed = Seed*a%m
is not a good random number generator. I wish to use it in an encryption algorithm as a 32-bit wide sbox. Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 2^32 values?
Assuming that such an algorithm exists, I suspect that if a*b%m = 1 then the recurrence using “b” would run backwards. Is what I suspect true? I would use “b” to implement the inverse sbox.
I can do everything I ask using mod (2^16+1) but that number is prime.
Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 2^32 values?
Yes there is:
return false;
The most obvious reason is that the set of all 2^32 possible values includes the value zero, and there the recurrence gets stuck, so it isn't cyclic. But even if you exclude zero, if you start with a multiple of 641, then you will only ever visit multiples of 641, and the same holds for the other factor.
This kind of “visit all values” property only works if you reduce modulo some prime, and if you exclude zero.
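Both failure modes are easy to confirm numerically. In this sketch the multiplier 3 and the starting value 641 are arbitrary picks, and orbit_stays_in_multiples is a hypothetical helper written for the demonstration.

```python
M = 2**32 + 1  # 4294967297 = 641 * 6700417

def orbit_stays_in_multiples(a, seed, factor, steps=1000):
    """Check that seed, seed*a, seed*a*a, ... (mod M) stay divisible by factor."""
    x = seed
    for _ in range(steps):
        if x % factor != 0:
            return False
        x = x * a % M
    return True

print(M == 641 * 6700417)                     # True
print(orbit_stays_in_multiples(3, 641, 641))  # True
print(0 * 3 % M)                              # 0, so zero is a fixed point
```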
This is not a very simple question. It is easy to answer that in your case the numbers you use are poor. However, the easy answer, "use primes", is not the correct answer either.
If we use the recurrence:
r := (r * a) mod m
it is easy to see that this gives the maximal cycle (m-1) if and only if all a^i mod m give different numbers for i = 0 .. m-2. However, this does not automatically happen even if both a and m are primes.
An example of two primes which cannot be used: a = 13, m = 17, because 13^4 mod 17 == 1, and the cycle will be very short (4 steps).
So, we need some other requirements, as well. To make a very long story short, a generator of this type (multiplicative congruential generator) produces the maximal cycle (m-1) if:
m is prime
a is a primitive root of m
Unfortunately, the latter requirement is a bit difficult, as there is no general formula to find primitive roots. (And please note that a does not have to be a prime, for example the combination m=17 and a=10 gives the full cycle.)
So, despite this being a seemingly simple problem, it touches some rather fundamental aspects of number theory.
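The two examples in this answer can be verified by computing the multiplicative order directly. This brute-force sketch is fine for small moduli such as 17 or 2^16+1 = 65537, but it is not a practical test for large m; the function names are made up for the illustration.

```python
def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k == 1 (mod m); assumes gcd(a, m) == 1."""
    x, k = a % m, 1
    while x != 1:
        x = x * a % m
        k += 1
    return k

def is_primitive_root(a, m):
    """For prime m: a is a primitive root exactly when its order is m - 1."""
    return multiplicative_order(a, m) == m - 1

print(multiplicative_order(13, 17))  # 4
print(is_primitive_root(10, 17))     # True
```

The same check confirms that 3 is a primitive root of 65537, the prime modulus mentioned in the question.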

Create a random permutation of 1..N in constant space

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the (nonzero) numbers in its range without repetition simply requires choosing a set of "taps" that correspond to a primitive polynomial (irreducibility alone is not quite enough). If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the Wikipedia article lists them for sizes up to 19 bits).
If memory serves, there's at least one primitive polynomial of every possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed of an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
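As a small sketch of the LFSR idea, the Python below runs a 4-bit Galois LFSR and discards the states that fall outside 1..N. The tap mask 0b1100 (polynomial x^4 + x^3 + 1) is an assumption chosen for this example; the function names are hypothetical, and a real use would pick the register width and mask from a published table.

```python
def lfsr_states(taps, start=1):
    """Yield each state of a Galois LFSR once, stopping when it cycles.

    `taps` is the XOR mask applied whenever the bit shifted out is 1.
    """
    state = start
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        if state == start:
            return

def lfsr_permutation(n, nbits, taps):
    """Yield 1..n in LFSR order, discarding states larger than n."""
    assert 1 <= n < 2**nbits
    return (s for s in lfsr_states(taps) if s <= n)

# 4-bit register: visits 1..15, of which 1..10 are kept.
print(list(lfsr_permutation(10, 4, 0b1100)))
# [1, 6, 3, 10, 5, 7, 9, 8, 4, 2]
```

The register never reaches zero, so the natural range is 1..N rather than 0..N-1, and only the single integer state is stored.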
One way to do it would be
Find a prime p larger than N, preferably not much larger.
Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.
For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.
For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.
Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).
Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.
So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if
x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...
thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.
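Put together, the steps of this answer might look like the following plain-Python sketch. It takes the smallest primitive root instead of a random one, uses naive trial division for the factorisation, and expects the caller to supply an odd prime p larger than n; all three function names are made up for the illustration.

```python
def prime_factors(n):
    """Distinct prime factors of n by naive trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_primitive_root(x, p):
    """Test x against every prime factor q of p - 1 (p assumed prime)."""
    return all(pow(x, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))

def permutation(n, p):
    """Yield 1..n once each; p must be an odd prime larger than n."""
    g = next(x for x in range(2, p) if is_primitive_root(x, p))
    x = 1
    for _ in range(p - 1):
        x = x * g % p  # x runs through g^1, g^2, ..., g^(p-1)
        if x <= n:
            yield x

print(list(permutation(10, 11)))  # [2, 4, 8, 5, 10, 9, 7, 3, 6, 1]
```

Multiplying the running value by g at each step avoids recomputing a full modular power for every element.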
Another way to do this is with a block cipher; see this blog post for details.
The blog post links to the paper "Ciphers with Arbitrary Finite Domains", which contains a bunch of solutions.
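Since the linked post is not reproduced here, the following is only a guess at the kind of construction meant: a small Feistel network used as a keyed bijection on a power-of-two range, combined with cycle walking to restrict it to [0, n). The SHA-256 round function, the key string and the function names are all assumptions for this sketch, and it makes no security claims.

```python
import hashlib

def feistel_permute(x, half_bits, key, rounds=4):
    """Bijection on [0, 2**(2*half_bits)) built from a Feistel network."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for r in range(rounds):
        digest = hashlib.sha256(f"{key}:{r}:{right}".encode()).digest()
        f = int.from_bytes(digest[:8], "big") & mask  # round function output
        left, right = right, left ^ f
    return (left << half_bits) | right

def permute_index(i, n, key):
    """Map i in [0, n) to a distinct value in [0, n) by cycle walking."""
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    x = feistel_permute(i, half_bits, key)
    while x >= n:  # re-encrypt until the value is back inside [0, n)
        x = feistel_permute(x, half_bits, key)
    return x

n = 1000
perm = [permute_index(i, n, key="demo") for i in range(n)]
print(sorted(perm) == list(range(n)))  # True
```

Each position is computed independently from its index, so the permutation can be walked in order or accessed at random without storing it, and a different key gives a different permutation.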
Consider the prime 3. To fully express all possible outputs, think of it this way...
(bias + step) mod prime
The bias is just an offset. step is an accumulator (if it's 1, for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4, that is 0, 2, 1 after the mod) and prime is the prime number we want to generate the permutations against.
For example, a simple sequence of 0, 1, 2 would be...
(0 + 0) mod 3 = 0
(0 + 1) mod 3 = 1
(0 + 2) mod 3 = 2
Modifying a couple of those variables for a second, we'll take a bias of 1 and a step of 2 (just for illustration)...
(1 + 2) mod 3 = 0
(1 + 4) mod 3 = 2
(1 + 6) mod 3 = 1
You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of bias and step will result in one of the prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permutations:
0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0
If you do the math on the variables above you'll note that it results in the same information requirements...
1/3! = 1/6 = 0.166..
... vs...
1/3 (bias) * 1/2 (step) => 1/6 = 0.166..
Restrictions are simple: bias must be within 0..P-1 and step must be within 1..P-1 (in my own work I have just been using 0..P-2 and adding 1 during the arithmetic). Other than that, it works with all prime numbers no matter how large, and it produces permutations of the set without the need for memory beyond a couple of integers (each technically requiring slightly fewer bits than the prime itself). Note that for P > 3 the P*(P-1) possible (bias, step) pairs cover only a small fraction of the P! permutations.
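A minimal Python sketch of this scheme follows. The indexing convention, with the accumulator running over step * i for i = 0 .. prime - 1, is an assumption made here, and affine_sequence is a hypothetical name.

```python
from itertools import product

def affine_sequence(bias, step, prime):
    """Yield (bias + step * i) mod prime for i = 0 .. prime - 1."""
    assert 0 <= bias < prime and 1 <= step < prime
    for i in range(prime):
        yield (bias + step * i) % prime

prime = 3
perms = {tuple(affine_sequence(b, s, prime))
         for b, s in product(range(prime), range(1, prime))}
print(sorted(perms))
# [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
```

Only bias, step and the loop counter are held in memory, whatever the size of the prime.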
Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.
Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).
The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.
If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f is a factor of b).
Since b is usually a power of 2, this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits, and it will have the same full-range property.
This means that you can reduce the number of discarded values to be fewer than N by choosing an appropriate mask.
Unfortunately, an LCG is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on @JerryCoffin's answer. It will always produce the same sequence, and the only thing the seed controls is where to start in that sequence.
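The masking property can be seen in a short Python experiment. The multiplier 1664525 and increment 1013904223 are assumed here as an example of a full-period 32-bit LCG (the increment is odd and the multiplier is 1 mod 4); the seed is arbitrary and lcg_low_bits is a made-up name.

```python
A, C, M = 1664525, 1013904223, 2**32  # assumed full-period constants

def lcg_low_bits(seed, bits, count):
    """Run x = (x*A + C) % M and return the low `bits` bits of each state."""
    mask = (1 << bits) - 1
    x, out = seed, []
    for _ in range(count):
        x = (x * A + C) % M
        out.append(x & mask)
    return out

low4 = lcg_low_bits(seed=2024, bits=4, count=16)
print(sorted(low4) == list(range(16)))  # True: every 4-bit value appears once
```

The flip side is that the low bits repeat with a very short period, which is exactly the weakness referred to above.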
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
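def random_safe_prime(lbound):
    # Pick random primes q until p = 2q + 1 is prime as well.
    while True:
        q = random_prime(lbound, lbound=lbound // 2)
        p = 2 * q + 1
        if is_prime(p):
            return p, q

def random_permutation(n):
    p, q = random_safe_prime(n + 2)
    while True:
        r = randint(2, p - 1)
        # p - 1 = 2q, so r is a primitive root iff r^2 != 1 and r^q != 1.
        if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
            i = 1
            while True:
                x = pow(r, i, p)
                if x == 1:
                    # Back at 1: the whole cycle has been visited.
                    return
                # Shift 2..p-1 down to 0..p-3 and keep values below n.
                if 0 <= x - 2 < n:
                    yield x - 2
                i += 1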

Constraint Satisfaction: Choosing real numbers with certain characteristics

I have a set of n real numbers. I also have a set of functions,
f_1, f_2, ..., f_m.
Each of these functions takes a list of numbers as its argument. I also have a set of m ranges,
[l_1, u_1], [l_2, u_2], ..., [l_m, u_m].
I want to repeatedly choose a subset {r_1, r_2, ..., r_k} of k elements such that
l_i <= f_i({r_1, r_2, ..., r_k}) <= u_i for 1 <= i <= m.
Note that the functions are smooth. Changing one element in {r_1, r_2, ..., r_k} will not change f_i({r_1, r_2, ..., r_k}) by much. average and variance are two f_i that are commonly used.
These are the m constraints that I need to satisfy.
Moreover I want to do this so that the set of subsets I choose is uniformly distributed over the set of all subsets of size k that satisfy these m constraints. Not only that, but I want to do this in an efficient manner. How quickly it runs will depend on the density of solutions within the space of all possible solutions (if this is 0.0, then the algorithm can run forever). (Assume that f_i (for any i) can be computed in a constant amount of time.)
Note that n is large enough that I cannot brute-force the problem. That is, I cannot just iterate through all k-element subsets and find which ones satisfy the m constraints.
Is there a way to do this?
What sorts of techniques are commonly used for a CSP like this? Can someone point me in the direction of good books or articles that talk about problems like this (not just CSPs in general, but CSPs involving continuous, as opposed to discrete values)?
Assuming you're looking to write your own application and use existing libraries to do this, there are choices in many languages, like Python-constraint, or Cream or Choco for Java, or CSP for C++. The way you've described the problem, it sounds like you're looking for a general-purpose CSP solver. Are there any properties of your functions that may help reduce the complexity, such as being monotonic?
Given the problem as you've described it, you can pick from each range r_i uniformly and throw away any m-dimensional point that fails to meet the criterion. It will be uniformly distributed because the original is uniformly distributed and the set of subsets is a binary mask over the original.
Without knowing more about the shape of f, you can't make any guarantees about whether time is polynomial or not (or even have any idea of how to hit a spot that meets the constraint). After all, if f_1 = (x^2 + y^2 - 1) and f_2 = (1 - x^2 - y^2) and the constraints are f_1 < 0 and f_2 < 0, you can't satisfy this at all (and without access to the analytic form of the functions, you could never know for sure).
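The throw-away idea in this answer is plain rejection sampling, which might be sketched in Python as below. The data set, the value of k, the two constraint ranges and the try budget are all invented for the example, and sample_subset is a hypothetical helper.

```python
import random
from statistics import mean, pvariance

def sample_subset(numbers, k, constraints, rng, max_tries=100_000):
    """Draw uniform k-subsets until one satisfies every (f, low, high)."""
    for _ in range(max_tries):
        subset = rng.sample(numbers, k)
        if all(low <= f(subset) <= high for f, low, high in constraints):
            return subset
    return None  # density of solutions too low for the try budget

rng = random.Random(0)
numbers = [rng.uniform(0, 100) for _ in range(1000)]
constraints = [(mean, 40, 60), (pvariance, 500, 1200)]
subset = sample_subset(numbers, 10, constraints, rng)
print(subset is not None)
```

Every accepted subset is uniformly distributed over the feasible subsets, because each candidate is drawn uniformly and the accept test depends only on the candidate. The expected number of tries is the reciprocal of the solution density, which is why the approach breaks down when feasible subsets are rare.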
Given the information in your message, I'm not sure it can be done at all...
Consider:
numbers = {1....100}
m = 1 (keep it simple)
F1 = Average
L1 = 10
U1 = 50
Now, how many subsets of {1...100} can you come up with that produce an average between 10 & 50?
This looks like a very hard problem. For the simplest case with linear functions you could take a look at linear programming.
