Generating Diffie-hellman parameters (generator) - algorithm

I'm trying to implement a diffie-hellman key exchange. Let's say I found a large prime number p - how can I find a generator g?
Restricted by the multiprecision library that I have to use, only a few basic operations (+, *, -, /, pow, modExp, modMult, mod, gcd, isPrime, genRandomPrime, genRandomBits, and a few more) are available.
Would it work to look for a safe prime q, so that every number n for which gcd(n,q) == 1 should be a generator, right?

You basically answered your question. Just the test gcd(n,q)==1 is not necessary since q is prime. It means that any number n, such that n < q does not have common factor with q and gcd(n,q) will always output 1.
You can check whether q=2p + 1 is prime number. If so, then ord(Zq) = q-1 = (2p+1)-1 = 2p. Since ord(x) | ord(Zq) for every x in Zq ord(x)=2 or ord(x)=p or ord(x)=2p. Thus you just need to check whether your randomly chosen element x from {2,...,q-1} is of order 2. If not then it is of order p or 2p and you can use it as a generator.

As a rule, don't ask programmers questions about cryptography. Cryptography is subtle and, as a result, difficult in invisible ways that lead readily to self-deception about one's own competence. Instead, ask cryptographers (many of which are also programmers). Stack Exchange has a cryptography board, where this question has already been answered.
https://crypto.stackexchange.com/questions/29926/what-diffie-hellman-parameters-should-i-use
I could quibble with the advice there, but it's basically sound. Unless you really want to learn the relevant mathematics, I'd defer to authorities; they're cited in the answer above.
As to the mathematics question you ask, here's a tiny introduction. The multiplicative group modulo a prime p has size p-1. (See Fermat's Little Theorem.) The order of any element must divide p-1. The most favorable case is where p-1=2q, where q is also prime.

You've already gotten the ritual admonishment not to roll your own crypto if you care at all about your security, so here's how to find a generator mod a safe prime q. A number g in the closed range [2, q - 2] is a generator if and only if g^((q-1)/2) != 1 mod q, which you should compute with the standard algorithm for modular exponentiation. Choose random values of g until one passes the test.

Related

Hashing with the Division Method - Choosing number of slots?

So, in CLRS, there's this quote
A prime not too close to an exact power of 2 is often a good choice for m.
Several Questions...
I understand how a power of 2 will just be the lower order bits of your key...however, say you have keys from a universe of 1 to 1 million, with each key having an equal probability of being any number from universe (which I'm guessing is a common assumption about your universe if given no other data?) then wouldn't taking say the 4 lower order bits result in (2^4) lower order bit patterns that were pretty much equally likely for the keys from 1 to 1 million? How am I thinking about this incorrectly?
Why a prime number? So, if power of 2's aren't a good idea, why is a prime number a better choice as opposed to a composite number close to a power of 2 (Also why should it be close to a power of 2...lol)?
You are trying to find a hash table that works well for typical input data, and typical input data does things that you wouldn't expect from good random number generators. Very often you get formatted or semi-formatted strings which, when converted to numbers, end up as K, K+A, K+2A, K+3A,.... for some integers K and A. If K+xA and K+yA hash to the same number mod m, then (x-y)A must be 0 mod m. If m is prime, this can only happen if A = 0 mod m or if x = y mod m, so one time in m. But if m=pq and A happens to be divisible by p, then you get a collision every time x-y is divisible by q, which is more often since q < m.
I guess close to a power of 2 because it might be convenient for the memory management system to have blocks of memory of the resulting size - I really don't know. If you really care, and if you have the time, you could try different primes with some representative data and see which of them are best in practice.

recurrence using (mod 2^32+1)

Where m = 2^32+1 = 641*6700417 the mod function is little more than a single subtract on 32-bit processors. I don’t care that the recurrence
Seed = Seed*a%m
is not a good random number generator. I wish to use it in an encryption algorithm as a 32-bit wide sbox. Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 2^32 values?
Assuming that such an algorithm exits I suspect that if a*b%m = 1 then the recurrence using “b” would run backwards. Is what I suspect true. I would use “b” to implement the inverse sbox.
I can do everything I ask using mod (2^16+1) but that number is prime.
Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 232 values?
Yes there is:
return false;
The most obvious reason is that the set of all 232 possible values includes the value zero, and there the recurrence gets stuck, so it isn't cyclic. But even if you exclude zero, if you start with a multiple of 641, then you will only ever visit multiples of 641, and the same holds for the other factor.
This kind of “visit all values” property only works if you reduce modulo some prime, and if you exclude zero.
This is not a very simple question. It is easy to answer that in your case the numbers you use are poor. However, the easy answer: use primes is not the correct answer, either.
If we use the recurrence:
r := (r a) mod m
it is easy to see, that this gives the maximal cycle (m-1) if and only if all a^i mod m give different numbers for i = 0 .. m-2. However, this does not automatically happen even if both a and m are primes.
An example of two primes which cannot be used: a = 13, m = 17, because 13^4 mod 17 == 1, and the cycle will be very short (4 steps).
So, we need some other requirements, as well. To make a very long story short, a generator of this type (multiplicative congruential generator) produces the maximal cycle (m-1) if:
m is prime
a is a primitive root of m
Unfortunately, the latter requirement is a bit difficult, as there is no general formula to find primitive roots. (And please note that a does not have to be a prime, for example the combination m=17 and a=10 gives the full cycle.)
So, despite this being a seemingly simple problem, it touches some rather fundamental aspects of number theory.

Create a random permutation of 1..N in constant space

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).
If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
One way to do it would be
Find a prime p larger than N, preferably not much larger.
Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.
For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.
For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.
Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).
Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.
So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if
x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...
thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.
Another way to do this is with a block cipher; see this blog post for details.
The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.
Consider the prime 3. To fully express all possible outputs, think of it this way...
bias + step mod prime
The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.
For example. A simple sequence of 0, 1, 2 would be...
0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2
Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...
1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1
You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permuations:
0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0
If you do the math on the variables above you'll not that it results in the same information requirements...
1/3! = 1/6 = 1.66..
... vs...
1/3 (bias) * 1/2 (step) => 1/6 = 1.66..
Restrictions are simple, bias must be within 0..P-1 and step must be within 1..P-1 (I have been functionally just been using 0..P-2 and adding 1 on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).
Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.
Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).
The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.
If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f if a factor of b).
Since bis usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.
This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.
Unfortunately LCG Is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on #JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
def random_safe_prime(lbound):
while True:
q = random_prime(lbound, lbound=lbound // 2)
p = 2 * q + 1
if is_prime(p):
return p, q
def random_permutation(n):
p, q = random_safe_prime(n + 2)
while True:
r = randint(2, p - 1)
if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
i = 1
while True:
x = pow(r, i, p)
if x == 1:
return
if 0 <= x - 2 < n:
yield x - 2
i += 1

finding smallest scale factor to get each number within one tenth of a whole number from a set of doubles

Suppose we have a set of doubles s, something like this:
1.11, 1.60, 5.30, 4.10, 4.05, 4.90, 4.89
We now want to find the smallest, positive integer scale factor x that any element of s multiplied by x is within one tenth of a whole number.
Sorry if this isn't very clear—please ask for clarification if needed.
Please limit answers to C-style languages or algorithmic pseudo-code.
Thanks!
You're looking for something called simultaneous Diophantine approximation. The usual statement is that you're given real numbers a_1, ..., a_n and a positive real epsilon and you want to find integers P_1, ..., P_n and Q so that |Q*a_j - P_j| < epsilon, hopefully with Q as small as possible.
This is a very well-studied problem with known algorithms. However, you should know that it is NP-hard to find the best approximation with Q < q where q is another part of the specification. To the best of my understanding, this is not relevant to your problem because you have a fixed epsilon and want the smallest Q, not the other way around.
One algorithm for the problem is (Lenstra–Lenstra)–Lovász's lattice reduction algorithm. I wonder if I can find any good references for you. These class notes mention the problem and algorithm, but probably aren't of direct help. Wikipedia has a fairly detailed page on the algorithm, including a fairly large list of implementations.
To answer Vlad's modified question (if you want exact whole numbers after multiplication), the answer is known. If your numbers are rationals a1/b1, a2/b2, ..., aN/bN, with fractions reduced (ai and bi relatively prime), then the number you need to multiply by is the least common multiple of b1, ..., bN.
This is not a full answer, but some suggestions:
Note: I'm using "s" for the scale factor, and "x" for the doubles.
First of all, ask yourself if brute force doesn't work. E.g. try s = 1, then s = 2, then s = 3, and so forth.s
We have a list of numbers x[i], and a tolerance t = 1/10. We want to find the smallest positive integer s, such that for each x[i], there is an integer q[i] such that |s * x[i] - q[i]| < t.
First note that if we can produce an ordered list for each x[i], it's simple enough to merge these to find the smallest s that will work for all of them. Secondly note that the answer depends only on the fractional part of x[i].
Rearranging the test above, we have |x - q/s| < t/s. That is, we want to find a "good" rational approximation for x, in the sense that the approximation should be better than t/s. Mathematicians have studied a variant of this where the criterion for "good" is that it has to be better than any with a smaller "s" value, and the best way to find these is through truncations of the continued fraction expansion.
Unfortunately, this isn't quite what you need, since once you get under your tolerance, you don't necessarily need to continue to get increasingly better -- the same tolerance will work. The next obvious thing is to use this to skip to the first number that would work, and do brute force from there. Unfortunately, for any number the largest the first s can be is 5, so that doesn't buy you all that much. However, this method will find you an s that works, just not the smallest one. Can we use this s to find a smaller one, if it exists? I don't know, but it'll set an upper limit for brute forcing.
Also, if you need the tolerance for each x to be < t, than this means the tolerance for the product of all x must be < t^n. This might let you skip forward a great deal, and set a reasonable lower limit for brute forcing.

Why is Random class isn't really random? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I've read about it on a message board - Random class isn't really random. It is created with predictable fashion using a mathematical formula.
Is it really true? If so, Random isn't really random??
Because deterministic computers are really bad at generating "true" random numbers by themselves.
Also, a predictable/repeatable random sequence is often surprisingly useful, since it helps in testing.
It's really hard to create something that is absolutely random. See the Wikipedia articles on randomness and pseudo-randomness
As others have already said, Random creates pseudo-random numbers, depending on some seed value. It may be helpful to know that the .NET class Random has two constructors:
Random(int Seed)
creates a random number generator with a given seed value, helpful if you want reproducible behaviour of your program. On the other hand,
Random()
creates a random number generator with date-time depending seed value, which means, almost every time you start your program again, it will produce a different sequence of (pseudo-)random numbers.
The sequence is predictable for each starting seed. For different seeds, different sequences of numbers are returned. If the seed used is itself random (such as the DatetTime.Now.Ticks), then the numbers returned a adequately 'random'.
Alternatively, you can use a cryptographic random number generator such as the RNGCryptoServiceProvider class.
It isn't random it's a random-like number generating algorithm and it's based on a number to generate. If you set that random number to something like the system time the numbers are more close to random, but if you use these numbers to lets say, an encryption algorithm, is the attacker knows WHEN you generate the random numbers and the algorithm you use, then it is more possible that your encryption will break.
The only way to generate true random numbers is to measure something natural, for example voltage levels or have a microphone picking up sounds somewhere or something like that.
It is true, but you can always seed the random number generator with some time dependent value, or if you're really prepared to push the boat out, look at www.random.org...
In the case of the Random class though, I think it should be random enough for most requirements... I can't see a method to actually seed it, so I'm guessing it must automatically seed as built in behaviour...
Correct. Class Random is not absolutely totally random. The important question is, is it as statistically close to being random as you need it to be. The output from class Random is statistically as nearly random as a reasonable deterministic program can be. The algorithm uses a 48-bit seed modified by a linear congruential formula. If the Random object is created using the parameterless constructor, the 48 low-order bits of milli-second time get used as the seed. If the Random object is created using the seed parameter (a long), the 48 low-order bits of the long get used as the seed.
If Random is instanced with the same seed and make the exact same sequence of next calls are made from it, the exact same sequence of values will result from that instance. This is deliberate to allow for predictable software testing and dmonstrations. Ordinarliy, Random is not used with a constant seed for operational use since it is usually used to get unpredictable psuedo-random sequences. If two instances of Random with the parameterless constructors get created in the same clock millisecond, they will also get the same sequences from both instances. It is important to note that eventually, a Random instance will repeat its pattern. Therefore, a Random instance should not be used for enormously long sequences before creating a new instance.
There is no reason not to use the Random class except for high-security cryptographic applications or some special need where some aspect of true randomness is of paramount importance, something that is uncommon. In those cases, you really need a hardware randomizer that uses radioactive decay or infinitesimal molecular level brownian motion induced randomness to generate a random result. Sun SPARC hardware platforms had such hardware installable. Other platforms can have them too, along with the hardware drivers that give access to the randomness they generate.
The algorithm used in class Random is the result of considerable research by some of the best minds in computer science and mathematics. Given the right parameters, it provides remarkable and outstanding results. Other more recent algorithms may be better for some limited applications, but they also have performance or specific application issues that make them less suitible for general purpose use. The linear congruential algorithm still remains one of the most widely used general purpose pseudo-random number generators.
The following quote is from Donald Knuth's book, The Art of Computer Programming, Volume 2, Semi-numerical Algorithms, Section 3.2.1. The quote describes the linear congruential method and discusses its properties. If you don't know who Donald Knuth is or have never read any of his papers or books, he, amongst other things, showed that there can be no sort faster than Tony Hoare's Quicksort with partion pivot strategies created by Robert Sedgewick. Robert Sedgewick, who suggested the best simple pivot selection strategies for Quicksort, did his doctoral thesis on Quicksort under Donald Knuth's supervision. Knuth's multi-volume work, The Art Of Computer Programming, is one of the greatest expositions of the most important theoretical aspects of computing ever assembled, including sorting, searching and randomizing algorithms. There is a lot of discussion in Chapter 3 of this about what randomness really is, statistically and philosophically, and about software that emmulates true randomness to the point where it is statistically nearly indistinguishable from it for very large, but still finite, sequences. What follows is pretty heavy reading:
3.2.1. The Linear Congruential Method
By far the most popular random number generators in use today are special cases of the following
scheme, introduced by D. H. Lehmer in 1949. [See Proc. 2nd Symp. on
Large-Scale Digital Calculating Machinery (Cambridge, Mass.: Harvard
University Press, 1951), 141-146.]
We choose four magic integers:
m, the modulus; 0 < m.
a, the multiplier; 0 <= a < m.
c, the increment; 0 <= c < m.
X[0], the starting value; 0 <= X[0] < m. (equation 1)
The desired sequence of random numbers (X[n] ) is then obtained by setting
X[n+1] = (a * X[n] + c) mod m, n >= O. (equation 2)
This is called a linear congruential sequence. Taking the remainder mod m is somewhat
like determining where a ball will land in a spinning roulette wheel.
For example, the sequence obtained when m == 10 and X[0] == a == c == 7 is
7, 6, 9, 0, 7, 6, 9, 0, ... . (example 3)
As this example shows, the
sequence is not always "random" for all choices of m, a, c, and X[0];
the principles of choosing the magic numbers appropriately will be
investigated carefully in later parts of this chapter.
Example (3)
illustrates the fact that the congruential sequences always get into
a loop: There is ultimately a cycle of numbers that is repeated
endlessly. This property is common to all sequences having the
general form X[n+1] = f(X[n]), when f transforms a finite set into
itself; see exercise 3.1-6. The repeating cycle is called the period;
sequence (3) has a period of length 4. A useful sequence will of
course have a relatively long period.
The special case c == 0 deserves
explicit mention, since the number generation process is a little
faster when c == 0 than it is when c != O. We shall see later that the
restriction c == 0 cuts down the length of the period of the sequence,
but it is still possible to make the period reasonably long. Lehmer's
original
generation method had c == 0, although he mentioned c != 0 as a possibility; the fact that c
!= 0 can lead to longer periods is due to Thomson [Compo J. 1 (1958), 83, 86] and, independently, > to Rotenberg [JACM 7 (1960), 75-77]. The
terms multiplicative congruential method and mixed congruential
method are used by many authors to denote linear congruential
sequences with c == 0 and c != 0, respectively.
The letters m, a, c,
and X[0] will be used throughout this chapter in the sense described
above. Furthermore, we will find it useful to define
b = a - 1, (equation 4)
in order to simplify many of our formulas.
We can immediately reject the case a == 1, for this would mean that X[n] = (X[0]
+ n * c) mod m, and the sequence would certainly not behave as a random sequence. The case a == 0 is even worse. Hence for practical purposes
we may assume that
a >= 2, b >= 1. (equation 5)
Now we can prove a generalization of Eq. (2),
X[n+k] = (a^k * X[n] + (a^k - 1) * c / b) mod m, k >= 0, n >= 0,
(equation 6)
which expresses the (n+k)th term directly in terms of the nth
term. (The special case n == 0 in this equation is worthy of note.) It
follows that the subsequence consisting of every kth term of (X[n])
is another linear congruential sequence, having the multiplier a k
mod m and the increment ((a^k - 1) * c / b) mod m.
An important corollary
of (6) is that the general sequence defined by m, a, c, and X[0] can be
expressed very simply in terms of the special case where c == 1 and X[0]
== O. Let
Y[0] = 0, Y[n+1] = (a * Y[n+1]) mod m. (equation 7)
According to Eq. (6) we will have Y[k] === (a^k - 1) / b(modulo m), hence the general
sequence defined in (2) satisfies
X[n] = (A * Y[n] + X[0]) mod m, where A == (X[0] * b + c) mod m. (equation 8)

Resources