Why modulo 65521 in Adler-32 checksum algorithm? - algorithm

The Adler-32 checksum algorithm does sums modulo 65521. I know that 65521 is the largest prime number that fits in 16 bits, but why is it important to use a prime number in this algorithm?
(I'm sure the answer will seem obvious once someone tells me, but the number-theory parts of my brain just aren't working. Even without expertise in checksum algorithms, a smart person who reads http://en.wikipedia.org/wiki/Fletcher%27s_checksum can probably explain it to me.)

Why was mod prime used for Adler32?
From Adler's own website http://zlib.net/zlib_tech.html
However, Adler-32 has been constructed to minimize the ways to make small changes in the data that result in the same check value, through the use of sums significantly larger than the bytes and by using a prime (65521) for the modulus. It is in this area that some analysis is deserved, but it has not yet been done.
The main reason for Adler-32 is, of course, speed in software implementations.
An alternative to Adler-32 is Fletcher-32, which replaces the modulo of 65521 with 65535. This paper shows that Fletcher-32 is superior for channels with low-rate random bit errors.
It was used because primes tend to have better mixing properties. Exactly how good it is remains to be discussed.
Other Explanations
Someone else in this thread makes a somewhat convincing argument that a prime modulus is better for detecting bit-swapping. However, this is most likely not the case because bit-swapping is extremely rare. The two most prevalent errors are:
Random bit-flips (1 <-> 0) common anywhere.
Bit shifting (1 2 3 4 5 -> 2 3 4 5 or 1 1 2 3 4 5) common in networking
Most of the bit-swapping out there is caused by random bit-flips that happened to look like a bit swap.
Error correction codes are, in fact, designed to withstand n bits of deviation. From Adler's website:
A properly constructed CRC-n has the nice property that less than n bits in error is always detectable. This is not always true for Adler-32--it can detect all one- or two-byte errors but can miss some three-byte errors.
Effectiveness of using a prime modulus
I did a long writeup on essentially the same question. Why modulo a prime number?
http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth
The short answer
We know much less about prime numbers than composite ones. Therefore people like Knuth started using them.
While it might be true that primes have less relationship to much of the data we hash, increasing the table/modulo size also decreases the probability of a collision (sometimes more than any benefit gained from rounding down to the nearest prime).
Here is a graph of collisions per bucket with 10 million cryptographically random integers comparing mod 65521 vs 65535.

The Adler-32 algorithm is to compute
A = 1 + b1 + b2 + b3 + ...
and
B = (1 + b1) + (1 + b1 + b2) + (1 + b1 + b2 + b3) + ... = n + n * b1 + (n - 1) * b2 + ... + 1 * bn (for n bytes, so each byte is weighted by its position)
and report them modulo m. When m is prime, the numbers modulo m form what mathematicians call a field. Fields have the handy property that for any nonzero c, we have a = b if and only if c * a = c * b. Compare the times table modulo 6, which is not a prime, with the times table modulo 5, which is:
* 0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 1 2 3 4 5
2 0 2 4 0 2 4
3 0 3 0 3 0 3
4 0 4 2 0 4 2
5 0 5 4 3 2 1
* 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
Now, the A part gets fooled whenever we interchange two bytes -- addition is commutative after all. The B part is supposed to detect this kind of error, but when m is not a prime, more locations are vulnerable. Consider an Adler checksum mod 6 of
1 3 2 0 0 4
Ignoring the constant term and weighting the i-th byte by i, we have A = 4 and B = 1 (modulo 6). Now consider swapping b2 and b4:
1 0 2 3 0 4
A and B are unchanged because 2 * 3 = 4 * 0 = 2 * 0 = 4 * 3 (modulo 6). One can also swap b3 and b6 to the same effect, since 3 * 2 + 6 * 4 = 3 * 4 + 6 * 2 (modulo 6). This is more likely when the times table is unbalanced -- modulo 5, these changes are detected. In fact, the only time a prime modulus fails to detect a single swap of unequal values is when two equal indexes mod m are swapped (and if m is big, they must be far apart!).^ This logic can also be applied to interchanged substrings.
The disadvantage in using a smaller modulus is that it will fail slightly more often on random data; in the real world, however, corruption is rarely random.
^ Proof: suppose that we swap indexes i and j with values a and b. Then ai + bj = aj + bi, so ai - aj + bj - bi = 0 and (a - b)*(i - j) = 0. Since a field is an integral domain, it follows that a = b (values are congruent) or i = j (indexes are congruent).
EDIT: the website that Unknown linked to (http://www.zlib.net/zlib_tech.html) makes it clear that the design of Adler-32 was not at all principled. Because of the Huffman code in a DEFLATE stream, even small errors are likely to change the framing (because it's data-dependent) and cause large errors in the output. Consider this answer a slightly contrived example for why people ascribe certain properties to primes.
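The swap example in this answer can be checked mechanically. The following is a minimal Python sketch, not zlib's implementation: `adler` is a hypothetical helper that applies the usual Adler recurrences (A starts at 1, B accumulates A) with a configurable modulus. Because it uses the real recurrences, the individual A and B values need not match the hand computation above, but the collision is the same: swapping b2 and b4 goes unnoticed modulo 6 and is caught modulo 5.

```python
import zlib


def adler(data, modulus=65521):
    """Return (A, B) for an Adler-style checksum with the given modulus."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % modulus  # running sum of the bytes
        b = (b + a) % modulus     # running sum of the running sums
    return a, b


original = [1, 3, 2, 0, 0, 4]
swapped = [1, 0, 2, 3, 0, 4]  # b2 and b4 interchanged

print(adler(original, 6) == adler(swapped, 6))  # True: the swap is missed modulo 6
print(adler(original, 5) == adler(swapped, 5))  # False: the swap is caught modulo 5

# With the real modulus the helper agrees with zlib's Adler-32.
payload = b"Wikipedia"
a, b = adler(payload)
print(((b << 16) | a) == zlib.adler32(payload))  # True
```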

Long story short:
The modulo of a prime has the best bit-shuffling properties, and that's exactly what we want for a hash-value.

For perfectly random data, the more buckets the better.
Let's say the data is non-random in some way. Now, the only way that the non-randomness could affect the algorithm is by creating a situation where some buckets have a higher probability of being used than others.
If the modulo number is non-prime, then any pattern affecting one of the numbers making up the modulo could affect the hash. So if you're using 15, a pattern every 3 or 5 as well as every 15 could cause collisions, while if you're using 13 the pattern would have to be every 13 to cause collisions.
65535 = 3*5*17*257, so a pattern involving 3 or 5 could cause collisions using this modulo-- if multiples of 3 were much more common for some reason, for instance, then only the buckets which were multiples of 3 would be put to good use.
Now I'm not sure whether, realistically, this is likely to be an issue. It would be good to determine the collision rate empirically with actual data of the type one wants to hash, not random numbers. (For instance, would numerical data involving Benford's Law (http://en.wikipedia.org/wiki/Benford's_law) or some such irregularity cause patterns that would affect this algorithm? How about using ASCII codes for realistic text?)
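To make the bucket argument concrete, here is a small Python sketch under an artificial assumption: every key is a multiple of 3. The helper `used_buckets` and the key range are illustrative only and say nothing about real-world data.

```python
def used_buckets(keys, modulus):
    """Count how many distinct residues (buckets) the keys land in."""
    return len({key % modulus for key in keys})


# 200,000 keys that are all multiples of 3 (a deliberately patterned input).
keys = range(0, 3 * 200_000, 3)

print(used_buckets(keys, 65535))  # 21845: only a third of the buckets, since 3 divides 65535
print(used_buckets(keys, 65521))  # 65521: every bucket is reached with the prime modulus
```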

Checksums are generally used with the intention of detecting that two things are different, especially in cases where both things are not available at the same time and place. They might be available at different places (e.g. a packet of information as sent, versus a packet of information as received), or different times (e.g. a block of information when it was stored, versus a block of information when it was read back). In some cases, it may be desirable to check whether two things that are stored independently in two different places are likely to match, without having to send the actual data from one device to the other (e.g. comparing loaded code images or configurations).
If the only reasons that the things being compared wouldn't match would be random corruption of one of them, then the use of a prime modulus for an Adler-32 checksum is probably not particularly helpful. If, however, it's possible that one of the things might have had some 'deliberate' changes made to it, use of a non-prime modulus may cause certain changes to go unnoticed. For example, the effects of changing a byte from 00 to FF, and changing another byte that's some multiple of 257 bytes earlier or later from FF to 00, would cancel out when using Fletcher's checksum, but not when using an Adler-32 checksum. It's not particularly likely that such a scenario would occur from random corruption, but such offsetting changes could occur when changing a program. It wouldn't be especially likely that they'd occur an exact multiple of 257 bytes apart, but it's a risk which can be avoided by using a prime modulus (provided, at least, that the number of bytes in the file is smaller than the modulus).
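The cancellation described here follows from 255 * 257 = 65535. A minimal Python sketch, assuming the Adler-style byte-wise sums used throughout this thread with only the modulus changed (real Fletcher-32 sums 16-bit words, so this models the comparison made in the text rather than the exact Fletcher algorithm); `byte_sums` and the buffer contents are made up for illustration:

```python
def byte_sums(data, modulus):
    """Adler-style running sums (A, B) over bytes with a configurable modulus."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % modulus
        b = (b + a) % modulus
    return a, b


before = bytearray(b"\x55" * 600)
before[100] = 0x00
before[357] = 0xFF  # exactly 257 bytes after position 100

after = bytearray(before)
after[100] = 0xFF   # 00 -> FF
after[357] = 0x00   # FF -> 00

print(byte_sums(before, 65535) == byte_sums(after, 65535))  # True: the changes cancel
print(byte_sums(before, 65521) == byte_sums(after, 65521))  # False: the prime modulus notices
```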

The answer lies in field theory.
The set Z/nZ with the operations plus and times (i.e. addition and multiplication modulo n) is a field when n is a prime.
In other words, the following equation:
m * x = 0 (in Z/nZ)
has only one solution for any nonzero value of m (namely x = 0)
Consider this example:
2 * x = 0 (mod 10)
This equation has two solutions, x = 0 AND x = 5. That is because 10 is not a prime and can be written as 2 * 5.
This property is responsible for better distribution of the hash values.

Related

Knowing the number of bits to represent a number

I learned that the number of bits needed to represent a number n is found by taking the logarithm of n, i.e. log(n) (base 2). However, I am not convinced! Look at my example:
if n=4, then I need log4 = 2 bits to represent 4, but 4 is (100) in binary which is clearly 3 bits!!
Can someone explain why?
Thank you.
Are you sure you aren't talking about n bit arrangements ?
With 2 bits you have 4 different sequences:
00
01
10
11
The number 4 is effectively 100 in binary, but I'm suspecting that you mixed those concepts.
The most direct scheme is to take ceil(log2(N+1)), with log2 computed in floating point.
In pure integer arithmetic, a naive scheme would be to divide the number by 2 (integer division, thus truncating) until you get a result of zero (e.g. 4/2=2, 2/2=1, 1/2=0 - three divisions to reach zero, thus 3 bits are needed).
More advanced schemes exist, but going down that path may hurt your performance - modern CPUs have instructions to detect the position of the most significant bit set to 1 in a number, and those instructions take very few CPU cycles.
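A short Python sketch of the two schemes above, next to the built-in `int.bit_length()`. The function names are illustrative. The floating-point version can be off for very large integers, so the integer methods are the safer choice there.

```python
import math


def bits_by_log(n):
    """ceil(log2(n + 1)), computed in floating point."""
    return math.ceil(math.log2(n + 1))


def bits_by_division(n):
    """Count how many integer halvings it takes to reach zero."""
    bits = 0
    while n > 0:
        n //= 2
        bits += 1
    return bits


print(bits_by_log(4), bits_by_division(4), (4).bit_length())  # 3 3 3
```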

Improve number compression algorithm?

I have many unique numbers, all positive and the order doesn't matter, 0 < num < 2^32.
Example: 23 56 24 26
The biggest, 56, needs 6 bits space. So, I need: 4*6 = 24 bits in total.
I do the following to save space:
I sort them first: 23 24 26 56 (because the order doesn't matter)
Now I get the difference of each from the previous: 23 1 2 30
The biggest, 30, needs 5 bits space.
After this I store all the numbers in 4*5 bits = 20 bits space.
Question: how to further improve this algorithm?
More information: Since requested, the numbers are mostly on the range of 2.000-4.000. Numbers less than 300 are pretty rare. Numbers more than 16.000 are pretty rare also. Generally speaking, all the numbers will be close. For example, they may be all in the 1.000-2.000 range or they may all be in the 16.000-20.000 range. The total number of numbers will be something in the range of 500-5.000.
Your first step is a good one to take, because sorting makes the differences as small as possible. Here is a way to improve your algorithm:
sort and calculate differences as you have done.
Use Huffman coding on it.
Use of Huffman coding is more important than your step; I'll show you why:
consider the following data:
1 2 3 4 5 6 7 4294967295
where 4294967295 = 2^32-1. Using your algorithm:
1 1 1 1 1 1 1 4294967288
total bits needed is still 32*8
Using Huffman coding, the frequencies are:
1 => 7
4294967288 => 1
Huffman codes are 1 => 0 and 4294967288 => 1
Total bits needed = 7*1 + 1 = 8 bits
Huffman coding reduces size by 32*8/8 = 32 times
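The bit count in this example can be reproduced with a small Huffman sketch in Python. This is an illustration only: `huffman_code_lengths` is a hypothetical helper built on the standard `heapq` module, and the total ignores the space needed to store the code table itself, which any real format would also have to pay for.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


values = [1, 2, 3, 4, 5, 6, 7, 4294967295]
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
lengths = huffman_code_lengths(deltas)
total_bits = sum(lengths[d] for d in deltas)
print(deltas)      # [1, 1, 1, 1, 1, 1, 1, 4294967288]
print(total_bits)  # 8
```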
This problem is well known in the database community as "Inverted index compression". You can google for some papers.
Following are some of the most common techniques:
Variable byte coding (VByte)
Simple9, Simple16
"Frame Of Reference" family of techniques
PForDelta
Adaptive Frame Of Reference (AFOR)
Rice-Golomb coding (often used as a part of other techniques)
VByte and Simple9/16 are easiest to implement, fast and have good compression ratio in practice.
Huffman coding is not very good for index compression because it is slow and differences are quite random in practice. (But it may be a good choice in your case.)
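As an illustration of the simplest technique in the list, here is a Python sketch of delta encoding followed by variable-byte coding. It assumes one common byte convention (7 data bits per byte, high bit set when more bytes follow); VByte implementations differ on that detail, and all function names here are made up.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, high bit = 'more bytes follow'."""
    out = bytearray()
    for n in numbers:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(current)
            current, shift = 0, 0
    return numbers


def compress(values):
    """Sort, take differences, then variable-byte encode the differences."""
    ordered = sorted(values)
    if not ordered:
        return b""
    deltas = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    return vbyte_encode(deltas)


def decompress(data):
    """Rebuild the sorted values by summing the decoded differences."""
    total, result = 0, []
    for delta in vbyte_decode(data):
        total += delta
        result.append(total)
    return result


packed = compress([23, 56, 24, 26])
print(len(packed), decompress(packed))  # 4 [23, 24, 26, 56]
```

Each small difference costs one byte here, so this is coarser than the bit-packing in the question, but it needs no shared bit width and handles an occasional large gap cheaply.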
How many numbers do you have? If your set covers the range [0..(2^32)-1] densely enough (you do the maths), then a bitfield of 2^32 bits (512 MiB), where the n-th bit represents the presence or absence of the natural number n, may be useful.
If your numbers are not uniformly distributed, better compression will be achieved by using the frequencies of the numbers and assigning fewer bits to the most frequent ones. This is the idea behind Huffman coding.

How to generate a function that will algebraically encode a sequence?

Is there any way to generate a function F that, given a sequence, such as:
seq = [1 2 4 3 0 5 4 2 6]
Then F(seq) will return a function that generates that sequence? That is,
F(seq)(0) = 1
F(seq)(1) = 2
F(seq)(2) = 4
... and so on
Also, if it is, what is the function of lowest complexity that does so, and what is the complexity of the generated functions?
EDIT
It seems like I'm not clear, so I'll try to exemplify:
F([1 3 5 7 9])
# returns something like:
F(x) = 1 + 2*x
# limited to the domain x ∈ [0 1 2 3 4]
In other words, I want to compute a function that can be used to algebraically, using mathematical functions such as +, *, etc, restore a sequence of integers, even if you cleaned it from memory. I don't know if it is possible, but, as one could easily code an approximation for such function for trivial cases, I'm wondering how far it goes and if there is some actual research concerning that.
EDIT 2 Answering another question, I'm only interested in sequences of integers - if that is important.
Please let me know if it is still not clear!
Well, if you just want to know a function with "+ and *", that is to say, a polynomial, you can go and check Wikipedia for Lagrange Polynomial (https://en.wikipedia.org/wiki/Lagrange_polynomial).
It gives you the lowest degree polynomial that encodes your sequence.
Unfortunately, you probably won't be able to store less than before, as the probability of the polynomial being of degree d=n-1, where n is the size of the array, is very high with random integers.
Furthermore, you will have to store rational numbers instead of integers.
And finally, the access to any number of the array will be in O(d) (using Horner algorithm for polynomial evaluation), in comparison to O(1) with the array.
Nevertheless, if you know that your sequences may be very simple and very long, it might be an option.
If the sequence comes from a polynomial with a low degree, an easy way to find the unique polynomial that generates it is using Newton's series. Constructing the polynomial for n numbers has O(n²) time complexity, and evaluating it has O(n).
In Newton's series the polynomial is expressed in terms of x, x(x-1), x(x-1)(x-2) etc instead of the more familiar x, x², x³. To get the coefficients, basically you compute the differences between subsequent items in the sequence, then the differences between the differences, until only one is left or you get a sequence of all zeros. The first number of each row, divided by the factorial of the degree of the term, gives you the coefficient. For example with the first sequence you get these differences:
1 2 4 3 0 5 4 2 6
1 2 -1 -3 5 -1 -2 4
1 -3 -2 8 -6 -1 6
-4 1 10 -14 5 7
5 9 -24 19 2
4 -33 43 -17
-37 76 -60
113 -136
-249
The polynomial that generates this sequence is therefore:
f(x) = 1 + x(1 + (x-1)(1/2 + (x-2)(-4/6 + (x-3)(5/24 + (x-4)(4/120
+ (x-5)(-37/720 + (x-6)(113/5040 + (x-7)(-249/40320))))))))
It's the same polynomial you get using other techniques, like Lagrange interpolation; this is just the easiest way to generate it as you get the coefficients for a polynomial form that can be evaluated with Horner's method, unlike the Lagrange form for example.
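The difference table and the nested evaluation can be sketched in Python as follows. The names `newton_coefficients` and `evaluate` are illustrative, and exact rational arithmetic from the standard `fractions` module is used so that the coefficients match the fractions written out above.

```python
from fractions import Fraction
from math import factorial


def newton_coefficients(seq):
    """First entry of each difference row, divided by k! for row k."""
    coeffs = []
    row = list(seq)
    k = 0
    while row:
        coeffs.append(Fraction(row[0], factorial(k)))
        row = [b - a for a, b in zip(row, row[1:])]  # next row of differences
        k += 1
    return coeffs


def evaluate(coeffs, x):
    """Evaluate c0 + x*(c1 + (x-1)*(c2 + (x-2)*(...))) from the inside out."""
    result = Fraction(0)
    for k in range(len(coeffs) - 1, -1, -1):
        result = coeffs[k] + (x - k) * result
    return result


seq = [1, 2, 4, 3, 0, 5, 4, 2, 6]
coeffs = newton_coefficients(seq)
print([int(evaluate(coeffs, x)) for x in range(len(seq))])  # reproduces seq
```

For a sequence that really is a low-degree polynomial, such as 1 3 5 7 9, the trailing coefficients come out as zero and can be dropped.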
There is no magic if you say that the sequence could be completely random. And yet, it is always possible, but won't save you memory. Any interpolation method requires the same amount of memory in the worst case. Because, if it didn't, it would be possible to compress everything to a single bit.
On the other hand, it is sometimes possible to use a brute force, some heuristics (like genetic algorithms), or numerical methods to reproduce some kind of mathematical expression having a specified type, but good luck with that :)
Just use some archiving tools instead in order to save memory usage.
I think it will be useful for you to read about this: http://en.wikipedia.org/wiki/Entropy_(information_theory)

Perfect powers of numbers which can fit in 64 bit size integer (using priority queues)

How can we print out all perfect powers that can be represented as 64-bit long integers: 4, 8, 9, 16, 25, 27, .... A perfect power is a number that can be written as a^b for integers a and b ≥ 2.
It's not a homework problem, I found it in job interview questions section of an algorithm design book. Hint, the chapter was based on priority queues.
Most of the ideas I have are quadratic in nature, that keep finding powers until they stop fitting 64 bit but that's not what an interviewer will look for. Also, I'm not able to understand how would PQ's help here.
Using a small priority queue, with one entry per power, is a reasonable way to list the numbers. See following python code.
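import queue  # named Queue in Python 2

pmax, vmax = 10, 150
Q = queue.PriorityQueue(pmax)
# Seed the queue with one (value, base, exponent) entry per exponent: 2**2, 2**3, ...
p = 2
for e in range(2, pmax):
    p *= 2
    Q.put((p, 2, e))
print(1, 1, 2)
while not Q.empty():
    v, b, e = Q.get()
    if v < vmax:
        print(v, b, e)
        # Put back the next base for the same exponent.
        b += 1
        Q.put((b**e, b, e))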
With pmax, vmax as in the code above, it produces the following output. For the proposed problem, replace pmax and vmax with 64 and 2**64.
1 1 2
4 2 2
8 2 3
9 3 2
16 2 4
16 4 2
25 5 2
27 3 3
32 2 5
36 6 2
49 7 2
64 2 6
64 4 3
64 8 2
81 3 4
81 9 2
100 10 2
121 11 2
125 5 3
128 2 7
144 12 2
The complexity of this method is O(vmax^0.5 * log(pmax)). This is because the number of perfect squares is dominant over the number of perfect cubes, fourth powers, etc., and for each square we do O(log(pmax)) work for get and put queue operations. For higher powers, we do O(log(pmax)) work when computing b**e.
When pmax, vmax = 64, 2**64, there will be about 2*(2^32 + 2^21 + 2^16 + 2^12 + ...) queue operations, i.e. about 2^33 queue ops.
Added note: This note addresses cf16's comment, “one remark only, I don't think "the number of perfect squares is dominant over the number of perfect cubes, fourth powers, etc." they all are infinite. but yes, if we consider finite set”. It is true that in the overall mathematical scheme of things, the cardinalities are the same. That is, if P(j) is the set of all j'th powers of integers, then the cardinality of P(j) == P(k) for all integers j,k > 0. Elements of any two sets of powers can be put into 1-1 correspondence with each other.
Nevertheless, when computing perfect powers in ascending order, no matter how many are computed, finite or not, the work of delivering squares dominates that for any other power. For any given x, the density of perfect kth powers in the region of x declines exponentially as k increases. As x increases, the density of perfect kth powers in the region of x is proportional to (x^(1/k))/x, hence third powers, fourth powers, etc become vanishingly rare compared to squares as x increases.
As a concrete example, among perfect powers between 1e8 and 1e9 the number of (2; 3; 4; 5; 6)th powers is about (21622; 535; 77; 24; 10). There are more than 30 times as many squares between 1e8 and 1e9 than there are instances of any higher powers than squares. Here are ratios of the number of perfect squares between two numbers, vs the number of higher perfect powers: 10¹⁰–10¹⁵, r≈301; 10¹⁵–10²⁰, r≈2K; 10²⁰–10²⁵, r≈15K; 10²⁵–10³⁰, r≈100K. In short, as x increases, squares dominate more and more when perfect powers are delivered in ascending order.
A priority queue helps, for example, if you want to avoid duplicates in the output, or if you want to list the values particularly sorted.
Priority queues can often be replaced by sorting and vice versa. You could therefore generate all combinations of a^b, then sort the results and remove adjacent duplicates. In this application, this approach appears to be slightly but perhaps not dramatically memory-inefficient, as witnessed by one of the sister answers.
A priority queue can be superior to sorting, if you manage to remove duplicates as you go; or if you want to avoid storing and processing the whole result to be generated in memory. The other sister answer is an example of the latter but it could easily do both with a slight modification.
Here it makes the difference between an array taking up ~16 GB of RAM and a queue with less than 64 items taking up several kilobytes at worst. Such a huge difference in memory consumption also translates to RAM access time versus cache access time difference, so the memory lean algorithm may end up much faster even if the underlying data structure incurs some overhead by maintaining itself and needs more instructions compared to the naive algorithm that uses sorting.
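A minimal Python 3 sketch of the lean variant described above, using the standard `heapq` module: one heap entry per exponent, values produced in ascending order, duplicates dropped as they appear. The names `perfect_powers`, `limit` and `max_exp` are illustrative.

```python
import heapq


def perfect_powers(limit, max_exp):
    """Yield distinct b**e < limit (b >= 2, 2 <= e <= max_exp) in ascending order."""
    heap = [(2 ** e, 2, e) for e in range(2, max_exp + 1) if 2 ** e < limit]
    heapq.heapify(heap)
    last = None
    while heap:
        value, base, exp = heapq.heappop(heap)
        if value != last:  # equal values come out back to back, so this removes duplicates
            yield value
            last = value
        nxt = (base + 1) ** exp
        if nxt < limit:
            heapq.heappush(heap, (nxt, base + 1, exp))


print(list(perfect_powers(150, 9)))
# [4, 8, 9, 16, 25, 27, 32, 36, 49, 64, 81, 100, 121, 125, 128, 144]
```

For the 64-bit problem one would call it with `limit=2**64` and `max_exp=63`; the heap stays tiny, but the loop still runs on the order of 2^32 times.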
Because the size of the input is fixed, it is not technically possible for the methods you thought of to be quadratic in nature. Having two nested loops does not make an algorithm quadratic unless you can say that the upper bound of each such loop is proportional to input size (and often not even then). What really matters is how many times the innermost logic actually executes.
In this case the competition is between feasible constants and non-feasible constants.
The only way I can see the priority queue making much sense is that you want to print numbers as they become available, in strictly increasing order, and of course without printing any number twice. So you start off with a prime generator (that uses the sieve of eratosthenes or some smarter technique to generate the sequence 2, 3, 5, 7, 11, ...). You start by putting a triple representing the fact that 2^2 = 4 onto the queue. Then you repeat a process of removing the smallest item (the triple with the smallest exponentiation result) from the queue, printing it, increasing the exponent by one, and putting it back onto the queue (with its priority determined by the result of the new exponentiation). You interleave this process with one that generates new primes as needed (sometime before p^2 is output).
Since the largest base we can possibly have is 2^32 (because (2^32)^2 = 2^64), the number of elements on the queue shouldn't exceed the number of primes less than 2^32, which is evidently 203,280,221, which I guess is a tractable number.

Create a random permutation of 1..N in constant space

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the (nonzero) numbers in its range without repetition simply requires choosing a set of "taps" that correspond to a primitive polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).
If memory serves, there's at least one primitive polynomial of every possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed of an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
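A sketch of the LFSR approach in Python, assuming N fits in 16 bits. It uses a 16-bit Galois LFSR with the widely published tap mask 0xB400 (period 65535) and throws away states larger than N. The function name and default seed are illustrative, and a different register width would need a different tap mask from a table.

```python
def lfsr16_permutation(n, seed=0xACE1):
    """Yield each of 1..n exactly once, in the order a 16-bit Galois LFSR visits them."""
    if not 1 <= n <= 0xFFFF:
        raise ValueError("n must be in 1..65535 for a 16-bit register")
    if not 1 <= seed <= 0xFFFF:
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    while True:
        if state <= n:
            yield state           # keep values in range, discard the rest
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400       # apply the taps
        if state == seed:         # back at the start: the full cycle is done
            return


print(list(lfsr16_permutation(10)))
```

The seed only changes where in the fixed cycle the walk starts, and zero never appears, which is why the output covers 1..N rather than 0..N-1.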
One way to do it would be
Find a prime p larger than N, preferably not much larger.
Find a primitive root g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.
For every prime p, there are φ(p-1) primitive roots, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.
For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.
Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).
Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.
So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if
x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...
thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.
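The test above and the resulting permutation can be sketched in plain Python. This assumes p is an odd prime larger than N that the caller supplies; it uses naive trial division for factoring p-1 and simply tries 2, 3, 4, ... as candidates. All function names are illustrative.

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(x, p):
    """x is a primitive root mod p iff x^((p-1)/q) != 1 for every prime q dividing p-1."""
    return all(pow(x, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def permutation(n, p):
    """Yield 1..n in the order g, g^2, g^3, ... (mod p); p must be an odd prime > n."""
    g = next(x for x in range(2, p) if is_primitive_root(x, p))
    value = 1
    for _ in range(p - 1):
        value = value * g % p
        if value <= n:
            yield value


print(list(permutation(8, 11)))  # [2, 4, 8, 5, 7, 3, 6, 1]
```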
Another way to do this is with a block cipher; see this blog post for details.
The blog post links to the paper Ciphers with Arbitrary Finite Domains, which contains a bunch of solutions.
Consider the prime 3. To fully express all possible outputs, think of it this way...
bias + step mod prime
The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.
For example. A simple sequence of 0, 1, 2 would be...
0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2
Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...
1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1
You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of bias and step will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permutations:
0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0
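A minimal Python sketch of this generator, with the i-th output taken as (bias + step * i) mod prime for i = 0, 1, ..., prime-1 (the indexing is an assumption; the examples above start the multiples of step at different points). The name `affine_permutation` is illustrative.

```python
def affine_permutation(bias, step, prime):
    """Return [(bias + step * i) % prime for i = 0 .. prime-1]."""
    if not (0 <= bias < prime and 1 <= step < prime):
        raise ValueError("need 0 <= bias < prime and 1 <= step < prime")
    return [(bias + step * i) % prime for i in range(prime)]


# Every (bias, step) pair for the prime 3: six pairs, six distinct permutations.
for bias in range(3):
    for step in range(1, 3):
        print(bias, step, affine_permutation(bias, step, 3))
```

For the prime 3 the six pairs happen to cover all 3! orderings. For larger primes there are only prime * (prime - 1) pairs, which is fewer than prime!, so only a subset of all permutations is reachable this way.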
If you do the math on the variables above you'll note that it results in the same information requirements...
1/3! = 1/6 = 0.166..
... vs...
1/3 (bias) * 1/2 (step) => 1/6 = 0.166..
Restrictions are simple: bias must be within 0..P-1 and step must be within 1..P-1 (in my own work I have just been using 0..P-2 and adding 1 in the arithmetic). Other than that, it works with all prime numbers no matter how large and will permute all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly fewer bits than the prime itself).
Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.
Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).
The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.
If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f is a factor of b).
Since b is usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.
This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.
Unfortunately an LCG is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on @JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.
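A Python sketch of the masked-LCG idea, under stated assumptions: the modulus is the smallest power of two that is at least N, and the multiplier 5 and increment 3 are arbitrary choices that satisfy the usual full-period conditions for a power-of-two modulus (odd increment, multiplier congruent to 1 modulo 4). The function name is illustrative.

```python
def lcg_permutation(n, seed=0):
    """Yield each of 0..n-1 exactly once using a full-period LCG modulo a power of two."""
    if n < 1:
        raise ValueError("n must be positive")
    bits = max(1, (n - 1).bit_length())
    modulus = 1 << bits           # smallest power of two >= n (at least 2)
    a, c = 5, 3                   # assumed constants giving a full period mod 2**bits
    x = seed % modulus
    for _ in range(modulus):      # one full cycle of the generator
        if x < n:
            yield x               # fewer than n values get discarded
        x = (a * x + c) % modulus


print(list(lcg_permutation(10)))
```

As noted above, the order is always the same cycle; a different seed only rotates the starting point.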
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
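def random_safe_prime(lbound):
    # Look for a safe prime p = 2q + 1 with q prime and lbound/2 <= q <= lbound.
    while True:
        q = random_prime(lbound, lbound=lbound // 2)
        p = 2 * q + 1
        if is_prime(p):
            return p, q

def random_permutation(n):
    p, q = random_safe_prime(n + 2)
    while True:
        r = randint(2, p - 1)
        # p - 1 = 2 * q, so r is a primitive root iff r^2 != 1 and r^q != 1 (mod p).
        if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
            i = 1
            while True:
                x = pow(r, i, p)
                if x == 1:
                    return
                # Map the values 2..n+1 to 0..n-1 and skip everything else.
                if 0 <= x - 2 < n:
                    yield x - 2
                i += 1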

Resources