Generating a stateless, pseudo-random permutation of integers from 0 to n? - algorithm

Question spawned from this one. The problem can be formulated as follows:
Given two positive integers n and m, with m <= n, is there a way to find a suite of numbers, which cycles and covers all possible values from 0 to n?
As a basic example, if we take 3 as a number, for whatever number current between 0 and 3, we can compute the next value as:
next = (current+3) % 4
This will cycle. For instance: 1 -> 0 -> 3 -> 2 -> 1 etc. I found this solution by "chance", and it even generalizes ((i + n) % (n + 1) for any n), but I cannot prove it mathematically. It also seems a little too obvious.
Are there better ways to generate such a permutation?

I'm not sure what you intend m in the question to refer to, or how you're defining "a suite of numbers". However, one way of getting a cycle of numbers is to use a recurrence (or iteration) of the form:
next = f(current)
for some function f. For example, linear congruential RNGs use the iteration:
x = ( a · x + c ) mod m where 0 < a, c < m
They don't always produce all values from 0 to m-1, but under certain circumstances they do:
c and m are relatively prime
a - 1 is divisible by every prime factor of m
if m is divisible by 4, a - 1 is divisible by 4.
(This is the Hull-Dobell theorem.)
Note that a = c = 1 satisfies the above criteria for any m. Furthermore, if m is a power of 2 (with m >= 4), then the criteria are satisfied by any a, c such that a == 1 mod 4 and c == 1 mod 2. However, when m is prime or, more generally, has no repeated prime factor (e.g. 6), a - 1 must be a multiple of m, so the only value of a with 0 < a < m which will work is 1.
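To experiment with these criteria, here is a minimal Python sketch. The function names (prime_factors, has_full_period, lcg_cycle) are made up for illustration; the test follows the three Hull-Dobell conditions as stated in standard references, and lcg_cycle simply follows the iteration until a value repeats.

```python
from math import gcd

def prime_factors(m):
    """Return the set of prime factors of m (simple trial division)."""
    factors, p = set(), 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors

def has_full_period(a, c, m):
    """Hull-Dobell test for the iteration x -> (a*x + c) mod m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    return m % 4 != 0 or (a - 1) % 4 == 0

def lcg_cycle(a, c, m, start=0):
    """Follow the iteration from `start` until a value repeats."""
    seen, x = [], start
    while x not in seen:
        seen.append(x)
        x = (a * x + c) % m
    return seen
```

For example, lcg_cycle(5, 3, 8) visits all eight residues, while lcg_cycle(2, 1, 5) returns after only four values.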
This might not qualify as "stateless", but I don't think that there is any strictly stateless solution; for example, you might look for some function f such that:
f(0), f(1),... f(m-1)
is a permutation of
0, 1, ..., m-1
so that you could generate the cycle by calling f(i) for successive values of i. But that's still a state, since you have to remember the last value of i you used.

Incrementing each subsequent number by any number that does not share a common prime divisor with (n-m+1) would cover the sequence (e.g. for the sequence [2-11] (10 numbers), incrementing by 3, 7, or 9 would work, but 2, 4, 5, 6, and 8 would not because they share a common divisor (2 and/or 5) with 10).
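A small Python sketch of this fixed-step idea, assuming the range is inclusive and the step wraps around modulo the range size. The name stepped_range is hypothetical.

```python
from math import gcd

def stepped_range(lo, hi, step):
    """Visit every integer in [lo, hi] by repeatedly adding `step`, wrapping around."""
    size = hi - lo + 1
    if gcd(step, size) != 1:
        raise ValueError("step shares a factor with the range size")
    return [lo + (i * step) % size for i in range(size)]
```

With the numbers from the example, stepped_range(2, 11, 3) yields [2, 5, 8, 11, 4, 7, 10, 3, 6, 9], while a step of 2, 4, 5, 6 or 8 is rejected.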
EDIT
I took out the shuffling idea since it seems that you want to increment by the same number each time. If you want a truly "random" sequence that has m at the first element just take m out and place it at the beginning. I'm not sure how that helps you, though.

Related

Count "cool" divisors of given number N

I'm trying to solve pretty complex problem with divisors and number theory.
Namely, for a given number m we say that k is a cool divisor if k<m, k|m (k divides m evenly), and, for a given number n, the number k^n (k to the power of n) is not a divisor of m. Let s(x) be the number of cool divisors of x.
Now for given a and b we should find D = s(a) + s(a+1) + s(a+2) + s(a+3) + ... + s(a+b).
Limits for all values:
(1 <= a <= 10^6), (1 <= b <= 10^7), (2<=n<=10)
Example
Let's say a=32, b=1, n=3;
x = 32, n = 3 divisors of 32 are {1,2,4,8,16,32}. However only {4,8,16} fill the conditions so s(32) = 3
x = 33, n = 3 divisors of 33 are {1,3,11,33}. Only the numbers {3,11} fill the conditions so s(33)=2;
D = s(32) + s(33) = 3 + 2 = 5
What I have tried
We should answer all those questions for 100 test cases in 3 seconds time limit.
I have two ideas. The first one: I iterate over the interval [a, a+b], and for each value i in the range I count how many cool divisors it has. We can do this in O(sqrt(N)) per value, if checking whether a power divides the value is considered O(1), so the total complexity is O(B*sqrt(B)).
The second one: I'm not sure if it will work or how fast it will be. First I do a precomputation: I have a for loop that iterates i from 2 to N, where N = 10^7, and for each number j in the range [2, N] that has i as a divisor, I check whether i to the power of n divides j; if it does not, I record that j has one more cool divisor. With this I think that the complexity will be O(NlogN), and O(B) for the answers.
Your first idea works but you can improve it.
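Instead of checking all numbers from 1 to sqrt(N) for being cool divisors, you can factorize N = p0^q0 * p1^q1 * p2^q2 * ... * pk^qk. For N > 1 the number of cool divisors is then (q0+1)(q1+1)...(qk+1) - (floor(q0/n)+1)(floor(q1/n)+1)...(floor(qk/n)+1) - 1 (and it is 0 for N = 1). The first product counts all divisors of N, the second counts the divisors d with d^n | N, and the final -1 removes N itself, which is excluded by the condition k < m. For example, 32 = 2^5 with n = 3 gives 6 - 2 - 1 = 3.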
So you can first preprocess and find out all the prime numbers using some existing algo like Sieve of Eratosthenes and for each number N between [a,a+b] you do a factorization. The complexity should be roughly O(BlogB).
Your second idea works as well.
For each number i in [2, a+b], you can just check the multiples of i in [a, a+b] and see whether i is a cool divisor of those multiples. The complexity should be O(BlogB) as well. One trick to speed this up is that you don't need divide/mod operations to check whether i is a cool divisor. You can compute the first number m in [a, a+b] with i^n | m; this is m = ceiling(a/(i^n)) * (i^n). Then you know that i^n | m+p*i does not hold for p in [1, i^(n-1) - 1] and holds again for p = i^(n-1). Basically, i fails to be a cool divisor at every i^(n-1)-th multiple, and you do not need divide/mod to figure that out, which will speed the program up.
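A minimal Python sketch of the divisor-by-divisor idea. It is a closed-form variant rather than the stepping trick itself: for each candidate divisor i it counts the multiples of i in [a, a+b] (skipping i itself, because k < m is required) and subtracts those that are also multiples of i^n. The names s_bruteforce and cool_divisor_sum are made up for illustration, and s_bruteforce is only a slow reference for small inputs.

```python
def s_bruteforce(m, n):
    """Count cool divisors of m directly from the definition (slow reference)."""
    return sum(1 for k in range(1, m) if m % k == 0 and m % (k ** n) != 0)

def cool_divisor_sum(a, b, n):
    """Return s(a) + s(a+1) + ... + s(a+b) by looping over candidate divisors."""
    lo, hi = a, a + b
    total = 0
    # k = 1 is never cool (1^n divides everything), and a proper divisor of m is at most m/2.
    for i in range(2, hi // 2 + 1):
        start = max(lo, 2 * i)                    # smallest m >= lo with i < m
        multiples = hi // i - (start - 1) // i    # multiples of i in [start, hi]
        power = i ** n                            # i^n >= 2*i, so these all lie in [start, hi]
        non_cool = hi // power - (lo - 1) // power
        total += multiples - non_cool
    return total
```

For the example in the question, cool_divisor_sum(32, 1, 3) gives 5, matching s(32) + s(33) = 3 + 2.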

Dividing N items in p groups

You are given N items in total and P groups into which you have to divide the N items.
The condition is that the product of the number of items held by each group should be maximal.
Example: for N=10 and P=3 you can divide the 10 items into {3,4,3}, since 3x3x4=36 is the maximum possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
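The steps above can be sketched in Python as follows. The function name split_into_groups is hypothetical; divmod gives floor(N / P) and N mod P in one call.

```python
from math import prod

def split_into_groups(n, p):
    """Split n items into p group sizes that differ by at most one."""
    base, extra = divmod(n, p)   # base = floor(n / p), extra = n mod p
    # The first `extra` groups receive one additional item each.
    return [base + 1] * extra + [base] * (p - extra)

groups = split_into_groups(10, 3)
product = prod(groups)
```

For N=10 and P=3 this gives groups = [4, 3, 3] and product = 36.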
For fun I worked out a proof of the fact that in an optimal solution either all numbers equal N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: given any i, j we have
|x_i - x_j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_j): if two of the numbers differed by more than 1, replacing them with the balanced split of their sum would keep the total unchanged and increase the product (assuming N >= P, so that no group is empty).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence a is an integer less than N/P which differs from N/P by less than 1 (since k/P < 1).
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED

Know repetitions in multiset by its sum

I'm given the size N of the multiset and its sum S. The elements of the set are supposed to be continuous, for example a multiset K having 6 (N=6) elements {1,1,2,2,2,3}, so S=11 (the multiset always contains first N repeating natural numbers).
How can I know the total changes to make so that there can be no repetitions and the set becomes continuous?
For the above example the multiset K needs 3 changes. Hence, finally the set K will become {1,2,3,4,5,6}.
What I did is, I found out the actual sum (i.e. n*(n+1)/2) and subtracted the given sum. Let it be T.
Then I set T=ceil(T/n), and the answer becomes 2*T. This works for most cases.
But I guess I'm missing some cases. Does there exist some algorithm to know how many elements to change?
I'm given only the size and sum of the multiset.
As you already noticed, for a given N, the sum should be S' = N * (N+1) / 2. You are given some value S.
Clearly, if S' = S the answer is 0.
If S'- S <= N - 1, then the multiset that requires least changes is
{1, 2, ..., N-1, X}
where X = N - (S' - S), which is in the range [1, N-1]. In other words, X makes up for the difference in sum between the required and the actual multiset. Your answer would be 1.
If the difference is larger than N-1, then N-1 cannot be in the multiset either. If S'- S <= (N - 1) + (N - 2), a multiset that requires least changes is
{1, 2, ..., N-2, 1, X}
where X = N + (N - 2) - (S'- S), which is in the range [1, N-2]. Your answer would be 2.
Generalizing, you would get a table like:
S' - S       | answer
-------------+-------
[   0,    0] | 0
[   1,  N-1] | 1
[   N, 2N-3] | 2
[2N-2, 3N-6] | 3
and so on. You could find a formula to get the answer in terms of N and S, but it seems much easier to use a simple loop. I'll leave the implementation to you.
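One possible version of that loop, as a hedged Python sketch. It takes the required sum to be N(N+1)/2 as in the question, assumes every element lies in [1, N] and that S does not exceed the required sum, and greedily replaces the largest remaining value by 1 until the deficit is covered. The name min_changes is made up for illustration.

```python
def min_changes(n, s):
    """Fewest elements of {1, ..., n} that must differ to make the sum equal s."""
    target = n * (n + 1) // 2
    if not n <= s <= target:
        raise ValueError("s must lie between n and n*(n+1)/2")
    deficit = target - s
    changes, largest = 0, n
    while deficit > 0:
        deficit -= largest - 1   # replacing `largest` by 1 removes at most largest - 1
        largest -= 1
        changes += 1
    return changes
```

For the multiset from the question (N=6, S=11) the deficit is 10, which falls in the row [2N-2, 3N-6] = [10, 12], and min_changes(6, 11) returns 3.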

Number of Positive Solutions to a1 x1+a2 x2+......+an xn=k (k<=10^18)

The question is: find the number of solutions to a1 x1+a2 x2+....+an xn=k with constraints: 1) ai>0 and ai<=15, 2) n>0 and n<=15, 3) xi>=0. I was able to formulate a dynamic programming solution, but it is running too long for n>10^10. Please guide me to a more efficient solution.
The code
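// Needs: import java.math.BigInteger;
// arr[0] holds the number of coefficients; arr[1..arr[0]] hold the coefficients (each <= 15).
// n is the target sum as a BigInteger.
// dp[j] = number of ways (mod 1000000007) to reach the sum (seen + j).
int dp[] = new int[16];
dp[0] = 1;
BigInteger seen = BigInteger.ZERO;
while (true)
{
    if (dp[0] != 0)
    {
        // Extend every way of reaching the current sum by one more coefficient.
        for (int i = 0; i < arr[0]; i++)
            dp[arr[i+1]] = (dp[arr[i+1]] + dp[0]) % 1000000007;
    }
    // Slide the window forward by one, including the last slot (index 15).
    for (int i = 1; i <= 15; i++)
        dp[i-1] = dp[i];
    dp[15] = 0;
    seen = seen.add(BigInteger.ONE);
    if (seen.compareTo(n) == 0)
        break;
}
System.out.println(dp[0]);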
arr is the array containing the coefficients, and the answer should be given mod 1000000007, as the number of ways does not fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ... , Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that 1. for all i (1 <= i <= r), ai is one of the Lj, and 2. a1 + ... + ar = n. (In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n, you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest cases are when only one L is given, then evidently there is one admissible sequence if n is a multiple of L and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest cases, two L values given. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1), thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course, there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of ad. seq. summing to n and ending with 1 is N(n-1) and the number of ad. seq. summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
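One standard technique that reaches O(log n) steps is exponentiation of the recurrence's companion matrix by repeated squaring; whether that is exactly what the answer has in mind is an assumption. A minimal Python sketch, with hypothetical names mat_mul and count_compositions:

```python
MOD = 1_000_000_007

def mat_mul(x, y):
    """Multiply two square matrices modulo MOD."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) % MOD
             for j in range(size)] for i in range(size)]

def count_compositions(parts, n):
    """N(n) mod MOD: sequences of elements of `parts` that sum to n."""
    parts = sorted(set(parts))
    d = parts[-1]
    # Base values N(0), ..., N(d-1) straight from the recurrence.
    base = [1] + [0] * (d - 1)
    for t in range(1, d):
        base[t] = sum(base[t - p] for p in parts if p <= t) % MOD
    if n < d:
        return base[n]
    # Companion matrix maps (N(t), ..., N(t-d+1)) to (N(t+1), ..., N(t-d+2)).
    step = [[0] * d for _ in range(d)]
    for p in parts:
        step[0][p - 1] = 1
    for i in range(1, d):
        step[i][i - 1] = 1
    # result = step ** (n - d + 1) by repeated squaring.
    result = [[int(i == j) for j in range(d)] for i in range(d)]
    e = n - (d - 1)
    while e:
        if e & 1:
            result = mat_mul(result, step)
        step = mat_mul(step, step)
        e >>= 1
    state = base[::-1]   # (N(d-1), ..., N(0))
    return sum(result[0][j] * state[j] for j in range(d)) % MOD
```

With parts = [1, 2] this reproduces the values listed above, e.g. count_compositions([1, 2], 5) is 8. Each matrix product costs O(d^3) with d <= 15, so n near 10^18 needs only about 60 squarings.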
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least, not before I've googled around a bit to check if it's a contest question). I hope no restriction has been omitted in the description, such as that permutations of such representations should only contribute one to the count, that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m = 1}^15 C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6, d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4} and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2, d_3 is periodic with period 6. Generally, like reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
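To make the size of that smaller period concrete: any period P of d_B must satisfy d_B(P) = d_B(0) = B, so every a <= B divides P, and the minimal period is lcm(1, ..., B). A short Python sketch that computes it and spot-checks periodicity (the helper names are made up for illustration):

```python
from functools import reduce
from math import gcd

def d(B, k):
    """Number of divisors of k that do not exceed B (every a divides 0)."""
    return sum(1 for a in range(1, B + 1) if k % a == 0)

def lcm_up_to(B):
    """Least common multiple of 1, 2, ..., B."""
    return reduce(lambda x, y: x * y // gcd(x, y), range(1, B + 1), 1)

def is_period(B, P, limit):
    """Check d_B(k + P) == d_B(k) for all 0 <= k < limit."""
    return all(d(B, k + P) == d(B, k) for k in range(limit))
```

Here lcm_up_to(15) is 360360, compared with 15! of about 1.3 * 10^12.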
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.

Quadratic testing in hash tables

During an assignment, I was asked to show that in a hash table of size m (m>3, m is prime) that is less than half full and that uses quadratic probing (hash(k, i) = (h(k) + i^2) mod m), we will always find a free spot.
I've checked and arrived at the conclusion that the spots that will be probed (when h(k)=0) are 0 mod m, 1 mod m, 4 mod m, 9 mod m, ...
My problem is that I can't figure out a way to show that it will always find the free spot. I've tested it myself with different values of m, and have also proven to myself that if the hash table is more than half full, we might never find a free spot.
Can anyone please give me a hint towards the way to solve this?
Thanks!
0, 1, 4, ..., ((m-1)/2)^2 are all distinct mod m. Why?
Suppose two numbers from that range, i^2 and j^2, are equivalent mod m.
Then i^2 - j^2 = (i-j)(i+j) = 0 (mod m). Since m is prime, m must divide one of those factors. But with 0 <= j <= i <= (m-1)/2, both factors are non-negative and less than m, and i+j can only be 0 if i = j = 0, so i-j must be 0. That is, i = j.
Since we are starting at 0, that gives (m+1)/2 distinct slots, which is more than half of the table. If fewer than m/2 slots are filled, at least one of the probed slots remains open.
Let's break the proof down.
Setup
First, some background.
With a hash table, we define a probe sequence P. For any item q, following P will eventually lead to the right item in the hash table. The probe sequence is just a series of functions {h_0, ..., h_M-1} where h_i is a hash function.
To insert an item q into the table, we look at h_0(q), h_1(q), and so on, until we find an empty spot. To find q later, we examine the same sequence of locations.
In general, the probe sequence is of the form h_i(q) = [h(q) + c(i)] mod M, for a hash table of size M, where M is a prime number. The function c(i) is the collision-resolution strategy, which must have two properties:
First, c(0) = 0. This means that the first probe in the sequence must be equal to just performing the hash.
Second, the values {c(0) mod M, ..., c(M-1) mod M} must contain every integer between 0 and M-1. This means that if you keep trying to find empty spots, the probe sequence will eventually probe every array position.
Applying quadratic probing
Okay, we've got the setup of how the hash table works. Let's look at quadratic probing. This just means that for our c(i) we're using a general quadratic equation of the form ai^2 + bi + c, though for most implementations you'll usually just see c(i) = i^2 (that is, b, c = 0).
Does quadratic probing meet the two properties we talked about before? Well, it's certainly true that c(0) = 0 here, since (0)^2 is indeed 0, so it meets the first property. What about the second property?
It turns out that in general, the answer is no.
Theorem. When quadratic probing is used in a hash table of size M, where M is a prime number, only the first floor[M/2] probes in the probe sequence are guaranteed to be distinct.
Let's see why this is the case, using a proof by contradiction.
1. Say that the theorem is wrong. Then there are two values a and b with 0 <= a < b < floor[M/2] that probe the same position.
2. h_a(q) and h_b(q) must probe the same position, by (1), so h_a(q) = h_b(q).
3. h_a(q) = h_b(q) ==> h(q) + c(a) = h(q) + c(b), mod M.
4. The h(q) on both sides cancel. Our c(i) is just c(i) = i^2, so we have a^2 = b^2, mod M.
5. Rearranging the equation in (4) gives us a^2 - b^2 = 0, mod M. This is a difference of two squares, so it factors as (a - b)(a + b) = 0, mod M.
6. But remember, we said M was a prime number. The only way that (a - b)(a + b) can be zero mod M is if [case I] (a - b) is zero mod M, or [case II] (a + b) is zero mod M.
7. Case I can't be right, because a < b < M, so a - b is nonzero and its absolute value is less than M.
8. The only way for (a + b) to be zero mod M is for a + b to be zero or a multiple of M. It can't be zero, since b > a >= 0. And since a and b are both less than floor[M/2], their sum must be less than M. So case II can't be right either.
9. Thus, if the theorem were wrong, one of two quantities must be zero mod M, neither of which can possibly be zero mod M -- a contradiction! QED: when the table size is a prime number, quadratic probing only guarantees about M/2 distinct probes, so it doesn't satisfy property two, and an insertion may fail once the table is more than half full. The proof is complete!
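The claim is easy to check empirically. The Python sketch below (distinct_probe_prefix is a made-up name) counts how many leading probes (home + i^2) mod m are pairwise distinct before the first repeat; for odd primes it should come out as (m + 1) / 2, which is at least the floor[M/2] from the theorem.

```python
def distinct_probe_prefix(m, home=0):
    """Number of leading probes (home + i*i) % m that are pairwise distinct."""
    seen = set()
    for i in range(m):
        slot = (home + i * i) % m
        if slot in seen:
            return i          # probe i is the first one to revisit a slot
        seen.add(slot)
    return m
```

For m = 7 the probes from home 0 are 0, 1, 4, 2, 2, ..., so distinct_probe_prefix(7) returns 4.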
From Wikipedia:
For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0,(m − 1) / 2]. Such choices include c1 = c2 = 1/2, c1 = c2 = 1, and c1 = 0,c2 = 1. Because there are only about m/2 distinct probes for a given element, it is difficult to guarantee that insertions will succeed when the load factor is > 1/2.
See the quadratic probing section in Data Structures and Algorithms with Object-Oriented Design Patterns in C++ for a proof that m/2 elements are distinct when m is prime.

Resources