Calculating sum of geometric series (mod m) - number-theory

I have a series
S = i^m + i^(2m) + ... + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the geometric progression (GP) formula directly because the result would have a denominator, and I would then need a modular inverse which may not exist (if the denominator and m are not coprime).
So I made an alternate algorithm based on the assumption that these powers form a cycle much shorter than k (since the equation is modular, I expected something like 2, 7, 9, 1, 2, 7, 9, 1, ...), and that this cycle repeats throughout the series. Instead of iterating from 0 to k, I would just sum the numbers in one cycle, count how many cycles fit in the series, and multiply. So I first computed i^m (mod m) and then multiplied by it again and again, taking the modulus at each step, until I reached the first element again.
But when I actually coded the algorithm, for some values of i the cycles turned out to be very large, so the program took a long time to terminate: my assumption was incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.

This is the algorithm for a similar problem I encountered
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right-hand side to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.
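For illustration, here is a hedged Python sketch of that recursion (geom_sum is a name I'm introducing, not code from the answer); it computes 1 + a + ... + a^n mod m with O(log n) recursion depth and no division:

    def geom_sum(a, n, m):
        # (1 + a + a^2 + ... + a^n) % m, division-free
        if n == 0:
            return 1 % m
        if n % 2 == 1:
            # an even number of terms: apply the pairing identity above
            return (1 + a) * geom_sum(a * a % m, n // 2, m) % m
        # an odd number of terms: peel off the leading a^n and recurse
        return (pow(a, n, m) + geom_sum(a, n - 1, m)) % m

    print(geom_sum(3, 4, 1000))   # 1 + 3 + 9 + 27 + 81 = 121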

As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i, the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.
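Putting those pieces together, here is one possible Python sketch of the whole plan (all function names are mine, not from the answer; the last branch implements the "iterate over a period" fallback, and pow(x, -1, q) needs Python 3.8+):

    from math import gcd

    def prime_power_factors(m):
        # split m into pairwise-coprime prime powers p_i
        factors, d = [], 2
        while d * d <= m:
            if m % d == 0:
                q = 1
                while m % d == 0:
                    q *= d
                    m //= d
                factors.append(q)
            d += 1
        if m > 1:
            factors.append(m)
        return factors

    def series_mod_prime_power(b, k, q):
        # sum_{j=1..k} b^j mod q, where b = i^m mod q
        b %= q
        if b == 0:                       # trivial special case 1
            return 0
        if b == 1:                       # trivial special case 2
            return k % q
        if gcd(b - 1, q) == 1:           # geometric-sum formula, minus the j=0 term
            inv = pow(b - 1, -1, q)
            return ((pow(b, k + 1, q) - 1) * inv - 1) % q
        # fallback: the powers of b mod q repeat with period at most q
        seen, powers = {}, []
        x = b
        while x not in seen:
            seen[x] = len(powers)
            powers.append(x)
            x = x * b % q
        start = seen[x]                  # pre-period, then the cycle proper
        tail, cycle = powers[:start], powers[start:]
        if k <= len(tail):
            return sum(tail[:k]) % q
        full, extra = divmod(k - len(tail), len(cycle))
        return (sum(tail) + full * sum(cycle) + sum(cycle[:extra])) % q

    def crt(residues, moduli):
        # combine x = r_i (mod q_i) for pairwise-coprime q_i
        x, M = 0, 1
        for r, q in zip(residues, moduli):
            x += M * ((r - x) * pow(M, -1, q) % q)
            M *= q
        return x % M

    def S_mod_m(i, k, m):
        # S = i^m + i^(2m) + ... + i^(km) (mod m)
        qs = prime_power_factors(m)
        return crt([series_mod_prime_power(pow(i, m, q), k, q) for q in qs], qs)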

This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n] = 1 + b + b^2 + ... + b^(n-1) mod m can be computed by noting that:
a[j+k] == b^j * a[k] + a[j]
a[2n] == (b^n + 1) * a[n]
The second is just a corollary of the first (take j = k = n).
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:
    def geometric(n, b, m):
        # computes 1 + b + b^2 + ... + b^(n-1) (mod m)
        T = 1            # partial geometric sum for the bits consumed so far
        e = b % m        # current power b^(2^step)
        total = 0
        while n > 0:
            if n & 1 == 1:
                total = (e * total + T) % m
            T = ((e + 1) * T) % m
            e = (e * e) % m
            n //= 2      # was n = n/2, which only floor-divides in Python 2
            # print('{} {} {}'.format(total, T, e))
        return total
This bit of magic has a mathematical reason: the operation on pairs defined as
(a,r)#(b,s) = (ab, as+r)
is associative, and the first rule above basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
One way to look at this is via matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n, 1+b+...+b^(n-1)],[0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]
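For reference, a small Python sketch of the 2x2 version (function names are mine); the top-right entry of the power carries the geometric sum:

    def mat_mul(X, Y, m):
        # 2x2 matrix product mod m
        return [[(X[0][0]*Y[0][0] + X[0][1]*Y[1][0]) % m,
                 (X[0][0]*Y[0][1] + X[0][1]*Y[1][1]) % m],
                [(X[1][0]*Y[0][0] + X[1][1]*Y[1][0]) % m,
                 (X[1][0]*Y[0][1] + X[1][1]*Y[1][1]) % m]]

    def mat_pow(M, n, m):
        # M^n mod m by repeated squaring
        R = [[1, 0], [0, 1]]
        while n > 0:
            if n & 1:
                R = mat_mul(R, M, m)
            M = mat_mul(M, M, m)
            n >>= 1
        return R

    # top-right of [[b,1],[0,1]]^n is 1 + b + ... + b^(n-1) (mod m)
    print(mat_pow([[2, 1], [0, 1]], 5, 10**9)[0][1])   # 31 = 1 + 2 + 4 + 8 + 16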

Based on the approach of @braindoper, a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
    geometricSeriesMod[a_, n_, m_] :=
      Module[{q = a, exp = n, factor = 1, sum = 0, temp},
        While[And[exp > 0, q != 0],
          If[EvenQ[exp],
            temp = Mod[factor*PowerMod[q, exp, m], m];
            sum = Mod[sum + temp, m];
            exp--];
          factor = Mod[Mod[1 + q, m]*factor, m];
          q = Mod[q*q, m];
          exp = Floor[exp/2];
        ];
        Return[Mod[sum + factor, m]]
      ]
Parameters:
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
m is the integer modulus != 0.
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.

Related

Idea of dynamic programming solution

Given a natural number N (1 <= N <= 2000), count the number of sets of natural numbers with sum equal to N, if we know that the ratio of any two elements in the given set is at least 2
(for any x, y in given set: max(x, y) / min(x, y) >= 2)
I am trying to use the given ratio so that the sum could be counted with the geometric progression formula, but I haven't succeeded yet. Apparently a dynamic programming solution is needed, but I have no idea how to come up with the recurrence.
As Stef suggested in the comments, if you count the number of ways you can make n, using numbers that are at most k, you can calculate this using dynamic programming. For a given n, k, either you use k or you don't: if you do, then you have n-k left, and can use numbers <= k/2, and if you don't, then you still have n, and can use numbers <= k-1. It's very similar to a coin change algorithm, or to a standard algorithm for counting partitions.
With that, here's a program that prints out the values up to n=2000 in the sequence:
    N = 2000
    A = [[0] * (i + 1) for i in range(N + 1)]
    A[0][0] = 1
    for n in range(1, N + 1):
        for k in range((n + 1) // 2, n + 1):
            A[n][k] = A[n - k][min(n - k, k // 2)] + A[n][k - 1]
    for i in range(N + 1):
        print(i, A[i][i])
It has a couple of optimizations: A[n][k] is the same as A[n][n] for k > n, and A[n][k] = 0 when 2k - 1 < n (because if you use k, then the largest sum you can reach is at most k + k/2 + k/4 + ... <= 2k - 1 -- the infinite sum is 2k, but with integer arithmetic you can never achieve it). These two optimizations give a speedup factor of 2 each, compared to computing the whole (n+1) x (n+1) table.
With these two optimizations, and the array-based bottom-up dynamic programming approach, this prints out all the solutions in around 0.5s on my machine.

Find prime factors such that difference is smallest as possible

Suppose n, a, b are positive integers where n is not a prime number, such that n = ab with a ≥ b and (a − b) as small as possible. What would be the best algorithm to find the values of a and b if n is given?
I read a solution where they try to represent n as the difference between two squares via searching for a square S bigger than n such that S - n = (another square). Why would that be better than simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized?
First, to answer why your approach
simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized
is not optimal:
Suppose your number is n = 2^7 * 3^4 * 5^2 * 7 * 11 * 13 (= 259459200), well within the range of int. By combinatorics, this number has exactly 8 * 5 * 3 * 2 * 2 * 2 = 960 factors. So you would first find all of these 960 factors, then find all pairs (a,b) such that a * b = n, which in this case can be formed in (6C1 + 9C2 + 11C3 + 13C4 + 14C5 + 15C6 + 16C7 + 16C8) ways (if I'm not wrong; my combinatorics is a bit weak). This is of the order 1e5 if implemented optimally. Also, this approach is hard to implement.
Now, why the difference of squares approach
represent S - n = Q, such that S and Q are perfect squares
is good:
This is because if you can represent S - n = Q, this implies, n = S - Q
=> n = s^2 - q^2
=> n = (s+q)(s-q)
=> your required answer (a - b) = 2 * q
Now, even if you iterate over all squares, you will either find your answer or terminate when the difference of two consecutive squares is greater than n.
But I don't think this will be doable for all n (e.g. if n = 6, there is no solution for (S, Q)).
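For concreteness, a Python sketch of that search (fermat_factor is my naming; note it only finds factor pairs where both factors have the same parity, and it loops forever exactly when n = 2 (mod 4), where no (S, Q) representation exists):

    import math

    def fermat_factor(n):
        # search for S = s^2 >= n with S - n = q^2; then n = (s + q)(s - q)
        s = math.isqrt(n)
        if s * s < n:
            s += 1
        while True:
            q2 = s * s - n
            q = math.isqrt(q2)
            if q * q == q2:
                return s + q, s - q
            s += 1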
Another approach:
Iterate from floor(sqrt(n)) down to 1. The first number x such that x | n will be one of the numbers in the required pair (a, b). The other will obviously be y = n / x. So your answer is the pair (y, x), and the difference is y - x.
This is an O(sqrt(n))-time algorithm.
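In Python, that scan is a few lines (my naming):

    import math

    def closest_pair(n):
        # scan x from floor(sqrt(n)) down to 1; the first divisor found
        # gives the pair (a, b) = (n // x, x) with minimal a - b
        for x in range(math.isqrt(n), 0, -1):
            if n % x == 0:
                return n // x, x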
A general method could be this:
Find the prime factorization of your number: n = Π p_i^a_i. Except for the worst cases where n is prime or semiprime, this will be substantially faster than the O(n^(1/2)) time of the iteration down from the square root, which doesn't divide the found factors out of the number.
Recall that the simplest, trial division, prime factorization is done by repeatedly trying to divide the number by increasing odd numbers (or by primes) below the number's square root, dividing out of the number each factor -- thus prime by construction -- as it is found (n := n/f).
Then, lazily enumerate the factors of n in order from its prime factorization. Stop after producing half of them. Having thus found n's (not necessarily prime) factor that is closest to its square root, find the second factor by simple division.
In case this must repeatedly run many times, it will greatly pay out to precalculate the needed primes below the n's square root, to use in the factorizations.
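A hedged sketch of that route (simplified: it builds the full divisor set rather than enumerating divisors lazily in order, which is fine when the divisor count is modest):

    def prime_factorize(n):
        # trial division, dividing each found factor out of n (n := n/f)
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    def closest_pair_factored(n):
        # build all divisors from the factorization, take the largest <= sqrt(n)
        divisors = {1}
        for p in prime_factorize(n):
            divisors |= {d * p for d in divisors}
        b = max(d for d in divisors if d * d <= n)
        return n // b, b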

Dynamic programming approximation

I am trying to calculate a function F(x,y) using dynamic programming. Functionally:
F(X,Y) = a1 F(X-1,Y)+ a2 F(X-2,Y) ... + ak F(X-k,Y) + b1 F(X,Y-1)+ b2 F(X,Y-2) ... + bk F(X,Y-k)
where k is a small number (k=10).
The problem is, X = 1,000,000 and Y = 1,000,000. So it is infeasible to calculate F(x,y) for every value between x = 1..1000000 and y = 1..1000000. Is there an approximate version of DP where I can avoid calculating F(x,y) for a large number of inputs and still get an accurate estimate of F(X,Y)?
A similar example is string matching algorithms (Levenshtein's distance) for two very long and similar strings (e.g. similar DNA sequences). In such cases only the diagonal scores are important and the far-from-diagonal entries do not contribute to the final distance. How do we avoid calculating off-the-diagonal entries?
PS: Ignore the border cases (i.e. when x < k and y < k).
I'm not sure precisely how to adapt the following technique to your problem, but if you were working in just one dimension there is an O(k^3 log n) algorithm for computing the nth term of the series. This is called a linear recurrence and can be solved using matrix math, of all things. The idea is to suppose that you have a recurrence defined as
F(1) = x_1
F(2) = x_2
...
F(k) = x_k
F(n + k) = c_1 F(n) + c_2 F(n + 1) + ... + c_k F(n + k - 1)
For example, the Fibonacci sequence is defined as
F(0) = 0
F(1) = 1
F(n + 2) = 1 x F(n) + 1 x F(n + 1)
There is a way to view this computation as working on a matrix. Specifically, suppose that we have the vector x = (x_1, x_2, ..., x_k)^T. We want to find a matrix A such that
Ax = (x_2, x_3, ..., x_k, x_{k + 1})^T
That is, we begin with a vector of terms 1 ... k of the sequence, and then after multiplying by matrix A end up with a vector of terms 2 ... k + 1 of the sequence. If we then multiply that vector by A, we'd like to get
A(x_2, x_3, ..., x_k, x_{k + 1})^T = (x_3, x_4, ..., x_{k + 1}, x_{k + 2})^T
In short, given k consecutive terms of the series, multiplying that vector by A gives us the next term of the series.
The trick uses the fact that we can group the multiplications by A. For example, in the above case, we multiplied our original x by A to get x' (terms 2 ... k + 1), then multiplied x' by A to get x'' (terms 3 ... k + 2). However, we could have instead just multiplied x by A^2 to get x'' as well, rather than doing two different matrix multiplications. More generally, if we want to get term n of the sequence, we can compute A^n x, then inspect the appropriate element of the vector.
Here, we can use the fact that matrix multiplication is associative to compute A^n efficiently. Specifically, we can use the method of repeated squaring to compute A^n in a total of O(log n) matrix multiplications. If the matrix is k x k, then each multiplication takes time O(k^3), for a total of O(k^3 log n) work to compute the nth term.
So all that remains is actually finding this matrix A. Well, we know that we want to map from (x_1, x_2, ..., x_k) to (x_2, x_3, ..., x_k, x_{k + 1}), and we know that x_{k + 1} = c_1 x_1 + c_2 x_2 + ... + c_k x_k, so we get this matrix:
        | 0   1   0   0  ...  0  |
        | 0   0   1   0  ...  0  |
    A = | 0   0   0   1  ...  0  |
        |          ...           |
        | c_1 c_2 c_3 c_4 ... c_k |
For more detail on this, see the Wikipedia entry on solving linear recurrences with linear algebra, or my own code that implements the above algorithm.
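Here is a minimal Python sketch of the 1-D algorithm described above (kth_term and its parameter layout are my own naming, not the linked code):

    def kth_term(coeffs, init, n, mod):
        # F(n) for F(i + k) = c_1 F(i) + ... + c_k F(i + k - 1),
        # given F(0 .. k-1) = init; O(k^3 log n) arithmetic operations
        k = len(coeffs)
        if n < k:
            return init[n] % mod

        def mul(X, Y):
            return [[sum(X[i][t] * Y[t][j] for t in range(k)) % mod
                     for j in range(k)] for i in range(k)]

        # companion matrix: maps (F(i), ..., F(i+k-1)) to (F(i+1), ..., F(i+k))
        A = [[0] * k for _ in range(k)]
        for i in range(k - 1):
            A[i][i + 1] = 1
        A[k - 1] = [c % mod for c in coeffs]

        R = [[int(i == j) for j in range(k)] for i in range(k)]   # identity
        e = n - k + 1              # A^(n-k+1) maps init to (F(n-k+1), ..., F(n))
        while e:
            if e & 1:
                R = mul(R, A)
            A = mul(A, A)
            e >>= 1
        return sum(R[k - 1][j] * init[j] for j in range(k)) % mod

    print(kth_term([1, 1], [0, 1], 10, 10**9 + 7))   # Fibonacci: F(10) = 55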
The only question now is how you adapt this to when you're working in multiple dimensions. It's certainly possible to do so by treating the computation of each row as its own linear recurrence, then going one row at a time. More specifically, you can compute the nth term of the first k rows each in O(k^3 log n) time, for a total of O(k^4 log n) time to compute the first k rows. From that point forward, you can compute each successive row in terms of the previous row by reusing the old values. If there are n rows to compute, this gives an O(k^4 n log n) algorithm for computing the final value that you care about. If this is small compared to the work you'd be doing before (O(n^2 k^2), I believe), then this may be an improvement. Since you're saying that n is on the order of one million and k is about ten, this does seem like it should be much faster than the naive approach.
That said, I wouldn't be surprised if there was a much faster way of solving this problem by not proceeding row by row and instead using a similar matrix trick in multiple dimensions.
Hope this helps!
Without knowing more about your specific problem, the general approach is to use a top-down dynamic programming algorithm and memoize the intermediate results. That way you will only calculate the values that will be actually used (while saving the result to avoid repeated calculations).
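A minimal sketch of that idea on the recurrence above (the coefficient vectors and the border value are placeholders I'm assuming; for very deep recursions you would also need to raise Python's recursion limit or convert to an explicit stack):

    from functools import lru_cache

    def make_F(a, b, base=1.0):
        # top-down memoized F(x,y) = sum a[i]*F(x-i-1, y) + sum b[i]*F(x, y-i-1)
        k = len(a)

        @lru_cache(maxsize=None)
        def F(x, y):
            if x < k or y < k:    # border cases, per the question
                return base
            return (sum(a[i] * F(x - i - 1, y) for i in range(k)) +
                    sum(b[i] * F(x, y - i - 1) for i in range(k)))

        return F

    F = make_F([0.1] * 10, [0.05] * 10)
    print(F(50, 50))   # only the states reachable from (50, 50) are computed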

Number of Positive Solutions to a1 x1 + a2 x2 + ... + an xn = k (k <= 10^18)

The question is: find the number of solutions to a1 x1 + a2 x2 + ... + an xn = k with constraints:
1. ai > 0 and ai <= 15
2. n > 0 and n <= 15
3. xi >= 0
I was able to formulate a dynamic programming solution, but it runs too long for k > 10^10. Please guide me to a more efficient solution.
The code
    int dp[] = new int[16];
    dp[0] = 1;
    BigInteger seen = new BigInteger("0");
    while (true) {
        for (int i = 0; i < arr[0]; i++) {
            if (dp[0] == 0)
                break;
            dp[arr[i + 1]] = (dp[arr[i + 1]] + dp[0]) % 1000000007;
        }
        for (int i = 1; i < 15; i++)
            dp[i - 1] = dp[i];
        seen = seen.add(new BigInteger("1"));
        if (seen.compareTo(n) == 0)
            break;
    }
    System.out.println(dp[0]);
arr is the array containing the coefficients, and the answer should be taken mod 1000000007, as the number of ways does not fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ..., Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that
1. for all i (1 <= i <= r), ai is one of the Lj, and
2. a1 + ... + ar = n?
(In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n; you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest cases are when only one L is given, then evidently there is one admissible sequence if n is a multiple of L and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest cases, two L values given. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1), thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course, there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of ad. seq. summing to n and ending with 1 is N(n-1) and the number of ad. seq. summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
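To spell that out: the recurrence N(n) = N(n - L1) + ... + N(n - Lk) has order L = max(Lj), so the same companion-matrix exponentiation as in the linear-recurrence answer earlier on this page gives the O(log n) step count. A hedged Python sketch, with all names mine:

    def composition_count(Ls, n, mod=1000000007):
        # number of compositions of n with parts from Ls, via O(L^3 log n)
        # companion-matrix exponentiation, L = max(Ls)
        L = max(Ls)
        # base values N(0 .. L-1) by direct DP; N(0) = 1 (the empty sequence)
        N = [1] + [0] * (L - 1)
        for i in range(1, L):
            N[i] = sum(N[i - lj] for lj in Ls if lj <= i) % mod
        if n < L:
            return N[n]

        def mul(X, Y):
            return [[sum(X[i][t] * Y[t][j] for t in range(L)) % mod
                     for j in range(L)] for i in range(L)]

        # companion matrix of N(n) = N(n - L1) + ... + N(n - Lk)
        A = [[0] * L for _ in range(L)]
        for i in range(L - 1):
            A[i][i + 1] = 1
        for lj in Ls:
            A[L - 1][L - lj] = 1

        R = [[int(i == j) for j in range(L)] for i in range(L)]
        e = n - L + 1
        while e:
            if e & 1:
                R = mul(R, A)
            A = mul(A, A)
            e >>= 1
        return sum(R[L - 1][j] * N[j] for j in range(L)) % mod

    print(composition_count([1, 2], 5))   # 8, matching N(5) above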
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least, not before I've googled around a bit to check if it's a contest question). I hope no restriction has been omitted in the description, such as that permutations of such representations should only contribute one to the count; that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m = 1}^15 C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6; d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4}; and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2 and d_3 is periodic with period 6. Analogous reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
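To make the periodicity concrete, here is a small Python check (d_B is transcribed from the definition above; the lcm(1..B) bound is a slight refinement of the B! bound, since each indicator [a | k] has period a):

    from math import lcm

    def d_B(B, k):
        # number of divisors of k not exceeding B
        return sum(1 for a in range(1, B + 1) if k % a == 0)

    B = 3
    P = lcm(*range(1, B + 1))   # a period of d_B; 6 for B = 3
    print([d_B(B, k) for k in range(1, 2 * P + 1)])
    # -> [1, 2, 2, 2, 1, 3, 1, 2, 2, 2, 1, 3], period 6 as derived above
    # for B = 15: lcm(1..15) = 360360, far smaller than 15! ~ 1.3e12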
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.

Algorithm to find smallest N such that N! is divisible by a prime raised to a power

Is there an efficient algorithm to compute the smallest integer N such that N! is divisible by p^k, where p is a relatively small prime number and k a very large integer? In other words,
factorial(N) mod p^k == 0
If, given N and p, I wanted to find how many times p divides into N!, I would use the well-known formula
k = Sum(floor(N/p^i) for i = 1, 2, ...)
I've done brute force searches for small values of k but that approach breaks down very quickly as k increases and there doesn't appear to be a pattern that I can extrapolate to larger values.
Edited 6/13/2011
Using suggestions proposed by Fiver and Hammar, I used a quasi-binary search to solve the problem but not quite in the manner they suggested. Using a truncated version of the second formula above, I computed an upper bound on N as the product of k and p (using just the first term). I used 1 as the lower bound. Using the classic binary search algorithm, I computed the midpoint between these two values and calculated what k would be using this midpoint value as N in the second formula, this time with all the terms being used.
If the computed k was too small, I adjusted the lower bound and repeated. Too big, I first tested to see if k computed at midpoint-1 was smaller than the desired k. If so, midpoint was returned as the closest N. Otherwise, I adjusted the highpoint and repeated.
If the computed k were equal, I tested whether the value at midpoint-1 was equal to the value at midpoint. If so, I adjusted the highpoint to be the midpoint and repeated. If midpoint-1 was less than the desired k, the midpoint was returned as the desired answer.
Even with very large values for k (10 or more digits), this approach runs at O(n log(n)) speed.
OK this is kind of fun.
Define f(i) = (p^i - 1) / (p - 1)
Write k in a kind of funny "base" where the value of position i is this f(i).
You do this from most-significant to least-significant digit. So first, find the largest j such that f(j) <= k. Then compute the quotient and remainder of k / f(j). Store the quotient as q_j and the remainder as r. Now compute the quotient and remainder of r / f(j-1). Store the quotient as q_{j-1} and the remainder as r again. Now compute the quotient and remainder of r / f(j-2). And so on.
This generates a sequence q_j, q_{j-1}, q_{j-2}, ..., q_1. (Note that the sequence ends at 1, not 0.) Then compute q_j*p^j + q_{j-1}*p^(j-1) + ... + q_1*p. That's your N.
Example: k = 9, p = 3. So f(i) = (3^i - 1) / 2: f(1) = 1, f(2) = 4, f(3) = 13. So the largest j with f(j) <= 9 is j = 2, with f(2) = 4. Take the quotient and remainder of 9 / 4. That's a quotient of 2 (which is the digit in position 2) and a remainder of 1.
For that remainder of 1, find the quotient and remainder of 1 / f(1). Quotient is 1, remainder is zero, so we are done.
So q_2 = 2, q_1 = 1. 2*3^2 + 1*3^1 = 21, which is the right N.
I have an explanation on paper for why this works, but I am not sure how to communicate it in text... Note that f(i) answers the question, "how many factors of p are there in (p^i)!". Once you find the largest i,j such that j*f(i) is less than k, and realize what you are really doing is finding the largest j*p^i less than N, the rest kind of falls out of the wash. In our p=3 example, for instance, we get 4 p's contributed by the product of 1-9, 4 more contributed by the product of 10-18, and one more contributed by 21. Those first two are just multiples of p^2; f(2) = 4 is telling us that each multiple of p^2 contributes 4 more p's to the product.
[update]
Code always helps to clarify. Save the following Perl script as foo.pl and run it as foo.pl <p> <k>. Note that ** is Perl's exponentiation operator, bdiv computes a quotient and remainder for BigInts (unlimited-precision integers), and use bigint tells Perl to use BigInts everywhere.
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use bigint;

    @ARGV == 2
        or die "Usage: $0 <p> <k>\n";

    my ($p, $k) = map { Math::BigInt->new($_) } @ARGV;

    sub f {
        my $i = shift;
        return ($p ** $i - 1) / ($p - 1);
    }

    my $j = 0;
    while (f($j) <= $k) {
        $j++;
    }
    $j--;

    my $N = 0;
    my $r = $k;
    while ($r > 0) {
        my $val = f($j);
        my ($q, $new_r) = $r->bdiv($val);
        $N += $q * ($p ** $j);
        $r = $new_r;
        $j--;
    }

    print "Result: $N\n";
    exit 0;
Using the formula you mentioned, the sequence of k values given fixed p and N = 1,2... is non-decreasing. This means you can use a variant of binary search to find N given the desired k.
Start with N = 1, and calculate k.
Double N until k is greater or equal than your desired k to get an upper bound.
Do a binary search on the remaining interval to find your k.
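A Python sketch of that plan (names are mine; vp_factorial is the formula quoted in the question, and its monotonicity in N is what justifies the doubling and the binary search):

    def vp_factorial(N, p):
        # exponent of p in N!: sum(floor(N / p^i) for i = 1, 2, ...)
        total, q = 0, p
        while q <= N:
            total += N // q
            q *= p
        return total

    def smallest_N(p, k):
        # smallest N with p^k dividing N!
        hi = 1
        while vp_factorial(hi, p) < k:   # doubling gives an upper bound
            hi *= 2
        lo = hi // 2
        while lo < hi:                   # binary search on [lo, hi]
            mid = (lo + hi) // 2
            if vp_factorial(mid, p) >= k:
                hi = mid
            else:
                lo = mid + 1
        return hi

    print(smallest_N(3, 9))   # 21, matching the worked example above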
Why don't you try binary search for the answer, using the second formula you mentioned?
You only need to consider values of N for which p divides N, because if it doesn't, then N! and (N-1)! are divisible by the same power of p, so N can't be the smallest one.
Consider
I = (p^n)!
and ignore prime factors other than p. The result looks like
I = p^n * p^(n-1) * p^(n-2) * ... * p * 1
I = p^(n + (n-1) + (n-2) + ... + 2 + 1)
I = p^((n^2 + n)/2)
So we're trying to find the smallest n such that
(n^2 + n)/2 >= k
which if I remember the quadratic equation right gives us
N = p^n, where n >= (sqrt(1+8k) - 1)/2
EDIT:
This is wrong. Let me see if I can salvage it...
