Fast way to calculate n! mod m where m is prime? - algorithm

I was curious if there was a good way to do this. My current code is something like:
def factorialMod(n, modulus):
    ans = 1
    for i in range(1, n+1):
        ans = ans * i % modulus
    return ans % modulus
But it seems quite slow!
I also can't calculate n! and then apply the prime modulus because sometimes n is so large that n! is just not feasible to calculate explicitly.
I also came across http://en.wikipedia.org/wiki/Stirling%27s_approximation and wonder if this can be used at all here in some way?
Or, how might I create a recursive, memoized function in C++?

n can be arbitrarily large
Well, n can't be arbitrarily large - if n >= m, then n! ≡ 0 (mod m) (because m is one of the factors, by the definition of factorial).
Assuming n << m and you need an exact value, your algorithm can't get any faster, to my knowledge. However, if n > m/2, you can use Wilson's theorem (thanks @Daniel Fischer!) to cap the number of multiplications at about m-n:
(m-1)! ≡ -1 (mod m)
1 * 2 * 3 * ... * (n-1) * n * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
This gives us a simple way to calculate n! (mod m) in m-n-1 multiplications, plus a modular inverse:
def factorialMod(n, modulus):
    ans = 1
    if n <= modulus//2:
        # calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1, n+1):
            ans = (ans * i) % modulus
    else:
        # Fancypants method for large n (assumes modulus is prime and n < modulus)
        for i in range(n+1, modulus):
            ans = (ans * i) % modulus
        # modinv(a, m) is a modular-inverse helper that is not defined here
        ans = modinv(ans, modulus)
        ans = -1*ans + modulus
    return ans % modulus
We can rephrase the above equation in another way, which may or may not perform slightly faster. Using the identity x ≡ x - m (mod m), we can rephrase the equation as
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
n! ≡ -[(n+1-m) * ... * (m-2-m) * (m-1-m)]^(-1) (mod m)
(reverse order of terms)
n! ≡ -[(-1) * (-2) * ... * -(m-n-2) * -(m-n-1)]^(-1) (mod m)
n! ≡ -[(1) * (2) * ... * (m-n-2) * (m-n-1) * (-1)^(m-n-1)]^(-1) (mod m)
n! ≡ [(m-n-1)!]^(-1) * (-1)^(m-n) (mod m)
This can be written in Python as follows:
def factorialMod(n, modulus):
    ans = 1
    if n <= modulus//2:
        # calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1, n+1):
            ans = (ans * i) % modulus
    else:
        # Fancypants method for large n
        for i in range(1, modulus-n):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)
        # Since m is an odd prime, (-1)^(m-n) = -1 if n is even, +1 if n is odd
        if n % 2 == 0:
            ans = -1*ans + modulus
    return ans % modulus
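Both versions above call modinv, which the answer does not define, and neither handles n >= m. Here is a minimal self-contained sketch, assuming the modulus is prime, so that Fermat's little theorem gives the inverse as a^(m-2) mod m. The names modinv and factorial_mod_prime are illustrative, not from the original answer.

```python
def modinv(a, m):
    """Inverse of a modulo a prime m via Fermat's little theorem.

    Assumption: m is prime and a is not a multiple of m.
    """
    return pow(a, m - 2, m)


def factorial_mod_prime(n, m):
    """Return n! mod m for prime m, using at most about m/2 multiplications."""
    if n >= m:
        return 0  # m itself is one of the factors of n!
    ans = 1
    if n <= m // 2:
        # small n: multiply the factors directly
        for i in range(2, n + 1):
            ans = ans * i % m
        return ans
    # large n: n! ≡ (-1)^(m-n) * [(m-n-1)!]^(-1) (mod m)
    for i in range(2, m - n):
        ans = ans * i % m
    ans = modinv(ans, m)
    if (m - n) % 2 == 1:
        ans = m - ans
    return ans % m
```

For example, factorial_mod_prime(4, 7) takes the large-n branch: it computes 2! = 2, inverts it to 4, and negates it to get 3, which matches 24 mod 7.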
If you don't need an exact value, life gets a bit easier - you can use Stirling's approximation to calculate an approximate value of n! in O(log n) time (using exponentiation by squaring). Note that this approximates n! itself, not the residue n! mod m.
Finally, I should mention that if this is time-critical and you're using Python, try switching to C++. From personal experience, you should expect about an order-of-magnitude increase in speed or more, simply because this is exactly the sort of CPU-bound tight-loop that natively-compiled code excels at (also, for whatever reason, GMP seems much more finely-tuned than Python's Bignum).

Expanding my comment to an answer:
Yes, there are more efficient ways to do this. But they are extremely messy.
So unless you really need that extra performance, I don't suggest trying to implement these.
The key is to note that the modulus (which is essentially a division) is going to be the bottleneck operation. Fortunately, there are some very fast algorithms that allow you to perform modulus over the same number many times.
Division by Invariant Integers using Multiplication
Montgomery Reduction
These methods are fast because they essentially eliminate the modulus.
Those methods alone should give you a moderate speedup. To be truly efficient, you may need to unroll the loop to allow for better IPC:
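Something like this (the loop handles the factors in pairs, so the final factor is multiplied in separately when n is even):
def factorialMod(n, modulus):
    ans0 = 1
    ans1 = 1
    # two independent product chains: even factors in ans0, odd factors in ans1
    for i in range(1, (n + 1) // 2):
        ans0 = ans0 * (2*i + 0) % modulus
        ans1 = ans1 * (2*i + 1) % modulus
    # when n is even, the last factor n is not covered by the loop
    if n % 2 == 0 and n > 0:
        ans0 = ans0 * n % modulus
    return ans0 * ans1 % modulus
Combine this with one of the methods listed above.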
Some may argue that loop-unrolling should be left to the compiler. I will counter-argue that compilers are currently not smart enough to unroll this particular loop. Have a closer look and you will see why.
Note that although my answer is language-agnostic, it is meant primarily for C or C++.

n! mod m can be computed in O(n^(1/2 + ε)) operations instead of the naive O(n). This requires use of FFT polynomial multiplication, and is only worthwhile for very large n, e.g. n > 10^4.
An outline of the algorithm and some timings can be seen here: http://fredrikj.net/blog/2012/03/factorials-mod-n-and-wilsons-theorem/

If we want to calculate M = a*(a+1) * ... * (b-1) * b (mod p), we can use the following approach, assuming we can add, subtract and multiply fast (mod p), and get a running time complexity of O( sqrt(b-a) * polylog(b-a) ).
For simplicity, assume (b-a+1) = k^2, is a square. Now, we can divide our product into k parts, i.e. M = [a*..*(a+k-1)] *...* [(b-k+1)*..*b]. Each of the factors in this product is of the form p(x)=x*..*(x+k-1), for appropriate x.
By using a fast multiplication algorithm of polynomials, such as Schönhage–Strassen algorithm, in a divide & conquer manner, one can find the coefficients of the polynomial p(x) in O( k * polylog(k) ). Now, apparently there is an algorithm for substituting k points in the same degree-k polynomial in O( k * polylog(k) ), which means, we can calculate p(a), p(a+k), ..., p(b-k+1) fast.
This algorithm of substituting many points into one polynomial is described in the book "Prime numbers" by C. Pomerance and R. Crandall. Eventually, when you have these k values, you can multiply them in O(k) and get the desired value.
Note that all of our operations were taken (mod p).
The exact running time is O(sqrt(b-a) * log(b-a)^2 * log(log(b-a))).
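To make the block structure concrete, here is a simplified sketch of the decomposition. It builds the polynomial q(x) = x*(x+1)*...*(x+k-1) and evaluates it at the start of each block, but it uses schoolbook polynomial multiplication and Horner evaluation instead of the fast algorithms mentioned above. It therefore still takes about O(b-a) operations and only illustrates the idea; the function name range_product_mod is an assumption of this sketch.

```python
import math


def range_product_mod(a, b, p):
    """Return a*(a+1)*...*b mod p using k blocks of length k, k = floor(sqrt(b-a+1))."""
    n = b - a + 1
    if n <= 0:
        return 1 % p
    k = math.isqrt(n)

    # Coefficients of q(x) = x*(x+1)*...*(x+k-1), lowest degree first.
    coeffs = [1]
    for j in range(k):
        # multiply the current polynomial by (x + j)
        new = [0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d] = (new[d] + c * j) % p
            new[d + 1] = (new[d + 1] + c) % p
        coeffs = new

    result = 1 % p
    # Evaluate q at a, a+k, ..., a+(k-1)*k; each value is the product of one block.
    for t in range(k):
        x = a + t * k
        value = 0
        for c in reversed(coeffs):  # Horner's rule
            value = (value * x + c) % p
        result = result * value % p

    # Multiply in the factors left over when b-a+1 is not a perfect square.
    for v in range(a + k * k, b + 1):
        result = result * v % p
    return result
```

Calling range_product_mod(1, n, p) gives n! mod p. Replacing the two quadratic steps with FFT-based multiplication and multipoint evaluation is what yields the running time quoted above.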

Expanding on my comment, this takes about 50% of the time for all n in [100, 100007] where m=(117 | 1117):
Function facmod(n As Integer, m As Integer) As Integer
    Dim f As Integer = 1
    For i As Integer = 2 To n
        f = f * i
        ' reduce only once the running product reaches the modulus
        If f >= m Then
            f = f Mod m
        End If
    Next
    Return f
End Function

I found this following function on quora:
With f(n,m) = n! mod m;
function f(n,m:int64):int64;
begin
  if n <= 1 then f:= 1
  else f:= ((n mod m)*(f(n-1,m) mod m)) mod m;
end;
This probably beats using a time-consuming loop and multiplying large numbers stored in strings. It also works for any integer m.
The link where I found this function : https://www.quora.com/How-do-you-calculate-n-mod-m-where-n-is-in-the-1000s-and-m-is-a-very-large-prime-number-eg-n-1000-m-10-9+7

If n = (m - 1) for prime m then by http://en.wikipedia.org/wiki/Wilson's_theorem n! mod m = (m - 1)
Also, as has already been pointed out, n! mod m = 0 if n >= m.
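Both facts are easy to check numerically for small primes. The snippet below is only a sanity check that computes the factorial in full, so it is not a fast method; the helper name wilson_residue is illustrative.

```python
import math


def wilson_residue(m):
    """Return (m-1)! mod m, computed directly from the full factorial."""
    return math.factorial(m - 1) % m


for m in (2, 3, 5, 7, 11, 13):
    # Wilson's theorem: (m-1)! ≡ -1 ≡ m-1 (mod m) for prime m
    assert wilson_residue(m) == m - 1

# n >= m: the factor m appears in n!, so the residue is 0
assert math.factorial(13) % 13 == 0
```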

Assuming that the "mod" operator of your chosen platform is sufficiently fast, you're bounded primarily by the speed at which you can calculate n! and the space you have available to compute it in.
Then it's essentially a 2-step operation:
Calculate n! (there are lots of fast algorithms so I won't repeat any here)
Take the mod of the result
There's no need to complexify things, especially if speed is the critical component. In general, do as few operations inside the loop as you can.
If you need to calculate n! mod m repeatedly, then you may want to memoize the values coming out of the function doing the calculations. As always, it's the classic space/time tradeoff, but lookup tables are very fast.
Lastly, you can combine memoization with recursion (and trampolines as well if needed) to get things really fast.
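Here is a minimal sketch of the lookup-table idea for repeated queries against the same m. It reduces mod m at every step, as in the question's loop, rather than computing n! in full first. The name factorial_table and the limit parameter are assumptions of this sketch.

```python
def factorial_table(limit, m):
    """Return a list t with t[i] == i! mod m for 0 <= i <= limit."""
    table = [1 % m] * (limit + 1)
    for i in range(1, limit + 1):
        table[i] = table[i - 1] * i % m
    return table


table = factorial_table(10, 1000003)
print(table[10])  # each later query is a single list lookup
```

The table costs O(limit) time and space once. That is the space/time tradeoff mentioned above.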

Related

What is the correct time complexity for n factorial time function?

I am very new to this topic and I am trying to grasp everything related to the asymptotic notations. I want to ask for your opinion on the following question:
If we have, for an algorithm, that T(n)=n!, then we can say for its time complexity that:
1 x 1 x 1 ... x1 <= n! <= n x n x n ... x n
This relation means that n! = O(n^n) and n! = Ω(1). However, can't we do better? We want the big-oh to be as close as we can to the function T(n). If we do the following:
n! <= 1 x 2 x 3 x 4 ... x n x n
That is, for the second to last element, we replace (n-1) with n. Now isn't this relation true? So isn't it true that n! = O(1 x 2 ... x n x n)? Something similar can be said for the lower bound Ω.
I am not sure if there is an error in my thought process so I would really appreciate your input. Thanks in advance.
The mathematical statement n! = O(1 x 2 ... x n x n) is true. But also not terribly helpful nor enlightening. In what situations do you want to write n! = O(...)?
Either you are satisfied with n! = n!, and you don't need to write n! = O(1 x 2 ... x n x n). Or you are not satisfied with n! = n!; you want something that explains better exactly how large is n!; then you shouldn't be satisfied with n! = O(1 x 2 ... x n x n) either, as it is not any easier to understand.
Personally, I am satisfied with polynomials, like n^2. I am satisfied with exponentials, like 2^n. I am also somewhat satisfied with n^n, because I know n^n = 2^(n log n), and I also know I can't hope to find a better expression for n^n.
But I am not satisfied with n!. I would like to be able to compare it to exponentials.
Here are two comparisons:
n! < n^n (for n >= 2)
2^n < n! (for n >= 4)
The first one is obtained by upper-bounding every factor by n in the product; the second one comes from lower-bounding every factor except 1 by 2, which gives 2^(n-1) <= n!, with the missing factor of 2 absorbed once n >= 4.
That's already pretty good; it tells us that n! is somewhere between the exponential 2^n and the superexponential n^n.
But you can easily tell that the upperbound n^n is too high; for instance, you can find the following tighter bounds quite easily:
n! < n^(n-1)
n! < 2 * n^(n-2)
n! < 6 * n^(n-3)
Note that n^(n-3) is a lot smaller than n^n when n is big! This is slightly better, but still not satisfying.
You could go even further, and notice that half the factors are smaller than n/2, thus:
n! < (n/2)^(n/2) * n^(n/2) = (1/2)^(n/2) * n^n = (n / sqrt(2))^n =~ (0.7 n)^n
This is a slightly tighter upper bound! But can we do even better? I am still not satisfied.
If you are not satisfied either, I encourage you to read: https://en.wikipedia.org/wiki/Stirling%27s_approximation
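As a quick numerical illustration of the bounds above, the following sketch compares n! with 2^n, (n/sqrt(2))^n, n^n and the leading term of Stirling's approximation, sqrt(2*pi*n) * (n/e)^n. The function name factorial_bounds is illustrative.

```python
import math


def factorial_bounds(n):
    """Return (2^n, n!, Stirling estimate, (n/sqrt(2))^n, n^n) for comparison."""
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    return 2 ** n, math.factorial(n), stirling, (n / math.sqrt(2)) ** n, n ** n


for n in (5, 10, 20):
    low, exact, stirling, mid, high = factorial_bounds(n)
    # print whether 2^n < n! < (n/sqrt(2))^n < n^n holds, and the ratio n!/Stirling
    print(n, low < exact < mid < high, exact / stirling)
```

The ratio n!/Stirling approaches 1 as n grows, which is why Stirling's formula is much tighter than any of the hand-made bounds.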

Time complexity: theory vs reality

I'm currently doing an assignment that requires us to discuss time complexities of different algorithms.
Specifically sum1 and sum2
def sum1(a):
    """Return the sum of the elements in the list a."""
    n = len(a)
    if n == 0:
        return 0
    if n == 1:
        return a[0]
    return sum1(a[:n//2]) + sum1(a[n//2:])
def sum2(a):
    """Return the sum of the elements in the list a."""
    return _sum(a, 0, len(a)-1)
def _sum(a, i, j):
    """Return the sum of the elements from a[i] to a[j]."""
    if i > j:
        return 0
    if i == j:
        return a[i]
    mid = (i+j)//2
    return _sum(a, i, mid) + _sum(a, mid+1, j)
Using the Master theorem, my best guess for both of these is
T(n) = 2*T(n/2)
which according to Wikipedia should equate to O(n) if I haven't made any mistakes in my assumptions. However, when I do a benchmark with different arrays of length N with random integers in the range 1 to 100, I get the following result.
I've tried running the benchmark multiple times and I get the same result each time. sum2 seems to be twice as fast as sum1, which baffles me since they should perform the same number of operations.
My question is: are these algorithms both linear, and if so, why do their run times vary?
If it does matter, I'm running these tests on Python 2.7.14.
sum1 looks like O(n) on the surface, but for sum1 T(n) is actually 2T(n/2) + 2*n/2. This is because of the list slicing operations which themselves are O(n). Using the master theorem, the complexity becomes O(n log n) which causes the difference.
Thus, for sum1, the time taken t1 = k1 * n log n. For sum2, time taken t2 = k2 * n.
Since you are plotting a time vs log n graph, let x = log n. Then,
t1 = k1 * x * 10^x
t2 = k2 * 10^x
With suitable values for k1 and k2, you get a graph very similar to yours. From your data, when x = 6, 0.6 ~ k1 * 6 * 10^6 or k1 ~ 10^(-7) and 0.3 ~ k2 * 10^6 or k2 = 3 * 10^(-7).
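The extra work done by the slices can be counted directly. The sketch below is a variant of sum1 that also returns how many list elements the slicing copied; the name sum1_counting is an assumption of this example.

```python
def sum1_counting(a):
    """Like sum1, but also return how many elements the slices copied."""
    n = len(a)
    if n == 0:
        return 0, 0
    if n == 1:
        return a[0], 0
    left_sum, left_copies = sum1_counting(a[:n // 2])
    right_sum, right_copies = sum1_counting(a[n // 2:])
    # the two slices together copy all n elements at this level
    return left_sum + right_sum, left_copies + right_copies + n


total, copies = sum1_counting(list(range(1024)))
print(total, copies)
```

For n = 2^10 there are 10 levels of recursion that slice, and each level copies n elements in total, so the copy count is 1024 * 10 = n log2(n). sum2 copies nothing.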
Your graph has log10(N) on the x-axis, which means that the right-most data points are for an N value that's ten times the previous ones. And, indeed, they take roughly ten times as long. So that's a linear progression, as you expect.

Find the efficiency in Big-O notation

I was having problem with the following question
Consider the following nested loop construct. Categorize its efficiency in terms of the
variable n using "big-o" notation. Suppose the statements represented by the ellipsis
(...) require four main memory accesses (each requiring one microsecond) and two
disk file accesses (each requiring one millisecond). Express in milliseconds the amount
of time this construct would require to execute if n were 1000.
x = 1;
do
{
    y = n;
    while (y > 0)
    {
        ...
        y--;
    }
    x *= 2;
} while (x < n*n);
Inner loop with y is O(n).
Outer loop runs with x = 1, 2, 2^2, 2^3, ... 2^k < n * n. Hence it runs in O(log(n*n)) which is O(2 * log(n))
Hence complexity is O(n * log(n))
Just to add some more explanation to the other answer, a notable part of the code is the x *= 2; i.e. a doubling. So this part is not linear, and you should be thinking log2.
Therefore, x will reach n*n in log2(n*n) = log2(n^2) = 2 x log2(n) steps.
The y countdown is linear - so that is O(n)
There is a loop within a loop so you multiply both operations as in:
n * 2 x log2(n) = O(n * 2 * log2(n)). Then you take out constant factors to get:
O(n * log2(n))
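The question also asks for the running time at n = 1000, which the answers leave open. Here is a small simulation of the loops, assuming that only the accesses inside the ellipsis cost time and that loop overhead is ignored; the function and constant names are illustrative.

```python
def count_body_executions(n):
    """Simulate the nested loops and count how often the '...' body runs."""
    count = 0
    x = 1
    while True:                 # do { ... } while (x < n*n);
        y = n
        while y > 0:
            count += 1          # stands for the '...' statements
            y -= 1
        x *= 2
        if not x < n * n:
            break
    return count


# 4 memory accesses at 1 microsecond each + 2 disk accesses at 1000 microseconds each
MICROSECONDS_PER_BODY = 4 * 1 + 2 * 1000

runs = count_body_executions(1000)
total_ms = runs * MICROSECONDS_PER_BODY / 1000
print(runs, total_ms)
```

For n = 1000 the outer loop runs 20 times (x goes from 1 to 2^20, the first power of two not below 10^6), so the body runs 20 * 1000 = 20000 times. At 2.004 ms per execution that is about 40080 ms.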

1/x + 1/y = 1/N(factorial)

The question is, how to solve 1/x + 1/y = 1/N! (N factorial). Find the number of values that satisfy x and y for large values of N.
I've solved the problem for relatively small values of N (any N! that'll fit into a long). So, I know I solve the problem by getting all the divisors of (N!)^2. But that starts failing when (N!)^2 fails to fit into a long. I also know I can find all the divisors of N! by adding up all the prime factors of each number factored in N!. What I am missing is how I can use all the numbers in the factorial to find the x and y values.
EDIT: Not looking for the "answer" just a hint or two.
Problem : To find the count of factors of (N!)^2.
Hints :
1) You don't really need to compute (N!)^2 to find its prime factors.
Why?
Say you find the prime factorization of N! as (p1^k1) x (p2^k2) .... (pi^ki)
where pj's are primes and kj's are exponents.
Now the number of factors of N! is as obvious as
(k1 + 1) x (k2 + 1) x ... x (ki + 1).
2) For (N!)^2, the above expression would be,
(2*k1 + 1) * (2*k2 + 1) * .... * (2*ki + 1)
which is essentially what we are looking for.
For example, lets take N=4, N! = 24 and (N!)^2 = 576;
24 = 2^3 * 3^1;
Hence no of factors = (3+1) * (1+1) = 8, viz {1,2,3,4,6,8,12,24}
For 576 = 2^6 * 3^2, it is (2*3 + 1) * (2*1 + 1) = 21;
3) Basically you need to find the multiplicity of each prime <= N here.
Please correct me if i'm wrong somewhere till here.
Here is your hint. Suppose that m = p1^k1 · p2^k2 · ... · pj^kj. Every factor of m will have from 0 to k1 factors of p1, 0 to k2 factors of p2, and so on. Thus there are (1 + k1) · (1 + k2) · ... · (1 + kj) possible divisors.
So you need to figure out the prime factorization of (n!)^2.
Note, this will count, for instance, 1⁄6 = 1⁄8 + 1⁄24 as being a different pair from 1⁄6 = 1⁄24 + 1⁄8. If order does not matter, add 1 and divide by 2. (The divide by 2 is because typically 2 divisors will lead to the same answer, with the add 1 for the exception that the divisor n! leads to a pair that pairs with itself.)
It's more to math than programming.
Your equation implies xy = n!(x+y).
Let c = gcd(x,y), so x = cx', y= cy', and gcd(x', y')=1.
Then c^2 x' y'=n! c (x'+y'), so cx'y' = n!(x' + y').
Now, as x' and y' are coprime, x'y' is coprime to x'+y', so c must be divisible by x'+y'.
So c = a(x'+y'), which gives ax'y' = n!.
To solve your problem, find all pairs of coprime numbers x', y' whose product divides n!; each such pair gives a solution (n!(x'+y')/y', n!(x'+y')/x').
Let F(N) be the number of (x,y) combinations that satisfy your requirements.
F(N+1) = F(N) + #(x,y) that satisfy the condition for N+1 where at least one of them (x or y) is not divisible by N+1.
The intuition here is that for every combination (x,y) that works for N, (x*(N+1), y*(N+1)) works for N+1. Also, if (x,y) is a solution for N+1 and both are divisible by N+1, then (x/(N+1), y/(N+1)) is a solution for N.
Now, I am not sure how difficult it is to find #(x,y) that work for (N+1) and at least one of them not divisible by N+1, but should be easier than solving the original problem.
The multiplicity (exponent) of a prime p in N! can be found with the formula below (Legendre's formula):
Exponent of p in N! = [N/p] + [N/p^2] + [N/p^3] + [N/p^4] + ...
where [x] is the floor function, e.g. [1.23] = integer part of 1.23 = 1
E.g. exponent of 3 in 24! = [24/3] + [24/9] + [24/27] + ... = 8 + 2 + 0 + 0 + ... = 10
So the whole problem reduces to identifying the primes up to N and finding the exponent of each in N!
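Putting the hints together, here is a sketch that counts the ordered solutions as the number of divisors of (N!)^2 without ever computing N!. It assumes ordered positive pairs (x, y) are wanted; the function names are illustrative.

```python
def legendre_exponent(n, p):
    """Exponent of the prime p in n!: floor(n/p) + floor(n/p^2) + ..."""
    exponent = 0
    power = p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if sieve[i]:
            primes.append(i)
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return primes


def count_ordered_solutions(n):
    """Number of ordered positive pairs (x, y) with 1/x + 1/y = 1/n!."""
    count = 1
    for p in primes_up_to(n):
        # p has exponent 2*k in (n!)^2, contributing a factor of 2*k + 1
        count *= 2 * legendre_exponent(n, p) + 1
    return count


print(count_ordered_solutions(4))  # 21, matching the N = 4 example above
```

Python integers do not overflow, so the count stays exact for large N. In a language with fixed-width integers the product itself may need a big-integer type or a modulus.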

n! modulo m , a^p modulo m

Is there a faster algorithm to calculate (n! modulo m),
faster than reducing at every multiplication step?
And also:
Is there a faster algorithm to calculate (a^p modulo m), better than the right-to-left binary method?
Here is my code:
n! mod m
ans=1;
for(int i=1;i<=n;i++)
    ans=(ans*i)%m;
a^p mod m
result=1;
while(p>0){
    if(p%2!=0)
        result=(result*a)%m;
    p=(p>>1);
    a=(a*a)%m;
}
Computing a^p mod m takes O(log p) multiplications; that is the modular exponentiation algorithm.
For the other one, n! mod m, the algorithm you proposed is clearly O(n), so obviously the first algorithm is faster.
The standard trick for computing a^p modulo m is to use successive squaring. The idea is to expand p into binary, say
p = e0 * 2^0 + e1 * 2^1 + ... + en * 2^n
where (e0,e1,...,en) are binary (0 or 1) and en = 1. Then use laws of exponents to get the following expansion for a^p
a^p = a^( e0 * 2^0 + e1 * 2^1 + ... + en * 2^n )
= a^(e0 * 2^0) * a^(e1 * 2^1) * ... * a^(en * 2^n)
= (a^(2^0))^e0 * (a^(2^1))^e1 * ... * (a^(2^n))^en
Remember that each ei is either 0 or 1, so these just tell you which numbers to take. So the only computations that you need are
a, a^2, a^4, a^8, ..., a^(2^n)
You can generate this sequence by squaring the previous term. Since you want to compute the answer mod m, you should do the modular arithmetic first. This means you want to compute the following
A0 = a mod m
Ai = (A(i-1))^2 mod m for i >= 1
The answer is then
a^p mod m = (A0^e0 * A1^e1 * ... * An^en) mod m
Therefore the computation takes about log2(p) squarings and reductions mod m.
I'm not certain whether or not there is an analog for factorials, but a good place to start looking would be at Wilson's Theorem. Also, you should put in a test for m <= n, in which case n! mod m = 0.
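The successive-squaring scheme described above can be sketched as follows. The variable square plays the role of Ai, and the bits of p are consumed from least significant to most significant. The name mod_pow is illustrative; Python's built-in pow(a, p, m) does the same job.

```python
def mod_pow(a, p, m):
    """Compute a^p mod m by successive squaring (right-to-left binary method)."""
    result = 1 % m
    square = a % m                      # A0 = a mod m
    while p > 0:
        if p & 1:                       # current binary digit e_i is 1
            result = result * square % m
        square = square * square % m    # A_(i+1) = (A_i)^2 mod m
        p >>= 1
    return result


print(mod_pow(2, 10, 1000))  # 1024 mod 1000 = 24
```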
For the first computation, you should only bother with the mod operator if ans >= m:
ans=1;
for(int i=1;i<=n;i++) {
    ans *= i;
    if (ans >= m) ans %= m;
}
For the second computation, using (p & 1) != 0 will probably be a lot faster than using p%2!=0 (unless the compiler recognizes this special case and does it for you). Then the same comment applies about avoiding the % operator unless necessary.
