Fast algorithm to calculate large n! mod 2³² - algorithm

I want to calculate the exact value of N! mod 2^32. N can be up to 2^31.
Any language is fine but I would appreciate detailed explanation of algorithm.
Time limit is < 1 sec

In python:
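from functools import reduce
# hypothetical wrapper name; the original showed only the function body
def factorial_mod(n):
    if n > 33:
        return 0
    return reduce(lambda x, y: x*y, range(1, n+1), 1) % 2**32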
We know that 34! is divisible by 2^32 because in the sequence:
1 * 2 * 3 * 4 * ... * 34
there are:
17 multiples of 2
8 multiples of 4
4 multiples of 8
2 multiples of 16
1 multiple of 32
for a total of 32 factors of 2.
34! is a factor of every larger factorial, so all the larger ones are 0 mod 2^32.
For small values of N, if you don't have bignum arithmetic available, you can do the individual multiplications mod 2^32, and/or you can prefactor the power of 2 in the factorial, which is easy to compute (see above).
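The count of factors of 2 above can be checked with a short sketch. The function name is an assumption made for this illustration; the formula is the usual sum of integer quotients n//2 + n//4 + n//8 + ...

```python
def power_of_two_in_factorial(n):
    """Exponent of 2 in n!: n//2 + n//4 + n//8 + ... (hypothetical helper)."""
    exponent = 0
    power = 2
    while power <= n:
        exponent += n // power  # multiples of this power of 2 in 1..n
        power *= 2
    return exponent

print(power_of_two_in_factorial(33))  # 31, so 33! is not yet divisible by 2**32
print(power_of_two_in_factorial(34))  # 32, so 34! is divisible by 2**32
```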

Calculate the factorial normally (multiply the numbers 1,2,3,...), performing the modulo after each multiplication. This will give you the result for small values of N.
For larger values of N, do the same. Pretty soon, your intermediate result will be 0, and then you can stop the loop immediately and return 0. You will reach that point quickly: for N == 64 the result will already be 0, because the product of 1..64 contains 32 even numbers and is therefore divisible by 2^32. The actual minimal value of N where you get 0 is smaller than 64 (it is 34, since 34! contains exactly 32 factors of 2).
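A minimal sketch of this loop, assuming nothing beyond plain integer arithmetic. The names `MASK` and `factorial_mod_2_32` are made up for the example; the early exit is what keeps huge N cheap.

```python
MASK = 0xFFFFFFFF  # 2**32 - 1; AND with it reduces modulo 2**32

def factorial_mod_2_32(n):
    """n! mod 2**32, reducing after every multiplication (illustrative name)."""
    result = 1
    for i in range(2, n + 1):
        result = (result * i) & MASK
        if result == 0:
            # Once the running product is 0 it stays 0, so stop early.
            break
    return result
```

With the early exit, a call such as `factorial_mod_2_32(2**31)` performs only a few dozen multiplications before returning 0.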

In general, you can implement algorithms modulo small powers of two without bignums or modular reduction using the integer types (int, long) available in most programming languages. For modulo 2^32 you would use a 32-bit int. "Integer overflow" takes care of the modular arithmetic.
In this case, since there are only 34 distinct results, a lookup table may be faster than computing the factorial, assuming the factorials are used often enough that the table gets loaded into the CPU cache. The execution time will be measured in microseconds.
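The lookup-table idea could look roughly like this. The table and function names are assumptions for the sketch; the table covers 0! through 33!, and everything from 34 upward maps to 0.

```python
MASK = 0xFFFFFFFF  # 2**32 - 1

# Precompute n! mod 2**32 for n = 0..33 once, at import time.
FACTORIAL_TABLE = [1]
for i in range(1, 34):
    FACTORIAL_TABLE.append((FACTORIAL_TABLE[-1] * i) & MASK)

def factorial_mod_2_32_lookup(n):
    """Table lookup for n < 34, constant 0 otherwise (illustrative name)."""
    return FACTORIAL_TABLE[n] if n < 34 else 0
```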

When multiplying 2 numbers of arbitrary length, the lower bits are always exact because it doesn't depend on high order bits. Basically a×b mod m = [(a mod m)×(b mod m)] mod m so to do N! mod m just do
1×2×...×N mod m = (...(((1×2 mod m)×3 mod m)×4 mod m)...)×N mod m
Modulo 2^n is a special case because getting the modulus is rather easy with an AND operation. Modulo 2^32 is even more special because all unsigned operations in C and most C-like languages are reduced modulo 2^32 for a 32-bit unsigned type.
As a result you can just multiply the numbers in a twice-as-wide type, then AND with 2^32 - 1 to get the modulus:
uint64_t p = 1;
for (uint32_t i = 1; i <= n; i++)
    p = p*i & 0xFFFFFFFFU;  /* keep only the low 32 bits */
return p;

Calculating a modulo is a very fast operation, especially the modulo of a power of 2. A multiplication is very costly in comparison.
The fastest algorithm would factorize the factors of the factorial in prime numbers (which is very fast since the numbers are smaller than 33). And get the result by multiplying all of them together, by taking the modulo in between each multiplication, and starting with the big numbers.
E.g.: to calculate 10! mod 2^32, use de Polignac's formula to get the prime factors of 10!,
which gives you:
10! = 7 * 5 * 5 * 3 * 3 * 3 * 3 * 2 ...
this would be faster than the basic algorithm, because calculating
(29! mod 2^32) * 30
is much harder than multiplying by 5, 3 and 2, and taking the modulo in between each time.
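As a rough illustration of the prime-factor approach in this answer (not a claim that it is faster in practice), the sketch below gets each prime's exponent in n! from de Polignac's formula and multiplies the prime powers modulo 2^32. All function names are assumptions for the example.

```python
def primes_up_to(n):
    """Primes <= n by trial division; fine for the tiny n used here."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def prime_exponents_of_factorial(n):
    """Map each prime p <= n to its exponent in n! (de Polignac's formula)."""
    exponents = {}
    for p in primes_up_to(n):
        e, q = 0, p
        while q <= n:
            e += n // q  # multiples of p, p^2, p^3, ... in 1..n
            q *= p
        exponents[p] = e
    return exponents

def factorial_mod_from_primes(n, modulus=2 ** 32):
    """n! mod modulus, built from prime powers with modular exponentiation."""
    result = 1
    for p, e in prime_exponents_of_factorial(n).items():
        result = result * pow(p, e, modulus) % modulus
    return result

print(prime_exponents_of_factorial(10))  # {2: 8, 3: 4, 5: 2, 7: 1}
```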


Making a very large calculation

I want to calculate the value X = n!/2^r
where n < 10^6 and r < 10^6,
and it is guaranteed that the value of X is between 0 and 10.
How can I calculate X? I can't simply divide the factorial by the power term, since both overflow a long integer.
My Approach
Use modular arithmetic. Take a prime number greater than 10, say 101:
X = [(N! % 101) * (modular inverse of 2^r)] % 101
Note that the modular inverse is easy to calculate, and 2^r % 101 can be computed as well.
It is not guaranteed that X is always an integer; it can also be a fraction.
My method works fine when X is an integer. How do I deal with the case where X is a floating-point number?
If approximate results are OK and you have access to a math library with base-2 exponential (exp2 in C), natural log gamma (lgamma in C), and natural log (log in C), then you can do
exp2(lgamma(n+1)/log(2) - r).
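The same expression can be written in Python with `math.lgamma` and `math.log`. This is only a sketch: the function name is made up, and the result is approximate because the logarithm of n! is held in a double.

```python
import math

def approx_factorial_over_power_of_two(n, r):
    """Approximate n! / 2**r without forming n! (illustrative name)."""
    # lgamma(n + 1) is ln(n!); dividing by ln(2) converts it to log2(n!).
    log2_x = math.lgamma(n + 1) / math.log(2) - r
    return 2.0 ** log2_x
```

For example, `approx_factorial_over_power_of_two(5, 3)` should come out very close to 15, since 5!/2^3 = 120/8.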
Find the exponent with which 2 appears in n!. This is:
P = n / 2 + n / 2^2 + n / 2^3 + ...
using integer division, until you reach a 0 term.
If P >= r, then you have an integer result. You can find this result by computing the factorial such that you ignore r powers of 2. Something like:
factorial = 1
for i = 2 to n:
    factor = i
    while factor % 2 == 0 and r != 0:
        factor /= 2
        r -= 1
    factorial *= factor
If P < r, set r = P, apply the same algorithm and divide the result by 2^(initial_r - P) in the end.
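Both cases (P >= r and P < r) can be put together in a small sketch. It assumes Python's arbitrary-precision integers and `fractions.Fraction`, so it shows the bookkeeping rather than an overflow-free method for fixed-width types; the function names are hypothetical.

```python
from fractions import Fraction

def power_of_two_in_factorial(n):
    """P = n//2 + n//4 + n//8 + ... (exponent of 2 in n!)."""
    exponent, power = 0, 2
    while power <= n:
        exponent += n // power
        power *= 2
    return exponent

def factorial_over_power_of_two(n, r):
    """Exact value of n! / 2**r as a Fraction (illustrative name)."""
    p = power_of_two_in_factorial(n)
    to_strip = min(p, r)  # factors of 2 we can cancel inside the product
    product = 1
    for i in range(2, n + 1):
        factor = i
        while factor % 2 == 0 and to_strip > 0:
            factor //= 2
            to_strip -= 1
        product *= factor
    # If r > P, the leftover powers of 2 stay in the denominator.
    return Fraction(product, 2 ** (r - min(p, r)))

print(factorial_over_power_of_two(5, 3))  # 15
print(factorial_over_power_of_two(5, 5))  # 15/4
```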
Except for a very few cases (with small n and r) X will not be an integer -- for if n >= 11 then 11 divides n! but doesn't divide any power of two, so if X were integral it would have to be at least 11.
One method would be: initialise X to one; then loop: while X > 10, divide by 2 until it is not; while X < 10, multiply by the next factor until it is not; continue until you run out of factors and powers of 2.
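A floating-point sketch of this balancing loop follows. The function and variable names are assumptions, and the result carries ordinary double rounding error; the point is only that the running value stays small as long as halvings remain.

```python
def balanced_quotient(n, r):
    """Approximate n! / 2**r, interleaving multiplications and halvings."""
    x = 1.0
    next_factor = 2
    remaining_halvings = r
    while next_factor <= n or remaining_halvings > 0:
        if remaining_halvings > 0 and (x > 10 or next_factor > n):
            # Too large, or no factors left: use up one power of 2.
            x /= 2
            remaining_halvings -= 1
        else:
            # Small enough, or no halvings left: multiply in the next factor.
            x *= next_factor
            next_factor += 1
    return x
```

For n = 5, r = 3 the loop multiplies up to 24, halves twice to 6, multiplies by 5 to reach 30, and halves once more to end at 15.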
An approach that would be tunable for precision/performance would be the following:
Store the factorial in an integer with a fixed number of bits. We can drop the last few digits if the number gets too large, since they won't affect the overall result altogether that much. By scaling this integer larger/smaller the algorithm gets tunable for either performance or precision.
Whenever the integer would overflow due to multiplication, shift it to the right by a few places and subtract that value from r. In the end there should be a small number left as r and an integer v with the most significant bits of the factorial. This v can now be interpreted as a fixed-point number with r fractional digits.
Depending upon the required precision this approach might even work with long, though I haven't had the time to test it yet, apart from a bit of experimenting with a calculator.

How many numbers have a maximum number of unique prime factors in a given range

Note that the divisors have to be unique
So 32 has 1 unique prime factor [2], 40 has [2, 5] and so on.
Given a range [a, b], a, b <= 2^31, we should find how many numbers in this range have a maximum number of unique divisors.
The best algorithm I can imagine is an improved Sieve of Eratosthenes, with an array counting how many prime factors a number has. But it is not only O(n), which is unacceptable with such a range, but also very inefficient in terms of memory.
What is the best algorithm to solve this question? Is there such an algorithm?
I'll write a first idea in Python-like pseudocode. First find out how many prime factors you may need at most:
p = 1
i = 0
while primes[i] * p <= b:
    p = p * primes[i]
    i = i + 1
This only used b, not a, so you may have to decrease the number of actual prime factors. But since the result of the above is at most 9 (as the product of the first 10 primes already exceeds 2^31), you can conceivably go down from this maximum one step at a time:
cnt = 0
while cnt == 0:
    cnt = count(i, 1, 0)
    i = i - 1
return cnt
So now we need to implement this function count, which I define recursively.
def count(numFactorsToGo, productSoFar, nextPrimeIndex):
    if numFactorsToGo > 0:
        cnt = 0
        while productSoFar * primes[nextPrimeIndex] <= b:
            cnt = cnt + count(numFactorsToGo - 1,
                              productSoFar * primes[nextPrimeIndex],
                              nextPrimeIndex + 1)
            nextPrimeIndex = nextPrimeIndex + 1
        return cnt
    return floor(b / productSoFar) - ceil(a / productSoFar) + 1
This function has two cases to distinguish. In the first case, you don't have the desired number of prime factors yet. So you multiply in another prime, which has to be larger than the largest prime already included in the product so far. You achieve this by starting at the given index for the next prime. You add the counts for all these recursive calls.
The second case is where you have reached the desired number of prime factors. In this case, you want to count all possible integers k such that a ≤ k∙p ≤ b. Which translates easily into ⌈a/p⌉ ≤ k ≤ ⌊b/p⌋ so the count would be ⌊b/p⌋ − ⌈a/p⌉ + 1. In an actual implementation I'd not use floating-point division and floor or ceil, but instead I'd make use of truncating integer division for the sake of performance. So I'd probably write this line as
return (b // productSoFar) - ((a - 1) // productSoFar + 1) + 1
As it is written now, you'd need the primes array precomputed up to 2^31, which would be a list of 105,097,565 numbers according to Wolfram Alpha. That will cause considerable memory requirements, and will also make the outer loops (where productSoFar is still small) iterate over a large number of primes which won't be needed later on.
One thing you can do is change the end of loop condition. Instead of just checking that adding one more prime doesn't make the product exceed b, you can check whether including the next primesToGo primes in the product is possible without exceeding b. This will allow you to end the loop a lot earlier if the total number of prime factors is large.
For a small number of prime factors, things are still tricky. In particular if you have a very narrow range [a, b] then the number with maximal prime factor count might well be a large prime factor times a product of very small primes. Consider for example [2147482781, 2147482793]. This interval contains 4 elements with 4 distinct factors, some of which contain quite large prime factors, namely
3 ∙ 5 ∙ 7 ∙ 20,452,217
2^2 ∙ 3 ∙ 11 ∙ 16,268,809
2 ∙ 5 ∙ 19 ∙ 11,302,541
2^3 ∙ 7 ∙ 13 ∙ 2,949,839
There are only 4,792 primes up to sqrt(2^31), with 46,337 as the largest (which fits into a 16-bit unsigned integer), so it would be possible to precompute only those and use them to factor each number in the range. But that would again mean iterating over the range, which makes sense for small ranges, but not for large ones.
So perhaps you need to distinguish these cases up front, and then choose the algorithm accordingly. I don't have a good idea of how to combine these ideas – yet. If someone else does, feel free to extend this post or write your own answer building on this.
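For small bounds, the recursive counting idea from this answer can be sketched as runnable Python. Two things here are assumptions for the demo and not part of the answer: the primes are sieved all the way up to b (only feasible for small b), and a >= 1. The function names are made up.

```python
def primes_up_to(limit):
    """Simple Sieve of Eratosthenes; only suitable for small limits."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

def count_max_distinct_prime_factors(a, b):
    """Return (k, count): k is the largest number of distinct prime factors
    of any integer in [a, b], count is how many integers attain it."""
    primes = primes_up_to(b)

    def count(factors_to_go, product, next_index):
        if factors_to_go == 0:
            # Multiples of `product` in [a, b], with integer division only.
            return b // product - ((a - 1) // product + 1) + 1
        total = 0
        while next_index < len(primes) and product * primes[next_index] <= b:
            total += count(factors_to_go - 1,
                           product * primes[next_index], next_index + 1)
            next_index += 1
        return total

    # Upper bound on k: how many of the smallest primes fit into b.
    k, p = 0, 1
    while k < len(primes) and p * primes[k] <= b:
        p *= primes[k]
        k += 1
    # Step down from the upper bound until something is found.
    while k > 0:
        found = count(k, 1, 0)
        if found > 0:
            return k, found
        k -= 1
    return 0, (1 if a <= 1 <= b else 0)

print(count_max_distinct_prime_factors(1, 100))  # (3, 8)
```

Each integer with exactly k distinct primes is a multiple of exactly one product of k distinct primes, so nothing is double-counted as long as no integer in the range has more than k, which the step-down loop guarantees.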

Why is naive multiplication n^2 time?

I've read that operations such as addition/subtraction were linear time, and that "grade-school" long multiplication is n^2 time. Why is this true?
Isn't addition floor(log n) times, when n is the smaller operand? The same argument goes for subtraction, and for multiplication, if we make a program to do long multiplication instead of adding integers together, shouldn't the complexity be floor(log a) * floor(log b) where a and b are the operands?
The answer depends on what is "n." When they say that addition is O(n) and multiplication (with the naïve algorithm) is O(n^2), n is the length of the number, either in bits or some other unit. This definition is used because arbitrary precision arithmetic is implemented as operations on lists of "digits" (not necessarily base 10).
If n is the number being added or multiplied, the complexities would be log n and (log n)^2 for positive n, as long as the numbers are stored in log n space.
The naive approach to multiplication of (for example) 273 x 12 is expanded out (using the distributive rule) as (200 + 70 + 3) x (10 + 2) or:
200 x 10 + 200 x 2
+ 70 x 10 + 70 x 2
+ 3 x 10 + 3 x 2
The idea of this simplification is to reduce the multiplications to something that can be done easily. For your primary school math, that would be working with digits, assuming you know the times tables from zero to nine. For bignum libraries where each "digit" may be a value from 0 to 9999 (for ease of decimal printing), the same rules apply: numbers less than 10,000 can be multiplied in roughly constant time.
Hence, if n is the number of digits, the complexity is indeed O(n^2), since the number of "constant" operations tends to rise with the product of the "digit" counts.
This is true even if your definition of digit varies slightly (such as being a value from 0 to 9999 or even being one of the binary digits 0 or 1).
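To make the operation count concrete, here is a small sketch of schoolbook multiplication on digit lists that also counts the single-digit multiplications. The function name and the least-significant-first digit order are choices made for this example.

```python
def schoolbook_multiply(x_digits, y_digits, base=10):
    """Multiply two digit lists (least significant digit first).

    Returns (product_digits, number_of_single_digit_multiplications).
    """
    result = [0] * (len(x_digits) + len(y_digits))
    multiplications = 0
    for i, xd in enumerate(x_digits):
        carry = 0
        for j, yd in enumerate(y_digits):
            total = result[i + j] + xd * yd + carry
            multiplications += 1
            result[i + j] = total % base
            carry = total // base
        result[i + len(y_digits)] += carry  # slot not yet touched by this row
    return result, multiplications

# 273 x 12: three digits times two digits gives 3 * 2 = 6 digit products.
digits, ops = schoolbook_multiply([3, 7, 2], [2, 1])
print(digits, ops)  # [6, 7, 2, 3, 0] 6
```

The count is always len(x_digits) * len(y_digits), which is n^2 when both operands have n digits.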

Karatsuba Multiplication for unequal size, non-power-of-2 operands

What's the most efficient way to implement Karatsuba large number multiplication with input operands of unequal size and whose size is not a power of 2 and perhaps not even an even number? Padding the operands means additional memory, and I want to try and make it memory-efficient.
One of the things I notice in non-even-number size Karatsuba is that if we try to divide the number into "halves" as close to even as possible, one half will have m+1 elements and the other m, where m = floor(n/2), n being the number of elements in the split number. If both numbers are of the same odd size, then we need to compute products of two numbers of size m+1, requiring n+1 storage, as opposed to n in the case when n is even. So am I right in guessing that Karatsuba for odd sizes may require slightly more memory than for even sizes?
Most of the time, the length of the operands will not be a power of 2; I think that is the rare case. Most of the time the operands will also have different lengths. But this is not a problem for the Karatsuba algorithm.
Actually, I don't see any problem here. The overhead for odd lengths is light and definitely not a big deal. As for different lengths, let's assume that X = 1234 and Y = 45.
So, a = 12, b = 34, c = 0, d = 45
Then X * Y = 10^4 * ac + 10^2 * (ad + bc) + bd, where
ac = 12 * 0 = 0;
bd = 34 * 45 = 1530;
ad + bc = (a + b) * (c + d) - ac - bd = 46 * 45 - 0 - 1530 = 540;
And if we assume that we can multiply 2-digit numbers easily, we get the answer 540 * 100 + 1530 = 55530, the same as multiplying 1234 * 45 on any calculator :) So I can't see any memory issue with numbers of different lengths.
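The worked example above can be turned into a short recursive sketch that accepts operands of any, possibly different, decimal lengths. It works on Python integers and splits by decimal digits purely for readability; a memory-conscious implementation would operate on limb arrays instead. The function name is an assumption.

```python
def karatsuba(x, y):
    """Karatsuba product of two non-negative integers (illustrative sketch)."""
    if x < 10 or y < 10:
        return x * y  # single-digit operand: multiply directly
    # Split both numbers at half the length of the longer one.
    m = max(len(str(x)), len(str(y))) // 2
    split = 10 ** m
    a, b = divmod(x, split)  # x = a * 10^m + b
    c, d = divmod(y, split)  # y = c * 10^m + d (c may be 0 for a short y)
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # ad + bc from one extra multiplication instead of two.
    cross = karatsuba(a + b, c + d) - ac - bd
    return ac * split * split + cross * split + bd

print(karatsuba(1234, 45))  # 55530
```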
You can multiply the numbers by powers of 10 so that each of them has an even number of digits. Apply the Karatsuba algorithm and then divide the answer by the powers of 10 that you multiplied the original two numbers by.
Eg: 123*12
Compute 1230*1200 and divide the answer by 1000.
To answer your doubt in the comments above: the trick is to follow the formula when calculating the powers of 10, in the case of decimal calculation:
10^(2m) * (A*C) + 10^m * ((A+B)*(C+D) - A*C - B*D) + B*D
m = n/2 + n%2
where n is the length of the number.
Refer to the wiki; it explains this in detail.

Sum of digits of a factorial

Link to the original problem
It's not a homework question. I just thought that someone might know a real solution to this problem.
I was on a programming contest back in 2004, and there was this problem:
Given n, find sum of digits of n!. n can be from 0 to 10000. Time limit: 1 second. I think there was up to 100 numbers for each test set.
My solution was pretty fast but not fast enough, so I just let it run for some time. It built an array of pre-calculated values which I could use in my code. It was a hack, but it worked.
But there was a guy, who solved this problem with about 10 lines of code and it would give an answer in no time. I believe it was some sort of dynamic programming, or something from number theory. We were 16 at that time so it should not be a "rocket science".
Does anyone know what kind of an algorithm he could use?
EDIT: I'm sorry if I didn't make the question clear. As mquander said, there should be a clever solution, without bignum, with just plain Pascal code, a couple of loops, O(n^2) or something like that. 1 second is not a constraint anymore.
I found here that if n > 5, then 9 divides sum of digits of a factorial. We also can find how many zeros are there at the end of the number. Can we use that?
Ok, another problem from programming contest from Russia. Given 1 <= N <= 2 000 000 000, output N! mod (N+1). Is that somehow related?
I'm not sure who is still paying attention to this thread, but here goes anyway.
First, in the official-looking linked version, it only has to be 1000 factorial, not 10000 factorial. Also, when this problem was reused in another programming contest, the time limit was 3 seconds, not 1 second. This makes a huge difference in how hard you have to work to get a fast enough solution.
Second, for the actual parameters of the contest, Peter's solution is sound, but with one extra twist you can speed it up by a factor of 5 with 32-bit architecture. (Or even a factor of 6 if only 1000! is desired.) Namely, instead of working with individual digits, implement multiplication in base 100000. Then at the end, total the digits within each super-digit. I don't know how good a computer you were allowed in the contest, but I have a desktop at home that is roughly as old as the contest. The following sample code takes 16 milliseconds for 1000! and 2.15 seconds for 10000! The code also ignores trailing 0s as they show up, but that only saves about 7% of the work.
#include <stdio.h>

int main() {
    /* dig[] holds the factorial in base 100000, least significant first */
    unsigned int dig[10000], first=0, last=0, carry, n, x, sum=0;
    dig[0] = 1;
    /* stopping at 9999 is enough: the factor 10000 only appends zeros */
    for (n=2; n <= 9999; n++) {
        carry = 0;
        for (x=first; x <= last; x++) {
            carry = dig[x]*n + carry;
            dig[x] = carry%100000;
            /* skip trailing all-zero super-digits from now on */
            if (x == first && !(carry%100000)) first++;
            carry /= 100000;
        }
        if (carry) dig[++last] = carry;
    }
    for (x=first; x <= last; x++)
        sum += dig[x]%10 + (dig[x]/10)%10 + (dig[x]/100)%10 + (dig[x]/1000)%10
             + (dig[x]/10000)%10;
    printf("Sum: %u\n", sum);
    return 0;
}
Third, there is an amazing and fairly simple way to speed up the computation by another sizable factor. With modern methods for multiplying large numbers, it does not take quadratic time to compute n!. Instead, you can do it in O-tilde(n) time, where the tilde means that you can throw in logarithmic factors. There is a simple acceleration due to Karatsuba that does not bring the time complexity down to that, but still improves it and could save another factor of 4 or so. In order to use it, you also need to divide the factorial itself into equal sized ranges. You make a recursive algorithm prod(k,n) that multiplies the numbers from k to n by the pseudocode formula
prod(k,n) = prod(k,floor((k+n)/2))*prod(floor((k+n)/2)+1,n)
Then you use Karatsuba to do the big multiplication that results.
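A sketch of the prod(k,n) recursion in Python follows. It leans on Python's built-in big integers for the large multiplications (CPython is understood to switch to Karatsuba for large operands, but treat that as an assumption about the interpreter). The function names are made up. Note that recent Python versions cap int-to-str conversion at 4300 digits by default, so factorials much beyond 1000! need `sys.set_int_max_str_digits` before the digit sum is taken.

```python
def range_product(k, n):
    """Product k * (k+1) * ... * n by splitting the range in half."""
    if k > n:
        return 1
    if k == n:
        return k
    mid = (k + n) // 2
    # Both halves have similar size, so the big multiplications are balanced.
    return range_product(k, mid) * range_product(mid + 1, n)

def digit_sum_of_factorial(n):
    """Sum of the decimal digits of n! (illustrative name)."""
    return sum(int(ch) for ch in str(range_product(1, n)))

print(digit_sum_of_factorial(10))   # 27  (10! = 3628800)
print(digit_sum_of_factorial(100))  # 648
```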
Even better than Karatsuba is the Fourier-transform-based Schonhage-Strassen multiplication algorithm. As it happens, both algorithms are part of modern big number libraries. Computing huge factorials quickly could be important for certain pure mathematics applications. I think that Schonhage-Strassen is overkill for a programming contest. Karatsuba is really simple and you could imagine it in an A+ solution to the problem.
Part of the question posed is some speculation that there is a simple number theory trick that changes the contest problem entirely. For instance, if the question were to determine n! mod n+1, then Wilson's theorem says that the answer is -1 when n+1 is prime, and it's a really easy exercise to see that it's 2 when n=3 and otherwise 0 when n+1 is composite. There are variations of this too; for instance n! is also highly predictable mod 2n+1. There are also some connections between congruences and sums of digits. The sum of the digits of x mod 9 is also x mod 9, which is why the sum is 0 mod 9 when x = n! for n >= 6. The alternating sum of the digits of x mod 11 equals x mod 11.
The problem is that if you want the sum of the digits of a large number, not modulo anything, the tricks from number theory run out pretty quickly. Adding up the digits of a number doesn't mesh well with addition and multiplication with carries. It's often difficult to promise that the math does not exist for a fast algorithm, but in this case I don't think that there is any known formula. For instance, I bet that no one knows the sum of the digits of a googol factorial, even though that sum is just some number with roughly 100 digits.
This is A004152 in the Online Encyclopedia of Integer Sequences. Unfortunately, it doesn't have any useful tips about how to calculate it efficiently: its Maple and Mathematica recipes take the naive approach.
I'd attack the second problem, to compute N! mod (N+1), using Wilson's theorem. That reduces the problem to testing whether N+1 is prime.
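Combining Wilson's theorem with the composite cases described earlier in the thread gives a sketch like the one below. Trial division is an assumption made for simplicity; for N up to 2 000 000 000 it needs at most about 45,000 divisions. The function names are hypothetical.

```python
def is_prime(m):
    """Trial division up to sqrt(m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def factorial_mod_next(n):
    """n! mod (n + 1) without computing n! (illustrative name)."""
    m = n + 1
    if is_prime(m):
        return m - 1  # Wilson's theorem: (m-1)! is -1 mod m for prime m
    if m == 4:
        return 2      # the one composite exception: 3! = 6 = 2 mod 4
    return 0          # every other composite m divides (m-1)!
```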
A small, fast Python script found online. It's elegant but still brute force.
import sys
from functools import reduce
for arg in sys.argv[1:]:
    # digit sum of arg!; the range has to include arg itself
    print(reduce(lambda x, y: int(x) + int(y),
                 str(reduce(lambda x, y: x * y, range(1, int(arg) + 1), 1))))
$ time python 432 951 5436 606 14 9520
real 0m1.252s
user 0m1.108s
sys 0m0.062s
Assume you have big numbers (this is the least of your problems, assuming that N is really big, and not 10000), and let's continue from there.
The trick below is to factor N! by factoring all n<=N, and then compute the powers of the factors.
Have a vector of counters; one counter for each prime number up to N; set them to 0. For each n<= N, factor n and increase the counters of prime factors accordingly (factor smartly: start with the small prime numbers, construct the prime numbers while factoring, and remember that division by 2 is shift). Subtract the counter of 5 from the counter of 2, and make the counter of 5 zero (nobody cares about factors of 10 here).
Compute all the prime numbers up to N, then run the following loop:
for (j = 0; j < last_prime; ++j) {
    count[j] = 0;
    for (i = N / primes[j]; i; i /= primes[j])
        count[j] += i;
}
Note that in the previous block we only used (very) small numbers.
For each prime factor P you have to compute P to the power of the appropriate counter, that takes log(counter) time using iterative squaring; now you have to multiply all these powers of prime numbers.
All in all you have about N log(N) operations on small numbers (log N prime factors), and Log N Log(Log N) operations on big numbers.
and after the improvement in the edit, only N operations on small numbers.
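The counting loop plus the "cancel the 10s" step could be sketched in Python as follows. It assumes Python's big integers for the final product and uses plain `**` where the answer suggests iterative squaring; all names are made up. For factorials with more than 4300 digits, recent Python versions need `sys.set_int_max_str_digits` raised before calling `str`.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

def digit_sum_of_factorial_via_primes(n):
    """Digit sum of n!, built from prime powers with factors of 10 removed."""
    counts = {}
    for p in primes_up_to(n):
        c, i = 0, n // p
        while i:          # same loop as in the answer: N/p + N/p^2 + ...
            c += i
            i //= p
        counts[p] = c
    if 5 in counts:
        # Each 5 pairs with a 2 to form a trailing zero, which adds nothing.
        counts[2] -= counts[5]
        counts[5] = 0
    value = 1
    for p, c in counts.items():
        value *= p ** c
    return sum(int(ch) for ch in str(value))
```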
1 second? Why can't you just compute n! and add up the digits? That's 10000 multiplications and no more than a few ten thousand additions, which should take approximately one zillionth of a second.
You have to compute the factorial.
1 * 2 * 3 * 4 * 5 = 120.
If you only want to calculate the sum of digits, you can ignore the ending zeroes.
For 6! you can do 12 x 6 = 72 instead of 120 * 6
For 7! you can use (72 * 7) MOD 10
I wrote a response too quickly...
10 is the result of two prime numbers 2 and 5.
Each time you have these 2 factors, you can ignore them.
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11 * 12 * 13 * 14 * 15...
1   2   3   2   5   2   7   2   3   2    11   2    13   2    3
            2       3       2   3   5         2         7    5
                            2                 3
The factor 5 appears at 5, 10, 15...
Then a ending zero will appear after multiplying by 5, 10, 15...
We have a lot of 2s and 3s... We'll overflow soon :-(
Then, you still need a library for big numbers.
I deserve to be downvoted!
Let's see. We know that the calculation of n! for any reasonably large number will eventually lead to a number with lots of trailing zeroes, which don't contribute to the sum. How about lopping off the zeroes along the way? That'd keep the size of the number a bit smaller?
Hmm. Nope. I just checked, and integer overflow is still a big problem even then...
Even without arbitrary-precision integers, this should be brute-forceable. In the problem statement you linked to, the biggest factorial that would need to be computed would be 1000!. This is a number with about 2500 digits. So just do this:
Allocate an array of 3000 bytes, with each byte representing one digit in the factorial. Start with a value of 1.
Run grade-school multiplication on the array repeatedly, in order to calculate the factorial.
Sum the digits.
Doing the repeated multiplications is the only potentially slow step, but I feel certain that 1000 of the multiplications could be done in a second, which is the worst case. If not, you could compute a few "milestone" values in advance and just paste them into your program.
One potential optimization: Eliminate trailing zeros from the array when they appear. They will not affect the answer.
OBVIOUS NOTE: I am taking a programming-competition approach here. You would probably never do this in professional work.
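The digit-array approach described in this answer might look like this in Python. A list of decimal digits stands in for the byte array, and the function name is an assumption; no big-integer support is used beyond small per-digit products.

```python
def factorial_digit_sum_array(n):
    """Digit sum of n! using grade-school multiplication on a digit list."""
    digits = [1]  # least significant digit first; starts as the number 1
    for factor in range(2, n + 1):
        carry = 0
        for i in range(len(digits)):
            carry += digits[i] * factor
            digits[i] = carry % 10
            carry //= 10
        while carry:  # the product grew: append the new leading digits
            digits.append(carry % 10)
            carry //= 10
    return sum(digits)

print(factorial_digit_sum_array(100))  # 648
```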
Another solution using BigInteger:
import java.math.BigInteger;

static long q20() {
    long sum = 0;
    String factorial = factorial(new BigInteger("100")).toString();
    // add up the decimal digits of 100!
    for (int i = 0; i < factorial.length(); i++) {
        sum += Long.parseLong(factorial.charAt(i) + "");
    }
    return sum;
}

static BigInteger factorial(BigInteger n) {
    BigInteger one = new BigInteger("1");
    // n <= 1 also covers 0! = 1
    if (n.compareTo(one) <= 0) return one;
    return n.multiply(factorial(n.subtract(one)));
}
