Outlier in linear algorithm for nth Fibonacci - performance

So I was taught that using the recurrence relation of Fibonacci numbers, I could get an O(n) algorithm. But, due to the large size of Fibonacci numbers for large n, the additions take proportionally longer, meaning that the time complexity is no longer linear.
That's all well and fine, but in this graph (source), why is there a number near 1,800,000 which takes significantly longer to compute than its neighbours?
EDIT: As mentioned in the answers, the outlier is around 180,000, not 1,800,000.

The outlier occurred at 180,000, not 1,800,000. I don't know exactly how big integers are stored in Python, but assuming a binary representation, fib(180000) is roughly 125,000 bits, or about 15,000 bytes. I suspect an issue with the testing for why 180,000 would take significantly longer than 181,000 or 179,000.
@cdlane mentioned the time it would take to run fib(170000) through fib(200000), but the increment is 1000, so that would be 30 test cases, run 10 times each, which would take less than 20 minutes.
The linked article mentions a matrix-exponentiation variant for calculating Fibonacci numbers, which is an O(log2(n)) process. This can be further optimized using a Lucas sequence, which uses similar logic (repeated squaring, as when raising a matrix to a power). Example C code for 64-bit unsigned integers:
uint64_t fibl(uint64_t n) {
    uint64_t a, b, p, q, qq, aq;
    a = q = 1;
    b = p = 0;
    while (1) {
        if (n & 1) {            /* fold the current power into (a, b) */
            aq = a * q;
            a = b * q + aq + a * p;
            b = b * p + aq;
        }
        n >>= 1;
        if (n == 0)
            break;
        qq = q * q;             /* square the (p, q) pair for the next bit */
        q = 2 * p * q + qq;
        p = p * p + qq;
    }
    return b;
}
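
For reference, here is a rough Python transcription of the same loop (my own sketch, not code from the linked article). Python's integers are arbitrary precision, so it is not limited to results that fit in 64 bits; for large n the big-number multiplications then dominate the running time, as discussed in the question.

def fibl_py(n):
    # Same Lucas-sequence recurrence as the C version above, but using
    # Python's arbitrary-precision integers instead of uint64_t.
    a, q = 1, 1
    b, p = 0, 0
    while True:
        if n & 1:
            aq = a * q
            a, b = b * q + aq + a * p, b * p + aq
        n >>= 1
        if n == 0:
            return b
        qq = q * q
        q, p = 2 * p * q + qq, p * p + qq

For example, fibl_py(10) returns 55.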

Probably the measurement of time was done on a system where other tasks were running simultaneously, causing a single computation to take longer than expected.
Part of the graph that the OP didn't copy says:
Note: Each data point was averaged over 10 calculcations[sic].
So that seems to negate that explanation unless it was a huge difference and we're only seeing it averaged with normal times.
(My 2012 vintage Mac mini calculates the 200,000th number in 1/2 a second using the author's code. The machine that produced this data took over 4 seconds to calculate the 200,000th number. I pulled out a 2009 vintage MacBook to confirm that this is believable.)
I did my own crude graph with a modified version of the code that didn't restart at the beginning each time but kept a running total of the time and came up with the following chart:
The times are a bit longer than expected as the algorithm I used adds some printing overhead into each calculation. But no big anomaly between 180,000 and 195,000.

Related

What is the best approach for computing logarithm of an integer x base 2 approximated to the greatest integer less than or equal to it? [closed]

What is the best amortized time complexity to compute floor(log(x)) amongst the algorithms that find floor(log(x)) for base 2?
There are many different algorithms for computing logarithms, each of which represents a different tradeoff of some sort. This answer surveys a variety of approaches and some of the tradeoffs involved.
Approach 1: Iterated Multiplication
One simple approach for computing ⌊log_b n⌋ is to compute the sequence b^0, b^1, b^2, etc. until we find a value greater than n. At that point, we can stop and return the exponent just before this. The code for this is fairly straightforward:
x = 0;      # Exponent
bX = 1;     # b^Exponent
while (bX <= n) {
    x++;
    bX *= b;
}
return x - 1;
How fast is this? Notice that the loop counts up x = 0, x = 1, x = 2, etc., doing one multiplication per iteration, until bX first exceeds n, at which point x = ⌊log_b n⌋ + 1 and we return x - 1. If we assume all the integers we're dealing with fit into individual machine words - say, we're working with int or long or something like that - then each multiply takes time O(1) and the overall runtime is O(log_b n), with a memory usage of O(1).
Approach 2: Repeated Squaring
There's an old interview question that goes something like this. I have a number n, and I'm not going to tell you what it is. You can make queries of the form "is x equal to n, less than n, or greater than n?," and your goal is to use the fewest queries to figure out what n is. Assuming you have literally no idea what n can be, one reasonable approach works like this: guess the values 1, 2, 4, 8, 16, 32, ..., 2^k, ..., until you overshoot n. At that point, use a binary search on the range you just found to discover what n is. This runs in time O(log n), since after computing 2^(1 + log_2 n) = 2n you'll overshoot n, and after that you're binary searching over a range of size n for a total runtime of O(log n).
Finding logarithms, in a sense, kinda sorta matches this problem. We have a number n written as b^x for some unknown x, and we want to find x. Using the above strategy as a starting point, we can therefore compute b^(2^0), b^(2^1), b^(2^2), etc. until we overshoot b^x. From there, we can run a secondary binary search to figure out the exact exponent required.
We can compute the series of values b^(2^k) by using the fact that

b^(2^(k+1)) = b^(2 · 2^k) = (b^(2^k))^2

and find a value that overshoots as follows:
x = 1;      # exponent
bX = b;     # b^x
while (bX <= n) {
    bX *= bX;   # bX = bX^2
    x *= 2;
}
# Overshot, now do the binary search
The problem is how to do that binary search to figure things out. Specifically, we know that the final b^x (where x is now a power of two) is too big, but we don't know by how much. And unlike the "guess the number" game, binary searching over the exponent is a bit tricky.
One cute solution to this is based on the idea that if x is the value that we're looking for, then we can write x as a series of bits in binary. For example, let's write x = a_(m-1)·2^(m-1) + a_(m-2)·2^(m-2) + ... + a_1·2^1 + a_0·2^0. Then

b^x = b^(a_(m-1)·2^(m-1) + a_(m-2)·2^(m-2) + ... + a_1·2^1 + a_0·2^0)
    = b^(a_(m-1)·2^(m-1)) · b^(a_(m-2)·2^(m-2)) · ... · b^(a_0·2^0)
In other words, we can recover x one bit at a time. To do so, as we compute the values b^1, b^2, b^4, b^8, etc., we can write down the values we discover. Then, once we overshoot, we can try multiplying them in and see which ones should be included and which should be excluded. Here's what that looks like:
i = 0;          # Index of the largest power stored so far
e = 1;          # 2^i
bX = b;         # b^(2^i)
powers = [b];   # powers[i] = b^(2^i)
exps = [1];     # exps[i] = 2^i
while (bX <= n) {
    bX *= bX;       # bX = bX^2
    e *= 2;
    powers += bX;   # Append b^(2^(i+1))
    exps += e;      # Append 2^(i+1)
    i++;
}

# Overshot, now recover the bits of x.
resultExp = 1;
result = 0;
while (i >= 0) {
    # If including this power doesn't overshoot, its exponent is part of
    # the representation of x.
    if (resultExp * powers[i] <= n) {
        resultExp *= powers[i];
        result += exps[i];
    }
    i--;
}
return result;
This is certainly a more involved approach, but it is faster. Since the value we're looking for is ⌊log_b n⌋ and we're essentially using the solution from the "guess the number" game to figure out what x is, the number of multiplications is O(log log_b n), with a memory usage of O(log log_b n) to hold the intermediate powers. This is exponentially faster than the previous solution!
Approach 3: Zeckendorf Representations
There's a slight modification on the previous approach that keeps the runtime of O(log logb n) but drops the auxiliary space usage to O(1). The idea is that, instead of writing the exponent in binary using the regular system, we write the number out using Zeckendorf's theorem, which is a binary number system based on the Fibonacci sequence. The advantage is that instead of having to store the intermediate powers of two, we can use the fact that any two consecutive Fibonacci numbers are sufficient to let you compute the next or previous Fibonacci number, allowing us to regenerate the powers of b as needed. Here's an implementation of that idea in C++.
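As a rough illustration of that idea (my own sketch in Python, not the C++ implementation referred to above, and assuming n >= 1 and b >= 2): climb through the powers b^F(k) using the Fibonacci recurrence, then walk back down, regenerating each previous power by exact division so that only two powers are held at any time.

def zeckendorf_log(n, b):
    # Climb: find consecutive Fibonacci numbers f_lo, f_hi with b^f_hi > n,
    # carrying p_lo = b^f_lo and p_hi = b^f_hi along the way.
    f_lo, f_hi = 0, 1
    p_lo, p_hi = 1, b
    while p_hi <= n:
        f_lo, f_hi = f_hi, f_lo + f_hi
        p_lo, p_hi = p_hi, p_lo * p_hi
    # Walk back down, greedily adding Fibonacci exponents that still fit.
    result = 0   # exponent accumulated so far
    acc = 1      # b^result
    while f_hi > 0:
        if acc * p_hi <= n:
            acc *= p_hi
            result += f_hi
        # Step down one Fibonacci number; the division is exact.
        f_lo, f_hi = f_hi - f_lo, f_lo
        p_lo, p_hi = p_hi // p_lo, p_lo
    return result

For example, zeckendorf_log(1000, 10) returns 3. Only two powers of b are live at any moment, which is where the O(1) auxiliary space comes from.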
Approach 4: Bitwise Iterated Multiplication
In some cases, you need to find logarithms where the log base is two. In that case, you can take advantage of the fact that numbers on a computer are represented in binary and that multiplications and divisions by two correspond to bit shifts.
For example, let's take the iterated multiplication approach from before, where we computed larger and larger powers of b until we overshot. We can do that same technique using bitshifts, and it's much faster:
x = 0;   # Exponent
while ((1 << x) <= n) {
    x++;
}
return x - 1;
This approach still runs in time O(log n), as before, but is probably faster implemented this way than with multiplications because the CPU can do bit shifts much more quickly.
Approach 5: Bitwise Binary Search
The base-two logarithm of a number is the position of the most-significant 1 bit in its binary representation. To find that bit, we can use a binary search technique somewhat reminiscent of Approach 2, though done faster because the machine can process multiple bits in parallel in a single instruction. Basically, as before, we generate the sequence 2^(2^0), 2^(2^1), 2^(2^2), etc. until we overshoot the number, giving an upper bound on how high the highest bit can be. From there, we do a binary search to find the highest 1 bit. Here's what that looks like:
x = 1;
while ((1 << x) <= n) {
    x *= 2;
}
# We've overshot the high-order bit. Do a binary search to find it.
low = 0;
high = x;
while (low < high) {
    mid = (low + high) / 2;
    # Form a bitmask with 1s up to and including bit number mid.
    # This can be done by computing 2^(mid+1) - 1.
    mask = (1 << (mid + 1)) - 1
    # If n has set bits above the mask, the highest bit is above mid;
    # branch higher.
    if (n & ~mask) {
        low = mid + 1
    }
    # Otherwise, the highest bit is at position mid or below; branch lower.
    else {
        high = mid
    }
}
return high
This approach runs in time O(log log n), since the binary search takes time logarithmic in the quantity being searched for and the quantity we're searching for is O(log n). It uses O(1) auxiliary space.
Approach 6: Magical Word-Level Parallelism
The speedup in Approach 5 is largely due to the fact that we can test multiple bits in parallel using a single bitwise operation. By doing some truly amazing things with machine words, it's possible to harness this parallelism to find the most-significant bit in a number in time O(1) using only basic arithmetic operations and bit-shifts, and to do so in a way where the runtime is completely independent of the size of a machine word (e.g. the algorithm works equally quickly on a 16-bit, 32-bit, and 64-bit machine). The techniques involved are fairly complex, and I will confess that I had no idea this was possible until recently, when I learned the technique while teaching an advanced data structures course.
Summary
To summarize, here are the approaches listed, their time complexity, and their space complexity.
Approach Which Bases? Time Complexity Space Complexity
--------------------------------------------------------------------------
Iter. Multiplication Any O(log_b n) O(1)
Repeated Squaring Any O(log log_b n) O(log log_b n)
Zeckendorf Logarithm Any O(log log_b n) O(1)
Bitwise Multiplication 2 O(log n) O(1)
Bitwise Binary Search 2 O(log log n) O(1)
Word-Level Parallelism 2 O(1) O(1)
There are many other algorithms I haven't mentioned here that are worth exploring. Some algorithms work by segmenting machine words into blocks of some fixed size, precomputing the position of the first 1 bit in each block, then testing one block at a time. These approaches have runtimes that depend on the size of the machine word, and (to the best of my knowledge) none of them are asymptotically faster than the approaches I've outlined here. Other approaches work by using the fact that some processors have instructions that immediately output the position of the most significant bit in a number, or work by using floating-point hardware. Those are also interesting and fascinating, be sure to check them out!
Another area worth exploring is when you have arbitrary-precision integers. There, the costs of multiplications, divisions, shifts, etc. are not O(1), and the relative costs of these algorithms change. This is definitely worth exploring in more depth if you're curious!
The code included here is written in pseudocode because it's designed mostly for exposition. In a real implementation, you'd need to worry about overflow, handling the case where the input is negative or zero, etc. Just FYI. :-)
Hope this helps!

Why does this loop take O(2^n) time complexity?

There is a loop which performs a brute-force algorithm to calculate 5 * 3 without using the multiplication operator.
I just need to add five 3 times, so it takes O(3), which is O(y) if the inputs are x * y.
However, in a book, it says it takes O(2^n), where n is the number of bits in the input. I don't understand why it uses O(2^n) to represent O(y). Is that a better way to express the time complexity? Could you please explain?
I'm not asking for another algorithm to calculate this.
int result = 0;
for (int i = 0; i < 3; i++) {
    result += 5;
}
You're claiming that the time complexity is O(y) on the input, and the book is claiming that the time complexity is O(2^n) on the number of bits in the input. Good news: you're both right! If a number y can be represented by n bits, y is at most 2^n − 1.
I think that you're misreading the passage from the book.
When the book is talking about the algorithm for computing the product of two numbers, it uses the example of multiplying 3 × 5 as a concrete instance of the more general idea of computing x × y by adding y + y + ... + y, x total times. It's not claiming that the specific algorithm "add 5 + 5 + 5" runs in time O(2^n). Instead, think about this algorithm:
int total = 0;
for (int i = 0; i < x; i++) {
    total += y;
}
The runtime of this algorithm is O(x). If you measure the runtime as a function of the number of bits n in the number x - as is suggested by the book - then the runtime is O(2^n), since a number x that fits in n bits can be as large as 2^n − 1. This is the distinction between polynomial time and pseudopolynomial time, and the reason the book then goes on to describe a better algorithm for solving this problem is so that the runtime ends up being polynomial in the number of bits used to represent the input rather than in the numeric value of the numbers. The exposition about grade-school multiplication and addition is there to help you get a better sense for the difference between these two quantities.
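For contrast, here is a sketch of a shift-and-add multiply (my own illustration, not necessarily the book's algorithm) whose loop count is proportional to the number of bits of x rather than to x itself:

def multiply(x, y):
    # One loop iteration per bit of x, so the iteration count is O(n) for
    # an n-bit x, versus O(x) = O(2^n) for the repeated-addition loop above.
    # Assumes x >= 0.
    total = 0
    while x > 0:
        if x & 1:        # lowest bit of x is set: add the current shifted y
            total += y
        x >>= 1          # consume one bit of x
        y <<= 1          # double y to line up with the next bit
    return total

For example, multiply(5, 3) returns 15 after just three iterations.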
Do not think in terms of 3 and 5. Think about how to calculate 2 billion x 2 billion (roughly 2^31 multiplied by 2^31).
Your inputs are 31 bits each (N), and your loop will be executed about 2 billion times, i.e. 2^N times.
So the book is correct. For the 5 x 3 case, 3 takes 2 bits, so the complexity is O(2^2). Again correct.

Fastest way to compute (n + 1)^j from (n^j)

I need to compute 0^j, 1^j, ..., k^j for some very large k and j (both on the order of a few million). I am using GMP to handle the big integer numbers (yes, I need integer numbers as I need full precision). Now, I wonder: once I have gone through the effort of computing n^j, isn't there a way to speed up the computation of (n + 1)^j, instead of starting from scratch?
Here is the algorithm I am currently using to compute the power:
mpz_class pow(unsigned long int b, unsigned long int e)
{
    // Exponentiation by squaring.
    mpz_class res = 1;
    mpz_class m = b;
    while (e)
    {
        if (e & 1)
        {
            res *= m;
        }
        e >>= 1;
        m *= m;
    }
    return res;
}
As you can see, every time I start from scratch, and it takes a lot of time.
To compute n^j, why not find at least one factor of n, say k, and compute n^j = k^j * (n/k)^j? By the time n^j is being computed, both k^j and (n/k)^j should already be known.
However, finding such a factor can take up to O(sqrt(n)) time. Alternatively, n^j can be computed independently in O(log(j)) multiplications by exponentiation by squaring, as in the code above.
So you could have a mix of the above, depending on which is larger:
If n is much smaller than log(j), compute n^j by factorization.
Whenever n^j is known, compute {(2*n)^j, (3*n)^j, ..., ((n-1)*n)^j, (n*n)^j} and keep them in a lookup table.
If n is larger than log(j) and a ready-made value as above is not available, use the logarithmic method and then compute the other related powers as above.
If n is a pure power of 2 (possible constant-time check), compute the jth power by shifting and calculate the related products.
If n is even (constant-time check again), use the factorization method and compute the associated products.
The above should make it quite fast. For example, identifying even numbers by itself converts half of the power computations into single multiplications. There could be many more rules of thumb regarding factorization that reduce the computation further (especially for divisibility by 3, 7, etc.).
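Here is a minimal Python sketch of that factorization idea (my own illustration, using Python's built-in big integers rather than GMP): a smallest-prime-factor sieve lets every composite i reuse two already-computed entries, so only the primes need a full exponentiation.

def all_powers(k, j):
    # Smallest-prime-factor sieve: spf[i] is the smallest prime dividing i.
    spf = list(range(k + 1))
    for p in range(2, int(k ** 0.5) + 1):
        if spf[p] == p:                      # p is prime
            for m in range(p * p, k + 1, p):
                if spf[m] == m:
                    spf[m] = p
    powers = [0] * (k + 1)
    powers[0] = 1 if j == 0 else 0
    for i in range(1, k + 1):
        if spf[i] == i:                      # i is 1 or a prime: compute directly
            powers[i] = i ** j               # exponentiation by squaring
        else:
            p = spf[i]
            powers[i] = powers[p] * powers[i // p]   # i^j = p^j * (i/p)^j
    return powers

For example, all_powers(6, 3) returns [0, 1, 8, 27, 64, 125, 216], where the last entry was obtained as 8 * 27.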
You may want to use the binomial expansion of (n+1)^j as n^j + j*n^(j-1) + (j(j-1)/2)*n^(j-2) + ... + 1, memoize the lower powers of n already computed, and reuse them to compute (n+1)^j with O(j) additions. If you compute the coefficients j, j(j-1)/2, ... incrementally while adding each term, that can be done in O(j) as well.
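A small Python sketch of that suggestion (my own illustration; it assumes the full table n^0, ..., n^j is available and updates each binomial coefficient from the previous one):

def next_power(n_powers, j):
    # (n+1)^j = sum over i of C(j, i) * n^(j-i), with each C(j, i+1)
    # derived incrementally from C(j, i).
    total = 0
    c = 1                            # C(j, 0)
    for i in range(j + 1):
        total += c * n_powers[j - i]
        c = c * (j - i) // (i + 1)   # C(j, i+1) = C(j, i) * (j-i) / (i+1)
    return total

n, j = 5, 4
n_powers = [n ** i for i in range(j + 1)]
assert next_power(n_powers, j) == (n + 1) ** j == 1296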

Determine time complexity for unpredictive non-linear iterative algorithm

I don't know whether I used the word "unpredictive" correctly or not. But here's the problem:
I have a rectangular piece of paper of length a and breadth b. I will keep cutting the squares from it, of side equal to min(a,b) until the last square is of unit length. Determine the number of squares I can cut.
Here's my algorithm :
#include <iostream>
#include <algorithm>
using namespace std;

int main()
{
    long long a, b, temp, small, large, res;
    cin >> a >> b;
    res = 0;
    small = min(a, b);
    large = a + b - small;
    while (small > 0 && small != large)
    {
        res = res + large / small;
        temp = large % small;
        large = small;
        small = temp;
    }
    cout << res;
    return 0;
}
I am confused about how to calculate time complexity in this case, as max(a,b) decreases to 1 in a non-uniform fashion, depending upon the initial values of a and b. The best case would definitely be when either or both of them are already 1. The worst case would be, I guess, when both are prime. Please help me analyze the time complexity.
This algorithm is very similar to the Euclidean algorithm for computing the greatest common divisor. Recall that that algorithm works by:
Start with two numbers a, b; assume without loss of generality that a >= b. If a == b, then stop.
In the next round, the two numbers are b and a % b instead.
Now consider your algorithm. It's the same, except it uses a - b instead of a % b. But this will actually do the same thing if a < 2 * b. And if a < k * b, then each of these rounds just subtracts another copy of b, so after at most k rounds it reaches a % b. So this is just a slower version of the Euclidean algorithm.
The Euclidean algorithm is quite fast -- since it's repeated division, the number of rounds is at most proportional to the number of digits.
Edit: To expand on last part:
To analyze the time complexity, a first question is, how many rounds does it take.
An easy way to start is: if a and b have n and m digits in their binary descriptions, then there can't be more than n + m rounds. This is because, as long as b is at least two in a given round, the new remainder a % b is less than a / 2, so it has at least one fewer binary digit than a. If b is one, then this is the last round.
A second question is, how much time does it take to do a single round.
If you are satisfied with "the running time is at most polynomial in the number of digits", then this is now clear, since you can easily do division in time polynomial in the number of digits.
I'm not actually sure what the tightest analysis possible is. You might be able to do some kind of win-win analysis to improve on this, I'm almost sure this has been studied but I don't know a reference, sorry.
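As a small illustration of the analysis (my own sketch, not from the answer above), the following counts both the squares and the number of rounds the division-based loop takes; the round count stays well within the n + m bound on the total number of binary digits:

def square_count(a, b):
    # Same loop as the C++ code in the question, also counting rounds.
    small, large = min(a, b), max(a, b)
    squares = rounds = 0
    while small > 0 and small != large:
        squares += large // small
        large, small = small, large % small
        rounds += 1
    return squares, rounds

print(square_count(13, 5))   # (6, 4): 6 squares in 4 rounds; 13 and 5 have 4 + 3 = 7 bits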

Sum of digits of a factorial

Link to the original problem
It's not a homework question. I just thought that someone might know a real solution to this problem.
I was on a programming contest back in 2004, and there was this problem:
Given n, find the sum of digits of n!. n can be from 0 to 10000. Time limit: 1 second. I think there were up to 100 numbers in each test set.
My solution was pretty fast but not fast enough, so I just let it run for some time. It built an array of pre-calculated values which I could use in my code. It was a hack, but it worked.
But there was a guy who solved this problem with about 10 lines of code and it would give an answer in no time. I believe it was some sort of dynamic programming, or something from number theory. We were 16 at that time, so it should not be rocket science.
Does anyone know what kind of an algorithm he could use?
EDIT: I'm sorry if I didn't make the question clear. As mquander said, there should be a clever solution, without bignum, with just plain Pascal code, a couple of loops, O(n^2) or something like that. 1 second is not a constraint anymore.
I found here that if n > 5, then 9 divides the sum of digits of n!. We can also find how many zeros there are at the end of the number. Can we use that?
Ok, another problem from a programming contest in Russia. Given 1 <= N <= 2 000 000 000, output N! mod (N+1). Is that somehow related?
I'm not sure who is still paying attention to this thread, but here goes anyway.
First, in the official-looking linked version, it only has to be 1000 factorial, not 10000 factorial. Also, when this problem was reused in another programming contest, the time limit was 3 seconds, not 1 second. This makes a huge difference in how hard you have to work to get a fast enough solution.
Second, for the actual parameters of the contest, Peter's solution is sound, but with one extra twist you can speed it up by a factor of 5 on a 32-bit architecture. (Or even a factor of 6 if only 1000! is desired.) Namely, instead of working with individual digits, implement multiplication in base 100000. Then at the end, total the digits within each super-digit. I don't know how good a computer you were allowed in the contest, but I have a desktop at home that is roughly as old as the contest. The following sample code takes 16 milliseconds for 1000! and 2.15 seconds for 10000! The code also ignores trailing 0s as they show up, but that only saves about 7% of the work.
#include <stdio.h>

int main() {
    unsigned int dig[10000], first = 0, last = 0, carry, n, x, sum = 0;
    dig[0] = 1;
    for (n = 2; n <= 9999; n++) {
        carry = 0;
        for (x = first; x <= last; x++) {
            carry = dig[x] * n + carry;
            dig[x] = carry % 100000;
            if (x == first && !(carry % 100000)) first++;
            carry /= 100000;
        }
        if (carry) dig[++last] = carry;
    }
    for (x = first; x <= last; x++)
        sum += dig[x] % 10 + (dig[x] / 10) % 10 + (dig[x] / 100) % 10
             + (dig[x] / 1000) % 10 + (dig[x] / 10000) % 10;
    printf("Sum: %d\n", sum);
}
Third, there is an amazing and fairly simple way to speed up the computation by another sizable factor. With modern methods for multiplying large numbers, it does not take quadratic time to compute n!. Instead, you can do it in O-tilde(n) time, where the tilde means that you can throw in logarithmic factors. There is a simple acceleration due to Karatsuba that does not bring the time complexity down to that, but still improves it and could save another factor of 4 or so. In order to use it, you also need to divide the factorial itself into equal sized ranges. You make a recursive algorithm prod(k,n) that multiplies the numbers from k to n by the pseudocode formula
prod(k,n) = prod(k,floor((k+n)/2))*prod(floor((k+n)/2)+1,n)
Then you use Karatsuba to do the big multiplication that results.
Even better than Karatsuba is the Fourier-transform-based Schonhage-Strassen multiplication algorithm. As it happens, both algorithms are part of modern big number libraries. Computing huge factorials quickly could be important for certain pure mathematics applications. I think that Schonhage-Strassen is overkill for a programming contest. Karatsuba is really simple and you could imagine it in an A+ solution to the problem.
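As a sketch of that recursion (my own Python illustration; CPython's built-in integers already switch to Karatsuba multiplication for large operands, so the balanced split alone buys a noticeable speedup):

def prod(k, n):
    # Multiply the integers k..n, splitting the range so that the two
    # factors of each multiplication have roughly equal size.
    if k > n:
        return 1
    if k == n:
        return k
    mid = (k + n) // 2
    return prod(k, mid) * prod(mid + 1, n)

def factorial_digit_sum(n):
    return sum(int(d) for d in str(prod(2, n)))

print(factorial_digit_sum(100))   # 648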
Part of the question posed is some speculation that there is a simple number theory trick that changes the contest problem entirely. For instance, if the question were to determine n! mod n+1, then Wilson's theorem says that the answer is -1 when n+1 is prime, and it's a really easy exercise to see that it's 2 when n=3 and otherwise 0 when n+1 is composite. There are variations of this too; for instance n! is also highly predictable mod 2n+1. There are also some connections between congruences and sums of digits. The sum of the digits of x mod 9 is also x mod 9, which is why the sum is 0 mod 9 when x = n! for n >= 6. The alternating sum of the digits of x mod 11 equals x mod 11.
The problem is that if you want the sum of the digits of a large number, not modulo anything, the tricks from number theory run out pretty quickly. Adding up the digits of a number doesn't mesh well with addition and multiplication with carries. It's often difficult to promise that the math does not exist for a fast algorithm, but in this case I don't think that there is any known formula. For instance, I bet that no one knows the sum of the digits of a googol factorial, even though it is just some number with roughly 100 digits.
This is A004152 in the Online Encyclopedia of Integer Sequences. Unfortunately, it doesn't have any useful tips about how to calculate it efficiently - its maple and mathematica recipes take the naive approach.
I'd attack the second problem, to compute N! mod (N+1), using Wilson's theorem. That reduces the problem to testing whether N is prime.
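A Python sketch of that reduction (my own illustration; trial division up to sqrt(N+1) is at most about 45,000 steps for N up to 2,000,000,000):

def factorial_mod_next(n):
    # N! mod (N+1) via Wilson's theorem: (m-1)! = -1 (mod m) iff m is prime.
    m = n + 1
    if m == 4:               # the one composite special case: 3! mod 4 = 2
        return 2
    d = 2
    while d * d <= m:
        if m % d == 0:       # m is composite, so (m-1)! = 0 (mod m)
            return 0
        d += 1
    return m - 1             # m is prime, so (m-1)! = -1 = m-1 (mod m)

print(factorial_mod_next(6))   # 6, since 7 is prime and 720 mod 7 = 6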
Small, fast python script found at http://www.penjuinlabs.com/blog/?p=44. It's elegant but still brute force.
import sys
for arg in sys.argv[1:]:
print reduce( lambda x,y: int(x)+int(y),
str( reduce( lambda x, y: x*y, range(1,int(arg)))))
$ time python sumoffactorialdigits.py 432 951 5436 606 14 9520
3798
9639
74484
5742
27
141651
real 0m1.252s
user 0m1.108s
sys 0m0.062s
Assume you have big numbers (this is the least of your problems, assuming that N is really big, and not 10000), and let's continue from there.
The trick below is to factor N! by factoring all n<=N, and then compute the powers of the factors.
Have a vector of counters, one counter for each prime number up to N, and set them to 0. For each n <= N, factor n and increase the counters of its prime factors accordingly (factor smartly: start with the small prime numbers, construct the prime numbers while factoring, and remember that division by 2 is a shift). Subtract the counter of 5 from the counter of 2, and set the counter of 5 to zero (nobody cares about factors of 10 here).
Or better: compute all the prime numbers up to N, then run the following loop:
for (j = 0; j < last_prime; ++j) {
    count[j] = 0;
    for (i = N / primes[j]; i; i /= primes[j])
        count[j] += i;
}
Note that in the previous block we only used (very) small numbers.
For each prime factor P you have to compute P to the power of the appropriate counter, that takes log(counter) time using iterative squaring; now you have to multiply all these powers of prime numbers.
All in all you have about N log(N) operations on small numbers (log N prime factors), and log N · log(log N) operations on big numbers.
and after the improvement in the edit, only N operations on small numbers.
HTH
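Here is a compact Python sketch of that plan (my own illustration): sieve the primes, compute each exponent with the loop above, and multiply the prime powers together.

def factorial_by_primes(n):
    # Sieve of Eratosthenes up to n (assumes n >= 2).
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    result = 1
    for p in range(2, n + 1):
        if is_prime[p]:
            # Exponent of p in n!: n//p + n//p^2 + n//p^3 + ...
            e, q = 0, n // p
            while q:
                e += q
                q //= p
            result *= pow(p, e)     # p^e by repeated squaring
    return result

print(sum(int(d) for d in str(factorial_by_primes(100))))   # 648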
1 second? Why can't you just compute n! and add up the digits? That's 10000 multiplications and no more than a few ten thousand additions, which should take approximately one zillionth of a second.
You have to compute the factorial.
1 * 2 * 3 * 4 * 5 = 120.
If you only want to calculate the sum of digits, you can ignore the ending zeroes.
For 6! you can do 12 x 6 = 72 instead of 120 * 6
For 7! you can use (72 * 7) MOD 10
EDIT.
I wrote a response too quickly...
10 is the product of the two prime numbers 2 and 5.
Each time you have these 2 factors, you can ignore them.
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11 * 12 * 13 * 14 * 15 ...
written as prime factors:
1, 2, 3, 2·2, 5, 2·3, 7, 2·2·2, 3·3, 2·5, 11, 2·2·3, 13, 2·7, 3·5, ...
The factor 5 appears at 5, 10, 15...
Then a trailing zero will appear after multiplying by 5, 10, 15...
We have a lot of 2s and 3s... We'll overflow soon :-(
Then, you still need a library for big numbers.
I deserve to be downvoted!
Let's see. We know that the calculation of n! for any reasonably-large number will eventually lead to a number with lots of trailing zeroes, which don't contribute to the sum. How about lopping off the zeroes along the way? That'd keep the size of the number a bit smaller?
Hmm. Nope. I just checked, and integer overflow is still a big problem even then...
Even without arbitrary-precision integers, this should be brute-forceable. In the problem statement you linked to, the biggest factorial that would need to be computed would be 1000!. This is a number with about 2500 digits. So just do this:
Allocate an array of 3000 bytes, with each byte representing one digit in the factorial. Start with a value of 1.
Run grade-school multiplication on the array repeatedly, in order to calculate the factorial.
Sum the digits.
Doing the repeated multiplications is the only potentially slow step, but I feel certain that 1000 of the multiplications could be done in a second, which is the worst case. If not, you could compute a few "milestone" values in advance and just paste them into your program.
One potential optimization: Eliminate trailing zeros from the array when they appear. They will not affect the answer.
OBVIOUS NOTE: I am taking a programming-competition approach here. You would probably never do this in professional work.
Another solution, using BigInteger:
static long q20() {
    long sum = 0;
    String factorial = factorial(new BigInteger("100")).toString();
    for (int i = 0; i < factorial.length(); i++) {
        sum += Long.parseLong(factorial.charAt(i) + "");
    }
    return sum;
}

static BigInteger factorial(BigInteger n) {
    BigInteger one = new BigInteger("1");
    if (n.equals(one)) return one;
    return n.multiply(factorial(n.subtract(one)));
}
