I'm basically stuck on a rather simple problem:
Toss N coins and find out how many of them land heads.
The solution's performance must not depend on N, so we can't just call Math.random() < 0.5 N times. Obviously, the Gaussian distribution comes to the rescue.
I used the Box-Muller method for it:
function gaussian_random(mean, variance) {
    // Polar (Marsaglia) variant of Box-Muller:
    // pick (x, y) uniformly in the unit disc, excluding the origin
    var s;
    var x;
    var y;
    do {
        x = Math.random() * 2.0 - 1.0;
        y = Math.random() * 2.0 - 1.0;
        s = x * x + y * y;
    } while ((s > 1) || (s == 0));
    // transform to a standard normal, then scale and shift
    var gaussian = x * Math.sqrt(-2 * Math.log(s) / s);
    return mean + gaussian * Math.sqrt(variance);
}
Math says that the mean of N coin tosses is N/2 and the variance is N/4.
Then I made a test which tosses N coins M times and reports the minimum, maximum, and average number of heads.
I compared the results of the naive approach (calling Math.random() many times) with the Gaussian Box-Muller approach.
Here is typical output of the tests:
Toss 1000 coins, 10000 times
Straight method:
Elapsed time: 127.330 ms
Minimum: 434
Maximum: 558
Average: 500.0306
Box-Muller method:
Elapsed time: 2.575 ms
Minimum: 438.0112461962819
Maximum: 562.9739632480057
Average: 499.96195358695064
Toss 10 coins, 10000 times
Straight method:
Elapsed time: 2.100 ms
Minimum: 0
Maximum: 10
Average: 5.024
Box-Muller method:
Elapsed time: 2.270 ms
Minimum: -1.1728354576573263
Maximum: 11.169478925333504
Average: 5.010078819562535
As we can see, at N = 1000 it fits almost perfectly.
BUT at N = 10 Box-Muller goes crazy: it allows outcomes where I can get (quite rarely, but it is possible) 11.17 heads from 10 coin tosses! :)
No doubt I am doing something wrong. But what exactly?
Here is the source of the test, and a link to launch it.
Update x2: it seems I didn't describe the problem well before. Here is the general version of it:
How do you get the sample mean of N uniform random values (either discrete or continuous) in amortized constant time? The Gaussian distribution is efficient for large N, but is there a way to make it work well for small N? Or at exactly which N should the solution switch from the Gaussian method to some other one (for example, the straightforward one)?
Math says that the mean of N coin tosses is N/2 and the variance is N/4.
Math only says that if it's a fair coin. And there's no way the solution doesn't depend on N.
Assuming all tosses are independent of each other, for exact behaviors use a binomial distribution rather than a normal distribution. Binomial has two parameters: N is the number of coin tosses, and p is the probability of getting heads (or tails if you prefer). In pseudocode...
function binomial(n, p) {
    counter = 0
    successes = 0
    while counter < n {
        if Math.random() <= p
            successes += 1
        counter += 1
    }
    return successes
}
There are faster algorithms for large N, but this is straightforward and mathematically correct.
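As an aside, if a library is acceptable, NumPy already ships an exact binomial sampler, so you don't have to hand-roll the loop above (the calls below are NumPy's real API; the parameters are just illustrative):

import numpy as np

rng = np.random.default_rng()
heads = rng.binomial(n=1000, p=0.5)                  # one draw: heads in 1000 fair tosses
samples = rng.binomial(n=10, p=0.5, size=10_000)     # 10 000 draws at once
print(samples.min(), samples.max(), samples.mean())  # never leaves [0, 10]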
Based on what was discussed in the accepted answer, I came up with this particular solution.
There is a rule of thumb, n*p >= 10 and n*(1-p) >= 10, but let's define a stricter one.
First of all, the Box-Muller output is hard-capped at [-6, 6] to guarantee a sane outcome (640 kB ought..., I mean, 6 sigmas ought to be enough for everybody).
Then, using that constant 6, we state that for Box-Muller to produce valid results, Math.sqrt(variance) * 6 must not exceed the mean. Substituting N/2 and N/4 for the mean and variance respectively and simplifying, we end up with this:
Math.sqrt(N) * 6 <= N
N >= 36
While this condition holds, we can safely use the capped Box-Muller Gaussian.
For any smaller sample size, stick to the binomial solution.
I just checked this rule statistically: over a fairly large number of tests (10 million), the minimum value stops falling outside the boundaries at sample size 32 and above.
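For reference, a minimal Python sketch of the resulting hybrid sampler (the function name is mine; this is just the rule above, not production code):

import math
import random

def toss_heads(N):
    # number of heads in N fair coin tosses
    if N < 36:
        # exact: simulate the tosses, cheap for small N
        return sum(random.random() < 0.5 for _ in range(N))
    # Gaussian approximation, O(1): mean N/2, variance N/4,
    # hard-capped at 6 sigma and clamped to the valid range [0, N]
    z = max(-6.0, min(6.0, random.gauss(0.0, 1.0)))
    return max(0, min(N, round(N / 2 + z * math.sqrt(N / 4))))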
Related
There is a loop which performs a brute-force algorithm to calculate 5 * 3 without using the multiplication operator.
I just need to add five 3 times, so it takes O(3), which is O(y) if the inputs are x * y.
However, a book says it takes O(2^n), where n is the number of bits in the input. I don't understand why it uses O(2^n) to represent O(y). Is that a better way to express the time complexity? Could you please explain?
I'm not asking for another algorithm to calculate this.
int result = 0;
for (int i = 0; i < 3; i++) {
    result += 5;
}
You’re claiming that the time complexity is O(y) on the input, and the book is claiming that the time complexity is O(2^n) on the number of bits in the input. Good news: you’re both right! If a number y can be represented by n bits, y is at most 2^n − 1.
I think that you're misreading the passage from the book.
When the book is talking about the algorithm for computing the product of two numbers, it uses the example of multiplying 3 × 5 as a concrete instance of the more general idea of computing x × y by adding y + y + ... + y, x total times. It's not claiming that the specific algorithm "add 5 + 5 + 5" runs in time O(2^n). Instead, think about this algorithm:
int total = 0;
for (int i = 0; i < x; i++) {
total += y;
}
The runtime of this algorithm is O(x). If you measure the runtime as a function of the number of bits n in the number x - as the book suggests - then the runtime is O(2^n), since representing the number x takes only n = O(log x) bits, which means x can be as large as 2^n - 1. This is the distinction between polynomial time and pseudopolynomial time, and the reason the book then goes on to describe a better algorithm for solving this problem is so that the runtime ends up being a polynomial in the number of bits used to represent the input rather than in the numeric value of the numbers. The exposition about grade-school multiplication and addition is there to help you get a better sense of the difference between these two quantities.
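To make the value-versus-bits distinction concrete, here is a tiny Python illustration (the function name is mine):

def multiply_by_repeated_addition(x, y):
    total = 0
    for _ in range(y):      # runs y times, and y can be as large as 2**n - 1
        total += x
    return total

y = 3
print(multiply_by_repeated_addition(5, y), y.bit_length())   # 15, 2 bits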
Don't think in terms of 3 and 5. Think about how to calculate 2 billion x 2 billion (roughly 2^31 multiplied by 2^31).
Your inputs are 31 bits each (N), and your loop will be executed 2 billion times, i.e. 2^N.
So the book is correct. For the 5 x 3 case, 3 is 2 bits, so the complexity is O(2^2). Again correct.
I've got an algorithm that can be interpreted as dividing up the number line into some number of equal-sized chunks. For simplicity, I'll stick with [0,1); it will be divided up like so:
0|----|----|----|----|1
What I need to do is take a range of numbers [j,k) and find the largest number of chunks, N, up to some maximum M, that will divide up the number line so that [j,k) still all fall into the same "bin". This is trickier than it sounds, as the range can straddle a bin like so:
j|-|k
0|----|----|----|----|1
So you may have to get down to quite a low number of chunks before the range is entirely contained. What's more, as the number of bins goes up, the range may move in and out of a single bin, so there are local minima.
The obvious answer is to start with M bins and decrease the number until the range falls into a single bin. However, I'd like to know if there's a faster way than enumerating all possible divisions, as my maximum number can be reasonably large (80 million or so).
Is there a better algorithm for this?
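For reference, the "obvious" O(M) scan looks something like this in Python (the names are mine; beware of floating-point edge cases when j or k sits exactly on a bin boundary):

import math

def fits_one_bin(j, k, n):
    # [j, k) lies inside a single bin of width 1/n iff k <= (floor(j*n) + 1)/n;
    # since k is excluded, a boundary landing exactly on k still counts.
    return math.floor(j * n) >= math.ceil(k * n) - 1

def best_n(j, k, M):
    # the "obvious answer": scan downwards from M bins
    for n in range(M, 0, -1):
        if fits_one_bin(j, k, n):
            return n
    return 1

print(best_n(0.6, 0.65, 100))   # 20: the bin [12/20, 13/20) fits exactly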
Here I would like to give another heuristic, which is different from btilly's.
The task is to find integers m and n such that m / n <= j < k <= (m + 1) / n, with n as big as possible (but still at most M).
Intuitively, it is preferable that the fraction m / n is close to j. This leads to the idea of using continued fractions.
The algorithm that I propose is quite simple:
calculate the convergents of the continued fraction of j written with minus signs (so that the convergents approach j from below), until the denominator exceeds M;
for each such fraction m / n, find the biggest integer i >= 0 such that k <= (m * i + 1) / (n * i) and n * i <= M, and replace the fraction m / n with (m * i) / (n * i);
among all the fractions from step 2, find the one with the biggest denominator.
The algorithm is not symmetric in j and k. Hence there is a similar k-version, which in general does not give the same answer, so you can take the bigger of the two results.
Example: Here I will take btilly's example: j = 0.6 and k = 0.65, but I will take M = 10.
I will first go through the j-procedure. To calculate the continued fraction expansion of j, we compute:
0.6
= 0 + 0.6
= 0 + 1 / (2 - 0.3333)
= 0 + 1 / (2 - 1 / (3 - 0))
Since 0.6 is a rational number, the expansion terminates in finitely many steps. The corresponding fractions are:
0 = 0 / 1
0 + 1 / 2 = 1 / 2
0 + 1 / (2 - 1 / 3) = 3 / 5
Computing the corresponding i values in step 2, we replace the three fractions with:
0 / 1 = 0 / 1
1 / 2 = 3 / 6
3 / 5 = 6 / 10
The biggest denominator is given by 6 / 10.
Continuing with the example above, the corresponding k-procedure goes as follows:
0.65
= 1 - 0.35
= 1 - 1 / (3 - 0.1429)
= 1 - 1 / (3 - 1 / (7 - 0))
Hence the corresponding fractions:
1 = 1 / 1
1 - 1 / 3 = 2 / 3
1 - 1 / (3 - 1 / 7) = 13 / 20
Applying step 2, we get:
1 / 1 = 2 / 2
2 / 3 = 6 / 9
13 / 20 = 0 / 0 (this is because 20 is already bigger than M = 10)
The biggest denominator is given by 6 / 9.
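As a quick cross-check with the brute-force scan sketched under the question (my best_n helper): for j = 0.6, k = 0.65, M = 10 the true optimum is indeed 10, matching the j-procedure's 6 / 10.

print(best_n(0.6, 0.65, 10))   # 10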
EDIT: experimental results.
To my surprise, the algorithm works better than I thought.
I did the following experiment, with the bound M ignored (equivalently, one can take M big enough).
In every round, I generate a pair (j, k) of uniformly distributed random numbers in the interval [0, 1) with j < k. If the difference k - j is smaller than 1e-4, I discard the pair, making that round ineffective. Otherwise I calculate the true result trueN using the naive algorithm and the heuristic result heurN using my algorithm, and add them to the statistics. This goes on for 1e6 rounds.
Here is the result:
effective round = 999789
sum of trueN = 14013312
sum of heurN = 13907575
correct percentage = 99.2262 %
average quotient = 0.999415
The correct percentage is the percentage of effective rounds such that trueN is equal to heurN, and the average quotient is the average of the quotient heurN / trueN for all effective rounds.
Thus the method gives the correct answer in more than 99% of cases.
I also did experiments with smaller M values, and the results are similar.
Even in the best case, the bin size must be at least k - j.
Consider the number line segments [0..j] and [k..1). If we can divide both of these partial segments into parts using the same bin size, we should be able to solve the problem.
So if we consider gcd((j-0)/(k-j), (1-k)/(k-j)) (where we apply the greatest-integer function, i.e. floor, after each division), we should be able to get a good estimate, or the best value. There are corner cases: if (k-j) > j or (k-j) > (1-k), the best value is 1 itself.
So a very good estimate of the bin size should be min(1, (k-j) * gcd((j-0)/(k-j), (1-k)/(k-j))).
Let's turn this around a bit.
You'd like to find m, n as large as you can (though n <= M) with m/n close to but not more than j, and k <= (m+1)/n.
All promising candidates will be on the Stern-Brocot tree (https://en.wikipedia.org/wiki/Stern%E2%80%93Brocot_tree). Indeed you'll get a reasonably good answer just by walking the Stern-Brocot tree to find the last "large rational" fitting your limit just below j whose top is at k or above.
There is a complication. Usually the tree converges quickly. But sometimes the Stern-Brocot tree has long sequences with very small gaps. For example the sequence to get to 0.49999999 will include 1/3, 2/5, 3/7, 4/9, ... We always fall into those sequences where, with a/b < c/d, we take the mediant (a+c)/(b+d) and then keep walking towards one side, giving (a+i*c)/(b+i*d). If you're clever, then rather than walking the whole sequence you can just do a binary search for the right i to use.
The trick to that cleverness is to view your traversal as:
Start with 2 "equal" fractions.
Take their mediant. If its denominator exceeds M then I'm done. Else figure out which direction I am going from there.
Try powers of 2 for i in (a+i*c)/(b+i*d) until I know what range i is in for my range and M conditions.
Do binary search to find the last i that I can use.
(a+i*c)/(b+i*d) and (a+i*c+c)/(b+i*d+d) are my two new equal fractions. Go back to the first step.
The initial equal fractions are, of course, 0/1 and 1/1.
This will always find a decent answer in O(log(M)) operations. Unfortunately this reasonably good answer is NOT always correct. Consider the case where M = 3, j=0.6 and k=0.65. In this case the heuristic would stop at 1/2 while the actual best answer is 1/3.
Another way that it can fail is that it only finds reduced answers. In the above example if M was 4 then it still thinks that the best answer is 1/2 when it is actually 1/4. It is easy to handle this by testing whether a multiple of your final answer will work. (That step will improve your answer a fixed, but reasonably large, fraction of the time.)
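For illustration, here is an unaccelerated Python sketch of this walk (the function name is mine). It leaves out the power-of-2/binary-search speedup, so pathological inputs like j = 0.49999999 make it crawl, but it does include the "try a multiple of the final answer" fix-up from the last paragraph:

from fractions import Fraction

def stern_brocot_heuristic(j, k, M):
    # pass j and k as strings or Fractions so 0.6 really means 3/5
    j, k = Fraction(j), Fraction(k)
    a, b = 0, 1                      # left endpoint  a/b <= j
    c, d = 1, 1                      # right endpoint c/d >  j
    best_m, best_n = 0, 1            # n = 1 always works: [0, 1) covers [j, k)
    while b + d <= M:
        m, n = a + c, b + d          # mediant of the two endpoints
        if Fraction(m, n) <= j:
            if k <= Fraction(m + 1, n):
                best_m, best_n = m, n
            a, b = m, n              # j is to the right: descend right
        else:
            c, d = m, n              # j is to the left: descend left
    # fix-up: an unreduced multiple of the best fraction may allow a bigger n
    for i in range(M // best_n, 0, -1):
        if k <= Fraction(best_m * i + 1, best_n * i):
            return best_n * i
    return best_n

print(stern_brocot_heuristic('0.6', '0.65', 10))   # 10
print(stern_brocot_heuristic('0.6', '0.65', 3))    # 2, although the true optimum is 3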
I'm currently utilizing an online variance algorithm to calculate the variance for a given sequence. This works nicely, and also gives good numerical stability and overflow resistance, at the cost of some speed, which is fine. My question is, does an algorithm exist that will be faster than this if the sample mean is already known, while having similar stability and resistance to overflow (hence not something like a naive variance calculation).
The current online variance calculation is a single-pass algorithm with both divisions and multiplications in the main loop (which is what hurts the speed). From Wikipedia:
def online_variance(data):
    # Welford's online algorithm: single pass, numerically stable
    n = 0
    mean = 0
    M2 = 0
    for x in data:
        n = n + 1
        delta = x - mean
        mean = mean + delta / n
        M2 = M2 + delta * (x - mean)
    variance = M2 / (n - 1)   # sample variance; needs n >= 2
    return variance
The thing that causes a naive variance calculation to go unstable is the fact that you separately sum the X (to get mean(x)) and the X^2 values and then take the difference
var = mean(x^2) - (mean(x))^2
But since the definition of variance is
var = mean((x - mean(x))^2)
You can just evaluate that and it will be as fast as it can be. When you don't know the mean, you have to compute it first for stability, or use the "naive" formulation that goes through the data only once at the expense of numerical stability.
EDIT
Now that you have given the "original" code, it's easy to be better (faster). As you correctly point out, the division in the inner loop is slowing you down. Try this one for comparison:
def newVariance(data, mean):
    # same single pass, but with the known mean: no division inside the loop
    n = 0
    M2 = 0
    for x in data:
        n = n + 1
        delta = x - mean
        M2 = M2 + delta * delta
    variance = M2 / (n - 1)
    return variance
Note - this looks a lot like the two_pass_variance algorithm from Wikipedia, except that you don't need the first pass to compute the mean since you say it is already known.
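A quick sanity check with made-up data; when the supplied mean is the true sample mean, the two functions agree:

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
known_mean = sum(data) / len(data)
print(online_variance(data))            # ~4.5714 (Welford, mean unknown)
print(newVariance(data, known_mean))    # ~4.5714 (single pass, mean known)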
I'm trying to calculate an appropriate timeout time for a real time simulator that I'm writing:
For p = probability of success, the time for a successful request = m, and the time for a failed attempt = f. What is the average time for 5 successful requests?
Let's call the total number of tries x.
x * p = 5 (the expected number of successes in x tries must be 5)
x = 5 / p
The total time would be
t = 5m + (x-5)f
or
t = 5m + (5 / p - 5)f
If m=1, f=2, and p=0.1, the answer should be 5 * 1 + 45 * 2 = 95. This checks out.
There might be errors in here, but I did my best.
This is probably more appropriate on the stats exchange, by the way, but here's the answer:
If you want the average time, you need to average over the possible total number of trials required to get your five successes. This could be anywhere from 5 to infinity (it takes at least 5 trials to get 5 successes, and you could in theory have an arbitrarily long run of failures). I would suggest you can happily cut this off at a reasonable point and get an answer accurate to several decimal places for anything other than pathological values of p. Let n be the total number of trials needed to observe x = 5 successes. The probability that the x-th success occurs on trial n is given by the negative binomial distribution, parameterised by x, n, and p: NegBin(n; x, p) = C(n-1, x-1) * p^x * (1-p)^(n-x). The time associated with such an outcome is:
5m + (n-5)f
To get the expectation (average) of this quantity, you want the sum of
NegBin(n; 5, p) * (5m + (n-5)f)
for n = 5 to n = inf. Depending on your value of p, you should be able to stop well before the tail matters (a few multiples of 5/p) and still obtain a fairly accurate answer. Beware, if you're using a naive implementation, that the binomial coefficient involves factorial-sized terms and may overflow for moderately large n, so you may want to compute it incrementally or consider the normal approximation.
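A small Python check of this sum against the closed form from the other answer (the function name and cut-off are mine):

from math import comb

def expected_time(p, m, f, r=5, n_max=2000):
    # sum over n of P(r-th success on trial n) * (r*m + (n-r)*f),
    # where P is the negative binomial pmf C(n-1, r-1) * p**r * (1-p)**(n-r)
    total = 0.0
    for n in range(r, n_max + 1):
        pmf = comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)
        total += pmf * (r * m + (n - r) * f)
    return total

print(expected_time(0.1, 1, 2))     # ~95.0
print(5 * 1 + (5 / 0.1 - 5) * 2)    # closed form: 95.0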
Link to the original problem
It's not a homework question. I just thought that someone might know a real solution to this problem.
I was on a programming contest back in 2004, and there was this problem:
Given n, find the sum of the digits of n!. n can be from 0 to 10000. Time limit: 1 second. I think there were up to 100 numbers in each test set.
My solution was pretty fast but not fast enough, so I just let it run for some time. It built an array of pre-calculated values which I could use in my code. It was a hack, but it worked.
But there was a guy who solved this problem with about 10 lines of code, and it gave an answer in no time. I believe it was some sort of dynamic programming, or something from number theory. We were 16 at the time, so it should not have been rocket science.
Does anyone know what kind of an algorithm he could use?
EDIT: I'm sorry if I didn't make the question clear. As mquander said, there should be a clever solution, without bignum, with just plain Pascal code, a couple of loops, O(n^2) or something like that. 1 second is not a constraint anymore.
I found here that if n > 5, then 9 divides the sum of digits of n!. We can also find how many zeros there are at the end of the number. Can we use that?
OK, another problem, from a programming contest in Russia: given 1 <= N <= 2 000 000 000, output N! mod (N+1). Is that somehow related?
I'm not sure who is still paying attention to this thread, but here goes anyway.
First, in the official-looking linked version, it only has to be 1000 factorial, not 10000 factorial. Also, when this problem was reused in another programming contest, the time limit was 3 seconds, not 1 second. This makes a huge difference in how hard you have to work to get a fast enough solution.
Second, for the actual parameters of the contest, Peter's solution is sound, but with one extra twist you can speed it up by a factor of 5 on a 32-bit architecture. (Or even a factor of 6 if only 1000! is desired.) Namely, instead of working with individual digits, implement multiplication in base 100000. Then at the end, total the digits within each super-digit. I don't know how good a computer you were allowed in the contest, but I have a desktop at home that is roughly as old as the contest. The following sample code takes 16 milliseconds for 1000! and 2.15 seconds for 10000!. The code also ignores trailing 0s as they show up, but that only saves about 7% of the work.
#include <stdio.h>

int main() {
    /* big number stored little-endian in base 100000 ("super-digits") */
    unsigned int dig[10000], first = 0, last = 0, carry, n, x, sum = 0;
    dig[0] = 1;
    for (n = 2; n <= 10000; n++) {   /* computes 10000!; lower the bound for smaller cases */
        carry = 0;
        for (x = first; x <= last; x++) {
            carry = dig[x] * n + carry;
            dig[x] = carry % 100000;
            if (x == first && !(carry % 100000))
                first++;             /* skip trailing-zero super-digits from now on */
            carry /= 100000;
        }
        if (carry)
            dig[++last] = carry;
    }
    /* total the decimal digits inside each super-digit */
    for (x = first; x <= last; x++)
        sum += dig[x] % 10 + (dig[x] / 10) % 10 + (dig[x] / 100) % 10
             + (dig[x] / 1000) % 10 + (dig[x] / 10000) % 10;
    printf("Sum: %u\n", sum);
    return 0;
}
Third, there is an amazing and fairly simple way to speed up the computation by another sizable factor. With modern methods for multiplying large numbers, it does not take quadratic time to compute n!. Instead, you can do it in O-tilde(n) time, where the tilde means that you can throw in logarithmic factors. There is a simple acceleration due to Karatsuba that does not bring the time complexity down to that, but still improves it and could save another factor of 4 or so. In order to use it, you also need to divide the factorial itself into equal sized ranges. You make a recursive algorithm prod(k,n) that multiplies the numbers from k to n by the pseudocode formula
prod(k,n) = prod(k,floor((k+n)/2))*prod(floor((k+n)/2)+1,n)
Then you use Karatsuba to do the big multiplication that results.
Even better than Karatsuba is the Fourier-transform-based Schonhage-Strassen multiplication algorithm. As it happens, both algorithms are part of modern big number libraries. Computing huge factorials quickly could be important for certain pure mathematics applications. I think that Schonhage-Strassen is overkill for a programming contest. Karatsuba is really simple and you could imagine it in an A+ solution to the problem.
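For illustration, here is a minimal Python sketch of that product tree (the function names are mine). Python's built-in integers already switch to Karatsuba multiplication for large operands, so the balanced splitting is what makes the big multiplications pay off:

def prod(k, n):
    # product of the integers k..n via balanced splitting
    if k > n:
        return 1
    if k == n:
        return k
    mid = (k + n) // 2
    return prod(k, mid) * prod(mid + 1, n)

def digit_sum_of_factorial(n):
    return sum(int(d) for d in str(prod(2, n))) if n >= 2 else 1

print(digit_sum_of_factorial(100))   # 648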
Part of the question posed is some speculation that there is a simple number theory trick that changes the contest problem entirely. For instance, if the question were to determine n! mod n+1, then Wilson's theorem says that the answer is -1 when n+1 is prime, and it's a really easy exercise to see that it's 2 when n=3 and otherwise 0 when n+1 is composite. There are variations of this too; for instance n! is also highly predictable mod 2n+1. There are also some connections between congruences and sums of digits. The sum of the digits of x mod 9 is also x mod 9, which is why the sum is 0 mod 9 when x = n! for n >= 6. The alternating sum of the digits of x mod 11 equals x mod 11.
The problem is that if you want the sum of the digits of a large number, not modulo anything, the tricks from number theory run out pretty quickly. Adding up the digits of a number doesn't mesh well with addition and multiplication with carries. It's often difficult to promise that the math does not exist for a fast algorithm, but in this case I don't think that there is any known formula. For instance, I bet that no one knows the sum of the digits of a googol factorial, even though it is just some number with roughly 100 digits.
This is A004152 in the Online Encyclopedia of Integer Sequences. Unfortunately, it doesn't have any useful tips about how to calculate it efficiently - its Maple and Mathematica recipes take the naive approach.
I'd attack the second problem, to compute N! mod (N+1), using Wilson's theorem. That reduces the problem to testing whether N+1 is prime.
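A minimal sketch of that reduction (the primality test is my choice; Miller-Rabin with bases 2, 3, 5, 7 is deterministic below 3,215,031,751, which covers N + 1 <= 2 000 000 001):

def is_prime(m):
    # deterministic Miller-Rabin for m < 3,215,031,751
    if m < 2:
        return False
    for q in (2, 3, 5, 7):
        if m % q == 0:
            return m == q
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7):
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = x * x % m
            if x == m - 1:
                break
        else:
            return False
    return True

def factorial_mod(n):
    # N! mod (N+1) via Wilson's theorem
    m = n + 1
    if is_prime(m):
        return n      # (m-1)! == -1 == n (mod m) for prime m
    if m == 4:
        return 2      # 3! = 6 == 2 (mod 4)
    return 0          # any other composite m divides (m-1)!

print([factorial_mod(n) for n in range(1, 11)])   # [1, 2, 2, 4, 0, 6, 0, 0, 0, 10]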
A small, fast Python script found at http://www.penjuinlabs.com/blog/?p=44. It's elegant but still brute force.
import sys
from functools import reduce

for arg in sys.argv[1:]:
    print(reduce(lambda x, y: int(x) + int(y),
                 str(reduce(lambda x, y: x * y, range(1, int(arg))))))
$ time python sumoffactorialdigits.py 432 951 5436 606 14 9520
3798
9639
74484
5742
27
141651
real 0m1.252s
user 0m1.108s
sys 0m0.062s
Assume you have big numbers (this is the least of your problems, assuming that N is really big, and not 10000), and let's continue from there.
The trick below is to factor N! by factoring every n <= N, and then compute the powers of the factors.
Have a vector of counters, one counter for each prime number up to N; set them to 0. For each n <= N, factor n and increase the counters of its prime factors accordingly (factor smartly: start with the small primes, construct the primes while factoring, and remember that division by 2 is a shift). Subtract the counter of 5 from the counter of 2, and set the counter of 5 to zero (nobody cares about factors of 10 here).
Or better: compute all the primes up to N and run the following loop instead,
for (j = 0; j < last_prime; ++j) {
    count[j] = 0;
    /* Legendre's formula: exponent of primes[j] in N! */
    for (i = N / primes[j]; i; i /= primes[j])
        count[j] += i;
}
Note that in the previous block we only used (very) small numbers.
For each prime factor P you have to compute P to the power of the appropriate counter; that takes log(counter) time using iterative squaring. Now you have to multiply all these prime powers together.
All in all you have about N log(N) operations on small numbers (log N prime factors), and Log N Log(Log N) operations on big numbers.
and after the improvement in the edit, only N operations on small numbers.
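For concreteness, a minimal Python sketch of the whole approach (sieve, exponent counting as in the loop above, fast exponentiation, digit sum); the function name is mine and the trailing-zero trick is omitted:

def digit_sum_factorial(N):
    # sieve of Eratosthenes up to N
    sieve = [True] * (N + 1)
    primes = []
    for p in range(2, N + 1):
        if sieve[p]:
            primes.append(p)
            for multiple in range(p * p, N + 1, p):
                sieve[multiple] = False
    # exponent of each prime in N! (the counting loop above), then assemble N!
    result = 1
    for p in primes:
        count, i = 0, N // p
        while i:
            count += i
            i //= p
        result *= pow(p, count)      # iterative squaring on a big int
    return sum(int(d) for d in str(result))

print(digit_sum_factorial(100))      # 648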
HTH
1 second? Why can't you just compute n! and add up the digits? That's 10000 multiplications and no more than a few ten thousand additions, which should take approximately one zillionth of a second.
You have to compute the factorial.
1 * 2 * 3 * 4 * 5 = 120.
If you only want to calculate the sum of digits, you can ignore the trailing zeroes.
For 6! you can do 12 x 6 = 72 instead of 120 * 6
For 7! you can use (72 * 7) MOD 10
EDIT.
I wrote a response too quickly...
10 is the product of the two prime numbers 2 and 5.
Each time you have these 2 factors, you can ignore them.
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11 * 12 * 13 * 14 * 15 ...
broken into prime factors:
1, 2, 3, 2*2, 5, 2*3, 7, 2*2*2, 3*3, 2*5, 11, 2*2*3, 13, 2*7, 3*5, ...
The factor 5 appears at 5, 10, 15...
Then a trailing zero will appear after multiplying by 5, 10, 15, ...
We have a lot of 2s and 3s... We'll overflow soon :-(
Then, you still need a library for big numbers.
I deserve to be downvoted!
Let's see. We know that the calculation of n! for any reasonably large number will eventually lead to a number with lots of trailing zeroes, which don't contribute to the sum. How about lopping off the zeroes along the way? That'd keep the size of the number a bit smaller.
Hmm. Nope. I just checked, and integer overflow is still a big problem even then...
Even without arbitrary-precision integers, this should be brute-forceable. In the problem statement you linked to, the biggest factorial that would need to be computed would be 1000!. This is a number with about 2500 digits. So just do this:
Allocate an array of 3000 bytes, with each byte representing one digit in the factorial. Start with a value of 1.
Run grade-school multiplication on the array repeatedly, in order to calculate the factorial.
Sum the digits.
Doing the repeated multiplications is the only potentially slow step, but I feel certain that 1000 of the multiplications could be done in a second, which is the worst case. If not, you could compute a few "milestone" values in advance and just paste them into your program.
One potential optimization: Eliminate trailing zeros from the array when they appear. They will not affect the answer.
OBVIOUS NOTE: I am taking a programming-competition approach here. You would probably never do this in professional work.
Another solution, using BigInteger (this one computes the digit sum of 100!):
static long q20() {
    long sum = 0;
    // digit sum of 100!
    String factorial = factorial(new BigInteger("100")).toString();
    for (int i = 0; i < factorial.length(); i++) {
        sum += Character.getNumericValue(factorial.charAt(i));
    }
    return sum;
}

static BigInteger factorial(BigInteger n) {
    BigInteger one = BigInteger.ONE;
    if (n.equals(one)) return one;
    return n.multiply(factorial(n.subtract(one)));
}