Given that the first number divisible by all of (1, 2, .., 10) is 2520,
and that the first number divisible by all of (1, 2, .., 20) is 232792560,
find the first number divisible by all of (1, 2, .., 100), i.e. by every integer from 1 to 100.
The solution should run in less than a minute.
I'm writing the solution in Java, and I'm facing two problems:
How can I compute this if the solution itself is a number so huge that it cannot be held in a primitive type?
I tried using BigInteger, but I'm doing many additions and divisions and I don't know whether that blows up my running time.
How can I compute this in less than a minute? The solution I have come up with so far hasn't even terminated.
This is my Java code (using big integers):
// returns true if answer is divisible by every number in [start, end]
public static boolean solved(int start, int end, BigInteger answer) {
    for (int i = start; i <= end; i++) {
        // if the remainder is non-zero, answer is not divisible by i
        if (answer.mod(BigInteger.valueOf(i)).compareTo(BigInteger.ZERO) != 0) {
            return false;
        }
    }
    return true;
}

public static void main(String[] args) {
    BigInteger answer = new BigInteger("232792560");
    BigInteger y = new BigInteger("232792560");
    // step by multiples of the known LCM of 1..20
    while (!solved(21, 100, answer)) {
        answer = answer.add(y);
    }
    System.out.println(answer);
}
I take advantage of the fact that I already know the solution for (1, .., 20).
Currently it simply does not stop.
I thought I could improve it by changing the function solved to check only the values we care about.
For example:
100 = 25,50,100
99 = 33,99
98 = 49,98
97 = 97
96 = 24,32,48,96
And so on. But this simple task of identifying the smallest set of numbers that need to be checked has become a problem in itself, and I haven't looked for / found a solution to it. Either way, the running time should stay under a minute.
Any other ideas?
The first number that can be divided by all elements of some set (which is what you have there, despite the slightly different formulation) is also known as the Least Common Multiple of that set. LCM(a, b, c) = LCM(LCM(a, b), c) and so on, so in general, it can be computed by taking n - 1 pairwise LCMs where n is the number of items in the set. BigInteger does not have an lcm function, but the LCM can be computed via a * b / gcd(a, b) so in Java with BigInteger:
static BigInteger lcm(BigInteger a, BigInteger b) {
    return a.multiply(b).divide(a.gcd(b));
}
For 1 to 20, computing the LCM in that way indeed results in 232792560. It's easy to do it up to 100 too.
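For illustration, here is a minimal sketch of that approach, folding lcm over 2..100 with BigInteger (the wrapper class name is just a placeholder); it prints the result essentially instantly:

import java.math.BigInteger;

public class LcmUpTo100 {
    static BigInteger lcm(BigInteger a, BigInteger b) {
        return a.multiply(b).divide(a.gcd(b));
    }

    public static void main(String[] args) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= 100; i++) {
            result = lcm(result, BigInteger.valueOf(i));
        }
        // the smallest number divisible by every integer from 1 to 100
        System.out.println(result);
    }
}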
Find all maximal prime powers in your range and take their product.
E.g. for 1-10: 2^3, 3^2, 5^1, 7^1; their product is 2520, which is the right answer. You could find the primes via the sieve of Eratosthenes or just download them from a list of primes.
As 100 is small, you can work this out by producing the prime factorization of all numbers from 2 to 100 and keeping the largest exponent of each prime across all factorizations. In fact, trial divisions by 2, 3, 5 and 7 are enough to check primality up to 100, and there are just 25 primes to consider. You can implement a simple sieve to find the primes and perform the factorizations.
After you have found all the exponents of the prime decomposition of the LCM, you can either leave that as the answer or perform the multiplications explicitly.
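A minimal sketch of this prime-power approach in Java, assuming the 1..100 range (the class name is illustrative): sieve the primes up to 100, take the largest power of each that still fits in the range, and multiply with BigInteger.

import java.math.BigInteger;

public class MaxPrimePowers {
    public static void main(String[] args) {
        int limit = 100;
        boolean[] composite = new boolean[limit + 1];
        BigInteger product = BigInteger.ONE;
        for (int p = 2; p <= limit; p++) {
            if (composite[p]) continue;                 // p is prime
            for (int m = p * p; m <= limit; m += p) composite[m] = true;
            long power = p;
            while (power * p <= limit) power *= p;      // largest power of p not exceeding the limit
            product = product.multiply(BigInteger.valueOf(power));
        }
        System.out.println(product);                    // LCM of 1..100
    }
}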
Related
I was given this problem at a training session. We are given N different values (N ≤ 100). Let's call this array A[N]; for this array A we are guaranteed that 1 is in the array and that A[i] ≤ 10^9. Secondly, we are given a number S where S ≤ 10^9.
Now we have to solve the classic coin problem with these values. Namely, we need to find the minimum number of elements that sum to exactly S. Every element from A can be used an infinite number of times.
Time limit: 1 sec
Memory limit: 256 MB
Example:
S = 1000, N = 10
A[] = {1,12,123,4,5,678,7,8,9,10}. The result is 10.
1000 = 678 + 123 + 123 + 12 + 12 + 12 + 12 + 12 + 12 + 4
What I have tried
I tried to solve this with the classic dynamic programming coin-change technique, but it uses too much memory and gives Memory Limit Exceeded.
I can't figure out what we should keep track of for these values. Thanks in advance.
Here are a couple of test cases that cannot be solved with the classic DP coin-change approach.
S = 1000000000 N = 100
1 373241370 973754081 826685384 491500595 765099032 823328348 462385937
251930295 819055757 641895809 106173894 898709067 513260292 548326059
741996520 959257789 328409680 411542100 329874568 352458265 609729300
389721366 313699758 383922849 104342783 224127933 99215674 37629322
230018005 33875545 767937253 763298440 781853694 420819727 794366283
178777428 881069368 595934934 321543015 27436140 280556657 851680043
318369090 364177373 431592761 487380596 428235724 134037293 372264778
267891476 218390453 550035096 220099490 71718497 860530411 175542466
548997466 884701071 774620807 118472853 432325205 795739616 266609698
242622150 433332316 150791955 691702017 803277687 323953978 521256141
174108096 412366100 813501388 642963957 415051728 740653706 68239387
982329783 619220557 861659596 303476058 85512863 72420422 645130771
228736228 367259743 400311288 105258339 628254036 495010223 40223395
110232856 856929227 25543992 957121494 359385967 533951841 449476607
134830774
OUTPUT FOR THIS TEST CASE: 5
S = 999865497 N = 7
1 267062069 637323855 219276511 404376890 528753603 199747292
OUTPUT FOR THIS TEST CASE: 1129042
S = 1000000000 N = 40
1 12 123 4 5 678 7 8 9 10 400 25 23 1000 67 98 33 46 79 896 11 112 1223 412
532 6781 17 18 19 170 1400 925 723 11000 607 983 313 486 739 896
OUTPUT FOR THIS TEST CASE: 90910
(NOTE: Updated and edited for clarity. Complexity Analysis added at the end.)
OK, here is my solution, including my fixes to the performance issues found by @PeterdeRivaz. I have tested this against all of the test cases provided in the question and the comments, and it finishes them all in under a second (well, 1.5 s in one case), using primarily only the memory for the partial-results cache (I'd guess about 16 MB).
Rather than using the traditional DP solution (which is both too slow and requires too much memory), I use a Depth-First, Greedy-First combinatorial search with pruning using current best results. I was surprised (very) that this works as well as it does, but I still suspect that you could construct test sets that would take a worst-case exponential amount of time.
First there is a master function that is the only thing that calling code needs to call. It handles all of the setup and initialization and calls everything else. (all code is C#)
// Find the min# of coins for a specified sum
int CountChange(int targetSum, int[] coins)
{
    // init the cache for (partial) memoization
    PrevResultCache = new PartialResult[1048576];

    // make sure the coins are sorted lowest to highest
    Array.Sort(coins);

    int curBest = targetSum;
    int result = CountChange_r(targetSum, coins, coins.GetLength(0)-1, 0, ref curBest);

    return result;
}
Because of the problematic test cases raised by @PeterdeRivaz, I have also added a partial-results cache to handle the situation where there are large numbers in N[] that are close together.
Here is the code for the cache:
// implement a very simple cache for previous results of remainder counts
struct PartialResult
{
    public int PartialSum;
    public int CoinVal;
    public int RemainingCount;
}
PartialResult[] PrevResultCache;

// checks the partial count cache for already calculated results
int PrevAddlCount(int currSum, int currCoinVal)
{
    int cacheAddr = currSum & 1048575;  // AND with (2^20-1) to get only the first 20 bits
    PartialResult prev = PrevResultCache[cacheAddr];

    // use it, as long as it's actually the same partial sum
    // and the coin value is at least as large as the current coin
    if ((prev.PartialSum == currSum) && (prev.CoinVal >= currCoinVal))
    {
        return prev.RemainingCount;
    }
    // otherwise flag as empty
    return 0;
}

// add or overwrite a new value to the cache
void AddPartialCount(int currSum, int currCoinVal, int remainingCount)
{
    int cacheAddr = currSum & 1048575;  // AND with (2^20-1) to get only the first 20 bits
    PartialResult prev = PrevResultCache[cacheAddr];

    // only add if the Sum is different or the result is better
    if ((prev.PartialSum != currSum)
        || (prev.CoinVal <= currCoinVal)
        || (prev.RemainingCount == 0)
        || (prev.RemainingCount >= remainingCount)
        )
    {
        prev.PartialSum = currSum;
        prev.CoinVal = currCoinVal;
        prev.RemainingCount = remainingCount;
        PrevResultCache[cacheAddr] = prev;
    }
}
And here is the code for the recursive function that does the actual counting:
/*
* Find the minimum number of coins required to total a specific sum
* using the list of coin denominations passed in.
*
* Memory Requirements: O(N) where N is the number of coin denominations
*                      (primarily for the stack)
*
* CPU requirements: O(Sqrt(S)*N) where S is the target Sum
*                   (Average, estimated. This is very hard to figure out.)
*/
int CountChange_r(int targetSum, int[] coins, int coinIdx, int curCount, ref int curBest)
{
    int coinVal = coins[coinIdx];
    int newCount = 0;

    // check to see if we are at the end of the search tree (coinIdx=0, coinVal=1)
    // or we have reached the targetSum
    if ((coinVal == 1) || (targetSum == 0))
    {
        // just use math to get the final total for this path/combination
        newCount = curCount + targetSum;

        // update, if we have a new curBest
        if (newCount < curBest) curBest = newCount;

        return newCount;
    }

    // prune this whole branch, if it cannot possibly improve the curBest
    int bestPossible = curCount + (targetSum / coinVal);
    if (bestPossible >= curBest)
        return bestPossible;  // NOTE: this is a false answer, but it shouldn't matter
                              // because we should never use it.

    // check the cache to see if a remainder-count for this partial sum
    // already exists (and used coins at least as large as ours)
    int prevRemCount = PrevAddlCount(targetSum, coinVal);
    if (prevRemCount > 0)
    {
        // it exists, so use it: coins used so far plus the cached remainder count
        newCount = curCount + prevRemCount;

        // update, if we have a new curBest
        if (newCount < curBest) curBest = newCount;

        return newCount;
    }

    // always try the largest remaining coin first, starting with the
    // maximum possible number of that coin (greedy-first searching)
    newCount = curCount + targetSum;
    for (int cnt = targetSum / coinVal; cnt >= 0; cnt--)
    {
        int tmpCount = CountChange_r(targetSum - (cnt * coinVal), coins, coinIdx - 1, curCount + cnt, ref curBest);

        if (tmpCount < newCount) newCount = tmpCount;
    }

    // Add our new partial result to the cache
    AddPartialCount(targetSum, coinVal, newCount - curCount);

    return newCount;
}
Analysis:
Memory: Memory usage is pretty easy to determine for this algorithm. Basically there's only the partial-results cache and the stack. The cache is fixed at approx. 1 million entries times the size of each entry (3*4 bytes), so about 12 MB. The stack is limited to O(N), so together, memory is clearly not a problem.
CPU: The run-time complexity of this algorithm starts out hard to determine and then gets harder, so please excuse me because there's a lot of hand-waving here. I tried to search for an analysis of just the brute-force problem (a combinatorial search over sums of multiples of the N base values totaling S) but not much turned up. What little there was tended to say it was O(N^S), which is clearly too high. I think that a fairer estimate is O(N^(S/N)) or possibly O(N^(S/AVG(N))) or even O(N^(S/Gmean(N))), where Gmean(N) is the geometric mean of the elements of N[]. This solution starts out with the brute-force combinatorial search and then improves it with two significant optimizations.
The first is the pruning of branches based on estimates of the best possible result for that branch versus the best result it has already found. If the best-case estimators were perfectly accurate and the work for branches was perfectly distributed, this would mean that if we find a result that is better than 90% of the other possible cases, then pruning would effectively eliminate 90% of the work from that point on. To make a long story short, this should work out so that the amount of work still remaining after pruning shrinks harmonically as the search progresses. Assuming that some kind of summing/integration should be applied to get a work total, this appears to me to work out to a logarithm of the original work. So let's call it O(Log(N^(S/N))), or O(N*Log(S/N)), which is pretty darn good. (Though O(N*Log(S/Gmean(N))) is probably more accurate.)
However, there are two obvious holes in this. First, it is true that the best-case estimators are not perfectly accurate and thus they will not prune as effectively as assumed above, but this is somewhat counter-balanced by the greedy-first ordering of the branches, which gives the best chance of finding better solutions early in the search, which in turn increases the effectiveness of pruning.
The second problem is that the best-case estimator works better when the different values of N are far apart. Specifically, if |(S/n2 - S/n1)| > 1 for any 2 values in N, then it becomes almost perfectly effective. For values of N less than SQRT(S), even two adjacent values (k, k+1) are far enough apart that this rule applies. However, for increasing values above SQRT(S) a window opens up in which any number of N-values falling inside it will not be able to effectively prune each other. The size of this window is approximately K/SQRT(S). So if S=10^9, when K is around 10^6 this window will be about 30 numbers wide. This means that N[] could contain 1 plus every number from 1000001 to 1000029 and the pruning optimization would provide almost no benefit.
To address this, I added the partial-results cache, which allows memoization of the most recent partial sums up to the target S. This takes advantage of the fact that when the N-values are close together, they will tend to have an extremely high number of duplicates in their sums. As best as I can figure, the effectiveness is approximately N times the J-th root of the problem size, where J = S/K and K is some measure of the average size of the N-values (Gmean(N) is probably the best estimate). If we apply this to the brute-force combinatorial search, assuming that pruning is ineffective, we get O((N^(S/Gmean(N)))^(1/Gmean(N))), which I think is also O(N^(S/(Gmean(N)^2))).
So, at this point take your pick. I know this is really sketchy, and even if it is correct, it is still very sensitive to the distribution of the N-values, so lots of variance.
[I've replaced the previous idea about bit operations because it seems to be too time consuming]
A bit crazy idea and incomplete but may work.
Let's start by introducing f(n, s), which returns the number of combinations in which s can be composed from n coins.
Now, how is f(n+1, s) related to f(n, s)?
One possible way to calculate it is:
f(n+1,s)=sum[coin:coins]f(n,s-coin)
For example, if we have coins 1 and 3,
f(0,)=[1,0,0,0,0,0,0,0] - with zero coins we can have only zero sum
f(1,)=[0,1,0,1,0,0,0,0] - what we can have with one coin
f(2,)=[0,0,1,0,2,0,1,0] - what we can have with two coins
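A small sketch (in Java, purely for illustration) that reproduces these tables by applying the recurrence directly, assuming coins {1, 3} and sums up to 7:

public class CoinConvolution {
    public static void main(String[] args) {
        int maxSum = 7;
        int[] coins = {1, 3};
        long[] f = new long[maxSum + 1];
        f[0] = 1;                                // f(0,): only the zero sum is reachable with zero coins
        for (int n = 1; n <= 2; n++) {
            long[] next = new long[maxSum + 1];
            for (int s = 0; s <= maxSum; s++) {
                for (int coin : coins) {
                    if (s >= coin) next[s] += f[s - coin];   // f(n+1,s) = sum over coins of f(n,s-coin)
                }
            }
            f = next;
            System.out.println("f(" + n + ",) = " + java.util.Arrays.toString(f));
        }
    }
}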
We can rewrite it a bit differently:
f(n+1,s)=sum[i=0..max]f(n,s-i)*a(i)
a(i)=1 if we have coin i and 0 otherwise
What we have here is convolution: f(n+1,)=conv(f(n,),a)
https://en.wikipedia.org/wiki/Convolution
Computing it as the definition suggests takes O(n^2),
but we can use the Fourier transform to reduce it to O(n*log n).
https://en.wikipedia.org/wiki/Convolution#Convolution_theorem
So now we have a more-or-less cheap way to find out which sums are achievable with n coins without going through every count incrementally - just calculate the n-th power of F(a) and apply the inverse Fourier transform.
This allows us to do a kind of binary search, which can help in handling cases where the answer is big.
As I said, the idea is incomplete - for now I have no idea how to combine the bit representation with Fourier transforms (to satisfy the memory constraint) and whether we will fit into 1 second on any "regular" CPU...
I have written this code to check whether a number is prime (for numbers up to 10^9+7).
Is this a good method?
What will be the time complexity of this?
What I have done is build an unordered_set which stores the prime numbers up to sqrt(n).
When checking whether a number is prime, I first check whether it is less than the maximum number in the table.
If it is less, it is looked up in the table, so the complexity should be O(1) in this case.
If it is greater, the number is put through divisibility tests against the numbers in the set of primes.
#include <iostream>
#include <set>
#include <math.h>
#include <unordered_set>
#define sqrt10e9 31623
using namespace std;

unordered_set<long long> primeSet = { 2, 3 }; // used for fast lookups

void genrate_prime_set(long range) // this generates the primes up to sqrt(10^9+7)
{
    bool flag;
    set<long long> tempPrimeSet = { 2, 3 }; // a temporary set is used for generation
    set<long long>::iterator j;
    for (int i = 3; i <= range; i = i + 2)
    {
        //cout << i << " ";
        flag = true;
        for (j = tempPrimeSet.begin(); *j * *j <= i; ++j)
        {
            if (i % (*j) == 0)
            {
                flag = false;
                break;
            }
        }
        if (flag)
        {
            primeSet.insert(i);
            tempPrimeSet.insert(i);
        }
    }
}

bool is_prime(long long i, unordered_set<long long> primeSet)
{
    bool flag = true;
    if (i <= sqrt10e9) // if the number exists in the lookup table
        return primeSet.count(i);

    // if it doesn't, iterate through the table
    for (unordered_set<long long>::iterator j = primeSet.begin(); j != primeSet.end(); ++j)
    {
        if (*j * *j <= i && i % (*j) == 0)
        {
            flag = false;
            break;
        }
    }
    return flag;
}

int main()
{
    //long long testCases, a, b, kiwiCount;
    bool primeFlag = true;
    //unordered_set<int> primeNum;
    genrate_prime_set(sqrt10e9);
    cout << primeSet.size() << "\n";
    cout << is_prime(9999991, primeSet);
    return 0;
}
This doesn't strike me as a particularly efficient way to do the job at hand.
Although it probably won't make a big difference in the end, the efficient way to generate all the primes up to some specific limit is clearly to use a sieve--the sieve of Eratosthenes is simple and fast. There are a couple of modifications that can be faster, but for the small size you're dealing with, they're probably not worthwhile.
These normally produce their output in a more effective format than you're currently using as well. In particular, you typically just dedicate one bit to each possible prime (i.e., each odd number) and end up with it zeroed if the number is composite, and one if it's prime (you can, of course, reverse the sense if you prefer).
Since you only need one bit for each odd number from 3 to 31623, this requires only about 16 K bits, or about 2K bytes--a truly minuscule amount of memory by modern standards (especially: little enough to fit in L1 cache quite easily).
Since the bits are stored in order, it's also trivial to test only against the primes up to the square root of the number you're testing, instead of testing against all the numbers in the table (including those greater than the square root of the number you're testing, which is obviously a waste of time). This also optimizes access to the memory in case some of it isn't in the cache (i.e., you can access all the data in order, making life as easy as possible for the hardware prefetcher).
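For illustration, here is a sketch of such an odd-only bit sieve (written in Java; the layout translates directly to a std::bitset or std::vector<bool> in C++, and the 31623 limit is taken from the question):

import java.util.BitSet;

public class OddSieve {
    public static void main(String[] args) {
        int limit = 31623;                        // about sqrt(10^9)
        int bits = (limit - 1) / 2;               // one bit per odd number from 3 to limit
        BitSet composite = new BitSet(bits);      // bit i stands for the odd number 2*i + 3
        for (int i = 0; 2L * i + 3 <= limit; i++) {
            if (composite.get(i)) continue;       // 2*i + 3 is prime
            long p = 2L * i + 3;
            for (long m = p * p; m <= limit; m += 2 * p) {
                composite.set((int) ((m - 3) / 2));
            }
        }
        int count = 1;                            // the prime 2 is handled separately
        for (int i = 0; i < bits; i++) if (!composite.get(i)) count++;
        System.out.println(count + " primes up to " + limit);
    }
}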
If you wanted to optimize further, I'd consider just using the sieve to find all primes up to 10^9+7, and look up inputs. Whether this is a win will depend (heavily) upon the number of queries you can expect to receive. A quick check shows that a simple implementation of the Sieve of Eratosthenes can find all primes up to 10^9 in about 17 seconds. After that, each query is (of course) essentially instantaneous (i.e., the cost of a single memory read). This does require around 120 megabytes of memory for the result of the sieve, which would once have been a major consideration, but (except on fairly limited systems) normally wouldn't be any more.
The very short answer: do research on the subject, starting with the term "Miller-Rabin"
The short answer is no:
Looking for factors of a number is a poor way to check for primality
Exhaustively searching through primes is a poor way to look for factors
Especially if you search through every prime, rather than just the ones less than or equal to the square root of the number
Doing a primality test on each number of them is a poor way to generate a list of primes
Also, you should take in primeSet by reference rather than copy, if it really needs to be a parameter.
Note: testing small primes to see if they divide a number is a useful first step of a primality test, but should generally only be used for the smallest primes before switching to a better method
No, it's not a very good way to determine if a number is prime. Here is pseudocode for a simple primality test that is sufficient for numbers in your range; I'll leave it to you to translate to C++:
function isPrime(n)
    d := 2
    while d * d <= n
        if n % d == 0
            return False
        d := d + 1
    return True
This works by trying every potential divisor up to the square root of the input number n; if no divisor has been found, then the input number cannot be composite, i.e. of the form n = p × q, because at least one of the two divisors p or q would have to be less than or equal to the square root of n while the other is greater than or equal to it.
There are better ways to determine primality; for instance, after initially checking if the number is even (and hence prime only if n = 2), it is only necessary to test odd potential divisors, halving the amount of work necessary. If you have a list of primes up to the square root of n, you can use that list as trial divisors and make the process even faster. And there are other techniques for larger n.
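For illustration, a sketch of that even-check-plus-odd-divisors refinement (written here in Java; translating it to C++ is mechanical):

static boolean isPrime(long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;           // among even numbers, only 2 is prime
    for (long d = 3; d * d <= n; d += 2) {   // odd trial divisors up to sqrt(n)
        if (n % d == 0) return false;
    }
    return true;
}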
But that should be enough to get you started. When you are ready for more, come back here and ask more questions.
I can only suggest a way to use a library function in Java to check the primality of a number. As for the other questions, I do not have any answers.
java.math.BigInteger.isProbablePrime(int certainty) returns true if this BigInteger is probably prime, and false if it's definitely composite. If certainty is ≤ 0, true is returned. You could try using it by rewriting your code in Java.
Parameters
certainty - a measure of the uncertainty that the caller is willing to tolerate: if the call returns true the probability that this BigInteger is prime exceeds (1 - 1/2^certainty). The execution time of this method is proportional to the value of this parameter.
Return Value
This method returns true if this BigInteger is probably prime, false if it's definitely composite.
Example
The following example shows the usage of math.BigInteger.isProbablePrime() method
import java.math.*;

public class BigIntegerDemo {
    public static void main(String[] args) {
        // create 3 BigInteger objects
        BigInteger bi1, bi2, bi3;

        // create 3 Boolean objects
        Boolean b1, b2, b3;

        // assign values to bi1, bi2
        bi1 = new BigInteger("7");
        bi2 = new BigInteger("9");

        // perform isProbablePrime on bi1, bi2
        b1 = bi1.isProbablePrime(1);
        b2 = bi2.isProbablePrime(1);
        b3 = bi2.isProbablePrime(-1);

        String str1 = bi1 + " is prime with certainty 1 is " + b1;
        String str2 = bi2 + " is prime with certainty 1 is " + b2;
        String str3 = bi2 + " is prime with certainty -1 is " + b3;

        // print b1, b2, b3 values
        System.out.println(str1);
        System.out.println(str2);
        System.out.println(str3);
    }
}
Output
7 is prime with certainty 1 is true
9 is prime with certainty 1 is false
9 is prime with certainty -1 is true
I am having issues with understanding dynamic programming solutions to various problems, specifically the coin change problem:
"Given a value N, if we want to make change for N cents, and we have infinite supply of each of S = { S1, S2, .. , Sm} valued coins, how many ways can we make the change? The order of coins doesn’t matter.
For example, for N = 4 and S = {1,2,3}, there are four solutions: {1,1,1,1},{1,1,2},{2,2},{1,3}. So output should be 4. For N = 10 and S = {2, 5, 3, 6}, there are five solutions: {2,2,2,2,2}, {2,2,3,3}, {2,2,6}, {2,3,5} and {5,5}. So the output should be 5."
There is another variation of this problem where the solution is the minimum number of coins to satisfy the amount.
These problems appear very similar, but the solutions are very different.
Number of possible ways to make change: the optimal substructure for this is DP(m,n) = DP(m-1, n) + DP(m, n-Sm) where DP is the number of solutions for all coins up to the mth coin and amount=n.
Minimum amount of coins: the optimal substructure for this is
DP[i] = Min{ DP[i-d1], DP[i-d2],...DP[i-dn] } + 1 where i is the total amount and d1..dn represent each coin denomination.
Why is it that the first one requires a 2-D array and the second a 1-D array? Why is the optimal substructure for the number of ways to make change not "DP[i] = DP[i-d1] + DP[i-d2] + ... + DP[i-dn]", where DP[i] is the number of ways amount i can be obtained with the coins? It sounds logical to me, but it produces an incorrect answer. Why is that second dimension for the coins needed in this problem, but not in the minimum-coins problem?
LINKS TO PROBLEMS:
http://comproguide.blogspot.com/2013/12/minimum-coin-change-problem.html
http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/
Thanks in advance. Every website I go to only explains how the solution works, not why other solutions do not work.
Let's first talk about the number of ways, DP(m, n) = DP(m-1, n) + DP(m, n-Sm). This is indeed correct, because you can either use the m-th denomination or avoid it. Now you ask why we don't write it as DP[i] = DP[i-d1] + DP[i-d2] + ... + DP[i-dn]. Well, this leads to overcounting. Let's take an example where n=4, m=2 and S={1,3}. According to your formula, dp[4] = dp[3] + dp[1] (assuming dp[1] = 1 as a base case). Now dp[3] = dp[2] + dp[0] (again dp[0] = 1 by the base case). Applying the same rule again, dp[2] = dp[1] = 1. Thus in total you get the answer 3 when it is supposed to be just 2: (1,3) and (1,1,1,1). This is because
your second method treats (1,3) and (3,1) as two different solutions. Your second method can be applied to the case where order matters, which is also a standard problem.
Now for your second question: you say that the minimum number of coins can
be found with DP[i] = Min{ DP[i-d1], DP[i-d2], ..., DP[i-dn] } + 1. This is correct, because when finding the minimum number of coins, order does not matter. Why is this a linear / 1-D DP? Well, although the DP array is 1-D, each state depends on up to m other states, unlike your first solution, where the array is 2-D but each state depends on at most 2 states. So in both cases the running time, which is (number of states * number of states each state depends on), is the same, namely O(nm). So both are correct; your second solution just saves memory. You can therefore find the minimum either with the 1-D array method or in 2-D by using the recurrence
dp(m, n) = min(dp(m-1, n), 1 + dp(m, n-Sm)). (Just use min instead of sum in your first recurrence.)
Hope this clears up the doubts; do post if something is still unclear.
This is a very good explanation of the coin change problem using Dynamic Programming.
The code is as follows:
public static int change(int amount, int[] coins) {
    int[] combinations = new int[amount + 1];
    combinations[0] = 1;

    for (int coin : coins) {
        for (int i = 1; i < combinations.length; i++) {
            if (i >= coin) {
                combinations[i] += combinations[i - coin];
                //printAmount(combinations);
            }
        }
        //System.out.println();
    }

    return combinations[amount];
}
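For the minimum-coins variation discussed earlier in the thread, here is a sketch of the 1-D recurrence DP[i] = min over coins of DP[i - coin] + 1 (illustrative only, not taken from the linked explanation):

public static int minCoins(int amount, int[] coins) {
    int[] dp = new int[amount + 1];
    java.util.Arrays.fill(dp, Integer.MAX_VALUE);   // MAX_VALUE marks "not reachable"
    dp[0] = 0;                                      // zero coins make amount 0
    for (int i = 1; i <= amount; i++) {
        for (int coin : coins) {
            if (coin <= i && dp[i - coin] != Integer.MAX_VALUE) {
                dp[i] = Math.min(dp[i], dp[i - coin] + 1);
            }
        }
    }
    return dp[amount];
}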
Consider the problem of calculating the factorial of a number.
When the result is bigger than 2^32, we get an overflow error.
How can we design a program to calculate the factorial of big numbers?
EDIT: assume we are using C++ language.
EDIT2: it is a duplicate question of this one
As a question tagged with just algorithm: your 2^32 limit is not an issue, because an algorithm as such can never have an overflow error. Implementations of an algorithm can and do have overflow errors. So what language are you using?
Most languages have a BigNumber or BigInteger that can be used.
Here's a C++ BigInteger library: https://mattmccutchen.net/bigint/
I suggest that you google for: c++ biginteger
If you can live with approximate values, consider using the Stirling approximation and compute it in double precision.
If you want exact values, you'll need arbitrary-precision arithmetic and a lot of computation time...
Doing this requires you to take one of a few approaches, but basically boils down to:
splitting your number across multiple variables (stored in an array) and
managing your operations across the array.
That way each int/element in the array has a positional magnitude and can be strung together in the end to make your whole number.
A good example here in C: http://www.go4expert.com/forums/c-program-calculate-factorial-t25352/
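A minimal sketch of this digit-array idea (in Java rather than C, for illustration): each array element holds one decimal digit, least significant first, and the carry is propagated by hand.

public class ArrayFactorial {
    public static void main(String[] args) {
        int n = 100;
        int[] digits = new int[200];     // enough room for the 158 digits of 100!
        digits[0] = 1;
        int length = 1;
        for (int f = 2; f <= n; f++) {
            int carry = 0;
            for (int i = 0; i < length; i++) {
                int value = digits[i] * f + carry;
                digits[i] = value % 10;
                carry = value / 10;
            }
            while (carry > 0) {          // append the remaining carry digits
                digits[length++] = carry % 10;
                carry /= 10;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = length - 1; i >= 0; i--) sb.append(digits[i]);
        System.out.println(sb);          // 100! as a decimal string
    }
}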
Test this script:
import gmpy as gm
print gm.fac(3000)
For very big numbers it is difficult to store or print the result.
For some purposes, such as working out the number of combinations, it is sufficient to compute the logarithm of the factorial, because you will be dividing factorials by factorials and the final result is of a more reasonable size - you just subtract logarithms before taking the exponential of the result.
You can compute the logarithm of the factorial by adding logarithms, or by using the gamma function (http://en.wikipedia.org/wiki/Gamma_function), which is often available in mathematical libraries (there are good ways to approximate it).
First invent a way to store and use big numbers. A common way is to interpret an array of integers as the digits of a big number. Then add basic operations to your system, such as multiplication. Then multiply.
Or use an already-made solution. Google for: c++ big integer library
You can use BigInteger to find the factorial of big numbers. Note that long overflows much earlier than you might expect: 21! no longer fits, and because of the accumulating factors of 2 the wrapped result becomes exactly 0 from 66! onward. Please refer to the Java code below; hope it helps:
import java.math.BigInteger;

public class factorial {

    public factorial() {
        // TODO Auto-generated constructor stub
    }

    public static void main(String args[]) {
        factorial f = new factorial();
        System.out.println(f.fact(100));
    }

    public BigInteger fact(int num) {
        BigInteger sum = BigInteger.valueOf(1);
        for (int i = num; i >= 2; i--) {
            sum = sum.multiply(BigInteger.valueOf(i));
        }
        return sum;
    }
}
If you want to improve the range you can handle, you can use logarithms. Logarithms convert your multiplications into additions, making the result much smaller to store.
factorial(n) => n * factorial(n-1)
log(factorial(n)) => log(n) + log(factorial(n-1))
5! = 5*4*3*2*1 = 120
log(5!) = log(5) + log(4) + log(3) + log(2) + log(1) = 2.0791812460476247
In this example, I used base 10 logarithms, but any base works.
10^2.0791812460476247
Or 10^0.0791812460476247*10^2 or 1.2*10^2
Implementation example in javascript
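A comparable sketch in Java of the same idea: sum the logarithms, then split the result into a mantissa and a power of ten.

public class LogFactorial {
    public static void main(String[] args) {
        int n = 5;
        double logFact = 0.0;
        for (int i = 2; i <= n; i++) {
            logFact += Math.log10(i);            // log(n!) = log(2) + log(3) + ... + log(n)
        }
        int exponent = (int) Math.floor(logFact);
        double mantissa = Math.pow(10, logFact - exponent);
        System.out.println(n + "! is about " + mantissa + " * 10^" + exponent);  // about 1.2 * 10^2 for 5!
    }
}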
I would like to generate a random permutation as fast as possible.
The problem: the Knuth shuffle, which is O(n), involves generating n random numbers.
Since generating random numbers is quite expensive,
I would like to find an O(n) algorithm involving a fixed, O(1), number of random numbers.
I realize that this question has been asked before, but I did not see any relevant answers.
Just to stress a point: I am not looking for anything less than O(n), just an algorithm involving less random number generation.
Thanks
Create a 1-1 mapping of each permutation to a number from 1 to n! (n factorial). Generate a random number in 1 to n!, use the mapping, get the permutation.
For the mapping, perhaps this will be useful: http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
Of course, this would get out of hand quickly, as n! can become really large soon.
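For small n (n <= 20, so that n! still fits in a long), here is a sketch of such an unranking via the factorial number system (the method name and structure are illustrative): draw a single random number r uniformly from [0, n!) and map it to a permutation.

static int[] unrankPermutation(int n, long r) {
    java.util.List<Integer> remaining = new java.util.ArrayList<>();
    for (int i = 0; i < n; i++) remaining.add(i);
    int[] perm = new int[n];
    long fact = 1;
    for (int i = 2; i < n; i++) fact *= i;        // (n-1)!
    for (int i = 0; i < n; i++) {
        int idx = (int) (r / fact);               // which of the remaining elements comes next
        perm[i] = remaining.remove(idx);
        r %= fact;
        if (n - 1 - i > 0) fact /= (n - 1 - i);   // (n-1-i)! for the next position
    }
    return perm;
}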
Generating a random number takes a long time, you say? The implementation of Java's Random.nextInt is roughly:
oldseed = seed;
nextseed = (oldseed * multiplier + addend) & mask;
return (int)(nextseed >>> (48 - bits));
Is that too much work to do for each element?
See https://doi.org/10.1145/3009909 for a careful analysis of the number of random bits required to generate a random permutation. (It's open-access, but it's not easy reading! Bottom line: if carefully implemented, all of the usual methods for generating random permutations are efficient in their use of random bits.)
And... if your goal is to generate a random permutation rapidly for large N, I'd suggest you try the MergeShuffle algorithm. An article published in 2015 claimed a factor-of-two speedup over Fisher-Yates in both parallel and sequential implementations, and a significant speedup in sequential computations over the other standard algorithm they tested (Rao-Sandelius).
An implementation of MergeShuffle (and of the usual Fisher-Yates and Rao-Sandelius algorithms) is available at https://github.com/axel-bacher/mergeshuffle. But caveat emptor! The authors are theoreticians, not software engineers. They have published their experimental code to github but aren't maintaining it. Someday, I imagine someone (perhaps you!) will add MergeShuffle to GSL. At present gsl_ran_shuffle() is an implementation of Fisher-Yates, see https://www.gnu.org/software/gsl/doc/html/randist.html?highlight=gsl_ran_shuffle.
Not exactly what you asked, but if the provided random number generator doesn't satisfy you, maybe you should try something different. Generally, pseudorandom number generation can be very simple.
Probably the best-known algorithm:
http://en.wikipedia.org/wiki/Linear_congruential_generator
More
http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators
As other answers suggest, you can make a random integer in the range 0 to N! and use it to produce a shuffle. Although theoretically correct, this won't be faster in general since N! grows fast and you'll spend all your time doing bigint arithmetic.
If you want speed and you don't mind trading off some randomness, you will be much better off using a less good random number generator. A linear congruential generator (see http://en.wikipedia.org/wiki/Linear_congruential_generator) will give you a random number in a few cycles.
Usually there is no need for the full range of the next random value, so to use exactly the same amount of randomness you can use the following approach (which is almost like random(0, N!), I guess):
// ...
m = 1; // range of random buffer (single variant)
r = 0; // random buffer (number zero)
// ...
for (/* ... */) {
    while (m < n) {                  // range of our buffer is too narrow for "n"
        r = r * RAND_MAX + random(); // add another random to our random-buffer
        m *= RAND_MAX;               // update range of random-buffer
    }
    x = r % n;                       // pull-out next random with range "n"
    r /= n;                          // remove it from random-buffer
    m /= n;                          // fix range of random-buffer
    // ...
}
P.S. Of course there will be some errors (bias) related to dividing by values different from 2^n, but they will be distributed among the resulting samples.
Generate N numbers (N smaller than the number of random numbers you need) before doing the computation, or store them in an array as data, with your slow but good random generator; then pick the next number by simply incrementing an index into the array inside your computing loop; if you need different seeds, create multiple tables.
Are you sure that your mathematical and algorithmical approach to the problem is correct?
I hit exactly the same problem, where the Fisher–Yates shuffle would be the bottleneck in corner cases. But for me the real problem is a brute-force algorithm that doesn't scale well to all problems. The following story explains the problem and the optimizations I have come up with so far.
Dealing cards for 4 players
The number of possible deals is a 96-bit number. That puts quite a lot of stress on the random number generator to avoid statistical anomalies when selecting a play plan from the generated sample set of deals. I chose to use 2x mt19937_64 seeded from /dev/random, because of the long period and the heavy advertisement on the web that it is good for scientific simulations.
The simple approach is to use the Fisher–Yates shuffle to generate deals and filter out deals that don't match the already collected information. The Knuth shuffle takes ~1400 CPU cycles per deal, mostly because I have to generate 51 random numbers and swap entries in the table 51 times.
That doesn't matter for normal cases, where I would only need to generate 10000-100000 deals in 7 minutes. But there are extreme cases where the filters may select only a very small subset of hands, requiring a huge number of deals to be generated.
Using single number for multiple cards
When profiling with callgrind (valgrind) I noticed that the main slowdown was the C++ random number generator (after switching away from std::uniform_int_distribution, which was the first bottleneck).
Then I came up with the idea that I could use a single random number for multiple cards. The idea is to use the least significant information from the number first and then erase that information.
int number = uniform_rng(0, 52*51*50*49);
int card1 = number % 52;
number /= 52;
int cards2 = number % 51;
number /= 51;
......
Of course that is only a minor optimization, because generation is still O(N).
Generation using bit permutations
The next idea was exactly the solution asked for here, but I still ended up at O(N), with a larger cost than the original shuffle. But let's look into the solution and why it fails so miserably.
I decided to use the idea from Dealing All the Deals by John Christman.
void Deal::generate()
{
    // 52:26 split, 52!/(26!)^2 = 495,918,532,948,104
    max = 495918532948104LU;
    partner = uniform_rng(eng1, max);
    // 2x 26:13 splits, (26!/(13!)^2)^2 = 10,400,600^2
    max = 10400600LU*10400600LU;
    hands = uniform_rng(eng2, max);
    // Create 104 bit presentation of deal (2 bits per card)
    select_deal(id, partner, hands);
}
So far so good, and it looks pretty nice, but the select_deal implementation is a PITA.
void select_deal(Id &new_id, uint64_t partner, uint64_t hands)
{
    unsigned idx;
    unsigned e, n, ns = 26;

    e = n = 13;

    // Figure out partnership who owns which card
    for (idx = CARDS_IN_SUIT*NUM_SUITS; idx > 0; ) {
        uint64_t cut = ncr(idx - 1, ns);

        if (partner >= cut) {
            partner -= cut;
            // Figure out if N or S holds the card
            ns--;
            cut = ncr(ns, n) * 10400600LU;

            if (hands > cut) {
                hands -= cut;
                n--;
            } else
                new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        } else
            new_id[idx%NUM_SUITS + NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        idx--;
    }

    unsigned ew = 26;
    // Figure out if E or W holds a card
    for (idx = CARDS_IN_SUIT*NUM_SUITS; idx-- > 0; ) {
        if (new_id[idx%NUM_SUITS + NUM_SUITS] & (1 << (idx/NUM_SUITS))) {
            uint64_t cut = ncr(--ew, e);

            if (hands >= cut) {
                hands -= cut;
                e--;
            } else
                new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
        }
    }
}
Now that I had the O(N) permutation solution done to prove the algorithm could work, I started searching for an O(1) mapping from a random number to a bit permutation. Too bad it looks like the only solution would be using huge lookup tables that would kill the CPU caches. That doesn't sound like a good idea for an AI that will be using very large amounts of cache for the double dummy analyzer.
Mathematical solution
After all the hard work to figure out how to generate random bit permutations, I decided to go back to the maths. It is entirely possible to apply the filters before dealing cards. That requires splitting the deals into a manageable number of layered sets and selecting between the sets based on their relative probabilities after filtering out the impossible sets.
I don't yet have the code ready to test how many cycles I'm wasting in the common case where the filter selects a major part of the deals. But I believe this approach gives the most stable generation performance, keeping the cost below 0.1%.
Generate a 32 bit integer. For each index i (maybe only up to half the number of elements in the array), if bit i % 32 is 1, swap i with n - i - 1.
Of course, this might not be random enough for your purposes. You could probably improve it by not swapping with n - i - 1, but rather with another function applied to n and i that gives a better distribution. You could even use two functions: one for when the bit is 0 and another for when it's 1.
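A sketch of that idea (in Java; as noted above, the result is far from a uniform random permutation):

import java.util.Random;

public class OneIntShuffle {
    static void cheapShuffle(int[] a) {
        int bits = new Random().nextInt();        // the single random number
        int n = a.length;
        for (int i = 0; i < n / 2; i++) {
            if (((bits >>> (i % 32)) & 1) == 1) { // test bit i % 32
                int j = n - i - 1;                // the mirror position, as described above
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
            }
        }
    }
}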