Anticipate factorial overflow - algorithm

I'm wondering: how could I anticipate whether the next iteration will generate an integer overflow while calculating the factorial F?
Let's say that at each iteration I have an int I and the maximum value is MAX_INT.
It sounds like homework, I know. It's not. It's just me asking myself "stupid" questions.
Addendum
I thought that, given a number of bits BITS (the width an integer can take, in bits), I could round the number I up to the next power of two and detect whether a shift to the left would exceed BITS. But what would that look like, algorithmically?

Alternative hint:
a * b ≤ MAX_INT
is equivalent to
a ≤ MAX_INT / b
if b > 0.

Factorials are a series of multiplications, and the number of bits needed to hold the result of a multiplication is at most the sum of the bits of the two multiplicands. So keep a running total of how many bits your result uses, plus the number of bits needed to hold the value you are about to multiply in. When that sum is greater than the number of bits you have, you're about to overflow.
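A minimal sketch of that bit-counting check (BITS is an assumed width; Python's own integers don't overflow, so this only simulates a fixed-width type):
BITS = 32  # assumed integer width for this illustration

def multiply_would_overflow(m, n):
    # m * n needs at most m.bit_length() + n.bit_length() bits
    # (and at least that sum minus one), so the check errs on the safe
    # side: it may fire one iteration early, but never late
    return m.bit_length() + n.bit_length() > BITS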

If you've so far got m = (n-1)! and you're about to multiply by n, you can guard against overflow by checking that
m <= MAX_INT / n
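As a sketch, that guard drops straight into the factorial loop (MAX_INT here simulates a 32-bit signed limit, since Python integers don't overflow):
MAX_INT = 2**31 - 1  # simulated 32-bit signed limit

def largest_safe_factorial(limit=MAX_INT):
    # m * (n + 1) <= limit is exactly m <= limit // (n + 1) for positive
    # integers, so the guard never forms the product that might overflow
    m, n = 1, 1
    while m <= limit // (n + 1):
        n += 1
        m *= n
    return n, m

print(largest_safe_factorial())  # (12, 479001600): multiplying by 13 would overflow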

You can probably use Stirling's Approximation formula which says that
ln (n!) = n*ln(n) - n + ln(2*pi*n)/2 + O(1/n)
and will be quite accurate.
You don't actually need to go about trying to multiply, etc. Of course, this does not directly answer what you asked, but given that you are just curious, I hope it helps.
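A sketch of that idea: compare Stirling's estimate of ln(n!) against ln(MAX_INT) before doing any multiplication at all (the 32-bit signed limit is my assumption for illustration):
import math

MAX_INT = 2**31 - 1  # assumed 32-bit signed limit

def ln_factorial(n):
    # Stirling's approximation to ln(n!), from the formula above
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

def factorial_fits(n):
    return ln_factorial(n) <= math.log(MAX_INT)

print(factorial_fits(12), factorial_fits(13))  # True False: 13! overflows 32 bits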

Related

Hashing with the Division Method - Choosing number of slots?

So, in CLRS, there's this quote
A prime not too close to an exact power of 2 is often a good choice for m.
Several Questions...
I understand how a power of 2 will just use the lower-order bits of your key... however, say your keys come from a universe of 1 to 1 million, with each key equally likely (which I'm guessing is a common assumption about your universe if you're given no other data). Wouldn't taking, say, the 4 lower-order bits give 2^4 bit patterns that are pretty much equally likely across the keys from 1 to 1 million? How am I thinking about this incorrectly?
Why a prime number? If powers of 2 aren't a good idea, why is a prime number a better choice than a composite number close to a power of 2? (And why should it be close to a power of 2 at all... lol)
You are trying to find a hash table that works well for typical input data, and typical input data does things that you wouldn't expect from good random number generators. Very often you get formatted or semi-formatted strings which, when converted to numbers, end up as K, K+A, K+2A, K+3A,.... for some integers K and A. If K+xA and K+yA hash to the same number mod m, then (x-y)A must be 0 mod m. If m is prime, this can only happen if A = 0 mod m or if x = y mod m, so one time in m. But if m=pq and A happens to be divisible by p, then you get a collision every time x-y is divisible by q, which is more often since q < m.
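A quick demo of that effect (a toy example of mine, not from the answer): keys in an arithmetic progression K + xA pile into a single bucket when the table size m shares a factor with A, but spread out when m is prime:
def bucket_counts(m, K=7, A=12, nkeys=10000):
    # hash the keys K, K+A, K+2A, ... into m slots by "key mod m"
    counts = [0] * m
    for x in range(nkeys):
        counts[(K + x * A) % m] += 1
    return counts

print(bucket_counts(12))  # m = 12 shares a factor with A = 12: one slot gets everything
print(bucket_counts(13))  # m = 13 is prime: the keys spread almost perfectly evenly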
I guess close to a power of 2 because it might be convenient for the memory management system to have blocks of memory of the resulting size - I really don't know. If you really care, and if you have the time, you could try different primes with some representative data and see which of them are best in practice.

2^n mod (m) algorithm

In class, we were presented with an algorithm for 2^n mod(m).
to find 2^n mod m {
    if n = 0 { return 1 }
    r = 2^(n-1) mod m
    if 2r < m  { return 2r }
    if 2r >= m { return 2r - m }
}
We were told that the runtime is O(n*size(m)) where size of m is the number of bits in m.
I understand the n part, but I cannot explain the size(m) unless it is because of the subtraction involved. Can anyone shed some light on that?
Thanks in advance.
The n part is clear, as you have already understood yourself. The size(m) part (size(m) is the number of digits in m, which is basically log(m)) is because of the mod. Even though your CPU does that for you in one instruction, internally it takes on the order of size(m) (say, 32) steps. If m is very large, as is common with encryption keys, this can become considerable.
Why number of digits in m? Remember division:
abcdefghijk | xyz
            |--------
alm         | nrvd...
 opq
  stu
  wabc
   .......
The number of times you do the minus is at most the number of digits in the dividend.
I believe this is used in cryptography (a so-called noninvertible function).
If we need to compute (2**n) mod m recursively, this would be the most obvious way to do it. Since the depth of recursion is n, the O(n) complexity is obvious.
However, if we would like to support arbitrary sizes of m (512-bit keys are possible in cryptography, and are much larger than any arithmetic register), we should also account for the cost of the arithmetic on size(m)-bit numbers (in most cases we don't have to use arbitrary-precision arithmetic, so this term is usually 1).
EDIT @Mysticial: The function does not call the hardware mod operation explicitly; all it does is shift and subtraction. A shift is always O(1), while addition/subtraction is O(ceil(size(m) / ALU_width)).
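For reference, a direct Python transcription of the class algorithm (a sketch; the recursion depth is n, so it is only practical for modest n):
def pow2_mod(n, m):
    # 2**n mod m via doubling and subtraction only, as in the algorithm above;
    # the "1 % m" covers the degenerate case m = 1
    if n == 0:
        return 1 % m
    r = pow2_mod(n - 1, m)
    return 2 * r - m if 2 * r >= m else 2 * r

print(pow2_mod(10, 1000))  # 24, since 2**10 = 1024 and 1024 mod 1000 = 24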

integer nth root

x' is the integer nth root of y if x' is the largest integer such that x'^n <= y. Here x', n and y are all integers. Is there any efficient way to compute such an nth root? I know this is usually done by an nth root algorithm, but the difficulty here is that everything is an integer, because I'm working on an embedded system.
BTW, I've even tried a binary search from 1 to y to identify the largest x such that x^n <= y, but it does not work, since x^n overflows easily, especially when n is large.
Store a table giving, for each exponent n, the maximum x such that x^n does not overflow. Use that value as the upper bound of your binary search; that way, no more overflow, and a neat algorithm that will work as long as x and n have the same (integer) type. Right?
Note: for n > 32, the maximum x is 1 for 32-bit integers... in other words, your table needs roughly as many entries as there are bits in the integers your system understands.
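A sketch of that idea for n >= 2 (Python integers don't overflow, so the bound computed here only mirrors what the precomputed table would hold on a 32-bit system):
def iroot_bsearch(y, n, bits=32):
    # largest x whose nth power still fits in `bits` bits; on the embedded
    # target this bound would come out of the precomputed table instead
    hi = 1
    while (hi + 1) ** n < (1 << bits):
        hi += 1
    # binary search for the largest x with x**n <= y (y assumed to fit in
    # `bits` bits); mid**n can never overflow because mid <= hi
    lo = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** n <= y:
            lo = mid
        else:
            hi = mid - 1
    return lo

print(iroot_bsearch(34, 5))  # 2, since 2**5 = 32 <= 34 < 3**5 = 243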
Are you looking for integer roots only? Or do you want to know that the 5th root of 34 is 2.024...? Or is "2" a sufficient answer? If you want the decimal places, you'll have to do some kind of floating point or fixed point math.
You should read Computing principal roots, and note what it says about the first Newton approximation. If an error of about 0.03% is close enough, I'd suggest you go with this. You'd probably want to build a table that you can use for the initial approximations. This table isn't as large as it sounds: the cube root of 2^32 is only about 1,626. You can easily compute the squares, and it's easy to generate x^n if you can generate x^2 and x^3. So doing the approximations is pretty easy.
Another possibility is to build yourself a table of roots and use some kind of interpolation. Again, that table wouldn't have to be very large if you treat the square root as a special case. The 5th root of 2^32 is less than 100, so you're talking a pretty small table to get a pretty large range of roots.
I think the best method is to use the Newton-Raphson method from the Wikipedia article.
A good starting value can be computed from the bit length of the input divided by n. In each iteration you use integer division that rounds down. Iterate until you have found a value x such that x^n <= y < (x+1)^n.
You have to be careful to avoid overflow. As the other answer says, you can use a table of the maximal root for n < bit size to do that (for greater n the answer is always 1, except for y = 0).
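A minimal all-integer sketch of that iteration; the bit-length starting value is always at least the true root, so the sequence decreases until it settles:
def iroot_newton(y, n):
    if y == 0:
        return 0
    x = 1 << -(-y.bit_length() // n)   # 2**ceil(bitlen(y)/n), always >= the root
    while True:
        t = ((n - 1) * x + y // x ** (n - 1)) // n   # one Newton step, rounding down
        if t >= x:
            break
        x = t
    return x   # now x**n <= y < (x+1)**n

print(iroot_newton(34, 5))  # 2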

Optimized algorithm for converting a decimal to a "pretty" fraction

Rather than converting an arbitrary decimal to an exact fraction (something like 323527/4362363), I am trying to convert to just common easily-discernible (in terms of human-readability) quantities like 1/2, 1/4, 1/8 etc.
Other than using a series of if-then / less-than-or-equal comparisons, are there more optimized techniques to do this?
Edit: In my particular case, approximations are acceptable. The idea is that 0.251243 ~ 0.25 = 1/4 - in my use case that's "good enough", with the latter preferable for human readability as a quick indicator (not used for calculation, just for display).
Look up "continued fraction approximation". Wikipedia has a basic introduction in its "continued fraction" article, but there are optimized algorithms that generate the approximated value while generating the fraction.
Then pick some stopping heuristic, a combination of size of denominator and closeness of approximation, for when you're "close enough".
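A sketch of the standard convergent recurrence (p_k = a_k*p_{k-1} + p_{k-2}, and likewise for q), stopping as soon as the denominator passes a chosen bound:
def pretty_fraction(x, max_den=10):
    # walk the continued-fraction convergents of x and return the last one
    # whose denominator stays within max_den; assumes x >= 0
    p0, q0, p1, q1 = 0, 1, 1, 0
    while True:
        a = int(x)                      # floor, since x >= 0
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        if q1 > max_den:
            return p0, q0               # last convergent that still fit
        if x == a:                      # the expansion terminated exactly
            return p1, q1
        x = 1 / (x - a)

print(pretty_fraction(0.251243))  # (1, 4)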
You can use the Euclidean algorithm to get the greatest common divisor of the numerator and denominator, and divide them both by it.
In the following, I'm going to assume that our decimals fall in between 0 and 1. It should be straightforward to adapt this to larger numbers and negative numbers.
Probably the easiest thing to do would be to choose the largest denominator you find acceptable, and then create a list of all fractions between 0 and 1 whose denominators are less than or equal to it. Be sure to skip any fractions which can be simplified: obviously, once you've listed 1/2, you don't need 2/4. You can detect reducible fractions by checking that the GCD of the numerator and denominator is 1, using Euclid's algorithm. Once you have your list, evaluate each entry as a floating point number (probably a double, but the exact data type depends on your programming language). Then insert them into a balanced binary search tree, storing both the original fraction and its floating point value. You only need to do this once to set things up, so the n*log(n) time (where n is the number of fractions) isn't very much.
Then, whenever you get a number, simply search the tree for the value closest to it. Note that this is slightly more complicated than searching for an exact match, because the node you're looking for may not be a leaf node. So, as you traverse the tree, keep a record of the closest-valued node you have visited. Once you reach a leaf node, compare it against that record; whichever is closest, its fraction is your answer.
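A compact Python sketch of the same idea, with a sorted list and bisect standing in for the balanced BST (MAX_DEN is an assumed bound):
import bisect
from fractions import Fraction

MAX_DEN = 8  # assumed largest acceptable denominator

# all reduced fractions in [0, 1] with denominator <= MAX_DEN; Fraction
# reduces automatically, and the set drops duplicates such as 2/4 = 1/2
pretty = sorted({Fraction(num, den)
                 for den in range(1, MAX_DEN + 1)
                 for num in range(den + 1)})
values = [float(f) for f in pretty]

def nearest_pretty(x):
    # bisect finds the insertion point; the nearest fraction is one of
    # its two neighbours
    i = bisect.bisect_left(values, x)
    candidates = pretty[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda f: abs(float(f) - x))

print(nearest_pretty(0.251243))  # 1/4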
Here is a suggestion, assuming your starting fraction is p/q:
Calculate r = p/q as a real (floating point) value (e.g. r = float(p)/float(q))
Calculate the rounded decimal x = int(10000*r)
Calculate the GCD (greatest common divisor) of x and 10000: s = GCD(x, 10000)
Represent the result as m / n, where m = x/s and n = 10000/s (your example computes to 371 / 5000)
Normally, denominators that divide 10000 are fairly human-readable.
This might not give the best result when the value is closer to a simpler fraction such as 1/3. However, I personally find 379/1000 much more human-readable than 47/62 (which is the shortest fractional representation). You can fine-tune the process with a few exceptions, though (e.g. calculating p/GCD(p,q) and q/GCD(p,q) first, and accepting that if one of them is a single-digit value, before falling back to this method).
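The steps above, sketched in Python (the function name is mine; math.gcd plays the role of step 3):
from math import gcd

def round_to_scale(p, q, scale=10000):
    x = int(scale * p / q)        # steps 1-2: the rounded decimal
    s = gcd(x, scale)             # step 3
    return x // s, scale // s     # step 4: m / n in lowest terms

print(round_to_scale(251243, 1000000))  # 0.251243 -> (157, 625)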
Pretty dumb solution, just for "previewing" the fraction:
from fractions import Fraction

def preview(decimal):
    # assumes 0 < decimal < 1
    factor = 1 / decimal
    result = Fraction(1, round(factor))
    mult = 1
    while result == 1:
        mult *= 10
        result = Fraction(mult, round(mult * factor))
    return result   # Fraction reduces by the GCD automatically
good luck!

Programming problem - Game of Blocks

Maybe you have an idea of how to solve the following problem.
John decided to buy his son Johnny some mathematical toys. One of his favorite toys is blocks of different colors. John has decided to buy blocks of C different colors. For each color he will buy a googol (10^100) blocks. All blocks of the same color are of the same length, but blocks of different colors may vary in length.
Johnny has decided to use these blocks to make a large 1 x N block. He wonders how many ways he can do this. Two ways are considered different if there is a position where the color differs. The example shows a red block of size 5, a blue block of size 3 and a green block of size 3; it shows there are 12 ways of making a large block of length 11.
Each test case starts with an integer 1 ≤ C ≤ 100. The next line consists of C integers; the ith integer, 1 ≤ len_i ≤ 750, denotes the length of the ith color. The next line is a positive integer N ≤ 10^15.
This problem should be solved within 20 seconds for T ≤ 25 test cases. The answer should be calculated mod 100000007 (a prime number).
It can be reduced to a matrix exponentiation problem, which can be solved relatively efficiently in O(k^2.376 * log N) (where k = max(len_i)) using the Coppersmith-Winograd algorithm and fast exponentiation. But it seems that a more efficient algorithm is required, as Coppersmith-Winograd implies a large constant factor. Do you have any other ideas? It could possibly be a number theory or divide-and-conquer problem.
Firstly note the number of blocks of each colour you have is a complete red herring, since 10^100 > N always. So the number of blocks of each colour is practically infinite.
Now notice that at each position p of a valid configuration (one that leaves no gaps, etc.) there must be a block of some color c, and there are len[c] ways for that block to lie so that it still covers position p.
My idea is to try all possible colors and offsets of a block at a fixed position (N/2, since that halves the range); for each case there are b cells before this fixed colored block and a cells after it. So if we define a function ways(i) that returns the number of ways to tile i cells (with ways(0) = 1), then the number of ways to tile n cells with a fixed colored block at a given position is ways(b)*ways(a). Adding up all possible configurations yields the answer for ways(n).
Now, I chose the fixed position to be N/2 since that halves the range, and you can halve a range at most ceil(log(N)) times. Since you are moving a block around N/2, you will have to calculate the counts for lengths from about N/2-750 to N/2+750, where 750 is the maximum length a block can have. So you will have to calculate about 750*ceil(log(N)) (a bit more because of the variance) lengths to get the final answer.
So, in order to get good performance, you have to throw in memoisation, since this is inherently a recursive algorithm.
So, using Python (since I was lazy and didn't want to write a big-number class):
T = int(raw_input())
for case in xrange(T):
    # read in the data
    C = int(raw_input())
    lengths = map(int, raw_input().split())
    minlength = min(lengths)
    n = int(raw_input())
    # set up memoisation; note all lengths less than the minimum length are
    # set to 0, as the algorithm needs this
    memoise = {}
    memoise[0] = 1
    for length in xrange(1, minlength):
        memoise[length] = 0
    def solve(n):
        global memoise
        if n in memoise:
            return memoise[n]
        ans = 0
        for i in xrange(C):
            if lengths[i] > n:
                continue
            if lengths[i] == n:
                ans += 1
                ans %= 100000007
                continue
            for j in xrange(0, lengths[i]):
                b = n/2 - lengths[i] + j
                a = n - (n/2 + j)
                if b < 0 or a < 0:
                    continue
                ans += solve(b)*solve(a)
                ans %= 100000007
        memoise[n] = ans
        return memoise[n]
    solve(n)
    print "Case %d: %d" % (case+1, memoise[n])
Note I haven't exhaustively tested this, but I'm quite sure it will meet the 20 second time limit, if you translated this algorithm to C++ or somesuch.
EDIT: Running a test with N = 10^15 and a block of length 750, I get that memoise contains about 60000 elements, which means the non-lookup part of solve(n) is called about the same number of times.
A word of caution: in the case C=2, len1=1, len2=2, the answer will be the Nth Fibonacci number, and the Fibonacci numbers grow (approximately) exponentially with a growth factor of the golden ratio, phi ~ 1.61803399. For the huge value N=10^15, the answer will be about phi^(10^15), an enormous number. The answer will have storage requirements on the order of (ln(phi^(10^15))/ln(2)) / (8 * 2^40) ~ 79 terabytes. Since you can't even access 79 terabytes in 20 seconds, it's unlikely you can meet the speed requirements in this special case.
Your best hope occurs when C is not too large and len_i is large for all i. In such cases, the answer will still grow exponentially with N, but the growth factor may be much smaller.
I recommend that you first construct the integer matrix M which computes the terms (i+1, ..., i+k) of your sequence from the terms (i, ..., i+k-1); only the last row of this matrix is interesting. Compute the first k entries "by hand", then calculate M^(10^15) with the repeated-squaring trick and apply it to the terms (0, ..., k-1).
The (integer) entries of the matrix will grow exponentially, perhaps too fast to handle. If this is the case, do the very same calculation, but modulo p, for several moderate-sized prime numbers p. This will allow you to obtain your answer modulo p, for various p, without using a matrix of bigints. After using enough primes that you know their product is larger than your answer, you can use the so-called "Chinese remainder theorem" to recover your answer from your mod-p answers.
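A sketch of that companion-matrix approach in Python, with schoolbook O(k^3) multiplication standing in for Coppersmith-Winograd and the arithmetic done directly mod 100000007 rather than via several primes and the CRT (function names are mine):
MOD = 100000007

def mat_mult(A, B, mod=MOD):
    # schoolbook k x k matrix product modulo mod
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % mod
             for j in range(k)] for i in range(k)]

def mat_pow(M, e, mod=MOD):
    # repeated squaring: M**e modulo mod
    k = len(M)
    R = [[int(i == j) for j in range(k)] for i in range(k)]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, M, mod)
        M = mat_mult(M, M, mod)
        e >>= 1
    return R

def count_tilings(lengths, N, mod=MOD):
    # recurrence: ways(i) = sum over colors c of ways(i - len_c), ways(0) = 1
    k = max(lengths)
    M = [[0] * k for _ in range(k)]
    for i in range(k - 1):
        M[i][i + 1] = 1          # shift: the state vector is (ways(i-k+1..i))
    for L in lengths:
        M[k - 1][k - L] += 1     # last row holds the recurrence coefficients
    ways = [1] + [0] * (k - 1)   # the first k terms, computed directly
    for i in range(1, k):
        ways[i] = sum(ways[i - L] for L in lengths if L <= i) % mod
    if N < k:
        return ways[N]
    R = mat_pow(M, N - k + 1, mod)
    return sum(R[k - 1][j] * ways[j] for j in range(k)) % mod

print(count_tilings([1, 2], 11))  # 144, the Fibonacci-style count for N = 11
With k up to 750, one schoolbook multiplication is already about 4*10^8 operations, times roughly 50 squarings for N = 10^15, so a compiled language (and a faster multiplication) is what would make this competitive.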
I'd like to build on the earlier @JPvdMerwe solution with some improvements. In his answer, @JPvdMerwe uses a dynamic programming / memoisation approach, which I agree is the way to go on this problem. Dividing the problem recursively into two smaller problems and remembering previously computed results is quite efficient.
I'd like to suggest several improvements that would speed things up even further:
Instead of going over all the ways the block in the middle can be positioned, you only need to go over the first half and multiply the solution by 2, because the second half of the cases is symmetrical. For odd-length blocks you would still need to treat the centered position as a separate case.
In general, iterative implementations can be several orders of magnitude faster than recursive ones, because a recursive implementation incurs bookkeeping overhead for each function call. It can be a challenge to convert a solution to its iterative cousin, but it is usually possible. The @JPvdMerwe solution can be made iterative by using a stack to store intermediate values.
Modulo operations are expensive, as are multiplications to a lesser extent. The number of multiplications and modulos can be decreased by approximately a factor of C=100 by swapping the color loop with the position loop. This allows you to add up the return values of several calls to solve() before doing a single multiplication and modulo.
A good way to test the performance of a solution is with a pathological case. The following could be especially daunting: length 10^15, C=100, prime block sizes.
Hope this helps.
In the above answer,
    ans += 1
    ans %= 100000007
could be much faster without the general modulo:
    ans += 1
    if ans == 100000007: ans = 0
Please see TopCoder thread for a solution. No one was close enough to find the answer in this thread.
