Calculating limits in dynamic programming - c++11

I found this question on topcoder:
Your friend Lucas gave you a sequence S of positive integers.
For a while, you two played a simple game with S: Lucas would pick a number, and you had to select some elements of S such that the sum of all numbers you selected is the number chosen by Lucas. For example, if S={2,1,2,7} and Lucas chose the number 11, you would answer that 2+2+7 = 11.
Lucas now wants to trick you by choosing a number X such that there will be no valid answer. For example, if S={2,1,2,7}, it is not possible to select elements of S that sum up to 6.
You are given the int[] S. Find the smallest positive integer X that cannot be obtained as the sum of some (possibly all) elements of S.
Constraints: - S will contain between 1 and 20 elements, inclusive. - Each element of S will be between 1 and 100,000, inclusive.
But in the editorial solution it has been written:
How about finding the smallest impossible sum? Well, we can try the following naive algorithm: First try with x = 1, if this is not a valid sum (found using the methods in the previous section), then we can return x, else we increment x and try again, and again until we find the smallest number that is not a valid sum.
Let's find an upper bound for the number of iterations, the number of values of x we will need to try before we find a result. First of all, the maximum sum possible in this problem is 100000 * 20 (All numbers are the maximum 100000), this means that 100000 * 20 + 1 will not be a possible value. We can be certain to need at most 2000001 steps.
How good is this upper bound? If we had 100000 in each of the 20 numbers, 1 wouldn't be a possible sum. So we actually need one iteration in that case. If we want 1 to be a possible sum, we should have 1 in the initial elements. Then we need a 2 (Else we would only need 2 iterations), then a 4 (3 can be found by adding 1+2), then 8 (Numbers from 5 to 7 can be found by adding some of the first 3 powers of two), then 16, 32, .... It turns out that with the powers of 2, we can easily make inputs that require many iterations. With the first 17 powers of two, we can cover up to the first 262143 integer numbers. That should be a good estimation for the largest number. (We cannot use 2^18 in the input, smaller than 100000).
Up to 262143 times, we need to query if a number x is in the set of possible sums. We can just use a boolean array here. It appears that even O(log(n)) data structures should be fast enough, however.
I did understand the first paragraph. But after that they explain something under "How good is this upper bound?...", and I couldn't understand that paragraph. How did they deduce that we need to query up to 262143 times whether a number x is in the set of possible sums?
I am a newbie at dynamic programming and so it would be great if somebody could explain this to me.
Thank you.

The idea is as follows:
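If the input sequence contains the first k powers of two, 2^0, 2^1, ..., 2^(k-1), then the sum can be any integer between 0 and (2^k) - 1. The editorial's figure, 2^18 - 1 = 262,143, is what the 18 powers 2^0 through 2^17 would give. Note, however, that 2^17 = 131,072 exceeds the limit of 100,000, so the greatest power of two that can actually appear in the sequence is 2^16 = 65,536, and the 17 powers 2^0 through 2^16 cover only the sums 0 to 2^17 - 1 = 131,071. If a power of two were missing, there would be a smaller sum that could not be achieved.
The editorial also overlooks that the sequence may hold up to 20 numbers, so three more are available. An extra number a extends a gap-free range 0..R to 0..R+a as long as a <= R + 1, so three copies of 100,000 extend the range to 131,071 + 300,000 = 431,071. Hence, the maximum number to check is actually 431,072, which is still far below the naive bound of 2,000,001.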
You may wonder why we use powers of two and not any other powers. The reason is the binary selection that we perform on the numbers in the input sequence. Either we add a number to the sum or we don't. So, if we represent this selection for number ni as a selection variable si (either 0 or 1), then the possible sum is:
s = s0 * n0 + s1 * n1 + s2 * n2 + ...
Now, if we choose the ni to be powers of two ni = 2^i, then:
s = s0 * 2^0 + s1 * 2^1 + s2 * 2^2 + ...
= sum si * 2^i
This is equivalent to the binary representations of numbers (see Positional Notation). By definition, different choices for the selection variables will produce different sums. Hence, the number of possible sums is maximal by choosing powers of two in the input sequence.
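The boolean-array lookup mentioned in the editorial can be sketched as follows. This is only a minimal illustration, not the editorial's actual code; the function name `smallestImpossibleSum` is made up for this example, and it assumes the input respects the stated constraints (at most 20 elements, each at most 100,000).

```cpp
#include <vector>

// Returns the smallest positive integer that is NOT a subset sum of s.
// Assumption: s is small enough that a table of size sum(s) + 2 fits in memory.
long long smallestImpossibleSum(const std::vector<int>& s) {
    long long total = 0;
    for (int v : s) total += v;

    // possible[x] is true when some subset of s sums to x.
    // Index total + 1 stays false, so the final scan always terminates.
    std::vector<bool> possible(total + 2, false);
    possible[0] = true;  // the empty subset

    for (int v : s) {
        // Iterate downwards so that each element is used at most once.
        for (long long x = total; x >= v; --x) {
            if (possible[x - v]) possible[x] = true;
        }
    }

    // The naive loop from the editorial: try x = 1, 2, 3, ...
    for (long long x = 1;; ++x) {
        if (!possible[x]) return x;
    }
}
```

For S = {2, 1, 2, 7} this returns 6, matching the problem statement. Feeding it the 17 powers of two from 1 to 65536 followed by three copies of 100000 returns 431072.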

Related

Algorithm for obtaining a sum with minimum number of terms

The problem statement is following:
Given N, we need to find x1, x2, ..., xp such that N = x1 + x2 + ... + xp, where p (the number of terms in the sum) must be minimal, and we must also be able to obtain every number from 1 to N-1 as the sum of a subset of (x1, x2, ..., xp). Numbers in the set may be repeated.
For example if N=7.
7 = 1+2+4
And 6= (2,4) , 5= (4,1), 4 = (4),3=(1,2) and so on.
Example 2:
8 = 1+2+4+1
Example 3:(invalid)
8 = 1+2+5
But we can't get 4 from the subset of (1,2,5).So (1,2,5) is not a valid combination
My approach is: if 'N-1' can be written as a sum of p terms, then 'N' has either p or p+1 terms. But that approach requires checking all possible combinations that sum to "N-1" and have "p" terms. Does anyone have a better solution than this?
Solution:
Step1:
Assume that our answer is a set with "K" entries. We can obtain at most 2^K different sums from these numbers, because each entry either appears or does not appear in the sum. One of those sums is the empty sum 0, and if the number is "N", we need to produce every sum from '1' to 'N'. Therefore 2^K - 1 >= N, which gives K = ceil(log2(N+1)).
Step2:
After step 1, we know that our answer must include "K" entries, but what are these entries actually? Assume that our entries are (a1,a2,a3...ak). So number P can be written as
P = a1*b1 + a2*b2 + a3*b3....+ ak*bk. Where all b[i] = 0 or 1. Here, we can see P as a decimal representation of binary number (b1 b2 b3 bk), therefore we can take a[i] = 2^(i-1).
You should take all the numbers 1, 2, 4, ..., 2^k, where k is the largest exponent with 1 + 2 + ... + 2^k <= N, plus N - (1 + ... + 2^k). (Take the last one only if it is not equal to 0.)
Proof
First of all, if we only have k numbers, we can get at most 2^k - 1 different sums other than 0. So if N >= 2^k, we need at least k + 1 numbers. This shows that if our group of numbers is valid, it is also minimal in size (or one of the minimal groups).
It's easy to see that we can get any number from 0 to 2^(k+1) - 1 using the first k + 1 numbers 1, 2, ..., 2^k. What if we need more? We take the last number, which is less than 2^(k+1), and make up the difference using the first elements.
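The construction described above (powers of two, then the remainder) might be sketched like this. The function name `minimalTerms` is invented for the illustration, and the sketch assumes N is positive and small enough that the doubling does not overflow a `long long`.

```cpp
#include <vector>

// Builds the terms 1, 2, 4, ... while they still fit into n,
// then appends whatever remainder is left (if it is non-zero).
std::vector<long long> minimalTerms(long long n) {
    std::vector<long long> terms;
    long long power = 1;    // next power of two to try
    long long covered = 0;  // sum of the terms chosen so far

    while (covered + power <= n) {
        terms.push_back(power);
        covered += power;
        power *= 2;
    }
    // The remainder is at most `covered`, so every sum up to n stays reachable.
    if (covered < n) terms.push_back(n - covered);
    return terms;
}
```

For N = 7 this yields (1, 2, 4), and for N = 8 it yields (1, 2, 4, 1), the same decompositions given in the question.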
I haven't run out the numbers on this, but you should be very very interested in the fact that you have listed the first three powers of two.
If I were looking for a better solution, that's where I'd start.

Is this possible? Last few digits of sum equal to another number

I have a n-digit number and a list of numbers, from which any number can be used any number of times.
Taking numbers from the list, how do I know whether it is possible to generate a sum such that the last n digits of the sum are the n-digit number?
Note: The sum has some initial value, its not zero.
EDIT - If a solution exists, I need to find the minimum count of numbers added to get a number whose last 4 digits are the given number. That could be easily solved with DP (minimum coin change problem).
For example, if n=4,
Given number = 1212
Initial value = 5234
List = [1023, 101, 1]
A solution exists: 21212 = 5234 + 1023*15 + 101*6 + 1*27
It's easy to find a counterexample (see comments).
Now, for the solution here's a dynamic programming approach:
All arithmetic is modulo 10^n. For each value in the range 0 to 10^n - 1 you need a flag recording whether it has been found, and you need a queue for the elements still to be processed.
1. Push the initial value onto the queue.
2. Take an element from the queue. If the queue is empty, you are finished: there is no solution.
3. Try adding each number from the list separately to this element. If the result was already found, there is nothing to do. If the result is the target, you are finished: there is a solution. Otherwise, mark it as found and push it onto the queue.
4. Go to step 2.
An actual solution can be reconstructed if you store how you reached a number. You just have to walk back from sum till you hit the initial value.
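The queue-based search above is a breadth-first search over residues modulo 10^n, so it also yields the minimum number of additions. A minimal sketch, assuming n is small enough for a table of 10^n entries; the name `minAdditions` and its parameter order are made up for this example.

```cpp
#include <queue>
#include <vector>

// Returns the minimum number of list elements that must be added to `initial`
// so that the last n digits equal `target`, or -1 if that is impossible.
int minAdditions(int n, long long initial, long long target,
                 const std::vector<long long>& numbers) {
    long long mod = 1;
    for (int i = 0; i < n; ++i) mod *= 10;

    // dist[r] = fewest additions needed to reach residue r, -1 if not found yet.
    std::vector<int> dist(mod, -1);
    std::queue<long long> pending;

    long long start = initial % mod;
    target %= mod;
    dist[start] = 0;
    pending.push(start);

    while (!pending.empty()) {
        long long cur = pending.front();
        pending.pop();
        if (cur == target) return dist[cur];  // returns 0 if already matching

        for (long long x : numbers) {
            long long next = (cur + x % mod) % mod;
            if (dist[next] == -1) {  // first visit is the shortest route
                dist[next] = dist[cur] + 1;
                pending.push(next);
            }
        }
    }
    return -1;  // queue exhausted without reaching the target
}
```

Storing the predecessor of each residue alongside `dist` would allow the actual sequence of additions to be walked back, as the answer describes.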
If the greatest common factor of the numbers in the list is a unit modulo 10^n (that is, not divisible by 2 or 5) you can solve the problem for any choice of the other given values: use the extended Euclid's algorithm to find a linear combination of the list that sums to the gcf, find the multiplicative inverse of the gcf modulo 10^n and multiply by the difference between the given and the initial values.
If the gcf of the numbers in the list is divisible by 2 or 5 (that is, is not a unit) and the difference between the given and the initial value is also divisible by 2 or 5, divide the numbers in the list and the difference by the largest powers of 2 and 5 that divide them all. If the gcf you end up with is a unit there is a solution and you can find it with the procedure above. Otherwise there is no solution.
For example, given 16 and initial value for the sum 5, and list of numbers [3].
The gcf of the numbers in the list is 3 which is a unit. Its inverse modulo 100 is 67 (3×67 = 201).
Multiplying by the difference between the given number and the initial value 16-5 = 11 we get the factor 67*11 = 737 for 3. Since we're working modulo 100 that's the same as 37.
Checking the result: 5 + 37×3 = 116, which ends in 16. Yep, that works.
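The inverse used in this example can be computed with the extended Euclidean algorithm. The sketch below is one common way to write it; the names `extGcd` and `modInverse` are chosen for this illustration and do not come from the answer.

```cpp
// Extended Euclid: returns g = gcd(a, b) and sets x, y so that a*x + b*y == g.
long long extGcd(long long a, long long b, long long& x, long long& y) {
    if (b == 0) {
        x = 1;
        y = 0;
        return a;
    }
    long long x1 = 0, y1 = 0;
    long long g = extGcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Returns the inverse of a modulo m in the range [0, m), or -1 if a is not a unit.
long long modInverse(long long a, long long m) {
    long long x = 0, y = 0;
    long long g = extGcd(a, m, x, y);
    if (g != 1) return -1;     // a shares a factor with m, so no inverse exists
    return ((x % m) + m) % m;  // normalise a possibly negative coefficient
}
```

With these, `modInverse(3, 100)` gives 67, and `(67 * 11) % 100` gives the factor 37 used above.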

Finding the missing number - from Programming Pearls 2nd Edition

From the book Programming Pearls 2nd Edition, quoting the question A from Column 2, section 2.1
Given a sequential file that contains at most four billion integers in
random order, find a 32-bit integer that isn't in the file (and there
must be at least one missing - why ?). How would you solve this
problem with ample quantities of main memory ? How would you solve it
if you could use several external "scratch" files but only a few
hundred bytes of main memory ?
The problem statement says "at most" four billion integers. So one valid input could be in the range 100 - 299 with one missing number. If this understanding of the problem statement is correct then the required inputs for this problem are the file containing the numbers and also the range of the numbers in the file, ie: i to n.
For this problem isn't the following O(n) solution more intuitive than the one given in the book (from Ed Reingold) ? Or am I missing something ?
Assume the given range is i...n
using the formula (n * (n + 1) / 2)
x = the sum of numbers from 1 to i-1
y = the sum of numbers from 1 to n
walk through the input and get a sum of the numbers (value z)
missing number = (y - x - z)
You are missing something:
The numbers are not said to be unique; the summation approach assumes unique elements and only one missing number in the range.
You don't know the range of the numbers. A 32-bit integer can take many more than 4 billion values (to be exact, 32 bits can represent 294967296 more numbers than 4 billion).
Look at the simplified counterexample with range = [1,5], array = [5,5,5,4,1].
In this case, you will get x = 0, y = 15, z = 20.
However, y - x - z = 15 - 0 - 20 = -5, which is not even in the range, while the numbers actually missing are 2 and 3.
You can use radix sort, which runs in O(n) in this case (because 32 bits is constant), and then scan the sorted array to find the first missing element.
EDIT: A more efficient way is a variation of radix sort combined with the selection algorithm:
1. Let your current number of bits be k. Look at the first bit, and partition the array into two parts, the first with this bit unset and the second with this bit set.
2. At least one of the parts must contain fewer than (2^k) / 2 = 2^(k-1) elements.
3. Reduce the problem to that sub-array only, use k' = k-1 (reducing the current number of bits), and return to step 1.
4. Keep doing this until you have exhausted your bits, and you will get a number that is not in the original list.
Note that assuming the list is random enough, the complexity of the algorithm is O(n) - for any number of bits (and not O(n*k))

Greatest GCD between some numbers

We've got some nonnegative numbers and want to find the pair with the maximum gcd. Actually, this maximum is more important than the pair!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean Algorithm to find the GCD of two numbers.
int gcd(int a, int b)
{
    // Replace (a, b) by (b, a mod b) until the remainder is zero.
    while (b != 0)
    {
        int m = a % b;
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, indexes 1 to the max input. O(1)
For each value, increment the count of every element of the index which is a factor of the number (make sure you don't wraparound). O(N).
Starting at the end of the array, scan back until you find a value >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
4 2 1 1 2 0 0 0 0 0 0 0 0 0 1
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
The optimisations I can think of are:
1) start with the two biggest numbers since they are likely to have most prime factors and thus likely to have the most shared prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20 3^13 5^9 7^2*11^4 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g the numbers in the array are within a given range, say 1-1e7, it is doable in O(NlogN) / O(MAX * logMAX), where MAX is the maximum possible value in A.
This is inspired by the sieve algorithm; I came across it in a Hackerrank challenge, where it is done for two arrays. Check their editorial.
find min(A) and max(A) - O(N)
create a binary mask, to mark which elements of A appear in the given range, for O(1) lookup; O(N) to build; O(MAX_RANGE) storage.
for every number a in the range [1, max(A)] (the gcd of a pair can be smaller than min(A)):
for aa = a; aa <= max(A); aa += a:
if aa in A, increment a counter for a; once the counter reaches 2 (i.e., you have two numbers divisible by a), compare a to the current max_gcd;
store top two candidates for each GCD candidate.
could also ignore elements which are less than current max_gcd;
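A compact version of this sieve-style idea is sketched below. It is not the Hackerrank editorial's code: the name `maxPairGcd` is made up, a count array replaces the binary mask so that repeated values count as a pair, and the sketch assumes all inputs are positive with a maximum small enough to allocate an array of that size.

```cpp
#include <algorithm>
#include <vector>

// Returns the largest g such that at least two elements of a are multiples of g,
// which is the maximum gcd over all pairs. Returns 0 if a has fewer than two elements.
// Assumption: all elements are positive and max(a) is of manageable size.
int maxPairGcd(const std::vector<int>& a) {
    if (a.size() < 2) return 0;
    int maxVal = *std::max_element(a.begin(), a.end());

    // count[v] = how many times v occurs in the input.
    std::vector<int> count(maxVal + 1, 0);
    for (int v : a) ++count[v];

    // Try candidates from the largest down; the first with two multiples wins.
    for (int g = maxVal; g >= 1; --g) {
        int multiples = 0;
        for (int m = g; m <= maxVal; m += g) multiples += count[m];
        if (multiples >= 2) return g;
    }
    return 0;  // unreachable for valid input, since g = 1 divides everything
}
```

The inner loops together take about MAX * (1 + 1/2 + 1/3 + ...) steps, which is where the O(MAX * log MAX) bound comes from. For the example input 2 4 5 15 this returns 5.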
Previous answer:
Still O(N^2) -- sort the array; should eliminate some of the unnecessary comparisons;
max_gcd = 1
# assuming you want pairs of distinct elements.
sort(a) # assume in place
for ii = n - 1 : -1 : 0 do
    if a[ii] <= max_gcd
        break
    for jj = ii - 1 : -1 : 0 do
        if a[jj] <= max_gcd
            break
        current_gcd = GCD(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.
There is a solution that would take O(n):
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and if two other numbers also contain this factor, p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out gcd(m/a_i, a_i), i!=3 which are less than or equal to 5 (in the example above, this filters out all others).
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains all gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results will divide g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
pseudocode
function getGcdMax(array[])
    arrayUB = upperbound(array)
    if (arrayUB < 1)
        error
    pointerA = 0
    pointerB = 1
    gcdMax = 0
    do
        gcdMax = MAX(gcdMax, gcd(array[pointerA], array[pointerB]))
        pointerB++
        if (pointerB > arrayUB)
            pointerA++
            pointerB = pointerA + 1
    until (pointerB > arrayUB)
    return gcdMax

Number of sequences over {0,1} such that sequence contains at least half ones

How to calculate number of sequences over {0,1} such that each sequence contains at least half ones?
The total number of sequences of length n is 2^n. If n is odd, exactly the half of them (2^(n-1)) have at least half ones.
For even n, you have to take into account that there are n!/((n/2)!^2) sequences with exactly half ones. So in this case I think you have in total
(1/2)*(2^n + n!/((n/2)!^2)).
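The two cases above can be turned into a short function. This is a sketch under assumptions: the names `centralBinomial` and `countAtLeastHalfOnes` are invented here, and n is assumed to be between 0 and about 60 so that every intermediate value fits in 64 bits.

```cpp
#include <cstdint>

// Computes C(n, n/2) = n! / ((n/2)!^2) for even n without forming factorials.
std::uint64_t centralBinomial(int n) {
    std::uint64_t result = 1;
    int k = n / 2;
    for (int i = 0; i < k; ++i) {
        // After this step result equals C(n, i + 1), so the division is exact.
        result = result * static_cast<std::uint64_t>(n - i) /
                 static_cast<std::uint64_t>(i + 1);
    }
    return result;
}

// Number of 0/1 sequences of length n in which at least half the entries are 1.
std::uint64_t countAtLeastHalfOnes(int n) {
    std::uint64_t total = std::uint64_t{1} << n;  // 2^n sequences overall
    if (n % 2 == 1) return total / 2;             // odd n: exactly half qualify
    return (total + centralBinomial(n)) / 2;      // even n: add the balanced ones
}
```

For n = 4 this gives (16 + 6) / 2 = 11, and for n = 3 it gives 4.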
Suppose the total length of the sequence is n; then the number of sequences that contain exactly n/2 ones is:
n!/((n/2)!^2)
EDIT:
Sorry, I made a mistake. I meant n!/((n/2)!^2) but not n!/(2*(n/2)!). I considered it as combination problems and used following formulas. (substitute k with n/2)
Edit: Dang! We (I) should always read the problem carefully!
The following deals with enumerating the number of sequences where the number of 0s and 1s is equal! The actual problem is that the number of zeros should be less or equal to that of 1s !!!
Pierr's formula, n!/(2*(n/2)!) is almost correct, it is actually, n!/((n/2)! * (n/2)!)
but this could use a bit of explanation (pun intended ;-) ).
n being the total length, we know that n has to be even, since the problem requires an equal number of 0s and of 1s.
Let's focus on placing the 0s. For a sequence of length n, we have n/2 zero-bits, to put in one of the n positions of the sequence. We only need to count for the zero-bits, since after that there will be no choices left with regards to the one-bits: all other positions will require a 1 in them.
So... n/2 zero-bits, for n positions... There are n ways to pick the first position, then (n-1) ways to pick the second position (two bits couldn't occupy the same position), etc.
This number of choices is therefore
n! / (n/2)!
for example, for n=6, we have
6 * 5 * 4 choices,
which, by multiplying and dividing by (3*2*1) is equivalent to
= 6 * 5 * 4 * (3 * 2 * 1) / (3 * 2 * 1)
= 6! / 3!
= n! / (n/2)! (a)
Now... some of these choices [of where to put first bit, second bit etc,] result in the same combination, because all zero-bits are the same, and therefore whether one put say the "first" bit in position x and the "2nd" bit in position y, or the first into y and the second into x, we'd have the same combination. There are (n/2)! ways of arranging these n/2 bits. In the example of n = 6, there are 3 ways of picking the position for the "first" bit, 2 ways for the 2nd bit and 1 (i.e. no choice) for the last zero-bit. The complete formula then needs to be (a) divided by (n/2)!, i.e:
n! / (n/2)! * 1/(n/2)!
= n! / ((n/2)! * (n/2)!)

Resources