Given a list of N non-negative integers, propose an algorithm to check if the sum of X numbers from the list equals the remaining N-X.
In other words, a simpler case of the Subset sum problem which involves the entire set.
An attempted solution
Sort the elements of the list in descending order. Initialize a variable SUM to the first element. Remove first element (largest, a(1)). Let a(n) denote the n-th element in current list.
While list has more than one element,
Make SUM equal to SUM + a(1) or SUM - a(1), whichever is closest to a(2). (where closest means |a(2) - SUM_POSSIBLE| is minimum).
Remove a(1).
If the SUM equals -a(1) or a(1), there exists a linear sum.
The problem
I cannot seem to prove or disprove the algorithm above. If it is correct, I would like a proof.
If it is wrong (more likely), is there a way to get this done in linear time?
PS: If I'm doing something wrong please forgive :S
Notice that you want the sum of x numbers to be equal to the sum of the other N-x numbers.
You can simplify this by saying you want to see if there's a subset which sums up to S/2 where S is the total sum of the whole set.
So, you can calculate the Sum you need to get to with one iteration (O(n)).
Then just use a known algorithm like Knapsack to find a subset that meets your sum.
Another more "mathematical" explanation: Dynamic Programming – 3 : Subset Sum
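A minimal sketch of the S/2 idea, assuming non-negative integers as in the question. The function name `can_split_evenly` is made up for illustration, and the table is the usual pseudo-polynomial subset-sum table rather than any particular library routine:

```python
def can_split_evenly(nums):
    """Return True if nums can be split into two groups with equal sums."""
    total = sum(nums)
    if total % 2:                      # an odd total can never be halved
        return False
    target = total // 2
    # reachable[s] is True when some subset of the items seen so far sums to s
    reachable = [True] + [False] * target
    for x in nums:
        # go downwards so each item is used at most once
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(can_split_evenly([1, 5, 11, 5]))
```

This runs in O(n * S) time, so it is only practical when the total S is not too large.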
Edit:
As an answer to your other question, your algorithm is wrong. Consider this list of numbers:
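{3,3,2,2,2}
The total sum is 12, so you're looking for a subset with the sum of 6. Obviously it will be 3+3 (leaving 2+2+2).
Your algorithm will return false: the two 3's cancel (SUM = 0), the next 2 gives SUM = 2, the one after that gives SUM = 4 or 0, and neither equals the last 2 or -2.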
I have n integers a_1, ..., a_n. I want to pick the minimum number of them whose XORs form all the others.
For example, consider [1,2,3], 1^3=2 so you don't need 2 in the array. So you can remove it. To end up with [1,3]. So the min number of elements is 2 and they can form all the original elements in the array by xoring any 2 of them. Would a greedy approach work here? or DP?
Edit: To explain what I am thinking. A greedy approach I thought about relies on the fact that if a^b=c then a^c=b and b^c=a. First I delete all duplicates. Then I list, for each element, all the pairs it can pair up with to form another element in the array. This preprocessing takes O(n^3). Then I pick the element with the least contribution, delete it, and subtract 1 from each of the other elements. I repeat this until all elements have <=2 pairs, and then I stop. This would also take O(n^3), for a total of O(n^3). Does this greedy approach work? Is there a DP way to do it?
If n is bounded by 50 I think backtracking should work.
Suppose at some step we have already selected a subset S of numbers (that should produce all the others) and want to include a new number to that subset.
Then we can do the following:
Consider all remaining numbers R and include in S all numbers that can't be produced by others (in S and R)
Include in S a random (or "best" in some way) number from R
Remove from R all numbers that can be produced by those in updated S
Also you should keep track of the current best solution and cut off all the branches that won't allow to get a better result.
I have a list of numbers and I have a sum value. For instance,
list = [1, 2, 3, 5, 7, 11, 10, 23, 24, 54, 79 ]
sum = 20
I would like to generate a sequence of numbers taken from that list, such that the sequence sums up to that target. In order to help achieve this, the sequence can be of any length and repetition is allowed.
result = [2, 3, 5, 10] ,or result = [1, 1, 2, 3, 3, 5, 5] ,or result = [10, 10]
I've been doing a lot of research into this problem and have found the subset sum problem to be of interest. My problem is, in a few ways, similar to the subset sum problem in that I would like to find a subset of numbers that produces the targeted sum.
However, unlike the subset sum problem which finds all sets of numbers that sum up to the target (and so runs in exponential time if brute forcing), I only want to find one set of numbers. I want to find the first set that gives me the sum. So, in a certain sense, speed is a factor.
Additionally, I would like there to be some degree of randomness (or pseudo-randomness) to the algorithm. That is, should I run the algorithm using the same list and sum multiple times, I should get a different set of numbers each time.
What would be the best algorithm to achieve this?
Additional Notes:
What I've achieved so far is using a naive method where I cycle through the list adding it to every combination of values. This obviously takes a long time and I'm currently not feeling too happy about it. I'm hoping there is a better way to do this!
If there is no sequence that gives me the exact sum, I'm satisfied with a sequence that gives me a sum that is as close as possible to the targeted sum.
As others said, this is an NP problem.
However, this doesn't mean small improvements aren't possible:
Is 1 in the list? [1,1,1,1...] is the solution. O(1) in a sorted list
Remove list element bigger than the target sum. O(n)
Is there any list element x with (sum%x)==0 ? Again, easy solution: repeat x. O(n)
Are there any list elements x,y with (x%y)==0 ? Remove x. O(n^2)
(maybe even: Are there any list elements x,y,z with (x%y)==z ? Remove x. Or with (x+y)==z ? Remove z. O(n^3))
Before using the full recursion, check whether you can get the sum just with the smallest even and smallest odd number.
...
Subset Sum problem isn't about finding all subsets, but rather about determining if there is some subset. It is a decision problem. All problems in NP are like this. And even this simpler problem is NP-complete.
This means that if you want an exact answer (the subset must sum to exactly some value), you won't be able to do much better than any other subset sum algorithm (it is exponential unless P=NP).
I would attempt to reduce the problem to a brute-force search of a smaller set.
Sort the list smallest to largest.
Keep a sum and result list.
Repeat {
Draw randomly from the subset of list no greater than target - sum.
Increment sum by drawn value, add drawn value to result list.
} until list[0] > target - sum or sum == target
If sum != target, brute force search for small combinations from list that match the difference between target and small combinations of result.
This approach may fail to find valid solutions, even if they exist. It can, however, quickly find a solution or quickly fail before having to resort to a slower brute force approach using the entire set at a greater depth.
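The random-draw phase described above could look roughly like the sketch below. It assumes positive integers, the names `random_draw` and `rng` are hypothetical, and the brute-force repair step is left out, so a non-zero `remaining` simply means this phase alone did not hit the target:

```python
import random


def random_draw(numbers, target, rng=random):
    """Randomly draw values (repetition allowed) that still fit under the target.

    Returns (result, remaining); remaining == 0 means an exact hit.
    """
    remaining = target
    result = []
    while remaining > 0:
        # only values that do not overshoot what is left
        candidates = [x for x in numbers if 0 < x <= remaining]
        if not candidates:
            break                      # stuck: nothing fits any more
        pick = rng.choice(candidates)
        result.append(pick)
        remaining -= pick
    return result, remaining


numbers = [1, 2, 3, 5, 7, 11, 10, 23, 24, 54, 79]
print(random_draw(numbers, 20))
```

Because the example list contains 1, this phase always lands exactly on the target there; without a 1 it can get stuck, and calling it again is a cheap way to retry before falling back to the slower search.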
This is a greedy approach to the problem:
Without 'randomness':
Obtain the single largest number in the set that is smaller than your desired sum; we'll name it X. Given that the set is ordered, this is O(1) at best and O(N) at worst (if the sum is 2).
Since you can repeat the value, say c times, repeat it as many times as it takes to get closest to the sum, but be careful! Create a range of values: essentially you'll now be finding another sum! You'll now be finding numbers that add up to R = (sum - X * c). So find the largest number smaller than R. Check if R - (number you just found) = 0 or if any [R - (number you just found)] % (smaller #s) == 0.
If R > 0 remains, make partial sums of the smaller numbers less than R (this will not be more than 5 ~ 10 computations because of the nature of this algorithm). See if these would then satisfy it.
If that step makes R < 0, remove one X and start the process again.
With 'randomness':
Just get X randomly! :-)
Note: This would work best if you have a few single digit numbers.
In a recent campus Facebook interview I was asked to divide an array into 3 parts such that the sum of each part is roughly equal to sum/3.
My approach:
1. Sort the array.
2. Fill array[k] (k=0) while (array[k]<=sum/3).
3. After that, increment k and repeat the above step for array[k].
Is there a better algorithm for this, or is it an NP-hard problem?
This is a variant of the partition problem (see http://en.wikipedia.org/wiki/Partition_problem for details). In fact a solution to this can solve that one (take an array, pad with 0s, and then solve this problem) so this problem is NP hard.
There is a dynamic programming approach that is pseudo-polynomial. For each i from 0 to the size of the array, you keep track of all possible combinations of current sizes for the sub arrays, and their current sums. As long as there is a limited number of possible sums of subsets of the array, this runs acceptably fast.
The solution that I would have suggested is to just go for "good enough" closeness. First let's consider the simpler problem with all values positive. Then sort by value descending. Take that array in threes. Build up the three subsets by always adding the largest of the triple to the one with the smallest sum, the smallest to the one with the largest, and the middle to the middle. You will end up dividing the array evenly, and the difference will be no more than the value of the third smallest element.
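For the all-positive case, the "take the array in threes" heuristic might be sketched like this. The function name `split_in_three` is made up, and a final incomplete triple is handled by giving its values to the currently lightest groups, which is an assumption the text does not spell out:

```python
def split_in_three(values):
    """Greedy 3-way split of positive values, taking them three at a time."""
    groups = [[], [], []]
    ordered = sorted(values, reverse=True)
    for i in range(0, len(ordered), 3):
        triple = ordered[i:i + 3]            # largest value first
        by_sum = sorted(groups, key=sum)     # lightest group first
        # largest value -> lightest group, smallest value -> heaviest group
        for group, value in zip(by_sum, triple):
            group.append(value)
    return groups


groups = split_in_three([9, 8, 7, 6, 5, 4, 3, 2, 1])
print([sum(g) for g in groups])
```

It makes no attempt to find the best split, only a reasonably balanced one in O(n log n) time.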
For the general case you can divide into positive and negative, use the above approach on each, and then brute force all combinations of a group of positives, a group of negatives, and the few leftover values in the middle that did not divide evenly.
Here are details on a dynamic programming solution if you are interested. The running time and memory usage is O(n*(sum)^2) where n is the size of your array and sum is the sum of absolute values of your array values. For each array index j from 1 to n, store all the possible values you can get for your 3 subset sums when you split the array from index 1 to j into 3 subsets. Also for each possibility, store one possible way to split the array to get the 3 sums. Then to extend this information for 1 to (j+1) given the information from 1 to j, simply take each possible combination of 3 sums for splitting 1 to j and form the 3 combinations of 3 sums you get when you choose to add the (j+1)th array element to any one of the 3 subsets. Finally, when you are done and reach j = n, go through the set of all combinations of 3 subset sums you can get when you split array positions 1 to n into 3 sets, and choose the one whose maximum deviation from sum/3 is minimized. At first this may seem like O(n*(sum)^3) complexity, but for each j and each combination of the first 2 subset sums, the 3rd subset sum is uniquely determined. (because you are not allowed to omit any elements of the array). Thus the complexity really is O(n*(sum)^2).
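A compact sketch of that table, assuming non-negative integers to keep it short. Only the first two subset sums are stored, since the third is implied; the name `best_three_way_split` is hypothetical, and recovering the actual split (which the description above also stores) is omitted:

```python
def best_three_way_split(values):
    """Return the three subset sums whose worst deviation from total/3 is smallest."""
    total = sum(values)
    states = {(0, 0)}    # (sum of subset 1, sum of subset 2); subset 3 is implied
    for v in values:
        extended = set()
        for s1, s2 in states:
            extended.add((s1 + v, s2))   # v goes to subset 1
            extended.add((s1, s2 + v))   # v goes to subset 2
            extended.add((s1, s2))       # v goes to subset 3
        states = extended

    def worst_deviation(state):
        s1, s2 = state
        s3 = total - s1 - s2
        # compare 3*s with total to stay in integers
        return max(abs(3 * s - total) for s in (s1, s2, s3))

    s1, s2 = min(states, key=worst_deviation)
    return s1, s2, total - s1 - s2


print(best_three_way_split([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

The set of states never holds more than (sum+1)^2 pairs, which is where the O(n*(sum)^2) bound comes from.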
If I have a list L of positive integers and I am given another number K, I need to find the number in the list with which XOR of K is maximum.
I have thought of an algorithm for this. I want to verify its correctness with counter arguments. My algorithm is:
Find P=K's complement (1's complement). Now find the number which is closest to P in the list L. Let this number be N. The XOR of K and N will be maximum.
Closest number to a number I in a given set of numbers is a number whose difference with I is minimum.
Lets say, it is not correct for the numbers greater than P in the list L. But isn't it correct for the numbers <=P ?
Please tell me whether I am correct or not by providing counter arguments/suggestions/ideas.
I think you need something called a Trie.
Consider every bit of K, from highest to lowest. We can be greedy when determining whether each bit of the answer can be 1: first try your best to get 1024 (or even higher), then 512, then 256, and so on down to the last bit, 1.
So first you need to check whether some number in the list L has the opposite value to K in the highest bit. Then, among those chosen numbers, you need to find the ones which have the opposite value to K in the second highest bit, and so on (if no candidate has the opposite bit, keep all candidates and move on to the next bit).
Now the solution is obvious: build a Trie with L and determine the answer's bits from highest to lowest, which corresponds to walking down that tree.
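A small sketch of the trie idea, using nested dicts as trie nodes. It assumes L is non-empty and that K and all numbers are non-negative and fit in `bits` bits; the name `max_xor_partner` and the `bits` parameter are made up for this example:

```python
def max_xor_partner(numbers, k, bits=32):
    """Return the element of numbers whose XOR with k is largest."""
    # build a binary trie: each level is one bit, highest bit first
    root = {}
    for n in numbers:
        node = root
        for b in range(bits - 1, -1, -1):
            node = node.setdefault((n >> b) & 1, {})

    # walk down, preferring the bit opposite to k's bit at every level
    best = 0
    node = root
    for b in range(bits - 1, -1, -1):
        want = 1 - ((k >> b) & 1)
        bit = want if want in node else 1 - want
        best = (best << 1) | bit
        node = node[bit]
    return best


n = max_xor_partner([1, 2, 3], 2)
print(n, n ^ 2)
```

Building the trie costs O(len(L) * bits), and each query afterwards costs O(bits), so it pays off when many values of K are asked against the same list.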
No, not right.
Let K = 0011, so that P = 1100. Let L = {1000, 1011}. Your algorithm would choose N = 1011 (the closest to P), which is incorrect since N^K = 1000 = 8, while 1000^K = 1011 = 11.
Coding and running the obvious brute-force algorithm would have taken far less time than you've already spent on this.
I'm re-reading Skiena's Algorithm Design Manual to catch up on some stuff I've forgotten since school, and I'm a little baffled by his descriptions of Dynamic Programming. I've looked it up on Wikipedia and various other sites, and while the descriptions all make sense, I'm having trouble figuring out specific problems myself. Currently, I'm working on problem 3-5 from the Skiena book. (Given an array of n real numbers, find the maximum sum in any contiguous subvector of the input.) I have an O(n^2) solution, such as described in this answer. But I'm stuck on the O(N) solution using dynamic programming. It's not clear to me what the recurrence relation should be.
I see that the subsequences form a set of sums, like so:
S = {a,b,c,d}
a a+b a+b+c a+b+c+d
b b+c b+c+d
c c+d
d
What I don't get is how to pick which one is the greatest in linear time. I've tried doing things like keeping track of the greatest sum so far, and if the current value is positive, add it to the sum. But when you have larger sequences, this becomes problematic because there may be stretches of negative numbers that would decrease the sum, but a later large positive number may bring it back to being the maximum.
I'm also reminded of summed area tables. You can calculate all the sums using only the cumulative sums: a, a+b, a+b+c, a+b+c+d, etc. (For example, if you need b+c, it's just (a+b+c) - (a).) But don't see an O(N) way to get it.
Can anyone explain to me what the O(N) dynamic programming solution is for this particular problem? I feel like I almost get it, but that I'm missing something.
You should take a look at this PDF from http://castle.eiu.edu; here it is:
The explanation of the following pseudocode is also in the PDF.
There is a solution like, first sort the array in to some auxiliary memory, then apply Longest Common Sub-Sequence method to the original array and the sorted array, with sum(not the length) of common sub-sequence in the 2 arrays as the entry into the table (Memoization). This can also solve the problem
Total running time is O(nlogn)+O(n^2) => O(n^2)
Space is O(n) + O(n^2) => O(n^2)
This is not a good solution when memory comes into picture. This is just to give a glimpse on how problems can be reduced to one another.
My understanding of DP is about "making a table". In fact, the original meaning "programming" in DP is simply about making tables.
The key is to figure out what to put in the table, or modern terms: what state to track, or what's the vertex key/value in DAG (ignore these terms if they sound strange to you).
How about choosing the dp[i] table entry to be the largest sum ending at index i of the array? For example, take the array [5, 15, -30, 10].
The second important key is "optimal substructure", that is to "assume" dp[i-1] already stores the largest sum for sub-sequences ending at index i-1, that's why the only step at i is to decide whether to include a[i] into the sub-sequence or not
dp[i] = max(dp[i-1], dp[i-1] + a[i])
The first term in max is to "not include a[i]", the second term is to "include a[i]". Notice, if we don't include a[i], the largest sum so far remains dp[i-1], which comes from the "optimal substructure" argument.
So the whole program looks like this (in Python):
a = [5,15,-30,10]
dp = [0]*len(a)
dp[0] = max(0,a[0]) # include a[0] or not
for i in range(1,len(a)):
    dp[i] = max(dp[i-1], dp[i-1]+a[i]) # for sub-sequence, choose to add or not
print(dp, max(dp))
The result: largest sum of sub-sequence should be the largest item in dp table, after i iterate through the array a. But take a close look at dp, it holds all the information.
Since it only goes through items in array a once, it's a O(n) algorithm.
This problem seems silly, because as long as a[i] is positive, we should always include it in the sub-sequence, because it will only increase the sum. This intuition matches the code
dp[i] = max(dp[i-1], dp[i-1] + a[i])
So the max. sum of sub-sequence problem is easy, and doesn't need DP at all. Simply,
total = 0  # named total to avoid shadowing the built-in sum
for v in a:
    if v > 0:
        total += v
However, what about largest sum of "continuous sub-array" problem. All we need to change is just a single line of code
dp[i] = max(dp[i-1]+a[i], a[i])
The first term is to "include a[i] in the continuous sub-array", the second term is to decide to start a new sub-array, starting a[i].
In this case, dp[i] is the max. sum continuous sub-array ending with index-i.
This is certainly better than the naive O(n^2)*O(n) approach of adding a for j in range(0,i): loop inside the i-loop and summing every possible sub-array.
One small caveat: because of the way dp[0] is set, if all items in a are negative, we won't select any. So for the max sum continuous sub-array, we change that to
dp[0] = a[0]
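Putting the changed recurrence and the changed dp[0] together, the continuous sub-array version can be sketched as below. Since dp[i] only depends on dp[i-1], this sketch keeps one running value instead of the whole table; the function name is made up and the array is assumed to be non-empty:

```python
def max_subarray_sum(a):
    """Largest sum of a non-empty continuous sub-array of a."""
    best_ending_here = best = a[0]          # plays the role of dp[0] = a[0]
    for v in a[1:]:
        # extend the sub-array ending at the previous index, or start fresh at v
        best_ending_here = max(best_ending_here + v, v)
        best = max(best, best_ending_here)
    return best


print(max_subarray_sum([5, 15, -30, 10]))   # 20, from the sub-array [5, 15]
```

It is still a single pass, so O(n) time, and now O(1) extra space.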