Given a set of n integers, list all possible subsets with sum>=k - algorithm

Given an unsorted set of integers in the form of an array, find all possible subsets whose sum is greater than or equal to a constant integer k.
e.g. our set is {1,2,3} and k = 2
Possible subsets:
{2},
{3},
{1,2},
{1,3},
{2,3},
{1,2,3}
I can only think of a naive algorithm that lists all subsets of the set and checks whether each subset's sum is >= k, but that is exponential: listing all subsets alone requires O(2^N). Can I use dynamic programming to solve it in polynomial time?

Listing all the subsets is still going to be O(2^N), because in the worst case you may have to list every subset apart from the empty one.
Dynamic programming can, however, help you count the number of subsets that have sum >= K.
You go bottom-up, keeping track of how many subsets sum to each value in the range [1..K-1]. An approach like this is O(N*K), which is only going to be feasible for small K.
The idea behind the dynamic programming solution is best illustrated with an example. Assume that, among all subsets built from the first i elements, you know that t1 sum to 2 and t2 sum to 3, and that element i+1 has value 4. From each existing subset we can build a new one by either appending element i+1 or leaving it out. Leaving it out gives the same t1 subsets summing to 2 and t2 subsets summing to 3. Appending it gives t1 subsets summing to 6 (2 + 4), t2 summing to 7 (3 + 4), and one new subset containing just element i+1, which sums to 4. That gives the counts of subsets of the first i+1 elements summing to 2, 3, 4, 6 and 7. We continue until N.
In pseudo-code this could look something like this:
int DP[N][K];  //all entries start at 0
int set[N];
//go through all elements in the set by index
for i in range [0..N-1]
    //count the one-element subset consisting only of set[i]
    //(sums >= K are not tracked, hence the guard)
    if set[i] < K
        DP[i][set[i]] = 1
    if i == 0 then continue
    //case 1: build and count all subsets that don't contain element set[i]
    for k in range [1..K-1]
        DP[i][k] += DP[i-1][k]
    //case 2: build and count subsets that do contain element set[i]
    for k in range [1..K-1]
        if k + set[i] >= K then break //inner loop
        DP[i][k+set[i]] += DP[i-1][k]
//result is the number of all subsets minus the number of subsets with sum < K;
//the -1 is for the empty subset
return 2^N - sum(DP[N-1][1..K-1]) - 1
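The same counting idea can be written as a short runnable Python sketch (a hypothetical helper name; it assumes all elements are positive integers and K >= 1, and tracks only sums below K):

```python
def count_subsets_with_sum_at_least(nums, K):
    # dp[s] = number of subsets of the elements seen so far that sum to
    # exactly s, for s in [0, K-1]; sums >= K are deliberately not tracked.
    dp = [0] * K
    dp[0] = 1  # the empty subset
    for x in nums:
        # iterate sums in descending order so each element is used at most once
        for s in range(K - 1, -1, -1):
            if dp[s] and s + x < K:
                dp[s + x] += dp[s]
    # subsets with sum >= K = all subsets - subsets with sum < K
    return 2 ** len(nums) - sum(dp)
```

For the example above, `count_subsets_with_sum_at_least([1, 2, 3], 2)` gives 6, matching the six subsets listed.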

Can I use dynamic programming to solve it in polynomial time?
No. The problem is even harder than #amit (in the comments) mentions. Finding whether there exists a subset that sums to a specific k is the subset-sum problem, which is NP-hard. Instead you are asking how many subsets sum to a specific k, which is in the much more difficult class #P. In addition, your exact problem is slightly harder still, since you want to not only count but enumerate all the subsets for k and for every target greater than k.

If k is 0 and every element of the set is positive, then you have no choice but to output every possible subset, so the lower bound for this problem is O(2^N) -- the time taken to produce the output.
Unless you know something more about the value of k that you haven't told us, there's no faster general solution than to just check every subset.

Related

The largest number of subsets for a set

Given a set of X numbers, each less than or equal to Y, which may contain repeated numbers:
which algorithm gives you the maximum number of subsets whose elements sum to at least Y, where no element of one subset may be contained in another and no subset may repeat an element?
(Note: if two numbers are repeated in the initial set, each occurrence counts as a distinct element.)
Subsets can group elements into pairs, triples, quadruples or any other size.
Using two for-loops to search for the best combination worked for pairs, but since triples and larger groups are possible, cases like "1 1 1 1 1 7 8" end up suboptimal.
You could implement a 'brute force' method and go through every possible partitioning and check if it satisfies your requirements. This would be quite simple, but horribly inefficient except for trivial cases.
Suppose you have N elements e_i in your set S, with 0 <= e_i <= Y. Choose numparts as the number of partitions you are going to try to create, each with element sum >= Y. Assuming sum e_i >= Y, we can set numparts = 1 initially; otherwise, obviously, the answer is zero.
Then you can generate partitions by creating an array of N elements p_i where 0 <= p_i < numparts. There are no more than numparts^N such partitions. Now you have to try to find one in which, for all 0 <= j < numparts, sum {e_i : p_i = j} >= Y. If you find one, increment numparts; if you don't, then you have your answer: the largest numparts value for which you did find a qualifying partition.
You could improve the efficiency of this approach significantly by avoiding the many partitions that don't have sum >= Y. There are 'only' 2^N distinct subsets of S, so the number of subsets with sum >= Y is at most 2^N. For each such subset S_k, you can find the maximum number of partitions of S - S_k, each with sum >= Y, which is just a recursion of the same problem. This would give you the absolute maximum result you're looking for, but would still be a computational nightmare for non-trivial problems.
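That recursion can be sketched directly in Python (an illustrative brute force, exponential in N, so only for tiny inputs; the function name is made up):

```python
from itertools import combinations

def max_partitions(nums, Y):
    # Try every subset with sum >= Y as one partition, then recurse on the
    # remaining elements; return the best count found. Exponential time.
    best = 0
    n = len(nums)
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            if sum(nums[i] for i in idx) >= Y:
                rest = [nums[i] for i in range(n) if i not in idx]
                best = max(best, 1 + max_partitions(rest, Y))
    return best
```

On the question's example, `max_partitions([1, 1, 1, 1, 1, 7, 8], 8)` returns 2 ({8} and {7, 1}; the remaining ones cannot reach 8).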
A quick but suboptimal algorithm is simply to sort the array in ascending order, then close off a partition whenever its running sum reaches Y as you process the sorted elements sequentially. e.g.
Suppose s[i] are the elements of the sorted array:
partitionno = 0;
partitionsum = 0;
for (i = 0; i < N; i++) {
    partitionsum += s[i];
    if (partitionsum >= Y) {
        partitionsum = 0;
        partitionno++;
    }
}
giving partitionno subsets, each with a sum of at least Y. Sorting takes O(N log N) in general (or O(N + Y) with a counting sort, since every element is at most Y), and the loop above is O(N), so you could use this for N in the millions or more.
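In Python, this greedy pass might look like the following (a hypothetical helper; it returns only the count of partitions formed):

```python
def greedy_partition_count(nums, Y):
    # Sort ascending, then close a partition as soon as its running
    # sum reaches Y. Fast, but not guaranteed to be optimal.
    count = 0
    partial = 0
    for x in sorted(nums):
        partial += x
        if partial >= Y:
            partial = 0
            count += 1
    return count
```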
This is strongly NP-hard, since it contains as a special case the 3-partition problem: dividing a set into triples that all have the same sum, where every number lies between that sum/4 and that sum/2. That special case is known to be strongly NP-hard.
Therefore there is no known efficient algorithm to solve it, and finding one would be a really big deal.

Divide linked-list into 2 sublists with equal sum

I'm trying to divide a linked-list into 2 sublists with equal sum. These sublists do not need to consist of consecutive elements.
I have a linked list as
Eg.1
LinkedList={1,7,5,5,4}
should be divided into
LinkedList1={1,5,5}
LinkedList2={7,4}
Both have the same sum of elements as 11.
Eg.2
LinkedList={42,2,3,2,2,2,5,20,2,20}
This should be divided into two list of equal sum i.e 50.
LinkedList1={42,3,5}
LinkedList2={2,2,2,2,20,2,20}
Can someone provide some pseudocode to solve this problem?
This is what I've thought so far:
Sum the elements of the linked list and divide by 2.
While the sum of linkedlist1 is less than half the total, keep pushing elements into linkedlist1.
If adding the current element would push linkedlist1 past half the total, skip it, push it onto linkedlist2 instead, and move to the next element.
But this would only work if the elements are in a particular order.
This is known as the partition problem.
There are a few approaches to solving the problem, but I'll just mention the two most common below (see Wikipedia for more details on these and other approaches).
This can be solved with a dynamic programming approach, which basically comes down to, for each element and value, either including or excluding that element, and looking up whether there's a subset summing to the corresponding value. More specifically, we have the following recurrence relation:
p(i, j) is True if a subset of { x1, ..., xj } sums to i and False otherwise.
p(i, j) is True if either p(i, j − 1) is True or if p(i − xj, j − 1) is True
p(i, j) is False otherwise
Then p(N/2, n) tells us whether a subset exists.
The running time is O(Nn) where n is the number of elements in the input set and N is the sum of elements in the input set.
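The recurrence above translates almost directly into Python; the sketch below (function name invented here) tracks reachable sums with a set instead of the full p(i, j) table:

```python
def can_partition(nums):
    # True iff nums can be split into two subsets with equal sums.
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}  # subset sums achievable so far, capped at target
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable
```

Both examples from the question split evenly: `can_partition([1, 7, 5, 5, 4])` and `can_partition([42, 2, 3, 2, 2, 2, 5, 20, 2, 20])` are both True.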
The "approximate" greedy approach (which doesn't necessarily find an equal-sum partition) is pretty straightforward: process the elements in descending order, putting each one into the set with the smaller sum so far. Here's the pseudo-code:
INPUT:  a list of integers S
OUTPUT: an attempt at a partition of S into two sets of equal sum

function find_partition(S):
    A ← {}
    B ← {}
    sort S in descending order
    for i in S:
        if sum(A) <= sum(B):
            add element i to set A
        else:
            add element i to set B
    return {A, B}
The running time is O(n log n).
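A runnable Python version of this greedy pseudo-code (using lists rather than sets so repeated values survive):

```python
def find_partition(nums):
    # Greedy heuristic: largest first, each element into the lighter set.
    a, b = [], []
    for x in sorted(nums, reverse=True):
        if sum(a) <= sum(b):
            a.append(x)
        else:
            b.append(x)
    return a, b
```

On the first example it happens to find an exact split: `find_partition([1, 7, 5, 5, 4])` yields sums 11 and 11, though in general the greedy result is only an approximation.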

Finding sub-array sum in an integer array

Given an array of N positive integers, there are n*(n+1)/2 sub-arrays, including single-element sub-arrays. Each sub-array has a sum S. Finding S for all sub-arrays is obviously O(n^2), since the number of sub-arrays is O(n^2). Many sums may also be repeated. Is there any way to find the count of all distinct sums (not the exact values of the sums, only the count) in O(n log n)?
I tried an approach but got stuck along the way. I iterated over the array from index 1 to n.
Say a[i] is the given array. For each index i, a[i] adds to every sum in which a[i-1] is involved, and also counts as an individual element. But a duplicate emerges if, among the sums ending at a[i-1], two sums differ by exactly a[i]: say sums Sp and Sq both end at a[i-1] and Sq - Sp = a[i]; then Sp + a[i] equals Sq, making Sq a duplicate.
Say C[i] is the count of the distinct sums ending at a[i].
Then C[i] = C[i-1] + 1 - (number of pairs of sums ending at a[i-1] whose difference is a[i]).
But the problem is finding that number of pairs in O(log n). Please give me a hint about this, or, if I'm on the wrong track and a completely different approach is required, please point that out.
When S is not too large, we can count the distinct sums with one (fast) polynomial multiplication. When S is larger, N is hopefully small enough to use a quadratic algorithm.
Let x_1, x_2, ..., x_n be the array elements. Let y_0 = 0 and y_i = x_1 + x_2 + ... + x_i. Let P(z) = z^{y_0} + z^{y_1} + ... + z^{y_n}. Compute the product of polynomials P(z) * P(z^{-1}); the coefficient of z^k with k > 0 is nonzero if and only if k is a sub-array sum, so we just have to read off the number of nonzero coefficients of positive powers. The powers of z, moreover, range from -S to S, so the multiplication takes time on the order of S log S.
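Here is a small Python illustration of the counting step (with a plain O(S^2) convolution standing in for the fast polynomial multiplication, and assuming positive array elements so prefix sums are distinct; the function name is made up):

```python
from itertools import accumulate

def count_distinct_subarray_sums(nums):
    # Prefix sums y_0 = 0, y_1, ..., y_n; every sub-array sum is a
    # difference y_i - y_j with i > j.
    y = [0] + list(accumulate(nums))
    S = y[-1]
    present = [0] * (S + 1)  # coefficients of P(z)
    for v in y:
        present[v] = 1
    # Coefficient of z^d in P(z) * P(z^{-1}) counts pairs with y_i - y_j = d.
    # A naive convolution is used here; an FFT would make this O(S log S).
    conv = [0] * (2 * S + 1)
    for i, a in enumerate(present):
        for j, b in enumerate(present):
            if a and b:
                conv[i - j + S] += 1
    # Distinct sub-array sums = positive powers with nonzero coefficients.
    return sum(1 for d in range(1, S + 1) if conv[d + S])
```

For [1, 2, 3] the sub-array sums are {1, 2, 3, 5, 6}, so the count is 5.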
You can look at the sub-arrays as a kind of tree, in the sense that sub-array [0,3] can be divided into [0,1] and [2,3].
So build up a tree whose nodes are defined by the length of a sub-array and its starting offset in the original array, and whenever you compute a sub-array sum, store the result in this tree.
When computing a sub-array, you can check this tree for existing pre-computed values.
Also, when dividing, parts of the array can be computed on different CPU cores, if that matters.
This solution assumes that you don't need all values at once, but rather ad hoc; for the former case, there could be some smarter solution.
I also assume that we're talking about element counts in the tens of thousands or more. Otherwise, this is a nice exercise but has not much practical value.

Efficiently calculate next permutation of length k from n choices

I need to efficiently calculate the next permutation of length k from n choices. Wikipedia lists a great algorithm for computing the next permutation of length n from n choices.
The best thing I can come up with is using that algorithm (or the Steinhaus–Johnson–Trotter algorithm), considering only the first k items of the list, and iterating again whenever all the changes happen beyond that position.
Constraints:
The algorithm must calculate the next permutation given nothing more than the current permutation. If it needs to generate a list of all permutations, it will take up too much memory.
It must be able to compute a permutation of only length k of n (this is where the other algorithm fails).
Non-constraints:
Don't care if it's in-place or not
I don't care if it's in lexicographic order, or any order for that matter
I don't care too much how efficiently it computes the next permutation, within reason of course; it can't give me the next permutation by making a list of all possible ones each time.
You can break this problem down into two parts:
1) Find all subsets of size k from a set of size n.
2) For each such subset, find all permutations of a subset of size k.
The referenced Wikipedia article provides an algorithm for part 2, so I won't repeat it here. The algorithm for part 1 is pretty similar. For simplicity, I'll describe it as "find all subsets of size k of the integers [0...n-1]".
1) Start with the subset [0...k-1]
2) To get the next subset, given a subset S:
2a) Find the smallest j such that j ∈ S ∧ j+1 ∉ S. If j == n-1, there is no next subset; we're done.
2b) The elements of S less than j form a run i...j-1 (if any of those values were missing, j wouldn't be minimal). If i is not 0, replace these elements with 0...j-i-1. Then replace element j with j+1.
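Step 2 can be written as a small Python function (a sketch with an invented name; it represents a subset as a sorted list of integers and returns None when the enumeration is finished):

```python
def next_subset(subset, n):
    # subset: sorted list of k distinct ints from [0, n-1].
    # Returns the next size-k subset in this ordering, or None at the end.
    members = set(subset)
    # 2a) smallest j in the subset whose successor j+1 is absent
    j = min(x for x in subset if x + 1 not in members)
    if j == n - 1:
        return None
    run = [x for x in subset if x < j]   # the run i..j-1 below j
    rest = [x for x in subset if x > j]
    # 2b) slide the run down to 0..len(run)-1 and bump j to j+1
    return list(range(len(run))) + [j + 1] + rest
```

Starting from [0, 1] with n = 4, repeated calls yield all six 2-element subsets of {0, 1, 2, 3}, ending at [2, 3].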

Subset sum for exactly k integers?

Following on from the questions Subset sum problem and Sum-subset with a fixed subset size, I was wondering what the general algorithm is for solving a subset-sum problem where we are forced to use EXACTLY k integers, k <= n.
Evgeny Kluev mentioned that he would use the optimal algorithm for k = 4, and beyond that combine a brute-force approach over k − 4 of the integers with the optimal k = 4 algorithm for the rest. Could anyone explain what he means by combining a brute-force approach with the optimal k = 4 algorithm here?
Perhaps someone knows a better, general solution?
The original dynamic programming algorithm applies, with a slight extension - in addition to remembering partial sums, you also need to remember the number of ints used to reach each sum.
In the original algorithm, assuming the target sum is M and there are n integers, you fill a boolean n x M array A, where A[i,m] is true iff sum m can be achieved by picking (any number) from the first i+1 ints (assuming indexing from 0).
You can extend it to a three-dimensional n x M x k array with a similar property - A[i,m,l] is true iff sum m can be achieved by picking exactly l of the first i+1 ints.
Assuming the ints are in array j[0..n-1]:
The recurrence relation is pretty similar - the field A[0,j[0],1] is true (you pick j[0], getting sum j[0] with 1 int), the other fields in A[0,*,*] are false, and deriving the fields in A[i+1,*,*] from A[i,*,*] also mirrors the original algorithm: A[i+1,m,l] is true if A[i,m,l] is true (if you can make sum m from the first i+1 ints, then obviously you can also make it from the first i+2 ints) or if A[i, m - j[i+1], l-1] is true (if you pick j[i+1], you increase the sum by j[i+1] and the number of ints by 1).
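In Python, this extended DP might look like the sketch below (a hypothetical helper; it keeps a rolling table indexed by count and sum instead of the full three-dimensional array):

```python
def subset_sum_exactly_k(nums, M, k):
    # dp[l][m] = True iff some subset of exactly l of the elements
    # processed so far sums to exactly m.
    dp = [[False] * (M + 1) for _ in range(k + 1)]
    dp[0][0] = True
    for x in nums:
        # iterate l and m downwards so each element is used at most once
        for l in range(k, 0, -1):
            for m in range(M, x - 1, -1):
                if dp[l - 1][m - x]:
                    dp[l][m] = True
    return dp[k][M]
```

For instance, `subset_sum_exactly_k([1, 2, 3, 4], 6, 2)` is True (2 + 4), while `subset_sum_exactly_k([1, 2, 3, 4], 10, 2)` is False.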
If k is small then obviously it makes sense to skip all of the above and just iterate over all combinations of k ints, checking their sums. k <= 4 indeed seems like a sensible threshold.
