Time Complexity of Subset-Sum Enumeration

Normally, when dealing with combinations, the Big-O complexity seems to be O(n choose k). In this algorithm, I am generating all the combinations of array elements that sum to the target:
def combos(candidates, start, target):
    # Base case: the empty combination reaches the target exactly.
    if target == 0:
        return [[]]
    res = []
    for i in range(start, len(candidates)):
        # Extend every combination of the remaining elements that
        # reaches the reduced target.
        for c in combos(candidates, i + 1, target - candidates[i]):
            res.append([candidates[i]] + c)
    return res

print(combos([2, 1, 10, 5, 6, 4], 10))
# [[1, 5, 4], [10], [6, 4]]
I am having a hard time determining Big-O here. Is this an O(n choose t) algorithm? If not, what is it, and why?

If the point is to give the worst-case complexity in terms of the set size n, then it is Θ(2^n). Given any set, if the target sum is large enough, you'll end up enumerating all the possible subsets of the set. This is Θ(2^n), as can be seen in two ways:
Each item can be chosen or not.
It is your Θ(n choose k), just summed up over all k.
A more refined bound would take into account both n and the target sum t. In this case, following the reasoning of the second point above, if all elements (and the target sum) are positive integers, then the complexity is the sum of Θ(n choose k) for k ranging only up to t, since no subset of more than t elements can reach the sum.

Your algorithm is at least O(2^n), and I believe it is O(n * 2^n). Here is an explanation.
In your algorithm you have to generate all possible combinations of a set (except the empty set), so it is:
O(2^n) at least. Now, for every combination, you have to sum it up. Some sets have length 1, one has length n, but the majority will have length around n/2. So I believe your complexity is close to O(n * 2^n).

Related

How can I create an array of 'n' positive integers where none of the subsequences of the array have equal sum?

I was watching a lecture on the question "subsequence sum equals k", in which you're given an array of n positive integers and a target sum = k. Your task is to check whether any subsequence of the array has a sum equal to the target sum. The recursive solution works in O(2^N). The lecturer said that if we memoize the recursive solution, the time complexity drops to O(N*K). But as far as I understand, memoization simply removes overlapping subproblems. So if all of the subsequences have different sums, won't the time complexity of the solution still be O(2^N)? To test this hypothesis, I was trying to create an array of n positive integers where no two subsequences have an equal sum.
Also, I tried the tabulation method and was unable to understand why the time complexity drops in the case of tabulation. Please point me to any resource where I can learn exactly which subproblems tabulation avoids.
Note that O(NK) is not always smaller than O(2^N). If K = 2^N, for example, then O(NK) = O(N * 2^N), which is larger.
Furthermore, this is the sort of range you're dealing with when every subsequence sum is different.
If your N integers are powers of 2, for example [2^0, 2^1, 2^2, ...], then every subsequence has a different sum, and K = 2^N is the smallest positive integer that isn't a subsequence sum.
The tabulation method is only an improvement when K is known to be relatively small.
If each value in the array is a different, positive power of the same base, no two sums will be equal.
Python code:

def f(A):
    # Incrementally build the set of all subsequence sums of A,
    # printing and bailing out on the first repeated sum.
    sums = set([0])
    for a in A:
        new_sums = set()
        for s in sums:
            new_sum = s + a
            if new_sum in sums:
                print(new_sum)
                return None
            new_sums.add(new_sum)
        sums = sums.union(new_sums)
    return sums

for b in range(2, 10):
    A = [b**p for p in range(5)]
    sums = f(A)   # never None here: powers of b have distinct subset sums
    print(A)
    print(len(sums))
    print(sums)
In the recursive case without memoization, you'll compute the sums for all subsequences, which has O(2^N) complexity.
Now consider the memoization case.
Let dp[i][j] = 1 if there exists a subsequence in the array arr[i:] that has sum j, else dp[i][j] = 0.
The algorithm, written as runnable Python, is:

def subseq_sum_exists(arr, k):
    n = len(arr)
    # dp[i][x] is True (1) iff some subsequence of arr[i:] sums to x.
    dp = [[False] * (k + 1) for _ in range(n + 1)]
    dp[n][0] = True                      # the empty subsequence sums to 0
    for i in range(n - 1, -1, -1):
        j = arr[i]
        for x in range(k + 1):
            if dp[i + 1][x]:
                dp[i][x] = True          # sums reachable without arr[i]
                if x + j <= k:
                    dp[i][x + j] = True  # the same sums plus arr[i]
    return dp[0][k]
For each index, we traverse the subsequence sums seen so far (up to k) and mark them as reachable at the current index. For each such sum, we also add the value of the current element and mark that sum as reachable too.
Which sub-problems were reduced?
We just track whether sum x is achievable in the suffix arr[i:]. In the recursive case, there could be 100 subsequences of that suffix with sum x. Here, since we're using a single boolean to record that x is achievable, we avoid going over all of those subsequences just to check whether the sum is possible.
For each index, since we do a O(k) traversal of going through all sums, the complexity becomes O(N*k).
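As a quick sanity check of the runnable sketch above, using the array from the first question:

print(subseq_sum_exists([2, 1, 10, 5, 6, 4], 10))   # True
print(subseq_sum_exists([2, 1, 10, 5, 6, 4], 100))  # False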

What numbers add up to k?

I am not sure how to do this. Given a list of numbers and a number k, return all pairs of numbers from the list that add up to k, passing through the list only once.
For example, given [10, 15, 3, 7] and k = 17, the program should return 10 + 7.
How do you find and return every pair while only going through the list once?
Use a set to keep track of what you've seen. Runtime: O(N), space: O(N).

def twoAddToK(nums, k):
    seen = set()
    N = len(nums)
    for i in range(N):
        # A pair exists if the complement of nums[i] was seen earlier.
        if k - nums[i] in seen:
            return True
        seen.add(nums[i])
    return False
As an alternative to Shawn's code, which uses a set, there is also the option of sorting the list in O(N log N) time (and possibly no extra space, if you are allowed to overwrite the original input), and then applying an O(N) two-pointer algorithm to solve the problem on the sorted list.
While asymptotic complexity slightly favors hash sets in terms of time, since O(N) beats O(N log N), I am ready to bet that sorting plus a single linear pass is considerably faster in practice.
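A minimal sketch of that sort-then-scan alternative (the function name is illustrative, not from the original answer):

def two_add_to_k_sorted(nums, k):
    nums = sorted(nums)            # O(N log N)
    lo, hi = 0, len(nums) - 1
    while lo < hi:                 # single O(N) pass from both ends
        s = nums[lo] + nums[hi]
        if s == k:
            return True
        if s < k:
            lo += 1
        else:
            hi -= 1
    return False

print(two_add_to_k_sorted([10, 15, 3, 7], 17))  # True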

Kth Smallest SUM In Two Sorted Arrays - Binary Search Solution

I am trying to solve an interview practice problem.
The problem is:
Given two integer arrays sorted in ascending order and an integer k. Define sum = a + b, where a is an element from the first array and b is an element from the second one. Find the kth smallest sum out of all possible sums.
For example
Given [1, 7, 11] and [2, 4, 6].
For k = 3, return 7.
For k = 4, return 9.
For k = 8, return 15.
We define n as the size of A, and m as the size of B.
I know how to solve it using heap (O(k log min(n, m, k)) time complexity). But the problem states that there is another binary search method to do it with O( (m + n) log maxValue), where maxValue is the max number in A and B. Can anyone give some comments for solving it using binary search?
My thinking is that we may use the sum x = A[i] + B[j] as the object of the search, because the kth such x is what we want. If so, how should x be updated during the binary search? How can I check whether an updated x is valid (i.e., whether such a pair really exists)?
Thank you
The original problem is here:
https://www.lintcode.com/en/problem/kth-smallest-sum-in-two-sorted-arrays/
You can solve this with binary search plus a sliding window; the time complexity is O((N + M) log maxValue).
Let's think about solving this subproblem first (I'll call it the counting problem).
You are given integers N, M, S, sequences a and b.
The length of sequence a is exactly N.
The length of sequence b is exactly M.
The sequences a and b are sorted.
Calculate the number of pairs that satisfy a[i] + b[j] <= S (0 <= i <= N-1, 0 <= j <= M-1).
Actually, this counting problem can be solved with binary search in O(N log M) time.
Surprisingly, it can also be solved in O(N + M).
Binary Search Algorithm
For each i, you can find the maximum value of x that satisfies a[i] + b[x] <= S, i.e. b[x] <= S - a[i], in O(log M).
Therefore, you can count the valid values of x in O(log M), because that count equals x + 1.
O(N+M) Algorithm
Notice that as i increases, the maximal valid x stays the same or decreases.
So you can use the sliding window (two pointer) technique instead:
it runs in O(N + M), because you perform the operation i++ exactly N times and the operation x-- at most M times.
Solving this main problem
You can binary_search for S and you can solve the inequality (counting problem's answer <= K).
The answer is the maximum value of S.
The time complexity is O((N + M) log maxvalue).
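A minimal sketch combining the two pieces (the counting problem via a sliding window, plus the outer binary search; names and bounds are illustrative):

def count_pairs_at_most(a, b, s):
    # Counting problem: pairs with a[i] + b[j] <= s, via a sliding window.
    # As a[i] grows, the largest usable index x into b can only shrink.
    count, x = 0, len(b) - 1
    for ai in a:
        while x >= 0 and ai + b[x] > s:
            x -= 1
        count += x + 1
    return count

def kth_smallest_sum(a, b, k):
    # Binary search for the smallest S whose count is at least k.
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if count_pairs_at_most(a, b, mid) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(kth_smallest_sum([1, 7, 11], [2, 4, 6], 3))  # 7
print(kth_smallest_sum([1, 7, 11], [2, 4, 6], 4))  # 9
print(kth_smallest_sum([1, 7, 11], [2, 4, 6], 8))  # 15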

Find pairs with given difference

Given n, k, and a list of n integers, how would you find the pairs of integers whose difference is k?
There is an O(n log n) solution, but I cannot figure it out.
You can do it like this:
Sort the array
For each item data[i], determine its two target pairs, i.e. data[i]+k and data[i]-k
Run a binary search on the sorted array for these two targets; if found, add both data[i] and data[targetPos] to the output.
Sorting is done in O(n log n). Each of the n search steps takes 2 * log n time to look for the two targets, for an overall time of O(n log n).
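A minimal Python sketch of these steps (illustrative; searching only to the right of each element reports each pair once and covers data[i] - k implicitly, when the loop reaches the smaller element of a pair):

import bisect

def pairs_with_diff(data, k):
    data = sorted(data)                           # O(n log n)
    pairs = []
    for i, x in enumerate(data):
        # Binary search for x + k strictly to the right of position i.
        j = bisect.bisect_left(data, x + k, i + 1)
        if j < len(data) and data[j] == x + k:
            pairs.append((x, x + k))
    return pairs

print(pairs_with_diff([1, 7, 5, 9, 2, 12, 3], 2))
# [(1, 3), (3, 5), (5, 7), (7, 9)]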
A linear solution exists for this problem! Just ask yourself one question: if a is in the array, what other number should be there? Of course a + k or a - k (a special case, k = 0, requires different handling). So, what now?
You create a hash set (for example, unordered_set in C++11) with all values from the array. That's O(1) average per element, so building it is O(n).
You iterate through the array and check for each element x whether x + k or x - k is present in the set. Each check is O(1), and each element is checked once, so this step is linear, O(n), too.
If you find an x with a partner (x + k or x - k), that's a pair you are looking for.
So the whole algorithm is linear, O(n). If you really want O(n lg n), use a tree-based set instead, with O(lg n) membership checks, giving an O(n lg n) algorithm.
Note: there is no need to check both x + k and x - k; checking x + k alone is sufficient, because if (a, b) is a valid pair, then:
if a < b then
    a + k == b
else
    b + k == a
Improvement: if you know the range of the values, you can guarantee linear complexity by using a boolean table (set_tab[i] == true when i is in the array).
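A minimal Python sketch of the hash-set idea (illustrative; assumes k > 0, per the special case noted above):

def has_pair_with_diff(nums, k):
    values = set(nums)            # O(n) expected construction
    # Per the note above, checking x + k alone is sufficient.
    return any(x + k in values for x in nums)

print(has_pair_with_diff([1, 7, 5, 9, 2, 12, 3], 2))   # True
print(has_pair_with_diff([1, 7, 5, 9, 2, 12, 3], 20))  # False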
A solution similar to the one above:
Sort the array
set variables i = 0; j = 1;
check the difference between array[i] and array[j]
if the difference is too small, increase j
if the difference is too big, increase i
if the difference is the one you're looking for, add it to results and increase j
repeat the check-and-advance steps until j reaches the end of the array
Sorting is O(n lg n); the scan is, if I'm correct, O(n) (at most 2n comparisons), so the whole algorithm is O(n lg n).
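A rough sketch of this two-pointer variant (illustrative; assumes k > 0):

def pairs_with_diff_sorted(arr, k):
    arr = sorted(arr)                   # O(n lg n)
    res = []
    i, j = 0, 1
    while j < len(arr):
        diff = arr[j] - arr[i]
        if i == j or diff < k:
            j += 1                      # difference too small: advance j
        elif diff > k:
            i += 1                      # difference too big: advance i
        else:
            res.append((arr[i], arr[j]))
            j += 1
    return res

print(pairs_with_diff_sorted([1, 7, 5, 9, 2, 12, 3], 2))
# [(1, 3), (3, 5), (5, 7), (7, 9)]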

Sum-subset with a fixed subset size

The sum-subset problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k=4: space complexity O(n), time complexity O(n^2 * log(n)).
Sort the array. Starting from 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in the non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in the non-increasing order. Increase lesser sum if total sum is less than zero, decrease greater one if total sum is greater than zero, stop when total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indexes i and j in such a way, that (a[i] + a[j]) will never decrease. And for k and l, (a[k] + a[l]) should never increase. A priority queue helps to do this:
Put key=(a[i] + a[j]), value=(i = 0, j = 1) to priority queue.
Pop (sum, i, j) from priority queue.
Use sum in the above algorithm.
Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 to the priority queue only if these elements were not already used. To keep track of used elements, maintain an array of the maximal used 'j' for each 'i'. It is enough to use only values of 'j' that are greater than 'i'.
Continue from step 2.
For k>4
If space complexity is limited to O(n), I cannot find anything better than to use brute force for k-4 of the values and the above algorithm for the remaining 4 values. Time complexity O(n^(k-2) * log(n)).
For very large k integer linear programming may give some improvement.
Update
If n is very large (on the same order as the maximum integer value), it is possible to implement an O(1) priority queue, improving the complexities to O(n^2) and O(n^(k-2)).
If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible: precalculate a bitset for all possible sums of k/2 values, and use it to check sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).
The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).
This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w_1, ..., w_n} in sorted order. For all x in X, insert a key-value item (w_1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (w_i + x, (i, x)) and insert (w_{i+1} + x, (i+1, x)).
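A minimal Python sketch of that iterator (heapq-based; names are illustrative, and W is assumed non-empty):

import heapq

def sorted_pair_sums(W, X):
    # Lazily yields all w + x in non-decreasing order while holding
    # only O(|X|) heap entries at a time.
    W, X = sorted(W), sorted(X)
    heap = [(W[0] + x, 0, x) for x in X]
    heapq.heapify(heap)
    while heap:
        s, i, x = heapq.heappop(heap)
        yield s
        if i + 1 < len(W):
            heapq.heappush(heap, (W[i + 1] + x, i + 1, x))

print(list(sorted_pair_sums([1, 7, 11], [2, 4, 6])))
# [3, 5, 7, 9, 11, 13, 13, 15, 17]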
Question that is very similar:
Is this variant of the subset sum problem easier to solve?
It's still NP-complete.
If it were not, subset-sum would also be in P, as it could be represented as F(1) | F(2) | ... | F(n), where F is your function. This would take O(F(1) + F(2) + ... + F(n)) time, which would still be polynomial; that is incorrect, as we know subset-sum is NP-complete.
Note that if you have certain bounds on the inputs you can achieve polynomial time.
Also note that the brute-force runtime can be calculated with binomial coefficients.
The solution for k=4, in O(n^2 log(n)):
Step 1: Calculate all pairwise sums and sort the list. There are n(n-1)/2 sums, so the complexity is O(n^2 log(n)). Keep track of which pair of elements produces each sum.
Step 2: For each element in the above list, binary search for its complement, making sure the two pairs don't share any elements. There are n^2 searches, each with complexity O(log(n)).
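A minimal sketch of these two steps (illustrative; the target sum here is zero, matching the original problem, and the inner scan over equal complements can exceed O(log n) when many pair sums coincide):

from bisect import bisect_left

def four_sum_zero(a):
    # Step 1: all pairwise sums, sorted, each tagged with its index pair.
    pair_sums = sorted((a[i] + a[j], i, j)
                       for i in range(len(a))
                       for j in range(i + 1, len(a)))
    keys = [s for s, _, _ in pair_sums]
    # Step 2: for each pair sum, binary search for its complement and
    # check that the two pairs use four distinct indices.
    for s, i, j in pair_sums:
        p = bisect_left(keys, -s)
        while p < len(keys) and keys[p] == -s:
            _, x, y = pair_sums[p]
            if len({i, j, x, y}) == 4:
                return True
            p += 1
    return False

print(four_sum_zero([-7, 1, 2, 4, -3, 3]))  # True  (-7 + 1 + 2 + 4 = 0)
print(four_sum_zero([1, 2, 3, 4]))          # False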
EDIT: The space complexity of the original algorithm is O(n^2). The space complexity can be reduced to O(1) by simulating a virtual 2D matrix (O(n), if you consider space to store sorted version of the array).
First, about the 2D matrix: sort the numbers and create a matrix X of pairwise sums. The matrix is built in such a way that all of its rows and columns are sorted. To search for a value in this matrix, search the numbers on the diagonal. If the number lies between X[i,i] and X[i+1,i+1], you can basically halve the search space to the two submatrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2 n) (I AM NOT VERY SURE. CAN SOMEBODY CHECK IT?).
Now, instead of using a real matrix, use a virtual matrix where X[i,j] are calculated as needed instead of pre-computing them.
Resulting time complexity: O((n log n)^2).
PS: the following link says the complexity of search in a 2D sorted matrix is O(n). If that is true (i.e., O(log^2 n) is incorrect), then the final complexity is O(n^3).
To build on awesomo's answer... if we can assume that the numbers are sorted, we can do better than O(n^k) for a given k: simply take all O(n^(k-1)) subsets of size k-1, then binary search in what remains for a number that, when added to the first k-1, gives the target. This is O(n^(k-1) log n), so the complexity is certainly below O(n^k).
In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.
However, maybe you can do even better? I'll think about this one.
EDIT: I was initially going to write a long addition proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.
We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.
First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).
Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.
If you have very few odd numbers, this could let you very quickly pick out terms that must be part of a winning subset, or discard ones that cannot be. You can transform problems with lots of even numbers into equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the counts of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitude worse than brute force, but I feel this is probably at least as good as brute force. Thoughts are welcome!
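For concreteness, here is a rough, hedged sketch of the recursion described above (the zero-value guard and the brute-force enumeration of odd subsets are my additions; this is an illustration, not a tuned implementation):

from itertools import combinations

def k_subset_sum(nums, k, target):
    if k == 0:
        return target == 0
    if k > len(nums):
        return False
    if k == 1:
        return target in nums           # the post suggests binary search here
    odds = [x for x in nums if x % 2]
    evens = [x for x in nums if not x % 2]
    if not odds:
        if all(x == 0 for x in evens):  # guard so the halving terminates
            return target == 0
        # All even: divide everything (including the target) by two.
        return target % 2 == 0 and k_subset_sum(
            [x // 2 for x in evens], k, target // 2)
    if not evens:
        # All odd: subtract 1 from each number and k from the target.
        return k_subset_sum([x - 1 for x in odds], k, target - k)
    # Mixed: an even target needs an even number of odd summands,
    # an odd target an odd number of them.
    for m in range(target % 2, min(k, len(odds)) + 1, 2):
        for subset in set(combinations(odds, m)):
            if k_subset_sum(evens, k - m, target - sum(subset)):
                return True
    return False

print(k_subset_sum([1, 2, 2, 6, 7, 7, 20], 3, 20))  # True (6 + 7 + 7)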
EDIT2: An example of the above, for illustration.
{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
    {2, 2, 6, 20}, k = 3, sum = 20
    = {1, 1, 3, 10}, k = 3, sum = 10
    Subset {}:
        {10}, k = 3, sum = 10
        Failure
    Subset {1, 1}:
        {10}, k = 1, sum = 8
        Failure
    Subset {1, 3}:
        {10}, k = 1, sum = 6
        Failure
Subset {1, 7}:
    {2, 2, 6, 20}, k = 1, sum = 12
    Failure
Subset {7, 7}:
    {2, 2, 6, 20}, k = 1, sum = 6
    Success
The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).
Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.
