global max of local mins of k contiguous elements - algorithm

If this question appears to be a replicate, please point it out.
The problem states:
Given an array of n elements and an integer k <= n, find the max of {min{a_i+1 ... a_i+k} for i in {0 ... n-k}}, i.e. find the max of the mins of k contiguous numbers.
For example, let the sequence a = {10, 21, 11, 13, 16, 15, 12, 9} and k = 3.
The max is 13 for block {13, 16, 15}.
Hopefully the problem is clear!
It is straight forward to have a simple "brute force" making it O(nk). I am wondering if we can do it in O(nlogn) with "divide and conquer" or even in O(n) possibly with "dynamic programming".
It appears to me that if I try "divide and conquer", I have to deal with a set of blocks that is just across the middle border. Yet figuring out the max in that case seems to take O(k2), making the whole recurrence O(nk) again. (Maybe I get the numbers wrong somewhere!)
Looking for directions from you guys! Words and pseudocode are welcome!

You can do it in O(n log k) time; I'll start you off with a tip.
Suppose you have elements ai+1, ..., ai+k in some data structure with logarithmic time insert and constant time min operations (like a min heap). Can you use that to get the same data structure, but with elements ai+2, ..., ai+k+1
in O(log k) time?
Once you have that, then you can basically go through all of the consecutive groups of k and take the maximum of the minimums normally.

Here's one way to solve it using a dynamic programming-like solution. We construct an array "ys" that stores the mins of ever increasing ranges. It uses the idea that if the ys currently store mins of ranges of length p, then if we do for all i: ys[i] = min(ys[i], ys[i+p]), then we've now got mins of ranges of length 2p. Similarly, if we do ys[i] = min(ys[i], ys[i+p], xs[i+2p]) then we've got mins of ranges of length 2p+1. By using a technique very like exponentiation-by-squaring, we can end up with ys storing min-ranges of length k.
def mins(xs, k, ys):
if k == 1: return ys
mins(xs, k // 2, ys)
for i in xrange(len(ys)):
if i + k//2 < len(ys): ys[i] = min(ys[i], ys[i+k//2])
if k % 2:
if i+k-1 < len(ys): ys[i] = min(ys[i], xs[i+k-1])
return ys
def maxmin(xs, k):
return max(mins(xs, k, xs[:]))
We call mins recursively log_2(k) times, and otherwise we iterate over the ys array once per call. Thus, the algorithm is O(n log k).

Related

Count of divisors of numbers till N in O(N)?

So, we can count divisors of each number from 1 to N in O(NlogN) algorithm with sieve:
int n;
cin >> n;
for (int i = 1; i <= n; i++) {
for (int j = i; j <= n; j += i) {
cnt[j]++; //// here cnt[x] means count of divisors of x
}
}
Is there way to reduce it to O(N)?
Thanks in advance.
Here is a simple optimization on #גלעד ברקן's solution. Rather than use sets, use arrays. This is about 10x as fast as the set version.
n = 100
answer = [None for i in range(0, n+1)]
answer[1] = 1
small_factors = [1]
p = 1
while (p < n):
p = p + 1
if answer[p] is None:
print("\n\nPrime: " + str(p))
limit = n / p
new_small_factors = []
for i in small_factors:
j = i
while j <= limit:
new_small_factors.append(j)
answer[j * p] = answer[j] + answer[i]
j = j * p
small_factors = new_small_factors
print("\n\nAnswer: " + str([(k,d) for k,d in enumerate(answer)]))
It is worth noting that this is also a O(n) algorithm for enumerating primes. However with the use of a wheel generated from all of the primes below size log(n)/2 it can create a prime list in time O(n/log(log(n))).
How about this? Start with the prime 2 and keep a list of tuples, (k, d_k), where d_k is the number of divisors of k, starting with (1,1):
for each prime, p (ascending and lower than or equal to n / 2):
for each tuple (k, d_k) in the list:
if k * p > n:
remove the tuple from the list
continue
power = 1
while p * k <= n:
add the tuple to the list if k * p^power <= n / p
k = k * p
output (k, (power + 1) * d_k)
power = power + 1
the next number the output has skipped is the next prime
(since clearly all numbers up to the next prime are
either smaller primes or composites of smaller primes)
The method above also generates the primes, relying on O(n) memory to keep finding the next prime. Having a more efficient, independent stream of primes could allow us to avoid appending any tuples (k, d_k) to the list, where k * next_prime > n, as well as free up all memory holding output greater than n / next_prime.
Python code
Consider the total of those counts, sum(phi(i) for i=1,n). That sum is O(N log N), so any O(N) solution would have to bypass individual counting.
This suggests that any improvement would need to depend on prior results (dynamic programming). We already know that phi(i) is the product of each prime degree plus one. For instance, 12 = 2^2 * 3^1. The degrees are 2 and 1, respective. (2+1)*(1+1) = 6. 12 has 6 divisors: 1, 2, 3, 4, 6, 12.
This "reduces" the question to whether you can leverage the prior knowledge to get an O(1) way to compute the number of divisors directly, without having to count them individually.
Think about the given case ... divisor counts so far include:
1 1
2 2
3 2
4 3
6 4
Is there an O(1) way to get phi(12) = 6 from these figures?
Here is an algorithm that is theoretically better than O(n log(n)) but may be worse for reasonable n. I believe that its running time is O(n lg*(n)) where lg* is the https://en.wikipedia.org/wiki/Iterated_logarithm.
First of all you can find all primes up to n in time O(n) using the Sieve of Atkin. See https://en.wikipedia.org/wiki/Sieve_of_Atkin for details.
Now the idea is that we will build up our list of counts only inserting each count once. We'll go through the prime factors one by one, and insert values for everything with that as the maximum prime number. However in order to do that we need a data structure with the following properties:
We can store a value (specifically the count) at each value.
We can walk the list of inserted values forwards and backwards in O(1).
We can find the last inserted number below i "efficiently".
Insertion should be "efficient".
(Quotes are the parts that are hard to estimate.)
The first is trivial, each slot in our data structure needs a spot for the value. The second can be done with a doubly linked list. The third can be done with a clever variation on a skip-list. The fourth falls out from the first 3.
We can do this with an array of nodes (which do not start out initialized) with the following fields that look like a doubly linked list:
value The answer we are looking for.
prev The last previous value that we have an answer for.
next The next value that we have an answer for.
Now if i is in the list and j is the next value, the skip-list trick will be that we will also fill in prev for the first even after i, the first divisible by 4, divisible by 8 and so on until we reach j. So if i = 81 and j = 96 we would fill in prev for 82, 84, 88 and then 96.
Now suppose that we want to insert a value v at k between an existing i and j. How do we do it? I'll present pseudocode starting with only k known then fill it out for i = 81, j = 96 and k = 90.
k.value := v
for temp in searching down from k for increasing factors of 2:
if temp has a value:
our_prev := temp
break
else if temp has a prev:
our_prev = temp.prev
break
our_next := our_prev.next
our_prev.next := k
k.next := our_next
our_next.prev := k
for temp in searching up from k for increasing factors of 2:
if j <= temp:
break
temp.prev = k
k.prev := our_prev
In our particular example we were willing to search downwards from 90 to 90, 88, 80, 64, 0. But we actually get told that prev is 81 when we get to 88. We would be willing to search up to 90, 92, 96, 128, 256, ... however we just have to set 92.prev 96.prev and we are done.
Now this is a complicated bit of code, but its performance is O(log(k-i) + log(j-k) + 1). Which means that it starts off as O(log(n)) but gets better as more values get filled in.
So how do we initialize this data structure? Well we initialize an array of uninitialized values then set 1.value := 0, 1.next := n+1, and 2.prev := 4.prev := 8.prev := 16.prev := ... := 1. And then we start processing our primes.
When we reach prime p we start by searching for the previous inserted value below n/p. Walking backwards from there we keep inserting values for x*p, x*p^2, ... until we hit our limit. (The reason for backwards is that we do not want to try to insert, say, 18 once for 3 and once for 9. Going backwards prevents that.)
Now what is our running time? Finding the primes is O(n). Finding the initial inserts is also easily O(n/log(n)) operations of time O(log(n)) for another O(n). Now what about the inserts of all of the values? That is trivially O(n log(n)) but can we do better?
Well first all of the inserts to density 1/log(n) filled in can be done in time O(n/log(n)) * O(log(n)) = O(n). And then all of the inserts to density 1/log(log(n)) can likewise be done in time O(n/log(log(n))) * O(log(log(n))) = O(n). And so on with increasing numbers of logs. The number of such factors that we get is O(lg*(n)) for the O(n lg*(n)) estimate that I gave.
I haven't shown that this estimate is as good as you can do, but I think that it is.
So, not O(n), but pretty darned close.

Kth Smallest SUM In Two Sorted Arrays - Binary Search Solution

I am trying to solve an interview practice problem.
The problem is:
Given two integer arrays sorted in ascending order and an integer k. Define sum = a + b, where a is an element from the first array and b is an element from the second one. Find the kth smallest sum out of all possible sums.
For example
Given [1, 7, 11] and [2, 4, 6].
For k = 3, return 7.
For k = 4, return 9.
For k = 8, return 15.
We define n as the size of A, and m as the size of B.
I know how to solve it using heap (O(k log min(n, m, k)) time complexity). But the problem states that there is another binary search method to do it with O( (m + n) log maxValue), where maxValue is the max number in A and B. Can anyone give some comments for solving it using binary search?
My thinking is that we may use x = A[] + B[] as the searching object, because the k-th x is what we want. If so, how can x be updated in binary search? How can I check if the updated x is valid or not (such a pair really exists or not)?
Thank you
The original problem is here:
https://www.lintcode.com/en/problem/kth-smallest-sum-in-two-sorted-arrays/
You can solve for binary search and sliding window, and the time complexity is O((N + M) log maxvalue).
Let's think solving this problem (I call it counting problem).
You are given integers N, M, S, sequences a and b.
The length of sequence a is exactly N.
The length of sequence b is exactly M.
The sequence a, b is sorted.
Please calculate the number of pairs that satisfies a[i]+b[j]<=S (0<=i<=N-1, 0<=j<=M-1).
Actually, this counting problem can solve for binary search in O(N log M) time.
Surprisingly, this problem can solve for O(N+M).
Binary Search Algorithm
You can solve the maximum value of x that satisfies a[i] + b[x] <= S --> b[x] <= S - a[i] for O(log M).
Therefore, you can solve the number of value of x for O(log M) because it is equal to x+1.
O(N+M) Algorithm
Actually, if you do i++, the value of x is equal or less than previous value of x.
So you can use sliding window algorithm and.
You can solve for O(N+M), because you have to do operation i++ N times, and operation x-- M times.
Solving this main problem
You can binary_search for S and you can solve the inequality (counting problem's answer <= K).
The answer is the maximum value of S.
The time complexity is O((N + M) log maxvalue).

Evenly pruning an ordered set

Given an ordered set of n elements (a1, a2, ..., an), what algorithm can I use to pick M of these elements (M < n) such that the sampling from the original set is as spread out as possible?
For example, if the starting set is the numbers from 1 to 9 (i.e. n=9), and I want to evenly sample it so I end up with only 5 of those numbers (i.e. M=9), I'd select 1, 3, 5, 7, 9. To get 3 of the original 9, I'd go with 1, 5, 9, and so on. But what would the pseudo-code for picking the elements look like for any n and M?
The mathematical formulation for this problem would be as follows: given M < n, find the set q(1), q(2), ..., q(M) such that 1 <= q(k) < q(k+1) <= M for any k:[1, M-1], and the sum of q(k+1)-q(k) for k: [1, M-1] is the maximum possible.
Following Nico's suggestion of maximizing the minimum difference (the form of the objective function is fairly flexible), there's a simple O(n^2 M)-time dynamic program that goes something like this.
def prune(a, M):
n = len(a)
# The [i][j] entry is the max min gap
# for a[0], ..., a[i] choose j + 1 (a[0] is always chosen).
table = [[float('inf')] * M for i in range(n)]
for i in range(1, n):
for j in range(M):
table[i][j] = max(min(table[k][j], a[i] - a[k])
for k in range(i))
# Trace back the sequence of argmaxes to recover the chosen indexes.
This can be improved to O(n M) time using total monotonicity. The idea is that, as i increases, the proper value for k can only increase too, and as soon as the objective decreases when we increase k, we can move on to the next i.
Both of the above algorithms handle fairly general objectives. If max is OK, then there's an O(n log n)-time algorithm that uses binary search to guess the minimum gap and then check whether it's feasible. I'll wait for you to update the question.

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(Klogk) solution , using Max-Heap given here .
It also is one of the favorite Google interview question , and they demand a O(k) solution .
I've also read somewhere that there exists a O(k) solution, which i am unable to figure out .
Can someone explain the correct solution with a pseudocode .
P.S.
Please DON'T post this link as answer/comment.It DOESN'T contain the answer.
I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: pointer to the first array incremented when sum is too large and pointer to the second array decremented when sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search) we could find Kth largest sum in O(N log R) time, where N is size of the largest array and R is number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by small constant.
Previous algorithm may be improved if we stop binary search as soon as number of pair sums in binary search range decreases from O(N2) to O(N). Then we fill auxiliary array with these pair sums (this may be done with slightly modified two-pointers algorithm). And then we use quickselect algorithm to find Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get proper value range) we use something better than binary search?
We could estimate value range with the following trick: get every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for needed value range. And in fact slightly improved variant of this trick gives range containing only O(N) elements. This is proven in following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with Median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than other array (M < N) we could assume that this shorter array is extended to size N with some very small elements so that all calculations in the algorithm use size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes algorithm a little bit faster but does not improve asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to find K'th sum in linear time is based on priority queue (PQ). One variation is to insert largest pair to PQ, then repeatedly remove top of PQ and instead insert up to two pairs (one with decremented index in one array, other with decremented index in other array). And take some measures to prevent inserting duplicate pairs. Other variation is to insert all possible pairs containing largest element of first array, then repeatedly remove top of PQ and instead insert pair with decremented index in first array and same index in second array. In this case there is no need to bother about duplicates.
OP mentions O(K log K) solution where PQ is implemented as max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
Advantage of this approach is a possibility to provide first K elements in sorted order. Disadvantages are limited choice of array element type, more complex and slower algorithm, worse asymptotic complexity: O(K) > O(N).
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greater sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, consider you have S(k) = x[i] + y[j]. Then, S(k+1) is the greatest sum which is lower (or equal) to S(k), and such as at least one element (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greater sum you can construct which is lower than S(k) is by setting i=i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you found the K-th greater sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three because we need to deal with the case where the sums are equal);let's call that cost Q).
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M],b[N]}, {a[M-1],b[N-1]}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
M = len(a)
N = len(b)
if k > M*N:
raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % M*N,k)
(ia,ib) = M-1,N-1 #0 based arrays
# we need this for lookback
nottakenindices = (0,0) # could be any value
nottakensum = float('-inf')
for i in range(k-1):
optionone = a[ia]+b[ib-1]
optiontwo = a[ia-1]+b[ib]
biggest = max((optionone,optiontwo))
#first deal with look behind
if nottakensum > biggest:
if optionone == biggest:
newnottakenindices = (ia,ib-1)
else: newnottakenindices = (ia-1,ib)
ia,ib = nottakenindices
nottakensum = biggest
nottakenindices = newnottakenindices
#deal with case where indices hit 0
elif ia <= 0 and ib <= 0:
ia = ib = 0
elif ia <= 0:
ib-=1
ia = 0
nottakensum = float('-inf')
elif ib <= 0:
ia-=1
ib = 0
nottakensum = float('-inf')
#lookahead cases
elif optionone > optiontwo:
#then choose the first option as our next pair
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
elif optionone < optiontwo: # choose the second
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#next two cases apply if options are equal
elif a[ia] > b[ib]:# drop the smallest
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
else: # might be equal or not - we can choose arbitrarily if equal
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#+2 - one for zero-based, one for skipping the 1st largest
data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
if ia <= 0 and ib <= 0:
raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case with cases where pairs have equal sums):
import itertools
def refkth(a,b,k):
(rightia,righta),(rightib,rightb) = sorted(itertools.product(enumerate(a),enumerate(b)), key=lamba((ia,ea),(ib,eb):ea+eb)[k-1]
data = k,righta,rightb,righta+rightb,rightia,rightib
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Might be there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is the index array looks like (starting from 1 not 0)
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from end and make tuples. sum the elements in the tuple and pick the kth largest sum.
public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
// 2 * O(n log(n))
Arrays.sort(nums1);
Arrays.sort(nums2);
List<List<Integer>> results = new ArrayList<>(k);
int endIndex = 0;
// Find the number whose square is the first one bigger than k
for (int i = 1; i <= k; i++) {
if (i * i >= k) {
endIndex = i;
break;
}
}
// The following Iteration provides at most endIndex^2 elements, and both arrays are in ascending order,
// so k smallest pairs must can be found in this iteration. To flatten the nested loop, refer
// 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
for (int i = 0; i < endIndex * endIndex; i++) {
int m = i / endIndex;
int n = i % endIndex;
List<Integer> item = new ArrayList<>(2);
item.add(nums1[m]);
item.add(nums2[n]);
results.add(item);
}
results.sort(Comparator.comparing(pair->pair.get(0) + pair.get(1)));
return results.stream().limit(k).collect(Collectors.toList());
}
Key to eliminate O(n^2):
Avoid cartesian product(or 'cross join' like operation) of both arrays, which means flattening the nested loop.
Downsize iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to Java doc)
Limit the iteration range to the size which is just big enough to support k smallest pairs searching.

Sum-subset with a fixed subset size

The sum-subset problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k=4, space complexity O(n), time complexity O(n2 * log(n))
Sort the array. Starting from 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in the non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in the non-increasing order. Increase lesser sum if total sum is less than zero, decrease greater one if total sum is greater than zero, stop when total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indexes i and j in such a way, that (a[i] + a[j]) will never decrease. And for k and l, (a[k] + a[l]) should never increase. A priority queue helps to do this:
Put key=(a[i] + a[j]), value=(i = 0, j = 1) to priority queue.
Pop (sum, i, j) from priority queue.
Use sum in the above algorithm.
Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 to priority queue only if these elements were not already used. To keep track of used elements, maintain an array of maximal used 'j' for each 'i'. It is enough to use only values for 'j', that are greater, than 'i'.
Continue from step 2.
For k>4
If space complexity is limited to O(n), I cannot find anything better, than use brute force for k-4 values and the above algorithm for the remaining 4 values. Time complexity O(n(k-2) * log(n)).
For very large k integer linear programming may give some improvement.
Update
If n is very large (on the same order as maximum integer value), it is possible to implement O(1) priority queue, improving complexities to O(n2) and O(n(k-2)).
If n >= k * INT_MAX, different algorithm with O(n) space complexity is possible. Precalculate a bitset for all possible sums of k/2 values. And use it to check sums of other k/2 values. Time complexity is O(n(ceil(k/2))).
The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).
This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w1, ..., wn} in sorted order. For all x in X, insert a key-value item (w1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (wi + x, (i, x)) and insert (wi+1 + x, (i+1, x)).
Question that is very similar:
Is this variant of the subset sum problem easier to solve?
It's still NP-complete.
If it were not, the subset-sum would also be in P, as it could be represented as F(1) | F(2) | ... F(n) where F is your function. This would have O(O(F(1)) + O(F(2)) + O(F(n))) which would still be polynomial, which is incorrect as we know it's NP-complete.
Note that if you have certain bounds on the inputs you can achieve polynomial time.
Also note that the brute-force runtime can be calculated with binomial coefficients.
The solution for k=4 in O(n^2log(n))
Step 1: Calculate the pairwise sum and sort the list. There are n(n-1)/2 sums. So the complexity is O(n^2log(n)). Keep the identities of the individuals which make the sum.
Step 2: For each element in the above list search for the complement and make sure they don't share "the individuals). There are n^2 searches, each with complexity O(log(n))
EDIT: The space complexity of the original algorithm is O(n^2). The space complexity can be reduced to O(1) by simulating a virtual 2D matrix (O(n), if you consider space to store sorted version of the array).
First about 2D matrix: sort the numbers and create a matrix X using pairwise sums. Now the matrix is ins such a way that all the rows and columns are sorted. To search for a value in this matrix, search the numbers on the diagonal. If the number is in between X[i,i] and X[i+1,i+1], you can basically halve the search space by to matrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2n) (I AM NOT VERY SURE. CAN SOMEBODY CHECK IT?).
Now, instead of using a real matrix, use a virtual matrix where X[i,j] are calculated as needed instead of pre-computing them.
Resulting time complexity: O( (nlogn)^2 ).
PS: In the following link, it says the complexity of 2D sorted matrix search is O(n) complexity. If that is true (i.e. O(log^2n) is incorrect), then the finally complexity is O(n^3).
To build on awesomo's answer... if we can assume that numbers are sorted, we can do better than O(n^k) for given k; simply take all O(n^(k-1)) subsets of size (k-1), then do a binary search in what remains for a number that, when added to the first (k-1), gives the target. This is O(n^(k-1) log n). This means the complexity is certainly less than that.
In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.
However, maybe you can do even better? I'll think about this one.
EDIT: I was initially going to add a lot proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.
We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.
First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).
Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.
If you have very few odd numbers, this could allow you to very quickly pick up terms that must be part of a winning subset, or discard ones that cannot. You can transform problems with lots of even numbers to equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the numbers of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitudes worse than brute-force, but I feel like this is probably at least as good as brute-force. Thoughts are welcome!
EDIT2: An example of the above, for illustration.
{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
{2, 2, 6, 20}, k = 3, sum = 20
= {1, 1, 3, 10}, k = 3, sum = 10
Subset {}:
{10}, k = 3, sum = 10
Failure
Subset {1, 1}:
{10}, k = 1, sum = 8
Failure
Subset {1, 3}:
{10}, k = 1, sum = 6
Failure
Subset {1, 7}:
{2, 2, 6, 20}, k = 1, sum = 12
Failure
Subset {7, 7}:
{2, 2, 6, 20}, k = 1, sum = 6
Success
The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).
Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.

Resources