How To Find K-th Smallest Element in Multiset-sum? - algorithm

Need some help designing an algorithm to solve this problem.
Let a and b be integers with a ≤ b, and let [a,b] denote the set {a, a + 1, a + 2, ..., b}. Suppose we are given n such sets, [a1,b1],...,[an,bn]. Their multiset-sum is
S = {a1, a1 + 1,..., b1, a2,a2 + 1,...,b2,...,an,an + 1, ..., bn}
For example, the multiset-sum of [5,25], [3,10], and [8,12], is
{3,4,5,5,6,6,7,7,8,8,8,9,9,9,10,10,10,...,25}
Given the sets [a1, b1],...,[an, bn] such that 0 ≤ ai, bi ≤ N and an integer k > 0, design an efficient algorithm that outputs the k-th smallest element in S, the multiset-sum of the sets. Determine the running time of the algorithm in terms of n and N.
I've already designed two helper algorithms called FindElementsBefore(x, [a1,b1]...[an,bn]) and FindElementsAfter(x, [a1,b1]...[an,bn]). These both accept an element x and each of the sets and return the number of elements in S less than x and greater than x respectively.
I've been told by my professor that using these two helper methods, I should be able to solve the above problem, but I am absolutely stumped. How do I solve this?

Use a binary search.
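You already know the smallest and largest values in your multiset-sum (the minimum ai and the maximum bi), so you have a lower and an upper bound for the k-th smallest element. Now binary search between those bounds: compute mid and compare FindElementsBefore(mid, ...) with k. If the result is less than k, the k-th smallest element is at least mid, so raise the lower bound to mid; otherwise lower the upper bound to mid - 1. The search ends at the largest value x with FindElementsBefore(x, ...) < k, which is the k-th smallest element. This needs O(log N) calls to the helper.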
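A minimal Python sketch of that binary search, under stated assumptions: `count_before` is a stand-in for the asker's FindElementsBefore (its real implementation is not shown in the question), intervals are given as `(a, b)` pairs, and `kth_smallest` is a hypothetical name.

```python
def count_before(x, intervals):
    """Stand-in for FindElementsBefore: how many elements of the
    multiset-sum are strictly less than x. Runs in O(n)."""
    return sum(max(0, min(b, x - 1) - a + 1) for a, b in intervals)


def kth_smallest(k, intervals):
    """Return the k-th smallest element (1-based) of the multiset-sum."""
    total = sum(b - a + 1 for a, b in intervals)
    if not 1 <= k <= total:
        raise ValueError("k is out of range")
    lo = min(a for a, _ in intervals)
    hi = max(b for _, b in intervals)
    # Find the largest x with count_before(x) < k.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_before(mid, intervals) < k:
            lo = mid          # the answer is at least mid
        else:
            hi = mid - 1      # the answer is below mid
    return lo


print(kth_smallest(9, [(5, 25), (3, 10), (8, 12)]))
```

With an O(n) helper and a value range of at most N + 1, this sketch does O(log N) helper calls, so O(n log N) overall.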

Related

Select k numbers maximizing sum of pairwise xor

Given a range [l, r] (where l < r), and a number k (where k <= r - l), I want to select a set S of k distinct numbers in [l, r] which maximizes the sum of pairwise xors. For example, if [l, r] = [2, 10] and k = 3 and we choose S = {4, 5, 6}, the sum of xors is d(4, 5) + d(4, 6) + d(5, 6) = 1 + 1 + 2 = 4.
Here's my thinking so far: in [l, r], for each bit index i less than or equal to the index of the highest set bit in r, the number of elements in S ^ S with the ith bit set is equal to j * (k-j), where j is the count of the elements in S with the ith bit set. To optimize this we want to select S such that, for each bit i, S contains k/2 elements with the ith bit set. This is easy for k = 2, but I'm stuck on generalizing this for k > 2.
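The per-bit counting argument can be checked with a small Python sketch. It assumes non-negative integers below 2^bits and that d is the Hamming distance (the number of set bits in the xor), as in the example; the function names are hypothetical.

```python
from itertools import combinations


def pairwise_hamming_sum(values, bits=32):
    """Sum of d(x, y) over all pairs, using the j * (k - j) count per bit."""
    k = len(values)
    total = 0
    for i in range(bits):
        j = sum((v >> i) & 1 for v in values)  # elements with bit i set
        total += j * (k - j)                   # pairs that differ in bit i
    return total


def pairwise_hamming_sum_brute(values):
    """Direct O(k^2) computation, used only as a cross-check."""
    return sum(bin(x ^ y).count("1") for x, y in combinations(values, 2))


print(pairwise_hamming_sum([4, 5, 6]))        # 4, as in the example
print(pairwise_hamming_sum_brute([4, 5, 6]))  # 4
```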
At first glance there seems to be no closed-form solution for this problem. It looks like an NP-hard optimization problem, that is, one that is not known to be solvable in polynomial time.
As is almost always possible, one can brute-force through the feasible space.
Intuitively, I would suggest looking into Locality Sensitive Hashing (LSH). LSH is normally used to find similarities between two sets, but in your case you can abuse the algorithm in the following sense:
The domain is subdivided into a few buckets.
You randomly sample points in the space [l, r].
Highly probable points (those with large Hamming distance) are placed in the buckets.
In the end you brute-force within the most probable bucket.
One can expect points with large Hamming distances to end up in the same neighborhood (hence the name Locality Sensitive Hashing). However, it is just an idea.

Kth Smallest SUM In Two Sorted Arrays - Binary Search Solution

I am trying to solve an interview practice problem.
The problem is:
Given two integer arrays sorted in ascending order and an integer k. Define sum = a + b, where a is an element from the first array and b is an element from the second one. Find the kth smallest sum out of all possible sums.
For example
Given [1, 7, 11] and [2, 4, 6].
For k = 3, return 7.
For k = 4, return 9.
For k = 8, return 15.
We define n as the size of A, and m as the size of B.
I know how to solve it using a heap (O(k log min(n, m, k)) time complexity). But the problem states that there is another method based on binary search that runs in O((m + n) log maxValue), where maxValue is the max number in A and B. Can anyone comment on how to solve it using binary search?
My thinking is that we may use x = A[] + B[] as the searching object, because the k-th x is what we want. If so, how can x be updated in binary search? How can I check if the updated x is valid or not (such a pair really exists or not)?
Thank you
The original problem is here:
https://www.lintcode.com/en/problem/kth-smallest-sum-in-two-sorted-arrays/
You can solve it with binary search plus a sliding window, and the time complexity is O((N + M) log maxvalue).
Let's first think about solving the following problem (I call it the counting problem).
You are given integers N, M, S, sequences a and b.
The length of sequence a is exactly N.
The length of sequence b is exactly M.
The sequences a and b are sorted.
Please calculate the number of pairs that satisfy a[i]+b[j]<=S (0<=i<=N-1, 0<=j<=M-1).
This counting problem can be solved with binary search in O(N log M) time.
Surprisingly, it can also be solved in O(N+M).
Binary Search Algorithm
For each i, you can find the maximum value of x that satisfies a[i] + b[x] <= S, i.e. b[x] <= S - a[i], in O(log M) with binary search.
Therefore, you can count the valid values of j for that i in O(log M), because the count is equal to x+1.
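A short Python sketch of this O(N log M) counting step, assuming `b` is sorted in ascending order; `count_pairs_bisect` is a hypothetical name. `bisect_right(b, s - x)` returns the number of elements of `b` that are at most `s - x`, which is the x+1 described above.

```python
from bisect import bisect_right


def count_pairs_bisect(a, b, s):
    """Count pairs (i, j) with a[i] + b[j] <= s; b must be sorted ascending."""
    return sum(bisect_right(b, s - x) for x in a)


print(count_pairs_bisect([1, 7, 11], [2, 4, 6], 9))  # 4 pairs: sums 3, 5, 7, 9
```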
O(N+M) Algorithm
If you do i++, the value of x is equal to or less than the previous value of x, because a[i] does not decrease.
So you can use a sliding window (two pointers) algorithm.
This runs in O(N+M), because you do the operation i++ at most N times and the operation x-- at most M times.
Solving this main problem
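You can binary search on S, using the counting problem to test the inequality (counting problem's answer >= K).
The answer is the minimum value of S that satisfies it. For example, with [1, 7, 11] and [2, 4, 6] and K = 3, the count for S = 6 is 2 and the count for S = 7 is 3, so the answer is 7.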
The time complexity is O((N + M) log maxvalue).
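Putting the pieces together, here is a sketch in Python. It assumes both arrays are non-empty, sorted ascending, and that 1 <= k <= N * M; the function names are hypothetical. The search returns the smallest S for which at least k pair sums are at most S.

```python
def count_pairs_at_most(a, b, s):
    """Two-pointer count of pairs with a[i] + b[j] <= s in O(N + M)."""
    count = 0
    j = len(b) - 1
    for x in a:                      # a is ascending, so j only moves left
        while j >= 0 and x + b[j] > s:
            j -= 1
        if j < 0:
            break
        count += j + 1
    return count


def kth_smallest_sum(a, b, k):
    """Smallest S such that at least k pair sums are <= S."""
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if count_pairs_at_most(a, b, mid) >= k:
            hi = mid                 # mid is large enough
        else:
            lo = mid + 1             # fewer than k sums are <= mid
    return lo


for k in (3, 4, 8):
    print(k, kth_smallest_sum([1, 7, 11], [2, 4, 6], k))
```

The loop ends on a value that is an actual pair sum, because the count only increases at values that occur as sums.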

Pseudo polynomial or fast solution for the relaxed subset-sum

I have an array A of positive integers [a0, a1, a2, ..., an] and a positive number K. I need to find all (or almost all) pairs of subsets U and V of array A such that:
the sum of all elements in U is less than or equal to K
the sum of all elements in V is less than or equal to K
U + V need not contain all elements of the original array A
all elements from U should come before all elements in V in initial array A. For example, let's imagine that we choose U = [a1, a3, a5] then we can start building array V only from a6. It is not allowed to use element a0, a2 or a4 in this case.
I was able to find DP solution, which is O(N^2 * K^2) (where N is total number of elements in A). Although N and K are small (< 100) it is still too slow.
I'm looking for some approximation algorithm or pseudo-polynomial dynamic programming algorithm. Bin packing problem looks similar to mine, but I'm not sure how I can apply it to my constraints...
Please advise.
EDIT: each number has upper bound equal to 50

Sum-subset with a fixed subset size

The sum-subset problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k=4, space complexity O(n), time complexity O(n^2 * log(n))
Sort the array. Starting from 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in the non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in the non-increasing order. Increase lesser sum if total sum is less than zero, decrease greater one if total sum is greater than zero, stop when total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indexes i and j in such a way, that (a[i] + a[j]) will never decrease. And for k and l, (a[k] + a[l]) should never increase. A priority queue helps to do this:
Put key=(a[i] + a[j]), value=(i = 0, j = 1) to priority queue.
Pop (sum, i, j) from priority queue.
Use sum in the above algorithm.
Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 to priority queue only if these elements were not already used. To keep track of used elements, maintain an array of maximal used 'j' for each 'i'. It is enough to use only values for 'j', that are greater, than 'i'.
Continue from step 2.
For k>4
If space complexity is limited to O(n), I cannot find anything better than using brute force for k-4 values and the above algorithm for the remaining 4 values. Time complexity O(n^(k-2) * log(n)).
For very large k integer linear programming may give some improvement.
Update
If n is very large (on the same order as maximum integer value), it is possible to implement O(1) priority queue, improving complexities to O(n^2) and O(n^(k-2)).
If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible. Precalculate a bitset for all possible sums of k/2 values, and use it to check sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).
The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).
This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w1, ..., wn} in sorted order. For all x in X, insert a key-value item (w1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (wi + x, (i, x)) and insert (wi+1 + x, (i+1, x)).
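A Python sketch of this idea, under the assumption that all four inputs are plain lists of integers; the function names are hypothetical. The heap holds one entry per element of the second list, so the iterator uses O(n) space, and the descending iterator is obtained by negating the inputs.

```python
import heapq


def sums_ascending(w, x):
    """Yield all w_i + x_j in nondecreasing order with an O(len(x)) heap."""
    if not w or not x:
        return
    w = sorted(w)
    heap = [(w[0] + xj, 0, xj) for xj in x]
    heapq.heapify(heap)
    while heap:
        s, i, xj = heapq.heappop(heap)
        yield s
        if i + 1 < len(w):
            heapq.heappush(heap, (w[i + 1] + xj, i + 1, xj))


def sums_descending(y, z):
    """Yield all y_i + z_j in nonincreasing order."""
    for s in sums_ascending([-v for v in y], [-v for v in z]):
        yield -s


def zero_in_four_sum(w, x, y, z):
    """Decide whether 0 is in W + X + Y + Z."""
    up, down = sums_ascending(w, x), sums_descending(y, z)
    a, b = next(up, None), next(down, None)
    while a is not None and b is not None:
        if a + b == 0:
            return True
        if a + b < 0:
            a = next(up, None)    # need a larger sum from W + X
        else:
            b = next(down, None)  # need a smaller sum from Y + Z
    return False


print(zero_in_four_sum([1, 2], [3, 4], [-5, 0], [-1, 10]))  # True: 2 + 4 - 5 - 1
```

Each iterator yields n^2 sums with an O(log n) heap operation per sum, which matches the O(n^2 log n) time bound stated above.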
Question that is very similar:
Is this variant of the subset sum problem easier to solve?
It's still NP-complete.
If it were not, the subset-sum would also be in P, as it could be represented as F(1) | F(2) | ... F(n) where F is your function. This would have O(O(F(1)) + O(F(2)) + O(F(n))) which would still be polynomial, which is incorrect as we know it's NP-complete.
Note that if you have certain bounds on the inputs you can achieve polynomial time.
Also note that the brute-force runtime can be calculated with binomial coefficients.
The solution for k=4 in O(n^2log(n))
Step 1: Calculate the pairwise sum and sort the list. There are n(n-1)/2 sums. So the complexity is O(n^2log(n)). Keep the identities of the individuals which make the sum.
Step 2: For each element in the above list, search for the complement and make sure the two sums don't share any of the individuals. There are n^2 searches, each with complexity O(log(n)).
EDIT: The space complexity of the original algorithm is O(n^2). The space complexity can be reduced to O(1) by simulating a virtual 2D matrix (O(n), if you consider space to store sorted version of the array).
First, about the 2D matrix: sort the numbers and create a matrix X using pairwise sums. Now the matrix is arranged in such a way that all the rows and columns are sorted. To search for a value in this matrix, search the numbers on the diagonal. If the number is in between X[i,i] and X[i+1,i+1], you can basically halve the search space to the two matrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2n) (I AM NOT VERY SURE. CAN SOMEBODY CHECK IT?).
Now, instead of using a real matrix, use a virtual matrix where X[i,j] are calculated as needed instead of pre-computing them.
Resulting time complexity: O( (nlogn)^2 ).
PS: In the following link, it says the complexity of 2D sorted matrix search is O(n). If that is true (i.e. O(log^2n) is incorrect), then the final complexity is O(n^3).
To build on awesomo's answer... if we can assume that numbers are sorted, we can do better than O(n^k) for given k; simply take all O(n^(k-1)) subsets of size (k-1), then do a binary search in what remains for a number that, when added to the first (k-1), gives the target. This is O(n^(k-1) log n). This means the complexity is certainly less than that.
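A sketch of the O(n^(k-1) log n) approach in Python, using `itertools.combinations` and `bisect_left` from the standard library. The function name and the `target` parameter are hypothetical. To avoid reusing an element, the binary search only looks at positions after the last chosen index, so the searched element is always the largest index in the subset.

```python
from bisect import bisect_left
from itertools import combinations


def has_k_subset_with_sum(nums, k, target=0):
    """Is there a k-element subset (by position) of nums summing to target?"""
    a = sorted(nums)
    n = len(a)
    if k < 1 or k > n:
        return False
    for idx in combinations(range(n), k - 1):
        start = idx[-1] + 1 if idx else 0      # search only to the right
        need = target - sum(a[i] for i in idx)
        pos = bisect_left(a, need, start, n)   # binary search in a[start:n]
        if pos < n and a[pos] == need:
            return True
    return False


print(has_k_subset_with_sum([1, 2, 2, 6, 7, 7, 20], 3, 20))  # True: 6 + 7 + 7
```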
In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.
However, maybe you can do even better? I'll think about this one.
EDIT: I was initially going to add a lot proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.
We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.
First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).
Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.
If you have very few odd numbers, this could allow you to very quickly pick up terms that must be part of a winning subset, or discard ones that cannot. You can transform problems with lots of even numbers to equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the numbers of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitudes worse than brute-force, but I feel like this is probably at least as good as brute-force. Thoughts are welcome!
EDIT2: An example of the above, for illustration.
{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
    {2, 2, 6, 20}, k = 3, sum = 20
    = {1, 1, 3, 10}, k = 3, sum = 10
    Subset {}:
        {10}, k = 3, sum = 10
        Failure
    Subset {1, 1}:
        {10}, k = 1, sum = 8
        Failure
    Subset {1, 3}:
        {10}, k = 1, sum = 6
        Failure
Subset {1, 7}:
    {2, 2, 6, 20}, k = 1, sum = 12
    Failure
Subset {7, 7}:
    {2, 2, 6, 20}, k = 1, sum = 6
    Success
The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).
Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.

Find sum in array equal to zero

Given an array of integers, find a set of at least one integer which sums to 0.
For example, given [-1, 8, 6, 7, 2, 1, -2, -5], the algorithm may output [-1, 6, 2, -2, -5] because this is a subset of the input array, which sums to 0.
The solution must run in polynomial time.
You'll have a hard time doing this in polynomial time, as the problem is known as the Subset sum problem, and is known to be NP-complete.
If you do find a polynomial solution, though, you'll have solved the "P = NP?" problem, which will make you quite rich.
The closest you get to a known polynomial solution is an approximation, such as the one listed on Wikipedia, which will try to get you an answer with a sum close to, but not necessarily equal to, 0.
This is the Subset sum problem. It's NP-complete, but there is a pseudo-polynomial time algorithm for it; see the wiki.
The problem can be solved in polynomial time if the sum of the items in the set is polynomially related to the number of items. From the wiki:
The problem can be solved as follows using dynamic programming. Suppose the sequence is
x1, ..., xn
and we wish to determine if there is a nonempty subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the boolean-valued function Q(i,s) to be the value (true or false) of
"there is a nonempty subset of x1, ..., xi which sums to s".
Thus, the solution to the problem is the value of Q(n,0).
Clearly, Q(i,s) = false if s < N or s > P so these values do not need to be stored or computed. Create an array to hold the values Q(i,s) for 1 ≤ i ≤ n and N ≤ s ≤ P.
The array can now be filled in using a simple recursion. Initially, for N ≤ s ≤ P, set
Q(1,s) := (x1 = s).
Then, for i = 2, …, n, set
Q(i,s) := Q(i − 1,s) or (xi = s) or Q(i − 1,s − xi) for N ≤ s ≤ P.
For each assignment, the values of Q on the right side are already known, either because they were stored in the table for the previous value of i or because Q(i − 1,s − xi) = false if s − xi < N or s − xi > P. Therefore, the total number of arithmetic operations is O(n(P − N)). For example, if all the values are O(n^k) for some k, then the time required is O(n^(k+2)).
This algorithm is easily modified to return the subset with sum 0 if there is one.
This solution does not count as polynomial time in complexity theory because P − N is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of N and P, which are exponential in their numbers of bits.
A more general problem asks for a subset summing to a specified value (not necessarily 0). It can be solved by a simple modification of the algorithm above. For the case that each xi is positive and bounded by the same constant, Pisinger found a linear time algorithm.[2]
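The recursion for Q(i,s) quoted above can be sketched in Python as follows. This is an illustrative version that keeps only one row of the table at a time; the function name and the variable names `neg` and `pos` (for N and P) are assumptions, not part of the quoted text.

```python
def has_zero_subset(xs):
    """Pseudo-polynomial DP: is there a nonempty subset of xs summing to 0?"""
    if not xs:
        return False
    neg = sum(x for x in xs if x < 0)   # N in the quoted text
    pos = sum(x for x in xs if x > 0)   # P in the quoted text
    # q[s - neg] holds Q(i, s) for the current i and neg <= s <= pos.
    q = [False] * (pos - neg + 1)
    q[xs[0] - neg] = True               # Q(1, s) := (x1 = s)
    for x in xs[1:]:
        nxt = q[:]                      # Q(i - 1, s)
        nxt[x - neg] = True             # (xi = s)
        for s in range(neg, pos + 1):
            prev = s - x
            if neg <= prev <= pos and q[prev - neg]:
                nxt[s - neg] = True     # Q(i - 1, s - xi)
        q = nxt
    return q[-neg]                      # Q(n, 0)


print(has_zero_subset([-1, 8, 6, 7, 2, 1, -2, -5]))  # True
print(has_zero_subset([1, 2, 3]))                    # False
```

The two nested loops do O(n(P − N)) work, matching the bound in the quoted description.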
This is the well-known Subset sum problem, which is NP-complete.
If you are interested in algorithms then you are most probably a math enthusiast, so I advise you to look at
Subset Sum problem in mathworld
and here you can find an algorithm for it:
Polynomial time approximation algorithm
initialize a list S to contain one element 0.
for each i from 1 to N do
    let T be a list consisting of xi+y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        if y<(1-c/N)z, set y=z and add z to S
if S contains a number between (1-c)s and s, output yes, otherwise no
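A direct Python transcription of the pseudocode above, offered as a sketch only. It assumes the xi are non-negative integers, the target s is positive, and 0 < c < 1; the function name and parameter names are hypothetical.

```python
def approx_subset_sum(xs, s, c):
    """Report whether some subset sum lies between (1 - c) * s and s,
    using the trimmed list of reachable sums from the pseudocode."""
    n = len(xs)
    kept = [0]                                   # the list S
    for x in xs:
        merged = sorted(set(kept) | {x + y for y in kept})   # U = T union S
        y = merged[0]
        kept = [y]
        for z in merged:                         # increasing order
            if y < (1 - c / n) * z:              # z is not close to y: keep it
                y = z
                kept.append(z)
    return any((1 - c) * s <= v <= s for v in kept)


print(approx_subset_sum([1, 2, 3], 6, 0.1))    # True: 1 + 2 + 3 = 6
print(approx_subset_sum([10, 20], 5, 0.1))     # False: no sum in [4.5, 5]
```

Every value kept in the list is a genuine subset sum, so a "yes" always corresponds to a real subset; the trimming only limits how many sums are tracked.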
