Find a subset with sum within a range - algorithm

How can I find a subset of an array such that the sum of its elements is within a given range?
For example:
let a = [ 1, 1, 3, 6, 7, 50]
let b = getSubsetSumRange(3, 5)
so b could potentially be [1, 1, 3], [1, 3], or [3]; the order doesn't matter, I only need one of them.

You would probably like to use a dynamic programming approach to solve this problem.
Let F[i][j] be true if it is possible to select some numbers among the first i numbers a[1..i] of the original array so that their sum equals j.
i varies from 1 to the length of a, and j from 0 to max inclusive, where max is the upper bound of your given range.
F[i][0] = true for all i by definition (you can always select empty subset).
Then F[i][j] = F[i - 1][j - a[i]] | F[i - 1][j] (the first term only applies when j >= a[i]).
Logically it means that if you can select a subset with sum j from elements 1..i-1, then you obviously can do it with the subset 1..i, and if you can select a subset with sum j - a[i] from elements 1..i-1, then by adding your new element a[i] to that subset, you can get your desired sum j.
After you have calculated the values of F, you can find any F[n][j] that is true for values j lying in your desired range.
Say you have found such a number k. Then the algorithm to recover the required set looks like this:
for i = n..1:
    if k >= a[i] and F[i - 1][k - a[i]] == True then
        output a[i] to the answer
        k -= a[i]
        if k == 0:
            break
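Here is a minimal Python sketch of the whole approach (it uses 0-based lists instead of the 1-based indexing above, and get_subset_sum_range is just an illustrative name mirroring the question's getSubsetSumRange):

def get_subset_sum_range(a, lo, hi):
    """Return one subset of a whose sum lies in [lo, hi], or None if none exists."""
    n = len(a)
    # F[i][j] is True if some subset of a[0..i-1] sums to exactly j
    F = [[False] * (hi + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = True                                        # the empty subset always works
    for i in range(1, n + 1):
        for j in range(1, hi + 1):
            F[i][j] = F[i - 1][j]                             # skip a[i-1]
            if j >= a[i - 1]:
                F[i][j] = F[i][j] or F[i - 1][j - a[i - 1]]   # or take a[i-1]

    # pick any reachable sum k inside [lo, hi]
    k = next((j for j in range(lo, hi + 1) if F[n][j]), None)
    if k is None:
        return None

    # walk back through the table to recover the chosen elements
    subset = []
    for i in range(n, 0, -1):
        if k >= a[i - 1] and F[i - 1][k - a[i - 1]]:
            subset.append(a[i - 1])
            k -= a[i - 1]
            if k == 0:
                break
    return subset

print(get_subset_sum_range([1, 1, 3, 6, 7, 50], 3, 5))   # [3] for this input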

Related

knapsack problem variation with almost no constraints

I have this variation of knapsack with very few constraints, and the lack of constraints means I really don't know where to start.
Given a set S of positive integers, which could be:
1 2 3 4 5 6 7 8 9 10 11 12 13
find two non-overlapping subsets that have the same total. The two subsets do not need to contain all the numbers in S.
So for the example above, one answer would be
[1,2] and [3]
Usually these problems have constraints such as the subsets needing to have specific sums, or the subsets needing to span all elements of S.
This makes it hard for me to imagine how to solve this via brute force. Every time I come up with a dynamic programming table, I can't get it to cover all possible combinations of subsets.
This problem can be solved like the subset sum problem, in pseudo-polynomial time O(n*summ).
We fill an array 0..summ with the possible subset sums, and when we meet the same sum twice, we stop.
The two equal sums might share some items - we just remove the common items, so the remaining sums contain only distinct items.
Here is an example in Python that uses bit arithmetic to store the sets (bit i+1 corresponds to using the i-th item in the sum; bit 0 only marks a cell as reachable). common contains the shared bits, and we remove them with an xor operation.
The last lines retrieve the sets themselves.
L = [2,5,11,17,29,37,43]
summ = sum(L)
A = [0]*(summ+1)
A[0] = 1
X = 0
Y = 0
for i in range(len(L)):
    for k in range(summ, L[i] - 1, -1):
        if A[k - L[i]]:
            t = A[k - L[i]] | (2 << i)
            if A[k]:
                common = A[k] & t
                X = A[k] ^ common
                Y = t ^ common
                break
            else:
                A[k] = t
    if X:
        break
first = [L[i] for i in range(len(L)) if (X & (2 << i))]
second = [L[i] for i in range(len(L)) if (Y & (2 << i))]
print(first, second)
>>> [2, 11, 29] [5, 37]
In this example the code finds equal sums of 59 for [2, 11, 17, 29] and [5, 17, 37] and removes the common 17 to get the final result with sum 42.
It is not obligatory to store whole sets in the A[] cells - we can store only the last item added to each sum, then unwind the item sequence:
L = [2,5,11,17,29,37,43]
summ = sum(L)
A = [0]*(summ+1)
A[0] = -1
last = 0
for i in range(len(L)):
    for k in range(summ, L[i] - 1, -1):
        if A[k - L[i]]:
            t = L[i]
            if A[k]:
                last = k
                break
            else:
                A[k] = t
    if last:
        break
first = set()
k = last
while k:
    first.add(A[k])
    k = k - A[k]
second = set()
second.add(t)
k = last - t
while k:
    second.add(A[k])
    k = k - A[k]
print(first.difference(second), second.difference(first))
>>> {2, 11, 29} {37, 5}

Arrange n items in k nonempty groups such that the difference between the minimum element and the maximum element of each group is minimized

Given N items with values x[1], ..., x[N] and an integer K, find a linear time algorithm to arrange these N items in K non-empty groups such that the range of each group (the difference between the minimum and maximum element values/keys in the group) is minimized, and therefore the sum of the ranges is minimum.
For example, given N=4, K=2 and the elements 1 1 4 3, the minimum sum of ranges is 1, achieved by the groups (1,1) and (4,3).
You can binary search the answer.
Assume the optimal answer is x. You should then verify whether we can group the items into k groups where the maximum difference between items within a group is at most x. This can be done in O(n) after sorting the array: traverse the sorted array and keep picking consecutive items as long as the difference between the minimum and the maximum picked for the current group does not exceed x; then start a new group and repeat. At the end count how many groups you have made. If the number of groups is more than k, we cannot group the items into k groups with x as the answer, so we should increase x. By binary searching on x we can find the minimum feasible x.
The overall complexity is O(N log N) for sorting plus O(N log max) for the binary search.
Here is a sample implementation in C++
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int k = 2;
    vector<int> v = {1, 1, 4, 3};
    sort(v.begin(), v.end());
    int n = v.size();
    int low = 0, high = *max_element(v.begin(), v.end());
    while (low < high) {
        int x = (low + high) / 2;
        int groups = 0;
        int left = 0;
        while (left < n) {
            int right = left;
            while (right < n && v[right] - v[left] <= x) {
                ++right;
            }
            ++groups;
            left = right;
        }
        // printf("x:%d groups:%d\n", x, groups);
        if (groups > k) {
            low = x + 1;
        } else {
            high = x;
        }
    }
    cout << "result is " << low << endl;
    return 0;
}
Alright, I'll assume that we want to minimize the sum of differences over all groups.
Let's sort the numbers. There's an optimal answer where each group is a consecutive segment of the sorted array (proof sketch: if A1 < B1 < A2 < B2 with A1, A2 in one group and B1, B2 in another, we can exchange A2 and B1 without increasing the answer).
Let a[l], a[l + 1], ..., a[r] be a group. Its cost is a[r] - a[l] = (a[r] - a[r - 1]) + (a[r - 1] - a[r - 2]) + ... + (a[l + 1] - a[l]). This leads to the key insight: k groups means k - 1 gaps between adjacent groups, and the answer is a[n - 1] - a[0] minus the sum of those k - 1 gaps. Thus, we just need to maximize the gaps we leave out.
Here is a final solution:
sort the array
compute differences between adjacent numbers
take k - 1 largest differences. That's exactly where the groups split.
We can find the (k-1)-th largest difference in linear time (or, if we are fine with O(N log N) time, we can just sort the differences). That's it.
Here is an example:
x = [1, 1, 4, 3], k = 2
sorted: [1, 1, 3, 4]
differences: [0, 2, 1]
taking k - 1 = 1 largest gaps: it's 2. Thus the groups are [1, 1] and [3, 4].
A slightly more contrived one:
x = [8, 2, 0, 3], k = 3
sorted: [0, 2, 3, 8]
differences: [2, 1, 5]
taking k - 1 = 2 largest gaps: they're 2 and 5. Thus, the groups are [0], [2, 3], [8] with the total cost of 1.
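A short Python sketch of the gap idea above (group_by_largest_gaps is a hypothetical name; for brevity it sorts the gaps rather than using linear-time selection, so this version runs in O(N log N)):

def group_by_largest_gaps(xs, k):
    """Split xs into k groups by cutting at the k - 1 largest gaps; return (groups, total cost)."""
    xs = sorted(xs)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    # positions of the k - 1 largest gaps (sorting them is the O(N log N) shortcut)
    cut_after = sorted(sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k - 1])
    groups, start = [], 0
    for cut in cut_after + [len(xs) - 1]:
        groups.append(xs[start:cut + 1])
        start = cut + 1
    cost = sum(g[-1] - g[0] for g in groups)
    return groups, cost

print(group_by_largest_gaps([1, 1, 4, 3], 2))   # ([[1, 1], [3, 4]], 1)
print(group_by_largest_gaps([8, 2, 0, 3], 3))   # ([[0], [2, 3], [8]], 1)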

Finding largest sum in an unsorted array using divide and conquer algorithm

I have a sequence of n real numbers stored in a array, A[1], A[2], …, A[n]. I am trying to implement a divide and conquer algorithm to find two numbers A[i] and A[j], where i < j, such that A[i] ≤ A[j] and their sum is the largest.
For example, {2, 5, 9, 3, -2, 7} will give the output 14 (5+9, not 16=9+7, because 9 comes before 7 but 9 > 7). Can anyone suggest some ideas on how to do it?
Thanks in advance.
This problem is not really suited to a divide and conquer approach. It's easy to observe that if (i, j) is a solution to this problem, then A[j] >= A[k] for every k > j, i.e. A[j] is the maximum of A[j..n].
Proof: if there existed such a k > j with A[k] > A[j], then (j, k) would be a better solution than (i, j).
So we only need to consider the values of j that satisfy this criterion.
Algorithm (pseudo-code)
maxj = n
for (j = n - 1 down to 1):
    if (a[j] > a[maxj]) then:
        maxj = j
    else:
        check if (j, maxj) is a better solution
Complexity: O(n)
C++ implementation: http://ideone.com/ENp5WR (the implementation uses an integer array, but it works the same way for floats)
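For reference, here is a short Python version of the same right-to-left scan (largest_pair_sum is just an illustrative name, and this is a sketch rather than the linked implementation; it returns None when no valid pair exists):

def largest_pair_sum(a):
    """Maximum a[i] + a[j] with i < j and a[i] <= a[j], or None if no such pair exists."""
    best = None
    max_j = len(a) - 1                       # index of the maximum of a[j..n-1]
    for j in range(len(a) - 2, -1, -1):
        if a[j] > a[max_j]:
            max_j = j                        # a[j] becomes the new suffix maximum
        elif best is None or a[j] + a[max_j] > best:
            best = a[j] + a[max_j]           # (j, max_j) is a valid candidate pair
    return best

print(largest_pair_sum([2, 5, 9, 3, -2, 7]))   # 14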
Declare two variables; during your algorithm, check whether the current number is bigger than either of the two values currently stored in the variables. If it is, replace the smaller one; if not, continue.
Here's a recursive solution in Python. I wouldn't exactly call it "divide and conquer" but then again, this problem isn't very suited to a divide and conquer approach.
def recurse(lst, pair):  # lst is the remaining list left to process
    if not lst: return  # if lst is empty, return
    for i in lst[1:]:  # for each element in lst starting from index 1
        curr_sum = lst[0] + i
        if lst[0] < i and curr_sum > pair[0] + pair[1]:  # if the first value is less than the second and curr_sum is greater than the max sum so far
            pair[0] = lst[0]
            pair[1] = i  # update pair to contain the new pair of values that give the max sum
    recurse(lst[1:], pair)  # recurse on the sub list from index 1 to the end

def find_pair(s):
    if len(s) < 2: return s[0]
    pair = [s[0], s[1]]  # initialises the pair array
    recurse(s, pair)  # passed by reference
    return pair
Sample output:
s = [2, 5, 9, 3, -2, 7]
find_pair(s) # ============> (5,9)
I think you can use a divide and conquer algorithm that runs in O(n), as described below (the merge step takes constant time).
Here is the outline of the algorithm:
Divide the problem into two half: LHS & RHS
Each half should return the largest answer meeting the requirement in that half AND the largest element in that half
Merge and return the answer to the upper level: the answer is the maximum of the LHS's answer, the RHS's answer, and the sum of the largest elements of both halves (consider this last candidate only if the RHS's largest element >= the LHS's largest element)
Here is the demonstration of the algorithm using your example: {2, 5, 9, 3, -2, 7}
Divide into {2,5,9}, {3,-2,7}
Divide into {2,5}, {9}, {3,-2}, {7}
{2,5} return max(2,5, 5+2) = 7, largest element = 5
{9} return 9, largest element = 9
{3,-2} return max(3,-2) = 3, largest element = 3
{7} return 7, largest element = 7
{2,5,9} merged from {2,5} & {9}: return max(7,9,9+5) = 14, largest element = max(9,5) = 9
{3,-2,7} merged from {3,-2} & {7}: return max(3,7,7+3) = 10, largest element = max(7,3) = 7
{2,5,9,3,-2,7} merged from {2,5,9} and {3,-2,7}: return max(14,10) = 14, largest element = max(9,7) = 9
ans = 14
Special cases like {5,4,3,2,1}, which yield no answer, need extra handling, but this does not affect the core of the algorithm or its complexity.

Counting Inversions In An Array - Special Case

Inversion count for an array indicates how far (or close) the array is from being sorted. If the array is already sorted, the inversion count is 0. If the array is sorted in reverse order, the inversion count is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j. Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).
Now, there are various algorithms to solve this in O(n log n).
There is a special case where the array only has 3 types of elements - 1, 2 and 3. Now, is it possible to count the inversions in O(n) ?
Eg 1,1,3,2,3,1,3
Yes it is. Just keep three integers a, b, c, where a is the number of 1's encountered so far, b is the number of 2's and c is the number of 3's. Given this, follow the algorithm below (I assume the numbers are given in an array arr of size n with 1-based indexing; the following is just pseudocode):
no_of_inv = 0
a = 0
b = 0
c = 0
for i from 1 to n:
    if arr[i] == 1:
        no_of_inv = no_of_inv + b + c
        a++
    else if arr[i] == 2:
        no_of_inv = no_of_inv + c
        b++
    else:
        c++
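The same counting as runnable Python (0-based, just a direct translation of the pseudocode above):

def count_inversions_123(arr):
    """Count inversions in an array whose elements are only 1, 2 and 3."""
    inv = a = b = c = 0                # how many 1s, 2s, 3s seen so far (a is kept only to mirror the pseudocode)
    for x in arr:
        if x == 1:
            inv += b + c               # every earlier 2 or 3 forms an inversion with this 1
            a += 1
        elif x == 2:
            inv += c                   # every earlier 3 forms an inversion with this 2
            b += 1
        else:
            c += 1
    return inv

print(count_inversions_123([1, 1, 3, 2, 3, 1, 3]))   # 4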
(This algorithm is extremely similar to Sasha's. I just wanted to provide an explanation as well.)
Every inversion (i, j) satisfies 0 ≤ i < j < n. Let's define S[j] to be the number of inversions of the form (i, j); that is, S[j] is the number of times A[i] > A[j] for 0 ≤ i < j. Then the total number of inversions is T = S[0] + S[1] + … + S[n - 1].
Let C[x][j] be the number of times A[i] > x for 0 ≤ i < j. Then S[j] = C[A[j]][j] for all j. If we can compute the 3n values C[x][j] in linear time, then we can compute S in linear time.
Here is some Python code:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 3, 1, 3])
>>> C = {x: np.cumsum(A > x) for x in np.unique(A)}
>>> T = sum(C[A[j]][j] for j in range(len(A)))
>>> print(T)
4
This could be made more efficient, although not in asymptotic terms, by not storing all the C values at once. The algorithm really only needs a single pass through the array. I have chosen to present it this way because it is the most concise.

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of the minimum elements of all possible contiguous subsequences of A. I know that if N is small we can look at all possible subsequences, but N is up to 10^5, so what is the best way to find this sum?
Example: let N=3 and A=[1,2,3]; then the answer is 10, since the possible contiguous subsequences are {(1),(2),(3),(1,2),(1,2,3),(2,3)}, so the sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10.
Let's fix one element a[i]. We want to know the position L of the rightmost element smaller than a[i] located to its left, and the position R of the leftmost element smaller than a[i] located to its right.
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer.
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
    while !s.isEmpty() && s.top().first > a[i]:
        s.pop()
    if !s.isEmpty():
        L[i] = s.top().second
    s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is the full pseudocode (I assume that int can hold any integer value here; in real code you should choose a type that avoids overflow. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
    s = new Stack
    L = new int[n]
    fill(L, -1)
    for i <- 0 ... n - 1:
        while !s.isEmpty() && s.top().first > a[i]:
            s.pop()
        if !s.isEmpty():
            L[i] = s.top().second
        s.push(pair(a[i], i))
    return L

int[] getRightSmallerElementPositions(int[] a):
    R = getLeftSmallerElementPositions(reversed(a))
    for i <- 0 ... n - 1:
        R[i] = n - 1 - R[i]
    return reversed(R)

int findSum(int[] a):
    L = getLeftSmallerElementPositions(a)
    R = getRightSmallerElementPositions(a)
    int res = 0
    for i <- 0 ... n - 1:
        res += (i - L[i]) * (R[i] - i) * a[i]
    return res
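For completeness, here is a runnable Python version of this approach (a sketch; instead of the <a[i], i> pair trick it makes one pass strict and the other non-strict, which has the same effect for equal elements):

def nearest_left(a, strict):
    """res[i] = closest j < i with a[j] < a[i] (strict=True) or a[j] <= a[i] (strict=False); -1 if none."""
    stack, res = [], [-1] * len(a)
    for i, x in enumerate(a):
        while stack and (a[stack[-1]] >= x if strict else a[stack[-1]] > x):
            stack.pop()
        if stack:
            res[i] = stack[-1]
        stack.append(i)
    return res

def sum_of_minimums(a):
    n = len(a)
    L = nearest_left(a, strict=False)                    # ties are resolved towards the left pass
    R_rev = nearest_left(a[::-1], strict=True)           # strictly smaller, computed on the reversed array
    R = [n - 1 - R_rev[n - 1 - i] for i in range(n)]     # R[i] = n when no smaller element exists to the right
    # a[i] is the minimum of exactly (i - L[i]) * (R[i] - i) contiguous subsequences
    return sum((i - L[i]) * (R[i] - i) * a[i] for i in range(n))

print(sum_of_minimums([1, 2, 3]))   # 10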
If the list is sorted, you can consider all contiguous subsequences of size 1, then 2, then 3, up to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N:
    total_sum += sum(A[1:N-(set_size-1)])
First, the sets with one element {{1}, {2}, {3}}: sum each of the elements.
Then, sets of two element {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
import numpy as np

A = [1, 2, 3]
# This yields 3, 2, 1
scale = range(len(A), 0, -1)
# Take the element-wise product of the vectors, and sum
print(sum(a*b for (a, b) in zip(A, scale)))
# Or just use the dot product
print(np.dot(A, scale))

Resources