Maximum Sum for Subarray with fixed cutoff - algorithm

I have a list of integers, and I need to find a way to get the maximum sum of a subset of them, adding elements to the total until the sum is equal to (or greater than) a fixed cutoff. I know this seems similar to the knapsack, but I was unsure whether it was equivalent.
Sorting the array and adding the maximum element until sum <= cutoff does not work. Observe the following list:
list = [6, 5, 4, 4, 4, 3, 2, 2, 1]
cutoff = 15
For this list, doing it the naive way results in a sum of 15, which is very sub-optimal. As far as I can see, the maximum you could arrive at using this list is 20, by adding 4 + 4 + 4 + 2 + 6. If this is just a different version of knapsack, I can just implement a knapsack solution, as I probably have small enough lists to get away with this, but I'd prefer to do something more efficient.

First of all, in any sum you won't produce a worse result by adding the largest element last. So there is no harm in assuming, as a first step, that the elements are sorted from smallest to largest.
Now you can use a dynamic programming approach similar to the usual subset sum.
def best_cutoff_sum(cutoff, elements):
    elements = sorted(elements)
    # Maps each reachable sum to a nested path [last_element, previous_path].
    sums = {0: None}
    for e in elements:
        next_sums = {}
        for v, path in sums.items():
            next_sums[v] = path
            # Only sums still below the cutoff may be extended.
            if v < cutoff:
                next_sums[v + e] = [e, path]
        sums = next_sums
    best = max(sums.keys())
    return (best, sums[best])
print(best_cutoff_sum(15, [6, 5, 4, 4, 4, 3, 2, 2, 1]))
With a little work you can turn the path from the nested array it currently is to whatever format you want.
If your list of non-negative elements has n elements, your cutoff is c and your maximum value is v, then this algorithm will take time O(n * (c + v)).

Related

Dynamic Programming Problem for "disorder" in array

Given a sequence a = (a1, a2, ..., an) of n positive integers. We call the disorder D(ak) of the prefix ak = (a1, a2, ..., ak) the difference between its maximum and its minimum. We call the total disorder the sum of D(ak) over all prefixes from k=2 to k=n. We are looking for a DP algorithm with a recursive solution for b*, which is a permutation of a that achieves the minimum total disorder.
Examples:
a=(6, 2, 3, 1, 3, 3) then b*=(3, 3, 3, 2, 1, 6) [with D(b*) = 0 + 0 + 1 + 2 + 5 = 8]
a=(1, 3, 3, 3, 6, 6) then b*=(3, 3, 3, 6, 6, 1) [with D(b*) = 0 + 0 + 3 + 3 + 5 = 11]
The only thing I was able to prove is that the last number of b* will be either the max of a or the min of a.
Please help.
First sort the input array, and then consider building the result permutation backwards from the end towards the start.
For every element you will either remove the first or last element of the sorted array. Also, for every position k, the disorder of the subarray ending at that position is known -- it's just the difference between the two ends of the remaining element array.
To find the optimal selection, then, you can use DP[k,n] = the minimum disorder so far if we've chosen n elements from the front of the sorted array (with the remainder chosen from the back).
DP[k,n] is easily calculated from DP[k+1,n] and DP[k+1,n-1], and the minimum DP[0,?] is the minimum total disorder.
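A minimal Python sketch of this idea, written as an equivalent interval DP rather than with the DP[k,n] indexing above. It relies on the observation that every prefix of an optimal permutation is a contiguous block of the sorted array. The function name min_total_disorder and the table best are illustrative assumptions, not part of the original answer.

```python
def min_total_disorder(a):
    """Minimum total disorder over all permutations of a (sketch)."""
    s = sorted(a)
    n = len(s)
    if n == 0:
        return 0
    # best[i][j]: minimum total disorder when the prefix consists of
    # exactly the sorted elements s[i..j]; a single element costs 0.
    best = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # The last element placed was either s[i] or s[j]; in both
            # cases the disorder of this prefix is s[j] - s[i].
            best[i][j] = (s[j] - s[i]) + min(best[i + 1][j], best[i][j - 1])
    return best[0][n - 1]


print(min_total_disorder([6, 2, 3, 1, 3, 3]))  # 8
print(min_total_disorder([1, 3, 3, 3, 6, 6]))  # 11
```

This fills O(n^2) table entries with O(1) work each, after an O(n log n) sort.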

Find the maximum weight that can be collected from a store under given limit

I faced this problem in placement exam of SAP labs:
It's your birthday, so you are given a bag with a fixed space 'S'. You can go to a store and pick as many items you like which can be accommodated inside your bag. The store has 'n' items and each item occupies a space s[i]. You have to find out the maximum space in bag which you can fill.
For example, say the limit of your bag is S = 15 and the store has 10 items of sizes [1, 7, 3, 5, 4, 10, 6, 15, 20, 8]. Now you can fill 15 space in various ways such as [1, 7, 3, 4], [7, 3, 5], [15], [5, 10] and many more. So you return 15.
Note: There is a quirk in the sizes of the items. All of the items but at most 15 follow this rule: *for all i, j, either size[i]>=2*size[j]+1 or size[j] >= 2*size[i] +1 if i ≠ j.*
Constraints:
1<= n <= 60.
1<= size[i] <= 10^17.
1<= S <= 10^18.
Example: S = 9, n = 5, sizes = [1, 7, 4, 4, 10].
Output: 8. You can't fill exactly 9 space in any way. You can fill 8 space either by using [1, 7] or [4, 4].
Let's call x the set of elements that follow that rule. Note that for this set of elements, we have some nice properties:
Given x in sorted ascending order, sum(x[i..j]) < x[j + 1]
To solve maximum sum <= k, just iterate in sorted descending order and subtract x[i] from k whenever possible. The original k minus the remaining k is the solution. Assuming the elements were already sorted, this is O(|x|).
One way to obtain this set is to iterate items sorted by size in ascending order and add to set if :
set has no elements or
current element >= 2 * size[lastElementAdded] + 1
Now we are left with at most 15 items that do not follow this rule, so we can't use the efficient method from before. For each of these items, we can either put it in the bag or not, which leads to at most 2^15 possible sums. For each of those sums, we can run our method for the elements that follow the rule on the remaining capacity.
Overall complexity: 2^15 * (n - 15). For n = 60, this should be solved in less than a second.
As an exercise: by using accumulated sums and binary search, it can be brought down to 2^15 * log2(n - 15).
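The approach above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the greedy split described above to separate the rule-following items from the rest, and it assumes only a small number of items (at most 15, per the problem statement) end up outside the rule. The names split_items, greedy_fill and max_fill are hypothetical.

```python
def split_items(sizes):
    """Greedy split: items where each is >= 2 * previous + 1, and the rest."""
    regular, others = [], []
    for s in sorted(sizes):
        if not regular or s >= 2 * regular[-1] + 1:
            regular.append(s)
        else:
            others.append(s)
    return regular, others


def greedy_fill(regular_desc, capacity):
    """Best fill using rule-following items, given in descending order."""
    remaining = capacity
    for s in regular_desc:
        if s <= remaining:
            remaining -= s
    return capacity - remaining


def max_fill(sizes, capacity):
    regular, others = split_items(sizes)
    regular_desc = regular[::-1]
    best = 0
    # Try every subset of the items that do not follow the rule.
    for mask in range(1 << len(others)):
        used = sum(others[b] for b in range(len(others)) if (mask >> b) & 1)
        if used <= capacity:
            best = max(best, used + greedy_fill(regular_desc, capacity - used))
    return best


print(max_fill([1, 7, 3, 5, 4, 10, 6, 15, 20, 8], 15))  # 15
```

The greedy step is valid because each rule-following item is larger than the sum of all smaller ones, so skipping an item that fits can never pay off.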

Algorithm to find combination of n numbers with largest sum

Problem is simple -
Suppose I have an array of following numbers -
4,1,4,5,7,4,3,1,5
I have to find number of sets of k elements each that can be created from above numbers having largest sum. Two sets are considered to be different if they have at least one different element.
e.g.
if k = 2, then there can be two sets - {7,5} and {7,5}. Note: 5 appears twice in above array.
I think I can start with something like-
1. Sort array
2. Create two arrays: one for the distinct numbers and another, in parallel, for each number's occurrence count.
But I am stuck now. Any suggestions?
The algorithm is as follows:
1) Sort elements in descending order.
2) Look at this array. It may look something like this:
a ... a b ... b c ... c d ...
| <- k -> |
Now obviously all elements a and b will be in the sets with the largest sum. You can't replace any of them with a smaller element, because then the sum wouldn't be the largest possible. So you have no choice here, you have to choose all a and b for any of the sets.
On the other hand only some of the elements c will be in those sets. So the answer is just the number of possibilities, to choose c's to fill the positions left in the sets, after you have taken all larger elements. That is the binomial coefficient:
count of c's choose (k - (count of elements larger than c))
For example for an array (already sorted here)
[9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
and k = 6, you must choose 9, 8 and both 7's for every set with the largest sum (which is 41). And then you can choose any two out of the four 5's. So the result will be 4 choose 2 = 6.
With the same array and k = 4, the result would be x choose 0 = 1 (that unique set is {9, 8, 7, 7}), with k = 7 the result would be 4 choose 3 = 4, and with k = 9: 2 choose 1 = 2 (choosing any 4 for the set with the largest sum).
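As a rough sketch of the counting rule above in Python (assuming 1 <= k <= len(values) and Python 3.8+ for math.comb; the function name count_max_sum_sets is illustrative):

```python
from math import comb


def count_max_sum_sets(values, k):
    """Number of k-element selections that reach the largest possible sum."""
    ordered = sorted(values, reverse=True)
    c = ordered[k - 1]                         # smallest value that still makes the cut
    larger = sum(1 for v in ordered if v > c)  # these elements are forced into every set
    available = ordered.count(c)               # copies of c to choose from
    return comb(available, k - larger)


data = [9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
print(count_max_sum_sets(data, 6))  # 6
print(count_max_sum_sets(data, 9))  # 2
```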
EDIT: I edited the answer, because we figured it out that OP needs to count multisets.
First, find the largest k numbers in the array. This is of course easy, and if k is very small, you can do it in O(n * k) by performing k linear scans. If k is not so small, you can use a binary heap or a priority queue, or just sort the array, which is respectively O(n * log(k)) or O(n * log(n)).
Let's assume that you have computed the k largest numbers. Of course, every set of size k with the largest sum has to consist of exactly these k largest values. On the other hand, any different set doesn't have the largest sum.
Let count[i] be the number of occurrences of number i in the input sequence.
Let occ[i] be the number of occurrences of number i in the largest k numbers.
We can compute these both tables in very different ways, for example using a hash table or if input numbers are small, you can use an array indexed by these numbers.
Let B be the array of distinct numbers from the largest k numbers.
Let m be the size of B.
Now let's compute the answer. We will do it in m steps. After the i-th step we will have computed the number of different multisets consisting of the first i numbers from B. At the beginning the result is 1, since there is only one empty multiset. In the i-th step, we multiply the current result by the number of possible choices of occ[B[i]] elements from count[B[i]] elements, which is equal to binomial(count[B[i]], occ[B[i]]).
For example, let's consider your instance with added one more 7 at the end and k set to 3:
k = 3
A = [4, 1, 4, 5, 7, 4, 3, 1, 5, 7]
The largest three numbers in A are 7, 7, 5
At the beginning we have:
count[7] = 2
count[5] = 2
occ[7] = 2
occ[5] = 1
result = 1
B = [7, 5]
We start with the first element in B which is 7. Its count is 2 and its occ is also 2, so we do:
// binomial(2, 2) is 1
result = result * binomial(2, 2)
Next element in B is 5, its count is 2 and its occ is 1, so we do:
// binomial(2, 1) is 2
result = result * binomial(2, 1)
And the final result is 2, since there are two different multisets [7, 7, 5]
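A short Python sketch of the same computation, using collections.Counter for the count and occ tables (an implementation choice assumed here, as is the function name count_multisets; it assumes 1 <= k <= len(values)):

```python
from collections import Counter
from math import comb


def count_multisets(values, k):
    count = Counter(values)                          # occurrences in the whole input
    occ = Counter(sorted(values, reverse=True)[:k])  # occurrences among the k largest
    result = 1
    for number in occ:                               # the distinct numbers B
        result *= comb(count[number], occ[number])
    return result


print(count_multisets([4, 1, 4, 5, 7, 4, 3, 1, 5, 7], 3))  # 2
```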
I'd create a sorted dictionary of the frequencies of occurrence of the numbers in the input. Then take the two largest numbers and multiply the number of times they occur.
In C++, it could look something like this:
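#include <map>
#include <vector>

std::vector<int> inputs { 4, 1, 4, 5, 7, 3, 1, 5};
std::map<int, int> counts;   // value -> number of occurrences, ordered by value
for (auto i : inputs)
    ++counts[i];
auto last = counts.rbegin();           // entry for the largest value
int largest_count = last->second;
int second_count = (++last)->second;   // entry for the second-largest value
int set_count = largest_count * second_count;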
You can do the following:
1) Sort the elements in descending order;
2) define variable answer=1;
3) Start from the beginning of the array and for each new value you see, count the number of its occurrence (lets call this variable count). every time do: answer = answer * count. The pseudo-code should look like this.
find_count(Array A, K)
{
    sort(A,'descending);
    int answer=1;
    int count=1;
    for (int i=1,j=1; i<K && j<A.length;j++)
    {
        if(A[i] != A[i-1])
        {
            answer = answer *count;
            i++;
            count=1;
        }
        else
            count++;
    }
    return answer;
}

add elements of array thats sum equals the largest element

What is a way to find elements of an array whose sum equals the largest element in the array?
For example, for the array [4, 6, 23, 10, 1, 3] I first sort it, resulting in [1, 3, 4, 6, 10, 23], then I pop the last element, max = 23. I'm left with [1, 3, 4, 6, 10] and need a way to find the elements that add up to 23, which are 3 + 4 + 6 + 10 = 23. The elements don't have to be adjacent; they can be at arbitrary positions in the array, but they must add up to max.
I can find the permutations of the sorted array from 2 elements to n-1 elements, sum them and compare them to max, but that seems inefficient. Please help.
This is exactly the subset sum problem, which is NP-Complete, but if your numbers are relatively small integers, there is an efficient pseudo-polynomial solution using Dynamic Programming:
D(i,0) = TRUE
D(0,x) = FALSE x>0
D(i,x) = D(i-1,x) OR D(i-1,x-arr[i])
If there is a solution, you need to step back in the matrix created by the DP solution, and "record" each choice you have made along the way, to get the elements used for the summation. This thread deals with how to find the actual elements in a very similar problem (known as knapsack problem), which is solved similarly: How to find which elements are in the bag, using Knapsack Algorithm [and not only the bag's value]?
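A minimal sketch of this DP with the step-back reconstruction, in Python. It assumes all values are positive integers; the function name subset_with_sum and the table name reachable are illustrative.

```python
def subset_with_sum(values, target):
    """Return a list of elements summing to target, or None if impossible."""
    n = len(values)
    # reachable[i][x] is True when some subset of values[:i] sums to x.
    reachable = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        reachable[i][0] = True
    for i in range(1, n + 1):
        v = values[i - 1]
        for x in range(1, target + 1):
            reachable[i][x] = reachable[i - 1][x] or (
                x >= v and reachable[i - 1][x - v]
            )
    if not reachable[n][target]:
        return None
    # Step back through the table: an element was used exactly when the
    # sum is not reachable without it.
    chosen, x = [], target
    for i in range(n, 0, -1):
        if not reachable[i - 1][x]:
            chosen.append(values[i - 1])
            x -= values[i - 1]
    return chosen


print(subset_with_sum([1, 3, 4, 6, 10], 23))  # [10, 6, 4, 3]
```

The table has (n + 1) * (target + 1) entries, which is why this is only practical when the target is a reasonably small integer.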

find kth smallest number in O(logn) time

Here is the problem: given an unsorted array a[n], I need to find the kth smallest number in the range [i, j], where 1<=i<=j<=n and k<=j-i+1.
Typically I would use quick-find to do the job, but it is not fast enough if there are many query requests with different ranges [i, j]. I can't figure out an algorithm that answers a query in O(logn) time (preprocessing is allowed).
Any idea is appreciated.
PS
Let me make the problem easier to understand. Any kinds of preprocessing is allowed, but the query needs to be done in O(logn) time. And there will be many (more than 1) queries, like find the 1st in range [3,7], or 3rd in range [10,17], or 11th in range [33, 52].
By range [i, j] I mean in the original array, not sorted or something.
For example, a[5] = {3,1,7,5,9}, query 1st in range [3,4] is 5, 2nd in range [1,3] is 5, 3rd in range [0,2] is 7.
If pre-processing is allowed and not counted towards the time complexity, just use that to construct sub-lists so that you can efficiently find the element you're looking for. As with most optimisations, this trades space for time.
Your pre-processing step is to take your original list of n numbers and create a number of new sublists.
Each of these sublists is a portion of the original, starting with the nth element, extending for m elements and then sorted. So your original list of:
{3, 1, 7, 5, 9}
gives you:
list[0][0] = {3}
list[0][1] = {1, 3}
list[0][2] = {1, 3, 7}
list[0][3] = {1, 3, 5, 7}
list[0][4] = {1, 3, 5, 7, 9}
list[1][0] = {1}
list[1][1] = {1, 7}
list[1][2] = {1, 5, 7}
list[1][3] = {1, 5, 7, 9}
list[2][0] = {7}
list[2][1] = {5, 7}
list[2][2] = {5, 7, 9}
list[3][0] = {5}
list[3][1] = {5,9}
list[4][0] = {9}
This isn't a cheap operation (in time or space) so you may want to maintain a "dirty" flag on the list so you only perform it the first time after you do a modifying operation (insert, delete, change).
In fact, you can use lazy evaluation for even more efficiency. Basically set all sublists to an empty list when you start and whenever you perform a modifying operation. Then, whenever you attempt to access a sublist and it's empty, calculate that sublist (and that one only) before trying to get the kth value out of it.
That ensures sublists are evaluated only when needed and cached to prevent unnecessary recalculation. For example, if you never ask for a value from the 3-through-6 sublist, it's never calculated.
The pseudo-code for creating all the sublists is basically (for loops inclusive at both ends):
for n = 0 to a.lastindex:
    create array list[n]
    for m = 0 to a.lastindex - n
        create array list[n][m]
        for i = 0 to m:
            list[n][m][i] = a[n+i]
        sort list[n][m]
The code for lazy evaluation is a little more complex (but only a little), so I won't provide pseudo-code for that.
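For illustration only, one way the lazy variant might look in Python. It uses a dictionary keyed by (start, end) instead of the nested list structure, which is an assumption of this sketch, as are the class and method names.

```python
class LazyRangeKth:
    """Caches sorted sublists on demand; any modification clears the cache."""

    def __init__(self, values):
        self.values = list(values)
        self.cache = {}  # (start, end) -> sorted copy of values[start..end]

    def set(self, index, value):
        # A modifying operation invalidates every cached sublist.
        self.values[index] = value
        self.cache.clear()

    def kth_smallest(self, start, end, k):
        """kth smallest (1-based) in index range start..end inclusive."""
        key = (start, end)
        if key not in self.cache:
            # Computed only the first time this range is requested.
            self.cache[key] = sorted(self.values[start:end + 1])
        return self.cache[key][k - 1]


ranges = LazyRangeKth([3, 1, 7, 5, 9])
print(ranges.kth_smallest(3, 4, 1))  # 5
print(ranges.kth_smallest(1, 3, 2))  # 5
print(ranges.kth_smallest(0, 2, 3))  # 7
```

The first query for a given range pays for the sort; repeated queries on the same range are a dictionary lookup plus an index.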
Then, in order to find the kth smallest number in the range i through j (where i and j are the original indexes), you simply look up list[i][j-i][k-1], a very fast O(1) operation:
              +-------------------------+
              |                         |
              |                         v
1st in range [3,4] (values 5,9),   list[3][4-3=1][1-1=0] = 5
2nd in range [1,3] (values 1,7,5), list[1][3-1=2][2-1=1] = 5
3rd in range [0,2] (values 3,1,7), list[0][2-0=2][3-1=2] = 7
|             |                         ^    ^    ^
|             |                         |    |    |
|             +-------------------------+----+    |
|                                                 |
+-------------------------------------------------+
Here's some Python code which shows this in action:
orig = [3,1,7,5,9]
print(orig)
print("=====")
list = []
for n in range(len(orig)):
    list.append([])
    for m in range(len(orig) - n):
        list[-1].append([])
        for i in range(m+1):
            list[-1][-1].append(orig[n+i])
        list[-1][-1] = sorted(list[-1][-1])
        print("(%d,%d)=%s" % (n, m, list[-1][-1]))
print("=====")
# Gives xth smallest in index range y through z inclusive.
x = 1; y = 3; z = 4; print("(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1]))
x = 2; y = 1; z = 3; print("(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1]))
x = 3; y = 0; z = 2; print("(%d,%d,%d)=%d" % (x, y, z, list[y][z-y][x-1]))
print("=====")
As expected, the output is:
[3, 1, 7, 5, 9]
=====
(0,0)=[3]
(0,1)=[1, 3]
(0,2)=[1, 3, 7]
(0,3)=[1, 3, 5, 7]
(0,4)=[1, 3, 5, 7, 9]
(1,0)=[1]
(1,1)=[1, 7]
(1,2)=[1, 5, 7]
(1,3)=[1, 5, 7, 9]
(2,0)=[7]
(2,1)=[5, 7]
(2,2)=[5, 7, 9]
(3,0)=[5]
(3,1)=[5, 9]
(4,0)=[9]
=====
(1,3,4)=5
(2,1,3)=5
(3,0,2)=7
=====
Current solution is O( (logn)^2 ). I am pretty sure it can be modified to run in O(logn). The main advantage of this algorithm over paxdiablo's algorithm is space efficiency. This algorithm needs O(nlogn) space, not O(n^2) space.
First, the complexity of finding kth smallest element from two sorted arrays of length m and n is O(logm + logn). Complexity of finding kth smallest element from arrays of lengths a,b,c,d.. is O(loga+logb+.....).
Now, sort the whole array and store it. Sort the first half and the second half of the array and store them, and so on. You will have 1 sorted array of length n, 2 sorted arrays of length n/2, 4 sorted arrays of length n/4 and so on. Total memory required = 1*n+2*n/2+4*n/4+8*n/8...= nlogn.
Once you have i and j, figure out the list of subarrays which, when concatenated, give you the range [i,j]. There are going to be O(logn) arrays. Finding the kth smallest number among them would take O( (logn)^2) time.
Example for the last paragraph:
Assume the array is of size 8 (indexed from 0 to 7). You have the following sorted lists:
A:0-7, B:0-3, C:4-7, D:0-1, E:2-3, F:4-5, G:6-7.
Now construct a tree with pointers to these arrays such that every node contains its immediate constituents. A will be root, B and C are its children and so on.
Now implement a recursive function that returns a list of arrays.
def getArrays(node, i, j):
    if i == node.min and j == node.max:
        return [node]
    if i <= node.left.max:
        if j <= node.left.max:
            # (i,j) is located within left node
            return getArrays(node.left, i, j)
        else:
            # (i,j) is spread over left and right node
            return (getArrays(node.left, i, node.left.max)
                    + getArrays(node.right, node.right.min, j))
    else:
        # (i,j) is located within right node
        return getArrays(node.right, i, j)
Preprocess: Make an nxn array where the [k][r] element is the kth smallest element of the first r elements (1-indexed for convenience).
Then, given some particular range [i,j] and value for k, do the following:
Find the element at the [k][j] slot of the matrix; call this x.
go down the i-1 column of your matrix and find how many values in it are smaller than or equal to x (treat column 0 as having 0 smaller entries). By construction, this column will be sorted (all columns will be sorted), so it can be found in log time. Call this value s
Find the element in the [k+s][j] slot of the matrix. This is your answer.
E.g., given 3 1 7 5 9
3 1 1 1 1
X 3 3 3 3
X X 7 5 5
X X X 7 7
X X X X 9
Now, if we're asked for the 2nd smallest in [2,4] range (again, 1-indexing), I first find the 2nd smallest in [1,4] range which is 3. I then look at column 1 and see that there is 1 element less than or equal to 3. Finally, I find the 3rd smallest in [1,4] range at [3][4] slot which is 5, as desired.
This takes n^2 space, and log(n) lookup time.
This one does not require preprocessing but is somewhat slower than O(logN). It's significantly faster than a naive iterate-and-count, and can support dynamic modification of the sequence.
It goes like this. Suppose the length n satisfies n=2^x for some x. Construct a segment tree whose root node represents [0,n-1]. For each node, if it represents a segment [a,b] with b>a, let it have two child nodes representing [a,(a+b)/2] and [(a+b)/2+1,b]. (That is, do a recursive divide-by-two.)
Then, on each node, maintain a separate binary search tree for the numbers within that segment. Therefore, each modification of the sequence takes O(logN) [on the segment tree] * O(logN) [on the BST]. Queries can be done like this: let Q(a,b,x) be the rank of x within segment [a,b]. Obviously, if Q(a,b,x) can be computed efficiently, a binary search on x can compute the desired answer effectively (with an extra O(logE) factor).
Q(a,b,x) can be computed as follows: find the smallest number of segments that make up [a,b], which can be done in O(logN) on the segment tree. For each segment, query the binary search tree for that segment for the number of elements less than x. Add all these numbers to get Q(a,b,x).
This should be O(logN*logE*logN). Well, not exactly what you asked for, though.
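A rough Python sketch of the rank-counting idea for a static array. It swaps the per-node binary search trees for plain sorted lists searched with bisect, so it does not support modification, and it binary-searches over the sorted array values rather than over the value range. The class and method names are illustrative assumptions.

```python
from bisect import bisect_right


class MergeSortTree:
    def __init__(self, values):
        self.n = len(values)
        # tree[n + i] holds element i; each internal node holds the
        # sorted contents of its two children.
        self.tree = [[] for _ in range(2 * self.n)]
        for idx, v in enumerate(values):
            self.tree[self.n + idx] = [v]
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])
        self.sorted_values = sorted(values)

    def count_le(self, lo, hi, x):
        """Number of elements <= x among values[lo..hi] (0-based, inclusive)."""
        total = 0
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                total += bisect_right(self.tree[lo], x)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_right(self.tree[hi], x)
            lo >>= 1
            hi >>= 1
        return total

    def kth_smallest(self, lo, hi, k):
        """Smallest array value v with at least k elements <= v in the range."""
        left, right = 0, self.n - 1
        while left < right:
            mid = (left + right) // 2
            if self.count_le(lo, hi, self.sorted_values[mid]) >= k:
                right = mid
            else:
                left = mid + 1
        return self.sorted_values[left]


tree = MergeSortTree([3, 1, 7, 5, 9])
print(tree.kth_smallest(3, 4, 1))  # 5
print(tree.kth_smallest(1, 3, 2))  # 5
print(tree.kth_smallest(0, 2, 3))  # 7
```

Each query performs O(log n) rank computations, each touching O(log n) nodes with an O(log n) binary search, so this sketch is polylogarithmic per query rather than O(log n).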
In O(log n) time it's not possible to read all of the elements of the array. Since it's not sorted, and there's no other provided information, this is impossible.
There's no way you can do better than O(n) in both worst and average case. You have to look at every single element.

Resources