Find the max sum of removed elements from head or tail? - algorithm

You are given a stack of N integers such that the first element represents the top of the stack and the last element represents the bottom of the stack. You need to pop at least one element from the stack. At any one moment, you can convert stack into a queue. The bottom of the stack represents the front of the queue. You cannot convert the queue back into a stack. Your task is to remove exactly K elements such that the sum of the K removed elements is maximised.
Print the maximum possible sum M of the K removed elements
Input: N=10, K=4
stack = [10 9 1 2 3 4 5 6 7 8]
Output: 34
Explanation:
Pop two elements from the stack. i.e {10,9}
Then convert the stack into queue and remove first 2 elements from the
queue. i.e {8,7}.
The maximum possible sum is 10+9+8+7 = 34
I was thinking of solving it with greedy algo, with code like following:
stk = [10, 9, 1, 30, 3, 4, 5, 100, 1, 8]
n = 10
k = 4
removed = 0
top = 0
sum = 0
bottom = len(stk)-1
while removed < k:
if stk[top] >= stk[bottom]:
sum += stk[top]
top += 1
else:
sum += stk[bottom]
bottom -= 1
removed += 1
print(sum)
under normal test case(given one) it'll work, but it'll fail in many other scenarios like:
[10, 9, 1, 30, 3, 4, 5, 100, 1, 8]
Any suggestions on how to improve on this?

The data structure gives you the option to select 1,...,n values from the stop of the structure and then m elements from the bottom of the structure where K = m+n. You can find the maximum by starting out with summing the first K elements of the structure and then work your way backwards by replacing the n'th element with the first element from the back. Working backwards to you get to only one stack element. Keep track of the maximum along the way.
In python:
lst = [10, 9, 1, 30, 3, 4, 5, 100, 1, 8]
K = 4
sum_ = sum(lst[0:K])
max_so_far = sum_
for i in range(K-1):
sum_ = sum_ - lst[K-1-i] + lst[-i-1]
max_so_far = max(sum_, max_so_far)
print(max_so_far)
The running time is O(n).

If you carefully look at the problem, it basically boils down to:
Select x elements from the start and y elements from the end of the array, such that the sum is maximum and x+y = K.
That is a pretty simple problem to solve, which basically requires this algorithm:
ans = sum(last K elements)
for i in range(0, K):
ans = max(ans, ans + array[i] - array[n-(k-i+1)]) #picking elements from the start and removing elements from the end

// code in C language
/*
Idea :
W.K.T even it is a stack or a queue we will pop/delete takes place at end or starting of array. we are making use of that.
2.K elements has to be removed in a stack such that their SUM is max
popping of elements has to be at start or end of stack and no. of elements has to be removed are 'K'.
4.We are taking all possibilities of K
i.e. if we take n elements from the front part of stack then K-n elements ahs to selected from back of stack
eg : K = 5 --> possibilities:
(0 5) (1 4) (2 3) (3 2) (4 1) (5 0) -->as you can observe first indx is increasing to K and second indx is decaying to 0 From K.
(0 5) means selecting 0 elements from front of stack and 5 elementts from back of stack
similarly, the other possibilities.
We are taking each possibility and calculating their sum , storing them in separate array 'Sum' and getting max of the sum[] and it will be answer . And the possibility (suppose a b) at which we are getting Max will be our moment where you can convert stack into a queue . Before stack converted into stack we have popped a elements from stack i.e. front of array and after stack converted to queue we deleting b elements which will be done at front of queue i.e. back of the stack.
*/
enter image description here

Related

Number of ways to pick the elements of an array?

How to formulate this problem in code?
Problem Statement:
UPDATED:
Find the number of ways to pick the element from the array which are
not visited.
We starting from 1,2,.....,n with some (1<= x <= n) number of elements already picked/visited randomly which is given in the input.
Now, we need to find the number of ways we can pick rest of the (n - x) number of elements present in the array, and the way we pick an element is defined as:
On every turn, we can only pick the element which is adjacent(either left or right) to some visited element i.e
in an array of elements:
1,2,3,4,5,6 let's say we have visited 3 & 6 then we can now pick
2 or 4 or 5, as they are unvisited and adjacent to visited nodes, now say we pick 2, so now we can pick 1 or 4 or 5 and continues.
example:
input: N = 6(number of elements: 1, 2, 3, 4, 5, 6)
M = 2(number of visited elements)
visited elements are = 1, 5
Output: 16(number of ways we can pick the unvisited elements)
ways: 4, 6, 2, 3
4, 6, 3, 2
4, 2, 3, 6
4, 2, 6, 3
4, 3, 2, 6
4, 3, 6, 2
6, 4, 2, 3
6, 4, 2, 3
6, 2, 3, 4
6, 2, 4, 3
2, 6, 4, 3
2, 6, 3, 4
2, 4, 6, 3
2, 4, 3, 6
2, 3, 4, 6
2, 3, 6, 4.
Some analysis of the problem:
The actual values in the input array are assumed to be 1...n, but these values do not really play a role. These values just represent indexes that are referenced by the other input array, which lists the visited indexes (1-based)
The list of visited indexes actually cuts the main array into subarrays with smaller sizes. So for example, when n=6 and visited=[1,5], then the original array [1,2,3,4,5,6] is cut into [2,3,4] and [6]. So it cuts it into sizes 3 and 1. At this point the index numbering loses its purpose, so the problem really is fully described with those two sizes: 3 and 1. To illustrate, the solution for (n=6, visited=[1,5]) is necessarily the same as for (n=7, visited[1,2,6]): the sizes into which the original array is cut, are the same in both cases (in a different order, but that doesn't influence the result).
Algorithm, based on a list of sizes of subarrays (see above):
The number of ways that one such subarray can be visited, is not that difficult: if the subarray's size is 1, there is just one way. If it is greater, then at each pick, there are two possibilities: either you pick from the left side or from the right side. So you get like 2*2*..*2*1 possibilities to pick. This is 2size-1 possibilities.
The two outer subarrays are an exception to this, as you can only pick items from the inside-out, so for those the number of ways to visit such a subarray is just 1.
The number of ways that you can pick items from two subarrays can be determined as follows: count the number of ways to pick from just one of those subarrays, and the number of ways to pick from the other one. Then consider that you can alternate when to pick from one sub array or from the other. This comes down to interweaving the two sub arrays. Let's say the larger of the two sub arrays has j elements, and the smaller k, then consider there are j+1 positions where an element from the smaller sub array can be injected (merged) into the larger array. There are "k multichoose j+1" ways ways to inject all elements from the smaller sub array.
When you have counted the number of ways to merge two subarrays, you actually have an array with a size that is the sum of those two sizes. The above logic can then be applied with this array and the next subarray in the problem specification. The number of ways just multiplies as you merge more subarrays into this growing array. Of course, you don't really deal with the arrays, just with sizes.
Here is an implementation in JavaScript, which applies the above algorithm:
function getSubArraySizes(n, visited) {
// Translate the problem into a set of sizes (of subarrays)
let j = 0;
let sizes = [];
for (let i of visited) {
let size = i - j - 1;
if (size > 0) sizes.push(size);
j = i;
}
let size = n - j;
if (size > 0) sizes.push(size);
return sizes;
}
function Combi(n, k) {
// Count combinations: "from n, take k"
// See Wikipedia on "Combination"
let c = 1;
let end = Math.min(k, n - k);
for (let i = 0; i < end; i++) {
c = c * (n-i) / (end-i); // This is floating point
}
return c; // ... but result is integer
}
function getPickCount(sizes) {
// Main function, based on a list of sizes of subarrays
let count = 0;
let result = 1;
for (let i = 0; i < sizes.length; i++) {
let size = sizes[i];
// Number of ways to take items from this chunk:
// - when items can only be taken from one side: 1
// - otherwise: every time we have a choice between 2, except for the last remaining item
let pickCount = i == 0 || i == sizes.length-1 ? 1 : 2 ** (size-1);
// Number of ways to merge/weave two arrays, where relative order of elements is not changed
// = a "k multichoice from n". See
// https://en.wikipedia.org/wiki/Combination#Number_of_combinations_with_repetition
let weaveCount = count == 0 ? 1 // First time only
: Combi(size+count, Math.min(count, size));
// Number of possibilities:
result *= pickCount * weaveCount;
// Update the size to be the size of the merged/woven array
count += size;
}
return result;
}
// Demo with the example input (n = 6, visited = 1 and 5)
let result = getPickCount(getSubArraySizes(6, [1, 5]));
console.log(result);

Minimum common remainder of division

I have n pairs of numbers: ( p[1], s[1] ), ( p[2], s[2] ), ... , ( p[n], s[n] )
Where p[i] is integer greater than 1; s[i] is integer : 0 <= s[i] < p[i]
Is there any way to determine minimum positive integer a , such that for each pair :
( s[i] + a ) mod p[i] != 0
Anything better than brute force ?
It is possible to do better than brute force. Brute force would be O(A·n), where A is the minimum valid value for a that we are looking for.
The approach described below uses a min-heap and achieves O(n·log(n) + A·log(n)) time complexity.
First, notice that replacing a with a value of the form (p[i] - s[i]) + k * p[i] leads to a reminder equal to zero in the ith pair, for any positive integer k. Thus, the numbers of that form are invalid a values (the solution that we are looking for is different from all of them).
The proposed algorithm is an efficient way to generate the numbers of that form (for all i and k), i.e. the invalid values for a, in increasing order. As soon as the current value differs from the previous one by more than 1, it means that there was a valid a in-between.
The pseudocode below details this approach.
1. construct a min-heap from all the following pairs (p[i] - s[i], p[i]),
where the heap comparator is based on the first element of the pairs.
2. a0 = -1; maxA = lcm(p[i])
3. Repeat
3a. Retrieve and remove the root of the heap, (a, p[i]).
3b. If a - a0 > 1 then the result is a0 + 1. Exit.
3c. if a is at least maxA, then no solution exists. Exit.
3d. Insert into the heap the value (a + p[i], p[i]).
3e. a0 = a
Remark: it is possible for such an a to not exist. If a valid a is not found below LCM(p[1], p[2], ... p[n]), then it is guaranteed that no valid a exists.
I'll show below an example of how this algorithm works.
Consider the following (p, s) pairs: { (2, 1), (5, 3) }.
The first pair indicates that a should avoid values like 1, 3, 5, 7, ..., whereas the second pair indicates that we should avoid values like 2, 7, 12, 17, ... .
The min-heap initially contains the first element of each sequence (step 1 of the pseudocode) -- shown in bold below:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
We retrieve and remove the head of the heap, i.e., the minimum value among the two bold ones, and this is 1. We add into the heap the next element from that sequence, thus the heap now contains the elements 2 and 3:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
We again retrieve the head of the heap, this time it contains the value 2, and add the next element of that sequence into the heap:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
The algorithm continues, we will next retrieve value 3, and add 5 into the heap:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
Finally, now we retrieve value 5. At this point we realize that the value 4 is not among the invalid values for a, thus that is the solution that we are looking for.
I can think of two different solutions. First:
p_max = lcm (p[0],p[1],...,p[n]) - 1;
for a = 0 to p_max:
zero_found = false;
for i = 0 to n:
if ( s[i] + a ) mod p[i] == 0:
zero_found = true;
break;
if !zero_found:
return a;
return -1;
I suppose this is the one you call "brute force". Notice that p_max represents Least Common Multiple of p[i]s - 1 (solution is either in the closed interval [0, p_max], or it does not exist). Complexity of this solution is O(n * p_max) in the worst case (plus the running time for calculating lcm!). There is a better solution regarding the time complexity, but it uses an additional binary array - classical time-space tradeoff. Its idea is similar to the Sieve of Eratosthenes, but for remainders instead of primes :)
p_max = lcm (p[0],p[1],...,p[n]) - 1;
int remainders[p_max + 1] = {0};
for i = 0 to n:
int rem = s[i] - p[i];
while rem >= -p_max:
remainders[-rem] = 1;
rem -= p[i];
for i = 0 to n:
if !remainders[i]:
return i;
return -1;
Explanation of the algorithm: first, we create an array remainders that will indicate whether certain negative remainder exists in the whole set. What is a negative remainder? It's simple, notice that 6 = 2 mod 4 is equivalent to 6 = -2 mod 4. If remainders[i] == 1, it means that if we add i to one of the s[j], we will get p[j] (which is 0, and that is what we want to avoid). Array is populated with all possible negative remainders, up to -p_max. Now all we have to do is search for the first i, such that remainder[i] == 0 and return it, if it exists - notice that the solution does not have to exists. In the problem text, you have indicated that you are searching for the minimum positive integer, I don't see why zero would not fit (if all s[i] are positive). However, if that is a strong requirement, just change the for loop to start from 1 instead of 0, and increment p_max.
The complexity of this algorithm is n + sum (p_max / p[i]) = n + p_max * sum (1 / p[i]), where i goes from to 0 to n. Since all p[i]s are at least 2, that is asymptotically better than the brute force solution.
An example for better understanding: suppose that the input is (5,4), (5,1), (2,0). p_max is lcm(5,5,2) - 1 = 10 - 1 = 9, so we create array with 10 elements, initially filled with zeros. Now let's proceed pair by pair:
from the first pair, we have remainders[1] = 1 and remainders[6] = 1
second pair gives remainders[4] = 1 and remainders[9] = 1
last pair gives remainders[0] = 1, remainders[2] = 1, remainders[4] = 1, remainders[6] = 1 and remainders[8] = 1.
Therefore, first index with zero value in the array is 3, which is a desired solution.

previous most recent bigger element in array

I try to solve the following problem. Given an array of real numbers [7, 2, 4, 8, 1, 1, 6, 7, 4, 3, 1] for every element I need to find most recent previous bigger element in the array.
For example there is nothing bigger then first element (7) so it has NaN. For the second element (2) 7 is bigger. So in the end the answer looks like:
[NaN, 7, 7, NaN, 8, 8, 8, 8, 7, 4, 3, 1]. Of course I can just check all the previous elements for every element, but this is quadratic in terms of the number of elements of the array.
My another approach was to maintain the sorted list of previous elements and then select the first element bigger then current. This sounds like a log linear to me (am not sure). Is there any better way to approach this problem?
Here's one way to do it
create a stack which is initially empty
for each number N in the array
{
while the stack is not empty
{
if the top item on the stack T is greater than N
{
output T (leaving it on the stack)
break
}
else
{
pop T off of the stack
}
}
if the stack is empty
{
output NAN
}
push N onto the stack
}
Taking your sample array [7, 2, 4, 8, 1, 1, 6, 7, 4, 3, 1], here's how the algorithm would solve it.
stack N output
- 7 NAN
7 2 7
7 2 4 7
7 4 8 NAN
8 1 8
8 1 1 8
8 1 6 8
8 6 7 8
8 7 4 7
8 7 4 3 4
8 7 4 3 1 3
The theory is that the stack doesn't need to keep small numbers since they will never be part of the output. For example, in the sequence 7, 2, 4, the 2 is not needed, because any number less than 2 will also be less than 4. Hence the stack only needs to keep the 7 and the 4.
Complexity Analysis
The time complexity of the algorithm can be shown to be O(n) as follows:
there are exactly n pushes (each number in the input array is
pushed onto the stack once and only once)
there are at most n pops (once a number is popped from the stack,
it is discarded)
there are at most n failed comparisons (since the number is popped
and discarded after a failed comparison)
there are at most n successful comparisons (since the algorithm
moves to the next number in the input array after a successful
comparison)
there are exactly n output operations (since the algorithm
generates one output for each number in the input array)
Hence we conclude that the algorithm executes at most 5n operations to complete the task, which is a time complexity of O(n).
We can keep for each array element the index of the its most recent bigger element. When we process a new element x, we check the previous element y. If y is greater then we found what we want. If not, we check which is the index of the most recent bigger element of y. We continue until we find our needed element and its index. Using python:
a = [7, 2, 4, 8, 1, 1, 6, 7, 4, 3, 1]
idx, result = [], []
for i, v in enumerate(a, -1):
while i >= 0 and v >= a[i]:
i = idx[i]
idx.append(i)
result.append(a[i] if i >= 0 else None)
Result:
[None, 7, 7, None, 8, 8, 8, 8, 7, 4, 3]
The algorithm is linear. When an index j is unsuccessfully checked because we are looking for the most recent bigger element of index i > j then from now on i will point to a smaller index than j and j won't be checked again.
Why not just define a variable 'current_largest' and iterate through your array from left to right? At each element, current largest is largest previous, and if the current element is larger, assign current_largest to the current element. Then move to the next element.
EDIT:
I just re-read your question and I may have misunderstood it. Do you want to find ALL larger previous elements?
EDIT2:
It seems to me like the current largest method will work. You just need to record current_largest before you assign it a new value. For example, in python:
current_largest = 0
for current_element in elements:
print("Largest previous is "+current_largest)
if(current_element>current_largest):
current_largest = current_element
If you want an array of these, then just push the value to an array in place of the print statement.
As per my best understanding of your question. Below is a solution.
Working Example : JSFIDDLE
var item = document.getElementById("myButton");
item.addEventListener("click", myFunction);
function myFunction() {
var myItems = [7, 2, 4, 8, 1, 1, 6, 7, 4, 3, 1];
var previousItem;
var currentItem;
var currentLargest;
for (var i = 0; i < myItems.length; i++) {
currentItem = myItems[i];
if (i == 0) {
previousItem = myItems[0];
currentItem = myItems[0];
myItems[i] = NaN;
}
else {
if (currentItem < previousItem) {
myItems[i] = previousItem;
currentLargest = previousItem;
}
if (currentItem > currentLargest) {
currentLargest = currentItem;
myItems[i] = NaN;
}
else {
myItems[i] = currentLargest;
}
previousItem = currentItem;
}
}
var stringItems = myItems.join(",");
document.getElementById("arrayAnswer").innerHTML = stringItems;
}

How to sort K sorted arrays, with MERGE SORT

I know that this question has been asked, and there is a very nice elegant solution using a min heap.
MY question is how would one do this using the merge function of merge sort.
You already have an array of sorted arrays. So you should be able to merge all of them into one array in O(nlog K) time, correct?
I just can't figure out how to do this!
Say I have
[ [5,6], [3,4], [1,2], [0] ]
Step 1: [ [3,4,5,6], [0,1,2] ]
Step2: [ [0,1,2,3,4,5,6] ]
Is there a simple way to do this? Is O(nlog K) theoretically achievable with mergesort?
As others have said, using the min heap to hold the next items is the optimal way. It's called an N-way merge. Its complexity is O(n log k).
You can use a 2-way merge algorithm to sort k arrays. Perhaps the easiest way is to modify the standard merge sort so that it uses non-constant partition sizes. For example, imagine that you have 4 arrays with lengths 10, 8, 12, and 33. Each array is sorted. If you concatenated the arrays into one, you would have these partitions (the numbers are indexes into the array, not values):
[0-9][10-17][18-29][30-62]
The first pass of your merge sort would have starting indexes of 0 and 10. You would merge that into a new array, just as you would with the standard merge sort. The next pass would start at positions 18 and 30 in the second array. When you're done with the second pass, your output array contains:
[0-17][18-62]
Now your partitions start at 0 and 18. You merge those two into a single array and you're done.
The only real difference is that rather than starting with a partition size of 2 and doubling, you have non-constant partition sizes. As you make each pass, the new partition size is the sum of the sizes of the two partitions you used in the previous pass. This really is just a slight modification of the standard merge sort.
It will take log(k) passes to do the sort, and at each pass you look at all n items. The algorithm is O(n log k), but with a much higher constant than the N-way merge.
For implementation, build an array of integers that contains the starting indexes of each of your sub arrays. So in the example above you would have:
int[] partitions = [0, 10, 18, 30];
int numPartitions = 4;
Now you do your standard merge sort. But you select your partitions from the partitions array. So your merge would start with:
merge (inputArray, outputArray, part1Index, part2Index, outputStart)
{
part1Start = partitions[part1Index];
part2Start = partitions[part2Index];
part1Length = part2Start - part1Start;
part2Length = partitions[part2Index-1] - part2Start;
// now merge part1 and part2 into the output array,
// starting at outputStart
}
And your main loop would look something like:
while (numPartitions > 1)
{
for (int p = 0; p < numPartitions; p += 2)
{
outputStart = partitions[p];
merge(inputArray, outputArray, p, p+1, outputStart);
// update partitions table
partitions[p/2] = partitions[p] + partitions[p+1];
}
numPartitions /= 2;
}
That's the basic idea. You'll have to do some work to handle the dangling partition when the number is odd, but in general that's how it's done.
You can also do it by maintaining an array of arrays, and merging each two arrays into a new array, adding that to an output array of arrays. Lather, rinse, repeat.
You should note that when we say complexity is O(n log k), we assume that n means TOTAL number of elements in ALL of k arrays, i.e. number of elements in a final merged array.
For example, if you want to merge k arrays that contain n elements each, total number of elements in final array will be nk. So complexity will be O(nk log k).
There different ways to merge arrays. To accoplish that task in N*Log(K) time you can use a structure called Heap (it is good structure to implement priority queue). I suppose that you already have it, if you don’t then pick up any available implementation: http://en.wikipedia.org/wiki/Heap_(data_structure)
Then you can do that like this:
1. We have A[1..K] array of arrays to sort, Head[1..K] - current pointer for every array and Count[1..K] - number of items for every array.
2. We have Heap of pairs (Value: int; NumberOfArray: int) - empty at start.
3. We put to the heap first item of every array - initialization phase.
4. Then we organize cycle:
5. Get pair (Value, NumberOfArray) from the heap.
6. Value is next value to output.
7. NumberOfArray – is number of array where we need to take next item (if any) and place to the heap.
8. If heap is not empty, then repeat from step 5
So for every item we operate only with heap built from K items as maximum. It mean that we will have N*Log(K) complexity as you asked.
I implemented it in python. The main idea is similar to mergesort. There are k arrays in lists. In function mainMerageK, just divide lists (k) into left (k/2) and right (k/2). Therefore, the total count of partition is log(k). Regarding function merge, it is easy to know the runtime is O(n). Finally, we get O(nlog k)
By the way, it also can be implemented in min heap, and there is a link: Merging K- Sorted Lists using Priority Queue
def mainMergeK(*lists):
# implemented by k-way partition
k = len(lists)
if k > 1:
mid = int(k / 2)
B = mainMergeK(*lists[0: mid])
C = mainMergeK(*lists[mid:])
A = merge(B, C)
print B, ' + ', C, ' = ', A
return A
return lists[0]
def merge(B, C):
A = []
p = len(B)
q = len(C)
i = 0
j = 0
while i < p and j < q:
if B[i] <= C[j]:
A.append(B[i])
i += 1
else:
A.append(C[j])
j += 1
if i == p:
for c in C[j:]:
A.append(c)
else:
for b in B[i:]:
A.append(b)
return A
if __name__ == '__main__':
x = mainMergeK([1, 3, 5], [2, 4, 6], [7, 8, 10], [9])
print x
The output likes below:
[1, 3, 5] + [2, 4, 6] = [1, 2, 3, 4, 5, 6]
[7, 8, 10] + [9] = [7, 8, 9, 10]
[1, 2, 3, 4, 5, 6] + [7, 8, 9, 10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Just do it like a 2-way merge except with K items. Will result in O(NK). If you want O(N logK) you will need to use a min-heap to keep track of the K pointers(with source array as a metadata) in the algorithm below:
Keep an array of K elements - i.e K pointers showing position in each array.
Mark all K elements are valid.
loop:
Compare values in K pointers that are valid. if the value is minimum, select least index pointer and increment it into the next value in the array. If incremented value has crossed it's array, mark it invalid.
Add the least value into the result.
Repeat till all K elements are invalid.
For example,:
Positions Arrays
p1:0 Array 1: 0 5 10
p2:3 Array 2: 3 6 9
p3:2 Array 3: 2 4 6
Output (min of 0,3,2)=> 0. So output is {0}
Array
p1:5 0 5 10
p2:3 3 6 9
p3:2 2 4 6
Output (min of 5,3,2)=> 2. So {0,2}
Array
p1:5 0 5 10
p2:3 3 6 9
p3:4 2 4 6
Output (min of 5,3,4)=>3. So {0,2,3}
..and so on..until you come to a state where output is {0,2,3,4,5,6}
Array
p1:5 0 5 10
p2:9 3 6 9
p3:6 2 4 6
Output (min of 5,9,6)=>6. So {0,2,3,4,5,6}+{6} when you mark p3 as "invalid" as you have exhausted the array. (or if you are using a min-heap you will simply remove the min-item, get it's source array metadata: in this case array 3, see that it's done so you will not add anything new to the min-heap)

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of values who's sum is 20 so that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly
20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?
There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that comparing every value in B with it's ideal value in parentheses, the difference is never more than 1.
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.
Your problem is similar to a proportional representation where you want to share N seats (in your case 20) among parties proportionnaly to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18]
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats,votes):
"""assign n seats proportionaly to votes using Hagenbach-Bischoff quota
:param nseats: int number of seats to assign
:param votes: iterable of int or float weighting each party
:result: list of ints seats allocated to each party
"""
quota=sum(votes)/(1.+nseats) #force float
frac=[vote/quota for vote in votes]
res=[int(f) for f in frac]
n=nseats-sum(res) #number of seats remaining to allocate
if n==0: return res #done
if n<0: return [min(x,nseats) for x in res] # see siamii's comment
#give the remaining seats to the n parties with the largest remainder
remainders=[ai-bi for ai,bi in zip(frac,res)]
limit=sorted(remainders,reverse=True)[n-1]
#n parties with remainter larger than limit get an extra seat
for i,r in enumerate(remainders):
if r>=limit:
res[i]+=1
n-=1 # attempt to handle perfect equality
if n==0: return res #done
raise #should never happen
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.
A naïve solution that doesn't perform well, but will provide the right result...
Write an iterator that given an array with eight integers (candidate) and the input array, output the index of the element that is farthest away from being proportional to the others (pseudocode):
function next_index(candidate, input)
// Calculate weights
for i in 1 .. 8
w[i] = candidate[i] / input[i]
end for
// find the smallest weight
min = 0
min_index = 0
for i in 1 .. 8
if w[i] < min then
min = w[i]
min_index = i
end if
end for
return min_index
end function
Then just do this
result = [0, 0, 0, 0, 0, 0, 0, 0]
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
result[next_index(result, input)]++
So the answers and comments above were helpful... particularly the decreasing sum comment from #Frederik.
The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I mulitply by 20 and divide by the sum. I keep the quotient, and accumulate the remainder. Whenever the accumulator is greater than sum(v), I add one to the value. That way I'm guaranteed that all the remainders get rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
# set up by getting the sum of the values and starting
# with an empty result list and accumulator
sum_values = sum(values)
new_values = []
acc = 0
for v in values:
# for each value, find quotient and remainder
q, r = divmod(v * total, sum_values)
if acc + r < sum_values:
# if the accumlator plus remainder is too small, just add and move on
acc += r
else:
# we've accumulated enough to go over sum(values), so add 1 to result
if acc > r:
# add to previous
new_values[-1] += 1
else:
# add to current
q += 1
acc -= sum_values - r
# save the new value
new_values.append(q)
# accumulator is guaranteed to be zero at the end
print new_values, sum_values, acc
return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

Resources