Problem coming up with an array function - algorithm

Let's say I have an increasing sequence of integers: seq = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4 ... ]. It is not guaranteed to contain the same number of each integer, but it is guaranteed to increase in steps of 1.
Is there a function F that can operate on this sequence such that F(seq, x) gives 1 wherever an element of the sequence equals x and 0 everywhere else?
For example:
t = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
F(t, 2) = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
EDIT: I probably should have made it more clear. Is there a solution where I can do some algebraic operations on the entire array to get the desired result, without iterating over it?
So, I'm wondering if I can do something like: F(t, x) = t op x ?
In Python (t is a numpy.array) it could be:
(t * -1) % x or something...
EDIT2: I found out that the indicator function I(t[i] == x) is acceptable to use as an algebraic operation. Sorry, I did not know about indicator functions.
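With NumPy, the indicator I(t[i] == x) is simply an element-wise comparison: `t == x` produces a boolean array, and casting it to int gives the 0/1 result with no explicit Python loop (NumPy still visits every element internally, so the work is still O(n)). A minimal sketch, assuming `t` is a 1-D NumPy array of integers; `F` is just the name used in the question:

```python
import numpy as np

def F(t, x):
    # t == x compares every element at once and yields a boolean array;
    # astype(int) turns True/False into 1/0
    return (np.asarray(t) == x).astype(int)

t = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 4])
print(F(t, 2))  # [0 0 0 0 1 1 0 0 0 0]
```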

There's a very simple solution to this that doesn't require most of the restrictions you place upon the domain. Just create a new array of the same size, loop through and test for equality between the element in the array and the value you want to compare against. When they're the same, set the corresponding element in the new array to 1. Otherwise, set it to 0. The actual implementation depends on the language you're working with, but should be fairly simple.
If we do take your domain into account, you can introduce a couple of optimisations. If you start with an array of zeroes, you only need to fill in the ones. You don't need to start checking until element n - 1 (counting from 0), where n is the value you're comparing against, because each of the numbers 1 to n - 1 must appear at least once before the first n. If the sequence doesn't start at 1, you can still start at index n - start, where start is the first value. Similarly, if you haven't come across n at array[n - 1], you can jump n - array[n - 1] more elements. You can repeat this, skipping most of the elements, until you either hit the right value or reach the end of the list (if it's not in there at all).
After you finish dealing with the value you want, there's no need to check the rest of the array, as you know it'll always be increasing. So you can stop early too.
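A minimal Python sketch of the skipping idea, assuming (as the question guarantees) that neighbouring values differ by 0 or 1. `indicator_by_skipping` is a hypothetical name. The jump is computed from the current value, so it also works when the sequence does not start at 1:

```python
def indicator_by_skipping(t, x):
    """0/1 list marking where t equals x, for a list t that never decreases
    and never rises by more than 1 between neighbours."""
    result = [0] * len(t)
    i = 0
    # From a value v < x, the first x is at least x - v positions further on,
    # because each step raises the value by at most 1. Landing on x this way
    # means it is the first x.
    while i < len(t) and t[i] < x:
        i += x - t[i]
    # Fill the run of x's, then stop early: later values are all larger.
    while i < len(t) and t[i] == x:
        result[i] = 1
        i += 1
    return result

print(indicator_by_skipping([1, 1, 1, 1, 2, 2, 3, 3, 3, 4], 2))  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```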

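A simple method (with C# code) is to iterate over the sequence and test each element, returning either 1 or 0. Here it is as an iterator method (Indicator is just an illustrative name):
using System.Collections.Generic;
using System.Linq;
static IEnumerable<int> Indicator(IEnumerable<int> sequence, int myValue)
{
    foreach (int element in sequence)
    {
        if (element == myValue)
            yield return 1;
        else
            yield return 0;
    }
}
Or, written using LINQ:
sequence.Select(elem => elem == myValue ? 1 : 0);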

A binary search (dichotomy) can quickly locate the range where t[i] == x, so finding that range takes sub-linear (O(log n)) time.

Are you asking for a ready-made C++ or Java API, or are you asking for an algorithm? Or is this a homework question?
The simple algorithm scans the array from start to end and compares each element: if it equals the value, put 1, else put 0. In any case, to fill the new array you have to write each of its elements at least once, so the overall approach is O(n).
You can certainly reduce the number of comparisons by using a binary search. Once you find the required number, simply search forward and backward for the same number.

Here is a java method which returns a new array.
public static int[] sequence(int[] seq, int number)
{
int[] newSequence = new int[seq.length];
for ( int index = 0; index < seq.length; index++ )
{
if ( seq[index] == number )
{
newSequence[index] = 1;
}
else
{
newSequence[index] = 0;
}
}
return newSequence;
}

I would initialize an array of zeroes, then do a binary search on the sequence to find the first element that fits your criteria, and only start setting 1's from there. As soon as you have a not equal condition, stop.
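A Python sketch of this approach, with the binary search written out by hand instead of using `bisect`. `indicator_binary_search` is a hypothetical name and `t` is assumed to be sorted in ascending order. Allocating the zero array is still O(n); only locating the start of the run is O(log n):

```python
def indicator_binary_search(t, x):
    """0/1 list marking where sorted list t equals x: binary-search the first
    element >= x, then set 1s until the value changes."""
    result = [0] * len(t)
    lo, hi = 0, len(t)
    while lo < hi:              # the first element >= x lies in [lo, hi]
        mid = (lo + hi) // 2
        if t[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    i = lo
    while i < len(t) and t[i] == x:   # walk forward over the run of x's
        result[i] = 1
        i += 1                        # stop at the first non-equal value
    return result

print(indicator_binary_search([1, 1, 1, 1, 2, 2, 3, 3, 3, 4], 3))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
```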

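Here is a way to locate the run of n's in O(log n) using bisect (building the returned list is still O(len(t))):
>>> from bisect import bisect
>>> def f(t, n):
...     i = bisect(t, n - 1)            # first index whose value is >= n
...     j = bisect(t, n, lo=i) - i      # how many elements equal n
...     return [0]*i + [1]*j + [0]*(len(t) - j - i)
...
>>> t = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
>>> print(f(t, 2))
[0, 0, 0, 0, 1, 1, 0, 0, 0, 0]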

Related

Need an algorithm to "evenly" iterate over all possible combinations of a set of values

Sorry for the horrible title; I am really struggling to find the right words for what I am looking for.
I think what I want to do is actually quite simple, but I still can't really wrap my head around creating algorithms. I bet I could have easily found a solution on the web if I wasn't lacking basic knowledge of algorithm terminology.
Let's assume I want to iterate over all combinations of an array of five integers, where each integer is a number between zero and nine. Naturally, I could just increment from 0 to 99999. [0, 0, 0, 0, 1], [0, 0, 0, 0, 2], ... [9, 9, 9, 9, 9].
However, I need to "evenly" (I don't really know what to call it) increment the individual elements. Ideally, the sequence of arrays that is produced by the algorithm should look something like this:
[0,0,0,0,0] [1,0,0,0,0] [0,1,0,0,0] [0,0,1,0,0]
[0,0,0,1,0] [0,0,0,0,1] [1,1,0,0,0] [1,0,1,0,0]
[1,0,0,1,0] [1,0,0,0,1] [1,1,0,1,0] [1,1,0,0,1]
[1,1,1,0,0] [1,1,1,1,0] [1,1,1,0,1] [1,1,1,1,1]
[2,0,0,0,0] [2,1,0,0,0] [2,0,1,0,0] [2,0,0,1,0]
[2,0,0,0,1] [2,1,1,0,0] [2,1,0,1,0] .....
I probably made a few mistakes in the sequence above, but maybe you can guess what I am trying to approach. Don't introduce a number higher than 1 unless every possible combination of 0s and 1s has been determined, don't introduce a number higher than 2 unless every possible combination of 0s, 1s and 2s has been determined, and so on.
I would really appreciate someone pointing me in the right direction! Thanks a lot
You've already said that you can get the combinations you are looking for by enumerating all n^k possible sequences, except that you don't get them in the desired order.
You could generate the sequences in the right order if you used an odometer-style enumerator. At first, all digits must be 0 or 1. When the odometer would wrap (after 1111...), you extend the set of allowed digits to [0, 1, 2]. Reset the sequence to 2000... and keep iterating, but only emit sequences that have at least one 2 in them, because you've already generated all sequences of 0's and 1's. Repeat until, after wrapping, you go beyond the maximum threshold.
Filtering out the duplicates that don't have the current top digit in them can be done by keeping track of the count of top numbers.
Here's an implementation in C with the limits hard-coded in an enum:
#include <string.h> /* memset */
enum {
SIZE = 3,
TOP = 4
};
typedef struct Generator Generator;
struct Generator {
unsigned top; // current threshold
unsigned val[SIZE]; // sequence array
unsigned tops; // count of "top" values
};
/*
* "raw" generator backend which produces all sequences
* and keeps track of how many top numbers there are
*/
int gen_next_raw(Generator *gen)
{
int i = 0;
do {
if (gen->val[i] == gen->top) gen->tops--;
gen->val[i]++;
if (gen->val[i] == gen->top) gen->tops++;
if (gen->val[i] <= gen->top) return 1;
gen->val[i++] = 0;
} while (i < SIZE);
return 0;
}
/*
* actual generator, which filters out duplicates
* and increases the threshold if needed
*/
int gen_next(Generator *gen)
{
while (gen_next_raw(gen)) {
if (gen->tops) return 1;
}
gen->top++;
if (gen->top > TOP) return 0;
memset(gen->val, 0, sizeof(gen->val));
gen->val[0] = gen->top;
gen->tops = 1;
return 1;
}
The gen_next_raw function is the base implementation of the odometer with the addition of keeping a count of current top digits. The gen_next function uses it as backend. It filters out the duplicates and increases the threshold as needed. (All that can probably be done more efficiently.)
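Generate the sequence with the loop below (is_good stands for your own test on gen.val). It uses do-while so that the initial all-zero sequence is tested too: gen_next never returns it, because with top = 0 the odometer wraps immediately and the threshold jumps to 1.
Generator gen = {0};
do {
    if (is_good(gen.val)) {
        puts("Bingo!");
        break;
    }
} while (gen_next(&gen));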
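For comparison, here is a compact Python sketch of the same threshold-then-filter idea. It uses `itertools.product` as the odometer instead of the hand-written `gen_next_raw`; `threshold_odometer` is a hypothetical name. `product` varies its last position fastest, so each tuple is reversed to make the first position vary fastest, as in the C version:

```python
from itertools import product

def threshold_odometer(size, max_digit):
    """Yield all size-digit tuples over 0..max_digit so that every tuple using
    only digits <= k comes before any tuple containing k + 1."""
    yield (0,) * size
    for top in range(1, max_digit + 1):
        for digits in product(range(top + 1), repeat=size):
            # keep only tuples containing the new top digit; the others were
            # already produced under a lower threshold
            if top in digits:
                # reverse so the first position varies fastest
                yield digits[::-1]

print(list(threshold_odometer(2, 1)))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Like the C code, each threshold level scans every tuple for that level and throws away the ones without the top digit, so it trades some wasted iterations for simplicity.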
You could break this down into two subproblems:
get all combinations with replacement of 0, 1, 2, ... for the given number of digits
get all (unique) permutations of those combinations
Your desired ordering is still different from the order those are typically generated in (e.g. (0,1,1) before (0,0,2), and (0,0,1) before (1,0,0)), but you can just collect all the combinations and all the permutations individually and sort them, which at least requires much less memory than generating, collecting and sorting all the full sequences.
Example in Python, using implementations of those functions from the itertools library; key=lambda c: c[::-1] sorts the tuples by their elements taken in reverse order, which gives your desired order:
from itertools import combinations_with_replacement, permutations
places = 3
max_digit = 3
all_combs = list(combinations_with_replacement(range(0, max_digit+1), r=places))
for comb in sorted(all_combs, key=lambda c: c[::-1]):
    all_perms = set(permutations(comb))
    for perm in sorted(all_perms, key=lambda c: c[::-1]):
        print(perm)
And some selected output (64 elements in total)
(0, 0, 0)
(1, 0, 0)
(0, 1, 0)
...
(0, 1, 1)
(1, 1, 1)
(2, 0, 0)
(0, 2, 0)
...
(0, 1, 2)
(2, 1, 1)
...
(2, 2, 2)
(3, 0, 0)
(0, 3, 0)
...
(2, 3, 3)
(3, 3, 3)
For 27 places with values up to 27 that would still be too many combinations-with-replacement to generate and sort, so this part should be replaced with a custom algorithm.
keep track of how often each digit appears; start with all zeros
find the smallest digit that has a non-zero count, increment the count of the digit after that, and redistribute the remaining smaller counts back to the smallest digit (i.e. zero)
In Python:
def generate_combinations(places, max_digit):
    # counts[d] = how many times digit d appears; initially [places, 0, 0, ..., 0]
    counts = [places] + [0] * max_digit
    yield [i for i, c in enumerate(counts) for _ in range(c)]
    while True:
        # k = the digit just above the lowest digit with a non-zero count
        k = next(i for i, c in enumerate(counts) if c > 0) + 1
        if k == max_digit + 1:
            break
        # add one more of digit k, and reset all smaller digits back to zeros
        counts[k] += 1
        counts[0] = places - sum(counts[k:])
        for i in range(1, k):
            counts[i] = 0
        yield [i for i, c in enumerate(counts) for _ in range(c)]
For the second part, we can still use a standard permutations generator, although for 27! that would be too many to collect in a set, but if you expect the result in the first few hundred combinations, you might just keep track of already seen permutations and skip those, and hope that you find the result before that set grows too large...
from itertools import permutations
for comb in generate_combinations(places=3, max_digit=3):
    for p in set(permutations(comb)):
        print(p)
    print()
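For the permutation step, the classic next-permutation algorithm (the one behind C++'s `std::next_permutation`) visits each distinct arrangement of a multiset exactly once, so no set of already-seen permutations is needed. This is a swapped-in technique, not part of the answer above; a minimal Python sketch with `distinct_permutations` as a hypothetical name. It yields in plain lexicographic order:

```python
def distinct_permutations(items):
    """Yield each distinct ordering of items exactly once, in lexicographic
    order, using the next-permutation step (no set of seen results)."""
    a = sorted(items)
    while True:
        yield tuple(a)
        # 1. find the rightmost i with a[i] < a[i + 1]
        i = len(a) - 2
        while i >= 0 and a[i] >= a[i + 1]:
            i -= 1
        if i < 0:
            return  # a is non-increasing: that was the last permutation
        # 2. find the rightmost j with a[j] > a[i] and swap the two
        j = len(a) - 1
        while a[j] <= a[i]:
            j -= 1
        a[i], a[j] = a[j], a[i]
        # 3. reverse the tail so it becomes its smallest arrangement
        a[i + 1:] = reversed(a[i + 1:])

print(list(distinct_permutations([0, 1, 1])))  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

It could replace `set(permutations(comb))` inside the loop over `generate_combinations(...)`. Yielding `tuple(reversed(a))` instead would give, within each combination, the same order as sorting with `key=lambda c: c[::-1]`.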

Interview Question - Which numbers shows up most times in a list of intervals

I only heard of this question, so I don't know the exact limits. You are given a list of positive integers. Each two consecutive values form a closed interval. Find the number that appears in the most intervals. If two values appear the same number of times, select the smallest one.
Example: [4, 1, 6, 5] results in [1, 4], [1, 6], [5, 6], with 1, 2, 3, 4, 5, 6 each showing up twice. The correct answer would be 1, since it's the smallest.
I unfortunately have no idea how this can be done without going for an O(n^2) approach. The only optimisation I could think of was merging consecutive descending or ascending intervals, but this doesn't really work since [4, 3, 2] would count 3 twice.
Edit: Someone commented (but then deleted) a solution with this link http://www.zrzahid.com/maximum-number-of-overlapping-intervals/. I find this one the most elegant, even though it doesn't take into account the fact that some elements in my input would be both the beginning and end of some intervals.
Sort the intervals by their starting value. Then run a sweep line from left (the global smallest value) to right (the global maximum value). At each event point (start or end of an interval), count the number of intervals intersecting the sweep line (in O(log(n))). The time complexity of this algorithm is O(n log(n)), where n is the number of intervals.
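A Python sketch of the sweep-line idea, assuming the input format from the question (consecutive values form closed intervals). Each interval becomes an "open" event at its lower end and a "close" event at its upper end. At equal positions, opens are processed before closes, so a value that ends one interval and starts another (like 3 in [4, 3, 2]) is counted for both. Sorting the events gives O(n log n); `most_covered` is a hypothetical name:

```python
def most_covered(values):
    """Smallest value contained in the most closed intervals formed by
    consecutive pairs of values; None if there are no intervals."""
    events = []
    for a, b in zip(values, values[1:]):
        lo, hi = min(a, b), max(a, b)
        events.append((lo, 0))  # 0 = interval opens at lo
        events.append((hi, 1))  # 1 = interval closes after hi
    events.sort()  # at equal positions, opens (0) sort before closes (1)
    best_value, best_count, count = None, 0, 0
    for pos, kind in events:
        if kind == 0:
            count += 1
            if count > best_count:  # strict '>' keeps the smallest position
                best_value, best_count = pos, count
        else:
            count -= 1
    return best_value

print(most_covered([4, 1, 6, 5]))  # 1
```

Only positions where an interval opens are checked. That is enough because the smallest best value is always the start of some interval: any point that is not a start is covered by everything that covers the point just before it.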
The major observation is that the result will be one of the numbers in the input (proof left to the reader as simple exercise, yada yada).
My solution is inspired by @Prune's solution. The important step is mapping each input number to its rank among all the distinct numbers in the input.
I will use the C++ standard library. We can first load all the numbers into a set, then create a map from that set which maps each number to its rank among all the numbers.
// assumes #include <algorithm>, <map>, <set>, <vector> and using namespace std;
int solve(const vector<int>& input) {
    set<int> vals;
    for (int n : input) {
        vals.insert(n);
    }
    map<int, int> numberOrder;
    int order = 0;
    for (int n : vals) { // values in a set are ordered
        numberOrder[n] = order++;
    }
We then create the process array (similar to @Prune's solution).
    vector<int> process(numberOrder.size() + 1, 0); // adding past-the-end element
    // consecutive input values form the intervals, so start at i = 1
    for (size_t i = 1; i < input.size(); ++i) {
        int last = input[i - 1];
        int curr = input[i];
        process[numberOrder[min(last, curr)]]++;
        process[numberOrder[max(last, curr)] + 1]--;
    }
    int appear = 0;
    int maxAppear = 0;
    int maxOrder = 0;
    for (size_t i = 0; i < process.size(); ++i) {
        appear += process[i];
        if (appear > maxAppear) {
            maxAppear = appear;
            maxOrder = static_cast<int>(i);
        }
    }
Last, we look up which value has the rank we found.
    for (const auto& a : numberOrder) {
        if (a.second == maxOrder) {
            return a.first;
        }
    }
    return -1; // only reached when input is empty
}
This solution has O(n * log(n)) time complexity and O(n) space complexity, which is independent of the maximum input value (unlike other solutions).
If the maximum number m in the range array is less than the maximum size limit of an array, my solution works in O(n + m) time.
1- I create a new array, process, to handle the ranges, and use it to find the number that appears in the most intervals. For simplicity, let's use your example: the input is [1, 4], [1, 6], [5, 6]. Give process length 6 and initialize it with 0s (in this walkthrough, index k stands for the value k + 1):
process = [0,0,0,0,0,0]
2- Loop through all the intervals and mark the start with +1 and the cell immediately after the range's end with -1 (a cell past the end of the array is simply skipped):
for range [1,4] process = [1,0,0,0,-1,0]
for range [1,6] process = [2,0,0,0,-1,0]
for range [5,6] process = [2,0,0,0,0,0]
3- The process array then works as a cumulative array. Initialize a variable appear = process[0], which equals 2 in our case, and keep accumulating as you go through process. You will notice that the elements 1, 2, 3, 4, 5, 6 all have appear = 2, because each of them appears in two of the given ranges.
4- Track the maximum of appear while you loop through the process array; the first index where it is reached gives the solution.
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        int[] arr = new int[] { 4, 1, 6, 5 };
        System.out.println(solve(arr));
    }

    public static int solve(int[] range) {
        // process is indexed by value, so it needs room for the largest
        // input value plus the "end + 1" marker after it
        int size = Arrays.stream(range).max().getAsInt() + 2;
        int[] process = new int[size];
        // fill process array: +1 at each interval start, -1 just after each end
        for (int i = 0; i < range.length - 1; ++i) {
            int start = Math.min(range[i], range[i + 1]);
            int end = Math.max(range[i], range[i + 1]);
            process[start]++;
            process[end + 1]--;
        }
        // Find the number that appears in most intervals (smallest one)
        int appear = process[0];
        int max = appear;
        int solu = 0;
        for (int i = 1; i < size; ++i) {
            appear += process[i];
            if (appear > max) {
                solu = i;
                max = appear;
            }
        }
        return solu;
    }
}
Think of these as parentheses: ( to start an interval, ) to end it. Now check the bounds of each pair [a, b], and tally interval start/end markers for each position: the lower number gets an interval start to the left; the larger number gets an interval close to the right. For the given input:
Process [4, 1]
result: [0, 1, 0, 0, 0, -1]
Process [1, 6]
result: [0, 2, 0, 0, 0, -1, 0, -1]
Process [6, 5]
result: [0, 2, 0, 0, 0, 0, 0, -2]
Now, merely make a cumulative sum of this list; the position of the largest value is your desired answer.
result: [0, 2, 0, 0, 0, 0, 0, -2]
cumsum: [0, 2, 2, 2, 2, 2, 2, 0]
Note that the final sum must be 0, and can never be negative. The largest value is 2, which appears first at position 1. Thus, 1 is the lowest integer that appears the maximum (2) quantity.
So that's one pass over the input and one pass over the range of numbers. Note that with a simple table of values, you can save storage. The processing table would look something like:
[(1, 2)
(4, -1)
(5, 1)
(6, -2)]
If you have input with intervals both starting and stopping at a number, then you need to handle the starts first. For instance, [4, 3, 2] would look like
[(2, 1)
(3, 1)
(3, -1)
(4, -1)]
NOTE: maintaining a sorted insert list is O(n^2) time on the size of the input; sorting the list afterward is O(n log n). Either is O(n) space.
My first suggestion, indexing on the number itself, is O(n) time, but O(r) space on the range of input values.
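The first suggestion (indexing on the number itself) can be sketched in Python with a difference array and `itertools.accumulate`. It assumes non-negative integer inputs small enough to index a list directly, which gives O(n + r) time and O(r) space; `most_covered_dense` is a hypothetical name:

```python
from itertools import accumulate

def most_covered_dense(values):
    """Smallest value contained in the most closed intervals formed by
    consecutive pairs of values (difference-array version)."""
    size = max(values) + 2                 # room for "hi + 1" after the largest value
    delta = [0] * size
    for a, b in zip(values, values[1:]):
        lo, hi = min(a, b), max(a, b)
        delta[lo] += 1                     # "(" : interval opens at lo
        delta[hi + 1] -= 1                 # ")" : interval closes just after hi
    coverage = list(accumulate(delta))     # coverage[v] = intervals containing v
    return coverage.index(max(coverage))   # first (smallest) value with max coverage

print(most_covered_dense([4, 1, 6, 5]))  # 1
```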

How to detect if a repeating pattern exists

My question isn't language specific... I would probably implement this in C# or Python unless there is a specific feature of a language that helps me get what I am looking for.
Is there some sort of algorithm that anyone knows of that can help me determine if a list of numbers contains a repeating pattern?
Let's say I have several lists of numbers...
[12, 4, 5, 7, 1, 2]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 1, 1, 1, 1, 1]
[ 1, 2, 4, 12, 13, 1, 2, 4, 12, 13]
I need to detect if there is a repeating pattern in each list... For example, list 1 returns false, and lists 2, 3, and 4 return true.
I was thinking maybe taking a count of each value that appears in the list and if val 1 == val 2 == val n... then that would do it. Any better ideas?
You want to look at the autocorrelation of the signal. Autocorrelation basically correlates the signal with a shifted copy of itself: when you iteratively slide one copy across the other and there is a repeating pattern, the output resonates strongly (peaks) at shifts equal to the period.
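Since the lists here are exact integers rather than a noisy signal, a simple discrete analogue of autocorrelation is to count, for each shift (lag) p, how often an element equals the element p positions later. This is a sketch of that idea, not a true autocorrelation (no products, no normalisation by variance); `lag_match_scores` is a hypothetical name:

```python
def lag_match_scores(values):
    """For each lag p in 1..n-1, the fraction of positions i where
    values[i] == values[i + p]. A score of 1.0 means period p fits exactly."""
    n = len(values)
    return {p: sum(values[i] == values[i + p] for i in range(n - p)) / (n - p)
            for p in range(1, n)}

scores = lag_match_scores([1, 2, 3, 1, 2, 3, 1, 2, 3])
print(max(scores, key=scores.get))  # 3
```

A lag scoring 1.0 means the list is consistent with period p; to require that the list is made of whole copies, also check that p divides the length.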
The second, third and fourth lists are periodic; I'm going to assume you're looking for an algorithm for detecting periodic strings. Most fast string matching algorithms need to find periods of strings in order to compute their shifting rules.
Knuth-Morris-Pratt's preprocessing, for instance, computes, for every prefix P[0..k] of the pattern P, the length SP[k] of the longest proper suffix P[s..k] of P[0..k] that exactly matches the prefix P[0..(k-s)]. Since P[0..k] has length k + 1, if SP[k] < (k + 1)/2, then P[0..k] is aperiodic; otherwise, it is a prefix of a string with period (k + 1) - SP[k].
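A Python sketch of that test, assuming "repeating pattern" means the whole list is made of two or more full copies of a block (as in the question's examples). It builds the KMP prefix (failure) table, takes the smallest period n - SP[n - 1], and checks that it is shorter than the list and divides its length; `is_repetition` is a hypothetical name:

```python
def is_repetition(values):
    """True if values is some shorter block repeated two or more times."""
    n = len(values)
    if n < 2:
        return False
    # fail[k] = length of the longest proper suffix of values[0..k]
    # that is also a prefix (SP[k] in the text above)
    fail = [0] * n
    length = 0  # length of the currently matched prefix
    for k in range(1, n):
        while length > 0 and values[k] != values[length]:
            length = fail[length - 1]
        if values[k] == values[length]:
            length += 1
        fail[k] = length
    period = n - fail[-1]  # smallest period of the whole list
    return period < n and n % period == 0

for lst in ([12, 4, 5, 7, 1, 2], [1, 2, 3, 1, 2, 3, 1, 2, 3],
            [1, 1, 1, 1, 1, 1], [1, 2, 4, 12, 13, 1, 2, 4, 12, 13]):
    print(is_repetition(lst))  # False, True, True, True
```

This runs in O(n), compared with trying every candidate pattern length separately.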
One option would be to look at compression algorithms; some of those rely on finding repeating patterns and replacing them with another symbol. In your case you simply need the part that identifies the pattern. You may find that it is similar to the method that you've described already, though.
Assuming that your "repeating pattern" is always repeated in full, like your sample data suggests, you could just think of your array as a bunch of repeating arrays of equal length. Meaning:
[1, 2, 3, 1, 2, 3, 1, 2, 3] is the same as [1, 2, 3] repeated three times.
This means that you could just check whether every x-th value in the array is equal. So, for x = 3:
array[0] == array[3] == array[6]
array[1] == array[4] == array[7]
array[2] == array[5] == array[8]
Since you don't know the length of the repeated pattern, you'd just have to try all possible lengths until you found a pattern or ran out of possible shorter arrays. I'm sure there are optimizations that can be added to the following, but it works (assuming I understand the question correctly, of course).
static void Main(string[] args)
{
int[] array1 = {12, 4, 5, 7, 1, 2};
int[] array2 = {1, 2, 3, 1, 2, 3, 1, 2, 3};
int[] array3 = {1, 1, 1, 1, 1, 1 };
int[] array4 = {1, 2, 4, 12, 13, 1, 2, 4, 12, 13 };
Console.WriteLine(splitMethod(array1));
Console.WriteLine(splitMethod(array2));
Console.WriteLine(splitMethod(array3));
Console.WriteLine(splitMethod(array4));
Console.ReadLine();
}
static bool splitMethod(int[] array)
{
for(int patternLength = 1; patternLength <= array.Length/2; patternLength++)
{
// if the pattern length doesn't divide the length of the array evenly,
// then we can't have a pattern of that length.
if(array.Length % patternLength != 0)
{
continue;
}
// To check if every x-th value is equal, we need to give a start index
// to begin our comparisons at.
// We'll start at index 0 and check it against 0+x, 0+x+x, 0+x+x+x, etc.
// Then we'll use index 1 and check it against 1+x, 1+x+x, 1+x+x+x, etc.
// Then... etc.
// If we find that the x-th values starting at a given start index aren't
// all equal, then we'll continue to the next pattern length.
// We'll assume our patternLength will produce a pattern and let
// our test determine if we don't have a pattern.
bool foundPattern = true;
for (int startIndex = 0; startIndex < patternLength; startIndex++)
{
if (!everyXValueEqual(array, patternLength, startIndex))
{
foundPattern = false;
break;
}
}
if (foundPattern)
{
return true;
}
}
return false;
}
static bool everyXValueEqual(int[] array, int x, int startIndex)
{
// if the next index we want to compare against is outside the bounds of the array
// we've done all the matching we can for a pattern of length x.
if (startIndex+x > array.Length-1)
return true;
// if the value at startIndex equals the value at startIndex + x
// we can go on to test values at startIndex + x and startIndex + x + x
if (array[startIndex] == array[startIndex + x])
return everyXValueEqual(array, x, startIndex + x);
return false;
}
Simple pattern recognition is the task of compression algorithms. Depending on the type of input and the type of patterns you're looking for, the algorithm of choice may be very different: just consider that any file is an array of bytes and there are many types of compression for various types of data. Lossless compression finds exact patterns that repeat, and lossy compression finds approximate patterns, where the approximation is limited by some "real-world" consideration.
In your case you can apply a pseudo-zip compression, where you start filling up a list of encountered sequences.
Here's a suggestion as a C# sketch (GetInputData and Output stand for your own input and output code):
// C# sketch (C# 8+ for the range syntax); needs using System.Collections.Generic; and using System.Linq;
int[] input = GetInputData();
// key: the substring joined into a string (int[] keys would compare by reference); value: times it's found
var encounters = new Dictionary<string, int>();
int start = 0;
for (int i = 0; i < input.Length; i++) {
    for (int j = start; j <= i; j++) { // for each substring between 'start' and 'i'
        string key = string.Join(",", input[j..(i + 1)]);
        if (encounters.ContainsKey(key)) {
            if (j == start) start++; // if the entire substring already exists - move the starting point
            encounters[key] += 1; // increase the count where the substring already exists
        } else {
            // consider: if (MeetsSomeMinimumRequirements(input[j..(i + 1)]))
            encounters.Add(key, 1); // add a new pattern
        }
    }
}
Output(encounters.Where(kv => kv.Value > 1)); // show the patterns found more than once
I haven't debugged the sample above, so use it just as a starting point. The core idea is that you'd have an encounters list where various substrings are collected and counted, the most frequent will have highest Value in the end.
You can alter the algorithm above by storing some function of the substrings instead of the entire substring or add some minimum requirements such as minimum length etc. Too many options, complete discussion is not possible within a post.
Since you're looking for repeated patterns, you could force your array into a string and run a regular expression against it. This being my second answer, I'm just playing around here.
static Regex regex = new Regex(@"^(?<main>(?<v>;\d+)+?)(\k<main>)+$", RegexOptions.Compiled);
static bool regexMethod(int[] array)
{
string a = ";" + string.Join(";", array);
return regex.IsMatch(a);
}
The regular expression is
(?<v>;\d+) - A group named "v" which matches a semicolon (the delimiter in this case) and 1 or more digits
(?<main>(?<v>;\d+)+?) - a group named "main" which matches the "v" group 1 or more times, but the least number of times it can to satisfy the regex.
(\k<main>)+ - matches the text that the "main" group matched 1 or more times
^ ... $ - these anchor the ends of the pattern to the ends of the string.

(Any Language) Find all permutations of elements in a vector using swapping

I was asked this question in a Lab session today.
We can imagine a vector of length N containing the elements 1 ... N. Is there an algorithmic (systematic) method of generating all permutations, or orders, of the elements in the vector? One proposed method was to swap random elements. This would work provided all previously generated permutations were stored for future reference, but it is a very inefficient method, both space-wise and time-wise.
The reason for doing this, by the way, is to remove special elements (e.g. elements which are zero) from special positions in the vector, where such an element is not allowed. Therefore the random method isn't quite so ridiculous, but imagine the case where the number of elements is large and the number of possible permutations (which are such that there are no "special elements" in any of the "special positions") is low.
We tried to work through this problem for the case of N = 5:
x = [1, 2, 3, 4, 5]
First, swap the elements at positions 4 and 5:
x = [1, 2, 3, 5, 4]
Then swap positions 3 and 5:
x = [1, 2, 4, 5, 3]
Then positions 3 and 4:
x = [1, 2, 5, 4, 3]
Originally we thought using two indices, ix and jx, might be a possible solution. Something like:
ix = 0;
jx = 0;
for(;;)
{
++ ix;
if(ix >= N)
{
ix = 0;
++ jx;
if(jx >= N)
{
break; // We have got to an exit condition, but HAVEN'T got all permutations
}
}
swap elements at positions ix and jx
print out the elements
}
This works for the case where N = 3. However it doesn't work for higher N. We think that this sort of approach might be along the right lines. We were trying to extend to a method where 3 indexes are used, for some reason we think that might be the solution: Using a 3rd index to mark a position in the vector where the index ix starts or ends. But we got stuck, and decided to ask the SO community for advice.
One way to do this is as follows, starting with the first character e:
First recurse on the next element
Then, for each element e2 after e:
Swap e and e2
Then recurse on the next element
And undo the swap
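In Java:
// call with permutation(input, 0), where input is a char[]
static void permutation(char[] array, int start) {
    if (start == array.length) {
        System.out.println(new String(array));
        return;
    }
    for (int i = start; i < array.length; i++) {
        swap(array, start, i);
        permutation(array, start + 1);
        swap(array, start, i); // undo the swap
    }
}
static void swap(char[] array, int i, int j) {
    char tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}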
With the main call of this function, it will try each character in the first position and then recurse. Simply looping over all the characters works here because we undo each swap afterwards, so after the recursive call returns, we're guaranteed to be back where we started.
And then, for each of those recursive calls, it tries each remaining character in the second position. And so on.
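If the goal is to reach every permutation by swapping one pair at a time (closer to what the question's ix/jx loop was attempting), Heap's algorithm does exactly that: each permutation differs from the previous one by a single swap, and nothing has to be stored. This is a different technique from the recursive swap-and-undo answer; a minimal Python sketch of its standard iterative form, with `heap_permutations` as a hypothetical name:

```python
def heap_permutations(items):
    """Yield every ordering of items; each result differs from the previous
    one by swapping exactly two positions (Heap's algorithm, iterative form)."""
    a = list(items)
    n = len(a)
    c = [0] * n  # per-level counters that replace the recursion stack
    yield tuple(a)
    i = 1
    while i < n:
        if c[i] < i:
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]          # even level: swap with the first
            else:
                a[c[i]], a[i] = a[i], a[c[i]]    # odd level: swap with the c[i]-th
            yield tuple(a)
            c[i] += 1
            i = 1
        else:
            c[i] = 0
            i += 1

print(list(heap_permutations([1, 2, 3])))
# [(1, 2, 3), (2, 1, 3), (3, 1, 2), (1, 3, 2), (2, 3, 1), (3, 2, 1)]
```

For the "special positions" use case, each generated tuple can be checked and rejected as it is produced.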

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of integer values whose sum is 20 so that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly 20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?
There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that, comparing every value in B with its ideal value in parentheses, the difference is never more than 1.
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.
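A minimal Python sketch of the decreasing-sum method, assuming non-negative integer inputs with a positive sum. It uses exact integer round-half-up instead of floating-point `round` (which rounds exact halves to even), so exact .5 cases follow that rule; `apportion` is a hypothetical name:

```python
def apportion(values, total):
    """Scale non-negative integers to integers summing to total, shrinking
    the remaining sum T and remaining target S after each element."""
    t = sum(values)   # T: sum of the values not yet processed
    s = total         # S: amount still to hand out
    result = []
    for a in values:
        # round(a / t * s) computed exactly, rounding halves up
        b = (2 * a * s + t) // (2 * t) if t > 0 else 0
        result.append(b)
        t -= a
        s -= b
    return result

print(apportion([3, 3, 3, 3, 3, 3, 18], 20))  # [2, 2, 2, 2, 2, 1, 9]
```

The last non-zero element always receives exactly what is left of S (its share of the remaining T is 1), which is why the sum comes out exact.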
Your problem is similar to proportional representation, where you want to share N seats (in your case 20) among parties proportionally to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18].
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats, votes):
    """assign n seats proportionally to votes using the Hagenbach-Bischoff quota
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota = sum(votes) / (1. + nseats)  # force float
    frac = [vote / quota for vote in votes]
    res = [int(f) for f in frac]
    n = nseats - sum(res)  # number of seats remaining to allocate
    if n == 0: return res  # done
    if n < 0: return [min(x, nseats) for x in res]  # see siamii's comment
    # give the remaining seats to the n parties with the largest remainder
    remainders = [ai - bi for ai, bi in zip(frac, res)]
    limit = sorted(remainders, reverse=True)[n - 1]
    # n parties with a remainder of at least limit get an extra seat
    for i, r in enumerate(remainders):
        if r >= limit:
            res[i] += 1
            n -= 1  # attempt to handle perfect equality
            if n == 0: return res  # done
    raise RuntimeError("should never happen")
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.
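A sketch of the break-proportionality option in Python. It differs slightly from the description: it rounds down first, so the correction is always a set of increments, and it only picks entries whose input is non-zero, so zero inputs stay zero (an added assumption). Rounding down loses less than 1 per non-zero entry, so there are always enough distinct entries to pick from. `round_then_fix` is a hypothetical name; pass a seeded `random.Random` for repeatable output:

```python
import random

def round_then_fix(values, total, rng=None):
    """Round proportional shares down, then add 1 to randomly chosen
    non-zero entries until the result sums to total."""
    if rng is None:
        rng = random.Random()
    s = sum(values)
    out = [v * total // s for v in values]     # proportional, rounded down
    error = total - sum(out)                   # 0 <= error < number of non-zero values
    nonzero = [i for i, v in enumerate(values) if v > 0]
    for i in rng.sample(nonzero, error):       # distinct random positions
        out[i] += 1
    return out

print(sorted(round_then_fix([1, 1, 1], 20, random.Random(0))))  # [6, 7, 7]
```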
A naïve solution that doesn't perform well, but will provide the right result...
Write an iterator that, given a candidate array and the input array (both of length n), outputs the index of the element that is farthest from being proportional to the others (pseudocode):
function next_index(candidate, input)
    // Calculate weights; zero inputs should never get anything, so skip them
    for i in 1 .. n
        if input[i] > 0 then
            w[i] = candidate[i] / input[i]
        end if
    end for
    // find the smallest weight (start at infinity, since all weights are >= 0)
    min = infinity
    min_index = 0
    for i in 1 .. n
        if input[i] > 0 and w[i] < min then
            min = w[i]
            min_index = i
        end if
    end for
    return min_index
end function
Then just do this
result = [0, 0, ..., 0]   // n zeros
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
result[next_index(result, input)]++
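A runnable Python version of the round-down-then-top-up loop, with `next_index` skipping zero inputs and starting its minimum at infinity. `next_index` follows the pseudocode's name; `allocate_greedy` is hypothetical, and inputs are assumed to be non-negative integers with a positive sum:

```python
def next_index(candidate, values):
    """Index whose allocation is furthest below proportional: the smallest
    candidate[i] / values[i] among non-zero values; ties go to the earliest."""
    best, best_weight = None, float("inf")
    for i, (c, v) in enumerate(zip(candidate, values)):
        if v > 0 and c / v < best_weight:
            best, best_weight = i, c / v
    return best

def allocate_greedy(values, total):
    s = sum(values)
    result = [v * total // s for v in values]   # the "rounding down" start
    while sum(result) < total:                  # hand out what rounding lost
        result[next_index(result, values)] += 1
    return result

print(allocate_greedy([3, 3, 3, 3, 3, 3, 18], 20))  # [2, 2, 2, 2, 1, 1, 10]
```

On the question's example, ties go to the earliest index, which is the skew toward the beginning of the array mentioned above.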
So the answers and comments above were helpful... particularly the decreasing sum comment from @Frederik.
The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I multiply by 20 and divide by the sum. I keep the quotient and accumulate the remainder. Whenever the accumulator reaches sum(v), I add one to a value. That way all the remainders are guaranteed to be rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
    # set up by getting the sum of the values and starting
    # with an empty result list and accumulator
    sum_values = sum(values)
    new_values = []
    acc = 0
    for v in values:
        # for each value, find quotient and remainder
        q, r = divmod(v * total, sum_values)
        if acc + r < sum_values:
            # if the accumulator plus remainder is too small, just add and move on
            acc += r
        else:
            # we've accumulated enough to go over sum(values), so add 1 to result
            if acc > r:
                # add to previous
                new_values[-1] += 1
            else:
                # add to current
                q += 1
            acc -= sum_values - r
        # save the new value
        new_values.append(q)
    # accumulator is guaranteed to be zero at the end
    print(new_values, sum_values, acc)
    return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

Resources