Find the smallest sum of the squares of two measurements taken at least 5 min apart - algorithm

I'm trying to solve this problem in Python 3. I know how to find min1 and min2, but I can't work out how to handle the 5-element gap in a single pass.
Problem Statement
The program's input is a series of measurements performed by a device at 1-minute intervals. All values are natural numbers not exceeding 1000. The task is to find the smallest sum of the squares of two measurements taken at least 5 minutes apart. The first line of input contains one natural number N, the number of measurements; it is guaranteed that 5 < N <= 10000. Each of the following N lines contains one natural number, the result of the next measurement.
Your program should output a single number: the lowest sum of the squares of two measurements taken at least 5 minutes apart.
Sample input:
9
12
45
5
4
21
20
10
12
26
Expected output: 169

I like this question. Fun brain-teaser. :)
I noticed your sample input was all integers in range(1, 100) with some repetition, so I generated sample lists like so:
>>> import random
>>> sample_list = [random.choice(range(1, 100)) for i in range(10)]
>>> sample_list
[74, 68, 57, 18, 36, 8, 89, 73, 77, 80]
According to the problem statement, these numbers represent data measured at one-minute intervals, and one of our constraints is that our result must represent data gathered at least five minutes apart. Ultimately, that means the indices of the data in the original list must differ by at least five. In other words, for any two inputs v1 and v2:
abs(sample_list.index(v1) - sample_list.index(v2)) >= 5
must be true. We also know that we're searching for the smallest sum, so it will be helpful to look at the smallest numbers first.
Thus, I started by mapping the values in the sample_list to the indices where they occur, then sorting them:
>>> occurrences = {}
>>> for index, value in enumerate(sample_list):
...     try:
...         occurrences[value].append(index)
...     except KeyError:
...         occurrences[value] = [index]
...
>>> occurrences
{80: [9], 18: [3], 68: [1], 73: [7], 89: [6], 8: [5], 57: [2], 74: [0], 77: [8], 36: [4]}
>>> sorted_occurrences = sorted(occurrences)
>>> sorted_occurrences
[8, 18, 36, 57, 68, 73, 74, 77, 80, 89]
After a whole lot of trial and error, here's what I finally came up with in function form (including some of the earlier-discussed pieces):
def smallest_sum_of_squares_five_apart(sample):
    occurrences = {}
    for index, value in enumerate(sample):
        try:
            occurrences[value].append(index)
        except KeyError:
            occurrences[value] = [index]
    sorted_occurrences = sorted(occurrences)
    least_sum = 0
    for index, v1 in enumerate(sorted_occurrences):
        if least_sum and v1**2 > least_sum:
            return least_sum
        for v2 in sorted_occurrences[:index+1]:
            if (abs(max(occurrences[v1]) - min(occurrences[v2])) >= 5 or
                    abs(max(occurrences[v2]) - min(occurrences[v1])) >= 5):
                print('Found candidates:', str((v1, v2)))
                sum_of_squares = v1**2 + v2**2
                if not least_sum or sum_of_squares < least_sum:
                    least_sum = sum_of_squares
    return least_sum
The idea here is to:
Start by looking at the smallest values first.
Compare each one with all the values smaller than it, up to and including itself.
Check each pair against our distance criterion. Notice we do this by checking the extremes: the positions where the two values occur farthest from one another in the original sample.
Break out when checking becomes pointless.
Unfortunately, it is not sufficient to take the first pair found. Depending on how the list is constructed, this will not always find the smallest pair first; in fact, it does not for your own sample input. However, once v1**2 (the square of the larger value) exceeds the best sum so far, we know it is pointless to continue looking, since all measurements are natural numbers.
I have included a full runnable implementation below. It takes a command line argument (default 10) indicating the number of items you want in the randomly generated sample. It will print the randomly generated sample, all candidate pairs it checks, and finally the sum itself. I have checked this on 10-sized inputs several times and it seems to be working in general; feedback is welcome if it is not correct. Note also that you can uncomment your sample list from the question to see how it works (and that it gets the right answer) for it.
import random
import sys

def smallest_sum_of_squares_five_apart(sample):
    occurrences = {}
    for index, value in enumerate(sample):
        try:
            occurrences[value].append(index)
        except KeyError:
            occurrences[value] = [index]
    sorted_occurrences = sorted(occurrences)
    least_sum = 0
    for index, v1 in enumerate(sorted_occurrences):
        if least_sum and v1**2 > least_sum:
            return least_sum
        for v2 in sorted_occurrences[:index+1]:
            if (abs(max(occurrences[v1]) - min(occurrences[v2])) >= 5 or
                    abs(max(occurrences[v2]) - min(occurrences[v1])) >= 5):
                print('Found candidates:', str((v1, v2)))
                sum_of_squares = v1**2 + v2**2
                if not least_sum or sum_of_squares < least_sum:
                    least_sum = sum_of_squares
    return least_sum

if __name__ == '__main__':
    try:
        r = int(sys.argv[1])
    except IndexError:
        r = 10
    sample_list = [random.choice(range(1, 100)) for i in range(r)]
    #sample_list = [9, 12, 45, 5, 4, 21, 20, 10, 12, 26]
    print(sample_list)
    print(smallest_sum_of_squares_five_apart(sample_list))

Try this:
#!/usr/bin/env python3
import queue

inp = [9, 12, 45, 5, 4, 21, 20, 10, 12, 26]

q = queue.Queue()  # Make a new queue
smallest = False   # No smallest number, yet
best = False       # No best sum of squares, yet

for x in inp:
    q.put(x)  # Place current element on queue
    # If there's an item from at least five minutes ago, consider it
    if q.qsize() > 5:
        temp = q.get()  # Pop oldest item from queue into temporary variable
        if not smallest:     # If this is the first item at least 5 minutes old,
            smallest = temp  # it is the smallest such item by default
        else:                # otherwise...
            smallest = min(temp, smallest)  # only keep it if it is the smallest yet
        # If we have no best sum of squares yet, or the current item produces
        # a smaller one, save it as the best
        if (not best) or (x*x + smallest*smallest < best):
            best = x*x + smallest*smallest

print(best)
The idea is to walk through the input, keeping track of the smallest element seen so far that is at least five minutes old, and comparing it against the newest element, tracking the smallest sum of squares as we go.
I think you'll find the answer pretty intuitive if you think about it.
The algorithm operates in O(N) time.
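For what it's worth, the same single-pass idea also works without queue.Queue (which is a thread-safe container and heavier than needed here): just track a running minimum of every value at least five positions back. A minimal sketch, with names of my own choosing:
def min_sum_of_squares(measurements, gap=5):
    best = None
    oldest_min = None  # smallest value among indices <= i - gap
    for i, x in enumerate(measurements):
        if i >= gap:
            candidate = measurements[i - gap]
            oldest_min = candidate if oldest_min is None else min(oldest_min, candidate)
            s = x * x + oldest_min * oldest_min
            if best is None or s < best:
                best = s
    return best

print(min_sum_of_squares([12, 45, 5, 4, 21, 20, 10, 12, 26]))  # 169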

Related

Approaching Dynamic Programming problem / Two restrictions

Given an array A of n integers and k <= n, we want to choose k numbers from this array and split them into pairs, such that the sum of the absolute differences of those pairs is minimal.
Example: If n = 8 and k = 6 and the array is A = [140, 100, 92, 21, 32, 48, 32, 100], then the optimal answer is 27.
Does someone have an idea?
Where do I start with this problem?
I'm really bad at DP problems, so I would appreciate an informative answer describing the right approach.
Thanks in advance.
Sort the elements. Now pairs ought to be made only with neighbors (for cases like 10,20,20,30, pairing 10/20 + 20/30 gives the same result as 10/30 + 20/20; for cases like 10,14,20, pair 10/20 is worse than 10/14 or 14/20).
Walk through the array.
If a pair is open with the previous element, we have only one possibility: close that pair with the current element.
If there is no open pair and the number of closed pairs is less than k/2, we have two possibilities: start a pair or skip the current element (provided enough elements remain to form the pairs we still need), and we choose the better result of these two cases.
So we can build a recursion and then transform it into DP (the code below is not DP yet; it explores the full solution tree).
A = [140, 100, 92, 21, 32, 48, 32, 100]
n = len(A)
k = 6

def best(idx, openstate, pairsleft):
    if pairsleft > (n - idx + 1)//2:
        return 10000000
    if pairsleft == 0:
        return 0
    if openstate:
        return abs(A[idx] - A[idx-1]) + best(idx + 1, False, pairsleft - 1)
    else:
        return min(best(idx + 1, True, pairsleft), best(idx + 1, False, pairsleft))

A.sort()
print(best(0, False, k//2))
# output: 27
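For reference, the recursion above becomes actual DP once you memoize on the state (idx, openstate, pairsleft), e.g. with functools.lru_cache; a sketch (the recursion itself is unchanged):
from functools import lru_cache

A = sorted([140, 100, 92, 21, 32, 48, 32, 100])
n = len(A)
k = 6

@lru_cache(maxsize=None)
def best(idx, openstate, pairsleft):
    if pairsleft > (n - idx + 1) // 2:
        return 10000000  # pruning: not enough elements left
    if pairsleft == 0:
        return 0
    if openstate:
        # a pair was opened at idx-1; the only move is to close it with A[idx]
        return abs(A[idx] - A[idx - 1]) + best(idx + 1, False, pairsleft - 1)
    # otherwise: open a pair at idx, or skip idx
    return min(best(idx + 1, True, pairsleft), best(idx + 1, False, pairsleft))

print(best(0, False, k // 2))  # 27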

How to generate all set combinations in a random order

First off, I'm not even sure the terminology is the right one, as I haven't found anything similar (especially since I don't even know what keywords to use).
The problem:
There is a population of people, and I want to assign them into groups. I have a set of rules to give each assignation a score. I want to find the best one (or at least a very good one).
For example, with a population of four {A,B,C,D} and assigning to two groups of two, the possible assignations are:
{A,B},{C,D}
{A,C},{B,D}
{A,D},{B,C}
And for example, {B,A},{C,D} and {C,D},{A,B} are both the same as the first one (I don't care about the order inside the groups and the order of the groups themselves).
The number of people, the amount of groups and how many people fit in each group are all inputs.
My idea was to list each possible assignation, calculate its score, and keep track of the best one; that is, to brute force it. As the population can be big, I was thinking of going through them in a random order and returning the best one found when time runs out (probably when the user gets bored or thinks it is a good enough find). The population can vary from very small (the four listed) to really big (maybe 200+), so just trying random ones without caring about repeats breaks down with the small ones, where a brute force is possible (plus I wouldn't know when to stop if I used plain random permutations).
The population is big enough that listing all the assignations to be able to shuffle them doesn't fit into memory. So I need either a method to find all the possible assignations in a random order, or a method to, given an index, generate the corresponding assignation, and use an index array and shuffle that (the second would be better because I can then easily distribute the tasks into multiple servers).
A simple recursive algorithm for generating these pairings is to pair the first element with each of the remaining elements, and for each of those couplings, recursively generate all the pairings of the remaining elements. For groups, generate all the groups made up of the first element and all the combinations of the remaining elements, then recurse for the remainders.
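In Python (the language used elsewhere on this page), the pairs case of that recursion is only a few lines; a sketch, not tuned for speed:
def pairings(items):
    # yield all ways to split items into unordered pairs;
    # the first element anchors each pair, so no duplicates are produced
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

for p in pairings(['A', 'B', 'C', 'D']):
    print(p)  # [('A','B'),('C','D')], [('A','C'),('B','D')], [('A','D'),('B','C')]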
You can compute how many possible sets of groups there are like this:
public static int numGroupingCombinations(int n, int groupSize)
{
    if(n % groupSize != 0)
        return 0; // n must be a multiple of groupSize

    int count = 1;
    while(n > groupSize)
    {
        count *= nCr(n - 1, groupSize - 1);
        n -= groupSize;
    }
    return count;
}

public static int nCr(int n, int r)
{
    int ret = 1;
    for (int k = 0; k < r; k++) {
        ret = ret * (n-k) / (k+1);
    }
    return ret;
}
So I need either a method to find all the possible assignations in a random order, or a method to, given an index, generate the corresponding assignation, and use an index array and shuffle that (the second would be better because I can then easily distribute the tasks into multiple servers).
To generate a grouping from an index, choose a combination of items to group with the first element by taking the modulo of the index with the number of possible combinations, and generating the combination from the result using this algorithm. Then divide the index by that same number and recursively generate the rest of the set.
public static void generateGrouping(String[] elements, int groupSize, int start, int index)
{
    if(elements.length % groupSize != 0)
        return;

    int remainingSize = elements.length - start;
    if(remainingSize == 0)
    {
        // output the elements:
        for(int i = 0; i < elements.length; i += groupSize)
        {
            System.out.print("[");
            for(int j = 0; j < groupSize; j++)
                System.out.print(((j==0)?"":",") + elements[i+j]);
            System.out.print("]");
        }
        System.out.println("");
        return;
    }

    int combinations = nCr(remainingSize - 1, groupSize - 1);

    // decide which combination of remaining elements to pair the first element with:
    int[] combination = getKthCombination(remainingSize - 1, groupSize - 1, index % combinations);

    // swap elements into place
    for(int i = 0; i < groupSize - 1; i++)
    {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }

    generateGrouping(elements, groupSize, start + groupSize, index / combinations);

    // swap them back:
    for(int i = groupSize - 2; i >= 0; i--)
    {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }
}

public static void getKthCombination(int n, int r, int k, int[] c, int start, int offset)
{
    if(r == 0)
        return;
    if(r == n)
    {
        for(int i = 0; i < r; i++)
            c[start + i] = i + offset;
        return;
    }
    int count = nCr(n - 1, r - 1);
    if(k < count)
    {
        c[start] = offset;
        getKthCombination(n-1, r-1, k, c, start + 1, offset + 1);
        return;
    }
    getKthCombination(n-1, r, k-count, c, start, offset + 1);
}

public static int[] getKthCombination(int n, int r, int k)
{
    int[] c = new int[r];
    getKthCombination(n, r, k, c, 0, 0);
    return c;
}
The start parameter is just how far along the list you are, so pass zero when calling the function at the top level. The function could easily be rewritten to be iterative. You could also pass an array of indices instead of an array of objects that you want to group, if swapping the objects is a large overhead.
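If a Python version is easier to experiment with, here is a rough, unoptimized translation of the same decoding idea; it uses math.comb (Python 3.8+) and rebuilds lists instead of swapping in place, and the names are mine:
from math import comb

def kth_combination(n, r, k):
    # the k-th (0-indexed, lexicographic) r-subset of range(n)
    result, offset = [], 0
    while r > 0:
        count = comb(n - 1, r - 1)  # subsets containing the current smallest element
        if k < count:
            result.append(offset)
            n, r, offset = n - 1, r - 1, offset + 1
        else:
            k -= count
            n, offset = n - 1, offset + 1
    return result

def grouping_from_index(elements, group_size, index):
    # decode one grouping from its integer index (mixed-radix digits)
    elements = list(elements)
    groups = []
    while elements:
        head, rest = elements[0], elements[1:]
        combos = comb(len(rest), group_size - 1)
        picks = kth_combination(len(rest), group_size - 1, index % combos)
        index //= combos
        groups.append([head] + [rest[i] for i in picks])
        elements = [e for i, e in enumerate(rest) if i not in set(picks)]
    return groups

for idx in range(3):  # 4 items in groups of 2 -> 3 groupings in total
    print(grouping_from_index('ABCD', 2, idx))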
What you call "assignations" are partitions with a fixed number of equally sized parts. Well, mostly. You didn't specify what should happen if (# of groups) * (size of each group) is less than or greater than your population size.
Generating every possible partition in a non-specific order is not too difficult, but it is only good for small populations or for filtering and finding any partition that matches some independent criteria. If you need to optimize or minimize something, you'll end up looking at the whole set of partitions, which may not be feasible.
Based on the description of your actual problem, you want to read up on local search and optimization algorithms, of which simulated annealing (suggested in another answer here) is one such technique.
With all that said, here is a simple recursive Python function that generates fixed-length partitions with equal-sized parts in no particular order. It is a specialization of my answer to a similar partition problem, and that answer is itself a specialization of this answer. It should be fairly straightforward to translate into JavaScript (with ES6 generators).
def special_partitions(population, num_groups, group_size):
    """Yields all partitions with a fixed number of equally sized parts.

    Each yielded partition is a list of length `num_groups`,
    and each part a tuple of length `group_size`.
    """
    assert len(population) == num_groups * group_size
    groups = []  # a list of lists, currently empty

    def assign(i):
        if i >= len(population):
            yield list(map(tuple, groups))
        else:
            # try to assign to an existing group, if possible
            for group in groups:
                if len(group) < group_size:
                    group.append(population[i])
                    yield from assign(i + 1)
                    group.pop()
            # assign to an entirely new group, if possible
            if len(groups) < num_groups:
                groups.append([population[i]])
                yield from assign(i + 1)
                groups.pop()

    yield from assign(0)

for partition in special_partitions('ABCD', 2, 2):
    print(partition)

print()

for partition in special_partitions('ABCDEF', 2, 3):
    print(partition)
When executed, this prints:
[('A', 'B'), ('C', 'D')]
[('A', 'C'), ('B', 'D')]
[('A', 'D'), ('B', 'C')]
[('A', 'B', 'C'), ('D', 'E', 'F')]
[('A', 'B', 'D'), ('C', 'E', 'F')]
[('A', 'B', 'E'), ('C', 'D', 'F')]
[('A', 'B', 'F'), ('C', 'D', 'E')]
[('A', 'C', 'D'), ('B', 'E', 'F')]
[('A', 'C', 'E'), ('B', 'D', 'F')]
[('A', 'C', 'F'), ('B', 'D', 'E')]
[('A', 'D', 'E'), ('B', 'C', 'F')]
[('A', 'D', 'F'), ('B', 'C', 'E')]
[('A', 'E', 'F'), ('B', 'C', 'D')]
Let's say we have a total of N elements that we want to organize in G groups of E (with G*E = N). Neither the order of the groups nor the order of the elements within groups matter. The end goal is to produce every solution in a random order, knowing that we cannot store every solution at once.
First, let's think about how to produce one solution. Since order doesn't matter, we can normalize any solution by sorting the elements within groups as well as the groups themselves, by their first element.
For instance, if we consider the population {A, B, C, D}, with N = 4, G = 2, E = 2, then the solution {B,D}, {C,A} can be normalized as {A,C}, {B,D}. The elements are sorted within each group (A before C), and the groups are sorted (A before B).
When the solutions are normalized, the first element of the first group is always the first element of the population. The second element is one of the N-1 remaining, the third element is one of the N-2 remaining, and so on, except these elements must remain sorted. So there are (N-1)!/((N-E)!*(E-1)!) possibilities for the first group.
Similarly, the first element of each subsequent group is fixed: it is the smallest of the elements remaining once the previous groups have been formed. Thus, the number of possibilities for the (n+1)th group (n from 0 to G-1) is (N-nE-1)!/((N-(n+1)E)!*(E-1)!) = ((G-n)E-1)!/(((G-n-1)E)!*(E-1)!).
This gives us one possible way of indexing a solution. The index is not a single integer, but rather an array of G integers, the integer n (still from 0 to G-1) lying in the range 1 to (N-nE-1)!/((N-nE-E)!*(E-1)!) and representing group n (the "(n+1)th group") of the solution. This is easy to produce randomly and to check for duplicates.
The last thing we need is a way to produce a group from the corresponding integer: we need to choose E-1 elements from the N-nE-1 remaining. At this point, you can imagine listing every combination and choosing the one at that index. Of course, this can be done without generating every combination: see this question.
For curiosity, the total number of solutions is (GE)!/(G!*(E!)^G).
In your example, it is (2*2)!/(2!*(2!)^2) = 3.
For N = 200 and E = 2, there are 6.7e186 solutions.
For N = 200 and E = 5, there are 6.6e243 solutions (the maximum I found for 200 elements).
Additionally, for N = 200 and E > 13, the number of possibilities for the first group is greater than 2^64 (so it cannot be stored in a 64-bit integer), which is problematic for representing an index. But as long as you don't need groups with more than 13 elements, you can use arrays of 64-bit integers as indices.
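These counts are easy to reproduce with exact integer arithmetic; a quick check of the formula above:
from math import factorial

def num_solutions(G, E):
    # (G*E)! / (G! * (E!)**G): ways to split G*E items into G unordered groups of E
    return factorial(G * E) // (factorial(G) * factorial(E) ** G)

print(num_solutions(2, 2))           # 3
print(float(num_solutions(100, 2)))  # ~6.7e186 (N = 200, E = 2)
print(float(num_solutions(40, 5)))   # ~6.6e243 (N = 200, E = 5)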
Perhaps a simulated annealing approach might work. You can start with a non-optimal initial solution and iterate using heuristics to improve.
Your scoring criteria may help you choose the initial solution, e.g. make the best scoring first group you can, then with what's left make the best scoring second group, and so on.
Good choices of "neighboring states" may be implied by your scoring criteria, but at the very least, you could consider two states neighboring if they differ by a single swap.
So the iteration portion of the algorithm would be to try a bunch of swaps, sampled randomly, and choose the one that improves the global score according to the annealing schedule.
I'm hoping you can find a better choice of the adjacent states! That is, I'm hoping you can find better rules for iteratively improving based on your scoring criteria.
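For concreteness, here is a minimal sketch of that swap-neighborhood annealing loop; score is a user-supplied function (lower is better here), the cooling schedule is arbitrary, and the names are mine:
import math
import random

def anneal(groups, score, steps=100000, t0=1.0, cooling=0.9995):
    # neighbors differ by one cross-group swap; occasionally accept uphill moves
    best = current = score(groups)
    t = t0
    for _ in range(steps):
        g1, g2 = random.sample(range(len(groups)), 2)
        i, j = random.randrange(len(groups[g1])), random.randrange(len(groups[g2]))
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # swap two members
        new = score(groups)
        if new <= current or random.random() < math.exp((current - new) / t):
            current = new
            best = min(best, new)  # keeping the best grouping itself is an easy extension
        else:
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # undo the swap
        t *= cooling
    return best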
If you have a sufficiently large population that you can't fit all assignations in memory and are unlikely to ever test all possible assignations, then the simplest method will just be to choose test assignations randomly. For example:
repeat
    randomly shuffle population
    put 1st n/2 members of the shuffled pop into assig1 and 2nd n/2 into assig2
    score assignation and record it if best so far
until bored
If you have a large population it is unlikely that there will be much loss of efficiency due to duplicating a test as it is unlikely that you would chance on the same assignation again.
Depending on your scoring rules it may be more efficient to choose the next assignation to be tested by, for example, swapping a pair of members within the best assignation found so far, but you haven't provided enough information to determine if that is the case.
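A direct Python rendering of that loop (generalized to fixed-size groups; score is assumed given, with higher being better, and the names are mine):
import random

def random_search(population, group_size, score, tries=10000):
    # repeatedly shuffle and split into fixed-size groups, keeping the best seen
    pop = list(population)
    best, best_groups = None, None
    for _ in range(tries):
        random.shuffle(pop)
        groups = [pop[i:i + group_size] for i in range(0, len(pop), group_size)]
        s = score(groups)
        if best is None or s > best:
            best, best_groups = s, [list(g) for g in groups]
    return best, best_groups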
Here is an approach targeting your optimization-problem (and ignoring your permutation-based approach).
I formulate the problem as mixed-integer-problem and use specialized solvers to calculate good solutions.
As your problem is not fully specified, the model might need some modifications. But the general message is: this approach will be hard to beat!
Code
import numpy as np
from cvxpy import *

""" Parameters """
N_POPULATION = 50
GROUPSIZES = [3, 6, 12, 12, 17]
assert sum(GROUPSIZES) == N_POPULATION
N_GROUPS = len(GROUPSIZES)
OBJ_FACTORS = [0.4, 0.1, 0.15, 0.35]  # age is the most important

""" Create fake data """
age_vector = np.clip(np.random.normal(loc=35.0, scale=10.0, size=N_POPULATION).astype(int), 0, np.inf)
height_vector = np.clip(np.random.normal(loc=180.0, scale=15.0, size=N_POPULATION).astype(int), 0, np.inf)
weight_vector = np.clip(np.random.normal(loc=85, scale=20, size=N_POPULATION).astype(int), 0, np.inf)
skill_vector = np.random.randint(0, 100, N_POPULATION)

""" Calculate a-priori stats """
age_mean, height_mean, weight_mean, skill_mean = np.mean(age_vector), np.mean(height_vector), \
                                                 np.mean(weight_vector), np.mean(skill_vector)

""" Build optimization-model """
# Variables
X = Bool(N_POPULATION, N_GROUPS)  # 1 if part of group
D = Variable(4, N_GROUPS)         # aux-var for deviation-norm

# Constraints
constraints = []

# (1) each person is exactly in one group
for p in range(N_POPULATION):
    constraints.append(sum_entries(X[p, :]) == 1)

# (2) each group has exactly n (a-priori known) members
for g_ind, g_size in enumerate(GROUPSIZES):
    constraints.append(sum_entries(X[:, g_ind]) == g_size)

# Objective: minimize deviation from global-statistics within each group
# (ugly code; could be improved a lot!)
group_deviations = [[], [], [], []]  # age, height, weight, skill
for g_ind, g_size in enumerate(GROUPSIZES):
    group_deviations[0].append((sum_entries(mul_elemwise(age_vector, X[:, g_ind])) / g_size) - age_mean)
    group_deviations[1].append((sum_entries(mul_elemwise(height_vector, X[:, g_ind])) / g_size) - height_mean)
    group_deviations[2].append((sum_entries(mul_elemwise(weight_vector, X[:, g_ind])) / g_size) - weight_mean)
    group_deviations[3].append((sum_entries(mul_elemwise(skill_vector, X[:, g_ind])) / g_size) - skill_mean)

for i in range(4):
    for g in range(N_GROUPS):
        constraints.append(D[i, g] >= abs(group_deviations[i][g]))

obj_parts = [sum_entries(OBJ_FACTORS[i] * D[i, :]) for i in range(4)]
objective = Minimize(sum(obj_parts))

""" Build optimization-problem & solve """
problem = Problem(objective, constraints)
problem.solve(solver=GUROBI, verbose=True, TimeLimit=120)  # might need to use non-commercial solver here
print('Min-objective: ', problem.value)

""" Evaluate solution """
filled_groups = [[] for g in range(N_GROUPS)]
for g_ind, g_size in enumerate(GROUPSIZES):
    for p in range(N_POPULATION):
        if np.isclose(X[p, g_ind].value, 1.0):
            filled_groups[g_ind].append(p)

for g_ind, g_size in enumerate(GROUPSIZES):
    print('Group: ', g_ind, ' of size: ', g_size)
    print('  ' + str(filled_groups[g_ind]))

group_stats = []
for g in range(N_GROUPS):
    age_mean_in_group = age_vector[filled_groups[g]].mean()
    height_mean_in_group = height_vector[filled_groups[g]].mean()
    weight_mean_in_group = weight_vector[filled_groups[g]].mean()
    skill_mean_in_group = skill_vector[filled_groups[g]].mean()
    group_stats.append((age_mean_in_group, height_mean_in_group, weight_mean_in_group, skill_mean_in_group))

print('group-assignment solution means: ')
for g in range(N_GROUPS):
    print(np.round(group_stats[g], 1))

""" Compare with input """
input_data = np.vstack((age_vector, height_vector, weight_vector, skill_vector))
print('input-means')
print(age_mean, height_mean, weight_mean, skill_mean)
print('input-data')
print(input_data)
Output (time-limit of 2 minutes; commercial solver)
Time limit reached
Best objective 9.612058823514e-01, best bound 4.784117647059e-01, gap 50.2280%
('Min-objective: ', 0.961205882351435)
('Group: ', 0, ' of size: ', 3)
[16, 20, 27]
('Group: ', 1, ' of size: ', 6)
[26, 32, 34, 45, 47, 49]
('Group: ', 2, ' of size: ', 12)
[0, 6, 10, 12, 15, 21, 24, 30, 38, 42, 43, 48]
('Group: ', 3, ' of size: ', 12)
[2, 3, 13, 17, 19, 22, 23, 25, 31, 36, 37, 40]
('Group: ', 4, ' of size: ', 17)
[1, 4, 5, 7, 8, 9, 11, 14, 18, 28, 29, 33, 35, 39, 41, 44, 46]
group-assignment solution means:
[ 33.3 179.3 83.7 49. ]
[ 33.8 178.2 84.3 49.2]
[ 33.9 178.7 83.8 49.1]
[ 33.8 179.1 84.1 49.2]
[ 34. 179.6 84.7 49. ]
input-means
(33.859999999999999, 179.06, 84.239999999999995, 49.100000000000001)
input-data
[[ 22. 35. 28. 32. 41. 26. 25. 37. 32. 26. 36. 36.
27. 34. 38. 38. 38. 47. 35. 35. 34. 30. 38. 34.
31. 21. 25. 28. 22. 40. 30. 18. 32. 46. 38. 38.
49. 20. 53. 32. 49. 44. 44. 42. 29. 39. 21. 36.
29. 33.]
[ 161. 158. 177. 195. 197. 206. 169. 182. 182. 198. 165. 185.
171. 175. 176. 176. 172. 196. 186. 172. 184. 198. 172. 162.
171. 175. 178. 182. 163. 176. 192. 182. 187. 161. 158. 191.
182. 164. 178. 174. 197. 156. 176. 196. 170. 197. 192. 171.
191. 178.]
[ 85. 103. 99. 93. 71. 109. 63. 87. 60. 94. 48. 122.
56. 84. 69. 162. 104. 71. 92. 97. 101. 66. 58. 69.
88. 69. 80. 46. 74. 61. 25. 74. 59. 69. 112. 82.
104. 62. 98. 84. 129. 71. 98. 107. 111. 117. 81. 74.
110. 64.]
[ 81. 67. 49. 74. 65. 93. 25. 7. 99. 34. 37. 1.
25. 1. 96. 36. 39. 41. 33. 28. 17. 95. 11. 80.
27. 78. 97. 91. 77. 88. 29. 54. 16. 67. 26. 13.
31. 57. 84. 3. 87. 7. 99. 35. 12. 44. 71. 43.
16. 69.]]
Solution remarks
This solution looks quite nice (regarding mean-deviation) and it only took 2 minutes (we decided on the time-limit a-priori)
We also got bounds: 0.961 is our solution, and we know the optimum can't be lower than 0.478
Reproducibility
The code uses numpy and cvxpy
A commercial solver was used
You might need to use a non-commercial MIP-solver (supporting time-limit for early abortion; take current best-solution)
The valid open-source MIP-solvers supported in cvxpy are: cbc (no chance of setting time-limits for now) and glpk (check the docs for time-limit support)
Model decisions
The code uses L1-norm penalization, which results in an MIP-problem
Depending on your problem, it might be wise to use L2-norm penalization (one big deviation hurts more than many smaller ones), which will result in a harder problem (MIQP / MISOCP)

finding out the maximum number if cost is associated with using each digit

I have been given the total money I have, and I know the cost it takes to write down each digit (1 to 9). How do I create the maximum number with that money? Is there a dynamic programming approach for this problem?
Example:
total money available = 2
cost of each digit (1 to 9) = 9, 11, 1, 12, 5, 8, 9, 10, 6
output: 33
I don't think you need dynamic programming, just do the following:
Pick as many of the digit that costs the least as you can afford.
Now you have a number (consisting of only 1 type of digit).
Replace the first digit with the greatest possible digit that you can afford
If you have money left, do the same for the second, and the third and so on, until you run out of money.
Why this works:
Consider that 11111 > 9999 and 91111 > 88888, or, in words, it's best to:
Pick as many digits as possible, which is done by picking the cheapest digits.
Then replace these digits, from the left, with the highest valued digit you can afford (this is always better than picking a few more expensive digits to start).
Optimization:
To do this efficiently, discard any digit that costs at least as much as some bigger digit (because it's never a good idea to pick it instead of a cheaper digit with a bigger value):
Given:
9, 11, 1, 12, 5, 8, 9, 10, 6
Removing all those where I put an X:
X, X, 1, X, 5, X, X, X, 6
So:
1, 5, 6
Now you can just do binary search on it (just remember which digit which value came from) (although, for only 9 digits, binary search doesn't really do wonders for the already-minor running time).
Running time:
O(n) (with or without the optimization, since 9 is constant)
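For concreteness, a compact Python sketch of this greedy procedure (names are mine; cost[d-1] is the price of digit d):
def largest_number(total_money, cost):
    cheapest = min(cost)
    length = total_money // cheapest  # most digits we can afford
    if length == 0:
        return ''  # nothing affordable
    digits = [cost.index(cheapest) + 1] * length
    leftover = total_money - length * cheapest
    for i in range(length):
        # upgrade position i to the largest digit the spare money can buy
        for d in range(9, digits[i], -1):
            if cost[d - 1] - cheapest <= leftover:
                leftover -= cost[d - 1] - cheapest
                digits[i] = d
                break
    return ''.join(map(str, digits))

print(largest_number(2, [9, 11, 1, 12, 5, 8, 9, 10, 6]))  # 33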
Here is code implementing the algorithm proposed in Bernhard Baker's answer.
This question was asked in my HackerRank exam conducted by Providence Health Services. It is commonly asked in interviews as the Largest Number of Vaccines problem.
total_money = 2
cost_of_digit = [9, 11, 1, 12, 5, 8, 9, 10, 6]

# Appending the list digits with [weight, number]
k = 1
digits = list()
for i in cost_of_digit:
    digits.append([i, k])
    k += 1

# Discarding any digits that cost more than a bigger digit: (because it's never
# a good idea to pick that one instead of a cheaper digit with a bigger value)
i = 8
while i > 0:
    if digits[i][0] <= digits[i-1][0]:
        del digits[i-1]
        i -= 1
    else:
        i -= 1

# Sorting the digits based on weight
digits.sort(key=lambda x: x[0], reverse=True)
max_digits = total_money // digits[-1][0]  # Max digits that we can have in ANSWER
selected_digit_weight = digits[-1][0]
ans = list()
if max_digits > 0:
    for i in range(max_digits):
        ans.append(digits[-1][1])
    # Calculating extra money we have after the selected digits
    extra_money = total_money % digits[-1][0]
    index = 0
    # Sorting digits in Descending order according to their value
    digits.sort(key=lambda x: x[1], reverse=True)
    while extra_money >= 0 and index < max_digits:
        temp = extra_money + selected_digit_weight  # The money we have to replace the digit
        swap = False  # If no digit is changed we can break the while loop
        for i in digits:
            # Checking if the weight is less than money we have AND the digit is greater than previous digit
            if i[0] <= temp and i[1] > digits[-1][1]:
                ans[index] = i[1]
                index += 1
                extra_money = temp - i[0]
                swap = True
                break
        if not swap:
            break

if len(ans) == 0:
    print("Number cannot be formed")
else:
    for i in ans:
        print(i, end="")
    print()
Solve it as a Knapsack problem
with
Cost = digit cost
Value = digit (9 is better than 1)
then sort the chosen digits in decreasing order.

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of values whose sum is 20, such that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly 20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?
There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that, comparing every value in B with its ideal value in parentheses, the difference is never more than 1.
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.
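A minimal Python sketch of the procedure above (note that Python 3's round() uses banker's rounding, which happens to agree with the worked example here):
def apportion(a, s):
    # b[i] = round(a[i] / T * S), then shrink T and S before the next element
    t = sum(a)
    b = []
    for x in a:
        v = round(x / t * s) if t else 0
        b.append(v)
        t -= x
        s -= v
    return b

print(apportion([400, 400, 0, 0, 100, 50, 50], 20))  # [8, 8, 0, 0, 2, 1, 1]
print(apportion([3, 3, 3, 3, 3, 3, 18], 20))         # [2, 2, 2, 2, 2, 1, 9], sums to 20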
Your problem is similar to proportional representation, where you want to share N seats (in your case 20) among parties proportionally to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18].
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats, votes):
    """assign n seats proportionally to votes using Hagenbach-Bischoff quota
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota = sum(votes) / (1. + nseats)  # force float
    frac = [vote / quota for vote in votes]
    res = [int(f) for f in frac]
    n = nseats - sum(res)  # number of seats remaining to allocate
    if n == 0:
        return res  # done
    if n < 0:
        return [min(x, nseats) for x in res]  # see siamii's comment
    # give the remaining seats to the n parties with the largest remainder
    remainders = [ai - bi for ai, bi in zip(frac, res)]
    limit = sorted(remainders, reverse=True)[n-1]
    # n parties with remainder larger than limit get an extra seat
    for i, r in enumerate(remainders):
        if r >= limit:
            res[i] += 1
            n -= 1  # attempt to handle perfect equality
            if n == 0:
                return res  # done
    raise  # should never happen
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.
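A quick sketch of that correction-distribution idea, nudging randomly chosen nonzero elements so values never go negative:
import random

def proportional_sum(inp, total=20):
    t = sum(inp)
    out = [round(x / t * total) for x in inp]
    err = total - sum(out)
    step = 1 if err > 0 else -1
    while err != 0:
        i = random.randrange(len(out))
        if step < 0 and out[i] == 0:
            continue  # never decrement a zero
        out[i] += step
        err -= step
    return out

print(proportional_sum([1, 1, 1]))  # e.g. [7, 6, 7]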
A naïve solution that doesn't perform well, but will provide the right result...
Write a function that, given an array of eight integers (candidate) and the input array, outputs the index of the element that is farthest away from being proportional to the others (pseudocode):
function next_index(candidate, input)
    // Calculate weights (skip entries where input[i] is 0)
    for i in 1 .. 8
        w[i] = candidate[i] / input[i]
    end for
    // find the smallest weight
    min = infinity
    min_index = 0
    for i in 1 .. 8
        if w[i] < min then
            min = w[i]
            min_index = i
        end if
    end for
    return min_index
end function
Then just do this
result = [0, 0, 0, 0, 0, 0, 0, 0]
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
    result[next_index(result, input)]++
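In Python, the whole naïve loop can be written compactly. This sketch skips zero-valued inputs (the pseudocode above would divide by zero on them), and on ties it skews toward the start of the array, as noted:
def apportion_iterative(inp, total=20):
    # give each unit to the element currently farthest below its proportional share
    result = [0] * len(inp)
    for _ in range(total):
        i = min((j for j in range(len(inp)) if inp[j] > 0),
                key=lambda j: result[j] / inp[j])
        result[i] += 1
    return result

print(apportion_iterative([3, 3, 3, 3, 3, 3, 18]))  # [2, 2, 2, 2, 2, 2, 8]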
So the answers and comments above were helpful... particularly the decreasing sum comment from #Frederik.
The solution I came up with takes advantage of the fact that, for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I multiply by 20 and divide by the sum. I keep the quotient and accumulate the remainder. Whenever the accumulator reaches sum(v), I add one to the value. That way I'm guaranteed that all the remainders get rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
    # set up by getting the sum of the values and starting
    # with an empty result list and accumulator
    sum_values = sum(values)
    new_values = []
    acc = 0
    for v in values:
        # for each value, find quotient and remainder
        q, r = divmod(v * total, sum_values)
        if acc + r < sum_values:
            # if the accumulator plus remainder is too small, just add and move on
            acc += r
        else:
            # we've accumulated enough to go over sum(values), so add 1 to result
            if acc > r:
                # add to previous
                new_values[-1] += 1
            else:
                # add to current
                q += 1
            acc -= sum_values - r
        # save the new value
        new_values.append(q)
    # accumulator is guaranteed to be zero at the end
    print(new_values, sum_values, acc)
    return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

Non-repeating pseudo random number stream with 'clumping'

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.
The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)
Clumping means that there's a higher probability that the next number will be close rather than far.
Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.
Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:
Non-repeating
Non-Tracking
"Clumped"
Cascade a few LFSRs with periods smaller than you need, combining them so that the fastest-changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will generate 3 clumped values, then jump and generate another 3 clumped values. Use something other than multiplication (such as mixing some of the bits of the higher-period registers) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.
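To illustrate the structure (not the LFSRs themselves), here is a toy sketch in which plain modular multiplications stand in for the registers; the fast-cycling low part produces the clump, and the slowly advancing high part produces the jump:
def perm(n, mult, mod):
    # a fixed permutation of range(mod) whenever gcd(mult, mod) == 1
    return (n * mult) % mod

def clumped(n, low=5, high=20):
    # maps a counter n to a non-repeating value in range(low * high):
    # the low digit cycles fast (clump), the high digit advances once per cycle (jump)
    return perm(n % low, 2, low) + low * perm(n // low, 7, high)

print([clumped(n) for n in range(20)])
# [0, 2, 4, 1, 3, 35, 37, 39, 36, 38, 70, 72, 74, 71, 73, 5, 7, 9, 6, 8]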
For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple thought experiments will shed some light. This is not formal proof by any means. Perhaps someone will shore it up.
So, I can generate a sequence that has some randomness easily:
Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.
For example:
if x_i = 6, x_(i+1) is random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the distribution is monotonically increasing.
Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?
Edit: So after researching RBarryYoung's claim of "Linear Congruential Generators" (not differentiators... is this what RBY meant?), clearly I was wrong! These sequences exist, and by necessity any PRNG whose next number depends only on the current number and some global, non-changing state can't have repeats within a cycle (after some initial burn-in period).
By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.
One way to get "clumpy" numbers would be to use a normal distribution.
You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.
>>> import random
>>> r = [1]
>>> for x in range(20):
...     r.append(random.normalvariate(r[-1], 1))
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
I know it's hard to tell by looking at the numbers, but you can sort of see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.
If you need only integers, you can use a large mean and variance and truncate/divide to obtain integer output. The normal distribution is by definition continuous, meaning all real numbers are potential output; it is not restricted to integers.
Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):
scatter data http://img178.imageshack.us/img178/8677/48855312.png
Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.
I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:
RANGE = (1..1000)
CLUMP_ODDS = .5
CLUMP_DIST = 10

last = rand(RANGE)
while still_want_numbers
    if rand(CLUMP_ODDS) # clump!
        next = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2) # do some boundary checking here
    else # don't clump!
        next = rand(RANGE)
    end
    print next
    last = next
end
It's a little rudimentary, but would something like that suit your needs?
In the range [0, 10] the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
You can get your desired behavior by delinearizing random() - for example, random()^k will be skewed towards small numbers for k > 1. A possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd, if you use the following function... ;)
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
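A sketch of that delinearized walk, with k = 3 and random() taken to be Python's random.random():
import random

def clumpy_walk(x0=0.0, steps=20, k=3):
    # x(n+1) = (x(n) + 5 * (2*random() - 1)**k) mod 10; odd k skews toward small steps
    xs = [x0]
    for _ in range(steps):
        xs.append((xs[-1] + 5 * (2 * random.random() - 1) ** k) % 10)
    return xs

print([round(v, 2) for v in clumpy_walk()])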
How about (pseudocode):
// clumpiness is static in that its value is retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0

method getNextvalue(int lastValue)
    float r = rand(); // float from 0 to 1
    int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
    clumpiness += 0.1 * rand();
    if (clumpiness >= 1.0) clumpiness -= 1.0;
    return Round(lastValue + change);
Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.
For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.
EDIT in response to comment:
Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible pseudo-random sequence.
Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?
This is the output of a small ruby program (explanations below):
#!/usr/bin/env ruby
require 'digest/md5'

$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10  # mixing factor (higher means less clumping)

def pseudo_random_bit(k, n)
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1] & 1
end

def sequence(x)
  h = $n/2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2*h
    # maybe exchange 1st with last
    if [0, $n-1].include? x
      x ^= ($n-1)*pseudo_random_bit(k, 2*h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h+(x >> 1)) if x < 2*(($n-1)/2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end

puts (0..99).map {|x| sequence(x)}.join(', ')
The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.
A single number x can be mapped modulo n after k passes to any number of the interval [x-k, x+k] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.
As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.
Maybe you can chain together 2 or more LCGs in a manner similar to that described for the LFSRs above. Increment the least-significant LCG with its seed; on each full cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repetitions in the 'clumped' least-significant part, you can randomly reseed that LCG on each full cycle.
