Maximum of groups with minimum numbers - algorithm

I'm creating a macro for an RPG in Lua. In it I need to form the maximum number of groups from a stack of dice. To form a group, the dice must add up to at least the group minimum; a group may exceed this minimum.
For example: 1, 2, 4, 5, 5, 6, 7, 10 with min = 10 gives: 6+4, 5+5, 7+1+2, 10.
I collected the result of each die into an array, and pulled out the dice that can form groups on their own:
table.sort(dice)
minimum = tonumber(minimum)
for i = #dice, 1, -1 do
    -- any die that meets the minimum by itself forms its own group
    if dice[i] >= minimum then
        stack.total = stack.total + 1
        table.insert(stack.dice, 1, math.floor(dice[i]))
        table.remove(dice, i)
    end
end
It doesn't have to be in Lua; even a mathematical formula would be of great help.

Here's an efficient recursive solution. It likely doesn't scale as well as solving a mixed integer program would, but it's simple and doesn't require an external library. You could probably make it even faster by memoizing it, at the expense of a lot of memory.
The core idea is: form all possible groups that meet the minimum; for each such group, make the max number of groups out of the remaining rolls; take the best solution. The rest is optimization.
The first optimization is to loop over only some groups. Since we might as well put every roll in a group, the largest roll is in some group. To avoid looping over all permutations of the groups, enumerate the possibilities for that group only.
The second optimization is to stop searching once we find a provable optimum. Obviously we can't make more groups than the floor of (sum of all rolls) / minimum. If we make this many, we can't improve.
The third optimization is to avoid enumerating duplicate groups. When we decrease i, we're considering groups that did not include the element at that position. To avoid duplicates, we skip i over the elements identical to the one that we just rejected.
In Python 3:
def all_groups(minimum, rolls, j):
    roll = rolls[j]
    if minimum <= roll:
        yield [roll], rolls[:j]
    else:
        i = j - 1
        while i >= 0:
            for group, rest in all_groups(minimum - roll, rolls, i):
                group.append(roll)
                rest.extend(rolls[i + 1 : j])
                yield group, rest
            # skip over elements identical to the one we just rejected
            while i > 0 and rolls[i - 1] == rolls[i]:
                i -= 1
            i -= 1

def max_groups_helper(minimum, rolls, lower_bound=0):
    upper_bound = sum(min(roll, minimum) for roll in rolls) // minimum
    if upper_bound < lower_bound:
        return None
    if upper_bound <= 0:
        return []
    best = []
    for group, rest in sorted(
        all_groups(minimum, rolls, len(rolls) - 1),
        key=lambda group_rest: sum(group_rest[0]),
    ):
        candidate = max_groups_helper(minimum, rest, max(lower_bound - 1, len(best)))
        if candidate is None:
            continue
        candidate.append(group)
        best = candidate
        if len(best) >= upper_bound:
            break
    return best

def max_groups(minimum, rolls):
    assert minimum > 0
    rolls = sorted(rolls)  # the enumeration assumes rolls are in ascending order
    return max_groups_helper(minimum, rolls, 0)
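For the example in the question, a quick sanity check (the exact partition returned may differ from the one shown, but the group count is what matters):

rolls = [1, 2, 4, 5, 5, 6, 7, 10]
groups = max_groups(10, rolls)
print(len(groups))  # 4
print(groups)       # e.g. [[10], [5, 5], [6, 4], [7, 2, 1]], in some order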


What is the logic behind the algorithm

I am trying to solve a problem from Codility, "Even sums", but am unable to do so. Here is the question:
Even sums is a game for two players. Players are given a sequence of N positive integers and take turns alternately. In each turn, a player chooses a non-empty slice (a subsequence of consecutive elements) such that the sum of values in this slice is even, then removes the slice and concatenates the remaining parts of the sequence. The first player who is unable to make a legal move loses the game.
You play this game against your opponent and you want to know if you can win, assuming both you and your opponent play optimally. You move first.
Write a function:
string solution(vector<int> &A);
that, given a zero-indexed array A consisting of N integers, returns a string of format "X,Y" where X and Y are, respectively, the first and last positions (inclusive) of the slice that you should remove on your first move in order to win, assuming you have a winning strategy. If there is more than one such winning slice, the function should return the one with the smallest value of X. If there is more than one slice with the smallest value of X, the function should return the shortest. If you do not have a winning strategy, the function should return "NO SOLUTION".
For example, given the following array:
A[0] = 4 A[1] = 5 A[2] = 3 A[3] = 7 A[4] = 2
the function should return "1,2". After removing the slice from positions 1 to 2 (with an even sum of 5 + 3 = 8), the remaining array is [4, 7, 2]. Then the opponent will be able to remove either the first element (with even sum 4) or the last element (with even sum 2). Afterwards you can make a move that leaves the array containing just [7], so your opponent will not have a legal move and will lose.
Note that removing slice "2,3" (with an even sum of 3 + 7 = 10) is also a winning move, but slice "1,2" has a smaller value of X.
For the following array:
A[0] = 2 A[1] = 5 A[2] = 4
the function should return "NO SOLUTION", since there is no strategy that guarantees you a win.
Assume that:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [1..1,000,000,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
I have found a solution online in python.
def check(start, end):
    if start > end:
        res = 'NO SOLUTION'
    else:
        res = str(start) + ',' + str(end)
    return res

def trans(strr):
    if strr == 'NO SOLUTION':
        return (-1, -1)
    else:
        a, b = strr.split(',')
        return (int(a), int(b))

def solution(A):
    # write your code in Python 2.7
    odd_list = [ind for ind in range(len(A)) if A[ind] % 2 == 1]
    if len(odd_list) % 2 == 0:
        return check(0, len(A) - 1)
    odd_list = [-1] + odd_list + [len(A)]
    res_cand = []
    # the numbers at either end of A are even
    count = odd_list[1]
    second_count = len(A) - 1 - odd_list[-2]
    first_count = odd_list[2] - odd_list[1] - 1
    if second_count >= count:
        res_cand.append(trans(check(odd_list[1] + 1, len(A) - 1 - count)))
    if first_count >= count:
        res_cand.append(trans(check(odd_list[1] + count + 1, len(A) - 1)))
    twosum = first_count + second_count
    if second_count < count <= twosum:
        res_cand.append(trans(check(odd_list[1] + (first_count - (count - second_count)) + 1, odd_list[-2])))
    ###########################################
    count = len(A) - 1 - odd_list[-2]
    first_count = odd_list[1]
    second_count = odd_list[-2] - odd_list[-3] - 1
    if first_count >= count:
        res_cand.append(trans(check(count, odd_list[-2] - 1)))
    if second_count >= count:
        res_cand.append(trans(check(0, odd_list[-2] - count - 1)))
    twosum = first_count + second_count
    if second_count < count <= twosum:
        res_cand.append(trans(check(count - second_count, odd_list[-3])))
    res_cand = sorted(res_cand, key=lambda x: (-x[0], -x[1]))
    cur = (-1, -2)
    for item in res_cand:
        if item[0] != -1:
            cur = item
    return check(cur[0], cur[1])
This code works, but I am unable to follow the flow from one function to the other, and I don't understand the logic of the algorithm: how it approaches the problem and solves it. This might be a long task, but can anybody please explain the algorithm to me? Thanks in advance.
So far I have figured out that the number of odd numbers is crucial to finding the result. In particular, the indices of the first odd number and the last odd number are needed to calculate the important values.
Now I need to understand the logic behind comparisons such as "if first_count >= count" and "if second_count < count <= twosum".
Update:
Hey guys I found out the solution to my question and finally understood the logic of the algorithm.
The idea lies in the symmetry of the array. We can never win the game if the array is symmetrical; here, symmetrical means there is exactly one odd number in the middle and an equal number of evens on either side of it.
If there is an even number of odds, we can win the game directly: the total sum is then even, so the whole array can be removed on the first move.
If there is an odd number of odds, we should always try to hand the opponent a symmetrical array. That is what the algorithm is trying to do.
There are two cases: either the last odd number remains, or the first odd number remains. I will be happy to explain more if you didn't understand it. Thanks.

Better Algorithm to find the maximum number whose square divides K

Given a number K which is a product of two different numbers (A, B), find the maximum number (<= A and <= B) whose square divides K.
E.g.: K = 54 (6*9). Both numbers are available, i.e. 6 and 9.
My approach is fairly trivial:
Take the smaller of the two (6 in this case); call it A.
Square A and divide K by it; if the division is exact, that's the number.
Else set A = A-1 and repeat, down to A = 1.
For the given example, 3*3 = 9 divides K, and hence 3 is the answer.
I'm looking for a better algorithm than this trivial solution.
Note: there are test cases in the thousands, so the best possible approach is needed.
I am sure someone else will come up with a nice answer involving modulus arithmetic. Here is a naive approach...
Each of the factors can themselves be factored (though it might be an expensive operation).
Given the factors, you can then look for groups of repeated factors.
For instance, using your example:
Prime factors of 9: 3, 3
Prime factors of 6: 2, 3
All prime factors: 2, 3, 3, 3
The 3 appears more than once (one full pair), so you have your answer: the square of 3 divides 54.
Second example of 36 x 9 = 324
Prime factors of 36: 2, 2, 3, 3
Prime factors of 9: 3, 3
All prime factors: 2, 2, 3, 3, 3, 3
So you have two 2s and four 3s, which means 2x3x3 is repeated. 2x3x3 = 18, so the square of 18 divides 324.
Edit: Python prototype
import math

def factors(num, d):
    """Recursively find the prime factors of num and tally them in d,
    a dictionary of {factor: occurrences}. It must contain at least
    {2: 0} but need not have any other pre-populated elements.
    This is not the most efficient algorithm and it has not been
    tested a lot; you should probably use another one."""
    while num % 2 == 0:
        num //= 2
        d[2] += 1
    i = 3
    found = False
    while not found and i <= int(math.sqrt(num)):
        if num % i == 0:
            found = True
            factors(i, d)
            factors(num // i, d)
        else:
            i += 2
    if not found and num > 1:  # num is prime (guard against num == 1)
        if num in d:
            d[num] += 1
        else:
            d[num] = 1

# MAIN ROUTINE IS HERE
n1 = 37  # first number (6 in your example)
n2 = 41  # second number (9 in your example)
d = {2: 0}  # initialise factors (start with "no factors of 2")
factors(n1, d)  # find the factors of n1 and add them to the tally
factors(n2, d)  # find the factors of n2 and add them to the tally
sqfac = 1
# now find all factors repeated twice and multiply them together
for k in d:
    sqfac *= k ** (d[k] // 2)
# here is the result
print(sqfac)
Answer in C++
#include <iostream>
using namespace std;

// returns true if x*x divides k exactly
bool squareDivides(int x, int k)
{
    return k % (x * x) == 0;
}

// i is the candidate, j the other factor; restructured from the original
// answer, whose mutual recursion never terminated when both numbers failed
void func(int i, int j)
{
    const int k = 54;
    if (squareDivides(i, k))
    {
        if (i < j && squareDivides(j, k))
        {
            cout << "Number is correct: " << j << endl; // prefer the larger
        }
        else
        {
            cout << "Number is correct: " << i << endl;
        }
    }
    else
    {
        cout << "Number is wrong" << endl;
        if (squareDivides(j, k))
        {
            cout << "Number is correct: " << j << endl;
        }
    }
}
Explanation:
First test whether i squared divides k exactly. If it does, check the other multiple: if it is greater and its square also divides k, prefer it; otherwise i is correct. If i's square does not divide k, print "Number is wrong" and test j instead.
If I got the problem correctly, you have a rectangle of length A, width B, and area K, and you want to convert it to a square while losing the minimum possible area.
If this is the case, then the problem with your algorithm is not the cost of iterating through multiple iterations until it gets the output. Rather, the problem is that your algorithm depends heavily on the length A and width B of the input rectangle, while it should depend only on the area K.
For example:
Assume A = 1, B = 25. Then K = 25 (the rectangle's area).
Your algorithm will take the minimum value, which is A, and accept it as the answer after a single iteration. That is very fast but gives the wrong answer: it results in a square of area 1 and wastes the remaining 24 (whatever the unit, cm or m).
The correct answer here is 5, which will never be reached by your algorithm.
So, in my solution I assume a single input K. My idea is as follows:
x = sqrt(K)
if x is an integer, x is the answer
else loop x = x-1 down to 1:
    if K / x^2 is an integer, x is the answer
This might take extra iterations but is guaranteed to give the accurate answer.
Also, there might be some concern about the cost of sqrt(K), but it is called just once, to avoid relying on the misleading length and width inputs.
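A minimal Python sketch of this search (the function name is mine):

import math

def largest_square_divisor(k):
    # search downward from floor(sqrt(k)) for the largest x with x*x dividing k
    for x in range(math.isqrt(k), 0, -1):
        if k % (x * x) == 0:
            return x

print(largest_square_divisor(54))   # 3
print(largest_square_divisor(324))  # 18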

Find all possible combinations from 4 input numbers which can add up to 24

Actually, this question can be generalized as below:
Find all possible combinations from a given set of elements that meet certain criteria.
So, any good algorithms?
There are only 16 possibilities (and one of those is to add together "none of them", which ain't gonna give you 24), so the old-fashioned "brute force" algorithm looks pretty good to me:
for (unsigned int choice = 1; choice < 16; ++choice) {
    int sum = 0;
    if (choice & 1) sum += elements[0];
    if (choice & 2) sum += elements[1];
    if (choice & 4) sum += elements[2];
    if (choice & 8) sum += elements[3];
    if (sum == 24) {
        // we have a winner
    }
}
In the completely general form of your problem, the only way to tell whether a combination meets "certain criteria" is to evaluate those criteria for every single combination. Given more information about the criteria, maybe you could work out some ways to avoid testing every combination and build an algorithm accordingly, but not without those details. So again, brute force is king.
There are two interesting explanations about the sum problem, both in Wikipedia and MathWorld.
In the case of the first question you asked, the first answer is good for a limited number of elements. You should realize that the reason Mr. Jessop used 16 as the boundary for his loop is because this is 2^4, where 4 is the number of elements in your set. If you had 100 elements, the loop limit would become 2^100 and your algorithm would literally take forever to finish.
In the case of a bounded sum, you should consider a depth first search, because when the sum of elements exceeds the sum you are looking for, you can prune your branch and backtrack.
In the case of the generic question, finding the subset of elements that satisfy certain criteria, this is known as the knapsack problem, which is NP-complete; no polynomial-time algorithm for it is known, so in the worst case you should expect exponential time.
Nevertheless, there are several heuristics that bring good results to the table, including (but not limited to) genetic algorithms (one I personally like, for I wrote a book on them) and dynamic programming. A simple search in Google will show many scientific papers that describe different solutions for this problem.
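To make the bounded-sum case concrete, here is a sketch of the depth-first search with pruning described above (my own illustration, assuming nonnegative integers):

def subsets_summing_to(elements, target):
    """All subsets of `elements` (nonnegative ints) that sum to `target`,
    pruning any branch whose partial sum already exceeds the target."""
    results = []
    def dfs(start, partial, total):
        if total == target:
            results.append(list(partial))
        for i in range(start, len(elements)):
            if total + elements[i] > target:
                continue  # prune: this branch can only grow larger
            partial.append(elements[i])
            dfs(i + 1, partial, total + elements[i])
            partial.pop()  # backtrack
    dfs(0, [], 0)
    return results

print(subsets_summing_to([8, 10, 6, 4, 2], 24))
# [[8, 10, 6], [8, 10, 4, 2]]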
Find all possible combinations from a given set of elements, which
meets a certain criteria
If I understood you right, this code will be helpful for you:
>>> from itertools import combinations as combi
>>> combi.__doc__
'combinations(iterable, r) --> combinations object\n\nReturn successive r-length combinations of elements in the iterable.\n\ncombinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)'
>>> set = range(4)
>>> set
[0, 1, 2, 3]
>>> criteria = range(3)
>>> criteria
[0, 1, 2]
>>> for tuple in list(combi(set, len(criteria))):
...     if cmp(list(tuple), criteria) == 0:
...         print 'criteria exists in tuple: ', tuple
...
criteria exists in tuple:  (0, 1, 2)
>>> list(combi(set, len(criteria)))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Generally, for a problem like this you have to try all possibilities; what you should do is have the code abort building a combination as soon as you know it cannot satisfy the criteria (if your criterion is that you may not have more than two blue balls, abort any calculation that already has more than two). This is backtracking:
def perm(set, permutation):
    if length(set) == length(permutation):
        print permutation
    else:
        for element in set:
            permutation.add(element)
            if permutation can still satisfy the criteria:
                perm(set, permutation)
            permutation.pop()  # remove the element added above
The set of input numbers matters: as soon as you allow e.g. negative numbers, imaginary numbers, or rational numbers in your start set, things change. You could also restrict the set to, e.g., all even or all odd inputs.
That means it's hard to build something deductive. You need brute force, a.k.a. trying every combination.
In this particular problem you could build an algorithm that recurses: e.g. find every combination of 3 ints in (1..22) that adds up to 23, then append 1; every combination that adds up to 22, then append 2; etc. Each of those can again be broken into every combination of 2 that adds up to 21, etc. You need to decide whether you can count the same number twice.
Once you have that you have a recursive function to call -
combinations( 24 , 4 ) = combinations( 23, 3 ) + combinations( 22, 3 ) + ... combinations( 4, 3 );
combinations( 23 , 3 ) = combinations( 22, 2 ) + ... combinations( 3, 2 );
etc
This works well except you have to be careful around repeating numbers in the recursion.
private int[][] work()
{
    const int target = 24;
    List<int[]> combos = new List<int[]>();
    for (int i = 0; i < 9; i++)
        for (int x = 0; x < 9; x++)
            for (int y = 0; y < 9; y++)
                for (int z = 0; z < 9; z++)
                {
                    int res = x + y + z + i;
                    if (res == target)
                    {
                        combos.Add(new int[] { x, y, z, i });
                    }
                }
    return combos.ToArray();
}
It works instantly, but there probably are better methods rather than 'guess and check'. All I am doing is looping through every possibility, adding them all together, and seeing if it comes out to the target value.
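The same guess-and-check loop can be generalized so the nesting depth isn't hard-coded; a Python sketch of the identical enumeration (my own rewrite, not from the original answer):

from itertools import product

target = 24
# all 4-tuples of digits 0..8 whose members sum to the target
combos = [c for c in product(range(9), repeat=4) if sum(c) == target]
print(len(combos))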
If I understand your question correctly, what you are asking for is called "permutations", the number N of possible ways to arrange X numbers taken from a set of Y numbers:
N = Y! / (Y - X)!
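As a quick check of the formula in Python (my snippet):

from math import factorial

def permutations_count(y, x):
    return factorial(y) // factorial(y - x)

print(permutations_count(4, 2))  # 4!/2! = 12 ordered arrangements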
I don't know if this will help, but this is a solution I came up with for an assignment on permutations.
You have an input of '123' (a string); using substring functions:
1) Put each digit of the input into an array:
array[N1, N2, N3, ...]
2) Create a swap function:
function swap(Number A, Number B)
{
    temp = Number B
    Number B = Number A
    Number A = temp
}
3) This algorithm uses the swap function to move the numbers around until all permutations are done:
original_string = '123'
temp_string = ''
i = 0
while (temp_string != original_string)
{
    swap(array element[i], array element[i+1])
    if (i == 1)
        i = 0
    temp_string = array.toString
    i++
}
Hopefully you can follow my pseudocode, but this works at least for 3-digit permutations.
(n x n)
Build up an n x n square matrix and print all of its corresponding crossed values together, e.g.:

       1    2    3    4
  1   11   12   13   14
  2   ..   ..   ..   ..
  3   ..
  4   ..   ..

How would you look at developing an algorithm for this hotel problem?

There is a problem I am working on for a programming course and I am having trouble developing an algorithm to suit the problem. Here it is:
You are going on a long trip. You start on the road at mile post 0. Along the way there are n
hotels, at mile posts a1 < a2 < ... < an, where each ai is measured from the starting point. The
only places you are allowed to stop are at these hotels, but you can choose which of the hotels
you stop at. You must stop at the final hotel (at distance an), which is your destination.
You'd ideally like to travel 200 miles a day, but this may not be possible (depending on the spacing
of the hotels). If you travel x miles during a day, the penalty for that day is (200 - x)^2. You want
to plan your trip so as to minimize the total penalty, that is, the sum over all travel days of the
daily penalties.
Give an efficient algorithm that determines the optimal sequence of hotels at which to stop.
So, my intuition tells me to start from the back, checking penalty values, then somehow match them going back in the forward direction (resulting in an O(n^2) runtime, which is good enough for the situation).
Does anyone see a way to make this idea work out, or have any ideas on possible implementations?
If x is a marker number, a(x) is the mileage to that marker, and p(x) is the minimum penalty to get to that marker, then you can calculate p(n) for marker n if you know p(m) for all markers m before n.
To calculate p(n), find the minimum of p(m) + (200 - (a(n) - a(m)))^2 over all markers m where a(m) < a(n) and (200 - (a(n) - a(m)))^2 is less than your current best for p(n) (the last part is an optimization).
For the starting marker 0, a(0) = 0 and p(0) = 0; for marker 1, p(1) = (200 - a(1))^2. With that starting information you can calculate p(2), then p(3), etc. up to p(n).
Edit: switched to Java code, using the example from the OP's comment. Note that this does not have the optimization check described in the second paragraph.
public static void printPath(int path[], int i) {
    if (i == 0) return;
    printPath(path, path[i]);
    System.out.print(i + " ");
}

public static void main(String args[]) {
    int hotelList[] = {0, 200, 400, 600, 601};
    int penalties[] = {0, (int) Math.pow(200 - hotelList[1], 2), -1, -1, -1};
    int path[] = {0, 0, -1, -1, -1};
    for (int i = 2; i <= hotelList.length - 1; i++) {
        for (int j = 0; j < i; j++) {
            int tempPen = (int) (penalties[j] + Math.pow(200 - (hotelList[i] - hotelList[j]), 2));
            if (penalties[i] == -1 || tempPen < penalties[i]) {
                penalties[i] = tempPen;
                path[i] = j;
            }
        }
    }
    for (int i = 1; i < hotelList.length; i++) {
        System.out.print("Hotel: " + hotelList[i] + ", penalty: " + penalties[i] + ", path: ");
        printPath(path, i);
        System.out.println();
    }
}
Output is:
Hotel: 200, penalty: 0, path: 1
Hotel: 400, penalty: 0, path: 1 2
Hotel: 600, penalty: 0, path: 1 2 3
Hotel: 601, penalty: 1, path: 1 2 4
It looks like you can solve this problem with dynamic programming. The subproblem is the following:
d(i) : The minimum penalty possible when travelling from the start to hotel i.
The recursive formula is as follows:
d(0) = 0 where 0 is the starting position.
d(i) = min over j = 0, 1, ..., i-1 of ( d(j) + (200 - (a_i - a_j))^2 )
The minimum penalty for reaching hotel i is found by trying all stopping places for the previous day, adding today's penalty and taking the minimum of those.
In order to find the path, we store in a separate array (path[]) which hotel we had to travel from in order to achieve the minimum penalty for that particular hotel. By traversing the array backwards (from path[n]) we obtain the path.
The runtime is O(n^2).
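A direct transcription of this recurrence into Python, including the path[] array described above (a sketch; names are mine):

def min_penalty(a):
    # a[0] = 0 is the start; a[1..n] are the hotel mileposts in ascending order
    n = len(a) - 1
    d = [0] * (n + 1)     # d[i]: minimum penalty to reach hotel i
    path = [0] * (n + 1)  # path[i]: best previous stop before hotel i
    for i in range(1, n + 1):
        d[i] = float("inf")
        for j in range(i):
            cand = d[j] + (200 - (a[i] - a[j])) ** 2
            if cand < d[i]:
                d[i], path[i] = cand, j
    # traverse path[] backwards from the destination to recover the stops
    stops, i = [], n
    while i != 0:
        stops.append(i)
        i = path[i]
    return d[n], stops[::-1]

print(min_penalty([0, 200, 400, 600, 601]))  # (1, [1, 2, 4])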
This is equivalent to finding the shortest path between two nodes in a directional acyclic graph. Dijkstra's algorithm will run in O(n^2) time.
Your intuition is better, though. Starting at the back, calculate the minimum penalty of stopping at that hotel. The first hotel's penalty is just (200 - x)^2, where x is the distance traveled that day. Then, for each of the other hotels (in reverse order), scan forward to find the lowest-penalty hotel. A simple optimization is to stop as soon as the penalty costs start increasing, since that means you've overshot the global minimum.
I don't think you can do it as easily as sysrqb states.
On a side note, there is really no difference between starting from the start or the end; the goal is to find the stops such that each day's travel is as close to 200 miles as possible.
The question as stated seems to allow travelling more than 200 miles per day, and the penalty is the same for going over or under (since it is squared). This seems to prefer overshooting rather than undershooting, since for an equal penalty you end up closer to the goal.
However, given this layout:

A ----- B ---- C ------- D ------ N
0      190    210       390      590

that is not always optimal. It is better to go B->D->N for a total penalty of only (200-190)^2 = 100. Going further via C->D->N gives a penalty of 100 + 400 = 500.
The answer looks like a full breadth first search with active pruning if you already have an optimal solution to reach point P, removing all solutions thus far where
sum(penalty-x) > sum(penalty-p) AND distance-to-x <= distance-to-p - 200
This would be an O(n^2) algorithm
Something like...

Quicksort all hotels by distance from the start (discard any with distance > hotelN).
Create an array/list of solutions, each containing (ListOfHotels, I, DistanceSoFar, Penalty).
Inspect each hotel in order; for each hotel I:
    calculate the penalty to I, starting from each prior solution;
    pruning: remove from the list each prior solution that is more than 200
    distanceSoFar behind the current hotel and has Penalty > current.penalty.
Loop.
Following is the MATLAB code for the hotel problem.
clc
clear all

% Data
% a = [0;50;180;150;50;40];
% a = [0, 200, 400, 600, 601];
a = [0,10,180,350,450,600];
% a = [0,1,2,3,201,202,203,403];
n = length(a);
opt(1) = 0;
prev(1) = 1;
for i = 2:n
    opt(i) = Inf;
    for j = 1:i-1
        if (opt(i) > (opt(j) + (200 - a(i) + a(j))^2))
            opt(i) = opt(j) + (200 - a(i) + a(j))^2;
            prev(i) = j;
        end
    end
    S(i) = opt(i);
end
k = 1;
i = n;
sol(1) = n;
while (i > 1)
    k = k + 1;
    sol(k) = prev(i);
    i = prev(i);
end
for i = k:-1:1
    stops(i) = sol(k-i+1);  % reverse so the stops run from start to destination
end
stops
Step 1 of 2

Sub-problem:
Let C(j) be the sub-problem: the minimum penalty accumulated up to hotel a_j, for 0 <= j <= n. The required value for the problem is C(n).

Algorithm to find the minimum total penalty:
If the trip stops at location a_j, then the previous stop is some a_i with i < j. Considering all the possibilities for a_i:
C(j) = min{ C(i) + (200 - (a_j - a_i))^2 }, 0 <= i < j.
Initialize C(0) = 0 and a_0 = 0, then compute the remaining values in order of increasing j.
To find the optimal route, record the minimizing i for each j and use this information to backtrack from C(n).
Here, C(n) is the penalty of the last hotel.

Pseudocode:
// Function definition
Procedure min_tot()
    // Outer loop over the hotels
    for j = 1 to n:
        // penalty if hotel j is the first stop of the trip
        C(j) = (200 - a_j)^2
        // Inner loop over the possible previous stops
        for i = 1 to j-1:
            // keep the minimum total penalty for hotel j
            C(j) = min(C(j), C(i) + (200 - (a_j - a_i))^2)
    // Return the total penalty of the last hotel
    return C(n)

Step 2 of 2

Explanation:
The above algorithm finds the minimum total penalty from the starting point to the end point. It uses min() to compare the total penalty through each possible previous stop and keeps the smallest value.

Running time of the algorithm:
There are n sub-problems and each sub-problem takes O(n) time to resolve; the backtracking pass takes O(n) time. The total running time of the algorithm is n x n = n^2 = O(n^2).
I have come across this problem recently and wanted to share my solution written in Javascript.
Not dissimilar to most of the solutions above, I have used a dynamic programming approach. To calculate penalties[i], we search for the stopping place for the previous day such that the penalty is minimum:
penalties(i) = min over j = 0..i-1 of ( penalties(j) + (200 - (hotelList[i] - hotelList[j]))^2 )
The solution does not assume that the first penalty is Math.pow(200 - hotelList[1], 2). We don't know whether or not it is optimal to stop at the first stop, so this assumption should not be made.
In order to find the optimal path and store all the stops along the way, the helper array path is being used. Finally, the array is being traversed backwards to calculate the finalPath.
function calculateOptimalRoute(hotelList) {
    const path = [];
    const penalties = [];
    for (let i = 0; i < hotelList.length; i++) {
        penalties[i] = Math.pow(200 - hotelList[i], 2);
        path[i] = 0;
        for (let j = 0; j < i; j++) {
            const temp = penalties[j] + Math.pow(200 - (hotelList[i] - hotelList[j]), 2);
            if (temp < penalties[i]) {
                penalties[i] = temp;
                path[i] = j + 1;
            }
        }
    }
    const finalPath = [];
    let index = path.length - 1;
    while (index >= 0) {
        finalPath.unshift(index + 1);
        index = path[index] - 1;
    }
    console.log('min penalty is ', penalties[hotelList.length - 1]);
    console.log('final path is ', finalPath);
    return finalPath;
}
// calculateOptimalRoute([20, 40, 60, 940, 1500])
// Outputs [3, 4, 5]
// calculateOptimalRoute([190, 420, 550, 660, 670])
// Outputs [1, 2, 5]
// calculateOptimalRoute([200, 400, 600, 601])
// Outputs [1, 2, 4]
// calculateOptimalRoute([])
// Outputs []
To answer your question concisely: for constraint-satisfaction problems like this, polynomial time is usually considered "efficient", so if you have an O(n^2) algorithm, that counts as "efficient".
I think the simplest method, given N total miles and 200 miles per day, would be to divide N by 200 to get X, the number of days you will travel. Round that to the nearest whole number of days X', then divide N by X' to get Y, the optimal number of miles to travel in a day. This is effectively a constant-time operation. If there were a hotel every Y miles, stopping at those hotels would produce the lowest possible score, by minimizing the effect of squaring each day's penalty. For instance, if the total trip is 605 miles, the penalty for travelling 201, 202, and 202 miles is 1 + 4 + 4 = 9, far less than the 0 + 0 + 25 = 25 (200 + 200 + 205) you would get by minimizing each individual day's travel penalty as you went.
Now, you can traverse the list of hotels. The fastest method would be to simply pick the hotel that is the closest to each multiple of Y miles. It's linear-time and will produce a "good" result. However, I do not think this will produce the "best" result in all cases.
The more complex but foolproof method is to get the two closest hotels to each multiple of Y; the one immediately before and the one immediately after. This produces an array of X' pairs, which can be traversed in all possible permutations in 2^X' time. You can shorten this by applying Dijkstra to a map of these pairs, which will determine the least costly path for each day's travel, and will execute in roughly (2X')^2 time. This will probably be the most efficient algorithm that is guaranteed to produce the optimal result.
As @rmmh mentioned, you are finding the minimum-distance path, where the "distance" here is the penalty (200 - x)^2.
So you will try to find a stopping plan by finding the minimum penalty.
Let D(a_i) give the distance of a_i from the starting point. Then:
P(i) = min{ P(j) + (200 - (D(a_i) - D(a_j)))^2 } where 0 <= j < i
From a casual analysis it looks to be an O(n^2) algorithm (1 + 2 + 3 + ... + n = O(n^2)).
As a proof of concept, here is my JavaScript solution in Dynamic Programming without nested loops.
We start at zero miles.
We find the next stop by keeping the penalty as low as we can by comparing the penalty of a current hotel in the loop to the previous hotel's penalty.
Once we have our current minimum, we have found our stop for the day. We assign this point as our next starting point.
Optionally, we could keep the total of the penalties:
let hotels = [40, 80, 90, 200, 250, 450, 680, 710, 720, 950, 1000, 1080, 1200, 1480]

function findOptimalPath(arr) {
    let start = 0
    let stops = []
    for (let i = 0; i < arr.length; i++) {
        if (Math.pow((start + 200) - arr[i - 1], 2) < Math.pow((start + 200) - arr[i], 2)) {
            stops.push(arr[i - 1])
            start = arr[i - 1]
        }
    }
    console.log(stops)
}

findOptimalPath(hotels)
Here is my Python solution using Dynamic Programming:
distance = [150, 180, 250, 340]

def hotelStop(distance):
    n = len(distance)
    DP = [0 for _ in distance]
    for i in range(n - 2, -1, -1):
        min_penalty = float("inf")
        for j in range(i + 1, n):
            # going from hotel i to hotel j in one day
            x = distance[j] - distance[i]
            penalty = (200 - x) ** 2
            total_penalty = penalty + DP[j]
            min_penalty = min(min_penalty, total_penalty)
        DP[i] = min_penalty
    return DP[0]

hotelStop(distance)

Select k random elements from a list whose elements have weights

Selecting without any weights (equal probabilities) is beautifully described here.
I was wondering if there is a way to convert this approach to a weighted one.
I am also interested in other approaches as well.
Update: Sampling without replacement
If the sampling is with replacement, you can use this algorithm (implemented here in Python):
import random

items = [(10, "low"),
         (100, "mid"),
         (890, "large")]

def weighted_sample(items, n):
    total = float(sum(w for w, v in items))
    i = 0
    w, v = items[0]
    while n:
        x = total * (1 - random.random() ** (1.0 / n))
        total -= x
        while x > w:
            x -= w
            i += 1
            w, v = items[i]
        w -= x
        yield v
        n -= 1
This is O(n + m) where m is the number of items.
Why does this work? It is based on the following algorithm:
def n_random_numbers_decreasing(v, n):
    """Like reversed(sorted(v * random() for i in range(n))),
    but faster because we avoid sorting."""
    while n:
        v *= random.random() ** (1.0 / n)
        yield v
        n -= 1
The function weighted_sample is just this algorithm fused with a walk of the items list to pick out the items selected by those random numbers.
This in turn works because the probability that n random numbers 0..v will all happen to be less than z is P = (z/v)^n. Solve for z, and you get z = v * P^(1/n). Substituting a random number for P picks the largest number with the correct distribution; and we can just repeat the process to select all the other numbers.
If the sampling is without replacement, you can put all the items into a binary heap, where each node caches the total of the weights of all items in that subheap. Building the heap is O(m). Selecting a random item from the heap, respecting the weights, is O(log m). Removing that item and updating the cached totals is also O(log m). So you can pick n items in O(m + n log m) time.
(Note: "weight" here means that every time an element is selected, the remaining possibilities are chosen with probability proportional to their weights. It does not mean that elements appear in the output with a likelihood proportional to their weights.)
Here's an implementation of that, plentifully commented:
import random

class Node:
    # Each node in the heap has a weight, value, and total weight.
    # The total weight, self.tw, is self.w plus the weight of any children.
    __slots__ = ['w', 'v', 'tw']

    def __init__(self, w, v, tw):
        self.w, self.v, self.tw = w, v, tw

def rws_heap(items):
    # h is the heap. It's like a binary tree that lives in an array.
    # It has a Node for each pair in `items`. h[1] is the root. Each
    # other Node h[i] has a parent at h[i>>1]. Each node has up to 2
    # children, h[i<<1] and h[(i<<1)+1]. To get this nice simple
    # arithmetic, we have to leave h[0] vacant.
    h = [None]  # leave h[0] vacant
    for w, v in items:
        h.append(Node(w, v, w))
    for i in range(len(h) - 1, 1, -1):  # total up the tws
        h[i>>1].tw += h[i].tw           # add h[i]'s total to its parent
    return h

def rws_heap_pop(h):
    gas = h[1].tw * random.random()  # start with a random amount of gas
    i = 1                    # start driving at the root
    while gas >= h[i].w:     # while we have enough gas to get past node i:
        gas -= h[i].w        #   drive past node i
        i <<= 1              #   move to first child
        if gas >= h[i].tw:   #   if we have enough gas:
            gas -= h[i].tw   #     drive past first child and descendants
            i += 1           #     move to second child
    w = h[i].w               # out of gas! h[i] is the selected node.
    v = h[i].v
    h[i].w = 0               # make sure this node isn't chosen again
    while i:                 # fix up total weights
        h[i].tw -= w
        i >>= 1
    return v

def random_weighted_sample_no_replacement(items, n):
    heap = rws_heap(items)  # just make a heap...
    for i in range(n):
        yield rws_heap_pop(heap)  # and pop n items off it.
If the sampling is with replacement, use the roulette-wheel selection technique (often used in genetic algorithms):
sort the weights
compute the cumulative weights
pick a random number in [0, totalWeight]
find the interval this number falls into
select the element corresponding to that interval
repeat k times
If the sampling is without replacement, you can adapt the above technique by removing the selected element from the list after each iteration, then re-normalizing the weights so that they sum to 1 (a valid probability distribution function).
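A minimal Python sketch of the with-replacement variant (my code; bisect does the interval lookup on the cumulative weights):

import bisect
import random

def roulette_pick(items, k):
    # items: list of (weight, value) pairs; returns k values drawn with
    # replacement, each with probability proportional to its weight
    weights, values = zip(*items)
    cumulative = []
    total = 0
    for w in weights:
        total += w
        cumulative.append(total)
    return [values[bisect.bisect_right(cumulative, random.random() * total)]
            for _ in range(k)]

print(roulette_pick([(10, "low"), (100, "mid"), (890, "large")], 3))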
I know this is a very old question, but I think there's a neat trick to do this in O(n) time if you apply a little math!
The exponential distribution has two very useful properties.
Given n samples from different exponential distributions with different rate parameters, the probability that a given sample is the minimum is equal to its rate parameter divided by the sum of all rate parameters.
It is "memoryless". So if you already know the minimum, then the probability that any of the remaining elements is the 2nd-to-min is the same as the probability that if the true min were removed (and never generated), that element would have been the new min. This seems obvious, but I think because of some conditional probability issues, it might not be true of other distributions.
Using fact 1, we know that choosing a single element can be done by generating these exponential distribution samples with rate parameter equal to the weight, and then choosing the one with minimum value.
Using fact 2, we know that we don't have to re-generate the exponential samples. Instead, just generate one for each element, and take the k elements with lowest samples.
Finding the lowest k can be done in O(n). Use the Quickselect algorithm to find the k-th element, then simply take another pass through all elements and output all lower than the k-th.
A useful note: if you don't have immediate access to a library to generate exponential distribution samples, it can be easily done by: -ln(rand())/weight
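Putting the pieces together, a short Python sketch of the whole trick (my code; it sorts the keys for simplicity, so it is O(n log n) rather than the O(n) quickselect version described above):

import math
import random

def weighted_sample_k(items, k):
    # items: list of (weight, value); -ln(U)/w is an exponential sample
    # with rate w, and the k smallest keys are the winners
    keyed = [(-math.log(1.0 - random.random()) / w, v) for w, v in items]
    keyed.sort()
    return [v for _, v in keyed[:k]]

print(weighted_sample_k([(10, "low"), (100, "mid"), (890, "large")], 2))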
I've done this in Ruby
https://github.com/fl00r/pickup
require 'pickup'

pond = {
    "selmon"   => 1,
    "carp"     => 4,
    "crucian"  => 3,
    "herring"  => 6,
    "sturgeon" => 8,
    "gudgeon"  => 10,
    "minnow"   => 20
}

pickup = Pickup.new(pond, uniq: true)
pickup.pick(3)
#=> [ "gudgeon", "herring", "minnow" ]
pickup.pick
#=> "herring"
pickup.pick
#=> "gudgeon"
pickup.pick
#=> "sturgeon"
If you want to generate large arrays of random integers with replacement, you can use piecewise linear interpolation. For example, using NumPy/SciPy:
import numpy
import scipy.interpolate

def weighted_randint(weights, size=None):
    """Given an n-element vector of weights, randomly sample
    integers up to n with probabilities proportional to weights"""
    n = weights.size
    # normalize so that the weights sum to unity
    weights = weights / numpy.linalg.norm(weights, 1)
    # cumulative sum of weights
    cumulative_weights = weights.cumsum()
    # piecewise-linear interpolating function whose domain is
    # the unit interval and whose range is the integers up to n
    # (note: the interpolation knots must be the *cumulative* weights)
    f = scipy.interpolate.interp1d(
            numpy.hstack((0.0, cumulative_weights)),
            numpy.arange(n + 1), kind='linear')
    return f(numpy.random.random(size=size)).astype(int)
This is not effective if you want to sample without replacement.
Here's a Go implementation from geodns:
package foo

import (
    "log"
    "math/rand"
)

type server struct {
    Weight int
    data   interface{}
}

func foo(servers []server) []server {
    // servers list is already sorted by the Weight attribute
    // number of items to pick
    max := 4
    result := make([]server, max)
    sum := 0
    for _, r := range servers {
        sum += r.Weight
    }
    for si := 0; si < max; si++ {
        n := rand.Intn(sum + 1)
        s := 0
        for i := range servers {
            s += int(servers[i].Weight)
            if s >= n {
                log.Println("Picked record", i, servers[i])
                sum -= servers[i].Weight
                result[si] = servers[i]
                // remove the server from the list
                servers = append(servers[:i], servers[i+1:]...)
                break
            }
        }
    }
    return result
}
If you want to pick x elements from a weighted set without replacement such that elements are chosen with a probability proportional to their weights:
import random

def weighted_choose_subset(weighted_set, count):
    """Return a random sample of count elements from a weighted set.

    weighted_set should be a sequence of tuples of the form
    (item, weight), for example: [('a', 1), ('b', 2), ('c', 3)]

    Each element from weighted_set shows up at most once in the
    result, and the relative likelihood of two particular elements
    showing up is equal to the ratio of their weights.

    This works as follows:

    1.) Line up the items along the number line from [0, the sum
    of all weights) such that each item occupies a segment of
    length equal to its weight.

    2.) Randomly pick a number "start" in the range [0, total
    weight / count).

    3.) Find all the points "start + n/count" (for all integers n
    such that the point is within our segments) and yield the set
    containing the items marked by those points.

    Note that this implementation may not return each possible
    subset. For example, with the input ([('a', 1), ('b', 1),
    ('c', 1), ('d', 1)], 2), it may only produce the sets ['a',
    'c'] and ['b', 'd'], but it will do so such that the weights
    are respected.

    This implementation only works for nonnegative integral
    weights. The highest weight in the input set must be less
    than the total weight divided by the count; otherwise it would
    be impossible to respect the weights while never returning
    that element more than once per invocation.
    """
    if count == 0:
        return []

    total_weight = 0
    max_weight = 0
    borders = []
    for item, weight in weighted_set:
        if weight < 0:
            raise RuntimeError("All weights must be nonnegative integers")
        # Scale up weights so dividing total_weight / count doesn't truncate:
        weight *= count
        total_weight += weight
        borders.append(total_weight)
        max_weight = max(max_weight, weight)

    step = int(total_weight / count)
    if max_weight > step:
        raise RuntimeError(
            "Each weight must be less than total weight / count")

    next_stop = random.randint(0, step - 1)
    results = []
    current = 0
    for i in range(count):
        while borders[current] <= next_stop:
            current += 1
        results.append(weighted_set[current][0])
        next_stop += step
    return results
In the question you linked to, Kyle's solution would work with a trivial generalization.
Scan the list and sum the total weights. Then the probability to choose an element should be:
1 - (1 - (#needed / (weight left))) / (weight at n)
After visiting a node, subtract its weight from the total. Also, if you need n and have n left, you have to stop explicitly.
You can check that with everything having weight 1, this simplifies to Kyle's solution.
Edited: (had to rethink what twice as likely meant)
This one does exactly that with O(n) and no excess memory usage. I believe this is a clever and efficient solution that is easy to port to any language. The first two lines are just to populate sample data in Drupal.
function getNrandomGuysWithWeight($numitems){
    $q = db_query('SELECT id, weight FROM theTableWithTheData');
    $q = $q->fetchAll();

    $accum = 0;
    foreach($q as $r){
        $accum += $r->weight;
        $r->weight = $accum;
    }

    $out = array();

    while(count($out) < $numitems && count($q)){
        $n = rand(0, $accum);
        $lessaccum = NULL;
        $prevaccum = 0;
        $idxrm = 0;
        foreach($q as $i => $r){
            if(($lessaccum == NULL) && ($n <= $r->weight)){
                $out[] = $r->id;
                $lessaccum = $r->weight - $prevaccum;
                $accum -= $lessaccum;
                $idxrm = $i;
            }else if($lessaccum){
                $r->weight -= $lessaccum;
            }
            $prevaccum = $r->weight;
        }
        unset($q[$idxrm]);
    }
    return $out;
}
I am putting here a simple solution for picking 1 item; you can easily expand it for k items (Java style):

double random = Math.random(); // assumed in [0, 1), with the weights normalized to sum to 1
double sum = 0;
for (int i = 0; i < items.length; i++) {
    val = items[i];
    sum += val.getValue();
    if (sum > random) {
        selected = val;
        break;
    }
}
I have implemented an algorithm similar to Jason Orendorff's idea in Rust here. My version additionally supports bulk operations: insert and remove (when you want to remove a bunch of items given by their ids, not through the weighted selection path) from the data structure in O(m + log n) time, where m is the number of items to remove and n the number of items stored.
Sampling without replacement with recursion - an elegant and very short solution in C#

// how many ways we can choose 4 out of 60 students, so that every time we choose a different 4
class Program
{
    static void Main(string[] args)
    {
        int group = 60;
        int studentsToChoose = 4;

        Console.WriteLine(FindNumberOfStudents(studentsToChoose, group));
    }

    private static int FindNumberOfStudents(int studentsToChoose, int group)
    {
        if (studentsToChoose == group || studentsToChoose == 0)
            return 1;

        return FindNumberOfStudents(studentsToChoose, group - 1)
             + FindNumberOfStudents(studentsToChoose - 1, group - 1);
    }
}
I just spent a few hours trying to understand the algorithms underlying sampling without replacement, and this topic is more complex than I initially thought. That's exciting! For the benefit of future readers (have a good day!) I document my insights here, including a ready-to-use function which respects the given inclusion probabilities further below. A nice and quick mathematical overview of the various methods can be found here: Tillé: Algorithms of sampling with equal or unequal probabilities. For example, Jason's method can be found on page 46. The caveat with his method is that the weights are not proportional to the inclusion probabilities, as also noted in the document. Actually, the i-th inclusion probabilities can be recursively computed as follows:
def inclusion_probability(i, weights, k):
    """
    Computes the inclusion probability of the i-th element
    in a randomly sampled k-tuple using Jason's algorithm
    (see https://stackoverflow.com/a/2149533/7729124)
    """
    if k <= 0:
        return 0
    cum_p = 0
    for j, weight in enumerate(weights):
        # compute the probability of j being selected considering the weights
        p = weight / sum(weights)

        if i == j:
            # if this is the target element, we don't have to go deeper,
            # since we know that i is included
            cum_p += p
        else:
            # if this is not the target element, then we compute the conditional
            # inclusion probability of i under the constraint that j is included
            cond_i = i if i < j else i - 1
            cond_weights = weights[:j] + weights[j+1:]
            cond_p = inclusion_probability(cond_i, cond_weights, k - 1)
            cum_p += p * cond_p
    return cum_p
And we can check the validity of the function above by comparing
In : for i in range(3): print(i, inclusion_probability(i, [1,2,3], 2))
0 0.41666666666666663
1 0.7333333333333333
2 0.85
to
In : import collections, itertools
In : sample_tester = lambda f: collections.Counter(itertools.chain(*(f() for _ in range(10000))))
In : sample_tester(lambda: random_weighted_sample_no_replacement([(1,'a'),(2,'b'),(3,'c')],2))
Out: Counter({'a': 4198, 'b': 7268, 'c': 8534})
One way, also suggested in the document above, to specify the inclusion probabilities is to compute the weights from them. The whole complexity of the question at hand stems from the fact that one cannot do that directly, since one basically has to invert the recursion formula; symbolically I claim this is impossible. Numerically it can be done using all kinds of methods, e.g. Newton's method. However, the complexity of inverting the Jacobian using plain Python becomes unbearable quickly; I really recommend looking into numpy.random.choice in this case.
Luckily, there is a method using plain Python which might or might not be sufficiently performant for your purposes; it works great if there aren't that many different weights. You can find the algorithm on pages 75 and 76. It works by splitting up the sampling process into parts with the same inclusion probabilities, i.e. we can use random.sample again! I am not going to explain the principle here since the basics are nicely presented on page 69. Here is the code with hopefully a sufficient amount of comments:
import random

def sample_no_replacement_exact(items, k, best_effort=False, random_=None, ε=1e-9):
    """
    Returns a random sample of k elements from items, where items is a list of
    tuples (weight, element). The inclusion probability of an element in the
    final sample is given by

        k * weight / sum(weights).

    Note that the function raises if an inclusion probability cannot be
    satisfied, e.g. the following call is obviously illegal:

        sample_no_replacement_exact([(1,'a'),(2,'b')],2)

    Since selecting two elements means selecting both all the time,
    'b' cannot be selected twice as often as 'a'. In general it can be hard to
    spot if the weights are illegal and the function does *not* always raise
    an exception in that case. To remedy the situation you can pass
    best_effort=True which redistributes the inclusion probability mass
    if necessary. Note that the inclusion probabilities will change
    if deemed necessary.

    The algorithm is based on the splitting procedure on page 75/76 in:
    http://www.eustat.eus/productosServicios/52.1_Unequal_prob_sampling.pdf
    Additional information can be found here:
    https://stackoverflow.com/questions/2140787/

    :param items: list of tuples of type weight,element
    :param k: length of resulting sample
    :param best_effort: fix inclusion probabilities if necessary,
                        (optional, defaults to False)
    :param random_: random module to use (optional, defaults to the
                    standard random module)
    :param ε: fuzziness parameter when testing for zero in the context
              of floating point arithmetic (optional, defaults to 1e-9)
    :return: random sample set of size k
    :exception: throws ValueError in case of bad parameters,
                throws AssertionError in case of algorithmic impossibilities
    """
    # random_ defaults to the random submodule
    if not random_:
        random_ = random

    # special case empty return set
    if k <= 0:
        return set()

    if k > len(items):
        raise ValueError("resulting tuple length exceeds number of elements (k > n)")

    # sort items by weight
    items = sorted(items, key=lambda item: item[0])

    # extract the weights and elements
    weights, elements = list(zip(*items))

    # compute the inclusion probabilities (short: π) of the elements
    scaling_factor = k / sum(weights)
    π = [scaling_factor * weight for weight in weights]

    # in case of best_effort: if an inclusion probability exceeds 1,
    # try to rebalance the probabilities such that:
    # a) no probability exceeds 1,
    # b) the probabilities still sum to k, and
    # c) the probability masses flow from top to bottom:
    #    [0.2, 0.3, 1.5] -> [0.2, 0.8, 1]
    # (remember that π is sorted)
    if best_effort and π[-1] > 1 + ε:
        # probability mass we still have to distribute
        debt = 0.
        for i in reversed(range(len(π))):
            if π[i] > 1.:
                # an 'offender', take away excess
                debt += π[i] - 1.
                π[i] = 1.
            else:
                # case π[i] < 1, i.e. 'safe' element
                # maximum we can transfer from debt to π[i] and still not
                # exceed 1 is computed by the minimum of:
                # a) 1 - π[i], and
                # b) debt
                max_transfer = min(debt, 1. - π[i])
                debt -= max_transfer
                π[i] += max_transfer
        assert debt < ε, "best effort rebalancing failed (impossible)"

    # make sure we are talking about probabilities
    if any(not (0 - ε <= π_i <= 1 + ε) for π_i in π):
        raise ValueError("inclusion probabilities not satisfiable: {}"
                         .format(list(zip(π, elements))))

    # special case equal probabilities
    # (up to fuzziness parameter, remember that π is sorted)
    if π[-1] < π[0] + ε:
        return set(random_.sample(elements, k))

    # compute the two possible lambda values, see formula 7 on page 75
    # (remember that π is sorted)
    λ1 = π[0] * len(π) / k
    λ2 = (1 - π[-1]) * len(π) / (len(π) - k)
    λ = min(λ1, λ2)

    # there are two cases now, see also page 69
    # CASE 1
    # with probability λ we are in the equal probability case
    # where all elements have the same inclusion probability
    if random_.random() < λ:
        return set(random_.sample(elements, k))

    # CASE 2:
    # with probability 1-λ we are in the case of a new sample without
    # replacement problem which is strictly simpler,
    # it has the following new probabilities (see page 75, π^{(2)}):
    new_π = [
        (π_i - λ * k / len(π))
        /
        (1 - λ)
        for π_i in π
    ]
    new_items = list(zip(new_π, elements))

    # the first few probabilities might be 0, remove them
    # NOTE: we make sure that floating point issues do not arise
    #       by using the fuzziness parameter
    while new_items and new_items[0][0] < ε:
        new_items = new_items[1:]

    # the last few probabilities might be 1, remove them and mark them as selected
    # NOTE: we make sure that floating point issues do not arise
    #       by using the fuzziness parameter
    selected_elements = set()
    while new_items and new_items[-1][0] > 1 - ε:
        selected_elements.add(new_items[-1][1])
        new_items = new_items[:-1]

    # the algorithm reduces the length of the sample problem,
    # it is guaranteed that:
    # if λ == λ1: the first item has probability 0
    # if λ == λ2: the last item has probability 1
    assert len(new_items) < len(items), "problem was not simplified (impossible)"

    # recursive call with the simpler sample problem
    # NOTE: we have to make sure that the selected elements are included
    return sample_no_replacement_exact(
        new_items,
        k - len(selected_elements),
        best_effort=best_effort,
        random_=random_,
        ε=ε,
    ) | selected_elements
Example:
In : sample_no_replacement_exact([(1,'a'),(2,'b'),(3,'c')],2)
Out: {'b', 'c'}
In : import collections, itertools
In : sample_tester = lambda f: collections.Counter(itertools.chain(*(f() for _ in range(10000))))
In : sample_tester(lambda: sample_no_replacement_exact([(1,'a'),(2,'b'),(3,'c'),(4,'d')],2))
Out: Counter({'a': 2048, 'b': 4051, 'c': 5979, 'd': 7922})
The weights sum up to 10, hence the inclusion probabilities compute to: a → 20%, b → 40%, c → 60%, d → 80%. (Sum: 200% = k.) It works!
Just one word of caution for productive use of this function: it can be very hard to spot illegal inputs for the weights. An obvious illegal example is
In: sample_no_replacement_exact([(1,'a'),(2,'b')],2)
ValueError: inclusion probabilities not satisfiable: [(0.6666666666666666, 'a'), (1.3333333333333333, 'b')]
'b' cannot appear twice as often as 'a' since both always have to be selected. There are more subtle examples. To avoid an exception in production, just use best_effort=True, which rebalances the inclusion probability mass such that we always have a valid distribution. Obviously this might change the inclusion probabilities.
I used an associative map (weight, object). For example:

{
    (10, "low"),
    (100, "mid"),
    (10000, "large")
}

total = 10110

Pick a random number between 0 and 'total' and iterate over the keys until this number fits in a given range.
