Problem:
Find a combination of 48 numbers (x), each ranging from 1-6, that maximises an equation (y). The equation comprises 48 distinct unknown functions, each of which takes a single number as input.
max: y = f1(x1) + f2(x2) + ... + f48(x48)
where: x = {1:6}
example: x = [6, 1, 4, ..., 4] => y = 167
My first idea was to solve this using brute force; however, the search space is very large (6^48). Does anyone know of an algorithm or clever programming tricks I could use?
The search space is not that large at all.
y is the sum of 48 distinct functions, so you can maximize each one independently. There are 6 possible inputs for each f_i, so in total you only need to check 6*48 = 288 cases to brute-force it.
Start with some base answer like x = [1, ..., 1]. Find the optimal value for x_1, then x_2, etc.
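The per-function sweep above can be sketched as follows, assuming the unknown functions are available as a list of black-box callables (a hypothetical setup for illustration, since the real functions are not given):

```python
def maximize_separable(fs, domain=range(1, 7)):
    """Maximize f1(x1) + ... + fn(xn) term by term.

    Each f_i depends only on x_i, so the best x_i is simply the
    argmax of f_i over the allowed inputs: 6 * len(fs) evaluations
    instead of 6**len(fs) combinations.
    """
    best_xs = [max(domain, key=f) for f in fs]
    best_y = sum(f(x) for f, x in zip(fs, best_xs))
    return best_xs, best_y

# Toy stand-ins for the unknown functions (for demonstration only):
fs = [lambda x: -(x - 3) ** 2, lambda x: x, lambda x: 7 - x]
xs, y = maximize_separable(fs)
```

Because the terms never interact, the result is exactly the global optimum, not a heuristic.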
Related
I am trying to find a solution in which a given resource (e.g. a budget) is best distributed among different options, each of which yields a different result for the resource provided.
Let's say I have N = 1200 and some functions. (a, b, c, d are some unknown variables)
f1(x) = a * x
f2(x) = b * x^c
f3(x) = a*x + b*x^2 + c*x^3
f4(x) = d^x
f5(x) = log x^d
...
Also, let's say there are n such functions, each yielding a different result based on its input x, where x = 0 or x >= m, with m a constant.
Although I am not able to find exact formulas for the given functions, I am able to observe their outputs. This means I could compute
X = f1(N1) + f2(N2) + f3(N3) + ... + fn(Nn), where N1 + ... + Nn = N, for every way of distributing N into n numbers, and find the case where X is greatest.
How would I actually go about finding the best distribution of N with the least computation power, using whatever libraries currently available?
If you are happy with allocations constrained to be whole numbers then there is a dynamic programming solution of cost O(Nn) - so you can increase accuracy by scaling if you want, but this will increase cpu time.
For each i=1 to n maintain an array where element j gives the maximum yield using only the first i functions giving them a total allowance of j.
For i=1 this is simply the result of f1().
For i=k+1, when working out the result for j, consider each possible way of splitting j units between f_{k+1}() and the table that tells you the best return from a distribution among the first k functions - so you can calculate the table for i=k+1 using the table created for k.
At the end you get the best possible return for n functions and N resources. It is easier to find out what that best answer is if you maintain a set of arrays telling the best way to distribute k units among the first i functions, for all possible values of i and k. Then you can look up the best allocation for f100(), subtract the value it allocated to f100() from N, look up the best allocation for f99() given the resulting resources, and carry on like this until you have worked out the best allocations for all f().
As an example suppose f1(x) = 2x, f2(x) = x^2 and f3(x) = 3 if x>0 and 0 otherwise. Suppose we have 3 units of resource.
The first table is just f1(x) which is 0, 2, 4, 6 for 0,1,2,3 units.
The second table is the best you can do using f1(x) and f2(x) for 0,1,2,3 units and is 0, 2, 4, 9, switching from f1 to f2 at x=2.
The third table is 0, 3, 5, 9. I can get 3 and 5 by using 1 unit for f3() and the rest for the best solution in the second table. 9 is simply the best solution in the second table - there is no better solution using 3 resources that gives any of them to f3().
So 9 is the best answer here. One way to work out how to get there is to keep the tables around and recalculate that answer. 9 comes from f3(0) + 9 from the second table, so all 3 units are available to f2() + f1(). The second table's 9 comes from f2(3), so there are no units left for f1() and we get f1(0) + f2(3) + f3(0).
When you are working out the resources to use at stage i=k+1, you have a table from i=k that tells you exactly the result to expect from the resources you have left over after deciding how many to use at stage i=k+1. The best distribution does not become incorrect, because at stage i=k you have worked out the result of the best distribution for every possible number of remaining resources.
Assume a classic 0-1 knapsack problem but you are allowed to overflow/underflow the sack with some penalty. X profit is deducted for every unit overflow (weight above max capacity) and Y profit is deducted for every unit underflow (weight below max capacity).
I thought of sorting all items by their profit-to-weight ratio and then filling the sack like a normal knapsack problem; then, for the remaining weight and items, I calculate the extra profit, taking the underflow and overflow penalties into consideration.
This solution fails in some cases like when there are 3 items with weight 30,20,10 and profit 30, 25, 20 respectively. Max weight allowed is 39, underflow penalty is 5 and overflow penalty is 10.
My solution was to solve it like a normal knapsack and then consider the penalties, so it selects the items of weight 20 and 10, but it does not add the item of weight 30, as its penalty is higher than its profit. The optimal solution should be selecting the items of weight 30 and 10. The only thing I can think of now is brute force, which should be avoided if possible. If anyone can think of any other solution, that'd be great!
You can break it into two subproblems, one with an underweight penalty and one with an overweight penalty. More specifically, you can solve the problem by solving two different integer linear programming problems, and taking the best of the two solutions:
Say that you have n items of weights w1, w2, ..., wn and values v1, v2, ..., vn. Say that the weight capacity is C, the penalty for underweight is A and the penalty for overweight is B (per unit).
In both problems, let the binary decision variables be x1, ..., xn, indicating whether or not the corresponding item is selected.
Problem 1)
max v1*x1 + v2*x2 + ... + vn*xn - A*(C - w1*x1 - w2*x2 - ... - wn*xn)
subject to
w1*x1 + w2*x2 + ... + wn*xn <= C
Note that via algebra the objective function is the same as the affine expression
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn - A*C
and is maximized at the same values x1, ..., xn which maximize the purely linear function
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn
This subproblem can be solved using any ILP solver, or just as an ordinary knapsack problem.
Problem 2)
max v1*x1 + v2*x2 + ... + vn*xn - B*(w1*x1 + w2*x2 + ... + wn*xn - C)
subject to
w1*x1 + w2*x2 + ... + wn*xn >= C
which can be solved by maximizing the linear objective function
(v1 - B*w1)*x1 + ... + (vn - B*wn)*xn
Again, that can be solved with any ILP solver. This problem isn't a knapsack problem since the inequality in the main constraint points in the wrong direction, though there might be some way of reducing it to a knapsack problem.
On Edit. The second problem can also be solved as a knapsack problem -- one in which you decide which items to exclude. Start with the solution in which you include everything. If this isn't feasible (i.e. the sum of all weights doesn't exceed the capacity), then you are done: the solution of problem 1 is the global solution. Otherwise, define the surplus, S, to be
S = w1 + w2 + ... + wn - C
Now, solve the following knapsack problem:
weights: w1, w2, ..., wn //same as before
values: Bw1 - v1, Bw2 - v2, ..., Bwn - vn
capacity: S
A word on the values: Bwi - vi is a measure of how much removing the ith object helps (under the assumption that removing it keeps you above the original capacity so that you don't need to consider the underweight penalties). On the one hand, it removes part of the penalty, Bwi, but on the other hand it takes some value away, vi.
After you solve this knapsack problem -- remove these items. The remaining items are the solution for problem 2.
Let's see how this plays out for your toy problem:
weights: 30, 20, 10
values: 20, 25, 20
C: 39
A: 5 //per-unit underflow penalty
B: 10 //per-unit overflow penalty
For problem 1, solve the following knapsack problem:
weights: 30, 20, 10
values: 170, 125, 70 // = 20 + 5*30, 25 + 5*20, 20 + 5*10
C: 39
This has the solution: include 20 and 10, with a value of 195. In terms of the original problem this has value 195 - 5*39 = 0. That seems a bit weird, but in terms of the original problem the value of using the last two items is 25 + 20 = 45; however, it leaves you 9 units under, with a penalty of 5*9 = 45, and 45 - 45 = 0.
Second problem:
weights: 30, 20, 10
values: 280, 175, 80 // = 10*30 - 20, 10*20 - 25, 10*10 - 20
S: 26 // = 30 + 20 + 10 - 39
The solution of this problem is clearly to select 20. This means that 20 is selected for non-inclusion. This means that for the second problem I want to include the objects of weights 30 and 10.
The value of doing so is (in terms of the original problem)
20 + 20 - 10*1 = 30
Since 30 > 0 (the value of solution 1), this is the overall optimal solution.
To sum up: you can solve your version of the knapsack problem by solving two ordinary knapsack problems to find two candidate solutions and then taking the better of the two. If you already have a function to solve knapsack problems, it shouldn't be too hard to write another function which calls it twice, interprets the outputs, and returns the best solution.
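The whole two-knapsack reduction can be sketched in Python as follows, with a standard 0/1 knapsack DP as the subroutine (function names are my own, not from any particular library):

```python
def knapsack(weights, values, cap):
    """Standard 0/1 knapsack; returns (best value, set of chosen indices)."""
    n = len(weights)
    dp = [[0] * (cap + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(cap + 1):
            dp[i][c] = dp[i - 1][c]
            if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                dp[i][c] = dp[i - 1][c - w] + v
    chosen, c = set(), cap
    for i in range(n, 0, -1):  # backtrack to recover the chosen items
        if dp[i][c] != dp[i - 1][c]:
            chosen.add(i - 1)
            c -= weights[i - 1]
    return dp[n][cap], chosen

def penalized_knapsack(weights, values, C, A, B):
    # Problem 1: stay at or under capacity, pay A per unit of slack.
    v1, _ = knapsack(weights, [v + A * w for v, w in zip(values, weights)], C)
    best = v1 - A * C
    # Problem 2: go over capacity, pay B per unit of excess. Decide
    # which items to REMOVE from the all-items solution.
    total = sum(weights)
    if total > C:
        S = total - C
        _, removed = knapsack(weights,
                              [B * w - v for v, w in zip(values, weights)], S)
        keep = set(range(len(weights))) - removed
        w2 = sum(weights[i] for i in keep)
        best = max(best, sum(values[i] for i in keep) - B * (w2 - C))
    return best

best = penalized_knapsack([30, 20, 10], [20, 25, 20], C=39, A=5, B=10)
```

On the toy instance this returns 30, matching the hand calculation above (keep the items of weight 30 and 10, one unit of overflow).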
You can still use standard dynamic programming.
Let's compute, for every sum s from 0 to the sum of all weights, whether s is reachable - and, since profits differ from weights here, the best profit achievable at exactly that sum. That's exactly what a standard dynamic programming solution does. We don't care about the penalty here.
Then let's iterate over all reachable sums and choose the best one, taking into account the penalty for overflow (or underflow).
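The two steps above can be sketched like this, assuming integer weights:

```python
def penalized_knapsack_dp(weights, values, C, A, B):
    """Standard knapsack DP over exact total weight, then a penalty scan."""
    NEG = float('-inf')
    total = sum(weights)
    # dp[s] = best profit over subsets whose weights sum to exactly s
    dp = [NEG] * (total + 1)
    dp[0] = 0
    for w, v in zip(weights, values):
        for s in range(total, w - 1, -1):  # descending: each item used once
            if dp[s - w] != NEG:
                dp[s] = max(dp[s], dp[s - w] + v)

    # apply the under/overflow penalty to every reachable total weight
    def penalty(s):
        return A * (C - s) if s < C else B * (s - C)

    return max(dp[s] - penalty(s) for s in range(total + 1) if dp[s] != NEG)

best = penalized_knapsack_dp([30, 20, 10], [20, 25, 20], C=39, A=5, B=10)
```

On the toy instance from the question this also returns 30 (items of weight 30 and 10, one unit of overflow).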
Given an equation
Like 2(p1) + 3(p2) + 7(p3) >= 257
I need to find all possible combinations of p1, p2, p3
such that the above statement is true and the resulting sum (the left-hand side of the inequality) is minimal, where all the xn are known.
I tried looking up algorithms for general cases like
(x1)(p1) + (x2)(p2) + (x3)(p4) + ... + (xn)(pn) >= target
And I came across the Knapsack problem and Subset-Sum algorithm solutions, but they weren't exactly like this problem.
I tried before using an algorithm in Python 3.x that has lower-bound values for pn, but it still runs in O( ridiculous ) time complexity.
Obviously all numbers here are natural numbers, otherwise there would be infinite solutions.
I can see two possible approaches, depending on whether the Pi have to be >= 0. The case with Pi >= 0 is more sensible, so I will consider it first.
Treat this as dynamic programming, where you work from left to right along the equation. Looking at the larger equation in your comment, first of all create a list of the contributions from p0: 0, 5, 10, 15... 190384760, and beside them the value of p0 that produces them: 0, 1, 2, ... 190384760/5.
Now use this table to work out the values of 5p0 + 7p1 possible by combining the first two: 0, 5, 7, 10, 12, 14.... and keep the value of p1 needed to produce them.
Working from left to right you will end up with a table of the values up to just over 190384755 that can be created by positive integer combinations of p0..p8. You obviously only care about the smallest one >= 190384755. Consider all possible values of the p8 contribution, subtract these from 190384755, and look in the table for p0..p7 to see which of these are possible. This gives you all possible values of p8, and for each of these you can recursively repeat the process to print out all possible values of p7, and so on, repeating the recursion to provide all combinations of p0..p8 that yield the lowest value just over 190384755. This is very similar to the pseudo-polynomial algorithm for subset sum.
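For the smaller example from the question (2(p1) + 3(p2) + 7(p3) >= 257) the table-building and backtracking might look like this - a sketch assuming non-negative integers, which recovers one optimal combination rather than printing all of them:

```python
def min_sum_at_least(coeffs, target):
    """Smallest reachable sum >= target, returned as one combination.

    reachable[s] stores a coefficient that can be removed from sum s,
    which is enough to backtrack one solution, as in subset sum.
    """
    # some reachable sum lies in [target, target + max(coeffs)), since
    # building a sum never jumps by more than max(coeffs) at a time
    limit = target + max(coeffs)
    reachable = [None] * (limit + 1)
    reachable[0] = 0
    for s in range(1, limit + 1):
        for c in coeffs:
            if s >= c and reachable[s - c] is not None:
                reachable[s] = c
                break
    for s in range(target, limit + 1):
        if reachable[s] is not None:  # smallest reachable sum >= target
            counts = {c: 0 for c in coeffs}
            while s > 0:
                counts[reachable[s]] += 1
                s -= reachable[s]
            return counts
    return None

counts = min_sum_at_least([2, 3, 7], 257)
```

Here the minimal left-hand side is 257 itself, since 257 is expressible with these coefficients.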
If the Pi can be < 0, then the achievable values are all multiples of the gcd of the Pi, which is very likely to be all integers, and there are an infinite number of solutions for this. If this is really what you want, you can start by reading about the http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm.
Maybe the given example is just a toy case.
If not, exhaustive search is quite feasible: the minimal sum is bounded by 259 (combination 0, 0, 37), and there are fewer than half a million combinations under this bound.
In addition, if you set two variables, say p2 and p3, such that 3(p2) + 7(p3) < 257, it is an easy matter to find the smallest p1 such that 2(p1) + 3(p2) + 7(p3) >= 257. You will just have to try 3200 (p2, p3) combinations or so.
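That bounded search can be written directly; for each (p2, p3) pair the smallest workable p1 follows from the inequality (the function name is mine):

```python
def smallest_lhs(target=257):
    """Try every (p2, p3) with 3*p2 + 7*p3 <= target and pick the
    smallest left-hand side 2*p1 + 3*p2 + 7*p3 that reaches target."""
    best = None
    for p3 in range(target // 7 + 1):
        for p2 in range((target - 7 * p3) // 3 + 1):
            rest = target - 3 * p2 - 7 * p3
            p1 = max(0, -(-rest // 2))  # ceiling division: smallest valid p1
            lhs = 2 * p1 + 3 * p2 + 7 * p3
            if best is None or lhs < best:
                best = lhs
    return best

best = smallest_lhs()
```

For this instance the minimum is 257 exactly (e.g. p2 = 4, p3 = 35 already gives 257 with p1 = 0).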
Given a series x(i), i from 1 to N, let's say N = 10,000.
for any i < j,
D(i,j) = x(i) - x(j), if x(i) > x(j); or,
       = 0, if x(i) <= x(j).
Define
Dmax(im, jm) := max D(i,j), for all 1 <= i < j <=N.
What's the best algorithm to calculate Dmax, im, and jm?
I tried to use dynamic programming, but this does not seem to divide into subproblems... so I'm a bit lost. Could you please suggest something? Is backtracking the way out?
Iterate over the series, keeping track of the following values:
The maximum element so far
The maximum descent so far
For each element, there are two possible values for the new maximum descent:
It remains the same
It equals maximumElementSoFar - newElement
So pick the one which gives the higher value. The maximum descent at the end of iteration will be your result. This will work in linear time and constant additional space.
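The scan described above is a direct one-pass transcription; this sketch also tracks the indices im and jm:

```python
def max_descent(xs):
    """Largest x[i] - x[j] with i < j, together with the indices (im, jm).

    One pass: track the running maximum and the best descent from it.
    """
    max_so_far = xs[0]
    best = 0           # D(i, j) is defined as 0 when x(i) <= x(j)
    im = jm = 0        # indices achieving the best descent
    peak_index = 0
    for j in range(1, len(xs)):
        if max_so_far - xs[j] > best:
            best = max_so_far - xs[j]
            im, jm = peak_index, j
        if xs[j] > max_so_far:   # new running maximum
            max_so_far = xs[j]
            peak_index = j
    return best, im, jm

result = max_descent([5, 4, 1, 2, 1])
```

This is O(N) time and O(1) extra space, so N = 10,000 is trivial (indices here are 0-based).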
If I understand you correctly, you have an array of numbers and want to find the largest positive difference between two neighbouring elements of the array?
Since you're going to have to go through the array at least once, to compute the differences, I don't see why you can't just keep a record, as you go, of the largest difference found so far (and of its location), updating as that changes.
If your problem is as simple as I understand it, I'm not sure why you need to think about dynamic programming. I expect I've misunderstood the question.
Dmax(im, jm) := max D(i,j) = max(max(x(i) - x(j)), 0) over all i < j
You just need to compute x(i) - x(j) for all pairs, which is O(n^2), and then take the max. No need for dynamic programming.
You can divide the series x(i) into sub-series, where each sub-series is a maximal descending run of x(i) (e.g. if x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1). Then for each sub-series compute first_in_sub_series - last_in_sub_series, compare all the results, and the maximum is the answer.
If I understood the problem correctly, this gives you a basic linear algorithm to solve it.
e.g:
x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1
rx1 = 4
rx2 = 1
dmax = 4, with im = 1 and jm = 3, because we are talking about x1, which is the first 3 items of x.
I think there should be an algorithm for this out there - probably in a field like bioinformatics (the problem reminds me a bit of sequence alignment) so I hope someone can help me out here.
The problem is as follows: Assume I have classified some data into two different classes X and Y. The result of this may look something like this: ..XXX Y XXX.. Further assume that we have some domain knowledge about those classes and know that it's extremely unlikely to have fewer than a certain number of instances in a row (i.e. it's unlikely that there are fewer than 4 Xs or Ys in a sequence - preferably I could use a different threshold per class, but that's not a must). So if we use this domain knowledge it's "obvious" that we'd like to replace the single Y in the middle with an X.
So the algorithm should take a sequence of classified instances and the thresholds for the classes (or 1 threshold for all, if that simplifies the problem) and try to find a sequence that fulfills the property (no run of a class shorter than the given threshold). Obviously there can be an extremely large number of correct solutions (e.g. in the above example we could also replace all Xs with Ys), so I think a reasonable optimization criterion would be to minimize the number of replacements.
I don't need an especially efficient algorithm here since the number of instances will be rather small (say < 4k) and we'll only have two classes. Also since this is obviously only a heuristic I'm fine with some inaccuracies if they vastly simplify the algorithm.
A very similar problem to this can be solved as a classic dynamic programming shortest path problem. We wish to find the sequence which minimises some notion of cost. Penalise each character in the sequence that is different from the corresponding character in the original sequence. Penalise each change of character in the sequence, so penalise each change from X to Y and vice versa.
This is not quite what you want because the penalty for YYYXYYY is the same as the penalty for YXXXXXXY - one penalty for YX and one for XY - however it may be a good approximation because e.g. if the base sequence says YYY....YXY....YY then it will be cheaper to change the central X to a Y than to pay the cost of XY and YX - and you can obviously fiddle with the different cost penalties to get something that looks plausible.
Now think of each position in the sequence as being two points, one above the other, one point representing "X goes here" and one representing "Y goes here". You can link points with lines of cost depending on whether the corresponding character is X or Y in the original sequence, and whether the line joins an X with an X or an X with a Y or so on. Then work out the shortest path from left to right using a dynamic program that works out the best paths terminating in X and Y at position i+1, given knowledge of the cost of the best paths terminating in X and Y at position i.
If you really want to penalise short-lived changes more harshly than long-lived changes you can probably do so by increasing the number of points in the path-finding representation - you would have points that correspond to "X here and the most recent Y was 3 characters ago". But depending on what you want for a penalty you might end up with an inconveniently large number of points at each character.
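The basic two-points-per-position formulation fits in a few lines. A sketch, where switch_cost is the penalty per XY or YX boundary and each character changed from the original costs 1 (parameter names are mine):

```python
def smooth(s, switch_cost=2):
    """Minimum total cost over output sequences drawn from {X, Y}:
    1 per position that differs from s, plus switch_cost per
    adjacent pair of differing output characters."""
    INF = float('inf')
    # best[c] = cheapest way to fill positions 0..i with output ending in c
    best = {'X': INF, 'Y': INF}
    best['X' if s[0] == 'X' else 'Y'] = 0  # keeping the first char is free
    best['Y' if s[0] == 'X' else 'X'] = 1  # flipping it costs 1
    for ch in s[1:]:
        nxt = {}
        for c in 'XY':
            other = 'Y' if c == 'X' else 'X'
            mismatch = 0 if ch == c else 1
            # either extend the same-letter path or pay for a switch
            nxt[c] = mismatch + min(best[c], best[other] + switch_cost)
        best = nxt
    return min(best.values())

cost = smooth("YYYXYYY", switch_cost=2)
```

On "YYYXYYY" with switch_cost = 2 the cheapest move is to flip the lone X (cost 1), rather than keep it and pay for the YX and XY boundaries (cost 4) - the behaviour described above.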
You can use dynamic programming as in the following pseudocode sketch (for simplicity, this code assumes the threshold is 3 Xs or Ys in a row, rather than 4):
def min_switch(s):
    n = len(s)
    INF = float('inf')
    # optx[j][i]: min cost so that s[0..i] is smooth and ends in j
    # consecutive Xs (j == 3 stands for "3 or more"); opty likewise for Ys
    optx = [[INF] * n for _ in range(4)]
    opty = [[INF] * n for _ in range(4)]
    optx[1][0] = 0 if s[0] == 'X' else 1
    opty[1][0] = 1 if s[0] == 'X' else 0
    for i in range(1, n):
        cx = 0 if s[i] == 'X' else 1  # cost of writing an X at position i
        cy = 1 - cx                   # cost of writing a Y at position i
        # a run may only end (and the other class start) once it has
        # reached length 3, hence the [3] entries in the switch cases
        optx[1][i] = cx + opty[3][i - 1]
        optx[2][i] = cx + optx[1][i - 1]
        optx[3][i] = cx + min(optx[2][i - 1], optx[3][i - 1])
        opty[1][i] = cy + optx[3][i - 1]
        opty[2][i] = cy + opty[1][i - 1]
        opty[3][i] = cy + min(opty[2][i - 1], opty[3][i - 1])
    return min(optx[3][n - 1], opty[3][n - 1])
The above code computes the lowest cost of creating a smooth sequence up to the ith character, storing the optimal value for every relevant number of consecutive Xs or Ys in a row (1, 2, or 3 in a row). More formally:
optx[j][k] stores the smallest cost to convert the string s[0...k] into a smooth sequence that ends in j consecutive Xs; runs of 3 or more are accounted for in optx[3][k].
opty[j][k] stores the smallest cost to convert the string s[0...k] into a smooth sequence that ends in j consecutive Ys; runs of 3 or more are accounted for in opty[3][k].
It is straightforward to convert this to an algorithm that returns the sequence as well as the optimal cost.
Note that some of the cases in the above code are probably unnecessary, it's just a straightforward recurrence derived from the constraints.