0-1 Knapsack with penalty for under and overweight cases - algorithm

Assume a classic 0-1 knapsack problem but you are allowed to overflow/underflow the sack with some penalty. X profit is deducted for every unit overflow (weight above max capacity) and Y profit is deducted for every unit underflow (weight below max capacity).
I thought of sorting all items by the ratio of profit to weight, filling the sack like a normal knapsack problem, and then computing the extra profit for the remaining weight and items, taking the underflow/overflow into account.
This solution fails in some cases, for example when there are 3 items with weights 30, 20, 10 and profits 20, 25, 20 respectively. Max weight allowed is 39, the underflow penalty is 5 and the overflow penalty is 10.
My solution solves it like a normal knapsack and then considers the penalties, so it selects the items of weight 20 and 10, but it never adds the item of weight 30 because its penalty exceeds its profit. The optimal solution should be to select the items of weight 30 and 10. The only alternative I can think of is brute force, which should be avoided if possible. If anyone can think of any other solution, that'd be great!

You can break it into two subproblems, one with an underweight penalty and one with an overweight penalty. More specifically, you can solve the problem by solving two different integer linear programming problems, and taking the best of the two solutions:
Say that you have n items of weights w1, w2, ..., wn and values v1, v2, ..., vn. Say that the weight capacity is C, the penalty for underweight is A and the penalty for overweight is B (per unit).
In both problems, let the binary decision variables be x1, ..., xn, indicating whether or not the corresponding item is selected.
Problem 1)
max v1*x1 + v2*x2 + ... + vn*xn - A*(C - w1*x1 - w2*x2 - ... - wn*xn)
subject to
w1*x1 + w2*x2 + ... + wn*xn <= C
Note that via algebra the objective function is the same as the affine expression
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn - A*C
and is maximized at the same values x1, ..., xn which maximize the purely linear function
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn
This subproblem can be solved using any ILP solver, or just as an ordinary knapsack problem.
Problem 2)
max v1*x1 + v2*x2 + ... + vn*xn - B*(w1*x1 + w2*x2 + ... + wn*xn - C)
subject to
w1*x1 + w2*x2 + ... + wn*xn >= C
which can be solved by maximizing the linear objective function
(v1 - B*w1)*x1 + ... + (vn - B*wn)*xn
Again, that can be solved with any ILP solver. This problem isn't a knapsack problem since the inequality in the main constraint points in the wrong direction, though there might be some way of reducing it to a knapsack problem.
On Edit. The second problem can also be solved as a knapsack problem -- one in which you decide which items to not include. Start with the solution in which you include everything. If this isn't feasible (i.e. if the sum of all weights is less than the capacity) then you are done: the solution of problem 1 is the global solution. Otherwise, define the surplus, S, to be
S = w1 + w2 + ... + wn - C
Now, solve the following knapsack problem:
weights: w1, w2, ..., wn //same as before
values: Bw1 - v1, Bw2 - v2, ..., Bwn - vn
capacity: S
A word on the values: Bwi - vi is a measure of how much removing the ith object helps (under the assumption that removing it keeps you above the original capacity so that you don't need to consider the underweight penalties). On the one hand, it removes part of the penalty, Bwi, but on the other hand it takes some value away, vi.
After you solve this knapsack problem -- remove these items. The remaining items are the solution for problem 2.
Let's see how this plays out for your toy problem:
weights: 30, 20, 10
values: 20, 25, 20
C: 39
A: 5 //per-unit underflow penalty
B: 10 //per-unit overflow penalty
For problem 1, solve the following knapsack problem:
weights: 30, 20, 10
values: 170, 125, 70 // = 20 + 5*30, 25 + 5*20, 20 + 5*10
C: 39
This has solution: include the items of weight 20 and 10, with value 195. In terms of the original problem this has value 195 - 5*39 = 0. That seems a bit weird, but in terms of the original problem the value of using the last two items is 25 + 20 = 45, which leaves you 9 units under capacity with a penalty of 5*9 = 45, and 45 - 45 = 0.
Second problem:
weights: 30, 20, 10
values: 280, 175, 80 // = 10*30 - 20, 10*20 - 25, 10*10 - 20
S: 21 // = 30 + 20 + 10 - 39
The solution of this problem is clearly to select the item of weight 20, meaning that it is selected for non-inclusion. So for the second problem I want to include the objects of weights 30 and 10.
The value of doing so is (in terms of the original problem)
20 + 20 - 10*1 = 30
Since 30 > 0 (the value of solution 1), this is the overall optimal solution.
To sum up: you can solve your version of the knapsack problem by solving two ordinary knapsack problems to find two candidate solutions and then taking the better of the two. If you already have a function to solve knapsack problems, it shouldn't be too hard to write another function which calls it twice, interprets the outputs, and returns the best solution.
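As a concrete illustration, here is a minimal Python sketch of the two-subproblem method above. The function names and the DP knapsack helper are mine, not from any particular library; this is a sketch under the assumptions stated in the answer (integer weights, per-unit penalties A and B).

```python
def knapsack(weights, values, cap):
    """Classic 0-1 knapsack DP: returns (best value, chosen index set)."""
    dp = [(0, frozenset())] * (cap + 1)  # dp[c] = best using capacity <= c
    for i, (w, v) in enumerate(zip(weights, values)):
        for c in range(cap, w - 1, -1):
            cand = dp[c - w][0] + v
            if cand > dp[c][0]:
                dp[c] = (cand, dp[c - w][1] | {i})
    return dp[cap]


def penalty_knapsack(weights, values, cap, under, over):
    """Solve problems 1 and 2 as ordinary knapsacks and return the
    better selection (a set of item indices)."""
    def score(sel):
        w = sum(weights[i] for i in sel)
        v = sum(values[i] for i in sel)
        return v - under * max(0, cap - w) - over * max(0, w - cap)

    # Problem 1: stay at or under capacity, with boosted values v_i + A*w_i
    _, sel1 = knapsack(weights,
                       [v + under * w for w, v in zip(weights, values)], cap)
    candidates = [sel1]

    # Problem 2: decide which items to LEAVE OUT, with values B*w_i - v_i
    # and capacity equal to the surplus S
    surplus = sum(weights) - cap
    if surplus >= 0:
        _, out = knapsack(weights,
                          [over * w - v for w, v in zip(weights, values)],
                          surplus)
        candidates.append(frozenset(range(len(weights))) - out)

    return max(candidates, key=score)
```

On the toy problem (weights 30, 20, 10, values 20, 25, 20, C = 39, A = 5, B = 10) this returns the items of weight 30 and 10, matching the worked example.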

You can still use standard dynamic programming.
Let's compute, for every total weight s from 0 to the sum of all elements, the best profit achievable using a subset of that exact weight. That's exactly what a standard dynamic programming solution does. We don't care about the penalty here.
Then iterate over all reachable weights and choose the best one, taking into account the penalty for over(or under)flow.
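A minimal Python sketch of this idea (the function name is mine): best[w] holds the maximum profit over subsets of total weight exactly w, and each achievable weight is then scored with its penalty.

```python
def best_with_penalties(weights, values, cap, under, over):
    total = sum(weights)
    NEG = float('-inf')
    best = [NEG] * (total + 1)   # best[w]: max profit at exact weight w
    best[0] = 0
    for wt, val in zip(weights, values):
        for w in range(total, wt - 1, -1):
            if best[w - wt] != NEG:
                best[w] = max(best[w], best[w - wt] + val)
    # score every achievable weight with its under/overflow penalty
    return max(best[w] - under * max(0, cap - w) - over * max(0, w - cap)
               for w in range(total + 1) if best[w] != NEG)
```

On the toy instance from the first question (weights 30, 20, 10, values 20, 25, 20, C = 39, A = 5, B = 10) this evaluates to 30, agreeing with the two-subproblem answer above.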


Optimization Algorithm for large search space

Problem:
Find a combination of 48 numbers (x) ranging from 1-6 that maximises an equation (y). The equation comprises 48 distinct functions that are unknown; each function takes in exactly one of the numbers.
max: y = f1(x1) + f2(x2) + ... + f48(x48)
where: each x_i in {1, ..., 6}
example: x = [6, 1, 4, ..., 4] => y = 167
My first idea was to solve this using brute force, however, the search space is very large 6^48. Does anyone know of an algorithm that I could use or clever programming tricks?
The search space is not that large at all.
y is the sum of 48 independent functions, so you can maximize each one separately. There are 6 possible inputs for each f_i, so in total you need to check only 6*48 = 288 cases.
Start with some base answer like x = [1, ..., 1]. Find the optimal value for x_1, then x_2, etc.
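For illustration, a short Python sketch of this coordinate-wise search; `evaluate` is a hypothetical stand-in for the unknown black-box objective y = f1(x1) + ... + f48(x48).

```python
def maximize_separable(evaluate, n=48, choices=range(1, 7)):
    x = [1] * n                      # base answer x = [1, ..., 1]
    for i in range(n):               # optimize one coordinate at a time
        x[i] = max(choices,
                   key=lambda v: evaluate(x[:i] + [v] + x[i + 1:]))
    return x
```

Each coordinate is settled with len(choices) evaluations, giving the 6*48 = 288 total from above.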

How to find a combination of elements that sum up just above threshold value

I have a problem statement which says: if you have an array of elements {x1,x2,x3,...x10}, find the combination of elements such that it just sums up above a threshold value (say the threshold value is 100).
So if there exist combinations like x2+x5+x8 = 105, x3+x5+x8 = 103, and x4+x5 = 101, then the algorithm should output x4, x5.
The knapsack algorithm emits a value that is near but on the lesser side of the threshold (which is 100 here). I want the opposite, that is the smallest sum of selected elements that is greater than 100.
Is there any set of algorithms or any special case of any algorithm which might solve this problem?
I'll start out by noting that you are asking for the smallest value strictly greater than some target. In general "strictly greater than" and "strictly less than" constraints are much harder than "greater than or equal to" or "less than or equal to" constraints. If you have all integer values, then you could simply translate your constraint "the sum exceeds 100" to "the sum is greater than or equal to 101". I'll assume that you've made such a transformation for the rest of the problem.
One approach would be to treat this as an integer optimization problem, in which the binary decision variable y_i for each number is whether or not we include it. Then our goal is to minimize the sum of the numbers, which can be modeled as:
min x_1*y_1 + x_2*y_2 + ... + x_n*y_n
The constraint in this case is that the sum of the numbers is at least 100:
x_1*y_1 + x_2*y_2 + ... + x_n*y_n >= 100
In general this is a hard problem (note that it is at least as hard as the subset sum problem, which is NP-complete). However modern optimization solvers may be efficient enough for your problem instances.
To test the scalability of a free solver for this problem, consider the following implementation with the lpSolve package in R (it returns the selected subset if the problem is feasible and NA otherwise):
library(lpSolve)
min.subset <- function(x, min.sum) {
  mod <- lp("min", x, matrix(x, nrow=1), ">=", min.sum, all.bin=TRUE)
  if (mod$status == 0) {
    which(mod$solution >= 0.999)
  } else {
    NA
  }
}
min.subset(1:10, 43.5)
# [1] 2 3 4 5 6 7 8 9
min.subset(1:10, 88)
# [1] NA
To test the scalability, I'll select n elements randomly from [1, 2, ..., 1000], setting the target sum to be half the sum of the elements. The runtimes were:
With n=100, it ran in 0.01 seconds
With n=1000, it ran in 0.1 seconds
With n=10000, it ran in 8.7 seconds
It appears you can solve this problem for more than 10k elements (with the selected distribution) without too many computational challenges. If your problem is too big for the free solver I've used here, you might consider Gurobi or CPLEX, two commercial solvers that are free for academic use but otherwise not free.
Suppose X is the sum of all x_i. Then, equivalently, you are asking for a maximum-sum subset of the x_i whose sum is at most X - 100 (the complement of that subset is the optimal solution to your problem). So all knapsack theory can be applied here.
In practice (for really large instances), I'd suggest a generalization of the Nemhauser-Ullmann algorithm, which can solve instances with millions of objects.
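A short Python sketch of the complement trick (the function name is mine), using a standard subset-sum reachability table:

```python
def min_sum_at_least(xs, target):
    """Smallest achievable subset sum >= target, via the complement:
    drop a maximum-sum subset whose sum is at most sum(xs) - target."""
    total = sum(xs)
    if total < target:
        return None                      # no subset can reach the target
    cap = total - target
    reachable = [True] + [False] * cap   # reachable[j]: some subset sums to j
    for x in xs:
        for j in range(cap, x - 1, -1):
            if reachable[j - x]:
                reachable[j] = True
    dropped = max(j for j in range(cap + 1) if reachable[j])
    return total - dropped
```

For example, min_sum_at_least([5, 7, 9], 12) returns 12 (the subset {5, 7}).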

Find equation based on known x and answers

So basically I have something like this
[always 8 numbers]
5-->10
2-->4
9-->18
7-->14
I know four values of x and the answer for each of them. I need to find an equation that fits all of those x and their answers. I know there is an infinite number of possible equations, but I would like to find the shortest one if possible.
For this example
x*2 or x+x fit the best
of course something like x*3 - x and infinitely many other equations work too, but they're not the most minimal ones like x*2
Any ideas, theories or algorithms that solve similar problem?
Using the numbers you provided:
5-->10
2-->4
9-->18
7-->14
You want to find a, b, c and d that solve the system defined by:
ax^3 + bx^2 + cx + d = f(x)
So, in your case it is:
125a + 25b + 5c + d = 10
8a + 4b + 2c + d = 4
729a + 81b + 9c + d = 18
343a + 49b + 7c + d = 14
If you solve the system you'll find that (a,b,c,d) must be (0, 0, 2, 0). So, the minimum polynomial is 2x.
I made a website some time ago that solves this:
http://juanlopes.net/actually42/#5%2010%202%204%209%2018%207%2014/true/true
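If you'd rather solve the system programmatically, here is one way with NumPy (a sketch; the data are the four points above):

```python
import numpy as np

xs = np.array([5.0, 2.0, 9.0, 7.0])
ys = np.array([10.0, 4.0, 18.0, 14.0])

# Vandermonde matrix: columns x^3, x^2, x, 1, one row per data point
V = np.vander(xs, 4)
a, b, c, d = np.linalg.solve(V, ys)
print(a, b, c, d)   # ~ 0, 0, 2, 0, i.e. f(x) = 2x
```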
If your goal is to fit the data to a polynomial function, i.e. something like:
f(x) = a_0 + a_1*x + a_2*x^2 + ... + a_n*x^n where each a_i is a real (or complex) number,
then there is some theory available as to when it is possible to put all those points on a single curve. What you can do is pick a degree (the highest power of x) and then write down a system of equations and solve the system (or try to solve it). For example, if the degree is 2, then your data become:
10 = a_0 + a_1*5 + a_2*5^2
4 = a_0 + a_1*2 + a_2*2^2
etc
If you are able to solve the system, then great. If not, you need a larger degree. Solving the system can be done (built in) in many languages via matrix multiplication. You may want to start out by saying: can my data all fit on a polynomial of degree 1? if yes, done. If not, does it fit on degree 2 polynomial? if yes, done. If not, degree 3, etc. Be careful though, because in general you may have data that you cannot fit "exactly" to a polynomial (or any function for that matter). If you just want a low degree polynomial that is very close, then you want to look into polynomial regression (which will give you a best fit polynomial), see: http://en.wikipedia.org/wiki/Polynomial_regression
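The degree-escalation idea described above could look like this in Python, using NumPy's `polyfit`/`polyval` for the fit and evaluation (a sketch on the question's data):

```python
import numpy as np

xs = [5, 2, 9, 7]
ys = [10, 4, 18, 14]

# try degrees 0, 1, 2, ... until the polynomial passes through every point
for degree in range(len(xs)):
    coeffs = np.polyfit(xs, ys, degree)
    if np.allclose(np.polyval(coeffs, xs), ys):
        break
print(degree, coeffs)   # degree 1 suffices: coefficients ~ [2, 0]
```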

Sum of Multiples of Numbers Greater Or Equal to Target, Optimization

Given an equation
Like 2(p1) + 3(p2) + 7(p3) >= 257
I need to find all possible combinations of p1, p2, p3
such that the above statement is true and the resulting sum (the left-hand side of the inequality) is minimal, where all the xn are known.
I tried looking up algorithms for general cases like
(x1)(p1) + (x2)(p2) + (x3)(p3) + ... + (xn)(pn) >= target
And I came across the Knapsack problem and Subset-Sum algorithm solutions, but they weren't exactly like this problem.
I tried before using an algorithm in Python 3.x that has lower-bound values for pn, but it still runs in O( ridiculous ) time complexity.
Obviously all numbers here are natural numbers, otherwise there would be infinite solutions.
I can see two possible approaches, depending on whether the Pi have to be >= 0. The case with Pi >= 0 is more sensible, so I will consider it first.
Treat this as dynamic programming, where you work from left to right along the equation. Looking at the larger equation in your comment, first of all create a list of the contributions from p0: 0, 5, 10, 15... 190384760, and beside them the value of p0 that produces them: 0, 1, 2, ... 190384760/5.
Now use this table to work out the values of 5p0 + 7p1 possible by combining the first two: 0, 5, 7, 10, 12, 14.... and keep the value of p1 needed to produce them.
Working left to right, you will end up with a table of the values up to just over 190384755 that can be created by nonnegative integer combinations of p0..p8. You obviously only care about the smallest one >= 190384755. Consider all possible values of the p8 contribution, subtract these from that optimal value, and look in the table for p0..p7 to see which of them are possible. This gives you all possible values of p8, and for each of these you can recursively repeat the process to print out all possible values of p7, and so on, until you have recovered all combinations of p0..p8 that yield the lowest value just over 190384755. This is very similar to the pseudo-polynomial algorithm for subset sum.
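A compact Python version of the table-building step (the function name is mine); it finds the smallest value >= the target reachable by nonnegative integer combinations of the coefficients:

```python
def min_combo_at_least(coeffs, target):
    # any optimal value lies in [target, target + max(coeffs)): if it were
    # higher, removing one copy of some coefficient would still clear target
    limit = target + max(coeffs)
    reachable = [True] + [False] * (limit - 1)
    for c in coeffs:                  # work along the equation term by term
        for v in range(c, limit):
            if reachable[v - c]:
                reachable[v] = True
    return next(v for v in range(target, limit) if reachable[v])
```

For the question's small example, min_combo_at_least([2, 3, 7], 257) returns 257 (e.g. 2*127 + 3*1).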
If the Pi can be < 0, then the achievable values are all the multiples of the gcd of the coefficients, which is very likely to be all integers, and there are infinitely many solutions. If this is really what you want, you can start by reading about the extended Euclidean algorithm: http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
Maybe the given example is just a toy case.
If not, exhaustive search is quite feasible: the minimal sum is bounded by 259 (the combination 0, 0, 37), and there are fewer than half a million combinations under this bound.
In addition, if you set two variables, say p2 and p3, such that 3(p2) + 7(p3) < 257, it is an easy matter to find the smallest p1 such that 2(p1) + 3(p2) + 7(p3) >= 257. You will just have to try 3200 (p2, p3) combinations or so.
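That enumeration is easy to write down; a sketch for the toy instance (the numbers 2, 3, 7 and 257 are from the question):

```python
import math

# enumerate (p2, p3) with 3*p2 + 7*p3 <= 257, pick the smallest p1 that
# lifts the total to at least 257, and keep the best total seen;
# combinations with 3*p2 + 7*p3 > 257 already exceed the p2 = 1, p3 = 0
# total found below, so skipping them is safe
best = None
for p3 in range(257 // 7 + 1):
    for p2 in range((257 - 7 * p3) // 3 + 1):
        rest = 257 - 3 * p2 - 7 * p3
        p1 = max(0, math.ceil(rest / 2))
        total = 2 * p1 + 3 * p2 + 7 * p3
        if best is None or total < best[0]:
            best = (total, p1, p2, p3)
print(best)   # minimal total is 257
```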

Problems with dynamic programming

I've got difficulties with understanding dynamic programming, so I decided to solve some problems. I know basic dynamic algorithms like longest common subsequence, knapsack problem, but I know them because I read them, but I can't come up with something on my own :-(
For example: we have a sequence of natural numbers. Each number can be taken with a plus or a minus sign. At the end we take the absolute value of the sum. For every sequence, find the lowest possible result.
in1: 10 3 5 4;
out1: 2
in2: 4 11 5 5 5;
out2: 0
in3: 10 50 60 65 90 100;
out3: 5
explanation for 3rd: 5 = |10+50+60+65-90-100|
What is worse, my friend told me that it is a simple knapsack problem, but I can't see any knapsack here. Is dynamic programming something difficult, or is it only me having big problems with it?
As has been pointed out by amit, this algorithm can be understood as an instance of the partition problem. For a simple implementation take a look at this Python code:
def partition(A):
    if not A:
        return 0
    total = sum(A)
    # reachable[j] is True iff some subset of A sums to exactly j
    reachable = [True] + [False] * total
    for a in A:
        for j in range(total, a - 1, -1):
            if reachable[j - a]:
                reachable[j] = True
    # largest achievable sum not exceeding half of the total
    best = max(j for j in range(total // 2 + 1) if reachable[j])
    # the two sides then differ by total - 2*best
    return total - 2 * best
When called with one of the inputs in the question:
partition([10, 50, 60, 65, 90, 100])
It will return 5, as expected. To fully understand the math behind the solution, please take a look at these examples and click the "Balanced Partition" link.
The knapsack here is weight = value = number for each element, and your bound W is 1/2 * sum(elements). The idea is that you want to maximize the sum of the numbers you "pick" without passing the limit of 1/2 * sum(elements), which is exactly knapsack with value = weight.
This problem is actually the partition problem, which is a special case of the subset sum problem.
The partition problem says: "Is it possible to get a subset of the elements that sums exactly to half?"
The derivation of your problem from here is simple: if such a subset exists, take its elements as +, and the ones you didn't take as -, and you get out = 0 (the other way around works the same). Thus, your described problem is the optimization version of the partition problem.
This is the same problem as in Tug Of War, without the constraint of balanced team sizes (which is not relevant):
http://acm.uva.es/p/v100/10032.html
I had solved this problem with a top-down approach. It works on the constraint that there is an upper limit to the numbers given. Do you have an upper limit or are the numbers unconstrained? If they are unconstrained I don't see how to solve this with dynamic programming.
