I'm working on a program to solve a variant of the 0/1 Knapsack problem.
The original problem is described here: https://en.wikipedia.org/wiki/Knapsack_problem.
In case the link goes missing in the future, here is a summary of the 0/1 Knapsack problem (if you are familiar with it, skip this paragraph):
Let's say we have n items, each with weight wi and value vi. We want to put items in a bag that supports a maximum weight W, so that the total value inside the bag is the maximum possible without overloading the bag. Items cannot have multiple instances (i.e., we only have one of each). The objective of the problem is to maximize SUM(vi.xi) so that SUM(wi.xi) <= W and xi = 0, 1 (xi represents the state of an item being or not being in the bag).
For my case, there are small differences in both conditions and objective:
The weight of all items is 1, wi = 1, i = 1...n
I always want to put exactly half the items in the bag. So, the maximum weight capacity of the bag is half (rounded up) of the number of items: W = ceil[n/2], or equivalently W = floor[(n+1)/2].
Also, the weight inside the bag must be equal to its maximum capacity SUM(wi.xi) = W
Finally, instead of maximizing the value of the items inside the bag, the objective is that the value of the items inside is as close as possible to the value of the items outside. Hence, my objective is to minimize |SUM(vi.xi) - SUM[vi(1-xi)]|, which simplifies to minimizing |SUM[vi(2xi - 1)]|.
Now, there is pseudo-code for the original 0/1 Knapsack problem on the Wikipedia page above (you can find it at the bottom of this text), but I am having trouble adapting it to my scenario. Can someone help? (I am not asking for code, just for an idea, so the language is irrelevant.)
Thanks!
Wikipedia's pseudo-code for 0/1 Knapsack problem:
Assume w1, w2, ..., wn, W are strictly positive integers. Define
m[i,w] to be the maximum value that can be attained with weight less
than or equal to w using items up to i (first i items).
We can define m[i,w] recursively as follows:
m[0, w]=0
m[i, w] = m[i-1, w] if wi > w (the new item is more than the current weight limit)
m[i, w]= max(m[i-1, w], m[i-1, w-wi] + vi) if wi <= w.
The solution can then be found by calculating m[n,W].
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for j from 0 to W do:
    m[0, j] := 0
for i from 1 to n do:
    for j from 0 to W do:
        if w[i-1] <= j then:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i-1]] + v[i-1])
        else:
            m[i, j] := m[i-1, j]
Thanks to @harold, it seems like this problem is not a Knapsack problem, but a Partition problem. Part of the pseudo-code I was seeking is on the corresponding Wikipedia page: https://en.wikipedia.org/wiki/Partition_problem
EDIT: well, actually, Partition problem algorithms tell you whether or not a set of items can be partitioned into 2 sets of equal value. If it can't, there are approximation algorithms, which tell you whether the set can be partitioned into 2 sets with the difference between their values being lower than d.
BUT, they don't tell you the resulting sub-sets, and that's what I was seeking.
I ended up finding a question here asking for that (here: Balanced partition), with a code example which I have tested and works fine.
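For readers who want to see how the sub-sets themselves can be recovered, here is a minimal sketch, assuming the values are non-negative integers and the list is small enough for a pseudo-polynomial table. It is not the code from the linked question; the function name `balanced_half_partition` and the `reach` table are hypothetical names chosen for illustration. For every item count c it records which sums are reachable with exactly c items, together with one set of indices that produces each sum.

```python
def balanced_half_partition(values):
    """Pick exactly ceil(n/2) items so that inside/outside sums are as close as possible.

    Returns (inside_indices, outside_indices, absolute_difference).
    """
    n = len(values)
    k = (n + 1) // 2                     # W = ceil(n/2) items go into the bag
    total = sum(values)
    # reach[c] maps a reachable sum to one tuple of c indices that produces it.
    reach = [dict() for _ in range(k + 1)]
    reach[0][0] = ()
    for idx, v in enumerate(values):
        # Go through counts in descending order so each item is used at most once.
        for c in range(min(idx, k - 1), -1, -1):
            for s, chosen in list(reach[c].items()):
                reach[c + 1].setdefault(s + v, chosen + (idx,))
    # |inside - outside| = |2 * inside - total|
    best_sum = min(reach[k], key=lambda s: abs(2 * s - total))
    inside = list(reach[k][best_sum])
    inside_set = set(inside)
    outside = [i for i in range(n) if i not in inside_set]
    return inside, outside, abs(2 * best_sum - total)
```

For example, `balanced_half_partition([10, 1, 1])` has to place two of the three items in the bag, so the best it can do is the two items of value 1, leaving a difference of 8.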
As I am not very proficient in various optimization/tree algorithms, I am seeking help.
Problem Description:
Assume a large sequence of sorted nodes is given, with each node representing an integer value L. L strictly increases from node to node, so no two nodes have the same L.
The goal now is to find the best combination of nodes, where the difference between the L-values of subsequent nodes is closest to a given integer value M(L) that changes over L.
Example:
So, in the beginning I would have L = 50 and M = 100. The next nodes have L = 70,140,159,240,310.
First, the value of 159 seems to be closest to L+M = 150, so it is chosen as the right value.
However, in the next step, M=100 is still given and we notice that L+M = 259, which is far away from 240.
If we now go back and choose the node with L=140 instead, which is then followed by 240, the overall match between the M values and the L-differences is stronger. The algorithm should be able to find its way back to the optimal path, even if a mistake was made along the way.
Some additional information:
1) the start node is not necessarily part of the best combination/path, but if required, one could first develop an algorithm, which chooses the best starter candidate.
2) the optimal combination of nodes is following the sorted sequence and not "jumping back" -> so 1,3,5,7 is possible but not 1,3,5,2,7.
3) in the end, the differences between the L values of chosen nodes should in the mean squared sense be closest to the M values
Any help is much appreciated!
If I understand your question correctly, you could use Dijkstra's algorithm:
https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
http://www.mathworks.com/matlabcentral/fileexchange/20025-dijkstra-s-minimum-cost-path-algorithm
For that you have to know the neighbours of every node and create an adjacency matrix. With the implementation of Dijkstra's algorithm that I linked above you can specify edge weights. You could define each edge weight as the L of the node being accessed + M, so that for every node combination you have the L of the new node + M. That way the algorithm should find the optimum path between your nodes.
To get all edge combinations you can use MATLAB's graph functions:
http://se.mathworks.com/help/matlab/ref/graph.html
If I understand your problem correctly you need an undirected graph.
You can access all edges with the command
G.Edges after you have created the graph.
I know it's not the perfect answer, but I hope it helps!
P.S. Just watch out: Dijkstra's algorithm can only handle non-negative edge weights.
Suppose we are given a number M and a list of n numbers, L[1], ..., L[n], and we want to find a subsequence of at least q of the latter numbers that minimises the sum of squared errors (SSE) with respect to M, where the SSE of a list of k positions x[1], ..., x[k] with respect to M is given by
SSE(M, x[1], ..., x[k]) = sum((L[x[i]]-L[x[i-1]]-M)^2) over all 2 <= i <= k,
with the SSE of a list of 0 or 1 positions defined to be 0.
(I'm introducing the parameter q and associated constraint on the subsequence length here because without it, there always exists a subsequence of length exactly 2 that achieves the minimum possible SSE -- and I'm guessing that such a short sequence isn't helpful to you.)
This problem can be solved in O(qn^2) time and O(qn) space using dynamic programming.
Define f(i, j) to be the minimum sum of squared errors achievable under the following constraints:
The number at position i is selected, and is the rightmost selected position. (Here, i = 0 implies that no positions are selected.)
We require that at least j (instead of q) of these first i numbers are selected.
Also define g(i, j) to be the minimum of f(k, j) over all 0 <= k <= i. Thus g(n, q) will be the minimum sum of squared errors achievable on the entire original problem. For efficient (O(1)) calculation of g(i, j), note that
g(i>0, j>0) = min(g(i-1, j), f(i, j))
g(0, 0) = 0
g(0, j>0) = infinity
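To calculate f(i, j), note that if i > 0 and j >= 2 then any solution must be formed by appending the ith position to some solution Y that selects at least j-1 positions and whose rightmost selected position is k, for some k < i. The total SSE of this solution to the (i, j) subproblem will be whatever the SSE of Y was, plus a term of (L[i]-L[k]-M)^2 that depends only on k, so for each fixed k it suffices to minimise the SSE of Y. But we can compute that minimum: since the added term is measured from L[k], the rightmost selected position of Y must be exactly k, so the minimum is f(k, j-1). (Using g(k, j-1) here would be wrong, because it also admits solutions whose rightmost position lies to the left of k. For example, with L = 0, 100, 150, 250, M = 100 and q = 3, it would pair the zero-error solution {1, 2} with the zero-error gap from 150 to 250 and report an SSE of 0, although every valid selection has an SSE of 2500.)
Since this holds for any 1 <= k < i, it suffices to try all such values of k, and take the one that gives the lowest total SSE:
f(i>=j, j>=2) = min of (f(k, j-1) + (L[i]-L[k]-M)^2) over all 1 <= k < i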
f(i>=j, j<2) = 0 # If we only need 0 or 1 position, SSE is 0
f(i, j>i) = infinity # Can't choose > i positions if the rightmost chosen position is i
With the above recurrences and base cases, we can compute g(n, q), the minimum possible sum of squared errors for the entire problem. By memoising values of f(i, j) and g(i, j), the time to compute all needed values of f(i, j) is O(qn^2), since there are at most (n+1)*(q+1) possible distinct combinations of input parameters (i, j), and computing a particular value of f(i, j) requires at most (n+1) iterations of the loop that chooses values of k, each iteration of which takes O(1) time outside of recursive subcalls. Storing solution values of f(i, j) requires at most (n+1)*(q+1), or O(qn), space, and likewise for g(i, j). As established above, g(i, j) can be computed in O(1) time when all needed values of f(x, y) have been computed, so g(n, q) can be computed in the same time complexity.
To actually reconstruct a solution corresponding to this minimum SSE, you can trace back through the computed values of f(i, j) in reverse order, each time looking for a value of k that achieves a minimum value in the recurrence (there may in general be many such values of k), setting i to this value of k, and continuing on until i=0. This is a standard dynamic programming technique.
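The recurrences and the traceback can be sketched as follows. This is a minimal illustration under stated assumptions, not the answerer's own code: the function name `min_sse_subsequence` and the `prev` table are hypothetical, M is taken as a constant, and positions are 1-based as in the text. Note that the predecessor term is taken from f(k, j-1), i.e. from solutions whose rightmost position is exactly k, so that the squared term (L[i]-L[k]-M)^2 is measured from the element that was actually selected last.

```python
import math

def min_sse_subsequence(L, M, q):
    """Return (minimum SSE, 1-based positions) for a subsequence of at least q numbers."""
    n = len(L)
    INF = math.inf
    # f[i][j]: minimum SSE when position i is the rightmost selected position
    # and at least j positions are selected (i = 0 means nothing is selected).
    f = [[INF] * (q + 1) for _ in range(n + 1)]
    prev = [[0] * (q + 1) for _ in range(n + 1)]   # best predecessor k, for traceback
    for i in range(n + 1):
        for j in range(min(i, q) + 1):             # f(i, j > i) stays infinite
            if j < 2:
                f[i][j] = 0.0                      # 0 or 1 positions: SSE is 0
                continue
            for k in range(1, i):
                cand = f[k][j - 1] + (L[i - 1] - L[k - 1] - M) ** 2
                if cand < f[i][j]:
                    f[i][j] = cand
                    prev[i][j] = k
    # g(n, q) is the minimum of f(i, q) over all i.
    best_i = min(range(n + 1), key=lambda i: f[i][q])
    best = f[best_i][q]
    if best == INF:
        return INF, []
    positions = []
    i, j = best_i, q
    while i > 0:                                   # follow predecessors back to i = 0
        positions.append(i)
        i, j = prev[i][j], max(j - 1, 0)
    positions.reverse()
    return best, positions
```

With the numbers from the question (L = 50, 70, 140, 159, 240, 310, M = 100, q = 3) this selects positions 1, 3 and 5, i.e. the values 50, 140 and 240, with an SSE of 100.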
I am answering my own post with my current implementation, in order to structure my post and upload images. Unfortunately, the code does not do what it should. Imagine L, M and q given as in the images below. With the calcf and calcg functions I calculated the F and G matrices, where F(i+1,j+1) is the calculated and stored f(i,j), and G(i+1,j+1) is g(i,j). The SSE of the optimal combination should be G(N+1,q+1), but the result is wrong. If anyone can find the mistake, that would be much appreciated.
G and F Matrix of given problem in the workspace. G and F are created by calculating g(N,q) via calcg(L,N,q,M).
calcf and calcg functions
I am trying to devise a pseudo-code for Knapsack algorithms, where a single item can be selected multiple times. The classical algorithm is
OPT(i, w) = max(OPT(i-1, w), vi + OPT(i-1, w-wi))
In order to meet the requirements, I am modifying it to
k = 1
maximum = OPT(i-1, w)
while (OPT(i-1, w - k*wi) > 0) {
    maximum = max(maximum, k*vi + OPT(i-1, w - k*wi))
    k++
}
OPT(i, w) = maximum
Does this seem to be an appropriate solution? Or does a better solution exist?
Please let me know if any additional information is required.
Everything else remains the same: vi denotes the value of the ith element and wi denotes the weight of the ith element.
If you want to be able to choose an item multiple times, all you have to do is change the recursion when selecting an element:
OPT(i, w) = max(OPT(i-1, w), vi + OPT(i, w-wi))
                     ^                ^
            removed the element   not removing the element
The idea is that you allow re-adding the element on the next iteration. You add an element as many times as you want, and you stop adding it when you "choose" the first option in the recursive formula.
The recursion will still not be infinite, because you must stop once w<0.
Time complexity does not change - O(nW)
Based on Algorithms by Dasgupta, Papadimitriou, and Vazirani (DPV), the solution to Knapsack with repetition looks like this:
K(0) = 0
for w = 1 to W:
    K(w) = max{K(w - w_i) + v_i : w_i <= w}
return K(W)
Here, W is the capacity; w_i is the weight of item i; v_i is the value of item i; K(w) is the max value achievable with knapsack of capacity w.
Your solution seems like 0-1 Knapsack though.
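The K(w) recurrence above can be sketched in a few lines of Python. This is a minimal illustration, assuming strictly positive integer weights; the function name `unbounded_knapsack` is a hypothetical name, and a capacity that no item fits into is given the value 0.

```python
def unbounded_knapsack(weights, values, capacity):
    """Maximum value with capacity `capacity` when every item may be used repeatedly."""
    # best[w] plays the role of K(w); best[0] = 0 is the base case.
    best = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        for wi, vi in zip(weights, values):
            if wi <= w:
                # Item i fits: reuse the solution for the remaining capacity w - wi.
                best[w] = max(best[w], best[w - wi] + vi)
    return best[capacity]
```

For instance, with weights 6, 3, 4, 2, values 30, 14, 16, 9 and capacity 10, the result is 48: one item of weight 6 plus two copies of the item of weight 2.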
The dynamic programming algorithm to optimally fill a knapsack works well in the case of one knapsack. But is there an efficient known algorithm that will optimally fill 2 knapsacks (capacities can be unequal)?
I have tried the following two approaches and neither of them is correct.
First fill the first knapsack using the original DP algorithm to fill one knapsack and then fill the other knapsack.
First fill a knapsack of size W1 + W2 and then split the solution into two solutions (where W1 and W2 are the capacities of the two knapsacks).
Problem statement (see also Knapsack Problem at Wikipedia):
We have to fill the knapsack with a set of items (each item has a weight and a value) so as to maximize the value that we can get from the items while having a total weight less than or equal to the knapsack size.
We cannot use an item multiple times.
We cannot take a fraction of an item: every item must be either fully included or not.
I will assume each of the n items can only be used once, and you must maximize your profit.
Original knapsack is dp[i] = best profit you can obtain for weight i
for i = 1 to n do
    for w = maxW down to a[i].weight do
        if dp[w] < dp[w - a[i].weight] + a[i].gain
            dp[w] = dp[w - a[i].weight] + a[i].gain
Now, since we have two knapsacks, we can use dp[i, j] = best profit you can obtain for weight i in knapsack 1 and j in knapsack 2
for i = 1 to n do
    for w1 = maxW1 down to 0 do
        for w2 = maxW2 down to 0 do
            dp[w1, w2] = max
            {
                dp[w1, w2],                              <- we already have the best choice for this pair
                dp[w1 - a[i].weight, w2] + a[i].gain,    <- put in knapsack 1 (only if w1 >= a[i].weight)
                dp[w1, w2 - a[i].weight] + a[i].gain     <- put in knapsack 2 (only if w2 >= a[i].weight)
            }
Both loops must run down to 0: if they stopped at a[i].weight, a state in which the item fits in only one of the two knapsacks would never be updated.
Time complexity is O(n * maxW1 * maxW2), where maxW1 and maxW2 are the maximum weights the two knapsacks can carry. Note that this isn't very efficient if the capacities are large.
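A minimal Python sketch of the two-knapsack table follows, assuming strictly positive integer weights; the function name `two_knapsacks` and the `(weight, gain)` tuple layout are hypothetical choices for illustration. Both capacity loops run down to 0 and each option is guarded separately, so an item that fits in only one of the knapsacks is still considered.

```python
def two_knapsacks(items, cap1, cap2):
    """items: list of (weight, gain) pairs. Returns the best total gain for two knapsacks."""
    # dp[w1][w2]: best gain using at most w1 in knapsack 1 and at most w2 in knapsack 2.
    dp = [[0] * (cap2 + 1) for _ in range(cap1 + 1)]
    for weight, gain in items:
        # Descending order means the cells read below still hold values
        # computed without the current item, so each item is used at most once.
        for w1 in range(cap1, -1, -1):
            for w2 in range(cap2, -1, -1):
                best = dp[w1][w2]                                   # skip the item
                if w1 >= weight:
                    best = max(best, dp[w1 - weight][w2] + gain)    # knapsack 1
                if w2 >= weight:
                    best = max(best, dp[w1][w2 - weight] + gain)    # knapsack 2
                dp[w1][w2] = best
    return dp[cap1][cap2]
```

As a small check, a single item of weight 3 and gain 7 with capacities 1 and 3 yields 7, because the item fits in the second knapsack only.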
The original DP marks in the dp array which total weights can be obtained in the knapsack, and updates the array by considering the items one after another.
With 2 knapsacks you can use a 2-dimensional array, where dp[ i ][ j ] = 1 when you can put weight i in the first knapsack and weight j in the second. The update is similar to the original DP case.
The recursive formula, if anybody is looking for it:
Given n items, such that item i has weight wi and value pi. The two knapsacks have capacities of W1 and W2.
For every 0<=i<=n, 0<=a<=W1, 0<=b<=W2, denote by M[i,a,b] the maximal value obtainable from the first i items when the knapsacks have capacities a and b.
for a<0 or b<0 - M[i,a,b] = −∞
for i=0, or a,b=0 - M[i,a,b] = 0
The formula:
M[i,a,b] = max{M[i-1,a,b], M[i-1,a-wi,b] + pi, M[i-1,a,b-wi] + pi}
Every solution to the problem with i items either has item i in knapsack 1, in knapsack 2, or in neither of them.
In Wikipedia the algorithm for Knapsack is as follows:
for i from 1 to n do
    for j from 0 to W do
        if j >= w[i] then
            T[i, j] := max(T[i-1, j], T[i-1, j-w[i]] + v[i])
        else
            T[i, j] := T[i-1, j]
        end if
    end for
end for
And it is the same structure in all the examples I found online.
What I cannot understand is how this code takes into account the fact that the max value may come from a smaller knapsack. E.g. if the knapsack capacity is 8, then perhaps the max value comes from capacity 7 (8 - 1).
I could not find any logic anywhere that considers that the max value may come from a smaller knapsack. Is this idea wrong?
The Dynamic Programming solution of knapsack is basically recursive:
T(i,j) = max{ T(i-1,j) , T(i-1,j-w[i]) + v[i] }
//               ^             ^
//  ignore the element   add the element: your value is increased
//                       by v[i] and the additional weight you can
//                       carry is decreased by w[i]
(The else condition is redundant in the recursive form if you set T(i,j) = -infinity for each j < 0).
The idea is an exhaustive search: you start from one element and you have two possibilities: add it, or don't.
You check both options, and choose the better of the two.
Since this is done recursively, you are effectively checking ALL the possible ways to assign the elements to the knapsack.
Note that the solution in Wikipedia is basically a bottom-up solution for the same recursive formula.
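To see why the "smaller knapsack" case needs no special handling, here is a minimal Python sketch of the bottom-up table; the function name `knapsack_table` is a hypothetical name. Row 0 is 0 for every capacity, so T[i][j] means the best value with total weight at most j, not exactly j. An optimum that only uses weight 7 is therefore already stored in column 8.

```python
def knapsack_table(weights, values, capacity):
    """Build the 0/1 knapsack table T, where T[i][j] is the best value
    using the first i items with total weight at most j."""
    n = len(weights)
    T = [[0] * (capacity + 1) for _ in range(n + 1)]   # row 0: no items, value 0
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for j in range(capacity + 1):
            T[i][j] = T[i - 1][j]                      # ignore item i
            if j >= w:
                T[i][j] = max(T[i][j], T[i - 1][j - w] + v)   # add item i
    return T
```

With weights 3 and 4, values 5 and 6, and capacity 8, both `T[2][7]` and `T[2][8]` equal 11: the best filling weighs only 7, and the entry for capacity 8 picks it up automatically because every row is non-decreasing in j.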
As I see it, you have misunderstood the concept of the knapsack problem, which I will describe here in detail until we reach the code part.
First, there are two versions of the problem:
0-1 knapsack problem: here the items are indivisible; you either take an item or not. It can be solved with dynamic programming. (This is the one you are having trouble with.)
Fractional knapsack problem: don't worry about this one for now.
For the first problem you can understand it as the following:
Given a knapsack with maximum capacity W, and a set S consisting of n items
Each item i has some weight wi and benefit value bi (all wi and W are integer values).
So, how do we pack the knapsack to achieve the maximum total value of packed items?
In mathematical terms: maximize SUM(bi.xi) subject to SUM(wi.xi) <= W, where xi = 1 if item i is packed and xi = 0 otherwise.
To solve this problem using dynamic programming, we set up a table V[0..n, 0..W] with one row for each available item (plus a row 0 for "no items"), and one column for each weight from 0 to W.
We need to carefully identify the sub-problems,
The sub-problem then will be to compute V[k,w], i.e., to find an optimal solution for
Sk= {items labeled 1, 2, .. k} in a knapsack of size w (maximum value achievable given capacity w and items 1,…, k)
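So, we arrive at this formula to solve our problem: V[k,w] = V[k-1,w] if wk > w, and V[k,w] = max(V[k-1,w], V[k-1,w-wk] + bk) otherwise, with V[0,w] = 0.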
This algorithm only finds the max possible value that can be carried in the knapsack
i.e., the value in V[n,W]
To know the items that make this maximum value, this will be another topic.
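One common way to recover the items is to walk back through the finished table. The following is a minimal sketch, assuming the table V is filled as described above; the function name `knapsack_items` is a hypothetical name, and items are reported with 1-based labels as in the text. Whenever V[k][w] differs from V[k-1][w], item k must have been packed.

```python
def knapsack_items(weights, benefits, capacity):
    """Return (best value, sorted 1-based labels of the packed items)."""
    n = len(weights)
    V = [[0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        wk, bk = weights[k - 1], benefits[k - 1]
        for w in range(capacity + 1):
            V[k][w] = V[k - 1][w]
            if wk <= w:
                V[k][w] = max(V[k][w], V[k - 1][w - wk] + bk)
    # Trace back from V[n][W]: a change between rows k-1 and k means item k was taken.
    chosen = []
    w = capacity
    for k in range(n, 0, -1):
        if V[k][w] != V[k - 1][w]:
            chosen.append(k)
            w -= weights[k - 1]
    chosen.reverse()
    return V[n][capacity], chosen
```

For weights 2, 3, 4, 5, benefits 3, 4, 5, 6 and W = 5, this returns the value 7 together with items 1 and 2.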
I really hope that this answer helps you. I have a PowerPoint presentation that walks you through filling the table and shows the algorithm step by step, but I don't know how to upload it to Stack Overflow. Let me know if you need any help.
I have 2 sets of integers, A and B, not necessarily of the same size. For my needs, I take the distance between each 2 elements a and b (integers) to be just abs(a-b).
I am defining the distance between the two sets as follows:
1. If the sets are of the same size, minimize the sum of distances of all pairs [a,b] (a from A and b from B), with the minimization taken over all possible 'pair partitions' (there are n! possible partitions).
2. If the sets are not of the same size, let's say A of size m and B of size n, with m < n, then minimize the distance from (1) over all subsets of B which are of size m.
My question is: does the following algorithm (just an intuitive guess) give the right answer, according to the definition written above?
Construct a matrix D of size m X n, with D(i,j) = abs(A(i)-B(j))
Find the smallest element of D, accumulate it, and delete the row and the column of that element. Accumulate the next smallest entry, and keep accumulating until all rows and columns are deleted.
For example, if A={0,1,4} and B={3,4}, then D is (with the elements of B above and the elements of A to the left):
      3   4
  0   3   4
  1   2   3
  4   1   0
And the distance is 0 + 2 = 2, coming from pairing 4 with 4 and 3 with 1.
Note that this problem is referred to sometimes as the skis and skiers problem, where you have n skis and m skiers of varying lengths and heights. The goal is to match skis with skiers so that the sum of the differences between heights and ski lengths is minimized.
To solve the problem you could use minimum weight bipartite matching, which requires O(n^3) time.
Even better, you can achieve O(n^2) time with O(n) extra memory using the simple dynamic programming algorithm below.
Optimally, you can solve the problem in linear time if the points are already sorted using the algorithm described in this paper.
O(n^2) dynamic programming algorithm:
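if (size(A) > size(B))
    swap(A, B);                       // make A the smaller set: every element of A gets matched
sort(A);
sort(B);
opt = array(size(B));
nopt = array(size(B));
opt[0] = abs(A[0] - B[0]);
for (j = 1; j < size(B); j++)         // best match for A[0] among B[0..j]
    opt[j] = min(opt[j - 1], abs(A[0] - B[j]));
for (i = 1; i < size(A); i++) {
    fill(nopt, infinity);
    for (j = 1; j < size(B); j++)
        nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]));
    swap(opt, nopt);                  // once per row, after the inner loop has finished
}
return opt[size(B) - 1];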
After each iteration i of the outer for loop above, opt[j] contains the optimal solution matching {A[0],..., A[i]} using the elements {B[0],..., B[j]}.
The correctness of this algorithm relies on the fact that in any optimal matching if a1 is matched with b1, a2 is matched with b2, and a1 < a2, then b1 <= b2.
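The dynamic programme can be written out in Python as a minimal sketch. The function name `min_matching_cost` is a hypothetical name; the sketch assumes the smaller set is the one that must be matched completely, and it initialises the first row with a running minimum so that opt[j] really means "best match using B[0..j]".

```python
def min_matching_cost(A, B):
    """Minimum sum of |a - b| when every element of the smaller list
    is paired with a distinct element of the larger list."""
    if len(A) > len(B):
        A, B = B, A                       # A is now the smaller list
    if not A:
        return 0
    A, B = sorted(A), sorted(B)
    INF = float("inf")
    # opt[j]: best cost of matching A[0..i] using only B[0..j] (here i = 0).
    opt = [abs(A[0] - B[0])]
    for j in range(1, len(B)):
        opt.append(min(opt[-1], abs(A[0] - B[j])))
    for i in range(1, len(A)):
        nopt = [INF] * len(B)
        for j in range(i, len(B)):        # at least i + 1 elements of B are needed
            # Either B[j] is unused, or A[i] is paired with B[j].
            nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]))
        opt = nopt
    return opt[-1]
```

On the example from the question, `min_matching_cost([0, 1, 4], [3, 4])` gives 2, and on the sets {3, 7} and {0, 4} it gives 6.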
In order to get the optimum, solve the assignment problem on D.
The assignment problem finds a perfect matching in a bipartite graph such that the total edge weight is minimized, which maps perfectly to your problem. It is also in P.
EDIT to explain how OP's problem maps onto assignment.
For simplicity of explanation, extend the smaller set with special elements e_k.
Let A be the set of workers, and B be the set of tasks (the contents are just labels).
Let the cost be the distance between an element in A and B (i.e. an entry of D). The distance between e_k and anything is 0.
Then, we want to find a perfect matching of A and B (i.e. every worker is matched with a task), such that the cost is minimized. This is the assignment problem.
No, it does not always give the best answer. For example:
With A = {3,7} and B = {0,4}, you will choose {(3,4),(0,7)}, for a distance of 8, but you should choose {(3,0),(4,7)}, for a distance of 6.
Your algorithm gives a good approximation of the minimum, but not necessarily the true minimum. You are following a "greedy" approach, which is generally much easier and gives good results, but cannot guarantee the best answer.
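The counterexample can be checked with a short Python sketch that compares the greedy procedure from the question with a brute-force search over all pairings. The function names `greedy_distance` and `optimal_distance` are hypothetical, and the brute force is only meant for tiny inputs.

```python
from itertools import permutations

def greedy_distance(A, B):
    """Repeatedly take the smallest remaining entry of D and delete its row and column."""
    entries = sorted((abs(a - b), i, j) for i, a in enumerate(A) for j, b in enumerate(B))
    used_a, used_b = set(), set()
    total = 0
    for d, i, j in entries:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += d
    return total

def optimal_distance(A, B):
    """Try every way of pairing the smaller list with distinct elements of the larger one."""
    if len(A) > len(B):
        A, B = B, A
    return min(sum(abs(a - b) for a, b in zip(A, perm))
               for perm in permutations(B, len(A)))
```

For A = [3, 7] and B = [0, 4], `greedy_distance` returns 8 while `optimal_distance` returns 6; for the example in the question (A = [0, 1, 4], B = [3, 4]) both return 2.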