I was reading Wikipedia on the 0-1 knapsack problem and just want to clarify a couple of things. I have two questions:
http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem
I encountered this pseudo-code:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for w from 0 to W do
    m[0, w] := 0
end for
for i from 1 to n do
    for j from 0 to W do
        if j >= w[i] then
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
        else
            m[i, j] := m[i-1, j]
        end if
    end for
end for
Specifically for this part:
if j >= w[i] then
    m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
1) Correct me if I'm wrong, but shouldn't it be:
m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i], m[i,j-w[i]] + v[i])
?
Or if not, can someone explain why it's not needed?
...
2) I also have another question. Say I want to optimize this a bit: would it be wise to have the loop "for j from 0 to W" increment by the GCD of all the item weights (i.e., the GCD of the values stored in array w)? (I'm thinking purely code-wise right now, as I'm about to implement it.)
1) When you add m[i,j-w[i]] + v[i], you're allowing the same item i to be selected more than once, thus it is no longer 0/1 Knapsack - it becomes a Knapsack problem with unlimited amounts of each item.
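To see the difference concretely, here is a minimal Python sketch (my illustration, not from the original post) of both recurrences in their space-saving one-row form; the direction of the inner loop plays exactly the role of that extra m[i, j-w[i]] term:

```python
def knapsack_01(w, v, W):
    # dp[j] = best value achievable with capacity j.
    # Iterating j downwards means dp[j - wi] still excludes item i,
    # so each item is used at most once (0/1 knapsack).
    dp = [0] * (W + 1)
    for wi, vi in zip(w, v):
        for j in range(W, wi - 1, -1):
            dp[j] = max(dp[j], dp[j - wi] + vi)
    return dp[W]

def knapsack_unbounded(w, v, W):
    # Iterating j upwards lets dp[j - wi] already include item i,
    # which is exactly what the extra m[i, j-w[i]] + v[i] term does:
    # the same item may be taken any number of times.
    dp = [0] * (W + 1)
    for wi, vi in zip(w, v):
        for j in range(wi, W + 1):
            dp[j] = max(dp[j], dp[j - wi] + vi)
    return dp[W]
```

With a single item of weight 2 and value 10 and capacity 4, the first returns 10 while the second returns 20, since only the unbounded version can take the item twice.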
2) Yes, but this GCD usually turns out to be 1 on real instances, so it's not worth bothering with in general, unless you know beforehand that your data would benefit from it. (In that case, you'd actually want to divide all the weights and the capacity by the GCD and keep the original algorithm incrementing 1 at a time; since the values are untouched, the final result needs no rescaling. This saves memory as well. If the capacity is not divisible by the GCD, rounding it down is safe, because every reachable total weight is a multiple of the GCD.)
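A small Python sketch of that division trick (my illustration, with hypothetical data; note that only the weights and the capacity are divided, so the returned value itself needs no rescaling):

```python
from math import gcd
from functools import reduce

def knapsack_with_gcd(w, v, W):
    # Divide all weights and the capacity by their GCD, then run the
    # standard 0/1 DP over the smaller capacity range.
    g = reduce(gcd, w)
    w = [wi // g for wi in w]
    cap = W // g  # flooring is safe: every reachable weight is a multiple of g
    dp = [0] * (cap + 1)
    for wi, vi in zip(w, v):
        for j in range(cap, wi - 1, -1):
            dp[j] = max(dp[j], dp[j - wi] + vi)
    return dp[cap]
```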
Got this task from a game.
The farmer has:
a field of constant size 16
a daily water supply which he can improve while progressing through the game. On a given day it is 100; in a few days it may become 130.
around 50 types of crops. Each crop has a daily yield (in gold coins) and a water consumption.
Each crop type must be unique (planted at most once per day)
The goal is to get maximum gold per day.
This breaks down to finding 1-16 crops whose total water consumption is no more than the water supply (an input parameter, e.g. 100) and whose total yield is maximum.
Crop types are generated randomly with yields in the range 10-50000 and consumption in the range 1-120.
Brute force over ordered selections needs 50!/(50-16)! iterations, about 10^26 (ignoring order it is "only" C(50,16), around 5*10^12 subsets, but that is still far too many).
Are there any more efficient ways to find the maximum output?
This is called the Knapsack problem. Here's pseudocode lifted from Wikipedia.
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
// NOTE: The array "v" and array "w" are assumed to store all relevant values starting at index 1.
array m[0..n, 0..W];
for j from 0 to W do:
    m[0, j] := 0
for i from 1 to n do:
    m[i, 0] := 0
for i from 1 to n do:
    for j from 0 to W do:
        if w[i] > j then:
            m[i, j] := m[i-1, j]
        else:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
https://en.wikipedia.org/wiki/Knapsack_problem
https://medium.com/@fabianterh/how-to-solve-the-knapsack-problem-with-dynamic-programming-eb88c706d3cf
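For the crop instance specifically, here is a Python sketch of that DP (my code, with made-up data in the stated ranges; `best_yield` is a hypothetical helper using the space-optimized one-row form). The field-size limit of 16 crops is ignored here; if it can bind, add a second DP dimension over the number of crops planted:

```python
import random

def best_yield(crops, water):
    # crops: list of (daily_yield_in_gold, water_consumption).
    # Each crop type may be planted at most once, so this is 0/1
    # knapsack with the water supply as the capacity.
    dp = [0] * (water + 1)
    for value, cost in crops:
        for j in range(water, cost - 1, -1):
            dp[j] = max(dp[j], dp[j - cost] + value)
    return dp[water]

# Hypothetical instance matching the question's ranges:
# 50 crop types, yields 10-50000, consumption 1-120, supply 100.
random.seed(0)
crops = [(random.randint(10, 50000), random.randint(1, 120)) for _ in range(50)]
print(best_yield(crops, 100))
```

This needs on the order of 50 * 100 table updates instead of the astronomically many brute-force combinations.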
Is there a way to compute the knapsack problem incrementally? Any approximation algorithm? I am trying to solve the problem in the following scenario.
Let D be my data set, which is not ordered and should not be. D is divided into 3 subsets, namely D1, D2 and D3. D1, D2 and D3 can each be ordered if needed. I want to compute separate knapsack solutions for the sets (D1, D2) and (D2, D3), but I want to avoid computing D2 two times. So, basically, I want to:
compute (D2) // do some operation
save it as an intermediate result
use it with D1 and get knapsack result for (D1, D2)
use it with D3 and get knapsack result for (D2,D3)
That way the data traversal over D2 is done only once. Is there a way to solve the knapsack incrementally like this?
Wikipedia gives this pseudocode for 0/1 Knapsack: https://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_knapsack_problem
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for j from 0 to W do:
    m[0, j] := 0
for i from 1 to n do:
    for j from 0 to W do:
        if w[i-1] > j then:
            m[i, j] := m[i-1, j]
        else:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i-1]] + v[i-1])
This builds a 2 dimensional array such that m[n, W] (the last element in the last row) is the solution -- you run this on D2.
Then you write another algorithm that takes this array as input and
does not do the for j ... part to initialize the array
does for i from D2.count+1 to (D2.count + other.count) do: to start where the first run left off (you have to offset i when looking up in the w and v arrays)
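A Python sketch of this scheme (my illustration, with hypothetical item lists). It keeps only the last DP row, which is all the second pass needs as its starting point, and indexing into v and w takes care of itself because each call iterates over its own item list:

```python
def knapsack_rows(items, W, start_row=None):
    # items: list of (value, weight) pairs.
    # start_row: a previously computed DP row (the intermediate result);
    # when None, we start from the all-zero row.
    # Returns row where row[j] = best value within capacity j.
    row = list(start_row) if start_row is not None else [0] * (W + 1)
    for value, weight in items:
        for j in range(W, weight - 1, -1):
            row[j] = max(row[j], row[j - weight] + value)
    return row

W = 10
D1 = [(10, 4)]          # hypothetical (value, weight) pairs
D2 = [(5, 3), (8, 7)]
D3 = [(9, 3)]

mid = knapsack_rows(D2, W)            # traverse D2 only once
ans12 = knapsack_rows(D1, W, mid)[W]  # solution for D1 + D2
ans23 = knapsack_rows(D3, W, mid)[W]  # solution for D2 + D3
```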
In http://en.wikipedia.org/wiki/Knapsack_problem, the DP is:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for j from 0 to W do
    m[0, j] := 0
end for
for i from 1 to n do
    for j from 0 to W do
        if w[i] <= j then
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
        else
            m[i, j] := m[i-1, j]
        end if
    end for
end for
I think switching the order of the weight loop and the item loop does not impact the optimal solution. Is this right? Say:
for j from 0 to W do
    for i from 1 to n do
Thanks.
You are correct. The value of m[i,j] depends only on entries with both smaller i and smaller j. The situation where changing the loop order matters is when an entry can depend on one that comes later in that order. For example, if m[2,2] depended on m[1,3], then we would need to calculate the first row completely before moving to the second row.
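A quick Python check of this claim (my sketch, hypothetical data): both loop orders read only row i-1, and in the swapped order the needed entries m[i-1][j] and m[i-1][j-w] lie in the current or an earlier column, so they are already computed:

```python
def knapsack_ij(w, v, W):
    # Item loop outside, capacity loop inside (the original order).
    n = len(w)
    m = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(W + 1):
            m[i][j] = m[i-1][j]
            if w[i-1] <= j:
                m[i][j] = max(m[i][j], m[i-1][j - w[i-1]] + v[i-1])
    return m[n][W]

def knapsack_ji(w, v, W):
    # Same recurrence with the loops swapped; the result is identical.
    n = len(w)
    m = [[0] * (W + 1) for _ in range(n + 1)]
    for j in range(W + 1):
        for i in range(1, n + 1):
            m[i][j] = m[i-1][j]
            if w[i-1] <= j:
                m[i][j] = max(m[i][j], m[i-1][j - w[i-1]] + v[i-1])
    return m[n][W]
```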
I need to modify the wiki's knapsack pseudocode for my homework so that it checks whether you can achieve exactly weight W in the knapsack or not. The number of items is unlimited and the values are not important. I am thinking of adding a while loop under the j >= w[i] branch to check how many copies of the same item would fit. Will that work?
Thanks
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for w from 0 to W do
    m[0, w] := 0
end for
for i from 1 to n do
    for j from 0 to W do
        if j >= w[i] then
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
        else
            m[i, j] := m[i-1, j]
        end if
    end for
end for
If I'm not missing anything, and you are referring to this wiki article:
In the modified version, your values array should be the same as your w array. After calculating m[n, W], simply check whether it is equal to W.
EDIT:
If you have an unlimited number of items, then you are dealing with the Unbounded Knapsack Problem. This is a different problem, and the same article gives a dynamic programming implementation for solving it.
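A Python sketch of the exact-weight check for the unbounded case (my code, not from the answer). No inner while loop is needed: iterating the capacity upwards already allows each item to be reused any number of times:

```python
def exact_weight_reachable(weights, W):
    # reachable[j] is True iff some multiset of the given item weights
    # sums to exactly j (values are irrelevant for this question).
    reachable = [False] * (W + 1)
    reachable[0] = True  # the empty selection reaches weight 0
    for wi in weights:
        for j in range(wi, W + 1):  # upward loop = unlimited copies
            if reachable[j - wi]:
                reachable[j] = True
    return reachable[W]
```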
I'm playing around with the Levenshtein edit distance algorithm, and I want to extend it to count transpositions -- that is, exchanges of adjacent letters -- as 1 edit. The unmodified algorithm counts the insertions, deletions, or substitutions needed to reach one string from another. For instance, the edit distance from "KITTEN" to "SITTING" is 3. Here's the explanation from Wikipedia:
kitten → sitten (substitution of 'k' with 's')
sitten → sittin (substitution of 'e' with 'i')
sittin → sitting (insert 'g' at the end).
Following the same method, the edit distance from "CHIAR" to "CHAIR" is 2:
CHIAR → CHAR (delete 'I')
CHAR → CHAIR (insert 'I')
I would like to count this as "1 edit", since I only exchange two adjacent letters. How would I go about to do this?
You need one more case in the algorithm from Wikipedia:
if s[i] = t[j] then
    d[i, j] := d[i-1, j-1]
else if i > 1 and j > 1 and s[i] = t[j-1] and s[i-1] = t[j] then
    d[i, j] := minimum
    (
        d[i-2, j-2] + 1, // transposition
        d[i-1, j] + 1,   // deletion
        d[i, j-1] + 1,   // insertion
        d[i-1, j-1] + 1  // substitution
    )
else
    d[i, j] := minimum
    (
        d[i-1, j] + 1,   // deletion
        d[i, j-1] + 1,   // insertion
        d[i-1, j-1] + 1  // substitution
    )
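A runnable Python version of that extended recurrence (my sketch; the equal-characters case is folded into a 0/1 substitution cost, which yields the same values):

```python
def osa_distance(s, t):
    # Optimal String Alignment distance: Levenshtein edits plus an
    # adjacent transposition counted as one edit.
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete i characters
    for j in range(n + 1):
        d[0][j] = j  # insert j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i-1] == t[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1,       # deletion
                          d[i][j-1] + 1,       # insertion
                          d[i-1][j-1] + cost)  # substitution or match
            if i > 1 and j > 1 and s[i-1] == t[j-2] and s[i-2] == t[j-1]:
                d[i][j] = min(d[i][j], d[i-2][j-2] + 1)  # transposition
    return d[m][n]
```

For example, osa_distance("CHIAR", "CHAIR") is 1, while osa_distance("kitten", "sitting") is still 3.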
You have to modify how you update the dynamic programming table. In the original algorithm one considers the tails (or heads) of the two words that differ in length by at most one. The update is the minimum over all such possibilities.
If you want changes in two adjacent locations to count as one edit, the minimum above has to be computed over tails (or heads) that differ by at most two. You can extend this to larger neighborhoods, but the complexity will increase exponentially in the size of that neighborhood.
You can generalize further and assign costs that depend on the character(s) deleted, inserted or substituted, but you have to make sure that the cost you assign to a pair-edit is lower than two single edits, otherwise the two single edits will always win.
Let the words be w1 and w2
dist(i, j) = min(
    dist(i-2, j-2) if w1(i-1, i) == w2(j-1, j),
    dist(i-1, j-1) if w1(i) == w2(j),
    dist(i, j-1) + cost(w2(j)),
    dist(i-1, j) + cost(w1(i)),
    dist(i-1, j-1) + cost(w1(i), w2(j)),
    dist(i, j-2) + cost(w2(j-1, j)),
    dist(i-2, j) + cost(w1(i-1, i)),
    dist(i-2, j-2) + cost(w1(i-1, i), w2(j-1, j))
)
The lines marked with "if" should be considered only when their conditions are satisfied.
The other answers implement the Optimal String Alignment algorithm, not Damerau-Levenshtein, which I think is what you are describing.
I have a Java implementation of OSA with some optimizations here:
https://gist.github.com/steveash/5426191