I'm learning about dynamic programming via the 0-1 knapsack problem.
I'm getting some weird Nulls in the output of the function part1, like 3 Null, 5 Null, etc. Why is this?
The code is an implementation of:
http://www.youtube.com/watch?v=EH6h7WA7sDw
I use a matrix to store all the values and keep flags. I don't know how efficient this is, since it is a list of lists (is indexing O(1)?).
This is my code:
(* 0-1 Knapsack problem
item = {value, weight}
Constraint is maxweight. Objective is to max value.
Input on the form:
Matrix[{value,weight},
{value,weight},
...
]
*)
lookup[x_, y_, m_] := m[[x, y]];
part1[items_, maxweight_] := {
nbrofitems = Dimensions[items][[1]];
keep = values = Table[0, {j, 0, nbrofitems}, {i, 1, maxweight}];
For[j = 2, j <= nbrofitems + 1, j++,
itemweight = items[[j - 1, 2]];
itemvalue = items[[j - 1, 1]];
For[i = 1, i <= maxweight, i++,
{
x = lookup[j - 1, i, values];
diff = i - itemweight;
If[diff > 0, y = lookup[j - 1, diff, values], y = 0];
If[itemweight <= i ,
{If[x < itemvalue + y,
{values[[j, i]] = itemvalue + y; keep[[j, i]] = 1;},
{values[[j, i]] = x; keep[[j, i]] = 0;}]
},
y(*y or x?*)]
}
]
]
{values, keep}
}
solvek[keep_, items_, maxweight_] :=
{
(*w=remaining weight in knapsack*)
(*i=current item*)
w = maxweight;
knapsack = {};
nbrofitems = Dimensions[items][[1]];
For[i = nbrofitems, i > 0, i--,
If[keep[[i, w]] == 1, {Append[knapsack, i]; w -= items[[i, 2]];
i -= 1;}, i - 1]];
knapsack
}
Clear[keep, v, a, b, c]
maxweight = 5;
nbrofitems = 3;
a = {5, 3};
b = {3, 2};
c = {4, 1};
items = {a, b, c};
MatrixForm[items]
Print["Results:"]
results = part1[items, 5];
keep = results[[1]];
Print["keep:"];
Print[keep];
Print["------"];
results2 = solvek[keep, items, 5];
MatrixForm[results2]
(*MatrixForm[results[[1]]]
MatrixForm[results[[2]]]*)
{{{0,0,0,0,0},{0,0,5 Null,5 Null,5 Null},{0,3 Null,5 Null,5 Null,8 Null},{4 Null,4 Null,7 Null,9 Null,9 Null}},{{0,0,0,0,0},{0,0,Null,Null,Null},{0,Null,0,0,Null},{Null,Null,Null,Null,Null}}}
Your code gives other errors when I run it here, but the Null problem occurs because For[] returns Null. Without a semicolon after it, the For expression and the {values, keep} on the following line are read as a product, so every entry is multiplied by Null. Add a ; at the end of the outermost For statement in part1 (i.e., just before {values, keep}).
In case my answer isn't clear, here is how the problem occurs:
(
Do[i, {i, 1, 10}]
3
)
(*3 Null*)
while
(
Do[i, {i, 1, 10}];
3
)
(*3*)
acl has already explained the Null error. There are more errors, though.
Your keep variable actually contains two matrices (values and keep). You need to call solvek with the second one: solvek[keep[[2]], items, 5]
Various errors in solvek:
i -= 1 and i - 1 are worse than superfluous. The i-- in the For header is sufficient; with the extra i -= 1 you decrease i twice whenever an item is kept, and the bare i - 1 has no effect at all.
Append must be AppendTo, because Append returns a new list and leaves knapsack unchanged.
keep[[i, w]] == 1 must be keep[[i + 1, w]] == 1, as the keep matrix has one more row than there are items.
Not wrong but superfluous: nbrofitems = Dimensions[items][[1]]; nbrofitems is already globally defined.
The code of your second part could look like:
solvek[keep_, items_, maxweight_] :=
Module[{w = maxweight, knapsack = {}, nbrofitems = Dimensions[items][[1]], i},
For[i = nbrofitems, i > 0, i--,
If[keep[[i + 1, w]] == 1, AppendTo[knapsack, i]; w -= items[[i, 2]]]
];
knapsack
]
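For comparison, here is a minimal sketch of the same table-plus-backtracking idea in Python rather than Mathematica. The function name knapsack_01, the 0-based item indices and the extra column for capacity 0 are my own choices, not taken from the code above. It keeps the all-zero row for "no items considered", which is why the backtracking step reads row i + 1 for item i.

```python
def knapsack_01(items, max_weight):
    """items is a list of (value, weight) pairs; returns (best_value, chosen_indices)."""
    n = len(items)
    # Row 0 means "no items considered"; column w stands for capacity w (0..max_weight).
    values = [[0] * (max_weight + 1) for _ in range(n + 1)]
    keep = [[False] * (max_weight + 1) for _ in range(n + 1)]
    for i, (value, weight) in enumerate(items):
        for w in range(max_weight + 1):
            skip = values[i][w]
            # Taking the item is only possible if it fits in capacity w.
            take = value + values[i][w - weight] if weight <= w else None
            if take is not None and take > skip:
                values[i + 1][w] = take
                keep[i + 1][w] = True
            else:
                values[i + 1][w] = skip
    # Walk back from the last item; shrink the remaining capacity when an item was kept.
    chosen, w = [], max_weight
    for i in range(n - 1, -1, -1):
        if keep[i + 1][w]:
            chosen.append(i)
            w -= items[i][1]
    return values[n][max_weight], sorted(chosen)


best, chosen = knapsack_01([(5, 3), (3, 2), (4, 1)], 5)
print(best, chosen)
```

With the three items from the question this picks the first and third item (0-based indices 0 and 2) for a total value of 9.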
I have a problem:
Thread::tdlen: Objects of unequal length in {Null} {} cannot be combined. >>
It seems to occur in the While test, which makes no sense at all since I am only comparing numbers...?
The program solves the 0-1 knapsack problem by dynamic programming, though I use loops, not recursion.
I have added some printouts, and I can only conclude that the problem is in the While loop, which doesn't make sense.
(* 0-1 Knapsack problem
item = {value, weight}
Constraint is maxweight. Objective is to max value.
Input on the form:
Matrix[{value,weight},
{value,weight},
...
]
*)
lookup[x_, y_, m_] := m[[x, y]];
generateTable[items_, maxweight_] := {
nbrofitems = Dimensions[items][[1]];
keep = values = Table[0, {j, 0, nbrofitems}, {i, 1, maxweight}];
For[j = 2, j <= nbrofitems + 1, j++,
itemweight = items[[j - 1, 2]];
itemvalue = items[[j - 1, 1]];
For[i = 1, i <= maxweight, i++,
{
x = lookup[j - 1, i, values];
diff = i - itemweight;
If[diff > 0, y = lookup[j - 1, diff, values], y = 0];
If[itemweight <= i ,
{If[x < itemvalue + y,
{values[[j, i]] = itemvalue + y; keep[[j, i]] = 1;},
{values[[j, i]] = x; keep[[j, i]] = 0;}]
},
y(*y or x?*)]
}
]
];
{values, keep}
}
pickItems[keep_, items_, maxweight_] :=
{
(*w=remaining weight in knapsack*)
(*i=current item*)
w = maxweight;
knapsack = {};
nbrofitems = Dimensions[items][[1]];
i = nbrofitems + 1;
x = 0;
While[i > 0 && x < 10,
{
Print["lopp round starting"];
x++;
Print["i"];
Print[i];
Print["w"];
Print[w];
Print["keep[i,w]"];
Print[keep[[i, w]]];
If[keep[[i, w]] == 1,
{Append[knapsack, i];
Print["tjolahej"];
w -= items[[i - 1, 2]];
i -= 1;
Print["tjolahopp"];
},
i -= 1;
];
Print[i];
Print["loop round done"];
}
knapsack;
]
}
Clear[keep, v, a, b, c]
maxweight = 5;
nbrofitems = 3;
a = {5, 3};
b = {3, 2};
c = {4, 1};
items = {a, b, c};
MatrixForm[items]
results = generateTable[items, 5];
keep = results[[1]][[2]];
Print["keep:"];
MatrixForm[keep]
Print["------"];
results2 = pickItems[keep, items, 5];
MatrixForm[results2]
This is not really an answer to the specific question being asked, but some hints on general situations when this error occurs. The short answer is that this is a sign of passing lists of unequal lengths to some Listable function, user-defined or built-in.
Many of Mathematica's built-in functions are Listable (they have the Listable attribute). This basically means that, given lists in place of some or all arguments, Mathematica automatically threads the function over them. What really happens is that Thread is called internally (or, at least, so it appears). This can be illustrated by
In[15]:=
ClearAll[f];
SetAttributes[f,Listable];
f[{1,2},{3,4,5}]
During evaluation of In[15]:= Thread::tdlen: Objects of unequal length in
f[{1,2},{3,4,5}] cannot be combined. >>
Out[17]= f[{1,2},{3,4,5}]
You can get the same behavior by using Thread explicitly:
In[19]:=
ClearAll[ff];
Thread[ff[{1,2},{3,4,5}]]
During evaluation of In[19]:= Thread::tdlen: Objects of unequal length in
ff[{1,2},{3,4,5}] cannot be combined. >>
Out[20]= ff[{1,2},{3,4,5}]
In case of Listable functions, this is a bit more hidden though. Some typical examples would include things like {1, 2} + {3, 4, 5} or {1, 2}^{3, 4, 5} etc. I discussed this issue in a bit more detail here.
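As a rough analogy only (this is not how Mathematica implements Listable), the threading behaviour can be sketched in Python. The helper name thread is hypothetical, and unlike Mathematica, which prints the message and returns the expression unevaluated, this sketch raises an exception.

```python
def thread(f, *lists):
    """Apply f element-wise across equally long lists, like a Listable function."""
    lengths = {len(lst) for lst in lists}
    if len(lengths) > 1:
        # Mirrors the situation in which Thread::tdlen is reported.
        raise ValueError(f"objects of unequal length {sorted(lengths)} cannot be combined")
    return [f(*args) for args in zip(*lists)]


print(thread(lambda a, b: a + b, [1, 2], [3, 4]))  # element-wise sum

try:
    # A one-element list combined with an empty list, as in {Null} {}.
    thread(lambda a, b: (a, b), [None], [])
except ValueError as err:
    print("error:", err)
```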
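Try this version. The key change is the semicolon after the closing brace of the While body: in your code that list was followed directly by knapsack, so Mathematica tried to multiply {Null} by the empty list {}, which is what triggers Thread::tdlen.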
pickItems[keep_, items_, maxweight_] := Module[{},
{(*w=remaining weight in knapsack*)(*i=current item*)w = maxweight;
knapsack = {};
nbrofitems = Dimensions[items][[1]];
i = nbrofitems + 1;
x = 0;
While[i > 0 && x < 10,
{
Print["lopp round starting"];
x++;
Print["i"];
Print[i];
Print["w"];
Print[w];
Print["keep[i,w]"];
Print[keep[[i, w]]];
If[keep[[i, w]] == 1,
{
Append[knapsack, i];
Print["tjolahej"];
w -= items[[i - 1, 2]];
i -= 1;
Print["tjolahopp"];
},
i -= 1;
];
Print[i];
Print["loop round done"]
};
knapsack
]
}
]
There are no errors now, but I don't really know what it does :)
I wrote the Levenshtein algorithm in C++.
If I input:
string s: democrat
string t: republican
I get the matrix D filled in, and the number of operations (the Levenshtein distance) can be read from D[10][8] = 8.
Beyond the filled matrix, I want to construct the optimal solution. What should this solution look like? I have no idea.
Please just show me what it should look like for this example.
The question is
Given the matrix produced by the Levenshtein algorithm, how can one find "the optimal solution"?
i.e. how can we find the precise sequence of string operations: inserts, deletes and substitution [of a single letter], necessary to convert the 's string' into the 't string'?
First, it should be noted that in many cases there are SEVERAL optimal solutions. While the Levenshtein algorithm supplies the minimum number of operations (8 in democrat/republican example) there are many sequences (of 8 operations) which can produce this conversion.
By "decoding" the Levenshtein matrix, one can enumerate ALL such optimal sequences.
The general idea is that the optimal solutions all follow a "path" from the top left corner to the bottom right corner (or in the other direction), whereby the matrix cell values on this path either remain the same or increase by one (or decrease by one in the reverse direction), starting at 0 and ending at the optimal number of operations for the strings in question (0 through 8 in the democrat/republican case). The number increases when an operation is necessary; it stays the same when the letters at the corresponding positions in the strings are the same.
It is easy to write an algorithm that produces such a path (it is slightly more complicated to produce all possible paths), and to deduce the sequence of operations from that path.
This path-finding algorithm should start at the lower right corner and work its way backward. The reason for this approach is that we know for a fact that an optimal solution must end in this corner, and to end in this corner it must have come from one of the 3 cells immediately to its left, immediately above it, or immediately diagonal to it. By selecting, among these three cells, one which satisfies our "same value or decreasing by one" requirement, we effectively pick a cell on one of the optimal paths. By repeating the operation until we reach the upper left corner (or indeed until we reach a cell with a 0 value), we effectively backtrack our way along an optimal path.
Illustration with the democrat - republican example
It should also be noted that one can build the matrix in one of two ways: with 'democrat' horizontally or vertically. This doesn't change the computation of the Levenshtein distance nor does it change the list of operations needed; it only changes the way we interpret the matrix, for example moving horizontally on the "path" either means inserting a character [from the t string] or deleting a character [off the s string] depending whether 'string s' is "horizontal" or "vertical" in the matrix.
I'll use the following matrix. The conventions are therefore (only going in the left-to-right and/or top-to-bottom directions):
a horizontal move is an INSERTION of a letter from the 't string'
a vertical move is a DELETION of a letter from the 's string'
a diagonal move is either:
   a no-operation (both letters at respective positions are the same); the number doesn't change
   a SUBSTITUTION (letters at respective positions are distinct); the number increases by one.
Levenshtein matrix for s = "democrat", t = "republican"
      r  e  p  u  b  l  i  c  a  n
   0  1  2  3  4  5  6  7  8  9 10
d  1  1  2  3  4  5  6  7  8  9 10
e  2  2  1  2  3  4  5  6  7  8  9
m  3  3  2  2  3  4  5  6  7  8  9
o  4  4  3  3  3  4  5  6  7  8  9
c  5  5  4  4  4  4  5  6  6  7  8
r  6  5  5  5  5  5  5  6  7  7  8
a  7  6  6  6  6  6  6  6  7  7  8
t  8  7  7  7  7  7  7  7  7  8  8
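For readers who want to reproduce this matrix, here is a minimal Python sketch. It assumes the layout used here (s labels the rows, t labels the columns); the function name levenshtein_matrix is only illustrative.

```python
def levenshtein_matrix(s, t):
    """Return the (len(s)+1) x (len(t)+1) matrix; rows follow s, columns follow t."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete the first i letters of s
    for j in range(cols):
        d[0][j] = j  # insert the first j letters of t
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion (vertical move)
                          d[i][j - 1] + 1,         # insertion (horizontal move)
                          d[i - 1][j - 1] + cost)  # match or substitution (diagonal)
    return d


s, t = "democrat", "republican"
matrix = levenshtein_matrix(s, t)
# Print the matrix with the letters of t as column headers and of s as row labels.
print(" " * 5 + " ".join(f"{c:>2}" for c in t))
for label, row in zip(" " + s, matrix):
    print(label + " " + " ".join(f"{v:>2}" for v in row))
```

The bottom-right entry, matrix[-1][-1], is the Levenshtein distance (8 for this pair).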
The arbitrary approach I use to select one path among several possible optimal paths is loosely described below:
Starting at the bottom-rightmost cell, and working our way backward toward
the top left.
For each "backward" step, consider the 3 cells directly adjacent to the current
cell (in the left, top or left+top directions)
   if the value in the diagonal cell (going up+left) is smaller than or equal to the
      values found in the other two cells
      AND
      if this value is the same as or 1 less than the value of the current cell
   then "take the diagonal cell"
      if the value of the diagonal cell is one less than the current cell:
         add a SUBSTITUTION operation (from the letters corresponding to
         the _current_ cell)
      otherwise: do not add an operation; this was a no-operation.
   elseif the value in the cell to the left is smaller than or equal to the value of
      the cell above the current cell
      AND
      if this value is the same as or 1 less than the value of the current cell
   then "take the cell to the left", and
      add an INSERTION of the letter corresponding to the cell
   else
      take the cell above, and
      add a DELETION operation of the letter in 's string'
Following this informal pseudo-code, we get the following:
Start on the "n", "t" cell at bottom right.
Pick the [diagonal] "a", "a" cell as next destination since it is less than the other two (and satisfies the same or -1 condition).
   Note that the new cell is one less than the current cell,
   therefore step 8 is: substitute "t" with "n": democra N
Continue with the "a", "a" cell,
Pick the [diagonal] "c", "r" cell as next destination...
   Note that the new cell has the same value as the current cell ==> no operation needed.
Continue with the "c", "r" cell,
Pick the [diagonal] "i", "c" cell as next destination...
   Note that the new cell is one less than the current cell,
   therefore step 7 is: substitute "r" with "c": democ C an
Continue with the "i", "c" cell,
Pick the [diagonal] "l", "o" cell as next destination...
   Note that the new cell is one less than the current cell,
   therefore step 6 is: substitute "c" with "i": demo I can
Continue with the "l", "o" cell,
Pick the [diagonal] "b", "m" cell as next destination...
   Note that the new cell is one less than the current cell,
   therefore step 5 is: substitute "o" with "l": dem L ican
Continue with the "b", "m" cell,
Pick the [diagonal] "u", "e" cell as next destination...
   Note that the new cell is one less than the current cell,
   therefore step 4 is: substitute "m" with "b": de B lican
Continue with the "u", "e" cell,
   Note that the "diagonal" cell doesn't qualify, because the "left" cell is less than it.
Pick the [left] "p", "e" cell as next destination...
   therefore step 3 is: insert "u" after "e": de U blican
Continue with the "p", "e" cell,
   again the "diagonal" cell doesn't qualify.
Pick the [left] "e", "e" cell as next destination...
   therefore step 2 is: insert "p" after "e": de P ublican
Continue with the "e", "e" cell,
Pick the [diagonal] "r", "d" cell as next destination...
   Note that the new cell has the same value as the current cell ==> no operation needed.
Continue with the "r", "d" cell,
Pick the [diagonal] "start" cell as next destination...
   Note that the new cell is one less than the current cell,
   therefore step 1 is: substitute "d" with "r": R epublican
You've arrived at a cell whose value is 0: your work is done!
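The informal pseudo-code and the walkthrough above can be sketched in Python roughly as follows. This is my reading of the rules, not the author's code: the names edit_matrix and backtrack_ops are hypothetical, and the handling of the first row and first column (pure insertions or deletions) is an addition the pseudo-code does not spell out.

```python
def edit_matrix(s, t):
    """Levenshtein matrix with s along the rows and t along the columns."""
    d = [[i + j for j in range(len(t) + 1)] for i in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d


def backtrack_ops(s, t):
    """Follow one optimal path from the bottom-right cell back to the top-left."""
    d = edit_matrix(s, t)
    i, j = len(s), len(t)
    ops = []
    while i > 0 or j > 0:
        cur = d[i][j]
        if i == 0:  # first row: only insertions remain
            ops.append(("insert", t[j - 1]))
            j -= 1
        elif j == 0:  # first column: only deletions remain
            ops.append(("delete", s[i - 1]))
            i -= 1
        else:
            diag, left, up = d[i - 1][j - 1], d[i][j - 1], d[i - 1][j]
            if diag <= left and diag <= up and diag in (cur, cur - 1):
                if diag == cur - 1:
                    ops.append(("substitute", s[i - 1], t[j - 1]))
                i, j = i - 1, j - 1  # equal values mean a no-operation
            elif left <= up and left == cur - 1:
                ops.append(("insert", t[j - 1]))
                j -= 1
            else:
                ops.append(("delete", s[i - 1]))
                i -= 1
    ops.reverse()  # collected back to front, so reverse into forward order
    return ops


for step, op in enumerate(backtrack_ops("democrat", "republican"), start=1):
    print(step, op)
```

For democrat/republican this yields 8 operations in the same order as the walkthrough, starting with the substitution of "d" by "r" and ending with the substitution of "t" by "n".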
The backtracking algorithm to infer the moves from the matrix, implemented in Python:
def _backtrack_string(matrix, output_word):
    '''
    Iteratively backtrack DP matrix to get optimal set of moves
    Inputs: DP matrix (list:list:int),
            Output word (str)
    Output: Optimal path (reversed iterator over the recorded moves)
    '''
    # Start at the bottom-right cell of the DP matrix
    i = len(matrix) - 1
    j = len(matrix[0]) - 1
    optimal_path = []
    while i > 0 and j > 0:
        diagonal = matrix[i-1][j-1]
        vertical = matrix[i-1][j]
        horizontal = matrix[i][j-1]
        current = matrix[i][j]
        if diagonal <= vertical and diagonal <= horizontal and (diagonal <= current):
            i = i - 1
            j = j - 1
            if diagonal == current - 1:
                optimal_path.append("Replace " + str(j) + ", " + str(output_word[j]) )
            elif horizontal <= vertical and horizontal <= current:
                j = j - 1
                optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
            elif vertical <= horizontal and vertical <= current:
                i = i - 1
                optimal_path.append("Delete " + str(i))
        elif horizontal <= vertical and horizontal <= current:
            j = j - 1
            optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
        else:
            i = i - 1
            optimal_path.append("Delete " + str(i))
    return reversed(optimal_path)
The output I get when I run the algorithm with original word "OPERATING" and desired word "CONSTANTINE" is the following
Insert 0, C
Replace 2, N
Replace 3, S
Replace 4, T
Insert 6, N
Replace 10, E
"" C O N S T A N T I N E
"" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
<-- Insert 0, C
O [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
\ Replace 2, N
P [2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
\ Replace 3, S
E [3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 9]
\ Replace 4, T
R [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10] No move
\ <-- Insert 6, N
A [5, 5, 5, 5, 5, 5, 4, 5, 6, 7, 8, 9]
\ No move
T [6, 6, 6, 6, 6, 5, 5, 5, 5, 6, 7, 8]
\ No move
I [7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 6, 7]
\ No move
N [8, 8, 8, 7, 8, 7, 7, 6, 7, 6, 5, 6]
\ Replace 10, E
G [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6]
Note that I had to add extra conditions if the element in the diagonal is the same as the current element. There could be a deletion or insertion depending on values in the vertical (up) and horizontal (left) positions. We only get a "no operation" or "replace" operation when the following occurs
# assume bottom right of a 2x2 matrix is the reference position
# and has value v
# the following is the situation where we get a replace operation
[v + 1 , v<]
[ v< , v]
# the following is the situation where we get a "no operation"
[v , v<]
[v<, v ]
I think this is where the algorithm described in the first answer could break. There could be other arrangements of the 2x2 matrix above in which neither operation is correct. The example shown with input "OPERATING" and output "CONSTANTINE" breaks the algorithm unless this is taken into account.
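One way to sidestep this ambiguity, sketched below under my own assumptions, is to decide between "no operation" and "replace" by comparing the letters themselves, and to check the result by replaying the operations on the source word. The names edit_script and apply_script and the (operation, position, letter) tuple format are hypothetical, not taken from the code above.

```python
def edit_script(source, target):
    """Return one optimal list of (op, position, letter) edits, in forward order."""
    m, n = len(source), len(target)
    d = [[i + j for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    ops, i, j = [], m, n
    while i > 0 or j > 0:
        inside = i > 0 and j > 0
        if inside and source[i - 1] == target[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1  # letters match: no operation
        elif inside and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("replace", j - 1, target[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("insert", j - 1, target[j - 1]))
            j -= 1
        else:
            ops.append(("delete", j, source[i - 1]))
            i -= 1
    ops.reverse()
    return ops


def apply_script(source, ops):
    """Replay the edits left to right; positions refer to the partially edited word."""
    result = list(source)
    for op, pos, letter in ops:
        if op == "replace":
            result[pos] = letter
        elif op == "insert":
            result.insert(pos, letter)
        else:
            del result[pos]
    return "".join(result)


script = edit_script("OPERATING", "CONSTANTINE")
print(script)
print(apply_script("OPERATING", script))
```

Replaying the script is a cheap sanity check: if the backtracking took a wrong turn, the replayed word would not equal the target.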
It's been some time since I played with it, but it seems to me the matrix should look something like:
.  .  r  e  p  u  b  l  i  c  a  n
.  0  1  2  3  4  5  6  7  8  9 10
d  1  1  2  3  4  5  6  7  8  9 10
e  2  2  1  2  3  4  5  6  7  8  9
m  3  3  2  2  3  4  5  6  7  8  9
o  4  4  3  3  3  4  5  6  7  8  9
c  5  5  4  4  4  4  5  6  6  7  8
r  6  5  5  5  5  5  5  6  7  7  8
a  7  6  6  6  6  6  6  6  7  7  8
t  8  7  7  7  7  7  7  7  7  8  8
Don't take it for granted though.
Here is a VBA algorithm based on mjv's answer
(which is very well explained, but some cases were missing).
Sub TU_Levenshtein()
Call Levenshtein("democrat", "republican")
Call Levenshtein("ooo", "u")
Call Levenshtein("ceci est un test", "ceci n'est pas un test")
End Sub
Sub Levenshtein(ByVal string1 As String, ByVal string2 As String)
' Fill Matrix Levenshtein (-> array 'Distance')
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
LevenshteinDistance = distance(string1_length, string2_length) ' for information only
' Write the matrix to the worksheet (for visualisation only, not used in the calculation)
Cells.Clear
For i = 1 To UBound(distance, 1)
Cells(i + 2, 1).Value = Mid(string1, i, 1)
Next i
For i = 1 To UBound(distance, 2)
Cells(1, i + 2).Value = Mid(string2, i, 1)
Next i
For i = 0 To UBound(distance, 1)
For j = 0 To UBound(distance, 2)
Cells(i + 2, j + 2) = distance(i, j)
Next j
Next i
' One solution
current_posx = UBound(distance, 1)
current_posy = UBound(distance, 2)
Do
cc = distance(current_posx, current_posy)
Cells(current_posx + 2, current_posy + 2).Interior.Color = vbYellow ' visualisation again (same +2 offset as when the matrix was written)
' Manage border case
If current_posy - 1 < 0 Then
MsgBox ("deletion. " & Mid(string1, current_posx, 1))
current_posx = current_posx - 1
current_posy = current_posy
GoTo suivant
End If
If current_posx - 1 < 0 Then
MsgBox ("insertion. " & Mid(string2, current_posy, 1))
current_posx = current_posx
current_posy = current_posy - 1
GoTo suivant
End If
' Middle cases
cc_L = distance(current_posx, current_posy - 1)
cc_U = distance(current_posx - 1, current_posy)
cc_D = distance(current_posx - 1, current_posy - 1)
If (cc_D <= cc_L And cc_D <= cc_U) And (cc_D = cc - 1 Or cc_D = cc) Then
If (cc_D = cc - 1) Then
MsgBox "substitution. " & Mid(string1, current_posx, 1) & " by " & Mid(string2, current_posy, 1)
current_posx = current_posx - 1
current_posy = current_posy - 1
GoTo suivant
Else
MsgBox "no operation"
current_posx = current_posx - 1
current_posy = current_posy - 1
GoTo suivant
End If
ElseIf cc_L <= cc_D And cc_L = cc - 1 Then
MsgBox ("insertion. " & Mid(string2, current_posy, 1))
current_posx = current_posx
current_posy = current_posy - 1
GoTo suivant
Else
MsgBox ("deletion. " & Mid(string1, current_posx, 1))
current_posx = current_posx - 1
current_posy = current_posy
GoTo suivant
End If
suivant:
Loop While Not (current_posx = 0 And current_posy = 0)
End Sub
I've done some work with the Levenshtein distance algorithm's matrix recently. I needed to produce the operations which would transform one list into another. (This will work for strings too.)
Do the following (vows) tests show the sort of functionality that you're looking for?
, "lev - complex 2"
: { topic
: lev.diff([13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11], [9, 13, 6, 5, 1, 8, 2, 15, 12, 11])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 9, val: 7 },
{ op: 'delete', pos: 5, val: 9 },
{ op: 'insert', pos: 0, val: 9 },
]); }
}
, "lev - complex 3"
: { topic
: lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
{ op: 'insert', pos: 5, val: 9 },
{ op: 'insert', pos: 9, val: 7 }
]); }
}
, "lev - complex 4"
: { topic
: lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11, 16], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11, 17])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
{ op: 'insert', pos: 5, val: 9 },
{ op: 'insert', pos: 9, val: 7 },
{ op: 'replace', pos: 11, val: 17 }
]); }
}
Here is some Matlab code. Is it correct in your opinion? It seems to give the right results :)
clear all
s = char('democrat');
t = char('republican');
% Edit Matrix
m=length(s);
n=length(t);
mat=zeros(m+1,n+1);
for i=1:1:m
mat(i+1,1)=i;
end
for j=1:1:n
mat(1,j+1)=j;
end
for i=1:m
for j=1:n
if (s(i) == t(j))
mat(i+1,j+1)=mat(i,j);
else
mat(i+1,j+1)=1+min(min(mat(i+1,j),mat(i,j+1)),mat(i,j));
end
end
end
% Edit Sequence
s = char('democrat');
t = char('republican');
i = m+1;
j = n+1;
display([s ' --> ' t])
while(i ~= 1 && j ~= 1)
temp = min(min(mat(i-1,j-1), mat(i,j-1)), mat(i-1,j));
if(mat(i-1,j) == temp)
i = i - 1;
t = [t(1:j-1) s(i) t(j:end)];
disp(strcat(['iinsertion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
elseif(mat(i-1,j-1) == temp)
if(mat(i-1,j-1) == mat(i,j))
i = i - 1;
j = j - 1;
disp(strcat(['uunchanged: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
else
i = i - 1;
j = j - 1;
t(j) = s(i);
disp(strcat(['substition: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
end
elseif(mat(i,j-1) == temp)
j = j - 1;
t(j) = [];
disp(strcat(['dddeletion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
end
end
C# implementation of JackIsJack's answer with some changes:
Operations are output in 'forward' order (JackIsJack outputs in reverse order);
Last 'else' clause in original answer worked incorrectly (looks like copy-paste error).
Console application code:
class Program
{
static void Main(string[] args)
{
Levenshtein("1", "1234567890");
Levenshtein( "1234567890", "1");
Levenshtein("kitten", "mittens");
Levenshtein("mittens", "kitten");
Levenshtein("kitten", "sitting");
Levenshtein("sitting", "kitten");
Levenshtein("1234567890", "12356790");
Levenshtein("12356790", "1234567890");
Levenshtein("ceci est un test", "ceci n'est pas un test");
Levenshtein("ceci n'est pas un test", "ceci est un test");
}
static void Levenshtein(string string1, string string2)
{
Console.WriteLine("Levenshtein '" + string1 + "' => '" + string2 + "'");
var string1_length = string1.Length;
var string2_length = string2.Length;
int[,] distance = new int[string1_length + 1, string2_length + 1];
for (int i = 0; i <= string1_length; i++)
{
distance[i, 0] = i;
}
for (int j = 0; j <= string2_length; j++)
{
distance[0, j] = j;
}
for (int i = 1; i <= string1_length; i++)
{
for (int j = 1; j <= string2_length; j++)
{
if (string1[i - 1] == string2[j - 1])
{
distance[i, j] = distance[i - 1, j - 1];
}
else
{
distance[i, j] = Math.Min(distance[i - 1, j] + 1, Math.Min(
distance[i, j - 1] + 1,
distance[i - 1, j - 1] + 1));
}
}
}
var LevenshteinDistance = distance[string1_length, string2_length];// for information only
Console.WriteLine($"Levenshtein distance: {LevenshteinDistance}");
// List of operations
var current_posx = string1_length;
var current_posy = string2_length;
var stack = new Stack<string>(); // for outputting messages in forward direction
while (current_posx != 0 || current_posy != 0)
{
var cc = distance[current_posx, current_posy];
// edge cases
if (current_posy - 1 < 0)
{
stack.Push("Delete '" + string1[current_posx - 1] + "'");
current_posx--;
continue;
}
if (current_posx - 1 < 0)
{
stack.Push("Insert '" + string2[current_posy - 1] + "'");
current_posy--;
continue;
}
// Middle cases
var cc_L = distance[current_posx, current_posy - 1];
var cc_U = distance[current_posx - 1, current_posy];
var cc_D = distance[current_posx - 1, current_posy - 1];
if ((cc_D <= cc_L && cc_D <= cc_U) && (cc_D == cc - 1 || cc_D == cc))
{
if (cc_D == cc - 1)
{
stack.Push("Substitute '" + string1[current_posx - 1] + "' by '" + string2[current_posy - 1] + "'");
current_posx--;
current_posy--;
}
else
{
stack.Push("Keep '" + string1[current_posx - 1] + "'");
current_posx--;
current_posy--;
}
}
else if (cc_L <= cc_D && cc_L == cc - 1)
{
stack.Push("Insert '" + string2[current_posy - 1] + "'");
current_posy--;
}
else
{
stack.Push("Delete '" + string1[current_posx - 1]+"'");
current_posx--;
}
}
while(stack.Count > 0)
{
Console.WriteLine(stack.Pop());
}
}
}
Here is code to get all the edit paths from the edit matrix, source and target. Please leave a comment if you find any bugs. Thanks a lot!
import copy
from typing import List, Union


def edit_distance(source: Union[List[str], str],
                  target: Union[List[str], str],
                  return_distance: bool = False):
    """get the edit matrix
    """
    edit_matrix = [[i + j for j in range(len(target) + 1)] for i in range(len(source) + 1)]
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                d = 0
            else:
                d = 1
            edit_matrix[i][j] = min(edit_matrix[i - 1][j] + 1,
                                    edit_matrix[i][j - 1] + 1,
                                    edit_matrix[i - 1][j - 1] + d)
    if return_distance:
        return edit_matrix[len(source)][len(target)]
    return edit_matrix


def get_edit_paths(matrix: List[List[int]],
                   source: Union[List[str], str],
                   target: Union[List[str], str]):
    """get all the valid edit paths
    """
    all_paths = []

    def _edit_path(i, j, optimal_path):
        if i > 0 and j > 0:
            diagonal = matrix[i - 1][j - 1]  # the diagonal value
            vertical = matrix[i - 1][j]  # the above value
            horizontal = matrix[i][j - 1]  # the left value
            current = matrix[i][j]  # current value
            # whether the source and target token are the same
            flag = False
            # compute the minimal value of the diagonal, vertical and horizontal
            minimal = min(diagonal, min(vertical, horizontal))
            # if the diagonal is the minimal
            if diagonal == minimal:
                new_i = i - 1
                new_j = j - 1
                path_ = copy.deepcopy(optimal_path)
                # if the diagonal value equals current - 1
                # it means a `replace` operation
                if diagonal == current - 1:
                    path_.append(f"Replace | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
                # if the diagonal value equals the current value
                # and the corresponding tokens of source and target are equal
                # it means this is the current best path
                elif source[new_i] == target[new_j]:
                    flag = True
                    # path_.append(f"Keep | {new_i}")
                    _edit_path(new_i, new_j, path_)
            # if the position doesn't have a best path
            # we need to consider other situations
            if not flag:
                # if the vertical value equals the minimal value
                # it means delete the corresponding source token
                if vertical == minimal:
                    new_i = i - 1
                    new_j = j
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Delete | {new_i}")
                    _edit_path(new_i, new_j, path_)
                # if the horizontal value equals the minimal value
                # it means insert the corresponding target token into source
                if horizontal == minimal:
                    new_i = i
                    new_j = j - 1
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Insert | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
        else:
            all_paths.append(list(reversed(optimal_path)))

    # get the rows and columns of the edit matrix
    row_len = len(matrix) - 1
    col_len = len(matrix[0]) - 1
    _edit_path(row_len, col_len, optimal_path=[])
    return all_paths


if __name__ == "__main__":
    source = "BBDEF"
    target = "ABCDF"
    matrix = edit_distance(source, target)
    print("print paths")
    paths = get_edit_paths(matrix, source=list(source), target=list(target))
    for path in paths:
        print(path)
Can you do something like Python's yield statement in Mathematica, in order to create generators? See e.g. here for the concept.
Update
Here's an example of what I mean, to iterate over all permutations, using only O(n) space: (algorithm as in Sedgewick's Algorithms book):
gen[f_, n_] := Module[{id = -1, val = Table[Null, {n}], visit},
visit[k_] := Module[{t},
id++; If[k != 0, val[[k]] = id];
If[id == n, f[val]];
Do[If[val[[t]] == Null, visit[t]], {t, 1, n}];
id--; val[[k]] = Null;];
visit[0];
]
Then call it like:
gen[Print, 3], which prints all 6 permutations of length 3.
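For reference, this is roughly what the Python side of the comparison looks like: a sketch of the same visiting scheme written as a generator with yield. The name permutations_gen is my own, and 0 rather than Null marks a free slot. Only the single working list val is kept, so the extra space is O(n).

```python
def permutations_gen(n):
    """Yield every permutation of 1..n, holding only one working list in memory."""
    val = [0] * n  # 0 marks a free slot

    def visit(depth):
        if depth == n:
            yield tuple(val)  # hand one permutation to the caller, then resume
            return
        for slot in range(n):
            if val[slot] == 0:
                val[slot] = depth + 1
                yield from visit(depth + 1)
                val[slot] = 0  # free the slot again before trying the next one

    yield from visit(0)


for perm in permutations_gen(3):
    print(perm)
```

The caller pulls permutations one at a time, which is the behaviour the callback f emulates in gen.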
As I have previously stated, using Compile will give faster code. Using an algorithm from fxtbook, the following code generates the next permutation in lexicographic ordering:
PermutationIterator[f_, n_Integer?Positive, nextFunc_] :=
Module[{this = Range[n]},
While[this =!= {-1}, f[this]; this = nextFunc[n, this]];]
The following code assumes we run version 8:
ClearAll[cfNextPartition];
cfNextPartition[target : "MVM" | "C"] :=
cfNextPartition[target] =
Compile[{{n, _Integer}, {this, _Integer, 1}},
Module[{i = n, j = n, ni, next = this, r, s},
While[Part[next, --i] > Part[next, i + 1],
If[i == 1, i = 0; Break[]]];
If[i == 0, {-1}, ni = Part[next, i];
While[ni > Part[next, j], --j];
next[[i]] = Part[next, j]; next[[j]] = ni;
r = n; s = i + 1;
While[r > s, ni = Part[next, r]; next[[r]] = Part[next, s];
next[[s]] = ni; --r; ++s];
next
]], RuntimeOptions -> "Speed", CompilationTarget -> target
];
Then
In[75]:= Reap[PermutationIterator[Sow, 4, cfNextPartition["C"]]][[2,
1]] === Permutations[Range[4]]
Out[75]= True
This is clearly better in performance than the original gen function.
In[83]:= gen[dummy, 9] // Timing
Out[83]= {26.067, Null}
In[84]:= PermutationIterator[dummy, 9, cfNextPartition["C"]] // Timing
Out[84]= {1.03, Null}
Using Mathematica's virtual machine is not much slower:
In[85]:= PermutationIterator[dummy, 9,
cfNextPartition["MVM"]] // Timing
Out[85]= {1.154, Null}
Of course, this is nowhere near a C implementation, yet it provides a substantial speed-up over pure top-level code.
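For readers less familiar with the next-permutation step itself, here is a plain Python sketch of the standard lexicographic algorithm. It is not a translation of the compiled code line by line; the names next_permutation and iterate_permutations are illustrative, and None plays the role of the {-1} end marker.

```python
def next_permutation(seq):
    """Return the next permutation in lexicographic order, or None after the last one."""
    nxt = list(seq)
    # Find the rightmost position i whose element is smaller than its successor.
    i = len(nxt) - 2
    while i >= 0 and nxt[i] >= nxt[i + 1]:
        i -= 1
    if i < 0:
        return None  # the sequence is fully descending: no successor
    # Find the rightmost element larger than nxt[i] and swap the two.
    j = len(nxt) - 1
    while nxt[j] <= nxt[i]:
        j -= 1
    nxt[i], nxt[j] = nxt[j], nxt[i]
    # Reverse the tail so that it becomes the smallest possible suffix.
    nxt[i + 1:] = reversed(nxt[i + 1:])
    return nxt


def iterate_permutations(n):
    """Yield the permutations of 1..n one at a time, in lexicographic order."""
    current = list(range(1, n + 1))
    while current is not None:
        yield current
        current = next_permutation(current)


for perm in iterate_permutations(3):
    print(perm)
```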
You probably mean the question to be more general, but the example of iterating over permutations given on the page you link to happens to be built into Mathematica:
Scan[Print, Permutations[{1, 2, 3}]]
The Print there can be replaced with any function.