3D Complex Matrix Iteration and Data manipulation - matrix

I have achieved the desired result, but I'm trying to find a more elegant solution. Right now, it's a little hard coded and that's not good practice.
NOTE: This is an old Robot Language that resembles PASCAL.
Problem: I have a 3D Matrix of STRUCTS. 4 X 4 X 9, but I'm just focusing on the first 4 X 4. The STRUCT has data members I need to manipulate.
GlobalTub[i, j, k].calcPos ----This member is a type of position with 6 REALS (XYZWPR)
Initializing through the matrix is no problem. Just a simple nested FOR loop.
--Matrix Size
--numOfTubs = (X_CNT * Y_CNT * Z_CNT)
fCnt = 0
--Init Matrix
FOR i = 1 TO X_CNT DO
FOR j = 1 TO Y_CNT DO
FOR k = 1 TO Z_CNT DO
InitPos(GlobalTub[i, j, k].foundPos, 0, 0, 1, 0, 1, 1)
InitPos(GlobalTub[i, j, k].nextPos, 0, 0, 1, 0, 1, 1)
InitPos(GlobalTub[i, j, k].calcPos, 0, 0, 1, 0, 1, 1)
GlobalTub[i, j, k].inPlace = FALSE
--Assign Tub Number Column Major
fCnt = fCnt + 1
GlobalTub[i, j, k].tubNum = fCnt
ENDFOR
ENDFOR
ENDFOR
Now I have to "palletize" this matrix of STRUCTS. Right now I'm just using a hard coded flow for iterating 4 STRUCTS in X, shift over in Y, and then continue to the next 4.
--Used for Testing
--1 to 4
FOR i = 1 to 4 DO
TubPos[i] = tempXYZ
tempXYZ.X = tempXYZ.X + (xPitch + xTolerance)
ENDFOR
tempXYZ = TubPos[1]
tempXYZ.Y = tempXYZ.Y + (yPitch + yTolerance)
-- 5 to 8
FOR i = 1 to 4 DO
TubPos[i + 4] = tempXYZ
tempXYZ.X = tempXYZ.X + (xPitch + xTolerance)
ENDFOR
How could one achieve this with a nested FOR loop?
[Image: Pallet of Parts]

I answered my own question....just hammered it out.
--Init Loop Counter
fCnt = 1
FOR j = 1 TO Y_CNT DO
--Place 4 positions in X
FOR i = 1 to X_CNT DO
TubPos[fCnt] = tempXYZ
tempXYZ.X = tempXYZ.X + (xPitch + xTolerance)
fCnt = fCnt + 1
ENDFOR
--Shift Y position for next 4 Rows
tempXYZ = TubPos[fCnt-1]
tempXYZ.X = tempXYZ.X - ((xPitch + xTolerance) * (X_CNT - 1))
tempXYZ.Y = tempXYZ.Y + (yPitch + yTolerance)
ENDFOR
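For comparison, here is a minimal sketch of the same row-by-row layout written in Python rather than the robot language, since it is easier to run and check. It computes each position directly from the loop indices instead of accumulating and shifting back. The function name `pallet_positions` and its parameters are hypothetical; `x_step` stands for `xPitch + xTolerance` and `y_step` for `yPitch + yTolerance`.

```python
def pallet_positions(origin, x_cnt, y_cnt, x_step, y_step):
    """Return (x, y) positions row by row: x varies fastest, then y."""
    x0, y0 = origin
    positions = []
    for j in range(y_cnt):          # one row per Y step
        for i in range(x_cnt):      # x_cnt positions along X in each row
            positions.append((x0 + i * x_step, y0 + j * y_step))
    return positions


if __name__ == "__main__":
    for number, pos in enumerate(pallet_positions((0, 0), 4, 2, 10, 5), start=1):
        print(number, pos)
```

Because every position depends only on `i` and `j`, there is no need to undo the X offset at the end of each row.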
Here is my output:
[Image: Positions]


Min Abs Sum task from codility

There is already a topic about this task, but I'd like to ask about my specific approach.
The task is:
Let A be a non-empty array consisting of N integers.
The abs sum of two for a pair of indices (P, Q) is the absolute value
|A[P] + A[Q]|, for 0 ≤ P ≤ Q < N.
For example, the following array A:
A[0] = 1, A[1] = 4, A[2] = -3
has pairs of indices (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2).
The abs sum of two for the pair (0, 0) is |A[0] + A[0]| = |1 + 1| = 2.
The abs sum of two for the pair (0, 1) is |A[0] + A[1]| = |1 + 4| = 5.
The abs sum of two for the pair (0, 2) is |A[0] + A[2]| = |1 + (−3)| = 2.
The abs sum of two for the pair (1, 1) is |A[1] + A[1]| = |4 + 4| = 8.
The abs sum of two for the pair (1, 2) is |A[1] + A[2]| = |4 + (−3)| = 1.
The abs sum of two for the pair (2, 2) is |A[2] + A[2]| = |(−3) + (−3)| = 6.
Write a function:
def solution(A)
that, given a non-empty array A consisting of N integers, returns the
minimal abs sum of two for any pair of indices in this array.
For example, given the following array A:
A[0] = 1, A[1] = 4, A[2] = -3
the function should return 1, as explained above.
Given array A:
A[0] = -8, A[1] = 4, A[2] = 5, A[3] = -10, A[4] = 3
the function should return |(−8) + 5| = 3.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000]; each element of array A
is an integer within the range [−1,000,000,000..1,000,000,000].
The official solution is O(N*M^2), but I think it could be solved in O(N).
My approach is to first get rid of duplicates and sort the array. Then we check both ends and compare the abs sum, moving the ends by one towards each other. We try to move the left end, the right one, or both. If this doesn't improve the result, our sum is the lowest. My code is:
def solution(A):
    A = list(set(A))
    n = len(A)
    A.sort()
    beg = 0
    end = n - 1
    min_sum = abs(A[beg] + A[end])
    while True:
        min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
        min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
        min_both = abs(A[beg+1] + A[end-1]) if beg+1 < n and end-1 >= 0 else float('inf')
        min_all = min([min_left, min_right, min_both])
        if min_sum <= min_all:
            return min_sum
        if min_left == min_all:
            beg += 1
            min_sum = min_left
        elif min_right == min_all:
            end -= 1
            min_sum = min_right
        else:
            beg += 1
            end -= 1
            min_sum = min_both
It passes almost all of the tests, but not all. Is there a bug in my code, or is the approach wrong?
EDIT:
After the aka.nice answer I was able to fix the code. It scores 100% now.
def solution(A):
    A = list(set(A))
    n = len(A)
    A.sort()
    beg = 0
    end = n - 1
    min_sum = abs(A[beg] + A[end])
    while beg <= end:
        min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
        min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
        min_all = min(min_left, min_right)
        if min_all < min_sum:
            min_sum = min_all
        if min_left <= min_all:
            beg += 1
        else:
            end -= 1
    return min_sum
Just take this example for array A
-11 -5 -2 5 6 8 12
and execute your algorithm step by step, you get a premature return:
beg=0
end=6
min_sum=1
min_left=7
min_right=3
min_both=3
min_all=3
return min_sum
though there is a better solution abs(5-5)=0.
Hint: you should check the sign of A[beg] and A[end] to decide whether to continue or exit the loop. What to do if both >= 0, if both <= 0, else ?
Note that A.sort() has a non-negligible cost, likely O(N*log(N)), which will dominate the cost of the solution you present.
By the way, what is M in the official cost O(N*M^2)?
And the link you provide is another problem (sum all the elements of A or their opposite).
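The sign-based hint above can be sketched as a two-pointer scan over the sorted array. This is one possible reading of the hint, not the asker's final code: when the current sum is negative only a larger left value can bring it closer to zero, and otherwise only a smaller right value can. The function name `min_abs_sum_of_two` is hypothetical.

```python
def min_abs_sum_of_two(values):
    """Smallest |A[P] + A[Q]| over all index pairs with P <= Q."""
    a = sorted(values)              # O(N log N); this dominates the running time
    left, right = 0, len(a) - 1
    best = abs(a[left] + a[right])
    while left <= right:            # left == right covers the pair (P, P)
        total = a[left] + a[right]
        best = min(best, abs(total))
        if total < 0:
            left += 1               # sum too small: move to a larger left value
        else:
            right -= 1              # sum too large (or zero): shrink the right value
    return best


if __name__ == "__main__":
    print(min_abs_sum_of_two([-11, -5, -2, 5, 6, 8, 12]))
```

On the counterexample from this answer the scan reaches the pair (-5, 5) and reports 0.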

Which solution is better in terms of space/time complexity?

I have two lists of integers, both already sorted. I want to find the elements (one from each list) that add up to a given number.
My first idea is to iterate over the first list and use binary search to look for the number needed to sum to the given number. I know this will take O(n log n) time.
The other is to store one of the lists in a hash table/map (I don't really know the difference) and iterate over the other list, looking up the needed value. Does this take O(n) time and O(n) memory?
Overall, which would be better?
Your comparison is right, but the two approaches have different trade-offs. Hashing is not a good choice if you have memory constraints, but if you have plenty of memory you can afford it.
You will often see the notion of a space-time tradeoff in computer science: you gain on one side by giving something up on the other. Hashing runs in O(n) expected time with O(n) space, whereas binary search takes O(n log n) time but only O(1) extra space.
Long story short, the scenario decides which one to select. I have shown just one aspect; there can be many. Know the constraints and tradeoffs of each and you will be able to decide.
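To make the trade-off concrete, here is a small sketch of both approaches from the question. The function names are hypothetical; each returns the first matching pair or `None`. The binary-search version uses the standard `bisect` module and no extra storage, while the hash version builds a `set` from the second list.

```python
from bisect import bisect_left


def find_pair_binary_search(a, b, target):
    """O(n log m) time, O(1) extra space; b must be sorted ascending."""
    for x in a:
        need = target - x
        k = bisect_left(b, need)            # leftmost position where need could sit
        if k < len(b) and b[k] == need:
            return (x, need)
    return None


def find_pair_hash(a, b, target):
    """O(n + m) expected time, O(m) extra space; no ordering required."""
    seen = set(b)
    for x in a:
        if target - x in seen:
            return (x, target - x)
    return None


if __name__ == "__main__":
    a = [1, 2, 2, 6, 8, 10, 11]
    b = [1, 1, 3, 4, 8, 9]
    print(find_pair_binary_search(a, b, 10), find_pair_hash(a, b, 10))
```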
A better solution (time complexity: O(n), space complexity: O(1)):
Suppose there are two arrays, a and b.
Now, WLOG, suppose a is sorted in ascending order and b in descending order (even if that is not the case, we can traverse them accordingly).
index1 = 0; index2 = 0; // 0-based indexing
while (index1 <= N1-1 && index2 <= N2-1)
{
    if ((a[index1] + b[index2]) == x)
        break; // success: a[index1] and b[index2] sum to x
    else if ((a[index1] + b[index2]) > x)
        index2++;
    else
        index1++;
}
// if the loop ended without the break: failure, no such pair.
Sort list A in ascending order, and list B in descending order. Set a = 1 and b = 1.
1. If A[a] + B[b] = T, record the pair, increment a, and repeat from 1.
2. Otherwise, if A[a] + B[b] < T, increment a, and repeat from 1.
3. Otherwise, A[a] + B[b] > T: increment b, and repeat from 1.
Naturally, if a or b exceeds the size of A or B, respectively, terminate.
Example:
A = 1, 2, 2, 6, 8, 10, 11
B = 9, 8, 4, 3, 1, 1
T = 10
a = 1, b = 1
A[a] + B[b] = A[1] + B[1] = 10; record; a = a + 1 = 2; repeat.
A[a] + B[b] = A[2] + B[1] = 11; b = b + 1 = 2; repeat.
A[a] + B[b] = A[2] + B[2] = 10; record; a = a + 1 = 3; repeat.
A[a] + B[b] = A[3] + B[2] = 10; record; a = a + 1 = 4; repeat.
A[a] + B[b] = A[4] + B[2] = 14; b = b + 1 = 3; repeat.
A[a] + B[b] = A[4] + B[3] = 10; record; a = a + 1 = 5; repeat.
A[a] + B[b] = A[5] + B[3] = 12; b = b + 1 = 4; repeat.
A[a] + B[b] = A[5] + B[4] = 11; b = b + 1 = 5; repeat.
A[a] + B[b] = A[5] + B[5] = 9; a = a + 1 = 6; repeat.
A[a] + B[b] = A[6] + B[5] = 11; b = b + 1 = 6; repeat.
A[a] + B[b] = A[6] + B[6] = 11; b = b + 1 = 7; repeat.
Terminate.
You can do this without additional space if instead of having B sorted in descending order, you set b = |B| and decrement it instead of incrementing it, effectively reading it backwards.
The above procedure misses out on some duplicate answers where B has a string of duplicate values, for instance:
A = 2, 2, 2
B = 8, 8, 8
The algorithm as described above will yield three pairs, but you might want nine. This can be fixed by detecting this case, keeping separate counters ca and cb for the lengths of the runs of A[a] and B[b] you have seen, and adding ca * cb - ca copies of the last pair you added to the bag. In this example:
A = 2, 2, 2
B = 8, 8, 8
a = 1, b = 1
ca = 1, cb = 1
A[a] + B[b] = 10; record pair, a = a + 1 = 2, ca = ca + 1 = 2, repeat.
A[a] + B[b] = 10; record pair, a = a + 1 = 3, ca = ca + 1 = 3, repeat.
A[a] + B[b] = 10; record pair, a = a + 1 = 4;
a exceeds bounds, value of A[a] changed;
increment b to count run of B's;
b = b + 1 = 2, cb = cb + 1 = 2
b = b + 1 = 3, cb = cb + 1 = 3
b = b + 1 = 4;
b exceeds bounds, value of B[b] changed;
add ca * cb - ca = 3 * 3 - 3 = 6 copies of pair (2, 8).
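The run-counting idea can be sketched as follows. This is a variation, not the exact bookkeeping above: instead of adding `ca * cb - ca` copies after the fact, it measures both runs first and then records `ca * cb` copies in one go. It also reads B backwards, as suggested earlier, so both lists stay in ascending order. The function name `pairs_with_sum` is hypothetical.

```python
def pairs_with_sum(a, b, target):
    """Every (x, y) with x from a, y from b and x + y == target.

    Both lists must be sorted ascending; b is read from its end.
    """
    pairs = []
    i, j = 0, len(b) - 1
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        if total < target:
            i += 1
        elif total > target:
            j -= 1
        else:
            x, y = a[i], b[j]
            run_a = 0
            while i < len(a) and a[i] == x:     # length of the run of x in a
                i += 1
                run_a += 1
            run_b = 0
            while j >= 0 and b[j] == y:         # length of the run of y in b
                j -= 1
                run_b += 1
            pairs.extend([(x, y)] * (run_a * run_b))
    return pairs


if __name__ == "__main__":
    print(len(pairs_with_sum([2, 2, 2], [8, 8, 8], 10)))
```

Apart from the size of the output, the scan still touches each element of A and B once.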

Random Unique Pairs

I have a list of 100 items. I'd like to randomly pair these items with each other. These pairs must be unique, so there are 4950 possibilities (100 choose 2) total.
Of all 4950 pairs, I'd like to have 1000 pairs randomly selected. But the key is, I'd like each item (of the 100 items) to appear the same number of times overall (here, 20 times).
I tried to implement this with code a couple of times. And it worked fine when I tried with a lower amount of pairs chosen, but each time I try with the full 1000 pairs, I get stuck in a loop.
Does anyone have an idea for an approach? And what if I change the number of pairs I wish to select (e.g., 1500 rather than 1000 random pairs)?
My attempt (written in VBA):
Dim City1(4951) As Integer
Dim City2(4951) As Integer
Dim CityCounter(101) As Integer
Dim PairCounter(4951) As Integer
Dim i As Integer
Dim j As Integer
Dim k As Integer
i = 1
While i < 101
CityCounter(i) = 0
i = i + 1
Wend
i = 1
While i < 4951
PairCounter(i) = 0
i = i + 1
Wend
i = 1
j = 1
While j < 101
k = j + 1
While k < 101
City1(i) = j
City2(i) = k
k = k + 1
i = i + 1
Wend
j = j + 1
Wend
Dim temp As Integer
i = 1
While i < 1001
temp = Random(1,4950)
While ((PairCounter(temp) = 1) Or (CityCounter( (City1(temp)) ) = 20) Or (CityCounter( (City2(temp)) ) = 20))
temp = Random(1,4950)
Wend
PairCounter(temp) = 1
CityCounter( (City1(temp)) ) = (CityCounter( (City1(temp)) ) + 1)
CityCounter( (City2(temp)) ) = (CityCounter( (City2(temp)) ) + 1)
i = i + 1
Wend
Take a list, scramble it, and mark every two elements off as a pair. Add these pairs to a list of pairs. Ensure that list of pairs is sorted.
Scramble the list of pairs, and add each pair to a "staged" pair list. Check if it's in the list of pairs. If it's in the list of pairs, scramble and start over. If you get the entire list without any duplicates, add the staged pair list to the pair list and start this paragraph over.
Since this involves a nondeterministic step at the end I'm not sure how slow it will be, but it should work.
This is an old thread, but I was looking for something similar, and finally did it myself.
The algorithm is not 100% random (after getting a bit "tired" of unsuccessful random trials, it starts a systematic screening of the table :) - anyway, for me it is "random enough"), but it works reasonably fast and returns the required table (unfortunately not always, but...) usually every second or third run (check A1 to see whether it holds your required number of pairs for each item).
Here is VBA code to be run in Excel environment.
Output is directed to current sheet starting from A1 cell.
Option Explicit
Public generalmax%, oldgeneralmax%, generalmin%, alloweddiff%, i&
Public outtable() As Integer
Const maxpair = 100, upperlimit = 20
Sub generate_random_unique_pairs()
'by Kaper 2015.02 for stackoverflow.com/questions/14884975
Dim x%, y%, counter%
Randomize
ReDim outtable(1 To maxpair + 1, 1 To maxpair + 1)
Range("A1").Resize(maxpair + 1, maxpair + 1).ClearContents
alloweddiff = 1
Do
i = i + 1
If counter > (0.5 * upperlimit) Then 'try some systematic approach
For x = 1 To maxpair - 1 ' top-left or:' To 1 Step -1 ' bottom-right
For y = x + 1 To maxpair
Call test_and_fill(x, y, counter)
Next y
Next x
If counter > 0 Then
alloweddiff = alloweddiff + 1
counter = 0
End If
End If
' mostly used - random mode
x = WorksheetFunction.RandBetween(1, maxpair - 1)
y = WorksheetFunction.RandBetween(x + 1, maxpair)
counter = counter + 1
Call test_and_fill(x, y, counter)
If counter = 0 Then alloweddiff = WorksheetFunction.Max(alloweddiff, 1)
If i > (2.5 * upperlimit) Then Exit Do
Loop Until generalmin = upperlimit
Range("A1").Resize(maxpair + 1, maxpair + 1).Value = outtable
Range("A1").Value = generalmin
Application.StatusBar = ""
End Sub
Sub test_and_fill(x%, y%, ByRef counter%)
Dim temprowx%, temprowy%, tempcolx%, tempcoly%, tempmax%, j%
tempcolx = outtable(1, x + 1)
tempcoly = outtable(1, y + 1)
temprowx = outtable(x + 1, 1)
temprowy = outtable(y + 1, 1)
tempmax = 1+ WorksheetFunction.Max(tempcolx, tempcoly, temprowx, temprowy)
If tempmax <= (generalmin + alloweddiff) And tempmax <= upperlimit And outtable(y + 1, x + 1) = 0 Then
counter = 0
outtable(y + 1, x + 1) = 1
outtable(x + 1, y + 1) = 1
outtable(x + 1, 1) = 1 + outtable(x + 1, 1)
outtable(y + 1, 1) = 1 + outtable(y + 1, 1)
outtable(1, x + 1) = 1 + outtable(1, x + 1)
outtable(1, y + 1) = 1 + outtable(1, y + 1)
generalmax = WorksheetFunction.Max(generalmax, outtable(x + 1, 1), outtable(y + 1, 1), outtable(1, x + 1), outtable(1, y + 1))
generalmin = outtable(x + 1, 1)
For j = 1 To maxpair
If outtable(j + 1, 1) < generalmin Then generalmin = outtable(j + 1, 1)
If outtable(1, j + 1) < generalmin Then generalmin = outtable(1, j + 1)
Next j
If generalmax > oldgeneralmax Then
oldgeneralmax = generalmax
Application.StatusBar = "Working on pairs " & generalmax & "Total progress (non-linear): " & Format(1# * generalmax / upperlimit, "0%")
End If
alloweddiff = alloweddiff - 1
i = 0
End If
End Sub
Keep an array appeared[] that tracks how many times each item has already appeared in the answer. Let's say each element has to appear k times. Iterate over the array, and while the current element's appeared value is less than k, choose a random partner for it from the elements that have also appeared fewer than k times. Add that pair to the answer and increase the appearance count for both.
create a 2-dimensional 100*100 matrix of booleans, all False
of these 10K booleans, set 1K of them to true, with the following constraints:
the diagonal should stay empty
no row or column should have more than 20 true values
at the end, every row and column should have 20 True values.
Now, there is the X=Y diagonal symmetry. Just add the following constraints:
the triangle at one side of the diagonal should stay empty
in the above constraints, the restrictions for rows&columns should be combined/added
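The constraints above describe a graph on 100 items in which every item has exactly 20 neighbours. One way to avoid getting stuck is a different technique from the answers in this thread: start from a layout that is valid by construction (each item paired with its next 10 neighbours on a randomly shuffled circle), then randomise it with degree-preserving swaps. The sketch below assumes an even number of appearances per item; the function name and parameters are hypothetical, and the result is not claimed to be uniformly random.

```python
import random


def random_regular_pairs(n=100, times=20, swaps=20000, seed=None):
    """Return n*times/2 unique pairs; every item 1..n appears exactly `times` times."""
    if times % 2 or times >= n:
        raise ValueError("this sketch needs an even `times` smaller than n")
    rng = random.Random(seed)
    labels = list(range(1, n + 1))
    rng.shuffle(labels)
    # Valid starting point: pair each circle position with the next times/2 positions.
    edges = set()
    for i in range(n):
        for step in range(1, times // 2 + 1):
            edges.add(frozenset((labels[i], labels[(i + step) % n])))
    edge_list = list(edges)
    # Randomise: replace (a,b),(c,d) by (a,d),(c,b) when that keeps pairs unique.
    for _ in range(swaps):
        p, q = rng.sample(range(len(edge_list)), 2)
        a, b = tuple(edge_list[p])
        c, d = tuple(edge_list[q])
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in edges or new2 in edges:
            continue                    # would create a self-pair or a duplicate
        edges.difference_update((edge_list[p], edge_list[q]))
        edges.update((new1, new2))
        edge_list[p], edge_list[q] = new1, new2
    return [tuple(sorted(e)) for e in edge_list]


if __name__ == "__main__":
    print(len(random_regular_pairs(seed=1)))
```

Each swap removes one pair from and adds one pair to each of the four items involved, so the per-item counts never change. For 1500 pairs, call it with `times=30`.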

Very interesting program of building pyramid

I have come across this very interesting program for printing numbers in a pyramid.
If n = 1 then print the following,
1 2
4 3
if n = 2 then print the following,
1 2 3
8 9 4
7 6 5
if n = 3 then print the following,
1 2 3 4
12 13 14 5
11 16 15 6
10 9 8 7
I can print all of these using quite a few loops and variables, but that looks very specific. You might have noticed that the filling proceeds in one direction until it finds the path already filled. For n = 3, the numbers 1 to 12 fill the outer edge until they reach the 1, then the fill moves into the second row and prints 13, 14 and so on. It prints in a spiral, something like the snake game, where the snake keeps going until it hits itself.
I would like to know whether there is an algorithm behind this pyramid generation, or whether it is just a tricky, time-consuming program.
Thanks in advance. This is a very interesting, challenging program, so I kindly request no pile-on of down votes :)
I made a small recursive algorithm for your problem.
public int Determine(int n, int x, int y)
{
    if (y == 0) return x + 1;         // Top
    if (x == n) return n + y + 1;     // Right
    if (y == n) return 3 * n - x + 1; // Bottom
    if (x == 0) return 4 * n - y + 1; // Left
    return 4 * n + Determine(n - 2, x - 1, y - 1);
}
You can call it by using a double for loop. x and y start at 0:
for (int y=0; y<=n; y++)
    for (int x=0; x<=n; x++)
        result[x,y] = Determine(n,x,y);
Here is some C code implementing the basic algorithm submitted by @C.Zonnerberg. My example uses n=6 for a 6x6 array.
I had to make a few changes to get the output the way I expected it to look: I swapped most of the x's and y's, changed several of the n's to n-1, and changed the comparisons in the for loops from <= to <.
#include <stdio.h>

int Determine(int n, int x, int y); /* declared before use in main */

int main(){
    int x,y,n;
    int result[6][6];
    n=6;
    for (x=0; x<n; x++){
        for (y=0; y<n; y++) {
            result[x][y] = Determine(n,x,y);
            if(y==0)
                printf("\n[%d,%d] = %2d, ", x,y, result[x][y]);
            else
                printf("[%d,%d] = %2d, ", x,y, result[x][y]);
        }
    }
    return 0;
}
int Determine(int n, int x, int y)
{
    if (x == 0) return y + 1;               // Top
    if (y == n-1) return n + x;             // Right
    if (x == n-1) return 3 * (n-1) - y + 1; // Bottom
    if (y == 0) return 4 * (n-1) - x + 1;   // Left
    return 4 * (n-1) + Determine(n - 2, x - 1, y- 1);
}
Output
[0,0] = 1, [0,1] = 2, [0,2] = 3, [0,3] = 4, [0,4] = 5, [0,5] = 6,
[1,0] = 20, [1,1] = 21, [1,2] = 22, [1,3] = 23, [1,4] = 24, [1,5] = 7,
[2,0] = 19, [2,1] = 32, [2,2] = 33, [2,3] = 34, [2,4] = 25, [2,5] = 8,
[3,0] = 18, [3,1] = 31, [3,2] = 36, [3,3] = 35, [3,4] = 26, [3,5] = 9,
[4,0] = 17, [4,1] = 30, [4,2] = 29, [4,3] = 28, [4,4] = 27, [4,5] = 10,
[5,0] = 16, [5,1] = 15, [5,2] = 14, [5,3] = 13, [5,4] = 12, [5,5] = 11,
With an all-zeros array, you could start with [row,col] = [0,0], fill in this space, then add [0,1] to position (one to the right) until it's at the end or runs into a non-zero.
Then go down (add [1,0]), filling in space until it's the end or runs into a non-zero.
Then go left (add [0,-1]), filling in space until it's the end or runs into a non-zero.
Then go up (add [-1,0]), filling in space until it's the end or runs into a non-zero.
and repeat...
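The walk described above can be sketched in Python: keep a current direction, and turn right whenever the next cell is outside the grid or already filled. The function name `spiral` is hypothetical, and `size` is the side length of the square (n + 1 in the question's numbering).

```python
def spiral(size):
    """Fill a size x size grid with 1..size*size in a clockwise inward spiral."""
    grid = [[0] * size for _ in range(size)]
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # right, down, left, up
    row = col = direction = 0
    for value in range(1, size * size + 1):
        grid[row][col] = value
        next_row = row + moves[direction][0]
        next_col = col + moves[direction][1]
        blocked = not (0 <= next_row < size and 0 <= next_col < size
                       and grid[next_row][next_col] == 0)
        if blocked:                               # hit the edge or a filled cell
            direction = (direction + 1) % 4
            next_row = row + moves[direction][0]
            next_col = col + moves[direction][1]
        row, col = next_row, next_col
    return grid


if __name__ == "__main__":
    for line in spiral(4):
        print(" ".join(f"{v:2d}" for v in line))
```

Unlike the recursive formula, this needs the whole grid in memory, but it makes the "snake" behaviour explicit.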

Levenshtein Distance: Inferring the edit operations from the matrix

I wrote the Levenshtein algorithm in C++.
If I input:
string s: democrat
string t: republican
I get the matrix D filled-up and the number of operations (the Levenshtein distance) can be read in D[10][8] = 8
Beyond the filled matrix, I want to construct the optimal solution. What should this solution look like? I have no idea.
Please just show me what it should look like for this example.
The question is
Given the matrix produced by the Levenshtein algorithm, how can one find "the optimal solution"?
i.e. how can we find the precise sequence of string operations: inserts, deletes and substitution [of a single letter], necessary to convert the 's string' into the 't string'?
First, it should be noted that in many cases there are SEVERAL optimal solutions. While the Levenshtein algorithm supplies the minimum number of operations (8 in democrat/republican example) there are many sequences (of 8 operations) which can produce this conversion.
By "decoding" the Levenshtein matrix, one can enumerate ALL such optimal sequences.
The general idea is that the optimal solutions all follow a "path", from top left corner to bottom right corner (or in the other direction), whereby the matrix cell values on this path either remain the same or increase by one (or decrease by one in the reverse direction), starting at 0 and ending at the optimal number of operations for the strings in question (0 thru 8 democrat/republican case). The number increases when an operation is necessary, it stays the same when the letter at corresponding positions in the strings are the same.
It is easy to produce an algorithm which produces such a path (slightly more complicated to produce all possible paths), and from such path deduce the sequence of operations.
This path finding algorithm should start at the lower right corner and work its way backward. The reason for this approach is that we know for a fact that to be an optimal solution it must end in this corner, and to end in this corner, it must have come from one of the 3 cells either immediately to its left, immediately above it or immediately diagonally. By selecting a cell among these three cells, one which satisfies our "same value or decreasing by one" requirement, we effectively pick a cell on one of the optimal paths. By repeating the operation till we get on upper left corner (or indeed until we reach a cell with a 0 value), we effectively backtrack our way on an optimal path.
Illustration with the democrat - republican example
It should also be noted that one can build the matrix in one of two ways: with 'democrat' horizontally or vertically. This doesn't change the computation of the Levenshtein distance nor does it change the list of operations needed; it only changes the way we interpret the matrix, for example moving horizontally on the "path" either means inserting a character [from the t string] or deleting a character [off the s string] depending whether 'string s' is "horizontal" or "vertical" in the matrix.
I'll use the following matrix. The conventions are therefore (only going in the left-to-right and/or top-to-bottom directions)
a horizontal move is an INSERTION of a letter from the 't string'
a vertical move is a DELETION of a letter from the 's string'
a diagonal move is either:
a no-operation (both letters at respective positions are the same); the number doesn't change
a SUBSTITUTION (letters at respective positions are distinct); the number increase by one.
Levenshtein matrix for s = "democrat", t="republican"
         r   e   p   u   b   l   i   c   a   n
     0   1   2   3   4   5   6   7   8   9  10
d    1   1   2   3   4   5   6   7   8   9  10
e    2   2   1   2   3   4   5   6   7   8   9
m    3   3   2   2   3   4   5   6   7   8   9
o    4   4   3   3   3   4   5   6   7   8   9
c    5   5   4   4   4   4   5   6   6   7   8
r    6   5   5   5   5   5   5   6   7   7   8
a    7   6   6   6   6   6   6   6   7   7   8
t    8   7   7   7   7   7   7   7   7   8   8
The arbitrary approach I use to select one path among several possible optimal paths is loosely described below:
Starting at the bottom-rightmost cell, and working our way backward toward
the top left.
For each "backward" step, consider the 3 cells directly adjacent to the current
cell (in the left, top or left+top directions)
    if the value in the diagonal cell (going up+left) is smaller than or equal to the
       values found in the other two cells
       AND
       if this is the same as or 1 less than the value of the current cell
    then "take the diagonal cell"
        if the value of the diagonal cell is one less than the current cell:
            Add a SUBSTITUTION operation (from the letters corresponding to
            the _current_ cell)
        otherwise: do not add an operation; this was a no-operation.
    elseif the value in the cell to the left is smaller than or equal to the value of
       the cell above the current cell
       AND
       if this value is the same as or 1 less than the value of the current cell
    then "take the cell to the left", and
        add an INSERTION of the letter corresponding to the cell
    else
        take the cell above, and
        add a DELETION operation of the letter in 's string'
Following this informal pseudo-code, we get the following:
Start on the "n", "t" cell at bottom right.
Pick the [diagonal] "a", "a" cell as next destination since it is less than the other two (and satisfies the same or -1 condition).
Note that the new cell is one less than current cell
therefore the step 8 is substitute "t" with "n": democra N
Continue with "a", "a" cell,
Pick the [diagonal] "c", "r" cell as next destination...
Note that the new cell is same value as current cell ==> no operation needed.
Continue with "c", "r" cell,
Pick the [diagonal] "i", "c" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 7 is substitute "r" with "c": democ C an
Continue with "i", "c" cell,
Pick the [diagonal] "l", "o" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 6 is substitute "c" with "i": demo I can
Continue with "l", "o" cell,
Pick the [diagonal] "b", "m" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 5 is substitute "o" with "l": dem L ican
Continue with "b", "m" cell,
Pick the [diagonal]"u", "e" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 4 is substitute "m" with "b": de B lican
Continue with "u", "e" cell,
Note the "diagonal" cell doesn't qualify, because the "left" cell is less than it.
Pick the [left] "p", "e" cell as next destination...
therefore the step 3 is insert "u" after "e": de U blican
Continue with "p", "e" cell,
again the "diagonal" cell doesn't qualify
Pick the [left] "e", "e" cell as next destination...
therefore the step 2 is insert "p" after "e": de P ublican
Continue with "e", "e" cell,
Pick the [diagonal] "r", "d" cell as next destination...
Note that the new cell is same value as current cell ==> no operation needed.
Continue with "r", "d" cell,
Pick the [diagonal] "start" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 1 is substitute "d" with "r": R epublican
You've arrived at a cell whose value is 0: your work is done!
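The walkthrough above can be reproduced with a short Python sketch that builds the matrix and then backtracks from the bottom-right corner. It is a simplified variant, not the exact selection rule described in this answer: it decides "no operation" by comparing the letters themselves, then prefers substitution, then insertion, then deletion. The function names and the tuple format of the operations are hypothetical.

```python
def levenshtein_matrix(s, t):
    """Rows follow s, columns follow t; d[i][j] is the distance s[:i] -> t[:j]."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,   # diagonal: match or substitute
                          d[i][j - 1] + 1,          # left: insert t[j-1]
                          d[i - 1][j] + 1)          # up: delete s[i-1]
    return d


def edit_script(s, t):
    """One optimal list of operations turning s into t, in left-to-right order."""
    d = levenshtein_matrix(s, t)
    i, j = len(s), len(t)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1                     # letters match: no operation
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("substitute", s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("insert", t[j - 1]))
            j -= 1
        else:
            ops.append(("delete", s[i - 1]))
            i -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    for op in edit_script("democrat", "republican"):
        print(op)
```

For democrat/republican this yields the same eight steps as the walkthrough: six substitutions and the insertions of "p" and "u".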
The backtracking algorithm to infer the moves from the matrix implemented in python:
def _backtrack_string(matrix, output_word):
    '''
    Iteratively backtrack DP matrix to get optimal set of moves

    Inputs: DP matrix (list:list:int),
            Output word (str)
    Output: Optimal path (list)
    '''
    i = len(matrix) - 1
    j = len(matrix[0]) - 1
    optimal_path = []
    while i > 0 and j > 0:
        diagonal = matrix[i-1][j-1]
        vertical = matrix[i-1][j]
        horizontal = matrix[i][j-1]
        current = matrix[i][j]
        if diagonal <= vertical and diagonal <= horizontal and (diagonal <= current):
            i = i - 1
            j = j - 1
            if diagonal == current - 1:
                optimal_path.append("Replace " + str(j) + ", " + str(output_word[j]))
        elif horizontal <= vertical and horizontal <= current:
            j = j - 1
            optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
        else:
            # Every remaining case is a deletion; the original repeated the
            # horizontal test here in a branch that could never be reached.
            i = i - 1
            optimal_path.append("Delete " + str(i))
    return list(reversed(optimal_path))
The output I get when I run the algorithm with original word "OPERATING" and desired word "CONSTANTINE" is the following
Insert 0, C
Replace 2, N
Replace 3, S
Replace 4, T
Insert 6, N
Replace 10, E
"" C O N S T A N T I N E
"" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
<-- Insert 0, C
O [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
\ Replace 2, N
P [2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
\ Replace 3, S
E [3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 9]
\ Replace 4, T
R [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10] No move
\ <-- Insert 6, N
A [5, 5, 5, 5, 5, 5, 4, 5, 6, 7, 8, 9]
\ No move
T [6, 6, 6, 6, 6, 5, 5, 5, 5, 6, 7, 8]
\ No move
I [7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 6, 7]
\ No move
N [8, 8, 8, 7, 8, 7, 7, 6, 7, 6, 5, 6]
\ Replace 10, E
G [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6]
Note that I had to add extra conditions if the element in the diagonal is the same as the current element. There could be a deletion or insertion depending on values in the vertical (up) and horizontal (left) positions. We only get a "no operation" or "replace" operation when the following occurs
# assume bottom right of a 2x2 matrix is the reference position
# and has value v
# the following is the situation where we get a replace operation
[v + 1 , v<]
[ v< , v]
# the following is the situation where we get a "no operation"
[v , v<]
[v<, v ]
I think this is where the algorithm described in the first answer could break. There could be other arrangements in the 2x2 matrix above where neither operation is correct. The example shown with input "OPERATING" and output "CONSTANTINE" breaks the algorithm unless this is taken into account.
It's been some time since I played with it, but it seems to me the matrix should look something like:
. . r e p u b l i c a n
. 0 1 2 3 4 5 6 7 8 9 10
d 1 1 2 3 4 5 6 7 8 9 10
e 2 2 1 2 3 4 5 6 7 8 9
m 3 3 2 2 3 4 5 6 7 8 9
o 4 4 3 3 3 4 5 6 7 8 9
c 5 5 4 4 4 4 5 6 7 8 9
r 6 5 5 5 5 5 5 6 7 8 9
a 7 6 6 6 6 6 6 6 7 7 8
t 8 7 7 7 7 7 7 7 7 7 8
Don't take it for granted though.
Here is a VBA algorithm based on mjv's answer.
(It is very well explained, but some cases were missing.)
Sub TU_Levenshtein()
Call Levenshtein("democrat", "republican")
Call Levenshtein("ooo", "u")
Call Levenshtein("ceci est un test", "ceci n'est pas un test")
End Sub
Sub Levenshtein(ByVal string1 As String, ByVal string2 As String)
' Fill Matrix Levenshtein (-> array 'Distance')
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
LevenshteinDistance = distance(string1_length, string2_length) ' for information only
' Write Matrix on VBA sheets (only for visualisation, not used in the calculation)
Cells.Clear
For i = 1 To UBound(distance, 1)
Cells(i + 2, 1).Value = Mid(string1, i, 1)
Next i
For i = 1 To UBound(distance, 2)
Cells(1, i + 2).Value = Mid(string2, i, 1)
Next i
For i = 0 To UBound(distance, 1)
For j = 0 To UBound(distance, 2)
Cells(i + 2, j + 2) = distance(i, j)
Next j
Next i
' One solution
current_posx = UBound(distance, 1)
current_posy = UBound(distance, 2)
Do
cc = distance(current_posx, current_posy)
Cells(current_posx + 2, current_posy + 2).Interior.Color = vbYellow ' visualisation again (matrix is written at offset +2)
' Manage border case
If current_posy - 1 < 0 Then
MsgBox ("deletion. " & Mid(string1, current_posx, 1))
current_posx = current_posx - 1
current_posy = current_posy
GoTo suivant
End If
If current_posx - 1 < 0 Then
MsgBox ("insertion. " & Mid(string2, current_posy, 1))
current_posx = current_posx
current_posy = current_posy - 1
GoTo suivant
End If
' Middle cases
cc_L = distance(current_posx, current_posy - 1)
cc_U = distance(current_posx - 1, current_posy)
cc_D = distance(current_posx - 1, current_posy - 1)
If (cc_D <= cc_L And cc_D <= cc_U) And (cc_D = cc - 1 Or cc_D = cc) Then
If (cc_D = cc - 1) Then
MsgBox "substitution. " & Mid(string1, current_posx, 1) & " by " & Mid(string2, current_posy, 1)
current_posx = current_posx - 1
current_posy = current_posy - 1
GoTo suivant
Else
MsgBox "no operation"
current_posx = current_posx - 1
current_posy = current_posy - 1
GoTo suivant
End If
ElseIf cc_L <= cc_D And cc_L = cc - 1 Then
MsgBox ("insertion. " & Mid(string2, current_posy, 1))
current_posx = current_posx
current_posy = current_posy - 1
GoTo suivant
Else
MsgBox ("deletion. " & Mid(string1, current_posx, 1))
current_posx = current_posx - 1
current_posy = current_posy
GoTo suivant
End If
suivant:
Loop While Not (current_posx = 0 And current_posy = 0)
End Sub
I've done some work with the Levenshtein distance algorithm's matrix recently. I needed to produce the operations which would transform one list into another. (This will work for strings too.)
Do the following (vows) tests show the sort of functionality that you're looking for?
, "lev - complex 2"
: { topic
: lev.diff([13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11], [9, 13, 6, 5, 1, 8, 2, 15, 12, 11])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 9, val: 7 },
{ op: 'delete', pos: 5, val: 9 },
{ op: 'insert', pos: 0, val: 9 },
]); }
}
, "lev - complex 3"
: { topic
: lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
{ op: 'insert', pos: 5, val: 9 },
{ op: 'insert', pos: 9, val: 7 }
]); }
}
, "lev - complex 4"
: { topic
: lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11, 16], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11, 17])
, "check actions"
: function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
{ op: 'insert', pos: 5, val: 9 },
{ op: 'insert', pos: 9, val: 7 },
{ op: 'replace', pos: 11, val: 17 }
]); }
}
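Judging from the expected values in these tests, the operations appear to be meant for sequential application, with each `pos` referring to the list as it stands after the previous operations. A small Python sketch of how such a list could be replayed is below. The function `apply_ops` is hypothetical (it is not part of the `lev` module), and it assumes that `val` on a delete is the element being removed and `val` on a replace is the new element.

```python
def apply_ops(items, ops):
    """Apply insert/delete/replace operations, in order, to a copy of items."""
    result = list(items)
    for op in ops:
        kind, pos, val = op["op"], op["pos"], op["val"]
        if kind == "delete":
            # Assumption: val is the element expected at pos
            if result[pos] != val:
                raise ValueError(f"expected {val!r} at {pos}, found {result[pos]!r}")
            del result[pos]
        elif kind == "insert":
            result.insert(pos, val)
        elif kind == "replace":
            result[pos] = val
        else:
            raise ValueError(f"unknown op {kind!r}")
    return result


# Data taken from the "lev - complex 3" test above
before = [9, 13, 6, 5, 1, 8, 2, 15, 12, 11]
after = [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11]
ops = [{"op": "delete", "pos": 0, "val": 9},
       {"op": "insert", "pos": 5, "val": 9},
       {"op": "insert", "pos": 9, "val": 7}]
print(apply_ops(before, ops) == after)  # True
```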
Here is some MATLAB code. Is it correct, in your opinion? It seems to give the right results :)
clear all
s = char('democrat');
t = char('republican');
% Edit Matrix
m=length(s);
n=length(t);
mat=zeros(m+1,n+1);
for i=1:1:m
mat(i+1,1)=i;
end
for j=1:1:n
mat(1,j+1)=j;
end
for i=1:m
for j=1:n
if (s(i) == t(j))
mat(i+1,j+1)=mat(i,j);
else
mat(i+1,j+1)=1+min(min(mat(i+1,j),mat(i,j+1)),mat(i,j));
end
end
end
% Edit Sequence
s = char('democrat');
t = char('republican');
i = m+1;
j = n+1;
display([s ' --> ' t])
while (i > 1 || j > 1)
if (i > 1 && j > 1 && s(i-1) == t(j-1))
% Matching characters: move diagonally at no cost
i = i - 1;
j = j - 1;
disp(strcat(['uunchanged: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
elseif (i > 1 && j > 1 && mat(i,j) == mat(i-1,j-1) + 1)
% Substitution: overwrite t(j) with s(i)
i = i - 1;
j = j - 1;
t(j) = s(i);
disp(strcat(['substition: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
elseif (i > 1 && mat(i,j) == mat(i-1,j) + 1)
% Insertion: s(i) is missing from t, so insert it at position j
i = i - 1;
t = [t(1:j-1) s(i) t(j:end)];
disp(strcat(['iinsertion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
else
% Deletion: t(j) has no counterpart in s, so remove it
j = j - 1;
t(j) = [];
disp(strcat(['dddeletion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
end
end
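One way to gain some confidence in the matrix part is to compare it against an independent, direct implementation of the recurrence. The Python sketch below does that. `lev_matrix` mirrors the fill loop above (copy the diagonal on a match, otherwise 1 + min of the three neighbours), and `lev_recursive` is a memoized version of the textbook recurrence. Both names are made up for this example, and agreement on a few inputs is a sanity check, not a proof.

```python
from functools import lru_cache


def lev_matrix(s, t):
    """Bottom-up fill, same branch structure as the MATLAB loop."""
    m, n = len(s), len(t)
    mat = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        mat[i][0] = i
    for j in range(1, n + 1):
        mat[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                mat[i][j] = mat[i - 1][j - 1]
            else:
                mat[i][j] = 1 + min(mat[i][j - 1], mat[i - 1][j], mat[i - 1][j - 1])
    return mat[m][n]


def lev_recursive(s, t):
    """Top-down recurrence with memoization, used only as a cross-check."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
    return d(len(s), len(t))


for s, t in [("democrat", "republican"), ("kitten", "sitting"), ("", "abc")]:
    print(s, t, lev_matrix(s, t), lev_recursive(s, t))
```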
Here is a C# implementation of JackIsJack's answer, with some changes:
Operations are output in 'forward' order (JackIsJack's answer outputs them in reverse order).
The last 'else' clause in the original answer worked incorrectly (it looks like a copy-paste error), so it is corrected here.
Console application code:
using System;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
Levenshtein("1", "1234567890");
Levenshtein( "1234567890", "1");
Levenshtein("kitten", "mittens");
Levenshtein("mittens", "kitten");
Levenshtein("kitten", "sitting");
Levenshtein("sitting", "kitten");
Levenshtein("1234567890", "12356790");
Levenshtein("12356790", "1234567890");
Levenshtein("ceci est un test", "ceci n'est pas un test");
Levenshtein("ceci n'est pas un test", "ceci est un test");
}
static void Levenshtein(string string1, string string2)
{
Console.WriteLine("Levenshtein '" + string1 + "' => '" + string2 + "'");
var string1_length = string1.Length;
var string2_length = string2.Length;
int[,] distance = new int[string1_length + 1, string2_length + 1];
for (int i = 0; i <= string1_length; i++)
{
distance[i, 0] = i;
}
for (int j = 0; j <= string2_length; j++)
{
distance[0, j] = j;
}
for (int i = 1; i <= string1_length; i++)
{
for (int j = 1; j <= string2_length; j++)
{
if (string1[i - 1] == string2[j - 1])
{
distance[i, j] = distance[i - 1, j - 1];
}
else
{
distance[i, j] = Math.Min(distance[i - 1, j] + 1, Math.Min(
distance[i, j - 1] + 1,
distance[i - 1, j - 1] + 1));
}
}
}
var LevenshteinDistance = distance[string1_length, string2_length];// for information only
Console.WriteLine($"Levenshtein distance: {LevenshteinDistance}");
// List of operations
var current_posx = string1_length;
var current_posy = string2_length;
var stack = new Stack<string>(); // for outputting messages in forward direction
while (current_posx != 0 || current_posy != 0)
{
var cc = distance[current_posx, current_posy];
// edge cases
if (current_posy - 1 < 0)
{
stack.Push("Delete '" + string1[current_posx - 1] + "'");
current_posx--;
continue;
}
if (current_posx - 1 < 0)
{
stack.Push("Insert '" + string2[current_posy - 1] + "'");
current_posy--;
continue;
}
// Middle cases
var cc_L = distance[current_posx, current_posy - 1];
var cc_U = distance[current_posx - 1, current_posy];
var cc_D = distance[current_posx - 1, current_posy - 1];
if ((cc_D <= cc_L && cc_D <= cc_U) && (cc_D == cc - 1 || cc_D == cc))
{
if (cc_D == cc - 1)
{
stack.Push("Substitute '" + string1[current_posx - 1] + "' by '" + string2[current_posy - 1] + "'");
current_posx--;
current_posy--;
}
else
{
stack.Push("Keep '" + string1[current_posx - 1] + "'");
current_posx--;
current_posy--;
}
}
else if (cc_L <= cc_D && cc_L == cc - 1)
{
stack.Push("Insert '" + string2[current_posy - 1] + "'");
current_posy--;
}
else
{
stack.Push("Delete '" + string1[current_posx - 1]+"'");
current_posx--;
}
}
while(stack.Count > 0)
{
Console.WriteLine(stack.Pop());
}
}
}
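A practical benefit of forward order is that the script can be replayed against the source string with a single cursor, which makes it easy to check. Here is a small Python sketch of such a replay. The function `replay` and the tuple format are assumptions made for this example, and the script below was written by hand for "kitten" => "sitting" rather than copied from the program's output.

```python
def replay(source, script):
    """Replay a forward-ordered edit script on source and return the result."""
    out = []
    pos = 0  # cursor into source
    for step in script:
        kind = step[0]
        if kind == "keep":
            if source[pos] != step[1]:
                raise ValueError(f"keep mismatch at {pos}")
            out.append(source[pos])
            pos += 1
        elif kind == "substitute":
            # step = ("substitute", old_char, new_char)
            if source[pos] != step[1]:
                raise ValueError(f"substitute mismatch at {pos}")
            out.append(step[2])
            pos += 1
        elif kind == "insert":
            out.append(step[1])
        elif kind == "delete":
            if source[pos] != step[1]:
                raise ValueError(f"delete mismatch at {pos}")
            pos += 1
        else:
            raise ValueError(f"unknown step {kind!r}")
    if pos != len(source):
        raise ValueError("script did not consume the whole source")
    return "".join(out)


script = [("substitute", "k", "s"), ("keep", "i"), ("keep", "t"), ("keep", "t"),
          ("substitute", "e", "i"), ("keep", "n"), ("insert", "g")]
print(replay("kitten", script))  # sitting
```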
Here is code that returns all the edit paths, given the edit matrix, the source and the target. Please leave a comment if you find any bugs. Thanks a lot!
import copy
from typing import List, Union


def edit_distance(source: Union[List[str], str],
                  target: Union[List[str], str],
                  return_distance: bool = False):
    """Build the edit matrix (or return only the distance if return_distance is True)."""
    # Row 0 and column 0 start as j and i; the inner cells are overwritten below
    edit_matrix = [[i + j for j in range(len(target) + 1)] for i in range(len(source) + 1)]
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                d = 0
            else:
                d = 1
            edit_matrix[i][j] = min(edit_matrix[i - 1][j] + 1,
                                    edit_matrix[i][j - 1] + 1,
                                    edit_matrix[i - 1][j - 1] + d)
    if return_distance:
        return edit_matrix[len(source)][len(target)]
    return edit_matrix


def get_edit_paths(matrix: List[List[int]],
                   source: Union[List[str], str],
                   target: Union[List[str], str]):
    """Get all the valid edit paths."""
    all_paths = []

    def _edit_path(i, j, optimal_path):
        if i > 0 and j > 0:
            diagonal = matrix[i - 1][j - 1]  # the diagonal value
            vertical = matrix[i - 1][j]      # the value above
            horizontal = matrix[i][j - 1]    # the value to the left
            current = matrix[i][j]           # the current value
            # whether the source and target tokens are the same
            flag = False
            # the minimal value of the diagonal, vertical and horizontal cells
            minimal = min(diagonal, min(vertical, horizontal))
            # if the diagonal is the minimal
            if diagonal == minimal:
                new_i = i - 1
                new_j = j - 1
                path_ = copy.deepcopy(optimal_path)
                # if the diagonal value equals current - 1,
                # it is a `replace` operation
                if diagonal == current - 1:
                    path_.append(f"Replace | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
                # if the diagonal value equals the current value and the
                # source and target tokens match, this is the best path
                elif source[new_i] == target[new_j]:
                    flag = True
                    # path_.append(f"Keep | {new_i}")
                    _edit_path(new_i, new_j, path_)
            # if the position doesn't have a best path,
            # we need to consider the other moves
            if not flag:
                # if the vertical value is minimal,
                # delete the corresponding source token
                if vertical == minimal:
                    new_i = i - 1
                    new_j = j
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Delete | {new_i}")
                    _edit_path(new_i, new_j, path_)
                # if the horizontal value is minimal,
                # insert the corresponding target token into the source
                if horizontal == minimal:
                    new_i = i
                    new_j = j - 1
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Insert | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
        elif i > 0:
            # first column: only deletions remain
            path_ = copy.deepcopy(optimal_path)
            path_.append(f"Delete | {i - 1}")
            _edit_path(i - 1, j, path_)
        elif j > 0:
            # first row: only insertions remain
            path_ = copy.deepcopy(optimal_path)
            path_.append(f"Insert | {j - 1} | {target[j - 1]}")
            _edit_path(i, j - 1, path_)
        else:
            all_paths.append(list(reversed(optimal_path)))

    # get the rows and columns of the edit matrix
    row_len = len(matrix) - 1
    col_len = len(matrix[0]) - 1
    _edit_path(row_len, col_len, optimal_path=[])
    return all_paths


if __name__ == "__main__":
    source = "BBDEF"
    target = "ABCDF"
    matrix = edit_distance(source, target)
    print("print paths")
    paths = get_edit_paths(matrix, source=list(source), target=list(target))
    for path in paths:
        print(path)
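Since the number of minimal edit scripts can grow quickly with input length, it may be worth counting them before enumerating. The sketch below is one possible way to do that with a second table over the same matrix. `count_optimal_paths` is a name invented for this example, and it counts every minimal script (treating "keep" as a diagonal step), so its result need not equal the number of paths printed by the code above if that code prunes some branches.

```python
def count_optimal_paths(source, target):
    """Count the distinct minimal edit scripts between source and target."""
    m, n = len(source), len(target)
    dist = [[i + j for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    # paths[i][j] = number of ways to reach (i, j) from (0, 0) using only
    # moves whose cost exactly accounts for the change in distance
    paths = [[0] * (n + 1) for _ in range(m + 1)]
    paths[0][0] = 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and dist[i - 1][j] + 1 == dist[i][j]:
                paths[i][j] += paths[i - 1][j]          # delete
            if j > 0 and dist[i][j - 1] + 1 == dist[i][j]:
                paths[i][j] += paths[i][j - 1]          # insert
            if i > 0 and j > 0:
                cost = 0 if source[i - 1] == target[j - 1] else 1
                if dist[i - 1][j - 1] + cost == dist[i][j]:
                    paths[i][j] += paths[i - 1][j - 1]  # keep or replace
    return paths[m][n]


print(count_optimal_paths("ab", "ba"))  # 3
```

For "ab" and "ba" the three minimal scripts are: replace both characters, delete "a" and append "a", or insert "b" in front and delete the trailing "b".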
