Levenshtein distance limit - algorithm

Suppose I have a maximum distance that I do not want to exceed, for example 2.
Can I break out of the algorithm before it runs to completion, knowing the minimum allowable distance?
Perhaps there are similar algorithms where this is possible.
I need to reduce my program's running time.

Yes you can and it does reduce the complexity.
The main thing to observe is that levenshtein_distance(a, b) >= |len(a) - len(b)|, i.e. the distance can't be less than the difference in the lengths of the strings: at the very minimum you need to add characters to make them the same length.
Knowing this you can ignore all the cells in the original matrix where |i-j| > max_distance. So you can modify your loops from
for (i in 0 -> len(a))
    for (j in 0 -> len(b))
to
for (i in 0 -> len(a))
    for (j in max(0, i - max_distance) -> min(len(b), i + max_distance))
You can keep the original matrix if it's easier for you, but you can also save space by using a matrix of size (len(a) + 1, 2 * max_distance + 1) and adjusting the indices.
Once every cost in the current row is > max_distance you can stop the algorithm.
This gives you O(N * max_distance) complexity. Since your max_distance is 2, the complexity is almost linear. You can also bail out at the start if |len(a) - len(b)| > max_distance.
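Putting this together, here's a sketch of the banded computation with both early exits in Python (the function name and the convention of returning max_distance + 1 for "too far" are my own):

```python
def bounded_levenshtein(a, b, max_distance):
    """Banded Levenshtein distance: returns the exact distance if it is
    <= max_distance, and max_distance + 1 as soon as it is known to be larger."""
    # Cheap bail-out: the distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > max_distance:
        return max_distance + 1
    if not a or not b:
        return max(len(a), len(b))
    INF = max_distance + 1
    prev = list(range(len(b) + 1))  # row 0 of the usual DP matrix
    for i in range(1, len(a) + 1):
        curr = [INF] * (len(b) + 1)
        curr[0] = i
        # Only the cells with |i - j| <= max_distance can matter.
        lo = max(1, i - max_distance)
        hi = min(len(b), i + max_distance)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        # Early exit: every cost in the current row already exceeds the limit.
        if min(curr[lo:hi + 1]) > max_distance:
            return max_distance + 1
        prev = curr
    return prev[len(b)] if prev[len(b)] <= max_distance else max_distance + 1
```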

If you do top-down dynamic programming (recursion + memoization), you could pass the current accumulated cost as an extra parameter and return early if it exceeds 2. But I think this will be inefficient, because you will revisit states.
If you do bottom-up dp, you will fill row by row (you only have to keep the last and current row). If the last row only has entries greater than 2, you can terminate early.
Modify your source code according to my comment:
for (var i = 1; i <= source1Length; i++)
{
    var rowMin = int.MaxValue;
    for (var j = 1; j <= source2Length; j++)
    {
        var cost = (source2[j - 1] == source1[i - 1]) ? 0 : 1;
        matrix[i, j] = Math.Min(
            Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1),
            matrix[i - 1, j - 1] + cost);
        rowMin = Math.Min(rowMin, matrix[i, j]);
    }
    // Modification: if every entry in matrix[i, ...] is > 2, the final
    // distance cannot be <= 2, so we can stop early.
    if (rowMin > 2)
        break;
}

Related

Dynamic Programming - Rod Cutting Bottom Up Algorithm (CLRS) Solution Incorrect?

For the "rod cutting" problem:
Given a rod of length n inches and an array of prices that contains prices of all pieces of size smaller than n. Determine the maximum value obtainable by cutting up the rod and selling the pieces. [link]
Introduction to Algorithms (CLRS) page 366 gives this pseudocode for a bottom-up (dynamic programming) approach:
BOTTOM-UP-CUT-ROD(p, n)
1  let r[0..n] be a new array
2  r[0] = 0
3  for j = 1 to n
4      q = -infinity
5      for i = 1 to j
6          q = max(q, p[i] + r[j - i])
7      r[j] = q
8  return r[n]
Now, I'm having trouble understanding the logic behind line 6. Why are they doing max(q, p[i] + r[j - i]) instead of max(q, r[i] + r[j - i])? Since, this is a bottom up approach, we'll compute r[1] first and then r[2], r[3]... so on. This means while computing r[x] we are guaranteed to have r[x - 1].
r[x] denotes the max value we can get for a rod of length x (after cutting it up to maximize profit), whereas p[x] denotes the price of a single piece of rod of length x. The loop computes the value r[j] for j = 1 to n, and the inner lines compute the maximum price we can sell a rod of length j for by considering all the possible cuts. So how does it ever make sense to use p[i] instead of r[i] in line 6? If we are trying to find the max price for a rod after we cut it at length i, shouldn't we add the prices r[i] and r[j - i]?
I've used this logic to write Java code, and it seems to give the correct output for a number of test cases I've tried. Am I missing some cases where my code produces incorrect or inefficient solutions? Please help me out. Thanks!
class Solution {
    private static int cost(int[] prices, int n) {
        if (n == 0) {
            return 0;
        }
        int[] maxPrice = new int[n];
        for (int i = 0; i < n; i++) {
            maxPrice[i] = -1;
        }
        for (int i = 1; i <= n; i++) {
            int q = Integer.MIN_VALUE;
            if (i <= prices.length) {
                q = prices[i - 1];
            }
            for (int j = i - 1; j >= (n / 2); j--) {
                q = Math.max(q, maxPrice[j - 1] + maxPrice[i - j - 1]);
            }
            maxPrice[i - 1] = q;
        }
        return maxPrice[n - 1];
    }

    public static void main(String[] args) {
        int[] prices = {1, 5, 8, 9, 10, 17, 17, 20};
        System.out.println(cost(prices, 8));
    }
}
They should be equivalent.
The intuition behind the CLRS approach is that they are trying to find the single "last cut", assuming that the last piece of rod has length i and thus has value exactly p[i]. In this formulation, the "last piece" of length i is not cut further, but the remainder of length j-i is.
Your approach considers all splits of the rod into two pieces, where each of the two parts can be cut further. This considers a superset of cases compared to the CLRS approach.
Both approaches are correct and have the same asymptotic complexity. However, I would argue that the CLRS solution is more "canonical" because it more closely matches a common form of DP solution where you only consider the last "thing" (in this case, the last piece of uncut rod).
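For reference, the CLRS pseudocode translates almost line for line into Python (assuming n does not exceed the length of the price list, and using a 0-indexed list so p[i - 1] is the price of a piece of length i):

```python
def bottom_up_cut_rod(p, n):
    """CLRS BOTTOM-UP-CUT-ROD: p is a 0-indexed price list (p[i - 1] is the
    price of a piece of length i), n is the rod length. Returns max revenue."""
    r = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        # The last piece has length i and is not cut further (value p[i]);
        # the remaining j - i inches are cut optimally (value r[j - i]).
        for i in range(1, j + 1):
            q = max(q, p[i - 1] + r[j - i])
        r[j] = q
    return r[n]
```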
I guess both approaches are correct.
Before we prove that both of them are correct, let's define what exactly each approach does:
p[i] + r[j - i] gives you the max value you can obtain from a rod of length j when the last piece has size i (that piece cannot be divided further).
r[i] + r[j - i] gives you the max value you can obtain from a rod of length j when the first cut is made at length i (both pieces can be divided further).
Now suppose we have a rod of length X whose solution set contains a piece of length k.
Since 0 < k < X, you will find the max value at p[k] + r[X - k] in the first approach, and in the second approach you find the same result with r[k] + r[X - k], since we know that r[k] >= p[k].
But with your approach you can get the result faster (in about half the time), since you are slicing the rod from both ends, so running the inner loop over half of the length should be good.
However, I think there is a bug in your code's inner for loop: it should be j >= (i / 2) instead of j >= (n / 2).

Efficiently calculate edit distance between two strings

I have a string S of length 1000 and a query string Q of length 100. I want to calculate the edit distance of query string Q against every sub-string of S of length 100. One naive way to do this is to compute the edit distance of every sub-string independently, i.e. edDist(q, s[0:100]), edDist(q, s[1:101]), edDist(q, s[2:102]), ..., edDist(q, s[900:1000]).
from numpy import zeros

def edDist(x, y):
    """ Calculate edit distance between sequences x and y using
        matrix dynamic programming. Return distance. """
    D = zeros((len(x) + 1, len(y) + 1), dtype=int)
    D[0, 1:] = range(1, len(y) + 1)
    D[1:, 0] = range(1, len(x) + 1)
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            delt = 1 if x[i - 1] != y[j - 1] else 0
            D[i, j] = min(D[i - 1, j - 1] + delt, D[i - 1, j] + 1, D[i, j - 1] + 1)
    return D[len(x), len(y)]
Can somebody suggest an alternative approach to calculate the edit distance efficiently? My take on this is that we know edDist(q, s[900:1000]). Can we somehow use this knowledge to calculate edDist(q, s[899:999]), since the sub-strings differ by only one character, and then proceed backward to edDist(q, s[0:100]) using the previously calculated edit distances?
Improving Space Complexity
One way to make your Levenshtein distance algorithm more efficient is to reduce the amount of memory required for your calculation.
To use an entire matrix, that requires you to utilize O(n * m) memory, where n represents the length of the first string and m the second string.
If you think about it, the only parts of the matrix we really care about are the last two columns that we're checking - the previous column and the current column.
Knowing this, we can pretend we have a matrix, but only really ever create these two columns; writing over the data when we need to update them.
All we need here is two arrays of size n + 1:
var column_crawler_0 = new Array(n + 1);
var column_crawler_1 = new Array(n + 1);
Initialize the values of these pseudo columns:
for (let i = 0; i < n + 1; ++i) {
    column_crawler_0[i] = i;
    column_crawler_1[i] = 0;
}
And then go through your normal algorithm, but just make sure that you're updating these arrays with the new values as we go along:
for (let j = 1; j < m + 1; ++j) {
    column_crawler_1[0] = j;
    for (let i = 1; i < n + 1; ++i) {
        // Perform the normal Levenshtein calculation, updating the current column
        let cost = a[i - 1] === b[j - 1] ? 0 : 1;
        column_crawler_1[i] = Math.min(
            column_crawler_1[i - 1] + 1,
            column_crawler_0[i] + 1,
            column_crawler_0[i - 1] + cost);
    }
    // Copy the current column into the previous one before moving on
    column_crawler_1.forEach((e, i) => {
        column_crawler_0[i] = e;
    });
}
return column_crawler_1.pop();
If you want to analyze this approach further, I wrote a small open-source library using this specific technique, so feel free to check it out if you're curious.
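For comparison, here is the same rolling-buffer idea as a compact Python sketch, iterating row by row instead of column by column (function name mine):

```python
def edit_distance_two_rows(a, b):
    """Levenshtein distance keeping only the previous and current rows,
    so memory is O(len(b)) instead of a full (len(a)+1) x (len(b)+1) matrix."""
    prev = list(range(len(b) + 1))  # row 0: distance from "" to b[:j]
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)   # curr[0]: distance from a[:i] to ""
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution or match
        prev = curr
    return prev[len(b)]
```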
Improving Time Complexity
There's no trivial way to make a Levenshtein distance algorithm perform faster than O(n^2) in general. There are a few complicated approaches, one using VP-tree data structures. There are a few good sources if you're curious to read about them here and here, and these approaches can reach an asymptotic speed of O(n log n).

Find largest continuous sum such that the minimum of it and it's complement is largest

I'm given a sequence of numbers a_1, a_2, ..., a_n. Its sum is S = a_1 + a_2 + ... + a_n, and I need to find a subsequence a_i, ..., a_j such that min(S - (a_i + ... + a_j), a_i + ... + a_j) is as large as possible (both sums must be non-empty).
Example:
For 1, 2, 3, 4, 5 the subsequence is 3, 4, because then min(S - (a_i + ... + a_j), a_i + ... + a_j) = min(8, 7) = 7 (the largest possible, which can be checked against the other subsequences).
I tried to do this the hard way:
I load all values into the array tab[n].
I do this n - 1 times: tab[i] += tab[i - 1], so that tab[j] is the prefix sum from the beginning up to j.
I check all possible sums a_i + ... + a_j = tab[j] - tab[i - 1], subtract each from the total, take the minimum, and see if it's larger than the best so far.
This takes O(n^2), which makes me very sad and miserable. Is there a better way?
Seems like this can be done in O(n) time.
Compute the total sum S. The ideal subsequence sum is the one that gets closest to S/2.
Start with i = j = 0 and increase j until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and save the values i_best, j_best, sum_best.
Increment i, then increase j again until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and replace i_best, j_best, sum_best if it is better. Repeat this step until done.
Note that i and j are never decremented, so they change a total of at most O(n) times. Since all other operations take only constant time, this results in an O(n) runtime for the entire algorithm.
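A rough Python sketch of the procedure described above (assuming all numbers are positive; it returns the best value together with the window endpoints, and the function name is mine):

```python
def best_split(a):
    """Slide a window a[i..j] over the array, always moving whichever end
    brings the window sum closer to S / 2, and remember the best
    min(sum, S - sum) seen along the way. Returns (value, i, j)."""
    S = sum(a)
    best = None
    i, j, s = 0, 0, a[0]  # window a[i..j] inclusive, with sum s
    while True:
        value = min(s, S - s)
        if best is None or value > best[0]:
            best = (value, i, j)
        if s < S / 2 and j + 1 < len(a):
            j += 1            # window sum too small: extend to the right
            s += a[j]
        elif i < j:
            s -= a[i]         # window sum too large: shrink from the left
            i += 1
        else:
            break             # single-element window that cannot improve
    return best
```

For the question's example [1, 2, 3, 4, 5] this finds the window a[2..3] = [3, 4] with value 7.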
Let's first make some clarifications.
A subsequence of a sequence is actually a subset of the indices of the sequence. Having said that, and specifically in the case where your sequence has distinct elements, your problem reduces to the famous Partition problem, which is known to be NP-complete. In that case you can still solve the problem in O(Sn), where "n" is the number of elements and "S" is the total sum. This is not polynomial time, as "S" can be arbitrarily large.
So let's consider the case of a contiguous subsequence. You need to pass over the array elements twice. The first pass sums them up into "S". In the second pass you carefully adjust the window: assume you know that a[i] + a[i + 1] + ... + a[j] > S / 2; then you let i = i + 1 to reduce the sum. Conversely, if it was smaller, you would increase j.
This code runs in O(n).
Python code:
from math import fabs

a = [1, 2, 3, 4, 5]
i = 0
j = 0
S = sum(a)
s = 0
while s + a[j] <= S / 2:
    s = s + a[j]
    j = j + 1
s = s + a[j]
best_case = (i, j)
best_difference = fabs(S / 2 - s)
while True:
    if fabs(S / 2 - s) < best_difference:
        best_case = (i, j)
        best_difference = fabs(S / 2 - s)
    if s > S / 2:
        s -= a[i]
        i += 1
    else:
        j += 1
        if j == len(a):
            break
        s += a[j]
print(best_case)
i = best_case[0]
j = best_case[1]
print("Best subarray =", a[i:j + 1])
print("Best sum =", sum(a[i:j + 1]))

Represent natural number as sum of squares using dynamic programming

The problem is to find the minimum number of squares required to sum to a number n.
Some examples:
min[ 1] = 1 (1²)
min[ 2] = 2 (1² + 1²)
min[ 4] = 1 (2²)
min[13] = 2 (3² + 2²)
I'm aware of Lagrange's four-square theorem which states that any natural number can be represented as the sum of four squares.
I'm trying to solve this using DP.
This is what I came up with (it's not correct):
min[i] = 1 where i is a square number
min[i] = min(min[i - 1] + 1, 1 + min[i - prev]) where prev is a square number < i
What is the correct DP way to solve this?
I'm not sure if DP is the most efficient way to solve this problem, but you asked for DP.
min[i] = min(min[i - 1] + 1, 1 + min[i - prev]) where prev is a square number < i
This is close; I would write the condition as
min[i] = min(1 + min[i - prev]) over each square number prev <= i
Note that for each i you need to check all possible values of prev.
Here's a simple implementation in Java.
int[] min = new int[n + 1];
Arrays.fill(min, Integer.MAX_VALUE);
min[0] = 0;
for (int i = 1; i <= n; ++i) {
    for (int j = 1; j * j <= i; ++j) {
        min[i] = Math.min(min[i], min[i - j * j] + 1);
    }
}
Seems to me that you're close...
You're taking the min() of two terms, each of which is min[i - p] + 1, where p is either 1 or some other square < i.
To fix this, just take the min() of min[i - p] + 1 over all square numbers p <= i.
That would be a correct way. There may be a faster way.
Also, it might aid readability if you give min[] and min() different names. :-)
P.S. The above approach requires that you memoize min[], either explicitly or as part of your DP framework. Otherwise the complexity of the algorithm, due to recursion, would be something like O(sqrt(n)!) :-p though the average case might be a lot better.
P.P.S. See @Nikita's answer for a nice implementation. To which I would add the following optimizations... (I'm not nitpicking his implementation -- he presented it as a simple one.)
Check whether n is a perfect square, before entering the outer loop: if so, min[n] = 1 and we're done.
Check whether i is a perfect square before entering the inner loop: if so, min[i] = 1, and skip the inner loop.
Break out of the inner loop if min[i] has been set to 2, because it won't get better (if it could be done with one square, we would never have entered the inner loop, thanks to the previous optimization).
I wonder if the termination condition on the inner loop can be changed to reduce the number of iterations, e.g. j*j*2 <= i or even j*j*4 <= i. I think so but I haven't got my head completely around it.
For large i, it would be faster to compute a limit for j before the inner loop, and compare j directly to it for the loop termination condition, rather than squaring j on every inner loop iteration. E.g.
double sqrti = Math.sqrt(i);
for (int j = 1; j <= sqrti; ++j) {
On the other hand, you need j^2 for the recursion step anyway, so as long as you store it, you might as well use it.
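As a sketch, here is the basic DP loop with the first three optimizations bolted on (Python, function names mine; math.isqrt(x) gives floor(sqrt(x))):

```python
import math

def min_squares_optimized(n):
    """Minimum number of squares summing to n, with the early exits
    described above: answer 1 immediately for perfect squares, and stop
    the inner loop once a value of 2 is reached (it cannot improve)."""
    def is_square(x):
        r = math.isqrt(x)
        return r * r == x
    if is_square(n):
        return 1  # optimization 1: n itself is a perfect square
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        if is_square(i):
            best[i] = 1  # optimization 2: skip the inner loop
            continue
        b = float("inf")
        for j in range(1, math.isqrt(i) + 1):
            b = min(b, best[i - j * j] + 1)
            if b == 2:
                break  # optimization 3: i is not a square, so 2 is optimal
        best[i] = b
    return best[n]
```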
For variety, here's another answer:
Define minsq[i, j] as the minimum number of squares from {1^2, 2^2, ..., j^2} that sum up to i. Then the recursion is:
minsq[i, j] = min(minsq[i - j*j, j] + 1, minsq[i, j - 1])
i.e., to compute minsq[i, j] we either use j^2 or we don't. Our answer for n is then:
minsq[n, floor(sqrt(n))]
This answer is perhaps conceptually simpler than the one presented earlier, but code-wise it is more difficult since one needs to be careful with the base cases. The time complexity for both answers is asymptotically the same.
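A memoized top-down Python sketch of this minsq recursion (names mine; math.isqrt(n) gives floor(sqrt(n))):

```python
from functools import lru_cache
import math

def min_squares(n):
    """minsq(i, j): minimum number of squares from {1^2, ..., j^2}
    summing to i; the answer for n is minsq(n, floor(sqrt(n)))."""
    @lru_cache(maxsize=None)
    def minsq(i, j):
        if i == 0:
            return 0
        if j == 0:
            return float("inf")  # no squares left but i > 0: unreachable
        best = minsq(i, j - 1)   # don't use j^2 at all
        if j * j <= i:
            best = min(best, minsq(i - j * j, j) + 1)  # use j^2 (reuse allowed)
        return best
    return minsq(n, math.isqrt(n))
```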
I present a generalized, very efficient dynamic programming algorithm in JavaScript to find the minimum number of positive integers of a given power that sum to a given target.
For example, to reach 50000 with integers of 4th power the result would be [10, 10, 10, 10, 10], and to reach 18571 with integers of 7th power it would be [3, 4]. The algorithm even works with rational powers: to reach 222 with integers of 3/5th power the result would be [32, 32, 243, 243, 243, 3125].
function getMinimumCubes(tgt, p) {
    var maxi = Math.floor(Math.fround(Math.pow(tgt, 1 / p))),
        hash = {0: []},
        pow = 0,
        t = 0;
    for (var i = 1; i <= maxi; i++) {
        pow = Math.fround(Math.pow(i, p));
        for (var j = 0; j <= tgt - pow; j++) {
            t = j + pow;
            hash[t] = hash[t] ? (hash[t].length <= hash[j].length ? hash[t]
                                                                  : hash[j].concat(i))
                              : hash[j].concat(i);
        }
    }
    return hash[tgt];
}
var target = 729,
result = [];
console.time("Done in");
result = getMinimumCubes(target,2);
console.timeEnd("Done in");
console.log("Minimum number of integers to square and add to reach", target, "is", result.length, "as", JSON.stringify(result));
console.time("Done in");
result = getMinimumCubes(target,6);
console.timeEnd("Done in");
console.log("Minimum number of integers to take 6th power and add to reach", target, "is", result.length, "as", JSON.stringify(result));
target = 500;
console.time("Done in");
result = getMinimumCubes(target,3);
console.timeEnd("Done in");
console.log("Minimum number of integers to cube and add to reach", target, "is", result.length, "as", JSON.stringify(result));
target = 2017;
console.time("Done in");
result = getMinimumCubes(target,4);
console.timeEnd("Done in");
console.log("Minimum number of integers to take 4th power and add to reach", target, "is", result.length, "as", JSON.stringify(result));
target = 99;
console.time("Done in");
result = getMinimumCubes(target,2/3);
console.timeEnd("Done in");
console.log("Minimum number of integers to take 2/3th power and add to reach", target, "are", result);

Coin changing algorithm

Suppose I have a set of coins having denominations a1, a2, ... ak.
One of them is known to be equal to 1.
I want to make change for all integers 1 to n using the minimum number of coins.
Any ideas for an algorithm?
eg. 1, 3, 4 coin denominations
n = 11
optimal selection is 3, 0, 2 in the order of coin denominations.
n = 12
optimal selection is 2, 2, 1.
Note: not homework just a modification of this problem
This is a classic dynamic programming problem (note first that the greedy algorithm does not always work here!).
Assume the coins are ordered so that a_1 > a_2 > ... > a_k = 1. We define a new problem. We say that the (i, j) problem is to find the minimum number of coins making change for j using coins a_i > a_(i + 1) > ... > a_k. The problem we wish to solve is (1, j) for any j with 1 <= j <= n. Say that C(i, j) is the answer to the (i, j) problem.
Now, consider an instance (i, j). We have to decide whether or not we are using one of the a_i coins. If we are not, we are just solving a (i + 1, j) problem and the answer is C(i + 1, j). If we are, we complete the solution by making change for j - a_i. To do this using as few coins as possible, we want to solve the (i, j - a_i) problem. We arrange things so that these two problems are already solved for us and then:
C(i, j) = C(i + 1, j) if a_i > j
= min(C(i + 1, j), 1 + C(i, j - a_i)) if a_i <= j
Now figure out what the initial cases are and how to translate this to the language of your choice and you should be good to go.
If you want to try your hand at another interesting problem that requires dynamic programming, look at Project Euler Problem 67.
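For concreteness, here is a bottom-up Python sketch of the C(i, j) recurrence above (it computes the minimum number of coins for each single amount 0..n; the function name and table layout are mine):

```python
def min_coins_table(denoms, n):
    """Fill the C(i, j) table bottom-up. denoms is sorted in decreasing
    order with denoms[-1] == 1. Returns a list where entry j is the
    minimum number of coins making change for amount j."""
    k = len(denoms)
    # C[i][j]: minimum coins for amount j using only denoms[i:].
    C = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(1, n + 1):
        C[k][j] = float("inf")  # no coins left but amount > 0: impossible
    for i in range(k - 1, -1, -1):
        for j in range(1, n + 1):
            C[i][j] = C[i + 1][j]              # don't use coin denoms[i]
            if denoms[i] <= j:
                # use one denoms[i] coin, then make change for the rest
                C[i][j] = min(C[i][j], 1 + C[i][j - denoms[i]])
    return C[0]
```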
Here's a sample implementation of a dynamic programming algorithm in Python. It is simpler than the algorithm that Jason describes, because it only calculates 1 row of the 2D table he describes.
Please note that using this code to cheat on homework will make Zombie Dijkstra cry.
import sys

def get_best_coins(coins, target):
    costs = [0]
    coins_used = [None]
    for i in range(1, target + 1):
        if i % 1000 == 0:
            print('...', end='')
        bestCost = sys.maxsize
        bestCoin = -1
        for coin in coins:
            if coin <= i:
                cost = 1 + costs[i - coin]
                if cost < bestCost:
                    bestCost = cost
                    bestCoin = coin
        costs.append(bestCost)
        coins_used.append(bestCoin)
    ret = []
    while target > 0:
        ret.append(coins_used[target])
        target -= coins_used[target]
    return ret

coins = [1, 10, 25]
target = 100033
print(get_best_coins(coins, target))
Solution in C#:
public static long findPermutations(int n, List<long> c)
{
    // The 2-dimensional buffer will contain answers to this question:
    // "how many permutations are there for an amount of `i` cents, and `j`
    // remaining coins?" E.g. `buffer[10][2]` will tell us how many
    // permutations there are when giving back 10 cents using only the
    // first two coin types [ 1, 2 ].
    long[][] buffer = new long[n + 1][];
    for (var i = 0; i <= n; ++i)
        buffer[i] = new long[c.Count + 1];

    // For all the cases where we need to give back 0 cents, there's exactly
    // 1 permutation: the empty set. Note that buffer[0][0] won't ever be
    // needed.
    for (var j = 1; j <= c.Count; ++j)
        buffer[0][j] = 1;

    // We process each case: 1 cent, 2 cents, etc. up to `n` cents, included.
    for (int i = 1; i <= n; ++i)
    {
        // No more coins? No permutation is possible to attain `i` cents.
        buffer[i][0] = 0;

        // Now we consider the cases where we have j coin types available.
        for (int j = 1; j <= c.Count; ++j)
        {
            // First, take into account all the known permutations possible
            // _without_ using the j-th coin (computed at the previous
            // loop step).
            var value = buffer[i][j - 1];

            // Then, add all the permutations possible by consuming the
            // j-th coin itself, if we can.
            if (c[j - 1] <= i)
                value += buffer[i - c[j - 1]][j];

            // We now know the answer for this specific case.
            buffer[i][j] = value;
        }
    }

    // Return the bottom-right answer, the one we were looking for in the
    // first place.
    return buffer[n][c.Count];
}
Following is a bottom-up dynamic programming approach in C#:
public static int CoinChange(int[] coins, int amount)
{
    int[] dp = new int[amount + 1];
    Array.Fill(dp, amount + 1);
    dp[0] = 0;
    for (int i = 1; i <= amount; i++)
    {
        for (int j = 0; j < coins.Length; j++)
        {
            if (coins[j] <= i) // the amount is at least the current coin
            {
                // refer to the already-calculated subproblem dp[i - coins[j]]
                dp[i] = Math.Min(dp[i], dp[i - coins[j]] + 1);
            }
        }
    }
    if (dp[amount] > amount)
        return -1;
    return dp[amount];
}
