Words with at least 2 common letters [closed] - algorithm

A string is named 2-consistent if each word has at least 2 letters in common with the next word.
For example:
"Atom another era" [atom has a and t in common with another, and another has e and a in common with era] (the answer is not unique).
First of all, I need a data structure that takes 2 words and answers, in constant time, the question "Do these words have at least 2 letters in common?"
Now, given a string of n words, I need to find the longest 2-consistent substring.
I can't figure out what data structure to use. I thought of a radix tree or a prefix tree, but I could not find the answer. Can you help me?

Assuming unaccented letters and ignoring capitalization, for each word you can store a bit-field in a 32-bit integer where bits 0-25 are set to 1 if the corresponding letter from a-z is present.
The integer can be computed in linear time like this:
/* Assumes word contains only lowercase letters a-z. */
int getBitField(const char* word)
{
    int bits = 0;
    while(*word)
        bits |= 1 << ((*word++) - 'a');   /* set the bit for this letter */
    return bits;
}
If the words are assumed to be words in English or some other language with a maximum word length, then the difference between constant and linear time is fairly meaningless: (for the sake of argument) every word shorter than the maximum length can be padded out with non-matching characters, which results in a constant-time algorithm.
Once you have the bit fields for two words you can test if they are 2-consistent in constant time by ANDing them together and checking if the result is not zero (which would indicate no letters in common) and not a power of 2 (which would indicate only one letter in common as only a single bit is set). You can test for a power of 2 by ANDing a number with itself minus 1.
bool is2Consistent(int word1bits, int word2bits)
{
int common = word1bits & word2bits;
return (common & (common - 1)) != 0;
}
This won't work if you intend to define words like 'meet' and 'beef' which have repeated letters as 2-consistent.
If you wanted to test for 3-consistency, you just need to add an extra line to the function:
bool is3Consistent(int word1bits, int word2bits)
{
int common = word1bits & word2bits;
common &= (common - 1);
return (common & (common - 1)) != 0;
}
ANDing an integer with itself minus one just clears the least significant set bit, so you could apply it an arbitrary number of times to test for 4-consistency, 5-consistency, etc.
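The same idea can be sketched in Python for any k. This is only an illustration of the bit-field approach described above; the names `letter_mask` and `is_k_consistent` are made up for this example, and characters outside a-z are assumed to be ignorable.

```python
def letter_mask(word: str) -> int:
    """Return a 26-bit mask with bit i set if chr(ord('a') + i) occurs in word."""
    mask = 0
    for ch in word.lower():
        if "a" <= ch <= "z":              # assumption: skip anything outside a-z
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def is_k_consistent(mask1: int, mask2: int, k: int = 2) -> bool:
    """True if the two masks share at least k distinct letters."""
    common = mask1 & mask2
    for _ in range(k - 1):
        common &= common - 1              # clear the lowest set bit
    return common != 0


atom, another = letter_mask("Atom"), letter_mask("another")
print(is_k_consistent(atom, another))                            # True (a, t, o)
print(is_k_consistent(letter_mask("meet"), letter_mask("beef")))  # False (only e)
```

Once the masks are precomputed, each test costs k - 1 AND operations, independent of word length.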

Part 1: Are wordOne and wordTwo 2-consistent?
public bool IsWordsTwoConsistent(string first, string second)
{
    // Marks which letters a-z occur in the first word (assumes only a-z input).
    bool[] letters = new bool[26];
    int countDoubles = 0;
    foreach (char c in first.ToLower())
    {
        letters[c - 'a'] = true;
    }
    foreach (char c in second.ToLower())
    {
        if (letters[c - 'a'])
        {
            countDoubles++;
            letters[c - 'a'] = false; // count each common letter only once
        }
        if (countDoubles > 1)
            return true;
    }
    return false;
}
Part 2: Longest 2-consistent substring
public int GetPositionLongestTwoConsistentSubstring(string input)
{
string[] wordsArray = input.Split(' ');
int maxLocation = -1, maxLength = 0;
int candLocation = -1, candLength = 0; // candidate
for (int i = 0 ; i < wordsArray.Length - 1 ; i++)
{
if (IsWordsTwoConsistent(wordsArray[i], wordsArray[i+1]))
{
candLength++;
if (candLocation == -1)
candLocation = i;
}
else
{
if (candLength > maxLength)
{
maxLength = candLength;
maxLocation = candLocation;
}
candLength = 0;
candLocation = -1;
}
}
if (candLength > maxLength)
{
maxLength = candLength;
maxLocation = candLocation;
}
return maxLocation;
}
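For comparison, here is a compact Python sketch of the same single-pass scan. It is not a translation of the code above: it returns both the starting word index and the number of words in the run, it uses set intersection for the common-letter test, and it assumes a single word counts as a run of length 1. The function name is hypothetical.

```python
def longest_two_consistent(text: str) -> tuple:
    """Return (start_index, word_count) of the longest 2-consistent run of words."""
    words = text.lower().split()
    if not words:
        return (-1, 0)
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(words)):
        # Fewer than 2 distinct common letters breaks the chain.
        if len(set(words[i - 1]) & set(words[i])) < 2:
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return (best_start, best_len)


print(longest_two_consistent("Atom another era"))         # (0, 3)
print(longest_two_consistent("xyz atom another era bb"))  # (1, 3)
```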

First of all I need a data structure which takes 2 words and answers
in constant time at the question "Do these words have at least 2
letters in common?"
Easy. First compute the adjacency matrix for the dictionary you are using, where 'adjacent' is defined to mean 'having at least two letters in common'. I disagree with the comments above: storing even a comprehensive English dictionary isn't very much data these days. Storing the full adjacency matrix might take too much space for your liking, so use sparse array facilities.
Now, bear in mind that an English word is just a number in base-26 (or base-52 if you insist on distinguishing capital letters) so looking up the row and column for a pair of words is a constant-time operation and you have the solution to your question.
Oh sure, this consumes space and takes a fair amount of pre-computation but OP asks about a data structure for answering the question in constant time.

Related

How to convert from any large arbitrary base to another

What I’d like to do is to convert a string from one "alphabet" to another, much like converting numbers between bases, but more abstract and with arbitrary digits.
For instance, converting "255" from the alphabet "0123456789" to the alphabet "0123456789ABCDEF" would result in "FF". One way to do this is to convert the input string into an integer, and then back again. Like so: (pseudocode)
int decode(string input, string alphabet) {
int value = 0;
for(i = 0; i < input.length; i++) {
int index = alphabet.indexOf(input[i]);
value += index * pow(alphabet.length, input.length - i - 1);
}
return value;
}
string encode(int value, string alphabet) {
string encoded = "";
while(value > 0) {
int index = value % alphabet.length;
encoded = alphabet[index] + encoded;
value = floor(value / alphabet.length);
}
return encoded;
}
Such that decode("255", "0123456789") returns the integer 255, and encode(255, "0123456789ABCDEF") returns "FF".
This works for small alphabets, but I’d like to be able to use base 26 (all the uppercase letters) or base 52 (uppercase and lowercase) or base 62 (uppercase, lowercase and digits), and values that are potentially over a hundred digits. The algorithm above would, theoretically, work for such alphabets, but, in practice, I’m running into integer overflow because the numbers get so big so fast when you start doing 62^100.
What I’m wondering is if there is an algorithm to do a conversion like this without having to keep up with such gigantic integers? Perhaps a way to begin the output of the result before the entire input string has been processed?
My intuition tells me that it might be possible, but my math skills are lacking. Any help would be appreciated.
There are a few similar questions here on StackOverflow, but none seem to be exactly what I'm looking for.
A general way to store numbers in an arbitrary base would be to store them as arrays of integers. Minimally, a number would be denoted by a base and an array of int (or short or long, depending on the range of bases you want) representing the digits in that base.
Next, you need to implement multiplication in that arbitrary base.
After that you can implement conversion (clue: if x is the old base, calculate x, x^2, x^3,..., in the new base. After that, multiply digits from old base accordingly to these numbers and then add them up).
Java-like Pseudocode:
ArbitraryBaseNumber source = new ArbitraryBaseNumber(11,"103A");
ArbitraryBaseNumber target = new ArbitraryBaseNumber(3,"0");
for(int digit : source.getDigitListAsIntegers()) { // [1,0,3,10]
target.incrementBy(digit);
if(not final digit) {
target.multiplyBy(source.base);
}
}
The challenge that remains, of course, is to implement ArbitraryBaseNumber, with incrementBy(int) and multiplyBy(int) methods. Essentially, to do that, you do in code exactly what a schoolchild does when doing addition and long multiplication on paper. Search online and you'll find examples.
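As a sketch of the digit-array idea, the conversion can also be done by repeated long division of the digit list by the target base, which never needs an integer larger than roughly the product of the two bases. This is a different technique from the multiply-and-add loop above, and `convert_base` is a name invented for this example. (Python integers are already arbitrary precision; the point is that the method ports to fixed-width languages.)

```python
def convert_base(text: str, src_alphabet: str, dst_alphabet: str) -> str:
    """Re-encode text from src_alphabet to dst_alphabet via repeated long division."""
    src_base, dst_base = len(src_alphabet), len(dst_alphabet)
    digits = [src_alphabet.index(ch) for ch in text]   # most significant first
    out = []
    while digits:
        remainder = 0
        quotient = []
        for d in digits:
            acc = remainder * src_base + d             # stays below src_base * dst_base
            q, remainder = divmod(acc, dst_base)
            if quotient or q:                          # skip leading zeros
                quotient.append(q)
        out.append(dst_alphabet[remainder])            # next least significant digit
        digits = quotient
    return "".join(reversed(out)) or dst_alphabet[0]


print(convert_base("255", "0123456789", "0123456789ABCDEF"))  # FF
print(convert_base("103A", "0123456789A", "012"))             # 1212220
```

Each pass costs time linear in the current number of digits, so the whole conversion is quadratic in the input length.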

algorithm for generating a random numeric string, 10,000 chars in length?

Can be in any language or even pseudocode. I was asked this in an interview question, and was curious what you guys can come up with.
I think this is a trick question - the obvious answer of generating digits using a standard library routine is almost certainly flawed, if you want to generate every possible 10000 digit number with equal probability...
If an algorithmic random number generator maintains n bits of state, then clearly it can generate at most 2^n possible different output sequences, because there are only 2^n different initial configurations.
2^33219 < 10^10000 < 2^33220, so if your algorithm uses fewer than 33220 bits of internal state, it cannot possibly generate some of the 10^10000 possible 10000-digit (decimal) numbers.
Typical standard library random number generators won't use anything like this much internal state. Even the Mersenne Twister (the most frequently mentioned generator with a large state that I'm aware of) only keeps 624 32-bit words (= 19968 bits) of state.
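The bound can be checked with exact integer arithmetic. The snippet below is just a sanity check of the figures quoted above; the variable names are arbitrary.

```python
# Number of distinct 10000-digit decimal strings (leading zeros allowed).
outcomes = 10 ** 10000

# Smallest n with 2**n >= outcomes, computed exactly with integers.
min_state_bits = (outcomes - 1).bit_length()

# Mersenne Twister state size quoted in the answer: 624 words of 32 bits.
mt_state_bits = 624 * 32

print(min_state_bits)                  # 33220
print(mt_state_bits)                   # 19968
print(mt_state_bits < min_state_bits)  # True
```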
Just one of many ways. You can pass in any string of the alphabet of characters you want to use:
public class RandomUtils
{
private static readonly Random random = new Random((int)DateTime.Now.Ticks);
public static string GenerateRandomDigitString(int length)
{
const string digits = "1234567890";
return GenerateRandomString(length, digits);
}
public static string GenerateRandomAlphaString(int length)
{
const string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return GenerateRandomString(length, alpha);
}
public static string GenerateRandomString(int length, string alphabet)
{
int maxlen = alphabet.Length;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < length; i++)
{
sb.Append(alphabet[random.Next(0, maxlen)]);
}
return sb.ToString();
}
}
Without additional requirements, this will work:
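StringBuilder randomStr = new StringBuilder(10000);
Random rnd = new Random();
for (int i = 0; i < 10000; i++)
{
    // Any UTF-16 code unit, so most results are not printable.
    char randomChar = (char)rnd.Next(char.MinValue, char.MaxValue + 1);
    randomStr.Append(randomChar);
}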
This will result in unprintable characters and other unpleasantness. Using ASCII you can get letters, numbers and punctuation by sticking to the range 32 - 126, or by creating a random number between 0 and 94 and adding 32. Not sure which aspect they were looking for in the question.
BTW, no, I did not know the visible range off the top of my head; I looked it up on Wikipedia.
Generate a number in the range 0..9. Convert it to a digit. Stuff that into a string. Repeat 10000 times.
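A minimal Python sketch of that recipe follows. It draws each digit from the operating system's entropy source via the standard `secrets` module rather than from a seeded pseudo-random generator; that choice, the function name, and the decision to allow a leading zero are assumptions of this example, not something the question specifies.

```python
import secrets

DIGITS = "0123456789"


def random_digit_string(length: int = 10000) -> str:
    """Build a string of `length` digits, each chosen uniformly from 0-9."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


s = random_digit_string()
print(len(s))      # 10000
print(s[:20])      # first 20 digits; differs on every run
```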
I always like saying that computer random numbers are only ever pseudo-random. Anyway, your favourite language will invariably have a random library. Next, what is a numeric string? 0-9 valued for each character? Well, let's start with that assumption. So we can generate bytes in the range of the ASCII codes for 0-9 by taking the offset (48) and adding (int)(random * 10) (since random generators typically return floats). Then place these all in a char buffer 10000 long and convert it to a string.
Return a string containing 10,000 1s -- that's just as random as any other digit string of the same length.
I think the real question was to determine what the interviewer actually wanted. For example, random in what sense? Incompressible? Random over multiple runs of the same algorithm? Etc.
You can start with a list of seed digits:
seeds = [4,9,3,1,2,5,5,4,4,8,4,3] # This should be relatively large
Then, use a counter to keep track of which digit was last used. This would be system-wide and shouldn't reset with the system:
def next_digit():
    counter = 0
    while True:
        yield counter
        counter += 1
pos_it = next_digit()
rand_it = next_digit()
Next, use an algorithm that uses modulus to determine the "next number":
def random_digit():
    position = next(pos_it) % len(seeds)
    digit = seeds[position] * next(rand_it)
    return digit % 10
Last, generate 10,000 of those digits.
output = ""
for i in range(10000):
    output = "%s%s" % (output, random_digit())
I believe that an ideal answer would use more prime numbers, but this should be pretty sufficient.

Sort N numbers in digit order

Given a range of N numbers, e.g. [1 to 100], sort the numbers in digit order, i.e. for the numbers 1 to 100, the sorted output would be
1 10 100 11 12 13 . . . 19 2 20 21..... 99
This is just like Radix Sort but just that the digits are sorted in reversed order to what would be done in a normal Radix Sort.
I tried to store all the digits in each number as a linked list for faster operation but it results in a large Space Complexity.
I need a working algorithm for the question.
From all the answers, "Converting to Strings" is an option, but is there no other way this can be done?
Also an algorithm for Sorting Strings as mentioned above can also be given.
Use any sorting algorithm you like, but compare the numbers as strings, not as numbers. This is basically lexicographic sorting of regular numbers. Here's an example gnome sort in C:
#include <stdlib.h>
#include <string.h>
void sort(int* array, int length) {
    int* iter = array;
    char buf1[12], buf2[12];
    while(iter < array+length) {
        if(iter == array || strcmp(itoa(*iter, buf1, 10), itoa(*(iter-1), buf2, 10)) >= 0) {
            iter++;
        } else {
            /* out of order: swap with the previous element and step back */
            int tmp = *iter;
            *iter = *(iter-1);
            *(iter-1) = tmp;
            iter--;
        }
    }
}
Of course, this requires the non-standard itoa function to be present in stdlib.h. A more standard alternative would be to use sprintf, but that makes the code a little more cluttered. You'd possibly be better off converting the whole array to strings first, then sorting, then converting it back.
Edit: For reference, the relevant bit here is strcmp(itoa(*iter, buf1, 10), itoa(*(iter-1), buf2, 10)) >= 0, which replaces *iter >= *(iter-1).
I have a solution, but not exactly an algorithm. All you need to do is convert all the numbers to strings and sort them as strings.
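In a language with a key-based sort this is a one-liner. The Python sketch below only illustrates the "sort as strings" idea; the variable names are arbitrary.

```python
numbers = list(range(1, 101))

# Compare by decimal string representation instead of numeric value.
digit_order = sorted(numbers, key=str)

print(digit_order[:5])   # [1, 10, 100, 11, 12]
print(digit_order[-1])   # 99
```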
Here is how you can do it with a recursive function (the code is in Java):
void doOperation(List<Integer> list, int prefix, int minimum, int maximum) {
for (int i = 0; i <= 9; i++) {
int newNumber = prefix * 10 + i;
if (newNumber >= minimum && newNumber <= maximum) {
list.add(newNumber);
}
if (newNumber > 0 && newNumber <= maximum) {
doOperation(list, newNumber, minimum, maximum);
}
}
}
You call it like this:
List<Integer> numberList = new ArrayList<Integer>();
int min=1, max =100;
doOperation(numberList, 0, min, max);
System.out.println(numberList.toString());
EDIT:
I translated my code in C++ here:
#include <stdio.h>
void doOperation(int list[], int &index, int prefix, int minimum, int maximum) {
for (int i = 0; i <= 9; i++) {
int newNumber = prefix * 10 + i;
if (newNumber >= minimum && newNumber <= maximum) {
list[index++] = newNumber;
}
if (newNumber > 0 && newNumber <= maximum) {
doOperation(list, index, newNumber, minimum, maximum);
}
}
}
int main(void) {
int min=1, max =100;
int* numberList = new int[max-min+1];
int index = 0;
doOperation(numberList, index, 0, min, max);
printf("[");
for(int i=0; i<max-min+1; i++) {
printf("%d ", numberList[i]);
}
printf("]");
return 0;
}
Basically, the idea is: for each digit (0-9), I add it to the array if it is between minimum and maximum. Then, I call the same function with this digit as prefix. It does the same: for each digit, it adds it to the prefix (prefix * 10 + i) and if it is between the limits, it adds it to the array. It stops when newNumber is greater than maximum.
I think if you convert the numbers to strings, you can use string comparison to sort them.
You can use any sorting algorithm for it.
"1" < "10" < "100" < "11" ...
Optimize the way you are storing the numbers: use a binary-coded decimal (BCD) type that gives simple access to a specific digit. Then you can use your current algorithm, which Steve Jessop correctly identified as most significant digit radix sort.
I tried to store all the digits in
each number as a linked list for
faster operation but it results in a
large Space Complexity.
Storing each digit in a linked list wastes space in two different ways:
A digit (0-9) only requires 4 bits of memory to store, but you are probably using anywhere from 8 to 64 bits. A char takes 8 bits, a short typically 16 bits, and an int can take up to 64 bits. That's using 2X to 16X more memory than the optimal solution!
Linked lists add additional unneeded memory overhead. For each digit, you need an additional 32 to 64 bits to store the memory address of the next link. Again, this increases the memory required per digit by 8X to 16X.
A more memory-efficient solution stores BCD digits contiguously in memory:
BCD only uses 4 bits per digit.
Store the digits in a contiguous memory block, like an array. This eliminates the need to store memory addresses. You don't need linked lists' ability to easily insert/delete from the middle. If you need the ability to grow the numbers to an unknown length, there are other abstract data types that allow that with much less overhead. For example, a vector.
One option, if other operations like addition/multiplication are not important, is to allocate enough memory to store each BCD digit plus one BCD terminator. The BCD terminator can be any combination of 4 bits that is not used to represent a BCD digit (like binary 1111). Storing this way will make other operations like addition and multiplication trickier, though.
Note this is very similar to the idea of converting to strings and lexicographically sorting those strings. Integers are internally stored as binary (base 2) in the computer. Storing in BCD is more like base 10 (base 16, actually, but 6 combinations are ignored), and strings are like base 256. Strings will use about twice as much memory, but there are already efficient functions written to sort strings. BCD will probably require developing a custom BCD type for your needs.
Edit: I missed that it's a contiguous range. That being the case, all the answers which talk about sorting an array are wrong (including your idea stated in the question that it's like a radix sort), and True Soft's answer is right.
just like Radix Sort but just that the digits are sorted in reversed order
Well spotted :-) If you actually do it that way, funnily enough, it's called an MSD radix sort.
http://en.wikipedia.org/wiki/Radix_sort#Most_significant_digit_radix_sorts
You can implement one very simply, or with a lot of high technology and fanfare. In most programming languages, your particular example faces a slight difficulty: extracting decimal digits from the natural storage format of an integer isn't an especially fast operation. You can ignore this and see how long it ends up taking (recommended), or you can add yet more fanfare by converting all the numbers to decimal strings before sorting.
Of course you don't have to implement it as a radix sort: you could use a comparison sort algorithm with an appropriate comparator. For example in C, the following is suitable for use with qsort (unless I've messed it up):
int lex_compare(const void *a, const void *b) {
    char a_str[12]; // assuming 32-bit int: 10 digits, sign, terminator
    char b_str[12];
    sprintf(a_str, "%d", *(const int*)a);
    sprintf(b_str, "%d", *(const int*)b);
    return strcmp(a_str, b_str);
}
Not terribly efficient, since it does a lot of repeated work, but straightforward.
If you do not want to convert them to strings, but have enough space to store an extra copy of the list, I would store the largest power of ten not greater than the element in the copy. This is probably easiest to do with a loop. Now call your original array x and the powers of ten y.
int findPower(int x) {
    int y = 1;
    while (y <= x / 10) { // same as y * 10 <= x, but cannot overflow
        y = y * 10;
    }
    return y;
}
You could also compute them directly
y = exp10(floor(log10(x)));
but I suspect that the iteration may be faster than the conversions to and from floating point.
In order to compare the ith and jth elements
bool compare(int i, int j) {
if (y[i] < y[j]) {
int ti = x[i] * (y[j] / y[i]);
if (ti == x[j]) {
return (y[i] < y[j]); // the compiler will optimize this
} else {
return (ti < x[j]);
}
} else if (y[i] > y[j]) {
int tj = x[j] * (y[i] / y[j]);
if (x[i] == tj) {
return (y[i] < y[j]); // the compiler will optimize this
} else {
return (x[i] < tj);
}
} else {
return (x[i] < x[j]);
}
}
What is being done here is that we multiply the smaller number by the appropriate power of ten so that the two numbers have an equal number of digits, then compare them. If the two modified numbers are equal, then compare the digit lengths.
If you do not have the space to store the y arrays you can compute them on each comparison.
In general, you are likely better off using the preoptimized digit conversion routines.
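The scaling comparison can be sketched in Python as a three-way comparator and checked against plain string sorting. This assumes positive integers only, and the names `power_of_ten` and `digit_order_cmp` are made up for the example.

```python
from functools import cmp_to_key


def power_of_ten(x: int) -> int:
    """Largest power of ten that is not greater than x (x >= 1)."""
    y = 1
    while y * 10 <= x:
        y *= 10
    return y


def digit_order_cmp(a: int, b: int) -> int:
    """Three-way comparison of positive ints in digit (lexicographic) order."""
    pa, pb = power_of_ten(a), power_of_ten(b)
    scale = max(pa, pb)
    sa = a * (scale // pa)            # pad the shorter number with zeros
    sb = b * (scale // pb)
    if sa != sb:
        return -1 if sa < sb else 1
    return (pa > pb) - (pa < pb)      # equal after padding: shorter one first


result = sorted(range(1, 101), key=cmp_to_key(digit_order_cmp))
print(result[:5])                              # [1, 10, 100, 11, 12]
print(result == sorted(range(1, 101), key=str))  # True
```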

C/C++/Java/C#: help parsing numbers

I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.
The data look like this:
6,852:6,100,752
So there's first a number made of up to 9 digits, followed by a colon.
Then I know for sure that, after the colon:
there's at least one valid combination of numbers that add up to the number before the colon
I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)
In this case, 6852 is 6100 + 752.
My problem: I need to find these numbers (in this example, 6100 + 752).
It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the number themselves (6100 is written as 6,100).
Once again: that unfortunate formatting is not under my control and, once again, this is not homework.
I need to solve this for up to 10 numbers that need to add up.
Here's an example with three numbers adding up to 6855:
6,855:360,6,175,320
I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.
How do you typically solve such a problem in a C-style bracket language?
Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.
(1) You know that any group of 1 or 2 digits must begin a term.
(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)
Recursive implementation (pseudo code):
int total; // The total read before the colon
// Takes the list of tokens as integers after the colon
// tokens is the list of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
    if isEmpty(tokens)
        if sum = total return partialList
        else return null // Got to the end with the wrong sum
    // Option 1: tokens[0] is a number on its own
    var result1 = getNumbers(tokens[1:end], sum+tokens[0], Append(partialList, tokens[0]))
    if result1 <> null return result1
    // Option 2: tokens[0] and tokens[1] form one number (needs at least two tokens)
    if length(tokens) >= 2
        var result2 = getNumbers(tokens[2:end], sum+Aggregate(tokens[0], tokens[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
        if result2 <> null return result2
    return null // No solution overall
Divide-and-conquer would make for a nice improvement.
I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.
One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.
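A small Python sketch of this "try every parse" approach is shown below. It keeps the digit groups as strings so that the three-digit rule can be checked, prunes any candidate term that exceeds the total, and returns all parses with the requested number of terms. The function name `split_terms` and its signature are assumptions for illustration.

```python
def split_terms(line: str, count: int) -> list:
    """Return every way to read the right-hand side as `count` numbers summing to the total."""
    left, right = line.split(":")
    total = int(left.replace(",", ""))
    groups = right.split(",")
    solutions = []

    def search(pos: int, terms: list) -> None:
        if pos == len(groups):
            if len(terms) == count and sum(terms) == total:
                solutions.append(list(terms))
            return
        if len(terms) == count:
            return                          # groups left over but no terms left
        if len(groups[pos]) > 1 and groups[pos][0] == "0":
            return                          # a term cannot start with a zero-padded group
        value = 0
        for end in range(pos, len(groups)):
            # Only the first group of a term may have fewer than three digits.
            if end > pos and len(groups[end]) != 3:
                break
            value = value * 1000 + int(groups[end])
            if value > total:
                break                       # longer terms would only be larger
            terms.append(value)
            search(end + 1, terms)
            terms.pop()

    search(0, [])
    return solutions


print(split_terms("6,852:6,100,752", 2))      # [[6100, 752]]
print(split_terms("6,855:360,6,175,320", 3))  # [[360, 6175, 320]]
```

If the returned list has more than one entry, the input is genuinely ambiguous and the caller has to decide how to handle it.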
Reading in C++:
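#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
// Not defined here: picks the grouping of digit groups that sums to the total.
std::vector<int> get_winning_combination( int total
                                        , std::vector<int>::const_iterator begin
                                        , std::vector<int>::const_iterator end );
std::pair<int,std::vector<int> > read_numbers(std::istream& is)
{
    std::pair<int,std::vector<int> > result;
    result.first = 0;
    int i;
    char ch = ',';
    // The total before the colon is itself comma-grouped, e.g. "6,852".
    while(ch == ',') {
        if(!(is >> i)) throw "foo!";
        result.first = result.first * 1000 + i;
        if(!(is >> ch)) throw "bar!";
    }
    if(ch != ':') throw "bar!";
    // After the colon: comma-separated digit groups until end of input.
    for(;;) {
        if(!(is >> i)) throw "bar!";
        result.second.push_back(i);
        if(!(is >> ch)) return result; // end of input
        if(ch != ',') throw "foobar!";
    }
}
void f()
{
    std::istringstream iss("6,852:6,100,752");
    std::pair<int,std::vector<int> > foo = read_numbers(iss);
    std::vector<int> result = get_winning_combination( foo.first
                                                     , foo.second.begin()
                                                     , foo.second.end() );
    for( std::vector<int>::const_iterator i=result.begin(); i!=result.end(); ++i)
        std::cout << *i << " ";
}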
The actual cracking of the numbers is left as an exercise to the reader. :)
I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.
For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a big polynomial complexity (O(n^x), where x is > 4).
Actually checking for which numbers add up is easy to do using a recursive search.
// Requires: using System.Collections.Generic; using System.Linq;
List<int> GetSummands(int total, int numberOfElements, List<int> values)
{
    if (numberOfElements == 0)
    {
        if (total == 0)
            return new List<int>(); // Empty list.
        else
            return null; // Indicate no solution.
    }
    else if (total < 0)
    {
        return null; // Indicate no solution.
    }
    else
    {
        for (int i = 0; i < values.Count; ++i)
        {
            List<int> summands = GetSummands(
                total - values[i], numberOfElements - 1, values.Skip(i + 1).ToList());
            if (summands != null)
            {
                // Found solution.
                summands.Add(values[i]);
                return summands;
            }
        }
        return null; // No combination of the remaining values works.
    }
}

Edit Distance Algorithm

I have a dictionary of 'n' words given and there are 'm' Queries to respond to. I want to output the number of words in dictionary which are edit distance 1 or 2. I want to optimize the result set given that n and m are roughly 3000.
Edit added from answer below:
I will try to word it differently.
Initially there are 'n' words given as a set of Dictionary words. Next 'm' words are given which are query words and for each query word, I need to find if the word already exists in Dictionary (Edit Distance '0') or the total count of words in dictionary which are at edit distance 1 or 2 from the dictionary words.
I hope the Question is now Clear.
Well, it times out if the time complexity is (m*n)*n. The naive use of the DP edit distance algorithm times out. Even calculating only the 2*k+1 diagonal elements times out, where k is the threshold (k=3 in the above case).
You want to use the Levenshtein distance between two words, but I assume you know that since that's what the question's tags say.
You would have to iterate through your List (assumption) and compare every word in the list with the current query you're executing. You could build a BK-tree to limit your search space, but that sounds like an overkill if you only have ~3000 words.
var upperLimit = 2;
var allWords = GetAllWords();
var matchingWords = allWords
.Where(word => Levenshtein(query, word) <= upperLimit)
.ToList();
Added after edit of original question
Finding cases where distance=0 would be easy Contains-queries if you have a case-insensitive dictionary. Those cases where distance <= 2 would require a complete scan of the search space, 3000 comparisons per query word. Assuming an equal number of query words, that results in 9 million comparisons.
You mention that it times out, so I presume you have a timeout configured? Could your speed be due to a poor, or slow, implementation of the Levenshtein calculation?
(Graph omitted; image source: itu.edu.tr, taken from CLiki: bk-tree.)
As seen, using a BK-tree with an edit distance <= 2 would only visit about 1% of the search space, but that's assuming that you have very large input data, in their case up to half a million words. I would assume similar numbers in your case, but such a small number of inputs wouldn't cause much trouble even if stored in a List/Dictionary.
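For reference, a BK-tree is small enough to sketch in a few lines of Python. This is a minimal illustration under the assumption that nodes are plain (word, children) tuples; the class and method names are invented here and do not come from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic DP edit distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, words):
        self.root = None                  # node = (word, {distance: child node})
        for w in words:
            self.add(w)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return                    # already present
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def query(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = levenshtein(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: matches can only sit in these subtrees.
            for child_dist, child in children.items():
                if d - max_dist <= child_dist <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"])
print(tree.query("bok", 1))   # [(1, 'boo'), (1, 'book')]
```

The count the question asks for is then `len(tree.query(word, 2))`, after checking exact membership separately.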
I will try to word it differently.
Initially there are 'n' words given as a set of Dictionary words.
Next 'm' words are given which are query words and for each query word, I need to find if the word already exists in Dictionary (Edit Distance '0') or the total count of words in dictionary which are at edit distance 1 or 2 from the dictionary words.
I hope the Question is now Clear.
Well, it times out if the time complexity is (m*n)*n. The naive use of the DP edit distance algorithm times out.
Even calculating only the 2*k+1 diagonal elements times out, where k is the threshold (k=3 in the above case).
PS: A BK-tree should suffice for the purpose. Any links about an implementation in C++?
public class Solution {
public int minDistance(String word1, String word2) {
int[][] table = new int[word1.length()+1][word2.length()+1];
for(int i = 0; i < table.length; ++i) {
for(int j = 0; j < table[i].length; ++j) {
if(i == 0)
table[i][j] = j;
else if(j == 0)
table[i][j] = i;
else {
if(word1.charAt(i-1) == word2.charAt(j-1))
table[i][j] = table[i-1][j-1];
else
table[i][j] = 1 + Math.min(Math.min(table[i-1][j-1],
table[i-1][j]), table[i][j-1]);
}
}
}
return table[word1.length()][word2.length()];
}
}
Check out this simplified solution, solved using dynamic programming (recursion with memoization):
from functools import lru_cache
class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        return self.edit_distance(word1, word2)
    @lru_cache(maxsize=None)  # memoize so each (s, t) pair is solved only once
    def edit_distance(self, s, t):
        # Edge conditions
        if len(s) == 0:
            return len(t)
        if len(t) == 0:
            return len(s)
        # If 1st char matches
        if s[0] == t[0]:
            return self.edit_distance(s[1:], t[1:])
        else:
            return min(
                1 + self.edit_distance(s[1:], t),     # delete
                1 + self.edit_distance(s, t[1:]),     # insert
                1 + self.edit_distance(s[1:], t[1:])  # replace
            )
