String Match with prefixes

String Match with prefixes - algorithm

I have a list of strings that serve as prefixes (assume the list is huge). Given a name/string, I want to find which prefix from the list is the longest one that matches it.
For example, prefix list: ['good', 'goo', 'go']
Input: name 'goodboy', result: 'good'
For a small list we can use normal search/match techniques, but for huge data, can someone suggest how I can improve this?

You can use a trie.
Here is an implementation:
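```python
class Trie(dict):
    """A trie stored as nested dicts; the key "end" marks a complete prefix."""

    def add(self, s):
        node = self
        for ch in s:
            if ch not in node:
                node[ch] = Trie()
            node = node[ch]
        node["end"] = True

    def findprefix(self, s):
        """Return the longest stored prefix of s ('' if none matches)."""
        node = self
        longest = 0
        for i, ch in enumerate(s):
            if "end" in node:
                longest = i
            if ch not in node:
                break
            node = node[ch]
        else:
            # All of s was consumed: s itself may be a stored prefix.
            if "end" in node:
                longest = len(s)
        return s[:longest]


trie = Trie()
for s in ["good", "goo", "go", "goodbyeparty"]:
    trie.add(s)
print(trie.findprefix("goodbye"))  # "good"
```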
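For comparison, here is a minimal sketch that needs no trie at all (an alternative, not the method of the answer above; the function names are hypothetical): keep the prefixes in a hash set and probe the name's own prefixes from longest to shortest, trying only lengths that actually occur in the list. Each query costs at most one slice and one set lookup per distinct prefix length, whereas the trie walks the name only once.

```python
def build_index(prefixes):
    """Precompute a hash set of prefixes and their distinct lengths, longest first."""
    prefix_set = set(prefixes)
    lengths = sorted({len(p) for p in prefix_set}, reverse=True)
    return prefix_set, lengths


def longest_prefix(index, name):
    """Return the longest stored prefix of `name`, or None if none matches."""
    prefix_set, lengths = index
    for n in lengths:
        # Only slice lengths that actually occur in the prefix list.
        if n <= len(name) and name[:n] in prefix_set:
            return name[:n]
    return None


index = build_index(['good', 'goo', 'go'])
print(longest_prefix(index, 'goodboy'))  # good
```

The trie is usually the better choice when the prefix list has many different lengths; the set version is attractive when there are only a few.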

Related

Optimize the prefix tree for string search

I'm thinking about improving my prefix tree. It lets me search for a specified number of words that start with the input string.
Task: we need a class that suggests company names by prefix: from the list of all available names, output a given number of companies whose names start with the entered string. The class is expected to be called while a user fills out a form on a website or mobile app with a high RPS (requests per second).
My Solution:
```scala
class SuggestService(companyNames: Seq[String]) {
  val arrayCompany = companyNames.toArray
  val tree = getTree(Ternary.apply, 0)

  def getTree(tree: Ternary, index: Int): Ternary = {
    if (index == arrayCompany.length - 1)
      return tree.insert(arrayCompany(index))
    getTree(tree.insert(arrayCompany(index)), index + 1)
  }

  def suggest(input: String, numberOfSuggest: Int): Seq[String] = {
    val result = tree.keysWithPrefix(input)
    result.take(numberOfSuggest)
  }
}
```
Tree Class:
```scala
sealed trait Ternary {
  def insert(key: String): Ternary = Ternary.insert(this, key, 0)
  def keysWithPrefix(prefix: String): List[String] = Ternary.keys(this, prefix)
}

case class Node(value: Option[Int], char: Char, left: Ternary, mid: Ternary, right: Ternary) extends Ternary
case object Leaf extends Ternary

object Ternary {
  def apply: Ternary = Leaf

  private def keys(root: Ternary, prefix: String): List[String] =
    get(root, prefix, 0) match {
      case Some(node: Node) =>
        // Only this node's own word and its mid subtree extend the prefix;
        // node.left and node.right hold other characters at this position.
        val own = if (node.value.isDefined) List(prefix) else Nil
        own ++ collect(node.mid, prefix)
      case _ => Nil
    }

  private def collect(node: Ternary, prefix: String): List[String] =
    node match {
      case Leaf => Nil
      case node: Node if node.value.isDefined =>
        (prefix + node.char) +: (collect(node.left, prefix) ++ collect(node.mid, prefix + node.char) ++ collect(node.right, prefix))
      case node: Node =>
        collect(node.left, prefix) ++ collect(node.mid, prefix + node.char) ++ collect(node.right, prefix)
    }

  private def get(root: Ternary, prefix: String, step: Int): Option[Ternary] = root match {
    case Leaf => None
    case node: Node if node.char > prefix.charAt(step) => get(node.left, prefix, step)
    case node: Node if node.char < prefix.charAt(step) => get(node.right, prefix, step)
    case node: Node if step < prefix.length - 1 => get(node.mid, prefix, step + 1)
    case node: Node => Some(node)
  }

  private def insert(root: Ternary, key: String, step: Int): Ternary = root match {
    case Leaf =>
      val node = Node(None, key.charAt(step), Leaf, Leaf, Leaf)
      insert(node, key, step)
    case node: Node if node.char > key.charAt(step) =>
      val left = insert(node.left, key, step)
      node.copy(left = left)
    case node: Node if node.char < key.charAt(step) =>
      val right = insert(node.right, key, step)
      node.copy(right = right)
    case node: Node if step < key.length - 1 =>
      val mid = insert(node.mid, key, step + 1)
      node.copy(mid = mid)
    case node: Node =>
      node.copy(value = Some(0))
  }
}
```
The solution works and looks like it should be efficient, but I'm not fully satisfied with it.
By the task statement, we must return a list of numberOfSuggest words.
But I make the tree return all the words that start with input, and only then take the required number of words from the resulting list:
```scala
def suggest(input: String, numberOfSuggest: Int): Seq[String] = {
  val result = tree.keysWithPrefix(input)
  result.take(numberOfSuggest)
}
```
I want to save time by teaching the tree to return a ready-made list of words already limited to numberOfSuggest.
Experiment: https://scastie.scala-lang.org/m0MxnlChT0GkpGJIBnNnUQ
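One way to get there is to make the traversal lazy, so that taking the first numberOfSuggest results stops the walk early. The sketch below is in Python rather than Scala, and its names (`insert`, `suggest`, and so on) are hypothetical, not the question's Ternary API: a generator plus `itertools.islice` plays the role of the limit. In Scala, returning an Iterator (or LazyList) from collect and calling take on it should behave similarly. Note that it descends only into the matched node's mid subtree, since its left/right children hold different characters at that position.

```python
from itertools import islice


class Node:
    """One ternary-search-tree node: a character plus left/mid/right children."""

    __slots__ = ("char", "is_word", "left", "mid", "right")

    def __init__(self, char):
        self.char = char
        self.is_word = False
        self.left = self.mid = self.right = None


def insert(root, key):
    """Insert a non-empty key and return the (possibly new) root."""
    def _insert(node, i):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.left = _insert(node.left, i)
        elif ch > node.char:
            node.right = _insert(node.right, i)
        elif i < len(key) - 1:
            node.mid = _insert(node.mid, i + 1)
        else:
            node.is_word = True
        return node
    return _insert(root, 0)


def _find(node, prefix):
    """Return the node on which `prefix` ends, or None."""
    i = 0
    while node is not None:
        ch = prefix[i]
        if ch < node.char:
            node = node.left
        elif ch > node.char:
            node = node.right
        elif i < len(prefix) - 1:
            node, i = node.mid, i + 1
        else:
            return node
    return None


def _walk(node, prefix):
    """Lazily yield every word below `node`, in sorted order."""
    if node is None:
        return
    yield from _walk(node.left, prefix)
    if node.is_word:
        yield prefix + node.char
    yield from _walk(node.mid, prefix + node.char)
    yield from _walk(node.right, prefix)


def suggest(root, prefix, limit):
    """Return at most `limit` words starting with a non-empty `prefix`."""
    node = _find(root, prefix) if prefix else None
    if node is None:
        return []

    def matches():
        if node.is_word:
            yield prefix
        yield from _walk(node.mid, prefix)

    # islice stops pulling after `limit` items, so the rest of the
    # subtree is never visited.
    return list(islice(matches(), limit))
```

For example, after inserting "apple", "applied", "apply", "apt", "banana" and "app", `suggest(root, "app", 2)` visits only enough nodes to produce `["app", "apple"]`.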

DNA subsequence dynamic programming question

I'm trying to solve a DNA problem that is a variation (an extension?) of the LCS problem.
The problem defines a semi-substring: a subsequence of a string in which at most one letter is skipped between any two consecutive chosen letters. For example, "desktop" has the semi-substrings {"destop", "dek", "stop", "skop", "desk", "top"}, each of which skips one letter or none at each step.
Now I'm given two DNA strings over the alphabet {a, t, g, c}. I'm trying to find the longest common semi-substring (LSS), and if there is more than one LSS, print the one that comes first.
For example, for the two DNA strings attgcgtagcaatg and tctcaggtcgatagtgac, the output is "tctagcaatg",
and for aaaattttcccc and cccgggggaatatca, the output is "aattc".
I'm trying to use the standard LCS algorithm, but I can't solve this one with tables, although I did solve the version with no letter skipped. Any advice?

This is a variation on the dynamic programming solution for LCS, written in Python.
First I build a suffix tree of all the substrings that can be formed from each string under the skip rule. Then I intersect the two suffix trees. Finally, I look for the longest string that can be read from that intersection tree.
Note that this is technically O(n^2). The worst case is when both strings consist of the same character repeated over and over, because you end up with many states that logically mean something like "an 'l' at position 42 in one string could have matched the 'l' at position 54 in the other". In practice, though, it should behave like O(n).
```python
def find_subtree(text, max_skip=1):
    tree = {}
    tree_at_position = {}

    def subtree_from_position(position):
        if position not in tree_at_position:
            this_tree = {}
            if position < len(text):
                char = text[position]
                # Make sure that we've populated the further tree.
                subtree_from_position(position + 1)
                # If this char appeared later, include those possible matches.
                if char in tree:
                    for char2, subtree in tree[char].items():
                        this_tree[char2] = subtree
                # And now update the new choices.
                for skip in range(max_skip + 1, 0, -1):
                    if position + skip < len(text):
                        this_tree[text[position + skip]] = subtree_from_position(position + skip)
                tree[char] = this_tree
            tree_at_position[position] = this_tree
        return tree_at_position[position]

    subtree_from_position(0)
    return tree


def find_longest_common_semistring(text1, text2):
    tree1 = find_subtree(text1)
    tree2 = find_subtree(text2)

    # Memoized by the identities of the two subtrees being intersected.
    answered = {}

    def find_intersection(subtree1, subtree2):
        unique = (id(subtree1), id(subtree2))
        if unique not in answered:
            answer = {}
            for k, v in subtree1.items():
                if k in subtree2:
                    answer[k] = find_intersection(v, subtree2[k])
            answered[unique] = answer
        return answered[unique]

    found_longest = {}

    def find_longest(tree):
        if id(tree) not in found_longest:
            best_candidate = ''
            for char, subtree in tree.items():
                candidate = char + find_longest(subtree)
                if len(best_candidate) < len(candidate):
                    best_candidate = candidate
            found_longest[id(tree)] = best_candidate
        return found_longest[id(tree)]

    intersection_tree = find_intersection(tree1, tree2)
    return find_longest(intersection_tree)


print(find_longest_common_semistring("attgcgtagcaatg", "tctcaggtcgatagtgac"))
```

Let g(c, rs, rt) be the longest common semi-substring of strings S and T that ends at the rs-th occurrence of character c in S and at the rt-th occurrence of c in T (rs and rt are the ranks of those occurrences), and let K be the number of skips allowed. Then g can be defined by a recursion over earlier matching positions, which we have to evaluate for every pair of occurrences of each character c in S and T.
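Written out with positions instead of ranks, the recursion is roughly the following (a restatement of what the JavaScript code below computes, using 0-based indices). If $i$ and $j$ are the positions in $S$ and $T$ of the $r_s$-th and $r_t$-th occurrences of $c$, then $g(c, r_s, r_t) = L(i, j)$, where

$$
L(i,j) =
\begin{cases}
1 + \max\Bigl(\{0\} \cup \{\, L(i',j') : \max(0, i-1-K) \le i' \le i-1,\ \max(0, j-1-K) \le j' \le j-1,\ S_{i'} = T_{j'} \,\}\Bigr) & \text{if } S_i = T_j,\\[4pt]
0 & \text{otherwise,}
\end{cases}
$$

and the length of the longest common semi-substring is $\max_{i,j} L(i,j)$. The index bounds match the code's loops, which scan $i'$ from $i-1$ down to $\max(0, i-1-K)$ and likewise for $j'$.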
JavaScript code:
```javascript
function f(S, T, K){
  // mapS maps a char to indexes of its occurrences in S
  // rsS maps the index in S to that char's rank (index) in mapS
  const [mapS, rsS] = mapString(S)
  const [mapT, rsT] = mapString(T)
  // h is used to memoize g
  const h = {}

  function g(c, rs, rt){
    if (rs < 0 || rt < 0)
      return 0
    if (h.hasOwnProperty([c, rs, rt]))
      return h[[c, rs, rt]]
    // (We are guaranteed to be on
    // a match in this state.)
    let best = [1, c]
    let idxS = mapS[c][rs]
    let idxT = mapT[c][rt]
    if (idxS == 0 || idxT == 0)
      return best
    for (let i=idxS-1; i>=Math.max(0, idxS - 1 - K); i--){
      for (let j=idxT-1; j>=Math.max(0, idxT - 1 - K); j--){
        if (S[i] == T[j]){
          const [len, str] = g(S[i], rsS[i], rsT[j])
          if (len + 1 >= best[0])
            best = [len + 1, str + c]
        }
      }
    }
    return h[[c, rs, rt]] = best
  }

  let best = [0, '']
  for (let c of Object.keys(mapS)){
    for (let i=0; i<(mapS[c]||[]).length; i++){
      for (let j=0; j<(mapT[c]||[]).length; j++){
        let [len, str] = g(c, i, j)
        if (len > best[0])
          best = [len, str]
      }
    }
  }
  return best
}

function mapString(s){
  let map = {}
  let rs = []
  for (let i=0; i<s.length; i++){
    if (!map[s[i]]){
      map[s[i]] = [i]
      rs.push(0)
    } else {
      map[s[i]].push(i)
      rs.push(map[s[i]].length - 1)
    }
  }
  return [map, rs]
}

console.log(f('attgcgtagcaatg', 'tctcaggtcgatagtgac', 1))
console.log(f('aaaattttcccc', 'cccgggggaatatca', 1))
console.log(f('abcade', 'axe', 1))
```
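Since the question asks specifically about a table, here is a Python sketch of the same g recursion tabulated bottom-up, indexed by positions instead of occurrence ranks (the function name is hypothetical). `best[i][j]` holds the length of the longest common semi-substring ending exactly at `s[i]` and `t[j]`, and `prev` remembers the predecessor cell so the string can be rebuilt. It runs in O(n*m*K^2) time. On ties it returns the first maximum found while scanning `s` and then `t`, which may not always coincide with the question's "comes first" rule.

```python
def longest_common_semi_substring(s, t, k=1):
    """Return one longest common semi-substring of s and t.

    Between consecutive chosen letters, at most k letters may be skipped
    in each string.
    """
    n, m = len(s), len(t)
    # best[i][j]: length of the longest common semi-substring ending exactly
    # at s[i] and t[j]; stays 0 when s[i] != t[j].
    best = [[0] * m for _ in range(n)]
    # prev[i][j]: the (i', j') cell of the previous letter, or None.
    prev = [[None] * m for _ in range(n)]
    top, top_end = 0, None
    for i in range(n):
        for j in range(m):
            if s[i] != t[j]:
                continue
            best[i][j] = 1
            # Predecessors lie at most k skipped letters back in each string.
            for pi in range(max(0, i - 1 - k), i):
                for pj in range(max(0, j - 1 - k), j):
                    if best[pi][pj] + 1 > best[i][j]:
                        best[i][j] = best[pi][pj] + 1
                        prev[i][j] = (pi, pj)
            if best[i][j] > top:
                top, top_end = best[i][j], (i, j)
    # Follow the prev pointers back to rebuild the string.
    chars = []
    cell = top_end
    while cell is not None:
        i, j = cell
        chars.append(s[i])
        cell = prev[i][j]
    return "".join(reversed(chars))


print(longest_common_semi_substring("aaaattttcccc", "cccgggggaatatca"))  # aattc
```

With k=0 the same table reduces to the ordinary longest-common-substring DP, which is the "no letter skipped" version the asker already solved.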

Maximum element in a tree

I have the following ADT implementation in Scala.
How to find the maximum element in the tree? Can I introduce some helper function, and if yes, then how?
```scala
abstract class MySet {
  def max: Int
  def contains(x: Int): Boolean
}

class Empty extends MySet {
  def max: Int = throw new NoSuchElementException("max called on empty tree")
  def contains(x: Int): Boolean = false
}

class Node(elem: Int, left: MySet, right: MySet) extends MySet {
  def max: Int = ??? // what goes here?
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (elem < x) right.contains(x)
    else true
}
```
I found a solution in Haskell that feels quite intuitive. Can I convert it to Scala somehow?
```haskell
data Tree a = Nil | Node a (Tree a) (Tree a)

maxElement Nil = error "maxElement called on empty tree"
maxElement (Node x Nil Nil) = x
maxElement (Node x Nil r) = max x (maxElement r)
maxElement (Node x l Nil) = max x (maxElement l)
maxElement (Node x l r) = maximum [x, maxElement l, maxElement r]
```
Update
I'm not interested in copying the Haskell code into Scala; I just find the Haskell version more intuitive, while `this` and the other machinery of an object-oriented language confuse me. How can I write the equivalent code in object-oriented style without pattern matching?

Your tree is heterogeneous, which means that each node can be either a full node with a value or an empty leaf. Hence you need to distinguish which is which; otherwise you may call max on an empty node. There are many ways:
Classic OOP:
```scala
abstract class MySet {
  def isEmpty: Boolean
  ...
}
class Empty extends MySet {
  def isEmpty = true
  ...
}
class Node(...) extends MySet {
  def isEmpty = false
  ...
}
```
So inside Node you do something like this:
```scala
var maxElem = elem
if (!left.isEmpty)
  maxElem = maxElem.max(left.max)
if (!right.isEmpty)
  maxElem = maxElem.max(right.max)
maxElem
```
Since the JVM keeps class information at runtime, you can skip the definition of isEmpty:
```scala
var maxElem = elem
if (left.isInstanceOf[Node])
  maxElem = maxElem.max(left.asInstanceOf[Node].max)
if (right.isInstanceOf[Node])
  maxElem = maxElem.max(right.asInstanceOf[Node].max)
maxElem
```
(asInstanceOf is not required if you defined max in MySet, but this pattern covers the case when you didn't)
Well, Scala has syntactic sugar for the latter, and, not surprisingly, it's pattern matching:
```scala
var maxElem = elem
left match {
  case node: Node =>
    maxElem = maxElem.max(node.max)
  case _ =>
}
right match {
  case node: Node =>
    maxElem = maxElem.max(node.max)
  case _ =>
}
maxElem
```
You can take it slightly further and write something like this (the explicit `: Int` is required because max is recursive):
```scala
def max: Int = (left, right) match {
  case (_: Empty, _: Empty) => elem
  case (_: Empty, node: Node) => elem.max(node.max)
  case (node: Node, _: Empty) => elem.max(node.max)
  case (leftNode: Node, rightNode: Node) =>
    elem.max(leftNode.max).max(rightNode.max)
}
```

If you don't want to use pattern matching, you will need to implement an isEmpty operation or its equivalent, to avoid calling max on an empty set.
The important thing is how the tree is organized. Based on the implementation of contains, it looks like you have an ordered tree (a "binary search tree") where every element in the left subtree is less than the current element, which in turn is less than every element in the right subtree. If that's the case, your problem is fairly simple: either the right subtree is empty and the current element is the max, or the max element of the tree is the max of the right subtree. That should be a simple recursive implementation with nothing fancy required.
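A minimal sketch of that idea, in Python rather than Scala (the class names only mirror Empty/Node from the question, and it assumes the binary-search-tree ordering actually holds): each class answers `is_empty`, so no pattern matching or type tests are needed, and max simply walks right until the right subtree is empty.

```python
class Empty:
    def is_empty(self):
        return True

    def max(self):
        raise ValueError("max called on empty tree")


class Node:
    def __init__(self, elem, left=None, right=None):
        self.elem = elem
        self.left = left if left is not None else Empty()
        self.right = right if right is not None else Empty()

    def is_empty(self):
        return False

    def max(self):
        # BST ordering: everything in the right subtree is larger than elem,
        # so the maximum is the rightmost node.
        if self.right.is_empty():
            return self.elem
        return self.right.max()


tree = Node(5, Node(2, Node(1)), Node(8, Node(7), Node(9)))
print(tree.max())  # 9
```

For an unordered tree this shortcut is wrong; there you would compare against both subtrees, as in the other answers.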

Full disclosure: I'm still learning Scala myself, but here are two versions I came up with (the pattern-matching one looks like a fair translation of the Haskell code):
```scala
sealed trait Tree {
  def max: Int
  def maxMatch: Int
}

// Note: an empty tree reports 0, so these versions assume non-negative values.
case object EmptyTree extends Tree {
  def max = 0
  def maxMatch = 0
}

case class Node(data: Int,
                left: Tree = EmptyTree,
                right: Tree = EmptyTree) extends Tree {
  def max: Int = {
    data
      .max(left.max)
      .max(right.max)
  }

  def maxMatch: Int = {
    this match {
      case Node(x, EmptyTree, EmptyTree) => x
      case Node(x, l: Node, EmptyTree) => x max l.maxMatch
      case Node(x, EmptyTree, r: Node) => x max r.maxMatch
      case Node(x, l: Node, r: Node) => x max (l.maxMatch max r.maxMatch)
    }
  }
}
```
Tests (all passing)
```scala
val simpleNode = Node(3)
assert(simpleNode.max == 3)
assert(simpleNode.maxMatch == 3)

val leftLeaf = Node(1, Node(5))
assert(leftLeaf.max == 5)
assert(leftLeaf.maxMatch == 5)

val leftLeafMaxRoot = Node(5,
  EmptyTree, Node(2))
assert(leftLeafMaxRoot.max == 5)
assert(leftLeafMaxRoot.maxMatch == 5)

val nestedRightTree = Node(1,
  EmptyTree,
  Node(2,
    EmptyTree, Node(3)))
assert(nestedRightTree.max == 3)
assert(nestedRightTree.maxMatch == 3)

val partialFullTree = Node(1,
  Node(2,
    Node(4)),
  Node(3,
    Node(6, Node(7))))
assert(partialFullTree.max == 7)
assert(partialFullTree.maxMatch == 7)
```

How to remove a word from a Trie structure?

Maybe I'm not smart enough to learn Haskell, but I'll give it one last chance.
I've gotten stuck implementing entry removal from a tree, a Trie-like structure to be more specific (http://en.wikipedia.org/wiki/Trie).
I'm looking for advice (not a solution!) on how to implement such a pure function.
I had an idea for one algorithm: rebuild the tree by traversing the whole tree, "skipping" the values equal to each character of the word, with the edge condition of returning the original tree if the next character can't be found. But a problem arises when a character also belongs to another word.
```haskell
data Trie = Trie { commonness :: Maybe Int
                 , children :: [(Char, Trie)]
                 } deriving (Eq, Read, Show)

-- Creates an empty "dictionary"
trie :: Trie
trie = Trie { commonness = Nothing, children = [] }

-- Inserts a word with given commonness into dictionary
add :: String -> Int -> Trie -> Trie
add [] freq tree
  | (0 <= freq) && (freq <= 16) = tree { commonness = Just freq }
  | otherwise = error $ "Commonness out of bounds: " ++ (show freq)
add word freq tree = tree { children = traverse word (children tree) }
  where
    traverse [] tree = error $ "traverse called with [] " ++ (show tree)
    traverse (x:xs) [] = [(x, add xs freq trie)]
    traverse str@(x:xs) (t:ts)
      | x == fst t = (x, add xs freq $ snd t):ts
      | otherwise = t:(traverse str ts)

remove :: String -> Trie -> Trie
???
```
And the data looks like:
```
GHCi> putStrLn $ groom $ add "learn" 16 $ add "leap" 5 $ add "sing" 7 $ add "lift" 10 trie
Trie{commonness = Nothing,
     children =
       [('l',
         Trie{commonness = Nothing,
              children =
                [('i',
                  Trie{commonness = Nothing,
                       children =
                         [('f',
                           Trie{commonness = Nothing,
                                children = [('t', Trie{commonness = Just 10, children = []})]})]}),
                 ('e',
                  Trie{commonness = Nothing,
                       children =
                         [('a',
                           Trie{commonness = Nothing,
                                children =
                                  [('p', Trie{commonness = Just 5, children = []}),
                                   ('r',
                                    Trie{commonness = Nothing,
                                         children =
                                           [('n',
                                             Trie{commonness = Just 16, children = []})]})]})]})]}),
        ('s',
         Trie{commonness = Nothing,
              children =
                [('i',
                  Trie{commonness = Nothing,
                       children =
                         [('n',
                           Trie{commonness = Nothing,
                                children =
                                  [('g', Trie{commonness = Just 7, children = []})]})]})]})]}
```

This is going to be easier if you use a Map Char Trie instead of [(Char,Trie)] for your child table. That is what I'm going to assume for this answer. I'll get you started with the inductive case:
```haskell
import qualified Data.Map as Map

remove :: String -> Trie -> Trie
remove (c:cs) t = t { children = Map.alter remove' c (children t) }
  where
    remove' (Just t) = Just (remove cs t)
    remove' Nothing = Nothing
remove [] t = ...
```
I'll leave the base case to you. Here are the docs for the Map function I used, alter. You could get this same solution without using Map if you implemented alter for [(Char,a)].
Exercise: remove' is pretty wordy. See if you can shorten it using fmap.
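For readers who would rather see the whole shape than do the exercise, here is a Python sketch of the same approach (a hypothetical translation, not the answer's code): a frozen dataclass with a dict in place of `Map Char Trie`, dicts copied rather than mutated so old versions stay valid, and a child pruned once it no longer leads to any word, which corresponds to the function passed to `Map.alter` returning `Nothing`. It also fills in the base case, which simply drops the word marker. The commonness bounds check from the question's `add` is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Trie:
    commonness: Optional[int] = None
    children: Dict[str, "Trie"] = field(default_factory=dict)


def add(word, freq, t):
    """Return a new trie with `word` inserted; `t` is left unchanged."""
    if not word:
        return Trie(freq, t.children)
    head, rest = word[0], word[1:]
    child = t.children.get(head, Trie())
    return Trie(t.commonness, {**t.children, head: add(rest, freq, child)})


def remove(word, t):
    """Return a new trie without `word`; `t` is left unchanged."""
    if not word:
        # Base case: drop the word marker but keep any longer words below.
        return Trie(None, t.children)
    head, rest = word[0], word[1:]
    if head not in t.children:
        return t  # word not present: nothing to do
    new_child = remove(rest, t.children[head])
    children = dict(t.children)  # copy, never mutate the old dict
    if new_child.commonness is None and not new_child.children:
        del children[head]  # prune a branch that no longer holds any word
    else:
        children[head] = new_child
    return Trie(t.commonness, children)


def lookup(word, t):
    """Return the commonness stored for `word`, or None."""
    for ch in word:
        if ch not in t.children:
            return None
        t = t.children[ch]
    return t.commonness
```

Because every operation builds new nodes along one path and shares the rest, removing "leap" leaves the original trie (with "leap") intact, which is the same guarantee the Haskell version gets for free.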

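In Python you can do something like this (assuming each node has a `childs` dict and an `is_complete_word` flag):
```python
class TrieNode:
    def __init__(self):
        self.childs = {}                # char -> TrieNode
        self.is_complete_word = False

    def remove_string_helper(self, string, pnode, index):
        # pnode is the node reached after string[:index]; returns True if the
        # caller may delete pnode (it no longer ends a word and has no children).
        if pnode:
            flag = False
            if index < len(string):
                flag = self.remove_string_helper(string, pnode.childs.get(string[index]), index + 1)
            if index == len(string) and pnode.is_complete_word:
                pnode.is_complete_word = False
                return len(pnode.childs) == 0
            if flag:
                pnode.childs.pop(string[index])
                # Keep this node if it still ends another word.
                return len(pnode.childs) == 0 and not pnode.is_complete_word
        return False

    def remove_string(self, string):
        # Start from the root itself so the branch for string[0] can be pruned too.
        self.remove_string_helper(string, self, 0)
```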

Is there a nearest-key map datastructure?

I have a situation where I need to find the value whose key is closest to the one I request. It's something like a map that defines a distance between keys.
For example, if I have the keys {A, C, M, Z} in the map, a request for D would return C's value.
Any idea?

Most tree data structures keep their keys sorted in order to store and find them. Many such implementations can locate a key close to the one you probe with (usually either the closest below or the closest above). For example, Java's TreeMap implements such a data structure, and you can ask it for the closest key below your lookup key or the closest key above it (lowerKey and higherKey, or floorKey and ceilingKey if an exact match should count).
If you can calculate distances (it's not always easy: Java's interface only requires you to know whether any given key is "below" or "above" any other given key), then you can ask for both the closest key above and the closest key below and calculate for yourself which one is closer.

What's the dimensionality of your data? If it's just one-dimensional, a sorted array will do: a binary search will locate the exact match and/or reveal between which two keys your search key lies, and a simple test will tell you which is closer.
If you need to locate not just the nearest key, but an associated value, maintain an identically sorted array of values - the index of the retrieved key in the key array is then the index of the value in the value array.
Of course, there are many alternative approaches; which one to use depends on many other factors, such as memory consumption, whether you need to insert values, whether you control the order of insertion, deletions, threading issues, etc.
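A minimal Python sketch of the sorted-array approach (the function name and the tie rule are choices made here, not part of the answer): `bisect_left` finds the first key at or above the query, the key just before it is the closest one below, and a distance function decides between the two neighbours. The same two-neighbour comparison is what you would do with TreeMap's floorKey and ceilingKey.

```python
from bisect import bisect_left


def nearest(keys, values, query, distance=lambda a, b: abs(a - b)):
    """Return (key, value) whose key is nearest to `query`.

    `keys` must be sorted and `values[i]` belongs to `keys[i]`.
    On a tie, the smaller key wins.
    """
    if not keys:
        raise KeyError("nearest() on an empty map")
    i = bisect_left(keys, query)      # first index with keys[i] >= query
    if i == 0:                        # query is at or before the first key
        return keys[0], values[0]
    if i == len(keys):                # query is after the last key
        return keys[-1], values[-1]
    lo, hi = i - 1, i                 # the neighbours on either side
    if distance(keys[hi], query) < distance(query, keys[lo]):
        return keys[hi], values[hi]
    return keys[lo], values[lo]


char_distance = lambda a, b: abs(ord(a) - ord(b))
print(nearest(['A', 'C', 'M', 'Z'], [1, 2, 3, 4], 'D', char_distance))  # ('C', 2)
```

Lookups are O(log n); inserting into the middle of the parallel arrays is O(n), which is one of the trade-offs mentioned above.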

BK-trees do precisely what you want. Here's a good article on implementing them.
And here is a Scala implementation:
```scala
class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int, BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val eligibleNodes = subnodes.keys.toList
      .filter(key => minDistance to maxDistance contains key)
      .map(subnodes)
    val partialResult = eligibleNodes.flatMap(_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  def insert(what: T): Boolean =
    if (node == what) false
    else
      subnodes.get(computeDistance(node, what))
        .map(_.insert(what))
        .getOrElse {
          subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
          true
        }

  override def toString = node.toString + "(" + subnodes.toString + ")"
}

object Test {
  def main(args: Array[String]): Unit = {
    val root = new BKTree[Char](charDistance, 'A')
    root.insert('C')
    root.insert('M')
    root.insert('Z')
    println(findClosest(root, 'D'))
  }

  def charDistance(a: Char, b: Char): Int = (a - b).abs

  def findClosest[T](root: BKTree[T], what: T): List[T] = {
    var distance = 0
    var closest = root.query(what, distance)
    while (closest.isEmpty) {
      distance += 1
      closest = root.query(what, distance)
    }
    closest
  }
}
```
I'll admit to a certain dirtiness and ugliness about it, and to being way too clever with the insertion algorithm. Also, it only works well for small distances; otherwise you'll search the tree repeatedly. Here's an alternate implementation that does a better job of it:
```scala
class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int, BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val eligibleNodes = subnodes.keys.toList
      .filter(key => minDistance to maxDistance contains key)
      .map(subnodes)
    val partialResult = eligibleNodes.flatMap(_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  private def find(what: T, bestDistance: Int): (Int, List[T]) = {
    val currentDistance = computeDistance(node, what)
    val presentSolution = if (currentDistance <= bestDistance) List(node) else Nil
    val best = currentDistance min bestDistance
    subnodes.keys.foldLeft((best, presentSolution)) { (acc, key) =>
      val (currentBest, currentSolution) = acc
      val (possibleBest, possibleSolution) =
        if (key <= currentDistance + currentBest)
          subnodes(key).find(what, currentBest)
        else
          (0, Nil)
      (possibleBest, possibleSolution) match {
        case (_, Nil) => acc
        case (better, solution) if better < currentBest => (better, solution)
        case (_, solution) => (currentBest, currentSolution ::: solution)
      }
    }
  }

  def findClosest(what: T): List[T] = find(what, computeDistance(node, what))._2

  def insert(what: T): Boolean =
    if (node == what) false
    else
      subnodes.get(computeDistance(node, what))
        .map(_.insert(what))
        .getOrElse {
          subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
          true
        }

  override def toString = node.toString + "(" + subnodes.toString + ")"
}

object Test {
  def main(args: Array[String]): Unit = {
    val root = new BKTree[Char](charDistance, 'A')
    root.insert('C')
    root.insert('E')
    root.insert('M')
    root.insert('Z')
    println(root.findClosest('D'))
  }

  def charDistance(a: Char, b: Char): Int = (a - b).abs
}
```

With C++ and STL containers (std::map) you can use the following template function:
```cpp
#include <iostream>
#include <iterator>
#include <map>

//! Returns the entry nearest to item_to_find by the metric given by "operator -" on T.
//! If two keys are equidistant from item_to_find, the smaller key is returned.
//! Returns map_for_search.end() only if the map is empty.
//! The map is taken by reference so that the returned iterator stays valid.
template <class T, class U>
typename std::map<T, U>::iterator find_nearest(std::map<T, U>& map_for_search, const T& item_to_find)
{
    typename std::map<T, U>::iterator itlow = map_for_search.lower_bound(item_to_find);
    // item_to_find is at or before the first key (or the map is empty)
    if (itlow == map_for_search.begin())
        return itlow;
    typename std::map<T, U>::iterator itprev = std::prev(itlow);
    // item_to_find is after the last key
    if (itlow == map_for_search.end())
        return itprev;
    // "operator -" is the distance metric; an exact match gives distance 0
    return (itlow->first - item_to_find < item_to_find - itprev->first) ? itlow : itprev;
}

int main()
{
    std::map<char, int> mymap;
    std::map<char, int>::iterator nearest;
    // fill map with some information
    mymap['B'] = 20;
    mymap['C'] = 40;
    mymap['M'] = 60;
    mymap['Z'] = 80;

    char ch = 'D'; // C should be returned
    nearest = find_nearest<char, int>(mymap, ch);
    std::cout << nearest->first << " => " << nearest->second << '\n';

    ch = 'Z'; // Z should be returned
    nearest = find_nearest<char, int>(mymap, ch);
    std::cout << nearest->first << " => " << nearest->second << '\n';

    ch = 'A'; // B should be returned
    nearest = find_nearest<char, int>(mymap, ch);
    std::cout << nearest->first << " => " << nearest->second << '\n';

    ch = 'H'; // equidistant to C and M -> C is returned
    nearest = find_nearest<char, int>(mymap, ch);
    std::cout << nearest->first << " => " << nearest->second << '\n';
    return 0;
}
```
Output:
```
C => 40
Z => 80
B => 20
C => 40
```
The code assumes that operator - computes the distance between keys. If T is your own class whose objects serve as keys in the map, you need to implement that operator.
You could also change the code to use a static member function of T (say, distance) instead of operator -:
```cpp
return (T::distance(itlow->first, item_to_find) < T::distance(item_to_find, itprev->first)) ? itlow : itprev;
```
where distance would be declared inside the key class as something like
```cpp
static distance_type distance(const some_type& first, const some_type& second);
```
and distance_type must support comparison with operator <.

You can implement something like this as a tree. A simple approach is to assign each node in the tree a bitstring. Each level of the tree is stored as a bit. All parent information is encoded in the node's bitstring. You can then easily locate arbitrary nodes, and find parents and children. This is how Morton ordering works, for example. It has the extra advantage that you can calculate distances between nodes by simple binary subtraction.
If you have multiple links between data values, then your data structure is a graph rather than a tree. In that case, you need a slightly more sophisticated indexing system. Distributed hash tables do this sort of thing. They typically have a way of calculating the distance between any two nodes in the index space. For example, the Kademlia algorithm (used by BitTorrent) uses XOR distances applied to bitstring IDs. This allows BitTorrent clients to look up IDs in a chain, converging on the unknown target location. You can use a similar approach to find the node(s) closest to your target node.
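A brute-force Python sketch of the XOR metric itself (illustrative only, with hypothetical function names; real Kademlia lookups route through k-buckets rather than scanning every ID). One handy property: for a fixed target, distinct IDs always have distinct XOR distances, since `a ^ t == b ^ t` implies `a == b`, so the single nearest ID is never ambiguous.

```python
def xor_distance(a, b):
    """Kademlia-style distance: bitwise XOR of two integer IDs."""
    return a ^ b


def closest_ids(ids, target, k=1):
    """Return the k IDs nearest to `target` under XOR distance (brute force)."""
    return sorted(ids, key=lambda i: xor_distance(i, target))[:k]


ids = [0b0001, 0b0110, 0b1011, 0b1100]
print(closest_ids(ids, 0b0111, k=2))  # [6, 1]
```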

If your keys are strings and your similarity function is Levenshtein distance, then you can use finite-state machines:
Your map is a trie built as a finite-state machine (by taking the union of all key/value pairs and determinizing). Then, compose your input query with a simple finite-state transducer that encodes the Levenshtein distance, and compose that with your trie. Then, use the Viterbi algorithm to extract the shortest path.
You can implement all this with only a few function calls using a finite-state toolkit.
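To try the idea without a finite-state toolkit, a simpler related technique (not the FST composition described above) is to walk a plain trie while computing one row of the Levenshtein table per trie node, pruning any branch whose row minimum already exceeds the best distance found so far. The pruning is safe because row minima never decrease as you go deeper. A Python sketch, with hypothetical function names:

```python
_END = None  # dict key marking "a key ends here"; it can never clash with a character


def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[_END] = w
    return root


def nearest_keys(root, query):
    """Return (distance, keys) for the stored keys closest to `query` in Levenshtein distance."""
    best = [float("inf"), []]

    def visit(node, row):
        # row[j] = edit distance between this node's prefix and query[:j]
        if _END in node:
            d = row[-1]
            if d < best[0]:
                best[0], best[1] = d, [node[_END]]
            elif d == best[0]:
                best[1].append(node[_END])
        # Nothing below this node can beat the current best: prune.
        if min(row) > best[0]:
            return
        for ch, child in node.items():
            if ch is _END:
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # skip a query character
                                   row[j] + 1,           # skip a key character
                                   row[j - 1] + cost))   # match or substitute
            visit(child, new_row)

    visit(root, list(range(len(query) + 1)))
    return best[0], best[1]


trie = build_trie(["cat", "cart", "dog", "cast"])
print(nearest_keys(trie, "cas"))  # (1, ['cat', 'cast'])
```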

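In Scala, this is a technique I use to find the closest Int key <= the key you are looking for:
```scala
import scala.collection.immutable.SortedMap

val sMap = SortedMap(1 -> "A", 2 -> "B", 3 -> "C")
sMap.rangeTo(4).lastOption  // Some((3, "C")); in Scala 2.12 and earlier this was sMap.to(4)
sMap.rangeTo(-1)            // an empty SortedMap
```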
