Maximum element in a tree - algorithm

I have the following ADT implementation in Scala.
How to find the maximum element in the tree? Can I introduce some helper function, and if yes, then how?
abstract class MySet {
def max: Int
def contains(tweet: Tweet): Boolean = false
}
class Empty extends MySet {
def max: throw new NoSuchElementExeption("max called on empty tree")
def contains(x: Int): Boolean =
if (x < elem) left.contains(x)
else if (elem < x) right.contains(x)
else true
}
class Node(elem: Int, left: MySet, right: MySet) extends Set {
def max: { ... }
def contains(x: Int): Boolean =
if (x < elem) left.contains(x)
else if (elem < x) right.contains(x)
else true
}
I found a solution in Haskell which feels quite intuitive can I convert it to Scala somehow?
data Tree a = Nil | Node a (Tree a) (Tree a)
maxElement Nil = error "maxElement called on empty tree"
maxElement (Node x Nil Nil) = x
maxElement (Node x Nil r) = max x (maxElement r)
maxElement (Node x l Nil) = max x (maxElement l)
maxElement (Node x l r) = maximum [x, maxElement l, maxElement r]
Update
I am not interested in copying the Haskell code in Scala instead I think Haskell version is more intuitive but because of this keyword and other stuff in Object oriented language. How can I write the equivalent code in object oriented style without pattern matching?

Your tree is heterogeneous, which means that each node can be either a full node with a value, or an empty leaf. Hence you need to distinguish which is which, otherwise you can call max on an empty node. There are many ways:
Classic OOP:
abstract class MySet {
def isEmpty: Boolean
...
}
class Empty extends MySet {
def isEmpty = true
...
}
class Node(...) extends MySet {
def isEmpty = false
...
}
So you do something like this:
var maxElem = elem
if(!left.isEmpty)
maxElem = maxElem.max(left.max)
end
if(!right.isEmpty)
maxElem = maxElem.max(right.max)
end
Since JVM has class information at runtime you can skip the definition of isEmpty:
var maxElem = elem
if(left.isInstanceOf[Node])
maxElem = maxElem.max(left.asInstanceOf[Node].max)
end
if(left.isInstanceOf[Node])
maxElem = maxElem.max(right.asInstanceOf[Node].max)
end
(asInstanceOf is not required if you defined max in MySet, but this pattern covers the case when you didn't)
Well, Scala has a syntactic sugar for the latter, and not surprisingly it's the pattern matching:
var maxElem = elem
left match {
case node: Node =>
maxElem = maxElem.max(node.max)
case _ =>
}
right match {
case node: Node =>
maxElem = maxElem.max(node.max)
case _ =>
}
maxElem
You can take it slightly further and write something like this:
def max = (left, right) match {
case (_: Empty, _: Empty) => elem
case (_: Empty, node: Node) => elem.max(node.max)
case (node: Node, _: Empty) => elem.max(node.max)
case (leftNode: Node, rightNode: Node) =>
elem.max(leftNode.max).max(rightNode.max)
}

If you don't want to use pattern matching, you will need to implement an isEmpty operation or its equivalent, to avoid calling max on an empty set.
The important thing is how the tree is organized. Based on the implementation of contains, it looks like you have an ordered tree (a "binary search tree") where every element in the left part is less than or equal to every element in the right part. If that's the case, your problem is fairly simple. Either the right sub tree is empty and the current element is the max, or the max element of the tree is the max of the right sub tree. That should be a simple recursive implementation with nothing fancy required.

Full disclosure, still learning Scala myself, but here is two versions I came up with (which the pattern match looks like a fair translation of the Haskell code)
sealed trait Tree {
def max: Int
def maxMatch: Int
}
case object EmptyTree extends Tree {
def max = 0
def maxMatch = 0
}
case class Node(data:Int,
left:Tree = EmptyTree,
right:Tree = EmptyTree) extends Tree {
def max:Int = {
data
.max(left.max)
.max(right.max)
}
def maxMatch: Int = {
this match {
case Node(x,EmptyTree,EmptyTree) => x
case Node(x,l:Node,EmptyTree) => x max l.maxMatch
case Node(x,EmptyTree,r:Node) => x max r.maxMatch
case Node(x,l:Node,r:Node) => x max (l.maxMatch max r.maxMatch)
}
}
}
Tests (all passing)
val simpleNode = Node(3)
assert(simpleNode.max == 3)
assert(simpleNode.maxMatch == 3)
val leftLeaf = Node(1, Node(5))
assert(leftLeaf.max == 5)
assert(leftLeaf.maxMatch == 5)
val leftLeafMaxRoot = Node(5,
EmptyTree, Node(2))
assert(leftLeafMaxRoot.max == 5)
assert(leftLeafMaxRoot.maxMatch == 5)
val nestedRightTree = Node(1,
EmptyTree,
Node(2,
EmptyTree, Node(3)))
assert(nestedRightTree.max == 3)
assert(nestedRightTree.maxMatch == 3)
val partialFullTree = Node(1,
Node(2,
Node(4)),
Node(3,
Node(6, Node(7))))
assert(partialFullTree.max == 7)
assert(partialFullTree.maxMatch == 7)

Related

Given: aabcdddeabb => Expected: [(a,2),(b,1),(c,1),(d,3),(e,1),(a,1),(b,1)] in Scala

I'm really interested in how this algorithm can be implemented. If possible, it would be great to see an implementation with and without recursion. I am new to the language so I would be very grateful for help. All I could come up with was this code and it goes no further:
print(counterOccur("aabcdddeabb"))
def counterOccur(string: String) =
string.toCharArray.toList.map(char => {
if (!char.charValue().equals(char.charValue() + 1)) (char, counter)
else (char, counter + 1)
})
I realize that it's not even close to the truth, I just don't even have a clue what else could be used.
First solution with using recursion. I take Char by Char from string and check if last element in the Vector is the same as current. If elements the same I update last element by increasing count(It is first case). If last element does not the same I just add new element to the Vector(second case). When I took all Chars from the string I just return result.
def counterOccur(string: String): Vector[(Char, Int)] = {
#tailrec
def loop(str: List[Char], result: Vector[(Char, Int)]): Vector[(Char, Int)] = {
str match {
case x :: xs if result.lastOption.exists(_._1.equals(x)) =>
val count = result(result.size - 1)._2
loop(xs, result.updated(result.size - 1, (x, count + 1)))
case x :: xs =>
loop(xs, result :+ (x, 1))
case Nil => result
}
}
loop(string.toList, Vector.empty[(Char, Int)])
}
println(counterOccur("aabcdddeabb"))
Second solution that does not use recursion. It works the same, but instead of the recursion it is using foldLeft.
def counterOccur2(string: String): Vector[(Char, Int)] = {
string.foldLeft(Vector.empty[(Char, Int)])((r, v) => {
val lastElementIndex = r.size - 1
if (r.lastOption.exists(lv => lv._1.equals(v))) {
r.updated(lastElementIndex, (v, r(lastElementIndex)._2 + 1))
} else {
r :+ (v, 1)
}
})
}
println(counterOccur2("aabcdddeabb"))
You can use a very simple foldLeft to accumulate. You also don't need toCharArray and toList because strings are implicitly convertible to Seq[Char]:
"aabcdddeabb".foldLeft(collection.mutable.ListBuffer[(Char,Int)]()){ (acc, elm) =>
acc.lastOption match {
case Some((c, i)) if c == elm =>
acc.dropRightInPlace(1).addOne((elm, i+1))
case _ =>
acc.addOne((elm, 1))
}
}
Here is a solution using foldLeft and a custom State case class:
def countConsecutives[A](data: List[A]): List[(A, Int)] = {
final case class State(currentElem: A, currentCount: Int, acc: List[(A, Int)]) {
def result: List[(A, Int)] =
((currentElem -> currentCount) :: acc).reverse
def nextState(newElem: A): State =
if (newElem == currentElem)
this.copy(currentCount = this.currentCount + 1)
else
State(
currentElem = newElem,
currentCount = 1,
acc = (this.currentElem -> this.currentCount) :: this.acc
)
}
object State {
def initial(a: A): State =
State(
currentElem = a,
currentCount = 1,
acc = List.empty
)
}
data match {
case a :: tail =>
tail.foldLeft(State.initial(a)) {
case (state, newElem) =>
state.nextState(newElem)
}.result
case Nil =>
List.empty
}
}
You can see the code running here.
One possibility is to use the unfold method. This method is defined for several collection types, here I'm using it to produce an Iterator (documented here for version 2.13.8):
def spans[A](as: Seq[A]): Iterator[Seq[A]] =
Iterator.unfold(as) {
case head +: tail =>
val (span, rest) = tail.span(_ == head)
Some((head +: span, rest))
case _ =>
None
}
unfold starts from a state and applies a function that returns, either:
None if we want to signal that the collection ended
Some of a pair that contains the next item of the collection we want to produce and the "remaining" state that will be fed to the next iteration.
In this example in particular, we start from a sequence of A called as (which can be a sequence of characters) and at each iteration:
if there's at least one item
we split head and tail
we further split the tail into the longest prefix that contains items equal to the head and the rest
we return the head and the prefix we got above as the next item
we return the rest of the collection as the state for the following iteration
otherwise, we return None as there's nothing more to be done
The result is a fairly flexible function that can be used to group together spans of equal items. You can then define the function you wanted initially in terms of this:
def spanLengths[A](as: Seq[A]): Iterator[(A, Int)] =
spans(as).map(a => a.head -> a.length)
This can be probably made more generic and its performance improved, but I hope this can be an helpful example about another possible approach. While folding a collection is a recursive approach, unfolding is referred to as a corecursive one (Wikipedia article).
You can play around with this code here on Scastie.
For
str = "aabcdddeabb"
you could extract matches of the regular expression
rgx = /(.)\1*/
to obtain the array
["aa", "b", "c", "ddd", "e", "a", "bb"]
and then map each element of the array to the desired string.1
def counterOccur(str: String): List[(Char, Int)] = {
"""(.)\1*""".r
.findAllIn(str)
.map(m => (m.charAt(0), m.length)).toList
}
counterOccur("aabcdddeabb")
#=> res0: List[(Char, Int)] = List((a,2), (b,1), (c,1), (d,3), (e,1), (a,1), (b,2))
The regular expression reads, "match any character and save it to capture group 1 ((.)), then match the content of capture group 1 zero or more times (\1*).
1. Scala code kindly provided by #Thefourthbird.

Optimize the prefix tree for string search

I'm thinking about improving the prefix tree. It allows me to search for the specified number of words containing the input string.
Task: We need a class that implements a list of company names by substring - from the list of all available names, output a certain number of companies that start with the entered line. It is assumed that the class will be called when filling out a form on a website/mobile application with a high RPS (Requests per second).
My Solution:
class SuggestService(companyNames : Seq[String]) {
val arrayCompany = companyNames.toArray
val tree = getTree(Ternary.apply, 0)
def getTree(tree: Ternary, index: Int): Ternary = {
if(index == arrayCompany.length-1)
return tree.insert(arrayCompany(index))
getTree(tree.insert(arrayCompany(index)), index+1)
}
def suggest(input: String, numberOfSuggest : Int) : Seq[String] = {
val result = tree.keysWithPrefix(input)
result.take(numberOfSuggest)
}
}
Tree Class:
sealed trait Ternary {
def insert(key: String): Ternary = Ternary.insert(this, key, 0)
def keysWithPrefix(prefix: String): List[String] = Ternary.keys(this, prefix)
}
case class Node(value: Option[Int], char: Char, left: Ternary, mid: Ternary, right: Ternary) extends Ternary
case object Leaf extends Ternary
object Ternary {
def apply: Ternary = Leaf
private def keys(root: Ternary, prefix: String): List[String] =
get(root, prefix, 0) match {
case None => Nil
case Some(node) =>
collect(node, prefix.dropRight(1))
}
private def collect(node: Ternary, prefix: String): List[String] =
node match {
case Leaf => Nil
case node: Node if node.value.isDefined =>
(prefix + node.char) +: (collect(node.left, prefix) ++ collect(node.mid, prefix + node.char) ++ collect(node.right, prefix))
case node: Node =>
collect(node.left, prefix) ++ collect(node.mid, prefix + node.char) ++ collect(node.right, prefix)
}
private def get(root: Ternary, prefix: String, step: Int): Option[Ternary] = root match {
case Leaf => None
case node: Node if node.char > prefix.charAt(step) => get(node.left, prefix, step)
case node: Node if node.char < prefix.charAt(step) => get(node.right, prefix, step)
case node: Node if step < prefix.length - 1 => get(node.mid, prefix, step + 1)
case node: Node => Some(node)
}
private def insert(root: Ternary, key: String, step: Int): Ternary = root match {
case Leaf =>
val node = Node(None, key.charAt(step), Leaf, Leaf, Leaf)
insert(node, key, step)
case node: Node if node.char > key.charAt(step) =>
val left = insert(node.left, key, step)
node.copy(left = left)
case node: Node if node.char < key.charAt(step) =>
val right = insert(node.right, key, step)
node.copy(right = right)
case node: Node if step < key.length - 1 =>
val mid = insert(node.mid, key, step + 1)
node.copy(mid = mid)
case node: Node =>
node.copy(value = Some(0))
}
}
The solution works fine, and it seems to promise to work very efficiently, but I am dissatisfied with it to a sufficient extent.
By the condition, we must return a list of words in an amount equal to the number of numberOfSuggest.
And I force the tree to return all the words containing input. And only then I take the required number of words from the resulting list:
def suggest(input: String, numberOfSuggest : Int) : Seq[String] = {
val result = tree.keysWithPrefix(input)
result.take(numberOfSuggest)
}
I want to try to save time, and teach the tree to return a ready-made list of words limited by the number of numberOfSuggest.
Experiment: https://scastie.scala-lang.org/m0MxnlChT0GkpGJIBnNnUQ

How to best build a persistent binary tree from a sorted stream

For a side project I wanted a simple way to generate a persistent binary search tree from a sorted stream. After some cursory searching I was only able to find descriptions of techniques that involved storing a sorted array where you can access any element by index. I ended up writing something that works but I figured this is well trodden territory and a canonical example is probably documented somewhere (and probably has a name).
The make shift code I made is included just for clarity. (It's also short)
object TreeFromStream {
sealed trait ImmutableTree[T] {
def height: Int
}
case class ImmutableTreeNode[T](
value: T,
left: ImmutableTree[T],
right: ImmutableTree[T]
) extends ImmutableTree[T] {
lazy val height = left.height + 1
}
case class NilTree[T]() extends ImmutableTree[T] {
def height = 0
}
#tailrec
def treeFromStream[T](
stream: Stream[T],
tree: ImmutableTree[T] = NilTree[T](),
ancestors: List[ImmutableTreeNode[T]] = Nil
): ImmutableTree[T] = {
(stream, ancestors) match {
case (Stream.Empty, _) =>
ancestors.foldLeft(tree) { case(right, root) => root.copy(right=right) }
case (_, ancestor :: nextAncestors) if ancestor.left.height == tree.height =>
treeFromStream(stream, ancestor.copy(right=tree), nextAncestors)
case (next #:: rest, _) =>
treeFromStream(
rest, NilTree(),
ImmutableTreeNode(next, tree, NilTree()) :: ancestors
)
}
}
}
To create a balanced tree, which I will guess you want to do, you will need to visit each node at least once. First, collect all the nodes into a buffer, and then recursively convert the buffer into a tree:
def tfs[T](stream: Stream[T]): ImmutableTree[T] = {
val ss = scala.collection.mutable.ArrayBuffer.empty[T]
def treeFromSubsequence(start: Int, end: Int): ImmutableTree[T] =
if (end == start) NilTree()
else if (end - start == 1) ImmutableTreeNode(ss(start), NilTree(), NilTree())
else {
val mid = (end - start) / 2
ImmutableTreeNode(ss(mid), treeFromSubsequence(start, mid), treeFromSubsequence(mid + 1, end))
}
stream.foreach { x => ss += x }
treeFromSubsequence(0, ss.length)
}
It will visit each value exactly twice, once to collect it and once to put it into the value field of a tree.

How to implement tail-recursive quick sort in Scala

I wrote a recursive version:
def quickSort[T](xs: List[T])(p: (T, T) => Boolean): List[T] = xs match{
case Nil => Nil
case _ =>
val x = xs.head
val (left, right) = xs.tail.partition(p(_, x))
val left_sorted = quickSort(left)(p)
val right_sorted = quickSort(right)(p)
left_sorted ::: (x :: right_sorted)
}
But I don't know how to change it into tail-recurisive. Can anyone give me a suggestion ?
Any recursive function can be be converted to use the heap, rather than the stack, to track the context. The process is called trampolining.
Here's how it could be implemented with Scalaz.
object TrampolineUsage extends App {
import scalaz._, Scalaz._, Free._
def quickSort[T: Order](xs: List[T]): Trampoline[List[T]] = {
assert(Thread.currentThread().getStackTrace.count(_.getMethodName == "quickSort") == 1)
xs match {
case Nil =>
return_ {
Nil
}
case x :: tail =>
val (left, right) = tail.partition(_ < x)
suspend {
for {
ls <- quickSort(left)
rs <- quickSort(right)
} yield ls ::: (x :: rs)
}
}
}
val xs = List.fill(32)(util.Random.nextInt())
val sorted = quickSort(xs).run
println(sorted)
val (steps, sorted1) = quickSort(xs).foldRun(0)((i, f) => (i + 1, f()))
println("sort took %d steps".format(steps))
}
Of course, you need either a really big structure or a really small stack to have a practical problem with a non-tail-recursive divide and conquer algorithm, as you can handle 2^N elements with a stack depth of N.
http://blog.richdougherty.com/2009/04/tail-calls-tailrec-and-trampolines.html
UPDATE
scalaz.Trampoline is a special case of a (much) more general structure, Free. It's defined as type Trampoline[+A] = Free[Function0, A]. It's actually possible to write quickSort more generically, so it is parameterized by the type constructor used in Free. This example shows how this is done, and how you can then use the same code to bind using the stack, the heap, or in concurrently.
https://github.com/scalaz/scalaz/blob/scalaz-seven/example/src/main/scala/scalaz/example/TrampolineUsage.scala
Tail recursion requires you to pass work, both completed and work-to-do, forward on each step. So you just have to encapsulate your work-to-do on the heap instead of the stack. You can use a list as a stack, so that's easy enough. Here's an implementation:
def quicksort[T](xs: List[T])(lt: (T,T) => Boolean) = {
#annotation.tailrec
def qsort(todo: List[List[T]], done: List[T]): List[T] = todo match {
case Nil => done
case xs :: rest => xs match {
case Nil => qsort(rest, done)
case x :: xrest =>
val (ls, rs) = (xrest partition(lt(x,_)))
if (ls.isEmpty) {
if (rs.isEmpty) qsort(rest, x :: done)
else qsort(rs :: rest, x :: done)
}
else qsort(ls :: List(x) :: rs :: rest, done)
}
}
qsort(List(xs),Nil)
}
This is, of course, just a special case of trampolining as linked to by retronym (where you don't need to pass the function forward). Fortunately, this case is easy enough to do by hand.
I just wrote this article which contains step by step instructions on how to convert the classic implementation of Quicksort to tail-recursive form:
Quicksort rewritten in tail-recursive form - An example in Scala
I hope you find it interesting!
One more version using tailrec, pattern matching and implicit ordering:
def sort[T](list: List[T])(implicit ordering: Ordering[T]): List[T] = {
#scala.annotation.tailrec
def quickSort(todo: List[List[T]], accumulator: List[T]): List[T] = todo match {
case Nil => accumulator
case head :: rest => head match {
case Nil => quickSort(rest, accumulator)
case pivot :: others =>
others.partition(ordering.lteq(_, pivot)) match {
case (Nil, Nil) => quickSort(rest, pivot :: accumulator)
case (Nil, larger) => quickSort(larger :: rest, pivot :: accumulator)
case (smaller, larger) => quickSort(smaller :: List(pivot) :: larger :: rest, accumulator)
}
}
}
quickSort(List(list), Nil)
}
val sorted = sort(someValues)
val reverseSorted = sort(someIntValues)(Ordering[Int].reverse)

Is there a nearest-key map datastructure?

I have a situation where I need to find the value with the key closest to the one I request. It's kind of like a nearest map that defines distance between keys.
For example, if I have the keys {A, C, M, Z} in the map, a request for D would return C's value.
Any idea?
Most tree data structures use some sort of sorting algorithm to store and find keys. Many implementations of such can locate a close key to the key you probe with (usually it either the closest below or the closest above). For example Java's TreeMap implements such a data structure and you can tell it to get you the closest key below your lookup key, or the closest key above your lookup key (higherKey and lowerKey).
If you can calculate distances (its not always easy - Java's interface only require you to know if any given key is "below" or "above" any other given key) then you can ask for both closest above and closest below and then calculate for yourself which one is closer.
What's the dimensionality of your data? If it's just one dimensional, a sorted array will do it - a binary search will locate the exact match and/or reveal betweeen which two keys your search key lies - and a simple test will tell you which is closer.
If you need to locate not just the nearest key, but an associated value, maintain an identically sorted array of values - the index of the retrieved key in the key array is then the index of the value in the value array.
Of course, there are many alternative approaches - which one to use depends on many other factors, such as memory consumption, whether you need to insert values, if you control the order of insertion, deletions, threading issues, etc...
BK-trees do precisely what you want. Here's a good article on implementing them.
And here is a Scala implementation:
class BKTree[T](computeDistance: (T, T) => Int, node: T) {
val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]
def query(what: T, distance: Int): List[T] = {
val currentDistance = computeDistance(node, what)
val minDistance = currentDistance - distance
val maxDistance = currentDistance + distance
val elegibleNodes = (
subnodes.keys.toList
filter (key => minDistance to maxDistance contains key)
map subnodes
)
val partialResult = elegibleNodes flatMap (_.query(what, distance))
if (currentDistance <= distance) node :: partialResult else partialResult
}
def insert(what: T): Boolean = if (node == what) false else (
subnodes.get(computeDistance(node, what))
map (_.insert(what))
getOrElse {
subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
true
}
)
override def toString = node.toString+"("+subnodes.toString+")"
}
object Test {
def main(args: Array[String]) {
val root = new BKTree(distance, 'A')
root.insert('C')
root.insert('M')
root.insert('Z')
println(findClosest(root, 'D'))
}
def charDistance(a: Char, b: Char) = a - b abs
def findClosest[T](root: BKTree[T], what: T): List[T] = {
var distance = 0
var closest = root.query(what, distance)
while(closest.isEmpty) {
distance += 1
closest = root.query(what, distance)
}
closest
}
}
I'll admit to a certain dirt&uglyness about it, and of being way too clever with the insertion algorithm. Also, it will only work fine for small distance, otherwise you'll search repeatedly the tree. Here's an alternate implementation that does a better job of it:
class BKTree[T](computeDistance: (T, T) => Int, node: T) {
val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]
def query(what: T, distance: Int): List[T] = {
val currentDistance = computeDistance(node, what)
val minDistance = currentDistance - distance
val maxDistance = currentDistance + distance
val elegibleNodes = (
subnodes.keys.toList
filter (key => minDistance to maxDistance contains key)
map subnodes
)
val partialResult = elegibleNodes flatMap (_.query(what, distance))
if (currentDistance <= distance) node :: partialResult else partialResult
}
private def find(what: T, bestDistance: Int): (Int,List[T]) = {
val currentDistance = computeDistance(node, what)
val presentSolution = if (currentDistance <= bestDistance) List(node) else Nil
val best = currentDistance min bestDistance
subnodes.keys.foldLeft((best, presentSolution))(
(acc, key) => {
val (currentBest, currentSolution) = acc
val (possibleBest, possibleSolution) =
if (key <= currentDistance + currentBest)
subnodes(key).find(what, currentBest)
else
(0, Nil)
(possibleBest, possibleSolution) match {
case (_, Nil) => acc
case (better, solution) if better < currentBest => (better, solution)
case (_, solution) => (currentBest, currentSolution ::: solution)
}
}
)
}
def findClosest(what: T): List[T] = find(what, computeDistance(node, what))._2
def insert(what: T): Boolean = if (node == what) false else (
subnodes.get(computeDistance(node, what))
map (_.insert(what))
getOrElse {
subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
true
}
)
override def toString = node.toString+"("+subnodes.toString+")"
}
object Test {
def main(args: Array[String]) {
val root = new BKTree(distance, 'A')
root.insert('C')
root.insert('E')
root.insert('M')
root.insert('Z')
println(root.findClosest('D'))
}
def charDistance(a: Char, b: Char) = a - b abs
}
With C++ and STL containers (std::map) you can use the following template function:
#include <iostream>
#include <map>
//!This function returns nearest by metric specified in "operator -" of type T
//!If two items in map are equidistant from item_to_find, the earlier occured by key will be returned
template <class T,class U> typename std::map<T,U>::iterator find_nearest(std::map<T,U> map_for_search,const T& item_to_find)
{
typename std::map<T,U>::iterator itlow,itprev;
itlow=map_for_search.lower_bound(item_to_find);
itprev=itlow;
itprev--;
//for cases when we have "item_to_find" element in our map
//or "item_to_find" occures before the first element of map
if ((itlow->first==item_to_find) || (itprev==map_for_search.begin()))
return itlow;
//if "item"to_find" is besides the last element of map
if (itlow==map_for_search.end())
return itprev;
return (itlow->first-item_to_find < item_to_find-itprev->first)?itlow:itprev; // C will be returned
//note that "operator -" is used here as a function for distance metric
}
int main ()
{
std::map<char,int> mymap;
std::map<char,int>::iterator nearest;
//fill map with some information
mymap['B']=20;
mymap['C']=40;
mymap['M']=60;
mymap['Z']=80;
char ch='D'; //C should be returned
nearest=find_nearest<char,int>(mymap,ch);
std::cout << nearest->first << " => " << nearest->second << '\n';
ch='Z'; //Z should be returned
nearest=find_nearest<char,int>(mymap,ch);
std::cout << nearest->first << " => " << nearest->second << '\n';
ch='A'; //B should be returned
nearest=find_nearest<char,int>(mymap,ch);
std::cout << nearest->first << " => " << nearest->second << '\n';
ch='H'; // equidistant to C and M -> C is returned
nearest=find_nearest<char,int>(mymap,ch);
std::cout << nearest->first << " => " << nearest->second << '\n';
return 0;
}
Output:
C => 40
Z => 80
B => 20
C => 40
It is assumed that an operator - is used as a function to evaluate distance. You should implement that operator if class T is your own class, objects of which serve as keys in a map.
You could also change the code to use special class T static member function (say, distance), not operator -, instead:
return (T::distance(itlow->first,item_to_find) < T::distance(item_to_find,itprev->first))?itlow:itprev;
where distance should be smth. like
static distance_type some_type::distance()(const some_type& first, const some_type& second){//...}
and distance_type should support comparison by operator <
You can implement something like this as a tree. A simple approach is to assign each node in the tree a bitstring. Each level of the tree is stored as a bit. All parent information is encoded in the node's bitstring. You can then easily locate arbitrary nodes, and find parents and children. This is how Morton ordering works, for example. It has the extra advantage that you can calculate distances between nodes by simple binary subtraction.
If you have multiple links between data values, then your data structure is a graph rather than a tree. In that case, you need a slightly more sophisticated indexing system. Distributed hash tables do this sort of thing. They typically have a way of calculating the distance between any two nodes in the index space. For example, the Kademlia algorithm (used by Bittorrent) uses XOR distances applied to bitstring ids. This allows Bittorrent clients to lookup ids in a chain, converging on the unknown target location. You can use a similar approach to find the node(s) closest to your target node.
If your keys are strings and your similarity function is Levenshtein distance, then you can use finite-state machines:
Your map is a trie built as a finite-state machine (by unionizing all key/value pairs and determinizing). Then, compose your input query with a simple finite-state transducer that encodes the Levenshtein distance, and compose that with your trie. Then, use the Viterbi algorithm to extract the shortest path.
You can implement all this with only a few function calls using a finite-state toolkit.
in scala this is a technique I use to find the closest Int <= to the key you are looking for
val sMap = SortedMap(1 -> "A", 2 -> "B", 3 -> "C")
sMap.to(4).lastOption.get // Returns 3
sMap.to(-1) // Returns an empty Map

Resources