Implement an immutable deque as a balanced binary tree? - data-structures

I've been thinking for a while about how to go about implementing a deque (that is, a double-ended queue) as an immutable data structure.
There seem to be different ways of doing this. AFAIK, immutable data structures are generally hierarchical, so that major parts of them can be reused after modifying operations such as the insertion or removal of an item.
Eric Lippert has two articles on his blog about this topic, along with sample implementations in C#.
Both of his implementations strike me as more elaborate than is actually necessary. Couldn't deques simply be implemented as binary trees, where elements can only be inserted or removed on the very "left" (the front) and on the very "right" (the back) of the tree?
                o
               / \
              …   …
             / \
            …   …
           / \ / \
front --> L  …  …  R <-- back
Additionally, the tree would be kept reasonably balanced with rotations:
right rotations upon insertion at the front or upon removal from the back, and
left rotations upon removal from the front or insertion at the back.
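For concreteness, here is a rough Haskell sketch of the kind of structure I have in mind. This is only my own illustration: the balance condition is made up, and only front insertion with a single right rotation is shown.

-- In-order traversal of the tree is the deque; elements enter only at the
-- leftmost (front) or rightmost (back) position.
data Tree a = Leaf | Node Int (Tree a) a (Tree a)  -- size-annotated

size :: Tree a -> Int
size Leaf = 0
size (Node n _ _ _) = n

node :: Tree a -> a -> Tree a -> Tree a
node l x r = Node (size l + 1 + size r) l x r

pushFront :: a -> Tree a -> Tree a
pushFront x Leaf = node Leaf x Leaf
pushFront x (Node _ l v r) = rebalance (node (pushFront x l) v r)

-- A full implementation would need the mirror cases for the back end and a
-- real balance invariant; the threshold below is purely illustrative.
rebalance :: Tree a -> Tree a
rebalance (Node _ (Node _ ll lv lr) v r)
  | size ll > size lr + size r + 1 = node ll lv (node lr v r)  -- right rotation
rebalance t = t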
Eric Lippert is, in my opinion, a very smart person whom I deeply respect, yet he apparently didn't consider this approach. Thus I wonder, was it for a good reason? Is my suggested way of implementing deques naïve?

As Daniel noted, implementing immutable deques with well-known balanced search trees like AVL or red-black trees gives Θ(lg n) worst-case complexity. Some of the implementations Lippert discusses may seem elaborate at first glance, but there are many immutable deques with o(lg n) worst-case, average, or amortized complexity that are built from balanced trees along with two simple ideas:
Reverse the Spines
To perform deque operations on a traditional balanced search tree, we need access to the ends, but we only have access to the center. To get to the left end, we must navigate left child pointers until we finally reach a dead end. It would be preferable to have a pointer to the left and right ends without all that navigation effort. In fact, we really don't need access to the root node very often. Let's store a balanced search tree so that access to the ends is O(1).
Here is an example in C of how you might normally store an AVL tree:
struct AVLTree {
  const char * value;
  int height;
  struct AVLTree * leftChild;
  struct AVLTree * rightChild;
};
To set up the tree so that we can start at the edges and move towards the root, we change the tree and store all of the pointers along the paths from the root to the left and rightmost children in reverse. (These paths are called the left and right spine, respectively). Just like reversing a singly-linked list, the last element becomes the first, so the leftmost child is now easily accessible.
This is a little tricky to understand. To help explain it, imagine that you only did this for the left spine:
struct LeftSpine {
  const char * value;
  int height;
  struct AVLTree * rightChild;
  struct LeftSpine * parent;
};
In some sense, the leftmost child is now the "root" of the tree. If you drew the tree this way, it would look very strange, but if you simply take your normal drawing of a tree and reverse all of the arrows on the left spine, the meaning of the LeftSpine struct should become clearer. Access to the left side of the tree is now immediate. The same can be done for the right spine:
struct RightSpine {
  const char * value;
  int height;
  struct AVLTree * leftChild;
  struct RightSpine * parent;
};
If you store both a left and a right spine as well as the center element, you have immediate access to both ends. Inserting and deleting may still be Ω(lg n), because rebalancing operations may require traversing the entire left or right spine, but simply viewing the leftmost and rightmost elements is now O(1).
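To make the shape concrete, here is a minimal Haskell sketch of this idea (my own illustration with assumed names, not code from the implementations cited below): the tree is stored as its two reversed spines plus the root element, so each end is one pointer away.

-- Height-annotated AVL subtrees hanging off the spines.
data AVL a = Nil | Branch Int (AVL a) a (AVL a)  -- height, left, value, right

-- A spine is stored in reverse: the list starts at the outermost node and
-- walks back up toward the root, carrying each node's value, height, and
-- inner subtree (the right child on the left spine, the left child on the
-- right spine).
type LeftSpine a  = [(a, Int, AVL a)]
type RightSpine a = [(a, Int, AVL a)]

data SpineTree a = Empty | SpineTree (LeftSpine a) a (RightSpine a)

-- O(1) access to the front: the head of the reversed left spine, or the
-- root element if the left spine is empty.
peekFront :: SpineTree a -> Maybe a
peekFront Empty = Nothing
peekFront (SpineTree ((x, _, _) : _) _ _) = Just x
peekFront (SpineTree [] root _) = Just root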
An example of this strategy is used to make purely functional treaps with implementations in SML and Java (more documentation). This is also a key idea in several other immutable deques with o(lg n) performance.
Bound the Rebalancing Work
As noted above, inserting at the left or rightmost end of an AVL tree can require Ω(lg n) time for traversing the spine. Here is an example of an AVL tree that demonstrates this:
A full binary tree is defined by induction as:
A full binary tree of height 0 is an empty node.
A full binary tree of height n+1 has a root node with full binary trees of height n as children.
Pushing an element onto the left of a full binary tree will necessarily increase the maximum height of the tree. Since the AVL trees above store that information in each node, and since every tree along the left spine of a full binary tree is also a full binary tree, pushing an element onto the left of an AVL deque that happens to be a full binary tree will require incrementing Ω(lg n) height values along the left spine.
(Two notes on this: (a) You can store AVL trees without keeping the height in the node; instead you keep only balance information (left-taller, right-taller, or even). This doesn't change the performance of the above example. (b) In AVL trees, you might need to do not only Ω(lg n) balance or height information updates, but Ω(lg n) rebalancing operations. I don't recall the details of this, and it may be only on deletions, rather than insertions.)
In order to achieve o(lg n) deque operations, we need to limit this work. Immutable deques represented by balanced trees usually use at least one of the following strategies:
Anticipate where rebalancing will be needed. If you are using a tree that requires o(lg n) rebalancing but you know where that rebalancing will be needed and you can get there quickly enough, you can perform your deque operations in o(lg n) time. Deques that use this as a strategy will store not just two pointers into the deque (the ends of the left and right spines, as discussed above), but some small number of jump pointers to places higher along the spines. Deque operations can then access the roots of the trees pointed to by the jump pointers in O(1) time. If o(lg n) jump pointers are maintained for all of the places where rebalancing (or changing node information) will be needed, deque operations can take o(lg n) time.
(Of course, this makes the tree actually a dag, since the trees on the spines pointed to by jump pointers are also pointed to by their children on the spine. Immutable data structures don't usually get along with non-tree graphs, since replacing a node pointed to by more than one other node requires replacing all of the other nodes that point to it. I have seen this fixed by just eliminating the non-jump pointers, turning the dag back into a tree. One can then store a singly-linked list with jump pointers as a list of lists. Each subordinate list contains all of the nodes between the head of that list and its jump pointer. This requires some care to deal with partially overlapping jump pointers, and a full explanation is probably not appropriate for this aside.)
This is one of the tricks used by Tsakalidis in his paper "AVL Trees for localized search" to allow O(1) deque operations on AVL trees with a relaxed balance condition. It is also the main idea used by Kaplan and Tarjan in their paper "Purely functional, real-time deques with catenation" and a later refinement of that by Mihaesau and Tarjan. Munro et al.'s "Deterministic Skip Lists" also deserves a mention here, though translating skip lists to an immutable setting by using trees sometimes changes the properties that allow such efficient modification near the ends. For examples of the translation, see Messeguer's "Skip trees, an alternative data structure to Skip lists in a concurrent approach", Dean and Jones's "Exploring the duality between skip lists and binary search trees", and Lamoureux and Nickerson's "On the Equivalence of B-trees and deterministic skip lists".
Do the work in bulk. In the full binary tree example above, no rebalancing is needed on a push, but Ω(lg n) nodes need to have their height or balance information updated. Instead of actually doing the incrementation, you could simply mark the spine at the ends as needing incrementation.
One way to understand this process is by analogy to binary numbers. (2^n)-1 is represented in binary by a string of n 1's. When adding 1 to this number, you need to change all of the 1's to 0's and then add a 1 at the end. The following Haskell encodes binary numbers as non-empty strings of bits, least significant first.
data Bit = Zero | One
type Binary = (Bit,[Bit])
incr :: Binary -> Binary
incr (Zero,x) = (One,x)
incr (One,[]) = (Zero,[One])
incr (One,(x:xs)) =
  let (y,ys) = incr (x,xs)
  in (Zero,y:ys)
incr is a recursive function, and for numbers of the form (One,replicate k One), incr calls itself Ω(k) times.
Instead, we might represent groups of equal bits by only the number of bits in the group. Neighboring bits or groups of bits are combined into one group if they are equal (in value, not in number). We can increment in O(1) time:
data Bits = Zeros Int | Ones Int
type SegmentedBinary = (Bits,[Bits])
segIncr :: SegmentedBinary -> SegmentedBinary
segIncr (Zeros 1,[]) = (Ones 1,[])
segIncr (Zeros 1,(Ones n:rest)) = (Ones (n+1),rest)
segIncr (Zeros n,rest) = (Ones 1,Zeros (n-1):rest)
segIncr (Ones n,[]) = (Zeros n,[Ones 1])
segIncr (Ones n,(Zeros 1:Ones m:rest)) = (Zeros n,Ones (m+1):rest)
segIncr (Ones n,(Zeros p:rest)) = (Zeros n,Ones 1:Zeros (p-1):rest)
Since segIncr is not recursive and doesn't call any functions other than plus and minus on Ints, you can see it takes O(1) time.
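As a quick sanity check (my own addition, not part of the original development), a segmented number can be expanded back into plain bits and compared against repeated incr:

-- Expand a segmented number into its bit string, least significant first.
expand :: SegmentedBinary -> [Bit]
expand (b, bs) = concatMap bits (b : bs)
  where bits (Zeros n) = replicate n Zero
        bits (Ones n)  = replicate n One

For instance, expand (segIncr (Ones 3, [])) is [Zero,Zero,Zero,One], the least-significant-first encoding of 8, as expected for 7 + 1.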
Some of the deques mentioned in the section above entitled "Anticipate where rebalancing will be needed" actually use a different numerically-inspired technique called "redundant number systems" to limit the rebalancing work to O(1) and locate it quickly. Redundant numerical representations are fascinating, but possibly too far afield for this discussion. Elmasry et al.'s "Strictly-regular number system and data structures" is not a bad place to start reading about that topic. Hinze's "Bootstrapping one-sided flexible arrays" may also be useful.
In "Making data structures persistent", Driscoll et al. describe lazy recoloring, which they attribute to Tsakalidis. They apply it to red-black trees, which can be rebalanced after insertion or deletion with O(1) rotations (but Ω(lg n) recolorings) (see Tarjan's "Updataing a balanced tree in O(1) rotations"). The core of the idea is to mark a large path of nodes that need to be recolored but not rotated. A similar idea is used on AVL trees in the older versions of Brown & Tarjan's "A fast merging algorithm". (Newer versions of the same work use 2-3 trees; I have not read the newer ones and I do not know if they use any techniques like lazy recoloring.)
Randomize. Treaps, mentioned above, can be implemented in a functional setting so that they perform deque operations in O(1) time on average. Since deques do not need to inspect their elements, this average is not susceptible to malicious input degrading performance, unlike simple (no rebalancing) binary search trees, which are fast on average input. Treaps use an independent source of random bits instead of relying on randomness from the data.
In a persistent setting, treaps may be susceptible to degraded performance from malicious input with an adversary who can both (a) use old versions of a data structure and (b) measure the performance of operations. Because they do not have any worst-case balance guarantees, treaps can become quite unbalanced, though this should happen rarely. If an adversary waits for a deque operation that takes a long time, she can initiate that same operation repeatedly in order to measure and take advantage of a possibly unbalanced tree.
If this is not a concern, treaps are an attractively simple data structure. They are very close to the AVL spine tree described above.
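As a rough illustration, here is a standard root-based persistent treap used as a sequence (my own sketch, not the spine-reversed variant from the SML and Java code linked above, so end access here is expected O(lg n) rather than O(1)):

-- Heap-ordered by priority; in-order traversal gives the deque. Priorities
-- are supplied by the caller, e.g. from a random generator.
data Treap a = Tip | Fork Int (Treap a) a (Treap a)  -- priority, left, value, right

-- Merge two treaps in which every element of l precedes every element of r.
merge :: Treap a -> Treap a -> Treap a
merge Tip r = r
merge l Tip = l
merge l@(Fork lp ll lv lr) r@(Fork rp rl rv rr)
  | lp > rp   = Fork lp ll lv (merge lr r)
  | otherwise = Fork rp (merge l rl) rv rr

pushFront, pushBack :: Int -> a -> Treap a -> Treap a
pushFront p x t = merge (Fork p Tip x Tip) t
pushBack  p x t = merge t (Fork p Tip x Tip)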
Skip lists, mentioned above, might also be amenable to functional implementations with O(1) average-time deque operations.
The first two techniques for bounding the rebalancing work require complex modifications to data structures while usually affording a simple analysis of the complexity of deque operations. Randomization and the next technique yield simpler data structures but more complex analyses. The original analysis by Seidel and Aragon is not trivial, and there is some complex analysis of exact probabilities using more advanced mathematics than is present in the papers cited above -- see Flajolet et al.'s "Patterns in random binary search trees".
Amortize. There are several balanced trees that, when viewed from the roots up (as explained in "Reverse the Spines", above), offer O(1) amortized insertion and deletion time. Individual operations can take Ω(lg n) time, but they put the tree in such a nice state that a large number of operations following the expensive operation will be cheap.
Unfortunately, this kind of analysis does not work when old versions of the tree are still around. A user can perform operations on the old, nearly-out-of-balance tree many times without any intervening cheap operations.
One way to get amortized bounds in a persistent setting was invented by Chris Okasaki. It is not simple to explain how the amortization survives the ability to use arbitrary old versions of a data structure, but if I remember correctly, Okasaki's first (as far as I know) paper on the subject has a pretty clear explanation. For more comprehensive explanations, see his thesis or his book.
As I understand it, there are two essential ingredients. First, instead of just guaranteeing that a certain number of cheap operations occur before each expensive operation (the usual approach to amortization), you actually designate and set up that specific expensive operation before performing the cheap operations that will pay for it. In some cases, the operation is scheduled to be started (and finished) only after many intervening cheap steps. In other cases, the operation is actually scheduled only O(1) steps in the future, but cheap operations may do part of the expensive operation and then reschedule more of it for later. An adversary looking to repeat an expensive operation over and over again thus ends up reusing the same scheduled operation each time. This sharing is where the second ingredient comes in.
The computation is set up using laziness. A lazy value is not computed immediately; the first time a client needs to inspect it, its value is computed and saved. Later clients can use that cached value directly, without having to recompute it.
#include <stdlib.h>

struct lazy {
  int (*oper)(const char *);  /* the suspended computation */
  char * arg;                 /* its argument */
  int * ans;                  /* cached answer, or NULL if not yet forced */
};

typedef struct lazy * lazyop;

lazyop suspend(int (*oper)(const char *), char * arg) {
  lazyop ans = (lazyop)malloc(sizeof(struct lazy));
  ans->oper = oper;
  ans->arg = arg;
  ans->ans = 0;  /* malloc does not zero memory, so mark as unforced */
  return ans;
}

void force(lazyop susp) {
  if (0 == susp) return;
  if (0 != susp->ans) return;  /* already forced: keep the cached answer */
  susp->ans = (int*)malloc(sizeof(int));
  *susp->ans = susp->oper(susp->arg);
}

int get(lazyop susp) {
  force(susp);
  return *susp->ans;
}
Laziness constructs are included in some MLs, and Haskell is lazy by default. Under the hood, laziness is a mutation, which leads some authors to call it a "side effect". That might be considered bad if that kind of side effect doesn't play well with whatever the reasons were for selecting an immutable data structure in the first place, but, on the other hand, thinking of laziness as a side effect allows the application of traditional amortized analysis techniques to persistent data structures, as mentioned in a paper by Kaplan, Okasaki, and Tarjan entitled "Simple Confluently Persistent Catenable Lists".
Consider again the adversary from above who is attempting to repeatedly force the computation of an expensive operation. After the first force of the lazy value, every remaining force is cheap.
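The same sharing is visible directly in Haskell. In this small demonstration of my own, Debug.Trace.trace prints its message only when the suspended computation actually runs:

import Debug.Trace (trace)

expensive :: Integer
expensive = trace "computing..." (sum [1 .. 10000000])

main :: IO ()
main = do
  print expensive  -- prints "computing..." and then the sum
  print expensive  -- the cached result is reused; the trace does not fire again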
In his book, Okasaki explains how to build deques with O(1) amortized time required for each operation. It is essentially a B+-tree, which is a tree where all of the elements are stored at the leaves, nodes may vary in how many children they have, and every leaf is at the same depth. Okasaki uses the spine-reversal method discussed above, and he suspends (that is, stores as a lazy value) the spines above the leaf elements.
The structure presented by Hinze and Paterson in their paper "Finger trees: a simple general-purpose data structure" is halfway between the deques designed by Okasaki and the "Purely functional representations of catenable sorted lists" of Kaplan and Tarjan. Hinze and Paterson's structure has become very popular.
As evidence of how tricky the amortized analysis is to understand, Hinze and Paterson's finger trees are frequently implemented without laziness, making the time bounds not O(1) but still O(lg n). One implementation that seems to use laziness is the one in functional-dotnet. That project also includes an implementation of lazy values in C#, which might help explain them if my explanation above is lacking.
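For what it's worth, the finger tree most Haskell programmers meet is Data.Sequence from the containers package, which follows the Hinze-Paterson design and documents amortized O(1) operations at both ends:

import Data.Sequence (Seq, (<|), (|>), ViewL(..), ViewR(..), viewl, viewr)
import qualified Data.Sequence as Seq

demo :: (Maybe Int, Maybe Int)
demo = (front, back)
  where
    s = (0 <| Seq.fromList [1, 2, 3]) |> 4   -- push onto the front and the back
    front = case viewl s of { x :< _ -> Just x; EmptyL -> Nothing }
    back  = case viewr s of { _ :> x -> Just x; EmptyR -> Nothing }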
Could deques be implemented as binary trees? Yes, and their worst-case complexity when used persistently would be no worse than that of those presented by Eric Lippert. However, Eric's trees are actually not complicated enough to get O(1) deque operations in a persistent setting, though they miss that goal only by a small complexity margin (making the center lazy) if you are willing to accept amortized performance. A different but also simple view of treaps can get O(1) expected performance in a functional setting, assuming an adversary who is not too tricky. Getting O(1) worst-case deque operations with a tree-like structure in a functional setting requires a good bit more complexity than Eric's implementations.
Two final notes (though this is a very interesting topic and I reserve the right to add more later) :-)
Nearly all of the deques mentioned above are finger search trees as well. In a functional setting this means they can be split at the ith element in O(lg(min(i,n-i))) time and two trees of size n and m can be concatenated in O(lg(min(n,m))) time.
I know of two ways of implementing deques that don't use trees. Okasaki presents one in his book and thesis and the paper I linked to above. The other uses a technique called "global rebuilding" and is presented in Chuang and Goldberg's "Real-time deques, multihead Turing machines, and purely functional programming".

If you use a balanced binary tree, insertions and removals on both ends are O(lg N) (both average and worst case).
The approach used in Eric Lippert's implementations is more efficient, running in constant time in the average case (the worst case still is O(lg N)).
Remember that modifying an immutable tree involves rewriting all parents of the node you are modifying. So for a deque, you do not want the tree to be balanced; instead you want the L and R nodes to be as close to the root as possible, whereas nodes in the middle of the tree can be further away.

The other answers are all awesome. I will add to them that I chose the finger tree implementation of a deque because it makes an unusual and interesting use of the generic type system. Most data structures are recursive in their structure, but this technique puts the recursion also in the type system which I had not seen before; I thought it might be of general interest.

Couldn't deques simply be implemented as binary trees, where elements can only be inserted or removed on the very "left" (the front) and on the very "right" (the back) of the tree?
Absolutely. A modified version of a height-balanced tree, AVL trees in particular, would be very easy to implement. However, it means filling a tree-based deque with n elements requires O(n lg n) time -- you should shoot for a deque that has performance characteristics similar to its mutable counterpart.
You can create a straightforward immutable deque with amortized constant time operations for all major operations using two stacks, a left and right stack. PushLeft and PushRight correspond to pushing values on the left and right stack respectively. You can get Deque.Hd and Deque.Tl from the LeftStack.Hd and LeftStack.Tl; if your LeftStack is empty, set LeftStack = RightStack.Reverse and RightStack = empty stack.
type 'a deque = Node of 'a list * 'a list

let peekFront = function
    | Node([], []) -> failwith "Empty queue"
    | Node(x::xs, ys) -> x
    | Node([], ys) -> ys |> List.rev |> List.head

let peekRear = function
    | Node([], []) -> failwith "Empty queue"
    | Node(xs, y::ys) -> y
    | Node(xs, []) -> xs |> List.rev |> List.head

let pushFront v = function
    | Node(xs, ys) -> Node(v::xs, ys)

let pushRear v = function
    | Node(xs, ys) -> Node(xs, v::ys)

let tl = function
    | Node([], []) -> failwith "Empty queue"
    | Node([], ys) -> Node(ys |> List.rev |> List.tail, [])
    | Node(x::xs, ys) -> Node(xs, ys)
This is a very common implementation, and it's very easy to optimize for better performance.

Related

What invariant do RRB-trees maintain?

Relaxed Radix Balanced Trees (RRB-trees) are a generalization of immutable vectors (used in Clojure and Scala) that have 'effectively constant' indexing and update times. RRB-trees maintain efficient indexing and update but also allow efficient concatenation (log n).
The authors present the data structure in a way that I find hard to follow. I am not quite sure what the invariant is that each node maintains.
In section 2.5, they describe their algorithm. I think they are ensuring that indexing into the node will only ever require e extra steps of linear search after radix searching. I do not understand how they derived their formula for the extra steps, and I think perhaps I'm not sure what each of the variables mean (in particular "a total of p sub-tree branches").
Also, how does the RRB-tree concatenation algorithm work?
They do describe an invariant in section 2.4: "However, as mentioned earlier B-Trees nodes do not facilitate radix searching. Instead we chose the initial invariant of allowing the node sizes to range between m and m - 1. This defines a family of balanced trees starting with well known 2-3 trees, 3-4 trees and (for m=32) 31-32 trees. This invariant ensures balancing and achieves radix branch search in the majority of cases. Occasionally a few step linear search is needed after the radix search to find the correct branch. The extra steps required increase at the higher levels."
Looking at their formula, it looks like they have worked out the maximum and minimum possible number of values stored in a subtree. The difference between the two is the maximum possible difference between the maximum and minimum number of values underneath a point. If you divide this by the number of values underneath a slot, you have the maximum number of slots you could be off by when you work out which slot to look at to see if it contains the index you are searching for.
@mcdowella is correct: that's what they say about relaxed nodes. But if you're splitting and joining nodes, a range from m to m-1 means you will sometimes have to adjust up to m-1 (m-2?) nodes in order to add or remove a single element from a node. This seems horribly inefficient. I think they meant between m and (2 m) - 1, because this allows nodes to be split into 2 when they get too big, or 2 nodes joined into one when they are too small, without ever needing to change a third node. So it's a typo that the "2" is missing in "2 m" in the paper. Jean Niklas L'orange's masters thesis backs me up on this.
Furthermore, all strict nodes have the same length, which must be a power of 2. The reason for this is an optimization in Rich Hickey's Clojure PersistentVector. Well, I think the important thing is to pack all strict nodes left (more on this later) so you don't have to guess which branch of the tree to descend. But being able to bit-shift and bit-mask instead of divide is a nice bonus. I didn't time the get() operation on a relaxed Scala Vector, but the relaxed Paguro vector is about 10x slower than the strict one. So it makes every effort to be as strict as possible, even producing 2 strict levels if you repeatedly insert at 0.
Their tree also has a uniform height - all leaf nodes are an equal distance from the root. I think it would still work if relaxed trees only had to be within, say, one level of one another, though I'm not sure what that would buy you.
Relaxed nodes can have strict children, but not vice-versa.
Strict nodes must be filled from the left (low-index) without gaps. Any non-full Strict nodes must be on the right-hand (high-index) edge of the tree. All Strict leaf nodes can always be full if you do appends in a focus or tail (more on that below).
You can see most of the invariants by searching for the debugValidate() methods in the Paguro implementation. That's not their paper, but it's mostly based on it. Actually, the "display" variables in the Scala implementation aren't mentioned in the paper either. If you're going to study this stuff, you probably want to start by taking a good look at the Clojure PersistentVector, because the RRB Tree has one inside it. The two differences between that and the RRB Tree are: 1. the RRB Tree allows "relaxed" nodes, and 2. the RRB Tree may have a "focus" instead of a "tail." Both focus and tail are small buffers (maybe the same size as a strict leaf node), the difference being that the focus will probably be localized to whatever area of the vector was last inserted/appended to, while the tail is always at the end (PersistentVector can only be appended to, never inserted into). These 2 differences are what allow O(log n) arbitrary inserts and removals, plus O(log n) split() and join() operations.

Why use binary search if there's ternary search?

I recently heard about ternary search in which we divide an array into 3 parts and compare. Here there will be two comparisons but it reduces the array to n/3. Why don't people use this much?
Actually, people do use k-ary trees for arbitrary k.
This is, however, a tradeoff.
To find an element in a k-ary tree, you need around k*ln(N)/ln(k) operations (remember the change-of-base formula). The larger your k is, the more overall operations you need.
The logical extension of what you are saying is "why don't people use an N-ary tree for N data elements?". Which, of course, would be an array.
A ternary search will still give you the same asymptotic complexity O(log N) search time, and adds complexity to the implementation.
The same argument can be said for why you would not want a quad search or any other higher order.
Searching 1 billion (a US billion - 1,000,000,000) sorted items would take an average of about 15 compares with binary search and about 9 compares with a ternary search - not a huge advantage. And note that each 'ternary compare' might involve 2 actual comparisons.
Wow. The top voted answers miss the boat on this one, I think.
Your CPU doesn't support ternary logic as a single operation; it breaks ternary logic into several steps of binary logic. The optimal code for the CPU is binary logic. If chips that supported ternary logic as a single operation were common, you'd be right.
B-Trees can have multiple branches at each node; an order-3 B-tree is ternary logic. Each step down the tree will take two comparisons instead of one, and this will probably cause it to be slower in CPU time.
B-Trees, however, are pretty common. If you assume that every node in the tree will be stored somewhere separately on disk, you're going to spend most of your time reading from disk... and the CPU won't be a bottleneck, but the disk will be. So you take a B-tree with 100,000 children per node, or whatever else will barely fit into one block of memory. B-trees with that kind of branching factor would rarely be more than three nodes high, and you'd only have three disk reads - three stops at a bottleneck - to search an enormous, enormous dataset.
Reviewing:
Ternary trees aren't supported by hardware, so they run less quickly.
B-trees with orders much, much, much higher than 3 are common for disk optimization of large datasets; once you've gone past 2, go higher than 3.
The only way a ternary search can be faster than a binary search is if a 3-way partition determination can be done for less than about 1.55 times the cost of a 2-way comparison. If the items are stored in a sorted array, the 3-way determination will on average be 1.66 times as expensive as a 2-way determination. If information is stored in a tree, however, where the cost to fetch information is high relative to the cost of actually comparing, and where cache locality means the cost of randomly fetching a pair of related data is not much worse than the cost of fetching a single datum, then a ternary or n-way tree may improve efficiency greatly.
What makes you think Ternary search should be faster?
Average number of comparisons:
in ternary search = ((1/3)*1 + (2/3)*2) * ln(n)/ln(3) ~ 1.517*ln(n)
in binary search = 1 * ln(n)/ln(2) ~ 1.443*ln(n).
Worst-case number of comparisons:
in ternary search = 2 * ln(n)/ln(3) ~ 1.820*ln(n)
in binary search = 1 * ln(n)/ln(2) ~ 1.443*ln(n).
So it looks like ternary search is worse.
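Those constants are easy to check (a quick verification of my own, using natural logarithms as above):

-- Comparison counts per ln(n), matching the figures quoted above.
avgTernary, avgBinary, worstTernary, worstBinary :: Double
avgTernary   = ((1/3) * 1 + (2/3) * 2) / log 3  -- ~1.517
avgBinary    = 1 / log 2                        -- ~1.443
worstTernary = 2 / log 3                        -- ~1.820
worstBinary  = 1 / log 2                        -- ~1.443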
Also, note that this sequence generalizes to linear search if we go on
Binary search
Ternary search
...
...
n-ary search ≡ linear search
So, in an n-ary search, we will have only one "COMPARE", but it might take up to n actual comparisons.
"Terinary" (ternary?) search is more efficient in the best case, which would involve searching for the first element (or perhaps the last, depending on which comparison you do first). For elements farther from the end you're checking first, while two comparisons would narrow the array by 2/3 each time, the same two comparisons with binary search would narrow the search space by 3/4.
Add to that, binary search is simpler. You just compare and get one half or the other, rather than compare, if less than get the first third, else compare, if less than get the second third, else get the last third.
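To make that concrete, here is a sketch of both searches over a sorted vector (my own illustration in Haskell using Data.Vector; the index arithmetic is illustrative, not tuned):

import Data.Vector (Vector, (!))
import qualified Data.Vector as V

-- Classic binary search: one comparison point per step.
bsearch :: Ord a => a -> Vector a -> Maybe Int
bsearch x v = go 0 (V.length v - 1)
  where
    go lo hi
      | lo > hi   = Nothing
      | otherwise =
          let mid = (lo + hi) `div` 2
          in case compare x (v ! mid) of
               LT -> go lo (mid - 1)
               GT -> go (mid + 1) hi
               EQ -> Just mid

-- Ternary search: two comparison points per step, three-way narrowing.
tsearch :: Ord a => a -> Vector a -> Maybe Int
tsearch x v = go 0 (V.length v - 1)
  where
    go lo hi
      | lo > hi   = Nothing
      | otherwise =
          let third = (hi - lo) `div` 3
              m1 = lo + third
              m2 = hi - third
          in case compare x (v ! m1) of             -- first comparison
               EQ -> Just m1
               LT -> go lo (m1 - 1)
               GT -> case compare x (v ! m2) of     -- second comparison
                       EQ -> Just m2
                       LT -> go (m1 + 1) (m2 - 1)
                       GT -> go (m2 + 1) hi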
Ternary search can be effectively used on parallel architectures - FPGAs and ASICs. For example if internal FPGA memory required for search is less than half of the FPGA resource, you can make a duplicate memory block. This would allow to simultaneously access two different memory addresses and do all comparisons in a single clock cycle. This is one of the reasons why 100MHz FPGA can sometimes outperform the 4GHz CPU :)
Here's some random experimental evidence that I haven't vetted at all showing that it's slower than binary search.
Almost all textbooks and websites on binary search trees do not really talk about binary trees! They show you ternary search trees! True binary trees store data in their leaves, not in internal nodes (except for keys to navigate). Some call these leaf trees and make the distinction from the node trees shown in textbooks:
J. Nievergelt, C.-K. Wong: Upper Bounds for the Total Path Length of Binary Trees, Journal of the ACM 20 (1973), 1–6.
The following about this is from Peter Brass's book on data structures.
2.1 Two Models of Search Trees

In the outline just given, we suppressed an important point that at first seems trivial, but indeed it leads to two different models of search trees, either of which can be combined with much of the following material, but one of which is strongly preferable.

If we compare in each node the query key with the key contained in the node and follow the left branch if the query key is smaller and the right branch if the query key is larger, then what happens if they are equal? The two models of search trees are as follows:

1. Take the left branch if the query key is smaller than the node key; otherwise take the right branch, until you reach a leaf of the tree. The keys in the interior nodes of the tree are only for comparison; all the objects are in the leaves.

2. Take the left branch if the query key is smaller than the node key; take the right branch if the query key is larger than the node key; and take the object contained in the node if they are equal.

This minor point has a number of consequences:

- In model 1, the underlying tree is a binary tree, whereas in model 2, each tree node is really a ternary node with a special middle neighbor.

- In model 1, each interior node has a left and a right subtree (each possibly a leaf node of the tree), whereas in model 2, we have to allow incomplete nodes, where the left or right subtree might be missing, and only the comparison object and key are guaranteed to exist. So the structure of a search tree of model 1 is more regular than that of a tree of model 2; this is, at least for the implementation, a clear advantage.

- In model 1, traversing an interior node requires only one comparison, whereas in model 2, we need two comparisons to check the three possibilities. Indeed, trees of the same height in models 1 and 2 contain at most approximately the same number of objects, but one needs twice as many comparisons in model 2 to reach the deepest objects of the tree. Of course, in model 2, there are also some objects that are reached much earlier; the object in the root is found with only two comparisons, but almost all objects are on or near the deepest level.

Theorem. A tree of height h and model 1 contains at most 2^h objects. A tree of height h and model 2 contains at most 2^(h+1) − 1 objects.

This is easily seen because the tree of height h has as left and right subtrees a tree of height at most h − 1 each, and in model 2 one additional object between them.

- In model 1, keys in interior nodes serve only for comparisons and may reappear in the leaves for the identification of the objects. In model 2, each key appears only once, together with its object. It is even possible in model 1 that there are keys used for comparison that do not belong to any object, for example, if the object has been deleted. With the functions of comparison and identification conceptually separated, this is not surprising, and in later structures we might even need to define artificial tests not corresponding to any object, just to get a good division of the search space. All keys used for comparison are necessarily distinct because, in a model 1 tree, each interior node has nonempty left and right subtrees. So each key occurs at most twice, once as a comparison key and once as an identification key in a leaf.

Model 2 became the preferred textbook version because in most textbooks the distinction between an object and its key is not made: the key is the object. Then it becomes unnatural to duplicate the key in the tree structure. But in all real applications, the distinction between key and object is quite important. One almost never wishes to keep track of just a set of numbers; the numbers are normally associated with some further information, which is often much larger than the key itself.
You may have heard ternary search being used in those riddles that involve weighing things on scales. Those scales can return 3 answers: left is lighter, both are the same, or left is heavier. So in a ternary search, it only takes 1 comparison.
However, computers use boolean logic, which only has 2 answers. To do the ternary search, you'd actually have to do 2 comparisons instead of 1.
I guess there are some cases where this is still faster, as earlier posters mentioned, but you can see that ternary search isn't always better, and it's more confusing and less natural to implement on a computer.
Theoretically, the minimum of k/ln(k) is achieved at k = e, and since 3 is closer to e than 2, a ternary search requires fewer comparisons by this measure. You can check that 3/ln(3) = 2.73.. and 2/ln(2) = 2.88.. The reason binary search could nevertheless be faster is that the code for it has fewer branches and runs faster on modern CPUs.
I have just posted a blog about the ternary search, showing some results, and I have provided some initial-level implementations on my git repo. I totally agree with everyone about the theory part of the ternary search, but why not give it a try? As for the implementation, that part is easy enough if you have a few years of coding experience.
I found that if you have a huge data set and you need to search it many times, ternary search has an advantage.
If you think you can do better with a ternary search, go for it.
Although you get the same big-O complexity, O(log n), in both search trees, the difference is in the constants. You have to do more comparisons at each level of a ternary search tree, so the difference boils down to the factor (k−1)/ln(k) for the worst-case comparison count in a k-ary search tree. Over the integers k ≥ 2, this is minimized at k = 2, so binary provides the optimal result.

Binary Search Tree for specific intent

We all know there are plenty of self-balancing binary search trees (BSTs), the most famous being the red-black tree and the AVL tree. It might be useful to take a look at AA-trees and scapegoat trees too.
I want to do deletions, insertions, and searches, like any other BST. However, it will be common to delete all values in a given range, or to delete whole subtrees. So:
I want to insert, search, remove values in O(log n) (balanced tree).
I would like to delete a subtree, keeping the whole tree balanced, in O(log n) (worst-case or amortized)
It might be useful to delete several values in a row, before balancing the tree
I will most often insert 2 values at once, however this is not a rule (just a tip in case there is a tree data structure that takes this into account)
Is there a variant of AVL or RB trees that helps me with this? Scapegoat trees look closer to what I want, but would also need some changes; could anyone with experience with them share some thoughts?
More precisely, which balancing procedure and/or removal procedure would help me keep these actions time-efficient?
It is possible to delete a range of values from a BST in O(log n + number of deleted objects).
The easiest way I know is to work with the Deterministic Skip List data structure (you might want to read a bit about this data structure before you go on).
In a deterministic skip list all of the real values are stored in the bottom level, and there are pointers on upper levels to them. Insert, search, and remove are done in O(log n).
The range deletion operation can be done according to the following algorithm:
Find the first element in the range - O(log n)
Go forward in the linked list, and remove all elements that are still in the range. If there are elements with pointers to the upper levels, remove them too, up to the topmost level (removal from a linked list) - O(number of deleted objects)
Fix the pointers to fit the deterministic skip list (2-3 elements between every pointer upward)
The total complexity of the range delete is O(log n + number of objects in the range).
Notice that if you choose to work with a randomized skip list, you get the same complexity, but on average rather than worst case. The plus is that you don't have to fix the upper-level pointers to meet the 2-3 demand.
A deterministic skip list has a 1-1 mapping to a 2-3 tree, so with some more work the procedure described above could work for a 2-3 tree as well.
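As an aside, on a balanced search tree the same range deletion can even avoid the per-element cost entirely by splitting around the endpoints and rejoining. Here is a sketch using Haskell's Data.Set from the containers package (splitMember is part of its standard API):

import qualified Data.Set as Set

-- Delete every element in the inclusive range [lo, hi]: split the set
-- around the endpoints and unite the outer pieces.
deleteRange :: Ord a => a -> a -> Set.Set a -> Set.Set a
deleteRange lo hi s = Set.union below above
  where
    (below, _, rest) = Set.splitMember lo s    -- below: everything < lo
    (_, _, above)    = Set.splitMember hi rest -- above: everything > hi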
Long ago in the pre-STL days I wrote my own B-Tree (BST) algorithm because I had a rather large data set at the time (roughly 700K items in 2 trees that were interdependent). I found that rebalancing after every 100-200 insertions/deletions was the peak performance I could get at the time based on experimentation on 486 and SGI hardware. This number may be different now, or maybe not since it does appear to be an algorithmic optimization limit unless you convert to a parallel model.
In short, you could apply a modification trigger for the rebalancing, and allow for forced rebalancing when you've completed all your modifications.
The improvement was remarkable. The initial straight load was not complete after 25m (I killed the process). Rebalancing as we went was also killed, after 15m. The restricted-modification load with a rebalance every 100 mods loaded and ran in less than 3m. Note that during the "run" portion, there were 0-8 modifications to the tree per initial entry. You really need to consider whether you always need to be in balance when the tree will be modified again in the near term.
Hmm, what about B-trees? They are also balanced, and if you choose a big-order one --- depending on how many items you have --- you will save a bunch of object creation/destruction time.
Regarding point 2: if you have a B-tree of order 100, you can remove up to 100 items with one function call.
Regarding point 3: this feature can be applied to almost any tree; just implement a RemoveSome() function that removes N items and does a rebalance. For B-trees, it's a bit trickier, but it can be done.
Note: I assumed you're a programmer. If you need a complete, tested, off-the-shelf solution, you need another answer.
It should be easy to implement deleting a node and its subnodes in an AVL tree if every node stores its height instead of a balance factor. After deleting a node, keep rotating until the two child subtrees differ in height by no more than one. Then move up the tree and repeat. The only real difference from a normal deletion is a while instead of an if when testing the heights.
The Set implementation in the OCaml standard library is a purely functional AVL tree that satisfies all of your requirements and, in particular, has very efficient implementations of set theoretic operations (union, intersection, difference). Insertion and deletion are O(log n). You can remove subtrees and runs of elements by representing them as a set and using set difference. You can insert two elements simultaneously by creating a 2-element set and applying set union.
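The same pattern is available in Haskell's Data.Set, also a balanced tree with efficient set-theoretic operations (the values here are purely illustrative):

import qualified Data.Set as Set

-- Remove a run of elements with set difference, then insert two elements
-- at once by uniting with a 2-element set.
pruneAndGrow :: Set.Set Int -> Set.Set Int
pruneAndGrow s = Set.union remaining (Set.fromList [11, 12])
  where remaining = Set.difference s (Set.fromList [4 .. 7])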

Sorting a binary search tree on different key value

Say I have a binary tree with the following definition for a node.
struct node
{
    int key1;
    int key2;
};
The binary search tree is created on the basis of key1. Now, is it possible to rearrange the binary search tree on the basis of key2 in O(1) space? (I can do this in non-constant space using an array of pointers to the nodes.)
The actual problem where I require this is "counting number of occurrences of unique words in a file and displaying the result in decreasing order of frequency."
Here, a BST node is
{
    char *word;
    int freq;
}
The BST is first created on the basis of the alphabetical order of the words, and finally I want it ordered on the basis of freq.
Am I wrong in my choice of data structure, i.e., a BST?
I think you can create a new tree sorted by freq by popping all the elements from the old tree and pushing them into the new one.
That could be O(1) extra space, though more likely O(log N), which isn't big anyway.
Also, I don't know what you call it in C#, but in Python you can use a list and sort it in-place by two different keys.
A map or a BST is good if you need sorted output for your dictionary, and if you need to mix add, remove, and lookup operations.
I don't think that is your need here. You load the dictionary, sort it, then do only lookups in it, right?
In this case a sorted array is probably a better container. (See Item 23 of Effective STL by Scott Meyers.)
(Update: simply consider that a map could generate more memory cache misses than a sorted array, as an array keeps its data contiguous in memory, and as each node in a map contains 2 pointers to other nodes in the map. When your objects are simple and don't take much space in memory, a sorted vector is probably a better option. I warmly recommend reading that item from Meyers's book.)
As for the kind of sort you are talking about, you will need the STL algorithm stable_sort. The idea is to sort the container by word, then sort with stable_sort() on the frequency key, so that entries with equal frequencies keep their alphabetical order.
It will give something like this (not actually tested, but you get the idea):

#include <algorithm>
#include <string>
#include <vector>

struct Node
{
    char * word;
    int key;
};

bool operator < (const Node& l, const Node& r)
{
    return std::string(l.word) < std::string(r.word);
}

bool freq_comp(const Node& l, const Node& r)
{
    return l.key < r.key;
}

std::vector<Node> my_vector;
... // loading elements
std::sort(my_vector.begin(), my_vector.end());
std::stable_sort(my_vector.begin(), my_vector.end(), freq_comp);
Using a HashTable (Java) or Dictionary (.NET) or equivalent data structure in your language of choice (hash_set or hash_map in STL) will give you O(1) inserts during the counting phase, unlike the binary search tree, which would be somewhere from O(log n) to O(n) per insert depending on whether it balances itself. If performance is really that important, just make sure you try to initialize your HashTable to a large enough size that it won't need to resize itself dynamically, which can be expensive.
As for listing by frequency, I can't immediately think of a tricky way to do that without involving a sort, which would be O(n log n).
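For illustration, the whole count-then-sort pipeline is a few lines in Haskell with Data.Map from the containers package (my own sketch):

import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Count each word, then sort once by descending frequency:
-- O(n log n) overall, dominated by the final sort.
wordFreqs :: String -> [(String, Int)]
wordFreqs = sortBy (comparing (Down . snd))
          . Map.toList
          . Map.fromListWith (+)
          . map (\w -> (w, 1))
          . words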
Here is my suggestion for re-balancing the tree based on the new keys (well, I have 2 suggestions).
The first and more direct one is to somehow adapt heapsort's "bubble-up" function (to use Sedgewick's name for it); Wikipedia calls it "sift-up". It is not designed for an entirely unbalanced tree (which is what you'd need), but I believe it demonstrates the basic flow of an in-place reordering of a tree. It may be a bit hard to follow because the tree is in fact stored in an array rather than a tree (though the logic in a sense treats it as a tree) --- perhaps, though, you'll find such an array-based representation is best! Who knows.
The more crazy-out-there suggestion of mine is to use a splay tree. I think they're nifty, and here's the wiki link. Basically, whichever element you access is "bubbled up" to the top, but it maintains the BST invariants. So you maintain the original Key1 for building the initial tree, but hopefully most of the "higher-frequency" values will also be near the top. This may not be enough (as all it will mean is that higher-frequency words will be "near" the top of the tree, not necessarily ordered in any fashion), but if you do happen to have or find or make a tree-balancing algorithm, it may run a lot faster on such a splay tree.
Hope this helps! And thank you for an interesting riddle, this sounds like a good Haskell project to me..... :)
You can easily do this in O(1) space, but not in O(1) time ;-)
Even though re-arranging a whole tree recursively until it is sorted again seems possible, it is probably not very fast - it may be O(n) at best, probably worse in practice. So you might get a better result by adding all nodes to an array once you are done with the tree and just sorting this array by frequency using quicksort (which will be O(n log n) on average). At least that's what I would do. Even though it takes extra space, it sounds more promising to me than re-arranging the tree in place.
One approach you could consider is to build two trees: one indexed by word, one indexed by freq.
As long as the tree nodes contain a pointer to the data node, you could access it via the word-based tree to update the info, but later access it by the freq-based tree to output.
Although, if speed is really that important, I'd be looking to get rid of the string as a key. String comparisons are notoriously slow.
If speed is not important, I think your best bet is to gather the data based on word and re-sort based on freq as yves has suggested.
