I am having trouble understanding why tries have an O(1) lookup time and binary trees are O(logn).
I understand that they are basically trees. Suppose that I have a trie for the English language, containing words up to 16 characters. Lookup time is O(16) which simplifies to O(1). This is because each trie node has an array of 26 children (26 letters in the alphabet), and pulling from an array is o(1). So you just need to do 16 pulls.
Whereas for a binary tree, if you have n elements (say n is very large...the entire english alphabet), you search the middle element, then the middle of the subsection depending on whether your element is lower/higher, then the subset of that, etc etc...basically dividing by two each time to get O(log base 2 of n).
The logic for each makes sense. But I feel like I'm missing something.
What's the key difference between the two that allows tries to pull off lookup in constant time whereas binary trees need O(logn)? Couldn't you structure a binary tree to be basically a trie with 2 children only? I suppose I'm having trouble pinpointing the difference.
You can build a trie like a binary tree. Search for kart-trees.
Related
I have an input stream of integers coming in an ascending order, my task is to create a balance binary search tree out of that stream, on the fly. I have gone through the link:BBST from a stream of integers and understood that we can make use of Red-Black trees. The thing is, I am looking for more optimal solutions that use 'sorted information' from the input data.
If you use a red-black tree but always start your insertion at the last node inserted, rather than the root, and use a bottom up insertion algorithm, insertion is O(1) amortized. This means constructing the tree will cost O(n), not Ω(n log n).
If the elements are coming in sorted order, then probably the simplest and most efficient thing to do is just to push each one to the end of a dynamic array (an array that doubles its size whenever it becomes full, for example).
Pushing into the array is amortized O(1).
Searches in a sorted array are O(log(n)) using binary search. Moreover, it's logarithmic time with very low constants.
Despite its simplicity, a sorted array is a type of a blanced binary search tree.
The problem is that the amount of data is unknown. Therefore, this tree must be self balancing, regardless of whether the input has a pattern or not (ascending order).
If O(log n) insertion times are not acceptable in this case (say for red-black trees) then I'm at a loss as to why you need to store this in balanced binary tree form in the first place. A binary search on a dynamic array is just as fast as a tree, with amortized O(1) insertion time.
If the size of the stream is known before hand, then of course building the tree would be much easier.
I am trying to brush up a bit on my understanding of binary trees and in particular binary search trees. Looking through the wikipedia showed me the following information (http://en.wikipedia.org/wiki/Binary_search_tree):
"Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip over half of the tree, so that each lookup/insertion/deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an unsorted array, but slower than the corresponding operations on hash tables."
Can someone please elaborate / explain the following portions of that description:
1) "On average, this means that each comparison allows the operations to skip over half of the tree, so that each lookup/insertion/deletion takes time proportional to the logarithm of the number of items stored in the tree."
2) [from the last sentence] "...but slower than the corresponding operations on hash tables."
1) "On average" is only applicable if the BST is balanced, i.e. the left and right subtree contain a rougly equal number of nodes. This makes searching an O(log n) operation, because on each iteration you can roughly discard half of the remaining items.
2) On hash tables, searching, insertion and deletion all take expected O(1) time.
I'm using a red-black binary tree with linked leafs on a project (Java's TreeMap), to quickly find and iterate through the items. The problem is that I can easily get 35000 items or so on the tree, and several times I have to remove "all items above X", which can be almost the entire tree (like 30000 items at once, because all of them are bigger than X), and that takes too much time to remove them and rebalance the tree each time.
Is there any algorithm that can help me here (so I can make my own tree implementation)?
You're looking for the split operation on a red/black tree, which takes the red/black tree and some value k and splits it into two red/black trees, one with all keys greater than or equal to k and one with all keys less than k. This can be implemented in O(log n) time if you augment the structure to store some extra information. In your case, since you're using Java, you can just split the tree and discard the root of the tree you don't care about so that the garbage collector can handle it.
Details on how to implement this are given in this paper, starting on page 9. It's implemented in terms of a catenate (or join) operations which combines two trees, but I think the exposition is pretty clear.
Hope this helps!
I have just finished a job interview and I was struggling with this question, which seems to me as a very hard question for giving on a 15 minutes interview.
The question was:
Write a function, which given a stream of integers (unordered), builds a balanced search tree.
Now, you can't wait for the input to end (it's a stream), so you need to balance the tree on the fly.
My first answer was to use a Red-Black tree, which of course does the job, but i have to assume they didn't expect me to implement a red black tree in 15 minutes.
So, is there any simple solution for this problem i'm not aware of?
Thanks,
Dave
I personally think that the best way to do this would be to go for a randomized binary search tree like a treap. This doesn't absolutely guarantee that the tree will be balanced, but with high probability the tree will have a good balance factor. A treap works by augmenting each element of the tree with a uniformly random number, then ensuring that the tree is a binary search tree with respect to the keys and a heap with respect to the uniform random values. Insertion into a treap is extremely easy:
Pick a random number to assign to the newly-added element.
Insert the element into the BST using standard BST insertion.
While the newly-inserted element's key is greater than the key of its parent, perform a tree rotation to bring the new element above its parent.
That last step is the only really hard one, but if you had some time to work it out on a whiteboard I'm pretty sure that you could implement this on-the-fly in an interview.
Another option that might work would be to use a splay tree. It's another type of fast BST that can be implemented assuming you have a standard BST insert function and the ability to do tree rotations. Importantly, splay trees are extremely fast in practice, and it's known that they are (to within a constant factor) at least as good as any other static binary search tree.
Depending on what's meant by "search tree," you could also consider storing the integers in some structure optimized for lookup of integers. For example, you could use a bitwise trie to store the integers, which supports lookup in time proportional to the number of bits in a machine word. This can be implemented quite nicely using a recursive function to look over the bits, and doesn't require any sort of rotations. If you needed to blast out an implementation in fifteen minutes, and if the interviewer allows you to deviate from the standard binary search trees, then this might be a great solution.
Hope this helps!
AA Trees are a bit simpler than Red-Black trees, but I couldn't implement one off the top of my head.
One of the simplest balanced binary search tree is BB(α)-tree. You pick the constant α, which says how much unbalanced can the tree get. At all times, #descendants(child) <= (1-α) × #descendants(node) must hold. You treat it as normal binary search tree, but when the formula doesn't apply to some node anymore, you just rebuild that part of the tree from scratch, so that it is perfectly balanced.
The amortized time complexity for insertion or deletion is still O(log N), just as with other balanced binary trees.
I've seen this data structure talked about a lot, but I am unclear as to what sort of problem would demand such a data structure (over alternative representations). I've never needed one, but perhaps that's because I don't quite grok it. Can you enlighten me?
One example of where you would use a binary search tree would be a sorted list of values where you want to be able to quickly add elements.
Consider using an array for this purpose. You have very fast access to read random values, but if you want to add a new value, you have to find the place in the array where it belongs, shift everything over, and then insert the new value.
With a binary search tree, you simply traverse the tree looking for where the value would be if it were in the tree already, and then add it there.
Also, consider if you want to find out if your sorted array contains a particular value. You have to start at one end of the array and compare the value you're looking for to each individual value until you either find the value in the array, or pass the point where it would have been. With a binary search tree, you greatly reduce the number of comparisons you are likely to have to make. Just a quick caveat, however, it is definitely possible to contrive situations where the binary search tree requires more comparisons, but these are the exception, not the rule.
One thing I've used it for in the past is Huffman decoding (or any variable-bit-length scheme).
If you maintain your binary tree with the characters at the leaves, each incoming bit decides whether you move to the left or right node.
When you reach a leaf node, you have your decoded character and you can start on the next one.
For example, consider the following tree:
.
/ \
. C
/ \
A B
This would be a tree for a file where the predominant letter was C (by having less bits used for common letters, the file is shorter than it would be for a fixed-bit-length scheme). The codes for the individual letters are:
A: 00 (left, left).
B: 01 (left, right).
C: 1 (right).
The class of problems you use then for are those where you want to be able to both insert and access elements reasonably efficiently. As well as unbalanced trees (such as the Huffman example above), you can also use balanced trees which make the insertions a little more costly (since you may have to rebalance on the fly) but make lookups a lot more efficient since you're traversing the minimum possible number of nodes.
from wiki
Self-balancing binary search trees can be used in a natural way to construct and maintain ordered lists, such as priority queues. They can also be used for associative arrays; key-value pairs are simply inserted with an ordering based on the key alone. In this capacity, self-balancing BSTs have a number of advantages and disadvantages over their main competitor, hash tables. One advantage of self-balancing BSTs is that they allow fast (indeed, asymptotically optimal) enumeration of the items in key order, which hash tables do not provide. One disadvantage is that their lookup algorithms get more complicated when there may be multiple items with the same key.
Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists, to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is implemented with a self-balanced BST, we have a very simple-to-describe yet asymptotically optimal O(n log n) sorting algorithm. Similarly, many algorithms in computational geometry exploit variations on self-balancing BSTs to solve problems such as the line segment intersection problem and the point location problem efficiently. (For average-case performance, however, self-balanced BSTs may be less efficient than other solutions. Binary tree sort, in particular, is likely to be slower than mergesort or quicksort, because of the tree-balancing overhead as well as cache access patterns.)
Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently record additional information or perform new operations. For example, one can record the number of nodes in each subtree having a certain property, allowing one to count the number of nodes in a certain key range with that property in O(log n) time. These extensions can be used, for example, to optimize database queries or other list-processing algorithms.