There are many algorithms for finding an optimal binary search tree given a set of keys and the associated probabilities of those keys being chosen. The binary search tree produced this way will have the lowest expected time to look up those elements. However, this binary search tree might not be optimal with regard to other measures. For example, if you attempt to look up a key that is not contained in the tree, the lookup time might be very large, as the tree might be imbalanced in order to optimize lookups of certain elements.
I am currently interested in seeing how to build a binary search tree from a set of keys where the goal is to minimize the time required to find the successor of some particular value. That is, I would like the tree to be structured in a way where, given some random key k, I can find the successor of k as efficiently as possible. I happen to know in advance the probability that a given random key falls between any two of the keys the tree is constructed from.
Does anyone know of an algorithm for this problem? Or am I mistaken that the standard algorithm for building optimal binary search trees will not produce efficient trees for this use case?
So now I feel silly, because there's an easy answer to this question. :-)
You use the standard, off-the-shelf algorithm for constructing optimal binary search trees to construct a binary search tree for the set of keys. You then annotate each node so that it stores the entire range between its key and the key before it. This means that you can find the successor efficiently by doing a standard search on the optimally-built tree. If at any point the key you're looking for is found to be contained in a range held in some node, then you're done. In other words, finding the successor is equivalent to just doing a search for the value in the BST.
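Concretely, the successor search is just a plain BST descent that remembers the last node at which it went left. A minimal Python sketch (the `Node` class and the tiny example tree are hypothetical, for illustration only; the optimal-BST construction itself is omitted):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def successor(root, k):
    """Return the smallest stored key strictly greater than k, or None."""
    best = None
    node = root
    while node is not None:
        if k < node.key:
            best = node.key    # node.key is a candidate successor
            node = node.left   # look for a smaller one
        else:
            node = node.right  # everything here is <= k
    return best
```

For example, with keys {2, 4, 6}, `successor(root, 3)` returns 4 and `successor(root, 6)` returns None, exactly the behavior the range annotation encodes.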
Related
I have implemented several splay tree algorithms.
What's the best way to compare them?
Is it a good start to compare execution time when adding random nodes?
I've also implemented a binary search tree that keeps track of how often every node is visited. I wrote an optimize() method that creates an optimal binary search tree.
If we do not plan on modifying a search tree, and we know exactly how often each item will be accessed, we can construct an optimal binary search tree, which is a search tree where the average cost of looking up an item (the expected search cost) is minimized.
How can I involve this in the comparison of splay trees?
I like the empirical approach.
In this approach:
Create a bunch of random typical data sets, of various lengths.
Run each implementation and find out what is the execution time for each.
Use hypothesis-testing methods to find out if one implementation is better than the other. Here, the null hypothesis (H0) is "the two implementations take the same time to execute, on average."
Conclude from the previous step that one implementation is better than the other, at significance level p (where p is your p-value).
P.S. The Wilcoxon signed-rank test is considered a good one, and it is used a lot in the literature and in research to compare two algorithms.
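The steps above can be sketched as a small timing harness. To keep the sketch dependency-free, a simple two-sided sign test stands in for the Wilcoxon test (the function name `compare`, the dataset sizes, and the repetition counts are all made up for illustration):

```python
import random
import timeit
from math import comb

def compare(impl_a, impl_b, n_datasets=30, size=1000):
    """Time two implementations on the same random datasets, then run a
    two-sided sign test on the pairs (H0: equal median running times)."""
    wins_a = 0
    for _ in range(n_datasets):
        data = [random.random() for _ in range(size)]
        t_a = timeit.timeit(lambda: impl_a(list(data)), number=3)
        t_b = timeit.timeit(lambda: impl_b(list(data)), number=3)
        if t_a < t_b:
            wins_a += 1
    # two-sided sign test p-value under H0
    k = min(wins_a, n_datasets - wins_a)
    p = sum(comb(n_datasets, i) for i in range(k + 1)) * 2 / 2 ** n_datasets
    return wins_a, min(p, 1.0)
```

In practice you would substitute `scipy.stats.wilcoxon` on the paired timing differences for the sign test; the harness structure stays the same.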
A carelessly implemented recursive binary search can run into stack or memory problems, although binary search itself is faster than linear search.
Of these two search methods, depth-first search and binary search, which is better suited to searching random (unsorted) numbers?
Depth-first search is the answer here. By its nature, binary search cannot search random numbers (in trees or elsewhere), only sorted ones. In a stereotypical binary search, the middle value (or the root of the tree) is examined. If the target value is higher, the second half of the search domain is chosen; if it is lower, the first half. The search is then performed recursively on whichever half was chosen. For this reason, binary search will not work at all on a randomly ordered list of values. I will not go into the specifics of DFS since that part is already answered; I'm sure there is a good wiki article on it.
Binary search would not be the best option for searching random numbers.
Binary search is an algorithm that finds an element by repeatedly comparing the target to the value of the middle element (or root node). It must begin with a sorted data structure. If the target is lower than the midpoint, the lower half of the data structure becomes the new search range and a new midpoint is found there. If the target is higher than the midpoint, the same process is performed on the upper half. This process is repeated until the value is found or the halves cross.
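A minimal iterative version of the process just described, assuming a plain sorted Python list:

```python
def binary_search(sorted_values, target):
    """Classic iterative binary search; requires sorted input."""
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        elif sorted_values[mid] < target:
            lo = mid + 1   # continue on the upper half
        else:
            hi = mid - 1   # continue on the lower half
    return -1  # not found: the halves have crossed
```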
Depth-first search is a search and traversal algorithm that visits nodes in a tree or graph data structure by following a path as far as it can go before backtracking. It uses a stack to keep track of the adjacent nodes of the node being visited, and continues visiting neighbors until those have also been considered visited. After all the nodes on a path have been visited, the algorithm backtracks using the stack.
DFS would be the best option for searching random numbers because the prerequisite of Binary Search is for it to be sorted initially. If the numbers are not sorted, it defeats the purpose of the algorithm. DFS will be able to find a value in the structure of random numbers in O(V + E) time complexity.
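An explicit-stack DFS over an unsorted binary tree might look like this minimal sketch (the `Node` class here is a stand-in, not taken from the question):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def dfs_find(root, target):
    """Depth-first search with an explicit stack; no ordering assumed."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if node.value == target:
            return node
        # push children; they are visited after everything deeper on the left
        stack.append(node.right)
        stack.append(node.left)
    return None
```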
I have just finished a job interview, and I was struggling with this question, which seems to me a very hard one to give in a 15-minute interview.
The question was:
Write a function, which given a stream of integers (unordered), builds a balanced search tree.
Now, you can't wait for the input to end (it's a stream), so you need to balance the tree on the fly.
My first answer was to use a red-black tree, which of course does the job, but I have to assume they didn't expect me to implement a red-black tree in 15 minutes.
So, is there any simple solution for this problem I'm not aware of?
Thanks,
Dave
I personally think that the best way to do this would be to go for a randomized binary search tree like a treap. This doesn't absolutely guarantee that the tree will be balanced, but with high probability the tree will have a good balance factor. A treap works by augmenting each element of the tree with a uniformly random number, then ensuring that the tree is a binary search tree with respect to the keys and a heap with respect to the uniform random values. Insertion into a treap is extremely easy:
Pick a random number to assign to the newly-added element.
Insert the element into the BST using standard BST insertion.
While the newly inserted element's priority is greater than its parent's priority, perform a tree rotation to bring the new element above its parent.
That last step is the only really hard one, but if you had some time to work it out on a whiteboard I'm pretty sure that you could implement this on-the-fly in an interview.
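A compact sketch along the lines of those three steps, using recursive insertion that rotates on the way back up (the class and function names are illustrative, not a reference implementation):

```python
import random

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # step 1: uniform random heap value
        self.left = self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(node, key):
    """Step 2: standard BST insert; step 3: rotate while the child's
    priority beats its parent's (max-heap on priorities)."""
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node
```

After any sequence of inserts, an inorder walk yields the keys in sorted order while the priorities satisfy the heap property, which is what keeps the expected height logarithmic.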
Another option that might work would be to use a splay tree. It's another type of fast BST that can be implemented assuming you have a standard BST insert function and the ability to do tree rotations. Importantly, splay trees are extremely fast in practice, and it's known that they are (to within a constant factor) at least as good as any other static binary search tree.
Depending on what's meant by "search tree," you could also consider storing the integers in some structure optimized for lookup of integers. For example, you could use a bitwise trie to store the integers, which supports lookup in time proportional to the number of bits in a machine word. This can be implemented quite nicely using a recursive function to iterate over the bits, and doesn't require any sort of rotations. If you needed to blast out an implementation in fifteen minutes, and if the interviewer allows you to deviate from the standard binary search trees, then this might be a great solution.
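A minimal bitwise-trie sketch, iterative rather than recursive for brevity, assuming a fixed key width (the `WORD_BITS` constant and the names are made up):

```python
WORD_BITS = 16  # assumed fixed key width for this sketch

class TrieNode:
    def __init__(self):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.present = False       # marks that a full key ends here

def trie_insert(root, x):
    node = root
    for b in range(WORD_BITS - 1, -1, -1):
        bit = (x >> b) & 1
        if node.child[bit] is None:
            node.child[bit] = TrieNode()
        node = node.child[bit]
    node.present = True

def trie_lookup(root, x):
    node = root
    for b in range(WORD_BITS - 1, -1, -1):
        node = node.child[(x >> b) & 1]
        if node is None:
            return False
    return node.present
```

Both operations touch exactly `WORD_BITS` nodes, independent of how many integers are stored.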
Hope this helps!
AA Trees are a bit simpler than Red-Black trees, but I couldn't implement one off the top of my head.
One of the simplest balanced binary search trees is the BB(α)-tree. You pick a constant α that says how unbalanced the tree may get. At all times, #descendants(child) <= (1-α) × #descendants(node) must hold. You treat it as a normal binary search tree, but whenever the formula stops holding at some node, you just rebuild that part of the tree from scratch so that it is perfectly balanced.
The amortized time complexity for insertion or deletion is still O(log N), just as with other balanced binary trees.
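A rebuild-based sketch of the idea, assuming α = 0.25 and a plain unbalanced insert that rebuilds any subtree violating the invariant (names and the choice of α are illustrative):

```python
ALPHA = 0.25  # each child may hold at most (1 - ALPHA) of its parent's nodes

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1  # number of nodes in this subtree

def size(n):
    return n.size if n else 0

def build(keys, lo, hi):
    """Rebuild a perfectly balanced subtree from sorted keys[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build(keys, lo, mid)
    node.right = build(keys, mid + 1, hi)
    node.size = hi - lo
    return node

def inorder(n, out):
    if n:
        inorder(n.left, out)
        out.append(n.key)
        inorder(n.right, out)

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1
    # invariant violated: rebuild this whole subtree from scratch
    if max(size(node.left), size(node.right)) > (1 - ALPHA) * node.size:
        keys = []
        inorder(node, keys)
        node = build(keys, 0, len(keys))
    return node
```

The occasional rebuilds are expensive individually but cheap amortized, which is where the O(log N) amortized bound comes from.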
I've seen this data structure talked about a lot, but I am unclear as to what sort of problem would demand such a data structure (over alternative representations). I've never needed one, but perhaps that's because I don't quite grok it. Can you enlighten me?
One example of where you would use a binary search tree would be a sorted list of values where you want to be able to quickly add elements.
Consider using an array for this purpose. You have very fast access to read random values, but if you want to add a new value, you have to find the place in the array where it belongs, shift everything over, and then insert the new value.
With a binary search tree, you simply traverse the tree looking for where the value would be if it were in the tree already, and then add it there.
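That traverse-and-attach insertion might look like this minimal sketch (the `Node` class is a stand-in for whatever node type you use):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = self.right = None

def insert(root, value):
    """Walk down to where the value would live, then attach it there."""
    if root is None:
        return Node(value)
    cur = root
    while True:
        if value < cur.value:
            if cur.left is None:
                cur.left = Node(value)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(value)
                return root
            cur = cur.right
```

No shifting is required; the cost is one root-to-leaf walk, versus the O(n) element moves an array insertion needs.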
Also, consider if you want to find out whether your sorted array contains a particular value. With a linear scan you have to start at one end of the array and compare the value you're looking for to each individual value until you either find it or pass the point where it would have been. With a binary search tree, you greatly reduce the number of comparisons you are likely to have to make. A quick caveat, however: it is definitely possible to contrive situations where the binary search tree requires more comparisons, but these are the exception, not the rule.
One thing I've used it for in the past is Huffman decoding (or any variable-bit-length scheme).
If you maintain your binary tree with the characters at the leaves, each incoming bit decides whether you move to the left or right node.
When you reach a leaf node, you have your decoded character and you can start on the next one.
For example, consider the following tree:
.
/ \
. C
/ \
A B
This would be a tree for a file where the predominant letter was C (by using fewer bits for common letters, the file is shorter than it would be with a fixed-bit-length scheme). The codes for the individual letters are:
A: 00 (left, left).
B: 01 (left, right).
C: 1 (right).
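Decoding with such a tree is just a walk driven by the bit stream; a small sketch using the example tree above (the class and function names are made up):

```python
class HNode:
    def __init__(self, char=None, left=None, right=None):
        self.char = char   # set only on leaf nodes
        self.left = left
        self.right = right

def decode(root, bits):
    """Walk the tree bit by bit; emit a character at each leaf, then restart."""
    out = []
    node = root
    for bit in bits:
        node = node.left if bit == '0' else node.right
        if node.char is not None:  # reached a leaf
            out.append(node.char)
            node = root
    return ''.join(out)
```

With the tree from the answer, the bit string "00011" decodes as A (00), B (01), C (1).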
The class of problems you use them for are those where you want to be able to both insert and access elements reasonably efficiently. As well as unbalanced trees (such as the Huffman example above), you can use balanced trees, which make insertions a little more costly (since you may have to rebalance on the fly) but make lookups a lot more efficient, since you traverse the minimum possible number of nodes.
From Wikipedia:
Self-balancing binary search trees can be used in a natural way to construct and maintain ordered lists, such as priority queues. They can also be used for associative arrays; key-value pairs are simply inserted with an ordering based on the key alone. In this capacity, self-balancing BSTs have a number of advantages and disadvantages over their main competitor, hash tables. One advantage of self-balancing BSTs is that they allow fast (indeed, asymptotically optimal) enumeration of the items in key order, which hash tables do not provide. One disadvantage is that their lookup algorithms get more complicated when there may be multiple items with the same key.
Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists, to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is implemented with a self-balanced BST, we have a very simple-to-describe yet asymptotically optimal O(n log n) sorting algorithm. Similarly, many algorithms in computational geometry exploit variations on self-balancing BSTs to solve problems such as the line segment intersection problem and the point location problem efficiently. (For average-case performance, however, self-balanced BSTs may be less efficient than other solutions. Binary tree sort, in particular, is likely to be slower than mergesort or quicksort, because of the tree-balancing overhead as well as cache access patterns.)
Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently record additional information or perform new operations. For example, one can record the number of nodes in each subtree having a certain property, allowing one to count the number of nodes in a certain key range with that property in O(log n) time. These extensions can be used, for example, to optimize database queries or other list-processing algorithms.
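As a concrete instance of the augmentation described above, here is a sketch of a subtree-size annotation used to count keys in a range (an unbalanced insert is used for brevity, so the O(log n) bound only holds for a balanced tree; all names are illustrative):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1  # augmented: number of nodes in this subtree

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1  # maintain the annotation on the way back up
    return node

def count_less_than(node, key):
    """Number of keys < key, in O(height) thanks to the size annotation."""
    if node is None:
        return 0
    if key <= node.key:
        return count_less_than(node.left, key)
    left_size = node.left.size if node.left else 0
    return left_size + 1 + count_less_than(node.right, key)
```

The count of keys in a half-open range [lo, hi) is then just `count_less_than(root, hi) - count_less_than(root, lo)`.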
Today I listened to a lecture about Fenwick trees (binary indexed trees), and the teacher said that this tree is a generalization of interval and segment trees, but my implementations of these three data structures are different.
Is this claim true? And why?
The following classification seems sensible although different people are bound to mix these terms up.
Fenwick tree/Binary-indexed tree link
The one where you use a single array and operations on the binary representation to store prefix sums (also called cumulative sums). Elements can be members of a monoid.
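A minimal Fenwick tree for prefix sums, showing the characteristic lowest-set-bit arithmetic (a sketch, 1-indexed as is conventional):

```python
class FenwickTree:
    """1-indexed binary indexed tree for prefix sums."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)   # jump to the next index responsible for i

    def prefix_sum(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)   # strip the lowest set bit
        return total
```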
Range tree link
The family of trees where each node represents a subrange of a given range, say [0, N]. Used to compute associative operations on intervals.
Interval tree link
Trees where you store actual intervals. Most commonly you take a midpoint, keep the intersecting intervals at the node and repeat the process for the intervals to the left and to the right of the point.
Segment tree link
Similar to a range tree, except that the leaves are elementary intervals in a possibly continuous rather than discrete space, and higher nodes are unions of the elementary intervals. Used to check for point inclusion.
I have never heard binary indexed trees called a generalization of anything. It's certainly not a generalization of interval trees and segment trees. I suggest you follow the links to convince yourself of this.
that this tree is a generalization of interval and segment trees
If by "this tree" your teacher meant "the binary indexed tree", then he is wrong.
but my implementations of these three data structures are different
Of course they are different, your teacher never said they shouldn't be. He just said one is a generalization of the other (which isn't true, but still). Either way, the implementations are supposed to be different.
What would have the same implementation is a binary indexed tree and a Fenwick tree, because those are the same thing.