Where red-black trees are useful [duplicate] - data-structures

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Red-Black Trees
I started watching a lecture at mit on red-black trees and after 15 minutes gave up.
Sure, I have not watched the previous 10 lectures but why no real world example before going into the theory?
Can someone give an example and explain why red-black trees are an essential data structure?

Red-black trees are self-balancing, and so can insert, delete, and search in O(log n) time. Other types of balanced trees (e.g. AVL trees) are often slower for insert and delete operations.
In addition, the code for red-black trees tends to be simpler.
They are good for creating Maps or associative arrays, and special-purpose data stores. I used one in a high speed telecom application to implement a least-cost routing system.

Note: I haven't seen the lecture.
Red-black trees are binary search trees that are self-balancing when an item is added or removed. This feature guarantees that every search within this tree has O(logn) complexity.
If you construct a tree without balancing it, it is possible to create a degenrated tree that is effectively a linked list with O(n) complexity.

Wikipedia says "self-balancing binary search tree".
"Put very simply, a red–black tree is a binary search tree that inserts and deletes in such a way that the tree is always reasonably balanced."
Which helps when data happens to be sorted. Without the balancing the tree devolves to a linked list.

Wikipedia provides an explanation and an important example.
Red-black trees provide worst-case guarantees for insertion time, deletion time and search time. This can be very useful for real time applications.
A real world example is the Completely Fair Scheduler in the Linux kernel.

Related

How is balancing of binary trees implemented?

I'm studying how to balance trees and I have some questions
Is it possible to balance a normal binary tree? If yes, which algorithm should be used?
Do I necessarily have to use a AVL or Red-black tree to obtain a balanced tree? How do these work?
I read something about rotations, weights but I'm kind of confused right now
Is it possible to balance a normal binary tree? If yes, which
algorithm should be used?
In O(n) you can build a complete tree, and populate it with the elements in in-order traversal.
It cannot be done better, because A BST might in rare cases decay to a chain (linked list), where all nodes have one son as null. In this cases, accessing the element in the middle is O(n) itself.
Do I necessarily have to use a AVL or Red-black tree to obtain a
balanced tree?
There are other balanced trees such as B+ trees, and other data structures (not trees) such as skip-lists. You might want to have a look at a list of known data structures, especially the trees section.
How do these work?
I find the wikipedia articles both on AVL tree and Red-Black tree very informative. If you have something specific you don't understand there - you should ask.
Also: Trying to implement a balanced trees on your own (Implement a known tree, not inventing a new one - of course) - is great for educational purposes, and by doing so - you will definitely understand how it works.
Well... AVL and red-black trees are "normal binary trees" that are balanced, and keep that balance (for some definition of "balanced"). I'm not a computer science teacher to come up with my own explanation of the algorithms, and I guess you aren't looking for a cut&paste from Wikipedia :-)
Now, for balancing binary trees: if the tree is a search tree (i.e. 'sorted', but 'balanced' doesn't really make all that much sense if it's not) you could always just recreate the tree. The simplest algorithm is to use an array with all the elements from the tree, in sorted order (easily obtained from an inorder traversal). Then build an algorithm around this general idea:
take the middle element of the array as the root of the tree. This will create a tree node, and two arrays "left" and "right", which are meant to form the left and right subtrees
Apply this same algorithm recursively to create a tree from the "left" array and one from the "right" array. These two trees become the children of the parent node.
You might have to be careful with the case when the array has an even number of elements: there is no obvious "middle element", and removing one of the two candidates will create arrays of different sizes. I'm too lazy to analyze this further to see if that could offset the whole balancing thing.
Of course, doing something like this every time you change the tree isn't such a great idea; you really want to use self-balancing trees like AVL for that. Doing it after creating the tree might not be all that useful either: you could just use the array itself and do binary searches on it, instead of making a tree. The array IS just another form of a binary tree...
EDIT: there is a reason why a lot of computer scientists have spent a lot of time developing data structures and algorithms that perform well in certain situations. Rolling your own version of a balanced binary tree is unlikely to beat these...
Can you balance an unbalanced tree?
Yes, You can. You use the same balance function you created for your AVL Tree inside a PostOrderTraversal function.
Should You Do it?
No!!! You should recreate it! Balancing the tree will cost you unnecessarily.
How do I recreate it?
Use an InOrderTraversal function to put your nodes into an array. Then use a variable that will always go to the middle of the array and the left middle, right middle and add the nodes to the new Tree.
Is it possible to balance a normal binary tree? If yes, which algorithm should be used?
Do I necessarily have to use a AVL or Red-black tree to obtain a balanced tree? How do these work?
In general, Trees are either unbalanced or balanced. AVL, Red-Black, 2-3, e.t.c. are just trees with some properties and according to their properties they use some extra variables and functions. Those extra variables and function can also be used in the "normal" binary trees. In other words those functions and variables are not bounded to their respective type of tree. The nodes of a "normal" binary tree always had a balance! You just didn't use it because you didn't care if the "normal" binary tree was balanced or not. They also always had a height, depth, e.t.c. You just didn't care. In general, you will realize at one point that all are a trade-off between speed and memory. If you know what you are doing, more memory usage will make your program faster. Less memory usage means more calculations so you will have a slower program.

Why dont we use 2-3 or 2-3-4-5 trees?

I have a basic understanding of how 2-3-4 trees maintain the height balance property operation after operation to make sure even the worst case operations are O(n logn).
But I do not understand it well enough to know why only 2-3-4?
Why not 2-3 or 2-3-4-5 etc?
Implementation of 2-3-4 trees typically requires either multiple classes (2NODE, 3NODE, 4NODE) or you have just NODE that has an array of items. In the case of multiple classes you waste lots of time constructing and destructing node instances and reparenting them is cumbersome. If you use a single class with arrays to hold items and children then you are either resizing arrays constantly which is similarly wasteful or you wind up wasting over half your memory on unused array elements. It's just not very efficient compared to Red-Black trees.
Red-Black trees have only one type of node structure. Since Red-Black trees have a duality with 2-3-4 trees, RB trees can use the exact same algorithms as 2-3-4 trees (no need for the stupidly confusing/complex implementations described in Cormen, Leiserson and Rivest that led to AA trees which are not less complex than the 2-3-4 algorithm.)
So, Red-Black trees for their ease of implementation plus their memory/CPU efficiency. (AVL trees are nice too. They produce more well balanced trees and are stupid simply to code but they tend to be less efficient due to working too often to maintain only a slightly more compact tree.)
Oh, and 2-3-4-5-6... etc aren't done because nothing is gained. 2-3-4 has a net-gain over 2-3 trees because they can be done without recursion easily (recursion tends to be less efficient, especially when it cannot be coded tail-recursively). However, B-Trees and Bplus-Trees are pretty much 2-3-4-5-6-7-8-9-etc trees where the max size of the nodes, n, is chosen so that n records can be stored in a single disk sector. (i.e. each disk sector is a node in the tree and the size of the sector is equivalent to the number of items stored in the node.) This is because the time to search through 512 records linearly in memory is still MUCH faster than traversing down a level in the tree which requires another disk seek/fetch. and O(512) is still O(1) and thus maintains O(lg n) for the tree.
To be honest, I wasn't aware of 2-3-4 trees. At my Data Structures class, we were taught 2-3 trees, and to be honest, most of us implemented AVL trees for the wet part of the exercise.
But apparently, there's a generalization of this type of tree:
(a,b) tree.
At my algoritm course they told us that they are commonly used for acessing memory from hard disc - known as B/B+ trees. You make tree that store sizeof your avabile ram and by doing so you minimalize number of reed from disc operation (if you made B with node that store for example 10^8 elements you only need log_10^8(n) reed from disc operation to find something on hard disc which is nothing. So something that you called 2-3-4-5-... trees is in fact widespread solution.

Create a balanced binary search tree from a stream of integers

I have just finished a job interview and I was struggling with this question, which seems to me as a very hard question for giving on a 15 minutes interview.
The question was:
Write a function, which given a stream of integers (unordered), builds a balanced search tree.
Now, you can't wait for the input to end (it's a stream), so you need to balance the tree on the fly.
My first answer was to use a Red-Black tree, which of course does the job, but i have to assume they didn't expect me to implement a red black tree in 15 minutes.
So, is there any simple solution for this problem i'm not aware of?
Thanks,
Dave
I personally think that the best way to do this would be to go for a randomized binary search tree like a treap. This doesn't absolutely guarantee that the tree will be balanced, but with high probability the tree will have a good balance factor. A treap works by augmenting each element of the tree with a uniformly random number, then ensuring that the tree is a binary search tree with respect to the keys and a heap with respect to the uniform random values. Insertion into a treap is extremely easy:
Pick a random number to assign to the newly-added element.
Insert the element into the BST using standard BST insertion.
While the newly-inserted element's key is greater than the key of its parent, perform a tree rotation to bring the new element above its parent.
That last step is the only really hard one, but if you had some time to work it out on a whiteboard I'm pretty sure that you could implement this on-the-fly in an interview.
Another option that might work would be to use a splay tree. It's another type of fast BST that can be implemented assuming you have a standard BST insert function and the ability to do tree rotations. Importantly, splay trees are extremely fast in practice, and it's known that they are (to within a constant factor) at least as good as any other static binary search tree.
Depending on what's meant by "search tree," you could also consider storing the integers in some structure optimized for lookup of integers. For example, you could use a bitwise trie to store the integers, which supports lookup in time proportional to the number of bits in a machine word. This can be implemented quite nicely using a recursive function to look over the bits, and doesn't require any sort of rotations. If you needed to blast out an implementation in fifteen minutes, and if the interviewer allows you to deviate from the standard binary search trees, then this might be a great solution.
Hope this helps!
AA Trees are a bit simpler than Red-Black trees, but I couldn't implement one off the top of my head.
One of the simplest balanced binary search tree is BB(α)-tree. You pick the constant α, which says how much unbalanced can the tree get. At all times, #descendants(child) <= (1-α) × #descendants(node) must hold. You treat it as normal binary search tree, but when the formula doesn't apply to some node anymore, you just rebuild that part of the tree from scratch, so that it is perfectly balanced.
The amortized time complexity for insertion or deletion is still O(log N), just as with other balanced binary trees.

Applications of red-black trees

What are the applications of red-black (RB) trees? Is there any application where only RB Trees can be used and no other data structures?
A red-black tree is a particular implementation of a self-balancing binary search tree, and today it seems to be the most popular choice of implementation.
Binary search trees are used to implement finite maps, where you store a set of keys with associated values. You can also implement sets by only using the keys and not storing any values.
Balancing the tree is needed to guarantee good performance, as otherwise the tree could degenerate into a list, for example if you insert keys which are already sorted.
The advantage of search trees over hash tables is that you can traverse the tree efficiently in sort order.
AVL-trees are another variant of balanced binary search trees. They were popular before red-black trees were known. They are more carefully balanced, with a maximal difference of one between the heights of the left and right subtree (RB trees guarantee at most a factor of two). Their main drawback is that rebalancing takes more effort.
So red-black trees are certainly a good but not the only choice for this application.
Red Black Trees are from a class of self balancing BSTs and as answered by others, any such self balancing tree can be used. I would like to add that Red-black trees are widely used as system symbol tables. For example they are used in implementing the following:
Java: java.util.TreeMap , java.util.TreeSet .
C++ STL: map, multimap, multiset.
Linux kernel: completely fair scheduler, linux/rbtree.h
Unless you have very specific performance requirements, an R-B tree could be replaced by some other self-balancing binary tree, for example an AVL tree. Choosing between the two of them is basically a performance optimization - they offer the same basic operations.
Not that either of them is definitively "faster" than the other, just that they're different enough that specific uses of them will tend to have slightly different performance, all else being equal. So if you draw your requirements carefully enough, or just by chance, you could end up with one of them being "fast enough" for your use, and the other not. R-B offers slightly faster insertion than AVL, at the cost of slightly slower lookup.
There is no such rule like red black can only be used in a particular case
it depends upon the application in cases like when You have to build the tree only once and you have to query it many times then you can go for a AVL tree because in AVL tree searching is quite fast.. But it is strictly balanced so insertion and deletion may take some time
AVl tree may be used for language dictionery where You have to build the data structure just once
and the red black tree is used in the Completely Fair Scheduler used in current Linux kernels now a days..
the constraints applied on the red black tree also enforce the point that that that the path from the root to the furthest leaf is no more than twice as long as the path from the root to the nearest leaf.
BTW you can look for the various seach and insert etc time required for a red black tree down here
Average Worst case
Space O(n) O(n)
Search O(log n) O(log n)
Insert O(log n) O(log n)
Delete O(log n) O(log n)

How do I determine which kind of tree data structure to choose?

Ok, so this is something that's always bothered me. The tree data structures I know of are:
Unbalanced binary trees
AVL trees
Red-black trees
2-3 trees
B-trees
B*-trees
Tries
Heaps
How do I determine what kind of tree is the best tool for the job? Obviously heaps are canonically used to form priority queues. But the rest of them just seem to be different ways of doing the same thing. Is there any way to choose the best one for the job?
Let’s pick them off one by one, shall we?
Unbalanced binary trees
For search tasks, never. Basically, their performance characteristics will be completely unpredictable and the overhead of balancing a tree won’t be so big as to make unbalanced trees a viable alternative.
Apart from that, unbalanced binary trees of course have other uses, but not as search trees.
AVL trees
They are easy to develop but their performance is generally surpassed by other balancing strategies because balancing them is comparatively time-intensive. Wikipedia claims that they perform better in lookup-intensive scenarios because their height is slightly less in the worst case.
Red-black trees
These are used inside most of C++’ std::map implemenations and probably in a few other standard libraries as well. However, there’s good evidence that they are actually worse than B(+) trees in every scenario due to caching behaviour of modern CPUs. Historically, when caching wasn’t as important (or as good), they surpassed B trees when used in main memory.
2-3 trees
B-trees
B*-trees
These require the most careful consideration of all the trees, since the different constants used are basically “magical” constans which relate in weird and sometimes unpredictable way to the underlying hardware architecture. For example, the optimal number of child nodes per level can depend on the size of a memory page or cache line.
I know of no good, general rule to distinguish between them.
Tries
Completely different. Tries are also search trees, but for text retrieval of substrings in a corpus. A trie is an uncompressed prefix tree (i.e. a tree in which the paths from root to leaf nodes correspond to all the prefixes of a given string).
Tries should be compared to, and offset against, suffix trees, suffix arrays and q-gram indices – not so much against other search trees because the data that they search is different: instead of discrete words in a corpus, the latter index structures allow a factor search.
Heaps
As you’ve already said, they are not search trees at all.
The same as any other data structure, you have to know the characteristics (complexity of search, insert, and delete operations) of each type of tree, and the requirements of the job you're selecting a tool for. The tree that has the best performance for the type of operations you'll do most often is usually the best tool for the job.
You can usually find the general characteristics for any kind of data structure on Wikipedia. Introduction to Algorithms also has at least a section (in some cases a whole chapter) on most of the data structures you've listed, so it's another good reference.
Similar question: When to choose RB tree, B-Tree or AVL tree?
Offhand, I'd say, write the simplest code that could possibly work (availing yourself of library-provided data structures if possible). Then measure its performance problems, if any.
If your performance needs are really extreme, read Konrad Rudolph's awesome answer. :)
Each of these has different complexity for insertion, deletion and retrieval, All have mostly O log(n) access times.
Each tree has specific characteristics which make them usefull in a certain way. You should compare there characteristics with the needs you have.

Resources