Red black tree over avl tree - algorithm

AVL and Red black trees are both self-balancing except Red and black color in the nodes. What's the main reason for choosing Red black trees instead of AVL trees? What are the applications of Red black trees?

What's the main reason for choosing Red black trees instead of AVL trees?
Both red-black trees and AVL trees are the most commonly used balanced binary search trees and they support insertion, deletion and look-up in guaranteed O(logN) time. However, there are following points of comparison between the two:
AVL trees are more rigidly balanced and hence provide faster look-ups. Thus for a look-up intensive task use an AVL tree.
For an insert intensive tasks, use a Red-Black tree.
AVL trees store the balance factor at each node. This takes O(N) extra space. However, if we know that the keys that will be inserted in the tree will always be greater than zero, we can use the sign bit of the keys to store the colour information of a red-black tree. Thus, in such cases red-black tree takes no extra space.
What are the application of Red black tree?
Red-black trees are more general purpose. They do relatively well on add, remove, and look-up but AVL trees have faster look-ups at the cost of slower add/remove. Red-black tree is used in the following:
Java: java.util.TreeMap, java.util.TreeSet
C++ STL (in most implementations): map, multimap, multiset
Linux kernel: completely fair scheduler, linux/rbtree.h

Try reading this article
It offers some good insights on differences, similarities, performance, etc.
Here's a quote from the article:
RB-Trees are, as well as AVL trees, self-balancing. Both of them provide O(log n) lookup and insertion performance.
The difference is that RB-Trees guarantee O(1) rotations per insert operation. That is what actually costs performance in real implementations.
Simplified, RB-Trees gain this advantage from conceptually being 2-3 trees without carrying around the overhead of dynamic node structures. Physically RB-Trees are implemented as binary trees, the red/black-flags simulate 2-3 behaviour
As far as my own understanding goes, AVL trees and RB trees are not very far off in terms of performance. An RB tree is simply a variant of a B-tree and balancing is implemented differently than an AVL tree.

Our understanding of the differences in performance has improved over the years and now the main reason to use red-black trees over AVL would be not having access to a good AVL implementation since they are slightly less common perhaps because they are not covered in CLRS.
Both trees are now considered forms of rank-balanced trees but red-black trees are consistently slower by about 20% in real world tests. Or even 30-40% slower when sequential data is inserted.
So people who have studied red-black trees but not AVL trees tend to choose red-black trees. The primary uses for red-black trees are detailed on the Wikipedia entry for them.

Other answers here sum up the pros & cons of RB and AVL trees well, but I found this difference particularly interesting:
AVL trees do not support constant amortized update cost [but red-black trees do]
Source: Mehlhorn & Sanders (2008) (section 7.4)
So, while both RB and AVL trees guarantee O(log(N)) worst-case time for lookup, insert and delete, restoring the AVL/RB property after inserting or deleting a node can be done in O(1) amortized time for red-black trees.

Insertions in AVL trees and in RB trees both require a maximum of 2 rotations. From https://adtinfo.org/ :
The primary advantage of red-black trees is that, in AVL trees, deleting one node from a tree containing n nodes may require log 2 n rotations, but deletion in a red-black tree never requires more than three rotations.

They're both the most commonly used tree types but Red black trees have faster insertions because they have relaxed balancing apart from that they both have O(log N) search time something that's common between them. They are both equally good but AVL is usually more balanced because of it's 1.44 logN complexity over 2 logN complexity

Related

Comparison of Avl Tree and Red Black Tree

I have an exam tomorrow and there are 3 questions those i can not understand on my notes.
1- #searches >> #insertions and #deletions=0 Which tree is that? (Avl or Red-Black Tree) (Answer is Avl)
2- #insertions>0 and #searches=#deletions=0 Which tree is that? (Avl or Red-Black Tree) (Answer is Red-Black)
3- #insertions=#deletions and #searches=0 Which tree is that? (Avl or Red-Black Tree) (Answer is Red-Black)
Can you explain them please?
Thanks for help
AVL trees, compared to red/black trees, usually have smaller height because the AVL invariants give less room for imbalance. However, red/black trees, compared to AVL trees, have faster insertions and deletions (the fixup cost of maintaining the red/black invariants is lower than the fixup cost of maintaining the AVL invariants.)
For case (1), an AVL tree is probably better because the cost of the lookups will be lower and, if the number of lookups is truly much larger than the number of insertions, the AVL tree will have a comparative advantage.
For case (2), the red/black tree will probably be faster because it supports faster insertions.
For case (3), for the same reason as part (2), the red/black tree will probably be faster.
Hope this helps!

How is insertion and deletion more faster in red black tree than AVL tree?

I would like to understand the difference bit better, but haven't found a source that can break it down to my level.
I am aware that both trees require at most 2 rotations per insertion. Then how is insertion faster in red-black trees?
And how insertion requires O(log n) rotations in avl tree while O(1) in red-black?
Well, I don't know what your level is, exactly, but to put it simply, red-black trees are less balanced than AVL trees. For red-black trees, the path from the root to the furthest leaf is no more than twice as long as the path from the root to the nearest leaf, while for AVL trees there is never more than one level difference between two neighboring subtrees. This makes insertions and deletions slightly more costly in AVL trees but lookup faster. The asymptotic and worst-case behavior of the two data structures is identical though (the runtime (not number of rotations) is O(log n) for insertions in both cases, the O(1) you mentioned is the so-called amortized runtime).
See this paragraph for a short comparison of the two data structures.
Insertion and deletion is not faster in red-black trees. This is a common ASSUMPTION and the assumption is based on the fact that red-black trees perform slightly fewer rotations on average per insert than AVL (.6 vs .7).
You can check for yourself in Java comparing TreeMap(red-black) to this implementation of TreeMapAVL and you can get exact numbers instead of the common, but incorrect, assumptions. https://github.com/dmcmanam/bbst-showdown

Why dont we use 2-3 or 2-3-4-5 trees?

I have a basic understanding of how 2-3-4 trees maintain the height balance property operation after operation to make sure even the worst case operations are O(n logn).
But I do not understand it well enough to know why only 2-3-4?
Why not 2-3 or 2-3-4-5 etc?
Implementation of 2-3-4 trees typically requires either multiple classes (2NODE, 3NODE, 4NODE) or you have just NODE that has an array of items. In the case of multiple classes you waste lots of time constructing and destructing node instances and reparenting them is cumbersome. If you use a single class with arrays to hold items and children then you are either resizing arrays constantly which is similarly wasteful or you wind up wasting over half your memory on unused array elements. It's just not very efficient compared to Red-Black trees.
Red-Black trees have only one type of node structure. Since Red-Black trees have a duality with 2-3-4 trees, RB trees can use the exact same algorithms as 2-3-4 trees (no need for the stupidly confusing/complex implementations described in Cormen, Leiserson and Rivest that led to AA trees which are not less complex than the 2-3-4 algorithm.)
So, Red-Black trees for their ease of implementation plus their memory/CPU efficiency. (AVL trees are nice too. They produce more well balanced trees and are stupid simply to code but they tend to be less efficient due to working too often to maintain only a slightly more compact tree.)
Oh, and 2-3-4-5-6... etc aren't done because nothing is gained. 2-3-4 has a net-gain over 2-3 trees because they can be done without recursion easily (recursion tends to be less efficient, especially when it cannot be coded tail-recursively). However, B-Trees and Bplus-Trees are pretty much 2-3-4-5-6-7-8-9-etc trees where the max size of the nodes, n, is chosen so that n records can be stored in a single disk sector. (i.e. each disk sector is a node in the tree and the size of the sector is equivalent to the number of items stored in the node.) This is because the time to search through 512 records linearly in memory is still MUCH faster than traversing down a level in the tree which requires another disk seek/fetch. and O(512) is still O(1) and thus maintains O(lg n) for the tree.
To be honest, I wasn't aware of 2-3-4 trees. At my Data Structures class, we were taught 2-3 trees, and to be honest, most of us implemented AVL trees for the wet part of the exercise.
But apparently, there's a generalization of this type of tree:
(a,b) tree.
At my algoritm course they told us that they are commonly used for acessing memory from hard disc - known as B/B+ trees. You make tree that store sizeof your avabile ram and by doing so you minimalize number of reed from disc operation (if you made B with node that store for example 10^8 elements you only need log_10^8(n) reed from disc operation to find something on hard disc which is nothing. So something that you called 2-3-4-5-... trees is in fact widespread solution.

Using a 2-3-4 tree instead of a splay tree

I'm in a data structures course right now and we learned about 2-3-4 trees and splay trees. I was wondering in what circumstances would you use a 2-3-4 tree instead of a splay tree? They're both self balancing and sorted so I don't see that much of a difference between them.
A 2-3-4 tree only changes the structure on insertions and deletions, while a splay-tree also re-organizes the nodes on searches.
Splay trees will, thanks to the re-organization on lookup, provide faster responses if your typical usage pattern happens to look up a small subset of elements most of the time.
It is possible to implement a 2-3-4 tree such that the smallest element can be looked up in O(1), but generally both offer insertion and deletion at amortized O(log n).

Applications of red-black trees

What are the applications of red-black (RB) trees? Is there any application where only RB Trees can be used and no other data structures?
A red-black tree is a particular implementation of a self-balancing binary search tree, and today it seems to be the most popular choice of implementation.
Binary search trees are used to implement finite maps, where you store a set of keys with associated values. You can also implement sets by only using the keys and not storing any values.
Balancing the tree is needed to guarantee good performance, as otherwise the tree could degenerate into a list, for example if you insert keys which are already sorted.
The advantage of search trees over hash tables is that you can traverse the tree efficiently in sort order.
AVL-trees are another variant of balanced binary search trees. They were popular before red-black trees were known. They are more carefully balanced, with a maximal difference of one between the heights of the left and right subtree (RB trees guarantee at most a factor of two). Their main drawback is that rebalancing takes more effort.
So red-black trees are certainly a good but not the only choice for this application.
Red Black Trees are from a class of self balancing BSTs and as answered by others, any such self balancing tree can be used. I would like to add that Red-black trees are widely used as system symbol tables. For example they are used in implementing the following:
Java: java.util.TreeMap , java.util.TreeSet .
C++ STL: map, multimap, multiset.
Linux kernel: completely fair scheduler, linux/rbtree.h
Unless you have very specific performance requirements, an R-B tree could be replaced by some other self-balancing binary tree, for example an AVL tree. Choosing between the two of them is basically a performance optimization - they offer the same basic operations.
Not that either of them is definitively "faster" than the other, just that they're different enough that specific uses of them will tend to have slightly different performance, all else being equal. So if you draw your requirements carefully enough, or just by chance, you could end up with one of them being "fast enough" for your use, and the other not. R-B offers slightly faster insertion than AVL, at the cost of slightly slower lookup.
There is no such rule like red black can only be used in a particular case
it depends upon the application in cases like when You have to build the tree only once and you have to query it many times then you can go for a AVL tree because in AVL tree searching is quite fast.. But it is strictly balanced so insertion and deletion may take some time
AVl tree may be used for language dictionery where You have to build the data structure just once
and the red black tree is used in the Completely Fair Scheduler used in current Linux kernels now a days..
the constraints applied on the red black tree also enforce the point that that that the path from the root to the furthest leaf is no more than twice as long as the path from the root to the nearest leaf.
BTW you can look for the various seach and insert etc time required for a red black tree down here
Average Worst case
Space O(n) O(n)
Search O(log n) O(log n)
Insert O(log n) O(log n)
Delete O(log n) O(log n)

Resources