The intuition of red-black tree - algorithm

I wanted to understand how red-black trees work. I understand the algorithm and how to fix the properties after insert and delete operations, but something isn't clear to me. Why is a red-black tree more balanced than an ordinary binary search tree? I want to understand the intuition: why do rotations and fixing the tree properties make a red-black tree more balanced?
Thanks.

Suppose you create a plain binary search tree by inserting the following items in order: 1, 2, 3, 4, 5, 6, 7, 8, 9. Each new item is always the largest item in the tree, so it is inserted as the right-most possible node. Your "tree" would look like this:
1
 \
  2
   \
    3
     .
      .
       .
        9
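To make the degenerate case concrete, here is a minimal Python sketch of a plain, never-rebalancing binary search tree insert, assuming smaller keys go left and all other keys go right. `Node`, `insert`, and `height` are illustrative names, not from any library. Inserting 1 through 9 in order produces exactly the right-leaning chain drawn above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain (unbalanced) BST insert: walk down, attach a new leaf, never rebalance."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path (empty tree = 0)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


root = None
for k in range(1, 10):  # insert 1, 2, ..., 9 in increasing order
    root = insert(root, k)

print(height(root))  # 9: every node hangs off the right of the previous one
```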
The rotations performed in a red-black tree (or any other type of balanced binary search tree) ensure that neither the left nor the right subtree of any node is significantly deeper than the other. In an AVL tree the heights of sibling subtrees differ by at most 1; a red-black tree allows more slack, guaranteeing only that the longest root-to-leaf path is at most twice as long as the shortest, but any such constant bound would do. This way, operations whose running time depends on the height h of the tree are always O(lg n), since the rotations maintain the property that h = O(lg n), whereas in the worst case shown above h = O(n).
For a red-black tree in particular, the node coloring is simply a bookkeeping trick that helps prove that the rotations always maintain h = O(lg n). Other kinds of balanced search trees (AVL trees, 2-3 trees, etc.) use different bookkeeping techniques to maintain the same property.

Why is a red-black tree more balanced than a binary search tree?
Because a red-black tree guarantees O(log N) performance for insertion, deletion, and lookup under any sequence of operations.
Why do rotations and fixing the tree properties make a red-black tree more balanced?
Apart from the general properties that any binary search tree must obey, a red-black tree (here, the left-leaning variant) also obeys the following properties:
No node has two red links connected to it.
Every path from root to null link has the same number of black links.
Red links lean left.
Now we want to prove the following proposition:
Proposition. Height of tree is ≤ 2 lg N in the worst case.
Proof.
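Every path from the root to a null link has the same number of black links; call it b. If you collapse each red link, merging its two endpoints into a single 3-node, you get a perfectly balanced 2-3 tree of height b, which holds at least 2^b - 1 keys, so b ≤ lg(N + 1). Since two red links are never in a row, every red link on a path is followed by a black one (null links count as black), so no path has more than 2b links. Hence the height is at most 2 lg(N + 1), which is essentially 2 lg N in the worst case.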
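The proof can be checked empirically with a small left-leaning red-black tree. This is a sketch in Python following the usual left-leaning formulation (rotate left, rotate right, flip colors); the helper names, and storing each link's color in the child node, are assumptions for illustration. `check` asserts the three properties listed above, and `height` counts nodes on the longest root-to-leaf path.

```python
import math
from dataclasses import dataclass
from typing import Optional

RED, BLACK = True, False


@dataclass
class Node:
    key: int
    color: bool = RED  # color of the link from the parent to this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_red(x: Optional[Node]) -> bool:
    return x is not None and x.color == RED  # null links count as black


def rotate_left(h: Node) -> Node:
    # Turn a right-leaning red link into a left-leaning one.
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x


def rotate_right(h: Node) -> Node:
    # Turn a left-leaning red link into a right-leaning one (used temporarily).
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x


def flip_colors(h: Node) -> None:
    # Both child links red: push the red link up one level.
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def _put(h: Optional[Node], key: int) -> Node:
    if h is None:
        return Node(key)  # new keys arrive on a red link
    if key < h.key:
        h.left = _put(h.left, key)
    elif key > h.key:
        h.right = _put(h.right, key)
    # Repair the invariants on the way back up the search path.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def put(root: Optional[Node], key: int) -> Node:
    root = _put(root, key)
    root.color = BLACK
    return root


def height(x: Optional[Node]) -> int:
    return 0 if x is None else 1 + max(height(x.left), height(x.right))


def check(x: Optional[Node]) -> int:
    """Assert the invariants and return the number of black nodes on every path."""
    if x is None:
        return 0
    assert not is_red(x.right), "red links lean left"
    assert not (is_red(x) and is_red(x.left)), "no two red links in a row"
    left_black, right_black = check(x.left), check(x.right)
    assert left_black == right_black, "same number of black links on every path"
    return left_black + (0 if is_red(x) else 1)


N = 1000
root = None
for k in range(1, N + 1):  # sorted input: the worst case for a plain BST
    root = put(root, k)
check(root)
print(height(root), "<=", 2 * math.log2(N + 1))
```

Feeding it keys in sorted order, which would give a plain BST of height N, still yields a tree whose height stays within 2 lg(N + 1).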

Although this is quite late, I was recently studying red-black trees, struggled with the intuition behind why some magical rotations and recoloring balance the tree, and was asking the same question as the OP:
why do rotations and fixing tree properties make a red-black tree more balanced?
After a few days of "research", I had a eureka moment and decided to write it up in detail. I won't copy and paste it here because some of the formatting would not come through, so anyone who is interested can check it out on GitHub. I tried to explain it with a lot of images and simulations. I hope it helps someone who stumbles onto this thread searching for the same question. :)

Related

Why is shallow binary tree better?

Why do we always want a shallow binary tree? In what cases is a shallow binary tree better than a non-shallow/minimum depth tree?
I am just confused because my professor keeps saying we should aim for the shallowest possible binary tree, but I do not understand why. I guess smaller is better, but is there any specific, concrete reason? Sorry for my bad English, and thanks for your help.
I'm assuming this is in regards to binary search trees - if not, please let me know and I can update this answer.
In a binary search tree, the cost of almost every operation (insertion, deletion, lookup, successor, predecessor, min, max, range search, split, join, etc.) depends on the height of the binary search tree. The reason for this is that these operations work by walking down the tree from the root until they either fall off the tree or find what they're looking for. The deeper the tree, the longer this can take if you get bad inputs.
By shuffling nodes around to keep the tree height low, we can make these operations very fast in general. A tree with height h can have at most 2^h - 1 nodes in it, which is a huge number compared with h (if h = 20, 2^20 - 1 is over a million!), so if you make an effort to pack the nodes into the tree higher up and closer to the root, you'll get better operation speeds all around.
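The arithmetic behind 2^h - 1 can be sketched in a few lines of Python. `max_nodes` and `min_height` are hypothetical helper names; `min_height` uses the fact that the smallest h with 2^h - 1 ≥ n is ceil(lg(n + 1)), which equals `n.bit_length()` for non-negative integers.

```python
def max_nodes(h: int) -> int:
    """Most nodes a binary tree with h levels can hold: 2^h - 1."""
    return 2 ** h - 1


def min_height(n: int) -> int:
    """Fewest levels that can hold n nodes: the least h with 2^h - 1 >= n.

    That is ceil(log2(n + 1)), computed exactly (no floating point) via bit_length.
    """
    return n.bit_length()


print(max_nodes(20))          # 1048575
print(min_height(1_000_000))  # 20
```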
There are some cases where it's actually beneficial to have trees that are as imbalanced as possible. For example, if you have a binary search tree and know in advance that some elements will be looked up more than others, you may want to shuffle the nodes around in the tree to put the high-frequency items higher up and the low-frequency items deeper in the tree. In non-binary-search-tree contexts, the randomized meldable priority queue works by randomly walking down a tree doing merges, and the less balanced the tree is, the more likely these operations are to end early by falling off the tree.
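When access frequencies are known in advance, one standard way to place hot keys near the root is the classic optimal binary search tree dynamic program. The Python sketch below (hypothetical function name) only computes the minimum total cost, counting a key at depth d (root = 1) as d comparisons per lookup; it runs in O(n^3) time.

```python
from functools import lru_cache
from typing import Sequence


def optimal_bst_cost(freq: Sequence[int]) -> int:
    """Minimum total weighted search cost over all BSTs on sorted keys 0..n-1.

    freq[i] is how often key i is looked up.
    """
    n = len(freq)
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        # Cheapest tree on keys i..j (inclusive); an empty range costs nothing.
        if i > j:
            return 0
        # Whatever root r we pick, every key in i..j sits one level below the
        # parent of this subtree, which adds the range's total frequency once.
        weight = prefix[j + 1] - prefix[i]
        return weight + min(cost(i, r - 1) + cost(r + 1, j) for r in range(i, j + 1))

    return cost(0, n - 1)


print(optimal_bst_cost([34, 8, 50]))  # 142
```

For lookup frequencies 34, 8, 50 on three sorted keys, the balanced tree (middle key at the root) costs 8*1 + 34*2 + 50*2 = 176, while putting the hottest key at the root and accepting a lopsided tree brings the cost down to 142.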

How can I split an AVL tree at a given node in O(log(n)) time?

I've been racking my brain trying all kinds of approaches, but the best I've got is O(log^2(n)).
The exact question is:
Write a function Split(AVLtree T, int k) that returns 2 AVL trees (like a tuple) such that all values in T1 are less than or equal to k and the rest are in T2. k is not necessarily in the tree. The running time must be O(log(n)).
Assume an efficient AVL tree implementation; I managed to write a merge function that runs in O(log(|h1-h2|)) time.
Any help would be greatly appreciated.
You're almost there, given that you have the merge function!
Do a regular successor search in the tree for k. This will trace out a path through the tree from the root to that successor node. Imagine cutting every edge traced out on the path this way, which will give you a collection of "pennants," single nodes with legal AVL trees hanging off to the sides. Then, show that if you merge them back together in the right order, the costs of the merges form a telescoping sum that adds up to O(log n).
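The answer's idea can be sketched in Python as the standard recursive split-by-join: it follows the same root-to-leaf search path for k and re-joins the subtrees hanging off that path. This sketch assumes a join (all keys in the left tree < key < all keys in the right tree) costing O(|h1 - h2| + 1), the usual bound for AVL join, which is what makes the costs telescope to O(log n). Nodes are never mutated, so the input tree survives the split; all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def h(t: Optional[Node]) -> int:
    return t.height if t is not None else 0

def make(left: Optional[Node], key: int, right: Optional[Node]) -> Node:
    """Build a fresh node with a correct height field (no node is ever mutated)."""
    return Node(key, left, right, 1 + max(h(left), h(right)))

def rotate_left(t: Node) -> Node:
    r = t.right
    return make(make(t.left, t.key, r.left), r.key, r.right)

def rotate_right(t: Node) -> Node:
    lc = t.left
    return make(lc.left, lc.key, make(lc.right, t.key, t.right))

def rebalance(t: Node) -> Node:
    """Standard AVL fix-up for a node whose balance factor is at most 2 in magnitude."""
    bf = h(t.left) - h(t.right)
    if bf > 1:
        if h(t.left.left) < h(t.left.right):    # left-right case
            t = make(rotate_left(t.left), t.key, t.right)
        return rotate_right(t)
    if bf < -1:
        if h(t.right.right) < h(t.right.left):  # right-left case
            t = make(t.left, t.key, rotate_right(t.right))
        return rotate_left(t)
    return t

def insert(t: Optional[Node], key: int) -> Node:
    if t is None:
        return Node(key)
    if key < t.key:
        return rebalance(make(insert(t.left, key), t.key, t.right))
    if key > t.key:
        return rebalance(make(t.left, t.key, insert(t.right, key)))
    return t

def join(left: Optional[Node], key: int, right: Optional[Node]) -> Node:
    """All keys in left < key < all keys in right; O(|h(left) - h(right)| + 1)."""
    if h(left) > h(right) + 1:  # walk down left's right spine
        return rebalance(make(left.left, left.key, join(left.right, key, right)))
    if h(right) > h(left) + 1:  # walk down right's left spine
        return rebalance(make(join(left, key, right.left), right.key, right.right))
    return make(left, key, right)

def split(t: Optional[Node], k: int) -> Tuple[Optional[Node], Optional[Node]]:
    """Return (T1, T2) with keys <= k in T1 and keys > k in T2."""
    if t is None:
        return None, None
    if k < t.key:               # t and its right subtree belong to T2
        t1, t2 = split(t.left, k)
        return t1, join(t2, t.key, t.right)
    t1, t2 = split(t.right, k)  # t and its left subtree belong to T1
    return join(t.left, t.key, t1), t2

def inorder(t: Optional[Node]) -> List[int]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def is_avl(t: Optional[Node]) -> bool:
    if t is None:
        return True
    return (abs(h(t.left) - h(t.right)) <= 1
            and t.height == 1 + max(h(t.left), h(t.right))
            and is_avl(t.left) and is_avl(t.right))


t = None
for k in range(0, 100, 2):  # even keys 0, 2, ..., 98
    t = insert(t, k)
t1, t2 = split(t, 41)       # 41 is not in the tree
print(inorder(t1)[-1], inorder(t2)[0])  # 40 42
print(is_avl(t1), is_avl(t2))           # True True
```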

Insertion and deletion of nodes in Splay Trees

I have 2 questions regarding splay trees:
1. Deletion of a node
The book I am using says the following: "When deleting a key k, we splay the parent of the node w that gets removed." Example deletion of 8:
However, what I am doing is this: if the deleted node is not the root, I splay it (to the root), delete it, and splay the right-most node of the left subtree. Since in this case the deleted node is already the root, I simply remove it and immediately splay the right-most node of the left subtree. Like this:
Is this way also correct? Notice that the result is totally different (e.g., my root is 7, not 6 as my book says).
2. In which order are the values in a splay tree inserted?
Is it possible to "get" the order in which the values were inserted in the left tree example above? In other words, how was this tree built (in which order were the nodes inserted to generate it)? Is there a way to figure this out?
Re deleting a node: both algorithms are correct, and both take time O(log n) amortized. Splaying a node costs O(log n). Creating a new link near the root costs O(log n). Splay trees have a lot of flexibility in how they are accessed and restructured.
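The asker's deletion strategy can be sketched in Python on top of Sleator's top-down splay: splay k to the root, drop it, then splay k again inside the left subtree. Because k is larger than every key there, that second splay brings the left subtree's maximum (its right-most node) to the root with an empty right child, where the old right subtree can be attached. Names are illustrative; the book's variant (splaying the removed node's parent) is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def splay(t: Optional[Node], key: int) -> Optional[Node]:
    """Top-down splay: bring key, or the last node on its search path, to the root."""
    if t is None:
        return None
    header = Node(0)  # dummy; header.right / header.left collect the left / right trees
    left_max = right_min = header
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:    # zig-zig: rotate right
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            right_min.left = t      # link t into the right tree
            right_min, t = t, t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:   # zig-zig: rotate left
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            left_max.right = t      # link t into the left tree
            left_max, t = t, t.right
        else:
            break
    left_max.right, right_min.left = t.left, t.right  # reassemble
    t.left, t.right = header.right, header.left
    return t


def insert(t: Optional[Node], key: int) -> Node:
    node = Node(key)
    if t is None:
        return node
    t = splay(t, key)
    if key == t.key:
        return t
    if key < t.key:
        node.left, node.right, t.left = t.left, t, None
    else:
        node.right, node.left, t.right = t.right, t, None
    return node


def delete(t: Optional[Node], key: int) -> Optional[Node]:
    t = splay(t, key)
    if t is None or t.key != key:
        return t                      # key absent: nothing to remove
    if t.left is None:
        return t.right
    new_root = splay(t.left, key)     # key > all keys here, so the max comes up
    new_root.right = t.right          # the max has no right child to lose
    return new_root


def inorder(t: Optional[Node]) -> List[int]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


t = None
for k in [5, 3, 8, 1, 4, 7, 9, 2, 6]:
    t = insert(t, k)
t = delete(t, 8)
print(t.key, inorder(t))  # 7 [1, 2, 3, 4, 5, 6, 7, 9]
```

For this made-up insertion sequence, deleting 8 leaves 7, the largest remaining key below 8, at the root.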
Re reconstructing the sequence of insertions: assuming that the insert method is the usual unbalanced insert and splay, then the root is the last insertion. Unfortunately, there are, in general, several ways that it could have been splayed to the root. An asymptotic improvement on the obvious O(n! poly(n))-time brute force algorithm is to do an exhaustive search with memoization, which has cost O(4^n poly(n)).

Why is it important that a binary tree be balanced?

Why is it important that a binary tree be balanced
Imagine a tree that looks like this:
A
 \
  B
   \
    C
     \
      D
       \
        E
This is a valid binary tree, but now most operations are O(n) instead of O(lg n).
The balance of a binary tree is governed by a property called skewness. The more skewed a tree is, the longer it takes to access an element. Consider this tree:
    1
   / \
  2   3
   \   \
    7   4
         \
          5
           \
            6
The above is also a binary tree, but it is right-skewed. It has 7 elements, and a perfectly balanced binary tree with 7 elements has 3 levels (lg(7 + 1) = 3), so a lookup needs at most 3 steps. In this tree, however, the deepest node (6) is on level 5, so the worst case needs 5 lookups. Here the extra cost is only a couple of levels, but if the tree has thousands of nodes, the skew can become far more significant. That is why it is important to keep a binary tree balanced.
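A quick Python check of the numbers above, encoding the tree as (value, left, right) tuples and assuming the shape drawn above (6 hangs below 5, 4, and 3). `height` counts levels, which is the number of nodes a worst-case lookup visits; `n.bit_length()` gives the number of levels of a perfectly balanced tree with n nodes. All names are illustrative.

```python
def leaf(v):
    return (v, None, None)


# The right-skewed tree from the answer above, as (value, left, right) tuples.
tree = (1,
        (2, None, leaf(7)),
        (3, None, (4, None, (5, None, leaf(6)))))


def height(t):
    """Number of levels: how many nodes a worst-case lookup visits."""
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))


def size(t):
    return 0 if t is None else 1 + size(t[1]) + size(t[2])


n = size(tree)
print(n, height(tree), n.bit_length())  # 7 5 3
```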
That said, how much skewness matters is debatable: the probabilistic analysis of random binary search trees shows that the expected height of a tree built by inserting n keys in random order is only about 4.311 ln n. So it really is a trade-off between the cost of balancing and the cost of skewness.
One more interesting thing: computer scientists have even found an advantage in skewness and proposed a skewed data structure called the skew heap.
To guarantee O(log(n)) search time, each step down the tree has to cut the number of remaining candidate nodes roughly in half. For example, if you have a linear tree that never branches between the root and the leaf, the search time is linear, just as in a linked list.
An extremely unbalanced tree, for example one where every node is linked to the left, means you may have to search through every single node before finding the last one, which defeats the point of a tree and has no benefit over a linked list. Balancing the tree gives O(log(n)) search times instead of O(n).
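The halving argument can be sketched in Python by storing the same 1023 sorted keys two ways: as a linked-list-like chain and as a perfectly balanced tree built by making the middle key the root. The helper names are illustrative, not from the answers above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def chain(keys: List[int]) -> Optional[Node]:
    """Degenerate 'tree': each key is the right child of the previous one."""
    root = None
    for k in reversed(keys):
        root = Node(k, None, root)
    return root


def balanced(keys: List[int]) -> Optional[Node]:
    """Perfectly balanced BST: the middle key is the root, the halves recurse."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], balanced(keys[:mid]), balanced(keys[mid + 1:]))


def search_steps(root: Optional[Node], key: int) -> int:
    """Count the nodes visited while searching for key."""
    steps, node = 0, root
    while node is not None:
        steps += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return steps


keys = list(range(1, 1024))  # 1023 sorted keys
print(search_steps(chain(keys), 1023))     # 1023
print(search_steps(balanced(keys), 1023))  # 10
```

Since 1023 = 2^10 - 1, the balanced tree has exactly 10 levels, so no search in it visits more than 10 nodes.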
Most operations on binary search trees take time proportional to the height of the tree, so it is desirable to keep the height small. Doing so guarantees O(log(n)) search time.
That said, most of the available tree-balancing techniques apply more to trees that are perfectly full or close to perfectly balanced.
In the end, favor simplicity and use a proven self-balancing binary tree such as a red-black tree or an AVL tree.

Tree Datastructures

I've tried to understand what sorted trees, binary trees, AVL trees, and so on are.
I'm still not sure: what makes a sorted tree sorted? And what is the complexity (Big-O) of searching in a sorted tree versus an unsorted tree? I hope you can help me.
Binary Trees
There are two main types of binary trees: balanced and unbalanced. A balanced tree aims to keep the height of the tree (the number of nodes on the path from the root to the deepest leaf) as small as possible by keeping its subtrees evenly sized. There are several balanced tree algorithms, the two most famous being AVL trees and red-black trees. Insert, delete, and search on both AVL and red-black trees take O(log n) time in the worst case, which is the important part. Other self-balancing trees include AA trees, splay trees (whose O(log n) bound is amortized), and scapegoat trees.
Balanced trees get their property (and name) from the fact that after every insert or delete operation, the algorithm checks whether the tree is still balanced and, if it is not, fixes it by rotating nodes (each algorithm does this differently).
Normal (unbalanced) binary trees do not modify their structure to stay balanced, so they risk becoming very inefficient over time (especially if the values are inserted in sorted order). However, if performance is not an issue and you mainly want a sorted data structure, they might do. Insert, delete, and search on an unbalanced tree range from O(1) (best case: the node you want is the root) to O(n) (worst case: you inserted all nodes in order and want the largest one).
Another variation is the randomized binary search tree, which uses randomization to make it very unlikely that the tree becomes fully unbalanced (which is the same as a linked list).
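One well-known randomized binary search tree is the treap: every node gets a random priority, and rotations keep the priorities in max-heap order while the keys stay in BST order. The Python sketch below is illustrative (names hypothetical, seeded for reproducibility); even with sorted input its expected height is O(log n) rather than n.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

_rng = random.Random(42)  # fixed seed so runs are reproducible


@dataclass
class Node:
    key: int
    priority: float = field(default_factory=_rng.random)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(t: Node) -> Node:
    lc = t.left
    t.left, lc.right = lc.right, t
    return lc


def rotate_left(t: Node) -> Node:
    r = t.right
    t.right, r.left = r.left, t
    return r


def insert(t: Optional[Node], key: int) -> Node:
    """BST insert, then rotate the new node up while its priority beats its parent's."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    elif key > t.key:
        t.right = insert(t.right, key)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t


def height(t: Optional[Node]) -> int:
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


root = None
for k in range(1, 1001):  # sorted input: the worst case for a plain BST
    root = insert(root, k)
print(height(root))
```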
A binary search tree is a tree structure in which every node has at most two child nodes.
All keys in a node's left subtree are less than the node's key, and all keys in its right subtree are greater.
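The definition above is about whole subtrees, not just immediate children, which is an easy detail to get wrong. A minimal Python validator, assuming distinct keys and using illustrative names, passes down the open interval that every key in the current subtree must fall into.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node: Optional[Node], lo: float = float("-inf"), hi: float = float("inf")) -> bool:
    """True if every key in the left subtree is smaller and every key in the right is larger.

    Checking only each node's immediate children is not enough, so we pass down
    the open interval (lo, hi) that all keys in this subtree must fall inside.
    """
    if node is None:
        return True
    if not (lo < node.key < hi):
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)


good = Node(5, Node(3, Node(1), Node(4)), Node(8))
bad = Node(5, Node(3, Node(1), Node(6)), Node(8))  # 6 > 5 but sits in 5's left subtree
print(is_bst(good), is_bst(bad))  # True False
```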
The interesting thing about a binary search tree is that we can search for a value in O(log n) time when the tree is properly sorted and balanced. The same search in a linked list, for example, takes O(n) time.
The best way to go about learning datastructures would be to do a day of googling and reading wikipedia articles.
This might get you started
http://en.wikipedia.org/wiki/Binary_search_tree
Do a Google search for the following:
site:stackoverflow.com binary trees
to get a list of SO questions that will answer many of your questions.
There isn't really a lot of point in using a tree structure if it isn't sorted in some fashion - if you are planning on searching for a node in the tree and it is unsorted, you will have to traverse the entire tree (O(n)). If you have a tree which is sorted in some fashion, then it is only necessary to traverse down a single branch of the tree (typically O(log n)).
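The difference can be sketched in Python on one small tree, using the usual convention that smaller keys go left: `search_unsorted` ignores the ordering and may visit every node, while `search_bst` follows a single branch. Both return whether the key was found and how many nodes were visited; the names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search_unsorted(root: Optional[Node], key: int) -> Tuple[bool, int]:
    """No ordering to exploit: may have to visit every node (O(n))."""
    visited, stack = 0, [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        visited += 1
        if node.key == key:
            return True, visited
        stack.append(node.right)
        stack.append(node.left)
    return False, visited


def search_bst(node: Optional[Node], key: int) -> Tuple[bool, int]:
    """Ordered tree: follow a single branch (O(height))."""
    visited = 0
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(search_unsorted(tree, 8))  # (False, 7)
print(search_bst(tree, 8))       # (False, 3)
```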
In a binary search tree, the left child is always smaller than its parent and the right child is always bigger, so you can search a sorted tree in O(log(n)) (provided it is balanced): go left if the key is smaller than the current node and right if it is bigger.

Resources