Inserting into red black tree - algorithm

I am taking an algorithms course and in my course slides, there is an example of insertion into a red-black tree:
My question is, why don't we let "2" be a leaf node here? It looks like if we let it be a leaf node, then no condition of a red black tree is violated. What am I missing here?

The Problem is not with the position of 2 the the second tree of your image but the color of different nodes. Here is the explanation:
1st Rule of insertion in Red-Black tree is: the newly inserted node has to be always Red. You fall in case 3 insertion where both the father and uncle of node 2 is Red. So they are needed to be recolored to Black, and the grandfather will become Red but as the grandfather is root so it will become Black again.
So the new tree (after inserting 2) should be like this (r and b indicate color, .b is Nil node):
5b
/ \
1b 7b
/ \ / \
.b 2r .b .b
/ \
.b .b
And why we always need to insert red node in RBT, you may ask? Answer is, 1st we know every NIL nodes are always Black in RBT, 2nd we have rule 5. Every simple path from a given node to any of its descendant leaves contains the same number of black nodes. Now if we insert a black node at the end the tree will violate this rule, just put 2b in above tree instead of 2r and keep color of 1 and 7 red, then count black node from root to any Nil node, you will see some path have 2 back nodes and some path have 3 black nodes.

All the leaves of a Red Black tree have to be NIL
Check property 3

The wikipedia article, based on the same idea, explains it as follow:
In many of the presentations of tree data structures, it is possible for a node to have only one child, and leaf nodes contain data. It is possible to present red–black trees in this paradigm, but it changes several of the properties and complicates the algorithms. For this reason, this article uses "null leaves",
So clearly nothing prevents you to do it your way, but you have to take it in account in your algorithms, which make them significantly more complex. Perhaps this issue can be somewhat alleviated by using OOP, where leaves contain elements, but behave as nodes with empty leaves.
Anyway, it's a trade off: what you would gain in space (roughly two pointers set to NULL in C), you'd probably lose in code complexity, computation time, or in the object runtime representation (specialized methods for the leaves).

Black-height not uniform.
If you count the number of blacks nodes searching NIL nodes from root, 5-1-2-nil has three and 5-7-nil or 5-1-nil only two.
(rule: Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes)

Related

Properties of Red-Black Tree

Properties of Red-Black Tree:
Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.
For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
According to the properties, are these valid or invalid red black trees?
A.
I think this is valid
B.
I think this is valid, but I am not sure since there two adjacent red nodes?
C.
I think this is valid, but I am not sure since there two adjacent red nodes?
D.
I think this is not valid since it violate Property 4?
Did I understand these properties of a RBtree right? If not, where am I wrong?
You have listed the properties of Red-Black trees correctly. Of the four trees only C is not a valid red-black tree:
A.
This is a valid tree. Wikipedia confirms:
every perfect binary tree that consists only of black nodes is a red–black tree.
B.
I think this is valid, but I am not sure since there two adjacent red nodes?
It is valid. There is no problem with red nodes being siblings. They just should not be in a parent-child relationship.
C.
I think this is valid, but I am not sure since there two adjacent red nodes?
It is not valid. Not because of the adjacent red nodes, but because of property 5. The node with label 12 has paths to its leaves with varying number of black nodes. Same for the node 25.
As a general rule, a red node can never have exactly one NIL-leaf as child. Its children should either both be NIL-leaves, or both be (black) internal nodes. This follows from the properties.
D.
I think this is not valid since it violate Property 4?
Property 4 is not violated: the children of the red nodes are NIL leaves (not visualised here), which are black. The fact that these red nodes have black NIL leaves as siblings is irrelevant: there are no rules that concern siblings. So this is valid.
For an example that combines characteristics of tree C and D, see this valid tree depicted in the Wikipedia article, which also depicts the NIL leaves:
A, B & D are valid red-black trees
C is not valid red-black tree as the black height from root to leaf is not the same. It is 2 in some paths and 1 in other paths. It violates what you stated as rule 5.
If 12 had a right child that was black and 25 a left child that was black, then it would be a red-black tree.
A red-black tree is basically identical to a 2-3-4 tree(4-Btree), even though the splitting/swapping method is upside down.
2-3-4 trees have fixed-size 3-node buckets. The color black means that it's the central node of the 3-bucket. Any red-black tree is considered as a perfect quadtree/binary tree (of 3-node-buckets) with empty nodes(black holes and red holes).
In other words, every black node (every 3-bucket) has its absolute position in the perfect tree(2 dimensional unique Cartesian or 4-adic/2-adic unique fraction number).
NIL nodes are just extra flags to save space; you don't have enough memory to store a perfect quadtree/binary tree.
The easiest way to check a red-black tree is to check that each black node is a new bucket(going down) and each red node is grouped with the above black node(same bucket). If the central black node has less than 2 red nodes, you can just add empty red holes next to the central black node(left and right).
A new black node is always the grandson of the last black node, and each black node can have only two red daughter-nodes and no black son-nodes. If the red daughter(mother) is empty(dead/unborn), the motherless grandson-node is directly linked to its grandfather-node.
A motherless black grandson-node has no brother, but he can have a black cousin-node next to him; the 2 cousins are linked to the same grandfather.
A quadtree is a subset of a binary tree.
All black nodes have even heights(2,4,6...), and all red nodes have odd heights(1,3,5...). Optionally, you can use the half unit 0.5.
The 3-bucket has a fixed size 3; just add extra red holes(unborn unlinked red daughters) to make the size 3.

Largest and smallest number of internal nodes in red-black tree?

The smallest number of internal nodes in a red-black tree with black height of k is 2k-1 which is one in the following image:
The largest number of internal nodes with black height of k is 22k-1 which, if the black height is 2, should be 24 - 1 = 15. However, consider this image:
The number of internal nodes is 7. What am I doing wrong?
(I've completely rewritten this answer because, as the commenters noted, it was initially incorrect.)
I think it might help to think about this problem by using the isometry between red-black trees and 2-3-4 trees. Specifically, a red-black tree with black height h corresponds to a 2-3-4 tree with height h, where each red node corresponds to a key in a multi-key node.
This connection makes it easier for us to make a few neat observations. First, any 2-3-4 tree node in the bottom layer corresponds to a black node with either no red children, one red child, or two red children. These are the only nodes that can be leaf nodes in the red-black tree. If we wanted to maximize the number of total nodes in the tree, we'd want to make the 2-3-4 tree have nothing but 4-nodes, which (under the isometry) maps to a red/black tree where every black node has two red children. An interesting effect of this is that it makes the tree layer colors alternate between black and red, with the top layer (containing the root) being black.
Essentially, this boils down to counting the number of internal nodes in a complete binary tree of height 2h - 1 (2h layers alternating between black and red). This is equal to the number of nodes in a complete binary tree of height 2h - 2 (since if you pull off all the leaves, you're left with a complete tree of height one less than what you started with). This works out to 22h - 1 - 1, which differs from the number that you were given (which I'm now convinced is incorrect) but matches the number that you're getting.
You need to count the black NIL leafs in the tree if not this formula won't work. The root must not be RED that is in violation of one of the properties of a Red-Black tree.
The problem is you misunderstood the black height.
The black height of a node in a red-black tree is the the number of black nodes from the current node to a leaf not counting the current node. (This will be the same value in every route).
So if you just add two black leafs to every red node you will get a red-black tree with a black height of 2 and 15 internal nodes.
(Also in a red-black tree every red node has two black children so red nodes can't be leafs.)
After reading the discussion above,so if I add the root with red attribute, the second node I add will be a red again which would be a red violation, and after node restructuring, I assume that we again reach root black and child red ! with which we might not get (2^2k)-1 max internal nodes.
Am I missing something here , started working on rbt just recently ...
It seems you havent considered the "Black Leaves" (Black nodes) -- the 2 NIL nodes for each of the Red Nodes on the last level. If you consider the NIL nodes as leaves, the Red nodes on the last level now get counted as internal nodes totaling to 15.
The tree given here actually has 15 internal nodes. The NIL black children of red nodes in last layer are missing which are actually called external nodes ( node without a key ). The tree has black-height of 2. The actual expression for maximum number of internal nodes for a tree with black-height k is 4^(k)-1. In this case, it turns out to be 15.
In red-black trees, external nodes[null nodes] are always black but in your question for the second tree you have not mentioned external nodes and hence you are getting your count as 7 but if u mention external nodes[null nodes] and then count internal nodes you can see that it turns out to be 15.
Not sure that i understand the question.
For any binary tree where all layers (except maybe last one) have max number of items we will have 2^(k-1)-1 internal nodes, where k is number of layers. At second picture you have 4 layers, so number of internal nodes is 2^(4-1)-1=7

Children in a red / black tree?

According to this explanation of red black tree, the tree must have the following properties:
A node is either red or black.
The root is black. (This rule is sometimes omitted. Since the root can always be changed from red to black, but not necessarily
vice-versa, this rule has little effect on analysis.)
All leaves (NIL) are black. (All leaves are same color as the root.)
Both children of every red node are black.
Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
What is stopping someone making every single node black?
It is possible. But to maintain condition 5, sometimes you might want to color a node RED.
e.g., Consider the following example.
a
/ \
b c
Here all nodes can be BLACK
Now if you want to insert a new node, which color will you choose? RED. since if you choose black, the condition 5 will not be satisfied. So basically you can keep inserting RED nodes unless any of the conditions (1-4) is not broken
The last rule you quoted is "Every simple path from a given node to any of its descendant leaves contains the same number of black nodes."
If all nodes are black, then the path from the root to any leaf must contain the same number of nodes. In other words, all leaves are at the same depth - so this is only possible for a perfect binary tree.

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning Left-Lean-Red-Black tree, from Prof.Robert Sedgewick
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand the insert of the 2-3 tree and the LLRB, I have spent totally like 40 hours now for 2 weeks and I still can't get the deletion of the LLRB.
Can anyone really explain the deletion of LLRB to me?
Ok I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there there imbalance/new nodes in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z which is now root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a child of Z, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right, and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height of new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.
There is an interesting comment about the Sedgwich implementation and in particular its delete method from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgwich pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgwich code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Follow the next strategy to delete an arbitrary node in a LLRB tree which is not in a leaf:
Transform a LLRB tree to a 2-3-4 tree (we do not need to transform the whole tree, only a part of the tree).
Replace the value of the node (which we want to delete) its successor.
Delete its successor.
Fix the tree (recover balance, see the book "Algorithms 4th edition" on the pages 435, 436).
If a value in a leaf then we do not need to use a successor to replase this value, but we still need to transform the current tree to 2-3-4 tree to delete this value.
The slide on the page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and the book "Algorithms 4th edition" on the page 437 are a key. They show how a LLRB tree transformations into a 2-3 tree. In the book "Algorithms 4th edition" on the page 442 https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 is an algorithm of transformation for trees.
For example, open the page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on the page 442 we transform these three nodes into the 4-node of a 2-3-4 tree. Then the node D has children B and F we also transform these nodes into a node of 2-3-4 tree. Then the node B has children A and C we also transform these nodes into a node of 2-3-4 tree. And finally we need to delete A. After deletion we need to recover balance. We move up through the tree and we restore balance of the tree (according to rules, see the book "Algorithms 4th edition" on the pages 435, 436). If you need to delete the node D (the same tree on the page 54). You need the same transformations and need to replace the value of the node D on the value of the node E and delete the node E (because it is a successor of D).

How to tell whether a red-black tree can have X black nodes and Y red nodes or not

I have an exam next week in algorithms, and was given questions to prepare for it. One of these questions has me stumped though.
"Can we draw a red-black tree with 7 black nodes and 10 red nodes? why?"
It sounds like it could be answered quickly, but I can't get my mind around it.
The CRLS gives us the maximum height of a RB tree with n internal nodes: 2*lg(n+1).
I think the problem could be solved using this lemma alone, but am not sure.
Any tips?
Since this is exam preparation, I don't want to give you a direct answer, but I think what you need to consider is the properties that govern how you build a Red-Black Tree:
A node is either red or black.
The root is black. (This rule is sometimes omitted from other definitions. Since the root can always be changed from red to black but not necessarily vice-versa this rule has little effect on analysis.)
All leaves are black.
Both children of every red node are black.
Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
(Stole these from the wikipedia page: http://en.wikipedia.org/wiki/Red-black_tree)
Given the count of nodes you listed, can you meet all of these properties?
the answer is simple.
As we know that a red node can have only black parent.The max no. of nodes will be when each black node's both children are red and, hence, every black node has red parent. So, for 'n' black nodes '2n' red node are possible.
Think it this way:
put the first node(which is root) & make it black
make both its children red
make left and right children of both these red nodes black and for all these black nodes,
follow the same procedure as followed with root until black node count reaches the given value (in this case 7)
hope this helped you visualize the solution.
The answer critically depends on whether your RB tree uses black dummy nodes at the leaves, and if so, they are included in the count of seven black nodes. If not, consider a complete tree of seven black nodes
*
/ \
* *
/\ /\
* * * *
You won't have much trouble adding ten red nodes.

Resources