RB Tree insertion scenario - data-structures

I was looking at Fig 5 of example insertions in this red-black tree tutorial:
I think that even before the new node with key x is inserted, the tree already violates the rule that says:
All external nodes must have the same black height.
In my view, the black height of the external nodes below node 15 is one more than the black height of the external nodes below node 8.
This case is explained in many textbooks and online resources, so I am confused.
How was the tree a valid red-black tree before the new node was inserted?

Related

Red Black Tree Insertion & Deletion Uniqueness

I've been learning and working on implementing a red-black tree data structure. I'm following this article on red-black tree deletion examples and looking at example 5 they have:
When I insert the same nodes into my tree, I get the following:
I understand that red black trees are not unique (I think), therefore both of the above trees are valid since they don't violate any of the properties.
In the example article, after deleting node 1, they get the following:
But after deleting node 1 in my code, I get the following:
Since in my case, node 1 is red, I don't call my delete_fix function which takes care of re-arranging the tree and such. The deletion algorithm I was following simply states to call a delete_fix function if the node to be deleted is black.
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized. It still follows the rules of the red-black tree though. Is this to be expected with red-black trees or am I missing something here?
"However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized."
It is optimised. Your tree will be fast at deleting nodes 5, 7, 20 and 28; the other tree only at deleting 5 and 7.
Bear in mind that red-black trees can be bushy in one direction. If the black height of the tree, counting real nodes only, is N, then the shortest path from the root to a leaf has N nodes (all black) and the longest path from the root to a leaf has 2 * N nodes (alternating black-red-black-red and so on). If you try to add a new node to a bushy path that is already at maximum height, the tree will recolour and/or rebalance.
If you want a more balanced search tree you should use an AVL tree. Red-Black trees favour minimal insertion/deletion fixups over finding a node. Your tree is fine.
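The path-length bound described above can be checked on a small example. The following is a minimal sketch, not anyone's actual implementation: the `Node` class, the helper names and the example tree are all assumptions made for illustration. It computes the black height N (real nodes only) and the shortest and longest root-to-NIL paths of a deliberately lopsided but valid tree.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    red: bool = False
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node]) -> int:
    """Black nodes on every path from `node` down to NIL (NIL not counted).

    Raises ValueError if two paths disagree or a red node has a red child.
    """
    if node is None:
        return 0
    for child in (node.left, node.right):
        if node.red and child is not None and child.red:
            raise ValueError(f"red node {node.key} has a red child")
    left, right = black_height(node.left), black_height(node.right)
    if left != right:
        raise ValueError(f"black heights differ below node {node.key}")
    return left + (0 if node.red else 1)


def path_lengths(node: Optional[Node]) -> Tuple[int, int]:
    """(shortest, longest) count of real nodes on a path from `node` to NIL."""
    if node is None:
        return (0, 0)
    lmin, lmax = path_lengths(node.left)
    rmin, rmax = path_lengths(node.right)
    return (1 + min(lmin, rmin), 1 + max(lmax, rmax))


# Hypothetical tree, bushy on the right: black 10 -> black 5, red 20;
# red 20 -> black 15, black 30; black 30 -> red 40.
root = Node(10, False,
            Node(5),
            Node(20, True, Node(15), Node(30, False, None, Node(40, True))))

n = black_height(root)
shortest, longest = path_lengths(root)
print(n, shortest, longest)  # 2 2 4
```

Here N is 2, the shortest path has 2 nodes and the longest has 4, so the tree sits exactly at the 2 * N limit on its right side while still satisfying every red-black property.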

Converting 2-3-4 to Red-Black Tree

I am familiar with converting individual 2-nodes, 3-nodes, and 4-nodes straight to red-black trees, and this Stack Overflow link (2-3-4 to Red-Black) gives a good explanation. However, I have a question about the example given in that link.
This is how the Stack Overflow question illustrated the 2-3-4 to red-black conversion:
I highlighted the part that I am questioning. This guide that I found (4-node connected to 2-node), and others on the internet, say that when you encounter a 4-node connected to a 2-node or a 3-node, you need to switch the colors around. But in the part of the Stack Overflow example that I highlighted in red, they didn't. Why is that? Thanks
The following image does not imply that the colors of the red-black tree need to be swapped:
It merely describes the process of splitting a B-tree node, which leads to an alternative B-tree for the same data.
Then it shows how that different shape of the B-tree leads to a different coloring in the corresponding red-black tree, and how that new coloring is also a valid alternative.
But the translation from B-tree to red-black tree follows the rules you referred to:
If we look at the left side of the image, we see at the bottom layer of the B-tree a 4-node. According to the rules, this translates to a black node (c) with two red children (b and d). The 2-node at the root translates to a black node (a).
If we look at the right side of the image, we see two 2-nodes at the bottom layer of the B-tree. Each of these translates to a black node (b and d). The root 3-node is translated to a black node (a) with a red node (c) as its child.
This is exactly what is depicted at the bottom of the image. The point is that these two variants are valid red-black trees for the same data, but derived from different shapes of B-trees.
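The translation rules used above can be sketched in code. This is an illustration under assumptions, not the method from the linked guide: the `BNode` and `RBNode` classes and the keys "a" to "e" are hypothetical (they are not the keys of the image), and a 3-node is arbitrarily mapped to a black first key with the second key as its red right child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BNode:
    """2-3-4 tree node: 1 to 3 keys, and either 0 or len(keys) + 1 children."""
    keys: List[str]
    children: List["BNode"] = field(default_factory=list)


@dataclass
class RBNode:
    key: str
    red: bool
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def to_red_black(b: BNode) -> RBNode:
    k = b.keys
    # Leaves of the 2-3-4 tree have no children, so use NIL placeholders.
    kids = [to_red_black(c) for c in b.children] or [None] * (len(k) + 1)
    if len(k) == 1:   # 2-node: a single black node
        return RBNode(k[0], False, kids[0], kids[1])
    if len(k) == 2:   # 3-node: black node with one red child (right, by choice)
        return RBNode(k[0], False, kids[0],
                      RBNode(k[1], True, kids[1], kids[2]))
    # 4-node: black middle key with two red children
    return RBNode(k[1], False,
                  RBNode(k[0], True, kids[0], kids[1]),
                  RBNode(k[2], True, kids[2], kids[3]))


def colors(n: Optional[RBNode]) -> list:
    """In-order list of (key, 'R' or 'B') pairs."""
    if n is None:
        return []
    return colors(n.left) + [(n.key, "R" if n.red else "B")] + colors(n.right)


# The same five keys as two different 2-3-4 trees: before and after
# splitting the 4-node (c d e) and pushing d up into the root.
before = BNode(["b"], [BNode(["a"]), BNode(["c", "d", "e"])])
after = BNode(["b", "d"], [BNode(["a"]), BNode(["c"]), BNode(["e"])])

print(colors(to_red_black(before)))
print(colors(to_red_black(after)))
```

Both results hold the same keys in the same order and differ only in coloring: before the split, d is black with red children c and e; after it, d is red with black children c and e.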
Such a transition from the left to the right version might be needed when inserting a node. For instance, if an "e" were added, it could not be added as a child of the "d" node in the red-black tree without recoloring. By switching to the right-side version of the red-black tree, the node can be added as a (red) child of node "d".

Is RBT always full?

As I understand, binary trees do not have to be full. However, it seems that RBTs have to be full (sometimes children are NIL). Is that true, or am I missing something?
Every path from the root node to a leaf node of any given red-black tree has the same number of black nodes. In that sense I suppose you could say that red-black trees are always 'full', but I don't see that being a very useful definition.
The general idea of the red-black algorithm is to bound the difference in total height (not just black height) between the leaf node with the shortest path from the root and the leaf node with the longest path. If you use that as your basis, then an RB tree is 'full' only if all leaf nodes have the same total height (just as a regular binary tree is full if all leaves are at the same depth), and an RB tree does not have to be full.
No. Red-black trees are not always full; in fact, that is rare. You can learn more by reading Introduction to Algorithms (Cormen et al.), 3rd Edition, page 308. Page 310 has some figures illustrating the answer, which I am not reproducing here for copyright reasons.
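A two-node counterexample makes this concrete. The sketch below assumes that "full" means every real node has either zero or two real children (NIL placeholders do not count), and it represents a tree as hypothetical nested tuples of the form (color, left, right).

```python
def is_full(t):
    """True if every real node has either zero or two real children."""
    if t is None:
        return True
    _, left, right = t
    if (left is None) != (right is None):
        return False
    return is_full(left) and is_full(right)


def is_red_black(t):
    """True if the root is black, no red node has a red child, and all
    root-to-NIL paths contain the same number of black nodes."""
    def blacks(node, parent_red):
        # Black count from node down to NIL, or -1 if a rule is broken.
        if node is None:
            return 0
        color, left, right = node
        if color == "R" and parent_red:
            return -1
        l = blacks(left, color == "R")
        r = blacks(right, color == "R")
        if l < 0 or l != r:
            return -1
        return l + (color == "B")
    return (t is None or t[0] == "B") and blacks(t, False) >= 0


# Black root with a single red left child: valid, but not full.
lopsided = ("B", ("R", None, None), None)
# Black root with two red children: valid and full.
balanced = ("B", ("R", None, None), ("R", None, None))

print(is_red_black(lopsided), is_full(lopsided))  # True False
print(is_red_black(balanced), is_full(balanced))  # True True
```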

Largest and smallest number of internal nodes in red-black tree?

The smallest number of internal nodes in a red-black tree with a black height of k is 2^k - 1, which is one in the following image:
The largest number of internal nodes with a black height of k is 2^(2k) - 1, which, if the black height is 2, should be 2^4 - 1 = 15. However, consider this image:
The number of internal nodes is 7. What am I doing wrong?
(I've completely rewritten this answer because, as the commenters noted, it was initially incorrect.)
I think it might help to think about this problem by using the isometry between red-black trees and 2-3-4 trees. Specifically, a red-black tree with black height h corresponds to a 2-3-4 tree with height h, where each red node corresponds to a key in a multi-key node.
This connection makes it easier for us to make a few neat observations. First, any 2-3-4 tree node in the bottom layer corresponds to a black node with either no red children, one red child, or two red children. These are the only nodes that can be leaf nodes in the red-black tree. If we wanted to maximize the number of total nodes in the tree, we'd want to make the 2-3-4 tree have nothing but 4-nodes, which (under the isometry) maps to a red/black tree where every black node has two red children. An interesting effect of this is that it makes the tree layer colors alternate between black and red, with the top layer (containing the root) being black.
Essentially, this boils down to counting the number of internal nodes in a complete binary tree of height 2h - 1 (2h layers alternating between black and red). This is equal to the number of nodes in a complete binary tree of height 2h - 2 (since if you pull off all the leaves, you're left with a complete tree of height one less than what you started with). This works out to 2^(2h - 1) - 1, which differs from the number that you were given (which I'm now convinced is incorrect) but matches the number that you're getting.
You need to count the black NIL leaves in the tree; otherwise this formula won't work. Also, the root must not be red, because that violates one of the properties of a red-black tree.
The problem is that you misunderstood the black height.
The black height of a node in a red-black tree is the number of black nodes on a path from that node down to a leaf, not counting the node itself. (This value is the same along every such path.)
So if you just add two black leaves to every red node, you will get a red-black tree with a black height of 2 and 15 internal nodes.
(Also, in a red-black tree every red node has two black children, so red nodes can't be leaves.)
After reading the discussion above: if I add the root with the red attribute, the second node I add will be red again, which is a red violation. After the nodes are restructured, I assume that we again end up with a black root and a red child, in which case we might not get the maximum of 2^(2k) - 1 internal nodes.
Am I missing something here? I started working on red-black trees only recently.
It seems you haven't considered the black leaves, that is, the 2 NIL nodes below each of the red nodes on the last level. If you count the NIL nodes as leaves, the red nodes on the last level become internal nodes, for a total of 15.
The tree given here actually has 15 internal nodes. The black NIL children of the red nodes in the last layer, which are called external nodes (nodes without a key), are missing from the picture. The tree has a black height of 2. The maximum number of internal nodes for a tree with black height k is 4^k - 1, which in this case is 15.
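Both bounds can be checked by building the two extreme shapes and counting. This is a small sketch under assumptions: trees are hypothetical nested tuples (color, left, right) with None standing for a black NIL external node, and `black_height` follows only the leftmost path, which is enough for the perfectly symmetric trees built here.

```python
def perfect(layers, alternate, depth=0):
    """Perfect binary tree as nested tuples (color, left, right); None is NIL.

    alternate=False colors every node black; alternate=True colors odd
    depths red, giving black-red-black-red layers from the root down.
    """
    if depth == layers:
        return None
    color = "R" if alternate and depth % 2 == 1 else "B"
    child = perfect(layers, alternate, depth + 1)
    return (color, child, child)  # tuples are immutable, so sharing is safe


def internal_nodes(t):
    """Count real (keyed) nodes; NIL external nodes are not counted."""
    return 0 if t is None else 1 + internal_nodes(t[1]) + internal_nodes(t[2])


def black_height(t):
    """Black nodes on a path from t down to NIL, counting NIL but not t."""
    if t is None:
        return 0
    left = t[1]
    return black_height(left) + (1 if left is None or left[0] == "B" else 0)


for k in range(1, 5):
    smallest = perfect(k, alternate=False)      # k all-black layers
    largest = perfect(2 * k, alternate=True)    # 2k alternating layers
    assert black_height(smallest) == black_height(largest) == k
    print(k, internal_nodes(smallest), internal_nodes(largest))
```

For k = 2 this prints 3 and 15, matching 2^k - 1 and 4^k - 1: the red nodes in the last layer are internal nodes because their NIL children are the external ones.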
In red-black trees, external nodes (null nodes) are always black. In the second tree of your question you have not drawn the external nodes, which is why your count comes out as 7. If you draw the external nodes and then count the internal nodes, you can see that the count is 15.
Not sure that I understand the question.
For any binary tree where all layers (except maybe the last one) have the maximum number of items, we will have 2^(k-1) - 1 internal nodes, where k is the number of layers. In the second picture you have 4 layers, so the number of internal nodes is 2^(4-1) - 1 = 7.

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning about left-leaning red-black (LLRB) trees from Prof. Robert Sedgewick's material:
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I understand insertion in the 2-3 tree and the LLRB tree, I have now spent about 40 hours over 2 weeks and I still can't understand deletion in the LLRB tree.
Can anyone explain LLRB deletion to me?
OK, I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there is imbalance (new nodes) in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z, which is now the root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a left child of Y, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all-black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately X/W/V are rotated right and color-flipped, so that only W is red [third drawing]. This is important: W being red indicates that the left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
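The "color-neutral" flip mentioned above can be sketched as a toggle rather than an assignment, so the same routine serves both directions. This is an illustration in Python rather than the Java of the slides; the `Node` class and the `colors` helper are assumptions made for the example.

```python
class Node:
    def __init__(self, key, red, left=None, right=None):
        self.key, self.red = key, red
        self.left, self.right = left, right


def color_flip(h):
    """Toggle the color of h and of both its children."""
    for n in (h, h.left, h.right):
        n.red = not n.red
    return h


def colors(h):
    """(parent is red, left child is red, right child is red)."""
    return (h.red, h.left.red, h.right.red)


# Insertion direction: a black parent with two red children passes the
# redness up and becomes a red parent with two black children.
h = Node("Y", False, Node("X", True), Node("Z", True))
color_flip(h)
after_insert_flip = colors(h)   # (True, False, False)

# Deletion direction: the same call on a red parent with two black
# children pushes the redness back down, so a black child becomes red.
color_flip(h)
after_delete_flip = colors(h)   # (False, True, True)

print(after_insert_flip, after_delete_flip)
```

Because the flip only negates colors, applying it twice restores the original coloring, which is what lets deletion undo what insertion did.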
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
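The two invariants named above are easy to state as a check. The sketch below is an assumption-laden illustration, not Sedgewick's fixUp(): it only detects violations, and the `Node` class, the function name and the integer keys are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_red(h: Optional[Node]) -> bool:
    return h is not None and h.red  # NIL links count as black


def llrb_violations(h: Optional[Node]) -> List[str]:
    """Collect invariant violations in the subtree rooted at h (pre-order)."""
    if h is None:
        return []
    found = []
    if is_red(h.right):
        found.append(f"{h.key}: right red link")
    if is_red(h) and is_red(h.left):
        found.append(f"{h.key}: two reds in a row")
    return found + llrb_violations(h.left) + llrb_violations(h.right)


# A tree that satisfies both invariants.
ok = Node(40, False,
          Node(20, True, Node(10, False), Node(30, False)),
          Node(50, False))

# A tree as it might look in the middle of a deletion, before fix-up.
pending = Node(40, False,
               Node(20, True, Node(10, True)),
               Node(50, True))

print(llrb_violations(ok))       # []
print(llrb_violations(pending))  # ['40: right red link', '20: two reds in a row']
```

A red node with a red right child is reported as a right red link, so the second check only needs to look at the left child.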
Not sure if that helps, or if we need to get into more detailed examination of code.
There is an interesting comment about the Sedgewick implementation, and in particular its delete method, from a Harvard computer science professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgewick PDF you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgewick code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site, as RedBlackBST.java.
Use the following strategy to delete an arbitrary node of an LLRB tree that is not a leaf:
Transform the LLRB tree into a 2-3-4 tree (we do not need to transform the whole tree, only part of it).
Replace the value of the node that we want to delete with the value of its successor.
Delete the successor.
Fix the tree (restore balance; see the book "Algorithms 4th edition", pages 435 and 436).
If the value is in a leaf, we do not need a successor to replace it, but we still need to transform the current tree into a 2-3-4 tree to delete the value.
The slide on page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and page 437 of the book "Algorithms 4th edition" are the key. They show how an LLRB tree is transformed into a 2-3 tree. Page 442 of the book "Algorithms 4th edition" https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 gives a transformation algorithm for trees.
For example, open page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on page 442, we transform these three nodes into a 4-node of a 2-3-4 tree. Then the node D has children B and F, and we also transform these nodes into a node of the 2-3-4 tree. Then the node B has children A and C, and we transform these nodes into a node of the 2-3-4 tree as well. Finally, we delete A. After the deletion we need to restore balance: we move up through the tree and restore its balance according to the rules (see the book "Algorithms 4th edition", pages 435 and 436). If you need to delete the node D instead (in the same tree on page 54), you need the same transformations, and you then replace the value of the node D with the value of the node E and delete the node E (because it is the successor of D).
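The steps above can be sketched in Python. This is a hedged port in the style of the moveRedLeft/moveRedRight approach of RedBlackBST.java, not a copy of it: all names are my own, only the 2-3 variant is covered, `delete` assumes the key is present, and the successor is removed by a second call to `_delete` instead of a separate deleteMin routine.

```python
class Node:
    def __init__(self, key):
        self.key, self.red = key, True       # new nodes start out red
        self.left = self.right = None

def is_red(h):
    return h is not None and h.red

def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.red, h.red = h.red, True
    return x

def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.red, h.red = h.red, True
    return x

def flip(h):
    for n in (h, h.left, h.right):
        n.red = not n.red

def fix_up(h):
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)                   # no right-leaning red links
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)                  # no two reds in a row
    if is_red(h.left) and is_red(h.right):
        flip(h)                              # pass the redness up
    return h

def insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    return fix_up(h)

def move_red_left(h):
    flip(h)
    if is_red(h.right.left):
        h.right = rotate_right(h.right)
        h = rotate_left(h)
        flip(h)
    return h

def move_red_right(h):
    flip(h)
    if is_red(h.left.left):
        h = rotate_right(h)
        flip(h)
    return h

def _delete(h, key):
    if key < h.key:
        if not is_red(h.left) and not is_red(h.left.left):
            h = move_red_left(h)             # push red down to the left
        h.left = _delete(h.left, key)
    else:
        if is_red(h.left):
            h = rotate_right(h)
        if key == h.key and h.right is None:
            return None                      # target sits at the bottom
        if not is_red(h.right) and not is_red(h.right.left):
            h = move_red_right(h)            # push red down to the right
        if key == h.key:
            m = h.right
            while m.left is not None:        # m becomes the successor
                m = m.left
            h.key = m.key                    # copy it, then delete it below
            h.right = _delete(h.right, m.key)
        else:
            h.right = _delete(h.right, key)
    return fix_up(h)                         # restore balance on the way up

def delete(root, key):
    # Assumes `key` is present; an absent key is not handled in this sketch.
    if not is_red(root.left) and not is_red(root.right):
        root.red = True
    root = _delete(root, key)
    if root is not None:
        root.red = False
    return root

def keys(h):
    return [] if h is None else keys(h.left) + [h.key] + keys(h.right)

root = None
for k in [50, 20, 80, 10, 30, 70, 90, 60, 40]:
    root = insert(root, k)
    root.red = False                         # the root is always black
for k in [10, 50, 80]:
    root = delete(root, k)
print(keys(root))  # [20, 30, 40, 60, 70, 90]
```

The calls to move_red_left and move_red_right play the role of the partial transformation on the way down, and fix_up is the balance recovery on the way back up.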

Resources