Something strange about Red-Black Tree Delete in Introduction to Algorithms

In Introduction to Algorithms, 3rd edition, page 329, Figure 13.7 shows the 4 deletion cases.
But I have a problem: in that figure, node x is always BLACK and is not nil. But I have tested many cases, and it turns out that x must be either a RED node or nil, for these reasons:
If z has fewer than 2 children, then x must be a RED node or nil, because z's black-height must be 1.
If z has 2 children, then x must be a RED node or nil, because y is the successor of z and y's left child must be nil, so y's black-height must be 1.
Am I wrong? Or is there anything that I have overlooked? I hope someone can help me.
Thanks for your time.

What is z? It is not marked on any of the 4 cases. Do you mean w, the sibling node?
To my mind, x is either null (and that is valid, you have just deleted it) or it is black and the node you have deleted is further down, in either x's left subtree or x's right subtree.
Note also that before the FIXUP routine is called, various other cases are handled.
If the node to be deleted has 0 children and is Red => just delete it. It is a leaf node. It is still a red-black tree.
If the node to be deleted has 1 child => The node is black and the child is red. No other possibilities can occur, as they would break the Red-Black tree rules. Delete the node and replace it with the Child. The Child is now black. It is still a Red-Black tree.
If the node to be deleted has 2 children => Find the inorder Successor or Predecessor. Swap it with the node to be deleted BUT DO NOT SWAP THE COLOURS. The inorder Successor or Predecessor will always be a node with 0 or 1 children. Now you just need to delete that node, which reduces to the 2 cases above.
So what is left to do?
The difficult case: 0 children and the node is Black.
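The case analysis above can be sketched as a small classifier. This is only an illustration of the cases as described in this answer, not code from the book; the `Node` class, the `deletion_case` function and the returned labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None   # None stands for a NIL leaf
    right: Optional["Node"] = None


def deletion_case(node: Node) -> str:
    """Classify a node according to the cases listed in the answer above."""
    children = [c for c in (node.left, node.right) if c is not None]
    if len(children) == 2:
        # Swap with the in-order successor/predecessor, then classify again.
        return "two-children"
    if len(children) == 1:
        # In a valid red-black tree this node is black and its child is red.
        return "one-child"
    if node.color == RED:
        # A red leaf can simply be removed.
        return "red-leaf"
    # A black node without children: the only case that needs the fixup.
    return "black-leaf-fixup"


print(deletion_case(Node(10, BLACK)))                        # black-leaf-fixup
print(deletion_case(Node(10, BLACK, left=Node(5, RED))))     # one-child
```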
In the diagrams above, case 2 is the starting case, and the sibling's left and right children are nil. x no longer exists, but w does and is black.
To my mind, you need to find out in your code why x is red. It should never be red. If it is, the preamble code before the fixup code is not working; the fixup code should only be called when the node to be deleted is black and has no children.

Related

Properties of Red-Black Tree

Properties of Red-Black Tree:
Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.
For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
According to the properties, are these valid or invalid red black trees?
A.
I think this is valid
B.
I think this is valid, but I am not sure, since there are two adjacent red nodes?
C.
I think this is valid, but I am not sure, since there are two adjacent red nodes?
D.
I think this is not valid, since it violates Property 4?
Did I understand these properties of a RBtree right? If not, where am I wrong?
You have listed the properties of Red-Black trees correctly. Of the four trees only C is not a valid red-black tree:
A.
This is a valid tree. Wikipedia confirms:
every perfect binary tree that consists only of black nodes is a red–black tree.
B.
I think this is valid, but I am not sure, since there are two adjacent red nodes?
It is valid. There is no problem with red nodes being siblings. They just should not be in a parent-child relationship.
C.
I think this is valid, but I am not sure, since there are two adjacent red nodes?
It is not valid. Not because of the adjacent red nodes, but because of property 5. The node with label 12 has paths to its leaves with varying numbers of black nodes. The same goes for node 25.
As a general rule, a red node can never have exactly one NIL-leaf as child. Its children should either both be NIL-leaves, or both be (black) internal nodes. This follows from the properties.
D.
I think this is not valid, since it violates Property 4?
Property 4 is not violated: the children of the red nodes are NIL leaves (not visualised here), which are black. The fact that these red nodes have black NIL leaves as siblings is irrelevant: there are no rules that concern siblings. So this is valid.
For an example that combines characteristics of tree C and D, see this valid tree depicted in the Wikipedia article, which also depicts the NIL leaves:
A, B & D are valid red-black trees
C is not a valid red-black tree, as the black height from root to leaf is not the same on every path. It is 2 on some paths and 1 on others. It violates what you stated as rule 5.
If 12 had a right child that was black and 25 a left child that was black, then it would be a red-black tree.
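Properties 2, 4 and 5 can be checked mechanically. The following is a minimal sketch, assuming a `None` child stands for a black NIL leaf; the `Node` class and function names are made up for this example, and since the pictures of trees A to D are not reproduced here, the keys other than 12 and 25 are invented to show the kind of defect described for tree C and the repair suggested above.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "R", "B"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None   # None stands for a black NIL leaf
    right: Optional["Node"] = None


def black_height(node):
    """Black-height of the subtree (NIL counts as 1), or None if property 4 or 5 fails."""
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return None  # property 4: a red node has a red child
    lh = black_height(node.left)
    rh = black_height(node.right)
    if lh is None or rh is None or lh != rh:
        return None  # property 5: paths disagree on the number of black nodes
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False  # property 2: the root must be black
    return black_height(root) is not None


# Red nodes 12 and 25 each have exactly one NIL child: invalid.
broken = Node(20, BLACK,
              Node(12, RED, left=Node(5, BLACK)),
              Node(25, RED, right=Node(30, BLACK)))
# Giving 12 a black right child and 25 a black left child repairs it.
fixed = Node(20, BLACK,
             Node(12, RED, Node(5, BLACK), Node(15, BLACK)),
             Node(25, RED, Node(22, BLACK), Node(30, BLACK)))
print(is_red_black(broken), is_red_black(fixed))  # False True
```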
A red-black tree is basically identical to a 2-3-4 tree (4-Btree), even though the splitting/swapping method is upside down.
2-3-4 trees have fixed-size 3-node buckets. The color black means that it's the central node of the 3-bucket. Any red-black tree is considered as a perfect quadtree/binary tree (of 3-node buckets) with empty nodes (black holes and red holes).
In other words, every black node (every 3-bucket) has its absolute position in the perfect tree (2 dimensional unique Cartesian or 4-adic/2-adic unique fraction number).
NIL nodes are just extra flags to save space; you don't have enough memory to store a perfect quadtree/binary tree.
The easiest way to check a red-black tree is to check that each black node is a new bucket (going down) and each red node is grouped with the above black node (same bucket). If the central black node has less than 2 red nodes, you can just add empty red holes next to the central black node (left and right).
A new black node is always the grandson of the last black node, and each black node can have only two red daughter-nodes and no black son-nodes. If the red daughter (mother) is empty (dead/unborn), the motherless grandson-node is directly linked to its grandfather-node.
A motherless black grandson-node has no brother, but he can have a black cousin-node next to him; the 2 cousins are linked to the same grandfather.
A quadtree is a subset of a binary tree.
All black nodes have even heights (2, 4, 6...), and all red nodes have odd heights (1, 3, 5...). Optionally, you can use the half unit 0.5.
The 3-bucket has a fixed size 3; just add extra red holes (unborn unlinked red daughters) to make the size 3.

Inserting into Augmented Red Black Tree

I am looking at the code of inserting into an augmented Red black tree. This tree has an additional field called "size" and it keeps the size of a subtree rooted at a node x. Here is the pseudocode for inserting a new node:
AugmentedRBT_Insert(T,x){
    BST_Insert(T,x);           //insert as if it is a normal BST
    color[x]=red;              //insert as a red node
    size[x]=1;
    tmp=parent[x];
    while(tmp!=NULL){          //start from the parent of x and follow the path to root
        size[tmp]=size[tmp]+1; //update the size of each node
        tmp=parent[tmp];
    }
}
Forget about fixing the coloring and rotations; they will be done in another function. My question is, why do we set the size of the newly added node "x" to 1? I understand that it will not have any subtrees, so its size must be 1, but one of the requirements of an RBT is that every red node has two black children. In fact every leaf node is NULL, and even if we insert the node "x" as black, it should still have 2 black NULL nodes, so I think we must set its size to 3. Am I wrong?
Thanks.
An insertion in a red-black tree, as in most binary trees, happens directly at a leaf. Hence the size of the subtree rooted at the leaf is 1. The red node does have two black children, because leaves always have the "root" or "nil" as a child, which is black. Those null elements aren't nodes, so we wouldn't count them.
Then, we go and adjust the sizes of all parents up to the root (they each get +1 for the node we just added).
Finally, we fix these values when we rotate the tree to balance it, if necessary. In your implementation, you will probably want to do both the size updates and rotations in one pass instead of two.
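As a rough illustration of the size bookkeeping described above, here is a sketch of the question's pseudocode in Python. It leaves out recolouring and rotations, exactly as the question does; the `Node` and `Tree` classes and the `bst_insert` helper are assumptions made for this example. NULL children are represented by `None` and are not counted in any size.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.color = "red"
        self.size = 1          # a new node counts only itself; NIL children are not nodes
        self.parent = None
        self.left = None
        self.right = None


class Tree:
    def __init__(self):
        self.root = None


def bst_insert(tree, x):
    """Plain BST insertion: walk down to a NIL position and attach x there."""
    parent, cur = None, tree.root
    while cur is not None:
        parent = cur
        cur = cur.left if x.key < cur.key else cur.right
    x.parent = parent
    if parent is None:
        tree.root = x
    elif x.key < parent.key:
        parent.left = x
    else:
        parent.right = x


def augmented_insert(tree, x):
    bst_insert(tree, x)
    x.color = "red"
    x.size = 1
    tmp = x.parent
    while tmp is not None:     # every ancestor gains exactly one node in its subtree
        tmp.size += 1
        tmp = tmp.parent


t = Tree()
for k in (5, 3, 8, 1):
    augmented_insert(t, Node(k))
print(t.root.size, t.root.left.size, t.root.right.size)  # 4 2 1
```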

Inserting into red black tree

I am taking an algorithms course and in my course slides, there is an example of insertion into a red-black tree:
My question is, why don't we let "2" be a leaf node here? It looks like if we let it be a leaf node, then no condition of a red black tree is violated. What am I missing here?
The problem is not with the position of 2 in the second tree of your image but with the colors of the nodes. Here is the explanation:
The first rule of insertion into a red-black tree is that the newly inserted node is always Red. You fall into insertion case 3, where both the father and the uncle of node 2 are Red. So they need to be recolored Black, and the grandfather becomes Red; but since the grandfather is the root, it becomes Black again.
So the new tree (after inserting 2) should be like this (r and b indicate color, .b is Nil node):
        5b
       /  \
     1b    7b
    /  \   / \
  .b   2r .b  .b
      /  \
    .b    .b
And why do we always need to insert a red node in an RBT, you may ask? First, we know every NIL node is always Black in an RBT; second, we have rule 5: every simple path from a given node to any of its descendant leaves contains the same number of black nodes. Now if we insert a black node at the bottom, the tree will violate this rule. Just put 2b in the above tree instead of 2r and keep the color of 1 and 7 red, then count the black nodes from the root to any Nil node: you will see that some paths have 2 black nodes and some paths have 3 black nodes.
All the leaves of a Red Black tree have to be NIL
Check property 3
The Wikipedia article, based on the same idea, explains it as follows:
In many of the presentations of tree data structures, it is possible for a node to have only one child, and leaf nodes contain data. It is possible to present red–black trees in this paradigm, but it changes several of the properties and complicates the algorithms. For this reason, this article uses "null leaves",
So clearly nothing prevents you from doing it your way, but you have to take it into account in your algorithms, which makes them significantly more complex. Perhaps this issue can be somewhat alleviated by using OOP, where leaves contain elements, but behave as nodes with empty leaves.
Anyway, it's a trade off: what you would gain in space (roughly two pointers set to NULL in C), you'd probably lose in code complexity, computation time, or in the object runtime representation (specialized methods for the leaves).
Black-height not uniform.
If you count the number of black nodes on each path from the root to a NIL node, 5-1-2-nil has three, while 5-7-nil and 5-1-nil have only two.
(rule: Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes)
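The count above can be reproduced with a few lines of Python. This is just a sketch: trees are written as hypothetical `(key, color, left, right)` tuples with `None` for NIL, and NIL nodes themselves are not counted, which matches the numbers quoted in this answer.

```python
def black_counts(node, count=0):
    """Return the number of black (non-NIL) nodes on every path from node down to a NIL."""
    if node is None:
        return [count]
    key, color, left, right = node
    if color == "b":
        count += 1
    return black_counts(left, count) + black_counts(right, count)


# The tree from the question if 2 were simply attached as a black leaf under 1.
two_black = (5, "b", (1, "b", None, (2, "b", None, None)), (7, "b", None, None))
# The same shape with 2 inserted as a red node.
two_red = (5, "b", (1, "b", None, (2, "r", None, None)), (7, "b", None, None))

print(black_counts(two_black))  # [2, 3, 3, 2, 2] -> not uniform, rule 5 is violated
print(black_counts(two_red))    # [2, 2, 2, 2, 2] -> uniform
```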

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning about Left-Leaning Red-Black (LLRB) trees from Prof. Robert Sedgewick:
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand insertion for the 2-3 tree and the LLRB, I have now spent about 40 hours over 2 weeks and I still can't get LLRB deletion.
Can anyone explain LLRB deletion to me?
Ok I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there is imbalance/new nodes in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z, which is now the root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a left child of Y, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all-black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right and color flipped, so that only W is red [third drawing]. This is important: W being red indicates that the left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.
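To make the "push a red link down, then fix up on the way back" idea concrete, here is a compact sketch of insertion and delete-the-minimum for the 2-3 variant of an LLRB tree. It is a Python transcription written for this explanation, modelled on the structure of Sedgewick's Java code but not copied from the slides; the function names are mine, and general `delete` (which additionally needs a move-red-right step and successor replacement) is left out.

```python
RED, BLACK = True, False


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.red = RED  # colour of the link from the parent; new nodes are red


def is_red(h):
    return h is not None and h.red


def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.red, h.red = h.red, RED
    return x


def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.red, h.red = h.red, RED
    return x


def flip_colors(h):
    h.red = not h.red
    h.left.red = not h.left.red
    h.right.red = not h.right.red


def fix_up(h):
    """Restore the invariants: no right-leaning red link, no two reds in a row."""
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    return fix_up(h)


def put(root, key):
    root = insert(root, key)
    root.red = BLACK
    return root


def move_red_left(h):
    """h is red, h.left and h.left.left are black: make h.left or one of its children red."""
    flip_colors(h)
    if is_red(h.right.left):
        h.right = rotate_right(h.right)
        h = rotate_left(h)
        flip_colors(h)
    return h


def delete_min(h):
    if h.left is None:
        return None  # h is the minimum and is guaranteed to be red here
    if not is_red(h.left) and not is_red(h.left.left):
        h = move_red_left(h)  # push redness down ahead of the search
    h.left = delete_min(h.left)
    return fix_up(h)  # clean up leftover red links on the way back up


def remove_min(root):
    if not is_red(root.left) and not is_red(root.right):
        root.red = RED  # give the descent a red node to start from
    root = delete_min(root)
    if root is not None:
        root.red = BLACK
    return root


root = None
for k in [5, 3, 8, 1, 4]:
    root = put(root, k)
root = remove_min(root)  # removes key 1
```

The point to notice is that `delete_min` never removes a black node: by the time the recursion reaches the minimum, the node is red, so removing it cannot change any black-height.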
There is an interesting comment about the Sedgewick implementation, and in particular its delete method, from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgewick pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgewick code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Use the following strategy to delete an arbitrary node in an LLRB tree that is not a leaf:
1) Transform the LLRB tree to a 2-3-4 tree (we do not need to transform the whole tree, only a part of it).
2) Replace the value of the node we want to delete with the value of its successor.
3) Delete its successor.
4) Fix the tree (restore balance; see the book "Algorithms 4th edition", pages 435, 436).
If the value is in a leaf, then we do not need to use a successor to replace it, but we still need to transform the current tree to a 2-3-4 tree to delete it.
Slide 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and page 437 of the book "Algorithms 4th edition" are key. They show how an LLRB tree is transformed into a 2-3 tree. Page 442 of the book https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 gives an algorithm for transforming trees.
For example, open page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on page 442, we transform these three nodes into a 4-node of a 2-3-4 tree. Then the node D has children B and F; we also transform these nodes into a node of the 2-3-4 tree. Then the node B has children A and C; we also transform these nodes into a node of the 2-3-4 tree. Finally we delete A. After the deletion we need to restore balance: we move up through the tree and restore its balance (according to the rules; see the book "Algorithms 4th edition", pages 435, 436). If you need to delete the node D (in the same tree on page 54), you need the same transformations, and you need to replace the value of the node D with the value of the node E and delete the node E (because it is the successor of D).

Deletion procedure for a Binary Search Tree

Consider the deletion procedure on a BST, when the node to delete has two children. Let's say I always replace it with the node holding the minimum key in its right subtree.
The question is: is this procedure commutative? That is, does deleting x and then y have the same result as deleting first y and then x?
I think the answer is no, but I can't find a counterexample, nor figure out any valid reasoning.
EDIT:
Maybe I've got to be clearer.
Consider the transplant(node x, node y) procedure: it replaces x with y (and its subtree).
So, if I want to delete a node (say x) which has two children, I replace it with the node holding the minimum key in its right subtree:
y = minimum(x.right)
transplant(y, y.right) // extracts the minimum (it doesn't have left child)
y.right = x.right
y.left = x.left
transplant(x,y)
The question was how to prove the procedure above is not commutative.
Deletion (in general) is not commutative. Here is a counterexample:
    4
   / \
  3   7
     /
    6
What if we delete 4 and then 3?
When we delete 4, we get 6 as the new root:
    6
   / \
  3   7
Deleting 3 just removes a leaf and gives us this:
  6
   \
    7
What if we delete 3 and then 4?
When we delete 3, the rest of the tree doesn't change:
  4
   \
    7
   /
  6
However, when we now delete 4, the new root becomes 7:
    7
   /
  6
The two resulting trees are not the same, therefore deletion is not commutative.
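The counterexample is easy to replay in code. Below is a minimal sketch of the usual deletion rule (a node with at most one child is replaced by that child; a node with two children takes the minimum key of its right subtree). The `Node` class and the `delete` and `shape` helpers are illustrative names; for brevity the two-children case copies the successor's key instead of relinking nodes with transplant, which yields the same tree shape.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def delete(root, key):
    """Delete key from the BST rooted at root and return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right              # zero children, or only a right child
    elif root.right is None:
        return root.left               # only a left child
    else:
        succ = root.right              # two children: minimum of the right subtree
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def shape(root):
    """Nested-tuple view (key, left, right) so that trees can be compared."""
    return None if root is None else (root.key, shape(root.left), shape(root.right))


def example():
    return Node(4, Node(3), Node(7, Node(6)))


a = shape(delete(delete(example(), 4), 3))
b = shape(delete(delete(example(), 3), 4))
print(a)  # (6, None, (7, None, None))
print(b)  # (7, (6, None, None), None)
```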
UPDATE
I didn't read the restriction that this is when you always delete a node with 2 children. My solution is for the general case. I'll update it if/when I can find a counter-example.
ANOTHER UPDATE
I don't have concrete proof, but I'm going to hazard a guess:
In the general case, you handle deletions differently based on whether the node has two children, one child, or no children. In the counter-example I provided, I first delete a node with two children and then a node with no children. In the other order, I delete a node with no children and then a node with one child.
In the special case of only deleting nodes with two children, you want to consider the case where both nodes are in the same sub-tree (since it wouldn't matter if they are in different sub-trees; you can be sure that the overall structure won't change based on the order of deletion). What you really need to prove is whether the order of deletion of nodes in the same sub-tree, where each node has two children, matters.
Consider two nodes A and B where A is an ancestor of B. Then you can further refine the question to be:
Is deletion commutative when you are considering the deletion of two nodes from a Binary Search Tree which have an ancestor-descendant relationship to each other (this would imply that they are in the same sub-tree)?
When you delete a node (let's say A), you traverse the right sub-tree to find the minimum element. This node will be a leaf node and can never be equal to B (because B has two children and cannot be a leaf node). You would then replace the value of A with the value of this leaf-node. What this means is that the only structural change to the tree is the replacement of A's value with the value of the leaf-node, and the loss of the leaf-node.
The same process is involved for B. That is, you replace the value of the node and remove a leaf-node. So in general, when you delete a node with two children, the only structural change is the change in value of the node you are deleting, and the deletion of the leaf node whose value you are using as replacement.
So the question is further refined:
Can you guarantee that you will always get the same replacement node regardless of the order of deletion (when you are always deleting a node with two children)?
The answer (I think) is yes. Why? Here are a few observations:
Let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is not in the left sub-tree of the ancestor node's right child. This means that this sub-tree remains unaffected. What this also means is regardless of the order of deletion, two different sub-trees are modified and therefore the operation is commutative.
Again, let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is in the left sub-tree of the ancestor node's right child. But even here, there is no overlap. The reason is when you delete the descendant node first, you look at the left sub-tree of the descendant node's right child. When you then delete the ancestor node, you will never go down that sub-tree since you will always be going towards the left after you enter the ancestor node's right-child's left sub-tree. So again, regardless of what you delete first you are modifying different sub-trees and so it appears order doesn't matter.
Another case is if you delete the ancestor node first and you find that the minimum node is a child of the descendant node. This means that the descendant node will end up with one child, and deleting the one child is trivial. Now consider the case where in this scenario, you deleted the descendant node first. Then you would replace the value of the descendant node with its right child and then delete the right child. Then when you delete the ancestor node, you end up finding the same minimum node (the old deleted node's left child, which is also the replaced node's left child). Either way, you end up with the same structure.
This is not a rigorous proof; these are just some observations I've made. By all means, feel free to poke holes!
It seems to me that the counterexample shown in Vivin's answer is the sole case of non-commutativity, and that it is indeed eliminated by the restriction that only nodes with two children can be deleted.
But it can also be eliminated if we discard what appears to be one of Vivin's premises, which is that we should traverse the right subtree as little as possible to find any acceptable successor. If, instead, we always promote the smallest node in the right subtree as the successor, regardless of how far away it turns out to be located, then even if we relax the restriction on deleting nodes with fewer than two children, Vivin's result
    7
   /
  6
is never reached if we start at
    4
   / \
  3   7
     /
    6
Instead, we would first delete 3 (without successor) and then delete 4 (with 6 as successor), yielding
  6
   \
    7
which is the same as if the order of deletion were reversed.
Deletion would then be commutative, and I think it is always commutative, given the premise I have named (successor is always smallest node in right subtree of deleted node).
I do not have a formal proof to offer, merely an enumeration of cases:
If the two nodes to be deleted are in different subtrees, then deletion of one does not affect the other. Only when they are in the same path can the order of deletion possibly affect the outcome.
So any effect on commutativity can come only when an ancestor node and one of its descendants are both deleted. Now, how does their vertical relationship affect commutativity?
Descendant in the left subtree of the ancestor. This situation will not affect commutativity because the successor comes from the right subtree and cannot affect the left subtree at all.
Descendant in the right subtree of the ancestor. If the ancestor's successor is always the smallest node in the right subtree, then order of deletion cannot change the choice of successor, no matter what descendant is deleted before or after the ancestor. Even if the successor to the ancestor turns out to be the descendant node that is also to be deleted, that descendant too is replaced with the next-largest node to it, and that descendant cannot have its own left subtree remaining to be dealt with. So deletion of an ancestor and any right-subtree descendant will always be commutative.
I think there are two equally viable ways to delete a node, when it has 2 children: SKIP TO CASE 4...
Case 1: delete 3 (Leaf node)
2 3
/ \ --> / \
1 3 1
Case 2: delete 2 (Left child node)
2 3
/ \ --> / \
1 3 1
Case 3: delete 2 (Right child node)
2 2
/ \ --> / \
1 3 3
______________________________________________________________________
Case 4: delete 2 (Left & Right child nodes)
2 2 3
/ \ --> / \ or / \
1 3 1 3
BOTH WORK and have different resulting trees :)
______________________________________________________________________
The algorithm is explained here: http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Trees/AVL-delete.html
Deleting a node with 2 children nodes:
1) Replace the (to-delete) node with its in-order predecessor or in-order successor
2) Then delete the in-order predecessor or in-order successor
I respond here to Vivin's second update.
I think this is a good recast of the question:
Is deletion commutative when you are considering the deletion of two nodes from a Binary Search Tree which have a ancestor-descendant relationship to each other (this would imply that they are in the same sub-tree)?
but the claim in the passage below that this node will be a leaf is not true:
When you delete a node (let's say A), you traverse the right sub-tree to find the minimum element. This node will be a leaf node and can never be equal to B
because the minimum element in A's right subtree can have a right child. So, it is not necessarily a leaf.
Let's call the minimum element in A's right subtree successor(A).
Now, it is true that B cannot be successor(A), but it can be in its right subtree. So, it is a mess.
Let me try to summarize.
Hypothesis:
A and B have two children each.
A and B are in the same subtree.
Other stuff we can deduce from hypothesis:
B is not successor(A), nor is A successor(B).
Now, given that, I think there are 4 different cases (as usual, let A be an ancestor of B):
1) B is in A's left subtree
2) B is an ancestor of successor(A)
3) successor(A) is an ancestor of B
4) B and successor(A) don't have any relationship (they are in different subtrees of A's right subtree)
I think (but of course I cannot prove it) that cases 1, 2 and 4 don't matter.
So, only in the case where successor(A) is an ancestor of B could the deletion procedure fail to be commutative. Or could it?
I pass the ball : )
Regards.
