Deletion in Left Leaning Red Black Trees - data-structures

I am learning about Left Leaning Red Black Trees.
In the deletion algorithm peresented in the paper, if the key matches for a node and the right subtree is NULL for that node, then that node is deleted. But there may be a left subtree as well which is not considered.
I am not able to understand why would the left subtree be NULL as well. Similar thing is done when deleting the minimum or the maximum as well. Could anyone please guide me on this?

It seems you are speaking about this piece of code:
if (isRed(h.left))
h = rotateRight(h);
if (key.compareTo(h.key) == 0 && (h.right == null))
return null;
Here left descendant cannot be "red" because preceding code would rotate it to the right.
Also left descendant cannot be "black" because in this case there is a path to the left of h containing at least one "black" node while no path to the right of it has any "black" nodes. But in RB-tree the number of black nodes on every path must be the same.
This means there is no left descendant at all and node h is a leaf node.
In deleteMin function there is no need to check right sub-tree if left sub-tree is empty because no right sub-tree of LLRB tree can be greater than corresponding left sub-tree.

There is an interesting analysis of whether left-leaning red black trees are really better or even simpler than prior implementations. The article Left-Leaning Red Black Trees Considered Harmful waswritten by Harvard Comp Sci professor Eddie Kohler. He writes:
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as
the default and describes 2–3 trees as a variant. The delete implementation, however,
only works for 2–3 trees. If you implement the default variant of insert and the
only variant of delete,your tree won’t work. The text doesn’t highlight the switch
from 2–3–4 to 2–3: not kind.

Related

Converting 2-3-4 to Red-Black Tree

I am familiar with converting individual 2-node, 3-node, and 4-nodes straight to Red-Black trees. And this Stackoverflow link is a good explanation 2-3-4 to Red-Black. However, I have a question about the example given in that link.
This is how the Stackoverflow question 2-3-4 to red-black was illustrated 2-3-4 to Red-Black
I highlighted the part that I am questioning. Why is it on this guide 4-node connected to 2-node I found and others on the internet, they say when encountering a 4 node connected to a 2 or 3 node, you need to switch the colors around. But in the StackOverflow example that I highlighted red, they didn't. Thanks
The following image does not imply that the colors of the red-back tree need to be swapped:
It merely describes the process of splitting a B-tree node, which leads to an alternative B-tree for the same data.
Then it shows how that different shape of the B-tree leads to a different coloring in the corresponding red-black tree, and how that new coloring is also a valid alternative.
But the translation from B-tree to red-black tree follows the rules you referred to:
If we look at the left side of the image, we see at the bottom layer of the B-tree a 4-node. According to the rules, this translates to a black node (c) with two red children (b and d). The 2-node at the root translates to a black node (a).
If we look at the right side of the image, we see two 2-nodes at the bottom layer of the B-tree. These translate each to a back node (b and d). The root 3-node is translated to a back node (a) with a red node (c) as child.
This is exactly what is depicted at the bottom of the image. The point is that these two variants are valid red-black trees for the same data, but derived from different shapes of B-trees.
Such a transition from the left to the right version might be needed when inserting a node. For instance, if an "e" would be added, then it cannot be added as a child of the "d" node in the red-black tree without recoloring. By switching to the right-side version of the red-black tree, the node can be added as (red) child of node "d".

Properties of Red-Black Tree

Properties of Red-Black Tree:
Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.
For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
According to the properties, are these valid or invalid red black trees?
A.
I think this is valid
B.
I think this is valid, but I am not sure since there two adjacent red nodes?
C.
I think this is valid, but I am not sure since there two adjacent red nodes?
D.
I think this is not valid since it violate Property 4?
Did I understand these properties of a RBtree right? If not, where am I wrong?
You have listed the properties of Red-Black trees correctly. Of the four trees only C is not a valid red-black tree:
A.
This is a valid tree. Wikipedia confirms:
every perfect binary tree that consists only of black nodes is a red–black tree.
B.
I think this is valid, but I am not sure since there two adjacent red nodes?
It is valid. There is no problem with red nodes being siblings. They just should not be in a parent-child relationship.
C.
I think this is valid, but I am not sure since there two adjacent red nodes?
It is not valid. Not because of the adjacent red nodes, but because of property 5. The node with label 12 has paths to its leaves with varying number of black nodes. Same for the node 25.
As a general rule, a red node can never have exactly one NIL-leaf as child. Its children should either both be NIL-leaves, or both be (black) internal nodes. This follows from the properties.
D.
I think this is not valid since it violate Property 4?
Property 4 is not violated: the children of the red nodes are NIL leaves (not visualised here), which are black. The fact that these red nodes have black NIL leaves as siblings is irrelevant: there are no rules that concern siblings. So this is valid.
For an example that combines characteristics of tree C and D, see this valid tree depicted in the Wikipedia article, which also depicts the NIL leaves:
A, B & D are valid red-black trees
C is not valid red-black tree as the black height from root to leaf is not the same. It is 2 in some paths and 1 in other paths. It violates what you stated as rule 5.
If 12 had a right child that was black and 25 a left child that was black, then it would be a red-black tree.
A red-black tree is basically identical to a 2-3-4 tree(4-Btree), even though the splitting/swapping method is upside down.
2-3-4 trees have fixed-size 3-node buckets. The color black means that it's the central node of the 3-bucket. Any red-black tree is considered as a perfect quadtree/binary tree (of 3-node-buckets) with empty nodes(black holes and red holes).
In other words, every black node (every 3-bucket) has its absolute position in the perfect tree(2 dimensional unique Cartesian or 4-adic/2-adic unique fraction number).
NIL nodes are just extra flags to save space; you don't have enough memory to store a perfect quadtree/binary tree.
The easiest way to check a red-black tree is to check that each black node is a new bucket(going down) and each red node is grouped with the above black node(same bucket). If the central black node has less than 2 red nodes, you can just add empty red holes next to the central black node(left and right).
A new black node is always the grandson of the last black node, and each black node can have only two red daughter-nodes and no black son-nodes. If the red daughter(mother) is empty(dead/unborn), the motherless grandson-node is directly linked to its grandfather-node.
A motherless black grandson-node has no brother, but he can have a black cousin-node next to him; the 2 cousins are linked to the same grandfather.
A quadtree is a subset of a binary tree.
All black nodes have even heights(2,4,6...), and all red nodes have odd heights(1,3,5...). Optionally, you can use the half unit 0.5.
The 3-bucket has a fixed size 3; just add extra red holes(unborn unlinked red daughters) to make the size 3.

kd-tree: duplicate key and deletion

In these slides (13) the deletion of a point in a kd-tree is described: It states that the left subtree can be swapped to be the right subtree, if the deleted node has only a left subtree. Then the minimum can be found and recursively be deleted (just as with a right subtree).
This is because kd-trees with equal keys for the current dimensions should be on the right.
My question: Why does the equal key point have to be the right children of the parent point? Also, what happens if my kd-tree algorithm returns a tree with an equal key point on the left?
For example:
Assume the dataset (7,2), (7,4), (9,6)
The resulting kd-tree would be (sorted with respect to one axis):
(7,2)
/ \
(7,4) (9,6)
Another source that states the same theory is this one (paragraph above Example 15.4.3)
Note that we can replace the node to be deleted with the least-valued node from the right subtree only if the right subtree exists. If it does not, then a suitable replacement must be found in the left subtree. Unfortunately, it is not satisfactory to replace N's record with the record having the greatest value for the discriminator in the left subtree, because this new value might be duplicated. If so, then we would have equal values for the discriminator in N's left subtree, which violates the ordering rules for the kd tree. Fortunately, there is a simple solution to the problem. We first move the left subtree of node N to become the right subtree (i.e., we simply swap the values of N's left and right child pointers). At this point, we proceed with the normal deletion process, replacing the record of N to be deleted with the record containing the least value of the discriminator from what is now N's right subtree.
Both refer to nodes that only have a left subtree but why would this be any different?
Thanks!
There is no hard and fast rule to have equal keys on right only. You can update that to left as well.
But doing this, you would also need update your algorithms of search and delete operations.
Have a look at these links:
https://www.geeksforgeeks.org/k-dimensional-tree/
https://www.geeksforgeeks.org/k-dimensional-tree-set-3-delete/

How to save the memory when storing color information in Red-Black Trees?

I've bumped into this question at one of Coursera algorithms course and realized that I have no idea how to do that. But still, I have some thoughts about it. The first thing that comes into my mind was using optimized bit set (like Java's BitSet) to get mapping node's key -> color. So, all we need is to allocate a one bit set for whole tree and use it as color information source. If there is no duplicatate elements in the tree - it should work.
Would be happy to see other's ideas about this task.
Just modify the BST. For black node, do nothing. And for red node, exchange its left child and right child. In this case, a node can be justified red or black according to if its right child is larger than its left child.
Use the least significant bit of one of the pointers in the node to store the color information. The node pointers should contain even addresses on most platforms. See details here.
There's 2 rules we can use:
since the root node is always black, then a red node will always have a parent node.
RB BST is always with the order that left_child < parent < right_child
Then we will do this:
keep the black node unchanged.
for the red node, we call it as R, we suppose it as the left child node for it's parent node, called P.
change the red node value from R to R', while R' = P + P - R
now that R' > P, but as it's the left child tree, we will find the order mismatch.
If we find an order mismatch, then we will know it's a red node.
and it's easy to go back to the original R = P + P - R'
One option is to use a tree that requires less bookkeeping, e.g. a splay tree. However, splay trees in particular aren't very good for iteration (they're much better at random lookup), so they may not be a good fit for the domain you're working in.
You can also use one BitSet for the entire red-black tree based on node position, e.g. the root is the 0th bit, the root's left branch is the 1st bit, the right branch is the 2nd bit, the left branch's left branch is the 3rd bit, etc; this way it shouldn't matter if there are duplicate elements. While traversing the tree make note of which bit position you're at.
It's much more efficient in terms of space to use one bitset for the tree instead of assigning a boolean to each node; each boolean will take up at least a byte and may take up a word depending on alignment, whereas the bitset will only take up one bit per node (plus 2x bits to account for a maximally unbalanced tree where the shortest branch is half the length of the longest branch).
Instead of using boolean property on a child we could define a red node as the one who has a child in the wrong place.
If we go this way all leaf nodes are guaranteed to to be black and we should swap parent with his sibling (making him red) when inserting a new node.

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning Left-Lean-Red-Black tree, from Prof.Robert Sedgewick
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand the insert of the 2-3 tree and the LLRB, I have spent totally like 40 hours now for 2 weeks and I still can't get the deletion of the LLRB.
Can anyone really explain the deletion of LLRB to me?
Ok I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there there imbalance/new nodes in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z which is now root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a child of Z, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right, and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height of new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.
There is an interesting comment about the Sedgwich implementation and in particular its delete method from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgwich pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgwich code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Follow the next strategy to delete an arbitrary node in a LLRB tree which is not in a leaf:
Transform a LLRB tree to a 2-3-4 tree (we do not need to transform the whole tree, only a part of the tree).
Replace the value of the node (which we want to delete) its successor.
Delete its successor.
Fix the tree (recover balance, see the book "Algorithms 4th edition" on the pages 435, 436).
If a value in a leaf then we do not need to use a successor to replase this value, but we still need to transform the current tree to 2-3-4 tree to delete this value.
The slide on the page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and the book "Algorithms 4th edition" on the page 437 are a key. They show how a LLRB tree transformations into a 2-3 tree. In the book "Algorithms 4th edition" on the page 442 https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 is an algorithm of transformation for trees.
For example, open the page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on the page 442 we transform these three nodes into the 4-node of a 2-3-4 tree. Then the node D has children B and F we also transform these nodes into a node of 2-3-4 tree. Then the node B has children A and C we also transform these nodes into a node of 2-3-4 tree. And finally we need to delete A. After deletion we need to recover balance. We move up through the tree and we restore balance of the tree (according to rules, see the book "Algorithms 4th edition" on the pages 435, 436). If you need to delete the node D (the same tree on the page 54). You need the same transformations and need to replace the value of the node D on the value of the node E and delete the node E (because it is a successor of D).

Resources