Rotation operations in a red-black tree - data-structures

I have been using Eric Roberts' Programming Abstractions textbook to strengthen my DSA skills. One exercise asks you to implement a red-black tree, and it comes with a figure titled "Rotations in a red-black tree".
I don't see why the tree on the left, which is a mirrored version of the tree on the right, satisfies the conditions for being a legitimate red-black tree. All paths from the root to a leaf must contain the same number of black nodes.
In the picture I highlighted one path in red: N2 -> N1 -> T3 gives us one black node, excluding the null pointer T3. But N2 -> N1 -> N4, highlighted in green, gives two black nodes. Contradiction.
Must some other operations be performed on the left tree to make it satisfy all the red-black tree properties?

I am blind and so cannot comment directly on the pictures you've posted. However, when an insert or remove operation is performed the tree may (but need not) become imbalanced. Only when the insert or remove fix-up is complete are you guaranteed that the tree conditions will be valid. There is no sequence where a single rotation on a valid tree, by itself, will result in a different valid tree.
Are you certain that the graphic is not simply illustrating what is meant by the tree rotation operation without commenting on whether the resulting tree is valid?
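To make the point about rotations concrete, here is a minimal sketch in Java. The class name RotationDemo, the int keys and the helper names are assumptions made for illustration; they do not come from the textbook. It rotates a valid tree without recoloring (the result is invalid), then shows a rotation used as one step of a fix-up, where a recoloring completes the repair.

```java
// Illustrative sketch only: names and keys are assumptions, not the textbook's code.
class RotationDemo {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key; boolean color; Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    // Plain right rotation: the left child moves up; no colors change.
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        return x;
    }

    // Black height of the subtree, or -1 if two paths disagree.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.color == BLACK ? 1 : 0);
    }

    // True if some red node has a red child.
    static boolean hasRedRed(Node n) {
        if (n == null) return false;
        boolean redChild = (n.left != null && n.left.color == RED)
                        || (n.right != null && n.right.color == RED);
        if (n.color == RED && redChild) return true;
        return hasRedRed(n.left) || hasRedRed(n.right);
    }

    static boolean isValid(Node root) {
        return root == null
            || (root.color == BLACK && blackHeight(root) >= 0 && !hasRedRed(root));
    }

    public static void main(String[] args) {
        // A valid tree: black 4 with black children 2 and 6.
        Node valid = new Node(4, BLACK,
                new Node(2, BLACK, null, null), new Node(6, BLACK, null, null));
        System.out.println(isValid(valid));              // true
        System.out.println(isValid(rotateRight(valid))); // false: black counts now differ

        // A mid-insert state: black 4, red 2, red 1 (two reds in a row).
        Node broken = new Node(4, BLACK,
                new Node(2, RED, new Node(1, RED, null, null), null), null);
        System.out.println(isValid(broken));             // false
        Node fixed = rotateRight(broken);                // 2 on top, 1 and 4 below
        fixed.color = BLACK;                             // recoloring finishes the fix-up
        fixed.right.color = RED;
        System.out.println(isValid(fixed));              // true
    }
}
```

The rotation only rearranges links; validity comes back only once the surrounding fix-up has also adjusted the colors.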

Related

Converting 2-3-4 to Red-Black Tree

I am familiar with converting individual 2-nodes, 3-nodes, and 4-nodes straight to red-black trees, and the Stack Overflow question "2-3-4 to Red-Black" has a good explanation. However, I have a question about the example given there.
This is how that Stack Overflow question illustrated the conversion (image: "2-3-4 to Red-Black").
I highlighted the part that I am questioning. The guide I found ("4-node connected to 2-node"), and others on the internet, say that when you encounter a 4-node connected to a 2-node or 3-node, you need to switch the colors around. But in the Stack Overflow example that I highlighted in red, they didn't. Why? Thanks
The following image does not imply that the colors of the red-black tree need to be swapped:
It merely describes the process of splitting a B-tree node, which leads to an alternative B-tree for the same data.
Then it shows how that different shape of the B-tree leads to a different coloring in the corresponding red-black tree, and how that new coloring is also a valid alternative.
But the translation from B-tree to red-black tree follows the rules you referred to:
If we look at the left side of the image, we see at the bottom layer of the B-tree a 4-node. According to the rules, this translates to a black node (c) with two red children (b and d). The 2-node at the root translates to a black node (a).
If we look at the right side of the image, we see two 2-nodes at the bottom layer of the B-tree. Each of these translates to a black node (b and d). The root 3-node is translated to a black node (a) with a red node (c) as child.
This is exactly what is depicted at the bottom of the image. The point is that these two variants are valid red-black trees for the same data, but derived from different shapes of B-trees.
Such a transition from the left to the right version might be needed when inserting a node. For instance, if an "e" were added, it could not be added as a child of the "d" node in the left-side red-black tree without recoloring, because "d" is red there. After switching to the right-side version of the red-black tree, the node can be added as a (red) child of node "d".
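The two colorings described above can be compared with a small sketch. The class name SplitColoringDemo and its helpers are assumptions for illustration. It models only the subtree rooted at c, which hangs below the black node a in both variants, and checks that both colorings give that subtree the same black height.

```java
// Illustrative sketch: only the subtree rooted at c is modelled; names are assumptions.
class SplitColoringDemo {
    static class Node {
        char key; boolean red; Node left, right;
        Node(char key, boolean red, Node left, Node right) {
            this.key = key; this.red = red; this.left = left; this.right = right;
        }
    }

    // Black height of the subtree, or -1 if two paths disagree.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    // Left side of the image: the 4-node (b, c, d) becomes black c with red b and d.
    static Node fromFourNode() {
        return new Node('c', false,
                new Node('b', true, null, null), new Node('d', true, null, null));
    }

    // Right side: c has moved up into a's node (red c), b and d are black 2-nodes.
    static Node fromSplitNodes() {
        return new Node('c', true,
                new Node('b', false, null, null), new Node('d', false, null, null));
    }

    // A new red leaf can be attached without recoloring only below a black parent.
    static boolean canTakeRedChild(Node parent) {
        return !parent.red;
    }

    public static void main(String[] args) {
        System.out.println(blackHeight(fromFourNode()));            // 1
        System.out.println(blackHeight(fromSplitNodes()));          // 1
        System.out.println(canTakeRedChild(fromFourNode().right));   // false: d is red
        System.out.println(canTakeRedChild(fromSplitNodes().right)); // true: d is black
    }
}
```

Both variants contribute the same number of black nodes below a, which is why either one is a valid red-black encoding of the same keys.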

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning about left-leaning red-black (LLRB) trees from Prof. Robert Sedgewick's material:
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I understand insertion for both the 2-3 tree and the LLRB tree, I have now spent about 40 hours over 2 weeks and I still can't get the deletion for LLRB.
Can anyone explain LLRB deletion to me?
OK, I am going to try this, and maybe the other good people of SO can help out. One way of thinking of red nodes is as indicators of
where there is imbalance (new nodes) in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z, which is now the root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a left child of Y, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all-black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that the left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.
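For readers who want to see the "push a red node down, then fix up on the way back" idea in code, here is a minimal sketch of deleteMin for the 2-3 variant. It is modelled on the structure of Sedgewick's published code as best recalled; the class name LlrbDeleteMinSketch, the int keys, and the check and min helpers are assumptions for illustration, and helper names such as moveRedLeft and flipColors should be verified against the original source.

```java
// Sketch of deleteMin in a left-leaning red-black tree (2-3 variant).
// Class name, int keys and the check/min helpers are illustrative assumptions.
class LlrbDeleteMinSketch {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key; boolean color = RED; Node left, right;  // new nodes are always red
        Node(int key) { this.key = key; }
    }

    Node root;

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    static Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    static Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    // Toggles the colors of h and both children, so it works in both directions.
    static void flipColors(Node h) {
        h.color = !h.color;
        h.left.color = !h.left.color;
        h.right.color = !h.right.color;
    }

    // Restores "no red right link" and "no two reds in a row" on the way up.
    static Node fixUp(Node h) {
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    void insert(int key) { root = insert(root, key); root.color = BLACK; }

    static Node insert(Node h, int key) {
        if (h == null) return new Node(key);
        if (key < h.key) h.left = insert(h.left, key);
        else if (key > h.key) h.right = insert(h.right, key);
        return fixUp(h);
    }

    // Pushes redness down so that h.left or one of its children becomes red.
    static Node moveRedLeft(Node h) {
        flipColors(h);
        if (isRed(h.right.left)) {          // borrow from the right sibling
            h.right = rotateRight(h.right);
            h = rotateLeft(h);
            flipColors(h);
        }
        return h;
    }

    void deleteMin() {
        if (root == null) return;
        if (!isRed(root.left) && !isRed(root.right)) root.color = RED;
        root = deleteMin(root);
        if (root != null) root.color = BLACK;
    }

    static Node deleteMin(Node h) {
        if (h.left == null) return null;    // h is the minimum and is red at this point
        if (!isRed(h.left) && !isRed(h.left.left)) h = moveRedLeft(h);
        h.left = deleteMin(h.left);
        return fixUp(h);                    // clean up leftover reds on the way back
    }

    // Black height of the subtree, or -1 if any LLRB invariant is violated.
    static int check(Node h) {
        if (h == null) return 0;
        if (isRed(h.right) || (isRed(h) && isRed(h.left))) return -1;
        int l = check(h.left), r = check(h.right);
        if (l < 0 || l != r) return -1;
        return l + (isRed(h) ? 0 : 1);
    }

    static int min(Node h) { while (h.left != null) h = h.left; return h.key; }

    public static void main(String[] args) {
        LlrbDeleteMinSketch t = new LlrbDeleteMinSketch();
        for (int k = 1; k <= 10; k++) t.insert(k);
        t.deleteMin();
        System.out.println(min(t.root) + " valid=" + (check(t.root) >= 0));
    }
}
```

The deleted node is always red by the time it is removed, so no black node disappears from any path; fixUp then removes the extra red links that the descent created.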
There is an interesting comment about the Sedgewick implementation, and in particular its delete method, from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgewick PDF you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgewick code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Use the following strategy to delete an arbitrary node that is not a leaf from an LLRB tree:
Transform the LLRB tree into a 2-3-4 tree (we do not need to transform the whole tree, only a part of it).
Replace the value of the node we want to delete with the value of its successor.
Delete the successor.
Fix the tree (restore balance; see the book "Algorithms 4th edition", pages 435 and 436).
If the value is in a leaf, we do not need a successor to replace it, but we still need to transform the current tree into a 2-3-4 tree to delete the value.
The slide on page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and page 437 of the book "Algorithms 4th edition" are the key. They show how an LLRB tree is transformed into a 2-3 tree. Page 442 of the book https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 gives an algorithm for the tree transformations.
For example, open page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on page 442, we transform these three nodes into a 4-node of a 2-3-4 tree. Then the node D has children B and F, and we also transform these nodes into a node of the 2-3-4 tree. Then the node B has children A and C, and we transform these nodes the same way. Finally we delete A. After the deletion we need to restore balance: we move up through the tree and rebalance it (according to the rules on pages 435 and 436 of "Algorithms 4th edition"). If you need to delete the node D instead (same tree on page 54), you need the same transformations, then you replace the value of the node D with the value of the node E and delete the node E (because it is the successor of D).

Deletion in Left Leaning Red Black Trees

I am learning about Left Leaning Red Black Trees.
In the deletion algorithm presented in the paper, if the key matches a node and the right subtree of that node is NULL, then that node is deleted. But there may be a left subtree as well, which is not considered.
I am not able to understand why would the left subtree be NULL as well. Similar thing is done when deleting the minimum or the maximum as well. Could anyone please guide me on this?
It seems you are speaking about this piece of code:
if (isRed(h.left))
    h = rotateRight(h);
if (key.compareTo(h.key) == 0 && (h.right == null))
    return null;
Here the left descendant cannot be "red" because the preceding code would have rotated it to the right.
The left descendant also cannot be "black", because in that case there would be a path to the left of h containing at least one "black" node while no path to the right of it has any "black" nodes. But in a red-black tree the number of black nodes on every path must be the same.
This means there is no left descendant at all and node h is a leaf node.
In the deleteMin function there is no need to check the right sub-tree if the left sub-tree is empty: both sub-trees must have the same black height, and an LLRB tree never has a red right child, so an empty left sub-tree implies an empty right sub-tree.
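The argument above can be checked mechanically. The following sketch builds LLRB trees by insertion and verifies that every node without a right child has either no left child or a single red leaf as its left child (the case the rotateRight call handles). The class name LlrbLeafInvariantDemo, the int keys and the helper names are assumptions for illustration; the insertion code is a standard 2-3 LLRB insert written from memory, not copied from the paper.

```java
// Illustrative sketch: checks the "right == null implies leaf or red left leaf" property.
// Names and keys are assumptions; the insert is a standard 2-3 LLRB insert.
class LlrbLeafInvariantDemo {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key; boolean color = RED; Node left, right;
        Node(int key) { this.key = key; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    static Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    static Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    static void flipColors(Node h) {
        h.color = !h.color;
        h.left.color = !h.left.color;
        h.right.color = !h.right.color;
    }

    static Node insert(Node h, int key) {
        if (h == null) return new Node(key);
        if (key < h.key) h.left = insert(h.left, key);
        else if (key > h.key) h.right = insert(h.right, key);
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) {
            root = insert(root, k);
            root.color = BLACK;
        }
        return root;
    }

    // True if every node with no right child has no left child, or a red leaf on the left.
    static boolean rightNullImpliesNoBlackLeft(Node h) {
        if (h == null) return true;
        if (h.right == null && h.left != null) {
            boolean redLeaf = isRed(h.left) && h.left.left == null && h.left.right == null;
            if (!redLeaf) return false;
        }
        return rightNullImpliesNoBlackLeft(h.left) && rightNullImpliesNoBlackLeft(h.right);
    }

    public static void main(String[] args) {
        Node root = build(5, 3, 8, 1, 4, 7, 9, 2, 6);
        System.out.println(rightNullImpliesNoBlackLeft(root));  // true
    }
}
```

After the rotateRight in the quoted snippet, the red left leaf (if any) has moved to the right, so a matching node whose right link is still null really is a leaf.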
There is an interesting analysis of whether left-leaning red-black trees are really better, or even simpler, than prior implementations. The article Left-Leaning Red-Black Trees Considered Harmful was written by Harvard Comp Sci professor Eddie Kohler. He writes:
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as
the default and describes 2–3 trees as a variant. The delete implementation, however,
only works for 2–3 trees. If you implement the default variant of insert and the
only variant of delete, your tree won’t work. The text doesn’t highlight the switch
from 2–3–4 to 2–3: not kind.

Binary Tree Definition

I see this definition of a binary tree in Wikipedia:
Another way of defining binary trees is a recursive definition on directed graphs. A binary tree is either:
A single vertex.
A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.
How then is it possible to have a binary tree with one root and one left son, like this:
  O
 /
O
This is a binary tree, right? What am I missing here?
And please don't just say "Wikipedia can be wrong", I've seen this definition in a few other places as well.
Correct. A tree can be empty (nil)
Let's assume you have two trees: one, that has one vertex, and one which is empty (nil). They look like this:
O .
Notice that I used a dot for the (nil) tree.
Then I add a new vertex, and edges from the new vertex to the two existing trees (notice that we do not take edges from the existing trees and connect them to the new vertex; that would be impossible). So now it looks like this:
  O
 / \
O   .
Since edges leading to (nil) are not drawn, here is what we end up with:
  O
 /
O
I hope it clarifies.
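The same construction can be written down directly. This is a minimal sketch, assuming a null reference plays the role of the empty (nil) tree; the class name BinaryTreeSketch and its helpers are made up for illustration.

```java
// Illustrative sketch: null stands for the empty (nil) tree.
class BinaryTreeSketch {
    static class Node {
        Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Number of vertices; the empty tree has none.
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    public static void main(String[] args) {
        Node single = new Node(null, null);   // a single vertex: two empty subtrees
        Node root = new Node(single, null);   // new vertex over (single, nil)
        System.out.println(size(root));       // 2
        System.out.println(root.right == null); // true: the nil edge is simply not drawn
    }
}
```

With the empty tree allowed as a building block, a root with only a left child is an ordinary result of the recursive rule.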
It depends on the algorithm you use for binary-tree: as for icecream, there are many flavors :)
One example is when you have a mix of node pointers and leaf pointers on a node, and a balancing system that decides to create a second node (whether it's the root or the other one) when you insert new values into a full node: instead of creating a root and 2 leaves by splitting it, it's much more economical to create just one more node.
Wikipedia can be wrong. Binary trees are finite data structures; a subtree must be allowed to be empty, otherwise binary trees would be infinite. The base case for the recursive definition of a binary tree must allow either a single node or the empty tree.
Section 14.4 of Touch of Class: An Introduction to Programming Well Using Objects
and Contracts, by Bertrand Meyer, Springer Verlag, 2009 (© Bertrand Meyer, 2009), has a better recursive definition of a binary tree:
Definition: binary tree
A binary tree over G, for an arbitrary data type G, is a finite set of items called
nodes, each containing a value of type G, such that the nodes, if any, are
divided into three disjoint parts:
• A single node, called the root of the binary tree.
• (Recursively) two binary trees over G, called the left subtree and right subtree.
The definition explicitly allows a binary tree to be empty (“the nodes, if any”).
Without this, of course, the recursive definition would lead to an infinite
structure, whereas our binary trees are, as the definition also prescribes, finite.
If not empty, a binary tree always has a root, and may have: no subtree; a
left subtree only; a right subtree only; or both.

Complete binary tree definitions

I have some questions on binary trees:
Wikipedia states that a binary tree is complete when "A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible." What does the last "as far left as possible" passage mean?
A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree (taken from How to determine if binary tree is balanced?). Is this correct, or is there some "jitter" on the value 1? I read in the answer I linked that there could also be a difference factor of 4 between the heights of the right and left trees.
Do the complete and height-balanced definitions just apply to binary tree or just any other tree?
Following the reference of the definition in wikipedia, I got to
this page. The definition was taken from there but modified:
Definition: A binary tree in which every level, except possibly the deepest, is completely filled. At depth n, the height of the
tree, all nodes must be as far left as possible.
It continues with a note below though,
A complete binary tree has 2^k nodes at every depth k < n and between 2^n and 2^(n+1) - 1 nodes altogether.
Sometimes definitions vary according to convenience (whatever is useful for the purpose at hand). That passage might be such a variation; as I understand it, it requires the nodes on the deepest level to fill the left side first (that is, the level is filled from left to right). The definition that I usually find is exactly as described above but without that passage.
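One way to read "as far left as possible" is through a level-order scan: once a missing child is seen, no further node may appear. The sketch below checks that reading; the class name CompleteTreeCheck and the method isComplete are assumptions for illustration.

```java
import java.util.LinkedList;
import java.util.Queue;

// Illustrative sketch: level-order test for the "filled from left to right" definition.
class CompleteTreeCheck {
    static class Node {
        Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // True if no node appears after a gap in level order. The empty tree counts as complete.
    static boolean isComplete(Node root) {
        Queue<Node> queue = new LinkedList<>();  // LinkedList permits null entries
        queue.add(root);
        boolean seenGap = false;
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n == null) {
                seenGap = true;                  // a missing position
                continue;
            }
            if (seenGap) return false;           // a node to the right of a gap
            queue.add(n.left);
            queue.add(n.right);
        }
        return true;
    }

    public static void main(String[] args) {
        Node leftOnly = new Node(new Node(null, null), null);
        Node rightOnly = new Node(null, new Node(null, null));
        System.out.println(isComplete(leftOnly));   // true
        System.out.println(isComplete(rightOnly));  // false
    }
}
```

A root with only a right child fails the check because the left position on the last level is empty while a position to its right is occupied.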
Usually the definition taken for height-balanced tree is the one you
described. In other words:
A tree is balanced if and only if for every node the heights of its two subtrees differ by at most 1.
That definition was taken from here. Again, sometimes definitions are made more flexible to serve specific purposes. For example, the definition of an AVL tree says that
In an AVL tree, the heights of the two child subtrees of any node
differ by at most one
Still, I remember once I had to rewrite an algorithm so that the tree
would be considered height-balanced if the two child subtrees of any
node differed by at most 2. Note that the definition you gave is recursive, this is very common for binary trees.
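The recursive definition, with the allowed difference as a parameter, might be sketched as follows. The class name HeightBalanceCheck, the tolerance parameter and the convention that the empty tree has height 0 are assumptions for illustration.

```java
// Illustrative sketch: height-balance test with a configurable tolerance.
class HeightBalanceCheck {
    static class Node {
        Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Height of the subtree (empty tree = 0), or -1 if some node exceeds the tolerance.
    static int balancedHeight(Node n, int tolerance) {
        if (n == null) return 0;
        int l = balancedHeight(n.left, tolerance);
        if (l < 0) return -1;
        int r = balancedHeight(n.right, tolerance);
        if (r < 0) return -1;
        if (Math.abs(l - r) > tolerance) return -1;
        return 1 + Math.max(l, r);
    }

    static boolean isHeightBalanced(Node root, int tolerance) {
        return balancedHeight(root, tolerance) >= 0;
    }

    public static void main(String[] args) {
        // A chain of three nodes, each the left child of the previous one.
        Node chain = new Node(new Node(new Node(null, null), null), null);
        System.out.println(isHeightBalanced(chain, 1));  // false: heights 2 and 0 at the root
        System.out.println(isHeightBalanced(chain, 2));  // true under the relaxed rule
    }
}
```

With tolerance 1 this is the usual AVL-style condition; larger values give the relaxed variants mentioned above.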
In a tree whose number of children per node is variable, you wouldn't be able to say whether it is complete (any parent could have as many children as you want). Still, the definition can apply to n-ary trees (with a fixed number n of children).
Do the complete and height-balanced definitions just apply to binary
tree or just any other tree?
Short answer: Yes, it can be extended to any n-ary tree.

Resources