Converting 2-3-4 to Red-Black Tree - data-structures

I am familiar with converting individual 2-nodes, 3-nodes, and 4-nodes straight to red-black trees, and this Stack Overflow link, 2-3-4 to Red-Black, is a good explanation. However, I have a question about the example given in that link.
This is how the Stack Overflow question was illustrated: 2-3-4 to Red-Black
I highlighted the part that I am questioning. Why is it that in this guide I found, 4-node connected to 2-node, and others on the internet, they say that when encountering a 4-node connected to a 2-node or 3-node you need to switch the colors around, but in the Stack Overflow example that I highlighted in red they didn't? Thanks

The following image does not imply that the colors of the red-black tree need to be swapped:
It merely describes the process of splitting a B-tree node, which leads to an alternative B-tree for the same data.
Then it shows how that different shape of the B-tree leads to a different coloring in the corresponding red-black tree, and how that new coloring is also a valid alternative.
But the translation from B-tree to red-black tree follows the rules you referred to:
If we look at the left side of the image, we see at the bottom layer of the B-tree a 4-node. According to the rules, this translates to a black node (c) with two red children (b and d). The 2-node at the root translates to a black node (a).
If we look at the right side of the image, we see two 2-nodes at the bottom layer of the B-tree. These each translate to a black node (b and d). The root 3-node translates to a black node (a) with a red node (c) as a child.
This is exactly what is depicted at the bottom of the image. The point is that these two variants are valid red-black trees for the same data, but derived from different shapes of B-trees.
Such a transition from the left to the right version might be needed when inserting a node. For instance, if an "e" were added, it could not be added as a child of the "d" node in the red-black tree without recoloring. By switching to the right-side version of the red-black tree, the node can be added as a (red) child of node "d".
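To see why no extra color swap is needed beyond the translation rules: in red-black terms, splitting the 4-node in the image is nothing more than a recoloring of c and its two children. A minimal sketch (the Node shape and names are illustrative, not taken from the linked answer):

```java
class SplitSketch {
    static final boolean RED = true, BLACK = false;

    static class Node {
        char key;
        Node left, right;
        boolean color;          // color of the link from the parent
    }

    // Splitting the 4-node (b, c, d) is just a color flip on c:
    //   before: c black with red children b and d  -> one 4-node in the B-tree view
    //   after:  c red with black children b and d  -> two 2-nodes, with c absorbed
    //           into its parent's B-tree node (the root a, which simply stays black)
    static void split(Node c) {
        c.color = RED;
        c.left.color = BLACK;
        c.right.color = BLACK;
    }
}
```

The result is exactly the right-hand red-black tree in the image: black a with red child c, and black b and d below. The colors come out "swapped" purely as a consequence of splitting the B-tree node, not because of an extra rule.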

Related

Red Black Tree Insertion & Deletion Uniqueness

I've been learning and working on implementing a red-black tree data structure. I'm following this article on red-black tree deletion examples and looking at example 5 they have:
When I insert the same nodes into my tree, I get the following:
I understand that red black trees are not unique (I think), therefore both of the above trees are valid since they don't violate any of the properties.
In the example article, after deleting node 1, they get the following:
But after deleting node 1 in my code, I get the following:
Since in my case, node 1 is red, I don't call my delete_fix function which takes care of re-arranging the tree and such. The deletion algorithm I was following simply states to call a delete_fix function if the node to be deleted is black.
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized. It still follows the rules of the red-black tree though. Is this to be expected with red-black trees or am I missing something here?
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized.
It is optimised. Your tree will be fast at deleting nodes 5, 7, 20 & 28; the other, only 5 & 7.
Bear in mind that red-black trees can be bushy in one direction. If the black height of the tree is N, then the minimum path from root to leaf is N nodes (all black) and the maximum path is 2 * N nodes (alternating black-red-black-red, etc.). If you try to add a new node to a bushy path that is at maximum height, the tree will recolour and/or rebalance.
If you want a more balanced search tree you should use an AVL tree. Red-Black trees favour minimal insertion/deletion fixups over finding a node. Your tree is fine.
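Both shapes would also pass a mechanical check of the red-black properties. A minimal sketch of such a check (assuming a simple Node with a boolean red flag; your own node class will differ):

```java
class RedBlackCheck {
    static class Node { int key; Node left, right; boolean red; }

    static boolean isRed(Node h) { return h != null && h.red; }

    // Returns the black height of the subtree rooted at h, or throws if an
    // invariant is broken: no red node has a red parent, and every
    // root-to-null path contains the same number of black nodes.
    static int blackHeight(Node h, boolean parentIsRed) {
        if (h == null) return 1;                      // null links count as black leaves
        if (isRed(h) && parentIsRed)
            throw new IllegalStateException("red node with a red parent");
        int left  = blackHeight(h.left,  isRed(h));
        int right = blackHeight(h.right, isRed(h));
        if (left != right)
            throw new IllegalStateException("paths with different black heights");
        return left + (isRed(h) ? 0 : 1);
    }
}
```

Both your tree and the article's tree pass a check like this; they simply distribute the red links differently.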

Rotation operations in a red-black tree

I have been using Eric Roberts' Programming Abstractions textbook to strengthen my DSA skills. There is an exercise to implement a red-black tree, and there is a figure, Rotations in a red-black tree.
I don't see why the tree on the left, which is a mirrored version of the tree on the right, satisfies the conditions for being a legitimate red-black tree. All paths from the root to a leaf must contain the same number of black nodes.
In the picture I highlighted the path in red. N2 -> N1 -> T3 gives us one black node, excluding the null pointer T3. But N2 -> N1 -> N4, highlighted in green, gives two black nodes. Contradiction.
Must some other operations be performed on the left tree to make it satisfy all R-B trees properties?
I am blind and so cannot comment directly on the pictures you've posted. However, when an insert or remove operation is performed the tree may (but need not) become imbalanced. Only when the insert or remove fix-up is complete are you guaranteed that the tree conditions will be valid. There is no sequence where a single rotation on a valid tree, by itself, will result in a different valid tree.
Are you certain that the graphic is not simply illustrating what is meant by the tree rotation operation without commenting on whether the resulting tree is valid?
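For reference, a single rotation by itself is only a restructuring that preserves the BST ordering; here is a rough sketch (illustrative names, not the book's code):

```java
class RotationSketch {
    static class Node { int key; Node left, right; boolean red; }

    // Left rotation around h; rotateRight is the mirror image with "left" and
    // "right" swapped. Note that this alone can change the number of black nodes
    // on some root-to-leaf paths, which is why red-black implementations pair a
    // rotation with recoloring (for example x.red = h.red; h.red = true;) inside
    // a larger fix-up step rather than using it in isolation.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        return x;               // x takes h's place as the root of this subtree
    }
}
```

So the figure is most likely showing the mechanics of the rotation operation on its own, as suggested above, rather than claiming that both trees are valid red-black trees.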

Can anyone explain the deletion of a Left-Leaning Red-Black tree clearly?

I am learning the Left-Leaning Red-Black tree from Prof. Robert Sedgewick:
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand insertion for the 2-3 tree and the LLRB, I have now spent a total of about 40 hours over 2 weeks and I still can't get the deletion of the LLRB.
Can anyone really explain the deletion of LLRB to me?
Ok, I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there is imbalance / where the new nodes are in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z which is now root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a child of Y, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right, and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height of new nodes in the red subtree.
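The "rotate right, recolor" and "color flip" steps in that walkthrough are what the left-leaning fix-up does on the way back up after an insertion. A hedged sketch, loosely following the LLRB paper (not a verified copy of its code):

```java
class LLRBSketch {
    static class Node { int key; Node left, right; boolean red; }

    static boolean isRed(Node x) { return x != null && x.red; }

    static Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.red = h.red; h.red = true; return x;
    }

    static Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.red = h.red; h.red = true; return x;
    }

    // Color-neutral flip: complement h and both children. During insertion this
    // turns a locally balanced pair of red children black and passes the
    // redness up to the parent.
    static void flipColors(Node h) {
        h.red = !h.red;
        h.left.red = !h.left.red;
        h.right.red = !h.right.red;
    }

    // Applied to each node on the way back up after inserting a new red leaf.
    static Node fixUp(Node h) {
        if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);   // keep red links leaning left
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);  // two reds in a row
        if (isRed(h.left) && isRed(h.right))     flipColors(h);       // pass the redness up
        return h;
    }
}
```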
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
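For concreteness, "pushing a red node down in front of the search" before stepping left looks roughly like this (same Node and helpers as the sketch above; treat it as an approximation of the paper's approach, not a quotation of its code):

```java
// (add to the LLRBSketch class above)
// Ensure h.left or h.left.left is red before we descend left, so the node we
// finally remove at the bottom of the search path can be made red.
static Node moveRedLeft(Node h) {
    flipColors(h);                        // borrow a red from the parent
    if (isRed(h.right.left)) {            // if the right sibling has a key to spare...
        h.right = rotateRight(h.right);   // ...move it across instead
        h = rotateLeft(h);
        flipColors(h);                    // and give the borrowed red back
    }
    return h;
}
```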
Not sure if that helps, or if we need to get into more detailed examination of code.
There is an interesting comment about the Sedgewick implementation, and in particular its delete method, from a Harvard computer science professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgewick pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgewick code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Follow this strategy to delete an arbitrary node in an LLRB tree that is not in a leaf:
Transform the LLRB tree into a 2-3-4 tree (we do not need to transform the whole tree, only a part of it).
Replace the value of the node (which we want to delete) with the value of its successor.
Delete its successor.
Fix the tree (recover balance, see the book "Algorithms 4th edition" on the pages 435, 436).
If the value is in a leaf, then we do not need to use a successor to replace it, but we still need to transform the current tree into a 2-3-4 tree to delete the value.
The slide on page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and page 437 of the book "Algorithms, 4th edition" are the key. They show how an LLRB tree transforms into a 2-3 tree. Page 442 of the book https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 gives the transformation algorithm for trees.
For example, open page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on page 442, we transform these three nodes into a 4-node of a 2-3-4 tree. Then node D has children B and F; we also transform these into a node of the 2-3-4 tree. Then node B has children A and C; we also transform these into a node of the 2-3-4 tree. Finally we delete A. After the deletion we need to recover balance: we move up through the tree and restore its balance (according to the rules; see "Algorithms, 4th edition", pages 435 and 436). If you need to delete node D (in the same tree on page 54), you need the same transformations, and you need to replace the value of node D with the value of node E and delete node E (because it is the successor of D).
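As a compact illustration of steps 3 and 4 (delete at the bottom, then recover balance on the way back up), here is a hedged deleteMin-style sketch in the spirit of RedBlackBST.java, reusing moveRedLeft and fixUp as sketched in the earlier answer; it is not the book's verbatim code:

```java
// Delete the minimum key from the subtree rooted at h (h assumed non-null).
// In the 2-3-4 view, moveRedLeft makes sure we never step into a 2-node on
// the way down; fixUp restores the left-leaning invariants on the way back up.
static Node deleteMin(Node h) {
    if (h.left == null) return null;                  // h is the minimum; remove it
    if (!isRed(h.left) && !isRed(h.left.left))
        h = moveRedLeft(h);                           // guarantee a red on the left spine
    h.left = deleteMin(h.left);
    return fixUp(h);
}
```

Deleting an arbitrary key combines the same idea with the successor replacement described in steps 2 and 3.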

What is the name (if any) for this kind of tree?

I have this tree which, for each node, has exactly 10 child nodes (0-9). Each node has some associated data (say, for example, a name, a tag, and a color) which, I guess, isn't important for this question. Each of the child nodes has exactly 10 child nodes. A node can be null (which 'ends' the branch) or contain another node.
To visualize what I'm talking about I made this diagram (fear my paintz0r skillz!):
A black box is a null node. A white box is a node which contains data and child nodes. As you can see, each node, even the root, has exactly 10 child nodes. For simplicity, and to keep the diagram sane, I have drawn some nodes very tiny, but you can imagine these tiny nodes being the same.
This structure allows me to traverse a path consisting of digits very quickly: a path of 47352 would lead me down the "orange path" to the final destination 4->7->3->5, where the final 2 cannot be resolved because that last one is a null node (although colored red) and contains no child nodes.
My question is pretty simple actually: what is this kind of tree called? I have gone through all the trees on Wikipedia's Tree (data structure) article and the closest I (think I) could get is the Octree and/or K-ary tree. Along those lines of reasoning my tree would be called a Dectree, Decitree, 10-ary tree or 10-way tree or something. But there might be a better name for this. So: anyone?
K-ary tree with K=10
In graph theory, a k-ary tree is a rooted tree in which each node has no more than k children. It is also sometimes known as a k-way tree, an N-ary tree, or an M-ary tree. A binary tree is the special case where k=2.
This is something like a B-tree.
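A minimal sketch of the structure being described, written as a 10-ary tree with a fixed array of child slots per node (the class and field names are just illustrative):

```java
class DigitTree {
    static class Node {
        // Example payload; the question mentions a name, a tag and a color.
        String name;
        String tag;
        int color;

        // Exactly 10 child slots, one per digit 0-9; a null slot "ends" that branch.
        final Node[] children = new Node[10];
    }

    final Node root = new Node();

    // Follow a path of digits such as "47352"; returns null as soon as a null
    // child is hit (like the red box in the diagram, where the final 2 cannot
    // be resolved).
    Node find(String digits) {
        Node node = root;
        for (char c : digits.toCharArray()) {
            node = node.children[c - '0'];
            if (node == null) return null;
        }
        return node;
    }
}
```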

How does a red-black tree work?

There are lots of questions around about red-black trees but none of them answer how they work. Why is it called red-black? How does this keep the tree balanced (thus increasing performance over an unbalanced normal binary search tree)? I'm just looking for an overview of how and why it works.
For searches and traversals, it's the same as any binary tree.
For inserts and deletes, more sophisticated algorithms are applied which aim to ensure that the tree cannot be too unbalanced. These guarantee that all single-item operations will always run in at worst O(log n) time, whereas a simple binary tree can become so unbalanced that it's effectively a linked list, giving O(n) worst-case performance for each single-item operation.
The basic idea of the red-black tree is to imitate a B-tree with up to 3 keys and 4 children per node. B-trees (or variations such as B+ trees) are mainly used for database indexes and for data stored on hard disk.
Each binary tree node has a "colour" - red or black. Each black node is, in the B-tree analogy, the subtree root for the subtree that fits within that B-tree node. If this node has red children, they are also considered part of the same B-tree node. So it is possible (though not done in practice) to convert a red-black tree to a B-tree and back, with (most) structure preserved. The only possible anomaly is that when a B-tree node has two keys and three children, you have a choice of which key goes in the black node in the equivalent red-black tree.
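To make the grouping concrete, here is a small sketch (the class and field names are illustrative assumptions, not from any particular library) that collects the keys of the "virtual B-tree node" formed by a black node together with any red children:

```java
import java.util.ArrayList;
import java.util.List;

class BTreeViewSketch {
    static class Node { int key; Node left, right; boolean red; }

    // A black node plus its red children behaves like one B-tree node holding
    // 1, 2 or 3 keys (a 2-node, 3-node or 4-node); its subtrees are the
    // black-rooted subtrees hanging below that group.
    static List<Integer> bTreeNodeKeys(Node black) {
        List<Integer> keys = new ArrayList<>();
        if (black.left != null && black.left.red)  keys.add(black.left.key);
        keys.add(black.key);
        if (black.right != null && black.right.red) keys.add(black.right.key);
        return keys;        // already in sorted order, thanks to the BST ordering
    }
}
```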
For example, with red-black trees, every path from root to leaf has the same number of black nodes. This rule is derived from the B-tree rule that all leaf nodes are at the same depth.
Although this is the basic idea from which red-black trees are derived, the algorithms used in practice for inserts and deletes are modified to enforce all the B-tree rules (there might be a minor exception - I forget) during updates, but are tailored for the binary tree form. This means that doing a red-black tree insert or delete may give a different structure for the result than that you'd expect comparing with doing the B-tree insert or delete.
For more detail, follow the Wikipedia link that MigDus already supplied.
A red-black tree is an ordered binary tree where each vertex is coloured red or black. The intuition is that a red vertex should be seen as being at the same height as its parent (i.e., an edge to a red vertex is thought of as "horizontal" rather than "descending").
[I don't believe the Wikipedia entry makes this point clear.]
The usual rules for red-black trees require that a red vertex never point to another red vertex. This means that the possible vertex arrangements for any subtree rooted with a black vertex (bbb, bbr, rbb, rbr -- for [left child][root][right child]) correspond to 2-3-4 trees.
Searching a red-black tree is just the same as searching an ordinary binary tree. Insertion and deletion are similar, except that a "fix-up" rotation may be required at some point to preserve the red-black invariant.
Cheers!
