I am reading the Wikipedia explanation of the red-black tree removal process.
There is one simple thing which I am not able to understand.
Example: I have a totally black RBTree
         3(B)
        /    \
       /      \
   1(B)        5(B)
   /  \        /  \
  /    \      /    \
0(B)  2(B)  4(B)  6(B)
Wikipedia states that if the node to delete has 2 leaf children and its sibling is also a node with 2 leaf children, then we can simply remove the node and recolor the parent and sibling.
Suppose in the tree above I want to delete 0. Then no amount of recoloring 1 or 2 helps, because no matter what you do, the two subtrees (the 1 side and the 5 side) end up with different black heights.
What am I missing?
I found that Wikipedia had a very good explanation of insert, but the explanation of delete is confusing.
Deleting node "0" in your tree is actually the most complicated case. Let's follow the description from Wikipedia step by step:
We use the label M to denote the node to be deleted; C will denote a
selected child of M, which we will also call "its child". If M does
have a non-leaf child, call that its child, C; otherwise, choose
either leaf as its child, C.
So, in this case, M is your node "0", which will be deleted, and C is either of its NIL children (leaves, which are always NIL). As a reminder, your original tree, following your beautiful ASCII art, is:
         3(B)
        /    \
       /      \
   1(B)        5(B)
   /  \        /  \
  /    \      /    \
0(B)  2(B)  4(B)  6(B)
So M is "0", and note that C (the NIL leaf) is not drawn here. Wikipedia continues:
The complex case is when both M and C are black (this is our case).... In the diagrams
below, we will also use P for N's new parent (M's old parent), SL for
S's left child, and SR for S's right child.
So, P is "1", and S is "2", all nodes are black. Then you follow the case descriptions. You skip case 1, as "0" is not root. You skip case 2, as "2" is not red. Case 3 is matching:
Case 3: P, S, and S's children are black. In this case, we simply repaint S red. The result is that all paths passing through S, which
are precisely those paths not passing through N, have one less black
node. Because deleting N's original parent made all paths passing
through N have one less black node, this evens things up. However, all
paths through P now have one fewer black node than paths that do not
pass through P, so property 5 (all paths from any given node to its
leaf nodes contain the same number of black nodes) is still violated.
To correct this, we perform the rebalancing procedure on P, starting
at case 1.
So at this point, you delete "0", and repaint S = "2" red:
         3(B)
        /    \
       /      \
   1(B)        5(B)
      \        /  \
       \      /    \
      2(R)  4(B)  6(B)
Then, following the description, you go to Case 1, but this time with the parent node P (= "1") substituted as N. But N = "1" is not the root, so you jump to Case 2. S = "5" is not red, so you jump to Case 3 again. And here, you repaint S = "5" red:
         3(B)
        /    \
       /      \
   1(B)        5(R)
      \        /  \
       \      /    \
      2(R)  4(B)  6(B)
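Then we have to jump to Case 1 again, with P = "3" substituted as N this time. N = "3" is the root, so we are done! The tree is balanced: every path from the root down to a NIL leaf now passes through exactly two black nodes (not counting the NIL leaf itself).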
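To double-check the walkthrough, here is a minimal sketch that computes the black height of the three trees drawn above. The names `RbNode` and `DeleteWalkthrough` are made up for this illustration and are not from Wikipedia; `null` stands for a black NIL leaf, and the NIL is counted as one black node, so the numbers are one higher than a count that ignores NIL.

```java
/** Hypothetical immutable red-black node; null stands for a black NIL leaf. */
final class RbNode {
    final int key;
    final boolean black;
    final RbNode left, right;

    RbNode(int key, boolean black, RbNode left, RbNode right) {
        this.key = key;
        this.black = black;
        this.left = left;
        this.right = right;
    }
}

final class DeleteWalkthrough {
    /**
     * Number of black nodes on every path from n down to a NIL leaf
     * (the NIL itself counts as one), or -1 if two paths disagree.
     */
    static int blackHeight(RbNode n) {
        if (n == null) return 1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.black ? 1 : 0);
    }

    /** Right subtree rooted at "5", with "5" painted black or red. */
    static RbNode five(boolean black) {
        return new RbNode(5, black,
                new RbNode(4, true, null, null),
                new RbNode(6, true, null, null));
    }

    /** The all-black tree from the question. */
    static RbNode original() {
        RbNode one = new RbNode(1, true,
                new RbNode(0, true, null, null),
                new RbNode(2, true, null, null));
        return new RbNode(3, true, one, five(true));
    }

    /** "0" removed and S = "2" repainted red; "5" is black or red depending on the stage. */
    static RbNode afterDelete(boolean fiveIsBlack) {
        RbNode one = new RbNode(1, true, null, new RbNode(2, false, null, null));
        return new RbNode(3, true, one, five(fiveIsBlack));
    }

    public static void main(String[] args) {
        System.out.println(blackHeight(original()));         // 4
        System.out.println(blackHeight(afterDelete(true)));  // -1: the "1" side is one short
        System.out.println(blackHeight(afterDelete(false))); // 3
    }
}
```

The middle result is the imbalance the question worries about; it only disappears once the second pass through Case 3 repaints "5" red.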
In electrical engineering, we often deal with a hierarchy of module instances which can be represented as a tree where every tree node is an instance of a module. These modules are connected by signals which can be outputs, inputs or inouts (bi-directional drive direction). In general, the following rules apply:
1. Each node can have zero or more sub-nodes (children).
2. Each node can have one or more ports which connect them to other node(s) in the tree.
3. Each port can have one of three (drive) directions: input (from node to child), output (from child to node) and inout (both ways).
Given a tree with leaf nodes along with their port directions, I have been looking for a simple algorithm to resolve the intermediary port directions, i.e. the goal is to know the port directions of all nodes in the tree connecting the given leaf nodes.
Filtering the tree by signal simplifies the problem such that a given tree is a subset of the original tree with all except one signal removed. This changes rule 2 to
Each node (of the filtered tree) has exactly one port which connects it to all other nodes of the tree.
As for the rules to propagate directions:
1. If all children have the same direction, the current node's is the same.
2. If there's at least one child which is inout or there's at least one with input and one with output, it depends on other nodes which direction the current node will get (this is where it gets tricky).
Using the notation <X (node X is an output node), >Y (node Y is an input node) and <>Z (Z is a bidir node), here are some examples:
1:
    B-<D              <B-<D
   /                 /
  A         ==>     A
   \                 \
    C->E              >C->E

2:
    <B                  <B
   /                   /
  A                   A
   \    >D             \     >D
    \  /       ==>      \   /
     C->E                >C->E
      \                    \
       >F                   >F

3:
       <D                     <D
      /                      /
     B                     <>B
    / \                     / \
   /   >E                  /   >E
  A             ==>       A
   \     <G                \        <G
    \   /                   \      /
     C-F->H                  <>C-<>F->H
        \                           \
         >I                          >I

4:
       <D                 <D
      /                  /
     B                 <B
    / \                 / \
   /   >E              /   >E
  A           ==>     A
   \                   \
    C->F                >C->F

5:
       <D                 <D
      /                  /
     B                 >B
    / \                 / \
   /   >E              /   >E
  A           ==>     A
   \                   \
    C-<F                <C-<F
I think the algorithm will probably start at the leaves and propagate towards the root (A), applying propagation rule 1 above. If we hit a rule 2 case (such as nodes B, F and, possibly later, C in example 3), I'm currently unsure how to proceed. Maybe keep propagating other nodes until (when?) it is determined which direction the current node will get?
Edit 2019-08-14
As commenters have asked about which rules apply when resolving directions in some of the examples, here they are (I hope that covers all cases):
1. As mentioned above, if all children of a node have the same direction, the parent inherits it (see examples 1, 2 C, 4 C and 5 C).
2. If the children of a node have mixed directions, it depends:
   - If there are other nodes with mixed directions (such as between 3 B and 3 F), the nodes will have inouts.
   - If all other nodes have a single direction (as in examples 4 and 5, between nodes B and F), that determines the port direction of the node with the children of mixed directions.
We'll traverse the tree from left to right, and a limited number of nodes may have to be visited a second time. In every node, we'll store its status: input, output, bidirectional, or to be determined. Consider this example:
         -- A --
        /       \
       B         C
      / \       / \
     D   E    >F   >G
    / \ / \
  >H >I J> K>
We traverse to the leftmost leaf H, find that it is an input, move up to its parent D again, and provisionally mark D as an input. Then we look at the other child I, which is also an input, we move up to D again, and mark it provisionally as an input again. We then notice that D has no more children, and was marked only as an input, so we can mark it permanently as an input.
All of D's children of which the status could be determined will not have to be visited again, so we can ignore them from now on. So now we have:
         -- A --
        /       \
       B         C
      / \       / \
    >D   E    >F   >G
        / \
       J>  K>
We move up to B, and provisionally mark it as an input, and then go on to visit its other child E. As with D, we find that E's status can be determined as an output, and that its children J and K will not have to be visited again:
         -- A --
        /       \
       B         C
      / \       / \
    >D   E>   >F   >G
We move up to B again, and provisionally mark it as an output; it is now marked as both input and output, so we change its status to "to be determined". D and E will not have to be visited again. So now we have:
         -- A --
        /       \
      ?B?        C
                / \
              >F   >G
We move up to A, set its status provisionally to "to be determined", because it has a yet to be determined child, and then move down to the next node F. After examining F and G we find that C is an input, so we get:
       -- ?A? --
      /         \
    ?B?          >C
We move back up to A and find that we have visited all of its children, and that it has one undetermined child, and all other children are inputs. That means that the undetermined child becomes an output. We then propagate B's status as an output to any children it might have that also have an undetermined status (in the example there are none). This downward propagation means that in the worst case, the whole tree is traversed twice.
       -- ?A? --
      /         \
     B>          >C
There is some uncertainty here. If A had three children: one input, one output, and one to be determined, you have not yet defined a rule for what would happen to the to be determined child.
Let's look at another example:
         -- A --
        /       \
       B         C
      / \       / \
     D   E     F>  >G
    / \ / \
  >H I> J> >K
We find that B has two children that are both to be determined:
         -- A --
        /       \
      ?B?        C
      / \       / \
   ?D?   ?E?   F>  >G
So we make B bidirectional and propagate this to its to be determined offspring, to end up with:
         -- A --
        /       \
      >B>        C
                / \
               F>  >G
If I understand the rules correctly, this bidirectional status is then propagated to every undetermined node in the rest of the tree, without the need for additional traversal:
         -- A --
        /       \
      >B>       >C>
So the algorithm is:
Visit every node, starting from the root node
If the node is a leaf, remember its status, move up to its parent, and give the parent that status provisionally.
If you have visited every child of a node, look at its provisional status: if only input or output has been marked, then the node becomes an input or an output. If both input and output have been marked, then it becomes to be determined. If more than one child had status to be determined, the node becomes bidirectional, and this is propagated down to all its undetermined offspring. We then carry this status up to its parent provisionally.
Once we have set the status of any node to bidirectional, every undetermined node in the tree becomes bidirectional (if I understand the rules correctly).
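The step that combines the children's statuses once all of them have been visited could be sketched roughly like this. `PortStatus` and `PortCombiner` are hypothetical names, and how a bidirectional child affects its parent is an assumption on my part (here it counts as one more distinct direction, so a node whose children are all bidirectional becomes bidirectional itself, as C does in example 3 of the question). The downward propagation is not shown.

```java
import java.util.List;

/** The four states a node can be in (hypothetical names). */
enum PortStatus { INPUT, OUTPUT, BIDIRECTIONAL, UNDETERMINED }

final class PortCombiner {
    /** Combines the final statuses of all children into the parent's status. */
    static PortStatus combine(List<PortStatus> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("a leaf's status is given, not computed");
        }
        boolean sawInput = false, sawOutput = false, sawBidirectional = false;
        int undetermined = 0;
        for (PortStatus child : children) {
            switch (child) {
                case INPUT: sawInput = true; break;
                case OUTPUT: sawOutput = true; break;
                case BIDIRECTIONAL: sawBidirectional = true; break;
                default: undetermined++; break;
            }
        }
        // More than one undetermined child: the node becomes bidirectional.
        if (undetermined > 1) return PortStatus.BIDIRECTIONAL;
        // Assumption: a bidirectional child mixed with other directions is
        // treated like a mix of inputs and outputs.
        int kinds = (sawInput ? 1 : 0) + (sawOutput ? 1 : 0) + (sawBidirectional ? 1 : 0);
        if (undetermined == 1 || kinds > 1) return PortStatus.UNDETERMINED;
        if (sawInput) return PortStatus.INPUT;
        if (sawOutput) return PortStatus.OUTPUT;
        return PortStatus.BIDIRECTIONAL;
    }
}
```

With the first example, D (children >H, >I) combines to INPUT, E (J>, K>) to OUTPUT, and B (D and E) to UNDETERMINED; in the second example B has two undetermined children and therefore becomes BIDIRECTIONAL. Any extra rule can be slotted into this one method.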
Practical details:
How a child's status is remembered when you move up to the parent can be handled in several ways. You can store boolean flags for input and output, and a count for undetermined (so that you know whether there's more than one), and possibly a separate boolean flag for bidirectional (in case having a bidirectional child influences the parent in a different way than having a mix of input and output children). But you could also not store the states in the parent node, and look at the children's states again once you find that you've visited all the children.
A node's own status only has 4 states: input, output, bidirectional and undetermined, so that could be stored with just two booleans, where both false means undetermined.
You could store information in each node about which children have to be visited when a status is propagated down to undetermined nodes, but that again isn't strictly necessary; you can just look at the status of each child to update the undetermined ones. Only an undetermined node can have undetermined offspring, so you know that you don't have to visit the children of nodes that weren't undetermined.
The rule you mention in a comment, where a node with three children that are input, output and undetermined would mean the undetermined child becomes bidirectional, is indeed an additional rule. But at the point where you've visited all of a node's children, the details of how you combine the children's states into the parent's status can be easily modified to include any additional rule. If you decide that the undetermined child must become bidirectional, you'd then propagate this status to undetermined offspring of the child, and (if I understand the rules correctly) from that point on every undetermined node you find in the rest of the tree would automatically become bidirectional.
Given a binary arithmetic expression tree consisting of only addition and subtraction operators, and numbers, how to balance the tree as much as possible? The task is to balance the tree without evaluating the expression, that is the number of nodes should stay the same.
Example:
        +                       +
       / \                    /   \
      +   15     >>>>>>      -     +
     / \                    / \   / \
    5   -                  6   4 5   15
       / \
      6   4
Addition is commutative and associative and that allows for balancing. Commutativity allows for swapping of children of consecutive '+' nodes. Associativity allows for rotations. In the above example, the transformation performed can be viewed as
Rotation right on '+' at the root.
Swapping of '5' and '-' nodes.
I was thinking of doing an in order traversal and first balancing any sub-trees. I would try to balance any sub-tree with two consecutive '+' nodes by trying all possible arrangements of nodes (there are only 12 of them) to hopefully decrease the total height of the tree. This method should reduce the height of the tree by at most 1 at any step. However, I cannot determine whether it will always give a tree of minimum height, especially when there are more than 2 consecutive '+' nodes.
Another approach could be to read the expression tree into an array and substitute any '-' subtree with a variable, and then use DP to determine the best places for brackets. This must be done bottom up, so that any '-' subtree is already balanced when it is considered by the DP algorithm. However, I am worried because there could be (n+1)! ways to arrange nodes and brackets, while I am looking for an O(n) algorithm.
Is it a known problem and is there a specific approach to it?
At the risk of doing something vaguely like "evaluating" (although it isn't in my opinion), I'd do the following:
Change the entire tree to addition nodes, by propagating negation markers down to the leaves. A simple way to do this would be to add a "colour" to every leaf node. The colour of a node can be computed directly during a tree walk. During the walk, you keep track of the number (or the parity, since that's the only part we're interested in) of right-hand links taken from - nodes; when a leaf is reached, it is coloured green if the parity is even and red if the parity is odd. (Red leaves are negated.) During the walk, - nodes are changed to +.
        +                       +
       / \                     / \
      +   15     >>>>>>       +   15
     / \                     / \
    5   -                   5   +
       / \                     / \
      6   4                   6  -4
Now minimise the depth of the tree by constructing a minimum depth binary tree over top of the leaves, taking the leaves in order without regard to the previous tree structure:
        +                       +
       / \                    /   \
      +   15     >>>>>>      +     +
     / \                    / \   / \
    5   +                  5   6 -4  15
       / \
      6  -4
Turn the colours back into - nodes. The easy transforms are nodes with no red children (just remove the colour) and nodes with exactly one red child and one green child. These latter nodes are turned into - nodes; if the red child is on the left, then the children are also reversed.
The tricky case is nodes all of whose children are red. In that case, move up the tree until you find a parent which has some green descendant. The node you find must have two children (since otherwise its only child would have to have a green descendant), of which exactly one child has green descendants. Then, change that node to -, reverse its children if the right-hand child has a green descendant, and recolour green all the children of the (possibly new) right-hand child.
       +                        +
     /   \                    /   \
    +     +      >>>>>>      +     -
   / \   / \                / \   / \
  5   6 -4  15             5   6 15  4
Perhaps it's worth pointing out that the root node has a green descendant on the left-hand side because the very first leaf node is green. That's sufficient to demonstrate that the above algorithm covers all cases.
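Here is a rough sketch of the three steps, under some assumptions: `Expr` and `ExprBalancer` are hypothetical names, leaves hold plain integers, and instead of literally walking back up the tree for the all-red case, the rebuild returns a "red" flag for each subtree so that the nearest ancestor with a green sibling subtree becomes the - node. I believe this has the same effect as the recolouring described above, but treat it as an illustration rather than a reference implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expression node: op is '+' or '-', or 0 for a numeric leaf. */
final class Expr {
    final char op;
    final int value;
    final Expr left, right;

    Expr(int value) { this((char) 0, value, null, null); }
    Expr(char op, Expr left, Expr right) { this(op, 0, left, right); }
    private Expr(char op, int value, Expr left, Expr right) {
        this.op = op; this.value = value; this.left = left; this.right = right;
    }

    boolean isLeaf() { return op == 0; }
    int eval() {
        if (isLeaf()) return value;
        return op == '+' ? left.eval() + right.eval() : left.eval() - right.eval();
    }
    int height() { return isLeaf() ? 0 : 1 + Math.max(left.height(), right.height()); }
    int size() { return isLeaf() ? 1 : 1 + left.size() + right.size(); }
}

final class ExprBalancer {
    /** A subtree plus its colour: red means the subtree's value is negated. */
    private static final class Signed {
        final Expr tree;
        final boolean red;
        Signed(Expr tree, boolean red) { this.tree = tree; this.red = red; }
    }

    static Expr balance(Expr root) {
        List<Signed> leaves = new ArrayList<>();
        collect(root, false, leaves);
        // The first leaf is always green, so the root never comes back red.
        return build(leaves, 0, leaves.size()).tree;
    }

    /** Step 1: colour the leaves by the parity of right-hand links out of '-' nodes. */
    private static void collect(Expr n, boolean red, List<Signed> out) {
        if (n.isLeaf()) { out.add(new Signed(n, red)); return; }
        collect(n.left, red, out);
        collect(n.right, n.op == '-' ? !red : red, out);
    }

    /** Steps 2 and 3: minimum-depth tree over leaves[lo, hi), with '-' nodes restored. */
    private static Signed build(List<Signed> leaves, int lo, int hi) {
        if (hi - lo == 1) return leaves.get(lo);
        int mid = lo + (hi - lo + 1) / 2;
        Signed l = build(leaves, lo, mid);
        Signed r = build(leaves, mid, hi);
        if (l.red == r.red) {
            // Same colour: keep '+', the colour carries up to the parent.
            return new Signed(new Expr('+', l.tree, r.tree), l.red);
        }
        // One green, one red: emit '-' with the green subtree on the left.
        return l.red
                ? new Signed(new Expr('-', r.tree, l.tree), false)
                : new Signed(new Expr('-', l.tree, r.tree), false);
    }
}
```

For the example tree this yields (5 + 6) + (15 - 4), the same shape as the last diagram, in a single pass over the leaves and with the same number of nodes as the input.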
This may be a very easy question, but I could not find a satisfying answer. After a node is inserted into the red-black tree, three different cases can be encountered :
newly added node = z
Case 1 : z = red, parent of z = red, uncle of z = red
Case 2 : z = red, parent of z = red, z = right child, uncle of z = black
Case 3 : z = red, parent of z = red, z = left child, uncle of z = black
However, I think that we cannot directly enter case 2 or case 3. Assume that x and y are siblings and are red and black respectively. When we insert z under node x, case 2 or case 3 would be observed without entering case 1. However, this means that before adding node z, the red-black tree was not balanced, because the black-height rule was already broken.
           Grandparent
           /         \
      x(red)          y(black)
      /    \          /     \
 nil(b)  nil(b)  nil(b)   nil(b)
The node z can be added into one of the nil pointers of node x, but a valid tree can never look like this, because the red-black tree must be balanced after each insertion.
However, my algorithms professor rejected this theory, so I cannot be sure about this situation. Is it possible to end up in case 2 or case 3 without going through case 1?
Remember that nulls are black.
It happens like this:
        Grandparent
        /         \
    x(red)        nil(b)
    /    \
nil(b)  nil(b) <-- z goes here
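As a small illustration, the sketch below builds exactly this shape and classifies the violation that appears once z is attached. `InsNode` and `InsertCases` are names I made up; a `null` child or uncle is treated as a black nil. The case list in the question assumes the parent is a left child, so when the parent is a right child I mirror it (z as the "inner" child is case 2, as the "outer" child case 3), which is my assumption.

```java
/** Hypothetical mutable node with a parent link; a null child or uncle is a black nil. */
final class InsNode {
    final String name;
    boolean red;
    InsNode parent, left, right;

    InsNode(String name, boolean red) {
        this.name = name;
        this.red = red;
    }

    void setLeft(InsNode child) { left = child; child.parent = this; }
    void setRight(InsNode child) { right = child; child.parent = this; }
}

final class InsertCases {
    static boolean isRed(InsNode n) { return n != null && n.red; }

    /** Returns 0 if there is no red-red violation at z, otherwise case 1, 2 or 3. */
    static int classify(InsNode z) {
        InsNode p = z.parent;
        if (!isRed(z) || !isRed(p)) return 0;
        InsNode g = p.parent;
        if (g == null) return 0; // a red root is not covered by the three cases
        boolean parentIsLeft = (g.left == p);
        InsNode uncle = parentIsLeft ? g.right : g.left;
        if (isRed(uncle)) return 1;
        // Uncle is black (possibly nil): inner child is case 2, outer child is case 3.
        boolean zIsInner = parentIsLeft ? (p.right == z) : (p.left == z);
        return zIsInner ? 2 : 3;
    }
}
```

With a black grandparent, a red x as its left child and a nil uncle, attaching a red z as x's right child gives case 2 and attaching it as x's left child gives case 3, without case 1 ever being involved.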
I have looked around for a way to check this property of a red black tree: "Every path from a node to a null node must contain the same number of black nodes".
Most of the upvoted answers look like this:
// Return the black-height of node x. If its subtrees do not have
// the same black-height, call attention to it.
private int checkBlackHeight(Node x) {
    if (x == null)
        return 0;
    else {
        int leftBlackHeight = checkBlackHeight(x.left) +
            (x.left.isBlack() ? 1 : 0);
        int rightBlackHeight = checkBlackHeight(x.right) +
            (x.right.isBlack() ? 1 : 0);
        if (leftBlackHeight != rightBlackHeight)
            complain("blackheight error", x);
        return leftBlackHeight;
    }
}
What I am confused about is, doesn't this code only check for the leftmost and rightmost paths down the tree? How does it check for the inner paths?
e.g. In the tree below, it should check the path 11-9-8-.. and 11-16-18... but does it check 11-16-13- (some inner nodes) -...
        11
       /  \
      9    16
     / \   / \
    8  10 13  18
   /\  /\ /\  / \
Thank you in advance!
The reason the code ends up checking all the paths, the way I understand it at least, is because the function has both a "go left" and "go right" instruction, so for each node both left and right are explored, and so all the paths will be covered. Additionally, determining whether the left node is black simply determines whether to add one to the black path length (for each recursive call).
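To see this concretely, here is a variant of the check that records which nodes it visits and where it complains. It is only a sketch: `CNode` and `BlackHeightChecker` are hypothetical names, the colours of the example tree are not given in the question so the usage below assumes every node is black, and `isBlack` is written as a static helper so that a null child counts as black instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node type; a null child is a black nil. */
final class CNode {
    final int key;
    final boolean black;
    CNode left, right;

    CNode(int key, boolean black) {
        this.key = key;
        this.black = black;
    }

    CNode with(CNode l, CNode r) { left = l; right = r; return this; }
}

final class BlackHeightChecker {
    final List<Integer> visited = new ArrayList<>();    // keys in visiting order
    final List<Integer> complaints = new ArrayList<>(); // keys where the two sides differ

    static boolean isBlack(CNode n) { return n == null || n.black; }

    /** Same recursion as the code in the question, with bookkeeping added. */
    int check(CNode x) {
        if (x == null) return 0;
        visited.add(x.key);
        int leftBlackHeight = check(x.left) + (isBlack(x.left) ? 1 : 0);
        int rightBlackHeight = check(x.right) + (isBlack(x.right) ? 1 : 0);
        if (leftBlackHeight != rightBlackHeight) complaints.add(x.key);
        return leftBlackHeight;
    }
}
```

Running `check` on the 7-node tree from the question visits 11, 9, 8, 10, 16, 13, 18, so the inner nodes 10 and 13 are covered too. If an extra black node 12 is hung under 13, the first complaint is raised at 13 itself, which sits on an inner path, and then again at 16 and 11 on the way back up.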
I am studying data structures and algorithms, and one thing is really confusing me: the height of a binary tree, as it is also used in AVL search trees.
According to the book I am following, "DATA STRUCTURES by Lipschutz", "the depth (or height) of a tree T is the maximum number of nodes in a branch of T. This turns out to be 1 more than the largest level number of T. The tree T in figure 7.1 has depth 5."
figure 7.1 :
A
/ \
/ \
/ \
/ \
B C
/ \ / \
D E G H
/ / \
F J K
/
L
But in several other resources, height is calculated differently, even though the same definition is given. For example, I was reading this on the internet: http://www.cs.utexas.edu/users/djimenez/utsa/cs3343/lecture5.html
" Here is a sample binary tree:
1
/ \
/ \
/ \
/ \
2 3
/ \ / \
/ \ / \
/ \ / \
6 7 4 5
/ \ / /
9 10 11 8
The height of a tree is the maximum of the depths of all the nodes. So the tree above is of height 3. "
Another source http://www.comp.dit.ie/rlawlor/Alg_DS/searching/3.%20%20Binary%20Search%20Tree%20-%20Height.pdf
says, "Height of a Binary Tree
For a tree with just one node, the root node, the height is defined to be 0, if there are 2
levels of nodes the height is 1 and so on. A null tree (no nodes except the null node)
is defined to have a height of –1. "
Now these last 2 explanations comply with each other but not with the example given in the book.
Another source says "There are two conventions to define height of Binary Tree
1) Number of nodes on longest path from root to the deepest node.
2) Number of edges on longest path from root to the deepest node.
In this post, the first convention is followed. For example, height of the below tree is 3.
    1
   / \
  2   3
 / \
4   5
"
Here I want to ask: how can the number of nodes and the number of edges between the root and a leaf be the same?
And what will be the height of a leaf node? According to the book it should be 1 (the largest level number is 0, so the height should be 0 + 1 = 1), but it is usually said that the height of a leaf node is 0.
Also, why does the book mention depth and height as the same thing?
This is really confusing me. I have tried clarifying it from a number of sources but can't seem to choose between the two interpretations.
Please help.
==> I would like to add to this, since I am now accepting the conventions of the book.
In the topic of AVL search trees, where we need to calculate the BALANCE FACTOR (which is the difference of the heights of the left and right subtrees),
it says:
      C (-1)
     / \
(0) A   G (1)
       /
      D (0)
The numbers in the brackets are the balance factors.
Now, if I am to follow the book, the height of D is 1 and the right subtree of G has height -1 since it's empty, so the balance factor of G should be 1 - (-1) = 2!
Now why has it taken the height of D to be 0 here?
PLEASE HELP.
The exact definition of height doesn't matter if what you care about is balance factor. Recall that balance factor is
height(left) - height(right)
so if both are one larger or one smaller than in your favorite definition of height, the balance factor doesn't change, as long as you redefine the height of an empty tree accordingly.
Now the problem is that the "maximum number of nodes in a branch" definition is recursive but doesn't specify a base case. But since the height of a single-element tree is one according to this definition, the obvious choice for the height of a zero-element tree is zero, and if you work out the formulas you'll find this works.
You can also arrive at the zero value by observing that the base case of the other definition is -1, and otherwise it always gives a value one less than the "max. nodes in a branch" definition.
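A small sketch makes the point checkable. `TNode` and `Heights` are hypothetical names; the two methods implement the two conventions, each with its matching base case (0 for an empty tree when counting nodes, -1 when counting edges).

```java
/** Hypothetical binary tree node; null is the empty tree. */
final class TNode {
    final String name;
    final TNode left, right;

    TNode(String name, TNode left, TNode right) {
        this.name = name;
        this.left = left;
        this.right = right;
    }
}

final class Heights {
    /** Book convention: number of nodes on the longest branch; empty tree = 0. */
    static int heightInNodes(TNode t) {
        return t == null ? 0 : 1 + Math.max(heightInNodes(t.left), heightInNodes(t.right));
    }

    /** Other convention: number of edges on the longest branch; empty tree = -1. */
    static int heightInEdges(TNode t) {
        return t == null ? -1 : 1 + Math.max(heightInEdges(t.left), heightInEdges(t.right));
    }

    static int balanceByNodes(TNode t) {
        return heightInNodes(t.left) - heightInNodes(t.right);
    }

    static int balanceByEdges(TNode t) {
        return heightInEdges(t.left) - heightInEdges(t.right);
    }
}
```

For the tree in the question (C with children A and G, and D as the left child of G), both conventions give balance factors C = -1, A = 0, G = 1, D = 0. With the book's convention G gets 1 - 0 = 1, and with the edge convention it gets 0 - (-1) = 1; the value 2 only appears when D's height from one convention is mixed with the empty-tree height from the other.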