Deletion in red black tree - data-structures

I am trying to follow the RB-DELETE-FIXUP in Introduction to Algorithms, 3rd edition. They have this code:
RB-DELETE-FIXUP(T, x)
  while x != root[T] and color[x] == BLACK
      if x == left[p[x]]
          w = right[p[x]]
          if color[w] == RED
              color[w] = BLACK                  // Case 1
              color[p[x]] = RED                 // Case 1
              LEFT-ROTATE(T, p[x])              // Case 1
              w = right[p[x]]                   // Case 1
          if color[left[w]] == BLACK and color[right[w]] == BLACK
              color[w] = RED                    // Case 2
              x = p[x]                          // Case 2
          else
              if color[right[w]] == BLACK
                  color[left[w]] = BLACK        // Case 3
                  color[w] = RED                // Case 3
                  RIGHT-ROTATE(T, w)            // Case 3
                  w = right[p[x]]               // Case 3
              color[w] = color[p[x]]            // Case 4
              color[p[x]] = BLACK               // Case 4
              color[right[w]] = BLACK           // Case 4
              LEFT-ROTATE(T, p[x])              // Case 4
              x = root[T]                       // Case 4
      else (same as then clause with "right" and "left" exchanged)
  color[x] = BLACK
I am not able to understand how the tree is being balanced in case 4. Looking at this image: (from here)
The result for case 4 is not balanced: from D to A the black-height is 2, while from D to E it is 1. What am I missing here?

What you are missing is that the left-hand side is not balanced. This routine is called after the parent of x has been spliced out of the tree, and only if the parent was black. Since the tree was balanced prior to the removal of the parent, we know that the subtree rooted at A must have a black height that is one less than that of the subtree rooted at D. Since E is originally red and D is black, the subtree rooted at E must originally have the same black height as A. After the transformation, the color of E is black, so its black height is now one more than A's, and the two sides of the tree are indeed balanced.
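The black-height bookkeeping in this argument can be checked mechanically. Below is a small sketch (my own helper, not from the book) that computes the black height of a subtree, counting the NIL leaves as black, and mimics the recoloring of E:

```python
class Node:
    def __init__(self, color, left=None, right=None):
        self.color = color  # 'R' or 'B'
        self.left = left
        self.right = right

def black_height(node):
    """Black height including the NIL leaves (which are black).
    Returns -1 if some node's two sides have unequal black heights."""
    if node is None:
        return 1  # a NIL leaf contributes one black node
    lh = black_height(node.left)
    rh = black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1
    return lh + (1 if node.color == 'B' else 0)

# Mimic the situation in the answer: a black subtree A, and a red-rooted
# subtree E with the same black height. Recoloring E black raises its
# black height by one, restoring balance.
A = Node('B')                        # black height 2 (itself + NIL leaf)
E = Node('R', Node('B'), Node('B'))  # black height 2 as well
print(black_height(A), black_height(E))  # 2 2
E.color = 'B'
print(black_height(E))  # 3: one more than A, as the answer states
```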

Related

Find number of nodes in left and right subtree of a complete binary tree if total number of nodes are given in O(1)

I want to find the count of nodes in the left and right subtrees of a complete binary tree if the total number of nodes is given,
for example,
n = 5
===> left -> 3 right -> 1
n = 8
===> left -> 4 right -> 3
where n is the total number of nodes in the binary tree.
Is there any formula or O(1)/optimal solution to do this?
"Is there any formula or O(1)/optimal solution to do this?"
Yes.
Write the given 𝑛 (number of nodes) in binary notation, and turn every 1 into a 0, except the most significant 1. Call this number 𝑝; it is the largest power of two not exceeding 𝑛. Define 𝑘 as 𝑛 − 𝑝, so that in binary it is the same as 𝑛 but with its most significant digit removed. The number of nodes in the right subtree will be equal to max(𝑘, 𝑝/2 − 1).
It is then a piece of cake to know how many are in the left subtree, because the sum of nodes in both subtrees plus 1 (for the root node) must be equal to 𝑛.
Your example
When 𝑛 = 5, we note that down in binary as 0b101. We can see that 𝑝 = 0b100 and 𝑘 = 0b001. So the right subtree has max(𝑘, 𝑝/2 − 1) nodes, i.e. max(1, 1) = 1. The left subtree has the remaining nodes, i.e. 3, so that 3 + 1 + 1 (for root) = 5.
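The recipe translates directly into a couple of bit operations. A sketch in Python (the function name left_right is my own):

```python
def left_right(n):
    """Number of nodes in the left and right subtrees of the root of a
    complete binary tree with n nodes, in O(1)."""
    p = 1 << (n.bit_length() - 1)  # n with every 1-bit cleared except the top one
    k = n - p                      # n with its most significant bit removed
    right = max(k, p // 2 - 1)
    left = n - 1 - right           # the root accounts for the remaining node
    return left, right

print(left_right(5))  # (3, 1)
print(left_right(8))  # (4, 3)
```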
Other examples
Here is a table with the results for 𝑛 between 1 and 15:
  n | in binary |   p    |   k   | p/2 − 1 | in right subtree = max(k, p/2 − 1) | in left subtree
----+-----------+--------+-------+---------+------------------------------------+-----------------
  1 | 0b1       | 0b1    | 0b0   | 0b0     |                 0                  |        0
  2 | 0b10      | 0b10   | 0b0   | 0b0     |                 0                  |        1
  3 | 0b11      | 0b10   | 0b1   | 0b1     |                 1                  |        1
  4 | 0b100     | 0b100  | 0b00  | 0b1     |                 1                  |        2
  5 | 0b101     | 0b100  | 0b01  | 0b1     |                 1                  |        3
  6 | 0b110     | 0b100  | 0b10  | 0b1     |                 2                  |        3
  7 | 0b111     | 0b100  | 0b11  | 0b1     |                 3                  |        3
  8 | 0b1000    | 0b1000 | 0b000 | 0b11    |                 3                  |        4
  9 | 0b1001    | 0b1000 | 0b001 | 0b11    |                 3                  |        5
 10 | 0b1010    | 0b1000 | 0b010 | 0b11    |                 3                  |        6
 11 | 0b1011    | 0b1000 | 0b011 | 0b11    |                 3                  |        7
 12 | 0b1100    | 0b1000 | 0b100 | 0b11    |                 4                  |        7
 13 | 0b1101    | 0b1000 | 0b101 | 0b11    |                 5                  |        7
 14 | 0b1110    | 0b1000 | 0b110 | 0b11    |                 6                  |        7
 15 | 0b1111    | 0b1000 | 0b111 | 0b11    |                 7                  |        7

Calculating height for AVL tree while inserting node

I have checked three sources for AVL insert code. In all the cases, to calculate the height,
root.height = 1 + max(self.getHeight(root.left),
                      self.getHeight(root.right))
the above line is given.
Here is my query: why should we take the max of both left and right subtree heights and add one to that?
What if we are adding the node to the subtree with the smaller height? In that case both will have the same height H, not H+1.
I think the height should instead be updated as:
elif key < root.key:
    root.left = self.insertNode(root.left, key)
    root.height = 1 + self.getHeight(root.left)
else:
    root.right = self.insertNode(root.right, key)
    root.height = 1 + self.getHeight(root.right)
Am I correct? If yes, why do these people add one after taking the max?
Please use the full code below for verification. The code is taken from programiz.com and also verified against GeeksforGeeks.
def insertNode(self, root, key):
    if not root:
        return TreeNode(key)
    elif key < root.key:
        root.left = self.insertNode(root.left, key)
    else:
        root.right = self.insertNode(root.right, key)

    root.height = 1 + max(self.getHeight(root.left),
                          self.getHeight(root.right))

    balanceFactor = self.getBalance(root)
    if balanceFactor > 1:
        if key < root.left.key:
            return self.rightRotate(root)
        else:
            root.left = self.leftRotate(root.left)
            return self.rightRotate(root)
    if balanceFactor < -1:
        if key > root.right.key:
            return self.leftRotate(root)
        else:
            root.right = self.rightRotate(root.right)
            return self.leftRotate(root)
    return root
Suppose you have a tree like this:
        5
       / \
      /   \
     3     7
    /     / \
   2     6   8
              \
               9
The tree has a height of 3 (there are 3 branches between the root node 5 and the deepest leaf node 9).
The subtrees' heights are 1 for the left one (rooted at the node 3) and 2 for the right one (rooted at 7), and
3 = H(node(5)) = 1 + max(H(node(3)), H(node(7))) = 1 + max(1, 2)
Now suppose you add a node with a key 4 to the tree:
        5
       / \
      /   \
     3     7
    / \   / \
   2   4 6   8
              \
               9
The height of the tree rooted at node 3 did not increase: H(node(3)) still equals 1.
If you do the proposed replacement in the algorithm, your tree will erroneously get a height of 2 after the described insertion (1 + H(node(3))) instead of keeping the height equal to 3.
If your code has actually been "verified" by any programming site, then run away from that site and never trust it again.
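To see the difference concretely, here is a minimal sketch (a plain BST insert without rotations, written just for this demonstration) that applies both update rules to the example tree. Heights here count nodes, so a single node has height 1:

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.height = 1

def get_height(node):
    return node.height if node else 0

def insert(root, key, buggy=False):
    if not root:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key, buggy)
        if buggy:  # proposed update: only look at the side we inserted into
            root.height = 1 + get_height(root.left)
    else:
        root.right = insert(root.right, key, buggy)
        if buggy:
            root.height = 1 + get_height(root.right)
    if not buggy:  # correct update: taller of the two subtrees
        root.height = 1 + max(get_height(root.left), get_height(root.right))
    return root

for buggy in (False, True):
    root = None
    for key in (5, 3, 7, 2, 6, 8, 9, 4):  # the example tree, then insert 4
        root = insert(root, key, buggy)
    print(buggy, root.height)  # correct rule gives 4; proposed rule gives 3
```

After inserting 4, the proposed rule sets the root's height from the left subtree only (1 + 2 = 3), ignoring the taller right subtree, exactly the error described above.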

Assignment regarding, dynamic programming. Making my code more efficient?

I've got an assignment regarding dynamic programming.
I'm to design an efficient algorithm that does the following:
There is a path, covered in spots. The user can move forward to the end of the path using a series of push buttons. There are 3 buttons. One moves you forward 2 spots, one moves you forward 3 spots, one moves you forward 5 spots. The spots on the path are either black or white, and you cannot land on a black spot. The algorithm finds the smallest number of button pushes needed to reach the end (past the last spot, can overshoot it).
The user inputs are "n", the number of spots, and an array filled with n values of B or W (black or white). The first spot must be white. Here's what I have so far (it's only meant to be pseudocode):
int x = 0
int totalsteps = 0
n = user input
int countAtIndex[n-1] <- set all values to -1    // I'll do the nitty gritty stuff like this after
int spots[n-1] = user input

pressButton(totalsteps, x) {
    if (countAtIndex[x] != -1 AND totalsteps >= countAtIndex[x]) {
        FAILED                        // the value has already been set, and not improved on
    } else if (spots[x] == "B") {
        countAtIndex[x] = -2          // indicator of invalid spot
        FAILED
    } else if (x >= n-5) {            // within 5 of the end: press 5, take a step and win
        GIVE VALUE OF totalsteps + 1 AS SUCCESSFUL SHORTEST OUTPUT
        FINISH
    } else {
        countAtIndex[x] = totalsteps
        pressButton(totalsteps + 1, x+5)    // take 5 steps
        pressButton(totalsteps + 1, x+3)    // take 3 steps
        pressButton(totalsteps + 1, x+2)    // take 2 steps
    }
}
I appreciate this may look quite bad, but I hope it comes across okay; I just want to make sure the theory is sound before I write it out properly. I'm wondering if this is not the most efficient way of doing this problem. In addition, where there are capitals, I'm unsure how to "fail" the program, or how to return the "successful" value.
Any help would be greatly appreciated.
I should add, in case it's unclear, that I'm using countAtIndex[] to store the number of moves needed to get to that index in the path. E.g. position 3 (countAtIndex[2]) could have the value 1, meaning it took 1 move to get there.
I'm converting my comment into an answer since this will be too long for a comment.
There are always two ways to solve a dynamic programming problem: top-down with memoization, or bottom-up by systematically filling an output array. My intuition says that the implementation of the bottom-up approach will be simpler. And my intent with this answer is to provide an example of that approach. I'll leave it as an exercise for the reader to write the formal algorithm, and then implement the algorithm.
So, as an example, let's say that the first 11 elements of the input array are:
index:   0  1  2  3  4  5  6  7  8  9 10 ...
spot:    W  B  W  B  W  W  W  B  B  W  B ...
To solve the problem, we create an output array (aka the DP table), to hold the information we know about the problem. Initially all values in the output array are set to infinity, except for the first element which is set to 0. So the output array looks like this:
index:   0  1  2  3  4  5  6  7  8  9 10 ...
spot:    W  B  W  B  W  W  W  B  B  W  B
output:  0  -  x  -  x  x  x  -  -  x  -
where - is a black space (not allowed), and x is being used as the symbol for infinity (a spot that's either unreachable, or hasn't been reached yet).
Then we iterate from the beginning of the table, updating entries as we go.
From index 0, we can reach 2 and 5 with one move. We can't move to 3 because that spot is black. So the updated output array looks like this:
index:   0  1  2  3  4  5  6  7  8  9 10 ...
spot:    W  B  W  B  W  W  W  B  B  W  B
output:  0  -  1  -  x  1  x  -  -  x  -
Next, we skip index 1 because the spot is black. So we move on to index 2. From 2, we can reach 4,5, and 7. Index 4 hasn't been reached yet, but now can be reached in two moves. The jump from 2 to 5 would reach 5 in two moves. But 5 can already be reached in one move, so we won't change it (this is where the recurrence relation comes in). We can't move to 7 because it's black. So after processing index 2, the output array looks like this:
index:   0  1  2  3  4  5  6  7  8  9 10 ...
spot:    W  B  W  B  W  W  W  B  B  W  B
output:  0  -  1  -  2  1  x  -  -  x  -
After skipping index 3 (black) and processing index 4 (can reach 6 and 9), we have:
index:   0  1  2  3  4  5  6  7  8  9 10 ...
spot:    W  B  W  B  W  W  W  B  B  W  B
output:  0  -  1  -  2  1  3  -  -  3  -
Processing index 5 won't change anything because 7,8,10 are all black. Index 6 doesn't change anything because 8 is black, 9 can already be reached in three moves, and we aren't showing index 11. Indexes 7 and 8 are skipped because they're black. And all jumps from 9 are into parts of the array that aren't shown.
So if the goal was to reach index 11, the number of moves would be 4, and the possible paths would be 2,4,6,11 or 2,4,9,11. Or if the array continued, we would simply keep iterating through the array, and then check the last five elements of the array to see which has the smallest number of moves.
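The walk-through above can be turned directly into a short bottom-up implementation. A sketch in Python (the function name and the −1 "unreachable" return value are my own choices):

```python
import math

def min_pushes(spots):
    """Bottom-up DP: dp[i] = fewest button pushes needed to reach spot i."""
    n = len(spots)
    INF = math.inf
    dp = [INF] * n
    dp[0] = 0  # the problem guarantees the first spot is white
    for i in range(n):
        if dp[i] == INF:
            continue  # black or unreachable spot: nothing to propagate
        for move in (2, 3, 5):
            j = i + move
            if j < n and spots[j] == 'W':
                dp[j] = min(dp[j], dp[i] + 1)  # the recurrence relation
    # one more push from any of the last five spots overshoots the end
    best = min((dp[i] for i in range(max(0, n - 5), n)), default=INF)
    return best + 1 if best != INF else -1  # -1: the end is unreachable

print(min_pushes("WBWBWWWBBWB"))  # 4, matching the worked example above
```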

Given a red-black tree on n nodes, what is the maximum number of red nodes on any root to leaf path?

This was a quiz question. I'm not sure whether my answer was right. Please help me out.
Let's say the height is h. Since no two consecutive nodes (as we go up the tree) can be red, wouldn't the max number of red nodes be h/2? (h = log n)
Somehow, I feel that is not the correct answer.
Any help/input would be greatly appreciated!
Thank you so much in advance!
** Edit ** This answer assumes a definition of height as the number of nodes on the longest path from root to leaf (used, e.g., in lecture notes here), including the "virtual" black leaf nodes. The more common definition counts edges and does not include the leaf nodes; with that definition the answer is round(h/2), and if you include the leaf nodes in the height, round down: ⌊h/2⌋. ** Edit ends **
If you follow the rule that the root node is black, as in Wikipedia, then the correct answer is the largest integer smaller than h/2. This is just because the root and leaves are black, and half of the nodes (rounded up) in between can be red, i.e. round((h−2)/2).
You can also find the rule just by considering some small red-black trees of different heights.
Case h=1 root is black -> 0 red nodes
Case 'h=2' root is black and leaves are black -> 0 red nodes
Case h=3 root is black, second level can be red, and leaves must be black -> max 1 red node
Case h=4 root is black, second level can be red, third level must be black, and leaves must be black -> max 1 red node
Case h=5 black, red, black, red, black -> max 2 red nodes.
The h as a function of n is trickier, but it can be shown that h ≤ 2 log(n+1), which guarantees the logarithmic search time. For a proof see, e.g., Searching and Search Trees II (page 11). The proof is based on the fact that the rules of a red-black tree guarantee that a subtree rooted at x contains at least 2^bh(x) − 1 internal nodes, where bh(x) is the black height: the number of black nodes on a path from x down to a leaf. This is proven by induction. Then, noting that at least half of the nodes on any root-to-leaf path must be black (no two consecutive nodes can be red; since we are speaking of subtrees, the root itself may be red), we get bh(x) ≥ h/2. Using these results, n ≥ 2^bh(x) − 1 ≥ 2^(h/2) − 1. Solving for h, we get h ≤ 2 log(n+1).
As the question was a quiz, it should be enough to say that h is proportional to log(n) or even about log(n).
Let's first see how few nodes (minimising n) are needed to make a path with 1 red node (* is black):
    *
   / \
  *   R
     / \
    *   *
So n must be at least 5 when 1 red node is needed. This tree has 3 leaf nodes and 2 internal nodes; removing any node would force us to drop the red node as well to stay within the rules.
If we want to extend this tree to get a path with 2 red nodes we could apply the following two steps:
All leaves get two black children
The right-most leaf (just added) is turned into a red node, and it gets 2 black children.
The dollar signs are the added black nodes compared to the previous tree:
            *
           / \
          /   \
         *     R
        / \   / \
       $   $ *   *
            / \ / \
           $  $ $  R
                  / \
                 $   $
We chose to place the path with the red nodes on the right side; this choice does not influence the conclusions. Note that it does not help to add red nodes to other, shorter paths, as that only increases the number of nodes without increasing the number of red nodes on the longest path.
The number of leaf nodes (L) doubles with step 1, while the nodes that were leaves become internal nodes (I).
The second step increases both the number of internal nodes and the number of leaves by 1. More formally, we can find these formulas, where the subscript r represents the number of red nodes:
L_1 = 3
I_1 = 2
L_{r+1} = 2·L_r + 1
I_{r+1} = I_r + L_r + 1
Put in a table for increasing r:
r | L | I | n=L+I
----+-----+-----+-------
1 | 3 | 2 | 5
2 | 7 | 6 | 13
3 | 15 | 14 | 29
4 | 31 | 30 | 61
... | ... | ... | ...
We can see the following is true:
L_r = 2^(r+1) − 1
I_r = 2^(r+1) − 2
And so:
n_r = 2^(r+2) − 3
So we have a formula for the minimum number of nodes needed to have a path with r red nodes. But we need the inverse relation: the maximum r for a given n.
From the above we can derive:
r = ⌊ log2(n+3) ⌋ - 2
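These closed forms are easy to sanity-check with a few lines of Python (function names are mine):

```python
from math import log2, floor

def min_nodes(r):
    # minimum number of nodes needed for a path with r red nodes
    return 2 ** (r + 2) - 3

def max_red(n):
    # maximum r achievable with n nodes, from the derived inverse relation
    return floor(log2(n + 3)) - 2

# reproduce the table: r = 1..4 needs n = 5, 13, 29, 61 nodes
print([min_nodes(r) for r in (1, 2, 3, 4)])  # [5, 13, 29, 61]
# just below a threshold, the maximum drops by one: 13 nodes allow r = 2,
# but 12 nodes only allow r = 1
print(max_red(13), max_red(12))  # 2 1
```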

Wikipedia's pseudocode for Breadth-first search: How could n's parent be u if "n is adjacent to u"?

Studying the Breadth-first search algorithm, I encountered the following pseudo-code:
 1  Breadth-First-Search(G, v):
 2
 3      for each node n in G:
 4          n.distance = INFINITY
 5          n.parent = NIL
 6
 7      create empty queue Q
 8
 9      v.distance = 0
10      Q.enqueue(v)
11
12      while Q is not empty:
13
14          u = Q.dequeue()
15
16          for each node n that is adjacent to u:
17              if n.distance == INFINITY:
18                  n.distance = u.distance + 1
19                  n.parent = u
20                  Q.enqueue(n)
My question is regarding line 19 (n.parent = u):
How could n's parent be u if "n is adjacent to u"?
A parent is by definition adjacent to its children; they wouldn't be children without the connection. But that's not what this is about: the parent pointers are something completely separate from the structure of the graph. They are something new you build up that keeps track of where each node was first reached from.
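A short runnable version makes the separation explicit: the adjacency structure adj is the input graph, while parent is a new mapping built during the traversal (the example graph is my own):

```python
from collections import deque

def bfs(adj, source):
    """Return (distance, parent) maps; parent records from where each
    node was first reached -- it is not part of the graph itself."""
    distance = {source: 0}
    parent = {source: None}
    q = deque([source])
    while q:
        u = q.popleft()
        for n in adj[u]:            # n is adjacent to u ...
            if n not in distance:   # ... and not yet reached
                distance[n] = distance[u] + 1
                parent[n] = u       # u is the node we first reached n from
                q.append(n)
    return distance, parent

adj = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c']}
dist, par = bfs(adj, 'a')
print(dist['d'], par['d'])  # 2 b
```

Here d is adjacent to both b and c, but its parent is b: the node from which the search first reached it.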
