Properties of a red-black tree - data-structures

          35 (black)
        /    \
      21      54            (whole row is red)
     /  \    /  \
   14   27  42   74         (whole row is black)
                   \
                    90      (red)
Would this classify as a red-black tree? I have not spotted any violations. Besides the rule that you can't have two consecutive red nodes, what are the main properties I should look out for?

There are no violations in the above tree.
The main properties to look out for are:
1) The root is black.
2) There cannot be two consecutive red nodes.
3) You need to add NIL nodes as the leaves; their color is taken as black.
4) Every path from a node down to its descendant NIL leaves contains the same number of black nodes (the black height). In the tree above, every path from the root contains 3 black nodes, counting the root and the NIL leaf.
You can read about them here: Red Black Tree Properties
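A small Python sketch that checks the color properties listed above on the question's tree, assuming the diagram means that 90 is the right child of 74. NIL leaves are represented by `None` and counted as black; the `Node` class and the function names are illustrative only, and the check does not verify BST key ordering.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node]) -> int:
    """Return the number of black nodes on every path from `node` down to a
    NIL leaf (counting `node` and the NIL leaf itself), or raise ValueError
    if a red-black color property is violated in this subtree."""
    if node is None:
        return 1  # NIL leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has red child {child.key}")
    left_bh = black_height(node.left)
    right_bh = black_height(node.right)
    if left_bh != right_bh:
        raise ValueError(f"paths below {node.key} have different black heights")
    return left_bh + (1 if node.color == BLACK else 0)


def check_red_black(root: Optional[Node]) -> int:
    """Check all color properties, including a black root; return the black height."""
    if root is not None and root.color != BLACK:
        raise ValueError("root must be black")
    return black_height(root)


tree = Node(35, BLACK,
            Node(21, RED, Node(14, BLACK), Node(27, BLACK)),
            Node(54, RED, Node(42, BLACK),
                 Node(74, BLACK, None, Node(90, RED))))
print(check_red_black(tree))  # 3
```

The returned value is the black height of the root, so it doubles as a check of property 4.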

Related

Given a red-black tree on n nodes, what is the maximum number of red nodes on any root to leaf path?

This was a quiz question. I'm not sure whether my answer was right. Please help me out.
Let's say the height is h. Since no two consecutive nodes (as we go up the tree) can be red, wouldn't the max number of red nodes be h/2? (h = log n)
Somehow, I feel that is not the correct answer.
Any help/input would be greatly appreciated!
Thank you so much in advance!
** Edit ** This answer assumes a definition of height as the number of nodes on the longest path from root to leaf (used, e.g., in lecture notes here), including the "virtual" black leaf nodes. A more common definition counts the number of edges and does not include the leaf nodes. With that definition the answer is ⌈h/2⌉, or ⌊h/2⌋ if you include the leaf nodes in the height. ** Edit ends **
If you follow the rule that the root node is black, as in Wikipedia, then the correct answer is the largest integer smaller than h/2. This is simply because the root and the leaves are black, and at most half (rounded up) of the h - 2 nodes in between can be red, i.e. ⌈(h-2)/2⌉ = ⌊(h-1)/2⌋.
You can also find the rule just by considering some small red-black trees of different heights.
Case h=1: root is black -> 0 red nodes
Case h=2: root is black and leaves are black -> 0 red nodes
Case h=3: root is black, second level can be red, and leaves must be black -> max 1 red node
Case h=4: root is black, second level can be red, third level must then be black, and leaves must be black -> max 1 red node
Case h=5: black, red, black, red, black -> max 2 red nodes
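The small cases can be extended mechanically with a brute-force check in Python, with h counted in nodes including the NIL leaf, as in this answer. It enforces only the path-level rules (black root, black NIL leaf, no two reds in a row), not the black-height balance of the whole tree, so it confirms the counting argument rather than building real trees; `max_red_on_path` is a hypothetical helper name.

```python
from itertools import product


def max_red_on_path(h: int) -> int:
    """Brute force over all colorings of a root-to-NIL path with h nodes.

    Only path-level rules are enforced: the first node (root) and the last
    node (NIL leaf) are black, and no two consecutive nodes are red.
    Returns the largest number of red nodes such a path can hold."""
    best = 0
    for colors in product("RB", repeat=h):
        if colors[0] != "B" or colors[-1] != "B":
            continue  # root and NIL leaf must be black
        if any(a == "R" and b == "R" for a, b in zip(colors, colors[1:])):
            continue  # two consecutive red nodes
        best = max(best, colors.count("R"))
    return best


for h in range(1, 8):
    # brute-force result next to the closed form floor((h-1)/2)
    print(h, max_red_on_path(h), (h - 1) // 2)
```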
The height h as a function of n is trickier, but it can be shown that h <= 2 log2(n+1), which guarantees logarithmic search time. For a proof see, e.g., Searching and Search Trees II (page 11). The proof rests on the fact that the rules of a red-black tree guarantee that a subtree rooted at x contains at least 2^bh(x) - 1 internal nodes, where bh(x) is the black height of x: the number of black nodes on a path from x (not counting x itself) down to a leaf. This is proven by induction. Next, since a red node cannot have a red child, at least half of the nodes on any path from the root to a leaf (not counting the root) are black, so bh(root) >= h/2. Combining these results gives n >= 2^bh(root) - 1 >= 2^(h/2) - 1, and solving for h gives h <= 2 log2(n+1).
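Written out as a chain of inequalities, assuming the convention above that bh(x) does not count x itself and that h is the number of edges on the longest root-to-leaf path:

```latex
\begin{aligned}
n &\ge 2^{\mathrm{bh}(\text{root})} - 1 && \text{(induction on subtrees)} \\
\mathrm{bh}(\text{root}) &\ge h/2 && \text{(at least half the nodes below the root are black)} \\
\Rightarrow\quad n + 1 &\ge 2^{h/2} \\
\Rightarrow\quad h &\le 2\log_2(n+1)
\end{aligned}
```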
As the question was a quiz, it should be enough to say that h is proportional to log(n) or even about log(n).
Let's first see how few nodes (minimising n) are needed to make a path with 1 red node (* is black):
    *
   / \
  *   R
     / \
    *   *
So n must be at least 5 when 1 red node is needed. The tree has 3 leaf nodes and 2 internal nodes. Removing any node would force us to drop the red node as well in order to stay within the rules.
If we want to extend this tree to get a path with 2 red nodes, we can apply the following two steps:
1. All leaves get two black children.
2. The right-most leaf (just added) is turned into a red node, and it gets two black children.
The dollar signs are the black nodes added compared to the previous tree:
            *
          /   \
        *       R
       / \    /   \
      $   $  *     *
            / \   / \
           $   $ $   R
                    / \
                   $   $
We choose to place that path with the red nodes on the right side; this choice does not influence the conclusions. Note that it does not help to add red nodes in other, shorter paths, as this will only increase the number of nodes without increasing the path with the most red nodes.
The number of leaf nodes (L) doubles with step 1, while the nodes that were leaves become internal nodes (I).
The second step increases both the number of internal nodes and the number of leaves by 1. More formally, we can find these formulas, where the index r represents the number of red nodes:
$L_1 = 3$
$I_1 = 2$
$L_{r+1} = 2L_r + 1$
$I_{r+1} = I_r + L_r + 1$
Put in a table for increasing r:
r | L | I | n=L+I
----+-----+-----+-------
1 | 3 | 2 | 5
2 | 7 | 6 | 13
3 | 15 | 14 | 29
4 | 31 | 30 | 61
... | ... | ... | ...
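The recurrences can be iterated in Python to regenerate the table; `min_nodes_table` is an illustrative name, and L, I and n count the black leaves as nodes, as in the construction above.

```python
from typing import Iterator, Tuple


def min_nodes_table(max_r: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (r, L, I, n) for r = 1..max_r using the recurrences
    L_1 = 3, I_1 = 2, L_{r+1} = 2 L_r + 1, I_{r+1} = I_r + L_r + 1."""
    leaves, internal = 3, 2
    for r in range(1, max_r + 1):
        yield r, leaves, internal, leaves + internal
        # Step 1 doubles the leaves (the old leaves become internal);
        # step 2 adds one internal (red) node and one net leaf.
        leaves, internal = 2 * leaves + 1, internal + leaves + 1


for r, L, I, n in min_nodes_table(4):
    print(f"{r:>3} | {L:>3} | {I:>3} | {n:>5}")
```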
We can see the following is true:
$L_r = 2^{r+1} - 1$
$I_r = 2^{r+1} - 2$
And so:
$n_r = L_r + I_r = 2^{r+2} - 3$
So we have a formula for knowing the minimum number of nodes needed to have a path with r red nodes. We need a different relation: the maximum for r when given n.
From the above we can derive:
r = ⌊ log2(n+3) ⌋ - 2
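A Python sketch of both directions, assuming n counts the black leaf nodes as in the construction above. `(n + 3).bit_length() - 1` is an exact integer form of ⌊log2(n+3)⌋, which avoids floating-point rounding; the function names are hypothetical.

```python
def min_nodes_for_red(r: int) -> int:
    """Fewest nodes (black leaves included) that allow a path with r red
    nodes, from the closed form n_r = 2^(r+2) - 3."""
    return 2 ** (r + 2) - 3


def max_red_for_nodes(n: int) -> int:
    """Largest r with min_nodes_for_red(r) <= n, i.e. floor(log2(n+3)) - 2.
    For a positive integer m, m.bit_length() - 1 == floor(log2(m))."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n + 3).bit_length() - 1 - 2


for n in (5, 12, 13, 29, 60, 61):
    print(n, max_red_for_nodes(n))
```

For example, 12 nodes still allow only 1 red node on a path, while 13 nodes are the first to allow 2.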

AVL Rotation - Which node to rotate

I have read many sources about AVL trees, but did not find anyone addressing this issue: When AVL tree gets unbalanced, which node should be rotated first?
Assuming I have the tree:
    10
   /  \
  5    25
      /
    20
and I'm trying to add 15; after that, both the root and its child 25 will be unbalanced.
    10
   /  \
  5    25
      /
    20
   /
  15
I could do an RR rotation (a single rotation) at 25, resulting in the following tree:
    10
   /  \
  5    20
      /  \
    15    25
or an RL rotation (double rotation) about the root, creating the following tree:
      20
     /  \
   10    25
  /  \
 5    15
I am confused about which rotation is the most suitable here and in similar cases.
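The RR rotation (the single rotation at 25) is correct here. The rotation should be done as low as possible, at the first node where the rule is broken, which here is 25. After that rotation the subtree regains the height it had before the insertion, so the ancestors above it (here the root) are balanced again as well.
Rotating higher up first does not necessarily restore the rule lower down, and in general it would become too complex, even though it doesn't seem so here at first sight.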
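A minimal Python sketch (not taken from either post) of recursive AVL insertion, assuming distinct integer keys and heights counted in nodes. Because rebalancing happens while the recursion unwinds, the lowest unbalanced ancestor (25 in the example) is the one that gets rotated. The names `insert`, `rotate_left`, `rotate_right` and `preorder` are illustrative, not from any library.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # counted in nodes; an empty subtree has height 0


def height(node: Optional[Node]) -> int:
    return node.height if node is not None else 0


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y: Node) -> Node:
    """Single right rotation: y's left child x becomes the subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    """Single left rotation: x's right child y becomes the subtree root."""
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: int) -> Node:
    """Insert a distinct key and rebalance while the recursion unwinds.

    The first unbalanced node met on the way back up is the lowest one;
    rotating there restores that subtree's pre-insertion height, so no
    ancestor above it needs a rotation."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                  # left-heavy
        if key > node.left.key:      # left-right case: rotate the child first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                 # right-heavy
        if key < node.right.key:     # right-left case: rotate the child first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def preorder(node: Optional[Node]) -> List[int]:
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


root = None
for k in (10, 5, 25, 20, 15):
    root = insert(root, k)
print(preorder(root))  # [10, 5, 20, 15, 25]
```

The printed preorder corresponds to the first tree in the question: 10 stays at the root and 20 replaces 25.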

Ideal height of tree structure

How can I calculate the ideal height of a tree structure?
When I have this tree
I know the height is 4.
There's a formula that says that the ideal height of a tree is 2 ^ height - 1 but that doesn't make sense to me (since it would be 15).
Can someone please explain?
Well, first of all, that formula applies only to binary trees. Second, 2^height - 1 is not the ideal height but the ideal (maximum) number of nodes: a perfect binary tree of height 4 has 15 nodes.
That formula is for the maximum number of nodes that can be included in a binary tree of that height. Assuming you want the tree to be as shallow as possible, you want to know the minimum height of such a tree given the number of nodes. So you simply invert:
nodes = 2^height - 1
to get
height = log2(nodes + 1)
rounded up.
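The inverted formula as a small Python function, with height counted in levels as in the question; `min_height` is a hypothetical name. For nodes >= 1, `int.bit_length()` gives exactly ⌈log2(nodes + 1)⌉, so no floating-point logarithm is needed.

```python
def min_height(nodes: int) -> int:
    """Smallest possible height (counted in levels) of a binary tree with
    `nodes` nodes, i.e. ceil(log2(nodes + 1)).

    For nodes >= 1, nodes.bit_length() == floor(log2(nodes)) + 1, which is
    the same integer as ceil(log2(nodes + 1)); for 0 it returns 0."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes.bit_length()


for n in (1, 7, 8, 15, 16, 20):
    print(n, min_height(n))
```

For example, 15 nodes fit into 4 levels, but a 16th node forces a fifth level.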
Height of the tree is the maximum height among all the nodes in the tree. Now say you have a tree
      1
    /   \
   2     3
  / \   / \
 4   5 6   7
the height of the tree is 3 (all root-to-leaf paths have the same length here, so take 1-2-5 as a longest one). There are three levels, with the following number of nodes at each level:
      1          = 2^0
    /   \
   2     3       = 2^1
  / \   / \
 4   5 6   7     = 2^2
Total = 2^0 + 2^1 + 2^2, which is a geometric progression with sum 2^3 - 1 = 7; hence the number of nodes is 2^height - 1.
If you talk about levels instead (numbered from 0), the number of nodes is 2^(level+1) - 1, where level is the number of the deepest level.
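The level-by-level sum can be checked with a short Python sketch; `nodes_in_perfect_tree` is a hypothetical helper name, and height is counted in levels as in this answer.

```python
def nodes_in_perfect_tree(height: int) -> int:
    """Add up the per-level node counts 2^0 + 2^1 + ... + 2^(height-1)
    of a perfect binary tree whose height is counted in levels."""
    return sum(2 ** level for level in range(height))


# The example tree has 3 levels: 1 + 2 + 4 nodes.
print(nodes_in_perfect_tree(3))  # 7
for h in range(1, 8):
    assert nodes_in_perfect_tree(h) == 2 ** h - 1  # closed form of the geometric sum
```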

Balanced Binary Search Tree for numbers

I wanted to draw a balanced binary search tree for numbers from 1 to 20.
             _______10_______
            /                \
        ___5___               15
       /       \            /    \
      3         8         13      18
     / \       / \       /  \    /  \
    2   4     7   9     12  14  17  19
   /         /         /       /
  1         6        11      16
Is the above tree correct and balanced?
In answer to your original question as to whether or not you need to calculate the height first: no, you don't. You just have to understand that a balanced tree is one where the depths of the deepest and shallowest leaf nodes differ by zero or one, and the simplest way to achieve this is to always pick the midpoint of the possible list when populating the top node of a sub-tree.
Your sample tree is balanced since all leaf nodes are either at the bottom or next-to-bottom level, hence the difference in heights between any two leaf nodes is at most one.
To create a balanced tree from the numbers 1 through 20 inclusive, you can just make the root entry 10 or 11 (the midpoint being 10.5 for those numbers), so that the quantities of numbers in the two sub-trees differ by at most one.
Then just do that recursively for each sub-tree. On the lower side of 10, 5 is the midpoint:
          10
        /    \
       5      11-thru-20 sub-tree
     /   \
1-thru-4  6-thru-9
sub-tree  sub-tree
Just expand on that and you'll end up with something like:
             _______10_______
            /                \
        ___5___            ___15___
       /       \          /        \
      2         7       12          18
     / \       / \     /  \        /  \
    1   3     6   8   11   13    16    19      <- depth of highest leaf node
         \         \         \     \     \
          4         9         14    17    20  <- depth of lowest leaf node
                                               ^
                                               |
                                       Difference is 1
The midpoint can be found at the number where the difference between the quantities above and below that number is one or zero. For the whole list of numbers 1 through 20 inclusive, there are nine less than 10 and ten greater than 10 (or, if you chose 11 as the midpoint, the quantities are ten and nine).
The difference between your sample and mine is probably to do with the fact that I preferred to pick the midpoint by rounding down where there was a choice (meaning my right sub-trees tend to be "heavier"). Because your left sub-trees are heavier, you appear to have rounded up.
After choosing 10 as the initial midpoint, there's no leeway on the left sub-tree: you have to choose 5, since it has four numbers below it and four above it. Any other midpoint would result in a difference of at least two between the two halves (for example, choosing 4 as the midpoint would give halves of size three and five). This can still give you a balanced sub-tree depending on the data, but it's "safer" to choose the midpoint.
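A Python sketch of the midpoint method described in this answer, rounding down when there are two candidate midpoints, as the answerer did. The `Node` class and the helper names are illustrative assumptions, and "leaf" here means a node with no children.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.key = key
        self.left = left
        self.right = right


def build_balanced(lo: int, hi: int) -> Optional[Node]:
    """Build a BST holding lo..hi inclusive: the midpoint (rounded down)
    becomes the sub-tree root, and both halves are built recursively."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, build_balanced(lo, mid - 1), build_balanced(mid + 1, hi))


def inorder(node: Optional[Node]) -> List[int]:
    """In-order traversal; for a valid BST the result is sorted."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def leaf_depths(node: Node, depth: int = 0) -> List[int]:
    """Depths (root = 0) of every node that has no children."""
    if node.left is None and node.right is None:
        return [depth]
    depths: List[int] = []
    for child in (node.left, node.right):
        if child is not None:
            depths += leaf_depths(child, depth + 1)
    return depths


root = build_balanced(1, 20)
depths = leaf_depths(root)
print(root.key, root.left.key, root.right.key)  # 10 5 15
print(min(depths), max(depths))                 # 3 4
```

Rounding up instead (`(lo + hi + 1) // 2`) would make the left sub-trees the heavier ones, as in the question's tree.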

AVL trees balancing

Given an AVL tree below:
        23
      /    \
    19      35
   /  \    /  \
  8   20  27   40
              /
            38
           /
         36
Is it ok to just do a single rotation at 40, to the right? Making it something like this:
        23
      /    \
    19      35
   /  \    /  \
  8   20  27   38
              /  \
            36    40
It still conforms to the AVL property that the heights of the left and right subtrees of every node differ by at most 1.
The given answer does a double rotation instead, after which the tree looks like this:
        23
      /    \
    19      38
   /  \    /  \
  8   20  35   40
         /  \
       27    36
I don't understand when to do a double rotation and when to do a single rotation if neither of them violates the height property.
The double rotation may be due to a specific AVL algorithm in use. Both answers look like valid AVL trees to me.
If the original question was given with only the unbalanced AVL tree (and not the balanced tree before a node was added or removed), then the single rotation is a valid answer.
If the question provides the AVL tree before and after a node was added or removed, then the algorithm for rebalancing could result in the double rotation occurring.
Both answers are right, though according to the literature that I use:
There are four types of rotations: LL, RR, LR and RL. These rotations are characterized by the nearest ancestor A of the inserted node N whose balance factor becomes 2.
The following characterization of rotation types is obtained:
LL: N is inserted in the left subtree of the left subtree of A
LR: N is inserted in the right subtree of the left subtree of A
RR: N is inserted in the right subtree of the right subtree of A
RL: N is inserted in the left subtree of the right subtree of A
According to these rules, the nearest ancestor whose balance factor becomes 2 is 40 in your example, and the insertion was made in the left subtree of the left subtree of 40 so you have to perform an LL rotation. Following these rules, the first of your answers will be the chosen operation.
Still, both answers are correct; which one you get depends on the algorithm you are using and the rules it follows.
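A Python sketch of the characterization quoted above: rebuild the question's tree with a plain (non-balancing) BST insert, then walk from the root towards the inserted key N and name the case after the two steps taken below the nearest ancestor A whose balance factor is ±2. The helper names are hypothetical, and heights are recomputed recursively for simplicity rather than cached as a real AVL implementation would.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert with no rebalancing, to reproduce the tree as it is
    right after N was inserted."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def height(node: Optional[Node]) -> int:
    """Height in nodes, recomputed recursively (fine for small examples)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def classify_rotation(root: Node, key: int) -> Optional[Tuple[int, str]]:
    """Return (A, case), where A is the nearest ancestor of the inserted key
    whose balance factor is +2 or -2 and case is LL, LR, RR or RL."""
    path: List[Node] = []
    node = root
    while node is not None and node.key != key:
        path.append(node)
        node = node.left if key < node.key else node.right
    for ancestor in reversed(path):          # nearest ancestor first
        if abs(height(ancestor.left) - height(ancestor.right)) == 2:
            first = "L" if key < ancestor.key else "R"
            child = ancestor.left if first == "L" else ancestor.right
            second = "L" if key < child.key else "R"
            return ancestor.key, first + second
    return None                              # no rotation needed


root = None
for k in (23, 19, 35, 8, 20, 27, 40, 38, 36):
    root = bst_insert(root, k)
print(classify_rotation(root, 36))  # (40, 'LL')
```

Under these rules the nearest unbalanced ancestor is 40 and the case is LL, which is the single rotation from the question.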

Resources