An issue on red-black trees - algorithm

If I have a binary tree, and want to add the black/red property to all of its nodes to form a red-black tree, is there a way to do that if we knew that the binary tree is balanced?

Probably the most stringent condition on red-black trees is the fact that any root-NULL path has to have the same number of black nodes in it. Therefore, one option would be to start off by running a DFS from the root to determine the length of the shortest root-NULL path. This path length gives an upper bound on the "black height" of the tree, which is the number of black nodes on any root-NULL path.
Once we have this value, we can try to assign black heights to the nodes in a way that will let us determine which nodes are red or black. One useful observation is the following: both subtrees of any node must have the same black height, since otherwise there would be two root-NULL paths with different numbers of black nodes. Therefore, for each node, if the desired black height of the subtree rooted at that node is H, we can either
Color it red, in which case we recursively color the left and right subtrees such that they have black height H. We also force the roots of those subtrees to be black.
Color it black, in which case we recursively color the left and right subtrees such that they have black height H - 1. The nodes at the roots of those subtrees can be either color.
I think you can do this by using dynamic programming. Supposing that the desired target black height is H, we can make a table indexed by node/black height/color triples (here, the black height is the black height of the subtree rooted at that node) that stores whether it's possible to color that node in the proper way. Let's call it T[v, h, c]. We can fill it in as follows:
Treat NULL as a node that's black. T[null, 0, red] = false, and T[null, 0, black] = true; T[null, h, c] = false for every h > 0.
For each node, processed in an order such that v is only processed if its subtrees l and r are processed, do the following:
T[v, h, red] = T[l, h, black] and T[r, h, black]
T[v, h, black] = (T[l, h - 1, red] or T[l, h - 1, black]) and (T[r, h - 1, red] or T[r, h - 1, black]); the two children may use different colors
Once you evaluate this, you can then check if T[root, h, c] is true for any height h or color c. This should take only polynomial time.
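The table described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not a definitive implementation: the Node class, the color_tree function and the string colors "red"/"black" are hypothetical names chosen for illustration, NULL is represented by None, and the table T is filled lazily by memoization instead of in an explicit bottom-up order.

```python
from functools import lru_cache

class Node:
    """Plain binary-tree node (hypothetical helper); None plays the role of NULL."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.color = None

def shortest_null_path(v):
    """Number of nodes on the shortest path from v down to a NULL."""
    if v is None:
        return 0
    return 1 + min(shortest_null_path(v.left), shortest_null_path(v.right))

def color_tree(root):
    """Try to color the tree in place; return True if a valid coloring exists."""
    @lru_cache(maxsize=None)
    def T(v, h, c):
        if v is None:                      # NULL is black with black height 0
            return c == "black" and h == 0
        if c == "red":                     # red node: both children black, same h
            return T(v.left, h, "black") and T(v.right, h, "black")
        # black node: children have black height h - 1 and may be either color
        return h >= 1 and all(
            T(ch, h - 1, "red") or T(ch, h - 1, "black")
            for ch in (v.left, v.right))

    def assign(v, h, c):
        """Write one valid coloring into the nodes, given that T(v, h, c) holds."""
        if v is None:
            return
        v.color = c
        child_h = h if c == "red" else h - 1
        for ch in (v.left, v.right):
            if c == "black" and T(ch, child_h, "red"):
                assign(ch, child_h, "red")
            else:
                assign(ch, child_h, "black")

    # The shortest root-NULL path bounds the black height from above.
    for h in range(1, shortest_null_path(root) + 1):
        if T(root, h, "black"):
            assign(root, h, "black")
            return True
    return root is None                    # the empty tree is trivially fine
```

Only a black root is tried, because a red root with black height h can always be recolored black to give black height h + 1.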
Hope this helps!

Templatetypedef has already answered the main part of your question in a very nice way. I just wanted to add an answer to your secondary question.
Since red-black marking is used to prevent unbalanced trees from arising, it is certainly not possible to colour every search tree - if it were then it didn't achieve anything! A counterexample is this tree:
1
 \
  2
   \
    3
where all left children are null.

Related

Red Black Tree, and condition for coloring

Recently I have been thinking about converting a BST to an RB tree by coloring alone.
I mean: what is the necessary and sufficient condition under which we can convert a BST to an RB tree just by coloring, without any other change to the BST? (E.g., is it enough to check that the longest path is no more than twice the shortest path, or is there a specific height or some other condition?)
A null binary tree is a red-black tree. A non-null binary tree is a red-black tree if:
The root is black;
the number of black nodes on any path from root to null is the same.
no such path has two non-black (i.e., red) nodes in a row.
We'll refer to the number of black nodes on every path from root to null as the tree's "black-height".
In any non-null red-black tree, both children of the root have the same black-height and will also be red-black trees if you make sure their roots are colored black. Coloring a red root black will increase the black-height of the tree by 1, so if the children of a root are made into red-black trees, their heights may differ by at most 1.
Similarly, given two red-black trees with the same black-height, you can join them under a new black root to create a new red-black tree.
Given a red-black tree and a red-rooted tree with red-black tree children of the same black-height, you can also join them under a new black root.
Two red-rooted trees with red-black tree children can have their roots recolored and joined under a new root similarly.
Henceforth, a red root with red-black tree children of the same black-height will be referred to as a red-rooted tree.
Given this, we can define the condition for red-black colorability recursively like so:
A binary tree can be colored as a red-black tree with black-height X if and only if:
it is null and X==0; OR
it is non-null and both of its children can be colored as red-black trees or red-rooted trees with black-height X-1.
A binary tree can be colored as a red-rooted tree with black-height X if and only if it is non-null and both of its children can be colored as red-black trees with black-height X.
Given any binary tree, then, we can calculate the black-heights at which it could be colored as a red-black tree or a red-rooted tree:
In pseudocode:
redAndBlackHeights(tree):
    if (tree == null):
        return ([],[0]); //only a red-black tree with bh=0
    (left_red_heights,left_black_heights) = redAndBlackHeights(tree.left)
    (right_red_heights,right_black_heights) = redAndBlackHeights(tree.right)
    red_heights = intersect(left_black_heights, right_black_heights)
    black_heights = intersect(
        x+1 for x in union(left_red_heights,left_black_heights),
        x+1 for x in union(right_red_heights,right_black_heights)
    )
    return (red_heights, black_heights)
A tree is colorable as a red-black tree if and only if redAndBlackHeights(tree) returns at least one black-rooted height.
Since there are at most O(log N) possible heights in a tree of size N, this takes O(N log N) time.
It turns out, actually, that all of the sets of heights are contiguous ranges, and if you represent them as such the algorithm takes O(N) time.
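For readers who want to run the set-based version, here is a rough Python transcription. It is only a sketch under assumptions: a tree is represented as None or a (left, right) tuple, the height sets are ordinary Python sets (so this is the O(N log N) variant, not the range-based O(N) one), and the names red_and_black_heights and is_colorable are illustrative.

```python
def red_and_black_heights(tree):
    """Return (red_heights, black_heights) for tree = None or (left, right).

    red_heights: black-heights at which the tree works as a red-rooted tree.
    black_heights: black-heights at which it works as a red-black tree.
    """
    if tree is None:
        return set(), {0}                  # only a red-black tree with bh = 0
    left, right = tree
    l_red, l_black = red_and_black_heights(left)
    r_red, r_black = red_and_black_heights(right)
    # Red root: both children must be proper (black-rooted) trees of equal height.
    red = l_black & r_black
    # Black root: children may be red- or black-rooted; the root adds one.
    black = {x + 1 for x in (l_red | l_black) & (r_red | r_black)}
    return red, black

def is_colorable(tree):
    """True if at least one black-rooted height is achievable."""
    return bool(red_and_black_heights(tree)[1])

# Usage: a right-leaning chain of three nodes versus a four-node tree.
leaf = (None, None)
chain = (None, (None, leaf))
small = (leaf, (leaf, None))
print(is_colorable(chain), is_colorable(small))
```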
I believe that Matt Timmermans' answer is correct, but I don't find it very satisfying, since although it provides a good algorithm for determining if a binary tree is red-black–colorable, it doesn't really provide a characterization other than "run this algorithm" — and, worse yet, the algorithm refers to concepts that are specific to red-black trees, and that (IMHO) don't make sense outside that context.
So, below is a characterization that I think is more satisfying.
Let the "least-height" of a node be the least distance from it down to a ɴɪʟ descendant, and let its "greatest-height" be the greatest distance from it down to a ɴɪʟ descendant. That is:
ɴɪʟ.leastHeight = ɴɪʟ.greatestHeight = 0
for a non-ɴɪʟ node:
node.leastHeight = 1 + min(node.left.leastHeight, node.right.leastHeight)
node.greatestHeight = 1 + max(node.left.greatestHeight, node.right.greatestHeight)
So, for example, in this tree:
  2
 / \
1   4
   /
  3
(where I've omitted the ɴɪʟ leaves for readability) we have these heights:
1: least-height = greatest-height = 1
2: least-height = 2; greatest-height = 3
3: least-height = greatest-height = 1
4: least-height = 1; greatest-height = 2
Theorem. A binary tree is red-black–colorable if and only if, for every single node, its greatest-height is at most double its least-height, or equivalently, its least-height is at least half its greatest-height.
(The above example does satisfy this rule; and indeed, we can make it a red-black tree by coloring 3 red and the rest black.)
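The condition in the theorem is easy to check mechanically. The following is a small sketch, assuming a tree is represented as None (for ɴɪʟ) or a (left, right) tuple; the function name check_heights is hypothetical.

```python
def check_heights(tree):
    """Return (least_height, greatest_height, ok) for tree = None or (left, right).

    ok is True when every node in the tree has
    greatest-height <= 2 * least-height.
    """
    if tree is None:
        return 0, 0, True
    l_least, l_great, l_ok = check_heights(tree[0])
    r_least, r_great, r_ok = check_heights(tree[1])
    least = 1 + min(l_least, r_least)
    greatest = 1 + max(l_great, r_great)
    ok = l_ok and r_ok and greatest <= 2 * least
    return least, greatest, ok

# The example tree above: node 2 with children 1 and 4, and 3 below 4.
leaf = (None, None)
example = (leaf, (leaf, None))
print(check_heights(example))

# A chain of three nodes, each with only a right child, fails the condition.
chain = (None, (None, leaf))
print(check_heights(chain))
```

A single post-order pass suffices, so the check runs in time linear in the number of nodes.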
To prove this, we need one more definition. Let the "black-height" of a node in a red-black tree be the number of black nodes on the path from it down to any ɴɪʟ descendant, including itself if it is black. (By the definition of a red-black tree, this value is the same no matter which ɴɪʟ descendant is chosen.) That is:
ɴɪʟ.blackHeight = 0
for a black node: node.blackHeight = 1 + node.left.blackHeight = 1 + node.right.blackHeight
for a red node: node.blackHeight = node.left.blackHeight = node.right.blackHeight
So, the "only if" direction of the theorem (that if a binary tree is red-black–colorable, then the greatest-height of any node is at most double its least-height) shouldn't be surprising, because that's sort of the whole point of a red-black tree. If the tree is red-black–colorable, choose one such coloring. For a black node with black-height b, every path from that node down to a ɴɪʟ descendant will include exactly b black nodes (including the node itself), so its least-height is at least b; and no path from that node down to a ɴɪʟ descendant can include more than b red nodes, so its greatest-height is at most 2b. For a red node with black-height b, every path from that node down to a ɴɪʟ descendant will include exactly b black nodes plus the red node itself, so its least-height is at least b+1; and no path from that node down to a ɴɪʟ descendant can include more than b+1 red nodes (including the node itself), so its greatest-height is at most 2b+1, which is less than 2(b+1). In both cases, the greatest-height is at most double the least-height.
The "if" direction — that if the greatest-height of every single node is at most double its least-height, then the tree is red-black–colorable — is trickier.
The proof turns out to be a bit simpler if we allow the trees to have red roots. This doesn't affect the result, because given a tree that satisfies all of the red-black invariants except that its root is red, we can just recolor the root black without breaking those other invariants. But I don't want to redefine the term "red-black tree" in the middle of a proof about red-black trees, so I'll instead use the term "red-black subtree", in reference to the fact that a subtree of a red-black tree has to satisfy all of the red-black invariants except that its root can be red.
The proof involves mathematical induction, so I'll actually prove a slightly stronger claim that enables the inductive step:
Theorem. If every node in a binary tree has a greatest-height that is at most double its least-height, then for any integer b in the range [root.greatestHeight / 2, root.leastHeight], the tree can be colored to become a red-black subtree whose root has black-height b.
Unfortunately, the inductive step will involve jumping down two levels (considering root.left.left and root.left.right and root.right.left and root.right.right, instead of just root.left and root.right), so our base cases need to cover all cases where root or root.left or root.right is ɴɪʟ.
Base case #1 — root is ɴɪʟ: This is straightforward, since ɴɪʟ is inherently a red-black subtree, the range [ɴɪʟ.greatestHeight / 2, ɴɪʟ.leastHeight] is just {0}, and ɴɪʟ.blackHeight = 0.
Base case #2 — root has two ɴɪʟ children: The least-height and greatest-height are both 1, so the range [root.greatestHeight / 2, root.leastHeight] is just [½, 1], which contains only one integer, namely 1; and indeed, if we color the root black, we'll have a red-black subtree whose root has black-height 1.
Base case #3 — root has one ɴɪʟ child and one non-ɴɪʟ child: The least-height is 1, so by assumption, the greatest-height can be at most 2; so the non-ɴɪʟ child must have greatest-height 1, meaning that both of its children are ɴɪʟ. (In other words, this must be a tree with exactly two non-ɴɪʟ nodes.) The range [root.greatestHeight / 2, root.leastHeight] is just {1}; and indeed, if we color the root black and its non-ɴɪʟ child red, we'll have a red-black subtree whose root has black-height 1.
Inductive case — root has two non-ɴɪʟ children: We assume, by induction, that its four grandchildren root.{left,right}.{left,right} all satisfy the theorem; so, for example, the subtree rooted at root.right.left can be colored to become a red-black subtree with any black-height in the range [root.right.left.greatestHeight / 2, root.right.left.leastHeight]. Then:
For any integer b in the range
[max(root.{left,right}.{left,right}.greatestHeight) / 2, min(root.{left,right}.{left,right}.leastHeight)],
we can color all of root.{left,right}.{left,right} as red-black subtrees whose roots have black-height b.
So, for any integer b′ in the range
[root.greatestHeight / 2, root.leastHeight]
= [(2 + max(root.{left,right}.{left,right}.greatestHeight)) / 2, 2 + min(root.{left,right}.{left,right}.leastHeight)]
= [1 + max(root.{left,right}.{left,right}.greatestHeight) / 2, 2 + min(root.{left,right}.{left,right}.leastHeight)],
we have that either b′−1 or b′−2 (or both) is in the range
[max(root.{left,right}.{left,right}.greatestHeight) / 2, min(root.{left,right}.{left,right}.leastHeight)],
meaning that we can color all of root.{left,right}.{left,right} as red-black subtrees whose roots all have the same black-height and that black-height is either b′−1 or b′−2. So, we do so, and we color root.left and root.right black. We then color root either red or black, depending on whether its grandchildren have black-height b′−1 or b′−2 (red if the former, black if the latter), thereby ensuring that root itself is a red-black subtree with black-height b′, as desired.

What is the maximum attainable height of a red-black tree with black-height k and why?

What is attainable height in red-black tree? Is it the height of the tree? I have read the wiki of it but still have no clue. Thank you.
Since a node of height h has black-height >= h/2, the answer should be 2k.
Black Height of a Red-Black Tree:
Black height is the number of black nodes on a path from a node to a leaf. Leaf nodes are also counted as black nodes. From properties 3 and 4 of red-black trees, we can derive that a node of height h has black-height >= h/2.

Height of tree with single node

I have googled it and found that there is no single right answer for the height of a tree that has only one node. Sometimes the height is a node count and sometimes it is an edge count, so sometimes it is 1 and other times it is 0. What are the cases when node count is used, and when is edge count used?
It depends entirely on your definition of (1) tree, and (2) height. But we certainly wish to maintain the property that height is a total function from trees to integers; there should be no tree of undefined height.
Suppose for example we have this definition of a binary tree:
A tree is defined as either (1) the empty tree, or (2) a pair of trees, called the left and right subtrees.
type t = Empty | Node of t * t
Now we can define height, which should be a total function: the height of an empty tree is zero -- what else could it be? -- and the height of a non-empty tree is the larger of the heights of the sub-trees plus one:
let max x y = if x > y then x else y
let rec height tree = match tree with
| Empty -> 0
| Node (left, right) -> 1 + max (height left) (height right)
Now, notice the chain of logic that got us here:
height is a total function
empty is a legal tree
therefore an empty tree must have a height
the only sensible height for an empty tree is zero
therefore the height of a tree with a single node must be one.
If we deny some of those premises then we can come up with other answers. For example, what if there were no empty trees?
A tree is defined as a list, possibly empty, of trees:
type t = Node of t list
And again we could come up with a definition of height: the height of a node with an empty list is defined as zero, and the height of a node with non-empty children is the largest child height plus one.
let max x y = if x > y then x else y
let rec height tree = match tree with
  | Node [] -> 0
  | Node (h :: t) -> max (1 + height h) (height (Node t))
In this definition the height of a tree with a single node is zero, and we are counting edges. Again, look at our reasoning:
height is a total function
an empty tree is not a legal tree, but a leaf is
therefore a leaf must have a height
a sensible height for a leaf is zero
therefore a tree that is a single leaf could have height zero.
But we could also have said that the height of a leaf is one, with the same definition otherwise, and we'd be counting nodes. There's no objection to that logically.
What are the cases when node count is used, and when is edge count used?
If an empty tree is legal then plainly only the node count makes sense. If we try to count edges then there is no way to distinguish the height of the empty tree from the height of a single-node tree, and keep height a total function.
If an empty tree is not legal then either makes sense. Since the relationship between the two height functions is "they differ by exactly one", it doesn't matter which definition you use; if you want to use the other definition, just add or subtract one appropriately.
When balancing a tree we don't care about the absolute heights; we care about the differences in heights between two trees. In those algorithms whether we count edges or nodes is irrelevant. The differences will be the same regardless. A lot of the time it doesn't matter, so pick whichever you like better.
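To make the two conventions concrete, here is a small Python sketch of both definitions. The representations are assumptions chosen to mirror the two type definitions above: a binary tree is None (empty) or a (left, right) tuple, and a tree without empty trees is a list of child trees; the function names are hypothetical.

```python
def height_in_nodes(tree):
    """Binary tree as None (empty) or (left, right); the empty tree has height 0."""
    if tree is None:
        return 0
    return 1 + max(height_in_nodes(tree[0]), height_in_nodes(tree[1]))

def height_in_edges(children):
    """Tree as a list of child trees; [] is a single leaf, which has height 0."""
    if not children:
        return 0
    return 1 + max(height_in_edges(child) for child in children)

# A single node under each convention.
print(height_in_nodes((None, None)))   # counts nodes
print(height_in_edges([]))             # counts edges
```

For the same shape of tree, the two results differ by exactly one, which is why differences between subtree heights come out the same under either convention.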
The height of a node is the number of edges on the longest path from the node to a leaf. A leaf node will have a height of 0. The height of a tree is the height of its root node.
In your case, the height of the tree will be 0.
For a detailed answer, see the question "What is the difference between tree depth and height?"

Convert AVL Trees to Red Black tree

I read this statement somewhere that the nodes of any AVL tree T can be colored “red” and “black” so that T becomes a red-black tree.
This statement seems quite convincing, but I didn't understand how to formally prove it.
According to wiki, A red black tree should satisfy these five properties:
a. A node is either red or black.
b. The root is black. This rule is sometimes omitted. Since the root can always be changed from red to black, but not necessarily vice versa,
c. All leaves (NIL) are black.
d. If a node is red, then both its children are black.
e. Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.
The first four conditions are quite simple; I got stuck on how to prove the fifth one (e).
First, define the height of a tree (as used for AVL trees):
height(leaf) = 1
height(node) = 1 + max(height(node.left), height(node.right))
Also, define the depth of a path (as used for red-black trees, a path is the chain of descendants from a given node to some leaf) to be the number of black nodes on the path.
As you point out, the tricky bit about coloring an AVL tree as a red-black tree is making sure that every path has the same depth. You will need to use the AVL invariant: that the subtrees of any given node can differ in height by at most one.
Intuitively, the trick is to use a coloring algorithm whose depth is predictable for a given height, such that you don't need to do any further global coordination. Then, you can tweak the coloring locally, to ensure the children of each node have the same depth; this is possible only because the AVL condition puts strict limits on their height difference.
This tree-coloring algorithm does the trick:
color_black(x):
    x.color = black;
    if x is a node:
        color_children(x.left, x.right)
color_red(x): // height(x) must be even
    x.color = red
    color_children(x.left, x.right) // x will always be a node
color_children(a,b):
    if height(a) < height(b) or height(a) is odd:
        color_black(a)
    else:
        color_red(a)
    if height(b) < height(a) or height(b) is odd:
        color_black(b)
    else:
        color_red(b)
For the root of the AVL tree, call color_black(root) to ensure b.
Note that the tree is traversed in depth-first order, also ensuring a.
Note that red nodes all have even height. Leaves have height 1, so they will be colored black, ensuring c. Children of red nodes will either have odd height or will be shorter than their sibling, and will be marked black, ensuring d.
Finally, to show e. (that all paths from root have the same depth),
use induction on n>=1 to prove:
    for odd height = 2*n-1,
        color_black() creates a red-black tree, with depth n
    for even height = 2*n,
        color_red() sets all paths to depth n
        color_black() creates a red-black tree with depth n+1
Base case, for n = 1:
    for odd height = 1, the tree is a leaf;
        color_black() sets the leaf to black; the sole path has depth 1,
    for even height = 2, the root is a node, and both children are leaves, marked black as above;
        color_red() sets node to red; both paths have depth 1
        color_black() sets node to black; both paths have depth 2
The induction step is where we use the AVL invariant: sibling trees can differ in height by at most 1. For a node with a given height:
    subcase A: both subtrees are (height-1)
    subcase B: one subtree is (height-1), and the other is (height-2)
Induction step: given the hypothesis is true for n, show that it holds for n+1:
    for odd height = 2*(n+1)-1 = 2*n+1,
        subcase A: both subtrees have even height 2*n
            color_children() calls color_red() for both children,
            via induction hypothesis, both children have depth n
            for parent, color_black() adds a black node, for depth n+1
        subcase B: subtrees have heights 2*n and 2*n-1
            color_children() calls color_red() and color_black(), resp;
            for even height 2*n, color_red() yields depth n (induction hyp.)
            for odd height 2*n-1, color_black() yields depth n (induction hyp.)
            for parent, color_black() adds a black node, for depth n+1
    for even height = 2*(n+1) = 2*n + 2
        subcase A: both subtrees have odd height 2*n+1 = 2*(n+1)-1
            color_children() calls color_black() for both children, for depth n+1
            from odd height case above, both children have depth n+1
            for parent, color_red() adds a red node, for unchanged depth n+1
            for parent, color_black() adds a black node, for depth n+2
        subcase B: subtrees have heights 2*n+1 = 2*(n+1)-1 and 2*n
            color_children() calls color_black() for both children, for depth n+1
            for odd height 2*n+1, color_black() yields depth n+1 (see above)
            for even height 2*n, color_black() yields depth n+1 (induction hyp.)
            for parent, color_red() adds a red node, for depth n+1
            for parent, color_black() adds a black node, for depth n+2 = (n+1)+1
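The coloring algorithm and the depth claim can be tried out with a short Python sketch. Everything below is illustrative and rests on assumptions: NIL leaves are represented by None (height 1, implicitly black), the Node class and the checker black_depth are hypothetical helpers, and fib_tree builds the sparsest AVL shape of a given height just to have test input.

```python
class Node:
    """Internal node (hypothetical helper); None stands for a NIL leaf."""
    def __init__(self, left=None, right=None):
        self.left, self.right, self.color = left, right, None

def height(x):
    """Height as defined above: a leaf (None) has height 1."""
    return 1 if x is None else 1 + max(height(x.left), height(x.right))

def color_black(x):
    if x is not None:                      # NIL leaves are already black
        x.color = "black"
        color_children(x.left, x.right)

def color_red(x):                          # only called on even-height nodes
    x.color = "red"
    color_children(x.left, x.right)

def color_children(a, b):
    ha, hb = height(a), height(b)
    (color_black if ha < hb or ha % 2 == 1 else color_red)(a)
    (color_black if hb < ha or hb % 2 == 1 else color_red)(b)

def black_depth(x):
    """Black nodes on every path from x to a leaf; raises if a rule is broken."""
    if x is None:
        return 1
    dl, dr = black_depth(x.left), black_depth(x.right)
    if dl != dr:
        raise ValueError("paths with different numbers of black nodes")
    kids = (x.left, x.right)
    if x.color == "red" and any(k is not None and k.color == "red" for k in kids):
        raise ValueError("red node with a red child")
    return dl + (x.color == "black")

def fib_tree(h):
    """Sparsest AVL tree of height h (with leaves counted as height 1)."""
    if h <= 1:
        return None
    if h == 2:
        return Node()
    return Node(fib_tree(h - 1), fib_tree(h - 2))

root = fib_tree(6)
color_black(root)
print(black_depth(root))
```

This sketch recomputes height on every call for brevity; caching the heights would be the obvious refinement for large trees.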
Well, simple case for #5 is a single descendant, which is a leaf, which is black by #3.
Otherwise, the descendant node is red, which is required to have 2 black descendants by #4.
Then these two cases recursively apply at each node, so you'll always have the same amount of black nodes in each path.
Even if you can convert an AVL tree to a red-black tree, the cost is very large. The shape of a tree has nothing to do with the internal structure, which requires a total rebuilding.
The maximum local height difference bound of the red-black tree is 2.

Dynamic Programming Help: Binary Tree Cost Edge

Given a binary tree with n leaves and a set of C colors. Each leaf node of the tree is given a unique color from the set C. Thus no leaf nodes have the same color. The internal nodes of the tree are uncolored. Every pair of colors in the set C has a cost associated with it. So if a tree edge connects two nodes of colors A and B, the edge cost is the cost of the pair (A, B). Our aim is to give colors to the internal nodes of the tree, minimizing the total edge cost of the tree.
I have been working on this problem for hours now, and haven't really come up with a working solution. Any hints would be appreciated.
I am going to solve the problem with pseudocode, because I tried writing an explanation and it was not understandable even to me. Hopefully the code will do the trick. The complexity of my solution is not very good: memory and runtime are in O(C^2 * N).
I will need couple of arrays I will be using in dynamic approach to your task:
dp[N][C][C] -> dp[i][j][k]: the minimum price you can get from a tree rooted at node i, if you paint it in color j and its parent is colored in color k
minPrice[N][C] -> minPrice[i][j]: the minimum price you can get from a tree rooted at node i if its parent is colored in color j
color[leaf] -> the color of the leaf leaf
price[C][C] -> price[i][j]: the price you get if you have a pair of neighbouring nodes with colors i and j
chosenColor[N][C] -> chosenColor[i][j]: the color one should choose for node i to obtain minPrice[i][j]
Let's assume the nodes are ordered using topological sorting, i.e. we will be processing leaves first. Topological sorting is very easy to do in a tree. Let the sorting produce a list of inner nodes, inner_nodes.
for leaf in leaves:
    for i in 0..MAX_C, j in 0..MAX_C
        dp[leaf][i][j] = (i != color[leaf]) ? INFINITY : price[i][j]
    for i in 0..MAX_C
        minPrice[leaf][i] = price[color[leaf]][i]
        chosenColor[leaf][i] = color[leaf]
for node in inner_nodes
    for i in 0..MAX_C, j in 0..MAX_C
        dp[node][i][j] = (node != root) ? price[i][j] : 0
        for descendant in node.descendants
            dp[node][i][j] += minPrice[descendant][i]
    for i in 0..MAX_C
        minPrice[node][i] = INFINITY
        for j in 0..MAX_C
            if minPrice[node][i] > dp[node][j][i]
                minPrice[node][i] = dp[node][j][i]
                chosenColor[node][i] = j
for node in inner_nodes (reversed)
    color[node] = (node == root) ? chosenColor[node][0] : chosenColor[node][color[parent[node]]]
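A compact way to experiment with this idea is the Python sketch below. It is not a transcription of the arrays above but a simplified variant under stated assumptions: it keeps one table per node, best[v][c] (the cheapest cost of the subtree under v when v has color c), minimizes the total edge cost as the question asks, and uses hypothetical names (min_cost_coloring, children, leaf_color) with the tree given as a dictionary from each node to its list of children.

```python
import math

def min_cost_coloring(children, leaf_color, price, root):
    """Return (total_cost, color_of_each_node).

    children:   dict node -> list of child nodes (leaves may be absent)
    leaf_color: dict leaf -> its fixed color index
    price:      C x C matrix, price[a][b] = cost of an edge between colors a, b
    """
    C = len(price)
    best = {}   # best[v][c]: min cost of all edges below v if v has color c

    def solve(v):
        kids = children.get(v, [])
        if not kids:                       # leaf: only its given color is allowed
            best[v] = [0 if c == leaf_color[v] else math.inf for c in range(C)]
            return
        best[v] = [0] * C
        for k in kids:
            solve(k)
            for c in range(C):
                # cheapest way to color child k, including the edge v-k
                best[v][c] += min(price[c][d] + best[k][d] for d in range(C))

    def assign(v, c, color):
        """Walk back down, picking for each child the color that achieved the min."""
        color[v] = c
        for k in children.get(v, []):
            d = min(range(C), key=lambda d: price[c][d] + best[k][d])
            assign(k, d, color)

    solve(root)
    root_color = min(range(C), key=lambda c: best[root][c])
    color = {}
    assign(root, root_color, color)
    return best[root][root_color], color

# Usage with made-up data: three colors, leaves a, b, c fixed to 0, 1, 2.
children = {"r": ["a", "m"], "m": ["b", "c"]}
leaf_color = {"a": 0, "b": 1, "c": 2}
price = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
cost, color = min_cost_coloring(children, leaf_color, price, "r")
print(cost, color)
```

The running time is still O(N * C^2), but the memory drops to O(N * C) because the parent's color is handled in the inner minimum instead of being a table dimension.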
As a starting point, you can use a greedy solution which gives you an upper bound on the total cost:
while the root is not colored
    pick an uncolored node having colored descendants only
    choose the color that minimizes the total cost to its descendants

Resources