Why is the red-black tree kept unbalanced after insertion? - data-structures

Here is a red-black tree which seems unbalanced. If this is the case, can someone please explain why it is unbalanced?

The term "balanced" is a bit ambiguous, since different kinds of balanced trees have different constraints.
A red-black tree ensures that every path to a leaf has the same number of black nodes, and at least as many black nodes as red nodes. The result is that the longest path is at most twice as long as the shortest path, which is good enough to guarantee O(log N) time for search, insert, and delete operations.
Most other kinds of balanced trees have tighter balancing constraints. An AVL tree, for example, ensures that the heights of the subtrees on either side of every node differ by at most 1. This is more balance than you need, and that has costs -- inserting or deleting in an AVL tree (after finding the target node) takes O(log N) rebalancing operations on average, while inserting or deleting in a red-black tree takes O(1) rebalancing operations on average.
If you wanted to keep a tree completely balanced, so that you had the same number of descendents on either side of every node, +/- 1, it would be very expensive -- insert and delete operations would take O(N) time.
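To make the "at most twice as long" bound concrete, here is a minimal Python sketch (not from the original answer) that measures the shortest and longest root-to-NIL paths; the tuple node representation is an assumption:

def path_extremes(node):
    # A node is a (color, left, right) tuple; None stands for a black NIL leaf.
    # Returns (shortest, longest) counts of nodes on root-to-NIL paths.
    if node is None:
        return 0, 0
    _, left, right = node
    ls, ll = path_extremes(left)
    rs, rl = path_extremes(right)
    return 1 + min(ls, rs), 1 + max(ll, rl)

# Example: a black root with a single red child.
shortest, longest = path_extremes(("black", ("red", None, None), None))
assert longest <= 2 * shortest   # holds for every valid red-black tree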

Yes, it is balanced. The rule says that, counting the black NIL leaves, the longest possible path should consist of at most 2*B - 1 nodes, where B is the number of black nodes on the shortest possible path from the root to any leaf. In your example the shortest path has 2 black nodes, so B = 2, and the longest path can have up to 3 nodes, but it has just 2.

Related

Complexity analysis exercise on RB-Trees

BLACK_PATH(T, x)
    if x == NIL
        then return TRUE
    if COLOR(x) == BLACK
        then return BLACK_PATH(T, left(x)) || BLACK_PATH(T, right(x))
    return FALSE
The exercise asks to analyse the complexity of this procedure. I believe the recurrence is the following:
T(n) <= 2T(2n/3) + O(1)
Using the recursion tree I obtain T(n) = O(n). Is this correct?
The complexity of this method is linear (O(n)) in the worst case with respect to the number of elements in the tree.
Using the master theorem in terms of the total number of nodes is difficult here, because it does not take into account the properties of a red-black tree. While it is true in general for heaps that every subtree of a tree with n nodes has at most 2n/3 nodes, it is also true that in a red-black tree every subtree has at most n/2 black nodes. This is because red-black trees are balanced with respect to black nodes (every path downwards to a leaf from an arbitrary node has the same number of black nodes).
Most importantly: because the total number of nodes is not asymptotically higher than the number of black nodes, analyzing the complexity purely in terms of the number of black nodes implicitly analyzes it in terms of the total number of nodes as well.
So rather than using T(n) <= 2T(2n/3) + O(1) you should use T(m) <= 2T(m/2) + O(1), where m is the number of black nodes. This gives you O(m), and because, as discussed above, O(m) == O(n), we have O(n).
Another way to think about it: as long as you can see that this algorithm is O(n) when all the nodes in the tree are black, you should be able to see that it can only require fewer operations when some of the nodes are red, since regardless of where a red node is, every node in the subtree rooted at that red node is ignored and never visited by this recursive algorithm. So it can only be O(n) or better, establishing O(n) as the worst case.
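For concreteness, here is a runnable Python sketch of the procedure; the tuple representation of nodes is an assumption, with None standing in for NIL:

def black_path(x):
    # A node is a (color, left, right) tuple; None stands for NIL.
    if x is None:                  # x == NIL: an all-black path was found
        return True
    color, left, right = x
    if color == "black":           # keep searching below a black node
        return black_path(left) or black_path(right)
    return False                   # a red node blocks this path

# Example: the root is black and its left child is red.
print(black_path(("black", ("red", None, None), None)))   # True, via the right NIL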

Running time to check if a binary tree is a subtree of another binary tree

I've come across a naive solution for the problem of checking if a binary tree is a subtree of another binary tree:
Given two binary trees, check if the first tree is subtree of the second one. A subtree of a tree T is a tree S consisting of a node in T and all of its descendants in T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree.
For example, in the following case, tree S is a subtree of tree T:
Tree 2
     10
    /  \
   4    6
    \
     30

Tree 1
       26
      /  \
    10    3
   /  \    \
  4    6    3
   \
    30
The solution is to traverse the tree T in preorder fashion. For every visited node in the traversal, see if the subtree rooted with this node is identical to S.
It is said in the post that the algorithm has a running time of O(n^2), or O(m*n) in the worst case, where m and n are the sizes of the two trees involved.
The point of confusion here is that, if we are traversing both trees at the same time, in the worst case it would seem that you would simply have to recurse through all of the nodes in the larger tree to find the subtree. So how could this version of the algorithm (not this one) have a quadratic running time?
Well, basically in the isSubTree() function you only traverse the tree T (the main one, not a subtree). You do nothing with S, so in the worst case this function will be executed for every node in T. However, (in the worst case) each execution calls areIdentical(T, S), which in the worst case has to fully traverse one of the given trees (until one of them runs out of nodes).
The trees passed to the areIdentical() function are obviously smaller and smaller, but that does not matter for the time complexity. Either way this gives you O(n^2), or O(n*m) (where n and m are the numbers of nodes in the two trees).
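A minimal Python sketch of this naive scheme (the node shape and function names are illustrative, not taken from the linked post):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def are_identical(t, s):
    # Two trees are identical if both are empty, or their roots match
    # and both pairs of subtrees are identical.
    if t is None and s is None:
        return True
    if t is None or s is None:
        return False
    return (t.key == s.key
            and are_identical(t.left, s.left)
            and are_identical(t.right, s.right))

def is_subtree(t, s):
    # Preorder walk of T; at each node do a full comparison against S.
    if s is None:
        return True
    if t is None:
        return False
    if are_identical(t, s):
        return True
    return is_subtree(t.left, s) or is_subtree(t.right, s)

# Trees S and T from the example above.
S = Node(10, Node(4, None, Node(30)), Node(6))
T = Node(26, S, Node(3, None, Node(3)))
assert is_subtree(T, S)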
To solve this reasonably optimally, flatten the two trees. Using Lisp notation, we get

(10 (4 (30)) (6))

and

(26 (10 (4 (30)) (6)) (3 (3)))

So the subtree is a substring of the parent. Using strstr we can usually complete in O(N) time; it might take a little longer if we have lots and lots of near sub-trees. You can use a suffix tree if you need to do lots of searches, and that gets it down to O(M) time, where M is the size of the subtree.
But actually the runtime doesn't improve. It's the same algorithm, and it will show O(N*M) behaviour if, for example, all the trees have the same node id and structure except for the last right child of the query sub-tree. It's just that the operations become a lot faster.
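A hedged Python sketch of the flattening idea, reusing the Node class from the sketch above. Note that this serialization adds explicit () markers for empty children, an addition to the Lisp notation above, so that different shapes cannot produce the same string:

def flatten(node):
    # Preorder serialization; '()' marks an empty child, which keeps the
    # string unambiguous about the tree's shape.
    if node is None:
        return "()"
    return "(%s %s %s)" % (node.key, flatten(node.left), flatten(node.right))

# The subtree test becomes a substring test (Python's analogue of strstr):
assert flatten(S) in flatten(T)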

Why is in-order traversal of a threaded tree O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N).
You have to descend the links to find the leftmost child and then go back via the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre-paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we iterate over the integers and perform an individual search for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for, which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop, for another integer, is completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
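To see the bound in code, here is a minimal sketch of in-order traversal of a right-threaded binary tree; the node layout is an assumption (each right pointer holds either a real child or a thread to the in-order successor):

class ThreadedNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None             # real child, or thread to the successor
        self.right_is_thread = False  # True when right is a thread

def leftmost(node):
    while node.left is not None:
        node = node.left
    return node

def inorder(root):
    node = leftmost(root) if root is not None else None
    while node is not None:
        yield node.key
        if node.right_is_thread:
            node = node.right            # O(1) hop along the thread
        elif node.right is not None:
            node = leftmost(node.right)  # descend edges not yet crossed
        else:
            node = None                  # rightmost node: traversal is done

Each tree edge is crossed at most twice across the whole loop, which is where the O(N) total comes from.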
There is no loop to find the parent. Put differently, you are going through each arc between two nodes twice. That is 2 * (number of arcs) = 2 * (number of nodes - 1), which is O(N).

Split a tree into equal parts by deleting an edge

I am looking for an algorithm to split a tree with N nodes (where the maximum degree of each node is 3) by removing one edge from it, so that the two resulting trees have sizes as close as possible to N/2. How do I find the edge that is "the most centered"?
The tree comes as input from a previous stage of the algorithm and is given as a graph, so it is not balanced, nor is it clear which node is the root.
My idea is to find the longest path in the tree and then select the edge in the middle of the longest path. Does it work?
Ideally, I am looking for a solution that can ensure that neither of the trees has more than 2N/3 nodes.
Thanks for your answers.
I don't believe that your initial algorithm works for the reason I mentioned in the comments. However, I think that you can solve this in O(n) time and space using a modified DFS.
Begin by walking the graph to count how many total nodes there are; call this n. Now, choose an arbitrary node and root the tree at it. We will now recursively explore the tree starting from the root and compute for each subtree how many nodes it contains. This can be done using a simple recursion:
If the current node is null, return 0.
Otherwise:
For each child, compute the number of nodes in the subtree rooted at that child.
Return 1 + the total number of nodes across all child subtrees.
At this point, we know for each edge what split we will get by removing that edge, since if the subtree below that edge has k nodes in it, the split will be (k, n - k). You can thus find the best cut to make by iterating across all nodes and looking for the one that balances (k, n - k) most evenly.
Counting the nodes takes O(n) time, and running the recursion visits each node and edge at most O(1) times, so that takes O(n) time as well. Finding the best cut takes an additional O(n) time, for a net runtime of O(n). Since we need to store the subtree node counts, we need O(n) memory as well.
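A minimal sketch of this approach in Python; the adjacency-list input format and names are assumptions. An iterative walk computes subtree sizes, and each non-root node then stands for the edge to its parent:

def best_split_edge(adj, n):
    # adj: dict mapping node -> list of neighbours; n: total node count.
    # Returns (parent, child) for the edge whose removal is most balanced.
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for node in order:                    # build a parent-before-child ordering
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                order.append(nb)
    size = {node: 1 for node in order}
    for node in reversed(order):          # accumulate subtree sizes bottom-up
        if parent[node] is not None:
            size[parent[node]] += size[node]
    # Cutting the edge above `child` yields parts of size k and n - k.
    child = min((v for v in order if parent[v] is not None),
                key=lambda v: abs(n - 2 * size[v]))
    return parent[child], child

# Example: the path 0-1-2-3 is split 2/2 by its middle edge.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(best_split_edge(adj, 4))            # -> (1, 2)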
Hope this helps!
If you look at my answer to Divide-And-Conquer Algorithm for Trees, you can see that it finds a node that partitions the tree into two nearly equal-sized trees (a bottom-up algorithm); now you just need to choose one of the edges at this node to do what you want.
Your current approach does not work. Assume you have a complete binary tree, and now add a path of length 3*log n to one of its leaves (call it the bad leaf). The longest path now runs from one of the other leaves to the end of the path attached to the bad leaf, so its middle edge lies on that appended path (in fact beyond the bad leaf), and if you partition based on this edge you get one part of size O(log n) and another part of size O(n).

How does a red-black tree work?

There are lots of questions around about red-black trees but none of them answer how they work. Why is it called red-black? How does this keep the tree balanced (thus increasing performance over an unbalanced normal binary search tree)? I'm just looking for an overview of how and why it works.
For searches and traversals, it's the same as any binary tree.
For inserts and deletes, more sophisticated algorithms are applied which aim to ensure that the tree cannot be too unbalanced. These guarantee that all single-item operations will always run in at worst O(log n) time, whereas a simple binary tree can become so unbalanced that it's effectively a linked list, giving O(n) worst-case performance for each single-item operation.
The basic idea of the red-black tree is to imitate a B-tree with up to 3 keys and 4 children per node. B-trees (or variations such as B+ trees) are mainly used for database indexes and for data stored on hard disk.
Each binary tree node has a "colour" - red or black. Each black node is, in the B-tree analogy, the subtree root for the subtree that fits within that B-tree node. If this node has red children, they are also considered part of the same B-tree node. So it is possible (though not done in practice) to convert a red-black tree to a B-tree and back, with (most) structure preserved. The only possible anomaly is that when a B-tree node has two keys and three children, you have a choice of which key goes in the black node in the equivalent red-black tree.
For example, with red-black trees, every line from root to leaf has the same number of black nodes. This rule is derived from the B-tree rule that all leaf nodes are at the same depth.
Although this is the basic idea from which red-black trees are derived, the algorithms used in practice for inserts and deletes are modified to enforce all the B-tree rules during updates (there might be a minor exception - I forget), but are tailored for the binary tree form. This means that a red-black tree insert or delete may give a different resulting structure than you would expect from doing the equivalent B-tree insert or delete.
For more detail, follow the Wikipedia link that MigDus already supplied.
A red-black tree is an ordered binary tree where each vertex is coloured red or black. The intuition is that a red vertex should be seen as being at the same height as its parent (i.e., an edge to a red vertex is thought of as "horizontal" rather than "descending").
[I don't believe the Wikipedia entry makes this point clear.]
The usual rules for red-black trees require that a red vertex never point to another red vertex. This means that the possible vertex arrangements for any subtree rooted with a black vertex (bbb, bbr, rbb, rbr -- for [left child][root][right child]) correspond to 2-3-4 trees.
Searching a red-black tree is just the same as searching an ordinary binary tree. Insertion and deletion are similar, except that a "fix-up" rotation may be required at some point to preserve the red-black invariant.
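As an illustration of that fix-up primitive, here is a hedged sketch of a left rotation in Python (the RBNode fields are assumptions; real insert/delete fix-ups combine such rotations with recolouring):

class RBNode:
    def __init__(self, key, colour="red"):
        self.key = key
        self.colour = colour
        self.left = self.right = self.parent = None

def rotate_left(root, x):
    # Rotate the edge between x and its right child y; returns the tree root.
    y = x.right
    x.right = y.left                 # y's left subtree becomes x's right subtree
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent              # splice y into x's old position
    if x.parent is None:
        root = y                     # x was the root
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                       # x becomes y's left child
    x.parent = y
    return root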
Cheers!
