Disjoint-set forests - why should the rank be increased by one when the find of two nodes are of same rank?

I am implementing the disjoint-set data structure to do union-find. I came across the following statement in Wikipedia:
... whenever two trees of the same rank r are united, the rank of the result is r+1.
Why should the rank of the joined tree be increased by only one when the trees are of the same rank? What happens if I simply add the two ranks (i.e. 2*r)?

First, what is rank? It is almost the same as the height of a tree. In fact, for now, pretend that it is the same as the height.
We want to keep trees short, so keeping track of the height of every tree helps us do that. When unioning two trees of different height, we make the root of the shorter tree a child of the root of the taller tree. Importantly, this does not change the height of the taller tree. That is, the rank of the taller tree does not change.
However, when unioning two trees of the same height, we make one root the child of the other, and this increases the height of that overall tree by one, so we increase the rank of that root by one.
Now, I said that rank was almost the same as the height of the tree. Why almost? Because of path compression, a second technique used by the union-find data structure to keep trees short. Path compression can alter an existing tree to make it shorter than indicated by its rank. In principle, it might be better to make decisions based on the actual height than using rank as a proxy for height, but in practice, it is too hard/too slow to keep track of the true height information, whereas it is very easy/fast to keep track of rank.
You also asked "What happens if I simply add the two ranks (i.e. 2*r)?" This is an interesting question. The answer is probably nothing, meaning everything will still work just fine, with the same efficiency as before. (Well, assuming that you use 1 as your starting rank rather than 0.) Why? Because the way rank is used, what matters is the relative ordering of ranks, not their absolute magnitudes. If you add them, then your ranks will be 1,2,4,8 instead of 1,2,3,4 (or more likely 0,1,2,3), but they will still have exactly the same relative ordering so all is well. Your rank is simply 2^(the old rank). The biggest danger is that you run a larger risk of overflowing the integer used to represent the rank when dealing with very large sets (or, put another way, that you will need to use more space to store your ranks).
On the other hand, notice that by adding the two ranks, you are approximating the size of the trees rather than the heights of the trees. If you always add the two ranks, whether they are equal or not, you are exactly tracking the sizes of the trees. Again, everything works just fine, with the same caveats about the possibility of overflowing integers if your trees are very large.
In fact, union-by-size is widely recognized as a legitimate alternative to union-by-rank. For some applications, you actually want to know the sizes of the sets, and for those applications union-by-size is preferable to union-by-rank.
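As a rough illustration of the rule discussed in this answer, here is a minimal union-by-rank sketch in Python. The class name `DisjointSet` and its method names are made up for this example, and ranks start at 0; it is one possible way to write it, not a reference implementation.

```python
class DisjointSet:
    """Hypothetical minimal union-find with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # rank is an upper bound on tree height

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False              # already in the same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # make ra the root with the larger rank
        self.parent[rb] = ra          # attach the shorter tree under the taller
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1        # equal ranks: the result is one taller
        return True
```

Uniting two singletons gives a root of rank 1, uniting two such rank-1 trees gives rank 2, and attaching a further singleton to that tree leaves the rank at 2.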

Because in this case one tree becomes a "sub tree" of the other, which makes the resulting tree one level taller than the original.
Have a look at the following example:
1   3
|   |
2   4
In the above, the "rank" of each tree is 2.
Now, let's say 1 is going to be the new unified root; you will get the following tree:
    1
   / \
  /   \
 3     2
 |
 4
After the join, the rank of "1" is 3, i.e. rank_old(1) + 1, as expected. (1)
As for your second question: adding the ranks would yield a false height for the trees.
Take the above example: merging the two trees gives a tree of height 3, but adding the ranks records it as rank 2 + 2 = 4. What would happen if we then want to merge it with this tree (2):
    9
   / \
  10  11
      |
      13
      |
      14
We'll find that both ranks are 4 and merge them the same way we did before, without favoring the 'shorter' tree. This results in taller trees and, ultimately, worse time complexity.
(1) Disclaimer: The first part of this answer is taken from my answer to a similar question (though not identical, due to the last part of your question).
(2) Note that the above tree is synthetically made; it cannot be created by an optimized disjoint-set forest algorithm, but it still demonstrates the issues needed for the answer.

If you read that paragraph in a little more depth, you'll realize that rank is more like depth, not size:
Since it is the depth of the tree that affects the running time, the tree with smaller depth gets added under the root of the deeper tree, which only increases the depth if the depths were equal. In the context of this algorithm, the term "rank" is used instead of "depth" ...
and a merge of equal depth trees only increases the depth of the tree by one since the root of the one is added to the root of the other.
Consider:
  A                    D
 / \   merged with    / \
B   C                E   F
is:
    A
   /|\
  B C D
     / \
    E   F
The depth was 2 for both, and it's 3 for the merged one.

Rank represents the depth of the tree, not the number of nodes in it. When you join a tree with a smaller rank with a tree with a larger rank, the overall rank remains the same.
Consider adding a tree with rank 4 to the root of the tree of rank 6: since we added a node above the root of the depth-4 tree, that subtree now has a rank of 5. The subtree to which we've added our depth-4 tree, however, is 6, so the rank does not change.
Now consider adding a tree with rank 6 to the root of a second tree of rank 6: since the root of the first depth-6 tree now has an extra node above it, the rank of that subtree (and the tree overall) changes to 7.
Since the rank of the tree determines the processing speed, the algorithm tries to keep the rank as low as possible by always attaching a shorter tree to the taller one, keeping the overall rank unchanged. The rank changes only when the trees have identical ranks, in which case one of them gets attached to the root of the other, bumping up the rank by one.

There are two things worth understanding here:
1) What is rank?
2) Why is rank used?
Rank is essentially the depth (level) of a tree. When we union nodes, these (graph) nodes form a tree with a single ultimate root node. Rank is stored only for those root nodes.
Consider A merged with D.
Initially A has rank (level) 0 and D has rank (level) 0, so you can merge them by making either one the root. If you make A the root, the rank (level) will be 1, and if you make D the root, the rank will also be 1.
A
 `D
Here the rank (level) is 1 with A as the root.
Now consider another case:
A     merge   B    ----->    A
 `D            `C           / \
                           D   B
                                \
                                 C
So the level increases by 1: below the root (A), the maximum height / depth / rank is 2. Level 1 holds {D, B} and level 2 holds {C}.
Our main objective is to keep the rank (depth) of the merged tree as small as possible.
When two trees of different rank merge:
A (rank 0)  merge  B (rank 1)  --->    B
                    `C                / \
                                     A   C
Here the merged tree's rank is 1, the same as the higher rank (1). When the lower-rank tree goes under the higher-rank tree, the merged tree's rank (height/depth) equals the rank of the higher-rank tree. That is, the rank does not increase.
But if we do the reverse and put the higher-rank tree under the lower-rank tree:
A (rank 0)  merge  B (rank 1)  -->  A    (merged tree rank 2, greater than both)
                    `C               `B
                                       `C
So if we want to keep the rank (height) of the merged tree as small as possible, we have to choose the first approach.
Why do we want to keep the tree's height as small as possible? In a disjoint-set union, finding the ultimate root of a node means walking from that node up to its root, so the taller the tree, the slower that walk. That is why, when merging two trees, we try to keep the height/depth/rank as small as possible.

Related

Depth vs Level of a Tree

Are they the same thing?
In this article, Height, Depth and Level of a Tree,
depth is defined as the number of edges from a node to the tree's root node, while
level is defined as
"1 + the number of connections between the node and the root,"
or basically depth + 1.
And in this link,
What is level of root node in a tree?
it is said that level can start with either 1 or 0, which makes it the same as depth if it starts with 0.
So which is which? If it is 1 + depth, then what is the use of adding 1?
Well, it is best explained by an image; just see below:
// I've used 1 for the root's level
// though some people consider the root's level to be 0, so you can use either 0 or 1
// I would prefer to use 1
// but it's your choice
                                 o (depth=0, height=3, lev=1)
                                / \
    (depth=1, height=2, lev=2) o   o (depth=1, height=1, lev=2)
                              /   / \
  (depth=2, height=1, lev=3) o   o   o (depth=2, height=0, lev=3)
                            /
(depth=3, height=0, lev=4) o
I hope it's clear to you now...
I asked the question because there was already a lot written about height vs. depth of a tree, but no clear distinction between level and depth, and the two are often used interchangeably.
So, from what I have read here, in different articles, and in a book:
The level is depth + 1. It is not the same as depth, although some choose to start the level at 0.
Depth is mostly used in relation to the root as
Depth is the number of edges from the root to a node
So it is mostly treated as a property of a node while the level is mostly used as a whole e.g.
Width is the number of nodes in a level
Or in
A perfect binary tree is where all internal nodes have two children and all leaves are at the same level
So level is like steps in a tree, where the root node is the first step, and it just so happens that it follows the same pattern as the depth of a node.
Although there is no single definition, to distinguish the two, the level is mostly taken as depth + 1.
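To make the three quantities concrete, here is a small Python sketch that computes depth, height and level for every node, using the level = depth + 1 convention. The function name `annotate`, the dictionary representation of the tree, and the node labels are assumptions made for this example; the tree shape is the one from the picture in the earlier answer.

```python
def annotate(children, root):
    """Return {node: (depth, height, level)} with level defined as depth + 1.

    `children` maps a node label to the list of its child labels
    (a hypothetical representation chosen for this sketch).
    """
    info = {}

    def visit(node, depth):
        height = 0  # a leaf has height 0
        for child in children.get(node, []):
            # height = edges on the longest downward path to a leaf
            height = max(height, 1 + visit(child, depth + 1))
        info[node] = (depth, height, depth + 1)
        return height

    visit(root, 0)
    return info


# Same shape as the picture: the root has two children, the left one
# heads a chain of three nodes, the right one has two leaf children.
tree = {"r": ["a", "b"], "a": ["c"], "c": ["d"], "b": ["e", "f"]}
info = annotate(tree, "r")
```

With this input, the root gets depth 0, height 3, level 1, and the deepest node gets depth 3, height 0, level 4, matching the labels in the picture.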

What is the total number of nodes generated by Depth-First Search

Assume: 'd' is the finite depth of a tree ; 'b' is a branching factor ; 'g' is the shallowest goal node.
From what I know, the worst case is when the goal node is the very last, bottom-right node in a tree.
Thus, supposedly the total number of nodes generated is O(b**g), right?
However, my instructor told me it was wrong, since the worst case is when the whole tree is explored except the subtree rooted at the goal node.
He mentioned something about O(b**d) - O(b**(g-d)) .... I'm not entirely sure.
I don't really get what he means, so can somebody tell me which answer is correct?
I recommend drawing a tree, marking the nodes that are explored, and counting how many there are.
Your reasoning is correct if you use breadth first search because you will only have reached a depth of g for each branch (O(b**g) nodes explored in total).
Your instructor's reasoning is correct if you use depth first search because you reach a depth of d for all parts of the tree except the one with the goal (O(b**d - b**(d-g)) nodes explored).
The goal is the green circle.
The blue nodes are explored.
The red nodes are not explored.
To count the number explored we count the total in the tree, and take away the red ones.
Depth = 2 = d
Goal at depth = 1 = g
Branching factor = b = 3
Note that I have called the total number of nodes in the tree O(b**d). Strictly speaking, the total is b**d + b**(d-1) + b**(d-2) + ... + 1, but this is O(b**d).
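The counting argument above can be checked numerically with a short Python sketch. The function names are made up for this example, and it assumes a full b-ary tree with the root at depth 0 and the goal reached last among the nodes at depth g, so that only the goal's descendants are never touched.

```python
def tree_size(b, d):
    """Nodes in a full b-ary tree whose deepest level is d: b**d + ... + b + 1."""
    return sum(b**i for i in range(d + 1))


def dfs_worst_case_visited(b, d, g):
    """Nodes touched by depth-first search in the worst case described above.

    Everything is visited except the descendants of the goal, which sits at
    depth g; the goal itself is counted as visited.
    """
    goal_descendants = tree_size(b, d - g) - 1  # subtree under the goal, minus the goal
    return tree_size(b, d) - goal_descendants
```

For the example values b = 3, d = 2, g = 1, the tree has 13 nodes, the goal has 3 descendants, and the sketch counts 10 visited nodes including the goal.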

Given the level order traversal of two complete binary trees, how to check whether one tree is mirror of other?

How to check whether two complete binary trees are mirror of each other where only the level order traversal of the trees are given ?
A complete binary tree is a binary tree in which all the nodes except the leaf nodes have 2 child nodes.
I don't think this is possible. Consider these two trees:
        0               0
       / \             / \
      1   2           1   2
     / \             / \ / \
    3   4           3  4 5  6
   / \
  5   6
These are complete binary trees (according to your definition), and even though they're different trees they have the same level-order traversals: 0123456.
Now, look at their mirrors:
      0               0
     / \             / \
    2   1           2   1
       / \         / \ / \
      4   3       6  5 4  3
         / \
        6   5
Notice that the left tree has level-order traversal 0214365, while the right tree has level-order traversal 0216543. In other words, the original trees have the same level order traversals, but their mirrors have different traversals.
Now, think about what happens if you have your algorithm and you feed in 0123456 (the level-order traversal of either of the trees) and 0214365 (the level-order traversal of one of the mirrors). What can the algorithm say? If it says that they're mirrors, it will be wrong if you fed in the second of the input trees. If it says that they're not mirrors, it will be wrong if you fed in the first of the input trees. Therefore, there's no way for the algorithm to always produce the right answer.
Hope this helps!
According to the general definition, in a complete binary tree every level, except possibly the last, is completely filled, and all nodes are as far left as possible. So a complete binary tree may have a node with just one child (for example, one root node with one left child is a complete binary tree). A tree where all nodes except the leaves have 2 child nodes is called a full binary tree.
For complete binary trees, the problem would be trivial. Starting from the top-down, for the ith level you need to compare 2^i elements (root being the 0th level) of the given level-order traversals A and B. For any given i, the set of 2^i elements from A should be equal to the reverse of these elements from B. However, the last level may not be completely filled and you'll need to account for that.
For full binary trees, where the only constraint given is that every node has 2 or no children, it won't be possible unless you have constructed the tree itself. And you cannot construct a tree by using only level-order traversal. templatetypedef provides a good example.
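The level-by-level comparison described above can be sketched in Python as follows. This is a simplified version that assumes every level, including the last, is completely filled (so the number of nodes is 2^k - 1); the function name is hypothetical, and a partially filled last level would need the extra handling the answer mentions.

```python
def is_mirror_perfect(a, b):
    """Check whether level-order lists a and b describe mirror-image trees.

    Assumption for this sketch: both trees have every level completely
    filled, so level i occupies 2**i consecutive entries of the list.
    """
    if len(a) != len(b):
        return False
    start, width = 0, 1
    while start < len(a):
        # Level i of one tree must be level i of the other, reversed.
        if a[start:start + width] != b[start:start + width][::-1]:
            return False
        start += width
        width *= 2
    return True
```

For example, `[0, 1, 2, 3, 4, 5, 6]` and `[0, 2, 1, 6, 5, 4, 3]` pass the check, while `[0, 1, 2, 3, 4, 5, 6]` and `[0, 2, 1, 4, 3, 6, 5]` do not.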

Is it always possible to turn one BST into another using tree rotations?

Given a set of values, it's possible for there to be many different possible binary search trees that can be formed from those values. For example, for the values 1, 2, and 3, there are five BSTs we can make from those values:
1         1         2         3         3
 \         \       / \       /         /
  2         3     1   3     1         2
   \       /                 \       /
    3     2                   2     1
Many data structures that are based on balanced binary search trees use tree rotations as a primitive for reshaping a BST without breaking the required binary search tree invariants. Tree rotations can be used to pull a node up above its parent, as shown here:
                rotate
         u      right       v
        / \     ----->     / \
       v   C              A   u
      / \       <-----       / \
     A   B      rotate      B   C
                 left
Given a BST containing a set of values, is it always possible to convert that BST into any arbitrary other BST for the same set of values? For example, could we convert between any of the five BSTs above into any of the other BSTs just by using tree rotations?
The answer to your question depends on whether you are allowed to have equal values in the BST that can appear different from one another. For example, if your BST stores key/value pairs, then it is not always possible to turn one BST for those key/value pairs into a different BST for the same key/value pairs.
The reason for this is that the inorder traversal of the nodes in a BST remains the same regardless of how many tree rotations are performed. As a result, it's not possible to convert from one BST to another if the inorder traversal of the nodes would come out differently. As a very simple case, suppose you have a BST holding two copies of the number 1, each of which is annotated with a different value (say, A or B). In that case, there is no way to turn these two trees into one another using tree rotations:
1:a        1:b
  \          \
  1:b        1:a
You can check this by brute-forcing the (very small!) set of possible trees you can make with the rotations. However, it suffices to note that an inorder traversal of the first tree gives 1:a, 1:b and an inorder traversal of the second tree gives 1:b, 1:a. Consequently, no number of rotations will suffice to convert between the trees.
On the other hand, if all the values are different, then it is always possible to convert between two BSTs by applying the right number of tree rotations. I'll prove this using an inductive argument on the number of nodes.
As a simple base case, if there are no nodes in the tree, there is only one possible BST holding those nodes: the empty tree. Therefore, it's always possible to convert between two trees with zero nodes in them, since the start and end tree must always be the same.
For the inductive step, let's assume that for any two BSTs of 0, 1, 2, .., n nodes with the same values, that it's always possible to convert from one BST to another using rotations. We'll prove that given any two BSTs made from the same n + 1 values, it's always possible to convert the first tree to the second.
To do this, we'll start off by making a key observation. Given any node in a BST, it is always possible to apply tree rotations to pull that node up to the root of the tree. To do this, we can apply this algorithm:
while (target node is not the root) {
    if (node is a left child) {
        apply a right rotation to the node and its parent;
    } else {
        apply a left rotation to the node and its parent;
    }
}
The reason that this works is that every time a node is rotated with its parent, its depth decreases by one. As a result, after applying sufficiently many rotations of the above forms, we can get the node up to the root of the tree.
This now gives us a very straightforward recursive algorithm we can use to reshape any one BST into another BST using rotations. The idea is as follows. First, look at the root node of the second tree. Find that node in the first tree (this is pretty easy, since it's a BST!), then use the above algorithm to pull it up to the root of the tree. At this point, we have turned the first tree into a tree with the following properties:
The first tree's root node is the root node of the second tree.
The first tree's right subtree contains the same nodes as the second tree's right subtree, but possibly with a different shape.
The first tree's left subtree contains the same nodes as the second tree's left subtree, but possibly with a different shape.
Consequently, we can then recursively apply this same algorithm to make the left subtree have the same shape as the left subtree of the second tree and to make the right subtree have the same shape as the right subtree of the second tree. Since these left and right subtrees each have at most n nodes, by our inductive hypothesis we know that it's always possible to do this, and so the algorithm will work as intended.
To summarize, the algorithm works as follows:
1. If the two trees are empty, we are done.
2. Find the root node of the second tree in the first tree.
3. Apply rotations to bring that node up to the root.
4. Recursively reshape the left subtree of the first tree to have the same shape as the left subtree of the second tree.
5. Recursively reshape the right subtree of the first tree to have the same shape as the right subtree of the second tree.
To analyze the runtime of this algorithm, note that applying steps 1 - 3 requires at most O(h) steps, where h is the height of the first tree. Every node will be brought up to the root of some subtree exactly once, so we do this a total of O(n) times. Since the height of an n-node tree is never greater than O(n), this means that the algorithm takes at most O(n^2) time to complete. It's possible that it will do a lot better (for example, if the two trees already have the same shape, then this runs in time O(n)), but this gives a nice worst-case bound.
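The recursive reshaping procedure can be sketched in Python roughly as follows. The `Node` class and all function names are hypothetical, and the sketch assumes both trees hold the same set of distinct keys. `pull_to_root` performs the same rotations as the loop given earlier, written recursively so that no parent pointers are needed.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(u):
    """Lift u's left child v above u; returns the new subtree root v."""
    v = u.left
    u.left, v.right = v.right, u
    return v


def rotate_left(v):
    """Lift v's right child u above v; returns the new subtree root u."""
    u = v.right
    v.right, u.left = u.left, v
    return u


def pull_to_root(root, key):
    """Rotate the node holding `key` up to the root of this subtree."""
    if root.key == key:
        return root
    if key < root.key:
        root.left = pull_to_root(root.left, key)
        return rotate_right(root)
    root.right = pull_to_root(root.right, key)
    return rotate_left(root)


def reshape(first, second):
    """Rotate `first` into the shape of `second` (same distinct keys assumed)."""
    if second is None:
        return None
    first = pull_to_root(first, second.key)
    first.left = reshape(first.left, second.left)
    first.right = reshape(first.right, second.right)
    return first


def to_tuple(t):
    """Nested-tuple view of a tree, handy for comparing shapes."""
    return None if t is None else (t.key, to_tuple(t.left), to_tuple(t.right))
```

For instance, reshaping the right-leaning chain 1, 2, 3 against the balanced tree rooted at 2 yields a tree whose `to_tuple` form equals that of the target.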
Hope this helps!
For binary search trees this can actually be done in O(n).
Any tree can be "straightened out", i.e. put into a form in which every node is either the root or a left child.
This form is unique (reading down from the root gives the ordering of the elements).
A tree is straightened out as follows:
Take any node on the left spine (the path from the root that follows only left-child links) that has a right child, and perform a left rotation that lifts that right child above it. This lengthens the left spine by one node, so the tree is straightened out in O(n) rotations.
If A can be straightened out into S in O(n) rotations, and B into S in O(n) rotations, then since rotations are reversible one can turn A -> S -> B in O(n) rotations.
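One way to sketch the straightening step in Python is shown below. The `Node` class and the function name `straighten` are assumptions for this example; the sketch walks down the left spine and left-rotates whenever the current spine node has a right child, so each rotation adds one node to the spine.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def straighten(root):
    """Return (new_root, rotations) where every node ends up on the left spine."""
    sentinel = Node(None, left=root)  # fake parent so the root needs no special case
    parent, rotations = sentinel, 0
    while parent.left is not None:
        u = parent.left
        if u.right is not None:
            # Left rotation: lift u's right child v above u.
            v = u.right
            u.right = v.left
            v.left = u
            parent.left = v
            rotations += 1            # the spine grew by one node (v)
        else:
            parent = u                # u is settled, move down the spine
    return sentinel.left, rotations


def spine_keys(root):
    """Keys read from the root down the left spine."""
    keys = []
    while root is not None:
        keys.append(root.key)
        root = root.left
    return keys
```

Since the spine can hold at most n nodes and starts with at least one, the loop performs at most n - 1 rotations; reading the resulting spine from the root down gives the keys in descending order.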

IOI 2003 : how to calculate the node that has the minimum balance in a tree?

Here is the Balancing Act problem, which asks for the node that has the minimum balance in a tree. Balance is defined as:
Deleting any node
from the tree yields a forest : a collection of one or more trees. Define the balance of a node to be the size of the largest tree in the forest T created by deleting that node from T
For a sample tree with these edges (one pair of endpoints per line):
2 6
1 2
1 4
4 5
3 7
3 1
the explanation is:
Deleting node 4 yields two trees whose member nodes are {5} and {1,2,3,6,7}. The
larger of these two trees has five nodes, thus the balance of node 4 is five. Deleting node
1 yields a forest of three trees of equal size: {2,6}, {3,7}, and {4,5}. Each of these trees
has two nodes, so the balance of node 1 is two.
What kind of algorithm can you offer to this problem?
Thanks
I am going to assume that you have had a looong look at this problem: reading the solution does not help, you only get better at solving these problems by solving them yourself.
So one thing to observe is, the input is a tree. That means that each edge joins 2 smaller trees together. Removing an edge yields 2 disconnected trees (a forest of 2 trees).
So, if you calculate the size of the tree on one side of the edge, and then on the other, you should be able to look at a node's edges and ask "What is the size of the tree on the other side of this edge?"
You can calculate the sizes of trees using dynamic programming - your recurrence state is "What edge am I on? What side of the edge am I on?" and it calculates the size of the tree "hung" at that node. That is the crux of the problem.
Having that data, it is sufficient to iterate through all the nodes, look at their edges and ask "What is the size of the tree on the other side of this edge?" From there, you just pick the minimum.
Hope that helps.
You basically want to check 3 things for every node:
The size of its left subtree.
The size of its right subtree.
The size of the rest of the tree (size of tree - left - right - 1).
You can use this algorithm and extend it to any kind of tree (any number of children per node).
Go over the tree in post-order.
Do this recursively:
Every time, just before you back up from a node to its "father" node, add 1 + the total size of the node's subtrees to the "father" node.
Then store a value, let's call it maxTree, in the node, holding the maximum over all of its subtree sizes and (size of tree) - (sum of all subtrees) - 1.
This way you can calculate all the subtree sizes in O(N).
While traversing the tree, you can keep a variable that holds the minimum value found so far.
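The subtree-size idea from the answers above can be sketched in Python like this. The function name `balances` and the edge-list input are assumptions for the example; nodes are taken to be labeled 1..n, and the tree is rooted at node 1 purely for bookkeeping.

```python
from collections import defaultdict


def balances(n, edges):
    """Return {node: balance} for a tree with nodes 1..n given as an edge list."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Iterative DFS from node 1; `order` lists each parent before its descendants.
    parent = {1: None}
    order, stack = [], [1]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    size = {u: 1 for u in order}
    balance = {u: 0 for u in order}
    # Reverse order processes children before parents, so size[u] is final here.
    for u in reversed(order):
        balance[u] = max(balance[u], n - size[u])  # the part "above" u
        p = parent[u]
        if p is not None:
            size[p] += size[u]
            balance[p] = max(balance[p], size[u])  # the subtree hanging at u
    return balance


sample_edges = [(2, 6), (1, 2), (1, 4), (4, 5), (3, 7), (3, 1)]
sample = balances(7, sample_edges)
best = min(sample, key=lambda u: (sample[u], u))
```

On the sample tree this gives balance 2 for node 1 and balance 5 for node 4, in line with the explanation quoted in the question, and node 1 is the node with the minimum balance.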
