Disjoint Set and Union Data Structure - algorithm

A union-find structure is a data structure
supporting the following operations:
● find(x), which returns the representative of
node x, and
● union(x, y), which merges the sets containing x
and y into a single set.
find(x) has a worst-case time complexity of O(n), so to improve this we are advised to use the concept of ranks,
i.e.
the larger connected component absorbs the smaller one, which improves the time complexity to O(log n).
I could not understand how merging trees on the basis of their rank (depth) improves the time complexity, or how the O(log n) bound is achieved.
Please help me understand the idea of merging trees on the basis of their rank.

The key is to understand that the maximal height of a tree representing a set is log(n) + 1 (counting nodes on the longest root-to-leaf path), so following parent pointers from any given node to its root takes O(log(n)) steps.
We now have to prove the claim that each tree in the disjoint set forest is at most of height log(n) + 1 - where n is the number of nodes in this tree. We will prove it by induction and show that after each union(x,y) - this property remains unchanged.
Base: When we begin, we have n different trees, all of size 1. log(1) + 1 = 1 - so each tree is indeed of maximal height log(n) + 1
Union(x,y): We unite two sets, x of size n1 and y of size n2. Without loss of generality, let n1<=n2.
From the induction hypothesis, the height h1 of the tree representing x is at most log(n1)+1.
The union operation is done by changing x's root to point to y's root. This means that the maximal height of any node that was in x is now at most
h1+1 <= log(n1)+1 + 1 = log(n1) + log(2) + 1 = log(2*n1) + 1 = log(n1 + n1) + 1 <= log(n1 + n2) + 1
So, we have just found out that for every node that was formerly in x, the maximal distance to the root is at most log(n1+n2) + 1, and the size of the new tree (x and y united) is now n1+n2, so the desired property holds for any node that was formerly in x.
For y, the distance to the root is unchanged while the size of the tree does not shrink, so the property is valid there too.
In conclusion, for every node that was in x or y, the maximal depth from the new root is now at most log(n1+n2)+1, as required.
QED
Remark: all logarithms in this answer are base 2.
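The bound proved above can also be checked empirically. The following is a minimal sketch, assuming a plain array-based forest with union by size and no path compression; the names `UnionFind`, `depth` and `max_height_within_bound` are illustrative and not from the original answer.

```python
import math
import random


class UnionFind:
    """Disjoint-set forest with union by size and no path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every node starts as its own root
        self.size = [1] * n           # size[r] is meaningful only while r is a root

    def find(self, x):
        # Follow parent pointers up to the representative.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] > self.size[ry]:
            rx, ry = ry, rx           # make rx the root of the smaller tree
        self.parent[rx] = ry          # the smaller tree hangs under the larger root
        self.size[ry] += self.size[rx]

    def depth(self, x):
        # Number of nodes on the path from x to its root (a lone root has depth 1).
        d = 1
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


def max_height_within_bound(n, seed=0):
    """Run random unions and check depth <= log2(tree size) + 1 for every node."""
    rng = random.Random(seed)
    uf = UnionFind(n)
    for _ in range(2 * n):
        uf.union(rng.randrange(n), rng.randrange(n))
    return all(
        uf.depth(x) <= math.log2(uf.size[uf.find(x)]) + 1 for x in range(n)
    )
```

Here depth is counted in nodes, matching the log(n) + 1 convention of the proof. A random test like this only illustrates the bound; the induction is what proves it.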

Related

Why is the time complexity of performing n union find (union by size) operations O(n log n)?

In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.
To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.
In union by size, when performing a union, we make the root of the smaller tree point to the root of the larger. This implies O(n log n) time for performing n union-find operations. Each time we follow a pointer, we move to a subtree at least double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.
I do not understand how, for each union operation, the find operation is always O(log n). Can someone please explain how the worst-case complexity is actually computed?
Let's assume for the moment, that each tree of height h contains at least 2^h nodes. What happens, if you join two such trees?
If they are of different heights, the height of the combined tree is the same as the height of the higher one, so the new tree still has at least 2^h nodes (same height but more nodes).
Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.
The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h) this gives immediately log2(n) >= h >= steps.
You can do n union-find (union by rank or size) operations in O(n lg* n) time by adding the path compression optimization, where lg* n is the iterated logarithm. The tight amortized bound is O(n α(n)), where α is the inverse Ackermann function, which grows even more slowly than lg*.
Note that O(n lg* n) is better than O(n log n).
In the question "Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets?" you can find details about this relation.
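To make the path compression idea concrete, here is a minimal sketch that combines it with union by rank. The class name `CompressedUnionFind` and the two-pass `find` are choices made for illustration; other compression variants (path halving, path splitting) exist as well.

```python
class CompressedUnionFind:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n  # upper bound on the height of each root's tree

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False              # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx           # rx is now the root of higher (or equal) rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1        # equal ranks: the merged tree may grow by one level
        return True
```

After `union(0, 1)`, `union(2, 3)` and `union(0, 2)` on four elements, node 3 sits two pointers away from the root; a single `find(3)` rewires it to point at the root directly, so later finds on it take one step.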
We need to prove that the maximum height of any tree is at most log(N), where N is the number of items in the tree (1).
In the base case, all trees have a height of 0, so (1) is of course satisfied.
Now, assuming all the trees satisfy (1), we need to prove that joining any 2 trees with i and j nodes (i <= j) creates a new tree whose height is at most log(i + j) (2):
Because the join procedure takes the root node of the smaller tree and attaches it to the root node of the bigger one, the height of the new tree will be at most:
max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
log(j): the height of the new tree is still the height of the bigger tree
1 + log(i): when the heights of the 2 trees are the same, the new tree is one level taller than the smaller tree
See the picture below for more details:
Ref: book Algorithms

Running time of algorithm with arbitrary sized recursive calls

I have written the following algorithm that given a node x in a Binary Search Tree T, will set the field s for all nodes in the subtree rooted at x, such that for each node, s will be the sum of all odd keys in the subtree rooted in that node.
OddNodeSetter(T, x):
    if (x == NIL):
        return 0
    if (x.key mod 2 == 1):
        x.s = x.key + OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    else:
        x.s = OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    return x.s
I've thought of using the master theorem for this, with the recurrence
T(n) = T(k) + T(n-k-1) + 1 for 0 <= k < n
However, since the sizes of the two recursive calls vary with k and n-k-1 (i.e. the number of nodes in the left and right subtrees of x), I can't figure out how to solve this recurrence. For example, when the left and right subtrees of x have the same number of nodes, we can express the recurrence in the form
T(n) = 2T(n/2) + 1
which can be solved easily, but that doesn't prove the running time in all cases.
Is it possible to prove this algorithm runs in O(n) with the master theorem, and if not what other way is there to do this?
The algorithm visits every node in the tree exactly once, hence O(N).
Update:
And obviously, a visit takes constant time (not counting the recursive calls).
There is no need to use the Master theorem here.
Think of the problem this way: what is the maximum number of operations you have to do for each node in the tree? It is bounded by a constant. And what is the number of nodes in the tree? It is n.
A constant multiplied by n is still O(n).
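The "one visit per node" argument can be seen directly by counting visits. This is a sketch, assuming a simple `Node` class and a mutable one-element list as a visit counter (both introduced here for illustration); it runs the same traversal on a degenerate chain and on a balanced tree with the same 7 keys.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.s = 0


def odd_node_setter(x, counter):
    """Set x.s to the sum of odd keys in x's subtree and return that sum."""
    if x is None:
        return 0
    counter[0] += 1  # exactly one constant-time visit per real node
    total = odd_node_setter(x.left, counter) + odd_node_setter(x.right, counter)
    if x.key % 2 == 1:
        total += x.key
    x.s = total
    return total


# A degenerate (list-like) tree and a balanced tree, both holding keys 1..7.
chain = None
for key in range(7, 0, -1):
    chain = Node(key, right=chain)
balanced = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

chain_visits, balanced_visits = [0], [0]
chain_sum = odd_node_setter(chain, chain_visits)
balanced_sum = odd_node_setter(balanced, balanced_visits)
```

Both traversals visit 7 nodes regardless of shape, which is why the split between k and n-k-1 in the recurrence does not affect the O(n) total.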

Proof that a binary tree with n leaves has a height of at least log n

I've been able to create a proof that shows the maximum total nodes in a tree is equal to n = 2^(h+1) - 1 and logically I know that the height of a binary tree is log n (can draw it out to see) but I'm having trouble constructing a formal proof to show that a tree with n leaves has "at least" log n. Every proof I've come across or been able to put together always deals with perfect binary trees, but I need something for any situation. Any tips to lead me in the right direction?
Lemma: the number of leaves in a tree of height h is no more than 2^h.
Proof: the proof is by induction on h.
Base Case: for h = 0, the tree consists of only a single root node which is also a leaf; here, n = 1 = 2^0 = 2^h, as required.
Induction Hypothesis: assume that all trees of height k or less have at most 2^k leaves.
Induction Step: we must show that trees of height k+1 have no more than 2^(k+1) leaves. Consider the left and right subtrees of the root (one of them may be empty). These are trees of height no more than k, one less than the height of the whole tree. Therefore, each has at most 2^k leaves, by the induction hypothesis. Since the total number of leaves is just the sum of the numbers of leaves of the subtrees of the root, we have n <= 2^k + 2^k = 2^(k+1), as required. This proves the claim.
Theorem: a binary tree with n leaves has height at least log(n).
We have already noted in the lemma that the tree consisting of just the root node has one leaf and height zero, so the claim is true in that case. For trees with more nodes, the proof is by contradiction.
Let n = 2^a + b where 0 < b <= 2^a. Now, assume the height of the tree is less than a + 1, contrary to the theorem we intend to prove. Then the height is at most a. By the lemma, the maximum number of leaves in a tree of height a is 2^a. But our tree has n = 2^a + b > 2^a leaves, since 0 < b; a contradiction. Therefore, the assumption that the height was less than a+1 must have been incorrect. This proves the claim.
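The lemma is easy to sanity-check on arbitrary (not just perfect) binary trees. The sketch below assumes trees are represented as nested `(left, right)` tuples with `None` for an empty subtree; `random_tree`, `height`, `leaves` and `check_bound` are hypothetical helper names.

```python
import random


def random_tree(n, rng):
    """Random binary tree with n nodes as nested (left, right) tuples; None is empty."""
    if n == 0:
        return None
    left_size = rng.randrange(n)  # 0 .. n-1 nodes go to the left subtree
    return (random_tree(left_size, rng), random_tree(n - 1 - left_size, rng))


def height(t):
    # Height in edges: a single node has height 0, an empty tree has height -1.
    if t is None:
        return -1
    return 1 + max(height(t[0]), height(t[1]))


def leaves(t):
    if t is None:
        return 0
    if t[0] is None and t[1] is None:
        return 1
    return leaves(t[0]) + leaves(t[1])


def check_bound(trials=200, max_nodes=60, seed=1):
    """Check leaves <= 2^height, which is equivalent to height >= log2(leaves)."""
    rng = random.Random(seed)
    for _ in range(trials):
        t = random_tree(rng.randint(1, max_nodes), rng)
        if leaves(t) > 2 ** height(t):
            return False
    return True
```

Testing `leaves <= 2 ** height` with integers avoids floating-point logarithms while stating the same inequality as the theorem.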

Given a binary tree find all root-to-leaf paths

Given a binary tree the problem is to find all root-to-leaf paths. And we know the algorithm by passing the path in the form of a list and adding it to our result as soon as we hit a leaf.
My question: how much space does storing all the paths consume? My intuition is that each path consumes memory on the order of the height of the tree (O(h)), and if there are 2*n - 1 nodes in our full binary tree then there are n leaves, each corresponding to a path, so the space complexity would be O(n*log(n)), assuming the tree is height balanced. Is my analysis correct?
Your reasoning is correct, but it can be made more exact. A balanced binary tree is not necessarily a full binary tree.
Let N(h) be the number of paths when the height is h. Then N(h) <= 2 N(h - 1). This is because, given a tree of height h, the children are each trees of height at most h - 1. So
N(h) = O(2^h).
Now we need to bound h. Since h appears in the exponent, it is not enough to find its order of growth. More exactly, it is known that
n >= 2^h - 1
so
h <= log(n + 1)
Inserting this into what we have before
N(h) = O(2^(log(n + 1))) = O(n).
As you wrote, the memory is the sum, over all paths, of the number of nodes on each path. Each path contains at most log(n + 1) nodes. Combining all the above gives O(n log(n)).
Actually, there is a direct result from the very definition of a tree: there is a unique path between any 2 nodes.
So, if n is the number of leaves, then total (root to leaf) paths = n.
And correspondingly, the tree's height is O(log n).
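The space estimate can be measured on a small case. Below is a sketch of the list-passing approach described in the question, assuming a minimal `Node` class and a `perfect_tree` helper that exist only for this illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def root_to_leaf_paths(root):
    paths, current = [], []

    def walk(node):
        if node is None:
            return
        current.append(node.key)
        if node.left is None and node.right is None:
            paths.append(list(current))  # the copy is where O(h) per path is paid
        else:
            walk(node.left)
            walk(node.right)
        current.pop()

    walk(root)
    return paths


def perfect_tree(levels):
    """Perfect binary tree with the given number of levels; keys are just level labels."""
    if levels == 0:
        return None
    return Node(levels, perfect_tree(levels - 1), perfect_tree(levels - 1))


paths = root_to_leaf_paths(perfect_tree(3))
total_stored = sum(len(p) for p in paths)
```

For a perfect tree with 3 levels (7 nodes, 4 leaves) this stores 4 paths of 3 keys each, 12 keys in total: the number of leaves times the path length, as in the analysis above.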

Number of nodes in the bottom level of a balanced binary tree

I am wondering about two questions that came up when studying about binary search trees. They are the following:
What is the maximum number of nodes in the bottom level of a balanced binary search tree with n nodes?
What is the minimum number of nodes in the bottom level of a balanced binary search tree with n nodes?
I cannot find any formulas in my textbook regarding this. Is there any way to answer these questions? Please let me know.
Using notation:
H = Balanced binary tree height
L = Total number of leaves in a full binary tree of height H
N = Total number of nodes in a full binary tree of height H
The relation is L = (N + 1) / 2 as demonstrated below. That would be the maximum number of leaf nodes for a given tree height H. The minimum number of nodes at a given height is 1 (cannot be zero, because then the tree height would be reduced by one).
Drawing trees with increasing heights, one can observe that:
H = 1, L = 1, N = 1
H = 2, L = 2, N = 3
H = 3, L = 4, N = 7
H = 4, L = 8, N = 15
...
The relation between tree height (H) and the total number of leaves (L)
and the total number of nodes (N) becomes apparent:
L = 2^(H-1)
N = (2^H) - 1
The correctness is easily proven using mathematical induction.
Examples above show that it is true for small H.
Simply put in the value of H (e.g. H=1) and compute L and N.
Assuming the formulas are true for some H, one can show they are also true for HH=H+1:
For L, the assumption is that L=2^(H-1) is true.
As each node has two children, increasing the height by one
is going to replace each leaf node with two new leaves, effectively
doubling the total number of leaves. Therefore, in case of HH=H+1,
the total number of leaves (LL) is going to be doubled:
LL = L * 2
= 2^(H-1) * 2
= 2^(H)
= 2^(HH-1)
For N, the assumption is that N=(2^H)-1 is true.
Increasing the height by one (HH=H+1) increases the total number
of nodes by the total number of added leaf nodes. Therefore,
NN = N + LL
= (2^H) - 1 + 2^(HH-1)
= 2^(HH-1) - 1 + 2^(HH-1)
= 2 * 2^(HH-1) - 1
= (2^HH) - 1
Applying the mathematical induction, the correctness is proven.
H can be expressed in terms of N:
N = (2^H) - 1 // +1 to both sides
N + 1 = 2^H // apply log2 monotone function to both sides
log2(N+1) = log2(2^H)
= H * log2(2)
= H
The direct relation between L and N (which is the answer to the question asked) is:
L = 2^(H - 1) // replace H = log2(N + 1)
= 2^(log2(N + 1) - 1)
= 2^(log2(N + 1) - log2(2))
= 2^(log2( (N + 1) / 2 ))
= (N + 1) / 2
For Big O analysis, the constants are discarded, so the Binary Search Tree lookup time complexity (i.e. H with respect to the input size N) is O(log2(N)). Also, keeping in mind the formula for changing the logarithm base:
log2(N) = log10(N) / log10(2)
and discarding the constant factor 1/log10(2), where instead of 10 one can have an arbitrary logarithm base, the time complexity is simply O(log(N)) regardless of the chosen logarithm base constant.
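The closed forms derived above can be cross-checked by counting level by level. This sketch assumes the same convention as the answer (H counts levels, so a single node has H = 1, and every level is completely filled); `perfect_tree_counts` and `formulas_hold` are illustrative names.

```python
import math


def perfect_tree_counts(H):
    """Return (leaves, total nodes) of a completely filled binary tree with H >= 1 levels."""
    level_sizes = [1]                      # the root level
    for _ in range(H - 1):
        level_sizes.append(2 * level_sizes[-1])  # each level doubles the previous one
    return level_sizes[-1], sum(level_sizes)


def formulas_hold(max_h=20):
    for H in range(1, max_h + 1):
        L, N = perfect_tree_counts(H)
        if L != 2 ** (H - 1):
            return False
        if N != 2 ** H - 1:
            return False
        if 2 * L != N + 1:                 # L = (N + 1) / 2 without division
            return False
        if math.log2(N + 1) != H:          # N + 1 is a power of two, so this is exact
            return False
    return True
```

For H = 4 the counts are 8 leaves and 15 nodes, matching the table earlier in the answer.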
Assuming that it's a full binary tree, the number of leaf nodes will always be equal to floor(n/2) + 1, i.e. (n + 1)/2 for odd n.
For the minimum, the bottom level could hold just 1 node (while still satisfying the condition that it should be a balanced tree).
I got the answers from my professor.
1) Maximum number of nodes at the last level: ⌈n/2⌉
If there is a balanced binary search tree with 7 nodes, then the answer would be ⌈7/2⌉ = 4 and for a tree with 15 nodes, the answer would be ⌈15/2⌉ = 8.
But what is troubling is the fact that this formula gives the right answer only when the last level of a balanced tree is completely filled from left to right.
For example, for a balanced binary search tree with 5 nodes, the above formula gives an answer of 3, which is not true because a height-balanced tree with 5 nodes must have 1, 2 and 2 nodes on its three levels, so it has at most 2 nodes at the last level. So I am guessing he meant a full balanced binary search tree.
2) Minimum number of nodes at the last level: 1
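Small cases like these can be checked by brute force. The sketch below assumes "balanced" means AVL-style height-balanced (the heights of the two subtrees of every node differ by at most 1), which the question does not specify; `shapes` and `last_level_range` are hypothetical helper names. It enumerates every achievable (number of levels, bottom-level count) pair for n nodes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(n):
    """All (levels, bottom_count) pairs of height-balanced binary trees with n nodes."""
    if n == 0:
        return frozenset({(0, 0)})
    if n == 1:
        return frozenset({(1, 1)})
    found = set()
    for left_n in range(n):
        for hl, cl in shapes(left_n):
            for hr, cr in shapes(n - 1 - left_n):
                if abs(hl - hr) > 1:
                    continue  # violates the height-balance condition
                deepest = max(hl, hr)
                # Only a subtree that reaches the deepest level contributes to it.
                bottom = (cl if hl == deepest else 0) + (cr if hr == deepest else 0)
                found.add((deepest + 1, bottom))
    return frozenset(found)


def last_level_range(n):
    """Return (minimum, maximum) number of nodes on the last level for n >= 1 nodes."""
    counts = [c for _, c in shapes(n)]
    return min(counts), max(counts)
```

Under this assumption, `last_level_range(7)` gives a minimum of 1 and a maximum of 4, in line with ⌈7/2⌉ = 4, while `last_level_range(5)` gives 2 for both, which is below ⌈5/2⌉ = 3.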
The maximum number of nodes at level L in a binary tree is 2^L (if you assume that the vertex is level 0). This is easy to see because at each level you spawn 2 children from each previous leaf. The fact that it is balanced/search tree is irrelevant. So you have to find the biggest L such that 2^L < n and subtract it from n. Which in math language is:
The minimum number of nodes depends on the way you balance your tree. There can be height-balanced trees, weight-balanced trees and, I assume, other kinds of balanced trees. Even with height-balanced trees you can choose what you mean by a balanced tree, because technically a tree of 2^N nodes that has a height of N + 2 is still a balanced tree.

Resources