Disjoint Set in a special ways? - algorithm

We implement Disjoint Data structure with tree. in this data structure makeset() create a set with one element, merge(i, j) merge two tree of set i and j in such a way that tree with lower height become a child of root of the second tree. if we do n makeset() operation and n-1 merge() operations in random manner, and then do one find operation. what is the cost of this find operation in worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Anyone could mentioned a good tips that the author get this solution?

The O(log n) find is only true when you use union by rank (also known as weighted union). When we use this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily, but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of the tree. We can prove this by showing that a node that is i levels below the root (equivalent to being in a tree of rank >= i) is in a tree of at least 2i nodes (this is the same as showing a tree of size n has log n depth). This is easily done with induction.
Induction hypothesis: tree size is >= 2^j for j < i.
Case i == 0: the node is the root, size is 1 = 2^0.
Case i + 1: the length of a path is i + 1 if it was i and the tree was then placed underneath
another tree. By the induction hypothesis, it was in a tree of size >= 2^i at
that time. It is being placed under another tree, which by our merge rules means
it has at least rank i as well, and therefore also had >= 2^i nodes. The new tree
therefor has >= 2^i + 2^i = 2^(i + 1) nodes.

Related

Why is the time complexity of performing n union find (union by size) operations O(n log n)?

In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.
To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.
In Union by size -> When performing a union, we make the root of smaller tree
point to the root of the larger. This implies O(n log n) time for
performing n union find operations. Each time we follow a pointer, we are going to a subtree of size at most double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.
I do not understand how for each union operation, Find operation is always O(log n). Can someone please explain how the worst case complexity is actually computed?
Let's assume for the moment, that each tree of height h contains at least 2^h nodes. What happens, if you join two such trees?
If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).
Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.
The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h) this gives immediately log2(n) >= h >= steps.
You can do n union find (union by rank or size) operations with complexity O(n lg* n) where lg* n is the inverse Ackermann function using path compression optimization.
Note that O(n lg* n) is better than O(n log n)
In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.
We need to prove that maximum height of trees is log(N) where N is the number of items in UF (1)
In the base case, all trees have a height of 0. (1) of course satisfied
Now assuming all the trees satisfy (1) we need to prove that joining any 2 trees with i, j (i <= j) nodes will create a new tree with maximum height is log(i + j)(2):
Because the joining 2 trees procedure gets root node of the smaller tree and attach it to the root node of the bigger one so the height of the new tree will be:
max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
log(j): height of new tree is still the height of the bigger tree
1 + log(i): when height of 2 trees are the same
See the picture below for more details:
Ref: book Algorithms

Running time of algorithm with arbitrary sized recursive calls

I have written the following algorithm that given a node x in a Binary Search Tree T, will set the field s for all nodes in the subtree rooted at x, such that for each node, s will be the sum of all odd keys in the subtree rooted in that node.
OddNodeSetter(T, x):
if (T.x == NIL):
return 0;
if (T.x.key mod 2 == 1):
T.x.s = T.x.key + OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
else:
T.x.s = OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
I've thought of using the master theorem for this, with the recurrence
T(n) = T(k) + T(n-k-1) + 1 for 1 <= k < n
however since the size of the two recursive calls could vary depending on k and n-k-1 (i.e. the number of nodes in the left and right subtree of x), I can't quite figure out how to solve this recurrence though. For example in case the number of nodes in the left and right subtree of x are equal, we can express the recurrence in the form
T(n) = 2T(n/2) + 1
which can be solved easily, but that doesn't prove the running time in all cases.
Is it possible to prove this algorithm runs in O(n) with the master theorem, and if not what other way is there to do this?
The algorithm visits every node in the tree exactly once, hence O(N).
Update:
And obviously, a visit takes constant time (not counting the recursive calls).
There is no need to use the Master theorem here.
Think of the problem this way: what is the maximum number of operations you have do for each node in the tree? It is bounded by a constant. And what the is the number of nodes in the tree? It is n.
The multiplication of constant with n is still O(n).

Cracking the Coding Interview 6th Edition: 10.10. Rank from Stream

The problem statement is as follows:
Imagine you are reading in a stream of integers. Periodically, you
wish to be able to look up the rank of a number x (the number of
values less than or equal to x). Implement the data structures and
algorithms to support these operations.That is, implement the method
track (in t x), which is called when each number is generated, and the
method getRankOfNumber(int x) , which returns the number of values
less than or equal to X (not including x itself).
EXAMPLE: Stream(in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3
getRankOfNumber(1) = 0 getRankOfNumber(3) = 1 getRankOfNumber(4) = 3
The suggested solution uses a modified Binary Search Tree, where each node stores stores the number of nodes to the left of that node. The time complexity for both methods is is O(logN) for balanced tree and O(N) for unbalanced tree, where N is the number of nodes.
But how can we construct a balanced BST from a stream of random integers? Won't the tree become unbalanced in due time if we keep on adding to the same tree and the root is not the median? Shouldn't the worst case complexity be O(N) for this solution (in which case a HashMap with O(1) and O(N) for track() and getRankOfNumber() respectively would be better)?
you just need to build an AVL or Red-Black Tree to have the O(lg n) complexities you desire.
about the rank, its kind of simple. Let's call count(T) the number of elements of a tree with root T.
the rank of a node N will be:
firstly there will be count(N's left subtree) nodes before N (elements smaller than N)
let A = N's father. If N is right son of A, then there will be 1 + count(A's left subtree) nodes before N
if A is right son of some B, then there will be 1 + count(B's left subtree) nodes before N
recursively, run all the way up until you reach the root or until the node you are in isn't someone's right son.
as the height of a balanced tree is at most lg(n), this method will take O(lg n) to return you someone's rank ( O(lg n) to find + O(lg n) to run back and measure the rank ), but this taking in consideration that all nodes store the sizes of their left and right subtrees.
hope that helps :)
Building a Binary Search Tree (BST) using the stream of numbers should be easier to imagine. All the values less than the node, goes to the left and all the values greater than the node, goes to the right.
Then Rank of any x will be number of nodes in left subtree of that node with value x.
Operations to be done: Find the node with Value x O(logN) + Count Nodes of left Subtree of node found O(logN) = Total O(logN + logN) = O(logN)
In case to optimize searching of counts of node of left subtree from O(logN) to O(1), you can keep another class variable 'leftSubTreeSize' in Node class, and populate it during insertion of a node.

Given a binary tree find all root-to-leaf paths

Given a binary tree the problem is to find all root-to-leaf paths. And we know the algorithm by passing the path in the form of a list and adding it to our result as soon as we hit a leaf.
My question How much space does storing all the path consumes. My intuition is that each path is going to consume memory order of the height of tree(O(h)), and if there are 2*n - 1 nodes in our full binary tree and then there are n leafs each corresponding to a path and so the space complexity would be O(n*log(n)) assuming the tree is height balanced. Is my analysis correct?
Your reasoning is correct, but it can be made more exact. A balanced binary tree is not necessarily a full binary tree.
Let N(h) be the number of paths when the height is h. Then N(h) &leq; 2 N(h - 1) This is because, given a tree of height h, the children are each trees of height at most h - 1. So
N(h) = O(2h).
Now we need to bound h. Since h appears in the exponent, it is not enough to find its order of growth. More exactly, it is known that
n &geq; 2h - 1
so
h &leq; log(n + 1)
Inserting this to what we have before
N(h) = O(2log(n + 1)) = O(n).
As you wrote the memory is the sum, per path, of the nodes on the path. The sum of nodes on each path is at most log(n + 1). Combining all the above gives O(n log(n)).
Actually, there is a direct result from the very definition of a tree: there is a unique path between any 2 nodes.
So, if n is the number of leaves, then total (root to leaf) paths = n.
And correspondingly, the tree's height is O(log n).

AVL tree height in O(logn)

I'm trying to figure out a way to calculate the height of the tree using the fact that each nodes knows about its balance. This complexity must be in O(logn)
Here's what I've got so far.
if(!node)
return 0
if(node.balance > 0)
return height(node.right) + 1
if(node.balance < 0)
return height(node.left) + 1
return max( height(T.left), height(T.right)) + 1
My thought was that if the tree is right heavy, traverse the right side and ignore the left. If the tree is left heavy, ignore the right and keep traversing in there. However, I'm a little confused as to what to do if the tree is balanced. At that point, wouldn't we be forced to run max( height(T.left), height(T.right)) + 1 which would mean that each node in the subtree would be visited, hence making this complexity O(n) either way?
The height/depth is linked to the size of the tree as a direct result of the tree being balanced. Because an AVL is balanced, you can use its structure to calculate the depth of the tree in O(1) if you know its size:
lg N = log(N) / log(2) -- define a helper function for 2-log
depth N = 1 + floor(lg(N)) -- calculate depth for size N > 0.
Picture the tree as it fills: each new 'level' of depth corresponds to a threshold in the size of the tree being reached, namely the next power of 2.
You can see why this is: the deepest row of the tree is 'maxed out' when the size of the tree N is just one short of a power of two. The 1+ corresponds to that deepest row which is currently being filled until the depth of the tree changes to the next 'level', the floor(lg(N)) corresponds to the depth which the tree had on the previous 'level'.

Resources