Given a binary tree, find all root-to-leaf paths - algorithm

Given a binary tree, the problem is to find all root-to-leaf paths. We know the algorithm: pass the current path along as a list, and add it to the result as soon as we hit a leaf.
My question: how much space does storing all the paths consume? My intuition is that each path consumes memory on the order of the height of the tree, O(h). If our full binary tree has 2*n - 1 nodes, then it has n leaves, each corresponding to one path, so the space complexity would be O(n*log(n)) assuming the tree is height-balanced. Is my analysis correct?
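For reference, here is a minimal sketch of the algorithm being discussed (the Node class and the function name root_to_leaf_paths are mine, just for illustration):

    class Node:
        def __init__(self, val, left=None, right=None):
            self.val = val
            self.left = left
            self.right = right

    def root_to_leaf_paths(root):
        # Collect every root-to-leaf path; each path is a list of node values.
        result = []

        def dfs(node, path):
            if node is None:
                return
            path.append(node.val)
            if node.left is None and node.right is None:
                result.append(list(path))   # hit a leaf: copy the current path
            else:
                dfs(node.left, path)
                dfs(node.right, path)
            path.pop()                      # backtrack

        dfs(root, [])
        return result

    # A balanced tree with 7 nodes has 4 leaves, hence 4 paths of 3 nodes each.
    tree = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
    print(root_to_leaf_paths(tree))  # [[1, 2, 4], [1, 2, 5], [1, 3, 6], [1, 3, 7]]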

Your reasoning is correct, but it can be made more exact. A balanced binary tree is not necessarily a full binary tree.
Let N(h) be the number of paths when the height is h. Then N(h) ≤ 2 N(h - 1), because in a tree of height h each child subtree has height at most h - 1. So
N(h) = O(2^h).
Now we need to bound h. Since h appears in the exponent, it is not enough to know its order of growth. More exactly, for a balanced tree it is known that
n ≥ 2^h - 1
so
h ≤ log(n + 1).
Inserting this into what we have before,
N(h) = O(2^(log(n + 1))) = O(n).
As you wrote, the memory is the sum, over all paths, of the number of nodes on the path. Each path contains O(log n) nodes. Combining the O(n) paths with O(log n) nodes per path gives O(n log(n)).

Actually, there is a direct result from the very definition of a tree: there is a unique path between any 2 nodes.
So, if n is the number of leaves, then total (root to leaf) paths = n.
And for a balanced tree, the height, and hence the length of each such path, is O(log n).

Related

Why is the time complexity of performing n union find (union by size) operations O(n log n)?

In the tree-based implementation of Union-Find, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.
To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.
With union by size, when performing a union we make the root of the smaller tree point to the root of the larger. This implies O(n log n) time for performing n union/find operations: each time we follow a pointer, we move to a subtree of size at least double the size of the previous subtree, so we follow at most O(log n) pointers for any find.
I do not understand why the find operation is always O(log n). Can someone please explain how the worst-case complexity is actually computed?
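For concreteness, here is a minimal array-based sketch of union by size, with find following parent pointers up to a self-referencing root; the class and field names (UnionFind, parent, size) are mine, not from any particular textbook:

    # Minimal union-find with union by size and no path compression.
    # parent[i] == i marks i as the root (the "set name") of its tree.
    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
            self.size = [1] * n

        def find(self, x):
            # Follow set-name pointers until we reach a self-referencing root.
            while self.parent[x] != x:
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            # Union by size: root of the smaller tree points to root of the larger.
            if self.size[ra] < self.size[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]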
Let's assume for the moment that each tree of height h contains at least 2^h nodes. What happens if you join two such trees?
If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).
Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.
The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps followed during a find. If a tree has n nodes and height h, then n >= 2^h, which immediately gives steps <= h <= log2(n).
You can do n union/find operations (union by rank or size) with complexity O(n lg* n), where lg* n is the iterated logarithm, by adding the path-compression optimization; an even tighter bound of O(n α(n)) holds, where α is the inverse Ackermann function.
Note that O(n lg* n) is better than O(n log n).
In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.
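For illustration, a sketch of a find with path compression, assuming a parent array like the one in the sketch above (the function name is mine):

    def find_with_path_compression(parent, x):
        # First pass: walk up to the root (the self-referencing node).
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass: make every node on the path point directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root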
We need to prove that the maximum height of the trees is log(N), where N is the number of items in the union-find structure (1).
In the base case, all trees have a height of 0, so (1) is of course satisfied.
Now, assuming all current trees satisfy (1), we need to prove that joining any 2 trees with i and j nodes (i <= j) creates a new tree of height at most log(i + j) (2):
Because the joining procedure takes the root node of the smaller tree and attaches it to the root node of the bigger one, the height of the new tree will be:
max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
log(j): the height of the new tree is still the height of the bigger tree
1 + log(i): the heights of the 2 trees were the same
Ref: book Algorithms

Proof that a binary tree with n leaves has a height of at least log n

I've been able to construct a proof showing that the maximum total number of nodes in a tree of height h is n = 2^(h+1) - 1, and intuitively I know that the height of a binary tree is log n (you can draw it out to see), but I'm having trouble constructing a formal proof that a tree with n leaves has height at least log n. Every proof I've come across or been able to put together deals with perfect binary trees, but I need something that works in any situation. Any tips to lead me in the right direction?
Lemma: the number of leaves in a tree of height h is no more than 2^h.
Proof: the proof is by induction on h.
Base Case: for h = 0, the tree consists of only a single root node which is also a leaf; here, n = 1 = 2^0 = 2^h, as required.
Induction Hypothesis: assume that all trees of height k or less have at most 2^k leaves.
Induction Step: we must show that trees of height k+1 have no more than 2^(k+1) leaves. Consider the left and right subtrees of the root. These are trees of height no more than k, one less than the height of the whole tree. Therefore, each has at most 2^k leaves, by the induction hypothesis. Since the total number of leaves is just the sum of the numbers of leaves of the subtrees of the root, we have n <= 2^k + 2^k = 2^(k+1), as required. This proves the claim.
Theorem: a binary tree with n leaves has height at least log(n).
We have already noted in the lemma that the tree consisting of just the root node has one leaf and height zero, so the claim is true in that case. For trees with more nodes, the proof is by contradiction.
Let n = 2^a + b where 0 < b <= 2^a. Now, assume the height of the tree is less than a + 1, contrary to the theorem we intend to prove. Then the height is at most a. By the lemma, the maximum number of leaves in a tree of height a is 2^a. But our tree has n = 2^a + b > 2^a leaves, since 0 < b; a contradiction. Therefore the height is at least a + 1. Finally, since n <= 2^a + 2^a = 2^(a+1), we have log(n) <= a + 1 <= height, which proves the claim.
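If you want to convince yourself empirically before writing the proof, here is a small brute-force sketch (the helper name shapes and the cutoff MAX_NODES are mine) that enumerates every binary-tree shape with up to a few nodes and checks that the height is at least log2 of the leaf count:

    from math import log2

    MAX_NODES = 9

    def shapes(n):
        """Yield (height, leaf_count) for every binary-tree shape with n nodes."""
        if n == 0:
            yield (-1, 0)            # empty tree: height -1 by convention, no leaves
            return
        for left in range(n):        # nodes in the left subtree
            right = n - 1 - left     # nodes in the right subtree
            for hl, ll in shapes(left):
                for hr, lr in shapes(right):
                    leaves = 1 if n == 1 else ll + lr
                    yield (1 + max(hl, hr), leaves)

    for n in range(1, MAX_NODES + 1):
        for height, leaves in shapes(n):
            assert height >= log2(leaves)
    print("height >= log2(leaves) holds for all trees with up to", MAX_NODES, "nodes")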

Disjoint Set in a special way?

We implement the disjoint-set data structure with trees. In this data structure, makeset() creates a set with one element, and merge(i, j) merges the trees of sets i and j in such a way that the root of the tree with lower height becomes a child of the root of the other tree. If we do n makeset() operations and n-1 merge() operations in some arbitrary order, and then do one find operation, what is the cost of this find operation in the worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Can anyone give a hint as to how the author arrived at this answer?
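For concreteness, here is a minimal sketch of the structure the question describes, with merge() using union by height; all names are illustrative, and find() also counts the pointer-following steps, which are bounded by the tree height:

    class DisjointSetByHeight:
        def __init__(self, n):              # n makeset() operations
            self.parent = list(range(n))
            self.height = [0] * n

        def find(self, x):
            steps = 0
            while self.parent[x] != x:      # follow pointers up to the root
                x = self.parent[x]
                steps += 1
            return x, steps                 # steps <= tree height

        def merge(self, i, j):
            ri, _ = self.find(i)
            rj, _ = self.find(j)
            if ri == rj:
                return
            if self.height[ri] < self.height[rj]:
                ri, rj = rj, ri             # ri is now the root of the taller tree
            self.parent[rj] = ri            # shorter tree hangs under the taller root
            if self.height[ri] == self.height[rj]:
                self.height[ri] += 1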
The O(log n) find is only guaranteed when you use union by rank (also known as weighted union). With this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily, but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of any tree. We can prove this by showing that a node that is i levels below the root (equivalently, in a tree of rank >= i) is in a tree of at least 2^i nodes (which is the same as showing that a tree with n nodes has depth at most log n). This is easily done by induction.
Induction hypothesis: a node that is j levels below the root, for any j < i, is in a tree of size >= 2^j.
Case i = 0: the node is the root, and the tree size is 1 = 2^0.
Case i + 1: the length of a node's path to the root becomes i + 1 when it was i and that node's tree was then placed underneath another tree. By the induction hypothesis, it was in a tree of size >= 2^i at that time. It is being placed under another tree, which by our merge rules means the other tree also has rank at least i, and therefore also has >= 2^i nodes. The new tree therefore has >= 2^i + 2^i = 2^(i + 1) nodes.

How to form a recurrence for finding the height of a weight-balanced binary tree?

A weight-balanced tree is a binary tree in which, for each node, the number of nodes in the left subtree is at least half and at most twice the number of nodes in the right subtree. How should one approach forming a recurrence for the height of such a weight-balanced binary tree?
Only full binary trees can be weight-balanced. Let H(n) be the maximum height of a weight-balanced binary tree with exactly n interior nodes (so, 2*n+1 nodes). The base case
H(0) = 0
is obvious. The recurrence
H(n) = 1 + max over l from ceiling((2*n/3 - 1)/2) to floor((4*n/3 - 1)/2) of max(H(l), H(n-1-l)), for all n > 0
follows from the recurrence for height and the fact that, given n-1 interior nodes that are not the root, we must distribute them between the left (l) and the right (n-1-l) subtrees, subject to the weight-balance criterion. (There are 2*n interior and exterior nodes to distribute; the most unbalanced split possible is one-third/two-thirds; a full binary tree with k nodes has (k-1)/2 interior nodes.)
Let's conjecture that H(n) is nondecreasing and write a new recurrence
H'(0) = 0
H'(n) = 1 + H'(floor(2*n/3-1/2)) for all n > 0.
The point of the new recurrence is that, if it is in fact nondecreasing, then H(n) = H'(n), by a strong induction proof that involves simplifying the maxes in the other recurrence. In fact, we can prove by induction, without solving H'(n), that it is indeed nondecreasing, so this simplification is fine.
As for solving H'(n), I'll wave my hands and tell you that Akra-Bazzi applies (alternatively, the Master Theorem Case 2 if you want to play fast and loose with the floor), yielding the asymptotic bound H'(n) = Theta(log n).
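To see the logarithmic growth concretely, here is a small sketch that simply iterates the simplified recurrence H'(n); the function name height_ub is mine, purely for illustration:

    from math import floor

    def height_ub(n):
        # Iterate H'(0) = 0, H'(n) = 1 + H'(floor(2*n/3 - 1/2)) for n > 0.
        h = 0
        while n > 0:
            n = floor(2 * n / 3 - 1 / 2)
            h += 1
        return h

    for n in (1, 10, 100, 1000, 10**6):
        print(n, height_ub(n))   # grows like log(n), roughly log base 3/2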

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time because the height of the (balanced) tree with n nodes is log(n).
How would you prove this?
I am not giving a mathematical proof here. Try to understand the problem using the logarithm to base 2; log base 2 is the usual meaning of log in computer science.
First, understand that this is the binary logarithm, log2(n) (the logarithm to base 2); the examples below round it down to the nearest integer.
For example,
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height, the number of nodes in a fully balanced tree is:
Height   Nodes   Log calculation
0        1       log2(1)  = 0
1        3       log2(3)  = 1
2        7       log2(7)  = 2
3        15      log2(15) = 3
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to be height 3 because log2 of any number from 8 to 15 is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
I hope this helps.
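To see the halving concretely, here is a tiny binary search that also counts its iterations (the iteration counter is only there for illustration):

    def binary_search(sorted_list, target):
        # Standard binary search; `iterations` counts the halving steps.
        lo, hi = 0, len(sorted_list) - 1
        iterations = 0
        while lo <= hi:
            iterations += 1
            mid = (lo + hi) // 2
            if sorted_list[mid] == target:
                return mid, iterations
            if sorted_list[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, iterations

    # With 1024 = 2^10 elements, the loop runs at most about log2(1024) = 10 times.
    print(binary_search(list(range(1024)), 0))    # (0, 10)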
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level the height is exactly N.
Generalization to all balanced binary trees: if the tree has fewer than 2^N leaf nodes, we certainly don't need more halvings. We might need fewer or the same number, but never more.
Assuming that we have a complete tree to work with, we can say that at depth k, there are 2^k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree will increase the number of nodes in the entire tree by the number of nodes that were in the previous level times two.
The height k of the tree is log(N), where N is the number of nodes. This can be stated as
log2(N) = k,
and it is equivalent to
N = 2^k
To understand this, here's an example:
16 = 2^4 => log2(16) = 4
The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Sorting and Searching ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2(N)) for N nodes?
w = width of base (maximum number of leaves)
h = height of tree (maximum number of edges from root to leaf)
Divide w by 2 (h times) to get to 1, which counts the single root node at top.
N = w + w/2 + ... + 1
N = 2^h + ... + 2^1 + 2^0
  = (1 - 2^(h+1)) / (1 - 2) = 2^(h+1) - 1
log2(N+1) = h+1
Check: if N=1, h=0. If h=1, N=3.
This formula applies when the bottom level is full. When it isn't, N will be smaller, but the tree still has the same height h, so we must take the ceiling of the log.
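As a quick numeric check of that ceiling formula (just a sketch using Python's math module):

    from math import ceil, log2

    # For every height h, every node count N that a height-h balanced tree can
    # have (from 2^h up to the full 2^(h+1) - 1) satisfies ceil(log2(N+1)) = h + 1.
    for h in range(6):
        for N in range(2 ** h, 2 ** (h + 1)):
            assert ceil(log2(N + 1)) == h + 1
    print("ceil(log2(N+1)) = h+1 checked for h = 0..5")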
