Binary Tree Leaves - binary-tree

Question: assuming a binary search tree has 5 leaves, what is the minimum number of levels that it could have?
I thought a leaf was a node that did not have any children, and when I approached this problem I got 6 levels, but the answer is 4. Can someone explain this?
My process:
          50
         /  \
       30    Leaf
      /  \
  Leaf    40
         /  \
       35    Leaf
      /  \
    33    Leaf
   /  \
Leaf    34
I might be doing something wrong when visualizing the tree. If that's the case, please let me know.

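A single node can have two leaves as children, so the leaves do not each need a level of their own. With 5 leaves, 4 levels are enough (3 levels are not, since the third level holds at most 4 nodes):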
         10
       /    \
    21        22
   /  \      /  \
  L    L   31    L
          /  \
         L    L
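The counting argument behind this answer can be sketched in a few lines. This is a minimal sketch, assuming levels are counted from 1 at the root; the function name `min_levels` is made up for illustration. A tree with k levels has at most 2^(k-1) nodes on its last level, so it has at most 2^(k-1) leaves.

```python
def min_levels(leaves: int) -> int:
    """Smallest number of levels a binary tree with this many leaves can have.

    Assumes the root is level 1. A tree with k levels has at most 2**(k - 1)
    leaves, so we need the smallest k with 2**(k - 1) >= leaves.
    """
    if leaves < 1:
        raise ValueError("a non-empty tree has at least one leaf")
    # (leaves - 1).bit_length() equals ceil(log2(leaves)) for leaves >= 1.
    return (leaves - 1).bit_length() + 1


print(min_levels(5))  # 4
```

The BST ordering does not change the count, because it only constrains where keys go, not which shapes are possible.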

Related

Data Structures and Algorithms in C++ 2nd Ed - Goodrich. Page 295 question on vector-based structure binary tree worst case for space 2^n - 1

Let me explain as best as I can. This is about a binary tree implemented using a vector.
According to author, the implementation is as follows:
A simple structure for representing a binary tree T is based on a way of numbering
the nodes of T. For every node v of T, let f(v) be the integer defined as follows:
• If v is the root of T, then f(v) = 1
• If v is the left child of node u, then f(v) = 2 f(u)
• If v is the right child of node u, then f(v) = 2 f(u)+ 1
The numbering function f is known as a level numbering of the nodes in a binary
tree T, because it numbers the nodes on each level of T in increasing order from
left to right, although it may skip some numbers (see figures below).
Let n be the number of nodes of T, and let fM be the maximum value of f(v)
over all the nodes of T. The vector S has size N = fM + 1, since the element of S at
index 0 is not associated with any node of T. Also, S will have, in general, a number
of empty elements that do not refer to existing nodes of T. For a tree of height h,
N = O(2^h). In the worst case, this can be as high as 2^n − 1.
Question:
The last statement, that the worst case is 2^n - 1, does not seem right. Here n is the number of nodes. I think he meant 2^h - 1 instead of 2^n - 1. Using figure a) as an example, 2^n - 1 would be 2^15 - 1 = 32768 - 1 = 32767, which does not make sense.
Any insight is appreciated.
Thanks.
The worst case is when the tree degenerates into a chain from the root, where each node has two children, but at least one of them is always a leaf. When this chain has n nodes, the height of the tree is about n/2. The vector must span all the levels and allocate room for full levels, even though this degenerate tree has at most two nodes per level. The size S of the vector will still be O(2^h), but since h is O(n/2) = O(n) in this degenerate case, this makes it O(2^n) in the worst case.
The formula 2^n - 1 seems to suggest the author does not have a proper binary tree in mind, in which case the above reasoning should be applied to a degenerate tree that consists of a single chain where every node has at most one child.
Example of worst case
Here is an example tree (not a proper tree, but the principle for proper trees is similar):
    1
   /
  2
   \
    5
     \
      11
So n = 4, and h = 3.
The vector however needs to store all the slots where nodes could have been, so something like this:
          _____ 1 _____
         /             \
      __2__           __ __
     /     \         /     \
           _5_
   / \     / \     / \     / \
             11
...so the vector has a size of 1+2+4+8 = 15. (Even 16 when we account for the unused slot 0 in the vector)
This illustrates that the size S of the vector is always O(2^h). In this worst case (worst with respect to n, not with respect to h), S is O(2^n).
Example n=6
When n=6, we could have this as a best case:
      1
     / \
    2   3
   / \   \
  4   5   7
This tree can be represented by a vector of size 8, where the entries at index 0 and index 6 are filled with nulls (unused).
However, for n=6 we could have a worst case ("worst" for the impact on the vector size) when the tree is very unbalanced:
1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          7
Now the tree's height is 5 instead of 2, and the vector needs to put that node 7 in the slot at index 63... S is 64. Remember that the vector spans each complete binary level, which doubles in size at each next level.
So when n is 6, S can be 8, 16, 32, or 64, depending on the shape of the tree. In each case S = O(2^h). But when we express S in terms of n, there is variation: the best case is S = O(n), while the worst case is S = O(2^n).
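The level numbering can be checked with a short script. This is only a sketch under some assumptions: a tree is represented as `None` (no node) or a pair `(left, right)`, and all function names are invented for the illustration. `vector_size` follows the book's N = f_M + 1, while `full_level_size` rounds up to complete levels, which is how the 4-node example above is counted.

```python
def max_level_number(tree, f=1):
    """Return f_M, the largest level number f(v) in the tree (0 if empty).

    A tree is None (no node) or a pair (left, right) of subtrees.
    """
    if tree is None:
        return 0
    left, right = tree
    return max(f,
               max_level_number(left, 2 * f),
               max_level_number(right, 2 * f + 1))


def vector_size(tree):
    """N = f_M + 1, which includes the unused slot at index 0."""
    return max_level_number(tree) + 1


def full_level_size(tree):
    """Size when the vector spans every complete level down to the deepest node."""
    return 1 << max_level_number(tree).bit_length()


def right_chain(n):
    """Degenerate tree of n nodes in which every node has only a right child."""
    tree = None
    for _ in range(n):
        tree = (None, tree)
    return tree


LEAF = (None, None)
best_case = ((LEAF, LEAF), (None, LEAF))   # level numbers 1, 2, 3, 4, 5, 7
four_nodes = ((None, (None, LEAF)), None)  # level numbers 1, 2, 5, 11

print(vector_size(best_case))                                # 8
print(vector_size(right_chain(6)))                           # 64
print(vector_size(four_nodes), full_level_size(four_nodes))  # 12 16
```

For the right-leaning chain the two measures coincide, because its deepest node occupies the last slot of its level.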

Is there such a BST that has optimal height but does not satisfy the AVL condition?

I'm curious whether it is possible to construct a binary search tree in such a way that it has minimal height for its n elements but it is not an AVL tree.
Or in other words, is every binary search tree with minimal height by definition also an AVL tree?
The AVL requirement is that, at every node, the heights of the left and right subtrees differ by at most 1.
An optimal BST of N elements, where D = log2(N), has the property that the sum of the depths of its elements is minimal. As a result, every element lies at most ceil(D) levels deep.
To get a minimal sum of depths, the tree must be filled as fully as possible from the top down, so that the individual path lengths are as short as possible.
Not optimal BST - and not AVL:
    f
   / \
  a   q
     / \
    n   x
   / \   \
  j   p   y
Elements: 8
Depths: 0 + 1 + 1 + 2 + 2 + 3 + 3 + 3 = 15
Optimal BST - and AVL:
      _ f _
     /     \
    j       q
   / \     / \
  a   n   p   x
               \
                y
Elements: 8
Depths: 0 + 1 + 1 + 2 + 2 + 2 + 2 + 3 = 13
So, with "optimal" meaning a minimal sum of depths, there is no non-AVL optimal BST.
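The two example shapes can be checked mechanically. The sketch below assumes a tree is `None` or a tuple `(key, left, right)`; the helper names are made up for this illustration, and the functions look only at the shape of the tree, not at the key order.

```python
def height(tree):
    """Height in edges; an empty tree has height -1."""
    if tree is None:
        return -1
    _, left, right = tree
    return 1 + max(height(left), height(right))


def is_avl(tree):
    """True if subtree heights differ by at most 1 at every node."""
    if tree is None:
        return True
    _, left, right = tree
    return (abs(height(left) - height(right)) <= 1
            and is_avl(left) and is_avl(right))


def depth_sum(tree, depth=0):
    """Sum of the depths of all nodes, with the root at depth 0."""
    if tree is None:
        return 0
    _, left, right = tree
    return depth + depth_sum(left, depth + 1) + depth_sum(right, depth + 1)


def node(key, left=None, right=None):
    return (key, left, right)


# The two shapes drawn above, with the same labels.
first = node("f", node("a"),
             node("q", node("n", node("j"), node("p")),
                  node("x", None, node("y"))))
second = node("f", node("j", node("a"), node("n")),
              node("q", node("p"), node("x", None, node("y"))))

print(depth_sum(first), is_avl(first))    # 15 False
print(depth_sum(second), is_avl(second))  # 13 True
```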

Space complexity of breadth first search of binary tree?

What would be the space complexity of breadth first search on a binary tree? Since it would only store one level at a time, I don't think it would be O(n).
The space complexity is in fact O(n), as witnessed by a perfect binary tree. Consider an example of depth four:
                 ______________14_______________
                /                               \
         ______24_______                 _______8_______
        /               \               /               \
     __27___         __11___         __23___         __22___
    /       \       /       \       /       \       /       \
    4       5      13       2      17      12      26      25
   / \     / \     / \     / \     / \     / \     / \     / \
 29   0   9   6  16  19  20   1  10   7  21  15  18  30  28   3
Note that the number of nodes at each depth is given by
depth   num_nodes
0       1
1       2
2       4
3       8
4       16
In general, there are 2^d nodes at depth d. The total number of nodes in a perfect binary tree of depth d is n = 1 + 2^1 + 2^2 + ... + 2^d = 2^(d+1) - 1. As d goes to infinity, 2^d/n goes to 1/2. So, roughly half of all nodes occur at the deepest level. Since n/2 = O(n), the space complexity is linear in the number of nodes.
The illustration credit goes to the binarytree package.
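To see the linear growth directly, here is a minimal sketch that runs a queue-based BFS and records the largest queue size. It assumes the tree is given implicitly, with nodes numbered 1..n and the children of node i at 2i and 2i + 1; `bfs_peak_queue` is a name invented for this example.

```python
from collections import deque


def bfs_peak_queue(n: int) -> int:
    """Run BFS over a complete binary tree with nodes 1..n.

    The children of node i are 2*i and 2*i + 1 (when they are <= n).
    Returns the largest number of nodes held in the queue at any time.
    """
    if n < 1:
        return 0
    queue = deque([1])
    peak = 1
    while queue:
        current = queue.popleft()
        for child in (2 * current, 2 * current + 1):
            if child <= n:
                queue.append(child)
        peak = max(peak, len(queue))
    return peak


# Perfect tree of depth 4 has 2**5 - 1 = 31 nodes and 16 leaves.
print(bfs_peak_queue(31))  # 16
```

For a perfect tree of depth d the peak is 2^d, reached when the whole bottom level sits in the queue.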

A statement from the heapsort algorithm in the CLRS book

Please explain the underlined statement in the picture. It's from section 6.2 of CLRS. Why is the subtree size at most 2n/3?
Remember that balance in binary trees is generally a good thing for time complexities! The worst-case time complexity occurs when the tree is as unbalanced as it can be. We're working with heaps here, and heaps are complete binary trees. A complete tree is most unbalanced when its bottommost level is half full. This is shown below.
          -------*-------
         /               \
        *                 *
       / \               / \
      /   \             /   \
     /     \           /     \
    /-------\         /-------\
   /---------\  <-- last level is half-full
Suppose there are m nodes in the last level. Then there must be m - 1 nodes remaining in the left subtree.
          -------*-------
         /               \
        *                 *
       / \               / \
      /   \             /   \
     / m-1 \           /     \
    /-------\         /-------\
   /--- m ---\
Why? In general, a binary tree with m leaf nodes in which every internal node has two children must have m - 1 internal nodes. Imagine that these m leaf nodes represent players in a tournament: if one player is eliminated per game, there must be m - 1 games to determine the winner. Each game corresponds to an internal node. Hence there are m - 1 internal nodes.
Because the tree is complete, the right subtree must also have m - 1 nodes: it has the same shape as the left subtree without its bottom level.
          -------*-------
         /               \
        *                 *
       / \               / \
      /   \             /   \
     / m-1 \           / m-1 \
    /-------\         /-------\
   /--- m ---\
Hence we have total number of nodes (including the root):
n = 1 + [(m - 1) + m] + (m - 1)
= 3m - 1
Let x = number of nodes in the left subtree. Then:
x = (m - 1) + m
= 2m - 1
We can solve these simultaneous equations, eliminating variable m:
2n - 3x = 1
x = (2n - 1) / 3
Hence x is less than 2n/3. This explains the original statement:
The children's subtrees each have size at most 2n/3 – the worst case occurs when the bottom level of the tree is exactly half full
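The bound can also be checked numerically. The following is a rough sketch, assuming the usual array layout of a heap (nodes numbered 1..n, children of i at 2i and 2i + 1); the function names are made up for the illustration.

```python
def subtree_size(root: int, n: int) -> int:
    """Number of nodes in the subtree rooted at `root` in a heap of n nodes."""
    size = 0
    low, high = root, root  # index range of the subtree on the current level
    while low <= n:
        size += min(high, n) - low + 1
        low, high = 2 * low, 2 * high + 1
    return size


def left_subtree_size(n: int) -> int:
    """Size of the root's left subtree in a heap of n nodes."""
    return subtree_size(2, n)


# Half-full bottom level: n = 3m - 1 and x = 2m - 1 for m = 2, 4, 8, ...
for m in (2, 4, 8, 16):
    n = 3 * m - 1
    print(n, left_subtree_size(n), 2 * m - 1)

# The left subtree never exceeds 2n/3.
print(all(3 * left_subtree_size(n) <= 2 * n for n in range(2, 2000)))
```

The left subtree is never smaller than the right one in a complete tree, so bounding the left subtree bounds both children.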

Shortest path between two nodes in an infinite, complete binary tree?

Suppose we have an infinite, complete binary tree where the nodes are numbered 1, 2, 3, ... by their position in a layer-by-layer traversal of the tree. Given the indices of two nodes u and v in the tree, how can we efficiently find the shortest path between them?
Thanks!
@Jonathan Landrum pointed out the solution in his comment. This answer fleshes out that solution.
In any tree, there is exactly one path between any two nodes. Therefore, this problem boils down to determining the unique path between those two nodes.
In any rooted tree, the shortest path between two nodes u and v can be found by finding the lowest common ancestor x of the two nodes, then concatenating the paths from u to x and from x to v. In your case, you therefore need to find the LCA of the two nodes, then glue these paths together.
Since you have an infinite binary tree, I assume that the representation is as follows:
                1
            /       \
        2               3
      /   \           /   \
    4       5       6       7
   / \     / \     / \     / \
  8   9  10  11  12  13  14  15
This tree shape has a really interesting property if you write all the numbers in binary:
                      1
                /           \
         10                      11
      /      \                /      \
   100         101         110         111
  /    \      /    \      /    \      /    \
1000  1001  1010  1011  1100  1101  1110  1111
There are a few things you can notice. First, the depth of each node is one less than the number of bits in its index; in other words, it is the 0-based position of the MSB. For example, node 5 is 101 in binary, so its depth is 2.
Next, notice that if a number has binary representation b1 b2 ... b(n-1) bn, then its parent is b1 b2 ... b(n-1), and it's a left child if bn = 0 and a right child if bn = 1. By applying this property repeatedly, we get the following: a node u is the kth ancestor of v if and only if (v >> k) = u.
This gives us a lot to work with. Typically, you'd compute LCA(u, v) in the following way:
(1) If u is deeper than v, step upward from u until you reach a node at the same depth as v (and, vice versa, step up from v if v is deeper).
(2) Walk upward from u and v at the same rate until they reach the same node. That node is the LCA.
We could implement this directly in time O(log max{u, v}) as follows. To do step (1), compute the index of the MSB of u and v to determine the depths d(u) and d(v) of each node. Let's assume WLOG that d(v) ≥ d(u). In that case, we can find the ancestor of v that's at the same depth as u in time O(1) by computing v >> (d(v) - d(u)). Nifty! To do step (2), we compare u and v and, if they're unequal, shift each one right by one spot, simulating stepping up one level. The maximum number of times we can do this is O(log max{u, v}), so the overall runtime is O(log max{u, v}).
However, we can speed this up exponentially by using a modified binary search. The depth of the LCA of u and v must be between 0 and min{d(u), d(v)}. Once we find a common ancestor x of u and v, we know that all ancestors of x are also common ancestors of u and v. Therefore, we can binary search over the possible depths of the LCA for u and v, computing the ancestor of each node from that depth by using a bitshift. This will run in time O(log log max{u, v}), since the maximum depth of u is O(log u) and the maximum depth of v is O(log v).
Once we've found that ancestor, we can compute the path between u and v as follows. Compute the path from u to that ancestor by repeatedly shifting away one bit from u until we arrive at the common ancestor. Compute the path from v to the ancestor in the same way, then tack on the reversal of that path to the path found in the first step. The length of this path is d(u) + d(v) - 2 d(x), where x is the common ancestor, which is O(log u + log v) in the worst case, so the runtime is O(log max{u, v}).
On the other hand, if you just need the length of the path, you can sum the distance from u to LCA(u, v) and from LCA(u, v) to v. We can compute these values in O(log log max{u, v}) time each, so the runtime is O(log log max{u, v}).
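The approach can be sketched as follows. This is a minimal sketch that assumes node indices are positive Python integers; the function names are invented for this example. `lca` is the straightforward walk, and `lca_bsearch` is one possible way to do the binary search over shift amounts.

```python
def depth(node: int) -> int:
    """Depth of a node: 0-based position of its most significant bit."""
    return node.bit_length() - 1


def level_pair(u: int, v: int):
    """Lift the deeper node so that both nodes sit at the same depth."""
    du, dv = depth(u), depth(v)
    if du > dv:
        u >>= du - dv
    else:
        v >>= dv - du
    return u, v


def lca(u: int, v: int) -> int:
    """Lowest common ancestor by walking up one level at a time."""
    u, v = level_pair(u, v)
    while u != v:
        u >>= 1
        v >>= 1
    return u


def lca_bsearch(u: int, v: int) -> int:
    """Same result, binary searching for the smallest shift that makes them equal."""
    u, v = level_pair(u, v)
    low, high = 0, depth(u)  # shifting by depth(u) turns both into the root
    while low < high:
        mid = (low + high) // 2
        if (u >> mid) == (v >> mid):
            high = mid
        else:
            low = mid + 1
    return u >> low


def shortest_path(u: int, v: int) -> list:
    """Nodes on the unique path from u to v, both endpoints included."""
    x = lca(u, v)
    up, down = [], []
    while u != x:
        up.append(u)
        u >>= 1
    while v != x:
        down.append(v)
        v >>= 1
    return up + [x] + down[::-1]


print(shortest_path(8, 15))  # [8, 4, 2, 1, 3, 7, 15]
print(shortest_path(2, 9))   # [2, 4, 9]
```

Nodes 8 and 15 sit at the same depth, yet their path has 6 edges, so the path length depends on the depth of the LCA and not only on the difference in depths.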
Hope this helps!
