How to find minimum possible height of tree? - data-structures

The fan-out of a node in a tree is defined to be the number of children the node has. The fan-out of a tree is defined to be the maximum fan-out of any node in the tree. Assume tree T has n nodes and a fan-out f > 1. What is the minimum possible height of T?
I have no idea how to begin this problem. I solved the first part, which was to find the maximum number of nodes that T can have in terms of the height h and fan-out f > 1; I got (f^(h+1) - 1)/(f - 1). I'm thinking you can use this to solve the question above. Can someone please point me in the correct direction?
Thank you!

I would approach this problem by turning it around and trying to find the maximum number of nodes you can pack into a tree with a given height and fan-out, T_max(h,f). Every other tree T(h,f) is then guaranteed to have at most as many nodes as T_max(h,f). Therefore, if you find the h such that
total_nodes( T_max(h,f) ) ≥ n > total_nodes( T_max(h-1,f) )
then h is guaranteed to be the minimum height of a tree with n nodes and fan-out f.
In order to find such a tree, you need to maximize the number of nodes in every layer of a tree. In other words, every node of such a tree needs to have fan-out of f, no less. Therefore you start inserting nodes in a tree, one level at a time. After a layer is full, you start adding another layer. After n nodes are inserted in such a tree, you stop and check the height of the tree. This will be the minimal height you are looking for.
Or, you can do a calculation instead:
nodes_in_level(1) = 1
nodes_in_level(2) = f
nodes_in_level(3) = f * f
...
nodes_in_level(x) = f ^ (x - 1)
This is a standard geometric progression. The maximum number of nodes in a tree with x levels and fan-out f is therefore the sum of this geometric progression, (f^x - 1)/(f - 1), and it shouldn't be too much trouble to figure out the smallest x for which this number of nodes is at least n.
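The calculation above can be sketched directly. This is a minimal sketch (the function name is my own) using the asker's formula (f^(h+1) - 1)/(f - 1) for the maximum node count of a tree of height h:

```python
def min_height(n, f):
    """Smallest height h such that a tree with fan-out f can hold n nodes,
    i.e. the smallest h with (f**(h+1) - 1) // (f - 1) >= n."""
    h, capacity = 0, 1          # a height-0 tree holds exactly 1 node
    while capacity < n:
        h += 1
        capacity += f ** h      # a full level at depth h adds f**h nodes
    return h
```

For example, min_height(7, 2) == 2, since a full binary tree of height 2 holds exactly 7 nodes, while min_height(8, 2) == 3 because the 8th node forces another level.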

Related

Maximum & minimum height of a binary tree

According to my textbook when N nodes are stored in a binary tree H(max) = N
According to outside sources when N nodes are stored in a binary tree H(max) = N - 1
Similarly
According to my textbook when N nodes are stored in a binary tree H(min) = [log2N+1]
According to outside sources when N nodes are stored in a binary tree H(min) = [log2(N+1)-1]
Which one is right and which is wrong? Are they supposed to be used in different situations? In that case, what would be the maximum height of a tree with 32 nodes?
I have been looking through my resources to understand this concept, and for some reason all my sources have different answers. I can calculate height when a binary tree is pictorially represented because that would involve number of nodes in each subtree. What about when only the number of nodes are given?
Obviously it has to do with the definition of height. If height is defined as the number of nodes on the path from top to bottom, then the max height is N; if it's defined as the number of hops (edges) between nodes, then it's N-1. The same goes for minimum height. That said, what matters is that they are O(N) and O(log N), respectively.
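Under the edge-counting convention (a single node has height 0), both formulas can be checked numerically; the helper names here are my own:

```python
import math

def max_height(n):
    # worst case: a degenerate chain of n nodes has n - 1 edges
    return n - 1

def min_height(n):
    # best case: fill each level completely; a tree of height h holds
    # at most 2**(h+1) - 1 nodes, so h = ceil(log2(n + 1)) - 1
    return math.ceil(math.log2(n + 1)) - 1
```

For 32 nodes this gives a maximum height of 31 and a minimum height of 5 (31 nodes fill a complete tree of height 4, so the 32nd node starts level 5). Under the node-counting convention, add 1 to both answers.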

What will be complexity in this Balanced BST?

I have been asked this question in an interview and I'm curious to know what will be the correct explanation for it?
Consider the following height balanced BST where Balance factor = Height of left subtree - Height of right subtree & the accepted balance factors are 0, 1, -1, 2, and -2.
What will be the time taken to search an element in such a kind of height-balanced BST? Explain.
What I said was: even though it permits a balance factor of 2 rather than 1 as in the standard balanced-BST definition, the operations should still be of O(log N) complexity (where N is the number of elements in the tree), because when N is large it will not make much difference whether the allowed balance factor is 2 or 1.
If anyone can tell me what would have been the correct answer here will be helpful :)
We can solve this mathematically as follows.
Defining Worst Case
Now, in any Binary Tree, time complexity of searching is O(h) where h is height of the Binary Tree.
Now, for worst case, we want to find Maximum Height.
In the case of a simple Binary Search Tree with no balancing condition on the nodes, this maximum height can be n-1 or n (depending on the convention of whether the height of a single-node tree is 0 or 1), where n is the number of nodes.
Thus, we can say that given number of nodes, worst case is maximum height.
Interestingly, we can also say that, given the height of a tree, the worst case is the minimum number of nodes: even with that minimum number of nodes, a search may have to traverse all the way down a tree of height h, just as it would with the maximum number of nodes.
Thus, the intuition should be clear that Given Height of Tree, the worst case is minimum number of nodes.
Applying this Concept on Binary Search Tree
Let us try to construct Binary Search Tree of Height H such that number of nodes in the tree is minimum.
Here we will exploit the fact that a Binary Tree is a Recursive Data Structure (a Binary Tree can be defined in terms of Binary Trees).
We will use the notation N(H) to denote the minimum number of nodes in a Binary Search Tree of height H.
We will create a Root Node
To the Left (or Right) of the Root, add a subtree of height H-1 (exploiting the recursive property). For the number of nodes in the entire tree to be minimum, the number of nodes in this Left (or Right) subtree should also be minimum. Thus N(H) is a function of N(H-1).
Do we need to add anything to Right (or Left)?
No, because there is no Balancing Factor restriction on a BST, so the other side may stay empty. Thus, our tree will look like a degenerate chain.
Thus, to construct a Binary Search Tree of height H with the minimum number of nodes, we can take a Binary Search Tree of height H-1 with the minimum number of nodes and add 1 root node.
Thus, we can form the recurrence relation
N(H) = N(H-1) + 1
with the base condition
N(0) = 1
(to create a BST of height 0, we need one node; we will use this convention throughout the answer).
Now, this recurrence relation is quite simple to solve by substitution, and thus
N(H) = H + 1
N(H) > H
Now, let n be the number of nodes in the BST of height H. Then,
n ≥ N(H)
n > H
H < n
Therefore,
H = O(n)
Thus, the Worst Case Time Complexity for Searching will be O(n).
Applying this Concept on AVL Tree
We can apply a similar concept to the AVL Tree. Following the same construction as in the later part of this solution, one finds the recurrence relation
N(H) = N(H-1) + N(H-2) + 1
with base conditions
N(0) = 1
N(1) = 2
Solving the recurrence yields the inequality
N(H) ≥ φ^H, where φ = (1+√5)/2
Then, let n be the number of nodes. Thus,
n ≥ N(H) ≥ φ^H
On simplifying (taking logarithms to base φ), one can conclude that
H ≤ 1.44 log2(n)
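The AVL bound can be checked numerically. This is a sketch (function name is my own) of the recurrence and the inequality above:

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def avl_min_nodes(h):
    # N(H) = N(H-1) + N(H-2) + 1, with N(0) = 1, N(1) = 2
    n = [1, 2]
    for i in range(2, h + 1):
        n.append(n[i - 1] + n[i - 2] + 1)
    return n[h]

# verify N(H) >= PHI**H over a range of heights
for h in range(30):
    assert avl_min_nodes(h) >= PHI ** h
```

Running the loop confirms the inequality (with equality only at H = 0), which is what gives H ≤ log_φ(n) ≈ 1.44 log2(n).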
Applying this Concept on GIVEN Tree
Let us try to construct Given Tree of Height H such that number of nodes in the tree is minimum.
We will use the notation N(H) to denote the minimum number of nodes in the given tree of height H.
We will create a Root Node
To the Left (or Right) of the Root, add a subtree of height H-1 (exploiting the recursive property). For the number of nodes in the entire tree to be minimum, the number of nodes in this Left (or Right) subtree should also be minimum. Thus N(H) is a function of N(H-1).
Do we need to add anything to Right (or Left)?
Yes! Because there is restriction of Balancing Factor on Nodes.
We need to add a subtree on the Right (or Left). What should its height be?
H?
No, then height of entire tree will become H+1
H-1?
Permitted! since Balancing Factor of Root will be 0
H-2?
Permitted! since Balancing Factor of Root will be 1
H-3?
Permitted! since Balancing Factor of Root will be 2
H-4?
Not Permitted! since Balancing Factor of Root will become 3
We want the minimum number of nodes, so out of H-1, H-2 and H-3, we will choose H-3. For the number of nodes in the entire tree to be minimum, the number of nodes in this Right (or Left) subtree should also be minimum. Thus N(H) is also a function of N(H-3).
Thus, to construct the given tree of height H with the minimum number of nodes, we can take the given tree of height H-1 with minimum nodes as the LEFT subtree, the given tree of height H-3 with minimum nodes as the RIGHT subtree, and add one Root Node.
Thus, we can form the recurrence relation
N(H) = N(H-1) + N(H-3) + 1
with base conditions
N(0) = 1
N(1) = 2
N(2) = 3
Now, this recurrence relation is difficult to solve in closed form, but courtesy of this answer, we can conclude that
N(H) > (√2)^H
Now, let n be the number of nodes in the given tree. Then,
n ≥ N(H)
n > (√2)^H
log√2(n) > H
H < 2 log2(n)
Therefore, H = O(log(n))
Thus, the Worst Case Time Complexity for Searching in this given tree will be O(log(n)).
Hence, Proved Mathematically!
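The recurrence and the bound N(H) > (√2)^H can also be checked numerically. A sketch (function name is my own; note that at H = 0 we have equality, N(0) = 1 = (√2)^0, so the strict bound is checked from H = 1):

```python
import math

def min_nodes(h):
    # N(H) = N(H-1) + N(H-3) + 1, with N(0) = 1, N(1) = 2, N(2) = 3
    n = [1, 2, 3]
    for i in range(3, h + 1):
        n.append(n[i - 1] + n[i - 3] + 1)
    return n[h]

# verify N(H) > (sqrt 2)**H for H >= 1
for h in range(1, 40):
    assert min_nodes(h) > math.sqrt(2) ** h
```

The sequence starts 1, 2, 3, 5, 8, 12, 18, 27, 40, ... and grows geometrically with ratio ≈ 1.4656 (the real root of x³ = x² + 1), which is why it stays above (√2)^H ≈ 1.4142^H.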

What is the total number of nodes generated by Depth-First Search

Assume: 'd' is the finite depth of the tree; 'b' is the branching factor; 'g' is the depth of the shallowest goal node.
From what I know, the worst-case is when the goal node is at the very last right-bottomed node in a tree.
Thus, supposedly the total number of nodes generated is O(b^g), right?
However, my instructor told me it was wrong since the worst-case is when all the tree are explored except the subtree rooted at the goal node.
He mentioned something about O(b^d) - O(b^(d-g)) .... I'm not entirely sure.
I don't really get what he means, so can somebody tell me which answer is correct?
I recommend drawing a tree, marking the nodes that are explored, and counting how many there are.
Your reasoning is correct if you use breadth first search because you will only have reached a depth of g for each branch (O(b**g) nodes explored in total).
Your instructor's reasoning is correct if you use depth first search because you reach a depth of d for all parts of the tree except the one with the goal (O(b**d - b**(d-g)) nodes explored).
[Figure: an example tree with depth d = 2, the goal (green circle) at depth g = 1, and branching factor b = 3; blue nodes are explored, red nodes are not. To count the number explored, we count the total in the tree and take away the red ones.]
Note that I have called the total number of nodes in the tree O(b**d). Strictly speaking, the total is b**d + b**(d-1) + b**(d-2) + ... + 1, but this is O(b**d).
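To make the counting concrete, here is a small sketch (helper names are my own) of the worst case described above, assuming a full b-ary tree and that the only unexplored nodes are the goal's proper descendants:

```python
def total_nodes(b, d):
    # nodes in a full b-ary tree of depth d: 1 + b + b**2 + ... + b**d
    return sum(b ** k for k in range(d + 1))

def dfs_worst_case_explored(b, d, g):
    # everything is explored except the goal's proper descendants,
    # i.e. minus a depth-(d - g) subtree, keeping the goal node itself
    return total_nodes(b, d) - (total_nodes(b, d - g) - 1)
```

With the figure's values b = 3, d = 2, g = 1 this gives 13 - 3 = 10 explored nodes, consistent with O(b**d - b**(d-g)).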

Relationship between number of nodes and height

I am reading The Algorithm Design Manual. The author states that the height of a tree is:
h = log n,
where
h is height
n = number of leaf nodes
log is log to base d, where d is the maximum number of children allowed per node.
He then goes on to say that the height of a perfectly balanced binary search tree, would be:
h = log n
I wonder if n in this second statement denotes 'total number of leaf nodes' or 'total number of nodes'.
Which brings up a bigger question, is there a mathematical relationship between total number of nodes and the height of a perfectly balanced binary search tree?
sure, n ≈ 2^h where h, n denote the height of the tree and the number of its nodes, respectively (exactly, a perfectly balanced binary tree of height h has 2^h leaf nodes and 2^(h+1) - 1 nodes in total).
proof sketch:
a perfectly balanced binary tree has
an actual branching factor of 2 at each inner node.
equal root path lengths for each leaf node.
about the leaf nodes in a perfectly balanced binary tree:
as the number of leafs equals the number of nodes minus the number of nodes in a perfectly balanced binary tree with a height decremented by one, the number of leafs is half the number of all nodes (to be precise, half of n+1).
so h just varies by 1, which usually doesn't make any real difference in complexity considerations. that claim can be illustrated by remembering that it amounts to the same variations as defining the height of a single node tree as either 0 (standard) or 1 (unusual, but maybe handy in distinguishing it from an empty tree).
It doesn't really matter if you talk of all nodes or just leaf nodes: either is bounded above and below by the other multiplied by a constant factor. In a perfectly balanced binary tree, the number of nodes on the full bottom level is the number of all nodes in the levels above, plus one.
In a complete binary tree, the number of nodes n and the height h of the tree are related as follows:
n = 2^(h+1) - 1
where n counts all the nodes of the tree.
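Both directions of the relationship can be written down directly. A sketch (helper names are my own) under the convention that a single-node tree has height 0:

```python
import math

def nodes_of_height(h):
    # total nodes in a perfectly balanced (complete) binary tree of height h
    return 2 ** (h + 1) - 1

def leaves_of_height(h):
    # leaf nodes only: the bottom level
    return 2 ** h

def height_of_nodes(n):
    # inverse of nodes_of_height: h = log2(n + 1) - 1
    return round(math.log2(n + 1)) - 1
```

For example, nodes_of_height(3) == 15, of which leaves_of_height(3) == 8 are leaves; note 8 = (15 + 1) / 2, matching the "half of n+1" remark above.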

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, e.g. nodes going down from the root. Keep in mind each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)
It's possible to improve my previous answer to O(n log d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out on counting it.
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t(v) to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t(v) (or equivalently, binary search for m - t(v) instead of m). The insertion of -t(v) at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
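Putting the pieces together, here is a compact sketch of the whole O(n log d) scheme (names like children, wt, and the explicit stacks are my own; tstack plays the role of the shared path-length buffer, holding t(·) for the current ancestor chain so the search for i becomes a bisect):

```python
from bisect import bisect_left

def count_within(children, wt, m):
    """children[v]: children of node v (root is node 0).
    wt[v]: weight of the edge from v up to its parent (wt[0] unused).
    Returns ans[v] = number of descendants of v within distance m of v."""
    n = len(children)
    c = [0] * n                    # the difference counts c(v)
    parent = [-1] * n
    anc, tstack = [], []           # current ancestor chain and its t() values
    order = []                     # preorder sequence, for the summing pass
    stack = [(0, 0, False)]        # (node, t(node), leaving?)
    while stack:
        v, tv, leaving = stack.pop()
        if leaving:
            anc.pop(); tstack.pop()
            continue
        if anc:                    # v is not the root
            parent[v] = anc[-1]
            c[anc[-1]] += 1        # +1 propagated up from the parent ...
            i1 = bisect_left(tstack, tv - m)   # shallowest close-enough ancestor
            if i1 >= 1:
                c[anc[i1 - 1]] -= 1  # ... cancelled at the first one too far
        order.append(v)
        anc.append(v); tstack.append(tv)
        stack.append((v, tv, True))
        for u in children[v]:
            stack.append((u, tv + wt[u], False))
    ans = c[:]
    for v in reversed(order):      # postorder: fold child counts upward
        if parent[v] != -1:
            ans[parent[v]] += ans[v]
    return ans
```

Since edge weights are positive, tstack is increasing, so bisect_left finds the shallowest ancestor a with t(a) ≥ t(v) - m, i.e. with dist(a, v) ≤ m. For a chain 0→1→2 with both edge weights 3 and m = 4, count_within([[1],[2],[]], [0,3,3], 4) returns [1, 1, 0]: node 2 is within reach of node 1 but not of the root.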
[EDIT: Please see my other answer for an even more efficient O(n log d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every node has the same number of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is for each node, climb upwards towards the root until such time as the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and for each node there are at most d ancestors, so this algorithm is trivially O(nd).
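The O(nd) climbing pass is only a few lines. A sketch (the parent/dist array names are my own):

```python
def count_within_distance(parent, dist, m):
    """parent[v]: parent of node v, or -1 for the root.
    dist[v]: weight of the edge from v up to parent[v] (unused for the root).
    Returns c[u] = number of proper descendants of u within distance m of u."""
    n = len(parent)
    c = [0] * n
    for v in range(n):
        total, u = 0, v
        while parent[u] != -1:     # climb towards the root
            total += dist[u]
            if total > m:          # every higher ancestor is even further
                break
            u = parent[u]
            c[u] += 1              # u counts v as sufficiently close
    return c
```

For a chain 0→1→2 with both edge weights 3 and m = 4, count_within_distance([-1, 0, 1], [0, 3, 3], 4) returns [1, 1, 0].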

Resources