Binary Search Tree Construction of height h - algorithm

Given the first n natural numbers as the keys for a BST , How can I determine the root node of all possible tree that have a height 'h' .
I already came up with a brute force method where I constructed all possible trees with n nodes and then selected the trees which have a height h but it has a time complexity of nearly O(n!) . Can someone please suggest a better method which would me more efficient ?

Problem statement. Given natural numbers n and h, determine exactly all elements root in 1..n such that root is the root of a binary search tree on 1..n of height h.
Solution. We can construct a degenerate binary search tree on 1..n starting from any number in 1..n by splitting it up at root. This changes the lower bounds from the old solution to h-1, while the upper bounds remain the same, rendering the full bounds as follows:
h-1 <= max(root-1, n-root) <= 2^h - 1
Old solution (correct only for full binary trees). A full binary tree with height h has at least 2h+1 nodes, and at most 2^(h+1)-1 nodes. It's easy to see that these bounds are tight, not only for binary trees, but also for binary search trees. In particular, they apply to the left and right subtrees of your root. Since this is a binary search tree on 1..n, you will have that left contains exactly the elements 1..(root-1), and right contains exactly the elements (root+1)..n.
This means that the following is both a necessary and sufficient condition: The larger of the subtrees left and right must satisfy the inequalities
2*(h-1) + 1 <= nodes(subtree) <= 2^h - 1
In other words, the possible values of root are exactly all values in 1..n satisfying
2*(h-1) + 1 <= max(root-1, n-root) <= 2^h - 1
Update. I blindly looked at an inequality I found at wikipedia without realizing that it only applied to full binary trees.

Related

How to prove Average height of Binary Search Tree is O(logn)?

I had thought of a proof but unable to sketch things up on paper.
I had though of the Recurrence relation for the height of a binary search tree would be
T(n)=T(k)+T(n-k-1)+1
Where k is the Number of elements in the left subtree if root and n-k-1 are on right subtree of root (and n= total nodes)
(Correct me if the thing above is wrong)
Now what I thought, because we have to calculate for average case..
So there would be half of the cases possible...
(Now this is the point from where I am starting messing up things plz correct here..)
My claim: I would have approx half cases out of all possible..
Example
Root
(0,N-1) OR
(1, N-2) OR
.
.
(N-1,0)
Where N is total nodes.
Now I am considering half of above cases for average case calculation...
(I don't know whether I am doing this right or not..comment on it would be most appreciated)
So I get :
T(n)=T(n/2)+T(n/2)+1
T(n)=2T(n/2)+1
Now when I apply master method for getting approx answer over the obtained recurrence relation..
I get O(n).
Now how should I proceed...?
(My expectation was instead of n I should have got logn)
But that didn't work out..
So plz suggest how should I proceed further.
(Is my approach even at all correct..from start, also tell me about that?)
From "Algorithms" by Robert Sedgewick and Kevin Wayne
Definition. A binary search tree (BST) is a binary tree where each node has a
Comparable key (and an associated value) and satisfies the restriction that the key
in any node is larger than the keys in all nodes in that node’s left subtree and smaller than the keys in all nodes in that node’s right subtree.
Proposition C. Search hits in a BST built from N random keys require ~ 2 ln N
(about 1.39 lg N) compares, on the average.
Proof: The number of compares used for a search hit ending at a given node is
1 plus the depth. Adding the depths of all nodes, we get a quantity known as the
internal path length of the tree. Thus, the desired quantity is 1 plus the average internal path length of the BST, which we can analyze with the same argument that
we used for Proposition K in Section 2.3: Let CN be the total internal path length
of a BST built from inserting N randomly ordered distinct keys, so that the average
cost of a search hit is 1 CN / N. We have C0= C1= 0 and for N > 1 we can write a
recurrence relationship that directly mirrors the recursive BST structure:
CN = N 1 (C0 CN1) / N + (C1 CN2)/N . . . (CN1 C0 )/N
The N 1 term takes into account that the root contributes 1 to the path length
of each of the other N 1 nodes in the tree; the rest of the expression accounts
for the subtrees, which are equally likely to be any of the N sizes. After rearranging
terms, this recurrence is nearly identical to the one that we solved in Section 2.3
for quicksort, and we can derive the approximation CN ~ 2N ln N
I also reccommend you to check this Mit lecture Binary Search Trees, BST Sort
Also check the chapter 3.2 from Algorithms books, it explains binary search trees in depth

Merging Binary search tree

I'm looking for a efficient way of merging two BST, knowing that the elements in the first tree are all lower than those in the second one.
I saw some merging methods but without that feature, i think this should simplify the algorithm.
Any idea ?
If the trees are not balanced, or the result shouldn't be balanced that's quite easy:
without loss of generality - let the first BST be smaller (in size) than the second.
1. Find the highest element in the first BST - this is done by following the right son while it is still available. Let the value be x, and the node be v
2. Remove this item (v) from the first tree
3. Create a new Root with value x, let this new root be r
4. set r.left = tree1.root, r.right = tree2.root
(*) If the first BST is bigger in size than the second, just repeat the process for finding v as the smallest node in the second tree.
(*) Complexity is O(min{|T1|,|T2|}) worst case (finding highest element if the tree is very inbalanced), and O(log(min{|T1|,|T2|})) average case - if the tree is relatively balanced.

min and max height expression tree

When constructing an expression tree with n binary operations, which maximum and minimum height can I expect? I would be very thankful if someone has a general formula, because I couldn't find one and I also wasn't able to find a schema in the examples I worked with.
Let's suppose you have n operations. Naturally, the maximum height is n + 1, on the first level you see the root operation, on the last level you see value leafs and on all other levels you see an operation node and a value leaf. The minimum depth (if your operations always "cut" the expression in the middle) is of ceil(log(2, 2 * n + 1)).

locating lowest common ancestor in AVL tree

I have an AVL tree and 2 keys in it. how do I find the lowest common ancestor (by lowest I mean hight, not value) with O(logn) complexity?
I've seen an answer here on stackoverflow, but I admit I didn't exactly understand it. it involved finding the routes from each key to the root and then comparing them. I'm not sure how this meets the complexity requirements
For the first node you move up and mark the nodes. For the second node you move up and look if a node on the path is marked. As soon as you find a marked node you can stop. (And remove the marks by doing the first path again).
If you cannot mark nodes in the tree directly then modify the values contained to include a place where you can mark. If you cannot do this either then add a hashmap that stores which nodes are marked.
This is O(logn) because the tree is O(logn) deep and at worst you walk 3 times to the root.
Also, if you wish you can alternate steps of the two paths instead of first walking the first path completely. (Note that then both paths have to check for marks.) This might be better if you expect the two nodes to have their ancestor somewhat locally. The asymptotic runtime is the same as above.
A better solution for the AVL tree (balanced binary search tree) is (I have used C pointers like notation)-
Let K1 and K2 be 2 keys, for which LCA is to be found. Assume K1 < K2
A pointer P = root of tree
If P->key >= K1 and P->key <= K2 : return P
Else if P->key > K1 and P->key > K2 : P = P->left
Else P = P->right
Repeat step 3 to 5
The returned P points to the required LCA.
Note that this approach works only for BST, not any other Binary tree.

How to efficiently check whether it's height balanced for a massively skewed binary search tree?

I was reading this answer on how to check if a BST is height balanced, and really hooked by the bonus question:
Suppose the tree is massively unbalanced. Like, a million nodes deep on one side and three deep on the other. Is there a scenario in which this algorithm blows the stack? Can you fix the implementation so that it never blows the stack, even when given a massively unbalanced tree?
What would be a good strategy here?
I am thinking to do a level order traversal and track the depth, if a leaf is found and current node depth is bigger than the leaf node depth + 2, then it's not balanced. But how to combine this with height checking?
Edit: below is the implementation in the linked answer
IsHeightBalanced(tree)
return (tree is empty) or
(IsHeightBalanced(tree.left) and
IsHeightBalanced(tree.right) and
abs(Height(tree.left) - Height(tree.right)) <= 1)
To review briefly: a tree is defined as being either null or a root node with pointers .left to a left child and .right to a right child, where each child is in turn a tree, the root node appears in neither child, and no node appears in both children. The depth of a node is the number of pointers that must be followed to reach it from the root node. The height of a tree is -1 if it's null or else the maximum depth of a node that appears in it. A leaf is a node whose children are null.
First let me note the two distinct definitions of "balanced" proposed by answerers of the linked question.
EL-balanced A tree is EL-balanced if and only if, for every node v, |height(v.left) - height(v.right)| <= 1.
This is the balance condition for AVL trees.
DF-balanced A tree is DF-balanced if and only if, for every pair of leaves v, w, we have |depth(v) - depth(w)| <= 1. As DF points out, DF-balance for a node implies DF-balance for all of its descendants.
DF-balance is used for no algorithm known to me, though the balance condition for binary heaps is very similar, requiring additionally that the deeper leaves be as far left as possible.
I'm going to outline three approaches to testing balance.
Size bounds for balanced trees
Expand the recursive function to have an extra parameter, maxDepth. For each recursive call, pass maxDepth - 1, so that maxDepth roughly tracks how much stack space is left. If maxDepth reaches 0, report the tree as unbalanced (e.g., by returning "infinity" for the height), since no balanced tree that fits in main memory could possibly be that tall.
This approach relies on an a priori size bound on main memory, which is available in practice if not in all theoretical models, and the fact that no subtrees are shared. (PROTIP: unless you're very careful, your subtrees will be shared at some point during development.) We also need height bounds on balanced trees of at most a given size.
EL-balanced Via mutual induction, we prove a lower bound, L(h), on the number of nodes belonging to an EL-balanced tree of a given height h.
The base cases are
L(-1) = 0
L(0) = 1,
more or less by definition. The inductive case is trickier. An EL-balanced tree of height h > 0 is a node with an EL-balanced child of height h - 1 and another EL-balanced child of height either h - 1 or h - 2. This means that
L(h) = 1 + L(h - 1) + min(L(h - 2), L(h - 1)).
Add 1 to both sides and rearrange.
L(h) + 1 = L(h - 1) + 1 + min(L(h - 2) + 1, L(h - 1) + 1).
A little while later (spoiler), we find that
L(h) <= phi^(h + 2)/sqrt(5),
where phi = (1 + sqrt(5))/2 ~ 1.618.
maxDepth then should be set to the floor of the base-phi logarithm of the maximum number of nodes, plus a small constant that depends on fenceposty things.
DF-balanced Rather than write out an induction proof, I'm going to appeal to your intuition that the worst case is a complete binary tree with one extra leaf on the bottom. Then the proper setting for maxDepth is the base-2 logarithm of the maximum number of nodes, plus a small constant.
Iterative deepening depth-first search
This is the theoretician's version of the answer above. Because, for some reason, we don't know how much RAM our computer has (and with logarithmic space usage, it's not as though we need a tight bound), we again include the maxDepth parameter, but this time, we use it to truncate the tree implicitly below the specified depth. If the height of the tree comes back below the bound, then we know that the algorithm ran successfully. Alternatively, if the truncated tree is unbalanced, then so is the whole tree. The problem case is when the truncated tree is balanced but with height equal to maxDepth. Then we increase maxDepth and retry.
The simplest retry strategy is to increase maxDepth by 1 every time. Since balanced trees with n nodes have height O(log n), the running time is O(n log n). In fact, for DF-balanced trees, the running time is also O(n), since, except for the last couple traversals, the size of the truncated tree increases by a factor of 2 each time, leading to a geometric series.
Another strategy, doubling maxDepth each time, gives an O(n) running time for EL-balanced trees, since the largest tree of height h, with 2^(h + 1) - 1 nodes, is much smaller than the smallest tree of height 2h, with approximately (phi^2)^h nodes. The downside of doubling is that we may use twice as much stack space. With increase-by-1, however, in the family of minimum-size EL-balanced trees we constructed implicitly in defining L(h), the number of nodes at depth h - k in the tree of height h is polynomial of degree k. Accordingly, the last few scans will incur some superlinear term.
Temporarily mutating pointers
If there are parent pointers, then it's easy to traverse depth-first in place, because the parent pointers can be used to derive the relevant information on the stack in an efficient manner. If we don't have parent pointers but can mutate the tree temporarily, then, for descent into a child, we can cannibalize the pointer to that child to store temporarily the node's parent. The problem is determining on the way up whether we came from a left or a right child. If we can sneak a bit (say because pointers are 2-byte aligned, or because there's a spare bit in the balance factor, or because we're copying the tree for stop-and-copy garbage collection and can determine which arena we're in), then that's one way. Another test assumes that the tree is a binary search tree. It turns out that we don't need additional assumptions, however: Explain Morris inorder tree traversal without using stacks or recursion .
The one fly in the ointment is that this approach only works, as far as I know, on DF-balance, since there's no space on the stack to put the partial results for EL-balance.

Resources