min and max height expression tree - binary-tree

When constructing an expression tree with n binary operations, which maximum and minimum height can I expect? I would be very thankful if someone has a general formula, because I couldn't find one and I also wasn't able to find a schema in the examples I worked with.

Let's suppose you have n operations. Naturally, the maximum height is n + 1, on the first level you see the root operation, on the last level you see value leafs and on all other levels you see an operation node and a value leaf. The minimum depth (if your operations always "cut" the expression in the middle) is of ceil(log(2, 2 * n + 1)).

Related

How many comparisons a call to removeMin() will make in max heap of 7-ary tree?

Assume that a max heap with 10^6 elements is stored in a complete 7-ary tree. Approximately how many comparisons a call to removeMin() will make?
5000
50
10^6
500
5
My solution: The number of comparisons should be equal to the number of leaf nodes at most because in max heap, the min. can be found at any of the leaf nodes which is not in the above options. Better approach was to take the square of ( log of 10^6 to the base 7) which gives 50 but this is only when we are sure that the minimum element will follow a single branch across tree which in the case of max heap is not correct.
I hope that you can help.
There's no "natural" way to remove the minimum value from a max heap. You simply have to look at all the leaf nodes to figure out which one happens to be the minimum.
The question then is how many leaf nodes there are. Intuitively, we'd expect the fraction of nodes in the heap that are leaves to be pretty close to the total number of nodes. Take it to the limit - if you have a 1,000,000-ary heap, you'd have one node in the top layer and all remaining 999,999 elements in the next layer. Even in the smallest case where the heap is a binary heap, you'd expect roughly half the elements to be in the bottom layer.
More specifically, let's do some math! How many leaves will a 7-ary heap with n nodes have? Well, each node in the tree will either
be a leaf, or
have seven children,
with one possible exception that, since the bottommost row might not be full, there might be one node with fewer than seven children. Since that's just a one-off, we can ignore that last node when we're dealing with millions of elements. A quick proof by induction can be used to show that any tree where each node either has no children or seven children will have seven times as many leaf nodes as internal nodes (prove this!), so we'd expect than (7/8)ths of the nodes will be leaves, for a total of 875,000 leaves to check.
As a result, the best answer here would be roughly 106 comparisons.
Min element can be any of the leaves of a max heap or any type, and there's no order there. All elements from A[10^6/7 + 1] onwards (where A is the array storing the leaves) are leaf nodes and need to be checked. This means 8571412 comparisons just to find the minimum. After that there is no simple way to 'remove' the minimum without introducing a gap that cannot be filled by simply shifting the leaves.
This is a misprint. Maybe the teacher wanted to ask removeMax, for which the answer is close to 50 -- see below:
There are 7 comparisons per level done by the heapify since each node has 7 children. If h is the height of the heap then that's 7*h comparisons.
Rough analysis: (here ~ means approximately)
h ~ log_7(10^6) = 7.1, hence total comparisons 7*7.1 ~ 50
More accurate analysis:
A 7-ary heap would have elements: 1 + 7 + 7^2 + ... + 7^h = 10^6
On the left side is a geometric series, that sums to: (7^h -1)/6 = 10^6
=> 7^h = 6*10^6 + 1
=> h = lg_7(6*10^6 + 1) = 8 (approximately) , hence 7*8 = 56, still from the options 50 is the closest.
*A is array to sort heap.

Recurrence: T(n/4) + T(n/2) + n^2 with tree methods

I'm trying to solve this exercise with tree method but I've a doubt about two parts:
1) In the T(?) column, is it correct using (n^2/2^i) instead of (n/2^i)? I'm asking because this is the part which cause me the error;
2) Is the last multiplication correct (it's between the number of nodes and the time)? After finding the i value I have to create a serie which starts from 0 to the result of the multiplication, right? And as variable of the serie have I to use 2^i (the number of nodes)?
The column for the number of nodes is misleading.
Each node has a cost of (m/k)^2 where k is whatever the denominator of the node is. With the structure you have using, the nodes in each level will have a variety of denominators. For example, your level 2 should contain the nodes [(m/16), (m/8)], [(m/8), (m/4)].
The cost for a level is the sum of the cost of each node in that level. Since each node has a different cost, you cannot multiply the number of nodes by a value to find the cost of a level, you have to add them up individually.
The total cost is the sum of the cost of each level. The result of this calculation may result in a logarithm, or it may not. It depends on the cost of each level and the number of levels.
Hint: Pascal's Triangle

Missing number in binary search tree

If I have order statistic binary balanced tree that has n different integers as its keys and I want to write function find(x) that returns the minimal integer that is not in the tree, and is greater than x. in O(log(n)) time.
For example, if the keys in the tree are 6,7,8,10,11,13,14 then find(6)=9, find(8)=9, find(10)=12, find(13)=15.
I think about finding the max in O(log(n)) and the index of x (mark i_x) in O(log(n)) then if i_x=n-(m-x) then I can simply return max+1.
By index I mean in 6,7,8,10,11,13,14 that index of 6 is 0 and index of 10 is 3 for example...
But I'm having trouble with the other cases...
According to wikipedia, an order statistic tree supports those two operations in log(n) time:
Select(i) — find the i'th smallest element stored in the tree in O(log(n))
Rank(x) – find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree in O(log(n))
Start by getting the rank of x, and select the superior ranks of x until you find a place to insert your missing element. But this has worst-case n*log(n).
So instead, once you have the rank of x, you do a kind of binary search. The basic idea is whether there is a space between number x and y which are in the tree. There is a space if rank(x) - rank(y) != x - y.
General case is: when searching for the number in the interval [lo,hi] (lo and hi are ranks in the tree, mid is the middle rank), if there is a space between lo and mid then search inside [lo,mid], else search inside [mid, hi].
You will end up finding the number you seek.
However, this solution does not run in log(n) time, but in log^2(n). This is the best I can think of for a general solution.
EDIT:
Well, it's a tough question, I changed my mind several times. Here is what I came up with:
I assume that the left node holds inferior value and the right node holds superior value
Intuition of find(x): Start at the root and go down the tree almost like in a standard binary tree. If the branch we want to go does not contain the solution of find(x) then cut it.
We'll go through the basic cases first:
If the node I found is null, then I am done, and I return the value I was looking for.
If the current value is less than the one I am looking for, I search for x in the right subtree
If I found the node containing x, then I search for x+1 on the right subtree.
The case where x is in the left subtree is more tricky, because it may contain x, x+1, x+2, x+3, etc up to y-1 where y is the value stored in the current node. In this case, we want to search for y+1 in the right subtree.
However, if all the numbers from x to y are not in the left subtree (that is, there is a gap), then we will find a value in it, so we look into the left subtree for x.
Question is: How to find if the sequence from x to y is present in the subtree ?
The algorithm in python looks like this:
def find(node, x):
if node == null:
return x
if node.data < x:
return find(node.right, x)
if node.data == x:
return find(node.right, x+1)
if is_full(...):
return find(node.right, node.data+1)
return find(node.left, x)
To get the smallest value strictly greater than x which is not in the tree, the first call is find(root, x+1). If you want the smallest value greater than or equals to x that is not in the tree, the first call is find(root, x).
The is_full method checks if the left subtree contains all number from x to node.data-1.
Now, using this as a starting point, I believe you can find a suitable solution by yourself, using the fact that the number of nodes contained in each subtree is stored at the subtree's root.
I faced a similar question.
There were no restrictions about finding greater than some x, simply find the missing element in the BST.
Below is my answer, it is perfectly possible to do so in O(lg(n)) time, with the assumption that, tree is almost balanced. You might want to consider the proof that expected height of the randomly built BST is lg(n) given n elements. I use a simpler notation, O(h) where h = height of the tree, so two things are now separate.
assumptions and/or requirements:
I enhance the data structure. store the count of (left_subtree + right_subtree + 1) at each node.
Obviously, count of a single node is 1
This count is pre-computed and stored at each node
Kindly pardon my multiple notations for not equal to (=/= and !=)
Also note that code might be structured in little better way if one is to write a working code on a machine.
Moreover, I think, at this point in time, that this is correct. I tried as many corner cases as I could think of, and in general it works. Even if there is a counter example, I don;t think it will be that difficult to modify the code to fit that particular case; but please comment the counter example, I am interested.

Binary Search Tree Construction of height h

Given the first n natural numbers as the keys for a BST , How can I determine the root node of all possible tree that have a height 'h' .
I already came up with a brute force method where I constructed all possible trees with n nodes and then selected the trees which have a height h but it has a time complexity of nearly O(n!) . Can someone please suggest a better method which would me more efficient ?
Problem statement. Given natural numbers n and h, determine exactly all elements root in 1..n such that root is the root of a binary search tree on 1..n of height h.
Solution. We can construct a degenerate binary search tree on 1..n starting from any number in 1..n by splitting it up at root. This changes the lower bounds from the old solution to h-1, while the upper bounds remain the same, rendering the full bounds as follows:
h-1 <= max(root-1, n-root) <= 2^h - 1
Old solution (correct only for full binary trees). A full binary tree with height h has at least 2h+1 nodes, and at most 2^(h+1)-1 nodes. It's easy to see that these bounds are tight, not only for binary trees, but also for binary search trees. In particular, they apply to the left and right subtrees of your root. Since this is a binary search tree on 1..n, you will have that left contains exactly the elements 1..(root-1), and right contains exactly the elements (root+1)..n.
This means that the following is both a necessary and sufficient condition: The larger of the subtrees left and right must satisfy the inequalities
2*(h-1) + 1 <= nodes(subtree) <= 2^h - 1
In other words, the possible values of root are exactly all values in 1..n satisfying
2*(h-1) + 1 <= max(root-1, n-root) <= 2^h - 1
Update. I blindly looked at an inequality I found at wikipedia without realizing that it only applied to full binary trees.

How to efficiently check whether it's height balanced for a massively skewed binary search tree?

I was reading this answer on how to check if a BST is height balanced, and really hooked by the bonus question:
Suppose the tree is massively unbalanced. Like, a million nodes deep on one side and three deep on the other. Is there a scenario in which this algorithm blows the stack? Can you fix the implementation so that it never blows the stack, even when given a massively unbalanced tree?
What would be a good strategy here?
I am thinking to do a level order traversal and track the depth, if a leaf is found and current node depth is bigger than the leaf node depth + 2, then it's not balanced. But how to combine this with height checking?
Edit: below is the implementation in the linked answer
IsHeightBalanced(tree)
return (tree is empty) or
(IsHeightBalanced(tree.left) and
IsHeightBalanced(tree.right) and
abs(Height(tree.left) - Height(tree.right)) <= 1)
To review briefly: a tree is defined as being either null or a root node with pointers .left to a left child and .right to a right child, where each child is in turn a tree, the root node appears in neither child, and no node appears in both children. The depth of a node is the number of pointers that must be followed to reach it from the root node. The height of a tree is -1 if it's null or else the maximum depth of a node that appears in it. A leaf is a node whose children are null.
First let me note the two distinct definitions of "balanced" proposed by answerers of the linked question.
EL-balanced A tree is EL-balanced if and only if, for every node v, |height(v.left) - height(v.right)| <= 1.
This is the balance condition for AVL trees.
DF-balanced A tree is DF-balanced if and only if, for every pair of leaves v, w, we have |depth(v) - depth(w)| <= 1. As DF points out, DF-balance for a node implies DF-balance for all of its descendants.
DF-balance is used for no algorithm known to me, though the balance condition for binary heaps is very similar, requiring additionally that the deeper leaves be as far left as possible.
I'm going to outline three approaches to testing balance.
Size bounds for balanced trees
Expand the recursive function to have an extra parameter, maxDepth. For each recursive call, pass maxDepth - 1, so that maxDepth roughly tracks how much stack space is left. If maxDepth reaches 0, report the tree as unbalanced (e.g., by returning "infinity" for the height), since no balanced tree that fits in main memory could possibly be that tall.
This approach relies on an a priori size bound on main memory, which is available in practice if not in all theoretical models, and the fact that no subtrees are shared. (PROTIP: unless you're very careful, your subtrees will be shared at some point during development.) We also need height bounds on balanced trees of at most a given size.
EL-balanced Via mutual induction, we prove a lower bound, L(h), on the number of nodes belonging to an EL-balanced tree of a given height h.
The base cases are
L(-1) = 0
L(0) = 1,
more or less by definition. The inductive case is trickier. An EL-balanced tree of height h > 0 is a node with an EL-balanced child of height h - 1 and another EL-balanced child of height either h - 1 or h - 2. This means that
L(h) = 1 + L(h - 1) + min(L(h - 2), L(h - 1)).
Add 1 to both sides and rearrange.
L(h) + 1 = L(h - 1) + 1 + min(L(h - 2) + 1, L(h - 1) + 1).
A little while later (spoiler), we find that
L(h) <= phi^(h + 2)/sqrt(5),
where phi = (1 + sqrt(5))/2 ~ 1.618.
maxDepth then should be set to the floor of the base-phi logarithm of the maximum number of nodes, plus a small constant that depends on fenceposty things.
DF-balanced Rather than write out an induction proof, I'm going to appeal to your intuition that the worst case is a complete binary tree with one extra leaf on the bottom. Then the proper setting for maxDepth is the base-2 logarithm of the maximum number of nodes, plus a small constant.
Iterative deepening depth-first search
This is the theoretician's version of the answer above. Because, for some reason, we don't know how much RAM our computer has (and with logarithmic space usage, it's not as though we need a tight bound), we again include the maxDepth parameter, but this time, we use it to truncate the tree implicitly below the specified depth. If the height of the tree comes back below the bound, then we know that the algorithm ran successfully. Alternatively, if the truncated tree is unbalanced, then so is the whole tree. The problem case is when the truncated tree is balanced but with height equal to maxDepth. Then we increase maxDepth and retry.
The simplest retry strategy is to increase maxDepth by 1 every time. Since balanced trees with n nodes have height O(log n), the running time is O(n log n). In fact, for DF-balanced trees, the running time is also O(n), since, except for the last couple traversals, the size of the truncated tree increases by a factor of 2 each time, leading to a geometric series.
Another strategy, doubling maxDepth each time, gives an O(n) running time for EL-balanced trees, since the largest tree of height h, with 2^(h + 1) - 1 nodes, is much smaller than the smallest tree of height 2h, with approximately (phi^2)^h nodes. The downside of doubling is that we may use twice as much stack space. With increase-by-1, however, in the family of minimum-size EL-balanced trees we constructed implicitly in defining L(h), the number of nodes at depth h - k in the tree of height h is polynomial of degree k. Accordingly, the last few scans will incur some superlinear term.
Temporarily mutating pointers
If there are parent pointers, then it's easy to traverse depth-first in place, because the parent pointers can be used to derive the relevant information on the stack in an efficient manner. If we don't have parent pointers but can mutate the tree temporarily, then, for descent into a child, we can cannibalize the pointer to that child to store temporarily the node's parent. The problem is determining on the way up whether we came from a left or a right child. If we can sneak a bit (say because pointers are 2-byte aligned, or because there's a spare bit in the balance factor, or because we're copying the tree for stop-and-copy garbage collection and can determine which arena we're in), then that's one way. Another test assumes that the tree is a binary search tree. It turns out that we don't need additional assumptions, however: Explain Morris inorder tree traversal without using stacks or recursion .
The one fly in the ointment is that this approach only works, as far as I know, on DF-balance, since there's no space on the stack to put the partial results for EL-balance.

Resources