I'm not hiding that this is part of my homework, but I've tried enough on my own before posting here.
So...
I need to prove, for a binary tree, that a node at position k has its left child at position 2k and its right child at position 2k + 1. I've proved this with induction.
Now I need to prove, for a binary tree, that a node at position k has its parent at position floor(k/2). I took two cases.
I tried it with induction as well. It's true for a tree of 3 nodes.
Assuming it's true for node k, I'll prove it for node k + 1.
If node k+1 shares a parent with node k, it's obviously true.
If node k+1 has a different parent than node k....
I'm trying to handle a general binary tree, but the case analysis doesn't let me apply the induction hypothesis. I suspect I'll have to use what I proved before about the children's positions.
Any help?
So you've proven that the kth node has children at 2k and 2k+1. Then let's divide the children into two cases: those at even positions and those at odd positions.
An even child is located at i = 2k for some k. By what you've proven, its parent is at k, which is i/2, which equals floor(i/2) since i is even.
An odd child is located at i = 2k+1 for some k, so its parent is at k. In this case floor(i/2) = floor((2k+1)/2) = floor(k + 1/2) = k, since k is an integer. So here too the parent is at floor(i/2).
Since the even and odd cases together cover all children, the parent of the node at position i is floor(i/2).
QED? Sorry if this isn't rigorous or formal enough...
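As a quick sanity check before you formalize it (a sketch, assuming the usual 1-indexed level-order array layout):

# In a 1-indexed level-order (array) layout, the children of node i sit at
# 2i and 2i+1, so the parent of node i should come out as i // 2 = floor(i/2).
for i in range(1, 1025):
    left, right = 2 * i, 2 * i + 1
    assert left // 2 == i   # even position: floor((2i)/2) = i
    assert right // 2 == i  # odd position: floor((2i+1)/2) = i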
I was asked this question in an interview, and I'm curious to know what the correct explanation for it would be.
Consider a height-balanced BST where Balance factor = Height of left subtree - Height of right subtree, and the accepted balance factors are 0, 1, -1, 2, and -2.
What will be the time taken to search for an element in such a height-balanced BST? Explain.
What I said was: even if the allowed balance factor is 2 rather than the 1 of the standard balanced-BST definition, the operations should still be of order log N complexity (where N is the number of elements in the tree), because when N is large, it won't make much difference whether the balance factor is 2 or 1.
If anyone can tell me what the correct answer would have been, that would be helpful :)
We can solve this mathematically as follows:
Defining Worst Case
Now, in any Binary Tree, the time complexity of searching is O(h), where h is the height of the tree.
For the worst case, we want to find the Maximum Height.
In the case of a simple Binary Search Tree with no Balancing Factor condition on the nodes, this maximum height can be n or n-1 (depending on the convention of whether the height of a single-node tree is 1 or 0), where n is the number of nodes.
Thus, we can say that, given the number of nodes, the worst case is the maximum height.
Interestingly, we can also turn this around: given the height of a tree, the worst case is the minimum number of nodes. Even for such a minimum number of nodes, a search might have to traverse all the way down a tree of height h, just as it would with the maximum number of nodes; and knowing the minimum number of nodes N_H that a tree of height H must contain lets us bound H in terms of n via n ≥ N_H.
Thus, the intuition should be clear: given the height of the tree, the worst case is the minimum number of nodes.
Applying this Concept to a Binary Search Tree
Let us try to construct a Binary Search Tree of height H such that the number of nodes in the tree is minimum.
Here we will exploit the fact that a Binary Tree is a Recursive Data Structure (a Binary Tree can be defined in terms of Binary Trees).
We will use the notation N_H to denote the Minimum Number of Nodes in a Binary Search Tree of height H.
We will create a Root Node
To the Left (or Right) of the Root, add a subtree of height H-1 (exploiting the Recursive Property). So that the number of nodes in the entire tree is minimum, the number of nodes in the Left (or Right) subtree should also be minimum. Thus N_H is a function of N_(H-1).
Do we need to add anything to Right (or Left)?
No, because there is no Balancing Factor restriction on a plain BST. Thus, our tree degenerates into a chain: every node has exactly one child.
Thus, to construct a Binary Search Tree of height H such that the number of nodes in the tree is minimum, we can take a Binary Search Tree of height H-1 with the minimum number of nodes and add 1 root node.
Thus, we can form Recurrence Relation as
N_H = N_(H-1) + 1
with base condition as
N_0 = 1
(To create a BST of height 0, we need one node. We will use this convention throughout the answer.)
Now, this Recurrence Relation is quite simple to solve by Substitution and thus
N_H = H + 1
N_H > H
Now, let n be the number of nodes in the BST of height H. Then,
n ≥ N_H
n ≥ H + 1
H < n
Therefore,
H = O(n)
Or
O(H) = O(n)
Thus, the Worst Case Time Complexity for Searching will be O(n)
Applying this Concept to an AVL Tree
We can apply the same concept to an AVL Tree. Following the reasoning used in the later part of this answer, one can derive the recurrence relation:
N_H = N_(H-1) + N_(H-2) + 1
with base conditions:
N_0 = 1
N_1 = 2
And on solving the recurrence, the resulting inequality is
N_H ≥ ((1+√5)/2)^H
Then, let n be the number of nodes. Thus,
n ≥ N_H ≥ ((1+√5)/2)^H
Taking logarithms to base (1+√5)/2 and converting to base 2 (note 1/log2((1+√5)/2) ≈ 1.44), one concludes that
H ≤ 1.44 log2(n)
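A quick numeric sanity check of this recurrence and inequality (a sketch; the height cutoff of 30 is arbitrary):

# Minimum node counts for an AVL tree of height H:
# N_H = N_(H-1) + N_(H-2) + 1, with N_0 = 1 and N_1 = 2.
phi = (1 + 5 ** 0.5) / 2            # (1+sqrt(5))/2
N = [1, 2]
for H in range(2, 30):
    N.append(N[H - 1] + N[H - 2] + 1)
assert all(N[H] >= phi ** H for H in range(30))   # N_H >= phi^H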
Applying this Concept to the GIVEN Tree
Let us try to construct the Given Tree of height H such that the number of nodes in the tree is minimum.
We will use the notation N_H to denote the Minimum Number of Nodes in the Given Tree of height H.
We will create a Root Node.
To the Left (or Right) of the Root, add a subtree of height H-1 (exploiting the Recursive Property). So that the number of nodes in the entire tree is minimum, the number of nodes in the Left (or Right) subtree should also be minimum. Thus N_H is a function of N_(H-1).
Do we need to add anything to Right (or Left)?
Yes! Because there is a Balancing Factor restriction on the nodes.
We need to add a subtree on the Right (or Left). What should its height be?
H?
No, then the height of the entire tree would become H+1
H-1?
Permitted! since Balancing Factor of Root will be 0
H-2?
Permitted! since Balancing Factor of Root will be 1
H-3?
Permitted! since Balancing Factor of Root will be 2
H-4?
Not Permitted! since Balancing Factor of Root will become 3
We want the minimum number of nodes, so out of H-1, H-2 and H-3, we will choose H-3 (a shorter subtree needs fewer nodes). So that the number of nodes in the entire tree is minimum, the number of nodes in the Right (or Left) subtree should also be minimum. Thus N_H is also a function of N_(H-3).
Thus, to construct the Given Tree of height H such that the number of nodes in the tree is minimum, we can have the LEFT subtree be a Given Tree of height H-1 with the minimum number of nodes, have the RIGHT subtree be a Given Tree of height H-3 with the minimum number of nodes, and add one Root Node.
Thus, we can form Recurrence Relation as
N_H = N_(H-1) + N_(H-3) + 1
with base condition as
N_0 = 1
N_1 = 2
N_2 = 3
Now, this Recurrence Relation is difficult to solve exactly. But, courtesy of this answer, we can conclude that
N_H ≥ (√2)^H
Now, let n be the number of nodes in the Given Tree. Then,
n ≥ N_H
n ≥ (√2)^H
log_√2(n) ≥ H
H ≤ log_√2(n)
H ≤ 2 log2(n)
Therefore, H = O(log(n))
Or O(H) = O(log(n))
Thus, the Worst Case Time Complexity for Searching in this Given Tree will be O(log(n))
Hence, Proved Mathematically!
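For the skeptical, the recurrence and the (√2)^H bound can also be sanity-checked numerically (a sketch; the height cutoff of 40 is arbitrary):

# Minimum node counts for the given tree of height H:
# N_H = N_(H-1) + N_(H-3) + 1, with N_0 = 1, N_1 = 2, N_2 = 3.
root2 = 2 ** 0.5
N = [1, 2, 3]
for H in range(3, 40):
    N.append(N[H - 1] + N[H - 3] + 1)
assert all(N[H] >= root2 ** H for H in range(40))  # N_H >= (sqrt 2)^H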
I tried watching http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-4-heaps-and-heap-sort/ to understand heaps and heapsort, but didn't find it clear.
I do not understand the function of max-heapify. It seems like a recursive function, but then somehow it's said to run in logarithmic time because of the height of the tree.
To me this makes no sense. In the worst case, won't it have to reverse every single node? I don't see how this can be done without it touching every single node, repeatedly.
Here's what MAX-HEAPIFY does:
Given a node at index i whose left and right subtrees are max-heaps, MAX-HEAPIFY moves the node at i down the max-heap until it no longer violates the max-heap property (that is, the node is not smaller than its children).
The longest path that a node can take before it is in the proper position is equal to the starting height of the node. Each time the node needs to go down one more level in the tree, the algorithm will choose exactly one branch to take and will never backtrack. If the node being heapified is the root of the max-heap, then the longest path it can take is the height of the tree, or O(log n).
MAX-HEAPIFY moves only one node. If you want to convert an array to a max-heap, you have to ensure that all of the subtrees are max-heaps before moving on to the root. You do this by calling MAX-HEAPIFY on n/2 nodes (leaves always satisfy the max-heap property).
From CLRS:
for i = floor(length(A)/2) downto 1
    do MAX-HEAPIFY(A, i)
Since you call MAX-HEAPIFY O(n) times, building the entire heap is O(n log n).*
* As mentioned in the comments, a tighter upper-bound of O(n) can be shown. See Section 6.3 of the 2nd and 3rd editions of CLRS for the analysis. (My 1st edition is packed away, so I wasn't able to verify the section number.)
In the worst case, won't it have to reverse every single node?
You don't have to go through every node. The standard max-heapify algorithm (taken from Wikipedia) is:
Max-Heapify (A, i):
    left ← 2*i            // ← means "assignment"
    right ← 2*i + 1
    largest ← i

    if left ≤ heap_length[A] and A[left] > A[largest] then:
        largest ← left
    if right ≤ heap_length[A] and A[right] > A[largest] then:
        largest ← right

    if largest ≠ i then:
        swap A[i] and A[largest]
        Max-Heapify(A, largest)
You can see that on each recursive call you either stop or continue with either the left or the right subtree. In the latter case you decrease the remaining height by 1. Since the heap tree is balanced by definition, you would do at most log(N) steps.
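If it helps to see this runnable, here is a Python version of the same algorithm plus the bottom-up build (a sketch using 0-based indexing, so the children of index i sit at 2i+1 and 2i+2 rather than 2i and 2i+1):

def max_heapify(a, i, heap_size):
    # Sift a[i] down until it is no smaller than both of its children.
    left, right = 2 * i + 1, 2 * i + 2      # 0-based children of i
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    # The back half of the array holds the leaves, which are already heaps,
    # so heapify only the internal nodes, bottom-up.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

a = [3, 9, 2, 1, 4, 5]
build_max_heap(a)
assert all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))  # heap property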
Here's an argument for why it's O(N).
Assume it's a full heap, so every non-leaf node has two children. (It still works even if that's not the case, but it's more annoying.)
Put a coin on each node in the tree. Each time we do a swap, we're going to spend one of those coins. (Note that when elements swap in the heap, the coins don't swap with them.) If we run MAX-HEAPIFY, and there's any coins left over, that means we've done fewer swaps than there are nodes in the tree, and thus MAX-HEAPIFY performs O(N) swaps.
Claim: after MAX-HEAPIFY is done running, a heap will always have at least one path from the root to a leaf with coins on every node of the path.
Proof by induction: For a single-node heap, we don't need to do any swaps, so we don't need to spend any coins. Thus, the one node gets to keep its coin, and we have a full path from root to leaf (of length 1) with coin intact.
Now, assume we have a heap with left and right subheaps, and MAX-HEAPIFY has already run on both. By the inductive hypothesis, each has at least one path from root to leaf with coins on it, so we have at least two coin-laden root-to-leaf paths, one in each child. The farthest the root would ever need to go in order to establish the max-heap property is to swap all the way to the bottom of the tree. Say it swaps down into the left subtree; the number of swaps is then at most the number of nodes on the left child's coin-laden path, so we pay for each swap with one coin taken from that path.
In doing this, we spent the coins on one of the coin-laden root-to-leaf paths, but remember we originally had at least two! The other child's path, together with the root's own unspent coin, still forms a complete root-to-leaf path with coins after MAX-HEAPIFY runs on the whole heap, so the invariant from the claim is preserved. Therefore, MAX-HEAPIFY spent fewer coins than there are nodes in the tree. Therefore, the number of swaps is O(N). QED.
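You can also watch the coins being spent empirically. This sketch counts swaps during a bottom-up build on random arrays and checks that they always total fewer than N, in line with the argument above:

import random

def max_heapify(a, i, size, swaps):
    # Sift a[i] down, spending one "coin" per swap.
    left, right, largest = 2 * i + 1, 2 * i + 2, i
    if left < size and a[left] > a[largest]:
        largest = left
    if right < size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        swaps[0] += 1
        max_heapify(a, largest, size, swaps)

for n in (1, 2, 10, 1000, 100000):
    a = [random.random() for _ in range(n)]
    swaps = [0]
    for i in range(n // 2 - 1, -1, -1):     # bottom-up heap construction
        max_heapify(a, i, n, swaps)
    assert swaps[0] < n                     # fewer swaps than nodes: O(N)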
As the question states, I'm trying to find an algorithm to find the successor of a key 'k' in a balanced binary search tree. I think a balanced BST is the same as an AVL tree (correct me if I'm wrong). I was hoping I could do this in one pass in O(log n) time, but everything I've found says I need to go down the right side of the tree and then the left. I'm new to the whole trees thing and still find it a little confusing. Any help would be greatly appreciated!
Thanks.
In a binary search tree, you have two options to go down: left or right.
Now suppose we have an element k in a node N. We want to find k's successor, which is the smallest element of the tree that is greater than k.
There are 3 cases here:
N has a non-NULL right child : the leftmost element of the right subtree is k's successor.
N has no such right child and is the left child of its parent P. In this case, P holds k's successor.
Finally, N is the right child of its parent P. Then, to find its successor, you must follow the more elaborate climbing procedure shown below.
Starting from S = Parent(P): while S ≠ NULL AND P = Right(S)
P ← S
S ← Parent(S)
If this loop ends with S = NULL, we climbed past the root, so k was the maximum element of the tree and has no successor. Otherwise the loop stopped because P = Left(S), and S holds k's successor: N is the largest element in S's left subtree, so S is the smallest element greater than k.
If the node with key k has a right subtree, its successor is the leftmost node in that subtree.
Otherwise, if the node with key k is a left child, its successor is its parent.
Otherwise, find the closest ancestor of the node whose left subtree contains the node (follow parent pointers until you step up from a left child); that ancestor is the successor. If no such ancestor exists, the node was the maximum and does not have a successor.
Since the tree is balanced, you can always find it in O(log n).
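Here is a minimal Python sketch of the three cases, assuming a hypothetical node type that stores a parent pointer (neither answer fixes a concrete representation):

class Node:
    def __init__(self, key, left=None, right=None, parent=None):
        self.key = key
        self.left, self.right, self.parent = left, right, parent

def successor(n):
    # Case 1: n has a right subtree -> leftmost node of that subtree.
    if n.right is not None:
        n = n.right
        while n.left is not None:
            n = n.left
        return n
    # Cases 2 and 3: climb until we step up from a left child; that parent
    # is the successor. Case 2 stops after a single step.
    p = n.parent
    while p is not None and n is p.right:
        n, p = p, p.parent
    return p    # None means n held the maximum key

In a balanced tree, both the descent and the climb are bounded by the height, so each call is O(log n).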
How to prove that finding a successor n-1 times in the BST from the minimum node is O(n)?
The question says that we can produce sorted order by:
1) Letting the node be the minimum node of the BST.
2) From that node, repeatedly calling find-successor.
I was told that the result is O(n), but I do not understand why and do not know how to prove it.
Shouldn't it be O(n*log n) instead? Step 1 is O(log n), and step 2 is also O(log n) but is called n-1 times. Therefore, it should be O(n*log n).
Please clarify my doubt. Thank you! :)
You are correct that any individual operation might take O(log n) time, so if you perform those operations n times, you should get a runtime of O(n log n). This bound is correct, but it's not tight. The actual runtime is Θ(n).
One way to see this is to look at any individual edge in the tree. How many times will you visit each edge if you start at the leftmost node and repeatedly perform a successor query? If you look closely at how the operations work, you'll discover that every edge is visited exactly twice: once downward and once upward. Since all the work done is done traversing up and down edges, this means that the total amount of work done is proportional to twice the number of edges. In any tree, the number of edges is the number of nodes minus one, and so the total work done is Θ(n).
To formalize this as a proof, try showing that you never descend down the same edge twice and that when you ascend up an edge, you never descend down that edge again. Once you've done this, the conclusion that the runtime is Θ(n) follows from the above logic.
Hope this helps!
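To see the edge-counting concretely, here is a small Python experiment (a sketch, assuming a plain unbalanced BST with parent pointers): it performs the min-then-successor walk and counts every edge crossing. Including the initial descent to the minimum and the final failed successor call from the maximum, the total comes out to exactly 2(n - 1).

import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(root, key):
    # Plain (unbalanced) BST insertion maintaining parent pointers.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        root.left.parent = root
    else:
        root.right = insert(root.right, key)
        root.right.parent = root
    return root

def successor(n, edges):
    # Standard successor query; edges[0] counts every edge crossed.
    if n.right is not None:
        n = n.right
        edges[0] += 1                    # down into the right subtree
        while n.left is not None:
            n = n.left
            edges[0] += 1                # down the left spine
        return n
    p = n.parent
    while p is not None:
        edges[0] += 1                    # up one edge
        if n is p.left:
            return p
        n, p = p, p.parent
    return None                          # n held the maximum key

keys = random.sample(range(10 ** 6), 1000)
root = None
for k in keys:
    root = insert(root, k)

edges, n = [0], root
while n.left is not None:                # find_min crosses the left spine down
    n = n.left
    edges[0] += 1

order = []
while n is not None:
    order.append(n.key)
    n = successor(n, edges)              # the last call climbs the right spine

assert order == sorted(keys)
assert edges[0] == 2 * (len(keys) - 1)   # every edge crossed exactly twice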
I wanted to post this as a comment on templatetypedef's answer, but it's too long.
His answer is right in that the easiest way to see that this is linear is that every edge is visited exactly twice, and the number of edges in a tree is always one less than the number of nodes (because every node has one parent, except the root!).
The issue is that the way he phrases the formal proof uses words that seem to imply contradiction as the way to go. In general, mathematicians frown on using contradiction because it often produces proofs with superfluous content. For instance:
Proof that 2 + 2 != 5:
Assume for contradiction that 2 + 2 = 5 (<- Remove this line)
Well 2 + 2 = 4
And 4 != 5
Contradiction! (<- Remove this line)
Contradiction tends to be verbose, and sometimes it can even obfuscate the idea behind the proof! There are times when contradiction seems pretty much necessary, but it's relatively rare and that's a separate discussion.
In this case, I don't see a proof by contradiction being any easier than a direct proof. On the other hand, regardless of proof technique, this proof is pretty ugly to do formally. Here's an attempt:
1) The succ(n) algorithm traverses one of two paths
In the first case, every edge on the simple path from a node to the leftmost node of its right subtree is visited.
In the other case, the node n has no right child, so we go up through its ancestors p_1, p_2, p_3, ..., p_k, where p_(k-1) is the first ancestor that is the left child of its parent. All of the edges on that simple path are visited.
We want to show that an arbitrary edge is traversed in precisely two succ() calls, once for the first case of succ() and once for the second case of succ(). Well, this is true for every edge other than the rightmost branch, but you can handle those edge cases separately. Alternatively we could prove the simpler argument where we return to the root after visiting the last element
This is two-fold because for a given edge e we have to find the n1 and n2 such that succ(n1) traverses e and succ(n2) also traverses e, as well as prove that every other succ() generates a path which does not include e.
2) First we actually prove that for each type of path that succ() visits, no two paths overlap (i.e. if succ(n) and succ(n') both traverse paths of the same type, those paths share no edges)
In the first case, the simple path is precisely defined as follows. Start at node n and go one edge to the right to r. Then traverse the leftmost branch of the subtree rooted at r. Now consider any other such path that starts at some other node n' (note, we don't assume that n ≠ n'). It must go right one node to r'. Then it traverses the leftmost branch of the subtree rooted at r'. If the paths overlap, pick one of the edges that overlap. If it's (n,r) = (n',r') then we have n = n', and so it's the same path. If it's some e = e' in both leftmost branches, then you can show, again, that n = n' (trace the leftmost branches and show that every edge is the same, finally reaching the conclusion that r = r', and hence n = n', because in a tree the parent of a node is unique; you'll see this tracing argument again below). Thus we know that for any n and n', if their paths overlap, they are actually the same node! The contrapositive says this: if they are different nodes, then their paths don't overlap. That's exactly what we want (and the contrapositive is exactly as true as the original statement).
In the second case, the simple path starts at node n and goes up the ancestors p_1, p_2, ..., p_k = g, where p_k is the first ancestor such that p_(k-1) is the left child of p_k. Consider some other path of the same type that starts at node n', visiting p_1', p_2', ..., p_k' = g'. Suppose the two paths share an edge. Every node on such a path below its top is the right child of the node above it, and a node has only one right child, so starting from the shared edge and tracing downward, the two paths must coincide edge by edge; and since a node that starts such a climb has no right child, the two traces bottom out at the same starting node. Hence n = n', and by the contrapositive, if n ≠ n' then succ(n) and succ(n') traverse none of the same edges.
3) Now we just need to show that at least one path of each type exists for a given edge. Well, take any such edge e = (c,p). (Note: here I am ignoring the special edges on the rightmost branch, which are technically only visited once, and the special edges on the leftmost branch, which are technically visited once by find_min() and then once by succ() calls.)
If it goes from a left child c to its parent p, then the second type of path through e is generated by succ(m), where m is the rightmost (maximum) node in c's subtree (m = c itself if c has no right child): the climb from m passes up through c, crosses e, and stops at p, since c is the left child of p. To find the other path, keep going up p's ancestors p_1 = p, p_2, ..., p_k, where p_(k-1) is the first one that is the right child of its parent p_k. succ(p_k) will traverse a path containing e by definition (since e is on the leftmost branch of the subtree rooted at p_(k-1), which is p_k's right child).
A similar argument holds for the symmetric case when c is the right child of p.
To summarize the proof: we've shown that succ() generates two types of paths. Within each type, no two paths overlap. Furthermore, for any edge we have at least one path of each type through it. Since we call succ() on every node, we can finally conclude that each edge is traversed exactly twice (and hence the algorithm is Theta(n)).
Despite how long this proof was, it isn't actually complete (even ignoring the points when I explicitly said I was skipping details!). There are cases where I said something exists without proving it exists. You can figure out those details if you want and it is actually really satisfying to get it completely right (in my opinion at least. Maybe when you're a genius you'll find it tedious, heh)
Hope this helped. Let me know if you want me to clarify some steps
Problem: I have a binary tree, all leaves are numbered (from left to right, starting from 0) and no connection exists between them.
I want an algorithm that, given two indices (of 2 distinct leaves), visits the tree starting from the greater leaf (the one with the higher index) and gets to the lower one.
The internal nodes of the tree do not contain any useful information.
I should choose the path based only on the leaf indices. The path starts at a leaf and terminates at a leaf, and of course I can access a leaf if I know its index (through an array of pointers).
The tree is static, no insertion or deletion of nodes is allowed.
I have developed an algorithm to do it but it really sucks... any ideas?
One option would be to find the least common ancestor of the two nodes, along with the sequence of nodes you should take from each node to get to that ancestor. Here's a sketch of the algorithm:
Starting from each node, walk back up to that node's parent until you reach the root. Count the number of nodes on the path from each node to the root. Let the height of the first node be h1 and the height of the second node be h2.
Let h = min(h1, h2). This is the height of the higher of the two nodes.
Starting from each node, keep following the node's parent pointer until both nodes are at height h. Record the nodes you followed during this step. At this point, both nodes are at the same height.
Until you find a common node, keep marching upwards from each node to its parent. Eventually you will hit their common ancestor. At this point, follow the path from the first node up to this ancestor, then down the path from the ancestor down to the second node.
In the worst case, this takes O(h) time and O(h) space, where h is the height of the tree. For a balanced binary tree this is O(lg n) time and space, which is quite good.
If you're interested in a Much More Hardcore version of this algorithm, consider looking into Tarjan's Least Common Ancestors algorithm, which, with linear preprocessing time, can be used to find least common ancestors much more rapidly than this.
Hope this helps!
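Here is a rough Python sketch of that walk, assuming a hypothetical node type with a parent pointer and a leaves array for access by index (as described in the question):

def depth(node):
    # Number of edges from the root down to node.
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d

def path_between(a, b):
    # Node path from leaf a, up through the least common ancestor, down to b.
    up, down = [a], [b]                  # nodes recorded on each side
    da, db = depth(a), depth(b)
    while da > db:                       # bring the deeper node up first
        a = a.parent
        up.append(a)
        da -= 1
    while db > da:
        b = b.parent
        down.append(b)
        db -= 1
    while a is not b:                    # march up in lockstep until they meet
        a = a.parent
        up.append(a)
        b = b.parent
        down.append(b)
    return up + down[-2::-1]             # up ends at the LCA; mirror the b side

For the original problem you would start from the higher-indexed leaf, e.g. path = path_between(leaves[max(i, j)], leaves[min(i, j)]).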
The distance between any two nodes can be calculated with the help of the lowest common ancestor:
Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
where lca is the lowest common ancestor.
See this for more help with this algorithm, and see this video to learn how to calculate the lca.
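With parent pointers, the formula translates directly into code; a minimal sketch (assuming the same hypothetical node-with-parent representation as in the sketch above):

def depth(node):
    # Dist(root, node): number of edges from the root down to node.
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d

def dist(n1, n2):
    d1, d2 = depth(n1), depth(n2)
    a, b, da, db = n1, n2, d1, d2
    while da > db:                       # equalize depths first
        a, da = a.parent, da - 1
    while db > da:
        b, db = b.parent, db - 1
    while a is not b:                    # climb in lockstep to find the lca
        a, b = a.parent, b.parent
    # Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
    return d1 + d2 - 2 * depth(a)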