I'm struggling with following problem:
Write function which for given binary tree returns a root of minimal height which is not BST or NIL when tree is BST.
I know how to check if tree is BST but don't know how to rewrite it.
I would be grateful for an algorithm in pseudo code.
Rather than jumping right into an algorithm that works here, I'd like to give a series of observations that ultimately leads up to a really nice algorithm for this problem.
First, suppose that, for each node in the tree, you knew the value of the largest and smallest values in the subtree rooted at that node. (Let's denote these as min(x) and max(x), where x is a node in the tree). Given this information, we can make the following observation:
Observation 1: A node x is the root of a non-BST if x ≤ max(x.left) or if x ≥ min(y.right)
This is not an if-and-only-if condition - it's just an "if" - but it's a useful observation to have. The reason this works is that if x ≤ max(x.left), then there is a node in x's left subtree that's not smaller than x, meaning that the tree rooted at x isn't a BST, and if x > min(x.right), then there's a node in x's right subtree that's not larger than x, meaning that the tree rooted at x isn't a BST.
Now, it's not necessarily the case that any node where x < min(x.right) and x > max(x.left) is the root of a BST. Consider this tree, for example:
4
/ \
1 6
/ \
2 5
Here, the root node is larger than everything in its left subtree and smaller than everything in its right subtree, but the entire tree is itself not a BST. The reason for this is that the trees rooted at 1 and 6 aren't BSTs. This leads to a useful observation:
Observation 2: If x > max(x.left) and x < min(x.right), then x is a BST if and only if x.left and x.right are BSTs.
A quick sketch of a proof of this result: if x.left and x.right are BSTs, then doing an inorder traversal of the tree will list off all the values in x.left in ascending order, then x, then all the values in x.right in ascending order. Since x > max(x.left) and x < min(x.right), these values are sorted, so the tree is a BST. On the other hand, if either x.left or x.right are not BSTs, then the order in which these values come back won't be sorted, so the tree isn't a BST.
These two properties give a really nice way to find every node in the tree that isn't the root of a BST. The idea is to work through the nodes in the tree from the leaves upward, checking whether each node's value is greater than the max of its left subtree and less than the min of its right subtree, then checking whether its left and right subtrees are BSTs. You can do this with a postorder traversal, as shown here:
/* Does a postorder traversal of the tree, tagging each node with its
* subtree min, subtree max, and whether the node is the root of a
* BST.
*/
function findNonBSTs(r) {
/* Edge case for an empty tree. */
if (r is null) return;
/* Process children - this is a postorder traversal. This also
* tags each child with information about its min and max values
* and whether it's a BST.
*/
findNonBSTs(r.left);
findNonBSTs(r.right);
/* If either subtree isn't a BST, we're done. */
if ((r.left != null && !r.left.isBST) ||
(r.right != null && !r.right.isBST)) {
r.isBST = false;
return;
}
/* Otherwise, both children are BSTs. Check against the min and
* max values of those subtrees to make sure we're in range.
*/
if ((r.left != null && r.left.max >= r.value) ||
(r.right != null && r.right.min <= r.value)) {
r.isBST = false;
return;
}
/* Otherwise, we're a BST, and our min and max value can be
* computed from the left and right children.
*/
r.isBST = true;
r.min = (r.left != null? r.left.min : r.value);
r.max = (r.right != null? r.right.max : r.value);
}
One you've run this pass over the tree, every node will be tagged with whether it's a binary search tree or not. From there, all you have to do is make one more pass over the tree to find the deepest node that's not a BST. I'll leave that as a proverbial exercise for the reader. :-)
Hope this helps!
Related
Given a BST and two integers 'a' and 'b' (a < b), how can we find the number of nodes such that , a < node value < b, in O(log n)?
I know one can easily find the position of a and b in LogN time, but how to count the nodes in between without doing a traversal, which is O(n)?
In each node of your Binary Search Tree, also keep count of the number of values in the tree that are lesser than its value (or, for a different tree design mentioned in the footnote below, the nodes in its left subtree).
Now, first find the node containing the value a. Get the count of values lesser than a which has been stored in this node. This step is Log(n).
Now find the node containing the value b. Get the count of values lesser than b which are stored in this node. This step is also Log(n).
Subtract the two counts and you have the number of nodes between a and b. Total complexity of this search is 2*Log(n) = O(Log(n)).
See this video. The professor explains your question here by using Splay Trees.
Simple solution:
Start checking from the root node
If Node falls within range, then increase it by 1 and check in left and right child recursively
If Node is not within range, then check the values with range. If range values are less than root, then definitely possible scenarios are left subtree. Else check in right subtree
Here is the sample code. Hope it clears.
if (node == null) {
return 0;
} else if (node.data == x && node.data == y) {
return 1;
} else if (node.data >= x && node.data <= y) {
return 1 + nodesWithInRange(node.left, x, y) + nodesWithInRange(node.right, x, y);
} else if (node.data > x && node.data > y) {
return nodesWithInRange(node.left, x, y);
} else {
return nodesWithInRange(node.right, x, y);
}
Time Complexity :- O(logn)+ O(K)
K is the number of elements between x and y.
It's not very ideal but good in case you would not like to modify the Binary Tree nodes definition.
store the inorder traversal of BST in array( it will be sorted). Searching 'a' and 'b' will take log(n) time and get their index and take the difference. this will give the number of node in range 'a' to 'b'.
space complexity O(n)
Idea is simple.
Traverse the BST starting from root.
For every node check if it lies in range.
If it lies in range then count++. And recur for both of its children.
If current node is smaller than low value of range, then recur for right child, else recur for left child.
Time complexity will be O(height + number of nodes in range)..
For your question that why it is not O(n).
Because we are not traversing the whole tree that is the number of nodes in the tree. We are just traversing the required subtree according to the parent's data.
Pseudocode
int findCountInRange(Node root, int a, int b){
if(root==null)
return 0;
if(root->data <= a && root->data >= b)
return 1 + findCountInRange(root->left, a, b)+findCountInRange(root->right, a, b);
else if(root->data < low)
return findCountInRange(root->right, a, b);
else
return findCountInRange(root->left, a, b);
}
A binary tree is given and we have to count the number of binary search trees in it.Every leaf node is a BST
I used the following approach.
for every node in bt check if it is bst or not
The time complexity for above approach is O(n2).How can we do it in an efficient way O(n).
If I understood the question correctly, this can be solved as follows; one would aim at counting the number of nodes which are the root of a binary seach tree. As already remarked, every leaf is trivially the root of a binary search tree. A non-leaf node a is the root of a binary search if and only if the left child of a is a binary search tree, the right child of b is the root of a binary search tree and the maximum over all values under the left child of a is not greater than the value of a and the minumum over all values under the right child of a are larger or equal to the value of a. Evaluation of this property can be done by a recursive evaluation which visits every node exactly once, which results in a linear runtime bound.
A straightforward recursive traversal of the tree returning a few extra pieces of data may help manage it in O(n) time, n being the number of nodes. Below you can find an implementation in Python.
numBST = 0
def traverse(root):
global numBST
leftComplies = True
rightComplies = True
rootRange = [root.val, root.val]
if root.left != None:
leftResult = traverse(root.left)
leftComplies = leftResult[0] and leftResult[1][1] < root.val
rootRange[0] = leftResult[1][0]
if root.right != None:
rightResult = traverse(root.right)
rightComplies = rightResult[0] and rightResult[1][0] > root.val
rootRange[1] = rightResult[1][1]
if leftComplies and rightComplies:
numBST += 1
return (leftComplies and rightComplies, rootRange)
After you run traverse with root of the binary tree as parameter, numBST will contain the number of BSTs within the root.
The function traverse given above recursively traverses the tree root of which is given to it as a parameter. For each node V, if V has a left child L, it recursively traverses the left child and returns some data. Specifically, it returns a list of length 2. The first element in the list is a boolean value indicating whether the left subtree rooted in L is a BST. Second element of the returned list contains another list containing the smallest and the largest value, respectively, in the subtree rooted in L.
For the tree rooted in V to be a BST, the subtree rooted in L must also be a BST AND the largest value in the subtree rooted in L(hence all the values in that subtree) must be smaller than the value stored in V. So after recursively calling traverse for L, we check the returned data to find out if these conditions are satisfied.
Similarly, if there is a right child R of V, it is recursively traversed. To be a BST, the tree rooted in V must also satisfy the condition that the tree rooted in R is a BST AND the smallest node of subtree rooted in R(hence all the nodes in that subtree) contains a value that is larger than the value stored in V.
If all these conditions are satisfied, the tree rooted in V can be considered as a BST and the result, stored in numBST, is updated accordingly. Note that we also update the smallest and largest values stored in V as we recursively traverse its children L and R, and perform the checks mentioned above, so that we pass the correctly updated values to the higher levels of recursion.
Say that each node in a binary search tree x keeps x.successor instead of x.parent. Describe Search, Insert, and Delete algorithms using pseudo code that operate in O(h) where h is the height of the tree.
So far, I believe that the simple binary search tree algorithm for search still applies.
TREE-SEARCH(x, k)
if x == NIL or k == x.key
return x
if k < x.key
return TREE-SEARCH(x.left, k)
else return TREE-SEARCH(x.right, k)
This algorithm should be unaffected by the modification and clearly runs in O(h) time. However, for the Insert and Delete algorithms the modification changes the straightforward way to do these. For example, here is the algorithm for Insertion using a normal binary search tree, T.
TREE-INSERT(T,z)
y = NIL
x = T.root
while x =/= NIL
y = x
if z.key < x.key
x = x.left
else x = x.right
z.p = y
if y == NIL
T.root = z //Tree T was empty
elseif z.key < y.key
y.left = z
else y.right = z
Clearly we cannot use z.p because of our modification to the binary search tree. It has been suggested that a subroutine that finds the parent be implemented, but I fail to see how this can be done.
Searching will be unaffected
Sketch of an answer
While inserting, we want the successor of the newly inserted node. If you know the algorithm for the successor in a normal binary search tree, you'd know that at the leaf node, the successor is the first ancestor whose left child is an ancestor of the node. So while doing normal insert the last node whose left child we follow will be the successor of the node. On this insert path the last node where you follow the right child will be the one whose successor is our current node.
For deletion, if in x.successor you are storing the pointer to the successor node, it becomes simple, just replace the node-to-be-deleted's key with the successor's key and the node.successor pointer with the successors pointer. If you're just storing the key, you need to update the node that had this node as the successor. the candidates are, if it has a left child, the rightmost node in this subtree. If it doesn't, search for the node and the last node where you move to the right child will be the required node.
I don't know how to draw diagrams here, it's simpler with the diagrams.
In BST, according to Programming Interviews Exposed
"Given a node, you can even find the next highest node in O(log(n)) time" Pg 65
A node in BST has right child as the next highest node, then why O(log(n))? Please correct
First answer the question, then negate it
With regard to your comment "A node in BST has right child as the next highest node" (assuming here "next highest" means the next sequential value) - no, it doesn't.
That can be the case if the right child has no left sub-tree, but it's not always so.
The next sequential value (I'm using that term rather than "highest" since the latter could be confused with tree height and "largest" implies a specific (low-to-high) order rather than any order) value comes from one of two places.
First, if the current node has a right child, move to that right child then, as long as you can see a left child, move to it.
In other words, with S and D as the source (current) and destination (next largest):
S
/ \
x x <- This is the node your explanation chose,
/ \ but it's the wrong one in this case.
x x
/
D <----- This is the actual node you want.
\
x
Otherwise (i.e., if the current node has no right child), you need to move up to the parent continuously (so nodes need a right, left and parent pointer) until the node you moved from was a left child. If you get to the root and you still haven't moved up from a left child, your original node was already the highest in the tree.
Graphically that entire process is illustrated with:
x
\
D <- Walking up the tree until you came up
/ \ from a left node.
x x
\
x
/ \
x S
/
x
The pseudo-code for such a function (that covers both those cases) would be:
def getNextNode (node):
# Case 1: right once then left many.
if node.right != NULL:
node = node.right
while node.left != NULL:
node = node.left
return node
# Case 2: up until we come from left.
while node.parent != NULL:
if node.parent.left == node:
return node.parent
node = node.parent
# Case 3: we never came from left, no next node.
return NULL
Since the effort is proportional to the height of the tree (we either go down, or up then down), a balanced tree will have a time complexity of O(log N) since the height has a logN relationship to the number of items.
The book is talking about balanced trees here, because it includes such snippets about them as:
This lookup is a fast operation because you eliminate half the nodes from your search on each iteration.
Lookup is an O(log(n)) operation in a binary search tree.
Lookup is only O(log(n)) if you can guarantee that the number of nodes remaining to be searched will be halved or nearly halved on each iteration.
So, while it admits in that last quote that a BST may not be balanced, the O(log N) property is only for those variants that are.
For non-balanced trees, the complexity (worst case) would be O(n) as you could end up with degenerate trees like:
S D
\ /
x x
\ \
x x
\ \
x x
\ \
x x
/ \
D S
I think, We can find the next highest node by simply finding the Inorder Successor of the node.
Steps -
Firstly, go to the right child of the node.
Then move as left as possible. When you reach the leaf node, print that leaf node as that node is your next highest node compared to the given node.
Here is my pseudo implementation in Java. Hope it helps.
Structure of Node
public Class Node{
int value {get, set};
Node leftChild {get,set};
Node rightChild{get, set};
Node parent{get,set};
}
Function to find next highest node
public Node findNextBiggest(Node n){
Node temp=n;
if(n.getRightChild()==null)
{
while(n.getParent()!=null && temp.getValue()>n.getParent().getValue())
{
n=n.getParent():
}
return n.getParent();
}
else
{
n=n.getRightChild();
while (n.getLeftChild()!=null && n.getLeftChild().getValue()>temp.getValue())
{
n=n.getLeftChild();
}
return n;
}
}
Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some of the cases I think we need to consider:
Case 1 : r1.data < r2.data
2 subproblems to solve:
first, check r1 and r2.left
second, check r1.right and r2
Case 2 : r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3 : r1.data == r2.data
Again, 2 cases to consider here:
(a) current node is part of largest common BST
compute common subtree size rooted at r1 and r2
(b)current node is NOT part of largest common BST
2 subproblems to solve:
first, solve r1.left and r2.left
second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
if (T is null):
return 0 // leaf case
h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
if (first):
// store hashes of T1's nodes in the set H
H.insert(h)
else:
// check for hashes of T2's nodes in the set H containing T1's nodes
if H.contains(h):
ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
return h
H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
Int bestMatch := 0
Int bestMatchCount := 0
For each Node n in tree1 //should iterate breadth-first
//possible optimization: we can skip every node that is part of each subtree we find
Node n2 := BinarySearch(tree2(n.value))
Int matchCount := CountMatches(n, n2)
If (matchCount > bestMatchCount)
bestMatch := n.value
bestMatchCount := matchCount
End
End
Return ExtractSubtree(BinarySearch(tree1(bestMatch)), BinarySearch(tree2(bestMatch)))
End
CountMatches(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return 0
End
Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End
ExtractSubtree(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return nil
End
Node result := New Node(n1.value)
result.left := ExtractSubtree(n1.left, n2.left)
result.right := ExtractSubtree(n1.right, n2.right)
Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know than any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intution: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum subtree of T and S
Is completely contained in T1 and r.
Is completely contained in T2 and r.
Uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If MCS(T1, S) or MCS(T2, S) has the roots of T1 or T2 as roots, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
For each node v in the postorder traversal, in the order they were visited:
Look up the corresponding node in the hash table for the nodes of S.
If no node was found, set the size of the MCS rooted at v to 0.
If a node v' was found in S:
If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(Note that step (4) runs in expected O(n) time, since it visits each node in S exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (S has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either it exists in T or it does not. If it exists in T, then it can't have any children at all, so we execute branch 4.3.1 and say that it has height one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by #jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and T(S) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in T(S), draw an edge between them in B if they have the same value AND
{
(i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
(ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
(iii): nodeS->left is connected to nodeT-> right and nodeS->right == NULL and nodeT->right==NULL
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If (R(S)_next) is nonempty {
(i) swap R(S) and R(S)_next and swap R(T) and R(T)_next.
(ii) Empty the contents of R(S)_next and R(T)_next.
(iii) Return to step (4).
}
When this algorithm terminates, R(S) and T(S) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity is O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and T(S) can be stored in BST’s ordered by value, however I would be interested to see a proof.