I'm attempting to write an algorithm that validates a binary search tree, to answer the following LeetCode question: https://leetcode.com/problems/validate-binary-search-tree/. I've opted for a recursive approach and have written the following in Python 3:
def isValidBST(self, root: Optional[TreeNode]) -> bool:
    if not root:
        return True
    left = self.isValidBST(root.left)
    right = self.isValidBST(root.right)
    if left and right:
        if (root.left and root.val <= root.left.val) or (root.right and root.val >= root.right.val):
            return False
        return True
    return False
The algorithm above works up to the 71st test case provided by the leetcode platform. However, it fails on the following input:
[5,4,6,null,null,3,7]
The expected output is False (i.e. not a binary search tree). However, my algorithm outputs True. I've consulted the question description and the diagrams provided multiple times and I believe my output to be correct. With that in mind, is there something I'm missing? Or is the platform incorrect?
The input [5,4,6,null,null,3,7] has the following tree representation:

        5
       / \
      4   6
         / \
        3   7
This is clearly not a valid binary search tree, since the node 3 is less than its grandparent 5.
In a binary search tree, every node in the right subtree of a node (including nodes further down, not just the immediate child) must be greater than that node (or greater or equal, depending on your definition). The same goes for left subtrees. Note that this LeetCode problem specifically states that keys must be strictly less and strictly greater, so equal keys are not valid. Think about it this way: you need to be able to do a binary search in a BST. When you go right, you expect all the nodes there to be greater than the parent. If a node less than the parent is found there, the ordering is broken and you can't do a binary search; you would have to search linearly. Binary searching for 3 in this example would return false, while it's clearly there. So it isn't a valid binary search tree.
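To make that concrete, here is a small Python sketch (TreeNode is a minimal stand-in for LeetCode's class, and the search routine is my own illustration): an ordinary binary search on the tree from the question misses the 3 even though it is present.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def bst_search(root, target):
    # Ordinary binary search: relies on the BST ordering to prune one side.
    while root:
        if target == root.val:
            return True
        root = root.left if target < root.val else root.right
    return False

# The tree [5,4,6,null,null,3,7] from the question.
root = TreeNode(5, TreeNode(4), TreeNode(6, TreeNode(3), TreeNode(7)))

print(bst_search(root, 3))  # False, even though 3 is in the tree
```

The search goes 5 -> left (3 < 5) -> 4 -> left (3 < 4) -> dead end, never visiting the subtree where 3 actually lives.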
Your algorithm only checks whether the immediate left and right children respect this condition.
To correct it, pass down the minimum and maximum allowed values to the recursive function: when you go left, set the maximum to the current node's value; when you go right, set the minimum to the current node's value. This ensures, for example, that nodes anywhere in the left subtree of a node are never greater than that node.
A possible implementation is as follows:
class Solution:
    def isValidBST(self, root: TreeNode) -> bool:
        return self.isValidBSTHelper(root, None, None)

    def isValidBSTHelper(self, root: TreeNode, min_val: int, max_val: int) -> bool:
        if root is None:
            return True
        if (min_val is not None and root.val <= min_val) or (max_val is not None and root.val >= max_val):
            return False
        return self.isValidBSTHelper(root.left, min_val, root.val) \
            and self.isValidBSTHelper(root.right, root.val, max_val)
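As a quick sanity check, this sketch (with a minimal TreeNode stand-in, since LeetCode normally supplies the class) runs the corrected solution on the failing input and on a fixed-up variant:

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

class Solution:
    def isValidBST(self, root):
        return self.isValidBSTHelper(root, None, None)

    def isValidBSTHelper(self, root, min_val, max_val):
        if root is None:
            return True
        if (min_val is not None and root.val <= min_val) or \
           (max_val is not None and root.val >= max_val):
            return False
        return self.isValidBSTHelper(root.left, min_val, root.val) \
           and self.isValidBSTHelper(root.right, root.val, max_val)

# [5,4,6,null,null,3,7]: the 3 sits in the right subtree of 5.
bad = TreeNode(5, TreeNode(4), TreeNode(6, TreeNode(3), TreeNode(7)))
good = TreeNode(5, TreeNode(4), TreeNode(6, None, TreeNode(7)))

print(Solution().isValidBST(bad), Solution().isValidBST(good))  # False True
```

The 3 is rejected because, once we descend into the right subtree of 5, the minimum bound 5 travels down with the recursion.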
I am looking at LeetCode problem 98. Validate Binary Search Tree:
Given the root of a binary tree, determine if it is a valid binary search tree (BST).
A valid BST is defined as follows:
The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.
What is the problem with the code below, which tries to validate the BST property with a preorder traversal?
# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution(object):
    def isValidBST(self, root):
        """
        :type root: TreeNode
        :rtype: bool
        """
        def preorder(root):
            if root.left != None:
                if root.left < root.val:
                    preorder(root.left)
                else:
                    return False
            if root.right != None:
                if root.right > root.val:
                    preorder(root.right)
                else:
                    return False
        t = preorder(root)
        return t != False
It is returning False for the test case where root = [2,1,3].
You need to traverse the right-sub-tree only when the left-sub-tree is a valid one.
def preorder(root):
    if root.left != None:
        if root.left < root.val:
            preorder(root.left)
        else:
            return False
    if root.right != None (and left-sub-tree is valid):
        if root.right > root.val:
            preorder(root.right)
        else:
            return False
Note: I don't know Python; read the parenthesised condition on the right-subtree check as a comment describing what needs to happen.
Several issues:
root.left < root.val is comparing a node with an integer. This should be root.left.val < root.val. The same goes for the right-side check.
Although preorder returns False when there is a violation, the caller that makes the recursive call ignores this return value and happily continues. This is wrong. It should stop the traversal when the recursive call returns False, and itself return False to its caller.
Not the main issue, but as preorder returns False, it had better also return True in the remaining cases, so you can be sure that preorder returns a boolean, and don't have to treat None or compare the return value explicitly with False.
The algorithm only checks whether a direct child of a node has a value that is not conflicting with the value of the parent, but that is not enough for validating a BST. All the values in the left subtree must be less than parent's value, not just the left child. So even if the above problems are fixed, this algorithm will wrongly say this BST is valid:
    4
   /
  1
   \
    9   <-- violation (because of 4)
To solve the last point, you need to revise the algorithm: realise that when you are deep in a BST, there is a window of values that a subtree in a valid BST can have: there is a lower limit and an upper limit. In the above example the right child of the node 1 could only have a value in the range (1, 4).
Here is an implementation as a spoiler:
class Solution(object):
    def isValidBST(self, root):
        def preorder(root, low, high):
            return not root or (
                low < root.val < high and
                preorder(root.left, low, root.val) and
                preorder(root.right, root.val, high)
            )
        return preorder(root, -float("inf"), float("inf"))
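For example, with a minimal TreeNode stand-in, the spoiler solution correctly rejects the 4/1/9 tree above, which a child-only check would accept:

```python
class TreeNode(object):
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

class Solution(object):
    def isValidBST(self, root):
        def preorder(root, low, high):
            return not root or (
                low < root.val < high and
                preorder(root.left, low, root.val) and
                preorder(root.right, root.val, high)
            )
        return preorder(root, -float("inf"), float("inf"))

# 4 with left child 1, whose right child is 9: 9 violates the (1, 4) window.
tree = TreeNode(4, TreeNode(1, None, TreeNode(9)))

print(Solution().isValidBST(tree))  # False
```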
I was solving the following job interview question and solved most of it but failed at the last requirement.
Q: Build a data structure which supports the following functions:
Init - Initialise an empty DS. O(1) time complexity.
SetPositiveInDay(d,x) - Record in the DS that on day d exactly x new people were infected with covid-19. O(log n) time complexity.
WorstBefore(d) - Among the days inserted into the DS that are smaller than d, return the last one with more newly infected people than day d. O(log n) time complexity.
For example:
Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10
Important note: you can't assume that days will be entered in order, nor that there won't be "gaps" between days (some days may not be saved in the DS while later ones are).
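To pin down the intended semantics before worrying about the O(log n) bounds, here is a naive dictionary-based reference in Python (my own sketch, not part of the question); it reproduces the example above but ignores the complexity requirements:

```python
class NaiveCovidDS:
    """O(n) reference implementation: correct answers, wrong complexity."""

    def __init__(self):                      # Init
        self.sick_on_day = {}

    def set_positive_in_day(self, d, x):     # SetPositiveInDay
        self.sick_on_day[d] = x

    def worst_before(self, d):               # WorstBefore
        x = self.sick_on_day[d]
        worse = [day for day, s in self.sick_on_day.items()
                 if day < d and s > x]
        return max(worse, default=None)

ds = NaiveCovidDS()
for d, x in [(1, 10), (2, 20), (3, 15), (5, 17), (23, 180), (8, 13), (13, 18)]:
    ds.set_positive_in_day(d, x)
print(ds.worst_before(13))  # 2
ds.set_positive_in_day(10, 19)
print(ds.worst_before(13))  # 10
```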
What I did?
I used AVL tree (I could use 2-3 tree too).
For each node I have:
Sick - Number of new infected people in that day.
maxLeftSick - Max number of infected people in the left child's subtree.
maxRightSick - Max number of infected people in the right child's subtree.
When inserting a new node, I made sure the stored data isn't lost during rotations, and for each node from the new one up to the root I did:
But I wasn't successful implementing WorstBefore(d).
Where to search?
First you need to find the node node corresponding to d in the tree ordered by days. Let x = Sick(node). This can be done in O(log n).
If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.
Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying this property (this takes O(log n)):
nextPredecessor is smaller than node, and either
Sick(nextPredecessor) > x (case 1), or
maxLeftSick(nextPredecessor) > x (case 2).
If no such node exists, we give up. In case 1, just return nextPredecessor since that is the best solution.
In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.
Note that there is no need to search in the right subtree of nextPredecessor since the only nodes that are smaller than node in that subtree would be the left subtree of node itself, and we have already excluded that.
Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
How to search?
OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q that is worse than an infection number x is simple using the maxLeftSick and maxRightSick information:
If maxRightSick(q) > x, search in the right subtree of q.
Otherwise, if Sick(q) > x, return Day(q): the right subtree contains no worse day, and q itself comes after every day in its left subtree.
Otherwise, if maxLeftSick(q) > x, search in the left subtree of q.
Otherwise there is no solution within the subtree q.
We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the right most node, i.e. the one with the largest day.
It is easy to see that this algorithm runs in O(log n) where n is the total number of nodes since the number of steps is bounded by the height of the tree.
Pseudocode
Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):
// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
    node = find(d);

    // try to find the solution in the left subtree
    if (maxLeftSick(node) > Sick(node)) {
        return FindLastWorseThan(node -> left, Sick(node));
    }

    // move up towards root until we find the first node
    // that is smaller than `node` and such that
    // Sick(nextPredecessor) > Sick(node) or
    // maxLeftSick(nextPredecessor) > Sick(node).
    nextPredecessor = findNextPredecessor(node);
    if (nextPredecessor == null) return -1;

    // Case 1
    if (Sick(nextPredecessor) > Sick(node)) return Day(nextPredecessor);

    // Case 2: maxLeftSick(nextPredecessor) > Sick(node)
    return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}

// Finds the latest day within the given subtree with root "q" where
// the infection number is larger than x. Runs in O(log(size(q))).
int FindLastWorseThan(Node q, int x) {
    if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
    if (Sick(q) > x) return Day(q);
    if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
    return -1;
}
First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.
I would just suggest a cosmetic change: store the maximum sick value of a node's own subtree in the node itself. Instead of keeping two similar pieces of information (maxLeftSick and maxRightSick) in one node instance, move them to the child nodes: your node.maxLeftSick becomes node.left.maxSick, and your node.maxRightSick becomes node.right.maxSick. This is of course not done when that child does not exist, but then we don't need that information either. In your structure, maxLeftSick would be 0 when left is not defined; in my proposed structure, you would not have that value at all, and the 0 would follow naturally from the fact that there is no left child. In my proposal, the root node carries a maxSick value that yours does not have, namely the maximum of your root.maxLeftSick, root.maxRightSick, and the root's own sick value. This information would not really be used, but it is there to make the structure consistent throughout the tree.
So you would store just one maxSick per node, which includes the node's own sick value in the maximum. The processing you do during rotations will need to change accordingly, but will not become more complex.
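A Python sketch of this proposed node layout and its maxSick maintenance (names are my own; the 0 floor assumes infection counts are never negative):

```python
class Node:
    def __init__(self, day, sick):
        self.day = day
        self.sick = sick
        self.left = None
        self.right = None
        self.maxSick = sick  # max over this whole subtree, own value included

    def update_max_sick(self):
        # Recompute from the children; call this on every node along the
        # path from a changed node up to the root (and after rotations).
        self.maxSick = max(self.sick,
                           self.left.maxSick if self.left else 0,
                           self.right.maxSick if self.right else 0)

root = Node(5, 17)
root.left = Node(2, 20)
root.update_max_sick()
print(root.maxSick)  # 20
```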
I will assume that your AVL tree is single-threaded, i.e. you don't keep track of parent-pointers. So create a find method which will return the path to the node to be found. For instance, in Python syntax, it could look like this:
def find(self, day):
    node = self.root
    path = []  # an array of nodes
    while node:
        path.append(node)
        if node.day == day:  # bingo
            return path
        if day < node.day:
            node = node.left
        else:
            node = node.right
    return []  # day not found
Then the worstBefore method could look like this:
def worstBefore(self, day):
    path = self.find(day)
    if not path:
        return  # day not found
    # get number of sick people on that day:
    sick = path[-1].sick
    # look for a more recent day with a greater number of sick
    while path:
        node = path.pop()  # walk upward, starting with the found node
        if node.day < day and node.sick > sick:
            return node.day
        # only look into left subtrees that lie entirely before `day`
        if node.day <= day and node.left and node.left.maxSick > sick:
            # we will find the result in this subtree
            node = node.left
            while True:
                if node.right and node.right.maxSick > sick:
                    node = node.right
                elif node.sick > sick:  # bingo
                    return node.day
                else:
                    node = node.left
So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.
If along that path you find a left child whose maxSick is greater, then you know that the targeted node must be in that subtree. It is then a matter of walking down that subtree in a controlled way, choosing the right child when it still has a greater maxSick; otherwise checking the current node's sick value and returning its day if that value is greater; otherwise going left, and repeating.
As long as there is no such left subtree, keep going up along the path. If a parent is itself a match, return it (make sure to verify that its day number is less than the queried day). Keep checking for left subtrees that have a larger maxSick.
This runs in O(log n) because you first walk zero or more steps upward and then zero or more steps downward (in a left subtree).
You can see your example scenario run on repl.it. There I focused on this question, and didn't implement the rotations.
What does this algorithm return? Isn't it 1, since left[x] doesn't have children?
NIL = no children
left[x] = left child
right[x] = right child
LEAVES is the name of the algorithm
if (x = nil) then
    return 0
else if left[x] = nil then
    return 1
else
    return Leaves(left[x]) + Leaves(right[x])
end if
I am assuming you want the algorithm to return the number of leaves in the tree. In that case, if a node is equal to NIL then the node is a leaf. Thus the algorithm should be:
def Leaves(x):
    if(NotANode(x)):
        return 0
    if(x==nil):
        return 1
    return Leaves(left[x]) + Leaves(right[x])
NOTE: I like to use Python-like formatting for pseudocode/descriptions of algorithms; hope you do not mind :) Also note that you did not mention what left[x] is if x is missing only the right child (or vice versa). I assumed that it returns a non-node value, and used the made-up function NotANode to account for it.
Possible Explanation
Based on how you asked this question, I am guessing that this algorithm was given to you and you do not think that it is correct. Is it possible that you do not understand its purpose?
Normally in programming, nil or NULL is used to represent an "empty" object. So the first line, if x = nil, checks to see if x is really a node! If it is not, then it returns 0, which (assuming the purpose of Leaves is to count the number of leaves in the tree) is correct behavior.
Next we check if the left child of x exists. If it does not, then we return 1. The only way this line makes sense is if we can assume that every node fills its left child first, as then not having a left child means not having a right child either, and so the node is a leaf (hence the result of 1).
Finally, if the node does have a left child then it is not a leaf, and so we must look to its child subtrees and count their leaves for the answer.
Review the origin of this algorithm to make sure you understand the constraints of the problem and let me know if you need more help :)
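For comparison, under the usual convention, where a leaf is a node with no children at all, a Python version would look like this (Node is a minimal stand-in of my own):

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def leaves(x):
    # A leaf is a node with no children; an empty tree has no leaves.
    if x is None:
        return 0
    if x.left is None and x.right is None:
        return 1
    return leaves(x.left) + leaves(x.right)

# root has two children; the left child has a single child of its own,
# so the leaves are that grandchild and the right child.
root = Node(Node(Node()), Node())
print(leaves(root))  # 2
```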
Given a simple binary tree, how can we verify that the tree is a binary search tree? When we traverse a binary tree, how do we know whether the node we are on is the left or right child of its parent? I came up with one solution where I would pass a flag in the recursive call to keep track of whether the node is the left or right child of its parent, along with a parent pointer to compare against:
if(flag == 'L' && node->value < node->parent->value)
    then continue recursion;
else{
    print "not a binary search tree"
    exit;
}
In the same way, an if condition is needed for R. Apart from this, can you think of any other efficient way?
Thanks in advance :)
I would just check:
currentNode.Left.max() < currentNode.Value and currentNode.Left.isBinarySearchTree(), and symmetrically currentNode.Right.min() > currentNode.Value and currentNode.Right.isBinarySearchTree(). If these are fulfilled, it's a binary search tree.
Edit:
The above traverses the left tree twice (once for max() and once for isBinarySearchTree()). However, it can be done using just one traversal:
Store the minimum and maximum element in your tree class. Updates, etc. of course can be done in O(1) space and time.
Then, instead of using max(), make a method isInRange(m,M) that checks whether a (sub)tree contains only elements in the range (m, m+1, ..., M).
Define isInRange(m,M) as follows:
bool isInRange(m,M){
    if (m < currentNode.Value <= M){
        return (currentNode.Left.isInRange(m, currentNode.Value) &&
                currentNode.Right.isInRange(currentNode.Value+1, M));
    }
    return false;
}
Then, the initial call would be root.isInRange(globalmin, globalmax).
I didn't test it, so I don't know whether it matters in performance.
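A Python sketch of the same idea (my own translation: it adds the missing empty-child base case, uses infinities as the initial bounds, and tightens the child bounds so that strictly ordered integer keys like 10 and 11 pass):

```python
import math

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def is_in_range(node, m, M):
    # True iff every value in the subtree lies in (m, m+1, ..., M),
    # i.e. m < value <= M, assuming integer keys.
    if node is None:          # empty subtree: nothing out of range
        return True
    if m < node.value <= M:
        return (is_in_range(node.left, m, node.value - 1) and
                is_in_range(node.right, node.value, M))
    return False

def is_bst(root):
    return is_in_range(root, -math.inf, math.inf)

bad = Node(10, Node(7, None, Node(20)))   # 20 hides in the left subtree of 10
good = Node(10, Node(7), Node(20))

print(is_bst(bad), is_bst(good))  # False True
```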
The simple answer is to do an in-order depth-first tree traversal and check that the nodes are in order.
Example code (Common Lisp):
(defun binary-search-tree-p (node)
  (let ((last-value nil))
    (labels ((check (value)
               (if (or (null last-value)      ; first checked value
                       (>= value last-value))
                   (setf last-value value)
                   nil))
             (traverse (node)
               (if (null node)
                   t
                   (and (traverse (left node))       ; \
                        (check (value node))         ;  > in-order traversal
                        (traverse (right node))))))  ; /
      (traverse node))))
Do you already have a way to iterate/traverse over the elements of the tree? Then you can simply traverse the tree and check that each element is greater than the previous one.
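In Python, that traverse-and-check can be written compactly with a generator (a sketch of my own, assuming nodes expose value/left/right and keys are distinct):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def is_bst(root):
    def inorder(node):
        if node is not None:
            yield from inorder(node.left)
            yield node.value
            yield from inorder(node.right)
    values = list(inorder(root))
    # A strictly increasing in-order sequence means a valid BST
    # (for distinct keys).
    return all(a < b for a, b in zip(values, values[1:]))

valid = Node(2, Node(1), Node(3))
invalid = Node(10, Node(7, None, Node(20)))  # in-order: 7, 20, 10

print(is_bst(valid), is_bst(invalid))  # True False
```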
You can do a depth-first search over the tree in a single pass, without caching the minimum and maximum values for each subtree, but you have to be careful: comparing the values between a parent and its children is not enough. For example, in the tree:
(10
  (7
    nil
    (20 nil nil)
  )
  nil
)
The left child (7) of the root (10) satisfies the inequality (7 <= 10), as does the right child (20) of 7 (20 >= 7). However, the tree is not a BST (binary search tree), because 20 should not be in the left subtree of 10.
To fix this, you may do a traversal passing two extra arguments specifying the valid interval for the subtree.
// The valid interval for the subtree root's value is (lower_bound, upper_bound).
bool IsBST(const node_t* tree, int lower_bound, int upper_bound) {
    if (tree == NULL) return true;  // An empty subtree is OK.
    if (tree->value <= lower_bound) return false;  // Value in the root is too low.
    if (tree->value >= upper_bound) return false;  // Value in the root is too high.
    // Values in the left subtree should be strictly lower than tree->value
    // and be inside the root's valid interval.
    if (!IsBST(tree->left, lower_bound, tree->value))
        return false;
    // Values in the right subtree should be strictly greater than tree->value
    // and be inside the root's valid interval.
    if (!IsBST(tree->right, tree->value, upper_bound))
        return false;
    // Everything is OK, it is a valid BST.
    return true;
}
Notice that by passing down the valid interval, the function will detect that 20 is invalid at that position, as it is not inside (7, 10). The first call should be done with an infinite interval, like:
IsBST(tree, INT_MIN, INT_MAX);
Hope this helps.
What would be the efficient algorithm to find if two given binary trees are equal - in structure and content?
It's a minor issue, but I'd adapt the earlier solution as follows...
eq(t1, t2) =
t1.data=t2.data && eq(t1.left, t2.left) && eq(t1.right, t2.right)
The reason is that mismatches are likely to be common, and it is better to detect (and stop comparing) early - before recursing further. Of course, I'm assuming a short-circuit && operator here.
I'll also point out that this is glossing over some issues with handling structurally different trees correctly, and with ending the recursion. Basically, there need to be some null checks for t1.left etc. If one tree has a null .left but the other doesn't, you have found a structural difference. If both have null .left, there's no difference, but you have reached a leaf - don't recurse further. Only if both .left values are non-null do you recurse to check the subtree. The same applies, of course, for .right.
You could include checks for e.g. (t1.left == t2.left), but this only makes sense if subtrees can be physically shared (same data structure nodes) for the two trees. This check would be another way to avoid recursing where it is unnecessary - if t1.left and t2.left are the same physical node, you already know that those whole subtrees are identical.
A C implementation might be...
bool tree_compare (const node* t1, const node* t2)
{
    // Same node check - also handles both NULL case
    if (t1 == t2) return true;

    // Gone past leaf on one side check
    if ((t1 == NULL) || (t2 == NULL)) return false;

    // Do data checks and recursion of tree
    return ((t1->data == t2->data) && tree_compare (t1->left, t2->left)
                                   && tree_compare (t1->right, t2->right));
}
EDIT In response to a comment...
The running time for a full tree comparison using this is most simply stated as O(n), where n is loosely the size of a tree. If you're willing to accept a more complex bound, you can get a smaller one, such as O(minimum(n1, n2)) where n1 and n2 are the sizes of the trees.
The explanation is basically that the recursive call is made (at most) once for each node in the first tree, and (at most) once for each node in the second tree. As the function itself (excluding recursions) does at most a constant amount of work (there are no loops), the work including all recursive calls can only be as much as the size of the smaller tree times that constant.
You could analyse further to get a more complex but smaller bound using the idea of the intersection of the trees, but big O just gives an upper bound - not necessarily the lowest possible upper bound. It's probably not worthwhile doing that analysis unless you're trying to build a bigger algorithm/data structure with this as a component, and as a result you know that some property will always apply to those trees which may allow you a tighter bound for the larger algorithm.
One way to form a tighter bound is to consider the sets of paths to nodes in both trees. Each step is either an L (left subtree) or an R (right subtree). So the root is specified by the empty path. The right child of the left child of the root is "LR". Define a function paths(T) (mathematically, not part of the program) to represent the set of valid paths into a tree: one path for every node.
So we might have...
paths(t1) = { "", "L", "LR", "R", "RL" }
paths(t2) = { "", "L", "LL", "R", "RR" }
The same path specifications apply to both trees, and each recursion always follows the same left/right link in both trees. So the recursion visits the paths in the intersection of these sets, and the tightest bound we can state this way is the cardinality of that intersection (still with a constant bound on the work per recursive call).
For the tree structures above, we do recursions for the following paths...
paths(t1) intersection paths(t2) = { "", "L", "R" }
So our work in this case is bounded to at most three times the maximum cost of non-recursive work in the tree_compare function.
This is normally an unnecessary amount of detail, but clearly the intersection of the path-sets is at most as large as the number of nodes in the smaller original tree. And whether the n in O(n) refers to the number of nodes in one original tree or to the sum of the nodes in both, it is clearly no smaller than either the minimum or our intersection. Therefore O(n) isn't such a tight bound, but it's still a valid upper bound, even if we're a bit vague about which size we're talking about.
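The path-set argument can be checked mechanically. This Python sketch (my own illustration) computes paths(t) for the two example shapes and their intersection:

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def paths(t):
    # One path string per node: "" for the root, then "L"/"R" steps.
    if t is None:
        return set()
    return ({""} |
            {"L" + p for p in paths(t.left)} |
            {"R" + p for p in paths(t.right)})

t1 = Node(Node(None, Node()), Node(Node(), None))  # "", "L", "LR", "R", "RL"
t2 = Node(Node(Node(), None), Node(None, Node()))  # "", "L", "LL", "R", "RR"

print(sorted(paths(t1) & paths(t2)))  # ['', 'L', 'R']
```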
Modulo stack overflow, something like
eq(t1, t2) =
eq(t1.left, t2.left) && t1.data=t2.data && eq(t1.right, t2.right)
(This generalizes to an equality predicate for all tree-structured algebraic data types - for any piece of structured data, check if each of its sub-parts are equal to each of the other one's sub-parts.)
We can also do two of the traversals (in-order plus either pre-order or post-order) on both trees and compare the results. If they are the same, we can be sure of their equivalence.
A more general term for what you are probably trying to accomplish is graph isomorphism. There are some algorithms to do this on that page.
It is a proven fact that a binary tree (with distinct node values) can be recreated as long as we have the following:
The sequence of nodes encountered in an in-order traversal, and
the sequence of nodes encountered in a pre-order OR post-order traversal.
So if two binary trees have the same in-order and [pre-order OR post-order] sequences, they must be equal both structurally and in terms of values.
Each traversal is an O(n) operation. Four traversals are done in total, and the results of the same type of traversal are compared:
4 * O(n) traversals + 2 * O(n) comparisons => O(n)
Hence, the total time complexity is O(n).
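A Python sketch of this approach (my own; it assumes all node values are distinct, which the reconstruction argument requires):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder(t):
    return inorder(t.left) + [t.value] + inorder(t.right) if t else []

def preorder(t):
    return [t.value] + preorder(t.left) + preorder(t.right) if t else []

def trees_equal(t1, t2):
    # With distinct values, in-order + pre-order determine the tree uniquely.
    return inorder(t1) == inorder(t2) and preorder(t1) == preorder(t2)

a = Node(2, Node(1), Node(3))
b = Node(2, Node(1), Node(3))
c = Node(3, Node(2, Node(1)))  # same in-order [1,2,3], different shape

print(trees_equal(a, b), trees_equal(a, c))  # True False
```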
I would write it as follows. The following works in most functional languages, and even in Python if your data types are hashable (e.g. not dictionaries or lists):
topological equality (same in structure, i.e. Tree(1,Tree(2,3)) == Tree(Tree(2,3),1)):
    tree1 == tree2 means set(tree1.children) == set(tree2.children)
ordered equality:
    tree1 == tree2 means tree1.children == tree2.children
(Tree.children is an ordered list of children)
You don't need to handle the base cases (leaves), because equality has been defined for them already.
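In Python, for instance, a frozen dataclass gives the ordered equality (and hashability, useful for the set-based topological variant) for free; this is my own illustration of the idea:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tree:
    value: int
    children: tuple = ()  # ordered tuple of child Trees; empty for leaves

# The generated __eq__ compares value and children field by field,
# recursing through tuple equality, so leaves need no special casing.
t1 = Tree(1, (Tree(2), Tree(3)))
t2 = Tree(1, (Tree(2), Tree(3)))
t3 = Tree(1, (Tree(3), Tree(2)))  # same children, different order

print(t1 == t2, t1 == t3)  # True False
```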
bool identical(node* root1, node* root2){
    if(root1 == NULL && root2 == NULL)
        return true;
    if(root1 == NULL && root2 != NULL || root1 != NULL && root2 == NULL)
        return false;
    if(root1->data == root2->data){
        bool lIdentical = identical(root1->left, root2->left);
        if(!lIdentical)
            return false;
        bool rIdentical = identical(root1->right, root2->right);
        return lIdentical && rIdentical;
    }
    else{
        printf("data1:%d vs data2:%d", root1->data, root2->data);
        return false;
    }
}
I do not know if this is the most efficient, but I think it works.