Related
First of all, we know that the searching algorithm of a BST looks like this:
// Function to check if given key exists in given BST.
bool search (node* root, int key)
{
if (root == NULL)
return false;
if (root->key == key)
return true;
if (root->key > key)
return search(root->left, key);
else
return search(root->right, key);
}
This searching algorithm is usually applied in a binary search tree. However, when it comes to a general binary tree, such algorithm may give us wrong results.
The following question has trapped me for quite a long time.
Given a general binary tree, count how many nodes in it can be found using the BST searching algorithm above.
Take the binary tree below as an example. The colored nodes are searchable, so the answer is 3.
Suppose the keys in a tree are unique, and we know the values of all the keys.
My thoughts
I have a naive solution in my mind, which is to call the searching algorithm for every possible key, and count how many times the function returns true.
However, I want to reduce the times of calling functions, and also to improve the time complexity. My intuition tells me that recursion can be useful, but I'm not sure.
I think for each node, we need the information about its parent (or ancestors), therefore I have thought about defining the binary tree data structure as follows
struct node {
int key;
struct node* left;
struct node* right;
struct node* parent; // Adding a 'parent' pointer may be useful....
};
I couldn't really figure out an efficient way to tell if a node is searchable, neither can I come up with one to find out the number of searchable nodes. Thus I came here to look for help. A hint will be better than a full solution.
This is my first time asking a question on Stack Overflow. If you think this post needs improvement, feel free to leave a comment. Thanks for reading.
To count the keys that can be found, you should traverse the tree and keep track of the range (low, high) that is implied by the path you took from the root. So when you go left from a node that has key 5, then you should consider that you cannot find any values any more that are greater than 5 (or equal, as that value is already accounted for). If that node's left child node has key 2, and you take a right, then you know that you cannot find any values any more that are less than 2. So your window has at that moment narrowed to (2, 5). If that window becomes empty, than it makes no sense to dig deeper in that direction of the tree.
This is an algorithm you can apply easily using recursion. Here is some code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef struct node {
int key;
struct node* left;
struct node* right;
} Node;
Node *create_node(int key, Node *left, Node *right) {
Node * node = malloc(sizeof(struct node));
node->key = key;
node->left = left;
node->right = right;
return node;
}
int count_bst_nodes_recur(Node *node, int low, int high) {
return node == NULL || node->key <= low || node->key >= high ? 0
: 1 + count_bst_nodes_recur(node->left, low, node->key)
+ count_bst_nodes_recur(node->right, node->key, high);
}
int count_bst_nodes(Node *node) {
return count_bst_nodes_recur(node, -INT_MAX, INT_MAX);
}
int main(void) {
// Create example tree given in the question:
Node *tree = create_node(1,
create_node(2,
create_node(4, NULL, NULL),
create_node(5, NULL, NULL)
),
create_node(6,
NULL,
create_node(7, NULL, NULL)
)
);
printf("Number of searchable keys: %d\n", count_bst_nodes(tree)); // -> 3
return 0;
}
The following property is very important for solving this question.
Any binary tree node which respects the BST properties will always be searchable
using BST Search Algorithm.
Consider the example you had shared.
.
Now, suppose
If you are searching for 1 => Then it will lead to success in the first hit. (Count =1)
For 2, it will search in the right subtree of 1. AT 6, no left subtree was found, hence not found.
For 6, search in the right subtree of 1. Match found! (Count =2)
Similarly for 7, search in the right subtree of 1 followed by a search in 6. Match found! (Count =3)
Now, the counter is incremented when all numbers from 0 to max(nodes) are searched in the list.
Another interesting pattern, you can see is that counter is incremented whenever node follows a BST Node property.
One of the important property is:
Root node's value is greater than all the root's left's values and less than all the root's right values.
For example, Consider node 7: it is to the right of 6 and right of 1. Hence a valid node.
With this in mind, the problem can be decomposed to Number of valid BST Nodes in a tree.
Solving this is quite straightforward. You try to use a Tree traversal from top to bottom and check if it is in increasing order. If it is not, there is no need to check its children. If it is, then add counter by 1 and check its children.
How to print a binary tree level by level?
This is an interview question I got today. Sure enough, using a BFS style would definitely work. However, the follow up question is: How to print the tree using constant memory? (So no queue can be used)
I thought of converting the binary tree into a linked list somehow but have not come up with a concrete solution.
Any suggestions?
Thanks
One way to avoid using extra memory (much extra, anyway) is to manipulate the tree as you traverse it -- as you traverse downward through nodes, you make a copy of its pointer to one of the children, then reverse that to point back to the parent. When you've gotten to the bottom, you follow the links back up to the parents, and as you go you reverse them to point back to the children.
Of course, that's not the whole job, but it's probably the single "trickiest" part.
Extending on what Jerry Coffin said, I had asked a question earlier about doing something similar using in-order traversal. It uses the same logic as explained by him.
Check it out here:
Explain Morris inorder tree traversal without using stacks or recursion
You can do it by repeatedly doing an in-order traversal of the tree while printing only those nodes at the specified level. However, this isn't strictly constant memory since recursion uses the call stack. And it's super inefficient. Like O(n * 2^n) or something.
printLevel = 1;
while(moreLevels) {
moreLevels = printLevel(root, 1, printLevel);
printLevel++;
}
boolean printLevel(Node node, int currentLevel, int printLevel) {
boolean moreLevels = false;
if(node == null) {
return(false);
}
else if(currentLevel == printLevel) {
print the node value;
}
else {
moreLevels |= printLevel(node.leftChild, currentLevel + 1, printLevel);
moreLevels |= printLevel(node.rightChild, currentLevel + 1, printLevel);
}
return(moreLevels);
}
I need to implement two rank queries [rank(k) and select(r)]. But before I can start on this, I need to figure out how the two functions work.
As far as I know, rank(k) returns the rank of a given key k, and select(r) returns the key of a given rank r.
So my questions are:
1.) How do you calculate the rank of a node in an AVL(self balancing BST)?
2.) Is it possible for more than one key to have the same rank? And if so, what woulud select(r) return?
I'm going to include a sample AVL tree which you can refer to if it helps answer the question.
Thanks!
Your question really boils down to: "how is the term 'rank' normally defined with respect to an AVL tree?" (and, possibly, how is 'select' normally defined as well).
At least as I've seen the term used, "rank" means the position among the nodes in the tree -- i.e., how many nodes are to its left. You're typically given a pointer to a node (or perhaps a key value) and you need to count the number of nodes to its left.
"Select" is basically the opposite -- you're given a particular rank, and need to retrieve a pointer to the specified node (or the key for that node).
Two notes: First, since neither of these modifies the tree at all, it makes no real difference what form of balancing is used (e.g., AVL vs. red/black); for that matter a tree with no balancing at all is equivalent as well. Second, if you need to do this frequently, you can improve speed considerably by adding an extra field to each node recording how many nodes are to its left.
Rank is the number of nodes in the Left sub tree plus one, and is calculated for every node. I believe rank is not a concept specific to AVL trees - it can be calculated for any binary tree.
Select is just opposite to rank. A rank is given and you have to return a node matching that rank.
The following code will perform rank calculation:
void InitRank(struct TreeNode *Node)
{
if(!Node)
{
return;
}
else
{ Node->rank = 1 + NumeberofNodeInTree(Node->LChild);
InitRank(Node->LChild);
InitRank(Node->RChild);
}
}
int NumeberofNodeInTree(struct TreeNode *Node)
{
if(!Node)
{
return 0;
}
else
{
return(1+NumeberofNodeInTree(Node->LChild)+NumeberofNodeInTree(Node->RChild));
}
}
Here is the code i wrote and worked fine for AVL Tree to get the rank of a particular value. difference is just you used a node as parameter and i used a key a parameter. you can modify this as your own way. Sample code:
public int rank(int data){
return rank(data,root);
}
private int rank(int data, AVLNode r){
int rank=1;
while(r != null){
if(data<r.data)
r = r.left;
else if(data > r.data){
rank += 1+ countNodes(r.left);
r = r.right;
}
else{
r.rank=rank+countNodes(r.left);
return r.rank;
}
}
return 0;
}
[N.B] If you want to start your rank from 0 then initialize variable rank=0.
you definitely should have implemented the method countNodes() to execute this code.
This question already has answers here:
How do you validate a binary search tree?
(33 answers)
Closed 8 years ago.
Given a simple binary tree, how can we prove that tree is a binary search tree? When we traverse a binary tree, how do we know if the node we are on is the left or right child of its parent? I came up with one solution where i would pass some flag in recursive function call which could keep track of whether the node is left or right child of its parent as well as we need one parent node pointer through which we can compare :-
if(flag == 'L' && node->value < node->parent->value)
then continue recursion;
else{
print "not a binary search tree"
exit;
}
In the same way one if condition is needed for R. Apart from this can you think any other efficient way?
Thanks in advance :)
I would just check:
currentNode.Left.max() < currentNode.Value and currentNode.Left.isBinarySearchTree(). If both are fulfilled, it's a binary search tree.
Edit:
The above does traverse the left tree twice (once for the max() and once for isBinarySearchTree. However, it can be done using just one traversal:
Store the minimum and maximum element in your tree class. Updates, etc. of course can be done in O(1) space and time.
Then, instead of using max(), make a method isInRange(m,M), that checks, whether a (sub)tree contains only elements in the range (m,m+1,...,M).
Define isInRange(m,M) as follows:
bool isInRange(m,M){
if (m < currentNode.Value <= M){
return (currentNode.Left.isInRange(m, currentNode.Value) && currentNode.Right.isInrange(currentNode.Value+1, M));
}
return false;
}
Then, the initial call would be root.isInRange(globalmin, globalmax).
Didn't test it so I don't know if it matters in performance.
The simple answer is to do an in-order depth-first tree traversal and check that the nodes are in order.
Example code (Common Lisp):
(defun binary-search-tree-p (node)
(let ((last-value nil))
(labels ((check (value)
(if (or (null last-value) ; first checked value
(>= value last-value))
(setf last-value value)
nil))
(traverse (node)
(if (null node)
t
(and (traverse (left node)) ; \
(check (value node)) ; > in-order traversal
(traverse (right node)))))) ; /
(traverse node))))
Do you already have a way to iterate/traverse over the elements of the tree? Then you can simply traverse the tree and check that each element is greater than the previous one
You can do a depth-first search over the tree without needing to cache the minimum and maximum values for each subtree in a single pass, but you have to be careful, as comparing the values between and parent and its children is not enough. For example, in the tree:
(10
(7
nil
(20 nil nil)
)
nil
)
The left child (7) of the root (10) satisfy the inequalty (7 <= 10), as does the right child (20) of 7 (20 >= 7). However, the tree is not a BST (Binary Search Tree), because 20 should not be in the left subtree of 10.
To fix this, you may do a traversal passing to extra arguments specifying the valid interval for the subtree.
// The valid interval for the subtree root's value is (lower_bound, upper_bound).
bool IsBST(const node_t* tree, int lower_bound, int upper_bound) {
if (tree == NULL) return true; // An empty subtree is OK.
if (tree->value <= lower_bound) return false; // Value in the root is too low.
if (tree->value >= upper_bound) return false; // Value in the root is too high.
// Values at the left subtree should be strictly lower than tree->value
// and be inside the root valid interval.
if (!IsBST(tree->left, lower_bound, tree->value))
return false;
// Values at the left subtree should be strictly greater than tree->value
// and be inside the root valid interval.
if (!IsBST(tree->right, tree->value, upper_bound))
return false;
// Everything is OK, it is a valid BST.
return true;
}
Notice that by keeping the original valid interval, the function will detect that 20 is invalid at that position, as it is not inside (7, 10). The first call should be done with an infinite interval, like:
IsBST(tree, INT_MIN, INT_MAX);
Hope this helps.
What would be the efficient algorithm to find if two given binary trees are equal - in structure and content?
It's a minor issue, but I'd adapt the earlier solution as follows...
eq(t1, t2) =
t1.data=t2.data && eq(t1.left, t2.left) && eq(t1.right, t2.right)
The reason is that mismatches are likely to be common, and it is better to detect (and stop comparing) early - before recursing further. Of course, I'm assuming a short-circuit && operator here.
I'll also point out that this is glossing over some issues with handling structurally different trees correctly, and with ending the recursion. Basically, there need to be some null checks for t1.left etc. If one tree has a null .left but the other doesn't, you have found a structural difference. If both have null .left, there's no difference, but you have reached a leaf - don't recurse further. Only if both .left values are non-null do you recurse to check the subtree. The same applies, of course, for .right.
You could include checks for e.g. (t1.left == t2.left), but this only makes sense if subtrees can be physically shared (same data structure nodes) for the two trees. This check would be another way to avoid recursing where it is unnecessary - if t1.left and t2.left are the same physical node, you already know that those whole subtrees are identical.
A C implementation might be...
bool tree_compare (const node* t1, const node* t2)
{
// Same node check - also handles both NULL case
if (t1 == t2) return true;
// Gone past leaf on one side check
if ((t1 == NULL) || (t2 == NULL)) return false;
// Do data checks and recursion of tree
return ((t1->data == t2->data) && tree_compare (t1->left, t2->left )
&& tree_compare (t1->right, t2->right));
}
EDIT In response to a comment...
The running time for a full tree comparison using this is most simply stated as O(n) where n is kinda the size of a tree. If you're willing to accept a more complex bound you can get a smaller one such as O(minimum(n1, n2)) where n1 and n2 are the sizes of the trees.
The explanation is basically that the recursive call is only made (at most) once for each node in the left tree, and only made (at most) once for each node in the right tree. As the function itself (excluding recursions) only specifies at most a constant amount of work (there are no loops), the work including all recursive calls can only be as much as the size of the smaller tree times that constant.
You could analyse further to get a more complex but smaller bound using the idea of the intersection of the trees, but big O just gives an upper bound - not necessarily the lowest possible upper bound. It's probably not worthwhile doing that analysis unless you're trying to build a bigger algorithm/data structure with this as a component, and as a result you know that some property will always apply to those trees which may allow you a tighter bound for the larger algorithm.
One way to form a tigher bound is to consider the sets of paths to nodes in both trees. Each step is either an L (left subtree) or an R (right subtree). So the root is specified with an empty path. The right child of the left child of the root is "LR". Define a function "paths (T)" (mathematically - not part of the program) to represent the set of valid paths into a tree - one path for every node.
So we might have...
paths(t1) = { "", "L", "LR", "R", "RL" }
paths(t2) = { "", "L", "LL", "R", "RR" }
The same path specifications apply to both trees. And each recursion always follows the same left/right link for both trees. So the recursion visits the paths in the itersection of these sets, and the tightest bound we can specify using this is the cardinality of that intersection (still with the constant bound on work per recursive call).
For the tree structures above, we do recursions for the following paths...
paths(t1) intersection paths(t2) = { "", "L", "R" }
So our work in this case is bounded to at most three times the maximum cost of non-recursive work in the tree_compare function.
This is normally an unnecessary amount of detail, but clearly the intersection of the path-sets is at most as large as the number of nodes in the smallest original tree. And whether the n in O(n) refers to the number of nodes in one original tree or to the sum of the nodes in both, this is clearly no smaller than either the minimum or our intersection. Therefore O(n) isn't such a tight bound, but it's still a valid upper bound, even if we're a bit vague which size we're talking about.
Modulo stack overflow, something like
eq(t1, t2) =
eq(t1.left, t2.left) && t1.data=t2.data && eq(t1.right, t2.right)
(This generalizes to an equality predicate for all tree-structured algebraic data types - for any piece of structured data, check if each of its sub-parts are equal to each of the other one's sub-parts.)
We can also do any of the two traversals (pre-order, post-order or in-order) and then compare the results of both the trees. If they are same, we can be sure of their equivalence.
A more general term for what you are probably trying to accomplish is graph isomorphism. There are some algorithms to do this on that page.
Since it's a proven fact that - it is possible to recreate a binary tree as long as we have the following:
The sequence of nodes that are encountered in an In-Order Traversal.
The sequence of nodes that are encountered in a Pre-Order OR Post-Order Traversal
If two binary trees have the same in-order and [pre-order OR post-order] sequence, then they should be equal both structurally and in terms of values.
Each traversal is an O(n) operation. The traversals are done 4 times in total and the results from the same-type of traversal is compared.
O(n) * 4 + 2 => O(n)
Hence, the total order of time-complexity would be O(n)
I would write it as follows. The following code will work in most functional language, and even in python if your datatypes are hashable (e.g. not dictionaries or lists):
topological equality (same in structure, i.e. Tree(1,Tree(2,3))==Tree(Tree(2,3),1)):
tree1==tree2 means set(tree1.children)==set(tree2.children)
ordered equality:
tree1==tree2 means tree1.children==tree2.children
(Tree.children is an ordered list of children)
You don't need to handle the base cases (leaves), because equality has been defined for them already.
bool identical(node* root1,node* root2){
if(root1 == NULL && root2 == NULL)
return true;
if(root1==NULL && root2!=NULL || root1!=NULL && root2 == NULL)
return false;
if(root1->data == root2->data){
bool lIdetical = identical(root1->left,root2->left);
if(!lIdentical)
return false;
bool rIdentical = identical(root1->right,root2->identical);
return lIdentical && rIdentical;
}
else{
printf("data1:%d vs data2:%d",root1->data,root2->data);
return false;
}
}
I do not know if this is the most effecient but I think this works.