Problems with in-order tree traversal - data-structures

I found this picture at Wikipedia:
According to the text underneath the picture, the in-order traversal is: A, B, C, D, E, F, G, H, I.
I understand the order A-F, but what I don't understand is the order of the last three nodes. Isn't it supposed to be H, I, G? I thought you were supposed to list internal nodes on your second encounter and leaves on your first?
EDIT: Would my order be correct if the tree in the picture were a general tree and not a binary tree? (So that G would have only one child, instead of a right child and a null left child.)

Unfortunately Wikipedia is right!
Detailed Algorithm
inorder_traversal(node)
{
    if (node != NULL)
    {
        inorder_traversal(node->left);
        print(node->data);   // you overlooked this statement on reaching G
        inorder_traversal(node->right);
    }
}
Where are you missing the point/going the wrong way?
On reaching G from F, you tried to enter the left child of G, which is NULL (in short, the call inorder_traversal(node->left) returned immediately, because the algorithm checks if (node != NULL) and G's left child is NULL). Control then comes back to G, and at this point you overlooked the print statement, which is what prints G. After skipping that print statement you jumped to the next statement, which visits node->right; it is not NULL, which took you to I. I is not printed immediately either: first its left child is visited. The left child H exists, so H is printed, then control returns to the parent I, which is printed, and I's right child is NULL. Hence Wikipedia is right.
Tip for testing in-order traversal: build a binary search tree that accepts integers (or numbers in general) as input data; when you traverse it in-order, the values will be printed in ascending order.
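As a quick way to convince yourself of this tip, here is a minimal Python sketch (the Node class and insert helper are illustrative, not part of the original post): inserting numbers into a plain binary search tree and traversing it in-order yields them sorted.

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    # Plain (unbalanced) binary search tree insertion.
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def inorder(node, out):
    # Left subtree, then the node itself, then the right subtree.
    if node is not None:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)

root = None
for v in [7, 3, 9, 1, 5, 8, 10]:
    root = insert(root, v)

result = []
inorder(root, result)
print(result)  # [1, 3, 5, 7, 8, 9, 10] -- ascending order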

The order given by Wikipedia is correct. You said you understood A-F, so I will start from G.
In-order traversal follows the pattern left-parent-right. So after F is traversed, the algorithm goes to the right child of F: it first encounters G and then tries to locate the left child of G. Since the left child of G is null, nothing is printed and the traversal returns to the parent, G, so G is printed. Next it goes to the right child of G, which is I. It first locates the left child of I, which is H, and prints it. Then it returns to the parent I and prints it. Finally it traverses to the right of I and finds null. As all the nodes have now been traversed, the algorithm terminates. Thus the order is G, H, I.

In-order traversal works like this:
1- print the left subtree.
2- print the root.
3- print the right subtree.
In the example you provided:
1- the left subtree of `G` is empty, so we do not print anything.
2- we print the node `G`.
3- finally we print the right subtree of `G` (which yields H, then I).


Job Interview Question Using Trees, What data to save?

I was solving the following job interview question and solved most of it but failed at the last requirement.
Q: Build a data structure which supports the following functions:
Init - Initialise an empty DS. O(1) time complexity.
SetPositiveInDay(d,x) - Record in the DS that on day d exactly x new people were infected with covid-19. O(log n) time complexity.
WorseBefore(d) - Among the days inserted into the DS that are smaller than d, return the last one on which more people were newly infected than on day d. O(log n) time complexity.
For example:
Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10
Important note: you cannot assume that days are entered in order, nor that there are no "gaps" between days (some days may not be saved in the DS while later days are).
What I did?
I used an AVL tree (a 2-3 tree would work too).
For each node I store:
Sick - number of newly infected people on that day.
maxLeftSick - maximum number of infected people in its left subtree.
maxRightSick - maximum number of infected people in its right subtree.
When inserting a new node I made sure this data is not lost during rotations and, for every node on the path from the new node up to the root, I updated maxLeftSick and maxRightSick.
But I wasn't successful implementing WorseBefore(d).
Where to search?
First you need to find the node node corresponding to d in the tree ordered by days. Let x = Sick(node). This can be done in O(log n).
If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.
Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying this property (this takes O(log n)):
nextPredecessor is smaller than node,
and either
Sick(nextPredecessor) > x or
maxLeftSick(nextPredecessor) > x.
If no such node exists, we give up. In case 1, just return nextPredecessor since that is the best solution.
In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.
Note that there is no need to search in the right subtree of nextPredecessor since the only nodes that are smaller than node in that subtree would be the left subtree of node itself, and we have already excluded that.
Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
How to search?
OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q that is worse than an infection number x is simple using the maxLeftSick and maxRightSick information:
If q has a right child and maxRightSick(q) > x, search in the right subtree of q (any match there has a larger day than q itself).
Otherwise, if Sick(q) > x, return Day(q).
Otherwise, if q has a left child and maxLeftSick(q) > x, search in the left subtree of q.
Otherwise there is no solution within the subtree q.
We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the right most node, i.e. the one with the largest day.
It is easy to see that this algorithm runs in O(log n) where n is the total number of nodes since the number of steps is bounded by the height of the tree.
Pseudocode
Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):
// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
    node = find(d);
    // Try to find the solution in the left subtree of node.
    if (maxLeftSick(node) > Sick(node)) {
        return FindLastWorseThan(node->left, Sick(node));
    }
    // Move up towards the root until we find the first node
    // that is smaller than `node` and such that
    // Sick(nextPredecessor) > Sick(node) or
    // maxLeftSick(nextPredecessor) > Sick(node).
    nextPredecessor = findNextPredecessor(node);
    if (nextPredecessor == null) return -1;
    // Case 1
    if (Sick(nextPredecessor) > Sick(node)) return Day(nextPredecessor);
    // Case 2: maxLeftSick(nextPredecessor) > Sick(node)
    return FindLastWorseThan(nextPredecessor->left, Sick(node));
}

// Finds the latest day within the subtree rooted at q whose
// infection number is larger than x. Runs in O(log(size(q))).
int FindLastWorseThan(Node q, int x) {
    if (maxRightSick(q) > x) return FindLastWorseThan(q->right, x);
    if (Sick(q) > x) return Day(q);
    if (maxLeftSick(q) > x) return FindLastWorseThan(q->left, x);
    return -1;
}
First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.
I would just suggest a cosmetic change: store the maximum sick value in the node itself, so that you don't keep two similar pieces of information (maxLeftSick and maxRightSick) in one node instance. Instead, move those two values into the child nodes: what you store as node.maxLeftSick would live in node.left.maxSick, and node.maxRightSick would live in node.right.maxSick. Of course this is not done when the child does not exist, but then you don't need that information either. In your structure maxLeftSick would be 0 when left is not defined; in the proposed structure you simply would not have that value, and the 0 follows naturally from the fact that there is no left child. In the proposed structure the root node carries a maxSick value that yours does not have, namely the maximum over the root's own sick value, root.maxLeftSick and root.maxRightSick. That value is not really used, but it makes the structure consistent throughout the tree.
So you would just store one maxSick, which considers the current node's sick value also in that maximum. The processing you do during rotations will need to change accordingly, but will not become more complex.
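A minimal sketch of that node layout in Python (the class and helper names are illustrative):

class Node:
    def __init__(self, day, sick):
        self.day = day        # key: the tree is ordered by day
        self.sick = sick      # newly infected people on that day
        self.left = None
        self.right = None
        self.maxSick = sick   # maximum sick value in the subtree rooted here

def recompute_max_sick(node):
    # Recompute node.maxSick from its own value and its children.
    # Call this bottom-up after an insertion or a rotation.
    node.maxSick = node.sick
    if node.left:
        node.maxSick = max(node.maxSick, node.left.maxSick)
    if node.right:
        node.maxSick = max(node.maxSick, node.right.maxSick)

After an AVL rotation you would call recompute_max_sick first on the node that moved down and then on the node that moved up, so the values stay consistent.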
I will assume that your AVL tree does not store parent pointers in its nodes. So create a find method which returns the path to the node to be found. For instance, in Python syntax, it could look like this:
def find(self, day):
    node = self.root
    path = []  # an array of nodes
    while node:
        path.append(node)
        if node.day == day:  # bingo
            return path
        if day < node.day:
            node = node.left
        else:
            node = node.right
Then the worstBefore method could look like this:
def worstBefore(self, day):
    path = self.find(day)
    if not path:
        return  # day not found
    # get number of sick people on that day:
    sick = path[-1].sick
    # look for the most recent earlier day with a greater number of sick
    while path:
        node = path.pop()  # walk upward, starting with the found node
        if node.day < day and node.sick > sick:
            return node.day
        # only descend into left subtrees of the found node itself or of
        # ancestors whose day is earlier, so we never return a later day
        if node.day <= day and node.left and node.left.maxSick > sick:
            # we will find the result in this subtree
            node = node.left
            while True:
                if node.right and node.right.maxSick > sick:
                    node = node.right
                elif node.sick > sick:  # bingo
                    return node.day
                else:
                    node = node.left
The path returned by the find method is used to get the parents of a node when you need to backtrack upwards in the tree along that path.
If along that path you find a left child whose maxSick is greater, then you know that the targeted node must be in that subtree. It is then a matter of walking down that subtree in a controlled way: choose the right child as long as it still has a greater maxSick; otherwise check the current node's sick value and return its day if that value is greater; otherwise go left, and repeat.
As long as there is no such left subtree, go up along the path. If a parent is itself a match, return it (make sure to verify the day number). Keep checking for left subtrees that have a larger maxSick, but only below nodes whose day is not later than the queried day.
This runs in O(log n) because you first walk zero or more steps upward and then zero or more steps downward (in a left subtree).
You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.
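For completeness, here is a minimal sketch of the surrounding tree class in Python (the class and method names are made up for this example, and balancing rotations are omitted, just as in the linked demo); it only shows how maxSick can be maintained on a plain BST insert so that the find/worstBefore methods above have something to run against:

class Node:
    def __init__(self, day, sick):
        self.day, self.sick = day, sick
        self.left = self.right = None
        self.maxSick = sick   # maximum sick value in this subtree

class CovidTree:
    def __init__(self):
        self.root = None

    def setPositiveInDay(self, day, sick):
        # Plain BST insert keyed by day; maxSick is updated along the path.
        # A real AVL version would also rebalance and recompute maxSick
        # for the rotated nodes.
        if self.root is None:
            self.root = Node(day, sick)
            return
        node = self.root
        while True:
            node.maxSick = max(node.maxSick, sick)
            if day < node.day:
                if node.left is None:
                    node.left = Node(day, sick)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(day, sick)
                    return
                node = node.right

# With find and worstBefore added as methods of CovidTree, the scenario
# from the question would behave like this:
# t = CovidTree()
# for d, x in [(1,10), (2,20), (3,15), (5,17), (23,180), (8,13), (13,18)]:
#     t.setPositiveInDay(d, x)
# t.worstBefore(13)         # -> 2
# t.setPositiveInDay(10, 19)
# t.worstBefore(13)         # -> 10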

Let P be a singly linked list. Let Q be the pointer to an intermediate node x in the list, then how to delete x from list?

What is the worst-case time complexity of the best-known algorithm to delete the node x from the list?
According to me, it should be O(1):
Algo:
Q->data = Q->next->data;   // Copy the value of the next node into Q.
del = Q->next;             // Keep a pointer to the next node of Q.
Q->next = Q->next->next;   // Unlink the next node.
free(del);
So what is the issue with this? In the links I have gone through, it is given as O(n):
http://geeksquiz.com/gate-gate-it-2004-question-13/
In the comments on the question, this exact question is being asked. The answer is:
[this] doesn't work for the case when the node to be deleted is last node
The solution you presented won't work for the last node, since that node has no successor. Why this matters for "intermediate" nodes isn't entirely clear to me either, but that's the official answer.
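For illustration, here is a small Python sketch of the copy-the-successor trick (ListNode is a hypothetical minimal node class); note the explicit check for the last node, which is exactly the case the trick cannot handle in O(1):

class ListNode:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def delete_node(q):
    # Delete the node pointed to by q in O(1) by copying its successor.
    # Returns False if q is the last node, where the trick does not apply.
    if q.next is None:
        return False  # would need the predecessor, i.e. an O(n) walk from the head
    q.data = q.next.data   # copy the value of the next node into q
    q.next = q.next.next   # unlink the next node (garbage-collected in Python)
    return True

# head: 1 -> 2 -> 3 -> 4
head = ListNode(1, ListNode(2, ListNode(3, ListNode(4))))
delete_node(head.next)     # delete the node holding 2
node, values = head, []
while node:
    values.append(node.data)
    node = node.next
print(values)  # [1, 3, 4]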

How to find a node's predecessor in a BST with each node having 3 attributes--succ,left and right?

A problem from CLRS ,3ed.
12.3-5
Suppose that instead of each node x keeping the attribute x.p, pointing to x's parent, it keeps x.succ, pointing to x's successor. Give pseudocode for SEARCH, INSERT, and DELETE on a binary search tree T using this representation. These procedures should operate in time O(h), where h is the height of the tree T. (Hint: You may wish to implement a subroutine that returns the parent of a node.)
I know how to implement a subroutine that returns the parent of a node in O(h) time.
To find the parent of node x, we should find the maximum key M in the subtree rooted at x first. Then, we go downward to the right from M.succ.left. When we reach x, the node we encounter before x is x's parent.
See the code.
typedef struct TREE {
    struct TREE *left, *right, *succ;
} *BST;

BST PARENT(BST x)
{
    if (x == root) return NULL;
    BST max = TREE_MAXIMUM(x);       /* maximum key in the subtree rooted at x */
    BST parent = max->succ, t;
    if (parent) t = parent->left;    /* x's subtree hangs below parent->left */
    else        t = root;            /* x's subtree contains the global maximum */
    while (t != x) { parent = t; t = t->right; }
    return parent;
}
When deleting x, the succ of x's predecessor must be updated to point to x.succ instead of x. So now comes the problem: how to find the predecessor of x in O(h) time?
When the left subtree of x is nonempty, the predecessor is the rightmost node in x's left subtree. However, when the left subtree of x is empty, the predecessor is an ancestor of x, and finding it by repeated calls to PARENT would mean O(h) calls. Does that need O(h*h) time? Or should we search downward from the root?
Notice that the operation INSERT also needs to find a node's predecessor.
Another question arises: what if all the keys share the same value? Then we cannot find x's predecessor using comparisons, because a key a equal to x's key may appear in either x's left subtree or its right subtree.
In case the left subtree of x is empty, the predecessor of x is an ancestor of x: namely the lowest ancestor whose right subtree contains x (when x is a right child, that is simply its direct parent). You do not need O(h) separate calls to the parent subroutine to locate it (see the P.S. below), so the overall runtime stays O(h) + O(h) = O(h).
P.S
Even if it wasn't the direct parent, you still don't have to call parent repeatedly to get all ancestors, you can just start searching for x from the root of the tree as you normally would, saving all of the nodes along the path to x in a single traversal of length O(h). All of the nodes you pass through when searching for x, and only those nodes, are by definition the ancestors of x.
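A quick sketch of that idea in Python (hypothetical names, assuming distinct keys and that each node has key, left and right): a single root-to-x descent collects every ancestor of x in order.

def ancestors_of(root, x):
    # Return the nodes on the path from the root down to node x
    # (excluding x itself); these are exactly x's ancestors.
    path = []
    node = root
    while node is not None and node is not x:
        path.append(node)
        node = node.left if x.key < node.key else node.right
    return path if node is x else None   # None if x was not reached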
If keys are discrete, the predecessor of x has a key <= key(x)-1, right? And if you search the tree for key(x)-1, what happens? Don't you necessarily run into pred(x) during that traversal? You'd have to double-check me on that. And of course non-discrete keys (e.g. floating point) would be problematic (there is no natural key(x)-1).
Note: O(h) is only possible if the root is directly available (which should be the case).
Changing your data structure a bit.
typedef struct TREE {
    int key;
    struct TREE *left, *right, *succ;
} *BST;
If the left subtree is non-empty we use TREE_MAXIMUM; otherwise we search for x in the BST starting from the root and maintain (in the variable goal) the last node at which we turned right, i.e. the deepest node on the path whose key is smaller than x's. At the end of the search, goal is the predecessor.
Predecessor function
BST PRE(BST x)
{
    if (x->left != NULL)
        return TREE_MAXIMUM(x->left);
    if (x == root)
        return NULL;
    BST goal = NULL, y = root;
    while (1)
    {
        if (x->key <= y->key)
            y = y->left;
        else
        {
            goal = y;        /* we turned right: y is a candidate predecessor */
            y = y->right;
        }
        if (x == y)
            return goal;
    }
}

Pseudo Code and conditions for deleting a Node in Binary Search Tree

I'm trying to write a function to remove a node from a binary tree. I haven't coded the function yet, and I am trying to think about the different conditions I should consider for removing a node. I am guessing that the possible conditions are:
The node has no children
The node has one child
The node has 2 children
In each of these cases what would be the algorithm to perform a delete function?
This is something you would find in any standard textbook about algorithms, but let's suppose you are interested in the unbalanced case (balanced trees usually perform some rebalancing operations called "rotations" after a removal) and you use the "obvious" data structure (a tree_node structure that holds the value and two pointers to other tree_nodes):
No children: release the memory held by the node and set the parent's child link that pointed to it to NULL.
One child: release the memory held by the node and set the parent's child link that pointed to it to the address of its unique child.
Two children: this is indeed the "complicated" case. Find the rightmost node of the left child (or the leftmost node of the right child), take its value, remove it (it has at most one child, so it falls into one of the easy cases and can be handled recursively) and set the current node's value to the value of that node. This is O(tree_height), which is O(n) in the worst case, but that is not a problem (at least in theory) because this is nevertheless the complexity of finding a node. A sketch of these three cases is given below.
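Here is a minimal sketch of those three cases in Python (Node and delete are illustrative names, assuming an unbalanced binary search tree with distinct keys):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def delete(root, key):
    # Delete key from the subtree rooted at root; returns the new subtree root.
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # No children, or one child: splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the rightmost key of the left subtree here,
        # then delete that node from the left subtree.
        pred = root.left
        while pred.right:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    return root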
Does your tree have any additional properties?
Is it an AVL tree?
If not, there are some pretty obvious and straightforward ways to do what you want (which will depend on your data representation, as Vitalij said).
And if it is an AVL tree, for example, there ALSO are some well-known methods for doing that (Wikipedia will tell you more on that topic).
The first task is to find whether the node exists at all, which happens during the search; the rest of your conditions are correct.
Leaf node: set the parent's child pointer (right/left) to NULL.
Has one child: set the parent's child pointer to the deleted node's only child.
Has two children: here the subtree has to be restructured: replace the node to be deleted with a suitable node found in one of its subtrees and prune that node from its old position (see the other answers for which node to pick).
Assuming you are dealing with plain (unbalanced) binary search trees, do the following:
Node has no child, i.e. it is a leaf: simply delete it.
Node has one child: make the parent of the node to be deleted the parent of its child, then delete the node. I.e., if A->Parent = B and C->Parent = A, and A has to be deleted, then 1. make C->Parent = B; 2. delete A.
The tricky one: yes, replacing the node to be deleted by the leftmost node of the right subtree works, and so does the rightmost node of the left subtree; either will do. It can be seen like this:
when a node is deleted, it has to be replaced by a node which satisfies some properties.
Say our binary tree represents sorted numbers (in increasing order under in-order traversal). Then the deleted node should be replaced by some node from either of its subtrees: one that is larger in value than everything in the remaining left subtree and smaller than everything in the remaining right subtree ("remaining" meaning the subtrees after the deleted node has been accounted for). Only two such nodes exist: the leftmost node of the right subtree and the rightmost node of the left subtree.
Hence, replacing the deleted node with either one suffices.
Delete the given keys one at a time from the binary search tree. Possible equal keys were inserted into the left branch of the existing node. Please note that the insertion strategy also affects how the deletion is performed.
BinarySearchTree-Delete
Node Delete(Node root, Key k)
    if (root == null)              // failed search
        return null;
    if (k == root.key)             // successful search
        return DeleteThis(root);
    if (k < root.key)              // k in the left branch
        root.left = Delete(root.left, k);
    else                           // k > root.key, i.e., k in the right branch
        root.right = Delete(root.right, k);
    return root;

Node DeleteThis(Node root)
    if root has two children
        p = Largest(root.left);    // replace root with its immediate predecessor p
        root.key = p.key;
        root.left = Delete(root.left, p.key);
        return root;
    if root has only left child
        return root.left;
    if root has only right child
        return root.right;
    else  // root has no children
        return null;

Node Largest(Node root)
    if root has no right child
        return root;
    return Largest(root.right);

Sum up child values and save the values calculated in intermediate steps

struct node {
    int value;
    struct node* left;
    struct node* right;
    int left_sum;
    int right_sum;
};
In a binary tree, from a particular node, there is a simple recursive algorithm to sum up all its child values. Is there a way to save the values calculated in the intermediate steps and store them as left_sum and right_sum in the child nodes?
Will it be easier to do this bottom up by adding a struct node* parent link to the node definition?
No, this is clearly an exercise in recursion. Think about what the sum means. It's zero plus the "sum of all values from the root down".
Interestingly enough, the "sum of all values from the root down" is the value of the root node plus the "sum of all values from its left node down" plus the "sum of all values from its right node down".
Hopefully, you can see where I'm going here.
The essence of recursion is to define an operation in terms of similar, simpler, operations with a terminating condition.
The terminating condition, in this case, is the leaf nodes of the tree or, to make the code simpler, beyond the leaf nodes.
Examine the following pseudo-code:
def sumAllNodes (node):
    if node == NULL:
        return 0
    return node.value + sumAllNodes (node.left) + sumAllNodes (node.right)
fullSum = sumAllNodes (rootnode)
That's really all there is to it. With the following tree:
          __A(9)__
         /        \
      B(3)        C(2)
     /    \          \
  D(21)   E(7)       F(1)
Using the pseudo-code, the sum is the value of A (9) plus the sums of the left and right subtrees.
The left subtree of A is the value of B (3) plus the sums of its left and right subtrees.
The left subtree of B is the value of D (21) plus the sums of its left and right subtrees.
The left subtree of D is the value of NULL (0).
Later on, the right subtree of A is the value of C (2) plus the sums of its left and right subtrees, its left subtree being empty and its right subtree being F (1).
Because you're doing this recursively, you don't explicitly ever walk your way up the tree. It's the fact that the recursive calls are returning with the summed values which gives that ability. In other words, it happens under the covers.
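For the tree above, the recursion bottoms out at the NULL children and the total works out to 9 + 3 + 21 + 7 + 2 + 1 = 43.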
As for the other part of your question, it is not really useful, though of course there may be unstated requirements that I'm not taking into account, because they're, well, ... unstated :-)
Is there a way to save the values calculated in the intermediate steps and store them as left_sum and right_sum in child nodes?
You never actually re-use the sums for a given sub-tree. During a sum calculation, you would calculate the B-and-below subtree only once as part of adding it to A and the C-and-below subtree.
You could store those values so that B contained both the value and the two sums (left and right) - this would mean that every change to the tree would have to propagate itself up to the root as well but it's doable.
Now there are some situations where that may be useful. For example, if the tree itself changes very rarely but you want the sum very frequently, it makes sense performance wise to do it on update so that the cost is amortised across lots of reads.
I sometimes use this method with databases (which are mostly read far more often than written) but it's unusual to see it in "normal" binary trees.
Another possible optimisation: just maintain the sum as a separate variable in the tree object. Initialise it to zero then, whenever you add a node, add its value to the sum.
When you delete a node, subtract its value from the sum. That gives you your very fast O(1) "return sum" function without having to propagate upwards on update.
The downside is that you only have a sum for the tree as a whole but I'm having a hard time coming up with a valid use case for needing the sum of subtrees. If you have such a use case, then I'd go for something like:
def updateAllNodes (node):
    if node == NULL:
        return 0
    node.leftSum = updateAllNodes (node.left)
    node.rightSum = updateAllNodes (node.right)
    return node.value + node.leftSum + node.rightSum
change the tree somehow (possibly many times)
fullSum = updateAllNodes (root)
In other words, just update the entire tree after each change (or batch the changes then update if you know there's quite a few changes happening). This will probably be a little simpler than trying to do it as part of the tree update itself.
You can even use a separate dirtyFlag which is set to true whenever the tree changes and set to false whenever you calculate and store the sum. Then use that in the sum calculation code to only do the recalc if it's dirty (in other words, a cache of the sums).
That way, code like:
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
will only incur a cost on the first invocation. The other four should be blindingly fast since the sum is cached.
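As a concrete illustration of that caching idea, here is a small Python sketch (SumTree and its method names are made up for the example): the total is recomputed only when the dirty flag says the tree has changed since the last query.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

class SumTree:
    def __init__(self, root=None):
        self.root = root
        self._cached_sum = 0
        self._dirty = True   # no sum computed yet

    def mark_changed(self):
        # Call this whenever a node is added, removed or its value changes.
        self._dirty = True

    def total(self):
        # Return the sum of all node values, recomputing only if needed.
        if self._dirty:
            self._cached_sum = self._sum(self.root)
            self._dirty = False
        return self._cached_sum

    def _sum(self, node):
        if node is None:
            return 0
        return node.value + self._sum(node.left) + self._sum(node.right)

# Example: the tree from above.
tree = SumTree(Node(9, Node(3, Node(21), Node(7)), Node(2, None, Node(1))))
tree.mark_changed()
print(tree.total())  # 43, computed once
print(tree.total())  # 43 again, served from the cache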
