basically i am required to come out with a pseudocode for this. What i currently have is
dictionary = {}
if node.left == none and node.right == none
visit(node)
dictionary[node] = 1
This is only the leaf nodes, how do i get the size for each node(parent and root)?
You can do a post-order traversal to find the size of each node.
The idea is to first handle both left and right trees. Then, after they are processed - you can use this data to process the current node.
This should look something like:
count = 0
if (node.left != none)
count += visit(node.left)
if (node.right != none)
count += visit(node.right)
// self is included.
count += 1
// update the node
node.size = count
return count
The dictionary for visited nodes is not needed since this is a tree, it guarantees to end.
As a side note - the size attribute of each node, is an important one. It basically upgrades your tree to a Order Statistics Tree
well the concept is that each node will know it's subtree size by first knowing the subtree size of all it's child which is maximum two child here as it is a binary tree, so once it knows subtree size of all child it can then add up all of them and atlast add 1 to it's
result and then the same thing will be done by it's parent also and so on upto root node. if we think about leaf node, it
has no child, so result subtree size will be only 1 in which it include itself.
one this idea is clear, it is easy to write code
that while traversing we will first know the subtree size of child nodes of current node then add 1 in it, in case of leaf node it will have subtree size of 1 only, below is the pseudocode of traverse funtion which finds the subtree size of each node and store them in dictionary sizeDictionary and a visited dictionary/array having larger scope has been used to keep track of visited nodes.
traverse(Tree curNode, dictionary subTreeSizeDictionary)
visited[curNode] = true
subtreeSizeDictionary[curNode] = 0
for child of curNode
if(notVisited[child])
traverse(child , sizeDictionary)
subtreeSizeDictionary[curNode] += subtreeSizeDictionary[child]
subtreeSizeDictionary[curNode] += 1;
here it is binary tree, but as you can see from pseudocode this concept can be used for any valid tree, the time complexity is O(n) as we visited each node only once.
Related
I was solving the following job interview question and solved most of it but failed at the last requirement.
Q: Build a data structure which supports the following functions:
Init - Initialise Empty DS. O(1) Time complexity.
SetPositiveInDay(d,x) - Add to the DS that in day d exactly x new people were infected with covid-19. O(log n)Time complexity.
WorseBefore(d) - From the days inserted into the DS and smaller than d return the last one which has more newly infected people than d. O(log n)Time complexity.
For example:
Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10
Important note: you can't suppose that days will be entered by order and can't suppose too that there won't be "gaps" between days. (Some days may not be saved in the DS while those after it may be).
What I did?
I used AVL tree (I could use 2-3 tree too).
For each node I have:
Sick - Number of new infected people in that day.
maxLeftSick - Max number of infected people for left son.
maxRightSick - Max number of infected people for right son.
When inserted a new node I made sure that in rotation data won't get missed plus, for each single node from the new one till the root I did:
But I wasn't successful implementing WorseBefore(d).
Where to search?
First you need to find the node node corresponding to d in the tree ordered by days. Let x = Sick(node). This can be done in O(log n).
If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.
Otherwise, traverse the tree upwards towards the root, starting from node, until you find the first node nextPredecessor satisfying this property (this takes O(log n)):
nextPredecessor is smaller than node,
and either
Sick(nextPredecessor) > x or
maxLeftSick(nextPredecessor) > x.
If no such node exists, we give up. In case 1, just return nextPredecessor since that is the best solution.
In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.
Note that there is no need to search in the right subtree of nextPredecessor since the only nodes that are smaller than node in that subtree would be the left subtree of node itself, and we have already excluded that.
Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
How to search?
OK, so how do we search for the solution in a subtree? Finding the largest day within a subtree rooted in q that is worse than an infection number x is simple using the maxLeftSick and maxRightSick information:
If q has a right child and maxRightSick(q) > x then search in the right subtree of q.
If q has no right child and Sick(q) > x, return Day(q).
If q has a left child and maxLeftSick(q) > x then search in the left subtree of q.
Otherwise there is no solution within the subtree q.
We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we get the right most node, i.e. the one with the largest day.
It is easy to see that this algorithm runs in O(log n) where n is the total number of nodes since the number of steps is bounded by the height of the tree.
Pseudocode
Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):
// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists.
int WorstBefore(int d) {
node = find(d);
// try to find the solution in the left subtree
if (maxLeftSick(node) > Sick(node)) {
return FindLastWorseThan(node -> left, Sick(node));
}
// move up towards root until we find the first node
// that is smaller than `node` and such that
// Sick(nextPredecessor) > Sick(node) or
// maxLeftSick(nextPredecessor) > Sick(node).
nextPredecessor = findNextPredecessor(node);
if (nextPredecessor == null) return -1;
// Case 1
if (Sick(nextPredecessor) > Sick(node)) return nextPredecessor;
// Case 2: maxLeftSick(nextPredecessor) > Sick(node)
return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}
// Finds the latest day within the given subtree with root "node" where
// the infection number is larger than x. Runs in O(log(size(q)).
int FindLastWorseThan(Node q, int x) {
if ((q -> right) = null and Sick(q) > x) return Day(q);
if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
return -1;
}
First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.
I would just suggest a cosmetic change: store the maximum value of sick in the node itself, so that you don't have two similar informations (maxLeftSick and maxRightSick) stored in one node instance, but move those two informations to the child nodes, so that your node.maxLeftSick is actually stored in node.left.maxSick, and similarly node.maxRightSick is stored in node.right.maxSick. This is of course not done when that child does not exist, but then we don't need that information either. In your structure maxLeftSick would be 0 when left is not defined. In my proposed structure, you would not have that value -- the 0 would follow naturally from the fact that there is no left child. In my proposal, the root node would have an information in maxSick which is not present in yours, and which would be the sum of your root.maxLeftSick and root.maxRightSick. This information would not really be used, but it is just there to make the structure consistent throughout the tree.
So you would just store one maxSick, which considers the current node's sick value also in that maximum. The processing you do during rotations will need to change accordingly, but will not become more complex.
I will assume that your AVL tree is single-threaded, i.e. you don't keep track of parent-pointers. So create a find method which will return the path to the node to be found. For instance, in Python syntax, it could look like this:
def find(self, day):
node = self.root
path = [] # an array of nodes
while node:
path.append(node)
if node.day == day: # bingo
return path
if day < node.day:
node = node.left
else:
node = node.right
Then the worstBefore method could look like this:
def worstBefore(self, day):
path = self.find(day)
if not path:
return # day not found
# get number of sick people on that day:
sick = path[-1].sick
# look for recent day with greater number of sick
while path:
node = path.pop() # walk upward, starting with found node
if node.day < day and node.sick > sick:
return node.day
if node.left and node.left.maxSick > sick:
# we will find the result in this subtree
node = node.left
while True:
if node.right and node.right.maxSick > sick:
node = node.right
elif node.sick > sick: # bingo
return node.day
else:
node = node.left
So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.
If along that path you find a left child whose maxSick is greater, then you know that the targeted node must be in that subtree. It is then a matter to walk down that subtree in a controlled way, choosing the right child when it still has maxSick greater. Otherwise check the current node's sick value and return that one if that value is greater. Otherwise go left, and repeat.
While there is no such left sub tree, go up along the path. If that parent would be a match, then return it (make sure to verify the day number). Keep checking for left sub trees that have a larger maxSick.
This runs in O(logn) because you first will walk zero or more steps upward and then zero or more steps downward (in a left subtree).
You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.
I have a tree. For example the following one:
root
a
/ \
b c
/ \ / \
e d f g
Every node in the tree has an attribute attr1. If a node's attr1 has a value of 1. Then the attr2 (another attribute) of all nodes on the path to this node, should make to 1. But we don't know if any of the nodes has a the value 1 in its attr1.
The idea I have to solve the problem is, to traverse through the tree (pre-order). While traversing I will have a FIFO container (queue) and every time I go downwards I will add to the queue and when going upwards I will remove the nodes which are bellow. So I have always the path to the current node. If then the node has attr1 == 1, then I must iterate the path again and set the attr2 of all nodes in the path to 2.
But I don't know if there is a more efficient way to accomplish this?
def update(node):
if node is None:
return False
upd_left = update(node.left)
upd_right = update(node.right)
node.attr2 = 1 if upd_left or upd_right or node.attr1 == 1 else node.attr2
return node.attr2 == 1
I think this will do what you expect as we are not iterating over the queue again and again.
The worst case complexity of your approach in case of a skewed tree will be O(n2). As for each node, you have to traverse the queue, if attr1==1 for each node.
But, in the above code, the complexity will be atmost O(n). Because you are visiting each node only once.
I faced an interesting problem called as the Great Tree-List Problem. The Problem is as follows :
In the ordered binary tree, each node contains a single data element and "small" and "large" pointers to sub-trees .All the nodes in the "small" sub-tree are less than or equal to the data in the parent node. All the nodes in the "large" sub-tree are greater than the parent node. And a circular doubly linked list consist of previous and next pointers.
The problem is to take an ordered binary tree and rearrange the internal pointers to make a circular doubly linked list out of it. The "small" pointer should play the role of "previous" and the "large" pointer should play the role of "next". The list should be arranged so that the nodes are in increasing order. I have to write a recursive function & Return the head pointer to the new list.
The operation should be done in O(n) time.
I understand that recursion will go down the tree, but how to recursively change the small and large sub-trees into lists, also i have to append those lists together with the parent node.
How should i approach the problem?.. I just need a direction to solve the problem!.
The idea is to create a method that converts a tree node containing subtrees (children nodes) into a loop. And given a node that has converted children (e.g. after recursive calls came back), you create a new loop by pointing the large pointer (next) of the largest node to the smallest node, and the small pointer of the smallest node to the largest node.
May not be complete, but it will be close to this:
tree_node {
small
large
}
convert(node){
//base case 1, if leaf node
if node.small == null && node.large == null
return (self, self)
//recursively convert children
if node.small !=null
smallest, larger = convert(node.small)
else
smallest = larger = self
if node.large !=null
smaller, largest = convert(node.large)
else
smaller = largest = self
//wrap the ends of the chain
largest.large = smallest
smallest.small = largest
//wrap the mid part
smaller.small = larger
larger.large = small
//return pointers to the absolute smallest and largest of this subtree
return (smallest, largest)
}
//actually doing it
convert(tree.root)
The key to recursive programming is to imagine you already have the solution.
So, you already have a function recLink(Tree t) which receives a pointer to a tree, turns that tree into a doubly-linked circular list and returns a pointer to the list's head (leftmost) node:
recLink( n={Node: left, elt, right}) = // pattern match tree to a full node
rt := recLink( right); // already
lt := recLink( left); // have it
n.right := rt; n.left := lt.left; // middle node
lt.left.right := n; rt.left.right := lt; // right edges
lt.left := rt.left; rt.left := n;
return lt;
Finish up with the edge cases (empty child branches etc.). :)
assuming you have a simple tree of 3 nodes
B <--- A ---> C
walk down the left and right sides, get the pointers for each node, then have
B -> C
B <- C
Since your tree is binary, it will be composed of 3 node "subtrees" that can recursively use this strategy.
I'm trying to write a function to remove a node from a binary tree. I haven't coded the function yet, and I am trying to think about the different conditions I should consider for removing a node. I am guessing that the possible conditions are:
The node has no children
The node has one child
The node has 2 children
In each of these cases what would be the algorithm to perform a delete function?
This is something you would find in any standard textbook about algorithms, but let's suppose you are interested in the unbalanced case (balanced trees usually performs some rebalancing operations called "rotations" after a removal) and you use the "obvious" datastructure (a tree_node structure that holds the value and two pointers to other tree_node):
No children: release the memory hold by the node and set the parent's child link that pointed to it as NULL;
One child: release the memory hold by the node and set the parent's child link that pointed to it as the address of its unique child;
Two children: this is indeed the "complicated" case. Find the rightmost node of the left child (or the leftmost node of the right child), take its value, remove it (it is "case 1", so it is easy and can be done recursively) and set the current node's value as the one of that node. This is O(tree_height) = O(n), but it is not a problem (at least in theory) because this would be neverthless the complexity of finding a node.
Does your tree have any additional properties?
Is it an AVL?
If not, there are some pretty obvious and straightforward ways to do what you want (which will depend on your data representation, as Vitalij said).
And if it is an AVL for example, there ALSO are some well known method for doing that (wikipedia will tell you more on that topic)
First task is to find whether node exists which will be done during search and rest of your conditions are correct.
Leaf node: set the parent's child (right/left) to NULL.
Has one child: Just set the child of the node to be deleted to its parent's child.
Has two children: Basically have to re-order the whole subtree here by pruning the subtree to by finding new children for the node to be deleted.
Assuming you are dealing with general binary trees, do the following,
Node has no child- ie it is a leaf : Conveniently delete it..
Node has one child - Make the parent of the node to be deleted parent of its child , then delete the node. ie, if A->Parent = B; C->Parent = A; and A has to be deleted, then 1. Make C->Parent = B; 2. Delete A;
Tricky one.... Yes, replacing the node to be deleted by the left most child of the right subtree work, or by the rightmost tree of the left subtree, either will do... because it can be seen like this,
When a node is deleted, it has to be replaced by a node which satisfies some properties...
Lets say if our binary tree represents sorted numbers (in increasing order) in inorder traversal, then the deleted node should be replaced by some node from either of its subtrees. That should be larger in value than the whole remaining left subtree, and smaller than the whole remaining right subtree (remaining means the subtree remaining after adjusting for the deleted node successfully). Only two such nodes exist, leftmost leaf of the right subtree, or the rightmost node of left one.
Hence, replacing the deleted node from either one suffices...!!
Delete the given keys one at a time from the binary search tree. Possible equal keys were inserted into the left branch of the existing node. Please note that the insertion strategy also affects how the deletion is performed
BinarySearchTree-Delete
Node Delete(Node root, Key k)
1 if (root == null) // failed search
2 return null;
3 if (k == root.key) // successful search
4 return DeleteThis(root);
5 if (k < root.key) // k in the left branch
6 root.left = Delete(root.left, k);
7 else // k > root.key, i.e., k in the right branch
8 root.right = Delete(root.right, k);
9 return root;
Node DeleteThis(Node root)
1 if root has two children
2 p = Largest(root.left); // replace root with its immediate predecessor p
3 root.key = p.key;
4 root.left = Delete(root.left, p)
5 return root;
6 if root has only left child
7 return root.left
8 if root has only right child
9 return root.right
10 else root has no children
11 return null
Node Largest(Node root)
1 if root has no right child
2 return root
3 return Largest(root.right)
I am trying to solve a problem of finding a max repetitive sub-tree in an object tree.
By the object tree I mean a tree where each leaf and node has a name. Each leaf has a type and a value of that type associated with that leaf. Each node has a set of leaves / nodes in certain order.
Given an object tree that - we know - has a repetitive sub-tree in it.
By repetitive I mean 2 or more sub-trees that are similar in everything (names/types/order of sub-elements) but the values of leaves. No nodes/leaves can be shared between sub-trees.
Problem is to identify these sub-trees of the max height.
I know that the exhaustive search can do the trick. I am rather looking for more efficient approach.
you could implement a dfs traversal generating a hash value for each node. Store these values with the node height in a simple array. Sub-tree candidates are duplicate values, just check that the candidates are ok since two different sub-trees could yield same hash value.
Assuming the leafs and internal nodes are all of type Node and that standard access and traversal functions are available :
procedure dfs_update( node : Node, hashmap : Hashmap )
begin
if is_leaf(node) then
hashstring = concat("LEAF",'|',get_name_str(node),'|',get_type_str(node))
else // node is an internal node
hashstring = concat("NODE",'|',get_name_str(node))
for each child in get_children_sorted(node)
dfs_update(child,hashmap)
hashstring = concat(hashstring,'|',get_hash_string(hashmap,child))
end for
end if
// only a ref to node is added to the hashmap, we could also add
// the node's height, hashstring, whatever could be useful and inapropriate
// to keep in the Node ds
add(hashmap, hash(hashstring),node)
end
The tricky part is after a dfs_update, we have to get the list of collinding nodes in the hasmap by descending height and check two by two they are really repetitive.