Inserting and deleting edges in a Tree Dynamically - algorithm

Problem: Given a rooted tree T containing N nodes. Each node is numbered from 1 to N, node 1 being the root node. Also, each node contains some value. We have to answer three kinds of queries on the given tree.
Query 1::
Given a node nd, you have to find the sum of the values of all the nodes of the subtree rooted at nd and print the answer.
Query 2::
Given a node nd, you have to delete the subtree rooted at nd, completely (including node nd).
Query 3::
Given a node nd and some integer v, you have to add a child to node nd having value equal to v.
Constraints: N will be of the order of 100000, and the total number of queries will also be of the order of 100000. So, I can't do a DFS traversal every time.
My idea: My solution is offline. I will first find all the nodes that are added to the tree at least once and build the corresponding tree. Then I will do a pre-order traversal of the tree and convert it into an array in which every subtree occupies a contiguous range. Then I can use a segment tree to solve the problem. My algorithm will thus be O(Q log N), where Q is the total number of queries. However, I am looking for an efficient "online" solution. I mean, I have to perform each query as soon as it is asked; I cannot store all the queries first and then perform them one by one.
Any help is appreciated a lot!
Thanks.
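The flattening step of the offline idea can be sketched as follows. This is only an illustration under assumptions: the tree is given as a hypothetical `children` dictionary mapping a node to its list of children, and the names `flatten`, `tin` and `tout` are made up for this example. After the pre-order traversal, the subtree of a node `v` occupies the slice `order[tin[v]:tout[v]]`, which is the range a segment tree or Fenwick tree would be queried on.

```python
def flatten(children, root):
    """Pre-order traversal; the subtree of v is order[tin[v]:tout[v]]."""
    tin, tout, order = {}, {}, []
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            # All descendants of v have been appended by now.
            tout[v] = len(order)
            continue
        tin[v] = len(order)
        order.append(v)
        stack.append((v, True))
        # Push children in reverse so they are visited left to right.
        for c in reversed(children.get(v, [])):
            stack.append((c, False))
    return tin, tout, order


children = {1: [2, 3], 2: [4, 5]}   # hypothetical example tree
tin, tout, order = flatten(children, 1)
subtree_of_2 = order[tin[2]:tout[2]]
```

An explicit stack is used instead of recursion because a tree with 100000 nodes can be deeper than Python's default recursion limit.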

Assuming the tree is balanced, you can solve it in O(q log n) with two extra fields in every node.
With every node, maintain a sum equal to the sum of the values of the nodes below it in its subtree, and maintain a parent pointer as well.
With these two fields, query one reduces to returning sum plus the value at that node (O(1)). Query two reduces to subtracting sum plus the value of the node from the sum of every ancestor of that node until you reach the root (O(log n)). Query three reduces to adding v to the sum of nd and of every ancestor of nd until you reach the root (O(log n)).
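A minimal sketch of this bookkeeping, under a few assumptions: the class name `SubtreeSums` and its method names are invented for the example, the caller supplies the id of each new child, and `total[v]` stores the subtree sum including v's own value (a slight simplification of the "sum plus value" formulation). Each update walks up to the root, so the cost is proportional to the depth of the node, which is O(log n) only if the tree stays balanced.

```python
class SubtreeSums:
    def __init__(self, root, root_value):
        self.parent = {root: None}
        self.total = {root: root_value}   # subtree sum, node itself included

    def _propagate(self, node, delta):
        # Apply delta to node and to every ancestor up to the root.
        while node is not None:
            self.total[node] += delta
            node = self.parent[node]

    def subtree_sum(self, nd):            # query 1
        return self.total[nd]

    def delete_subtree(self, nd):         # query 2
        self._propagate(self.parent[nd], -self.total[nd])
        # Detach nd; the entries of the removed nodes are left in place
        # for brevity, so they must not be queried afterwards.
        self.parent[nd] = None

    def add_child(self, nd, child, v):    # query 3
        self.parent[child] = nd
        self.total[child] = v
        self._propagate(nd, v)


t = SubtreeSums(1, 10)
t.add_child(1, 2, 5)
t.add_child(2, 3, 7)
t.add_child(1, 4, 1)
before = t.subtree_sum(1)     # 10 + 5 + 7 + 1
t.delete_subtree(2)
after = t.subtree_sum(1)      # 10 + 1
```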

Related

Given a query containing two integers as nodes, find all the children of those two nodes in tree?

This is my interview question which has the following problem statement
You are given M queries (1 <= M <= 100000), where every query has 2 integers that identify nodes of some tree. How will you output all the children (the subtree) of each of these 2 nodes?
Well, my approach was naive. I ran a DFS from both nodes for every query, but the interviewer wanted a more optimized approach.
More simply, we have to print the subtree of every node given in the queries. There could be many queries, so we can't run a DFS for every node in every query.
Any hints how can I optimize this ?
You could optimize an algorithm that performs DFS on both nodes if one of the nodes is a child of the other.
Suppose Node 2 is a child of Node 1. In this case, the DFS from Node 1 already visits all of the children of Node 2, so running a DFS again from Node 2 is wasted work. You could avoid it by storing intermediate results (see dynamic programming, specifically the Fibonacci example, for how memoization avoids recalculating values in recursive calls).
For a single query, DFS should be the optimal way. For a larger number of queries here are a few things in my mind that you could do:
Cache your results. When a number shows up frequently (say 100 times), save that printed subtree to memory and just return the result when the same number appears again.
When caching, also mark all the nodes contained in the cached subtree on your original tree. When a query contains such a node, refer to the cached subtree instead of the original tree since you have done DFS on these nodes as well.
As noted by @K. Dackow, if a query contains A and B and B is a child of A, you can directly reuse the DFS results for B when traversing the tree for A. If permitted, you can even look ahead at multiple queries (say 10) and see if any of their nodes belong to the subtree you're currently traversing. You can set up a queue of queries and, while doing one DFS traversal, look at the top items in your queue to see whether you have met any of their nodes.
Hope this helps!

Count nodes bigger than root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add additional field v.bigger which should contain number of nodes with key bigger than v.key, that are in a subtree rooted at v. Adding such a field to all nodes of a tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics. For example, when thinking about doing this in a bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (a balanced BST?) with an operation bigger(x), which for a given x returns the number of elements bigger than x in that set in logarithmic time. The problem is that we would need to merge such sets in O(log n), so this seems like a no-go, as I don't know any ordered-set-like data structure which supports quick merging.
I also thought about a top-down approach: a node v adds one to u.bigger for some node u if and only if u lies on the simple path from v to the root and u.key < v.key. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform depth-first search in given tree (starting from root node).
When any node is visited for the first time (coming from parent node), add its key to some order-statistics data structure (OSDS). At the same time query OSDS for number of keys larger than current key and initialize v.bigger with negated result of this query.
When any node is visited for the last time (coming from right child), query OSDS for number of keys larger than current key and add the result to v.bigger.
You could apply this algorithm to any rooted tree (not necessarily a binary tree). And it does not necessarily need parent pointers (you could use a DFS stack instead).
For OSDS you could use either augmented BST or Fenwick tree. In case of Fenwick tree you need to preprocess given tree so that values of the keys are compressed: just copy all the keys to an array, sort it, remove duplicates, then substitute keys by their indexes in this array.
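The DFS with a Fenwick tree as the OSDS can be sketched as below. This is an illustrative version, not a definitive implementation: the `Node` and `Fenwick` classes and the helper names are assumptions made for the example, and parent pointers are replaced by an explicit DFS stack. Keys are compressed to ranks first, as described above.

```python
from bisect import bisect_left


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.bigger = 0


class Fenwick:
    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)

    def add(self, rank):
        i = rank + 1
        while i <= self.size:
            self.tree[i] += 1
            i += i & -i

    def count_up_to(self, rank):
        # Number of inserted keys whose rank is <= rank.
        i, s = rank + 1, 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


def all_nodes(root):
    out, stack = [], [root]
    while stack:
        v = stack.pop()
        if v is not None:
            out.append(v)
            stack += [v.right, v.left]
    return out


def fill_bigger(root):
    ranks = sorted({v.key for v in all_nodes(root)})  # key compression
    osds = Fenwick(len(ranks))
    inserted = 0
    stack = [(root, False)]
    while stack:
        v, leaving = stack.pop()
        if v is None:
            continue
        r = bisect_left(ranks, v.key)
        if not leaving:
            # First visit: insert the key, start from the negated count.
            osds.add(r)
            inserted += 1
            v.bigger = -(inserted - osds.count_up_to(r))
            stack.append((v, True))
            stack.append((v.right, False))
            stack.append((v.left, False))
        else:
            # Last visit: the difference is what the subtree contributed.
            v.bigger += inserted - osds.count_up_to(r)
```

The count of larger keys at the first visit covers everything inserted before the subtree of v was entered; the count at the last visit additionally covers the subtree itself, so the difference is exactly the number of larger keys inside the subtree.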
Basic idea:
Using the bottom-up approach, each node will get two ordered lists of the values in the subtree from both sons and then find how many of them are bigger. When finished, pass the combined ordered list upwards.
Details:
Leaves:
Leaves obviously have v.bigger=0. The node above them creates a two item list of the values, updates itself and adds its own value to the list.
All other nodes:
Get both lists from sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in subtree). During the merge you can also find how many nodes qualify the condition and get the value of v.bigger for the node.
Why is this O(n logn)?
Every node in the tree counts through the number of nodes in its subtree. This means the root counts all the nodes in the tree, the sons of the root together count the number of nodes in the tree (yes, yes, minus 1 for the root), and so on: all nodes at the same height together count the number of nodes below them. So the number of nodes counted is at most number of nodes * height of the tree, which is O(n log n) when the tree is balanced (height O(log n)).
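A short sketch of this bottom-up merge, with assumed names (`Node`, `fill_bigger_merge`). Instead of counting during the merge, it merges the two sorted lists first and then locates v.key with a binary search, which gives the same count. The recursion is kept for readability and assumes the tree is shallow enough for Python's recursion limit.

```python
from bisect import bisect_right
from heapq import merge


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.bigger = 0


def fill_bigger_merge(v):
    """Sets v.bigger for the whole subtree; returns its keys in sorted order."""
    if v is None:
        return []
    left = fill_bigger_merge(v.left)
    right = fill_bigger_merge(v.right)
    merged = list(merge(left, right))      # linear merge of two sorted lists
    pos = bisect_right(merged, v.key)      # first position holding a key > v.key
    v.bigger = len(merged) - pos
    merged.insert(pos, v.key)              # pass the combined list upwards
    return merged
```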
What if for each node we keep a separate binary search tree (BST) which consists of nodes of the subtree rooted at that node.
For a node v at level k, merging the two subtrees v.left and v.right, which both have O(n/2^(k+1)) elements, is O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the right (traditionally) subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are a total of 2^k nodes at level k, so we have O(2^k*3*n/2^(k+1)) operations at level k, which simplifies to O(n) (dropping the 3/2 constant). In a balanced tree there are log(n) levels, hence we have O(n*log(n)) operations in total.

Partitioning a weighted tree to equally weighted subtrees

Input:
a rooted tree with n nodes;
each node p has positive integer weight w(p);
a node can have more than two children.
Problem:
divide the tree into k subtrees/partitions (obviously by removing k-1 edges);
subtree weight W(p) is the weight of all the nodes in a subtree rooted at node p;
all the subtrees should be weighted as evenly as possible - the difference between min(W(p)) and max(W(p)) should be as small as possible.
I've yet to find a suitable algorithm for this. Where should I start? Tips, instructions and pseudocode appreciated.
Assume you can't modify the tree other than to remove edges to create subtrees.
First, understand that simply removing edges cannot guarantee subtrees within an arbitrary bound. You can construct trees for which no split produces subtrees within a target bound. For example:
a(b(c,d,e,f),g)
You cannot split that into two balanced sections. The best you can do is remove the edge from a to b:
a(g) and b(c,d,e,f)
Also, this criterion is a little underdefined when k > 2. Which is better: a split of 10,10,10,1 or of 10,10,6,5?
But you can come up with a method to split trees up in the most balanced way possible.
Implement your tree such that each node holds the total weight of its subtree. You can add this pretty efficiently to any tree. (E.g. when you add a node, you iterate up the chain of parent nodes, adding to the total. When you remove a node, you iterate up, subtracting from the total.)
Then, starting from the root, iterate down in a breadth-first manner until you find a set of nodes that dominate their child nodes in the most balanced way. I don't have an algorithm for this at the ready, but I think you can find one pretty readily.
I think something like this: when you want to divide into k subtrees, you create an array of k tree roots. One of those nodes must always be the root of the current tree; then you iterate down looking for nodes that can replace one of the other k-1 candidates and improve the partitioning. You'll want some kind of terminating condition so that you don't iterate down to every leaf node. E.g. it never makes sense to subdivide anything but the largest candidate node.
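For the simplest case, k = 2, the idea can be sketched directly: compute every subtree weight W(p) and cut the edge above the node whose W(p) is closest to half of the total. This is only an illustration of that special case, not a solution for general k. The `children` and `weight` dictionaries and the function names are assumptions made for the example, and the tree is assumed to have at least two nodes.

```python
def subtree_weights(children, weight, root):
    # List nodes so that every parent appears before its children.
    order, stack = [], [root]
    while stack:
        p = stack.pop()
        order.append(p)
        stack.extend(children.get(p, []))
    # Process in reverse so children are finished before their parent.
    W = {}
    for p in reversed(order):
        W[p] = weight[p] + sum(W[c] for c in children.get(p, []))
    return W


def best_single_cut(children, weight, root):
    """Returns (p, W(p), rest) for the edge above p that balances best."""
    W = subtree_weights(children, weight, root)
    total = W[root]
    best = min((p for p in W if p != root),
               key=lambda p: abs(total - 2 * W[p]))
    return best, W[best], total - W[best]


# The example a(b(c,d,e,f),g), with every node given weight 1.
children = {"a": ["b", "g"], "b": ["c", "d", "e", "f"]}
weight = {name: 1 for name in "abcdefg"}
cut = best_single_cut(children, weight, "a")
```

On this example the best cut is the edge from a to b, leaving parts of weight 5 and 2, which matches the observation that the tree cannot be split evenly.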

Number of nodes in a B-Tree

How many nodes does the resulting B-Tree (min degree 2) have if I insert the numbers from 1 to n in order?
I tried inserting the numbers from 1 to 20 and got a series for the number of nodes, but I could not generalize it.
Can anyone please help me derive the formula for this?
It will depend on the order of the B-Tree. The order of a B-Tree is the maximum number of child nodes a non-leaf node may hold (which is one more than the maximum number of keys such a node could hold).
According to Knuth's definition, a B-tree of order m is a tree which satisfies the following properties:
Every node has at most m children.
Every non-leaf node (except root) has at least ⌈m⁄2⌉ children.
The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k−1 keys.
All leaves appear in the same level, and internal vertices carry no information.
So in your case, when you are inserting 20 keys and the order is m, you can use the conditions above to derive a set of inequalities that bound the possible number of nodes. But there is no closed-form equality that gives the number of internal nodes in a B-Tree.
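One way to explore the series experimentally is to simulate the insertions and count the nodes. The sketch below assumes the textbook insertion that splits every full node on the way down (minimum degree t, at most 2t-1 keys per node); other insertion variants can produce different counts. The class and function names are invented for the example.

```python
from bisect import bisect_right, insort


class BNode:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf


def split_child(parent, i, t):
    # Split the full child parent.children[i] around its median key.
    full = parent.children[i]
    right = BNode(leaf=full.leaf)
    median = full.keys[t - 1]
    right.keys = full.keys[t:]
    full.keys = full.keys[:t - 1]
    if not full.leaf:
        right.children = full.children[t:]
        full.children = full.children[:t]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


def insert(root, key, t):
    if len(root.keys) == 2 * t - 1:        # full root: the tree grows in height
        new_root = BNode(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0, t)
        root = new_root
    node = root
    while not node.leaf:
        i = bisect_right(node.keys, key)
        if len(node.children[i].keys) == 2 * t - 1:
            split_child(node, i, t)
            if key > node.keys[i]:         # the median moved up into node
                i += 1
        node = node.children[i]
    insort(node.keys, key)
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)


def nodes_after_inserting(n, t=2):
    root = BNode()
    for key in range(1, n + 1):
        root = insert(root, key, t)
    return count_nodes(root)
```

Calling `nodes_after_inserting(n)` for n from 1 to 20 reproduces the series under these assumptions, which can then be compared against hand-drawn trees.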

Find the maximum weight node in a tree if each node is the sum of the weights all the nodes under it.

For example, this is the tree.
10
12 -1
5 1 1 -2
2 3 10 -9
How to find the node with maximum value?
Given the problem as stated, you need to traverse the entire tree. See proof below.
Traversing the entire tree should be a fairly trivial process.
Proof that we need to traverse the entire tree:
Assume we're able to identify which side of a tree the maximum is on without traversing the entire tree.
Take any tree with the maximum node on the left, and call this maximum x.
Pick one of the leaf nodes on the right. Add 2 children to it: x+1 and -x-1.
Since x+1-x-1 = 0, adding these won't change the sum at the leaf we added it to, thus nor the sums at any other nodes in the tree.
Since this can be added to any leaf in the tree, and it doesn't affect the sums, we'd need to traverse the entire tree to find out if this occurs anywhere.
Thus our assumption that we can identify which side of a tree the maximum is on without traversing the entire tree is incorrect.
Thus we need to traverse the entire tree.
In the general case, you need to traverse the entire tree. If the values in the tree are not constrained (e.g. all non-negative, but in your example there are negative values), then the value in a node tells you nothing about the individual values below it.

Resources