linearizing a tree to an array and answering "sum" queries on paths - algorithm

The question is motivated by the travtree problem on CodeChef. The editorial recommends linearizing the tree into an array by recording, for each node, its discovery and exit times in a DFS traversal. Subtree-sum queries can then be answered quickly by summing the events that happened in the segment [discovery time, exit time] of that node (a Fenwick tree answers these range sums quickly).
However, to solve that problem we also need to answer path-sum queries quickly, that is, to sum the events that happened along the shortest path between a and b. How is that possible? The editorial's answer is this:
For each interesting event they perform these updates:
update(BT2,event_node,1);
update(BT2,out[event_node],-1);
and the path sum (a, b) is then:
int l = lca(a,b);
ans = query(BT2,a) + query(BT2,b) - query(BT2,l) - (l==1 ? 0 : query(BT2, parent[0][l]));
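A minimal C++ sketch of the editorial's formula, under our own assumptions: nodes are numbered 1..n and rooted at 1, tin[v] is the preorder index, tout[v] is the largest preorder index inside v's subtree, and an event at v adds +1 at tin[v] and -1 at tout[v] + 1. With that layout, the prefix sum at tin[x] counts exactly the events on the root-to-x path, which is why the four-term formula works. The names (Fenwick, PathEvents, addEvent, pathSum) are hypothetical; the LCA uses naive parent climbing for brevity (the editorial's parent[0][l] suggests binary lifting), and the DFS is recursive, so very deep trees may need an iterative version.

```cpp
#include <vector>

// Fenwick tree (binary indexed tree): point add and prefix sum over positions 1..n.
struct Fenwick {
    std::vector<long long> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long d) { for (; i < (int)t.size(); i += i & -i) t[i] += d; }
    long long prefix(int i) const { long long s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

// Path sums via ancestor prefix sums. Nodes are 1..n, rooted at 1; node 0 is a virtual parent of the root.
struct PathEvents {
    std::vector<std::vector<int>> adj;
    std::vector<int> tin, tout, par, depth;
    Fenwick bit;
    int timer = 0;

    explicit PathEvents(int n)
        : adj(n + 1), tin(n + 1), tout(n + 1), par(n + 1, 0), depth(n + 1, 0), bit(n + 1) {}
    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }

    // tin[v] = preorder index, tout[v] = largest preorder index in v's subtree.
    void dfs(int v, int p) {
        par[v] = p;
        tin[v] = ++timer;
        for (int w : adj[v])
            if (w != p) { depth[w] = depth[v] + 1; dfs(w, v); }
        tout[v] = timer;
    }
    void build() { dfs(1, 0); }

    // +c on v's whole subtree range, so every descendant's prefix sum sees this event.
    void addEvent(int v, long long c = 1) { bit.add(tin[v], c); bit.add(tout[v] + 1, -c); }

    // Events on the root-to-v path; 0 for the virtual node 0.
    long long rootPath(int v) const { return v == 0 ? 0 : bit.prefix(tin[v]); }

    // Naive LCA by climbing parents, O(depth); binary lifting would make it O(log n).
    int lca(int a, int b) const {
        while (depth[a] > depth[b]) a = par[a];
        while (depth[b] > depth[a]) b = par[b];
        while (a != b) { a = par[a]; b = par[b]; }
        return a;
    }

    // The editorial's formula: P(a) + P(b) - P(l) - P(parent(l)).
    long long pathSum(int a, int b) const {
        int l = lca(a, b);
        return rootPath(a) + rootPath(b) - rootPath(l) - rootPath(par[l]);
    }
};
```

For example, on the tree 1-2, 1-3, 2-4, 2-5, 3-6 with one event at each of nodes 2, 4 and 6, the path 4-2-1-3-6 contains all three events.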
where query is the prefix sum. How is that correct? The prefix sum up to a may include lots of nodes that are irrelevant to the path between l and a!

To linearize a path-sum query (the sum of the events that happened on the shortest path between tree nodes a and b), we indeed do the following:
When an event happens at node v, we call update(IN[v], 1) and update(OUT[v], -1), where IN[v] is the node's DFS discovery time and OUT[v] its DFS exit time (the argument below assumes both come from a single counter that advances on entry and on exit, so all 2n timestamps are distinct).
Now, if a is an ancestor of b, the query is query(IN[b]) - query(IN[a]-1). query(IN[b]) is a prefix sum: it starts from the root and follows the traversal until it reaches b. Every node v that is not on the direct path from the root to b and was discovered before b has also been exited before b is discovered, so its +1 and -1 cancel. Only the nodes on that path are discovered and not yet exited. Because of the way we updated, we effectively sum the nodes on the path from the root to b (including b).
The same happens in query(IN[a]-1): it is the sum of the nodes on the path from the root to a, this time excluding a. Subtracting the two leaves exactly the nodes on the path from a to b. For an arbitrary pair a, b, split the path at l = LCA(a, b) and combine the vertical pieces, which is what the editorial's four-term formula does. Draw a sketch and you'll see it for yourself.
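The argument can be checked with a small sketch. It assumes (our choice) one counter advanced on both entry and exit, so IN and OUT are distinct timestamps in 1..2n and the Fenwick tree has 2n slots. The struct and method names are hypothetical, and verticalPathSum only covers the case where a is an ancestor of b (or a == b).

```cpp
#include <vector>

// Fenwick tree (binary indexed tree): point add and prefix sum over positions 1..n.
struct Fenwick {
    std::vector<long long> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long d) { for (; i < (int)t.size(); i += i & -i) t[i] += d; }
    long long prefix(int i) const { long long s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

// Euler tour with one shared timer: IN[v] and OUT[v] are distinct timestamps in 1..2n.
struct EulerTourPath {
    std::vector<std::vector<int>> adj;
    std::vector<int> in, out;
    Fenwick bit;
    int timer = 0;

    explicit EulerTourPath(int n) : adj(n + 1), in(n + 1), out(n + 1), bit(2 * n) {}
    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }

    void dfs(int v, int p) {
        in[v] = ++timer;   // discovery
        for (int w : adj[v])
            if (w != p) dfs(w, v);
        out[v] = ++timer;  // exit
    }
    void build(int root = 1) { dfs(root, 0); }

    // +c at discovery, -c at exit: a node contributes to query(t) only while it is still open at time t.
    void addEvent(int v, long long c = 1) { bit.add(in[v], c); bit.add(out[v], -c); }

    // Sum of events on the a..b path; valid only when a is an ancestor of b (or a == b).
    long long verticalPathSum(int a, int b) const {
        return bit.prefix(in[b]) - bit.prefix(in[a] - 1);
    }
};
```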
For completeness: the method for subtree sums differs in both the update and the query. For each event you only call update(IN[v], 1). To get the subtree sum of a, compute query(OUT[a]) - query(IN[a]-1). query(OUT[a]) sums all the nodes traversed until we discover a, plus all the nodes in a's subtree until we exit it. Subtracting query(IN[a]-1), which covers all the nodes before we discover a, leaves exactly a's subtree.
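The subtree variant on the same two-timestamp tour might look like this. It is again a sketch with hypothetical names, assuming one shared entry/exit counter; only the +1 at IN[v] is recorded per event.

```cpp
#include <vector>

// Fenwick tree (binary indexed tree): point add and prefix sum over positions 1..n.
struct Fenwick {
    std::vector<long long> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long d) { for (; i < (int)t.size(); i += i & -i) t[i] += d; }
    long long prefix(int i) const { long long s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

struct EulerTourSubtree {
    std::vector<std::vector<int>> adj;
    std::vector<int> in, out;
    Fenwick bit;
    int timer = 0;

    explicit EulerTourSubtree(int n) : adj(n + 1), in(n + 1), out(n + 1), bit(2 * n) {}
    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }

    void dfs(int v, int p) {
        in[v] = ++timer;
        for (int w : adj[v])
            if (w != p) dfs(w, v);
        out[v] = ++timer;  // every descendant's IN lies strictly between in[v] and out[v]
    }
    void build(int root = 1) { dfs(root, 0); }

    // Only the discovery time is marked for subtree queries.
    void addEvent(int v, long long c = 1) { bit.add(in[v], c); }

    long long subtreeSum(int a) const { return bit.prefix(out[a]) - bit.prefix(in[a] - 1); }
};
```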

Related

Find something(min/max/unique) for path between u and v node in a graph for every query

https://www.hackerearth.com/challenges/hiring/sap-labs-java-hiring-challenge/algorithm/micro-and-internship-10/description/
In query-based questions like the one above, where every query asks for the path between two nodes and some operation on that path, what approach should I use? I have tried DFS, but it gives runtime errors as well as Time Limit Exceeded.
DFS algorithm
First, let's solve the problem with a plain DFS for each query. We need a boolean array V of length 101 (since the values on the nodes can't exceed 100), and as we traverse each node u, if V[a[u]] is 0 we set it to 1 and increment the answer.
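A sketch of the per-query DFS, assuming (as the answer does) that node values lie in 1..100 and that nodes are numbered from 1. It records parents while searching from u, then walks back from v to u marking values. The function name distinctOnPath is hypothetical. Each query costs O(n), which is why this approach times out on large inputs.

```cpp
#include <array>
#include <vector>

// Counts distinct node values on the u-v path; values assumed to be in 1..100.
int distinctOnPath(const std::vector<std::vector<int>>& adj, const std::vector<int>& val, int u, int v) {
    int n = (int)adj.size();
    std::vector<int> parent(n, -1);  // -1 = not visited yet
    std::vector<int> stack{u};
    parent[u] = u;                   // mark the start as visited
    while (!stack.empty()) {         // iterative DFS avoids deep recursion
        int x = stack.back();
        stack.pop_back();
        if (x == v) break;
        for (int y : adj[x])
            if (parent[y] == -1) { parent[y] = x; stack.push_back(y); }
    }
    std::array<bool, 101> seen{};    // the boolean array V from the text
    int answer = 0;
    for (int x = v;; x = parent[x]) {  // walk the unique path back from v to u
        if (!seen[val[x]]) { seen[val[x]] = true; ++answer; }
        if (x == u) break;
    }
    return answer;
}
```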
To solve a similar problem, such as finding the minimum edge between two nodes, we need a sparse table (also known as binary lifting), the same structure used to find the LCA. If you don't know what a sparse table is, I recommend reading about it here: https://www.hackerrank.com/topics/lowest-common-ancestor (the third part). Briefly, it finds the LCA in O(log n) after O(n log n) preprocessing.
If we take two nodes u and v and want the minimum weight on the path between them, that is the minimum of the minimum weight from u to LCA(u, v) and the minimum weight from v to LCA(u, v).
Why is that useful? In the sparse table, instead of only storing where a node goes when it jumps up by height h, we also store the minimum edge on that jump. Then each query can be answered in O(log n).
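Here is a possible sparse-table (binary-lifting) sketch for the minimum-edge variant. It assumes nodes 0..n-1 rooted at 0 and an adjacency list of (neighbor, weight) pairs; up[k][v] is the 2^k-th ancestor of v and mn[k][v] is the minimum edge weight on that jump. The struct name MinEdgeLifting is our own.

```cpp
#include <algorithm>
#include <climits>
#include <utility>
#include <vector>

struct MinEdgeLifting {
    int LOG;
    std::vector<int> depth;
    std::vector<std::vector<int>> up, mn;  // up[k][v]: 2^k-th ancestor; mn[k][v]: min edge on that jump

    explicit MinEdgeLifting(const std::vector<std::vector<std::pair<int, int>>>& adj) {
        int n = (int)adj.size();
        LOG = 1;
        while ((1 << LOG) < n) ++LOG;
        depth.assign(n, 0);
        up.assign(LOG, std::vector<int>(n, 0));        // the root's ancestor is itself
        mn.assign(LOG, std::vector<int>(n, INT_MAX));
        // Iterative DFS from the root: parents, depths and parent-edge weights.
        std::vector<int> stack{0};
        std::vector<bool> seen(n, false);
        seen[0] = true;
        while (!stack.empty()) {
            int v = stack.back();
            stack.pop_back();
            for (auto [w, c] : adj[v])
                if (!seen[w]) {
                    seen[w] = true; depth[w] = depth[v] + 1;
                    up[0][w] = v; mn[0][w] = c;
                    stack.push_back(w);
                }
        }
        // A jump of 2^k is two jumps of 2^(k-1).
        for (int k = 1; k < LOG; ++k)
            for (int v = 0; v < n; ++v) {
                int mid = up[k - 1][v];
                up[k][v] = up[k - 1][mid];
                mn[k][v] = std::min(mn[k - 1][v], mn[k - 1][mid]);
            }
    }

    // Minimum edge weight on the u-v path (INT_MAX if u == v).
    int query(int u, int v) const {
        int best = INT_MAX;
        if (depth[u] < depth[v]) std::swap(u, v);
        int diff = depth[u] - depth[v];
        for (int k = 0; k < LOG; ++k)  // lift u to v's depth
            if (diff >> k & 1) { best = std::min(best, mn[k][u]); u = up[k][u]; }
        if (u == v) return best;
        for (int k = LOG - 1; k >= 0; --k)  // lift both while they stay below the LCA
            if (up[k][u] != up[k][v]) {
                best = std::min({best, mn[k][u], mn[k][v]});
                u = up[k][u]; v = up[k][v];
            }
        return std::min({best, mn[0][u], mn[0][v]});  // the two edges into the LCA
    }
};
```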
The same applies to this problem, but instead of storing the minimum value we store a boolean array, as in the first approach, where the i-th entry is 1 if a value i occurs on the jump of height h and 0 otherwise. By combining these arrays (a logical OR) along the jumps of the path, you can answer each query.
This solution makes about 100 * log n operations per query, since we iterate through the boolean array.
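The boolean-array version could be sketched with std::bitset<101> standing in for the boolean array, so merging two jumps is a bitwise OR and the answer is count(). Assumptions: values in 1..100, nodes 0..n-1 rooted at 0, and seen[k][v] holds the values of the 2^k nodes starting at v (the 2^k-th ancestor excluded). The names are hypothetical.

```cpp
#include <bitset>
#include <utility>
#include <vector>

using ValueSet = std::bitset<101>;  // bit i set <=> value i occurs; values assumed in 1..100

struct DistinctLifting {
    int LOG;
    std::vector<int> depth, val;
    std::vector<std::vector<int>> up;         // up[k][v]: 2^k-th ancestor of v
    std::vector<std::vector<ValueSet>> seen;  // seen[k][v]: values of v, parent(v), ... (2^k nodes)

    DistinctLifting(const std::vector<std::vector<int>>& adj, std::vector<int> values)
        : val(std::move(values)) {
        int n = (int)adj.size();
        LOG = 1;
        while ((1 << LOG) < n) ++LOG;
        depth.assign(n, 0);
        up.assign(LOG, std::vector<int>(n, 0));
        seen.assign(LOG, std::vector<ValueSet>(n));
        std::vector<int> stack{0};
        std::vector<bool> visited(n, false);
        visited[0] = true;
        while (!stack.empty()) {  // iterative DFS for parents and depths
            int v = stack.back();
            stack.pop_back();
            for (int w : adj[v])
                if (!visited[w]) { visited[w] = true; depth[w] = depth[v] + 1; up[0][w] = v; stack.push_back(w); }
        }
        for (int v = 0; v < n; ++v) seen[0][v].set(val[v]);
        for (int k = 1; k < LOG; ++k)
            for (int v = 0; v < n; ++v) {
                int mid = up[k - 1][v];
                up[k][v] = up[k - 1][mid];
                seen[k][v] = seen[k - 1][v] | seen[k - 1][mid];  // OR of the two half-jumps
            }
    }

    // Number of distinct values on the u-v path.
    int query(int u, int v) const {
        ValueSet acc;
        if (depth[u] < depth[v]) std::swap(u, v);
        int diff = depth[u] - depth[v];
        for (int k = 0; k < LOG; ++k)
            if (diff >> k & 1) { acc |= seen[k][u]; u = up[k][u]; }
        if (u != v) {
            for (int k = LOG - 1; k >= 0; --k)
                if (up[k][u] != up[k][v]) { acc |= seen[k][u] | seen[k][v]; u = up[k][u]; v = up[k][v]; }
            acc |= seen[0][u] | seen[0][v];
            u = up[0][u];
        }
        acc.set(val[u]);  // u is now the LCA
        return (int)acc.count();
    }
};
```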

Queries on Tree Path with Modifications

Question:
You are given a tree with n nodes (up to 10^5) and n-1 bidirectional edges. Each node contains two values:
Its index (just a unique number for the node), from 1 to n.
Its value Vi, which can range from 1 to 10^8.
There will be multiple queries of the same type (up to 10^5) on this same tree, as follows:
You are given node1, node2 and a value P (from 1 to 10^8).
For each such query, find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between any two nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log n), or anything else that runs within 1 second under the given constraints.
What I have Tried:
(A). I could solve it easily if P were fixed, using an LCA approach in O(n log n) by storing the following at each node:
The number of nodes with value less than P on the path from the root to that node.
But here P varies too much, so this doesn't help.
(B). Another approach I considered is a plain DFS per query, but that takes O(nq), where q is the number of queries. Since n and q can both be up to 10^5, this won't fit in the time limit either.
I couldn't think of anything else. Any help would be appreciated. :)
Source:
I think I read this problem on SPOJ, but I can't find it now. I searched the web (Codeforces, CodeChef, SPOJ, StackOverflow) but couldn't find a solution anywhere.
Let ans(v, P) be the answer for the vertical path from the root to v (both inclusive) and the given value of P.
How can we compute it? There's a simple offline solution: store all queries for a given node in a vector associated with it, then run a depth-first search that keeps all values on the current path from the root in a data structure that supports the following:
add a value
delete a value
count the number of elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: since you know all the queries beforehand, you can compress the numbers into the [0..n - 1] range and use a binary indexed tree.
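The compressed binary indexed tree can be sketched as a small multiset class. It assumes every value that will ever be added is known up front (here, the node values); the query threshold X may be any number, because lower_bound turns it into the number of smaller ranks. The class name CountingMultiset is hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Multiset over a known universe of values, backed by a Fenwick tree of counts per compressed rank.
class CountingMultiset {
    std::vector<long long> keys;  // sorted distinct values that may ever be inserted
    std::vector<int> tree;        // 1-based Fenwick array
    int rankOf(long long x) const {  // number of keys strictly smaller than x
        return int(std::lower_bound(keys.begin(), keys.end(), x) - keys.begin());
    }
    void update(int rank, int d) { for (int i = rank + 1; i < (int)tree.size(); i += i & -i) tree[i] += d; }

public:
    explicit CountingMultiset(std::vector<long long> universe) : keys(std::move(universe)) {
        std::sort(keys.begin(), keys.end());
        keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
        tree.assign(keys.size() + 1, 0);
    }
    void add(long long x) { update(rankOf(x), +1); }     // x must belong to the universe
    void remove(long long x) { update(rankOf(x), -1); }  // x must currently be present
    // Number of stored elements strictly smaller than X.
    int countLess(long long X) const {
        int s = 0;
        for (int i = rankOf(X); i > 0; i -= i & -i) s += tree[i];
        return s;
    }
};
```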
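Back to the original problem: the answer to a (u, v, P) query is ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P), plus one if the value of LCA(u, v) itself is less than P (subtracting ans(LCA(u, v), P) twice also removes the LCA node, which lies on the path).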
That's it. The time complexity is O((N + Q) log N).
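Putting it together, a possible offline implementation: it attaches ans(u, P), ans(v, P) and -2 * ans(LCA, P) to their nodes, adds 1 when the LCA's own value is below P, and answers everything in one DFS that keeps the current root path in a Fenwick tree over compressed values. Nodes are 0..n-1 rooted at 0; the names countLessOnPaths and PathQuery are our own, and the final DFS is recursive, so a chain of 10^5 nodes may need a larger stack or an iterative rewrite.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

struct PathQuery { int u, v; long long P; };

std::vector<long long> countLessOnPaths(const std::vector<std::vector<int>>& adj,
                                        const std::vector<long long>& vals,
                                        const std::vector<PathQuery>& qs) {
    const int n = (int)adj.size();
    int LOG = 1;
    while ((1 << LOG) < n) ++LOG;

    // 1) Parents and depths by BFS from root 0, then a binary-lifting table for LCA.
    std::vector<int> depth(n, 0), order{0};
    std::vector<std::vector<int>> up(LOG, std::vector<int>(n, 0));
    std::vector<bool> seen(n, false);
    seen[0] = true;
    for (size_t i = 0; i < order.size(); ++i)
        for (int w : adj[order[i]])
            if (!seen[w]) { seen[w] = true; up[0][w] = order[i]; depth[w] = depth[order[i]] + 1; order.push_back(w); }
    for (int k = 1; k < LOG; ++k)
        for (int v = 0; v < n; ++v) up[k][v] = up[k - 1][up[k - 1][v]];
    auto lca = [&](int a, int b) {
        if (depth[a] < depth[b]) std::swap(a, b);
        for (int k = LOG - 1; k >= 0; --k)
            if (depth[a] - (1 << k) >= depth[b]) a = up[k][a];
        if (a == b) return a;
        for (int k = LOG - 1; k >= 0; --k)
            if (up[k][a] != up[k][b]) { a = up[k][a]; b = up[k][b]; }
        return up[0][a];
    };

    // 2) Attach the vertical-path requests with signs +1, +1, -2; add the LCA's own contribution.
    std::vector<std::vector<std::pair<int, int>>> at(n);  // (query index, sign)
    std::vector<long long> result(qs.size(), 0);
    for (int i = 0; i < (int)qs.size(); ++i) {
        int l = lca(qs[i].u, qs[i].v);
        at[qs[i].u].push_back({i, +1});
        at[qs[i].v].push_back({i, +1});
        at[l].push_back({i, -2});
        if (vals[l] < qs[i].P) result[i] += 1;
    }

    // 3) DFS keeping the values on the current root path in a Fenwick tree over compressed ranks.
    std::vector<long long> keys(vals);
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    auto rankOf = [&](long long x) { return int(std::lower_bound(keys.begin(), keys.end(), x) - keys.begin()); };
    std::vector<int> bit(keys.size() + 1, 0);
    auto add = [&](int r, int d) { for (++r; r < (int)bit.size(); r += r & -r) bit[r] += d; };
    auto countLess = [&](long long P) { int s = 0; for (int r = rankOf(P); r > 0; r -= r & -r) s += bit[r]; return s; };

    std::function<void(int, int)> dfs = [&](int v, int parent) {
        add(rankOf(vals[v]), +1);  // v joins the root path, so ans(v, P) = countLess(P)
        for (auto [i, sign] : at[v]) result[i] += (long long)sign * countLess(qs[i].P);
        for (int w : adj[v])
            if (w != parent) dfs(w, v);
        add(rankOf(vals[v]), -1);  // v leaves the root path
    };
    dfs(0, -1);
    return result;
}
```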

RB tree with sum

I have some questions about augmenting data structures:
Let S = {k1, ..., kn} be a set of numbers. Design an efficient data structure for S that supports the following two operations: Insert(S, k), which inserts the number k into S (you can assume that k is not contained in S yet), and TotalGreater(S, a), which returns the sum of all keys ki ∈ S that are larger than a, that is, $\sum_{k_i \in S,\ k_i > a} k_i$.
Argue the running time of both operations and give pseudo-code for TotalGreater(S, a) (do not give pseudo-code for Insert(S, k)).
I don't understand how to do this. I was thinking of adding an extra field called sum to the RB-tree, but then it doesn't work, because sometimes I need only the sum of the left nodes and sometimes I need the sum of the right nodes too.
So I was thinking of adding 2 fields called leftSum and rightSum and if the current node is > GivenValue then add the cached value of the sum of the sub nodes to the current sum value.
Can someone please help me with this?
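You can just add a field sum to each node: the sum of the keys in the subtree rooted at that node (a size field, the number of nodes in the subtree, would make the same walk count the keys instead). When searching for the node with the smallest key larger than a, two things can happen on the path to that node: you go left or right. Every time you go left, add the current node's key plus the sum of its right child's subtree to the running total. Every time you go right, do nothing.
There are two termination conditions: 1) you find a node containing exactly a, in which case you add the sum of its right child's subtree to the total; 2) you reach a leaf, in which case you add its key if it is larger than a, or nothing if it is smaller. The walk follows a single root-to-leaf path, so TotalGreater takes O(log n) in a red-black tree. Insert also stays O(log n), because a node's sum depends only on its key and its children and can be recomputed in O(1) for each node touched by a rotation.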
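A sketch of TotalGreater with a subtree-sum field. To keep it short it uses an unbalanced BST rather than a red-black tree (an assumption, not the answer's structure); in a red-black tree the same sum field would be recomputed from the children after each rotation. Node, insert and totalGreater are hypothetical names.

```cpp
#include <memory>

// BST node augmented with the sum of all keys in its subtree (balancing omitted).
struct Node {
    long long key, sum;  // sum = key + sum(left subtree) + sum(right subtree)
    std::unique_ptr<Node> left, right;
    explicit Node(long long k) : key(k), sum(k) {}
};

long long subtreeSum(const std::unique_ptr<Node>& t) { return t ? t->sum : 0; }

// Insert k (assumed not present); every node on the search path gains k in its sum.
void insert(std::unique_ptr<Node>& t, long long k) {
    if (!t) { t = std::make_unique<Node>(k); return; }
    t->sum += k;
    insert(k < t->key ? t->left : t->right, k);
}

// Sum of all keys > a, following a single root-to-leaf path.
long long totalGreater(const std::unique_ptr<Node>& root, long long a) {
    long long total = 0;
    const Node* p = root.get();
    while (p) {
        if (p->key > a) {         // p and its whole right subtree exceed a; continue left
            total += p->key + subtreeSum(p->right);
            p = p->left.get();
        } else if (p->key < a) {  // p and its left subtree are below a; continue right
            p = p->right.get();
        } else {                  // exact match: only the right subtree counts
            total += subtreeSum(p->right);
            break;
        }
    }
    return total;
}
```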
As Jordi describes, the keyword is augmented red-black tree.

locating lowest common ancestor in AVL tree

I have an AVL tree and two keys in it. How do I find the lowest common ancestor (by lowest I mean height, not value) in O(log n)?
I've seen an answer here on Stack Overflow, but I admit I didn't fully understand it. It involved finding the routes from each key to the root and then comparing them. I'm not sure how this meets the complexity requirements.
For the first node you move up and mark the nodes. For the second node you move up and look if a node on the path is marked. As soon as you find a marked node you can stop. (And remove the marks by doing the first path again).
If you cannot mark nodes in the tree directly then modify the values contained to include a place where you can mark. If you cannot do this either then add a hashmap that stores which nodes are marked.
This is O(log n) because the tree is O(log n) deep and at worst you walk to the root three times (this assumes each node stores a pointer to its parent).
Also, if you wish you can alternate steps of the two paths instead of first walking the first path completely. (Note that then both paths have to check for marks.) This might be better if you expect the two nodes to have their ancestor somewhat locally. The asymptotic runtime is the same as above.
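The marking approach, sketched under the assumption that every node has a parent pointer. It uses the hash-set variant, so no unmarking walk is needed. TreeNode and lcaByMarking are hypothetical names.

```cpp
#include <unordered_set>

struct TreeNode {
    int key;
    const TreeNode* parent;  // nullptr at the root
};

// Mark every ancestor of x (including x), then climb from y until a marked node appears.
const TreeNode* lcaByMarking(const TreeNode* x, const TreeNode* y) {
    std::unordered_set<const TreeNode*> marked;
    for (const TreeNode* p = x; p; p = p->parent) marked.insert(p);
    for (const TreeNode* p = y; p; p = p->parent)
        if (marked.count(p)) return p;
    return nullptr;  // x and y are not in the same tree
}
```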
A better solution for the AVL tree (a balanced binary search tree) is the following (using C pointer-like notation):
1. Let K1 and K2 be the two keys whose LCA is to be found. Assume K1 < K2.
2. Let pointer P = root of the tree.
3. If P->key >= K1 and P->key <= K2: return P.
4. Else if P->key > K1 and P->key > K2: P = P->left.
5. Else: P = P->right.
6. Repeat steps 3 to 5.
7. The returned P points to the required LCA.
Note that this approach works only for a BST, not for an arbitrary binary tree.
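The procedure translates directly to a loop. A minimal C++ sketch, assuming a BstNode struct with key, left and right (hypothetical names); it swaps the keys if K1 > K2 instead of assuming the order.

```cpp
#include <utility>

struct BstNode {
    int key;
    BstNode* left;
    BstNode* right;
};

// Walk down from the root until P->key lies between the two keys.
const BstNode* bstLca(const BstNode* root, int k1, int k2) {
    if (k1 > k2) std::swap(k1, k2);  // ensure K1 <= K2
    const BstNode* p = root;
    while (p) {
        if (p->key >= k1 && p->key <= k2) return p;         // the keys split here
        else if (p->key > k1 && p->key > k2) p = p->left;   // both keys are smaller
        else p = p->right;                                  // both keys are larger
    }
    return nullptr;  // only for an empty tree or keys not under this root
}
```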

Checking if A is a part of binary tree B

Let's say I have binary trees A and B and I want to know whether A is a "part" of B. I am not only talking about subtrees: I want to know whether B has all the nodes and edges that A has.
My thought was that since a tree is essentially a graph, I could view this question as a subgraph isomorphism problem (i.e. checking whether A is a subgraph of B). But according to Wikipedia, that problem is NP-complete.
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.
Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.
Thanks in advance!
EDIT: I realized that even a brute-force method for my question would take only O(m * n) in the worst case, which is polynomial. So I guess this isn't an NP-complete problem after all. My next question, then: is there an algorithm faster than O(m * n)?
I would approach this problem in two steps:
1. Find the root of A in B (with either BFS or DFS).
2. Verify that A is contained in B (given that starting node), using a recursive algorithm such as the one below (I concocted some pseudo-language, because you didn't specify a language; it should be understandable whatever your background). Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1).
function checkTrees(node a, node b) returns boolean
    if a does not exist or b does not exist then
        // base of the recursion
        return false
    else if a is different from b then
        // compare the current nodes
        return false
    else
        // check the children of a
        boolean leftFound = true
        boolean rightFound = true
        if a.left exists then
            // try to match the left child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.left, b.left)
                or checkTrees(a.left, b.right)
                or checkTrees(a.left, b.parent)
        if a.right exists then
            // try to match the right child of a with
            // every possible neighbor of b
            rightFound = checkTrees(a.right, b.left)
                or checkTrees(a.right, b.right)
                or checkTrees(a.right, b.parent)
        return leftFound and rightFound
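A C++ rendering of the two steps, assuming node values are unique (the answer's own assumption) and that each B node stores a parent pointer. BNode, findByValue, checkTrees and isPartOf are hypothetical names.

```cpp
struct BNode {
    int val;
    BNode* left = nullptr;
    BNode* right = nullptr;
    BNode* parent = nullptr;  // used because a match may continue upwards in B
};

// Step 1: find the node of B carrying the value of A's root (preorder DFS).
const BNode* findByValue(const BNode* b, int val) {
    if (!b || b->val == val) return b;
    if (const BNode* hit = findByValue(b->left, val)) return hit;
    return findByValue(b->right, val);
}

// Step 2: every child of a must match some neighbor (left, right or parent) of b.
bool checkTrees(const BNode* a, const BNode* b) {
    if (!a || !b) return false;          // base of the recursion
    if (a->val != b->val) return false;  // compare the current nodes
    bool leftFound = true, rightFound = true;
    if (a->left)
        leftFound = checkTrees(a->left, b->left) || checkTrees(a->left, b->right) ||
                    checkTrees(a->left, b->parent);
    if (a->right)
        rightFound = checkTrees(a->right, b->left) || checkTrees(a->right, b->right) ||
                     checkTrees(a->right, b->parent);
    return leftFound && rightFound;
}

bool isPartOf(const BNode* rootA, const BNode* rootB) {
    const BNode* start = findByValue(rootB, rootA->val);
    return start && checkTrees(rootA, start);
}
```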
About the running time: let m be the number of nodes in A and n the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, which might be wrong: I assumed that every node of A is equal to at most one node of B. If that holds, the second step runs in O(m) (because you can never search too far in the wrong direction), so the total running time is O(m + n).
While writing down my assumption, I started to wonder whether it oversimplifies your case...
You could compare the trees bottom-up as follows:
For each leaf in tree A, identify the corresponding node in tree B.
Start a parallel traversal towards the root in both trees from the nodes just matched.
Specifically, move to the parent of a node in A, then move towards the root in B until you encounter the corresponding node in B (proceed), a marked node in A (see below; if a match in B is found, proceed, else fail), or the root of B (fail).
Mark all nodes visited in A.
You succeed if you haven't failed ;-).
The main part of the algorithm runs in O(e_B): in the worst case, every edge in B is visited a constant number of times. The leaf matching runs in O(n_A * log n_B) if the B vertices are sorted, and otherwise in O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) (sort each node set, then scan the results linearly).
EDIT:
Re-reading your question, step 2 above is even easier: for matching nodes in A and B, their parents must match too (otherwise the edge sets would differ). This has no effect on the worst-case running time, of course.
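If node labels are unique, the observation in the EDIT suggests an even simpler check: every label of A must occur in B, and every non-root node of A must have the same parent label in B. This sketch treats edges as parent-child pairs and ignores left/right sides; both are our assumptions, not the answer's. With hash maps it runs in expected O(n_A + n_B) time. Node, collectParents and containedByParents are hypothetical names.

```cpp
#include <optional>
#include <unordered_map>

struct Node {
    int val;  // assumed unique within each tree
    Node* left = nullptr;
    Node* right = nullptr;
};

using ParentMap = std::unordered_map<int, std::optional<int>>;

// Record, for every label in the tree, the label of its parent (std::nullopt for the root).
void collectParents(const Node* t, std::optional<int> parent, ParentMap& out) {
    if (!t) return;
    out[t->val] = parent;
    collectParents(t->left, t->val, out);
    collectParents(t->right, t->val, out);
}

// A is part of B if all of A's labels occur in B and every non-root node of A has the same parent in B.
bool containedByParents(const Node* a, const Node* b) {
    ParentMap inA, inB;
    collectParents(a, std::nullopt, inA);
    collectParents(b, std::nullopt, inB);
    for (const auto& [label, parent] : inA) {
        auto it = inB.find(label);
        if (it == inB.end()) return false;                // node missing in B
        if (parent && it->second != parent) return false; // edge parent->label missing in B
    }
    return true;
}
```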

Resources