Queries on Tree Path with Modifications - algorithm

Question:
You are given a tree with n nodes (n can be up to 10^5) and n-1 bidirectional edges. Each node contains two values:
Its index (just a unique number for the node), which runs from 1 to n.
Its value Vi, which can vary from 1 to 10^8.
Now there will be multiple queries of the same type (the number of queries can be up to 10^5) on this same tree, as follows:
You are given node1, node2 and a value P (which can vary from 1 to 10^8).
For every such query, you have to find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between every pair of nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log n), or something comparable; it should run in 1 second under the given constraints.
What I have tried:
(A) I could solve it easily if the value of P were fixed, using an LCA approach in O(n log n), by storing at each node the number of nodes with value less than P on the path from the root to that node. But here P varies far too much, so this does not help.
(B) The other approach I was thinking of is a plain DFS per query. But that takes O(nq), where q is the number of queries, and since n and q both go up to 10^5, this does not fit the time limit either.
I could not think of anything else. Any help would be appreciated. :)
Source:
I read this problem somewhere, on SPOJ I guess, but cannot find it now. I tried searching the web but could not find a solution for it anywhere (Codeforces, CodeChef, SPOJ, StackOverflow).

Let ans(v, P) be the answer on the vertical path from the root to v for the given value P.
How can we compute it? There is a simple offline solution: store all the queries for a given node in a vector associated with that node, then run a depth-first search, keeping all values on the current root-to-node path in a data structure that can do the following:
add a value
delete a value
count the number of elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: since you know all the queries beforehand, you can compress the values so that they lie in the range [0..n - 1] and use a binary indexed tree (Fenwick tree).
Back to the original problem: the answer to a (u, v, P) query is ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P), plus 1 if the value at LCA(u, v) itself is less than P (the subtraction cancels out the LCA's own contribution).
That's it. The time complexity is O((N + Q) log N).
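For concreteness, here is a rough C++ sketch of that offline approach (the function and array names, the binary-lifting LCA and the recursive DFS are my own choices, so treat it as an illustration of the idea rather than a reference implementation). It also adds the +1 for the LCA's own value mentioned above.

#include <bits/stdc++.h>
using namespace std;

int n, q, LOG;
vector<vector<int>> adj, up;                 // adjacency list, binary-lifting table
vector<int> val, dep, qP, corr;              // node values, depths, per-query rank of P, LCA correction
vector<long long> fen, res;                  // Fenwick array, per-query answers
vector<int> sortedVals;                      // sorted distinct node values (for compression)
vector<vector<pair<int,int>>> atNode;        // (query id, coefficient) attached to each node

void fenAdd(int i, long long d) { for (; i < (int)fen.size(); i += i & -i) fen[i] += d; }
long long fenSum(int i) { long long s = 0; for (; i > 0; i -= i & -i) s += fen[i]; return s; }

void dfsLift(int v, int p) {                 // fill depths and the binary-lifting table
    up[0][v] = p;
    for (int k = 1; k < LOG; ++k) up[k][v] = up[k-1][up[k-1][v]];
    for (int to : adj[v]) if (to != p) { dep[to] = dep[v] + 1; dfsLift(to, v); }
}

int lca(int a, int b) {
    if (dep[a] < dep[b]) swap(a, b);
    int d = dep[a] - dep[b];
    for (int k = 0; k < LOG; ++k) if (d >> k & 1) a = up[k][a];
    if (a == b) return a;
    for (int k = LOG - 1; k >= 0; --k)
        if (up[k][a] != up[k][b]) { a = up[k][a]; b = up[k][b]; }
    return up[0][a];
}

void dfsSolve(int v, int p) {                // keep the values of the root..v path in the Fenwick tree
    int r = lower_bound(sortedVals.begin(), sortedVals.end(), val[v]) - sortedVals.begin() + 1;
    fenAdd(r, +1);
    for (auto [id, coef] : atNode[v])        // ans(v, P) = number of values < P on the current path
        res[id] += (long long)coef * fenSum(qP[id]);
    for (int to : adj[v]) if (to != p) dfsSolve(to, v);
    fenAdd(r, -1);                           // roll back when leaving v
}

int main() {
    scanf("%d %d", &n, &q);
    LOG = 1; while ((1 << LOG) <= n) ++LOG;
    adj.assign(n + 1, {}); up.assign(LOG, vector<int>(n + 1, 1));
    val.assign(n + 1, 0); dep.assign(n + 1, 0);
    atNode.assign(n + 1, {}); fen.assign(n + 2, 0);
    for (int i = 1; i <= n; ++i) { scanf("%d", &val[i]); sortedVals.push_back(val[i]); }
    sort(sortedVals.begin(), sortedVals.end());
    sortedVals.erase(unique(sortedVals.begin(), sortedVals.end()), sortedVals.end());
    for (int i = 0; i < n - 1; ++i) {
        int a, b; scanf("%d %d", &a, &b);
        adj[a].push_back(b); adj[b].push_back(a);
    }
    dfsLift(1, 1);                           // root the tree at node 1
    res.assign(q, 0); qP.assign(q, 0); corr.assign(q, 0);
    for (int i = 0; i < q; ++i) {
        int u, v, P; scanf("%d %d %d", &u, &v, &P);
        qP[i] = lower_bound(sortedVals.begin(), sortedVals.end(), P) - sortedVals.begin(); // # distinct values < P
        int l = lca(u, v);
        atNode[u].push_back({i, +1});
        atNode[v].push_back({i, +1});
        atNode[l].push_back({i, -2});        // the LCA itself is added back separately
        corr[i] = (val[l] < P) ? 1 : 0;
    }
    dfsSolve(1, 0);
    for (int i = 0; i < q; ++i) printf("%lld\n", res[i] + corr[i]);
    return 0;
}

The recursion goes as deep as the tree, so for a degenerate path of 10^5 nodes you may need an explicit stack or a larger stack limit.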

Related

Find something(min/max/unique) for path between u and v node in a graph for every query

https://www.hackerearth.com/challenges/hiring/sap-labs-java-hiring-challenge/algorithm/micro-and-internship-10/description/
In query-based questions like the one above, where for every query we have to find the path between two nodes and perform some operation on that path, what should the approach be? I have tried DFS, but it gives a runtime error as well as Time Limit Exceeded.
DFS algorithm
First let's solve the problem using a normal DFS for each query. All we need is a boolean array V of length 100 (since the values on the nodes can't exceed 100); as we traverse each node u, if V[a[u]] is 0 we set it to 1 and increment the answer.
Now, to solve a similar problem such as finding the minimum edge between two nodes, we need a sparse table, which is also what is used to find the LCA. If you don't know what a sparse table is, I recommend reading about it here: https://www.hackerrank.com/topics/lowest-common-ancestor (the third part). Briefly, it is a way to find the LCA in O(log n) per query with O(n log n) pre-processing.
If we take two nodes u and v and want the minimum weight on the path between them, it is the minimum of the minimum weight from u to LCA(u, v) and the minimum weight from v to LCA(u, v).
How is that useful? In the sparse table, instead of storing only which node we reach when we make a jump of height h, we also store the minimum edge on that jump; then we can answer each query in O(log n).
The same applies to this problem, but instead of storing the minimum value we store a boolean array, as explained in the first approach, where the i-th entry is 1 if there is an edge of weight i on the jump of height h and 0 otherwise. Using this array you should be able to answer each query.
This solution performs 100 * log n operations per query, since we iterate through the boolean array.
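Here is a rough C++ sketch of the min-edge version of that sparse table (the array names up, mn, dep and the function minOnPath are mine, not from the original answer):

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005, LOG = 17;           // LOG = 17 is enough for n up to ~1e5
int n, up[LOG][MAXN], mn[LOG][MAXN], dep[MAXN];
vector<pair<int,int>> adj[MAXN];             // adj[v] = {neighbour, edge weight}

void dfs(int v, int p, int w) {              // w = weight of the edge (p, v)
    up[0][v] = p; mn[0][v] = w;
    for (int k = 1; k < LOG; ++k) {
        up[k][v] = up[k-1][up[k-1][v]];
        mn[k][v] = min(mn[k-1][v], mn[k-1][up[k-1][v]]);
    }
    for (auto [to, wt] : adj[v]) if (to != p) { dep[to] = dep[v] + 1; dfs(to, v, wt); }
}

// minimum edge weight on the path u..v, answered in O(log n)
int minOnPath(int u, int v) {
    int res = INT_MAX;
    if (dep[u] < dep[v]) swap(u, v);
    for (int k = LOG - 1; k >= 0; --k)       // lift the deeper node to the same depth
        if (dep[u] - (1 << k) >= dep[v]) { res = min(res, mn[k][u]); u = up[k][u]; }
    if (u == v) return res;
    for (int k = LOG - 1; k >= 0; --k)       // lift both nodes just below the LCA
        if (up[k][u] != up[k][v]) {
            res = min({res, mn[k][u], mn[k][v]});
            u = up[k][u]; v = up[k][v];
        }
    return min({res, mn[0][u], mn[0][v]});   // the last two edges into the LCA
}

Build adj from the edge list, call dfs(1, 1, INT_MAX) to root the tree at node 1 (the sentinel keeps the root's non-existent parent edge out of the minima), and answer each query with minOnPath(u, v). For the boolean-array variant described above, you would store per jump a small array or bitset of the (at most 100) weights seen on that jump and combine them with OR instead of min.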

linearizing a tree to an array and answering "sum" queries on paths

The question is motivated by the travtree problem on CodeChef. In the editorial they recommend linearizing the tree into an array by recording, for each node, its discovery and exit times in a DFS traversal. Now we can quickly answer subtree-sum queries for a node by summing the events that happened in the segment [discovery time, exit time] of that node (using a Fenwick tree to answer these queries fast).
HOWEVER, to solve that problem we also need to quickly answer path-sum queries, that is, summing the events that happened along the shortest path between a and b. How is that possible? The answer they give is this:
For each interesting event they update this:
update(BT2, in[event_node], 1);
update(BT2, out[event_node], -1);
and the path sum(a, b) is now this:
int l = lca(a, b);
ans = query(BT2, in[a]) + query(BT2, in[b]) - query(BT2, in[l]) - (l == 1 ? 0 : query(BT2, in[parent[0][l]]));
Where query is the prefix sum. How is that correct?? When you look at the prefix sum up to a, you might encounter lots of nodes which are irrelevant to the path between l and a!
In order to linearize a path-sum query (the sum of the events that happened on the shortest path between tree nodes a and b), we indeed have to do the following:
When an event happens at node v, we update(IN[v], 1) and update(OUT[v], -1), IN being the node's DFS discovery time and OUT its DFS exit time.
Now the query would be query(IN[b]) - query(IN[a]-1). query(IN[b]) is a prefix sum: it starts from the root and traverses the tree until it reaches b. Note that every node v we pass that is not on the direct path from the root to b has been discovered and then exited before we reach b, so its +1 and -1 cancel out. Only the nodes on the path have been discovered but not yet exited. Because of the way we updated, this means that we effectively sum the nodes on the path from the root to b (including b).
Now it's clear that the same happens with query(IN[a]-1): it is the sum of the nodes on the path from the root to a (not including a this time). Subtracting them, when a is an ancestor of b, gives us the path a..b; for a general pair you combine such root-to-node sums with the LCA, exactly as in the quoted formula. Draw a sketch and you'll see it for yourselves.
For completeness, the method for a subtree sum differs in both the update and the query. For each event you only update(IN[v]). Then, to query the subtree sum of a, we do query(OUT[a]) - query(IN[a]-1). This time query(OUT[a]) sums all the nodes we traversed until we discovered a, and then all the nodes in a's subtree until we exit it; subtracting query(IN[a]-1), all the nodes counted before we discovered a, leaves exactly the subtree of a.
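Here is a small C++ sketch of both bookkeeping schemes; the Fenwick struct and the names tin, tout, addPathEvent, pathFromRoot, addSubtreeEvent and subtreeSum are mine, not the editorial's:

#include <bits/stdc++.h>
using namespace std;

struct Fenwick {                          // 1-based prefix-sum tree
    vector<long long> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void update(int i, long long d) { for (; i < (int)t.size(); i += i & -i) t[i] += d; }
    long long query(int i) const { long long s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

int timer_ = 0;
vector<int> tin, tout;
vector<vector<int>> adj;

void dfs(int v, int p) {                  // assign discovery/exit times, both 1-based
    tin[v] = ++timer_;
    for (int to : adj[v]) if (to != p) dfs(to, v);
    tout[v] = ++timer_;
}

// an event at node v: +1 at its discovery position, -1 at its exit position
void addPathEvent(Fenwick &bt, int v) {
    bt.update(tin[v], +1);
    bt.update(tout[v], -1);
}

// sum of events on the vertical path root..v: every node NOT on that path has
// already been entered and left, so its +1/-1 cancel out inside the prefix
long long pathFromRoot(const Fenwick &bt, int v) { return bt.query(tin[v]); }

// subtree variant: events go into a *separate* Fenwick with a single update point
void addSubtreeEvent(Fenwick &bt, int v) { bt.update(tin[v], +1); }
long long subtreeSum(const Fenwick &bt, int v) {
    return bt.query(tout[v]) - bt.query(tin[v] - 1);
}

The Fenwick tree must be sized 2 * n, since every node occupies two positions in the linearization. A general path (a, b) is then obtained by combining these root-to-node sums with the LCA exactly as in the quoted formula.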

RB tree with sum

I have some questions about augmenting data structures:
Let S = {k1, ..., kn} be a set of numbers. Design an efficient data structure for S that supports the following two operations: Insert(S, k), which inserts the number k into S (you can assume that k is not contained in S yet), and TotalGreater(S, a), which returns the sum of all keys ki ∈ S that are larger than a, that is, Σ_{ki ∈ S, ki > a} ki.
Argue the running time of both operations and give pseudo-code for TotalGreater(S, a) (do not give pseudo-code for Insert(S, k)).
I don't understand how to do this. I was thinking of adding an extra field to the RB tree called sum, but then it doesn't work, because sometimes I only need the sum of the left nodes and sometimes I need the sum of the right nodes too.
So I was thinking of adding two fields called leftSum and rightSum, and if the current node is greater than the given value, adding the cached sum of its subtrees to the running total.
Can someone please help me with this?
You can just add a field sum to each node, holding the sum of the keys in the subtree rooted at that node. Then search downwards for a: every time you go left, you add the current node's key plus the sum of its right child to the running total; every time you go right, you add nothing.
There are two conditions for termination. 1) We find a node containing exactly the value a, in which case we add the sum of its right child to the total. 2) We reach a leaf, in which case we add its key if it is larger than a, or nothing if it is smaller.
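Here is a minimal C++ sketch of TotalGreater under that scheme, assuming each node caches the sum of the keys in its subtree (the struct and field names are mine):

struct Node {
    long long key;
    long long subSum;        // sum of all keys in the subtree rooted here
    Node *left, *right;
};

long long sumOf(const Node *t) { return t ? t->subSum : 0; }

// TotalGreater(S, a): sum of all keys strictly greater than a.
// Walk down as in an ordinary BST search: whenever the current key is greater
// than a, the node and its whole right subtree are > a, so add their cached sum
// and descend left; otherwise skip the node and its left subtree and go right.
// Runs in O(height), i.e. O(log n) in a red-black tree.
long long totalGreater(const Node *t, long long a) {
    long long total = 0;
    while (t) {
        if (t->key > a) {
            total += t->key + sumOf(t->right);
            t = t->left;
        } else {
            t = t->right;
        }
    }
    return total;
}

Insert stays O(log n) as well: add k to the sum field of every node on the insertion path, and after a rotation recompute subSum = key + sumOf(left) + sumOf(right) for the two nodes involved.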
As Jordi describes: the keyword could be augmented red-black tree.

Finding closest number in a range

I thought of a problem which is as follows:
We have an array A of integers of size n, and t test cases. In every test case we are given a number m and a range [s, e], i.e. we are given s and e, and we have to find the number closest to m in that range of the array (A[s]..A[e]).
You may assume the array is indexed from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I can think of is an O(n) solution for every test case, and I think a better solution exists.
This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data like the left and right indices, you also store the numbers found in the subtree rooted at that node, in sorted order. You can achieve this when you construct the segment tree bottom-up: in a node just above the leaves you store its two leaf values in sorted order, and in an intermediate node you keep the numbers of the left child and the right child merged together using standard merging. There are O(n) nodes in the tree, and building this data takes O(n log(n)) overall.
Once you have this tree, for every query walk down until you reach the nodes that cover the given range [s, e]. As the tutorial shows, one or more nodes combine to form the given range, and since the tree depth is O(log(n)), there are O(log(n)) such nodes per query. For each node that lies completely inside the range, find the closest number by binary search in the sorted array stored at that node, which costs another O(log(n)). Take the closest among all of these, and that is the answer. Thus each query is answered in O(log^2(n)) time.
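Here is a rough C++ sketch of that structure, often called a merge sort tree (the names A, tree_, build, relax and query are mine):

#include <bits/stdc++.h>
using namespace std;

int n;
vector<int> A;                         // 1-based input array
vector<vector<int>> tree_;             // tree_[node] = sorted values of its segment

void build(int node, int l, int r) {
    if (l == r) { tree_[node] = {A[l]}; return; }
    int mid = (l + r) / 2;
    build(2 * node, l, mid);
    build(2 * node + 1, mid + 1, r);
    merge(tree_[2 * node].begin(), tree_[2 * node].end(),
          tree_[2 * node + 1].begin(), tree_[2 * node + 1].end(),
          back_inserter(tree_[node]));                // standard merge of the two children
}

// look at the values around lower_bound(m) in one node's sorted vector
void relax(const vector<int> &v, int m, int &best) {
    auto it = lower_bound(v.begin(), v.end(), m);
    if (it != v.end()   && abs(*it - m)       < abs(best - m)) best = *it;
    if (it != v.begin() && abs(*prev(it) - m) < abs(best - m)) best = *prev(it);
}

// closest value to m among A[s..e]; O(log n) covering nodes, binary search in each
void query(int node, int l, int r, int s, int e, int m, int &best) {
    if (e < l || r < s) return;
    if (s <= l && r <= e) { relax(tree_[node], m, best); return; }
    int mid = (l + r) / 2;
    query(2 * node, l, mid, s, e, m, best);
    query(2 * node + 1, mid + 1, r, s, e, m, best);
}

Build it with tree_.assign(4 * n, {}); build(1, 1, n); and answer a query by initialising best = A[s] and calling query(1, 1, n, s, e, m, best). On the example above (s = 4, e = 5, m = 13) this returns 18.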
The tutorial I link to contains other data structures, such as sqrt decomposition, which are easier to implement and should give O(sqrt(n)) per query. But I haven't thought much about this.
Sort the array and do binary search. Complexity: O(n log n + t log n).
I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search. (The array slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

Determine if u is an ancestor of v

Here's the problem, and my attempted solution.
My solution:
1. Run a topological sort on the tree, which runs in linear time BigTheta(E+V) where E is the number of edges and V the number of vertices. This puts it in a linked list which also takes constant time.
2. A vertex u would be an ancestor of v if it has a higher finishing time than vertex v.
3. Look at the 2 vertices in the linked list, compare their finishing times, and return true or false depending on the result from step 2.
Does this sound correct or am i missing something?
I don't think your understanding of what "constant time" means is quite correct. "...time BigTheta(E+V) where E is the number of edges and V the number of vertices" is linear time, not constant time.
Granted, you are allowed to take linear time for the pre-processing, so that's ok, but how are you going to do your step 3 ("Look at the 2 vertices in the linked list") in constant time?
Here is an approach that will work for any tree (not only binary). The pre-processing step is to perform an Euler tour of the tree (this is just a DFS traversal) and create a list out of this tour: when you visit a node for the first time you append it to the list, and when you visit it for the last time you append it again.
Example:
  x
 / \
y   z
The list will look like: [b(x), b(y), e(y), b(z), e(z), e(x)]. Here b(x) means enter x and e(x) means leave x. Now, once you have this list, you can answer the query "is x an ancestor of y" by checking that b(x) comes before b(y) and e(y) comes before e(x) in the list.
The question is: how can you do this in constant time?
For static trees (which is your case), you can use a lookup table (i.e., an array) storing the position of each node's b/e in the list; then the test takes constant time. So this solves your problem.
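Here is a minimal C++ sketch of that lookup-table idea; tin and tout hold the positions of b(v) and e(v), and the names are mine:

#include <bits/stdc++.h>
using namespace std;

int timer_ = 0;
vector<int> tin, tout;                    // positions of b(v) and e(v) in the Euler list
vector<vector<int>> adj;

void dfs(int v, int p) {
    tin[v] = timer_++;                    // b(v)
    for (int to : adj[v]) if (to != p) dfs(to, v);
    tout[v] = timer_++;                   // e(v)
}

// u is an ancestor of v iff u's [b, e] interval encloses v's interval
bool isAncestor(int u, int v) {
    return tin[u] <= tin[v] && tout[v] <= tout[u];
}

With this convention a node counts as its own ancestor; also require tin[u] < tin[v] if you want the strict version. Size tin, tout and adj to n + 1 and call dfs(root, root) once before answering queries.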
