Updating a tree and keeping track of the change in the nodes of some subtree - algorithm

Problem:
You are given a rooted tree where each node is numbered from 1 to N. Initially each node contains some positive value, say X. Now we are to perform two types of operations on the tree, 100000 operations in total.
First Type:
Given a node nd and a positive integer V, you need to decrease the value of all the nodes. If a node is at a distance of d from the given node, decrease its value by floor[V/(2^d)]. Do this for all the nodes.
That means the value of node nd itself will be decreased by V (i.e., floor[V/2^0]), the values of its nearest neighbours will be decreased by floor[V/2], and so on.
Second Type:
You are given a node nd. You have to tell the number of nodes in the subtree rooted at nd whose value is positive.
Note: The number of nodes in the tree may be up to 100000 and the initial values, X, in the nodes may be up to 1000000000. But the value of V by which the decrement operation is to be performed will be at most 100000.
How can this be done efficiently? I have been stuck on this problem for many days. Any help is appreciated.
My Idea: I am thinking of solving this problem offline. I will store all the queries first. Then, if somehow I can find the time (i.e., after which operation) each node nd's value becomes less than or equal to zero (call it the death time of that node), we can do some kind of binary search (probably using Binary Indexed Trees / Segment Trees) to answer all the queries of the second type. But the problem is that I am unable to find the death time for each node.
I have also tried to solve it online using Heavy-Light Decomposition, but I could not make that work either.
Thanks!

Given a tree with vertex weights, there exists a vertex that, when chosen as the root, has subtrees whose weights are at most half of the total. This vertex is a "balanced separator".
Here's an O((n + k) polylog(n, k, D))-time algorithm, where n is the number of vertices and k is the number of operations and D is the maximum decrease. In the first phase, we compute the "death time" of each vertex. In the second, we count the live vertices.
To compute the death times, first split each decrease operation into O(log(D)) decrease operations whose arguments are powers of two between 1 and 2^floor(lg(D)) inclusive. Do the following recursively. Let v be a balanced separator, where the weight of a vertex is one plus the number of decrease operations on it. Compute distances from v, then determine, for each time and each power of two, the cumulative number of operations on v with that effective argument (i.e., if a vertex at distance 2 from v is decreased by 2^i, then record a -1 change in the 2^(i - 2) coefficient for v). Partition the operations and vertices by subtree. For each subtree, repeat this cumulative summary for operations originating in the subtree, but make the coefficients positive instead of negative. By putting the summary for a subtree together with v's summary, we determine the influence of decrease operations originating outside of the subtree. Finally, we recurse on each subtree.
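For concreteness, here is a small Python sketch (names are mine, not the answer's) of the operation-splitting step and the re-binning of an operation's argument at a centroid. The split is exact because floor(V/2^d) equals the sum of floor(2^i/2^d) over the set bits i of V: the bits below 2^d are truncated whether V is split or not.

    def split_decrease(V):
        # Decompose "decrease by V" into power-of-two decreases.
        return [1 << i for i in range(V.bit_length()) if (V >> i) & 1]

    def effective_channel(i, dcent):
        # An operation "decrease by 2^i" applied dcent edges away from the
        # centroid v reaches a vertex d' edges on the other side of v as
        # floor(2^i / 2^(dcent + d')) = floor(2^(i - dcent) / 2^d'), so it is
        # recorded at v in channel i - dcent; channels below 0 contribute
        # nothing and are dropped.
        j = i - dcent
        return j if j >= 0 else None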
Now, for each vertex w, we compute the death time using binary search. The decrease operations affecting w are given in a logarithmic number of summaries computed in the manner previously described, so the total cost for one vertex is O(log^2).
It sounds as though you, the question asker, know how the next part goes, but for the sake of completeness, I'll describe it. Do a preorder traversal to assign new labels to vertices and also compute for each vertex the interval of labels that comprises its subtree. Initialize a Fenwick tree mapping each vertex to one (live) or zero (dead), initially one. Put the death times and queries in a priority queue. To process a death, decrease the value of that vertex by one. To process a query, sum the values of vertices in the subtree interval.
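Here is a hedged sketch of this second phase in Python (nodes numbered 1..N as in the problem; death_time and the query list are assumed outputs of the first phase, and sorting the events offline plays the role of the priority queue):

    class Fenwick:
        def __init__(self, n):
            self.t = [0] * (n + 1)
        def add(self, i, delta):              # 1-based index
            while i < len(self.t):
                self.t[i] += delta
                i += i & -i
        def prefix(self, i):                  # sum of entries 1..i
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & -i
            return s

    def count_live(n, root, children, death_time, queries):
        # children[v]: child list; death_time[v]: operation index after which
        # v's value is <= 0 (None if it never dies); queries: (op_index, nd).
        tin, tout, timer = [0] * (n + 1), [0] * (n + 1), 0
        stack = [(root, False)]
        while stack:                          # preorder labels + subtree intervals
            v, closed = stack.pop()
            if closed:
                tout[v] = timer
                continue
            timer += 1
            tin[v] = timer
            stack.append((v, True))
            for u in children[v]:
                stack.append((u, False))
        bit = Fenwick(n)
        for v in range(1, n + 1):
            bit.add(tin[v], 1)                # every vertex starts out live
        events = [(death_time[v], 0, v) for v in range(1, n + 1)
                  if death_time[v] is not None]
        events += [(t, 1, nd) for t, nd in queries]
        answers = []
        for _, kind, v in sorted(events):     # deaths precede same-time queries
            if kind == 0:
                bit.add(tin[v], -1)           # vertex v just died
            else:
                answers.append(bit.prefix(tout[v]) - bit.prefix(tin[v] - 1))
        return answers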

Find the shortest path in a graph which visits all node types

I can't figure out how to proceed with the following problem.
Say I have an undirected graph with an end node and a start node. I need to find the shortest path between these two nodes, but the path must include all must-pass node types.
There can be up to 10 of these types. This means that I should visit at least one node of each type (marked with a letter in the image) and then go to the end. Once I visit one of the nodes of type B, I may, but need not, visit other nodes of type B. The nodes that are marked with a number simply form a path and do not need to be visited.
This question is very similar to this. There it was suggested to find the shortest path between all the crucial nodes and then use the DFS algorithm. Can I apply the same algorithm to this problem?
Let N be the number of vertices and M be the number of edges.
Break down the solution into two phases.
Phase 1:
Compute the distance between each pair of nodes. If the edges are not weighted, this can easily be done by starting a BFS from each node, spending a total of O(N(N+M)) time. If the edges are weighted, you can use Dijkstra's algorithm from each node, spending a total of O(N(N log N + M)) time.
After this phase, we have computed dist(x,y) for any pair of nodes x,y.
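A minimal sketch of phase 1 for the unweighted case (adjacency lists and 0-based node ids are my assumptions):

    from collections import deque

    def all_pairs_dist(n, adj):
        # One BFS per source: O(N(N+M)) total for unweighted edges.
        INF = float('inf')
        dist = [[INF] * n for _ in range(n)]
        for s in range(n):
            dist[s][s] = 0
            q = deque([s])
            while q:
                x = q.popleft()
                for y in adj[x]:
                    if dist[s][y] == INF:
                        dist[s][y] = dist[s][x] + 1
                        q.append(y)
        return dist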
Phase 2:
Now that we can query the distance between any pair of nodes in O(1) using the precomputed values from phase 1, it is time to put everything together. I can propose two possibilities here.
Possibility 1:
Use a similar approach as in the thread you linked. There are 10! possible orders in which you could visit the types. Let's say we have fixed one possible order [s,t1,t2,...,t10,e] (where s and e are the start/end nodes, and t_i represents a type) and we are trying to find the most optimal way to visit the nodes from start to finish in that order. Since each type can have multiple nodes belonging to it, it is not as simple as querying distances for every pair of consecutive types t_i and t_{i+1}.
What we should do instead is, for every node x, compute the fastest way to reach the end node from node x while respecting the order of types [s,t1,...,t10,e]. So if x is of type t_i, then we are looking for a way to reach e from x while visiting nodes of types t_{i+1}, t_{i+2}, ..., t_{10} in that order. Call this value dp[x] (where dp stands for dynamic programming).
We are looking for dp[s]. How do we compute dp[x] for a given node x? Simple - iterate through all nodes y of type t_{i+1} and consider dist(x,y) + dp[y]. Then we have dp[x] = min{dist(x,y) + dp[y] for all y of type t_{i+1}}. Note that we need to compute dp[x] starting from the nodes of type t_{10} all the way back to the nodes of type t_1.
The complexity here is O(10! * N^2).
Possibility 2:
There is actually a much faster way to find the answer and reduce the complexity to O(2^10 * N^3) (which can give massive gains for large N, and especially for larger number of node types (like 20 instead of 10)).
To accomplish this we do the following. For each subset S of the set of types {1,2,...,10}, and for each pair of nodes x, y whose types are in S, define
dp[S][x][y], which represents the fastest way to traverse the graph starting from node x, ending in node y and visiting at least one node of every type in S. Note that we don't care about the actual order. To compute dp[S][x][y] for a given (S,x,y), all we need to do is go over all the possibilities for the second node z to visit (the node visited right after x). Then we update dp[S][x][y] according to dist(x,z) + dp[S-t1][z][y] (where t1 is the type of the node x). The number of all possible subsets along with start and end nodes is 2^10 * N^2. To compute each dp entry, we consider N possibilities for the second node to visit. So overall we get O(2^10 * N^3).
Note: in all of my analysis above, you can replace the value 10, with a more general K, representing the number of different types possible.
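Here is a hedged sketch of possibility 2 in Python, reformulated with the end node fixed at e so a two-index table dp[S][x] suffices (equivalent because dist is a shortest-path metric, so extending only by types not yet in S loses nothing); type_of, K and the node ids are illustrative:

    def cheapest_full_route(dist, s, e, type_of, K):
        # dp[S][x]: cheapest walk from s to typed node x that has visited at
        # least one node of every type in the bitmask S (type_of[x] is in S).
        INF = float('inf')
        n = len(dist)
        typed = [x for x in range(n) if type_of[x] is not None]
        dp = [[INF] * n for _ in range(1 << K)]
        for x in typed:
            dp[1 << type_of[x]][x] = dist[s][x]
        for S in range(1 << K):
            for x in typed:
                if dp[S][x] == INF:
                    continue
                for y in typed:
                    b = 1 << type_of[y]
                    if S & b:                     # only extend by new types
                        continue
                    T = S | b
                    if dp[S][x] + dist[x][y] < dp[T][y]:
                        dp[T][y] = dp[S][x] + dist[x][y]
        full = (1 << K) - 1
        return min(dp[full][x] + dist[x][e] for x in typed)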

optimal way to calculate all nodes at distance less than k from m given nodes

A graph of size n is given, and a subset of size m of its nodes is given. Find all nodes which are at a distance <= k from ALL nodes of the subset.
E.g., A->B->C->D->E is the graph, subset = {A,C}, k = 2.
Now, E is at distance <= 2 from C, but not from A, so it should not be counted.
I thought of running a Breadth First Search from each node in the subset, and taking the intersection of the respective answers.
Can it be further optimized ?
I went through many posts on SO, but they all point to k-d trees, which I don't understand; is there any other way?
I can think of two non-asymptotic (I believe) optimizations:
If you're done with BFS from one of the subset nodes, delete all nodes that have distance > k from it
Start with the two nodes in the subset whose distance is largest to get the smallest possible leftover graph
Of course this doesn't help if k is large (close to n); I have no idea in that case. I am positive however that k-d trees are not applicable to general graphs :)
Niklas B.'s optimizations can be applied to both of the following optimizations.
Optimization #1: Modify the BFS to do the intersection as it runs rather than afterwards.
The BFS-and-intersect approach seems to be the way to go. However, there is redundant work being done by the BFS. Specifically, it is expanding nodes that it doesn't need to expand (after the first BFS). This can be resolved by merging the intersection aspect into the BFS.
The solution seems to be to keep two sets of nodes, call them "ToVisit" and "Visited", rather than label nodes visited or not.
The new rules of the BFS are as follows:
Only nodes in ToVisit are expanded upon by the BFS. They are then moved from ToVisit to Visited to prevent being expanded twice.
The algorithm returns the Visited set as its result, and any nodes left in ToVisit are discarded. The result is then used as the ToVisit set for the next node.
The first node either uses a standard BFS algorithm or ToVisit is the list of all nodes. Either way, the result becomes the second ToVisit set for the second node.
This works best when the ToVisit set is small on average, which tends to be the case when m and k are much smaller than N.
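Here is a direct sketch of those rules (adjacency lists and 0-based ids assumed; note that, by design, later rounds measure distances inside the shrinking candidate set):

    from collections import deque

    def within_k(adj, src, k, to_visit):
        # One round: only members of to_visit are expanded (rule 1); everything
        # reached within distance k is returned as the next ToVisit (rules 2-3).
        visited = {src}
        q = deque([(src, 0)])
        while q:
            x, dx = q.popleft()
            if dx == k:
                continue
            for y in adj[x]:
                if y in to_visit and y not in visited:
                    visited.add(y)
                    q.append((y, dx + 1))
        return visited

    def close_to_all(adj, subset, k):
        candidates = set(range(len(adj)))         # first round: plain BFS
        for src in subset:
            candidates = within_k(adj, src, k, candidates)
        return candidates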
Optimization #2: Pre-compute the distances if there are enough queries so queries just do intersections.
This is, however, incompatible with the first optimization. If there are a sufficient number of queries on differing subsets and k values, then it is better to find the distances between every pair of nodes ahead of time, at a cost of O(VE).
This way you only need to do the intersections, which is O(V*M*Q), where Q is the number of queries, M is the average size of the subset over the queries and V is the number of nodes. If O(M*Q) is expected to exceed O(E), then this approach should be less work. Noting the distance between the two most distant nodes is useful, since any k at least that large will always return the set of all vertices, making the query cost just O(V) in that case.
The distance data should then be stored in four forms.
The first is "kCount[A][k] = number of nodes with distance k or less from A". This provides an alternative to Niklas B.'s suggestion of "Start with the two nodes in the subset whose distance is largest to get the smallest possible leftover graph" in the case that O(m) > O(sqrt(V)) since finding the smallest is O(m^2) and it may be better to avoid trying to find the best choice for the starting pair and just pick a good choice. You can start with the two nodes in the subset with the smallest value for the given k in this data structure. You could also just sort the nodes in the subset by this metric and do the intersections in that order.
The second is "kMax[A] = max k for A", which can be done using a hashmap/dictionary. If k >= this value, then this one can be skipped unless kCount[A][kMax[A]] < (number of vertices), meaning not all nodes are reachable from A.
The third is "kFrom[A][k] = set of nodes at distance k from A"; since k is valid from 0 to the max distance, a hashmap/dictionary to an array/list could be used here rather than a nested hashmap/dictionary. This allows for space- and time-efficient* creation of the set of nodes with distance <= k from A.
The fourth is "dist[A][B] = distance from A to B", this can be done using a nested hashmap/dictionary. This allows for handling the intersection checks fairly quickly.
* If space isn't an issue, then this structure can store all the nodes at distance k or less from A, but that requires O(V^3) space and thus time. The main benefit, however, is that it also allows storing a separate list of the nodes at distance greater than k. The algorithm can then use the smaller of the two sets: an intersection in the case of dist <= k, set subtraction in the case of dist > k, or intersection followed by set subtraction if that yields the smaller working set.
Add a new node (let's say s) and connect it to all the m given nodes.
Then, find all the nodes which are at a distance less than or equal to k+1 from s, and subtract the m given nodes from the result. T(n) = O(V+E)

Given a node network, how to find the highest scoring loop with finite number of moves?

For a project of mine, I'm attempting to create a solver that, given a random set of weighted nodes with weighted paths, will find the highest scoring path with a finite number of moves. I've created a visual to help describe the problem.
This example has all the connection edges shown for completeness. The numbers on edges are traversal costs and the numbers inside nodes are scores. A node is only counted when traversed to, and a node cannot traverse to itself.
As you can see from the description in the image, there is a start/finish node, with randomly placed nodes that each have an arbitrary score. Every node is connected to all other nodes, and every connection has an arbitrary weight that subtracts from the total number of move units remaining. For simplicity, you could assume the weight of a connection is a function of distance. Nodes can be traveled to more than once and their score is applied again. The goal is to find a loop path that has the highest score for the given move limit.
The solver will never be dealing with more than 30 nodes, usually dealing with 10-15 nodes. I still need to try and make it as fast as possible.
Any ideas on algorithms or methods that would help me solve this problem other than pure brute force methods?
Here's an O(m n^2)-time algorithm, where m is the number of moves and n is the number of nodes.
For every time t in {0, 1, ..., m} and every node v, compute the maximum score of a t-step walk that begins at the start node and ends at v, as follows. If t = 0, then there's only one walk, namely doing nothing at the start node, so the maximum for (0, v) is 0 if v is the start node and -infinity (i.e., impossible) otherwise.
For t > 0, we use the entries for t - 1 to compute the entries for t. To compute the (t, v) entry, we add the score for v to the difference of the maximum over all nodes w of the (t - 1, w) entry minus the transition penalty from w to v. In other words, an optimal t-step walk to v consists of a step from some node w to v preceded by a (t - 1)-step walk to w, and this (t - 1)-step walk must be optimal because history does not influence future scoring.
At the end, we look at the (m, start node) entry. Recovering the actual walk involves working backward and repeatedly determining which w was the best node to have come from.
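A hedged sketch of this table-filling in Python, treating m as the number of moves and edge costs as score deductions, exactly as described above (score, cost and the node ids are illustrative):

    def best_loop_score(score, cost, start, m):
        # dp[v] = max score of a t-step walk from `start` ending at v.
        NEG = float('-inf')
        n = len(score)
        dp = [NEG] * n
        dp[start] = 0                             # t = 0: stay at the start
        preds = []                                # for backward reconstruction
        for _ in range(m):
            nxt, pred = [NEG] * n, [None] * n
            for v in range(n):
                for w in range(n):
                    if w == v or dp[w] == NEG:
                        continue                  # no self-moves
                    cand = dp[w] - cost[w][v] + score[v]
                    if cand > nxt[v]:
                        nxt[v], pred[v] = cand, w
            dp = nxt
            preds.append(pred)
        return dp[start], preds                   # the (m, start node) entry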

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e (as in, as you traverse the path you keep getting closer to your destination e, in terms of the edge weight of the remaining path).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting from s to e, and let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with their incident edges. During the topological sort you'll also see whether there is a path from s to e at all. Complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.

For every vertex v, store a two-dimensional array of information; let's call it prev_v[][], since it stores information about the predecessors of a node on the paths leading towards it.

In prev_v[i][j], store how long the total path of length i (counted in vertices) is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, prev_(s+1)[1][s] would hold the weight of the edge s->s+1, while all other entries in prev_(s+1) would be 0/undefined.

When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w with weight c. Consider what the entry prev_v[i][w] should be. We have an edge w->v, so we set prev_v[i][w] to min(prev_w[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a path of length i-1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i-1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We need to do this for all i in [1;v-s].

While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, paths can only get more costly with each edge, so just ignore those.

Once you have reached e and finished calculating prev_e, you're done with this part of the algorithm.

Iterate over prev_e, starting with prev_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have e-s edges. Find the largest i such that prev_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n^2 m) - the arrays have O(n^2) entries, and we have to iterate over O(m) arrays, one for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays.

Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex, though, since you need the predecessor to reconstruct the path). That changes the size of the arrays from n^2 to n, since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing the space complexity of the algorithm down to O(n^2) and the time complexity to O(nm). You can also try to do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.
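Here is a hedged sketch of that slimmed-down O(nm) variant (one (cost, predecessor) entry per path length at each vertex). I assume the vertices of [s;e] come pre-sorted topologically, and I leave the "getting closer to e" criterion aside; it could be enforced beforehand by deleting the edges that violate it:

    def longest_cheap_path(order, in_edges, s, e, d):
        # order: vertices of [s;e] in topological order; in_edges[v]: list of
        # (w, c) for every edge w -> v with weight c.
        best = {v: {} for v in order}             # best[v][i] = (cost, pred)
        best[s][0] = (0, None)
        for v in order:
            for w, c in in_edges[v]:
                for i, (cost, _) in best[w].items():
                    if cost + c >= d:             # keep the total below d
                        continue
                    if i + 1 not in best[v] or cost + c < best[v][i + 1][0]:
                        best[v][i + 1] = (cost + c, w)
        if not best[e]:
            return None                           # no path fits the criteria
        i = max(best[e])                          # most edges = most vertices
        path, v = [e], e
        while best[v][i][1] is not None:          # walk the predecessors back
            v = best[v][i][1]
            path.append(v)
            i -= 1
        return path[::-1]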

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, i.e., edges going down from the root. Keep in mind each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)
It's possible to improve my previous answer to O(n log d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t(v) to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t(v) (or equivalently, binary search for m - t(v) instead of m). The insertion of -t(v) at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
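Here is my attempt at a compact sketch of the whole scheme (recursive for clarity; with n up to 100000 a real implementation would want an explicit stack rather than a raised recursion limit). children[v] holds (child, edge weight) pairs; all names are mine:

    import sys
    from bisect import bisect_right

    def close_descendants(n, root, children, m):
        # c[v] = number of proper descendants of v within weighted distance m.
        sys.setrecursionlimit(2 * n + 100)
        def find_depth(v, k):                     # first pass: the max depth D
            return max([k] + [find_depth(u, k + 1) for u, _ in children[v]])
        D = find_depth(root, 0)
        buf = [0] * (D + 1)                       # buf[D-j] = -t(ancestor at depth j)
        path = [0] * (D + 1)                      # path[j] = ancestor at depth j
        c = [0] * n
        def dfs(v, k, t):                         # preorder work, postorder sums
            path[k], buf[D - k] = v, -t
            # s(v, j) is recovered as buf[D-k+j] + t; binary-search for the
            # first ancestor (from the parent upward) farther than m:
            idx = bisect_right(buf, m - t, D - k + 1, D + 1)
            if idx <= D:                          # that ancestor sits at depth D-idx
                c[path[D - idx]] -= 1             # pre-exclude v from its count
            for u, w in children[v]:
                dfs(u, k + 1, t + w)
                c[v] += c[u] + 1                  # child subtree joins v's count
        dfs(root, 0, 0)
        return c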
[EDIT: Please see my other answer for an even more efficient O(n log d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every node has the same number of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is for each node, climb upwards towards the root until such time as the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and for each node there are at most d ancestors, so this algorithm is trivially O(nd).
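A sketch of this simple version, with parent[] and the parent-edge weights wt[] as assumed inputs (parent[root] = -1):

    def close_descendants_simple(n, parent, wt, m):
        # For each node, climb toward the root while the accumulated distance
        # stays within m, incrementing the count of every ancestor passed.
        c = [0] * n
        for v in range(n):
            u, total = v, 0
            while parent[u] != -1:
                total += wt[u]
                if total > m:
                    break
                u = parent[u]
                c[u] += 1
        return c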
