How to find the maximum-weight path between two vertices in a DAG?

In a DAG G with non-negative edge weights, how do you find the maximum-weight path between two vertices in G?
Thank you guys!

You can solve this in O(n + m) time (where n is the number of nodes and m is the number of edges) using a topological sort. Begin by topologically sorting the reverse graph, so that you have all the nodes ordered in such a way that no node is visited before all of its children have been visited.
Now, we're going to label all the nodes with the weight of the highest-weight path starting with that node. This is done based on the following recursive observation:
The weight of the highest-weight path starting from a sink node (any node with no outgoing edges) is zero, since the only path starting from that node is the length-zero path of just that node.
The weight of the highest-weight path starting from any other node is given by the maximum weight of any path formed by following an outgoing edge to a node, then taking the maximum-weight path from that node.
Because we have the nodes reverse-topologically sorted, we can visit the nodes in an order that guarantees that whenever we follow an edge and look up the cost of the heaviest path at that edge's endpoint, the maximum-weight path starting at that endpoint has already been computed. This means that once we have the reverse topologically sorted order, we can apply the following algorithm to all the nodes in that order:
If the node has no outgoing edges, record the weight of the heaviest path starting at that node (denoted d(u)) as zero.
Otherwise, for each edge (u, v) leaving the current node u, compute l(u, v) + d(v), and set d(u) to be the largest value attained this way.
Once we've done this step, we can make one last pass over all the nodes and return the highest value of d attained by any node.
The runtime of this algorithm can be analyzed as follows. Computing a topological sort can be done in O(n + m) time using any of several standard methods. When we then scan over each node and each of its outgoing edges, we visit each node and edge exactly once, spending O(n) time on the nodes and O(m) time on the edges. Finally, the last pass to find the highest value of d takes O(n). This gives a grand total of O(n + m) time, which is linear in the size of the input.
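As a concrete illustration, here is a compact C++ sketch of the whole procedure. Kahn's algorithm is assumed for the topological sort (the answer leaves the method open), and processing the resulting order backwards is equivalent to topologically sorting the reverse graph:

    #include <vector>
    #include <algorithm>
    using namespace std;

    struct Edge { int to; long long w; };

    // Returns the weight of the heaviest path anywhere in the DAG.
    long long heaviestPath(int n, const vector<vector<Edge>>& adj) {
        // Kahn's algorithm for a topological order of the forward graph.
        vector<int> indeg(n, 0);
        for (int u = 0; u < n; ++u)
            for (const Edge& e : adj[u]) ++indeg[e.to];
        vector<int> order;
        for (int u = 0; u < n; ++u) if (indeg[u] == 0) order.push_back(u);
        for (size_t k = 0; k < order.size(); ++k)
            for (const Edge& e : adj[order[k]])
                if (--indeg[e.to] == 0) order.push_back(e.to);

        // Walk the order backwards: every edge's endpoint is finished first.
        vector<long long> d(n, 0);   // d(u) = weight of heaviest path starting at u
        for (int k = n - 1; k >= 0; --k) {
            int u = order[k];
            for (const Edge& e : adj[u])
                d[u] = max(d[u], e.w + d[e.to]);   // l(u, v) + d(v)
        }
        return *max_element(d.begin(), d.end());    // final O(n) pass
    }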

A simple brute-force algorithm can be written using recursive functions.
Start with an empty vector (in C++: std::vector) and insert the first node.
Then call your recursive function with the vector as its argument. The function does the following:
    loop over all neighbours, and for each neighbour:
        copy the vector
        add the neighbour
        call ourselves recursively
Also pass the total weight as an argument to the recursive function, and add the edge's weight to it in every recursive call.
The function should stop whenever it reaches the end node. There, compare the total weight with the maximum weight you have so far (use a global variable); if the new total weight is bigger, set the maximum weight and store the vector.
The rest is up to you.
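A sketch of those steps in C++ (the graph representation and all names are assumptions; the answer only prescribes the recursive structure):

    #include <vector>
    using namespace std;

    struct Edge { int to; long long w; };

    vector<vector<Edge>> graph;   // assumed to be filled in elsewhere
    long long best = -1;          // maximum weight found so far (global)
    vector<int> bestPath;         // the stored vector for the best path
    int endNode;

    void search(const vector<int>& path, long long total) {
        int u = path.back();
        if (u == endNode) {                  // reached the end node: compare and store
            if (total > best) { best = total; bestPath = path; }
            return;
        }
        for (const Edge& e : graph[u]) {     // loop over all neighbours
            vector<int> copy = path;         // copy the vector
            copy.push_back(e.to);            // add the neighbour
            search(copy, total + e.w);       // call ourselves, adding the weight
        }
    }

    // usage: set up graph and endNode, then call search({startNode}, 0);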

Related

Single-source shortest path in a graph with positive weights and diameter D

In a problem, I am given a graph G with only positive weights whose diameter (i.e. the greatest of the shortest-path distances between any pair of vertices in G) is D. The problem asks for a single-source shortest path algorithm that is faster than Dijkstra's and runs in O(V+E+D) time.
What I've considered so far:
I have thought about adding dummy nodes so as to transform G into an unweighted graph G' and then running BFS, but this would result in a complexity of O(V+WE), where W is the maximum edge weight
(as in G', E' = O(WE) and V' = O(WE+V))
It seems to me that D doesn't really help reduce the complexity of the problem at all, as the sum of the weights (i.e. the total number of dummy nodes to add) is unrelated to D.
Use Dijkstra's algorithm with an optimised version of the priority queue. Assume the graph has nodes 0..V-1.
The priority queue consists of an array Arr[0..D] (an array with indices between 0 and D inclusive) of doubly-linked lists of nodes, together with an index i indicating that every node currently in the queue is at distance at least i from the starting node, and an array location[0..V-1], where location[node] is the doubly-linked-list node in Arr containing node, or null if there is no such node. We store a node in the list Arr[i] when we have found a path of length i from the start node to that node.
Adding a node to the priority queue which is not already there is O(1) - if we have a tentative distance s, then we add the node to the linked list Arr[s] and update location[node] accordingly. Note that if the priority is >D, we should actually refrain from adding the node to the priority queue entirely and be confident that we will later add it to the queue with a priority <= D.
Removing a given node from the priority queue is also O(1) - we can find its doubly-linked-list node in O(1) using location[node], delete that node from the doubly-linked-list, and set location[node] to null. We will need this operation when we change the priority of a node.
Finding and removing the minimal node is less trivial. We keep incrementing i until we find some i such that Arr[i] is not empty, and then remove a node found in Arr[i] from the priority queue (don't forget to update location[node] as well). The total number of increments over the whole run is D, since i only ever moves from 0 to D one step at a time. Ignoring the increments, the other work done in this process is O(1).
Note that this is ONLY valid because we can guarantee that once we remove a node with priority i, we will never add another node with priority < i to the priority queue. It also only works because we know we can never actually remove anything that was added with priority > D: a node is only removed from the priority queue once its path length is finalised and correct, and that length is at most D. It is therefore unnecessary to add anything with priority > D in the first place. Both facts follow from the general properties of Dijkstra's algorithm on graphs with positive edge weights, and from the graph's diameter being D.
So the algorithm will be O(V + E + D), as required.
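A sketch of this bucket queue in C++, folded directly into Dijkstra's main loop; names such as `where` and `queued` are mine, corresponding to the answer's location[] and the "is it in the queue" check:

    #include <vector>
    #include <list>
    #include <climits>
    using namespace std;

    struct Edge { int to, w; };   // integer weights assumed

    vector<int> shortestPaths(const vector<vector<Edge>>& g, int s, int D) {
        int V = g.size();
        vector<int> dist(V, INT_MAX);
        vector<list<int>> arr(D + 1);            // Arr[0..D] of doubly-linked lists
        vector<list<int>::iterator> where(V);    // location[node]
        vector<bool> queued(V, false);

        dist[s] = 0;
        arr[0].push_front(s); where[s] = arr[0].begin(); queued[s] = true;

        for (int i = 0; i <= D; ++i) {           // i only ever increases: O(D) total
            while (!arr[i].empty()) {
                int u = arr[i].front();          // minimal node sits in bucket i
                arr[i].pop_front(); queued[u] = false;
                for (const Edge& e : g[u]) {
                    int nd = i + e.w;
                    if (nd < dist[e.to]) {
                        if (queued[e.to])        // O(1) removal via location[]
                            arr[dist[e.to]].erase(where[e.to]);
                        queued[e.to] = false;
                        dist[e.to] = nd;
                        if (nd <= D) {           // priorities > D are never needed
                            arr[nd].push_front(e.to);
                            where[e.to] = arr[nd].begin();
                            queued[e.to] = true;
                        }
                    }
                }
            }
        }
        return dist;
    }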

Specific graph and the need for a more creative solution

A directed graph (|V|=a, |E|=b) is given.
Each vertex has a specific weight. For each vertex (1..a), we want to find a vertex of maximum weight that is reachable from that vertex.
Update 1: one nice answer was prepared by @Paul in O(b + a log a), but I am searching for an O(a + b) algorithm, if one exists.
Is there any other, more efficient or faster way of doing this?
Yes, it's possible to modify Tarjan's SCC algorithm to solve this problem in linear time.
Tarjan's algorithm uses two node fields to drive its SCC finding logic: index, which represents the order in which the algorithm discovers the nodes; and lowlink, the minimum index reachable by a sequence of tree arcs followed by a back arc. As part of the same depth-first traversal, we can compute another field, maxweight, which has one of two meanings:
For a node not yet included in a finished SCC, it represents the maximum weight reachable by a sequence of tree arcs, optionally followed by a cross arc to another SCC and then any subsequent path.
For nodes in a finished SCC, it represents the maximum weight reachable.
The logic for computing maxweight is as follows. If we discover an arc from v to a new node w, then vw is a tree arc, so we compute w.maxweight recursively and update v.maxweight = max(v.maxweight, w.maxweight). If w is on the stack, then we do nothing, because vw is a back arc and not included in the definition of maxweight. Otherwise, vw is a cross arc, and we do the same update that we would have done for a tree arc, just without the recursive call.
When Tarjan's algorithm identifies an SCC, it's because it has a node r with r.lowlink == r.index. Since r is the depth-first search root of this SCC, its value of maxweight is correct for the whole SCC. So, as each node of the SCC is popped off the stack, we simply update its maxweight to r.maxweight.
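A minimal C++ sketch of this modification, under the assumption of an adjacency list `adj` and per-vertex weights `weight` (recursive DFS; all names are illustrative):

    #include <vector>
    #include <stack>
    #include <algorithm>
    using namespace std;

    int n;                            // number of vertices
    vector<vector<int>> adj;          // adjacency list
    vector<long long> weight;         // vertex weights
    vector<int> idx, low;             // Tarjan's index/lowlink; idx[v] == -1 means undiscovered
    vector<long long> maxweight;      // the extra field described above
    vector<bool> onstack;
    stack<int> stk;
    int counter = 0;

    void strongconnect(int v) {
        idx[v] = low[v] = counter++;
        stk.push(v); onstack[v] = true;
        maxweight[v] = weight[v];
        for (int w : adj[v]) {
            if (idx[w] == -1) {                  // tree arc: recurse, then merge
                strongconnect(w);
                low[v] = min(low[v], low[w]);
                maxweight[v] = max(maxweight[v], maxweight[w]);
            } else if (onstack[w]) {             // back arc: lowlink only
                low[v] = min(low[v], idx[w]);
            } else {                             // cross arc to a finished SCC
                maxweight[v] = max(maxweight[v], maxweight[w]);
            }
        }
        if (low[v] == idx[v]) {                  // v is the root r of an SCC
            int w;
            do {                                 // propagate r.maxweight to the SCC
                w = stk.top(); stk.pop(); onstack[w] = false;
                maxweight[w] = maxweight[v];
            } while (w != v);
        }
    }
    // Initialise idx to -1 everywhere, then call strongconnect(v) for every
    // v with idx[v] == -1; afterwards maxweight[v] is the answer for vertex v.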
Sort all nodes by weight in decreasing order and create the graph g' with all edges of E reversed (i.e. if there's an edge a -> b in g, there's an edge b -> a in g'). In this graph you can now propagate the maximum value by a simple DFS. Do this iteratively for all nodes, cutting the search short whenever a node's maximum weight has already been assigned.
As pseudocode:
    dfs_assign_weight_reachable(node, weight):
        if node.max_weight_reachable >= weight:
            return
        node.max_weight_reachable = weight
        for n = neighbor of node:
            dfs_assign_weight_reachable(n, weight)

    g' = g with all edges reversed
    nodes = nodes from g' sorted descendingly by weight
    assign max_weight_reachable = -inf to each node in nodes
    for node in nodes:
        dfs_assign_weight_reachable(node, node.weight)
UPDATE:
The tight bound is O(b + a log a). The a log a term comes from the sorting step, and each edge is visited once during the reversal step and once while assigning maximum weights, which gives the O(b) term.
Acknowledgement:
I'd like to thank @SerialLazer for the time invested in a discussion about the time complexity of the above algorithm, and for helping me figure out the correct bound.

Time complexity of Hill Climbing algorithm for finding local min/max in a graph

What is the time complexity (order of the algorithm) of an algorithm that finds a local minimum in a graph with n nodes, where each node has at most d neighbors?
Detail: We have a graph with n nodes. Each node in the graph has an integer value and at most d neighbors. We are looking for a node that has the lowest value among its neighbors. The graph is represented by an adjacency list. The algorithm starts by selecting random nodes; among these, it picks the node with minimum value (say node u). Starting from u, the algorithm finds a neighbor v with value(v) < value(u). It then continues from v and repeats this step. The algorithm terminates when the current node has no neighbor with a lower value. What is the time complexity of this algorithm, and why?
Time complexity is O(n + d), because you can have n nodes connected like this (the numbers show the node values):
16-15-14-13-12-11-10-9-8-7-6-5-4-3-2-1
and the initial random selection, marked by "!", might be:
!-!-!-13-12-11-10-9-8-7-6-5-4-3-2-1
So you select the node with value 14, and by the described algorithm you then check all the remaining nodes and edges until you reach the node with value 1.
The worst-case complexity for the task "find one element" is O(N), where N is the length of the input; here the length of the input is effectively N = G(n, d) = n + d.
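For reference, a C++ sketch of the descent itself (the initial random sampling is omitted; `adj` and `value` are assumed inputs):

    #include <vector>
    using namespace std;

    // From a start node u, repeatedly move to any neighbour with a strictly
    // smaller value until no such neighbour exists.
    int hillClimb(const vector<vector<int>>& adj, const vector<int>& value, int u) {
        bool moved = true;
        while (moved) {
            moved = false;
            for (int v : adj[u]) {           // at most d neighbours per node
                if (value[v] < value[u]) {   // descend to a smaller neighbour
                    u = v;
                    moved = true;
                    break;
                }
            }
        }
        return u;                            // u is now a local minimum
    }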

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e (as in, while you traverse the path you keep getting closer to your destination e, in terms of the total edge weight of the remaining path).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Base case: take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of such an n-vertex graph to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own direct edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it inherits all the paths the other vertices collectively have to the vertex n+1, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2..n} 2^(k-2) = 1 + Σ_{k=0..n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
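For readability, the closing computation can be written out as a display equation (just a transcription of the sum above):

    1 + \sum_{k=2}^{n} 2^{k-2}
      = 1 + \sum_{k=0}^{n-2} 2^{k}
      = 1 + \left(2^{n-1} - 1\right)
      = 2^{n-1}
      = 2^{(n+1)-2}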
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological ordering of the graph from s to e, and call it the integer interval [s;e]. Delete everything from the graph that isn't strictly inside that interval, meaning all vertices outside of it along with their incident edges. During the topological sort you'll also be able to see whether there is a path from s to e at all. The complexity of this part is O(n+m).
Now the actual algorithm:
traverse the vertices of [s;e] in the order imposed by the topological sorting
for every vertex v, store a two-dimensional array of information; let's call it pre_v[][], since it's going to store information about the predecessors of a node on the paths leading towards it
in pre_v[i][j], store how long the total path of length i (counted in vertices) is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, pre_{s+1}[1][s] would hold the weight of the edge s->s+1, while all other entries in pre_{s+1} would be 0/undefined.
when calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, say vertex v has an incoming edge from vertex w with weight c, and consider what the entry pre_v[i][w] should be. We have an edge w->v, so we need to set pre_v[i][w] to min(pre_w[i-1][k] over all k, ignoring 0/undefined entries) + c (notice the subscript of the array!); we effectively take the cost of a path of length i - 1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i - 1; however, we want to stay below a cost limit, which greedy minimization at each vertex does for us. We will need to do this for all i in [1;v-s].
while calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, paths can only become more costly with each additional edge, so just ignore those.
once you have reached e and finished calculating pre_e, you're done with this part of the algorithm.
iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have e-s edges. Find the largest i such that pre_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n^2 m): the arrays have O(n^2) entries, and we have to iterate over O(m) arrays, one array for each edge. It should be fairly obvious where the wasteful use of data structures here can be optimized, using hashing structures and other things than arrays. Or you could just use a one-dimensional array per vertex and only store the current minimum instead of recomputing it every time (you'll have to store the sum of edge weights of the path together with the predecessor vertex, since you need the predecessor to reconstruct the path); that changes the size of the arrays from n^2 to n, since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing the space complexity of the algorithm down to O(n^2) and the time complexity down to O(nm). You can also try a form of topological sort that gets rid of the vertices from which you can't reach e, because those can safely be ignored as well.
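A condensed C++ sketch of that optimized variant, keeping per vertex a one-dimensional array over path lengths (the cheapest cost per (vertex, number of vertices on the path)). It enforces the cost bound from the first criterion only; the "getting closer to e" condition would additionally have to be checked when relaxing each edge. All names are mine:

    #include <vector>
    #include <climits>
    using namespace std;

    struct Edge { int to; long long w; };

    // minCost[v][i] = cheapest total weight of an s->v path over exactly i vertices.
    // `topo` is assumed to hold only the pruned interval [s;e], topologically sorted.
    int longestAdmissiblePath(int n, const vector<vector<Edge>>& adj,
                              const vector<int>& topo, int s, int e, long long d) {
        const long long INF = LLONG_MAX;
        vector<vector<long long>> minCost(n, vector<long long>(n + 1, INF));
        minCost[s][1] = 0;                        // the path consisting of s alone
        for (int u : topo)
            for (int i = 1; i < n; ++i) {
                if (minCost[u][i] == INF) continue;
                for (const Edge& ed : adj[u]) {
                    long long c = minCost[u][i] + ed.w;
                    if (c < d && c < minCost[ed.to][i + 1])  // drop paths costing >= d
                        minCost[ed.to][i + 1] = c;
                }
            }
        for (int i = n; i >= 1; --i)              // largest vertex count first
            if (minCost[e][i] != INF) return i;
        return -1;                                // no admissible path exists
    }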

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, i.e. edges going down from the root. Keep in mind that each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)
It's possible to improve my previous answer to O(n log d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out on counting it
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t(v) to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t(v) (or equivalently, binary search for m - t(v) instead of m). The insertion of -t(v) at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k have their path length array begin at offset d-k inside the buffer, to leave room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
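A compact C++ sketch of the two passes combined into one recursive function, under the assumptions that node 0 is the root and that a node counts itself (at distance 0); `buf` and `anc` are my names for the shared right-justified buffer and a parallel stack of ancestor ids:

    #include <vector>
    #include <algorithm>
    using namespace std;

    struct Edge { int to; long long w; };

    int d;                           // maximum depth of any node
    long long m;                     // distance threshold
    vector<vector<Edge>> children;   // down edges only
    vector<long long> buf;           // shared right-justified buffer, size d+1
    vector<int> anc;                 // anc[d - depth(u)] = u on the current root path
    vector<long long> cnt;           // c(v): holds exclusions first, final counts afterwards

    // Combined preorder/postorder DFS; t = t(v), the total distance root -> v.
    // (Recursive for clarity; deep trees may need an explicit stack.)
    long long dfs(int v, int depth, long long t) {
        buf[d - depth] = -t;         // "prepend" -t(v) in O(1)
        anc[d - depth] = v;
        // s(v, j) = t + buf[d - depth + j]; find the first j with s(v, j) > m.
        long long *lo = buf.data() + (d - depth), *hi = buf.data() + d + 1;
        long long *p = upper_bound(lo, hi, m - t);   // the modified binary search
        if (p != hi)                 // a(v, j) exists: it must not count v
            --cnt[anc[p - buf.data()]];
        long long c = 1;             // v counts itself (drop this for proper descendants only)
        for (const Edge &e : children[v])
            c += dfs(e.to, depth + 1, t + e.w);
        c += cnt[v];                 // apply exclusions recorded by descendants
        cnt[v] = c;                  // final number of sufficiently-close nodes
        return c;
    }
    // usage: size buf/anc to d+1, cnt to n (all zero), then call dfs(0, 0, 0).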
[EDIT: Please see my other answer for an even more efficient O(n log d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every node has the same number of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is, for each node, climb upwards towards the root until the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and each node has at most d ancestors, so this algorithm is trivially O(nd).
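A short C++ sketch of this climb (counts cover proper descendants only; `parent == -1` marks the root, and the names are illustrative):

    #include <vector>
    using namespace std;

    struct Node { int parent; long long edgeUp; };  // edgeUp = d(v, parent(v))

    vector<long long> countClose(const vector<Node>& tree, long long m) {
        int n = tree.size();
        vector<long long> count(n, 0);
        for (int v = 0; v < n; ++v) {
            long long dist = 0;
            int u = v;                        // climb from v towards the root
            while (tree[u].parent != -1) {
                dist += tree[u].edgeUp;
                if (dist > m) break;          // all higher ancestors are even farther
                u = tree[u].parent;
                ++count[u];                   // v is sufficiently close to u
            }
        }
        return count;
    }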
