Single-source shortest path in a graph with positive weights and diameter D - algorithm

In a problem, I am given a graph G with only positive weights and its diameter D (i.e. the greatest of the shortest-path distances over all pairs of vertices in G). The problem asks for a single-source shortest path algorithm that is faster than Dijkstra's and runs in O(V+E+D) time.
What I've considered so far:
I have thought about adding dummy nodes so as to transform G into an unweighted graph G' and then running BFS, but this results in a complexity of O(V+WE), where W is the maximum edge weight (in G', E' = O(WE) and V' = O(WE+V)).
It seems to me that D doesn't really help reduce the complexity of the problem at all, as the sum of the weights (i.e. the total number of dummy nodes to add) is unrelated to D.

Use Dijkstra's algorithm with an optimised version of the priority queue. Assume the graph has nodes 0..V-1.
The priority queue will consist of an array Arr[0..D] (an array with indices between 0 and D inclusive) of doubly-linked lists of nodes, together with an index i indicating that all nodes currently in the queue have priority (tentative distance from the starting node) at least i, and an array location[0..V-1], where location[node] is the doubly-linked-list cell in Arr containing node, or null if there is no such cell. We store a node in the list Arr[s] when we have found a path of length s from the start node to that node.
Adding a node that is not already in the priority queue is O(1): if its tentative distance is s, we add it to the linked list Arr[s] and update location[node] accordingly. Note that if the priority is > D, we should refrain from adding the node to the priority queue entirely, confident that we will later add it with a priority <= D.
Removing a given node from the priority queue is also O(1) - we can find its doubly-linked-list node in O(1) using location[node], delete that node from the doubly-linked-list, and set location[node] to null. We will need this operation when we change the priority of a node.
Finding and removing the minimal node is less trivial. We keep incrementing i until we find some i for which Arr[i] is not empty, then remove any node found in Arr[i] from the priority queue (don't forget to update location[node] as well). The total number of increments over the whole run is at most D, since i only ever moves from 0 towards D, one increment at a time. Ignoring the increments, the remaining work in this operation is O(1).
Note that this is ONLY valid because once we remove a node with priority i, we will never add another node with priority < i to the priority queue. It also works only because we could never actually remove anything that was added with priority > D: a node is removed only once its finalised correct path length is known, and that length is at most D, so it is unnecessary to add anything with priority > D in the first place. This follows from the general properties of Dijkstra's algorithm on graphs with positive edge weights, and from the fact that the graph's diameter is D.
So the algorithm will be O(V + E + D), as required.
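For concreteness, here is a minimal sketch of this bucket-queue scheme (commonly known as Dial's algorithm) in Python, assuming integer edge weights and an adjacency list adj[u] of (neighbour, weight) pairs; the names are hypothetical, and Python sets stand in for the doubly-linked lists plus location[] since they also give O(1) insertion and removal:

    def dial_sssp(n, adj, source, D):
        INF = float('inf')
        dist = [INF] * n
        # arr[s] holds the nodes whose current tentative distance is exactly s;
        # a set replaces the doubly-linked list + location[] of the description.
        arr = [set() for _ in range(D + 1)]
        dist[source] = 0
        arr[0].add(source)
        i = 0
        for _ in range(n):                    # at most one extraction per node
            while i <= D and not arr[i]:
                i += 1                        # at most D+1 increments over the whole run
            if i > D:
                break                         # nothing left within the diameter
            u = arr[i].pop()                  # u's distance i is now final
            for v, w in adj[u]:
                nd = i + w
                if nd < dist[v] and nd <= D:  # never store priorities > D
                    if dist[v] <= D:
                        arr[dist[v]].discard(v)   # O(1) removal on priority change
                    dist[v] = nd
                    arr[nd].add(v)
        return dist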

Related

Specific graph, and the need for a more creative solution

A directed graph (|V|=a, |E|=b) is given.
Each vertex has a specific weight. For each vertex (1..a), we want to find a vertex with maximum weight that is reachable from that vertex.
Update 1: one nice answer was provided by @Paul in O(b + a log a), but I am
searching for an O(a + b) algorithm, if one exists.
Is there any different, more efficient way of doing it?
Yes, it's possible to modify Tarjan's SCC algorithm to solve this problem in linear time.
Tarjan's algorithm uses two node fields to drive its SCC-finding logic: index, which records the order in which the algorithm discovers the nodes; and lowlink, the minimum index reachable by a sequence of tree arcs followed by at most one back arc. As part of the same depth-first traversal, we can compute another field, maxweight, which has one of two meanings:
For a node not yet included in a finished SCC, it represents the maximum weight reachable by a sequence of tree arcs, optionally followed by a cross arc to another SCC and then any subsequent path.
For nodes in a finished SCC, it represents the maximum weight reachable.
The logic for computing maxweight is as follows. If we discover an arc from v to a new node w, then vw is a tree arc, so we compute w.maxweight recursively and update v.maxweight = max(v.maxweight, w.maxweight). If w is on the stack, then we do nothing, because vw is a back arc and not included in the definition of maxweight. Otherwise, vw is a cross arc, and we do the same update that we would have done for a tree arc, just without the recursive call.
When Tarjan's algorithm identifies an SCC, it's because it has a node r with r.lowlink == r.index. Since r is the depth-first search root of this SCC, its value of maxweight is correct for the whole SCC. Instead of recording the members of the SCC, we simply update each popped node's maxweight to r.maxweight.
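A sketch of this modification in Python; the field names mirror the description above, adj and weight are assumed adjacency lists and per-node weights, and a recursive DFS is used for clarity:

    import sys

    def max_reachable_weight(n, adj, weight):
        sys.setrecursionlimit(n + 10)
        UNVISITED = -1
        index = [UNVISITED] * n
        lowlink = [0] * n
        maxweight = [0] * n
        on_stack = [False] * n
        stack, counter = [], [0]

        def strongconnect(v):
            index[v] = lowlink[v] = counter[0]
            counter[0] += 1
            stack.append(v)
            on_stack[v] = True
            maxweight[v] = weight[v]
            for w in adj[v]:
                if index[w] == UNVISITED:     # tree arc: recurse, then fold in
                    strongconnect(w)
                    lowlink[v] = min(lowlink[v], lowlink[w])
                    maxweight[v] = max(maxweight[v], maxweight[w])
                elif on_stack[w]:             # back arc: affects lowlink only
                    lowlink[v] = min(lowlink[v], index[w])
                else:                         # cross arc to a finished SCC
                    maxweight[v] = max(maxweight[v], maxweight[w])
            if lowlink[v] == index[v]:        # v is the root of a finished SCC
                while True:
                    w = stack.pop()
                    on_stack[w] = False
                    maxweight[w] = maxweight[v]   # root's value holds for the whole SCC
                    if w == v:
                        break

        for v in range(n):
            if index[v] == UNVISITED:
                strongconnect(v)
        return maxweight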
Sort all nodes by weight in decreasing order and create the graph g' with all edges of E reversed (i.e. if there's an edge a -> b in g, there's an edge b -> a in g'). In this graph you can now propagate the maximum weight by a simple DFS. Do this iteratively for all nodes, cutting the search short wherever an equal or larger maximum weight has already been assigned.
As code (Python-flavoured, assuming node objects carrying weight, neighbors and max_weight_reachable attributes):

    def dfs_assign_weight_reachable(node, weight):
        # Everything below a node whose assigned weight is already >= ours was
        # reached from an equally heavy or heavier start node: cut off here.
        if node.max_weight_reachable >= weight:
            return
        node.max_weight_reachable = weight
        for n in node.neighbors:              # neighbours in the reversed graph g'
            dfs_assign_weight_reachable(n, weight)

    # g' = g with all edges reversed
    nodes = sorted(g_prime.nodes, key=lambda v: v.weight, reverse=True)
    for node in nodes:
        node.max_weight_reachable = float('-inf')
    for node in nodes:
        dfs_assign_weight_reachable(node, node.weight)
UPDATE:
The tight bound is O(b + a log a). The a log a term comes from the sorting step; each edge is visited once during the reversal step and once while assigning maximum weights, which gives the O(b) term.
Acknowledgement:
I'd like to thank @SerialLazer for the time invested in a discussion about the time complexity of the above algorithm, and for helping me figure out the correct bound.

How does Dijkstra's Algorithm find shortest path?

How can the shortest path be A,C,E,B,D when there is no path between E and B?
Dijkstra's algorithm adds nodes to the queue in the same order as Breadth-First-Search (BFS) does: when a node is tested its immediate neighbors are added to the queue.
The difference is the way nodes are pulled out from the queue. While BFS does it in FIFO (first in first out) sequence, Dijkstra's algorithm does it by priority.
The node with the highest priority is pulled out from the queue. The priority is set by the cost to get from the origin to that node.
When the origin A is tested its immediate neighbors are added to the queue, so the queue holds 2 nodes :
B(10), C(3)
For convenience I added the cost to each node's name.
The next node to be pulled out of the queue and tested is the one with the highest priority = lowest cost, which is C. After testing C the queue looks like this:
B(7), E(5), D(11)
The cost of B was updated from 10 to 7 because a path with a lower cost (A->C->B) was found.
The next node to be pulled out of the queue is E. Testing E does not add any of its neighbors (C, D) to the queue: C has already been tested, and D is already in the queue.
The queue after pulling E out looks like this:
B(7), D(11)
B which has the highest priority (lowest cost from origin) is pulled out from the queue.
Testing B updates the cost of D to 7+2 = 9. Now we have only D in the queue:
D(9)
D is pulled out and, because it is the target, the search stops. The shortest path, with a cost of 9, has been found.
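For reference, here is a minimal sketch in Python of the process just described, using the common lazy-deletion variant in which outdated queue entries are skipped rather than reprioritised in place; it assumes the target is reachable, and the edge weights are those inferred from the walkthrough (E's outgoing edges are omitted since testing E changed nothing):

    import heapq

    def dijkstra(adj, origin, target):
        dist = {origin: 0}
        prev = {}                             # back pointers for path recovery
        queue = [(0, origin)]
        while queue:
            d, u = heapq.heappop(queue)       # lowest cost = highest priority
            if d > dist.get(u, float('inf')):
                continue                      # stale entry: a cheaper path was found
            if u == target:
                break                         # target settled: search stops
            for v, cost in adj[u]:
                nd = d + cost
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(queue, (nd, v))
        path, u = [], target                  # follow back pointers to the origin
        while u != origin:
            path.append(u)
            u = prev[u]
        path.append(origin)
        return dist[target], path[::-1]

    adj = {'A': [('B', 10), ('C', 3)], 'B': [('D', 2)],
           'C': [('B', 4), ('D', 8), ('E', 2)], 'D': [], 'E': []}
    print(dijkstra(adj, 'A', 'D'))            # (9, ['A', 'C', 'B', 'D'])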
In a typical implementation, Dijkstra's algorithm computes the lowest cost from the starting node (in this case A) to all other nodes.
To get the complete path from node A to some other node, we follow the back pointers back to A; this is not shown in the example above.
The nodes in S (the set of settled nodes) are arranged in order of increasing cost from A. I am including a few resources on the topic, which might be helpful:
https://math.mit.edu/~rothvoss/18.304.3PM/Presentations/1-Melissa.pdf
https://www.programiz.com/dsa/dijkstra-algorithm

DAG - Algorithm to ensure there is a single source and a single sink

I have to ensure that a graph in our application is a DAG with a unique source and a unique sink.
Specifically I have to ensure that for a given start node and end node (both of which are known at the outset) that every node in the graph lies on a path from the start node to the end node.
I already have an implementation of Tarjan's algorithm which I use to identify cycles, and a topological sorting algorithm that I can run once Tarjan's algorithm reports the graph is a DAG.
What is the most efficient way to make sure the graph meets this criterion?
If your graph is represented by an adjacency matrix, then node x is a source node if the xth column of the matrix is all 0s and is a sink node if row x of the matrix is all 0s. You could run two quick passes over the matrix to count the number of rows and columns that are all 0s to determine how many sources and sinks exist and what they are. This takes time O(n^2) and is probably the fastest way to check this.
If your graph is represented by an adjacency list, you can find all sink nodes in time O(n) by checking whether each node has any outgoing edges. You can find all sources by maintaining, for each node, a boolean value indicating whether it has any incoming edges, initially false. You then iterate across all the edges in the list in time O(n + m), marking all nodes with incoming edges. The nodes that weren't marked as having incoming edges are then sources. This process takes time O(m + n) and has so little overhead that it's probably one of the fastest approaches.
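As a short sketch of the adjacency-list version (assuming a Python list-of-lists representation; names hypothetical):

    def sources_and_sinks(adj):
        has_incoming = [False] * len(adj)
        for u in range(len(adj)):
            for v in adj[u]:                  # one pass over all m edges
                has_incoming[v] = True
        sources = [u for u in range(len(adj)) if not has_incoming[u]]
        sinks = [u for u in range(len(adj)) if not adj[u]]
        return sources, sinks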
Hope this helps!
A simple breadth- or depth-first search should satisfy this. First, keep a set of the sink nodes you've seen; second, keep a set of the nodes you've discovered using BFS/DFS. The graph will then be connected iff there is a single connected component. Assuming you're using some kind of adjacency-list representation for your graph, the algorithm would look something like this:
    from collections import deque

    def has_single_sink_and_source(graph, source):
        queue = deque([source])
        seen = {source}                       # mark on enqueue so nodes are queued once
        sinks = set()
        while queue:
            vertex = queue.popleft()
            if not graph[vertex]:             # no adjacent nodes: a sink
                sinks.add(vertex)
            else:
                for node in graph[vertex]:
                    if node not in seen:
                        seen.add(node)
                        queue.append(node)
        return len(sinks) == 1 and len(seen) == len(graph)
This will be linear in the number of vertices and edges in the graph.
Note that as long as you know a source vertex to start from, this will also ensure the property of a single source: any other vertex that is a source won't be discovered by the BFS/DFS, and thus the size of the seen vertices won't be the total number in the graph.
Suppose your algorithm takes as input a DAG which is weakly connected, with exactly one node s whose in-degree is zero and exactly one node t whose out-degree is zero, while all other nodes have positive in-degree and out-degree. Then s can reach all other nodes, and all other nodes can reach t.
To see why, walk backwards from any node v along incoming edges: since the graph is finite and acyclic, this walk must terminate at a node of in-degree zero, which can only be s, so s reaches v. The argument for t is symmetric. On the other hand, if the DAG is not weakly connected, it definitely does not satisfy the requirements that you want.
In sum, you can compute the weakly connected component of the DAG simply using BFS/DFS, meanwhile counting the nodes whose in-degree or out-degree is zero. The complexity of this algorithm is O(|E|).
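A sketch of this check in Python, assuming an adjacency-list input that is already known to be acyclic (acyclicity is not re-verified here):

    from collections import deque

    def unique_source_sink_dag(adj):
        n = len(adj)
        indeg = [0] * n
        undirected = [[] for _ in range(n)]
        for u in range(n):
            for v in adj[u]:
                indeg[v] += 1
                undirected[u].append(v)       # keep both directions for the
                undirected[v].append(u)       # weak-connectivity test
        seen = {0}                            # BFS over the undirected version
        queue = deque([0])
        while queue:
            u = queue.popleft()
            for v in undirected[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sources = sum(1 for u in range(n) if indeg[u] == 0)
        sinks = sum(1 for u in range(n) if not adj[u])
        return len(seen) == n and sources == 1 and sinks == 1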
First, do a DFS on the graph, starting from the source node, which, as you say, is known in advance. If you encounter a back edge[1], then you have a cycle and you can exit with an error. During this traversal you can identify if there are nodes, not reachable from the source[2] and take the appropriate action.
Once you have determined the graph is a DAG, you can ensure that every node lies on a path from the source to the sink by another DFS, starting from the source, as follows:
    def have_path(source, sink):
        if source == sink:
            source.flag = True
            return True
        if not succ(source):
            return False        # a dead end other than the sink: no path out
        # traverse all successor nodes of `source`
        for dst in succ(source):
            if not dst.flag and not have_path(dst, sink):
                return False    # exit as soon as we find a node with no path to `sink`
        source.flag = True
        return True
The procedure have_path sets a boolean flag in each node from which there exists some path to the sink. At the same time, the procedure traverses only nodes reachable from the source, and it traverses all nodes reachable from the source. If the procedure returns true, then all the nodes reachable from the source lie on a path to the sink. Unreachable nodes were already handled in the first phase.
[1] an edge linking a node with a greater DFS number to a node with a lesser DFS number that is not yet completely processed, i.e. is still on the DFS stack
[2] e.g. they would not have an assigned DFS number

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, i.e. edges going down from the root. Keep in mind each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance travelled, but with N >= 100000 it's a bit slow. I'm pretty sure the unweighted version could easily be solved with DP, but does anyone know how to solve the weighted one quickly? (Faster than N^2.)
It's possible to improve my previous answer to O(n log d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t(v) to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t(v) (or equivalently, binary search for m - t(v) instead of m). The insertion of -t(v) at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
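Here is a compact recursive sketch of the whole scheme in Python; the names (children, w, c) are assumptions, w[v] is the weight of the edge from v's parent to v, c[u] ends up as the number of descendants of u within distance m, and a real implementation for n >= 100000 would want an iterative DFS:

    import sys
    from bisect import bisect_right

    def count_close_descendants(n, root, children, w, m):
        sys.setrecursionlimit(n + 10)

        def max_depth(u, k):                  # tree depth d, for sizing the buffer
            return max([k] + [max_depth(v, k + 1) for v in children[u]])
        d = max_depth(root, 0)

        buf = [0] * (d + 1)       # shared right-justified buffer of -t(ancestor) values
        anc = [root] * (d + 1)    # anc[k] = node at depth k on the current DFS path
        c = [0] * n

        def dfs(v, depth, t):     # t = t(v), total path length from the root to v
            # s(v) occupies buf[d-depth .. d]; the entry at index j encodes the
            # path length t + buf[j] from v up to the ancestor at depth d - j.
            buf[d - depth] = -t
            anc[depth] = v
            # Binary-search for m - t: first entry whose path length exceeds m.
            idx = bisect_right(buf, m - t, d - depth, d + 1)
            if idx <= d:
                c[anc[d - idx]] -= 1          # first ancestor for which v is too far
            for u in children[v]:
                dfs(u, depth + 1, t + w[u])
                c[v] += c[u] + 1              # child's subtree count, plus the child

        dfs(root, 0, 0)
        return c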
[EDIT: Please see my other answer for an even more efficient O(n log d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every internal node has the same number b >= 2 of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is for each node, climb upwards towards the root until such time as the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and for each node there are at most d ancestors, so this algorithm is trivially O(nd).
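The same climbing procedure as a short Python sketch, with the tree given as a parent array (parent[root] is None) and w[v] the weight of the edge from v's parent to v; these names are assumptions:

    def count_close_descendants_simple(parent, w, m):
        n = len(parent)
        c = [0] * n
        for v in range(n):
            u = parent[v]
            total = w[v]                      # path length from v up to u so far
            while u is not None and total <= m:
                c[u] += 1                     # v is sufficiently close to ancestor u
                if parent[u] is None:
                    break                     # reached the root: stop climbing
                total += w[u]
                u = parent[u]
        return c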

How to find the maximum-weight path between two vertices in a DAG?

In a DAG G, with non-negative weighted edges, how do you find the maximum-weight path between two vertices in G?
Thank you guys!
You can solve this in O(n + m) time (where n is the number of nodes and m is the number of edges) using a topological sort. Begin by doing a topological sort on the reverse graph, so that you have all the nodes ordered in a way such that no node is visited before all its children have been visited.
Now, we're going to label all the nodes with the weight of the highest-weight path starting with that node. This is done based on the following recursive observation:
The weight of the highest-weight path starting from a sink node (any node with no outgoing edges) is zero, since the only path starting from that node is the length-zero path of just that node.
The weight of the highest-weight path starting from any other node is given by the maximum weight of any path formed by following an outgoing edge to a node, then taking the maximum-weight path from that node.
Because we have the nodes reverse-topologically sorted, we can visit all of the nodes in an order that guarantees that whenever we follow an edge and look up the cost of the heaviest path at that edge's endpoint, the maximum-weight path starting at that endpoint has already been computed. This means that once we have the reverse topological order, we can apply the following algorithm to all the nodes in that order:
If the node has no outgoing edges, record the weight of the heaviest path starting at that node (denoted d(u)) as zero.
Otherwise, for each edge (u, v) leaving the current node u, compute l(u, v) + d(v), and set d(u) to be the largest value attained this way.
Once we've done this step, we can make one last pass over all the nodes and return the highest value of d attained by any node.
The runtime of this algorithm can be analyzed as follows. Computing a topological sort can be done in O(n + m) time using many different methods. When we then scan over each node and each outgoing edge from each node, we visit each node and edge exactly once, spending O(n) time on the nodes and O(m) time on the edges. Finally, we make one last O(n) pass over the nodes to find the highest value of d attained by any node. This gives a grand total of O(n + m) time, which is linear in the size of the input.
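A sketch of the procedure in Python; here Kahn's algorithm produces a forward topological order, and iterating over it in reverse is equivalent to the reverse-graph sort described above (names hypothetical):

    from collections import deque

    def heaviest_path_weight(n, edges):
        adj = [[] for _ in range(n)]
        indeg = [0] * n
        for u, v, wt in edges:
            adj[u].append((v, wt))
            indeg[v] += 1
        order = []                            # forward topological order (Kahn)
        queue = deque(u for u in range(n) if indeg[u] == 0)
        while queue:
            u = queue.popleft()
            order.append(u)
            for v, _ in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        d = [0] * n                           # d[u] = weight of heaviest path from u
        for u in reversed(order):             # all successors settled before u
            for v, wt in adj[u]:
                d[u] = max(d[u], wt + d[v])
        return max(d)                         # final pass over all nodes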
A simple brute-force algorithm can be written using recursive functions.
Start with an empty vector (in C++: std::vector) and insert the first node.
Then call your recursive function with the vector as argument that does the following:
loop over all neighbours, and for each neighbour:
    copy the vector
    add the neighbour
    call ourselves recursively
Also pass the total weight as an argument to the recursive function, adding the edge weight in every recursive call.
The function should stop whenever it reaches the end node. Then compare the total weight with the maximum weight you have so far (use a global variable) and if the new total weight is bigger, set the maximum weight and store the vector.
The rest is up to you.
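A Python rendering of this brute force (exponential in the worst case, so only suitable for small graphs; list copies play the role of the copied vectors, and the names are assumptions):

    def max_weight_path(adj, start, end):
        best = [float('-inf'), None]          # stands in for the global variables

        def recurse(path, total):
            node = path[-1]
            if node == end:                   # stop whenever we reach the end node
                if total > best[0]:
                    best[0], best[1] = total, path
                return
            for v, wt in adj[node]:
                recurse(path + [v], total + wt)   # copy the vector, add the neighbour

        recurse([start], 0)
        return best[0], best[1]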
