How to find longest path in graph? - algorithm

We are given an Adjacency list of the form
U -> (U,V,C) -> (U,V,C) ...
U2 -> ...
U3 -> ...
.
.
etc
(U,V,C) means there's an edge from U to V with cost C.
The given Adjacency list is for a single connected tree with N nodes thus containing N-1 edges.
A set of nodes F = {F1, F2, F3, ..., Fk} is given.
Now the question is what is the best way to find the longest path amongst the nodes in F?
Is it possible to do it in O(N)?
Is DFS from each node in F the only option?

I understood your question as asking to find a pair of nodes from the set F so that the unique path between those two nodes is as long as it can be. The path is unique because your graph is a tree.
The problem can be solved trivially by doing DFS from every node in F as you mention, for an O(n k) solution where n is the size of the graph and k is the size of the set F.
However, you can potentially solve it faster with a divide and conquer approach. Pick any node R from the graph, and use a single DFS to tabulate the distance Dist(R, a) to every other node a and, at the same time, partition the nodes into subtrees S1,...,Sm, where m is the number of edges incident to R; that is, these are the m trees hanging off the root R. Now, for any f and g that belong to different subtrees, the path between them has length Dist(R, f) + Dist(R, g), so the longest such path can be found in O(k^2) time. In addition, you then have to recurse into the subproblems S1,...,Sm to cover the case where the longest path lies inside one of those subtrees. The overall complexity can be lower than O(n k), but the math is left as an exercise to the reader.
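To make the single-DFS-plus-O(k^2)-search step concrete, here is a rough Python sketch of the divide and conquer (my own code, not from the question). It assumes adj[u] is a list of (v, cost) pairs and measures path length by total cost; the recursion into the subtrees is included, but, as above, its complexity analysis is left as an exercise.

    def longest_path_among(adj, nodes, F):
        """Largest total cost of a path between two nodes of F that both lie
        in `nodes` (a connected subtree of the original tree)."""
        F_here = [f for f in F if f in nodes]
        if len(F_here) < 2:
            return 0  # no pair of F-nodes in this subtree

        root = next(iter(nodes))
        # One DFS from `root`: record the distance to every node and which
        # subtree (child of root) it hangs from.
        dist, subtree = {root: 0}, {root: None}
        children = {}
        stack = [(root, None)]
        while stack:
            u, parent = stack.pop()
            for v, c in adj[u]:
                if v == parent or v not in nodes:
                    continue
                dist[v] = dist[u] + c
                subtree[v] = v if u == root else subtree[u]
                children.setdefault(subtree[v], set()).add(v)
                stack.append((v, u))

        # Best pair of F-nodes lying in different subtrees (or one of them = root).
        best = 0
        for i, f in enumerate(F_here):
            for g in F_here[i + 1:]:
                if subtree[f] != subtree[g]:
                    best = max(best, dist[f] + dist[g])

        # Recurse into each subtree for pairs on the same side of the root.
        for members in children.values():
            best = max(best, longest_path_among(adj, members, F))
        return best

    # Top-level call, with `nodes` being the set of all vertices:
    # best_cost = longest_path_among(adj, set(adj), F)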

If I understood your question correctly, you are trying to find the longest cost path in a spanning tree.
You can find the path in just two complete traversals, i.e. O(2N) ~ O(N) for large values of N. Follow the steps below; a small sketch is given after the list.
1. Pick any node in the spanning tree.
2. Run any traversal (DFS or BFS) from that node and find the longest-cost path starting at it. This will not yet be the overall longest-cost path, since you started from an arbitrarily picked node.
3. Run BFS or DFS one more time, starting from the last node of the longest-cost path found in step 2. The longest-cost path you get this time is the longest-cost path in the spanning tree.
You do not have to run DFS from each node.
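A minimal sketch of the two-traversal idea, under this answer's reading of the problem (the longest-cost path anywhere in the tree, i.e. the weighted diameter). It assumes adj[u] holds (v, cost) pairs with non-negative costs; since the graph is a tree, a plain DFS suffices even with weights.

    def farthest(adj, start):
        """Return (node, cost) of the most expensive path starting at `start`."""
        best_node, best_cost = start, 0
        stack = [(start, None, 0)]
        while stack:
            u, parent, cost = stack.pop()
            if cost > best_cost:
                best_node, best_cost = u, cost
            for v, c in adj[u]:
                if v != parent:
                    stack.append((v, u, cost + c))
        return best_node, best_cost

    def tree_diameter(adj, any_node):
        a, _ = farthest(adj, any_node)   # pass 1: farthest node from an arbitrary start
        b, cost = farthest(adj, a)       # pass 2: farthest node from a gives the diameter
        return a, b, cost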

Related

Why can Dijkstra's Algorithm be modified to find K shortest paths?

I am trying to find an intuitive explanation as to why we can generalize Dijkstra's Algorithm to find the K shortest (simple) paths from a single source in a directed, weighted graph with no negative edges. According to Wikipedia, the pseudocode for the modified Dijkstra is as follows:
Definitions:
G(V, E): weighted directed graph, with set of vertices V and set of directed edges E,
w(u, v): cost of directed edge from node u to node v (costs are non-negative).
Links that do not satisfy constraints on the shortest path are removed from the graph
s: the source node
t: the destination node
K: the number of shortest paths to find
P[u]: a path from s to u
B: a heap data structure containing paths
P: set of shortest paths from s to t
count[u]: number of shortest paths found to node u
Algorithm:
P = empty
for all u in V:
    count[u] = 0
insert path P[s] = {s} into B with cost 0
while B is not empty:
    let P[u] be the shortest cost path in B with cost C
    remove P[u] from B
    count[u] = count[u] + 1
    if count[u] <= K then:
        for each vertex v adjacent to u:
            let P[v] be a new path with cost C + w(u, v) formed by concatenating edge (u, v) to path P[u]
            insert P[v] into B
return P
I know that, for the original Dijkstra's Algorithm, one can prove by induction that when a node is added to the closed set (or popped from a heap if it's implemented in the form of BFS + heap), the cost to that node must be minimum from the source.
This algorithm here seems to be based on the fact that when a node is popped for the ith time from the heap, we have the ith smallest cost to it from the source. Why is this true?
The Wiki article doesn't specify, but that code will only solve the 'loopy' version of k-shortest-paths, where paths are not required to be simple.
The simple path version of the problem is harder: you'll want to look at something like Yen's algorithm, which does clever filtering to avoid repeated points when generating paths. Yen's algorithm can use Dijkstra's algorithm as a subroutine, but any other shortest-path algorithm can also be used instead.
There is no obvious way to modify Dijkstra's algorithm to solve the k-shortest-simple-paths problem. You'd need to track the paths in the priority queue (which is already done in your posted code), but there's an exponential upper bound on the number of times each vertex can be explored.
Here, the check "if count[u] <= K" puts an upper bound of K+1 on the number of times a vertex can be explored, which works for the non-simple-path case. On the other hand, a direct modification of Dijkstra's algorithm for simple paths would, in the worst case, require you to explore a node once for each of the 2^(V-1) possibilities for the set of nodes visited before it (or possibly a slightly smaller exponential).
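For concreteness, here is a small Python sketch of the pseudocode above, i.e. the "loopy" variant where paths need not be simple. graph[u] is assumed to be a list of (v, weight) pairs with non-negative weights; the tie-breaking counter only exists so the heap never has to compare paths.

    import heapq, itertools
    from collections import defaultdict

    def k_shortest_loopy(graph, s, t, K):
        count = defaultdict(int)            # times each vertex has been popped
        tie = itertools.count()             # tie-breaker for equal-cost entries
        B = [(0, next(tie), [s])]           # heap of (cost, tie, path)
        result = []
        while B and count[t] < K:
            cost, _, path = heapq.heappop(B)
            u = path[-1]
            count[u] += 1
            if u == t:
                result.append((cost, path))
            if count[u] <= K:
                for v, w in graph.get(u, []):
                    heapq.heappush(B, (cost + w, next(tie), path + [v]))
        return result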

How to find longest increasing subsequence among all simple paths of an unweighted general graph?

Let G = (V, E) be an unweighted general graph in which every vertex v has a weight w(v).
An increasing subsequence of a simple path p in G is a sequence of vertices of p in which the weights of all vertices along this sequence increase. The simple paths can be closed paths.
A longest increasing subsequence (LIS) of a simple path p is an increasing subsequence of p that has the maximum number of vertices.
The question is: how can we find a longest increasing subsequence among all simple paths of G?
Note that the graph is undirected, therefore it is not a directed acyclic graph (DAG).
Here's a very fast algorithm for solving this problem. The longest increasing subsequence in the graph is a subsequence of a path in the graph, and each path must belong purely to a single connected component. So if we can solve this problem on connected components, we can solve it for the overall graph by finding the best solution across all connected components.
Next, think about the case where you're solving this problem for a connected graph G. In that case, the longest increasing subsequence you could find would be formed by sorting the nodes by their weight, then traversing from the lowest-weight node to the second, then to the third, then to the fourth, etc. If there are any ties or duplicates, you can just skip them. In other words, you can solve this problem by
1. Sorting all the nodes by weight,
2. Discarding all but one node of each weight, and
3. Forming an LIS by visiting each node in sequence.
This leads to a very fast algorithm for the overall problem. In time O(m + n), find all connected components. For each connected component, use the preceding algorithm in time O(Sort(n)), where Sort(n) is the time required to sort n elements (which could be Θ(n log n) if you use heapsort, Θ(n + U) for bucket sort, Θ(n lg U) for radix sort, etc.). Then, return the longest sequence you find.
Overall, the runtime is O(m + n + Sort(n)), which beats my previous approach and should be a lot easier to code up.
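If only the length of the LIS is needed, the sort-and-discard-duplicates step boils down to counting distinct weights per connected component. A sketch of the approach described above, assuming adj[u] is a list of neighbours and w[u] is the weight of vertex u (illustrative names):

    def lis_over_all_simple_paths(adj, w):
        seen, best = set(), 0
        for start in adj:
            if start in seen:
                continue
            # Collect one connected component with a DFS.
            component, stack = [], [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                component.append(u)
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            # Per component, the LIS described above visits one node of each
            # distinct weight, so its length is the number of distinct weights.
            best = max(best, len(set(w[u] for u in component)))
        return best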
I had originally posted this answer, which I'll leave up because I think it's interesting:
Imagine that you pick a simple path out of the graph G and look at the longest increasing subsequence of that path. Although the path walks all over the graph and might have lots of intermediary nodes, the longest increasing subsequence of that path really only cares about
the first node on the path that's also a part of the LIS, and
from that point, the next-largest value in the path.
As a result, we can think about forming an LIS like this. Start at any node in the graph. Now, travel to any node in the graph that (1) has a higher value than the current node and (2) is reachable from the current node, then repeat this process as many times as desired. The goal is to do so in a way that gives the longest possible sequence of increasing values.
We can model this process as finding a longest path in a DAG. Each node in the DAG represents a node in the original graph G, and there's an edge from a node u to a node v if
there's a path from u to v in G, and
w(u) < w(v).
This is a DAG because of that second condition, even though the original graph isn't a DAG.
So we can solve this overall problem in a two-step process. First, build the DAG described above. To do so:
Find the connected components of the original graph G and label each node with its connected component number. Time: O(m + n).
For each node u in G, construct a corresponding node u' in a new DAG D. Time: O(n).
For each node u in G, and for each node v in G that's in the same connected component as u, if w(u) < w(v), add an edge from u' to v'. Time: Θ(n^2) in the worst case, Θ(n) in the best case.
Find the longest path in D. This path corresponds to the longest increasing subsequence of any simple path in G. Time: O(m + n).
Overall runtime: Θ(n^2) in the worst case, Θ(m + n) in the best case.
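For comparison, a sketch of that DAG formulation. It assumes comp[u] gives each vertex's connected-component id (e.g. from step 1) and w[u] its weight. Sorting by weight is a valid topological order of D, since every edge goes from a lighter to a heavier vertex, so a simple DP suffices; the double loop mirrors the Θ(n^2) worst case noted above.

    def longest_path_in_D(vertices, w, comp):
        order = sorted(vertices, key=lambda u: w[u])   # topological order of D
        lis = {u: 1 for u in vertices}                 # longest chain ending at u
        for i, v in enumerate(order):
            for u in order[:i]:                        # every lighter node seen so far
                if comp[u] == comp[v] and w[u] < w[v]: # edge u' -> v' exists in D
                    lis[v] = max(lis[v], lis[u] + 1)
        return max(lis.values(), default=0)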

All-pair shortest path for minimum spanning tree

I am trying to solve an algorithm challenge about graphs, which I have managed to break down to the following: Given an undirected spanning tree, find the 2 leaves such that the cost between them is minimal.
Now I know of the Floyd Warshall algorithm that can find all-pair shortest paths with time complexity O(N^3) and space complexity O(N^2). The input of the problem is N = 10^5 so O(N^3) and O(N^2) are too much.
Is there a way to optimize space and time complexity for this problem?
Elaborating on what @Codor said: in an MST there is only one path between any pair of nodes, and that unique path is also the shortest path between them.
To calculate the shortest path between all pairs, you can follow this approach:
1. Find the root of the MST by repeatedly removing leaf nodes until only one or two nodes are left (i.e. find the centre of the tree). Complexity: O(V), i.e. linear time.
2. Choose one of the remaining nodes as the root. Compute the distance of every other node from the root using breadth-first search (BFS). Complexity: O(V + E) ~ O(V) for a tree.
3. Now you can find the distance between any pair of nodes a, b. Find their lowest common ancestor lca(a, b). There are two cases:
   - lca(a, b) = r (the root of the tree): dis(a, b) = dis[a] + dis[b]
   - lca(a, b) = c, where c is not the root: dis(a, b) = dis[a] + dis[b] - 2 * dis[c]
   Here dis(x, y) is the distance between nodes x and y, and dis[x] is the distance of node x from the root.
If the LCA step is implemented using ranked union-find, the complexity is O(h) per pair (a, b), where h is the height of the tree and h = X/2 with X the diameter of the tree. So the total complexity depends on the number of leaf-node pairs.
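A sketch of steps 2 and 3 in Python, with the LCA found by simply walking parent pointers up by depth (still O(h) per query, though without the union-find variant mentioned above). adj[u] is assumed to hold (v, cost) pairs; the names are illustrative.

    from collections import deque

    def preprocess(adj, root):
        dis, depth, parent = {root: 0}, {root: 0}, {root: None}
        q = deque([root])
        while q:
            u = q.popleft()
            for v, c in adj[u]:
                if v not in dis:
                    dis[v] = dis[u] + c        # weighted distance from the root
                    depth[v] = depth[u] + 1    # hop count, used to align the walks
                    parent[v] = u
                    q.append(v)
        return dis, depth, parent

    def distance(a, b, dis, depth, parent):
        x, y = a, b
        while depth[x] > depth[y]:
            x = parent[x]
        while depth[y] > depth[x]:
            y = parent[y]
        while x != y:                          # climb together until the LCA
            x, y = parent[x], parent[y]
        c = x                                  # lowest common ancestor of a and b
        return dis[a] + dis[b] - 2 * dis[c]    # also correct when c is the root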

How can I check if a given spanning tree is a MST?

Suppose we are given a graph G and a spanning tree T of it. How can we check whether this spanning tree is an MST or not?
My suggestion: for each edge i that is not in T, and for every edge j on the unique path in T between the two endpoints of i, we must have w_i >= w_j.
You want to check that for each edge (u, v) not in the MST, the path from u to v in the MST has no edge with weight larger than that of (u, v). For a single vertex in the tree, you can use a single BFS or DFS to find the largest-weight edge on the path to all other nodes, so this gives an O(n^2) algorithm (n searches of O(n) each). You can probably do better by not starting from scratch for each vertex. That said, it may be more efficient to just calculate an MST and see if the sum of all edge weights is the same. As mentioned by @Niklas in the comments, there are linear-time methods for verifying an MST, but they seem to be significantly more involved.
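A sketch of the "just calculate an MST and compare total weights" check, using Kruskal's algorithm with a small union-find. It assumes T is already known to be a spanning tree of G; a spanning tree whose total weight equals the MST weight is itself an MST. The edge format and function names are mine.

    def mst_weight(n, edges):
        """edges: list of (w, u, v) tuples with vertices 0..n-1."""
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        total, used = 0, 0
        for w, u, v in sorted(edges):           # Kruskal: cheapest edges first
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                used += 1
                if used == n - 1:
                    break
        return total

    def is_mst(n, graph_edges, tree_edges):
        return sum(w for w, _, _ in tree_edges) == mst_weight(n, graph_edges)

This costs O(m log m) for the sort, so it is not linear, but it is simple; the linear-time verification methods mentioned above are considerably more involved.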

graph - The implementation of updating Minimum Spanning Tree after adding a new edge

Here is an exercise:
Suppose we are given the minimum spanning tree T of a given graph G (with n vertices and m edges) and a new edge e = (u, v) of weight w that we will add to G. Give an efficient algorithm to find the minimum spanning tree of the graph G + e. Your algorithm should run in O(n) time to receive full credit.
I have this idea:
In the MST, just find out the path between u and v. Then find the edge (along the path) with maximum weight; if the maximum weight is bigger than w, then remove that edge from the MST and add the new edge to the MST.
The tricky part is how to do this in O(n) time; that is also where I get stuck.
The question is how the MST is stored. In the normal Prim's algorithm, the MST is stored as a parent array, i.e., each element is the parent of the corresponding vertex.
So suppose the exercise gives me a parent array describing the MST; how can I realise the above algorithm in O(n)?
First, how can I identify the path between u and v from the parent array? I can build two ancestor arrays, for u and for v, then check for the common ancestor; that gives me the path, although backwards. I think that finding the common ancestor this way takes at least O(n^2), right?
Then we have the path, but we still need to find the weight of each edge along it. Since I suppose the graph uses an adjacency list for Prim's algorithm, we have to spend O(m) (m is the number of edges) to locate the weight of each edge.
...
So I don't see it is possible to do the algorithm in O(n). Am I wrong?
The idea you have is right. Note that finding the path between u and v is O(n). I'll assume you have a parent array identifying the MST. Tracking the path (for the max edge) from u to v, or from u to the root vertex, should take only O(n); if you reach the root vertex, just track the path from v towards u or the root.
Now that you have the path u -> u1 -> ... -> max_path_vert1 -> max_path_vert2 -> ... -> v, remove the edge max_path_vert1 -> max_path_vert2 (assuming its weight is greater than that of the added edge), reverse the parent pointers for u -> ... -> max_path_vert1, and set parent[u] = v.
Edit: More explanation for clarity
Note that in an MST there is exactly one path between any pair of vertices. So, if you trace from u to y and from v to y, you have traced through at most n vertices; if you traced more than n vertices, you would have visited some vertex twice, which cannot happen in an MST. OK, now hopefully you're convinced it's O(n) to trace u->y and v->y. Once you have these traces, you have established a path from u to v. Do you see how? I'm assuming this is an undirected graph, since finding an MST for a directed graph is a different concept in itself. For an undirected graph, when you have a path x->y you also have the path y->x. So u->y->v exists. You don't even need to trace back from y to v, since the weights on v->y are the same as those on y->v. Just find the edge with the maximum weight while you trace u->y and v->y.
Now, for finding edge weights in O(1): how are you storing your current weights, an adjacency list or an adjacency matrix? For O(1) access, store them the way the parent vertex array is stored, i.e. weight[v] = weight(v, parent[v]). Then you have O(1) access. Hope this helps.
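A rough sketch of the whole update in Python, assuming parent and weight are dicts (or arrays) laid out as suggested above, with parent[root] = None and weight[x] = weight(x, parent[x]). The function name and the in-place splice are my own; after the call, parent/weight describe an MST of G + e.

    def add_edge_to_mst(parent, weight, u, v, w):
        # Collect u and all of its ancestors up to the root.
        on_u_path = set()
        x = u
        while x is not None:
            on_u_path.add(x)
            x = parent[x]

        # Walk up from v until we meet that set; the meeting vertex is the LCA,
        # and the two walks together cover the unique u-v path in the tree.
        max_vert, max_w = None, w
        y = v
        while y not in on_u_path:
            if weight[y] > max_w:
                max_vert, max_w = y, weight[y]
            y = parent[y]
        lca = y
        x = u
        while x != lca:
            if weight[x] > max_w:
                max_vert, max_w = x, weight[x]
            x = parent[x]

        if max_vert is None:
            return  # nothing on the path is heavier than w; the MST is unchanged

        # Drop the edge (max_vert, parent[max_vert]) and splice in (u, v):
        # reverse the parent pointers from the new edge's endpoint on that side
        # up to the cut vertex, then hang that endpoint off the other endpoint.
        side, other = (u, v) if max_vert in on_u_path else (v, u)
        new_parent, new_w = other, w
        x = side
        while True:
            old_parent, old_w = parent[x], weight[x]
            parent[x], weight[x] = new_parent, new_w
            if x == max_vert:
                break
            new_parent, new_w = x, old_w
            x = old_parent

Each vertex is touched a constant number of times, which matches the O(n) bound the exercise asks for.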
Well - your solution is correct.
But regarding implementation, I don't see why you are using G instead of T to find the path between u and v. Any search traversal in T for the path between u and v will give you O(n). That is, you can take v as the root and perform a depth-first search [in this case, you will have to treat all neighbours of v as children] and stop the DFS once you find u; the nodes on the stack then correspond to the path between u and v.
It is easy afterwards to find the cost of each edge on the path (O(n)), and it is easy as well to delete/add edges. In total, O(n).
Does that help somehow?
Or maybe you are getting O(n^2), according to my understanding, because you access the children of a vertex v in T in O(n). Here, you have to represent your data structure as a mapped array so that the cost is reduced to O(1) [for instance, {a,b,c,u,w} (vertices) -> {0,1,2,3,4} (indices of the vertices)].
