How to find longest increasing subsequence among all simple paths of an unweighted general graph? - algorithm

Let G = (V, E) be an unweighted general graph in which every vertex v has a weight w(v).
An increasing subsequence of a simple path p in G is a subsequence of the vertices of p whose weights strictly increase along the sequence. The simple paths may be closed paths.
A longest increasing subsequence (LIS) of a simple path p is an increasing subsequence of p with the maximum number of vertices.
The question is: how do we find a longest increasing subsequence among all simple paths of G?
Note that the graph is undirected, therefore it is not a directed acyclic graph (DAG).

Here's a very fast algorithm for solving this problem. The longest increasing subsequence in the graph is a subsequence of a path in the graph, and each path must belong purely to a single connected component. So if we can solve this problem on connected components, we can solve it for the overall graph by finding the best solution across all connected components.
Next, think about the case where you're solving this problem for a connected graph G. In that case, the longest increasing subsequence you could find would be formed by sorting the nodes by their weight, then traversing from the lowest-weight node to the second, then to the third, then to the fourth, etc. If there are any ties or duplicates, you can just skip them. In other words, you can solve this problem by
Sorting all the nodes by weight,
Discarding all but one node of each weight, and
Forming an LIS by visiting each node in sequence.
This leads to a very fast algorithm for the overall problem. In time O(m + n), find all connected components. For each connected component, use the preceding algorithm in time O(Sort(n)), where Sort(n) is the time required to sort n elements (which could be Θ(n log n) if you use heapsort, Θ(n + U) for bucket sort, Θ(n lg U) for radix sort, etc.). Then, return the longest sequence you find.
Overall, the runtime is O(m + n + Sort(n)), which beats my previous approach and should be a lot easier to code up.
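Here is a minimal Python sketch of that approach, assuming (hypothetically) that the graph is given as an adjacency dictionary adj and the weights as a dictionary w. Since the claimed optimum visits each distinct weight of a component once in sorted order, the length of the answer is simply the largest number of distinct weights in any connected component; the explicit sorting step is only needed if you want to output the sequence itself.

from collections import deque

def best_lis_length(adj, w):
    # adj: {vertex: iterable of neighbours}, w: {vertex: weight} (hypothetical names).
    # Per the argument above, within one connected component the best increasing
    # subsequence uses each distinct weight once, so its length is the number of
    # distinct weights in that component.
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        weights = set()
        while queue:                      # BFS over one connected component
            u = queue.popleft()
            weights.add(w[u])
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, len(weights))
    return best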
I had originally posted this answer, which I'll leave up because I think it's interesting:
Imagine that you pick a simple path out of the graph G and look at the longest increasing subsequence of that path. Although the path walks all over the graph and might have lots of intermediary nodes, the longest increasing subsequence of that path really only cares about
the first node on the path that's also a part of the LIS, and
from that point, the next-largest value in the path.
As a result, we can think about forming an LIS like this. Start at any node in the graph. Now, travel to any node in the graph that (1) has a higher value than the current node and (2) is reachable from the current node, then repeat this process as many times as desired. The goal is to do so in a way that gives the longest possible sequence of increasing values.
We can model this process as finding a longest path in a DAG. Each node in the DAG represents a node in the original graph G, and there's an edge from a node u to a node v if
there's a path from u to v in G, and
w(u) < w(v).
This is a DAG because of that second condition, even though the original graph isn't a DAG.
So we can solve this overall problem in a two-step process. First, build the DAG described above. To do so:
Find the connected components of the original graph G and label each node with its connected component number. Time: O(m + n).
For each node u in G, construct a corresponding node u' in a new DAG D. Time: O(n).
For each node u in G, and for each node v in G that's in the same connected component as u, if w(u) < w(v), add an edge from u' to v'. Time: Θ(n²) in the worst case, Θ(n) in the best case.
Find the longest path in D. This path corresponds to the longest increasing subsequence of any simple path in G. Time: O(m + n).
Overall runtime: Θ(n²) in the worst case, Θ(m + n) in the best case.
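For completeness, here is a compact Python sketch of this DAG-based version, folding the DAG construction and the longest-path step together (same hypothetical adj / w inputs as above); processing vertices in increasing weight order serves as a topological order of D, and the doubly nested loop gives the Θ(n²) worst case mentioned.

def lis_via_component_dag(adj, w):
    # Label every vertex with its connected component using an iterative DFS.
    comp = {}
    for s in adj:
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for x in adj[u]:
                if x not in comp:
                    comp[x] = s
                    stack.append(x)
    # In the derived DAG D there is an edge u' -> v' iff comp[u] == comp[v]
    # and w(u) < w(v), so sorting by weight is a topological order of D.
    longest = {u: 1 for u in adj}          # longest D-path ending at u
    for v in sorted(adj, key=lambda x: w[x]):
        for u in adj:                      # check every potential predecessor of v
            if comp[u] == comp[v] and w[u] < w[v]:
                longest[v] = max(longest[v], longest[u] + 1)
    return max(longest.values(), default=0)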

Related

Describing an algorithm with at most O(nm log n) run time

Suppose I had to give an O(|V|³) algorithm that takes as input a directed graph with positive edge lengths and returns the length of the shortest cycle in the graph (if the graph is acyclic, it should say so). I know that it would be:
Let G be a graph and define a matrix D_ij that stores the length of the shortest path from vertex i to vertex j. For a pair of vertices u, v there are two shortest paths, one in each direction, and the length of the cycle through both is D_uv + D_vu. It is then enough to compute the minimum of D_uv + D_vu over all pairs of vertices u and v.
Could I write this in a way that makes it at most O(nm log n) (where n is the number of vertices and m is the number of edges) instead of O(|V|³)?
Yes, in fact this problem can be solved in O(nm) according to a conference paper by Orlin and Sedeño-Noda (2017), titled An O(nm) time algorithm for finding the min length directed cycle in a graph:
In this paper, we introduce an O(nm) time algorithm to determine the minimum length directed cycle (also called the "minimum weight directed cycle") in a directed network with n nodes and m arcs and with no negative length directed cycles.
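As a concrete illustration of the O(nm log n) bound the question asks about (not the O(nm) algorithm of the cited paper), here is a hedged Python sketch: run Dijkstra from every vertex and close each candidate cycle with an arc back into the source. The input format (adjacency lists of (target, weight) pairs) is an assumption.

import heapq

def min_directed_cycle(n, adj):
    # adj[u]: list of (v, c) arcs with positive weight c (assumed input format).
    # Returns the length of the shortest directed cycle, or None if the graph
    # is acyclic. One Dijkstra per vertex gives O(n m log n) overall.
    INF = float("inf")
    best = INF
    for s in range(n):
        dist = [INF] * n
        dist[s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, c in adj[u]:
                if d + c < dist[v]:
                    dist[v] = d + c
                    heapq.heappush(heap, (d + c, v))
        # Every arc u -> s closes a cycle of length dist[u] + c through s.
        for u in range(n):
            if dist[u] < INF:
                for v, c in adj[u]:
                    if v == s:
                        best = min(best, dist[u] + c)
    return None if best == INF else best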

Algorithm to traverse k nodes of an undirected, weighted graph (and return to the origin) at the lowest cost

I am looking for an algorithm to do the following:
In an undirected, weighted graph with cycles
-find a path that visits exactly k nodes
-minimize the total cost(weight)
-each node can be visited only once
-return to the origin
edit: The start (and end) vertex is set in advance.
If I wanted to visit all nodes, the Traveling Salesman algorithm (and all its variations) would work. But in my case, the "salesman" needs to head home after visiting k nodes.
Both approximate and exact algorithms are fine in this case.
Since your problem includes the TSP (for k = n) as a special case, in general it will be NP-complete. For small k you can adapt the dynamic programming solution of Bellman (1962) to solve it in time O(2^k n^3).
Let T(u,S) be the length of the shortest route starting at vertex u with vertices in S visited already. Then you want the smallest of T(u0,{u0}) over all starting vertices u0. T satisfies the recurrence
T(u,S) = min { d(u,v)+T(v,S+{v}) | v in V\S } if |S|<k
T(u,S) = d(u,u0) if |S|=k
for distances d(u,v). The DP table has 2^k · n entries, each entry takes O(n) time to compute, and you have to compute the table n times, once for each starting vertex.
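A direct, unoptimized Python transcription of this recurrence might look as follows (a sketch only; d is assumed to be a full distance matrix, u0 the fixed start vertex, and k counts the visited nodes including u0).

from functools import lru_cache

def shortest_k_tour(d, u0, k):
    # d[u][v]: distance between u and v; returns the cheapest cost of visiting
    # exactly k distinct nodes (u0 included) and returning to u0.
    n = len(d)

    @lru_cache(maxsize=None)
    def T(u, S):                 # S: frozenset of vertices visited so far
        if len(S) == k:          # k nodes visited: close the tour back to u0
            return d[u][u0]
        return min((d[u][v] + T(v, S | frozenset([v]))
                    for v in range(n) if v not in S),
                   default=float("inf"))

    return T(u0, frozenset([u0]))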

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer to e than the previous one (that is, as you traverse the path you keep getting closer to your destination e, measured by the total edge weight of the remaining path).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, and call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topological sort, you'll also be able to see whether there is a path from s to e, so you'll know whether there are any paths s->...->e at all. Complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.
For every vertex v, store a two-dimensional array of information; let's call it prev[][], since it's going to store information about the predecessors of a node on the paths leading towards it.
In prev[i][j], store how long the total path of length i (counted in vertices) is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, prev_{s+1}[1][s] would have the weight of the edge s->s+1 in it, while all other entries in prev_{s+1} would be 0/undefined.
When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays for the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w, having weight c. Consider what the entry prev[i][w] should be. We have an edge w->v, so we need to set prev[i][w] in v to min(prev_w[i-1][k] over all k, ignoring entries that are 0) + c (notice the subscript of the array!); we effectively take the cost of a path of length i - 1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i - 1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We will need to do this for all i in [1; v-s].
While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, we can only get more costly paths with each edge, so just ignore those.
Once you have reached e and finished calculating prev_e, you're done with this part of the algorithm.
Iterate over prev_e, starting with prev_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have e-s edges. Find the largest i such that prev_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n^2 m) - the arrays have O(n^2) entries, and we have to iterate over O(m) arrays, one array for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex, though, since you need to know the predecessor to reconstruct the path). That would change the size of the arrays from n^2 to n, since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing the space complexity of the algorithm down to O(n^2) and the time complexity to O(nm). You can also try to do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.
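To make the O(nm)-time variant concrete, here is a hedged Python sketch (input format assumed: n vertices numbered 0..n-1, edges given as (u, v, c) arcs with positive weight c). It keeps, for every vertex and every path length i, only the cheapest way of reaching that vertex on i vertices, discards anything whose cost is not below d, and omits path reconstruction.

from collections import deque

def longest_bounded_path(n, edges, s, e, d):
    # Returns the maximum number of nodes on an s->e path whose total weight
    # is less than d, or None if no such path exists.
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, c in edges:
        if c > 0:                      # weight-0 edges can never help (see step 1)
            adj[u].append((v, c))
            indeg[v] += 1
    # Kahn's algorithm for a topological order.
    order, q = [], deque(u for u in range(n) if indeg[u] == 0)
    while q:
        u = q.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    INF = float("inf")
    # best[v][i] = cheapest total weight of an s->v path on exactly i vertices.
    best = [[INF] * (n + 1) for _ in range(n)]
    best[s][1] = 0
    for u in order:
        for v, c in adj[u]:
            for i in range(1, n):
                cand = best[u][i] + c
                if cand < d and cand < best[v][i + 1]:   # prune anything not below d
                    best[v][i + 1] = cand
    for i in range(n, 0, -1):
        if best[e][i] < INF:
            return i
    return None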

How to find longest path in graph?

We are given an Adjacency list of the form
U -> (U,V,C) -> (U,V,C) ...
U2 -> ...
U3 -> ...
.
.
etc
(U,V,C) means there's an edge from U to V with cost C.
The given Adjacency list is for a single connected tree with N nodes thus containing N-1 edges.
A set of nodes F = {F1, F2, F3, ..., Fk} is given.
Now the question is what is the best way to find the longest path amongst the nodes in F?
Is it possible to do it in O(N)?
Is DFS from each node in F the only option?
I understood your question as asking to find a pair of nodes from the set F so that the unique path between those two nodes is as long as it can be. The path is unique because your graph is a tree.
The problem can be solved trivially by doing DFS from every node in F as you mention, for an O(n k) solution where n is the size of the graph and k is the size of the set F.
However, you can solve it potentially faster by a divide and conquer approach. Pick any node R from the graph, and use a single DFS to tabulate distances Dist(R, a) to every other node a, and at the same time partition the nodes into subtrees S1,...,Sm, where m is the number of edges incident to R; that is, these are the m trees hanging at the root R. Now, for any f and g that belong to different subtrees, the path between them has length Dist(R, f) + Dist(R, g), so it is possible to search for the longest such path in O(k^2) time. In addition, you then have to recurse into the subproblems S1,...,Sm to cover the case where the longest path is inside one of those trees. The overall complexity can be lower than O(n k), but the math is left as an exercise to the reader.
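For reference, a minimal Python sketch of the trivial O(n k) baseline mentioned above (one DFS per node of F; adj[u] is assumed to hold (neighbour, cost) pairs of the tree and F the special nodes):

def farthest_pair_in_F(adj, F):
    # Returns the largest path cost between any two nodes of F in the tree.
    F = set(F)
    best = 0
    for f in F:
        dist = {f: 0}
        stack = [f]                     # iterative DFS from f
        while stack:
            u = stack.pop()
            for v, c in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + c
                    stack.append(v)
        best = max(best, max(dist[g] for g in F))
    return best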
If I understood your question correctly, you are trying to find the longest cost path in a spanning tree.
You can find the path with just 2 complete traversals, i.e., O(2N) ~ O(N) for large values of N.
You should do the steps below.
Pick any node in the spanning tree.
Run any algorithm (DFS or BFS) from that node and find the longest cost path from it. This will not be your longest cost path, as you started by picking a node at random.
Run BFS or DFS one more time, from the last node of the longest cost path found at step 2. The longest cost path you get this time will be the longest cost path in the spanning tree.
You do not have to run DFS from each node.
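As I read it, these steps compute the (weighted) diameter of the tree; a minimal Python sketch under that assumption, with adj[u] again holding (neighbour, cost) pairs:

def tree_diameter_two_passes(adj):
    def farthest_from(src):
        # DFS recording the distance from src to every node; paths in a tree
        # are unique, so the farthest node found this way is correct.
        dist = {src: 0}
        stack = [src]
        far, far_d = src, 0
        while stack:
            u = stack.pop()
            for v, c in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + c
                    if dist[v] > far_d:
                        far, far_d = v, dist[v]
                    stack.append(v)
        return far, far_d

    start = next(iter(adj))           # step 1: pick any node
    x, _ = farthest_from(start)       # step 2: farthest node from it
    _, diameter = farthest_from(x)    # step 3: farthest from x gives the answer
    return diameter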

graph - The implementation of updating Minimum Spanning Tree after adding a new edge

Here is an exercise:
Suppose we are given the minimum spanning tree T of a given graph G
(with n vertices and m edges) and a new edge e = (u, v) of weight w
that we will add to G. Give an efficient algorithm to find the minimum
spanning tree of the graph G + e. Your algorithm should run in O(n)
time to receive full credit.
I have this idea:
In the MST, just find out the path between u and v. Then find the edge (along the path) with maximum weight; if the maximum weight is bigger than w, then remove that edge from the MST and add the new edge to the MST.
The tricky part is how to do this in O(n) time, and that is where I get stuck.
The question is how the MST is stored. In the normal Prim's algorithm, the MST is stored as a parent array, i.e., each element is the parent of the corresponding vertex.
So suppose the exercise gives me a parent array indicating the MST; how can I realize the above algorithm in O(n)?
First, how can I identify the path between u and v from the parent array? I can build two ancestor arrays for u and v, then check for the common ancestor, and then I can get the path, although backwards. I think for this part, just to find the common ancestor, I have to spend at least O(n^2), right?
Then, we have the path. But we still need to find the weight of each edge along the path. Since I suppose the graph uses an adjacency list for Prim's algorithm, we have to spend O(m) (m is the number of edges) to locate the weight of each edge.
...
So I don't see how it is possible to do the algorithm in O(n). Am I wrong?
The idea you have is right. Note that finding the path between u and v is O(n). I'll assume you have a parent array identifying the MST. Tracking the path (for the max edge) from u to v, or from u to the root vertex, should take only O(n). If you reach the root vertex, just track the path from v to u or to the root vertex.
Now that you have the path u -> u1 -> ... -> max_path_vert1 -> max_path_vert2 -> ... -> v, remove the edge max_path_vert1 -> max_path_vert2 (assuming its weight is greater than that of the added edge), reverse the parents for u -> ... -> max_path_vert1, and mark parent[u] = v.
Edit: More explanation for clarity
Note that in an MST there will be exactly one path between any pair of vertices. So, if you trace from u->y and from v->y, you have traced through at most n vertices. If you traced more than n vertices, that means you visited a vertex twice, which will not happen in an MST. OK, now hopefully you're convinced it's O(n) to trace from u->y and v->y. Once you have these paths, you have established a path from u->v. Do you see how? I'm assuming this is an undirected graph, since finding an MST of a directed graph is a different concept in itself. For an undirected graph, when you have a path x->y you also have the path y->x. So u->y->v exists. You don't even need to trace back from y->v, since the weights for v->y are the same as those for y->v. Just find the edge with the maximum weight while you trace from u->y and v->y.
Now for finding edge weights in O(1): how are you storing your current weights? Adjacency list or adjacency matrix? For O(1) access, store them the way the parent vertex array is stored, so that weight[v] = weight(v, parent[v]). Then you'll have O(1) access. Hope this helps.
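A small Python sketch of that parent-array idea (assumed representation: parent[x] is the MST parent of x with parent[root] = None, and weight[x] is the weight of the edge (x, parent[x])); it returns the maximum edge weight on the unique MST path between u and v in O(n):

def max_edge_on_mst_path(parent, weight, u, v):
    # Walk from u up to the root, remembering for every ancestor the heaviest
    # edge seen on the way from u to that ancestor.
    anc = {}
    x, heaviest = u, float("-inf")
    while x is not None:
        anc[x] = heaviest
        if parent[x] is not None:
            heaviest = max(heaviest, weight[x])
        x = parent[x]
    # Walk from v upwards until we hit one of u's ancestors (the meeting vertex).
    x, heaviest = v, float("-inf")
    while x not in anc:
        heaviest = max(heaviest, weight[x])
        x = parent[x]
    return max(heaviest, anc[x])

If the value returned is larger than the weight w of the new edge (u, v), drop that heaviest edge and add the new edge, as described in the question.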
Well - your solution is correct.
But regarding implementation, I don't see why you are using G instead of T to find the path between u and v. Using any search traversal in T for the path between u and v will give you O(n). That is, you can take v as the root and perform a depth-first search [in this case, you will have to treat all neighbors of v as children], stopping the DFS once you find u; the nodes on the stack then correspond to the path between u and v.
It is easy afterward to find the cost of each edge in the path (O(n)), and it is easy as well to delete/add edges. In total O(n).
Does that help somehow?
Or maybe you are getting O(n^2) - according to my understanding - because you access the children of a vertex v in T in O(n). Here, you have to represent your data structure as a mapped array so that the cost is reduced to O(1) [for instance, {a,b,c,u,w} (vertices) -> {0,1,2,3,4} (indices of vertices)].
