Algorithm for finding all critical paths - algorithm

I need to create an algorithm that could find all the critical paths in a graph.
I have found the topological order of the nodes, calculated earliest end times and latest start times for each node.
Also I have found all critical nodes (i.e. the ones that are on a critical path).
The problem is putting it all together and actually printing out all these paths. If there is only 1 critical path in a graph, then I can deal with it but the issues start if there are multiple paths.
For example one node being part of several critical paths, multiple start nodes, multiple end nodes and so on. I have not managed to come up with an algorithm that could take all of these factors into account.
The output I am looking for is something like this (if a,b,c etc are all nodes):
a->e
a->c->f->i->j->k
a->c->g
l->e
It would be nice if someone could write a description of an algorithm that could find the paths from knowing critical nodes, topological order, etc. Or maybe also in C or java code.
EDIT:
Here is an example, which should provide the output I posted earlier. Critical paths are red, the values of each node are marked above or near it.

The computation of latest start times nearly provides the critical path as well. You need to construct the results from the terminal nodes, and go backwards:
find all nodes that have the maximum earliest end time (t=11, nodes = {e,g,k})
for each of them, find all predecessors that have the locally maximum earliest end time.
For e, both l and k have t=2, so you get l->e, a->e. For g, t=6, which is for node c,
so you get c->g. For k, t=10, but only j has this as its end time, so you get j->k.
repeat until you reach the start node. l->e and a->e are already start nodes.
c->g has the only predecessor a, so you get a->c->g. j->k has t=9, which is i's end time,
so you get i->j->k.

I don't understand your question entirely: what is a critical path? You'd might be helped by:
algorithm to calculate Minimum Spanning Tree or Dijkstra's shortest path algorithm - although you might probably already knew these algorithms.

Related

Find min cost from node A to node B and also keep the path info

I got a question which is to find the min cost from the least number node (1) to the largest number node (7).
The cost is the edge between nodes. I labeled them.
This problem got me to think of the Dijkstra which leads the time complexity for O((v+e) log v)
Any other better approach to solving this question efficiently?
The other requirement is to keep the path information, any thought to keep the path?
As others pointed out, the complexity is as you say and cannot be better. As #nico-schertler commented, searching from both sides in parallel (or taking turns) and stopping as soon as something touches is faster than doing just a search from one side, but it will have the same complexity. It is possible in this case (with fixed costs for the bidirectional edges) but it needs not be in the general case (e. g. cost depending on the already taken path) where Dijkstra is still applicable.
Concerning the keeping of the path: Of course, the whole thing often only makes sense if you get the path to be taken as an answer. There are two main approaches to get the path as a result.
One is to store the path already taken to a certain node along with the node in the lists (white/grey in the classical implementation). Each time you add a new node, you extend the path of its former node by one step. If you find the target node, you can directly return the path as a result (along with the cost sum). Of course this way means uses a lot of memory.
The other is to not store the origin node along with each new found node, so each node points to the node it was visited from first. Think of it as putting up signposts in each node how to go back. This way, if you find the target node, you will have to go backwards from each node to the one it was first visited from and build the path in reverse order in the process. Then you can return this path.

Detecting unusual path patterns in a gigantic directed graph

I have a gigantic directed graph (100M+ nodes) of nodes, with multiple path instance records between sets of nodes. the path taken between any two nodes may vary, but what I'd like to find are paths that share multiple intermediary nodes except for a major deviation.
For example, I have 10 instances of a path between node A and node H. Nine of those ten path instances travel through nodes c,d,e,f - but one of the instances travels through c,d,z,e,f - I want to find that "odd" instance.
Any ideas how I would even begin to approach such a problem? Existing analytical frameworks that might be suited to the task?
Details based on comments:
A PIR (path instance record) is a list of nodes traveled through with associated edge traversal times per edge.
Currently, raw PIR records are in a plain string format - obviously, I would want to store it differently based on how I eventually choose to analyze it.
This is not a route solving problem - I never need to find all possible paths; I only need to analyze taken paths (each of which is a PIR).
The list of subpaths needs to be generated from the PIRs.
An example of a PIR would be something like:
nodeA;300;nodeB;600;nodeC;100;nodeD;100;nodeF
This translates to the path of A->B-C->D->F; the cost/time of each vertice is the number - for instance, it cost 300 to go from A->B, 600 to go from B->C, and 100 to go from D->F. The cost/time of each traversal will differ each time the traversal is made. So, for instance, in one PIR, it may cost 100 to go from A->B, but in the next it may cost 150 to go from A->B.
Go through the list of paths and break them up into sets based on the start and end node. So that for example all paths that start with the node A and end with the node B are in the same set. Then you can do the same thing with subsequences of those paths. So that for example every path with the subsequence a,b,c,d and the start node y and the end node k are in the same set. Also reversing paths as required so that for example, you don't have a set for paths k to y and a set for paths y to k. You can then check if a subsequence is common enough followed by checking if the path(s) that don't have that subsequence if there is a subsequence within that path that is sufficiently close to the original sequence based on edit distance. If you are just interested in the path, then you can simply calculate the edit distance of the path and the subsequence, subtract the difference in length, and check if result is low enough. It's probably best to use a subsequence of the path such that it starts and ends with the same node as the desired subsequence.
For your example, the algorithm would eventually reach the set of paths containing the subsequence c,d,e,f, and find that there are 9 of them. This exceeds the amount required for the subsequence to be common enough (and long enough, probably want sequences of at least length k), it would then check the paths that are not included. In this case, there are only one. It would then note, either directly or indirectly, that only only the removal of z, would make the sequence c,d,z,e,f into c,d,e,f. This passes the (currently vague) requirements for "odd", and thus the path containing c,d,z,e,f is added to the list of paths to be returned.

Graph search problem with route restrictions

I want to calculate the most profitable route and I think this is a type of traveling salesman problem.
I have a set of nodes that I can visit and a function to calculate cost for traveling between nodes and points for reaching the nodes. The goal is to reach a fixed known score while minimizing the cost.
This cost and rewards are not fixed and depend on the nodes visited before.
The starting node is fixed.
There are some restrictions on how nodes can be visited. Some simplified examples include:
Node B can only be visited after A
After node C has been visited, D or E can be visited. Visiting at least one is required, visiting both is permissible.
Z can only be visited after at least 5 other nodes have been visited
Once 50 nodes have been visited, the nodes A-M will no longer reward points
Certain nodes can (and probably must) be visited multiple times
Currently I can think of only two ways to solve this:
a) Genetic Algorithms, with the fitness function calculating the cost/benefit of the generated route
b) Dijkstra search through the graph, since the starting node is fixed, although the large number of nodes will probably make that not feasible memory wise.
Are there any other ways to determine the best route through the graph? It doesn't need to be perfect, an approximated path is perfectly fine, as long as it's error acceptable.
Would TSP-solvers be an option here?
With this much weird variation and path-dependence, what you're actually searching is not the graph itself, but the space of paths from the root, which is a tree. If the problem is as general as you say, you're not going to be able to do better than directly searching the "tree-of-paths", saving the best value and the corresponding path. If you can transform it into any way so that there is no such path-dependence, you should probably do so.
If you can't, there are two basic options: breadth-first, which will return the paths in order of length, but at the cost of high memory usage, as there are many temporary paths that must be stored. Depth-first search only needs to store a single path (which can be done entirely as a series of recursive calls), but has no natural stopping point, and is not guaranteed to actually terminate if there is no upper bound on the path size.
If you're lucky enough that the cost increases monotonically with each additional step, you can instead order by cost. The first one that's good enough is the one you then want. Breadth firs search is sometimes implemented by putting the paths to explore on a queue. Change this to a priority queue based on the cost, and you now have a "cost first search", known formally as Uniform-cost search.
If the cost function can decrease by adding on the path, A* search can be modified to do the search, but you no longer have the guarantee that you can stop early.

An algorithm to check if a vertex is reachable

Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
Depth first search or breadth first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can improve the performance sometimes with some heuristics, like if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can attempt to have the depth-first search look along lines that "aim towards the target". However, imagine the case where the start and end points are right next to each other, but with no vector inbetween, and to find it, you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited
While there are nodes in the queue:
If the node is V2, return true
Mark the node as visited
For every node at the end of an outgoing edge which is not yet visited:
Add this node to the queue
End for
End while
Return false
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm which creates an O(nlogn) space data structure that takes O(nlogn) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with start location as the rows and end location as the columns. Fill the arrays with 0. For each edge in the list of edges, place a one in the start,end coordinate of the edge.
Now you iterate a few times until either V1,V2 is 1 or there are no changes.
For each row:
NextRowN = RowN
For each column that is true for RowN
Use boolean OR to OR in the results of that row of that number with the current NextRowN.
Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable by V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows are never changing.
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.

Find the shortest path in a graph which visits certain nodes

I have a undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there's about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: Find all pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j]=INF
for k=1 to n:
for i=1 to n:
for j=1 to n:
d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, its running time is O(n3) for the Floyd-Warshall part, and O(k!n) for the all permutations part, and 100^3+(12!)(100) is practically peanuts unless you have some really restrictive constraints.]
run Djikstra's Algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 inbetween, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that its only overcomplicating things. ^_^ This coming from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard because the original question does not require that must-pass nodes are visited only once. This makes the answer much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go but a simple one would be to simply work a binary tree backwards. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end] this is your new target distance to beat. Now try [start->a->c->b->end] and if that's better set that as the target (and remember that it came from that pattern of nodes). Work backwards over the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care)
Andrew Top has the right idea:
1) Djikstra's Algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known for any NP Complete problem. The only other thing to remember is that after you expanded out the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering the amount of nodes and edges is relatively finite, you can probably calculate every possible path and take the shortest one.
Generally this known as the travelling salesman problem, and has a non-deterministic polynomial runtime, no matter what the algorithm you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about must-pass in ANY order. I have been trying to search for a solution about the defined order of must-pass nodes. I found my answer but since no question on StackOverflow had a similar question I'm posting here to let maximum people benefit from it.
If the order or must-pass is defined then you could run dijkstra's algorithm multiple times. For instance let's assume you have to start from s pass through k1, k2 and k3 (in respective order) and stop at e. Then what you could do is run dijkstra's algorithm between each consecutive pair of nodes. The cost and path would be given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, 3)
How about using brute force on the dozen 'must visit' nodes. You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this
Do an A* search from the start node to every point on the circuit
for each of these do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction)
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere, is whether it is ok for the same vertex to be visited more than once in the path. Most of the answers here assume that it's ok to visit the same edge multiple times, but my take given the question (a path should not visit the same vertex more than once!) is that it is not ok to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.

Resources