Probability and shortest path algorithm

Suppose I have a directed network with "n" nodes and "d" arcs, where p(d) represents the probability that a package will arrive safely along arc d. Multiplying the probabilities of all arcs the package traverses on its path gives the probability of the package arriving safely at its destination.
Is there a formula that would allow us to maximize the probability the package arrives safely in the form of a shortest path problem?

Set up a graph where the weight on each arc d is -log(p(d)).
Then solve the shortest path problem, which finds the path with the smallest sum of weights.
This sum is:
-log(p(d0))-log(p(d1))-log(p(d2))... = -log(p(d0)*p(d1)*p(d2)...)
Therefore the smallest sum in neg log space is equivalent to the largest probability.
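As a minimal sketch of that reduction (the arc list, node names, and the use of networkx are my own illustrative choices, not part of the question):

    import math
    import networkx as nx

    # Hypothetical arcs: (tail, head, probability that the package survives this arc).
    arcs = [("A", "B", 0.90), ("B", "D", 0.80), ("A", "C", 0.95), ("C", "D", 0.90)]

    G = nx.DiGraph()
    for u, v, p in arcs:
        G.add_edge(u, v, weight=-math.log(p))   # -log(p) >= 0 because 0 < p <= 1

    path = nx.dijkstra_path(G, "A", "D")        # safest path, here ['A', 'C', 'D']
    cost = nx.dijkstra_path_length(G, "A", "D")
    print(path, math.exp(-cost))                # survival probability of that path

Since -log(p) is non-negative for p in (0, 1], Dijkstra's non-negativity requirement is satisfied.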

You can also organize this as a variation of an MDP, where each node has an edge leading to a dummy state that only has a self-loop.
This state represents the package being damaged/compromised, and the edges leading to it have a very high cost.
The rest is fairly straightforward: give the edges leading to the "damaged" state (including its self-loop) a high cost (e.g. 100), give the other edges a low cost (e.g. 1), and give the edge leading to the destination state a large negative cost (e.g. -500), with the transition probabilities taken from the arc survival probabilities.
Once you've set up the problem you can run Policy Iteration or Value Iteration to obtain the best policy (which edge to take from each node) to send the package to the destination. Both will converge and give you the safest possible path.
This is quite easy if you're familiar with MDPs.
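For illustration, here is a rough value-iteration sketch of that formulation. The arc list, the discount factor, and the exact cost bookkeeping are my own choices made so the example runs and converges; only the 100 / 1 / -500 constants come from the description above.

    # Hypothetical arcs: (tail, head, probability of safe traversal); "D" is the destination.
    arcs = [("A", "B", 0.90), ("B", "D", 0.80), ("A", "C", 0.95), ("C", "D", 0.90)]
    DEST, DAMAGED = "D", "DAMAGED"
    GAMMA = 0.99   # discount factor (my addition) so the infinite-horizon values converge

    states = {DAMAGED, DEST} | {u for u, _, _ in arcs} | {v for _, v, _ in arcs}
    out = {s: [] for s in states}
    for u, v, p in arcs:
        out[u].append((v, p))

    def q(V, v, p):
        # Expected discounted cost of choosing the arc to v, which succeeds with probability p.
        edge_cost = -500 if v == DEST else 1      # large negative cost (reward) for reaching DEST
        return p * (edge_cost + GAMMA * V[v]) + (1 - p) * (100 + GAMMA * V[DAMAGED])

    V = {s: 0.0 for s in states}
    for _ in range(2000):                         # value iteration (minimising expected cost)
        V = {s: 0.0 if s == DEST
                 else 100 + GAMMA * V[s] if s == DAMAGED or not out[s]   # costly absorbing loop
                 else min(q(V, v, p) for v, p in out[s])
             for s in states}

    policy = {s: min(out[s], key=lambda a: q(V, *a))[0]
              for s in states if s != DEST and out[s]}
    print(policy)                                 # best next hop from every node, e.g. {'A': 'C', ...}

The specific cost constants affect the trade-off the policy makes, so treat them as tunable.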

Related

Good algorithm for finding shortest path for specific vertices

I'm solving the problem described below and can't think of a better approach than brute-forcing every combination of one vertex from each group.
I'm given a graph of vertices, along with a list of groups of specific vertices. The goal is to find the shortest path from a specific starting vertex to a specific ending vertex, and the path must pass through at least one vertex from each specified group.
There are also vertices in the graph that are not part of any given group.
Re-visiting vertices and edges is possible.
The graph data is specified as follows:
Vertex list - each vertex is identified by a sequence number (0 to the number of vertices - 1)
Edge list - list of vertex pairs (by vertex number)
Vertex group list - list of lists of vertex numbers
A specific starting and ending vertex.
I would be grateful for any ideas for a better solution, thank you.
Summary:
We can use bitmasks to efficiently check which groups we have visited so far, and combine this with a traditional BFS/Dijkstra shortest-path algorithm.
If we assume E edges, V vertices, and K vertex-groups that have to be included, the below algorithm has a time complexity of O((V + E) * 2^K) and a space complexity of O(V * 2^K). The exponential 2^K term means it will only work for a relatively small K, say up to 10 or 20.
Details:
First, are the edges weighted?
If yes, then a "shortest path" algorithm will usually be a variation of Dijkstra's algorithm, in which we keep a (min) priority queue of the shortest paths. We only visit a node once it is at the top of the queue, meaning that this must be the shortest path to this node; any other, shorter path to this node would already have been added to the priority queue and would have come up before the current iteration. (Note: this doesn't work with negative edge weights.)
If no, meaning all edges have the same weight, then there is no need to maintain a priority queue of shortest paths. We can instead run a regular breadth-first search (BFS), in which we maintain a deque with all nodes at the current depth. At each step we iterate over all nodes at the current depth (popping them from the left of the deque), and for each node we add all of its not-yet-visited neighbors to the right side of the deque, forming the next level.
The below algorithm works for both BFS and Dijkstra's, but for simplicity's sake, for the rest of the answer I'll assume that the edges have positive weights and that we use Dijkstra's. The important takeaway is that with either algorithm we only "visit" or "explore" a node for a path that must be the shortest path to that node. This property is essential for efficiency, since it means we visit each of the V nodes and E edges at most once, giving a time complexity of O(V + E). If we use Dijkstra's we have to multiply this by log(V) for the priority queue operations (this also applies to the time complexity mentioned in the summary).
Our Problem
In our case we have the additional complexity that we have K vertex-groups, each of which our shortest path must contain at least one node from. This is a big problem, since it destroys our ability to simply go along with the "shortest current path".
See for example this simple graph. Notation: -- means an edge, start is the start node, and end is the end node. A vertex with value 0 does not have a vertex-group, and a vertex with value >= 1 belongs to the vertex-group of that index.
end -- 0 -- 2 -- start -- 1 -- 2
It is clear that the optimal path first moves right to the node in group 1, and then moves left all the way to the end. But this is impossible for the BFS and Dijkstra's algorithms we introduced above! After we move right from the start to capture the node in group 1, we would never move back left to the start, since we have already been there with a shorter path.
The Trick
In the above example, if the right-hand side had looked like start -- 0 -- 0, where 0 means the vertex does not belong to any group, then it would be of no use to go there and back to the start.
The decisive reason why it makes sense to go there and come back, even though the path gets longer, is that doing so includes a group that we have not seen before.
How can we keep track of whether or not a group is already included at the current position? The most efficient solution is a bitmask. If, for example, we have already visited nodes of groups 2 and 4, then the bitmask has bits set at positions 2 and 4, and it has the value 2 ^ 2 + 2 ^ 4 == 4 + 16 == 20.
In regular Dijkstra's we would just keep a one-dimensional array of size V to keep track of the length of the shortest path to each vertex, initialized to a very high MAX value. array[start] begins with value 0.
We can modify this method to instead have a two-dimensional array of dimensions [2 ^ K][V], where K is the number of groups. Every value is initialized to MAX, only array[mask_value_of_start][start] begins with 0.
The value we store at array[mask][node] means: "Given that the already visited groups have bitmask value mask, what is the length of the shortest path to reach this node?"
Suddenly, Dijkstra's resurrected
Once we have this structure, we can suddenly use Dijkstra's again (it's the same for BFS). We simply change the rules a bit:
In regular Dijkstra's we never re-visit a node
--> in our modification we differentiate by mask and never re-visit a node if it's already been visited for that particular mask.
In regular Dijkstra's, when exploring a node, we look at all neighbors and only add them to the priority queue if we managed to decrease the shortest path to them.
--> in our modification we look at all neighbors, and update the mask we use to check for this neighbor like: neighbor_mask = mask | (1 << neighbor_group_id). We only add a {neighbor_mask, neighbor} pair to the priority queue if for that particular array[neighbor_mask][neighbor] we managed to decrease the minimal path length.
In regular Dijkstra's we only visit an unexplored node via the current shortest path to it, guaranteeing that it is the shortest path to this node
--> In our modification we only visit nodes that have not yet been explored for their respective mask value. We still always pop the current shortest path among all (mask, node) pairs, so for any given mask it must be the shortest path to that node.
In regular Dijkstra's we can return once we visit the end node, since we are sure we got the shortest path to it.
--> In our modification we can return once we visit the end node for the full mask, meaning the mask containing all groups, since it must be the shortest path for the full mask. This is the answer to our problem.
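Putting these rules together, a sketch in Python might look like the following. I'm assuming non-negative edge weights, an undirected edge list, and the group-list input format from the question; none of the identifiers below come from the original post.

    import heapq

    def shortest_path_through_groups(n, weighted_edges, groups, start, end):
        # n              -- number of vertices, numbered 0 .. n-1
        # weighted_edges -- list of undirected edges (u, v, w) with non-negative weight w
        # groups         -- list of K lists of vertex numbers (the vertex-groups)
        # start, end     -- start and end vertex
        # Returns the length of the shortest start-end path that touches every group, or None.
        adj = [[] for _ in range(n)]
        for u, v, w in weighted_edges:
            adj[u].append((v, w))
            adj[v].append((u, w))

        K = len(groups)
        full = (1 << K) - 1
        group_bits = [0] * n                       # bitmask of the groups each vertex belongs to
        for g, members in enumerate(groups):
            for v in members:
                group_bits[v] |= 1 << g

        INF = float("inf")
        dist = [[INF] * n for _ in range(1 << K)]  # dist[mask][node], all initialized to "MAX"
        start_mask = group_bits[start]
        dist[start_mask][start] = 0
        pq = [(0, start_mask, start)]              # (path length, mask, node)

        while pq:
            d, mask, u = heapq.heappop(pq)
            if d > dist[mask][u]:
                continue                           # stale entry: already visited for this mask
            if u == end and mask == full:
                return d                           # shortest path that covers all groups
            for v, w in adj[u]:
                nmask = mask | group_bits[v]       # neighbor_mask = mask | (bit of neighbor's group)
                if d + w < dist[nmask][v]:
                    dist[nmask][v] = d + w
                    heapq.heappush(pq, (d + w, nmask, v))
        return None

For an unweighted graph you can set all weights to 1, or swap the priority queue for a deque as described above.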
If this is too slow...
That's it! Because time and space complexity are exponentially dependent on the number of groups K, this will only work for very small K (of course depending on the number of nodes and edges).
If this is too slow for your requirements, then there might be a more sophisticated algorithm that someone smarter can come up with; it will probably involve dynamic programming.
It is very possible that even that would still be too slow, in which case you will probably want to switch to some heuristic that sacrifices accuracy in order to gain speed.

dijkstra with at most ten negative edges in a path

A question from homework; maybe I need to change the implementation of Dijkstra, or use some reduction.
Let G=(V, E) and let W be a weight function W: E->Z.
All negative-weight edges have the same negative value x (for example, all negative edges have weight -10 and all other edges have positive weights).
Let's define the "weight up to 10 negative edges" of a path: it is the weight of the path if the path contains at most ten negative edges, and infinity if it contains more than ten.
I need to find a shortest "weight up to 10 negative edges" path from vertex S to all other vertices.
The time complexity should be O(E log(V)) or O(E + V log(V)).
I thought of duplicating the graph ten times, moving from one copy to the next each time a negative-weight edge is taken, putting edges with infinite weight between the 10th and an 11th copy, and then running Dijkstra. But I don't think it works.
There should be a solution that uses Dijkstra in some way.
Dijkstra's algorithm doesn't work with negative edges because it iteratively selects the "unconfirmed" node with the lowest path length, marks it as "confirmed", and then never updates the path length for that node again. If a negative edge exists, then a "shorter" path might be found to a node after it has already been "confirmed"; but if the node becomes "unconfirmed" again as a result, then there could potentially be an infinite loop: the same node could keep getting confirmed then unconfirmed over and over, and the algorithm would never terminate. Any modification of the algorithm must address this issue.
As a way to guarantee termination, instead of just recording the path length, you can record a pair like (path length, # of negative edges). When a shorter path to a "confirmed" node is found using a negative edge, the path length may get shorter but the number of negative edges in that path is increased. So you can write a condition to stop updating it if the number of negative edges in the resulting path would be greater than 10.
The problem is more subtle than that, though, because it's no longer the case that the "shortest path so far" to a node is the best one to keep. Suppose you are looking for a shortest path from A to C using at most 10 negative edges, and you have found a path of length 10 from A to B using no negative edges, and another path from A to B of length 5 using three negative edges; you don't yet know which one leads to a better solution (or a solution at all), because there may be 8 negative edges in the path from B to C. So at each node you need to record not just a single (path length, # of negative edges) pair, but a set of all best (i.e. non-dominated) such pairs.
Hopefully that gives you an idea of how Dijkstra's algorithm can be adapted to solve your problem; there are some remaining details you will need to fill in yourself.
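For what it's worth, the layered-graph idea from the question can also be made to work, because all negative edges share the same value x: give the layer-crossing edges weight zero (so Dijkstra's non-negativity requirement holds) and add the j*x contribution back at the end. The sketch below is of that variant, not of the Pareto-pair bookkeeping described above, and the names are my own.

    import heapq

    def dijkstra_up_to_k_negative(n, edges, s, x, k=10):
        # n     -- number of vertices, numbered 0 .. n-1
        # edges -- list of directed edges (u, v, w); every negative weight equals x < 0
        # s     -- source vertex; k -- maximum number of negative edges allowed on a path
        # Returns d where d[v] is the minimum weight over s-v paths with at most k
        # negative edges, or infinity if no such path exists.
        adj = [[] for _ in range(n)]
        for u, v, w in edges:
            adj[u].append((v, w))

        INF = float("inf")
        # dist[j][v] = minimum sum of the *non-negative* edge weights over s-v paths
        #              that use exactly j negative edges
        dist = [[INF] * n for _ in range(k + 1)]
        dist[0][s] = 0
        pq = [(0, 0, s)]                      # (non-negative weight sum, #negative edges, vertex)
        while pq:
            d, j, u = heapq.heappop(pq)
            if d > dist[j][u]:
                continue
            for v, w in adj[u]:
                if w < 0:                     # negative edge: move to the next layer at cost 0
                    if j < k and d < dist[j + 1][v]:
                        dist[j + 1][v] = d
                        heapq.heappush(pq, (d, j + 1, v))
                elif d + w < dist[j][v]:      # non-negative edge: stay in the same layer
                    dist[j][v] = d + w
                    heapq.heappush(pq, (d + w, j, v))

        # Add the j*x weight of the negative edges back, then take the best layer per vertex.
        return [min(dist[j][v] + j * x for j in range(k + 1)) for v in range(n)]

This runs Dijkstra over (k+1)*V states and (k+1)*E edges, i.e. O(E log V) for constant k, which matches the required bound.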
You can't use Dijkstra's algorithm with negative weights and come up with a correct solution. See this other post for the reasoning behind why it fails.

least cost path, destination unknown

Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed; you can cross the same edge multiple times and revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As a concrete example, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and of staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find the least-cost path between all pairs of vertices, and then for each starting vertex selecting the LCPs of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge which, if included, would have made the path length equal to N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum of normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j with exactly k edges. Now the trick is to use repeated squaring so that A^k needs only log k matrix multiply ops.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix, and the corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
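A sketch of that (min, +) matrix-power idea with repeated squaring (the function names are mine, and the walk-reconstruction bookkeeping mentioned above is left out):

    import math

    def min_plus_multiply(A, B):
        # (min, +) matrix product: C[i][j] = min over m of A[i][m] + B[m][j]
        n = len(A)
        return [[min(A[i][m] + B[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]

    def min_cost_walks_exactly_k(W, k):
        # W[i][j] = weight of edge i -> j, or math.inf if there is no edge.
        # Returns M with M[i][j] = minimum cost of a walk from i to j using exactly
        # k edges, computed with O(log k) (min, +) products via repeated squaring.
        n = len(W)
        # (min, +) identity: 0 on the diagonal, +inf elsewhere (the empty walk).
        result = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
        power = [row[:] for row in W]
        while k:
            if k & 1:
                result = min_plus_multiply(result, power)
            power = min_plus_multiply(power, power)
            k >>= 1
        return result

    # For the 10-turn game: the lowest cost achievable from room i over exactly 10 moves
    # is min(M[i]) where M = min_cost_walks_exactly_k(W, 10).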

Find the path with the max likelihood between two vertices in markov model

Given a Markov model with a start state named S and an exit state named F, which can be represented as a directed graph with some constraints:
Every edge has a weight in the range (0,1], the transition probability.
The weights of the edges leaving each node sum to 1.
The question is how to rank the paths between start state and exit state? Or, more precisely, how to find out the path with highest probability?
On one hand, the weights are probabilities, so the longer the path, the smaller the product becomes; one heuristic strategy is to prefer shorter paths with bigger weights. But can this problem be converted into a shortest path problem, or solved with some tailored Viterbi algorithm or some DP algorithm?
Convert your probabilities to log space (the log base doesn't matter). Now the log-probability of a path becomes the sum of the log-space weights (because log(ab) = log(a) + log(b)). Since the weights/probabilities are at most 1, the log-space weights will all be non-positive, and the most probable path is the one with the highest (least negative) sum.
To bring it closer to the standard problem you can negate all the log-space weights so that they are all non-negative and you are looking for the lowest sum. At this point you can run standard algorithms (Dijkstra would be simple and very fast) to find the path you are looking for. If you have the sum, negate it and exponentiate to get back the probability.
TL;DR: replace all weight w with -log(w) and run Dijkstra with the new weights.
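A sketch of that TL;DR (the transition-table format and function name are my own choices):

    import heapq
    import math

    def most_likely_path(transitions, start, goal):
        # transitions -- dict mapping state -> list of (next_state, probability) pairs
        # Returns (probability, path) for the most probable start -> goal path,
        # or (0.0, None) if the goal is unreachable.
        dist = {start: 0.0}                     # dist[v] = smallest sum of -log(p) found so far
        prev = {}
        pq = [(0.0, start)]
        done = set()
        while pq:
            d, u = heapq.heappop(pq)
            if u in done:
                continue
            done.add(u)
            if u == goal:
                break
            for v, p in transitions.get(u, []):
                nd = d - math.log(p)            # -log(p) >= 0 because 0 < p <= 1
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(pq, (nd, v))
        if goal not in dist:
            return 0.0, None
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        return math.exp(-dist[goal]), path[::-1]

For example, most_likely_path({"S": [("A", 0.6), ("F", 0.4)], "A": [("F", 1.0)]}, "S", "F") returns (0.6, ['S', 'A', 'F']), preferring the two-step path over the direct 0.4 transition.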

Finding a "positive cycle"

Suppose E is an n x n matrix where E[i,j] represents the exchange rate from currency i to currency j (how much of currency j can be obtained with 1 unit of currency i). (Note: E[i,j]*E[j,i] is not necessarily 1.)
Come up with an algorithm to find a positive cycle if one exists, where a positive cycle is defined by: if you start with 1 of currency i, you can keep exchanging currency such that eventually you can come back and have more than 1 of currency i.
I've been thinking about this problem for a long time, but I can't seem to get it. The only thing I can come up with is to represent everything as a directed graph with matrix E as an adjacency matrix, where log(E[i,j]) is the weight between vertices i and j. And then you would look for a cycle with a negative path or something. Does that even make sense? Is there a more efficient/easier way to think of this problem?
First, take logs of exchange rates (this is not strictly necessary, but it means we can talk about "adding lengths" as usual). Then you can apply a slight modification of the Floyd-Warshall algorithm to find the length of a possibly non-simple path (i.e. a path that may loop back on itself several times, and in different places) between every pair of vertices that is at least as long as the longest simple path between them. The only change needed is to flip the sign of the comparison, so that we always look for the longest path (more details below). Finally you can look through all O(n^2) pairs of vertices u and v, taking the sum of the lengths of the 2 paths in each direction (from u to v, and from v to u). If any of these are > 0 then you have found a (possibly non-simple) cycle having overall exchange rate > 1. Overall the FW part of the algorithm dominates, making this O(n^3)-time.
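A sketch of that procedure follows. It only reports a pair of endpoints witnessing a positive cycle; recovering the cycle itself needs the predecessor bookkeeping described below. I'm assuming every entry of E is a positive rate, and the function name is mine.

    import math

    def find_positive_cycle_witness(E):
        # E -- n x n matrix (list of lists) of exchange rates, all entries > 0.
        # Returns a pair (i, j) such that the best i -> j walk followed by the best
        # j -> i walk multiplies your money by more than 1, or None if no positive
        # cycle exists.
        n = len(E)
        # Work in log space so that multiplying rates becomes adding lengths.
        d = [[math.log(E[i][j]) for j in range(n)] for i in range(n)]

        # Floyd-Warshall with the comparison flipped: keep the longest (possibly
        # non-simple) walk length found between every pair of vertices.
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] > d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]

        for i in range(n):
            for j in range(n):
                if d[i][j] + d[j][i] > 0:   # log-sum > 0  <=>  product of rates > 1
                    return i, j
        return None

For example, with E = [[1, 2], [0.6, 1]] it reports a witness pair (here (0, 0), since the round trip 0 -> 1 -> 0 multiplies money by 2 * 0.6 = 1.2).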
In general, the problem with trying to use an algorithm like FW to find maximum-weight paths is that it might join together 2 subpaths that share one or more vertices, and we usually don't want this. (This can't ever happen when looking for minimum-length paths in a graph with no negative cycles, since such a path would necessarily contain a positive-weight cycle that could be removed, so it would never be chosen as optimal.) This would be a problem if we were looking for the maximum-weight simple cycle; in that case, to get around this we would need to consider a separate subproblem for every subset of vertices, which pushes the time and space complexity up to O(2^n). Fortunately, however, we are only concerned with finding some positive-weight cycle, and it's reasonably easy to see that if the path found by FW happens to use some vertex more than once, then it must contain a nonnegative-weight cycle -- which can either be removed (if it has weight 0), or (if it has weight > 0) is itself a "right answer"!
If you care about finding a simple cycle, this is easy to do in a final step that is linear in the length of the path reported by FW (which, by the way, may be O(2^|V|) -- if all paths have positive length then all "optimal" lengths will double with each outermost iteration -- but that's pretty unlikely to happen here). Take the optimal path pair implied by the result of FW (each path can be calculated in the usual way, by keeping a table of "optimal predecessor" values of k for each vertex pair (i, j)), and simply walk along it, assigning to each vertex you visit the running total of the length so far, until you hit a vertex that you have already visited. At this point, either currentTotal - totalAtAlreadyVisitedVertex > 0, in which case the cycle you just found has positive weight and you're finished, or this difference is 0, in which case you can delete the edges corresponding to this cycle from the path and continue as usual.
