Suppose E is an n×n matrix where E[i,j] represents the exchange rate from currency i to currency j (how much of currency j can be obtained with 1 unit of currency i). (Note that E[i,j]*E[j,i] is not necessarily 1.)
Come up with an algorithm to find a positive cycle if one exists, where a positive cycle is defined by: if you start with 1 of currency i, you can keep exchanging currency such that eventually you can come back and have more than 1 of currency i.
I've been thinking about this problem for a long time, but I can't seem to get it. The only thing I can come up with is to represent everything as a directed graph with matrix E as an adjacency matrix, where log(E[i,j]) is the weight between vertices i and j. And then you would look for a cycle with a negative path or something. Does that even make sense? Is there a more efficient/easier way to think of this problem?
First, take logs of exchange rates (this is not strictly necessary, but it means we can talk about "adding lengths" as usual). Then you can apply a slight modification of the Floyd-Warshall algorithm to find the length of a possibly non-simple path (i.e. a path that may loop back on itself several times, and in different places) between every pair of vertices that is at least as long as the longest simple path between them. The only change needed is to flip the sign of the comparison, so that we always look for the longest path (more details below). Finally you can look through all O(n^2) pairs of vertices u and v, taking the sum of the lengths of the 2 paths in each direction (from u to v, and from v to u). If any of these are > 0 then you have found a (possibly non-simple) cycle having overall exchange rate > 1. Overall the FW part of the algorithm dominates, making this O(n^3)-time.
In general, the problem with trying to use an algorithm like FW to find maximum-weight paths is that it might join together 2 subpaths that share one or more vertices, and we usually don't want this. (This can't ever happen when looking for minimum-length paths in a graph with no negative cycles, since such a path would necessarily contain a positive-weight cycle that could be removed, so it would never be chosen as optimal.) This would be a problem if we were looking for the maximum-weight simple cycle; in that case, to get around this we would need to consider a separate subproblem for every subset of vertices, which pushes the time and space complexity up to O(2^n). Fortunately, however, we are only concerned with finding some positive-weight cycle, and it's reasonably easy to see that if the path found by FW happens to use some vertex more than once, then it must contain a nonnegative-weight cycle -- which can either be removed (if it has weight 0), or (if it has weight > 0) is itself a "right answer"!
If you care about finding a simple cycle, this is easy to do in a final step that is linear in the length of the path reported by FW (which, by the way, may be O(2^|V|) -- if all paths have positive length then all "optimal" lengths will double with each outermost iteration -- but that's pretty unlikely to happen here). Take the optimal path pair implied by the result of FW (each path can be calculated in the usual way, by keeping a table of "optimal predecessor" values of k for each vertex pair (i, j)), and simply walk along it, assigning to each vertex you visit the running total of the length so far, until you hit a vertex that you have already visited. At this point, either currentTotal - totalAtAlreadyVisitedVertex > 0, in which case the cycle you just found has positive weight and you're finished, or this difference is 0, in which case you can delete the edges corresponding to this cycle from the path and continue as usual.
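A minimal sketch of this approach in Python (the function name is my own, and I'm assuming a complete matrix of strictly positive rates; a small tolerance guards against floating-point noise when a cycle's product is exactly 1):

```python
import math

def find_positive_cycle_rate(E):
    """Return True if the exchange-rate matrix E admits a positive cycle.

    Works in log space: w[i][j] = log(E[i][j]), so a cycle whose rates
    multiply to more than 1 has total log-weight > 0. Runs Floyd-Warshall
    with max instead of min (longest, possibly non-simple, paths), then
    checks every pair (u, v) for best[u][v] + best[v][u] > 0.
    """
    n = len(E)
    NEG = float('-inf')
    best = [[math.log(E[i][j]) if i != j else NEG for j in range(n)]
            for i in range(n)]
    for k in range(n):                      # O(n^3), dominates overall
        for i in range(n):
            for j in range(n):
                if best[i][k] + best[k][j] > best[i][j]:
                    best[i][j] = best[i][k] + best[k][j]
    for u in range(n):
        for v in range(n):
            if u != v and best[u][v] + best[v][u] > 1e-9:
                return True
    return False
```

Recovering the actual cycle would use a predecessor table and the walk-and-subtract step described above.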
Related
I'm solving the problem described below and can't think of a better algorithm than trying every combination of one vertex from each group.
I'm given a graph of vertices, along with a list of groups of specific vertices, the goal is to find the shortest path from a specific starting vertex to a specific ending vertex, and the path must pass through at least one vertex from each specified group of vertices.
There are also vertices in the graph that are not part of any given group.
Re-visiting vertices and edges is possible.
The graph data is specified as follows:
Vertex list - each vertex is identified by a sequence number (0 to the number of vertices - 1)
Edge list - list of vertex pairs (by vertex number)
Vertex group list - list of lists of vertex numbers
A specific starting and ending vertex.
I would be grateful for any ideas for a better solution, thank you.
Summary:
We can use bitmasks to efficiently check which groups we have visited so far, and combine this with a traditional BFS/Dijkstra shortest-path algorithm.
If we assume E edges, V vertices, and K vertex-groups that have to be included, the below algorithm has a time complexity of O((V + E) * 2^K) and a space complexity of O(V * 2^K). The exponential 2^K term means it will only work for a relatively small K, say up to 10 or 20.
Details:
First, are the edges weighted?
If yes, then a "shortest path" algorithm will usually be a variation of Dijkstra's algorithm, in which we keep a (min) priority queue of the shortest paths. We only visit a node once it's at the top of the queue, meaning that this must be the shortest path to this node. Any other shorter path to this node would already have been added to the priority queue and would come before the current iteration. (Note: this doesn't work with negative edge weights.)
If no, meaning all edges have the same weight, then there is no need to maintain a priority queue of shortest paths. We can instead just run a regular breadth-first search (BFS), in which we maintain a deque of all nodes at the current depth. At each step we iterate over all nodes at the current depth (popping them from the left of the deque), and for each node we add all its not-yet-visited neighbors to the right side of the deque, forming the next level.
The below algorithm works for both BFS and Dijkstra's, but for simplicity's sake, for the rest of the answer I'll pretend that the edges have positive weights and we will use Dijkstra's. What is important to take away, though, is that for either algorithm we will only "visit" or "explore" a node via a path that must be the shortest path to that node. This property is essential for the algorithm to be efficient, since we know that we visit each of the V nodes and E edges at most once, giving us a time complexity of O(V + E). If we use Dijkstra's we have to multiply this by log(V) for the priority queue usage (this also applies to the time complexity mentioned in the summary).
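For reference, here is a minimal Dijkstra sketch (my own illustration; the adjacency-dict input shape is an assumption) showing the visit-once property: a node is final the first time it is popped.

```python
import heapq

def dijkstra(adj, start):
    """adj: {node: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {start: 0}
    visited = set()
    pq = [(0, start)]                     # (path length, node)
    while pq:
        d, node = heapq.heappop(pq)
        if node in visited:
            continue                      # stale entry: already finalized
        visited.add(node)                 # first pop == shortest path found
        for nb, w in adj.get(node, []):
            if nb not in visited and d + w < dist.get(nb, float('inf')):
                dist[nb] = d + w
                heapq.heappush(pq, (dist[nb], nb))
    return dist
```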
Our Problem
In our case we have the additional complexity that we have K vertex-groups, for each of which our shortest path has to contain at least one of its nodes. This is a big problem, since it destroys our ability to simply go along with the "shortest current path".
See for example this simple graph. Notation: -- means an edge, start is the start node, and end is the end node. A vertex with value 0 does not have a vertex-group, and a vertex with value >= 1 belongs to the vertex-group of that index.
end -- 0 -- 2 -- start -- 1 -- 2
It is clear that the optimal path will first move right to the node in group 1, and then move left until the end. But this is impossible to do for the BFS and Dijkstra's algorithm we introduced above! After we move from the start to the right to capture the node in group 1, we would never ever move back left to the start, since we have already been there with a shorter path.
The Trick
In the above example, if the right-hand side had looked like start -- 0 -- 0, where 0 means the vertex does not belong to a group, then it would be of no use to go there and back to the start.
The decisive reason why it makes sense to go there and come back, even though the path gets longer, is that it includes a group that we have not seen before.
How can we keep track of whether or not a group is already included at the current position? The most efficient solution is a bitmask. So if, for example, we have already visited a node of group 2 and one of group 4, then the bitmask has bits set at positions 2 and 4, giving it the value 2 ^ 2 + 2 ^ 4 == 4 + 16 == 20.
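In code the bookkeeping is just a few bit operations (a tiny self-contained illustration):

```python
mask = 0
mask |= 1 << 2                      # visited a node of group 2
mask |= 1 << 4                      # visited a node of group 4
assert mask == 20                   # 2 ** 2 + 2 ** 4 == 4 + 16
group_2_covered = (mask >> 2) & 1   # 1: group 2 already visited
group_3_covered = (mask >> 3) & 1   # 0: group 3 not visited yet
```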
In the regular Dijkstra's we would just keep a one-dimensional array of size V to keep track of what the shortest path to each vertex is, initialized to a very high MAX value. array[start] begins with value 0.
We can modify this method to instead have a two-dimensional array of dimensions [2 ^ K][V], where K is the number of groups. Every value is initialized to MAX, only array[mask_value_of_start][start] begins with 0.
The value we store at array[mask][node] answers the question: "Given the already visited groups with bit-mask value mask, what is the length of the shortest path to reach this node?"
Suddenly, Dijkstra's resurrected
Once we have this structure, we can suddenly use Dijkstra's again (it's the same for BFS). We simply change the rules a bit:
In regular Dijkstra's we never re-visit a node
--> in our modification we differentiate by mask and never re-visit a node if it's already been visited for that particular mask.
In regular Dijkstra's, when exploring a node, we look at all neighbors and only add them to the priority queue if we managed to decrease the shortest path to them.
--> in our modification we look at all neighbors, and update the mask we use to check for this neighbor like: neighbor_mask = mask | (1 << neighbor_group_id). We only add a {neighbor_mask, neighbor} pair to the priority queue, if for that particular array[neighbor_mask][neighbor] we managed to decrease the minimal path length.
In regular Dijkstra's we only visit unexplored nodes via the current shortest path, guaranteeing that it is the shortest path to that node
--> In our modification we only visit {mask, node} pairs that are not explored yet, always taking the globally shortest unexplored entry, so that when we visit a node for a given mask it is via the shortest path for that mask.
In regular Dijkstra's we can return once we visit the end node, since we are sure we got the shortest path to it.
--> In our modification we can return once we visit the end node for the full mask, meaning the mask containing all groups, since it must be the shortest path for the full mask. This is the answer to our problem.
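Putting the pieces together, here is a sketch of the modified Dijkstra (the names and the edge-list input format are my assumptions; vertices may belong to any number of groups since masks are OR-ed together):

```python
import heapq

def shortest_path_with_groups(n, edges, groups, start, end):
    """dist[mask][v] = shortest path to v having visited the groups in mask.

    n: vertex count; edges: (u, v, w) undirected with w >= 0;
    groups: list of K vertex lists. Returns the length, or None.
    """
    K = len(groups)
    group_of = {}                         # vertex -> bitmask of its groups
    for g, members in enumerate(groups):
        for v in members:
            group_of[v] = group_of.get(v, 0) | (1 << g)
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    full = (1 << K) - 1
    INF = float('inf')
    dist = [[INF] * n for _ in range(1 << K)]
    start_mask = group_of.get(start, 0)
    dist[start_mask][start] = 0
    pq = [(0, start_mask, start)]
    while pq:
        d, mask, v = heapq.heappop(pq)
        if d > dist[mask][v]:
            continue                      # stale queue entry
        if v == end and mask == full:
            return d                      # shortest path covering all groups
        for nb, w in adj[v]:
            nmask = mask | group_of.get(nb, 0)
            if d + w < dist[nmask][nb]:
                dist[nmask][nb] = d + w
                heapq.heappush(pq, (d + w, nmask, nb))
    return None
```

On the `end -- 0 -- 2 -- start -- 1 -- 2` example above (vertices numbered 0..5 from left to right) this returns the detour-then-back path of length 5.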
If this is too slow...
That's it! Because time and space complexity are exponentially dependent on the number of groups K, this will only work for very small K (of course depending on the number of nodes and edges).
If this is too slow for your requirements then there might be a more sophisticated algorithm that someone smarter can come up with; it will probably involve dynamic programming.
It is very possible that this is still too slow, in which case you will probably want to switch to some heuristic that sacrifices accuracy in order to gain speed.
Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find the path of length N (where N is the number of edges traversed; the same edge can be crossed multiple times and vertices can be revisited) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As an additional heuristic, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and for staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find least cost path between all pairs of vertices, and then for each starting vertex subset the LCP's of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge, which, if included, would have made the path length equal N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum used in normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j with exactly k edges. Now the trick is to use repeated squaring so that A^k needs only log k matrix multiply ops.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
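A sketch of the (min, +) repeated-squaring idea in Python (the names are mine; A[i][j] holds the edge weight, `float('inf')` if there is no edge, and k >= 1 is assumed):

```python
def min_plus_mult(A, B):
    """One (min, +) matrix product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def min_cost_k_edge_walks(A, k):
    """Minimum cost of a walk with exactly k >= 1 edges between every pair.

    A[i][j] is the edge weight (float('inf') if absent). Uses repeated
    squaring in the (min, +) semiring: O(n^3 log k) overall.
    """
    result, power = None, A
    while k:
        if k & 1:
            result = power if result is None else min_plus_mult(result, power)
        power = min_plus_mult(power, power)
        k >>= 1
    return result
```

Taking the row minimum of the result matrix then gives, for each starting vertex, the cheapest length-k walk to any destination, as described above.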
I am looking for an efficient algorithm to solve the following problem :
Given a directed, weighted graph G = (V,E), a source vertex S, a destination vertex T, and M, a subset of V, it is required to find the shortest path from S to T.
The special feature of M is that once a vertex in M is 'visited', the weight of a certain edge changes to another value. Both the edge and its new weight are given for each vertex in M.
To help in understanding the problem, I have drawn an example using mspaint. (sorry for the quality).
In this example, the 'regular' shortest path from S to T is 1000.
However, visiting the vertex C will reduce the edge weight from 1000 to just 500, so the shortest path in this case is 200+100+500=800.
This problem is NP-hard and it is clearly in NP. The proof is a relatively straightforward gadget reduction.
This more or less rules out making significant improvements on the trivial, brute force algorithm for this problem. So what exactly are you hoping for when you say "efficient" here?
===
Proof
It might be that the problem statement has been unclear somehow so the version OP cares about isn't actually NP-complete. So I'll give some details of the proof.
For technical reasons, usually when we want to show a search problem is NP-hard we actually do it for an associated decision problem which is interreducible to the search problem. The decision problem here is "Given a directed weighted graph as described with associated edge-weight changing data, and a numeric value V, does the shortest path have value at most V?". Clearly, if you have an algorithm for the search problem, you can solve the decision problem easily, and if you have an algorithm for the decision problem, you can use it for the search problem -- you could use binary search essentially to determine the optimal value of V to precision greater than the input numbers, then alter the problem instance by deleting edges and checking if the optimal solution value changed in order to determine if an edge is in the path. So in the sequel I talk about the decision version of the problem.
The problem is in NP
First to see that it is in NP, we want to see that "yes" instances of the decision problem are certifiable in polynomial time. The certificate here is simply the shortest path. It is easy to see that the shortest path does not take more bits to describe than the graph itself. It is also easy to calculate the value of any particular path, you just go through the steps of the path and check what the value of the next edge was at that time. So the problem is in NP.
The problem is NP-hard
To see that it is NP-hard we reduce from 3SAT to the decision problem. That is the problem of determining the satisfiability of a boolean formula in CNF form in which each clause has at most 3 literals. For a complete definition of 3SAT see here: https://en.wikipedia.org/wiki/Boolean_satisfiability_problem
The reduction which I will describe is a transformation which takes an instance of 3SAT and produces an input to the decision problem, with the property that the 3SAT instance is satisfiable if and only if the shortest path value is less than the specified threshold.
For any given 3SAT formula, the graph which we will produce has the following high-level structure. For each variable, there will be a "cloud" of vertices associated to it which are connected in certain ways, and some of those vertices are in M. The graph is arranged so that the shortest path must pass through each cloud exactly once, passing through the cloud for x1 first, then the cloud for x2, and so on. Then, there is also a (differently arranged) cloud for each clause of the formula. After passing through the last variable's cloud, the path must pass through the clouds of the clauses in succession. Then it reaches the terminal.
The basic idea is that, when going through the cloud for variable xi, there are exactly two different possible paths, and the choice represents a commitment to a truth value of xi. All of the costs of the edges for the variable clouds are the same, so it doesn't directly affect the path value. But, since some of them are in M, the choice of path there changes what the costs will be later in the clause clouds. The clause clouds enforce that if the variables we picked don't satisfy the clause, then we pay a price.
The variable cloud looks like this:
        *_*_*_*
       /       \
Entry *         * Exit
       \       /
        *_*_*_*
Where the stars are vertices, the lines are edges, all the edges are directed to the right, and they all have the same cost; we can take it to be zero, or if that's a problem they could all be one, it doesn't really matter. Here I showed 4 vertices on the two paths, but actually the number of vertices will depend on some things, as we will see.
The clause cloud looks like this:
        *
       / \
Entry *_*_* Exit
       \ /
        *
Where, again all edges are directed to the right.
Each of the 3 central vertices is "labelled" (in our minds) and corresponds to one of the three literals in the clause. All of these edges again have cost 0.
The idea is that when I go through the variable cloud and pick a value for that variable, if I didn't satisfy the literal of some clause, then the cost of the corresponding edge in the clause cloud goes up. Thus, as long as the assignment actually satisfies the clause, I have a path from the entry to the exit which costs 0. And if every one of the literals was missed by this assignment, then I have to pay something larger than zero. Maybe 100 or something, it doesn't really matter.
Going back to the variable cloud now, the variable cloud for xi has 2m vertices in the middle part, where m is the number of clauses that xi appears in. Then, depending on whether it appears positively or negatively in the k'th such clause, the k'th vertex of the top or the bottom path is in M and changes the edge in the corresponding clause cloud to have cost 100 or whatever other fixed value.
The overall graph is made by simply pasting together the variable and clause clouds at their entry - exit nodes in succession. The threshold value is, say, 50.
The point is that, if there is a satisfying assignment to the 3SAT instance, then we can construct from it a path through the graph instance of cost 0, since we never pay anything in the variable clouds, and we can always pick a path in each clause cloud where the clause was satisfied and we don't pay anything there either. If there is no satisfying assignment to the 3SAT instance, then any path must get a clause wrong at some point and then pay at least 100. So if we set the threshold to 50 and make that part of the instance, the reduction is sound.
If the problem statement doesn't actually allow 0 cost edges, we can easily change the solution so that it only has positive cost edges. Because, the total number of edges in any path from start to finish is the same for every path. So if we add 1 to every edge cost, and take the threshold to be 50 + length instead, nothing is changed.
That same trick of adding a fixed value to all of the edges and adjusting the threshold can be used to see that the problem is very strongly inapproximable also, as pointed out by David Eisenstat in comments.
Running time implications
If you are economical in how many vertices you assign to each variable cloud, the reduction takes a 3SAT instance with n variables (and hence input length O(n)) also to a graph instance of O(n) vertices, and the graph is sparse. (100n vertices should be way more than sufficient.) As a result, if you can give an algorithm for the stated problem with running time less than 2^{o(n)} on sparse graphs with n vertices, it implies a 2^{o(n)} algorithm for 3SAT which would be a major breakthrough in algorithms, and would disprove the "Exponential Time Hypothesis" of Impagliazzo and Paturi. So you can't really hope for more than a constant-factor-in-the-exponent improvement over the trivial algorithm in this problem.
I'd like to solve a harder version of the minimum spanning tree problem.
There are N vertices. Also there are 2M edges numbered by 1, 2, .., 2M. The graph is connected, undirected, and weighted. I'd like to choose some edges to make the graph still connected and make the total cost as small as possible. There is one restriction: an edge numbered by 2k and an edge numbered by 2k-1 are tied, so both should be chosen or both should not be chosen. So, if I want to choose edge 3, I must choose edge 4 too.
So, what is the minimum total cost to make the graph connected?
My thoughts:
Let's call two tied edges 2k-1 and 2k an edge set.
Let's call an edge valid if it merges two different components.
Let's call an edge set good if both of the edges are valid.
First add exactly m good edge sets in increasing order of cost. Then iterate over all the edge sets in increasing order of cost, and add a set if at least one of its edges is valid. Iterate m from 0 to M.
Run Kruskal's algorithm with a variation: the cost of an edge e varies.
If an edge set which contains e is good, the cost is: (the cost of the edge set) / 2.
Otherwise, the cost is: (the cost of the edge set).
I cannot prove whether Kruskal's algorithm remains correct when the costs change like this.
Sorry for the poor English, but I'd like to solve this problem. Is it NP-hard or something, or is there a good solution? :D Thanks to you in advance!
As I speculated earlier, this problem is NP-hard. I'm not sure about inapproximability; there's a very simple 2-approximation (split each pair in half, retaining the whole cost for both halves, and run your favorite vanilla MST algorithm).
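The 2-approximation mentioned here can be sketched as follows (the input format and names are my assumptions): run vanilla Kruskal on the split halves, each half carrying its whole pair's cost, then buy every pair at least one of whose halves the tree used. Any feasible pair set gives a connected subgraph in the split graph costing at most twice the pair set's cost, so the result is within a factor of 2 of optimal.

```python
def approx_paired_mst(n, pairs):
    """2-approximation: Kruskal on split halves, each half costing the
    whole pair; then buy every pair the tree touched.

    pairs: ((u1, v1), (u2, v2), cost). Returns (total cost, chosen pair ids).
    """
    parent = list(range(n))
    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    halves = []
    for idx, (e1, e2, c) in enumerate(pairs):
        halves.append((c, e1, idx))
        halves.append((c, e2, idx))
    halves.sort()
    chosen, total = set(), 0
    for c, (u, v), idx in halves:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            if idx not in chosen:         # pay each pair only once
                chosen.add(idx)
                total += c
    return total, chosen
```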
Given an algorithm for this problem, we can solve the NP-hard Hamilton cycle problem as follows.
Let G = (V, E) be the instance of Hamilton cycle, and let v0 be an arbitrary original vertex. Clone every other vertex, denoting the clone of vi by vi'. We duplicate each edge e = {vi, vj} (making a multigraph; we can do this reduction with simple graphs at the cost of clarity), and pair one copy with {v0, vi'} and the other with {v0, vj'}.
No MST can use fewer than n pairs, one to connect each cloned vertex to v0. The interesting thing is that the other halves of the pairs of a candidate with n pairs like this can be interpreted as an oriented subgraph of G where each vertex has out-degree 1 (use the index in the cloned bit as the tail). This graph connects the original vertices if and only if it's a Hamilton cycle on them.
There are various ways to apply integer programming. Here's a simple one and a more complicated one. First we formulate a binary variable x_i for each i that is 1 if edge pair 2i-1, 2i is chosen. The problem template looks like
minimize sum_i w_i x_i (drop the w_i if the problem is unweighted)
subject to
<connectivity>
for all i, x_i in {0, 1}.
Of course I have left out the interesting constraints :). One way to enforce connectivity is to solve this formulation with no constraints at first, then examine the solution. If it's connected, then great -- we're done. Otherwise, find a set of vertices S such that there are no edges between S and its complement, and add a constraint
sum_{i such that x_i connects S with its complement} x_i >= 1
and repeat.
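To make this lazy-constraint loop concrete, here is a toy version where a brute-force enumeration over all 2^M pair choices stands in for a real MIP solver (names and input format are mine; only viable for tiny instances, and it assumes the pairs can connect the graph at all). It solves, checks connectivity from vertex 0, and adds a cut constraint whenever the solution is disconnected:

```python
from itertools import product

def paired_mst_cutting_planes(n, pairs):
    """Toy lazy-constraint loop; brute force stands in for the MIP solver.

    pairs: ((u1, v1), (u2, v2), w) -- each pair is bought whole for cost w.
    Every added cut is valid for all connected solutions, so when the
    relaxation's optimum is connected it is optimal.
    """
    M = len(pairs)
    constraints = []            # each: set of pair indices, require sum >= 1

    def solve():
        best_cost, best_x = None, None
        for x in product((0, 1), repeat=M):
            if all(any(x[i] for i in cut) for cut in constraints):
                cost = sum(w for xi, (_, _, w) in zip(x, pairs) if xi)
                if best_cost is None or cost < best_cost:
                    best_cost, best_x = cost, x
        return best_cost, best_x

    while True:
        cost, x = solve()
        adj = [[] for _ in range(n)]
        for i, ((u1, v1), (u2, v2), _) in enumerate(pairs):
            if x[i]:
                adj[u1].append(v1); adj[v1].append(u1)
                adj[u2].append(v2); adj[v2].append(u2)
        seen, stack = {0}, [0]          # component of vertex 0
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(seen) == n:
            return cost, x              # connected: feasible and optimal
        # cut: at least one bought pair must cross (seen, V \ seen)
        constraints.append({i for i, ((u1, v1), (u2, v2), _) in enumerate(pairs)
                            if (u1 in seen) != (v1 in seen)
                            or (u2 in seen) != (v2 in seen)})
```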
Another way is to generate constraints like this inside of the solver working on the linear relaxation of the integer program. Usually MIP libraries have a feature that allows this. The fractional problem has fractional connectivity, however, which means finding min cuts to check feasibility. I would expect this approach to be faster, but I must apologize as I don't have the energy to describe it in detail.
I'm not sure if it's the best solution, but my first approach would be a search using backtracking:
Of all edge pairs, mark those that could be removed without disconnecting the graph.
Remove one of these pairs and find the optimal solution for the remaining graph.
Put the pair back and remove the next one instead, find the best solution for that.
This works, but is slow and inelegant. It might be possible to rescue this approach, though, with a few adjustments that avoid unnecessary branches.
Firstly, the edge pairs that could still be removed form a set that only shrinks as you go deeper. So, in the next recursion, you only need to check those in the previous set of possibly removable edge pairs. Also, since the order in which you remove the edge pairs doesn't matter, you shouldn't consider any edge pairs that were already considered before.
Then, checking if two nodes are connected is expensive. If you cache the alternative route for an edge, you can check relatively quickly whether that route still exists. If it doesn't, you have to run the expensive check, because even though that one route ceased to exist, there might still be others.
Then, some more pruning of the tree: Your set of removable edge pairs gives a lower bound to the weight that the optimal solution has. Further, any existing solution gives an upper bound to the optimal solution. If a set of removable edges doesn't even have a chance to find a better solution than the best one you had before, you can stop there and backtrack.
Lastly, be greedy. Using a regular greedy algorithm will not give you an optimal solution, but it will quickly raise the bar for any solution, making pruning more effective. Therefore, attempt to remove the edge pairs in the order of their weight loss.
Suppose I have a directed network with n nodes and d arcs, where p(d) represents the probability that a package will arrive safely along arc d. Multiplying the probabilities of all the arcs on the package's path gives the probability of the package arriving safely at its destination.
Is there a formula that would allow us to maximize the probability the package arrives safely in the form of a shortest path problem?
Set up a graph where the weight on each arc d is -log(p(d)).
Then solve the shortest path problem, which finds the path with the smallest sum of weights.
This sum is:
-log(p(d0))-log(p(d1))-log(p(d2))... = -log(p(d0)*p(d1)*p(d2)...)
Therefore the smallest sum in neg log space is equivalent to the largest probability.
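A quick sketch of this (my own; adjacency-dict input assumed): Dijkstra on the -log weights, converting back to a probability at the end.

```python
import heapq
import math

def most_reliable_path(adj, s, t):
    """adj: {u: [(v, p), ...]} with survival probability p in (0, 1].

    Dijkstra on weights -log(p): the smallest sum of -log(p) values
    corresponds to the largest product of probabilities.
    """
    dist = {s: 0.0}
    done = set()
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        if u == t:
            return math.exp(-d)           # back to probability space
        for v, p in adj.get(u, []):
            nd = d - math.log(p)
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return 0.0                            # t unreachable
```

Dijkstra applies here because -log(p) >= 0 whenever p <= 1.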
You can organize this as a variation of an MDP problem where each node has an edge leading to a dummy state that only has a loop edge on itself.
This state represents the package being damaged/compromised, and the edges leading to it have a very high cost.
The rest is pretty straightforward: you set the transition probabilities, give the edges leading to the "damaged" state (including its self-loop) a high cost (e.g. 100), give the other edges a lower cost (e.g. 1), and give the edge leading to the destination state a large negative cost (e.g. -500).
Once you've set up the problem you can run Policy Iteration or Value Iteration to obtain the best policy (edges to move along) to send the package to its destination. They will converge, giving you the safest possible path.
This is quite easy if you're familiar with MDPs.
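Rather than tuning reward constants like the 100/-500 example above, a sketch can run value iteration directly on success probabilities (names and input format are my assumptions): V[u] converges to the maximum probability of reaching the destination from u, with the implicit "damaged" state contributing value 0, and the greedy policy with respect to V is the safest path.

```python
def value_iteration_safest(n, arcs, dest, iters=100):
    """V[u] converges to the max probability of reaching dest from u.

    arcs: {u: [(v, p), ...]} with p the success probability of the arc.
    A failed traversal lands in the implicit 'damaged' absorbing state
    (value 0), so an arc contributes p * V[v]. Values increase
    monotonically toward the fixed point.
    """
    V = [0.0] * n
    V[dest] = 1.0
    for _ in range(iters):
        for u in range(n):
            if u != dest:
                V[u] = max((p * V[v] for v, p in arcs.get(u, [])),
                           default=0.0)
    # greedy policy: from u, take the arc maximizing p * V[v]
    policy = {u: max(arcs[u], key=lambda vp: vp[1] * V[vp[0]])[0]
              for u in arcs if u != dest and arcs[u]}
    return V, policy
```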