Shortest Path in a Directed Acyclic Graph with two types of costs - algorithm

I am given a directed acyclic graph G = (V,E), which can be assumed to be topologically ordered (if needed). The edges in G have two types of costs - a nominal cost w(e) and a spiked cost p(e).
The goal is to find the shortest path from a node s to a node t which minimizes the following cost:
sum_e (w(e)) + max_e (p(e)), where the sum and maximum are taken over all edges in the path.
Standard dynamic programming methods show that this problem is solvable in O(E^2) time. Is there a more efficient way to solve it? Ideally, an O(E*polylog(E,V)) algorithm would be nice.
---- EDIT -----
This is the O(E^2) solution I found using dynamic programming.
First, order all costs p(e) in an ascending order. This takes O(Elog(E)) time.
Second, define the state space consisting of states (x,i) where x is a node in the graph and i is in 1,2,...,|E|. It represents "We are in node x, and the highest edge weight p(e) we have seen so far is the i-th largest".
Let V(x,i) be the length of the shortest path (in the classical sense) from s to x, where the highest p(e) encountered was the i-th largest. It's easy to compute V(x,i) given V(y,j) for any predecessor y of x and any j in 1,...,|E| (there are two cases to consider: either the edge y->x has the j-th largest weight, or it does not).
At every state (x,i), this computation finds the minimum of about deg(x) values. Thus the complexity is O(|E| * sum_(x\in V) deg(x)) = O(|E|^2), as each node is associated to |E| different states.

I don't see any way to get the complexity you want. Here's an algorithm that I think would be practical in real life.
First, reduce the graph to only vertices and edges between s and t, and do a topological sort so that you can easily find shortest paths in O(E) time.
Let W(m) be the minimum sum(w(e)) cost over all paths with max(p(e)) <= m, and let P(m) be the smallest max(p(e)) among those shortest paths. The problem solution corresponds to W(m)+P(m) for some cost m. Note that we can find W(m) and P(m) simultaneously in O(E) time by finding a shortest W-cost path, using P-cost to break ties.
The relevant values for m are the p(e) costs that actually occur, so make a sorted list of those. Then use a Kruskal's algorithm variant to find the smallest m that connects s to t, and calculate P(infinity) to find the largest relevant m.
Now we have an interval [l,h] of m-values that might be the best. The best possible result in the interval is W(h)+P(l). Make a priority queue of intervals ordered by best possible result, and repeatedly remove the interval with the best possible result, and:
stop if the best possible result = an actual result W(l)+P(l) or W(h)+P(h)
stop if there are no p(e) costs between l and P(h)
stop if the difference between the best possible result and an actual result is within some acceptable tolerance; or
stop if you have exceeded some computation budget
otherwise, pick a p(e) cost t between l and P(h), find a shortest path to get W(t) and P(t), split the interval into [l,t] and [t,h], and put them back in the priority queue and repeat.
The worst case complexity to get an exact result is still O(E^2), but there are many economies and a lot of flexibility in how to stop.
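The definitions of W(m) and P(m) above can be sketched in code. This is only an illustration under stated assumptions, not the interval-splitting procedure itself: it tries every distinct p(e) value as the threshold m, which is the O(E^2) worst case. The function names, the edge tuple layout (u, v, w, p), the nonnegative p costs, and the assumption that nodes are numbered 0..n-1 in topological order (u < v for every edge) are all hypothetical choices made for the example.

```python
import math

def restricted_shortest_path(n, edges, s, t, m):
    """Return (W(m), P(m)) for the s-t pair.

    Assumes nodes 0..n-1 are already topologically ordered (u < v on every
    edge) and that all p costs are nonnegative. Each edge is (u, v, w, p).
    """
    # best[x] = (smallest nominal cost to x, smallest max-p among such paths)
    best = [(math.inf, math.inf)] * n
    best[s] = (0, 0)
    # Sorting by tail node means best[u] is final before u's edges are relaxed.
    for u, v, w, p in sorted(edges):
        if p > m or best[u][0] == math.inf:
            continue
        candidate = (best[u][0] + w, max(best[u][1], p))
        if candidate < best[v]:  # lexicographic: W first, then P as tie-break
            best[v] = candidate
    return best[t]

def best_sum_plus_max(n, edges, s, t):
    """Exhaustive scan over all thresholds: O(E) work per distinct p value."""
    best = math.inf
    for m in sorted({p for _, _, _, p in edges}):
        w_m, p_m = restricted_shortest_path(n, edges, s, t, m)
        best = min(best, w_m + p_m)
    return best

# Example: the cheap-looking path 0->1->3 carries one very spiky edge.
example_edges = [(0, 1, 1, 10), (1, 3, 1, 1), (0, 2, 3, 2), (2, 3, 3, 2)]
print(best_sum_plus_max(4, example_edges, 0, 3))
```

The interval scheme described above would call restricted_shortest_path only for selected thresholds instead of all of them.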

This is only a 2-approximation, not an approximation scheme, but perhaps it inspires someone to come up with a better answer.
Using binary search, find the minimum spiked cost θ* such that, letting C(θ) be the minimum nominal cost of an s-t path using edges with spiked cost ≤ θ, we have C(θ*) = θ*. Every solution has either nominal or spiked cost at least as large as θ*, hence θ* leads to a 2-approximate solution.
Each test in the binary search involves running Dijkstra on the subgraph of edges with spiked cost ≤ θ, so this algorithm takes O(|E| log^2 |E|) time or, if you want to be technical about it and use Fibonacci heaps, O((|E| + |V| log |V|) log |E|).

Related

Efficient approximate algorithm to determine the presence of k-sized cycle in graph

I have a very large sparse graph G (about 100 million nodes, about 50 million edges) and I would like to find an efficient algorithm (hopefully O(1) or sub-linear in the number of nodes + edges) that predicts with some probability the presence of a cycle of length k in this graph. For practical use, k will be very small (between 30 and 90) relative to the size of G. It is also guaranteed that k will always be even. G is also a random graph, so I don't expect any consistent clustering.
The algorithm doesn't need to enumerate the actual nodes contained in the cycle; it just needs to eliminate G if it most likely doesn't have any cycles of length k.
I found a close solution with the answer presented here, where the trace and rank of L (where L is the Laplacian of G) could be compared to determine whether G had any cycles at all. However, I couldn't find a relatively efficient way to compute the rank for G. Another problem was that it doesn't take k into account, which might allow a more efficient approach.
Getting connected components is a possibility, but it is linear in the number of nodes + edges, which is not optimal for a graph of this size.
If it's an Erdos--Renyi random graph, then since having such a cycle is a monotone property of a graph, there's a zero-one law (https://www.ams.org/journals/proc/1996-124-10/S0002-9939-96-03732-X/S0002-9939-96-03732-X.pdf), which implies that you can make a reasonably good guess by setting the right threshold. (Which threshold? I don't know offhand, but probably you can extrapolate from smaller graphs.)

least cost path, destination unknown

Question
How would one go about finding a least cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed, can cross the same edge multiple times, can revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As an additional heuristic, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and for staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find least cost path between all pairs of vertices, and then for each starting vertex subset the LCP's of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge, which, if included, would have made the path length equal N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum used in normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j with exactly k edges. Now the trick is to use repeated squaring so that A^k needs only O(log k) matrix multiply ops.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
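A minimal sketch of the min-plus matrix power with repeated squaring, assuming the graph is given as a dense matrix with math.inf where there is no edge (self-loops, such as "staying put", go on the diagonal). The function names are made up for this example, and path reconstruction is left out.

```python
import math

INF = math.inf

def min_plus_multiply(a, b):
    """Matrix product with (min, +) in place of (sum, *)."""
    n = len(a)
    c = [[INF] * n for _ in range(n)]
    for i in range(n):
        for mid in range(n):
            if a[i][mid] == INF:
                continue
            for j in range(n):
                candidate = a[i][mid] + b[mid][j]
                if candidate < c[i][j]:
                    c[i][j] = candidate
    return c

def min_plus_power(a, k):
    """A^k under (min, +): entry [i][j] is the cheapest walk i -> j with exactly k edges."""
    n = len(a)
    # Min-plus identity: 0 on the diagonal, infinity elsewhere.
    result = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    base = a
    while k > 0:
        if k & 1:
            result = min_plus_multiply(result, base)
        base = min_plus_multiply(base, base)
        k >>= 1
    return result

def cheapest_walks(a, k):
    """For each start vertex, the cost of the cheapest k-edge walk to any destination."""
    return [min(row) for row in min_plus_power(a, k)]

# Small example: node 2 has a self-loop of cost 3.
graph = [
    [INF, 1, 5],
    [2, INF, 1],
    [INF, 4, 3],
]
print(cheapest_walks(graph, 2))
```

Each multiply costs O(n^3) here, so the whole computation is O(n^3 log k), which is why this suits dense graphs.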

Complete Weighted Graph and Hamiltonian Tour

I ran into a question on a midterm exam. Can anyone clarify the answer?
Problem A: Given a Complete Weighted Graph G, find a Hamiltonian Tour with minimum weight.
Problem B: Given a Complete Weighted Graph G and Real Number R, does G have a Hamiltonian Tour with weight at most R?
Suppose there is a machine that solves B. How many times must we call B (each time with a graph G and a real number R) to solve problem A with that machine? Suppose the sum of the edge weights in G is at most M.
1) We cannot do this, because there is uncountable state.
2) O(|E|) times
3) O(lg m) times
4) Because A is NP-hard, this cannot be done.
First algorithm
The answer is (3) O(lg m) times. You just have to perform a binary search for the minimum hamiltonian tour in the weighted graph. Notice that if there is a hamiltonian tour of length L in the graph, there is no point in checking if a hamiltonian tour of length L', where L' > L, exists, since you are interested in the minimum-weight hamiltonian tour. So, in each step of your algorithm you can eliminate half of the remaining possible tour-weights. Consequently, you will have to call B in your machine O(lg m) times, where m stands for the total weight of all edges in the complete graph.
Edit:
Second algorithm
I have a slight modification of the above algorithm, which uses the machine O(|E|) times, since some people said that we cannot apply binary search in an uncountable set of possible values (and they are possibly right): Take every possible subset of edges from the graph, and for each subset store a value that is the sum of weights of all edges from the subset. Let's store the values for all the subsets in an array called Val. The size of this array is 2^|E|. Sort Val in increasing order, and then apply binary search for the minimum Hamiltonian tour, but this time call the machine that solves problem B only with values from the Val array. Since every subset of edges is included in the sorted array, it is guaranteed that the solution will be found. The total number of calls of the machine is O(lg(2^|E|)), which is O(|E|). So, the correct choice is (2) O(|E|) times.
Note:
The first algorithm I proposed is probably incorrect, as some people noted that you cannot apply binary search in an uncountable set. Since we are talking about real numbers, we cannot apply binary search in the range [0-M]
I believe the choice that was meant to be the answer is (1): you can't do that.
The reason is that you can only do binary search on countable sets.
Note that the edges of the graph may even have negative weights, and besides, they may have fractional, or even irrational weights. In that case, the search space for the answer will be the set of all real values less than m.
However, you can get arbitrarily close to the answer of A in O(log n) time, but you cannot find the exact answer (n being the size of the countable space).
Supposing that in the encoding of graphs the weights are encoded as binary strings representing nonnegative integers and that Problem B can actually algorithmically be solved by entering a real number and perform calculations based on that, things are apparently as follows.
It is possible to first do a binary search over the integral interval {0,...,M} to obtain the minimum weight of a Hamiltonian tour in O(log M) calls to the algorithm for Problem B. As the optimum is known afterwards, we can eliminate single edges in G and use the resulting graph as an input to the algorithm for Problem B to test whether or not the optimum changes. This process uses O(|E|) calls to the algorithm for Problem B to identify edges which occur in an optimal Hamiltonian tour. The overall running time of this approach is O( (|E| + log M ) * r(G)), where r(G) denotes the running time of the algorithm for Problem B taking a graph G as an input. I suppose that r is a polynomial, although the question does not explicitly state this; in total, the overall running time would be polynomially bounded in the encoding length of the input, as M can be computed in polynomial time (and hence is pseudopolynomially bounded in the encoding length of the input G).
That being said, the supposed answers can be commented as follows.
1) The answer is wrong, as the set of necessary states is finite.
2) Might be true, but does not follow from the algorithm discussed above.
3) Might be true, but does not follow from the algorithm discussed above.
4) The answer is wrong. Strictly speaking, the NP-hardness of Problem A does not rule out a polynomial time algorithm; furthermore, the algorithm for Problem B is not stated to be polynomial, so even P=NP does not follow if Problem A can be solved by a polynomial number of calls to the algorithm for Problem B (which is the case by the algorithm sketched above).
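The binary search over the integer range {0,...,M} described above can be sketched as follows, assuming nonnegative integer weights. The machine for Problem B is not specified in the question, so make_oracle below is a purely hypothetical brute-force stand-in that is only usable on tiny graphs; only min_tour_weight illustrates the O(log M) calls.

```python
from itertools import permutations

def make_oracle(weights):
    """Hypothetical stand-in for the machine solving Problem B (brute force)."""
    n = len(weights)

    def has_tour_at_most(r):
        # Fix node 0 as the start and try every ordering of the other nodes.
        for perm in permutations(range(1, n)):
            tour = (0,) + perm + (0,)
            total = sum(weights[a][b] for a, b in zip(tour, tour[1:]))
            if total <= r:
                return True
        return False

    return has_tour_at_most

def min_tour_weight(oracle, upper):
    """Smallest integer R in [0, upper] with oracle(R) true, plus the number of calls.

    Assumes nonnegative integer weights, so oracle(upper) is true when
    upper is the sum of all edge weights.
    """
    lo, hi = 0, upper
    calls = 0
    while lo < hi:
        mid = (lo + hi) // 2
        calls += 1
        if oracle(mid):
            hi = mid       # a tour of weight <= mid exists
        else:
            lo = mid + 1   # every tour is heavier than mid
    return lo, calls

# Symmetric weight matrix of a complete graph on 4 nodes.
weights = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
total_weight = sum(weights[i][j] for i in range(4) for j in range(i + 1, 4))
print(min_tour_weight(make_oracle(weights), total_weight))
```

This finds only the optimal weight; recovering the tour itself would need the extra O(|E|) edge-elimination calls mentioned above.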

Directed graph (topological sort)

Say there exists a directed graph, G(V, E) (V represents vertices and E represents edges), where each edge (x, y) is associated with a weight (x, y) where the weight is an integer between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(m + n), where m is the number of vertices and n is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?
The algorithm you need to use for finding the minimal path from a given vertex to another in a weighted graph is Dijkstra's algorithm. Unfortunately its complexity is O(n*log(n) + m) which may be more than you try to accomplish.
However, in your case the edges are special: their weights have only 10 valid values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small set of weights) in which all operations take constant time.
One possible way to do that is to have 10 lists - one for each weight. Adding an edge in the data structure is simply append to a list. Finding the minimum element is iteration over the 10 lists to find the first one that is non-empty. This still is constant as no more than 10 iterations will be performed. Removing the minimum element is also pretty straight-forward - simple removal from a list.
Using Dijkstra's algorithm with some data structure of the same asymptotic complexity will be what you need.
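One concrete way to realize this idea is a circular array of buckets indexed by tentative distance, a variant often called Dial's algorithm. It is a swapped-in refinement rather than exactly the ten-lists structure described above: since every weight is between 1 and 10, all queued distances lie within 10 of the current one, so 11 buckets suffice. The function name, the adjacency-list layout, and the max_weight parameter are assumptions for the sketch.

```python
def bucket_dijkstra(n, adj, source, max_weight=10):
    """Shortest distances from source when every edge weight is an integer in 1..max_weight.

    adj[u] is a list of (v, w) pairs. Runs in O(n * max_weight + number of edges).
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    num_buckets = max_weight + 1
    buckets = [[] for _ in range(num_buckets)]
    buckets[0].append(source)
    pending = 1  # entries still sitting in some bucket (including stale ones)
    d = 0        # distance value currently being settled
    while pending > 0:
        bucket = buckets[d % num_buckets]
        while bucket:
            u = bucket.pop()
            pending -= 1
            if dist[u] != d:
                continue  # stale entry: u was reached more cheaply later on
            for v, w in adj[u]:
                nd = d + w
                if nd < dist[v]:
                    dist[v] = nd
                    # w >= 1, so this never lands in the bucket being drained
                    buckets[nd % num_buckets].append(v)
                    pending += 1
        d += 1
    return dist

# Node 4 is unreachable from node 0.
adjacency = [
    [(1, 4), (2, 1)],
    [(3, 1)],
    [(1, 2), (3, 10)],
    [],
    [],
]
print(bucket_dijkstra(5, adjacency, 0))
```

To answer the s-t query, one would read off dist[t], or stop as soon as t is popped with a current entry.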

Time complexity of creating a minimal spanning tree if the number of edges is known

Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct; would it be possible to create a minimal spanning tree in linear time?
To do this we must look at each edge, and this loop cannot contain any searches, otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow, we must only look at each edge once, and decide whether to include it or not based on some "static" previous values that do not involve a growing data structure.
So, let's say we keep the endpoints of the edge in question, then look at the next edge. If the next edge has the same endpoints as prev, then compare the weights of prev and the current edge and keep the lower one. If the current edge's endpoints are not equal to those of prev, then it is in a different component. Now I am stuck, because we cannot create a hash or array to keep track of the component nodes that are already added while looking through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?
To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edge weights are distinct.
This can be done in O(m) time as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!
There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma": if you independently select each edge with probability p and find the minimum spanning tree of the resulting subgraph, then the expected number of edges that are better than the worst edge in the tree path connecting their ends is at most |V|/p.
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.
The fact that the number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have a different N. If you place an upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about current case cannot influence further processing, as we cannot predict how the graph's structure will look like in the next step.
If the number of edges is close to n, like in this case n+6 (after the edit), we know that we only need to remove 7 edges, as every spanning tree has only n-1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any minimum spanning tree (assuming all edge weights are distinct) and thus should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.
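A minimal sketch of this cycle-removal idea, assuming a connected undirected graph given as a list of (u, v, weight) tuples with distinct weights. The function names are made up for the example; each pass does one BFS to find a cycle and then drops that cycle's heaviest edge.

```python
from collections import deque

def find_cycle(n, edges):
    """Return the indices of the edges on some cycle, or None if the graph is a forest."""
    adj = [[] for _ in range(n)]
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    depth = [-1] * n
    parent = [-1] * n
    parent_edge = [-1] * n
    for root in range(n):
        if depth[root] != -1:
            continue
        depth[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, idx in adj[u]:
                if idx == parent_edge[u]:
                    continue  # do not walk back along the tree edge we came in on
                if depth[v] == -1:
                    depth[v] = depth[u] + 1
                    parent[v] = u
                    parent_edge[v] = idx
                    queue.append(v)
                else:
                    # Non-tree edge: climb from both ends until the tree paths meet.
                    cycle = [idx]
                    a, b = u, v
                    while a != b:
                        if depth[a] >= depth[b]:
                            cycle.append(parent_edge[a])
                            a = parent[a]
                        else:
                            cycle.append(parent_edge[b])
                            b = parent[b]
                    return cycle
    return None

def mst_by_cycle_removal(n, edges):
    """Repeatedly delete the heaviest edge of some cycle until no cycle remains."""
    remaining = list(edges)
    while True:
        cycle = find_cycle(n, remaining)
        if cycle is None:
            return remaining
        heaviest = max(cycle, key=lambda i: remaining[i][2])
        remaining.pop(heaviest)

# 4 nodes, 5 edges: two edges must go.
sample = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4), (0, 2, 5)]
print(sorted(w for _, _, w in mst_by_cycle_removal(4, sample)))
```

With n + c edges this performs c + 2 searches of O(n + c) each, so it is linear only while c stays a small constant, as noted above.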
