Finding the cycle of smallest average weight in a directed graph - algorithm

I'm looking for an algorithm that takes a directed, weighted graph (with positive integer weights) and finds the cycle in the graph with the smallest average weight (as opposed to total weight).
Based on similar questions (but for total weight), I considered applying a modification of the Floyd–Warshall algorithm, but it would rely on the following property, which does not hold (thank you Ron Teller for providing a counterexample to show this): "For vertices U,V,W, if there are paths p1,p2 from U to V, and paths p3,p4 from V to W, then the optimal combination of these paths to get from U to W is the better of p1,p2 followed by the better of p3,p4."
What other algorithms might I consider that don't rely on this property?
Edit: Moved the following paragraph, which is no longer relevant, below the question.
While this property seems intuitive, it doesn't seem to hold in the case of two paths that are equally desirable. For example, if p1 has total weight 2 and length 2, and p2 has total weight 3 and length 3, neither one is better than the other. However, if p3 and p4 have greater total weights than lengths, p2 is preferable to p1. In my desired application, weights of each edge are positive integers, so this property is enforced and I think I can assume that, in the case of a tie, longer paths are better. However, I still can't prove that this works, so I can't verify the correctness of any algorithm relying on it.

"While this property seems intuitive, it doesn't seem to hold in the
case of two paths that are equally desirable."
Actually, when you are considering two parameters (weight, length), it doesn't hold in any case. Here is an example where P1, which on its own has a larger average weight than P2 (10/9 versus 1), can be either worse (Example 1) or better (Example 2) for the final solution, depending on P3 and P4.
Example 1:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 1, W(P3) = 1
L(P4) = 1, W(P4) = 1
Example 2:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 5, W(P3) = 10
L(P4) = 5, W(P4) = 10
These two parameters have an effect on your objective function that cannot be determined locally, thus the Floyd–Warshall algorithm with any modification won't work.
Since you are considering cycles only, you might want to consider a brute-force algorithm that evaluates the average weight of each cycle in the graph. Be aware that a graph can contain exponentially many simple cycles, so this is only practical for small graphs. See:
Finding all cycles in graph

I can suggest another algorithm.
Let's fix a value C and subtract C from every edge weight. How does the answer change? Since the same number is subtracted from all weights, the average weight of each cycle decreases by exactly C. Now let's check whether there is a cycle with negative average weight. A cycle has negative average weight if and only if it has negative total weight, so it is enough to check whether there is a cycle with negative weight. We can do that using the Bellman-Ford algorithm. If such a cycle exists, then the answer is less than C.
So now we can find the answer via binary search. The resulting complexity will be O(VE log(MaxWeight)).
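The binary search described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function names, the edge format (u, v, w) with vertices numbered 0..n-1, the fixed iteration count, and the small floating-point tolerance are all assumptions made for the example.

```python
def has_negative_cycle(n, edges, c):
    """Bellman-Ford check: is there a cycle whose weights, each reduced by c, sum to < 0?"""
    # Starting every vertex at 0 acts like a virtual source with a
    # zero-weight edge to each vertex, so every cycle is reachable.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + (w - c) < dist[v] - 1e-12:  # tolerance is an assumption
                dist[v] = dist[u] + (w - c)
                changed = True
        if not changed:
            return False
    return True  # still relaxing on the n-th pass, so a negative cycle exists


def min_mean_cycle(n, edges, iterations=60):
    """Approximate minimum average cycle weight, or None if the graph is acyclic."""
    lo = min(w for _, _, w in edges)
    hi = max(w for _, _, w in edges)
    # With c above the largest weight every cycle becomes negative,
    # so this call only tests whether any cycle exists at all.
    if not has_negative_cycle(n, edges, hi + 1):
        return None
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            hi = mid  # some cycle has average below mid
        else:
            lo = mid  # every cycle has average at least mid
    return hi


# Cycles: 0->1->0 (average 2), 1->2->1 (average 1.5), self-loop at 2 (average 4)
example = [(0, 1, 1), (1, 0, 3), (1, 2, 1), (2, 1, 2), (2, 2, 4)]
print(min_mean_cycle(3, example))
```

Because the search works on real numbers, the result is an approximation of the optimal average; recovering the exact rational value or the cycle itself would need extra bookkeeping that is not shown here.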

The problem you describe is called the minimum mean cycle problem and can be solved efficiently. Further, there's some really nice optimization theory to check out if you're interested (start with the standard reference AMO93).
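The answer does not say which method it has in mind. One well-known exact approach for the minimum mean cycle problem is Karp's algorithm, which runs in O(VE) time. The sketch below is only an illustration under stated assumptions: the function name and the (u, v, w) edge format with vertices 0..n-1 are made up for the example, and walks are allowed to start at any vertex (equivalent to adding a source with zero-weight edges to every vertex).

```python
from fractions import Fraction


def karp_min_mean_cycle(n, edges):
    """Exact minimum average cycle weight as a Fraction, or None if there is no cycle."""
    INF = float("inf")
    # d[k][v] = minimum weight of a walk with exactly k edges that ends at v
    d = [[INF] * n for _ in range(n + 1)]
    d[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue  # no n-edge walk ends here, so v gives no bound
        # Karp's formula: max over k of (d[n][v] - d[k][v]) / (n - k)
        worst = max(
            Fraction(d[n][v] - d[k][v], n - k)
            for k in range(n)
            if d[k][v] != INF
        )
        if best is None or worst < best:
            best = worst
    return best


# Cycles: 0->1->0 (average 2), 1->2->1 (average 3/2), self-loop at 2 (average 4)
example = [(0, 1, 1), (1, 0, 3), (1, 2, 1), (2, 1, 2), (2, 2, 4)]
print(karp_min_mean_cycle(3, example))
```

Using Fraction keeps the result exact for integer weights, which matches the positive-integer setting of the question.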

Related

Shortest Path in a Directed Acyclic Graph with two types of costs

I am given a directed acyclic graph G = (V,E), which can be assumed to be topologically ordered (if needed). The edges in G have two types of costs - a nominal cost w(e) and a spiked cost p(e).
The goal is to find the shortest path from a node s to a node t which minimizes the following cost:
sum_e (w(e)) + max_e (p(e)), where the sum and maximum are taken over all edges in the path.
Standard dynamic programming methods show that this problem is solvable in O(E^2) time. Is there a more efficient way to solve it? Ideally, an O(E*polylog(E,V)) algorithm would be nice.
---- EDIT -----
This is the O(E^2) solution I found using dynamic programming.
First, sort all costs p(e) in ascending order. This takes O(E log E) time.
Second, define the state space consisting of states (x,i) where x is a node in the graph and i is in 1,2,...,|E|. It represents "We are in node x, and the highest edge weight p(e) we have seen so far is the i-th largest".
Let V(x,i) be the length of the shortest path (in the classical sense) from s to x, where the highest p(e) encountered was the i-th largest. It's easy to compute V(x,i) given V(y,j) for any predecessor y of x and any j in 1,...,|E| (there are two cases to consider: either the edge y->x has the j-th largest weight, or it does not).
At every state (x,i), this computation finds the minimum of about deg(x) values. Thus the complexity is O(|E| * sum_(x\in V) deg(x)) = O(|E|^2), as each node is associated to |E| different states.
I don't see any way to get the complexity you want. Here's an algorithm that I think would be practical in real life.
First, reduce the graph to only vertices and edges between s and t, and do a topological sort so that you can easily find shortest paths in O(E) time.
Let W(m) be the minimum sum(w(e)) cost over paths with max(p(e)) <= m, and let P(m) be the smallest max(p(e)) among those shortest paths. The problem solution corresponds to W(m)+P(m) for some cost m. Note that we can find W(m) and P(m) simultaneously in O(E) time by finding a shortest W-cost path, using P-cost to break ties.
The relevant values for m are the p(e) costs that actually occur, so make a sorted list of those. Then use a Kruskal's algorithm variant to find the smallest m that connects s to t, and calculate P(infinity) to find the largest relevant m.
Now we have an interval [l,h] of m-values that might be the best. The best possible result in the interval is W(h)+P(l). Make a priority queue of intervals ordered by best possible result, and repeatedly remove the interval with the best possible result, and:
stop if the best possible result = an actual result W(l)+P(l) or W(h)+P(h)
stop if there are no p(e) costs between l and P(h)
stop if the difference between the best possible result and an actual result is within some acceptable tolerance; or
stop if you have exceeded some computation budget
otherwise, pick a p(e) cost t between l and P(h), find a shortest path to get W(t) and P(t), split the interval into [l,t] and [t,h], and put them back in the priority queue and repeat.
The worst-case complexity to get an exact result is still O(E^2), but there are many economies and a lot of flexibility in how to stop.
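As a baseline for the interval-splitting scheme above, the exhaustive version simply evaluates W(m)+P(m) for every p(e) value that occurs and keeps the best. The sketch below assumes vertices are numbered 0..n-1 in topological order (so u < v for every edge) and uses a hypothetical (u, v, w, p) tuple per edge; the function name is also made up for the example.

```python
def min_sum_plus_max(n, edges, s, t):
    """Minimise sum of w(e) plus max of p(e) over s-t paths in a DAG.

    Assumes every edge (u, v, w, p) satisfies u < v, i.e. the vertex
    numbering is already a topological order.
    """
    INF = float("inf")
    by_tail = sorted(edges)  # all edges into u are processed before edges out of u
    best = INF
    for m in sorted({p for _, _, _, p in edges}):
        dist = [INF] * n   # W-cost of the best path found so far
        peak = [0] * n     # largest p(e) on that path (tie-breaker)
        dist[s] = 0
        for u, v, w, p in by_tail:
            if p > m or dist[u] == INF:
                continue   # edge not allowed at this threshold, or u unreachable
            cand = (dist[u] + w, max(peak[u], p))
            if cand < (dist[v], peak[v]):
                dist[v], peak[v] = cand
        if dist[t] < INF:
            best = min(best, dist[t] + peak[t])  # W(m) + P(m)
    return best


edges = [(0, 1, 1, 10), (1, 3, 1, 10), (0, 2, 4, 1), (2, 3, 4, 2), (0, 3, 9, 3)]
print(min_sum_plus_max(4, edges, 0, 3))
```

Each threshold costs one O(E) pass after the initial sort, giving the same O(E^2) worst case; the priority-queue scheme tries to skip most of these passes.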
This is only a 2-approximation, not an approximation scheme, but perhaps it inspires someone to come up with a better answer.
Using binary search, find the minimum spiked cost θ* such that, letting C(θ) be the minimum nominal cost of an s-t path using edges with spiked cost ≤ θ, we have C(θ*) = θ*. Every solution has either nominal or spiked cost at least as large as θ*, hence θ* leads to a 2-approximate solution.
Each test in the binary search involves running Dijkstra on the subset of edges with spiked cost ≤ θ, hence this algorithm takes time O(|E| log^2 |E|) or, if you want to be technical about it and use Fibonacci heaps, O((|E| + |V| log |V|) log |E|).

least cost path, destination unknown

Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed, can cross the same edge multiple times, can revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As an additional heuristic, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and for staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find least cost path between all pairs of vertices, and then for each starting vertex subset the LCP's of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge, which, if included, would have made the path length equal N.
It's an interesting fact that if A is the weighted adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum used in normal matrix multiplication, then A^k[i,j] is the length of the shortest path from node i to node j with exactly k edges. Now the trick is to use repeated squaring so that A^k needs only O(log k) matrix multiplications.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
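The min-plus matrix power with repeated squaring can be sketched as below. The function names are made up for this illustration, and missing edges are assumed to be stored as infinity in the matrix; path reconstruction (tracking where each min came from) is left out to keep the example short.

```python
INF = float("inf")


def min_plus(a, b):
    """Matrix product with + in place of multiply and min in place of sum."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def min_plus_power(a, k):
    """a raised to the k-th power in the (min, +) sense, by repeated squaring."""
    n = len(a)
    # Identity for (min, +): 0 on the diagonal, infinity elsewhere
    result = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    base = a
    while k > 0:
        if k & 1:
            result = min_plus(result, base)
        base = min_plus(base, base)
        k >>= 1
    return result


def cheapest_walks(a, k):
    """For each start vertex, the cost of the cheapest walk with exactly k edges."""
    return [min(row) for row in min_plus_power(a, k)]


# 0->1, 1->2, 2->0 each cost 1; self-loop at 0 costs 5
graph = [
    [5, 1, INF],
    [INF, INF, 1],
    [1, INF, INF],
]
print(cheapest_walks(graph, 3))
```

Each product here is the naive O(n^3) one, so the total cost is O(n^3 log k), which is why this suits dense graphs.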

What is the diameter of a graph with just one node?

I'm trying to find an answer to a problem in my Distributed Algorithms course, and to do so I want to get something clarified.
What is the diameter of a graph with one node, with an edge to itself? Is it 1 or 0?
If you are interested, the question to which I'm trying to find an answer is this:
In terms of n (# nodes), the number of messages (= diam * |E|) used in
the FloodMax algorithm is easily seen to be O(n^3). Produce a class of
digraphs in which the product (diam * |E|) really is Omega(n^3).
The digraph I came up with is a graph with just one node, which has a directed edge to itself. That way |E| would be 1 which is n^2, and if the diam is 1, it satisfies the second condition where diam = 1 = n as well. So it gives me a class of digraphs with message complexity being Omega(n^3).
So am I correct in my thinking, that in such a graph the diameter is 1?
Two things:
It seems to be 0 according to this, which says:
In other words, a graph's diameter is the largest number of vertices which must be traversed in order to travel from one vertex to another when paths which backtrack, detour, or loop are excluded from consideration.
Your solution to the given problem should describe how to build a graph (or rather say what type of known graph has that property, since it says "produce a class") with n nodes, not a graph with however many nodes you manually figured out a solution for. I can do the same for 2 nodes:
1 -- 2
|E| = 1 = (1/4)*2^2 = (1/4)*n^2 = O(n^2)
diam = 1 = 2 - 1 = n - 1 = O(n)
tada!
Or here's how we can make your solution work even if the diameter is 0: 0 = 1 - 1 = n - 1 = O(n) => your solution still works!
So even if you considered paths with loops as well, I would still deem your solution incorrect.
O(n^3) and Omega(n^3) do not mean cn^3, and there is no problem with a function that is 0 at finitely many nonzero values of n being in O(n^3) and Omega(n^3). For example, n^3-100 is in both, as is n^3-100n^2. For the purposes of asymptotics, it is unimportant what the diameter is for a single example. You are asked to find an infinite family of graphs with large enough diameters, and a single example of a graph doesn't affect the asymptotics of the infinite family.
That said, the diameter of a graph (or strongly connected digraph) can be defined in a few ways. One possibility is the greatest value of half of the minimum of the length of a round trip from v to w and back over all pairs v and w, and that is 0 when v and w coincide. So, the diameter of a graph with one vertex is 0.
Again, this doesn't help at all with the exercise that you have, to construct an infinite family. A family with one node and lots of edges back to itself isn't going to cut it. Think of how you might add many edges to graphs with large diameter, such as an n-cycle or path, without decreasing the diameter that much.
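To experiment with candidate families, a small helper that measures the diameter can be handy. The sketch below uses one common definition as an assumption (the largest shortest-path distance over all ordered pairs of vertices, with the distance from a vertex to itself being 0); the function name and the adjacency-dict format are made up for the example, and the digraph is assumed to be strongly connected.

```python
from collections import deque


def diameter(adj):
    """Largest BFS distance over all ordered vertex pairs.

    adj maps each vertex to an iterable of its out-neighbours;
    the digraph is assumed to be strongly connected.
    """
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


print(diameter({0: [0]}))                          # single node with a self-loop
print(diameter({0: [1], 1: [2], 2: [3], 3: [0]}))  # directed 4-cycle
```

Under this definition the self-loop never matters, because a vertex already has distance 0 to itself.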

Minimum number of days required to solve a list of questions

There are N problems numbered 1..N which you need to complete. You've arranged the problems in increasing difficulty order, and the ith problem has estimated difficulty level i. You have also assigned a rating vi to each problem. Problems with similar vi values are similar in nature. On each day, you will choose a subset of the problems and solve them. You've decided that each subsequent problem solved on the day should be tougher than the previous problem you solved on that day. Also, to not make it boring, consecutive problems you solve should differ in their vi rating by at least K. What is the least number of days in which you can solve all problems?
Input:
The first line contains the number of test cases T. T test cases follow. Each case contains an integer N and K on the first line, followed by integers v1,...,vn on the second line.
Output:
Output T lines, one for each test case, containing the minimum number of days in which all problems can be solved.
Constraints:
1 <= T <= 100
1 <= N <= 300
1 <= vi <= 1000
1 <= K <= 1000
Sample Input:
2
3 2
5 4 7
5 1
5 3 4 5 6
Sample Output:
2
1
This is one of the challenges from interviewstreet.
Below is my approach:
Start from the first question, find the maximum number of questions that can be solved on that day, and remove those questions from the list. Then start again from the first element of the remaining list, and repeat until the list is empty.
I am getting a wrong answer from this method, so I am looking for an algorithm to solve this challenge.
Construct a DAG of problems in the following way. Let pi and pj be two different problems. Then we will draw a directed edge from pi to pj if and only if pj can be solved directly after pi on the same day, consecutively. Namely, the following conditions have to be satisfied:
i < j, because you should solve the less difficult problem earlier.
|vi - vj| >= K (the rating requirement).
Now notice that each subset of problems that is chosen to be solved on some day corresponds to the directed path in that DAG. You choose your first problem, and then you follow the edges step by step, each edge in the path corresponds to the pair of problems that have been solved consecutively on the same day. Also, each problem can be solved only once, so any node in our DAG may appear only in exactly one path. And you have to solve all the problems, so these paths should cover all the DAG.
Now we have the following problem: given a DAG of n nodes, find the minimal number of vertex-disjoint directed paths that cover this DAG completely. This is a well-known problem called path cover. Generally speaking, it is NP-hard. However, our directed graph is acyclic, and for acyclic graphs it can be solved in polynomial time by reduction to bipartite matching. The maximum matching problem, in turn, can be solved using the Hopcroft-Karp algorithm, for example. The exact reduction is easy and can be read, say, on Wikipedia. For each directed edge (u, v) of the original DAG, add an undirected edge (a_u, b_v) to the bipartite graph, where {a_i} and {b_i} are the two parts of size n. The minimum number of paths is then n minus the size of a maximum matching.
The number of nodes in each part of the resulting bipartite graph is equal to the number of nodes in the original DAG, n. We know that the Hopcroft-Karp algorithm runs in O(n^2.5) in the worst case, and 300^2.5 ≈ 1 558 845. For 100 tests this algorithm should take under 1 second in total.
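The reduction above can be sketched as follows. To keep the example short it swaps in the simple augmenting-path matching (often called Kuhn's algorithm) instead of Hopcroft-Karp; that is slower in the worst case but should be adequate for N up to 300. The function name and argument order are assumptions made for this illustration.

```python
def min_days(values, k):
    """Minimum path cover of the 'can follow on the same day' DAG."""
    n = len(values)
    # adj[i] = problems j that may be solved directly after problem i
    adj = [
        [j for j in range(i + 1, n) if abs(values[i] - values[j]) >= k]
        for i in range(n)
    ]
    match_to = [-1] * n  # match_to[j] = i when edge (a_i, b_j) is in the matching

    def try_augment(i, seen):
        # Look for an augmenting path starting at left vertex a_i
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match_to[j] == -1 or try_augment(match_to[j], seen):
                    match_to[j] = i
                    return True
        return False

    matching = sum(try_augment(i, [False] * n) for i in range(n))
    # Every matched edge merges two paths, so the cover needs n - matching paths
    return n - matching


print(min_days([5, 4, 7], 2))        # first sample case
print(min_days([5, 3, 4, 5, 6], 1))  # second sample case
```

Reading the T test cases from standard input is omitted; each case would call min_days with its ratings and K.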
The algorithm is simple. First, sort the problems by v_i, then, for each problem, find the number of problems in the interval (v_i-K, v_i]. The maximum of those numbers is the result. The second phase can be done in O(n), so the most costly operation is sorting, making the whole algorithm O(n log n). Look here for a demonstration of the work of the algorithm on your data and K=35 in a spreadsheet.
Why does this work
Let's reformulate the problem to the problem of graph coloring. We create graph G as follows: vertices will be the problems and there will be an edge between two problems iff |v_i - v_j| < K.
In such a graph, independent sets exactly correspond to sets of problems doable on the same day. (<=) If the set can be done on a day, it is surely an independent set. (=>) If the set doesn't contain two problems not satisfying the K-difference criterion, you can just sort them according to the difficulty and solve them in this order. Both conditions will be satisfied this way.
Therefore, it easily follows that colorings of graph G exactly correspond to schedules of the problems on different days, with each color corresponding to one day.
So, we want to find the chromaticity of graph G. This will be easy once we recognize the graph is an interval graph, which is a perfect graph, those have chromaticity equal to cliqueness, and both can be found by a simple algorithm.
Interval graphs are graphs of intervals on the real line, edges are between intervals that intersect. Our graph, as can be easily seen, is an interval graph (for each problem, assign an interval (v_i-K, v_i]. It can be easily seen that the edges of this interval graph are exactly the edges of our graph).
Lemma 1: In an interval graph, there exists a vertex whose neighbors form a clique.
Proof is easy. You just take the interval with the lowest upper bound (or highest lower bound) of all. Any intervals intersecting it have the upper bound higher, therefore, the upper bound of the first interval is contained in the intersection of them all. Therefore, they intersect each other and form a clique. qed
Lemma 2: In a family of graphs closed under taking induced subgraphs and having the property from Lemma 1 (existence of a vertex whose neighbors form a clique), the following algorithm produces a minimal coloring:
1. Find the vertex x whose neighbors form a clique.
2. Remove x from the graph, obtaining the subgraph G'.
3. Color G' recursively.
4. Color x with the least color not found on its neighbors.
Proof: In (3), the algorithm produces an optimal coloring of the subgraph G' by the induction hypothesis plus closure of our family under induced subgraphs. In (4), the algorithm only chooses a new color n if there is a clique of size n-1 on the neighbors of x. That means, with x, there is a clique of size n in G, so its chromaticity must be at least n. Therefore, the color given by the algorithm to a vertex is always <= chromaticity(G), which means the coloring is optimal. (Obviously, the algorithm produces a valid coloring.) qed
Corollary: Interval graphs are perfect (perfect <=> chromaticity == cliqueness)
So we just have to find the cliqueness of G. That is easy for interval graphs: you just process the segments of the real line not containing interval boundaries and count the number of intervals intersecting there, which is even easier in your case, where the intervals have uniform length. This leads to the algorithm outlined at the beginning of this post.
Do we really need to go to path cover? Can't we just follow a similar strategy to LIS?
The input is in increasing order of complexity. We just have to maintain a bunch of queues for tasks to be performed each day. Every element in the input will be assigned to a day by comparing the last elements of all queues. Wherever we find a difference of 'k' we append the task to that list.
For ex: 5 3 4 5 6
1) Input -> 5 (Empty lists so start a new one)
5
2) 3 (only list 5 & abs(5-3) is 2 (k) so append 3)
5--> 3
3) 4 (only list with last vi, 3 and abs(3-4) < k, so start a new list)
5--> 3
4
4) 5 again (abs(3-5)=k append)
5-->3-->5
4
5) 6 again (abs(5-6) < k but abs(4-6) = k) append to second list)
5-->3-->5
4-->6
We just need to maintain an array with last elements of each list. Since order of days (when tasks are to be done is not important) we can maintain a sorted list of last tasks therefore, searching a place to insert the new task is just looking for the value abs(vi-k) which can be done via binary search.
Complexity:
The loop runs for N elements. In the worst case, we may end up querying the ceiling value using binary search (log i) for many input[i].
Therefore, T(n) < O( log N! ) = O(N log N). Analyse to ensure that the upper and lower bounds are also O( N log N ). The complexity is THETA (N log N).
The minimum number of days required will be the same as the length of the longest path in the complementary (undirected) graph of G (DAG). This can be solved using Dilworth's theorem.

Finding subset of disjunctive intervals with maximal weights

I am looking for an algorithm I could use to solve this, not the code. I wondered about using linear programming with relaxation, but maybe there are more efficient ways for solving this?
The problem
I have a set of intervals with weights. Intervals can overlap. I need to find the maximal sum of weights over a subset of disjunctive (non-overlapping) intervals.
Example
Intervals with weights :
|--3--| |---1-----|
|----2--| |----5----|
Answer: 8
I have an exact O(n log n) DP algorithm in mind. Since this is homework, here is a clue:
Sort the intervals by right edge position as Saeed suggests, then number them up from 1. Define f(i) to be the highest weight attainable by using only intervals that do not extend to the right of interval i's right edge.
EDIT: Clue 2: Calculate each f(i) in increasing order of i. Keep in mind that each interval will either be present or absent. To calculate the score for the "present" case, you'll need to hunt for the "rightmost" interval that is compatible with interval i, which will require a binary search through the solutions you've already computed.
That was a biggie, not sure I can give more clues without totally spelling it out ;)
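For readers who want the clues spelled out, one way the DP could look is sketched below. It is only an illustration under stated assumptions: the function name and the (start, end, weight) tuple format are made up, intervals that merely touch at an endpoint are treated as compatible, and the coordinates in the usage example are invented to resemble the picture in the question.

```python
from bisect import bisect_right


def max_weight_disjoint(intervals):
    """Maximum total weight of pairwise non-overlapping intervals.

    intervals is a list of (start, end, weight) tuples. Intervals that only
    touch at an endpoint are treated as compatible (an assumption).
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by right edge
    ends = [end for _, end, _ in ivs]
    f = [0] * (len(ivs) + 1)  # f[i] = best weight using only the first i intervals
    for i, (start, _end, weight) in enumerate(ivs, start=1):
        # j = number of earlier intervals whose right edge is <= start,
        # found by binary search among the first i - 1 right edges
        j = bisect_right(ends, start, 0, i - 1)
        f[i] = max(f[i - 1],      # interval i absent
                   f[j] + weight)  # interval i present
    return f[-1]


# Made-up coordinates for the four intervals with weights 3, 1, 2 and 5
print(max_weight_disjoint([(0, 6, 3), (8, 18, 1), (2, 10, 2), (12, 22, 5)]))
```

Sorting dominates the running time, and each of the n binary searches costs O(log n), which gives the O(n log n) bound mentioned above.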
If there are no weights, it's easy: you can use a greedy algorithm that sorts the intervals by end time and, in each step, takes the compatible interval with the smallest end time.
But in your case I think it's NP-complete (I should think about it). You can still use a similar greedy algorithm by valuing each interval by weight/length and each time taking one of the possible intervals in sorted order. You can also use simulated annealing: at each step, take the best interval by the above value with probability P (P near 1), or select another interval with probability 1-P. Run this in a loop n times to find a good answer.
Here's an idea:
Consider the following graph: Create a node for each interval. If interval I1 and interval I2 do not overlap and I1 comes before I2, add a directed edge from node I1 to node I2. Note this graph is acyclic. Each node has a cost equal to the length of the corresponding interval.
Now, the idea is to find the longest path in this graph, which can be found in polynomial time for acyclic graphs (using dynamic programming, for example). The problem is that the costs are in the nodes, not in the edges. Here is a trick: split each node v into v' and v''. All edges entering v will now enter v' and all edges leaving v will now leave v''. Then, add an edge from v' to v'' with the node's cost, in this case, the length of the interval. All the other edges will have cost 0.
Well, if I'm not mistaken the longest path in this graph will correspond to the set of disjoint intervals with maximum sum.
You could formulate this problem as a general IP (integer programming) problem with binary variables indicating whether an interval is selected or not. The objective function will then be a weighted linear combination of the variables. You would then need appropriate constraints to enforce disjunctiveness amongst the intervals...That should suffice given the homework tag.
Also, just because a problem can be formulated as an integer program (solving which is NP-Hard) it does not mean that the problem class itself is NP-Hard. So, as Ulrich points out there may be a polynomially-solvable formulation/algorithm such as formulating/solving the problem as a linear program.
Correct solution (end to end) is explained here: http://tkramesh.wordpress.com/2011/02/03/dynamic-programming-1-weighted-interval-scheduling/

Resources