I'm trying to understand how the LP formulation for shortest path problem works. However I'm having trouble understanding constraints. Why does this formulation work?
http://ie.bilkent.edu.tr/~ie400/Lecture8.pdf
I'm having trouble understanding how the constraints work at pages 15 and 17. I got the main idea and I understand how and why x should take some values but I did not understand how the whole system works in terms of math. Can someone explain? In the exam, I am supposed to be able to create and modify such constraints but I am pretty far from doing that.
What isn't very clear on those slides (pp. 15 and 17) is that the line beginning with "s.t." is actually specifying one constraint per vertex i, i.e., n separate constraints in total (if there are n vertices). Normally this would be communicated by writing something like "∀i ϵ V" alongside the constraint.
In any case, this line says that for each vertex i, the total amount of flow entering it from any other vertices must equal the total amount of flow leaving it -- unless the vertex is the source, in which case the total amount of flow leaving it must be greater by 1, or the sink, in which case the total amount of flow entering it must be greater by 1. It may not be obvious how to come up with this system of constraints in the first place, but by looking at some examples you should be able to see that any shortest path (or in fact, any path from s to t at all) satisfies all of them: every internal vertex in the path will have 1 incoming edge and 1 outgoing edge, while s and t will have just 1 outgoing, or 1 incoming, edge, respectively. Vertices that don't participate in the path at all have 0 incoming and 0 outgoing flow, so they work too.
One more point is that with flow problems, very often the numbers labelling the edges represent capacity constraints -- maximum limits on the amount of flow between the two endpoints -- not costs as they do here.
Related
Given an undirected weighted graph (or a single connected component of a larger disjoint graph) which typically will contain numerous odd and even cycles, I am searching for algorithms to remove the smallest possible number of edges necessary in order to produce one or more bipartite subgraphs. Are there any standard algorithms in the literature such as exist for minimum cut, etc.?
The problem I am trying to solve looks like this in the real world:
Presentations of about 1 hour each are given to students about different subjects in one or two time blocks. Students can sign up for at least one presentation of their choice, or two, or three (3rd choice is an alternative in case one of the others isn't going to be presented). They have to be all different choices. If there are less than three sign-ups for a given presentation, it will not be given. If there are 18 or more, it will be given twice in both blocks. I have to schedule the presentations such that the maximum number of sign-ups are satisfied.
Scheduling is trivial in the following cases:
Sign-ups for only one presentation can always be satisfied if the presentation is given (i.e. sign-ups >= 3);
Sign-ups for two given presentations are always satisfiable if at least one of them is given twice.
First, all sign-ups are aggregated to determine which ones are given once and which are given twice. If a student has signed up for a presentation with too few other sign-ups, the alternative presentation is chosen if it will also be given.
At the end of the day, I am left with an undirected weighted graph where the vertices are the presentations and the edges represent students who have signed up for that combination of presentations, each of which is only presented once. The weight corresponds to the number of sign-ups for the unique combination of presentations (thus avoiding parallel edges).
If the number of vertices, or presentations, is around 20 or less, I have come up with a brute force solution which finishes in acceptable time. However, each additional vertex will double the runtime of that solution. After 28 or so, it rapidly becomes unmanageable.
This year we had 37 presentations, thirty of which were only given once and thus ended up in the graph. What I am trying right now for larger graphs is the following:
Find all discrete components and solve each component individually;
For each component, remove leaf nodes and bridge edges recursively;
Generate a spanning tree (I am using Kruskal's algorithm which works very well), saving the removed edges;
Generate the fundamental cycle set by adding one removed edge back into the tree at a time and stripping off the rest of the tree;
Using the Gibbs-Welch algorithm, I generate the complete set of all elemental cycles starting with the fundamental set obtained in step 4;
Count the number of odd and even cycles to which each edge belongs;
Create a priority queue of edges (ordering discussed below) and remove each edge successively from its connected component until the resulting component is bipartite.
I cannot find an ordering of the priority queue for which I can prove that the result would be as acceptable as a solution obtained using the brute force method (it is probably NP-hard). However, I am trying something along these lines:
a. If the edge belongs only to odd cycles, remove it first;
b. If the edge belongs to more odd than even cycles, remove it before any other edges which belong to more even cycles than odd;
c. Edges with the smallest weight should be removed first.
If an edge belongs to both an odd and an even cycle, removing it would leave a larger odd cycle behind. That is why I am ordering them like that. Obviously, the larger the number of odd cycles to which an edge belongs, the higher the priority, but only if less even cycles are affected.
There are additional criteria which exist but need to be considered outside of the graph problem; for example, removing an edge effectively removes one of the sign-ups for one of the presentations, so an eye has to be kept on not letting the number of sign-ups get too small.
(EDIT: there is also the possibility of splitting presentations into two blocks which have almost enough sign-ups, e.g. 15-16 instead of 18. But this means that whoever is giving the presentation would have to do it twice, so it is a trade-off.)
Thanks in advance for any suggestions!
This problem is equivalent to the NP-hard weighted max cut problem, which asks for a partition of the vertices into two parts such that the maximum number of edges go between the parts.
I think the easiest way to solve a problem size such as you have would be to formulate it as a quadratic integer program and then apply an off the shelf solver. The formulation looks like
maximize (1/2) sum_{ij} w_{ij} (1 - y_i y_j)
subject to
y_i in {±1} for all i
where w_ij is the weight of the undirected edge ij if present else zero (so the corresponding variable and its constraint can be omitted).
A question to the following exercise:
Let N = (V,E,c,s,t) be a flow network such that (V,E) is acyclic, and let m = |E|. Describe a polynomial-
time algorithm that checks whether N has a unique maximum flow, by solving ≤ m + 1 max-flow problems.
Explain correctness and running time of the algorithm
My suggestion would be the following:
run FF (Ford Fulkerson) once and save the value of the flow v(f) and the flow over all egdes f(e_i)
for each edge e_i with f(e_i)>0:
set capacity (in this iteration) of this edge c(e_i)=f(e_i)-1 and run FF.
If the value of the flow is the same as in the original graph, then there exists another way to push the max flow through the network and we're done - the max flow isn't unique --> return "not unique"
Otherwise we continue
we're done with looping without finding another max flow of same value, that means max flow is unique -> return "unique"
Any feedback? Have I overlooked some cases where this does not work?
Your question leaves a few details open, e.g., is this an integer flow graph (probably yes, although Ford-Fulkerson, if it converges, can run on other networks as well), and how exactly do you define whether two flows are different (is it enough that the function mapping edges to flows be different, or must the set of edges actually flowing something be different, which is a stronger requirement).
If the network is not necessarily integer flows, then, no, this will not necessarily work. Consider the following graph, where, on each edge, the number within the parentheses represents the actual flow, and the number to the left of the parentheses represents the capacity (e.g., the capacity of each of (a, c) and (c, d) is 1.1, and the flow of each is 1.):
In this graph, the flow is non-unique. It's possible to flow a total of 1 by floating 0.5 through (a, b) and (b, d). Your algorithm, however, won't find this by reducing the capacity of each of the edges to 1 below its current flow.
If the network is integer, it is not guaranteed to find a different set of participating edges than the current one. You can see it through the following graph:
Finally, though, if the network is an integer flow network, and the meaning of a different flow is simply a different function of edges to flows, then your algorithm is correct.
Sufficiency If your algorithm finds a different flow with the same total result, then obviously the new flow is legal, and, also, necessarily, at least one of the edges is flowing a different amount than it did before.
Necessity Suppose there is a different flow than the original one (with the same total value), with at least one of the edges flowing a different amount. Say that, for each edge, the flow in the alternative solution is not less than the flow in the original solution. Since the flows are different, there must be at least a single edge where the flow in the alternative solution increased. Without a different edge decreasing the flow, though, there is either a violation of the conservation of flow, or the original solution was suboptimal. Hence there is some edge e where the flow in the alternative solution is lower than in the original solution. Since it is an integer flow network, the flow must be at least 1 lower on e. By definition, though, reducing the capacity of e to at least 1 lower than the current flow, will not make the alternative flow illegal. Hence some alternative flow must be found if the capacity is decreased for e.
non integer, rational flows can be 'scaled' to integer
changing edges capacity is risky, because some edges may be critical and are included in every max flow
there is a better runtime solution, you don't need to check every single edge.
create a residual network (https://en.wikipedia.org/wiki/Flow_network). run DFS on the residual network graph, if you find a circle it means there is another max flow, wherein the flow on at least one edge is different.
The problem that I am trying to solve is as follows:
Given a directed graph, find the minimum number of paths that "cover" the entire graph. Multiple paths may go through the same vertice, but the union of the paths should be all of them.
For the given sample graph(see image), the result should be 2 (1->2->4, and 1->2->3 suffice).
By splitting the vertices and assigning a lower bound of 1 for each edge that connects an in-vertex to an out-vertex, and linking a source to every in-vertex and every out-vertex to a sink (they are not shown in the diagram, as it would make the whole thing messy), the problem is now about finding the minimum flow in the graph, with lower bounds constraints.
However, I have read that in order to solve this, I have to find a feasible flow, and then assign capacities as follows : C(e) = F(e) - L(e). However, by assigning a flow of 1 to each Source-vertex edge, vertex-Sink edge, and In-Out edge, the feasible flow is correct, and the total flow is equal to the number of vertices. But by assigning the new capacities, the in-out edges (marked blue) get a capacity of 0 (they have a lower-bound of 1, and in our choosing of a feasible flow, they get a flow of 1), and no flow is possible.
Fig. 2 : How I choose the "feasible flow"
However, from the diagram you can obviously see that you can direct a 2-flow that suffices the lower-bound on each "vertex edge".
Have I understood the minimum flow algorithm wrong? Where is the mistake?!
Once you have the feasible flow, you need to start "trimming" it by returning flow from the sink back to the source, subject to the lower bound and capacity constraints (really just residual capacities). The lower two black edges are used in the forward direction for this, because they don't have flow on them yet. The edges involving the source and the sink are used in reverse, because we're undoing the flow already on them. If you start thinking about all of this in terms of residual capacities, it will make more sense.
So let me explain the question:
You are given a graph. You find the max flow. But it turns out that an edge, e_i, had the wrong capacity. It had one less. Unfortunately, the flow was maxed out at the old capacity.
Compute the new max flow in linear time (in terms of the number of edges and vertices) once you are told e_i had the wrong capacity.
Here's my plan: (1) You can't only drop the flow at edge e_i by one at an edge because you must violate certain constraints: like flow is conserved at an edge. Fix the flows so you can get a valid flow. But how?
(2) Someone has given me a hint: it will be helpful to show the valid flow = previous flow -1.mmm...
Help.
Here's a couple of tips:
Suppose you found a path with nonzero flow (i.e >= 1), and contained this edge e_i. How might you use this path to make the overall flow valid again? Now, suppose you aren't given this path. How might you get it yourself?
Now, you know that the max flow of the new graph is either the same, or one less than before (why?). How might you find out which in linear time?
I am aware that Dijkstra's algorithm can find the minimum distance between two nodes (or in case of a metro - stations). My question though concerns finding the minimum number of transfers between two stations. Moreover, out of all the minimum transfer paths I want the one with the shortest time.
Now in order to find a minimum-transfer path I utilize a specialized BFS applied to metro lines, but it does not guarantee that the path found is the shortest among all other minimum-transfer paths.
I was thinking that perhaps modifying Dijkstra's algorithm might help - by heuristically adding weight (time) for each transfer, such that it would deter the algorithm from making transfer to a different line. But in this case I would need to find the transfer weights empirically.
Addition to the question:
I have been recommended to add a "penalty" to each time the algorithm wants to transfer to a different subway line. Here I explain some of my concerns about that.
I have put off this problem for a few days and got back to it today. After looking at the problem again it looks like doing Dijkstra algorithm on stations and figuring out where the transfer occurs is hard, it's not as obvious as one might think.
Here's an example:
If here I have a partial graph (just 4 stations) and their metro lines: A (red), B (red, blue), C (red), D (blue). Let station A be the source.
And the connections are :
---- D(blue) - B (blue, red) - A (red) - C (red) -----
If I follow the Dijkstra algorithm: initially I place A into the queue, then dequeue A in the 1st iteration and look at its neighbors :
B and C, I update their distances according to the weights A-B and A-C. Now even though B connects two lines, at this point I don't know
if I need to make a transfer at B, so I do not add the "penalty" for a transfer.
Let's say that the distance between A-B < A-C, which causes on the next iteration for B to be dequeued. Its neighbor is D and only at this
point I see that the transfer had to be made at B. But B has already been processed (dequeued). S
So I am not sure how this "delay" in determining the need for transfer would affect the integrity of the algorithm.
Any thoughts?
You can make each of your weights a pair: (# of transfers, time). You can add these weights in the obvious way, and compare them in lexicographic order (compare # of transfers first, use time as the tiebreaker).
Of course, as others have mentioned, using K * (# of transfers) + time for some large enough K produces the same effect as long as you know the maximum time apriori and you don't run out of bits in your weight storage.
I'm going to be describing my solution using the A* Algorithm, which I consider to be an extension (and an improvement -- please don't shoot me) of Dijkstra's Algorithm that is easier to intuitively understand. The basics of it goes like this:
Add the starting path to the priority queue, weighted by distance-so-far + minimum distance to goal
Every iteration, take the lowest weighted path and explode it into every path that is one step from it (discarding paths that wrap around themselves) and put it back into the queue. Stop if you find a path that ends in the goal.
Instead of making your weight simply distance-so-far + minimum-distance-to-goal, you could use two weights: Stops and Distance/Time, compared this way:
Basically, to compare:
Compare stops first, and report this comparison if possible (i.e., if they aren't the same)
If stops are equal, compare distance traveled
And sort your queue this way.
If you've ever played Mario Party, think of stops as Stars and distance as Coins. In the middle of the game, a person with two stars and ten coins is going to be above someone with one star and fifty coins.
Doing this guarantees that the first node you take out of your priority queue will be the level that has the least amount of stops possible.
You have the right idea, but you don't really need to find the transfer weights empirically -- you just have to ensure that the weight for a single transfer is greater than the weight for the longest possible travel time. You should be pretty safe if you give a transfer a weight equivalent to, say, a year of travel time.
As Amadan noted in a comment, it's all about creating right graph. I'll just describe it in more details.
Consider two vertexes (stations) to have edge if they are on a single line. With this graph (and weights 1) you will find minimum number of transitions with Dijkstra.
Now, lets assume that maximum travel time is always less 10000 (use your constant). Then, weight of edge AB (A and B are on one line) is a time_to_travel_between(A, B) + 10000.
Running Dijkstra on such graph will guarantee that minimal number of transitions is used and minimum time is reached in the second place.
update on comment
Let's "prove" it. There're two solution: with 2 transfers and 40 minutes travel time and with 3 transfers and 25 minutes travel time. In first case you travel on 3 lines, so path weight will be 3*10000 + 40. In second: 4*10000 + 25. First solution will be chosen.
I had the same problem as you, until now. I was using Dijkstra. The penalties for transfers is a very good idea indeed and I've been using it for a while now. The main problem is that you cannot use it directly in the weight as you first you have to identify the transfer. And I didn't want to modify the algorithm.
So what I'be been doing, is that each time and you find a transfer, delete the node, add it with the penalty weight and rerun the graph.
But this way I found out that Dijkstra wont work. And this is where I tried Floyd-Warshall which au contraire to Dijkstra compares all possible paths through the graph between each pair of vertices.
It helped me with my problem switching to Floyd-Warshall. Hope it helps you as well.
Its easier to code and lot more easier to implement.