Error in an approach to Maximum Bipartite matching - algorithm

A bipartite graph with a source and a sink is given. The capacity of every edge is 1 unit.
(Figure source: GeeksforGeeks)
I'm trying to find the maximum flow from source to sink. One approach would be the Ford-Fulkerson algorithm for the maximum flow problem, which applies to any flow network.
I found a much simpler approach (too simple to be correct!), but I'm not able to find the error in it.
Approach:
c1 = the number of vertices, in the list of vertices with outgoing edges, that have at least one edge originating from them.
c2 = the number of vertices, in the list of vertices with incoming edges, that have at least one edge converging into them.
The max flow would be the minimum of these two numbers, i.e., min(c1, c2), since any path needs one vertex from the outgoing list and one from the incoming list.
Any help would be appreciated.

Consider a graph like
*--*
  /
 /
*  *
  /
 /
*--*
(Patching the approach to work per connected component doesn't fix things: connect the lower left to the upper right.)
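The counterexample can be checked mechanically. The sketch below assumes the drawing is read as three left vertices and three right vertices, numbered 0 to 2 from top to bottom, with edges 0-0, 1-0, 2-1 and 2-2; the numbering and the function names are mine, not from the question. It compares min(c1, c2) with the true maximum matching found by augmenting paths.

```python
def counting_heuristic(edges):
    """min(c1, c2) from the question: vertices with at least one edge, per side."""
    c1 = len({u for u, _ in edges})   # left vertices with an outgoing edge
    c2 = len({v for _, v in edges})   # right vertices with an incoming edge
    return min(c1, c2)


def max_matching(n_left, edges):
    """Maximum bipartite matching by repeated augmenting-path search."""
    adj = [[] for _ in range(n_left)]
    for u, v in edges:
        adj[u].append(v)
    match_of_right = {}               # right vertex -> left vertex matched to it

    def try_augment(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be moved elsewhere
            if v not in match_of_right or try_augment(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))


edges = [(0, 0), (1, 0), (2, 1), (2, 2)]
connected = edges + [(2, 0)]          # lower left joined to upper right

print(counting_heuristic(edges), max_matching(3, edges))
print(counting_heuristic(connected), max_matching(3, connected))
```

In both graphs the two upper left vertices compete for the same right vertex, so the count overestimates the matching.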

I don't have an exact answer, but I have an iterative algorithm that works.
As I see it, you need to balance the flow, so that it is distributed among the left vertices that can send it and the right vertices that can receive it.
Suppose you model your situation with a matrix A containing the bipartite links. If each entry of A holds the amount of flow (between 0 and 1) you want to pass along that edge, then the total flow given this decision is b=A*a, where a is a vector of ones.
If you start with the maximum capacity for A, putting 1 for every edge and 0 everywhere else, some elements of b may be greater than 1, but you can reduce the corresponding elements of A so they pass less flow.
Then you can reverse the flow, see what the maximum reception capacity of your bipartite part is, and test it with a=A'b.
Again, some elements of a may be greater than 1, meaning that the ideal flow would be greater than the possible capacity from the source to that element, so you reduce these elements in the flow of A.
With this ping-pong algorithm, reducing the corrections progressively, the matrix should converge on an optimal one.
Given a final b=Aa with a a vector of ones, your maximal flow will be sum(b).
See the Octave code below, where B is the converging matrix, and let me know your comments.
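A = [0 1 0 1; 1 0 0 1; 1 1 0 0; 0 0 1 0];
B = A;
do
  b = B * ones(4,1);                      % row sums of B
  B = B .* ([.8 .8 .8 1]' * ones(1,4));   % scale rows 1-3 by 0.8
  a = B' * ones(4,1)                      % column sums of B (printed each pass)
  B = B .* (ones(4,1) * [.9 .9 1 .9]);    % scale columns 1, 2 and 4 by 0.9
until converge   % converge: placeholder for a stopping test of your choice
maxflow = sum(b)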

Related

Given a flow network and a max flow f on it, Determine whether there are at least 4 different max flows

I'm having trouble solving this one and would really appreciate any help. Thank you in advance!
The problem is:
Given a flow network with integer capacities on the edges and a max flow f on that network, I need to write an efficient algorithm that determines whether there are at least 4 more different max flows on that network.
I have seen people suggest checking for cycles in the residual network: if there is a cycle, the max flow is not unique, hence there is another max flow f2, and then we can choose any 0 < x < 1 and get infinitely many max flows of the form (1-x)*f + x*f2. But I can't seem to understand why a cycle in the residual network means that the max flow is not unique, and I also have a really hard time proving that the second part (the infinitely many max flows) is legal.
Thanks again!
The idea with the cycles seems correct. A network N has multiple distinct max flows if and only if the residual graph of the max flow contains a cycle (not counting the trivial two-edge loop formed by an edge and its own reverse edge, since augmenting along it changes nothing). If there is more than one max flow, there are infinitely many.
If there is a cycle in the residual graph, we can augment the flow along that cycle and obtain a different flow with the same value. More precisely, let C be a cycle in the residual graph of a max flow, and let d > 0 denote the smallest residual capacity of an edge of C. We can augment the flow on C by any amount in the interval (0, d], each time obtaining a different max flow (so there are indeed infinitely many max flows; to get four, augment the flow along C by four arbitrary, distinct values from the interval).
We still have to prove that if there are multiple different maximum flows in a network there will indeed always be a cycle in the residual network of any max flow on that network. Doing that in a mathematically rigorous way can be a bit cumbersome, but the main idea is the following: take two distinct max flows F1 and F2 and compute the difference between them (i.e. for every edge e compute F1(e) - F2(e)). Consider the edges where the difference is non-zero. Those edges will all be present in the residual graph of flow F1 (if the sign of the difference is negative the edge in the direction of flow won't be saturated, if the sign is positive the reverse direction will be present). The conservation of flow constraints at each vertex guarantee that those edges will always form a cycle. For an intuitive understanding you can visualize this drawing both the two flows F1 and F2 on the same network in two different colors. You will see that the edges where the flows differ always form cycles. Obviously the flows intersect (at least at source and sink) and from some intersections you will have a path where one flow is greater on one edge and a path where the other flow is greater on another edge. Those two paths must intersect again somewhere deeper in the network (at the latest at the sink but it could also be before), and therefore form a cycle.
Using this, the most efficient algorithm I can think of would be:
1) Compute the residual graph.
2) Use DFS to check whether there is a cycle in the residual graph (you will probably have to run the DFS multiple times, since the residual graph consists of multiple components separated by min cuts).
3) If you have to, generate four different max flows by augmenting the flow along the found cycle by different amounts.
This would be linear in both the number of vertices and the number of edges.
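As a rough illustration of the cycle test, here is a minimal sketch. It is not the linear-time DFS described above: for simplicity it tries every residual arc and searches for a way back to its tail, which is quadratic in the worst case. The input format (a list of (u, v, capacity) edges plus a parallel list of flow values) and the function name are assumptions of mine, and self-loops are assumed absent. The arc's own edge is excluded from the return path so that an edge and its own reverse arc do not count as a cycle.

```python
from collections import deque


def has_alternative_max_flow(edges, flow):
    """Return True if the residual graph of `flow` contains a non-trivial cycle.

    edges: list of (u, v, capacity); flow: list of flow values, one per edge.
    """
    # Residual arcs, each tagged with the index of the edge it comes from.
    arcs = []
    for i, ((u, v, cap), f) in enumerate(zip(edges, flow)):
        if f < cap:
            arcs.append((u, v, i))    # room to push more flow forward
        if f > 0:
            arcs.append((v, u, i))    # room to cancel existing flow
    out = {}
    for tail, head, i in arcs:
        out.setdefault(tail, []).append((head, i))

    def reachable(start, goal, banned_edge):
        # BFS in the residual graph, never using arcs of `banned_edge`.
        seen = {start}
        queue = deque([start])
        while queue:
            x = queue.popleft()
            if x == goal:
                return True
            for head, i in out.get(x, []):
                if i != banned_edge and head not in seen:
                    seen.add(head)
                    queue.append(head)
        return False

    # A cycle exists if some arc tail -> head can be closed by a path head -> tail.
    return any(reachable(head, tail, i) for tail, head, i in arcs)


# Two ways to route one unit through the bottleneck c -> t.
bottleneck = [("s", "a", 1), ("s", "b", 1), ("a", "c", 1), ("b", "c", 1), ("c", "t", 1)]
print(has_alternative_max_flow(bottleneck, [1, 0, 1, 0, 1]))
```

If the check succeeds, the four flows asked for in the question come from augmenting along the cycle found by four distinct amounts.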

Min cost matching with outliers

Given a complete bipartite graph G = (V1, V2; E), |V1|=|V2|=n, and a non-negative cost for each edge, the min cost bipartite matching problem finds a partition of G into n pairs of vertices connected by an edge, such that the total sum of the edge costs is minimized.
This problem can be solved using the min cost flow algorithm, by adding a source and sink vertices connected to each group with a weight 0 and a capacity 1.
But what if instead we get as an input a number m < n and want to find a partition of m pairs such that the total cost is minimized?
At first I thought we can just add another vertex at the beginning which is connected to the original source with weight 0 and capacity m and call it the new source, that way the maximum flow would be m and it should choose only m pairs.
However, when I ran this algorithm many times using Boost's min cost flow function, there were 2 big problems:
1) The flow in an edge wasn't always an integer (i.e. instead of 0 or 1 the flow was 0.5 for example).
2) There were many possible (non-integer) solutions so even for the same input with different order the algorithm outputted different results.
The moment I set m to be n both of these problems were resolved.
So my question is: is there a way to solve these problems, and if not, is there another algorithm that can solve the min cost bipartite matching with outliers problem?
I just found out that the algorithm I described in the question, which I said didn't work, actually does work. The problems were caused by floating-point error inside Boost's min cost flow function; when I multiplied all the costs by 10000, all the problems were resolved.
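For comparison, here is a minimal sketch of the same idea that sidesteps the floating-point issue by using integer costs. It is not Boost's implementation: it is a small successive-shortest-path min cost flow written from scratch, and instead of adding a new source with capacity m it simply stops after m augmentations, which has the same effect. The function name and the node numbering are assumptions of mine.

```python
def min_cost_partial_matching(cost, m):
    """Pick m disjoint (left, right) pairs of minimum total cost from an n x n matrix."""
    n = len(cost)
    source, sink = 0, 2 * n + 1       # left i -> node 1 + i, right j -> node n + 1 + j
    graph = [[] for _ in range(2 * n + 2)]

    def add_arc(u, v, cap, w):
        # Each arc is [head, residual capacity, cost, index of its reverse arc].
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for i in range(n):
        add_arc(source, 1 + i, 1, 0)
        add_arc(n + 1 + i, sink, 1, 0)
        for j in range(n):
            add_arc(1 + i, n + 1 + j, 1, cost[i][j])

    total = 0
    for _ in range(m):                # one unit of flow per requested pair
        dist = [float("inf")] * len(graph)
        parent = [None] * len(graph)
        dist[source] = 0
        for _ in range(len(graph) - 1):   # Bellman-Ford copes with negative reverse arcs
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, w, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v], parent[v], changed = dist[u] + w, (u, k), True
            if not changed:
                break
        if parent[sink] is None:      # fewer than m pairs are possible
            break
        v = sink
        while v != source:            # push one unit along the shortest path
            u, k = parent[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u
        total += dist[sink]

    # A saturated left -> right arc means that pair was chosen.
    pairs = [(i, arc[0] - n - 1)
             for i in range(n)
             for arc in graph[1 + i]
             if arc[0] > n and arc[1] == 0]
    return total, pairs


print(min_cost_partial_matching([[1, 10, 10], [10, 2, 10], [10, 10, 3]], 2))
```

Because every capacity and cost is an integer, each augmentation moves exactly one unit, so the 0.5 flows described in the question cannot appear.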

Minimum Hop Count in Directed Graph based on Conditional Statement

A directed graph G is given with Vertices V and Edges E, representing train stations and unidirectional train routes respectively.
Trains of different train numbers move in between pairs of Vertices in a single direction.
Vertices of G are connected with one another through trains with allotted train numbers.
A hop is defined when a passenger needs to shift trains while moving through the graph. The passenger needs to shift trains only if the train-number changes.
Given two Vertices V1 and V2, how would one go about calculating the minimum number of hops needed to reach V2 starting from V1?
In the above example, the minimum number of hops between Vertices 0 and 3 is 1.
There are two paths from 0 to 3:
0 -> 1 -> 2 -> 7 -> 3
Hop Count is 4, as the passenger has to shift from Train A to B, then C, and B again.
and
0 -> 5 -> 6 -> 8 -> 7 -> 3
Hop Count is 1, as the passenger needs only one train route, B, to get from Vertex 0 to Vertex 3.
Thus the minimum hop count is 1.
Input Examples: [figure: Input Graph Creation] [figure: Input To Be Solved]
Output Example: [figure: Output - Solved with Hop Counts]
0 in the Hop Count column implies that the destination can't be reached
Assuming the number of different trainIDs is relatively small (like 4 in your example), I suggest using a layered graph approach.
Let the number of vertices be N, the number of edges M, and the number of different trainIDs K.
Let's divide our graph into K graphs (graphA, graphB, ...).
graphA contains only the edges labeled A, and so on.
The weight of each edge in each of the graphs is 0.
Now create edges between these graphs.
An edge between graphs represents a 'hop'.
graphA[i] connects to graphB[i], graphC[i], ...
Each of these edges has weight 1.
Now, for every graph, run Dijkstra's shortest path algorithm from V1 in that graph, read the results at V2 in all graphs, and take the minimal value.
The minimum over these K runs is your minimum number of hops.
Memory complexity is O(K*(N+M))
And time complexity is O(K*(((K^2)*N+M)*log(K*N)))
(K^2)*N comes from the fact that for every 1<=i<=N, the vertices graphA[i], graphB[i], ... must be connected to each other, which gives K*(K-1) connections for every i, and on the order of (K^2)*N connections in total.
For cases where K is relatively small, like 4 in your example, but N and M are quite big, this algorithm works like a charm. It isn't suitable for situations where K is big, though.
I'm not sure if that's clear. Tell me if you need a more detailed explanation.
EDIT:
Hope this makes my algorithm more clear.
Black edges have weight 0, and red edges have weight 1.
By using the layered graph approach, we translated our special graph into a plain weighted graph, so we can just run Dijkstra's algorithm on it.
Sorry for the ugly image.
EDIT:
Since max K = 10, we would like to remove the K^2 factor from our time complexity. I believe this can be done by making the edges that represent possible hops virtual, instead of physically storing them in the adjacency list.
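The "virtual hop edges" idea can be sketched as a search over (station, current train) states, where switching trains costs 1 and staying on the same train costs 0. This is a minimal sketch, not the answerer's code: it uses a 0-1 breadth-first search on a deque instead of Dijkstra (valid because all weights are 0 or 1), it counts boarding the first train as one hop so that it matches the hop counts in the question's example, and it returns None rather than 0 when the destination is unreachable. The edge list is my reconstruction from the two paths listed in the question, since the original figure is not available.

```python
from collections import deque


def min_hops(edges, start, goal):
    """Minimum number of trains boarded to travel from start to goal, or None."""
    adj = {}
    for u, v, train in edges:
        adj.setdefault(u, []).append((v, train))

    # State = (station, train we arrived on); None means no train boarded yet.
    dist = {(start, None): 0}
    queue = deque([(start, None)])
    while queue:
        node, train = queue.popleft()
        d = dist[(node, train)]
        if node == goal:
            return d
        for nxt, t in adj.get(node, []):
            step = 0 if t == train else 1     # hop edges are never stored, only priced
            state = (nxt, t)
            if d + step < dist.get(state, float("inf")):
                dist[state] = d + step
                if step == 0:
                    queue.appendleft(state)   # zero-weight move: explore first
                else:
                    queue.append(state)
    return None


# Reconstructed from the question: 0->1->2->7->3 uses A, B, C, B; the other path uses only B.
example = [(0, 1, "A"), (1, 2, "B"), (2, 7, "C"), (7, 3, "B"),
           (0, 5, "B"), (5, 6, "B"), (6, 8, "B"), (8, 7, "B")]
print(min_hops(example, 0, 3))
```

Each stored edge is examined once per arrival train, so no K^2 block of hop edges is ever materialized.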

Minimum bit wise OR of weights between source and destination in a graph

Given a weighted, undirected graph with n vertices and m edges, where 1 <= n <= 1000 and 1 <= m <= 10000. There can be multiple edges with different weights between two nodes of the graph.
Given a source and a destination, how to find the minimum distance between source and destination? Here distance is defined as the bit-wise OR of the weights of edges involved in the path.
Hint 1
Try working out the bits of the minimum distance in turn.
Hint 2
Can you work out if there is a path which has bit x clear?
Hint 3
Try removing edges from the graph if the weight has bit x set.
Hint 4
Try seeing if there is a path with bit 31 set to 0.
If not, see if there is a path with bit 30 set to 0.
If there is, try seeing if there is a path with bit 31 and bit 30 set to 0, etc.
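Put together, the hints amount to a greedy pass over the bits from high to low. The sketch below is one way to write it, under my own assumptions: vertices are numbered 0 to n-1, weights fit in 32 bits, and the function name and the None return value for a disconnected pair are hypothetical. A bit is added to the forbidden set whenever the destination is still reachable using only edges that avoid every forbidden bit.

```python
from collections import deque


def min_or_distance(n, edges, source, dest, bits=32):
    """Smallest bitwise OR of edge weights over all source-dest paths, or None."""

    def connected(forbidden):
        # Keep only the edges whose weight has none of the forbidden bits set.
        adj = [[] for _ in range(n)]
        for u, v, w in edges:
            if (w & forbidden) == 0:
                adj[u].append(v)
                adj[v].append(u)
        seen = [False] * n
        seen[source] = True
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if not seen[y]:
                    seen[y] = True
                    queue.append(y)
        return seen[dest]

    if not connected(0):
        return None
    forbidden = 0
    for bit in reversed(range(bits)):         # bit 31 first, then 30, ...
        trial = forbidden | (1 << bit)
        if connected(trial):
            forbidden = trial                 # this bit can stay clear
    # Every bit we could not forbid must appear in the answer.
    return ((1 << bits) - 1) & ~forbidden


graph_edges = [(0, 1, 1), (1, 3, 2), (0, 2, 4), (2, 3, 4), (0, 3, 7)]
print(min_or_distance(4, graph_edges, 0, 3))
```

With at most 32 connectivity checks of O(n + m) each, this is comfortably fast for the stated limits.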
While it may not be the most efficient algorithm, I would implement a best-first search on the graph. Create a priority queue that's going to contain the vertices to be explored next, only containing the source initially, with a 0 priority.
Then begin a loop that gets the lowest priority element from the list, explores all of its neighbors, and adds them into the queue, stopping when it finds the destination vertex or when the queue is empty.

Minimum flow in a network with lower bounds - what am I doing wrong?

The problem that I am trying to solve is as follows:
Given a directed graph, find the minimum number of paths that "cover" the entire graph. Multiple paths may go through the same vertex, but the union of the paths should contain all the vertices.
For the given sample graph(see image), the result should be 2 (1->2->4, and 1->2->3 suffice).
By splitting the vertices and assigning a lower bound of 1 for each edge that connects an in-vertex to an out-vertex, and linking a source to every in-vertex and every out-vertex to a sink (they are not shown in the diagram, as it would make the whole thing messy), the problem is now about finding the minimum flow in the graph, with lower bounds constraints.
However, I have read that in order to solve this, I have to find a feasible flow, and then assign capacities as follows : C(e) = F(e) - L(e). However, by assigning a flow of 1 to each Source-vertex edge, vertex-Sink edge, and In-Out edge, the feasible flow is correct, and the total flow is equal to the number of vertices. But by assigning the new capacities, the in-out edges (marked blue) get a capacity of 0 (they have a lower-bound of 1, and in our choosing of a feasible flow, they get a flow of 1), and no flow is possible.
Fig. 2 : How I choose the "feasible flow"
However, from the diagram you can obviously see that you can direct a flow of 2 that satisfies the lower bound on each "vertex edge".
Have I understood the minimum flow algorithm wrong? Where is the mistake?!
Once you have the feasible flow, you need to start "trimming" it by returning flow from the sink back to the source, subject to the lower bound and capacity constraints (really just residual capacities). The lower two black edges are used in the forward direction for this, because they don't have flow on them yet. The edges involving the source and the sink are used in reverse, because we're undoing the flow already on them. If you start thinking about all of this in terms of residual capacities, it will make more sense.
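The trimming step can be sketched as a max flow from the sink back to the source in the residual graph of the feasible flow. The code below is a minimal sketch under several assumptions of mine: the graph is acyclic with no self-loops, vertices are numbered 0 to n-1, every vertex is split into in_v = v and out_v = n + v, upper capacities are effectively infinite, and the function name is hypothetical. The feasible flow sends one unit along source -> in_v -> out_v -> sink for every vertex, so the in-out edges sit exactly at their lower bound and cannot be reduced, while the source and sink edges can each give back one unit.

```python
def min_path_cover(n, edges):
    """Minimum number of paths covering every vertex of a DAG; paths may share vertices."""
    INF = 10 ** 9
    source, sink = 2 * n, 2 * n + 1
    # residual[u][v] = how much more flow can be sent from u to v.
    residual = [dict() for _ in range(2 * n + 2)]

    def add(u, v, forward, backward):
        residual[u][v] = forward
        residual[v][u] = backward

    for v in range(n):
        add(source, v, INF, 1)        # carries 1 unit, which may be returned
        add(v, n + v, INF, 0)         # flow 1 equals the lower bound: cannot shrink
        add(n + v, sink, INF, 1)      # carries 1 unit, which may be returned
    for u, v in edges:
        add(n + u, v, INF, 0)         # graph edges start empty, used forward only

    def push(u, seen):
        # Depth-first search for a sink -> source path with spare residual capacity.
        if u == source:
            return True
        seen.add(u)
        for v, cap in residual[u].items():
            if cap > 0 and v not in seen and push(v, seen):
                residual[u][v] -= 1
                residual[v][u] += 1
                return True
        return False

    returned = 0
    while push(sink, set()):
        returned += 1
    return n - returned               # feasible flow value minus what was trimmed


# The sample from the question (1->2, 2->4, 2->3), renumbered from 0.
print(min_path_cover(4, [(0, 1), (1, 3), (1, 2)]))
```

Each unit returned from the sink merges two paths into one: it undoes one sink edge and one source edge and routes the flow over a graph edge instead, which matches the description of using the black edges forward and the source and sink edges in reverse.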

Resources