Hungarian Algorithm using Maximum Bipartite Matching for Assignment - algorithm

I am trying to understand the O(N^4) explanation for the assignment problem by reading the topcoder article [1] . Specifically, i am unable to comprehend how the default O(N^5) procedure can be converted to O(N^4). Please read below for details:
Assume for the assignment problem, a complete bipartite graph of N vertices in both sets of vertices, and a edge cost from vertex i to vertex j, denoted by cost[i][j] that is say, integral, non-negative. We are trying to minimize the overall sum of weights of the perfect matching (assuming there is one)
Quoting the O(n^4) algorithm from [1]
Step 0)
A. For each vertex from left Set (workers) find the minimal outgoing edge and subtract its weight from all weights connected with this vertex. This will introduce 0-weight edges (at least one).
B. Apply the same procedure for the vertices in the right Set (jobs).
Step 1)
A. Find the maximum matching using only 0-weight edges (for this purpose you can use max-flow algorithm, augmenting path algorithm, etc.).
B. If it is perfect, then the problem is solved. Otherwise find the minimum vertex cover V (for the subgraph with 0-weight edges only), the best way to do this is to use Konig’s graph theorem.
Step 2)
Let delta = min(cost[i][j]) for i not belonging to vertex cover, and j not belonging to vertex cover.
Then, modify the cost matrix as follows:
cost[i][j] = cost[i][j] - delta for i not belonging to vertex cover, and j not belonging to vertex cover.
cost[i][j] = cost[i][j] + delta for i belonging to vertex cover, and j belonging to vertex cover.
cost[i][j] = cost[i][j] otherwise
Step 3)
Repeat Step 1) until problem is solved
I understand that the above algorithm is O(n^5) if implemented as-is, since the maximum matching on a bipartite graph takes O(n^3) if we use, say, breadth first search, and there are O(n^2) iterations of the overall algorithm since each edge becomes 0 in an iteration.
But, [1] also mentions:
finding the maximum matching in step 1 on each iteration will cause the algorithm to become O(n^5). In order to avoid this, on each step we can just modify the matching from the previous step, which only takes O(n^2) operations, making the overall algorithm O(n^4)
This is the part i do not understand. How do we modify the matching from the previous epoch, thereby taking only O(N^2) operations, instead of the normal O(N^3) iterations for maximum bipartite matching?
I referred [2] as well, but in that solution as well, there is only a comment that says:
// to make O(n^4), start from previous solution
I fail to understand how to convert the O(N^5) to O(N^4), and what exactly starting from previous solution mean?
Can someone explain using the algorithm referenced in [1], how to modify it to run in O(N^4)? Pseudo-code would be most welcome.
[1] https://www.topcoder.com/community/data-science/data-science-tutorials/assignment-problem-and-hungarian-algorithm/
[2] http://algs4.cs.princeton.edu/65reductions/Hungarian.java
Thanks in Advance.

Related

Graph In-degree Calculation from Adjacency-list

I came across this question in which it was required to calculate in-degree of each node of a graph from its adjacency list representation.
for each u
for each Adj[i] where i!=u
if (i,u) ∈ E
in-degree[u]+=1
Now according to me its time complexity should be O(|V||E|+|V|^2) but the solution I referred instead described it to be equal to O(|V||E|).
Please help and tell me which one is correct.
Rather than O(|V||E|), the complexity of computing indegrees is O(|E|). Let us consider the following pseudocode for computing indegrees of each node:
for each u
indegree[u] = 0;
for each u
for each v \in Adj[u]
indegree[v]++;
First loop has linear complexity O(|V|). For the second part: for each v, the innermost loop executes at most |E| times, while the outermost loop executes |V| times. Therefore the second part appears to have complexity O(|V||E|). In fact, the code executes an operation once for each edge, so a more accurate complexity is O(|E|).
According to http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Graphs.html, Section 4.2, with an adjacency list representation,
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
So, in the notation used here, the time complexity of computing the in-degree of a node is O(|V| + |E|).
This can be reduced at the cost of additional space of using extra space, however. The Wiki also states that
adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
An example of a package which implements this approach is the Python package Networkx. As you can see from the constructor of the DiGraph object for directional graphs, networkx keeps track of both self._succ and self._pred, which are dictionaries representing the successors and predecessors of each node, respectively. This allows it to compute each node's in_degree efficiently.
O(|V|+|E|) is the correct answer, because you visit each vertex in O(|V|) and each time you visit a fraction of the edges so O(|E|) in total, also usually |E|>>|V| so O(|E|) is also correct

Yen's improvement to Bellman-Ford

I stumbled with famous Yen's optimization of Bellman-Ford algorithm that I got initially from Wikipedia, then I found the same improvement in several textbooks in Exercise section (for example, this is a problem 24-1 in Cormen and Web exercise N5 in Sedgewick's "Algorithms").
Here's the improvement:
Yen's second improvement first assigns some arbitrary linear order on all vertices and then partitions the set of all edges into two subsets. The first subset, Ef, contains all edges (vi, vj) such that i < j; the second, Eb, contains edges (vi, vj) such that i > j. Each vertex is visited in the order v1, v2, ..., v|V|, relaxing each outgoing edge from that vertex in Ef. Each vertex is then visited in the order v|V|, v|V|−1, ..., v1, relaxing each outgoing edge from that vertex in Eb. Each iteration of the main loop of the algorithm, after the first one, adds at least two edges to the set of edges whose relaxed distances match the correct shortest path distances: one from Ef and one from Eb. This modification reduces the worst-case number of iterations of the main loop of the algorithm from |V| − 1 to |V|/2.
Unfortunately, I didn't manage to find the proof of this bound |V|/2, and moreover, it seems that I found a counterexample. I'm sure that I'm wrong about that, but I can't see where exactly.
The counterexample is just a path graph with vertices labeled from 1 to n and initial vertex 1. (So, E={(i, i+1)} for i in range from 1 to n-1). In that case the set of forward vertices is equal E (E_f = E) and the set of backward vertices is just the empty set. Each iteration of algorithm adds only one correctly computed distance, so algorithm finishes in n-1 time, which contradicts the proposed bound n/2.
What am I doing wrong?
UPD: So the mistake was very stupid. I didn't consider the iteration through vertices, thinking about iterations as of immediate updates of path costs. I'm not deleting this topic because someone upvoted it, in case this improvement can be interesting for someone.
This is actually the best case which finishes in 2 iterations regardless of the number of vertices.
Draw the iterations on paper or write the code. The first iteration will find all the correct shortest paths. The second iteration will then not change anything and the algorithm terminates because the set of vertices whose distance was changed last iteration is empty.
The "forward" run through the vertices that relaxes the Ef set of edges will do all the work, while the "backwards" run won't do anything because Eb is an empty set.

Complexity for edge addition (planar graph)?

I created a program that adds edges between vertices. The goal is to add as many edges as possible without crossing them(ie Planar graph). What is the complexity?
Attempt: Since I used depth first search I think it is O(n+m) where n is node and m is edge.
Also, if we plot the number of edges as a function of n what is it going to look like?
Your first question is impossible to answer, since you have not described the algorithm.
For your second question, any maximal planar graph with v ≥ 3 vertices has exactly 3v - 6 edges.

graph - How to find Minimum Directed Cycle (minimum total weight)?

Here is an excise:
Let G be a weighted directed graph with n vertices and m edges, where all edges have positive weight. A directed cycle is a directed path that starts and ends at the same vertex and contains at least one edge. Give an O(n^3) algorithm to find a directed cycle in G of minimum total weight. Partial credit will be given for an O((n^2)*m) algorithm.
Here is my algorithm.
I do a DFS. Each time when I find a back edge, I know I've got a directed cycle.
Then I will temporarily go backwards along the parent array (until I travel through all vertices in the cycle) and calculate the total weights.
Then I compare the total weight of this cycle with min. min always takes the minimum total weights. After the DFS finishes, our minimum directed cycle is also found.
Ok, then about the time complexity.
To be honest, I don't know the time complexity of my algorithm.
For DFS, the traversal takes O(m+n) (if m is the number of edges, and n is the number of vertices). For each vertex, it might point back to one of its ancestors and thus forms a cycle. When a cycle is found, it takes O(n) to summarise the total weights.
So I think the total time is O(m+n*n). But obviously it is wrong, as stated in the excise the optimal time is O(n^3) and the normal time is O(m*n^2).
Can anyone help me with:
Is my algorithm correct?
What is the time complexity if my algorithm is correct?
Is there any better algorithm for this problem?
You can use Floyd-Warshall algorithm here.
The Floyd-Warshall algorithm finds shortest path between all pairs of vertices.
The algorithm is then very simple, go over all pairs (u,v), and find the pair that minimized dist(u,v)+dist(v,u), since this pair indicates on a cycle from u to u with weight dist(u,v)+dist(v,u). If the graph also allows self-loops (an edge (u,u)) , you will also need to check them alone, because those cycles (and only them) were not checked by the algorithm.
pseudo code:
run Floyd Warshall on the graph
min <- infinity
vertex <- None
for each pair of vertices u,v
if (dist(u,v) + dist(v,u) < min):
min <- dist(u,v) + dist(v,u)
pair <- (u,v)
return path(u,v) + path(v,u)
path(u,v) + path(v,u) is actually the path found from u to v and then from v to u, which is a cycle.
The algorithm run time is O(n^3), since floyd-warshall is the bottle neck, since the loop takes O(n^2) time.
I think correctness in here is trivial, but let me know if you disagree with me and I'll try to explain it better.
Is my algorithm correct?
No. Let me give a counter example. Imagine you start DFS from u, there are two paths p1 and p2 from u to v and 1 path p3 from v back to u, p1 is shorter than p2.
Assume you start by taking the p2 path to v, and walk back to u by path p3. One cycle found but apparently it's not minimum. Then you continue exploring u by taking the p1 path, but since v is fully explored, the DFS ends without finding the minimum cycle.
"For each vertex, it might point back to one of its ancestors and thus forms a cycle"
I think it might point back to any of its ancestors which means N
Also, how are u going to mark vertexes when you came out of its dfs, you may come there again from other vertex and its going to be another cycle. So this is not (n+m) dfs anymore.
So ur algo is incomplete
same here
3.
During one dfs, I think the vertex should be either unseen, or check, and for checked u can store the minimum weight for the path to the starting vertex. So if on some other stage u find an edge to that vertex u don't have to search for this path any more.
This dfs will find the minimum directed cycle containing first vertex. and it's O(n^2) (O(n+m) if u store the graph as list)
So if to do it from any other vertex its gonna be O(n^3) (O(n*(n+m))
Sorry, for my english and I'm not good at terminology
I did a similar kind of thing but i did not use any visited array for dfs (which was needed for my algorithm to work correctly) and hence i realised that my algorithm was of exponential complexity.
Since, you are finding all cycles it is not possible to find all cycles in less than exponential time since there can be 2^(e-v+1) cycles.

Updating Shortest path distances matrix if one edge weight is decreased

We are given a weighed graph G and its Shortest path distance's matrix delta. So that delta(i,j) denotes the weight of shortest path from i to j (i and j are two vertexes of the graph).
delta is initially given containing the value of the shortest paths. Suddenly weight of edge E is decreased from W to W'. How to update delta(i,j) in O(n^2)? (n=number of vertexes of graph)
The problem is NOT computing all-pair shortest paths again which has the best O(n^3) complexity. the problem is UPDATING delta, so that we won't need to re-compute all-pair shortest paths.
More clarified : All we have is a graph and its delta matrix. delta matrix contains just value of the shortest path. now we want to update delta matrix according to a change in graph: decreased edge weight. how to update it in O(n^2)?
If edge E from node a to node b has its weight decreased, then we can update the shortest path length from node i to node j in constant time. The new shortest path from i to j is either the same as the old one or it contains the edge from a to b. If it contains the edge from a to b, then its length is delta(i, a) + edge(a,b) + delta(b, j).
From this the O(n^2) algorithm to update the entire matrix is trivial, as is the one dealing with undirected graphs.
http://dl.acm.org/citation.cfm?doid=1039488.1039492
http://dl.acm.org.ezp.lib.unimelb.edu.au/citation.cfm?doid=1039488.1039492
Although they both consider increase and decrease. Increase would make it harder.
On the first one, though, page 973, section 3 they explain how to do a decrease-only in n*n.
And no, the dynamic all pair shortest paths can be done in less than nnn. wikipedia is not up to date I guess ;)
Read up on Dijkstra's algorithm. It's how you do these shortest-path problems, and runs in less than O(n^2) anyway.
EDIT There are some subtleties here. It sounds like you're provided the shortest path from any i to any j in the graph, and it sounds like you need to update the whole matrix. Iterating over this matrix is n^2, because the matrix is every node by every other, or n*n or n^2. Simply re-running Dijkstra's algorithm for every entry in the delta matrix will not change this performance class, since n^2 is greater than Dijkstra's O(|E|+|V|log|V|) performance. Am I reading this properly, or am I misremembering big-O?
EDIT EDIT It looks like I am misremembering big-O. Iterating over the matrix would be n^2, and Dijkstra's on each would be an additional overhead. I don't see how to do this in the general case without figuring out exactly which paths W' is included in... this seems to imply that each pair should be checked. So you either need to update each pair in constant time, or avoid checking significant parts of the array.

Resources