Bipartite selection with associated vertex cost - algorithm

I think I'm looking for an algorithm that can find a "minimum" "selection" in a bipartite graph. Each vertex has an associated (integer) cost to selecting it. I can only find algorithms that minimise the number of vertices in the selected set, not the cost. I'd previously thought I need a "matching", but actually I just need the subset of vertices that cover every edge...
I don't think a greedy solution can work. Suppose our sets are A, B:
Vertices 1,2,3 are in A and have cost 1.
Vertex 4 is in B and has cost 2.
The solution is to remove the most expensive vertex, 4. A greedy solution that chose based on cost would fail. Similarly, if B had cost 10, we could not choose the most connected vertex greedily.
I thought of a different wording: "Given a bipartite graph where each vertex has an associated cost, find a subset of vertices of minimum cost such that every edge is incident on at least one vertex in your selected subset".

Primal LP:
min sum_v c_v x_v
s.t.
forall e=vw. x_v + x_w >= 1
forall v. x_v >= 0
Dual LP:
max sum_e y_e
s.t.
forall v. sum_{e=vw} y_e <= c_v
forall e. y_e >= 0
Find a min cut where the edges are arcs from A to B with infinite capacity, the vertices in A are sources, and the vertices in B are sinks, with all vertices having capacity equal to their cost. (Equivalently, make a supersource with arcs to A and a supersink with arcs from B.)
Take the As that are on the "sink" side of the cut and the Bs that are on the "source" side. Every edge vw is covered because if neither v nor w belonged to the cover then vw would be residual.
Hat tip I think to Jenő Egerváry.

Related

How to find maximal subgraph of bipartite graph with valence constraint?

I have a bipartite graph. I'll refer to red-nodes and black-nodes of the respective disjoint sets.
I would like to know how to find a connected induced subgraph that maximizes the number of red-nodes while ensuring that all black nodes in the subgraph have new valences less than or equal to 2. Where "induced" means that if two nodes are connected in the original graph and both exist in the subgraph then the edge between them is automatically included. Eventually I'd like to introduce non-negative edge-weights.
Can this be reduced to a standard graph algorithm? Hopefully one with known complexity and simple implementation.
It's clearly possible to grow a subgraph greedily. But is this best?
I'm sure that this problem belongs to NP-complete class, so there is no easy way to solve it. I would suggest you using constraint satisfaction approach. There are quite a few ways to formulate your problem, for example mixed-integer programming, MaxSAT or even pseudo-boolean constraints.
For the first try, I would recommend MiniZinc solver. For example, consider this example of defining and solving graph problems in MiniZinc.
Unfortunately this is NP-hard, so there are probably no polynomial-time algorithms to solve it. Here is a reduction from the NP-hard problem Independent Set, where we are given a graph G = (V, E) (with n = |V| and m = |E|) and an integer k, and the task is to determine whether it is possible to find a set of k or more vertices such that no two vertices in the set are linked by an edge:
For every vertex v_i in G, create a red vertex r_i in H.
For every edge (v_i, v_j) in G, create the following in H:
a black vertex b_ij,
n+1 red vertices t_ijk (1 <= k <= n+1),
n black vertices u_ijk (1 <= k <= n),
n edges (t_ijk, u_ijk) (1 <= k <= n)
n edges (t_ijk, u_ij{k-1}) (2 <= k <= n+1)
the three edges (r_i, b_ij), (r_j, b_ij), and (t_ij1, b_ij).
For every pair of vertices v_i, v_j, create the following:
a black vertex c_ij,
the two edges (r_i, c_ij) and (r_j, c_ij).
Set the threshold to m(n+1)+k.
Call the set of all r_i R, the set of all b_ij B, the set of all c_ij C, the set of all t_ij T, and the set of all u_ij U.
The general idea here is that we force each black vertex b_ij to choose at most 1 of the 2 red vertices r_i and r_j that correspond to the endpoints of the edge (i, j) in G. We do this by giving each of these b_ij vertices 3 outgoing edges, of which one (the one to t_ij1) is a "must-have" -- that is, any solution in which a t_ij1 vertex is not selected can be improved by selecting it, as well as the n other red vertices it connects to (via a "wiggling path" that alternates between vertices in t_ijk and vertices in u_ijk), getting rid of either r_i or r_j to restore the property that no black vertex has 3 or more neighbours in the solution if necessary, and then finally restoring connectedness by choosing vertices from C as necessary. (The c_ij vertices are "connectors": they exist only to ensure that whatever subset of R we include can be made into a single connected component.)
Suppose first that there is an IS of size k in G. We will show that there is a connected induced subgraph X with at least m(n+1)+k red nodes in H, in which every black vertex has at most 2 neighbours in X.
First, include in X the k vertices from R that correspond to the vertices in the IS (such a set must exist by assumption). Because these vertices form an IS, no vertex in B is adjacent to more than 1 of them, so for each vertex b_ij, we may safely add it, and the "wiggling path" of 2n+1 vertices beginning at t_ij1, into X as well. Each of these wiggling paths contains n+1 red vertices, and there are m such paths (one for each edge in G), so there are now m(n+1)+k red vertices in X. Finally, to ensure that X is connected, add to it every vertex c_ij such that r_i and r_j are both in X already: notice that this does not change the total number of red vertices in X.
Now suppose that there is a connected induced subgraph X with at least m(n+1)+k red nodes in H, in which every black vertex has at most 2 neighbours in X. We will show that there is an IS in G of size k.
The only red vertices in H are those in R and those in T. There are only n vertices in R, so if X does not contain all m wiggly paths, it must have at most (m-1)(n+1)+n = m(n+1)-1 red vertices, contradicting the assumption that it has at least m(n+1)+k red vertices. Thus X must contain all m wiggly paths. This leaves k other red vertices in X, which must be from R. No two of these vertices can be adjacent to the same vertex in B, since that B-vertex would then be adjacent to 3 vertices: thus, these k vertices correspond to an IS in G.
Since a YES-instance of IS implies a YES-instance to the constructed instance of your problem and vice versa, the solution to the constructed instance of your problem corresponds exactly to the solution to the IS instance; and since the construction is clearly polynomial-time, this establishes that your problem is NP-hard.

How to update MST from the old MST if one edge is deleted

I am studying algorithms, and I have seen an exercise like this
I can overcome this problem with exponential time but. I don't know how to prove this linear time O(E+V)
I will appreciate any help.
Let G be the graph where the minimum spanning tree T is embedded; let A and B be the two trees remaining after (u,v) is removed from T.
Premise P: Select minimum weight edge (x,y) from G - (u,v) that reconnects A and B. Then T' = A + B + (x,y) is a MST of G - (u,v).
Proof of P: It's obvious that T' is a tree. Suppose it were not minimum. Then there would be a MST - call it M - of smaller weight. And either M contains (x,y), or it doesn't.
If M contains (x,y), then it must have the form A' + B' + (x,y) where A' and B' are minimum weight trees that span the same vertices as A and B. These can't have weight smaller than A and B, otherwise T would not have been an MST. So M is not smaller than T' after all, a contradiction; M can't exist.
If M does not contain (x,y), then there is some other path P from x to y in M. One or more edges of P pass from a vertex in A to another in B. Call such an edge c. Now, c has weight at least that of (x,y), else we would have picked it instead of (x,y) to form T'. Note P+(x,y) is a cycle. Consequently, M - c + (x,y) is also a spanning tree. If c were of greater weight than (x,y) then this new tree would have smaller weight than M. This contradicts the assumption that M is a MST. Again M can't exist.
Since in either case, M can't exist, T' must be a MST. QED
Algorithm
Traverse A and color all its vertices Red. Similarly label B's vertices Blue. Now traverse the edge list of G - (u,v) to find a minimum weight edge connecting a Red vertex with a Blue. The new MST is this edge plus A and B.
When you remove one of the edges then the MST breaks into two parts, lets call them a and b, so what you can do is iterate over all vertices from the part a and look for all adjacent edges, if any of the edges forms a link between the part a and part b you have found the new MST.
Pseudocode :
for(all vertices in part a){
u = current vertex;
for(all adjacent edges of u){
v = adjacent vertex of u for the current edge
if(u and v belong to different part of the MST) found new MST;
}
}
Complexity is O(V + E)
Note : You can keep a simple array to check if vertex is in part a of the MST or part b.
Also note that in order to get the O(V + E) complexity, you need to have an adjacency list representation of the graph.
Let's say you have graph G' after removing the edge. G' consists have two connected components.
Let each node in the graph have a componentID. Set the componentID for all the nodes based on which component they belong to. This can be done with a simple BFS for example on G'. This is an O(V) operation as G' only has V nodes and V-2 edges.
Once all the nodes have been flagged, iterate over all unused edges and find the one with the least weight that connects the two components (componentIDs of the two nodes will be different). This is an O(E) operation.
Thus the total runtime is O(V+E).

Maximum weighted path between two vertices in a directed acyclic Graph

Love some guidance on this problem:
G is a directed acyclic graph. You want to move from vertex c to vertex z. Some edges reduce your profit and some increase your profit. How do you get from c to z while maximizing your profit. What is the time complexity?
Thanks!
The problem has an optimal substructure. To find the longest path from vertex c to vertex z, we first need to find the longest path from c to all the predecessors of z. Each problem of these is another smaller subproblem (longest path from c to a specific predecessor).
Lets denote the predecessors of z as u1,u2,...,uk and dist[z] to be the longest path from c to z then dist[z]=max(dist[ui]+w(ui,z))..
Here is an illustration with 3 predecessors omitting the edge set weights:
So to find the longest path to z we first need to find the longest path to its predecessors and take the maximum over (their values plus their edges weights to z).
This requires whenever we visit a vertex u, all of u's predecessors must have been analyzed and computed.
So the question is: for any vertex u, how to make sure that once we set dist[u], dist[u] will never be changed later on? Put it in another way: how to make sure that we have considered all paths from c to u before considering any edge originating at u?
Since the graph is acyclic, we can guarantee this condition by finding a topological sort over the graph. topological sort is like a chain of vertices where all edges point left to right. So if we are at vertex vi then we have considered all paths leading to vi and have the final value of dist[vi].
The time complexity: topological sort takes O(V+E). In the worst case where z is a leaf and all other vertices point to it, we will visit all the graph edges which gives O(V+E).
Let f(u) be the maximum profit you can get going from c to u in your DAG. Then you want to compute f(z). This can be easily computed in linear time using dynamic programming/topological sorting.
Initialize f(u) = -infinity for every u other than c, and f(c) = 0. Then, proceed computing the values of f in some topological order of your DAG. Thus, as the order is topological, for every incoming edge of the node being computed, the other endpoints are calculated, so just pick the maximum possible value for this node, i.e. f(u) = max(f(v) + cost(v, u)) for each incoming edge (v, u).
Its better to use Topological Sorting instead of Bellman Ford since its DAG.
Source:- http://www.utdallas.edu/~sizheng/CS4349.d/l-notes.d/L17.pdf
EDIT:-
G is a DAG with negative edges.
Some edges reduce your profit and some increase your profit
Edges - increase profit - positive value
Edges - decrease profit -
negative value
After TS, for each vertex U in TS order - relax each outgoing edge.
dist[] = {-INF, -INF, ….}
dist[c] = 0 // source
for every vertex u in topological order
if (u == z) break; // dest vertex
for every adjacent vertex v of u
if (dist[v] < (dist[u] + weight(u, v))) // < for longest path = max profit
dist[v] = dist[u] + weight(u, v)
ans = dist[z];

Shortest Path Algorithm with Fuel Constraint and Variable Refueling

Suppose you have an undirected weighted graph. You want to find the shortest path from the source to the target node while starting with some initial "fuel". The weight of each edge is equal to the amount of "fuel" that you lose going across the edge. At each node, you can have a predetermined amount of fuel added to your fuel count - this value can be 0. A node can be visited more than once, but the fuel will only be added the first time you arrive at the node. **All nodes can have different amounts of fuel to provide.
This problem could be related to a train travelling from town A to town B. Even though the two are directly connected by a simple track, there is a shortage of coal, so the train does not have enough fuel to make the trip. Instead, it must make the much shorter trip from town A to town C which is known to have enough fuel to cover the trip back to A and then onward to B. So, the shortest path would be the distance from A to C plus the distance from C to A plus the distance from A to B. We assume that fuel cost and distance is equivalent.
I have seen an example where the nodes always fill the "tank" up to its maximum capacity, but I haven't seen an algorithm that handles different amounts of refueling at different nodes. What is an efficient algorithm to tackle this problem?
Unfortunately this problem is NP-hard. Given an instance of traveling salesman path from s to t with decision threshold d (Is there an st-path visiting all vertices of length at most d?), make an instance of this problem as follows. Add a new destination vertex connected to t by a very long edge. Give starting fuel d. Set the length of the new edge and the fuel at each vertex other than the destination so that (1) the total fuel at all vertices is equal to the length of the new edge (2) it is not possible to use the new edge without collecting all of the fuel. It is possible to reach the destination if and only if there is a short traveling salesman path.
Accordingly, algorithms for this problem will resemble those for TSP. Preprocess by constructing a complete graph on the source, target, and vertices with nonzero fuel. The length of each edge is equal to the distance.
If there are sufficiently few special vertices, then exponential-time (O(2^n poly(n))) dynamic programming is possible. For each pair consisting of a subset of vertices (in order of nondecreasing size) and a vertex in that subset, determine the cheapest way to visit all of the subset and end at the specified vertex. Do this efficiently by using the precomputed results for the subset minus the vertex and each possible last waypoint. There's an optimization that prunes the subsolutions that are worse than a known solution, which may help if it's not necessary to use very many waypoints.
Otherwise, the play may be integer programming. Here's one formulation, quite probably improvable. Let x(i, e) be a variable that is 1 if directed edge e is taken as the ith step (counting from the zeroth) else 0. Let f(v) be the fuel available at vertex v. Let y(i) be a variable that is the fuel in hand after i steps. Assume that the total number of steps is T.
minimize sum_i sum_{edges e} cost(e) x(i, e)
subject to
for each i, for each vertex v,
sum_{edges e with head v} x(i, e) - sum_{edges e with tail v} x(i + 1, e) =
-1 if i = 0 and v is the source
1 if i + 1 = T and v is the target
0 otherwise
for each vertex v, sum_i sum_{edges e with head v} x(i, e) <= 1
for each vertex v, sum_i sum_{edges e with tail v} x(i, e) <= 1
y(0) <= initial fuel
for each i,
y(i) >= sum_{edges e} cost(e) x(i, e)
for each i, for each vertex v,
y(i + 1) <= y(i) + sum_{edges e} (-cost(e) + f(head of e)) x(i, e)
for each i, y(i) >= 0
for each edge e, x(e) in {0, 1}
There is no efficient algorithm for this problem. If you take an existing graph G of size n you can give each edge a weight of 1, each node a deposit of 5, and then add a new node that you are trying to travel to connected to each node with a weight of 4 * (n -1). Now the existence of a path from the source to the target node in this graph is equivalent to the existence of a Hamiltonian path in G, which is a known np-complete problem. (See http://en.wikipedia.org/wiki/Hamiltonian_path for details.)
That said, you can do better than a naive recursive solution for most graphs. First do a breadth first search from the target node so that every node's distance to the target is known. Now you can borrow the main idea of Dijkstra's A* search. Do a search of all paths from the source, using a priority queue to always try to grow a path whose current distance + the minimum to the target is at a minimum. And to reduce work you probably also want to discard all paths that have returned to a node that they have previously visited, except with lower fuel. (This will avoid silly paths that travel around loops back and forth as fuel runs out.)
Assuming the fuel consumption as positive weight and the option to add the fuel as negative weight and additionally treating the initial fuel available as another negative weighted edge, you can use Bellman Ford to solve it as single source shortest path.
As per this answer, elsewhere, undirected graphs can be addressed by replacing each edge with two in both directions. The only constraint I'm not sure about is the part where you can only refuel once. This may be naturally addressed, by the the algorithm, but I'm not sure.

How can I get the antichain elements in SPOJ-DIVREL?

Problem: http://www.spoj.com/problems/DIVREL
In question, we just need to find the maximum number of elements which are not multiples (a divisible by b form) from a set of elements given. If we just make an edge from an element to its multiple and construct a graph it will be a DAG.
Now the question just changes to finding the minimum number of chains which contain all the vertices which equals the antichain cardinality using Dilworth's theorem as it is a partially ordered set.
Minimum chains can be found using bipartite matching (How: It is minimum path cover) but now I am unable to find the antichain elements themselves?
To compute the antichain you can:
Compute the maximum bipartite matching (e.g. with a maximum flow algorithm) on a new bipartite graph D which has an edge from LHS a to RHS b if and only if a divides b.
Use the matching to compute a minimal vertex cover (e.g. with the algorithm described in the proof of Konig's theorem
The antichain is given by all vertices not in the vertex cover
There cannot be an edge between two such elements as otherwise we would have discovered an edge that is not covered by a vertex cover resulting in a contradiction.
The algorithm to find the min vertex cover is (from the link above):
Let S0 consist of all vertices unmatched by M.
For integer j ≥ 0, let S(2j+1) be the set of all vertices v such that v is adjacent via some edge in E \ M to a vertex in S(2j) and v has not been included in any
previously-defined set Sk, where k < 2j+1. If there is no such vertex,
but there remain vertices not included in any previously-defined set
Sk, arbitrarily choose one of these and let S(2j+1) consist of that
single vertex.
For integer j ≥ 1, let S(2j) be the set of all vertices u
such that u is adjacent via some edge in M to a vertex in S(2j−1). Note
that for each v in S(2j−1) there is a vertex u to which it is matched
since otherwise v would have been in S0. Therefore M sets up a
one-to-one correspondence between the vertices of S(2j−1) and the
vertices of S(2j).
The union of the odd indexed subsets is the vertex cover.

Resources