Directed maximum weighted bipartite matching allowing sharing of start/end vertices - algorithm

Let G = (U ∪ V, E) be a weighted directed bipartite graph (i.e. U and V are the two sets of nodes of the bipartite graph and E contains directed weighted edges from U to V or from V to U). Here is an example:
U = {A,B,C}
V = {D,E,F}
E = {(A->E,7), (B->D,1), (C->E,3), (F->A,9)}
Definition: DirectionalMatching (I made up this term just to make things clearer): a set of directed edges that may share start vertices or end vertices, but in which no vertex is both the end of one edge and the start of another. That is, if U->V and U'->V' both belong to a DirectionalMatching then V ≠ U' and V' ≠ U, but it may be that U = U' or V = V'.
My question: How to efficiently find a DirectionalMatching, as defined above, for a bipartite directional weighted graph which maximizes the sum of the weights of its edges?
By efficiently, I mean polynomial complexity or faster; I already know how to implement a naive brute force approach.
In the example above the maximum weighted DirectionalMatching is: {F->A,C->E,B->D}, with a value of 13.
Formally demonstrating the equivalence of this problem to any other well known problem in graph theory would also be valuable.
Thanks!
Note 1: This question is based on Maximum weighted bipartite matching _with_ directed edges but with the extra relaxation that it is allowed for edges in the matching to share the origin or destination. Since that relaxation makes a big difference, I created an independent question.
Note 2: This is a maximum weight matching. Cardinality (how many edges are present) and the number of vertices covered by the matching is irrelevant for a correct result. Only the maximum weight matters.
Note 3: During my research to solve the problem I found this paper, which I think would be helpful to others trying to find a solution: Alternating cycles and paths in edge-coloured multigraphs: a survey
Note 4: In case it helps, you can also think of the graph as its equivalent 2-edge coloured undirected bipartite multigraph. The problem formulation would then turn into: Find the set of edges without colour-alternating paths or cycles which has maximum weight sum.
Note 5: I suspect that the problem might be NP-hard, but I am not that experienced with reductions so I haven't managed to prove it yet.
Yet another example:
Imagine you had
4 vertices: {u1, u2} {v1, v2}
4 edges: {u1->v1, u1->v2, u2->v1, v2->u2}
Then, regardless of their weights, u1->v2 and v2->u2 cannot be in the same DirectionalMatching, neither can v2->u2 and u2->v1. However u1->v1 and u1->v2 can, and so can u1->v1 and u2->v1.
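To make the definition concrete, here is a minimal Python sketch of a validity check plus the naive brute force mentioned above. The function names and the (tail, head, weight) edge representation are assumptions made for illustration; the search is exponential and only meant for checking small examples.

```python
from itertools import combinations

def is_directional_matching(edges):
    """True if no vertex is both the start of one chosen edge and the end of another."""
    starts = {tail for tail, head, weight in edges}
    ends = {head for tail, head, weight in edges}
    return starts.isdisjoint(ends)

def best_directional_matching(edges):
    """Exhaustively try every subset of edges (exponential time)."""
    best, best_weight = (), 0
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            if is_directional_matching(subset):
                weight = sum(w for _, _, w in subset)
                if weight > best_weight:
                    best, best_weight = subset, weight
    return set(best), best_weight

# The first example from the question.
example = [("A", "E", 7), ("B", "D", 1), ("C", "E", 3), ("F", "A", 9)]
matching, weight = best_directional_matching(example)
print(weight)  # 13
```

On the second example, `is_directional_matching` rejects {u1->v2, v2->u2} and accepts {u1->v1, u1->v2}, as described above.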

Define a new undirected graph G' from G as follows.
G' has a node (A, B) with weight w for each directed edge (A, B) with weight w in G
G' has undirected edge ((A, B),(B, C)) if (A, B) and (B, C) are both directed edges in G
http://en.wikipedia.org/wiki/Line_graph#Line_digraphs
Now find a maximum (weighted) independent vertex set in G'.
http://en.wikipedia.org/wiki/Vertex_independent_set
Edit: everything after this point only works if all of the edge weights are the same. When the edge weights have different values it's a more difficult problem (search for "maximum weight independent vertex set" for possible algorithms).
In general graphs this would be an NP-hard problem. However, G' is a bipartite graph -- it contains only even cycles. Finding the maximum (weighted) independent vertex set in a bipartite graph is not NP-hard.
The algorithm you will run on G' is as follows.
Find the connected components of G', say H_1, H_2, ..., H_k
For each H_i do a 2-coloring (say red and blue) of the nodes. The cookbook approach here is to do a depth-first search on H_i alternating colors. A simple approach would be to color each vertex in H_i based on whether the corresponding edge in G goes from U to V (red) or from V to U (blue).
The two options for which nodes to select from H_i are either all the red nodes or all the blue nodes. Choose the colored node set with higher weight. For example, the red node set has weight equal to H_i.nodes.where(node => node.color == red).sum(node => node.w). Call the higher-weight node set N_i.
Your maximum weighted independent vertex set is now union(N_1, N_2, ..., N_k).
Since each vertex in G' corresponds to one of the directed edges in G, you have your maximum DirectionalMatching.
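For edge weights that differ, here is a hedged Python sketch of one way to solve the independent set problem on G'. Note that picking a single colour class per component is not always optimal, so this sketch swaps in the standard minimum-cut reduction for maximum weight independent set in a bipartite graph: source to each red node (U->V edge) with capacity equal to its weight, each blue node (V->U edge) to the sink likewise, and infinite capacity between conflicting red/blue pairs. The function name, the argument layout, and the simple Edmonds-Karp max flow are assumptions for illustration; weights are assumed positive.

```python
from collections import defaultdict, deque

def max_weight_directional_matching(edges, U):
    """edges: list of (tail, head, weight); U: the set of vertices on one side."""
    INF = float("inf")
    S, T = "source", "sink"
    cap = defaultdict(lambda: defaultdict(float))
    red = [i for i, e in enumerate(edges) if e[0] in U]       # edges U -> V
    blue = [i for i, e in enumerate(edges) if e[0] not in U]  # edges V -> U
    for i in red:
        cap[S][i] = edges[i][2]
    for j in blue:
        cap[j][T] = edges[j][2]
    for i in red:
        for j in blue:
            # Conflict: one edge ends where the other starts.
            if edges[i][1] == edges[j][0] or edges[j][1] == edges[i][0]:
                cap[i][j] = INF

    def residual_bfs():
        parent = {S: None}
        queue = deque([S])
        while queue:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        return parent

    while True:
        parent = residual_bfs()
        if T not in parent:
            break
        path, y = [], T
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        flow = min(cap[x][y] for x, y in path)
        for x, y in path:
            cap[x][y] -= flow
            cap[y][x] += flow

    # Minimum cut: keep red nodes still reachable from the source
    # and blue nodes that are not reachable.
    chosen = [edges[i] for i in red if i in parent]
    chosen += [edges[j] for j in blue if j not in parent]
    return chosen, sum(w for _, _, w in chosen)

example = [("A", "E", 7), ("B", "D", 1), ("C", "E", 3), ("F", "A", 9)]
print(max_weight_directional_matching(example, {"A", "B", "C"})[1])  # 13
```

The running time is polynomial in the number of edges of G, since G' has one node per edge of G and the max flow is computed on G'.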

This problem can be solved in polynomial time using the Hungarian Algorithm. The "proof" by Vor above is wrong.
The method of structuring the problem for the above example is as follows:
D E F
A # 7 9
B 1 # #
C # 3 #
where "#" means negative infinity. You then resolve the matrix using the Hungarian algorithm to determine the maximum matching. You can multiply the numbers by -1 if you want to find a minimum matching.

Related

Algorithm for minimum vertex cover in Bipartite graph

I am trying to figure out an algorithm for finding minimum vertex cover of a bipartite graph.
I was thinking about a solution that reduces the problem to maximum matching in a bipartite graph. It's known that a maximum matching can be found using max flow in a network created from the bipartite graph.
A maximum matching M should determine a minimum vertex cover C, but I can't work out how to choose the vertices to put in C.
Let's say the bipartite graph has parts X and Y, the vertices that are endpoints of maximum matching edges are in set A, and those that are not belong to set B.
I would say I should choose one endpoint of each edge in M for C.
Specifically, the endpoint of edge e in M that is connected to a vertex in set B; if e is connected only to vertices in A, it does not matter which endpoint is chosen.
Unfortunately this idea doesn't work in general: there are counterexamples, since vertices in A can also be connected by edges other than those included in M.
Any help would be appreciated.
Kőnig's theorem proof does exactly that - building a minimum vertex cover from a maximum matching in a bipartite graph.
Let's say you have G = (V, E) a bipartite graph, separated between X and Y.
As you said, first you have to find a maximum matching (which can be achieved with Dinic's algorithm for instance). Let's call M this maximum matching.
Then to construct your minimum vertex cover:
Find U, the set (possibly empty) of unmatched vertices in X, i.e. those not incident to any edge in M. (Y could be used instead of X; the two sides are symmetric.)
Build Z, the set of vertices either in U or connected to U by alternating paths (paths that alternate between edges of M and edges not in M)
Then K = (X \ Z) ∪ (Y ∩ Z) is your minimum vertex cover
The Wikipedia article has details about how you can prove K is indeed a minimum vertex cover.
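The construction above can be sketched in Python as follows. This is only an illustration under some assumptions: the function name and the adjacency-dictionary input are made up here, and the maximum matching is found with simple augmenting paths (Kuhn's algorithm) rather than Dinic's algorithm, which is slower but shorter.

```python
def min_vertex_cover(X, adj):
    """X: vertices of one side; adj: maps each x in X to its neighbours in Y."""
    match_x, match_y = {}, {}

    def try_augment(x, seen):
        # Look for an augmenting path starting at x.
        for y in adj.get(x, ()):
            if y in seen:
                continue
            seen.add(y)
            if y not in match_y or try_augment(match_y[y], seen):
                match_x[x], match_y[y] = y, x
                return True
        return False

    for x in X:
        try_augment(x, set())

    # Z = unmatched X vertices plus everything reachable by alternating paths.
    z_x = {x for x in X if x not in match_x}
    z_y = set()
    stack = list(z_x)
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if y not in z_y and match_x.get(x) != y:  # edge not in M, from X to Y
                z_y.add(y)
                partner = match_y.get(y)              # edge of M, from Y back to X
                if partner is not None and partner not in z_x:
                    z_x.add(partner)
                    stack.append(partner)

    cover = (set(X) - z_x) | z_y                      # K = (X \ Z) ∪ (Y ∩ Z)
    return cover, match_x

cover, matching = min_vertex_cover([1, 2, 3], {1: ["a", "b"], 2: ["a"], 3: ["a"]})
print(len(cover) == len(matching))  # True
```

By Kőnig's theorem the cover always has the same size as the maximum matching, which is a cheap sanity check on the result.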

Algorithm to Maximize Degree Centrality of Subgraph

Say I have some graph with nodes and undirected edges (the edges may have a weight associated to them).
I want to find all (or at least one) connected subgraphs that maximize the sum of the degree centrality of all nodes in the subgraph (the degree centrality is based on the original graph) under the constraint that the sum of the weighted edges is < X.
Is there an algorithm that will do this?
A quick search took me to this description of degree centrality. It turns out that the "degree centrality" of a vertex is simply its degree (neighbour count).
Unfortunately your problem is NP-hard, so it's very unlikely that any algorithm exists that can solve every instance quickly. First notice that, assuming edge weights are positive, the edges in any optimal solution necessarily form a tree, since in any non-tree you can delete at least 1 edge without destroying connectivity, and doing so will decrease the total edge weight of the subgraph. (So, as a positive spinoff: If you compute the minimum spanning tree of your input graph and find that it happens to have total weight < X, then you can simply include every vertex in the graph in your solution.)
Let's formulate a decision version of your problem. Given a graph G = (V, E) with positive (I'll assume) weights on the edges, a number X and a number Y, we want to know: Does there exist a connected subgraph G' = (V', E') of G such that the sum of the edge weights in E' is at most X, and the sum of the degrees of V' (w.r.t. G) is at least Y? (Clearly this is no harder than your original problem: If you had an algorithm to solve your problem, then you could just run it, add up the degrees of the vertices in the subgraph it found and compare this to Y to answer "my" problem.)
Here's a reduction from the NP-hard Steiner Tree in Graphs problem, where we are given a graph G = (V, E) with positive weights on the edges, a subset S of its vertices, and a number k, and the task is to determine whether it's possible to connect the vertices in S using a subset of edges with total weight at most k. (As I showed above, the solution will necessarily be a tree.) If the sum of all degrees in G is d, then all we need to do to transform G into an input for your problem is the following: For each vertex s_i in S we add enough new "ballast" vertices that are each connected only to s_i, via edges with weight X+1, to bring the degree of s_i up to d+1. We set X to k, and set Y to |S|(d+1).
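As a rough illustration of the transformation just described, here is a Python sketch that builds the instance of the degree problem from a Steiner Tree instance. The function name, the edge-list representation, and the tuple labels used for ballast vertices are assumptions made for this example.

```python
def steiner_to_degree_instance(edges, terminals, k):
    """edges: list of (u, v, weight); terminals: the set S; k: the Steiner budget.

    Returns (new_edges, X, Y) for the decision version of the degree problem.
    """
    degree = {}
    for u, v, _ in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    d = sum(degree.values())          # sum of all degrees in G
    X = k
    new_edges = list(edges)
    for s in terminals:
        # Add ballast leaves until s has degree d + 1; each uses an edge of weight X + 1.
        for i in range(d + 1 - degree.get(s, 0)):
            new_edges.append((s, ("ballast", s, i), X + 1))
    Y = len(terminals) * (d + 1)
    return new_edges, X, Y
```

The construction adds at most |S|(d+1) vertices and edges, so it is polynomial in the size of the Steiner Tree instance.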
Now suppose that the solution to the Steiner Tree problem is YES -- that is, there exists a subset of edges having total weight <= k that does connect all the vertices in S. In that case, it's clear that the same subgraph in the instance of your problem constructed above connects (possibly among others) all the vertices in S, and since each vertex in S has degree d+1, the total degree is at least |S|(d+1), so the answer to your decision problem is also YES.
In the other direction, suppose that the answer to your decision problem is YES -- that is, there exists a subset of edges having total weight <= X ( = k) that connects a set of vertices having total degree at least |S|(d+1). We need to show that this implies a YES answer to the original Steiner Tree problem. Clearly it suffices to show that the vertex set V' of any subgraph satisfying the conditions above (i.e. edges have total weight <= k and vertices have total degree >= |S|(d+1)) contains S (possibly among other vertices). So let V' be the vertex set of such a solution, and suppose to the contrary that there is some vertex u in S that is not in V'. But then the largest sum of degrees that we could possibly make would be to include all other non-ballast vertices in the graph in V', which would give a degree total of at most (|S|-1)(d+1) + d (the first term is the degree sum for the other vertices in S; the second is an upper bound on the degree sum of all non-S vertices in G; note that none of the ballast vertices we added in could be in the subgraph, because the only way to include any of them is to use an edge of weight X+1, which we obviously can't do). But clearly (|S|-1)(d+1) + d = |S|(d+1) - 1, which is strictly less than |S|(d+1), contradicting our assumption that V' has a degree total at least |S|(d+1). So it follows that S is a subset of V', and thus that it is possible to use the same subset of edges to connect the vertices in S for a total weight of at most k, i.e. that the answer to the Steiner Tree problem is also YES.
So a YES answer to either problem implies a YES answer to the other one, in turn implying that a NO answer to either implies a NO answer to the other. Thus if it were possible to solve the decision version of your problem in polynomial time, it would imply a polynomial-time solution to the NP-hard Steiner Tree in Graphs problem. This means the decision version of your problem is itself NP-hard, and so is the optimisation version (which as I said above is at least as hard). (The decision form is also NP-complete, since a YES answer can be easily verified in polynomial time.)
Sidenote: At first I thought I had a very straightforward reduction from the NP-hard Knapsack problem: Given a list of n weights w_1, ..., w_n and a list of n profits p_1, ..., p_n, make a single central vertex c, and n other vertices v_1, ..., v_n. For each v_i, attach it to c with an edge of weight w_i, and add p_i other leaf vertices, each attached only to v_i with an edge of weight X+1. However this reduction doesn't actually work, because the profits can be exponential in the input size n, meaning that the constructed instance of your problem might need to have an exponential number of vertices, which isn't allowed for a polynomial-time reduction.

Minimum sum of distances from sensor nodes to all others

Is there a way to solve this problem (exactly or with heuristics) on a medium-sized (up to 1000 nodes) weighted graph?
Place n (for example 5) sensors in nodes of the graph in such way that the sum of distances from every other node to the closest sensor will be minimal.
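To pin down the objective, here is a small Python sketch that evaluates a given sensor placement: it runs Dijkstra's algorithm from all sensors at once and sums the resulting distances. It only scores a placement and does not search for the best one. The function name and adjacency-list format are assumptions; the graph is assumed connected with non-negative weights, and node labels are assumed to be mutually comparable (for heap tie-breaking).

```python
import heapq

def total_distance_to_sensors(adj, sensors):
    """adj: {node: [(neighbour, weight), ...]} for an undirected graph."""
    dist = {s: 0 for s in sensors}
    heap = [(0, s) for s in sensors]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # dist[v] is now the distance from v to its closest sensor.
    return sum(dist.values())
```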
I'll show that this problem is NP-hard by reduction from Vertex Cover. This applies even if the graph is unweighted (you don't say whether it's weighted or not).
Given an unweighted graph G = (V, E) and an integer k, the question asked by Vertex Cover is "Does there exist a set of at most k vertices such that every edge has at least one endpoint in this set?" We will build a new graph G' = (V', E), which is the same as G except that all isolated vertices have been discarded, solve your problem on G', and then use it to answer the original question about Vertex Cover.
Suppose there does exist such a set S of k vertices. If we consider this set S to be the locations to put sensors in your problem, then every vertex in S has a distance of 0, and every other vertex is at a distance of exactly 1 away from a vertex that is in S (because if there were some vertex u for which this wasn't true, it would mean that none of u's neighbours are in S, so for each such neighbour v, the edge uv is not covered by the vertex cover, which would be a contradiction).
This type of problem is called graph clustering. One of the popular methods to solve it is the Markov Cluster (MCL) Algorithm. A web search should provide some implementation examples. However it does not generally provide the optimal solution.

Finding MST such that a specific vertex has a minimum degree

Given an undirected, connected graph G=(V,E), a vertex in V(G) (call it v), and a weight function f:E->R+ (positive real numbers), I need to find an MST in which v's degree is minimal. I've already noticed that if all the edges have unique weights, the MST is unique, so I believe it has something to do with repeated weights on edges. I thought about running Kruskal's algorithm, but when sorting the edges, always placing edges incident to v last among edges of equal weight. For example, if (a,b),(c,d),(v,e) are the only edges of weight k, then the possible orderings of these edges in the sorted edge array are {(a,b),(c,d),(v,e)} or {(c,d),(a,b),(v,e)}. I've run this variation over several graphs and it seems to work, but I couldn't prove it. Does anyone know how to prove the algorithm is correct (meaning proving v's degree is minimal), or give a counterexample where the algorithm fails?
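For reference, the variation of Kruskal's algorithm described in the question can be sketched in Python like this. The function name and the (weight, a, b) edge format are assumptions for illustration; the only change from plain Kruskal is the tie-breaking term in the sort key.

```python
def mst_min_degree_at(vertices, edges, v):
    """edges: list of (weight, a, b). Returns the edges of the spanning tree/forest."""
    parent = {x: x for x in vertices}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort by weight; among equal weights, edges touching v come last
    # (False sorts before True).
    ordered = sorted(edges, key=lambda e: (e[0], v in (e[1], e[2])))
    tree = []
    for w, a, b in ordered:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((w, a, b))
    return tree

triangle = [(1, "v", "a"), (1, "v", "b"), (1, "a", "b")]
tree = mst_min_degree_at(["v", "a", "b"], triangle, "v")
print(sum(1 for _, a, b in tree if "v" in (a, b)))  # 1
```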
First note that Kruskal's algorithm can be applied to any weighted graph, whether or not it is connected. In general it results in a minimum-weight spanning forest (MSF), with one MST for each connected component. To prove that your modification of Kruskal's algorithm succeeds in finding the MST for which v has minimal degree, it helps to prove the slightly stronger result that if you apply your algorithm to a possibly disconnected graph then it succeeds in finding the MSF where the degree of v is minimized.
The proof is by induction on the number, k, of distinct weights.
Basis Case (k = 1). In this case weights can be ignored and we are trying to find a spanning forest in which the degree of v is minimized. In this case, your algorithm can be described as follows: pick edges for as long as possible according to the following two rules:
1) No selected edge forms a cycle with previously selected edges
2) An edge involving v isn't selected unless every edge which doesn't involve v violates rule 1.
Let G' denote the graph obtained from G by removing v and all edges incident to it. It is easy to see that the algorithm in this special case works as follows. It starts by creating a spanning forest for G'. Then it takes those trees in the forest that are contained in v's connected component in the original graph G and connects each such tree to v by a single edge. Since the trees connected to v in the second stage can be connected to each other in no other way (if any connecting edge not involving v existed, it would have been selected by rule 2), it is easy to see that the degree of v is minimal.
Inductive Case: Suppose that the result is true for k and G is a weighted graph with k+1 distinct weights and v is a specified vertex in G. Sort the distinct weights in increasing order (so that weight k+1 is the largest of the distinct weights -- say w_{k+1}). Let G' be the sub-graph of G with the same vertex set but with all edges of weight w_{k+1} removed. Since the edges are sorted in the order of increasing weight, note that the modified Kruskal's algorithm in effect starts by applying itself to G'. Thus -- by the induction hypothesis -- prior to considering edges of weight w_{k+1}, the algorithm has succeeded in constructing an MSF F' of G' for which the degree d' of v is minimized.
As a final step, modified Kruskal's applied to the overall graph G will merge certain of the trees in F' together by adding edges of weight w_{k+1}. One way to conceptualize the final step is to think of F' as a graph where two trees are connected exactly when there is an edge of weight w_{k+1} from some node in the first tree to some node in the second tree. We have (almost) the basis case with F'. Modified Kruskal's will add edges of weight w_{k+1} until it can't do so anymore -- and won't add an edge connecting to v unless there is no other way to connect two trees in F' that need to be connected to get a spanning forest for the original graph G.
The final degree of v in the resulting MSF is d = d'+d" where d" is the number of edges of weight w_{k+1} incident to v that are added at the final step. Neither d' nor d" can be made any smaller, hence it follows that d can't be made any smaller (since the degree of v in any spanning forest can be written as the sum of the number of edges whose weight is less than w_{k+1} coming into v and the number of edges of weight w_{k+1} coming into v).
QED.
There is still an element of hand-waving in this, especially with the final step -- but Stack Overflow isn't a peer-reviewed journal. Anyway, the overall logic should be clear enough.
One final remark -- it seems fairly clear that Prim's algorithm can be similarly modified for this problem. Have you looked into that?

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
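The three steps above can be sketched in Python roughly as follows, assuming an unweighted graph given as an adjacency dictionary. The function name is made up for this example, breadth-first search stands in for the distance computation, and Prim's algorithm is used for the minimum spanning tree on the collapsed graph.

```python
from collections import deque

def steiner_via_metric_closure(adj, terminals):
    """adj: {node: iterable of neighbours}; returns (V', E') covering the terminals."""
    terminals = list(terminals)
    # Step 1: BFS from every terminal, keeping predecessors to rebuild paths later.
    dist, pred = {}, {}
    for t in terminals:
        d, p = {t: 0}, {t: None}
        queue = deque([t])
        while queue:
            u = queue.popleft()
            for n in adj[u]:
                if n not in d:
                    d[n], p[n] = d[u] + 1, u
                    queue.append(n)
        dist[t], pred[t] = d, p

    # Step 2: Prim's algorithm on the complete "distance graph" over the terminals.
    in_tree = {terminals[0]}
    nodes, tree_edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        a, b = min(
            ((a, b) for a in in_tree for b in terminals if b not in in_tree),
            key=lambda pair: dist[pair[0]][pair[1]],
        )
        in_tree.add(b)
        # Step 3: expand the collapsed edge (a, b) back into its path in G.
        u = b
        while u != a:
            nodes.add(u)
            tree_edges.add(frozenset((u, pred[a][u])))
            u = pred[a][u]
    return nodes, tree_edges
```

On the triangle example, which path is chosen between two corners depends on the order in which neighbours are listed, which is exactly why the result may use the sides instead of the middle vertex.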
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path between any node in T and any node in N. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V'.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
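Solution b) can be sketched in Python as below for an unweighted graph. The function name is an assumption, and instead of precomputing all shortest paths this sketch runs a breadth-first search from the whole current tree in each round, which finds the same nearest required node.

```python
from collections import deque

def grow_tree_by_shortest_paths(adj, required):
    """adj: {node: iterable of neighbours}; required: the node set N. Returns V'."""
    required = list(required)
    tree = {required[0]}                  # root of T
    remaining = set(required[1:])
    while remaining:
        # Multi-source BFS from every node already in T.
        pred = {u: None for u in tree}
        queue = deque(tree)
        target = None
        while queue and target is None:
            u = queue.popleft()
            for n in adj[u]:
                if n not in pred:
                    pred[n] = u
                    if n in remaining:    # closest required node found
                        target = n
                        break
                    queue.append(n)
        if target is None:
            raise ValueError("some required nodes are unreachable")
        # Add every node on the path to V' and drop it from N.
        while target is not None:
            tree.add(target)
            remaining.discard(target)
            target = pred[target]
    return tree
```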
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(u) = 1.
We create the pair (w(u), id(u)) for each node u.
Each node u builds a multipoint relay set. That is, a set M(u) of 1-hop neighbors such that each 2-hop neighbor is a neighbor of at least one node in M(u). [The smaller M(u) is, the better the solution.]
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors,
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest pair (w(v), id(v)).
-- The trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best improvement I could get over O(n^3) is O(n^2.37), using matrix multiplication.
-- I really wish to know what the approximation ratio of this last solution is.
I like this reference for heuristics of steiner tree:
The Steiner Tree Problem, by Frank Hwang, Dana Richards and Pawel Winter
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.

Resources