Maximum weight matching (MWM) for a predefined number of nodes - algorithm

I am given a weighted graph and want to find a set of edges such that every node is only incident to one edge and such that the sum of the selected edge weights is maximized. As far as I know this problem is generally referred to as maximum weight matching and there exist fast approximations for it: https://web.eecs.umich.edu/~pettie/papers/ApproxMWM-JACM.pdf
However, for my application it would be better if only a certain ratio of nodes is paired. It's more important for my application that the nodes that get paired have a high weight between them. Leaving some nodes unpaired is no big problem.
Currently, I sort the weights between nodes in descending order and always select the edge with the highest weight until I have paired a certain number of nodes. Of course I ensure that pairs of nodes are mutually exclusive. This is only a 1/2 approximation to the original problem and it's probably even worse for the modified problem.
Could you please suggest an algorithm for this issue or tell me how this problem is called?

Some thoughts.
The greedy algorithm for this problem is a 2-approximation. Imagine that, during the execution of the greedy algorithm, we keep score versus an optimal solution in the following manner. Every time we add an edge to the greedy solution, we delete the incident edges in the optimal solution, which I claim must have weight no greater than the greedy edge. If no edge would be deleted from the optimal solution, we delete the two heaviest edges instead, which also I claim must have weight no greater than the greedy edge. Since the greedy solution must have at least half as many edges as the optimal solution, I claim that there are no optimal edges left at the end, and hence greedy is a 2-approximation because we never deleted more than 2x weight with each greedy edge.
A complete 20,000-vertex graph is right on that line where I don't know whether integer programming would be a good idea. I think it's still worth a try because it's easy enough.
There are polynomial-time algorithms for computing the maximum-weight independent set in the intersection of two matroids (for this problem, the matching matroid and the uniform matroid whose bases have size equal to the size of the desired matching). I don't know if they would be practical.

Related

Maximum weighted Hungarian method Using minimum Hungarian method

I have programmed the minimum Hungarian algorithm for a bipartite graph, with Dijkstra's algorithm to find the minimum cost of a maximum matching. However, I want to use such an algorithm to implement the maximum Hungarian algorithm and don't know if it's correct to just negate the edges, because I don't know if the algorithm will handle it.
My implementation is based on the explanation on the following site: https://www.ics.uci.edu/~eppstein/163/lecture6b.pdf
Given G=(AUB, E), the idea is to label the vertices via an artificial start vertex s which has edges with unsaturated nodes in A, and run Dijkstra's algorithm from s in order to label each vertex, then after labeling each, the edges will be reweighted by their original weight minus the labels of the edge's endpoints.
I have read a lot of articles, and the only I could see is that a minimum Hungarian algorithm can be handled well with maximum cost by negating each edge, however, I am afraid that due to the fact that Dijkstra's algorithm doesn't handle negative edges well, it won't work.
First find the maximum weight in your graph. Then negate all of the weights and add the maximum weight to them. Adding the original maximum to all of the negated values makes them all positive.
You can also use INT_MAX (or whatever is equivalent to it in the programming language you're using) instead of the maximum weight. This skips the step of finding the maximum weight, but could make the first iteration of the Hungarian Algorithm take longer, or cause you to need an extra iteration of the algorithm to get the result. It probably doesn't make much of a difference either way and the performance difference will vary based on the particular weights in your graph.

Edge clique cover algorithm

I am trying to write an algorithm that computes the edge clique cover number (the smallest number of cliques that cover all edges) of an input graph (undirected and no self-loops). My idea would be to
Calculate all maximal cliques with the Bron-Kerbosch algorithm, and
Try if any 1,2,3,... of them would cover all edges until I find the
minimum number
Would that work and does anyone know a better method; is there a standard algorithm? To my surprise, I couldn't find any such algorithm. I know that the problem is NP-hard, so I don't expect a fast solution.
I would gather maximal cliques as you do now (or perhaps using a different algorithm, as suggested by CaptainTrunky), but then use branch and bound. This won't guarantee a speedup, but will often produce a large speedup on "easy" instances.
In particular:
Instead of trying all subsets of maximal cliques in increasing subset size order, pick an edge uv and branch on it. This means:
For each maximal clique C containing uv:
Make a tentative new partial solution that contains all cliques in the current solution
Add C to this partial solution
Make a new subproblem containing the current subproblem's graph, but with all vertices in C collapsed into a single vertex
Recurse to solve this smaller subproblem.
Keep track of the best complete solution so far. This is your upper bound (UB). You do not need to continue processing any subproblem that has already reached this upper bound but still has edges present; a better solution already exists!
It's best to pick an edge to branch on that is covered by as few cliques as possible. When choosing in what order to try those cliques, try whichever you think is likely to be the best (probably the largest one) first.
And here is an idea for a lower bound to improve the pruning level:
If a subgraph G' contains an independent set of size s, then you will need at least s cliques to cover G' (since no clique can cover two or more vertices in an independent set). Computing the largest possible IS is NP-hard and thus impractical here, but you could get a cheap bound by using the 2-approximation for Vertex Cover: Just keep choosing an edge and throwing out both vertices until no edges are left; if you threw out k edges, then what remains is an IS that is within k of optimal.
You can add the size of this IS to the total number of cliques in your solution so far; if that is larger than the current UB, you can abort this subproblem, since we know that fleshing it out further cannot produce a better solution than one we have already seen.
I was working on the similar problem 2 years ago and I've never seen any standard existing approaches to it. I did the following:
Compute all maximal cliques.
MACE was way better than
Bron-Kerbosch in my case.
Build a constraint-satisfaction problem for determining a minimum number of cliques required to cover the graph. You could use SAT, Minizinc, MIP tools to do so. Which one to pick? It depends on your skills, time resources, environment and dozens of other parameters. If we are talking about proof-of-concept, I would stick with Minizinc.
A bit details for the second part. Define a set of Boolean variables with respect to each edge, if it's value == True, then it's covered, otherwise, it's not. Add constraints that allow you covering sets of edges only with respect to each clique. Finally, add variables corresponding to each clique, if it's == True, then it's used already, otherwise, it's not. Finally, require all edges to be covered AND a number of used cliques is minimal.

Minimal spanning tree with K extra node

Assume we're given a graph on a 2D-plane with n nodes and edge between each pair of nodes, having a weight equal to a euclidean distance. The initial problem is to find MST of this graph and it's quite clear how to solve that using Prim's or Kruskal's algorithm.
Now let's say we have k extra nodes, which we can place in any integer point on our 2D-plane. The problem is to find locations for these nodes so as new graph has the smallest possible MST, if it is not necessary to use all of these extra nodes.
It is obviously impossible to find the exact solution (in poly-time), but the goal is to find the best approximate one (which can be found within 1 sec). Maybe you can come up with some hints of the most efficient way of going throw possible solutions, or provide with some articles, where the similar problem is covered.
It is very interesting problem which you are working on. You have many options to attack this problem. The best known heuristics in such situation are - Genetic Algorithms, Particle Swarm Optimization, Differential Evolution and many others of this kind.
What is nice for such kind of heuristics is that you can limit their execution to a certain amount of time (let say 1 second). If it was my task to do I would try first Genetic Algorithms.
You could try with a greedy algorithm, try the longest edges in the MST, potentially these could give the largest savings.
Select the longest edge, now get the potential edge from each vertex that are closed in angle to the chosen one, from each side.
from these select the best Steiner point.
Fix the MST ...
repeat until 1 sec is gone.
The challenge is what to do if one of the vertexes is itself a Steiner point.

Minimum spanning tree with two edges tied

I'd like to solve a harder version of the minimum spanning tree problem.
There are N vertices. Also there are 2M edges numbered by 1, 2, .., 2M. The graph is connected, undirected, and weighted. I'd like to choose some edges to make the graph still connected and make the total cost as small as possible. There is one restriction: an edge numbered by 2k and an edge numbered by 2k-1 are tied, so both should be chosen or both should not be chosen. So, if I want to choose edge 3, I must choose edge 4 too.
So, what is the minimum total cost to make the graph connected?
My thoughts:
Let's call two edges 2k and 2k+1 a edge set.
Let's call an edge valid if it merges two different components.
Let's call an edge set good if both of the edges are valid.
First add exactly m edge sets which are good in increasing order of cost. Then iterate all the edge sets in increasing order of cost, and add the set if at least one edge is valid. m should be iterated from 0 to M.
Run an kruskal algorithm with some variation: The cost of an edge e varies.
If an edge set which contains e is good, the cost is: (the cost of the edge set) / 2.
Otherwise, the cost is: (the cost of the edge set).
I cannot prove whether kruskal algorithm is correct even if the cost changes.
Sorry for the poor English, but I'd like to solve this problem. Is it NP-hard or something, or is there a good solution? :D Thanks to you in advance!
As I speculated earlier, this problem is NP-hard. I'm not sure about inapproximability; there's a very simple 2-approximation (split each pair in half, retaining the whole cost for both halves, and run your favorite vanilla MST algorithm).
Given an algorithm for this problem, we can solve the NP-hard Hamilton cycle problem as follows.
Let G = (V, E) be the instance of Hamilton cycle. Clone all of the other vertices, denoting the clone of vi by vi'. We duplicate each edge e = {vi, vj} (making a multigraph; we can do this reduction with simple graphs at the cost of clarity), and, letting v0 be an arbitrary original vertex, we pair one copy with {v0, vi'} and the other with {v0, vj'}.
No MST can use fewer than n pairs, one to connect each cloned vertex to v0. The interesting thing is that the other halves of the pairs of a candidate with n pairs like this can be interpreted as an oriented subgraph of G where each vertex has out-degree 1 (use the index in the cloned bit as the tail). This graph connects the original vertices if and only if it's a Hamilton cycle on them.
There are various ways to apply integer programming. Here's a simple one and a more complicated one. First we formulate a binary variable x_i for each i that is 1 if edge pair 2i-1, 2i is chosen. The problem template looks like
minimize sum_i w_i x_i (drop the w_i if the problem is unweighted)
subject to
<connectivity>
for all i, x_i in {0, 1}.
Of course I have left out the interesting constraints :). One way to enforce connectivity is to solve this formulation with no constraints at first, then examine the solution. If it's connected, then great -- we're done. Otherwise, find a set of vertices S such that there are no edges between S and its complement, and add a constraint
sum_{i such that x_i connects S with its complement} x_i >= 1
and repeat.
Another way is to generate constraints like this inside of the solver working on the linear relaxation of the integer program. Usually MIP libraries have a feature that allows this. The fractional problem has fractional connectivity, however, which means finding min cuts to check feasibility. I would expect this approach to be faster, but I must apologize as I don't have the energy to describe it detail.
I'm not sure if it's the best solution, but my first approach would be a search using backtracking:
Of all edge pairs, mark those that could be removed without disconnecting the graph.
Remove one of these sets and find the optimal solution for the remaining graph.
Put the pair back and remove the next one instead, find the best solution for that.
This works, but is slow and unelegant. It might be possible to rescue this approach though with a few adjustments that avoid unnecessary branches.
Firstly, the edge pairs that could still be removed is a set that only shrinks when going deeper. So, in the next recursion, you only need to check for those in the previous set of possibly removable edge pairs. Also, since the order in which you remove the edge pairs doesn't matter, you shouldn't consider any edge pairs that were already considered before.
Then, checking if two nodes are connected is expensive. If you cache the alternative route for an edge, you can check relatively quick whether that route still exists. If it doesn't, you have to run the expensive check, because even though that one route ceased to exist, there might still be others.
Then, some more pruning of the tree: Your set of removable edge pairs gives a lower bound to the weight that the optimal solution has. Further, any existing solution gives an upper bound to the optimal solution. If a set of removable edges doesn't even have a chance to find a better solution than the best one you had before, you can stop there and backtrack.
Lastly, be greedy. Using a regular greedy algorithm will not give you an optimal solution, but it will quickly raise the bar for any solution, making pruning more effective. Therefore, attempt to remove the edge pairs in the order of their weight loss.

Algorithm for finding optimal node pairs in hexagonal graph

I'm searching for an algorithm to find pairs of adjacent nodes on a hexagonal (honeycomb) graph that minimizes a cost function.
each node is connected to three adjacent nodes
each node "i" should be paired with exactly one neighbor node "j".
each pair of nodes defines a cost function
c = pairCost( i, j )
The total cost is then computed as
totalCost = 1/2 sum_{i=1:N} ( pairCost(i, pair(i) ) )
Where pair(i) returns the index of the node that "i" is paired with. (The sum is divided by two because the sum counts each node twice). My question is, how do I find node pairs that minimize the totalCost?
The linked image should make it clearer what a solution would look like (thick red line indicates a pairing):
Some further notes:
I don't really care about the outmost nodes
My cost function is something like || v(i) - v(j) || (distance between vectors associated with the nodes)
I'm guessing the problem might be NP-hard, but I don't really need the truly optimal solution, a good one would suffice.
Naive algos tend to get nodes that are "locked in", i.e. all their neighbors are taken.
Note: I'm not familiar with the usual nomenclature in this field (is it graph theory?). If you could help with that, then maybe that could enable me to search for a solution in the literature.
This is an instance of the maximum weight matching problem in a general graph - of course you'll have to negate your weights to make it a minimum weight matching problem. Edmonds's paths, trees and flowers algorithm (Wikipedia link) solves this for you (there is also a public Python implementation). The naive implementation is O(n4) for n vertices, but it can be pushed down to O(n1/2m) for n vertices and m edges using the algorithm of Micali and Vazirani (sorry, couldn't find a PDF for that).
This seems related to the minimum edge cover problem, with the additional constraint that there can only be one edge per node, and that you're trying to minimize the cost rather than the number of edges. Maybe you can find some answers by searching for that phrase.
Failing that, your problem can be phrased as an integer linear programming problem, which is NP-complete, which means that you might get dreadful performance for even medium-sized problems. (This does not necessarily mean that the problem itself is NP-complete, though.)

Resources