I am trying to write an algorithm that computes the edge clique cover number (the smallest number of cliques that cover all edges) of an input graph (undirected and no self-loops). My idea would be to
Calculate all maximal cliques with the Bron-Kerbosch algorithm, and
Try if any 1,2,3,... of them would cover all edges until I find the
minimum number
Would that work and does anyone know a better method; is there a standard algorithm? To my surprise, I couldn't find any such algorithm. I know that the problem is NP-hard, so I don't expect a fast solution.
I would gather maximal cliques as you do now (or perhaps using a different algorithm, as suggested by CaptainTrunky), but then use branch and bound. This won't guarantee a speedup, but will often produce a large speedup on "easy" instances.
In particular:
Instead of trying all subsets of maximal cliques in increasing subset size order, pick an edge uv and branch on it. This means:
For each maximal clique C containing uv:
Make a tentative new partial solution that contains all cliques in the current solution
Add C to this partial solution
Make a new subproblem containing the current subproblem's graph, but with all vertices in C collapsed into a single vertex
Recurse to solve this smaller subproblem.
Keep track of the best complete solution so far. This is your upper bound (UB). You do not need to continue processing any subproblem that has already reached this upper bound but still has edges present; a better solution already exists!
It's best to pick an edge to branch on that is covered by as few cliques as possible. When choosing in what order to try those cliques, try whichever you think is likely to be the best (probably the largest one) first.
And here is an idea for a lower bound to improve the pruning level:
If a subgraph G' contains an independent set of size s, then you will need at least s cliques to cover G' (since no clique can cover two or more vertices in an independent set). Computing the largest possible IS is NP-hard and thus impractical here, but you could get a cheap bound by using the 2-approximation for Vertex Cover: Just keep choosing an edge and throwing out both vertices until no edges are left; if you threw out k edges, then what remains is an IS that is within k of optimal.
You can add the size of this IS to the total number of cliques in your solution so far; if that is larger than the current UB, you can abort this subproblem, since we know that fleshing it out further cannot produce a better solution than one we have already seen.
I was working on the similar problem 2 years ago and I've never seen any standard existing approaches to it. I did the following:
Compute all maximal cliques.
MACE was way better than
Bron-Kerbosch in my case.
Build a constraint-satisfaction problem for determining a minimum number of cliques required to cover the graph. You could use SAT, Minizinc, MIP tools to do so. Which one to pick? It depends on your skills, time resources, environment and dozens of other parameters. If we are talking about proof-of-concept, I would stick with Minizinc.
A bit details for the second part. Define a set of Boolean variables with respect to each edge, if it's value == True, then it's covered, otherwise, it's not. Add constraints that allow you covering sets of edges only with respect to each clique. Finally, add variables corresponding to each clique, if it's == True, then it's used already, otherwise, it's not. Finally, require all edges to be covered AND a number of used cliques is minimal.
Related
What is the best algorithm to use to solve a maze, which is ideally a graph, while gathering the most number of coins in a fixed number of steps?
Each edge has a distance and each node has a certain number of coins(0 - n coins)
The total fixed number of steps is given as input and it is guaranteed that there is a solution which solves the maze after these steps.
Thanks
Sorry, you will not find an efficient solution for this problem, it is NP-hard, which means if you want a precise solution, it will have exponential complexity. This can be shown by reducing the Knapsack problem to it.
Assume you have an instance of Knapsack with n items, set w_0...w_n of weights and set v_0..._v_n of values, and capacity W, we can build a graph where each vertex corresponds to a value, plus 2 additional vertices s and e for start and end. Create an edge from s to each vertex v_i with weight w_i/2, and also an edge from v_i to the vertex e with corresponding weight w_i/2. Now find the path from s to e, limited to length W. This path will not visit vertex v_i more then once since after the coin is collected no reason to return to that node. Also, to visit any v_i and getting to e it will spend exactly w_i steps, whether if going from s or returning from e. The solution guarantees that the limit of steps is not exceeded and the combination of values is maximal. So by just picking all vertices v_i visited we have a solution to the optimization version of Knapsack which is NP-hard.
Now it is not all that bad. The problem has a solution, it is just inefficient. For a small maze, it may still work using brute force, i.e. start with a vertex and try to go in any direction recursively. If exceeded maximum number of steps - abort. If reached destination - return the path. Then of all the returned paths select the one that yields maximum coins.
Additionally, if you do not need a precise solution - you may try to give an approximate solution. For example, you can use Dijkstra algorithm to produce shortest path tree. So now you have a shortest path from source to each vertex. Pick the path from source to destination, and try to improve it iteratively. In each iteration pick a vertex on the path, look at all its neighbors and see it going through this neighbor gives you better value while still under steps constraint. This will not necessarily give you the optimal solution, but will likely produce some good solutions.
And then you can try other optimizations, like "pick a coin and run". Look at your path and try to improve it by going from some vertex to its neighbor and return immediately. It may be useful if your maze has a dead end, so that naturally you would not go there to reach destination, but you do have enough steps to go there collect some coins and return.
I hope this helps.
I am given a weighted graph and want to find a set of edges such that every node is only incident to one edge and such that the sum of the selected edge weights is maximized. As far as I know this problem is generally referred to as maximum weight matching and there exist fast approximations for it: https://web.eecs.umich.edu/~pettie/papers/ApproxMWM-JACM.pdf
However, for my application it would be better if only a certain ratio of nodes is paired. It's more important for my application that the nodes that get paired have a high weight between them. Leaving some nodes unpaired is no big problem.
Currently, I sort the weights between nodes in descending order and always select the edge with the highest weight until I have paired a certain number of nodes. Of course I ensure that pairs of nodes are mutually exclusive. This is only a 1/2 approximation to the original problem and it's probably even worse for the modified problem.
Could you please suggest an algorithm for this issue or tell me how this problem is called?
Some thoughts.
The greedy algorithm for this problem is a 2-approximation. Imagine that, during the execution of the greedy algorithm, we keep score versus an optimal solution in the following manner. Every time we add an edge to the greedy solution, we delete the incident edges in the optimal solution, which I claim must have weight no greater than the greedy edge. If no edge would be deleted from the optimal solution, we delete the two heaviest edges instead, which also I claim must have weight no greater than the greedy edge. Since the greedy solution must have at least half as many edges as the optimal solution, I claim that there are no optimal edges left at the end, and hence greedy is a 2-approximation because we never deleted more than 2x weight with each greedy edge.
A complete 20,000-vertex graph is right on that line where I don't know whether integer programming would be a good idea. I think it's still worth a try because it's easy enough.
There are polynomial-time algorithms for computing the maximum-weight independent set in the intersection of two matroids (for this problem, the matching matroid and the uniform matroid whose bases have size equal to the size of the desired matching). I don't know if they would be practical.
I'd like to solve a harder version of the minimum spanning tree problem.
There are N vertices. Also there are 2M edges numbered by 1, 2, .., 2M. The graph is connected, undirected, and weighted. I'd like to choose some edges to make the graph still connected and make the total cost as small as possible. There is one restriction: an edge numbered by 2k and an edge numbered by 2k-1 are tied, so both should be chosen or both should not be chosen. So, if I want to choose edge 3, I must choose edge 4 too.
So, what is the minimum total cost to make the graph connected?
My thoughts:
Let's call two edges 2k and 2k+1 a edge set.
Let's call an edge valid if it merges two different components.
Let's call an edge set good if both of the edges are valid.
First add exactly m edge sets which are good in increasing order of cost. Then iterate all the edge sets in increasing order of cost, and add the set if at least one edge is valid. m should be iterated from 0 to M.
Run an kruskal algorithm with some variation: The cost of an edge e varies.
If an edge set which contains e is good, the cost is: (the cost of the edge set) / 2.
Otherwise, the cost is: (the cost of the edge set).
I cannot prove whether kruskal algorithm is correct even if the cost changes.
Sorry for the poor English, but I'd like to solve this problem. Is it NP-hard or something, or is there a good solution? :D Thanks to you in advance!
As I speculated earlier, this problem is NP-hard. I'm not sure about inapproximability; there's a very simple 2-approximation (split each pair in half, retaining the whole cost for both halves, and run your favorite vanilla MST algorithm).
Given an algorithm for this problem, we can solve the NP-hard Hamilton cycle problem as follows.
Let G = (V, E) be the instance of Hamilton cycle. Clone all of the other vertices, denoting the clone of vi by vi'. We duplicate each edge e = {vi, vj} (making a multigraph; we can do this reduction with simple graphs at the cost of clarity), and, letting v0 be an arbitrary original vertex, we pair one copy with {v0, vi'} and the other with {v0, vj'}.
No MST can use fewer than n pairs, one to connect each cloned vertex to v0. The interesting thing is that the other halves of the pairs of a candidate with n pairs like this can be interpreted as an oriented subgraph of G where each vertex has out-degree 1 (use the index in the cloned bit as the tail). This graph connects the original vertices if and only if it's a Hamilton cycle on them.
There are various ways to apply integer programming. Here's a simple one and a more complicated one. First we formulate a binary variable x_i for each i that is 1 if edge pair 2i-1, 2i is chosen. The problem template looks like
minimize sum_i w_i x_i (drop the w_i if the problem is unweighted)
subject to
<connectivity>
for all i, x_i in {0, 1}.
Of course I have left out the interesting constraints :). One way to enforce connectivity is to solve this formulation with no constraints at first, then examine the solution. If it's connected, then great -- we're done. Otherwise, find a set of vertices S such that there are no edges between S and its complement, and add a constraint
sum_{i such that x_i connects S with its complement} x_i >= 1
and repeat.
Another way is to generate constraints like this inside of the solver working on the linear relaxation of the integer program. Usually MIP libraries have a feature that allows this. The fractional problem has fractional connectivity, however, which means finding min cuts to check feasibility. I would expect this approach to be faster, but I must apologize as I don't have the energy to describe it detail.
I'm not sure if it's the best solution, but my first approach would be a search using backtracking:
Of all edge pairs, mark those that could be removed without disconnecting the graph.
Remove one of these sets and find the optimal solution for the remaining graph.
Put the pair back and remove the next one instead, find the best solution for that.
This works, but is slow and unelegant. It might be possible to rescue this approach though with a few adjustments that avoid unnecessary branches.
Firstly, the edge pairs that could still be removed is a set that only shrinks when going deeper. So, in the next recursion, you only need to check for those in the previous set of possibly removable edge pairs. Also, since the order in which you remove the edge pairs doesn't matter, you shouldn't consider any edge pairs that were already considered before.
Then, checking if two nodes are connected is expensive. If you cache the alternative route for an edge, you can check relatively quick whether that route still exists. If it doesn't, you have to run the expensive check, because even though that one route ceased to exist, there might still be others.
Then, some more pruning of the tree: Your set of removable edge pairs gives a lower bound to the weight that the optimal solution has. Further, any existing solution gives an upper bound to the optimal solution. If a set of removable edges doesn't even have a chance to find a better solution than the best one you had before, you can stop there and backtrack.
Lastly, be greedy. Using a regular greedy algorithm will not give you an optimal solution, but it will quickly raise the bar for any solution, making pruning more effective. Therefore, attempt to remove the edge pairs in the order of their weight loss.
Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size with
https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges
, and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms there have been a number of important improvements in the past decade, which have increased performance in around two orders of magnitude. The current leading family of algorithms are branch and bound and use approximate coloring to compute bounds. I name the most important ones and the improvement:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First : find nodes with very low degree (can be done in O(n)). Test them, if they are part of any clique and then remove them. With a little "luck" you can crush graph into few separate components and then solve each one independently (which is much much faster).
(To identify component, O(n) time is required).
Second : For each component, you can find if it makes sense to try to find any clique of given size. How? Lets say, you want to find clique of size 19. Then there has to exist at least 19 nodes with at least 19 degree. Otherwise, such clique cannot exist and you dont have to test it.
You are given a simple graph of max degree 4 with 1 million vertices.
We want to find a Maximum Independent Subset.
In the general case it is NP hard.
Does the fact that the degree is max 4 provide an efficient solution to calculate it?
Reading further into that Wikipedia page, I found this on the subject:
For instance, for sparse graphs (graphs in which the number of edges
is at most a constant times the number of vertices in any subgraph),
the maximum clique has bounded size and may be found exactly in linear
time;[6] however, for the same classes of graphs, or even
for the more restricted class of bounded degree graphs, finding the
maximum independent set is MAXSNP-complete, implying that, for some
constant c (depending on the degree) it is NP-hard to find an
approximate solution that comes within a factor of c of the
optimum.[7]
Your case is the bounded degree case, so judging from this snippet, your more restrictive version is still NP-hard.
There's a very simple greedy 1/5-approximation. Take any vertex, add it to independent set, and remove neighbours from the graph. Continue till no vertices remain. A bit more general version of this trick is Turan's theorem.