My Problem
Whether there's an efficient algorithm to find a max-weight (or min-weight) k-clique in a complete k-partite graph (a graph in which vertices are adjacent if and only if they belong to different partite sets according to wikipedia)?
More Details about the Terms
Max-weight Clique: Every edge in the graph has a weight. The weight of a clique is the sum of the weights of all edges in the clique. The goal is to find a clique with the maximum weight.
Note that the size of the clique is k which is the largest possible clique size in a complete k-partite graph.
What I have tried
I met this problem during a project. Since I am not a CS person, I am not sure about the complexity etc.
I have googled several related papers but none of them deals with the same problem. I have also programmed a greedy algorithm + simulated annealing to deal with it (the result seems not good). I have also tried something like Dynamic Programming (but it does not seem efficient). So I wonder whether the exact optimal can be computed efficiently. Thanks in advance.
EDIT Since my input can be really large (e.g. the number of vertices in each clique is 2^k), I hope to find a really fast algorithm (e.g. polynomial of k in time) that works out the optimal result. If it's not possible, can we prove some lower bound of the complexity?
Generalized Maximum Clique Problem (GMCP)
I understand that you are looking for the Generalized Maximum/ minimum Clique Problem (GMCP), where finding the clique with maximum score or minimum cost is the optimization problem.
This problem is a NP-Hard problem as shown in Generalized network design problems, so there is currently no polynomial time exact solution to your problem.
Since, there is no known polynomial solution to your problem, you have 2 choices. Reducing the problem size to find the exact solution or to find an estimated solution by relaxing your problem and it leads you to a an estimation to the optimal solution.
Example and solution for the small problem size
In small k-partite graphs (in our case k is 30 and each partite has 92 nodes), we were able to get the optimal solution in a reasonable time by a heavy branch and bounding algorithm. We have converted the problem into another NP-hard problem (Mixed Integer Programming), reduced number of integer variables, and used IBM Cplex optimizer to find the optimal solution to GMCP.
You can find our project page and paper useful. I can also share the code with you.
How to estimate the solution
One straight forward estimation to this NP-Hard problem is relaxing the Mixed Integer Programming problem and solve it as a linear programming problem. Of course it will give you an estimation of the solution, but still you might get a reasonable answer in practice.
More general problem (Generalized Maximum Multi Clique Problem)
In another work, we solve the Generalized Maximum Multi Clique Problem (GMMCP), where maximizing the score or minimizing the cost of selecting multiple k-cliques in a complete k-partite graph is in interest. You can find the project page by searching for GMMCP Tracking.
The maximum clique problem in a weighted graph in general is intractable. In your case, if the graph contains N nodes, you can enumerate through all possible k-cliques in N ** k time. If k is fixed (don't know if it is), your problem is trivially polynomially solvable, as this is a polynomial in N. I don't believe the problem to be tractable if k is a free parameter because I can't see how the assumption of a k-partite graph would make the problem significantly simpler from the general one.
How hard your problem is in practice depends also on how the weights are distributed. If all the weights are very near to each others, i.e. the difference between "best" and "good" is relatively small, the problem is very hard. If you have wildly different weights on the edges, the problem can be easier, because a greedy algorithm can give you a good "initial" solution, and you can use that and subsequent good solutions to limit your combinatorial search using the well-known branch-and-bound method.
Related
Assume we're given a graph on a 2D-plane with n nodes and edge between each pair of nodes, having a weight equal to a euclidean distance. The initial problem is to find MST of this graph and it's quite clear how to solve that using Prim's or Kruskal's algorithm.
Now let's say we have k extra nodes, which we can place in any integer point on our 2D-plane. The problem is to find locations for these nodes so as new graph has the smallest possible MST, if it is not necessary to use all of these extra nodes.
It is obviously impossible to find the exact solution (in poly-time), but the goal is to find the best approximate one (which can be found within 1 sec). Maybe you can come up with some hints of the most efficient way of going throw possible solutions, or provide with some articles, where the similar problem is covered.
It is very interesting problem which you are working on. You have many options to attack this problem. The best known heuristics in such situation are - Genetic Algorithms, Particle Swarm Optimization, Differential Evolution and many others of this kind.
What is nice for such kind of heuristics is that you can limit their execution to a certain amount of time (let say 1 second). If it was my task to do I would try first Genetic Algorithms.
You could try with a greedy algorithm, try the longest edges in the MST, potentially these could give the largest savings.
Select the longest edge, now get the potential edge from each vertex that are closed in angle to the chosen one, from each side.
from these select the best Steiner point.
Fix the MST ...
repeat until 1 sec is gone.
The challenge is what to do if one of the vertexes is itself a Steiner point.
I'm not sure if I'm using the right term here, but for fully connected components I mean there's an (undirected) edge between every pair of vertices in a component, and no additional vertices can be included without breaking this property.
There're a number algorithms for finding strongly connected components in a graph though (for example Tarjan's algorithm), is there an algorithm for finding such "fully connected components"?
What you are looking for is a list of all the maximal cliques of the graph. It's also called the clique problem. No known polynomial time solution exists for a generic undirected graph.
Most versions of the clique problem are hard. The clique decision problem is NP-complete (one of Karp's 21 NP-complete problems). The problem of finding the maximum clique is both fixed-parameter intractable and hard to approximate. And, listing all maximal cliques may require exponential time as there exist graphs with exponentially many maximal cliques. Therefore, much of the theory about the clique problem is devoted to identifying special types of graph that admit more efficient algorithms, or to establishing the computational difficulty of the general problem in various models of computation.
-https://en.wikipedia.org/wiki/Clique_problem
I was also looking at the same question.
https://en.wikipedia.org/wiki/Bron-Kerbosch_algorithm This turns out to be an algorithm to list it, however, it's not fast. If your graph is sparse, you may want to use the vertex ordering version of the algorithm:
For sparse graphs, tighter bounds are possible. In particular the vertex-ordering version of the Bron–Kerbosch algorithm can be made to run in time O(dn3d/3), where d is the degeneracy of the graph, a measure of its sparseness. There exist d-degenerate graphs for which the total number of maximal cliques is (n − d)3d/3, so this bound is close to tight.[6]
Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size with
https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges
, and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms there have been a number of important improvements in the past decade, which have increased performance in around two orders of magnitude. The current leading family of algorithms are branch and bound and use approximate coloring to compute bounds. I name the most important ones and the improvement:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First : find nodes with very low degree (can be done in O(n)). Test them, if they are part of any clique and then remove them. With a little "luck" you can crush graph into few separate components and then solve each one independently (which is much much faster).
(To identify component, O(n) time is required).
Second : For each component, you can find if it makes sense to try to find any clique of given size. How? Lets say, you want to find clique of size 19. Then there has to exist at least 19 nodes with at least 19 degree. Otherwise, such clique cannot exist and you dont have to test it.
I implemented a back tracing algorithm using both a greedy algorithm and a back tracking algorithm.
The back tracking algorithm is as follows:
MIS(G= (V,E): a graph): largest set of independent vertices
1:if|V|= 0
then return .
3:end if
if | V|= 1
then return V
end if
pick u ∈ V
Gout←G−{u}{remove u from V and E }
Gn ← G−{ u}−N(u){N(u) are the neighbors of u}
Sout ←MIS(Gout)
Sin←MIS(Gin)∪{u}
return maxsize(Sout,Sin){return Sin if there’s a tie — there’s a reason for this.
}
The greedy algorithm is to iteratively pick the node with the smallest degree, place it in the MIS and then remove it and its neighbors from G.
After running the algorithm on varying graph sizes where the probability of an edge existing is 0.5, I have empirically found that the back tracking algorithm always found a smaller a smaller maximum independent set than the greedy algorithm. Is this expected?
Your solution is strange. Backtracking is usually used to yes/no problems, not optimization. The algorithm you wrote depends heavily on how you pick u. And it definitely is not backtracking because you never backtrack.
Such problem can be solved in a number of ways, e.g.:
genetic programming,
exhaustive searching,
solving the problem on dual graph (maximum clique problem).
According to Wikipedia, this is a NP-hard problem:
A maximum independent set is an independent set of the largest possible size for a given graph G.
This size is called the independence number of G, and denoted α(G).
The problem of finding such a set is called the maximum independent set problem and is an NP-hard optimization problem.
As such, it is unlikely that there exists an efficient algorithm for finding a maximum independent set of a graph.
So, for finding the maximum independent set of a graph, you should test all available states (with an algorithm which its time complexity is exponential). All other faster algorithms (like greedy, genetic or randomize ones), can not find the exact answer. They can guarantee to find a maximal independent set, but not the maximum one.
In conclusion, I can say that your backtracking approach is slower and accurate; but the greedy approach is only an approximation algorithm.
I would like to know a fast algorithm to find only the clique number(without actually finding the clique) of a graph with about 100 vertices.
I am trying to solve the following problem.
http://uva.onlinejudge.org/external/1/193.html
This is NP-complete, and you can't do it much better than actually finding the maximum clique and counting its vertices. From Wikipedia:
Clique problems include:
solving the decision problem of testing whether a graph contains a clique larger than N
These problems are all hard: the clique decision problem is NP-complete (one of Karp's 21 NP-complete problems),
If you can find the clique number in P, then the decision problem is answerable in P (you simply compute the clique number and compare it with N).
Since the decision problem is NP-Complete, finding the clique number of a general graph must be NP-Hard.
As already stated by others, this is probably really hard.
But like many theoretically hard problems, it can be pretty fast in practice with a good algorithm and suitable data. If you implement something like Bron-Kerbosch for finding cliques, while keeping track of the largest clique size you've found so far, then you can backtrack out of fruitless search trees.
For example, if you find a clique with 20 nodes in it, and your network has a large number of nodes with degree less than 20, you can immediately rule out those nodes from further consideration. On the other hand, if the degree distribution is more uniform, then this might be as slow as finding all the cliques.
Although the problem is NP-hard, the size of graph you mention is not any problem for today´s fastest maximum clique exact solvers (for any configuration).
If you are ready to implement the code then I recommend you read the papers connected with the family of algorithms MCQ, MCR and MCS, as well as the family BBMC, BBMCL and BBMCX. An interesting starting point is the comparison survey by Prosser [Prosser 12]. It includes explanation for a Java implementation of these algorithms.