Simple Graph |V|= 10^6, degree 4: Maximum Independent Subset? - algorithm

You are given a simple graph of max degree 4 with 1 million vertices.
We want to find a Maximum Independent Subset.
In the general case it is NP hard.
Does the fact that the degree is max 4 provide an efficient solution to calculate it?

Reading further into that Wikipedia page, I found this on the subject:
For instance, for sparse graphs (graphs in which the number of edges
is at most a constant times the number of vertices in any subgraph),
the maximum clique has bounded size and may be found exactly in linear
time;[6] however, for the same classes of graphs, or even
for the more restricted class of bounded degree graphs, finding the
maximum independent set is MAXSNP-complete, implying that, for some
constant c (depending on the degree) it is NP-hard to find an
approximate solution that comes within a factor of c of the
optimum.[7]
Your case is the bounded degree case, so judging from this snippet, your more restrictive version is still NP-hard.

There's a very simple greedy 1/5-approximation. Take any vertex, add it to independent set, and remove neighbours from the graph. Continue till no vertices remain. A bit more general version of this trick is Turan's theorem.

Related

Edge clique cover algorithm

I am trying to write an algorithm that computes the edge clique cover number (the smallest number of cliques that cover all edges) of an input graph (undirected and no self-loops). My idea would be to
Calculate all maximal cliques with the Bron-Kerbosch algorithm, and
Try if any 1,2,3,... of them would cover all edges until I find the
minimum number
Would that work and does anyone know a better method; is there a standard algorithm? To my surprise, I couldn't find any such algorithm. I know that the problem is NP-hard, so I don't expect a fast solution.
I would gather maximal cliques as you do now (or perhaps using a different algorithm, as suggested by CaptainTrunky), but then use branch and bound. This won't guarantee a speedup, but will often produce a large speedup on "easy" instances.
In particular:
Instead of trying all subsets of maximal cliques in increasing subset size order, pick an edge uv and branch on it. This means:
For each maximal clique C containing uv:
Make a tentative new partial solution that contains all cliques in the current solution
Add C to this partial solution
Make a new subproblem containing the current subproblem's graph, but with all vertices in C collapsed into a single vertex
Recurse to solve this smaller subproblem.
Keep track of the best complete solution so far. This is your upper bound (UB). You do not need to continue processing any subproblem that has already reached this upper bound but still has edges present; a better solution already exists!
It's best to pick an edge to branch on that is covered by as few cliques as possible. When choosing in what order to try those cliques, try whichever you think is likely to be the best (probably the largest one) first.
And here is an idea for a lower bound to improve the pruning level:
If a subgraph G' contains an independent set of size s, then you will need at least s cliques to cover G' (since no clique can cover two or more vertices in an independent set). Computing the largest possible IS is NP-hard and thus impractical here, but you could get a cheap bound by using the 2-approximation for Vertex Cover: Just keep choosing an edge and throwing out both vertices until no edges are left; if you threw out k edges, then what remains is an IS that is within k of optimal.
You can add the size of this IS to the total number of cliques in your solution so far; if that is larger than the current UB, you can abort this subproblem, since we know that fleshing it out further cannot produce a better solution than one we have already seen.
I was working on the similar problem 2 years ago and I've never seen any standard existing approaches to it. I did the following:
Compute all maximal cliques.
MACE was way better than
Bron-Kerbosch in my case.
Build a constraint-satisfaction problem for determining a minimum number of cliques required to cover the graph. You could use SAT, Minizinc, MIP tools to do so. Which one to pick? It depends on your skills, time resources, environment and dozens of other parameters. If we are talking about proof-of-concept, I would stick with Minizinc.
A bit details for the second part. Define a set of Boolean variables with respect to each edge, if it's value == True, then it's covered, otherwise, it's not. Add constraints that allow you covering sets of edges only with respect to each clique. Finally, add variables corresponding to each clique, if it's == True, then it's used already, otherwise, it's not. Finally, require all edges to be covered AND a number of used cliques is minimal.

Divide a graph into same size disjoint sets with minimum cut

Is there any algorithm or code that divide graph nodes into two or more disjoint sets that satisfy following conditions:
first, only edges allowed to remove.
second, edges are weighted and those that would be removed must has minimum weight(minimum cut algorithm).
third,desired disjoint sets have same size as long as possible.
It looks like you're trying to solve the Min-bisection problem in which given a graph G you would like to partition V[G] into two disjoint subsets A and B of equal size such that the sum of weights of the edges between A and B is minimized. Unfortunately, the Min-bisection problem is known to be NP-hard. However, Kernighan–Lin algorithm is a very simple O(n^2*logn) heuristic algorithm for the problem. While very little is known about the algorithm theoretically (we do not have a proven bound on its performance relative to an optimal solution), the algorithm is shown to be quite effective in experiments.

How to find the size of maximal clique or clique number?

Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size with
https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges
, and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms there have been a number of important improvements in the past decade, which have increased performance in around two orders of magnitude. The current leading family of algorithms are branch and bound and use approximate coloring to compute bounds. I name the most important ones and the improvement:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First : find nodes with very low degree (can be done in O(n)). Test them, if they are part of any clique and then remove them. With a little "luck" you can crush graph into few separate components and then solve each one independently (which is much much faster).
(To identify component, O(n) time is required).
Second : For each component, you can find if it makes sense to try to find any clique of given size. How? Lets say, you want to find clique of size 19. Then there has to exist at least 19 nodes with at least 19 degree. Otherwise, such clique cannot exist and you dont have to test it.

Back Tracking Vs. Greedy Algorithm Maximum Independent Set

I implemented a back tracing algorithm using both a greedy algorithm and a back tracking algorithm.
The back tracking algorithm is as follows:
MIS(G= (V,E): a graph): largest set of independent vertices
1:if|V|= 0
then return .
3:end if
if | V|= 1
then return V
end if
pick u ∈ V
Gout←G−{u}{remove u from V and E }
Gn ← G−{ u}−N(u){N(u) are the neighbors of u}
Sout ←MIS(Gout)
Sin←MIS(Gin)∪{u}
return maxsize(Sout,Sin){return Sin if there’s a tie — there’s a reason for this.
}
The greedy algorithm is to iteratively pick the node with the smallest degree, place it in the MIS and then remove it and its neighbors from G.
After running the algorithm on varying graph sizes where the probability of an edge existing is 0.5, I have empirically found that the back tracking algorithm always found a smaller a smaller maximum independent set than the greedy algorithm. Is this expected?
Your solution is strange. Backtracking is usually used to yes/no problems, not optimization. The algorithm you wrote depends heavily on how you pick u. And it definitely is not backtracking because you never backtrack.
Such problem can be solved in a number of ways, e.g.:
genetic programming,
exhaustive searching,
solving the problem on dual graph (maximum clique problem).
According to Wikipedia, this is a NP-hard problem:
A maximum independent set is an independent set of the largest possible size for a given graph G.
This size is called the independence number of G, and denoted α(G).
The problem of finding such a set is called the maximum independent set problem and is an NP-hard optimization problem.
As such, it is unlikely that there exists an efficient algorithm for finding a maximum independent set of a graph.
So, for finding the maximum independent set of a graph, you should test all available states (with an algorithm which its time complexity is exponential). All other faster algorithms (like greedy, genetic or randomize ones), can not find the exact answer. They can guarantee to find a maximal independent set, but not the maximum one.
In conclusion, I can say that your backtracking approach is slower and accurate; but the greedy approach is only an approximation algorithm.

Best subsample in the Maxmin distance sense

I have a set of N points in a D-dimensional metric space. I want to select K of them in such a way that the smallest distance between any two points in the subset is the largest.
For instance, with N=4 and K=3 in 3D Euclidean space, the solution is the face of the tetrahedron having the longest short side.
Is there a classical way to achieve that ? Can it be solved exactly in polynomial time ?
I have googled as much as I could, but I have not figured out yet how to call this problem.
In my case N=50, K=10 and D=300 typically.
Clarification:
A brute force approach would be to try every combination of K points among the N and determine the closest pair in every subset. The solution is given by the subset that yields the longest pair.
Done the trivial way, an O(K^2) process, to be repeated N! / K!(N-K)! times.
Hum, 10^2 50! / 10! 40! = 1027227817000
I think you might find papers on unit disk graphs informative but discouraging. For instance, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.3113&rep=rep1&type=pdf states that the maximum independent set problem on unit disk graphs in NP-complete, even if the disk representation is known. A unit disk graph is the graph you get by placing points in the plane and forming links between every pair of points at most a unit distance apart.
So I think that if you could solve your problem in polynomial time you could run it on a unit disk graph for different values of K until you find a value at which the smallest distance between two chosen points was just over one, and I think this would be a maximum independent set on the unit disk graph, which would be solving an NP-complete problem in polynomial time.
(Just about to jump on a bicycle so this is a bit rushed, but searching for papers on unit disk graphs might at least turn up some useful search terms)
Here's an attempt to explain it piece by piece:
Here is another attempt to relate the two problems.
For maximum independent set see http://en.wikipedia.org/wiki/Maximum_independent_set#Finding_maximum_independent_sets. A decision problem version of this is "Are there K vertices in this graph such that no two are joined by an edge?" If you can solve this you can certainly find a maximum independent set by finding the largest K by asking this question for different K and then finding the K nodes by asking the question on versions of the graph with one or more nodes deleted.
I state without proof that finding the maximum independent set in a unit disk graph is NP-complete. Another reference for this is http://web.sau.edu/lilliskevinm/wirelessbib/ClarkColbournJohnson.pdf.
A decision version of your problem is "Do there exist K points with distance at least D between any two points?" Again, you can solve this in polynomial time iff you can solve your original problem in polynomial time - play around until you find the largest D that gives answer yes, and then delete points and see what happens.
A unit disk graph has an edge exactly when the distance between two points is 1 or less. So if you could solve the decision version of your original problem you could solve the decision version of the unit disk graph problem just by setting D = 1 and solving your problem.
So I think I have constructed a series of links showing that if you could solve your problem you could solve an NP-complete problem by turning it into your problem, which causes me to think that your problem is hard.

Resources