Back Tracking Vs. Greedy Algorithm Maximum Independent Set - performance

I implemented a back tracing algorithm using both a greedy algorithm and a back tracking algorithm.
The back tracking algorithm is as follows:
MIS(G= (V,E): a graph): largest set of independent vertices
1:if|V|= 0
then return .
3:end if
if | V|= 1
then return V
end if
pick u ∈ V
Gout←G−{u}{remove u from V and E }
Gn ← G−{ u}−N(u){N(u) are the neighbors of u}
Sout ←MIS(Gout)
Sin←MIS(Gin)∪{u}
return maxsize(Sout,Sin){return Sin if there’s a tie — there’s a reason for this.
}
The greedy algorithm is to iteratively pick the node with the smallest degree, place it in the MIS and then remove it and its neighbors from G.
After running the algorithm on varying graph sizes where the probability of an edge existing is 0.5, I have empirically found that the back tracking algorithm always found a smaller a smaller maximum independent set than the greedy algorithm. Is this expected?

Your solution is strange. Backtracking is usually used to yes/no problems, not optimization. The algorithm you wrote depends heavily on how you pick u. And it definitely is not backtracking because you never backtrack.
Such problem can be solved in a number of ways, e.g.:
genetic programming,
exhaustive searching,
solving the problem on dual graph (maximum clique problem).

According to Wikipedia, this is a NP-hard problem:
A maximum independent set is an independent set of the largest possible size for a given graph G.
This size is called the independence number of G, and denoted α(G).
The problem of finding such a set is called the maximum independent set problem and is an NP-hard optimization problem.
As such, it is unlikely that there exists an efficient algorithm for finding a maximum independent set of a graph.
So, for finding the maximum independent set of a graph, you should test all available states (with an algorithm which its time complexity is exponential). All other faster algorithms (like greedy, genetic or randomize ones), can not find the exact answer. They can guarantee to find a maximal independent set, but not the maximum one.
In conclusion, I can say that your backtracking approach is slower and accurate; but the greedy approach is only an approximation algorithm.

Related

How to find the size of maximal clique or clique number?

Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size with
https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges
, and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
Finding the largest clique in a graph is the clique number of the graph and is also known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-Hard so no polynomial time algorithm is expected to be found to solve it in the general case (there are particular graph configurations which do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. find a number close to the clique number).
If you are interested in exact MCP algorithms there have been a number of important improvements in the past decade, which have increased performance in around two orders of magnitude. The current leading family of algorithms are branch and bound and use approximate coloring to compute bounds. I name the most important ones and the improvement:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First : find nodes with very low degree (can be done in O(n)). Test them, if they are part of any clique and then remove them. With a little "luck" you can crush graph into few separate components and then solve each one independently (which is much much faster).
(To identify component, O(n) time is required).
Second : For each component, you can find if it makes sense to try to find any clique of given size. How? Lets say, you want to find clique of size 19. Then there has to exist at least 19 nodes with at least 19 degree. Otherwise, such clique cannot exist and you dont have to test it.

Algorithm for independent set of a graph?

is there an algorithm for finding all the independent sets of an directed graph ?
From what i've read an independent set represents a set formed by the nodes that are not adjacent.
So for this example I would have {1} {2} {1,3}
So how is possible to find all of them, I am thinking about something recursive but I don't really know the algorithm, if someone could point me in the right direction it would be much appreciated !
Thank you!
Typical way to find independent sets is to consider the complement of a graph. A complement of a graph is defined as a graph with the same set of vertices and an edge between a pair if and only if there is no edge between them in the original graph. An independent set in the graph corresponds to a clique in the complements. Finding all the cliques is exponential in complexity so you can not improve brute force much. Still I believe considering the complement of the graph may make the problem easier to deal with.
Other than complement and finding cliques, I can also think about "Graph Coloring", you color the vertices somehow that no two adjacent vertices have the same color (you can do it with a very simple heuristic algorithm like SL = Smallest Last), and then choose vertices in every color as a subset (as a maximal independent subset).
The only problem is that there are probably too many ways of coloring a graph. You have to keep all the found (maximal) independent sets and move on until you get enough sets!
The Bron–Kerbosch algorithm is commonly used for this problem, see the Wikipedia article for a description and pseudocode that can be turned into a useable program without too much problem. The size of output is, in the worst case, exponential in the number of vertices, but brute force will always be exponential while BK will be polynomial if the output is polynomial. In other words if you know that the output will be reasonable then BK will produce it in a reasonable time. This is an active area of research and there are a number of other algorithms that do the same thing with varying efficiency depending of the type and size of graph. There are applications in several areas, in particular genetics.

max-weight k-clique in a complete k-partite graph

My Problem
Whether there's an efficient algorithm to find a max-weight (or min-weight) k-clique in a complete k-partite graph (a graph in which vertices are adjacent if and only if they belong to different partite sets according to wikipedia)?
More Details about the Terms
Max-weight Clique: Every edge in the graph has a weight. The weight of a clique is the sum of the weights of all edges in the clique. The goal is to find a clique with the maximum weight.
Note that the size of the clique is k which is the largest possible clique size in a complete k-partite graph.
What I have tried
I met this problem during a project. Since I am not a CS person, I am not sure about the complexity etc.
I have googled several related papers but none of them deals with the same problem. I have also programmed a greedy algorithm + simulated annealing to deal with it (the result seems not good). I have also tried something like Dynamic Programming (but it does not seem efficient). So I wonder whether the exact optimal can be computed efficiently. Thanks in advance.
EDIT Since my input can be really large (e.g. the number of vertices in each clique is 2^k), I hope to find a really fast algorithm (e.g. polynomial of k in time) that works out the optimal result. If it's not possible, can we prove some lower bound of the complexity?
Generalized Maximum Clique Problem (GMCP)
I understand that you are looking for the Generalized Maximum/ minimum Clique Problem (GMCP), where finding the clique with maximum score or minimum cost is the optimization problem.
This problem is a NP-Hard problem as shown in Generalized network design problems, so there is currently no polynomial time exact solution to your problem.
Since, there is no known polynomial solution to your problem, you have 2 choices. Reducing the problem size to find the exact solution or to find an estimated solution by relaxing your problem and it leads you to a an estimation to the optimal solution.
Example and solution for the small problem size
In small k-partite graphs (in our case k is 30 and each partite has 92 nodes), we were able to get the optimal solution in a reasonable time by a heavy branch and bounding algorithm. We have converted the problem into another NP-hard problem (Mixed Integer Programming), reduced number of integer variables, and used IBM Cplex optimizer to find the optimal solution to GMCP.
You can find our project page and paper useful. I can also share the code with you.
How to estimate the solution
One straight forward estimation to this NP-Hard problem is relaxing the Mixed Integer Programming problem and solve it as a linear programming problem. Of course it will give you an estimation of the solution, but still you might get a reasonable answer in practice.
More general problem (Generalized Maximum Multi Clique Problem)
In another work, we solve the Generalized Maximum Multi Clique Problem (GMMCP), where maximizing the score or minimizing the cost of selecting multiple k-cliques in a complete k-partite graph is in interest. You can find the project page by searching for GMMCP Tracking.
The maximum clique problem in a weighted graph in general is intractable. In your case, if the graph contains N nodes, you can enumerate through all possible k-cliques in N ** k time. If k is fixed (don't know if it is), your problem is trivially polynomially solvable, as this is a polynomial in N. I don't believe the problem to be tractable if k is a free parameter because I can't see how the assumption of a k-partite graph would make the problem significantly simpler from the general one.
How hard your problem is in practice depends also on how the weights are distributed. If all the weights are very near to each others, i.e. the difference between "best" and "good" is relatively small, the problem is very hard. If you have wildly different weights on the edges, the problem can be easier, because a greedy algorithm can give you a good "initial" solution, and you can use that and subsequent good solutions to limit your combinatorial search using the well-known branch-and-bound method.

Simple Graph |V|= 10^6, degree 4: Maximum Independent Subset?

You are given a simple graph of max degree 4 with 1 million vertices.
We want to find a Maximum Independent Subset.
In the general case it is NP hard.
Does the fact that the degree is max 4 provide an efficient solution to calculate it?
Reading further into that Wikipedia page, I found this on the subject:
For instance, for sparse graphs (graphs in which the number of edges
is at most a constant times the number of vertices in any subgraph),
the maximum clique has bounded size and may be found exactly in linear
time;[6] however, for the same classes of graphs, or even
for the more restricted class of bounded degree graphs, finding the
maximum independent set is MAXSNP-complete, implying that, for some
constant c (depending on the degree) it is NP-hard to find an
approximate solution that comes within a factor of c of the
optimum.[7]
Your case is the bounded degree case, so judging from this snippet, your more restrictive version is still NP-hard.
There's a very simple greedy 1/5-approximation. Take any vertex, add it to independent set, and remove neighbours from the graph. Continue till no vertices remain. A bit more general version of this trick is Turan's theorem.

Find the extreme for priority function / alphabet order

We have an array of elements a1,a2,...aN from an alphabet E. Assuming |N| >> |E|.
For each symbol of the alphabet we define an unique integer priority = V(sym). Let's define V{i} := V(symbol(ai)) for the simplicity.
How can I find the priority function V for which:
Count(i)->MAX | V{i} < V{i+1}
In other words, I need to find the priorities / permutation of the alphabet for which the number of positions i, satisfying the condition V{i}<V{i+1}, is maximum.
Edit-1: Please, read carefully. I'm given an array ai and the task is to produce a function V. It's not about sorting the input array with a priority function.
Edit-2: Example
E = {a,b,c}; A = 'abcab$'; (here $ = artificial termination symbol, V{$}=+infinity)
One of the optimal priority functions is: V{a}=1,V{b}=2,V{c}=3, which gives us the following signs between array elements: a<b<c>a<b<$, resulting in 4 '<' signs of 5 total.
If elements could not have tied priorities, this would be trivial. Just sort by priority. But you can have equal priorities.
I would first sort the alphabet by priority. Then I'd extract the longest rising subsequence. That is the start of your answer. Extract the longest rising subsequence from what remains. Append that to your answer. Repeat the extraction process until the entire alphabet has been extracted.
I believe that this gives an optimal result, but I haven't tried to prove it. If it is not perfectly optimal, it still will be pretty good.
Now that I think I understand the problem, I can tell you that there is no good algorithm to solve it.
To see this let us first construct a directed graph whose vertices are your elements, and whose edges indicate how many times one element immediately preceeded another. You can create a priority function by dropping enough edges to get a directed acyclic graph, use the edges to create a partially ordered set, and then add order relations until you have a full linear order, from which you can trivially get a priority function. All of this is straightforward once you have figured out which edges to drop. Conversely given that directed graph and your final priority function, it is easy to figure out which set of edges you had to decide to drop.
Therefore your problem is entirely equivalent to figuring out a minimal set of edges you need to drop from athat directed graph to get athat directed acyclic graph. However as http://en.wikipedia.org/wiki/Feedback_arc_set says, this is a known NP hard problem called the minimum feedback arc set. begin update It is therefore very unlikely that there is a good algorithm for the graphs you can come up with. end update
If you need to solve it in practice, I'd suggest going for some sort of greedy algorithm. It won't always be right, but it will generally give somewhat reasonable results in reasonable time.
Update: Moron is correct, I did not prove NP-hard. However there are good heuristic reasons to believe that the problem is, in fact, NP-hard. See the comments for more.
There's a trivial reduction from Minimum Feedback Arc Set on directed graphs whose arcs can be arranged into an Eulerian path. I quote from http://www14.informatik.tu-muenchen.de/personen/jacob/Publications/JMMN07.pdf :
To the best of our knowledge, the
complexity status of minimum feedback
arc set in such graphs is open.
However, by a lemma of Newman, Chen,
and Lovász [1, Theorem 4], a
polynomial algorithm for [this problem]
would lead to a 16/9 approximation
algorithm for the general minimum
feedback arc set problem, improving
over the currently best known O(log n
log log n) algorithm [2].
Newman, A.: The maximum acyclic subgraph problem and degree-3 graphs.
In: Proceedings of the 4th
International Workshop on
Approximation Algorithms for
Combinatorial Optimization Problems,
APPROX. LNCS (2001) 147–158
Even,G.,Naor,J.,Schieber,B.,Sudan,M.:Approximatingminimumfeedbacksets
and multi-cuts in directed graphs. In:
Proceedings of the 4th International
Con- ference on Integer Programming
and Combinatorial Optimization. LNCS
(1995) 14–28

Resources