Algorithm for independent set of a graph? - algorithm

Is there an algorithm for finding all the independent sets of a directed graph?
From what I've read, an independent set is a set of nodes no two of which are adjacent.
So for this example I would have {1}, {2}, {1,3}.
So how is it possible to find all of them? I am thinking about something recursive, but I don't really know the algorithm; if someone could point me in the right direction it would be much appreciated!
Thank you!

The typical way to find independent sets is to consider the complement of the graph. The complement of a graph is defined as a graph on the same set of vertices, with an edge between a pair of vertices if and only if there is no edge between them in the original graph. An independent set in the graph corresponds to a clique in the complement. Finding all the cliques is exponential in complexity, so you cannot improve much on brute force, but considering the complement of the graph may still make the problem easier to deal with.
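A minimal sketch of that correspondence, assuming the networkx library (the answer doesn't name one): the maximal cliques of the complement are exactly the maximal independent sets of the original graph.

    # Sketch using networkx (an assumption): maximal cliques of the
    # complement graph are the maximal independent sets of the original.
    import networkx as nx

    G = nx.Graph([(1, 2), (2, 3)])      # hypothetical example graph
    H = nx.complement(G)                # same vertices, inverted edge set
    print(list(nx.find_cliques(H)))     # maximal independent sets, e.g. [[1, 3], [2]]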

Other than taking the complement and finding cliques, I can also think of graph coloring: color the vertices so that no two adjacent vertices share a color (a very simple heuristic such as SL = Smallest Last will do), and then take the vertices of each color class as a subset (an independent set, which you can extend to a maximal one).
The only problem is that there are probably too many ways to color a graph. You have to keep all of the (maximal) independent sets found so far and keep going until you have enough sets!
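A rough sketch of that coloring idea (the dict-of-sets adjacency format is an assumed input, not something from the question):

    # Greedy-color the graph in smallest-last order, then group vertices by
    # color: each color class is an independent set.
    from collections import defaultdict

    def color_classes(adj):
        # smallest-last ordering: repeatedly remove a vertex of minimum degree
        remaining = {v: set(nbrs) for v, nbrs in adj.items()}
        order = []
        while remaining:
            v = min(remaining, key=lambda u: len(remaining[u]))
            order.append(v)
            for u in remaining[v]:
                remaining[u].discard(v)
            del remaining[v]
        order.reverse()                       # color in reverse deletion order

        color = {}
        for v in order:
            used = {color[u] for u in adj[v] if u in color}
            color[v] = next(c for c in range(len(adj)) if c not in used)

        classes = defaultdict(set)
        for v, c in color.items():
            classes[c].add(v)
        return list(classes.values())

    print(color_classes({1: {2}, 2: {1, 3}, 3: {2}}))  # e.g. [{1, 3}, {2}]

Each returned color class is an independent set; to make one maximal you can keep adding vertices that are not adjacent to it.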

The Bron–Kerbosch algorithm is commonly used for this problem; see the Wikipedia article for a description and pseudocode that can be turned into a usable program without much trouble. The size of the output is, in the worst case, exponential in the number of vertices, but brute force will always be exponential, while Bron–Kerbosch is polynomial if the output is polynomial. In other words, if you know that the output will be reasonable, then BK will produce it in a reasonable time. This is an active area of research, and there are a number of other algorithms that do the same thing with varying efficiency depending on the type and size of the graph. There are applications in several areas, in particular genetics.
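A direct translation of the basic (no-pivot) pseudocode from the Wikipedia article might look like the sketch below; the adjacency format and the tiny example graph are my own assumptions. Run on the complement graph, it lists the maximal independent sets of the original.

    # Basic Bron-Kerbosch (no pivoting); `adj` maps each vertex to its set of
    # neighbours. Applied to the complement, it yields maximal independent sets.
    def bron_kerbosch(adj, r=None, p=None, x=None):
        if r is None:
            r, p, x = set(), set(adj), set()
        if not p and not x:
            yield set(r)                      # r is a maximal clique
            return
        for v in list(p):
            yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    adj = {1: {2}, 2: {1, 3}, 3: {2}}
    complement = {v: set(adj) - adj[v] - {v} for v in adj}
    print(list(bron_kerbosch(complement)))    # maximal independent sets of adj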

Rearranging a graph so that certain nodes are not adjacent?

EDIT: Precisely, I am trying to find two disjoint independent sets of known size in a graph shaped like a triangular grid, which may have holes and has a variable perimeter shape.
I'm not very well versed in graph theory, so I'm not sure if there exists an efficient solution for this problem. Consider the following graphs:
The colors of any two nodes can be swapped. The goal is to ensure that no two red nodes are adjacent, and no two green nodes are adjacent. The edges marked with exclamation points are invalid. Basically, I need to write two algorithms:
Determine whether the nodes in a given graph can be arranged so that no node is adjacent to a node of the same color.
Actually rearrange the nodes.
I'm a little lost on how to implement this. It's not too difficult to separate the nodes of one color, but repeating the process for the second color may mess up the first color. Without a way to determine whether the graph can actually be arranged properly, this process could loop forever.
Is there some kind of algorithm that I can use/write for this? I'm mainly interested in the first image's graph (a triangular grid), but a generic algorithm would work as well.
First, let's note that the problem is a variant of graph coloring.
Now, if you are only dealing with 2 colors (red, green): coloring a graph with 2 colors is fairly easy, and basically amounts to checking whether the graph is bipartite and coloring each "side" of the graph with one color. Checking whether a graph is bipartite is simple.
However, if you want more than two colors, the problem becomes NP-Complete, and is actually a variant of the Graph Coloring Problem.
Graph Coloring Problem:
Given a graph G=(V,E) and a number k, determine if there is a
function c: V -> {1,2,...,k} such that c(v) = c(u) implies that (v,u) is not an edge.
Informally, you can color the graph in k colors, and you need to determine if there is some coloring such that you never color 2 nodes that share an edge with the same color.
Note that while your problem may seem slightly easier, since you already know how many nodes each color must have, it doesn't really make a difference.
Assume you have a polynomial time algorithm A that solves your problem.
Now, given an instance (G,k) of graph coloring (say k=3), there are only O(n^3) possible choices for the counts #color1, #color2, #color3, so by examining each of these and invoking A on it, you would get a polynomial time solution to Graph Coloring. This would mean P=NP, which is most likely (according to most CS researchers) not the case.
tl;dr:
For 2 colors: find out if the graph is bipartite and give one color to each side of the graph (a BFS sketch follows below).
For 3 or more colors: there is no known efficient solution, and the general belief is that one does not exist.
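A minimal sketch of the 2-color case (adjacency as a dict of sets is an assumed format): a BFS 2-coloring doubles as the bipartiteness test.

    # BFS 2-colouring / bipartiteness test; returns None if an odd cycle
    # makes a proper 2-colouring impossible.
    from collections import deque

    def two_color(adj):
        color = {}
        for start in adj:                     # handle disconnected graphs
            if start in color:
                continue
            color[start] = 0
            queue = deque([start])
            while queue:
                v = queue.popleft()
                for u in adj[v]:
                    if u not in color:
                        color[u] = 1 - color[v]
                        queue.append(u)
                    elif color[u] == color[v]:
                        return None           # odd cycle: not bipartite
        return color                          # vertex -> 0 (red) or 1 (green)

    print(two_color({1: {2}, 2: {1, 3}, 3: {2}}))  # {1: 0, 2: 1, 3: 0}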
I thought this problem would be easier for planar graphs, but unfortunately that's not the case. The best matches for this problem I was able to find are minimum sum coloring and largest bipartite subgraph.
For largest bipartite subgraph, assume that the number of reds plus the number of greens exactly matches the size of the largest bipartite subgraph. Then that problem is equivalent to yours. The paper claims it's still NP-hard even for planar graphs.
For minimum sum coloring, assume that the red color has weight 1, the green color has weight 2, and we have infinitely many "blue" colors, each with weight greater than the graph size. Then, if the answer is exactly the minimum sum coloring, there is no polynomial algorithm to find it (although the paper refers to such an algorithm for chordal graphs).
Anyway, it seems that the closer your red+green count is to the size of the "optimal" (in some sense) subgraph, the more difficult the problem is.
If you can afford an inexact solution, or a relaxed solution where you only separate, say, the reds, you have an option. As I said in a comment, approximately solve the maximum independent set problem for a planar graph; then split that set into red and green, if it is large enough.
If you know that red+green is much less than the total number of vertices, another approximation can work. Look at the introduction of this article. It claims that:
For graphs which are promised to have small chromatic number, a better
guarantee is known: given a k-colorable graph, an algorithm due to
Karger et al. [12] uses semidefinite programming (SDP) to find an
independent set of size about Ω(n/∆^(1−2/k)).
Your graph is certainly 4-colorable, so you can count on a large enough independent set. The same article states that a greedy algorithm can already find a large enough independent set; a sketch of that greedy follows.
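A sketch of that greedy heuristic (minimum-degree rule; the dict-of-sets adjacency format is an assumption):

    # Greedy independent set: repeatedly take a minimum-degree vertex and
    # delete it together with all of its neighbours.
    def greedy_independent_set(adj):
        adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
        chosen = set()
        while adj:
            v = min(adj, key=lambda u: len(adj[u]))       # min-degree vertex
            chosen.add(v)
            removed = adj[v] | {v}
            for u in removed:
                adj.pop(u, None)
            for nbrs in adj.values():
                nbrs -= removed
        return chosen

    print(greedy_independent_set({1: {2}, 2: {1, 3}, 3: {2}}))  # e.g. {1, 3}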

How to find the size of maximal clique or clique number?

Given an undirected graph G = G(V, E), how can I find the size of the largest clique in it in polynomial time? Knowing the number of edges, I could put an upper limit on the maximal clique size (see https://cs.stackexchange.com/questions/11360/size-of-maximum-clique-given-a-fixed-amount-of-edges), and then I could iterate downwards from that upper limit to 1. Since this upper cap is O(sqrt(|E|)), I think I can check for the maximal clique size in O(sqrt(|E|) * sqrt(|E|) * sqrt(|E|)) time.
Is there a more efficient way to solve this NP-complete problem?
The size of the largest clique in a graph is called the clique number of the graph, and finding it is known as the maximum clique problem (MCP). This is one of the most deeply studied problems in the graph domain and is known to be NP-hard, so no polynomial time algorithm is expected to be found for the general case (although particular graph configurations do have polynomial time algorithms). Maximum clique is even hard to approximate (i.e. to find a number close to the clique number).
If you are interested in exact MCP algorithms, there have been a number of important improvements in the past decade, which have increased performance by around two orders of magnitude. The current leading family of algorithms is branch and bound, using approximate coloring to compute bounds. The most important ones and their key improvements are:
Branching on color (MCQ)
Static initial ordering in every subproblem (MCS and BBMC)
Recoloring: MCS
Use of bit strings to encode the graph and the main operations (BBMC)
Reduction to maximum satisfiability to improve bounds (MaxSAT)
Selective coloring (BBMCL)
and others.
It is actually a very active line of research in the scientific community.
The top algorithms are currently BBMC, MCS and I would say MaxSAT. Of these probably BBMC and its variants (which use a bit string encoding) are the current leading general purpose solvers. The library of bitstrings used for BBMC is publicly available.
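To give a flavour of the bit-string idea (an illustration only, not the actual BBMC library code): with each adjacency row packed into an integer, intersecting candidate sets becomes a single bitwise AND.

    # Illustration only: encode each adjacency row as a Python integer so
    # that intersecting candidate sets is one bitwise AND.
    n = 4
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]      # hypothetical small graph

    row = [0] * n
    for u, v in edges:
        row[u] |= 1 << v
        row[v] |= 1 << u

    # candidates that extend the clique {0, 1}: common neighbours of 0 and 1
    candidates = row[0] & row[1]
    print([v for v in range(n) if candidates >> v & 1])   # -> [2]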
Well I was thinking a bit about some dynamic programming approach and maybe I figured something out.
First: find nodes with very low degree (can be done in O(n)). Test whether they are part of any clique, then remove them. With a little "luck" you can break the graph into a few separate components and then solve each one independently (which is much, much faster).
(Identifying the components takes O(n) time.)
Second: for each component, you can check whether it even makes sense to look for a clique of a given size. How? Let's say you want to find a clique of size 19. Then there have to be at least 19 nodes of degree at least 18 (each member of a 19-clique is adjacent to the other 18 members). Otherwise such a clique cannot exist and you don't have to test for it.
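A sketch of that degree-pruning test (the dict-of-sets adjacency format is an assumption); repeatedly discarding vertices of degree below k-1 is essentially a (k-1)-core computation:

    # A clique of size k needs at least k vertices of degree >= k - 1, so
    # repeatedly discard vertices that are too low-degree and see what's left.
    def could_contain_clique(adj, k):
        adj = {v: set(nbrs) for v, nbrs in adj.items()}
        changed = True
        while changed:
            changed = False
            for v in [v for v, nbrs in adj.items() if len(nbrs) < k - 1]:
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
        return len(adj) >= k                  # False => no k-clique can exist

    triangle_plus_tail = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(could_contain_clique(triangle_plus_tail, 3))   # True
    print(could_contain_clique(triangle_plus_tail, 4))   # False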

Find representative vertices in a graph

For a computer vision project I have N points in a high-dimensional space. I want to select k of them that will be "the most distinguishable" from each other. For example, it could mean that the sum of distances between the chosen points is maximum, or that the volume of the polyhedron they span is maximum. But generally anything that has some intuition behind it can work.
As expected I want to find these representative points.
There are two questions:
Which definition of "the most distinguishable" points is more commonly used? Does the choice change the algorithm used to find those points?
What is the algorithm to find the points? It strongly reminds me of the maximum weighted clique problem. Is it NP-hard? If so, can we find a good approximation to the optimal solution?
The way you define "the most distinguishable" will definitely affect the algorithm you'll want to use. For example, you can define "the most distinguishable" as the set with the maximal sum of distances between any two points in the set, but you could also define it as the set with the maximal minimum distance between any two points. These are two completely different problems.
As for algorithms, as I've said, that depends on your definition. If you're looking to find the K farthest points, you should look into this question. That problem is NP-complete, but you may get some ideas about how to approach yours.
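For the "maximal minimum distance" reading, one standard heuristic (a hedged sketch, not necessarily what the linked question proposes) is farthest-point greedy selection; when the distances satisfy the triangle inequality it is known to give a constant-factor approximation of the max-min objective.

    # Farthest-point greedy: start from an arbitrary point and repeatedly add
    # the point farthest from the current selection.
    import numpy as np

    def farthest_point_selection(points, k):
        points = np.asarray(points, dtype=float)
        chosen = [0]                                    # arbitrary starting point
        dist = np.linalg.norm(points - points[0], axis=1)
        for _ in range(k - 1):
            nxt = int(np.argmax(dist))                  # farthest from selection
            chosen.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
        return chosen

    pts = np.random.rand(100, 64)                       # 100 points in 64-D
    print(farthest_point_selection(pts, 5))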

Find the extreme for priority function / alphabet order

We have an array of elements a1, a2, ..., aN over an alphabet E. Assume N >> |E|.
For each symbol of the alphabet we define a unique integer priority V(sym). Let's define V{i} := V(symbol(ai)) for simplicity.
How can I find the priority function V for which:
Count{ i : V{i} < V{i+1} } -> MAX
In other words, I need to find the priorities / permutation of the alphabet for which the number of positions i, satisfying the condition V{i}<V{i+1}, is maximum.
Edit-1: Please, read carefully. I'm given an array ai and the task is to produce a function V. It's not about sorting the input array with a priority function.
Edit-2: Example
E = {a,b,c}; A = 'abcab$'; (here $ = artificial termination symbol, V{$}=+infinity)
One of the optimal priority functions is V{a}=1, V{b}=2, V{c}=3, which gives the following signs between array elements: a<b<c>a<b<$, resulting in 4 '<' signs out of 5 total.
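To make the example concrete: for a tiny alphabet you can simply brute-force all |E|! priority assignments (a sketch, feasible only for small alphabets); it reproduces the 4-out-of-5 count above.

    # Brute force over all |E|! priority assignments. The virtual '$'
    # terminator has priority +infinity, so the final comparison always
    # counts as '<' (the "+ 1" below).
    from itertools import permutations

    def ascents(a, V):
        return sum(V[a[i]] < V[a[i + 1]] for i in range(len(a) - 1)) + 1

    A = "abcab"
    alphabet = sorted(set(A))
    best = max(permutations(range(1, len(alphabet) + 1)),
               key=lambda p: ascents(A, dict(zip(alphabet, p))))
    V = dict(zip(alphabet, best))
    print(V, ascents(A, V))    # e.g. {'a': 1, 'b': 2, 'c': 3} 4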
If elements could not have tied priorities, this would be trivial. Just sort by priority. But you can have equal priorities.
I would first sort the alphabet by priority. Then I'd extract the longest rising subsequence. That is the start of your answer. Extract the longest rising subsequence from what remains. Append that to your answer. Repeat the extraction process until the entire alphabet has been extracted.
I believe that this gives an optimal result, but I haven't tried to prove it. If it is not perfectly optimal, it still will be pretty good.
Now that I think I understand the problem, I can tell you that there is no good algorithm to solve it.
To see this, let us first construct a directed graph whose vertices are your symbols, and whose weighted edges record how many times one symbol immediately preceded another. You can create a priority function by dropping enough edges to get a directed acyclic graph, using the remaining edges to create a partially ordered set, and then adding order relations until you have a full linear order, from which you can trivially read off a priority function. All of this is straightforward once you have figured out which edges to drop. Conversely, given that directed graph and your final priority function, it is easy to figure out which set of edges you had to decide to drop.
Therefore your problem is entirely equivalent to figuring out a minimal set of edges you need to drop from that directed graph to get a directed acyclic graph. However, as http://en.wikipedia.org/wiki/Feedback_arc_set says, this is a known NP-hard problem called minimum feedback arc set. Update: it is therefore very unlikely that there is a good algorithm for the graphs you can come up with.
If you need to solve it in practice, I'd suggest going for some sort of greedy algorithm. It won't always be right, but it will generally give somewhat reasonable results in reasonable time.
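One such greedy, sketched on top of the pair-count digraph described above (the particular scoring rule is my own choice, not something proposed in the answer): repeatedly output the symbol whose remaining outgoing weight most exceeds its incoming weight. It is not guaranteed optimal.

    # Build the weighted digraph of adjacent-pair counts, then greedily emit
    # the symbol with the best (outgoing - incoming) weight among the
    # symbols not yet placed.
    from collections import Counter

    def greedy_priority(a):
        weight = Counter(zip(a, a[1:]))                 # (x, y) -> count of "xy"
        symbols = set(a)
        order = []
        while symbols:
            def score(s):
                out_w = sum(c for (x, y), c in weight.items()
                            if x == s and y in symbols)
                in_w = sum(c for (x, y), c in weight.items()
                           if y == s and x in symbols)
                return out_w - in_w
            best = max(symbols, key=score)
            order.append(best)
            symbols.remove(best)
        return {s: i + 1 for i, s in enumerate(order)}  # earlier symbol => lower V

    print(greedy_priority("abcab"))   # e.g. {'a': 1, 'b': 2, 'c': 3}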
Update: Moron is correct, I did not prove NP-hard. However there are good heuristic reasons to believe that the problem is, in fact, NP-hard. See the comments for more.
There's a trivial reduction from Minimum Feedback Arc Set on directed graphs whose arcs can be arranged into an Eulerian path. I quote from http://www14.informatik.tu-muenchen.de/personen/jacob/Publications/JMMN07.pdf :
To the best of our knowledge, the complexity status of minimum feedback arc set in such graphs is open. However, by a lemma of Newman, Chen, and Lovász [1, Theorem 4], a polynomial algorithm for [this problem] would lead to a 16/9 approximation algorithm for the general minimum feedback arc set problem, improving over the currently best known O(log n log log n) algorithm [2].
[1] Newman, A.: The maximum acyclic subgraph problem and degree-3 graphs. In: Proceedings of the 4th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX. LNCS (2001) 147–158
[2] Even, G., Naor, J., Schieber, B., Sudan, M.: Approximating minimum feedback sets and multi-cuts in directed graphs. In: Proceedings of the 4th International Conference on Integer Programming and Combinatorial Optimization. LNCS (1995) 14–28

Efficient minimal spanning tree in metric space

I have a large set of points (n > 10000 in number) in some metric space (e.g. equipped with Jaccard Distance). I want to connect them with a minimal spanning tree, using the metric as the weight on the edges.
Is there an algorithm that runs in less than O(n^2) time?
If not, is there an algorithm that runs in less than O(n^2) average time (possibly using randomization)?
If not, is there an algorithm that runs in less than O(n^2) time and gives a good approximation of the minimum spanning tree?
If not, is there a reason why such an algorithm can't exist?
Thank you in advance!
Edit for the posters below:
Classical algorithms for finding a minimal spanning tree don't work here. They have an E factor in their running time, but in my case E = n^2 since I actually consider the complete graph. I also don't have enough memory to store all the >49995000 possible edges.
Apparently, according to this paper, Estimating the weight of metric minimum spanning trees in sublinear time, there is no deterministic o(n^2) algorithm (note: little-o, which is probably what you meant by "less than O(n^2)"). That paper also gives a sublinear randomized algorithm for the metric minimum weight spanning tree.
Also look at this paper, An optimal minimum spanning tree algorithm, which gives an optimal algorithm. The paper also claims that the complexity of that optimal algorithm is not yet known!
The references in the first paper should be helpful and that paper is probably the most relevant to your question.
Hope that helps.
When I was looking at a very similar problem 3-4 years ago, I could not find an ideal solution in the literature I looked at.
The trick I think is to find a "small" subset of "likely good" edges, which you can then run plain old Kruskal on. In general, it's likely that many MST edges can be found among the set of edges that join each vertex to its k nearest neighbours, for some small k. These edges might not span the graph, but when they don't, each component can be collapsed to a single vertex (chosen randomly) and the process repeated. (For better accuracy, instead of picking a single representative to become the new "supervertex", pick some small number r of representatives and in the next round examine all r^2 distances between 2 supervertices, choosing the minimum.)
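A sketch of that candidate-edge idea, under some assumptions: scikit-learn/scipy are available, the example uses Euclidean points rather than Jaccard distance (the neighbour index is the part you would swap for your metric), and the collapse-and-repeat round is omitted, so the result may be a spanning forest rather than the exact MST.

    # Keep only each point's k nearest neighbours as candidate edges, then
    # run an exact MST on that sparse candidate graph.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

    rng = np.random.default_rng(0)
    X = rng.random((10000, 16))                   # 10k points, 16-D

    k = 10
    knn = NearestNeighbors(n_neighbors=k).fit(X)
    W = knn.kneighbors_graph(mode="distance")     # sparse n x n, k entries per row
    W = W.maximum(W.T)                            # symmetrise candidate edges

    mst = minimum_spanning_tree(W)                # exact MST of the sparse graph
    n_components, _ = connected_components(mst, directed=False)
    print(mst.nnz, "edges;", n_components, "component(s)")

If connected_components reports more than one component, that is where the collapse-and-repeat step described above would come in.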
k-nearest-neighbour algorithms are quite well studied for the case where objects can be represented as vectors in a finite-dimensional Euclidean space, so if you can find a way to map your objects down to that (e.g. with multidimensional scaling) then you may have luck there. In particular, mapping down to 2D allows you to compute a Voronoi diagram, and MST edges will only ever connect points whose Voronoi cells are adjacent (the MST is a subgraph of the Delaunay triangulation). But from what little I've read, this approach doesn't always produce good-quality results.
Otherwise, you may find clustering approaches useful: Clustering large datasets in arbitrary metric spaces is one of the few papers I found that explicitly deals with objects that are not necessarily finite-dimensional vectors in a Euclidean space, and which gives consideration to the possibility of computationally expensive distance functions.
