Maximum Independent Set Algorithm - algorithm

I don't believe there exists an algorithm for finding the maximum independent vertex set in a bipartite graph other than the brute force method of finding the maximum among all possible independent sets.
I am wondering about the pseudocode to find all possible vertex sets.
Say I'm given a bipartite graph with 4 blue vertices and 4 red ones. Currently I would:
Start with an arbitrary blue,
find all red that don't match (are not adjacent to) this blue,
put all these red in the independent set,
find all blue that don't match these red,
put these blue in the independent set,
Repeat for the next vertex in blue.
Repeat all over again for all blue, then for all vertices in red.
I understand that this way doesn't give me all possible independent set combinations at all, since after the first step I am choosing all of the next colour's vertices that don't match, rather than stepping through every possibility.
For example, given a graph with the following blue–red edges:
B R
1 1
1 3
2 1
2 3
3 1
3 3
4 2
4 4
Start with blue 1.
Choose red 2 and 4, since they don't match blue 1.
Add red 2 and 4 to the independent set.
Choose 2 and 3 from blue, since they don't match 2 or 4 from red.
Add 2 and 3 from blue to the independent set as well.
Independent set = {1, 2, 3} from blue, {2, 4} from red.
Repeat for blue 2, blue 3, ..., red n (storing the cardinality of each set).
Is there a way I can improve this algorithm to better search for all possibilities? I know that for a bipartite graph, |maximum independent set| = |Red| + |Blue| - |maximum matching|.
The problem is that by choosing all possible red vertices in the first pass for a given blue, if those reds connect to all the other possible blues, then my set only ever contains 1 blue and the rest red.
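For reference, a brute-force enumeration that really does step through every possibility could look like the following sketch (my own illustration, not from the question; the labels b1..r4 encode the example edges above):

```python
# Brute-force sketch: try every subset of the vertices, keep the independent
# ones, and track the largest. Exponential, but it covers every possibility.

from itertools import combinations

def all_independent_sets(vertices, edges):
    """Yield every independent subset of `vertices` (edges is a set of pairs)."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if all((u, v) not in edges and (v, u) not in edges
                   for u, v in combinations(subset, 2)):
                yield subset

# The graph from the example above: blue b1..b4, red r1..r4.
edges = {("b1", "r1"), ("b1", "r3"), ("b2", "r1"), ("b2", "r3"),
         ("b3", "r1"), ("b3", "r3"), ("b4", "r2"), ("b4", "r4")}
vertices = ["b1", "b2", "b3", "b4", "r1", "r2", "r3", "r4"]
best = max(all_independent_sets(vertices, edges), key=len)
print(best)   # a maximum independent set, e.g. ('b1', 'b2', 'b3', 'r2', 'r4')
```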

"I don't believe there exists an algorithm for finding the maximum independent vertex set in a bipartite graph other than the brute force method of finding the maximum among all possible independent sets."
There is: finding a maximum independent set is equivalent to finding a minimum vertex cover (take the complement of the result), and Kőnig's theorem states that in bipartite graphs a minimum vertex cover corresponds to a maximum matching, which can be found in polynomial time. I don't know about enumerating all independent sets, but it seems there can be exponentially many.

As Aaron McDaid mentions in his now-deleted answer, the problem of finding a maximum independent set is equivalent to finding a maximum clique. The equivalence is that finding a maximum independent set in a graph G is the same as finding a maximum clique in the complement of G. The problem is known to be NP-complete.
The brute force solution you mention takes O(n^2 2^n), but you can do better than this. Robson has a paper entitled "Algorithms for maximum independent sets" from 1986 that gives an algorithm that takes O(2^{c·n}) for a constant 0 < c < 1 (I believe c is around 1/4, but I could be mistaken). None of this is specific to a bipartite graph.
One thing to note if you have a bipartite graph is that either side forms an independent set. In the complete bipartite graph K_{b,r}, partitioned into parts B and R with |B| = b and |R| = r, where there is an edge from every vertex in B to every vertex in R and none within B or within R, a maximum independent set is B if b >= r, otherwise it is R.
Taking B or R won't work in general. sdcvvc noted the example with vertices {1,2,3,A,B,C} and edges {A,1}, {A,2}, {A,3}, {B,3}, {C,3}. In this case, the maximum independent set is {1,2,B,C}, which is larger than either partition {A,B,C} or {1,2,3}.
It may improve Robson's algorithm to start with the larger of B or R and proceed from there, though I'm not sure how much of a difference this will make.
Alternatively, since the graph is bipartite, you can run the Hopcroft–Karp algorithm to find a maximum matching, derive a minimum vertex cover from it via Kőnig's theorem, and take the complement of that cover as a maximum independent set. This gives a polynomial-time algorithm for the bipartite case.
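A sketch of that approach (my own code, not from the answer): it uses a simple augmenting-path matching instead of Hopcroft–Karp for brevity, then applies the Kőnig construction; the labels b1..r4 are just for illustration.

```python
def max_independent_set(blue, red, edges):
    """blue, red: lists of vertex labels; edges: set of (blue, red) pairs."""
    adj = {b: [r for r in red if (b, r) in edges] for b in blue}

    # Maximum matching via repeated augmenting paths (Kuhn's algorithm).
    match_r = {}                        # red vertex -> matched blue vertex
    def augment(b, seen):
        for r in adj[b]:
            if r not in seen:
                seen.add(r)
                if r not in match_r or augment(match_r[r], seen):
                    match_r[r] = b
                    return True
        return False
    for b in blue:
        augment(b, set())
    matched_blue = set(match_r.values())

    # Kőnig's construction: from every unmatched blue vertex, alternately
    # follow non-matching edges (blue -> red) and matching edges (red -> blue).
    visited_b = {b for b in blue if b not in matched_blue}
    visited_r = set()
    stack = list(visited_b)
    while stack:
        b = stack.pop()
        for r in adj[b]:
            if r not in visited_r:
                visited_r.add(r)
                nb = match_r.get(r)     # r is matched here, else the matching wasn't maximum
                if nb is not None and nb not in visited_b:
                    visited_b.add(nb)
                    stack.append(nb)

    # Minimum vertex cover = (unvisited blue) + (visited red);
    # the maximum independent set is everything else.
    return [b for b in blue if b in visited_b] + [r for r in red if r not in visited_r]

# The example from the question: blue b1..b4, red r1..r4.
edges = {("b1", "r1"), ("b1", "r3"), ("b2", "r1"), ("b2", "r3"),
         ("b3", "r1"), ("b3", "r3"), ("b4", "r2"), ("b4", "r4")}
print(max_independent_set(["b1", "b2", "b3", "b4"],
                          ["r1", "r2", "r3", "r4"], edges))
# expected: ['b1', 'b2', 'b3', 'r2', 'r4'] — size 5 = 4 + 4 - 3
```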

Related

Maximum weight independent set with divide and conquer

I was reading about the Maximum Weight Independent Set problem, which is:
Input: An undirected graph G = (V, E) and a non-negative weight w_v for
each vertex v ∈ V
Output: An independent set S ⊆ V of G with the maximum possible sum
Σ_{v ∈ S} w_v of vertex weights
and that same source (not the SO post) mentions that the problem can be solved by 4 recursive calls with a divide & conquer approach.
I googled but couldn't find such an algorithm. Does anyone have an idea of how this would be solved by divide & conquer? I do understand that the running time is much worse than with dynamic programming; I am just curious about the approach.
I understand that part of the manuscript to mean that only path ("line") graphs are taken into consideration. In that case, I believe the footnote means the following.
If a larger such graph is taken as input and split (say at an edge incident to the nodes a and b), there are four cases to consider in order to have a proper combination step.
You would have to solve the "left" branch for the cases
a is included in the maximum independent set
a is not included in the maximum independent set
and the same goes for the "right" branch. In total there are four possible ways to combine the results, out of which a maximal one is to be returned.
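Here is a small sketch of how I read that four-call scheme, assuming the input is a path graph (my own code, not the manuscript's; the combination that would take both endpoints of the split edge is dropped, since they are adjacent):

```python
# Divide & conquer max-weight independent set on a path 0 - 1 - ... - n-1.
# The forbid flags say whether an endpoint of the current subpath may be used.

def mwis(weights, lo, hi, forbid_left=False, forbid_right=False):
    if lo > hi:
        return 0
    if lo == hi:
        return 0 if (forbid_left or forbid_right) else weights[lo]
    mid = (lo + hi) // 2                      # split at the edge (mid, mid+1)

    # Four recursive calls: left half with vertex `mid` allowed / forbidden,
    # right half with vertex `mid+1` allowed / forbidden.
    left_allow   = mwis(weights, lo, mid, forbid_left, False)
    left_forbid  = mwis(weights, lo, mid, forbid_left, True)
    right_allow  = mwis(weights, mid + 1, hi, False, forbid_right)
    right_forbid = mwis(weights, mid + 1, hi, True, forbid_right)

    # Combine: the cut edge (mid, mid+1) rules out allowing both endpoints,
    # so at least one side must have its endpoint forbidden.
    return max(left_allow + right_forbid,
               left_forbid + right_allow,
               left_forbid + right_forbid)

print(mwis([1, 8, 6, 3, 6], 0, 4))  # expected 14 (take the vertices weighted 8 and 6)
```

As the question notes, this is exponentially slower than the usual dynamic programming solution; it is only meant to illustrate the four-call combination step.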

Minimum number of flips to get adjacent 1's in a matrix

Given a binary matrix (values of 0 or 1), adjacent entries of 1 denote “hills”. Also, given some number k, find the minimum number of 0's you need to “flip” to 1 in order to form a hill of at least size k.
Edit: For clarification, adjacent means left-right-up-down neighborhoods. Diagonals do not count as adjacent. For example,
[0 1
0 1]
is one hill of size 2,
[0 1
1 0]
defines 2 hills of size 1,
[0 1
1 1]
defines 1 hill of size 3, and
[1 1
1 1]
defines 1 hill of size 4.
Also for clarification, size is defined by the area formed by the adjacent blob of 1's.
My initial solution transforms each existing hill into a node of a graph, with the cost of an edge being the minimal number of flips needed to reach the other node. Then I perform a DFS (or similar algorithm) to find the minimum cost.
This fails in cases where choosing some path reduces the cost for another edge, and solutions to combat this (that I can think of) are too close to a brute force solution.
Your problem is closely related to the rectilinear Steiner tree problem.
A Steiner tree connects a set of points together using line segments, minimising the total length of the line segments. The line segments can meet in arbitrary places, not necessarily at points in the set (so it is not the same thing as a minimum spanning tree). For example, given three points at the corners of an equilateral triangle, the Euclidean Steiner tree connects them with three segments meeting at a central Steiner point.
A rectilinear Steiner tree is the same, except you minimise the total Manhattan distance instead of the total Euclidean distance.
In your problem, instead of joining your hills with line segments whose length is measured by Euclidean distance, you are joining your hills by adding pixels. The total number of 0s you need to flip to join two cells in your array is equal to the Manhattan distance between those two cells, minus 1.
The rectilinear Steiner tree problem is known to be NP-complete, even when restricted to points with integer coordinates. Your problem is essentially that problem, with two differences:
The "minus 1" part when measuring the Manhattan distance. I doubt that this subtle difference is enough to bring the problem into a lower complexity class, though I don't have a proof for you.
The coordinates of your integer points are bounded by the size of the matrix (as pointed out by Albert Hendriks in the comments). This does matter — it means that pseudo-polynomial time for the rectilinear Steiner tree problem would be polynomial time for your problem.
This means that your problem may or may not be NP-hard, depending on whether the rectilinear Steiner tree problem is weakly NP-complete or strongly NP-complete. I wasn't able to find a definitive answer to this in the literature, and there isn't much information about the problem other than in academic literature. It does at least appear that there isn't a known pseudo-polynomial time algorithm, as far as I can tell.
Given that, your most likely options are some kind of backtracking search for an exact solution, or applying a heuristic to get a "good enough" solution. One possible heuristic as described by Wikipedia is to compute a rectilinear minimum spanning tree and then try to improve on the RMST using an iterative improvement method. The RMST itself gives a solution within a constant factor of 1.5 of the true optimum.
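As a rough illustration of the RMST idea (my own sketch, not part of the answer): it connects all hills with a minimum spanning tree under the "Manhattan distance minus 1" cost. Note the original problem only asks for a hill of at least size k, so in practice you would still need some search over which hills to connect on top of something like this.

```python
def hills(matrix):
    """Return the hills as lists of (row, col) cells, using 4-connectivity."""
    seen, comps = set(), []
    rows, cols = len(matrix), len(matrix[0])
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j] == 1 and (i, j) not in seen:
                stack, comp = [(i, j)], []
                seen.add((i, j))
                while stack:
                    r, c = stack.pop()
                    comp.append((r, c))
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and matrix[nr][nc] == 1 and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
                comps.append(comp)
    return comps

def mst_flip_estimate(matrix):
    """Estimate the flips needed to join all hills (Prim's algorithm over hills)."""
    hs = hills(matrix)
    if len(hs) <= 1:
        return 0
    def cost(a, b):   # fewest flips to join two hills: min Manhattan distance - 1
        return min(abs(r1 - r2) + abs(c1 - c2)
                   for r1, c1 in a for r2, c2 in b) - 1
    in_tree, total = {0}, 0
    while len(in_tree) < len(hs):
        c, k = min((cost(hs[i], hs[j]), j)
                   for i in in_tree for j in range(len(hs)) if j not in in_tree)
        in_tree.add(k)
        total += c
    return total

print(mst_flip_estimate([[0, 1], [1, 0]]))   # the two size-1 hills need 1 flip to join
```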
A hill can be seen as composed of four sequences of 1's extending from a central cell:
The right sequence is composed of r 'bits', the up sequence has u bits, and so on.
A hill of size k then satisfies k = 1 + r + l + u + d (1 central cell plus the four sequences), where each value satisfies 0 <= v < k.
The problem is combinatorial: for each cell, all possible combinations of {r, l, u, d} that satisfy this relation should be tested.
When testing a combination in a cell, count the existing 1's along each arm of the combination; those don't need to be flipped. This also lets you skip some combinations early.
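A sketch of that enumeration (my own reading of the answer; it only considers plus-shaped hills of exactly size k, so it is at best a heuristic for the general problem):

```python
# For each cell, try every split k = 1 + r + l + u + d and count how many of
# the cells along the arms are currently 0 (those are the flips needed).

def min_flips_plus_shape(matrix, k):
    rows, cols = len(matrix), len(matrix[0])
    best = None
    for i in range(rows):
        for j in range(cols):
            for r in range(k):
                for l in range(k - r):
                    for u in range(k - r - l):
                        d = k - 1 - r - l - u
                        # Skip arm combinations that leave the matrix.
                        if j + r >= cols or j - l < 0 or i - u < 0 or i + d >= rows:
                            continue
                        cells = ([(i, j)]
                                 + [(i, j + x) for x in range(1, r + 1)]
                                 + [(i, j - x) for x in range(1, l + 1)]
                                 + [(i - y, j) for y in range(1, u + 1)]
                                 + [(i + y, j) for y in range(1, d + 1)])
                        flips = sum(1 for (a, b) in cells if matrix[a][b] == 0)
                        if best is None or flips < best:
                            best = flips
    return best

print(min_flips_plus_shape([[0, 1], [1, 1]], 3))  # already a hill of size 3 -> 0
```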

Divide points into sets of maximum distance

I have a list of GPS points... but what I am asking for could also work for any X,Y coordinates.
In my use case, I want to assign points to sets. Each point can belong to only one set, and each set has the condition that the distance between any two points in the set is not greater than some constant... that means all points of the set fit within a circle of a specific diameter.
For a list of points, I want to find the best (or at least some) arrangement in which the number of sets is minimal.
There will be sets with just a single point, because other points around them are already in different sets, or simply because there are no points around (the distance to them is greater than the set's limit)... what I want to avoid is an inefficient assignment where, e.g., instead of finding the ideal 2 sets, each having 30 points, I find 5 sets, one with 1 point, another with 40 points, etc.
All I'm capable of is a brute-force solution: compute all distances, build all possible set arrangements, sort them by the number of sets, and pick the one with the fewest sets.
Is there a better approach?
The problem here is NP-complete. What you are trying to solve is the maximum clique problem combined with a set cover problem.
Your problem can be represented as a graph G = (V, E), where the vertices are your points and the edges carry the pairwise distances. This graph can be built in O(n^2) time. Then you filter out all edges with a distance greater than your constant, giving the graph G'.
With the remaining graph G' you want to find all cliques (effectively solving max-clique). A clique is a fully connected set of vertices. Call this list of cliques S.
Now finding a minimal set of elements of S that cover all vertices V is the set cover problem.
Both the set cover problem and the maximum clique problem are NP-complete, and therefore finding an optimal solution would take exponential time. You could look at approximation algorithms for these two problems.
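For a quick baseline, here is a simple greedy heuristic sketch (my own, not part of the answer); it is polynomial but gives no optimality guarantee:

```python
# Greedy grouping: repeatedly start a new set from an unassigned point and add
# any unassigned point whose distance to every current member stays within the
# limit. Not optimal in general, but often reasonable in practice.

from math import hypot

def greedy_sets(points, limit):
    """points: list of (x, y); limit: maximum allowed pairwise distance."""
    unassigned = list(range(len(points)))
    sets = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for i in unassigned[:]:
            if all(hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1]) <= limit for j in group):
                group.append(i)
                unassigned.remove(i)
        sets.append([points[i] for i in group])
    return sets

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(greedy_sets(pts, 2.0))   # expect two sets: three points near the origin, two near (10, 10)
```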

Is there any algorithm which can solve this number group sequence?

I have a problem which is described below. Do you have any good solution, or is this problem just another form of some "classic", already-solved problem?
The problem is:
There are some groups of numbers, e.g.
A(1 8 9)
B(1 4 5)
C(2 4 6)
D(3 4 7)
E(2 10 11)
F(3 12 13)
There are "A-F" six group. we have numbers "1,2,3,4,5,6,7,8,9,10,11,12,13".
Now find the minimum amount of number set which satisfies each group must have a number in this set at least. For examlpe, we can find the set "1 4 2 13 12" that A has "1",B has "1,4",C has "2,4",D has "4" ,E has "2",F has "12,13" .
But set "1 2 4" is not that we find ,F does not has any number in the set.
The best set is "1,2,3",every gruop has a number in the set,and the size of the set is optimal. It has only three numbers. THIS is What we want. If there are many best sets,finding any one is OK. Thanks.
This is equivalent to the set cover problem. In this case, each of your groups A, B, ..., F is an element of the set cover instance, and each of the numbers 1, 2, ..., 13 is a set. For example, in this mapping 1 becomes the set {A, B}, and 11 becomes the set {E}.
Set cover is NP-hard. The integer linear programming formulation on the linked Wikipedia page is probably as good as you'll get for exact solutions; the greedy algorithm there is a decent approximation for large problems.
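To illustrate the mapping and the greedy approximation on the example groups (my own sketch; note that here the greedy answer has four numbers, one more than the optimal {1, 2, 3}):

```python
# Greedy set cover applied to the question's example: each number "covers" the
# groups it appears in, and we repeatedly pick the number covering the most
# still-uncovered groups.

groups = {
    "A": {1, 8, 9}, "B": {1, 4, 5}, "C": {2, 4, 6},
    "D": {3, 4, 7}, "E": {2, 10, 11}, "F": {3, 12, 13},
}

def greedy_hitting_set(groups):
    numbers = set().union(*groups.values())
    uncovered = set(groups)          # group names not yet hit
    chosen = []
    while uncovered:
        # Pick the number appearing in the most uncovered groups.
        best = max(numbers, key=lambda n: sum(n in groups[g] for g in uncovered))
        chosen.append(best)
        uncovered = {g for g in uncovered if best not in groups[g]}
    return chosen

print(greedy_hitting_set(groups))
# Greedy picks 4 first (it hits B, C, D), then one number each for A, E and F:
# a set of size 4, while the optimum {1, 2, 3} has size 3.
```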
This problem is NP-hard via a reduction from the NP-hard vertex cover problem (given a graph, can you find a set of at most k nodes such that every edge in the graph has at least one of its endpoints among the chosen nodes?).
The reduction is as follows. Number all of the nodes in the graph 1, 2, 3, ..., n in any order that you'd like. Then, for each edge in the graph, construct the set containing just two numbers: the edge's endpoints. If there is a k-node vertex cover in the original graph, then there is a set of k numbers you can pick (namely, the nodes in the vertex cover) such that you have one number chosen from each set. This construction can be computed in polynomial time.
To see why the reduction works, note that if there is a set of size k you can pick such that each set in the construction has at least one element picked, then the vertices corresponding to those numbers form a k-element vertex cover in the original graph.
This reduction can be done in polynomial time, so we have a polynomial-time reduction from the NP-hard vertex cover problem to your problem. Thus this problem is NP-hard, and unless P = NP, there is no polynomial-time algorithm for it.
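A tiny illustration of the construction, on a hypothetical graph of my own choosing:

```python
# Hypothetical 4-node graph with edges (1,2), (1,3), (2,4), (3,4).
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
groups = [set(e) for e in edges]     # [{1, 2}, {1, 3}, {2, 4}, {3, 4}]
# Choosing one number per group is exactly covering every edge with a chosen
# endpoint: {1, 4} (a vertex cover of size 2) hits all four groups, and no
# single number does.
```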
Hope this helps!

Algorithm/approximation for combined independent set/hamming distance

Input: Graph G
Output: several independent sets, such that each node's membership across all the sets is unique. A node therefore has no connections to any node in its own set. An example is given below.
Since clarification was called for, here is another rephrasing:
Divide a given graph into sets so that:
I can tell a node from all others by its membership in the sets, e.g. if node i is present only in set A, no other node should be present in set A only;
if node j is present in sets A and B, then no other node should be present in sets A and B only. If the membership of the nodes is coded as a bit pattern, then these bit patterns have Hamming distance at least one;
if two nodes are adjacent in the graph, they should not be present in the same set; each set is hence an independent set.
Example:
B has no adjacent nodes
D=>A, A=>D
Solution:
Set 1: A, B
Set 2: B, D
A has bit pattern 10 and no adjacent node in its set; B has bit pattern 11 and no adjacent node; D has 01.
Therefore all nodes have Hamming distance at least 1 and no adjacent nodes in their sets => correct.
Wrong, because D and A are connected:
Set 1: A, D, B
Set 2: D, B
A has bit pattern 10 and D in its set, and they are adjacent. B has bit pattern 11 and no adjacent node, but D also has 11, the same as B, so there are two errors in this solution and therefore it is not accepted.
Of course this should be extended to more sets as the number of nodes in the graph increases, since you need at least log(n) sets.
I already wrote a transformation into MAX-SAT in order to use a SAT solver for this, but the number of clauses is just too big. A more direct approach would be nice. So far I have an approximation, but I would like an exact solution or at least a better approximation.
I have tried an approach where I used a particle swarm to optimise from an arbitrary solution towards a better one. However, the running time is pretty awful and the results are far from great. I am looking for a dynamic programming algorithm or something similar, but I cannot fathom how to divide and conquer this problem.
Not a complete answer, and I don't know how useful it will be to you. But here goes:
The Hamming distance strikes me as a red herring. Your problem statement says it must be at least 1, but it could be 1000. It suffices to say that the bit encoding for each node's set memberships is unique.
Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least one set, i.e. a bit encoding of all 0's is not allowed for any node's set memberships.
Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.
Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, then D cannot be in the same set as A regardless of whether D=>A.
You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.
In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.
In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.
It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.
The first large simplex found is easy. Every vertex node gets a new set with its own bit.
The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.
The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.
Consider what happens to your example above when one adds another node, C, to the example:
If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,C),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for the disjoint node B is 011.
If, on the other hand, C connects directly to A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first, giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to, so we have to create a new set and put C in it. At this point, the solution is similar to the one above.
If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.
Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.
Suppose C connects to all 3 A, B and D. In this case, the graph has a 2-simplex with 3 vertexes {A,C,D} and a 1-simplex with 2 vertexes {B, C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C so B would get the bit pattern 101.
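Here is a small greedy sketch of the bit-assignment idea (my own; it processes higher-degree nodes first, as suggested above, but does not explicitly search for simplexes, so it may use more sets than strictly necessary):

```python
def assign_patterns(nodes, edges):
    """Give each node a nonzero, unique bit pattern such that adjacent nodes
    share no set (their patterns have empty bitwise intersection)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:                       # edges treated as undirected
        adj[a].add(b)
        adj[b].add(a)
    used, pattern = set(), {}
    # Connected (higher-degree) nodes first; disjoint nodes pick up leftovers.
    for n in sorted(nodes, key=lambda v: -len(adj[v])):
        p = 1
        while p in used or any(p & pattern[m] for m in adj[n] if m in pattern):
            p += 1
        pattern[n] = p
        used.add(p)
    return pattern

print(assign_patterns(["A", "B", "D"], [("A", "D")]))
# {'A': 1, 'D': 2, 'B': 3}: A=01, D=10, B=11, matching the example solution above.
```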
The question then becomes: How efficiently can you find the largest fully-connected simplexes?
If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:
1. Sweep through the edges, updating each vertex with a count of its connecting edges.
2. For each connected node n, create an array of Cn counts, initially zero, where Cn is the count of edges connected to the node n.
3. Sweep through the edges again: for connected nodes n1 and n2, increment the count in n1 corresponding to Cn2 and vice versa. If Cn2 > Cn1, update the last count in the n1 array, and vice versa.
4. Sweep through the connected nodes again, calculating an upper bound on the largest simplex each node could be a part of. You could build a pigeon-hole array with a list of vertices for each upper bound as you sweep through the nodes.
5. Work through the pigeon-hole array from largest to smallest, extracting and folding nodes into unique sets.
If your nodes are in a set N and your edges in a set E, the complexity will be O(|N| + |E|) plus the cost of step 5.
If the above approximation suffices, the question becomes: How efficiently can you fold nodes into existing ranges given the requirements?
This may not be the answer you expect, but I can't find a place to add a comment, so I am typing it directly here. I can't fully understand your question, or does it need specific knowledge to understand? What is this independent set? As far as I know, a node in such a set of a directed graph has a two-way path to any other node in this set (i.e. the set is a strongly connected component). Is your notion the same?
If the problem is what I assume, such sets can be found by this algorithm:
1. Do a depth-first search on the directed graph, recording the finishing time of each node.
2. Then reverse all the edges in the graph.
3. Do a depth-first search again on the modified graph, visiting nodes in decreasing order of finishing time.
The algorithm is explained precisely in the book "Introduction to Algorithms".
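For completeness, a sketch of the algorithm this answer describes (Kosaraju's strongly connected components algorithm; note that it computes strongly connected components, which is a different notion from the independent sets asked about):

```python
def strongly_connected_components(nodes, edges):
    adj, radj = {n: [] for n in nodes}, {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    order, seen = [], set()
    def dfs1(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs1(v)
        order.append(u)                # record finishing order

    for n in nodes:
        if n not in seen:
            dfs1(n)

    comps, seen = [], set()
    def dfs2(u, comp):
        seen.add(u)
        comp.append(u)
        for v in radj[u]:
            if v not in seen:
                dfs2(v, comp)

    for n in reversed(order):          # decreasing finishing time
        if n not in seen:
            comp = []
            dfs2(n, comp)
            comps.append(comp)
    return comps

print(strongly_connected_components(
    [1, 2, 3, 4], [(1, 2), (2, 3), (3, 1), (3, 4)]))
# two components: {1, 2, 3} and {4}
```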

Resources