Is there any algorithm which can solve this number group sequence?

I have a problem, described below. Do you have a good solution, or is this problem just another form of some classic, already-solved problem?
The problem is:
There are some groups of numbers, e.g.
A(1 8 9)
B(1 4 5)
C(2 4 6)
D(3 4 7)
E(2 10 11)
F(3 12 13)
There are "A-F" six group. we have numbers "1,2,3,4,5,6,7,8,9,10,11,12,13".
Now find the minimum amount of number set which satisfies each group must have a number in this set at least. For examlpe, we can find the set "1 4 2 13 12" that A has "1",B has "1,4",C has "2,4",D has "4" ,E has "2",F has "12,13" .
But set "1 2 4" is not that we find ,F does not has any number in the set.
The best set is "1,2,3",every gruop has a number in the set,and the size of the set is optimal. It has only three numbers. THIS is What we want. If there are many best sets,finding any one is OK. Thanks.

This is equivalent to the set cover problem. In this mapping, each of your groups A, B, ..., F is an element of the set cover instance, and each of the numbers 1, 2, ..., 13 is a set. For example, 1 becomes the set {A, B}, and 11 becomes the set {E}.
Set cover is NP-hard. The integer linear programming formulation on the linked Wikipedia page is probably as good as you'll get for exact solutions; the greedy algorithm there is a decent approximation for large problems.
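To make the greedy concrete, here is a minimal sketch (mine, not from the original answer) applied to the example instance above; it repeatedly picks the number that hits the most still-unhit groups:

```python
# Greedy set-cover approximation for the example instance.
groups = {
    'A': {1, 8, 9}, 'B': {1, 4, 5}, 'C': {2, 4, 6},
    'D': {3, 4, 7}, 'E': {2, 10, 11}, 'F': {3, 12, 13},
}

def greedy_hitting_set(groups):
    numbers = sorted(set().union(*groups.values()))
    unhit = set(groups)  # group names not yet hit
    chosen = []
    while unhit:
        # Pick the number appearing in the most unhit groups.
        best = max(numbers, key=lambda x: sum(x in groups[g] for g in unhit))
        chosen.append(best)
        unhit = {g for g in unhit if best not in groups[g]}
    return chosen

# Prints [4, 1, 2, 3]: one number more than the optimal {1, 2, 3},
# illustrating that the greedy algorithm is only an approximation.
print(greedy_hitting_set(groups))
```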

This problem is NP-hard via a reduction from the NP-hard vertex cover problem (given a graph, can you find a set of k nodes such that every edge in the graph has at least one endpoint among the chosen nodes?)
The reduction is as follows. Number the nodes of the graph 1, 2, 3, ..., n in any order you'd like. Then, for each edge in the graph, construct the group containing just two numbers: the edge's endpoints. If there is a k-node vertex cover in the original graph, then there is a set of k numbers you can pick (namely, the nodes in the vertex cover) such that each group has at least one chosen number.
To see why the reduction works in the other direction, note that if there is a set of k numbers you can pick such that each group in the construction has at least one number picked, then the vertices corresponding to those numbers form a k-element vertex cover in the original graph.
This reduction can be done in polynomial-time, so we have a polynomial-time reduction from the NP-hard vertex-cover problem to your problem. Thus this problem is NP-hard. So, unless P = NP, there is no polynomial-time algorithm for this problem.
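As a toy illustration (my own example graph, not part of the original answer), the construction looks like this:

```python
# Each edge becomes a two-number group, so a set of k numbers hitting
# every group is exactly a k-node vertex cover of the graph.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]   # a small graph on nodes 1..4
groups = [set(edge) for edge in edges]     # [{1, 2}, {2, 3}, {1, 3}, {3, 4}]
# {1, 3} hits every group, and {1, 3} is indeed a vertex cover:
# every edge has 1 or 3 as an endpoint.
```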
Hope this helps!

Related

Maximum weight independent set with divide and conquer

I was reading about the Maximum Weight Independent Set problem which is:
Input: An undirected graph G = (V, E) and a non-negative weight w_v for each vertex v ∈ V.
Output: An independent set S ⊆ V of G with the maximum possible sum ∑_{v∈S} w_v of vertex weights.
and that same source (not the SO post) mentions that the problem can be solved with four recursive calls in a divide-and-conquer approach.
I googled but couldn't find such an algorithm. Does anyone have an idea how this would be solved by divide and conquer? I understand that the running time is much worse than with dynamic programming; I am just curious about the approach.
I understand the relevant part of the manuscript to mean that only line graphs (paths) are taken into consideration. In that case, I believe the footnote means the following.
If a larger line graph is taken as input and split (say at an edge incident to the nodes a and b), there are four cases to consider in order to have a proper combination step.
You would have to solve the "left" branch for the two cases
a is included in the maximum independent set
a is not included in the maximum independent set
and the same goes for the "right" branch with b. In total there are four possible ways to combine the results (minus the invalid one where both a and b are included, since they are adjacent), out of which a maximal one is to be returned; a sketch follows below.
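Here is a minimal sketch (my own interpretation, assuming the input is a path graph with vertices 0..n-1) of this scheme; each call reports results for all combinations of endpoint choices, and the combination step considers the four cases at the split edge:

```python
import math

# Divide-and-conquer MWIS on a path graph; w[i] is the weight of
# vertex i. solve() returns best[i][j], the maximum weight of an
# independent set of the sub-path, where i/j say whether the left/right
# endpoint is forced in (1) or out (0). This only illustrates the
# combination step; it is far slower than the DP solution.
def mwis_path(w):
    NEG = -math.inf

    def solve(lo, hi):
        if lo == hi:  # single vertex: both endpoints coincide
            return [[0, NEG], [NEG, w[lo]]]
        mid = (lo + hi) // 2
        left, right = solve(lo, mid), solve(mid + 1, hi)
        best = [[NEG, NEG], [NEG, NEG]]
        for i in range(2):
            for j in range(2):
                # Four combinations at the split edge (mid, mid+1);
                # its two endpoints may not both be chosen.
                for a in range(2):
                    for b in range(2):
                        if not (a == 1 and b == 1):
                            best[i][j] = max(best[i][j],
                                             left[i][a] + right[b][j])
        return best

    table = solve(0, len(w) - 1)
    return max(max(row) for row in table)

print(mwis_path([1, 8, 6, 3, 6]))  # 14: pick vertices 1 and 4 (8 + 6)
```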

Minimum number of flips to get adjacent 1's in a matrix

Given a binary matrix (values of 0 or 1), adjacent entries of 1 form “hills”. Also, given some number k, find the minimum number of 0's you need to “flip” to 1 in order to form a hill of size at least k.
Edit: For clarification, adjacent means the left-right-up-down neighborhood. Diagonals do not count as adjacent. For example,
[0 1
0 1]
is one hill of size 2,
[0 1
1 0]
defines 2 hills of size 1,
[0 1
1 1]
defines 1 hill of size 3, and
[1 1
1 1]
defines 1 hill of size 4.
Also for clarification, size is defined by the area formed by the adjacent blob of 1's.
My initial idea is to transform each existing hill into a node of a graph, with edge costs equal to the minimal number of flips needed to connect the corresponding hills, and then perform a DFS (or a similar search) to find the minimum total cost.
This fails in cases where choosing some path reduces the cost of another edge, and the remedies I can think of are too close to a brute-force solution.
Your problem is closely related to the rectilinear Steiner tree problem.
A Steiner tree connects a set of points together using line segments, minimising the total length of the line segments. The line segments can meet in arbitrary places, not necessarily at points in the set (so it is not the same thing as a minimum spanning tree). For example, given three points at the corners of an equilateral triangle, the Euclidean Steiner tree connects them through an extra point in the middle of the triangle.
A rectilinear Steiner tree is the same, except you minimise the total Manhattan distance instead of the total Euclidean distance.
In your problem, instead of joining your hills with line segments whose length is measured by Euclidean distance, you are joining your hills by adding pixels. The total number of 0s you need to flip to join two cells in your array is equal to the Manhattan distance between those two cells, minus 1.
The rectilinear Steiner tree problem is known to be NP-complete, even when restricted to points with integer coordinates. Your problem is essentially this problem, with two differences:
The "minus 1" part when measuring the Manhattan distance. I doubt that this subtle difference is enough to bring the problem into a lower complexity class, though I don't have a proof for you.
The coordinates of your integer points are bounded by the size of the matrix (as pointed out by Albert Hendriks in the comments). This does matter — it means that pseudo-polynomial time for the rectilinear Steiner tree problem would be polynomial time for your problem.
This means that your problem may or may not be NP-hard, depending on whether the rectilinear Steiner tree problem is weakly NP-complete or strongly NP-complete. I wasn't able to find a definitive answer to this in the literature, and there isn't much information about the problem other than in academic literature. It does at least appear that there isn't a known pseudo-polynomial time algorithm, as far as I can tell.
Given that, your most likely options are some kind of backtracking search for an exact solution, or applying a heuristic to get a "good enough" solution. One possible heuristic as described by Wikipedia is to compute a rectilinear minimum spanning tree and then try to improve on the RMST using an iterative improvement method. The RMST itself gives a solution within a constant factor of 1.5 of the true optimum.
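To make the heuristic idea concrete, here is a rough sketch (my own code, with hypothetical names) that flood-fills the existing hills and joins them all with a Prim-style minimum spanning tree over Manhattan distances minus 1. It only estimates the cost of merging every hill into one; the actual problem (reaching size k) would need a search on top of a bound like this:

```python
def join_all_hills_cost(grid):
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    hills = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                stack, cells = [(r, c)], []
                while stack:  # flood-fill one hill (4-adjacency)
                    y, x = stack.pop()
                    cells.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                hills.append(cells)
    if len(hills) <= 1:
        return 0

    def dist(a, b):  # fewest flips to connect two hills directly
        return min(abs(y1 - y2) + abs(x1 - x2)
                   for y1, x1 in a for y2, x2 in b) - 1

    # Prim's algorithm on the complete graph of hills.
    in_tree, cost = {0}, 0
    while len(in_tree) < len(hills):
        w, j = min((dist(hills[i], hills[j]), j)
                   for i in in_tree for j in range(len(hills))
                   if j not in in_tree)
        in_tree.add(j)
        cost += w
    return cost

print(join_all_hills_cost([[0, 1], [1, 0]]))  # 1: one flip joins both hills
```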
A hill can be modeled as four sequences of 1's extending from a central cell.
The right sequence is composed of r cells, the up sequence has u cells, and so on.
A hill of size k then satisfies k = 1 + r + l + u + d (one central cell plus the four sequences), where each value satisfies 0 <= v < k.
The problem is combinatorial: for each cell, all combinations of {r, l, u, d} that satisfy this relation should be tested.
When testing a combination at a cell, count the 1's already present among the combination's cells; those don't need to be flipped. This also lets you skip some combinations early.

Finding the cycle of smallest average weight in a directed graph

I'm looking for an algorithm that takes a directed, weighted graph (with positive integer weights) and finds the cycle in the graph with the smallest average weight (as opposed to total weight).
Based on similar questions (but for total weight), I considered applying a modification of the Floyd–Warshall algorithm, but it would rely on the following property, which does not hold (thank you Ron Teller for providing a counterexample to show this): "For vertices U,V,W, if there are paths p1,p2 from U to V, and paths p3,p4 from V to W, then the optimal combination of these paths to get from U to W is the better of p1,p2 followed by the better of p3,p4."
What other algorithms might I consider that don't rely on this property?
Edit: Moved the following paragraph, which is no longer relevant, below the question.
While this property seems intuitive, it doesn't seem to hold in the case of two paths that are equally desirable. For example, if p1 has total weight 2 and length 2, and p2 has total weight 3 and length 3, neither one is better than the other. However, if p3 and p4 have greater total weights than lengths, p2 is preferable to p1. In my desired application, weights of each edge are positive integers, so this property is enforced and I think I can assume that, in the case of a tie, longer paths are better. However, I still can't prove that this works, so I can't verify the correctness of any algorithm relying on it.
"While this property seems intuitive, it doesn't seem to hold in the
case of two paths that are equally desirable."
Actually, when you are considering 2 parameters (weight, length), it doesn't hold in any case, here's an example when P1 that in itself has a smaller average than P2, can be sometimes better (example 1) for the final solution or worse (example 2), depending on P3 and P4.
Example 1:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 1, W(P3) = 1
L(P4) = 1, W(P4) = 1
Example 2:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 5, W(P3) = 10
L(P4) = 5, W(P4) = 10
These two parameters affect your objective function in a way that cannot be determined locally, so the Floyd–Warshall algorithm won't work with any modification.
Since you are considering cycles only, you might want to consider a brute-force algorithm that checks the average weight of every cycle in the graph. The cycles can be enumerated in time polynomial in their number, see:
Finding all cycles in graph
I can suggest another algorithm.
Let's fix a value C and subtract C from all edge weights. How does the answer change? Since we subtracted the same number C from all weights, the average weight of every cycle decreases by exactly C. Now let's check whether there is a cycle with negative average weight. Having negative average weight is equivalent to having negative total weight, so it is enough to check for a cycle with negative total weight, which we can do with the Bellman-Ford algorithm. If such a cycle exists, the answer is less than C.
So we can find the answer via binary search; the resulting complexity is O(VE log(MaxWeight)). A sketch follows below.
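Here is a sketch of this binary search (my own code; the tolerance handling is an assumption):

```python
# `edges` is a list of (u, v, w) with positive integer weights; `n` is
# the number of vertices. Returns the minimum mean cycle value to
# within `eps`, or None if the graph has no cycle.

def has_cycle_below(n, edges, c):
    # Bellman-Ford with a virtual source attached to every vertex
    # (all distances start at 0). A relaxation still succeeding on the
    # n-th pass means some cycle is negative under weights w - c,
    # i.e. some cycle has mean weight < c.
    dist = [0.0] * n
    changed = False
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + (w - c) < dist[v] - 1e-12:
                dist[v] = dist[u] + (w - c)
                changed = True
        if not changed:
            return False
    return changed

def min_mean_cycle(n, edges, eps=1e-6):
    if not edges:
        return None
    lo, hi = 0.0, max(w for _, _, w in edges) + 1.0
    if not has_cycle_below(n, edges, hi):
        return None  # no cycle at all
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if has_cycle_below(n, edges, mid):
            hi = mid  # some cycle has mean weight below mid
        else:
            lo = mid
    return hi

# Two cycles: 0->1->0 with mean 3, and 1->2->1 with mean 2.
print(min_mean_cycle(3, [(0, 1, 4), (1, 0, 2), (1, 2, 1), (2, 1, 3)]))
```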
The problem you describe is called the minimum mean cycle problem and can be solved efficiently. Further, there's some really nice optimization theory to check out if you're interested (start with the standard reference AMO93).

Minimum number of days required to solve a list of questions

There are N problems numbered 1..N which you need to complete. You've arranged the problems in increasing difficulty order, and the ith problem has estimated difficulty level i. You have also assigned a rating vi to each problem. Problems with similar vi values are similar in nature. On each day, you will choose a subset of the problems and solve them. You've decided that each subsequent problem solved on the day should be tougher than the previous problem you solved on that day. Also, to not make it boring, consecutive problems you solve should differ in their vi rating by at least K. What is the least number of days in which you can solve all problems?
Input:
The first line contains the number of test cases T. T test cases follow. Each case contains an integer N and K on the first line, followed by integers v1,...,vn on the second line.
Output:
Output T lines, one for each test case, containing the minimum number of days in which all problems can be solved.
Constraints:
1 <= T <= 100
1 <= N <= 300
1 <= vi <= 1000
1 <= K <= 1000
Sample Input:
2
3 2
5 4 7
5 1
5 3 4 5 6
Sample Output:
2
1
This is one of the challenges from InterviewStreet.
Below is my approach:
Start from the first question, find the maximum possible number of questions that can be solved on one day, and remove these questions from the list. Then start again from the first element of the remaining list, and repeat until the list is empty.
I am getting wrong answers with this method, so I am looking for an algorithm that solves this challenge.
Construct a DAG of problems in the following way. Let pi and pj be two different problems. Then we will draw a directed edge from pi to pj if and only if pj can be solved directly after pi on the same day, consecutively. Namely, the following conditions have to be satisfied:
i < j, because you should solve the less difficult problem earlier.
|vi - vj| >= K (the rating requirement).
Now notice that each subset of problems that is chosen to be solved on some day corresponds to the directed path in that DAG. You choose your first problem, and then you follow the edges step by step, each edge in the path corresponds to the pair of problems that have been solved consecutively on the same day. Also, each problem can be solved only once, so any node in our DAG may appear only in exactly one path. And you have to solve all the problems, so these paths should cover all the DAG.
Now we have the following problem: given a DAG of n nodes, find the minimum number of vertex-disjoint directed paths that cover the DAG completely. This is a well-known problem called path cover. Generally speaking it is NP-hard; however, our graph is acyclic, and for acyclic graphs it can be solved in polynomial time using a reduction to the matching problem. The maximum matching problem, in its turn, can be solved using the Hopcroft-Karp algorithm, for example. The exact reduction is simple and can be read, say, on Wikipedia: for each directed edge (u, v) of the original DAG, add an undirected edge (a_u, b_v) to a bipartite graph whose two parts {a_i} and {b_i} each have size n.
The number of nodes in each part of the resulting bipartite graph equals the number of nodes in the original DAG, n. The Hopcroft-Karp algorithm runs in O(n^2.5) in the worst case, and 300^2.5 ≈ 1,558,845, so for 100 tests the algorithm should take well under a second in total. A sketch of the whole approach follows below.
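Here is a compact sketch of the approach (my own code; it uses simple augmenting-path matching rather than Hopcroft-Karp for brevity, which changes the running time but not the answer):

```python
# `v` is the list of ratings in difficulty order, `K` the required gap.
def min_days(v, K):
    n = len(v)
    # adj[i] lists problems solvable directly after problem i on a day.
    adj = [[j for j in range(i + 1, n) if abs(v[i] - v[j]) >= K]
           for i in range(n)]
    match = [-1] * n  # match[j] = i if edge (i, j) is in the matching

    def augment(i, seen):
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match[j] == -1 or augment(match[j], seen):
                    match[j] = i
                    return True
        return False

    matching = sum(augment(i, [False] * n) for i in range(n))
    return n - matching  # minimum path cover of a DAG

print(min_days([5, 4, 7], 2))        # 2, matching the first sample
print(min_days([5, 3, 4, 5, 6], 1))  # 1, matching the second sample
```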
The algorithm is simple. First, sort the problems by v_i; then, for each problem, count the problems whose rating lies in the interval (v_i - K, v_i]. The maximum of those counts is the result. The second phase can be done in O(n), so the most costly operation is sorting, making the whole algorithm O(n log n). Look here for a demonstration of the algorithm on your data with K = 35 in a spreadsheet; a short sketch also follows below.
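A minimal sketch (mine, not from the answer) of the counting step, using binary search for the window boundary:

```python
import bisect

def min_days_window(v, K):
    # Count, for each rating x, the ratings in the half-open window
    # (x - K, x]; the answer is the largest such count.
    v = sorted(v)
    best = 0
    for i, x in enumerate(v):
        lo = bisect.bisect_right(v, x - K)  # first index with rating > x - K
        best = max(best, i - lo + 1)
    return best

print(min_days_window([5, 4, 7], 2))  # 2, matching the first sample
```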
Why does this work
Let's reformulate the problem as a graph coloring problem. Create a graph G as follows: the vertices are the problems, and there is an edge between two problems iff |v_i - v_j| < K.
In such a graph, independent sets correspond exactly to sets of problems doable on the same day. (<=) If a set can be done on one day, it is surely an independent set. (=>) If a set contains no two problems violating the K-difference criterion, you can sort the problems by difficulty and solve them in that order; both conditions are satisfied this way.
It follows that colorings of G correspond exactly to schedules of the problems over different days, with each color corresponding to one day.
So we want to find the chromatic number of G. This becomes easy once we recognize that G is an interval graph: interval graphs are perfect, perfect graphs have chromatic number equal to clique number, and both can be found by a simple algorithm.
Interval graphs are intersection graphs of intervals on the real line: vertices are intervals, with edges between intervals that intersect. Our graph is easily seen to be an interval graph: assign to each problem the interval (v_i - K, v_i]; the edges of this interval graph are then exactly the edges of our graph.
Lemma 1: In an interval graph, there exists a vertex whose neighbors form a clique.
Proof: Take the interval I with the lowest upper bound (a symmetric argument works for the highest lower bound). Any interval intersecting I has a higher upper bound, so the upper bound of I is contained in every such interval. Therefore the neighbors of I intersect each other and form a clique. qed
Lemma 2: In a family of graphs closed under induced subgraphs and having the property from Lemma 1 (existence of a vertex whose neighbors form a clique), the following algorithm produces a minimal coloring:
Find the vertex x, whose neighbors form a clique.
Remove x from the graph, making its subgraph G'.
Color G' recursively
Color x by the least color not found on its neighbors
Proof: In (3), the algorithm produces an optimal coloring of the subgraph G' by the induction hypothesis plus the closure of our family under induced subgraphs. In (4), the algorithm only uses a new color n if colors 1, ..., n-1 all appear among the neighbors of x; since those neighbors form a clique, together with x they form a clique of size n in G, so the chromatic number of G must be at least n. Therefore the color given by the algorithm to any vertex is at most the chromatic number of G, which means the coloring is optimal. (Obviously, the algorithm produces a valid coloring.) qed
Corollary: Interval graphs are perfect (in a perfect graph, the chromatic number equals the clique number in every induced subgraph).
So we just have to find the clique number of G. That is easy for interval graphs: process the segments of the real line that contain no interval boundaries and count the number of intervals intersecting each of them. This is even easier in your case, where the intervals have uniform length, and it leads to the algorithm outlined at the beginning of this post.
Do we really need to go to path cover? Can't we just follow a strategy similar to LIS (longest increasing subsequence)?
The input is in increasing order of difficulty. We maintain a bunch of queues, one per day, of the tasks to be performed that day. Each element of the input is assigned to a day by comparing it against the last elements of all queues: wherever we find a difference of at least k, we append the task to that list.
For example, with k = 2 and input 5 3 4 5 6:
1) Input -> 5 (empty lists, so start a new one)
5
2) 3 (the only list ends in 5, and abs(5-3) = 2 = k, so append 3)
5--> 3
3) 4 (the only list ends in 3, and abs(3-4) < k, so start a new list)
5--> 3
4
4) 5 again (abs(3-5) = k, so append)
5-->3-->5
4
5) 6 (abs(5-6) < k but abs(4-6) = k, so append to the second list)
5-->3-->5
4-->6
We just need to maintain an array with the last element of each list. Since the order of the days is not important, we can keep these last tasks sorted; placing a new task v_i is then a binary search for a last task that differs from v_i by at least k.
Complexity:
The loop runs over N elements, and in the worst case each element requires a binary search (log i for input[i]).
Therefore T(N) < O(log N!) = O(N log N). One can check that the lower bound matches as well, so the complexity is Θ(N log N).
The minimum number of days required will be the same as the length of the longest path in the complementary (undirected) graph of G (DAG). This can be solved using Dilworth's theorem.

Maximum Independent Set Algorithm

I don't believe there exists an algorithm for finding the maximum independent vertex set in a bipartite graph other than the brute force method of finding the maximum among all possible independent sets.
I am wondering about the pseudocode to find all possible vertex sets.
Say we're given a bipartite graph with 4 blue vertices and 4 red. Currently I would:
Start with an arbitrary blue,
find all red that don't match this blue,
put all these red in the independent set,
find all blue that don't match these red,
put these blue in the independent set,
repeat for the next vertex in blue,
repeat all over again for all blue, then all vertices in red.
I understand that this way doesn't give me all possible independent set combinations, since after the first step I am choosing all of the next colour's vertices that don't match, rather than stepping through every possibility.
For example given a graph with the matching
B R
1 1
1 3
2 1
2 3
3 1
3 3
4 2
4 4
Start with blue 1.
Choose red 2 and 4, since they don't match it.
Add red 2 and 4 to the independent set.
Choose blue 2 and 3, since they don't match red 2 or 4.
Add blue 2 and 3 to the independent set as well.
Independent set = 1, 2, 3 from blue; 2, 4 from red.
Repeat for blue 2, blue 3, ..., red n (storing the cardinality of each set).
Is there a way I can improve this algorithm to search the possibilities better? I know that for a bipartite graph, |maximum independent set| = |Red| + |Blue| - |maximum matching|.
The problem is that, by choosing all possible red vertices in the first pass for a given blue vertex, if those red vertices connect to all the other blue vertices, then my set only ever contains one blue vertex and the rest red.
I don't believe there exists an algorithm for finding the maximum independent vertex set in a bipartite graph other than the brute force method of finding the maximum among all possible independent sets.
There is: finding a maximum independent set is equivalent to finding a minimum vertex cover (take the complement of the result), and König's theorem states that a minimum vertex cover in a bipartite graph corresponds to a maximum matching, which can be found in polynomial time. I don't know about finding all matchings, but it seems there can be exponentially many. A sketch of this approach follows below.
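Here is a sketch of that approach (my own code and naming): a simple augmenting-path matching stands in for Hopcroft-Karp, then König's construction walks alternating paths from the unmatched left vertices to extract a minimum vertex cover, whose complement is a maximum independent set.

```python
# adj[i] lists the right-side neighbours of left vertex i.
def max_independent_set(left_n, right_n, adj):
    match_l, match_r = [-1] * left_n, [-1] * right_n

    def augment(i, seen):
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match_r[j] == -1 or augment(match_r[j], seen):
                    match_l[i], match_r[j] = j, i
                    return True
        return False

    for i in range(left_n):
        augment(i, [False] * right_n)

    # König: walk alternating paths from unmatched left vertices.
    # Minimum cover = (unreached left) + (reached right).
    reached_l = {i for i in range(left_n) if match_l[i] == -1}
    reached_r = set()
    stack = list(reached_l)
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in reached_r:
                reached_r.add(j)
                k = match_r[j]
                if k != -1 and k not in reached_l:
                    reached_l.add(k)
                    stack.append(k)

    left_set = [i for i in range(left_n) if i in reached_l]        # not in cover
    right_set = [j for j in range(right_n) if j not in reached_r]  # not in cover
    return left_set, right_set

# The {1,2,3} / {A,B,C} example from the next answer: edges A-1, A-2,
# A-3, B-3, C-3 (left = 1,2,3 as 0,1,2; right = A,B,C as 0,1,2).
print(max_independent_set(3, 3, [[0], [0], [0, 1, 2]]))
# -> ([0, 1], [1, 2]), i.e. {1, 2, B, C}
```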
As Aaron McDaid mentions in his now-deleted answer, the problem of finding a maximum independent set is equivalent to finding a maximum clique: a maximum independent set in a graph G is a maximum clique in the complement of G. The problem is known to be NP-complete.
The brute-force solution you mention takes O(n^2 2^n), but you can do better. Robson has a 1986 paper entitled "Algorithms for maximum independent sets" that gives an algorithm running in O(2^{cn}) for a constant 0 < c < 1 (I believe c is around 1/4, but I could be mistaken). None of this is specific to bipartite graphs.
One thing to note for a bipartite graph is that either side forms an independent set. In the complete bipartite graph K_{b,r}, partitioned into B and R with |B| = b and |R| = r, with an edge from every vertex of B to every vertex of R and none within B or within R, a maximum independent set is B if b >= r, and R otherwise.
Taking B or R won't work in general, though. sdcvvc noted the example with vertices {1, 2, 3, A, B, C} and edges {A,1}, {A,2}, {A,3}, {B,3}, {C,3}. In this case, the maximum independent set is {1, 2, B, C}, which is larger than either side {A, B, C} or {1, 2, 3}.
It may improve Robson's algorithm to start with the larger of B or R and proceed from there, though I'm not sure how much of a difference this will make.
Alternatively, you can use the Hopcroft–Karp algorithm to find a maximum matching in the bipartite graph, derive a minimum vertex cover from it via König's theorem, and take the complement of the cover as the independent set. This gives a polynomial-time algorithm to solve the problem.
