Finding the largest subset of intervals - algorithm

I was trying to solve this problem here.
Posting the question as well: you are given a list of N intervals.
The challenge is to select the largest subset of intervals such that no three intervals in the subset share a common point.
I couldn't come up with a solution. This is what I tried so far:
DP: I don't think the problem has overlapping sub-problems, so this didn't work.
Reduced it to a graph with each point being a vertex and the intervals being edges of an undirected graph. The problem then becomes finding maximum-length disjoint paths in that graph. I couldn't come up with a neat way of doing this either.
Tried reducing it to network flow, but that didn't work either.
Could you give me hints on how to approach this problem, or tell me if I am missing anything? Sorry, I am doing algorithms after a really long time and have been out of touch lately.

I'll give the solution in general words without programming it.
Let's denote the segments as s1, s2, ..., sn, their beginnings as b1, b2, ..., bn, and their ends as e1, e2, ..., en.
Sort the segments by their beginnings, so that b1 < b2 < ... < bn. It is enough to check at these begin points whether the condition that no three segments cover a common point holds. We will do so in order, from b1 to bn. So, start at b1, move to the next point, and so on, one by one, until at some point bi three segments cover it. These will be the segment si and two others, say sj and sk. Of those three segments, delete the one with the maximum end point, i.e. the one achieving max{ei, ej, ek}. Move on to the beginning of the next segment (b(i+1)). When we reach bn the process is done. The segments that are left constitute the largest subset of segments such that no three segments share a common point.
Why this is the maximal subset. Let's say our solution is S (the set of segments), and suppose there is an optimal solution S*. Again, sort the segments in S and S* by the coordinate of their beginnings. Now, go through the segments of S and S* in parallel and compare their end points. By the construction of S, the end coordinate of the kth segment in S is no greater than the end coordinate of the kth segment in S* (ek <= e*k). Therefore, the number of segments in S is not less than in S* (moving through S*, we are always outrunning it with S).
If this is not convincing enough, try to think about a simpler problem at first, where no two segments can overlap. The solution is the same, but it's much more intuitive to see why it gives the right answer.
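For concreteness, a minimal sketch of this sweep (my own illustration, not the answerer's code; it assumes closed integer intervals and a single test case of n intervals on standard input): keep the end points of the currently selected segments in a multiset, and whenever three selected segments cover the current begin point, drop the one with the largest end.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

int main() {
    int n;
    std::scanf("%d", &n);
    std::vector<std::pair<int, int>> seg(n);             // (begin, end), closed intervals
    for (auto& s : seg) std::scanf("%d %d", &s.first, &s.second);
    std::sort(seg.begin(), seg.end());                   // process in order of begin

    std::multiset<int> ends;                             // ends of the currently selected segments
    for (const auto& s : seg) {
        ends.insert(s.second);
        // selected segments still covering s.first are those with end >= s.first
        int covering = 0;
        for (auto it = ends.lower_bound(s.first); it != ends.end() && covering < 3; ++it)
            ++covering;
        if (covering >= 3)
            ends.erase(std::prev(ends.end()));            // drop the covering segment ending last
    }
    std::printf("%d\n", (int)ends.size());               // size of the selected subset
    return 0;
}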

Shafa is right.
#include <cstdio>
#include <cstdlib>
#include <set>
using namespace std;

class Interval {
public:
    int begin;
    int end;
    Interval() {
        begin = 0; end = 0;
    }
    Interval(int _b, int _e) {
        begin = _b; end = _e;
    }
    bool operator==(const Interval& i) const {
        return (begin == i.begin) && (end == i.end);
    }
    bool operator<(const Interval& i) const {
        return begin < i.begin;   // sort intervals by begin
    }
};

int n, t, a, b;
multiset<Interval> inters;            // intervals ordered by begin
multiset<int> iends;                  // end points of the currently selected intervals
multiset<Interval>::iterator it1;
multiset<int>::iterator et1;

int main() {
    scanf("%d", &t);
    while (t--) {
        inters.clear();
        iends.clear();
        scanf("%d", &n);
        while (n--) {
            scanf("%d %d", &a, &b);
            Interval inter(a, b);
            inters.insert(inter);
        }
        it1 = inters.begin();
        while (it1 != inters.end()) {
            iends.insert(it1->end);
            // selected intervals still covering it1->begin are those with end >= it1->begin
            et1 = iends.lower_bound(it1->begin);
            if ((++et1 != iends.end()) && (++et1 != iends.end())) {
                // three or more selected intervals cover this point:
                // delete every covering interval beyond the first two (i.e. the largest ends)
                while (et1 != iends.end()) {
                    multiset<int>::iterator te = et1;
                    et1++;
                    iends.erase(te);
                }
            }
            it1++;
        }
        printf("%d\n", (int)iends.size());
    }
    system("pause");
    return 0;
}

Related

Reduce time taken in Line Sweep for vertical and horizontal line segments

I have used std::set to implement a line sweep algorithm for vertical and horizontal lines, but the final range search between the lower bound and upper bound of the 'status' set takes a lot of time. Is there some way to avoid this? I chose std::set because it is based on a balanced BST, so insertion, deletion and search take log n time. Is there a better data structure for implementing this?
// before this I initialize the events set with segments in increasing order of x coordinate. The segment struct has two point members and one type member identifying a vertical segment (1), the start of a horizontal segment (0) and the end of a horizontal segment (2).
for (auto iter = events.begin(); iter != events.end(); iter++)
{
    segment temp = *iter;
    if (temp.type == 0)
        status.insert(temp.p1);
    else if (temp.type == 2)
        status.erase(temp.p2);
    else
    {
        auto lower = status.lower_bound(std::make_pair(temp.p1.first, temp.p1.second));
        auto upper = status.upper_bound(std::make_pair(temp.p2.first, temp.p2.second));
        // Can the number of elements in the interval be found without this for loop?
        for (; lower != upper; lower++)
        {
            count++;
        }
    }
}
Here events and status are sets of segment structs and of points, respectively.
typedef std::pair<int, int> point;

struct segment
{
    point p1, p2;
    int type;
    segment(point a, point b, int t)
        : p1(a), p2(b), type(t) {}
};

std::set<segment, segCompare> events;
...
std::set<point, pointCompare> status;
In order to compute that distance (the number of elements between two iterators) efficiently, the tree would need to maintain size counts for each sub-tree. Since that service is not needed in most cases, it is not too surprising that std::set does not incur its cost for everyone.
I haven't found anything in the C++ standard library that will do this off the shelf. I think you may need to roll your own in this case, or find someone else who has.
If you do batch insertions of the events, use a std::vector that is kept sorted. There is no difference in asymptotic runtime, which is O(n log n) for both, for a batch of n insertions.
This also lets you do iterator arithmetic, among other things.
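To illustrate that suggestion, here is a small sketch (the data and variable names are hypothetical, not from the question): the y-coordinates of the active horizontal segments sit in a sorted std::vector, so the number of elements in a range comes from plain iterator subtraction instead of a walk.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // y-coordinates of the currently active horizontal segments, kept sorted
    std::vector<int> active = {1, 3, 3, 7, 9, 12};

    int lo = 3, hi = 9;  // the vertical segment spans y in [lo, hi]

    // Counting in O(log n): random-access iterators allow plain subtraction,
    // unlike std::set iterators, where std::distance walks element by element.
    auto first = std::lower_bound(active.begin(), active.end(), lo);
    auto last  = std::upper_bound(active.begin(), active.end(), hi);
    long long count = last - first;
    std::printf("%lld intersections\n", count);   // 4: the two 3s, 7 and 9

    // Inserting while keeping the vector sorted (O(n) per insertion,
    // or push_back a whole batch and sort once for O(n log n) per batch):
    int y = 5;
    active.insert(std::lower_bound(active.begin(), active.end(), y), y);
    return 0;
}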

Selecting evenly distributed points algorithm

Suppose there are 25 points on a line segment, and these points may be unevenly distributed (spatially), as the following figure shows:
My question is how we can select 10 points among these 25 points so that the selected 10 points are as spatially evenly distributed as possible. In the ideal situation, the selected points should be something like this:
EDIT:
It is true that this question would be more precise if I could state the criterion that defines "even distribution". What I know is my expectation for the selected points: if I divide the line segment into 10 equal sub-segments, I expect there to be one selected point on each sub-segment. Of course it may happen that some sub-segment contains no candidate point. In that case I will resort to a neighboring sub-segment that does have candidate points, and further divide that neighboring sub-segment into two parts: if each part has candidate points, then the empty-sub-segment problem is solved. If one of the parts still has no candidate point, I can divide it further, or resort to the next neighboring sub-segment.
EDIT:
Using dynamic programming, a possible solution is implemented as follows:
#include <iostream>
#include <vector>
#include <algorithm>   // std::reverse
#include <cmath>       // std::abs
using namespace std;

struct Note
{
    int previous_node;
    double cost;
};

int main()
{
    double dis[25] =
    {0.0344460805029088, 0.118997681558377, 0.162611735194631,
     0.186872604554379, 0.223811939491137, 0.276025076998578,
     0.317099480060861, 0.340385726666133, 0.381558457093008,
     0.438744359656398, 0.445586200710900, 0.489764395788231,
     0.498364051982143, 0.585267750979777, 0.646313010111265,
     0.655098003973841, 0.679702676853675, 0.694828622975817,
     0.709364830858073, 0.754686681982361, 0.765516788149002,
     0.795199901137063, 0.823457828327293, 0.950222048838355, 0.959743958516081};

    Note solutions[25];
    for (int i = 0; i < 25; i++)
    {
        solutions[i].cost = 1000000;
    }
    solutions[0].cost = 0;
    solutions[0].previous_node = 0;
    for (int i = 0; i < 25; i++)
    {
        for (int j = i - 1; j >= 0; j--)
        {
            double tempcost = solutions[j].cost + std::abs(dis[i] - dis[j] - 0.1);
            if (tempcost < solutions[i].cost)
            {
                solutions[i].previous_node = j;
                solutions[i].cost = tempcost;
            }
        }
    }

    vector<int> selected_points_index;
    int i = 24;
    selected_points_index.push_back(i);
    while (solutions[i].previous_node != 0)
    {
        i = solutions[i].previous_node;
        selected_points_index.push_back(i);
    }
    selected_points_index.push_back(0);
    std::reverse(selected_points_index.begin(), selected_points_index.end());
    for (size_t k = 0; k < selected_points_index.size(); k++)
        cout << selected_points_index[k] << endl;
    return 0;
}
The results are shown in the following figure, where the selected points are shown in green:
Until a good, and probably O(n^2) solution comes along, use this approximation:
Divide the range into 10 equal-sized bins. Choose the point in each bin closest to the centre of each bin. Job done.
If you find that any of the bins is empty, choose a smaller number of bins and try again.
Without information about the scientific model that you are trying to implement it is difficult (a) to suggest a more appropriate algorithm and/or (b) to justify the computational effort of a more complicated algorithm.
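A rough sketch of this approximation (my own illustration, not the answerer's code), assuming the sorted coordinates lie in [0, 1) and using values close to the asker's:
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Pick at most one point per bin: the one closest to the bin centre.
// Assumes 'pts' is sorted and all values lie in [0, 1).
std::vector<int> pickByBins(const std::vector<double>& pts, int bins) {
    std::vector<int> best(bins, -1);
    for (int i = 0; i < (int)pts.size(); ++i) {
        int b = std::min(bins - 1, (int)(pts[i] * bins));   // bin index of point i
        double centre = (b + 0.5) / bins;
        if (best[b] < 0 || std::fabs(pts[i] - centre) < std::fabs(pts[best[b]] - centre))
            best[b] = i;
    }
    std::vector<int> chosen;
    for (int b = 0; b < bins; ++b)
        if (best[b] >= 0) chosen.push_back(best[b]);
    return chosen;   // may hold fewer than 'bins' indices if some bins are empty
}

int main() {
    std::vector<double> pts = {0.03, 0.12, 0.16, 0.19, 0.22, 0.28, 0.32, 0.34,
                               0.38, 0.44, 0.45, 0.49, 0.50, 0.59, 0.65, 0.66,
                               0.68, 0.69, 0.71, 0.75, 0.77, 0.80, 0.82, 0.95, 0.96};
    for (int bins = 10; bins >= 1; --bins) {            // retry with fewer bins if one is empty
        std::vector<int> sel = pickByBins(pts, bins);
        if ((int)sel.size() == bins) {
            for (int idx : sel) std::printf("%d ", idx);
            std::printf("\n");
            break;
        }
    }
    return 0;
}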
Let {x[i]} be your set of ordered points. I guess what you need to do is find the subset of 10 points {y[i]} that minimizes \sum_i |y[i] - y[i-1] - 0.1|, with y[-1] = 0.
Now, if you view the configuration as a strongly connected directed graph, where each node is one of the 25 doubles and the cost of every edge is |y[i] - y[i-1] - 0.1|, you should be able to solve the problem in O(n^2 + n log n) time with Dijkstra's algorithm.
Another idea, which will probably lead to a better result, is to use dynamic programming: if the element x[i] is part of our solution, the total minimum is the minimum cost of getting to x[i] plus the minimum cost of getting from x[i] to the final point. So you can compute a minimum solution for each point, starting from the smallest one, and for each subsequent point take the minimum over its predecessors.
Note that you'll probably have to do some additional work to pick, from the solutions set, the subset of those with 10 points.
EDIT
I've written this in C#:
for (int i = 0; i < 25; i++)
{
    for (int j = i - 1; j >= 0; j--)
    {
        double tmpcost = solution[j].cost + Math.Abs(arr[i] - arr[j] - 0.1);
        if (tmpcost < solution[i].cost)
        {
            solution[i].previousNode = j;
            solution[i].cost = tmpcost;
        }
    }
}
I've not done a lot of testing, and there may be some problem if the "holes" in the 25 elements are quite wide, leading to solutions that are shorter than 10 elements ... but it's just to give you some ideas to work on :)
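One way to force exactly the requested number of picks (my own extension of the idea above, not the answerer's code) is to add the number of already-chosen points as a second DP dimension. In the sketch below, dp[i][c] is the minimum cost of choosing c points with the c-th one at x[i]; the data set, K and step are hypothetical values just for illustration.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // hypothetical sample: 8 uneven points on [0,1], pick K = 4 of them
    std::vector<double> x = {0.03, 0.10, 0.12, 0.40, 0.42, 0.55, 0.80, 0.95};
    const int K = 4;
    const double step = 1.0 / K;        // ideal spacing between picks
    const double INF = 1e18;
    const int n = (int)x.size();

    // dp[i][c]: minimum cost of picking c points with the c-th one at x[i];
    // appending x[i] after x[j] costs |x[i] - x[j] - step|,
    // and the first pick is charged |x[i] - 0 - step| (i.e. y[-1] = 0 as above).
    std::vector<std::vector<double>> dp(n, std::vector<double>(K + 1, INF));
    std::vector<std::vector<int>> from(n, std::vector<int>(K + 1, -1));
    for (int i = 0; i < n; ++i)
        dp[i][1] = std::fabs(x[i] - step);
    for (int c = 2; c <= K; ++c)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < i; ++j) {
                double cand = dp[j][c - 1] + std::fabs(x[i] - x[j] - step);
                if (cand < dp[i][c]) { dp[i][c] = cand; from[i][c] = j; }
            }

    // Pick the best end point for exactly K selections, then backtrack.
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (dp[i][K] < dp[best][K]) best = i;
    std::vector<int> picks;
    for (int i = best, c = K; c >= 1; i = from[i][c], --c)
        picks.push_back(i);
    for (int k = (int)picks.size() - 1; k >= 0; --k)
        std::printf("%d ", picks[k]);
    std::printf("\n");
    return 0;
}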
You can find an approximate solution with the Adaptive Non-maximal Suppression (ANMS) algorithm, provided the points are weighted. The algorithm selects the n best points while keeping them spatially well distributed (most spread across the space).
I guess you can assign point weights based on your distribution criterion - e.g. the distance from a uniform lattice of your choice. I think the lattice should have n-1 bins for the optimal result.
You can look up the following papers discussing the 2D case (the algorithm is easily realized in 1D):
Gauglitz, Steffen, Luca Foschini, Matthew Turk, and Tobias Höllerer. "Efficiently selecting spatially distributed keypoints for visual tracking."
Brown, Matthew, Richard Szeliski, and Simon Winder. "Multi-image matching using multi-scale oriented patches." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005.
The second paper is less related to your problem but it describes the basic ANMS algorithm. The first paper provides a faster solution. I guess both will do in 1D for a moderate number of points (~10K).
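A rough 1D sketch of the basic ANMS idea as I read it (not taken from either paper; the data set and the n-1-bin lattice are assumptions): weight each point by its closeness to a uniform lattice, give each point a suppression radius equal to its distance to the nearest stronger point, and keep the n points with the largest radii.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> pts = {0.03, 0.10, 0.12, 0.40, 0.42, 0.55, 0.80, 0.95};
    const int n = 4;                       // how many points to keep
    const int m = (int)pts.size();

    // Weight: closeness to the nearest node of a uniform lattice with n-1 bins.
    std::vector<double> w(m);
    for (int i = 0; i < m; ++i) {
        double nearest = std::round(pts[i] * (n - 1)) / (n - 1);
        w[i] = -std::fabs(pts[i] - nearest);            // higher is better
    }

    // Suppression radius: distance to the nearest point with a strictly larger weight.
    std::vector<std::pair<double, int>> radius(m);      // (radius, index)
    for (int i = 0; i < m; ++i) {
        double r = 1e18;
        for (int j = 0; j < m; ++j)
            if (w[j] > w[i]) r = std::min(r, std::fabs(pts[i] - pts[j]));
        radius[i] = {r, i};
    }

    // Keep the n points with the largest suppression radii.
    std::sort(radius.rbegin(), radius.rend());          // descending by radius
    std::vector<int> keep;
    for (int k = 0; k < n && k < m; ++k) keep.push_back(radius[k].second);
    std::sort(keep.begin(), keep.end());
    for (int idx : keep) std::printf("%d ", idx);
    std::printf("\n");
    return 0;
}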

Given a large database of over 50,000 points, how can I quickly search for desired points

I have a database of over 50,000 points. Each point has 3 dimensions. Let's label them [i,j,k]
I wish to find the points that are not beaten by any other point - in other words, the points for which no other point is at least as good in all three dimensions and strictly better in at least one.
For example, take Object A [10 10 3], Object B [1 1 4], Object C [1 1 1] and Object D [1 1 10].
Then the desired output would be A and D (C is worse than all of them, and although B beats A in dimension [k], B is itself beaten by D in dimension [k] and no better anywhere else).
I've tried some basic comparison algorithms (i.e. if-else statements), which do work when I cut down the database size. But with 50,000 points it takes more than 10 minutes to find the desired output, which of course is not a good solution.
Could somebody recommend me a method or two to do this the fastest possible way?
Thanks
EDIT:
Thanks I think I've got it
You can do many optimizations to your code:
{
    vector<bool> isinterst(n, true);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (isinterst[i]) {
                bool worseelsewhere = false;
                for (int k = 0; k < d; k++)
                {
                    if (point[i][k] < point[j][k])
                    {
                        worseelsewhere = true;
                        break; // you can exit the for loop once worseelsewhere is set to true
                    }
                }
                if (worseelsewhere == false)
                {
                    continue; // skip the rest if worseelsewhere is false
                }
                bool worse = true;
                for (int k = 0; k < d; k++)
                {
                    if (point[i][k] > point[j][k])
                    {
                        worse = false;
                        break; // you can exit the for loop once worse is set to false
                    }
                }
                if (worseelsewhere && worse) {
                    isinterst[i] = false;
                    //cout << i << " Not desirable " << endl;
                }
            }
        }
    }
}
You're looking for the pareto-optimal points. These form a "staircase" frontier; that's easiest to see in 2 dimensions. Use an iterative algorithm to determine the pareto-optimal points of the first N points. For N=1, that's just the first point. For N=2, the next point is either dominated by the first (discard the 2nd), dominates the 1st (discard the 1st), or lies above and to the left or below and to the right (and so is also pareto-optimal).
You can speed up classification by keeping a simplified upper and lower bound for the frontier, e.g. just the single points {minX, minY, minZ} and {maxX, maxY, maxZ}. If P={x,y,z} is dominated by {minX, minY, minZ} then it is dominated by all pareto-optimal points so far and can be discarded. If P dominates {maxX, maxY, maxZ}, it also dominates all points that were pareto-optimal so far and you can discard all of those.
A quick O(N log N) initial step is to first sort the collection by X to find the point with max X, then by Y to find the point with max Y, and finally by Z for max Z. Finding the pareto-optimal points in this 3-point subset is easy and can be hardcoded. You can then use this set as a first approximation.
A more refined solution is to then sort by X+Y, X+Z, Y+Z and X+Y+Z and find those maxima as well. Again, this produces points which are good initial candidates because they will dominate many other points.
E.g. in your case, sorting by X and sorting by Y would both produce point A; sorting by Z would produce point D, neither dominates the other, and you can then quickly discard B and C.
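For reference, a small self-contained sketch of this filtering (the 4-point data set is the one from the question; the rest is my own illustration): sort descending lexicographically, then keep a point only if no already-kept point dominates it. Under this order no later point can dominate an earlier one, so one pass suffices.
#include <algorithm>
#include <array>
#include <cstdio>
#include <vector>

using Point = std::array<int, 3>;

// true if a is at least as good as b in every dimension and better in at least one
static bool dominates(const Point& a, const Point& b) {
    bool geAll = a[0] >= b[0] && a[1] >= b[1] && a[2] >= b[2];
    bool gtAny = a[0] >  b[0] || a[1] >  b[1] || a[2] >  b[2];
    return geAll && gtAny;
}

int main() {
    std::vector<Point> pts = {{10, 10, 3}, {1, 1, 4}, {1, 1, 1}, {1, 1, 10}};  // A, B, C, D

    // Sort descending lexicographically; a point can then only be dominated
    // by points that come before it, so checking against the kept list is enough.
    std::sort(pts.begin(), pts.end(),
              [](const Point& a, const Point& b) { return b < a; });

    std::vector<Point> kept;                       // pareto-optimal points found so far
    for (const Point& p : pts) {
        bool dominated = false;
        for (const Point& q : kept)
            if (dominates(q, p)) { dominated = true; break; }
        if (!dominated) kept.push_back(p);
    }
    for (const Point& p : kept)
        std::printf("[%d %d %d]\n", p[0], p[1], p[2]);   // prints A and D
    return 0;
}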
Without knowing your definition of "better" it's a bit hard to make concrete suggestions here. I note, however, that you appear to be working with spatial data. A data structure that is often used when working with spatial data is the R-Tree (http://en.wikipedia.org/wiki/R-tree). This provides an efficient index for multidimensional information.
Perhaps the boost::geometry library has some tools that will assist: http://www.boost.org/doc/libs/1_53_0/libs/geometry/doc/html/geometry/introduction.html

Bidirectional spanning tree

I came across this question from interviewstreet.com
Machines have once again attacked the kingdom of Xions. The kingdom
of Xions has N cities and N-1 bidirectional roads. The road network is
such that there is a unique path between any pair of cities.
Morpheus has the news that K Machines are planning to destroy the
whole kingdom. These Machines are initially living in K different
cities of the kingdom and anytime from now they can plan and launch an
attack. So he has asked Neo to destroy some of the roads to disrupt
the connection among Machines i.e after destroying those roads there
should not be any path between any two Machines.
Since the attack can be at any time from now, Neo has to do this task
as fast as possible. Each road in the kingdom takes certain time to
get destroyed and they can be destroyed only one at a time.
You need to write a program that tells Neo the minimum amount of time
he will require to disrupt the connection among machines.
Input Format: The first line of the input contains two space-separated
integers, N and K. Cities are numbered 0 to N-1. Then follow N-1
lines, each containing three space-separated integers x y z, which
means there is a bidirectional road connecting city x and city y, and
it takes z units of time to destroy this road. Then follow K lines,
each containing an integer; the ith integer is the id of the city in
which the ith Machine is currently located.
Output Format Print in a single line the minimum time required to
disrupt the connection among Machines.
Sample Input
5 3
2 1 8
1 0 5
2 4 5
1 3 4
2
4
0
Sample Output
10
Explanation Neo can destroy the road connecting city 2 and city 4 of
weight 5 , and the road connecting city 0 and city 1 of weight 5. As
only one road can be destroyed at a time, the total minimum time taken
is 10 units of time. After destroying these roads, none of the Machines
can reach another Machine via any path.
Constraints
2 <= N <= 100,000
2 <= K <= N
1 <= time to destroy a road <= 1,000,000
Can someone give an idea of how to approach the solution?
The kingdom has N cities and N-1 roads and is fully connected, therefore the kingdom is a tree (in the graph-theory sense). In this picture you can see a tree representation of your input graph, in which the Machines are represented by red vertices.
Now consider all paths from the root vertex to the leaf nodes. On every such path there may be several red nodes, and when removing edges you only need to take neighboring red nodes into account. For example, in the path 0-10 there are two meaningful pairs - (0,3) and (3,10). You must remove exactly one edge (not less, not more) from the path connecting the vertices of each pair.
I hope this advice was helpful.
All three answers will lead to a correct solution, but you cannot get it accepted within the time limit set by interviewstreet.com. You have to think of a simpler approach to solve this problem successfully.
HINT: start from the nodes where the Machines are present.
As said by others, a connected graph with N vertices and N-1 edges is a tree.
This kind of problem asks for a greedy solution; I'd go for a modification of Kruskal's algorithm:
Start with a set of N components - 1 for every node (city). Keep track of which components contain a machine-occupied city.
Take one edge (road) at a time, in order of descending weight (starting with the roads most costly to destroy). For this edge (which necessarily connects two components - the graph is a tree):
if both neighboring components contain a machine-occupied city, this road must be destroyed; mark it as such
otherwise, merge the neighboring components into one. If either of them contained a machine-occupied city, so does the merged component.
When you're done with all edges, return the sum of costs for the destroyed roads.
Complexity will be the same as Kruskal's algorithm, that is, almost linear for well chosen data structure and sorting method.
pjotr has a correct answer (though not quite asymptotically optimal), but this statement
This kind of problem asks for a greedy solution
really requires proof, as in the real world (as distinguished from competitive programming) there are several problems of this "kind" for which the greedy solution is not optimal (e.g., this very same problem in general graphs, which is called multiterminal cut and is NP-hard). In this case, the proof consists of verifying the matroid axioms. Let a set of edges A ⊆ E be independent if the graph (V, E ∖ A) has exactly |A| + 1 connected components containing at least one machine.
Independence of the empty set. Trivial.
Hereditary property. Let A be an independent set. Every edge e ∈ A joins two connected components of the graph (V, E ∖ A), and every connected component contains at least one machine. Putting e back into the graph decreases the number of connected components containing at least one machine by 1, so A ∖ {e} is also independent.
Augmentation property. Let A and B be independent sets with |A| < |B|. Since (V, E ∖ B) has more connected components than (V, E ∖ A), there exists by the pigeonhole principle a pair of machines u, v such that u and v are disconnected by B but not by A. Since there is exactly one path from u to v, B contains at least one edge e on this path, and A cannot contain e. Removing A ∪ {e} induces one more connected component containing at least one machine than removing A alone, so A ∪ {e} is independent, as required.
Start performing a DFS from any one of the machine nodes, keeping track of the minimum-weight edge encountered so far. As soon as you reach another node that contains a machine, delete the minimum edge recorded so far, and start a new DFS from that node.
Repeat until you have visited all nodes where machines exist.
That way it should be O(N)!
I wrote some code, and it passed all the tests.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <algorithm>
using namespace std;

class Line {
public:
    Line() {
        begin = 0; end = 0; weight = 0;
    }
    int begin; int end; int weight;
    bool operator<(const Line& _l) const {
        return weight > _l.weight;          // order by descending weight
    }
};

class Point {
public:
    Point() {
        pre = 0; machine = false;
    }
    int pre;            // union-find parent
    bool machine;       // does this component contain a Machine?
};

void DP_Matrix();
void outputLines(Line* lines, Point* points, int N);

int main() {
    DP_Matrix();
    system("pause");
    return 0;
}

// union-find "find" with path compression
int FMSFind(Point* trees, int x) {
    int r = x;
    while (trees[r].pre != r)
        r = trees[r].pre;
    int i = x; int j;
    while (i != r) {
        j = trees[i].pre;
        trees[i].pre = r;
        i = j;
    }
    return r;
}

void DP_Matrix() {
    int N, K, machine_index;
    scanf("%d%d", &N, &K);
    Line* lines = new Line[100000];
    Point* points = new Point[100000];
    N--;                                    // N is now the number of roads
    for (int i = 0; i < N; i++) {
        scanf("%d%d%d", &lines[i].begin, &lines[i].end, &lines[i].weight);
        points[i].pre = i;
    }
    points[N].pre = N;
    for (int i = 0; i < K; i++) {
        scanf("%d", &machine_index);
        points[machine_index].machine = true;
    }
    sort(lines, lines + N);                 // most costly roads first
    long long finalRes = 0;
    for (int i = 0; i < N; i++) {
        int bP = FMSFind(points, lines[i].begin);
        int eP = FMSFind(points, lines[i].end);
        if (points[bP].machine && points[eP].machine) {
            finalRes += lines[i].weight;    // both sides hold a Machine: destroy this road
        } else {
            points[bP].pre = eP;            // otherwise keep the road and merge the components
            points[eP].machine = points[bP].machine || points[eP].machine;
            points[bP].machine = points[eP].machine;
        }
    }
    cout << finalRes << endl;
    delete[] lines;
    delete[] points;
}

void outputLines(Line* lines, Point* points, int N) {
    printf("\nLines:\n");
    for (int i = 0; i < N; i++) {
        printf("%d\t%d\t%d\n", lines[i].begin, lines[i].end, lines[i].weight);
    }
    printf("\nPoints:\n");
    for (int i = 0; i <= N; i++) {
        printf("%d\t%d\t%d\n", i, points[i].machine, points[i].pre);
    }
}

Algorithm to find out all the possible positions

I need an algorithm to find all the possible positions of a group of pieces on a chessboard, i.e. all the possible combinations of the positions of a number N of pieces.
For example, on a chessboard numbered like a Cartesian coordinate system, any piece would be at a position
(x,y) where 1 <= x <= 8 and 1 <= y <= 8
I'd like an algorithm which can calculate, for example for 3 pieces, all the possible positions of the pieces on the board. But I don't know how I can enumerate them in any order. I can get all the possible positions of a single piece, but I don't know how to combine them with more pieces.
for (int i = 1; i <= 8; i++) {
    for (int j = 1; j <= 8; j++) {
        System.out.println("Position: x:" + i + ", y:" + j);
    }
}
How can I get a good algorithm to find all the possible positions of the pieces on a chessboard?
Thanks.
You have an 8x8 board, so 64 squares in total.
Populate a list containing these 64 squares [let it be list], and find all of the possibilities recursively: each step "guesses" one point and invokes a recursive call to find the other points.
Pseudo code:
choose(list, numPieces, sol):
    if (sol.length == numPieces): //base clause: print the possible solution
        print sol
        return
    for each point in list:
        sol.append(point)              //append the point to the end of sol
        list.remove(point)
        choose(list, numPieces, sol)   //recursive call
        list.add(point)                //clean up environment before next recursive call
        sol.removeLast()
Invoke it with choose(list, numPieces, []), where list is the pre-populated list with 64 elements and numPieces is the number of pieces you are going to place.
Note: This solution assumes pieces are not identical, so [(1,2),(2,1)] and [(2,1),(1,2)] are both good different solutions.
EDIT:
Just a word about complexity: since there are (n^2)!/(n^2-k)! possible solutions for your problem - and you are looking for all of them - any algorithm will suffer from exponential run time. Trying to invoke it with just 10 pieces would take ~400 years.
[In the above notation, n is the width and length of the board, and k is the number of pieces]
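A direct translation of the pseudocode above into C++ (my own sketch; a boolean used array stands in for removing squares from the list). It is run with 2 pieces here so the output stays small:
#include <cstdio>
#include <vector>

// Recursively place 'numPieces' distinguishable pieces on distinct squares of an 8x8 board,
// printing every arrangement (so [(1,2),(2,1)] and [(2,1),(1,2)] are listed separately).
void place(int numPieces, std::vector<bool>& used, std::vector<int>& sol) {
    if ((int)sol.size() == numPieces) {                 // base clause: print the solution
        for (int sq : sol)
            std::printf("(%d,%d) ", sq / 8 + 1, sq % 8 + 1);
        std::printf("\n");
        return;
    }
    for (int sq = 0; sq < 64; ++sq) {
        if (used[sq]) continue;
        used[sq] = true;                                // "remove" the square from the list
        sol.push_back(sq);
        place(numPieces, used, sol);
        sol.pop_back();                                 // clean up before the next choice
        used[sq] = false;
    }
}

int main() {
    std::vector<bool> used(64, false);
    std::vector<int> sol;
    place(2, used, sol);     // 2 pieces -> 64 * 63 = 4032 lines of output
    return 0;
}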
You can use a recursive algorithm to generate all possibilities:
void combine(String instr, StringBuffer outstr, int index)
{
    for (int i = index; i < instr.length(); i++)
    {
        outstr.append(instr.charAt(i));
        System.out.println(outstr);
        combine(instr, outstr, i + 1);
        outstr.deleteCharAt(outstr.length() - 1);
    }
}
combine("abc", new StringBuffer(), 0);
As I understand it, you should also consider that some figure may block potential positions of figures that could otherwise reach them on an empty board. I guess that is the trickiest part.
So you should build the set of vertices (board states) that are reachable from a single vertex (the initial board state).
The first algorithm that comes to my mind:
Pre-conditions:
Order the figures in some way to form a circle.
Let the initial set of board states (S0) contain a single element representing the initial board state.
Actions:
Choose the next figure in the circle to extend the set of possible positions.
For each board state in S(n), walk depth-first through all possible movements of that figure to obtain new board states; call this set F(n) (the frame).
Form S(n+1) = S(n) ∪ F(n).
Repeat these steps until the frames produced during a whole pass around the circle are all empty.
This is a kind of mix of breadth-first and depth-first search.
