Does greedily removing intervals with most conflicts solve interval scheduling? - algorithm

We can solve the interval scheduling problem, in which we must select the largest set of continuous intervals that do not overlap, with a greedy algorithm: we just keep picking the interval that ends the earliest among those that do not overlap what we have already picked: http://en.wikipedia.org/wiki/Interval_scheduling
Apparently, greedily picking the intervals with fewest conflicts does not work.
I was wondering if putting all the intervals in one big set and then greedily removing the interval with the largest number of remaining conflicts (until the intervals have no conflicts) works. I can envision implementing this greedy algorithm with a priority queue: every time we remove the interval X with the most conflicts from the priority queue, we update the other intervals that used to conflict with X so that they are now marked as having 1 less conflict.
Does this work? I'm trying to come up with a counterexample to disprove it and can't.

Here is a counterexample.
The idea is to drop a required interval on the very first pick.
The number of conflicts is on the right.
====             2
   ----          3
   ----          3
      ====       4
         ----    3
         ----    3
            ==== 2
Obviously, we will want to pick the three bold (====) intervals and drop the four thin (----) intervals.
There is no other way to obtain three non-intersecting intervals.
By the way, you may find the TopCoder tutorial on greedy problems interesting since it starts with a discussion of several approaches on the same problem.
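The counterexample above can be checked mechanically. Below is a minimal Python sketch, assuming half-open intervals and concrete coordinates that I picked to reproduce the conflict counts 2, 3, 3, 4, 3, 3, 2 (the coordinates and the function names are illustrative, not part of the original answer). It uses a plain list scan instead of a priority queue, which does not change which interval gets removed.

```python
def conflicts(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def conflict_counts(intervals):
    """Number of other intervals each interval overlaps (by position)."""
    return [
        sum(conflicts(x, y) for j, y in enumerate(intervals) if j != i)
        for i, x in enumerate(intervals)
    ]


def remove_most_conflicts(intervals):
    """Repeatedly drop an interval with the most remaining conflicts."""
    remaining = list(intervals)
    while True:
        counts = conflict_counts(remaining)
        if not counts or max(counts) == 0:
            return remaining
        del remaining[counts.index(max(counts))]


# Assumed coordinates: bold intervals first, middle and last; thin ones doubled.
bold = [(0, 4), (6, 10), (12, 16)]
example = [(0, 4), (3, 7), (3, 7), (6, 10), (9, 13), (9, 13), (12, 16)]

print(conflict_counts(example))             # [2, 3, 3, 4, 3, 3, 2]
print(len(remove_most_conflicts(example)))  # 2, although the 3 bold ones fit
```

The first removal is the middle bold interval (4 conflicts), after which no conflict-free set of size three exists.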

Related

Problems that involve time intervals and their overlapping

I have recently come across a lot of questions that involve time intervals as input. Some of the time intervals overlap, and depending on that you have to perform an optimization, maximization or minimization operation on the input. I am not able to solve such problems. In fact, I am not even able to start thinking about them.
Here is an example:
Let us say, you are a resource holder. There can be an infinite supply of such a resource.
There are people who want that resource for a particular time interval. For ex: 4 pm to 8 pm
There can be an overlapping interval. ex: 5 pm to 7 pm, 3 pm to 6 pm
etc.
Depending upon these intervals, and their overlapping nature, you have to figure out how many distinct instances of these resources are required.
Ex. Input:
8:00 am to 9:00 am
8:30 am to 9:15 am
9:30 am to 10:40 am
In this case, the first two intervals overlap. So two instances of resources will be required. The third interval is not overlapping, so the person with that interval can reuse the resource returned by any of the earlier ones.
Hence, in this case, minimum resources required are 2.
I don't need a solution. I need some pointers on how to solve it. Are there any algorithms that address such questions? What should I read or study? Are there any data structures that might help?
The number of intervals overlapping any time instant T is the number of interval start times less than T, minus the number of interval end times less than or equal to T.
Many of these problems, like the specific one above, can be solved by putting the start and end times separately into a sorted list or tree so you can figure out stuff about how these counts change over time.
To solve this problem, for example, put the start and end times in a single list:
800S, 900E, 830S, 915E, 930S, 1040E
then sort them:
800S, 830S, 900E, 915E, 930S, 1040E
Then run through the list and count, adding 1 for each start time and subtracting 1 for each end time:
1 2 1 0 1 0
The highest number of overlapping intervals is 2.
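The counting procedure above can be sketched in a few lines of Python. This is a minimal sketch, assuming times are given as plain numbers (such as 830 for 8:30) and that an interval ending at time T does not overlap one starting at T; the function name is illustrative.

```python
def max_overlap(intervals):
    """Largest number of intervals that are active at the same instant."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a start raises the count
        events.append((end, -1))    # an end lowers it
    # Tuples sort by time first; at equal times (t, -1) comes before (t, 1),
    # so an end is processed before a start at the same instant.
    events.sort()
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


print(max_overlap([(800, 900), (830, 915), (930, 1040)]))  # 2
```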
The data structure you need in order to solve this type of problem is the interval graph. The interval graph has a vertex for every interval and an edge between every pair of vertices whose intervals intersect.
The interval graph for the three intervals in your example has vertices A, B and C and a single edge A-B, since only A and B overlap:
A: 8:00-9:00
B: 8:30-9:15
C: 9:30-10:40
This data structure captures the relevant aspects of most problems involving intervals and thus helps to solve them efficiently. Also, given the set of intervals (represented by a list of 2-tuples), you can construct the interval graph in polynomial time.
Many problems that are NP-hard in general graphs, such as finding the Maximum Weight Independent Set or finding the Optimal Coloring, can be efficiently solved for interval graphs.
To solve the particular problem you've specified, first construct the interval graph G, storing for each vertex the start time of its corresponding interval. Also initialize a set of resources R={1} that at first contains only a single resource: resource number 1. Consider the vertices v of G in order of increasing start time. (Start-time order matters: every already-processed neighbor of v then contains v's start time, so those neighbors all overlap one another and genuinely need distinct resources. Ordering by finish time can waste a resource, for example on (0,2), (1,4), (5,6), (3,10).) Assign to v resource number i, where i is the smallest resource in R not used by the neighbors of v. If no such resource exists (because the neighbors of v use all the resources in R), insert a new resource i=max{R}+1 into R and assign it to v. The optimal number of resources (that is, the solution to your problem) is the size of the set R.
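A minimal Python sketch of this greedy resource assignment follows. It assumes half-open intervals and processes the vertices by start time, the order under which the already-assigned neighbors of each vertex all contain its start and therefore mutually overlap. It checks overlaps directly rather than materializing the graph, and the function name is illustrative.

```python
def assign_resources(intervals):
    """Give each interval the smallest resource number unused by its neighbors."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    resource = {}  # interval index -> resource number (1-based)
    for v in order:
        start, end = intervals[v]
        # Resources held by already-processed intervals that overlap v.
        used = {
            resource[u]
            for u in resource
            if intervals[u][0] < end and start < intervals[u][1]
        }
        r = 1
        while r in used:
            r += 1
        resource[v] = r
    return [resource[i] for i in range(len(intervals))]


print(assign_resources([(800, 900), (830, 915), (930, 1040)]))  # [1, 2, 1]
print(assign_resources([(0, 2), (1, 4), (5, 6), (3, 10)]))      # [1, 2, 2, 1]
```

The number of resources needed is the largest number in the returned list. This sketch is quadratic; a heap of currently busy resources would be the usual way to speed it up.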

Activity selection with two resources

Given n activities with start time (Si) and end time (Fi) and 2 resources.
Pick the activities such that maximum number of activities are finished.
My ideas
I tried to solve it with DP but couldn't figure out a formulation, so I am trying a greedy approach.
Approach 1: Fill resource 1 greedily first, and then fill resource 2 greedily (least end time first). But this will not work for this case: T1(1,4) T2(5,10) T3(6,12) T4(11,15)
Approach 2: Select tasks greedily and assign them in round-robin fashion.
This will also not work.
Can anyone please help me in figuring out this?
No need to use DP at all, a Greedy solution suffices, though it is slightly more complicated than the 1-resource problem.
Here, we first sort the intervals by ending time, earliest first. Then, put two "sentinel" intervals in the resources, both with ending time -∞. Then, keep grabbing the interval x with the lowest x.end, and follow these rules:
1. If x.start is before both of the ending times in our two resources, skip x and don't assign it, since x cannot fit.
2. Otherwise, have x overwrite the resource whose endpoint is latest and still before x.start.
The greedy strategy in rule 2 is the key point here: we want to replace the latest ending used resource, since that maximizes the "space" that we have in the other resource to accommodate some future interval with an early start time, making it strictly more likely that future interval will be able to fit.
Let's look at the example in the question, with intervals (1,4), (5,10), (6,12), and (11,15) already in sorted order. We begin with both resources having (-∞,-∞) as "sentinel" intervals. Now take the first interval (1,4), and see that it fits, so now we have resource 1 having (1,4) and resource 2 having (-∞,-∞). Next, take (5,10), which can fit in both resources, so we choose resource 1, because it ends the latest, and now resource 1 has (5,10). Next, we take (6,12), which only fits in resource 2, so resource 2 has (6,12). Finally, take (11,15), which fits in resource 1.
Hence, we have been able to fit all four intervals using our Greedy strategy.
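The two rules above can be sketched as follows. This is a minimal Python sketch, assuming an activity may start at the exact moment the previous one on the same resource ends; the function name and the 0/1 resource labels are illustrative.

```python
import math


def schedule_two_resources(activities):
    """Greedy for two resources: returns a list of ((start, end), resource)."""
    ends = [-math.inf, -math.inf]  # sentinel ending time for each resource
    assignment = []
    for start, end in sorted(activities, key=lambda a: a[1]):
        # Resources that are free by the time this activity starts.
        fitting = [r for r in (0, 1) if ends[r] <= start]
        if not fitting:
            continue  # rule 1: fits nowhere, skip it
        # Rule 2: overwrite the fitting resource that ends the latest.
        r = max(fitting, key=lambda k: ends[k])
        ends[r] = end
        assignment.append(((start, end), r))
    return assignment


result = schedule_two_resources([(1, 4), (5, 10), (6, 12), (11, 15)])
print(len(result))               # 4
print([r for _, r in result])    # [0, 0, 1, 0]
```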
The single-resource activity selection problem can be solved by the Greedy-Iterative-Activity-Selector algorithm.
The basic idea is to always pick the next activity whose finish time is least among the remaining activities and the start time is more than or equal to the finish time of previously selected activity. We can sort the activities according to their finishing time so that we always consider the next activity as minimum finishing time activity.
See more on Wikipedia.
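For reference, here is a minimal Python sketch of that single-resource selector, assuming activities are (start, finish) pairs and that an activity may start exactly when the previous one finishes. The function name is illustrative.

```python
def select_activities(activities):
    """Earliest-finish-first selection for a single resource."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:  # compatible with everything chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected


print(select_activities([(1, 4), (5, 10), (6, 12), (11, 15)]))
# [(1, 4), (5, 10), (11, 15)]
```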

Greedy Algorithm for Finding Min Set of Intervals that Overlap All Other Intervals

I'm learning greedy algorithms and came across a problem that I'm not sure how to tackle. Given a set of intervals (a,b) with start time a and end time b, give a greedy algorithm that returns the minimum number of intervals that overlap every other interval in the set. So for example if I had:
(1,4) (2,3) (5,8) (6,9) (7,10)
I would return (2,3) and (7,8) since these two intervals cover every interval in the set. What I have right now is this:
Sort the intervals by increasing end time
Push the interval with the smallest end time onto a stack
If an interval (a,b) overlaps the interval on the top of the stack (c,d) (so a is less than d) then if a<=c keep (c,d). Else update the interval on the top of the stack to (a,d)
If an interval (a,b) does not overlap the interval on the top of the stack (c,d) then push (a,b) onto the stack
At the end the stack contains the desired intervals, and this should run in O(n) time after the O(n log n) sort
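A direct Python transcription of these steps, as a sketch only: it mirrors the procedure as stated and makes no claim that the procedure is optimal in general. The function name is illustrative.

```python
def stack_pieces(intervals):
    """Stack procedure from the steps above; returns the final stack."""
    stack = []
    for a, b in sorted(intervals, key=lambda iv: iv[1]):
        if stack and a < stack[-1][1]:
            # (a, b) overlaps the top (c, d): narrow the top only if a > c.
            c, d = stack[-1]
            if a > c:
                stack[-1] = (a, d)
        else:
            stack.append((a, b))
    return stack


print(stack_pieces([(1, 4), (2, 3), (5, 8), (6, 9), (7, 10)]))
# [(2, 3), (7, 8)]
```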
My question is: how is this algorithm greedy? I'm struggling with the concepts. So maybe I have this right and maybe I don't, but if I do, I can't figure out what the greedy rule is/should be.
EDIT: A valid point was made below, about which I should have been clearer. (7,8) works instead of (1,10) (which covers everything) because every time in (7,8) is in (5,8), (6,9) and (7,10). Same with (2,3): every time in there is in (1,4) and (2,3). The goal is to get a set of intervals such that if you looked at all possible times in that set of intervals, each time would be in at least one of the original intervals.
A greedy algorithm is one that repeatedly chooses the best incremental improvement, even though it might turn out to be sub-optimal in the long run.
Your algorithm doesn't seem greedy to me. A greedy algorithm for this problem would be:
Find the interval that is contained in the largest number of intervals from the input set.
Remove the intervals from the input set that contain it.
Repeat until the input set is empty.
For this example, it would first produce (7,8), because it is contained in 3 input intervals, then reduce the input set to (1,4)(2,3), then produce (2,3)
Note that this algorithm doesn't produce the optimal output for input set:
(0,4)(1,2)(1,4)(3,6)(3,7)(5,6)
It produces (3,4) first, since it is covered by 4 input intervals, and then still needs (1,2) and (5,6) for the two intervals that remain, for a total of three. The best answer is (1,2)(5,6), which are covered by 3 intervals each.
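Both runs described above can be reproduced with a small Python sketch. It assumes candidate pieces are the segments between consecutive endpoints of the remaining intervals and that ties go to the leftmost segment; both choices, and the function name, are mine rather than the answer's.

```python
def greedy_deepest_first(intervals):
    """Repeatedly pick the segment contained in the most remaining intervals."""
    remaining = list(intervals)
    picked = []
    while remaining:
        points = sorted({p for iv in remaining for p in iv})
        best_seg, best_cover = None, []
        for lo, hi in zip(points, points[1:]):
            # Remaining intervals that contain the whole segment (lo, hi).
            cover = [iv for iv in remaining if iv[0] <= lo and hi <= iv[1]]
            if len(cover) > len(best_cover):
                best_seg, best_cover = (lo, hi), cover
        picked.append(best_seg)
        remaining = [iv for iv in remaining if iv not in best_cover]
    return picked


print(greedy_deepest_first([(1, 4), (2, 3), (5, 8), (6, 9), (7, 10)]))
# [(7, 8), (2, 3)]
print(greedy_deepest_first([(0, 4), (1, 2), (1, 4), (3, 6), (3, 7), (5, 6)]))
# [(3, 4), (1, 2), (5, 6)]
```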

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Every day from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and making sure that nothing goes wrong.
There are currently n applicants to the job, and each of them can work from time si to time ci, i = 1, 2, ..., n.
My goal is to minimize the time during which two or more people are keeping watch over the workers at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job and keeping those applicants is my first step, but finding the next step is what troubles me.
The algorithm must run in polynomial-time.
Any hints (a certain type of data structure, maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from start of day up to ci?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If si is equal to the start of day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j whose shift is still running when i starts (cj ≥ si, so that there is no gap in coverage). Set cost(i) to the minimum of cost(j) + overlap between i and j. Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) for all values of k where ck is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
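A minimal Python sketch of this dynamic program, under some assumptions of mine: shifts are (s, c) pairs already clipped to the working day, "previous" means "ends no later", and the overlap of two shifts is the length of their intersection. The function name is illustrative.

```python
def min_overlap_schedule(shifts, day_start, day_end):
    """Return (minimum overlap, chain of shift indices), or None if no cover."""
    inf = float("inf")
    order = sorted(range(len(shifts)), key=lambda i: shifts[i][1])
    cost, prev = {}, {}
    for pos, i in enumerate(order):
        s_i, c_i = shifts[i]
        prev[i] = None
        if s_i <= day_start:
            cost[i] = 0  # starts the day, no overlap required
            continue
        cost[i] = inf
        for j in order[:pos]:  # applicants whose shift ends no later than i's
            s_j, c_j = shifts[j]
            if c_j < s_i:
                continue  # j leaves before i arrives: coverage gap
            overlap = min(c_i, c_j) - max(s_i, s_j)
            if cost[j] + overlap < cost[i]:
                cost[i], prev[i] = cost[j] + overlap, j
    finals = [i for i in order if shifts[i][1] >= day_end and cost[i] < inf]
    if not finals:
        return None
    k = min(finals, key=lambda i: cost[i])
    best_cost, chain = cost[k], []
    while k is not None:  # backtrack through prev
        chain.append(k)
        k = prev[k]
    return best_cost, chain[::-1]


print(min_overlap_schedule([(9, 12), (11, 15), (14, 17), (11, 14)], 9, 17))
# (1, [0, 3, 2])
```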

Finding the minimal coverage of an interval with subintervals

Suppose I have an interval (a,b), and a number of subintervals {(ai,bi)}i whose union is all of (a,b). Is there an efficient way to choose a minimal-cardinality subset of these subintervals which still covers (a,b)?
A greedy algorithm starting at a or b always gives the optimal solution.
Proof: consider the set Sa of all the subintervals covering a. Clearly, one of them has to belong to the optimal solution. If we replace it with a subinterval (amax,bmax) from Sa whose right endpoint bmax is maximal in Sa (reaches furthest to the right), the remaining uncovered interval (bmax,b) will be a subset of the remaining interval from the optimal solution, so it can be covered with no more subintervals than the analogous uncovered interval from the optimal solution. Therefore, a solution constructed from (amax,bmax) and the optimal solution for the remaining interval (bmax,b) will also be optimal.
So, just start at a and iteratively pick the interval reaching furthest right (among those covering the end of the previous interval), and repeat until you hit b. I believe that picking the next interval can be done in O(log n) if you store the intervals in an augmented interval tree.
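The greedy above can be sketched in Python with a single pass over the subintervals sorted by left endpoint, instead of an interval tree. This is a minimal sketch, assuming that subintervals which merely touch at an endpoint count as covering it; the function name is illustrative.

```python
def min_cover(a, b, subintervals):
    """Fewest subintervals covering [a, b], or None if there is a gap."""
    ivs = sorted(subintervals)  # by left endpoint
    chosen = []
    covered = a                 # everything in [a, covered] is covered so far
    i = 0
    while covered < b:
        best = None
        # Among intervals starting at or before the frontier, keep the one
        # that reaches furthest right.
        while i < len(ivs) and ivs[i][0] <= covered:
            if best is None or ivs[i][1] > best[1]:
                best = ivs[i]
            i += 1
        if best is None or best[1] <= covered:
            return None         # nothing extends the covered part
        chosen.append(best)
        covered = best[1]
    return chosen


print(min_cover(0, 10, [(0, 3), (0, 5), (2, 6), (4, 9), (5, 10), (8, 10)]))
# [(0, 5), (5, 10)]
```

Intervals skipped in one round never need to be revisited, because each of them ends no further right than the interval chosen in that round.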
Sounds like dynamic programming.
Here's an illustration of the algorithm (assume intervals are in a list sorted by ending time):
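#include <algorithm>
#include <vector>

// Names below (Interval, A, INF) are illustrative assumptions.
struct Interval { int start, end; };
std::vector<Interval> intervals;  // must be sorted by ending time
const int A = 0;                  // left end a of the target interval (a,b)
const int INF = 1000000000;       // stands in for infinity; INF + n must not overflow

// Works backwards from the end; initial call: minCard(intervals.size() - 1, b).
int minCard(int current, int must_end_after)
{
    if (must_end_after <= A)
        return 0;    // everything down to a is covered, no more intervals needed
    if (current < 0 || intervals[current].end < must_end_after)
        return INF;  // nothing left that reaches must_end_after: (a,b) not covered
    // Include the current interval, or skip it?
    return std::min(1 + minCard(current - 1, intervals[current].start),
                    minCard(current - 1, must_end_after));
}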
But it should also involve caching (memoisation).
There are two cases to consider:
Case 1: There are no over-lapping intervals after the finish time of an interval. In this case, pick the next interval with the smallest starting time and the longest finishing time. (amin, bmax).
Case 2: There are 1 or more intervals overlapping with the last interval you're looking at. In this case, the start time doesn't matter because you've already covered that. So optimize for the finishing time. (a, bmax).
Case 1 always picks the first interval as the first interval in the optimal set as well (the proof is the same as what @RafalDowgrid provided).
You mean so that the subintervals still overlap in such a way that (a,b) remains completely covered at all points?
Maybe splitting up the subintervals themselves into basic blocks associated with where they came from, so you can list options for each basic block interval accounting for other regions covered by the subinterval also. Then you can use a search based on each sub-subinterval and at least be sure no gaps are left.
Then would need to search.. efficiently.. that would be harder.
Could eliminate any collection of intervals that are entirely covered by another set of smaller number and work the problem after the preprocessing.
Wouldn't the minimal for the whole be minimal for at least one half? I'm not sure.
Found a link to a journal but couldn't read it. :(
This would be a hitting set problem, which is NP-hard in general.
Couldn't read this either but looks like opposite kind of problem.
Couldn't read it but another link that mentions splitting intervals up.
Here is an available reference on Randomized Algorithms for Geometric Optimization Problems.
Page 35 of this pdf has a greedy algorithm.
Page 11 of Karp (1972) mentions hitting set and is cited a lot.
Google result. Researching was fun but I have to go now.

Resources