Finding the minimal coverage of an interval with subintervals - algorithm

Suppose I have an interval (a,b), and a number of subintervals {(ai,bi)}i whose union is all of (a,b). Is there an efficient way to choose a minimal-cardinality subset of these subintervals which still covers (a,b)?

A greedy algorithm starting at a or b always gives the optimal solution.
Proof: consider the set Sa of all the subintervals covering a. Clearly, one of them has to belong to the optimal solution. If we replace it with a subinterval (amax,bmax) from Sa whose right endpoint bmax is maximal in Sa (reaches furthest to the right), the remaining uncovered interval (bmax,b) will be a subset of the remaining interval from the optimal solution, so it can be covered with no more subintervals than the analogous uncovered interval from the optimal solution. Therefore, a solution constructed from (amax,bmax) and the optimal solution for the remaining interval (bmax,b) will also be optimal.
So, just start at a and iteratively pick the interval reaching furthest right (and covering the end of previous interval), repeat until you hit b. I believe that picking the next interval can be done in log(n) if you store the intervals in an augmented interval tree.

Sounds like dynamic programming.
Here's an illustration of the algorithm (assume intervals are in a list sorted by ending time):
//works backwards from the end
int minCard(int current, int must_end_after)
{
if (current < 0)
if (must_end_after == 0)
return 0; //no more intervals needed
else
return infinity; //doesn't cover (a,b)
if (intervals[current].end < must_end_after)
return infinity; //doesn't cover (a,b)
return min( 1 + minCard(current - 1, intervals[current].start),
minCard(current - 1, must_end_after) );
//include current interval or not?
}
But it should also involve caching (memoisation).

There are two cases to consider:
Case 1: There are no over-lapping intervals after the finish time of an interval. In this case, pick the next interval with the smallest starting time and the longest finishing time. (amin, bmax).
Case 2: There are 1 or more intervals overlapping with the last interval you're looking at. In this case, the start time doesn't matter because you've already covered that. So optimize for the finishing time. (a, bmax).
Case 1 always picks the first interval as the first interval in the optimal set as well (the proof is the same as what #RafalDowgrid provided).

You mean so that the subintervals still overlap in such a way that (a,b) remains completely covered at all points?
Maybe splitting up the subintervals themselves into basic blocks associated with where they came from, so you can list options for each basic block interval accounting for other regions covered by the subinterval also. Then you can use a search based on each sub-subinterval and at least be sure no gaps are left.
Then would need to search.. efficiently.. that would be harder.
Could eliminate any collection of intervals that are entirely covered by another set of smaller number and work the problem after the preprocessing.
Wouldn't the minimal for the whole be minimal for at least one half? I'm not sure.
Found a link to a journal but couldn't read it. :(
This would be a hitting set problem and be NP_hard in general.
Couldn't read this either but looks like opposite kind of problem.
Couldn't read it but another link that mentions splitting intervals up.
Here is an available reference on Randomized Algorithms for GeometricOptimization Problems.
Page 35 of this pdf has a greedy algorithm.
Page 11 of Karp (1972) mentions hitting-set and is cited alot.
Google result. Researching was fun but I have to go now.

Related

Finding "maximum" overlapping interval pair in O(nlog(n))

Problem Statement
Input
set of n intervals; {[s_1,t_1], [s_2,t_2], ... ,[s_n,t_n]}.
Output
pair of intervals; {[s_i,t_i],[s_j,t_j]}, with the maximum overlap among all the interval pairs.
Example
input intervals : {[1, 10], [2, 6], [3,15], [5, 9]}
-> There are possible 6 interval pairs. Among those pairs, [1,10] & [3,15] has the largest possible overlap of 7.
output : {[1,10],[3,15]}
A naive algorithm will be a brute force method where all n intervals get compared to each other, while the current maximum overlap value being tracked. The time complexity would be O(n^2) for this case.
I was able to find many procedures regarding interval trees, maximum number of overlapping intervals and maximum set of non-overlapping intervals, but nothing on this problem. Maybe I would be able to use the ideas given in the above algorithms, but I wasn't able to come up with one.
I spent many hours trying to figure out a nice solution, but I think I need some help at this point.
Any suggestions will help!
First, sort the intervals: first by left endpoint in increasing order, then — as a secondary criterion — by right endpoint in decreasing order. For the rest of this answer, I'll assume that the intervals are already in sorted order.
Now, there are two possibilities for what the maximum possible overlap might be:
it may be between an interval and a later interval that it completely covers.
it may be between an interval and the very next interval that it doesn't completely cover.
We can cover both cases in O(n) time by iterating over the intervals, keeping track of the following:
the greatest overlap we've seen so far, and the relevant pair of intervals.
the latest interval we've seen, call it L, that wasn't completely covered by any of its predecessors. (For this, the key insight is that thanks to the ordering of the intervals, we can easily tell if an interval is completely covered by any of its predecessors — and therefore if we need to update L — by simply checking if it's completely covered by the current L. So we can keep L up-to-date in O(1) time.)
and computing each interval's overlap with L.
So:
result := []
max_overlap := 0
L := sorted_intervals[1]
for interval I in sorted_intervals[2..n]:
overlap := MIN(L.right, I.right) - I.left
if overlap >= max_overlap:
result := [L, I]
max_overlap := overlap
if I.right > L.right:
L := I
So the total cost is the cost of sorting the intervals, which is likely to be O(n log n) time but may be O(n) if you can use bucket-sort or radix-sort or similar.

Greedy Algorithm for Finding Min Set of Intervals that Overlap All Other Intervals

I'm learning greedy algorithms and came across a problem that I'm not sure how to tackle. Given a set of intervals (a,b) with start time a and end time b, give a greedy algorithm that returns the minimum amount of intervals that overlap every other interval in the set. So for example if I had:
(1,4) (2,3) (5,8) (6,9) (7,10)
I would return (2,3) and (7,8) since these two intervals cover every interval in the set. What I have right now is this:
Sort the intervals by increasing end time
Push the interval with the smallest end time onto a stack
If an interval (a,b) overlaps the interval on the top of the stack (c,d) (so a is less than d) then if a<=c keep (c,d). Else update the interval on the top of the stack to (a,d)
If an interval (a,b) does not overlap the interval on the top of the stack (c,d) then push (a,b) onto the stack
At the end the stack contains the desired intervals and this should run in O(n) time
My question is: how is this algorithm greedy? I'm struggling with the concepts. So maybe I have this right and maybe I don't, but if I do, I can't figure out what the greedy rule is/should be.
EDIT: A valid point was made below, about which I should have been clearer about. (7,8) works instead of (1,10) (which covers everything) because every time in (7,8) is in (5,8) (6,9) and (7,10). Same with (2,3), every time in there is in (1,4) and (2,3). The goal is to get a set of intervals such that if you looked at all possible times in that set of intervals, each time would be in at least one of the original intervals.
A greedy algorithm is one that repeatedly chooses the best incremental improvement, even though it might turn out to be sub-optimal in the long run.
Your algorithm doesn't seem greedy to me. A greedy algorithm for this problem would be:
Find the interval that is contained in the largest number of intervals from the input set.
Remove the intervals from the input set that contain it.
Repeat until the input set is empty.
For this example, it would first produce (7,8), because it is contained in 3 input intervals, then reduce the input set to (1,4)(2,3), then produce (2,3)
Note that this algorithm doesn't produce the optimal output for input set:
(0,4)(1,2)(1,4)(3,6)(3,7)(5,6)
It produces (3,4) first, since it is covered by 4 input intervals, but the best answer is (1,2)(5,6), which are covered by 3 intervals each

scheduling n people with given time of travel

this is a puzzle but i think it could be a classical algorithm which i am unaware of :
There are n people at the bottom of a mountain, and everyone wants to go up, then down the mountain. Person i takes u[i] time to climb this mountain, and d[i] time to descend it.
However, at same given time atmost 1 person can climb , and .atmost 1 person can descend the mountain. Find the least time to travel up and back down the mountain.
Update 1 :
well i tried with few examples and found that it's not reducible to sorting , or getting the fastest climbers first or vice versa . I think to get optimal solution we may have to try out all possible solutions , so seems to be NP complete.
My initial guess: (WRONG)
The solution i thought is greedy : sort n people by start time in ascending order. Then up jth person up and kth down where u[j]<= d[k] and d[k] is minimum from all k persons on top of mountain. I am not able to prove correctness of this .
Any other idea how to approach ?
A hint would suffice.
Try to think in the following manner: if the people are not sorted in ascending order of time it takes them to climb the mountain than what happens if you find a pair of adjacent people that are not in the correct order(i.e. first one climbs longer than second one) and swap them. Is it possible that the total time increases?
I think it is incorrect. Consider
u = [2,3]
d = [1,3]
Your algorithm gives ordering 0,1 whereas it should be 1,0.
I would suggest another greedy approach:
Create ordering list and add first person.
For current ordering keep track of two values:
mU - time of last person on the mountain - time of the end
mD - time of earliest time of first descending
From people who are not ordered choose the one which minimises abs(mD - d) and abs(mU - u). Then if abs(mD - d) < abs(mU - u) he should go at the beginning of ordering. Otherwise he goes at the end.
Some tweak may still be needed here, but this approach should minimise losses from cases like the one given in the example.
The following solution will only work with n <= 24.
This solution will require dynamic programming and bit-mask technique knowledge to be understood.
Observation: we can easily observe that the optimal total climb up time is fixed, which is equalled to the total climb up time of n people.
For the base case, if n = 1, the solution is obvious.
For n = 2, the solution is simple, just scan through all 4 possibilities and calculate the minimum down time.
For n = 3, we can see that this case will be equal to the case when one person climb up first, followed by two.
And the two person minimum down time can be easily pre-calculated. More important, this two person then can be treated as one person with up time is the total up time of the two, and down time is the minimum down time.
Storing all result for minimum down time for cases from n = 0 to n = 3 in array called 'dp', using bit-mask technique, we represent the state for 3 person as index 3 = 111b, so the result for case n = 3 will be:
for(int i = 0; i < 3; i++){
dp[3] = min(dp[(1<<i)] + dp[3^(1<<i)],dp[3]);
}
For n = 4... 24, the solution will be similar to case n = 3.
Note: The actual formula is not just simple as the code for case n = 3(and it requires similar approach to solve as case n = 2), but will be very similar,
Your approach looks sensible, but it may be over-simplified, could you describe it more precisely here?
From your description, I can't make out whether you are sorting or something else; these are the heuristics that I figured you are using:
Get the fastest climbers first, so the start using the Down path
asap.
Ensure there is always people at the top of the mountain, so
when the Down path becomes available, a person starts descending
immediately.The way you do that is to select first those people who
climb fast and descend slowly.
What if the fastest climber is also the fastest descender? That would leave the Down path idle until the second climber gets to the top, how does your algorithm ensures that this the best order?. I'm not sure that the problem reduces to a Sorting problem, it looks more like a knapsack or scheduling type.

Interval selection algorithm with 2 overlappings at most

I am trying to solve this problem, in which N (N<=1000) intervals are given, and the algorithm should compute the (size) largest subset of intervals such that no three intervals in the subset share a common point. Right now I am looking only for a Dynamic Programming algorithm.
My idea was to sort the intervals by finishing time and then to consider the following subproblem:
A[i] - the solution to the problem (i.e max. number of intervals) using only the first i intervals.
And now my question is about the recurrence: if the ith interval is taken, then I can't figure out the recurrence.
Could someone explain where I did wrong (was it the subproblem definition or do I simply miss something at the recurrence subproblem)?
EDIT (new idea?)
After some research, I have found this DP solution for the general Job Selection algorithm. So the next idea would be to have the q array, where q[i] = the index of the last interval that has no overlapping with the i-th interval.
And then for sure, the formula for computing A[i] when the i-th interval is taken would be something like (note the missing part):
A[i] = 1 + A[q[i]] + [missing-part];
//1 stands for the i-th interval
//A[q[i]] stands for the solution of the intervals that do not overlap with with the i-th interval
//[missing-part] is the maximum number of intervals from the intervals that overlap with the i-th intervals that are safe to be added.
The question remain: how to compute the missing part?
EDIT (greedy solution, not the wanted solution)
The greedy solution is pretty straightforward, very similar to the Job Selection Problem, with the additional check that when adding a new interval, there must remain no unprocessed intervals that break the condition of the problem.

Sub O(n^2) algorithm for counting nested intervals?

We have a list of intervals of the form [ai, bi]. For each interval, we want to count the number of other intervals that are nested within it.
For example, if we had two intervals, A = [1,4] and B = [2,3]. Then the count for B would be 0 as there are no nested intervals for B; and the count for A would be 1 as B fits within A.
My question is, does there exist a sub- O(n2) algorithm for this problem where n is the number of intervals?
EDIT: Here are the conditions the intervals meet. The end points of the intervals are floating point numbers. The lower limit for the ai's/bi's is 0 and the upper limit is whatever max float is. Also, there is the condition that ai < bi, so no intervals of length 0.
Yes, it is possible.
We will borrow the typical computational geometry "scan line" trick.
First, let's answer an easier (but closely related) question. Instead of reporting how many other intervals each interval contains, let's report how many intervals each is contained in. So for your example with only two intervals, interval I0 = [1,4] has value zero because it is contained in zero intervals, while I1 = [2,3] has value one because it is contained in one interval.
You will see in a minute (a) why this question is easier and (b) how it leads to the answer for the original question.
To solve this easier question: Take all starting and ending points -- all of the ai and bi -- and put them into a master list. Call each element of this list an "event". So an event would be something like "interval I37 started" or "interval I23 ended".
Sort this list of events and process it in order.
As you process the list of events, maintain a set S of "active intervals". An interval is "active" if we have encountered its start event but not its ending event; that is, if we are within that interval.
Now, whenever we see an ending event bj, we are ready to compute how many intervals contain Ij (= [aj, bj]). All we need to do is examine the set S of active intervals and determine how many of them started before aj. That is our answer for how many intervals contain interval Ij.
To do this efficiently, keep S itself sorted by starting point; e.g., by using a self-balancing binary tree.
Sorting the list of events is O(2n log 2n) = O(n log n). Adding or removing an element from a self-balancing binary tree is O(log n). Asking "how many elements of the self-balancing binary tree are less than x?" is also O(log n). Therefore this entire algorithm is O(n log n).
So, that solves the easy question. Call that the "easy algorithm". Now for what you actually asked.
Think of the number line as extending to infinity and wrapping around to -infinity, and define an interval with bi < ai to start at ai, stretch to infinity, wrap to minus infinity, and end at bi.
For any interval Ij = [aj, bj], define Complement(Ij) as the interval [bj, aj]. (For example, the interval [2, 3] starts at 2 and ends at 3; so Complement([2,3]) = [3,2] starts at 3, stretches to infinity, wraps to -infinity, and ends at 2.)
Observe that interval I contains interval J if and only if Complement(J) contains Complement(I). (Prove this.)
So, we can answer your original question simply by running the "easy algorithm" on the set of complements of all of the intervals. That is, start your scan at -infinity with the set S of "active intervals" containing all intervals (because all complements contain infinity/-infinity). Keep S sorted by end point (i.e. start point of complement).
Sort all start points and end points and process them in order. When you encounter a starting point for interval Ij (= [aj, bj]), you are actually hitting the end point of its complement... So remove Ij from S, query S to see how many of its endpoints (i.e. complement start points) come before bj, and report that as the answer for Ij. If you later encounter the end point of Ij, you are encountering the start point of its complement, so you need to add it back into the set S of active intervals.
This final algorithm is O(n log n) for the same reasons the "easy algorithm" was.
[Update]
One clarification, one correction, one comment...
Clarification: Of course, the "self-balancing binary tree" has to be augmented such that each sub-tree knows how many elements it contains. Otherwise, you cannot answer "how many elements are less than x?" This augmentation is straightforward to maintain, but it is not something that every implementation provides; e.g. the C++ std::set does not, to my knowledge.
Correction: You do not want to add any elements back in to the set S of active intervals; in fact, doing so can result in the wrong answer. For example, if the intervals are just [1,2] and [3,4], you would hit 1 (and remove [1,2] from the set), then 2 (and add it back in again), then 3... And since 2<4, you would conclude that [3,4] contains [1,2]. Which is wrong.
Conceptually, you already processed all of the "start events" for the complement intervals; that is why S begins will all intervals inside of it. So all you need to worry about are the ending points; you do not want to add any elements to S, ever.
Put another way, instead of having the intervals wrap around, you can think of [bi,ai] (where bi > ai) as meaning [bi - infinity, ai] with no wrap-around. The logic still works, but the processing is more clear: First you process all of the "whatever - infinity" terms (i.e. the end points), then you process the others (i.e. the start points).
With this correction, I am pretty sure my solution actually works. This formulation also extends -- I think -- to the case where you have both normal and "backward" intervals together in one input.
Comment: This problem is tricky because if you have to enumerate the set of all intervals contained within every interval, the output itself can be O(n^2). So any working approach has to somehow count the intervals without even being able to identify them :-).
Here is a O(N*LOG(N)):
let Ii = Interval i = (ai, bi)
let L = list of intervals I
sort L by ai
divide L in half into L1a and L2a.
sort L1a and L2a by bi to get L1b and L2b
merge sort L1b and L2b keeping track of the count of nestings (e.g. because all intervals in L1b start before intervals in L2b, when we find an endpoint in L1b that is higher than an endpoint in l2b, we know everything between them is nested inside - think about it)..
Now you have updated the counts on how often an interval in L2 is nested inside an interval in L1.
after merging L1 and L2, we repeat the process (recursion) by dividing L1 into L11a and l12a, also dividing L2 into L21a and L21a..

Resources