Insert Interval into a disjoint set of intervals - algorithm

Given a sorted list of disjoint intervals (p,q), where ‘p’ is the start time and ‘q’ is the end time, and one input interval, insert the input interval in the right place and return the resulting sorted list of disjoint intervals.
Eg: (1,4);(5,7);(8,10);(13,18)
Input interval – (3,7)
Result : (1,7);(8,10);(13,18)
Input Interval – (1,3)
Result: (1,4);(5,7);(8,10);(13,18)
Input interval – (11,12)
Result: (1,4);(5,7);(8,10);(11,12);(13,18)
Inserting an interval in a sorted list of disjoint intervals , there is no efficient answer here

Your question and examples imply non-overlapping intervals. In this case you can just perform a binary search - whether comparison is done by start time or end time does not matter for non-overlapping intervals - and insert the new interval at the position found if not already present.
UPDATE
I missed the merging occurring in the first example. A bad case is inserting a large interval into a long list of short intervals where the long interval overlaps many short intervals. To avoid a linear search for all intervals that have to be merged one could perform two binary searches - one from the left comparing by start time and one from the right comparing by the end time.
Now it is trivial to decide if the interval is present, must be inserted or must be merged with the intervals between the positions found by the two searches. While this is not very complex it is probably very prone to off-by-one errors and requires some testing.
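A minimal sketch of the two-binary-search idea, assuming the intervals are (start, end) tuples in a Python list and that touching intervals are merged. The function name insert_interval is made up for illustration. Building the starts and ends lists and slicing the result are linear here, so only the searches themselves are logarithmic.

```python
import bisect

def insert_interval(intervals, new):
    """Insert `new` into a sorted list of disjoint (start, end) tuples."""
    p, q = new
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    # First interval whose end is >= p (search by end time).
    lo = bisect.bisect_left(ends, p)
    # One past the last interval whose start is <= q (search by start time).
    hi = bisect.bisect_right(starts, q)
    if lo < hi:
        # Intervals lo..hi-1 overlap the new one: absorb them.
        p = min(p, starts[lo])
        q = max(q, ends[hi - 1])
    return intervals[:lo] + [(p, q)] + intervals[hi:]
```

When lo equals hi nothing overlaps and the new interval is simply inserted at that position; otherwise everything between the two positions collapses into one interval.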

Related

How to sort intervals into minimum number of bins without overlapping intervals

I'm looking for an algorithm to achieve the following:
Given an arbitrary set of "intervals", where an interval is defined simply by a start and an end (2 floating point numbers with end >= start). I would like to organise these intervals into 1 or more "bins"/"buckets"/groups such that:
No two intervals within a single bin overlap each other
The minimum number of bins is used
My solution has been to iterate the intervals and for each one, essentially perform a binary search on each bin until one is found which can accommodate the interval (new bin if necessary). This works but I'm left wondering if it can be optimised because depending on the order in which the intervals are processed, the result is different. I have a feeling that sorting the intervals from largest to smallest (end - start) gives better results but I'm not certain.
Sort the intervals by their end point, and then take each interval in end-point order, and put it into the bucket with the largest right-most end point, such that bucket.max_endpoint < interval.startpoint. If there is no such bucket, then you have to start a new one.
If you keep the buckets in a binary search tree sorted by max_endpoint, then you can find the best one in log(|buckets|) time, for O(N log N) altogether.
The proof that choosing the tightest-fit bucket for each interval is optimal is simple:
Imagine that you already know an optimal assignment of intervals to buckets, and you put them into the buckets in endpoint order, and at some point you don't choose the tightest-fit bucket...
If you change your mind and choose the tightest-fit bucket instead, you'll be in the same situation, except that one of the buckets will have more room on the right. It never hurts to have more room, so the remaining intervals will still fit.
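A rough sketch of the tightest-fit rule described above, assuming intervals are (start, end) tuples. It keeps the bucket endpoints in a sorted Python list instead of a binary search tree, so finding the bucket is a binary search but removing it from the list is linear; the name pack_intervals is hypothetical.

```python
import bisect

def pack_intervals(intervals):
    """Group intervals into bins so that no two intervals in a bin overlap."""
    bins = []   # bins[i] is the list of intervals placed in bin i
    ends = []   # sorted list of (max_endpoint, bin_index), one entry per bin
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Number of bins whose max_endpoint is strictly less than start.
        pos = bisect.bisect_left(ends, (start, -1))
        if pos == 0:
            # No bin has room: open a new one.
            bins.append([(start, end)])
            bisect.insort(ends, (end, len(bins) - 1))
        else:
            # Tightest fit: the bin with the largest max_endpoint < start.
            _, idx = ends.pop(pos - 1)
            bins[idx].append((start, end))
            bisect.insort(ends, (end, idx))
    return bins
```

For example, pack_intervals([(1, 3), (2, 5), (4, 6), (7, 8)]) puts (1, 3), (4, 6) and (7, 8) in one bin and (2, 5) in another.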

Sub O(n^2) algorithm for counting nested intervals?

We have a list of intervals of the form [ai, bi]. For each interval, we want to count the number of other intervals that are nested within it.
For example, if we had two intervals, A = [1,4] and B = [2,3]. Then the count for B would be 0 as there are no nested intervals for B; and the count for A would be 1 as B fits within A.
My question is, does there exist a sub-O(n^2) algorithm for this problem where n is the number of intervals?
EDIT: Here are the conditions the intervals meet. The end points of the intervals are floating point numbers. The lower limit for the ai's/bi's is 0 and the upper limit is whatever max float is. Also, there is the condition that ai < bi, so no intervals of length 0.
Yes, it is possible.
We will borrow the typical computational geometry "scan line" trick.
First, let's answer an easier (but closely related) question. Instead of reporting how many other intervals each interval contains, let's report how many intervals each is contained in. So for your example with only two intervals, interval I0 = [1,4] has value zero because it is contained in zero intervals, while I1 = [2,3] has value one because it is contained in one interval.
You will see in a minute (a) why this question is easier and (b) how it leads to the answer for the original question.
To solve this easier question: Take all starting and ending points -- all of the ai and bi -- and put them into a master list. Call each element of this list an "event". So an event would be something like "interval I37 started" or "interval I23 ended".
Sort this list of events and process it in order.
As you process the list of events, maintain a set S of "active intervals". An interval is "active" if we have encountered its start event but not its ending event; that is, if we are within that interval.
Now, whenever we see an ending event bj, we are ready to compute how many intervals contain Ij (= [aj, bj]). All we need to do is examine the set S of active intervals and determine how many of them started before aj. That is our answer for how many intervals contain interval Ij.
To do this efficiently, keep S itself sorted by starting point; e.g., by using a self-balancing binary tree.
Sorting the list of events is O(2n log 2n) = O(n log n). Adding or removing an element from a self-balancing binary tree is O(log n). Asking "how many elements of the self-balancing binary tree are less than x?" is also O(log n). Therefore this entire algorithm is O(n log n).
So, that solves the easy question. Call that the "easy algorithm". Now for what you actually asked.
Think of the number line as extending to infinity and wrapping around to -infinity, and define an interval with bi < ai to start at ai, stretch to infinity, wrap to minus infinity, and end at bi.
For any interval Ij = [aj, bj], define Complement(Ij) as the interval [bj, aj]. (For example, the interval [2, 3] starts at 2 and ends at 3; so Complement([2,3]) = [3,2] starts at 3, stretches to infinity, wraps to -infinity, and ends at 2.)
Observe that interval I contains interval J if and only if Complement(J) contains Complement(I). (Prove this.)
So, we can answer your original question simply by running the "easy algorithm" on the set of complements of all of the intervals. That is, start your scan at -infinity with the set S of "active intervals" containing all intervals (because all complements contain infinity/-infinity). Keep S sorted by end point (i.e. start point of complement).
Sort all start points and end points and process them in order. When you encounter a starting point for interval Ij (= [aj, bj]), you are actually hitting the end point of its complement... So remove Ij from S, query S to see how many of its endpoints (i.e. complement start points) come before bj, and report that as the answer for Ij. If you later encounter the end point of Ij, you are encountering the start point of its complement, so you need to add it back into the set S of active intervals.
This final algorithm is O(n log n) for the same reasons the "easy algorithm" was.
[Update]
One clarification, one correction, one comment...
Clarification: Of course, the "self-balancing binary tree" has to be augmented such that each sub-tree knows how many elements it contains. Otherwise, you cannot answer "how many elements are less than x?" This augmentation is straightforward to maintain, but it is not something that every implementation provides; e.g. the C++ std::set does not, to my knowledge.
Correction: You do not want to add any elements back in to the set S of active intervals; in fact, doing so can result in the wrong answer. For example, if the intervals are just [1,2] and [3,4], you would hit 1 (and remove [1,2] from the set), then 2 (and add it back in again), then 3... And since 2<4, you would conclude that [3,4] contains [1,2]. Which is wrong.
Conceptually, you already processed all of the "start events" for the complement intervals; that is why S begins with all intervals inside of it. So all you need to worry about are the ending points; you do not want to add any elements to S, ever.
Put another way, instead of having the intervals wrap around, you can think of [bi,ai] (where bi > ai) as meaning [bi - infinity, ai] with no wrap-around. The logic still works, but the processing is more clear: First you process all of the "whatever - infinity" terms (i.e. the end points), then you process the others (i.e. the start points).
With this correction, I am pretty sure my solution actually works. This formulation also extends -- I think -- to the case where you have both normal and "backward" intervals together in one input.
Comment: This problem is tricky because if you have to enumerate the set of all intervals contained within every interval, the output itself can be O(n^2). So any working approach has to somehow count the intervals without even being able to identify them :-).
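A sketch of the corrected algorithm, with one substitution named plainly: instead of an augmented self-balancing tree it uses a Fenwick (binary indexed) tree over the ranks of the end points, which also answers "how many elements are at most x?" in O(log n). The set S starts with every interval and only ever loses elements. It assumes intervals are (a, b) tuples, that no two intervals are identical, and that containment is inclusive; the name count_nested is hypothetical.

```python
import bisect

def count_nested(intervals):
    """For each interval, count the other intervals nested inside it."""
    n = len(intervals)
    ends = sorted(b for _, b in intervals)
    tree = [0] * (n + 1)   # Fenwick tree holding the end points still in S

    def update(i, delta):  # i is a 1-based rank
        while i <= n:
            tree[i] += delta
            i += i & -i

    def prefix(i):         # number of stored end points with rank <= i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    # S begins with all intervals in it.
    for _, b in intervals:
        update(bisect.bisect_left(ends, b) + 1, 1)

    counts = [0] * n
    # Visit start points in order; on ties, the longer interval goes first.
    order = sorted(range(n), key=lambda j: (intervals[j][0], -intervals[j][1]))
    for j in order:
        _, b = intervals[j]
        update(bisect.bisect_left(ends, b) + 1, -1)      # remove Ij from S
        counts[j] = prefix(bisect.bisect_right(ends, b))  # ends <= bj left in S
    return counts
```

Everything still in S when Ij is removed starts no earlier than aj, so the ones ending no later than bj are exactly the intervals nested in Ij.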
Here is an O(N log N) approach:
let Ii = Interval i = (ai, bi)
let L = list of intervals I
sort L by ai
divide L in half into L1a and L2a.
sort L1a and L2a by bi to get L1b and L2b
merge sort L1b and L2b keeping track of the count of nestings (e.g. because all intervals in L1b start before intervals in L2b, when we find an endpoint in L1b that is higher than an endpoint in L2b, we know everything between them is nested inside - think about it).
Now you have updated the counts on how often an interval in L2 is nested inside an interval in L1.
after merging L1 and L2, we repeat the process (recursion) by dividing L1 into L11a and L12a, and L2 into L21a and L22a.

Drawing sum of values across multiple overlapping dynamic intervals

I have an array of intervals [a,b] (where [a,b] = set of all x such that a<=x<=b). Each one of these intervals has a value associated with it (think of it as the cost of something across that interval). Intervals can overlap. Intervals are dynamic (they can be added, removed, translated, and their size can be changed). Also, the value associated with any of these intervals can change.
I need to create a graph containing the sum of all such values across interval [start, end] which is defined as the interval containing all of such intervals. In order to do so I need an ordered list of where, along the real line, such values change, as well as what values they are changing between. Such list needs to be easily / quickly updated when an interval in the original array changes.
Side notes: assume not very large number of intervals (hundreds?).
Any suggestions on data structures / algorithms to do this effectively?
An interval tree can perform these operations.
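For a few hundred intervals, a plain recomputation may already be fast enough as a baseline. The sketch below is not an interval tree; it is a simple sweep over the endpoints that rebuilds the ordered list of change points from scratch after every update. It assumes each interval is an (a, b, value) tuple, and the name sum_profile is hypothetical.

```python
from collections import defaultdict

def sum_profile(intervals):
    """Return sorted (x, total) pairs, where total is the sum of values
    on the stretch immediately to the right of x."""
    delta = defaultdict(float)
    for a, b, value in intervals:
        delta[a] += value   # the value switches on at a
        delta[b] -= value   # and switches off after b
    profile = []
    total = 0.0
    for x in sorted(delta):
        total += delta[x]
        profile.append((x, total))
    return profile
```

Consecutive entries give both where the sum changes and which two values it changes between. Exactly at a shared endpoint of closed intervals the sum can differ from the value on either side, which this sketch does not report.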

Algorithm for partitioning 1-dimensional space

I have two sets of intervals that correspond to the same 1-dimensional (linear) space. Here is a rough visual--in reality, there are many more intervals and they are much more spread out, but this gives the basic idea.
Each of these intervals contains information, and I am writing a program to compare the information in one set of intervals (the red) to the information contained in the other set (the blue).
Here is my problem. I would like to partition the space into n chunks such that there is roughly an equal amount of comparison work to be done in each chunk (the amount of work depends on the number of intervals in that portion of the space). Also, the partition should not split any red or blue interval across two chunks.
So the input is two sets of intervals, and the desired output is a partition of the space such that
the intervals are (roughly) equally distributed across each element of the partition
no interval overlaps with multiple partition elements
Can anyone suggest an approach or an algorithm for doing this?
Define a "word" to be a maximal interval in which every point belongs either to a red interval or a blue interval. No chunk can end in the middle of a word, and every union of consecutive words is a potential chunk. Now apply a minimum raggedness word-wrap algorithm to the words, where the length of a word is defined to be the number of intervals it contains (line = chunk).
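The first step, finding the "words" and their lengths, can be sketched as follows, assuming the red and blue intervals are closed (start, end) tuples and that touching intervals belong to the same word. The function name find_words is made up for illustration; the word-wrap step itself is not shown.

```python
def find_words(red, blue):
    """Return (start, end, count) for each maximal covered stretch,
    where count is the number of red and blue intervals inside it."""
    words = []
    for a, b in sorted(list(red) + list(blue)):
        if words and a <= words[-1][1]:
            # Overlaps or touches the current word: extend it.
            words[-1][1] = max(words[-1][1], b)
            words[-1][2] += 1
        else:
            # A gap before this interval: start a new word.
            words.append([a, b, 1])
    return [tuple(w) for w in words]
```

The count in each tuple is the "length" that the word-wrap algorithm would balance across chunks.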

search for interval overlap in list of intervals?

Say [a,b] represents the interval on the real line from a to b, a < b, inclusive (ie, [a,b] = set of all x such that a<=x<=b). Also, say [a,b] and [c,d] are 'overlapping' if they share any x such that x is in both [a,b] and [c,d].
Given a list of intervals, ([x1,y1],[x2,y2],...), what is the most efficient way to find all such intervals that overlap with [x,y]?
Obviously, I can try each and get it in O(n). But I was wondering if I could sort the list of intervals in some clever way, I could find /one/ overlapping item in O(log N) via a binary search, and then 'look around' from that position in the list to find all overlapping intervals. However, how do I sort intervals such that such a strategy would work?
Note that there may be overlaps between elements in the list items itself, which is what makes this hard.
I've tried it by sorting intervals by their left end, right end, middle, but none seem to lead to an exhaustive search.
Help?
For completeness' sake, I'd like to add that there is a well-known data structure for just this sort of problem, known (surprise, surprise) as an interval tree. It's basically an augmented balanced tree (red-black, AVL, your pick) that stores intervals sorted by their left (low) endpoint. The augmentation is that each node stores the largest right (high) endpoint in its subtree. This tree allows you to find one overlapping interval in O(log n) time; reporting all k overlapping intervals takes additional time that grows with k.
It's described in CLRS 14.3.
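A stripped-down sketch of the idea, assuming closed intervals. To stay short it uses a plain unbalanced binary search tree rather than a red-black or AVL tree, so the logarithmic bounds only hold when the tree happens to stay balanced. The names Node, insert and find_overlaps are hypothetical.

```python
class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.max_hi = hi          # largest high endpoint in this subtree
        self.left = None
        self.right = None

def insert(node, lo, hi):
    """Insert [lo, hi] keyed by its low endpoint; returns the subtree root."""
    if node is None:
        return Node(lo, hi)
    if lo < node.lo:
        node.left = insert(node.left, lo, hi)
    else:
        node.right = insert(node.right, lo, hi)
    node.max_hi = max(node.max_hi, hi)
    return node

def find_overlaps(node, x, y, out):
    """Append every stored interval that overlaps [x, y] to out."""
    if node is None or node.max_hi < x:
        return out                # nothing in this subtree reaches x
    find_overlaps(node.left, x, y, out)
    if node.lo <= y and node.hi >= x:
        out.append((node.lo, node.hi))
    if node.lo <= y:
        # Right subtree only holds low endpoints >= node.lo.
        find_overlaps(node.right, x, y, out)
    return out
```

The max_hi field is what lets the search skip whole subtrees that end before the query starts.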
[a, b] overlaps with [x, y] iff b >= x and a <= y (the intervals are inclusive, so touching endpoints count). Sorting intervals by their last elements lets you find those matching the first condition in log time. Sorting intervals by their first elements lets you find those matching the second condition in log time. Take the intersection of the resulting sets.
A 'quadtree' is a data structure often used to improve the efficiency of collision detection in 2 dimensions.
I think you could come up with a similar 1-d structure. This would require some pre-computation but should result in O(log N) performance.
Basically you start with a root 'node' that covers all possible intervals, and when adding a node to the tree, you decide if it falls on the left or the right of the midpoint. If it crosses the mid point, you break it into two intervals (but record the original parent) and recursively proceed from there. You can set a limit on the depth of the tree, which can save memory and improve performance, but comes at the expense of complicating things a little (you need to store a list of intervals in your nodes).
Then when checking an interval, you basically find all leaf nodes that it would be inserted into were it inserted, check the partial intervals within those nodes for intersection, and then report the interval that is recorded against them as the 'original' parent.
Just a quick thought 'off the cuff' so to speak.
Could you organize them into 2 lists, one for start of intervals and the other for end of intervals.
This way, you can compare y to the items in the start of interval list (say by binary search) to cut down the candidates based on that.
You can then compare x to the items in the end of interval list.
EDIT
Case: Once Off
If you are comparing only a single interval to the list of intervals in a once-off situation, I don't believe sorting will help you out, since sorting costs at least O(n) (and O(n log n) for a comparison sort), which is no better than one linear scan.
By doing a linear search through all x's to trim out any impossible intervals then doing another linear search through the remaining y's you can reduce your total work. While this is still O(n), without this you would be doing 2n comparisons, whereas on average, you would only do (3n-1)/2 comparisons this way.
I believe this is the best you can do for an unsorted list.
Case: Pre-sorting doesn't count
In the case where you will be repeatedly comparing single intervals to this list of intervals and you pre-sort your list, you can achieve better results. The process above still applies, but by doing a binary search on the first list then the second you can get O(m log n) as opposed to O(mn), where m is the number of single intervals being compared. Note, this still gives you the advantage of reducing total comparisons. [2m log n compared to m(3(log n) - 1)/2]
You could sort by both left end and right end at the same time and use both lists to eliminate non-overlapping values. If the list is sorted by the left end then none of the intervals to the right of the right end of the test range can overlap. If the list is sorted by the right end then none of the intervals to the left of the left end of the test range can overlap.
For example if the intervals are
[1,4], [3,6], [4,5], [2,8], [5,7], [1,2], [2,2.5]
and you're finding overlap with [3,4] then sorting by left end and marking position of the right end of the test (with the right end as just greater than its value so that 4 is included in the range)
[1,4], [1,2], [2,2.5], [2,8], [3,6], [4,5], *, [5,7]
you know [5,7] can't overlap, then sorting by right end and marking position of the left end of the test
[1,2], [2,2.5], *, [1,4], [4,5], [3,6], [5,7], [2,8]
you know [1,2] and [2,2.5] can't overlap
Not sure how efficient this would be since you're having to do two sorts and searches.
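The elimination above can be sketched like this, assuming closed intervals given as (left, right) tuples with no exact duplicates (the sketch uses sets for the final intersection). The name overlapping is hypothetical, and for brevity the two sorts are done inside the function; with repeated queries they would be done once up front.

```python
import bisect

def overlapping(intervals, x, y):
    """Return the intervals that overlap the closed test range [x, y]."""
    by_left = sorted(intervals, key=lambda iv: iv[0])
    by_right = sorted(intervals, key=lambda iv: iv[1])
    lefts = [iv[0] for iv in by_left]
    rights = [iv[1] for iv in by_right]
    # Sorted by left end: everything starting after y is eliminated.
    starts_in_time = set(by_left[:bisect.bisect_right(lefts, y)])
    # Sorted by right end: everything ending before x is eliminated.
    ends_in_time = set(by_right[bisect.bisect_left(rights, x):])
    return sorted(starts_in_time & ends_in_time)
```

On the example list with the test range [3,4] this keeps [1,4], [2,8], [3,6] and [4,5]. The two searches are logarithmic, but the intersection is still linear in the size of the surviving candidate sets.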
As you can see in other answers, most algorithms come together with a special data structure. For example, for an unsorted list of intervals as input, O(n) is the best that you'll get. (And usually it's easier to think in terms of the data structure that dictates the algorithm.)
In this case, your question is not complete:
Are you given the whole list or it is you who actually creates it?
Do you have to perform just one such lookup or many of them?
Do you have any estimations for operations it should support and their frequencies?
For example, if you have to perform just one such lookup, then it's not worth sorting the list beforehand. If many, then the more expensive sorting or generation of a "1D quadtree" would be amortized.
However, it would be difficult to solve it that way, because a simple quadtree (as I understand it) is only able to detect the collision; it is not able to create the list of all the segments that overlap with your input.
One simple implementation would be a list ordered by coordinate into which you insert all the segment ends, each with a start/end flag and its segment number. You then scan it (still O(n), but I doubt you can make it faster if you also need the list of all the overlapping segments), keeping track of all the open segments that have not yet been closed at the "check points".

Resources