Efficiently finding overlapping intervals from a list of intervals - algorithm

This is related to finding overlapping intervals. I know how to do so given a list of intervals (interval trees). What I have is a list of lists of intervals. For example,
[2,6], [7,11]
[1,3], [5,10], [11,13]
[2,5], [6,8]
The result for this should be
[2,3], [7,8]
What I need to do is to find a list of intervals that are common in all the lists.
I see this problem as similar to merging n lists. The problem is that I cannot merge the lists pairwise, because that can lose overlapping intervals. So I need to merge all the lists together, considering all of them at once (instead of pairwise).
I could use an interval tree: insert the first interval from each list into the tree and find the overlap, then remove the weakest interval from the tree and insert the next interval from the corresponding list. I haven't completely figured out how to use this method, but it seems it will get too expensive.
Is there any efficient algorithm for finding overlapping intervals from a list of lists of intervals?
Additional Info:
The intervals within a list are sorted. They don't overlap and form a sequence.

Create a single, sorted array of transitions. Each transition has a position and a delta: +1 where an interval starts and -1 where one ends. As you pass through the array, keep a running total of how many intervals you are in. Whenever you are in as many intervals as there are series (lists), you are in a common interval.
For your example the transitions would be:
[2, 1], [6, -1], [7, 1], [11, -1],
[1, 1], [3, -1], [5, 1], [10, -1], [11, 1], [13, -1]
[2, 1], [5, -1], [6, 1], [8, -1]
which, after sorting by position and summing the deltas at equal positions, collapses to:
[1, 1], [2, 2], [3, -1], [5, 0], [6, 0], [7, 1], [8, -1], [10, -1], [11, 0], [13, -1]
which gives running totals of (omitting positions where the total does not change):
[1, 1], [2, 3], [3, 2], [7, 3], [8, 2], [10, 1], [13, 0]
We can then read off the intervals where the total is 3: one from 2 to 3 and another from 7 to 8, which is the answer.
Building one long list and sorting it is admittedly extra work. You can instead generate the per-series transition lists and merge them on the fly. That replaces a log factor in the number of events with a log factor in the number of series.
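A minimal Python sketch of the sweep with on-the-fly merging, assuming each series is a list of [start, end] pairs that is sorted and non-overlapping as the question states. heapq.merge performs the K-way merge and itertools.groupby sums the deltas at equal positions; the function names are illustrative. Because deltas at one position are summed, a zero-length overlap (one interval ending exactly where another starts) is not reported, which is what reproduces the expected output [2, 3], [7, 8].

```python
import heapq
from itertools import groupby


def series_events(intervals):
    """Yield (position, delta) transitions for one sorted, non-overlapping series."""
    for start, end in intervals:
        yield (start, 1)   # entering an interval
        yield (end, -1)    # leaving it


def common_intervals(series):
    """Return the intervals covered by every series, via a K-way merge of transitions."""
    k = len(series)
    # heapq.merge lazily merges the already-sorted per-series streams by position.
    merged = heapq.merge(*(series_events(s) for s in series), key=lambda ev: ev[0])
    result = []
    active = 0        # running total: how many series cover the current position
    open_at = None    # start of the common interval we are inside, if any
    for pos, group in groupby(merged, key=lambda ev: ev[0]):
        active += sum(delta for _, delta in group)  # collapse all transitions at pos
        if active == k and open_at is None:
            open_at = pos
        elif active < k and open_at is not None:
            result.append([open_at, pos])
            open_at = None
    return result


series = [[[2, 6], [7, 11]], [[1, 3], [5, 10], [11, 13]], [[2, 5], [6, 8]]]
print(common_intervals(series))  # [[2, 3], [7, 8]]
```

The heap only ever holds one pending transition per series, so each of the N events costs O(log K) instead of the O(log N) per event of a full sort.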

My understanding is that you want to apply the intersection operation over lists of intervals. You can do this pairwise, because intersection is associative.
What I would do is something like this:
Let S be the list of interval lists, and let R = s1 for some s1 in S
for each list s2 in S \ {s1}:
    R' = empty list
    for each interval e1 in R:
        for each interval e2 in s2 such that max(e1.inf, e2.inf) < min(e1.sup, e2.sup):
            append intersection(e1, e2) to R'
    R = R'
And the intersection operation between two intervals is
intersection(e1, e2):
    return new Interval(max(e1.inf, e2.inf), min(e1.sup, e2.sup))
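The pairwise idea can be made linear per pair by walking both sorted lists with two pointers instead of the nested loop (a standard substitution, not part of the original answer). A sketch in Python, assuming intervals are [lo, hi] pairs in sorted, non-overlapping lists; intersect_two and intersect_all are hypothetical names. Like the question's expected output, it keeps only overlaps of positive length.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists of disjoint [lo, hi] intervals with two pointers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:                  # keep only overlaps of positive length
            out.append([lo, hi])
        # The interval that ends first cannot overlap anything later in the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(series):
    """Fold the pairwise intersection over all lists (intersection is associative)."""
    return reduce(intersect_two, series)


series = [[[2, 6], [7, 11]], [[1, 3], [5, 10], [11, 13]], [[2, 5], [6, 8]]]
print(intersect_all(series))  # [[2, 3], [7, 8]]
```

Each pairwise step is O(len(a) + len(b)), and an intermediate result never has more intervals than its two inputs combined.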

You said each individual list of intervals is sorted and non-overlapping. So,
Keep track of where you are in each list, starting at the beginning of each.
While none of the lists has run out:
    If the current intervals (one from each list) all overlap:
        Output the intersection of the current intervals.
    Find which of the current intervals has the earliest end point.
    Advance one position within that list.
If there are K lists of intervals and N intervals altogether, this should take O(N K) time if implemented in the most straightforward way, but you should be able to reduce this to O(N log K) time by tracking information about the current intervals in a heap or some other priority queue.
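A sketch of the O(N log K) variant in Python, assuming the same list-of-[start, end] input. A min-heap keyed on end point (via heapq) finds the earliest-ending current interval; the latest current start only needs a running maximum, because a list's start can only grow when its pointer advances. The function name is hypothetical, and zero-length overlaps are skipped to match the question's expected output.

```python
import heapq


def common_intervals_heap(series):
    """Advance K pointers, always moving the list whose current interval ends first."""
    if not series or any(not s for s in series):
        return []
    pos = [0] * len(series)                               # current index in each list
    max_start = max(s[0][0] for s in series)              # latest start among current intervals
    heap = [(s[0][1], k) for k, s in enumerate(series)]   # (end, list index)
    heapq.heapify(heap)
    result = []
    while True:
        min_end, k = heap[0]                  # earliest end among current intervals
        if max_start < min_end:               # all current intervals overlap
            result.append([max_start, min_end])
        pos[k] += 1
        if pos[k] == len(series[k]):          # one list has run out
            return result
        start, end = series[k][pos[k]]
        max_start = max(max_start, start)     # only list k's start changed, and it grew
        heapq.heapreplace(heap, (end, k))     # pop (min_end, k) and push the new end


series = [[[2, 6], [7, 11]], [[1, 3], [5, 10], [11, 13]], [[2, 5], [6, 8]]]
print(common_intervals_heap(series))  # [[2, 3], [7, 8]]
```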

Related

The best solution (considering time complexity) for the function implementation

A function does the following task:
For example L = [[1, 2, 3], [1, 2], [1, 2, 3, 5, 6, 8], [1, 8, 6, 10, 21], [1, 4, 6, 9], [22]]; (array of arrays)
Find the index of the sub-array in L whose numbers do not appear in any other sub-array. In this example, the function would return 5 (the index of [22]), because 22 appears only in that sub-array.
What is the optimal solution in terms of time complexity?
Count how many sub-arrays each number appears in (for example with a hash map), then process the sub-arrays one by one until you find one whose numbers all have a count of 1. Remembering only the numbers seen so far is not enough, because a number may reappear in a later sub-array. Both passes are O(n) basic hash operations, where n is the sum of the lengths of the sub-arrays of L, so the whole thing runs in O(n) expected time.
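A short Python sketch of the two-pass counting approach, using collections.Counter as the hash map. The function name and the convention of returning -1 when no sub-array qualifies are assumptions for illustration; an empty sub-array would trivially qualify.

```python
from collections import Counter


def unique_subarray_index(arrays):
    """Return the index of the first sub-array whose numbers occur in no other sub-array, or -1."""
    counts = Counter()
    for sub in arrays:
        counts.update(set(sub))   # a number repeated inside one sub-array counts once
    for i, sub in enumerate(arrays):
        if all(counts[x] == 1 for x in sub):
            return i
    return -1


L = [[1, 2, 3], [1, 2], [1, 2, 3, 5, 6, 8], [1, 8, 6, 10, 21], [1, 4, 6, 9], [22]]
print(unique_subarray_index(L))  # 5
```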

Query the number of intersecting segments in a range

I have a large dataset of segments (ai, bi), where ai < bi, and many queries. Each query asks for the number of segments intersecting a given range (b, e). The number of queries can be very large. A naive algorithm scans all segments for each query, which takes O(N) time. Is there a faster way to do this? I imagine sorting the segments in ascending order of ai may help, but I don't know what to do about the other endpoint.
segments: [1, 3], [2, 6], [4, 7], [7, 8]
query 1: [2, 5] => output [1, 3], [2, 6], [4, 7]
...
Make list B of sorted start points, as you wrote.
Make list P of structures containing all points, both starting and ending, together with a field SE = +1/-1 for start and end respectively. Sort it by coordinate (ends before starts at equal coordinates, as in the example below).
Set Active = 0. Walk through P, adding SE to Active, and build a new list A containing each point position and the current Active count.
For each query, binary-search A for the last entry whose position is at most the query start; its Active value is the number of segments open at that moment.
Then binary-search B for the query start and the query end; the index difference is the number of segments starting inside the query interval.
The sum of these two values is the required number of intersecting segments (according to the problem statement, you don't need the segments themselves).
Time per query is O(log N).
[1, 3], [2, 6], [4, 7], [7, 8]   initial list
[1, 2, 4, 7]   list B
(1,1),(2,1),(3,-1),(4,1),(6,-1),(7,-1),(7,1),(8,-1)   list P
(1,1),(2,2),(3,1),(4,2),(6,1),(7,0),(7,1),(8,0)   list A
Query start 2 finds entry (2,2) in A, so active = 2 (two open intervals).
In B, the last start at most 2 is at index 1 and the last start at most 5 is at index 2; the difference is 1.
result = 2 + 1 = 3
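The steps above can be sketched in Python with the standard bisect module. build_index and count_intersecting are hypothetical names, and segments are assumed to be (start, end) tuples. Sorting (coordinate, SE) tuples puts ends before starts at equal coordinates, as in list P of the example, so a segment that ends exactly at the query start is not counted.

```python
from bisect import bisect_right


def build_index(segments):
    starts = sorted(a for a, _ in segments)                  # list B
    events = sorted([(a, 1) for a, _ in segments] +
                    [(b, -1) for _, b in segments])          # list P: (coordinate, SE)
    positions, active_counts = [], []                        # list A, stored as two columns
    active = 0
    for x, se in events:
        active += se
        positions.append(x)
        active_counts.append(active)
    return starts, positions, active_counts


def count_intersecting(index, b, e):
    starts, positions, active_counts = index
    k = bisect_right(positions, b) - 1                       # last entry of A at or before b
    open_at_b = active_counts[k] if k >= 0 else 0
    # Starts in (b, e]; starts exactly at b are already included in open_at_b.
    starting_inside = bisect_right(starts, e) - bisect_right(starts, b)
    return open_at_b + starting_inside


index = build_index([(1, 3), (2, 6), (4, 7), (7, 8)])
print(count_intersecting(index, 2, 5))  # 3
```

Building the index is O(N log N) once; each query then does three binary searches.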

A segment tree doesn't take care of all segments of the input data. Is there an efficient data structure that does?

A segment tree is an efficient, but not always completely useful, data structure. For example, if we have an array of length 8, it will take care of segments 1-2, 3-4, 5-6, 7-8, 1-4, 5-8 and 1-8. But many will be left out, such as 2-3, 2-4, 4-5, 6-7, etc. Is there an efficient data structure that takes care of all segments of the input data?
No, that's not true. It actually "takes care" of every interval.
For instance, in a segment tree over indices [0, 9], a query for range [4, 7] goes down the left subtree as [0, 4] -> [3, 4] -> [4, 4] and down the right subtree as [5, 9] -> [5, 7], then combines the results of [4, 4] and [5, 7] and passes the result up to the root.
I would suggest simulating this with pen and paper or using your debugger to see what happens in the recursive calls. Good luck!
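A small Python sketch that prints which tree nodes a query actually touches, assuming each node [lo, hi] splits at mid = (lo + hi) // 2 (which matches the ranges in the answer). decompose is a hypothetical helper, not a library function: it returns the disjoint nodes whose combined range is exactly the query range.

```python
def decompose(lo, hi, ql, qr):
    """Return the segment-tree nodes (lo, hi) that exactly cover the query [ql, qr]."""
    if qr < lo or hi < ql:
        return []                      # node is disjoint from the query
    if ql <= lo and hi <= qr:
        return [(lo, hi)]              # node lies fully inside the query: use it whole
    mid = (lo + hi) // 2
    return decompose(lo, mid, ql, qr) + decompose(mid + 1, hi, ql, qr)


print(decompose(0, 9, 4, 7))  # [(4, 4), (5, 7)]
print(decompose(1, 8, 2, 3))  # [(2, 2), (3, 3)]
```

The second call is the "left out" segment 2-3 from the question on an array of length 8: it is answered by combining two leaf nodes. In general any range is covered by O(log n) nodes.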

Is there an Algorithm for finding the minimum number of classrooms for scheduling n classes in O(nlogn)?

So the question is this:
We have n classes (n intervals) with start and finish times [si, fi], and we want to find the minimum number of classrooms that can hold all the classes (intervals) without any conflict.
The book I'm reading says we can solve this in O(n log n), but I cannot find any algorithm better than O(n^2).
It says we should sort the classes by start time but doesn't give the rest of the solution. That doesn't make sense to me: before giving each class a room, shouldn't we check all the other intervals for a conflict? That makes it O(n^2), because for each interval we need to check all the other intervals.
Am I missing something?
You can sort the events (an event is either the start of a class or the end of a class) by time. This will take O(n log n).
Now, keep a stack of empty rooms and go through the events in order:
for each start event, take a room from the stack of empty rooms and allocate the class to it.
for each end event, put the corresponding room back on the stack of empty rooms.
This second phase can be completed in O(n).
By keeping track of the allocations and deallocations done you can easily find the number of needed rooms and the schedule.
If you just need the number of rooms, this can be simplified to a counter instead of the list of rooms: add 1 for each start event and subtract 1 for each end event, keeping track of the maximum.
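A Python sketch of the stack-of-rooms version, assuming every class has start < finish. Two details are assumptions not spelled out in that answer: at equal times end events are processed before start events (so a room is reused as soon as its class ends, as the next answer also does), and a new room is opened whenever the stack of free rooms is empty. assign_rooms is a hypothetical name.

```python
def assign_rooms(classes):
    """Return (number of rooms, room index for each class) via an event sweep."""
    events = []
    for i, (start, finish) in enumerate(classes):
        events.append((start, 1, i))    # kind 1: class starts
        events.append((finish, 0, i))   # kind 0: class ends (sorts first at equal times)
    events.sort()                       # O(n log n)
    free_rooms = []                     # stack of empty rooms
    room_of = [None] * len(classes)
    rooms_used = 0
    for _, kind, i in events:           # O(n)
        if kind == 1:
            if free_rooms:
                room_of[i] = free_rooms.pop()
            else:
                room_of[i] = rooms_used  # no free room: open a new one
                rooms_used += 1
        else:
            free_rooms.append(room_of[i])
    return rooms_used, room_of


print(assign_rooms([[13, 15], [11, 13], [4, 7], [2, 4], [3, 6]]))  # (2, [0, 0, 0, 0, 1])
```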
First step: Store each class's start and finish points separately in an actions array. A start point gets action type +1; the end of a class gets type -1.
Second step: Sort the actions array in ascending order by time. If times are equal, sort by type in ascending order.
Third step: Set the counter to 0 and iterate through the actions array: for a start action add 1 to the counter, and for a finish action subtract 1. Again, if times are equal, process finish actions first, because a classroom can be reused as soon as the class in it ends.
The maximum value that counter reaches is your answer.
Here is an implementation of the algorithm in Python:
classes = [[13, 15], [11, 13], [4, 7], [2, 4], [3, 6]]
# Build the action list:
# action[0] -> time of the action
# action[1] -> type of the action (-1 for a finish, 1 for a start)
actions = []
for start, finish in classes:
    actions.append([start, 1])
    actions.append([finish, -1])
# Lists compare by time first, then by type, so at equal times finishes (-1) come first.
actions.sort()
# [[2, 1], [3, 1], [4, -1], [4, 1], [6, -1], [7, -1], [11, 1], [13, -1], [13, 1], [15, -1]]
min_classrooms = 0
curr_classrooms = 0
for time, kind in actions:
    curr_classrooms += kind
    min_classrooms = max(min_classrooms, curr_classrooms)
print(min_classrooms)  # 2

Place Intervals to Maximize Number of Adjacency

My problem:
I have n "items" to place on an integer axis. Each item has several placement choices, represented as closed intervals of integers. These intervals can share some elements. The goal is to find a non-overlapping placement (if any) of the n items on the integer axis with the maximal number of interval adjacencies.
More details with the terms used above:
1) Overlapping: intervals [1, 4] and [3, 6] have two overlapping elements, 3 and 4; intervals [2, 5] and [6, 10] do not overlap.
2) Interval adjacency: intervals [a, b] and [b+1, c] are called adjacent; in this example the number of interval adjacencies is 1. The maximal possible number of interval adjacencies for n items is n-1, which occurs when the n placed intervals form one contiguous chain, each adjacent to the next.
Example:
There are 3 items; their placement choices are listed here
item1 has 2 choices: [1, 4], [2, 5]
item2 has 3 choices: [5, 8], [9, 11], [16, 18]
item3 has 2 choices: [3, 5], [13, 15]
One feasible placement is
[1, 4](item1), [5, 8](item2), [13, 15](item3)
Two other feasible placements are
[1, 4](item1), [16, 18](item2), [13, 15](item3); and
[2, 5](item1), [16, 18](item2), [13, 15](item3).
All three placements in this example are optimal; each has 1 interval adjacency.
My question:
Is there a better way than enumeration of all possibilities?
The only idea I have is that if an interval choice of one item overlaps all the choices of another item, then that choice can be excluded. Any ideas are welcome :)
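For reference, here is a Python sketch of the enumeration baseline the question wants to improve on, combined with the pruning rule described above. It is still exponential in the worst case; it is not a better algorithm, only the starting point. Intervals are assumed to be closed (lo, hi) tuples, and all function names are hypothetical. best_placement returns (None, -1) when no non-overlapping placement exists.

```python
from itertools import product


def overlaps(p, q):
    """Closed integer intervals overlap if they share at least one element."""
    return p[0] <= q[1] and q[0] <= p[1]


def prune(choices):
    """Drop any choice that overlaps every choice of some other item (the rule above)."""
    pruned = []
    for i, options in enumerate(choices):
        keep = [c for c in options
                if not any(all(overlaps(c, d) for d in other)
                           for j, other in enumerate(choices) if j != i)]
        pruned.append(keep)
    return pruned


def adjacency_count(placement):
    """Count pairs [a, b], [b+1, c] among non-overlapping placed intervals."""
    ends = {p[1] for p in placement}
    return sum(1 for p in placement if p[0] - 1 in ends)


def best_placement(choices):
    best, best_score = None, -1
    for placement in product(*prune(choices)):
        if any(overlaps(p, q) for k, p in enumerate(placement) for q in placement[k + 1:]):
            continue  # infeasible: two items share an element
        score = adjacency_count(placement)
        if score > best_score:
            best, best_score = list(placement), score
    return best, best_score


items = [[(1, 4), (2, 5)], [(5, 8), (9, 11), (16, 18)], [(3, 5), (13, 15)]]
print(best_placement(items))  # ([(1, 4), (5, 8), (13, 15)], 1)
```

On the example, pruning removes item3's choice [3, 5], because it overlaps both choices of item1.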
