query the number of intersected segments in a range - algorithm

I have a large dataset of segments (a_i, b_i), where a_i < b_i, and many queries. Each query asks for the number of segments intersecting a given range (b, e). The number of queries can be very large. A naive algorithm is to search all segments per query, which apparently takes O(N) time. Is there a faster way to do this? I can imagine that sorting the segments in ascending order of a_i may help, but I don't know what to do with the other direction.
segments: [1, 3], [2, 6], [4, 7], [7, 8]
query 1: [2, 5] => output [1, 3] [2, 6], [4, 7]
...

Make list B of sorted start points, as you wrote.
Make list P of structures containing all points - both starting and ending points - together with a field SE = +1/-1 for a start and an end respectively. Sort it by point coordinate.
Set Active = 0. Walk through P, adding SE to Active and building a new list A containing each point position and the Active count there.
For every query start, search (with binary search) the lower position in A and get Active - the number of open segments at that moment.
Then search the indexes in B corresponding to the query start and the query end, and take the index difference - the number of segments starting inside the query interval.
The sum of these two values is the needed number of intersected segments (you don't need the segments themselves, according to the problem statement).
Time per query is O(log(N))
[1, 3], [2, 6], [4, 7], [7, 8] initial list
[1, 2, 4, 7] list B
(1,1),(2,1),(3,-1),(4,1),(6,-1),(7,-1),(7,1),(8,-1) list P
(1,1),(2,2),(3,1),(4,2),(6,1),(7,0),(7,1),(8,0) list A
Query start 2 lands on the entry (2,2) of A, so active = 2 (two open segments: [1, 3] and [2, 6]).
Searching 2 in B gives index 1; searching 5 gives index 2 (the largest start not exceeding 5).
The difference is 1 (one segment, [4, 7], starts inside the query range).
result = 2 + 1 = 3
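A minimal Python sketch of the whole scheme (my own code, not from the original answer; note that with this event order a segment whose end coincides exactly with the query start is not counted as open - adjust the tie-breaking in P if touching segments should count):

import bisect

def preprocess(segments):
    B = sorted(a for a, b in segments)              # list B: sorted start points
    P = sorted((x, se) for a, b in segments
               for x, se in ((a, +1), (b, -1)))     # list P: (coordinate, SE)
    positions, counts = [], []                      # list A, stored as two arrays
    active = 0
    for x, se in P:
        active += se
        positions.append(x)
        counts.append(active)
    return B, positions, counts

def query(B, positions, counts, qb, qe):
    i = bisect.bisect_right(positions, qb)          # last event at or before qb
    active = counts[i - 1] if i else 0              # segments open at qb
    started = bisect.bisect_right(B, qe) - bisect.bisect_right(B, qb)
    return active + started                         # plus starts inside (qb, qe]

segments = [(1, 3), (2, 6), (4, 7), (7, 8)]
B, positions, counts = preprocess(segments)
print(query(B, positions, counts, 2, 5))            # 3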

Related

pairwise distinct left ends in all segments

I am provided with M segments of the form [L, R] over N elements of an array. I need to change these segments in such a way that all segments have pairwise distinct left ends.
Example: suppose we have 5 elements in the array and 4 segments: [1,2], [1,3], [2,4] and [4,5]. After making all the left ends pairwise distinct we get [1,2], [3,3], [2,4] and [4,5]. Here all segments have different left ends.
Let's see if I got this right. I suggest:
You sort all segments according to the right end.
Then you fix all the left ends, starting with the smallest right end and working towards larger right ends. Fixing means you replace the current left end with the next available value.
In Python it looks like this:
def fit_intervals(datalist):
    d1 = sorted(datalist, key=lambda x: x[1])
    taken = set()
    def find_next_free(x):
        while x in taken:
            x = x + 1
        taken.add(x)
        return x
    for interval in d1:
        interval[0] = find_next_free(interval[0])

data = [[4,5], [1,9], [1,2], [1,3], [2,4]]
fit_intervals(data)
print(data)
output: [[4, 5], [5, 9], [1, 2], [2, 3], [3, 4]]
This function find_next_free currently uses a simple linear scan; if necessary, this could certainly be improved.
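For instance (my own sketch, not part of the original answer), a disjoint-set structure with path compression makes each find_next_free call nearly constant amortized:

def fit_intervals_fast(datalist):
    parent = {}                             # parent[x]: next candidate at or above x
    def find_next_free(x):
        root = x
        while root in parent:               # follow the chain to the first free value
            root = parent[root]
        while x in parent:                  # path compression for future calls
            parent[x], x = root, parent[x]
        return root
    for interval in sorted(datalist, key=lambda iv: iv[1]):
        free = find_next_free(interval[0])
        interval[0] = free
        parent[free] = free + 1             # this value is now taken

data = [[4, 5], [1, 9], [1, 2], [1, 3], [2, 4]]
fit_intervals_fast(data)
print(data)                                 # [[4, 5], [5, 9], [1, 2], [2, 3], [3, 4]]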

Efficiently finding overlapping intervals from a list of intervals

This is related to finding overlapping intervals. I know how to do so given a list of intervals (interval trees). What I have is a list of lists of intervals. For example,
[2,6], [7,11]
[1,3], [5,10], [11,13]
[2,5], [6,8]
The result for this should be
[2,3], [7,8]
What I need to do is to find a list of intervals that are common in all the lists.
I see this problem as similar to merging n lists. The problem is that I cannot apply pairwise merging of lists: applying that method can lose overlapping intervals. So I need to merge all the lists together, considering all of them at a time (instead of pairwise).
I could use interval trees: insert the first interval from each list into the interval tree and find the overlap, then remove the weakest interval from the tree and insert the next interval from one of the lists. I haven't yet completely figured out how I can use this method, but it seems it will get too expensive.
Is there any efficient algorithm for finding the overlapping intervals from a list of lists of intervals?
Additional Info:
The intervals within a list are sorted. They don't overlap and form a sequence.
Create a single, sorted array of transitions. Each transition has a position and a net number based on how many intervals you're entering or leaving there. As you pass through the list, keep track of how many intervals you are in. When you're in as many intervals as there are series, you're in a common interval.
For your example the transitions would be:
[2, 1], [6, -1], [7, 1], [11, -1],
[1, 1], [3, -1], [5, 1], [10, -1], [11, 1], [13, -1]
[2, 1], [5, -1], [6, 1], [8, -1]
which after sorting by position and merging collapses to:
[1, 1], [2, 2], [3, -1], [5, 0], [6, 0], [7, 1], [8, -1], [10, -1], [11, 0], [13, -1]
which gives you running totals of:
[1, 1], [2, 3], [3, 2], [5, 2], [6, 2], [7, 3], [8, 2], [10, 1], [11, 1], [13, 0]
And then we can read off the intervals where the running total is 3: one starting at 2 and going to 3, and another starting at 7 and going to 8. Which is the answer.
The idea of creating one long list and sorting is admittedly extra work. You can instead create those lists and merge them on the fly. The savings is a factor of the log of the number of series rather than the log of the number of events.
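In Python, the sweep might look like this (my own sketch; as above, endpoints that merely touch, like [2, 5] meeting [5, 10], do not open a common interval):

from itertools import groupby

def common_intervals_sweep(lists):
    k = len(lists)
    events = sorted((p, d) for lst in lists
                    for lo, hi in lst
                    for p, d in ((lo, +1), (hi, -1)))
    count, start, out = 0, None, []
    for pos, group in groupby(events, key=lambda e: e[0]):
        count += sum(d for _, d in group)   # net transition at this position
        if count == k and start is None:
            start = pos                     # entered a region covered by all lists
        elif count < k and start is not None:
            out.append([start, pos])        # left the common region
            start = None
    return out

lists = [[[2, 6], [7, 11]],
         [[1, 3], [5, 10], [11, 13]],
         [[2, 5], [6, 8]]]
print(common_intervals_sweep(lists))        # [[2, 3], [7, 8]]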
My understanding is that you want to apply the intersection operation over a list of interval lists. And you can do this pairwise, as intersection is associative.
What I would do is something like:

Let S be the set of interval lists, R = s1 for some s1 in S
for each list s2 in S \ {s1}:
    R' = empty list
    for each interval e1 in R:
        for each interval e2 in s2 that overlaps e1 (i.e. e2.inf <= e1.sup and e1.inf <= e2.sup):
            add intersection(e1, e2) to R'
    R = R'

And the intersection operation between two intervals is:

intersection(e1, e2):
    return new Interval(max(e1.inf, e2.inf), min(e1.sup, e2.sup));
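In Python the pairwise reduction might look like this (my sketch; intervals touching only at an endpoint are treated as non-overlapping, which reproduces the expected output above):

from functools import reduce

def intersect_two(xs, ys):
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])        # latest start
        hi = min(xs[i][1], ys[j][1])        # earliest end
        if lo < hi:                         # positive-length overlap only
            out.append([lo, hi])
        if xs[i][1] < ys[j][1]:             # advance the interval that ends first
            i += 1
        else:
            j += 1
    return out

lists = [[[2, 6], [7, 11]],
         [[1, 3], [5, 10], [11, 13]],
         [[2, 5], [6, 8]]]
print(reduce(intersect_two, lists))         # [[2, 3], [7, 8]]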
You said each individual list of intervals is sorted and non-overlapping. So,
Keep track of where you are in each list, starting at the beginning of each.
While none of the lists has run out:
    If the current intervals (one from each list) all overlap:
        Output the intersection of the current intervals.
    Find which of the current intervals has the earliest end point.
    Advance one position within that list.
If there are K lists of intervals and N intervals altogether, this should take O(N K) time if implemented in the most straightforward way, but you should be able to reduce this to O(N log K) time by tracking information about the current intervals in a heap or some other priority queue.
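A sketch of the heap variant (my own code; again, intervals that only touch at an endpoint are not counted as overlapping):

import heapq

def common_intervals_merge(lists):
    ptrs = [0] * len(lists)
    heap = [(lst[0][1], i) for i, lst in enumerate(lists)]  # (current end, list index)
    heapq.heapify(heap)
    lo = max(lst[0][0] for lst in lists)    # latest start among current intervals
    out = []
    while True:
        hi, i = heap[0]                     # earliest end among current intervals
        if lo < hi:                         # all current intervals overlap on [lo, hi]
            out.append([lo, hi])
        heapq.heappop(heap)
        ptrs[i] += 1
        if ptrs[i] == len(lists[i]):        # that list has run out
            return out
        nxt = lists[i][ptrs[i]]
        lo = max(lo, nxt[0])                # starts within one list only increase
        heapq.heappush(heap, (nxt[1], i))

lists = [[[2, 6], [7, 11]],
         [[1, 3], [5, 10], [11, 13]],
         [[2, 5], [6, 8]]]
print(common_intervals_merge(lists))        # [[2, 3], [7, 8]]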

Parallel algorithm for set intersections

I have n sets of data (distributed over n ranks) representing the nodes of a mesh, and I want to know an efficient parallel algorithm to find the intersection of these sets, i.e., the common nodes. An intersection is defined as soon as any 2 sets share a node.
For example;
Input:
Rank 0: Set 1 - [0, 1, 2, 3, 4]
Rank 1: Set 2 - [2, 4, 5, 6]
Rank 2: Set 3 - [0, 5, 6, 7, 8]
Implement Parallel Algorithm --> Result: (after finding intersections)
Rank 0: [0, 2, 4]
Rank 1: [2, 4, 5, 6]
Rank 2: [0, 5, 6]
The algorithm needs to run on n ranks, with 1 set on each rank.
You should be able to do this fast, in O(N), in parallel, with hash tables.
For each set S_i, for each member m_x (all of which can be done in parallel), put the set member into a hash table keyed on m_x, with the set name as the associated value. Any time you get a hit in the hash table on m_x from another set S_j, you have the corresponding set number S_i, and you know immediately that S_i intersects S_j. You can then put m_x into the derived intersection sets.
You need a parallel-safe hash table. That's easy: lock the buckets during updates.
[Another answer suggested sorting the sets. With most sort algorithms that would be O(N ln N) time, not as fast.]
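A minimal single-process Python sketch of the hashing idea (ranks are simulated by a list of sets here; a real implementation would shard the table across ranks and lock buckets during concurrent updates):

from collections import defaultdict

def find_intersections(sets):
    owners = defaultdict(list)              # hash table: node -> ranks that hold it
    for rank, s in enumerate(sets):
        for node in s:
            owners[node].append(rank)
    result = [[] for _ in sets]
    for node, ranks in owners.items():
        if len(ranks) > 1:                  # node shared by at least two sets
            for rank in ranks:
                result[rank].append(node)
    return [sorted(r) for r in result]

sets = [[0, 1, 2, 3, 4], [2, 4, 5, 6], [0, 5, 6, 7, 8]]
print(find_intersections(sets))
# [[0, 2, 4], [2, 4, 5, 6], [0, 5, 6]]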

Partitioning a superset and getting the list of original sets for each partition

Introduction
While trying to do some categorization of nodes in a graph (which will be rendered differently), I find myself confronted with the following problem:
The Problem
Given a superset of elements S = {0, 1, ..., M} and a number n of non-disjoint subsets T_i thereof, with 0 <= i < n, what is the best algorithm to find the partition P of the set S defined as follows?
P partitions S into disjoint subsets P_j, with 0 <= j < M, such that all elements x in a given P_j have the same list of "parents" among the "original" sets T_i.
Example
S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
So all P_js would be:
P_1 = [1, 4] # all elements x have the same list of "parents": T_1, T_3
P_2 = [2] # all elements x have the same list of "parents": T_2
P_3 = [3] # all elements x have the same list of "parents": T_2, T_3
P_4 = [5, 6, 8, 9] # all elements x have the same list of "parents": S only (so they are not in any of the T_i)
Questions
What are good functions/classes in the Python packages to compute all P_j and the list of their "parents", ideally restricted to numpy and scipy? Perhaps there is already a function which does just that.
What is the best algorithm to find those partitions P_j and, for each one, the list of its "parents"? Let's note T_0 = S.
I think the brute-force approach would be to generate all 2-combinations of T sets, split each pair into at most 3 disjoint sets, add those back to the pool of T sets, and repeat the process until all resulting Ts are disjoint, at which point we have our answer - the set of P sets. Caching all the "parents" on the way there could be a little problematic.
I suspect a dynamic programming approach could be used to optimize the algorithm.
Note: I would have loved to write the math parts in latex (via MathJax), but unfortunately this is not activated :-(
The following should be linear time (in the total number of elements in the Ts).
from collections import defaultdict

S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
Ts = [S, T_1, T_2, T_3]

parents = defaultdict(int)
for i, T in enumerate(Ts):
    for elem in T:
        parents[elem] += 2 ** i

children = defaultdict(list)
for elem, p in parents.items():
    children[p].append(elem)

print(list(children.values()))
Result:
[[5, 6, 8, 9], [1, 4], [2], [3]]
The way I'd do this is to construct an M × n boolean array In where In(i, j) = (S_i ∈ T_j). You can construct that in O(Σ_j |T_j|), provided you can map an element of S onto its integer index in O(1), by scanning all of the sets T and marking the corresponding bit in In.
You can then read the "signature" of each element i directly from In by concatenating row i into a binary number of n bits. The signature is precisely the equivalence relation of the partition you are seeking.
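Since the question asked about numpy, here is a sketch of that signature idea with np.isin and np.unique (my own illustration; only T_1..T_3 are passed, so the elements belonging to no T_i share the all-zero signature):

import numpy as np

S = np.array([1, 2, 3, 4, 5, 6, 8, 9])
Ts = [[1, 4], [2, 3], [1, 3, 4]]

# In[i, j] = (S[i] in T_j): the M x n boolean membership array
In = np.array([np.isin(S, T) for T in Ts]).T

# rows with identical signatures form one part P_j of the partition
_, labels = np.unique(In, axis=0, return_inverse=True)
labels = labels.ravel()                     # guard against shape differences across numpy versions
for j in range(labels.max() + 1):
    print(S[labels == j])                   # [5 6 8 9], [2], [3], [1 4]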
By the way, I'm in total agreement with you about Math markup. Perhaps it's time to mount a new campaign.

Find the middle element in merged arrays in O(logn)

We have two sorted arrays of the same size n. Let's call the arrays a and b.
How do you find the middle element of the sorted array formed by merging a and b?
Example:
n = 4
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
merged = [1, 2, 3, 3, 4, 4, 5, 6]
mid_element = merged[(0 + merged.length - 1) / 2] = merged[3] = 3
More complicated cases:
Case 1:
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
Case 2:
a = [1, 2, 3, 4, 8]
b = [3, 4, 5, 6, 7]
Case 3:
a = [1, 2, 3, 4, 8]
b = [0, 4, 5, 6, 7]
Case 4:
a = [1, 3, 5, 7]
b = [2, 4, 6, 8]
Time required: O(log n). Any ideas?
Look at the middle of both the arrays. Let's say one value is smaller and the other is bigger.
Discard the lower half of the array with the smaller value. Discard the upper half of the array with the higher value. Now we are left with half of what we started with.
Rinse and repeat until only one element is left in each array. Return the smaller of those two.
If the two middle values are the same, then pick arbitrarily.
Credits: Bill Li's blog
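A Python sketch of this halving idea, phrased as selecting the k-th smallest element of the two arrays (my own formulation of the discard step; for the lower median of two size-n arrays, as in the example, k = n):

def kth_smallest(a, b, k):
    # k-th smallest (1-based) element of the merged sorted arrays
    i = j = 0
    while True:
        if i == len(a):
            return b[j + k - 1]             # a exhausted: answer lies in b
        if j == len(b):
            return a[i + k - 1]
        if k == 1:
            return min(a[i], b[j])
        half = k // 2
        ia = min(i + half, len(a)) - 1      # probe about k/2 ahead in each array
        ib = min(j + half, len(b)) - 1
        if a[ia] <= b[ib]:
            k -= ia - i + 1                 # a[i..ia] all rank below k: discard
            i = ia + 1
        else:
            k -= ib - j + 1                 # same for b[j..ib]
            j = ib + 1

a, b = [1, 2, 3, 4], [3, 4, 5, 6]
print(kth_smallest(a, b, len(a)))           # 3, i.e. merged[n - 1]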
Quite an interesting task. I'm not sure about O(log n), but an O((log n)^2) solution is obvious to me.
If you know the position of some element in the first array, then you can find how many elements in both arrays are smaller than this value (you already know how many smaller elements there are in the first array, and you can count the smaller elements in the second array using binary search - so just sum up these two numbers). So if the number of smaller elements in both arrays is less than N, you should look into the upper half of the first array, otherwise move to the lower half. You get a general binary search with an internal binary search, for an overall complexity of O((log n)^2).
Note: if you do not find the median in the first array, start the initial search over in the second array. This has no impact on the complexity.
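A Python sketch of this nested binary search (my own code; k = n - 1 is the 0-based rank of the lower median, and bisect provides the inner search):

import bisect

def median_log_squared(a, b):
    k = len(a) - 1                          # 0-based rank of the lower median
    for arr, other in ((a, b), (b, a)):     # search the first array, then the second
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            v = arr[mid]
            # v can sit at merged ranks mid + (count of other < v) .. mid + (count of other <= v)
            lo_rank = mid + bisect.bisect_left(other, v)
            hi_rank = mid + bisect.bisect_right(other, v)
            if lo_rank <= k <= hi_rank:
                return v
            if lo_rank > k:
                hi = mid - 1
            else:
                lo = mid + 1
    return None                             # not reached for valid input

print(median_log_squared([1, 2, 3, 4], [3, 4, 5, 6]))   # 3
print(median_log_squared([1, 3, 5, 7], [2, 4, 6, 8]))   # 4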
So, having
n = 4 and a = [1, 2, 3, 4] and b = [3, 4, 5, 6]
you know the position k of the middle element in the result array in advance, based on n: it is equal to n.
The resulting n-th element could be in the first array or in the second.
Let's first assume the element is in the first array. Then:
do a binary search, taking the middle element of [l, r], starting with l = 0, r = 3;
taking the middle element, you know how many elements in the same array are smaller: middle - 1.
Knowing that middle - 1 elements are smaller and that you need the n-th overall, check the [n - (middle - 1)]-th element of the second array, which may be smaller or greater. If it is greater and the previous element is smaller, that is what you need; if it and the previous element are both greater, set l = middle; if it is smaller, set r = middle.
Then do the same for the second array in case you did not find a solution in the first.
In total: log(n) + log(n)
