Algorithm to find an interval with the highest summed weight of weighted overlapping intervals

Well, I think it's hard to explain, so I've made a figure to show it.
As you can see in the figure, there are 6 intervals of time, and each one has a weight: the higher the opacity, the higher the weight. I want an algorithm to find the interval with the highest summed weight. In the case of the figure, it would be the overlap of intervals 5 and 6, which is the area with the highest opacity.

Split each interval into start and end points.
Sort the points.
Start with a sum of 0.
Iterate through the points using a sweep-line algorithm:
    If you get a start point:
        Increase the sum by the weight of the corresponding interval.
        If the sum is higher than the best sum so far, store this start point and set a flag.
    If you get an end point:
        If the flag is set, store the stored start point and this end point, with the current sum, as the best interval so far, and reset the flag.
        Decrease the sum by the weight of the corresponding interval.
This is derived from the answer I wrote here, which solves the unweighted version, i.e. finding the maximum number of overlapping intervals rather than the maximum summed weight.
Example:
For the figure above, the start/end points will be sorted as (S = start, E = end):
1S, 1E, 2S, 3S, 2E, 3E, 4S, 5S, 4E, 6S, 5E, 6E
Iterating through them, you'll set the flag at 1S, 5S and 6S, and you'll store the respective intervals at 1E, 4E and 5E (which are the first end points you get to after the above start points).
You won't set the flag at 2S, 3S or 4S, as the sum will be lower than the best sum so far.
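To make the steps concrete, here is a minimal Python sketch of the sweep line above (the function name and the (start, end, weight) input format are illustrative, not from the original answer):

def best_interval(intervals):
    # intervals: list of (start, end, weight) tuples
    events = []
    for start, end, weight in intervals:
        events.append((start, 0, weight))  # 0: start events sort before ends
        events.append((end, 1, weight))    # 1: end event
    events.sort()
    total = best = 0
    best_range = None
    candidate_start = None  # doubles as the "flag" from the steps above
    for time, kind, weight in events:
        if kind == 0:                      # start point
            total += weight
            if total > best:
                best = total
                candidate_start = time     # set the flag
        else:                              # end point
            if candidate_start is not None:
                best_range = (candidate_start, time)
                candidate_start = None     # reset the flag
            total -= weight
    return best, best_range

print(best_interval([(0, 10, 5), (5, 20, 7), (15, 30, 2)]))  # (12, (5, 10))

Note that start events sort before end events at the same time, matching the order used in the walkthrough.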

The algorithm logic can be derived from the figure. Assuming the resolution of the time intervals is 1 minute, an array can be created and used for all the calculations:
create the array of 24 * 60 elements and fill it with 0 weights;
for each time interval add the weight of this interval to the corresponding part of the array;
find the maximum summed weight by iterating over the array;
iterate over the array again and output the array indices (times) with the maximal summed weight.
This algorithm can be modified for a slightly different task: if you need the interval indices in the output, the array should contain a list of the input time interval indices as a second dimension (or it can be a separate array, depending on the particular language).
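As a sketch of that modification (assuming minute resolution; the function name and input format are illustrative), in Python:

def peak_minutes(intervals):
    # intervals: list of (start_min, end_min, weight) with 0 <= t < 24*60
    # array method, also tracking which intervals cover each minute
    # (the "second dimension" mentioned above)
    minutes = 24 * 60
    weight = [0] * minutes
    members = [[] for _ in range(minutes)]
    for idx, (t1, t2, w) in enumerate(intervals):
        for t in range(t1, t2 + 1):
            weight[t] += w
            members[t].append(idx)
    best = max(weight)
    return best, [(t, members[t]) for t in range(minutes) if weight[t] == best]

print(peak_minutes([(0, 10, 5), (5, 20, 7), (15, 30, 2)]))
# (12, [(5, [0, 1]), ..., (10, [0, 1])])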
UPD. I was curious whether this simple algorithm is significantly slower than the more elegant one suggested by @Dukeling. I coded both algorithms and created an input generator to estimate their performance.
Generator:
#!/bin/sh
awk -v n=$1 '
BEGIN {
    tmax = 24 * 60; wmax = 100;
    for (i = 0; i < n; i++) {
        t1 = int(rand() * tmax);
        t2 = int(rand() * tmax);
        w = int(rand() * wmax);
        if (t2 >= t1) {print t1, t2, w} else {print t2, t1, w}
    }
}' | sort -n > i.txt
Algorithm #1:
#!/bin/sh
awk '
{t1[++i] = $1; t2[i] = $2; w[i] = $3}
END {
    for (i in t1) {
        for (t = t1[i]; t <= t2[i]; t++) {
            W[t] += w[i];
        }
    }
    Wmax = 0.;
    for (t in W) {
        if (W[t] > Wmax) {Wmax = W[t]}
    }
    print Wmax;
    for (t in W) {
        if (W[t] == Wmax) {print t}
    }
}
' i.txt > a1.txt
Algorithm #2:
#!/bin/sh
awk '
{t1[++i] = $1; t2[i] = $2; w[i] = $3}
END {
    for (i in t1) {
        p[t1[i] "a" i] = i "S";
        p[t2[i] "b" i] = i "E";
    }
    n = asorti(p, psorted, "@ind_num_asc");
    W = 0.; Wmax = 0.; f = 0;
    for (i = 1; i <= n; i++) {
        P = p[psorted[i]];
        k = int(P);
        if (index(P, "S") > 0) {
            W += w[k];
            if (W > Wmax) {
                f = 1;
                Wmax = W;
                to1 = t1[k]
            }
        }
        else {
            if (f != 0) {
                to2 = t2[k];
                f = 0
            }
            W -= w[k];
        }
    }
    print Wmax, to1 "-" to2
}
' i.txt > a2.txt
Results:
$ ./gen.sh 1000
$ time ./a1.sh
real 0m0.283s
$ time ./a2.sh
real 0m0.019s
$ cat a1.txt
24618
757
$ cat a2.txt
24618 757-757
$ ./gen.sh 10000
$ time ./a1.sh
real 0m3.026s
$ time ./a2.sh
real 0m0.144s
$ cat a1.txt
252452
746
$ cat a2.txt
252452 746-746
$ ./gen.sh 100000
$ time ./a1.sh
real 0m34.127s
$ time ./a2.sh
real 0m1.999s
$ cat a1.txt
2484719
714
$ cat a2.txt
2484719 714-714
The simple one is ~20x slower.

Related

Algorithms question: Largest contiguous subarray selection

Q. Given two arrays, A and B, of equal length, find the largest possible contiguous subarray of indices [i,j] such that max(A[i: j]) < min(B[i: j]).
Example: A = [10, 21, 5, 1, 3], B = [3, 1, 4, 23, 56]
Explanation: A[4] = 1, A[5] = 3, B[4] = 23, B[5] = 56, max(A[4], A[5]) < min(B[4], B[5])
The indices are [4,5] (inclusive), and the largest contiguous subarray has length 2
I can do this with an O(n²) brute force method but cannot seem to reduce the time complexity. Any ideas?
O(n) solution:
Move index j from left to right and drag i behind so that the window from i to j is valid. So always increase j by 1, and then increase i as much as needed for the window to be valid.
To do that, keep a queue Aq of indexes of non-increasing A-values. Then A[Aq[0]] tells you the max A-value in the window. Similarly, keep a queue for non-decreasing B-values.
For each new j, first update Aq and Bq according to the new A-value and new B-value. Then, while the window is invalid, drop Aq[0] and Bq[0] if they equal i, and increase i. When both j and i are updated, update the result with the window size j - i + 1.
Python implementation:
from collections import deque

def solution(A, B):
    Aq = deque()
    Bq = deque()
    i = 0
    maxlen = 0
    for j in range(len(A)):
        while Aq and A[Aq[-1]] < A[j]:
            Aq.pop()
        Aq.append(j)
        while Bq and B[Bq[-1]] > B[j]:
            Bq.pop()
        Bq.append(j)
        while Aq and A[Aq[0]] >= B[Bq[0]]:
            if Aq[0] == i:
                Aq.popleft()
            if Bq[0] == i:
                Bq.popleft()
            i += 1
        maxlen = max(maxlen, j - i + 1)
    return maxlen
Test results from comparing against a naive brute force reference solution:
expect: 83 result: 83 same: True
expect: 147 result: 147 same: True
expect: 105 result: 105 same: True
expect: 71 result: 71 same: True
expect: 110 result: 110 same: True
expect: 56 result: 56 same: True
expect: 140 result: 140 same: True
expect: 109 result: 109 same: True
expect: 86 result: 86 same: True
expect: 166 result: 166 same: True
Testing code (Try it online!)
from timeit import timeit
from random import choices
from collections import deque
from itertools import combinations

def solution(A, B):
    Aq = deque()
    Bq = deque()
    i = 0
    maxlen = 0
    for j in range(len(A)):
        while Aq and A[Aq[-1]] < A[j]:
            Aq.pop()
        Aq.append(j)
        while Bq and B[Bq[-1]] > B[j]:
            Bq.pop()
        Bq.append(j)
        while Aq and A[Aq[0]] >= B[Bq[0]]:
            if Aq[0] == i:
                Aq.popleft()
            if Bq[0] == i:
                Bq.popleft()
            i += 1
        maxlen = max(maxlen, j - i + 1)
    return maxlen

def naive(A, B):
    return max((j - i + 1
                for i, j in combinations(range(len(A)), 2)
                if max(A[i:j+1]) < min(B[i:j+1])),
               default=0)

for _ in range(10):
    n = 500
    A = choices(range(42), k=n)
    B = choices(range(1234), k=n)
    expect = naive(A, B)
    result = solution(A, B)
    print(f'expect: {expect:3} result: {result:3} same: {result == expect}')
Based on the problem, say we have these two conditions:
max(A[i..j-1]) < min(B[i..j-1]) (the window [i, j-1] is valid)
max(A[i..j]) >= min(B[i..j]) (extending it to j makes it invalid)
Let maxA be the index of the max item of A in [i..j], let minB be the index of the min item of B in [i..j], and let anchor = min(maxA, minB).
Then max(A[k..j]) >= min(B[k..j]) for every k in [i+1, anchor], since every such window still contains both maxA and minB.
So I came up with a simple algorithm like the one below:
int extractLongestRange(int n, int[] A, int[] B) {
    // n is the size of both arrays
    int size = 0;
    for (int i = 0; i < n; i++) {
        int maxAIndex = i;
        int minBIndex = i;
        for (int j = i; j < n; j++) {
            if (A[maxAIndex] < A[j]) {
                maxAIndex = j;
            }
            if (B[minBIndex] > B[j]) {
                minBIndex = j;
            }
            if (A[maxAIndex] >= B[minBIndex]) {
                if (size < j - i) {
                    size = j - i;
                }
                // here, jump back to the min of maxAIndex and minBIndex
                i = Math.min(maxAIndex, minBIndex);
                break;
            }
            // this case: j reached the end of the array
            if (j == n - 1) {
                if (size < j - i + 1) {
                    size = j - i + 1;
                    i = j;
                }
            }
        }
    }
    return size;
}
With this approach, the complexity is O(n).
We can change the output to pick up the other information if needed.
This can be solved in O(n log(n)).
Here is an outline of my technique.
My idea looks like a "rising water level" of maximum in A, keeping track of the "islands" emerging in A, and the "islands" submerging in B. And recording the maximum intersections after they appear from A emerging, or before they disappear from B sinking.
We will need 2 balanced binary trees of intervals in A and B, and a priority queue of events.
The tree of intervals needs to support a logarithmic "find and return the interval containing i, if it exists". It also needs to support a logarithmic "add i to the tree, extending/merging intervals as appropriate, and return the new interval", and likewise a logarithmic "remove i from the tree, shortening, splitting, or deleting its interval as appropriate".
Events can be of the form "remove B[i]" or "add A[i]". The priority queue is sorted first by the size of the element added/removed, and then by putting B before A. (So we do not add elements of size x to A until all of the elements of size x are removed from B.)
With that in mind, here is pseudocode for how to do it.
Put all possible events in the priority queue
Initialize an empty tree of intervals A_tree
Initialize a tree of intervals B_tree containing the entire range
Initialize max_interval to (0, -1) (length 0)
For each event (type, i) in queue:
    if event.type = A:
        new_interval = A_tree.add(event.i)
        if event.i in B_tree:
            intersect event.i's A_tree interval with event.i's B_tree interval
            if intersection is longer than max_interval:
                update to new and better max_interval
    else:
        if event.i in A_tree:
            intersect event.i's A_tree interval with event.i's B_tree interval
            if intersection is longer than max_interval:
                update to new and better max_interval
        B_tree.remove(event.i)
Processing any event is O(log(n)). There are 2n = O(n) events, so the overall time is O(n log(n)).

Average over diagonal halves of a matrix

I have a matrix, e.g. a 5 x 5 matrix:
$ cat input.txt
1 5.6 3.4 2.2 -9.99E+10
2 3 2 2 -9.99E+10
2.3 3 7 4.4 5.1
4 5 6 7 8
5 -9.99E+10 9 11 13
Here I would like to ignore -9.99E+10 values.
I am looking for the average of all entries after dividing the matrix diagonally. Here are four possibilities (using 999 in place of -9.99E+10 to save space in the graphic):
I would like to average over all the values under different shaded triangles.
So the desire output is:
$ cat outfile.txt
P1U 3.39 (average of all values on the upper side of Possibility 1, without considering -9.99E+10)
P1L 6.88 (average of all values on the lower side of Possibility 1, without considering -9.99E+10)
P2U 4.90
P2L 5.59
P3U 3.31
P3L 6.41
P4U 6.16
P4L 4.16
I am finding it difficult to develop a proper algorithm to write this in Fortran or in a shell script.
I am thinking of the following algorithm, but can't work out what comes next.
step 1: # assign -9.99E+10 to the lower diagonal values of a[i,j]
for i in {1..5};do
    for j in {1..5};do
        a[i,j+1]=-9.99E+10
    done
done

step 2: # take the average
sum=0
for i in {1..5};do
    for j in {1..5};do
        sum=sum+a[i,j]
    done
done
printf "%s %5.2f", P1U, sum

step 3: # assign -9.99E+10 to the upper diagonal values of a[i,j]
for i in {1..5};do
    for j in {1..5};do
        a[i-1,j]=-9.99E+10
    done
done

step 4: # take the average
sum=0
for i in {1..5};do
    for j in {1..5};do
        sum=sum+a[i,j]
    done
done
printf "%s %5.2f", P1L, sum
Just save all the values in an array indexed by row and column number and then, in the END section, repeat this process, setting the beginning and end row and column loop delimiters as needed when defining the loops for each section:
$ cat tst.awk
{
    for (colNr=1; colNr<=NF; colNr++) {
        vals[colNr,NR] = $colNr
    }
}
END {
    sect = "P1U"
    begColNr = 1; endColNr = NF; begRowNr = 1; endRowNr = NR
    sum = cnt = 0
    for (rowNr=begRowNr; rowNr<=endRowNr; rowNr++) {
        for (colNr=begColNr; colNr<=endColNr-rowNr+1; colNr++) {
            val = vals[colNr,rowNr]
            if ( val != "-9.99E+10" ) {
                sum += val
                cnt++
            }
        }
    }
    printf "%s %.2f\n", sect, (cnt ? sum/cnt : 0)

    sect = "P1L"
    begColNr = 1; endColNr = NF; begRowNr = 1; endRowNr = NR
    sum = cnt = 0
    for (rowNr=begRowNr; rowNr<=endRowNr; rowNr++) {
        for (colNr=endColNr-rowNr+1; colNr<=endColNr; colNr++) {
            val = vals[colNr,rowNr]
            if ( val != "-9.99E+10" ) {
                sum += val
                cnt++
            }
        }
    }
    printf "%s %.2f\n", sect, (cnt ? sum/cnt : 0)
}
$ awk -f tst.awk file
P1U 3.39
P1L 6.88
I assume that, given the above for handling the first pair of diagonal halves, you'll be able to figure out the other pair of diagonal halves (see the sketch below). The horizontal/vertical halves are trivial: just set begRowNr to int(NR/2)+1, or endRowNr to int(NR/2), or begColNr to int(NF/2)+1, or endColNr to int(NF/2), then loop through the resulting full range of values of each.
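For instance, one more block in the same style as tst.awk above (a sketch, assuming P2U is the upper triangle about the main diagonal, i.e. colNr >= rowNr):

    sect = "P2U"
    sum = cnt = 0
    for (rowNr=1; rowNr<=NR; rowNr++) {
        for (colNr=rowNr; colNr<=NF; colNr++) {
            val = vals[colNr,rowNr]
            if ( val != "-9.99E+10" ) {
                sum += val
                cnt++
            }
        }
    }
    printf "%s %.2f\n", sect, (cnt ? sum/cnt : 0)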
You can compute them all in one iteration:
$ awk -v NA='-9.99E+10' '{for(i=1;i<=NF;i++) a[NR,i]=$i}
END {for(i=1;i<=NR;i++)
for(j=1;j<=NF;j++)
{v=a[i,j];
if(v!=NA)
{if(i+j<=6) {p["1U"]+=v; c["1U"]++}
if(i+j>=6) {p["1L"]+=v; c["1L"]++}
if(j>=i) {p["2U"]+=v; c["2U"]++}
if(i<=3) {p["3U"]+=v; c["3U"]++}
if(i>=3) {p["3D"]+=v; c["3D"]++}
if(j<=3) {p["4U"]+=v; c["4U"]++}
if(j>=3) {p["4D"]+=v; c["4D"]++}}}
for(k in p) printf "P%s %.2f\n", k,p[k]/c[k]}' file | sort
P1L 6.88
P1U 3.39
P2U 4.90
P3D 6.41
P3U 3.31
P4D 6.16
P4U 4.16
I forgot to add P2D, but from the pattern it should be clear what needs to be done.
To generalize further, as suggested: assert NF==NR, otherwise the diagonals are not well defined. Let n = NF (= NR). You can then replace 6 with n+1 and 3 with ceil(n/2), where ceil can be implemented as function ceil(x) {return x==int(x)?x:int(x)+1}.

Find the N max differences between two consecutive numbers present in a file using unix

Integer numbers are stored in a file, one per line. I need to find the max and the N max differences between two consecutive numbers in the file.
e.g.
12
15
50
80
Max diff: 35 (50 - 15), and say N = 2, so 1st max: 35 and 2nd max: 30 (80 - 50).
#!/usr/bin/awk -f
NR > 1 {
    diff = $0 - prev
    for (i = 0; i < N; ++i)
        if (diff > maxdiff[i]) {    # insert the new diff, keeping maxdiff sorted
            for (j = N; --j > i; )
                if (j-1 in maxdiff) maxdiff[j] = maxdiff[j-1]
            maxdiff[j] = diff
            break
        }
}
{ prev = $0 }
END { for (i = 0; i < N; ++i) if (i in maxdiff) print maxdiff[i] }
E.g., if the script is named nmaxdiff.awk and the numbers are stored in the file numbers, enter:
nmaxdiff.awk N=2 numbers
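For a quick cross-check, the N largest consecutive differences can also be obtained with a short pipeline (assuming standard sort and head; here N = 2 and the input file is numbers, as above):

awk 'NR > 1 {print $0 - prev} {prev = $0}' numbers | sort -rn | head -n 2

This costs O(n log n) because of the sort, versus the single pass of the script above, but it is handy for verifying results.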

Display all the possible numbers having its digits in ascending order

Write a program that can display all the possible numbers between two given numbers that have their digits in ascending order.
For example:
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
A brute force approach would count through all the numbers and check the digits of each one. But I want approaches that can skip some numbers and bring the complexity below O(n). Do any such solutions exist that give a better approach to this problem?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending(na, nb):
    assert nb >= na
    # split each number into a list of digits
    a = list(int(x) for x in str(na))
    b = list(int(x) for x in str(nb))
    d = len(b) - len(a)
    # if the numbers have different lengths, add leading zeros
    if d > 0:
        a = [0] * d + a  # add leading zeros
    assert len(a) == len(b)
    n = len(a)
    # check if the initial value has increasing digits as required,
    # and fix it if necessary
    for x in range(d + 1, n):
        if a[x] <= a[x - 1]:
            for y in range(x, n):
                a[y] = a[y - 1] + 1
            break
    res = []  # result set
    while a <= b:
        # if we found a valid value, add it to the result list;
        # turn the list of digits back into an integer
        if max(a) < 10:
            res.append(int(''.join(str(k) for k in a)))
        # in order to increase the number we look for the
        # least significant digit that can be increased
        for x in range(n - 1, -1, -1):  # count down from n-1 to 0
            if a[x] < 10 + x - n:
                break
        # digit x is to be increased
        a[x] += 1
        # all subsequent digits must be increased accordingly
        for y in range(x + 1, n):
            a[y] = a[y - 1] + 1
    return res
print( ascending( 5000, 9000 ) )
Sounds like a task from Project Euler. Here is a solution in C++. It is not short, but it is straightforward and efficient. Oh, and hey, it uses backtracking.
#include <algorithm>
#include <iostream>
#include <vector>

// Higher order digits at the back
typedef std::vector<int> Digits;

// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
    Digits digits;
    while (n > 0)
    {
        digits.push_back(n % 10);
        n /= 10;
    }
    if (digits.empty())
    {
        digits.push_back(0);
    }
    return digits;
}

// Main function
void PrintNumsRec(
    const Digits& minDigits, // digits of the min value
    const Digits& maxDigits, // digits of the max value
    Digits& digits,          // digits of current value
    int pos,                 // digits with index greater than pos are already filled
    bool minEq,              // currently filled digits are the same as of min value
    bool maxEq)              // currently filled digits are the same as of max value
{
    if (pos < 0)
    {
        // Print current value. Handle leading zeros by yourself, if needed
        for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
        {
            if (*pDigit >= 0)
            {
                std::cout << *pDigit;
            }
        }
        std::cout << std::endl;
        return;
    }
    // Compute iteration boundaries for current position
    int first = minEq ? minDigits[pos] : 0;
    int last = maxEq ? maxDigits[pos] : 9;
    // The last filled digit
    int prev = digits[pos + 1];
    // Make sure the generated number has increasing digits
    int firstInc = std::max(first, prev + 1);
    // Iterate through possible cases for the current digit
    for (int d = firstInc; d <= last; ++d)
    {
        digits[pos] = d;
        if (d == 0 && prev == -1)
        {
            // Mark leading zeros with -1
            digits[pos] = -1;
        }
        PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
    }
}

// High-level function
void PrintNums(int min, int max)
{
    auto minDigits = ExtractDigits(min);
    auto maxDigits = ExtractDigits(max);
    // Make the digit arrays the same size
    while (minDigits.size() < maxDigits.size())
    {
        minDigits.push_back(0);
    }
    Digits digits(minDigits.size());
    int pos = digits.size() - 1;
    // Placeholder for leading zero
    digits.push_back(-1);
    PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}

int main()
{
    PrintNums(53, 297);
    return 0;
}
It uses recursion to handle an arbitrary number of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
A much more interesting problem would be to count all these numbers without explicitly generating them. One would use dynamic programming for that.
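As a sketch of why counting is cheap here (names are illustrative): a number with strictly increasing digits corresponds to a non-empty subset of {1, ..., 9}, so there are only 2^9 - 1 = 511 of them and plain enumeration already suffices; a DP over digit positions would generalize this to larger digit alphabets.

from itertools import combinations

def count_ascending(lo, hi):
    # each strictly-increasing digit string is a subset of {1..9},
    # and combinations() yields digits in increasing order already
    count = 0
    for k in range(1, 10):
        for digits in combinations(range(1, 10), k):
            value = int(''.join(map(str, digits)))
            if lo <= value <= hi:
                count += 1
    return count

print(count_ascending(5000, 6000))  # 4, i.e. 5678 5679 5689 5789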
There is only a very limited number of numbers that can match your definition (at most 9 digits), and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;

namespace so_ascending_digits
{
    class Program
    {
        class Node
        {
            int digit;
            int value;
            List<Node> children;

            public Node(int val = 0, int dig = 0)
            {
                digit = dig;
                value = (val * 10) + digit;
                children = new List<Node>();
                for (int i = digit + 1; i < 10; i++)
                {
                    children.Add(new Node(value, i));
                }
            }

            public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
            {
                if ((value >= min) && (value <= max)) collection.Add(value);
                foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
            }
        }

        static void Main(string[] args)
        {
            Node root = new Node();
            List<int> numbers = new List<int>();
            root.Collect(ref numbers, 5000, 6000);
            numbers.Sort();
            Console.WriteLine(String.Join("\n", numbers));
        }
    }
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b-a-1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2^(lg(b - a + 1)) = 2^O(lg b) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2^n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n_1 ... n_k, where k is again roughly log_10 n.
For a and b with four digits, the iteration would look something like:
for w in a_1 .. 9:
    for x in w+1 .. 9:
        for y in x+1 .. 9:
            for z in y+1 .. 9:
                m = 1000 * w + 100 * x + 10 * y + z
                if m < a:
                    next
                if m > b:
                    exit
                output w ++ x ++ y ++ z (++ is just string concatenation)
where a_1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b), which is far better than the O(2^(lg b)) running time you get by checking whether every number is sorted.
One other detail that I have glossed over actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg² b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, preserving the originally stated O(lg b) running time.
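A sketch of that idea in Python (names are illustrative; the recursion carries the partial value down, so each digit costs one multiply-add instead of rebuilding m from scratch):

def ascending_in_range(a, b):
    results = []
    def rec(value, last_digit, digits_left):
        # value is the partial number built so far
        if digits_left == 0:
            if a <= value <= b:
                results.append(value)
            return
        for d in range(last_digit + 1, 10):
            rec(value * 10 + d, d, digits_left - 1)
    # try every possible length and leading digit
    for k in range(len(str(a)), len(str(b)) + 1):
        for first in range(1, 10):
            rec(first, first, k - 1)
    return results

print(ascending_in_range(5000, 6000))  # [5678, 5679, 5689, 5789]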
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
    for (digit2 = digit3+1; digit2 <= '9'; digit2++)
        for (digit1 = digit2+1; digit1 <= '9'; digit1++)
            for (digit0 = digit1+1; digit0 <= '9'; digit0++)
                output = digit3 + digit2 + digit1 + digit0; // concatenation

Finding unions of line segments on a number line

I have a number line between 0 and 1000, with many line segments on it. Every segment's x1 is >= 0 and every x2 is < 1000. All x1 and x2 are integers.
I need to find all of the unions of the line segments.
In this image, the line segments are in blue and the unions are in red:
Is there an existing algorithm for this type of problem?
You can use Marzullo's algorithm (see Wikipedia for more details).
Here is a Python implementation I wrote:
def ip_ranges_grouping(range_lst):
    ## Based on Marzullo's algorithm
    ## Input: list of IP ranges (strings like 'start-end')
    ## Returns a new merged list of IP ranges
    ## (ip2int/int2ip are address/integer conversion helpers, not shown here)
    table = []
    for rng in range_lst:
        start, end = rng.split('-')
        table.append((ip2int(start), 1))
        table.append((ip2int(end), -1))
    table.sort(key=lambda x: x[0])
    # make sure a range starting where another ends is merged, not split
    for i in range(len(table) - 1):
        if (table[i][0] == table[i+1][0]) and (table[i][1] == -1) and (table[i+1][1] == 1):
            table[i], table[i+1] = table[i+1], table[i]
    merged = []
    end = count = 0
    while end < len(table):
        start = end
        count += table[end][1]
        while count > 0:  # upon last index, count == 0 and loop terminates
            end += 1
            count += table[end][1]
        merged.append(int2ip(table[start][0]) + '-' + int2ip(table[end][0]))
        end += 1
    return merged
Considering that the coordinates of your segments are bounded integers ([0, 1000]), you could use an array of size 1000 initialized with zeroes. You then run through your set of segments and set 1 in every cell of the array that a segment covers. Finally, you run through the array checking for contiguous sequences of 1s. For example:
--- -----
--- ---
1111100111111100
The complexity depends on the number of segments and also on their length; a sketch follows.
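A sketch of this idea in Python (names are illustrative; it assumes a segment (x1, x2) covers the unit cells x1 .. x2-1, so adapt if your endpoints are inclusive):

def unions_via_array(segments, size=1000):
    # mark each unit cell covered by at least one segment
    covered = [False] * size
    for x1, x2 in segments:
        for x in range(x1, x2):
            covered[x] = True
    # scan for maximal runs of covered cells; each run is one union
    unions, start = [], None
    for x in range(size + 1):
        inside = x < size and covered[x]
        if inside and start is None:
            start = x
        elif not inside and start is not None:
            unions.append((start, x))
            start = None
    return unions

print(unions_via_array([(1, 4), (3, 7), (10, 12)]))  # [(1, 7), (10, 12)]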
Here is another method, which also works for floating-point segments. Sort the segments by their left ends, then travel through the sorted segments and compare the boundaries of each pair of adjacent segments: if they cross, they belong to the same union.
If the segments are not changed dynamically, it is a simple problem. Just sort all the segments by the left end, then scan the sorted elements:
#include <algorithm>

struct Seg {int L, R;};

bool cmp(const Seg &a, const Seg &b) {return a.L < b.L;}

// merges the n segments in segs; returns the number of unions written to output
int union_segs(int n, Seg *segs, Seg *output) {
    std::sort(segs, segs + n, cmp);
    int right_most = -1;
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        if (segs[i].L > right_most) {
            // this segment starts a new union
            right_most = segs[i].R;
            output[cnt].L = segs[i].L;
            output[cnt].R = segs[i].R;
            ++cnt;
        }
        else if (segs[i].R > right_most) {
            // this segment extends the current union to the right
            right_most = segs[i].R;
            output[cnt - 1].R = segs[i].R;
        }
    }
    return cnt;
}
The time complexity is O(n log n) (sorting) + O(n) (scanning).
If the segments are inserted and deleted dynamically, and you want to query the union at any time, you will need a more complicated data structure, such as a range tree.
