Related
A function is given with a method to get the next integer from a stream of integers. The numbers are fetched sequentially from the stream. How will we go about producing a summary of integers encountered till now?
Given a list of numbers, the summary will consist of the ranges of numbers. Example: The list till now = [1,5,4,2,7] then summary = [[1-2],[4-5],7]
Put the number in ranges if they are continuous.
My Thoughts:
Approach 1:
Maintain the sorted numbers. So when we fetch a new number from a stream, we can use binary search to find the location of the number in the list and insert the element so that the resulting list is sorted. But since this is a list, I think inserting the element will be an O(N) operation.
Approach 2:
Use Balanced binary search trees like Red, Black, or AVL. Each insertion will be O(log N)
and in order will yield the sorted array from which one can compute the range in O(N)
Approach 2 looks like a better approach if I am not making any mistakes. I am unsure if there is a better way to solve this issue.
I'd not keep the original numbers, but aggregate them to ranges on the fly. This has the potential to reduce the number of elements by quite some factor (depending on the ordering and distribution of the incoming values). The task itself seems to imply that you expect contiguous ranges of integers to appear quite frequently in the input.
Then a newly incoming number can fall into one of a few cases:
It is already contained in some range: then simply ignore the number (this is only relevant if duplicate inputs can happen).
It is adjacent to none of the ranges so far: create a new single-element range.
It is adjacent to exactly one range: extend that range by 1, downward or upward.
It is adjacent to two ranges (i.e. fills the gap): merge the two ranges.
For the data structure holding the ranges, you want a good performance for the following operations:
Find the place (position) for a given number.
Insert a new element (range) at a given place.
Merge two (neighbor) elements. This can be broken down into:
Remove an element at a given place.
Modify an element at a given place.
Depending on the expected number und sparsity of ranges, a sorted list of ranges might do. Otherwise, some kind of search tree might turn out helpful.
Anyway, start with the most readable approach, measure performance for typical cases, and decide whether some optimization is necessary.
I suggest maintaining a hashmap that maps each integer seen so far to the interval it belongs to.
Make sure that two numbers that are part of the same interval will point to the same interval object, not to copies; so that if you update an interval to extend it, all numbers can see it.
All operations are O(1), except the operation "merge two intervals" that happens if the stream produces integer x when we have two intervals [a, x - 1] and [x + 1, b]. The merge operation is proportional to the length of the shortest of these two intervals.
As a result, for a stream of n integers, the algorithm's complexity is O(n) in the best-case (where at most a few big merges happen) and O(n log n) in the worst-case (when we keep merging lots of intervals).
In python:
def add_element(intervals, x):
if x in intervals: # do not do anything
pass
elif x + 1 in intervals and x - 1 in intervals: # merge two intervals
i = intervals[x - 1]
j = intervals[x + 1]
if i[1]-i[0] > j[1]-j[0]: # j is shorter: update i, and make everything in j point to i
i[1] = j[1]
for y in range(j[0] - 1, j[1]+1):
intervals[y] = i
else: # i is shorter: update j, and make everything in i point to j
j[0] = i[0]
for y in range(i[0], i[1] + 2):
intervals[y] = j
elif x + 1 in intervals: # extend one interval to the left
i = intervals[x + 1]
i[0] = x
intervals[x] = i
elif x - 1 in intervals: # extend one interval to the right
i = intervals[x - 1]
i[1] = x
intervals[x] = i
else: # add a singleton
intervals[x] = [x,x]
return intervals
from random import shuffle
def main():
stream = list(range(10)) * 2
shuffle(stream)
print(stream)
intervals = {}
for x in stream:
intervals = add_element(intervals, x)
print(x)
print(set(map(tuple, intervals.values()))) # this line terribly inefficient because I'm lazy
if __name__=='__main__':
main()
Output:
[1, 5, 8, 3, 9, 6, 7, 9, 3, 0, 6, 5, 8, 1, 4, 7, 2, 2, 0, 4]
1
{(1, 1)}
5
{(1, 1), (5, 5)}
8
{(8, 8), (1, 1), (5, 5)}
3
{(8, 8), (1, 1), (5, 5), (3, 3)}
9
{(8, 9), (1, 1), (5, 5), (3, 3)}
6
{(3, 3), (1, 1), (8, 9), (5, 6)}
7
{(5, 9), (1, 1), (3, 3)}
9
{(5, 9), (1, 1), (3, 3)}
3
{(5, 9), (1, 1), (3, 3)}
0
{(0, 1), (5, 9), (3, 3)}
6
{(0, 1), (5, 9), (3, 3)}
5
{(0, 1), (5, 9), (3, 3)}
8
{(0, 1), (5, 9), (3, 3)}
1
{(0, 1), (5, 9), (3, 3)}
4
{(0, 1), (3, 9)}
7
{(0, 1), (3, 9)}
2
{(0, 9)}
2
{(0, 9)}
0
{(0, 9)}
4
{(0, 9)}
You could use a Disjoint Set Forest implementation for this. If well-implemented, it gives a near linear time complexity for inserting 𝑛 elements into it. The amortized running time of each insert operation is Θ(α(𝑛)) where α(𝑛) is the inverse Ackermann function. For all practical purposes we can not distinguish this from O(1).
The extraction of the ranges can have a time complexity of O(𝑘), where 𝑘 is the number of ranges, provided that the disjoint set maintains the set of root nodes. If the ranges need to be sorted, then this extraction will have a time complexity of O(𝑘log𝑘), as it will then just perform the sort-operation on it.
Here is an implementation in Python:
class Node:
def __init__(self, value):
self.low = value
self.parent = self
self.size = 1
def find(self): # Union-Find: Path splitting
node = self
while node.parent is not node:
node, node.parent = node.parent, node.parent.parent
return node
class Ranges:
def __init__(self):
self.nums = dict()
self.roots = set()
def union(self, a, b): # Union-Find: Size-based merge
a = a.find()
b = b.find()
if a is not b:
if a.size > b.size:
a, b = b, a
self.roots.remove(a) # Keep track of roots
a.parent = b
b.low = min(a.low, b.low)
b.size = a.size + b.size
def add(self, n):
if n not in self.nums:
self.nums[n] = node = Node(n)
self.roots.add(node)
if (n+1) in self.nums:
self.union(node, self.nums[n+1])
if (n-1) in self.nums:
self.union(node, self.nums[n-1])
def get(self):
return sorted((node.low, node.low + node.size - 1) for node in self.roots)
# example run
ranges = Ranges()
for n in 4, 7, 1, 6, 2, 9, 5:
ranges.add(n)
print(ranges.get()) # [(1, 2), (4, 7), (9, 9)]
Given A and B, which are two interval lists. A has no overlap inside A and B has no overlap inside B. In A, the intervals are sorted by their starting points. In B, the intervals are sorted by their starting points. How do you merge the two interval lists and output the result with no overlap?
One method is to concatenate the two lists, sort by the starting point, and apply merge intervals as discussed at https://www.geeksforgeeks.org/merging-intervals/. Is there a more efficient method?
Here is an example:
A: [1,5], [10,14], [16,18]
B: [2,6], [8,10], [11,20]
The output:
[1,6], [8, 20]
So you have two sorted lists with events - entering interval and leaving interval.
Merge these lists keeping current state as integer 0, 1, 2 (active interval count)
Get the next coordinate from both lists
If it is entering event
Increment state
If state becomes 1, start new output interval
If it is closing event
Decrement state
If state becomes 0, close current output interval
Note that this algo is similar to intersection finding there
Here is a different approach, in the spirit of the answer to the question of overlaps.
<!--code lang=scala-->
def findUnite (l1: List[Interval], l2: List[Interval]): List[Interval] = (l1, l2) match {
case (Nil, Nil) => Nil
case (as, Nil) => as
case (Nil, bs) => bs
case (a :: as, b :: bs) => {
if (a.lower > b.upper) b :: findUnite (l1, bs)
else if (a.upper < b.lower) a :: findUnite (as, l2)
else if (a.upper > b.upper) findUnite (a.union (b).get :: as, bs)
else findUnite (as, a.union (b).get :: bs)
}
}
If both lists are empty - return the empty list.
If only one is empty, return the other.
If the upper bound of one list is below the lower bound of the other, there is no unification possible, so return the other and proceed with the rest.
If they overlap, don't return, but call the method recursively, the unification on the side of the more far reaching interval and without the consumed less far reaching interval.
The union method looks similar to the one which does the overlap:
<!--code scala-->
case class Interval (lower: Int, upper: Int) {
// from former question, to compare
def overlap (other: Interval) : Option [Interval] = {
if (lower > other.upper || upper < other.lower) None else
Some (Interval (Math.max (lower, other.lower), Math.min (upper, other.upper)))
}
def union (other: Interval) : Option [Interval] = {
if (lower > other.upper || upper < other.lower) None else
Some (Interval (Math.min (lower, other.lower), Math.max (upper, other.upper)))
}
}
The test for non overlap is the same. But min and max have changed places.
So for (2, 4) (3, 5) the overlap is (3, 4), the union is (2, 5).
lower upper
_____________
2 4
3 5
_____________
min 2 4
max 3 5
Table of min/max lower/upper.
<!--code lang='scala'-->
val e = List (Interval (0, 4), Interval (7, 12))
val f = List (Interval (1, 3), Interval (6, 8), Interval (9, 11))
findUnite (e, f)
// res3: List[Interval] = List(Interval(0,4), Interval(6,12))
Now for the tricky or unclear case from above:
val e = List (Interval (0, 4), Interval (7, 12))
val f = List (Interval (1, 3), Interval (5, 8), Interval (9, 11))
findUnite (e, f)
// res6: List[Interval] = List(Interval(0,4), Interval(5,12))
0-4 and 5-8 don't overlap, so they form two different results which don't get merged.
A simple solution could be, to deflate all elements, put them into a set, sort it, then iterate to transform adjectant elements to Intervals.
A similar approach could be chosen for your other question, just eliminating all distinct values to get the overlaps.
But - there is a problem with that approach.
Lets define a class Interval:
case class Interval (lower: Int, upper: Int) {
def deflate () : List [Int] = {(lower to upper).toList}
}
and use it:
val e = List (Interval (0, 4), Interval (7, 12))
val f = List (Interval (1, 3), Interval (6, 8), Interval (9, 11))
deflating:
e.map (_.deflate)
// res26: List[List[Int]] = List(List(0, 1, 2, 3, 4), List(7, 8, 9, 10, 11, 12))
f.map (_.deflate)
// res27: List[List[Int]] = List(List(1, 2, 3), List(6, 7, 8), List(9, 10, 11))
The ::: combines two Lists, here two Lists of Lists, which is why we have to flatten the result, to make one big List:
(res26 ::: res27).flatten
// res28: List[Int] = List(0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 1, 2, 3, 6, 7, 8, 9, 10, 11)
With distinct, we remove duplicates:
(res26 ::: res27).flatten.distinct
// res29: List[Int] = List(0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 6)
And then we sort it:
(res26 ::: res27).flatten.distinct.sorted
// res30: List[Int] = List(0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12)
All in one command chain:
val united = ((e.map (_.deflate) ::: f.map (_.deflate)).flatten.distinct).sorted
// united: List[Int] = List(0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12)
// ^ (Gap)
Now we have to find the gaps like the one between 4 and 6 and return two distinct Lists.
We go recursively through the input list l, and if the element is from the sofar collected elements 1 bigger than the last, we collect that element into this sofar-list. Else we return the sofar collected list as partial result, followed by splitting of the rest with a List of just the current element as new sofar-collection. In the beginning, sofar is empty, so we can start right with adding the first element into that list and splitting the tail with that.
def split (l: List [Int], sofar: List[Int]): List[List[Int]] = l match {
case Nil => List (sofar)
case h :: t => if (sofar.isEmpty) split (t, List (h)) else
if (h == sofar.head + 1) split (t, h :: sofar)
else sofar :: split (t, List (h))
}
// Nil is the empty list, we hand in for initialization
split (united, Nil)
// List(List(4, 3, 2, 1, 0), List(12, 11, 10, 9, 8, 7, 6))
Converting the Lists into intervals would be a trivial task - take the first and last element, and voila!
But there is a problem with that approach. Maybe you recognized, that I redefined your A: and B: (from the former question). In B, I redefined the second element from 5-8 to 6-8. Because else, it would merge with the 0-4 from A because 4 and 5 are direct neighbors, so why not combine them to a big interval?
But maybe it is supposed to work this way? For the above data:
split (united, Nil)
// List(List(6, 5, 4, 3, 2, 1), List(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8))
I want to get number of permutations of {1, ..., n} for which Insertion Sort does exactly n(n-1)/2 comparisions.
For example, for {1, 2, 3, 4} we got (4, 3, 2, 1), (3, 4, 2, 1), (4, 2, 3, 1) etc. - for all of them InsertionSort does 4*3/2 = 6 comparisions.
Anybody know some exact formula for that?
I am thinking about something like (n-1) + 1 = n, where
1 stands for reverse sequence and then we can swap all of (n-1) pairs in reverse sequence.
Here is a hint. The complete list for (1, 2, 3, 4) are:
(4, 3, 2, 1)
(3, 4, 2, 1)
(4, 2, 3, 1)
(2, 4, 3, 1)
(4, 3, 1, 2)
(3, 4, 1, 2)
(4, 1, 3, 2)
(1, 4, 3, 2)
Look at it from last column to first.
Walk step by step through the insertion sorts. See where they merge. Do you see a pattern there?
Reversing it, can you figure out how I generated this list? Can you prove that the list is complete?
The why is what matters here. Just saying 2n-1 is useless.
n(n-1)/2 is the sum of all elements in the range (1, n - 1). Since your sequence has length n, you can expand that range to (0, n - 1).
The number of swaps for each insertion would be:
run # list value swaps
1 [] a 0 (no swaps possible)
2 [a] b 1
3 [b, a] c 2
...
10 [i,...,a] j 9
...
n [...] ? n - 1
So we need to move every element through the entire list in order to achieve the required count of swaps. The number of comparisons can be at most one higher than the number of swaps, which means each value that is being inserted must either be placed at the first or second index of the resulting list. Or
Put differently, assuming ascending ordering of the output:
The input list should in general be a nearly descending list, where each element in the list may be preceded by at most one element that is not larger than the element in question.
If I have data that's in the form of a list of tuples:
[(uid, start_time, end_time)]
I'd like to find all unique combinations of uids that overlap in time. Eg, if I had a list like the following:
[(0, 1, 2),
(1, 1.1, 3),
(2, 1.5, 2.5),
(3, 2.5, 4),
(4, 4, 5)]
I'd like to get as output:
[(0,1,2), (1,3), (0,), (1,), (2,), (3,), (4,)]
Is there a faster algorithm for this than the naive brute force?
First, sort your tuples by start time. Keep a heap of active tuples, which has the one with the earliest end time on top.
Then, you move through your sorted list and add tuples to the active set. Doing so, you also check if you need to remove tuples. If so, you can report an interval. In order to avoid duplicate reports, report new intervals only if there has been a new tuple added to the active set since the last report.
Here is some pseudo-code that visualizes the idea:
sort(tuples)
activeTuples := new Heap
bool newInsertAfterLastReport = false
for each tuple in tuples
while activeTuples is not empty and activeTuples.top.endTime <= tuple.startTime
//the first tuple from the active set has to be removed
if newInsertAfterLastReport
report activeTuples
newInsertAfterLastReport = false
activeTuples.pop()
end while
activeTuples.insert(tuple)
newInsertAfterLastReport = true
next
if activeTuples has more than 1 entry
report activeTuples
With your example data set you get:
data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
tuple activeTuples newInsertAfterLastReport
---------------------------------------------------------------------
(0, 1, 2) [] false
[(0, 1, 2)] true
(1, 1.1, 3)
[(0, 1, 2), (1, 1.1, 3)]
(2, 1.5, 2.5)
[(0, 1, 2), (2, 1.5, 2.5), (1, 1.1, 3)]
(3, 2.5, 4) -> report (0, 1, 2)
[(2, 1.5, 2.5), (1, 1.1, 3)] false
[(1, 1.1, 3)]
[(1, 1.1, 3), (3, 2.5, 4)] true
(4, 4, 5) -> report (1, 3) false
[(3, 2.5, 4)]
[]
[(4, 4, 5)]
Actually, I would remove the if activeTuples has more than 1 entry part and always report at the end. This would result in an additional report of (4) because it is not included in any of the previous reports (whereas (0) ... (3) are).
I think this can be done in O(n lg n + n o) time where o is the maximum size of your output (o could be n in the worst case).
Build a 3-tuple for each start_time or end_time as follows: the first component is the start_time or end_time of an input tuple, the second component is the id of the input tuple, the third component is whether it's start_time or end_time. Now you have 2n 3-tuples. Sort them in ascending order of the first component.
Now start scanning the list of 3-tuples from the smallest to the largest. Each time a range starts, add its id to a balanced binary search tree (in O(lg o) time), and output the contents of the tree (in O(o)), and each time a range ends, remove its id from the tree (in O(lg o) time).
You also need to take care of the corner cases, e.g., how to deal with equal start and end times either of the same range or of different ranges.
I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps an index in the sorted x values with the index in the sorted y values corresponding to the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming a zero based indexing scheme, the list that maps the x list index to the index of its paired y list index would be (2, 3, 0, 4, 5, 1).
For example the value 7 appears as index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?
Here's a simple O(nlog n) method:
Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
Create the sorted x list simply by removing the 2nd element from the list produced in step 1.
Create the sorted y list by doing the same with the list produced in step 3.
I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the current index of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get the original_index, the second pair of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
#generate unsorted lists
xs, ys = zip(*points)
#pair each element with its index
xs = zip(xs, range(len(xs)))
ys = zip(ys, range(len(xs)))
#sort
xs.sort()
ys.sort()
#generate the y positions list.
y_positions = [None] * len(ys)
for i in range(len(ys)):
original_index = ys[i][1]
y_positions[original_index] = i
#generate `index_list`
index_list = []
for x, original_index in xs:
index_list.append(y_positions[original_index])
#remove tuples from x and y lists
xs = zip(*xs)[0]
ys = zip(*ys)[0]
print "xs:", xs
print "ys:", ys
print "index list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.
Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem seems more complicated than it appears at first glance and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))
# Generate the mapping list.
t = N * [0]
for i in range(N):
t[r[i]] = i
index_list = [t[j] for j in s]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]
I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list