Tuple becomes empty after one iteration of a for loop

I have a problem with this code. Even though the if conditional only checks each child tuple for equality with the event, after the second iteration of the for loop the whole parent tuple becomes empty, and I receive the error:
IndexError: list index out of range
tup = ((1, 2), (1, 4), (1, 3), (1, 2))

def remove_dup(event, schedule):
    for i in schedule:
        if i == event:
            schedule = tuple((list(schedule[1:])).remove(i))
    return schedule

remove_dup((1, 2), tup)  # should remove the (1, 2) at index 3
I was hoping that the for loop would remove the last child tuple (1, 2), as it is equivalent to the event.
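For comparison, note that list.remove() mutates the list in place and returns None, so tuple(...) on its result cannot produce the intended schedule. A minimal sketch of the intended behavior (assuming the goal is to keep the first occurrence of the event and drop later duplicates) might look like:

```python
def remove_dup(event, schedule):
    seen = False
    result = []
    for item in schedule:
        if item == event:
            if seen:        # a later duplicate of the event: skip it
                continue
            seen = True     # first occurrence: keep it
        result.append(item)
    return tuple(result)

tup = ((1, 2), (1, 4), (1, 3), (1, 2))
print(remove_dup((1, 2), tup))  # ((1, 2), (1, 4), (1, 3))
```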


Data structure to convert a stream of integers into a list of ranges

A function is given a method to get the next integer from a stream of integers; the numbers are fetched sequentially. How do we produce a summary of the integers encountered so far?
Given a list of numbers, the summary consists of the ranges of numbers. Example: if the list so far is [1,5,4,2,7], then the summary is [[1-2],[4-5],7].
Numbers are put into the same range if they are contiguous.
My Thoughts:
Approach 1:
Maintain the sorted numbers. So when we fetch a new number from a stream, we can use binary search to find the location of the number in the list and insert the element so that the resulting list is sorted. But since this is a list, I think inserting the element will be an O(N) operation.
Approach 2:
Use a balanced binary search tree such as a red-black or AVL tree. Each insertion is O(log N),
and an in-order traversal yields the sorted array, from which one can compute the ranges in O(N).
Approach 2 looks like a better approach if I am not making any mistakes. I am unsure if there is a better way to solve this issue.
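For reference, Approach 1 can be sketched with the standard bisect module (the helper name summarize is an assumption): bisect.insort keeps the numbers sorted (O(N) per insert due to list shifting), and the summary is then a single linear scan over the sorted list.

```python
import bisect

def summarize(sorted_nums):
    """Collapse a sorted list of integers into inclusive [lo, hi] ranges."""
    res = []
    for x in sorted_nums:
        if res and res[-1][1] >= x:
            continue                # duplicate of a number already covered
        if res and res[-1][1] == x - 1:
            res[-1][1] = x          # contiguous: extend the last range
        else:
            res.append([x, x])      # gap: start a new range
    return res

nums = []
for x in [1, 5, 4, 2, 7]:
    bisect.insort(nums, x)          # O(N) insert keeping nums sorted
print(summarize(nums))              # [[1, 2], [4, 5], [7, 7]]
```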
I'd not keep the original numbers, but aggregate them to ranges on the fly. This has the potential to reduce the number of elements by quite some factor (depending on the ordering and distribution of the incoming values). The task itself seems to imply that you expect contiguous ranges of integers to appear quite frequently in the input.
Then a newly incoming number can fall into one of a few cases:
It is already contained in some range: then simply ignore the number (this is only relevant if duplicate inputs can happen).
It is adjacent to none of the ranges so far: create a new single-element range.
It is adjacent to exactly one range: extend that range by 1, downward or upward.
It is adjacent to two ranges (i.e. fills the gap): merge the two ranges.
For the data structure holding the ranges, you want a good performance for the following operations:
Find the place (position) for a given number.
Insert a new element (range) at a given place.
Merge two (neighbor) elements. This can be broken down into:
Remove an element at a given place.
Modify an element at a given place.
Depending on the expected number and sparsity of ranges, a sorted list of ranges might do. Otherwise, some kind of search tree might turn out helpful.
Anyway, start with the most readable approach, measure performance for typical cases, and decide whether some optimization is necessary.
I suggest maintaining a hashmap that maps each integer seen so far to the interval it belongs to.
Make sure that two numbers that are part of the same interval will point to the same interval object, not to copies; so that if you update an interval to extend it, all numbers can see it.
All operations are O(1), except the operation "merge two intervals" that happens if the stream produces integer x when we have two intervals [a, x - 1] and [x + 1, b]. The merge operation is proportional to the length of the shortest of these two intervals.
As a result, for a stream of n integers, the algorithm's complexity is O(n) in the best-case (where at most a few big merges happen) and O(n log n) in the worst-case (when we keep merging lots of intervals).
In Python:
def add_element(intervals, x):
    if x in intervals:  # nothing to do
        pass
    elif x + 1 in intervals and x - 1 in intervals:  # merge two intervals
        i = intervals[x - 1]
        j = intervals[x + 1]
        if i[1] - i[0] > j[1] - j[0]:  # j is shorter: update i, and make everything in j (and x) point to i
            i[1] = j[1]
            for y in range(j[0] - 1, j[1] + 1):
                intervals[y] = i
        else:  # i is shorter: update j, and make everything in i (and x) point to j
            j[0] = i[0]
            for y in range(i[0], i[1] + 2):
                intervals[y] = j
    elif x + 1 in intervals:  # extend one interval to the left
        i = intervals[x + 1]
        i[0] = x
        intervals[x] = i
    elif x - 1 in intervals:  # extend one interval to the right
        i = intervals[x - 1]
        i[1] = x
        intervals[x] = i
    else:  # add a singleton
        intervals[x] = [x, x]
    return intervals
from random import shuffle

def main():
    stream = list(range(10)) * 2
    shuffle(stream)
    print(stream)
    intervals = {}
    for x in stream:
        intervals = add_element(intervals, x)
        print(x)
        print(set(map(tuple, intervals.values())))  # this line terribly inefficient because I'm lazy

if __name__ == '__main__':
    main()
Output:
[1, 5, 8, 3, 9, 6, 7, 9, 3, 0, 6, 5, 8, 1, 4, 7, 2, 2, 0, 4]
1
{(1, 1)}
5
{(1, 1), (5, 5)}
8
{(8, 8), (1, 1), (5, 5)}
3
{(8, 8), (1, 1), (5, 5), (3, 3)}
9
{(8, 9), (1, 1), (5, 5), (3, 3)}
6
{(3, 3), (1, 1), (8, 9), (5, 6)}
7
{(5, 9), (1, 1), (3, 3)}
9
{(5, 9), (1, 1), (3, 3)}
3
{(5, 9), (1, 1), (3, 3)}
0
{(0, 1), (5, 9), (3, 3)}
6
{(0, 1), (5, 9), (3, 3)}
5
{(0, 1), (5, 9), (3, 3)}
8
{(0, 1), (5, 9), (3, 3)}
1
{(0, 1), (5, 9), (3, 3)}
4
{(0, 1), (3, 9)}
7
{(0, 1), (3, 9)}
2
{(0, 9)}
2
{(0, 9)}
0
{(0, 9)}
4
{(0, 9)}
You could use a Disjoint Set Forest implementation for this. If well implemented, it gives a near-linear time complexity for inserting n elements into it. The amortized running time of each insert operation is Θ(α(n)), where α(n) is the inverse Ackermann function. For all practical purposes we cannot distinguish this from O(1).
The extraction of the ranges can have a time complexity of O(k), where k is the number of ranges, provided that the disjoint set maintains the set of root nodes. If the ranges need to be sorted, this extraction has a time complexity of O(k log k), as it then just performs the sort operation on them.
Here is an implementation in Python:
class Node:
    def __init__(self, value):
        self.low = value
        self.parent = self
        self.size = 1

    def find(self):  # Union-Find: path splitting
        node = self
        while node.parent is not node:
            node, node.parent = node.parent, node.parent.parent
        return node

class Ranges:
    def __init__(self):
        self.nums = dict()
        self.roots = set()

    def union(self, a, b):  # Union-Find: size-based merge
        a = a.find()
        b = b.find()
        if a is not b:
            if a.size > b.size:
                a, b = b, a
            self.roots.remove(a)  # keep track of roots
            a.parent = b
            b.low = min(a.low, b.low)
            b.size = a.size + b.size

    def add(self, n):
        if n not in self.nums:
            self.nums[n] = node = Node(n)
            self.roots.add(node)
            if (n + 1) in self.nums:
                self.union(node, self.nums[n + 1])
            if (n - 1) in self.nums:
                self.union(node, self.nums[n - 1])

    def get(self):
        return sorted((node.low, node.low + node.size - 1) for node in self.roots)

# example run
ranges = Ranges()
for n in 4, 7, 1, 6, 2, 9, 5:
    ranges.add(n)
print(ranges.get())  # [(1, 2), (4, 7), (9, 9)]

Cartesian product but remove duplicates up to cyclic permutations

Given two integers n and r, I want to generate all possible combinations with the following rules:
There are n distinct numbers to choose from, 1, 2, ..., n;
Each combination should have r elements;
A combination may contain the same element more than once; for instance, (1,2,2) is valid;
Order matters, i.e. (1,2,3) and (1,3,2) are considered distinct;
However, two combinations are considered equivalent if one is a cyclic permutation of the other; for instance, (1,2,3) and (2,3,1) are considered duplicates.
Examples:
n=3, r=3
11 distinct combinations
(1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,2,3), (1,3,2), (1,3,3), (2,2,2), (2,2,3), (2,3,3) and (3,3,3)
n=2, r=4
6 distinct combinations
(1,1,1,1), (1,1,1,2), (1,1,2,2), (1,2,1,2), (1,2,2,2), (2,2,2,2)
What is the algorithm for this? And how can it be implemented in C++?
Thanks in advance for any advice.
Here is a naive solution in Python:
Generate all combinations from the Cartesian product of {1, 2, ..., n} with itself r times;
Only keep one representative combination for each equivalence class; drop all other combinations that are equivalent to this representative.
This means we must have some way to compare combinations and, for instance, keep only the smallest combination of every equivalence class.
from itertools import product

def is_representative(comb):
    return all(comb[i:] + comb[:i] >= comb
               for i in range(1, len(comb)))

def cartesian_product_up_to_cyclic_permutations(n, r):
    return filter(is_representative,
                  product(range(n), repeat=r))

print(list(cartesian_product_up_to_cyclic_permutations(3, 3)))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
print(list(cartesian_product_up_to_cyclic_permutations(2, 4)))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
You mentioned that you wanted to implement the algorithm in C++. The product function in the Python code behaves just like a big for-loop that generates all the combinations of the Cartesian product. Note that the code uses the values 0, ..., n-1 instead of 1, ..., n; shift by one if needed. See this related question for implementing the Cartesian product in C++: Is it possible to execute n number of nested "loops(any)" where n is given?.

Filling in Julia matrix with nested for loops

I have two arrays and an empty matrix, I need to perform a function such that the resulting matrix includes every combination of the two arrays.
Unfortunately I cannot run the arrays separately as they are both optional parameters for the function. I thought that the best way to do this was through nested loops but now I am unsure...
I've tried multiplying one of the matrices so that it includes the necessary duplicates, but I struggled with that as the real data is somewhat larger.
I've tried many versions of these nested loops.
a = [ 1 2 3 ]
b = [ 4 5 6 7 ]
ab = zeros(3,4)

for i = 1:length(a)
    for j = 1:length(b)
        ab[??] = function(x = a[??], y = b[??])
    end
end
ab = [1x4 1x5 1x6 1x7, 2x4 2x5 2x6 2x7, 3x4 3x5 3x6 3x7]
Your problem can be solved by broadcasting:
julia> f(x, y) = (x,y) # trivial example
f (generic function with 1 method)
julia> f.([1 2 3]', [4 5 6 7])
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)
The prime in a' transposes a to make the shapes work out correctly.
But note that a = [ 1 2 3 ] constructs a 1×3 Array{Int64,2}, which is a matrix. For a vector (what you probably call "array"), use commas: a = [ 1, 2, 3 ] etc. If you have your data in that form, you have to transpose the other way round:
julia> f.([1,2,3], [4,5,6,7]')
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)
BTW, this is called an "outer product" (for f = *), or a generalization of it. And if f is an operator like ∘, you can use dotted infix broadcasting: a' .∘ b.
Isn't that just
a'.*b
?
Oh, now I have to write some more characters to get past the minimum acceptable answer length but I don't really have anything to add, I hope the code is self-explanatory.
Also a list comprehension:
julia> a = [1,2,3];
julia> b = [4,5,6,7];
julia> ab = [(x,y) for x in a, y in b]
3×4 Array{Tuple{Int64,Int64},2}:
(1, 4) (1, 5) (1, 6) (1, 7)
(2, 4) (2, 5) (2, 6) (2, 7)
(3, 4) (3, 5) (3, 6) (3, 7)

Sort by key then value, then group up the results (PySpark)

So I'm trying to sort data in this format...
[((0, 4), 3), ((4, 0), 3), ((1, 6), 1), ((3, 2), 3), ((0, 5), 1)...
Ascending by key and then descending by value. I'm able to achieve this via...
test = test.sortBy(lambda x: (x[0], -x[1]))
which would give me based on shortened version above...
[((0, 4), 3), ((0, 5), 1), ((1, 6), 1), ((3, 2), 3), ((4, 0), 3)...
The problem I'm having is that after the sorting I no longer want the value but do need to retain the sort after grouping the data. So...
test = test.map(lambda x: (x[0][0],x[0][1]))
Gives me...
[(0, 4), (0, 5), (1, 6), (3, 2), (4, 0)...
Which is still in the order I need it but I need the elements to be grouped up by key. I then use this command...
test = test.groupByKey().map(lambda x: (x[0], list(x[1])))
But in the process I lose the sorting. Is there any way to retain it?
I managed to retain the order by changing the format of the tuple...
test = test.map(lambda x: (x[0][0], (x[0][1], x[1])))
test = test.groupByKey().map(lambda x: (x[0], sorted(list(x[1]), key=lambda x: (x[0], -x[1]))))
[(0, [(4, 3), (5, 1)] ...
which leaves me with the value (2nd element in the tuple) that I want to get rid of but took care of that too...
test = test.map(lambda x: (x[0], [e[0] for e in x[1]]))
Feels a bit hacky but not sure how else it could be done.
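Outside Spark, the same pipeline can be sketched in plain Python (with the sample data from the question): the sort key mirrors the sortBy lambda above, and a plain dict plays the role of groupByKey while preserving the sorted order.

```python
data = [((0, 4), 3), ((4, 0), 3), ((1, 6), 1), ((3, 2), 3), ((0, 5), 1)]

# ascending by key, descending by value -- same key as the sortBy above
ordered = sorted(data, key=lambda x: (x[0], -x[1]))

grouped = {}
for (a, b), _value in ordered:   # the value is only needed for the ordering
    grouped.setdefault(a, []).append(b)

print(grouped)  # {0: [4, 5], 1: [6], 3: [2], 4: [0]}
```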

Finding all unique combinations of overlapping items?

If I have data that's in the form of a list of tuples:
[(uid, start_time, end_time)]
I'd like to find all unique combinations of uids that overlap in time. Eg, if I had a list like the following:
[(0, 1, 2),
(1, 1.1, 3),
(2, 1.5, 2.5),
(3, 2.5, 4),
(4, 4, 5)]
I'd like to get as output:
[(0,1,2), (1,3), (0,), (1,), (2,), (3,), (4,)]
Is there a faster algorithm for this than the naive brute force?
First, sort your tuples by start time. Keep a heap of active tuples, which has the one with the earliest end time on top.
Then, you move through your sorted list and add tuples to the active set. Doing so, you also check if you need to remove tuples. If so, you can report an interval. In order to avoid duplicate reports, report new intervals only if there has been a new tuple added to the active set since the last report.
Here is some pseudo-code that visualizes the idea:
sort(tuples)
activeTuples := new Heap
bool newInsertAfterLastReport = false
for each tuple in tuples
    while activeTuples is not empty and activeTuples.top.endTime <= tuple.startTime
        // the first tuple from the active set has to be removed
        if newInsertAfterLastReport
            report activeTuples
            newInsertAfterLastReport = false
        activeTuples.pop()
    end while
    activeTuples.insert(tuple)
    newInsertAfterLastReport = true
next
if activeTuples has more than 1 entry
    report activeTuples
With your example data set you get:
data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
tuple activeTuples newInsertAfterLastReport
---------------------------------------------------------------------
(0, 1, 2) [] false
[(0, 1, 2)] true
(1, 1.1, 3)
[(0, 1, 2), (1, 1.1, 3)]
(2, 1.5, 2.5)
[(0, 1, 2), (2, 1.5, 2.5), (1, 1.1, 3)]
(3, 2.5, 4) -> report (0, 1, 2)
[(2, 1.5, 2.5), (1, 1.1, 3)] false
[(1, 1.1, 3)]
[(1, 1.1, 3), (3, 2.5, 4)] true
(4, 4, 5) -> report (1, 3) false
[(3, 2.5, 4)]
[]
[(4, 4, 5)]
Actually, I would remove the if activeTuples has more than 1 entry part and always report at the end. This would result in an additional report of (4) because it is not included in any of the previous reports (whereas (0) ... (3) are).
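The pseudo-code above can be rendered directly in Python with heapq standing in for the heap (always reporting at the end, as suggested; the function name overlapping_groups is an assumption):

```python
import heapq

def overlapping_groups(tuples):
    """tuples: (uid, start, end). Report each maximal set of uids active together."""
    events = sorted(tuples, key=lambda t: t[1])   # sort by start time
    active = []                                   # heap of (end_time, uid)
    reports = []
    new_since_report = False
    for uid, start, end in events:
        while active and active[0][0] <= start:   # earliest-ending tuple is over
            if new_since_report:
                reports.append(tuple(sorted(u for _, u in active)))
                new_since_report = False
            heapq.heappop(active)
        heapq.heappush(active, (end, uid))
        new_since_report = True
    if new_since_report and active:               # final report
        reports.append(tuple(sorted(u for _, u in active)))
    return reports

data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
print(overlapping_groups(data))  # [(0, 1, 2), (1, 3), (4,)]
```

As described above, this reports only the maximal groups (plus the trailing singleton (4,)); the one-element subsets from the question's expected output are contained in the earlier reports.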
I think this can be done in O(n lg n + n o) time where o is the maximum size of your output (o could be n in the worst case).
Build a 3-tuple for each start_time or end_time as follows: the first component is the start_time or end_time of an input tuple, the second component is the id of the input tuple, the third component is whether it's start_time or end_time. Now you have 2n 3-tuples. Sort them in ascending order of the first component.
Now start scanning the list of 3-tuples from the smallest to the largest. Each time a range starts, add its id to a balanced binary search tree (in O(lg o) time), and output the contents of the tree (in O(o)), and each time a range ends, remove its id from the tree (in O(lg o) time).
You also need to take care of the corner cases, e.g., how to deal with equal start and end times either of the same range or of different ranges.
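A sketch of the 3-tuple sweep described above, with a plain set standing in for the balanced BST (the function name all_active_sets is an assumption); one corner case is handled by giving end events a smaller tag than start events, so a range ending at time t is removed before a range starting at t is added.

```python
def all_active_sets(tuples):
    """tuples: (uid, start, end). Output the active set after each insertion."""
    events = []
    for uid, start, end in tuples:
        events.append((start, 1, uid))  # 1 = range starts
        events.append((end, 0, uid))    # 0 = range ends (sorts before a start at the same time)
    events.sort()
    active = set()
    out = []
    for _, kind, uid in events:
        if kind == 1:
            active.add(uid)
            out.append(tuple(sorted(active)))  # contents after each insertion
        else:
            active.discard(uid)
    return out

data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
print(all_active_sets(data))
# [(0,), (0, 1), (0, 1, 2), (1, 3), (4,)]
```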
