Mapping sort indexes - algorithm

I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps each index in the sorted x values to the index in the sorted y values of the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming zero-based indexing, the list that maps each index in the x list to the index of its paired value in the y list would be (2, 3, 0, 4, 5, 1).
For example, the value 7 appears at index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?

Here's a simple O(n log n) method:
1. Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
2. Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
3. Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
4. Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
5. Create the sorted x list simply by removing the 2nd element from each pair in the list produced in step 1.
6. Create the sorted y list by doing the same with the list produced in step 3.

I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the current index of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get the original_index, the second element of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
#generate unsorted lists
xs, ys = zip(*points)
#pair each element with its index
xs = zip(xs, range(len(xs)))
ys = zip(ys, range(len(xs)))
#sort
xs.sort()
ys.sort()
#generate the y positions list.
y_positions = [None] * len(ys)
for i in range(len(ys)):
    original_index = ys[i][1]
    y_positions[original_index] = i
#generate `index_list`
index_list = []
for x, original_index in xs:
    index_list.append(y_positions[original_index])
#remove tuples from x and y lists
xs = zip(*xs)[0]
ys = zip(*ys)[0]
print "xs:", xs
print "ys:", ys
print "index list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.

Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem seems more complicated than it appears at first glance and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))
# Generate the mapping list.
t = N * [0]
for i in range(N):
    t[r[i]] = i
index_list = [t[j] for j in s]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]

I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
    index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list
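As a sanity check on any of the versions above, the mapping can be verified by confirming that pairing xs[i] with ys[index_list[i]] reproduces the original points. The sketch below wraps the last version in a function and uses Python 3 syntax; the name sort_index_mapping is only illustrative and does not appear in the original answers.

```python
def sort_index_mapping(points):
    """Return (xs, ys, index_list) for a sequence of (x, y) pairs.

    index_list[i] is the position in ys of the y value that was
    originally paired with xs[i].
    """
    ordered_by_x = sorted(points)
    # Tag each y with its position in the x-sorted order, then sort by y.
    ordered_by_y = sorted((y, k) for k, (_, y) in enumerate(ordered_by_x))
    index_list = [0] * len(ordered_by_x)
    for i, (_, k) in enumerate(ordered_by_y):
        index_list[k] = i
    xs = [x for x, _ in ordered_by_x]
    ys = [y for y, _ in ordered_by_y]
    return xs, ys, index_list


points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
xs, ys, index_list = sort_index_mapping(points)

# Following the mapping from every x must give back the original pairs.
rebuilt = [(x, ys[j]) for x, j in zip(xs, index_list)]
assert sorted(rebuilt) == sorted(points)
```

The check sorts both sides because the rebuilt pairs come out in x order rather than in the original input order.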

Related

Cartesian product but remove duplicates up to cyclic permutations

Given two integers n and r, I want to generate all possible combinations with the following rules:
There are n distinct numbers to choose from, 1, 2, ..., n;
Each combination should have r elements;
A combination may contain more than one of an element, for instance (1,2,2) is valid;
Order matters, i.e. (1,2,3) and (1,3,2) are considered distinct;
However, two combinations are considered equivalent if one is a cyclic permutation of the other; for instance, (1,2,3) and (2,3,1) are considered duplicates.
Examples:
n=3, r=3
11 distinct combinations
(1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,2,3), (1,3,2), (1,3,3), (2,2,2), (2,2,3), (2,3,3) and (3,3,3)
n=2, r=4
6 distinct combinations
(1,1,1,1), (1,1,1,2), (1,1,2,2), (1,2,1,2), (1,2,2,2), (2,2,2,2)
What is an algorithm for this, and how could it be implemented in C++?
Thank you in advance for any advice.
Here is a naive solution in Python:
Generate all combinations from the Cartesian product of {1, 2, ...,n} with itself r times;
Only keep one representative combination for each equivalency class; drop all other combinations that are equivalent to this representative combination.
This means we must have some way to compare combinations, and for instance, only keep the smallest combination of every equivalency class.
from itertools import product
def is_representative(comb):
    return all(comb[i:] + comb[:i] >= comb
               for i in range(1, len(comb)))
def cartesian_product_up_to_cyclic_permutations(n, r):
    return filter(is_representative,
                  product(range(n), repeat=r))
print(list(cartesian_product_up_to_cyclic_permutations(3, 3)))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
print(list(cartesian_product_up_to_cyclic_permutations(2, 4)))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
You mentioned that you wanted to implement the algorithm in C++. The product function in the Python code behaves just like a big for-loop that generates all the combinations in the Cartesian product. For implementing the Cartesian product in C++, see this related question: Is it possible to execute n number of nested "loops(any)" where n is given?.
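To make the port to another language easier to picture, here is a sketch of the same filter without itertools. The Cartesian product is generated by an explicit "odometer" counter, which should translate fairly directly into a loop over an array of digits in C++. The names odometer_product and necklaces are hypothetical, and the sketch assumes n >= 1.

```python
def odometer_product(n, r):
    """Yield every r-tuple over range(n) in lexicographic order (assumes n >= 1)."""
    digits = [0] * r
    while True:
        yield tuple(digits)
        # Increment the rightmost digit, carrying to the left like an odometer.
        pos = r - 1
        while pos >= 0 and digits[pos] == n - 1:
            digits[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every digit wrapped around: all tuples were produced
        digits[pos] += 1


def is_representative(comb):
    """True if comb is the smallest of all its cyclic rotations."""
    return all(comb[i:] + comb[:i] >= comb for i in range(1, len(comb)))


def necklaces(n, r):
    """One representative per class of tuples equal up to cyclic rotation."""
    return [c for c in odometer_product(n, r) if is_representative(c)]


print(len(necklaces(3, 3)))
print(necklaces(2, 4))
```

This is still the naive approach: it visits all n**r tuples and only discards the non-representative ones, so it is a readability aid rather than a faster algorithm.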

Obtaining all groups of n pairs of numbers that pass condition

I have a list with certain combinations between two numbers:
[1 2] [1 4] [1 6] [3 4] [5 6] [3 6] [2 3] [4 5] [2 5]
Now I want to make groups of 3 combinations, where each group contains all six digits once, e.g.:
[1 2] [3 6] [4 5] is valid
[1 4] [2 3] [5 6] is valid
[1 2] [2 3] [5 6] is invalid
Order is not important.
How can I arrive at a list of all possible groups without resorting to a brute-force algorithm?
The language it is implemented in doesn't matter. Description of an algorithm that could achieve this is enough.
One thing to notice is that there are only finitely many possible pairs of elements you can pick from the set {1,2,3,4,5,6}. Specifically, there are (6P2) = 30 of them if you consider order relevant and (6 choose 2) = 15 if you don't. Even the simple "try all triples" algorithm that runs in cubic time in this case will only have to look at at most (30 choose 3) = 4,060 triples, which is a pretty small number. I doubt that you'd have any problems in practice just doing this.
Here's a recursive function in Python that picks a pair of numbers from a list, and then calls itself with the remaining list:
def pairs(l, picked, ok_pairs):
    n = len(l)
    for a in range(n-1):
        for b in range(a+1,n):
            pair = (l[a],l[b])
            if pair not in ok_pairs:
                continue
            if picked and picked[-1][0] > pair[0]:
                continue
            p = picked+[pair]
            if len(l) > 2:
                pairs([m for i,m in enumerate(l) if i not in [a, b]], p, ok_pairs)
            else:
                print p
ok_pairs = set([(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)])
pairs([1,2,3,4,5,6], [], ok_pairs)
The output (of 6 triplets) is:
[(1, 2), (3, 4), (5, 6)]
[(1, 2), (3, 6), (4, 5)]
[(1, 4), (2, 3), (5, 6)]
[(1, 4), (2, 5), (3, 6)]
[(1, 6), (2, 3), (4, 5)]
[(1, 6), (2, 5), (3, 4)]
Here's a version using Python set arithmetic:
pairs = [(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)]
n = len(pairs)
for i in range(n-2):
    set1 = set(pairs[i])
    for j in range(i+1,n-1):
        set2 = set(pairs[j])
        if set1 & set2:
            continue
        for k in range(j+1,n):
            set3 = set(pairs[k])
            if set1 & set3 or set2 & set3:
                continue
            print pairs[i], pairs[j], pairs[k]
The output is:
(1, 2) (3, 4) (5, 6)
(1, 2) (3, 6) (4, 5)
(1, 4) (5, 6) (2, 3)
(1, 4) (3, 6) (2, 5)
(1, 6) (3, 4) (2, 5)
(1, 6) (2, 3) (4, 5)
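The same "try all triples" idea can be written more compactly with itertools.combinations. This is only a sketch in Python 3 syntax; the function name disjoint_triples is made up for illustration, and it assumes each pair holds two different numbers, so that three pairs cover six distinct numbers exactly when they do not overlap.

```python
from itertools import combinations


def disjoint_triples(pairs):
    """Return every group of three pairs that together use six distinct numbers."""
    groups = []
    for triple in combinations(pairs, 3):
        used = set()
        for pair in triple:
            used.update(pair)
        # Three pairs of two numbers are pairwise disjoint exactly when
        # their union contains six different numbers.
        if len(used) == 6:
            groups.append(triple)
    return groups


pairs = [(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)]
for group in disjoint_triples(pairs):
    print(*group)
```

With nine input pairs this examines (9 choose 3) = 84 triples, so the brute-force cost is negligible at this size.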

Finding all unique combinations of overlapping items?

If I have data that's in the form of a list of tuples:
[(uid, start_time, end_time)]
I'd like to find all unique combinations of uids that overlap in time. E.g., if I had a list like the following:
[(0, 1, 2),
(1, 1.1, 3),
(2, 1.5, 2.5),
(3, 2.5, 4),
(4, 4, 5)]
I'd like to get as output:
[(0,1,2), (1,3), (0,), (1,), (2,), (3,), (4,)]
Is there a faster algorithm for this than the naive brute force?
First, sort your tuples by start time. Keep a heap of active tuples, which has the one with the earliest end time on top.
Then, you move through your sorted list and add tuples to the active set. Doing so, you also check if you need to remove tuples. If so, you can report an interval. In order to avoid duplicate reports, report new intervals only if there has been a new tuple added to the active set since the last report.
Here is some pseudo-code that visualizes the idea:
sort(tuples)
activeTuples := new Heap
bool newInsertAfterLastReport = false
for each tuple in tuples
    while activeTuples is not empty and activeTuples.top.endTime <= tuple.startTime
        //the first tuple from the active set has to be removed
        if newInsertAfterLastReport
            report activeTuples
            newInsertAfterLastReport = false
        activeTuples.pop()
    end while
    activeTuples.insert(tuple)
    newInsertAfterLastReport = true
next
if activeTuples has more than 1 entry
    report activeTuples
With your example data set you get:
data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
tuple            activeTuples                               newInsertAfterLastReport
------------------------------------------------------------------------------------
(0, 1, 2)        []                                         false
                 [(0, 1, 2)]                                true
(1, 1.1, 3)
                 [(0, 1, 2), (1, 1.1, 3)]
(2, 1.5, 2.5)
                 [(0, 1, 2), (2, 1.5, 2.5), (1, 1.1, 3)]
(3, 2.5, 4)      -> report (0, 1, 2)
                 [(2, 1.5, 2.5), (1, 1.1, 3)]               false
                 [(1, 1.1, 3)]
                 [(1, 1.1, 3), (3, 2.5, 4)]                 true
(4, 4, 5)        -> report (1, 3)                           false
                 [(3, 2.5, 4)]
                 []
                 [(4, 4, 5)]
Actually, I would remove the "if activeTuples has more than 1 entry" check and always report at the end. This would result in an additional report of (4), because it is not included in any of the previous reports (whereas (0) ... (3) are).
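The pseudo-code above might translate to Python roughly as follows. This is a sketch rather than a tested library routine: it uses heapq as the heap, keyed on end time, keeps the "more than 1 entry" check at the end exactly as in the pseudo-code, and returns the reported groups as sorted tuples of uids. The function name overlapping_groups is an assumption made for this example.

```python
import heapq


def overlapping_groups(intervals):
    """intervals: iterable of (uid, start_time, end_time) tuples."""
    reports = []
    active = []  # min-heap of (end_time, uid); earliest end time on top
    new_insert_after_last_report = False

    for uid, start, end in sorted(intervals, key=lambda t: t[1]):
        # Remove every active tuple that ended before this one starts.
        while active and active[0][0] <= start:
            if new_insert_after_last_report:
                reports.append(tuple(sorted(u for _, u in active)))
                new_insert_after_last_report = False
            heapq.heappop(active)
        heapq.heappush(active, (end, uid))
        new_insert_after_last_report = True

    # Final check from the pseudo-code: only report a real overlap.
    if len(active) > 1:
        reports.append(tuple(sorted(u for _, u in active)))
    return reports


data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
print(overlapping_groups(data))  # [(0, 1, 2), (1, 3)]
```

Replacing the final condition with `if active:` gives the "always report at the end" variant, which additionally reports (4,) for this data.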
I think this can be done in O(n lg n + n o) time where o is the maximum size of your output (o could be n in the worst case).
Build a 3-tuple for each start_time or end_time as follows: the first component is the start_time or end_time of an input tuple, the second component is the id of the input tuple, the third component is whether it's start_time or end_time. Now you have 2n 3-tuples. Sort them in ascending order of the first component.
Now start scanning the list of 3-tuples from the smallest to the largest. Each time a range starts, add its id to a balanced binary search tree (in O(lg o) time), and output the contents of the tree (in O(o)), and each time a range ends, remove its id from the tree (in O(lg o) time).
You also need to take care of the corner cases, e.g., how to deal with equal start and end times either of the same range or of different ranges.
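A rough Python sketch of this event scan is shown below. It makes two assumptions that are not in the answer: a plain set stands in for the balanced binary search tree (the snapshot is sorted when it is output), and at equal times an end event is processed before a start event, so ranges that merely touch are not treated as overlapping. It also assumes start_time < end_time for every range. The name active_sets_at_starts is hypothetical.

```python
END, START = 0, 1  # END sorts first, so touching ranges do not overlap


def active_sets_at_starts(intervals):
    """intervals: iterable of (uid, start_time, end_time) tuples.

    Returns the set of active uids, as a sorted tuple, each time a range starts.
    """
    events = []
    for uid, start, end in intervals:
        events.append((start, START, uid))
        events.append((end, END, uid))
    events.sort()  # by time, then END before START, then uid

    active = set()
    snapshots = []
    for _, kind, uid in events:
        if kind == START:
            active.add(uid)
            snapshots.append(tuple(sorted(active)))
        else:
            active.discard(uid)
    return snapshots


data = [(0, 1, 2), (1, 1.1, 3), (2, 1.5, 2.5), (3, 2.5, 4), (4, 4, 5)]
print(active_sets_at_starts(data))
# [(0,), (0, 1), (0, 1, 2), (1, 3), (4,)]
```

Note that this outputs the active set at every start, so it also lists intermediate groups such as (0, 1) that the heap-based approach does not report.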

quick sort list of tuple with python

I am trying to implement the quicksort algorithm to sort a list of tuples. For example, given a list such as [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )], I want to sort it by the second element of each tuple, in descending order, and obtain [(6,4), (3,3), (4,2), (0,1), (1,1), (2,1 ), (5,1)]. I have tried using the following algorithm:
def partition(array, begin, end, cmp):
    pivot=array[end][1]
    ii=begin
    for jj in xrange(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii+=1
    array[ii], array[end] = pivot, array[ii]
    return ii
def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    if end is None: end = len(array)
    if begin < end:
        i = partition(array, begin, end-1, cmp)
        sort(array, cmp, i+1, end)
        sort(array, cmp, begin, i)
The problem is that the result is this: [4, (3, 3), (4, 2), 1, 1, 1, (5, 1)]. What do I have to change to get the correct result?
Complex sorting patterns in Python are painless. Python's sorting algorithm is state of the art, one of the fastest available in real-world cases. No algorithm design needed.
>>> from operator import itemgetter
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> l.sort(key=itemgetter(1), reverse=True)
>>> l
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
Above, itemgetter returns a function that returns the second element of its argument. Thus the key argument to sort is a function that returns the item on which to sort the list.
Python's sort is stable, so the ordering of elements with equal keys (in this case, the second item of each tuple) is determined by the original order.
Unfortunately, the answer from @wkschwartz only produces that result because of the particular starting order of the items. If the tuple (5, 1) is moved to the beginning of the list, it gives a different answer.
The first method below gives the same result for any initial ordering of the items in the list.
Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> sorted(l, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> from operator import itemgetter
>>> sorted(l, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> # but note:
>>> l2 = [(5,1), (1,1), (2,1), (3,3), (4,2), (0,1), (6,4 )]
>>> # Swapped first and sixth elements
>>> sorted(l2, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 1), (0, 1)]
>>> sorted(l2, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>>
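Regarding the quicksort code in the question itself: the bare integers in the result appear to come from the last assignment in partition, which stores pivot (the key array[end][1]) back into the list instead of the whole tuple array[end]. A minimal corrected sketch in Python 3 could look like the following; it uses range instead of xrange, and the function is renamed quicksort here only to avoid confusion with list.sort.

```python
def partition(array, begin, end, cmp):
    """Partition array[begin..end] (end inclusive) around the key of array[end]."""
    pivot = array[end][1]
    ii = begin
    for jj in range(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii += 1
    # Swap whole tuples; the original stored the bare pivot key here.
    array[ii], array[end] = array[end], array[ii]
    return ii


def quicksort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    """Sort array[begin:end] in place by the second tuple element."""
    if end is None:
        end = len(array)
    if begin < end:
        i = partition(array, begin, end - 1, cmp)
        quicksort(array, cmp, i + 1, end)
        quicksort(array, cmp, begin, i)


l = [(0, 1), (1, 1), (2, 1), (3, 3), (4, 2), (5, 1), (6, 4)]
quicksort(l)
print([key for _, key in l])  # [4, 3, 2, 1, 1, 1, 1]
```

Quicksort is not a stable sort, so the relative order of the tuples whose second element is 1 is not guaranteed to match the desired output; the key-based sorted calls above are the simpler way to get a fully determined order.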

Counting number of days, given a collection of day ranges?

Say I have the following ranges, in some list:
{ (1, 4), (6, 8), (2, 5), (1, 3) }
(1, 4) represents days 1, 2, 3, 4. (6, 8) represents days 6, 7, 8, and so on.
The goal is to find the total number of days that are listed in the collection of ranges -- for instance, in the above example, the answer would be 8, because days 1, 2, 3, 4, 6, 7, 8, and 5 are contained within the ranges.
This problem can be solved trivially by iterating through the days in each range and putting them in a HashSet, then returning the size of the HashSet. But is there any way to do it in O(n) time with respect to the number of range pairs? How about in O(n) time and with constant space? Thanks.
Sort the ranges in ascending order by their lower limits. You can probably do this in linear time since you're dealing with integers.
The rest is easy. Loop through the ranges once keeping track of numDays (initialized to zero) and largestDay (initialized to -INF). On reaching each interval (a, b):
if b > largestDay then
    numDays <- numDays + b-max(a - 1, largestDay)
    largestDay <- max(largestDay, b)
else nothing.
So, after sorting we have (1,4), (1,3), (2,5), (6,8)
(1,4): numDays <- 0 + (4 - max(1 - 1, -INF)) = 4, largestDay <- max(-INF, 4) = 4
(1,3): b < largestDay, so no change.
(2,5): numDays <- 4 + (5 - max(2 - 1, 4)) = 5, largestDay <- 5
(6,8): numDays <- 5 + (8 - max(6-1, 5)) = 8, largestDay <- 8
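The loop above can be sketched in Python as follows. The function name count_days is illustrative, and the sketch simply uses the built-in comparison sort, so it runs in O(n log n) rather than the linear time that an integer-specific sort might achieve.

```python
def count_days(ranges):
    """Count the distinct days covered by inclusive integer ranges (a, b)."""
    num_days = 0
    largest_day = float("-inf")
    for a, b in sorted(ranges):  # ascending by lower limit
        if b > largest_day:
            # Only the days after largest_day (and not before a) are new.
            num_days += b - max(a - 1, largest_day)
            largest_day = b
    return num_days


print(count_days([(1, 4), (6, 8), (2, 5), (1, 3)]))  # 8
```

After sorting, the scan itself needs only the two variables num_days and largest_day, which is where the constant extra space comes from.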
The complexity of the following algorithm is O(n log n) where n is the number of ranges.
Sort the ranges (a, b) lexicographically by increasing a then by decreasing b.
Before: { (1, 4), (6, 8), (2, 5), (1, 3) }
After: { (1, 4), (1, 3), (2, 5), (6, 8) }
Collapse the sorted sequence of ranges into a potentially-shorter sequence of ranges, repeatedly merging consecutive (a, b) and (c, d) into (a, max(b, d)) if b >= c.
Before: { (1, 4), (1, 3), (2, 5), (6, 8) }
{ (1, 4), (2, 5), (6, 8) }
After: { (1, 5), (6, 8) }
Map the sequence of ranges to their sizes.
Before: { (1, 5), (6, 8) }
After: { 5, 3 }
Sum the sizes to arrive at the total number of days.
8
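A compact Python sketch of these steps, assuming inclusive integer ranges as in the question, is given below. The function name count_days_by_merging is made up for this example.

```python
def count_days_by_merging(ranges):
    """Sort, merge overlapping inclusive ranges, then sum the merged sizes."""
    merged = []
    # Sort by increasing start, then by decreasing end.
    for a, b in sorted(ranges, key=lambda r: (r[0], -r[1])):
        if merged and merged[-1][1] >= a:
            # Overlaps the previous merged range: extend it if needed.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    # An inclusive range (a, b) contains b - a + 1 days.
    return sum(b - a + 1 for a, b in merged)


print(count_days_by_merging([(1, 4), (6, 8), (2, 5), (1, 3)]))  # 8
```

For the example, the merged list is [[1, 5], [6, 8]], giving sizes 5 and 3.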
