Quicksort a list of tuples with Python - algorithm

I am trying to implement the quicksort algorithm myself to sort the elements of a list of tuples. That is, if I have a list of this type [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4)], I want to sort it by the second element of each tuple, in descending order, and obtain [(6,4), (3,3), (4,2), (0,1), (1,1), (2,1), (5,1)]. I have tried using the following algorithm:
def partition(array, begin, end, cmp):
    pivot=array[end][1]
    ii=begin
    for jj in range(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii+=1
    array[ii], array[end] = pivot, array[ii]
    return ii

def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    if end is None: end = len(array)
    if begin < end:
        i = partition(array, begin, end-1, cmp)
        sort(array, cmp, i+1, end)
        sort(array, cmp, begin, i)
The problem is that the result is [4, (3, 3), (4, 2), 1, 1, 1, (5, 1)]. What do I have to change to get the correct result?

Complex sorting patterns in Python are painless. Python's built-in sort is a stable, highly optimized algorithm that performs very well on real-world data, so no algorithm design is needed.
>>> from operator import itemgetter
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> l.sort(key=itemgetter(1), reverse=True)
>>> l
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
Above, itemgetter(1) returns a function that returns the second element of its argument. Thus the key argument to sort is a function that returns the value on which each element of the list is sorted.
Python's sort is stable (also with reverse=True), so the ordering of elements with equal keys (in this case, the second item of each tuple) is determined by their original order.

Unfortunately the answer from @wkschwartz only gives the expected output because of the particular starting order of the tuples: the stable sort keeps tied elements in their original order. If the tuple (5, 1) is moved to the beginning of the list, it gives a different answer.
The first method below gives the same result for any initial ordering of the items in the list, because it breaks ties on the first element of each tuple.
Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> sorted(l, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> from operator import itemgetter
>>> sorted(l, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> # but note:
>>> l2 = [(5,1), (1,1), (2,1), (3,3), (4,2), (0,1), (6,4 )]
>>> # Swapped first and sixth elements
>>> sorted(l2, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 1), (0, 1)]
>>> sorted(l2, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>>
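To answer the original question directly: the main bug in the posted partition is its final swap, array[ii], array[end] = pivot, array[ii]. It writes the pivot's key (an int) into the list instead of the pivot tuple, which is where the bare 4 and 1 values in the posted result come from (the garbled "enter code he" before def sort is just a copy-paste artifact). Below is a minimal corrected sketch, written for Python 3 (so range instead of xrange). Like any plain quicksort it is not stable, so the order among equal keys is not guaranteed in general; on this particular input it happens to match the expected output.

```python
def partition(array, begin, end, cmp):
    """Lomuto partition on the second tuple element; array[end] is the pivot."""
    pivot = array[end][1]
    ii = begin
    for jj in range(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii += 1
    # Fix: move the whole pivot tuple (not just its key) into its final slot.
    array[ii], array[end] = array[end], array[ii]
    return ii


def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    """In-place quicksort of array[begin:end]; descending by default."""
    if end is None:
        end = len(array)
    if begin < end:
        i = partition(array, begin, end - 1, cmp)
        sort(array, cmp, i + 1, end)  # part after the pivot
        sort(array, cmp, begin, i)    # part before the pivot


data = [(0, 1), (1, 1), (2, 1), (3, 3), (4, 2), (5, 1), (6, 4)]
sort(data)
print(data)  # [(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
```

If a deterministic order among ties is needed, it is simpler to use sorted with a tuple key as in the answers above than to extend the hand-written comparison.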

Related

Cartesian product but remove duplicates up to cyclic permutations

Given two integers n and r, I want to generate all possible combinations with the following rules:
There are n distinct numbers to choose from, 1, 2, ..., n;
Each combination should have r elements;
A combination may contain the same element more than once, for instance (1,2,2) is valid;
Order matters, i.e. (1,2,3) and (1,3,2) are considered distinct;
However, two combinations are considered equivalent if one is a cyclic permutation of the other; for instance, (1,2,3) and (2,3,1) are considered duplicates.
Examples:
n=3, r=3
11 distinct combinations
(1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,2,3), (1,3,2), (1,3,3), (2,2,2), (2,2,3), (2,3,3) and (3,3,3)
n=2, r=4
6 distinct combinations
(1,1,1,1), (1,1,1,2), (1,1,2,2), (1,2,1,2), (1,2,2,2), (2,2,2,2)
What is the algorithm for it? And how to implement it in C++?
Thank you in advance for advice.
Here is a naive solution in Python:
Generate all combinations from the Cartesian product of {1, 2, ..., n} with itself r times;
Only keep one representative combination for each equivalence class; drop all other combinations that are equivalent to this representative combination.
This means we must have some way to compare combinations; for instance, we can keep only the smallest combination (in lexicographic order) of every equivalence class.
from itertools import product

def is_representative(comb):
    # comb is the smallest member of its class iff no rotation of it is smaller
    return all(comb[i:] + comb[:i] >= comb
               for i in range(1, len(comb)))

def cartesian_product_up_to_cyclic_permutations(n, r):
    # values are 0..n-1 rather than 1..n; add 1 to each element if needed
    return filter(is_representative,
                  product(range(n), repeat=r))

print(list(cartesian_product_up_to_cyclic_permutations(3, 3)))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
print(list(cartesian_product_up_to_cyclic_permutations(2, 4)))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
You mentioned that you wanted to implement the algorithm in C++. The product function in the Python code behaves just like r nested for-loops that generate all the combinations in the Cartesian product. See this related question to implement the Cartesian product in C++: Is it possible to execute n number of nested "loops(any)" where n is given?.
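For a C++ port it can help to see the nested loops written out without itertools. The sketch below (in Python, to match the rest of the thread) uses an odometer-style counter, a standard technique not taken from the answer, whose array-plus-while-loop shape translates fairly directly to C++. cartesian_power is a hypothetical name; values run over 0..n-1 as in the answer.

```python
from itertools import product


def cartesian_power(n, r):
    """Yield every r-tuple over 0..n-1 in the same order as
    itertools.product(range(n), repeat=r), using an odometer counter
    instead of r nested loops."""
    if r > 0 and n < 1:
        return  # no values to choose from, so the product is empty
    digits = [0] * r
    while True:
        yield tuple(digits)
        # Advance the rightmost digit, carrying into the digit to its left
        # whenever a digit wraps around from n - 1 back to 0.
        pos = r - 1
        while pos >= 0 and digits[pos] == n - 1:
            digits[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every digit wrapped: all n**r tuples have been yielded
        digits[pos] += 1


def is_representative(comb):
    # Same test as in the answer: keep comb only if no rotation is smaller.
    return all(comb[i:] + comb[:i] >= comb for i in range(1, len(comb)))


# Sanity check against itertools.product.
assert list(cartesian_power(3, 3)) == list(product(range(3), repeat=3))

reps = [c for c in cartesian_power(3, 3) if is_representative(c)]
print(len(reps))  # 11
```

The same filter applies unchanged; only the generation of the tuples differs, so in C++ the digits vector and the carry loop would replace the call to product.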

Sort by key then value which will then be grouped up...pyspark

So I'm trying to sort data in this format...
[((0, 4), 3), ((4, 0), 3), ((1, 6), 1), ((3, 2), 3), ((0, 5), 1)...
Ascending by key and then descending by value. I'm able to achieve this via...
test = test.sortBy(lambda x: (x[0], -x[1]))
which, based on the shortened version above, gives me...
[((0, 4), 3), ((0, 5), 1), ((1, 6), 1), ((3, 2), 3), ((4, 0), 3)...
The problem I'm having is that after the sorting I no longer want the value but do need to retain the sort after grouping the data. So...
test = test.map(lambda x: (x[0][0],x[0][1]))
Gives me...
[(0, 4), (0, 5), (1, 6), (3, 2), (4, 0)...
Which is still in the order I need it but I need the elements to be grouped up by key. I then use this command...
test = test.groupByKey().map(lambda x: (x[0], list(x[1])))
But in the process I lose the sorting. Is there any way to retain it?
I managed to retain the order by changing the format of the tuple...
test = test.map(lambda x: (x[0][0], (x[0][1], x[1])))
test = test.groupByKey().map(lambda x: (x[0], sorted(list(x[1]), key=lambda v: (v[0], -v[1]))))
[(0, [(4, 3), (5, 1)] ...
which leaves me with the value (2nd element in the tuple) that I want to get rid of, but I took care of that too...
test = test.map(lambda x: (x[0], [e[0] for e in x[1]]))
Feels a bit hacky but not sure how else it could be done.
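To check the intended per-group result without a cluster, here is a plain-Python sketch of the same transformation. group_sorted is a hypothetical helper, it assumes the data fits in memory, and it is not PySpark code. It also sorts the group keys themselves, which Spark's groupByKey does not promise to preserve; in PySpark, calling sortByKey() on the grouped RDD is one way to get that key order.

```python
from collections import defaultdict


def group_sorted(records):
    """Group ((k1, k2), v) records by k1; within each group order by k2
    ascending then v descending, and keep only k2."""
    groups = defaultdict(list)
    for (k1, k2), v in records:
        groups[k1].append((k2, v))
    return [
        (k1, [k2 for k2, v in sorted(pairs, key=lambda t: (t[0], -t[1]))])
        for k1, pairs in sorted(groups.items())  # sort the group keys too
    ]


data = [((0, 4), 3), ((4, 0), 3), ((1, 6), 1), ((3, 2), 3), ((0, 5), 1)]
print(group_sorted(data))  # [(0, [4, 5]), (1, [6]), (3, [2]), (4, [0])]
```

The key idea matches the answer: move the sort-relevant fields into the grouped values so the order can be re-established after grouping, then drop the fields that are no longer needed.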

Alternating product of two lists

If I use itertools.product with two lists, the nested for loop equivalent always cycles the second list first:
>>> from itertools import product
>>> list(product([1,2,3], [4,5,6]))
[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
But for some use cases I might want these orders to be alternating, much like popping the first items off each list, without actually popping them. The hypothetical function would first give [1, 4], then [2, 4] (1 popped), and then [2, 5] (4 popped), and then [3, 5], and finally [3, 6].
>>> list(hypothetical([1,2,3], [4,5,6]))
[(1, 4), (2, 4), (2, 5), (3, 5), (3, 6)]
The only method I can think of is yielding in a for loop with a "which list to pop from next" flag.
Is there a built-in or library method that does this? Will I have to write my own?
import itertools
L1 = [1, 2, 3]
L2 = [4, 5, 6]
print(list(zip(itertools.islice((e for e in L1 for x in (1, 2)), 1, None), (e for e in L2 for x in (1, 2)))))
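For comparison, the "which list to pop from next" flag approach described in the question can be sketched as a small generator. alternating_product is a hypothetical name; it advances the first list first, and it stops as soon as the list whose turn it is has run out, which for equal-length lists is the natural end.

```python
def alternating_product(a, b):
    """Yield pairs (a[i], b[j]), advancing i and j alternately, starting with i."""
    if not a or not b:
        return
    i = j = 0
    yield (a[i], b[j])
    advance_a = True  # the flag: which list to "pop" from next
    while True:
        if advance_a and i + 1 < len(a):
            i += 1
        elif not advance_a and j + 1 < len(b):
            j += 1
        else:
            return  # the list whose turn it is has no more items
        yield (a[i], b[j])
        advance_a = not advance_a


print(list(alternating_product([1, 2, 3], [4, 5, 6])))
# [(1, 4), (2, 4), (2, 5), (3, 5), (3, 6)]
```

Unlike the zip/islice one-liner it works on indices, so it never copies or duplicates elements.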

Obtaining all groups of n pairs of numbers that pass condition

I have a list with certain combinations between two numbers:
[1 2] [1 4] [1 6] [3 4] [5 6] [3 6] [2 3] [4 5] [2 5]
Now I want to make groups of 3 combinations, where each group contains all six digits once, e.g.:
[1 2] [3 6] [4 5] is valid
[1 4] [2 3] [5 6] is valid
[1 2] [2 3] [5 6] is invalid
Order is not important.
How can I arrive at a list of all possible groups without employing a brute-force algorithm?
The language it is implemented in doesn't matter. A description of an algorithm that could achieve this is enough.
One thing to notice is that there are only finitely many possible pairs of elements you can pick from the set {1,2,3,4,5,6}. Specifically, there are (6P2) = 30 of them if you consider order relevant and (6 choose 2) = 15 if you don't. Even the simple "try all triples" algorithm, which runs in cubic time, will in this case have to look at no more than (30 choose 3) = 4,060 triples, which is a pretty small number. I doubt that you'd have any problems in practice just doing this.
Here's a recursive function in Python that picks a pair of numbers from a list, and then calls itself with the remaining list:
def pairs(l, picked, ok_pairs):
    n = len(l)
    for a in range(n-1):
        for b in range(a+1,n):
            pair = (l[a],l[b])
            if pair not in ok_pairs:
                continue
            # only extend in increasing order of first element,
            # so each group is printed exactly once
            if picked and picked[-1][0] > pair[0]:
                continue
            p = picked+[pair]
            if len(l) > 2:
                pairs([m for i,m in enumerate(l) if i not in [a, b]], p, ok_pairs)
            else:
                print(p)

ok_pairs = set([(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)])
pairs([1,2,3,4,5,6], [], ok_pairs)
The output (of 6 triplets) is:
[(1, 2), (3, 4), (5, 6)]
[(1, 2), (3, 6), (4, 5)]
[(1, 4), (2, 3), (5, 6)]
[(1, 4), (2, 5), (3, 6)]
[(1, 6), (2, 3), (4, 5)]
[(1, 6), (2, 5), (3, 4)]
Here's a version using Python set arithmetic:
pairs = [(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)]
n = len(pairs)
for i in range(n-2):
    set1 = set(pairs[i])
    for j in range(i+1,n-1):
        set2 = set(pairs[j])
        if set1 & set2:  # pairs i and j share a number
            continue
        for k in range(j+1,n):
            set3 = set(pairs[k])
            if set1 & set3 or set2 & set3:
                continue
            print(pairs[i], pairs[j], pairs[k])
The output is:
(1, 2) (3, 4) (5, 6)
(1, 2) (3, 6) (4, 5)
(1, 4) (5, 6) (2, 3)
(1, 4) (3, 6) (2, 5)
(1, 6) (3, 4) (2, 5)
(1, 6) (2, 3) (4, 5)
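A variant of the recursive approach that needs no ordering check: always take the smallest number not yet used and try each allowed partner for it. Since that number has to be in some pair, every grouping is generated exactly once, and branches die as soon as the smallest number has no allowed partner left. This is the usual backtracking way to enumerate perfect matchings. perfect_matchings is a hypothetical name, and the sketch assumes the input list is sorted ascending and ok_pairs holds (smaller, larger) tuples.

```python
def perfect_matchings(numbers, ok_pairs):
    """Yield every way to split `numbers` into pairs taken from ok_pairs."""
    if not numbers:
        yield []
        return
    first, rest = numbers[0], numbers[1:]
    for idx, partner in enumerate(rest):
        if (first, partner) in ok_pairs:
            remaining = rest[:idx] + rest[idx + 1:]
            # Recurse on what is left; prepend the pair just chosen.
            for tail in perfect_matchings(remaining, ok_pairs):
                yield [(first, partner)] + tail


ok_pairs = {(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)}
for group in perfect_matchings([1, 2, 3, 4, 5, 6], ok_pairs):
    print(group)
```

On this input it yields the same six groups, in the same order, as the recursive answer above.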

Mapping sort indexes

I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps an index in the sorted x values with the index in the sorted y values corresponding to the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming a zero based indexing scheme, the list that maps the x list index to the index of its paired y list index would be (2, 3, 0, 4, 5, 1).
For example the value 7 appears as index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?
Here's a simple O(n log n) method:
Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
Create the sorted x list simply by removing the 2nd element from the list produced in step 1.
Create the sorted y list by doing the same with the list produced in step 3.
I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the current index of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get the original_index, the second element of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
# generate unsorted lists
xs, ys = zip(*points)
# pair each element with its index, then sort
xs = sorted(zip(xs, range(len(xs))))
ys = sorted(zip(ys, range(len(ys))))
# generate the y positions list
y_positions = [None] * len(ys)
for i in range(len(ys)):
    original_index = ys[i][1]
    y_positions[original_index] = i
# generate `index_list`
index_list = []
for x, original_index in xs:
    index_list.append(y_positions[original_index])
# remove the index values from the x and y lists
xs = list(zip(*xs))[0]
ys = list(zip(*ys))[0]
print("xs:", xs)
print("ys:", ys)
print("index list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.
Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem is more complicated than it appears at first glance, and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))
# Generate the mapping list.
t = N * [0]
for i in range(N):
t[r[i]] = i
index_list = [t[j] for j in s]
print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]
I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)
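Whichever of these approaches is used, the result can be sanity-checked: pairing xs[i] with ys[index_list[i]] for every i should rebuild exactly the original points (as a multiset), and index_list should be a permutation of 0..N-1. A sketch of that check; mapping_is_valid is a hypothetical helper, not taken from any of the answers above.

```python
from collections import Counter


def mapping_is_valid(points, xs, ys, index_list):
    """Check that index_list is a permutation and that re-pairing
    xs[i] with ys[index_list[i]] reproduces the original points."""
    n = len(points)
    if sorted(index_list) != list(range(n)):
        return False
    rebuilt = [(xs[i], ys[index_list[i]]) for i in range(n)]
    return Counter(rebuilt) == Counter(points)


points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
xs = (3, 4, 5, 7, 9, 15)
ys = (0, 4, 7, 7, 11, 12)
print(mapping_is_valid(points, xs, ys, [2, 3, 0, 4, 5, 1]))  # True
print(mapping_is_valid(points, xs, ys, [2, 3, 0, 5, 4, 1]))  # False
```

Because the two 7s in ys are interchangeable, [3, 2, 0, 4, 5, 1] also passes this check; the answers above return [2, 3, 0, 4, 5, 1] because ties in y are broken by index.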
