I am organizing a tournament in which 12 players will meet each other over 10 board games.
I want each player to play at least once with each of the 11 other players over the 10 board games.
For example:
BoardGame1 - Match1 - Player1 + Player2 + Player3
BoardGame1 - Match2 - Player4 + Player5 + Player6
BoardGame1 - Match3 - Player7 + Player8 + Player9
BoardGame1 - Match4 - Player10 + Player11 + Player12
[...]
BoardGame10 - Match1 - Player1 + Player11 + Player9
BoardGame10 - Match2 - Player4 + Player2 + Player12
BoardGame10 - Match3 - Player7 + Player5 + Player3
BoardGame10 - Match4 - Player10 + Player8 + Player6
How do you create an algorithm where you distribute the players evenly?
I'd like to do it with a TDD approach, so I need to predict the expected result (meaning no random distribution).
If all players play each other exactly once, then the resulting object
would be a Kirkman Triple System. There is no KTS with the parameters
you want, but since each player has twenty opponent slots and only
eleven potential opponents, it should be easy to find a suitable
schedule.
The code below generates the (11 choose 2) × (8 choose 2) × (5 choose 2)
= 15400 possibilities for one game and repeatedly greedily chooses the
one that spreads the pairings in the fairest manner.
import collections
import itertools
import pprint

def partitions(players, k):
    players = set(players)
    assert len(players) % k == 0
    if len(players) == 0:
        yield []
    else:
        players = players.copy()
        leader = min(players)
        players.remove(leader)
        for comb in itertools.combinations(sorted(players), k - 1):
            group = {leader} | set(comb)
            for part in partitions(players - group, k):
                yield [group] + part

def update_pair_counts(pair_counts, game):
    for match in game:
        pair_counts.update(itertools.combinations(sorted(match), 2))

def evaluate_game(pair_counts, game):
    pair_counts = pair_counts.copy()
    update_pair_counts(pair_counts, game)
    objective = [0] * max(pair_counts.values())
    for count in pair_counts.values():
        objective[count - 1] += 1
    total = 0
    for i in range(len(objective) - 1, -1, -1):
        total += objective[i]
        objective[i] = total
    return objective

def schedule(n_groups, n_players_per_group, n_games):
    games = list(partitions(range(n_groups * n_players_per_group), n_players_per_group))
    pair_counts = collections.Counter()
    for i in range(n_games):
        game = max(games, key=lambda game: evaluate_game(pair_counts, game))
        yield game
        update_pair_counts(pair_counts, game)

def main():
    pair_counts = collections.Counter()
    for game in schedule(4, 3, 10):
        pprint.pprint(game)
        update_pair_counts(pair_counts, game)
    print()
    pprint.pprint(pair_counts)

if __name__ == "__main__":
    main()
Sample output:
[{0, 1, 2}, {3, 4, 5}, {8, 6, 7}, {9, 10, 11}]
[{0, 3, 6}, {1, 4, 9}, {2, 10, 7}, {8, 11, 5}]
[{0, 4, 7}, {3, 1, 11}, {8, 9, 2}, {10, 5, 6}]
[{0, 1, 5}, {2, 11, 6}, {9, 3, 7}, {8, 10, 4}]
[{0, 8, 3}, {1, 10, 6}, {2, 11, 4}, {9, 5, 7}]
[{0, 10, 11}, {8, 1, 7}, {2, 3, 5}, {9, 4, 6}]
[{0, 9, 2}, {1, 10, 3}, {8, 4, 5}, {11, 6, 7}]
[{0, 4, 6}, {1, 2, 5}, {10, 3, 7}, {8, 9, 11}]
[{0, 5, 7}, {1, 11, 4}, {8, 2, 10}, {9, 3, 6}]
[{0, 3, 11}, {8, 1, 6}, {2, 4, 7}, {9, 10, 5}]
Counter({(0, 3): 3,
(0, 1): 2,
(0, 2): 2,
(1, 2): 2,
(3, 5): 2,
(4, 5): 2,
(6, 7): 2,
(6, 8): 2,
(7, 8): 2,
(9, 10): 2,
(9, 11): 2,
(10, 11): 2,
(0, 6): 2,
(3, 6): 2,
(1, 4): 2,
(4, 9): 2,
(2, 7): 2,
(2, 10): 2,
(7, 10): 2,
(5, 8): 2,
(8, 11): 2,
(0, 4): 2,
(0, 7): 2,
(4, 7): 2,
(1, 3): 2,
(1, 11): 2,
(3, 11): 2,
(2, 8): 2,
(2, 9): 2,
(8, 9): 2,
(5, 10): 2,
(6, 10): 2,
(0, 5): 2,
(1, 5): 2,
(2, 11): 2,
(6, 11): 2,
(3, 7): 2,
(3, 9): 2,
(7, 9): 2,
(4, 8): 2,
(8, 10): 2,
(1, 6): 2,
(1, 10): 2,
(2, 4): 2,
(4, 11): 2,
(5, 7): 2,
(5, 9): 2,
(0, 11): 2,
(1, 8): 2,
(2, 5): 2,
(4, 6): 2,
(6, 9): 2,
(3, 10): 2,
(3, 4): 1,
(1, 9): 1,
(5, 11): 1,
(5, 6): 1,
(2, 6): 1,
(4, 10): 1,
(0, 8): 1,
(3, 8): 1,
(0, 10): 1,
(1, 7): 1,
(2, 3): 1,
(0, 9): 1,
(7, 11): 1})
How do you create an algorithm where you distribute the players evenly? I'd like to do it with a TDD approach, so I need to predict the expected result (meaning no random distribution).
TDD tends to be successful when your problem is that you know what the computer should do, and you know how to make the computer do that, but you don't know the best way to write the code.
When you don't know how to make the computer do what you want, TDD is a lot harder. There are two typical approaches taken here.
The first is to perform a "spike" - sit down and hack things until you understand how to make the computer do what you want. The key feature of spikes is that you don't get to keep the code changes at the end; instead, you discard your spiked code, keep what you have learned in your head, and start over by writing the tests that you need.
The second approach is to sort of sneak up on it - you do TDD for the very simple cases that you do understand, and keep adding tests that are just a little bit harder than what you have already done. See Robert Martin's Craftsman series for an example of this approach.
For this problem, you might begin by first thinking of an interface that you might use for accessing the algorithm. For instance, you might consider a design that accepts as input a number of players and a number of games, and returns you a sequence of tuples, where each tuple represents a single match.
Typically, this version of the interface will look like general purpose data structures as inputs (in this example: numbers), and general purpose data structures as outputs (the list of tuples).
Most commonly, we'll verify the behavior in each test by figuring out what the answer should be for a given set of inputs, and asserting that the actual data structure exactly matches the expected. For a list of tuples, that would look something like:
assert len(expected) == len(actual)
for x in range(len(actual)):
    assert len(expected[x]) == len(actual[x])
    for y in range(len(actual[x])):
        assert expected[x][y] == actual[x][y]
Although of course you could refactor that into something that looks nicer:
assert expected == actual
Another possibility is to think about the properties that a solution should have, and verify that the actual result is consistent with those properties. Here, you seem to have two properties that are required for every solution:
Each pair of players should appear at least once in the list of matches
Every (player, board game) pair should appear exactly once in the list of matches
In this case, the answer is easy enough to check (iterate through all of the matches, count each pair, assert that every count meets its requirement).
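A minimal sketch of such a property check follows. It assumes a schedule is represented as a list of games, each game a list of matches, and each match a tuple of player numbers starting at 0; that representation and the helper names `pair_counts` and `is_partition` are illustrative, not something the question specifies.

```python
import collections
import itertools


def pair_counts(games):
    """Count how often each unordered pair of players shares a match."""
    counts = collections.Counter()
    for game in games:
        for match in game:
            counts.update(itertools.combinations(sorted(match), 2))
    return counts


def is_partition(game, n_players):
    """True if every player sits in exactly one match of this game."""
    seated = sorted(player for match in game for player in match)
    return seated == list(range(n_players))


# Tiny hand-checkable example: 4 players, matches of 2, 3 games.
games = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]
counts = pair_counts(games)
all_pairs = list(itertools.combinations(range(4), 2))
assert all(is_partition(game, 4) for game in games)
assert all(counts[pair] >= 1 for pair in all_pairs)
```

The same two assertions can be pointed at the real 12-player, 10-game output. Tightening `>= 1` to `== 1` expresses a stricter "exactly once" requirement.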
The tests themselves we introduce by starting with the easiest example we can think of. Here, that might be the case where we have 2 players and 1 board game, and our answer should be
BoardGame1 - Match1 - Player1 + Player2
And so we write that test (RED), and hard code this specific answer (GREEN), and then (REFACTOR) the code so that it is clear to the reader why this is the correct answer for these inputs.
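As a sketch, that first RED/GREEN step might look like the following. The function name `make_schedule` and the `(board_game, match, players)` tuple layout are assumptions made for illustration; the hard-coded return value is deliberate at this stage.

```python
def make_schedule(n_players, n_games):
    # GREEN: the simplest thing that passes the only test we have so far.
    # With 2 players and 1 board game there is exactly one possible match.
    return [(1, 1, (1, 2))]


def test_two_players_one_game():
    # BoardGame1 - Match1 - Player1 + Player2
    assert make_schedule(2, 1) == [(1, 1, (1, 2))]


test_two_players_one_game()
```

The arguments are ignored on purpose: only a later test with different inputs should force the function to use them.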
And when you are happy with that code, you look for the next example - an example where the current implementation returns the wrong answer, but the change that you need to make to get it to return the right answer is small/easy.
Often, what will happen is that we "pass" the next test using a branch:
if special_case(inputs):
    return answer_for_special_case
else:
    # ... real implementation here ...
    return answer_for_general_case
And then refactor the code until the two blocks are the same, then finally remove the if clause.
It will sometimes happen that the new test is too big, and we can't figure out how to extend the algorithm to cover the new case. Usually the play is to revert any changes we've made (keeping the tests passing), and use what we have learned to find a different test that might be easier to introduce to the code.
And you keep iterating on this process until you have solved "all" of the problem.
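Here is a resolvable triple system due to Haim Hanani (“On resolvable
balanced incomplete block designs”, 1974), which provides a schedule for
11 games (drop one to get 10). Unfortunately it repeats matches: for
example, (0, 4, 8) appears in two different games. The 11 games contain
11 × 12 = 132 pair slots and no pair meets more than twice (the last
line of the output), so each of the 66 pairs meets exactly twice, and
every pair still meets at least once after one game is dropped.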
import collections
import itertools
from pprint import pprint

def Match(a, b, c):
    return tuple(sorted([a, b, c]))

games = []
for j in range(4):
    games.append(
        [
            Match(0 ^ j, 4 ^ j, 8 ^ j),
            Match(1 ^ j, 2 ^ j, 3 ^ j),
            Match(5 ^ j, 6 ^ j, 7 ^ j),
            Match(9 ^ j, 10 ^ j, 11 ^ j),
        ]
    )
games.append([Match(1 ^ j, 6 ^ j, 11 ^ j) for j in range(4)])
games.append([Match(2 ^ j, 7 ^ j, 9 ^ j) for j in range(4)])
games.append([Match(3 ^ j, 5 ^ j, 10 ^ j) for j in range(4)])
for j in range(4):
    games.append(
        [
            Match(0 ^ j, 4 ^ j, 8 ^ j),
            Match(1 ^ j, 6 ^ j, 11 ^ j),
            Match(2 ^ j, 7 ^ j, 9 ^ j),
            Match(3 ^ j, 5 ^ j, 10 ^ j),
        ]
    )
for game in games:
    game.sort()
pprint(len(games))
pprint(games)
pair_counts = collections.Counter()
for game in games:
    for triple in game:
        pair_counts.update(itertools.combinations(sorted(triple), 2))
pprint(max(pair_counts.values()))
Output:
11
[[(0, 4, 8), (1, 2, 3), (5, 6, 7), (9, 10, 11)],
[(0, 2, 3), (1, 5, 9), (4, 6, 7), (8, 10, 11)],
[(0, 1, 3), (2, 6, 10), (4, 5, 7), (8, 9, 11)],
[(0, 1, 2), (3, 7, 11), (4, 5, 6), (8, 9, 10)],
[(0, 7, 10), (1, 6, 11), (2, 5, 8), (3, 4, 9)],
[(0, 5, 11), (1, 4, 10), (2, 7, 9), (3, 6, 8)],
[(0, 6, 9), (1, 7, 8), (2, 4, 11), (3, 5, 10)],
[(0, 4, 8), (1, 6, 11), (2, 7, 9), (3, 5, 10)],
[(0, 7, 10), (1, 5, 9), (2, 4, 11), (3, 6, 8)],
[(0, 5, 11), (1, 7, 8), (2, 6, 10), (3, 4, 9)],
[(0, 6, 9), (1, 4, 10), (2, 5, 8), (3, 7, 11)]]
2
Combinatorial optimization is another possibility. This one doesn’t
scale super well but can handle 12 players/10 games.
import collections
import itertools
from pprint import pprint

def partitions(V):
    if not V:
        yield []
        return
    a = min(V)
    V.remove(a)
    for b, c in itertools.combinations(sorted(V), 2):
        for part in partitions(V - {b, c}):
            yield [(a, b, c)] + part

# Every way to split the 12 players into four matches of three.
parts = list(partitions(set(range(12))))

from ortools.sat.python import cp_model

model = cp_model.CpModel()
# One Boolean per candidate game; exactly 10 of them are selected.
vars = [model.NewBoolVar("") for part in parts]
model.Add(sum(vars) == 10)
# For each pair of players, collect the games in which they meet.
pairs = collections.defaultdict(list)
for part, var in zip(parts, vars):
    for (a, b, c) in part:
        pairs[(a, b)].append(var)
        pairs[(a, c)].append(var)
        pairs[(b, c)].append(var)
# Every pair must meet at least once and at most twice.
for clique in pairs.values():
    total = sum(clique)
    model.Add(1 <= total)
    model.Add(total <= 2)
solver = cp_model.CpSolver()
status = solver.Solve(model)
print(solver.StatusName(status))
schedule = []
for part, var in zip(parts, vars):
    if solver.Value(var):
        schedule.append(part)
pprint(schedule)
Sample output:
OPTIMAL
[[(0, 1, 6), (2, 4, 9), (3, 8, 11), (5, 7, 10)],
[(0, 1, 10), (2, 3, 5), (4, 8, 9), (6, 7, 11)],
[(0, 2, 4), (1, 8, 10), (3, 7, 11), (5, 6, 9)],
[(0, 2, 8), (1, 4, 11), (3, 7, 9), (5, 6, 10)],
[(0, 3, 6), (1, 4, 7), (2, 5, 8), (9, 10, 11)],
[(0, 3, 8), (1, 5, 11), (2, 6, 9), (4, 7, 10)],
[(0, 4, 5), (1, 2, 7), (3, 6, 10), (8, 9, 11)],
[(0, 5, 11), (1, 3, 9), (2, 6, 7), (4, 8, 10)],
[(0, 7, 9), (1, 6, 8), (2, 10, 11), (3, 4, 5)],
[(0, 9, 10), (1, 2, 3), (4, 6, 11), (5, 7, 8)]]
I have a list with certain combinations between two numbers:
[1 2] [1 4] [1 6] [3 4] [5 6] [3 6] [2 3] [4 5] [2 5]
Now I want to make groups of 3 combinations, where each group contains all six digits once, e.g.:
[1 2] [3 6] [4 5] is valid
[1 4] [2 3] [5 6] is valid
[1 2] [2 3] [5 6] is invalid
Order is not important.
How can I arrive upon a list of all possible groups, without employing a brute forcing algorithm?
The language it is implemented in doesn't matter. Description of an algorithm that could achieve this is enough.
One thing to notice is that there are only finitely many possible pairs of elements you can pick from the set {1,2,3,4,5,6}. Specifically, there are (6P2) = 30 of them if you consider order relevant and (6 choose 2) = 15 if you don't. Even the simple "try all triples" algorithm that runs in cubic time in this case will only have to look at at most (30 choose 3) = 4,060 triples, which is a pretty small number. I doubt that you'd have any problems in practice just doing this.
Here's a recursive function in Python that picks a pair of numbers from a list, and then calls itself with the remaining list:
def pairs(l, picked, ok_pairs):
    n = len(l)
    for a in range(n-1):
        for b in range(a+1,n):
            pair = (l[a],l[b])
            if pair not in ok_pairs:
                continue
            # keep pairs in increasing order so each group is printed once
            if picked and picked[-1][0] > pair[0]:
                continue
            p = picked+[pair]
            if len(l) > 2:
                pairs([m for i,m in enumerate(l) if i not in [a, b]], p, ok_pairs)
            else:
                print(p)

ok_pairs = set([(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)])
pairs([1,2,3,4,5,6], [], ok_pairs)
The output (of 6 triplets) is:
[(1, 2), (3, 4), (5, 6)]
[(1, 2), (3, 6), (4, 5)]
[(1, 4), (2, 3), (5, 6)]
[(1, 4), (2, 5), (3, 6)]
[(1, 6), (2, 3), (4, 5)]
[(1, 6), (2, 5), (3, 4)]
Here's a version using Python set arithmetic:
pairs = [(1, 2), (1, 4), (1, 6), (3, 4), (5, 6), (3, 6), (2, 3), (4, 5), (2, 5)]
n = len(pairs)
for i in range(n-2):
    set1 = set(pairs[i])
    for j in range(i+1,n-1):
        set2 = set(pairs[j])
        if set1 & set2:
            continue
        for k in range(j+1,n):
            set3 = set(pairs[k])
            if set1 & set3 or set2 & set3:
                continue
            print(pairs[i], pairs[j], pairs[k])
The output is:
(1, 2) (3, 4) (5, 6)
(1, 2) (3, 6) (4, 5)
(1, 4) (5, 6) (2, 3)
(1, 4) (3, 6) (2, 5)
(1, 6) (3, 4) (2, 5)
(1, 6) (2, 3) (4, 5)
I am trying to use the quicksort algorithm to sort the elements of a list of tuples. For example, given a list like [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4)], I want to sort it by the second element of each tuple, in descending order, and obtain [(6,4), (3,3), (4,2), (0,1), (1,1), (2,1), (5,1)]. I have tried using the following algorithm:
def partition(array, begin, end, cmp):
    pivot=array[end][1]
    ii=begin
    for jj in xrange(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii+=1
    array[ii], array[end] = pivot, array[ii]
    return ii

def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    if end is None: end = len(array)
    if begin < end:
        i = partition(array, begin, end-1, cmp)
        sort(array, cmp, i+1, end)
        sort(array, cmp, begin, i)
The problem is that the result is this: [4, (3, 3), (4, 2), 1, 1, 1, (5, 1)]. What do I have to change to get the correct result?
Complex sorting patterns in Python are painless. Python's sorting algorithm is state of the art, one of the fastest available in real-world cases. No algorithm design needed.
>>> from operator import itemgetter
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> l.sort(key=itemgetter(1), reverse=True)
>>> l
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
Above, itemgetter returns a function that returns the second element of its argument. Thus the key argument to sort is a function that returns the item on which to sort the list.
Python's sort is stable, so the ordering of elements with equal keys (in this case, the second item of each tuple) is determined by the original order.
Unfortunately, the answer from @wkschwartz only works because of the particular initial ordering of the items. If the tuple (5, 1) is moved to the beginning of the list, it gives a different answer.
The first method below gives the same result for any initial ordering of the items in the list.
Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> sorted(l, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> from operator import itemgetter
>>> sorted(l, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> # but note:
>>> l2 = [(5,1), (1,1), (2,1), (3,3), (4,2), (0,1), (6,4 )]
>>> # Swapped first and sixth elements
>>> sorted(l2, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 1), (0, 1)]
>>> sorted(l2, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>>
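For readers who want to repair the hand-written quicksort rather than replace it: the output in the question, with bare numbers such as 4 and 1 where tuples should be, suggests that the problem is the last assignment in partition, which stores pivot (the key array[end][1]) instead of the tuple array[end]. A minimal sketch under that assumption, written for Python 3 (range instead of xrange):

```python
def partition(array, begin, end, cmp):
    pivot = array[end][1]  # compare on the second element of each tuple
    ii = begin
    for jj in range(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii += 1
    # Swap whole tuples; the original wrote the bare pivot key into the list.
    array[ii], array[end] = array[end], array[ii]
    return ii


def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    if end is None:
        end = len(array)
    if begin < end:
        i = partition(array, begin, end - 1, cmp)
        sort(array, cmp, i + 1, end)
        sort(array, cmp, begin, i)


l = [(0, 1), (1, 1), (2, 1), (3, 3), (4, 2), (5, 1), (6, 4)]
sort(l)
print([item[1] for item in l])  # second elements, now in descending order
```

Quicksort is not stable, so tuples with equal second elements may come out in a different relative order than with sorted(). If that order matters, use a key such as (-x[1], x[0]) as in the session above.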
So if I had the numbers [1,2,2,3] and I want k=2 partitions I'd have [1][2,2,3], [1,2][2,3], [2,2][1,3], [2][1,2,3], [3][1,2,2], etc.
See an answer in Python at Code Review.
user3569's solution at Code Review produces five 2-tuples for the test case below, instead of exclusively 3-tuples. However, removing the frozenset() call for the returned tuples leads to the code returning exclusively 3-tuples. The revised code is as follows:
from itertools import chain, combinations

def subsets(arr):
    """ Note this only returns non empty subsets of arr"""
    return chain(*[combinations(arr,i + 1) for i,a in enumerate(arr)])

def k_subset(arr, k):
    s_arr = sorted(arr)
    return set([i for i in combinations(subsets(arr),k)
                if sorted(chain(*i)) == s_arr])

s = k_subset([2,2,2,2,3,3,5],3)
for ss in sorted(s):
    print(len(ss)," - ",ss)
As user3569 says "it runs pretty slow, but is fairly concise".
(EDIT: see below for Knuth's solution)
The output is:
3 - ((2,), (2,), (2, 2, 3, 3, 5))
3 - ((2,), (2, 2), (2, 3, 3, 5))
3 - ((2,), (2, 2, 2), (3, 3, 5))
3 - ((2,), (2, 2, 3), (2, 3, 5))
3 - ((2,), (2, 2, 5), (2, 3, 3))
3 - ((2,), (2, 3), (2, 2, 3, 5))
3 - ((2,), (2, 3, 3), (2, 2, 5))
3 - ((2,), (2, 3, 5), (2, 2, 3))
3 - ((2,), (2, 5), (2, 2, 3, 3))
3 - ((2,), (3,), (2, 2, 2, 3, 5))
3 - ((2,), (3, 3), (2, 2, 2, 5))
3 - ((2,), (3, 5), (2, 2, 2, 3))
3 - ((2,), (5,), (2, 2, 2, 3, 3))
3 - ((2, 2), (2, 2), (3, 3, 5))
3 - ((2, 2), (2, 3), (2, 3, 5))
3 - ((2, 2), (2, 5), (2, 3, 3))
3 - ((2, 2), (3, 3), (2, 2, 5))
3 - ((2, 2), (3, 5), (2, 2, 3))
3 - ((2, 3), (2, 2), (2, 3, 5))
3 - ((2, 3), (2, 3), (2, 2, 5))
3 - ((2, 3), (2, 5), (2, 2, 3))
3 - ((2, 3), (3, 5), (2, 2, 2))
3 - ((2, 5), (2, 2), (2, 3, 3))
3 - ((2, 5), (2, 3), (2, 2, 3))
3 - ((2, 5), (3, 3), (2, 2, 2))
3 - ((3,), (2, 2), (2, 2, 3, 5))
3 - ((3,), (2, 2, 2), (2, 3, 5))
3 - ((3,), (2, 2, 3), (2, 2, 5))
3 - ((3,), (2, 2, 5), (2, 2, 3))
3 - ((3,), (2, 3), (2, 2, 2, 5))
3 - ((3,), (2, 3, 5), (2, 2, 2))
3 - ((3,), (2, 5), (2, 2, 2, 3))
3 - ((3,), (3,), (2, 2, 2, 2, 5))
3 - ((3,), (3, 5), (2, 2, 2, 2))
3 - ((3,), (5,), (2, 2, 2, 2, 3))
3 - ((5,), (2, 2), (2, 2, 3, 3))
3 - ((5,), (2, 2, 2), (2, 3, 3))
3 - ((5,), (2, 2, 3), (2, 2, 3))
3 - ((5,), (2, 3), (2, 2, 2, 3))
3 - ((5,), (2, 3, 3), (2, 2, 2))
3 - ((5,), (3, 3), (2, 2, 2, 2))
Knuth's solution, as implemented by Adeel Zafar Soomro on the same Code Review page can be called as follows if no duplicates are desired:
s = algorithm_u([2,2,2,2,3,3,5],3)
ss = set(tuple(sorted(tuple(tuple(y) for y in x) for x in s)))
I haven't timed it, but Knuth's solution is visibly faster, even for this test case.
However, it returns 63 tuples rather than the 41 returned by user3569's solution. I haven't yet gone through the output closely enough to establish which output is correct.
Here's a version in Haskell:
import Data.List (nub, sort, permutations)

parts 0 = []
parts n = nub $ map sort $ [n] : [x:xs | x <- [1..n`div`2], xs <- parts(n - x)]

partition [] ys result = sort $ map sort result
partition (x:xs) ys result =
  partition xs (drop x ys) (result ++ [take x ys])

partitions xs k =
  let variations = filter (\x -> length x == k) $ parts (length xs)
  in nub $ concat $ map (\x -> mapVariation x (nub $ permutations xs)) variations
  where mapVariation variation = map (\x -> partition variation x [])
OUTPUT:
*Main> partitions [1,2,2,3] 2
[[[1],[2,2,3]],[[1,2,3],[2]],[[1,2,2],[3]],[[1,2],[2,3]],[[1,3],[2,2]]]
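For cross-checking small cases in Python, a brute-force sketch is shown below. It is not taken from any of the answers: it simply tries every assignment of elements to k blocks, discards assignments that leave a block empty, and removes duplicates by sorting each block and then the list of blocks. The name `k_partitions` is hypothetical, and the running time grows as k**n, so it is only suitable for short inputs.

```python
from itertools import product


def k_partitions(items, k):
    """Return the distinct ways to split a multiset into k non-empty blocks."""
    found = set()
    for assignment in product(range(k), repeat=len(items)):
        blocks = [[] for _ in range(k)]
        for item, block_index in zip(items, assignment):
            blocks[block_index].append(item)
        if all(blocks):  # skip assignments that leave a block empty
            # Canonical form: sorted blocks in sorted order, so that
            # relabelled or reordered duplicates collapse to one entry.
            found.add(tuple(sorted(tuple(sorted(block)) for block in blocks)))
    return sorted(found)


for partition in k_partitions([1, 2, 2, 3], 2):
    print(partition)
```

For [1, 2, 2, 3] and k = 2 this yields the same five partitions as the Haskell output above, though listed in a different order.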
Python solution:
pip install PartitionSets
Then:
import partitionsets.partition
filter(lambda x: len(x) == k, partitionsets.partition.Partition(arr))
The PartitionSets implementation seems to be pretty fast; however, it's a pity you can't pass the number of partitions as an argument, so you need to filter your k-set partitions from all subset partitions.
You may also want to look at:
similar topic on researchgate.
I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps an index in the sorted x values with the index in the sorted y values corresponding to the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming a zero based indexing scheme, the list that maps the x list index to the index of its paired y list index would be (2, 3, 0, 4, 5, 1).
For example the value 7 appears as index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?
Here's a simple O(n log n) method:
1. Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
2. Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
3. Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
4. Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
5. Create the sorted x list simply by removing the 2nd element from the list produced in step 1.
6. Create the sorted y list by doing the same with the list produced in step 3.
I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the current index of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get the original_index, the second pair of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
# generate unsorted lists
xs, ys = zip(*points)
# pair each element with its index
xs = list(zip(xs, range(len(xs))))
ys = list(zip(ys, range(len(ys))))
# sort
xs.sort()
ys.sort()
# generate the y positions list
y_positions = [None] * len(ys)
for i in range(len(ys)):
    original_index = ys[i][1]
    y_positions[original_index] = i
# generate `index_list`
index_list = []
for x, original_index in xs:
    index_list.append(y_positions[original_index])
# remove tuples from x and y lists
xs = list(zip(*xs))[0]
ys = list(zip(*ys))[0]
print("xs:", xs)
print("ys:", ys)
print("index list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.
Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem seems more complicated than it appears at first glance and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))
# Generate the mapping list.
t = N * [0]
for i in range(N):
    t[r[i]] = i
index_list = [t[j] for j in s]
print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]
I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
# tag each y value with its position in the x-sorted list, then sort by y
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
    index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)