Maximum sum of combinations (algorithm optimization)

I have a dictionary with n choose m entries: the keys are tuples of length m covering all combinations of the integers 1 to n. I would like to compute the maximum sum of values of this dictionary over a set of keys whose tuples have no indices in common, storing the combinations that make up this maximum.
Example input:
input_dict = {
    (1, 2): 16,
    (1, 3): 4,
    (1, 4): 13,
    (1, 5): 8,
    (1, 6): 9,
    (2, 3): 6,
    (2, 4): 19,
    (2, 5): 7,
    (2, 6): 16,
    (3, 4): 12,
    (3, 5): 23,
    (3, 6): 12,
    (4, 5): 17,
    (4, 6): 19,
    (5, 6): 21
}
Example output:
(((1, 2), (3, 5), (4, 6)), 58)
My current approach: enumerate every set of mutually disjoint keys, sum their values, and take the maximum.
gen_results = [{}]
for (key, val) in input_dict.items():
    gen_results[0][(key,)] = val
i = 0
complete = False
while not complete:
    complete = True
    gen_results.append({})
    for (combinations, running_sum) in gen_results[i].items():
        for (key, val) in input_dict.items():
            unique_combination = True
            for combination in combinations:
                for idx in key:
                    if idx in combination:
                        unique_combination = False
                        break
                if not unique_combination:
                    break
            if unique_combination:
                complete = False
                gen_results[i+1][combinations + (key,)] = running_sum + val
    i += 1
generation_maximums = []
for gen_result in gen_results:
    if gen_result == {}:
        continue
    generation_maximums.append(max(gen_result.items(), key=(lambda x: x[1])))
print(max(generation_maximums, key=(lambda x: x[1])))
How can I improve my algorithm for large n and m?

If you don't go for integer programming, you can often brute force these things using bits as hashes.
E.g. the following outputs 58:
input_dict = {
    (1, 2): 16,
    (1, 3): 4,
    (1, 4): 13,
    (1, 5): 8,
    (1, 6): 9,
    (2, 3): 6,
    (2, 4): 19,
    (2, 5): 7,
    (2, 6): 16,
    (3, 4): 12,
    (3, 5): 23,
    (3, 6): 12,
    (4, 5): 17,
    (4, 6): 19,
    (5, 6): 21
}
dp = {}
n, m = 2, 6
for group, score in input_dict.items():
    # Encode each combination as a bitmask over the integers 1..m.
    bit_hash = 0
    for x in group:
        bit_hash += 1 << (x - 1)
    dp[bit_hash] = score
while True:
    # Snapshot the current entries so dp can grow while we iterate.
    items = list(dp.items())
    for hash1, score1 in items:
        for hash2, score2 in items:
            if hash1 & hash2 == 0:
                dp[hash1 | hash2] = max(dp.get(hash1 | hash2, 0), score1 + score2)
    if len(dp) == (1 << m) // 2 - 1:
        print(dp[(1 << m) - 1])
        break
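For completeness, here is a rough sketch of the integer-programming route mentioned above, using OR-Tools CP-SAT (the same solver one of the answers further down uses); the variable names are mine, and this is only one way to model it: one Boolean per combination, with each integer covered at most once.

# Sketch: choose disjoint combinations maximizing the total value.
from ortools.sat.python import cp_model

input_dict = {
    (1, 2): 16, (1, 3): 4, (1, 4): 13, (1, 5): 8, (1, 6): 9,
    (2, 3): 6, (2, 4): 19, (2, 5): 7, (2, 6): 16, (3, 4): 12,
    (3, 5): 23, (3, 6): 12, (4, 5): 17, (4, 6): 19, (5, 6): 21,
}

model = cp_model.CpModel()
choose = {key: model.NewBoolVar(str(key)) for key in input_dict}

# Each integer 1..6 may appear in at most one chosen combination.
for idx in range(1, 7):
    model.Add(sum(var for key, var in choose.items() if idx in key) <= 1)

model.Maximize(sum(val * choose[key] for key, val in input_dict.items()))

solver = cp_model.CpSolver()
solver.Solve(model)
chosen = tuple(key for key, var in choose.items() if solver.Value(var))
print(chosen, int(solver.ObjectiveValue()))  # ((1, 2), (3, 5), (4, 6)) 58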

Related

Assigning colors to nodes in a graph such that no two neighbors of a node have the same color

In the graph above, no two adjacent nodes have the same color. I created a grid graph with diagonal edges across nodes using networkx in Python and applied greedy coloring to it.
greed = nx.coloring.greedy_color(G)
print(greed)
which gives the output
{(1, 1): 0, (1, 2): 1, (1, 3): 0, (1, 4): 1, (1, 5): 0, (1, 6): 1, (1, 7): 0, (1, 8): 1, (2, 1): 2, (2, 2): 3, (2, 3): 2, (2, 4): 3, (2, 5): 2, (2, 6): 3, (2, 7): 2, (2, 8): 3, (3, 1): 0, (3, 2): 1, (3, 3): 0, (3, 4): 1, (3, 5): 0, (3, 6): 1, (3, 7): 0, (3, 8): 1, (4, 1): 2, (4, 2): 3, (4, 3): 2, (4, 4): 3, (4, 5): 2, (4, 6): 3, (4, 7): 2, (4, 8): 3, (5, 1): 0, (5, 2): 1, (5, 3): 0, (5, 4): 1, (5, 5): 0, (5, 6): 1, (5, 7): 0, (5, 8): 1, (6, 1): 2, (6, 2): 3, (6, 3): 2, (6, 4): 3, (6, 5): 2, (6, 6): 3, (6, 7): 2, (6, 8): 3, (7, 1): 0, (7, 2): 1, (7, 3): 0, (7, 4): 1, (7, 5): 0, (7, 6): 1, (7, 7): 0, (7, 8): 1, (8, 1): 2, (8, 2): 3, (8, 3): 2, (8, 4): 3, (8, 5): 2, (8, 6): 3, (8, 7): 2, (8, 8): 3, (0, 1): 2, (0, 2): 3, (0, 3): 2, (0, 4): 3, (0, 5): 2, (0, 6): 3, (0, 7): 2, (0, 8): 3, (1, 0): 1, (1, 9): 0, (2, 0): 3, (2, 9): 2, (3, 0): 1, (3, 9): 0, (4, 0): 3, (4, 9): 2, (5, 0): 1, (5, 9): 0, (6, 0): 3, (6, 9): 2, (7, 0): 1, (7, 9): 0, (8, 0): 3, (8, 9): 2, (9, 1): 0, (9, 2): 1, (9, 3): 0, (9, 4): 1, (9, 5): 0, (9, 6): 1, (9, 7): 0, (9, 8): 1, (0, 0): 3, (0, 9): 2, (9, 0): 1, (9, 9): 0}
after sorting
{(0, 0): 3, (0, 1): 2, (0, 2): 3, (0, 3): 2, (0, 4): 3, (0, 5): 2, (0, 6): 3, (0, 7): 2, (0, 8): 3, (0, 9): 2, (1, 0): 1, (1, 1): 0, (1, 2): 1, (1, 3): 0, (1, 4): 1, (1, 5): 0, (1, 6): 1, (1, 7): 0, (1, 8): 1, (1, 9): 0, (2, 0): 3, (2, 1): 2, (2, 2): 3, (2, 3): 2, (2, 4): 3, (2, 5): 2, (2, 6): 3, (2, 7): 2, (2, 8): 3, (2, 9): 2, (3, 0): 1, (3, 1): 0, (3, 2): 1, (3, 3): 0, (3, 4): 1, (3, 5): 0, (3, 6): 1, (3, 7): 0, (3, 8): 1, (3, 9): 0, (4, 0): 3, (4, 1): 2, (4, 2): 3, (4, 3): 2, (4, 4): 3, (4, 5): 2, (4, 6): 3, (4, 7): 2, (4, 8): 3, (4, 9): 2, (5, 0): 1, (5, 1): 0, (5, 2): 1, (5, 3): 0, (5, 4): 1, (5, 5): 0, (5, 6): 1, (5, 7): 0, (5, 8): 1, (5, 9): 0, (6, 0): 3, (6, 1): 2, (6, 2): 3, (6, 3): 2, (6, 4): 3, (6, 5): 2, (6, 6): 3, (6, 7): 2, (6, 8): 3, (6, 9): 2, (7, 0): 1, (7, 1): 0, (7, 2): 1, (7, 3): 0, (7, 4): 1, (7, 5): 0, (7, 6): 1, (7, 7): 0, (7, 8): 1, (7, 9): 0, (8, 0): 3, (8, 1): 2, (8, 2): 3, (8, 3): 2, (8, 4): 3, (8, 5): 2, (8, 6): 3, (8, 7): 2, (8, 8): 3, (8, 9): 2, (9, 0): 1, (9, 1): 0, (9, 2): 1, (9, 3): 0, (9, 4): 1, (9, 5): 0, (9, 6): 1, (9, 7): 0, (9, 8): 1, (9, 9): 0}
But I want the coloring to be such that no two neighbors of a node have the same color.
In the above figure, (1,4) [green] has the neighbors (1,3) [red] and (1,5) [red]. In this case both nodes next to node (1,4) are red, but I want (1,3) and (1,5) to have different colors. Can anyone tell me how to solve this problem?
I tried the greedy_color method from networkx, which colors the graph so that no two nodes adjacent to each other have the same color.
The problem is that you have an additional constraint that the coloring algorithm does not respect. You have two choices: change the algorithm to respect the constraint (hard), or change the data (the graph) so that the constraint is integrated into it.
The second option is really easy to do here. All we have to do is add edges between nodes that should not have the same color (that is, nodes that share a common neighbor), then color the graph:
Create a deep copy G2 of the graph G. As we will modify the graph to match the new constraints, we have to keep the original intact.
For every pair of nodes n_1, n_2 in G:
If they are adjacent, there is nothing to do.
If they share a common neighbor in G, add an edge (n_1, n_2) in G2.
Color G2.
For every node in G, set its color to the color of the corresponding node in G2.
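A minimal sketch of those steps with networkx (the small grid-with-diagonals graph here is just a stand-in for your G; any nx.Graph works the same way):

import itertools
import networkx as nx

# Stand-in graph: a 4x4 grid with both diagonals added in every cell.
G = nx.grid_2d_graph(4, 4)
G.add_edges_from(((r, c), (r + 1, c + 1)) for r in range(3) for c in range(3))
G.add_edges_from(((r + 1, c), (r, c + 1)) for r in range(3) for c in range(3))

# Copy G, then add an edge between every pair of nodes that share a
# common neighbor in G.
G2 = G.copy()
for node in G:
    for u, v in itertools.combinations(G[node], 2):
        G2.add_edge(u, v)

# Color G2 and read the colors back for G's nodes.
coloring = nx.coloring.greedy_color(G2)
print({node: coloring[node] for node in G})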
Have you tried the Graph Coloring algorithm?
Step 1 − Arrange the vertices of the graph in some order.
Step 2 − Choose the first vertex and color it with the first color.
Step 3 − Choose the next vertex and color it with the lowest-numbered color that has not been used on any vertex adjacent to it. If all previously used colors appear on the adjacent vertices, assign a new color to it. Repeat this step until all the vertices are colored.
Credits: https://www.tutorialspoint.com/the-graph-coloring
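A tiny illustration of those three steps (adjacency given as a plain dict; note this is only the ordinary greedy coloring, so on its own it does not enforce the extra distance-2 constraint from the question):

def greedy_coloring(adj):
    colors = {}
    for node in sorted(adj):  # step 1: fix an order over the vertices
        used = {colors[nb] for nb in adj[node] if nb in colors}
        color = 0
        while color in used:  # steps 2-3: lowest color unused by neighbors
            color += 1
        colors[node] = color
    return colors

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [1]}
print(greedy_coloring(adj))  # {0: 0, 1: 1, 2: 2, 3: 0}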

Even distribution algorithm

I am organizing a tournament where 12 players are going to meet each other on 10 board games.
I want each player to play at least one time with the 11 others players over the 10 board games.
For example :
BoardGame1 - Match1 - Player1 + Player2 + Player3
BoardGame1 - Match2 - Player4 + Player5 + Player6
BoardGame1 - Match3 - Player7 + Player8 + Player9
BoardGame1 - Match4 - Player10 + Player11 + Player12
[...]
BoardGame10 - Match1 - Player1 + Player11 + Player9
BoardGame10 - Match2 - Player4 + Player2 + Player12
BoardGame10 - Match3 - Player7 + Player5 + Player3
BoardGame10 - Match4 - Player10 + Player8 + Player6
How do you create an algorithm where you distribute the players evenly?
I'd like to do it with TDD approach, so I need to predict the expected result (meaning no random distribution).
If all players play each other exactly once, then the resulting object would be a Kirkman Triple System. There is no KTS with the parameters you want, but since each player has twenty opponent slots and only eleven potential opponents, it should be easy to find a suitable schedule.
The code below generates the (11 choose 2) × (8 choose 2) × (5 choose 2) = 15400 possibilities for one game and repeatedly greedily chooses the one that spreads the pairings in the fairest manner.
import collections
import itertools
import pprint

def partitions(players, k):
    players = set(players)
    assert len(players) % k == 0
    if len(players) == 0:
        yield []
    else:
        players = players.copy()
        leader = min(players)
        players.remove(leader)
        for comb in itertools.combinations(sorted(players), k - 1):
            group = {leader} | set(comb)
            for part in partitions(players - group, k):
                yield [group] + part

def update_pair_counts(pair_counts, game):
    for match in game:
        pair_counts.update(itertools.combinations(sorted(match), 2))

def evaluate_game(pair_counts, game):
    pair_counts = pair_counts.copy()
    update_pair_counts(pair_counts, game)
    objective = [0] * max(pair_counts.values())
    for count in pair_counts.values():
        objective[count - 1] += 1
    total = 0
    for i in range(len(objective) - 1, -1, -1):
        total += objective[i]
        objective[i] = total
    return objective

def schedule(n_groups, n_players_per_group, n_games):
    games = list(partitions(range(n_groups * n_players_per_group), n_players_per_group))
    pair_counts = collections.Counter()
    for i in range(n_games):
        game = max(games, key=lambda game: evaluate_game(pair_counts, game))
        yield game
        update_pair_counts(pair_counts, game)

def main():
    pair_counts = collections.Counter()
    for game in schedule(4, 3, 10):
        pprint.pprint(game)
        update_pair_counts(pair_counts, game)
    print()
    pprint.pprint(pair_counts)

if __name__ == "__main__":
    main()
Sample output:
[{0, 1, 2}, {3, 4, 5}, {8, 6, 7}, {9, 10, 11}]
[{0, 3, 6}, {1, 4, 9}, {2, 10, 7}, {8, 11, 5}]
[{0, 4, 7}, {3, 1, 11}, {8, 9, 2}, {10, 5, 6}]
[{0, 1, 5}, {2, 11, 6}, {9, 3, 7}, {8, 10, 4}]
[{0, 8, 3}, {1, 10, 6}, {2, 11, 4}, {9, 5, 7}]
[{0, 10, 11}, {8, 1, 7}, {2, 3, 5}, {9, 4, 6}]
[{0, 9, 2}, {1, 10, 3}, {8, 4, 5}, {11, 6, 7}]
[{0, 4, 6}, {1, 2, 5}, {10, 3, 7}, {8, 9, 11}]
[{0, 5, 7}, {1, 11, 4}, {8, 2, 10}, {9, 3, 6}]
[{0, 3, 11}, {8, 1, 6}, {2, 4, 7}, {9, 10, 5}]
Counter({(0, 3): 3,
(0, 1): 2,
(0, 2): 2,
(1, 2): 2,
(3, 5): 2,
(4, 5): 2,
(6, 7): 2,
(6, 8): 2,
(7, 8): 2,
(9, 10): 2,
(9, 11): 2,
(10, 11): 2,
(0, 6): 2,
(3, 6): 2,
(1, 4): 2,
(4, 9): 2,
(2, 7): 2,
(2, 10): 2,
(7, 10): 2,
(5, 8): 2,
(8, 11): 2,
(0, 4): 2,
(0, 7): 2,
(4, 7): 2,
(1, 3): 2,
(1, 11): 2,
(3, 11): 2,
(2, 8): 2,
(2, 9): 2,
(8, 9): 2,
(5, 10): 2,
(6, 10): 2,
(0, 5): 2,
(1, 5): 2,
(2, 11): 2,
(6, 11): 2,
(3, 7): 2,
(3, 9): 2,
(7, 9): 2,
(4, 8): 2,
(8, 10): 2,
(1, 6): 2,
(1, 10): 2,
(2, 4): 2,
(4, 11): 2,
(5, 7): 2,
(5, 9): 2,
(0, 11): 2,
(1, 8): 2,
(2, 5): 2,
(4, 6): 2,
(6, 9): 2,
(3, 10): 2,
(3, 4): 1,
(1, 9): 1,
(5, 11): 1,
(5, 6): 1,
(2, 6): 1,
(4, 10): 1,
(0, 8): 1,
(3, 8): 1,
(0, 10): 1,
(1, 7): 1,
(2, 3): 1,
(0, 9): 1,
(7, 11): 1})
How do you create an algorithm where you distribute the players evenly? I'd like to do it with TDD approach, so I need to predict the expected result (meaning no random distribution).
TDD tends to be successful when your problem is that you know what the computer should do, and you know how to make the computer do that, but you don't know the best way to write the code.
When you don't know how to make the computer do what you want, TDD is a lot harder. There are two typical approaches taken here.
The first is to perform a "spike" - sit down and hack things until you understand how to make the computer do what you want. The key feature of spikes is that you don't get to keep the code changes at the end; instead, you discard your spiked code, keep what you have learned in your head, and start over by writing the tests that you need.
The second approach is to sort of sneak up on it - you do TDD for the very simple cases that you do understand, and keep adding tests that are just a little bit harder than what you have already done. See Robert Martin's Craftsman series for an example of this approach.
For this problem, you might begin by first thinking of an interface that you might use for accessing the algorithm. For instance, you might consider a design that accepts as input a number of players and a number of games, and returns you a sequence of tuples, where each tuple represents a single match.
Typically, this version of the interface will look like general purpose data structures as inputs (in this example: numbers), and general purpose data structures as outputs (the list of tuples).
Most commonly, we'll verify the behavior in each test by figuring out what the answer should be for a given set of inputs, and asserting that the actual data structure exactly matches the expected. For a list of tuples, that would look something like:
assert len(expected) == len(actual)
for x in range(len(actual)):
    assert len(expected[x]) == len(actual[x])
    for y in range(len(actual[x])):
        assert expected[x][y] == actual[x][y]
Although of course you could refactor that into something that looks nicer
assert expected == actual
Another possibility is to think about the properties that a solution should have, and verify that the actual result is consistent with those properties. Here, you seem to have two properties that are required for every solution:
Each pair of players should appear exactly once in the list of matches
Every player, boardgame pair should appear exactly once in the list of matches
In this case, the answer is easy enough to check (iterate through all of the matches, count each pair, assert count equals one).
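As a sketch, assuming the schedule is represented as an iterable of (boardgame, players) pairs (that format is my assumption, not something the question fixes), the check could look like:

import itertools
from collections import Counter

def assert_each_pair_meets_once(schedule):
    # Count how often each pair of players shares a match.
    pair_counts = Counter()
    for boardgame, players in schedule:
        pair_counts.update(itertools.combinations(sorted(players), 2))
    for pair, count in pair_counts.items():
        assert count == 1, "pair %s met %d times" % (pair, count)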
The tests themselves we introduce by starting with the easiest example we can think of. Here, that might be the case where we have 2 players and 1 board game, and our answer should be
BoardGame1 - Match1 - Player1 + Player2
And so we write that test (RED), and hard code this specific answer (GREEN), and then (REFACTOR) the code so that it is clear to the reader why this is the correct answer for these inputs.
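A sketch of what that first test and its hard-coded implementation might look like (the function name make_schedule and the output format are my own assumptions, not something the question fixes):

def make_schedule(n_players, n_boardgames):
    # GREEN: hard-coded answer for the only case the tests cover so far.
    return [("BoardGame1", "Match1", ("Player1", "Player2"))]

def test_two_players_one_boardgame():
    expected = [("BoardGame1", "Match1", ("Player1", "Player2"))]
    assert make_schedule(2, 1) == expected

test_two_players_one_boardgame()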
And when you are happy with that code, you look for the next example - an example where the current implementation returns the wrong answer, but the change that you need to make to get it to return the right answer is small/easy.
Often, what will happen is that we "pass" the next test using a branch:
if special_case(inputs):
    return answer_for_special_case
else:
    # ... real implementation here ...
    return answer_for_general_case
And then refactor the code until the two blocks are the same, then finally remove the if clause.
It will sometimes happen that the new test is too big, and we can't figure out how to extend the algorithm to cover the new case. Usually the play is to revert any changes we've made (keeping the tests passing), and use what we have learned to find a different test that might be easier to introduce to the code.
And you keep iterating on this process until you have solved "all" of the problem.
Here is a resolvable triple system due to Haim Hanani (“On resolvable balanced incomplete block designs”, 1974), which provides a schedule for 11 games (drop one). Unfortunately it repeats matches.
import collections
import itertools
from pprint import pprint

def Match(a, b, c):
    return tuple(sorted([a, b, c]))

games = []
for j in range(4):
    games.append(
        [
            Match(0 ^ j, 4 ^ j, 8 ^ j),
            Match(1 ^ j, 2 ^ j, 3 ^ j),
            Match(5 ^ j, 6 ^ j, 7 ^ j),
            Match(9 ^ j, 10 ^ j, 11 ^ j),
        ]
    )
games.append([Match(1 ^ j, 6 ^ j, 11 ^ j) for j in range(4)])
games.append([Match(2 ^ j, 7 ^ j, 9 ^ j) for j in range(4)])
games.append([Match(3 ^ j, 5 ^ j, 10 ^ j) for j in range(4)])
for j in range(4):
    games.append(
        [
            Match(0 ^ j, 4 ^ j, 8 ^ j),
            Match(1 ^ j, 6 ^ j, 11 ^ j),
            Match(2 ^ j, 7 ^ j, 9 ^ j),
            Match(3 ^ j, 5 ^ j, 10 ^ j),
        ]
    )
for game in games:
    game.sort()
pprint(len(games))
pprint(games)
pair_counts = collections.Counter()
for game in games:
    for triple in game:
        pair_counts.update(itertools.combinations(sorted(triple), 2))
pprint(max(pair_counts.values()))
Output:
11
[[(0, 4, 8), (1, 2, 3), (5, 6, 7), (9, 10, 11)],
[(0, 2, 3), (1, 5, 9), (4, 6, 7), (8, 10, 11)],
[(0, 1, 3), (2, 6, 10), (4, 5, 7), (8, 9, 11)],
[(0, 1, 2), (3, 7, 11), (4, 5, 6), (8, 9, 10)],
[(0, 7, 10), (1, 6, 11), (2, 5, 8), (3, 4, 9)],
[(0, 5, 11), (1, 4, 10), (2, 7, 9), (3, 6, 8)],
[(0, 6, 9), (1, 7, 8), (2, 4, 11), (3, 5, 10)],
[(0, 4, 8), (1, 6, 11), (2, 7, 9), (3, 5, 10)],
[(0, 7, 10), (1, 5, 9), (2, 4, 11), (3, 6, 8)],
[(0, 5, 11), (1, 7, 8), (2, 6, 10), (3, 4, 9)],
[(0, 6, 9), (1, 4, 10), (2, 5, 8), (3, 7, 11)]]
2
Combinatorial optimization is another possibility. This one doesn’t scale super well but can handle 12 players/10 games.
import collections
import itertools
from pprint import pprint

def partitions(V):
    if not V:
        yield []
        return
    a = min(V)
    V.remove(a)
    for b, c in itertools.combinations(sorted(V), 2):
        for part in partitions(V - {b, c}):
            yield [(a, b, c)] + part

parts = list(partitions(set(range(12))))

from ortools.sat.python import cp_model

model = cp_model.CpModel()
vars = [model.NewBoolVar("") for part in parts]
model.Add(sum(vars) == 10)
pairs = collections.defaultdict(list)
for part, var in zip(parts, vars):
    for (a, b, c) in part:
        pairs[(a, b)].append(var)
        pairs[(a, c)].append(var)
        pairs[(b, c)].append(var)
for clique in pairs.values():
    total = sum(clique)
    model.Add(1 <= total)
    model.Add(total <= 2)
solver = cp_model.CpSolver()
status = solver.Solve(model)
print(solver.StatusName(status))
schedule = []
for part, var in zip(parts, vars):
    if solver.Value(var):
        schedule.append(part)
pprint(schedule)
Sample output:
OPTIMAL
[[(0, 1, 6), (2, 4, 9), (3, 8, 11), (5, 7, 10)],
[(0, 1, 10), (2, 3, 5), (4, 8, 9), (6, 7, 11)],
[(0, 2, 4), (1, 8, 10), (3, 7, 11), (5, 6, 9)],
[(0, 2, 8), (1, 4, 11), (3, 7, 9), (5, 6, 10)],
[(0, 3, 6), (1, 4, 7), (2, 5, 8), (9, 10, 11)],
[(0, 3, 8), (1, 5, 11), (2, 6, 9), (4, 7, 10)],
[(0, 4, 5), (1, 2, 7), (3, 6, 10), (8, 9, 11)],
[(0, 5, 11), (1, 3, 9), (2, 6, 7), (4, 8, 10)],
[(0, 7, 9), (1, 6, 8), (2, 10, 11), (3, 4, 5)],
[(0, 9, 10), (1, 2, 3), (4, 6, 11), (5, 7, 8)]]

X-Y Heuristic on the N-Puzzle

First of all, I have seen this answer and yes, it explains the X-Y heuristic, but the example board was too simple for me to understand the general heuristic.
X-Y heuristic function for solving N-puzzle
So could someone please explain the X-Y heuristic using this example?
8 1 2
7 3 6
0 5 4
The algorithm consists of two separate parts - one for rows and one for columns.
1) Rows. Divide the input matrix by rows - the elements from each row go into a separate set.
(1, 2, 8) - (3, 6, 7) - (0, 4, 5)
The only available move is swapping 0 with an element from an adjacent set.
You finish when each element is in the proper set.
swap 0 and 7 -> (1, 2, 8) - (0, 3, 6) - (4, 5, 7)
swap 0 and 8 -> (0, 1, 2) - (3, 6, 8) - (4, 5, 7)
swap 0 and 3 -> (1, 2, 3) - (0, 6, 8) - (4, 5, 7)
swap 0 and 4 -> (1, 2, 3) - (4, 6, 8) - (0, 5, 7)
swap 0 and 8 -> (1, 2, 3) - (0, 4, 6) - (5, 7, 8)
swap 0 and 5 -> (1, 2, 3) - (4, 5, 6) - (0, 7, 8)
Number of required steps = 6.
2) Similarly for columns. You start with:
(0, 7, 8) - (1, 3, 5) - (2, 4 ,6)
And then
(1, 7, 8) - (0, 3, 5) - (2, 4, 6)
(0, 1, 7) - (3, 5, 8) - (2, 4, 6)
(1, 3, 7) - (0, 5, 8) - (2, 4, 6)
(1, 3, 7) - (2, 5, 8) - (0, 4, 6)
(1, 3, 7) - (0, 2, 5) - (4, 6, 8)
(0, 1, 3) - (2, 5, 7) - (4, 6, 8)
(1, 2, 3) - (0, 5, 7) - (4, 6, 8)
(1, 2, 3) - (4, 5, 7) - (0, 6, 8)
(1, 2, 3) - (0, 4, 5) - (6, 7, 8)
(1, 2, 3) - (4, 5, 6) - (0, 7, 8)
Number of required steps = 10
3) Total number of steps: 6 + 10 = 16
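To make the procedure concrete, here is a small sketch (mine, not from the linked answer) that computes each count by breadth-first search over the abstract states, where a state records only which row (or column) band each tile is in; with the sets used above it reproduces 6 and 10:

from collections import deque

def band_distance(start, goal):
    # Minimum number of abstract moves (swap the blank with a tile from an
    # adjacent band) needed to turn the start band-sets into the goal ones.
    start = tuple(frozenset(b) for b in start)
    goal = tuple(frozenset(b) for b in goal)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        bands, dist = queue.popleft()
        if bands == goal:
            return dist
        blank = next(i for i, b in enumerate(bands) if 0 in b)
        for adj in (blank - 1, blank + 1):
            if not 0 <= adj < len(bands):
                continue
            for tile in bands[adj]:
                nxt = list(bands)
                nxt[blank] = (bands[blank] - {0}) | {tile}
                nxt[adj] = (bands[adj] - {tile}) | {0}
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))

# Row sets of the example board versus the goal 1 2 3 / 4 5 6 / 7 8 0.
x = band_distance([{1, 2, 8}, {3, 6, 7}, {0, 4, 5}],
                  [{1, 2, 3}, {4, 5, 6}, {0, 7, 8}])
# Column sets, using the same target sets as the worked example above.
y = band_distance([{0, 7, 8}, {1, 3, 5}, {2, 4, 6}],
                  [{1, 2, 3}, {4, 5, 6}, {0, 7, 8}])
print(x, y, x + y)  # 6 10 16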

quick sort list of tuple with python

I am trying to use the quicksort algorithm to sort the elements of a list of tuples. If I have a list of this type [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4)], I want to sort it by the second element of each tuple and obtain [(6,4), (3,3), (4,2), (0,1), (1,1), (2,1), (5,1)]. I have tried using the following algorithm:
def partition(array, begin, end, cmp):
    pivot = array[end][1]
    ii = begin
    for jj in xrange(begin, end):
        if cmp(array[jj][1], pivot):
            array[ii], array[jj] = array[jj], array[ii]
            ii += 1
    array[ii], array[end] = pivot, array[ii]
    return ii

def sort(array, cmp=lambda x, y: x > y, begin=0, end=None):
    if end is None: end = len(array)
    if begin < end:
        i = partition(array, begin, end-1, cmp)
        sort(array, cmp, i+1, end)
        sort(array, cmp, begin, i)
The problem is that the result is this: [4, (3, 3), (4, 2), 1, 1, 1, (5, 1)]. What do I have to change to get the correct result ??
Complex sorting patterns in Python are painless. Python's sorting algorithm is state of the art, one of the fastest available in real-world cases. No algorithm design needed.
>>> from operator import itemgetter
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> l.sort(key=itemgetter(1), reverse=True)
>>> l
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
Above, itemgetter(1) returns a function that returns the second element of its argument. Thus the key argument to sort is a function that returns the item on which to sort the list.
Python's sort is stable, so the ordering of elements with equal keys (in this case, the second item of each tuple) is determined by the original order.
Unfortunately the answer from @wkschwartz only works due to the peculiar starting order of the terms. If the tuple (5, 1) is moved to the beginning of the list, it gives a different answer.
The following (first) method works in that it gives the same result for any initial ordering of the items in the list.
Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> l = [(0,1), (1,1), (2,1), (3,3), (4,2), (5,1), (6,4 )]
>>> sorted(l, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> from operator import itemgetter
>>> sorted(l, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>> # but note:
>>> l2 = [(5,1), (1,1), (2,1), (3,3), (4,2), (0,1), (6,4 )]
>>> # Swapped first and sixth elements
>>> sorted(l2, key=itemgetter(1), reverse=True)
[(6, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 1), (0, 1)]
>>> sorted(l2, key=lambda x: (-x[1], x[0]))
[(6, 4), (3, 3), (4, 2), (0, 1), (1, 1), (2, 1), (5, 1)]
>>>

Mapping sort indexes

I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps an index in the sorted x values with the index in the sorted y values corresponding to the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming a zero-based indexing scheme, the list that maps each x list index to the index of its paired value in the y list would be (2, 3, 0, 4, 5, 1).
For example the value 7 appears as index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?
Here's a simple O(n log n) method:
Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
Create the sorted x list simply by removing the 2nd element from the list produced in step 1.
Create the sorted y list by doing the same with the list produced in step 3.
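A short sketch of those six steps in Python (the intermediate variable names are mine; yFor matches the name used above):

points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))

by_x = sorted(points)                                 # step 1: sort by x
with_pos = [(y, i) for i, (x, y) in enumerate(by_x)]  # step 2: (y, position)
by_y = sorted(with_pos)                               # step 3: sort by y

yFor = [0] * len(points)                              # step 4: build the map
for i, (y, k) in enumerate(by_y):
    yFor[k] = i

xs = [x for x, y in by_x]                             # step 5: sorted x values
ys = [y for y, k in by_y]                             # step 6: sorted y values
print(xs, ys, yFor)
# [3, 4, 5, 7, 9, 15] [0, 4, 7, 7, 11, 12] [2, 3, 0, 4, 5, 1]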
I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the current index of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get original_index, the second element of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))

# Generate unsorted lists.
xs, ys = zip(*points)

# Pair each element with its index.
xs = list(zip(xs, range(len(xs))))
ys = list(zip(ys, range(len(ys))))

# Sort.
xs.sort()
ys.sort()

# Generate the y_positions list.
y_positions = [None] * len(ys)
for i in range(len(ys)):
    original_index = ys[i][1]
    y_positions[original_index] = i

# Generate index_list.
index_list = []
for x, original_index in xs:
    index_list.append(y_positions[original_index])

# Remove tuples from the x and y lists.
xs = list(zip(*xs))[0]
ys = list(zip(*ys))[0]

print("xs:", xs)
print("ys:", ys)
print("index list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.
Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem seems more complicated than it appears at first glance and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)

# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))

# Generate the mapping list.
t = N * [0]
for i in range(N):
    t[r[i]] = i
index_list = [t[j] for j in s]

print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]
I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
    index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print("xs:", xs)
print("ys:", ys)
print("index_list:", index_list)
