Maximum matching for assigning 2 items - algorithm

There are N people at a party. Each one has some preferences of food and drinks. Given all the types of foods and drinks that a particular person prefers, find the maximum number of people that can be assigned a drink and a food of their choice.
A person may have several choices for both food and drinks, for example, a person may like Foods A,B,C and Drinks X,Y,Z. If we assign (A,Z) to the person, we consider the person to have been correctly assigned.
How do we solve this problem, given that there are two constraints (food and drink) that must be satisfied at once?

Let F be the set of all foods, D the set of all drinks and P the set of all people.
Build two bipartite graphs G and G'. In G the first partite set is P and the second is F; in G' the first partite set is P and the second is D. Compute a maximum matching on G and on G' separately, and call them M and M'. M is a list of vertex pairs (p1, f1), (p2, f2), ... where pi is a person and fi a food. M' is likewise a list of vertex pairs (p1, d1), (p3, d3), ...
Now merge M and M' by joining the pairs that share a person: (p1, f1) + (p1, d1) = (p1, f1, d1), which is the food-drink combo for p1. If, say, p2 is matched with f2 in M but has no match in M' (no drink), ignore p2. Note that because M and M' are computed independently, they may cover different people, so the merged result is not guaranteed to be the largest possible set of fully served people.
A good algorithm for bipartite matching is the Hopcroft-Karp algorithm: http://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm.
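A minimal sketch of the match-twice-and-merge procedure described above. For brevity it uses a simple augmenting-path matcher (Kuhn's algorithm) rather than Hopcroft-Karp; the function names and the sample preferences are hypothetical.

```python
def max_bipartite_matching(prefs):
    """prefs maps each person to the items they accept.
    Returns a dict person -> item (Kuhn's augmenting-path algorithm)."""
    owner = {}  # item -> person currently holding it

    def try_assign(person, seen):
        for item in prefs[person]:
            if item in seen:
                continue
            seen.add(item)
            # Take a free item, or try to move its current owner elsewhere.
            if item not in owner or try_assign(owner[item], seen):
                owner[item] = person
                return True
        return False

    for person in prefs:
        try_assign(person, set())
    return {person: item for item, person in owner.items()}


def merge_matchings(food_match, drink_match):
    """Keep only the people who received both a food and a drink."""
    return {p: (food_match[p], drink_match[p])
            for p in food_match if p in drink_match}


# Hypothetical preferences for three people.
food_prefs = {"p1": ["A", "B"], "p2": ["A"], "p3": ["B"]}
drink_prefs = {"p1": ["X"], "p2": ["X", "Y"], "p3": ["Z"]}

merged = merge_matchings(max_bipartite_matching(food_prefs),
                         max_bipartite_matching(drink_prefs))
print(merged)
```

Here p3 gets a drink but no food (only two foods exist), so only p1 and p2 appear in the merged result.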

Related

Optimization - distribute participants far from each other

This is my first question. I tried to find an answer for 2 days but I couldn't find what I was looking for.
Question: How can I minimize the number of matches between students from the same school?
I have a very practical case: I need to arrange a competition (tournament bracket),
but some of the participants might come from the same school.
Those from the same school should be placed as far as possible from each other,
for example: {A A A B B C} => {A B}, {A C}, {A B}
If more than half of the participants come from one school, there is no other way but to pair up two students from the same school,
for example: {A A A A B C} => {A B}, {A C}, {A A}
I don't expect to get code; just some keywords or some pseudocode on how you would approach this would be of great help!
I tried digging into constraint resolution algorithms and tournament bracket algorithms, but they don't consider minimising the number of matches between students from the same school.
Well, thank you so much in advance!
A simple algorithm (EDIT 2)
From the comments below: you have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
The idea
Sort the students by school, putting the schools with more students before those with fewer, e.g. A B B B B C C -> B B B B C C A.
Distribute the students into two groups A and B, as when dealing cards for the game of War: 1st student in A, 2nd student in B, 3rd student in A, 4th student in B, ...
Repeat the same procedure on groups A and B.
You have a recursion: the position of a player in the level k-1 (k=n-1 to 0) is ((pos at level k) % 2) * 2^k + (pos at level k) // 2 (every even goes to the left, every odd goes to the right)
Python code
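Sort the players by the size of their school (ties are broken by school name so that schoolmates stay adjacent):
    import collections
    import math
    n = int(math.log2(len(players)))  # n is the number of rounds
    assert 2**n == len(players)  # the bracket must be completely filled
    c = collections.Counter(p.school for p in players)
    players_sorted_by_school_count = sorted(players, key=lambda p: (-c[p.school], p.school))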
Find the final position of every player:
    players_sorted_for_tournament = [-1] * 2**n
    for j, player in enumerate(players_sorted_by_school_count):
        pos = 0
        for e in range(n-1, -1, -1):
            if j % 2 == 1:
                pos += 2**e  # to the right
            j = j // 2
        players_sorted_for_tournament[pos] = player
This should give groups that are diverse enough, but I'm not sure whether it's optimal or not. Waiting for comments.
First version: how to make pairs from students of different schools
Just put the students from the same school into a stack. You have as many stacks as schools. Now, sort your stacks by number of students. In your first example {A A A B B C}, you get:
A
A B
A B C
Now take the top element from each of the two largest stacks and pair them. The stack sizes have changed: if needed, reorder the stacks and continue. When you have only one stack left, make pairs from this stack.
The idea is to keep as many "schools-stacks" as possible as long as possible: you spare the students of small stacks until you have no choice but to take them.
Steps with your second example, {A A A A B C}:
A
A
A
A B C => output A, B
A
A
A C => output A, C
A
A => output A A
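The stack procedure above could be sketched as follows. This is only an illustration under my own assumptions: the stacks are represented by a max-heap of remaining counts, the function name is hypothetical, and an even number of students is assumed (a leftover student would simply stay unpaired).

```python
import heapq
from collections import Counter


def pair_across_schools(schools):
    """schools: one school label per student. Returns a list of pairs."""
    # heapq is a min-heap, so counts are stored negated to pop the largest stack.
    heap = [(-count, school) for school, count in Counter(schools).items()]
    heapq.heapify(heap)
    pairs = []
    while len(heap) > 1:
        c1, s1 = heapq.heappop(heap)  # largest stack
        c2, s2 = heapq.heappop(heap)  # second largest stack
        pairs.append((s1, s2))
        # Put the stacks back if they still contain students.
        if c1 + 1 < 0:
            heapq.heappush(heap, (c1 + 1, s1))
        if c2 + 1 < 0:
            heapq.heappush(heap, (c2 + 1, s2))
    if heap:
        # Only one school left: same-school pairs are unavoidable.
        count, school = heap[0]
        pairs.extend([(school, school)] * (-count // 2))
    return pairs


print(pair_across_schools(list("AAABBC")))
print(pair_across_schools(list("AAAABC")))
```

Re-pushing the decremented counts plays the role of "reorder the stacks" in the description.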
It's a matching problem (EDIT 1)
I elaborate on the comments below. You have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
Your solution is to start with the set of all players and split it into two sets that are as diverse as possible. "Diverse" means here: the maximum number of different schools. To do so, you check all possible combinations of elements that split the set into two subsets of equal size. Then you recursively perform the same operation on those sets, until you arrive at the player level.
Another idea is to start with players and try to make pairs with other players from other school. Let's define a distance: 1 if two players are in the same school, 0 if they are in a different school. You want to make pairs with the minimum global distance.
This distance may be generalized for the pairs of players: take the number of common schools. That is: A B A B -> 2 (A & B), A B A C -> 1 (A), A B C D -> 0. You can imagine the distance between two sets (players, pairs, pairs of pairs, ...): the number of common schools. Now you can see this as a graph whose vertices are the sets (players, pairs, pairs of pairs, ...) and whose edges connect every pair of vertices with a weight that is the distance defined above. You are looking for a perfect matching (all vertices are matched) with a minimum weight.
The blossom algorithm or one of its variants seems to fit your needs, but it's probably overkill if the number of players is limited.
Create a two-dimensional array, where the first dimension is the school and the second dimension is the participant from that school.
Load the participants into it and you'll have everything you need in a linear layout.
For example:
School 1 ------- School 2 ------- School 3
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B
A ------------ B
A
A
In the example above, we will have 3 schools (first dimension), with school 1 having 7 participants (second dimension), school 2 having 5 participants and school 3 having 3 participants.
You can also create a second array containing the resulting combinations and, for each chosen pair, delete this pair from the initial array in a loop until it is completely empty and the result array is completely full.
I think the algorithm in this answer could help.
Basically: group the students by school, and use the error tracking idea behind Bresenham's Algorithm to distribute the schools as far apart as possible. Then you pull out pairs from the list.

Algorithm to find matchup for a group with n players and m participant matches

I have the following problem:
A group of n players wants to play a set of matches. Each match has m
participants. I want to find a schedule with a minimum number of games
where every player meets every other player at least once and maximum
variety of opponents.
After some research I found that the "social golfer problem" seems to be a similar problem, but I could not find a solution that I could adapt, nor can I come up with my own solution.
Pseudocode (assuming each player carries a flag):
    function taking array of players (x) and players per game (y) {
        array of players in this game (z)
        for each player (t) in x {
            if z.length == y {break out of loop}
            check the flag of player (t); if the flag is not set {
                check if z.length is less than y {
                    set the flag and add (t) to array z
                }
            }
        }
        if z.length is less than y, change the flags of the players in z back to false
        return (if z.length == y, return z, or else return false);
    }
Start with player A; (assume players A to F, 3 players per game)
By going through from top to bottom, we can eliminate possibilities. Start with each person playing all other players that they have not already played (so, for example, B skips C because B played with C in ABC), in groups of 3. We can write a function to do this (see the pseudocode at the top).
A B C (save this game to a list of games, or increment a counter or something)
A D E
A F -missing (returned false so we did not save this)
B D E
B F -missing
C D E
C F -missing
D E F
Now, almost all players have played each other, if you only count the groups of 3. This is 5 games so far. Remove the games we've already counted, resulting in
A F -missing
B F -missing
C F -missing
What is in common here? They all have F. That means that F must play everyone in this list, so all we need to do is put F in the front.
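We can now do F A B, and then C F + any other player. That makes 7 games in total. Because the procedure is greedy, this is not guaranteed to be the minimum: for example, the six games ABC, ADE, ABF, BDE, CDF, CEF already cover every pair.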
Basically, you can run the pseudocode over and over until it returns false 2 times in a row. When it has returned false 2 times in a row, you know that all flags have been set.
This may not be a complete solution, but... Consider a graph with n nodes. A match with m players can be represented by laying m-1 edges into the graph per round. The requirement that each player meet each other player at least once means that you will after some number of rounds have a complete graph.
For round (match) 1, lay an arbitrary set of m-1 edges. For each next round, lay m-1 edges that are not currently connecting two nodes. Repeat until the graph is complete.
Edit: Edges would need to be laid connected to ensure only m players are in a match for m-1 edges, which would make this a little more difficult. If you lay each round in a walk of the complete graph, the problem is then the same as finding the shortest walk of the complete graph. This answer to a different question may be relevant, and suggests the Floyd-Warshall algorithm.
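One greedy reading of the edge-laying idea, as a hedged sketch: each match is treated as covering every pair among its m players (my interpretation, not something stated above), and the next match is always the group that covers the most pairs that have not met yet. The function name is hypothetical, and the result is a valid schedule but not necessarily a shortest one; the search over all groups is exponential in m, so this only suits small inputs.

```python
from itertools import combinations


def greedy_schedule(players, m):
    """Return a list of m-player games in which every pair meets at least once."""
    # Every pair of players still has to meet.
    unmet = {frozenset(pair) for pair in combinations(players, 2)}
    games = []
    while unmet:
        # Pick the group that covers the most pairs that have not met yet.
        best = max(
            combinations(players, m),
            key=lambda g: sum(frozenset(p) in unmet for p in combinations(g, 2)),
        )
        games.append(best)
        unmet -= {frozenset(p) for p in combinations(best, 2)}
    return games


games = greedy_schedule("ABCDEF", 3)
print(len(games), games)
```

The loop terminates as long as 2 <= m <= len(players), because any unmet pair can be extended to a full group, so every round covers at least one new pair.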

Determine parent, children from pairwise data

I need to determine parent/child relationships from some unusual data.
Flight numbers are marketing creations and they are odd. Flight # 22 by Airline X may refer to a singular trip between X and Y. Flight # 44 from the same airline may actually refer to multiple flights between city pairs. Example:
Flight 44: Dallas - Paris
Flight 44: Dallas - Chicago
Flight 44: Chicago - New York
Flight 44: New York - Paris
Flight 44: Chicago - Paris
Flight 44: Dallas - New York
Reality -- this is the way they work. When I pull the data from the "big list of flight numbers and city pairs" I get those 6 combinations for flight 44. I have passenger counts for each, so if there are 10 people flying Dallas - Paris, I need to take those 10 passengers and add them to the DAL - CHI, CHI - NY, and NY - PAR segments.
From a list of all the segments, I need to figure out "ahhh, this is a flight that goes from Dallas to Paris" --then when I see passenger loads I can increment the city-to-city actual loads accordingly like so:
- value associated with AD --> increment segments AB, BC, CD
- value associated with AC --> increment only segments AB, BC
- value associated with AB --> increment only segment AB
etc.
Assume I get a list of values in no order for flight 44 like this: (DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR). How do I figure out the parent child structure comparing these 4 values in these 6 combinations?
Formulation
Let a_i -> b_i be the ith entry in your list of pairs for flight 44, i = 1..M.
Let V be the set of all unique a_i and b_i values:
V = {a_i | i = 1..M} U {b_i | i = 1..M}
Let E be the set of all pairs (a_i, b_i):
E = {(a_i, b_i) | i = 1..M}
Then G = (V, E) is a directed acyclic graph where vertices V are cities and directed edges E correspond to entries a_i -> b_i in your list.
Algorithm
What you are looking for is a topological sort of the graph G. The linked Wikipedia page has pseudocode for this algorithm.
This will give you a linear ordering of the cities (in your example: [Dallas, Chicago, New York, Paris]) that is consistent with all of the ordering constraints present in your initial list. If your initial list contains fewer than |V| choose 2 pairs (meaning there is not a full set of constraints) then there will potentially be multiple consistent topological orderings of the cities in your set V.
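A minimal sketch of a topological sort (Kahn's algorithm) applied to the flight 44 pairs. The function name is hypothetical; the uniqueness flag relies on the fact that the ordering is unique exactly when only one city is ever available to be placed next.

```python
from collections import defaultdict


def topological_order(pairs):
    """Kahn's algorithm. Returns (order, is_unique)."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    cities = set()
    for a, b in pairs:
        cities.update((a, b))
        if b not in successors[a]:
            successors[a].add(b)
            indegree[b] += 1
    # Cities with no incoming edge can be placed first.
    ready = sorted(c for c in cities if indegree[c] == 0)
    order, unique = [], True
    while ready:
        if len(ready) > 1:
            unique = False  # more than one candidate: several valid orderings
        city = ready.pop(0)
        order.append(city)
        for nxt in sorted(successors[city]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(cities):
        raise ValueError("the pairs contain a cycle")
    return order, unique


flight_44 = [
    ("Dallas", "Paris"), ("Dallas", "Chicago"), ("Chicago", "New York"),
    ("New York", "Paris"), ("Chicago", "Paris"), ("Dallas", "New York"),
]
print(topological_order(flight_44))
```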
Note: This is a common-sense analysis, but see Timothy Shields' solution, where he identifies the problem as a topological sorting problem, which has known computational complexity and known conditions for uniqueness.
I will try to extract the core of the problem from your question in order to describe it formally.
In the above example, you actually have four nodes (cities), for brevity denoted as D, P, C, and NY. You have a set of ordered pairs (x, y), which are interpreted as "on that flight, node x precedes node y". Writing this as x<y, we actually have the following:
(for flight 044):
D < P
D < C
C < NY
NY < P
C < P
D < NY
From these constraints, we want to find an ordered tuple (x, y, z, w) such that x < y < z < w and the above constraints hold. We know that the solution is (x=D, y=C, z=NY, w=P).
Note: It might be that in your database, the first element in your set is always the "origin-destination pair" (in our case, D<P). But it does not change much on the analysis which follows.
How do we find this ordered tuple programmatically? I have fair knowledge of algorithms, but am not aware of a standard method for solving this (other users may help here). I am concerned about the uniqueness of the result. It could be a good unit test of the integrity of your data to require that the solution for that ordered tuple is unique; otherwise you might subsequently increment the wrong segments.
Since we are dealing with the uniqueness issue, I would suggest generating all the permutations of the nodes and displaying all the solutions which are feasible w.r.t. the given constraints.
A naive implementation could look like this:
    import itertools
    nodes = ['D', 'P', 'NY', 'C']
    result = [ot
              for ot in itertools.permutations(nodes)  # ot = ordered tuple
              if ot.index('D') < ot.index('P')
              if ot.index('D') < ot.index('C')
              if ot.index('C') < ot.index('NY')
              if ot.index('NY') < ot.index('P')
              if ot.index('C') < ot.index('P')
              if ot.index('D') < ot.index('NY')
              ]
    print(result)
    # displays: [('D', 'C', 'NY', 'P')]
If the number of nodes is low, this type of "naive" implementation may be sufficient. If the number is higher, I would suggest implementing it in such a way that the constraints are used effectively to prune the solution space (ask me if you need hints for this).
Construct a list of all cities that are either departures or destinations in your flight list. This gives four cities:
Dallas
Paris
Chicago
New York
Iterate over the flight list again and count the number of occurrences of each city as a destination:
0 Dallas
3 Paris
1 Chicago
2 New York
Sort the list by the destination count and you have the route:
Dallas -> Chicago -> New York -> Paris
Note: If the destination counts are not contiguous starting with zero (eg. 0, 1, 2, 3...) it points to either an inconsistent or incomplete departure/destination list for that flight.
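The counting approach above can be sketched in a few lines. The function name and the returned completeness flag are my own additions for illustration; the flag implements the contiguity check from the note.

```python
from collections import Counter


def order_by_destination_count(segments):
    """segments: (departure, destination) pairs for a single flight number.
    Returns (route, is_complete)."""
    cities = {city for segment in segments for city in segment}
    # How often does each city appear as a destination?
    counts = Counter(destination for _, destination in segments)
    route = sorted(cities, key=lambda city: counts[city])
    # A complete, consistent list yields the counts 0, 1, 2, ... with no gaps.
    is_complete = sorted(counts[c] for c in cities) == list(range(len(cities)))
    return route, is_complete


flight_44 = [
    ("Dallas", "Paris"), ("Dallas", "Chicago"), ("Chicago", "New York"),
    ("New York", "Paris"), ("Chicago", "Paris"), ("Dallas", "New York"),
]
print(order_by_destination_count(flight_44))
```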
Okay, take two: here's a function that will take a string such as the one you provided and topologically sort it as per that wikipedia article.
    import re

    def create_nodes(segments):
        remaining_segments = re.findall(r'(\w*?)-(\w*?)[,)]', segments)
        nodes = []
        while len(remaining_segments) > 1:
            outgoing, incoming = zip(*remaining_segments)
            # The next stop is the only departure that is nobody's destination.
            point = next(node for node in outgoing if node not in incoming)
            nodes.append(point)
            remaining_segments = [segment for segment in remaining_segments if segment[0] != point]
        last_segment = remaining_segments.pop()
        nodes.extend(last_segment)
        return nodes
Test:
    >>> foo = '(DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR)'
    >>> create_nodes(foo)
    ['DAL', 'CHI', 'NYC', 'PAR']
Note that this is not a perfect topological sort function for every use; however, for your specific use case of multiple-stop one-way journeys it should be effective.
Have you looked into using a dictionary with a list stored in it?
A dictionary is basically a hash table: you can store a key (the beginning and end point, e.g. AD) and a value (the segments it needs to go through: [AB, BC, CD]).
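A small sketch of that dictionary idea, assuming the ordered route of the flight is already known. Both function names and the passenger counts are hypothetical and only illustrate the lookup.

```python
from itertools import combinations


def segments_by_pair(route):
    """Map every (origin, destination) pair to the legs it flies through."""
    legs = list(zip(route, route[1:]))
    return {(route[i], route[j]): legs[i:j]
            for i, j in combinations(range(len(route)), 2)}


def leg_loads(route, passengers):
    """Add each city-pair passenger count to every leg that pair covers."""
    table = segments_by_pair(route)
    loads = dict.fromkeys(zip(route, route[1:]), 0)
    for pair, count in passengers.items():
        for leg in table[pair]:
            loads[leg] += count
    return loads


route = ["DAL", "CHI", "NYC", "PAR"]
# Made-up passenger counts per booked city pair.
passengers = {("DAL", "PAR"): 10, ("DAL", "CHI"): 5, ("CHI", "NYC"): 2}
print(leg_loads(route, passengers))
```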

Algorithm: find connections between towns with a limit of train changes

What algorithm would you use to create an application that given appropriate data (list of cities, train routes, train stations) is capable of returning a list of connection between any two user-selected cities? The application has to choose only those connections that fall into the limit of accepted train-changes.
Example: I ask the application which train to take if I need to travel from Paris to Moscow with max. 1 stop/switch - the application returns a route: Train 1 (Paris-Berlin) -> Train 2 (Berlin->Moscow) (No direct connection exists).
Graphical example
http://i.imgur.com/KEJ3I.png
If I ask the system about possible connections from Town A to Town G I get a response:
Brown Line (0 switches = direct)
Brown Line to Town B / Orange Line to Town G (1 switch)
Brown Line to Town B / Orange Line to Town D / Red Line to Town G (2 switches)
... all other possibilities
And though the 2nd and 3rd options are shorter than the 1st, it's the 1st that should have priority (since no train-switching is involved).
Assuming the only thing important is "number of stops/switches", then the problem is actually finding a shortest path in an unweighted directed graph.
The graph model is G = (V,E) where V = {all possible stations} and E = { (u,v) | there is a train/route from station u to station v }
Note: say you have a train which starts at a_0 and passes through a_1, a_2, ..., a_n. Then E will contain (a_0,a_1), (a_0,a_2), ..., (a_0,a_n) and also (a_1,a_2), (a_1,a_3), ... Formally: for each i < j, (a_i,a_j) ∈ E.
BFS solves this problem, and is both complete [always finds a solution if there is one] and optimal [finds the shortest path].
If the edges [routes] are weighted, something like Dijkstra's algorithm will be needed instead.
If you want a list of all possible routes, Iterative-Deepening DFS could be used, without maintaining a visited set, and print all the paths found to the target up to the relevant depth. [BFS fails to return all paths with the counter example of a clique]
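A hedged sketch of the BFS approach on the graph model above, where one edge means one train ride, so a path with k edges uses k - 1 switches. The function names and the two sample train lines are hypothetical; trains are treated as one-directional, as in the model.

```python
from collections import deque


def build_graph(lines):
    """lines maps a train name to its ordered list of stops.
    Adds an edge (a_i, a_j) for every i < j on the same train."""
    graph = {}
    for name, stops in lines.items():
        for i, u in enumerate(stops):
            for v in stops[i + 1:]:
                graph.setdefault(u, []).append((v, name))
    return graph


def fewest_switches(lines, start, goal, max_switches):
    """Return the rides of a route with the fewest switches, or None."""
    graph = build_graph(lines)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        station, rides = queue.popleft()
        if station == goal:
            return rides
        if len(rides) > max_switches:
            continue  # boarding another train would exceed the limit
        for nxt, line in graph.get(station, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, rides + [(line, station, nxt)]))
    return None


lines = {"Train 1": ["Paris", "Berlin"], "Train 2": ["Berlin", "Moscow"]}
print(fewest_switches(lines, "Paris", "Moscow", max_switches=1))
```

Because BFS explores routes in order of ride count, the first route that reaches the goal has the fewest switches; it returns a single route rather than all of them.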
I think you need to compute all pairs shortest paths. Check http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.

Seating people in a movie theater

This is based on an article I read about puzzles and interview questions asked by large software companies, but it has a twist...
General question:
What is an algorithm to seat people in a movie theater so that they sit directly beside their friends but not beside their enemies.
Technical question:
Given an N by M grid, fill the grid with N * M - 1 items. Each item has a Boolean association value for each of the other N * M - 2 items. In each row of N, items directly beside other items should have a positive association value for each other. Columns, however, do not matter, i.e. an item can be "enemies" with the item in front of it. Note: If item A has a positive association value for B, then B also has a positive association value for A. It works the same for negative association values. An item is guaranteed to have a positive association with at least one other item. Also, you have access to all of the items and their association values before you start placing them in the grid.
Comments:
I have been researching this problem and thinking about it since yesterday. From what I have found, it reminds me of the bin packing problem with some added requirements. In some free time I attempted to implement it, but large groups of "enemies" were sitting next to each other. I am sure that most situations will have to have at least one pair of enemies sitting next to each other, but my solution was far from optimal. It actually looked as if I had just randomized it.
As far as my implementation went, I made N = 10, M = 10, the number of items = 99, and had an array of size 99 for EACH item holding randomized Boolean values that referred to the friendship with the corresponding item number. This means that each item also had a friendship value corresponding to itself; I just ignored that value.
I plan on trying to reimplement this again later and I will post the code. Can anyone figure out a "good" way to do this to minimize seating clashes between enemies?
This problem is NP-Hard.
Define L = {(G,n,m) | there is a legal seating for G in an n×m matrix, where (u,v) is in E if u is a friend of v}. L is a formal definition of this problem as a language.
Proof:
We will show Hamiltonian-Problem ≤ (p) 2-path ≤ (p) This-problem in 2 steps [Hamiltonian and 2-path defined below], and thus we conclude this problem is NP-Hard.
(1) We will show that finding two paths covering all vertices without using any vertex twice is NP-Hard [let's call such a path: 2-path and this problem as 2-path problem]
A reduction from Hamiltonian Path problem:
input: a graph G=(V,E)
Output: a graph G'=(V',E) where V' = V U {u₀}.
Correctness:
if G has Hamiltonian Path: v₁→v₂→...→vn, then G' has 2-path:
v₁→v₂→...→vn,u₀
if G' has 2-path, since u₀ is isolated from the rest of the vertices, there is a
path: v₁→...→vn, which is Hamiltonian in G.
Thus: G has Hamiltonian path ⇔ G' has 2-path, and thus: 2-path problem is NP-Hard.
(2)We will now show that our problem [L] is also NP-Hard:
We will show a reduction from the 2-path problem, defined above.
input: a graph G=(V,E)
output: (G,|V|+1,1) [a long row with |V|+1 seats].
Correctness:
If G has 2-path, then we can seat the people and use the 1 seat gap
as a 'buffer' between the two paths. This will be a legal perfect seating,
since if v₁ is sitting next to v₂, then v₁→v₂ is in the path, and thus
(v₁,v₂) is in E, so v₁,v₂ are friends.
If (G,|V|+1,1) has a legal seating [v₁,...,vk,buffer,vk+1,...,vn], there is a 2-path in G:
v₁→...→vk, vk+1→...→vn
Conclusion: This problem is NP-Hard, so there is no known polynomial-time solution for it.
Exponential solution:
You might want to use a backtracking solution, which is basically: create all subsets of E with size |V|-2 or less and check which is best.
    static best <- infinity
    least_enemies(G,used):
        if |used| <= |V|-2:
            val <- evaluate(used)
            best <- min(best,val)
        if |used| == |V|-2:
            return
        for each edge e in E-used: //E without used
            least_enemies(G,used + e)
Here we assume evaluate(used) gives the 'score' for this solution. If the solution is completely illegal [i.e. a vertex appears twice], evaluate(used)=infinity. An optimization can of course be made by trimming these cases. To get the actual seating, we can store the currently best solution.
(*)There are probably better solutions, this is just a simple possible solution to start with, the main aim in this answer is proving this problem is NP-Hard.
EDIT: simpler solution:
Create a graph G'=(V U {u₀}, E U {(u₀,v),(v,u₀) | for each v in V}) [u₀ is a junk vertex for the buffer] and a weight function for edges:
    w((u,v)) = 1 if u is a friend of v
    w((u,v)) = 2 if u is an enemy of v
    w((u₀,v)) = w((v,u₀)) = 0
Now you got yourself a classic TSP, which can be solved in O(|V|^2 * 2^|V|) using dynamic programming.
Note that this solution [using TSP] is for one lined theatre, but it might be a good lead to find a solution for the general case.
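A rough sketch of the dynamic-programming idea for a single row, using the weights defined above. It is my own variation rather than the exact TSP formulation: it searches for the cheapest row (a Hamiltonian path through the people plus the buffer seat) instead of a closed tour. The function name and sample data are hypothetical, and the running time is O(n^2 * 2^n) for n seats, so it only works for small rows.

```python
def best_row(people, friends):
    """people: list of names; friends: set of frozenset({a, b}) pairs.
    Returns (total_weight, row), where None marks the empty buffer seat."""
    nodes = list(people) + [None]  # None plays the role of the junk vertex u0
    n = len(nodes)

    def weight(a, b):
        if a is None or b is None:
            return 0
        return 1 if frozenset((a, b)) in friends else 2

    inf = float("inf")
    # cost[mask][last]: cheapest row seating exactly the set `mask`, ending with `last`
    cost = [[inf] * n for _ in range(1 << n)]
    prev = [[None] * n for _ in range(1 << n)]
    for i in range(n):
        cost[1 << i][i] = 0
    for mask in range(1 << n):
        for last in range(n):
            if cost[mask][last] == inf:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue  # already seated
                new_mask = mask | (1 << nxt)
                new_cost = cost[mask][last] + weight(nodes[last], nodes[nxt])
                if new_cost < cost[new_mask][nxt]:
                    cost[new_mask][nxt] = new_cost
                    prev[new_mask][nxt] = last
    full = (1 << n) - 1
    last = min(range(n), key=lambda i: cost[full][i])
    total = cost[full][last]
    # Walk the predecessor links back to recover the seating order.
    row, mask = [], full
    while last is not None:
        row.append(nodes[last])
        last, mask = prev[mask][last], mask ^ (1 << last)
    return total, row[::-1]


friends = {frozenset("ab"), frozenset("cd")}
total, row = best_row(["a", "b", "c", "d"], friends)
print(total, row)
```

In this example the buffer seat ends up between the two friendly pairs, so no enemies sit side by side.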
One algorithm used for large "search spaces" such as this is simulated annealing
