I'm making a matchmaking client that matches 10 people together into two teams:
Each person chooses four people they would like to play with, ranked from highest to lowest.
Two teams are then formed out of the strongest relationships in that set.
How would you create an algorithm that solves this problem?
Example:
Given players [a, b, c, d, e, f, g, h, i, j], '->' meaning a preference pick.
a -> b (weight: 4)
a -> c (weight: 3)
a -> d (weight: 2)
a -> e (weight: 1)
b -> d (weight: 4)
b -> h (weight: 3)
b -> a (weight: 2)
...and so on
This problem seemed simple on the surface (after all, it is only a matchmaking client), but after thinking about it for a while, it seems that quite a lot of relationships need to be taken into account.
Edit (pasted from a comment):
Ideally, I would avoid a brute-force approach to scale to larger games which require 100 players and 25 teams, where picking your preferred teammates would be done through a search function. I understand that this system may not be the best for its purpose - however, it is an interesting problem and I would like to find an efficient solution while learning something along the way.
A disclaimer first.
If your user suggested this, there are two possibilities.
Either they can provide the exact details of the algorithm, so ask them.
Or they most probably don't know what they are talking about, and just generated a partial idea on the spot, in which case, it's sadly not worth much on average.
So, one option is to search how matchmaking works in other projects, disregarding the idea completely.
Another is to explore the user's idea.
Probably it won't turn into a good system, but there is a chance it will.
In any case, you will have to do some experiments yourself.
Now, to the case where you are going to have fun exploring the idea.
First, for separating ten items into two groups of five, there are just choose(10,5) = 252 ways to pick the first group (126 distinct splits if the two groups are interchangeable), so, unless the system has to do it millions of times per second, you can just calculate some score for all of them and choose the best one.
The most straightforward way is perhaps to consider all 2^{10} = 1024 ways to form a subset of 10 elements, and then explore the ones where the size of the subset is 5.
But there may be better, more to-the-point, tools readily available, depending on the language or framework.
The 10-choose-5 combination is one group, the items not taken are the other group.
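In Python, for instance, the standard library's itertools.combinations produces the 5-element subsets directly, so there is no need to walk through all 1024 subsets. A minimal sketch (the function name team_splits is made up for illustration; each split shows up twice, once for each choice of "first" team, which is harmless when you only want the best score):

```python
from itertools import combinations

def team_splits(players, team_size):
    """Yield (team_one, team_two) for every way to choose team_one."""
    for team_one in combinations(players, team_size):
        chosen = set(team_one)
        # Everyone not taken forms the other group.
        team_two = tuple(p for p in players if p not in chosen)
        yield team_one, team_two

players = list("abcdefghij")
splits = list(team_splits(players, 5))
print(len(splits))  # 252
```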
So, what would be the score of a combination?
Now we look at our preferences.
For each preference satisfied, we can add its weight, or its weight squared, or otherwise, to the score.
Which works best will need some experimentation.
Similarly, for each preference not satisfied, we can add a penalty depending on its weight.
Next, we can consider all players, and maybe add more penalty for each of the players which has none of their preferences satisfied.
Another thing to consider is team balance.
Since the only data so far are preferences (which may well turn out to be insufficient), an imbalance means that one team has many of their preferences satisfied, and the other has only few, if any at all.
So, we add yet another penalty depending on the absolute difference of (satisfaction sum of the first team) and (satisfaction sum of the second team).
There can certainly be other things to factor in.
Based on all this, construct a system which at least looks plausible on the surface, and then experiment and experiment again, tweaking it so that it better fits the matchmaking goals.
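To make the scoring idea concrete, here is one possible sketch in Python. The penalty constants (miss_penalty, lonely_penalty, balance_penalty) and all function names are my own placeholders, not tuned values; they are exactly the knobs you would experiment with. prefs maps each player to a dict of {picked player: weight}.

```python
from itertools import combinations

def split_score(team_one, team_two, prefs,
                miss_penalty=0.5, lonely_penalty=2.0, balance_penalty=1.0):
    """Score a two-team split; higher is better. All penalties are assumptions."""
    satisfaction = [0, 0]
    score = 0.0
    for i, team in enumerate((team_one, team_two)):
        members = set(team)
        for player in team:
            hits = 0
            for pick, weight in prefs.get(player, {}).items():
                if pick in members:
                    satisfaction[i] += weight        # preference satisfied
                    hits += 1
                else:
                    score -= miss_penalty * weight   # preference not satisfied
            if prefs.get(player) and hits == 0:
                score -= lonely_penalty              # none of this player's picks
    score += satisfaction[0] + satisfaction[1]
    # Penalise splits where one team is much happier than the other.
    score -= balance_penalty * abs(satisfaction[0] - satisfaction[1])
    return score

def best_split(players, prefs, team_size):
    """Try every split and return (score, team_one, team_two) for the best."""
    best = None
    for team_one in combinations(players, team_size):
        chosen = set(team_one)
        team_two = tuple(p for p in players if p not in chosen)
        s = split_score(team_one, team_two, prefs)
        if best is None or s > best[0]:
            best = (s, team_one, team_two)
    return best
```

Calling best_split(list("abcdefghij"), prefs, 5) evaluates all 252 candidates, which is cheap for one match but is still the brute-force approach, so it will not scale to the 100-player case.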
I would think of a way to score proposed teams against people's selections, for example by summing the weights of the picks that a proposed team satisfies.
I would try and optimise this by hill-climbing (e.g. swapping a pair of people and looking to see if that improves the score) if only because people could look at the final solution and try this themselves - so you don't want to miss improvements of this sort.
I would hill-climb multiple times, from different starting points, and pick the answer found with the best score, because hill-climbing will probably end at local optima, not global optima.
At least some of the starting points should be based on people's original selections. This would be easiest if you got people's selections to amount to an entire team's worth of choices, but you can probably build up a team from multiple suggestions if you say that you will follow person A's suggestions, and then person B's selection if needed, and then person C's selection if needed, and so on.
If you include as starting points everybody's selections, or selections based on priority ABCDE.. and then priority BCDE... and then priority CDEF... then you have the property that if anybody submits a perfect selection your algorithm will recognise it as such.
If your hill-climbing algorithm tries swapping all pairs of players to improve, and continues until it finds a local optimum and then stops, then you also have the property that if anybody submits a selection which is only one swap away from perfection, your algorithm will recognise it as such.
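A rough Python sketch of this, assuming the simplest possible score (the sum of the weights of picks that land on the same team). The names score, hill_climb and best_of_restarts are mine; starts is where you would pass in the starting points built from people's own selections, and the rest are random.

```python
import random

def score(teams, prefs):
    """Sum of weights of picks where both players are on the same team."""
    total = 0
    for team in teams:
        members = set(team)
        total += sum(w for p in team
                     for q, w in prefs.get(p, {}).items() if q in members)
    return total

def hill_climb(teams, prefs):
    """Swap pairs across teams until no single swap improves the score."""
    teams = [list(t) for t in teams]
    current = score(teams, prefs)
    improved = True
    while improved:
        improved = False
        for i in range(len(teams)):
            for j in range(i + 1, len(teams)):
                for a in range(len(teams[i])):
                    for b in range(len(teams[j])):
                        teams[i][a], teams[j][b] = teams[j][b], teams[i][a]
                        candidate = score(teams, prefs)
                        if candidate > current:
                            current, improved = candidate, True
                        else:  # no gain, so undo the swap
                            teams[i][a], teams[j][b] = teams[j][b], teams[i][a]
    return current, teams

def best_of_restarts(players, prefs, team_size, starts=(), restarts=20, seed=0):
    """Hill-climb from the given starts plus random ones; keep the best."""
    rng = random.Random(seed)
    candidates = [[list(t) for t in s] for s in starts]
    for _ in range(restarts):
        order = list(players)
        rng.shuffle(order)
        candidates.append([order[k:k + team_size]
                           for k in range(0, len(order), team_size)])
    return max((hill_climb(c, prefs) for c in candidates), key=lambda r: r[0])
```

Because every accepted swap strictly raises the score, the loop has to stop, and it works unchanged for 25 teams of 4 as well as 2 teams of 5.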
My problem is the following:
My team and I are moving to another part of the office and we have to decide where everybody will sit. However, everybody has priorities. I would like to find an algorithm which helps us distribute the seats in a way that satisfies everybody (or most of them, at least).
I've started to implement my own algorithm, where I ask everybody for 3 preferred options (the team consists of 10 people and there are 10 places) and use their "seniority" (the length of time they have spent in the team) as a rank between them.
However, I got stuck, and browsing the internet for an algorithm that solves a similar problem turned up nothing.
What would be the best way to solve this? Is there any generally known algorithm which solves this or a similar problem?
Thank you!
What first comes to mind for me is the stable marriage problem. Here's the problem statement for the original algorithm:
Given n men and n women, where each person has ranked all members of the opposite sex in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. When there are no such pairs of people, the set of marriages is deemed stable.
Please read up on the Gale–Shapley algorithm, which is what I'll adapt for this problem.
Have each worker make a list of their rankings for all the spots. These will be the "men". Then, each spot will use the seniority ranking as their rankings for the "men". The spots will be the "women" in the Gale-Shapley algorithm.
You will get a seat assignment that has no "unstable marriage". Here's what an unstable marriage is:
There is an element A of the first matched set which prefers some given element B of the second matched set over the element to which A is already matched, and
B also prefers A over the element to which B is already matched.
In this context, an unstable marriage means that there is a worker-seat assignment between W1 and S1 such that another worker, W2, has ranked S1 higher than the seat W2 was given. Not only that, S1 has also ranked W2 higher than W1. Since the seats made their lists based on the seniority list, this means that W2 has higher seniority than W1.
In effect, this means that you'll get a seating assignment such that no worker has a seat that someone else with higher seniority wants "more".
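If you would rather see the mechanics than use a package, here is a small Python sketch of Gale-Shapley with the workers proposing. It assumes equal numbers of workers and seats and complete preference lists on both sides; the function name and the sample data are invented for illustration.

```python
def gale_shapley(worker_prefs, seat_prefs):
    """Workers propose to seats. Both arguments map a name to a complete
    list of the other side, most preferred first. Returns {seat: worker}."""
    # rank[seat][worker] is the position of the worker in the seat's list.
    rank = {s: {w: i for i, w in enumerate(order)}
            for s, order in seat_prefs.items()}
    next_choice = {w: 0 for w in worker_prefs}
    holder = {}
    free = list(worker_prefs)
    while free:
        worker = free.pop()
        seat = worker_prefs[worker][next_choice[worker]]
        next_choice[worker] += 1
        if seat not in holder:
            holder[seat] = worker
        elif rank[seat][worker] < rank[seat][holder[seat]]:
            free.append(holder[seat])   # the seat trades up, old holder is free
            holder[seat] = worker
        else:
            free.append(worker)         # rejected, will try the next seat
    return holder

seniority = ["ann", "bob", "cy"]                   # most senior first
seats = ["s1", "s2", "s3"]
seat_prefs = {s: seniority for s in seats}         # every seat ranks by seniority
worker_prefs = {"ann": ["s1", "s2", "s3"],
                "bob": ["s1", "s3", "s2"],
                "cy":  ["s1", "s2", "s3"]}
assignment = gale_shapley(worker_prefs, seat_prefs)
```

With every seat using the same seniority list, the outcome is the same as letting people pick in seniority order: ann gets s1, bob gets his next choice s3, and cy gets s2.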
The bottom of that Wiki article mentions packages in R and Python that have already implemented the algorithm, so it's just up to you to input the preference lists.
Disclaimer: This is probably not the most efficient algorithm. All the seats have the same ranking list, so there's probably a shortcut somewhere. However, it's easier to use a cannon to kill a fly, if the cannon is already written in R/Python for you. Also, this is the only algorithm I remember from uni, so this is the only hammer I have for any nail.
I decided to implement a brute force solution as lots of the comments suggested.
So:
I asked everybody on the team to give a preference order for the seats (from 10 down to 1; I use these numbers as the scores of the "teamMember-seat" pairings, with 10 the highest)
collected all of the "teamMember-seat" pairings with their scores, e.g. name:Steve, seat:seat1, score:5 (the score comes from the order given in the previous step)
generated all the possible seating combinations from these
e.g.
List1: [name:Steve seat:seat1 score:5], [name:John seat:seat2 score:3] ... [name:X seat:seatY score:X]
List2: [name:Steve seat:seat2 score:4], [name:John seat:seat1 score:4] ... [name:X seat:seatY score:X]
...
ListX: [],[]...
chose the "teamMember-seat" list(s) with the highest score (the score of a list is the sum of the scores of its "teamMember-seat" pairings)
if two lists have equal scores, the algorithm chooses the one in which the most senior team members get their most preferred seats
if there is still more than one list (combination), the algorithm chooses one at random
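For anyone who wants the gist without the full code, the steps above boil down to something like the following Python sketch. This is not my original implementation, just a compact reconstruction; the name best_seating is made up, and the seniority tie-break is interpreted as "compare the individual scores in seniority order", which is an assumption.

```python
from itertools import permutations
import random

def best_seating(scores, seniority, rng=random):
    """scores[name][seat_index] is the score that person gave the seat.
    seniority lists the names, most senior first.
    Returns ({name: seat_index}, total_score)."""
    names = list(seniority)
    best_key, best = None, []
    for perm in permutations(range(len(names))):   # perm[i] = seat of names[i]
        per_person = tuple(scores[n][s] for n, s in zip(names, perm))
        # Compare by total first, then by the senior members' own scores.
        key = (sum(per_person), per_person)
        if best_key is None or key > best_key:
            best_key, best = key, [perm]
        elif key == best_key:
            best.append(perm)
    chosen = rng.choice(best)                      # random among remaining ties
    return dict(zip(names, chosen)), best_key[0]
```

For 10 people this loops over 10! = 3,628,800 seatings, which is slow but workable as a one-off.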
I'm sure there are some better algorithms to do this as some of you suggested but I've run out of time.
I didn't post the code since it is really long and not too complicated to implement. However, if you need it, don't hesitate to drop a private message.
SUMMARY
I'm looking for an algorithm to rank objects. Two objects can be compared. However, the comparisons are real world comparisons that may be flawed. Also, I care more about finding out the very best object than which ones are the worst.
TO MOTIVATE:
Think that I'm scientifically evaluating materials. I combine two materials. I want to find the best working material for in-depth testing. So, I don't care about materials that are unpromising. However, each test can be a false positive or have anomalies between those particular two materials.
PRECISE PROBLEM:
There is an unlimited pool of objects.
Two objects can be compared to each other. It is resource expensive to compare two objects.
It's resource expensive to consider an additional object. So, an object should only be included in the evaluation if it can be fully ranked.
It is very important to find the very best object in the pool of the tested ones. If an object is in the bottom half, it doesn't matter to find out where in the bottom half it is. The importance of finding out the exact rank is a gradient with the top much more important.
Most of the time, if A > B and B > C, it is safe to assume that A > C. Sometimes, there are false positives. Sometimes A > B and B > C and C > A. This is not an abstract math space but real world measurements.
At the start, it is not known how many comparisons are allowed to be taken. The algorithm is granted permission to do another comparison until it isn't. Thus, the decision on including an additional object or testing more already tested objects has to be made.
TO MOTIVATE MORE IN-DEPTH:
Imagine that you are tasked with hiring a team of boxers. You know nothing about evaluating boxers but can ask two boxers to fight each other. There is an unlimited number of boxers in the world. But it's expensive to fly them in. Ideally, you want to hire the n best boxers. Realistically, you don't know if the boxers are going to accept your offer. Plus, you don't know how competitively the other boxing clubs bid. You are going to make offers to only the best n boxers, but have to be prepared to know which next n boxers to send offers to. That you only get the worst boxers is very unlikely.
SOME APPROACHES
I could think of the following approaches. However, they all have drawbacks. I feel like there should be a much better approach.
USE TRADITIONAL SORTING ALGORITHMS
Traditional sorting algorithms could be used.
Drawback:
- A false positive could seriously throw off the correctness of the algorithm.
- A sorting algorithm would spend half the time sorting the bottom half of the pack, which is unimportant.
- Sorting algorithms start with all items. With this problem, we are allowed to do the first test, not knowing if we are allowed to do a second test. We may end up only being allowed to do two tests. Or we may be allowed to do a million tests.
USE TOURNAMENT ALGORITHMS
There are algorithms for tournaments. E.g., everyone gets a first match. The winner of the first match moves on to the next round. There is a variety of tournament strategies that accounts for people having a bad day or being paired with the champion in their first match.
Drawback:
- This seems pretty promising. The difficulty is to find one that allows adding one more player at a time as we are allowed more comparisons. It seems that there should be a highly specialized solution that's better than a standard tournament algorithm.
BINARY SEARCH
We could start with two objects. Each time an object is added, we could use a binary search to find its spot in the ranking. Because the top is more important, we could use a weighted binary search. E.g. instead of testing the mid point, it tests the point at the top 1/3.
Drawback:
- The algorithm doesn't correct for false positives. If there is a false positive at the top early on, it could skew the whole rest of the tests.
COUNT WINS AND LOSSES
The wins and losses could be counted. The algorithm would choose test subjects by a priority of the least losses and second priority of the most wins. This would focus on testing the best objects. If an object has zero losses, it would get the focus of the testing. It would either quickly get a loss and drop in priority, or it would get a lot more tests because it's the likely top candidate.
DRAWBACK:
- The approach is very nice in that it corrects for false positives. It also allows adding more objects to the test pool easily. However, it does not consider that a win against a top object counts a lot more than a win against a bottom object. Thus, comparisons are wasted.
GRAPH
All the objects could be added to a graph. The graph could be flattened.
DRAWBACK:
- I don't know how to flatten such a messy graph that could have cycles and ambiguous end nodes. There could be multiple objects that are undefeated. How would one pick a winner in such a messy graph? How would one know which comparison would be the most valuable?
SCORING
As the value of a win depends on the rank of the loser, a win could be given a score. Say A > B means that A gets 1 point. If C > A, C gets 2 points because A has 1 point. In the end, objects are ranked by how many points they have.
DRAWBACK
- The approach seems promising in that it is easy to add new objects to the pool of tested objects. It also takes into account that wins against top objects should count for more. However, I can't think of a good way to determine the points. The first comparison was awarded 1 point. Once 10,000 objects are in the pool, an average win would be worth 5,000 points. The awards for both tests should be roughly equal. As it stands, later comparisons overpower the earlier ones and cause them to be ignored when they shouldn't be.
Does anyone have a good idea on tackling this problem?
I would search for an easily computable value for an object, that could be compared between objects to give a good enough approximation of order. You could compare each new object with the current best accurately, then insertion sort the loser into a list of the rest using its computed value.
The best will always be accurate. The ordering of the rest depends on how good your "value" is.
I would suggest looking into the Elo rating system and its derivatives (such as Glicko, BayesElo, WHR and TrueSkill).
You assign each object a preliminary rating, and then update that rating according to the matches/comparisons you make, with bigger changes to the ratings the more unexpected the outcome was.
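For reference, the basic Elo update looks like this in Python. The 400-point scale and K = 32 are the conventional chess values and are only assumptions here; you would tune K to how noisy your comparisons are. The function names are mine.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one comparison."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    change = k * (actual_a - expected_a)   # large when the result is a surprise
    return rating_a + change, rating_b - change
```

Two objects starting at 1500 move to 1516 and 1484 after one comparison, while an upset win by a 1400-rated object over a 1600-rated one moves both ratings by roughly 24 points. A single false positive therefore shifts the ratings but can be corrected by later comparisons.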
This still leaves open the question of how to decide which object to compare to which other object to gain most information. For that I suggest looking into tournament systems and playoff formats. Though I suspect that an optimal solution will be decidedly more ad-hoc than that.
I was out buying groceries the other day and needed to search through my wallet to find my credit card, my customer rewards (loyalty) card, and my photo ID. My wallet has dozens of other cards in it (work ID, other credit cards, etc.), so it took me a while to find everything.
My wallet has six slots in it where I can put cards, with only the first card in each slot initially visible at any one time. If I want to find a specific card, I have to remember which slot it's in, then look at all the cards in that slot one at a time to find it. The closer it is to the front of a slot, the easier it is to find it.
It occurred to me that this is pretty much a data structures question. Suppose that you have a data structure consisting of k linked lists, each of which can store an arbitrary number of elements. You want to distribute elements into the linked lists in a way that minimizes looking up. You can use whatever system you want for distributing elements into the different lists, and can reorder lists whenever you'd like. Given this setup, is there an optimal way to order the lists, under any of the assumptions:
You are given the probabilities of accessing each element in advance and accesses are independent, or
You have no knowledge in advance what elements will be accessed when?
The informal system I use in my wallet is to "hash" cards into different slots based on use case (IDs, credit cards, loyalty cards, etc.), then keep elements within each slot roughly sorted by access frequency. However, maybe there's a better way to do this (for example, storing the k most frequently-used elements at the front of each slot regardless of their use case).
Is there a known system for solving this problem? Is this a well-known problem in data structures? If so, what's the optimal solution?
(In case this doesn't seem programming-related: I could imagine an application in which the user has several drop-down lists of commonly-used items, and wants to keep those items ordered in a way that minimizes the time required to find a particular item.)
Although not a full answer for general k, this 1985 paper by Sleator and Tarjan gives a helpful analysis of the amortised complexity of several dynamic list update algorithms for the case k=1. It turns out that move-to-front is very good: assuming fixed access probabilities for each item, it never requires more than twice the number of steps (moves and swaps) that would be required by the optimal (static) algorithm, in which all elements are listed in nonincreasing order of probability.
Interestingly, a couple of other plausible heuristics -- namely swapping with the previous element after finding the desired element, and maintaining order according to explicit frequency counts -- don't share this desirable property. OTOH, on p. 2 they mention that an earlier paper by Rivest showed that the expected amortised cost of any access under swap-with-previous is <= the corresponding cost under move-to-front.
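To make the two heuristics concrete, here is a small Python sketch for a single list (k=1). The function names are mine, and the cost is simply the number of items examined.

```python
def access_move_to_front(items, target):
    """Find target, move it to the front, return how many items were examined."""
    position = items.index(target)
    items.insert(0, items.pop(position))
    return position + 1

def access_transpose(items, target):
    """Find target, swap it with the item just before it, return the cost."""
    position = items.index(target)
    if position > 0:
        items[position - 1], items[position] = items[position], items[position - 1]
    return position + 1

cards = ["work id", "visa", "library", "photo id"]
first = access_move_to_front(cards, "photo id")    # examines 4 cards
second = access_move_to_front(cards, "photo id")   # now it is at the front
```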
I've only read the first few pages, but it looks relevant to me. Hope it helps!
You need to look at skip lists. There is a similar problem with arranging stations for a train system where there are express trains and regular trains. An express train stops only at express stations, while regular trains stop at both regular and express stations. Where should the express stops be placed so that one can minimize the average number of stops when travelling from a start station to any station?
The solution is to use stations at the triangular numbers (i.e., at 1, 3, 6, 10, etc., where T_n = n * (n + 1) / 2).
This is assuming all stops (or cards) are equally likely to be accessed.
If you know the access probabilities of your n cards in advance and you have k wallet slots and accesses are independent, isn't it fairly clear that the greedy solution is optimal? That is, the most frequently-accessed k cards go at the front of the pockets, next-most-frequently accessed k go immediately behind, and so forth? (You never want a lower-probability card ranked before a higher-probability card.)
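A quick Python sketch of that greedy layout, assuming the cost of an access is just the card's depth within its slot (front = 1) and that you already know which slot to look in. The function names and the sample probabilities are made up.

```python
def fill_slots(probabilities, k):
    """Deal cards into k slots round-robin, most frequently accessed first."""
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    slots = [[] for _ in range(k)]
    for index, card in enumerate(ordered):
        slots[index % k].append(card)
    return slots

def expected_cost(slots, probabilities):
    """Expected depth of the accessed card within its slot."""
    return sum(probabilities[card] * depth
               for slot in slots
               for depth, card in enumerate(slot, start=1))

probabilities = {"visa": 0.4, "id": 0.3, "loyalty": 0.2, "gym": 0.1}
slots = fill_slots(probabilities, 2)
cost = expected_cost(slots, probabilities)
```

With two slots this puts visa and id at the front and loyalty and gym behind them, for an expected cost of 0.7 * 1 + 0.3 * 2 = 1.3.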
If you don't know the access probabilities, but you do know they exist and that card accesses are independent, I imagine sorting the cards similarly, but by number-of-accesses-seen-so-far instead is asymptotically optimal. (Move-to-front is cool too, but I don't see an obvious reason to use it here.)
Perhaps you get something interesting if you penalise card moves as well; if I have any known probability distribution on card accesses, independent or not, I just greedily re-sort the cards every time I do an access.
I'm trying to write a program to automate a ticket draft.
We have a certain number of season ticket passes and want to split up the tickets among a group of people. There are X number of games, Y number of season passes, and Z number of people. Each of Z people has ranked the X games.
My code basically goes through the draft order and back picking out the tickets from their ranking if available, otherwise, picking the next highest ranking. For the most part it works. The problem is, there's a point where most of the tickets are taken and the remaining tickets left are ones you already have so you just don't pick them. People therefore have different numbers of tickets. Is there a good way to get around this?
If you have X games and Y season passes, presumably there are X*Y tickets available to give to the Z people, right?
This sounds like it could be treated as an optimization problem, but to do so you have to identify your main goals. I'm guessing you want each person to receive X*Y / Z tickets (split them evenly), but maybe not. I'm guessing you also want to maximize the aggregate satisfaction (defined in some way according to the rankings) with the tickets. You would probably want to apply a large satisfaction penalty when a person receives more than 1 ticket for the same game. I believe this last aspect might be why the straight draft approach is not the best, but I could be mistaken.
Once you are clear on what you are trying to optimize (if this is indeed an optimization problem), then you can consider the best approach to the problem. This could be your own custom-built solution, or you could try an existing technique (genetic algorithm, etc.). Before doing so though it is important that you frame the problem properly.
If there were no preferences involved, this would be a straightforward max-flow (min-cut) problem (http://en.wikipedia.org/wiki/Maximum_flow_problem), set up as follows:
Create a source vertex A. Link A to Z vertices, one for each person; the capacity of these edges can be infinite (or very, very large). Create a sink B, and create X vertices, one for each game, each linked to B with capacity Y (you have Y tickets per game). From each person, link to each game they've ranked, with capacity 1.
If you look at the wiki link above, there are about 10 algorithms to solve this basic problem. Find one you understand and can implement yourself, because you'll need to modify it slightly. I'm not familiar with all of them, but the ones I know about have a step 'pick an edge' or 'pick a path.' You should modify the 'how you pick an edge' logic to take the priority ordering of the games into account. I'm not sure exactly what the ordering should be (you'll probably need to experiment), but if you say the lowest ranked game is 1, the next is 2, up to X, then a score like 'ranking of the edge - number of games the person is already signed up for' might work.
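As a starting point, here is a plain Edmonds-Karp sketch in Python for the network described above. It ignores the rankings entirely (BFS picks whatever path it finds first), so the 'pick a path' step is the part you would still need to modify. The names max_flow and build_network are mine, and per_person_cap is an extra knob I added so you can cap each person at an even share instead of using an infinite capacity.

```python
from collections import deque

def build_network(rankings, passes_per_game, per_person_cap):
    """rankings: person -> list of games they ranked. Source is 'A', sink 'B'."""
    capacity = {"A": {}, "B": {}}
    def add_edge(u, v, cap):
        capacity.setdefault(u, {})[v] = cap
        capacity.setdefault(v, {}).setdefault(u, 0)   # residual back edge
    for person, games in rankings.items():
        add_edge("A", ("person", person), per_person_cap)
        for game in games:
            add_edge(("person", person), ("game", game), 1)
            add_edge(("game", game), "B", passes_per_game)
    return capacity

def max_flow(capacity, source, sink):
    """Edmonds-Karp; modifies the residual capacities in place."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:           # BFS for a path
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:                  # smallest capacity on path
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:                  # push flow along the path
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        flow += bottleneck
```

After the call, a person holds a ticket for a game exactly when the back edge capacity[("game", g)][("person", p)] equals 1.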
I think this is a variant of the Stable Marriage Problem or the Stable Roommates Problem, for which there are known algorithms.
Ladies and Gents,
My best friends and I do a "Secret Santa" type gift exchange every year, and this year I've been trying to think of a couple of ways to make it interesting. There are six of us involved and I want to design a small program that allows the six of us to rank our preferred gift-recipients from 1 to 5 as well as our preferred gift-givers.
So, let's say we're called A, B, C, D, E and F.
A submits two lists:
List 1 - People I would most like to give a present to: B, D, C, F, E
List 2 - People I would most like to receive a present from: F, D, E, B, C
All six of us will submit both these lists, so I'll have 12 lists all together. I suppose my question is what is the best algorithm to now go ahead and assign each person a gift recipient?
I thought of something like this:
If two people have both selected each other in their opposing lists (i.e. A most wants to give to B, B most wants to get from A) then I immediately assign A to B. So now A is removed from our list of gift-recipients and B is removed from our pool of gift-givers.
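In Python, that first step could look roughly like this (a sketch only; the function name and the dict-of-lists layout are just how I imagine storing the 12 lists):

```python
def perfect_matches(give_prefs, receive_prefs):
    """give_prefs[a]: people a wants to give to, most preferred first.
    receive_prefs[b]: people b wants to receive from, most preferred first.
    Returns {giver: recipient} for every mutual first choice."""
    matches = {}
    for giver, wanted in give_prefs.items():
        if not wanted:
            continue
        recipient = wanted[0]
        favourite_givers = receive_prefs.get(recipient, [])
        if favourite_givers and favourite_givers[0] == giver:
            matches[giver] = recipient
    return matches
```

A recipient has only one top-choice giver, so no recipient can be assigned twice by this step.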
Once I've assigned the "perfect matches" I'm kind of lost, though. Is there an established algorithm for situations like this? Obviously it's only for entertainment value, but surely there must be a "real" application of something similar? Perhaps timetabling or something?
My Google-fu has failed me, but I have a feeling it might just be due to my own lack of precision in search terms.
Cheers,
(and Happy Holidays I guess),
Rob
Update / Part 2
Okay, Ying Xiao came to the rescue by recommending the Gale-Shapley algorithm for the Stable Marriage Problem; I've implemented that in Python and it works a treat. However, here is a thought that occurred to me. Within our group of six best friends there are three pairings of "extra-best" friends, so I have a feeling we'll just end up with three pairs of AB, CD, EF and BA, DC, FE in terms of gift giving and receiving.
Is there an algorithm we could design that takes people's rankings into account but also prevents two people from forming a "closed group"? That is, if A is assigned to buy a gift for B, B cannot be assigned to buy a gift for A? Perhaps I need to solve the Stable Roommates problem?
Related questions:
Secret santa algorithm.
What is the best low-tech protocol to simulate drawing names out of a hat and ensure secrecy?
The Gale-Shapley algorithm (for the Stable Marriage problem) applies only when each person has a ranked list of all other participants -- you may or may not be able to convert your problem to that form (make everyone rank everyone).
Also, note that the thing it is optimizing for is something different: it tries to find a set of stable marriages, where no pair of people will "elope" because they prefer each other to their current partners. This is not something you care about in your Secret Santa application.
What you want (depending on your definition of "best") is a maximum-weight bipartite matching, which fixes both the above objections: put the "givers" on one side, the "receivers" on the other (so two copies of each person, in this case), give each edge a weight corresponding to how highly that giver ranks that receiver, and it is now the assignment problem. You can use the Hungarian algorithm for this, or simpler (slower) ones. You can also vary how you assign the weights to optimize for different things (e.g. maximize the number of people who get their first choice, or minimize the worst choice that anyone gets, etc.)
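For six people you don't even need the Hungarian algorithm; trying every permutation is one of the "simpler (slower)" options. A Python sketch, under my own assumptions: the weight of an edge is n - 1 for a giver's first choice down to 1 for the last, nobody gives to themselves, and forbid_pairs optionally rules out A giving to B while B gives to A. Only the "give" lists are used here; the "receive" lists could be added to the weights in the same way.

```python
from itertools import permutations

def best_assignment(give_prefs, forbid_pairs=False):
    """give_prefs[giver]: receivers, most preferred first.
    Returns (total_weight, {giver: receiver}), or None if nothing is allowed."""
    people = sorted(give_prefs)
    n = len(people)
    weight = {g: {r: n - 1 - i for i, r in enumerate(prefs)}
              for g, prefs in give_prefs.items()}
    best = None
    for perm in permutations(people):
        mapping = dict(zip(people, perm))
        if any(g == r for g, r in mapping.items()):
            continue                      # nobody buys their own gift
        if forbid_pairs and any(mapping[mapping[g]] == g for g in people):
            continue                      # no closed two-person groups
        total = sum(weight[g].get(r, 0) for g, r in mapping.items())
        if best is None or total > best[0]:
            best = (total, mapping)
    return best
```

The loop is O(n!), so it is fine for 6 friends (720 permutations) but not for large groups, where a proper assignment-problem solver is the better choice.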
If you do use the Gale-Shapley stable marriage algorithm, note that it is optimal for the "proposers" (male-optimal and female-pessimal), so be sure to put the "givers" as the "proposers", and not vice versa.
For each person, create two virtual people, a "giver" and a "receiver". Now match the set of givers against the set of receivers using the Gale-Shapley algorithm, which runs in O(n^2) time.
http://en.wikipedia.org/wiki/Stable_marriage_problem