Algorithm to create fair / evenly matched teams based on player rankings - algorithm

I have a data set of players' skill ranking, age and sex and would like to create evenly matched teams.
Teams will have the same number of players (currently 8 teams of 12 players).
Teams should have the same or similar male to female ratio.
Teams should have similar age curve/distribution.
I would like to try this in Haskell but the choice of coding language is the least important aspect of this problem.

This is a bin packing problem, or a multi-dimensional knapsack problem. Björn B. Brandenburg has made a bin packing heuristics library in Haskell that you may find useful.
You need something like...
data Player = P { skill :: Int, gender :: Bool, age :: Int }
Decide on a number of teams n (I'm guessing this is a function of the total number of players).
Find the desired total skill per team:
teamSkill n ps = sum (map skill ps) / n
Find the ideal gender ratio:
genderRatio ps = sum (map (\x -> if gender x then 1 else 0)) / length ps
Find the ideal age variance (you'll want the Math.Statistics package):
ageDist ps = pvar (map age ps)
And you must assign the three constraints some weights to come up with a scoring for a given team:
score skillW genderW ageW team = skillW * sk + genderW * g + ageW * a
where (sk, (g, a)) = (teamSkill 1 &&& genderRatio &&& ageDist) team
The problem reduces to the minimization of the difference in scores between teams. A brute force approach will take time proportional to Θ(nk−1). Given the size of your problem (8 teams of 12 players each), this translates to about 6 to 24 hours on a typical modern PC.
EDIT
An approach that may work well for you (since you don't need an exact solution in practise) is simulated annealing, or continual improvement by random permutation:
Pick teams at random.
Get a score for this configuration (see above).
Randomly swap players between two or more teams.
Get a score for the new configuration. If it's better than the previous one, keep it and recurse to step 3. Otherwise discard the new configuration and try step 3 again.
When the score has not improved for some fixed number of iterations (experiment to find the knee of this curve), stop. It's likely that the configuration you have at this point will be close enough to the ideal. Run this algorithm a few times to gain confidence that you have not hit on some local optimum that is considerably worse than ideal.

Given the number of players per team and the gender ration (which you can easily compute). The remaining problem is called n-partition problem, which is unfortunately NP-complete and thus very hard to solve exactly. You will have to use approximative or heuristic allgorithms (evolutionary algorithms), if your problem size is too big for a brute force solution. A very simple approximation would be sorting by age and assign in an alternating way.

Assign point values to the skill levels, gender, and age
Assign the sum of the points for each criteria to each player
Sort players by their calculated point value
Assign the next player to the first team
Assign players to the second team until it has >= total points than the first team or the team reaches the maximum players.
Perform 5 for each team, looping back to the first team, until all players are assigned
You can tweak the skill level, gender, and age point values to change the distribution of each.

Lets say you have six players (for a simple example). We can use the same algorithm which pairs opponents in single-elimination tournaments and adapt that to generate "even" teams based on any criteria you choose.
First rank your players best-to-worst. Don't take this too literally. You want a list of players sorted by the criteria you wish to separate them.
Why?
Let's look at single elimination tournaments for a second. The idea of using an algorithm to generate optimal single-elimination matches is to avoid the problem of the "top players" meeting too soon in the tournament. If top players meet too soon, one of the top players will be eliminated early on, making the tournament less interesting. We can use this "optimal" pairing to generate teams in which the "top" players are spread out evenly across the teams. Then spread out the the second top players, etc, etc.
So list you players by the criteria you want them separated: men first, then women... sorted by age second. We get (for example):
Player 1: Male - 18
Player 2: Male - 26
Player 3: Male - 45
Player 4: Female - 18
Player 5: Female - 26
Player 6: Female - 45
Then we'll apply the single-elimination algorithm which uses their "rank" (which is just their player number) to create "good match ups".
The single-elimination tournament generator basically works like this: take their rank (player number) and reverse the bits (binary). This new number you come up with become their "slot" in the tournament.
Player 1 in binary (001), reversed becomes 100 (4 decimal) = slot 4
Player 2 in binary (010), reversed becomes 010 (2 decimal) = slot 2
Player 3 in binary (011), reversed becomes 110 (6 decimal) = slot 6
Player 4 in binary (100), reversed becomes 001 (1 decimal) = slot 1
Player 5 in binary (101), reversed becomes 101 (5 decimal) = slot 5
Player 6 in binary (110), reversed becomes 011 (3 decimal) = slot 3
In a single-elimination tournament, slot 1 plays slot 2, 3-vs-4, 5-vs-6. We're going to uses these "pair ups" to generate optimal teams.
Looking at the player number above, ordered by their "slot number", here is the list we came up with:
Slot 1: Female - 18
Slot 2: Male - 26
Slot 3: Female - 45
Slot 4: Male - 18
Slot 5: Female - 26
Slot 6: Male - 45
When you split the slots up into teams (two or more) you get the players in slot 1-3 vs players in slot 4-6. That is the best/optimal grouping you can get.
This technique scales very well with many more players, multiple criteria (just group them together correctly), and multiple teams.

Idea:
Sort players by skill
Assign best players in order (i.e.: team A: 1st player, team B: 2nd player, ...)
Assign worst players in order
Loop on 2
Evaluate possible corrections and perform them (i.e.: if team A has a total skill of 19 with a player with skill 5 and team B has a total skill of 21 with a player with skill 4, interchange them)
Evaluate possible corrections on gender distribution and perform them
Evaluate possible corrections on age distribution and perform them

Almost trivial approach for two teams:
Sort all player by your skill/rank assessment.
Assign team A the best player.
Assign team B the next two best players
Assign team A the next two best players
goto 3
End when you're out of players.
Not very flexible, and only works on one column ranking, so it won't try to get similar gender or age profiles. But it does make fair well matched teams if the input distribution is reasonably smooth. Plus it doesn't always end with team A have the spare player when there are an odd number.

Well,
My answer is not about scoring strategies of teams/players because all the posted are good, but I would try a brute force or a random search approach.
I don't think it's worth create a genetic algorithm.
Regards.

Related

Optimizing a Combination Algorithm

I'm organizing a game tournament in a week. I began thinking about the best way to mathematically arrange the teams (so that they are really even, thus more competitive). So here's the data:
20 players in the event
Each player is assigned a skill level (high number = skilled)
4 teams of 5 players each (although I'd prefer to build a algorithm that takes these as
variables)
I'm using a computer to solve the problem
So, I have 20 players. I'd like to generate 4 teams with 5 players each. To do this, I'd like to generate a list of all possible team combinations. To evaluate a team combination, I:
Generate a combination of teams (a match)
Sum the total skill for each team based off the players in that team
Compare each team to each other, the highest difference between any two teams in the match is the "tolerance" level for that match. If the tolerance level is higher than a certain cap, the match is discarded
My current approach is to generate a base X number that is N digits long, where X is the number of teams I want, and N is the number of players. Then increment the base X number by 1, I'll get every possible team combination, and I can generate a list of matches that have low tolerance values.
The problem with this, as you probably know, is for 4 teams with 20 players, that's (4-1)^20 in base 3, which is 1E12 matches to check through. (This takes a long time on my computer). Is there a mathematical way to simplify this calculation to be doable in a short period of time?
By current method also allows for the possibility of uneven players spread across the number of teams, which is preferable. If this can't be present with a highly performant algorithm, then it's okay not to use it.
Try following approach:
For teams 1 to 4: Take the strongest player from the remaining
The the same in another direction: from 4 to 1
Again 1 to 4
Again 4 to 1
In the last round use random
This works well when player skills are distributed more or less evenly. If not, then the probability of bigger differences between teams is higher.

Developing player rankings with ELO

I recently created a tournament system that will soon lead into player rankings. Basically, after players are done with the tournament, they are given a rank based on how they did in the tournament. So the person who won the tournament will have the most points and be ranked #1, while the second will have the second most points and be ranked #2, and so on...
However, after they are ranked in the new rankings, they can challenge other members and have a way to play other members and change their ranks. So basically (using a ranking system), if Player A who is ranked #2 beats Player B who is ranked #1, Player A will now become #1.
I've also decided that if a player wants to compete in the rankings but was not present during the tournament, they can sign up after the tournament, and will be given the lowest possible rank with the lowest points (but they have a chance to move up).
So now, I am wanting to know which way should I go about planning this. When I convert the players from tournament to match rankings, I have to identify them with points in order to rank them. I decided this seems like the best way to do it.
1 1000
2 900
3 800
4 700
5 600
6 500
7 400
8 300
9 200
10 100
After looking on the internet I've decided it would be wise to use ELO to give players their new rank after they players have matched against each other.. I went about it on this page: http://www.lifewithalacrity.com/2006/01/ranking_systems.html
So if I go about it this way, lets say I have rank #10 facing rank #1. According to the website above, my formula is:
R' = R + K * (S - E)
and the rating of #10 only has 100 points where #1 has 1,000.
So after doing the math rank #10's expected value of beating #1 is:
1 / [ 1 + 10 ^ ( [1000 - 100] / 400) ]
= 0.55%
So
100 + 32 * (1 - 0.52)
= 115.36
The problem I have with ELO is it makes no sense. After A rank such as #10 beats #1, he should not gain something as low as 15 points. I'm not sure if i'm doing the math wrong, or if I'm splitting up the points wrong. Or maybe I shouldn't use ELO at all? Any suggestions would be very helpful
Don't get offended, it is your table that doesn't make sense.
Elo system is based on the premise that a rating is an accurate estimate of the strength, and difference of ratings accurately predicts an outcome of a match (a player better by 200 point is expected to score 75%). If an actual outcome does not agree with a prediction, it means that ratings do not reflect strength, hence must be adjusted according to how much an actual outcome differs from the predicted.
An official (as in FIDE) Elo system has few arbitrary arbitrary constants (e.g. 200/75 gauge, Erf as predictor, etc); choosing them (reasonably) different may lead to a different rating values, yet would result (in a long run) in the same ranking. There is some interesting math behind this assertion; this is not a right place to get into details.
Now back to your table. It assigns the rating based on the place, not on the points scored. The champion gets 1000 no matter whether she swept the tournament with an absolute 100% result, or barely made it among equals. These points do not estimate the strength of the participants.
So my advise is to abandon the table altogether, assign each new player an entry rating (say, 1000; it really doesn't matter as long as you are consistent), and stick to Elo from the very beginning.

Find all possible combinations of a scores consistent with data

So I've been working on a problem in my spare time and I'm stuck. Here is where I'm at. I have a number 40. It represents players. I've been given other numbers 39, 38, .... 10. These represent the scores of the first 30 players (1 -30). The rest of the players (31-40) have some unknown score. What I would like to do is find how many combinations of the scores are consistent with the given data.
So for a simpler example: if you have 3 players. One has a score of 1. Then the number of possible combinations of the scores is 3 (0,2; 2,0; 1,1), where (a,b) stands for the number of wins for player one and player two, respectively. A combination of (3,0) wouldn't work because no person can have 3 wins. Nor would (0,0) work because we need a total of 3 wins (and wouldn't get it with 0,0).
I've found the total possible number of games. This is the total number of games played, which means it is the total number of wins. (There are no ties.) Finally, I have a variable for the max wins per player (which is one less than the total number of players. No player can have more than that.)
I've tried finding the number of unique combinations by spreading out N wins to each player and then subtracting combinations that don't fit the criteria. E.g., to figure out many ways to give 10 victories to 5 people with no more than 4 victories to each person, you would use:
C(14,4) - C(5,1)*C(9,4) + C(5,2)*C(4,4) = 381. C(14,4) comes from the formula C(n+k-1, k-1) (google bars and strips, I believe). The next is picking off the the ones with the 5 (not allowed), but adding in the ones we subtracted twice.
Yeah, there has got to be an easier way. Lastly, the numbers get so big that I'm not sure that my computer can adequately handle them. We're talking about C(780, 39), which is 1.15495183 × 10^66. Regardless, there should be a better way of doing this.
To recap, you have 40 people. The scores of the first 30 people are 10 - 39. The last ten people have unknown scores. How many scores can you generate that are meet the criteria: all the scores add up to total possible wins and each player gets no more 39 wins.
Thoughts?
Generating functions:
Since the question is more about math, but still on a programming QA site, let me give you a partial solution that works for many of these problems using a symbolic algebra (like Maple of Mathematica). I highly recommend that you grab an intro combinatorics book, these kind of questions are answered there.
First of all the first 30 players who score 10-39 (with a total score of 735) are a bit of a red herring - what we would like to do is solve the other problem, the remaining 10 players whose score could be in the range of (0...39).
If we think of the possible scores of the players as the polynomial:
f(x) = x^0 + x^1 + x^2 + ... x^39
Where a value of x^2 is the score of 2 for example, consider what this looks like
f(x)^10
This represents the combined score of all 10 players, ie. the coefficent of x^385 is 2002, which represents the fact that there are 2002 ways for the 10 players to score 385. Wolfram Alpha (a programming lanuage IMO) can evaluate this for us.
If you'd like to know how many possible ways of doing this, just substitute in x=1 in the expression giving 8,140,406,085,191,601, which just happens to be 39^10 (no surprise!)
Why is this useful?
While I know it may seem silly to some to set up all this machinery for a simple problem that can be solved on paper - the approach of generating functions is useful when the problem gets messy (and asymptotic analysis is possible). Consider the same problem, but now we restrict the players to only score prime numbers (2,3,5,7,11,...). How many possible ways can the 10 of them score a specific number, say 344? Just modify your f(x):
f(x) = x^2 + x^3 + x^5 + x^7 + x^11 ...
and repeat the process! (I get [x^344]f(x)^10 = 1390).

Algorithm to select random pairs, schedule matchups

I'm working in Ruby, but I think this question is best asked agnostic of language. It may be assumed that we have access to basic list/array functions, as well as a "random" number generator. Here's what I'd like to be able to do:
Given a collection of n teams, with n even,
Randomly pair each team with an opponent, such that every team is part of exactly one pair. Call this ROUND 1.
Randomly generate n-2 subsequent rounds (ROUND 2 through ROUND n-1) such that:
Each round has the same property as the first (every team is a
member of one pair), and
After all the rounds, every team has faced every other team exactly once.
I imagine that algorithms for doing exactly this must be well known, but as a self-taught coder I'm having trouble figuring out how to find them.
I belive You are describing a round robin tournament. The wikipedia page gives an algorithm.
If You need a way to randomize the schedule, randomize team order, round order, etc.
Well not sure if this is the most efficient algorithm but:
Randomly assign N teams into two lists of same length n/2 (List1, List2)
Starting with i = 0:
Create pairs: List1[i],List2[i] = a team pair
Repeat for i = 1-> (n/2-1)
For rounds 2-> n/2-1:
Rotate List2, so that the first team in List2 is now at the end.
Repeat steps 2 through 5, until List2 has been cycled once.
This link was very helpful to me the last time I wrote a round robin scheduling algorithm. It includes a C implementation of a first fit algorithm for round robin pairings.
http://www.devenezia.com/downloads/round-robin/
In addition to the algorithm, he has some helpful links to other aspects of tournament scheduling (balancing home and away games, as well as rotating teams across fields/courts).
Note that you don't necessarily want a "random" order to the pairings in all cases. If, for example, you were scheduling a round robin soccer league for 8 games that only had 6 teams, then each team is going to have to play two other teams twice. If you want to make a more enjoyable season for everyone, you have to start worrying about seeding so that you don't have your top 2 teams clobbering the two weakest teams in their last two games. You'd be better off arranging for the extra games to be paired against teams of similar strength/seeding.
Based on info I found through Maniek's link, I went with the following:
A simple round robin algorithm that
a. Starts with pairings achieved by zipping [0,...,(n-1)/2] and [(n-1)/2 + 1,..., n-1]. (So, if n==10, we have 0 paired with 5, 1 with 6, etc.)
b. Rotates all but one team n-2 times clockwise until all teams have played each other. (So in round 2 we pair 1 with 6, 5 with 7, etc.)
Randomly assigns one of [0,..., n-1] to each of the teams.

I have an algorithm problem having to do with scheduling teams in rotation as fairly as possible

I have a project for school where I have to come up with an algorithm for scheduling 4 teams to play volleyball on one court, such that each team gets as close to the same amount of time as possible to play.
If you always have the winners stay in and rotate out the loser, then the 4th ranked team will never play and the #1 team always will.
The goal is to have everybody play the same amount of time.
The simplest answer is team 1 play team 2, then team 3 play team 4 and keep switching, but then team 1 never gets to play team 3 or 4 and so on.
So I'm trying to figure out an algorithm that will allow everybody to play everybody else at some point without having one team sit out a lot more than any other team.
Suggestions?
How about this: Make a hashtable H of size NC2, in this case, 6. It looks like:
H[12] = 0
H[13] = 0
H[14] = 0
H[23] = 0
H[24] = 0
H[34] = 0
I am assuming it would be trivial to generate the keys.
Now to schedule a game, scan through the hash and pick the key with the lowest value (one pass). The teams denoted by the key play the game and you increment the value by one.
EDIT:
To add another constraint that no team should wait too long, make another hash W:
W[1] = 0
W[2] = 0
W[3] = 0
W[4] = 0
After every game increment the W value for the team that did not play, by one.
Now when picking up the least played team if there are more than one team combo with low play score, take help from this hash to determine which team must play next.
well you should play 1-2 3-4, 1-3 2-4, 1-4 2-3 and then start all over again.
If there are N teams and you want all pairs of them to play once, then there are "N choose 2" = N*(N-1)/2 games you need to run.
To enumerate them, just put the teams in an ordered list and have the first team play every other team, then have the second team play all the teams below it in the list, and so on. If you want to spread the games out so teams have similar rest intervals between games, then see Knuth.
Check out the wikipedia entry on round robin scheduling.
pretend it's a small sports league, and repeat the "seasons"...
(in most sports leagues in Europe, all teams play against all other teams a couple of times during a season)
The REQUIREMENTS for the BALANCED ROUND ROBIN algorithm, for the Team championship scheduling may be found here:
Constellation Algorithm - Balanced Round Robin
The requirements of the algorithm can be defined by these four constraints:
1) All versus all
Each team must meet exactly once, and once only, the other teams in the division/ league.
If the division is composed of n teams, the championship takes place in the n-1 rounds.
2) Alternations HOME / AWAY rule
The sequence of alternations HOME / AWAY matches for every teams in the division league, should be retained if possible. For any team in the division league at most once in the sequence of consecutive matches HAHA, occurs the BREAK of the rhythm, i.e. HH or AA match in the two consecutive rounds.
3) The rule of the last slot number
The team with the highest slot number must always be positioned in the last row of the grid. For each subsequent iteration the highest slot number of grid alternates left and right position; left column (home) and right (away).
The system used to compose the league schedule is "counter-clockwise circuit." In the construction of matches in one round of the championship, a division with an even number of teams. If in a division is present odd number of teams, it will be inserted a BYE/Dummy team in the highest slot number of grid/ring.
4) HH and AA non-terminal and not initial
Cadence HH or AA must never happen at the beginning or at the end of the of matches for any team in the division.

Resources