Combinatorial optimization for selecting the players of a football team

Combinatorial optimization for selecting the players of a football team - algorithm

I'm thinking of using integer programming to generate the optimal combination of football players to comprise a team.
Though the problem itself is simple, I'm having trouble formulating an efficient constraint for position eligibility.
I've searched stackoverflow for variants on the classic integer programming problems and came up with a temporary solution, but want to verify if the current one is efficient.
Problem
Select 11 players for a football (soccer) team from a finite set of potential players.
The team is composed of 11 slots: 5 offense, 5 defense, and 1 goal keeper.
Each role has its own respective pool of eligible players.
(e.g. goalies can only be selected from a set of players that have gloves)
Each player is assigned a 'skill score' that represents their skill
Objective function
Maximize: sum of all 11 player skill scores
Constraints
All 11 slots must be filled with at least 1 player
1 player may occupy only 1 slot
Each slot must be filled by a player that is eligible for the role
My temporary solution
Let xij: player i in consideration for slot j (j=1, 2,..., 11, 1 <= i <= n)
and w: n x 1 weight matrix representing skill score for player i
1. sum_j(xij) == 1 for each slot j
2. 0 <= sum_i(xij) <= 1 for each player i
3. 0 <= xij <= 1 if player i is eligible for slot j, == 0 otherwise
maximize sum(wTx)
I could not yet think of an elegant way to operationalize 3, so my answer for now is to hard code each cell to be 0.
I'm planning to use an integer programming library in Python (cvxpy or PuLP).
Would the current design lead to any issues in convergence or computation time?
Would there be a more efficient way to model the problem?
Notes
For simplicity, let's assume that while a player can be a candidate for multiple roles, their skill point does not change depending on role
Would the problem formulation change if the players' skill scores change depending on their synergy with other players? I'm thinking that simply expanding the x matrix by nC2 possible interactions could work, but am curious if there are better solutions.

Your IP formulation looks OK. However, this problem can also be solved using Dynamic Programming:
Let the array dp[n][mask] represent the maximum possible score you can get for placing players from 1 to n into positions that the 0 digits in the mask's binary representation correspond. For example dp[5][0b11111110101] is equal to the maximum score you can get for placing players 1 to 5 into positions 2 and 4. (second and fourth bits of the mask are zero.)
Now we recursively define dp[n][mask]:
dp[n][mask] = max{
dp[n-1][mask], # the case when we don't use player n
max_j{ dp[n-1][mask|(1<<j)] + w[n] } (1 <= j <= 11 and mask's jth bit is 0)
}
The base case is when n=0.
dp[0][mask] = {
-infinity, if there are 0 bits in mask # there are empty positions left, we want to strictly discourage this case.
0, if all 11 bits of mask is 1
}

Related

Algorithm to optimize a team formation based on player rating and power

I want to create an algorithm for a hypothetical game where you can create as many groups as desired with a given list of players.
Suppose I have a list of players, where every player is represented by their rating.
Given The following matrix
The numbers in yellow correspond to the amount of players in any given group.
The numbers in white correspond to the score that each player in the group is contributing.
The numbers in orange correspond to the rating threshold needed for the corresponding score.
For example, if I have a group of players of rating [50, 100], using the matrix it can be determined that they are each generating a score of 26.45, since the total rating is 150 and there are two players in that group. The total score in that team is 52.90.
Ideally the algorithm would return the groups that yield to the best score, with the constraints that I can make as many groups as wanted, and not all players need to be put in a group.
What would be a good way to get started or solve this algorithm?

I'd reduce this problem to weighted set packing and use a mixed integer program (MIP) solver library such as OR-Tools.
For every subset S of players that can form a group, we have a variable x(S) that is 1 if we choose the subset and 0 otherwise. We let the score of the subset be score(S), so that the objective is to maximize the sum over S of score(S) x(S). We have a constraint for each player p: the sum over all S containing p of x(S) is at most 1.
For example, with three players with all nonempty subsets as possible groups, we would get
maximize score({1}) x({1}) + score({2}) x({2}) + score({3}) x({3}) +
score({1,2}) x({1,2}) + score({1,3}) x({1,3}) + score({2,3}) x({2,3}) +
score({1,2,3}) x({1,2,3})
subject to
x({1}) + x({1,2}) + x({1,3}) + x({1,2,3}) <= 1
x({2}) + x({1,2}) + x({2,3}) + x({1,2,3}) <= 1
x({3}) + x({1,3}) + x({2,3}) + x({1,2,3}) <= 1
x({1}), ..., x({1,2,3}) in {0, 1}
Unless the scores are close to linear, modern MIP solvers on modern hardware should be able to scale to 20 players. If not, there are more complicated techniques.

Calculate the winning strategy of a subtraction game

Problem:
Given 100 stones, two players alternate to take stones out. One can take any number from 1 to 15; however, one cannot take any number that was already taken. If in the end of the game, there is k stones left, but 1 through k have all been previously taken, one can take k stones. The one who takes the last stone wins. How can the first player always win?
My Idea
Use recursion (or dynamic programming). Base case 1, where player 1 has a winning strategy.
Reducing: for n stones left, if palyer 1 takes m1 stones, he has to ensure that for all options player 2 has (m2), he has a winning strategy. Thus the problem is reduced to (n - m1 - m2).
Follow Up Question:
If one uses DP, the potential number of tables to be filled is large (2^15), since the available options left depend on the history, which has 2^15 possibilities.
How can you optimize?

Assuming that the set of numbers remaining can be represented as R, the highest number remaining after your selection can be represented by RH and the lowest number remaining can be RL, the trick is to use your second-to-last move to raise the number to <100-RH, but >100-RH-RL. That forces your opponent to take a number that will put you in winning range.
The final range of winning, with the total number that you create with your second-to-last move, is:
N < 100-RH
N > 100-RH-RL
By observation I noted that RH can be as high as 15 and as low as 8. RL can be as low as 1 and as high as 13. From this range I evaluated the equations.
N < 100-[8:15] => N < [92:85]
N > 100-[8:15]-[1:13] => N > [92:85] - [1:13] => N > [91:72]
Other considerations can narrow this gap. RL, for instance, is only 13 in an edge circumstance that always results in a loss for Player A, so the true range is between 72 and 91. There is a similar issue with RH and the low end of it, so the final ranges and calculations are:
N < 100-[9:15] => N < [91:85]
N > 100-[9:15]-[1:12] => N > [91:85] - [1:12] => N > [90:73]
[90:73] < N < [91:85]
Before this, however, the possibilities explode. Remember, this is AFTER you choose your second-to-last number, not before. At this point they are forced to choose a number that will allow you to win.
Note that 90 is not a valid choice to win with, even though it might exist. Thus, the maximum it can be is 89. The real range of N is:
[88:73] < N < [90:85]
It is, however, possible to calculate the range of the number that you're using to put your opponent in a no-win situation. In the situation you find yourself in, the lowest number or the highest number might be the one you chose, so if RHc is the highest number you can pick and RLc is the lowest number you can pick, then
RHc = [9:15]
RLc = [1:12]
With this information, I can begin constructing a relative algorithm starting from the end of the game.
N*p* - RHp - RLp < Np < N*p* - RHp, where p = iteration and *p* = iteration + 1
RHp = [8+p:15]
RLp = [1:13-p]
p = -1 is your winning move
p = 0 is your opponent's helpless move
p = 1 is your set-up move
Np is the sum of that round.
Thus, solving the algorithm for your set-up move, p=1, you get:
N*p* - [9:15] - [1:12] < Np < N*p* - [9:15]
100 <= N*p* <= 114
I'm still working out the math for this, so expect adjustments. If you see an error, please let me know and I'll adjust appropriately.

Here is a simple, brute force Python code:
# stoneCount: number of stones to start the game with
# possibleMoves: which numbers of stones may be removed? (*sorted* list of integers)
# return value: signals if winning can be forced by first player;
# if True, the winning move is attached
def isWinningPosition(stoneCount, possibleMoves):
if stoneCount == 0:
return False
if len(possibleMoves) == 0:
raise ValueError("no moves left")
if stoneCount in possibleMoves or stoneCount < possibleMoves[0]:
return True,stoneCount
for move in possibleMoves:
if move > stoneCount:
break
remainingMoves = [m for m in possibleMoves if m != move]
winning = isWinningPosition(stoneCount - move, remainingMoves)
if winning == False:
return True,move
return False
For the given problem size this function returns in less than 20 seconds on an Intel i7:
>>> isWinningPosition(100, range(1,16))
False
(So the first play cannot force a win in this situation. Whatever move he makes, it will result in a winning position for the second player.)
Of course, there is a lot of room for run time optimization. In the above implementation many situations are reached and recomputed again and again (e.g. when the first play takes one stone and the second player takes two stones this will put the first player into the same situation as when the number of stones taken by each player are reversed). So the first (major) improvement is to memorize already computed situations. Then one could go for more efficient data structures (e.g. encoding the list of possible moves as bit pattern).

Algorithm to calculate sum of points for groups with varying member count [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
Let's start with an example. In Harry Potter, Hogwarts has 4 houses with students sorted into each house. The same happens on my website and I don't know how many users are in each house. It could be 20 in one house 50 in another and 100 in the third and fourth.
Now, each student can earn points on the website and at the end of the year, the house with the most points will win.
But it's not fair to "only" do a sum of the points, as the house with a 100 students will have a much higher chance to win, as they have more users to earn points. So I need to come up with an algorithm which is fair.
You can see an example here: https://worldofpotter.dk/points
What I do now is to sum all the points for a house, and then divide it by the number of users who have earned more than 10 points. This is still not fair, though.
Any ideas on how to make this calculation more fair?
Things we need to take into account:
* The percent of users earning points in each house
* Few users earning LOTS of points
* Many users earning FEW points (It's not bad earning few points. It still counts towards the total points of the house)
Link to MySQL dump(with users, houses and points): https://worldofpotter.dk/wop_points_example.sql
Link to CSV of points only: https://worldofpotter.dk/points.csv

I'd use something like Discounted Cumulative Gain which is used for measuring the effectiveness of search engines.
The concept is as it follows:
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
score = 0;
FOR (int i = 0; i < scores.length; i++):
score += scores[i]/log2(i);
END_FOR
RETURN score;
END_FUNCTION;
This must be somehow modified as this way of measuring focuses on the first result. As this is subjective you should decide on your the way you would modify it. Below I'll post the code which some constants which you should try with different values:
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
score = 0;
FOR (int i = 0; i < scores.length; i++):
score += scores[i]/log2(i+K);
END_FOR
RETURN L*score;
END_FUNCTION
Consider changing the logarithm.
Tests:
int[] g = new int[] {758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1};
int[] s = new int[] {612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1};
int[] h = new int[] {813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1};
int[] r = new int[] {1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1};
The output is for different offsets:
1182
1543
1847
2286
904
1231
1421
1735
813
1120
1272
1557

It sounds like some sort of constraint between the houses may need to be introduced. I might suggest finding the person that earned the most points out of all the houses and using it as the denominator when rolling up the scores. This will guarantee the max value of a user's contribution is 1, then all the scores for a house can be summed and then divided by the number of users to normalize the house's score. That should give you a reasonable comparison. It does introduce issues with low numbers of users in a house that are high achievers in which you may want to consider lower limits to the number of house members. Another technique may be to introduce handicap scores for users to balance the scales. The algorithm will most likely flex over time based on the data you receive. To keep it fair it will take some responsive action after the initial iteration. Players can come up with some creative ways to make scoring systems work for them. Here is some pseudo-code in PHP that you may use:
<?php
$mostPointsEarned; // Find the user that earned the most points
$houseScores = [];
foreach ($houses as $house) {
$numberOfUsers = 0;
$normalizedScores = [];
foreach ($house->getUsers() as $user) {
$normalizedScores[] = $user->getPoints() / $mostPointsEarned;
$numberOfUsers++;
}
$houseScores[] = array_sum($normalizedScores) / $numberOfUsers;
}
var_dump($houseScores);

You haven't given any examples on what should be preferred state, and what are situations against which you want to be immune. (3,2,1,1 compared to 5,2 etc.)
It's also a pity you haven't provided us the dataset in some nice way to play.
scala> val input = Map( // as seen on 2016-09-09 14:10 UTC on https://worldofpotter.dk/points
'G' -> Seq(758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1),
'S' -> Seq(612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1),
'H' -> Seq(813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1),
'R' -> Seq(1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1)
) // and the results on the website were: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954
Here is what I thought of:
def singleValuedScore(individualScores: Seq[Int]) = individualScores
.sortBy(-_) // sort from most to least
.zipWithIndex // add indices e.g. (best, 0), (2nd best, 1), ...
.map { case (score, index) => score * (1 + index) } // here is the 'logic'
.max
input.mapValues(singleValuedScore)
res: scala.collection.immutable.Map[Char,Int] =
Map(G -> 1044,
S -> 1590,
H -> 1968,
R -> 2018)
The overall positions would be:
Ravenclaw with 2018 aggregated points
Hufflepuff with 1968
Slytherin with 1590
Gryffindor with 1044
Which corresponds to the ordering on that web: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954.
The algorithms output is maximal product of score of user and rank of the user within a house.
This measure is not affected by "long-tail" of users having low score compared to the active ones.
There are no hand-set cutoffs or thresholds.
You could experiment with the rank attribution (score * index or score * Math.sqrt(index) or score / Math.log(index + 1) ...)

I take it that the fair measure is the number of points divided by the number of house members. Since you have the number of points, the exercise boils down to estimate the number of members.
We are in short supply of data here as the only hint we have on member counts is the answers on the website. This makes us vulnerable to manipulation, members can trick us into underestimating their numbers. If the suggested estimation method to "count respondents with points >10" would be known, houses would only encourage the best to do the test to hide members from our count. This is a real problem and the only thing I will do about it is to present a "manipulation indicator".
How could we then estimate member counts? Since we do not know anything other than test results, we have to infer the propensity to do the test from the actual results. And we have little other to assume than that we would have a symmetric result distribution (of the logarithm of the points) if all members tested. Now let's say the strong would-be respondents are more likely to actually test than weak would-be respondents. Then we could measure the extra dropout ratio for the weak by comparing the numbers of respondents in corresponding weak and strong test-point quantiles.
To be specific, of the 205 answers, there are 27 in the worst half of the overall weakest quartile, while 32 in the strongest half of the best quartile. So an extra 5 respondents of the very weakest have dropped out from an assumed all-testing symmetric population, and to adjust for this, we are going to estimate member count from this quantile by multiplying the number of responses in it by 32/27=about 1.2. Similarly, we have 29/26 for the next less-extreme half quartiles and 41/50 for the two mid quartiles.
So we would estimate members by simply counting the number of respondents but multiplying the number of respondents in the weak quartiles mentioned above by 1.2, 1.1 and 0.8 respectively. If however any result distribution within a house would be conspicuously skewed, which is not the case now, we would have to suspect manipulation and re-design our member count.
For the sample at hand however, these adjustments to member counts are minor, and yields the same house ranks as from just counting the respondents without adjustments.

I got myself to amuse me a little bit with your question and some python programming with some random generated data. As some people mentioned in the comments you need to define what is fairness. If as you said you don't know the number of people in each of the houses, you can use the number of participations of each house, thus you motivate participation (it can be unfair depending on the number of people of each house, but as you said you don't have this data on the first place).
The important part of the code is the following.
import numpy as np
from numpy.random import randint # import random int
# initialize random seed
np.random.seed(4)
houses = ["Gryffindor","Slytherin", "Hufflepuff", "Ravenclaw"]
houses_points = []
# generate random data for each house
for _ in houses:
# houses_points.append(randint(0, 100, randint(60,100)))
houses_points.append(randint(0, 50, randint(2,10)))
# count participation
houses_participations = []
houses_total_points = []
for house_id in xrange(len(houses)):
houses_total_points.append(np.sum(houses_points[house_id]))
houses_participations.append(len(houses_points[house_id]))
# sum the total number of participations
total_participations = np.sum(houses_participations)
# proposed model with weighted total participation points
houses_partic_points = []
for house_id in xrange(len(houses)):
tmp = houses_total_points[house_id]*houses_participations[house_id]/total_participations
houses_partic_points.append(tmp)
The results of this method are the following:
House Points per Participant
Gryffindor: [46 5 1 40]
Slytherin: [ 8 9 39 45 30 40 36 44 38]
Hufflepuff: [42 3 0 21 21 9 38 38]
Ravenclaw: [ 2 46]
House Number of Participations per House
Gryffindor: 4
Slytherin: 9
Hufflepuff: 8
Ravenclaw: 2
House Total Points
Gryffindor: 92
Slytherin: 289
Hufflepuff: 172
Ravenclaw: 48
House Points weighted by a participation factor
Gryffindor: 16
Slytherin: 113
Hufflepuff: 59
Ravenclaw: 4
You'll find the complete file with printing results here (https://gist.github.com/silgon/5be78b1ea0b55a20d90d9ec3e7c515e5).

You should enter some more rules to define the fairness.
Idea 1
You could set up the rule that anyone has to earn at least 10 points to enter the competition.
Then you can calculate the average points for each house.
Positive: Everyone needs to show some motivation.
Idea 2
Another approach would be to set the rule that from each house only the 10 best students will count for the competition.
Positive: Easy rule to calculate the points.
Negative: Students might become uninterested if they see they can't reach the top 10 places of their house.

From my point of view, your problem is diveded in a few points:
The best thing to do would be to re - assignate the player in the different Houses so that each House has the same number of players. (as explain by #navid-vafaei)
If you don't want to do that because you believe that it may affect your game popularity with player whom are in House that they don't want because you can change the choice of the Sorting Hat at least in the movie or books.
In that case, you can sum the point of the student's house and divide by the number of students. You may just remove the number of student with a very low score. You may remove as well the student with a very low activity because students whom skip school might be fired.
The most important part for me n your algorithm is weather or not you give points for all valuables things:
In the Harry Potter's story, the students earn point on the differents subjects they chose at school and get point according to their score.
At the end of the year, there is a special award event. At that moment, the Director gave points for valuable things which cannot be evaluated in the subject at school suche as the qualites (bravery for example).

Need help making team LINEUP by dynamic programming

Imagine you are a football team-coach. There are 11 players and 11 different positions in the field in which players play. Each player is capable of playing at all 11 different positions with a certain rating at the specified position.
You being the coach of the team have to decide the strongest possible LINEUP for the team (consisting of all 11 players) such that overall rating (i.e, sum of ratings) is maximized.
No two players can play at the same position.
As an example, consider a smaller LINEUP problem in which only 3 players play a certain game.
3 2 1
4 1 5
6 7 3
Player 1 can play at position 1 with rating 3, at position 2 with rating 2 and at position 3 with rating 1. Similarly for all the players the ith column represents their rating at ith position. The best LINEUP will be when player 1 plays at position 1, player 2 at position 3 and player 3 at position 2, resulting in maximum rating = 15 (3 + 5 + 7).
So, how can this problem be solved by Dynamic Programming? I have read on forums someone solving this problem by DP but I am unable to figure out how the problem possesses Optimal Substructure. So please help me figure out that....
Plz also mention that is it possible or not to solve the problem by DP
And please edit the title appropriately...

I believe you have an assignment problem here, which can be solved by the Hungarian Method.
And if you really want to have a DP solution, here is one idea. Why not have a F[i,j], i=0..11, j = 0..2^11-1. i - is the number of players you are allowed to select from and j - is a bitmask representing the field positions that are covered and F is the maximum "team value" you can get. For i = 1, for example, only those values of j whose binary representation contains at most one set bit are "valid". Obviously you cannot conver more than one position with just one player.
// initialize the F[][] array with -infinity
F[0][0] = 0;
for(i = 1; i <= 11; ++1)
{
for(j = 0; j < 2^11; ++j)
for(k = 0; k < 11; ++k )
if( (j & (1 << k)) == 0 ) // k-th position is not occupied?
F[i][j | (1 << k)] = max( F[i][j | (1 << k)], F[i-1][j] + <value of payer i playing at position k> );
}
ANSWER = F[11][2^11-1]
This obviously can be optimized: for F[i] we are only interested in bitmasks containing exactly i set bits.
There was a name for the technique of turning a combinatorial problem into a DP problem by representing the possible states with a bitmap, but I don't remember it. The proper solution for this is still Hungarian method.

it's a matching problem.
u can use KM algorithm to solve that wiki
and Hungarian Method which Alex mentioned is a special case of KM.
for DP method, Ales gave the right answer. since the number of players is not big, a bit manipulation method can be used here

Ranking algorithms to compare "Rankings"

Is there an algorithm that allows to rank items based on the difference of the position of those items in two rankings but also "weighted" with the position, e.g. one Player that goes from position 2->1 should be ranked higher than a player that went from 9->8.
Toy example, I have two lists/ranks:
Rank 1:
Player a
Player b
Player c
Player d
...
Rank 2:
Player d
Player c
Player a
Player b
...
I was thinking to "weight" the ranking difference with the average ranking (or other value), for example, if a player goes from 9->8 the value used to rank will be (9-8)/avg(8,9) = 1/8,5.

What you want seems more or less equivalent to Spearman's rank correlation in non-parametric statistics. It basically sums the squares of the amount_moved (the difference between the old rank and the new rank)

Number your list backwards. Calculate the "value" of moving between positions as the difference of the squares of these numbers.
So if you've got 10 items in your list:
2->1 would be 10^2 - 9^2 = 19
9->8 would be 3^2 - 2^2 = 5.
Hard to tell if this is exactly what you're after without knowing what kind of relative weights you're after. If this doesn't quite suit you, try raising/lowering the exponent to find something that fits.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio