Best approach to a variation of a bucketing problem - algorithm

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. First number represents days to play, next numbers represent available days(1 to 7, MOnday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.

Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2k possibilities for which days have a match. For the rest of this answer, assume we know.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.

I have an idea if you need a basic solution that you can optimize and refine by small steps. I am talking about Flow Networks. Most of those that already know what they are are probably turning their nose because flow network are usually used to solve maximization problem, not optimization problem. And they are right in a sense, but I think it can be initially seen as maximizing the amount of player for each day that play. No need to say it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days in which he wants to play, represented as the capacity of each edge from the Source to node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of this 2nd level edges has a capacity of 1. The third level is filled by the edges that link day_of_week to the sink node. Quick example: player 2 is available 2 days: monday and tuesday, both have a limit of player, which is 12.
Until now 1st, 2nd and 4th constraints are satisfied (well, it was the easy part too): after you found the maximum flow of the entire graph you only select those path that does not have any residual capacity both on 2nd level (from players to day_of_weeks) and 3rd level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization" and under certain conditions, it is possible that it will not find any acceptable path even though it would have found one if it had made different choices while visiting the graph.
This part is the optimization problem that i meant before. I came up with at least two heuristic improvements:
While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority too. In this way the amount of residual capacity of the entire graph is certainly less evenly distributed.
randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is an situation where the majority rule perfectly applies.
Better to specify that everything above is just a starting point: the purpose of heuristic is to find the best approximated solution possible. With this type of problem and given your probably small input, this is not the right way but it is the easiest one when you do not know where to start.

Related

Football Guaranteed Relegation/Promotion Algorithm

I'm wondering if there is a way to speed up the calculation of guaranteed promotion in football (soccer) for a given football league table. It seems like there is a lot of structure to the problem so the exhaustive solution is perhaps slower than necessary.
In the problem there is a schedule (a list of pairs of teams) that will play each other in the future and a table (map) of points each team has earned in games in the past. I've included a sample real life points table below. Each future game can be won, lost or tied and teams earn 3 points for a win and 1 for a tie. Points (Pts) is what ultimately matters for promotion and num_of_promoted_teams (positive integer, usually around 1-3) are promoted at the end of each season.
The problem is to determine which (if any) teams are currently guaranteed promotion. Where guaranteed promotion means that no matter the outcome of the final games the team must end up promoted.
def promoted(num_of_promoted_teams, table, schedule):
return guaranteed_promotions
I've been thinking about using depth first search (of the future game results) to eliminate teams which would lower the average but not the worst case performance. This certainly help early in the season, but the problem could become large in mid-season before shrinking again near the end. It seems like there might be a better way.
A constraint solver should be fast enough in practice thanks to clever pruning algorithms, and hard to screw up. Here’s some sample code with the OR-Tools CP-SAT solver.
from ortools.sat.python import cp_model
def promoted(num_promoted_teams, table, schedule):
for candidate in table.keys():
model = cp_model.CpModel()
final_table = table.copy()
for home, away in schedule:
home_win = model.NewBoolVar("")
draw = model.NewBoolVar("")
away_win = model.NewBoolVar("")
model.AddBoolOr([home_win, draw, away_win])
model.AddBoolOr([home_win.Not(), draw.Not()])
model.AddBoolOr([home_win.Not(), away_win.Not()])
model.AddBoolOr([draw.Not(), away_win.Not()])
final_table[home] += 3 * home_win + draw
final_table[away] += draw + 3 * away_win
candidate_points = final_table[candidate]
num_not_behind = 0
for team, team_points in final_table.items():
if team == candidate:
continue
is_behind = model.NewBoolVar("")
model.Add(team_points < candidate_points).OnlyEnforceIf(is_behind)
model.Add(candidate_points <= team_points).OnlyEnforceIf(is_behind.Not())
num_not_behind += is_behind.Not()
model.Add(num_promoted_teams <= num_not_behind)
solver = cp_model.CpSolver()
status = solver.Solve(model)
if status == cp_model.INFEASIBLE:
yield candidate
print(*promoted(2, {"A": 10, "B": 8, "C": 8}, [("B", "C")]))
Here’s an alternative solution that is less extensible and probably slower in exchange for being self-contained and predictable.
This solution consists of an algorithm to test whether a particular team can finish behind a particular set of other teams (assuming unfavorable tie-breaking), wrapped in a loop over pairs consisting of a top-k team ℓ and a set of k teams W that might or might not finish ahead of ℓ (where k is the number of promoted teams).
If there were no draws, then we could use bipartite matching. Mark ℓ as having lost its remaining matches and mark W as having won their matches against teams not in W. On one side of the bipartite graph, there are nodes corresponding to matches between members of W. On the other side, there are zero or more nodes for each team in W, corresponding to the number of matches that that team must win to pull ahead of ℓ. If there is a matching that completely matches the latter side, then W can finish collectively in front of ℓ, and ℓ is not guaranteed promotion.
This could be extended easily if wins were 2 points instead of 3, but alas, 3 points causes the problem not to be convex, and we’re going to need some branching. The simplest branching method depends on the observation that it’s better for two teams to each win once and lose once against each other than draw twice. Hence, loop over all subsets of at most k choose 2 pairs of teams and run the algorithm above after marking each pair in the subset as having drawn once.
(I could propose improvements, but k is small, computers are cheap, programmers are expensive, and sports fans are relentless.)

probability of winning a special sing-elimination tournament using dynamic programming and bitmask

Let's say we have N teams in a tournament and based on historical data we know what is the probability of each team winning any other team .Lets put all the probabilities in a matrix called P . P[a][b] is the probability team a winning team b. It is obvious that P[a][a] = 0 and P[a][b] = 1-P[b][a].
In this tournament at every round, two of teams compete against each other and the loser is eliminated. This two team are chosen randomly (with equal possibility of each team being picked). So at the first round we have n teams, next n-1 teams and so on until only one team remains and becomes the champion. What is the probability of each team becoming the champion? ( 1 <= N <= 18).
At first when I didn't know how to approach the problem but after some reading and search and keeping in mind that max n is 18 I figured at that using Dynamic programming and Bitmask is the way to go. How ever I couldn't figure at a solution. Here are my problems:
I have really hard time to figure at what are the sub problems and what sub problems should not be recomputed, basically I can't find a well defined recursive ( or not recursive) equation for the problem
In bitmask+dp problems we usually define something like dp[mask][n] or dp[n][mask]. I tried different approaches to define the mask but since the general solution is not clear to me there was no success
Some guidance on this two problems would be very helpful.
This is not really a dynamic programming problem.
If you have a vector V that gives the probability of each player being in the game after n rounds, then you can calculate the player probabilities for n+1 rounds by:
V'i = 2/((18-n)(17-n)) * sum over all j!=i of [ViVjPi,j]
That first factor is the probability that any given available match will be chosen, which depends on the number of previous rounds, because each successive round has fewer players to match up.
The second part is the probability of the players being available for each match, times the probability that the current player will win.
Just do this calculation 17 times to get the player probabilities after 17 rounds, which is the answer you're looking for. You can even drop that first factor, and fix it at the end by normalizing the vector so that the probabilities sum to 1.

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Everyday from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and make sure that nothing goes wrong.
There are currently n applicants to the job, and each of them can work from time si to time ci, i = 1, 2, ..., n.
My goal is to minimize the time that more than two people are keeping watch of the workers at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job and keeping them is my first step, but finding the next step is what troubles me... .
The algorithm must run in polynomial-time.
Any hints(a certain type of data structure maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from start of day up to ci?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If si is equal to the start of day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j. Set cost(i) to the minimum of cost(j)+overlap between i and j. Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) for all values of k where ck is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.

Cost minimizing algorithm (time limited)

Let's say I have a group of N people who is going to travel by train. I need to organize them in a line to the ticket office in a way that minimizes the total tickets cost. The cost can be minimized if families buy family tickets and people travelling to the same destination buy group tickets.
I do not know who of these people are families and where are they travelling.
All I can do is to send any M (1 <= M <= N) of them to the ticket office and get the answer, how much it will cost for these M people.
I also have a limited time, as a train is leaving in some minutes, so the near best solution is good enough for me.
The brute force solution is O(N!) and so is obviously unacceptable.
EDIT
The answer from the ticket office is always the total sum for M people, no details.
Group and/or family may start at 2 people and may include all N of them.
The cushier in the tickets office will always know who is a family and who is not.
EDIT
If I am sending to the tickets office a family and some more people, the family will not get a family ticket, they all will get their regular tickets.
Since you mentioned you can settle for 'good enough', and you are time limited - here is a greedy any-time approach:
p <- Create a random permutation
estimate the cost of this permutation. (let that be cost)
Find a new permutation p' such that cost(p') < cost(p) that is achieved from p using a single swap of two people (there are n(n-1)/2 such possible swaps)1
If such p' exist: p <- p' and return to 2.
Else, store p as a local minimum, and return to 1.
When time is up - choose the best local minimum found.
This is basically a variation of steepest ascent hill climbing with random restarts.
Note that if your time->infinity, you will find optimal solution,
because the probability of checking any possible permutation is
getting closer and closer to 1 as time passes.
(1) getting the price can be done by first checking who is a family member/going to the same destination and is adjacent to each other in the permutation at O(n)using the fact that d(passenger1,passenger2) < d(passenger1) + d(passenger2), and then checking each group separately.

Is there a well understood algorithm or solution model for this meeting scheduling scenario?

I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai,Aj) which indicates that attendant Ai wishes to meet with attendat Aj, and Aj accepted that invitation.
Output:
For each assistant A, a cronogram of all the events he will attend. The main criteria is that each attendants should meet as many of the attendants who accepted his invites as possible, satisfying the space constraints.
So far, we thought of solving with backtracking (trying out all possible solutions), and using linear programming (i.e. defining a model and solving with the simplex algorithm)
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as minimum maximal matching problem in interval graphs, w.l.o.g Assume capacity of rooms is 2 means they can handle only one meeting in time. You can model your problem with Interval graphs, each interval (for each people) is one node. Also edges are if A_i & A_j has common time and also they want to see each other, set weight of edges to the amount of time they should see each other, . If you find the minimum maximal matching in this graph, you can find the solution for your restricted case. But notice that this graph is n-partite and also each part is interval graph.
P.S: note that if the amount of time that people should be with each other is fixed this will be more easier than weighted one.
If you have access to a good MIP solver (cplex/gurobi via acedamic initiative, but coin OR and LP_solve are open-source, and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. Advantage is they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event n in {1,2,...,N}.
Define z(i,j,n) (binary) to indicate they are together in event n.
Define z(i,n) to indicate that i is attending n.
Z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simulteneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2) and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Max sum Z(i,j)
z(i,j)<= sum_m z(i,j,m)
(every i,j)(i and j can meet if they are in the same location m at least once)
z(i,j,m)<= z(i,m) (for every i,j,m)
(if i and j attend m, then i attends m)
z(i,j,m)<= z(j,m) (for every i,j,m)
(if i and j attend m, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' are both overlapping time t, then no person can visit them both. )
As pointed out by #SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of assistants grows a bit (maybe in the order of tens of assistants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer, why do you need to solve this problem? what would be the typical number of attendees? number of rooms?

Resources