Algorithm Development for a Mentoring Event

I am attempting to create an algorithm that pairs mentees with a different group of 1-4 mentors in each "round" of mentoring.
The algorithm would accept 3 inputs:
The number of mentees - an integer between 1 and 35
The number of mentors - Always greater than or equal to the number of mentees, but never greater than 35
The number of rounds of mentoring to perform
Given these three inputs, the algorithm would match each mentee with 1-4 mentors, with the following restrictions:
If it is possible to do so with the given inputs, a mentee must never be paired with the same mentor more than once
If it is possible to do so with the given inputs, a mentor must never be in the same group of mentors as another mentor more than once
Note: The algorithm does NOT have to assign mentors to mentees randomly. The code can give the same output each time it is run with the same set of inputs.
Here is an example of the output of a successful algorithm with 2 rounds, 4 mentees, and 11 mentors:
| Mentee | Round 1 | Round 2 |
|--------|---------|---------|
| 1 | 1, 2, 3 | 4, 7, 10 |
| 2 | 4, 5, 6 | 1, 8, 11 |
| 3 | 7, 8, 9 | 2, 5 |
| 4 | 10, 11 | 3, 6, 9 |
Please let me know if you have any questions, or if what I am asking for is in fact impossible. Thank you very much for your time and assistance, and have a great day.
Sincerely,
Tyrovar

Interesting problem!
It's unlikely that it has a fast solution, unfortunately, because it's related to some long-standing combinatorial problems that, as far as I'm aware, don't have efficient constructive solutions.
You can think of the problem in two parts: (1) deciding the groups of mentors for each round, and then (2) pairing these groups with mentees.
Call n the number of mentors and m <= n the number of mentees. And assume m evenly divides n. (If not, you could just add m - (n mod m) "fake" mentors, i.e. empty chairs.)
Then the first part (1) alone, figuring out the groups of mentors in each round, is equivalent (with sufficiently many rounds) to finding a Steiner system S(2, n/m, n), in which every pair of mentors shares a group exactly once; when the groups have three mentors, this is a Steiner triple system. There has been a good deal of research on these combinatorial objects, and some algorithms have been published, but to my knowledge none of them is fast or even polynomial-time.
However, for reasonable parameter sizes, you could hope to generate the groups of mentors with a greedy approach: for each mentor, keep track (in a set data structure) of which other mentors that mentor has already been grouped with. For each round, form groups of mentors that haven't been grouped with each other yet, and then add each group's members to one another's sets. If this fails, you may have to resort to a brute-force search to find the groups.
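A minimal Python sketch of that greedy step, assuming mentors are numbered 0..n-1 (already padded with "empty chairs" so that the number of groups divides n) and that `history` maps each mentor to the set of mentors they have already shared a group with. The name `greedy_groups` and its first-fit scan order are illustrative choices, not a published algorithm; it returns None when it gets stuck, which is where the fallback search would take over.

```python
def greedy_groups(num_mentors, num_groups, history):
    """Form one round of mentor groups greedily (first fit).

    num_mentors: mentors are numbered 0..num_mentors-1 (already padded)
    num_groups:  m, one group per mentee
    history:     dict mentor -> set of mentors already grouped with; updated in place
    Returns a list of groups, or None if the greedy scan gets stuck.
    """
    size = num_mentors // num_groups
    unused = list(range(num_mentors))
    groups = []
    for _ in range(num_groups):
        group = []
        for mentor in list(unused):          # iterate over a copy; we remove as we go
            # Accept only mentors who never shared a group with anyone already here.
            if all(mentor not in history[g] for g in group):
                group.append(mentor)
                unused.remove(mentor)
                if len(group) == size:
                    break
        if len(group) < size:
            return None                       # stuck: fall back to a search
        groups.append(group)
    for group in groups:                      # record the new co-memberships
        for a in group:
            history[a].update(b for b in group if b != a)
    return groups
```

With 9 mentors in groups of 3, this first-fit scan produces two valid rounds and then gets stuck in the third, even though a conflict-free third round exists ([0, 4, 8], [1, 5, 6], [2, 3, 7]), which is exactly why a fallback search is needed.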
Once you have the groups of mentors, pairing them with the mentees in each round becomes an instance of the bipartite perfect matching problem, to which there are a few reasonably efficient solutions, including the Hopcroft-Karp algorithm. Again, you would want to keep track of which mentors each mentee has seen, and update these in each round.
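A sketch of the per-round matching in Python. It swaps in Kuhn's simple augmenting-path algorithm (O(V·E)) rather than Hopcroft-Karp, which is asymptotically faster but longer to write; with at most 35 mentees either should be instant. The input `mentee_seen` (a list of sets of mentors each mentee has already met) and the function name are assumptions for illustration.

```python
def match_groups(mentee_seen, groups):
    """Maximum bipartite matching of mentees to mentor groups (Kuhn's algorithm).

    mentee_seen: list of sets; mentee_seen[e] = mentors mentee e has already met
    groups:      list of mentor groups for this round
    Returns dict mentee -> group index (partial if no perfect matching exists).
    """
    # Edge (e, g) exists only if mentee e has met none of the mentors in group g.
    adj = [[g for g, grp in enumerate(groups) if not (mentee_seen[e] & set(grp))]
           for e in range(len(mentee_seen))]
    owner = {}                                # group index -> mentee matched to it

    def try_assign(e, visited):
        for g in adj[e]:
            if g in visited:
                continue
            visited.add(g)
            # Take a free group, or re-route the group's current mentee elsewhere.
            if g not in owner or try_assign(owner[g], visited):
                owner[g] = e
                return True
        return False

    for e in range(len(mentee_seen)):
        try_assign(e, set())
    return {e: g for g, e in owner.items()}
```

After recording round 1 of the question's example in `mentee_seen`, calling it with the round-2 groups [4, 7, 10], [1, 8, 11], [2, 5], [3, 6, 9] again yields a perfect matching.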
So in general, the approach I would suggest is:
1. Create a set of mentors for every mentor and for every mentee, initially empty.
2. Greedily select m groups of n/m mentors each, such that no two mentors in a group have been in the same group before. If you get "stuck", resort to a brute-force search over all possible groupings until you find a suitable one.
3. Create a bipartite graph with one vertex for each mentee, one vertex for each group of mentors in the current round, and an edge between a mentee and a group if none of the mentors in the group has seen that mentee yet.
4. Find a maximum matching in that graph; this gives the pairings of mentor groups to mentees for the current round. Update the sets from Step 1 accordingly.
5. Repeat Steps 2-4 for the remaining rounds.
You might also want to look at Kirkman's schoolgirl problem, which is closely related to what you are asking and is a good, simple illustration of why this is difficult to solve in general.

Related

Shortest time for everyone to get to the destination

I have to design an algorithm to solve a problem:
We have two groups of people, group A and group B (group A never has more people than group B), all standing on a one-dimensional line; each person has a number indicating their location. When the timer starts, each person in group A must find a partner in group B, but people in group B cannot move at all, and each person in group B can have at most one partner.
Suppose that people in group A move at 1 unit/sec. How can I find the minimum time for everyone in group A to find a partner?
For example, if there are three people in group A at locations {5,7,8} and four people in group B at locations {2,3,4,9}, the optimal solution is 3 sec (pairing 5→3, 7→4, 8→9), because max(5-3, 7-4, 9-8) = 3.
I could just use brute-force to solve it, but is there a better way of solving this problem?
This problem is a special case of the edit distance problem, and so a similar Dynamic Programming solution can be used to solve it. It's possible that a faster solution exists for this special case.
Let A = [a_0, a_1, ..., a_(m-1)] be the (sorted) positions of our m moving people, and B = [b_0, b_1, ..., b_(n-1)] be the n (sorted) destination spots, with m <= n. For the edit distance analogy, the allowed operations are:
Insert a number into A (free), or
Substitute an element a -> a' in A with cost |a-a'|.
We can solve this in O(n*m) time (plus sorting time of both A and B, if necessary).
We can define the dynamic programming via a cost function C(i, j), which is the minimum cost (here, the smallest achievable longest walk, since everyone moves at the same time) to move the first i people a_0, ..., a_(i-1) using only the first j spots b_0, ..., b_(j-1). You want C(m, n). Define C as follows:
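One plausible definition of C, assuming the cost of a set of moves is the longest single walk (the time the question asks for) and relying on the fact that, with both lists sorted, some optimal pairing never crosses: C(0, j) = 0; C(i, j) = infinity when j < i; otherwise C(i, j) = min(C(i, j-1), max(C(i-1, j-1), |a_(i-1) - b_(j-1)|)). The first option leaves spot b_(j-1) empty (the free "insert"); the second sends a_(i-1) to b_(j-1) (the "substitution"). A Python sketch of that recurrence, with `min_time` as a hypothetical name:

```python
def min_time(a, b):
    """Bottleneck DP: smallest possible longest walk when each person in a
    takes a distinct spot in b (requires len(a) <= len(b))."""
    a, b = sorted(a), sorted(b)
    m, n = len(a), len(b)
    INF = float("inf")
    # C[i][j]: best time for the first i people using only the first j spots.
    # Row 0 is all zeros: placing nobody costs nothing.
    C = [[0] * (n + 1)] + [[INF] * (n + 1) for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(i, n + 1):             # i people need at least i spots
            skip = C[i][j - 1]                                     # leave b_(j-1) empty
            take = max(C[i - 1][j - 1], abs(a[i - 1] - b[j - 1]))  # a_(i-1) walks to b_(j-1)
            C[i][j] = min(skip, take)
    return C[m][n]
```

On the question's example, min_time([5, 7, 8], [2, 3, 4, 9]) returns 3, in O(n*m) time after sorting.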

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for the days on which a match is possible. Given a set of n participants, k days, and teams of m slots, each participant specifies how many days he wants to take part and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if fewer than m participants are available for that day.
I find myself solving this problem manually every week at work when scheduling my football team, and I'm sure there is a smart programmatic approach. Currently, we consider only 2 days per week and colleagues write their names down for the day they want to play, which ends up with long lists for each day and makes it impossible to please everyone.
I considered a new approach in which each colleague writes down his name, how many times per week he wants to play, and which days he is available, as in the example below:
Kane 3 1 2 3 4 5
The line above means that Kane wants to play 3 times this week and is available Monday through Friday. The first number is the number of days to play; the following numbers are the available days (1 to 7, Monday to Sunday).
Days with fewer than m (in my case, m = 12) participants are not going to be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once while also respecting their preferences (when to play, how much to play)?
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if fewer than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all 2^k possibilities for which days have a match. For the rest of this answer, assume we know which days have a match.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
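For very small instances you can sanity-check the formulation without a MIP solver. The Python sketch below is a brute-force stand-in, not the min-cost circulation and not a real solver: it tries all 2^k choices of which days hold a match, every way to fill each held day with exactly D willing players, rejects anything that exceeds U(p), and keeps the choice with the most distinct players (the sum of y(p)). The input shapes (`avail`, `upper`, `days`, `demand`) are assumptions, and it grows exponentially, so for 12-player matches you would hand the same constraints to an actual MIP solver.

```python
from itertools import combinations, product

def best_schedule(avail, upper, days, demand):
    """Exhaustively solve the integer program above (tiny instances only).

    avail:  dict person -> set of days the person is willing to play (p ~ m)
    upper:  dict person -> U(p), the most matches that person wants
    days:   candidate days; each held day is one match
    demand: D, players required per match
    Returns (number of people who play, {day: team}).
    """
    best = None
    # Outer loop: all 2^k choices of which days get a match.
    for r in range(len(days) + 1):
        for held in combinations(days, r):
            # Each held day needs exactly D willing players.
            options = [list(combinations(sorted(p for p in avail if d in avail[p]), demand))
                       for d in held]
            for teams in product(*options):
                counts = {}
                for team in teams:
                    for p in team:
                        counts[p] = counts.get(p, 0) + 1
                if any(counts[p] > upper[p] for p in counts):
                    continue                  # violates sum_m x(p, m) <= U(p)
                # len(counts) is sum_p y(p): people in at least one match.
                if best is None or len(counts) > best[0]:
                    best = (len(counts), dict(zip(held, teams)))
    return best
```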
I have an idea if you need a basic solution that you can optimize and refine in small steps: flow networks. Most of those who already know them are probably turning up their noses, because flow networks are usually used to solve maximization problems, not general optimization problems. In a sense they are right, but I think the problem can initially be seen as maximizing the number of players who play on each day. Needless to say, if we stop here it is a kind of greedy approach.
Without further introduction, the goal is to find the maximum flow in this graph:
The capacity of each edge from the source to the node for player x is the number of days on which that player wants to play. Each player node has one edge to each day_of_week node on which the player is available, and each of these second-level edges has a capacity of 1. The third level consists of the edges that link each day_of_week node to the sink, with a capacity equal to that day's player limit. Quick example: player 2 is available on 2 days, Monday and Tuesday, and both days have a player limit of 12.
So far, the 1st, 2nd and 4th constraints are satisfied (admittedly, that was the easy part): after finding the maximum flow of the entire graph, you select only those paths that have no residual capacity on both the 2nd level (from players to day_of_week nodes) and the 3rd level (from day_of_week nodes to the sink). It is easy to show that, with only this level of "optimization" and under certain conditions, the method may find no acceptable path even though it would have found one had it made different choices while traversing the graph.
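A small Python sketch of the network described above, using the Edmonds-Karp variant of Ford-Fulkerson (my choice; the answer doesn't name a max-flow algorithm). The input format, a dict from player name to (days wanted, set of available days), is an assumption, and the post-processing step of keeping only fully saturated days is left out.

```python
from collections import defaultdict, deque

def schedule_by_max_flow(players, day_limit):
    """Max flow on the source -> player -> day -> sink network (Edmonds-Karp).

    players:   dict name -> (days_wanted, set_of_available_days)
    day_limit: capacity of every day -> sink edge (12 in the question)
    Returns (total_flow, {day: sorted list of players routed through that day}).
    """
    cap = defaultdict(lambda: defaultdict(int))   # residual capacities
    days = set()
    for name, (wanted, avail) in players.items():
        cap["S"][("p", name)] = wanted            # 1st level: days the player wants
        for d in avail:
            cap[("p", name)][("d", d)] = 1        # 2nd level: at most once per day
            days.add(d)
    for d in days:
        cap[("d", d)]["T"] = day_limit            # 3rd level: players per day

    flow = 0
    while True:
        # BFS finds a shortest augmenting path in the residual graph.
        parent = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            break
        # Push the bottleneck amount along the path, updating reverse edges.
        push, v = float("inf"), "T"
        while parent[v] is not None:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = "T"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= push
            cap[v][u] += push
            v = u
        flow += push

    # Flow on player -> day equals the residual capacity of the reverse edge.
    schedule = {d: sorted(name for name in players
                          if cap[("d", d)].get(("p", name), 0) > 0)
                for d in sorted(days)}
    return flow, schedule
```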
This is the optimization problem I meant earlier. I came up with at least two heuristic improvements:
While you traverse the graph, store the day_of_week nodes in a priority queue in which days with more players already assigned have higher priority. This way, the residual capacity of the entire graph is certainly less evenly distributed.
Randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge out of each node at the player level. At the end, aggregate the results and choose the most common outcome. This is a situation where majority rule applies perfectly.
To be clear, everything above is just a starting point: the purpose of a heuristic is to find the best approximate solution possible. For this type of problem and given your probably small input, this is not the right way, but it is the easiest one when you do not know where to start.

Second best solution to an assignment problem using the Hungarian Algorithm

For finding the best solution in the assignment problem it's easy to use the Hungarian Algorithm.
For example:
A | 3 4 2
B | 8 9 1
C | 7 9 5
Applying the Hungarian Algorithm to this gives:
A | 0 0 1
B | 5 5 0
C | 0 1 0
Which means A gets assigned to 'job' 2, B to job 3 and C to job 1.
However, I want to find the second-best solution, meaning the best solution with a cost strictly greater than the cost of the optimal solution. As I understand it, I just need to find the assignment with the minimal sum in the last matrix that is not the same as the optimal one. I could do this by searching a tree (with pruning), but I'm worried about the complexity (O(n!)). Is there an efficient method for this that I don't know about?
I was thinking about a search in which I sort the rows first and then greedily choose the lowest costs first, assuming most of the lowest costs will make up the minimal sum, plus pruning. But since the Hungarian Algorithm can produce a matrix with a lot of zeros, the complexity is terrible again...
What you describe is a special case of the K best assignments problem -- there was in fact a solution to this problem proposed by Katta G. Murty in the following 1968 paper "An Algorithm for Ranking all the Assignments in Order of Increasing Cost." Operations Research 16(3):682-687.
Looks like there are actually a reasonable number of implementations of this, at least in Java and Matlab, available on the web (see e.g. here.)
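A compact Python sketch of Murty's partitioning idea, written from the general description of the method rather than from the paper itself. For brevity, the inner solver `solve_constrained` is a brute-force search over permutations, so this sketch only suits tiny matrices; in practice you would replace it with a polynomial-time assignment solver (such as the Hungarian algorithm) that honors the fixed and forbidden pairs. `second_best_strict` then skips ties, since the question asks for a cost strictly greater than the optimum. All function names are hypothetical.

```python
import heapq
from itertools import count, permutations

def solve_constrained(cost, fixed, forbidden):
    """Cheapest assignment with the row->col pairs in `fixed` forced and the
    (row, col) pairs in `forbidden` banned. Brute force: tiny matrices only."""
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        if any(perm[r] != c for r, c in fixed.items()):
            continue
        if any(perm[r] == c for r, c in forbidden):
            continue
        total = sum(cost[r][perm[r]] for r in range(n))
        if best is None or total < best[0]:
            best = (total, perm)
    return best

def murty(cost):
    """Yield (total, assignment) pairs in non-decreasing order of cost."""
    n = len(cost)
    tie = count()                      # tiebreaker so the heap never compares dicts
    first = solve_constrained(cost, {}, frozenset())
    heap = [(first[0], next(tie), first[1], {}, frozenset())] if first else []
    while heap:
        total, _, perm, fixed, forbidden = heapq.heappop(heap)
        yield total, perm
        # Split this node's other solutions into disjoint subproblems: subproblem r
        # keeps perm's choices for earlier free rows and bans perm[r] in row r.
        prefix = dict(fixed)
        for r in range(n):
            if r in fixed:
                continue
            banned = forbidden | {(r, perm[r])}
            sol = solve_constrained(cost, prefix, banned)
            if sol is not None:
                heapq.heappush(heap, (sol[0], next(tie), sol[1], dict(prefix), banned))
            prefix[r] = perm[r]

def second_best_strict(cost):
    """First assignment whose cost is strictly greater than the optimum."""
    ranked = murty(cost)
    best_total, _ = next(ranked)
    for total, perm in ranked:
        if total > best_total:
            return total, perm
    return None
```

On the 3×3 matrix from the question, `murty` yields costs 12, 13, 17, 17, 18, 19, and `second_best_strict` returns (13, (0, 2, 1)), i.e. A to job 1, B to job 3, C to job 2 (the tuple holds 0-based job indices).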
In R there is now an implementation of Murty's algorithm in the muRty package, available on CRAN and GitHub.
It covers:
Optimization in both the minimum and maximum directions;
output by rank (similar to dense rank in SQL), and
the use of either Hungarian algorithm (as implemented in clue) or linear programming (as implemented in lpSolve) for solving the initial assignment(s).
Disclaimer: I'm the author of the package.

Finding subsets being used at most k times

Every now and then I read conspiracy theories about Lotto-based games being controlled, with a computer browsing through the combinations chosen by the players and determining the unused subsets. It got me thinking: how would such an algorithm have to work in order to find such subsets really efficiently? Finding unused numbers is definitely ruled out, as is finding the least-used ones, because that does not necessarily give us a solution. Going deeper, how could an algorithm efficiently choose a subset that was picked by at most k players? More formally:
We are given the numbers 1 to 50. In the draw, 6 numbers are picked.
INPUT: m subsets, each consisting of 6 distinct numbers from 1 to 50, and an integer k (0 <= k), the maximum number of players allowed to have all 6 of their numbers correct.
OUTPUT: Subsets which make not more than k players win the jackpot ('winning' means all the numbers they chose were picked in the draw).
Is there an efficient algorithm that could compute this without using a terabyte HDD to store the counts for all 50!/(44!*6!) possible combinations in the worst case? Honestly, I can't think of one.
If I wanted to run such a conspiracy, I would first of all acquire the list of submissions by players. Then I would generate random lottery selections and see how many winners each selection would produce, and just choose the random selection most attractive to me. There is little point in doing anything more sophisticated, because that is probably already powerful enough to be noticed by statisticians.
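A minimal Python sketch of that sampling idea. Which selection is "most attractive" is left open above; here it is assumed to be the one with the fewest jackpot winners, and `least_winning_draw` and its parameters are illustrative names.

```python
import random
from collections import Counter

def least_winning_draw(tickets, trials=1000, n=50, r=6, seed=None):
    """Try `trials` random draws and keep the one with the fewest jackpot winners."""
    rng = random.Random(seed)
    plays = Counter(tuple(sorted(t)) for t in tickets)   # players per chosen set
    best = None
    for _ in range(trials):
        draw = tuple(sorted(rng.sample(range(1, n + 1), r)))
        winners = plays[draw]            # Counter returns 0 for sets nobody played
        if best is None or winners < best[1]:
            best = (draw, winners)
    return best
```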
If you want to corrupt the lottery it would probably be easier and safer to select a few competitors you favour and have them win the lottery. In (the book) "1984" I think the state simply announced imaginary lottery winners, with the announcement in each area announcing somebody outside the area. One of the ideas in "The Beckoning Lady" by Margery Allingham is of a gang who attempt to set up a racecourse so they can rig races to allow them to disguise bribes as winnings.
First of all, the total number of combinations (choosing 6 from 50) is not very large: it is 15,890,700, about 16 million, which can easily be handled.
For each combination, keep a count of the number of people who played it. When declaring a winner, choose a combination that has at most k plays.
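A Python sketch of the counting approach, assuming one counter per possible draw in a flat array (about 64 MB with 4-byte counters for 6-of-50). Each sorted combination is mapped to an index with the combinatorial number system, a standard ranking scheme I'm choosing here rather than something the answer specifies; `unrank` turns an index back into the numbers. The function names are hypothetical.

```python
from array import array
from math import comb

def rank(combo):
    """Index of a combination in [0, C(n, r)) via the combinatorial number system."""
    return sum(comb(c - 1, i) for i, c in enumerate(sorted(combo), start=1))

def unrank(index, r):
    """Inverse of rank(): the r numbers (starting at 1) stored at this index."""
    combo = []
    for i in range(r, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= index:   # find the largest c with comb(c, i) <= index
            c += 1
        index -= comb(c, i)
        combo.append(c + 1)
    return sorted(combo)

def pick_draw(tickets, k, n=50, r=6):
    """Count plays per possible draw, then return the first draw with at most k plays."""
    counts = array("I", [0]) * comb(n, r)    # one counter per possible draw
    for t in tickets:
        counts[rank(t)] += 1
    for index, plays in enumerate(counts):
        if plays <= k:
            return unrank(index, r), plays
    return None
```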
If the numbers within each subset are sorted, you can treat the subsets as strings: sort them in lexicographic order, and then it is easy to count how many players selected each subset (and which subsets were not selected at all). So the time is roughly proportional to the number of players, not to the number of possible combinations in the lottery.
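In Python, the sort-then-count idea can be sketched with `itertools.groupby`, which groups equal adjacent keys once the list is sorted. The function name is hypothetical; any subset missing from the result was not selected by anyone. Strictly, a comparison sort costs O(m log m) for m tickets, but that is still independent of the roughly 16 million possible draws.

```python
from itertools import groupby

def count_selections(tickets):
    """Sort each ticket, sort the tickets lexicographically, then count equal runs."""
    keys = sorted(tuple(sorted(t)) for t in tickets)
    return [(key, sum(1 for _ in run)) for key, run in groupby(keys)]
```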

Algorithm to select random pairs, schedule matchups

I'm working in Ruby, but I think this question is best asked agnostic of language. It may be assumed that we have access to basic list/array functions, as well as a "random" number generator. Here's what I'd like to be able to do:
Given a collection of n teams, with n even,
Randomly pair each team with an opponent, such that every team is part of exactly one pair. Call this ROUND 1.
Randomly generate n-2 subsequent rounds (ROUND 2 through ROUND n-1) such that:
Each round has the same property as the first (every team is a
member of one pair), and
After all the rounds, every team has faced every other team exactly once.
I imagine that algorithms for doing exactly this must be well known, but as a self-taught coder I'm having trouble figuring out how to find them.
I believe you are describing a round-robin tournament. The Wikipedia page gives an algorithm.
If you need a way to randomize the schedule, randomize the team order, the round order, etc.
Well, I'm not sure if this is the most efficient algorithm, but:
1. Randomly assign the n teams to two lists of the same length n/2 (List1, List2).
2. Starting with i = 0, create pairs: List1[i], List2[i] is a team pair.
3. Repeat for i = 1 -> (n/2 - 1).
4. For rounds 2 -> n/2 - 1:
5. Rotate List2 so that the first team in List2 is now at the end.
6. Repeat steps 2 through 5 until List2 has been cycled once.
This link was very helpful to me the last time I wrote a round robin scheduling algorithm. It includes a C implementation of a first fit algorithm for round robin pairings.
http://www.devenezia.com/downloads/round-robin/
In addition to the algorithm, he has some helpful links to other aspects of tournament scheduling (balancing home and away games, as well as rotating teams across fields/courts).
Note that you don't necessarily want a "random" order to the pairings in all cases. If, for example, you were scheduling a round-robin soccer league of 8 games per team with only 6 teams, each team would have to play three other teams twice. If you want to make a more enjoyable season for everyone, you have to start worrying about seeding so that you don't have your top 2 teams clobbering the two weakest teams in their last two games. You'd be better off pairing the extra games against teams of similar strength/seeding.
Based on info I found through Maniek's link, I went with the following:
1. A simple round-robin algorithm that
a. starts with the pairings obtained by zipping [0, ..., n/2 - 1] and [n/2, ..., n - 1] (so, if n == 10, 0 is paired with 5, 1 with 6, etc.), and
b. rotates all but one team clockwise n - 2 times, until all teams have played each other (so in round 2 we pair 0 with 6, 5 with 7, etc.).
2. Randomly assign one of [0, ..., n - 1] to each of the teams.
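A Python sketch of that approach (the circle method). To make the zip-then-rotate description concrete, it seats the two halves around a table, top row left to right and bottom row right to left, so that rotating "clockwise" while seat 0 stays put reproduces the example's pairings before relabeling (0 with 5, 1 with 6, ... in round 1; 0 with 6, 5 with 7, ... in round 2). Shuffling the labels at the end is the random assignment step; `round_robin` and `seed` are illustrative names.

```python
import random

def round_robin(n, seed=None):
    """Circle-method schedule for n teams (n even): n-1 rounds of n/2 pairs,
    with team labels randomly permuted at the end."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    # Seats around the table: top row left-to-right, then bottom row right-to-left,
    # so seat i faces seat n-1-i (top [0..n/2-1] faces bottom [n/2..n-1] column-wise).
    circle = list(range(half)) + list(range(n - 1, half - 1, -1))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(circle[i], circle[n - 1 - i]) for i in range(half)])
        # Seat 0 stays fixed; everyone else moves one seat clockwise.
        circle = [circle[0], circle[-1]] + circle[1:-1]
    # Randomly map slot numbers 0..n-1 onto the actual teams.
    rng = random.Random(seed)
    labels = list(range(n))
    rng.shuffle(labels)
    return [[(labels[a], labels[b]) for a, b in rnd] for rnd in rounds]
```

To randomize further, as suggested earlier, you could also shuffle the order of the returned rounds.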
