Group Maker Algorithm based on free time available - algorithm

I am trying to come up with a polynomial-time algorithm that forms groups of people so that the members of each group share as much free time as possible, but I believe the problem might be NP-hard.
The problem is as follows:
We divide the week into 1-hour slots, and users mark each slot as free or busy. Let's say we gather this information from 30 users. Let's also assume that users % group_size = 0, i.e. the number of users is divisible by the group size.
First:
Is it possible to put these people into groups of size G so that all members of each group share at least one free time slot?
Is it possible to put these people into groups of size G that results in an optimal solution, which is to have the maximum total overlapping free time slots among all groups?
For example, if we had a group of 6 people with the following free time:
A: 1pm-3pm Sunday AND 1pm-3pm Monday
B: 2pm-3pm Sunday AND 1pm-3pm Monday
C: 1pm-3pm Sunday AND 7pm-9pm Monday
D: 6pm-7pm Sunday AND 7pm-9pm Monday
E: 5pm-7pm Sunday AND 7pm-9pm Monday
F: 6pm-7pm Sunday AND 1pm-3pm Monday
The algorithm would determine that A,B,F would be one group and C,D,E would be another group, because the members of each of these groups share two hours of free time. This is opposed to A,B,C and D,E,F, where each group shares only 1 overlapping time slot. As a result, this is the optimal solution: the one with the greatest total overlap among all groups.
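To make the objective concrete, here is a minimal brute-force sketch that checks the example above. It is exponential and only meant for tiny inputs; the helper names (`slots`, `partitions`, `overlap`, `best_grouping`) and the encoding of a slot as a `(day, hour)` pair are assumptions made for illustration, not part of the problem statement.

```python
from itertools import combinations

def slots(*ranges):
    """Expand (day, start_hour, end_hour) ranges into a set of 1-hour slots."""
    return {(day, hour) for day, start, end in ranges for hour in range(start, end)}

def partitions(people, size):
    """Yield every way to split `people` into unordered groups of `size`."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for mates in combinations(rest, size - 1):
        remaining = [p for p in rest if p not in mates]
        for tail in partitions(remaining, size):
            yield [(first, *mates), *tail]

def overlap(group, free):
    """Number of 1-hour slots that every member of the group has free."""
    return len(set.intersection(*(free[p] for p in group)))

def best_grouping(free, size):
    """Exhaustively pick the partition with the largest total overlap."""
    return max(partitions(sorted(free), size),
               key=lambda groups: sum(overlap(g, free) for g in groups))

# The six people from the example (hours on a 24-hour clock).
free = {
    "A": slots(("Sun", 13, 15), ("Mon", 13, 15)),
    "B": slots(("Sun", 14, 15), ("Mon", 13, 15)),
    "C": slots(("Sun", 13, 15), ("Mon", 19, 21)),
    "D": slots(("Sun", 18, 19), ("Mon", 19, 21)),
    "E": slots(("Sun", 17, 19), ("Mon", 19, 21)),
    "F": slots(("Sun", 18, 19), ("Mon", 13, 15)),
}
groups = best_grouping(free, 3)
total = sum(overlap(g, free) for g in groups)
```

For 30 to 50 people this enumeration is far too slow, so it is only useful as a reference to validate a faster heuristic on small cases.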
I realized this problem is probably related to the Hopcroft-Karp algorithm, but that algorithm would need to be modified greatly to accomplish this task. Is there another algorithm that relates more closely to the solution than the Hopcroft-Karp algorithm? Can this solution be achieved in polynomial time?
Background:
We have a bunch of people (30-50) who want to volunteer for a cause, and they are only free at certain times during the week. We want to break them into groups of 3-5 and have them work together for this cause. We want the group members to have as much time as possible with each other, so we want to form groups whose members have similar free times.
Thanks a bunch and please let me know if this is an obvious question or if more clarification is needed.

At first glance, this looks like a set cover problem, where each subset is the set of persons sharing a time slot and the universal set is the set of all persons.
U = {p0, p1, p2 ..... , p29} // the persons
S = {S0, S1, S2, ....... S23} // one subset per 1-hour slot
I am still not sure how to take G (the size of an ideal group) into account.

Related

How to find the cheapest combo of metro subscription

Let's say we have the following fares available:
1 trip
2 trips
10 trips
unlimited weekend (Saturday to Sunday, not any 2 days)
1 day
3 days
unlimited week (Monday to Sunday, not any 7 days)
unlimited month (1st to last day of month)
... with a price for every one of them.
The problem is: **How to determine what set of subscriptions to choose given a date of arrival and a date of departure?**
Let's say we want the solution for n between 1 and 8, n being the number of times we take the metro daily (so we assume we take the metro the same number of times every day).
For example it would say something like :
n = 1
Arriving on Friday 19th and leaving Thursday 23rd, the best is taking the 1 trip, then the weekend, then the 2 trips (I didn't calculate it, but you see the point)
n = 2
...
I have found examples with only 1-day, 2-day and 7-day fares solved with dynamic programming, but it looks a lot harder when you take the days of the week into account.
Thanks :)
I like to view this kind of dynamic program as finding a shortest path in a directed acyclic graph.
Each node of the graph encodes
what the current day is (either during the travel period or the day after), and
how many trips remain on trip-limited passes (at most n + 9).
Each arc represents either
purchasing a specific pass at a specific time (the length of the arc is the cost of this pass), or
using trip-limited passes to cover the day's trips (the length of the arc is zero).
The time-limited passes advance the day to the first day they no longer work. The trip-limited passes increase the number of remaining trips. The zero-cost arcs advance the day by one while decreasing the number of remaining trips by n.
Given the shortest path, it is easy to decode it to a plan for purchasing passes.
(P.S. I don't know what the rules are on, e.g., purchasing a week pass on a Tuesday for the rest of the week. Even if this is not allowed, you're going to want to put arcs for the time-limited passes that could have been purchased on a previous day during the travel period.)
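The shortest-path view above can be sketched as a small dynamic program over states `(day, remaining_trips)`. This is only a sketch under stated assumptions: the prices are made up (in cents), calendar passes may be bought on any day inside their window, trips never expire, and the function name `cheapest_plan_cost` is hypothetical. Trip passes are only bought when fewer than n trips remain, which is what keeps the remaining-trip count at or below n + 9.

```python
from datetime import date, timedelta

# Hypothetical prices in cents; replace with the real fare table.
TRIP_PASSES = {1: 200, 2: 350, 10: 1500}
PRICES = {"day": 700, "three_days": 1500, "weekend": 800,
          "week": 2000, "month": 6000}

def cheapest_plan_cost(arrival, departure, n):
    """Cheapest total price for n trips per day from arrival to departure."""
    days = (departure - arrival).days + 1
    cap = n + 9                      # most trips ever worth holding
    INF = float("inf")
    # dist[d][r]: cheapest cost to reach day d with r unused trips.
    dist = [[INF] * (cap + 1) for _ in range(days + 1)]
    dist[0][0] = 0

    def relax(d, r, cost):
        d = min(d, days)             # passes may outlast the stay
        if cost < dist[d][r]:
            dist[d][r] = cost

    for d in range(days):
        today = arrival + timedelta(days=d)
        to_monday = 7 - today.weekday()          # days until next Monday
        next_month = (today.replace(day=28) + timedelta(days=4)).replace(day=1)
        # Each time-limited pass jumps to the first day it no longer covers.
        time_arcs = [
            (d + 1, PRICES["day"]),
            (d + 3, PRICES["three_days"]),
            (d + to_monday, PRICES["week"]),
            (d + (next_month - today).days, PRICES["month"]),
        ]
        if today.weekday() >= 5:                 # Saturday or Sunday
            time_arcs.append((d + to_monday, PRICES["weekend"]))
        for r in range(cap + 1):                 # ascending, so same-day arcs go forward
            cost = dist[d][r]
            if cost == INF:
                continue
            if r >= n:
                relax(d + 1, r - n, cost)        # zero-cost arc: ride on stored trips
            else:
                for trips, price in TRIP_PASSES.items():
                    relax(d, r + trips, cost + price)
            for end, price in time_arcs:
                relax(end, r, cost + price)
    return min(dist[days])
```

Recording the predecessor of each state whenever `relax` improves it would let you decode the actual list of passes, as the answer describes.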

Algorithm for heavily restricted Knapsack problem

I have the following problem to solve:
We have 53 weeks in a year, for each week we need to choose one model from the list: [A1, A2,....,F149, F150]. In total around 750 models in 6 classes: A,B,C,D,E,F.
Models can repeat and each has a specific value from around 3 to 10 and a weight. The goal is to achieve a target total value of 280+-5% with a minimal weight by the end of the year.
However there are a ton of restrictions. For example:
Models must be held for at least 4 weeks in a row. If we have chosen A1 for week 1, then we need to choose A1 for weeks 2,3,4;
If we have chosen model classes E,F then, after they end, we cannot choose E, F for another 4 weeks.
Throughout the year we can only choose 23 models of class D.
and so on
What I've tried so far:
Based on a target value create a corridor of allowed values throughout the year:
Corridor looks like this
Starting at week 1, choose a random model for the week from the list of allowed models -> Based on the choice modify the list of allowed models for next weeks
If our choice satisfies the criterion (also lies within a corridor), then week+=1. If not, delete this possibility.
If there are no more models for this week, go back one week, delete the possibility we chose before and choose randomly from what's left.
Pictorially the algorithm is like following the branches of a tree. If the branch is bad, return back to the fork and cut off the bad branch.
This algorithm can generate a random valid solution (in about 5 to 80 minutes with a mean time of 25 minutes). Then I need to generate more of those and choose one that has the least weight. Which is not a very good approach, I presume.
Question
The question is: what is the optimal way to solve the problem? The priority is to find the solution with minimal weight that hits the target value, not the fastest algorithm. But it should at least finish in a finite amount of time =)
The problem statement above is a bit oversimplified and due to the complexity of calculations and the amount of combinations, there is no way to consider and compare all combinations.

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. The first number is the number of days to play; the following numbers are the available days (1 to 7, Monday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2^k possibilities for which days have a match. For the rest of this answer, assume we know.
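As a small illustration of that outer brute force, the sketch below enumerates candidate sets of match days. The function name `candidate_day_sets` and the `available` mapping (player to set of days) are assumptions; days with fewer than m available players are dropped up front, since such a day can never host a match.

```python
from itertools import combinations

def candidate_day_sets(available, days, m):
    """Yield every subset of the days that have at least m available players."""
    viable = [d for d in days
              if sum(d in player_days for player_days in available.values()) >= m]
    for size in range(len(viable) + 1):
        for subset in combinations(viable, size):
            yield subset

# Tiny made-up instance: three players, three days, two players per match.
example_sets = list(candidate_day_sets(
    {"a": {1, 2}, "b": {1}, "c": {1, 3}}, days=[1, 2, 3], m=2))
```

Each yielded subset would then be handed to the assignment step, keeping the best result.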
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
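To make the constraints concrete without depending on any particular solver, here is a minimal exhaustive search over lineups for a tiny instance with the set of matches already fixed. It is not a MIP solver, just a stand-in that enforces the same constraints (exactly D players per match, at most U(p) matches per player) and the same objective (number of distinct players). The names `best_assignment`, `available`, `upper` and the sample data are assumptions for illustration.

```python
from itertools import combinations, product

def best_assignment(available, upper, matches, demand):
    """Return (lineups, distinct_players) maximizing distinct players.

    available[p] is the set of matches p is willing to play (p ~ m),
    upper[p] is U(p), and demand is D.
    """
    # All possible lineups of exactly `demand` willing players, per match.
    candidates = [
        list(combinations(sorted(p for p in available if m in available[p]), demand))
        for m in matches
    ]
    best, best_score = None, -1
    for choice in product(*candidates):
        counts = {}
        for lineup in choice:
            for p in lineup:
                counts[p] = counts.get(p, 0) + 1
        # Constraint: nobody plays more matches than they asked for.
        if any(counts[p] > upper[p] for p in counts):
            continue
        # Objective: sum of y(p), i.e. players with at least one match.
        if len(counts) > best_score:
            best, best_score = dict(zip(matches, choice)), len(counts)
    return best, best_score

available = {"Ann": {1, 2}, "Bob": {1}, "Cy": {2}, "Dee": {1, 2}}
upper = {"Ann": 1, "Bob": 1, "Cy": 1, "Dee": 2}
lineups, score = best_assignment(available, upper, matches=[1, 2], demand=2)
```

With 12 players per match the number of lineups explodes, which is why a real instance should go to a min-cost flow or MIP solver as the answer suggests.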
I have an idea if you need a basic solution that you can optimize and refine in small steps. I am talking about flow networks. Most people who already know what they are will probably turn up their noses, because flow networks are usually used to solve maximization problems, not optimization problems. And they are right in a sense, but I think this can initially be seen as maximizing the number of players who play on each day. Needless to say, it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days on which he wants to play, represented as the capacity of the edge from the source to the node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of these second-level edges has a capacity of 1. The third level consists of the edges that link each day_of_week to the sink node. Quick example: player 2 is available 2 days, Monday and Tuesday, and both days have a player limit of 12.
So far the 1st, 2nd and 4th constraints are satisfied (well, that was the easy part too): after you have found the maximum flow of the entire graph, you select only those paths that have no residual capacity on both the second level (from players to day_of_weeks) and the third level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization", and under certain conditions, it may not find any acceptable path even though it would have found one had it made different choices while visiting the graph.
This part is the optimization problem that I meant before. I came up with at least two heuristic improvements:
While you visit the graph, store the day_of_weeks in a priority queue where days with more players assigned have a higher priority. In this way the residual capacity of the entire graph is certainly less evenly distributed.
Randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the players' level. At the end you compare the results and choose the most common outcome. This is a situation where the majority rule applies perfectly.
I should stress that everything above is just a starting point: the purpose of a heuristic is to find the best approximate solution possible. With this type of problem, and given your probably small input, this is not the right way, but it is the easiest one when you do not know where to start.
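Here is a minimal sketch of the network described above, using a plain Edmonds-Karp maximum flow. It assumes one reading of the construction: the source-to-player edge carries the number of days the player wants, each player has one capacity-1 edge per available day, and each day-to-sink edge has capacity m. The names `max_flow`, `schedule` and `wishes` are illustrative, and neither heuristic improvement is included.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp. capacity is {u: {v: cap}}; returns (value, residual)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push
            residual[b][a] += push   # reverse edge records the flow sent
        total += push

def schedule(wishes, m):
    """wishes: {player: (times_wanted, [available days])}. Returns {day: players}."""
    cap = defaultdict(dict)
    for player, (times, days) in wishes.items():
        cap["source"][("p", player)] = times
        for day in days:
            cap[("p", player)][("d", day)] = 1
            cap[("d", day)]["sink"] = m
    _, residual = max_flow(cap, "source", "sink")
    plan = {}
    for node in list(cap):
        if isinstance(node, tuple) and node[0] == "d" and residual[node]["sink"] == 0:
            # Day is full: players are the reverse edges that carry flow.
            plan[node[1]] = sorted(p[1] for p, c in residual[node].items()
                                   if p != "sink" and c > 0)
    return plan

plan = schedule({"A": (1, [1]), "B": (1, [1]), "C": (1, [2])}, m=2)
```

Days whose sink edge is not saturated are dropped, which matches the rule that a day with fewer than m players is not scheduled; players routed to such a day are simply left out, which is exactly the weakness the answer points out.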

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Everyday from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and make sure that nothing goes wrong.
There are currently n applicants for the job, and each of them can work from time s_i to time c_i, i = 1, 2, ..., n.
My goal is to minimize the time during which two or more people are watching the workers at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job and keeping them is my first step, but finding the next step is what troubles me... .
The algorithm must run in polynomial-time.
Any hints(a certain type of data structure maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from the start of the day up to c_i?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If s_i is equal to the start of the day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j. Set cost(i) to the minimum of cost(j) + overlap between i and j. Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) over all values of k where c_k is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
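A minimal sketch of this recurrence follows. It fills in details the answer leaves open, so treat them as assumptions: "previous" is taken to mean earlier when sorted by end time, a predecessor j must start before applicant i and still be on duty at i's start (so there is no gap), and the overlap of consecutive picks is the predecessor's end minus i's start. The function name `min_overlap_cover` is illustrative.

```python
def min_overlap_cover(applicants, day_start, day_end):
    """applicants: list of (start, end). Returns (min overlap, chosen indices) or None."""
    order = sorted(range(len(applicants)),
                   key=lambda i: (applicants[i][1], applicants[i][0]))
    INF = float("inf")
    cost = {i: INF for i in order}
    prev = {i: None for i in order}
    for pos, i in enumerate(order):
        s_i, _ = applicants[i]
        if s_i <= day_start:
            cost[i] = 0                  # i alone covers from the start of the day
            continue
        for j in order[:pos]:            # candidates that end no later than i
            s_j, c_j = applicants[j]
            # j must start earlier and reach s_i, otherwise coverage has a gap.
            if s_j < s_i <= c_j and cost[j] + (c_j - s_i) < cost[i]:
                cost[i] = cost[j] + (c_j - s_i)
                prev[i] = j
    # Only applicants who stay until the end of the day can be last.
    finals = [i for i in order if applicants[i][1] >= day_end and cost[i] < INF]
    if not finals:
        return None
    last = min(finals, key=lambda i: cost[i])
    chain, k = [], last
    while k is not None:                 # backtrack through prev
        chain.append(k)
        k = prev[k]
    return cost[last], chain[::-1]

# Hours on a 24-hour clock; the day runs from 9 to 17.
result = min_overlap_cover([(9, 12), (11, 17), (9, 13), (13, 17)], 9, 17)
```

The double loop over applicants is where the O(n^2) bound comes from.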

Event Scheduling Greedy

We are given N ranges of date offsets when N employees are present in an
organization. Something like
1-4 (i.e. employee will come on 1st, 2nd, 3rd and 4th day )
2-6
8-9
..
1-14
We have to organize an event on minimum number of days such that each
employee can attend the event at least twice. Please suggest an algorithm (probably greedy) to do this.
PS: Event is one day event.
If your data is small, you can just brute-force it. Try all possible combinations of 2 days. For each combination, check whether everyone can attend both days. If none works, try all possible combinations of 3 days and check whether everyone can attend 2 out of the 3, and so on. It's exponential, but may not be so bad for your purposes.
The greedy approach is to count how many people are at work each day, and pick the day with the maximum number of people. Then repeat: count how many people who don't already have two events scheduled are at work each day, and pick the day with the maximum number of such people. Of course, don't pick the same day twice.
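The brute-force idea above can be sketched as follows, using the ranges from the question as sample data. The function name `fewest_event_days` and the `required` parameter are assumptions; the search tries sets of increasing size, so the first set that works is a smallest one.

```python
from itertools import combinations

def fewest_event_days(intervals, required=2):
    """Smallest set of days such that every interval contains `required` of them."""
    days = sorted({d for start, end in intervals for d in range(start, end + 1)})
    for size in range(required, len(days) + 1):
        for combo in combinations(days, size):
            if all(sum(start <= d <= end for d in combo) >= required
                   for start, end in intervals):
                return list(combo)
    return None  # some employee is present on fewer than `required` days

sample = [(1, 4), (2, 6), (8, 9), (1, 14)]
chosen_days = fewest_event_days(sample)
```

This is mainly useful as a reference answer for checking a greedy implementation on small inputs.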
I think this can be done by the following greedy approach on the intervals sorted by end date:
Maintain a num count for every interval, i.e. the number of events already placed inside it. (Initialize all to 0.)
If num = 0, place two events on the last two days of this interval.
If num = 1, place one event on the last day of this interval.
If num = 2, two events have already been covered for this interval.
Placing an event in an interval can increase the num count of the succeeding intervals.
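A minimal sketch of this end-date greedy is below. One detail is an assumption: when num = 1 and the last day of the interval already holds an event, the sketch falls back to the latest free day in the interval. Instead of storing a num counter per interval, it recounts the chosen days inside each interval when that interval is processed. The function name `greedy_event_days` is illustrative.

```python
def greedy_event_days(intervals):
    """Pick event days so each (start, end) interval contains at least two."""
    chosen = set()
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # num: events already placed inside this interval.
        num = sum(start <= day <= end for day in chosen)
        day = end
        # Fill up to two events, always using the latest free day first.
        while num < 2 and day >= start:
            if day not in chosen:
                chosen.add(day)
                num += 1
            day -= 1
        if num < 2:
            raise ValueError(f"interval {start}-{end} is shorter than two days")
    return sorted(chosen)

greedy_days = greedy_event_days([(1, 4), (2, 6), (8, 9), (1, 14)])
```

Placing events as late as possible in the interval that ends earliest gives later intervals the best chance of reusing them.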

Resources