How to design an algorithm to put elements into groups with constraints? - algorithm

I was given the task of putting students into groups (to prepare a coding camp) under several constraints. Although I've finished the task by hand, I'd like to know whether algorithms for tasks like this already exist, or how I could design one.
Background: 40 students in total, with these attributes:
gender: F/M
grade: Year 1/2
school: School 1/School 2/...
early assessment result: Rank from 1 to 40
Constraints: All of them need to be satisfied.
1. Exactly 4 people per group
2. Each group needs to have at least one girl
3. Each group needs to have at least one Year 2 student
4. The 4 group members need to come from 4 different schools
5. Each group needs to have at least one student who ranked in the top 10 in the early assessment
What I'm expecting:
The Best: An existing algorithm/program for this kind of problem
Or, An algorithm for this specific problem
Or at least, Some ideas of creating an algorithm for this specific problem
My thoughts:
Since I've succeeded in making groups by hand, I know that a solution exists for my current dataset. But an algorithm that finds a solution for me should first check whether one can exist at all, for example by checking that there are at least 10 girls and at least 10 Year 2 students (10 groups each need one, by the pigeonhole principle), along with some other conditions. Constraint 5 (the top-10 rule) is obviously the easiest and can provide a base solution for the rest. However, I still cannot find a systematic way of doing it. Perhaps brute force and randomization can help? I'm not sure.
And sorry, since the data is confidential, I can not post it.
Update: After consulting a friend, here is a possible method:
First put the top 1 to 10 into 10 different groups.
Then iterate through groups. If the only person in the group is a boy/girl, try to add a girl/boy from a different school.
Then the problem size is reduced from 2^40 to 2^20, making brute force a viable solution.
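One way to turn the idea above into code is a randomized greedy search with restarts. This is only a sketch, not an established algorithm for the problem: the `Student` tuple, `group_ok`, `find_groups` and the demo data are all made up for illustration (the real data is confidential), and the search gives up after a fixed number of tries instead of proving that no solution exists.

```python
import random
from collections import namedtuple

# Hypothetical record type; the field names are assumptions.
Student = namedtuple("Student", "rank gender year school")

def group_ok(group):
    """Check all five constraints for one group."""
    return (
        len(group) == 4
        and any(s.gender == "F" for s in group)
        and any(s.year == 2 for s in group)
        and len({s.school for s in group}) == 4
        and any(s.rank <= 10 for s in group)
    )

def find_groups(students, tries=5000, seed=0):
    """Randomized greedy with restarts; returns a list of groups or None."""
    rng = random.Random(seed)
    top = [s for s in students if s.rank <= 10]
    rest = [s for s in students if s.rank > 10]
    for _ in range(tries):
        groups = [[s] for s in top]  # one top-10 student per group
        rng.shuffle(rest)
        placed_all = True
        for s in rest:
            # Groups that still have room and no member from the same school.
            options = [g for g in groups
                       if len(g) < 4 and all(m.school != s.school for m in g)]
            if not options:
                placed_all = False
                break
            rng.choice(options).append(s)
        if placed_all and all(group_ok(g) for g in groups):
            return groups
    return None

# Made-up demo data: 40 students, 7 schools, alternating gender.
students = [
    Student(rank=r,
            gender="F" if r % 2 == 0 else "M",
            year=2 if r % 4 in (0, 1) else 1,
            school=r % 7)
    for r in range(1, 41)
]
groups = find_groups(students)
```

If the restarts are not enough on harder data, a complete backtracking search or a constraint solver would be the next thing to try, since either of those can also report that no valid grouping exists.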

Related

Which algorithm for assigning pattern shifts

I'm looking for a good algorithm or technique to find the best solution to the following problem. First I'll introduce the context, and then the problem.
I work for a company with more than 2000 employees. All of them work shift patterns: each employee has a pattern that specifies the sequence of workdays and free days. We have these patterns:
5-2-5-2 (5 days work, 2 free, 5 days work, 2 free) and so on.
5-2-4-3
5-4-5-3
5-3-5-3
At the moment we have all these patterns with different starting points; that is, a pattern can start on a given date at any position inside the pattern. For example, the pattern 5-4-5-3 has 17 possible starting positions, because 5+4+5+3 = 17.
https://en.wikipedia.org/wiki/Shift_plan
Now the problem,
Every 6 months each employee can change the pattern and start in any sequence number of the pattern.
But we must analyze all the requests and accept or reject each one in order to obtain the best combination for the company's operation. Ideally every day would have the same workforce; we understand that this is impossible, but an algorithm would help us find a good solution, even if not a perfect one.
I was reading about the "Nurse scheduling problem" with Google OR-Tools, but I don't understand how to model the pattern sequence to create a solution for this problem. I have read some opinions about GA (genetic algorithms), and all of them said that this kind of solution is not good for this kind of problem.
Does anyone have a similar problem? Can someone give me a more accurate example with Google OR-Tools than the example on GitHub?
It's not necessary to find a strictly optimal solution; the roster is currently done manually, and I'm pretty sure the result is considerably sub-optimal most of the time.
Does anyone have a similar problem?
Sounds a lot like OptaWeb Employee Rostering, which is a vertical on top of OptaPlanner, the constraint solver. Take a look at the source code. It's all open source.
I think this can be modeled as a MIP model.
Thinking out loud:
Introduce a binary decision variable:
δ(i,p) = 1 if pattern i is selected for person p, 0 otherwise
This includes the current pattern (say i=0). This would allow the cases:
an employee does not submit a new pattern (then we only have i=0 for this employee)
an employee submits one or more preferable patterns
We have the constraints:
sum(i, δ(i,p)) = 1 ∀p
sum((i,p), pattern(i,p,t)*δ(i,p)) ≈ requiredlevel(t) ∀t
δ(i,p) ∈ {0,1}
here pattern(i,p,t) describes pattern i: it is 1 if period t is covered when pattern (i,p) is used and 0 otherwise. Here I use ≈ to indicate "approximately". (This is easily modeled using slacks and possibly penalty terms in the objective).
Now we maximize
maximize sum((i,p), weight(i,p) * δ(i,p))
where weight(i,p) indicates the preference for a pattern (e.g. weight(0,p)=0, i.e. no bonus points when a newly preferred pattern is not selected).
Something like this should not be too difficult to set up. Of course, many refinements are possible. These types of models tend to solve quite quickly.
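To make the model concrete, here is a toy instance evaluated by plain enumeration instead of a MIP solver. It is a sketch under assumptions: `best_selection`, the data layout and the penalty value of 10 are invented for illustration, and the "≈" constraint is handled as a penalty on the absolute deviation from the required level. Picking exactly one index per person plays the role of sum(i, δ(i,p)) = 1.

```python
from itertools import product

def best_selection(candidates, weights, required, penalty=10):
    """candidates[p][i] is a 0/1 tuple over periods for pattern i of person p.
    Returns (score, choice) where choice[p] is the selected pattern index."""
    people = range(len(candidates))
    best = None
    # Every tuple in the product picks exactly one pattern per person.
    for choice in product(*(range(len(c)) for c in candidates)):
        coverage = [sum(candidates[p][choice[p]][t] for p in people)
                    for t in range(len(required))]
        slack = sum(abs(c - r) for c, r in zip(coverage, required))
        score = sum(weights[p][choice[p]] for p in people) - penalty * slack
        if best is None or score > best[0]:
            best = (score, choice)
    return best

# Index 0 is the current pattern (weight 0), index 1 the requested one.
candidates = [
    [(1, 1, 0, 0), (0, 0, 1, 1)],
    [(0, 0, 1, 1), (1, 1, 0, 0)],
    [(1, 0, 1, 0), (1, 1, 1, 1)],
]
weights = [[0, 1], [0, 1], [0, 1]]
required = [2, 1, 2, 1]

score, choice = best_selection(candidates, weights, required)
```

Here the first two requests are accepted and the third is rejected, because granting it would overstaff two periods. Enumeration only works for tiny instances; with 2000 employees the same objective would have to go to a real MIP or CP solver.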
What is the workflow?
If you have a fixed roster and one person proposes a new pattern, just remove this person's contribution, test all (17) starting points of the new pattern, and score them.
If you can change patterns or starting points for more than one employee, create an integer variable per starting point. From this starting point, it is easy to compute the person's contribution for each shifted day of the pattern. Then you can optimize quality of service w.r.t. the starting points of each pattern, summing potential contributions per day of the week for each employee.
Is that clear?
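The single-employee case can be sketched in a few lines. The names `expand` and `best_start`, and the use of the summed absolute deviation from a required level as the score, are assumptions for illustration; any other measure of roster quality could be plugged in.

```python
def expand(pattern):
    """Turn (work, free, work, free, ...) lengths into one 0/1 cycle."""
    cycle = []
    for k, length in enumerate(pattern):
        cycle.extend([1 if k % 2 == 0 else 0] * length)
    return cycle

def best_start(pattern, others, required):
    """others[t] is the staff on day t without this employee.
    Returns (imbalance, offset) for the starting point with least imbalance."""
    cycle = expand(pattern)
    results = []
    for offset in range(len(cycle)):
        imbalance = sum(
            abs(others[t] + cycle[(t + offset) % len(cycle)] - required[t])
            for t in range(len(others))
        )
        results.append((imbalance, offset))
    return min(results)

# The 5-4-5-3 pattern has 17 starting points, 10 of them working days.
cycle_17 = expand((5, 4, 5, 3))

# Toy week: the rest of the staff covers days 0 and 1 only.
result = best_start((5, 2), [1, 1, 0, 0, 0, 0, 0], [1] * 7)
```

In the toy week the best offset is the one that puts the employee's two free days on the days that are already covered.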

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on a web or phone app that is responsible for scheduling. I want students to input the courses they have taken, and I give them possible combinations of courses they should take that fit their requirements.
However, let's say there are 150 courses that fit their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?
Would it be feasible to run something like this in browser or a mobile device?
First of all you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and perhaps precomputing a feasible data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
Yay, mathematics I think I can handle!
If there are 150 courses and you have to choose 3, then the number of possibilities is (150*149*148)/(3*2) = 551,300 (correction per jerry), which is certainly better than 150 factorial, which has a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have a way of randomly choosing elements from an array, so you take the array of courses and request 3 random unique entries from it.
While the number of potential course combinations is large, based on your post I see no reason to even attempt to calculate them. Randomly selecting k items from an n-sized list is delightfully trivial even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
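As a quick check of the numbers, and to show how cheap the random pick is, here is a small sketch. The course names are placeholders.

```python
import math
import random

# Number of ways to choose 3 courses out of 150 (order does not matter).
n_combos = math.comb(150, 3)

# Placeholder course list; a real app would load the eligible courses.
courses = [f"course_{i}" for i in range(150)]

# Three distinct courses picked at random, without building any combinations.
suggestion = random.sample(courses, 3)
```

About half a million combinations is small enough to enumerate on a phone if it ever becomes necessary, but the random pick never needs to.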
Option 1 (Time\Space costly): let the user on a mobile phone browse the list of (150*149*148) possible choices, page by page, with the processing done on the server side.
Option 2 (Simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag; when the user chooses one item from the bag, remove it from the bag.
Option 3 (Complex): expand your decision tree (possible choices) using a dependency tree (parent course requires child courses), the list of courses already taken by the student, and his track\level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

Algorithm to find the best possible available times

Here is my scenario,
I run a massage place which offers various types of massages: say a 30-minute massage, a 45-minute massage, a 1-hour massage, etc. I have 50 rooms, 100 employees and 30 pieces of equipment. When a customer books a massage appointment, the appointment requires 1 room, 1 employee and 1 piece of equipment to be available.
What is a good algorithm to find available resources for 10 guests for a given day?
Resources:
Room – 50
Staff – 100
Equipment – 30
Business Hours : 9AM - 6PM
Staff Hours: 9AM- 6PM
No of guests: 10
Services
5 Guests- (1 hour massages)
3 Guests - (45mins massages)
2 Guests - (1 hour massage).
They are coming around the same time. Assume there are no other appointment on that day
What is the best way to get:
Top 10 result - Fastest search which meets all conditions gets the top 10 result set. Top ten is defined by earliest available time. 9 – 11AM is best result set. 9 – 5pm is not that good.
Exhaustive search (Find all combinations) - all sets – Every possible combination
First available met (Only return the first match) – stop after one of the conditions have been met
I would appreciate your help.
Thanks
Nick
First, it seems the number of employees, rooms, and equipment are irrelevant. It seems like you only care about which of those is the lowest number. That is your inventory. So in your case, inventory = 30.
Next, it sounds like you can service all 10 people at the same time within the first hour of business. In fact, you can service 30 people at the same time.
So, no algorithm is necessary to figure that out; it's a static solution. If you take @Mario The Spoon's advice and weight the different duration massages with their corresponding profits, then you can start optimizing when you have more than 30 customers at a time.
Looks like you are trying to solve a problem for which there are quite specialized software applications. If your problem is small enough, you could try to do a brute force approach using some looping and backtracking, but as soon as the problem becomes too big, it will take too much time to iterate through all possibilities.
If the problem starts to get big, look for more specialized software. Things to look for are "constraint based optimization" and "constraint programming".
E.g. the ECLIPSe tool is an open-source constraint programming environment. You can find some examples on http://eclipseclp.org/examples/index.html. One nice example you can find there is the SEND+MORE=MONEY problem. In this problem you have the following equation:
    S E N D
+   M O R E
-----------
= M O N E Y
Replace every letter by a digit so that the sum is correct.
This also illustrates that although you can solve this brute-force, there are more intelligent ways to solve this (see http://eclipseclp.org/examples/sendmore.pl.txt).
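For comparison with the constraint-programming version, a brute-force search in Python might look like the sketch below. It is not the ECLiPSe program; the function names are made up, and the only pruning is fixing M = 1 up front, which follows from MONEY having five digits while SEND and MORE have four.

```python
from itertools import permutations

def value(word, digits):
    """Read a word as a decimal number under a letter -> digit mapping."""
    n = 0
    for ch in word:
        n = n * 10 + digits[ch]
    return n

def solve_send_more_money():
    letters = "SENDORY"                       # every letter except M
    pool = [d for d in range(10) if d != 1]   # 1 is already taken by M
    for perm in permutations(pool, len(letters)):
        digits = dict(zip(letters, perm), M=1)
        if digits["S"] == 0:                  # no leading zero
            continue
        if value("SEND", digits) + value("MORE", digits) == value("MONEY", digits):
            return digits
    return None

solution = solve_send_more_money()
```

This tries up to 181,440 assignments, which is fine for a puzzle of this size. A constraint solver gets there with far fewer guesses by propagating the column sums and carries.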
Just an idea to find a solution:
You might want to try to solve it with a constraint satisfaction problem (CSP) algorithm. That's what some people do if they have to solve timetable problems in general (e.g. room reservation at the University).
There are several tricks to improve CSP performance, like forward checking, or building a DAG and then doing a topological sort, and so on...
Just let me know, if you need more information about CSP :)

What class of algorithms can be used to solve this?

EDIT: Just to make sure someone is not breaking their head on the problem... I am not looking for the best optimal algorithm. Some heuristic that makes sense is fine.
I made a previous attempt at formulating this and realized I did not do a great job at it so I removed that question. I have taken another shot at formulating my problem. Please feel free to provide any constructive criticism that can help me improve this.
Input:
N people
k announcements that I can make
Distance that my voice can be heard (say 5 meters) i.e. I may decide to announce or not depending on the number of people within these 5 meters
Goal:
Maximize the total number of people who have heard my k announcements and (optionally) minimize the time in which I can finish announcing all k announcements
Constraints:
Once a person hears my announcement, he is removed from the total, i.e. if he heard my first announcement, I do not count him again even if he hears my second announcement
I can see the same person as well as the same set of people within my proximity
Example:
Let us consider 10 people numbered from 1 to 10 and the following pattern of arrival:
Time slot 1: 1 (payoff = 1)
Time slot 2: 2 3 4 5 (payoff = 4)
Time slot 3: 5 6 7 8 (payoff = 4 if no announcement was made previously in time slot 2, 3 if an announcement was made in time slot 2)
Time slot 4: 9 10 (payoff = 2)
and I am given 2 announcements to make. Now if I were an oracle, I would choose time slots 2 and 3, because then 7 people would have heard (person 5 already heard my announcement in time slot 2, so I do not count him again). I am looking for an online algorithm that will help me decide whether or not to make an announcement, and if so, based on what factors. Does anyone have any ideas on what algorithms can be used to solve this or a simpler version of this problem?
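For reference, the oracle's choice in the example can be reproduced by brute force over every way of picking k time slots. This is the offline baseline, not the online algorithm being asked about; `best_offline` is a made-up name, and an online heuristic could be scored against its result.

```python
from itertools import combinations

def best_offline(slots, k):
    """slots[i] is the set of people present in time slot i (0-based).
    Returns (people_reached, chosen_slot_indices) for the best k slots."""
    best = (0, ())
    for chosen in combinations(range(len(slots)), k):
        # A person counts once, however many announcements they hear.
        reached = set().union(*(slots[i] for i in chosen))
        if len(reached) > best[0]:
            best = (len(reached), chosen)
    return best

slots = [{1}, {2, 3, 4, 5}, {5, 6, 7, 8}, {9, 10}]
reached, chosen = best_offline(slots, 2)
```

With 0-based indices, the chosen slots 1 and 2 are time slots 2 and 3 from the example.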
There should be an approach relying upon a max-flow algorithm. In essence, you're trying to push the maximum number of messages from start to end. Though it would be multidimensional, you could have a super-source which connects to each value of t, then have each value of t connect to the people you can reach at that time, and then have each person connect to a super-sink. This way, you simply have to compute a max-flow (with the added constraint of no more than k shouts, which should be solvable with a bit of dynamic programming). It's a terrifically dirty way to solve it, but it should get the job done deterministically and without the use of heuristics.
I don't know that there is really a way to solve this or an algorithm to do it the way you have formulated it.
It seems like basically you are trying to reach the maximum number of people with exactly 2 announcements. But without knowing any information about the groups of people in advance, you can't really make any kind of intelligent decision about whether or not to use your first announcement. Your second one at least has the benefit of knowing when not to be used (i.e. if the group has no new members, then you know it's not worth wasting the announcement). But it still has basically the same problem.
The only real way to solve this is to use knowledge about the type of data or the desired outcome to make guesses. If you know that groups average 100 people with a standard deviation of 10, then you could just refuse to announce if less than 90 people are present. Or, if you know you need to reach at least 100 people with two announcements, you could choose never to announce to less than 50 at once. Obviously those approaches risk never announcing at all if the actual data does not meet what you would expect. But that's always going to be a risk, since you could get 1 person in the first group and then 0 in all of the rest, no matter what you do.
Or, you could try more clearly defining the problem, I have a hard time figuring out how to relate this to computers.
Let's start by trying to solve the simplest possible variant of the problem: let's assume N people and K timeslots, but only one possible announcement. Let's also assume that each person will only ever stay for one timeslot, and that each person who hasn't yet shown up is equally likely to show up at any future timeslot.
Given these simplifications, at each timeslot you look at the payoff of announcing at the current timeslot and compare it to the chance of a future timeslot having a higher payoff. E.g., let's assume 4 people and 3 timeslots:
Timeslot 1: Person 1 shows up, so you know you could get a payoff of 1 by announcing, but then you have 3 people to show up in 2 remaining timeslots, so at least one of those timeslots is guaranteed to have at least 2 people, so don't announce.
So at each timeslot, you can calculate the chance that a later timeslot will have a higher payoff than the current one by treating the remaining N people and K timeslots as N independent random numbers, each from 1..K, and calculating the chance of at least one value in 1..K being hit at least as many times as the current payoff. (This is similar to the Birthday problem, but for more than 1 collision.) Then you need to decide how much to discount based on expected variances (a bird in the hand, etc.).
Generalization of this solution to the original problem is left as an exercise for the reader.
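For small numbers the chance described above can be computed exactly by enumerating every way the remaining people can fall into the remaining timeslots. This is a sketch under the same simplifying assumptions (one announcement, uniform and independent arrivals); `chance_of_better_slot` is a made-up name, and it counts a later slot as better only if it is strictly larger than the current payoff.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def chance_of_better_slot(people_left, slots_left, current_payoff):
    """Exact probability that some remaining slot gets more people
    than current_payoff, with each person picking a slot uniformly."""
    if slots_left == 0:
        return Fraction(0)
    total = better = 0
    for arrival in product(range(slots_left), repeat=people_left):
        total += 1
        biggest = max(Counter(arrival).values(), default=0)
        if biggest > current_payoff:
            better += 1
    return Fraction(better, total)

# The worked example: 3 people still to come, 2 slots left, payoff 1 now.
p_example = chance_of_better_slot(3, 2, 1)
```

The enumeration grows as slots_left ** people_left, so for realistic sizes it would have to be replaced by a closed-form occupancy calculation or by sampling.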

How to calculate correlation amongst preferences?

I have to split a group of x people into 3 or 4 groups, most likely 3.
I want people to be happy, so I'm having each person rate the other members of the big group from 1 to (x-1).
How do I optimize preferences to create 3 groups?
Here is a method that is likely to get a good arrangement, even if it is not an optimal arrangement:
First create a ranking function that can take any pair of groupings and determine whether one is better than the other. Then apply the following algorithm:
1. Randomly assign people into groups.
2. Randomly pick one person from each group.
3. Create new groupings in which each combination of reassignments is performed on the people chosen in step 2. (For 3 groups there will be 6 such reassignments. For 4, 24.)
4. Of all possible reassignments, pick the best one.
5. Repeat steps 2–4 one million times.
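The steps above can be sketched as follows. Instead of a pairwise ranking function this version assumes a numeric `score(groups)` where higher is better, which is a simplification; `improve_groups` and the toy `happiness` function are made-up names.

```python
import random
from itertools import permutations

def improve_groups(people, n_groups, score, rounds=2000, seed=0):
    rng = random.Random(seed)
    people = list(people)
    rng.shuffle(people)
    groups = [people[i::n_groups] for i in range(n_groups)]   # random start
    for _ in range(rounds):
        picks = [rng.randrange(len(g)) for g in groups]       # one per group
        chosen = [g[i] for g, i in zip(groups, picks)]
        best_perm, best_score = None, None
        for perm in permutations(chosen):                     # all reassignments
            for g, i, person in zip(groups, picks, perm):
                g[i] = person
            s = score(groups)
            if best_score is None or s > best_score:
                best_perm, best_score = perm, s
        for g, i, person in zip(groups, picks, best_perm):    # keep the best
            g[i] = person
    return groups

# Toy preferences: everyone only wants to be with one specific partner.
partner = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}

def happiness(groups):
    return sum(1 for g in groups for a in g for b in g if partner[a] == b)

groups = improve_groups(range(6), 3, happiness)
```

Because the first permutation tried is always the current arrangement, the score never goes down from one round to the next.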
UPDATE
If there are only 18 people that need to be assigned, then that's just (18 choose 6) * (12 choose 6) / 6 = 2,858,856 possible groupings. (Or, in the case of four groups it's (18 choose 4) * (14 choose 4) * (10 choose 5) / 4 = 192,972,780 groupings.)
You can just try each one and pick the best.
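The counts are easy to double-check. The divisors account for groups of equal size being interchangeable: 3! for three groups of 6, and 2! * 2! for two groups of 4 plus two groups of 5.

```python
import math

# 18 people into three unlabeled groups of 6.
three_groups = math.comb(18, 6) * math.comb(12, 6) // math.factorial(3)

# 18 people into unlabeled groups of sizes 4, 4, 5 and 5.
four_groups = (math.comb(18, 4) * math.comb(14, 4) * math.comb(10, 5)
               // (math.factorial(2) * math.factorial(2)))
```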
I guess the ranking algorithm itself is really the hard part of this assignment.
You could just give each person a score based on summing the scores of the people selected to be in their group, then sum the scores of each person together.
The problem is that you're going to end up with all the popular people in one group, and all the unpopular people in another group, and all the telephone handset cleaners in another group.
You should just assign people randomly, and then tell them that you used some really scientific system. That way everybody gets a good mix.
Measure the total satisfaction of a given configuration by calculating the distance between the actual positions and the stated preferences. Start with a randomized set of groups. Then use something like hill climbing or simulated annealing to optimise.
http://en.wikipedia.org/wiki/Hill_climbing
http://en.wikipedia.org/wiki/Simulated_annealing
Simulated annealing sounds complicated, but it's really just a cleverer version of hill-climbing.
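A minimal simulated annealing sketch for the grouping problem, to show how little it adds to hill climbing: a move that makes things worse is still accepted with a probability that shrinks as the temperature drops. The function name, the swap move, the cooling schedule and its parameters are all assumptions, and `score(groups)` is assumed to be a number where higher is better.

```python
import math
import random

def anneal(groups, score, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    groups = [list(g) for g in groups]
    current = score(groups)
    best, best_groups = current, [list(g) for g in groups]
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        g1, g2 = rng.sample(range(len(groups)), 2)
        i = rng.randrange(len(groups[g1]))
        j = rng.randrange(len(groups[g2]))
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        new = score(groups)
        if new >= current or rng.random() < math.exp((new - current) / temp):
            current = new
            if current > best:
                best, best_groups = current, [list(g) for g in groups]
        else:
            # Rejected: undo the swap.
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
    return best, best_groups

# Toy preferences: everyone only wants to be with one specific partner.
partner = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}

def happiness(groups):
    return sum(1 for g in groups for a in g for b in g if partner[a] == b)

start = [[0, 2], [1, 4], [3, 5]]
best, best_groups = anneal(start, happiness)
```

Setting the temperature to almost zero from the start turns this back into plain hill climbing with random swaps.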

Resources