Currently, we are attempting to build a matching application that matches students to alumni for an event. The event consists of multiple time-slots, in each of which each of the students can be assigned to one alumnus.
For each time-slot, the alumni have a maximum and a minimum number of students that should be assigned to them, and the students have a minimum number of time-slots in which they should be assigned to an alumnus. Students should also never be assigned twice to the same alumnus. Here's the real kicker though: the students can submit a ranked list of preferences for the event which contains a ranking of alumni they would like to talk to.
The algorithm has to create a schedule that contains a "fair" distribution of the students over the alumni and the time-slots.
We have already concluded that we will probably not be able to get an optimal solution, so we want to use Local Search to get a reasonably "fair" schedule. However, to run Local Search we first need to create some random (but valid!) schedules with the capacities and constraints in mind. This "random fill" algorithm is where we run into issues, since we cannot figure out how to create a non-deterministic algorithm that, under the above constraints, creates even one random valid schedule.
We have tried converting the problem to a flow problem but the resulting graph is way too large to solve in a reasonable amount of time, and we have tried some sort of FCFS approach but there are always edge cases in which conflicts exist that require the algorithm to go into a recursive loop that could take such a long time that we might as well brute force the schedule.
While we cannot figure anything out ourselves, we feel like there must be some similar problem that can be solved with an algorithm that has already been found before. If anyone has any experience with a problem similar to this, we would love their help.
I would recommend a simple greedy approach. Whenever you are assigning a student, assign the student to their best slot, where "best" is given by the first of the following rules that applies:
1. If any desired alumnus has not reached their minimum in some time slot in which the student is available, pick the most desired such alumnus, in the available slot that is farthest from reaching its minimum.
2. If any desired alumnus has not reached their maximum in some time slot in which the student is available, pick the most desired such alumnus, in the available slot that is farthest from reaching its maximum.
3. If the student has not reached their minimum, all alumni in all available slots have hit their maximum, and some student in those slots has passed their minimum, then bump an assigned student. Choose the student to bump so that this student's preference minus that student's preference is as high as possible (i.e. it is easier to bump someone from their 5th choice than from their 1st), and break ties by bumping the student who is farthest past their minimum.
4. Otherwise, make no assignment this time.
Go through all students in random order and try to assign each one. Shuffle them again and repeat. Stop when a pass makes no assignments at all.
This algorithm is not guaranteed to find the best solution, or even to find a solution at all. But within reasonable constraints, it is very likely to find a decent solution in polynomial time.
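A minimal sketch of the shuffled greedy passes, under simplifying assumptions: it only implements the "maximum" rule (most desired alumnus, emptiest open slot) and leaves out the minimum rule and the bumping step. The names `random_fill`, `prefs` and `max_cap` are made up for this illustration.

```python
import random

def random_fill(prefs, slots, max_cap, seed=None):
    """prefs: {student: [alumni, most desired first]}, max_cap: {alumnus: seats per slot}.
    Returns {(student, slot): alumnus}. Minimums and bumping are NOT handled here."""
    rng = random.Random(seed)
    load = {(a, t): 0 for a in max_cap for t in range(slots)}
    met = {s: set() for s in prefs}          # alumni each student already talks to
    schedule = {}
    students = list(prefs)
    while True:
        rng.shuffle(students)                # new random order for every pass
        assigned_any = False
        for s in students:
            free = [t for t in range(slots) if (s, t) not in schedule]
            for a in prefs[s]:               # most desired alumnus first
                if a in met[s]:
                    continue                 # never the same alumnus twice
                open_slots = [t for t in free if load[(a, t)] < max_cap[a]]
                if open_slots:
                    # emptiest slot first, ties broken at random
                    t = min(open_slots, key=lambda slot: (load[(a, slot)], rng.random()))
                    schedule[(s, t)] = a
                    load[(a, t)] += 1
                    met[s].add(a)
                    assigned_any = True
                    break                    # at most one assignment per student per pass
        if not assigned_any:                 # a pass with no assignments: stop
            return schedule
```

Calling it with different seeds gives different valid starting schedules for Local Search; whether the minimums are met still has to be checked afterwards.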
I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
E.g. a possible set of events is [(1,2),(1,3),(4,2),(4,3),(3,1)] with 4 possible timeslots. All events have to be scheduled in some timeslot; however, the waiting time per user (the time between two events) should be minimised and, at the same time, the number of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sounds like a combination of Job Shop Scheduling and Meeting Scheduling with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms and is more scalable (see research competitions for evidence).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared:
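A minimal sketch of such a squared penalty (not the answerer's original code). It assumes a schedule is a dict from an event `(user_a, user_b)` to an integer timeslot and that no user is double booked; `waiting_penalty` is a made-up name.

```python
def waiting_penalty(schedule):
    """schedule: {(user_a, user_b): timeslot}. Lower is better."""
    slots_by_user = {}
    for (a, b), slot in schedule.items():
        slots_by_user.setdefault(a, []).append(slot)
        slots_by_user.setdefault(b, []).append(slot)
    penalty = 0
    for slots in slots_by_user.values():
        slots.sort()
        for earlier, later in zip(slots, slots[1:]):
            gap = later - earlier - 1      # idle timeslots between two meetings
            penalty += gap * gap           # squaring punishes one long wait more
    return penalty
```

Squaring means that one wait of two slots (penalty 4) scores worse than two waits of one slot each (penalty 2), which is what pushes the search towards spreading the waiting time fairly.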
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
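The three steps above can be sketched as follows. This is only an illustration: `schedule_sorted` is a made-up name, and "in parallel" is read as "each meeting goes into the earliest slot in which both users are free after their earlier listed meetings".

```python
def schedule_sorted(events):
    """Return [((a, b), slot), ...] for the sorted list of meetings."""
    # Steps 1 and 2: lower-numbered user first, then sort on (first, second) user.
    ordered = sorted(tuple(sorted(e)) for e in events)
    next_free = {}                         # user -> first slot in which the user is free
    result = []
    for a, b in ordered:
        # Step 3: earliest slot that keeps each user's meetings in listed order.
        slot = max(next_free.get(a, 0), next_free.get(b, 0))
        result.append(((a, b), slot))
        next_free[a] = next_free[b] = slot + 1
    return result
```

For the example list from the question this uses slots 0 to 3, so it fits in the 4 available timeslots.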
I'm unfortunately not an expert on trying to reduce problems to known NP problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
Instead of brute force, you could use a random start point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for a maximum-likelihood evolutionary tree, which has a similarly factorial search space.
1. Start with a random permutation and evaluate its score.
2. Make some random changes, then evaluate the score.
3. If it's not an improvement, try another modification until you find one that is (maybe with a mechanism to remember that you already tried this modification to the starting permutation).
4. Repeat from 2 with this new permutation, until you've converged on a local minimum.
5. Repeat from 1 for some other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.
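A minimal sketch of this random-restart search, assuming the score described earlier (sum over users of last minus first meeting slot) and a random swap of two meetings as the "random change". It uses a fixed number of tries per start instead of a real convergence test; `score` and `hill_climb` are made-up names.

```python
import random

def score(order):
    """Schedule meetings in the given order, then sum (last - first) slot per user."""
    next_free, first, last = {}, {}, {}
    for a, b in order:
        slot = max(next_free.get(a, 0), next_free.get(b, 0))
        for u in (a, b):
            next_free[u] = slot + 1
            first.setdefault(u, slot)      # only set by the user's first meeting
            last[u] = slot
    return sum(last[u] - first[u] for u in last)

def hill_climb(events, restarts=20, tries=200, seed=0):
    rng = random.Random(seed)
    best_order, best_score = None, None
    for _ in range(restarts):              # several random starting guesses
        order = list(events)
        rng.shuffle(order)
        current = score(order)
        for _ in range(tries):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            candidate = score(order)
            if candidate < current:
                current = candidate        # keep the improvement
            else:
                order[i], order[j] = order[j], order[i]   # undo the swap
        if best_score is None or current < best_score:
            best_order, best_score = list(order), current
    return best_order, best_score
```

Here every candidate is re-scored from scratch; computing only the score change caused by a swap would be the speedup mentioned above.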
I need advice on a heuristic for the Minesweeper game. If I have found 10 fields without a mine, how do I estimate which field to open next? I was thinking of computing the probability of a mine around every numbered field and then choosing the field with the lowest probability, but I don't think that will give good results, because opening a safe field is not enough: what I need is to open the field that opens the biggest area on the board. I would like to read good ideas, but without cheating algorithms.
You could try an A* search with Monte Carlo simulation. That is, define a cost/reward for each type of cell being opened (each type of action).
Assume you have K different actions you can perform (a_1,a_2,a_3...) at the current timestep.
For each action (open cell X), use the game model to simulate what would happen next. Store the reward for the sequence of actions, and accumulate it onto the original action. You can add probability weights to actions and their consequences to make the estimate more accurate.
Take the average of the simulated rewards for each action and action sequence. After M simulations at depth D (where M and D are just pre-defined values to ensure the algorithm doesn't take too long), choose the action from (a_1,a_2,a_3...) with the highest simulated reward. Pruning is necessary to make this method efficient (that is, to avoid wasting time on actions that clearly will not lead to a high reward after a few steps of simulation).
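A rough sketch of the Monte Carlo part only (no A* and no pruning). The game model is not specified in the answer, so `legal_actions(state)` and `step(state, action, rng)` are hypothetical callables you would have to supply; `step` is assumed to return `(next_state, reward, done)`.

```python
import random

def choose_action(state, legal_actions, step, simulations=100, depth=3, seed=0):
    """Average the reward of `simulations` random playouts of length `depth`
    for each first action, and return (best_action, averages)."""
    rng = random.Random(seed)
    averages = {}
    for first in legal_actions(state):
        total = 0.0
        for _ in range(simulations):
            s, reward, done = step(state, first, rng)
            total += reward                # reward accumulates onto the first action
            for _ in range(depth - 1):
                options = legal_actions(s)
                if done or not options:
                    break
                s, reward, done = step(s, rng.choice(options), rng)
                total += reward
        averages[first] = total / simulations
    # Raises ValueError if there is no legal action at all.
    return max(averages, key=averages.get), averages
```

For Minesweeper the reward could be, for example, the number of cells a move opens, with a large negative reward for hitting a mine; that choice is an assumption, not something the answer fixes.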
Hi I am building a program wherein students are signing up for an exam which is conducted at several cities through out the country. While signing up students provide a list of three cities where they would like to give the exam in order of their preference. So a student may say his first preference for an exam centre is New York followed by Chicago followed by Boston.
Now, since the exam centres have limited capacity, they cannot accommodate every student's first choice. We would, however, try to give as many students as possible their first or second choice of centre, and as far as possible avoid giving a student their third-choice centre.
Are there any ideas for an algorithm that would make this process more efficient? The simple way to do this would be to first go through the list of first choices and allot as many as possible, then go through the list of second choices and allot those. However, this may lead to the students who are first in the list getting their first centre and the last students getting their third choice or, worse, none of their choices. Is there anything that could make this more efficient?
Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
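A minimal sketch of that idea, assuming the capacity-aware (college admission style) variant of the stable marriage procedure: students propose to centres in preference order, and each centre keeps the students it ranks best in its own random order. The function and parameter names are made up for the illustration.

```python
import random

def assign_centres(choices, capacity, seed=0):
    """choices: {student: [centres, best first]}, capacity: {centre: seats}.
    Returns {student: centre}; students rejected everywhere are left out."""
    rng = random.Random(seed)
    students = list(choices)
    rank = {}
    for c in capacity:
        order = students[:]
        rng.shuffle(order)                 # one Fisher-Yates shuffle per centre
        rank[c] = {s: i for i, s in enumerate(order)}
    held = {c: [] for c in capacity}       # students each centre currently holds
    next_choice = {s: 0 for s in students}
    unplaced = list(students)
    while unplaced:
        s = unplaced.pop()
        if next_choice[s] >= len(choices[s]):
            continue                       # s has run out of choices
        c = choices[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        if len(held[c]) > capacity[c]:     # over capacity: reject the worst-ranked
            worst = max(held[c], key=rank[c].get)
            held[c].remove(worst)
            unplaced.append(worst)
    return {s: c for c, group in held.items() for s in group}
```

Because the centres' rankings are random, list position no longer decides who gets their first choice.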
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
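A small, self-contained sketch of this formulation, using successive shortest paths with a queue-based Bellman-Ford instead of a library solver (a swapped-in technique chosen so the example needs no dependencies). It assumes at most three choices per student; `min_cost_assignment` and its parameters are made-up names.

```python
from collections import deque

def min_cost_assignment(choices, capacity):
    """choices: {student: [1st, 2nd, 3rd centre]}, capacity: {centre: seats}."""
    students, centres = list(choices), list(capacity)
    n = len(students)
    costs = [0, 1, n + 1]                  # first, second, third choice arcs
    source, sink = 0, 1
    node = {("s", s): 2 + i for i, s in enumerate(students)}
    node.update({("c", c): 2 + n + j for j, c in enumerate(centres)})
    graph = [[] for _ in range(2 + n + len(centres))]  # edge: [to, cap, cost, rev]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual edge

    for s in students:
        add_edge(source, node["s", s], 1, 0)
        for k, c in enumerate(choices[s]):
            add_edge(node["s", s], node["c", c], 1, costs[k])
    for c in centres:
        add_edge(node["c", c], sink, capacity[c], 0)

    while True:
        # Cheapest augmenting path in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    queue.append(v)
        if prev[sink] is None:
            break                          # no more students can be placed
        v = sink
        while v != source:                 # push one unit of flow along the path
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u

    result = {}
    for s in students:
        for v, cap, cost, _ in graph[node["s", s]]:
            if v >= 2 + n and cap == 0:    # saturated student -> centre arc
                result[s] = centres[v - 2 - n]
    return result
```

For real data sizes a dedicated min-cost-flow solver will be much faster; this version is only meant to show how the graph is built.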
There is a class of algorithms, called auctions, that address this kind of allocation of limited resources. Basically, in this case each student would get a certain amount of money (a number they can spend), and your software would then place bids on behalf of those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.
We have a simulation program where we take a very large population of individual people and group them into families. Each family is then run through the simulation.
I am in charge of grouping the individuals into families, and I think it is a really cool problem.
Right now, my technique is pretty naive/simple. Each individual record has some characteristics, including married/single, age, gender, and income level. For married people I select an individual and loop through the population and look for a match based on a match function. For people/couples with children I essentially do the same thing, looking for a random number of children (selected according to an empirical distribution) and then loop through all of the children and pick them out and add them to the family based on a match function. After this, not everybody is matched, so I relax the restrictions in my match function and loop through again. I keep doing this, but I stop before my match function gets too ridiculous (marries 85-year-olds to 20-year-olds for example). Anyone who is leftover is written out as a single person.
This works well enough for our current purposes, and I'll probably never get time or permission to rework it, but I at least want to plan for the occasion or learn some cool stuff - even if I never use it. Also, I'm afraid the algorithm will not work very well for smaller sample sizes. Does anybody know what type of algorithms I can study that might relate to this problem or how I might go about formalizing it?
For reference, I'm comfortable with chapters 1-26 of CLRS, but I haven't really touched NP-Completeness or Approximation Algorithms. Not that you shouldn't bring up those topics, but if you do, maybe go easy on me because I probably won't understand everything you are talking about right away. :) I also don't really know anything about evolutionary algorithms.
Edit: I am specifically looking to improve the following:
Fewer ridiculous marriages.
Fewer single people at the end.
Perhaps what you are looking for is cluster analysis?
Let's try to think of your problem like this (starting by solving the spouse matching):
If you were to have a matrix where each row is a male and each column is a female, and every cell in that matrix is the match function's returned value, what you are now looking for is selecting cells so that there won't be a row or a column in which more than one cell is selected, and the total sum of all selected cells should be maximal. This is very similar to the N Queens Problem, with the modification that each allocation of a "queen" has a reward (which we should maximize).
You could solve this problem by using a graph where:
You have a root,
each of the first row's cells' values is an edge's weight leading to first depth vertices
each of the second row's cells' values is an edge's weight leading to second depth vertices..
Etc.
(Notice that when you find a match to the first female, you shouldn't consider her anymore, and so for every other female you find a match to)
Then finding the maximum allocation can be done by BFS, or better still by A* (notice A* typically looks for minimum cost, so you'll have to modify it a bit).
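The matrix formulation above is commonly known as the assignment problem. A minimal sketch of an exact search over it, walking the rows in order as in the graph description but memoising on the set of columns already used (so it is a dynamic programme rather than BFS or A*). It assumes there are at least as many columns as rows and is exponential in the number of columns, so it only suits small groups; `best_matching` is a made-up name.

```python
from functools import lru_cache

def best_matching(score):
    """score[r][c] = match value of row r with column c (rows <= columns).
    Returns (best total, tuple giving the column chosen for each row)."""
    rows, cols = len(score), len(score[0])

    @lru_cache(maxsize=None)
    def solve(r, used):
        if r == rows:
            return 0, ()
        best = None
        for c in range(cols):
            if not used & (1 << c):        # column c is still unmatched
                total, rest = solve(r + 1, used | (1 << c))
                total += score[r][c]
                if best is None or total > best[0]:
                    best = (total, (c,) + rest)
        return best

    return solve(0, 0)
```

For large populations, a polynomial-time method such as the Hungarian algorithm solves the same problem.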
For matching between couples (or singles, more on that later..) and children, I think KNN with some modifications is your best bet, but you'll need to optimize it to your needs. But now I have to relate to your edit..
How do you measure your algorithm's efficiency?
You need a function that receives the expected distribution of all states (single, married with one child, single with two children, etc.), and the distribution of all states in your solution, and grades the solution accordingly. How do you calculate the expected distribution? That's quite a bit of statistics work..
First you need to know the distribution of all states (single, married.. as mentioned above) in the population,
then you need to know the distribution of ages and genders in the population,
and last thing you need to know - the distribution of ages and genders in your population.
Only then, according to those three, can you calculate how many people you expect to be in each state.. And then you can measure the distance between what you expected and what you got... That is a lot of typing.. Sorry for the general parts...
I have N people who must each take T exams. Each exam takes "some" time, e.g. 30 min (no such thing as finishing early). Exams must be performed in front of an examiner.
I need to schedule each person to take each exam in front of an examiner within an overall time period but avoiding a lunch break, using the minimum number of examiners for the minimum amount of time (i.e. no/minimum examiners idle)
There are the following restrictions:
No person can be in 2 places at once
each person must take each exam once
no one should be examined by the same examiner twice
I realise that an optimal solution is probably NP-Complete, and that I'm probably best off using a genetic algorithm to obtain a best estimate (similar to this? Seating plan software recommendations (does such a beast even exist?)).
I'm comfortable with how genetic algorithms work; what I'm struggling with is how to model the problem programmatically such that I CAN manipulate the parameters genetically.
If each exam took the same amount of time, then I'd divide the time period up into these lengths, and simply create a matrix of time slots vs examiners and drop the candidates in. However, because the times of each test are not necessarily the same, I'm a bit lost on how to approach this.
Currently I'm doing this:
make a list of all "tests" which need to take place, between every candidate and exam
start with as many examiners as there are tests
repeatedly loop over all examiners, for each one: find an unscheduled test which is eligible for the examiner (based on the restrictions)
continue until all tests that can be scheduled, are
if there are any unscheduled tests, increment the number of examiners and start again.
I'm looking for better suggestions on how to approach this, as it feels rather crude currently.
As julienaubert proposed, a solution (which I will call schedule) is a sequence of tuples (date, student, examiner, test) that covers all relevant student-test combinations (do all N students take all T tests?). I understand that a single examiner may test several students at once. Otherwise, a huge number of examiners will be required (at least one per student), which I really doubt.
Two tuples A and B conflict if
the student is the same, the test is different, and the time-period overlaps
the examiner is the same, the test is different, and the time-period overlaps
the student has already worked with the examiner on another test
Notice that tuple conflicts are different from schedule conflicts (which must additionally check for the repeated examiner problem).
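The three pairwise rules listed above can be written as a small predicate. This is a sketch under assumptions: times are integer minutes, `duration` maps each test to its length, and the `Booking` field names are invented for the example.

```python
from collections import namedtuple

Booking = namedtuple("Booking", "start student examiner test")

def overlaps(a, b, duration):
    """True if the two bookings' time periods overlap."""
    return (a.start < b.start + duration[b.test]
            and b.start < a.start + duration[a.test])

def conflict(a, b, duration):
    """Apply the three tuple-conflict rules to bookings a and b."""
    different_test = a.test != b.test
    clash = overlaps(a, b, duration)
    return (
        (a.student == b.student and different_test and clash)
        or (a.examiner == b.examiner and different_test and clash)
        # same student already paired with this examiner on another test
        or (a.student == b.student and a.examiner == b.examiner and different_test)
    )
```

Two students sitting the same test with the same examiner at the same time do not conflict here, which matches the reading that one examiner may test several students at once.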
Lower bounds:
the number E of examiners must be >= the total number of the tests of the most overworked student
total time must be greater than the total length of the tests of the most overworked student.
A simple, greedy schedule can be constructed by the following method:
1. Take the most overworked student and assign tests in random order, each with a different examiner. Some bin-packing can be used to reorder the tests so that the lunch hour is kept free. This will be a happy student: he will finish in the minimum possible time.
2. For each other student, if the student must take any test already scheduled, share time, place and examiner with the previously-scheduled test.
3. Take the most overworked student (as in: highest number of unscheduled tests), and assign tuples so that no constraints are violated, adding more time and examiners if necessary.
4. If any students have unscheduled tests, goto 2.
Improving the choices made in step 2 above is critical to improvement; this choice can form the basis for heuristic search. The above algorithm tries to minimize the number of examiners required, at the expense of student time (students may end up with one exam early on and another last thing in the day, nothing in between). However, it is guaranteed to yield legal solutions. Running it with different students can be used to generate "starting" solutions to a GA that sticks to legal answers.
In general, I believe there is no "perfect answer" to a problem like this one, because there are so many factors that must be taken into account: students would like to minimize their total time spent being examined, as would examiners; we would like to minimize the number of examiners, but there are also practical limitations to how many students we can stack in a room with a single examiner. Also, we would like to make the scheduling "fair", so nobody is clearly getting much worse than others. So the best you can hope to do is to allow these knobs to be fiddled with, the results to be known (total time, per-student happiness, per-examiner happiness, exam sizes, perceived fairness) and allow the user to explore the parameter space and make an informed choice.
I'd recommend using a SAT solver for this. Although the problem is probably NP-hard, good SAT solvers can often handle hundreds of thousands of variables. Check out Chaff or MiniSat for two examples.
Don't limit yourself to genetic algorithms prematurely, there are many other approaches.
To be more specific, genetic algorithms are only really useful if you can combine parts of two solutions into a new one. This looks rather hard for this problem, at least if there are a similar number of people and exams so that most of them interact directly.
Here is a take on how you could model it with GA.
Using your notation:
N (nr exam-takers)
T (nr exams)
Let the gene of an individual express a complete schedule of bookings.
i.e. an individual is a list of specific bookings: (i,j,t,d)
i is the i'th exam-taker [1,N]
j is the j'th examiner [1,?]
t is the t'th test the exam-taker must take [1,T]
d is the start of the exam (date+time)
evaluate using a fitness function which has the property to:
penalize (severely) for all double booked examiners
penalize for examiners idle-time
penalize for exam-takers who were not allocated within their time period
reward for each exam-taker's test which was booked within period
this function will have all the logic to determine double bookings etc.. you have the complete proposed schedule in the individual, you now run the logic knowing the time for each test at each booking to determine the fitness and you increase/decrease the score of the booking accordingly.
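A minimal sketch of such a fitness function, assuming times are integer minutes and `duration` maps each test to its length. The weights (10, 1000, one point per idle minute) are arbitrary placeholders to tune, not values from the answer.

```python
def fitness(bookings, duration, period_start, period_end):
    """bookings: list of (i, j, t, d) tuples; higher score is better."""
    score = 0
    by_examiner = {}
    for i, j, t, d in bookings:
        end = d + duration[t]
        if period_start <= d and end <= period_end:
            score += 10                    # reward: test booked within the period
        else:
            score -= 10                    # penalty: test falls outside the period
        by_examiner.setdefault(j, []).append((d, end))
    for spans in by_examiner.values():
        spans.sort()
        for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
            if s2 < e1:
                score -= 1000              # severe penalty: consecutive bookings overlap
            else:
                score -= s2 - e1           # penalty for examiner idle time
    return score
```

The other restrictions (a person in two places at once, the same examiner twice) would be added as further penalties in the same style.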
to make this work well, consider:
initiation - use as much info as you have to make "sane" bookings if it is computationally cheap.
define proper GA operators
initiating in a sane way:
randomly select d within the time period
randomly permute (1,2,..,N) and then pick the i from this (avoids duplicates), same for j and t
proper GA operators:
say you have booking a and b:
(a_i,a_j,a_t,a_d)
(b_i,b_j,b_t,b_d)
you can swap a_i and b_i and you can swap a_j and b_j and a_d and b_d, but likely no point in swapping a_t and b_t.
you can also have cycling, best illustrated with an example, if N*T = 4 a complete booking is 4 tuples and you would then cycle along i or j or d, for example cycle along i:
a_i = b_i
b_i = c_i
c_i = d_i
d_i = a_i
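The swap and cycle operators can be sketched as follows, assuming an individual is a list of `(i, j, t, d)` tuples and that the four assignments above happen simultaneously (a rotation), not one after the other. The function names are invented for the example.

```python
def swap_field(bookings, a, b, field):
    """Swap one field (0 = i, 1 = j, 3 = d) between bookings number a and b."""
    result = [list(x) for x in bookings]
    result[a][field], result[b][field] = result[b][field], result[a][field]
    return [tuple(x) for x in result]

def cycle_field(bookings, field):
    """Rotate one field along the whole individual: the first booking takes the
    second's value, the second takes the third's, the last takes the first's."""
    values = [x[field] for x in bookings]
    rotated = values[1:] + values[:1]
    return [x[:field] + (v,) + x[field + 1:] for x, v in zip(bookings, rotated)]
```

Both operators return a new list, so the parent individual stays untouched and the fitness function decides whether the child is kept.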
You might also consider constraint programming. Check out Prolog or, for a more modern expression of logic programming, PyKE