how to tackle this combinatorial algorithm problem - algorithm

I have N people who must each take T exams. Each exam takes "some" time, e.g. 30 min (no such thing as finishing early). Exams must be performed in front of an examiner.
I need to schedule each person to take each exam in front of an examiner within an overall time period but avoiding a lunch break, using the minimum number of examiners for the minimum amount of time (i.e. no/minimum examiners idle)
There are the following restrictions:
No person can be in 2 places at once
each person must take each exam once
noone should be examined by the same examiner twice
I realise that an optimal solution is probably NP-Complete, and that I'm probably best off using a genetic algorithm to obtain a best estimate (similar to this? Seating plan software recommendations (does such a beast even exist?)).
I'm comfortable with how genetic algorithms work, what i'm struggling with is how to model the problem programatically such that i CAN manipulate the parameters genetically..
If each exam took the same amount of time, then i'd divide the time period up into these lengths, and simply create a matrix of time slots vs examiners and drop the candidates in. However because the times of each test are not necessarily the same, i'm a bit lost on how to approach this.
currently im doing this:
make a list of all "tests" which need to take place, between every candidate and exam
start with as many examiners as there are tests
repeatedly loop over all examiners, for each one: find an unscheduled test which is eligible for the examiner (based on the restrictions)
continue until all tests that can be scheduled, are
if there are any unscheduled tests, increment the number of examiners and start again.
i'm looking for better suggestions on how to approach this, as it feels rather crude currently.

As julienaubert proposed, a solution (which I will call schedule) is a sequence of tuples (date, student, examiner, test) that covers all relevant student-test combinations (do all N students take all T tests?). I understand that a single examiner may test several students at once. Otherwise, a huge number of examiners will be required (at least one per student), which I really doubt.
Two tuples A and B conflict if
the student is the same, the test is different, and the time-period overlaps
the examiner is the same, the test is different, and the time-period overlaps
the student has already worked with the examiner on another test
Notice that tuple conflicts are different from schedule conflicts (which must additionally check for the repeated examiner problem).
Lower bounds:
the number E of examiners must be >= the total number of the tests of the most overworked student
total time must be greater than the total length of the tests of the most overworked student.
A simple, greedy schedule can be constructed by the following method:
Take the most overworked student and
assigning tests in random order,
each with a different examiner. Some
bin-packing can be used to reorder
the tests so that the lunch hour is
kept free. This will be a happy
student: he will finish in the
minimum possible time.
For each other student, if the student must take any test already scheduled, share time, place and examiner with the previously-scheduled test.
Take the most overworked student (as in: highest number of unscheduled tests), and assign tuples so that no constraints are violated, adding more time and examiners if necessary
If any students have unscheduled tests, goto 2.
Improving the choices made in step 2 above is critical to improvement; this choice can form the basis for heuristic search. The above algorithm tries to minimize the number of examiners required, at the expense of student time (students may end up with one exam early on and another last thing in the day, nothing in between). However, it is guaranteed to yield legal solutions. Running it with different students can be used to generate "starting" solutions to a GA that sticks to legal answers.
In general, I believe there is no "perfect answer" to a problem like this one, because there are so many factors that must be taken into account: students would like to minimize their total time spent examining themselves, as would examiners; we would like to minimize the number of examiners, but there are also practical limitations to how many students we can stack in a room with a single examiner. Also, we would like to make the scheduling "fair", so nobody is clearly getting much worse than others. So the best you can hope to do is to allow these knobs to be fiddled with, the results to be known (total time, per-student happiness, per-examiner happiness, exam sizes, perceived fairness) and allow the user to explore the parameter space and make an informed choice.

I'd recommend using a SAT solver for this. Although the problem is probably NP-hard, good SAT solvers can often handle hundreds of thousands of variables. Check out Chaff or MiniSat for two examples.

Don't limit yourself to genetic algorithms prematurely, there are many other approaches.
To be more specific, genetic algorithms are only really useful if you can combine parts of two solutions into a new one. This looks rather hard for this problem, at least if there are a similar number of people and exams so that most of them interact directly.

Here is a take on how you could model it with GA.
Using your notation:
N (nr exam-takers)
T (nr exams)
Let the gene of an individual express a complete schedule of bookings.
i.e. an individual is a list of specific bookings: (i,j,t,d)
i is the i'th exam-taker [1,N]
j is the j'th examiner [1,?]
t is the t'th test the exam-taker must take [1,T]
d is the start of the exam (date+time)
evaluate using a fitness function which has the property to:
penalize (severly) for all double booked examiners
penalize for examiners idle-time
penalize for exam-takers who were not allocated within their time period
reward for each exam-taker's test which was booked within period
this function will have all the logic to determine double bookings etc.. you have the complete proposed schedule in the individual, you now run the logic knowing the time for each test at each booking to determine the fitness and you increase/decrease the score of the booking accordingly.
to make this work well, consider:
initiation - use as much info you have to make "sane" bookings if it is computationally cheap.
define proper GA operators
initiating in a sane way:
random select d within the time period
randomly permute (1,2,..,N) and then pick the i from this (avoids duplicates), same for j and t
proper GA operators:
say you have booking a and b:
(a_i,a_j,a_t,a_d)
(b_i,b_j,b_t,b_d)
you can swap a_i and b_i and you can swap a_j and b_j and a_d and b_d, but likely no point in swapping a_t and b_t.
you can also have cycling, best illustrated with an example, if N*T = 4 a complete booking is 4 tuples and you would then cycle along i or j or d, for example cycle along i:
a_i = b_i
b_i = c_i
c_i = d_i
d_i = a_i

You might also consider constraint programming. Check out Prolog or, for a more modern expression of logic programming, PyKE

Related

Creating a scheduling algorithm based on preferences

Currently, we are attempting to build a matching application that matches students to alumni for an event. The event consists of multiple time-slots, in each of which each of the students can be assigned to one alumnus.
For each time-slot, the alumni have a maximum and a minimum number of students that should be assigned to them, and the students have a minimum number of time-slots in which they should be assigned to an alumnus. Students should also never be assigned twice to the same alumnus. Here's the real kicker though: the students can submit a ranked list of preferences for the event which contains a ranking of alumni they would like to talk to.
The algorithm has to create a schedule that contains a "fair" distribution of the students over the alumni and the time-slots.
We have already concluded that we will probably not be able to get an optimal solution, so we want to try to use Local Search to get somewhat of a "fair" schedule. However, to run Local Search we first need to create some random (but valid!) schedules with the capacities and constraints in mind. This "random fill" algorithm is where we run into some issues, since we can not figure out how to create a non-deterministic algorithm that, with the above constraints, creates even a random schedule.
We have tried converting the problem to a flow problem but the resulting graph is way too large to solve in a reasonable amount of time, and we have tried some sort of FCFS approach but there are always edge cases in which conflicts exist that require the algorithm to go into a recursive loop that could take such a long time that we might as well brute force the schedule.
While we cannot figure anything out ourselves, we feel like there must be some similar problem that can be solved with an algorithm that has already been found before. If anyone has any experience with a problem similar to this, we would love their help.
I would recommend a simple greedy approach. Whenever you are assigning a student, assign the student to their best slot. Where best is defined as follows:
If any desired alumnus has not reached a minimum in some time slot that the student is available, then the most desired such alumnus, in the available slot that is farthest from reaching a minimum.
If any desired alumnus has not reached a maximum in some time slot that the student is available, then the most desired such alumnus, in the available slot that is farthest from reaching a maximum.
If the student has not reached their minimum, all alumni in all available slots have hit a maximum, and any student in those slots has passed their minimum, then bump an assigned student. Choose to bump a student based on this student's preference minus that student's preference is as high as possible (ie it is easier to bump someone from their 5th slot than their 1st), and break ties by bumping the student who is farthest past their minimum.
No assignment this time.
Assign all students in random order, and try to assign them. Scramble them again, and repeat. Stop when there is a pass with no assignments at all.
This algorithm is not guaranteed to find a best solution, or find a solution. But within reasonable constraints, it is very likely find a decent solution in polynomial time.

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to
perform:
For each delivery, the pickup needs to come before the
dropoff.
The vehicle is quite small and the packages vary in size.
The total carriage cannot exceed some upper bound (e.g. 1 cubic
metre). Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution:
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to be actually the Knapsack Problem, rather then the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.

How to match students to their preferred sections efficiently?

Say you have a class with 5 sections: A,B,C,D,E. Each section meets at different times, thus students registering for the course will have preference for which section they will take (they can only take one section). When students register for the course, they list 3 sections they would prefer to take, in order of preference.
Each section has n students. Let's say for simplicity that exactly n*5 students have registered for the course.
So, the question is: How do you efficiently match students to their preferred section?
I've seen some questions with similar matching scenario questions, but none quite fit and I'm afraid I don't know enough about algorithms to make up my own. BTW, this is a real problem and I know the department in question takes a few days to do it by hand.
To determine whether each student can be assigned to a preferred section, construct an integer-valued maximum flow in the following network, where the three Xs stand for capacity-1 arcs from students to the sections they prefer (polynomial-time via, e.g., the push-relabel algorithm). There's a solution if and only if the maximum flow moves m = n*5 units; then the assignments are determined by which arcs from each student is saturated.
capacity-1 arcs capacity-n arcs
| |
v v
student 1
/ student 2 section1
/ . X section2 \
source < . X section3 > sink
\ . X section4 /
\ student m-1 section5
student m
To take the order of preference into account, switch to solving a min-cost flow problem, still poly-time solvable (though you may find the network simplex mode of a general-purpose LP solver easier to use) which allows a cost to specified for each arc. Choose a score for each preference level depending on what you think is fair.
I'm positive that this has been asked before, but scheduling problems are like snowflakes, and I can't find the old question by keywords alone.
Maybe you could randomly distribute them into sections. Next you select random pairs of student and consider if swapping them improves the distribution (does it increase the match with their preferences?). You can iterate until there is no improvement possible for X iterations.
This is obviously a very naive approach but if your sample is small it might converge quickly. You cannot guarantee you have the optimal solution, but therefore you'd need a brute force approach which is probably not possible.
Is there a scoring system in which if student 1 is in section A the score is 20? (on the other hand if student 2 is in section A, score is 15?
I'm asking since if there's only one spot left for section A, and both student 1 and 2 has section A most preferred, then who ever gets registered first gets the spot. Instead of who ever is best fit (higher score).
If there is no scoring, you can just loop through the students and put them in the section they prefer. If the first one is full, try their second preference, then the next. If all three sections the student prefers are filled, just enroll them to one that isn't filled.
(It'd be different if there is scoring since you have to go with a priority queue for each section and maximize that.)

algorithm for solving resource allocation problems

Hi I am building a program wherein students are signing up for an exam which is conducted at several cities through out the country. While signing up students provide a list of three cities where they would like to give the exam in order of their preference. So a student may say his first preference for an exam centre is New York followed by Chicago followed by Boston.
Now keeping in mind that as the exam centres have limited capacity they cannot accomodate each students first choice .We would however try and provide as many students either their first or second choice of centres and as far as possible avoid students having to give the third choice centre to a student
Now any ideas of a sorting algorithm that would mke this process more efficent.The simple way to do this would be to first go through the list of first choice of students allot as many as possible then go through the list of second choices and allot. However this may lead to the students who are first in the list getting their first centre and the last students getting their third choice or worse none of their choices. Anything that could make this more efficient
Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
There are a class of algorithms that address this allocating of limited resources called auctions. Basically in this case each student would get a certain amount of money (a number they can spend), then your software would make bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

What are some examples of problems well suited for Integer Linear Programming?

I've always been writing software to solve business problems. I came across about LIP while I was going through one of the SO posts. I googled it but I am unable to relate how I can use it to solve business problems. Appreciate if some one can help me understand in layman terms.
ILP can be used to solve essentially any problem involving making a bunch of decisions, each of which only has several possible outcomes, all known ahead of time, and in which the overall "quality" of any combination of choices can be described using a function that doesn't depend on "interactions" between choices. To see how it works, it's easiest to restrict further to variables that can only be 0 or 1 (the smallest useful range of integers). Now:
Each decision requiring a yes/no answer becomes a variable
The objective function should describe the thing we want to maximise (or minimise) as a weighted combination of these variables
You need to find a way to express each constraint (combination of choices that cannot be made at the same time) using one or more linear equality or inequality constraints
Example
For example, suppose you have 3 workers, Anne, Bill and Carl, and 3 jobs, Dusting, Typing and Packing. All of the people can do all of the jobs, but they each have different efficiency/ability levels at each job, so we want to find the best task for each of them to do to maximise overall efficiency. We want each person to perform exactly 1 job.
Variables
One way to set this problem up is with 9 variables, one for each combination of worker and job. The variable x_ad will get the value 1 if Anne should Dust in the optimal solution, and 0 otherwise; x_bp will get the value 1 if Bill should Pack in the optimal solution, and 0 otherwise; and so on.
Objective Function
The next thing to do is to formulate an objective function that we want to maximise or minimise. Suppose that based on Anne, Bill and Carl's most recent performance evaluations, we have a table of 9 numbers telling us how many minutes it takes each of them to perform each of the 3 jobs. In this case it makes sense to take the sum of all 9 variables, each multiplied by the time needed for that particular worker to perform that particular job, and to look to minimise this sum -- that is, to minimise the total time taken to get all the work done.
Constraints
The final step is to give constraints that enforce that (a) everyone does exactly 1 job and (b) every job is done by exactly 1 person. (Note that actually these steps can be done in any order.)
To make sure that Anne does exactly 1 job, we can add the constraint that x_ad + x_at + x_ap = 1. Similar constraints can be added for Bill and Carl.
To make sure that exactly 1 person Dusts, we can add the constraint that x_ad + x_bd + x_cd = 1. Similar constraints can be added for Typing and Packing.
Altogether there are 6 constraints. You can now supply this 9-variable, 6-constraint problem to an ILP solver and it will spit back out the values for the variables in one of the optimal solutions -- exactly 3 of them will be 1 and the rest will be 0. The 3 that are 1 tell you which people should be doing which job!
ILP is General
As it happens, this particular problem has a special structure that allows it to be solved more efficiently using a different algorithm. The advantage of using ILP is that variations on the problem can be easily incorporated: for example if there were actually 4 people and only 3 jobs, then we would need to relax the constraints so that each person does at most 1 job, instead of exactly 1 job. This can be expressed simply by changing the equals sign in each of the 1st 3 constraints into a less-than-or-equals sign.
First, read a linear programming example from Wikipedia
Now imagine the farmer producing pigs and chickens, or a factory producing toasters and vacuums - now the outputs (and possibly constraints) are integers, so those pretty graphs are going to go all crookedly step-wise. That's a business application that is easily represented as a linear programming problem.
I've used integer linear programming before to determine how to tile n identically proportioned images to maximize screen space used to display these images, and the formalism can represent covering problems like scheduling, but business applications of integer linear programming seem like the more natural applications of it.
SO user flolo says:
Use cases where I often met it: In digital circuit design you have objects to be placed/mapped onto certain parts of a chip (FPGA-Placing) - this can be done with ILP. Also in HW-SW codesign there often arise the partition problem: Which part of a program should still run on a CPU and which part should be accelerated on hardware. This can be also solved via ILP.
A sample ILP problem will looks something like:
maximize 37∙x1 + 45∙x2
where
x1,x2,... should be positive integers
...but, there is a set of constrains in the form
a1∙x1+b1∙x2 < k1
a2∙x1+b2∙x2 < k2
a3∙x1+b3∙x2 < k3
...
Now, a simpler articulation of Wikipedia's example:
A farmer has L m² land to be planted with either wheat or barley or a combination of the two.
The farmer has F grams of fertilizer, and P grams of insecticide.
Every m² of wheat requires F1 grams of fertilizer, and P1 grams of insecticide
Every m² of barley requires F2 grams of fertilizer, and P2 grams of insecticide
Now,
Let a1 denote the selling price of wheat per 1 m²
Let a2 denote the selling price of barley per 1 m²
Let x1 denote the area of land to be planted with wheat
Let x2 denote the area of land to be planted with barley
x1,x2 are positive integers (Assume we can plant in 1 m² resolution)
So,
the profit is a1∙x1 + a2∙x2 - we want to maximize it
Because the farmer has a limited area of land: x1+x2<=L
Because the farmer has a limited amount of fertilizer: F1∙x1+F2∙x2 < F
Because the farmer has a limited amount of insecticide: P1∙x1+P2∙x2 < P
a1,a2,L,F1,F2,F,P1,P2,P - are all constants (in our example: positive)
We are looking for positive integers x1,x2 that will maximize the expression stated, given the constrains stated.
Hope it's clear...
ILP "by itself" can directly model lots of stuff. If you search for LP examples you will probably find lots of famous textbook cases, such as the diet problem
Given a set of pills, each with a vitamin content and a daily vitamin
quota, find the cheapest cocktail that matches the quota.
Many such problems naturally have instances that require varialbe to be integers (perhaps you can't split pills in half)
The really interesting stuff though is that actually a big deal of combinatorial problems reduce to LP. One of my favourites is the assignment problem
Given a set of N workers, N tasks and an N by N matirx describing how
much each worker charges for the each task, determine what task to
give to each worker in order to minimize cost.
Most solution that naturally come up have exponential complexity but there is a polynomial solution using linear programming.
When it comes to ILP, ILP has the added benefit/difficulty of being NP-complete. This means that it can be used to model a very wide range of problems (boolean satisfiability is also very popular in this regard). Since there are many good and optimized ILP solvers out there it is often viable to translate an NP-complete problem into ILP instead of devising a custom algorithm of your own.
You can apply linear program easily everywhere you want to optimize and the target function is linear. You can make schedules (I mean big, like train companies, who need to optimize the utilization of the vehicles and tracks), productions (optimize win), almost everything. Sometimes it is tricky to formulate your problem as IP and/or sometimes you meet the problem that your solution is, that you have to produce e.g. 0.345 cars for optimum win. That is of course not possible, and so you constraint even more: Your variable for the number of cars must be integer. Even when it now sounds simpler (because you have infinite less choices for your variable), its actually harder. In this moment it gets NP-hard. Which actually means you can solve ANY problem from your computer with ILP, you just have to transform it.
For you I would recommend an intro into reading some basic (I)LP stuff. From my mind I dont know any good online site (but if you goolge you will find some), as book I can recommend Linear Programming from Chvatal. It has very good examples, and describes also real use cases.
The other answers here have excellent examples. Two of the gold standards in business of using integer programming and more generally operations research are
the journal Interfaces published by INFORMS (The Institute for Operations Research and the Management Sciences)
winners of the the Franz Edelman Award for Achievement in Operations Research and the Management Sciences
Interfaces publishes research that uses operations research applied to real-world problems, and the Edelman award is a highly competitive award for business use of operations research techniques.

Resources