I am trying to design an algorithm to solve a workshop scheduling problem.
The problem is as follows:
I have to schedule a workshop consisting of a finite number of time slots and a finite number of students. Each time slot has a capacity, which must be fulfilled. Each student has a list of preferences: a set of time slots in which he/she wants to work.
For instance, picture a workshop where time slot 1,1 has capacity 3, time slot 1,2 has capacity 4, and so on.
I want to optimize my scheduling so that students have as little overhead as possible. Overhead is defined as having to "wait" between work. For instance, consider a student who can work all three time slots in a day but is only assigned to the first and the last, thus having one time slot of overhead.
Thus: my objective is to find a feasible solution that minimizes the students' total overhead.
My idea on how to solve it is to construct a graph (see below), run a max-flow algorithm to determine a feasible solution, compute the total overhead (penalty), apply some sort of randomized change to the schedule, and compute the new overhead. Repeat the last three steps an arbitrary number of times to find a local minimum.
Graph:
Edit: Explanation of the graph: the graph above shows 5 students, each of whom can be connected to up to 3 time slots. An edge between a student and a time slot means that the student wants to work in that specific time slot; each of these edges has a capacity of 1. The source node S has an edge to every student with infinite capacity, because each student can work as much as he/she wants. Each edge from a time slot to the sink node T has the capacity of that specific time slot.
Pseudocode:
function solve-this(timeslots, students) {
    create graph from time slots and students
    run Ford-Fulkerson max-flow algorithm to determine a feasible solution
    current solution = this feasible solution
    while(local optimal solution not found) {
        change schedule based on some sort of randomization factor
        compute penalty / overhead of the changed schedule
        if(computed penalty < penalty of current solution) {
            current solution = changed schedule
        }
    }
    return current solution
}
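For the max-flow step, here is a minimal sketch of the graph construction described above, assuming the networkx library (any max-flow implementation would do; the node names and dictionary shapes are made up for illustration):

    import networkx as nx  # assumed dependency; any max-flow library works

    def feasible_assignment(slot_capacity, preferences):
        """slot_capacity: {slot: capacity}; preferences: {student: [slots]}."""
        G = nx.DiGraph()
        for student, slots in preferences.items():
            # S -> student: omitting 'capacity' means infinite capacity in networkx
            G.add_edge('S', student)
            for slot in slots:
                G.add_edge(student, slot, capacity=1)  # student wants this slot
        for slot, cap in slot_capacity.items():
            G.add_edge(slot, 'T', capacity=cap)        # slot's capacity, to the sink
        flow_value, flow = nx.maximum_flow(G, 'S', 'T')
        if flow_value < sum(slot_capacity.values()):
            return None                                # capacities cannot be fulfilled
        # read the schedule off the flow: student -> assigned slots
        return {s: [v for v, f in flow[s].items() if f == 1] for s in preferences}

The local-search loop then only needs a way to perturb this assignment (e.g. swap two students between slots they both want) and a function that scans each student's day for gaps to compute the overhead.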
Is there any smarter way to do this? Is this the best way to find a somewhat optimal solution without checking every possible schedule?
Any help, comments and/or feedback is greatly appreciated.
Ok, this is a real-life problem and I'm trying to find an algorithm (if one exists) or even the scientific name for the problem. The problem goes as follows:
We have N time slots, and we know N tuples in the form (start, end) marking the start and end timestamps of each time slot.
We also have N events which we want to match to the N time slots. Each event goes to exactly one time slot and each time slot gets exactly one event. All events must be matched to a time slot. Each event i has P(i) participants.
What the algorithm should do is match the events to the time slots so that the cumulative time between consecutive events is maximized. The cumulative time is the total time waited by all participants combined (so that the mean participant has the maximum time between two events, if that makes sense).
The real-life application is ordering exam subjects at a university at the end of a semester, so that the subjects are spread out as much as possible, weighted by the number of students who have to study for each subject, so that the mean student has the maximum possible time to study between subjects.
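To make the objective concrete, here is a minimal brute-force sketch. Weighting each gap by the participants of the later event is my reading of "cumulative time waited", so treat that interpretation as an assumption:

    from itertools import permutations

    def total_waiting(slots, events, assignment):
        """slots: list of (start, end); events[i]: participant count of event i;
        assignment[k]: index of the event placed in slot k."""
        order = sorted(range(len(slots)), key=lambda k: slots[k][0])
        total = 0
        for prev, nxt in zip(order, order[1:]):
            gap = slots[nxt][0] - slots[prev][1]    # idle time between the two slots
            total += gap * events[assignment[nxt]]  # weighted by who waits for it
        return total

    def best_assignment(slots, events):
        # brute force over all N! matchings; only viable for small N
        return max(permutations(range(len(events))),
                   key=lambda a: total_waiting(slots, events, a))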
For further clarifications, please leave a comment.
Thanks!
Assuming all processes arrive at the same time, shortest job first seems to be optimal in terms of minimizing the average turnaround time. I also managed to prove that.
However, when processes arrive at different times, I feel the optimal algorithm would be Shortest Remaining Time (preemptive shortest job first), but I can't find a way to prove it. Can someone help me or point me to a solution? Or am I flat-out wrong?
http://en.wikipedia.org/wiki/Shortest_remaining_time
You can run one process at a time. No context switch time.
EDIT:
Say we have n processes.
Each process i has an execution time P(i), 1 <= i <= n.
Each process becomes available for execution at a specific release time R(i).
Each process finishes at some time C(i) (its completion time), depending on when it started running, whether it was suspended, etc.
All times are integers. There is no specific example; I just have to find an algorithm that makes the average turnaround time ((C(1)+C(2)+...+C(n))/n) as low as possible for any given input.
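Since the R(i) are fixed, minimizing the average of the C(i) above is equivalent to minimizing the average of C(i) - R(i). One way to probe the conjecture is to simulate SRTF on small random instances and compare against brute-force optimal schedules; a minimal integer-time SRTF simulator (a sketch, reusing the P/R names from above):

    import heapq

    def srtf_avg_turnaround(P, R):
        """P[i]: execution time, R[i]: release time; returns avg of C(i) - R(i)."""
        n = len(P)
        order = sorted(range(n), key=lambda i: R[i])   # jobs by release time
        ready, total, t, k, done = [], 0, 0, 0, 0
        while done < n:
            while k < n and R[order[k]] <= t:          # release newly arrived jobs
                heapq.heappush(ready, (P[order[k]], order[k]))
                k += 1
            if not ready:                              # idle until the next arrival
                t = R[order[k]]
                continue
            rem, i = heapq.heappop(ready)              # shortest remaining time first
            t += 1                                     # run i one unit, then re-check
            if rem == 1:
                total += t - R[i]                      # job done: add C(i) - R(i)
                done += 1
            else:
                heapq.heappush(ready, (rem - 1, i))    # preempt if a shorter job exists
        return total / n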
This is an interview question
There is an airline company that wants to provide new updates to all of its flight attendants through meetings. The i-th flight attendant has a working time from start time s_i to end time e_i. Design an algorithm that minimizes the number of meetings the company has to hold.
My approach is to pick the flight attendant who has the smallest end time, then delete all attendants whose start time is <= this end time (because they already get the updates from that meeting), and continue until there are no more flight attendants to select. The airline should hold the meetings at the end times of the attendants I picked.
Is this a correct approach? If so, how can I prove its correctness?
I think the complexity is O(n log n), since I first sort the list in ascending order of end time and then go through it once.
To my understanding, the described algorithm yields an optimal solution, by the following argument. Fix an instance and an optimal solution for it; let t be the earliest end time of any working period. If a meeting is scheduled at t-1, all working periods starting earlier than t can be served by this meeting, so any optimal solution using more than one meeting up to time t could be improved. On the other hand, there must be at least one meeting up to time t-1, since otherwise some working periods could not be served.
After deletion of the served working periods, we obtain a smaller instance of the same problem. Applying the above argument iteratively, a minimum number of meetings is obtained.
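A minimal sketch of the described greedy, for reference (sort by end time, hold a meeting at each selected end time):

    def min_meetings(attendants):
        """attendants: list of (s, e) working intervals; returns meeting times."""
        meetings = []
        last = None
        for s, e in sorted(attendants, key=lambda iv: iv[1]):  # ascending end time
            if last is None or s > last:   # not covered by the last meeting held
                last = e                   # hold a meeting at this end time
                meetings.append(e)
        return meetings

Sorting dominates, giving the O(n log n) bound mentioned in the question.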
I have N people who must each take T exams. Each exam takes "some" time, e.g. 30 min (no such thing as finishing early). Exams must be performed in front of an examiner.
I need to schedule each person to take each exam in front of an examiner within an overall time period but avoiding a lunch break, using the minimum number of examiners for the minimum amount of time (i.e. no, or minimal, examiner idle time).
There are the following restrictions:
No person can be in 2 places at once
each person must take each exam once
no one should be examined by the same examiner twice
I realise that finding an optimal solution is probably NP-hard, and that I'm probably best off using a genetic algorithm to obtain a best estimate (similar to this? Seating plan software recommendations (does such a beast even exist?)).
I'm comfortable with how genetic algorithms work; what I'm struggling with is how to model the problem programmatically such that I CAN manipulate the parameters genetically.
If each exam took the same amount of time, I'd divide the time period into slots of that length, create a matrix of time slots vs. examiners, and simply drop the candidates in. However, because the tests do not all necessarily take the same time, I'm a bit lost on how to approach this.
Currently I'm doing this (a rough sketch in code follows the list):
make a list of all "tests" which need to take place, one for every candidate/exam pair
start with as many examiners as there are tests
repeatedly loop over all examiners; for each one, find an unscheduled test which is eligible for that examiner (based on the restrictions)
continue until all tests that can be scheduled have been scheduled
if there are any unscheduled tests, increment the number of examiners and start again.
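Here is that loop as a rough Python sketch; the eligible() predicate, which would have to encode all the restrictions, is left abstract (an assumption, not an implementation):

    import itertools

    def crude_schedule(candidates, exams, eligible):
        """Sketch of the loop above; eligible(test, examiner, schedule) is an
        assumed predicate encoding the three restrictions."""
        tests = set(itertools.product(candidates, exams))  # one test per pair
        n_examiners = len(tests)                           # start with one per test
        while True:
            schedule = {ex: [] for ex in range(n_examiners)}
            unscheduled, progress = set(tests), True
            while unscheduled and progress:
                progress = False
                for ex in schedule:                        # loop over all examiners
                    for test in list(unscheduled):
                        if eligible(test, ex, schedule):
                            schedule[ex].append(test)
                            unscheduled.discard(test)
                            progress = True
                            break                          # move to the next examiner
            if not unscheduled:
                return schedule
            n_examiners += 1                               # add an examiner, start again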
I'm looking for better suggestions on how to approach this, as it feels rather crude currently.
As julienaubert proposed, a solution (which I will call a schedule) is a sequence of tuples (date, student, examiner, test) that covers all relevant student-test combinations (do all N students take all T tests?). I understand that a single examiner may test several students at once; otherwise, a huge number of examiners would be required (at least one per student), which I really doubt.
Two tuples A and B conflict if
the student is the same, the test is different, and the time-period overlaps
the examiner is the same, the test is different, and the time-period overlaps
the student has already worked with the examiner on another test
Notice that tuple conflicts are different from schedule conflicts (which must additionally check for the repeated examiner problem).
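As a sketch, the conflict checks might look like this (the Booking type and the duration() helper are assumptions for illustration):

    from collections import namedtuple

    Booking = namedtuple('Booking', 'date student examiner test')

    def overlaps(a, b, duration):
        """Time-period overlap, with duration(test) giving each test's length."""
        return a.date < b.date + duration(b.test) and b.date < a.date + duration(a.test)

    def tuples_conflict(a, b, duration):
        if a.test == b.test:
            return False  # sharing time and examiner on the same test is allowed
        same_actor = a.student == b.student or a.examiner == b.examiner
        return same_actor and overlaps(a, b, duration)

    def schedule_conflicts(schedule, duration):
        """Pairwise tuple conflicts plus the repeated-examiner rule."""
        pairs = [(x.student, x.examiner) for x in schedule]
        if len(pairs) != len(set(pairs)):   # same student-examiner pair on two tests
            return True
        return any(tuples_conflict(a, b, duration)
                   for i, a in enumerate(schedule) for b in schedule[i + 1:])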
Lower bounds:
the number E of examiners must be at least the number of tests of the most overworked student (by the no-repeated-examiner rule, that student needs a distinct examiner per test)
the total time must be at least the total length of the tests of the most overworked student.
A simple, greedy schedule can be constructed by the following method:

1. Take the most overworked student and assign his tests in random order, each with a different examiner. Some bin-packing can be used to reorder the tests so that the lunch hour is kept free. This will be a happy student: he will finish in the minimum possible time.
2. For each other student, if the student must take any test already scheduled, share time, place and examiner with the previously-scheduled test.
3. Take the most overworked student (as in: highest number of unscheduled tests) and assign tuples so that no constraints are violated, adding more time and examiners if necessary.
4. If any students have unscheduled tests, go to 2.
Improving the choices made in step 2 above is critical to improvement; this choice can form the basis for heuristic search. The above algorithm tries to minimize the number of examiners required, at the expense of student time (students may end up with one exam early on and another last thing in the day, nothing in between). However, it is guaranteed to yield legal solutions. Running it with different students can be used to generate "starting" solutions to a GA that sticks to legal answers.
In general, I believe there is no "perfect answer" to a problem like this one, because there are so many factors that must be taken into account: students would like to minimize their total time spent being examined, as would examiners; we would like to minimize the number of examiners, but there are also practical limitations on how many students we can stack in a room with a single examiner. Also, we would like to make the scheduling "fair", so nobody clearly gets a much worse deal than others. So the best you can hope to do is to allow these knobs to be fiddled with, make the results known (total time, per-student happiness, per-examiner happiness, exam sizes, perceived fairness), and allow the user to explore the parameter space and make an informed choice.
I'd recommend using a SAT solver for this. Although the problem is probably NP-hard, good SAT solvers can often handle hundreds of thousands of variables. Check out Chaff or MiniSat for two examples.
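For a feel of the encoding, here is a toy slot-assignment instance using the python-sat bindings (my choice for illustration; the solvers named above also accept plain DIMACS files). Variable v(t, s) is true iff test t is held in slot s:

    from pysat.solvers import Minisat22   # assumed: the python-sat package

    T, S = 3, 3                            # toy size: 3 tests, 3 slots
    v = lambda t, s: t * S + s + 1         # map (test, slot) to a DIMACS variable

    with Minisat22() as solver:
        for t in range(T):
            solver.add_clause([v(t, s) for s in range(S)])       # each test gets a slot
            for s1 in range(S):
                for s2 in range(s1 + 1, S):
                    solver.add_clause([-v(t, s1), -v(t, s2)])    # ...at most one slot
        for s in range(S):
            for t1 in range(T):
                for t2 in range(t1 + 1, T):
                    solver.add_clause([-v(t1, s), -v(t2, s)])    # one test per slot
        if solver.solve():
            model = solver.get_model()     # positive literals = chosen (test, slot) pairs
            print([x for x in model if x > 0])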
Don't limit yourself to genetic algorithms prematurely, there are many other approaches.
To be more specific, genetic algorithms are only really useful if you can combine parts of two solutions into a new one. This looks rather hard for this problem, at least if there are a similar number of people and exams so that most of them interact directly.
Here is a take on how you could model it with a GA.
Using your notation:
N (nr exam-takers)
T (nr exams)
Let the gene of an individual express a complete schedule of bookings.
i.e. an individual is a list of specific bookings: (i,j,t,d)
i is the i'th exam-taker [1,N]
j is the j'th examiner [1,?]
t is the t'th test the exam-taker must take [1,T]
d is the start of the exam (date+time)
evaluate using a fitness function which has the following properties:
penalize (severely) all double-booked examiners
penalize examiner idle time
penalize exam-takers who were not allocated within their time period
reward each exam-taker's test which was booked within the period
This function will have all the logic to determine double bookings etc. You have the complete proposed schedule in the individual; since you know the time of each test at each booking, you can run the logic to determine the fitness, increasing/decreasing the score of each booking accordingly.
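As a sketch, such a fitness function over a list of (i, j, t, d) bookings might look like this (the duration() helper and the penalty weights in w are assumptions):

    def fitness(schedule, duration, period, w):
        """schedule: list of (i, j, t, d) bookings as above; higher = fitter."""
        score = 0.0
        # penalize (severely) double-booked examiners: same j, overlapping times
        for k, (_, j1, t1, d1) in enumerate(schedule):
            for _, j2, t2, d2 in schedule[k + 1:]:
                if j1 == j2 and d1 < d2 + duration(t2) and d2 < d1 + duration(t1):
                    score -= w['double_booking']
        # penalize examiner idle time: span of the day minus time spent testing
        by_examiner = {}
        for _, j, t, d in schedule:
            by_examiner.setdefault(j, []).append((d, d + duration(t)))
        for spans in by_examiner.values():
            busy = sum(end - start for start, end in spans)
            span = max(e for _, e in spans) - min(s for s, _ in spans)
            score -= w['idle'] * (span - busy)
        # reward in-period bookings, penalize out-of-period ones
        for _, _, t, d in schedule:
            if period[0] <= d and d + duration(t) <= period[1]:
                score += w['in_period']
            else:
                score -= w['out_of_period']
        return score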
to make this work well, consider:
initialization - use as much info as you have to make "sane" bookings, if it is computationally cheap.
define proper GA operators
Initializing in a sane way:
random select d within the time period
randomly permute (1,2,...,N) and then pick i from this (avoids duplicates); do the same for j and t
proper GA operators:
say you have bookings a and b:
(a_i,a_j,a_t,a_d)
(b_i,b_j,b_t,b_d)
You can swap a_i and b_i, you can swap a_j and b_j, and likewise a_d and b_d, but there is likely no point in swapping a_t and b_t.
You can also use cycling, best illustrated with an example: if N*T = 4, a complete schedule is 4 tuples, and you would cycle along i, j, or d. For example, cycling along i (with bookings a, b, c, d):
a_i = b_i
b_i = c_i
c_i = d_i
d_i = a_i
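As a sketch, with bookings represented as mutable [i, j, t, d] lists, the two operators might look like this:

    import random

    FIELD = {'i': 0, 'j': 1, 't': 2, 'd': 3}

    def swap_field(schedule, field):
        """Swap one field between two random bookings (swapping 't' has
        little point, as noted above)."""
        a, b = random.sample(range(len(schedule)), 2)
        f = FIELD[field]
        schedule[a][f], schedule[b][f] = schedule[b][f], schedule[a][f]

    def cycle_field(schedule, field):
        """Cycle a field along all bookings: a <- b, b <- c, ..., last <- first."""
        f = FIELD[field]
        values = [booking[f] for booking in schedule]
        for booking, v in zip(schedule, values[1:] + values[:1]):
            booking[f] = v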
You might also consider constraint programming. Check out Prolog or, for a more modern expression of logic programming, PyKE
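For a quick taste of the constraint-programming route, here is a toy model using the python-constraint package (my substitution for illustration; PyKE and Prolog would express it differently):

    from constraint import Problem, AllDifferentConstraint  # python-constraint package

    # Toy model: one student, three exams, six half-hour slots, slot 3 = lunch.
    problem = Problem()
    exams = ['math', 'physics', 'chemistry']
    for exam in exams:
        problem.addVariable(exam + '_slot', [0, 1, 2, 4, 5])     # lunch slot excluded
        problem.addVariable(exam + '_examiner', ['A', 'B', 'C'])
    # the student cannot be in two places at once
    problem.addConstraint(AllDifferentConstraint(), [e + '_slot' for e in exams])
    # no one is examined by the same examiner twice
    problem.addConstraint(AllDifferentConstraint(), [e + '_examiner' for e in exams])
    print(problem.getSolutions()[:3])   # a few feasible schedules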