Dynamic Programming Scheduler in Prolog

I'm trying to create a simple scheduler in Prolog that takes a bunch of courses along with the semesters they're offered and a user's ranking of the courses. These inputs get turned into facts like
course('CS 4812','Quantum Information Processing',1.0882353,s2012).
course('Math 6110','Real Analysis I',0.5441176,f2011).
where the third entry is a score. Currently, my database is around 60 classes, but I'd like the program to eventually be able to handle more. I'm having trouble getting my DP implementation to work on a nontrivial input. The answers are correct, but the time spent is on the same order as the brute force algorithm. I handle memoization with a dynamic predicate:
:- dynamic(stored/6).
memo(Result,Schedule,F11,S12,F12,S13) :-
    (   stored(Result,Schedule,F11,S12,F12,S13)
    ->  true
    ;   dpScheduler(Result,Schedule,F11,S12,F12,S13),
        assertz(stored(Result,Schedule,F11,S12,F12,S13))
    ).
The arguments to dpScheduler are the optimal schedule (a tuple of a list of classes and its score), the classes chosen so far, and how many classes remain to be chosen for the Fall 2011, Spring 2012, Fall 2012, and Spring 2013 semesters. Once the scheduler has a complete schedule, it gets the score with evalSchedule, which just sums up the scores of the classes.
dpScheduler((Acc,X),Acc,0,0,0,0) :-
    !,
    evalSchedule(X,Acc).
I broke dpScheduler up by semester, but the clauses all look pretty much the same. Here is the clause for Fall 2011, the first semester chosen.
dpScheduler(Answer,Acc,N,B,C,D) :-
    N > 0,                               % only while Fall 2011 slots remain
    !,
    M is N - 1,
    getCourses(Courses,f2011,Acc),
    lemma([Head|Tail],Courses,Acc,M,B,C,D),
    findBest(Answer,Tail,Head).
The lemma predicate computes all the subgoals.
lemma(Results,Courses,Acc,F11,S12,F12,S13) :-
    findall(Result,
            ( member(Course,Courses),
              memo(Result,[Course|Acc],F11,S12,F12,S13) ),
            Results).
My performance has been horrendous, and I'd be grateful for any pointers on how to improve it. Also, I'm a new Prolog programmer, and I haven't spent much time reading others' Prolog code, so my program is probably unidiomatic. Any advice on that would be much appreciated as well.

There are a couple of reasons for bad performance:
First of all, assertz/1 is not very fast, so you spend a lot of time there if there are many asserts.
Then, Prolog systems typically index clauses on their first argument (SWI-Prolog uses a hash table for this). In your case, the first argument is Result, which is unbound when memo/6 looks up stored/6, so I think you pay a performance penalty for that. You could solve this by reordering the arguments. I thought that you could change the argument on which the index is built, but I don't see how in the SWI-Prolog manual :/
Also, Prolog isn't really renowned for great performance.
I suggest using XSB (if possible), which offers automatic memoization (tabling); you simply write
:- table my_predicate/42.
and it takes care of everything. I think it's a bit faster than SWI-Prolog too.
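Recent SWI-Prolog versions support the same ":- table" directive, so tabling can be tried without switching systems. The sketch below is not the asker's program. It assumes each semester can be solved on its own and that the offered courses come as a list of Code-Score pairs. The names best/3 and semester_best/3 are hypothetical; the two course/4 facts are the ones quoted in the question.

The table key is only (remaining course list, K), so there are roughly n*K distinct subproblems. By contrast, a key that contains the whole partial schedule, as the question's memo/6 does, yields roughly one subproblem per partial schedule. That leaves little to reuse, which may be why the memoized version runs about as slowly as brute force.

```prolog
:- use_module(library(lists)).

% best(+Courses, +K, -Score-Chosen): Chosen is a choice of exactly K
% courses from Courses (a list of Code-Score pairs) with maximal total
% Score. Fails if Courses has fewer than K elements.
:- table best/3.

best(_, 0, 0.0-[]).
best([Code-S|Cs], K, Best) :-
    K > 0,
    K1 is K - 1,
    % Option 1: take Code, then choose K-1 courses from the rest.
    findall(V-[Code|Chosen],
            ( best(Cs, K1, V0-Chosen), V is V0 + S ),
            Take),
    % Option 2: skip Code and choose all K courses from the rest.
    findall(Opt, best(Cs, K, Opt), Skip),
    append(Take, Skip, Options),
    max_member(Best, Options).        % fails if there is no option

% Course facts in the question's format (the two quoted in the question).
course('CS 4812',   'Quantum Information Processing', 1.0882353, s2012).
course('Math 6110', 'Real Analysis I',                0.5441176, f2011).

% semester_best(+Sem, +K, -Best): the best K courses offered in Sem.
semester_best(Sem, K, Best) :-
    findall(Code-Score, course(Code, _, Score, Sem), Courses),
    best(Courses, K, Best).
```

For example, ?- best([a-1.0, b-3.0, c-2.0], 2, Best). gives Best = 5.0-[b, c].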
Other than that, you could try to use a list with all the calculated values and pass it around; maybe an association list.
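Here is a minimal sketch of that alternative. It assumes the same kind of subproblem (pick exactly K courses from a list of Code-Score pairs), and best_memo/3 is a hypothetical name. The memo table is an AVL tree from SWI-Prolog's library(assoc). It is passed in and out of every call instead of being stored with assertz/1, so nothing global has to be cleaned up between runs.

```prolog
:- use_module(library(assoc)).
:- use_module(library(lists)).

% best_memo(+Courses, +K, -Score-Chosen): choose exactly K courses from
% Courses (a list of Code-Score pairs) with maximal total Score.
best_memo(Courses, K, Best) :-
    empty_assoc(Memo0),
    best_memo(Courses, K, Best0, Memo0, _Memo),
    Best0 \== none,
    Best = Best0.

% best_memo(+Courses, +K, -Best, +Memo0, -Memo): Best is none when no
% choice exists. Memo keys are Len-K, where Len is the length of the
% remaining suffix (within one run, a suffix is fixed by its length).
best_memo(_, 0, Best, Memo, Memo) :- !,
    Best = 0.0-[].
best_memo([], _, Best, Memo, Memo) :- !,
    Best = none.
best_memo(Courses, K, Best, Memo0, Memo) :-
    length(Courses, Len),
    (   get_assoc(Len-K, Memo0, Best0)          % already computed
    ->  Memo = Memo0
    ;   Courses = [Code-S|Cs],
        K1 is K - 1,
        best_memo(Cs, K1, Take0, Memo0, Memo1), % take Code
        best_memo(Cs, K, Skip, Memo1, Memo2),   % skip Code
        (   Take0 = V0-Chosen
        ->  V is V0 + S,
            Take = V-[Code|Chosen]
        ;   Take = none
        ),
        better(Take, Skip, Best0),
        put_assoc(Len-K, Memo2, Best0, Memo)
    ),
    Best = Best0.

% better(+A, +B, -Best): the higher-scoring option; none always loses.
better(none, B, B) :- !.
better(A, none, A) :- !.
better(A, B, Best) :- max_member(Best, [A, B]).
```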
Edit: I don't really see where you call the memoization predicate.

Related

How to make Prolog depth-first-search algorithm go deeper into the tree? (Applied to Sokoban)

I'm trying to solve the Sokoban puzzle in Prolog using a depth-first-search algorithm, but I cannot manage to search the solution tree in depth. I'm able to explore only the first level.
All the sources are on GitHub (linked at the revision current when the question was asked), so feel free to explore and test them. I divided the rules into several files:
board.pl: contains rules related to the board: directions, neighbourhoods,...
game.pl: this file states the rules about movements, valid positions,...
level1.pl: defines the board, position of the boxes and solution squares for a sample game.
sokoban.pl: tries to implement dfs :(
I know I need to go deeper when a new state is created, instead of checking whether it is the final state and backtracking... I need to keep moving; it is impossible to reach the final state with only one move.
Any help/advice will be highly appreciated; I've been playing around with it without any improvement.
Thanks!
PS: Ah! I'm working with SWI-Prolog, in case it makes any difference.
PPS: I'm really new to Prolog, and maybe I'm making an obvious mistake, but that is why I'm asking here.
This is easy to fix: In sokoban.pl, predicate solve_problem/2, you are limiting the solution to lists of a single element in the goal:
solve_dfs(Problem, Initial, [Initial], [Solution])
Instead, you probably mean:
solve_dfs(Problem, Initial, [Initial], Solution)
because a solution can consist of many moves.
In fact, an even better search strategy is often iterative deepening, which you get with:
length(Solution, _),
solve_dfs(Problem, Initial, [Initial], Solution)
Iterative deepening is a complete search strategy and an optimal strategy under quite general assumptions.
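To see why the length/2 trick helps, here is a self-contained toy version. The state graph (move/2) and the predicate names moves/2 and shortest_moves/2 are hypothetical stand-ins for the Sokoban code. The cycle between a and b would send an unbounded depth-first search around in circles. Fixing the list length first makes every individual search finite, and lengths are tried in increasing order.

```prolog
% Hypothetical toy state space: move(From, To). The cycle a <-> b would
% trap an unbounded depth-first search.
move(a, b).
move(b, a).
move(b, c).
move(c, goal).

% moves(+State, ?Moves): Moves is a list of From-To steps leading from
% State to the goal state.
moves(goal, []).
moves(S0, [S0-S|Ms]) :-
    move(S0, S),
    moves(S, Ms).

% shortest_moves(+State, -Moves): iterative deepening. length/2 yields
% lists of length 0, 1, 2, ..., so each call to moves/2 only explores
% paths of one fixed length, and the first answer is a shortest one.
shortest_moves(S0, Moves) :-
    length(Moves, _),
    moves(S0, Moves).
```

?- shortest_moves(a, M). gives M = [a-b, b-c, c-goal]. If no solution exists at all, shortest_moves/2 keeps trying longer lists forever, so in practice you may want an upper bound on the length.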
Other than that, I recommend you cut down the significant number of impure I/O calls in your program. There are just too many predicates where you write something on the screen.
Instead, focus on a clear declarative description, and cleanly separate the output from a description of what a solution looks like. In fact, let the toplevel do the printing for you: Describe what a solution looks like (you are already doing this), and let the toplevel display the solution as variable bindings. Also, think declaratively, and use better names like dfs_moves/4, problem_solution/2 instead of solve_dfs/4, solve_problem/2 etc.
DCGs may also help you describe lists more conveniently in some places of your code.
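As an illustration on a hypothetical toy state space (not the Sokoban rules), a DCG can describe the list of moves directly. phrase/2 turns the nonterminal into an ordinary list relation, and length/2 can again provide iterative deepening.

```prolog
% Hypothetical toy state space: move(From, To).
move(a, b).
move(b, a).
move(b, c).
move(c, goal).

% moves(+State): a DCG nonterminal describing a list of From-To steps
% that leads from State to the goal state.
moves(goal) --> [].
moves(S0)   --> [S0-S], { move(S0, S) }, moves(S).
```

?- length(Ms, _), phrase(moves(a), Ms). gives Ms = [a-b, b-c, c-goal].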
+1 for tackling a nice and challenging search problem with Prolog!

School Scheduling (Constrained Logic)

I'm trying to build a school scheduling program in Prolog. I want to check whether a teacher is available at a given time to teach a certain class, check the allowable time slots, etc.
Here's what I've been able to write so far:
teacher(ali, bio).
teacher(sara, math).
teacher(john, lit).
teacher(milton, arabic).
% a, b, c, d, e, f, g
timeslot(a).
timeslot(b).
% class has a name and a grade
class(bio, 1).
class(math, 1).
class(lit, 2).
class(arabic, 2).
How do I establish that a class cannot have two timeslots?
I have used a little bit of Prolog, but I'm not sure how to go about this. Any further tips and indications, like papers or similar problems that are solved more frequently, would be appreciated.
The wording of the Question suggests that a program is to be written that will produce (or at least check) a proposed class schedule.
Inputs to the program appear to be a list of teachers (and their subjects), a list of time slots, and a list of classes (and their subjects/grades).
Presumably there are several "cardinality" restrictions (sometimes called "business rules") that a proper class schedule must meet. That a class can only be given once (not in two time slots) is the point of the Question, but there are others as well, e.g. a teacher can only teach one class per time slot.
How can these restrictions be indicated? Prolog predicates do not have inherent restrictions of this kind, but they can be implemented either structurally or logically (i.e. in the program's logical checking).
An example of doing things in a structural way would be adding a field to the class predicate to represent the assigned timeslot. Some logic would be involved in how this field is assigned, to ensure that its value is a valid time slot.
An example of modeling the relationship between classes and time slots in a logical fashion would be to define an additional predicate that models the assignment of time slots to classes (presumably something similar applies to assigning classes to teachers). As an illustration, you would have a predicate class_timeslot(Class,Timeslot). The rules of your program would enforce that there is only one of these (dynamically asserted) facts per Class, and that the Timeslot value is valid.
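Here is a minimal sketch of the logical approach, reusing the Question's timeslot/1 and class/2 facts. class_timeslot/2 is the predicate suggested above, and assign_timeslot/2 is a hypothetical helper that refuses a second slot for the same class.

```prolog
% The Question's facts.
timeslot(a).
timeslot(b).
class(bio, 1).
class(math, 1).
class(lit, 2).
class(arabic, 2).

% class_timeslot(Class, Timeslot): the recorded assignments.
:- dynamic class_timeslot/2.

% assign_timeslot(+Class, +Timeslot): record the assignment only if the
% class exists, the slot is valid and the class has no slot yet, so that
% class_timeslot/2 stays a function from classes to time slots.
assign_timeslot(Class, Timeslot) :-
    class(Class, _),
    timeslot(Timeslot),
    \+ class_timeslot(Class, _),
    assertz(class_timeslot(Class, Timeslot)).
```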
Alternatively, instead of dynamic facts, the class schedule could be constructed as a list of structures similarly pairing classes and time slots. But the point is that program logic needs to implement that this pairing is a functional relationship.
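Here is a sketch of the list-based variant, again using the Question's facts. The names schedule/1, classes_timeslots/2 and no_teacher_clash/1 are hypothetical. Building the list from exactly one entry per class makes the class-to-slot relation functional by construction. The teacher rule is then checked afterwards in plain generate-and-test style.

```prolog
% The Question's facts.
teacher(ali, bio).
teacher(sara, math).
teacher(john, lit).
teacher(milton, arabic).
timeslot(a).
timeslot(b).
class(bio, 1).
class(math, 1).
class(lit, 2).
class(arabic, 2).

% schedule(-Pairs): one Class-Timeslot pair per class, so "a class
% cannot have two timeslots" holds by construction.
schedule(Pairs) :-
    findall(Class, class(Class, _), Classes),
    classes_timeslots(Classes, Pairs),
    no_teacher_clash(Pairs).

classes_timeslots([], []).
classes_timeslots([Class|Classes], [Class-Slot|Pairs]) :-
    timeslot(Slot),
    classes_timeslots(Classes, Pairs).

% A teacher may teach at most one class per time slot.
no_teacher_clash(Pairs) :-
    \+ ( member(C1-Slot, Pairs),
         member(C2-Slot, Pairs),
         C1 \== C2,
         teacher(Teacher, C1),
         teacher(Teacher, C2) ).
```

Generate-and-test like this enumerates every combination, which is exactly the exponential growth the next answer warns about.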
Two years ago I wrote a scheduling program for an assessment center and used clpfd for it, because in plain SWI-Prolog it would have been much more complicated. The problem scales exponentially with its complexity, so if you have a real school with lots of teachers, lessons, etc., this will not be efficient without constraint programming.
Please have a look at clp(fd) on the SWI-Prolog website.
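To illustrate, here is a small clp(fd) sketch over the Question's facts. A few things are assumptions made for this example:

- Slots are numbered 1..NSlots instead of the atoms a, b, ...
- Next to the teacher rule, it adds an assumed rule that two classes of the same grade cannot share a slot.
- timetable/2 and its helpers are hypothetical names.

The constraints are posted first and label/1 searches afterwards, so propagation prunes the search instead of testing complete schedules.

```prolog
:- use_module(library(clpfd)).
:- use_module(library(apply)).
:- use_module(library(lists)).

% The Question's facts: teacher(Teacher, Subject), class(Subject, Grade).
teacher(ali, bio).
teacher(sara, math).
teacher(john, lit).
teacher(milton, arabic).
class(bio, 1).
class(math, 1).
class(lit, 2).
class(arabic, 2).

% timetable(+NSlots, -Pairs): Pairs maps every class to a slot 1..NSlots.
timetable(NSlots, Pairs) :-
    findall(lesson(C, T, G), (class(C, G), teacher(T, C)), Lessons),
    same_length(Lessons, Slots),
    Slots ins 1..NSlots,
    no_clashes(Lessons, Slots),      % post all constraints first
    label(Slots),                    % then search
    maplist(lesson_slot, Lessons, Slots, Pairs).

% Post a constraint for every pair of lessons.
no_clashes([], []).
no_clashes([L|Ls], [S|Ss]) :-
    maplist(no_clash(L, S), Ls, Ss),
    no_clashes(Ls, Ss).

% Lessons sharing a teacher, or (assumed rule) a grade, need different slots.
no_clash(lesson(_, T1, G1), S1, lesson(_, T2, G2), S2) :-
    (   ( T1 == T2 ; G1 == G2 )
    ->  S1 #\= S2
    ;   true
    ).

lesson_slot(lesson(C, _, _), S, C-S).
```

?- timetable(2, P). first gives P = [bio-1, math-2, lit-1, arabic-2]; with a single slot there is no solution.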
Kind regards
solick

Efficient Mutable Graph Representation in Prolog?

I would like to represent a mutable graph in Prolog efficiently. I will be searching for subsets in the graph and replacing them with other subsets.
I've managed to get something working using the database as my 'graph storage'. For instance, I have:
:- dynamic step/2.
% step(Type, Name).
:- dynamic sequence/2.
% sequence(Step, NextStep).
I then use a few rules to retract subsets I've matched and replace them with new steps using assert. I'm really liking this method... it's easy to read and deal with, and I let Prolog do a lot of the heavy pattern-matching work.
The other way I know to represent graphs is using lists of nodes and adjacency connections. I've seen plenty of websites using this method, but I'm a bit hesitant because it's more overhead.
Execution time is important to me, as is ease-of-development for myself.
What are the pros/cons for either approach?
As usual: using the dynamic database gives you indexing, which may speed things up (on look-up) and slow you down (on asserting). In general, the dynamic database is not so good when you assert more often than you look up. The main drawback, though, is that it also significantly complicates testing and debugging, because you cannot test your predicates in isolation and need to keep the current implicit state of the database in mind. Lists of nodes and adjacency connections are a good representation in many cases. A different representation I like a lot, especially if you need to store further attributes for nodes and edges, is to use one variable for each node, and use variable attributes (get_attr/3 and put_attr/3 in SWI-Prolog) to store edges on them, for example [edge_to(E_1,N_1),edge_to(E_2,N_2),...], where the N_i are the variables representing other nodes (with their own attributes), and the E_i are also variables onto which you can attach further attributes to store additional information (weight, capacity etc.) about each edge if needed.
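Here is a minimal sketch of the attributed-variable representation in SWI-Prolog. The attribute names edges and weight, and the predicates add_edge/3 and successor/3, are arbitrary choices for illustration. Because no attr_unify_hook/2 is defined here, node variables must never be unified with each other; compare them with ==/2 instead. Attribute updates made with put_attr/3 are undone on backtracking, which suits a search that tentatively rewrites the graph.

```prolog
% add_edge(?From, ?To, +Weight): add a directed edge From -> To.
% Nodes are plain Prolog variables. A node's outgoing edges live in its
% 'edges' attribute as a list of edge_to(Edge, Target) terms, and each
% Edge is a fresh variable whose 'weight' attribute holds Weight.
add_edge(From, To, Weight) :-
    put_attr(Edge, weight, Weight),
    (   get_attr(From, edges, Es0)
    ->  true
    ;   Es0 = []
    ),
    put_attr(From, edges, [edge_to(Edge, To)|Es0]).

% successor(+From, -To, -Weight): enumerate the outgoing edges of From.
% Call it with To unbound and compare nodes with ==/2.
successor(From, To, Weight) :-
    get_attr(From, edges, Es),
    member(edge_to(Edge, To), Es),
    get_attr(Edge, weight, Weight).
```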
Have you considered using SWI-Prolog's RDF database? http://www.swi-prolog.org/pldoc/package/semweb.html
As mat said, dynamic predicates have an extra cost.
However, if you can construct the graph once and don't need to change it afterwards, you can compile the predicate, and it will be as fast as a normal (static) predicate.
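In SWI-Prolog, the built-in compile_predicates/1 does this conversion. Here is a sketch, with build_graph/0 and the edge list as hypothetical examples.

```prolog
:- dynamic edge/2.

% build_graph: assert the (hypothetical) edges once, then turn edge/2
% into an ordinary static predicate with compile_predicates/1.
build_graph :-
    retractall(edge(_, _)),
    forall(member(From-To, [a-b, b-c, c-d]),
           assertz(edge(From, To))),
    compile_predicates([edge/2]).
```

After compile_predicates/1, edge/2 is static. Any later assertz/1 or retractall/1 on it raises a permission error, so rebuilding the graph means reloading the code.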
Usually, in SWI-Prolog, clause lookup is done using hash tables on the first argument (they are resized for dynamic predicates).
Another solution is association lists (library(assoc), which implements AVL trees), where the cost of lookup etc. is O(log n).
Once you understand how they work, you could easily write an interface if needed.
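Such an interface might look like the following sketch. A graph is an AVL tree from library(assoc) that maps each node to its list of successors. The names graph_add_edge/4, graph_remove_edge/4 and graph_successor/3 are hypothetical. Each update returns a new graph, so no database state is involved.

```prolog
:- use_module(library(assoc)).
:- use_module(library(lists)).

% A graph maps each node (the assoc key) to the list of its direct
% successors. Node lookup and update are O(log n); every update returns
% a new graph and leaves the old one unchanged.

% graph_add_edge(+From, +To, +G0, -G)
graph_add_edge(From, To, G0, G) :-
    (   get_assoc(From, G0, Succs0)
    ->  true
    ;   Succs0 = []
    ),
    put_assoc(From, G0, [To|Succs0], G).

% graph_remove_edge(+From, +To, +G0, -G)
graph_remove_edge(From, To, G0, G) :-
    get_assoc(From, G0, Succs0),
    delete(Succs0, To, Succs),
    put_assoc(From, G0, Succs, G).

% graph_successor(+G, +From, ?To)
graph_successor(G, From, To) :-
    get_assoc(From, G, Succs),
    member(To, Succs).
```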
In the end, you can always use an SQL database and submit queries through the ODBC interface (although that sounds like overkill for the application you mentioned).

prolog problem; displaying same rank universities

Sorry for the blunt title, but I couldn't really generalize the question. I hope someone with Prolog experience can help me out here. I've got a database that basically lists universities and their rank, e.g. oxford(1), warwick(2), etc. The question requires me to write a rule that returns all the names of the universities that have the same rank. Thanks in advance.
I believe this is going to require a bit of meta-programming, but only a little bit. You are probably going to have to provide some feedback about my assumptions in this answer in order to get a robust solution. But I think jumping in will get you there faster (with both of us learning something along the way) than asking a sequence of clarifying comments.
Our immediate goal will be to find these "university" facts through what SWI-Prolog calls "Examining the program" (links below, but you could search for it as a section of the Manual). If we can do this, we can query those facts to get a particular rank, thus obtaining all universities of the same rank.
From what you've said, there are a number of "facts" of the form "UNIVERSITY(RANK)." Unless you've done something explicit to avoid it, they will be added to the [user] module. If they were added with assert/1 or declared with :- dynamic, they are dynamic predicates. Note, though, that facts simply consulted from a file are static in SWI-Prolog unless the file declares them dynamic, so a check for the dynamic property would miss them. Such a database is often called a "factbase". Facts here mean clauses with only a head (no body); dynamic predicates can in general have clauses with or without bodies.
SWI-Prolog has three different database mechanisms. The one we are discussing is the clause database, which is manipulated not only through consulting but also through the assert/retract meta-predicates. We will refer to these as the "dynamic" predicates.
Here's a modification of a snippet of code that Jan Wielemaker provides for generating (through backtracking) all the built-in predicates, now repurposed to generate the dynamic predicates:
generate_dynamic(Name/Arity) :-
predicate_property(user:Head, dynamic),
functor(Head, Name, Arity). % get arity: number of args
In your case you are only interested in certain dynamic predicates, so this may return too much in the way of results. One way to narrow things down is by setting Arity = 1, since your university facts only consist of predicates with a single argument.
Another way to narrow things down is by the absence of a body. If this check is needed, we can incorporate a call to clause/2 (documented on the same page linked above). If we have a "fact" (clause without a body), then the resulting call to clause/2 returns the second argument (Body) set to the atom true.
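Putting the two filters together (arity 1, body true) might look like the following sketch.

- The university facts are hypothetical and declared dynamic so that the dynamic check finds them.
- university_rank/2 is a made-up name.
- The catch/3 call skips any dynamic predicate whose clauses cannot be inspected.

Any other arity-1 dynamic fact in module user would still show up, which is the problem discussed at the end of this answer.

```prolog
% Hypothetical factbase, declared dynamic so that the check below finds
% it (facts merely consulted from a file are static by default).
:- dynamic oxford/1, warwick/1, cambridge/1.
oxford(1).
warwick(2).
cambridge(1).

% university_rank(?Name, ?Rank): Name/1 is a dynamic predicate in module
% user, and Name(Rank) is one of its facts (a clause whose body is true).
university_rank(Name, Rank) :-
    predicate_property(user:Head, dynamic),
    functor(Head, Name, 1),
    catch(clause(user:Head, true), error(_, _), fail),
    arg(1, Head, Rank).
```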
As a final note, Jan's website uses SWI-Prolog to deliver its pages, but the resulting links don't always cut-and-paste well. If the link I gave above doesn't work for you, then you can either navigate to Sec. 4.14 of the Manual yourself, or try this link to a mirrored copy of the documentation that appears not-quite current (cf. difference in section numbering and absence of Jan's code snippet).
And feel free to ask questions if I've said something that needs clarification or assumed something that doesn't apply to your setup.
Added: Let's finish the answer by showing how to query a list of universities, whether given as such or derived from the "factbase" as outlined above. Then we have a few comments about design and learning at the end.
Suppose LU = [oxford,warwick,...] is in hand, a list of all possible universities. Apart from efficiency, we may not even care if a few things that are not universities or are not ranked are on the list, depending on the nature of the query you want to do.
listUniversityRank(LU,V,R) :- % LU: list of universities
    member(V,LU),
    call(V,R).                % call/2 adds R as an argument, calling V(R)
The above snippet defines a predicate listUniversityRank/3 that we would provide with a list of universities, and which would in turn call a dynamically generated goal on each member of the list to find its rank. Such a predicate can be used in several ways to accomplish your objective of finding "all the names of the universities that have the same rank."
For instance, we might want to ask for a specific rank R=1 what universities share that rank. Calling listUniversityRank(LU,V,R) with R bound to 1 would accomplish that, at least in the sense that it would backtrack through all such university names. If you wanted to gather these names into a list, then you could use findall/3.
For that matter you might want to begin listing "all the names of the universities that have the same rank" by making a list of all possible ranks, using setof/3 to collect the solutions for R in listUniversityRank(LU,_,R). setof is similar to findall but sorts the results and eliminates duplicates.
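Here is a sketch of these queries, with listUniversityRank/3 written using call/2. The facts are hypothetical; cambridge/1 is made up so that two universities share a rank.

```prolog
% Hypothetical factbase in the Question's style.
oxford(1).
warwick(2).
cambridge(1).

% listUniversityRank(+LU, ?V, ?R): V is a member of LU and V(R) holds.
% call/2 appends R to the goal V, so V = oxford calls oxford(R).
listUniversityRank(LU, V, R) :-
    member(V, LU),
    call(V, R).

% Example queries:
% ?- findall(V, listUniversityRank([oxford, warwick, cambridge], V, 1), Vs).
% Vs = [oxford, cambridge].
% ?- setof(R, V^listUniversityRank([oxford, warwick, cambridge], V, R), Rs).
% Rs = [1, 2].
% ?- setof(V, listUniversityRank([oxford, warwick, cambridge], V, R), Vs).
% R = 1, Vs = [cambridge, oxford] ;
% R = 2, Vs = [warwick].
```

The last query groups the universities by rank. Each university in LU must actually be defined as a predicate, otherwise call/2 raises an existence error.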
Now let's look back and think about how hard we are working to accomplish the stated aim, and what might be a design that makes life easier for that purpose. We want a list of university names with a certain property (all have the same rank). It would have been easier if we had the list of university names to start with. As Little Bobby Tables points out in one of the Comments on the Question, we have a tough time telling what is and isn't a university if there are facts like foo(3) in our program.
But something more is going on here. Using the university names to create the "facts", a different predicate for each different university, obscures the relationship university vs. rank that we would like to query. If we only had it to do over again, surely we'd rather represent this relationship with a single two-argument predicate, say universityRank/2 that directly connects each university name and the corresponding rank. Fewer predicates, better design (because more easily queried, without fancy meta-programming).
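With the single-predicate design, the whole question reduces to one setof/3 call. In this sketch the data is hypothetical and sameRank/2 is a made-up name.

```prolog
% Hypothetical data; cambridge is made up so that two universities share
% a rank.
universityRank(oxford, 1).
universityRank(warwick, 2).
universityRank(cambridge, 1).

% sameRank(?Rank, -Universities): Universities is the sorted list of all
% universities with rank Rank. With Rank unbound, setof/3 backtracks over
% the ranks, giving one group per rank.
sameRank(Rank, Universities) :-
    setof(U, universityRank(U, Rank), Universities).
```

?- sameRank(R, Us). gives R = 1, Us = [cambridge, oxford] and then, on backtracking, R = 2, Us = [warwick].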

Algorithm for creating a school timetable

I've been wondering if there are known solutions for algorithm of creating a school timetable. Basically, it's about optimizing "hour-dispersion" (both in teachers and classes case) for given class-subject-teacher associations. We can assume that we have sets of classes, lesson subjects and teachers associated with each other at the input and that timetable should fit between 8AM and 4PM.
I guess that there is probably no accurate algorithm for that, but maybe someone knows a good approximation or hints for developing it.
This problem is NP-Complete!
In a nutshell, one needs to explore all possible combinations to find the list of acceptable solutions. Because of the variations in the circumstances in which the problem appears at various schools (for example: Are there constraints with regard to classrooms? Are some of the classes split into sub-groups some of the time? Is this a weekly schedule? etc.), there isn't a well-known problem class that corresponds to all the scheduling problems. Perhaps the Knapsack problem has the most elements in common with these problems at large.
One confirmation that this is both a hard problem and one for which people perennially seek a solution is this (long) list of (mostly commercial) software scheduling tools.
Because of the large number of variables involved, the biggest source of which is, typically, the faculty members' desires ;-)..., it is typically impractical to consider enumerating all possible combinations. Instead, we need to choose an approach which visits a subset of the problem/solution spaces.
- Genetic Algorithms, cited in another answer, are (or, IMHO, seem) well equipped to perform this kind of semi-guided search (the problem being to find a good evaluation function for the candidates to be kept for the next generation)
- Graph Rewriting approaches are also of use with this type of combinatorial optimization problems.
Rather than focusing on particular implementations of an automatic schedule generator program, I'd like to suggest a few strategies which can be applied, at the level of the definition of the problem.
The general rationale is that in most real-world scheduling problems some compromises will be required: not all constraints, expressed and implied, will be fully satisfied. Therefore we help ourselves by:
Defining and ranking all known constraints
Reducing the problem space by manually providing a set of additional constraints. This may seem counter-intuitive, but, for example, by providing an initial, partially filled schedule (say roughly 30% of the time slots) that fully satisfies all constraints, and by considering this partial schedule immutable, we significantly reduce the time/space needed to produce candidate solutions. Additional constraints also help in other ways, for example by "artificially" adding a constraint which prevents teaching some subjects on some days of the week (if this is a weekly schedule...); this type of constraint reduces the problem/solution spaces without, typically, excluding a significant number of good candidates.
Ensuring that some of the constraints of the problem can be quickly computed. This is often associated with the choice of data model used to represent the problem; the idea is to be able to quickly opt-for (or prune-out) some of the options.
Redefining the problem and allowing some of the constraints to be broken a few times (typically towards the end nodes of the graph). The idea here is either to remove some of the constraints when filling in the last few slots in the schedule, or to have the automatic schedule generator stop short of completing the whole schedule, instead providing us with a list of a dozen or so plausible candidates. A human is often in a better position to complete the puzzle, as indicated, possibly breaking a few of the constraints, using information which is not typically shared with the automated logic (e.g. the "No mathematics in the afternoon" rule can be broken on occasion for the "advanced math and physics" class; or "It is better to break one of Mr Jones' requirements than one of Ms Smith's"... ;-) )
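One way to make "defining and ranking" the constraints concrete is a weighted penalty function. Breaking a constraint then costs something instead of ruling a schedule out. Here is a sketch in Prolog, the language used elsewhere on this page. The following are all illustrative assumptions, loosely based on the examples above:

- the schedule format lesson(Teacher, Subject, Day, Hour);
- the two rules;
- their weights;
- the 13:00 afternoon cut-off.

```prolog
:- use_module(library(lists)).
:- use_module(library(aggregate)).

% Hypothetical ranked soft constraints: a higher weight means the
% constraint matters more.
penalty_weight(no_math_afternoon, 5).
penalty_weight(jones_free_friday, 2).

% violation(+Schedule, -Constraint): one solution per violated instance.
% Schedule is a list of lesson(Teacher, Subject, Day, Hour) terms.
violation(Schedule, no_math_afternoon) :-
    member(lesson(_, math, _, Hour), Schedule),
    Hour >= 13.
violation(Schedule, jones_free_friday) :-
    member(lesson(jones, _, fri, _), Schedule).

% penalty(+Schedule, -Penalty): weighted sum of all violations; 0 means
% every soft constraint is satisfied.
penalty(Schedule, Penalty) :-
    aggregate_all(sum(W),
                  ( violation(Schedule, C), penalty_weight(C, W) ),
                  Penalty).
```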
In proofreading this answer, I realize it falls quite short of providing a definite response, but it is nonetheless full of practical suggestions. I hope this helps with what is, after all, a "hard problem".
It's a mess. A royal mess. To add to the answers, already very complete, I want to point out my family's experience: my mother was a teacher and used to be involved in the process.
It turns out that having a computer do this is not only difficult to code per se; it is also difficult because there are conditions that are hard to specify to a pre-baked computer program. Examples:
a teacher teaches both at your school and at another institute. Clearly, if he ends the lesson there at 10.30, he cannot start at your premises at 10.30, because he needs some time to commute between the institutes.
two teachers are married. In general, it's considered good practice not to have two married teachers on the same class. These two teachers must therefore have two different classes
two teachers are married, and their child attends the same school. Again, you have to prevent the two teachers from teaching in the specific class where their child is.
the school has separate facilities, like one day the class is in one institute, and another day the class is in another.
the school has shared laboratories, but these laboratories are available only on certain weekdays (for security reasons, for example, where additional personnel is required).
some teachers have preferences for the free day: some prefer on Monday, some on Friday, some on Wednesday. Some prefer to come early in the morning, some prefer to come later.
you should not have situations where you have a lesson of say, history at the first hour, then three hours of math, then another hour of history. It does not make sense for the students, nor for the teacher.
you should spread the subjects evenly. It does not make sense to have only math in the first days of the week, and then only literature for the rest of the week.
you should give some teachers two consecutive hours to do evaluation tests.
As you can see, the problem is not NP-complete, it's NP-insane.
So what they do is use a large table with small plastic insets, and they move the insets around until a satisfying result is obtained. They never start from scratch: they normally start from the previous year's timetable and make adjustments.
The International Timetabling Competition 2007 had a lesson scheduling track and exam scheduling track. Many researchers participated in that competition. Lots of heuristics and metaheuristics were tried, but in the end the local search metaheuristics (such as Tabu Search and Simulated Annealing) clearly beat other algorithms (such as genetic algorithms).
Take a look at the 2 open source frameworks used by some of the finalists:
JBoss OptaPlanner (Java, open source)
Unitime (Java, open source) - more for universities
One of my half-term assignments was a genetic-algorithm school timetable generator.
Whole table is one "organism". There were some changes and caveats to the generic genetic algorithms approach:
Rules were made for "illegal tables": two classes in the same classroom, one teacher teaching two groups at the same time, etc. Such mutations were deemed lethal, and a new "organism" was immediately sprouted in place of the "deceased" one. The initial organism was generated by a series of random tries until a legal (if senseless) one was found. Lethal mutations weren't counted towards the number of mutations per iteration.
"Exchange" mutations were much more common than "Modify" mutations. Changes were only between parts of the gene that made sense - no substituting a teacher with a classroom.
Small bonuses were assigned for bundling certain 2 hours together, for assigning the same generic classroom in sequence for the same group, and for keeping teachers' work hours and classes' loads continuous. Moderate bonuses were assigned for giving the correct classrooms for a given subject, keeping class hours within bounds (morning or afternoon), and such. Big bonuses were for assigning the correct number of lessons of a given subject, the given workload for a teacher, etc.
Teachers could create their workload schedules of "want to work then", "okay to work then", "doesn't like to work then", "can't work then", with proper weights assigned. All 24 hours were legal work hours, except that night time was very undesirable.
The weight function... oh yeah. The weight function was a huge, monstrous product (as in multiplication) of weights assigned to selected features and properties. It was extremely steep, with one property easily able to change it by an order of magnitude up or down, and there were hundreds or thousands of properties in one organism. This resulted in absolutely HUGE numbers as the weights and, as a direct result, the need to use a bignum library (gmp) to perform the calculations. For a small test case of some 10 groups, 10 teachers and 10 classrooms, the initial set started with a score of around 10^-200-something and finished around 10^+300-something. It was totally inefficient when the function was flatter. Also, the values spread much further apart with bigger "schools".
Computation-time-wise, there was little difference between a small population (100) over a long time and a big population (10k+) over fewer generations: computing for the same amount of time produced about the same quality.
The calculation (on some 1GHz CPU) would take some 1h to stabilize near 10^+300, generating schedules that looked quite nice, for said 10x10x10 test case.
The problem is easily parallelizable by providing a networking facility that exchanges the best specimens between computers running the computation.
The resulting program never saw daylight outside getting me a good grade for the semester. It showed some promise but I never got enough motivation to add any GUI and make it usable to general public.
This problem is tougher than it seems.
As others have alluded to, this is a NP-complete problem, but let's analyse what that means.
Basically, it means you have to look at all possible combinations.
But "look at" doesn't tell you much what you need to do.
Generating all possible combinations is easy. It might produce a huge amount of data, but you shouldn't have much problems understanding the concepts of this part of the problem.
The second problem is the one of judging whether a given possible combination is good, bad, or better than the previous "good" solution.
For this you need more than just "is it a possible solution".
For instance, is the same teacher working 5 days a week for X weeks straight? Even if that is a working solution, it might not be a better solution than alternating between two people so that each teacher does one week each. Oh, you didn't think about that? Remember, this is people you're dealing with, not just a resource allocation problem.
Even if one teacher could work full-time for 16 weeks straight, that might be a sub-optimal solution compared to a solution where you try to alternate between teachers, and this kind of balancing is very hard to build into software.
To summarize, producing a good solution to this problem will be worth a lot, to many, many people. Hence, it's not an easy problem to break down and solve. Be prepared to stake out some goals that aren't 100% and call them "good enough".
My timetabling algorithm, implemented in FET (Free Timetabling Software, http://lalescu.ro/liviu/fet/, a successful application):
The algorithm is heuristic. I named it "recursive swapping".
Input: a set of activities A_1...A_n and the constraints.
Output: a set of times TA_1...TA_n (the time slot of each activity. Rooms are excluded here, for simplicity). The algorithm must put each activity at a time slot, respecting constraints. Each TA_i is between 0 (T_1) and max_time_slots-1 (T_m).
Constraints:
C1) Basic: a list of pairs of activities which cannot be simultaneous (for instance, A_1 and A_2, because they have the same teacher or the same students);
C2) Lots of other constraints (excluded here, for simplicity).
The timetabling algorithm (which I named "recursive swapping"):
1) Sort activities, most difficult first. This is not a critical step, but it speeds up the algorithm maybe 10 times or more.
2) Try to place each activity (A_i) in an allowed time slot, following the above order, one at a time. Search for an available slot (T_j) for A_i, in which this activity can be placed respecting the constraints. If more slots are available, choose a random one. If none is available, do recursive swapping:
a. For each time slot T_j, consider what happens if you put A_i into T_j. There will be a list of other activities which don't agree with this move (for instance, activity A_k is on the same slot T_j and has the same teacher or same students as A_i). Keep a list of conflicting activities for each time slot T_j.
b. Choose a slot (T_j) with lowest number of conflicting activities. Say the list of activities in this slot contains 3 activities: A_p, A_q, A_r.
c. Place A_i at T_j and make A_p, A_q, A_r unallocated.
d. Recursively try to place A_p, A_q, A_r as in step 2), provided the level of recursion is not too large (say 14) and the total number of recursive calls counted since step 2) began on A_i is not too large (say 2*n).
e. If successfully placed A_p, A_q, A_r, return with success, otherwise try other time slots (go to step 2 b) and choose the next best time slot).
f. If all (or a reasonable number of) time slots were tried unsuccessfully, return without success.
g. If we are at level 0 and had no success in placing A_i, place it as in steps 2 b) and 2 c), but without recursion. We now have 3 - 1 = 2 more activities to place. Go to step 2) (some methods to avoid cycling are used here).
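Here is a much-simplified sketch of the steps above in Prolog. It assumes only constraint C1 and a hypothetical four-activity instance. It follows steps 1 and 2 a) to 2 f) loosely:

- slots are ordered by the number of activities they would displace;
- displaced activities are placed again recursively;
- Prolog backtracking plays the role of "try other time slots".

The random choice among free slots, the 2*n call limit and the level-0 forced placement of step 2 g) are left out. This is therefore an illustration, not FET's implementation. All predicate names are hypothetical.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).
:- use_module(library(pairs)).
:- use_module(library(aggregate)).

% Hypothetical instance. conflict(A, B): activities A and B cannot share
% a time slot (constraint C1: same teacher or same students).
activity(a1). activity(a2). activity(a3). activity(a4).
conflict(a1, a2).
conflict(a1, a3).
conflict(a2, a3).
conflict(a3, a4).

clash(A, B) :- conflict(A, B).
clash(A, B) :- conflict(B, A).

% timetable(+NSlots, +MaxDepth, -Placed): Placed is a list of Act-Slot
% pairs (slots 0..NSlots-1) in which no two clashing activities share a
% slot. MaxDepth limits the recursion level of the swapping.
timetable(NSlots, MaxDepth, Placed) :-
    findall(A, activity(A), Acts),
    % Step 1: most difficult (most conflicts) first.
    map_list_to_pairs(difficulty, Acts, Keyed),
    keysort(Keyed, Ascending),
    pairs_values(Ascending, EasiestFirst),
    reverse(EasiestFirst, HardestFirst),
    place_all(HardestFirst, cfg(NSlots, MaxDepth), 0, [], Placed).

difficulty(A, D) :- aggregate_all(count, clash(A, _), D).

place_all([], _, _, Placed, Placed).
place_all([A|As], Cfg, Depth, Placed0, Placed) :-
    place(A, Cfg, Depth, Placed0, Placed1),
    place_all(As, Cfg, Depth, Placed1, Placed).

% place(+Act, +Cfg, +Depth, +Placed0, -Placed): step 2 for one activity.
place(Act, cfg(NSlots, MaxDepth), Depth, Placed0, Placed) :-
    Last is NSlots - 1,
    % 2 a): for every slot, the placed activities that clash with Act.
    findall(K-T-Evicted,
            ( between(0, Last, T),
              findall(B, ( member(B-T, Placed0), once(clash(Act, B)) ),
                      Evicted),
              length(Evicted, K) ),
            Options0),
    % 2 b): fewest conflicts first (a free slot has K = 0); backtracking
    % into member/2 tries the next best slot, as in 2 e).
    keysort(Options0, Options),
    member(_-T-Evicted, Options),
    (   Evicted == []
    ->  true
    ;   Depth < MaxDepth               % 2 d): recursion limit
    ),
    % 2 c): place Act and unallocate the clashing activities ...
    exclude(displaced(Evicted), Placed0, Kept),
    Depth1 is Depth + 1,
    % ... 2 d): and place them again, one level deeper.
    place_all(Evicted, cfg(NSlots, MaxDepth), Depth1, [Act-T|Kept], Placed).

displaced(Evicted, B-_) :- memberchk(B, Evicted).
```

?- timetable(3, 14, P). first gives P = [a4-1, a1-2, a2-1, a3-0]. With two slots it fails, because a1, a2 and a3 conflict pairwise.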
UPDATE: from comments ... should have heuristics too!
I'd go with Prolog ... then use Ruby or Perl or something to clean up your solution into a prettier form.
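teaches(jill, math).
teaches(joe, history).
involves('MA101', math).
involves('SS104', history).
% predsort/3 calls myHeuristic(Order, A, B) and expects Order to be <, > or =
% (= drops one of the two elements). [test_case] stands for your own test on A and B.
myHeuristic(Order, A, B) :- ( [test_case] -> Order = (<) ; Order = (>) ).
createSchedule :-
    findall(Class, involves(Class, _Subject), Classes),
    predsort(myHeuristic, Classes, ClassesNew),
    createSchedule(ClassesNew, []).
% [the actual recursive algorithm] is left for you to fill in.
createSchedule(Classes, Scheduled) :- [the actual recursive algorithm].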
I am (still) in the process of doing something similar to this problem, following the same path I just described. Prolog (as a logic programming language) really makes solving NP-hard problems easier.
Genetic algorithms are often used for such scheduling.
Found this example (Making Class Schedule Using Genetic Algorithm) which matches your requirement pretty well.
Here are a few links I found:
School timetable - Lists some problems involved
A Hybrid Genetic Algorithm for School Timetabling
Scheduling Utilities and Tools
This paper describes the school timetable problem and their approach to the algorithm pretty well: "The Development of SYLLABUS—An Interactive, Constraint-Based Scheduler for Schools and Colleges."[PDF]
The author informs me the SYLLABUS software is still being used/developed here: http://www.scientia.com/uk/
I work on a widely-used scheduling engine which does exactly this. Yes, it is NP-Complete; the best approaches seek to approximate an optimal solution. And, of course there are a lot of different ways to say which one is the "best" solution - is it more important that your teachers are happy with their schedules, or that students get into all their classes, for instance?
The absolute most important question you need to resolve early on is what makes one way of scheduling this system better than another? That is, if I have a schedule with Mrs Jones teaching Math at 8 and Mr Smith teaching Math at 9, is that better or worse than one with both of them teaching Math at 10? Is it better or worse than one with Mrs Jones teaching at 8 and Mr Jones teaching at 2? Why?
The main advice I'd give here is to divide the problem up as much as possible - maybe course by course, maybe teacher by teacher, maybe room by room - and work on solving the sub-problem first. There you should end up with multiple solutions to choose from, and need to pick one as the most likely optimal. Then, work on making the "earlier" sub-problems take into account the needs of later sub-problems in scoring their potential solutions. Then, maybe work on how to get yourself out of painted-into-the-corner situations (assuming you can't anticipate those situations in earlier sub-problems) when you get to a "no valid solutions" state.
A local-search optimization pass is often used to "polish" the end answer for better results.
Note that typically we are dealing with highly resource-constrained systems in school scheduling. Schools don't go through the year with a lot of empty rooms or teachers sitting in the lounge 75% of the day. Approaches which work best in solution-rich environments aren't necessarily applicable in school scheduling.
Generally, constraint programming is a good approach to this type of scheduling problem. Searching for "constraint programming" plus scheduling, or for "constraint based scheduling", on Stack Overflow and on Google will turn up good references. It's not impossible; it's just a little hard to think about with traditional optimization methods such as linear or integer programming. One useful output is simply whether any schedule exists that satisfies all the requirements, which in itself is obviously helpful.
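As a concrete illustration, SWI-Prolog's library(clpfd) lets you post the requirements as constraints first and search afterwards. This sketch assumes hypothetical class/3 facts and only two hard rules (a teacher and a student group can each attend only one class per slot); a real timetable needs many more. A failing query answers the "does a schedule exist?" question directly.

```prolog
:- use_module(library(clpfd)).

% Hypothetical data: class(Id, Teacher, StudentGroup).
class(math1,    jill, g1).
class(history1, joe,  g1).
class(math2,    jill, g2).
class(history2, joe,  g2).

% timetable(+NumSlots, -Pairs): Pairs is a list of Class-Slot in which no
% teacher and no student group appears twice in the same slot.
timetable(NumSlots, Pairs) :-
    findall(C-T-G, class(C, T, G), Classes),
    length(Classes, N),
    length(Slots, N),
    Slots ins 1..NumSlots,
    post_clashes(Classes, Slots),   % post all constraints first...
    label(Slots),                   % ...then search for concrete slots
    maplist(class_slot, Classes, Slots, Pairs).

% Post S_i #\= S_j for every pair of classes sharing a teacher or a group.
post_clashes([], []).
post_clashes([C|Cs], [S|Ss]) :-
    maplist(no_clash(C, S), Cs, Ss),
    post_clashes(Cs, Ss).

no_clash(_-T1-G1, S1, _-T2-G2, S2) :-
    (   ( T1 == T2 ; G1 == G2 )
    ->  S1 #\= S2
    ;   true
    ).

class_slot(C-_-_, S, C-S).
```

`?- timetable(2, P).` gives `P = [math1-1, history1-2, math2-2, history2-1]`, while `?- timetable(1, P).` fails because a single slot cannot satisfy the constraints.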
Good luck!
I have designed commercial algorithms for both class timetabling and examination timetabling. For the first I used integer programming; for the second, a heuristic that maximizes an objective function by choosing slot swaps, very similar to the manual process that had evolved originally. The main things in getting such solutions accepted are the ability to represent all the real-world constraints, and making sure human timetablers cannot see ways to improve the solution. In the end, the algorithmic part was quite straightforward and easy to implement compared with preparing the databases, the user interface, the ability to report on statistics like room utilization, user education and so on.
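A slot-swap heuristic of this kind can be sketched as first-improvement hill climbing. This is an illustration under stated assumptions, not the answerer's commercial code: the enrolled/2 facts are hypothetical, and the objective is just the negated number of clashing exam pairs that share a slot, so the code minimizes a penalty, which is equivalent to maximizing the objective.

```prolog
% Hypothetical enrolments: enrolled(Student, Exam).
enrolled(ann, e1).
enrolled(ann, e2).
enrolled(bob, e3).
enrolled(bob, e4).

% Two exams clash if at least one student sits both.
shares_student(E1, E2) :-
    once(( enrolled(St, E1), enrolled(St, E2) )).

% penalty(+Sched, -P): number of clashing exam pairs in the same slot.
penalty(Sched, P) :-
    aggregate_all(count,
                  ( member(E1-S, Sched), member(E2-S, Sched),
                    E1 @< E2, shares_student(E1, E2) ),
                  P).

% swap(+Sched0, -Sched): exchange the slots of two exams in different slots.
swap(Sched0, Sched) :-
    member(E1-S1, Sched0),
    member(E2-S2, Sched0),
    E1 @< E2,
    S1 \== S2,
    maplist(reassign(E1-S1, E2-S2), Sched0, Sched).

reassign(E1-S1, E2-S2, E-S, E-NewS) :-
    (   E == E1 -> NewS = S2
    ;   E == E2 -> NewS = S1
    ;   NewS = S
    ).

% hill_climb(+Sched0, -Sched): apply the first improving swap until none is left.
hill_climb(Sched0, Sched) :-
    penalty(Sched0, P0),
    (   swap(Sched0, Sched1),
        penalty(Sched1, P1),
        P1 < P0
    ->  hill_climb(Sched1, Sched)
    ;   Sched = Sched0
    ).
```

Starting from `[e1-s1, e2-s1, e3-s2, e4-s2]` (penalty 2), the first improving swap exchanges e1 and e3, giving `[e1-s2, e2-s1, e3-s1, e4-s2]` with penalty 0. Like any hill climber, it stops at a local optimum.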
You can tackle it with genetic algorithms, yes. But you shouldn't :). They can be too slow, and parameter tuning can be too time-consuming, etc.
There are other successful approaches, all implemented in open source projects:
Constraint based approach
Implemented in UniTime (not really for schools)
You could also go further and use integer programming. This has been done successfully at the University of Udine and at the University of Bayreuth (I was involved there) using the commercial solver ILOG CPLEX.
Rule-based approach with heuristics: see Drools Planner
Different heuristics - FET and my own
See here for a timetabling software list
I think you should use a genetic algorithm because:
It is best suited for large problem instances.
It reduces running time at the price of an inexact answer (not necessarily the very best).
You can specify constraints and preferences easily by adjusting the fitness penalties for unmet ones.
You can specify time limit for program execution.
The quality of the solution depends on how much time you are willing to spend solving the problem.
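A compact genetic algorithm shows how constraints and preferences become fitness penalties. This is a minimal sketch with hypothetical data and parameters: a chromosome is a list of slot numbers (one per class), a hard clash costs 10 and a violated preference costs 1, and the operators are the textbook tournament selection, one-point crossover and random-reset mutation, plus elitism. It relies on SWI-Prolog's library(random).

```prolog
:- use_module(library(random)).

% Hypothetical instance: class(Id, Teacher, Group), one soft preference, 2 slots.
class(math1,    jill, g1).
class(history1, joe,  g1).
class(math2,    jill, g2).
class(history2, joe,  g2).
avoid(math1, 1).                    % soft: math1 would rather not be in slot 1
num_slots(2).

classes(Cs) :- findall(C-T-G, class(C, T, G), Cs).

% penalty(+Genes, -P): 10 per hard clash (shared teacher or group in one slot),
% 1 per violated preference. Lower is fitter.
penalty(Genes, P) :-
    classes(Cs),
    pairs_keys_values(Pairs, Cs, Genes),
    aggregate_all(count,
                  ( nth1(I, Pairs, (_-T1-G1)-S),
                    nth1(J, Pairs, (_-T2-G2)-S),
                    I < J,
                    once(( T1 == T2 ; G1 == G2 )) ),
                  Hard),
    aggregate_all(count, ( member((C-_-_)-S2, Pairs), avoid(C, S2) ), Soft),
    P is 10 * Hard + Soft.

random_gene(K, G) :- random_between(1, K, G).

individual(N, K, P-Genes) :-
    length(Genes, N),
    maplist(random_gene(K), Genes),
    penalty(Genes, P).

% Tournament selection: the fitter of two random individuals wins.
tournament(Pop, Genes) :-
    random_member(P1-G1, Pop),
    random_member(P2-G2, Pop),
    ( P1 =< P2 -> Genes = G1 ; Genes = G2 ).

% One-point crossover.
crossover(G1, G2, Child) :-
    length(G1, N),
    random_between(0, N, Cut),
    length(Prefix, Cut), append(Prefix, _, G1),
    length(Skip, Cut), append(Skip, Suffix, G2),
    append(Prefix, Suffix, Child).

% Mutation: with probability 0.2, give one random class a random slot.
mutate(K, Genes0, Genes) :-
    R is random_float,
    (   R < 0.2
    ->  length(Genes0, N),
        random_between(1, N, I),
        random_between(1, K, V),
        replace_nth1(I, Genes0, V, Genes)
    ;   Genes = Genes0
    ).

replace_nth1(1, [_|T], V, [V|T]) :- !.
replace_nth1(I, [H|T], V, [H|T2]) :- I1 is I - 1, replace_nth1(I1, T, V, T2).

child(K, Pop, P-Genes) :-
    tournament(Pop, A),
    tournament(Pop, B),
    crossover(A, B, C0),
    mutate(K, C0, Genes),
    penalty(Genes, P).

% Elitism: the best individual survives; the rest of the population is replaced.
next_generation(K, Pop0, [Best|Children]) :-
    keysort(Pop0, [Best|_]),
    length(Pop0, Size),
    M is Size - 1,
    length(Children, M),
    maplist(child(K, Pop0), Children).

evolve(0, _, Pop, Pop) :- !.
evolve(_, _, Pop, Pop) :- keysort(Pop, [0-_|_]), !.   % stop at a perfect score
evolve(G, K, Pop0, Pop) :-
    next_generation(K, Pop0, Pop1),
    G1 is G - 1,
    evolve(G1, K, Pop1, Pop).

% ga(+Generations, +PopSize, -Best): Best is the Penalty-Genes pair of the fittest found.
ga(Generations, PopSize, Best) :-
    classes(Cs), length(Cs, N), num_slots(K),
    length(Pop0, PopSize),
    maplist(individual(N, K), Pop0),
    evolve(Generations, K, Pop0, Pop),
    keysort(Pop, [Best|_]).
```

`?- set_random(seed(42)), ga(50, 20, Best).` returns the fittest individual found. The outcome depends on the seed and nothing guarantees the optimum, which is exactly the trade-off listed above; raising the hard-clash weight relative to the soft one is how you tell the search which rules matter more.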
Genetic Algorithms Definition
Genetic Algorithms Tutorial
Class scheduling project with GA
Also take a look at a similar question and another one.
This problem is MASSIVE where I work: imagine 1800 subjects/modules and 350 000 students, each taking 5 to 10 modules, and you want to build an exam timetable spanning 10 weeks, where papers are 1 hour to 3 days long. One plus point: all exams are online; the downside is that we cannot exceed the system's maximum concurrency of 5k. So yes, we are now doing this in the cloud on scaling servers.
The "solution" we used was simply to order modules by how many other modules they "clash" with (two modules clash when a student takes both), in descending order, and to "backpack" them, allowing these long papers to overlap; otherwise it simply cannot be done.
So when things get too large, I found this "heuristic" to be practical... at least.
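The ordering idea can be sketched as largest-degree-first greedy packing (essentially Welsh-Powell graph colouring). The sketch makes simplifying assumptions: the enrolled/2 facts are hypothetical, every paper takes exactly one slot, and unlike the approach described above it never lets long papers overlap and ignores the concurrency cap.

```prolog
% Hypothetical enrolments: enrolled(Student, Module).
enrolled(s1, m4).
enrolled(s1, m2).
enrolled(s2, m4).
enrolled(s2, m3).
enrolled(s3, m2).
enrolled(s3, m3).
enrolled(s4, m1).
enrolled(s4, m4).

modules(Ms) :- setof(M, S^enrolled(S, M), Ms).

% Two different modules clash if some student takes both.
clash(M1, M2) :-
    M1 \== M2,
    once(( enrolled(S, M1), enrolled(S, M2) )).

% Key each module by minus its clash count, so keysort/2 yields descending order.
keyed_by_clashes(Ms, M, NegD-M) :-
    aggregate_all(count, ( member(M2, Ms), clash(M, M2) ), D),
    NegD is -D.

order_by_clashes(Ms, Ordered) :-
    maplist(keyed_by_clashes(Ms), Ms, Keyed),
    keysort(Keyed, Sorted),
    pairs_values(Sorted, Ordered).

% Greedy packing: each module goes into the first slot holding no clashing module.
first_free_slot(M, Placed, S, Slot) :-
    (   \+ ( member(M2-S, Placed), clash(M, M2) )
    ->  Slot = S
    ;   S1 is S + 1,
        first_free_slot(M, Placed, S1, Slot)
    ).

pack([], Placed, Placed).
pack([M|Ms], Placed0, Placed) :-
    first_free_slot(M, Placed0, 1, Slot),
    pack(Ms, [M-Slot|Placed0], Placed).

exam_timetable(Timetable) :-
    modules(Ms),
    order_by_clashes(Ms, Ordered),
    pack(Ordered, [], Placed),
    reverse(Placed, Timetable).
```

Here m4 clashes with three other modules, so it is packed first; `?- exam_timetable(T).` gives `T = [m4-1, m2-2, m3-3, m1-2]`.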
I don't know whether anyone will agree with this code, but I developed it from my own algorithm and it is working for me in Ruby. I hope it helps those who are searching for something similar.
In the following code, periodflag, dayflag, subjectflag and teacherflag are hashes that map each id to a flag value (the code treats 0 as unassigned, and the setflag_* helpers set it to 1).
If you run into any issues, contact me.
# Assumes k1 (the day id) plus section, standard, division, flag and the
# helper methods come from enclosing code that is not shown here.
periodflag.each do |k2,v2|
  if(TimetableDefinition.find(k2).period.to_i != 0)
    subjectflag.each do |k3,v3|
      if (v3 == 0)
        if(getflag_period(periodflag,k2))
          # Candidate teachers for this subject, tried in random order
          @teachers=EmployeesSubject.where(subject_name: @subjects.find(k3).name, division_id: division.id).pluck(:employee_id)
          @teacherlists=Employee.find(@teachers)
          teacherflag=Hash[teacher_flag(@teacherlists,teacherflag,flag).to_a.shuffle]
          teacherflag.each do |k4,v4|
            if(v4 == 0)
              if(getflag_subject(subjectflag,k3))
                subjectperiod=TimetableAssign.where("timetable_definition_id = ? AND subject_id = ?",k2,k3)
                if subjectperiod.blank?
                  issubjectpresent=TimetableAssign.where("section_id = ? AND subject_id = ?",section.id,k3)
                  if issubjectpresent.blank?
                    # Subject not yet assigned in this section: use teacher k4 if k4 is free here
                    isteacherpresent=TimetableAssign.where("section_id = ? AND employee_id = ?",section.id,k4)
                    if isteacherpresent.blank?
                      @finaltt=TimetableAssign.new
                      @finaltt.timetable_struct_id=@timetable_struct.id
                      @finaltt.employee_id=k4
                      @finaltt.section_id=section.id
                      @finaltt.standard_id=standard.id
                      @finaltt.division_id=division.id
                      @finaltt.subject_id=k3
                      @finaltt.timetable_definition_id=k2
                      @finaltt.timetable_day_id=k1
                      set_school_id(@finaltt,current_user)
                      if(@finaltt.save)
                        setflag_sub(subjectflag,k3,1)
                        setflag_period(periodflag,k2,1)
                        setflag_teacher(teacherflag,k4,1)
                      end
                    end
                  else
                    # Subject already assigned in this section: reuse its existing teacher
                    @subjectdetail=TimetableAssign.find_by_section_id_and_subject_id(@section.id,k3)
                    @finaltt=TimetableAssign.new
                    @finaltt.timetable_struct_id=@subjectdetail.timetable_struct_id
                    @finaltt.employee_id=@subjectdetail.employee_id
                    @finaltt.section_id=section.id
                    @finaltt.standard_id=standard.id
                    @finaltt.division_id=division.id
                    @finaltt.subject_id=@subjectdetail.subject_id
                    @finaltt.timetable_definition_id=k2
                    @finaltt.timetable_day_id=k1
                    set_school_id(@finaltt,current_user)
                    if(@finaltt.save)
                      setflag_sub(subjectflag,k3,1)
                      setflag_period(periodflag,k2,1)
                      setflag_teacher(teacherflag,k4,1)
                    end
                  end
                end
              end
            end
          end
        end
      end
    end
  end
end

Resources