What is an efficient way to go beyond a greedy algorithm

What is an efficient way to go beyond a greedy algorithm - algorithm

The domain of this question is scheduling operations on constrained hardware. The resolution of the result is the number of clock cycles the schedule fits within. The search space grows very rapidly where early decisions constrain future decisions and the total number of possible schedules grows rapidly and exponentially. A lot of the possible schedules are equivalent because just swapping the order of two instructions usually result in the same timing constraint.
Basically the question is what is a good strategy for exploring the vast search space without spending too much time. I expect to search only a small fraction but would like to explore different parts of the search space while doing so.
The current greedy algorithm tend to make stupid decisions early on sometimes and the attempt at branch and bound was beyond slow.
Edit:
Want to point out that the result is very binary with perhaps the greedy algorithm ending up using 8 cycles while there exists a solution using only 7 cycles using branch and bound.
Second point is that there are significant restrictions in data routing between instructions and dependencies between instructions that limits the amount of commonality between solutions. Look at it as a knapsack problem with a lot of ordering constraints as well as some solutions completely failing because of routing congestion.
Clarification:
In each cycle there is a limit to how many operations of each type and some operations have two possible types. There are a set of routing constraints which can be varied to be either fairly tight or pretty forgiving and the limit depends on routing congestion.

Integer linear optimization for NP-hard problems
Depending on your side constraints, you may be able to use the critical path method or
(as suggested in a previous answer) dynamic programming. But many scheduling problems are NP-hard just like the classical traveling sales man --- a precise solution has a worst case of exponential search time, just as you describe in your problem.
It's important to know that while NP-hard problems still have a very bad worst case solution time there is an approach that very often produces exact answers with very short computations (the average case is acceptable and you often don't see the worst case).
This approach is to convert your problem to a linear optimization problem with integer variables. There are free-software packages (such as lp-solve) that can solve such problems efficiently.
The advantage of this approach is that it may give you exact answers to NP-hard problems in acceptable time. I used this approach in a few projects.
As your problem statement does not include more details about the side constraints, I cannot go into more detail how to apply the method.
Edit/addition: Sample implementation
Here are some details about how to implement this method in your case (of course, I make some assumptions that may not apply to your actual problem --- I only know the details form your question):
Let's assume that you have 50 instructions cmd(i) (i=1..50) to be scheduled in 10 or less cycles cycle(t) (t=1..10). We introduce 500 binary variables v(i,t) (i=1..50; t=1..10) which indicate whether instruction cmd(i) is executed at cycle(t) or not. This basic setup gives the following linear constraints:
v_it integer variables
0<=v_it; v_it<=1; # 1000 constraints: i=1..50; t=1..10
sum(v_it: t=1..10)==1 # 50 constraints: i=1..50
Now, we have to specify your side conditions. Let's assume that operations cmd(1)...cmd(5) are multiplication operations and that you have exactly two multipliers --- in any cycle, you may perform at most two of these operations in parallel:
sum(v_it: i=1..5)<=2 # 10 constraints: t=1..10
For each of your resources, you need to add the corresponding constraints.
Also, let's assume that operation cmd(7) depends on operation cmd(2) and needs to be executed after it. To make the equation a little bit more interesting, lets also require a two cycle gap between them:
sum(t*v(2,t): t=1..10) + 3 <= sum(t*v(7,t): t=1..10) # one constraint
Note: sum(t*v(2,t): t=1..10) is the cycle t where v(2,t) is equal to one.
Finally, we want to minimize the number of cycles. This is somewhat tricky because you get quite big numbers in the way that I propose: We give assign each v(i,t) a price that grows exponentially with time: pushing off operations into the future is much more expensive than performing them early:
sum(6^t * v(i,t): i=1..50; t=1..10) --> minimum. # one target function
I choose 6 to be bigger than 5 to ensure that adding one cycle to the system makes it more expensive than squeezing everything into less cycles. A side-effect is that the program will go out of it's way to schedule operations as early as possible. You may avoid this by performing a two-step optimization: First, use this target function to find the minimal number of necessary cycles. Then, ask the same problem again with a different target function --- limiting the number of available cycles at the outset and imposing a more moderate price penalty for later operations. You have to play with this, I hope you got the idea.
Hopefully, you can express all your requirements as such linear constraints in your binary variables. Of course, there may be many opportunities to exploit your insight into your specific problem to do with less constraints or less variables.
Then, hand your problem off to lp-solve or cplex and let them find the best solution!

At first blush, it sounds like this problem might fit into a dynamic programming solution. Several operations may take the same amount of time so you might end up with overlapping subproblems.

If you can map your problem to the "travelling salesman" (like: Find the optimal sequence to run all operations in minimum time), then you have an NP-complete problem.
A very quick way to solve that is the ant algorithm (or ant colony optimization).
The idea is that you send an ant down every path. The ant spreads a smelly substance on the path which evaporates over time. Short parts mean that the path will stink more when the next ant comes along. Ants prefer smelly over clean paths. Run thousands of ants through the network. The most smelly path is the optimal one (or at least very close).

Try simulated annealing, cfr. http://en.wikipedia.org/wiki/Simulated_annealing .

Related

robust online algorithm for semi-variance

I'm looking for the equivalent of welford's algorithm for the online computation semi-variance (downside partial variance). Does anyone know of a good reference? Does such an algorithm even exist?
Edit: the case where the semi-variance is taken relative to a fixed target is trivial. the problem is calculating the semi-variance in relation to the mean

I believe the answer is one does not exist and I'm going to try to outline a proof of why this is so.
Consider a 'uesful' online algorithm to be defined by two criteria:
It must have fixed memory requirements during processing.
Each update should take a fixed amount of time.
This is stricter than the literal definition of an sequential/incremental/online algorithm which really just requires that data can be passed in one piece at a time. However, consider that if either 1) or 2) were not true then after processing a large enough amounts of elements, the memory required or time required to run the algorithm would eventually become infeasible. Usually, one of the reasons why online algorithms are used is that they can be used continuously without fear of the performance slowly getting worse. Also, note that there are online algorithms for calculating the mean and variance that satisfy both 1 & 2 and I think that's what we are aiming to achieve.
Now to the problem posed. During processing, the mean will change with every bit of new data. That in turn means the set of observations that fall below the mean will change. When this happens, we need to adjust our running semi-variance according to the set "delta", defined as the elements that are not in the union between the set of elements below the old mean and the set of elements below the new mean. We will have to calculate this delta in the process of adjusting the old-semivariance to the new-semivariance in the presence of new data.
Now let's consider the complexity of calculating this set delta. We will need to find all elements that fall between the old mean and the new mean. We will always keep track of the old mean, while the new mean can be calculated incrementally in fixed time so they pose no problem. However to calculate the delta itself, there is no way to do it other than requiring us to keep track of all the previous elements in our set. This immediately breaks the memory condition of an online algorithm. Secondly, even if we keep the previous elements in our set sorted, the best speed we can achieve to find those that are between the old mean and new mean is O(log(number of elements)), which is worse than fixed. So eventually, with enough elements, the online algorithm will not only require more memory than we have, but it will also require more time.

http://www3.sympatico.ca/jean-v.cote/computation_of_semi-variance.pdf
P.S.:This is not an incremental computation. I have another idea. I will keep you posted.

Best algorithm for optimizing the decisions in a simulation

I'm looking for the best algorithm to optimise the decisions made in a simultaion to find a fast result in a reasonable amount of time. The simultaion does a number of "ticks" and occasionaly needs to make a decision. Eventually a goal state is reached. ( It would be possible to never reach a goal state if you make very bad decisions )
There are many many goal states. I want to find the goal state with the least number of ticks ( a tick equates roughly to a second in real life." I basically want to decide which decisions to make to get to the goal in as few seconds as possible,
Some points about the problem domain:
Straight off the bat I can generate a series of choices that will lead to a solution. It won't be optimal.
I have a reasonable heuristic function to determine what would be a good decision
I have a reasonable function to determine the minimum possible time cost from a node to a goal.
Algorithms:
I need to process this problem for about 10 seconds and then give the best answer I can.
I believe A* would find me the optimal soluton. The problem is that the decision tree will be so large that I won't be able to calculate it quick enough.
IDA* would give me a good first few choices in 10 seconds but I need a path all the way to a goal.
At the moment I am thinking that I will start off with the known non optimal path to a goal and then perhaps use Simulated Anealing and attempt to improve it over 10 seconds.
What would be a good algorithm to research to try to solve this sort of problem?

Have a look at limited discrepancy search, repeating with increasingly loose limits on the maximum discrepancy search, or beam search.
If you have a good heuristic you should be able to use it to compare individual choices - for the limited discrepancy search, and compare partial solutions, for the beam search.
See if you can place an upper bound on how good any extension of a partial solution is. Then you can prune out partial solutions that can't possibly be extended to beat the result from the heuristic, or the best result found so far in a series of iterative searches with increasing depth.

Let's get a few facts out.
1) The only way to know for sure which decision is the best is to test every possible decision and evaluate the outcome based on some criteria.
2) We are highly unlikely to have the time to decide to go through every possible decision, so we have to limit how far in the future we will evaluate the decision.
3) We are highly unlikely to make the best move ~ever~. Not just often, but ever. Unless you have only a couple of decisions, chances are every time you make a decision, there was a better one you didn't get to.
4) We can use how our previous decisions worked out to our advantage.
Put all this together... Let's say when we have a decision, we evaluate what happens 30 ticks into the future, in 30 ticks we can check to see if what actually happened matches what we simulated 30 ticks ago. If it was, we know that decision leads to predictable outcomes and we should use that decision less. If we didn't, or if it turns out better than we hoped, we should use that decision more.
Ideally, you would use your logic in a ... simulation of your simulation ... for purposes of evaluating it. Then when you get to the 'real' simulation, you have a better chance at picking your better decisions earlier. Of course, give a higher weight to the results of your actual simulation results as opposed to your simulated simulation results.

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.

How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?
Approaches that I've seen used for these kind of problems have a similar overall structure - take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated
Annealing: The classic approach. Take a bunch of points, probabalistically move some to a neighbouring point chosen at at random depending on how much better it is.
Particle
Swarm Optimization: Take a "swarm" of particles with velocities in the search space, probabalistically randomly move a particle; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.
The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
Note that all of these -- and really any approach to this large-dimensional search algorithm - are heuristics, which mean they have parameters which have to be tuned to your particular problem. Which can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit; since all the above methods involve lots of independant function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.

Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using a Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes this isn't going to be fast enough. If you can leave it running over night, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distributed. You final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.

I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.

I can't really help you with finding an algorithm for your specific problem.
However in regards to the random choosing of parameters I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting those, which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation for which again the best are selected.
If the function is more or less continous (that is small mutations of good inputs generally won't generate bad inputs (small being a somewhat generic)), this would work reasonably well for your problem.

There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.
Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify it slightly instead of just randomizing it. For example, in TSP some basic operator is lambda, that swaps two nodes and thus creates new route. Your can be shifting some number up/down for some value.
then, find yourself some nice algorithm, your starting point can be somewhere here. The book is invaluable resource for anyone who starts with problem-solving.

What can be parameters other than time and space while analyzing certain algorithms?

I was interested to know about parameters other than space and time during analysing the effectiveness of an algorithms. For example, we can focus on the effective trap function while developing encryption algorithms. What other things can you think of ?

First and foremost there's correctness. Make sure your algorithm always works, no matter what the input. Even for input that the algorithm is not designed to handle, you should print an error mesage, not crash the entire application. If you use greedy algorithms, make sure they truly work in every case, not just a few cases you tried by hand.
Then there's practical efficiency. An O(N2) algorithm can be a lot faster than an O(N) algorithm in practice. Do actual tests and don't rely on theoretical results too much.
Then there's ease of implementation. You usually don't need the best intro sort implementation to sort an array of 100 integers once, so don't bother.
Look for worst cases in your algorithms and if possible, try to avoid them. If you have a generally fast algorithm but with a very bad worst case, consider detecting that worst case and solving it using another algorithm that is generally slower but better for that single case.
Consider space and time tradeoffs. If you can afford the memory in order to get better speeds, there's probably no reason not to do it, especially if you really need the speed. If you can't afford the memory but can afford to be slower, do that.
If you can, use existing libraries. Don't roll your own multiprecision library if you can use GMP for example. For C++, stuff like boost and even the STL containers and algorithms have been worked on for years by an army of people and are most likely better than you can do alone.

Stability (sorting) - Does the algorithm maintain the relative order of equal elements?
Numeric Stability - Is the algorithm prone to error when very large or small real numbers are used?
Correctness - Does the algorithm always give the correct answer? If not, what is the margin of error?
Generality - Does the algorithm work in many situation (e.g. with many different data types)?
Compactness - Is the program for the algorithm concise?
Parallelizability - How well does performance scale when the number of concurrent threads of execution are increased?
Cache Awareness - Is the algorithm designed to maximize use of the computer's cache?
Cache Obliviousness - Is the algorithm tuned for particulary cache-sizes / cache-line-sizes or does it perform well regardless of the parameters of the cache?

Complexity. 2 algorithms being the same in all other respects, the one that's much simpler is going to be a much better candidate for future customization and use.
Ease of parallelization. Depending on your use case, it might not make any difference or, on the other hand, make the algorithm useless because it can't use 10000 cores.

Stability - some algorithms may "blow up" with certain test conditions, e.g. take an inordinately long time to execute, or use an inordinately large amount of memory, or perhaps not even terminate.

For algorithms that perform floating point operations, the accumulation of round-off error is often a consideration.

Power consumption, for embedded algorithms (think smartcards).

One important parameter that is frequently measure in the analysis of algorithms is that of Cache hits and cache misses. While this is a very implementation and architecture dependent issue, it is possible to generalise somewhat. One particularly interesting property of the algorithm is being Cache-oblivious, which means that the algorithm will use the cache optimally on multiple machines with different cache sizes and structures without modification.

Time and space are the big ones, and they seem so plain and definitive, whereby they should often be qualified (1). The fact that the OP uses the word "parameter" rather than say "criteria" or "properties" is somewhat indicative of this (as if a big O value on time and on space was sufficient to frame the underlying algorithm).
Other criteria include:
domain of applicability
complexity
mathematical tractability
definitiveness of outcome
ease of tuning (may be tied to "complexity" and "tactability" afore mentioned)
ability of running the algorithm in a parallel fashion
(1) "qualified": As hinted in other answers, a -technically- O(n^2) algorithm may be found to be faster than say an O(n) algorithm, in 90% of the cases (which, btw, may turn out to be 100% of the practical cases)

worst case and best case are also interesting, especially when linked to some conditions in the input. if your input data shows some properties, an algorithm, by taking advantage of this property, may perform better that another algorithm which performs the same task but does not use that property.
for example, many sorting algorithm perform very efficiently when input are partially ordered in a specific way which minimizes the number of operations the algorithm has to execute.
(if your input is mostly sorted, an insertion sort will fit nicely, while you would never use that algorithm otherwise)

If we're talking about algorithms in general, then (in the real world) you might have to think about CPU/filesystem(read/write operations)/bandwidth usage.
True they are way down there in the list of things you need worry about these days, but given a massive enough volume of data and cheap enough infrastructure you might have to tweak your code to ease up on one or the other.

What you are interested aren’t parameters, rather they are intrinsic properties of an algorithm.
Anyway, another property you might be interested in, and analyse an algorithm for, concerns heuristics (or rather, approximation algorithms), i.e. algorithms which don’t find an exact solution but rather one that is (hopefully) good enough.
You can analyze how far a solution is from the theoretical optimal solution in the worst case. For example, an existing algorithm (forgot which one) approximates the optimal travelling salesman tour by a factor of two, i.e. in the worst case it’s twice as long as the optimal tour.
Another metric concerns randomized algorithms where randomization is used to prevent unwanted worst-case behaviours. One example is randomized quicksort; quicksort has a worst-case running time of O(n2) which we want to avoid. By shuffling the array beforehand we can avoid the worst-case (i.e. an already sorted array) with a very high probability. Just how high this probability is can be important to know; this is another intrinsic property of the algorithm that can be analyzed using stochastic.

For numeric algorithms, there's also the property of continuity: that is, whether if you change input slightly, output also changes only slightly. See also Continuity analysis of programs on Lambda The Ultimate for a discussion and a link to an academical paper.
For lazy languages, there's also strictness: f is called strict if f _|_ = _|_ (where _|_ denotes the bottom (in the sense of domain theory), a computation that can't produce a result due to non-termination, errors etc.), otherwise it is non-strict. For example, the function \x -> 5 is non-strict, because (\x -> 5) _|_ = 5, whereas \x -> x + 1 is strict.
Another property is determinicity: whether the result of the algorithm (or its other properties, such as running time or space consumption) depends solely on its input.

All these things in the other answers about the quality of various algorithms are important and should be considered.
But time and space are two things that vary at some rate compared to the size of the input (n). So what else can vary according to n?
There are several that are related to I/O. For example, the number of writes to a disk is an important one, which may not be directly shown by space and time estimates alone. This becomes particularly important with flash memory, where the number of writes to the same memory location is the significant metric in some algorithms.
Another I/O metric would be "chattiness". A networking protocol might send shorter messages more often adding up to the same space and time as another networking protocol, but some aspect of the system (perhaps billing?) might make minimizing either the size or number of the messages desireable.
And that brings us to Cost, which is a very important algorithmic consideration sometimes. The cost of an algorithm may be affected by both space and time in different amounts (consider the separate costing of server storage space and gigabits of data transfer), but the cost is the thing that you wish to minimize overall, so it may have its own big-O estimations.

Which algorithm for assigning shifts (discrete optimization problem)

I'm developing an application that optimally assigns shifts to nurses in a hospital. I believe this is a linear programming problem with discrete variables, and therefore probably NP-hard:
For each day, each nurse (ca. 15-20) is assigned a shift
There is a small number (ca. 6) of different shifts
There is a considerable number of constraints and optimization criteria, either concerning a day, or concerning an emplyoee, e.g.:
There must be a minimum number of people assigned to each shift every day
Some shifts overlap so that it's OK to have one less person in early shift if there's someone doing intermediate shift
Some people prefer early shift, some prefer late shift, but a minimum of shift changes is required to still get the higher shift-work pay.
It's not allowed for one person to work late shift one day and early shift the next day (due to minimum resting time regulations)
Meeting assigned working week lengths (different for different people)
...
So basically there is a large number (aout 20*30 = 600) variables that each can take a small number of discrete values.
Currently, my plan is to use a modified Min-conflicts algorithm
start with random assignments
have a fitness function for each person and each day
select the person or day with the worst fitness value
select at random one of the assignments for that day/person and set it to the value that results in the optimal fitness value
repeat until either a maximum number of iteration is reached or no improvement can be found for the selected day/person
Any better ideas? I am somewhat worried that it will get stuck in a local optimum. Should I use some form of simulated annealing? Or consider not only changes in one variable at a time, but specifically switches of shifts between two people (the main component in the current manual algorithm)? I want to avoid tailoring the algorithm to the current constraints since those might change.
Edit: it's not necessary to find a strictly optimal solution; the roster is currently done manual, and I'm pretty sure the result is considerably sub-optimal most of the time - shouldn't be hard to beat that. Short-term adjustments and manual overrides will also definitely be necessary, but I don't believe this will be a problem; Marking past and manual assignments as "fixed" should actually simplify the task by reducing the solution space.

This is a difficult problem to solve well. There has been many academic papers on this subject particularly in the Operations Research field - see for example nurse rostering papers 2007-2008 or just google "nurse rostering operations research". The complexity also depends on aspects such as: how many days to solve; what type of "requests" can the nurse's make; is the roster "cyclic"; is it a long term plan or does it need to handle short term rostering "repair" such as sickness and swaps etc etc.
The algorithm you describe is a heuristic approach.
You may find you can tweak it to work well for one particular instance of the problem but as soon as "something" is changed it may not work so well (e.g. local optima, poor convergence).
However, such an approach may be adequate depending your particular business needs - e.g. how important is it to get the optimal solution, is the problem outline you describe expected to stay the same, what is the potential savings (money and resources), how important is the nurse's perception of the quality of their rosters, what is the budget for this work etc.

Umm, did you know that some ILP-solvers do quite a good job? Try AIMMS, Mathematica or the GNU programming kit! 600 Variables is of course a lot more than the Lenstra theorem will solve easily, but sometimes these ILP solvers have a good handle and in AIMMS, you can modify the branching strategy a little. Plus, there's a really fast 100%-approximation for ILPs.

I solved a shift assignment problem for a large manufacturing plant recently. First we tried generating purely random schedules and returning any one which passed the is_schedule_valid test - the fallback algorithm. This was, of course, slow and indeterminate.
Next we tried genetic algorithms (as you suggested), but couldn't find a good fitness function that closed on any viable solution (because the smallest change can make the entire schedule RIGHT or WRONG - no points for almost).
Finally we chose the following method (which worked great!):
Randomize the input set (i.e. jobs, shift, staff, etc.).
Create a valid tuple and add it to your tentative schedule.
If not valid tuple can be created, rollback (and increment) the last tuple added.
Pass the partial schedule to a function that tests could_schedule_be_valid, that is, could this schedule be valid if the remaining tuples were filled in a possible way
If !could_schedule_be_valid, simply rollback (and increment) the tuple added in (2).
If schedule_is_complete, return schedule
Goto (2)
You incrementally build a partial shift this way. The benefit is that some tests for valid schedule can easily be done in Step 2 (pre-tests), and others must remain in Step 5 (post-tests).
Good luck. We wasted days trying the first two algorithms, but got the recommended algorithm generating valid schedules instantly in under 5 hours of development.
Also, we supported pre-fixing and post-fixing of assignments that the algorithm would respect. You simply don't randomize those slots in Step 1. You'll find that the solutions doesn't have to be anywhere near optimal. Our solution is O(N*M) at a minimum but executes in PHP(!) in less than half a second for an entire manufacturing plant. The beauty is in ruling out bad schedules quickly using a good could_schedule_be_valid test.
The people that are used to doing it manually don't care if it takes an hour - they just know they don't have to do it manually any more.

Mike,
Don't know if you ever got a good answer to this, but I'm pretty sure that constraint programming is the ticket. While a GA might give you an answer, CP is designed to give you many answers or tell you if there is no feasible solution. A search on "constraint programming" and scheduling should bring up lots of info. It's a relatively new area and CP methods work well on many types of problems where traditional optimization methods bog down.

Dynamic programming a la Bell? Kinda sounds like there's a place for it: overlapping subproblems, optimal substructures.

One thing you can do is to try to look for symmetries in the problem. E.g. can you treat all nurses as equivalent for the purposes of the problem? If so, then you only need to consider nurses in some arbitrary order -- you can avoid considering solutions such that any nurse i is scheduled before any nurse j where i > j. (You did say that individual nurses have preferred shift times, which contradicts this example, although perhaps that's a less important goal?)

I think you should use genetic algorithm because:
It is best suited for large problem instances.
It yields reduced time complexity on the price of inaccurate answer(Not the ultimate best)
You can specify constraints & preferences easily by adjusting fitness punishments for not met ones.
You can specify time limit for program execution.
The quality of solution depends on how much time you intend to spend solving the program..
Genetic Algorithms Definition
Genetic Algorithms Tutorial
Class scheduling project with GA
Also take a look at :a similar question and another one

Using CSP programming I made programms for automatic shitfs rostering. eg:
2-shifts system - tested for 100+ nurses, 30 days time horizon, 10+
rules
3-shifts system - tested for 80+ nurses, 30 days time horizon, 10+ rules
3-shifts system, 4-teams - tested for 365 days horizon, 10+ rules,
and a couple of similiar systems. All of them were tested on my home PC (1.8GHz, dual-core). Execution times always were acceptable ie. for 3/ it took around 5 min and 300MB RAM.
Most hard part of this problem was selecting proper solver and proper solving strategy.

Metaheuristics did very well at the International Nurse Rostering Competition 2010.
For an implementation, see this video with a continuous nurse rostering (java).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio