Algorithm Perfection vs. Time Analysis: Does Time Complexity Matter Every Time? - algorithm

I have a very basic, general question about algorithm design. I have learnt basic algorithms and am now learning randomized algorithms. Everywhere I look, professors mostly focus on designing algorithms that ultimately reduce complexity.
The usual approach (as far as I have observed) is to start from a basic (or older) algorithm that behaves badly in terms of complexity, and then replace it with a newer algorithm that reduces the complexity without affecting the output.
But in many of the algorithms I've studied, especially distributed algorithms (in distributed operating systems) such as those for distributed mutual exclusion and distributed deadlock detection, it seems to me that the design should not focus only on improving complexity; it should focus on the perfection of the algorithm as well.
Let's take distributed mutual exclusion as an example. The basic algorithm is Lamport's algorithm, and its modified version (with improved complexity) is the Ricart-Agrawala algorithm. Since communication in a distributed environment is mostly by message passing, distributed mutual exclusion uses three kinds of messages: a) request the critical resource, b) reply to the request, c) release the critical resource. The basic algorithm uses extra release messages (to inform all sites that my site has released the critical resource, so they may enter). The advanced version discards these release messages by folding them into the reply messages, which yields a solution with lower message complexity.
But when I implemented these algorithms in Java, I observed that although the complexity of the basic algorithm was a bit higher, it was more perfect than the advanced one. By reducing the number of messages transferred (in the advanced solution), the local site no longer knows whether a remote site has actually released the resource, because a site updates its local data structures (such as the request queue) only when it receives a release message. If no explicit release notification is sent, requests remain pending unnecessarily in the local site's request queue for the entire run.
So my question is: if improving complexity is so important, why isn't perfection? If an algorithm produces perfect results at the cost of slightly higher complexity, why does that matter, as long as I get better output than from the lower-complexity solution that lacks perfection?
Note: By perfection I don't mean correct/incorrect results. Results are always correct. Only the perfection or accuracy of the produced result varies.
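For reference, the message counts behind this comparison can be sketched as below. The per-entry figures, 3(N-1) for Lamport's algorithm and 2(N-1) for Ricart-Agrawala, are the usual textbook values; the function names are made up for illustration and are not from the question.

```python
def lamport_messages(n_sites: int) -> int:
    """Messages per critical-section entry in Lamport's algorithm:
    (N-1) REQUEST + (N-1) REPLY + (N-1) RELEASE."""
    return 3 * (n_sites - 1)


def ricart_agrawala_messages(n_sites: int) -> int:
    """Messages per critical-section entry in Ricart-Agrawala:
    (N-1) REQUEST + (N-1) REPLY; the release is folded into deferred replies."""
    return 2 * (n_sites - 1)


for n in (3, 10, 100):
    print(n, lamport_messages(n), ricart_agrawala_messages(n))
```

The saving is exactly the N-1 explicit release messages, which is also the information the question says it misses in the local request queue.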

In principle, a fair complexity comparison is done between two algorithms that produce exactly the same output, e.g. sorting.
Your case is different: you describe algorithms with different behaviour.
Many factors decide which algorithm is better suited:
Ease of implementation (very important in practice).
Speed: a faster algorithm that lacks some functionality, as in your case, must be dramatically faster (a factor of 10 on the expected data volume) or easier to implement to be worth choosing.
Robustness: is it a well-known algorithm that has been used successfully for 10 years, or a new algorithm from a paper, where chances are high that it works only in the environment the scientist optimized for it? (I know of such a case for a telecom network algorithm.)

Consider any NP-complete problem (e.g. the travelling salesman problem).
There are no known non-exponential exact algorithms for these problems (except in special cases), so it would literally take years (or much longer) to find an exact solution for any reasonably-sized version of these problems.
So, instead we use heuristics and approximations (and possibly some randomness) to get a non-exact solution in a reasonable time-frame.
NP-complete problems are just an extreme example - we can also just have a few seconds to get a solution (for whatever reason), but finding an exact solution will take a few minutes. So it all comes down to balancing out how long we want to run the algorithm for and how good we want the results to be (and development time also certainly plays a role).
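A minimal sketch of that trade-off, assuming a small random Euclidean TSP instance: the exact search tries every permutation, while the nearest-neighbour heuristic (chosen here only as an illustration, not something the answer names) returns a possibly longer tour almost instantly. All function names are hypothetical.

```python
import itertools
import math
import random


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def exact_tsp(points):
    """Exact optimum by trying every permutation: O(n!) time."""
    rest = range(1, len(points))
    return min(tour_length(points, (0,) + p) for p in itertools.permutations(rest))


def nearest_neighbour_tsp(points):
    """Greedy heuristic: always visit the closest unvisited city. O(n^2) time."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return tour_length(points, order)


rng = random.Random(0)  # fixed seed so the run is reproducible
cities = [(rng.random(), rng.random()) for _ in range(8)]
print("exact    :", exact_tsp(cities))
print("heuristic:", nearest_neighbour_tsp(cities))
```

With 8 cities the exact search is still instant; each extra city multiplies its work, while the heuristic stays quadratic.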
I hope I understood what you were asking correctly and that this helps.

Instead of "perfection", maybe you should consider "fitness for a particular purpose".
For your example of a distributed mutual exclusion algorithm, consider the "simple" and "improved" algorithms from different viewpoints. As another answer pointed out, the two algorithms behave differently; my point is that different people are interested in different aspects of that behavior.
Someone using an algorithm for a particular purpose probably does not care about all aspects of its behavior. For your example, you are concerned about pending resource locks. However, if the mutual exclusion algorithm is expected to be running all the time, the user might not care, because the locks will be returned promptly anyway, while using fewer messages than the simple version. If you want both efficiency and promptness, there is likely some way to accommodate both -- at the cost of greater complexity -- and if you're looking for practical "perfection", this is the logical endpoint.
A computer scientist does not know how his algorithm might be used. In general, he cannot anticipate all possible variations on a particular technique, and you would not want to read them all if he could! When publishing an algorithm, clarity of expression is the "perfection" you're pursuing -- the idea should be described as simply as possible.

Related

Unit Testing Approximation Algorithms

I'm working on an open-source approximation algorithms library for graphs and networks using some popular python packages as a base. The main goal is to encompass up-to-date approximation algorithms for NP-Complete problems over graphs and networks. The reason for this is 1) I haven't seen a nice (modern) consolidated package that covers this and 2) it would be a nice pedagogical tool for learning about approximation algorithms on NP-Hard optimization problems.
In building this library I am using unit-tests to sanity check (as any proper developer would). I am somewhat cautious about my unit tests in that by their very nature, approximation algorithms may not return the correct solution. Currently I am solving some small instances by hand and then assuring that the returned result matches that, but this is not desirable, nor scalable in an implementation sense.
What would be the best way to unit test approximation algorithms? Generate random instances and ensure that the returned results are less than the bound guaranteed by the algorithm? That would seem to have false positives (the test just got lucky that time, not guaranteed for all instances to be below bound).
You need to separate two concerns here. The quality of your approximation algorithms and the correctness of implementation of those algorithms.
Testing the quality of an approximation algorithm usually will not lend itself to unit testing methods used in software development. For example, you would need to generate random problems that are representative of the real sizes of problems. You might need to do mathematical work to get some upper/lower bound to judge the quality of your algorithms for unsolvable large instances. Or use problem test sets that have known or best known solutions and compare your results. But in any case unit testing would not help you much in improving the quality of the approximation algorithms. This is where your domain knowledge in optimization and math will help.
The correctness of your implementation is where unit tests will be really useful. You can use toy-sized problems here and compare known results (solved by hand, or verified through careful step-by-step debugging in code) with what your code generates. Having small problems is not only enough but also desirable here, so that tests run fast and can be run many times during the development cycle. These types of tests make sure that the overall algorithm is arriving at the correct result. It is somewhere between a unit test and an integration test, since you are testing a large portion of the code as a black box. But I have found these types of tests to be extremely useful in the optimization domain. One thing I recommend for this type of testing is removing all randomness in your algorithms through fixed seeds for random number generators. These tests should always run in a deterministic way and give exactly the same result 100% of the time.
I also recommend unit testing at the lower level modules of your algorithms. Isolate that method that assigns weights to arcs on the graph and check if the correct weights are assigned. Isolate your objective function value calculation function and unit test that. You get my point.
One other concern that cuts across both of these is performance. You cannot reliably test performance with small toy problems. Also, quickly noticing a change that significantly degrades the performance of a working algorithm is very desirable. Once you have a running version of your algorithms, you can create larger test problems, measure performance on them, and automate this as your performance/integration tests. You can run these less frequently as they take more time, but they will at least notify you early of performance bottlenecks newly introduced during refactoring or when adding features to the algorithms.
Checking the validity of the produced solutions is the obvious first step.
Additionally, one angle of attack could be regression testing using instances for which the expected approximate solution is known (e.g. obtained by executing the algorithm by hand or by using somebody else's implementation of the same algorithm).
There also exist repositories of problem instances with known (optimal) solutions, such as TSPLIB for TSP-like problems. Perhaps these could be put to some use.
If there are known upper bounds for the algorithm in question, then generating many random instances and verifying the heuristic solutions against the upper bounds may prove fruitful. If you do do this, I'd urge you to make the runs reproducible (e.g. by always using the same random number generator and seed).
One final note: for some problems, fully random instances are on average pretty easy to find good approximate solutions for. Asymmetric TSP with uniformly and independently chosen arc weights is one such example. I am mentioning this since it may affect your testing strategy.
There is usually something you can check - for instance, that your algorithm always returns solutions that satisfy their constraints, even if they are not optimal. You should also put in assertion checks at every possible opportunity - these will be specific to your program, but might check that some quantity is conserved, or that something that should increase or at worst stay the same does not decrease, or that some supposed local optimum really is a local optimum.
Given these sorts of checks, and the checks on bounds that you have already mentioned, I favour running tests on a very large number of randomly generated small problems, with random seeds chosen in such a way that if it fails on problem 102324 you can repeat that failure for debugging without running through the 102323 problems before it. With a large number of problems, you increase the chance that an underlying bug will cause an error obvious enough to fail your checks. With small problems, you increase the chance that you will be able to find and fix the bug.
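As a rough sketch of this strategy, assume the algorithm under test is the classic maximal-matching 2-approximation for minimum vertex cover (picked only as an illustration; it is not from the question). Each instance is generated from its own seed, so a failure on one instance can be replayed without running the others. All names are hypothetical.

```python
import itertools
import random


def approx_vertex_cover(edges):
    """Classic 2-approximation: take both endpoints of a maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def optimal_cover_size(n, edges):
    """Brute force over vertex subsets; only feasible for tiny graphs."""
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k
    return n


def check_instance(seed, n=7, p=0.4):
    """Build the instance from its own seed so any failure can be replayed alone."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]
    cover = approx_vertex_cover(edges)
    # Validity check: every edge must be covered.
    assert all(u in cover or v in cover for u, v in edges), f"invalid cover, seed={seed}"
    # Bound check: at most twice the brute-force optimum.
    assert len(cover) <= 2 * optimal_cover_size(n, edges), f"bound violated, seed={seed}"


for seed in range(300):
    check_instance(seed)
print("all instances passed")
```

If instance 102324 ever failed, `check_instance(102324)` would reproduce it directly.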

When to use Tabu Search with Genetic Algorithms and when not?

Tabu Search can be used within Genetic Algorithms.
Genetic Algorithms may need many generations to succeed, so high performance is important for them. Tabu Search is an enhancement that helps avoid local maxima and uses a memory mechanism to achieve better results over the iterations. However, besides its benefits, Tabu Search usually makes the algorithm slower.
My question is:
Is there any research about when to use Tabu Search with Genetic Algorithms and when not?
Generally speaking, GAs spend a lot of time sampling points that are trivially suboptimal. Suppose you're optimizing a function that looks like a couple of camel humps. GAs will dump points all over the place initially, and slowly converge to the points being at the top of the humps. However, even a very simple local search algorithm can take a point that the GA generates on the slope of a hump and push it straight to the top of the hump essentially immediately. If you let every point the GA generates go through this simple local optimization, then you end up with a GA searching only the space of local optima, which generally will greatly improve your chances of finding the best solutions. The problem is that when you start on real problems instead of camel humps, simple local search algorithms often aren't powerful enough to find the really good local optima, but something like tabu search can be used in their place.
There are two drawbacks. One, each generation of the GA goes much more slowly (but you need many fewer generations usually). Two, you lose some diversity, which can cause you to converge to a suboptimal solution more often.
In practice, I would always include some form of local search inside a GA whenever possible. No Free Lunch tells us that sometimes you'll make things worse, but after ten years or so of doing GA and local search research professionally, I'd pretty much always put up a crisp new $100 bill that says that local search will improve things for the majority of cases you really care about. It doesn't have to be tabu search; you could use Simulated Annealing, VDS, or just a simple next-ascent hill climber.
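A toy sketch of the idea, assuming a one-dimensional "camel hump" function on the integers 0..199 and a simple next-ascent hill climber as the local search. The fitness function, operators, and parameter values are invented for illustration and are not taken from the answer.

```python
import random


def fitness(x):
    """Two 'camel humps' on 0..199: a lower peak at 50 and a higher one at 150."""
    return max(60 - abs(x - 50), 100 - 2 * abs(x - 150))


def hill_climb(x, lo=0, hi=199):
    """Next-ascent local search: step to a better neighbour until none exists."""
    while True:
        better = [n for n in (x - 1, x + 1) if lo <= n <= hi and fitness(n) > fitness(x)]
        if not better:
            return x
        x = better[0]


def memetic_search(generations=20, pop_size=10, seed=0):
    """GA in which every individual is pushed to a local optimum before use."""
    rng = random.Random(seed)
    pop = [hill_climb(rng.randrange(200)) for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(pop, 2)
        child = (a + b) // 2 + rng.randint(-40, 40)  # crude crossover plus mutation
        child = hill_climb(min(199, max(0, child)))
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)


print(memetic_search())
```

Because every individual passes through `hill_climb`, the population only ever contains the two hump tops, which is the "searching only the space of local optima" effect described above.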
When you combine multiple heuristics together, you have what's referred to as a hybrid-heuristic.
It has been a trend in the last decade or so to explore the advantages and disadvantages of hybrid-heuristics in the optimisation "crowd".
There are literally hundreds of papers available on the topic and a lot of them are quite good. I have seen papers which employ a local-search (hill-climbing, not Tabu) for each offspring at each generation of GA to direct each offspring to the local optimum. The authors report good results. I have also seen papers which use a GA to optimise the cooling schedule of a simulated annealing algorithm for both a particular problem instance and also for a general case and have good results. I've also read a paper which adds a tabu list to a simulated annealing algorithm so that it prevents revisiting solutions it has seen in the past n iterations, unless some aspiration function is satisfied.
If you're working on timetabling (as your other comment suggests), I suggest you read some papers from PATAT (practice and theory in automated timetabling), particularly from E.K.Burke and P.Brucker who are very active and well-known in the field. A lot of the PATAT proceedings are freely available.
Try a Scholar search like this:
http://scholar.google.com/scholar?q=%22hybrid+heuristics%22+%22combinatorial+optimization%22+OR+timetabling+OR+scheduling&btnG=&hl=en&as_sdt=0%2C5&as_ylo=2006
It is very difficult to prove the convergence of these sorts of heuristics mathematically. I have seen a Markov chain representation of simulated-annealing which shows upper- and lower-bounds of convergence and there exists something similar for GA. Often you can use many different heuristics on a single problem, and only experimental results will show which is better. You may need to do computational experiments to see if your GA can be improved with a TS or more generic local search, but in general, hybrid heuristics seem to be the go these days.
I haven't combined tabu search with genetic algorithms yet, but I have combined it with simulated annealing. It's not really tabu search, it's more enhancing the other algorithm with tabu.
From my experience, checking if something is tabu doesn't have a high performance cost.
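One way to keep that check cheap, sketched under the assumption that solutions (or moves) are hashable: keep a deque for ageing entries out and a set for constant-time membership tests. The class name and the `tenure` parameter are hypothetical.

```python
from collections import deque


class TabuList:
    """Fixed-length tabu memory: a deque for ageing plus a set for O(1) lookups."""

    def __init__(self, tenure):
        self.tenure = tenure
        self._order = deque()
        self._members = set()

    def add(self, solution):
        if solution in self._members:
            return
        self._order.append(solution)
        self._members.add(solution)
        if len(self._order) > self.tenure:
            # Forget the oldest entry once the tenure is exceeded.
            self._members.discard(self._order.popleft())

    def __contains__(self, solution):
        return solution in self._members


tabu = TabuList(tenure=3)
for s in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    tabu.add(s)
print((0, 1) in tabu, (0, 0) in tabu)
```

The oldest entry `(0, 1)` has aged out after the fourth insertion, while `(0, 0)` is still tabu.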

Initial Genetic Programming Parameters

I did a little GP (note: very little) work in college and have been playing around with it recently. My question is in regards to the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function. I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy.
For number of generations, it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.
Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about). I've also found total node counts of ~1000 to be quite useful hard limits.
Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection.)
-David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data by varying parameters on a very simple problem and link given operators and parameter values (such as mutation rates, etc.) to given results as a function of population size, etc.
Once I got into GAs a bit more, I realized that given the enormous number of variables this is a huge task, and generalization is extremely difficult.
Speaking from my (limited) experience: if you decide to simplify the problem, fix the way crossover and selection are implemented, and just play with population size and mutation rate (implemented in a given way) to try to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (however you want to define decent) still depends primarily on the problem you're solving and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).
It is certainly possible to draft a set of guidelines, as the (rare but good) literature proves, but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in exactly the same way and the fitness is evaluated in a somehow equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthwhile to start a GP run with fewer than a million individuals in the GP population ;-)
As mentioned before it depends on your personal taste and experiences, resources and probably the GP system used!
Cheers,
Jan

What can be parameters other than time and space while analyzing certain algorithms?

I was interested to know about parameters other than space and time when analysing the effectiveness of an algorithm. For example, we can focus on the effective trap function while developing encryption algorithms. What other things can you think of?
First and foremost there's correctness. Make sure your algorithm always works, no matter what the input. Even for input that the algorithm is not designed to handle, you should print an error message, not crash the entire application. If you use greedy algorithms, make sure they truly work in every case, not just a few cases you tried by hand.
Then there's practical efficiency. An O(N^2) algorithm can be a lot faster than an O(N) algorithm in practice. Do actual tests and don't rely on theoretical results too much.
Then there's ease of implementation. You usually don't need the best intro sort implementation to sort an array of 100 integers once, so don't bother.
Look for worst cases in your algorithms and if possible, try to avoid them. If you have a generally fast algorithm but with a very bad worst case, consider detecting that worst case and solving it using another algorithm that is generally slower but better for that single case.
Consider space and time tradeoffs. If you can afford the memory in order to get better speeds, there's probably no reason not to do it, especially if you really need the speed. If you can't afford the memory but can afford to be slower, do that.
If you can, use existing libraries. Don't roll your own multiprecision library if you can use GMP for example. For C++, stuff like boost and even the STL containers and algorithms have been worked on for years by an army of people and are most likely better than you can do alone.
Stability (sorting) - Does the algorithm maintain the relative order of equal elements?
Numeric Stability - Is the algorithm prone to error when very large or small real numbers are used?
Correctness - Does the algorithm always give the correct answer? If not, what is the margin of error?
Generality - Does the algorithm work in many situations (e.g. with many different data types)?
Compactness - Is the program for the algorithm concise?
Parallelizability - How well does performance scale when the number of concurrent threads of execution is increased?
Cache Awareness - Is the algorithm designed to maximize use of the computer's cache?
Cache Obliviousness - Is the algorithm tuned for particular cache sizes / cache-line sizes, or does it perform well regardless of the parameters of the cache?
Complexity. 2 algorithms being the same in all other respects, the one that's much simpler is going to be a much better candidate for future customization and use.
Ease of parallelization. Depending on your use case, it might not make any difference or, on the other hand, make the algorithm useless because it can't use 10000 cores.
Stability - some algorithms may "blow up" with certain test conditions, e.g. take an inordinately long time to execute, or use an inordinately large amount of memory, or perhaps not even terminate.
For algorithms that perform floating point operations, the accumulation of round-off error is often a consideration.
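A small illustration of accumulated round-off, using only the Python standard library: adding 0.1 ten times in a plain loop does not give exactly 1.0, while `math.fsum`, which tracks intermediate partial sums, does.

```python
import math

values = [0.1] * 10

# Naive accumulation: every addition rounds to the nearest double.
total = 0.0
for v in values:
    total += v

# math.fsum keeps exact partial sums and rounds only once at the end.
accurate = math.fsum(values)

print(total == 1.0, accurate == 1.0)
```

The explicit loop is used on purpose, since how the built-in `sum` handles floats can differ between Python versions.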
Power consumption, for embedded algorithms (think smartcards).
One important parameter that is frequently measured in the analysis of algorithms is the number of cache hits and cache misses. While this is a very implementation- and architecture-dependent issue, it is possible to generalise somewhat. One particularly interesting property of an algorithm is being cache-oblivious, which means that the algorithm will use the cache optimally on multiple machines with different cache sizes and structures without modification.
Time and space are the big ones, and they seem so plain and definitive, whereas they should often be qualified (1). The fact that the OP uses the word "parameter" rather than, say, "criteria" or "properties" is somewhat indicative of this (as if a big O value on time and on space was sufficient to frame the underlying algorithm).
Other criteria include:
domain of applicability
complexity
mathematical tractability
definitiveness of outcome
ease of tuning (may be tied to "complexity" and "tractability" mentioned above)
ability of running the algorithm in a parallel fashion
(1) "qualified": As hinted in other answers, a technically O(n^2) algorithm may be found to be faster than, say, an O(n) algorithm in 90% of the cases (which, btw, may turn out to be 100% of the practical cases).
Worst case and best case are also interesting, especially when linked to some conditions in the input. If your input data shows some properties, an algorithm that takes advantage of them may perform better than another algorithm which performs the same task but does not use them.
For example, many sorting algorithms perform very efficiently when the input is partially ordered in a specific way which minimizes the number of operations the algorithm has to execute.
(If your input is mostly sorted, an insertion sort will fit nicely, while you would never use that algorithm otherwise.)
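To make the insertion sort remark concrete, here is a small sketch that counts element shifts, which equal the number of inversions in the input. The function name and the test inputs are illustrative assumptions.

```python
def insertion_sort_shifts(data):
    """Sort a copy of data and return (sorted_list, number_of_element_shifts)."""
    a = list(data)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Move larger elements one slot to the right to make room for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts


n = 1000
nearly_sorted = list(range(n))
nearly_sorted[10], nearly_sorted[11] = nearly_sorted[11], nearly_sorted[10]
reversed_input = list(range(n, 0, -1))

print(insertion_sort_shifts(nearly_sorted)[1])   # a single inversion
print(insertion_sort_shifts(reversed_input)[1])  # n*(n-1)/2 inversions
```

On mostly sorted input the work is close to linear; on reversed input it is quadratic.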
If we're talking about algorithms in general, then (in the real world) you might have to think about CPU/filesystem(read/write operations)/bandwidth usage.
True they are way down there in the list of things you need worry about these days, but given a massive enough volume of data and cheap enough infrastructure you might have to tweak your code to ease up on one or the other.
What you are interested in aren’t parameters; rather, they are intrinsic properties of an algorithm.
Anyway, another property you might be interested in, and analyse an algorithm for, concerns heuristics (or rather, approximation algorithms), i.e. algorithms which don’t find an exact solution but rather one that is (hopefully) good enough.
You can analyze how far a solution is from the theoretical optimal solution in the worst case. For example, an existing algorithm (forgot which one) approximates the optimal travelling salesman tour by a factor of two, i.e. in the worst case it’s twice as long as the optimal tour.
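One algorithm with such a factor-two guarantee, for the metric case where distances obey the triangle inequality, is the minimum-spanning-tree ("double-tree") heuristic. Whether this is the one the answer had in mind is an assumption. The sketch below builds the tree with Prim's algorithm, visits it in preorder, and compares against a brute-force optimum on a tiny random instance; all names are hypothetical.

```python
import itertools
import math
import random


def tour_length(pts, order):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def double_tree_tour(pts):
    """MST-based heuristic for metric TSP: preorder walk of a minimum spanning tree."""
    n = len(pts)
    children = {i: [] for i in range(n)}
    # Prim's algorithm: best[j] = (distance to the tree, closest tree vertex).
    best = {j: (math.dist(pts[0], pts[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(j)
        children[parent].append(j)
        for k in best:
            d = math.dist(pts[j], pts[k])
            if d < best[k][0]:
                best[k] = (d, j)
    order, stack = [], [0]
    while stack:  # iterative preorder traversal of the tree
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    return order


rng = random.Random(3)  # fixed seed for reproducibility
pts = [(rng.random(), rng.random()) for _ in range(7)]
approx = tour_length(pts, double_tree_tour(pts))
optimum = min(tour_length(pts, (0,) + p) for p in itertools.permutations(range(1, 7)))
print(approx <= 2 * optimum)
```

The guarantee relies on the triangle inequality; for arbitrary edge weights this heuristic carries no such bound.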
Another metric concerns randomized algorithms where randomization is used to prevent unwanted worst-case behaviours. One example is randomized quicksort; quicksort has a worst-case running time of O(n^2) which we want to avoid. By shuffling the array beforehand we can avoid the worst case (e.g. an already sorted array) with a very high probability. Just how high this probability is can be important to know; this is another intrinsic property of the algorithm that can be analyzed using probability theory.
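The effect of shuffling can be seen by counting comparisons, sketched here under the assumption that the pivot is always the first element. The function models one pivot comparison per remaining element, as an in-place partition would perform, and uses an explicit stack to avoid deep recursion. Names and sizes are illustrative.

```python
import random


def quicksort_comparisons(data):
    """Count pivot comparisons of quicksort with the first element as pivot."""
    comparisons = 0
    stack = [list(data)]  # explicit stack instead of recursion
    while stack:
        part = stack.pop()
        if len(part) <= 1:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # one comparison against the pivot per element
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons


n = 500
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(1).shuffle(shuffled)  # fixed seed for reproducibility

print(quicksort_comparisons(already_sorted))  # n*(n-1)/2 for sorted input
print(quicksort_comparisons(shuffled))
```

For the sorted input every partition is maximally unbalanced, giving n(n-1)/2 comparisons; the shuffled input needs far fewer.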
For numeric algorithms, there's also the property of continuity: that is, whether a slight change in the input changes the output only slightly. See also Continuity analysis of programs on Lambda The Ultimate for a discussion and a link to an academic paper.
For lazy languages, there's also strictness: f is called strict if f _|_ = _|_ (where _|_ denotes the bottom (in the sense of domain theory), a computation that can't produce a result due to non-termination, errors etc.), otherwise it is non-strict. For example, the function \x -> 5 is non-strict, because (\x -> 5) _|_ = 5, whereas \x -> x + 1 is strict.
Another property is determinism: whether the result of the algorithm (or its other properties, such as running time or space consumption) depends solely on its input.
All these things in the other answers about the quality of various algorithms are important and should be considered.
But time and space are two things that vary at some rate compared to the size of the input (n). So what else can vary according to n?
There are several that are related to I/O. For example, the number of writes to a disk is an important one, which may not be directly shown by space and time estimates alone. This becomes particularly important with flash memory, where the number of writes to the same memory location is the significant metric in some algorithms.
Another I/O metric would be "chattiness". A networking protocol might send shorter messages more often, adding up to the same space and time as another networking protocol, but some aspect of the system (perhaps billing?) might make minimizing either the size or the number of messages desirable.
And that brings us to Cost, which is a very important algorithmic consideration sometimes. The cost of an algorithm may be affected by both space and time in different amounts (consider the separate costing of server storage space and gigabits of data transfer), but the cost is the thing that you wish to minimize overall, so it may have its own big-O estimations.

What is an efficient way to go beyond a greedy algorithm

The domain of this question is scheduling operations on constrained hardware. The resolution of the result is the number of clock cycles the schedule fits within. Early decisions constrain future decisions, and the total number of possible schedules grows exponentially. A lot of the possible schedules are equivalent because just swapping the order of two instructions usually results in the same timing constraint.
Basically the question is what is a good strategy for exploring the vast search space without spending too much time. I expect to search only a small fraction but would like to explore different parts of the search space while doing so.
The current greedy algorithm sometimes tends to make stupid decisions early on, and the attempt at branch and bound was beyond slow.
Edit:
Want to point out that the result is very binary with perhaps the greedy algorithm ending up using 8 cycles while there exists a solution using only 7 cycles using branch and bound.
The second point is that there are significant restrictions in data routing between instructions and dependencies between instructions that limit the amount of commonality between solutions. Look at it as a knapsack problem with a lot of ordering constraints as well as some solutions completely failing because of routing congestion.
Clarification:
In each cycle there is a limit to how many operations of each type can be executed, and some operations have two possible types. There is a set of routing constraints which can be varied to be either fairly tight or pretty forgiving, and the limit depends on routing congestion.
Integer linear optimization for NP-hard problems
Depending on your side constraints, you may be able to use the critical path method or (as suggested in a previous answer) dynamic programming. But many scheduling problems are NP-hard, just like the classical travelling salesman problem: a precise solution has a worst case of exponential search time, just as you describe in your problem.
It's important to know that while NP-hard problems still have a very bad worst case solution time there is an approach that very often produces exact answers with very short computations (the average case is acceptable and you often don't see the worst case).
This approach is to convert your problem to a linear optimization problem with integer variables. There are free-software packages (such as lp-solve) that can solve such problems efficiently.
The advantage of this approach is that it may give you exact answers to NP-hard problems in acceptable time. I used this approach in a few projects.
As your problem statement does not include more details about the side constraints, I cannot go into more detail how to apply the method.
Edit/addition: Sample implementation
Here are some details about how to implement this method in your case (of course, I make some assumptions that may not apply to your actual problem; I only know the details from your question):
Let's assume that you have 50 instructions cmd(i) (i=1..50) to be scheduled in 10 or fewer cycles cycle(t) (t=1..10). We introduce 500 binary variables v(i,t) (i=1..50; t=1..10) which indicate whether instruction cmd(i) is executed at cycle(t) or not. This basic setup gives the following linear constraints:
v(i,t) integer variables
0<=v(i,t); v(i,t)<=1; # 1000 constraints: i=1..50; t=1..10
sum(v(i,t): t=1..10)==1 # 50 constraints: i=1..50
Now, we have to specify your side conditions. Let's assume that operations cmd(1)...cmd(5) are multiplication operations and that you have exactly two multipliers --- in any cycle, you may perform at most two of these operations in parallel:
sum(v(i,t): i=1..5)<=2 # 10 constraints: t=1..10
For each of your resources, you need to add the corresponding constraints.
Also, let's assume that operation cmd(7) depends on operation cmd(2) and needs to be executed after it. To make the equation a little bit more interesting, let's also require a two-cycle gap between them:
sum(t*v(2,t): t=1..10) + 3 <= sum(t*v(7,t): t=1..10) # one constraint
Note: sum(t*v(2,t): t=1..10) is the cycle t where v(2,t) is equal to one.
Finally, we want to minimize the number of cycles. This is somewhat tricky because you get quite big numbers in the way that I propose: we assign each v(i,t) a price that grows exponentially with time, so pushing off operations into the future is much more expensive than performing them early:
sum(6^t * v(i,t): i=1..50; t=1..10) --> minimum. # one target function
I choose 6 to be bigger than 5 to ensure that adding one cycle to the system makes it more expensive than squeezing everything into fewer cycles. A side effect is that the program will go out of its way to schedule operations as early as possible. You may avoid this by performing a two-step optimization: first, use this target function to find the minimal number of necessary cycles. Then, pose the same problem again with a different target function, limiting the number of available cycles at the outset and imposing a more moderate price penalty for later operations. You have to play with this; I hope you get the idea.
Hopefully, you can express all your requirements as such linear constraints in your binary variables. Of course, there may be many opportunities to exploit your insight into your specific problem to make do with fewer constraints or fewer variables.
Then, hand your problem off to lp-solve or cplex and let them find the best solution!
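To see what the constraints and the target function mean before involving a solver, here is a brute-force sketch on a made-up toy instance (4 operations, 4 cycles, 2 multipliers, one precedence rule). It is not an ILP solver and does not scale; storing one cycle per operation plays the role of the constraint that exactly one v(i,t) is 1 for each i. All sizes and names are assumptions for illustration.

```python
import itertools

N_OPS, N_CYCLES = 4, 4      # hypothetical toy sizes
MULTIPLIES = {0, 1, 2}      # ops competing for the multipliers
MAX_MULT_PER_CYCLE = 2
PRECEDENCE = [(0, 3, 1)]    # (before, after, min distance): t[3] >= t[0] + 1


def feasible(schedule):
    """schedule[i] is the 1-based cycle of op i, i.e. the t with v(i,t) = 1."""
    for t in range(1, N_CYCLES + 1):
        # Resource constraint: at most two multiplications per cycle.
        if sum(1 for i in MULTIPLIES if schedule[i] == t) > MAX_MULT_PER_CYCLE:
            return False
    # Precedence constraints with a minimum distance in cycles.
    return all(schedule[a] + d <= schedule[b] for a, b, d in PRECEDENCE)


def cost(schedule):
    """The exponential price from the answer: sum of 6^t over all ops."""
    return sum(6 ** t for t in schedule)


candidates = itertools.product(range(1, N_CYCLES + 1), repeat=N_OPS)
best = min((s for s in candidates if feasible(s)), key=cost)
print(best, "cycles used:", max(best))
```

With only 4 operations the base 6 exceeds the number of operations, so any schedule that needs an extra cycle always costs more than one that does not.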
At first blush, it sounds like this problem might fit into a dynamic programming solution. Several operations may take the same amount of time so you might end up with overlapping subproblems.
If you can map your problem to the "travelling salesman" problem (like: find the optimal sequence to run all operations in minimum time), then you have an NP-complete problem.
A very quick way to get good solutions for that is the ant algorithm (or ant colony optimization).
The idea is that you send an ant down every path. The ant spreads a smelly substance on the path which evaporates over time. Short paths mean that the path will stink more when the next ant comes along. Ants prefer smelly over clean paths. Run thousands of ants through the network. The most smelly path is the optimal one (or at least very close).
Try simulated annealing, cf. http://en.wikipedia.org/wiki/Simulated_annealing .
