When to use Tabu Search with Genetic Algorithms and when not? - performance

Tabu Search can be used inside Genetic Algorithms.
Genetic Algorithms may need many generations to succeed, so running fast is important for them. Tabu Search is an enhancement that helps avoid local maxima, and its memory mechanism helps it improve results over the iterations. However, despite its benefits, Tabu Search usually makes the algorithm slower.
My question is:
Is there any research about when to use Tabu Search with Genetic Algorithms and when not?

Generally speaking, GAs spend a lot of time sampling points that are trivially suboptimal. Suppose you're optimizing a function that looks like a couple of camel humps. GAs will dump points all over the place initially, and slowly converge to the points being at the top of the humps. However, even a very simple local search algorithm can take a point that the GA generates on the slope of a hump and push it straight to the top of the hump essentially immediately. If you let every point the GA generates go through this simple local optimization, then you end up with a GA searching only the space of local optima, which generally will greatly improve your chances of finding the best solutions. The problem is that when you start on real problems instead of camel humps, simple local search algorithms often aren't powerful enough to find the really good local optima, but something like tabu search can be used in their place.
There are two drawbacks. One, each generation of the GA goes much more slowly (but you need many fewer generations usually). Two, you lose some diversity, which can cause you to converge to a suboptimal solution more often.
In practice, I would always include some form of local search inside a GA whenever possible. No Free Lunch tells us that sometimes you'll make things worse, but after ten years or so of doing GA and local search research professionally, I'd pretty much always put up a crisp new $100 bill that says that local search will improve things for the majority of cases you really care about. It doesn't have to be tabu search; you could use Simulated Annealing, VDS, or just a simple next-ascent hill climber.
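As a rough illustration of the idea above, here is a minimal sketch in which every offspring is pushed to a local optimum by a simple hill climber before it joins the population. The two-hump objective `camel`, the function names, and all parameter values are assumptions made up for this example; a tabu search or simulated annealing routine could be swapped in for `hill_climb`.

```python
import math
import random

def camel(x):
    # Hypothetical two-hump objective: a low hump near x=-1, a higher one near x=2.
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)

def hill_climb(x, f, step=0.05, max_iters=500):
    # Simple ascent hill climber: move to the better neighbour until neither improves.
    for _ in range(max_iters):
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            break
        x = best
    return x

def memetic_ga(f, pop_size=10, generations=15, lo=-4.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    # Every individual is locally optimised, so the GA only ever sees local optima.
    pop = [hill_climb(rng.uniform(lo, hi), f) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0, 1.0)   # blend crossover plus mutation
            child = min(hi, max(lo, child))
            children.append(hill_climb(child, f))     # the local search step
        pop = parents + children
    return max(pop, key=f)

best = memetic_ga(camel)
```

Each generation costs many more evaluations than a plain GA would, which is the first drawback mentioned above; the hope is that far fewer generations are needed.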

When you combine multiple heuristics together, you have what's referred to as a hybrid-heuristic.
It has been a trend in the last decade or so to explore the advantages and disadvantages of hybrid-heuristics in the optimisation "crowd".
There are literally hundreds of papers available on the topic and a lot of them are quite good. I have seen papers which employ a local-search (hill-climbing, not Tabu) for each offspring at each generation of GA to direct each offspring to the local optimum. The authors report good results. I have also seen papers which use a GA to optimise the cooling schedule of a simulated annealing algorithm for both a particular problem instance and also for a general case and have good results. I've also read a paper which adds a tabu list to a simulated annealing algorithm so that it prevents revisiting solutions it has seen in the past n iterations, unless some aspiration function is satisfied.
If you're working on timetabling (as your other comment suggests), I suggest you read some papers from PATAT (practice and theory in automated timetabling), particularly from E.K.Burke and P.Brucker who are very active and well-known in the field. A lot of the PATAT proceedings are freely available.
Try a Scholar search like this:
http://scholar.google.com/scholar?q=%22hybrid+heuristics%22+%22combinatorial+optimization%22+OR+timetabling+OR+scheduling&btnG=&hl=en&as_sdt=0%2C5&as_ylo=2006
It is very difficult to prove the convergence of these sorts of heuristics mathematically. I have seen a Markov chain representation of simulated-annealing which shows upper- and lower-bounds of convergence and there exists something similar for GA. Often you can use many different heuristics on a single problem, and only experimental results will show which is better. You may need to do computational experiments to see if your GA can be improved with a TS or more generic local search, but in general, hybrid heuristics seem to be the go these days.

I haven't combined tabu search with genetic algorithms yet, but I have combined it with simulated annealing. It's not really tabu search, it's more enhancing the other algorithm with tabu.
From my experience, checking if something is tabu doesn't have a high performance cost.
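To show why the check can be cheap, here is a minimal sketch of a fixed-length tabu list backed by a set, so membership tests take constant time on average. The class name `TabuList`, the `tenure` parameter, and the aspiration rule (allow a tabu solution if it beats the best value seen) are illustrative assumptions, and solutions are assumed to be hashable.

```python
from collections import deque

class TabuList:
    """Fixed-length memory of recently visited solutions (hypothetical helper)."""

    def __init__(self, tenure):
        self.tenure = tenure
        self._order = deque()   # oldest entry on the left
        self._members = set()   # same entries, for O(1) average lookup

    def add(self, solution):
        if solution in self._members:
            return
        self._order.append(solution)
        self._members.add(solution)
        if len(self._order) > self.tenure:
            # Forget the oldest solution once the list is full.
            self._members.discard(self._order.popleft())

    def is_tabu(self, solution, value=None, best_value=None):
        # Aspiration: a tabu solution is allowed if it beats the best value seen.
        if value is not None and best_value is not None and value > best_value:
            return False
        return solution in self._members

tabu = TabuList(tenure=3)
for s in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    tabu.add(s)
```

With a tenure of 3, the first solution added has already been forgotten after the fourth insertion, while the other three are still tabu.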

Related

Algorithm Perfection vs. Time Analysis: Does time complexity matter every time?

I have a very basic and general doubt related to algorithm design. I've learnt basic algorithms and am now learning randomized algorithms. Everywhere I have observed that a professor mostly focuses on designing an algorithm that ultimately tries to reduce the complexity.
The usual way (from what I have observed) is to learn some basic (or older) algorithm that behaves badly in terms of complexity, and the objective is then to replace it with a newer algorithm that reduces the complexity without affecting the output.
But in most of the algorithms I've studied, especially distributed algorithms (in distributed operating systems) such as algorithms for distributed mutual exclusion, distributed deadlock detection, etc., what I have observed (and what I mostly think) is that the design of an algorithm should not focus only on improving complexity; it should focus on the perfection of the algorithm as well.
Let's take the example of a distributed mutual exclusion algorithm. The very basic algorithm is Lamport's algorithm, and the modified version (with improved complexity) is the Ricart-Agrawala algorithm. Since communication in a distributed environment is mostly by means of message passing, for distributed mutual exclusion we have three kinds of messages: a) request the critical resource, b) reply to the request, c) release the critical resource. The basic algorithm uses extra release messages (to inform all sites that my site has released the critical resource, so they can enter). In the advanced version, these release messages are discarded by folding them into the reply messages, which gives a solution with reduced complexity.
But when I implemented these algorithms in Java, I observed that even though the complexity of the basic algorithm was a bit higher, it was more perfect than the advanced one. By reducing the number of messages transferred (in the advanced solution), the local site is no longer aware of whether the remote site has actually released the resource, because a site updates its local data structures, such as the request queue, only on confirmation of a release message. If we don't send any explicit notification for release, then requests remain pending unnecessarily in the request queue of the local site for the entire run.
So my doubt is: if improving complexity is so important, why isn't perfection? I mean, if an algorithm produces perfect results at the cost of slightly higher complexity, why does that matter, as long as I get more perfect output than the reduced-complexity solution, which lacks perfection?
Note: By perfection I don't mean correct/incorrect results. Results are always correct. Only the perfection or accuracy of the produced result varies.
In principle, a fair complexity comparison is made between two algorithms that produce exactly the same output, e.g. sorting.
In your case it is different: you describe algorithms with different behaviour.
Many factors decide which algorithm is better suited:
Ease of implementation (very important in practice).
A faster algorithm that lacks some functionality, as in your case, must be dramatically faster (a factor of 10 on the expected data volume) or easier to implement to be worth choosing.
Robustness: a well-known algorithm that has been used successfully for 10 years, versus a new algorithm from a paper, where chances are high that it works only in the environment the scientist optimized for it. (I know of such a case for a telecom network algorithm.)
Consider any NP-complete problem (e.g. the travelling salesman problem).
There are no known non-exponential exact algorithms for these problems (except in special cases), so it would literally take years (or much longer) to find an exact solution for any reasonably-sized version of these problems.
So, instead we use heuristics and approximations (and possibly some randomness) to get a non-exact solution in a reasonable time-frame.
NP-complete problems are just an extreme example - we can also just have a few seconds to get a solution (for whatever reason), but finding an exact solution will take a few minutes. So it all comes down to balancing out how long we want to run the algorithm for and how good we want the results to be (and development time also certainly plays a role).
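To make the trade-off concrete, here is a small sketch that solves a tiny travelling salesman instance exactly by brute force and approximately with the nearest-neighbour heuristic. The city coordinates and function names are made up for illustration; nearest neighbour is just one simple heuristic among many.

```python
import itertools
import math

def tour_length(tour, pts):
    # Length of the closed tour that visits pts in the given order.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def exact_tsp(pts):
    # Tries all (n-1)! orderings: exact, but hopeless beyond a handful of cities.
    rest = range(1, len(pts))
    return min(((0,) + p for p in itertools.permutations(rest)),
               key=lambda t: tour_length(t, pts))

def nearest_neighbour_tsp(pts):
    # Greedy heuristic: O(n^2) time, no optimality guarantee.
    tour, unvisited = [0], set(range(1, len(pts)))
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1), (7, 0)]
exact = tour_length(exact_tsp(cities), cities)
greedy = tour_length(nearest_neighbour_tsp(cities), cities)
```

With 7 cities the exact search checks only 720 orderings, but that count grows factorially, whereas the greedy tour stays cheap and is never shorter than the exact one.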
I hope I understood what you were asking correctly and that this helps.
Instead of "perfection", maybe you should consider "fitness for a particular purpose".
For your example of a distributed mutual exclusion algorithm, consider the "simple" and "improved" algorithms from different viewpoints. As another answer pointed out, the two algorithms behave differently; my point is that different people are interested in different aspects of that behavior.
Someone using an algorithm for a particular purpose probably does not care about all aspects of its behavior. For your example, you are concerned about pending resource locks. However, if the mutual exclusion algorithm is expected to be running all the time, the user might not care, because the locks will be returned promptly anyway, while using fewer messages than the simple version. If you want both efficiency and promptness, there is likely some way to accommodate both -- at the cost of greater complexity -- and if you're looking for practical "perfection", this is the logical endpoint.
A computer scientist does not know how his algorithm might be used. In general, he cannot anticipate all possible variations on a particular technique, and you would not want to read them all if he could! When publishing an algorithm, clarity of expression is the "perfection" you're pursuing -- the idea should be described as simply as possible.

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?
Approaches that I've seen used for this kind of problem have a similar overall structure: take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points, and probabilistically move some to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space, and probabilistically move a particle at random; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours' information as above, you take the best results each time and "cross-breed" them, hoping to get the best characteristics of each.
The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
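For orientation, here is a minimal single-point simulated annealing sketch for minimization. It is not taken from the linked pseudocode; the Gaussian neighbour move, the geometric cooling schedule, the test function `bumpy`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, iters=2000, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        # Neighbour: perturb every coordinate with Gaussian noise.
        cand = [xi + rng.gauss(0, step) for xi in x]
        fc = f(cand)
        # Always accept improvements; accept worse points with probability exp(-delta/t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling   # geometric cooling schedule
    return best, fbest

def bumpy(x):
    # Hypothetical test function with many local minima (Rastrigin-like).
    return sum(xi * xi + 3 * (1 - math.cos(2 * math.pi * xi)) for xi in x)

start = [3.0, -2.5]
best, fbest = simulated_annealing(bumpy, start)
```

With an evaluation that takes about a minute, `iters` would have to be far smaller than the 2000 used here, so the step size and cooling rate would need retuning for that budget.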
Note that all of these (and really any approach to this kind of high-dimensional search problem) are heuristics, which means they have parameters that have to be tuned to your particular problem, which can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit: since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.
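In Python, the same idea can be sketched with the standard `concurrent.futures` module instead of OpenMP. The objective `slow_objective` is a cheap stand-in for the real one-minute evaluation, and the helper name and worker count are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def slow_objective(params):
    # Stand-in for the expensive (about 1 minute) evaluation; hypothetical.
    return sum((p - 0.3) ** 2 for p in params)

def evaluate_batch(candidates, objective, workers=4):
    # map() preserves input order, so results line up with candidates.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(objective, candidates))

rng = random.Random(1)
candidates = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(8)]
scores = evaluate_batch(candidates, slow_objective)
best_score, best_params = min(zip(scores, candidates))
```

Threads only help if the evaluation spends its time outside the interpreter (an external simulator, I/O, or native code); for pure-Python evaluations, `ProcessPoolExecutor` from the same module is the usual substitute.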
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings-like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive is it, compared to the time you have at hand? If you need a good answer in a few minutes, this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.
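A loose sketch of the search described above, for readers who want to experiment with it. This is not the original implementation: the random orthogonal directions are built with Gram-Schmidt rather than the adapted matrix code, both signs of each direction are tried, restarts from new random points are left out, and the evaluation budget `max_evals` and all other parameter values are assumptions.

```python
import math
import random

def random_orthonormal_basis(dim, rng):
    # Gram-Schmidt on Gaussian random vectors; the rows form an orthonormal set.
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:            # skip (very unlikely) degenerate draws
            basis.append([vi / norm for vi in v])
    return basis

def direction_search(f, x, step=1.0, min_step=1e-3, max_evals=5000, seed=0):
    rng = random.Random(seed)
    fx, evals = f(x), 1
    while step > min_step and evals < max_evals:
        improved = False
        for d in random_orthonormal_basis(len(x), rng):
            for sign in (1.0, -1.0):
                s = step
                while evals < max_evals:
                    cand = [xi + sign * s * di for xi, di in zip(x, d)]
                    fc = f(cand)
                    evals += 1
                    if fc >= fx:
                        break          # "step back": keep the last good point
                    x, fx, improved = cand, fc, True
                    s *= 2.0           # speed up on every successful move
        if not improved:
            step /= 2.0                # nothing helped: halve the jump distance
    return x, fx

def sphere(x):
    # Trivial stand-in objective; the real function is slow and bumpy.
    return sum(xi * xi for xi in x)

x_min, f_min = direction_search(sphere, [3.0, -2.0])
```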
I can't really help you with finding an algorithm for your specific problem.
However in regards to the random choosing of parameters I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting those, which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation for which again the best are selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't generate bad inputs, "small" being a somewhat generic term), this would work reasonably well for your problem.
There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.
Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify a result slightly instead of just randomizing it. For example, in TSP a basic operator is lambda, which swaps two nodes and thus creates a new route. Yours could be shifting some number up or down by some value.
then, find yourself some nice algorithm; your starting point can be somewhere here. The book is an invaluable resource for anyone who is starting out with problem-solving.

Initial Genetic Programming Parameters

I did a little GP (note: very little) work in college and have been playing around with it recently. My question is in regards to the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function. I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy.
For number of generations it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.
Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about...). I've also found total node counts of ~1000 to be quite useful hard limits.
Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection.)
-David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data by varying parameters on a very simple problem, and to link given operators and parameter values (such as mutation rates, etc.) to given results as a function of population size, etc.
Once I got into GA a bit more, I realized that, given the enormous number of variables, this is a huge task, and generalization is extremely difficult.
Speaking from my (limited) experience, if you decide to simplify the problem, fix the way crossover and selection are implemented, and just play with population size and mutation rate (implemented in a given way) to try to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (however you want to define decent) still depends primarily on the problem you're solving and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).
It is certainly possible to draft a set of guidelines, as the (rare but good) literature proves, but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in the exact same way and the fitness is evaluated in a somehow equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthy to start a GP run with fewer than a million individuals in the GP population ;-)
As mentioned before it depends on your personal taste and experiences, resources and probably the GP system used!
Cheers,
Jan

City building strategy algorithms

I'm looking for some papers on finding an infrastructure development strategy in games like Starcraft / Age of Empires. Basic facts characterising those games are:
continuous time (well - it could be split into 10s periods, or something like that)
many variables describing growth (many resources, buildings levels, etc.)
many variables influencing growth (technology upgrades, levels, etc.)
Most of what I could find is basically either:
tree search minimising time to get to a given condition (building/technology at level X)
tree search maximising value = each game variable*bias
genetic algorithms... obviously doing either of the above
Are there any better algorithms that can be tuned to look for a perfect solution of the early phase?
You might find some information in one or more of these books:
http://www.gamedev.net/columns/books/books.asp?CategoryID=7
I do not know of any specific algorithm, but this does sound like a traveling salesman problem. It does look like you have your base rules, so you are already on your way. If you know what end condition you want to reach, then it shouldn't be too hard to build a heuristic algorithm for the above rules. Then you could just run a simulation of the build-outs and then measure them against each other. Each time you do that, you would have a better idea of how to get where you want. Check out this to learn about heuristic algorithms.
There is no "perfect solution" for the early phase (if your game is complex enough). If you've played these games online, you'll see players using various strategies and all of these working depending on the other player's strategy. Some try to attack very early, some are more defensive, some prefer developing economically rather than having lots of unprepared soldiers.
Given this, I believe you must try to figure out a good value function to be maximized.
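One simple form such a value function could take is the weighted sum from the question (each game variable times a bias). The variable names, weights, and the two candidate build states below are invented purely for illustration; choosing good weights is the hard part and is not addressed here.

```python
def state_value(state, weights):
    # value = sum over game variables of (variable * bias)
    return sum(weights[name] * state.get(name, 0) for name in weights)

# Hypothetical biases and two hypothetical game states after the same elapsed time.
weights = {"minerals": 1.0, "workers": 50.0, "barracks": 150.0}
build_a = {"minerals": 200, "workers": 12, "barracks": 1}
build_b = {"minerals": 50, "workers": 16, "barracks": 1}

better = max((build_a, build_b), key=lambda s: state_value(s, weights))
```

A tree search or genetic algorithm would then compare build orders by the value of the state each one reaches.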

What is the state of the art in computer chess tree searching?

I'm not interested in tiny optimizations that give a few percent of speed.
I'm interested in the most important heuristics for alpha-beta search, and the most important components of the evaluation function.
I'm particularly interested in algorithms that have the greatest (improvement/code_size) ratio.
(NOT (improvement/complexity)).
Thanks.
PS
Killer move heuristic is a perfect example - easy to implement and powerful.
Database of heuristics is too complicated.
Not sure if you're already aware of it, but check out the Chess Programming Wiki - it's a great resource that covers just about every aspect of modern chess AI. In particular, relating to your question, see the Search and Evaluation sections (under Principle Topics) on the main page. You might also be able to discover some interesting techniques used in some of the programs listed here. If your questions still aren't answered, I would definitely recommend you ask in the Chess Programming Forums, where there are likely to be many more specialists around to answer. (Not that you won't necessarily get good answers here, just that it's rather more likely on topic-specific expert forums).
MTD(f) or one of the MTD variants is a big improvement over standard alpha-beta, providing you don't have really fine detail in your evaluation function and assuming that you're using the killer heuristic. The history heuristic is also useful.
The top-rated chess program Rybka has apparently abandoned MTD(f) in favour of PVS with a zero-aspiration window on the non-PV nodes.
Extended futility pruning, which incorporates both normal futility pruning and deep razoring, is theoretically unsound, but remarkably effective in practice.
Iterative deepening is another useful technique. And I listed a lot of good chess programming links here.
Even though many optimizations based on heuristics (I mean ways to increase the tree depth without actually searching) are discussed in the chess programming literature, I think most of them are rarely used. The reason is that they are good performance boosters in theory, but not in practice.
Sometimes these heuristics can return a bad (I mean not the best) move too.
The people I have talked to always recommend optimizing the alpha-beta search and implementing iterative deepening in the code rather than trying to add the other heuristics.
The main reason is that computers are increasing in processing power, and research [citation needed, I suppose] has shown that programs that use their full CPU time to brute-force the alpha-beta tree to the maximum depth have always outperformed programs that split their time between a certain number of levels of alpha-beta and then some heuristics.
Even though using some heuristics to extend the tree depth can cause more harm than good, there are many performance boosters you can add to the alpha-beta search algorithm.
I am sure that you are aware that for alpha-beta to work exactly as intended, you should have a move-sorting mechanism (iterative deepening). Iterative deepening can give you about a 10% performance boost.
Adding the principal variation search technique to alpha-beta may give you an additional 10% boost.
Try the MTD(f) algorithm too. It can also increase the performance of your engine.
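To illustrate why move ordering matters for alpha-beta, here is a small sketch that compares plain minimax with alpha-beta on a hand-made game tree and counts the leaves each one evaluates. The tree, with nested lists as positions and numbers as static evaluations, is an assumption for the example; a real engine would generate and order moves itself, for instance using results from shallower iterations.

```python
def minimax(node, maximizing, counter):
    if isinstance(node, (int, float)):
        counter[0] += 1            # count evaluated leaves
        return node
    values = [minimax(c, not maximizing, counter) for c in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):
        counter[0] += 1
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # beta cutoff: the opponent avoids this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                  # alpha cutoff: we already have something better
    return value

# Root is a max node; the best reply subtree comes first (good move ordering).
tree = [[3, 5, 10], [2, 12, 8], [1, 4, 6]]
mm_count, ab_count = [0], [0]
mm_value = minimax(tree, True, mm_count)
ab_value = alphabeta(tree, True, ab_count)
```

Both searches return the same value, but alpha-beta evaluates 5 leaves here against 9 for minimax; with the best subtree searched last instead, fewer cutoffs would occur.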
One heuristic that hasn't been mentioned is Null move pruning.
Also, Ed Schröder has a great page explaining a number of tricks he used in his Rebel engine, and how much improvement each contributed to speed/performance: Inside Rebel
Using a transposition table with a Zobrist hash
It takes very little code to implement [one XOR on each move or unmove, and an if statement before recursing in the game tree], and the benefits are pretty good, especially if you are already using iterative deepening, and it's pretty tweakable (use a bigger table, smaller table, replacement strategies, etc)
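A minimal sketch of the idea, assuming a board stored as a dict from square index to piece letter. The key table, function names, and the stored entry are illustrative; keys for side to move, castling rights, and en passant are omitted, and no move legality is checked.

```python
import random

rng = random.Random(2024)
PIECES = "PNBRQKpnbrqk"
# One random 64-bit key per (piece, square) pair.
ZOBRIST = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}

def full_hash(board):
    # board: dict mapping square index (0..63) to a piece letter.
    h = 0
    for sq, piece in board.items():
        h ^= ZOBRIST[(piece, sq)]
    return h

def apply_move(board, h, src, dst):
    # Incrementally update the hash: XOR out what leaves, XOR in what arrives.
    piece = board.pop(src)
    h ^= ZOBRIST[(piece, src)]
    captured = board.pop(dst, None)
    if captured is not None:
        h ^= ZOBRIST[(captured, dst)]
    board[dst] = piece
    h ^= ZOBRIST[(piece, dst)]
    return h

board = {4: "K", 12: "P", 60: "k", 28: "p"}
h0 = full_hash(board)
h1 = apply_move(board, h0, 12, 28)   # squares are arbitrary; a capture for illustration
table = {h1: {"depth": 3, "score": 0.5}}   # transposition table keyed by the hash
```

Because XOR is its own inverse, undoing a move applies the same XORs again, and the search can look up `h1` in `table` before recursing.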
Killer moves are a good example of small code size and great improvement in move ordering.
Most board game AI algorithms are based on MinMax (http://en.wikipedia.org/wiki/Minmax). The goal is to minimize their options while maximizing your options, although with chess this is a very large and expensive runtime problem. To help reduce that, you can combine minmax with a database of previously played games. Any game that has a similar board position, and an established pattern for how that layout was won for your color, can be used for "analyzing" where to move next.
I am a bit confused about what you mean by improvement/code_size. Do you really mean improvement / runtime analysis (big O(n) vs. o(n))? If that is the case, talk to IBM and big blue, or Microsoft's Parallels team. At PDC I spoke with a guy (whose name escapes me now) who was demonstrating Mahjong using 8 cores per opponent, and they won first place in the game algorithm design competition (whose name also escapes me).
I do not think there are any "canned" algorithms out there to always win chess and do it very fast. The way that you would have to do it is to have EVERY possible previously played game indexed in a very large dictionary-based database and to have pre-cached the analysis of every game. It would be a VERY complex algorithm and would be a very poor improvement / complexity problem in my opinion.
I might be slightly off topic, but "state of the art" chess programs use MPI such as Deep Blue for massive parallel power.
Just consider that parallel processing plays a great role in modern chess.

Resources