Please correct me if I'm wrong, but it is my understanding that crossovers tend to lead towards local optima, while mutation increases the random walk of the search thus tend to help in escaping local optima tendencies. This insight I got from reading the following: Introduction to Genetic Algorithms and Wikipedia's article on Genetic Operators.
My question is, what is the best or most ideal way to pick which individuals go through crossover and which go through mutation? Is there a rule of thumb for this? What are the implications?
Thanks in advance. This is a pretty specific question that is a bit hard to Google with (for me at least).
The selection of individuals to participate in crossover operation must consider the fitness, that is "better individuals are more likely to have more child programs than inferior individuals.":
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/23Selection.html#7_3
The most common way to perform this is using Tournament Selection (see wikipedia).
Selection of the individuals to mutate should not consider fitness, in fact, should be random. And the number of elements mutated per generation (mutation rate) should be very low, around 1% (or it may fall into random search):
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/24RecombinationandMutation.html#7_4
In my experience, tweaking the tournament parameters just a bit could lead to substantial changes in the final results (for better or for worse), so it is usually a good idea to play with these parameters until you find a "sweet spot".
Related
I have implemented a simple Genetic Algorithm to generate short story based on Aesop fables.
Here are the parameters I'm using:
Mutation: Single word swap mutation with tested rate with 0.01.
Crossover: Swap the story sentences at given point. rate - 0.7
Selection: Roulette wheel selection - https://stackoverflow.com/a/5315710/536474
Fitness function: 3 different function. highest score of each is 1.0. so total highest fitness score is 3.0.
Population size: Since I'm using 86 Aesop fables, I tested population size with 50.
Initial population: All 86 fable sentence orders are shuffled in order to make complete nonsense. And my goal is to generate something meaningful(at least at certain level) from these structure lost fables.
Stop Condition: 3000 generations.
And the results are below:
However, this still did not produce a favorable result. I was expecting the plot that goes up over the generations. Any ideas to why my GA performing worse result?
Update: As all of you suggested, I've employed elitism by 10% of current generation copied to next generation. Result still remains the same:
Probably I should use tournament selection.
All of the above responses are great and I'd look into them. I'll add my thoughts.
Mutation
Your mutation rate seems fine although with Genetic Algorithms mutation rate can cause a lot of issues if it's not right. I'd make sure you test a lot of other values to be sure.
With mutation I'd maybe use two types of mutation. One that replaces words with other from your dictionary, and one that swaps two words within a sentence. This would encourage diversifying the population as a whole, and shuffling words.
Crossover
I don't know exactly how you've implemented this but one-point crossover doesn't seem like it'll be that effective in this situation. I'd try to implement an n-point crossover, which will do a much better job of shuffling your sentences. Again, I'm not sure how it's implemented but just swapping may not be the best solution. For example, if a word is at the first point, is there ever any way for it to move to another position, or will it always be the first word if it's chosen by selection?
If word order is important for your chosen problem simple crossover may not be ideal.
Selection
Again, this seems fine but I'd make sure you test other options. In the past I've found rank based roulette selection to be a lot more successful.
Fitness
This is always the most important thing to consider in any genetic algorithm and with the complexity of problem you have I'd make doubly sure it works. Have you tested that it works with 'known' problems?
Population Size
Your value seems small but I have seen genetic algorithms work successfully with small populations. Again though, I'd experiment with much larger populations to see if your results are any better.
The most popular suggestion so far is to implement elitism and I'd definitely recommend it. It doesn't have to be much, even just the best couple of chromosome every generation (although as with everything else I'd try different values).
Another sometimes useful operator to implement is culling. Destroy a portion of your weakest chromosomes, or one that are similar to others (or both) and replace them with new chromosomes. This should help to stop your population going 'stale', which, from your graph looks like it might be happening. Mutation only does so much to diversify the population.
You may be losing the best combinations, you should keep the best of each generation without crossing(elite). Also, your function seems to be quite stable, try other types of mutations, that should improve.
Drop 5% to 10% of your population to be elite, so that you don't lose the best you have.
Make sure your selection process is well set up, if bad candidates are passing through very often it'll ruin your evolution.
You might also be stuck in a local optimum, you might need to introduce other stuff into your genome, otherwise you wont move far.
Moving sentences and words around will not probably get you very far, introducing new sentences or words might be interesting.
If you think of story as a point x,y and your evaluation function as f(x,y), and you're trying to find the max for f(x,y), but your mutation and cross-over are limited to x -> y, y ->y, it makes sense that you wont move far. Granted, in your problem there is a lot more variables, but without introducing something new, I don't think you can avoid locality.
As #GettnDer said, elitism might help a lot.
What I would suggest is to use different selection strategy. The roulette wheel selection has one big problem: imagine that the best indidivual's fitness is e.g. 90% of the sum of all fitnesses. Then the roulette wheel is not likely to select the other individuals (see e.g. here). The selction strategy I like the most is the tournament selection. It is much more robust to big differences in fitness values and the selection pressure can be controlled very easily.
Novelty Search
I would also give a try to Novelty Search. It's relatively new approach in evolutionary computation, where you don't do the selection based on the actual fitness but rather based on novelty which is supposed to be some metric of how an individual is different in its behaviour from the others (but you still compute the fitness to catch the good ones). Of special interest might be combinations of classical fitness-driven algorithms and novelty-driven ones, like the this one by J.-B. Mouret.
When working with genetic algorithms, it is a good practice to structure you chromosome in order to reflect the actual knowledge on the process under optimization.
In your case, since you intend to generate stories, which are made of sentences, it could improve your results if you transformed your chromosomes into structured phrases, line <adjectives>* <subject> <verb> <object>* <adverbs>* (huge simplification here).
Each word could then be assigned a class. For instance, Fox=subject , looks=verb , grapes=object and then your crossover operator would exchange elements from the same category between chromosomes. Besides, your mutation operator could only insert new elements of a proper category (for instance, an adjective before the subject) or replace a word for a random word in the same category.
This way you would minimize the number of nonsensical chromosomes (like Fox beautiful grape day sky) and improve the discourse generation power for your GA.
Besides, I agree with all previous comments: if you are using elitism and the best performance decreases, then you are implementing it wrong (notice that in a pathological situation it may remain constant for a long period of time).
I hope it helps.
In each evolution generation, a new population is constructed by the genetic operators.
In my implementation, I combine the new population and the old population together, and then sort all of them by the fitness. Among them, the top 100 ranked genomes are returned as the population for the next evolution generation (Suppose the population consists of 100 genomes).
This mechanism works well in my implementation. So, what is the name of this mechanism? I have read about it but forget its name. Could anyone tell me and give some references?
This is a form of crowding. For example, NSGA-II (a multi-objective GA) uses a crowding mechanism more or less identical to the one you described.
But it's a form of elitism too.
It's elitism - see info at Wikipedia
Elitism usually leads quicker to a better solution as "good" solutions are not lost. However in certain solution spaces you may not reach the global optimum. In some of my GA's I used a larger population instead of elitism to carry over good gens. Also reinitialization (when genoms start to become similar) can help to find the gloabl optimum. You may give it a try.
I am implementing my M.Sc dissertation and in theory aspect of my thesis, i have a big problem.
suppose we want to use genetic algorithms.
we have 2 kind of functions :
a) some functions that have relations like this : ||x1 - x2||>>||f(x1) - f(x2)||
for example : y=(1/10)x^2
b) some functions that have relations like this : ||x1 - x2||<<||f(x1) - f(x2)||
for example : y=x^2
my question is that which of the above kind of functions have more difficulties than other when we want to use genetic algorithms to find optimum ( never mind MINIMUM or MAXIMUM ).
Thank you a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-biased selection, as long as the ordering relations hold between individuals, there will be no effects.
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x)=x^2, converging as fast as possible is probably great, because there's only one optimum, so find it as soon as possible. For a more complex function, slower convergence can be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.
I've implemented a number of genetic algorithms to solve a variety of a problems. However I'm still skeptical of the usefulness of crossover/recombination.
I usually first implement mutation before implementing crossover. And after I implement crossover, I don't typically see a significant speed-up in the rate at which a good candidate solution is generated compared to simply using mutation and introducing a few random individuals in each generation to ensure genetic .
Of course, this may be attributed to poor choices of the crossover function and/or the probabilities, but I'd like to get some concrete explanation/evidence as to why/whether or not crossover improves GAs. Have there been any studies regarding this?
I understand the reasoning behind it: crossover allows the strengths of two individuals to be combined into one individual. But to me that's like saying we can mate a scientist and a jaguar to get a smart and fast hybrid.
EDIT: In mcdowella's answer, he mentioned how finding a case where cross-over can improve upon hill-climbing from multiple start points is non-trivial. Could someone elaborate upon this point?
It strongly depends on the smoothness of your search space. Perverse example if every "geneome" was hashed before being used to generate "phenomes" then you would just be doing random search.
Less extreme case, this is why we often gray-code integers in GAs.
You need to tailor your crossover and mutation functions to the encoding. GAs decay quite easily if you throw unsympathetic calculations at them. If the crossover of A and B doesn't yield something that's both A-like and B-like then it's useless.
Example:
The genome is 3 bits long, bit 0 determines whether it's land-dwelling or sea-dwelling. Bits 1-2 describe digestive functions for land-dwelling creatures and visual capabilities for sea-dwelling creatures.
Consider two land-dwelling creatures.
| bit 0 | bit 1 | bit 2
----+-------+-------+-------
Mum | 0 | 0 | 1
Dad | 0 | 1 | 0
They might crossover between bits 1 and 2 yielding a child whose digestive function is some compromise between Mum's and Dad's. Great.
This crossover seems sensible provided that bit 0 hasn't changed. If is does then your crossover function has turned some kind of guts into some kind of eyes. Er... Wut? It might as well have been a random mutations.
Begs the question how DNA gets around this problem. Well, it's both modal and hierarchial. There are large sections which can change a lot without much effect, in others a single mutation can have drastic effects (like bit 0 above). Sometimes the value of X affects the behaviour tiggered by Y, and all values of X are legal and can be explored whereas a modification to Y makes the animal segfault.
Theoretical analyses of GAs often use extremely crude encodings and they suffer more from numerical issues than semantic ones.
You are correct in being skeptical about the cross-over operation. There is a paper called
"On the effectiveness of crossover in simulated evolutionary optimization" (Fogel and Stayton, Biosystems 1994). It is available for free at 1.
By the way, if you haven't already I recommend looking into a technique called "Differential Evolution". It can be very good at solving many optimization problems.
My impression is that hill-climbing from multiple random starts is very effective, but that trying to find a case where cross-over can improve on this is non-trivial. One reference is "Crossover: The Divine Afflatus in Search" by David Icl˘anzan, which states
The traditional GA theory is pillared on the Building Block Hypothesis
(BBH) which states that Genetic Algorithms (GAs) work by discovering,
emphasizing and recombining low order schemata in high-quality
strings, in a strongly parallel manner. Historically, attempts to
capture the topological fitness landscape features which exemplify
this intuitively straight-forward process, have been mostly
unsuccessful. Population-based recombinative methods had been
repeatedly outperformed on the special designed abstract test suites,
by different variants of mutation-based algorithms.
A related paper is "Overcoming Hierarchical Difficulty by Hill-Climbing the
Building Block Structure" by David Iclănzan and Dan Dumitrescu, which states
The Building Block Hypothesis suggests that Genetic Algorithms (GAs)
are well-suited for hierarchical problems, where efficient solving
requires proper problem decomposition and assembly of solution from
sub-solution with strong non-linear interdependencies. The paper
proposes a hill-climber operating over the building block (BB) space
that can efficiently address hierarchical problems.
John Holland's two seminal works "Adaptation in Natural and Artificial Systems" and "Hidden Order" (less formal) discuss the theory of crossover in depth. IMO, Goldberg's "Genetic Algorithms in Search, Optimization, and Machine Learning" has a very approachable chapter on mathematical foundations which includes such conclusions as:
With both crossover and reproduction....those schemata with both above-average performance and short defining lengths are going to be sampled at exponentially increasing rates.
Another good reference might be Ankenbrandt's "An Extension to the Theory of Convergence and a Proof of the Time Complexity of Genetic Algorithms" (in "Foundations of Genetic Algorithms" by Rawlins).
I'm surprised that the power of crossover has not been apparent to you in your work; when I began using genetic algorithms and saw how powerfully "directed" crossover seemed, I felt I gained an insight into evolution that overturned what I had been taught in school. All the questions about "how could mutation lead to this and that?" and "Well, over the course of so many generations..." came to seem fundamentally misguided.
The crossover and mutation!! Actually both of them are necessary.
Crossover is an explorative operator, but the mutation is an exploitive one. Considering the structure of solutions, problem, and the likelihood of optimization rate, its very important to select a correct value for Pc and Pm (probability of crossover and mutation).
Check this GA-TSP-Solver, it uses many crossover and mutation methods. You can test any crossover alongside mutations with given probabilities.
it mainly depends on the search space and the type of crossover you are using. For some problems I found that using crossover at the beginning and then mutation, it will speed up the process for finding a solution, however this is not very good approach since I will end up on finding similar solutions. If we use both crossover and mutation I usually get better optimized solutions. However for some problems crossover can be very destructive.
Also genetic operators alone are not enough to solve large/complex problems. When your operators don't improve your solution (so when they don't increase the value of fitness), you should start considering other solutions such as incremental evolution, etc..
I did a little GP (note:very little) work in college and have been playing around with it recently. My question is in regards to the intial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work
better when you have a noisy fitness
function, I think this is because the growth
of sub-groups in the population over successive generations acts
to give more sampling of
the fitness function. I typically use
100 for less noisy/deterministic functions, 1000+
for noisy.
For number of generations it is best to measure improvements in the
fitness function and stop when it
meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out, if it is showing no improvement then you probably have an issue elsewhere.
Tree depth requirements are really dependent on your DSL. I sometimes try to do an
implementation without explicit
limits but penalise or eliminate
programs that run too long (which is probably
what you really care about....). I've also found total node counts of ~1000 to be quite useful hard limits.
Percentages for different mutation / recombination operators don't seem
to matter all that much. As long as
you have a comprehensive set of mutations, any reasonably balanced
distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
Any problem in computer science can be
solved with another layer of
indirection (except for too many
layers of indirection.)
-David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data variating parameters on a very simple problem and link given operators and parameters values (such as mutation rates, etc) to given results in function of population size etc.
Once I started getting into GA a bit more I then realized that given the enormous number of variables this is a huge task, and generalization is extremely difficult.
talking from my (limited) experience, if you decide to simplify the problem and use a fixed way to implement crossover, selection, and just play with population size and mutation rate (implemented in a given way) trying to come up with general results you'll soon realize that too many variables are still into play because at the end of the day the number of generations after which statistically you will get a decent result (whatever way you wanna define decent) still obviously depend primarily on the problem you're solving and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of effect of given GA parameters!).
It is certainly possible to draft a set of guidelines - as the (rare but good) literature proves - but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in the exact same way and the fitness is evaluated in a somehow an equivalent way (which more often than not means you're ealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient whereas Koza and others often don't deem if worthy to start a GP run with less than a million individuals in the GP population ;-)
As mentioned before it depends on your personal taste and experiences, resources and probably the GP system used!
Cheers,
Jan