Applying mutation in steady-state genetic algorithm - genetic-algorithm

I’m implementing a steady-state genetic algorithm to perform symbolic regression.
My questions are about the relation between mutation and crossover operators.
I always consult a mutation probability (Pm) before applying mutation, and I use tournament selection to choose parents based on their error.
First question:
Must mutation be applied ONLY to children obtained after crossover (or another genetic operator),
or can it be applied directly to a single parent to generate a new individual?
Second question:
Must children obtained from a crossover operation always be candidates for mutation (subject to Pm, of course)?
Thank you all in advance.

Usually the mating process includes crossover and mutation, so to answer your first question: the standard approach is to take the parents, apply crossover, and only then mutate the final result (before calling it a child).
The reason is that if you instead apply mutation to the parents as well, there's basically 'too much mutation' going on (assuming the mutation rate is the same, you're doubling the chance of genes getting scrambled).
Although I've never seen it done that way, you certainly could; you would just have to rescale the mutation rate so that it isn't disruptive to the evolutionary process (too much mutation turns the search into a random walk).
All the standard evolution rates I've ever used as a reference are defined on the child, so that's another reason to go with that convention.
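To make the ordering concrete, here is a minimal sketch (in Python, with illustrative names and rates, not code from the question) of creating one child by applying crossover first and then a single mutation pass on the result:

```python
import random

P_CROSSOVER = 0.7
P_MUTATION = 0.01  # per-gene mutation probability


def make_child(parent_a, parent_b):
    """Create one child: crossover (maybe), then one mutation pass on the result."""
    if random.random() < P_CROSSOVER:
        point = random.randrange(1, len(parent_a))  # one-point crossover
        child = parent_a[:point] + parent_b[point:]
    else:
        child = list(parent_a)  # no crossover: clone one parent
    # Mutation is applied to the child only, never to the parents, so the
    # overall mutation pressure stays at P_MUTATION per gene.
    return [random.random() if random.random() < P_MUTATION else g
            for g in child]


child = make_child([0.1] * 8, [0.9] * 8)
```

Because the parents are left untouched, the per-gene mutation rate is consulted exactly once per child, which is the accounting the standard published rates assume.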

In each case, you can do either. Different crossover and mutation schemes may work well for different problems; try a variety of things on your problem and see how they perform. (But of course, if you (1) say that mutation is applied only to children obtained by crossover and (2) say that children obtained by crossover don't mutate, then the result is that you have no mutation at all :-), so that combination is probably not a good one.)

As has been mentioned in other answers, either approach is usable and I have seen both implemented in practice. It is a design choice. But, having said that, I'd like to persuade you that it is preferable to only perform one genetic operation at a time.
High 'locality' is a desirable property for genetic operators most of the time. Locality refers to how localised an operator's effect on an individual is: does it radically change the individual, or does it make only a small adjustment, nudging the individual to an adjacent location in the search space? An operator with low locality creates large, unrelated jumps in the search space, which makes it difficult to make gradual progress and instead relies on lucky strikes. If you apply crossover and mutation in one step, the changes are effectively combined, creating an operation of lower locality than if they were applied individually.
There are times when you might want this by choice, but normally only in circumstances where the fitness landscape is so rugged that evolutionary algorithms are probably the wrong approach anyway.

Related

Combining multiple genetic operators

Please correct me if I'm wrong, but it is my understanding that crossover tends to lead towards local optima, while mutation adds randomness to the search and thus tends to help in escaping local optima. I got this insight from reading Introduction to Genetic Algorithms and Wikipedia's article on Genetic Operators.
My question is, what is the best or most ideal way to pick which individuals go through crossover and which go through mutation? Is there a rule of thumb for this? What are the implications?
Thanks in advance. This is a pretty specific question that is a bit hard to Google with (for me at least).
The selection of individuals to participate in the crossover operation must consider fitness; that is, "better individuals are more likely to have more child programs than inferior individuals":
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/23Selection.html#7_3
The most common way to perform this is Tournament Selection (see Wikipedia).
Selection of the individuals to mutate should not consider fitness; in fact, it should be random. And the number of elements mutated per generation (the mutation rate) should be very low, around 1% (otherwise the search degenerates into random search):
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/24RecombinationandMutation.html#7_4
In my experience, tweaking the tournament parameters just a bit could lead to substantial changes in the final results (for better or for worse), so it is usually a good idea to play with these parameters until you find a "sweet spot".
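As a rough illustration, tournament selection can be as small as the sketch below (names are illustrative); the tournament size `k` is the main knob controlling selection pressure:

```python
import random


def tournament_select(population, fitness, k=3):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


# With k equal to the population size the tournament is deterministic:
pop = ["a", "b", "c", "d", "e"]
fit = [1.0, 5.0, 2.0, 4.0, 3.0]
winner = tournament_select(pop, fit, k=len(pop))
print(winner)  # "b"
```

Larger `k` means stronger pressure towards the best individuals; `k=1` degenerates into uniform random selection.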

Genetic algorithm - new generations getting worse

I have implemented a simple Genetic Algorithm to generate short story based on Aesop fables.
Here are the parameters I'm using:
Mutation: single-word swap mutation, with a tested rate of 0.01.
Crossover: swap the story sentences at a given point; rate 0.7.
Selection: roulette wheel selection - https://stackoverflow.com/a/5315710/536474
Fitness function: 3 different functions; the highest score of each is 1.0, so the total highest fitness score is 3.0.
Population size: since I'm using 86 Aesop fables, I tested with a population size of 50.
Initial population: the sentence order of all 86 fables is shuffled to make complete nonsense. My goal is to generate something meaningful (at least at a certain level) from these structure-lost fables.
Stop condition: 3000 generations.
And the results are below:
However, this still did not produce a favorable result. I was expecting a plot that goes up over the generations. Any ideas why my GA is performing worse?
Update: As all of you suggested, I've employed elitism by copying 10% of the current generation to the next generation. The result remains the same:
Probably I should use tournament selection.
All of the above responses are great and I'd look into them. I'll add my thoughts.
Mutation
Your mutation rate seems fine, although with genetic algorithms the mutation rate can cause a lot of issues if it's not right. I'd make sure you test plenty of other values to be sure.
With mutation I'd maybe use two types: one that replaces words with others from your dictionary, and one that swaps two words within a sentence. This would encourage both diversifying the population as a whole and shuffling words.
Crossover
I don't know exactly how you've implemented this, but one-point crossover doesn't seem like it'll be that effective in this situation. I'd try to implement an n-point crossover, which will do a much better job of shuffling your sentences. Again, I'm not sure how it's implemented, but just swapping may not be the best solution. For example, if a word is at the first position, is there ever any way for it to move to another position, or will it always be the first word if the individual is chosen by selection?
If word order is important for your chosen problem, simple crossover may not be ideal.
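For reference, a hedged sketch of n-point crossover (illustrative names; assumes equal-length list chromosomes):

```python
import random


def n_point_crossover(parent_a, parent_b, n_points=3):
    """Alternate between parents at n random cut points."""
    cuts = sorted(random.sample(range(1, len(parent_a)), n_points))
    child, parents, which, prev = [], (parent_a, parent_b), 0, 0
    for cut in cuts + [len(parent_a)]:
        child.extend(parents[which][prev:cut])  # copy a segment, then switch parent
        which, prev = 1 - which, cut
    return child
```

With several cut points, segments from both parents are interleaved, so material can move between regions far more freely than with a single cut.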
Selection
Again, this seems fine but I'd make sure you test other options. In the past I've found rank based roulette selection to be a lot more successful.
Fitness
This is always the most important thing to consider in any genetic algorithm, and with the complexity of the problem you have, I'd make doubly sure it works. Have you tested it against 'known' problems?
Population Size
Your value seems small but I have seen genetic algorithms work successfully with small populations. Again though, I'd experiment with much larger populations to see if your results are any better.
The most popular suggestion so far is to implement elitism, and I'd definitely recommend it. It doesn't have to be much, even just the best couple of chromosomes every generation (although, as with everything else, I'd try different values).
Another sometimes-useful operator to implement is culling. Destroy a portion of your weakest chromosomes, or ones that are similar to others (or both), and replace them with new chromosomes. This should help stop your population going 'stale', which, from your graph, looks like it might be happening. Mutation only does so much to diversify the population.
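A possible sketch of culling, assuming you can supply a factory for fresh chromosomes (all names here are illustrative, not from the question's code):

```python
def cull(population, fitness_fn, fraction=0.2, new_individual=None):
    """Keep the fittest (1 - fraction) of the population, top up with fresh blood."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    keep = ranked[:max(1, int(len(population) * (1 - fraction)))]
    while len(keep) < len(population):
        keep.append(new_individual())  # fresh chromosome restores diversity
    return keep
```

The replacement individuals could be entirely random, or (for this story problem) freshly shuffled fables, which injects genetic material that mutation alone would take many generations to reach.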
You may be losing the best combinations; you should keep the best of each generation without crossing them (elitism). Also, your fitness function seems to be quite stable; try other types of mutation, which should help.
Reserve 5% to 10% of your population as elite, so that you don't lose the best you have.
Make sure your selection process is well set up; if bad candidates pass through very often, it'll ruin your evolution.
You might also be stuck in a local optimum; you might need to introduce other material into your genome, otherwise you won't move far.
Moving sentences and words around will probably not get you very far; introducing new sentences or words might be interesting.
If you think of a story as a point (x, y) and your evaluation function as f(x, y), and you're trying to find the maximum of f(x, y), but your mutation and crossover can only rearrange the existing values of x and y, it makes sense that you won't move far. Granted, your problem has many more variables, but without introducing something new, I don't think you can avoid locality.
As @GettnDer said, elitism might help a lot.
What I would suggest is to use a different selection strategy. Roulette wheel selection has one big problem: imagine that the best individual's fitness is, e.g., 90% of the sum of all fitnesses. Then the roulette wheel is unlikely to select the other individuals (see e.g. here). The selection strategy I like most is tournament selection. It is much more robust to big differences in fitness values, and the selection pressure can be controlled very easily.
Novelty Search
I would also give Novelty Search a try. It's a relatively new approach in evolutionary computation, where you do the selection not on the actual fitness but on novelty, which is supposed to be some metric of how different an individual's behaviour is from the others' (but you still compute the fitness to catch the good ones). Of special interest might be combinations of classical fitness-driven algorithms and novelty-driven ones, like this one by J.-B. Mouret.
When working with genetic algorithms, it is good practice to structure your chromosome to reflect actual knowledge of the process under optimization.
In your case, since you intend to generate stories, which are made of sentences, it could improve your results if you transformed your chromosomes into structured phrases, like <adjectives>* <subject> <verb> <object>* <adverbs>* (a huge simplification here).
Each word could then be assigned a class. For instance, Fox=subject, looks=verb, grapes=object; your crossover operator would then exchange elements of the same category between chromosomes. Besides, your mutation operator could insert new elements only of a proper category (for instance, an adjective before the subject) or replace a word with a random word of the same category.
This way you would minimize the number of nonsensical chromosomes (like Fox beautiful grape day sky) and improve the discourse-generation power of your GA.
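As a toy illustration of this idea (the sentence template and word lists below are invented for the example), a category-preserving crossover could choose each slot independently from the two parents, so every position keeps its word class:

```python
import random


def category_crossover(parent_a, parent_b):
    """Pick each slot from one parent or the other; aligned slots share a word class."""
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]


a = ["Fox", "looks", "grapes"]    # subject, verb, object
b = ["Crow", "drops", "cheese"]   # subject, verb, object
child = category_crossover(a, b)
# e.g. ["Fox", "drops", "cheese"] -- but never ["grapes", "Fox", "looks"]
```

Because slots are only ever exchanged with slots of the same category, every child is syntactically plausible by construction.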
Besides, I agree with all previous comments: if you are using elitism and the best performance decreases, then you are implementing it wrong (notice that in a pathological situation it may remain constant for a long period of time).
I hope it helps.

1X Crossover: the trivial case

I've got a little question about 1X crossover. Do we have to account for the possibility that the breakpoint falls at the beginning or at the end (the trivial case)?
Thanks in advance!
The genetic algorithm is quite robust. Including a child that is a copy of one parent is, in some cases, similar to using a lower crossover probability, so I would not expect this little extra to have much of an impact, if any at all. Still, if you are unsure, you can implement both and compare. Let me add that people have implemented largely different crossovers, and with some of them the performance is still similar. What we know from the design principles of a good crossover is that the child should consist only of alleles that are present in at least one of the parents; so-called unwanted mutations are to be avoided.
As pointed out by @seaotternerd, there is no hard and fast rule here.
The general practice, however, is to select the crossover site so that only one of the two trivial cases is included; i.e., either the beginning or the end, but not both. This practice matters especially when the chromosomes are small, as it allows maximum variability without compromising the principle of indifference.
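In code, the difference comes down to the range the breakpoint is drawn from. This sketch (illustrative, assuming equal-length list chromosomes) includes exactly one of the two trivial cases when `allow_trivial` is set, and neither otherwise:

```python
import random


def one_point_crossover(parent_a, parent_b, allow_trivial=True):
    """1X crossover; cut=0 is the trivial case where the child is all parent_b."""
    n = len(parent_a)
    # range(0, n) admits cut=0 (one trivial case) but never cut=n (the other);
    # range(1, n) excludes both, guaranteeing material from both parents.
    cut = random.randrange(0, n) if allow_trivial else random.randrange(1, n)
    return parent_a[:cut] + parent_b[cut:]
```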

Elitism in GA: Should I let the elites be selected as parents?

I am a little confused by the elitism concept in Genetic Algorithm (and other evolutionary algorithms). When I reserve and then copy 1 (or more) elite individuals to the next generation,
Should I consider the elite solution(s) in the parent selection of the current generation (making a new population)?
Or, should I use others (putting the elites aside) for making a new population and just copy the elites directly to the next generation?
If the latter, what is the use of elitism? Is it just for not losing the best solution? Because in this scheme, it won't help convergence at all.
For example, here, under the crossover/mutation part, it is stated that the elites don't participate.
(Of course, the same question can be asked about the survivor selection part.)
Elitism only means that the most fit handful of individuals are guaranteed a place in the next generation - generally without undergoing mutation. They should still be able to be selected as parents, in addition to being brought forward themselves.
That article does take a slightly odd approach to elitism. It suggests duplicating the most fit individual - that individual gets two reserved slots in the next generation. One of these slots is mutated, the other is not. That means that, in the next generation, at least one of those slots will reenter the general population as a parent, and possibly two if both are overtaken.
It does seem a viable approach. Either way - whether by selecting elites as parents while also perpetuating them, or by copying the elites and then mutating one - the elites should still be closely attached to the population at large so that they can share their beneficial genes around.
@Peladao's answer and comment are also absolutely spot on - especially on the need to maintain diversity and avoid premature convergence, and on the elites representing only a small portion of the population.
I see no reason why one would not use the elites as parents, besides perhaps a small loss in diversity. (The number of elites should therefore be small compared to the population size).
Since the elites are the best individuals, they are valuable candidates to create new individuals using crossover, as long as the elites themselves are also copied (unchanged) into the new population.
Keeping sufficient diversity and avoiding premature convergence is always important, also when elites are not used as parents.
There exists different methodologies used in order to implement elitism, as pointed out also by the other valid answers.
Generally, for elitism, you just copy N individuals into the new generation without applying any kind of change. However, these individuals can be selected by fitness ranking (true elitism), guaranteeing that the best really are "saved", or they can be chosen via proportional selection (as pointed out in the book Machine Learning by Tom Mitchell). The latter is the same mechanism used in roulette selection, but note that in this case the individuals are not used to generate new offspring; they are directly copied into the new population (survivors!).
When the selection for elitism is proportional, we obtain a good compromise between a lack of diversity and premature convergence.
Applying true elitism while refusing to use the elites as parents would be counter-productive, especially considering the value of the crossover operation.
In a nutshell, the main points about using elitism are:
The number of elites in the population should not exceed, say, 10% of the total population, to maintain diversity.
Of these, say 5% may go directly into the next generation, and the rest should undergo crossover and mutation with the non-elite population.
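A hedged sketch of a generation step along these lines (all names are illustrative): the elites are copied unchanged and remain eligible as parents for the rest of the offspring.

```python
import random

ELITE_COUNT = 2  # keep this small relative to the population size


def next_generation(population, fitness_fn, breed):
    """One generation: elites copied unchanged, and still usable as parents."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    new_pop = ranked[:ELITE_COUNT]  # elites survive without crossover or mutation
    while len(new_pop) < len(population):
        # Parents are drawn from the fitter half -- elites included.
        a, b = random.sample(ranked[:len(ranked) // 2], 2)
        new_pop.append(breed(a, b))
    return new_pop
```

Because the elites sit in the parent pool as well as in the reserved slots, their genes keep circulating while the best-so-far solution can never be lost.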

Genetic algorithms -- what are the benefits of sexual, as opposed to asexual, genetic algorithms?

Intuitively, I'd think that if I want to find the "best" set of parameters, I can simply take the best-performing individual from a large set of children, have that individual generate 100 children similar to itself, pick the best performer, and repeat. What purpose does it serve to pick specifically the best 2 and crossbreed? For that matter, why not select 3, 4, or 10 parents ("orgy-derived" zygotes) from which to create each generation of children?
"from a subset of lots of children" - how were those children made, and what mechanism makes them different from each other? "generate 100 children similar to himself" - if not exactly like himself, then what mechanism makes them similar, yet not identical?
Sexual reproduction is a mechanism that answers these questions. Through sexual reproduction you create new combinations, made up of the genes of fit individuals. Just using random mutation alone as a mechanism for creating diversity and new combinations is what it says - random - a shot in the dark. Sexual reproduction creates new combinations using the genes of successful individuals, which is not simply random.
Asking which is better, sexual or asexual, is a good question, and there are a lot of articles on this topic, not all of which favor sexual reproduction. There are successful asexual mechanisms, although I'm not sure whether the alternative you proposed in your question is among them.
Think of it this way: your best-performing guy is maybe better-than-average in, let's say, 3 areas out of 10. Small variations of him (his asexually reproduced kid) will probably have advantages in those same 3 areas: maybe 4, maybe 2, depending on mutation. But the best-performing guy and the best-performing girl together are better in perhaps 5 areas out of 10 (he's better-than-average in 3, she's better-than-average in 3, and maybe there's 1 where they overlap), so if they had a good number of children, one of them might be better-than-average in 5 areas (and perhaps another might inherit no advantages at all -- such are the breaks). And if that 5-areas-of-advantage kid mates with another 5-areas-of-advantage kid, then while there's more chance of overlapping advantage, there's still a good chance (in our "10 gene" world) that the grandchild will have even more advantageous genes.
It's the recombination of several characteristics in a complex environment that's really at the heart of the genetic algorithm. It's not intuitive, because we don't generally think that twiddling every knob on the control panel at once is a good way to optimize, but if you have a lot of parameters and they're quite independent, it can be.
The fewer parents you have, the more likely you are to get caught in a local optimum-- potentially a not-very-good local optimum-- for a very long time. With only one parent, the only search mechanism left is individual mutation.
The more parents you have the less likely you are to capture whatever it was about the original parents that caused them to be selected for reproduction in the first place. The details will depend on exactly how your n-ary crossover works, but intuitively, the more parents you have, the less genetic material you are likely to have from any one particular parent, and the less likely the children are to inherit (and thus improve upon) any beneficial multi-chromosome traits of their parents.
This is related to the Schema Theorem.
Technically, you can have orgy-derived zygotes in your population, but there is no mathematical proof (at least to my knowledge) that they improve either diversity or the final result found by your algorithm. Besides, orgy operators (to use your term) are more complicated than the simple two-parent kind and are not easy for students to understand. Hence, they are not advertised (which does not mean they are not allowed).
Actually, you can use a mix of single-parent and two-parent operators in your GA. As one answer already pointed out, the single-parent element is equivalent to a local search, and technically you would be implementing a memetic algorithm, which is usually an improvement on the simple GA.
