I've got a little question about one-point (1X) crossover. Do we have to account for the possibility that the breakpoint falls at the beginning or at the end of the chromosome (the trivial cases)?
Thanks in advance!
The Genetic Algorithm is quite robust. Including a child that is a copy of one parent is, in some cases, similar to using a lower crossover probability, so I would not expect this little extra to have much of an impact, if any at all. Still, if you are unsure you can implement both and try. Let me add that people have also implemented largely different crossovers, and with some of these the performance is still similar. What we know from the design principles of a good crossover is that the child should consist only of alleles that are present in at least one of the parents; so-called unwanted mutations are to be avoided.
As pointed out by #seaotternerd, there is no hard and fast rule here.
The general practice, however, is to select the crossover site so that only one of the two trivial cases is possible, i.e. either the beginning or the end, but not both. This matters most when the chromosomes are especially small, as it allows maximum variability without compromising the principle of indifference.
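As a minimal sketch of the practice described above (the function name and the choice of which trivial endpoint to include are my own assumptions, not from the answer), drawing the cut point from `0..len-1` includes exactly one trivial case:

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """One-point (1X) crossover between two equal-length parents.

    The cut point is drawn from 0..len-1, so exactly one trivial
    case is reachable (cut=0 yields children that are full copies
    of the parents), while the other endpoint (cut=len) is excluded.
    """
    assert len(p1) == len(p2)
    cut = rng.randrange(0, len(p1))
    # Each child takes the prefix of one parent and the suffix of the other.
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```

Note that every child gene comes from one of the parents at the same position, which satisfies the "no unwanted mutations" design principle mentioned above.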
Please correct me if I'm wrong, but it is my understanding that crossover tends to lead towards local optima, while mutation increases the randomness of the search and thus tends to help in escaping local optima. This insight I got from reading the following: Introduction to Genetic Algorithms and Wikipedia's article on Genetic Operators.
My question is, what is the best or most ideal way to pick which individuals go through crossover and which go through mutation? Is there a rule of thumb for this? What are the implications?
Thanks in advance. This is a pretty specific question that is a bit hard to Google with (for me at least).
The selection of individuals to participate in the crossover operation must consider fitness; that is, "better individuals are more likely to have more child programs than inferior individuals.":
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/23Selection.html#7_3
The most common way to perform this is using Tournament Selection (see wikipedia).
Selection of the individuals to mutate should not consider fitness; in fact, it should be random. And the number of elements mutated per generation (the mutation rate) should be very low, around 1% (otherwise it may degenerate into random search):
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/24RecombinationandMutation.html#7_4
In my experience, tweaking the tournament parameters just a bit could lead to substantial changes in the final results (for better or for worse), so it is usually a good idea to play with these parameters until you find a "sweet spot".
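A minimal sketch of the tournament selection mentioned above (the function name and default tournament size `k=3` are my own choices for illustration): pick `k` random contestants and let the fittest win. The tournament size `k` is the main parameter worth tuning for a "sweet spot".

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    # Draw k distinct contestants at random, ignoring fitness for the draw
    # itself; only the comparison inside the tournament uses fitness.
    contestants = rng.sample(population, k)
    # Larger k -> higher selection pressure (the best wins more often).
    return max(contestants, key=fitness)
```

With `k` equal to the population size this degenerates into always picking the best individual; with `k=1` it is uniform random selection.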
I have implemented a simple Genetic Algorithm to generate short story based on Aesop fables.
Here are the parameters I'm using:
Mutation: single-word swap mutation, with a tested rate of 0.01.
Crossover: swap the story sentences at a given point, with a rate of 0.7.
Selection: roulette wheel selection - https://stackoverflow.com/a/5315710/536474
Fitness function: 3 different functions; the highest score of each is 1.0, so the total highest fitness score is 3.0.
Population size: since I'm using 86 Aesop fables, I tested with a population size of 50.
Initial population: all 86 fables have their sentence order shuffled to make complete nonsense. My goal is to generate something meaningful (at least at a certain level) from these structure-lost fables.
Stop condition: 3000 generations.
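For reference, the roulette wheel selection used above can be sketched as follows (a hypothetical minimal version; the linked Stack Overflow answer may differ in details). Each individual occupies a slice of the wheel proportional to its fitness:

```python
import random

def roulette_select(population, fitnesses, rng=random):
    # Spin the wheel: pick a point in [0, total fitness) and walk
    # through the cumulative fitness until we pass it.
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point rounding
```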
And the results are below:
However, this did not produce a favorable result. I was expecting a plot that goes up over the generations. Any ideas why my GA is performing so poorly?
Update: as all of you suggested, I've added elitism by copying 10% of the current generation to the next. The result remains the same:
Probably I should use tournament selection.
All of the above responses are great and I'd look into them. I'll add my thoughts.
Mutation
Your mutation rate seems fine, although with Genetic Algorithms the mutation rate can cause a lot of issues if it's not right. I'd make sure you test a lot of other values to be sure.
With mutation I'd maybe use two types: one that replaces words with others from your dictionary, and one that swaps two words within a sentence. This would encourage diversifying the population as a whole, and shuffling words.
Crossover
I don't know exactly how you've implemented this but one-point crossover doesn't seem like it'll be that effective in this situation. I'd try to implement an n-point crossover, which will do a much better job of shuffling your sentences. Again, I'm not sure how it's implemented but just swapping may not be the best solution. For example, if a word is at the first point, is there ever any way for it to move to another position, or will it always be the first word if it's chosen by selection?
If word order is important for your chosen problem, simple crossover may not be ideal.
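The n-point crossover suggested above can be sketched as follows (a hypothetical minimal version operating on equal-length chromosomes, e.g. lists of sentences): choose n distinct cut points and alternate which parent contributes each segment.

```python
import random

def n_point_crossover(p1, p2, n=2, rng=random):
    # Choose n distinct interior cut points, then alternate the source
    # parent for each segment between consecutive cuts.
    assert len(p1) == len(p2)
    cuts = sorted(rng.sample(range(1, len(p1)), n))
    c1, c2 = [], []
    src1, src2 = p1, p2
    prev = 0
    for cut in cuts + [len(p1)]:
        c1.extend(src1[prev:cut])
        c2.extend(src2[prev:cut])
        src1, src2 = src2, src1  # swap sources at each cut point
        prev = cut
    return c1, c2
```

With more cut points, genes get shuffled between parents more aggressively, which addresses the concern above about items being stuck at fixed positions.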
Selection
Again, this seems fine but I'd make sure you test other options. In the past I've found rank based roulette selection to be a lot more successful.
Fitness
This is always the most important thing to consider in any genetic algorithm and with the complexity of problem you have I'd make doubly sure it works. Have you tested that it works with 'known' problems?
Population Size
Your value seems small but I have seen genetic algorithms work successfully with small populations. Again though, I'd experiment with much larger populations to see if your results are any better.
The most popular suggestion so far is to implement elitism and I'd definitely recommend it. It doesn't have to be much, even just the best couple of chromosomes every generation (although, as with everything else, I'd try different values).
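Elitism as recommended above can be sketched like this (a minimal version; the `breed` callback standing in for your selection/crossover/mutation pipeline is my own assumption):

```python
def next_generation(population, fitness, breed, elite_count=2):
    # Copy the best `elite_count` individuals unchanged, then fill the
    # rest of the new generation with bred offspring. The elites
    # guarantee the best-so-far fitness never decreases.
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_count]
    offspring = [breed(population) for _ in range(len(population) - elite_count)]
    return elites + offspring
```

A direct consequence, noted later in this thread: if you use elitism and the best fitness still decreases over generations, the elitism is implemented wrong.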
Another sometimes useful operator to implement is culling. Destroy a portion of your weakest chromosomes, or ones that are similar to others (or both), and replace them with new chromosomes. This should help stop your population from going 'stale', which, from your graph, looks like it might be happening. Mutation only does so much to diversify the population.
You may be losing the best combinations; you should keep the best of each generation without crossing them (elitism). Also, your function seems to be quite stable; try other types of mutation, which should improve things.
Set aside 5% to 10% of your population as elite, so that you don't lose the best you have.
Make sure your selection process is well set up, if bad candidates are passing through very often it'll ruin your evolution.
You might also be stuck in a local optimum; you may need to introduce other material into your genome, otherwise you won't move far.
Moving sentences and words around will probably not get you very far; introducing new sentences or words might be interesting.
If you think of a story as a point (x, y) and your evaluation function as f(x, y), and you're trying to find the maximum of f(x, y), but your mutation and crossover only rearrange the existing x and y values, it makes sense that you won't move far. Granted, your problem has many more variables, but without introducing something new, I don't think you can avoid locality.
As #GettnDer said, elitism might help a lot.
What I would suggest is to use a different selection strategy. Roulette wheel selection has one big problem: imagine that the best individual's fitness is e.g. 90% of the sum of all fitnesses. Then the roulette wheel is unlikely to select the other individuals (see e.g. here). The selection strategy I like the most is tournament selection. It is much more robust to big differences in fitness values, and the selection pressure can be controlled very easily.
Novelty Search
I would also give Novelty Search a try. It's a relatively new approach in evolutionary computation, where you don't do selection based on the actual fitness but rather on novelty, which is supposed to be some metric of how different an individual's behaviour is from the others' (but you still compute the fitness to catch the good ones). Of special interest might be combinations of classical fitness-driven algorithms and novelty-driven ones, like this one by J.-B. Mouret.
When working with genetic algorithms, it is good practice to structure your chromosome so that it reflects actual knowledge of the process under optimization.
In your case, since you intend to generate stories, which are made of sentences, it could improve your results if you transformed your chromosomes into structured phrases, like <adjectives>* <subject> <verb> <object>* <adverbs>* (a huge simplification here).
Each word could then be assigned a class. For instance, Fox = subject, looks = verb, grapes = object; your crossover operator would then exchange elements of the same category between chromosomes. Likewise, your mutation operator could only insert new elements of a proper category (for instance, an adjective before the subject) or replace a word with a random word of the same category.
This way you would minimize the number of nonsensical chromosomes (like "Fox beautiful grape day sky") and improve the discourse-generation power of your GA.
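A minimal sketch of the category-preserving crossover described above (the representation, with a shared per-position category list, is my own simplifying assumption): words are only ever exchanged with words of the same category, so the template stays grammatical.

```python
import random

def category_crossover(s1, s2, categories, rng=random):
    # Sentences are word lists sharing the same category template,
    # e.g. ['subject', 'verb', 'object']. For each position, flip a
    # coin to decide whether the two sentences swap that word; since
    # both positions hold the same category, children stay well-formed.
    c1, c2 = list(s1), list(s2)
    for i, _cat in enumerate(categories):
        if rng.random() < 0.5:
            c1[i], c2[i] = s2[i], s1[i]
    return c1, c2
```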
Besides, I agree with all previous comments: if you are using elitism and the best performance decreases, then you are implementing it wrong (notice that in a pathological situation it may remain constant for a long period of time).
I hope it helps.
Intuitively I'd think that if I want to find the "best" set of parameters, I can simply take the single best performer from a large set of children, have that one generate 100 children similar to itself, pick the best performer, and repeat. What purpose does it serve to pick specifically the best 2 and crossbreed? For that matter, why not select 3, 4, or 10 parents ("orgy-derived" zygotes) from which to create each generation of children?
"from a subset of lots of children" - how were those children made, and what mechanism makes them different from each other? "generate 100 children similar to himself" - if not exactly like himself, then what mechanism makes them similar, yet not identical?
Sexual reproduction is a mechanism that answers these questions. Through sexual reproduction you create new combinations, made up of the genes of fit individuals. Just using random mutation alone as a mechanism for creating diversity and new combinations is what it says - random - a shot in the dark. Sexual reproduction creates new combinations using the genes of successful individuals, which is not simply random.
Questioning which is better, sexual vs. asexual is a good question, and there are a lot of articles on this topic of sexual vs. asexual, and not all favor sexual. There are successful asexual mechanisms, although I'm not sure if the alternative you proposed in your question is among them.
Think of it this way: your best performing guy is maybe better-than-average in, let's just say, 3 areas out of 10. Small variations of him (his asexually-reproduced kid) are probably going to have advantages in those same 3 areas: maybe 4 and maybe 2, depending on mutation. But the best performing guy and the best performing girl are better in perhaps 5 areas out of 10 (he's better-than-average in 3, she's better-than-average in 3, and maybe there's 1 where they overlap), and so if they had a good number of children, one of them might be better-than-average in 5 areas (and perhaps one of them might inherit no advantages -- such are the breaks). And if that 5-areas-of-advantage kid mates with another 5-areas-of-advantage kid, then while there's more chance of overlapping advantage, there's still a good chance (in our "10 gene" world) that the grand-child will have even more advantageous genes.
It's the recombination of several characteristics in a complex environment that's really at the heart of the genetic algorithm. It's not intuitive, because we don't generally think that twiddling every knob on the control panel at once is a good way to optimize, but if you have a lot of parameters and they're quite independent, it can be.
The fewer parents you have, the more likely you are to get caught in a local optimum-- potentially a not-very-good local optimum-- for a very long time. With only one parent, the only search mechanism left is individual mutation.
The more parents you have the less likely you are to capture whatever it was about the original parents that caused them to be selected for reproduction in the first place. The details will depend on exactly how your n-ary crossover works, but intuitively, the more parents you have, the less genetic material you are likely to have from any one particular parent, and the less likely the children are to inherit (and thus improve upon) any beneficial multi-chromosome traits of their parents.
This is related to the Schema Theorem.
Technically, you can have orgy-derived zygotes in your population, but there is no mathematical proof (at least to my knowledge) that they improve either diversity or the final result found by your algorithm. Besides, orgy operators (to use your term) are more complicated than the simple two-parent kind, and are harder for students to understand. Hence, they are not advertised (which does not mean that they are not allowed).
Actually, you can use a mix of both single- and double-parent operators in your GA. As one answer already pointed out, the single-parent element is equivalent to a local search, and technically you would be implementing a memetic algorithm, which is usually an improvement on the simple GA.
I’m implementing a steady-state genetic algorithm to perform symbolic regression.
My questions are about the relation between mutation and crossover operators.
I always consult a mutation probability (Pm) before applying mutation, and I use tournament selection to choose parents based on their error.
First question:
Must mutation be applied ONLY to children obtained after crossover (or another genetic operator), or can it be applied directly to a single parent to generate a new individual?
Second question:
Must children obtained after a crossover operation always be candidates for mutation (subject to Pm, of course)?
Thank you all in advance.
Usually the mating process includes cross-over and mutation, so to answer your question a standard way of doing this is to take the parents, apply cross-over and only then mutate the final result (before calling it a child).
The reason for this is that if you apply mutation to the parents as well, there's basically 'too much mutation' going on (assuming the mutation rate is the same, you're doubling the chance of things getting scrambled).
Although I have never seen it done that way, you could of course do it, but you would have to rescale the mutation rate so that it's not disruptive to the evolutionary process (too much mutation --> random walk).
All the standard evolution rates I've ever used as a reference are given on the child, so that's another reason to go with that.
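The standard ordering described above (crossover first, then give each resulting child a chance to mutate) can be sketched as follows; the function name and the `crossover`/`mutate` callbacks are placeholders for your own operators:

```python
import random

def mate(p1, p2, crossover, mutate, pm=0.05, rng=random):
    # Cross the parents first; only then is each child considered for
    # mutation, with probability pm. The parents themselves are never
    # mutated, so the mutation rate applies exactly once per child.
    children = crossover(p1, p2)
    return [mutate(c) if rng.random() < pm else c for c in children]
```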
In each case, you can do either. Different crossover and mutation schemes may work well for different problems; try a variety of things for your problem and see how they perform. (But of course if you (1) say that mutation is only applied to children after crossover and (2) say that children after crossover don't mutate, then the result is that you have no mutation :-), so that combination is probably not a good one.)
As has been mentioned in other answers, either approach is usable and I have seen both implemented in practice. It is a design choice. But, having said that, I'd like to persuade you that it is preferable to only perform one genetic operation at a time.
The property of high 'locality' is desirable for genetic operators the majority of the time. Locality refers to how localised an operator's effect on an individual is - does it radically change it, or does it only make a small adjustment, nudging the individual to an adjacent location in the search space. An operator which has low locality creates large unrelated jumps in the search space, which makes it difficult to make gradual progress, instead relying upon lucky strikes. If you are to apply crossover and mutation in one step, then the changes are effectively combined, creating an operation of lower locality than if they were applied individually.
There are times when you might want this by choice, but normally only in circumstances that the fitness landscape is so rugged that evolutionary algorithms are probably the wrong approach.
I did a little GP (note:very little) work in college and have been playing around with it recently. My question is in regards to the intial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function; I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy ones.
For the number of generations, it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.
Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about....). I've also found total node counts of ~1000 to be quite useful hard limits.
Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
"Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection.)"
- David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data by varying parameters on a very simple problem, and link given operators and parameter values (such as mutation rates, etc.) to given results as a function of population size, etc.
Once I started getting into GA a bit more I then realized that given the enormous number of variables this is a huge task, and generalization is extremely difficult.
Speaking from my (limited) experience: if you decide to simplify the problem, fix the way you implement crossover and selection, and just play with population size and mutation rate (implemented in a given way) while trying to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (however you want to define "decent") still depends primarily on the problem you're solving, and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).
It is certainly possible to draft a set of guidelines - as the (rare but good) literature proves - but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in exactly the same way and the fitness is evaluated in an equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthwhile to start a GP run with fewer than a million individuals in the GP population ;-)
As mentioned before it depends on your personal taste and experiences, resources and probably the GP system used!
Cheers,
Jan