Why does adding Crossover to my Genetic Algorithm give me worse results?

I have implemented a Genetic Algorithm to solve the Traveling Salesman Problem (TSP). When I use only mutation, I find better solutions than when I add in crossover. I know that standard crossover methods do not work for TSP, so I implemented both the Ordered Crossover and the PMX Crossover methods, and both give poor results.
Here are the other parameters I'm using:
Mutation: Single Swap Mutation or Inverted Subsequence Mutation (as described by Tiendil here) with mutation rates tested between 1% and 25%.
Selection: Roulette Wheel Selection
Fitness function: 1 / distance of tour
Population size: tested 100, 200, and 500; I also run the GA 5 times so that I have a variety of starting populations.
Stop Condition: 2500 generations
With the same dataset of 26 points, I usually get results of about 500-600 distance using purely mutation with high mutation rates. When adding crossover my results are usually in the 800 distance range. The other confusing thing is that I have also implemented a very simple Hill-Climbing algorithm to solve the problem and when I run that 1000 times (faster than running the GA 5 times) I get results around 410-450 distance, and I would expect to get better results using a GA.
Any ideas as to why my GA performs worse when I add crossover? And why is it performing much worse than a simple Hill-Climbing algorithm, which should get stuck on local maxima, as it has no way of exploring further once it finds a local maximum?

It looks like your crossover operator is introducing too much randomness into the new generations, so you are wasting your computational effort trying to improve bad solutions. Imagine that the Hill-Climbing algorithm can improve a given solution to the best of its neighborhood, but your Genetic Algorithm can only make limited improvements to an almost random population of solutions.
It is also worth saying that a GA is not the best tool to solve the TSP. Anyway, you should look at some examples of how to implement it, e.g. http://www.lalena.com/AI/Tsp/

With roulette-wheel selection, you're still introducing bad parents into the mix. If you weight the wheel more heavily towards the better parents, this may help.
Remember, much of your population might be unfit parents. If you're not weighting parent selection at all, there's a good chance you'll be breeding consistently bad solutions that overrun the pool. Weight your selection to choose better parents more frequently, and use mutation to correct a too-similar pool by adding randomness.
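For instance, a minimal sketch of fitness-proportional selection in Python (names like roulette_select are mine, not from your code; it assumes higher fitness is better and all fitnesses are positive):

    import random

    def roulette_select(population, fitnesses):
        # Spin the wheel: each individual's slice is proportional to its fitness.
        pick = random.uniform(0, sum(fitnesses))
        running = 0.0
        for individual, fitness in zip(population, fitnesses):
            running += fitness
            if running >= pick:
                return individual
        return population[-1]  # guard against floating-point round-off

To bias the wheel further towards good parents, you can select on a transformed fitness (e.g. squared or rank-based weights) instead of the raw value.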

You might try introducing elitism into your selection process. Elitism means that the two highest fitness individuals in the population are preserved and copied to the new population before any selection is done. After elitism is completed, selection continues as normal. Doing this means that no matter which parents are selected by the roulette wheel or what they produce during crossover, the two best individuals will always be preserved. This prevents the new population from losing fitness because its two best solutions can't be any worse than the previous generation.
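A sketch of that step, assuming a list-based population and higher-is-better fitness (the breed function stands for whatever selection plus crossover you already do):

    def next_generation(population, fitnesses, breed):
        # Rank individuals by fitness, best first.
        ranked = [ind for _, ind in sorted(zip(fitnesses, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
        new_population = ranked[:2]  # elitism: copy the two best unchanged
        # Fill the rest of the new population via normal selection/crossover.
        while len(new_population) < len(population):
            new_population.append(breed(population, fitnesses))
        return new_population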

One reason for your results being worse when crossover is added may be that it is not doing what it should: combining the best features of two individuals. Maybe try a lower crossover probability? Population diversity could be an issue here. Morrison and De Jong, in their work Measurement of Population Diversity, propose a novel measure of diversity. Using that measure, you can see how your population diversity changes over the generations, and what difference it makes when you use crossover or don't.
Also, there could be some minor mistake or missed detail in your OX or PMX implementation. Maybe you have overlooked something? BTW, maybe you want to try the Edge Recombination crossover operator? (Pyevolve has an implementation.)
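To help check the OX implementation mentioned above, here is one common variant of OX1 in Python (a sketch for comparison only; some descriptions fill the free positions starting after the second cut point and wrapping around, but the invariant is the same: the child is a valid permutation with a contiguous block inherited from one parent):

    import random

    def ordered_crossover(parent1, parent2):
        # Copy a random slice from parent1, then fill the remaining positions
        # with parent2's cities in the order they appear in parent2.
        size = len(parent1)
        a, b = sorted(random.sample(range(size), 2))
        child = [None] * size
        child[a:b + 1] = parent1[a:b + 1]
        kept = set(child[a:b + 1])
        fill = iter(city for city in parent2 if city not in kept)
        for i in range(size):
            if child[i] is None:
                child[i] = next(fill)
        return child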

In order to come up with 'innovative' strategies, genetic algorithms generally use crossover to combine features of different candidate solutions, exploring the search space very quickly and finding new strategies of higher fitness - not at all unlike the inner workings of human intelligence (this is why it is arguable that we never really 'invent' anything, but merely mix up stuff we already know).
By doing so (randomly combining different individuals), crossover does not preserve symmetry or ordering, and when the problem is highly dependent on symmetry of some sort or on the order of the genes in the chromosome (as in your particular case) it is indeed likely that adopting crossover will lead to worse results. As you mention yourself, it is well known that naive crossover doesn't work for the traveling salesman.
It's worth underlining that without this symmetry-breaking feature of crossover, genetic algorithms would not be able to fill evolutionary 'niches' (where lack of symmetry is often necessary) - and that's why crossover (in all its variants) is essential in the vast majority of cases.

Related

Why does adding randomness to a crossover operator in a GA improve it so much?

Playing with a genetic algorithm, I noticed that if I chose a random crossover location for each crossover operation, instead of a fixed one, the number of generations needed to get to a proper solution was much lower.
I don't get the intuition behind this. What's happening? Why does randomly choosing the crossover point seem to be so much more efficient?
In my experience this depends on the problem domain you are trying to solve. For example, in the TSP (Travelling Salesman Problem) I like to use this combination of operators, because they normally find a good solution in an affordable time:
Ordered Crossover (OX1)
Reverse Sequence Mutation (RSM)
Elite Selection
But in other problem domains we can select different operators to achieve better results, like these for a Function Builder problem:
Three Parent Crossover
Uniform Mutation
Elite Selection
So I think that, more important than the randomness, is using the right set of operators for the target problem.
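To make the original question concrete, here is a sketch of generic one-point crossover where the cut can be fixed or random (illustrative code, not from the question; for permutation problems like TSP you would use OX or PMX instead):

    import random

    def one_point_crossover(parent1, parent2, point=None):
        # A fixed point can only ever exchange the same two segments;
        # a random point varies which gene groupings get recombined,
        # so more of the search space is reachable by crossover alone.
        if point is None:
            point = random.randint(1, len(parent1) - 1)
        return parent1[:point] + parent2[point:]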

Apply mutation (with small probability) to the offspring

I am learning about genetic algorithms, and when I studied mutation there was something I couldn't figure out. It seemed a little unusual: after we produce offspring via a crossover point, we should apply mutation with small probability. What is that small probability? I have an image about the 8-queens problem where we found the optimal answer; there our crossover point is 3. So why, for example, do we have mutation in the first, third and last population but not in the second one?
I'm sorry if this question is silly!
First, what you call a population is actually just an individual from a population. The population is the whole set of individuals.
A good genetic algorithm is one that balances exploration and exploitation. Exploration tries to find new solutions without caring how good they are (because they might lead to better solutions). Exploitation tries to take advantage of what the algorithm already knows to be "good solutions".
CROSSOVER = EXPLOITATION. Using crossover, you are trying to combine the best individuals (fitness-wise) to produce even better solutions.
MUTATION = EXPLORATION. Using mutation, you try to get out of your "gene pool" and find new solutions, with new characteristics that don't arise from your current population.
This being said, the best way to balance them is usually trial and error. For a given input, play with the parameters and see how they work.
As for why the 2nd individual didn't mutate at all: simply because the probabilistic process that mutates individuals didn't choose it. Usually, for this kind of problem, mutation works like this:
    import random

    for individual in population:
        for i, gene in enumerate(individual):
            if random.random() < MUTATION_RATIO:
                individual[i] = mutate(gene)  # each gene mutates independently
This means that an individual might undergo more than one mutation.
My experience with genetic algorithms is that the optimal mutation probability depends on the algorithm and sometimes even on the problem. As a rule of thumb:
Mutation probability too small: the population converges early.
Mutation probability too high: the population never converges.
So basically the question is not answerable in general. I have used anywhere from 0.5% to 8%, depending on the number of parameters, the mutation algorithm and the problem (i.e. its sensitivity to parameter changes). There are also algorithms that change the mutation rate over the generations.
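For example, a minimal sketch of one such schedule (the linear decay and the bounds are arbitrary illustrative choices):

    def mutation_rate(generation, max_generations, start=0.08, end=0.005):
        # Decay linearly from an explorative rate to a fine-tuning one.
        t = generation / max_generations
        return start + (end - start) * t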
A nice way I found to learn about and experiment with the mutation rate (although only for that algorithm) is this site: you can play with the probability and see the effects instantly. It is also pretty meditative watching those little cars.

How is a genetic algorithm different from random selection and evaluation of the fittest?

I have been learning about genetic algorithms for two months. I know about the process of initial population creation, selection, crossover, mutation, etc., but I could not understand how we are able to get better results in each generation and how it is different from randomly searching for the best solution. Below I use an example to explain my problem.
Let's take the example of the travelling salesman problem. Let's say we have several cities X1, X2, ..., X18 and we have to find the shortest path to travel them. When we do the crossover after selecting the fittest individuals, how do we know that after crossover we will get a better chromosome? The same applies to mutation.
I feel like it just takes one arrangement of cities, calculates the distance to travel them, then stores the distance and arrangement. Then it chooses another arrangement/combination: if it is better than the previous arrangement, it saves the current arrangement and distance, else it discards the current arrangement. By doing this, we will also get some solution.
I just want to know where the point is that makes the difference between random selection and a genetic algorithm. In a genetic algorithm, is there any criterion so that we can't select an arrangement/combination of cities which we have already evaluated?
I am not sure if my question is clear, but I am open to explaining more. Please let me know if my question is not clear.
A random algorithm starts with a completely blank sheet every time. A new random solution is generated each iteration, with no memory of what happened before during the previous iterations.
A genetic algorithm has a history, so it does not start with a blank sheet, except at the very beginning. Each generation the best of the solution population are selected, mutated in some way, and advanced to the next generation. The least good members of the population are dropped.
Genetic algorithms build on previous success, so they are able to advance faster than random algorithms. A classic example of a very simple genetic algorithm is the Weasel program. It finds its target far more quickly than random chance because each generation starts from a partial solution, and over time those partial solutions get closer to the required solution.
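A minimal sketch of the Weasel idea in Python (the 5% mutation rate and the brood size of 100 are arbitrary illustrative choices):

    import random
    import string

    TARGET = "METHINKS IT IS LIKE A WEASEL"
    CHARS = string.ascii_uppercase + " "

    def score(candidate):
        # Number of positions already matching the target phrase.
        return sum(a == b for a, b in zip(candidate, TARGET))

    def mutate(parent, rate=0.05):
        return "".join(random.choice(CHARS) if random.random() < rate else ch
                       for ch in parent)

    parent = "".join(random.choice(CHARS) for _ in TARGET)
    generation = 0
    while parent != TARGET:
        brood = [mutate(parent) for _ in range(100)]
        parent = max(brood + [parent], key=score)  # keep the best found so far
        generation += 1
    print(generation, parent)

Because each generation starts from the best string found so far, matching characters accumulate instead of being re-rolled, which is exactly the 'memory' a pure random search lacks.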
I think there are two things you are asking about: a mathematical proof that GAs work, and an empirical one that would allay your concerns.
Although I am not aware of a general proof, I am quite sure at least a good sketch of one was given by John Holland in his book Adaptation in Natural and Artificial Systems, for optimization problems using binary coding. There is something called Holland's schema theorem. But you know, it's a heuristic, so technically it does not have to hold. It basically says that short schemata in the genotype that raise the average fitness appear exponentially more often in successive generations; crossover then combines them together. I think the proof was given only for binary coding, and it got some criticism as well.
Regarding your concerns: of course you have no guarantee that a crossover will produce a better result, just as two intelligent or beautiful parents might have ugly, stupid children. The premise of GAs is that this is less likely to happen. (As I understand it) the proof for binary coding hinges on the theorem saying that good partial patterns will start emerging, and, given that the genotype is long enough, such patterns residing in different specimens have a chance to be combined into one specimen, improving its fitness in general.
I think it is fairly easy to understand in terms of the TSP. Crossover helps to accumulate good sub-paths into one specimen. Of course it all depends on the choice of the crossover method.
Also, a GA's path towards the solution is not purely random. It moves in a certain direction, with stochastic mechanisms to escape traps. (You can lose the best solutions if you allow it.) It works because it wants to move towards the current best solutions, but you have a population of specimens and they effectively share knowledge. They are all similar, but, provided you preserve diversity, new better partial patterns can be introduced to the whole population and get incorporated into the best solutions. This is why diversity in the population is regarded as very important.
As a final note, please remember that GAs are a very broad topic and you can modify the basic scheme in nearly every way you want. You can introduce elitism, taboos, niches, etc. There is no one-and-only approach/implementation.

Lack of diversification, is it really a drawback of Genetic Algorithms?

We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than with these points directly. In the literature, we often find that GAs have the drawback that (1) since many chromosomes are coded into similar points of Ω, or similar chromosomes decode to very different points, the efficiency is quite low. Do you think that is really a drawback? These kinds of algorithms use the mutation operator in each iteration to diversify the candidate solutions; to add more diversification we can simply increase the probability of crossover. And we mustn't forget that our initial population (of chromosomes) is randomly generated (yet more diversification). The question is: if you think that (1) is a drawback of GAs, can you provide more details? Thank you.
Mutation and random initialization are not enough to combat the problem known as genetic drift, which is the major problem of genetic algorithms. Genetic drift means that the GA may quickly lose most of its genetic diversity, and the search then proceeds in a way that is not beneficial for crossover. This is because the random initial population quickly converges. Mutation is a different thing: if it is high it will diversify, true, but at the same time it will prevent convergence, and the solutions will remain at a certain distance from the optimum with higher probability. You will need to adapt the mutation probability (not the crossover probability) during the search. In a similar manner, the Evolution Strategy, which is similar to a GA, adapts the mutation strength during the search.
We have developed a variant of the GA called the Offspring Selection GA (OSGA), which introduces another selection step after crossover: only those children are accepted that surpass the fitness of their parents (the better one, the worse one, or any linearly interpolated value between them). This way you can even use random parent selection and put the bias on the quality of the offspring. It has been shown that this slows genetic drift. The algorithm is implemented in our framework HeuristicLab, which features a GUI, so you can download it and try it on some problems.
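A rough sketch of that acceptance rule (my reading of the idea, not HeuristicLab's actual code; fitness is maximized and comparison interpolates between the worse parent at 0 and the better parent at 1):

    import random

    def offspring_selection(population, fitness, crossover,
                            comparison=1.0, max_attempts=1000):
        # Random parent selection; the selection pressure comes from
        # accepting only children that beat a threshold interpolated
        # between their parents' fitnesses.
        child = None
        for _ in range(max_attempts):
            mother, father = random.sample(population, 2)
            child = crossover(mother, father)
            f_lo, f_hi = sorted((fitness(mother), fitness(father)))
            if fitness(child) >= f_lo + comparison * (f_hi - f_lo):
                return child
        return child  # fall back if no child qualifies in time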
Other techniques that combat genetic drift are niching and crowding, which let diversity flow into the selection and thus introduce another, but likely different, bias.
EDIT: I want to add that the situation of having multiple solutions with equal quality might of course pose a problem, as it creates neutral areas in the search space. However, I think you didn't really mean that. The primary problem is genetic drift, i.e. the loss of (important) genetic information.
As a sidenote, you (the OP) said:
We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than these points directly.
This is not always true. An individual is coded as a genotype, which can have any shape, such as a string (genetic algorithms) or a vector of reals (evolution strategies). Each genotype is transformed into a phenotype when the individual is assessed, i.e. when its fitness is calculated. In some cases, the phenotype is identical to the genotype: this is called direct coding. Otherwise, the coding is called indirect. (You may find more definitions here (section 2.2.1).)
Example of direct encoding:
http://en.wikipedia.org/wiki/Neuroevolution#Direct_and_Indirect_Encoding_of_Networks
Example of indirect encoding:
Suppose you want to optimize the size of a rectangular parallelepiped defined by its length, height and width. To simplify the example, assume that these three quantities are integers between 0 and 15. We can then describe each of them using a 4-bit binary number. A potential solution may be the genotype 0001 0111 1010. The corresponding phenotype is a parallelepiped of length 1, height 7 and width 10.
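The decoding step of this indirect coding is mechanical; as a sketch in Python:

    def decode(genotype, n_params=3, bits=4):
        # Split the bit string into fixed-width fields, each read as an integer.
        return [int(genotype[i * bits:(i + 1) * bits], 2)
                for i in range(n_params)]

    print(decode("000101111010"))  # -> [1, 7, 10]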
Now, back to the original question on diversity: in addition to what DonAndre said, you could read chapter 9, "Multi-Modal Problems and Spatial Distribution", of the excellent book Introduction to Evolutionary Computing by A. E. Eiben and J. E. Smith, as well as research papers on the matter such as Encouraging Behavioral Diversity in Evolutionary Robotics: an Empirical Study. In a word, diversity is not a drawback of GAs; it is "just" an issue.

Algorithm to optimize parameters based on imprecise fitness function

I am looking for a general algorithm to help in situations with constraints similar to this example:
I am thinking of a system where images are constructed based on a set of operations. Each operation has a set of parameters. The total "gene" of the image is then the sequential application of the operations with the corresponding parameters. The finished image is then given a vote by one or more real humans according to how "beautiful" it is.
The question is what kind of algorithm would be able to do better than simply random search if you want to find the most beautiful image? (and hopefully improve the confidence over time as votes tick in and improve the fitness function)
Given that the operations will probably be correlated, it should be possible to do better than random search. So for example operation A with parameters a1 and a2 followed by B with parameters b1 could generally be vastly superior to B followed by A. The order of operations will matter.
I have tried googling for research papers on random walks and Markov chains, as those are my best guesses about where to look, but so far I have found no scenarios similar enough. I would really appreciate even just a hint of where to look for such an algorithm.
I think what you are looking for falls into a broad research area called metaheuristics (which includes many non-linear optimization algorithms such as genetic algorithms, simulated annealing and tabu search).
Then, if your raw fitness function is just a statistical value that somehow approximates a real (but unknown) fitness function, you can probably still use most metaheuristics by (somehow) smoothing your fitness function (averaging repeated evaluations would do that).
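As a sketch (raw_fitness here stands for a hypothetical noisy scoring function):

    def smoothed_fitness(candidate, raw_fitness, samples=5):
        # Average several noisy evaluations to estimate the true fitness.
        return sum(raw_fitness(candidate) for _ in range(samples)) / samples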
Do you mean the Metropolis algorithm?
This approach uses a random walk, weighted by the fitness function. It is useful for locating local extrema in complicated fitness landscapes, but is generally slower than deterministic approaches where those will work.
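In sketch form (maximizing a fitness; propose is a hypothetical function that returns a random neighbour of the current state):

    import math
    import random

    def metropolis_step(current, propose, fitness, temperature=1.0):
        # Always accept improvements; accept worse moves with probability
        # exp(delta / temperature), which lets the walk escape local optima.
        candidate = propose(current)
        delta = fitness(candidate) - fitness(current)
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            return candidate
        return current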
You're pretty much describing a genetic algorithm in which the sequence of operations represents the "chromosome" (each operation with its parameter[s] being a single "gene"; multiple genes make up a chromosome), the image produced represents the phenotypic expression of the chromosome, and the votes from the real humans represent the fitness function.
If I understand your question, you're looking for an alternative algorithm of some sort that will evaluate the operations and produce a "beauty" score similar to what the real humans produce. Good luck with that - I don't think there really is any such thing, and I'm not surprised that you didn't find anything. Human brains, and correspondingly human evaluations of aesthetics, are much too staggeringly complex to be reducible to a simplistic algorithm.
Interestingly, your question seems to encapsulate the bias against using real human responses as the fitness function in genetic-algorithm-based software. This is a subject of relevance to me, since my namesake software is specifically designed to use human responses (or "votes") to evaluate music produced via a genetic process.
Simple Markov Chain
Markov chains, which you mention, aren't a bad way to go. A Markov chain is just a state machine, represented as a graph with edge weights which are transition probabilities. In your case, each of your operations is a node in the graph, and the edges between the nodes represent allowable sequences of operations. Since order matters, your edges are directed. You then need three components, each sketched in code below:
A generator function to construct the graph of allowed transitions (which operations are allowed to follow one another). If any operation is allowed to follow any other, then this is easy to write: all nodes are connected, and your graph is said to be complete. You can initially set all the edge weights to 1.
A function to traverse the graph, crossing N nodes, where N is your 'gene-length'. At each node, your choice is made randomly, but proportionally weighted by the values of the edges (so better edges have a higher chance of being selected).
A weighting update function which can be used to adjust the weightings of the edges when you get feedback about an image. For example, a simple update function might be to give each edge involved in a 'pleasing' image a positive vote each time that image is nominated by a human. The weighting of each edge is then normalised, with the currently highest voted edge set to 1, and all the others correspondingly reduced.
This graph is then a simple learning network which will be refined by subsequent voting. Over time, as votes accumulate, successive traversals will tend to favour the more highly rated sequences of operations, but will still occasionally explore other possibilities.
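A compact sketch of those three components in Python (names are illustrative; random.choices does the proportional edge selection):

    import random

    class SequenceChain:
        def __init__(self, operations):
            # Component 1: a complete directed graph, all edge weights 1.
            self.operations = list(operations)
            self.weights = {(a, b): 1.0
                            for a in self.operations for b in self.operations}

        def sample(self, length):
            # Component 2: traverse the graph, picking each next operation
            # with probability proportional to the outgoing edge weights.
            sequence = [random.choice(self.operations)]
            while len(sequence) < length:
                here = sequence[-1]
                w = [self.weights[(here, b)] for b in self.operations]
                sequence.append(random.choices(self.operations, weights=w)[0])
            return sequence

        def reward(self, sequence):
            # Component 3: vote up every edge used by a pleasing image,
            # then normalise so the best edge has weight 1.
            for a, b in zip(sequence, sequence[1:]):
                self.weights[(a, b)] += 1.0
            top = max(self.weights.values())
            for edge in self.weights:
                self.weights[edge] /= top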
Advantages
The main advantage of this approach is that it's easy to understand and code, and makes very few assumptions about the problem space. This is good news if you don't know much about the search space (e.g. which sequences of operations are likely to be favourable).
It's also easy to analyse and debug - you can inspect the weightings at any time and very easily calculate things like the top 10 best sequences known so far, etc. This is a big advantage - other approaches are typically much harder to investigate ("why did it do that?") because of their increased abstraction. A simplex crawler, although very efficient, can easily melt your brain when you try to follow and debug its convergence steps!
Even if you implement a more sophisticated production algorithm, having a simple baseline algorithm is crucial for sanity checking and efficiency comparisons. It's also easy to tinker with, by messing with the update function. For example, an even more baseline approach is pure random walk, which is just a null weighting function (no weighting updates) - whatever algorithm you produce should perform significantly better than this if its existence is to be justified.
This idea of baselining is very important if you want to evaluate the quality of your algorithm's output empirically. In climate modelling, for example, a simple test is "does my fancy simulation do any better at predicting the weather than one where I simply predict today's weather will be the same as yesterday's?" Since weather is often correlated on a timescale of several days, this baseline can give surprisingly good predictions!
Limitations
One disadvantage of the approach is that it is slow to converge. A more aggressive choice of update function will push promising results faster (for example, weighting new results according to a power law rather than the simple linear normalisation), at the cost of giving alternatives less credence.
This is equivalent to fiddling with the mutation rate and gene pool size in a genetic algorithm, or the cooling rate of a simulated annealing approach. The tradeoff between 'climbing hills or exploring the landscape' is an inescapable "twiddly knob" (free parameter) which all search algorithms must deal with, either directly or indirectly. You are trying to find the highest point in some fitness search space. Your algorithm is trying to do that in fewer tries than random inspection, by looking at the shape of the space and trying to infer something about it. If you think you're going up a hill, you can take a guess and jump further. But if it turns out to be a small hill in a bumpy landscape, then you've just missed the peak entirely.
Also note that since your fitness function is based on human responses, you are limited to a relatively small number of iterations regardless of your choice of algorithmic approach. For example, you would see the same issue with a genetic algorithm approach (fitness function limits the number of individuals and generations) or a neural network (limited training set).
A final potential limitation is that if your "gene-lengths" are long, there are many nodes, and many transitions are allowed, then the size of the graph will become prohibitive, and the algorithm impractical.

Resources