Lack of diversification, is it really a drawback of Genetic Algorithms?

We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than with these points directly. In the literature, we often find that GAs have the following drawback: (1) since many chromosomes encode the same or similar points of Ω, while similar chromosomes may decode to very different points, the efficiency is quite low. Do you think that is really a drawback? After all, these kinds of algorithms use the mutation operator in each iteration to diversify the candidate solutions, and to add more diversification we can simply increase the probability of crossover. And we mustn't forget that our initial population (of chromosomes) is randomly generated, which is yet another source of diversification. The question is: if you think that (1) is a drawback of GAs, can you provide more details? Thank you.

Mutation and random initialization are not enough to combat the problem known as genetic drift, which is the major problem of genetic algorithms. Genetic drift means that the GA may quickly lose most of its genetic diversity, so that the search proceeds in a way that is not beneficial for crossover. This happens because the random initial population quickly converges. Mutation is a different matter: if its rate is high it will indeed diversify, but at the same time it will prevent convergence, and the solutions will, with higher probability, remain at a certain distance from the optimum. You will need to adapt the mutation probability (not the crossover probability) during the search. In a similar manner the Evolution Strategy, which is related to the GA, adapts the mutation strength during the search.
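For illustration, a minimal sketch (my own, not HeuristicLab code) of the classic 1/5-success rule by which a simple (1+1) Evolution Strategy adapts its mutation strength sigma during the search:

```python
def adapt_sigma(sigma, successes, trials, factor=0.85):
    # 1/5-success rule: if more than 1/5 of recent mutations improved the
    # parent, the search is too local, so widen the step size; if fewer,
    # steps overshoot, so narrow it.
    rate = successes / trials
    if rate > 0.2:
        return sigma / factor   # factor < 1, so this increases sigma
    if rate < 0.2:
        return sigma * factor   # this decreases sigma
    return sigma
```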
We have developed a variant of the GA called the Offspring Selection GA (OSGA), which introduces another selection step after crossover. Only those children are accepted whose fitness surpasses a threshold derived from their parents' fitness (the better parent, the worse parent, or any linearly interpolated value in between). This way you can even use random parent selection and put the bias on the quality of the offspring. It has been shown that this slows genetic drift. The algorithm is implemented in our framework HeuristicLab, which features a GUI, so you can download it and try it on some problems.
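A minimal sketch of that extra acceptance step (illustrative only, not the HeuristicLab implementation; assumes fitness is maximized):

```python
def accept_child(child_fitness, parent1_fitness, parent2_fitness, bias=1.0):
    # bias = 0 compares against the worse parent, bias = 1 against the
    # better one; values in between interpolate linearly.
    lo, hi = sorted((parent1_fitness, parent2_fitness))
    return child_fitness > lo + bias * (hi - lo)
```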
Other techniques that combat genetic drift are niching and crowding, which feed diversity into the selection step and thus introduce another, though likely different, bias.
EDIT: I want to add that the situation of having multiple solutions with equal quality might of course pose a problem, as it creates neutral areas in the search space. However, I think you didn't really mean that. The primary problem is genetic drift, i.e. the loss of (important) genetic information.

As a sidenote, you (the OP) said:
We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than these points directly.
This is not always true. An individual is coded as a genotype, which can have any shape, such as a string (genetic algorithms) or a vector of real numbers (evolution strategies). Each genotype is transformed into a phenotype when the individual is assessed, i.e. when its fitness is calculated. In some cases, the phenotype is identical to the genotype: this is called direct coding. Otherwise, the coding is called indirect. (You may find more definitions here (section 2.2.1).)
Example of direct encoding:
http://en.wikipedia.org/wiki/Neuroevolution#Direct_and_Indirect_Encoding_of_Networks
Example of indirect encoding:
Suppose you want to optimize the size of a rectangular parallelepiped defined by its length, height and width. To simplify the example, assume that these three quantities are integers between 0 and 15. We can then describe each of them using a 4-bit binary number. A potential solution might then be the genotype 0001 0111 1010. The corresponding phenotype is a parallelepiped of length 1, height 7 and width 10.
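A minimal sketch of that genotype-to-phenotype decoding:

```python
def decode(genotype):
    # Split the bit string into three 4-bit fields: length, height, width.
    bits = genotype.replace(" ", "")
    return tuple(int(bits[i:i + 4], 2) for i in range(0, 12, 4))

print(decode("0001 0111 1010"))  # -> (1, 7, 10)
```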
Now back to the original question on diversity: in addition to what DonAndre said, you could read chapter 9, "Multi-Modal Problems and Spatial Distribution", of the excellent book Introduction to Evolutionary Computing by A. E. Eiben and J. E. Smith, as well as a research paper on the matter such as Encouraging Behavioral Diversity in Evolutionary Robotics: an Empirical Study. In a word, diversity is not a drawback of GAs, it is "just" an issue.

Related

Definition of "fractional" in algorithms

What is the definition of the word "fractional" in algorithms? I have encountered the word in phrases like "fractional algorithm" and "fractional node routing problem". I have also encountered the phrase "[...] designing a fractional algorithm and transforming it into a discrete algorithm [...]". Could the word "fractional" mean "continuous"? Could it mean "perfect"?
Note: English is not my native language
I think it's a case of a paper's authors being pretentious. I went digging for some examples; the best I found is this one: http://books.google.com/books?id=X88_R8gH4hsC&lpg=PA54&ots=-FLjG-dNZg&dq=%22fractional%20algorithm%22&pg=PA54#v=onepage&q=%22fractional%20algorithm%22&f=false
The paper writes:
...we show a fractional algorithm for the switch throughput problem, i.e. one that can insert fractions of packets [...] Then we transform our fractional algorithm into a discrete algorithm, i.e. one that can insert and transit integral packets.
My understanding suggests that a "fractional algorithm" is one that can process sub-integral, but not necessarily continuous (i.e. "a stream") units of data. Obviously this only applies to certain classes of algorithms, but an example could be an image-processing algorithm: a fractional approach might be able to work on an arbitrarily sub-pixel basis rather than per-pixel (i.e. discrete units), but it couldn't necessarily process a stream of color data (e.g. an analog TV scanline).
"Fractional" in the context of algorithms, my research specialty, has a precise technical meaning, namely, when the problem can be formulated in some obvious way as an integer program, the "fractional" version corresponds to the linear program obtained by dropping the integrality constraints. Often it's possible to transform a fractional solution into an integral one by rounding, often in a randomized manner.

Genetic algorithm chromosome generation

I need to develop a system to select a team out of a database. Is it possible to use a genetic algorithm, with the initial population (chromosomes) representing players as some identifier? Each identifier has its genes in a database, which are used to apply various rules (such as requirements to be team leader, etc.).
Is a GA helpful for such a scenario?
Yes, it can be.
First, evolutionary algorithms work directly with the genotype of an individual. Stating that you are using identifiers to link to an individual in the algorithm is either an implementation detail (irrelevant to the question) or simply erroneous (you should load the genotype into memory for faster access).
Your problem is a simple combination problem. For a given number of available players n, from which we want to form teams of size k, a total of n! / (k! ⋅ (n - k)!) combinations are possible. This is generally too many possibilities to handle with today's computing resources. Evolutionary algorithms allow (among other things) the optimization of a function that is too big for analytic resolution, or for which no analytic method exists.
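For a sense of scale, a quick illustration of that formula (the player counts here are made up):

```python
from math import comb

# Number of ways to pick an 11-player team from a 100-player database:
print(comb(100, 11))  # 141629804643600, far too many to enumerate
```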
You seem confused as to how to implement this kind of process. First, choosing a good data representation is important to get good results. You should begin by stating every characteristic you want to optimize, its relation to performance, and whether cross-relations between characteristics affect global performance.
You should be careful, though: genetic algorithms can tend to get stuck in local maxima, so be sure to keep your genetic diversity high by not punishing relatively good solutions too hard or using too steep a selection phase.
That being said, the analysis I gave you was from a purely combinatorial point of view. From the point of view of a team, where context matters, evolutionary algorithms won't be efficient. For instance, if you need 3 attackers, 2 defenders and a goalkeeper, you should simply sort your player list three times, first according to the characteristics of a good attacker, then of a defender and finally of a goalkeeper, and take the best elements (the first elements after sorting) to compose your team. This will be much faster and give you an optimal result, unlike an evolutionary algorithm. Evolutionary algorithms such as genetic algorithms would be the tool of choice if you had no idea of the mechanics of the game being played nor the inner workings of an optimal play.
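A minimal sketch of that sort-based approach (the player data and attribute names are made up for illustration):

```python
players = [
    {"name": "P1", "attack": 9, "defense": 3, "goalkeeping": 1},
    {"name": "P2", "attack": 4, "defense": 8, "goalkeeping": 2},
    {"name": "P3", "attack": 7, "defense": 6, "goalkeeping": 1},
    {"name": "P4", "attack": 2, "defense": 7, "goalkeeping": 9},
    {"name": "P5", "attack": 8, "defense": 2, "goalkeeping": 1},
    {"name": "P6", "attack": 3, "defense": 9, "goalkeeping": 2},
]

def pick(role, count, taken):
    # Sort by the role-specific score and take the best untaken players.
    ranked = sorted(players, key=lambda p: p[role], reverse=True)
    chosen = [p for p in ranked if p["name"] not in taken][:count]
    taken.update(p["name"] for p in chosen)
    return chosen

taken = set()
team = (pick("attack", 3, taken)
        + pick("defense", 2, taken)
        + pick("goalkeeping", 1, taken))
print([p["name"] for p in team])
```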
Nevertheless, it is a good idea to begin toying with genetic algorithms to get a grasp of their possibilities and limitations. A good start is a simple framework in a simple language, such as deap or pyevolve in Python, to try your ideas out.

Reasonable bit string size for genetic algorithm convergence

In a typical genetic algorithm, is there any guideline for estimating the generations required to converge given the amount of entropy in the description of an individual in the population?
Also, I suppose such an estimate would also require the number of offspring per generation and the mutation rate, but adjusting those parameters is of less interest to me at the moment.
Well, there are no concrete guidelines in the form of mathematical models, but there are several concepts that people use to reason about parameter settings and to give advice on how to choose them. One of these concepts is diversity, which is similar to the entropy you mentioned. The other concept is called selection pressure and determines the chance an individual has of being selected based on its relative fitness.
Diversity and selection pressure can be computed for each generation, but the change between generations is very difficult to estimate. You would also need models that predict the expected quality of your crossover and mutation operators in order to estimate the fitness distribution in the next generation.
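For the bit-string case, a minimal sketch (my own illustration, not from any of the cited papers) of one common way to quantify that diversity, as mean per-locus Shannon entropy:

```python
import math

def population_entropy(pop):
    # pop: list of equal-length bit strings, e.g. ['0101', '0011'].
    # Returns the mean per-locus Shannon entropy in bits:
    # 0 = fully converged, 1 = maximally diverse.
    n, length = len(pop), len(pop[0])
    total = 0.0
    for i in range(length):
        p1 = sum(ind[i] == '1' for ind in pop) / n
        for p in (p1, 1.0 - p1):
            if p > 0:
                total -= p * math.log2(p)
    return total / length

print(population_entropy(['0101', '0011', '0110', '1001']))  # ~0.91
```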
There has been work published on these topics quite recently:
* Chicano and Alba (2011). Exact Computation of the Expectation Curves of the Bit-Flip Mutation using Landscapes Theory.
* Chicano, Whitley, and Alba (2012). Exact Computation of the Expectation Curves for Uniform Crossover.
Is your question the result of a general research interest, or do you seek practical guidance?
No. If you define a mathematical model of the algorithm (initial population, combination function, mutation function), you can use standard mathematical methods to calculate what you want to know, but "typical genetic algorithm" is too vague to admit any meaningful answer.
If you want to set the hyperparameters of some genetic algorithm (e.g. the number of "DNA" bits), then this is typically done the usual way for any machine learning algorithm: with a cross-validation set.
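A minimal sketch of that tuning loop (run_ga is a hypothetical callable standing in for one GA run plus validation scoring; averaging over seeds smooths out the randomness):

```python
import statistics

def pick_bit_length(run_ga, candidates=(16, 32, 64, 128), runs=10):
    # run_ga: hypothetical callable(bit_length, seed) -> validation score.
    def score(bits):
        return statistics.mean(run_ga(bits, seed) for seed in range(runs))
    # Return the candidate bit length with the best average score.
    return max(candidates, key=score)
```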

Why does adding Crossover to my Genetic Algorithm give me worse results?

I have implemented a Genetic Algorithm to solve the Traveling Salesman Problem (TSP). When I use only mutation, I find better solutions than when I add in crossover. I know that the usual crossover methods do not work for the TSP, so I implemented both the Ordered Crossover (OX) and PMX crossover methods, and both suffer from bad results.
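For reference, one common variant of Ordered Crossover on permutation tours looks roughly like this (an illustrative sketch, not the exact implementation being tested):

```python
import random

def ordered_crossover(p1, p2):
    # Copy a random slice from parent 1, then fill the remaining
    # positions with parent 2's genes in their original order.
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    segment = p1[a:b]
    rest = [g for g in p2 if g not in segment]
    return rest[:a] + segment + rest[a:]

print(ordered_crossover([0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]))
```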
Here are the other parameters I'm using:
Mutation: Single Swap Mutation or Inverted Subsequence Mutation (as described by Tiendil here) with mutation rates tested between 1% and 25%.
Selection: Roulette Wheel Selection
Fitness function: 1 / distance of tour
Population size: Tested 100, 200 and 500. I also run the GA 5 times so that I have a variety of starting populations.
Stop Condition: 2500 generations
With the same dataset of 26 points, I usually get results of about 500-600 distance using mutation alone with high mutation rates. When adding crossover, my results are usually in the 800 distance range. The other confusing thing is that I have also implemented a very simple hill-climbing algorithm to solve the problem, and when I run that 1000 times (faster than running the GA 5 times) I get results around 410-450 distance, and I would expect to get better results using a GA.
Any ideas as to why my GA performs worse when I add crossover? And why is it performing much worse than a simple hill-climbing algorithm, which should get stuck in local maxima since it has no way of exploring once it finds a local max?
It looks like your crossover operator is introducing too much randomness into the new generations, so you are wasting your computational effort trying to improve bad solutions. Imagine that the hill-climbing algorithm can improve a given solution to the best of its neighborhood, but your genetic algorithm can only make limited improvements to an almost random population of solutions.
It is also worth saying that a GA is not the best tool for the TSP. In any case, you should look at some examples of how to implement it, e.g. http://www.lalena.com/AI/Tsp/
With roulette-wheel selection, you're introducing bad parents into the mix. If you'd like to weight the wheel somehow to choose some better parents, this may help.
Remember, much of your population might be unfit parents. If you're not weighting parent selection at all, there's a good chance you'll be breeding consistently bad solutions that overrun the pool. Weight your selection to choose better parents more frequently, and use mutation to correct a too-similar pool by adding randomness.
You might try introducing elitism into your selection process. Elitism means that the two highest fitness individuals in the population are preserved and copied to the new population before any selection is done. After elitism is completed, selection continues as normal. Doing this means that no matter which parents are selected by the roulette wheel or what they produce during crossover, the two best individuals will always be preserved. This prevents the new population from losing fitness because its two best solutions can't be any worse than the previous generation.
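A minimal sketch of elitism (fitness, select, crossover and mutate stand for whatever your GA already uses; assumes fitness is maximized):

```python
def next_generation(population, fitness, select, crossover, mutate):
    # Elitism: carry the two fittest individuals over unchanged, then
    # fill the rest of the population through the usual operators.
    elite = sorted(population, key=fitness, reverse=True)[:2]
    children = list(elite)
    while len(children) < len(population):
        a, b = select(population, fitness), select(population, fitness)
        children.append(mutate(crossover(a, b)))
    return children
```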
One reason for your results being worse when crossover is added may be that it is not doing what it should: combining the best features of two individuals. Maybe try a low crossover probability? Population diversity could also be an issue here. Morrison and De Jong, in their work Measurement of Population Diversity, propose a novel measure of diversity. Using that measure you can see how your population diversity changes over the generations; see what difference it makes when you use crossover and when you don't.
Also, there could be some minor mistake or missed detail in your OX or PMX implementation. Maybe you have overlooked something? By the way, you may want to try the Edge Recombination crossover operator (Pyevolve has an implementation).
In order to come up with 'innovative' strategies, genetic algorithms generally use crossover to combine features of different candidate solutions, exploring the search space very quickly and finding new strategies of higher fitness. This is not at all unlike the inner workings of human intelligence (which is why it is arguable that we never really 'invent' anything, but merely mix up stuff we already know).
By doing so (randomly combining different individuals), crossover does not preserve symmetry or ordering, and when the problem is highly dependent on symmetry of some sort or on the order of the genes in the chromosome (as in your particular case), it is indeed likely that adopting crossover will lead to worse results. As you mention yourself, it is well known that crossover doesn't work for the traveling salesman problem.
It's worth underlining that without this symmetry-breaking trait of crossover, genetic algorithms would not be able to fill evolutionary 'niches' (where lack of symmetry is often necessary), which is why crossover (in all its variants) is essential in the vast majority of cases.

Algorithm to optimize parameters based on imprecise fitness function

I am looking for a general algorithm to help in situations with constraints similar to this example:
I am thinking of a system where images are constructed based on a set of operations. Each operation has a set of parameters. The total "gene" of the image is then the sequential application of the operations with the corresponding parameters. The finished image is then given a vote by one or more real humans according to how "beautiful" it is.
The question is: what kind of algorithm would be able to do better than simple random search if you want to find the most beautiful image (and hopefully improve its confidence over time as votes tick in and refine the fitness function)?
Given that the operations will probably be correlated, it should be possible to do better than random search. So, for example, operation A with parameters a1 and a2 followed by B with parameter b1 could generally be vastly superior to B followed by A. The order of operations matters.
I have tried googling for research papers on random walks and Markov chains, as those are my best guesses about where to look, but so far I have found no scenarios similar enough. I would really appreciate even just a hint of where to look for such an algorithm.
I think what you are looking for falls in a broad research area called metaheuristics (which includes many non-linear optimization algorithms such as genetic algorithms, simulated annealing and tabu search).
Then, if your raw fitness function just gives a statistical value that somehow approximates the real (but unknown) fitness function, you can probably still use most metaheuristics by (somehow) smoothing your fitness function (averaging results would do that).
Do you mean the Metropolis algorithm?
This approach uses a random walk, weighted by the fitness function. It is useful for locating local extrema in complicated fitness landscapes, but is generally slower than deterministic approaches where those will work.
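A minimal sketch of Metropolis-style search (fitness and perturb stand for whatever applies to your operation sequences; assumes higher fitness is better):

```python
import math
import random

def metropolis(x, fitness, perturb, steps=1000, temperature=1.0):
    # Random walk that always accepts improvements and accepts worse
    # moves with a Boltzmann probability, so it can escape local optima.
    fx = fitness(x)
    for _ in range(steps):
        y = perturb(x)
        fy = fitness(y)
        if fy >= fx or random.random() < math.exp((fy - fx) / temperature):
            x, fx = y, fy
    return x, fx
```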
You're pretty much describing a genetic algorithm in which the sequence of operations represents the "gene" ("chromosome" would be a better term for this, with the parameter[s] passed to each operation representing a single "gene", and multiple genes making up a chromosome), the image produced represents the phenotypic expression of the gene, and the votes from the real humans represent the fitness function.
If I understand your question, you're looking for an alternative algorithm of some sort that will evaluate the operations and produce a "beauty" score similar to what the real humans produce. Good luck with that - I don't think there really is any such thing, and I'm not surprised that you didn't find anything. Human brains, and correspondingly human evaluations of aesthetics, are much too staggeringly complex to be reducible to a simplistic algorithm.
Interestingly, your question seems to encapsulate the bias against using real human responses as the fitness function in genetic-algorithm-based software. This is a subject of relevance to me, since my namesake software is specifically designed to use human responses (or "votes") to evaluate music produced via a genetic process.
Simple Markov Chain
Markov chains, which you mention, aren't a bad way to go. A Markov chain is just a state machine, represented as a graph with edge weights that are transition probabilities. In your case, each of your operations is a node in the graph, and the edges between nodes represent allowable sequences of operations. Since order matters, your edges are directed. You then need three components (a code sketch follows below):
A generator function to construct the graph of allowed transitions (which operations are allowed to follow one another). If any operation is allowed to follow any other, then this is easy to write: all nodes are connected, and your graph is said to be complete. You can initially set all the edge weights to 1.
A function to traverse the graph, crossing N nodes, where N is your 'gene-length'. At each node, your choice is made randomly, but proportionally weighted by the values of the edges (so better edges have a higher chance of being selected).
A weighting update function which can be used to adjust the weightings of the edges when you get feedback about an image. For example, a simple update function might be to give each edge involved in a 'pleasing' image a positive vote each time that image is nominated by a human. The weighting of each edge is then normalised, with the currently highest voted edge set to 1, and all the others correspondingly reduced.
This graph is then a simple learning network which will be refined by subsequent voting. Over time as votes accumulate, successive traversals will tend to favour the more highly rated sequences of operations, but will still occasionally explore other possibilities.
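A minimal sketch (the operation names are made up) of those three components in code:

```python
import random

def build_graph(operations):
    # Complete directed graph: any op may follow any op, all weights 1.
    return {a: {b: 1.0 for b in operations} for a in operations}

def traverse(graph, start, n):
    # Walk n nodes, picking each successor proportionally to edge weight.
    path = [start]
    for _ in range(n - 1):
        edges = graph[path[-1]]
        path.append(random.choices(list(edges), list(edges.values()))[0])
    return path

def reward(graph, path):
    # Vote up every edge on a pleasing path, then renormalise each node's
    # outgoing edges so the best one has weight 1.
    for a, b in zip(path, path[1:]):
        graph[a][b] += 1.0
    for edges in graph.values():
        top = max(edges.values())
        for b in edges:
            edges[b] /= top

ops = ["blur", "sharpen", "invert", "rotate"]
g = build_graph(ops)
seq = traverse(g, random.choice(ops), 5)
reward(g, seq)  # call this whenever a human upvotes the resulting image
print(seq)
```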
Advantages
The main advantage of this approach is that it's easy to understand and code, and makes very few assumptions about the problem space. This is good news if you don't know much about the search space (e.g. which sequences of operations are likely to be favourable).
It's also easy to analyse and debug: you can inspect the weightings at any time and very easily calculate things like the top 10 best sequences known so far. This is a big advantage; other approaches are typically much harder to investigate ("why did it do that?") because of their increased abstraction. Even though a simplex crawler is very efficient, you can easily melt your brain trying to follow and debug its convergence steps!
Even if you implement a more sophisticated production algorithm, having a simple baseline algorithm is crucial for sanity checking and efficiency comparisons. It's also easy to tinker with, by messing with the update function. For example, an even simpler baseline is a pure random walk, which is just a null weighting function (no weighting updates); whatever algorithm you produce should perform significantly better than this if its existence is to be justified.
This idea of baselining is very important if you want to evaluate the quality of your algorithm's output empirically. In climate modelling, for example, a simple test is "does my fancy simulation do any better at predicting the weather than one where I simply predict today's weather will be the same as yesterday's?" Since weather is often correlated on a timescale of several days, this baseline can give surprisingly good predictions!
Limitations
One disadvantage of the approach is that it is slow to converge. A more aggressive choice of update function will push promising results faster (for example, weighting new results according to a power law rather than the simple linear normalisation), at the cost of giving alternatives less credence.
This is equivalent to fiddling with the mutation rate and gene pool size in a genetic algorithm, or the cooling rate of a simulated annealing approach. The tradeoff between 'climbing hills or exploring the landscape' is an inescapable "twiddly knob" (free parameter) that all search algorithms must deal with, either directly or indirectly. You are trying to find the highest point in some fitness search space. Your algorithm tries to do that in fewer tries than random inspection, by looking at the shape of the space and trying to infer something about it. If you think you're going up a hill, you can take a guess and jump further. But if it turns out to be a small hill in a bumpy landscape, then you've just missed the peak entirely.
Also note that since your fitness function is based on human responses, you are limited to a relatively small number of iterations regardless of your choice of algorithmic approach. For example, you would see the same issue with a genetic algorithm approach (fitness function limits the number of individuals and generations) or a neural network (limited training set).
A final potential limitation is that if your "gene-lengths" are long, there are many nodes, and many transitions are allowed, then the size of the graph will become prohibitive, and the algorithm impractical.