Why do you need fitness scaling in Genetic Algorithms? - genetic-algorithm

Reading the book "Genetic Algorithms" by David E. Goldberg, he mentions fitness scaling in Genetic Algorithms.
My understanding of this function is to constrain the strongest candidates so that they don't flood the pool for reproduction.
Why would you want to constrain the best candidates? In my mind having as many of the best candidates as early as possible would help get to the optimal solution as fast as possible.

What if your early best candidates later on turn out to be evolutionary dead ends? Say, your early fittest candidates are big, strong agents that dominate smaller, weaker candidates. If all the weaker ones are eliminated, you're stuck with large beasts that maybe have a weakness to an aspect of the environment that hasn't been encountered yet that the weak ones can handle: think dinosaurs vs tiny mammals after an asteroid impact. Or, in a more deterministic setting that is more likely the case in a GA, the weaker candidates may be one or a small amount of evolutionary steps away from exploring a whole new fruitful part of the fitness landscape: imagine the weak small critters evolving flight, opening up a whole new world of possibilities that the big beasts most likely will never touch.
The underlying problem is that your early strongest candidates may actually be in or around a local maximum in fitness space, that may be difficult to come out of. It could be that the weaker candidates are actually closer to the global maximum.
In any case, by pruning your population aggressively, you reduce the genetic diversity of your population, which in general reduces the search space you are covering and limits how fast you can search this space. For instance, maybe your best candidates are relatively close to the global best solution, but just inbreeding that group may not move it much closer to it, and you may have to wait for enough random positive mutations to happen. However, perhaps one of the weak candidates that you wanted to cut out has some gene that on its own doesn't help much, but when crossed with the genes from your strong candidates in may cause a big evolutionary jump! Imagine, say, a human crossed with spider DNA.

#sgvd's answer makes valid points but I would like to elaborate more.
First of all, we need to define what fitness scaling actually means. If it means just multiplying the fitnesses by some factor then this does not change the relationships in the population - if the best individual had 10 times higher fitness than the worst one, after such multiplication this is still true (unless you multiply by zero which makes no real sense). So, a much more sensible fitness scaling is an affine transformation of the fitness values:
scaled(f) = a * f + b
i.e. the values are multiplied by some number and offset by another number, up or down.
Fitness scaling makes sense only with certain types of selection strategies, namely those where the selection probability is proportional to the fitness of the individuals1.
Fitness scaling plays, in fact, two roles. The first one is merely practical - if you want a probability to be proportional to the fitness, you need the fitness to be positive. So, if your raw fitness value can be negative (but is limited from below), you can adjust it so you can compute probabilities out of it. Example: if your fitness gives values from the range [-10, 10], you can just add 10 to the values to get all positive values.
The second role is, as you and #sgvd already mentioned, to limit the capability of the strongest solutions to overwhelm the weaker ones. The best illustration would be with an example.
Suppose that your raw fitness values gives values from the range [0, 100]. If you left it this way, the worst individuals would have zero probability of being selected, and the best ones would have up to 100x higher probability than the worst ones (excluding the really worst ones). However, let's set the scaling factors to a = 1/2, b = 50. Then, the range is transformed to [50, 100]. And right away, two things happen:
Even the worst individuals have non-zero probability of being selected.
The best individuals are now only 2x more likely to be selected than the worst ones.
Exploration vs. exploitation
By setting the scaling factors you can control whether the algorithm will do more exploration over exploitation and vice versa. The more "compressed"2 the values are going to be after the scaling, the more exploration is going to be done (because the likelihood of the best individuals being selected compared to the worst ones will be decreased). And vice versa, the more "expanded"2 are the values going to be, the more exploitation is going to be done (because the likelihood of the best individuals being selected compared to the worst ones will be increased).
Other selection strategies
As I have already written at the beginning, fitness scaling only makes sense with selection strategies which derive the selection probability proportionally from the fitness values. There are, however, other selection strategies that do not work like this.
Ranking selection
Ranking selection is identical to roulette wheel selection but the numbers the probabilities are derived from are not the raw fitness values. Instead, the whole population is sorted by the raw fitness values and the rank (i.e. the position in the sorted list) is the number you derive the selection probability from.
This totally erases the discrepancy when there is one or two "big" individuals and a lot of "small" ones. They will just be ranked.
Tournament selection
In this type of selection you don't even need to know the absolute fitness values at all, you just need to be able to compare two of them and tell which one is better. To select one individual using tournament selection, you randomly pick a number of individuals from the population (this number is a parameter) and you pick the best one of them. You repeat that as long as you have selected enough individuals.
Here you can also control the exploration vs. exploitation thing by the size of the tournament - the larger the tournament is the higher is the chance that the best individuals will take part in the tournaments.
1 An example of such selection strategy is the classical roulette wheel selection. In this selection strategy, each individual has its own section of the roulette wheel which is proportional in size to the particular individual's fitness.
2 Assuming the raw values are positive, the scaled values get compressed as a goes down to zero and as b goes up. Expansion goes the other way around.

Related

How to valorize better offsprings better than with my roulette selection method?

I am playing around with genetic programming algorithms, and I want to know how I can valorize and make sure my best exemplares reproduce more by substituting or improving the way I choose which one will reproduce. Currently the method I use looks like this:
function roulette(population)
local slice = sum_of_fitnesses(population) * math.random()
local sum = 0
for iter = 1, #population do
sum = sum + population[iter].fitness
if sum >= slice then
return population[iter]
end
end
end
But I can't get my population to reach an average fitness which is above a certain value and I worry it's because of less fit members reproducing with more fit members and thus continuing to spread their weak genes around.
So how can I improve my roulette selection method? Or should I use a completely different fitness proportionate selector?
There are a couple of issues at play here.
You are choosing the probability of an individual replicating based on its fitness, so the fitness function that you are using needs to exaggerate small differences or else having a minor decrease in fitness isn't so bad. For example, if a fitness drops from 81 to 80, this change is probably within the noise of the system and won't make much of a different to evolution. It will certainly be almost impossible to climb to a very high fitness if a series of small changes need to be made because the selective pressure simply won't be strong enough.
The way you solve this problem is by using something like tournament selection. In it's simplest form, every time you want to choose another individual to be born, you pick K random individuals (K is known and the "tournament size"). You calculate the fitness of each individual and whomever has the highest fitness is replicated. It doesn't matter if the fitness difference is 81 vs 80 or if its 10000 vs 2, since it simply takes the highest fitness.
Now the question is: what should you set K to? K can be thought of as the strength of selection. If you set it low (e.g., K=2) then many low fitness individuals will get lucky and slip through, being competed against other low-fitness individuals. You'll get a lot of diversity, but very little section. On the flip side, if you set K to be high (say, K=100), you're ALWAYS going to pick one of the highest fitnesses in the population, ensuring that the population average is driven closer to the max, but also driving down diversity in the population.
The particular tradeoff here depends on the specific problem. I recommend trying out different options (including your original algorithm) with a few different problems to see what happens. For example, try the all-ones problem: potential solutions are bit strings and a fitness is simply the number of 1's. If you have weak selection (as in your original example, or with K=2), you'll see that it never quite gets to a perfect all-ones solution.
So, why not always use a high K? Well consider a problem where ones are negative unless they appear in a block of four consecutive ones (or eight, or however many), when they suddenly become very positive. Such a problem is "deceptive", which means that you need to explore through solutions that look bad in order to find ones that are good. If you set your strength of selection too high, you'll never collect three ones for that final mutation to give you the fourth.
Lots of more advanced techniques exist that use tournament selection that you might want to look at. For example, varying K over time, or even within a population, select some individuals using a low K and others using a high K. It's worth reading up on some more if you're planning to build a better algorithm.

Why do Genetics Algorithms evaluate the same genome several times?

I'm trying to solve the DARP Problem, using a Genetic Algorithm DLL.
Thing is that eventhough it sometimes comes up with the right solution, others times it does not. Although Im using a really simple version of the problem. When I checked the genomes the algorithm evaluated I found out that it evaluates several times the same genome.
Why does it evaluate the same several time? wouldn't be more efficient if it does not?
There is a difference between evaluating the same chromosome twice, and using the same chromosome in a population (or different populations) more than once. The first can probably be usefully avoided; the second, maybe not.
Consider:
In some generation G1, you mate 0011 and 1100, cross them right down the middle, get lucky and fail to mutate any of the genes, and end up with 0000 and 1111. You evalate them, stick them back into the population for the next generation, and continue the algorithm.
Then in some later generation G2, you mate 0111 and 1001 at the first index and (again, ignoring mutation) end up with 1111 and 0001. One of those has already been evaluated, so why evaluate it again? Especially if evaluating that function is very expensive, it may very well be better to keep a hash table (or some such) to store the results of previous evaluations, if you can afford the memory.
But! Just because you've already generated a value for a chromosome doesn't mean you shouldn't include it naturally in the results going forward, allowing it to either mutate further or allowing it to mate with other members of the same population. If you don't do that, you're effectively punishing success, which is exactly the opposite of what you want to do. If a chromosome persists or reappears from generation to generation, that is a strong sign that it is a good solution, and a well-chosen mutation operator will act to drive it out if it is a local maximum instead of a global maximum.
The basic explanation for why a GA might evaluate the same individual is precisely because it is non-guided, so the recreation of a previously-seen (or simultaneously-seen) genome is something to be expected.
More usefully, your question could be interpreted as about two different things:
A high cost associated with evaluating the fitness function, which could be mitigated by hashing the genome, but trading off memory. This is conceivable, but I've never seen it. Usually GAs search high-dimensional-spaces so you'd end up with a very sparse hash.
A population in which many or all members have converged to a single or few patterns: at some point, the diversity of your genome will tend towards 0. This is the expected outcome as the algorithm has converged upon the best solution that it has found. If this happens too early, with mediocre results, it indicates that you are stuck in a local minimum and you have lost diversity too early.
In this situation, study the output to determine which of the two scenarios has happened:
You lose diversity because particularly-fit individuals win a very large percentage of the parental lotteries. Or,
You don't gain diversity because the population over time is quite static.
In the former case, you have to maintain diversity. Ensure that less-fit individuals get more chances, perhaps by decreasing turnover or scaling the fitness function.
In the latter case, you have to increase diversity. Ensure that you inject more randomness into the population, perhaps by increasing mutation rate or increasing turnover.
(In both cases, of course, you ultimately do want diversity to decrease as the GA converges in the solution space. So you don't just want to "turn up all the dials" and devolve into a Monte Carlo search.)
Basicly genetic algoritm consists of
initial population (size N)
fitness function
mutation operation
crossover operation (performed usually on 2 individuals by taking some parts of their genome and combining it into new individual)
At every steps it
choses random individuals
performs crossover resulting in new individuals
possibly perform mutation(change random gene in random individual)
evaluate all old and new individuals with fitness function
choose N best fitted to be a new population on next iteration
The algorythm stops when it reaches a threshold of fitness function, or if there is no changes in population in last K iterations.
So, it could stop not at the best solution, but at local maximum.
A part of the population could stay unchanged from one iteration to another, because they could have a good value of fitness function.
Also it is possible to "fall back" to previos genomes because of mutation.
There are a lot of tricks to make genetic algorytm work better : choosing appropriate population encoding into genom, finding a good fitness function, playing with crossover and mutation ratio.
Depending on the particulars of your GA, you may have the same genomes in successive populations. For example Elitism saves the best or n best genomes from each population.
Reevaluating genomes is inefficient from a computational standpoint. The easiest way to avoid this is to put a boolean HasFitness flag for each genome. You could also create a unique string key for each genome encoding and store all the fitness values in a dictionary. This lookup can get very expensive, so this is only recommended if your fitness function is expensive enough to warrant the added expense of the lookup.
Elitism aside, The GA does not evaluate the same genome repeatedly. What you are seeing is identical genomes being regenerated and reevaluated. This is because each generation is a new set of genomes, which may or may not have been evaluated before.
To avoid the reevaluation you would need to keep a list of already produced genomes, with their fitness. To access the fitness you would need to have compare each of your new population with the list, when it is not in the list you would need to evaluate it, and add it to the list.
As real world applications have thousands of parameters, you end up with millions of stored genomes. This then becomes massively expensive to search & maintain. So probably quicker to just evaluate the genome each time.

breeding parents for multiple children in genetic algorithm

I'm building my first Genetic Algorithm in javascript, using a collection of tutorials.
I'm building a somewhat simpler structure to this scheduling tutorial http://www.codeproject.com/KB/recipes/GaClassSchedule.aspx#Chromosome8, but I've run into a problem with breeding.
I get a population of 60 individuals, and now I'm picking the top two individuals to breed, and then selecting a few random other individuals to breed with the top two, am I not going to end up with a fairly small amount of parents rather quickly?
I figure I'm not going to be making much progress in the solution if I breed the top two results with each of the next 20.
Is that correct? Is there a generally accepted method for doing this?
I have a sample of genetic algorithms in Javascript here.
One problem with your approach is that you are killing diversity in the population by mating always the top 2 individuals. That will never work very well because it's too greedy, and you'll actually be defeating the purpose of having a genetic algorithm in the first place.
This is how I am implementing mating with elitism (which means I am retaining a percentage of unaltered best fit individuals and randomly mating all the rest), and I'll let the code do the talking:
// save best guys as elite population and shove into temp array for the new generation
for(var e = 0; e < ELITE; e++) {
tempGenerationHolder.push(fitnessScores[e].chromosome);
}
// randomly select a mate (including elite) for all of the remaining ones
// using double-point crossover should suffice for this silly problem
// note: this should create INITIAL_POP_SIZE - ELITE new individualz
for(var s = 0; s < INITIAL_POP_SIZE - ELITE; s++) {
// generate random number between 0 and INITIAL_POP_SIZE - ELITE - 1
var randInd = Math.floor(Math.random()*(INITIAL_POP_SIZE - ELITE));
// mate the individual at index s with indivudal at random index
var child = mate(fitnessScores[s].chromosome, fitnessScores[randInd].chromosome);
// push the result in the new generation holder
tempGenerationHolder.push(child);
}
It is fairly well commented but if you need any further pointers just ask (and here's the github repo, or you can just do a view source on the url above). I used this approach (elitism) a number of times, and for basic scenarios it usually works well.
Hope this helps.
When I've implemented genetic algorithms in the past, what I've done is to pick the parents always probabilistically - that is, you don't necessarily pick the winners, but you will pick the winners with a probability depending on how much better they are than everyone else (based on the fitness function).
I cannot remember the name of the paper to back it up, but there is a mathematical proof that "ranking" selection converges faster than "proportional" selection. If you try looking around for "genetic algorithm selection strategy" you may find something about this.
EDIT:
Just to be more specific, since pedalpete asked, there are two kinds of selection algorithms: one based on rank, one based on fitness proportion. Consider a population with 6 solutions and the following fitness values:
Solution Fitness Value
A 5
B 4
C 3
D 2
E 1
F 1
In ranking selection, you would take the top k (say, 2 or 4) and use those as the parents for your next generation. In proportional ranking, to form each "child", you randomly pick the parent with a probability based on fitness value:
Solution Probability
A 5/16
B 4/16
C 3/16
D 2/16
E 1/16
F 1/16
In this scheme, F may end up being a parent in the next generation. With a larger population size (100 for example - may be larger or smaller depending on the search space), this will mean that the bottom solutions will end up being a parent some of the time. This is OK, because even "bad" solutions have some "good" aspects.
Keeping the absolute fittest individuals is called elitism, and it does tend to lead to faster convergence, which, depending on the fitness landscape of the problem, may or may not be what you want. Faster convergence is good if it reduces the amount of effort taken to find an acceptable solution but it's bad if it means that you end up with a local optimum and ignore better solutions.
Picking the other parents completely at random isn't going to work very well. You need some mechanism whereby fitter candidates are more likely to be selected than weaker ones. There are several different selection strategies that you can use, each with different pros and cons. Some of the main ones are described here. Typically you will use roulette wheel selection or tournament selection.
As for combining the elite individuals with every single one of the other parents, that is a recipe for destroying variation in the population (as well as eliminating the previously preserved best candidates).
If you employ elitism, keep the elite individuals unchanged (that's the point of elitism) and then mate pairs of the other parents (which may or may not include some or all of the elite individuals, depending on whether they were also picked out as parents by the selection strategy). Each parent will only mate once unless it was picked out multiple times by the selection strategy.
Your approach is likely to suffer from premature convergence. There are lots of other selection techniques to pick from though. One of the more popular that you may wish to consider is Tournament selection.
Different selection strategies provide varying levels of 'selection pressure'. Selection pressure is how strongly the strategy insists on choosing the best programs. If the absolute best programs are chosen every time, then your algorithm effectively becomes a hill-climber; it will get trapped in local optimum with no way of navigating to other peaks in the fitness landscape. At the other end of the scale, no fitness pressure at all means the algorithm will blindly stumble around the fitness landscape at random. So, the challenge is to try to choose an operator with sufficient (but not excessive) selection pressure, for the problem you are tackling.
One of the advantages of the tournament selection operator is that by just modifying the size of the tournament, you can easily tweak the level of selection pressure. A larger tournament will give more pressure, a smaller tournament less.

Algorithm to optimize parameters based on imprecise fitness function

I am looking for a general algorithm to help in situations with similar constraints as this example :
I am thinking of a system where images are constructed based on a set of operations. Each operation has a set of parameters. The total "gene" of the image is then the sequential application of the operations with the corresponding parameters. The finished image is then given a vote by one or more real humans according to how "beautiful" it is.
The question is what kind of algorithm would be able to do better than simply random search if you want to find the most beautiful image? (and hopefully improve the confidence over time as votes tick in and improve the fitness function)
Given that the operations will probably be correlated, it should be possible to do better than random search. So for example operation A with parameters a1 and a2 followed by B with parameters b1 could generally be vastly superior to B followed by A. The order of operations will matter.
I have tried googling for research papers on random walk and markov chains as that is my best guesses about where to look, but so far have found no scenarios similar enough. I would really appreciate even just a hint of where to look for such an algorithm.
I think what you are looking for fall in a broad research area called metaheuristics (which include many non-linear optimization algorithms such as genetic algorithms, simulated annealing or tabu search).
Then if your raw fitness function is just giving a statistical value somehow approximating a real (but unknown) fitness function, you can probably still use most metaheuristics by (somehow) smoothing your fitness function (averaging results would do that).
Do you mean the Metropolis algorithm?
This approach uses a random walk, weighted by the fitness function. It is useful for locating local extrema in complicated fitness landscapes, but is generally slower than deterministic approaches where those will work.
You're pretty much describing a genetic algorithm in which the sequence of operations represents the "gene" ("chromosome" would be a better term for this, where the parameter[s] passed to each operation represents a single "gene", and multiple genes make up a chromosome), the image produced represents the phenotypic expression of the gene, and the votes from the real humans represent the fitness function.
If I understand your question, you're looking for an alternative algorithm of some sort that will evaluate the operations and produce a "beauty" score similar to what the real humans produce. Good luck with that - I don't think there really is any such thing, and I'm not surprised that you didn't find anything. Human brains, and correspondingly human evaluations of aesthetics, are much too staggeringly complex to be reducible to a simplistic algorithm.
Interestingly, your question seems to encapsulate the bias against using real human responses as the fitness function in genetic-algorithm-based software. This is a subject of relevance to me, since my namesake software is specifically designed to use human responses (or "votes") to evaluate music produced via a genetic process.
Simple Markov Chain
Markov chains, which you mention, aren't a bad way to go. A Markov chain is just a state machine, represented as a graph with edge weights which are transition probabilities. In your case, each of your operations is a node in the graph, and the edges between the nodes represent allowable sequences of operations. Since order matters, your edges are directed. You then need three components:
A generator function to construct the graph of allowed transitions (which operations are allowed to follow one another). If any operation is allowed to follow any other, then this is easy to write: all nodes are connected, and your graph is said to be complete. You can initially set all the edge weights to 1.
A function to traverse the graph, crossing N nodes, where N is your 'gene-length'. At each node, your choice is made randomly, but proportionally weighted by the values of the edges (so better edges have a higher chance of being selected).
A weighting update function which can be used to adjust the weightings of the edges when you get feedback about an image. For example, a simple update function might be to give each edge involved in a 'pleasing' image a positive vote each time that image is nominated by a human. The weighting of each edge is then normalised, with the currently highest voted edge set to 1, and all the others correspondingly reduced.
This graph is then a simple learning network which will be refined by subsequent voting. Over time as votes accumulate, successive traversals will tend to favour the more highly rated sequences of operations, but will still occasionally explore other possibilities.
Advantages
The main advantage of this approach is that it's easy to understand and code, and makes very few assumptions about the problem space. This is good news if you don't know much about the search space (e.g. which sequences of operations are likely to be favourable).
It's also easy to analyse and debug - you can inspect the weightings at any time and very easily calculate things like the top 10 best sequences known so far, etc. This is a big advantage - other approaches are typically much harder to investigate ("why did it do that?") because of their increased abstraction. Although very efficient, you can easily melt your brain trying to follow and debug the convergence steps of a simplex crawler!
Even if you implement a more sophisticated production algorithm, having a simple baseline algorithm is crucial for sanity checking and efficiency comparisons. It's also easy to tinker with, by messing with the update function. For example, an even more baseline approach is pure random walk, which is just a null weighting function (no weighting updates) - whatever algorithm you produce should perform significantly better than this if its existence is to be justified.
This idea of baselining is very important if you want to evaluate the quality of your algorithm's output empirically. In climate modelling, for example, a simple test is "does my fancy simulation do any better at predicting the weather than one where I simply predict today's weather will be the same as yesterday's?" Since weather is often correlated on a timescale of several days, this baseline can give surprisingly good predictions!
Limitations
One disadvantage of the approach is that it is slow to converge. A more agressive choice of update function will push promising results faster (for example, weighting new results according to a power law, rather than the simple linear normalisation), at the cost of giving alternatives less credence.
This is equivalent to fiddling with the mutation rate and gene pool size in a genetic algorithm, or the cooling rate of a simulated annealing approach. The tradeoff between 'climbing hills or exploring the landscape' is an inescapable "twiddly knob" (free parameter) which all search algorithms must deal with, either directly or indirectly. You are trying to find the highest point in some fitness search space. Your algorithm is trying to do that in less tries than random inspection, by looking at the shape of the space and trying to infer something about it. If you think you're going up a hill, you can take a guess and jump further. But if it turns out to be a small hill in a bumpy landscape, then you've just missed the peak entirely.
Also note that since your fitness function is based on human responses, you are limited to a relatively small number of iterations regardless of your choice of algorithmic approach. For example, you would see the same issue with a genetic algorithm approach (fitness function limits the number of individuals and generations) or a neural network (limited training set).
A final potential limitation is that if your "gene-lengths" are long, there are many nodes, and many transitions are allowed, then the size of the graph will become prohibitive, and the algorithm impractical.

Channel allocation algorithm

We have a set of radio nodes in close proximity to each other and would like to allocate the frequencies for them to minimize overlap. To get complete coverage of the area, radio channels need to be oversubscribed and so we will have nearby radios transmitting on the same frequency.
Sample data:
5 Frequencies
343 Radios
4158 Edges
My current best guess is to randomly generate a population of frequency allocations and to swap frequencies between radios until the best score does not improve for 10 generations. Score is the sum of 1/range^2 for radios on the same frequency.
Each edge is the distance between the radios, corrected for walls and floors. Edges above 2* the max radio range have been culled from the list.
Is there a better way?
This is basically a graph-coloring problem with a twist. Rather than all proper colorings being equally good, some proper colorings are better than others, as defined by your scoring algorithm.
I think your genetic approach is practical and will yield good (if not provably optimal) solutions, but I would definitely suggest looking at some graph-coloring papers and seeing how applicable they are. It is very likely that you will get some great ideas for deciding how your algorithm should consider the available choices.
I agree that a simulation based on random initial assignment followed by some optimization is a good approach, but you're describing an optimization procedure which does not seem optimal, if I understand correctly (you're planning to swap frequencies at random if I read you correctly). At each optimization step you could pick a "reasonable" improvement by taking one radio from each frequency group and considering the 5*4/2=10 possible swaps of frequencies between two of them, and either choose the best, or (say) one of those which has positive delta score, with probabilities proportional to the deltas in the scores.
In the spirit of "simulated annealing", once the overall score seems to have more or less stabilized, you may want to switch for a small number of steps to "high temperature" (high randomness) where you just pick the set of 5 radios and swap them all e.g. with a circular permutation of frequency assignments -- do that a few times then go to the "cooling down" part again with the procedure in the above paragraph (which tries to get a cheap simulation of a maximum-gradient descent;-).
My quick stab at it would be to use a thin plate spline (or possibly a similar, cleverer linear algebra technique) to fit a plane to the function of frequency density. The average 'altitude' of each plane (per frequency) would then tell you whether a frequency is overused (i.e. when it's higher than the others); the slope would be an indication of the spatial distribution.

Resources