Can fitness functions be of the type "minimum-value as best-value" in GA? - genetic-algorithm

I have implemented a genetic algorithm, in which the fitness function considers the coefficient of variation of the data as the fitness value, and so the closer the COV is to zero, the better. Will it still be called the fitness function? Usually the fitness value is defined such that the greater the value, the better.

An abstract & direct answer would be fitness function is a measure for your solutions and it should be well defined to drive the search space to a near-optimal solution you want to achieve for your problem instance. You can find more for designing fitness function here: A guide for fitness function design
You can definitely have an EA for the maximization or minimization problem. A general evolutionary cycle is shown in the image below (from one of the text book-Introduction to Evolutionary Computing). As per the EA cycle, you need to evaluate your solutions after the population is created, and after offsprings are created. Basically, survivor selection is the process where you would like to focus on maximization or minimization, and your problem is minimization. For your problem, you might want to adopt one of the following approaches:
When you create your fitness function, you can negate the fitness. And make sure in this approach you have to choose the highest fitness individuals for next-generation (aka in survivor selection).
Let your fitness function be positive. But as you want to approach as a minimization problem, you have to choose the lowest fitness individuals for next-generation (aka in survivor selection).

Related

Picking breeders in a genetic algorithm

I'm implementing a genetic algorithm, and I'm uncertain on how to pick the breeders for the next generation:
I am holding a list of all the past individuals calculated,
Is it ok if I select the breeders from this list? or should I rather pick the best ones from the latest generation?
If you only select from the latest generation, it's possible for your population to evolve backwards. There's no guarantee that later generations are better than earlier ones. To prevent this, some algorithms maintain a pool of "elite" individuals that continually mix with the regular population. (The strategy is called "elitism".) A particularly successful version of this approach is Coello's micro-GA, which uses a very small population with elite preservation and frequent restarts to make progress.
It's usually preferable to select the ones that have the highest fitness value
Based on a certain function that you define, evaluate individuals in your population and choose the best N ones. For example, the lightest rocks in algorithm where you want to generate light rocks.
If computing the fitness value of all individuals in your population is costly operations, you should first select a sample based on some distribution. A good one is to select in a uniform fashion (all individuals have equal probabilities to be selected)
If you can't easily define a fitness function, a good technique is to run simulations. For example if your phenotype (criteria) is hard to define, like a shape of an irregular 3D object for example.
You can try one of the following methods to select breeders( parent)
- Roulette wheel selection
- Stochastic Universal Sampling
- Tournament Selection
- Random Selection
Reference :
https://www.tutorialspoint.com/genetic_algorithms/genetic_algorithms_parent_selection.htm

Why do you need fitness scaling in Genetic Algorithms?

Reading the book "Genetic Algorithms" by David E. Goldberg, he mentions fitness scaling in Genetic Algorithms.
My understanding of this function is to constrain the strongest candidates so that they don't flood the pool for reproduction.
Why would you want to constrain the best candidates? In my mind having as many of the best candidates as early as possible would help get to the optimal solution as fast as possible.
What if your early best candidates later on turn out to be evolutionary dead ends? Say, your early fittest candidates are big, strong agents that dominate smaller, weaker candidates. If all the weaker ones are eliminated, you're stuck with large beasts that maybe have a weakness to an aspect of the environment that hasn't been encountered yet that the weak ones can handle: think dinosaurs vs tiny mammals after an asteroid impact. Or, in a more deterministic setting that is more likely the case in a GA, the weaker candidates may be one or a small amount of evolutionary steps away from exploring a whole new fruitful part of the fitness landscape: imagine the weak small critters evolving flight, opening up a whole new world of possibilities that the big beasts most likely will never touch.
The underlying problem is that your early strongest candidates may actually be in or around a local maximum in fitness space, that may be difficult to come out of. It could be that the weaker candidates are actually closer to the global maximum.
In any case, by pruning your population aggressively, you reduce the genetic diversity of your population, which in general reduces the search space you are covering and limits how fast you can search this space. For instance, maybe your best candidates are relatively close to the global best solution, but just inbreeding that group may not move it much closer to it, and you may have to wait for enough random positive mutations to happen. However, perhaps one of the weak candidates that you wanted to cut out has some gene that on its own doesn't help much, but when crossed with the genes from your strong candidates in may cause a big evolutionary jump! Imagine, say, a human crossed with spider DNA.
#sgvd's answer makes valid points but I would like to elaborate more.
First of all, we need to define what fitness scaling actually means. If it means just multiplying the fitnesses by some factor then this does not change the relationships in the population - if the best individual had 10 times higher fitness than the worst one, after such multiplication this is still true (unless you multiply by zero which makes no real sense). So, a much more sensible fitness scaling is an affine transformation of the fitness values:
scaled(f) = a * f + b
i.e. the values are multiplied by some number and offset by another number, up or down.
Fitness scaling makes sense only with certain types of selection strategies, namely those where the selection probability is proportional to the fitness of the individuals1.
Fitness scaling plays, in fact, two roles. The first one is merely practical - if you want a probability to be proportional to the fitness, you need the fitness to be positive. So, if your raw fitness value can be negative (but is limited from below), you can adjust it so you can compute probabilities out of it. Example: if your fitness gives values from the range [-10, 10], you can just add 10 to the values to get all positive values.
The second role is, as you and #sgvd already mentioned, to limit the capability of the strongest solutions to overwhelm the weaker ones. The best illustration would be with an example.
Suppose that your raw fitness values gives values from the range [0, 100]. If you left it this way, the worst individuals would have zero probability of being selected, and the best ones would have up to 100x higher probability than the worst ones (excluding the really worst ones). However, let's set the scaling factors to a = 1/2, b = 50. Then, the range is transformed to [50, 100]. And right away, two things happen:
Even the worst individuals have non-zero probability of being selected.
The best individuals are now only 2x more likely to be selected than the worst ones.
Exploration vs. exploitation
By setting the scaling factors you can control whether the algorithm will do more exploration over exploitation and vice versa. The more "compressed"2 the values are going to be after the scaling, the more exploration is going to be done (because the likelihood of the best individuals being selected compared to the worst ones will be decreased). And vice versa, the more "expanded"2 are the values going to be, the more exploitation is going to be done (because the likelihood of the best individuals being selected compared to the worst ones will be increased).
Other selection strategies
As I have already written at the beginning, fitness scaling only makes sense with selection strategies which derive the selection probability proportionally from the fitness values. There are, however, other selection strategies that do not work like this.
Ranking selection
Ranking selection is identical to roulette wheel selection but the numbers the probabilities are derived from are not the raw fitness values. Instead, the whole population is sorted by the raw fitness values and the rank (i.e. the position in the sorted list) is the number you derive the selection probability from.
This totally erases the discrepancy when there is one or two "big" individuals and a lot of "small" ones. They will just be ranked.
Tournament selection
In this type of selection you don't even need to know the absolute fitness values at all, you just need to be able to compare two of them and tell which one is better. To select one individual using tournament selection, you randomly pick a number of individuals from the population (this number is a parameter) and you pick the best one of them. You repeat that as long as you have selected enough individuals.
Here you can also control the exploration vs. exploitation thing by the size of the tournament - the larger the tournament is the higher is the chance that the best individuals will take part in the tournaments.
1 An example of such selection strategy is the classical roulette wheel selection. In this selection strategy, each individual has its own section of the roulette wheel which is proportional in size to the particular individual's fitness.
2 Assuming the raw values are positive, the scaled values get compressed as a goes down to zero and as b goes up. Expansion goes the other way around.

Evolutionary algorithm - Traveling Salesman

I try to solve this problem using genetic algorithm and get difficult to choose the fitness function.
My problem is a little differnt than the original Traveling Salesman Problem ,since the population and maybe also the win unit not neccesrly contain all the cities.
So , I have 2 value for each unit: the amount of cities he visit, the total time and the order he visit the cities.
I tried 2-3 fitness function but they don't give good sulotion.
I need idea of good fitness function which take in account the amount of cities he visited and also the total time.
Thanks!
In addition to Peladao's suggestions of using a pareto approach or some kind of weighted sum there are two more possibilities that I'd like to mention for the sake of completeness.
First, you could prioritize your fitness functions. So that the individuals in the population are ranked by first goal, then second goal, then third goal. Therefore only if two individuals are equal in the first goal they will be compared by second goal. If there is a clear dominance in your goals this may be a feasible approach.
Second, you could define two of your goals as constraints that you penalize only when they exceed a certain threshold. This may be feasible when e.g. the amount of cities should not be in a certain range, e.g. [4;7], but doesn't matter if it's 4 or 5. This is similar to a weighted sum approach where the contribution of the individual goals to the combined fitness value differs by several orders of magnitude.
The pareto approach is the only one that treats all objectives with equal importance. It requires special algorithms suited for multiobjective optimization though, such as NSGA-II, SPEA2, AbYSS, PAES, MO-TS, ...
In any case, it would be good if you could show the 2-3 fitness functions that you tried. Maybe there were rather simple errors.
Multiple-objective fitness functions can be implemented using a Pareto optimal.
You could also use a weighted sum of different fitness values.
For a good and readable introduction into multiple-objective optimisation and GA: http://www.calresco.org/lucas/pmo.htm

Reasonable bit string size for genetic algorithm convergeance

In a typical genetic algorithm, is there any guideline for estimating the generations required to converge given the amount of entropy in the description of an individual in the population?
Also, I suppose it is reasonable to also require the number of offspring per generation and rate of mutation, but adjustment of those parameters is of less interest to me at the moment.
Well, there are not any concrete guidelines in the form of mathematical models, but there are several concepts that people use to communicate about parameter settings and advice on how to choose them. One of these concepts is diversity, which would be similar to the entropy that you mentioned. The other concept is called selection pressure and determines the chance an individual has to be selected based on its relative fitness.
Diversity and selection pressure can be computed for each generation, but the change between generations is very difficult to estimate. You would also need models that predict the expected quality of your crossover and mutation operator in order to estimate the fitness distribution in the next generation.
There have been work published on these topics very recently:
* Chicano and Alba. 2011. Exact Computation of the Expectation Curves of the Bit-Flip Mutation using Landscapes Theory
* Chicano, Whitley, and Alba. 2012. Exact computation of the expectation curves for uniform crossover
Is your question resulting from a general research interest or do you seek practical guidence?
No. If you define a mathematical model of the algorithm (initial population, combination function, mutation function) you can use normal mathematical methods to calculate what you want to know, but "typical genetic algorithm" is too vague to have any meaningful answer.
If you want to set the hyperparameters of some genetic algorithm (eg number of "DNA" bits) than this is typically done in the usual way for any machine learning algorithm, with a cross validation set.

Algorithm to optimize parameters based on imprecise fitness function

I am looking for a general algorithm to help in situations with similar constraints as this example :
I am thinking of a system where images are constructed based on a set of operations. Each operation has a set of parameters. The total "gene" of the image is then the sequential application of the operations with the corresponding parameters. The finished image is then given a vote by one or more real humans according to how "beautiful" it is.
The question is what kind of algorithm would be able to do better than simply random search if you want to find the most beautiful image? (and hopefully improve the confidence over time as votes tick in and improve the fitness function)
Given that the operations will probably be correlated, it should be possible to do better than random search. So for example operation A with parameters a1 and a2 followed by B with parameters b1 could generally be vastly superior to B followed by A. The order of operations will matter.
I have tried googling for research papers on random walk and markov chains as that is my best guesses about where to look, but so far have found no scenarios similar enough. I would really appreciate even just a hint of where to look for such an algorithm.
I think what you are looking for fall in a broad research area called metaheuristics (which include many non-linear optimization algorithms such as genetic algorithms, simulated annealing or tabu search).
Then if your raw fitness function is just giving a statistical value somehow approximating a real (but unknown) fitness function, you can probably still use most metaheuristics by (somehow) smoothing your fitness function (averaging results would do that).
Do you mean the Metropolis algorithm?
This approach uses a random walk, weighted by the fitness function. It is useful for locating local extrema in complicated fitness landscapes, but is generally slower than deterministic approaches where those will work.
You're pretty much describing a genetic algorithm in which the sequence of operations represents the "gene" ("chromosome" would be a better term for this, where the parameter[s] passed to each operation represents a single "gene", and multiple genes make up a chromosome), the image produced represents the phenotypic expression of the gene, and the votes from the real humans represent the fitness function.
If I understand your question, you're looking for an alternative algorithm of some sort that will evaluate the operations and produce a "beauty" score similar to what the real humans produce. Good luck with that - I don't think there really is any such thing, and I'm not surprised that you didn't find anything. Human brains, and correspondingly human evaluations of aesthetics, are much too staggeringly complex to be reducible to a simplistic algorithm.
Interestingly, your question seems to encapsulate the bias against using real human responses as the fitness function in genetic-algorithm-based software. This is a subject of relevance to me, since my namesake software is specifically designed to use human responses (or "votes") to evaluate music produced via a genetic process.
Simple Markov Chain
Markov chains, which you mention, aren't a bad way to go. A Markov chain is just a state machine, represented as a graph with edge weights which are transition probabilities. In your case, each of your operations is a node in the graph, and the edges between the nodes represent allowable sequences of operations. Since order matters, your edges are directed. You then need three components:
A generator function to construct the graph of allowed transitions (which operations are allowed to follow one another). If any operation is allowed to follow any other, then this is easy to write: all nodes are connected, and your graph is said to be complete. You can initially set all the edge weights to 1.
A function to traverse the graph, crossing N nodes, where N is your 'gene-length'. At each node, your choice is made randomly, but proportionally weighted by the values of the edges (so better edges have a higher chance of being selected).
A weighting update function which can be used to adjust the weightings of the edges when you get feedback about an image. For example, a simple update function might be to give each edge involved in a 'pleasing' image a positive vote each time that image is nominated by a human. The weighting of each edge is then normalised, with the currently highest voted edge set to 1, and all the others correspondingly reduced.
This graph is then a simple learning network which will be refined by subsequent voting. Over time as votes accumulate, successive traversals will tend to favour the more highly rated sequences of operations, but will still occasionally explore other possibilities.
Advantages
The main advantage of this approach is that it's easy to understand and code, and makes very few assumptions about the problem space. This is good news if you don't know much about the search space (e.g. which sequences of operations are likely to be favourable).
It's also easy to analyse and debug - you can inspect the weightings at any time and very easily calculate things like the top 10 best sequences known so far, etc. This is a big advantage - other approaches are typically much harder to investigate ("why did it do that?") because of their increased abstraction. Although very efficient, you can easily melt your brain trying to follow and debug the convergence steps of a simplex crawler!
Even if you implement a more sophisticated production algorithm, having a simple baseline algorithm is crucial for sanity checking and efficiency comparisons. It's also easy to tinker with, by messing with the update function. For example, an even more baseline approach is pure random walk, which is just a null weighting function (no weighting updates) - whatever algorithm you produce should perform significantly better than this if its existence is to be justified.
This idea of baselining is very important if you want to evaluate the quality of your algorithm's output empirically. In climate modelling, for example, a simple test is "does my fancy simulation do any better at predicting the weather than one where I simply predict today's weather will be the same as yesterday's?" Since weather is often correlated on a timescale of several days, this baseline can give surprisingly good predictions!
Limitations
One disadvantage of the approach is that it is slow to converge. A more agressive choice of update function will push promising results faster (for example, weighting new results according to a power law, rather than the simple linear normalisation), at the cost of giving alternatives less credence.
This is equivalent to fiddling with the mutation rate and gene pool size in a genetic algorithm, or the cooling rate of a simulated annealing approach. The tradeoff between 'climbing hills or exploring the landscape' is an inescapable "twiddly knob" (free parameter) which all search algorithms must deal with, either directly or indirectly. You are trying to find the highest point in some fitness search space. Your algorithm is trying to do that in less tries than random inspection, by looking at the shape of the space and trying to infer something about it. If you think you're going up a hill, you can take a guess and jump further. But if it turns out to be a small hill in a bumpy landscape, then you've just missed the peak entirely.
Also note that since your fitness function is based on human responses, you are limited to a relatively small number of iterations regardless of your choice of algorithmic approach. For example, you would see the same issue with a genetic algorithm approach (fitness function limits the number of individuals and generations) or a neural network (limited training set).
A final potential limitation is that if your "gene-lengths" are long, there are many nodes, and many transitions are allowed, then the size of the graph will become prohibitive, and the algorithm impractical.

Resources