Genetic Algorithms: Can my Fitness Function be too complicated?

In a genetic algorithm, would it be correct to make the fitness function something other than a mathematical equation? Could it contain a recursive function and a loop inside of it?
The thing is, I'm evaluating whether I can work with genetic algorithms for my thesis, and the fitness function I have in mind could be a little complicated. But maybe not; I'll just have to make sure the program can handle such a function and that it doesn't create a bottleneck, right?
Basic idea:
FitnessFunction() {
    fitness = RecursiveFunction();
    return fitness;
}

RecursiveFunction() {
    fitness = 0;
    do {
        // do something that updates fitness
    } while (other_condition);
    if (another_condition) {
        return RecursiveFunction();
    }
    return fitness;
}

It will be a bottleneck, but that is expected. The evaluation function usually takes the vast majority of the execution time, since the genetic operators (crossover, mutation) are very simple operations in comparison. I've seen GAs where the evaluation function simulates a house structure being hit by an earthquake, so you should be fine.
It is worth it, however, to isolate the function, measure its execution time, and optimize it as much as possible. Consider that it will run for hundreds or thousands of individuals, over many generations, and that you will repeat the whole process many times while tweaking parameters and your GA implementation.
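For example, a minimal C# sketch of isolating and timing the evaluation step; the Evaluate method and the random population below are placeholders, not the asker's actual code:

using System;
using System.Diagnostics;

// A minimal sketch of timing the evaluation step in isolation.
// Evaluate() is a placeholder for the real, possibly recursive fitness function.
class FitnessTiming
{
    static double Evaluate(double[] genes)
    {
        double sum = 0;                              // pretend this is the expensive work
        foreach (var g in genes) sum += g * g;
        return sum;
    }

    static void Main()
    {
        var rng = new Random(42);
        var population = new double[1000][];
        for (int i = 0; i < population.Length; i++)
            population[i] = new[] { rng.NextDouble(), rng.NextDouble() };

        var sw = Stopwatch.StartNew();
        foreach (var individual in population)
            Evaluate(individual);
        sw.Stop();
        Console.WriteLine($"Evaluated {population.Length} individuals in {sw.ElapsedMilliseconds} ms");
    }
}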

As long as your function returns a fitness value (it seems like it does), I don't see a problem with this. In fact, GAs and evolutionary computation in general are well suited to complex fitness functions, which are in many cases non-differentiable and therefore hard to use with other training methods like gradient descent.

Related

Genetic programming for objective / fitness function determination

I hope this question is appropriate. I'm looking for a solution to a genetic programming implementation I have in mind, but I'm still new to AI and have no clue where or how to solve the problem.
So I recently started experimenting with genetic algorithms (finding optimal solutions), and have now started looking at genetic programming (finding the optimal program for solutions). Mutation, crossover, etc. on decision trees are relatively straightforward to me, but I am still struggling to wrap my head around how to implement problems.
If a genetic program can find the mathematical function that maps a set of correlated input values to outputs, then, looking at an optimization problem (say, the bin packing problem), how can you use a genetic program to determine the best objective / fitness function for that problem? You "should" then be able to use that generated fitness function (from the GP) in the GA to find the optimal solution to the problem, in such a way that your GA performs more robustly.
If I understand this correctly, how would the GP process be feasible? Is it a commonly used technique? Is the approach related to fitness landscape evaluation for optimization problems, or something else? Any help will be greatly appreciated. I know there's a solution; I just don't know where or how to search for it, or what it's referred to as.
Thank you in advance.
... [I] am still struggling to wrap my head around the implementation of problems.
... how can you use a genetic program to determine the most optimal objective / fitness function for the problem?
In evolutionary algorithms (GA and GP in particular), a problem is basically defined by a fitness function. In such a context it does not make sense to talk about an automated way of finding the fitness function, because that translates to finding a problem, which doesn't seem to make any sense to me.
If your idea lies on a different plane and you think it still makes sense, please try to state it more clearly.
To answer werediver's comments and provide more context for the specified problem (which can't be summed up in a comment), I can sort of answer my own question, although I still consider the implementation extremely complex for the "average skill level" in evolutionary programming.
Nevertheless, in case someone else finds this useful: after reading a whole bunch of journal articles, my thoughts point to fitness landscape analysis (landscapes, surveying landscape correlation) as an optimization problem.
I interpret it as implementing a number of fitness landscape techniques, such as entropy, autocorrelation, correlation length, fitness clouds, evolvability, etc. (1, 2, 3, 4), where one "should" be able to calculate a number of landscape coefficients and incorporate / utilize them as the fitness function for your GP. The GP therefore basically generates a fitness function for your optimization problem (by means such as symbolic regression) and optimizes it based on the assessed fitness landscape analysis, thereby changing the objective function.
As the literature states, the quality of your fitness function changes your search landscape and correspondingly influences the performance of optimization algorithms such as GAs and PSOs (fitness landscapes are quantified in terms of "ruggedness", "deception", etc.).
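As a rough illustration of one such landscape coefficient, here is a small C# sketch that estimates the lag-1 autocorrelation of fitness along a random walk of bit-flip neighbours; the bit-string encoding and the OneMax-style fitness are made-up placeholders, not part of the approach described above:

using System;
using System.Linq;

// A rough sketch of one common landscape measure: the lag-1 autocorrelation of
// fitness along a random walk of bit-flip neighbours. The encoding and fitness
// below are placeholders; a real analysis would use your problem's own encoding.
class LandscapeWalk
{
    static readonly Random Rng = new Random(1);

    static double Fitness(bool[] x) => x.Count(b => b);   // placeholder (OneMax-style) fitness

    static double Lag1Autocorrelation(int bits = 64, int steps = 10000)
    {
        var x = new bool[bits];
        var f = new double[steps];
        for (int t = 0; t < steps; t++)
        {
            x[Rng.Next(bits)] ^= true;                     // move to a random neighbour
            f[t] = Fitness(x);
        }
        double mean = f.Average();
        double variance = f.Sum(v => (v - mean) * (v - mean));
        double covariance = 0;
        for (int t = 0; t + 1 < steps; t++)
            covariance += (f[t] - mean) * (f[t + 1] - mean);
        return covariance / variance;                      // near 1 => smooth, near 0 => rugged
    }

    static void Main() => Console.WriteLine(Lag1Autocorrelation());
}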
Another image I found quite useful (it puts things in perspective) illustrates three defined methodologies for optimizing objective functions via fitness landscape analysis.
On the programming side, the two most popular software libraries I could find so far (though the implementation complexity of either is still unknown to me) are:
Paradiseo metaheuristics, with fitness landscape ref doc (C++)
HeuristicLab, automatic fitness analysis ref doc (C# / C++)
So if anyone else has some valid info or experience to offer, it'll be very interesting to hear your perspective, but from what I have read, this is not at all an "easy or transparent" approach.

How critical is the accuracy of a fitness calculation in a genetic algorithm? Will inaccuracies work their way out over time?

Consider a genetic algorithm where fitness is a function of how many times a thing is "liked", à la Facebook.
Let's assume that some percentage of the time, likes originate due to factors beyond the strength of the content, and thus, the fitness of one chromosome vs. another is due in some part to random chance. Let's ballpark that number at 30% -- that is, 30% of the likes of any given thing are not due to deliberate action.
If we have 100 chromosomes, and we know our average conversion for a like is 5% (likes / impressions), how many impressions do we need to have to feel confident in the fitness rankings of each individual chromosome?
Optimising the wrong thing is a pretty good way of getting a wrong answer, and optimisation when you only have a noisy version of the evaluation function is different. See for example https://en.wikipedia.org/wiki/Stochastic_approximation. Even supposing that you find a plausible answer, you need to consider how sensitive that answer is to noise.
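One common mitigation, sketched below in C# under illustrative assumptions (a made-up TrueQuality value and a 30% random component, mirroring the numbers in the question), is to evaluate each chromosome several times and use the mean as its fitness:

using System;
using System.Linq;

// A small sketch of one way to cope with noisy fitness: sample each chromosome's
// fitness several times and average. The quality values and noise model are
// illustrative assumptions, not real like/impression data.
class NoisyFitness
{
    static readonly Random Rng = new Random(7);

    static double NoisyLikeRate(double trueQuality) =>
        0.7 * trueQuality + 0.3 * Rng.NextDouble();        // 30% of the signal is random

    static double AveragedFitness(double trueQuality, int samples = 30) =>
        Enumerable.Range(0, samples).Select(_ => NoisyLikeRate(trueQuality)).Average();

    static void Main()
    {
        Console.WriteLine(AveragedFitness(0.05));           // noisy estimate of a 5% conversion
        Console.WriteLine(AveragedFitness(0.05, 1000));     // more samples, tighter estimate
    }
}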

Optimum finding in genetic algorithms

I am working on my M.Sc. dissertation, and in the theoretical aspect of my thesis I have a big problem.
Suppose we want to use genetic algorithms.
We have two kinds of functions:
a) functions with a relation like this: ||x1 - x2|| >> ||f(x1) - f(x2)||
for example: y = (1/10)x^2
b) functions with a relation like this: ||x1 - x2|| << ||f(x1) - f(x2)||
for example: y = x^2
My question is: which of the above kinds of functions is more difficult when we want to use genetic algorithms to find the optimum (never mind minimum or maximum)?
Thank you a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-biased selection, as long as the ordering relations hold between individuals, there will be no effects.
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x)=x^2, converging as fast as possible is probably great, because there's only one optimum, so find it as soon as possible. For a more complex function, slower convergence can be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
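A small C# sketch of that point, using made-up fitness values: shifting all fitnesses by a constant flattens roulette-wheel selection probabilities (lower selection pressure), while a size-2 tournament behaves identically because only the ordering matters:

using System;
using System.Linq;

// A sketch of how fitness translation affects roulette-wheel selection but not
// tournament selection. The fitness values are made up for illustration.
class SelectionDemo
{
    static readonly Random Rng = new Random(3);

    static int RouletteSelect(double[] fitness)
    {
        double r = Rng.NextDouble() * fitness.Sum();
        double cumulative = 0;
        for (int i = 0; i < fitness.Length; i++)
        {
            cumulative += fitness[i];
            if (r <= cumulative) return i;
        }
        return fitness.Length - 1;
    }

    static int TournamentSelect(double[] fitness, int k = 2)
    {
        int best = Rng.Next(fitness.Length);
        for (int i = 1; i < k; i++)
        {
            int challenger = Rng.Next(fitness.Length);
            if (fitness[challenger] > fitness[best]) best = challenger;
        }
        return best;
    }

    static void Main()
    {
        double[] raw = { 1, 4, 9, 16 };                          // f(x) = x^2 on x = 1..4
        double[] shifted = raw.Select(v => v + 100).ToArray();   // same ordering, different range

        int trials = 100000, rawBest = 0, shiftedBest = 0, tournamentBest = 0;
        for (int t = 0; t < trials; t++)
        {
            if (RouletteSelect(raw) == 3) rawBest++;
            if (RouletteSelect(shifted) == 3) shiftedBest++;
            if (TournamentSelect(shifted) == 3) tournamentBest++;
        }
        // Roulette picks the best individual far more often from raw (~53%) than
        // from shifted (~27%); the tournament result (~44%) is unchanged by the shift.
        Console.WriteLine($"{rawBest} {shiftedBest} {tournamentBest}");
    }
}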
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.

Methods for crossover in genetic algorithms

When reading about the crossover part of genetic algorithms, books and papers usually refer to methods of simply swapping out bits in the data of two selected candidates which are to reproduce.
I have yet to see actual code of a genetic algorithm implemented for real industry applications, but I find it hard to imagine that it's enough to operate on simple data types.
I always imagined that the various stages of genetic algorithms would be performed on complex objects involving complex mathematical operations, as opposed to just swapping out some bits in single integers.
Even Wikipedia just lists these kinds of operations for crossover.
Am I missing something important or are these kinds of crossover methods really the only thing used?
There are several techniques in use... although the need for parallelism and several generations (and sometimes a big population) leads to using techniques that perform well...
Another point to keep in mind is that "swapping out some bits" when modeled correctly resembles a simple and rather accurate version of what happens naturally (recombination of genes, mutations)...
For a very simple and nicely written walkthrough see http://www.electricmonk.nl/log/2011/09/28/evolutionary-algorithm-evolving-hello-world/
For some more info see
http://www.codeproject.com/KB/recipes/btl_ga.aspx
http://www.codeproject.com/KB/recipes/genetics_dot_net.aspx
http://www.codeproject.com/KB/recipes/GeneticandAntAlgorithms.aspx
http://www.c-sharpcorner.com/UploadFile/mgold/GeneticAlgorithm12032005044205AM/GeneticAlgorithm.aspx
I always imagined that the various stages of genetic algorithms would be performed on complex objects involving complex mathematical operations, as opposed to just swapping out some bits in single integers.
You probably think complex mathematical operations are used because you think the Genetic Algorithm has to modify a complex object. That's usually not how a Genetic Algorithm works.
So what does happen? Well, usually, the programmer (or scientist) will identify various parameters in a configuration, and then map those parameters to integers/floats. This does limit the directions in which the algorithm can explore, but that's the only realistic method of getting any results.
Let's look at evolving an antenna. You could perform a complex simulation with a genetic algorithm rearranging copper molecules, but that would be very complex and take forever. Instead, you'd identify antenna "parameters". Most antennas are built up out of certain lengths of wire, bent at certain places in order to maximize their coverage area. So you could identify a couple of parameters: number of starting wires, section lengths, angle of the bends. All of those are easily represented as integer numbers, and are therefore easy for the Genetic Algorithm to manipulate. The resulting manipulations can be fed into an "antenna simulator" to see how well it receives signals.
In short, when you say:
I find it hard to imagine that it's enough to operate on simple data types.
you must realize that simple data types can be mapped to much more intricate structures. The Genetic Algorithm doesn't have to know anything about these intricate structures. All it needs to know is how it can manipulate the parameters that build up the intricate structures. That is, after all, the way DNA works.
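A toy C# sketch of that mapping, with made-up parameter names and ranges (this is not a real antenna model): the GA only ever manipulates the flat numeric genome, and a separate decoder builds the intricate structure that a simulator would score:

using System;

// A toy sketch of mapping a flat genome to a structured object. All names and
// parameter ranges here are invented purely for illustration.
class AntennaDecoder
{
    record Antenna(int WireCount, double[] SectionLengths, double[] BendAngles);

    static Antenna Decode(double[] genome)
    {
        int wires = 1 + (int)(genome[0] * 4);              // 1..4 starting wires
        var lengths = new double[3];
        var angles = new double[3];
        for (int i = 0; i < 3; i++)
        {
            lengths[i] = 0.05 + genome[1 + i] * 0.45;      // 5 cm .. 50 cm sections
            angles[i] = genome[4 + i] * 180.0;             // 0 .. 180 degree bends
        }
        return new Antenna(wires, lengths, angles);
    }

    static void Main()
    {
        var genome = new[] { 0.2, 0.5, 0.9, 0.1, 0.3, 0.7, 0.4 };  // what crossover/mutation act on
        var antenna = Decode(genome);                               // what the simulator would score
        Console.WriteLine($"{antenna.WireCount} wires, first section {antenna.SectionLengths[0]:F2} m");
    }
}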
In genetic algorithms, bit swapping of some variety is usually used.
As you have said:
I always imagined that the various stages of genetic algorithms would
be performed on complex objects involving complex mathematical
operations
What I think you are looking for is genetic programming, where the chromosome describes a program; in this case you would be able to do more with the operators when applying crossover.
Also make sure you have understood the difference between your fitness function in genetic algorithms, and the operators within a chromosome in genetic programming.
Different applications require different encodings. The goal certainly is to find the most effective encoding, and often enough the simple encodings are better suited. So, for example, a job shop scheduling problem might be represented as a list of permutations which represent the execution order of the jobs on the different machines (a so-called job sequence matrix). It can, however, also be represented as a list of priority rules that construct the schedule. A traveling salesman problem or quadratic assignment problem is typically represented by a single permutation that denotes a tour in one case or an assignment in the other. Optimizing the parameters of a simulation model or finding the root of a complex mathematical function is typically represented by a vector of real values.
For all those still-simple types, crossover and mutation operators exist. For permutations these include e.g. OX, ERX, CX, PMX, UBX, OBX, and many more. If you can combine a number of simple representations to represent a solution to your complex problem, you might reuse these operators and apply them to each component individually.
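As a concrete illustration of one of those operators, here is a minimal C# sketch of order crossover (OX) on a permutation; the parent tours are arbitrary examples:

using System;
using System.Linq;

// A sketch of order crossover (OX): the child keeps a slice of parent 1 and
// receives the remaining elements in the order they appear in parent 2, so the
// result is always a valid permutation and no repair step is needed.
class OrderCrossover
{
    static int[] OX(int[] p1, int[] p2, int cutA, int cutB)
    {
        int n = p1.Length;
        var child = Enumerable.Repeat(-1, n).ToArray();
        var used = new bool[n + 1];                         // genes here are 1..n

        for (int i = cutA; i < cutB; i++)                   // copy the slice from parent 1
        {
            child[i] = p1[i];
            used[p1[i]] = true;
        }

        int write = cutB % n;
        for (int k = 0; k < n; k++)                         // fill the rest from parent 2, in order
        {
            int gene = p2[(cutB + k) % n];
            if (used[gene]) continue;
            child[write] = gene;
            used[gene] = true;
            write = (write + 1) % n;
        }
        return child;
    }

    static void Main()
    {
        var p1 = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
        var p2 = new[] { 8, 6, 4, 2, 7, 5, 3, 1 };
        Console.WriteLine(string.Join(" ", OX(p1, p2, 2, 5)));  // e.g. a tour in a TSP-style encoding
    }
}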
For crossover to work effectively, a few properties should be fulfilled:
The crossover should conserve those parts that are similar in both parents
For those parts that are not similar, the crossover should not introduce an element that is not already part of one of the parents
The crossover of two solutions should, if possible, produce a feasible solution
You want to avoid so-called unwanted mutations in your crossovers. In that light, you also want to avoid having to repair a large part of your chromosomes after crossover, since that also introduces unwanted mutations.
If you want to experiment with different operators and problems, we have a nice GUI-driven software: HeuristicLab.
Simple bit swapping is usually the way to go. The key thing to note is the encoding used in each candidate solution. Solutions should be encoded such that minimal or no error is introduced into the new offspring. Any error would require the algorithm to provide a fix, which leads to increased processing time.
As an example, I have developed a university timetable generator in C# that uses an integer encoding to represent the timeslots available each day. This representation allows a very efficient single-point or multi-point crossover operator, which uses the LINQ Intersect function to combine parents.
Typical multi-point crossover with hill-climbing:
public List<TimeTable> CrossOver(List<TimeTable> parents) // Multipoint crossover
{
    var baby1 = new TimeTable { Schedule = new List<string>(), Fitness = 0 };
    var baby2 = new TimeTable { Schedule = new List<string>(), Fitness = 0 };
    for (var gen = 0; gen < parents[0].Schedule.Count; gen++)
    {
        if (rnd.NextDouble() < (double) CrossOverProb)
        {
            baby2.Schedule.Add(parents[0].Schedule[gen]);
            baby1.Schedule.Add(parents[1].Schedule[gen]);
        }
        else
        {
            baby1.Schedule.Add(parents[0].Schedule[gen]);
            baby2.Schedule.Add(parents[1].Schedule[gen]);
        }
    }
    CalculateFitness(ref baby1);
    CalculateFitness(ref baby2);
    // allow hill-climbing
    parents.Add(baby1);
    parents.Add(baby2);
    return parents.OrderByDescending(i => i.Fitness).Take(2).ToList();
}

Evaluating a function at a particular value in parallel

The question may seem vague, but let me explain it.
Suppose we have a function f(x,y,z ....) and we need to find its value at the point (x1,y1,z1 .....).
The most trivial approach is to just replace (x,y,z ...) with (x1,y1,z1 .....).
Now suppose that the function takes a lot of time to evaluate and I want to parallelize the algorithm that evaluates it. Obviously it will depend on the nature of the function, too.
So my question is: what are the constraints I have to look for when thinking about parallelizing f(x,y,z...)?
If possible, please share links to study.
Asking the question in such a general way does not permit very specific advice to be given.
I'd begin the analysis by looking for ways to evaluate or rewrite the function using groups of variables that interact closely, creating intermediate expressions that can be used to make the final evaluation. You may find a way to do this involving a hierarchy of subexpressions that leads from the variables themselves to the final function.
In general the shorter and wider such an evaluation tree is, the greater the degree of parallelism. There are two cautionary notes to keep in mind that detract from "more parallelism is better."
For one thing a highly parallel approach may actually involve more total computation than your original "serial" approach. In fact some loss of efficiency in this regard is to be expected, since a serial approach can take advantage of all prior subexpression evaluations and maximize their reuse.
For another thing the parallel evaluation will often have worse rounding/accuracy behavior than a serial evaluation chosen to give good or optimal error estimates.
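A toy C# sketch of the evaluation-tree idea, using a completely made-up function f(x, y, z, w) = g(x, y) * h(z, w) + g(x, y): the two independent subexpressions can be evaluated concurrently and combined at the root, while the serial reuse of g is preserved:

using System;
using System.Threading.Tasks;

// A toy sketch of evaluating independent subexpressions in parallel.
// G and H are made-up stand-ins for expensive intermediate expressions.
class SubexpressionParallel
{
    static double G(double x, double y) => Math.Sin(x) * Math.Cos(y);   // pretend this is expensive
    static double H(double z, double w) => Math.Exp(-z * z) + w;        // pretend this is expensive

    static double F(double x, double y, double z, double w)
    {
        var gTask = Task.Run(() => G(x, y));     // independent subexpressions
        var hTask = Task.Run(() => H(z, w));     // evaluated concurrently
        double g = gTask.Result, h = hTask.Result;
        return g * h + g;                        // reuse g, as a serial evaluation would
    }

    static void Main() => Console.WriteLine(F(0.1, 0.2, 0.3, 0.4));
}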
A lot of work has been done on evaluations that involve matrices, where there is usually a lot of symmetry to how the function value depends on its arguments. So it helps to be familiar with numerical linear algebra and parallel algorithms that have been developed there.
Another area where a lot is known is for multivariate polynomial and rational functions.
When the function is transcendental, one might hope for some transformations or refactoring that makes the dependence more tractable (algebraic).
Not directly relevant to your question are algorithms that amortize the cost of computing function values across a number of arguments. For example in computing solutions to ordinary differential equations, there may be "multi-step" methods that share the cost of evaluating derivatives at intermediate points by reusing those values several times.
I'd suggest that your concern to speed up the evaluation of the function suggests that you plan to perform more than one evaluation. So you might think about ways to take advantage of prior evaluations or perform evaluations at related arguments in a way that contributes to your search for parallelism.
Added: Some links and discussion of search strategy
Most authors use the phrase "parallel function evaluation" to mean evaluating the same function at multiple argument points. See for example:
[Coarse Grained Parallel Function Evaluation -- Rulon and Youssef]
http://cdsweb.cern.ch/record/401028/files/p837.pdf
A search strategy to find the kind of material Gaurav Kalra asks about should try to avoid those. For example, we might include "fine-grained" in our search terms. It's also effective to focus on specific kinds of functions, e.g. "polynomial evaluation" rather than "function evaluation". Here for example we have a treatment of some well-known techniques for "fast" evaluations applied to design for GPU-based computation:
[How to obtain efficient GPU kernels -- Cruz, Layton, and Barba]
http://arxiv.org/PS_cache/arxiv/pdf/1009/1009.3457v1.pdf
(from their Abstract) "Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU."
Another search term that might be worth excluding is "pipelined". This term invariably discusses the sort of parallelism that can be used when multiple function evaluations are to be done. Early stages of the computation can be done in parallel with later stages, but on different inputs. So that's a search term that one might want to exclude. Or not.
Here's a paper that discusses n-fold speedup for n-variate polynomial evaluation over finite fields GF(p). This might be of direct interest for cryptographic applications, but the approach via a modified Horner's method may be interesting for its potential for generalization:
[Comparison of Bit and Word Level Algorithms for Evaluating Unstructured Functions over Finite Rings -- Sunar and Cyganski]
http://www.iacr.org/archive/ches2005/018.pdf
"We present a modification to Horner's algorithm for evaluating arbitrary n-variate functions defined over finite rings and fields. ... If the domain is a finite field GF(p) the complexity of multivariate Horner polynomial evaluation is improved from O(p^n) to O((p^n)/(2n)). We prove the optimality of the presented algorithm."
Multivariate rational functions can be considered simply as the ratio of two such polynomial functions. For the special case of univariate rational functions, which can be particularly effective in approximating elementary transcendental functions and others, evaluation can be done via finite (resp. truncated) continued fractions, whose convergents (partial numerators and denominators) can be defined recursively.
The topic of continued fraction evaluations allows us to segue to a final link that connects that topic with some familiar parallelism of numerical linear algebra:
[LU Factorization and Parallel Evaluation of Continued Fractions -- Ömer Egecioglu]
http://www.cs.ucsb.edu/~omer/DOWNLOADABLE/lu-cf98.pdf
"The first n convergents of a general continued fraction (CF) can be computed optimally in logarithmic parallel time using O(n/log(n)) processors."
You've asked how to speed up the evaluation of a single call to a single function. Unless that evaluation time is measured in hours, it isn't clear why it is worth the bother to speed it up. If you insist on speeding up the function execution itself, you'll have to inspect its content to see if some aspects of it are parallelizable. You haven't provided any information on what it computes or how it does so, so it is hard to give any further advice on this aspect. hardmath's answer suggests some ideas you can use, depending on the actual internal structure of your function.
However, usually people asking your question actually call the function many times (say, N times) for different values of x,y,z (e.g., x1,y1,..., x2,y2,..., xN,yN,..., using your vocabulary).
Yes, if you speed up the execution of the function, making the collective set of calls will speed up, and that's what people tend to want. If this is the case, it is "technically easy" to speed up overall execution: make the N calls to the function in parallel. Then all the pointwise evaluations happen at the same time. To make this work, you pretty much have to make vectors out of the values you want to process (so this kind of trick is called "data parallel" programming). So what you really want is something like:
PARALLEL DO I=1,N
    RESULT(I)=F(X[I],Y[I], ...)
END PARALLEL DO
How you implement PARALLEL DO depends on the programming language and libraries you have.
This generally only works if N is a fairly big number, but the more expensive f is to execute, the smaller N can be and still give a worthwhile speedup.
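For instance, in C# the PARALLEL DO above corresponds roughly to Parallel.For; F and the argument arrays below are placeholders for your own function and data:

using System;
using System.Threading.Tasks;

// Roughly what the PARALLEL DO above looks like with .NET's Parallel.For.
// F, x and y are stand-ins for your own function and argument vectors.
class DataParallelEval
{
    static double F(double x, double y) => Math.Sqrt(x * x + y * y);   // placeholder function

    static void Main()
    {
        int n = 1_000_000;
        var x = new double[n];
        var y = new double[n];
        var result = new double[n];
        var rng = new Random(0);
        for (int i = 0; i < n; i++) { x[i] = rng.NextDouble(); y[i] = rng.NextDouble(); }

        Parallel.For(0, n, i => result[i] = F(x[i], y[i]));  // each point evaluated independently
        Console.WriteLine(result[0]);
    }
}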
You can also take advantage of the structure of your function to make this even more efficient. If f computes some internal value the same way for commonly used cases, you might be able to break out those special cases, pre-compute them, and then use those results to compute "the rest of f" for each individual call.
If you are combining ("reducing") the results of all the functions (e.g., summing all the results), you can do that outside the PARALLEL DO loop. If you try to combine results inside the loop, you'll have "loop carried dependencies" and you'll either get the wrong answer or it won't go parallel in the way you expect, depending on your compiler or the parallelism libraries. You can combine the answers efficiently if the combination is some associative/commutative operation such as "sum", by building what amounts to a binary tree and running the evaluation of that in parallel. That's a different problem that also occurs frequently in data parallel computation, but we won't go into it further here.
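For the associative "sum" case, a minimal C# sketch using PLINQ, which parallelizes the pointwise evaluations and builds the reduction for you (F is again a placeholder):

using System;
using System.Linq;

// A sketch of a parallel map + reduce for the associative "sum" combination,
// avoiding a hand-written loop-carried dependency. F is a placeholder.
class ParallelReduce
{
    static double F(double x, double y) => Math.Sqrt(x * x + y * y);

    static void Main()
    {
        var xs = Enumerable.Range(0, 1_000_000).Select(i => (double)i).ToArray();
        double total = xs.AsParallel().Select(x => F(x, x + 1)).Sum();   // parallel evaluate, then reduce
        Console.WriteLine(total);
    }
}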
Often the overhead of a parallel for loop is pretty high (forking threads is expensive). So usually people divide the overhead across several iterations:
PARALLEL DO I=1,N,M
    DO J=I,I+M-1
        RESULT(J)=F(X[J],Y[J], ...)
    END DO
END PARALLEL DO
The constant M requires calibration for efficiency; you have to "tune" it. You also have to take care of the fact that N might not be a multiple of M; that just requires an extra cleanup loop to handle the edge condition:
PARALLEL DO I=1,int(N/M)*M,M
    DO J=I,I+M-1
        RESULT(J)=F(X[J],Y[J], ...)
    END DO
END PARALLEL DO
DO J=int(N/M)*M+1,N
    RESULT(J)=F(X[J],Y[J], ...)
END DO
