Parallel run for a genetic algorithm

I have a GA: a for loop that runs the generations, and inside it a while loop that replaces old individuals with new, better ones. How can I parallelize this? I found a topic suggesting I should divide the work into smaller parts, but what would those parts be?

It is usually the case that the vast majority of the computational effort in a genetic algorithm is involved in evaluating the fitness of each individual in the population. It is also usually the case that the fitness evaluation of each individual does not depend on any of the other individuals in the current generation of the population.
Therefore, the typical parallelization approach is to evaluate the fitness of multiple individuals within a generation in parallel.
You can also easily parallelize the creation of new individuals when moving from one generation to the next. Each individual can pick its parents and perform the crossover and mutation steps in parallel with all other individuals.
I will also note that in many cases you will find that you want to run the evolution multiple times (either with different initial conditions or different parameter settings). Of course, you can just run the non-parallelized GA instances in parallel.
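A minimal Python sketch of the first approach, evaluating a population's fitnesses concurrently. The `fitness` function here is a trivial stand-in for an expensive evaluation such as a simulation run; for CPU-bound Python fitness functions, a process pool rather than a thread pool would be the usual choice to sidestep the GIL:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    # Hypothetical fitness: sum of genes, standing in for an
    # expensive evaluation (simulation, numerical model, ...).
    return sum(individual)

def evaluate_population(population, workers=4):
    # Each individual's fitness is independent of the others,
    # so the evaluations can run concurrently; map preserves order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
scores = evaluate_population(population)
```

The same pattern applies to parallel creation of offspring: each new individual's parent selection, crossover, and mutation are independent tasks.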

Related

How to deal with unfeasible individuals in Genetic Algorithm?

I'm trying to optimize a thermal power plant in a thermoeconomic way using genetic algorithms. Creating the population gives me a lot of unfeasible individuals (e.g. ValueErrors, TypeErrors, etc.). I tried to use penalty functions, but the GA gets stuck in the first population that contains a feasible individual and doesn't evolve. Is there any other way to deal with this?
I will be grateful if anyone can help me. Thanks in advance.
Do not allow such individuals to become part of the population. It will slow down your convergence, but you will guarantee that the solutions found are feasible.
You may want to look into Diversity Control.
In theory, invalid individuals may contain advantageous/valid pieces of code, and discarding them just because they contain a bug is wasteful. In diversity control, your population is grouped into species based on a similarity metric (for tree structures it's usually edit distance), and the fitness of each individual is then "shared" with the other members of its group. In that case fitness = performance/group_size. This is usually done to prevent premature convergence and to widen the exploration.
By combining your penalty function with diversity control, if the group of valid individuals becomes too numerous, fitness within that group will go down, and the groups that throw errors yet are less numerous will become more competitive, carrying the potentially valuable material forward.
Finally, something like rank-based selection should make the search insensitive to outliers, so that when your top dog is 200% better than the others, it won't be selected all the time.
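The fitness-sharing rule above (fitness = performance/group_size) can be sketched in a few lines; the species labels are assumed to come from whatever similarity-based grouping you use:

```python
from collections import Counter

def shared_fitness(performances, species_of):
    # performances: raw fitness per individual.
    # species_of: species label per individual, e.g. produced by
    # clustering on an edit-distance metric (assumed precomputed).
    sizes = Counter(species_of)
    # Each individual's fitness is divided by the size of its group,
    # so large groups become less competitive and small ones survive.
    return [p / sizes[s] for p, s in zip(performances, species_of)]
```

With this rule, a numerous group of valid individuals and a small group of error-throwing ones can end up equally competitive, carrying potentially valuable genetic material forward.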

Simulation Performance Metrics

This is a semi-broad question, but it's one that I feel on some level is answerable or at least approachable.
I've spent the last month or so making a fairly extensive simulation. In order to protect the interests of my employer, I won't state specifically what it does... but an analogy of what it does may be explained by... a high school dance.
A girl or boy enters the dance floor, and based on the selection of free dance partners, an optimal choice is made. After a period of time, two dancers finish dancing and are now free for a new partnership.
I've been making partner selection algorithms designed to maximize average match outcome while not sacrificing wait time for a partner too much.
I want a way to gauge / compare versions of my algorithms in order to make a selection of the optimal algorithm for any situation. This is difficult however since the inputs of my simulation are extremely large matrices of input parameters (2-5 per dancer), and the simulation takes several minutes to run (a fact that makes it difficult to test a large number of simulation inputs). I have a few output metrics, but linking them to the large number of inputs is extremely hard. I'm also interested in finding which algorithms completely fail under certain input conditions...
Any pro tips / online resources which might help me in defining input constraints / output variables which might give clarity on an optimal algorithm?
I might not understand what you exactly want. But here is my suggestion. Let me know if my solution is inaccurate/irrelevant and I will edit/delete accordingly.
Assume you have a certain metric (say, compatibility of the pairs, or waiting time). If you just have the average or total of this metric over all the users, it is of limited use. Instead, you might want to find the distribution of this metric over all users. At minimum, you should always keep track of the variance. Once you have the distribution, you can calculate the probability that a particular algorithm A is better than B for a certain metric.
If you do not have the distribution of the metric within an experiment, you can always run multiple experiments, and the number of experiments you need to run depends on the variance of the metric and difference between two algorithms.
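A minimal sketch of that comparison, assuming you have collected one metric value per run for each algorithm (higher is better in this example): the empirical probability that a random run of A beats a random run of B.

```python
def prob_a_beats_b(samples_a, samples_b):
    # Empirical P(A > B): compare every run of A against every run
    # of B and count the fraction of pairs where A comes out ahead.
    wins = sum(1 for a in samples_a for b in samples_b if a > b)
    return wins / (len(samples_a) * len(samples_b))
```

If this probability hovers near 0.5, the difference between the algorithms is small relative to their variance, and you need more experiments to separate them.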

Elitism in GA: Should I let the elites be selected as parents?

I am a little confused by the elitism concept in Genetic Algorithm (and other evolutionary algorithms). When I reserve and then copy 1 (or more) elite individuals to the next generation,
Should I consider the elite solution(s) in the parent selection of the current generation (making a new population)?
Or, should I use others (putting the elites aside) for making a new population and just copy the elites directly to the next generation?
If the latter, what is the use of elitism? Is it just for not losing the best solution? Because in this scheme, it won't help the convergence at all.
For example, here, under the crossover/mutation part, it is stated that the elites do not participate.
(Of course, the same question can be asked about the survivor selection part.)
Elitism only means that the most fit handful of individuals are guaranteed a place in the next generation - generally without undergoing mutation. They should still be able to be selected as parents, in addition to being brought forward themselves.
That article does take a slightly odd approach to elitism. It suggests duplicating the most fit individual - that individual gets two reserved slots in the next generation. One of these slots is mutated, the other is not. That means that, in the next generation, at least one of those slots will reenter the general population as a parent, and possibly two if both are overtaken.
It does seem a viable approach. Either way - whether by selecting elites as parents while also perpetuating them, or by copying the elites and then mutating one - the elites should still be closely attached to the population at large so that they can share their beneficial genes around.
Peladao's answer and comment are also absolutely spot on, especially on the need to maintain diversity and avoid premature convergence, and on the elites representing only a small portion of the population.
I see no reason why one would not use the elites as parents, besides perhaps a small loss in diversity. (The number of elites should therefore be small compared to the population size).
Since the elites are the best individuals, they are valuable candidates to create new individuals using crossover, as long as the elites themselves are also copied (unchanged) into the new population.
Keeping sufficient diversity and avoiding premature convergence is always important, also when elites are not used as parents.
There exist different methodologies for implementing elitism, as the other valid answers point out.
Generally, for elitism, you just copy N individuals into the new generation without applying any kind of change. However, these individuals can be selected by fitness ranking (true elitism), guaranteeing that the best really are saved, or they can be chosen via proportional selection (as pointed out in the book Machine Learning by Mitchell T.). The latter is the same mechanism used in roulette-wheel selection, but note that in this case the individuals are not used for generating new offspring; they are directly copied into the new population (survivors!).
When the selection for elitism is proportional, we obtain a good compromise between loss of diversity and premature convergence.
Applying true elitism while refusing to use the elites as parents would be counter-productive, especially considering the value of the crossover operation.
In a nutshell, the main points about using elitism are:
The number of elites in the population should not exceed roughly 10% of the total population, to maintain diversity.
Of this, perhaps 5% may pass directly into the next generation, while the remainder should undergo crossover and mutation with the non-elite population.
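The consensus scheme in these answers (elites copied forward unchanged, but still eligible as parents) can be sketched as follows; the bit-string `crossover` and `mutate` operators and the 10% `elite_frac` default are illustrative assumptions, not prescriptions:

```python
import random

def crossover(a, b):
    # One-point crossover at the midpoint (illustrative choice).
    cut = len(a) // 2
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.1):
    # Flip each bit with a small probability.
    return [1 - g if random.random() < rate else g for g in ind]

def next_generation(population, fitness, elite_frac=0.1):
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, int(len(population) * elite_frac))
    # Elites pass into the next generation unchanged...
    new_pop = [list(e) for e in ranked[:n_elite]]
    # ...but remain in the mating pool, so they can also be parents.
    while len(new_pop) < len(population):
        p1, p2 = random.sample(ranked, 2)
        new_pop.append(mutate(crossover(p1, p2)))
    return new_pop
```

This way the best solution is never lost, and its genes still spread through the rest of the population via crossover.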

Methods for crossover in genetic algorithms

When reading about the crossover part of genetic algorithms, books and papers usually refer to methods of simply swapping out bits in the data of two selected candidates which are to reproduce.
I have yet to see actual code of an implemented genetic algorithm for actual industry applications, but I find it hard to imagine that it's enough to operate on simple data types.
I always imagined that the various stages of genetic algorithms would be performed on complex objects involving complex mathematical operations, as opposed to just swapping out some bits in single integers.
Even Wikipedia just lists these kinds of operations for crossover.
Am I missing something important or are these kinds of crossover methods really the only thing used?
Several techniques are used, although the need for parallelism and many generations (and sometimes a big population) pushes towards techniques that perform well.
Another point to keep in mind is that "swapping out some bits", when modeled correctly, resembles a simple and reasonably accurate version of what happens naturally (recombination of genes, mutations).
For a very simple and nicely written walkthrough see http://www.electricmonk.nl/log/2011/09/28/evolutionary-algorithm-evolving-hello-world/
For some more info see
http://www.codeproject.com/KB/recipes/btl_ga.aspx
http://www.codeproject.com/KB/recipes/genetics_dot_net.aspx
http://www.codeproject.com/KB/recipes/GeneticandAntAlgorithms.aspx
http://www.c-sharpcorner.com/UploadFile/mgold/GeneticAlgorithm12032005044205AM/GeneticAlgorithm.aspx
I always imagined that the various stages of genetic algorithms would be performed on complex objects involving complex mathematical operations, as opposed to just swapping out some bits in single integers.
You probably think complex mathematical operations are used because you think the Genetic Algorithm has to modify a complex object. That's usually not how a Genetic Algorithm works.
So what does happen? Well, usually, the programmer (or scientist) will identify various parameters in a configuration, and then map those parameters to integers/floats. This does limit the directions in which the algorithm can explore, but that's the only realistic method of getting any results.
Let's look at evolving an antenna. You could perform a complex simulation with a genetic algorithm rearranging copper molecules, but that would be very complex and take forever. Instead, you'd identify antenna "parameters". Most antennas are built out of certain lengths of wire, bent at certain places in order to maximize their coverage area. So you could identify a couple of parameters: number of starting wires, section lengths, angle of the bends. All of those are easily represented as integer numbers, and are therefore easy for the genetic algorithm to manipulate. The resulting manipulations can be fed into an antenna simulator to see how well the antenna receives signals.
In short, when you say:
I find it hard to imagine that it's enough to operate on simple data types.
you must realize that simple data types can be mapped to much more intricate structures. The Genetic Algorithm doesn't have to know anything about these intricate structures. All it needs to know is how it can manipulate the parameters that build up the intricate structures. That is, after all, the way DNA works.
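As a sketch of such a mapping, a flat integer genome could be decoded into the antenna parameters from the example above; the genome layout chosen here is a hypothetical one, invented for illustration:

```python
def decode(genome):
    # Hypothetical layout: genome[0] is the number of wires, followed
    # by that many section lengths, then that many bend angles.
    # The GA only ever manipulates the flat list of integers.
    n_wires = genome[0]
    lengths = genome[1:1 + n_wires]
    angles = genome[1 + n_wires:1 + 2 * n_wires]
    return {"wires": n_wires,
            "lengths_mm": lengths,
            "bend_angles_deg": angles}
```

The decoded structure is what gets handed to the antenna simulator; the crossover and mutation operators never see it.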
In genetic algorithms, bit swapping of some variety is usually used.
As you have said:
I always imagined that the various stages of genetic algorithms would
be performed on complex objects involving complex mathematical
operations
What I think you are looking for is genetic programming, where the chromosome describes a program; in that case you can do more with the operators when applying crossover.
Also make sure you have understood the difference between the fitness function in genetic algorithms and the operators within a chromosome in genetic programming.
Different applications require different encodings. The goal certainly is to find the most effective encoding, and often enough the simple encodings are better suited. So for example a job shop scheduling problem might be represented as a list of permutations which represent the execution order of the jobs on the different machines (a so-called job sequence matrix). It can however also be represented as a list of priority rules that construct the schedule. A traveling salesman problem or quadratic assignment problem is typically represented by a single permutation that denotes a tour in one case or an assignment in the other. Optimizing the parameters of a simulation model or finding the root of a complex mathematical function is typically represented by a vector of real values.
For all those, still simple types crossover and mutation operators exist. For the permutation these are e.g. OX, ERX, CX, PMX, UBX, OBX, and many more. If you can combine a number of simple representations to represent a solution of your complex problem you might reuse these operations and apply them to each component individually.
The important thing about crossover to work effectively is that a few properties should be fulfilled:
The crossover should conserve those parts that are similar in both parents
For those parts that are not similar, the crossover should not introduce an element that is not already part of one of the parents
The crossover of two solutions should, if possible, produce a feasible solution
You want to avoid so called unwanted mutations in your crossovers. In that light you also want to avoid having to repair a large part of your chromosomes after crossover, since that is also introducing unwanted mutations.
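One of the permutation operators listed above, order crossover (OX), illustrates these properties: it conserves a slice of one parent, introduces no element that isn't in the parents, and always yields a feasible permutation with no repair step. A minimal sketch, with the cut points passed explicitly for clarity:

```python
def order_crossover(p1, p2, i, j):
    # Keep the slice p1[i:j] in place, then fill the remaining
    # positions with p2's genes in their original relative order,
    # skipping any gene already taken from p1. The child is always
    # a valid permutation, so no repair (unwanted mutation) occurs.
    child = [None] * len(p1)
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    fill = iter(g for g in p2 if g not in kept)
    for k in range(len(child)):
        if child[k] is None:
            child[k] = next(fill)
    return child
```

In practice the cut points i and j would be drawn at random for each crossover.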
If you want to experiment with different operators and problems, we have a nice GUI-driven software: HeuristicLab.
Simple bit swapping is usually the way to go. The key thing to note is the encoding used in each candidate solution. Solutions should be encoded such that minimal or no error is introduced into the new offspring. Any error would require the algorithm to provide a fix, which leads to increased processing time.
As an example, I have developed a university timetable generator in C# that uses an integer encoding to represent the timeslots available in each day. This representation allows a very efficient single-point or multi-point crossover operator, which uses the LINQ Intersect function to combine parents.
Typical multipoint crossover with hill-climbing
public List<TimeTable> CrossOver(List<TimeTable> parents) // Multipoint crossover
{
    var baby1 = new TimeTable {Schedule = new List<string>(), Fitness = 0};
    var baby2 = new TimeTable {Schedule = new List<string>(), Fitness = 0};
    for (var gen = 0; gen < parents[0].Schedule.Count; gen++)
    {
        // At each gene, swap with probability CrossOverProb.
        if (rnd.NextDouble() < (double) CrossOverProb)
        {
            baby2.Schedule.Add(parents[0].Schedule[gen]);
            baby1.Schedule.Add(parents[1].Schedule[gen]);
        }
        else
        {
            baby1.Schedule.Add(parents[0].Schedule[gen]);
            baby2.Schedule.Add(parents[1].Schedule[gen]);
        }
    }
    CalculateFitness(ref baby1);
    CalculateFitness(ref baby2);
    // Allow hill-climbing: children compete with their parents,
    // and only the two fittest of the four survive.
    parents.Add(baby1);
    parents.Add(baby2);
    return parents.OrderByDescending(i => i.Fitness).Take(2).ToList();
}

Evaluating a function at a particular value in parallel

The question may seem vague, but let me explain it.
Suppose we have a function f(x,y,z ....) and we need to find its value at the point (x1,y1,z1 .....).
The most trivial approach is to just replace (x,y,z ...) with (x1,y1,z1 .....).
Now suppose that the function takes a long time to evaluate and I want to parallelize the evaluation. Obviously this will depend on the nature of the function, too.
So my question is: what are the constraints that I have to look for while "thinking" to parallelize f(x,y,z...)?
If possible, please share links to study.
Asking the question in such a general way does not permit very specific advice to be given.
I'd begin the analysis by looking for ways to evaluate or rewrite the function using groups of variables that interact closely, creating intermediate expressions that can be used to make the final evaluation. You may find a way to do this involving a hierarchy of subexpressions that leads from the variables themselves to the final function.
In general the shorter and wider such an evaluation tree is, the greater the degree of parallelism. There are two cautionary notes to keep in mind that detract from "more parallelism is better."
For one thing a highly parallel approach may actually involve more total computation than your original "serial" approach. In fact some loss of efficiency in this regard is to be expected, since a serial approach can take advantage of all prior subexpression evaluations and maximize their reuse.
For another thing the parallel evaluation will often have worse rounding/accuracy behavior than a serial evaluation chosen to give good or optimal error estimates.
A lot of work has been done on evaluations that involve matrices, where there is usually a lot of symmetry to how the function value depends on its arguments. So it helps to be familiar with numerical linear algebra and parallel algorithms that have been developed there.
Another area where a lot is known is for multivariate polynomial and rational functions.
When the function is transcendental, one might hope for some transformations or refactoring that makes the dependence more tractable (algebraic).
Not directly relevant to your question are algorithms that amortize the cost of computing function values across a number of arguments. For example in computing solutions to ordinary differential equations, there may be "multi-step" methods that share the cost of evaluating derivatives at intermediate points by reusing those values several times.
I'd suggest that your concern to speed up the evaluation of the function suggests that you plan to perform more than one evaluation. So you might think about ways to take advantage of prior evaluations or perform evaluations at related arguments in a way that contributes to your search for parallelism.
Added: some links and discussion of search strategy
Most authors use the phrase "parallel function evaluation" to mean evaluating the same function at multiple argument points. See for example:
[Coarse Grained Parallel Function Evaluation -- Rulon and Youssef] http://cdsweb.cern.ch/record/401028/files/p837.pdf
A search strategy to find the kind of material Gaurav Kalra asks about should try to avoid those. For example, we might include "fine-grained" in our search terms. It's also effective to focus on specific kinds of functions, e.g. "polynomial evaluation" rather than "function evaluation".
Here for example we have a treatment of some well-known techniques for "fast" evaluations applied to design for GPU-based computation:
[How to obtain efficient GPU kernels -- Cruz, Layton, and Barba] http://arxiv.org/PS_cache/arxiv/pdf/1009/1009.3457v1.pdf
(from their Abstract) "Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU."
Another search term that might be worth excluding is "pipelined". This term invariably discusses the sort of parallelism that can be used when multiple function evaluations are to be done: early stages of the computation can be done in parallel with later stages, but on different inputs. So that's a search term that one might want to exclude. Or not.
Here's a paper that discusses n-fold speedup for n-variate polynomial evaluation over finite fields GF(p). This might be of direct interest for cryptographic applications, but the approach via a modified Horner's method may be interesting for its potential for generalization:
[Comparison of Bit and Word Level Algorithms for Evaluating Unstructured Functions over Finite Rings -- Sunar and Cyganski] http://www.iacr.org/archive/ches2005/018.pdf
"We present a modification to Horner's algorithm for evaluating arbitrary n-variate functions defined over finite rings and fields. ... If the domain is a finite field GF(p) the complexity of multivariate Horner polynomial evaluation is improved from O(p^n) to O((p^n)/(2n)). We prove the optimality of the presented algorithm."
Multivariate rational functions can be considered simply as the ratio of two such polynomial functions. The special case of univariate rational functions, which can be particularly effective in approximating elementary transcendental functions and others, can be evaluated via finite (resp. truncated) continued fractions, whose convergents (partial numerators and denominators) can be defined recursively.
The topic of continued fraction evaluations allows us to segue to a final link that connects that topic with some familiar parallelism from numerical linear algebra:
[LU Factorization and Parallel Evaluation of Continued Fractions -- Ömer Egecioglu] http://www.cs.ucsb.edu/~omer/DOWNLOADABLE/lu-cf98.pdf
"The first n convergents of a general continued fraction (CF) can be computed optimally in logarithmic parallel time using O(n/log(n)) processors."
You've asked how to speed up the evaluation of a single call to a single function. Unless that evaluation time is measured in hours, it isn't clear why it is worth the bother to speed it up. If you insist on speeding up the function execution itself, you'll have to inspect its content to see if some aspects of it are parallelizable. You haven't provided any information on what it computes or how it does so, so it is hard to give any further advice on this aspect. hardmath's answer suggests some ideas you can use, depending on the actual internal structure of your function.
However, usually people asking your question actually call the function many times (say, N times) for different values of x,y,z (e.g., x1,y1,..., x2,y2,..., xN,yN,..., using your vocabulary).
Yes, if you speed up the execution of the function, the collective set of calls will speed up, and that's what people tend to want. If this is the case, it is "technically easy" to speed up overall execution: make the N calls to the function in parallel, so all the pointwise evaluations happen at the same time. To make this work, you pretty much have to make vectors out of the values you want to process (this kind of trick is called "data parallel" programming). So what you really want is something like:
PARALLEL DO I=1,N
    RESULT(I)=F(X[I],Y[I], ...)
END PARALLEL DO
How you implement PARALLEL DO depends on the programming language and libraries you have.
This generally only works well if N is a fairly big number, but the more expensive f is to execute, the smaller N can be while parallelization still pays off.
You can also take advantage of the structure of your function to make this even more efficient. If f computes some internal value the same way for commonly used cases, you might be able to break out the special cases, pre-compute those, and then use those results to compute "the rest of f" for each individual call.
If you are combining ("reducing") the results of all the functions (e.g., summing all the results), you can do that outside the PARALLEL DO loop. If you try to combine results inside the loop, you'll have "loop-carried dependencies" and you'll either get the wrong answer or it won't go parallel in the way you expect, depending on your compiler or the parallelism libraries. You can still combine the answers efficiently if the combination is some associative/commutative operation such as "sum", by building what amounts to a binary tree and running the evaluation of that in parallel. That's a different problem that also occurs frequently in data-parallel computation, but which we won't go into further here.
Often the overhead of a parallel for loop is pretty high (forking threads is expensive). So usually people divide the overhead across several iterations:
PARALLEL DO I=1,N,M
    DO J=I,I+M-1
        RESULT(J)=F(X[J],Y[J], ...)
    END DO
END PARALLEL DO
The constant M requires calibration for efficiency; you have to "tune" it. You also have to take care of the fact that N might not be a multiple of M; that requires just an extra clean loop to handle the edge condition:
PARALLEL DO I=1,int(N/M)*M,M
    DO J=I,I+M-1
        RESULT(J)=F(X[J],Y[J], ...)
    END DO
END PARALLEL DO
DO J=int(N/M)*M+1,N
    RESULT(J)=F(X[J],Y[J], ...)
END DO
