I am working on my M.Sc. dissertation, and in the theoretical part of my thesis I have a big problem.
Suppose we want to use genetic algorithms. We have two kinds of functions:
a) functions for which ||x1 - x2|| >> ||f(x1) - f(x2)||, for example y = (1/10)x^2
b) functions for which ||x1 - x2|| << ||f(x1) - f(x2)||, for example y = x^2
My question is: which of these two kinds of functions is more difficult than the other when we want to use genetic algorithms to find the optimum (never mind minimum or maximum)?
Thank you a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness-proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-biased selection, as long as the ordering relations hold between individuals, there will be no effect.
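To make the selection-scheme point concrete, here is a minimal Python sketch (the function names and tournament size are illustrative). Adding a constant to every fitness value changes roulette-wheel probabilities, while tournament selection is unchanged by any order-preserving transform of fitness:

```python
import random

def roulette_select(population, fitness):
    # Probability proportional to fitness (assumes nonnegative values).
    # Translating all fitness values by a constant changes these
    # probabilities, though pure rescaling does not.
    scores = [fitness(ind) for ind in population]
    r = random.uniform(0, sum(scores))
    acc = 0.0
    for ind, s in zip(population, scores):
        acc += s
        if acc >= r:
            return ind
    return population[-1]

def tournament_select(population, fitness, k=3):
    # Only comparisons matter: any strictly increasing transform of
    # the fitness values leaves the outcome distribution unchanged.
    return max(random.sample(population, k), key=fitness)
```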
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x)=x^2, converging as fast as possible is probably great, because there's only one optimum, so you want to find it as soon as possible. For a more complex function, slower convergence may be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.
I hope this question is appropriate. I'm looking for a solution to a genetic-programming implementation I have in mind, but I'm still new to AI and have no clue where or how to solve the problem.
So I recently started experimenting with genetic algorithms (finding optimal solutions), and have now started looking at genetic programming (finding the optimal program for solutions)... Mutation, crossover, etc. on decision trees are relatively straightforward to me, but I'm still struggling to wrap my head around the implementation of problems.
If a genetic program can find the math function for a set of correlated input values and outputs, then, looking at an optimization problem (say, the bin packing problem), how can you use a genetic program to determine the most optimal objective/fitness function for the problem? Because then you "should be" able to use that generated fitness function (from the GP) in a GA to find the optimal solution to the problem, in such a manner that your GA will perform more robustly.
If I understand this correctly, how would the GP process be feasible? Is it some generally used technique? Is the approach related to fitness landscape evaluation for optimization problems, or...? Any help will be greatly appreciated. I know there's a solution; I just don't know where or how to search for it, or what it's referred to as.
Thank you in advance.
... [I] am still struggling to wrap my head around the implementation of problems.
... how can you use a genetic program to determine the most optimal objective / fitness function for the problem?
In evolutionary algorithms (GA and GP in particular), a problem is basically defined by a fitness function. In such a context it does not make sense to talk about an automated way of finding the fitness function, because that translates to finding a problem, which doesn't seem to make any sense to me.
If your idea lies on a different plane and you think it still makes sense, please try to clarify it further.
For the purpose of answering werediver's comments and providing more context to the specified problem (which can't be summed up in a comment), I can sort of answer my own question, although I still see the implementation as extremely complex for the "average skill level" in evolutionary programming.
Nevertheless, in case someone else finds this useful: after reading a whole bunch of journals, my thoughts point to fitness landscape analysis (landscapes, surveying landscape correlation) as an optimization problem.
I interpret it as implementing a number of fitness landscape techniques, such as entropy, autocorrelation, correlation length, fitness clouds, evolvability, etc. (1, 2, 3, 4), where one "should" be able to calculate a number of landscape coefficients and incorporate/utilize them as the fitness function for your GP. The GP therefore basically generates a fitness function for your optimization problem (by means such as symbolic regression) and optimizes it based on the assessed fitness landscape analysis, so as to change the objective function.
As all the literature states, the quality of your fitness function changes your search landscape and correspondingly influences the performance of optimization algorithms such as GAs and PSOs (fitness landscapes being quantified by means of "ruggedness", "deception", etc.).
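As an illustration of one of these measures, here is a minimal Python sketch (the fitness, neighbour, and start arguments are placeholders for your own problem) estimating random-walk autocorrelation, a standard proxy for landscape ruggedness:

```python
def walk_autocorrelation(fitness, neighbour, start, steps=1000, lag=1):
    # Random-walk autocorrelation: sample fitness along a random walk
    # and correlate it with itself `lag` steps later. Values near 1
    # suggest a smooth landscape; values near 0 suggest a rugged one.
    x = start
    samples = []
    for _ in range(steps):
        samples.append(fitness(x))
        x = neighbour(x)  # must return a random neighbour of x
    mean = sum(samples) / len(samples)
    var = sum((f - mean) ** 2 for f in samples)
    if var == 0.0:
        return 0.0  # flat walk: correlation undefined, report 0
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
              for i in range(len(samples) - lag))
    return cov / var
```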
Another image I found quite useful (putting things in perspective) illustrated three defined methodologies for optimizing objective functions via fitness landscape analysis.
On the programming side, the two most popular software libraries I could find so far (though the implementation complexity of either is still unknown to me) are:
Paradiseo metaheuristics, with fitness landscape ref doc (C++)
HeuristicLab, automatic fitness analysis ref doc (C# / C++)
So if anyone else has some valid info or experience to offer, it'll be very interesting to hear your perspective; but from what I've read, this is not at all an "easy or transparent" approach.
Please correct me if I'm wrong, but it is my understanding that crossover tends to lead towards local optima, while mutation increases the randomness of the search and thus tends to help in escaping local optima. This insight I got from reading the following: Introduction to Genetic Algorithms and Wikipedia's article on genetic operators.
My question is, what is the best or most ideal way to pick which individuals go through crossover and which go through mutation? Is there a rule of thumb for this? What are the implications?
Thanks in advance. This is a pretty specific question that is a bit hard to Google with (for me at least).
The selection of individuals to participate in the crossover operation must consider fitness; that is, "better individuals are more likely to have more child programs than inferior individuals":
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/23Selection.html#7_3
The most common way to perform this is using Tournament Selection (see wikipedia).
Selection of the individuals to mutate should not consider fitness; in fact, it should be random. And the number of elements mutated per generation (the mutation rate) should be very low, around 1% (or the search may degenerate into random search):
http://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/24RecombinationandMutation.html#7_4
In my experience, tweaking the tournament parameters just a bit could lead to substantial changes in the final results (for better or for worse), so it is usually a good idea to play with these parameters until you find a "sweet spot".
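A minimal Python sketch of the scheme described in this answer (the tournament size k and the 1% mutation rate are exactly the parameters worth tweaking for a "sweet spot"; crossover and mutate are assumed helpers for your representation):

```python
import random

def breed(population, fitness, crossover, mutate, k=3, mutation_rate=0.01):
    children = []
    for _ in range(len(population)):
        # Fitness-aware parent selection via tournaments: better
        # individuals are more likely to produce children.
        mum = max(random.sample(population, k), key=fitness)
        dad = max(random.sample(population, k), key=fitness)
        children.append(crossover(mum, dad))
    # Fitness-blind mutation: a small random fraction of children
    # is mutated, regardless of quality.
    return [mutate(c) if random.random() < mutation_rate else c
            for c in children]
```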
I've implemented a number of genetic algorithms to solve a variety of problems. However, I'm still skeptical of the usefulness of crossover/recombination.
I usually implement mutation before implementing crossover. And after I implement crossover, I don't typically see a significant speed-up in the rate at which a good candidate solution is generated, compared to simply using mutation and introducing a few random individuals in each generation to ensure genetic diversity.
Of course, this may be attributed to poor choices of the crossover function and/or the probabilities, but I'd like to get some concrete explanation/evidence as to why/whether or not crossover improves GAs. Have there been any studies regarding this?
I understand the reasoning behind it: crossover allows the strengths of two individuals to be combined into one individual. But to me that's like saying we can mate a scientist and a jaguar to get a smart and fast hybrid.
EDIT: In mcdowella's answer, he mentioned how finding a case where cross-over can improve upon hill-climbing from multiple start points is non-trivial. Could someone elaborate upon this point?
It strongly depends on the smoothness of your search space. As a perverse example: if every "genome" were hashed before being used to generate "phenomes", then you would just be doing random search.
In a less extreme case, this is why we often Gray-code integers in GAs.
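For reference, here is the standard binary-reflected Gray code in Python; consecutive integers differ in exactly one bit, so a single-bit mutation in the encoding corresponds to a small change in the decoded value:

```python
def to_gray(n: int) -> int:
    # Binary-reflected Gray code: codes of consecutive integers
    # differ in exactly one bit.
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    # Invert by XOR-ing in every right-shift of the Gray code.
    n = g
    while g:
        g >>= 1
        n ^= g
    return n
```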
You need to tailor your crossover and mutation functions to the encoding. GAs decay quite easily if you throw unsympathetic calculations at them. If the crossover of A and B doesn't yield something that's both A-like and B-like then it's useless.
Example:
The genome is 3 bits long, bit 0 determines whether it's land-dwelling or sea-dwelling. Bits 1-2 describe digestive functions for land-dwelling creatures and visual capabilities for sea-dwelling creatures.
Consider two land-dwelling creatures.
| bit 0 | bit 1 | bit 2
----+-------+-------+-------
Mum | 0 | 0 | 1
Dad | 0 | 1 | 0
They might crossover between bits 1 and 2 yielding a child whose digestive function is some compromise between Mum's and Dad's. Great.
This crossover seems sensible provided that bit 0 hasn't changed. If it does change, then your crossover function has turned some kind of guts into some kind of eyes. Er... wut? It might as well have been a random mutation.
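A minimal Python sketch of the fix implied here (the three-bit representation is just the toy example above): make crossover respect the mode bit, and do something non-destructive when the parents disagree:

```python
import random

def crossover(mum, dad):
    # Genomes are 3-bit lists; bit 0 selects land (0) or sea (1),
    # bits 1-2 mean digestion for land creatures, vision for sea ones.
    if mum[0] != dad[0]:
        # Mixing bits 1-2 across modes would turn guts into eyes:
        # fall back to cloning one parent instead of a destructive cross.
        return list(random.choice((mum, dad)))
    point = random.randint(1, 2)  # cross over within the trait bits
    return mum[:point] + dad[point:]

child = crossover([0, 0, 1], [0, 1, 0])  # Mum and Dad from the table
```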
This raises the question of how DNA gets around the problem. Well, it's both modal and hierarchical. There are large sections which can change a lot without much effect; in others, a single mutation can have drastic effects (like bit 0 above). Sometimes the value of X affects the behaviour triggered by Y, and all values of X are legal and can be explored, whereas a modification to Y makes the animal segfault.
Theoretical analyses of GAs often use extremely crude encodings and they suffer more from numerical issues than semantic ones.
You are correct in being skeptical about the cross-over operation. There is a paper called "On the effectiveness of crossover in simulated evolutionary optimization" (Fogel and Stayton, Biosystems 1994), which is available for free at 1.
By the way, if you haven't already I recommend looking into a technique called "Differential Evolution". It can be very good at solving many optimization problems.
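For a flavour of it, here is the core DE/rand/1 mutation rule as a minimal Python sketch (F is the usual differential weight; full Differential Evolution adds a crossover and selection step around this):

```python
import random

def de_mutant(population, F=0.8):
    # DE/rand/1: perturb one random vector by the scaled difference of
    # two others; the differences adapt step sizes to the population's
    # current spread.
    a, b, c = random.sample(population, 3)
    return [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
```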
My impression is that hill-climbing from multiple random starts is very effective, but that trying to find a case where cross-over can improve on this is non-trivial. One reference is "Crossover: The Divine Afflatus in Search" by David Iclănzan, which states
The traditional GA theory is pillared on the Building Block Hypothesis (BBH) which states that Genetic Algorithms (GAs) work by discovering, emphasizing and recombining low order schemata in high-quality strings, in a strongly parallel manner. Historically, attempts to capture the topological fitness landscape features which exemplify this intuitively straight-forward process, have been mostly unsuccessful. Population-based recombinative methods had been repeatedly outperformed on the special designed abstract test suites, by different variants of mutation-based algorithms.
A related paper is "Overcoming Hierarchical Difficulty by Hill-Climbing the Building Block Structure" by David Iclănzan and Dan Dumitrescu, which states
The Building Block Hypothesis suggests that Genetic Algorithms (GAs) are well-suited for hierarchical problems, where efficient solving requires proper problem decomposition and assembly of solution from sub-solution with strong non-linear interdependencies. The paper proposes a hill-climber operating over the building block (BB) space that can efficiently address hierarchical problems.
John Holland's two seminal works "Adaptation in Natural and Artificial Systems" and "Hidden Order" (less formal) discuss the theory of crossover in depth. IMO, Goldberg's "Genetic Algorithms in Search, Optimization, and Machine Learning" has a very approachable chapter on mathematical foundations which includes such conclusions as:
With both crossover and reproduction... those schemata with both above-average performance and short defining lengths are going to be sampled at exponentially increasing rates.
Another good reference might be Ankenbrandt's "An Extension to the Theory of Convergence and a Proof of the Time Complexity of Genetic Algorithms" (in "Foundations of Genetic Algorithms" by Rawlins).
I'm surprised that the power of crossover has not been apparent to you in your work; when I began using genetic algorithms and saw how powerfully "directed" crossover seemed, I felt I gained an insight into evolution that overturned what I had been taught in school. All the questions about "how could mutation lead to this and that?" and "Well, over the course of so many generations..." came to seem fundamentally misguided.
Both crossover and mutation are necessary. Crossover is an explorative operator, while mutation is an exploitative one. Considering the structure of the solutions, the problem, and the rate of optimization you are hoping for, it is very important to select correct values for Pc and Pm (the probabilities of crossover and mutation).
Check out this GA-TSP-Solver; it implements many crossover and mutation methods, and you can test any crossover alongside any mutation with given probabilities.
It mainly depends on the search space and the type of crossover you are using. For some problems I found that using crossover at the beginning and then mutation speeds up the process of finding a solution; however, this is not a very good approach, since I end up finding similar solutions. If we use both crossover and mutation, I usually get better-optimized solutions. For some problems, however, crossover can be very destructive.
Also, genetic operators alone are not enough to solve large/complex problems. When your operators stop improving your solutions (i.e., when they no longer increase the fitness), you should start considering other techniques, such as incremental evolution, etc.
I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?
Approaches that I've seen used for these kinds of problems have a similar overall structure: take a large number of sample points and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points and probabilistically move some to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space and probabilistically move each particle at random; if a move is an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using neighbourhood information like the above, you take the best results each time and "cross-breed" them, hoping to get the best characteristics of each.
The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
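As a concrete starting point, here is a minimal simulated-annealing sketch in Python for a continuous parameter vector; the cooling schedule, neighbour scale, and step count are placeholder choices you would tune to your problem:

```python
import math
import random

def anneal(f, x0, steps=1000, t0=1.0, scale=0.1):
    # Minimize f: accept worse neighbours with a probability that
    # shrinks as the temperature cools, so the walk can escape
    # local minima early on and settle later.
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9  # linear cooling schedule
        y = [xi + random.gauss(0, scale) for xi in x]
        fy = f(y)
        if fy < fx or random.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest
```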
Note that all of these (and really any approach to such a high-dimensional search problem) are heuristics, which means they have parameters that have to be tuned to your particular problem. Which can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit: since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.
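A minimal Python sketch of that idea using multiprocessing rather than OpenMP (it assumes evaluate is a module-level function, as required for pickling, and that one evaluation takes on the order of a minute):

```python
from multiprocessing import Pool

def evaluate_all(evaluate, candidates, workers=8):
    # Each candidate is scored independently, so the expensive
    # evaluations can run in parallel across available cores.
    with Pool(workers) as pool:
        return pool.map(evaluate, candidates)
```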
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes, this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.
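A minimal Python sketch of that suggestion, under my reading of it (unit temperature, a hypothetical decay factor annealing the step sizes, and independent walkers launched from your random initial distribution; the answer is the best point any walker visits):

```python
import math
import random

def mh_search(f, starts, steps=500, step0=1.0, decay=0.995):
    # Several independent Metropolis walkers, each annealing its own
    # step size; f is the figure of merit, to be minimized.
    best, fbest = None, float("inf")
    for x in starts:
        fx, step = f(x), step0
        if fx < fbest:
            best, fbest = list(x), fx
        for _ in range(steps):
            y = [xi + random.gauss(0, step) for xi in x]
            fy = f(y)
            # Metropolis rule: always accept improvements, sometimes
            # accept regressions to keep exploring.
            if fy < fx or random.random() < math.exp(fx - fy):
                x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
            step *= decay  # anneal the step size
    return best, fbest
```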
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
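For clarity, here is a minimal NumPy sketch of one local phase of this algorithm, under my reading of the description (the restart-on-local-minimum logic is omitted; start is a NumPy vector and all names are illustrative):

```python
import numpy as np

def orthogonal_search(f, start, step=1.0, min_step=1e-6):
    # Accelerate along each direction of a random orthogonal frame
    # while it improves, and halve the jump distance when no
    # direction helps.
    x = np.asarray(start, dtype=float)
    fx = f(x)
    dim = x.size
    while step > min_step:
        # Random orthogonal matrix via QR decomposition of a Gaussian
        # matrix; its columns are mutually orthogonal unit directions.
        q, _ = np.linalg.qr(np.random.randn(dim, dim))
        improved = False
        for d in np.vstack([q.T, -q.T]):  # each direction and its negative
            s = step
            while True:
                y = x + s * d
                fy = f(y)
                if fy >= fx:
                    break  # stop and fall back to the last good point
                x, fx = y, fy
                s *= 2.0  # speed up on every successful move
                improved = True
        if not improved:
            step /= 2.0  # local minimum suspected: shrink the jumps
    return x, fx
```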
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.
I can't really help you with finding an algorithm for your specific problem.
However, in regards to the random choosing of parameters, I think what you are looking for is genetic algorithms. Genetic algorithms are generally based on choosing some random inputs, selecting those which fit the problem best (so far), and randomly mutating/combining them to generate the next generation, for which the best are again selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't generate bad inputs, "small" being a somewhat generic term), this would work reasonably well for your problem.
There is no general way to answer your question. There are lots of books/papers on the subject, but you'll have to choose your path according to your needs, which you haven't spelled out clearly here.
Some things to know, however: 1 min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify them slightly instead of just randomizing them. For example, in TSP a basic operator is lambda, which swaps two nodes and thus creates a new route (see the sketch after this list); yours could be shifting some number up/down by some value.
then, find yourself some nice algorithm; a starting point could be somewhere here. The book is an invaluable resource for anyone who starts with problem-solving.
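The TSP-style move mentioned above fits in a few lines of Python; a minimal sketch (the route is any sequence of node indices):

```python
import random

def swap_two(route):
    # Neighbourhood move: exchange two nodes to get a new route that
    # differs only slightly from the current one, instead of
    # randomizing the whole solution.
    i, j = random.sample(range(len(route)), 2)
    neighbour = list(route)
    neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
    return neighbour
```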
I did a little GP (note: very little) work in college and have been playing around with it recently. My question is in regards to the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function; I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy.
For number of generations it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.
Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about...). I've also found total node counts of ~1000 to be quite useful hard limits.
Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
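One way to realise the "penalise or eliminate programs" idea without explicit depth limits is a size-penalised fitness wrapper; a minimal Python sketch where raw_fitness, node_count, and both constants are placeholders, not recommendations:

```python
def penalised_fitness(program, raw_fitness, node_count,
                      max_nodes=1000, penalty=0.001):
    # Keep the raw score for small programs; tax oversized ones so
    # the population doesn't bloat, and reject extreme cases outright
    # as a hard limit.
    size = node_count(program)
    if size > max_nodes:
        return float("-inf")
    return raw_fitness(program) - penalty * size
```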
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection).
-David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data by varying parameters on a very simple problem, and to link given operators and parameter values (such as mutation rates, etc.) to given results as a function of population size, etc.
Once I started getting into GAs a bit more, I realized that, given the enormous number of variables, this is a huge task and generalization is extremely difficult.
Speaking from my (limited) experience: suppose you simplify the problem, fix the implementations of crossover and selection, and just play with population size and mutation rate (implemented in a given way), trying to come up with general results. You'll soon realize that too many variables are still in play, because at the end of the day the number of generations after which you will statistically get a decent result (however you want to define "decent") still depends primarily on the problem you're solving, and consequently on the genome size; representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!
It is certainly possible to draft a set of guidelines, as the (rare but good) literature proves, but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in the exact same way and the fitness is evaluated in a somehow equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthy to start a GP run with less than a million individuals in the GP population ;-)
As mentioned before, it depends on your personal taste and experience, your resources, and probably the GP system used!
Cheers,
Jan