Genetic programming for objective / fitness function determination - genetic-algorithm

I hope this question is appropriate. I'm looking for a solution to a genetic programming implementation I have in mind, but I'm still new to AI and have no clue where or how to solve the problem.
So I recently started experimenting with genetic algorithms (finding optimal solutions), and have now started looking at genetic programming (finding the optimal program that produces solutions). Mutation, crossover, etc. on decision trees are relatively straightforward to me, but I'm still struggling to wrap my head around how to frame problems for implementation.
If a genetic program can find the mathematical function relating a set of input values to their outputs, then, taking an optimization problem (say, bin packing), how can you use a genetic program to determine the best objective/fitness function for that problem? You should then be able to use that generated fitness function (from the GP) in a GA to find the optimal solution to the problem, in such a manner that your GA performs more robustly.
If I understand this correctly, how would the GP process be feasible? Is it a commonly used technique? Is the approach related to fitness landscape evaluation for optimization problems, or something else? Any help will be greatly appreciated. I know there's a solution; I just don't know where or how to search for it, or what it's referred to as.
Thank you in advance.

... [I] am still struggling to wrap my head around the implementation of problems.
... how can you use a genetic program to determine the most optimal objective / fitness function for the problem?
In evolutionary algorithms (GA and GP in particular) a problem is essentially defined by its fitness function. In such a context it does not make sense to talk about an automated way of finding the fitness function, because that translates to finding a problem, which doesn't make any sense to me.
If your idea lies on a different plane and you think it still makes sense, please try to clarify it further.

For the purpose of answering werediver's comments, and to provide more context for the specified problem (which can't be summed up in a comment), I can partially answer my own question, although the implementation still looks extremely complex for the "average skill level" in evolutionary programming.
Nevertheless, in case someone else finds this useful: after reading a whole bunch of journal articles, my thoughts point to fitness landscape analysis (landscapes, Surveying landscape correlation) as applied to optimization problems.
I interpret it as implementing a number of fitness landscape techniques, such as entropy, autocorrelation, correlation length, fitness clouds, evolvability, etc. (1, 2, 3, 4), from which one "should" be able to calculate a number of landscape coefficients and incorporate them into the fitness function for your GP. The GP therefore basically generates a fitness function for your optimization problem (by means such as symbolic regression) and optimizes it based on the assessed fitness landscape analysis, so as to change the objective function.
Because, as all the literature states, the quality of your fitness function changes your search landscape, and correspondingly influences the performance of optimization algorithms such as GAs and PSOs (fitness landscapes are quantified by means of "ruggedness", "deception", etc.).
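To make one of these coefficients concrete, here is a minimal Python sketch of the lag-1 autocorrelation of fitness values sampled along a random walk, one of the standard ruggedness measures; the bit-string landscape, noise level, and walk length are illustrative assumptions of mine, not taken from any of the cited papers.

```python
import random

def random_walk_fitness(fitness, start, neighbour, steps=2000):
    """Record fitness values along a random walk through the search space."""
    x, series = start, []
    for _ in range(steps):
        series.append(fitness(x))
        x = neighbour(x)
    return series

def autocorrelation(series, lag=1):
    """Estimate the lag-k autocorrelation of a fitness series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((f - mean) ** 2 for f in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

# Toy landscape on bit strings: a smooth part plus noise for ruggedness.
def fitness(bits):
    return sum(bits) + random.gauss(0.0, 0.5)

def flip_one_bit(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

walk = random_walk_fitness(fitness, [0] * 32, flip_one_bit)
print("lag-1 autocorrelation:", autocorrelation(walk))
# Values near 1.0 suggest a smooth landscape; values near 0.0, a rugged one.
```

From the lag-1 value a correlation length can be estimated as -1/ln(rho(1)); coefficients like these are the kind of quantities the GP's fitness function would consume.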
Another image which I found quite useful (putting things in perspective) illustrated three defined methodologies for optimizing objective functions via fitness landscape analysis.
On the programming side, the two most popular software libraries I could find so far (though the implementation complexity of either is still unknown to me) are:
Paradiseo metaheuristics, with fitness landscape ref doc (C++)
HeuristicLab, automatic fitness analysis ref doc (C# / C++)
So if anyone else has some valid info or experience to offer, it would be very interesting to hear your perspective; from what I've read, this is not at all an "easy or transparent" approach.

Related

How do I implement the Gaussian mutation method in my genetic algorithm?

I am doing a research paper on a specific genetic algorithm and wanted to analyse the influence of using the Gaussian mutation method. However, the only thing I understand is that I have to sample a random Gaussian value and add that to the gene. I have read somewhere on the internet that the mean should be 0, which I understand; this gives us negative as well as positive values. However, I have not found a single source that gives an example of what the standard deviation should be or how it should be calculated.
Does anyone know how the standard deviation is determined in the Gaussian mutation method, so I can derive a value from it?
I have read the question and answers of this question here on StackOverflow, but it does not provide any details regarding my problem.
What a reasonable (or even the optimal) mutation strength is depends on the problem to be solved.
Usually you apply genetic algorithms to very hard optimization problems for which the usual optimization algorithms fail. You can imagine the possible "solutions" to such an optimization problem as a fitness landscape with high peaks for good solutions and valleys for bad ones.
So if your problem corresponds to a landscape with many peaks of similar height spread out widely (how would you know?), you should use a broad Gaussian distribution so that your chance of finding the highest peak is greater. If, however, you believe you already have a pretty good solution (whatever that is), you could use a narrower distribution to find the maximum faster.
So a reasonable approach is to start with a broad distribution and let the population develop towards a (local) maximum by reducing the distribution width over time.
Again, the concrete numerical values must be derived from the problem.
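To make the idea concrete, here is a minimal sketch of Gaussian mutation with an annealed standard deviation; the initial sigma, decay factor, and per-gene mutation rate are illustrative assumptions that must be tuned to the problem, and the sketch omits selection and crossover entirely.

```python
import random

def gaussian_mutate(genome, sigma, rate=0.1):
    """Add N(0, sigma) noise to each gene with probability `rate`."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in genome]

sigma, decay = 1.0, 0.99            # start broad, then narrow the search
genome = [random.uniform(-5.0, 5.0) for _ in range(10)]
for generation in range(500):
    genome = gaussian_mutate(genome, sigma)
    sigma *= decay                  # anneal the mutation strength
```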
EDIT:
If you want to play with the effects a little, you can download my free iPhone/iPad app "Steinertree", which shows the effects of varying mutation strength and population size.

A good parameter optimization algorithm for a limited number of points with variance

I'm trying to meta-optimize an algorithm which has almost a dozen constants. I guess some form of genetic algorithm should be used. However, the algorithm itself is quite heavy and probabilistic by nature (a version of ant colony optimization). Thus, calculating the fitness for a given set of parameters is quite slow and the results include a lot of variance. Even the order of magnitude of some of the parameters is not exactly clear, so the distribution for some components will likely need to be logarithmic.
Would someone have ideas about suitable algorithms for this problem? I.e., it would need to converge with a limited number of measurement points and also be able to handle randomness in the measured fitness. Also, the easier it is to implement in Java, the better, of course. :)
If you can express your model algebraically (or as differential equations), consider trying derivative-based optimization methods. These have the theoretical properties you desire and are much more computationally efficient than black-box/derivative-free optimization methods. If you have a MATLAB license, give fmincon a try. Note: fmincon will work much better if you supply derivative information. Other modeling environments include Pyomo, CasADi, and Julia/JuMP, which will automatically calculate derivatives and interface with powerful optimization solvers.
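Not fmincon itself, but the same idea in open-source form: a minimal sketch using scipy.optimize.minimize with a hand-supplied gradient. The quadratic objective is just a stand-in for a model you can express algebraically.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Placeholder for an algebraic model of your algorithm's quality.
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def gradient(x):
    # Supplying the derivative lets the solver converge in few evaluations.
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

result = minimize(objective, x0=np.zeros(2), jac=gradient, method="BFGS")
print(result.x)   # converges to [1, -2] in a handful of evaluations
```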

Optimum finding in genetic algorithms

I am working on my M.Sc. dissertation, and I have a big problem with the theoretical aspect of my thesis.
Suppose we want to use genetic algorithms.
We have 2 kinds of functions:
a) functions that have relations like this: ||x1 - x2|| >> ||f(x1) - f(x2)||
for example: y = (1/10)x^2
b) functions that have relations like this: ||x1 - x2|| << ||f(x1) - f(x2)||
for example: y = x^2
My question is: which of the above kinds of functions presents more difficulty when we want to use genetic algorithms to find an optimum (never mind whether a MINIMUM or MAXIMUM)?
Thanks a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness-proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-based selection, as long as the ordering relations between individuals hold, there will be no effect.
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x) = x^2, converging as fast as possible is probably great, because there's only one optimum, so find it as soon as possible. For a more complex function, slower convergence can be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.
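To illustrate the selection-scheme point with a minimal sketch (the fitness values are arbitrary): translating all fitnesses flattens roulette-wheel probabilities, while tournament selection, which only compares order, behaves identically before and after.

```python
import random

def roulette_probs(fitnesses):
    """Fitness-proportional selection probabilities."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def tournament_pick(fitnesses, k=2):
    """Pick the best of k random contenders; depends only on ordering."""
    contenders = random.sample(range(len(fitnesses)), k)
    return max(contenders, key=lambda i: fitnesses[i])

fit = [1.0, 2.0, 3.0]
shifted = [f + 100.0 for f in fit]   # same ordering, different range

print(roulette_probs(fit))       # [0.167, 0.333, 0.500]
print(roulette_probs(shifted))   # [0.330, 0.333, 0.337] -- nearly uniform
# tournament_pick gives identical behaviour on fit and shifted, because
# the argmax is invariant under any order-preserving transformation.
```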

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow (on the order of 1 minute) and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve performance by making the random parameter chooser less likely to pick parameters close to ones that produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there -- e.g., how many dimensions in the search space? Are they continuous or discrete -- e.g., real numbers, integers, or just a few possible values?
Approaches that I've seen used for this kind of problem have a similar overall structure: take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points and probabilistically move some to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space and probabilistically move each particle randomly; if a move is an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbour information as above, you take the best results each time and "cross-breed" them, hoping to get the best characteristics of each.
The Wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations of all of the above out there that you can use or take as a starting point.
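In the spirit of that pseudocode, here is a minimal simulated-annealing sketch for a continuous parameter vector; the neighbour step size, starting temperature, and cooling rate are illustrative assumptions to be tuned per problem.

```python
import math
import random

def simulated_annealing(cost, x0, t0=1.0, cooling=0.995, steps=10000):
    x, fx = list(x0), cost(x0)
    best, fbest, t = list(x0), fx, t0
    for _ in range(steps):
        # Propose a neighbour by jittering one coordinate.
        y = list(x)
        i = random.randrange(len(y))
        y[i] += random.gauss(0.0, 0.1)
        fy = cost(y)
        # Always accept improvements; accept worsenings with Boltzmann
        # probability so the walker can escape local minima.
        if fy < fx or random.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling
    return best, fbest

best, fbest = simulated_annealing(lambda v: sum(vi * vi for vi in v),
                                  [random.uniform(-5, 5) for _ in range(5)])
print(best, fbest)
```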
Note that all of these (and really any approach to this high-dimensional search problem) are heuristics, which means they have parameters that have to be tuned to your particular problem. That can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit: since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized, with OpenMP or something similar, to make use of as many cores as you have on your machine.
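A minimal sketch of that parallelization in Python (a process pool rather than OpenMP, since the evaluations are independent either way); expensive_fitness is a placeholder for the real one-minute evaluation.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def expensive_fitness(params):
    # Placeholder objective standing in for the slow real evaluation.
    return sum((p - 1.0) ** 2 for p in params)

if __name__ == "__main__":
    population = [[random.uniform(-5.0, 5.0) for _ in range(8)]
                  for _ in range(32)]
    with ProcessPoolExecutor() as pool:      # one worker per core by default
        scores = list(pool.map(expensive_fitness, population))
    best_score, best_params = min(zip(scores, population))
    print(best_score, best_params)
```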
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings-like approach with multiple walkers and simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes, this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there while you want to minimize: the figure of merit can be negated or inverted.
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed.)
I've also tried an algorithm that does the following:
Pick a random point and a random direction.
Evaluate the function.
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move in an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted from this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm simply continued in that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions was attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it, and restarted the whole process from a new random point. A sketch of this procedure is given below.
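Here is a sketch of that procedure, assuming minimization of a continuous function; generating the random orthogonal directions via numpy's QR decomposition stands in for the linked orthogonal-matrix code, and the acceleration and halving constants are assumptions.

```python
import numpy as np

def random_orthogonal_directions(dim, rng):
    """Rows of Q^T from a QR decomposition form a random orthonormal basis."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return list(q.T)

def local_search(f, x, rng, step=1.0, min_step=1e-6):
    fx = f(x)
    while step > min_step:
        improved = False
        for d in random_orthogonal_directions(len(x), rng):
            for sd in (d, -d):                 # try both senses of each axis
                s = step
                fy = f(x + s * sd)
                while fy < fx:                 # keep going while improving,
                    x, fx = x + s * sd, fy
                    s *= 2.0                   # speeding up each iteration
                    improved = True
                    fy = f(x + s * sd)
        if not improved:
            step /= 2.0                        # nothing helped: shrink the jump
    return x, fx                               # likely a local minimum; restart

rng = np.random.default_rng()
x, fx = local_search(lambda v: float(np.sum((v - 3.0) ** 2)),
                     rng.uniform(-10.0, 10.0, size=5), rng)
print(x, fx)
```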
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course, my implementations of S.A. and P.S.O. could well be flawed; these are tricky algorithms with a lot of room for parameter tweaking. But I just thought I'd mention what ended up working best for me.
I can't really help you with finding an algorithm for your specific problem.
However, with regard to the random choosing of parameters, I think what you are looking for is genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting the candidates which are the best fit (so far) for the problem, and randomly mutating/combining them to generate the next generation, from which again the best are selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't produce bad inputs, "small" being somewhat problem-specific), this would work reasonably well for your problem. A minimal sketch of such a loop follows.
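This sketch assumes a continuous minimization problem; the population size, elite count, mutation scale, and generation count are illustrative assumptions.

```python
import random

def evolve(fitness, dim, pop_size=50, elite=10, generations=200):
    """Minimal generational GA: keep the elite, breed the rest from it."""
    pop = [[random.uniform(-5.0, 5.0) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                   # lower fitness = better here
        parents = pop[:elite]                   # select the best so far
        children = []
        while len(children) < pop_size - elite:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, dim)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + random.gauss(0.0, 0.1) for g in child]  # mutate
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve(lambda x: sum(g * g for g in x), dim=6)
print(best)
```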
There is no general way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly stated here.
Some things to know, however: 1 min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter-testing time down to something reasonable
really try to work out your parameters by hand and mind; there must be some redundancy and at least some sanity check so you can test your case in <1 min
for possible result sets, try to figure out some 'operations' that modify a solution slightly instead of just randomizing it. For example, in TSP one basic operator is lambda, which swaps two nodes and thus creates a new route (see the sketch after this list); yours could be shifting some number up/down by some value.
then find yourself some nice algorithm; your starting point can be somewhere here. The book is an invaluable resource for anyone who starts with problem solving.
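As an illustration of that third point, a minimal sketch of such a "slight modification" operator, using the TSP node swap as the example (the 10-city route is arbitrary):

```python
import random

def swap_two_nodes(route):
    """Return a new route with two randomly chosen cities exchanged."""
    i, j = random.sample(range(len(route)), 2)
    new_route = list(route)
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route

route = list(range(10))          # cities 0..9 in visiting order
print(swap_two_nodes(route))     # e.g. [0, 7, 2, 3, 4, 5, 6, 1, 8, 9]
```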

Initial Genetic Programming Parameters

I did a little GP work in college (note: very little) and have been playing around with it recently. My question is in regard to the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain, in particular the nature of the fitness function, your implementation DSL, etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function; I think this is because the growth of sub-groups in the population over successive generations gives more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy ones.
For the number of generations it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if there is no improvement then you probably have an issue elsewhere.
Tree depth requirements really depend on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about...). I've also found total node counts of ~1000 to be quite a useful hard limit.
Percentages for the different mutation/recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities. An illustrative configuration collecting these rules of thumb follows.
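A sketch of such a configuration; every number here is an assumption to be tuned per problem, not a recommendation from any specific GP package.

```python
# Hypothetical run settings collecting the rules of thumb above.
gp_config = {
    "population_size": 1000,     # ~100 for deterministic, 1000+ for noisy fitness
    "max_generations": 300,      # but stop early on target fitness / stagnation
    "target_fitness": 0.99,
    "stagnation_window": 50,     # bail out if no improvement for this many gens
    "max_node_count": 1000,      # hard cap instead of strict depth limits
    "runtime_penalty": True,     # penalise programs that run too long
    "operator_weights": {        # a balanced mix; the exact split rarely matters
        "crossover": 0.6,
        "subtree_mutation": 0.2,
        "point_mutation": 0.1,
        "reproduction": 0.1,
    },
}
```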
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
"Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection)."
-David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data while varying parameters on a very simple problem, and link given operators and parameter values (such as mutation rates, etc.) to given results as a function of population size, etc.
Once I started getting into GA a bit more, I realized that, given the enormous number of variables, this is a huge task, and generalization is extremely difficult.
Speaking from my (limited) experience: if you decide to simplify the problem, use a fixed way of implementing crossover and selection, and just play with population size and mutation rate (implemented in a given way) while trying to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (however you want to define "decent") still depends primarily on the problem you're solving, and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).
It is certainly possible to draft a set of guidelines, as the (rare but good) literature proves, but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in the exact same way and the fitness is evaluated in an equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community: some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthwhile to start a GP run with fewer than a million individuals in the GP population ;-)
As mentioned before, it depends on your personal taste and experience, your resources, and probably the GP system used!
Cheers,
Jan
