Linear fitness scaling in Genetic Algorithm produces negative fitness values

I have a GA with a fitness function that can evaluate to negative or positive values. For the sake of this question let's assume the function
u = 5 - (x^2 + y^2)
where
x in [-5.12 .. 5.12]
y in [-5.12 .. 5.12]
Now in the selection phase of the GA I am using a simple roulette wheel. Since roulette wheel selection requires the fitness of every individual in the population to be positive, I started looking for scaling solutions. The most natural seems to be linear fitness scaling. It should be pretty straightforward; for example, look at this implementation. However, I am getting negative values even after linear scaling.
For example for the above mentioned function and these fitness values:
-9.734897 -7.479017 -22.834280 -9.868979 -13.180669 4.898595
after linear scaling I am getting these values
-9.6766040 -11.1755111 -0.9727897 -9.5875139 -7.3870793 -19.3997490
Instead, I would like to scale them to positive values, so I can do roulette wheel selection in the next phase.
I must be doing something fundamentally wrong here. How should I approach this problem?

The main mistake was that the input to linear scaling must already be positive (by definition), whereas I was also feeding it negative values.
The talk about negative values is not about input to the algorithm, but about output (scaled values) from the algorithm. The check is to handle this case and then correct it so as not to produce negative scaled values.
if(p->min > (p->scaleFactor * p->avg - p->max)/
            (p->scaleFactor - 1.0)) { /* if nonnegative smin */
    d = p->max - p->avg;
    p->scaleConstA = (p->scaleFactor - 1.0) * p->avg / d;
    p->scaleConstB = p->avg * (p->max - (p->scaleFactor * p->avg))/d;
} else { /* if smin becomes negative on scaling */
    d = p->avg - p->min;
    p->scaleConstA = p->avg/d;
    p->scaleConstB = -p->min * p->avg/d;
}
On the image below, if f'min is negative, go to else clause and handle this case.
Well, the solution is then to prescale the above-mentioned function so that it gives only non-negative values. As Hyperboreus suggested, this can be done by subtracting the smallest possible value of u (equivalently, adding its absolute value), which is
u_min = 5 - (2*5.12^2) = -47.4288
It is best if we separate real fitness values that we are trying to maximize from scaled fitness values that are input to selection phase of GA.
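A minimal Python sketch of this two-step approach, assuming the known lower bound of u on the stated domain is used as the offset. The names `U_MIN` and `linear_scale` and the scale factor of 2.0 are illustrative assumptions, not part of the original code; the branch logic mirrors the C snippet above.

```python
def linear_scale(raw, scale_factor=2.0):
    """Linear scaling f' = a*f + b for non-negative raw fitness values."""
    f_min, f_max = min(raw), max(raw)
    f_avg = sum(raw) / len(raw)
    if f_max == f_min:
        return list(raw)  # all equal: nothing to scale
    if f_min > (scale_factor * f_avg - f_max) / (scale_factor - 1.0):
        # Scaled minimum stays non-negative: map the best to scale_factor * avg.
        d = f_max - f_avg
        a = (scale_factor - 1.0) * f_avg / d
        b = f_avg * (f_max - scale_factor * f_avg) / d
    else:
        # Scaled minimum would go negative: map the worst to 0 instead.
        d = f_avg - f_min
        a = f_avg / d
        b = -f_min * f_avg / d
    return [a * f + b for f in raw]

# Smallest possible value of u = 5 - (x^2 + y^2) on [-5.12, 5.12]^2.
U_MIN = 5 - 2 * 5.12 ** 2

raw = [-9.734897, -7.479017, -22.834280, -9.868979, -13.180669, 4.898595]
shifted = [u - U_MIN for u in raw]   # prescaled, now non-negative
scaled = linear_scale(shifted)       # input to roulette wheel selection
```

The raw values stay available for reporting the real objective, while only `scaled` is passed to selection.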

I agree with the previous answer. Linear scaling by itself tries to preserve the average fitness value, so it needs to be offset if the function is negative. For more details, please have a look in Goldberg's Genetic Algorithms book (1989), Chapter 7, pp. 76-79.

The smallest possible value of u is 5 - (2*5.12^2). Why not just subtract this from your u (that is, add its absolute value)?


Implement density function

I am going through my book, and it says: "Write a sampling algorithm for this density function"
y = x^2 + (2/3)*x + 1/3; 0 < x < 1
Or I can use Monte Carlo?
Any help would be appreciated!
I'm assuming you mean you want to generate random x values that have the distribution specified by density y(x).
It's often desirable to derive the cumulative distribution function by integrating the density, and use inverse transform sampling to generate x values. In your case the CDF is a third-order polynomial which doesn't factor to yield a simple cube-root solution, so you would have to use a numerical solver to find the inverse. Time to consider alternatives.
Another option is to use the acceptance/rejection method. After checking the derivative, it's clear that your density is convex, so it's easy to create a bounding function b(x) by drawing a straight line from f(0) to f(1). This yields b(x) = 1/3 + 5x/3. This bounding function has area 7/6, while your f(x) has an area of 1, since it is a valid density. Consequently, 6/7 of points generated uniformly under b(x) will also fall under f(x), and only 1 out of 7 attempts will fail in the rejection scheme. Here's a plot of f(x) and b(x):
Since b(x) is linear, it is easy to generate x values using it as a distribution after scaling by 6/7 to make it a valid distribution function. The algorithm, expressed in pseudocode, then becomes:
function generate():
    while TRUE:
        x <- (sqrt(1 + 35 * U(0,1)) - 1) / 5   # inverse CDF transform of b(x)
        if U(0, b(x)) <= f(x):
            return x
    end while
end function
where U(a,b) means generate a value uniformly distributed between a and b, f(x) is your density, and b(x) is the bounding function described above.
I implemented the algorithm described above to generate 100,000 candidate values, of which 14,199 (~1/7) were rejected, as expected. The end results are presented in the following histogram, which you can compare to f(x) in the plot above.
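The pseudocode above translates almost line for line into Python. This is a sketch using only the standard library; the seed and the sample count of 20,000 are arbitrary choices made for illustration.

```python
import math
import random

def f(x):
    """Target density on (0, 1)."""
    return x * x + (2.0 / 3.0) * x + 1.0 / 3.0

def b(x):
    """Linear bounding function through f(0) and f(1)."""
    return 1.0 / 3.0 + 5.0 * x / 3.0

def generate(rng=random):
    """Draw one value from f by acceptance/rejection under b."""
    while True:
        # Inverse CDF transform of b(x) normalised to a density.
        x = (math.sqrt(1.0 + 35.0 * rng.random()) - 1.0) / 5.0
        # Accept if a uniform height under b(x) also falls under f(x).
        if rng.uniform(0.0, b(x)) <= f(x):
            return x

rng = random.Random(12345)
samples = [generate(rng) for _ in range(20000)]
```

As a sanity check, the sample mean should be close to the exact mean of the density, which is 1/4 + 2/9 + 1/6 = 23/36.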
I'm assuming that you have a function y(x), which takes a value between [0,1] and returns the value of y. You just need to provide a random value of x and return the corresponding value of y.
import numpy

def getSample():
    # get uniform random number
    x = numpy.random.random()
    # sample my custom function (y is assumed to be defined elsewhere)
    return y(x)

Matlab - Least Squares data fitting - Cost function with extra constraint

I am currently working on some MatLab code to fit experimental data to a sum of exponentials following a method described in this paper.
According to the paper, the data has to follow the following equation (written in pseudo-code):
y = sum(v(i)*exp(-x/tau(i)),i=1..n)
Here tau(i) is a set of n predefined constants. The number of constants also determines the size of the summation, and hence the size of v. For example, we can try to fit a sum of 100 exponentials, each with a different tau(i) to our data. However, due to the nature of the fitting and the exponential sum, we need to add another constraint to the problem, and hence to the cost function of the least-squares method used.
Normally, the cost function of the least-squares method is given by:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2
And this has to be minimized. However, to prevent over-fitting that would make the time-constant spectrum extremely noisy, the paper adds the following constraint to the cost function:
|v(i) - v(i+1)|^2
Because of this extra constraint, as far as I know, the regular algorithms, like lsqcurvefit aren't useable any longer, and I have to use fminsearch to search the minimum of my least-squares cost function with a constraint. The function that has to be minimized, according to me, is the following:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2 + sum(|v(j) - v(j+1)|^2,j=1..n-1)
My attempt to code this in MatLab is the following. Initially we define the function in a function script, then we use fminsearch to actually minimize the function and get values for v.
function res = funcost( v )
    %FUNCOST Definition of the function that has to be minimised
    %We define a function yvalues with 2 exponentials with known time-constants
    % so we know the result that should be given by minimising.
    xvalues = linspace(0,50,10000);
    yvalues = 3-2*exp(-xvalues/1)-exp(-xvalues/10);
    %Definition of 30 equidistant points on the logarithmic scale
    terms = 30;
    termsvector = [1:terms];
    tau = termsvector;
    for i = 1:terms
        tau(i) = 10^(-1+3/terms*i);
    end
    %Definition of the regular function
    res_1 = 3;
    for i=1:terms
        res_1 = res_1 + v(i).*exp(-xvalues./tau(i));
    end
    res_1 = res_1-yvalues;
    %Added constraint
    k=1;
    res_2=0;
    for i=1:terms-1
        res_2 = res_2 + (v(i)-v(i+1))^2;
    end
    res=sum(res_1.*res_1) + k*res_2;
end
fminsearch(@funcost,zeros(30,1),optimset('MaxFunEvals',1000000,'MaxIter',1000000))
However, this code is giving me inaccurate results (no error, just inaccurate results), which leads me to believe I either made a mistake in the coding or in the interpretation of the added constraint for the least-squares method.
I would try to introduce the additional constraint in the following way:
res_2 = max((v(1:(end-1))-v(2:end)).^2);
i.e. instead of minimizing an integrated (summed up) error, it does a minimax.
You may also make this constraint stiff by
if res_2 > some_number
    k = a_very_big_number;
else
    k = 0; % or k = a_small_number
end;
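One observation that may help with the accuracy problem: the summed cost from the question is quadratic in v, so it can also be minimised directly as an ordinary linear least-squares problem on an augmented system (Tikhonov-style regularisation) rather than with a derivative-free search. The NumPy sketch below swaps in that technique; it is not the fminsearch approach from the question. The grid of 500 points and the names `A`, `D`, `target` and `total_cost` are assumptions made for illustration.

```python
import numpy as np

# Synthetic data as in the question, with the constant offset 3 removed.
x = np.linspace(0, 50, 500)
y_data = 3 - 2 * np.exp(-x / 1.0) - np.exp(-x / 10.0)
target = y_data - 3

n = 30
tau = 10.0 ** (-1 + 3.0 / n * np.arange(1, n + 1))  # log-spaced time constants
A = np.exp(-x[:, None] / tau[None, :])              # A[j, i] = exp(-x_j / tau_i)
D = np.diff(np.eye(n), axis=0)                      # (D @ v)[i] = v[i+1] - v[i]
k = 1.0                                             # weight of the smoothness term

def total_cost(v):
    """Data misfit plus k times the summed squared neighbour differences."""
    return np.sum((A @ v - target) ** 2) + k * np.sum(np.diff(v) ** 2)

# Minimising ||A v - target||^2 + k ||D v||^2 is least squares on stacked rows.
A_aug = np.vstack([A, np.sqrt(k) * D])
b_aug = np.concatenate([target, np.zeros(n - 1)])
v_opt, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
```

Because the problem is convex, `v_opt` is a global minimiser of `total_cost`, which makes it a useful reference when checking the output of an iterative search over 30 unknowns.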

Genetic/Evolutionary algorithm - Painter

My task:
Create a program to copy a picture (given as input) using primitives only (like triangle or something). The program should use evolutionary algorithm to create output picture.
My question:
I need to invent an algorithm to create populations and check them (how much - in % - they match the input picture).
I have an idea; you can find it below.
So what I want from you: advice (if you find my idea not so bad) or inspiration (maybe you have a better idea?)
My idea:
Let's say that I'll use only triangles to build the output picture.
My first population is P pictures (generated by using T randomly generated triangles - called Elements).
I evaluate every picture in the population with my fitness function, keep E of them as the elite, and remove the rest of the population:
To compare 2 pictures we check every pixel in picture A and compare its R,G,B with
the same pixel (the same coordinates) in picture B.
I use this:
SingleDif = sqrt[ (Ar - Br)^2 + (Ag - Bg)^2 + (Ab - Bb)^2]
Then I sum all the differences (from all pixels), let's call it SumDif,
and use:
PictureDif = (DifMax - SumDif)/DifMax
where
DifMax = pictureHeight * pictureWidth * 255*3
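The comparison above can be sketched with NumPy as follows, assuming both pictures are arrays of shape (height, width, 3) with channel values in 0..255. The function name `picture_dif` is made up for this sketch. Note that the largest possible SingleDif is 255*sqrt(3), so with DifMax as defined the score never drops below about 0.42.

```python
import numpy as np

def picture_dif(a, b):
    """Similarity in [0, 1]; 1.0 means the two pictures are identical."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-pixel Euclidean distance in RGB space (SingleDif).
    single_dif = np.sqrt(np.sum((a - b) ** 2, axis=-1))
    sum_dif = single_dif.sum()
    height, width = a.shape[:2]
    dif_max = height * width * 255 * 3  # normaliser as given in the question
    return (dif_max - sum_dif) / dif_max

black = np.zeros((2, 2, 3))
white = np.full((2, 2, 3), 255.0)
same = picture_dif(black, black)      # identical pictures
opposite = picture_dif(black, white)  # worst case
```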
The best are used to create the next population in this way:
picture MakeChild(picture Mother, picture Father)
{
    picture child;
    for( int i = 0; i < T; ++i )
    {
        j //this is a random number from 0 to 1 - created now
        if( j < 0.5 ) child.element(i) = Mother.element(i);
        else child.element(i) = Father.element(i);
        if( j < some small % ) mutate( child.element(i) );
    }
    return child;
}
So it's quite simple. Only the mutation needs a comment: So there is always some small probability that element X in child will be different than X in his parent. To do this we make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node).
So this is my idea... I didn't test it, didn't code it.
Please check my idea - what do you think about it?
I would make the number of patches of each child dynamic and get the mutation operation to insert/delete patches with some (low) probability. Of course this could result in a lot of redundancy and bloat in the child's genome. In these situations, it is usually a good idea to use the length of an individual's genome as a parameter of the fitness function so that individuals get rewarded (with a higher fitness value) for using fewer patches. So for example if the PictureDif of individuals A and B are the same but the A has fewer patches than B, then A has a higher fitness.
Another issue is the reproductive operator that you proposed (namely, the crossover operation). In order for the evolutionary process to work efficiently, you need to achieve a reasonable exploration and exploitation balance. One way of doing this is by having a set of reproductive operators that exhibit a good fitness correlation [1] which means the fitness of a child must be close to the fitness of its parent(s).
In the case of single parent reproduction you only need to find the right mutation parameters. However, when it comes to multi-parent reproduction (crossover) one of the frequently used techniques is to produce 2 children (instead of 1) from the same 2 parents. For the first child, each gene comes from the mother with the probability of 0.2 and from the father with the probability of 0.8, and for the second child the other way around. Of course after the crossover, you can do the mutation.
Oh and one more thing, for the mutation operators, when you say
... make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node)
it's a good idea to use a Gaussian distribution to change the colour, coordinate etc.
[1] Evolutionary Computation: A unified approach by Kenneth A. De Jong, page 69
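A short Python sketch of the two suggestions in this answer: producing two children from the same parents with a 0.2/0.8 gene split, and mutating with Gaussian noise. It assumes a genome is a flat list of floats (for example triangle coordinates and colour channels); the function names, the mutation rate and sigma are illustrative choices. Here the second child receives exactly the genes the first one did not, which is one way to read "the other way around".

```python
import random

def make_children(mother, father, p_mother=0.2, rng=random):
    """Uniform crossover: child1 takes each gene from mother with p_mother,
    child2 takes the complementary gene."""
    child1, child2 = [], []
    for m, f in zip(mother, father):
        if rng.random() < p_mother:
            child1.append(m)
            child2.append(f)
        else:
            child1.append(f)
            child2.append(m)
    return child1, child2

def mutate_gaussian(genes, rate=0.05, sigma=5.0, rng=random):
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in genes]

rng = random.Random(1)
mother = [0.0] * 10
father = [1.0] * 10
c1, c2 = make_children(mother, father, rng=rng)
c1 = mutate_gaussian(c1, rng=rng)
```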

genetic algorithm handling negative fitness values

I am trying to implement genetic algorithm for maximizing a function of n variables. However the problem is that the fitness values can be negative and I am not sure about how to handle negative values while doing selection. I read this article Linear fitness scaling in Genetic Algorithm produces negative fitness values
but it's not clear to me how the negative fitness values were taken care of and how scaling factors a and b were calculated.
Also, from the article I know that roulette wheel selection only works for positive fitness values. Is it the same for tournament selection as well?
When you have negative values, you could try to find the smallest fitness value in your population and add its opposite to every value. This way you will no longer have negative values, while the differences between fitness values will remain the same.
Tournament selection is not affected by this problem. It simply compares the fitness values of a uniformly sampled subset of size n of the population and takes the one with the best value. Still of course this means that, if you sample without repetition then the worst n-1 individuals will never get selected. If you sample with repetition they have a chance of being selected.
As with proportional selection: It doesn't work with negative fitness values. You can only apply "windowing" or "scaling" of your fitness values in which case they work again.
I once programmed some sampling methods as extension methods for C#'s IEnumerable among them is a SampleProportional and SampleProportionalWithoutRepetition extension method. They're part of HeuristicLab under GPL license.
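A minimal tournament selection sketch in Python, to show why negative fitness values are harmless here: only comparisons between fitness values are used, never their sum or ratio. The function name and parameters are illustrative and are not taken from HeuristicLab.

```python
import random

def tournament_select(fitness, size=3, with_replacement=False, rng=random):
    """Return the index of the best individual among `size` random contenders."""
    n = len(fitness)
    if with_replacement:
        contenders = [rng.randrange(n) for _ in range(size)]
    else:
        contenders = rng.sample(range(n), size)
    # Only the ordering of fitness values matters, so negatives are fine.
    return max(contenders, key=lambda i: fitness[i])

fitness = [-5.0, -1.0, -10.0]
winner = tournament_select(fitness, size=3)
```

With sampling without repetition and a tournament as large as the population, the best individual always wins, so `winner` is index 1 here.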
Okay, it's late to answer, but still someone could google it.
First of all: yes, you can use negative fitness. But I strongly suggest you don't, because I did it and ran into a lot of problems (it is still doable, but really not recommended). Here's the explanation:
Say you have a population of N creatures. After simulation they all have some fitness value f(n), where f is the fitness and n is the creature number. After this you want to build some probability distribution to determine which creatures should be killed (of course you could just delete, say, the worst 40% of creatures, but it is better to use a distribution). How do you build such a distribution? Say f(a) = 50 and f(b) = 100, so creature b is 2 times better than creature a, and you probably want the survival probability of creature b to be 2 times higher than that of creature a (which makes sense if your fitness value is linear). In case you wonder how to do it:
Let's say that sum( f(n) ) is the sum of all fitness values. Then
survival probability p(a) of creature a is:
p(a) = f(a) / sum( f(n) )
This will do the trick.
But now let's allow negative fitness. Say f(a) = 50, f(b) = 100, f(c) = -1000. b is again 2 times better than a, which makes sense, but is it -10 times better than c? That doesn't make sense. The answer above suggested adding the opposite of the worst fitness value, which can sort of "fix" the situation, but really it doesn't (I made the same mistake before). Okay, let's add 1000 to all fitness values:
f(a) = 1050, f(b) = 1100, f(c) = 0, so the survival probability of c is zero now; okay, we can accept that. But let's compare a and b now:
b is now only about 1.05 times better than a, which means that the fitness of a and b is almost the same. That is unacceptable, because b clearly was 2 times better than a (assuming that fitness is linear, but this will mess up nonlinear fitness as well)! You can't escape this problem; it will constantly get in your way, because a probability can't be negative. So you can either remove the probability from the evolution (which is not a very good thing to do) or resort to some exceptions and tricks.
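The effect can be reproduced in a few lines of Python. This sketch simply applies p(a) = f(a) / sum( f(n) ) before and after the shift by 1000; the helper name `proportional` is made up for illustration.

```python
def proportional(fitness):
    """Selection probability proportional to fitness (all values must be >= 0)."""
    total = sum(fitness)
    return [f / total for f in fitness]

# Only a and b, both positive: b is twice as likely to survive as a.
before = proportional([50, 100])
ratio_before = before[1] / before[0]

# a, b and c = -1000 after adding 1000 to every value.
after = proportional([50 + 1000, 100 + 1000, -1000 + 1000])
ratio_after = after[1] / after[0]   # 1100 / 1050, roughly 1.05
```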
Since it was too late in my scenario to remove negative fitness, here's how I fixed things:
Once again, you have a population of N creatures. Say neg(N) gives you all creatures with negative fitness and pos(N) all creatures with positive fitness (whether zero counts as negative or positive is your call; it doesn't matter here). And let's say you need D creatures to die. Now here's the trick:
The higher f(c) (c is a pos creature), the better the creature is, so we can use its fitness to determine the probability of survival. But the lower (more negative) f(m) (m is a neg creature), the worse the creature is, so we can use its fitness to determine the probability of dying.
Now, if D is larger than the number of creatures in neg(N), then all of neg(N) will die, and the remaining D minus that number will be taken from pos(N) using a probability distribution based on the fitness of all positive creatures (probability of survival p(a) = f(a) / sum( pos(n) )). But if D is smaller than the number of creatures in neg(N), then all of pos(N) will survive, and D of the neg(N) creatures will die, chosen using a probability distribution based on the fitness of all negative creatures (probability of dying p(a) = f(a) / sum( neg(n) ); f(a) is negative, but sum( neg(n) ) is negative as well, so the probability is positive).
I know this question has been here for a long time, but in case newcomers want a way to deal with negative values when the problem is also a minimization problem, here is the code for it.
from numpy import min, sum, ptp, array
from numpy.random import uniform

list_fitness1 = array([-12, -45, 0, 72.1, -32.3])
list_fitness2 = array([0.5, 6.32, 988.2, 1.23])

def get_index_roulette_wheel_selection(list_fitness=None):
    """ It can handle negative also. Make sure your list fitness is 1D-numpy array"""
    scaled_fitness = (list_fitness - min(list_fitness)) / ptp(list_fitness)
    minimized_fitness = 1.0 - scaled_fitness
    total_sum = sum(minimized_fitness)
    r = uniform(low=0, high=total_sum)
    for idx, f in enumerate(minimized_fitness):
        r = r + f
        if r > total_sum:
            return idx

get_index_roulette_wheel_selection(list_fitness1)
get_index_roulette_wheel_selection(list_fitness2)
Make sure your fitness list is a 1D NumPy array.
Scale the fitness list to the range [0, 1].
Turn the maximization problem into a minimization problem by taking 1.0 - scaled_fitness.
Draw a random number between 0 and sum(minimized_fitness).
Keep adding elements of minimized_fitness to it until the running value exceeds the total sum.
The smaller the fitness, the bigger its value in minimized_fitness, and so the bigger its chance of pushing the running value past the total sum.
I think that the main issue people are running into here is that they're treating the fitness score improperly. Let's think about an example fitness score as the temperature inside of a truck shipping frozen goods. The truck's internal temperature should be -2 C... but that's also 28.4 F. They are the same exact fitness relative to the food staying frozen, but 2 * -2 = -4, and 2 * 28.4 is 56.8. "Two times colder" doesn't really make any sense here (-4 C != 14.2 F either). Same with fitness scores.
In the case of -1000 in Volot's example, the difference between 50 and 100 is actually comparatively low: the important thing is that you'd pick either / both of those over the -1000, which you will definitely do if you just subtract -1000 from everything. Then the next generation of children may have fitness scores of 50, 100, 200, and 10, let's say. Now the difference between 50 and 100 is much more pronounced, and 50 will have a much lower chance of getting picked. Remember genetic algorithms are iterative. It also reminds me of a saying: You don’t have to run faster than the bear to get away. You just have to run faster than the guy next to you. 50 just needs to outrun -1000 to survive to reproduce.
The problem of subtracting the min resulting in 0 can also be avoided. When estimating probability distributions, people will add 1 occurrence to every (known) possible outcome so that extremely rare events are still captured. That gets somewhat trickier with fitness scores. You can't just add 1. What if your fitness scores are 0.01, 0.02, and -0.01? After subtracting the min and adding 1 you get 1.02, 1.03, and 1.00, which will result in picking a low relative fitness a lot. You can instead add the lowest non-zero value to everything, resulting in 0.04, 0.05, and 0.02. For the -1000 case, it results in 2100, 2150, and 1050 (so everything that used to be 0 will always be half as likely as the next lowest fitness to get picked).
Still, to make things as consistent as possible with what is a more typical GA sampling method, I would only subtract the min and add back in a small amount of fitness when there are negative values. When everything is positive, there's no reason to do it.
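The scheme described in this answer can be sketched in Python as follows. The name `shift_fitness` and the fallback floor of 1.0 (used only when every value is identical and negative) are assumptions made for illustration.

```python
def shift_fitness(fitness):
    """Make fitness values usable for roulette wheel selection.

    Values are left alone when none is negative. Otherwise the minimum is
    subtracted and the lowest non-zero shifted value is added back, so the
    worst individual keeps a small chance of being picked.
    """
    lowest = min(fitness)
    if lowest >= 0:
        return list(fitness)
    shifted = [f - lowest for f in fitness]
    nonzero = [s for s in shifted if s > 0]
    floor = min(nonzero) if nonzero else 1.0  # assumption: all values equal
    return [s + floor for s in shifted]

big = shift_fitness([50, 100, -1000])      # [2100, 2150, 1050]
small = shift_fitness([0.01, 0.02, -0.01]) # about [0.04, 0.05, 0.02]
```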

How to calculate the sum of two normal distributions

I have a value type that represents a gaussian distribution:
struct Gauss {
    double mean;
    double variance;
}
I would like to perform an integral over a series of these values:
Gauss eulerIntegrate(double dt, Gauss iv, Gauss[] values) {
    Gauss r = iv;
    foreach (Gauss v in values) {
        r += v*dt;
    }
    return r;
}
My question is how to implement addition for these normal distributions.
The multiplication by a scalar (dt) seemed simple enough. But it wasn't simple! Thanks FOOSHNICK for the help:
public static Gauss operator * (Gauss g, double d) {
    return new Gauss(g.mean * d, g.variance * d * d);
}
However, addition eludes me. I assume I can just add the means; it's the variance that's causing me trouble. Either of these definitions seems "logical" to me.
public static Gauss operator + (Gauss a, Gauss b) {
    double mean = a.mean + b.mean;
    // Is it this? (Yes, it is!)
    return new Gauss(mean, a.variance + b.variance);
    // Or this? (nope)
    //return new Gauss(mean, Math.Max(a.variance, b.variance));
    // Or how about this? (nope)
    //return new Gauss(mean, (a.variance + b.variance)/2);
}
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
I suppose I could switch the code to use interval arithmetic instead, but I was hoping to stay in the world of prob and stats.
The sum of two normal distributions is itself a normal distribution:
N(mean1, variance1) + N(mean2, variance2) ~ N(mean1 + mean2, variance1 + variance2)
This is all on the Wikipedia page.
Be careful that these really are variances and not standard deviations.
// X + Y
public static Gauss operator + (Gauss a, Gauss b) {
    //NOTE: this is valid if X,Y are independent normal random variables
    return new Gauss(a.mean + b.mean, a.variance + b.variance);
}
// X*b
public static Gauss operator * (Gauss a, double b) {
    return new Gauss(a.mean*b, a.variance*b*b);
}
To be more precise:
If a random variable Z is defined as the linear combination of two uncorrelated Gaussian random variables X and Y (more precisely, X and Y independent or jointly Gaussian), then Z is itself a Gaussian random variable, i.e.:
if Z = aX + bY,
then mean(Z) = a * mean(X) + b * mean(Y), and variance(Z) = a^2 * variance(X) + b^2 * variance(Y).
If the random variables are correlated, then you have to account for that. Variance(X) is defined by the expected value E([X-mean(X)]^2). Working this through for Z = aX + bY, we get:
variance(Z) = a^2 * variance(X) + b^2 * variance(Y) + 2ab * covariance(X,Y)
If you are summing two uncorrelated random variables which do not have Gaussian distributions, then the distribution of the sum is the convolution of the two component distributions.
If you are summing two correlated non-Gaussian random variables, you have to work through the appropriate integrals yourself.
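The formulas above can be collected into a small Python sketch. The class mirrors the `Gauss` struct from the question; `linear_combination` and its `covariance` parameter are names introduced here for illustration, and the `+` operator assumes the two variables are independent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gauss:
    mean: float
    variance: float  # a variance, not a standard deviation

    def __add__(self, other):
        # Sum of two independent normal variables.
        return Gauss(self.mean + other.mean, self.variance + other.variance)

    def __mul__(self, scalar):
        # Scaling by a constant multiplies the variance by its square.
        return Gauss(self.mean * scalar, self.variance * scalar * scalar)

def linear_combination(a, x, b, y, covariance=0.0):
    """Mean and variance of Z = a*X + b*Y, with an optional covariance term."""
    return Gauss(a * x.mean + b * y.mean,
                 a * a * x.variance + b * b * y.variance + 2 * a * b * covariance)

def euler_integrate(dt, iv, values):
    """Accumulate iv + sum(v * dt), treating the values as independent."""
    r = iv
    for v in values:
        r = r + v * dt
    return r

total = Gauss(1.0, 2.0) + Gauss(3.0, 4.0)   # Gauss(mean=4.0, variance=6.0)
scaled = Gauss(1.0, 2.0) * 3.0              # Gauss(mean=3.0, variance=18.0)
```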
Well, your multiplication by scalar is wrong - you should multiply variance by the square of d. If you're adding a constant, then just add it to the mean, the variance stays the same. If you're adding two distributions, then add the means and add the variances.
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
Arguably not, as adding two distributions means different things - having worked in reliability and maintainability my first reaction from the title would be the distribution of a system's mtbf, if the mtbf of each part is normally distributed and the system had no redundancy. You are talking about the distribution of the sum of two normally distributed independent variates, not the (logical) sum of two normal distributions' effect. Very often, operator overloading has surprising semantics. I'd leave it as a function and call it 'normalSumDistribution' unless your code has a very specific target audience.
Hah, I thought you couldn't add gaussian distributions together, but you can!
http://mathworld.wolfram.com/NormalSumDistribution.html
In fact, the mean of the sum is the sum of the individual means, and the variance of the sum is the sum of the individual variances.
I'm not sure that I like what you're calling "integration" over a series of values. Do you mean that word in a calculus sense? Are you trying to do numerical integration? There are other, better ways to do that. Yours doesn't look right to me, let alone optimal.
The Gaussian distribution is a nice, smooth function. I think a nice quadrature approach or Runge-Kutta would be a much better idea.
I would have thought it depends on what type of addition you are doing. If you just want to get a normal distribution with properties (mean, standard deviation etc.) equal to the sum of two distributions then the addition of the properties as given in the other answers is fine. This is the assumption used in something like PERT where if a large number of normal probability distributions are added up then the resulting probability distribution is another normal probability distribution.
The problem comes when the two distributions being added are not similar. Take for instance adding a probability distribution with a mean of 2 and standard deviation of 1 and a probability distribution with a mean of 10 and a standard deviation of 2. If you add these two distributions up, you get a probability distribution with two peaks, one at 2ish and one at 10ish. The result is therefore not a normal distribution. The assumption about adding distributions is only really valid if the original distributions are either very similar or you have a lot of original distributions so that the peaks and troughs can be evened out.
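It may help to separate two operations that both get called "adding distributions". The two-peaked picture appears when values are drawn from one population or the other (a mixture). Adding two independent random values, one from each distribution, is a different operation and gives a single peak near the sum of the means. The Python sketch below, using the means and standard deviations from this example, simulates both; the seed, sample size and the 50/50 mixing weight are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)
n = 20000

# Sum of two independent draws: one value from each distribution, added.
sums = [rng.gauss(2.0, 1.0) + rng.gauss(10.0, 2.0) for _ in range(n)]

# Mixture: each value comes from one distribution or the other.
mixture = [rng.gauss(2.0, 1.0) if rng.random() < 0.5 else rng.gauss(10.0, 2.0)
           for _ in range(n)]

sum_mean = statistics.mean(sums)          # close to 2 + 10 = 12
sum_var = statistics.variance(sums)       # close to 1^2 + 2^2 = 5
mix_mean = statistics.mean(mixture)       # close to 6, between the two peaks
mix_var = statistics.variance(mixture)    # much larger, due to the two peaks
```

Only the first case corresponds to the `+` operator discussed in the other answers.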
