What are some ways to crossover in a Genetic Algorithm? - algorithm

Here is a very small example I'll give:
Suppose I have my population of size 4. Now, let's say that I sort them based on their fitness and I decide to drop the last 2 (so now my population size is 2). Now I would need to go back to the original size but first I would have to create offspring.
Let's say I have this: (not written in any specific language)
population = [[2.2],[49.7],[34.1],[25.39]] //original population, I would run this under a fitness function
sortedPopulation = [[49.7],[25.39],[2.2],[34.1]] //sorted population based upon their fitness
best = [[49.7],[25.39]] //updated population with the last 2 elements being dropped (because they are the 2 worst)
At this point, I'm trying to figure out how I can crossover and create offspring. After the offspring, I will generate some more Doubles to get back to the original population size (which I already know how to do). What are some ways to crossover? And what would the result of the crossover actually look like?
I also want to make sure that there is a method that will work for any # of elements in the population's elements.
For instance, if each of the elements were 2 double's, then how would I create offspring from something like this:
best = [[3.3,92.56],[10.5,15.01]]

if you are working with integers its good idea to work with them as binary... then you can for example swap last 2 bits as crossover, swap every second bit, etc... or even better as Gray code (https://en.wikipedia.org/wiki/Gray_code) or you might get stuck on Hamming's barrier (https://en.wikipedia.org/wiki/Hamming_distance)
hope it helps ^^
Patrik

Related

How to valorize better offsprings better than with my roulette selection method?

I am playing around with genetic programming algorithms, and I want to know how I can valorize and make sure my best exemplares reproduce more by substituting or improving the way I choose which one will reproduce. Currently the method I use looks like this:
function roulette(population)
local slice = sum_of_fitnesses(population) * math.random()
local sum = 0
for iter = 1, #population do
sum = sum + population[iter].fitness
if sum >= slice then
return population[iter]
end
end
end
But I can't get my population to reach an average fitness which is above a certain value and I worry it's because of less fit members reproducing with more fit members and thus continuing to spread their weak genes around.
So how can I improve my roulette selection method? Or should I use a completely different fitness proportionate selector?
There are a couple of issues at play here.
You are choosing the probability of an individual replicating based on its fitness, so the fitness function that you are using needs to exaggerate small differences or else having a minor decrease in fitness isn't so bad. For example, if a fitness drops from 81 to 80, this change is probably within the noise of the system and won't make much of a different to evolution. It will certainly be almost impossible to climb to a very high fitness if a series of small changes need to be made because the selective pressure simply won't be strong enough.
The way you solve this problem is by using something like tournament selection. In it's simplest form, every time you want to choose another individual to be born, you pick K random individuals (K is known and the "tournament size"). You calculate the fitness of each individual and whomever has the highest fitness is replicated. It doesn't matter if the fitness difference is 81 vs 80 or if its 10000 vs 2, since it simply takes the highest fitness.
Now the question is: what should you set K to? K can be thought of as the strength of selection. If you set it low (e.g., K=2) then many low fitness individuals will get lucky and slip through, being competed against other low-fitness individuals. You'll get a lot of diversity, but very little section. On the flip side, if you set K to be high (say, K=100), you're ALWAYS going to pick one of the highest fitnesses in the population, ensuring that the population average is driven closer to the max, but also driving down diversity in the population.
The particular tradeoff here depends on the specific problem. I recommend trying out different options (including your original algorithm) with a few different problems to see what happens. For example, try the all-ones problem: potential solutions are bit strings and a fitness is simply the number of 1's. If you have weak selection (as in your original example, or with K=2), you'll see that it never quite gets to a perfect all-ones solution.
So, why not always use a high K? Well consider a problem where ones are negative unless they appear in a block of four consecutive ones (or eight, or however many), when they suddenly become very positive. Such a problem is "deceptive", which means that you need to explore through solutions that look bad in order to find ones that are good. If you set your strength of selection too high, you'll never collect three ones for that final mutation to give you the fourth.
Lots of more advanced techniques exist that use tournament selection that you might want to look at. For example, varying K over time, or even within a population, select some individuals using a low K and others using a high K. It's worth reading up on some more if you're planning to build a better algorithm.

Genetic algorithm on a knapsack-alike optiproblem

I have a optimzation problem i'm trying to solve using a genetic algorithm. Basically, there is a list of 10 bound real valued variables (-1 <= x <= 1), and I need to maximize some function of that list. The catch is that only up to 4 variables in the list may be != 0 (subset condition).
Mathematically speaking:
For some function f: [-1, 1]^10 -> R
min f(X)
s.t.
|{var in X with var != 0}| <= 4
Some background on f: The function is NOT similar to any kind of knapsack objective function like Sum x*weight or anything like that.
What I have tried so far:
Just a basic genetic algorithm over the genome [-1, 1]^10 with 1-point-crossover and some gaussian mutation on the variables. I tried to encode the subset condition in the fitness function by using just the first 4 nonzero (zero as in close enough to 0) values. This approach doesn't work that well and the algorithm is stuck at the 4 first variables and never uses values beyond that. I saw some kind of GA for the 01-knapsack problem where this approach worked well, but apparently this works just with binary variables.
What would you recommend me to try next?
If your fitness function is quick and dirty to evaluate then it's cheap to increase your total population size.
The problem you are running into is that you're trying to select two completely different things simultaneously. You want to select which 4 genomes you care about, and then what values are optimal.
I see two ways to do this.
You create 210 different "species". Each specie is defined by which 4 of the 10 genomes they are allowed to use. Then you can run a genetic algorithm on each specie separately (either serially, or in parallel within a cluster).
Each organism has only 4 genome values (when creating random offspring choose which genomes at random). When two organisms mate you only cross over with genomes that match. If your pair of organisms contain 3 common genomes then you could randomly pick which of the genome you may prefer as the 4th. You could also, as a heuristic, avoid mating organisms that appear to be too genetically different (i.e. a pair that shares two or fewer genomes may make for a bad offspring).
I hope that gives you some ideas you can work from.
You could try a "pivot"-style step: choose one of the existing nonzero values to become zero, and replace it by setting one of the existing zero values to become nonzero. (My "pivot" term comes from linear programming, in which a pivot is the basic step in the simplex method).
Simplest case would be to be evenhandedly random in the selection of each of these values; you can choose a random value, or multiple values, for the new nonzero variable. A more local kind of step would be to use a Gaussian step only on the existing nonzero variables, but if one of those variables crosses zero, spawn variations that pivot to one of the zero values. (Note that these are not mutually exclusive, as you can easily add both kinds of steps).
If you have any information about the local behavior of your fitness score, you can try to use that to guide your choice. Just because actual evolution doesn't look at the fitness function, doesn't mean you can't...
Does your GA solve the problem well without the subset constraint? If not, you might want to tackle that first.
Secondly, you might make your constraint soft instead of hard: Penalize a solution's fitness for each zero-valued variable it has, beyond 4. (You might start by loosening the constraint even further, allowing 9 0-valued variables, then 8, etc., and making sure the GA is able to handle those problem variants before making the problem more difficult.)
Thirdly, maybe try 2-point or multi-point crossover instead of 1-point.
Hope that helps.
-Ted

Solve crossword puzzle with genetic algorithm, fitness, mutation

I'm trying hard to do a lab for school. I'm trying to solve a crossword puzzle using genetic algorithms.
Problem is that is not very good (it is still too random)
I will try to give a brief explanation of how my program is implemented now:
If i have the puzzle (# is block, 0 is empty space)
#000
00#0
#000
and a collection of words that are candidates to this puzzle's solution.
My DNA is simply the matrix as a 1D array.
My first set of individuals have random generated DNAs from the pool of letters that my words contains.
I do selection using roulette-selection.
There are some parameters about the chance of combination and mutations, but if mutation will happen then i will always change 25% of the DNA.
I change it with random letters from my pool of letters.(this can have negative effects, as the mutations can destroy already formed words)
Now the fitness function:
I traverse the matrix both horizontaly and verticaly:
If i find a word then FITNESS += word.lengh +1
If i find a string that is a part of some word then FITNESS += word.length / (puzzle_size*4) . Anyway it should give a value between 0 and 1.
So it can find "to" from "tool" and ads X to FITNESS, then right after it finds "too" from "tool" and adds another Y to FITNESS.
My generations are not actually improving over time. They appear random.
So even after 400 generations with a pool of 1000-2000 (these numbers dont really matter) i get a solution with 1-2 words (of 2 or 3 letters) when the solution should have 6 words.
I think your fitness function might be ill-defined. I would set this up so each row has a binary fitness level. Either a row is fit, or it is not. (eg a Row is a word or it is not a word) Then the overall fitness of the solution would be #fit rows / total rows (both horizontally and vertically). Also, you might be changing too much of the dna, I would make that variable and experiment with that.
Your fitness function looks OK to me, although without more detail it's hard to get a really good picture of what you're doing.
You don't specify the mutation probability, but when you do mutate, 25% is a very high mutation. Also, roulette wheel selection applies a lot of selection pressure. What you often see is that the algorithm pretty early on finds a solution that is quite a bit better than all the others, and roulette wheel selection causes the algorithm to select it with such high probability that you quickly end up with a population full of copies of that. At that point, search halts except for the occasional blindly lucky mutation, and since your mutations are so large, it's very unlikely that you'll find an improving move without wrecking the rest of the chromosome.
I'd try binary tournament selection, and a more sensible mutation operator. The usual heuristic people use for mutation is to (on average) flip one "bit" of each chromosome. You don't want a deterministic one letter change each time though. Something like this:
for(i=0; i<chromosome.length(); ++i) {
// random generates double in the range [0, 1)
if(random() < 1.0/chromosome.length()) {
chromosome[i] = pick_random_letter();
}
}

breeding parents for multiple children in genetic algorithm

I'm building my first Genetic Algorithm in javascript, using a collection of tutorials.
I'm building a somewhat simpler structure to this scheduling tutorial http://www.codeproject.com/KB/recipes/GaClassSchedule.aspx#Chromosome8, but I've run into a problem with breeding.
I get a population of 60 individuals, and now I'm picking the top two individuals to breed, and then selecting a few random other individuals to breed with the top two, am I not going to end up with a fairly small amount of parents rather quickly?
I figure I'm not going to be making much progress in the solution if I breed the top two results with each of the next 20.
Is that correct? Is there a generally accepted method for doing this?
I have a sample of genetic algorithms in Javascript here.
One problem with your approach is that you are killing diversity in the population by mating always the top 2 individuals. That will never work very well because it's too greedy, and you'll actually be defeating the purpose of having a genetic algorithm in the first place.
This is how I am implementing mating with elitism (which means I am retaining a percentage of unaltered best fit individuals and randomly mating all the rest), and I'll let the code do the talking:
// save best guys as elite population and shove into temp array for the new generation
for(var e = 0; e < ELITE; e++) {
tempGenerationHolder.push(fitnessScores[e].chromosome);
}
// randomly select a mate (including elite) for all of the remaining ones
// using double-point crossover should suffice for this silly problem
// note: this should create INITIAL_POP_SIZE - ELITE new individualz
for(var s = 0; s < INITIAL_POP_SIZE - ELITE; s++) {
// generate random number between 0 and INITIAL_POP_SIZE - ELITE - 1
var randInd = Math.floor(Math.random()*(INITIAL_POP_SIZE - ELITE));
// mate the individual at index s with indivudal at random index
var child = mate(fitnessScores[s].chromosome, fitnessScores[randInd].chromosome);
// push the result in the new generation holder
tempGenerationHolder.push(child);
}
It is fairly well commented but if you need any further pointers just ask (and here's the github repo, or you can just do a view source on the url above). I used this approach (elitism) a number of times, and for basic scenarios it usually works well.
Hope this helps.
When I've implemented genetic algorithms in the past, what I've done is to pick the parents always probabilistically - that is, you don't necessarily pick the winners, but you will pick the winners with a probability depending on how much better they are than everyone else (based on the fitness function).
I cannot remember the name of the paper to back it up, but there is a mathematical proof that "ranking" selection converges faster than "proportional" selection. If you try looking around for "genetic algorithm selection strategy" you may find something about this.
EDIT:
Just to be more specific, since pedalpete asked, there are two kinds of selection algorithms: one based on rank, one based on fitness proportion. Consider a population with 6 solutions and the following fitness values:
Solution Fitness Value
A 5
B 4
C 3
D 2
E 1
F 1
In ranking selection, you would take the top k (say, 2 or 4) and use those as the parents for your next generation. In proportional ranking, to form each "child", you randomly pick the parent with a probability based on fitness value:
Solution Probability
A 5/16
B 4/16
C 3/16
D 2/16
E 1/16
F 1/16
In this scheme, F may end up being a parent in the next generation. With a larger population size (100 for example - may be larger or smaller depending on the search space), this will mean that the bottom solutions will end up being a parent some of the time. This is OK, because even "bad" solutions have some "good" aspects.
Keeping the absolute fittest individuals is called elitism, and it does tend to lead to faster convergence, which, depending on the fitness landscape of the problem, may or may not be what you want. Faster convergence is good if it reduces the amount of effort taken to find an acceptable solution but it's bad if it means that you end up with a local optimum and ignore better solutions.
Picking the other parents completely at random isn't going to work very well. You need some mechanism whereby fitter candidates are more likely to be selected than weaker ones. There are several different selection strategies that you can use, each with different pros and cons. Some of the main ones are described here. Typically you will use roulette wheel selection or tournament selection.
As for combining the elite individuals with every single one of the other parents, that is a recipe for destroying variation in the population (as well as eliminating the previously preserved best candidates).
If you employ elitism, keep the elite individuals unchanged (that's the point of elitism) and then mate pairs of the other parents (which may or may not include some or all of the elite individuals, depending on whether they were also picked out as parents by the selection strategy). Each parent will only mate once unless it was picked out multiple times by the selection strategy.
Your approach is likely to suffer from premature convergence. There are lots of other selection techniques to pick from though. One of the more popular that you may wish to consider is Tournament selection.
Different selection strategies provide varying levels of 'selection pressure'. Selection pressure is how strongly the strategy insists on choosing the best programs. If the absolute best programs are chosen every time, then your algorithm effectively becomes a hill-climber; it will get trapped in local optimum with no way of navigating to other peaks in the fitness landscape. At the other end of the scale, no fitness pressure at all means the algorithm will blindly stumble around the fitness landscape at random. So, the challenge is to try to choose an operator with sufficient (but not excessive) selection pressure, for the problem you are tackling.
One of the advantages of the tournament selection operator is that by just modifying the size of the tournament, you can easily tweak the level of selection pressure. A larger tournament will give more pressure, a smaller tournament less.

Genetic Algorithm Implementation for weight optimization

I am a data mining student and I have a problem that I was hoping that you guys could give me some advice on:
I need a genetic algo that optimizes the weights between three inputs. The weights need to be positive values AND they need to sum to 100%.
The difficulty is in creating an encoding that satisfies the sum to 100% requirement.
As a first pass, I thought that I could simply create a chrom with a series of numbers (ex.4,7,9). Each weight would simply be its number divided by the sum of all of the chromosome's numbers (ex. 4/20=20%).
The problem with this encoding method is that any change to the chromosome will change the sum of all the chromosome's numbers resulting in a change to all of the chromosome's weights. This would seem to significantly limit the GA's ability to evolve a solution.
Could you give any advice on how to approach this problem?
I have read about real valued encoding and I do have an implementation of a GA but it will give me weights that may not necessarily add up to 100%.
It is mathematically impossible to change one value without changing at least one more if you need the sum to remain constant.
One way to make changes would be exactly what you suggest: weight = value/sum. In this case when you change one value, the difference to be made up is distributed across all the other values.
The other extreme is to only change pairs. Start with a set of values that add to 100, and whenever 1 value changes, change another by the opposite amount to maintain your sum. The other could be picked randomly, or by a rule. I'd expect this would take longer to converge than the first method.
If your chromosome is only 3 values long, then mathematically, these are your only two options.

Resources