I have an abstract question about mutation in genetic algorithms. I am not including any code snippet so that the question stays independent of any particular coding environment.
I am writing genetic algorithm code but I do not know how to implement mutation. Suppose an offspring that is going to be mutated is a string like 10110011110101000111, which has a length of 20.
Mutation must be done with a very small probability, for example 0.001. We produce a random number between 0 and 1, and based on it we decide whether the offspring should be mutated or not. My question: do we have to generate 20 random numbers and make a mutation decision for each of the 20 bits of this offspring, or do we generate only 1 random number for the whole offspring and toggle a single bit at random?
In other words, does each bit in the offspring get its own chance to be mutated according to its own random number, or does only one bit have a chance of mutation?
"Traditionally" the mutation rate pmutation ("Genetic Algorithms in Search, Optimization and Machine Learning" by David E Goldberg) is a measure of the likeness that random elements of your chromosome will be flipped into something else (your "option 1").
I.e. if your chromosome is encoded as a binary string of length 1000 and pmutation is 1% (0.01), it means that 10 out of your 1000 bits (on average) will be flipped.
Although it would be possible to avoid much random number generation by computing in advance when the next mutation should occur (rather than calling the RNG for every bit), the implementation is somewhat more complex. To avoid such sophisticated niceties, many implementations go with your "option 2".
Consider that, if L is the length of the binary chromosome:
for unimodal search spaces, 1/L is often suggested as a good mutation rate (e.g. "How genetic algorithms really work: mutation and hillclimbing" by H. Muehlenbein).
for multimodal search spaces, mutation rates significantly higher than 1/L are standard, and your "option 2" would have problems there, since it can flip at most one bit per offspring.
"option 1" is a bit more flexible and follows the rule of least surprise so, lacking other details, I'd choose it.
Related
I am learning genetic algorithms, and when I studied mutation there was something I couldn't figure out. It seemed a little unusual: after we produce offspring via a crossover point, we should apply mutation (with small probability). What is that small probability? I have an image about the 8-queens problem where we found the optimal answer; there our crossover point is 3, so why, for example, is there mutation in the first, third, and last individuals but not in the second one?
I am sorry that this question might be silly!
First, what you call a population is actually just an individual from a population. The population is the whole set of individuals.
A good genetic algorithm is one that balances exploration and exploitation. Exploration tries to find new solutions without caring how good they are (because they might lead to some better solutions). Exploitation tries to take advantage of what the algorithm already knows to be "good solutions".
CROSSOVER = EXPLOITATION. Using crossover you are trying to combine the best individuals (fitness-wise) to produce even better solutions.
MUTATION = EXPLORATION. Using mutation you try to get out of your "gene pool" and find new solutions, with new characteristics that don't arise from your current population.
That being said, the best way to balance them is usually trial & error. For a given input, play with the parameters and see how they work.
As for why the 2nd individual didn't mutate at all: that is simply because the probabilistic process that mutates individuals didn't choose it. Usually, on this kind of problem, mutation works like this:
for individual in population:
    for i, gene in enumerate(individual):
        # each gene gets its own, independent chance to mutate
        if random() < MUTATION_RATIO:
            individual[i] = mutate(gene)
This means that an individual might undergo more than one mutation.
My experience with genetic algorithms is that the optimal mutation probability depends on the algorithm and sometimes even on the problem. As a rule of thumb:
Mutation probability too small: the population converges early.
Mutation probability too high: the population never converges.
So basically the question is not answerable in general. I have used anywhere from 0.5% to 8% depending on the number of parameters, the mutation algorithm, and the problem (i.e. its sensitivity to parameter changes). There are also algorithms that change the mutation rate over the generations, as sketched below.
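As a rough illustration of that last point, one simple scheme is an exponential decay of the rate over generations (the constants here are arbitrary placeholders to experiment with, not recommendations):

def mutation_rate(generation, start=0.08, end=0.005, decay=0.99):
    # Exponentially decay the mutation rate, clamped at a floor.
    return max(end, start * decay ** generation)

print(mutation_rate(0))    # 0.08: early generations explore
print(mutation_rate(300))  # 0.005 (clamped): later generations fine-tune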
I found that a nice way to learn about and experiment with the mutation rate (although only for that algorithm) is this site: you can play with the probability and see the effects instantly. It is also pretty meditative watching those little cars.
First of all, this is part of a homework assignment.
I am trying to implement a genetic algorithm. I am confused about selecting parents for crossover.
In my notes (obviously something is wrong) this is what is given as an example:
Pc (probability of crossover) * population size = expected number of chromosomes to cross over (if not even, round to the nearest even number).
Choose a random number in the range [0, 1] for every chromosome, and if this number is smaller than Pc, choose this chromosome for a crossover pair.
But when the second step is applied, the number of chosen chromosomes is supposed to equal the result found in the first step, which is not guaranteed because of randomness.
So this does not make any sense. I searched for selecting parents for crossover, but all I found were crossover techniques (one-point, cut and splice, etc.) and how to cross over between chosen parents (I do not have a problem with these). I just don't know which chromosomes I should choose for crossover. Any suggestions or a simple example?
You can implement it like this:
For every new child you decide, by random probability, whether it will result from crossover. If yes, then you select two parents, e.g. through roulette wheel selection or tournament selection. The two parents make a child, then you mutate it with the mutation probability and add it to the next generation. If no, then you select only one "parent", clone it, mutate it with the same probability, and add it to the next population.
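A rough sketch of that loop in Python; tournament selection, one-point crossover, the 0.7 crossover probability, and the mutate callback are all illustrative assumptions, since the question doesn't fix a representation:

import random

def tournament(population, fitness, k=2):
    # Pick the fittest of k randomly chosen individuals.
    return max(random.sample(population, k), key=fitness)

def one_point_crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def next_generation(population, fitness, p_crossover=0.7, mutate=lambda x: x):
    new_pop = []
    while len(new_pop) < len(population):
        if random.random() < p_crossover:
            # child combines two selected parents
            child = one_point_crossover(tournament(population, fitness),
                                        tournament(population, fitness))
        else:
            # child is a clone of a single selected parent
            child = tournament(population, fitness)
        new_pop.append(mutate(child))
    return new_pop

Note that the number of crossed-over children equals Pc * population size only in expectation; the loop makes no attempt to hit that count exactly, which resolves the contradiction in the notes above.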
Some other observations I'd like to comment on. I often read the word "chromosome" where it should be "individual". You hardly ever select chromosomes, but full individuals. A chromosome is just one part of a solution; that may be nitpicking, but a solution is not a chromosome. A solution is an individual that consists of several chromosomes, which consist of genes, which show their expression in the form of alleles. Often an individual has only one chromosome, but it's still not okay to mix the terms.
Also, I noticed that you tagged genetic programming, which is basically just a special type of genetic algorithm. In GP the chromosome is a tree, which can represent a mathematical formula or a computer program. Your question does not seem to be about GP though.
This is a very late answer, but I hope it will help someone in the future. Even if two chromosomes are not paired (and don't produce children), they go to the next generation (without crossover), but after some mutation (subject to probability again). On the other hand, if two chromosomes are paired, they produce two children (replacing the original two parents) in the next generation. That's why the number of chromosomes remains the same across generations.
I'm trying hard to do a lab for school. I'm trying to solve a crossword puzzle using genetic algorithms.
The problem is that it is not very good (it is still too random).
I will try to give a brief explanation of how my program is implemented now:
If I have the puzzle (# is a block, 0 is an empty space)
#000
00#0
#000
and a collection of words that are candidates for this puzzle's solution.
My DNA is simply the matrix as a 1D array.
My first set of individuals has randomly generated DNA drawn from the pool of letters that my words contain.
I do selection using roulette-wheel selection.
There are some parameters for the chance of combination and mutation, but if a mutation happens I always change 25% of the DNA.
I change it to random letters from my pool of letters. (This can have negative effects, as the mutations can destroy already-formed words.)
Now the fitness function:
I traverse the matrix both horizontally and vertically:
If I find a word, then FITNESS += word.length + 1.
If I find a string that is part of some word, then FITNESS += word.length / (puzzle_size * 4); this should give a value between 0 and 1.
So it can find "to" inside "tool" and add X to FITNESS, then right after, it finds "too" inside "tool" and adds another Y to FITNESS.
My generations are not actually improving over time. They appear random.
So even after 400 generations with a pool of 1000-2000 individuals (these numbers don't really matter), I get a solution with 1-2 words (of 2 or 3 letters) when the solution should have 6 words.
I think your fitness function might be ill-defined. I would set this up so each row has a binary fitness level: either a row is fit, or it is not (e.g. a row is a word or it is not a word). The overall fitness of the solution would then be the number of fit rows divided by the total number of rows (counting both horizontal and vertical). Also, you might be changing too much of the DNA; I would make that amount a variable and experiment with it.
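A sketch of that row-based fitness, assuming the horizontal and vertical letter runs have already been extracted from the grid and words is the candidate word set:

def fitness(runs, words):
    # Binary fitness per slot: a run of letters either is a word or it isn't.
    # runs = all horizontal and vertical letter runs, blocks already split out.
    return sum(run in words for run in runs) / len(runs)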
Your fitness function looks OK to me, although without more detail it's hard to get a really good picture of what you're doing.
You don't specify the mutation probability, but when you do mutate, 25% is a very high mutation. Also, roulette wheel selection applies a lot of selection pressure. What you often see is that the algorithm pretty early on finds a solution that is quite a bit better than all the others, and roulette wheel selection causes the algorithm to select it with such high probability that you quickly end up with a population full of copies of that. At that point, search halts except for the occasional blindly lucky mutation, and since your mutations are so large, it's very unlikely that you'll find an improving move without wrecking the rest of the chromosome.
I'd try binary tournament selection and a more sensible mutation operator. The usual heuristic people use for mutation is to (on average) flip one "bit" of each chromosome. You don't want a deterministic one-letter change each time, though. Something like this:
for(i = 0; i < chromosome.length(); ++i) {
    // random() generates a double in the range [0, 1)
    if(random() < 1.0 / chromosome.length()) {
        chromosome[i] = pick_random_letter();
    }
}
What are crossover probability and mutation probability in a genetic algorithm or genetic programming? Could someone explain them from an implementation perspective?
Mutation probability (or ratio) is basically a measure of the likelihood that random elements of your chromosome will be flipped into something else. For example, if your chromosome is encoded as a binary string of length 100 and you have a 1% mutation probability, it means that 1 out of your 100 bits (on average), picked at random, will be flipped.
Crossover basically simulates sexual genetic recombination (as in human reproduction), and there are a number of ways it is usually implemented in GAs. Sometimes crossover is applied with moderation in GAs (as it breaks symmetry, which is not always good, and you could also go blind), so we talk about a crossover probability to indicate the ratio of couples that will be picked for mating (they are usually picked by following selection criteria - but that's another story).
This is the short story - if you want the long one you'll have to make an effort and follow the link Amber posted. Or do some googling - which last time I checked was still a good option too :)
According to Goldberg ("Genetic Algorithms in Search, Optimization and Machine Learning"), the probability of crossover is the probability that crossover will occur at a particular mating; that is, not all matings must reproduce by crossover, though one could choose Pc = 1.0.
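A minimal sketch of that per-mating interpretation (string chromosomes and the 0.7 default are illustrative assumptions):

import random

def mate(parent_a, parent_b, p_crossover=0.7):
    # With probability Pc recombine the parents at a random point;
    # otherwise exact copies pass through to the next generation.
    if random.random() < p_crossover:
        point = random.randrange(1, len(parent_a))
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])
    return parent_a, parent_b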
The probability of mutation is as per JohnIdol's answer above.
It shows the proportion of features that are inherited from the parents in crossover!
Note: if the crossover probability is 100%, then all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population (but this does not mean that the new generation is the same!).
Here is a good little explanation of these two probabilities:
http://www.optiwater.com/optiga/ga.html
JohnIdol's answer on mutation probability is exactly what that website says:
"Each bit in each chromosome is checked for possible mutation by generating a random number between zero and one and if this number is less than or equal to the given mutation probability e.g. 0.001 then the bit value is changed."
For crossover probability, maybe it is the ratio of the next generation's population that is born by the crossover operation, while the rest of the population... maybe comes from previous selection. Or you can define it as the best-fit survivors.
Imagine there are two same-sized sets of numbers. Is it possible, and how, to create a function, an algorithm, or a subroutine which exactly maps input items to output items? Like:
Input = 1, 2, 3, 4
Output = 2, 3, 4, 5
and the function would be:
f(x): return x + 1
And by "function" I mean something slightly more comlex than [1]:
f(x):
    if x == 1: return 2
    if x == 2: return 3
    if x == 3: return 4
    if x == 4: return 5
This would be useful for creating special hash functions or function approximations.
Update:
What I am trying to ask is whether there is a way to compress the trivial mapping example from above [1].
Finding the shortest program that outputs some string (sequence, function etc.) is equivalent to finding its Kolmogorov complexity, which is undecidable.
If "impossible" is not a satisfying answer, you have to restrict your problem. In all appropriately restricted cases (polynomials, rational functions, linear recurrences) finding an optimal algorithm will be easy as long as you understand what you're doing. Examples:
polynomial - Lagrange interpolation
rational function - Pade approximation
boolean formula - Karnaugh map
approximate solution - regression, linear case: linear regression
general packing of data - data compression; some techniques, like run-length encoding, are lossless, some not.
In the case of polynomial sequences, it often helps to consider the difference sequence b_n = a_{n+1} - a_n; this reduces a quadratic relation to a linear one, a linear one to a constant sequence, and so on. But there's no silver bullet. You might build some heuristics (e.g. Mathematica has FindSequenceFunction - check that page to get an impression of how complex this can get) using genetic algorithms, random guesses, checking many built-in sequences and their compositions, and so on. No matter what, any such program is - in theory - infinitely far from perfection due to the undecidability of Kolmogorov complexity. In practice you might get satisfactory results, but this requires a lot of man-years.
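As a taste of the polynomial case, here is a bare-bones Lagrange interpolation over the question's example data (pure Python; for real work a computer algebra system is the better tool):

def lagrange(points):
    # Return the function interpolating the given (x, y) points.
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

f = lagrange([(1, 2), (2, 3), (3, 4), (4, 5)])
print(f(10))  # 11.0 -- the interpolant collapses to x + 1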
See also another SO question. You might also implement some wrapper to OEIS in your application.
Fields:
Mostly, the limits of what can be done are described in
complexity theory - describing which problems can be solved "fast", like finding the shortest path in a graph, and which cannot, like playing a generalized version of checkers (it's EXPTIME-complete).
information theory - describing how much "information" is carried by a random variable. For example, take coin tossing. Normally, it takes 1 bit to encode one result, and n bits to encode n results (as a long 0-1 sequence). Suppose now that you have a biased coin that gives tails 90% of the time. Then it is possible to find another way of describing n results that on average gives a much shorter sequence. The number of bits per toss needed for optimal coding (less than 1 in this case!) is called the entropy; the plot in that article shows how much information is carried (1 bit for a 50/50 coin, less than 1 for a biased coin, 0 bits if the coin always lands on the same side). A quick calculation follows this list.
algorithmic information theory - attempts to join complexity theory and information theory. Kolmogorov complexity belongs here. You may consider a string "random" if it has large Kolmogorov complexity: aaaaaaaaaaaa is not a random string; f8a34olx probably is. So a random string is incompressible (Volchan's "What is a random sequence?" is a very readable introduction). Chaitin's algorithmic information theory book is available for download. Quote: "[...] we construct an equation involving only whole numbers and addition, multiplication and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin." (in other words, no algorithm can guess that result with probability > 1/2). I haven't read that book, however, so I can't rate it.
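The biased-coin entropy mentioned in the information theory item above can be checked in a couple of lines:

from math import log2

def entropy(p):
    # Shannon entropy of a biased coin, in bits per toss.
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(entropy(0.5))  # 1.0 bit per toss for a fair coin
print(entropy(0.9))  # ~0.469 bits: a 90/10 coin is far more compressible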
Strongly related to information theory is coding theory, which describes error-correcting codes. Example result: it is possible to encode 4 bits into 7 bits such that it will be possible to detect and correct any single error, or to detect two errors (Hamming(7,4)); see the sketch below.
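A sketch of that Hamming(7,4) result, following the standard construction (illustrative only, not a production codec):

def hamming_encode(d):
    # Encode 4 data bits as 7 bits: [p1, p2, d1, p3, d2, d3, d4].
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_correct(c):
    # Locate and fix a single flipped bit via the syndrome.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # 0 means no error detected
    if pos:
        c[pos - 1] ^= 1
    return c

word = hamming_encode([1, 0, 1, 1])
word[4] ^= 1                          # flip one bit "in transit"
assert hamming_correct(word) == hamming_encode([1, 0, 1, 1])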
The "positive" side are:
symbolic algorithms for Lagrange interpolation and Pade approximation are a part of computer algebra/symbolic computation; von zur Gathen, Gerhard "Modern Computer Algebra" is a good reference.
data compression - here you'd better ask someone else for references :)
Ok, I don't understand your question, but I'm going to give it a shot.
If you only have 2 sets of numbers and you want to find f where y = f(x), then you can try curve-fitting to give you an approximate "map".
In this case, it's linear so curve-fitting would work. You could try different models to see which works best and choose based on minimizing an error metric.
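For instance, with the example sets a least-squares fit recovers the mapping directly (numpy is assumed to be available):

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([2, 3, 4, 5])

slope, intercept = np.polyfit(x, y, deg=1)  # fit y = a*x + b
print(slope, intercept)                     # ~1.0 1.0, i.e. f(x) = x + 1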
Is this what you had in mind?
Here's another link to curve fitting; the image in that article illustrates the idea.
It seems to me that you want a hash table. These are based on hash functions, and there are known hash functions that work better than others depending on the expected input and desired output.
If what you want is an algorithmic way of mapping arbitrary input to arbitrary output, this is not feasible in the general case, as it totally depends on the input and output sets.
For example, in the trivial sample you have there, the function is immediately obvious: f(x) = x + 1. In other cases it may be very hard, or even impossible, to generate an exact function describing the mapping; you would have to approximate or just use a map directly.
In some cases (such as your example), linear regression or similar statistical models could find the relation between your input and output sets.
Doing this in the general case is arbitrarily difficult. For example, consider a block cipher used in ECB mode: it maps an input integer to an output integer, but - by design - deriving any general mapping from specific examples is infeasible. In fact, for a good cipher, even with the complete set of mappings between input and output blocks, you still couldn't determine how to calculate that mapping in general.
Obviously, a cipher is an extreme example, but it serves to illustrate that there's no (known) general procedure for doing what you ask.
Discerning an underlying map from input and output data is exactly what Neural Nets are about! You have unknowingly stumbled across a great branch of research in computer science.