In jenetics library, following code is given in alter() methid of Mutator class:
final double p = pow(_probability, 1.0/3.0);
Can anyone please explain the purpose of calculating this new probabilty for mutation? How is it beneficial? How can we use this class to implement One-Position or Point Mutation?
Ref:https://github.com/jenetics/jenetics/blob/master/org.jenetics/src/main/java/org/jenetics/Mutator.java
The reason for this is the hierarchical structure of Population -> Genotype -> Chromosome -> Gene. Since the given probability is the mutation probability of a single Gene and you first have to select one Genotype out of the Population. Then you select one Chromosome out of the selected Genotype. At last, the Gene is selected from the Chromosome. The selection probability of the single selection steps is set to pow(p, 1/3), which leads to the desired Gene mutation probability of p.
This mechanism is also described in the Jenetics Manual in paragraph "Mutator" on page 13.
Related
I'm coming from biology's field and thus I have some difficulties in understanding (intuitively?) some of the ideas of that paper. I really tried my best to decipher it step by step by using a lot of google and youtube, but now I feel, it's the time to refer to the professionals in that field.
Before filling out the whole universe with (unordered) questions, let me put the whole thing down and try to introduce you to the subject while at the same time explain to you what I got so far from my research on that.
Microarrays
For those that do not have any idea of what this is, you can imagine, that it is literally an array (matrix) where each cell of it contains a probe for a specific gene. Making the long story short, by the end of the microarray experiment, you have a matrix (in computational terms) with each column representing a sample, each line a different gene while the contents of the matrix represent the expression values of the genes for each sample.
Pathways
In biology pathway / gene-set they call a set of genes that interact with each other forming a small network responsible for a specific function.These pathways are not isolated but they talk/interact with each other too. What that paper does on the first hand, is to expand the initial pathway (let us call it target pathway), by including some other genes from other pathways that might interact with that.
Procedure
1.
Let's assume now that we have a matrix G x S. Where G for genes and S for Samples. We construct a gene co-expression network (G x G) using as weights the Pearson's correlation coefficients between genes' pairs (a). This could also be represented as an undirected weighted graph. .
2.
For each gene (row OR column) we calculate the weighted degree (d) which is nothing more than the sum of all correlation coefficients of that gene.
3.
From the two previous matrices, they construct the transition matrix producing the probabilities (P) to transit from one gene to another by using the
formula
Q1. Why do they call this transition probability? Is there any intuitive way to see this as a probability in the biological context?
4.
Since we have the whole transition matrix, we can define a subnetwork of the initial one, that we want to expand it and it consisted out of let's say 15 genes. In that step, they used formula number 3 (on the paper) which transforms the values of the initial transition matrix as it says. They set the probability of 1 on the nodes that are part of the selected subnetwork because they define them as absorbing states.
Q2. In that same formula (3), I cannot understand what the second condition does. When should the probability be 0? Intuitively, in my opinion, all nodes that didn't exist in subnetwork, should have the P_ij value as a probability.
5.
After that, the newly constructed transition matrix is showed at formula (4) in the paper and I managed to understand it using this excellent article.
6.
Here is where everything is getting more blur for me and where I need the most of the help. What I imagine at that step, is that the algorithm starts randomly from one node and keep walking around the network. In order to construct a relevance function (What that exactly means?), they firstly calculate a probability called joint probability of visiting one node/edge E(i,j) and noted as :
From the other hand they seem to calculate another probability called probability of a walk of length L starting in x and denoted as :
7.
In the next step, they divide the previously calculated probabilities and calculate the number of times a random walk starts in x using the transition from i to j that I don't really understand what this means.
After that step, I lost their reasoning at all :-P.
I'm not expecting an expert to come open my mind and give me understand that procedure. What I'm expecting is some guidelines, hints, ideas, useful resources or more intuitive approaches to understanding the whole procedure. Then when I fully understand it I will try to implement it on R or python.
So any idea / critics is welcome.
Thanks.
i am trying to select 3 feature from a data set of 24*461. my problem is in generation part. after cross-over, new chromosome can have more than three 1 and therefore more than three variable. in mutation step, when a zero is changed to one, number of selected feature is more than 3. Any help will be greatly appreciated
A common technique to solve this problem is to impose a "penalty", wherein, any chromosome that have more than three 1 have a penalty added. For example if a chromosome have five 1, add 2x to chromosome fitness score. In this case any chromosome that have more than three 1, gradually Remove from population and permitting other (that have three or less 1) individuals to be maintained in the population.
Can someone explain to me what means "mutation step size" ?
I'm reading an article regarding genetic algorithms and it says:
"the mutation randomly changes the decision of
a node or mutate the value with a step-size of 0.25"
I know the role of mutation in GA life cycle , but I can't find a good explanation for what is step size of mutation.
thanks.
Essentially it's how far a mutation can be away from the last value.
"As far as real-valued search spaces are concerned, mutation is normally performed by adding a normally distributed random value to each vector component. The step size or mutation strength (i.e. the standard deviation of the normal distribution) is often governed by self-adaptation (see evolution window)."
That's complex talk for given a vector you are mutating (say X = [x1,x2,..,xN]) then you'll modify that vector's values by some random amount that won't be more than the mutation step size. So say we had a function called normal(v,stdDev) that generated random values with a normal distribution around some value with stdDev. Then we'd modify each value of that vector with the following psuedo code:
for x in X {
x = normal(x,mutationStepSize)
}
First of all, this is a part of a homework.
I am trying to implement a genetic algorithm. I am confused about selecting parents to crossover.
In my notes (obviously something is wrong) this is what is done as example;
Pc (possibility of crossover) * population size = estimated chromosome count to crossover (if not even, round to one of closest even)
Choose a random number in range [0,1] for every chromosome and if this number is smaller then Pc, choose this chromosome for a crossover pair.
But when second step applied, chosen chromosome count is equals to result found in first step. Which is not always guaranteed because of randomness.
So this does not make any sense. I searched for selecting parents for crossover but all i found is crossover techniques (one-point, cut and slice etc.) and how to crossover between chosen parents (i do not have a problem with these). I just don't know which chromosomesi should choose for crossover. Any suggestions or simple example?
You can implement it like this:
For every new child you decide if it will result from crossover by random probability. If yes, then you select two parents, eg. through roulette wheel selection or tournament selection. The two parents make a child, then you mutate it with mutation probability and add it to the next generation. If no, then you select only one "parent" clone it, mutate it with probability and add it to the next population.
Some other observations I noted and that I like to comment. I often read the word "chromosomes" when it should be individual. You hardly ever select chromosomes, but full individuals. A chromosome is just one part of a solution. That may be nitpicking, but a solution is not a chromosome. A solution is an individual that consists of several chromosomes which consist of genes which show their expression in the form of alleles. Often an individual has only one chromosome, but it's still not okay to mix terms.
Also I noted that you tagged genetic programming which is basically only a special type of a genetic algorithm. In GP you consider trees as a chromosome which can represent mathematical formulas or computer programs. Your question does not seem to be about GP though.
This is very late answer, but hoping it will help someone in the future. Even if two chromosomes are not paired (and produced children), they goes to the next generation (without crossover) but after some mutation (subject to probability again). And on the other hand, if two chromosomes paired, then they produce two children (replacing the original two parents) for the next generation. So, that's why the no of chromosomes remain same in two generations.
Let TARGET be a set of strings that I expect to be spoken.
Let SOURCE be the set of strings returned by a speech recognizer (that is, the possible sentences that it has heard).
I need a way to choose a string from TARGET. I read about the Levenshtein distance and the Damerau-Levenshtein distance, which basically returns the distance between a source string and a target string, that is the number of changes needed to transform the source string into the target string.
But, how can I apply this algorithm to a set of target strings?
I thought I'd use the following method:
For each string that belongs to TARGET, I calculate the distance from each string in SOURCE. In this way we obtain an m-by-n matrix, where n is the cardinality of SOURCE and n is the cardinality of TARGET. We could say that the i-th row represents the similarity of the sentences detected by the speech recognizer with respect to the i-th target.
Calculating the average of the values on each row, you can obtain the average distance between the i-th target and the output of the speech recognizer. Let's call it average_on_row(i), where i is the row index.
Finally, for each row, I calculate the standard deviation of all values in the row. For each row, I also perform the sum of all the standard deviations. The result is a column vector, in which each element (Let's call it stadard_deviation_sum(i)) refers to a string of TARGET.
The string which is associated with the shortest stadard_deviation_sum could be the sentence pronounced by the user. Could be considered the correct method I used? Or are there other methods?
Obviously, too high values indicate that the sentence pronounced by the user probably does not belong to TARGET.
I'm not an expert but your proposal does not make sense. First of all, in practice I'd expect the cardinality of TARGET to be very large if not infinite. Second, I don't believe the Levensthein distance or some similar similarity metric will be useful.
If :
you could really define SOURCE and TARGET sets,
all strings in SOURCE were equally probable,
all strings in TARGET were equally probable,
the strings in SOURCE and TARGET consisted of not characters but phonemes,
then I believe your best bet would be to find the pair p in SOURCE, q in TARGET such that distance(p,q) is minimum. Since especially you cannot guarantee the equal-probability part, I think you should think about the problem from scratch, do some research and make a completely different design. The usual methodology for speech recognition is the use Hidden Markov models. I would start from there.
Answer to your comment: Choose whichever is more probable. If you don't consider probabilities, it is hopeless.
[Suppose the following example is on phonemes, not characters]
Suppose the recognized word the "chees". Target set is "cheese", "chess". You must calculate P(cheese|chees) and P(chess|chees) What I'm trying to say is that not every substitution is equiprobable. If you will model probabilities as distances between strings, then at least you must allow that for example d("c","s") < d("c","q") . (It is common to confuse c and s letters but it is not common to confuse c and q) Adapting the distance calculation algorithm is easy, coming with good values for all pairs is difficult.
Also you must somehow estimate P(cheese|context) and P(chess|context) If we are talking about board games chess is more probable. If we are talking about dairy products cheese is more probable. This is why you'll need large amounts of data to come up with such estimates. This is also why Hidden Markov Models are good for this kind of problem.
You need to calculate these probabilities first: probability of insertion, deletion and substitution. Then use log of these probabilities as penalties for each operation.
In a "context independent" situation, if pi is probability of insertion, pd is probability of deletion and ps probability of substitution, the probability of observing the same symbol is pp=1-ps-pd.
In this case use log(pi/pp/k), log(pd/pp) and log(ps/pp/(k-1)) as penalties for insertion, deletion and substitution respectively, where k is the number of symbols in the system.
Essentially if you use this distance measure between a source and target you get log probability of observing that target given the source. If you have a bunch of training data (i.e. source-target pairs) choose some initial estimates for these probabilities, align source-target pairs and re-estimate these probabilities (AKA EM strategy).
You can start with one set of probabilities and assume context independence. Later you can assume some kind of clustering among the contexts (eg. assume there are k different sets of letters whose substitution rate is different...).