I have the following genetic programming problem (homework):
I need to use 5 digits (0 to 9) and any of the operators (+, -, *, /) to generate an expression that evaluates to 30. Can anyone please explain how I can go about solving this using genetic programming?
First, this is a genetic programming (GP) problem; searching for that term will turn up plenty of background.
In GP, each individual represents one computer program, which is encoded as a tree. In your problem, each tree represents a mathematical expression. The tree in the following picture represents the expression ‘3.4-x’.
Leaf nodes are chosen from your digit set, and non-leaf nodes from the operator set.
You then choose crossover and mutation operators to evolve your expressions.
For crossover, you can select one random node in each of the two parents and swap the corresponding subtrees to create two offspring, as illustrated in the next figure.
There are also many mutation operators. For example, randomly choose a node and replace its digit or operator with another one.
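The ideas above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the tuple-based tree encoding, the truncation selection, the penalty of 10 per missing or extra leaf (to push towards exactly 5 digits), and all function names are assumptions made for this sketch.

```python
import math
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else float("inf"),  # guard against division by zero
}

def random_tree(n_leaves, rng):
    """Random expression tree with exactly n_leaves digit leaves: (op, left, right) or int."""
    if n_leaves == 1:
        return rng.randint(0, 9)
    left = rng.randint(1, n_leaves - 1)
    return (rng.choice(list(OPS)), random_tree(left, rng), random_tree(n_leaves - left, rng))

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, a, b = tree
    return OPS[op](evaluate(a), evaluate(b))

def leaves(tree):
    return 1 if isinstance(tree, int) else leaves(tree[1]) + leaves(tree[2])

def paths(tree, prefix=()):
    """Positions of all nodes, each as a tuple of child indices from the root."""
    found = [prefix]
    if not isinstance(tree, int):
        found += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return found

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, sub):
    """Copy of tree with the node at path replaced by sub."""
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = put(tree[path[0]], path[1:], sub)
    return tuple(node)

def crossover(a, b, rng):
    """Subtree crossover: swap one randomly chosen subtree of each parent."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))

def mutate(tree, rng):
    """Point mutation: replace one digit or one operator."""
    p = rng.choice(paths(tree))
    node = get(tree, p)
    if isinstance(node, int):
        return put(tree, p, rng.randint(0, 9))
    return put(tree, p, (rng.choice(list(OPS)),) + node[1:])

def fitness(tree, target=30, n_leaves=5):
    """Lower is better; 0 means the target is hit with exactly n_leaves digits."""
    value = evaluate(tree)
    if not math.isfinite(value):
        return float("inf")
    return abs(value - target) + 10 * abs(leaves(tree) - n_leaves)

def evolve(seed=0, pop_size=200, generations=100):
    rng = random.Random(seed)
    pop = [random_tree(5, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        if fitness(pop[0]) == 0:
            break
        parents = pop[: pop_size // 2]  # truncation selection (an assumption of this sketch)
        children = []
        while len(children) < pop_size - len(parents):
            c1, c2 = crossover(rng.choice(parents), rng.choice(parents), rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = parents + children
    return min(pop, key=fitness)
```

Calling `evolve()` returns the best tree found; `evaluate` and `fitness` can then be used to check how close it is to 30.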
I come from a biology background, so I have some difficulty understanding (intuitively?) some of the ideas in that paper. I tried my best to decipher it step by step with a lot of Google and YouTube, but I feel it is now time to ask the professionals in the field.
Before flooding you with (unordered) questions, let me lay the whole thing out, introducing the subject and explaining what I have understood so far from my research.
Microarrays
For those who have no idea what this is: it is literally an array (matrix) in which each cell contains a probe for a specific gene. To make a long story short, at the end of a microarray experiment you have a matrix (in computational terms) in which each column represents a sample, each row a different gene, and the entries are the expression values of the genes in each sample.
Pathways
In biology, a pathway (or gene set) is a set of genes that interact with each other, forming a small network responsible for a specific function. These pathways are not isolated; they talk to and interact with each other too. What the paper does first is expand an initial pathway (let us call it the target pathway) by including genes from other pathways that might interact with it.
Procedure
1.
Let's assume we have a G x S matrix, where G stands for genes and S for samples. We construct a gene co-expression network (G x G), using as weights (a) the Pearson correlation coefficients between pairs of genes. This can also be represented as an undirected weighted graph.
2.
For each gene (row or column) we calculate the weighted degree (d), which is simply the sum of all correlation coefficients of that gene.
3.
From the two previous matrices, they construct the transition matrix producing the probabilities (P) to transit from one gene to another by using the
formula
Q1. Why do they call this transition probability? Is there any intuitive way to see this as a probability in the biological context?
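Steps 1 to 3 can be sketched as follows. The paper's formula is not reproduced in the question, so this sketch assumes the usual random-walk normalisation p_ij = a_ij / d_i; it also assumes absolute correlations as weights and no self-loops, so that every row is non-negative and sums to 1. Under that assumption, one way to read P is as the chance that a walker sitting on gene i steps to gene j, with strongly co-expressed neighbours being more likely. The function names are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression(expr):
    """expr: G rows (genes), each a list of S sample values. Returns G x G weights a."""
    g = len(expr)
    # Assumption: absolute correlation as edge weight, zero on the diagonal.
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) for j in range(g)]
            for i in range(g)]

def transition_matrix(a):
    """Row-normalise the weights: p[i][j] = a[i][j] / d[i], d[i] = weighted degree."""
    d = [sum(row) for row in a]
    return [[w / d[i] for w in row] for i, row in enumerate(a)]

# Toy data: 3 genes measured in 4 samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1]]
P = transition_matrix(coexpression(expr))
```

Each row of `P` is a probability distribution over the next gene, which is what makes the matrix usable as the transition matrix of a Markov chain.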
4.
Given the whole transition matrix, we can define a subnetwork of the initial network that we want to expand, consisting of, say, 15 genes. In this step they use formula (3) of the paper, which transforms the values of the initial transition matrix. They set a probability of 1 on the nodes that are part of the selected subnetwork because they define them as absorbing states.
Q2. In that same formula (3), I cannot understand what the second condition does. When should the probability be 0? Intuitively, in my opinion, all nodes outside the subnetwork should keep the P_ij value as their probability.
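Formula (3) is not reproduced here, so the following is only a sketch of the standard absorbing Markov chain construction, which is assumed to be what the paper uses. In that construction, the row of an absorbing node i gets P_ii = 1 and P_ij = 0 for every j ≠ i (the walker can never leave), while rows of all other nodes keep their original P_ij. The names `make_absorbing` and `step` are hypothetical.

```python
def make_absorbing(p, absorbing):
    """Copy of transition matrix p in which the given node indices are absorbing."""
    n = len(p)
    q = [row[:] for row in p]
    for i in absorbing:
        # Once in i, stay in i: probability 1 on the diagonal, 0 elsewhere in the row.
        q[i] = [1.0 if j == i else 0.0 for j in range(n)]
    return q

def step(dist, p):
    """One step of the walk: new_dist[j] = sum_i dist[i] * p[i][j]."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]

# Toy chain with 3 nodes; node 0 plays the role of the selected subnetwork.
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
q = make_absorbing(p, {0})
dist = [0.0, 1.0, 0.0]  # the walk starts in node 1
for _ in range(20):
    dist = step(dist, q)
```

After enough steps almost all probability mass sits in the absorbing node, which is the behaviour the zero entries in the absorbing rows are there to enforce.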
5.
After that, the newly constructed transition matrix is shown in formula (4) of the paper, and I managed to understand it using this excellent article.
6.
This is where everything gets blurrier for me and where I need the most help. What I imagine at this step is that the algorithm starts at a random node and keeps walking around the network. In order to construct a relevance function (what exactly does that mean?), they first calculate a probability called the joint probability of visiting an edge E(i,j), denoted as:
On the other hand, they seem to calculate another probability, called the probability of a walk of length L starting in x, denoted as:
7.
In the next step, they divide the previously calculated probabilities and calculate the number of times a random walk that starts in x uses the transition from i to j, which I don't really understand.
After that step, I lost their reasoning completely :-P.
I'm not expecting an expert to open my mind and make me understand the whole procedure. What I'm hoping for is some guidelines, hints, ideas, useful resources, or more intuitive approaches to understanding it. Once I fully understand it, I will try to implement it in R or Python.
So any ideas or criticism are welcome.
Thanks.
I'm reading The Algorithm Design Manual by Steven S. Skiena and trying to understand the solution to the problem in War Story: What’s Past is Prolog.
The problem is also well described here.
Basically, the problem is: given an ordered list of strings, construct a trie of minimum size (with string characters as the nodes), under the constraint that the order of the strings must be preserved, while the character index positions can be reordered.
Maybe this is not an appropriate question for Stack Overflow, but I'm wondering whether anyone could give me a hint on the solution, especially what the arguments of this recurrence mean:
the recurrence for the Dynamic Programming algorithm
You can think about it this way:
Let's assume that we fix the index of the first character. All strings get split into r bins based on the value of the character in this position (bins are essentially subtrees).
We can work with each bin independently. It won't change the order across different bins because two strings in different bins are different in the first character.
Thus, we can solve the problem for each bin independently. After that, we need exactly one edge to connect the root to each bin (that is, subtree). That's where the formula
C[i_k, j_k] + 1 comes from.
As we want to minimize the total number of edges and we're free to pick the first position, we just try all possible options among m positions.
Note: this algorithm is correct under the assumption that we can reorder the rest of the characters in each subtree independently. If that is not the case, the dynamic programming solution is incorrect.
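The reasoning above can be turned into a small memoised recursion. This is a sketch under several assumptions that are not taken from the book: all strings have the same length m, a bin is a maximal run of consecutive strings sharing the character at the probed position (so the string order is preserved), and the set of still-unused positions is carried explicitly in the state, which matches the note about reordering each subtree independently. The function name `min_trie_edges` is hypothetical.

```python
from functools import lru_cache

def min_trie_edges(strings):
    """Minimum number of trie edges when each subtree may pick its own probe position."""
    m = len(strings[0])
    assert all(len(s) == m for s in strings), "sketch assumes equal-length strings"

    @lru_cache(maxsize=None)
    def cost(i, j, remaining):
        # Cost of the subtree holding strings[i..j] (inclusive);
        # `remaining` is the frozenset of character positions not yet probed.
        if not remaining:
            return 0
        best = None
        for p in remaining:  # try every position as the next one to probe
            total = 0
            start = i
            for k in range(i + 1, j + 2):
                # Close a bin when the character at p changes or the range ends.
                if k == j + 1 or strings[k][p] != strings[start][p]:
                    # One edge to reach the bin, plus the cost of the bin itself.
                    total += cost(start, k - 1, remaining - {p}) + 1
                    start = k
            best = total if best is None else min(best, total)
        return best

    return cost(0, len(strings) - 1, frozenset(range(m)))
```

For example, for `["ab", "bb"]` probing position 0 first costs 4 edges, while probing position 1 first lets both strings share the `b` edge and costs 3, so the function returns 3. Carrying the set of remaining positions makes the state space exponential in m, so this is meant for reading the recurrence, not for large inputs.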
Normally, at each node of the decision tree, we consider all features and all splitting points for each feature. We calculate the difference between the entropy of the entire node and the weighted avg of the entropies of potential left and right branches, and the feature + splitting feature_value that gives us the greatest entropy drop is chosen as the splitting criterion for that particular node.
Can someone explain why the above process, which requires (2^m -2)/2 tries for each feature at each node, where m is the number of distinct feature_values at the node, is the same as trying ONLY m-1 splits:
sort the m distinct feature_values by the percentage of 1's of the samples within the node that takes that feature_value for that feature.
Only try the m-1 ways of splitting the sorted list.
This 'trying only m-1 splits' method is mentioned as a 'shortcut' in the article below, which (by definition of 'shortcut') means the results of the two methods which differ drastically in runtime are exactly the same.
The quote: "For regression and binary classification problems, with K = 2 response classes, there is a computational shortcut [1]. The tree can order the categories by mean response (for regression) or class probability for one of the classes (for classification). Then, the optimal split is one of the L – 1 splits for the ordered list."
The article:
http://www.mathworks.com/help/stats/splitting-categorical-predictors-for-multiclass-classification.html?s_tid=gn_loc_drop&requestedDomain=uk.mathworks.com
Note that I'm talking only about categorical variables.
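For a two-class target, the two search procedures can be compared directly on a small node. The sketch below is only an empirical check on made-up counts, not a proof; the data layout `{category: (number_of_ones, number_of_samples)}` and all function names are assumptions of this example.

```python
import math
from itertools import combinations

def entropy(pos, n):
    """Binary entropy of a group with `pos` ones among `n` samples."""
    if n == 0 or pos == 0 or pos == n:
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gain(counts, left):
    """Entropy drop when the categories in `left` go left and the rest go right."""
    lp = sum(counts[c][0] for c in left)
    ln = sum(counts[c][1] for c in left)
    tp = sum(p for p, _ in counts.values())
    tn = sum(n for _, n in counts.values())
    rp, rn = tp - lp, tn - ln
    return entropy(tp, tn) - (ln / tn) * entropy(lp, ln) - (rn / tn) * entropy(rp, rn)

def exhaustive_splits(counts):
    """All (2^m - 2) / 2 two-way partitions; the first category is fixed on the left."""
    cats = sorted(counts)
    first, rest = cats[0], cats[1:]
    return [(first,) + extra
            for k in range(len(rest))  # at most m - 2 extras, so the right side is never empty
            for extra in combinations(rest, k)]

def sorted_splits(counts):
    """The m - 1 splits of the categories ordered by their fraction of ones."""
    order = sorted(counts, key=lambda c: counts[c][0] / counts[c][1])
    return [tuple(order[:k]) for k in range(1, len(order))]

def best_exhaustive(counts):
    return max(gain(counts, left) for left in exhaustive_splits(counts))

def best_sorted(counts):
    return max(gain(counts, left) for left in sorted_splits(counts))

# Hypothetical node: five categories, ten samples each.
counts = {"a": (1, 10), "b": (8, 10), "c": (3, 10), "d": (9, 10), "e": (5, 10)}
print(len(exhaustive_splits(counts)), best_exhaustive(counts))
print(len(sorted_splits(counts)), best_sorted(counts))
```

The first number on each line is the count of candidate splits examined, the second the best entropy drop found, so the two searches can be compared side by side for any counts you plug in.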
Can someone explain why the above process, which requires (2^m -2)/2 tries for each feature at each node, where m is the number of distinct feature_values at the node, is the same as trying ONLY m-1 splits:
The answer is simple: the two procedures just aren't the same. As you noticed, the number of candidate splits in the exact search grows exponentially with m, which makes it hardly feasible for larger problems in practice. Moreover, due to overfitting, the exact split would usually not be the optimal result in terms of generalization.
Instead, the exhaustive search is replaced by a kind of greedy procedure: sort first, then try all ordered splits. In general this leads to different results than the exact splitting.
To improve on the greedy result, one often further applies pruning (which can be seen as another greedy, heuristic method). And newer methods like random forests or BART deal with this problem effectively by averaging over several trees, so that the deviation of a single tree becomes less important.
I am trying to implement a basic genetic algorithm in MATLAB. I have some questions regarding the cross-over operation. I was reading materials on it and I found that always two parents are selected for cross-over operation.
What happens if I happen to have an odd number of parents?
Suppose I have parent A, parent B and parent C, and I cross A with B and then B with C to produce offspring; I then get 4 offspring. What are the criteria for rejecting one of them, given that my population size should always remain the same? Should I just reject the offspring with the lowest fitness value?
Can an arithmetic operation between parents, such as OR or AND, be deemed a good crossover operation? I found some sites listing them as crossover operations, but I am not sure.
How can I do crossover between multiple parents?
"Crossover" isn't so much a well-defined operator as the generic idea of taking aspects of parents and using them to produce offspring similar to each parent in some ways. As such, there's no real right answer to the question of how one should do crossover.
In practice, you should do whatever makes sense for your problem domain and encoding. With things like two parent recombination of binary encoded individuals, there are some obvious choices -- things like n-point and uniform crossover, for instance. For real-valued encodings, there are things like SBX that aren't really sensible if viewed from a strict biological perspective. Rather, they are simply engineered to have some predetermined properties. Similarly, permutation encodings offer numerous well-known operators (Order crossover, Cycle crossover, Edge-assembly crossover, etc.) that, again, are the result of analysis of what features in parents make sense to make heritable for particular problem domains.
You're free to do the same thing. If you have three parents (with some discrete encoding like binary), you could do something like the following:
child = new chromosome(L)
for i = 1 to L
    switch (rand(3))
        case 0:
            child[i] = parentA[i]
        case 1:
            child[i] = parentB[i]
        case 2:
            child[i] = parentC[i]
Whether that is a good operator or not will depend on several factors (problem domain, the interpretation of the encoding, etc.), but it's a perfectly legal way of producing offspring. You could also invent your own more complex method, e.g., taking a weighted average of each allele value over multiple parents, doing boolean operations like AND and OR, etc. You can also build a more "structured" operator if you like in which different parents have specific roles. The basic Differential Evolution algorithm selects three parents, a, b, and c, and computes an update like a + F(b - c) (with some function F) roughly corresponding to an offspring.
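The operators mentioned above can be sketched in Python as follows. This is illustrative only: the function names are made up for this example, individuals are plain lists, and the Differential Evolution update takes F to be multiplication by a scalar factor `f`, as in the classic scheme.

```python
import random

def uniform_multi_parent(parents, rng=random):
    """Copy each gene from a parent chosen uniformly at random (any number of parents)."""
    length = len(parents[0])
    return [rng.choice(parents)[i] for i in range(length)]

def and_or_crossover(a, b):
    """Bitwise AND and OR of two binary parents, giving two children."""
    return [x & y for x, y in zip(a, b)], [x | y for x, y in zip(a, b)]

def de_update(a, b, c, f=0.5):
    """Differential Evolution style mutant vector: a + f * (b - c)."""
    return [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]

parent_a, parent_b, parent_c = [0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]
child = uniform_multi_parent([parent_a, parent_b, parent_c])
and_child, or_child = and_or_crossover(parent_b, parent_c)
mutant = de_update([1.0, 1.0], [3.0, 5.0], [1.0, 1.0])
```

Note that AND tends to drive genes towards 0 and OR towards 1, so used alone they can reduce diversity quickly; whether that is acceptable depends on the encoding.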
Consider reading the following academic articles:
DEB, Kalyanmoy et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, v. 6, n. 2, p. 182-197, 2002.
DEB, Kalyanmoy; AGRAWAL, Ram Bhushan. Simulated binary crossover for continuous search space. Complex systems, v. 9, n. 2, p. 115-148, 1995.
For SBX, the crossover method mentioned by @deong, and for how to mutate the children, see the answer simulated-binary-crossover-sbx-crossover-operator-example.
A genetic algorithm does not have a single fixed form; many variants have been proposed. In general, though, all of them follow these steps:
1. Generate a random population, by drawing lots or any other method
2. Cross parents to produce children
3. Mutate
4. Evaluate the children and parents
5. Generate the new population from the children only, or from children and parents (different approaches exist)
6. Return to step 2
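The steps listed above can be sketched as a short loop. Everything specific here is an assumption chosen for illustration: the toy OneMax fitness (count the 1 bits), one-point crossover, bit-flip mutation, and survival of the best individuals among parents plus children.

```python
import random

def one_max(bits):
    """Toy fitness: number of 1 bits."""
    return sum(bits)

def run_ga(n_bits=20, pop_size=30, generations=50, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    # Generate a random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Cross two randomly chosen parents (one-point crossover).
            a, b = rng.sample(pop, 2)
            cut = rng.randint(1, n_bits - 1)
            child = a[:cut] + b[cut:]
            # Mutate: flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        # Evaluate parents and children together and keep the best pop_size of them.
        pop = sorted(pop + children, key=one_max, reverse=True)[:pop_size]
        # The loop then returns to the crossover step.
    return max(pop, key=one_max)

best = run_ga()
print(one_max(best), best)
```

Keeping parents and children in the same pool means the best fitness can never get worse from one generation to the next; building the new population from children only would not give that guarantee.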
NSGA-II, from the Deb et al. paper cited above, is one of the most widely used and well-known genetic algorithms. See the flow diagram taken from the article:
First of all, this is a part of a homework.
I am trying to implement a genetic algorithm. I am confused about selecting parents to crossover.
In my notes (obviously something is wrong), this is what is done as an example:
Pc (probability of crossover) * population size = estimated number of chromosomes to cross over (if not even, round to the closest even number)
Choose a random number in the range [0,1] for every chromosome, and if this number is smaller than Pc, choose this chromosome for a crossover pair.
But when the second step is applied, the number of chosen chromosomes is supposed to equal the result of the first step, which is not guaranteed because of randomness.
So this does not make sense to me. I searched for how to select parents for crossover, but all I found were crossover techniques (one-point, cut and splice, etc.) and how to cross over chosen parents (I do not have a problem with these). I just don't know which chromosomes I should choose for crossover. Any suggestions or a simple example?
You can implement it like this:
For every new child, you decide randomly, with the crossover probability, whether it will result from crossover. If yes, you select two parents, e.g. through roulette wheel selection or tournament selection. The two parents make a child, which you then mutate with the mutation probability and add to the next generation. If no, you select only one "parent", clone it, mutate it with the mutation probability, and add it to the next generation.
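A minimal sketch of this per-child scheme, assuming binary individuals stored as lists, tournament selection of size 3, one-point crossover, and bit-flip mutation; the function names and default probabilities are illustrative, not prescribed.

```python
import random

def tournament(pop, fitness, rng, k=3):
    """Return the fittest of k individuals drawn at random from the population."""
    return max(rng.sample(pop, k), key=fitness)

def next_generation(pop, fitness, p_cross=0.7, p_mut=0.01, rng=None):
    rng = rng or random.Random()
    new_pop = []
    while len(new_pop) < len(pop):
        if rng.random() < p_cross:
            # Crossover: select two parents and recombine them at one cut point.
            a, b = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            cut = rng.randint(1, len(a) - 1)
            child = a[:cut] + b[cut:]
        else:
            # No crossover: clone a single selected parent.
            child = list(tournament(pop, fitness, rng))
        # Mutate each gene with probability p_mut (bit flip).
        child = [g ^ 1 if rng.random() < p_mut else g for g in child]
        new_pop.append(child)
    return new_pop

rng = random.Random(3)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(11)]
pop = next_generation(pop, fitness=sum, rng=rng)
```

Because children are produced one at a time until the new population is full, the population size stays constant even when it is odd, and no fixed number of crossover pairs has to be computed in advance.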
Here are some other observations I would like to comment on. I often read the word "chromosomes" when it should be "individuals". You hardly ever select chromosomes, but full individuals. A chromosome is just one part of a solution. That may be nitpicking, but a solution is not a chromosome. A solution is an individual that consists of several chromosomes, which consist of genes, which show their expression in the form of alleles. Often an individual has only one chromosome, but it's still not okay to mix the terms.
I also noticed that you tagged genetic programming, which is basically only a special type of genetic algorithm. In GP you consider trees as chromosomes, which can represent mathematical formulas or computer programs. Your question does not seem to be about GP, though.
This is a very late answer, but I hope it will help someone in the future. Even if two chromosomes are not paired (and so produce no children), they go to the next generation without crossover, but after some mutation (again subject to a probability). On the other hand, if two chromosomes are paired, they produce two children, which replace the two original parents in the next generation. That is why the number of chromosomes remains the same from one generation to the next.
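The pair-based scheme described in this answer can be sketched as below. The random pairing by shuffling, the one-point crossover, the bit-flip mutation, and the handling of an unpaired individual when the population size is odd are all assumptions of this sketch.

```python
import random

def pairwise_generation(pop, p_cross=0.7, p_mut=0.01, rng=None):
    rng = rng or random.Random()
    shuffled = pop[:]
    rng.shuffle(shuffled)  # random pairing of neighbours
    new_pop = []
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        if rng.random() < p_cross:
            # Paired: the two children replace the two parents.
            cut = rng.randint(1, len(a) - 1)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        # Not paired: both parents pass through unchanged (before mutation).
        new_pop += [a, b]
    if len(shuffled) % 2:
        new_pop.append(shuffled[-1])  # odd one out is carried over unpaired
    # Mutate every gene of every individual with probability p_mut (bit flip).
    return [[g ^ 1 if rng.random() < p_mut else g for g in ind] for ind in new_pop]

rng = random.Random(5)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(11)]
new_pop = pairwise_generation(pop, rng=rng)
```

Every pair contributes exactly two individuals to the next generation whether or not it crosses over, so the population size is preserved without ever needing the expected count Pc * population size.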