Choosing parents to crossover in genetic algorithms? - genetic-algorithm

First of all, this is a part of a homework.
I am trying to implement a genetic algorithm. I am confused about selecting parents to crossover.
In my notes (obviously something is wrong) this is what is done as example;
Pc (possibility of crossover) * population size = estimated chromosome count to crossover (if not even, round to one of closest even)
Choose a random number in range [0,1] for every chromosome and if this number is smaller then Pc, choose this chromosome for a crossover pair.
But when second step applied, chosen chromosome count is equals to result found in first step. Which is not always guaranteed because of randomness.
So this does not make any sense. I searched for selecting parents for crossover but all i found is crossover techniques (one-point, cut and slice etc.) and how to crossover between chosen parents (i do not have a problem with these). I just don't know which chromosomesi should choose for crossover. Any suggestions or simple example?

You can implement it like this:
For every new child you decide if it will result from crossover by random probability. If yes, then you select two parents, eg. through roulette wheel selection or tournament selection. The two parents make a child, then you mutate it with mutation probability and add it to the next generation. If no, then you select only one "parent" clone it, mutate it with probability and add it to the next population.
Some other observations I noted and that I like to comment. I often read the word "chromosomes" when it should be individual. You hardly ever select chromosomes, but full individuals. A chromosome is just one part of a solution. That may be nitpicking, but a solution is not a chromosome. A solution is an individual that consists of several chromosomes which consist of genes which show their expression in the form of alleles. Often an individual has only one chromosome, but it's still not okay to mix terms.
Also I noted that you tagged genetic programming which is basically only a special type of a genetic algorithm. In GP you consider trees as a chromosome which can represent mathematical formulas or computer programs. Your question does not seem to be about GP though.

This is very late answer, but hoping it will help someone in the future. Even if two chromosomes are not paired (and produced children), they goes to the next generation (without crossover) but after some mutation (subject to probability again). And on the other hand, if two chromosomes paired, then they produce two children (replacing the original two parents) for the next generation. So, that's why the no of chromosomes remain same in two generations.

Related

Question about mating pool, selection and crossover

I am currently trying to coding a genetic algorithm which intended to find the optimal solution for timetable planning. I have successfully created the population and also able to calculate the fitness, what I confuse is that mating pool and the selection.
I am planning to do the tournament solution.
What I know So far is that I need to select a random number of candidate, and choose the "most fit" and the first parent. Repeat the step and find the second parent. the crossover each other. But How many crossovers I need to do? Until the same size of the population size that I set. then How about my original population?
Can Someone help me?
Your original population dies. If you want to keep the best solution(s) you can copy it to the new population (this is called elitism). You then generate offspring until your new population is full. Have a look at this Outline of the Basic Genetic Algorithm.
Cross-over is just one way to make children "different" from parents (in addition to mutation). This is independent from how you select good parents (e.g. tournaments). But note that there are so many GA variants that this may not be true for all of them.
I would consider to start with only mutation (no cross-over). This is much easier to implement, and sometimes good enough. You can always add cross-over later and see if you get an improvement.

Maintaining Population Size in a Genetic Algorithm/Program

I'm writing genetic program, but it's been a while so I'm a little rusty.
If I start with a population size of 100 individuals, and select 50 through tournament selection for reproduction, and after crossover each pair produces 50 next-generation individuals, I'm left with 100 1st-gen individuals (which will no longer reproduce, no longer part of the "population") and 50 current-gen individuals. So my tournament selection of 50 won't really work. Should the tournament selected individuals also go on to the next generation? Or should they reproduce 2:1 somehow?
Thanks for the refresher!
There are many ways to perform selection and crossover in a Genetic Algorithm but generally, if you're using tournament selection you're best to select as many individuals as your population and have them produce the same number of offspring.
There are a number of ways to produce the same number of offspring as parents but, as an example, if performing a straight forward one point crossover each half of the initial parent would carry forward with the other half of the other parent. That way two parents produce two offspring. For example
Parent 1: 00000000
Parent 2: 11111111
With a crossover point after the third bit.
Offspring 1: 00011111
Offspring 2: 11100000
Afterwards you can discard your entire initial population and replace them with all the offspring.
Note: This doesn't take into account any specialised operator you may want to include which can help to carry the best individuals forward every population. But that's another story....

What is the correct way to execute selection procedure in a genetic algorithm?

I am working on a genetic algorithm for symmetric TSP in VB.NET. I want to know what is the correct way go execute selection procedure. There seems to be at least two different possibilities:
1)
-create a "reproduction pool" of size R by using SELECTION(pop) function
-do offspring creation cycle
-randomly (uniformly) select two parents from that pool for each offspring
that needs to be created in each iteration
2)
-do offspring creation cycle
-use modified SELECTION(pop) function that will return two different parents from pop
-perform crossover to produce a child
Bonus question: After selecting two parents it is possible to produce two different offsprings (if the crossover operator is mot commutative): CROSS(p1, p2) and CROSS(p2, p1).
Should I insert both offsprings immediately or produce them one by one ? Will this make a difference?
Currently I am producing them one by one because I think it will give more variance in the population.
In genetic algorithms you don't use a separate reproduction pool, but sample from the population (|N| until you have 2*|N| parents out of which you create |N| children). If your reproduction pool R is of size 2*|N| and you sample randomly out of that pool it's essentially the same behavior, but you need more random numbers and it's more expensive to compute (depending on your RNG). Note, that there is no need to care about getting two different parents. A parent mated by itself will produce a child that is the same as the parent (if the crossover is idempotent). It's similar to using crossover probability. Also the check if two parents are different may be quite expensive if you compare them structurally. You could also compare them by fitness, but often you can have very different solutions of the same quality.
Regarding your second question: I would say that it doesn't matter much. I would choose to return only one child for simplicity reasons: A method that takes two solutions and returns one solution is easier to deal with than one that returns an array of solutions. I would say that returning both is only interesting in those cases where you can create two distinct solutions. This is the case of binary or real-valued encoding. With permutations however you cannot guarantee this property and some genetic information will be lost in the crossover anyway.
It depends on the codification.
You can consider the two fittest individuals of the current population.
Or you can use roulette whell selection (Google it) to associate each individual with a reproduction rate, this is the usual way.

Optimal placement of objects wrt pairwise similarity weights

Ok this is an abstract algorithmic challenge and it will remain abstract since it is a top secret where I am going to use it.
Suppose we have a set of objects O = {o_1, ..., o_N} and a symmetric similarity matrix S where s_ij is the pairwise correlation of objects o_i and o_j.
Assume also that we have an one-dimensional space with discrete positions where objects may be put (like having N boxes in a row or chairs for people).
Having a certain placement, we may measure the cost of moving from the position of one object to that of another object as the number of boxes we need to pass by until we reach our target multiplied with their pairwise object similarity. Moving from a position to the box right after or before that position has zero cost.
Imagine an example where for three objects we have the following similarity matrix:
1.0 0.5 0.8
S = 0.5 1.0 0.1
0.8 0.1 1.0
Then, the best ordering of objects in the tree boxes is obviously:
[o_3] [o_1] [o_2]
The cost of this ordering is the sum of costs (counting boxes) for moving from one object to all others. So here we have cost only for the distance between o_2 and o_3 equal to 1box * 0.1sim = 0.1, the same as:
[o_3] [o_1] [o_2]
On the other hand:
[o_1] [o_2] [o_3]
would have cost = cost(o_1-->o_3) = 1box * 0.8sim = 0.8.
The target is to determine a placement of the N objects in the available positions in a way that we minimize the above mentioned overall cost for all possible pairs of objects!
An analogue is to imagine that we have a table and chairs side by side in one row only (like the boxes) and you need to put N people to sit on the chairs. Now those ppl have some relations that is -lets say- how probable is one of them to want to speak to another. This is to stand up pass by a number of chairs and speak to the guy there. When the people sit on two successive chairs then they don't need to move in order to talk to each other.
So how can we put those ppl down so that every distance-cost between two ppl are minimized. This means that during the night the overall number of distances walked by the guests are close to minimum.
Greedy search is... ok forget it!
I am interested in hearing if there is a standard formulation of such problem for which I could find some literature, and also different searching approaches (e.g. dynamic programming, tabu search, simulated annealing etc from combinatorial optimization field).
Looking forward to hear your ideas.
PS. My question has something in common with this thread Algorithm for ordering a list of Objects, but I think here it is better posed as problem and probably slightly different.
That sounds like an instance of the Quadratic Assignment Problem. The speciality is due to the fact that the locations are placed on one line only, but I don't think this will make it easier to solve. The QAP in general is NP hard. Unless I misinterpreted your problem you can't find an optimal algorithm that solves the problem in polynomial time without proving P=NP at the same time.
If the instances are small you can use exact methods such as branch and bound. You can also use tabu search or other metaheuristics if the problem is more difficult. We have an implementation of the QAP and some metaheuristics in HeuristicLab. You can configure the problem in the GUI, just paste the similarity and the distance matrix into the appropriate parameters. Try starting with the robust Taboo Search. It's an older, but still quite well working algorithm. Taillard also has the C code for it on his website if you want to implement it for yourself. Our implementation is based on that code.
There has been a lot of publications done on the QAP. More modern algorithms combine genetic search abilities with local search heuristics (e. g. Genetic Local Search from Stützle IIRC).
Here's a variation of the already posted method. I don't think this one is optimal, but it may be a start.
Create a list of all the pairs in descending cost order.
While list not empty:
Pop the head item from the list.
If neither element is in an existing group, create a new group containing
the pair.
If one element is in an existing group, add the other element to whichever
end puts it closer to the group member.
If both elements are in existing groups, combine them so as to minimize
the distance between the pair.
Group combining may require reversal of order in a group, and the data structure should
be designed to support that.
Let me help the thread (of my own) with a simplistic ordering approach.
1. Order the upper half of the similarity matrix.
2. Start with the pair of objects having the highest similarity weight and place them in the center positions.
3. The next object may be put on the left or the right side of them. So each time you may select the object that when put to left or right
has the highest cost to the pre-placed objects. Goto Step 2.
The selection of Step 3 is because if you left this object and place it later this cost will be again the greatest of the remaining, and even more (farther to the pre-placed objects). So the costly placements should be done as earlier as it can be.
This is too simple and of course does not discover a good solution.
Another approach is to
1. start with a complete ordering generated somehow (random or from another algorithm)
2. try to improve it using "swaps" of object pairs.
I believe local minima would be a huge deterrent.

Grouping individuals into families

We have a simulation program where we take a very large population of individual people and group them into families. Each family is then run through the simulation.
I am in charge of grouping the individuals into families, and I think it is a really cool problem.
Right now, my technique is pretty naive/simple. Each individual record has some characteristics, including married/single, age, gender, and income level. For married people I select an individual and loop through the population and look for a match based on a match function. For people/couples with children I essentially do the same thing, looking for a random number of children (selected according to an empirical distribution) and then loop through all of the children and pick them out and add them to the family based on a match function. After this, not everybody is matched, so I relax the restrictions in my match function and loop through again. I keep doing this, but I stop before my match function gets too ridiculous (marries 85-year-olds to 20-year-olds for example). Anyone who is leftover is written out as a single person.
This works well enough for our current purposes, and I'll probably never get time or permission to rework it, but I at least want to plan for the occasion or learn some cool stuff - even if I never use it. Also, I'm afraid the algorithm will not work very well for smaller sample sizes. Does anybody know what type of algorithms I can study that might relate to this problem or how I might go about formalizing it?
For reference, I'm comfortable with chapters 1-26 of CLRS, but I haven't really touched NP-Completeness or Approximation Algorithms. Not that you shouldn't bring up those topics, but if you do, maybe go easy on me because I probably won't understand everything you are talking about right away. :) I also don't really know anything about evolutionary algorithms.
Edit: I am specifically looking to improve the following:
Less ridiculous marriages.
Less single people at the end.
Perhaps what you are looking for is cluster analysis?
Lets try to think of your problem like this (starting by solving the spouses matching):
If you were to have a matrix where each row is a male and each column is a female, and every cell in that matrix is the match function's returned value, what you are now looking for is selecting cells so that there won't be a row or a column in which more than one cell is selected, and the total sum of all selected cells should be maximal. This is very similar to the N Queens Problem, with the modification that each allocation of a "queen" has a reward (which we should maximize).
You could solve this problem by using a graph where:
You have a root,
each of the first raw's cells' values is an edge's weight leading to first depth vertices
each of the second raw's cells' values is an edge's weight leading to second depth vertices..
Etc.
(Notice that when you find a match to the first female, you shouldn't consider her anymore, and so for every other female you find a match to)
Then finding the maximum allocation can be done by BFS, or better still by A* (notice A* typically looks for minimum cost, so you'll have to modify it a bit).
For matching between couples (or singles, more on that later..) and children, I think KNN with some modifications is your best bet, but you'll need to optimize it to your needs. But now I have to relate to your edit..
How do you measure your algorithm's efficiency?
You need a function that receives the expected distribution of all states (single, married with one children, single with two children, etc.), and the distribution of all states in your solution, and grades the solution accordingly. How do you calculate the expected distribution? That's quite a bit of statistics work..
First you need to know the distribution of all states (single, married.. as mentioned above) in the population,
then you need to know the distribution of ages and genders in the population,
and last thing you need to know - the distribution of ages and genders in your population.
Only then, according to those three, can you calculate how many people you expect to be in each state.. And then you can measure the distance between what you expected and what you got... That is a lot of typing.. Sorry for the general parts...

Resources