Genetic Algorithm - deciding on the number of children to generate

If I am to build a genetic algorithm with
Tournament selection
Box crossover (recombination)
Gaussian mutation
Population replacement with offspring
I would like to know the following:
Should one offspring be generated for each pair of parents, or two? If not, should the number of offspring created be a parameter?
Thanks in advance for your replies
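
For what it's worth, here is a minimal Python sketch of one generation built from the components listed above, with the number of children per parent pair made an explicit parameter. All names (offspring_per_pair, box_crossover, etc.) and the real-valued encoding are illustrative assumptions, not taken from the question.

```python
import random

# Minimal sketch of one GA generation where the number of children per
# parent pair is an explicit parameter (offspring_per_pair). All names
# are illustrative; real-valued individuals and minimisation are assumed.

def tournament(pop, fitness, k=3):
    """Tournament selection: best of k randomly chosen individuals."""
    return min(random.sample(pop, k), key=fitness)

def box_crossover(p1, p2):
    """Box (blend) crossover: each gene drawn uniformly between the parents."""
    return [random.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]

def gaussian_mutation(ind, sigma=0.1, rate=0.1):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [g + random.gauss(0, sigma) if random.random() < rate else g
            for g in ind]

def next_generation(pop, fitness, offspring_per_pair=2):
    """Produce a new population of the same size, replacing the parents."""
    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = tournament(pop, fitness), tournament(pop, fitness)
        for _ in range(offspring_per_pair):
            new_pop.append(gaussian_mutation(box_crossover(p1, p2)))
            if len(new_pop) == len(pop):
                break
    return new_pop
```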

Related

Algorithm: Where to place the facility, digging the well, to have minimal distance in total

The question is: You are appointed by a Non-governmental Organization whose mission is to increase access to drinkable water to find the optimal place to dig a well in a village. The sum of the distances to the houses should be minimal, but there are obstacles (walls, cliffs, trees) in the way.
An example would be:
Where # is an obstacle and * is a house.
What I have tried:
1) For each empty grid cell, run a breadth-first search and calculate the total distance from that cell to all the houses. Finally, pick the cell with the smallest total distance.
2) Build a complete graph for this map, i.e., connect all the possible routes.
Finally, run a minimum spanning tree algorithm on it. All the empty cells located on the MST are the solutions.
Assuming the number of houses is small, you can run BFS from each house instead of from each empty cell, then take the cell with the minimum sum of distances.
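As a rough sketch of that idea (one BFS per house, summing distances into each free cell); since the original example grid is not shown, the map format here is an assumption (list of strings, '#' obstacle, '*' house, '.' free cell):

```python
from collections import deque

# One BFS per house, accumulating distances into each free cell.
# Grid format ('#' obstacle, '*' house, '.' free) is an assumption.

def best_well_cell(grid):
    rows, cols = len(grid), len(grid[0])
    houses = [(r, c) for r in range(rows) for c in range(cols)
              if grid[r][c] == '*']
    total = [[0] * cols for _ in range(rows)]
    reached = [[0] * cols for _ in range(rows)]

    for hr, hc in houses:
        dist = [[-1] * cols for _ in range(rows)]
        dist[hr][hc] = 0
        q = deque([(hr, hc)])
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] != '#' and dist[nr][nc] == -1):
                    dist[nr][nc] = dist[r][c] + 1
                    q.append((nr, nc))
        for r in range(rows):
            for c in range(cols):
                if dist[r][c] >= 0:
                    total[r][c] += dist[r][c]
                    reached[r][c] += 1

    # Best free cell that is reachable from every house.
    candidates = [(total[r][c], (r, c)) for r in range(rows)
                  for c in range(cols)
                  if grid[r][c] == '.' and reached[r][c] == len(houses)]
    return min(candidates)[1] if candidates else None
```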
I believe this is over-engineering.
The objective function need not be the “sum”: what if there is an outlier that contributes heavily?
Also, the solutions you have tried are flawed: breadth-first search and MST apply to a tree topology, which does not exist in your problem setup.
Why don't you do a geographic centre-of-gravity calculation, find this initial point, and then apply perturbation techniques to refine the solution?
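A rough sketch of that centre-of-gravity-plus-perturbation idea might look like the following; it assumes the same grid format as above and a hypothetical helper total_distance(grid, cell) that returns the obstacle-aware sum of distances from a cell to all houses:

```python
import random

# Sketch of "centre of gravity + perturbation". Assumes the grid format
# used above and a hypothetical helper total_distance(grid, cell) that
# returns the obstacle-aware sum of distances from `cell` to all houses.

def centroid_then_refine(grid, total_distance, steps=200):
    houses = [(r, c) for r, row in enumerate(grid)
              for c, ch in enumerate(row) if ch == '*']
    # Geometric centre of gravity of the houses, rounded to a cell.
    cr = round(sum(r for r, _ in houses) / len(houses))
    cc = round(sum(c for _, c in houses) / len(houses))
    best = (cr, cc)
    # If the centre lands on an obstacle, any free candidate will beat it.
    best_cost = (total_distance(grid, best)
                 if grid[cr][cc] == '.' else float('inf'))
    for _ in range(steps):
        # Random perturbation: try a nearby free cell, keep it if better.
        r = best[0] + random.randint(-2, 2)
        c = best[1] + random.randint(-2, 2)
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == '.':
            cost = total_distance(grid, (r, c))
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best
```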

Graph partitioning in groups of n vertices each

Is there any graph partitioning method that can partition a graph into groups of at most n vertices?
Example: I have a graph with 1000 vertices and I want to partition it into subgraphs with at most 100 vertices. There can be 2 subgraphs with 50 vertices each if the algorithm finds that to be better.
I found a method that uses k-means and then "calibrates" the clusters so that each cluster has 100 vertices, but I think this method is time-consuming.
Any ideas?
Edit: OK, maybe it was wrong to ask for subgraphs. Just imagine how k-means works: I want to partition my graph into small groups; after partitioning I solve a TSP in each group, then link every group with its nearest group and apply 3-opt moves on the group centers. But to do this I need a partitioning method that finds groups with at most n vertices; the algorithm can find k groups with n vertices and, if some vertices are left over, make another group from what is left. Vertices must be close to each other, not randomly selected.
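One simple baseline for "groups of at most n vertices that are close to each other" is to grow each group by BFS from an unassigned vertex until it holds n vertices; this is only a greedy sketch, not a proper partitioning algorithm, and the adjacency-dict graph format is an assumption:

```python
from collections import deque

# Greedy baseline: grow a group by BFS from an unassigned vertex until it
# holds n vertices, then start a new group. The graph is assumed to be an
# adjacency dict {vertex: [neighbours]}; this is not Kernighan-Lin or
# spectral partitioning, just a simple starting point.

def greedy_groups(adj, n):
    unassigned = set(adj)
    groups = []
    while unassigned:
        start = next(iter(unassigned))
        group, q, seen = [], deque([start]), {start}
        while q and len(group) < n:
            v = q.popleft()
            if v in unassigned:
                group.append(v)
                unassigned.discard(v)
            for w in adj[v]:
                if w in unassigned and w not in seen:
                    seen.add(w)
                    q.append(w)
        groups.append(group)
    return groups
```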
You will need to do some research for this. I remember a heuristic algorithm called Kernighan-Lin that can serve your purpose. Unfortunately, you would need to generalize it, and the running time is really bad; I believe it is around O(N^3).
Another, more professional but complicated, approach is to use spectral partitioning. There is a very detailed article about this topic on the ScienceDirect website: "Spectral partitioning works: Planar graphs and finite element meshes".
I hope this will help you in your quest. But I am warning you, this is not a simple matter. Good luck!

Importance of randomness in Kruskal's algorithm for maze generation

I have an assignment where I need to create a maze from a grid of cells.
I successfully did it using the Randomized Kruskal's algorithm as described on the Wiki page and using a Disjoint-set data structure.
Now the assignment asks me to do the same, but instead of picking cells in a random order I just start at the top left of the grid and go through all the cells in order until I reach the bottom left.
The modified algorithm seems to work just fine and I don't notice any major difference with the Randomized Kruskal's algorithm.
So my question is: what is the importance of picking elements in a random order in the Randomized Kruskal's algorithm? Is there any maze that could be created by the randomized version but could not be created by the non-random one?
Thanks,
Both can create all mazes, but the distributions will be different.
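To make the comparison concrete, here is a rough Kruskal-style sketch with a flag that either shuffles the wall list (the randomized version) or processes it in row-major order; both produce a spanning tree, only the sampling over possible mazes differs. The grid/wall representation is an assumption:

```python
import random

# Kruskal-style maze generation over a width x height grid of cells.
# With shuffle=True the wall list is randomised; with shuffle=False the
# walls are processed in row-major order. Both yield a spanning tree.

def kruskal_maze(width, height, shuffle=True, seed=None):
    parent = list(range(width * height))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each wall separates a cell from its right or bottom neighbour.
    walls = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                walls.append(((x, y), (x + 1, y)))
            if y + 1 < height:
                walls.append(((x, y), (x, y + 1)))

    if shuffle:
        random.Random(seed).shuffle(walls)

    passages = []
    for (ax, ay), (bx, by) in walls:
        ra, rb = find(ay * width + ax), find(by * width + bx)
        if ra != rb:              # cells in different sets: knock the wall down
            parent[ra] = rb
            passages.append(((ax, ay), (bx, by)))
    return passages               # the removed walls, i.e. the maze passages
```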

Crossover algorithm for Travelling Salesman?

I am looking for a crossover algorithm for a genetic algorithm for the Travelling Salesman Problem.
However, my TSP problem is a variation of the traditional problem:
Instead of only being given a list of points that we need to visit, we are given a list of points that we need to visit and a list of points we need to start at and end at. In other words, any route must start and end at points belonging to the second list, but must visit all the points in the first list.
So, in other words, not every permutation of points is valid. Because of this, I'm not sure if traditional crossover algorithms will work well (for instance, I tried ordered crossover and the children it created were generally worse than their parents).
Can anyone suggest a crossover algorithm?
In order to keep your existing crossover and mutation operators, you can add a repair operator that checks whether the positions of your starting and ending locations are correct and, if they aren't, swaps them into their valid positions (first and last).
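A minimal sketch of such a repair operator, assuming a chromosome is a list of points whose first and last positions must hold points from the start/end list (names are illustrative):

```python
# Repair operator sketch: a chromosome is assumed to be a list of points
# whose first and last positions must hold points from `terminals`
# (the allowed start/end points). Misplaced terminals are swapped back.

def repair(chromosome, terminals):
    chrom = list(chromosome)
    terminal_set = set(terminals)

    for slot in (0, len(chrom) - 1):
        if chrom[slot] not in terminal_set:
            # Find a terminal sitting in an interior position and swap it in.
            for i in range(1, len(chrom) - 1):
                if chrom[i] in terminal_set:
                    chrom[slot], chrom[i] = chrom[i], chrom[slot]
                    break
    return chrom
```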

Why do these maze generation algorithms produce mazes with different properties?

I was browsing the Wikipedia entry on maze generation algorithms and found that the article strongly insinuated that different maze generation algorithms (randomized depth-first search, randomized Kruskal's, etc.) produce mazes with different characteristics. This seems to suggest that the algorithms produce random mazes with different probability distributions over the set of all single-solution mazes (spanning trees on a rectangular grid).
My questions are:
Is this correct? That is, am I reading this article correctly, and is the article correct?
If so, why? I don't see an intuitive reason why the different algorithms would produce different distributions.
Uh, well, I think it's pretty obvious that different algorithms generate different mazes. Let's just talk about spanning trees of a grid. Suppose you have a grid G and two algorithms to generate a spanning tree for the grid:
Algorithm A:
Pick any edge of the grid, with 99% probability choose a horizontal one, otherwise a vertical one
Add the edge to the maze, unless adding it would create a cycle
Stop when every vertex is connected to every other vertex (spanning tree complete)
Algorithm B:
As algorithm A, but set the probability to 1% instead of 99%
"Obviously" algorithm A produces mazes with lots of horizontal passages and algorithm B mazes with lots of vertical passages. That is, there is a statistical correlation between the number of horizontal passages in a maze and the maze being produced by algorithm A.
Of course the differences between the Wikipedia algorithms are more intricate but the principle is the same. The algorithms sample the space of possible mazes for a given grid in a non-uniform, structured way.
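For concreteness, a rough sketch of Algorithms A and B: pick a horizontal edge with probability p_horizontal (0.99 for A, 0.01 for B), keep it unless it would close a cycle, and stop when the grid is spanned. The union-find cycle check and the grid representation are assumptions:

```python
import random

# Sketch of Algorithms A/B: repeatedly pick a horizontal edge with
# probability p_horizontal (0.99 for A, 0.01 for B), otherwise a vertical
# one, and keep it unless it would create a cycle. Assumes width, height >= 2.

def biased_spanning_tree(width, height, p_horizontal=0.99, seed=None):
    rng = random.Random(seed)
    parent = list(range(width * height))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    horizontal = [((x, y), (x + 1, y))
                  for y in range(height) for x in range(width - 1)]
    vertical = [((x, y), (x, y + 1))
                for y in range(height - 1) for x in range(width)]

    edges, components = [], width * height
    while components > 1:
        pool = horizontal if rng.random() < p_horizontal else vertical
        (ax, ay), (bx, by) = rng.choice(pool)
        ra, rb = find(ay * width + ax), find(by * width + bx)
        if ra != rb:                      # no cycle: accept the edge
            parent[ra] = rb
            edges.append(((ax, ay), (bx, by)))
            components -= 1
    return edges
```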
LOL I remember a scientific conference where a researcher presented her results about her algorithm that did something "for graphs". The results were statistical and presented for "random graphs". Someone asked from the audience "which distribution of random graphs did you draw the graphs from?" The answer: "uh... they were produced by our graph generation program". Duh!
Interesting question. Here are my random 2c.
Comparing Prim's to, say, DFS, the latter seems to have a proclivity for producing deeper trees simply because the first 'runs' have more space to create deep trees with fewer branches. Prim's algorithm, on the other hand, appears to create trees with more branching due to the fact that any open branch can be selected at each iteration.
One way to ask this would be to look at the probability that each algorithm will produce a tree of depth > N. I have a hunch that they would be different. A more formal approach to proving this might be to assign some weights to each part of the tree and show that one part is more likely to be taken, or to attempt to characterize the space some other way, but I'll be hand-wavy and guess it's correct :). I'm interested in what led you to think it wouldn't be, because my intuition was the opposite. And no, the Wiki article doesn't give a convincing argument.
EDIT
One simple way to see this is to consider an initial tree with two children and a total of k nodes,
e.g.,
*---* ... *
 \--* ... *
Choose a random node as the start and end. DFS will produce one of two mazes, either the entire tree, or the part of it with the direct path from start to end. Prim's algorithm will produce the 'maze' with the direct path from start to end with secondary paths of length 1 ... k.
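If you want to test the hunch about depth empirically, a quick and purely illustrative experiment is to build spanning trees of a grid with randomized DFS and with Prim-style frontier growth, then compare the resulting tree depths:

```python
import random
from collections import deque

# Illustrative experiment: spanning trees of an n x n grid built with
# randomised DFS vs. Prim-style frontier growth, comparing tree depth
# (longest path from the root (0, 0)).

def grid_neighbours(cell, n):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < n and 0 <= ny < n:
            yield (nx, ny)

def dfs_tree(n, rng):
    visited, edges, stack = {(0, 0)}, [], [(0, 0)]
    while stack:
        cell = stack[-1]
        nbrs = [c for c in grid_neighbours(cell, n) if c not in visited]
        if nbrs:
            nxt = rng.choice(nbrs)
            visited.add(nxt)
            edges.append((cell, nxt))
            stack.append(nxt)
        else:
            stack.pop()
    return edges

def prim_tree(n, rng):
    visited = {(0, 0)}
    frontier = [((0, 0), c) for c in grid_neighbours((0, 0), n)]
    edges = []
    while frontier:
        a, b = frontier.pop(rng.randrange(len(frontier)))
        if b in visited:
            continue
        visited.add(b)
        edges.append((a, b))
        frontier.extend((b, c) for c in grid_neighbours(b, n)
                        if c not in visited)
    return edges

def depth(edges):
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    deepest, queue = 0, deque([((0, 0), 0)])
    while queue:
        cell, d = queue.popleft()
        deepest = max(deepest, d)
        queue.extend((c, d + 1) for c in children.get(cell, []))
    return deepest

rng = random.Random(0)
print(sum(depth(dfs_tree(10, rng)) for _ in range(50)) / 50)   # DFS: typically deep
print(sum(depth(prim_tree(10, rng)) for _ in range(50)) / 50)  # Prim: typically shallow
```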
It is not statistical until you request that each algorithm produce every solution it can.
What you are perceiving as statistical bias is only a bias towards the preferred, first solution.
That bias may not be algorithmic (set-theory-wise) but implementation-dependent (like the bias in the choice of the pivot in quicksort).
Yes, it is correct. You can produce different mazes by starting the process in different ways. Some algorithms start with a fully closed grid and remove walls to generate a path through the maze, while others start with an empty grid and add walls, leaving behind a path. This alone can produce different results.

Resources