Implementing the mutation rate in a genetic algorithm

Given an array such as
[1, 4, 6, 1, 10, 3, 24, 1]
Say I want to implement a mutation rate of 0.2. Would I:
always mutate 20% of my array entries,
mutate 0-20% of the entries, or
iterate over the array and mutate each entry 20% of the time?
It is unclear to me from the literature how this is handled, or whether there is even an agreed-upon standard.
Note: I am a coder meddling with GAs, so bear with my lack of depth in GA knowledge.
Thanks

I was unsure about that too when I started learning about genetic algorithms. I decided it's best to give each gene an x% chance of being mutated (completely changed). In your case I would iterate over the array and, whenever Math.random() is smaller than 0.2, set the current number to a new random one.
If you find that you don't get enough diversity you can also add one or two completely random individuals (I like to call them 'foreigners' since they don't have any common ancestors).
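A minimal sketch of the per-gene approach in Python (the value range 1-24 is my assumption, read off the example array; substitute whatever domain your genes actually have):

```python
import random

def mutate(genes, rate=0.2, low=1, high=24):
    """Return a copy of `genes` where each gene is independently
    replaced by a fresh random value with probability `rate`."""
    return [random.randint(low, high) if random.random() < rate else g
            for g in genes]

parent = [1, 4, 6, 1, 10, 3, 24, 1]
child = mutate(parent)   # each entry has a 20% chance of changing
```

Note that with this scheme the expected number of mutated genes is 20% of the array, but any particular child may have more or fewer.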


Distribute a quantity randomly

I'm starting a project where I'm simulating an explosion of an object. I want to randomly distribute the total mass of the object that explodes into the fragments. For example, if the object has a mass of 3 kg and breaks into 3 fragments their masses could be 1, 0.5, 1.5 respectively. I want to do the same thing with energy and other things. Also, I would like to have control over the random distribution used.
I think I could do this simply by generating a random number, somehow relating it to the quantity I want to distribute, and repeating that while subtracting from the total pool. The problem with this approach is that at first sight it doesn't seem very efficient, and it may cause problems for a fixed number of fragments.
So the question is, is there an algorithm or an efficient way to do this?
An example will be thoroughly appreciated.
For this problem, the first thing I would try is this:
Generate N-1 random numbers between 0 and 1
Sort them
Raise them to the xth power
Multiply the N differences between consecutive values (with 0 prepended and 1 appended) by the quantity you want to distribute. All these differences add up to 1, so you'll end up distributing exactly the target quantity.
A nice advantage of this method is that you can adjust the parameter x to get an aesthetically pleasing distribution of chunks. Natural explosions won't produce a uniform distribution of chunk sizes, so you'll want to play with this.
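A sketch of this sorted-differences method in Python (the function name and the way x is applied are my choices; any positive x keeps the cut points sorted within [0, 1]):

```python
import random

def random_partition(total, n, x=1.0):
    """Split `total` into n non-negative parts summing to `total`
    via the sorted-differences method; x skews the chunk sizes."""
    cuts = sorted(random.random() for _ in range(n - 1))
    cuts = [c ** x for c in cuts]        # stays sorted for x > 0
    bounds = [0.0] + cuts + [1.0]
    return [(hi - lo) * total
            for lo, hi in zip(bounds, bounds[1:])]

masses = random_partition(3.0, 3)        # e.g. three fragment masses
```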
So here's a generic algorithm that might work for you:
Generate N random numbers using a distribution of your choosing
Find the sum of all the numbers
Divide each number by the sum
Multiply by the fixed total mass of your object
This will only take O(N) time, and will allow you to control the distribution and number of chunks.
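A sketch of this normalize-and-rescale method in Python (the `draw` parameter is my addition, standing in for "a distribution of your choosing"; it must return positive values or the rescaling breaks):

```python
import random

def distribute(total, n, draw=random.random):
    """Draw n samples from `draw` (assumed positive) and rescale
    them so they sum exactly to `total`; runs in O(n)."""
    samples = [draw() for _ in range(n)]
    scale = total / sum(samples)
    return [v * scale for v in samples]

masses = distribute(3.0, 3)                          # uniform draws
energies = distribute(3.0, 3, lambda: random.expovariate(1.0))
```

One caveat worth knowing: rescaling changes the shape of the result. If I recall correctly, normalized exponential draws spread uniformly over all possible splits, while normalized uniform draws cluster the parts around total/n.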

Partition matrix to minimize variance of parts

I have a matrix of real numbers and I would like to find a partition of this matrix such that the both the number of parts and the variance of the numbers in each part are minimized. Intuitively, I want as few parts as possible, but I also want all the numbers within any given part to be close together.
More formally, I suppose for the latter I would find, for each part, the variance of the numbers in that part, and then take the average of those variances over all the parts. This would be one part of the "score" for a given solution; the other part would be, for instance, the total number of elements in the matrix minus the number of parts in the partition, so that fewer parts lead to a higher score. The final score for the solution would be a weighted average of the two parts, and the best solution is the one with the highest score.
Obviously a lot of this is heuristic: I need to decide how to balance the number of parts versus the variances. But I'm stuck for even a general approach to the problem.
For instance, given the following simple matrix:
10, 11, 12, 20, 21
8, 13, 9, 22, 23
25, 23, 24, 26, 27
It would be a reasonable solution to partition into the following submatrices:
10, 11, 12 | 20, 21
8, 13, 9 | 22, 23
--------------+----------
25, 23, 24 | 26, 27
Partitioning is only allowed by slicing vertically and horizontally.
Note that I don't need the optimal solution, I just need an approach to get a "good" solution. Also, these matrices are several hundred by several hundred, so brute forcing it is probably not a reasonable solution, unless someone can propose a good way to pare down the search space.
I think you'd be better off by starting with a simpler problem. Let's call this
Problem A: given a fixed number of vertical and/or horizontal partitions, where should they go to minimize the sum of variances (or perhaps some other measure of variation, such as the sum of ranges within each block).
I'd suggest using a dynamic programming formulation for problem A.
Once you have that under control, then you can deal with
Problem B: find the best trade-off between variation and the number of vertical and horizontal partitions.
Obviously, you can reduce the variance to 0 by putting each element into its own block. In general, problem B requires you to solve problem A for each choice of vertical and horizontal partition counts that is considered.
To use a dynamic programming approach for problem B, you would have to formulate an objective function that encapsulates the trade-off you seek. I'm not sure how feasible this is, so I'd suggest looking for different approaches.
As it stands, problem B is a 2D problem. You might find some success looking at 2D clustering algorithms. An alternative might be possible if it can be reformulated as a 1D problem: trading off variation against the number of blocks (instead of the vertical and horizontal partition counts). Then you could use something like the Jenks natural breaks classification method to decide where to draw the line(s).
Anyway, this answer clearly doesn't give you a working algorithm. But I hope that it does at least provide an approach (which is all you asked for :)).
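To make the dynamic-programming suggestion for problem A concrete, here is a minimal sketch of its 1D analogue in Python: placing a fixed number of breaks in a row of numbers to minimize the sum of within-segment variances. It is O(k·n²) thanks to prefix sums, which is fine for a single row but would need more care scaled to the full 2D problem:

```python
def best_breaks(values, k):
    """Split `values` into k+1 contiguous segments (k breaks) so that
    the sum of within-segment variances is minimal: a 1D analogue of
    'problem A', solved by dynamic programming."""
    n = len(values)
    # prefix sums give any segment's variance in O(1)
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        ps[i + 1] = ps[i] + v
        ps2[i + 1] = ps2[i] + v * v

    def seg_var(i, j):                  # variance of values[i:j]
        m = j - i
        mean = (ps[j] - ps[i]) / m
        return (ps2[j] - ps2[i]) / m - mean * mean

    INF = float("inf")
    # cost[p][j]: best total variance splitting values[:j] into p segments
    cost = [[INF] * (n + 1) for _ in range(k + 2)]
    back = [[0] * (n + 1) for _ in range(k + 2)]
    cost[0][0] = 0.0
    for p in range(1, k + 2):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = cost[p - 1][i] + seg_var(i, j)
                if c < cost[p][j]:
                    cost[p][j], back[p][j] = c, i
    breaks, j = [], n                   # walk back to recover the cuts
    for p in range(k + 1, 0, -1):
        j = back[p][j]
        breaks.append(j)
    return sorted(breaks)[1:]           # drop the leading 0
```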

Data structure and algorithm for multidimensional "volume rental" scheduler

Abstract problem. Imagine the world is a cube, made of multiple cubical cells along all the dimensions of the cube.
Now, imagine you are able to rent certain volumes for certain periods of time, for example: you rent a 3x3x3 volume with coordinates [1, 1, 1] to [3, 3, 3] for year 2012. Then you rent a 2x2x2 volume with coordinates [4, 1, 1] to [5, 2, 2] for year 2012.
Now, imagine you are able to let out volumes that you have rented, for periods in which you hold them. For example, having rented the volumes as defined above, you let out a 5x2x1-cell volume with coordinates [1, 1, 1] to [5, 2, 1] for Q1'2012. Then you let out cell [5, 2, 2] for the whole year 2012.
You can rent the same volumes in multiple "rental contracts", and let them out, also in multiple "contracts".
The question is - what data structures and algorithms can be used to answer questions like:
When can I let out a certain cell?
What cells can I let out in a certain period?
Can I let out cells of certain coordinates, not including all the dimensions (e.g.: someone wants to rent any cells that have coordinate X between 2 and 4 for year 2012)?
A brute-force approach (trying every combination) is out of the question. The data set I need this to work on is 5-dimensional (with more dimensions potentially coming soon), and the dimensions are 100-200 items long on average.
If you treat time as just another dimension then what you describe looks like the sort of queries you might expect to want to pose about any collection of objects in n-dimensional space.
That suggests to me something like http://en.wikipedia.org/wiki/K-d_tree or possibly some n-dimensional version of http://en.wikipedia.org/wiki/Octree. The catch, of course, is that these data structures run out of steam as the number of dimensions increases.
You rule out the brute force approach of checking every cell. Are you also forced to rule out the brute force approach of checking each query against each known object in the n-dimensional space? Since everything seems to be an axis-aligned n-dimensional rectangle, checking for the intersection of a query with an object may not be hard - and this may be what you will get anyway if you attempt to duck the problem by throwing a database query package or some very high-level language at it: a full table scan.
As mcdowella points out, octrees and k-d trees lose efficiency as the number of dimensions increases beyond about 4 or 5. Since you haven't said what the dimensions are, I'm going to assume they are properties of the objects you are talking about. Just put them in an RDBMS and use indexes on these fields. A good implementation can perform well querying against multiply-indexed items.
If your dimensions have binary values (or small enums) then something else will likely be better.
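The per-object intersection test mentioned above is cheap in any number of dimensions once time is folded in as one more axis. A minimal sketch in Python (encoding Q1-Q4 of 2012 as integer ticks 0-3 on a time axis is my assumption for illustration):

```python
def boxes_intersect(a, b):
    """Axis-aligned overlap test for two n-dimensional boxes.
    Each box is (low_corner, high_corner) with inclusive coordinates;
    time is treated as just another axis."""
    lo_a, hi_a = a
    lo_b, hi_b = b
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))

# the 3x3x3 rental for all of 2012 vs. the 5x2x1 Q1'2012 sublet
rental = ((1, 1, 1, 0), (3, 3, 3, 3))
sublet = ((1, 1, 1, 0), (5, 2, 1, 0))
```

Scanning every known contract with this test is linear in the number of contracts, which may well be acceptable even when the cell grid itself is far too large to enumerate.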

Find the right/best track combination for a given distance, using a genetic algorithm

I have a list of tracks (model railroad tracks) with different length, example:
TrackA on 3.0cm,
TrackB on 5.0cm,
TrackC on 6.5cm,
TrackD on 10.5cm
Then I want to find out what kinds of track I should put together to get from point A to point B, given a distance and a margin. I should also be able to prioritize the use of certain track types.
Example: the distance from point A to B is 1.7 m, and I have a lot of TrackC and a few of TrackB.
I will allow a margin of +/- 0.5 cm on the distance.
What kinds of track should I use, how many of each, and how many combinations are there, sorted by the track type I have the most of?
I have Googled for C# help using genetic algorithms, but I am lost as to how to implement this in a good method.
Please help..
This is how you do it. I'll assume you are familiar with the basic GA concepts:
Each individual in the population must consist of various 'lengths of track'.
The primitive set would therefore be a set of constants corresponding to the lengths you have available, for example {3, 4, 5}.
The fitness of each individual is the total error, so lower is better. Say your track is supposed to be 1 metre long: if an individual is exactly 1 metre long, there is no error and its fitness is 0; if another individual has a length of 0.5 m, its fitness is 0.5.
So your algorithm goes like:
Construct a population (maybe randomly, but there are other initialisation techniques such as ramped half-and-half, full, etc. - look them up).
Assess the fitness of each individual. If any have fitness within a certain margin, you're done.
Else advance by one generation via crossover / reproduction / mutation.
Go to step 2.
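A minimal end-to-end sketch of that loop in Python, using the track lengths from the question (the population size, generation count, variable-length list encoding, and keep-the-better-half selection are all my choices, not prescribed above):

```python
import random

TRACKS = [3.0, 5.0, 6.5, 10.5]   # available lengths in cm
TARGET = 170.0                   # 1.7 m
MARGIN = 0.5

def fitness(ind):                # lower is better; 0 means exact fit
    return abs(sum(ind) - TARGET)

def random_individual():         # append tracks until the target is reached
    ind = []
    while sum(ind) < TARGET:
        ind.append(random.choice(TRACKS))
    return ind

def crossover(a, b):             # one-point crossover on track lists
    return a[:random.randrange(len(a))] + b[random.randrange(len(b)):]

def mutate(ind, rate=0.1):       # swap individual pieces at random
    return [random.choice(TRACKS) if random.random() < rate else t
            for t in ind]

def evolve(pop_size=100, generations=200):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        if fitness(pop[0]) <= MARGIN:
            return pop[0]        # within the allowed margin: done
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(crossover(random.choice(parents),
                                          random.choice(parents)))
                         for _ in range(pop_size - len(parents))]
    pop.sort(key=fitness)
    return pop[0]
```

With these settings the search tends to land within the margin quickly, since many exact combinations exist (34 pieces of TrackB alone, for instance). Preferring plentiful track types could be folded into the fitness as a secondary penalty term.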

Algorithm question: maximize average of functions

I have a set of N non-decreasing functions each denoted by Fi(h), where h is an integer. The functions have numeric values.
I'm trying to figure out a way to maximize the average of all of the functions given some total H value.
For example, say each function represents a grade on an assignment. If I spend h hours on assignment i, I will get g = Fi(h) as my grade. I'm given H hours to finish all of the assignments. I want to maximize my average grade for all assignments.
Can anyone point me in the right direction to figure this out? I just need a generic algorithm in pseudo code and then I can probably adapt quickly from that.
EDIT: I think dynamic programming could be used to figure this out but I'm not really 100% sure.
EDIT 2: I found an example in my algorithms book from when I was in university that is almost the exact same problem; take a look here on Google Books.
I don't know about programming, but in mathematics functions of functions are called functionals, and the pertinent math is calculus of variations.
Have a look at linear programming, specifically the section on integer programming.
Genetic algorithms are sometimes used for this sort of thing, but the result you get won't be optimal, only near-optimal.
For a "real" solution (I always feel genetics is sort of cheating): if we can determine some properties of the functions (Is function X rising? Do any of them have asymptotes we can exploit? etc.), then you need to design some analysis mechanism for each function and take it from there. If we have no properties for any of them, they could be anything. My math isn't excellent, but those functions could be insane factorials^99 that are zero unless your h is 42 or something.
Without further info, or a program that could analyze the functions and extract some, I'd go with genetics. (It would make sense to apply some analysis first; if you find properties you can use, use them, otherwise turn to the genetic algorithm.)
If the functions in F are monotonically increasing in their domains then parametric search is applicable (search for Megiddo).
Have a look at The bounded knapsack problem and the dynamic programming algorithm given.
I have one question: how many functions and how many hours do you have?
It seems to me that an exhaustive search would be quite suitable if neither is too large.
The Dynamic Programming application is quite easy, first consider:
F1 = [0, 1, 1, 5] # ie F1[0] == 0, F1[1] == 1
F2 = [0, 2, 2, 2]
Then if I have 2 hours, my best method is to do:
F1[1] + F2[1] == 3
If I have 3 hours though, I am better off doing:
F1[3] + F2[0] == 5
So the optimal profile is erratic with respect to the number of hours, which means that if a solution exists it consists in working over the number of functions.
We can thus introduce the methods one at a time:
R1 = [0, 1, 1, 5] # ie maximum achievable (for each amount) if I only have F1
R2 = [0, 2, 3, 5] # ie maximum achievable (for each amount) if I have F1 and F2
Introducing a new function takes O(N) evaluations of it, where N is the total number of hours (of course I would also have to store the exact allocation...).
Thus, if you have M functions, the algorithm is O(M*N) in terms of the number of function evaluations.
Some functions may not be trivial, but this algorithm performs caching implicitly: ie we only evaluate a given function at a given point once!
I suppose we could do better if we took the non-decreasing property into consideration, but I daresay I am unsure about the specifics. Waiting for a cleverer fellow!
Since it's homework, I'll refrain from posting the code. I would just note that you can "store" the allocation if your R tables are composed of pairs (score, nb), where nb indicates the number of hours used by the latest function introduced.
