dynamic fitness function for genetic algorithm - genetic-algorithm

I'm not sure I fully understand how genetic algorithms work. I'm trying to learn via ai4r: http://ai4r.rubyforge.org/geneticAlgorithms.html
In Job Shop Scheduling, which I believe can be solved by a GA(?), isn't the cost of any single job based on how it relates to its predecessors? I was thinking I would calculate a cost based on the placement within the chromosome, with a dynamic score of how well it is placed rather than a binary value, but I'm not sure this works.
Does anybody have any experience with this? Or does a GA only work when the difference between any two genomes is static?
I hope I have the right terminology here; as I mentioned, I'm just learning.
-----------------------update-----------------------------------
I think I'm using a bit of the wrong terminology here. I referred to 'fitness' when I think what I actually wanted to use was cost matrix.
The example I'm going from describes this
Each chromosome must represent a possible solution for the problem. This class contains an array with the list of visited nodes (cities of the tour). The size of the tour is obtained automatically from the traveling costs matrix. You have to assign the costs matrix BEFORE you run the genetic search. The following costs matrix could be used to solve the problem with only 3 cities:
data_set = [ [ 0, 10, 5],
             [ 6,  0, 4],
             [25,  4, 0]
           ]
Ai4r::GeneticAlgorithm::Chromosome.set_cost_matrix(data_set)
So in my instance, I'm thinking the 'cost' of each chromosome is dynamic, based on its neighbours.

Since you asked in a comment to make this an answer, I took the liberty of summarizing my earlier responses as well so it's all in one place. The answer to the specific question of "what is a penalty term" is in item #3 below.
The way a standard genetic algorithm works is that each "chromosome" is a complete solution to the problem; in your case, an ordering for the jobs to be submitted. The confusion, I think, centers around the notion that because the individual contribution to fitness made by a particular job in that schedule varies according to the rest of the schedule, you must need something "dynamic". That's not really true. From the point of view of the GA, the only thing that has a fitness is the entire solution. So a dynamic problem is one in which the fitness of a whole schedule can change over time. Going back to the TSP, a dynamic problem would be one in which touring cities in the order A, B, C, D, and then E actually had a different distance each time you tried it. Even though the cost of a tour through B depends on which cities come before and after B in the tour, once you decide that, the costs are static, and because the GA only ever receives costs for entire tours, all it knows is that [A,B,C,D,E] has a constant fitness. No dynamic trickery needed.
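A minimal sketch of that point, in plain Ruby rather than ai4r's own classes: the fitness function only ever sees a complete tour. The names `CITY_COSTS` and `tour_cost` are made up for illustration, and the matrix is the three-city example from the question.

```ruby
# Cost matrix from the question: CITY_COSTS[i][j] is the cost of going from city i to city j.
CITY_COSTS = [
  [ 0, 10, 5],
  [ 6,  0, 4],
  [25,  4, 0]
].freeze

# Cost of a complete tour: the sum of the costs of its consecutive legs.
# What any one city contributes depends on its neighbours, but the total
# for a given ordering is the same every time it is evaluated.
def tour_cost(tour, costs = CITY_COSTS)
  tour.each_cons(2).sum { |from, to| costs[from][to] }
end

puts tour_cost([0, 1, 2]) # legs 0->1 and 1->2: 10 + 4
puts tour_cost([0, 2, 1]) # legs 0->2 and 2->1: 5 + 4
```

The GA would compare these totals between whole tours; it never needs a per-city score.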
Now, your second question was how to handle constraints like, for the TSP example, what if you need to ensure that the salesman gets to Paris by a certain time? Typically, there are three ways to try to handle this.
1. Never allow a solution to be generated in which he doesn't get there before 2:00. Sometimes this is easy, other times it's very hard. For instance, if the constraint was "he cannot start at city X", it's fairly easy to just not generate solutions that start with X. Often though, simply finding valid solutions can be hard, and so this approach doesn't really work.
2. Allow constraints to be violated, but fix them afterward. In the TSP example, you let crossover and mutation produce any possible tour, but then scan through it to see if he gets to Paris too late. If so, swap the position of Paris with some earlier city in the tour. Again though, sometimes it can be hard to figure out a good way to repair violations.
3. Penalize the fitness of an infeasible solution. Here, the idea is that even if I can't prevent him from getting to Paris too late and I can't fix it if he does, I can at least make the fitness arbitrarily worse. For TSP, the fitness is the length of the tour. So you might say that if a tour gets him to Paris too late, the fitness is the length of the tour + 100. That lets the solution stay in the population (it might be very good otherwise, so you want it to have a chance to pass on some of its genes), but you make it less likely to be selected, because your selection and replacement methods pick individuals with better fitness values.
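The penalty approach could be sketched roughly like this. Everything here is illustrative: the distances in `LEG_KM` are invented, the deadline is expressed as distance travelled (assuming constant speed), and `penalised_cost` is a hypothetical name, not part of any library.

```ruby
# Invented symmetric distances; keys are alphabetically sorted city pairs.
LEG_KM = {
  %w[Berlin Paris]  => 9,
  %w[Berlin Rome]   => 12,
  %w[Berlin Madrid] => 19,
  %w[Paris Rome]    => 11,
  %w[Madrid Paris]  => 10,
  %w[Madrid Rome]   => 14
}.freeze

LATE_PENALTY = 100

def leg(a, b)
  LEG_KM.fetch([a, b].sort)
end

def tour_length(tour)
  tour.each_cons(2).sum { |a, b| leg(a, b) }
end

# Distance travelled by the time the salesman reaches `city`.
def arrival(tour, city)
  tour_length(tour[0..tour.index(city)])
end

# Lower is better. An infeasible tour is kept, just made less attractive.
def penalised_cost(tour, city: "Paris", deadline: 15)
  cost = tour_length(tour)
  cost += LATE_PENALTY if arrival(tour, city) > deadline
  cost
end

puts penalised_cost(%w[Berlin Paris Madrid Rome]) # Paris reached in time, no penalty
puts penalised_cost(%w[Berlin Rome Paris Madrid]) # Paris reached too late, penalty added
```

Both example tours have the same raw length, so only the penalty separates them. How large the penalty should be relative to typical tour lengths is a tuning decision.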
For your JSP problem, typically you're looking to minimize the makespan. The same three options are available to you if you do have some constraints. But from what I can tell, you don't really have such constraints. I think you're trying to inject too much knowledge into the process rather than letting the evolutionary algorithm come up with it on its own. That is, you don't necessarily worry about telling the GA that some arrangements of jobs are better than others. You just assign higher fitness to the better ones and let the process converge.
That said, injecting information like this is often a really good thing to do, but you want to have a good understanding of the basic algorithm first. Let's say that we know that for TSP, it's more likely that a good solution will connect cities that are close to one another. The way I would use that information inside a GA would be to generate random solutions non-uniformly (perhaps with a greedy heuristic). I might also replace the standard crossover and mutation algorithms with something customized. Mutation is typically easier to do this with than crossover. To mutate a TSP solution, I might pick two connected cities, break the connection, and then look for a way to reconnect them that was "closer". That is, if a tour is [A,B,C,D,E,F,G,H], I might pick the edge [B,C] at random, and then look for another edge, maybe [F,G], such that when I connected them crossways to get [A,B,G,D,E,F,C,H], the total tour length was lower. I could even extend that mutation beyond one step -- create a loop that keeps trying to break and reconnect edges until it can't find a shorter tour. This leads to what is usually called a hybrid GA because it's a GA hybridized with a local search; sometimes also called a Memetic Algorithm. These sorts of algorithms usually outperform a black-box GA because you're giving the algorithm "hints" to bias it towards trying things you expect to be good.
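One common way to implement this kind of break-and-reconnect local search is the 2-opt move, sketched below as an assumption about how it could look rather than as the only option. Note that 2-opt reverses the segment between the two broken edges, so for edges [B,C] and [F,G] it produces [A,B,F,E,D,C,G,H], which differs slightly from the swap shown in the text. The `dist` lambda and all method names are hypothetical.

```ruby
# Length of an open tour under a caller-supplied distance lambda.
def path_length(tour, dist)
  tour.each_cons(2).sum { |a, b| dist.call(a, b) }
end

# One 2-opt move: reverse tour[i..j]. This replaces the edges
# (tour[i-1], tour[i]) and (tour[j], tour[j+1]) with
# (tour[i-1], tour[j]) and (tour[i], tour[j+1]).
def two_opt_move(tour, i, j)
  tour[0...i] + tour[i..j].reverse + tour[(j + 1)..-1]
end

# Keep applying improving moves until no 2-opt move shortens the tour.
# The first and last cities stay fixed in this sketch.
def two_opt_search(tour, dist)
  best = tour
  improved = true
  while improved
    improved = false
    (1...(best.size - 1)).each do |i|
      ((i + 1)...(best.size - 1)).each do |j|
        candidate = two_opt_move(best, i, j)
        next unless path_length(candidate, dist) < path_length(best, dist)

        best = candidate
        improved = true
      end
    end
  end
  best
end

# Toy usage: cities on a number line, distance is the absolute difference.
line_dist = ->(a, b) { (a - b).abs }
p two_opt_search([0, 3, 2, 1, 4], line_dist)
```

Used as a mutation operator (or run on each offspring), this is what turns a plain GA into the hybrid described above.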
I think this idea of a memetic algorithm is pretty close to what you were hitting on in your original question of wondering how to deal with the fact that the contribution to fitness from a particular job depends on where the other jobs are in the schedule. The only stumbling block there is that you were a bit unlucky in that the somewhat reasonable idea of thinking of this as "dynamic" leads you a bit astray, as "dynamic" actually means something entirely different here.
So to wrap up, there's nothing "dynamic" about your problem, so the things people do with GAs for dynamic problems will be entirely unhelpful. A standard GA will work with no fancy tricks. However, the idea of using information you have about what schedules work better can be introduced into the genetic operators, and will probably result in a significantly better overall algorithm.

You'd use a GA to find, say, the best order in which to do a number of jobs, or the set of jobs that makes the best use of a day's resources. So yes, they'd be related to each other.
So your fitness measure would be computed for a whole sequence, such as 1,3,4,5,6,2.
Look at, say, a shortest-path algorithm; it starts to make sense then.

Related

How to find neighboring solutions in simulated annealing?

I'm working on an optimization problem and attempting to use simulated annealing as a heuristic. My goal is to optimize placement of k objects given some cost function. Solutions take the form of a set of k ordered pairs representing points in an M*N grid. I'm not sure how to best find a neighboring solution given a current solution. I've considered shifting each point by 1 or 0 units in a random direction. What might be a good approach to finding a neighboring solution given a current set of points?
Since I'm also trying to learn more about SA, what makes a good neighbor-finding algorithm and how close to the current solution should the neighbor be? Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
I would split your question into several smaller ones:
Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
Usually, you pick multiple points from a neighborhood, and you can explore all of them. For example, you generate 10 points randomly and choose the best one. By doing so you can efficiently explore more possible solutions.
Why is it better than a random guess? Good solutions tend to have a lot in common (e.g. they are close to each other in the search space). So by introducing small incremental changes, you should be able to find a good solution, while a random guess could send you to a completely different part of the search space, and you may never find an appropriate solution. And because of the curse of dimensionality, random jumps are no better than brute force: there are too many places to jump to.
What might be a good approach to finding a neighboring solution given a current set of points?
I regret to tell you that this question seems to be unsolvable in general. :( It's a mix of art and science. Choosing the right way to explore a search space is too problem-specific. Even for a placement problem under varying constraints, different heuristics may lead to completely different results.
You can try the following:
- Random shifts by a fixed number of steps (1, 2, ...). That's your approach.
- Swapping two points.
- You can memorize bad moves for some time (something similar to tabu search), so you will use only 'good' ones for the next 100 steps.
- Use a greedy approach to generate a suboptimal placement, then improve it with the methods above.
- Try random restarts. At some stage, drop all of your progress so far (except for the best solution so far), raise the temperature and start again from a random initial point. You can do this every 10000 steps or something similar.
- Fix some points. Put an object at point (x,y) and do not move it at all; try searching for the best possible solution under this constraint.
- Prohibit some combinations of objects, e.g. "distance between p1 and p2 must be larger than D".
- Mix all the steps above in different ways.
- Try to understand your problem in all its tiniest details. You can derive some useful information/constraints/insights from your problem description. Assume that you can't solve the placement problem in general, so try to reduce it to a more specific (== simpler, == with smaller search space) problem.
I would say that the last bullet is the most important. Look closely at your problem and consider only its practical aspects. For example, the size of your problems might allow you to enumerate something, or maybe some placements are not possible for you, and so on. There is no way for SA to derive such domain-specific knowledge by itself, so help it!
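As a rough sketch of the first idea in the list (a random shift by one step), here is one possible neighbour function together with a bare-bones annealing loop. The assumptions are mine: points are `[row, col]` pairs, two objects may not share a cell, the cost function is supplied by the caller as a lambda, and the cooling schedule is arbitrary.

```ruby
# Move one randomly chosen point by one cell in a random direction.
# The move is clamped to the rows x cols grid and skipped if the target cell is taken.
def neighbor(points, rows, cols, rng)
  moves = [[1, 0], [-1, 0], [0, 1], [0, -1]]
  candidate = points.dup
  idx = rng.rand(points.size)
  dr, dc = moves[rng.rand(moves.size)]
  r = (points[idx][0] + dr).clamp(0, rows - 1)
  c = (points[idx][1] + dc).clamp(0, cols - 1)
  candidate[idx] = [r, c] unless points.include?([r, c])
  candidate
end

# Metropolis rule: always accept improvements, sometimes accept worse solutions.
def accept?(current_cost, new_cost, temperature, rng)
  new_cost <= current_cost ||
    rng.rand < Math.exp((current_cost - new_cost) / temperature.to_f)
end

def anneal(points, rows, cols, cost, steps: 2000, start_temp: 5.0, cooling: 0.995, rng: Random.new(1))
  current = points
  best = points
  temp = start_temp
  steps.times do
    candidate = neighbor(current, rows, cols, rng)
    current = candidate if accept?(cost.call(current), cost.call(candidate), temp, rng)
    best = current if cost.call(current) < cost.call(best)
    temp *= cooling
  end
  best
end
```

Because each neighbour differs from the current solution in a single point, most of what made the current solution good is preserved, which is the advantage over drawing a fresh random solution.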
How to understand that your heuristic is a good one? Only by practical evaluation. Prepare a decent set of tests with obvious/well-known answers and try different approaches. Use well-known benchmarks if there are any of them.
I hope that this is helpful. :)

GA Chromosome Representation with bits of different importance

In a genetic algorithm, is it OK to encode the chromosome in such a way that some bits have more importance than other bits in the same chromosome? For example, the bits at even positions (index%2==0: 2,4,6,..) are more important than the bits at odd positions (index%2!=0: 1,3,5,..). For example, if bit 2 has a value in the range [1,5], we consider the value of bit 3, and if bit 2 has value 0, the value of bit 3 has no effect.
For example, if the problem is that we have multiple courses to be offered by a school and we want to know which course should be offered in the next semester and which should not, and if a course should be offered who should teach that course and when he/she should teach it. So one way to represent the problem is to use a vector of length 2n, where n is the number of courses. Each course is represented by a 2-tuple (who,when), where when is when the course should be taught and who is who should teach it. The tuple in the i-th position holds assignment for the i-th course. Now the possible values for who are the ids of the teachers [1-10], and the possible values for when are all possible times plus 0, where 0 means at no time which means the course should not be offered.
Now is it OK to have two different tuples with the same fitness? For instance, (3,0) and (2,0) are different values for the i-th course but they mean the same thing: this course should not be offered, since we don't care about who if when=0. Or should I add 0 to who, so that 0 means taught by no one, and a tuple means that the corresponding course should not be offered if and only if its value is (0,0)? But what about (0,v) and (v,0), where v>0? Should I consider these to mean that the course should not be offered? I need help with this please.
I'm not sure I fully understand your question but I'll try to answer as best I can.
When using genetic algorithms to solve problems you can have a lot of flexibility in how it's encoded. Broadly, there are two places where certain bits can have more prominence: In the fitness function or in the implementation of the algorithms (namely selection, crossover and mutation). If you want to change the prominence of certain bits in the fitness function I'd go ahead. This would encourage the behaviour you want and generally lead towards a solution where certain bits are more prominent.
I've worked with a lot of genetic algorithms where the fitness function gives some bits (or groupings of bits) more weight than others. It's fairly standard.
I'd be a lot more careful when making certain bits more prominent than others in the genetic algorithm implementation. I've worked with algorithms that only allow certain bits to mutate, or that can only crossover at certain points. Sometimes they can work well (sometimes they're necessary given the problem) but for the most part they're a lot harder to get right, and more prone to problems like premature convergence.
EDIT:
In answer to the second part of your question, and your comments:
The best way to deal with situations where a course should not be offered is probably in the fitness function. Simply give a low score (or no score) to these. The same applies to course duplicates in a chromosome. In theory, this should discourage them from becoming a prevalent part of your population. Alternatively, you could apply a form of "culling" every generation, which completely removes chromosomes that are not viable from the population. You can probably mix the two by completely excluding chromosomes with no fitness score from selection.
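To make the "handle it in the fitness function" idea concrete, here is a small sketch using the (who, when) encoding from the question. Decoding every tuple with when=0 to "not offered" means that (3,0) and (2,0) are evaluated identically. The scoring itself (one point per offered course, minus a penalty for a teacher booked twice in the same slot) is only a placeholder I made up; the real objective is not given in the question.

```ruby
# Decode a flat chromosome [who1, when1, who2, when2, ...] into one entry per course.
# Any tuple with when == 0 decodes to nil, meaning "not offered".
def decode(chromosome)
  chromosome.each_slice(2).map do |who, slot|
    slot.zero? ? nil : { who: who, slot: slot }
  end
end

# Number of extra courses assigned to a teacher who is already busy in that slot.
def teacher_clashes(schedule)
  schedule.compact
          .group_by { |assignment| [assignment[:who], assignment[:slot]] }
          .values
          .sum { |same| same.size - 1 }
end

# Placeholder fitness: reward offered courses, penalise clashes heavily.
def course_fitness(chromosome)
  schedule = decode(chromosome)
  schedule.compact.size - 10 * teacher_clashes(schedule)
end

p decode([3, 0, 2, 5])
p course_fitness([3, 0, 1, 2]) == course_fitness([2, 0, 1, 2])
```

With this decoding, redundant genotypes are harmless: they just map to the same phenotype and therefore the same fitness.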
From what you've said about the problem it sounds like having non-viable chromosomes is probably going to be common. This doesn't have to be a problem. If your fitness function is encoded well, and you use the correct selection and crossover methods it shouldn't be an issue. As long as the more viable solutions are fitter you should be able to evolve a good solution.
In some cases it's a good idea to stop crossover at certain points in the chromosomes. It sounds like this might be the case, but again, without knowing more about your implementation it's hard to say.
I can't really give a more detailed answer without knowing more about how you plan to implement the algorithm. I'm not really familiar with the problem either. It's not something I've ever done. If you add a bit more detail on how you plan to encode the problem and fitness function I may be able to give more specific advice.

How genetic algorithm is different from random selection and evaluation for fittest?

I have been learning about genetic algorithms for 2 months. I know about the process of initial population creation, selection, crossover, mutation, etc. But I could not understand how we are able to get better results in each generation and how it's different from a random search for the best solution. Below I use an example to explain my problem.
Let's take the example of the travelling salesman problem. Let's say we have several cities X1, X2, ..., X18 and we have to find the shortest path that visits them. When we do the crossover after selecting the fittest individuals, how do we know that after crossover we will get a better chromosome? The same applies to mutation.
I feel like one could just take one arrangement of cities, calculate the distance needed to travel them, and store the distance and the arrangement. Then choose another arrangement/combination. If it is better than the previous arrangement, save the current arrangement/combination and distance; otherwise discard the current arrangement. By doing this we will also get some solution.
I just want to know where the point is at which a genetic algorithm differs from random selection. In a genetic algorithm, is there any criterion that prevents us from selecting an arrangement/combination of cities which we have already evaluated?
I am not sure if my question is clear, but I am happy to explain more. Please let me know if it is not clear.
A random algorithm starts with a completely blank sheet every time. A new random solution is generated each iteration, with no memory of what happened before during the previous iterations.
A genetic algorithm has a history, so it does not start with a blank sheet, except at the very beginning. Each generation the best of the solution population are selected, mutated in some way, and advanced to the next generation. The least good members of the population are dropped.
Genetic algorithms build on previous success, so they are able to advance faster than random algorithms. A classic example of a very simple genetic algorithm is the Weasel program. It finds its target far more quickly than random chance because each generation it starts with a partial solution, and over time those initial partial solutions are closer to the required solution.
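A compact sketch of a Weasel-style program is below. The parameters (100 offspring per generation, 5% mutation rate per character) are common choices, not canonical ones, and keeping the parent among the candidates is my own simplification so that the score never goes down.

```ruby
WEASEL_TARGET = "METHINKS IT IS LIKE A WEASEL"
WEASEL_ALPHABET = (("A".."Z").to_a + [" "]).freeze

# Fitness: number of characters that already match the target.
def weasel_score(candidate)
  candidate.chars.zip(WEASEL_TARGET.chars).count { |a, b| a == b }
end

# Copy the parent, replacing each character with a random one with probability `rate`.
def weasel_mutate(parent, rate, rng)
  parent.chars.map do |ch|
    rng.rand < rate ? WEASEL_ALPHABET[rng.rand(WEASEL_ALPHABET.size)] : ch
  end.join
end

# Returns the number of generations needed to reach the target.
def weasel(rng: Random.new(42), offspring: 100, rate: 0.05)
  current = Array.new(WEASEL_TARGET.size) { WEASEL_ALPHABET[rng.rand(WEASEL_ALPHABET.size)] }.join
  generations = 0
  until current == WEASEL_TARGET
    children = Array.new(offspring) { weasel_mutate(current, rate, rng) }
    # Each generation starts from the best string found so far, not from scratch.
    current = (children << current).max_by { |c| weasel_score(c) }
    generations += 1
  end
  generations
end

puts "Generations needed: #{weasel}"
```

For comparison, a pure random search would have to hit one specific string out of 27^28 possibilities, each guess being independent of all earlier ones.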
I think there are two things you are asking about: a mathematical proof that a GA works, and an empirical one that would allay your concerns.
Although I am not aware of a general proof, I am quite sure at least a good sketch of a proof was given by John Holland in his book Adaptation in Natural and Artificial Systems for optimization problems using binary coding. It is known as Holland's schema theorem. But a GA is a heuristic, so technically it does not have to come with a proof. The theorem basically says that short schemata in the genotype that raise the average fitness appear exponentially more often in successive generations. Crossover then combines them. I think the proof was given only for binary coding and got some criticism as well.
Regarding your concerns: of course you have no guarantee that a crossover will produce a better result, just as two intelligent or beautiful parents might have ugly, stupid children. The premise of a GA is that this is less likely to happen. (As I understand it) the proof for binary coding hinges on the theorem, which says that good partial patterns will start emerging and, given that the genotype is long enough, such patterns residing in different specimens have a chance to be combined into one, improving its fitness in general.
I think it is fairly easy to understand in terms of TSP. Crossover helps to accumulate good sub-paths in one specimen. Of course it all depends on the choice of the crossover method.
Also, a GA's path towards the solution is not purely random. It moves in a certain direction, with stochastic mechanisms to escape traps. You can lose the best solutions if you allow it. It works because it wants to move towards the current best solutions, but you have a population of specimens and they kind of share knowledge. They are all similar, but given that you preserve diversity, new and better partial patterns can be introduced to the whole population and get incorporated into the best solutions. This is why diversity in the population is regarded as very important.
As a final note, please remember that GA is a very broad topic and you can modify the base in nearly every way you want. You can introduce elitism, taboos, niches, etc. There is no one-and-only approach/implementation.

How to mix genetic algorithm with some heuristic

I'm working on a university scheduling problem and using a simple genetic algorithm for it. Actually it works great and improves the objective function value from 0% to about 90% in 1 hour. But then the process slows down dramatically and it takes days to get the best solution. I have seen a lot of papers saying that it is reasonable to mix other algorithms with a genetic one. Could you please give me some advice on which algorithm can be mixed with a genetic one and how it can be applied to speed up the solving process? The main question is: how can any heuristic be applied to such a complex-structured problem? I have no idea how, for instance, greedy heuristics could be applied there.
Thanks to everyone in advance! Really appreciate your help!
Problem description:
I have:
- an array filled with ScheduleSlot objects
- an array filled with Lesson objects
I do:
- Standard two-point crossover
- Mutation (move a random lesson to a random position)
- Rough selection (select only the n best individuals for the next population)
Additional information for #Dougal and #izomorphius:
I'm trying to construct a university schedule which will have no breaks between lessons, no overlaps, and no geographically distributed lessons for groups and professors.
The fitness function is really simple: fitness = -1000*numberOfOverlaps - 1000*numberOfDistrebutedLessons - 20*numberOfBreaks (or something like that; we can simply change the coefficients in front of the variables).
At the very beginning I generate my individuals by just placing lessons in a random room, time and day.
Mutation and crossover, as described above, are really trivial:
- Crossover - take two parent schedules, randomly choose the point and the range of crossover and just exchange the parts of the parent schedules, generating two child schedules.
- Mutation - take a child schedule and move n random lessons to random positions.
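For reference, the fitness function described above could look roughly like this in Ruby. The `Lesson` fields (`group`, `day`, `slot`, `building`) are my guess at a minimal data model, not the asker's actual classes, and only groups are checked here (professors would be handled the same way).

```ruby
Lesson = Struct.new(:group, :day, :slot, :building)

# Coefficients from the question; they are the obvious thing to tune.
TIMETABLE_WEIGHTS = { overlaps: -1000, distributed: -1000, breaks: -20 }.freeze

# One list of lessons per (group, day), ordered by time slot.
def day_plans(lessons)
  lessons.group_by { |l| [l.group, l.day] }.values.map { |plan| plan.sort_by(&:slot) }
end

# Lessons that share a slot with another lesson of the same group.
def count_overlaps(lessons)
  day_plans(lessons).sum { |plan| plan.size - plan.map(&:slot).uniq.size }
end

# Empty slots between a group's first and last lesson of the day.
def count_breaks(lessons)
  day_plans(lessons).sum do |plan|
    slots = plan.map(&:slot).uniq
    (slots.last - slots.first + 1) - slots.size
  end
end

# Consecutive lessons of a group that take place in different buildings.
def count_distributed(lessons)
  day_plans(lessons).sum do |plan|
    plan.each_cons(2).count { |a, b| a.building != b.building }
  end
end

def timetable_fitness(lessons, weights = TIMETABLE_WEIGHTS)
  weights[:overlaps] * count_overlaps(lessons) +
    weights[:distributed] * count_distributed(lessons) +
    weights[:breaks] * count_breaks(lessons)
end
```

Keeping the three counters separate makes it easy to see which kind of violation dominates once the GA stalls around the 90% mark.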
My initial observation: you have chosen the coefficients in front of numberOfOverlaps, numberOfDistrebutedLessons and numberOfBreaks somewhat randomly. My experience shows that such choices are usually not the best ones and you had better let the computer choose them. I propose writing a second algorithm to choose them; it could be a neural network, a second genetic algorithm or hill climbing. The idea is: compute how good a result you get after a certain amount of time and try to optimize the choice of these 3 values.
Another idea: after getting the result you may try to brute-force optimize it. What I mean is the following: for the initial problem the "silly" solution would be a backtracking search that checks all the possibilities, and this is usually done using DFS. Now this would be very slow, but you may try using depth-first search with iterative deepening or simply a depth-restricted DFS.
For many problems, I find that a Lamarckian-style of GA works well, combining a local search into the GA algorithm.
For your case, I would try to introduce a partial systematic search as the local search. There are two obvious ways to do this, and you should probably try both.
1. Alternate GA iterations with local search iterations. For your local search you could, for example, brute force all the lessons assigned in a single day while leaving everything else unchanged. Another possibility is to move a randomly selected lesson to all free slots to find the best choice for that. The key is to minimise the cost of the brute-search while still having the chance to find local improvements.
2. Add a new operator alongside mutation and crossover that performs your local search. (You might find that the mutation operator is less useful in the hybrid scheme, so just replacing that could be viable.)
In essence, you will be combining the global exploration of the GA with an efficient local search. Several GA frameworks include features to assist in this combination. For example, GAUL implements the alternate scheme 1 above, with either the full population or just the new offspring at each iteration.
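The "move a randomly selected lesson to all free slots" variant might be sketched as follows. I am assuming a simplified representation in which `schedule[i]` is the position of lesson i and `positions` lists every possible position; the fitness is passed in as a lambda, and none of these names come from GAUL or any other framework.

```ruby
# Try lesson `index` in every free position and return the best resulting schedule.
# The unchanged schedule is one of the candidates, so the result is never worse.
def improve_lesson(schedule, index, positions, fitness)
  free = positions - schedule
  candidates = free.map { |pos| schedule.dup.tap { |s| s[index] = pos } }
  (candidates << schedule).max_by { |s| fitness.call(s) }
end

# Local search operator: apply the improvement to one randomly chosen lesson.
def local_search_operator(schedule, positions, fitness, rng = Random.new)
  improve_lesson(schedule, rng.rand(schedule.size), positions, fitness)
end

# Toy usage: positions are slots 0..9 and fitter schedules are more compact.
compactness = ->(s) { -(s.max - s.min) }
p improve_lesson([0, 5, 1], 1, (0..9).to_a, compactness)
```

Each call costs one fitness evaluation per free position, so it is cheap enough to run on every offspring or every few generations.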

concrete examples of heuristics

What are concrete examples (e.g. Alpha-beta pruning, example:tic-tac-toe and how is it applicable there) of heuristics. I already saw an answered question about what heuristics is but I still don't get the thing where it uses estimation. Can you give me a concrete example of a heuristic and how it works?
Warnsdorff's rule is a heuristic, but the A* search algorithm isn't. It is, as its name implies, a search algorithm, which is not problem-dependent. The heuristic is. An example: you can use A* (if correctly implemented) to solve the Fifteen puzzle and to find the shortest way out of a maze, but the heuristics used will be different. With the Fifteen puzzle your heuristic could be how many tiles are out of place: the number of moves needed to solve the puzzle will always be greater than or equal to the heuristic.
To get out of the maze you could use the Manhattan Distance to a point you know is outside of the maze as your heuristic. Manhattan Distance is widely used in game-like problems as it is the number of "steps" in horizontal and in vertical needed to get to the goal.
Manhattan distance = abs(x2-x1) + abs(y2-y1)
It's easy to see that in the best case (there are no walls) that will be the exact distance to the goal; in all other cases you will need more. This is important: your heuristic must be optimistic (an admissible heuristic) so that your search algorithm is optimal. If the search never re-expands states it has already visited, the heuristic should also be consistent. However, in some applications (such as games with very big maps) you use non-admissible heuristics because a suboptimal solution suffices.
A heuristic is just an approximation to the real cost, (always lower than the real cost if admissible). The better the approximation, the fewer states the search algorithm will have to explore. But better approximations usually mean more computing time, so you have to find a compromise solution.
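Here is a small A* sketch on a grid maze that uses the Manhattan distance as its heuristic. The maze format (an array of strings, `#` for walls) and the method names are my own assumptions, and the open list is a plain array for brevity where a priority queue would normally be used.

```ruby
def manhattan(a, b)
  (a[0] - b[0]).abs + (a[1] - b[1]).abs
end

# Returns the number of steps on a shortest path from start to goal, or nil if there is none.
# Cells are [row, col]; moves are up, down, left and right.
def a_star(grid, start, goal)
  steps = { start => 0 } # cheapest known cost to reach each cell
  open = [start]
  closed = {}
  until open.empty?
    # Expand the cell with the lowest cost so far plus heuristic estimate.
    current = open.min_by { |cell| steps[cell] + manhattan(cell, goal) }
    return steps[current] if current == goal

    open.delete(current)
    closed[current] = true
    [[1, 0], [-1, 0], [0, 1], [0, -1]].each do |dr, dc|
      r = current[0] + dr
      c = current[1] + dc
      next if r.negative? || c.negative? || r >= grid.size || c >= grid[r].size
      next if grid[r][c] == "#" || closed[[r, c]]

      cost = steps[current] + 1
      if steps[[r, c]].nil? || cost < steps[[r, c]]
        steps[[r, c]] = cost
        open << [r, c] unless open.include?([r, c])
      end
    end
  end
  nil
end

maze = ["..#.",
        "..#.",
        "...."]
p manhattan([0, 0], [0, 3]) # optimistic estimate that ignores the wall
p a_star(maze, [0, 0], [0, 3]) # real number of steps around the wall
```

The heuristic never overestimates here, because every move changes the Manhattan distance by exactly one, which is why the result is still a shortest path.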
Most demonstrative is the usage of heuristics in informed search algorithms, such as A-Star. For realistic problems you usually have large search space, making it infeasible to check every single part of it. To avoid this, i.e. to try the most promising parts of the search space first, you use a heuristic. A heuristic gives you an estimate of how good the available subsequent search steps are. You will choose the most promising next step, i.e. best-first. For example if you'd like to search the path between two cities (i.e. vertices, connected by a set of roads, i.e. edges, that form a graph) you may want to choose the straight-line distance to the goal as a heuristic to determine which city to visit first (and see if it's the target city).
Heuristics should have similar properties as metrics for the search space and they usually should be optimistic, but that's another story. The problem of providing a heuristic that works out to be effective and that is side-effect free is yet another problem...
For an application of different heuristics being used to find the path through a given maze also have a look at this answer.
Your question interests me, as I heard about heuristics during my studies too but never saw an application for them. I googled a bit and found this: http://www.predictia.es/blog/aco-search
This code simulates an "ant colony optimization" algorithm to search through a website.
The "ants" are workers which will search through the site; some will search randomly, others will follow the "best path" determined by the previous ones.
A concrete example: I've been writing a solver for the game JT's Block, which is roughly equivalent to the Same Game. The algorithm performs a breadth-first search on all possible hits, stores the values, and proceeds to the next ply. The problem is that the number of possible hits quickly grows out of control (10e30 estimated positions per game), so I need to prune the list of positions at each turn and only take the "best" of them.
Now, the definition of the "best" positions is quite fuzzy: they are the positions that are expected to lead to the best final scores, but nothing is sure. And here come the heuristics. I've tried a few of them:
- sort positions by score obtained so far
- increase score by best score obtained with an x-depth search
- increase score based on a complex formula using the number of tiles, their color and their proximity
- improve the last heuristic by tweaking its parameters and seeing how they perform
etc...
The last of these heuristics could have led to an ant-march optimization: there's half a dozen parameters that can be tweaked from 0 to 1, and an optimizer could find the optimal combination of these. For the moment I've just manually improved some of them.
The second of these heuristics is interesting: it could lead to the optimal score through a full depth-first search, but such a goal is impossible of course because it would take too much time. In general, increasing x leads to a better heuristic, but increases the computing time a lot.
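Breadth-first search with this kind of per-ply pruning is commonly called beam search. A generic sketch, with the game-specific parts passed in as lambdas (`expand` and `heuristic` are placeholders I made up, since the actual solver is not shown):

```ruby
# Breadth-first search that keeps only the `width` most promising positions per ply.
# expand:    lambda returning the positions reachable from a position in one hit
# heuristic: lambda estimating how promising a position is (higher is better)
def beam_search(start, width:, depth:, expand:, heuristic:)
  frontier = [start]
  depth.times do
    children = frontier.flat_map { |pos| expand.call(pos) }
    break if children.empty?

    # Pruning step: the heuristic decides which positions survive to the next ply.
    frontier = children.max_by(width) { |pos| heuristic.call(pos) }
  end
  frontier.max_by { |pos| heuristic.call(pos) }
end

# Toy usage: a "position" is the list of moves played so far, each move is worth 1 to 3 points.
expand = ->(moves) { [1, 2, 3].map { |m| moves + [m] } }
score = ->(moves) { moves.sum }
p beam_search([], width: 2, depth: 3, expand: expand, heuristic: score)
```

A wider beam explores more positions per ply at a higher cost, which is the same accuracy-versus-time trade-off as increasing the lookahead depth of the heuristic.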
So here they are, some examples of heuristics. Anything can be a heuristic as long as it helps your algorithm perform better, and that's what makes them so hard to grasp: they're not deterministic. Another point about heuristics: they're supposed to give quick and dirty approximations of the real thing, so there's a trade-off between their execution time and their accuracy.
A couple of concrete examples: for solving the Knight's Tour problem, one can use Warnsdorff's rule, a heuristic. Or for solving the Fifteen puzzle, a possible heuristic is the A* search algorithm.
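Warnsdorff's rule is short enough to show in full: always jump to the unvisited square from which the knight has the fewest onward moves. The sketch below is a plain greedy version without backtracking or any particular tie-breaking, so it is not guaranteed to complete a tour from every start square; it simply stops when it gets stuck.

```ruby
KNIGHT_MOVES = [[1, 2], [2, 1], [2, -1], [1, -2],
                [-1, -2], [-2, -1], [-2, 1], [-1, 2]].freeze

# Unvisited squares a knight can reach from `square` on a size x size board.
def free_neighbours(square, visited, size)
  KNIGHT_MOVES
    .map { |dr, dc| [square[0] + dr, square[1] + dc] }
    .select { |r, c| r.between?(0, size - 1) && c.between?(0, size - 1) && !visited[[r, c]] }
end

# Greedy tour: the heuristic is "fewest onward moves first".
def knights_tour(start, size = 8)
  path = [start]
  visited = { start => true }
  loop do
    options = free_neighbours(path.last, visited, size)
    break if options.empty?

    nxt = options.min_by { |sq| free_neighbours(sq, visited, size).size }
    visited[nxt] = true
    path << nxt
  end
  path
end

tour = knights_tour([0, 0])
puts "Visited #{tour.size} of 64 squares"
```

The rule uses no knowledge of the final answer, only a cheap local estimate of which squares are about to become unreachable.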
The original question asked for concrete examples for heuristics.
Some of these concrete examples were already given. Another one would be the number of misplaced tiles in the 15-puzzle or its improvement, the Manhattan distance, based on the misplaced tiles.
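Both 15-puzzle heuristics fit in a few lines. In this sketch the board is a flat array of 16 numbers read row by row, with 0 for the blank and the goal state 1..15 followed by the blank; that layout is my assumption.

```ruby
PUZZLE_GOAL = ((1..15).to_a + [0]).freeze

# Heuristic 1: how many tiles (ignoring the blank) are not on their goal square.
def misplaced_tiles(board)
  board.each_with_index.count { |tile, i| tile != 0 && tile != PUZZLE_GOAL[i] }
end

# Heuristic 2: for every tile, rows plus columns between its square and its goal square.
def manhattan_sum(board)
  board.each_with_index.sum do |tile, i|
    next 0 if tile.zero?

    goal = PUZZLE_GOAL.index(tile)
    (i / 4 - goal / 4).abs + (i % 4 - goal % 4).abs
  end
end

# Tile 1 and the blank have swapped places: one misplaced tile, six moves away.
board = [0] + (2..15).to_a + [1]
p misplaced_tiles(board)
p manhattan_sum(board)
```

Each misplaced tile contributes at least 1 to the Manhattan sum, so the Manhattan heuristic is never smaller than the misplaced-tiles count while still never overestimating the number of moves.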
One of the previous answers also claimed that heuristics are always problem-dependent, whereas algorithms are problem-independent. While there are, of course, also problem-dependent algorithms (for instance, for every problem you can just give an algorithm that immediately solves that very problem, e.g. the optimal strategy for any tower-of-hanoi problem is known) there are also problem-independent heuristics!
Consequently, there are also different kinds of problem-independent heuristics. Thus, in a certain way, every such heuristic can be regarded as a concrete heuristic example while not being tailored to a specific problem like the 15-puzzle. (Examples of problem-independent heuristics taken from planning are the FF heuristic or the Add heuristic.)
These problem-independent heuristics are based on a general description language and perform a problem relaxation. That is, the problem relaxation is based only on the syntax (and, of course, its underlying semantics) of the problem description without "knowing" what it represents. If you are interested in this, you should get familiar with "planning" and, more specifically, with "planning as heuristic search". I also want to mention that these heuristics, while being problem-independent, are of course dependent on the problem description language. (E.g., the heuristics mentioned above are specific to "planning problems", and even for planning there are various sub-problem classes with differing kinds of heuristics.)

Resources