GA Chromosome Representation with bits of different importance

In a genetic algorithm, is it OK to encode the chromosome in such a way that some bits have more importance than other bits in the same chromosome? For example, the bits at even indices (2, 4, 6, ...) are more important than the bits at odd indices (1, 3, 5, ...): if bit 2 has a value in the range [1, 5] we consider the value of bit 3, but if bit 2 has the value 0, the value of bit 3 has no effect.
For example, suppose the problem is that a school has multiple courses on offer, and we want to know which courses should be offered next semester and which should not, and, for each course that is offered, who should teach it and when. One way to represent the problem is a vector of length 2n, where n is the number of courses. Each course is represented by a 2-tuple (who, when), where when is the time at which the course should be taught and who is the teacher; the tuple in the i-th position holds the assignment for the i-th course. The possible values for who are the teacher ids [1-10], and the possible values for when are all possible times plus 0, where 0 means "at no time", i.e. the course should not be offered.
Now, is it OK to have two different tuples with the same fitness? For instance, (3,0) and (2,0) are different values for the i-th course, but they mean the same thing: the course should not be offered, since we don't care about who when when=0. Or should I add 0 to the domain of who, so that 0 means "taught by no one", and a tuple means the corresponding course should not be offered if and only if its value is (0,0)? But then what about (0,v) and (v,0), where v>0? Should I consider these to mean that the course should not be offered? I need help with this, please.

I'm not sure I fully understand your question but I'll try to answer as best I can.
When using genetic algorithms to solve problems, you have a lot of flexibility in how the problem is encoded. Broadly, there are two places where certain bits can be given more prominence: in the fitness function, or in the implementation of the algorithm itself (namely selection, crossover and mutation). If you want to give certain bits more prominence in the fitness function, I'd go ahead. This will encourage the behaviour you want and generally lead towards a solution where certain bits are more prominent.
I've worked with a lot of genetic algorithms where the fitness function gives some bits (or groupings of bits) more weight than others. It's fairly standard.
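For instance, a fitness function for the course problem above might weight the "when" half of each tuple more heavily than the "who" half. A minimal sketch in Python, where teacher_score, time_score and the weight are hypothetical stand-ins for problem-specific logic:

def fitness(chromosome, teacher_score, time_score, when_weight=2.0):
    # chromosome is a list of (who, when) tuples, one per course.
    # teacher_score(course, who) and time_score(course, when) are
    # problem-specific scoring functions; the weight here is made up.
    total = 0.0
    for course, (who, when) in enumerate(chromosome):
        if when == 0:  # course not offered: 'who' contributes nothing
            continue
        total += teacher_score(course, who)
        total += when_weight * time_score(course, when)
    return total

Note that this also makes (3,0) and (2,0) equally fit, which resolves the concern about equivalent tuples in the original question.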
I'd be a lot more careful when making certain bits more prominent than others in the genetic algorithm implementation. I've worked with algorithms that only allow certain bits to mutate, or that can only crossover at certain points. Sometimes they can work well (sometimes they're necessary given the problem) but for the most part they're a lot harder to get right, and more prone to problems like premature convergence.
EDIT:
In answer to the second part of your question, and your comments:
The best way to deal with situations where a course should not be offered is probably in the fitness function: simply give these a low score (or no score). The same applies to course duplicates in a chromosome. In theory, this should discourage them from becoming a prevalent part of your population. Alternatively, you could apply a form of "culling" every generation, which completely removes chromosomes that are not viable from the population. You can probably mix the two by completely excluding chromosomes with no fitness score from selection.
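As a sketch of mixing the two ideas (is_valid and raw_fitness are hypothetical placeholders for your own problem logic):

import random

def score(chromosome):
    # invalid chromosomes get no score at all
    return raw_fitness(chromosome) if is_valid(chromosome) else 0.0

def select_parent(population):
    viable = [c for c in population if score(c) > 0.0]  # "culling"
    pool = viable or population  # fall back if everything is invalid
    weights = [score(c) for c in pool]
    if sum(weights) == 0:
        return random.choice(pool)
    # fitness-proportionate (roulette-wheel) selection over the viable pool
    return random.choices(pool, weights=weights, k=1)[0]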
From what you've said about the problem it sounds like having non-viable chromosomes is probably going to be common. This doesn't have to be a problem. If your fitness function is encoded well, and you use the correct selection and crossover methods it shouldn't be an issue. As long as the more viable solutions are fitter you should be able to evolve a good solution.
In some cases it's a good idea to stop crossover at certain points in the chromosomes. It sounds like this might be the case, but again, without knowing more about your implementation it's hard to say.
I can't really give a more detailed answer without knowing more about how you plan to implement the algorithm. I'm not really familiar with the problem either; it's not something I've ever done. If you add a bit more detail on how you plan to encode the problem and the fitness function, I may be able to give more specific advice.

Related

How to find neighboring solutions in simulated annealing?

I'm working on an optimization problem and attempting to use simulated annealing as a heuristic. My goal is to optimize placement of k objects given some cost function. Solutions take the form of a set of k ordered pairs representing points in an M*N grid. I'm not sure how to best find a neighboring solution given a current solution. I've considered shifting each point by 1 or 0 units in a random direction. What might be a good approach to finding a neighboring solution given a current set of points?
Since I'm also trying to learn more about SA, what makes a good neighbor-finding algorithm and how close to the current solution should the neighbor be? Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
I would split your question into several smaller ones:
Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
Usually, you pick multiple points from a neighborhood, and you can explore all of them. For example, you generate 10 points randomly and choose the best one. By doing so you can efficiently explore more possible solutions.
Why is it better than a random guess? Good solutions tend to have a lot in common (e.g. they are close to each other in the search space). So by introducing small incremental changes you will be able to find a good solution, while a random guess could send you to a completely different part of the search space, where you may never find an appropriate solution. And because of the curse of dimensionality, random jumps are no better than brute force - there will be too many places to jump to.
What might be a good approach to finding a neighboring solution given a current set of points?
I regret to tell you that this question seems to be unsolvable in general. :( It's a mix of art and science. Choosing the right way to explore a search space is too problem-specific. Even when solving a placement problem under varying constraints, different heuristics may lead to completely different results.
You can try the following (a sketch of the first two moves appears after this list):
Random shifts by a fixed number of steps (1, 2, ...). That's your approach.
Swapping two points
You can memorize bad moves for some time (something similar to tabu search), so that you only use "good" ones for the next 100 steps.
Use a greedy approach to generate a suboptimal placement, then improve it with methods above.
Try random restarts. At some stage, drop all of your progress so far (except for the best solution found), raise the temperature and start again from a random initial point. You can do this every 10000 steps or so.
Fix some points. Put an object at point (x,y) and do not move it at all, try searching for the best possible solution under this constraint.
Prohibit some combinations of objects, e.g. "distance between p1 and p2 must be larger than D".
Mix all steps above in different ways
Try to understand your problem in all its tiniest details. You can derive useful information/constraints/insights from the problem description. Assume that you can't solve the placement problem in general, so try to reduce it to a more specific (== simpler, == with a smaller search space) problem.
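As promised, a minimal sketch of the first two moves for a solution that is a list of k (x, y) points on an M x N grid. The function names and the clamping to the grid edges are my assumptions, not part of the question:

import random

def shift_neighbor(points, M, N, step=1):
    pts = list(points)
    i = random.randrange(len(pts))
    dx, dy = random.choice([(-step, 0), (step, 0), (0, -step), (0, step)])
    x, y = pts[i]
    # clamp to the grid so the neighbor stays feasible
    pts[i] = (min(max(x + dx, 0), M - 1), min(max(y + dy, 0), N - 1))
    return pts

def swap_neighbor(points):
    pts = list(points)
    i, j = random.sample(range(len(pts)), 2)
    pts[i], pts[j] = pts[j], pts[i]
    return pts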
I would say that the last bullet is the most important. Look closely at your problem and consider its practical aspects only. For example, the size of your problem might allow you to enumerate something, or maybe some placements are not possible for you, and so on and so forth. There is no way for SA to derive such domain-specific knowledge by itself, so help it!
How do you know that your heuristic is a good one? Only by practical evaluation. Prepare a decent set of tests with obvious/well-known answers and try different approaches. Use well-known benchmarks if any exist.
I hope that this is helpful. :)

What string distance algorithm is best for measuring typing accuracy?

I'm trying to write a function that detects how accurately the user typed a particular phrase/sentence/word/words. My objective is to build an app to train the user's typing accuracy on certain phrases.
My initial instinct is to use the basic levenshtein distance algorithm (mostly because that's the only algo I knew off the top of my head).
But after a bit more research, I saw that Jaro-Winkler is a slightly more interesting algorithm because of its consideration for transpositions.
I even found a link that talks about the differences between these algorithms:
Difference between Jaro-Winkler and Levenshtein distance?
Having read all that, in addition to the respective Wikipedia posts, I am still a little clueless as to which algorithm fits my objective the best.
Since you are grading the quality of typing, and you want to train the student to make zero mistakes, you should use Levenshtein distance, because it is less forgiving.
Additionally, Levenshtein score is more intuitive to understand, and easier to represent graphically, than the Jaro-Winkler results. You can modify Levenshtein algorithm to report insertions, deletions, and mistypes separately, and show end-users a list of corrections. Jaro-Winkler, on the other hand, gives you a score that is hard to show to end-user, because penalties for misspelling in the middle are lower than penalties at the end.
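As a sketch of that modification: the standard dynamic-programming Levenshtein table, plus a backtracking pass to count each error type separately.

def levenshtein_report(target, typed):
    n, m = len(target), len(typed)
    # dp[i][j] = edit distance between target[:i] and typed[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == typed[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion (missed char)
                           dp[i][j - 1] + 1,         # insertion (extra char)
                           dp[i - 1][j - 1] + cost)  # substitution / match
    # backtrack to count each kind of error
    ins = dele = sub = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (target[i - 1] != typed[j - 1]):
            sub += target[i - 1] != typed[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return {"distance": dp[n][m], "insertions": ins,
            "deletions": dele, "substitutions": sub}

For example, levenshtein_report("cat", "cart") reports a distance of 1 with one insertion (the extra "r").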
Slightly tongue-in-cheek, but only slightly: build a generative model for typing that gives high (prior) probability to hitting the right letter, and apportion out some probabilities for hitting two neighboring keys at once, two keys from different hands in the wrong order, two keys from the same hand in the wrong order, a key near the correct one, a key far from the correct one, etc. Or perhaps less ad-hoc: give your model a probability for a given sequence of keypresses given the current pair of keys needed to continue the passage. You could do a lot of things with such a model; for example, you could get a "distance"-like metric by giving a likelihood score for the learner's actual performance. But even better would be to give them a report summarizing which kinds of errors they make the most -- after all, why boil their performance down to a single number when many numbers would do? Bonus points if you learn the probabilities for the different kinds of errors from a large corpus of real typists' work.
I mostly agree with the answer given by dasblinkenlight, however, would suggest to use the Damerau-Levenshtein distance instead of only Levenshtein, that is, including transpositions. Transpositions are fairly frequent and easy to make while typing, and there is no good reason why they should incur a double distance penalty with respect to the other possible errors (insertions, deletions, and substitutions).
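A sketch of the optimal-string-alignment variant of Damerau-Levenshtein: the same table as plain Levenshtein, with one extra case so that swapping two adjacent characters costs 1 instead of 2.

def damerau_levenshtein(a, b):
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)  # transposition
    return dp[n][m]

With this, "wrold" is at distance 1 from "world" instead of the 2 that plain Levenshtein reports.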

Genetic algorithm: a population contains chromosomes that don't satisfy the constraints. What should I do with them?

I'm working on a solution to the Traveling Salesman Problem using a genetic algorithm.
Some chromosomes contain the shortest route, but they still aren't acceptable.
For example, the salesman must get to city A by 6.00 pm, but using a given chromosome's solution he would get there at 7.00 pm. Thus, that solution is not valid.
What should I do about this issue?
Firstly, I could change these chromosomes. But how can I do it?
Secondly, I could keep them. How should I do the selection then?
Thirdly, I could replace them, but I have no idea what I should use instead.
Could you please help me or recommend some useful information?
English is not my native language, so sorry if I said something wrong.
The easiest solution, in my opinion, is to make the individuals carrying these chromosomes non-viable.
This means that at each iteration of your genetic algorithm, any candidate solution carrying such a chromosome "dies". This ensures that the subpopulation carrying the chromosome stays extremely small and cannot "reproduce" into the next generations, so it will not be an issue - individuals with this chromosome cannot become dominant in the population.
Do not kill the chromosomes if it's not necessary.
Kill chromosomes only if the population becomes too large.
Simply use the crossover and mutation operators, maybe with an elitist sampling strategy.
PS:
Have you read Zbigniew Michalewicz's book about genetic algorithms?
I'm pretty sure it contains an example with the salesman problem.
You are dealing with a problem with constraints. Using GAs to solve such problems can be tricky, but there are generally four possibilities for how to deal with constraints:
Use such representation that makes the solutions always valid.
Don't use any constraint-preserving representation but introduce some correction operator which makes a valid solution out of an invalid one.
Penalize (or kill) the invalid solutions.
Use a multi-objective algorithm (e.g. NSGA-II, my favourite one) and turn the constraints into objectives.
The last option is very effective, but you need to be able to measure how much a constraint is violated. If this is possible - and it seems to me that it is in your case: you can just sum the differences between the desired and actual times of the visits - then this measure of constraint violation simply becomes another objective, and you optimize the original objective and the new one(s) at the same time. This approach enables the algorithm to exploit useful information in all individuals, even invalid ones.
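A minimal sketch of such a violation measure (arrival_time and the deadlines dictionary are hypothetical placeholders for your timetable data):

def constraint_violation(tour, deadlines, arrival_time):
    # tour: ordered list of cities
    # deadlines: dict mapping city -> latest allowed arrival time
    # arrival_time(tour, city): when this tour reaches the city
    violation = 0.0
    for city, deadline in deadlines.items():
        lateness = arrival_time(tour, city) - deadline
        if lateness > 0:
            violation += lateness  # only being late counts
    return violation  # 0.0 means the tour satisfies all deadlines

In a multi-objective setup such as NSGA-II you would then minimize the tour length and this violation measure simultaneously.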

How is a genetic algorithm different from random selection and evaluation of the fittest?

I have been learning about genetic algorithms for 2 months. I know about the process of initial population creation, selection, crossover, mutation, etc. But I could not understand how we are able to get better results in each generation, and how it's different from a random search for the best solution. Below I use an example to explain my problem.
Let's take the travelling salesman problem. Say we have several cities X1, X2, ..., X18 and we have to find the shortest path through them. When we do the crossover after selecting the fittest individuals, how do we know that the crossover will produce a better chromosome? The same applies to mutation.
I feel like it's just: take one arrangement of cities, calculate the distance to travel them, and store the distance and arrangement. Then choose another arrangement/combination; if it is better than the previous arrangement, save the current arrangement/combination and distance, else discard it. By doing this we will also get some solution.
I just want to know at what point a genetic algorithm differs from this kind of random selection. In a genetic algorithm, is there any criterion that prevents us from selecting an arrangement/combination of cities which we have already evaluated?
I am not sure if my question is clear. But I am open, I can explain more on my question. Please let me know if my question is not clear.
A random algorithm starts with a completely blank sheet every time. A new random solution is generated each iteration, with no memory of what happened before during the previous iterations.
A genetic algorithm has a history, so it does not start with a blank sheet, except at the very beginning. Each generation the best of the solution population are selected, mutated in some way, and advanced to the next generation. The least good members of the population are dropped.
Genetic algorithms build on previous success, so they are able to advance faster than random algorithms. A classic example of a very simple genetic algorithm, is the Weasel program. It finds its target far more quickly than random chance because each generation it starts with a partial solution, and over time those initial partial solutions are closer to the required solution.
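A minimal sketch of the Weasel idea in Python (the population size and mutation rate are arbitrary choices):

import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def fitness(s):
    # number of characters already matching the target
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.05):
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while parent != TARGET:
    children = [mutate(parent) for _ in range(100)]
    parent = max(children + [parent], key=fitness)  # keep the best so far
    generation += 1
print("reached target in", generation, "generations")

Because partial matches are never thrown away, this converges in a few hundred generations, whereas generating 28-character strings uniformly at random would essentially never hit the target.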
I think there are two things you are asking about: a mathematical proof that GAs work, and an empirical one that would address your concerns.
Although I am not aware of a general proof, I am quite sure at least a good sketch of one was given by John Holland in his book Adaptation in Natural and Artificial Systems, for optimization problems using binary encoding. There is something called Holland's schema theorem. But you know, these are heuristics, so technically it does not have to hold. It basically says that short schemata in the genotype which raise the average fitness spread exponentially through successive generations; crossover then combines them together. I think the proof was given only for binary encoding, and it has received some criticism as well.
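For reference, the schema theorem is usually stated roughly as follows (this is the textbook form, from memory, so treat the details with care):

E[m(H, t+1)] \ge m(H, t) \cdot \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} - o(H)\, p_m \right]

where m(H, t) is the number of individuals matching schema H at generation t, f(H) is the schema's average fitness, \bar{f} the population's average fitness, \delta(H) the schema's defining length, o(H) its order, l the chromosome length, and p_c, p_m the crossover and mutation probabilities. Short, low-order, above-average schemata therefore receive exponentially increasing numbers of trials.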
Regarding your concerns: of course you have no guarantee that a crossover will produce a better result, just as two intelligent or beautiful parents might have ugly, stupid children. The premise of a GA is that this is less likely to happen. (As I understand it) the proof for binary encoding hinges on the theorem that good partial patterns will start emerging, and, given that the genotype is long enough, such patterns residing in different specimens have a chance to be combined into one, improving its fitness in general.
I think it is fairly easy to understand in terms of the TSP: crossing over helps to accumulate good sub-paths into one specimen. Of course, it all depends on the choice of the crossover method.
Also, a GA's path towards the solution is not purely random. It moves in a definite direction, with stochastic mechanisms to escape traps (you can lose the best solutions if you allow it). It works because it tends to move towards the current best solutions, but you have a population of specimens and they, in a sense, share knowledge. They are all similar, but provided you preserve diversity, new and better partial patterns can be introduced into the population and incorporated into the best solutions. This is why diversity in the population is regarded as very important.
As a final note, please remember that GAs are a very broad topic and you can modify the basic scheme in nearly any way you want. You can introduce elitism, tabus, niches, etc. There is no one-and-only approach/implementation.

dynamic fitness function for genetic algorithm

I'm not sure I completely understand genetic algorithms and how they work. I'm trying to learn via ai4r: http://ai4r.rubyforge.org/geneticAlgorithms.html
In Job Shop Scheduling, which I believe can be solved by a GA(?), isn't the cost of any single job based on how it relates to its predecessors? I was thinking I would calculate a cost based on each gene's placement within the chromosome, with a dynamic score for how well it is placed rather than a binary value, but I'm not sure this works.
Does anybody have any experience with this? Or does a GA only work when the difference between any two genomes is static?
I hope I have the right terminology here, as I mentioned, I'm just learning.
-----------------------update-----------------------------------
I think I'm using a bit of the wrong terminology here. I referred to "fitness" when what I actually meant was the cost matrix.
The example I'm going from describes this
Each chromosome must represent a possible solution for the problem. This class contains an array with the list of visited nodes (the cities of the tour). The size of the tour is obtained automatically from the travel costs matrix. You have to assign the costs matrix BEFORE you run the genetic search. The following costs matrix could be used to solve the problem with only 3 cities:
data_set = [ [ 0, 10, 5],
[ 6, 0, 4],
[25, 4, 0]
]
Ai4r::GeneticAlgorithm::Chromosome.set_cost_matrix(data_set)
so in my instance, I'm thinking the "cost" of each chromosome is dynamic, based on its neighbours.
Since you asked in a comment to make this an answer, I took the liberty of summarizing my earlier responses as well so it's all in one place. The answer to the specific question of "what is a penalty term" is in item #3 below.
The way a standard genetic algorithm works is that each "chromosome" is a complete solution to the problem. In your case, an ordering for the jobs to be submitted. The confusion, I think, centers around the notion that because the individual contributions to fitness made by a particular job in that schedule varies according to the rest of the schedule, you must need something "dynamic". That's not really true. From the point of view of the GA, the only thing that has a fitness is the entire solution. So a dynamic problem is one in which the fitness of a whole schedule can change over time. Going back to the TSP, a dynamic problem would be one in which touring cities in order of A, B, C, D, and then E actually had a different distance each time you tried it. Even though the cost of a tour through B depends on which cities come before and after B in the tour, once you decide that, the costs are static, and because the GA only ever receives costs for entire tours, all it knows is that [A,B,C,D,E] has a constant fitness. No dynamic trickery needed.
Now, your second question was how to handle constraints like, for the TSP example, what if you need to ensure that the salesman gets to Paris by a certain time? Typically, there are three ways to try to handle this.
Never allow a solution to be generated in which he doesn't get there before 2:00. Sometimes this is easy, other times it's very hard. For instance, if the constraint was "he cannot start at city X", it's fairly easy to just not generate solutions that don't start with X. Often though, simply finding valid solutions can be hard, and so this approach doesn't really work.
Allow constraints to be violated, but fix them afterward. In the TSP example, you let crossover and mutation produce any possible tour, but then scan through it to see if he gets to Paris too late. If so, swap the position of Paris with some earlier city in the tour. Again though, sometimes it can be hard to figure out a good way to repair violations.
Penalize the fitness of an infeasible solution. Here, the idea is that even if I can't prevent him from getting to Paris too late and I can't fix it if he does, I can at least make the fitness arbitrarily worse. For TSP, the fitness is the length of the tour. So you might say that if a tour gets him to Paris too late, the fitness is the length of the tour + 100. That lets the solution stay in the population (it might be very good otherwise, so you want it to have a chance to pass on some of its genes), but you make it less likely to be selected, because your selection and replacement methods pick individuals with better fitness values.
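A minimal sketch of that penalty term for the TSP-with-deadline example (tour_length and arrives_too_late are hypothetical helpers, and the constant is arbitrary):

PENALTY = 100.0

def penalized_fitness(tour):
    f = tour_length(tour)            # lower is better for TSP
    if arrives_too_late(tour, "Paris"):
        f += PENALTY                 # stays in the population, but worse
    return f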
For your JSP problem, typically you're looking to minimize the makespan. The same three options are available to you if you do have some constraints. But from what I can tell, you don't really have such constraints. I think you're trying to inject too much knowledge into the process rather than letting the evolutionary algorithm come up with it on its own. That is, you don't necessarily worry about telling the GA that some arrangements of jobs are better than others. You just assign higher fitness to the better ones and let the process converge.
That said, injecting information like this is often a really good thing to do, but you want to have a good understanding of the basic algorithm first. Let's say that we know that for TSP, it's more likely that a good solution will connect cities that are close to one another. The way I would use that information inside a GA would be to generate random solutions non-uniformly (perhaps with a greedy heuristic). I might also replace the standard crossover and mutation algorithms with something customized. Mutation is typically easier to do this with than crossover. To mutate a TSP solution, I might pick two connected cities, break the connection, and then look for a way to reconnect them that was "closer". That is, if a tour is [A,B,C,D,E,F,G,H], I might pick the edge [B,C] at random, and then look for another edge, maybe [F,G], such that when I connected them crossways to get [A,B,G,D,E,F,C,H], the total tour length was lower. I could even extend that mutation beyond one step -- create a loop that keeps trying to break and reconnect edges until it can't find a shorter tour. This leads to what is usually called a hybrid GA because it's a GA hybridized with a local search; sometimes also called a Memetic Algorithm. These sorts of algorithms usually outperform a black-box GA because you're giving the algorithm "hints" to bias it towards trying things you expect to be good.
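A minimal sketch of that mutation as a segment-reversal (2-opt style) move; dist is an assumed city-to-city distance function, and the number of tries is arbitrary:

import random

def tour_length(tour, dist):
    return sum(dist(tour[k], tour[(k + 1) % len(tour)])
               for k in range(len(tour)))

def two_opt_mutate(tour, dist, tries=50):
    best = list(tour)
    n = len(best)
    for _ in range(tries):
        i, j = sorted(random.sample(range(1, n), 2))
        # reconnecting the two broken edges crossways is equivalent to
        # reversing the segment between them
        candidate = best[:i] + best[i:j][::-1] + best[j:]
        if tour_length(candidate, dist) < tour_length(best, dist):
            best = candidate  # keep the change only if the tour got shorter
    return best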
I think this idea of a memetic algorithm is pretty close to what you were hitting on in your original question of wondering how to deal with the fact that the contribution to fitness from a particular job depends on where the other jobs are in the schedule. The only stumbling block there is that you were a bit unlucky in that the somewhat reasonable idea of thinking of this as "dynamic" leads you a bit astray, as "dynamic" actually means something entirely different here.
So to wrap up, there's nothing "dynamic" about your problem, so the things people do with GAs for dynamic problems will be entirely unhelpful. A standard GA will work with no fancy tricks. However, the idea of using information you have about what schedules work better can be introduced into the genetic operators, and will probably result in a significantly better overall algorithm.
You'd use a GA to find, say, the best order in which to do a number of jobs, or the set of jobs that makes the best use of a day's resources. So yes, the jobs would be related to each other.
Your fitness measure would then be for the sequence as a whole, e.g. 1,3,4,5,6,2.
Look at, say, a shortest-path algorithm; it starts to make sense then.
