What is the basic difference between Chromosome and Genotype in the discussion of Genetic Algorithms?
My guess is, Genotype is a particular arrangement of a chromosome.
What do you say?
In genetic algorithms, unlike in biology, genome and chromosome are equivalent. Both refer to the way the genes are arranged in an individual to encode a candidate solution to the problem. That is, both are equal to the genotype: "the complete heritable genetic identity", as noted here.
I am new to heuristic optimization methods and am learning about the different algorithms in this space, such as the Genetic Algorithm (GA), PSO, DE, CMA-ES, etc. The general flow of all of these algorithms seems to be: initialize a population; select, crossover, and mutate to update it; evaluate; and repeat the cycle. The initial population-creation step in a genetic algorithm seems to be that each member of the population is encoded by a chromosome, which is a bitstring of 0s and 1s, and then all the other operations are performed. The GA has simple update methods for the population, like mutation and crossover, but the update methods differ in the other algorithms.
My query is: do all the other heuristic algorithms also initialize the population as bitstrings of 0s and 1s, or do they use general natural numbers?
The representation of individuals in evolutionary algorithms (EAs) depends on the representation of a candidate solution. If you are solving a combinatorial problem, e.g. the knapsack problem, the final solution is a (0,1) string, so it makes sense to use a binary representation in the EA. However, if you are solving a continuous black-box optimisation problem, then it makes sense to use a representation with continuous decision variables.
In the old days, GA and other algorithms used only binary representations, even for solving continuous problems. But nowadays, all the algorithms you mentioned have their own binary, continuous, and other variants. For example, PSO is known as a continuous problem solver, but to update binary individuals (particles) for the next iteration, there are mapping strategies such as s-shaped or v-shaped transfer functions.
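To make the transfer-function idea concrete, here is a minimal sketch, assuming toy values, of how a binary PSO might map a particle's continuous velocities to a new binary position through an s-shaped transfer function (the function names and velocities here are illustrative, not from any specific PSO library):

```python
import math
import random

def s_shape(v):
    """S-shaped transfer function: maps a continuous velocity to a
    probability of the corresponding bit being 1 (a common choice
    in binary PSO variants)."""
    return 1.0 / (1.0 + math.exp(-v))

def update_binary_position(velocity):
    """Sample a new binary position bit-by-bit from the velocities."""
    return [1 if random.random() < s_shape(v) else 0 for v in velocity]

velocity = [-2.0, 0.0, 3.0]   # toy velocities for a 3-bit particle
position = update_binary_position(velocity)
print(position)  # a stochastic bitstring, e.g. bits biased toward 1 where v is large
```

The continuous PSO velocity update stays the same; only the position update is replaced by this probabilistic rounding step.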
My two cents: the choice of algorithm depends on the type of problem, and I personally wouldn't recommend using a binary PSO as a first attempt at solving a problem. There may be hidden benefits, but they need investigation.
Please feel free to extend your question.
I'm trying to develop my understanding of genetic algorithms.
I have found great explanations of how to run a genetic algorithm for the knapsack problem here and the travelling salesman problem here. I now understand these processes, and from those papers I understand how to encode these problems into chromosomes (as described in the papers linked).
I'm struggling to understand how this translates to the bin-packing problem (described here) so as to begin understanding the algorithm. Could someone show me a sample of how to encode the bin-packing problem into chromosomes just with a small amount of toy data to start me off?
It's easier with float-valued genes, as in the BRKGA framework. Then you can, for example, have one gene per item and decode a chromosome by sorting the items by their gene values, and then turning that order into packed bins by running a simple online approximation algorithm like next-fit.
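As a sketch of that decoding idea with a small amount of toy data (the item weights, capacity, and function name below are made up for illustration, not taken from the BRKGA literature):

```python
import random

def decode(chromosome, weights, capacity):
    """Decode a random-key chromosome (one float gene per item) into bins:
    sort the items by their gene values, then pack the resulting order
    with the next-fit heuristic (open a new bin whenever the next item
    doesn't fit in the current one)."""
    order = sorted(range(len(weights)), key=lambda i: chromosome[i])
    bins, current, load = [], [], 0.0
    for i in order:
        if load + weights[i] > capacity:
            bins.append(current)
            current, load = [], 0.0
        current.append(i)
        load += weights[i]
    if current:
        bins.append(current)
    return bins

# Toy data: six item weights and a bin capacity of 10
weights = [4, 8, 1, 4, 2, 1]
capacity = 10
chromosome = [random.random() for _ in weights]  # one random key per item
print(decode(chromosome, weights, capacity))     # e.g. lists of item indices per bin
```

The GA then only ever mutates and crosses over the float genes; feasibility is guaranteed by the decoder, which is the main appeal of the random-key encoding.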
I'm reading up on the one dimensional bin packing problem and the different solutions that can be used to solve it.
Bin Packing Problem Definition: Given a list of objects and their weights, and a collection of bins of fixed size, find the smallest number of bins so that all of the objects are assigned to a bin.
Solutions I'm studying: Next Fit, First Fit, Best Fit, Worst Fit, First Fit Decreasing, Best Fit Decreasing
I notice that some articles I read call these "approximation algorithms", and others call these "heuristics". I know that there is a difference between approximation algorithms and heuristics:
Heuristic: with some hard problems, it's difficult to get an acceptable solution in a decent run time, so we settle for an "okay" solution by applying educated guesses or arbitrary choices.
Approximation algorithm: this gives an approximate solution, with some "guarantee" on its performance (e.g., a bounded ratio to the optimum).
So, my question is: are the solutions I'm studying heuristics or approximation algorithms? I'm more inclined to believe they are heuristics, because we're choosing the next item to be placed in a bin by some "guess", and we're not guaranteed an optimal solution. So why do some people call them approximation algorithms?
If these aren't heuristic algorithms, then what are examples of heuristic algorithms to solve the bin packing problem?
An algorithm can be both a heuristic and an approximation algorithm -- the two terms don't conflict. If some "good but not always optimal" strategy (the heuristic) can be proven to be "not too bad" (the approximation guarantee), then it qualifies as both.
All of the algorithms you listed are heuristic because they prescribe a "usually good" strategy, which is the heuristic. For any of the algorithms where there is an approximation guarantee (the "error" must be bounded somehow), then you can also say it's an approximation algorithm.
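For concreteness, here is a minimal sketch of the First Fit strategy; the "heuristic" part is the greedy placement rule below, while the "approximation algorithm" part is the separate proof that this rule never uses more than a constant factor times the optimal number of bins:

```python
def first_fit(weights, capacity):
    """First Fit: place each item into the first existing bin it fits in,
    opening a new bin only when no existing bin has room."""
    bins = []  # each entry is the remaining capacity of that bin
    for w in weights:
        for i, remaining in enumerate(bins):
            if w <= remaining:
                bins[i] -= w
                break
        else:
            bins.append(capacity - w)  # no bin had room: open a new one
    return len(bins)

print(first_fit([4, 8, 1, 4, 2, 1], 10))  # number of bins used
```

First Fit Decreasing is the same code applied to `sorted(weights, reverse=True)`; the sorting step is what tightens its approximation guarantee.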
Consider a genetic algorithm that uses only selection and mutation (no crossover). How is this similar to a hill-climbing algorithm?
I found this statement in an article, but I don't seem to understand why.
This statement is debatable, and I believe many would not necessarily (or fully) agree with it.
Case when it may be true
I believe the author of this statement wants to say that it is possible to use only mutation and selection to obtain a hill-climbing algorithm.
Imagine that each mutation of your chromosome (X) can improve or deteriorate the value of your fitness function (Y) (imagine it is height). We want to find the X for which Y is largest.
1. We put a population of chromosomes (X) into our pool.
2. We MUTATE the chromosomes (X) and look for improvements in (Y).
3. After mutation, we SELECT only the chromosomes producing the highest (Y), and repeat steps 2-3, say, 20 times.
Because at every stage you reject poor values, you will be able to get (nearly) the maximum value of Y.
I think this is what the author was trying to say.
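The steps above can be sketched in a few lines; this is a minimal mutation-plus-selection loop, assuming a toy OneMax fitness (count of 1-bits), which behaves like stochastic hill climbing:

```python
import random

def fitness(x):
    """Toy fitness (assumed for illustration): number of 1-bits (OneMax)."""
    return sum(x)

def mutate(x, rate=0.1):
    """Flip each bit independently with a small probability."""
    return [1 - b if random.random() < rate else b for b in x]

# Mutation + selection only: keep the mutant only when it is at least as
# fit as the current chromosome, so fitness never decreases.
x = [random.randint(0, 1) for _ in range(20)]
for _ in range(200):
    y = mutate(x)
    if fitness(y) >= fitness(x):
        x = y
print(fitness(x))  # typically close to 20 after enough iterations
```

With a small mutation rate, each mutant is a near neighbor of its parent, which is exactly the local-move structure of hill climbing; with a large rate, the "case when it may be false" below applies.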
Case when it may be false
When mutations affect chromosomes to a great extent, the algorithm will not converge easily to the maximum. This happens when too many genes in a chromosome are changed at each mutation:
when the mutated chromosome no longer resembles its original set of genes, you are only introducing noise. In effect, it is a bit like using a random generator for (X) to find the maximum (Y): every time you mutate (X), you get something that has nothing to do with the original.
You may find maximum value, but it has little to do with hill climbing.
Both hill climbing and genetic algorithms without crossover are local search techniques. In that sense, they are similar, but I would not say they are the same.
Hill climbing comes in different forms but all share some properties that the genetic algorithm does not have:
there is one well-defined neighbor function (which given one solution can enumerate all neighbors)
unless cancelled, the algorithm continues as long as improvements are found (it does not stop after a fixed number of generations)
during the iteration, there is only one solution (not a pool of solutions)
In practice, choosing a good neighbor function can have a huge impact on the effectiveness of a hill climbing algorithm. Here, you can sometimes use additional domain knowledge.
In genetic algorithms, as far as I have seen, domain knowledge is not used in mutation operators. Mostly, they use simple techniques like flipping bits or adding random noise to numbers.
Hill climbing can work well as a deterministic algorithm without any randomness. Depending on your problem, that may be a critical property or not. If not, then random-restart hill climbing will often lead to better results.
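The properties above can be seen in a minimal, deterministic steepest-ascent sketch; the toy objective and neighbor function here are assumptions for illustration:

```python
def hill_climb(start, neighbors, score):
    """Steepest-ascent hill climbing: move to the best neighbor for as
    long as it improves the score; stop at a local maximum (there is no
    fixed generation count, and only one current solution is kept)."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

# Toy problem: maximize -(x - 3)^2 over the integers; neighbors are x +/- 1
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
print(result)  # 3, the maximizer
```

Note how the neighbor function is an explicit, problem-specific argument; this is exactly where domain knowledge enters, in contrast to a generic bit-flip mutator.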
In summary, if you use a genetic algorithm without crossover, you end up with a rather poor local search algorithm. I would expect a good hill-climbing algorithm to outperform it, especially in a scenario where you are under strict time constraints (real-time systems).
I'm not sure if my understanding of maximization and minimization is correct.
So let's say for some function f(x,y,z), I want to find what would give the highest value; that would be maximization, right? And if I wanted to find the lowest value, that would be minimization?
So if a genetic algorithm is a search algorithm trying to maximize some fitness function would they by definition be maximization algorithms?
So let's say for some function f(x,y,z), I want to find what would give the highest value that would be maximization, right? And if I wanted to find the lowest value that would be minimization?
Yes, that's by definition true.
So if a genetic algorithm is a search algorithm trying to maximize some fitness function would they by definition be maximization algorithms?
Pretty much, yes, although I'm not sure "maximization algorithm" is a widely used term, and only if a genetic algorithm is defined as such, which I don't believe it strictly is.
Genetic algorithms can also try to minimize the distance to some goal function value, or minimize the function value itself, but then again, this can just be rephrased as maximization without loss of generality.
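That rephrasing is simply negating the objective; a minimal sketch, with a toy cost function assumed for illustration:

```python
def minimize_as_maximize(f):
    """Turn a minimization objective into a fitness to maximize by
    negating it: the argmax of -f is the argmin of f."""
    return lambda *args: -f(*args)

cost = lambda x: (x - 2) ** 2          # objective we want to minimize
fitness = minimize_as_maximize(cost)   # a GA can maximize this instead
candidates = [0, 1, 2, 3, 4]
best = max(candidates, key=fitness)
print(best)  # 2, the minimizer of the cost
```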
Perhaps more significantly, there isn't a strict need to even have a function - the candidates just need to be comparable. If they have a total order, it's again possible to rephrase it as a maximization problem. If they don't have a total order, it might be a bit more difficult to get candidates objectively better than all the others, although nothing's stopping you from running the GA on this type of data.
In conclusion - trying to maximize a function is the norm (and possibly in line with how you'll mostly see it defined), but don't be surprised if you come across a GA that doesn't do this.
Are all genetic algorithms maximization algorithms?
No, they aren't.
Genetic algorithms are popular approaches to multi-objective optimization (e.g. NSGA-II or SPEA-2 are very well known genetic algorithm based approaches).
For multi-objective optimization you aren't trying to maximize a function.
This is because scalarizing multi-objective optimization problems is seldom a viable approach (i.e. there isn't a single solution that simultaneously optimizes every objective), and what you are looking for is a set of nondominated solutions (or a representative subset of the Pareto optimal solutions).
There are also approaches to evolutionary algorithms that try to capture the open-endedness of natural evolution by searching for behavioral novelty. Even in an objective-based problem, such novelty search ignores the objective (see "Abandoning Objectives: Evolution through the Search for Novelty Alone" by Joel Lehman and Kenneth O. Stanley for details).