Is there a way to detect a formula with the help of a genetic algorithm? - genetic-algorithm

I am trying to find how two images (let's say "image1" and "image2") match each other.
Several parameters are calculated for each possible position of "image2" relative to "image1", and I have an empirical formula that gives a "score" to each position depending on those parameters.
I tried to match image pairs with the help of neural networks, but failed: the empirical formula works much better. This got me thinking about improving the formula with the help of a genetic algorithm.
So, the question is: I have a bunch of image pairs, and for each pair I know the "right" match position. Can a genetic algorithm be used for this kind of thing? Are there any examples?
Suggestions and links are appreciated.
Thanks.

Basically, yes! The parameters of your score function could be the parameters that your GA is going to evolve. You may want to use a real-coded genetic algorithm or an evolution strategy (e.g. CMA-ES) if your parameters are in the real domain.
There exist several possible choices for crossover:
Average / Intermediate
Blend-Alpha (BLX-a)
Blend-Alpha-Beta (BLX-a-b)
Discrete
Heuristic
Local
Random Convex
Simulated Binary (SBX)
Single Point
And also some mutation operators:
Normal distributed N(0, sigma) -> e.g. with adaptation to reduce sigma over time
Uniform distributed (in some positions)
Polynomial mutation
Another metaheuristic suitable for real coded problems is particle swarm optimization (PSO).
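To make the operator lists above concrete, here is a minimal sketch of a real-coded GA that combines Blend-Alpha crossover with normally distributed mutation whose sigma shrinks over time. The function names, the truncation selection, the elitism and the toy error function (squared distance to a hidden weight vector) are all assumptions made for illustration; a real score function for image matching would take their place.

```python
import random

def blx_alpha(p1, p2, alpha=0.5, rng=random):
    """Blend-Alpha crossover: sample each gene from the parents' interval widened by alpha."""
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        span = hi - lo
        child.append(rng.uniform(lo - alpha * span, hi + alpha * span))
    return child

def gaussian_mutation(genes, sigma, rng=random):
    """Add N(0, sigma) noise to every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genes]

def evolve(error, dim, generations=200, pop_size=30, sigma0=1.0, decay=0.98, seed=0):
    """Minimise `error` over real vectors of length `dim` (lower error is better)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    sigma = sigma0
    for _ in range(generations):
        pop.sort(key=error)
        parents = pop[:pop_size // 2]      # truncation selection (an assumption)
        children = [pop[0]]                # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(gaussian_mutation(blx_alpha(p1, p2, rng=rng), sigma, rng))
        pop = children
        sigma *= decay                     # reduce the mutation strength over time
    return min(pop, key=error)

# Hypothetical stand-in for "how badly does the formula rank the known matches".
TRUE_WEIGHTS = [1.5, -2.0, 0.5]

def toy_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TRUE_WEIGHTS))

best = evolve(toy_error, dim=3)
```

In practice the error function would run the scoring formula over the labelled image pairs and count how often the known position fails to get the top score.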
With genetic programming you're going to evolve a formula (e.g. a tree). I'm not so sure why you mention it, maybe I still misunderstand something. Clarify your problem, just in case.
EDIT:
Okay it seems it's not the weights that you want to optimize, but the whole formula. Still, genetic algorithms can be used for this representation as well. I want to mention HeuristicLab due to its good support for genetic programming.
I assume you have a more complex problem since you want to optimize the scoring function, and still have another algorithm for optimizing the placement according to that scoring function. You could try an easy approach and generate a dataset with several positions predefined and the features calculated accordingly. Then you could formulate a classification problem and find a model that allows you to identify those positionings that are optimal.

Related

How is a genetic algorithm different from random selection and evaluation for the fittest?

I have been learning about genetic algorithms for 2 months. I know the process of initial population creation, selection, crossover, mutation, etc., but I cannot understand how we are able to get better results in each generation and how this is different from a random search for the best solution. Below I use one example to explain my problem.
Let's take the example of the travelling salesman problem. Let's say we have several cities X1, X2, ..., X18 and we have to find the shortest path that visits them. When we do the crossover after selecting the fittest individuals, how do we know that we will get a better chromosome after crossover? The same applies to mutation.
It feels to me like this: take one arrangement of cities and calculate the distance needed to travel through them. Store the distance and the arrangement. Then choose another arrangement/combination. If it is better than the previous arrangement, save the current arrangement/combination and distance; otherwise discard it. By doing this we will also get some solution.
I just want to know where the difference between random selection and a genetic algorithm lies. In a genetic algorithm, is there any rule that we can't select an arrangement/combination of cities that we have already evaluated?
I am not sure if my question is clear, but I am happy to explain more. Please let me know if it is not clear.
A random algorithm starts with a completely blank sheet every time. A new random solution is generated each iteration, with no memory of what happened before during the previous iterations.
A genetic algorithm has a history, so it does not start with a blank sheet, except at the very beginning. Each generation the best of the solution population are selected, mutated in some way, and advanced to the next generation. The least good members of the population are dropped.
Genetic algorithms build on previous success, so they are able to advance faster than random algorithms. A classic example of a very simple genetic algorithm is the Weasel program. It finds its target far more quickly than random chance because each generation starts from a partial solution, and over time those partial solutions get closer to the required solution.
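A minimal sketch of a Weasel-style program shows the "building on previous success" idea. The offspring count, mutation rate and seed are arbitrary assumptions, and keeping the parent in the selection pool (so the score never drops) is a design choice of this sketch.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def score(candidate):
    """Number of characters that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)

def weasel(offspring=100, rate=0.05, seed=1, max_generations=10_000):
    """Return the number of generations needed to reach TARGET, or None."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if parent == TARGET:
            return generation
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        # The next parent is the best of the children and the current parent,
        # so every generation starts from the best partial solution so far.
        parent = max(children + [parent], key=score)
    return None

generations = weasel()
```

A purely random search would have to hit one particular string out of 27^28 possibilities, which is why it has no realistic chance of matching this.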
I think there are two things you are asking about: a mathematical proof that GAs work, and an empirical one that would allay your concerns.
Although I am not aware of a general proof, I am quite sure that at least a good sketch of one was given by John Holland in his book Adaptation in Natural and Artificial Systems, for optimization problems using binary coding. It is known as Holland's schema theorem. But a GA is a heuristic, so technically there does not have to be a proof. The theorem basically says that short schemata in the genotype that raise the average fitness increase exponentially in frequency over successive generations. Crossover then combines them. I think the proof was given only for binary coding, and it has received some criticism as well.
Regarding your concerns: of course you have no guarantee that a crossover will produce a better result, just as two intelligent or beautiful parents might have ugly, stupid children. The premise of a GA is that this is less likely to happen. (As I understand it,) the proof for binary coding hinges on the theorem, which says that good partial patterns will start emerging and, provided the genotype is long enough, such patterns residing in different specimens have a chance to be combined into one specimen, improving its fitness.
I think it is fairly easy to understand in terms of the TSP. Crossing over helps to accumulate good sub-paths in one specimen. Of course it all depends on the choice of the crossover method.
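As an illustration of how crossover can carry a good sub-path from one tour into another, here is a sketch of a simplified order-crossover variant: a contiguous slice of the first parent is kept in place and the remaining cities are filled in using the order in which they appear in the second parent. The function name and the optional `cuts` argument are assumptions of this sketch, and other TSP crossovers exist.

```python
import random

def order_crossover(p1, p2, rng=random, cuts=None):
    """Keep the slice p1[i:j] in place; fill the other positions in p2's order."""
    n = len(p1)
    i, j = cuts if cuts is not None else sorted(rng.sample(range(n + 1), 2))
    segment = p1[i:j]                        # sub-path inherited from parent 1
    taken = set(segment)
    rest = [city for city in p2 if city not in taken]
    return rest[:i] + segment + rest[i:]     # always a valid permutation

child = order_crossover(list("ABCDEF"), list("FEDCBA"), cuts=(2, 4))
# The sub-path C-D comes from the first parent; F, E, B, A keep the
# relative order they have in the second parent.
```

Because the child is always a valid tour, no repair step is needed after crossover.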
Also, a GA's path towards the solution is not purely random. It moves in a certain direction, with stochastic mechanisms to escape traps. You can lose the best solutions if you allow it. It works because it tends to move towards the current best solutions, but you have a population of specimens and they kind of share knowledge. They are all similar, but provided you preserve diversity, new and better partial patterns can be introduced to the whole population and get incorporated into the best solutions. This is why diversity in the population is regarded as very important.
As a final note, please remember that GAs are a very broad topic and you can modify the basic algorithm in nearly every way you want. You can introduce elitism, taboos, niches, etc. There is no one-and-only approach/implementation.

Lack of diversification, is it really a drawback of Genetic Algorithms?

We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than with these points directly. In the literature, we often find that GAs have the following drawback: (1) since many chromosomes are coded into a similar point of Ω, or similar chromosomes map to very different points, the efficiency is quite low. Do you think that is really a drawback? After all, these kinds of algorithms use the mutation operator in each iteration to diversify the candidate solutions. To add more diversification we simply increase the probability of crossover. And we mustn't forget that our initial population (of chromosomes) is randomly generated (yet more diversification). The question is: if you think that (1) is a drawback of GAs, can you provide more details? Thank you.
Mutation and random initialization are not enough to combat the problem known as genetic drift, which is the major problem of genetic algorithms. Genetic drift means that the GA may quickly lose most of its genetic diversity, so that the search proceeds in a way that is not beneficial for crossover. This is because the random initial population quickly converges. Mutation is a different thing: if it is high it will diversify, true, but at the same time it will prevent convergence, and the solutions will remain at a certain distance from the optimum with higher probability. You will need to adapt the mutation probability (not the crossover probability) during the search. In a similar manner the Evolution Strategy, which is similar to a GA, adapts the mutation strength during the search.
We have developed a variant of the GA that is called OffspringSelection GA (OSGA) which introduces another selection step after crossover. Only those children will be accepted that surpass their parents' fitness (the better, the worse or any linearly interpolated value). This way you can even use random parent selection and put the bias on the quality of the offspring. It has been shown that this slows the genetic drift. The algorithm is implemented in our framework HeuristicLab. It features a GUI so you can download and try it on some problems.
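The acceptance rule described above can be sketched in a few lines. This is only an illustration of the idea, not HeuristicLab's API: the function name, the `comparison_factor` parameter and the assumption that higher fitness is better are all choices made for this sketch.

```python
def accepts_offspring(child_fitness, parent_fitnesses, comparison_factor=1.0):
    """Accept a child only if it beats a threshold between its parents' fitnesses.

    comparison_factor = 0.0 compares against the worse parent,
    comparison_factor = 1.0 against the better parent, and values in
    between against a linear interpolation of the two.
    Assumes maximisation (higher fitness is better).
    """
    worse, better = min(parent_fitnesses), max(parent_fitnesses)
    threshold = worse + comparison_factor * (better - worse)
    return child_fitness > threshold

# A child of fitness 5 from parents of fitness 3 and 6:
lenient = accepts_offspring(5, (3, 6), comparison_factor=0.0)   # threshold 3.0
strict = accepts_offspring(5, (3, 6), comparison_factor=1.0)    # threshold 6.0
```

A generation loop would keep producing children until enough of them pass this check, which is where the bias towards offspring quality comes from.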
Other techniques that combat genetic drift are niching and crowding which let the diversity flow into the selection and thus introduce another, but likely different bias.
EDIT: I want to add that the situation of having multiple solutions with equal quality might of course pose a problem, as it creates neutral areas in the search space. However, I think you didn't really mean that. The primary problem is genetic drift, i.e. the loss of (important) genetic information.
As a sidenote, you (the OP) said:
We know that Genetic Algorithms (or evolutionary computation) work with an encoding of the points in our solution space Ω rather than these points directly.
This is not always true. An individual is coded as a genotype, which can have any shape, such as a string (genetic algorithms) or a vector of reals (evolution strategies). Each genotype is transformed into a phenotype when the individual is assessed, i.e. when its fitness is calculated. In some cases the phenotype is identical to the genotype: this is called direct coding. Otherwise, the coding is called indirect. (You may find more definitions here (section 2.2.1).)
Example of direct encoding:
http://en.wikipedia.org/wiki/Neuroevolution#Direct_and_Indirect_Encoding_of_Networks
Example of indirect encoding:
Suppose you want to optimize the size of a rectangular parallelepiped defined by its length, height and width. To simplify the example, assume that these three quantities are integers between 0 and 15. We can then describe each of them using a 4-bit binary number. An example of a potential solution is the genotype 0001 0111 1010. The corresponding phenotype is a parallelepiped of length 1, height 7 and width 10.
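The genotype-to-phenotype mapping in this example can be sketched as a small decoding function. The function name and the 4-bit default are assumptions; the genotype string is the one with length 1, height 7 and width 10.

```python
def decode(genotype, bits=4):
    """Turn a bit-string genotype into a tuple of integers (the phenotype)."""
    bitstring = genotype.replace(" ", "")
    if len(bitstring) % bits != 0:
        raise ValueError("genotype length must be a multiple of the gene width")
    return tuple(int(bitstring[k:k + bits], 2) for k in range(0, len(bitstring), bits))

length, height, width = decode("0001 0111 1010")
volume = length * height * width   # a fitness function would work on the phenotype
```

The GA's crossover and mutation act on the bit string, while the fitness is computed from the decoded dimensions.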
Now, back to the original question on diversity: in addition to what DonAndre said, you could read chapter 9, "Multi-Modal Problems and Spatial Distribution", of the excellent book Introduction to Evolutionary Computing by A. E. Eiben and J. E. Smith, as well as a research paper on the matter such as Encouraging Behavioral Diversity in Evolutionary Robotics: an Empirical Study. In a word, lack of diversity is not a drawback of GAs; it is "just" an issue.

Optimization of a multivariate function with an initial solution close to the optimum

I was wondering if anyone knows which kind of algorithm could be used in my case. I have already run the optimizer on my multivariate function and found a solution to my problem, assuming that my function is regular enough. I then slightly perturb the problem and would like to find the optimal solution, which is close to my last solution. Is there any very fast algorithm for this case, or should I just fall back to a regular one?
We probably need a bit more information about your problem; but since you know you're near the right solution, and if derivatives are easy to calculate, Newton-Raphson is a sensible choice, and if not, Conjugate-Gradient may make sense.
If you already have an iterative optimizer (for example, based on Powell's direction set method, or CG), why don't you use your initial solution as a starting point for the next run of your optimizer?
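The warm-start idea can be sketched with a plain gradient descent on a toy quadratic. Both the optimizer and the objective (squared distance to a `center`) are assumptions chosen to keep the example self-contained; the same pattern applies to whatever iterative optimizer is already in use.

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Return (solution, iterations used) for a simple fixed-step descent."""
    x = list(x0)
    for iteration in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, iteration
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x, max_iter

def make_grad(center):
    """Gradient of the toy objective sum((x_i - c_i)^2)."""
    return lambda x: [2 * (xi - ci) for xi, ci in zip(x, center)]

# Solve the original problem once.
original_solution, _ = gradient_descent(make_grad([3.0, -1.0]), [0.0, 0.0])

# Solve the slightly perturbed problem from scratch and from the old solution.
perturbed = make_grad([3.05, -0.98])
_, cold_iterations = gradient_descent(perturbed, [0.0, 0.0])
warm_solution, warm_iterations = gradient_descent(perturbed, original_solution)
```

With a linearly convergent method like this one the saving is modest; methods with faster local convergence tend to benefit more from a start that is already close.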
EDIT, following your comment: if calculating the Jacobian or the Hessian matrix gives you performance problems, try BFGS (http://en.wikipedia.org/wiki/BFGS_method), which avoids computing the Hessian explicitly (it builds an approximation from successive gradients); at http://www.alglib.net/optimization/lbfgs.php you will find a (free-for-non-commercial) implementation of BFGS. A good description of the details you will find here.
And don't expect to get anything from finding your initial solution with a less sophisticated algorithm.
So this is all about unconstrained optimization. If you need information about constrained optimization, I suggest you google for "SQP".
There are a bunch of algorithms for finding the roots of equations. If you know approximately where the root is, there are algorithms that will get you arbitrarily close very quickly, in ln n time or better.
One is Newton's method;
another is the bisection method.
Note that these algorithms are for single-variable functions, but they can be extended to multivariate functions (Newton's method most directly).
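Both methods can be sketched for a single-variable function as follows. The function names, tolerances and the example equation x^2 - 2 = 0 are assumptions; to minimise a smooth function with these routines, one would look for a root of its derivative.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: follow the tangent line to its zero, starting near the root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

def bisection(f, lo, hi, tol=1e-12):
    """Bisection: halve an interval whose endpoints have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid          # the sign change is in the left half
        else:
            lo = mid          # the sign change is in the right half
    return (lo + hi) / 2

f = lambda x: x * x - 2
root_newton = newton(f, lambda x: 2 * x, x0=1.4)
root_bisect = bisection(f, 1.0, 2.0)
```

Newton's method needs the derivative and a good starting point but converges very fast near the root; bisection only needs a bracketing interval and gains one binary digit per step.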
Every minimization algorithm performs better (read: performs at all) if you have a good initial guess. In your case, the initial guess for the perturbed problem will be the minimum point of the unperturbed problem.
Then you have to specify your requirements: you want speed. What accuracy do you want? Does space efficiency matter? Most importantly, what information do you have: only the value of the function, or do you also have the derivatives (possibly second derivatives)?
Some background on the problem would help too. Looking for a smooth function which has been discretized will be very different from looking for hundreds of unrelated parameters.
Global information (i.e. is the function convex, is there a guaranteed global minimum or many local ones, etc.) can be left aside for now. If you have trouble finding the minimum point of the perturbed problem, this is something you will have to investigate, though.
Answering these questions will allow us to select a particular algorithm. There are many choices (and trade-offs) for multivariate optimization.
Also, which is quicker will very much depend on the problem (rather than on the algorithm), and should be determined by experimentation.
Though I don't know much about using computers in this capacity, I remember an article that used neuroevolutionary techniques to find "best-fit" equations relatively efficiently, given a known function complexity (linear, Nth-polynomial, exponential, logarithmic, etc.) and a set of point plots. As I recall it was one of the earliest uses of what we now know as computational neuroevolution; because the functional complexity (and thus the number of terms) of the equation is known and fixed, a static neural net can be used and seeded with your closest values, then "mutated" and tested for fitness, with heuristics to make new nets closer to existing nets with high fitness. Using multithreading, many nets can be created, tested and evaluated in parallel.

Algorithm to optimize parameters based on imprecise fitness function

I am looking for a general algorithm to help in situations with similar constraints as this example :
I am thinking of a system where images are constructed based on a set of operations. Each operation has a set of parameters. The total "gene" of the image is then the sequential application of the operations with the corresponding parameters. The finished image is then given a vote by one or more real humans according to how "beautiful" it is.
The question is what kind of algorithm would be able to do better than simply random search if you want to find the most beautiful image? (and hopefully improve the confidence over time as votes tick in and improve the fitness function)
Given that the operations will probably be correlated, it should be possible to do better than random search. So for example operation A with parameters a1 and a2 followed by B with parameters b1 could generally be vastly superior to B followed by A. The order of operations will matter.
I have tried googling for research papers on random walks and Markov chains, as that is my best guess about where to look, but so far I have found no scenarios similar enough. I would really appreciate even just a hint of where to look for such an algorithm.
I think what you are looking for falls into a broad research area called metaheuristics (which includes many non-linear optimization algorithms such as genetic algorithms, simulated annealing or tabu search).
Then if your raw fitness function is just giving a statistical value somehow approximating a real (but unknown) fitness function, you can probably still use most metaheuristics by (somehow) smoothing your fitness function (averaging results would do that).
Do you mean the Metropolis algorithm?
This approach uses a random walk, weighted by the fitness function. It is useful for locating local extrema in complicated fitness landscapes, but is generally slower than deterministic approaches where those will work.
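A minimal sketch of such a fitness-weighted random walk, in the Metropolis style: improvements are always accepted, and a worse candidate is accepted with probability exp(delta / temperature). The function names, the fixed temperature and the one-dimensional toy fitness are assumptions made for illustration.

```python
import math
import random

def metropolis(fitness, start, propose, steps=5000, temperature=1.0, seed=0):
    """Random walk that maximises `fitness`; returns the best state seen and its fitness."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_fit = fitness(candidate)
        delta = candidate_fit - current_fit
        # Always accept improvements; accept a worse move with probability
        # exp(delta / temperature), which lets the walk leave local optima.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, candidate_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
    return best, best_fit

# Toy landscape with its peak at x = 2.
best_x, best_fitness = metropolis(
    fitness=lambda x: -(x - 2.0) ** 2,
    start=-5.0,
    propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
```

With human votes as the fitness, each call to `fitness` is expensive, so the number of steps would have to be far smaller than in this toy run.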
You're pretty much describing a genetic algorithm in which the sequence of operations represents the "gene" ("chromosome" would be a better term for this, where the parameter[s] passed to each operation represents a single "gene", and multiple genes make up a chromosome), the image produced represents the phenotypic expression of the gene, and the votes from the real humans represent the fitness function.
If I understand your question, you're looking for an alternative algorithm of some sort that will evaluate the operations and produce a "beauty" score similar to what the real humans produce. Good luck with that - I don't think there really is any such thing, and I'm not surprised that you didn't find anything. Human brains, and correspondingly human evaluations of aesthetics, are much too staggeringly complex to be reducible to a simplistic algorithm.
Interestingly, your question seems to encapsulate the bias against using real human responses as the fitness function in genetic-algorithm-based software. This is a subject of relevance to me, since my namesake software is specifically designed to use human responses (or "votes") to evaluate music produced via a genetic process.
Simple Markov Chain
Markov chains, which you mention, aren't a bad way to go. A Markov chain is just a state machine, represented as a graph with edge weights which are transition probabilities. In your case, each of your operations is a node in the graph, and the edges between the nodes represent allowable sequences of operations. Since order matters, your edges are directed. You then need three components:
A generator function to construct the graph of allowed transitions (which operations are allowed to follow one another). If any operation is allowed to follow any other, then this is easy to write: all nodes are connected, and your graph is said to be complete. You can initially set all the edge weights to 1.
A function to traverse the graph, crossing N nodes, where N is your 'gene-length'. At each node, your choice is made randomly, but proportionally weighted by the values of the edges (so better edges have a higher chance of being selected).
A weighting update function which can be used to adjust the weightings of the edges when you get feedback about an image. For example, a simple update function might be to give each edge involved in a 'pleasing' image a positive vote each time that image is nominated by a human. The weighting of each edge is then normalised, with the currently highest voted edge set to 1, and all the others correspondingly reduced.
This graph is then a simple learning network which will be refined by subsequent voting. Over time as votes accumulate, successive traversals will tend to favour the more highly rated sequences of operations, but will still occasionally explore other possibilities.
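The three components above can be sketched as follows. The operation names, the choice of a random starting node and the vote size are assumptions; the graph is complete (any operation may follow any other, including itself) and all edge weights start at 1.

```python
import random

def make_graph(operations):
    """Generator: a complete directed graph with every edge weight set to 1."""
    return {a: {b: 1.0 for b in operations} for a in operations}

def traverse(graph, length, rng):
    """Traversal: walk `length` nodes, choosing each edge in proportion to its weight."""
    node = rng.choice(list(graph))
    path = [node]
    for _ in range(length - 1):
        successors = list(graph[node])
        weights = [graph[node][s] for s in successors]
        node = rng.choices(successors, weights=weights)[0]
        path.append(node)
    return path

def reward(graph, path, vote=1.0):
    """Update: add a vote to every edge used by a liked image, then normalise."""
    for a, b in zip(path, path[1:]):
        graph[a][b] += vote
    top = max(w for edges in graph.values() for w in edges.values())
    for edges in graph.values():
        for b in edges:
            edges[b] /= top    # the highest-voted edge ends up at exactly 1

graph = make_graph(["blur", "rotate", "tint"])
sequence = traverse(graph, length=5, rng=random.Random(0))
reward(graph, ["blur", "rotate"])   # a human liked an image built as blur -> rotate
```

After the single vote, the edge blur -> rotate has weight 1 and every other edge has weight 0.5, so later traversals favour that transition without excluding the others.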
Advantages
The main advantage of this approach is that it's easy to understand and code, and makes very few assumptions about the problem space. This is good news if you don't know much about the search space (e.g. which sequences of operations are likely to be favourable).
It's also easy to analyse and debug - you can inspect the weightings at any time and very easily calculate things like the top 10 best sequences known so far, etc. This is a big advantage - other approaches are typically much harder to investigate ("why did it do that?") because of their increased abstraction. Although approaches such as a simplex crawler are very efficient, you can easily melt your brain trying to follow and debug their convergence steps!
Even if you implement a more sophisticated production algorithm, having a simple baseline algorithm is crucial for sanity checking and efficiency comparisons. It's also easy to tinker with, by messing with the update function. For example, an even more baseline approach is pure random walk, which is just a null weighting function (no weighting updates) - whatever algorithm you produce should perform significantly better than this if its existence is to be justified.
This idea of baselining is very important if you want to evaluate the quality of your algorithm's output empirically. In climate modelling, for example, a simple test is "does my fancy simulation do any better at predicting the weather than one where I simply predict today's weather will be the same as yesterday's?" Since weather is often correlated on a timescale of several days, this baseline can give surprisingly good predictions!
Limitations
One disadvantage of the approach is that it is slow to converge. A more aggressive choice of update function will push promising results faster (for example, weighting new results according to a power law, rather than the simple linear normalisation), at the cost of giving alternatives less credence.
This is equivalent to fiddling with the mutation rate and gene pool size in a genetic algorithm, or the cooling rate of a simulated annealing approach. The tradeoff between 'climbing hills or exploring the landscape' is an inescapable "twiddly knob" (free parameter) which all search algorithms must deal with, either directly or indirectly. You are trying to find the highest point in some fitness search space. Your algorithm is trying to do that in fewer tries than random inspection, by looking at the shape of the space and trying to infer something about it. If you think you're going up a hill, you can take a guess and jump further. But if it turns out to be a small hill in a bumpy landscape, then you've just missed the peak entirely.
Also note that since your fitness function is based on human responses, you are limited to a relatively small number of iterations regardless of your choice of algorithmic approach. For example, you would see the same issue with a genetic algorithm approach (fitness function limits the number of individuals and generations) or a neural network (limited training set).
A final potential limitation is that if your "gene-lengths" are long, there are many nodes, and many transitions are allowed, then the size of the graph will become prohibitive, and the algorithm impractical.

3 dimensional bin packing algorithms

I'm faced with a 3 dimensional bin packing problem and am currently conducting some preliminary research as to which algorithms/heuristics are currently yielding the best results. Since the problem is NP hard I do not expect to find the optimal solution in every case, but I was wondering:
1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources?
2) what are the best heuristic solvers?
3) What off-the-shelf solutions exist to conduct some experiments with?
As far as off the shelf solutions, check out MAXLOADPRO for loading trucks. It may be able to be configured to load any rectangular volume, but I haven't tried that yet. In general 3d bin-packing problems have the added complication that the objects can be rotated into different positions so for any object with a given length, width and height, you effectively have to create three variables representing each position, but you only use one in the solution.
In general, stand-alone MIP formulations (or branch and bound) don't work well for the 2d or 3d problem but constraint programming has met with some success producing exact solutions for the 2d problem. Check out this abstract. Without looking at the paper, I like the decomposition approach for the problem where you're trying to minimize the number of same-sized bins. I haven't seen as many results for the 3d problem, but let us know if you find any that are implementable.
Good luck !
I've written a program which tests three different algorithms. This is also a good source of information: A Thousand Ways to Pack the Bin - A Practical Approach to Two-Dimensional Rectangle Bin Packing. It covers two-dimensional rectangle bins, but you can always adapt it to 3D.
From wikipedia:
Although these simple strategies are often good enough, efficient approximation algorithms have been demonstrated that can solve the bin packing problem within any fixed percentage of the optimal solution for sufficiently large inputs
Here are the two sources they give for this:
Approximation Algorithms
Bin packing can be solved within 1 + ε in linear time
Best exact solver: Use dynamic programming.
State variables:
Items you have packed and discarded.
Space filled in the container.
If the container is a parallelepiped grid, and the items "fit" in exact cells of the grid, you can use a 3-dimensional array to represent state variable 2. Otherwise, you will have to use more complex data structures.
Best heuristic solvers
I don't know. Perhaps Variable Neighborhood Search. There are some similarities between your problem and the timetable construction problem (which I'm working on), so the same heuristic might be good for both.
Off-the-shelf solutions to conduct experiments
I'm sorry, I don't even have a clue.
Your question is similar to:
3d bin packing algorithm
However, because you disallow rotation, you can get pretty good results. I suggest looking more towards a FIRST-FIT-DECREASING solution.
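As a rough illustration of first-fit decreasing, here is a sketch for the one-dimensional case, where each item is reduced to a single size (for example its volume) and each bin to a capacity. This is a deliberate simplification: a real 3D packer must also check that the boxes fit geometrically, which this sketch does not attempt.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items largest-first, putting each into the first bin with enough room."""
    bins = []                                  # each bin is a list of item sizes
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} does not fit in any bin")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            bins.append([size])                # no existing bin fits: open a new one
    return bins

packing = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
# → [[8, 2], [4, 4, 1, 1]]
```

Sorting by decreasing size first means the awkward large items are placed while bins are still empty, and the small items fill the remaining gaps.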
3dbinpacking is a commercial solution (not an algorithm) exposing an API to consume with nice visualization. It offers:
Single bin packing
Multi bin packing
Find third dimension
Find bin dimensions
