Crossover operator for permutations - algorithm

I'm trying to solve the problem of crossover in a genetic algorithm on permutations.
Let's say I have two permutations of 20 integers. I want to cross them over to get two children. Both parents contain the same integers, but in a different order.
Example:
Parent1:
5 12 60 50 42 21 530 999 112 234 15 152 601 750 442 221 30 969 113 134
Parent2:
12 750 42 113 530 112 5 234 15 60 152 601 999 442 221 50 30 969 134 21
Given these two parents, how can I get their children?

What you are looking for is ordered crossover. There is an explanation for the Travelling Salesman Problem here.
Here is some Java code that implements the partially mapped crossover (PMX) variant.
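That code isn't reproduced here, but to give a flavor of how these operators work, below is a minimal Java sketch of the order crossover (OX1) mentioned above, producing one child; the class/method names and random cut points are my own choices, not the linked PMX implementation. The second child is obtained by swapping the roles of the parents.

import java.util.*;

// Hedged sketch of one-child order crossover (OX1); cut points are chosen at random.
class OrderCrossover {
    static int[] crossover(int[] p1, int[] p2, Random rnd) {
        int n = p1.length;
        int a = rnd.nextInt(n), b = rnd.nextInt(n);
        int lo = Math.min(a, b), hi = Math.max(a, b);

        int[] child = new int[n];
        Arrays.fill(child, -1);
        Set<Integer> taken = new HashSet<>();

        // Copy the slice [lo, hi] from parent 1 unchanged.
        for (int i = lo; i <= hi; i++) {
            child[i] = p1[i];
            taken.add(p1[i]);
        }
        // Fill the remaining positions with parent 2's genes in their relative order.
        int pos = (hi + 1) % n;
        for (int k = 0; k < n; k++) {
            int gene = p2[(hi + 1 + k) % n];
            if (!taken.contains(gene)) {
                child[pos] = gene;
                pos = (pos + 1) % n;
            }
        }
        return child;
    }
}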

The choice of crossover depends on whether the order or the absolute position of the integers is important to the fitness. In HeuristicLab (C#) we have implemented several popular ones found in the literature which include: OrderCrossover (2 variants), OrderBasedCrossover, PartiallyMatchedCrossover, CyclicCrossover (2 variants), EdgeRecombinationCrossover (ERX), MaximalPreservativeCrossover, PositionBasedCrossover and UniformLikeCrossover. Their implementation can be found together with reference to a scientific source in the HeuristicLab.Encodings.PermutationEncoding plugin. The ERX makes sense only for the TSP or TSP-like problems. The CX is position-based, the PMX is partly position partly order based, but more towards position. The OX is solely order based.
Beware that our implementations assume a contiguously numbered permutation with integers from 0 to N-1. You have to map your values to this range first.
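As an illustration only (my own helper, not part of HeuristicLab), such a mapping could look like this:

import java.util.*;

// Hypothetical helper: map arbitrary distinct integers to a 0..N-1 permutation and back.
class PermutationMapping {
    static int[] toIndexPermutation(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        Map<Integer, Integer> rank = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) rank.put(sorted[i], i);
        int[] perm = new int[values.length];
        for (int i = 0; i < values.length; i++) perm[i] = rank.get(values[i]);
        return perm;   // e.g. {5, 12, 60, 50} -> {0, 1, 3, 2}
    }

    static int[] toValues(int[] perm, int[] sortedValues) {
        int[] out = new int[perm.length];
        for (int i = 0; i < perm.length; i++) out[i] = sortedValues[perm[i]];
        return out;
    }
}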

Based on my research and implementations of genetic operators: many types of crossover operators exist for the order coding (i.e. repetition of genes is not allowed, as in the TSP). In general, I like to think that there are two main families:
The ERX-family
A neighborhood list is used to store the neighbors of each node in both parents. The child is then generated using only this list. ERX is known to be more respectful and allele-transmitting, which basically means that the links between genes are not likely to be broken.
Examples of ERX-like operators include: Edge Recombination (ERX), Edge-2, Edge-3, Edge-4, and Generalized Partition Crossover (GPX).
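As an illustration (my own sketch, not taken from any particular paper), building the combined neighborhood list that ERX works from could look like this in Java:

import java.util.*;

// Sketch: build the ERX neighborhood list (each gene's neighbors in both parents, tours treated as cycles).
class EdgeRecombination {
    static Map<Integer, Set<Integer>> neighborLists(int[] p1, int[] p2) {
        Map<Integer, Set<Integer>> nb = new HashMap<>();
        for (int[] parent : new int[][] { p1, p2 }) {
            int n = parent.length;
            for (int i = 0; i < n; i++) {
                int gene = parent[i];
                int left = parent[(i - 1 + n) % n];
                int right = parent[(i + 1) % n];
                nb.computeIfAbsent(gene, k -> new HashSet<>()).add(left);
                nb.get(gene).add(right);
            }
        }
        return nb;
    }
}

A child is then grown by starting at some node and repeatedly moving to the current node's neighbor with the fewest remaining neighbors, removing visited nodes from all lists.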
OX-like crossovers
Two crossover points are chosen. Then, the genes between the points are exchanged between the two parents. Since repetitions are not allowed, each crossover operator proposes its own technique to avoid or eliminate repetitions. These crossover operators are more disruptive than ERX.
Examples of OX-like crossovers: Order Crossover (OX), Maximal Preservative Crossover (MPX), and Partially Mapped Crossover (PMX).
The first family (ERX) performs better in plain genetic algorithms, while the second family is more suited to hybrid genetic algorithms or memetic algorithms (which use local search). This paper explains it in detail.

In the Traveling Salesman Problem (TSP), you want an order in which to visit a list of cities, and you want to visit each city exactly once. If you encode the cities directly in the genome, then a naive crossover or mutation will often generate an invalid itinerary.
I once came up with a novel approach to solving this problem: Instead of encoding the solution directly in the genome, I instead encoded a transformation that would re-order a canonical list of values.
Given the genome [1, 2, 4, 3, 2, 4, 1, 3], you'd start with the list of cities in some arbitrary order, say alphabetical:
Atlanta
Boston
Chicago
Denver
You'd then take each pair of values from the genome and swap the cities in those positions. So, for the genome above, you'd swap the cities in positions 1 and 2, then those in positions 4 and 3, then those in 2 and 4, and finally those in 1 and 3. You'd end up with:
Denver
Chicago
Boston
Atlanta
With this technique, you can use any type of crossover or mutation operation and still always get a valid tour. If the genome is long enough, the entire solution space can be explored.
I've used this for TSP and other optimization problems with lots of success.
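As a rough sketch of the decoding step (the city list, class name, and helper are just for illustration), the transformation could be applied like this:

import java.util.*;

// Decode a "swap sequence" genome into a tour by applying successive swaps to a canonical city list.
class SwapGenomeDecoder {
    static List<String> decode(int[] genome, List<String> canonical) {
        List<String> tour = new ArrayList<>(canonical);
        // Genome positions are 1-based pairs: (1,2), (4,3), (2,4), (1,3) in the example above.
        for (int i = 0; i + 1 < genome.length; i += 2) {
            Collections.swap(tour, genome[i] - 1, genome[i + 1] - 1);
        }
        return tour;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("Atlanta", "Boston", "Chicago", "Denver");
        int[] genome = {1, 2, 4, 3, 2, 4, 1, 3};
        System.out.println(decode(genome, cities)); // [Denver, Chicago, Boston, Atlanta]
    }
}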

Related

Donald Knuth Algorithm Mastermind

I'm working on a Mastermind game that implements the Donald Knuth algorithm. The first five steps are clear: I have to create the set of every possible answer, use 1122 as my first guess, compare each possible answer in the set to 1122, and then remove any of the possible answers that do not return the same feedback as the current guess. The problem now lies in determining the next guess, i.e. how I'm supposed to implement step 6. The algorithm is shown below.
Mastermind-Five-Guess-Algorithm: Donald Knuth's five-guess algorithm for solving the game Mastermind.
In 1977, Donald Knuth demonstrated that the codebreaker can solve the pattern in five moves or fewer, using an algorithm that progressively reduces the number of possible patterns.
The algorithm works as follows:
1. Create the set S of 1296 possible codes (1111, 1112, ..., 6665, 6666).
2. Start with initial guess 1122 (Knuth gives examples showing that other first guesses, such as 1123 or 1234, do not win in five tries on every code).
3. Play the guess to get a response of colored and white pegs.
4. If the response is four colored pegs, the game is won and the algorithm terminates.
5. Otherwise, remove from S any code that would not give the same response if the current guess were the code. For example, if your current guess is 1122 and you get a response of BW: if the code were 1111 you would get two black pegs (BB) with a guess of 1122, which is not the same as one black peg and one white peg (BW), so remove 1111 from the list of potential solutions.
F(1122, 1112) = BBB ≠ BW → remove 1112 from S
F(1122, 1113) = BB ≠ BW → remove 1113 from S
F(1122, 1114) = BB ≠ BW → remove 1114 from S
F(1122, 1314) = BW = BW → keep 1314 in S
6. Apply the minimax technique to find the next guess as follows: for each possible guess (that is, any unused code of the 1296, not just those in S), calculate how many possibilities in S would be eliminated for each possible colored/white peg score. The score of a guess is the minimum number of possibilities it might eliminate from S. A single loop through S for each unused code of the 1296 will provide a 'hit count' for each of the possible colored/white peg scores. Create the set of guesses with the smallest max score (hence minimax). From that set, select one as the next guess, choosing a member of S whenever possible. Knuth follows the convention of choosing the guess with the least numeric value, e.g. 2345 is lower than 3456. Knuth also gives an example showing that in some cases no member of S will be among the highest-scoring guesses, and thus the guess cannot win on the next turn, yet will be necessary to assure a win in five.
7. Repeat from step 3.
Link to Wikipedia page
Take the set of untried codes, and call it T.
Iterate over T, considering each code as a guess, g.
For each g, iterate over T again considering each code as a possible true hidden code, c.
Calculate the black-white peg score produced by guessing g if the real code is c. Call it s.
Keep a little table of possible scores, and as you iterate over the possible c, keep track of how many codes produce each score. That is, how many choices of c produce two-blacks-one-white, how many produce two-blacks-two-whites, and so on.
When you've considered all possible codes (for that g) consider the score that came up the most often. You might call that the least informative possible result of guessing g. That is g's score; the lower it is, the better.
As you iterate over g, keep track of the guess with the lowest score. That's the guess to make.
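Translating that description into code, a rough Java sketch of the scoring and minimax step (the feedback encoding, the string set of tried guesses, and the list-based representation of the candidate set S are my own assumptions, not Knuth's code) could look like this:

import java.util.*;

// Sketch of Knuth's step 6: choose the guess whose most frequent (least informative) feedback
// over the remaining candidate set S is as small as possible.
class MastermindMinimax {
    // Feedback encoded as black * 10 + white, e.g. one black and one white peg -> 11.
    static int score(int[] guess, int[] code) {
        int black = 0;
        int[] guessLeft = new int[7], codeLeft = new int[7]; // colors 1..6
        for (int i = 0; i < 4; i++) {
            if (guess[i] == code[i]) black++;
            else { guessLeft[guess[i]]++; codeLeft[code[i]]++; }
        }
        int white = 0;
        for (int c = 1; c <= 6; c++) white += Math.min(guessLeft[c], codeLeft[c]);
        return black * 10 + white;
    }

    // allCodes: all 1296 codes; candidates: the current set S; tried: string forms of guesses already played.
    static int[] nextGuess(List<int[]> allCodes, List<int[]> candidates, Set<String> tried) {
        int[] best = null;
        int bestWorst = Integer.MAX_VALUE;
        boolean bestInS = false;
        for (int[] g : allCodes) {
            if (tried.contains(Arrays.toString(g))) continue;
            Map<Integer, Integer> hits = new HashMap<>();   // feedback -> number of codes in S producing it
            for (int[] c : candidates) hits.merge(score(g, c), 1, Integer::sum);
            int worst = Collections.max(hits.values());     // size of the largest surviving group
            boolean gInS = inSet(candidates, g);
            // Minimize the worst case; among ties, prefer a guess that is still a possible solution.
            if (worst < bestWorst || (worst == bestWorst && gInS && !bestInS)) {
                best = g; bestWorst = worst; bestInS = gInS;
            }
        }
        return best;
    }

    static boolean inSet(List<int[]> set, int[] code) {
        for (int[] c : set) if (Arrays.equals(c, code)) return true;
        return false;
    }
}

If allCodes is generated in ascending numeric order, ties are broken in favor of the numerically smallest guess, with members of S preferred, roughly matching Knuth's convention.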

Formal name for this optimization algorithm?

I have the following problem in one of my coding projects, which I will simplify here:
I am ordering groceries online and want very specific things in very specific quantities. I would like to order the following:
8 Apples
1 Yam
2 Soups
3 Steaks
20 Orange Juices
There are many stores equidistant from me that I can have food delivered from. Not all stores have what I need. I want to obtain what I need with the fewest number of orders. For example, ordering from Store #2 below is a wasted order, since I can complete my items in fewer orders by ordering from different stores. What is the name of the optimization algorithm that solves this?
Store #1 Supply
50 Apples
Store #2 Supply
1 Orange Juice
2 Steaks
1 Soup
Store #3 Supply
25 Soup
50 Orange Juices
Store #4 Supply
25 Steaks
10 Yams
The lowest possible number of orders is 3 in this case: 8 Apples from Store #1; 2 Soups and 20 Orange Juices from Store #3; 1 Yam and 3 Steaks from Store #4.
To me, this most likely sounds like a restricted case of the Integer Linear programming problem (ILP), namely, its 0-or-1 variant, where the integer variables are restricted to the set {0, 1}. This is known to be NP-hard (and the corresponding decision problem is NP-complete).
The problem is formulated as follows (following the conventions in the op. cit.):
Given the matrix A, the constraint vector b, and the weight vector c, find the vector x ∈ {0, 1}^N such that all the constraints A⋅x ≥ b are satisfied, and the cost c⋅x is minimal.
I flipped the constraint inequality, but this is equivalent to changing the sign of both A and b.
The inequalities express satisfaction of your order: that you can buy at least the required amount of every item from the visited stores. Note that b has the same length as the number of rows of A (one per item), while c and x have length equal to the number of columns of A (one per store). The dot product c⋅x is, naturally, a scalar.
Since you are minimizing the number of trips and each trip costs the same, c is the all-ones vector, and c⋅x is the total number of trips. The store inventory matrix A has a row per item and a column per store, and b is your shopping list.
Naturally, the exact best solution is found by trying all possible 2^N values for x.
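For a small number of stores, that exhaustive search over all 2^N subsets is easy to write down; here is a rough Java sketch (the map-based representation of the inventories and the shopping list is my own assumption):

import java.util.*;

// Exhaustive 2^N search: find the smallest set of stores whose combined supply covers the shopping list.
class FewestOrders {
    static List<Integer> fewestStores(List<Map<String, Integer>> stores, Map<String, Integer> wanted) {
        int n = stores.size();
        List<Integer> best = null;
        for (int mask = 0; mask < (1 << n); mask++) {
            Map<String, Integer> supply = new HashMap<>();
            List<Integer> chosen = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(i);
                    stores.get(i).forEach((item, qty) -> supply.merge(item, qty, Integer::sum));
                }
            }
            boolean covers = wanted.entrySet().stream()
                    .allMatch(e -> supply.getOrDefault(e.getKey(), 0) >= e.getValue());
            if (covers && (best == null || chosen.size() < best.size())) best = chosen;
        }
        return best;   // null if even all stores together cannot fill the order
    }
}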
Since there is no single approach to NP-hard problems, consider the problem size and how close to the optimum you want to get. A greedy approach (always visit next the store that covers the largest number of still-unsatisfied items) works well when the "inventories" are large. If you have an idea in advance of the expected minimum number of trips, you can trim the search beam at some value exceeding that number by some multiplicative factor. This is the best approach when your search is time-constrained (I routinely run beam searches, closely related to the branch-and-cut approach mentioned in the article, in graphs that take a few GB of memory, slightly faster than the limit of 30 ms per exploration step with a beam as wide as 10,000). Simulated annealing also works, if the search landscape is not excessively rough.
Also consider asking on cs.SE; it may be an even better place for questions of this type.

Quick Sort Algorithms - Many different ways of doing the same thing?

Am I correct in saying that there would be many ways to perform a Quick Sort?
For argument's sake, let's use the first textbook's numbers:
20 47 12 53 32 84 85 96 45 18
This book says to swap the 18 and 20 (in the book the 20 is red and the 18 is blue, so I've bolded the 20).
Basically it keeps moving the blue pointer until the numbers are:
18 12 20 53 32 84 85 96 45 47
Now it says (and this is obvious to me) that all the numbers to the left of the 20 are less than it and all of the numbers to the right are greater than it, but it never names the 20 as a "pivot", which is how most other resources talk about it. Then, as all other methods state, it does a quick sort on the two sides, and we end up with (it only covers sorting the right half of the list):
47 32 45 53 96 85 84, and there the book ends. Now I know from the other resources that once all of the sub-lists are in order they are put back together. I guess I understand this, but I am constantly confused by the fact that the one "Cambridge approved" textbook differs from the second one, which talks about finding a pivot by picking the median.
What's the best way to find a "pivot" for a list?
What is given in your textbook is the same pivot-based concept; it just doesn't mention that terminology. Either way, the concepts are the same.
What's the best way to find a "pivot" for a list?
There's no fixed way of selecting the pivot element. You can select any element of the array (first, second, last, etc.), or it can be chosen at random.
But researchers generally talk about the median element, i.e. the middle element of the sorted order, for symmetry reasons: it keeps the partitions balanced and thereby reduces the number of recursive calls.
It is almost obvious that when you select the first or the last element of the array, the partitions can become very unbalanced, leading to more recursive calls and moving you closer to the worst-case scenario, since quicksort is then invoked separately on two very uneven partitions.
Theoretically, choosing the median element as the pivot guarantees the smallest number of recursive calls and a Theta(n log n) running time.
However, finding the median requires a selection algorithm, and if you want to guarantee that selection takes linear time, you need the median-of-medians algorithm, which has poor constant factors.
If you choose the first (or last) element as the pivot, you are guaranteed poor performance on sorted or almost-sorted arrays, which are quite likely inputs in many applications, so choosing the first/last element of the array is actually a bad idea.
A good, solid solution is to choose the pivot at random: draw a random index r = rand([0, length(array))) and choose the r-th element as your pivot.
While there is still a theoretical possibility of hitting the worst case here, it is:
Very unlikely
Hard for a malicious user to exploit: predicting the worst-case input is difficult, especially if the random function and/or seed is unknown to them.
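For completeness, a minimal Java sketch of quicksort with such a random pivot (class and method names are mine) could look like this; call it as sort(array, 0, array.length - 1):

import java.util.Random;

// Quicksort with a uniformly random pivot, which makes the Theta(n^2) worst case very unlikely.
class RandomPivotQuickSort {
    private static final Random RND = new Random();

    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivotIndex = lo + RND.nextInt(hi - lo + 1);  // random pivot position in [lo, hi]
        swap(a, pivotIndex, hi);                         // move pivot to the end (Lomuto partition)
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);                              // pivot lands in its final position
        sort(a, lo, store - 1);
        sort(a, store + 1, hi);
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}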

find the combination of records that sum up to given limits

Given a set of records that contain only integer elements, find one (or all) combinations of records whose element-wise sums fall within given limits. For example, suppose the records represent fruits and their characteristics (elements) are the vitamins A, B and C:
Apple - A=10, B=5, C=15
Orange - A=1, B=20, C=14
Banana - A=4, B=9, C=5
And the limits are
For A - 13 to 15
For B - 10 to 15
For C - 20 to 25
In this case the combination that fulfills the limits would be Apple and Banana. Is there an algorithm that works better than brute force?
This is an integer linear program (with constant objective function).
Solving problems like this is NP-hard in general, but there are solvers that can handle practical problems (even quite large ones) efficiently.
One such solver is GLPK.
Finding algorithms to solve ILPs efficiently is an active research area. There's some information on the Wikipedia page for Integer Programming.
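If a full ILP solver is overkill for your problem size, even a simple recursive pick-or-skip search with pruning on the upper limits beats naive enumeration of all subsets; here is a rough Java sketch (the record/limit layout is my own assumption, and it relies on the element values being non-negative):

import java.util.*;

// Recursive search: take or skip each record, pruning as soon as any element exceeds its upper limit.
// Assumes all element values are non-negative, so exceeding an upper bound can never be undone later.
class LimitCombinationSearch {
    static boolean search(int[][] records, int[] lower, int[] upper,
                          int index, int[] sum, List<Integer> chosen) {
        for (int j = 0; j < sum.length; j++) if (sum[j] > upper[j]) return false; // prune
        boolean withinLower = true;
        for (int j = 0; j < sum.length; j++) if (sum[j] < lower[j]) { withinLower = false; break; }
        if (withinLower) {
            System.out.println("Solution: records " + chosen);
            return true;                       // stop at the first solution; remove to enumerate all
        }
        if (index == records.length) return false;

        // Option 1: take record 'index'.
        for (int j = 0; j < sum.length; j++) sum[j] += records[index][j];
        chosen.add(index);
        if (search(records, lower, upper, index + 1, sum, chosen)) return true;
        chosen.remove(chosen.size() - 1);
        for (int j = 0; j < sum.length; j++) sum[j] -= records[index][j];

        // Option 2: skip it.
        return search(records, lower, upper, index + 1, sum, chosen);
    }

    public static void main(String[] args) {
        int[][] fruits = { {10, 5, 15}, {1, 20, 14}, {4, 9, 5} };   // Apple, Orange, Banana
        search(fruits, new int[] {13, 10, 20}, new int[] {15, 15, 25},
               0, new int[3], new ArrayList<>());                   // prints records [0, 2]: Apple and Banana
    }
}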

Find electric circuit with given resistance

I need some help with the following problem:
Given a set of resistors, I need to construct a circuit with a given total resistance (i.e. we choose some of the resistors and build a circuit from them). Only parallel and series connections are allowed. So the formal definition of such a circuit is the following:
Circuit = Resistance | (Sequential (Circuit) (Circuit)) | (Parallel (Circuit) (Circuit))
The total number of circuits with N unlabeled resistors (where all resistors are used) is A000084 (Thanks Axel Kemper). But in my case resistors are labeled and I don't know how to check all circuits efficiently.
The number of resistors is about 15; is it possible to solve this problem?
UPD: Resistors may have different resistances. And of course some target resistances can't be achieved; in such cases we just say that there is no solution.
Integer sequence A000084 lists the Number of series-parallel networks with n unlabeled edges. Also called yoke-chains by Cayley and MacMahon. MacMahon's paper is online.
The first 15 elements of the sequence:
1, 2, 4, 10, 24, 66, 180, 522, 1532, 4624, 14136, 43930, 137908, 437502, 1399068
If the resistors have different resistance values, they are not "unlabeled".
The number of different overall-resistances is less than the number of networks.
Looking at the numbers, brute-force enumeration is probably feasible for moderate values of n.
It is not possible to match every conceivable total resistance exactly. As mentioned in a comment, the number of 15 resistors might be too small to reach the required value. Another example: if all 15 resistors are 1 ohm each, the total resistance cannot be smaller than 1/15 ohm.
Look on page 70 of Analytic Combinatorics for an illustration of the equivalence between a tree, a bracketed expression, and a series-parallel graph.
As mentioned in one of the comments, a search procedure like A* could be used to search the space of possible trees. The tree representation of the series-parallel network is also useful for determining the source-to-sink resistance with a simple recursive function.
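For example, a sketch of that recursive evaluation over the circuit tree (type names are my own) could be:

// Evaluate the total resistance of a series-parallel circuit represented as a tree.
abstract class Circuit {
    abstract double resistance();
}

class Resistor extends Circuit {
    final double ohms;
    Resistor(double ohms) { this.ohms = ohms; }
    double resistance() { return ohms; }
}

class Series extends Circuit {
    final Circuit a, b;
    Series(Circuit a, Circuit b) { this.a = a; this.b = b; }
    double resistance() { return a.resistance() + b.resistance(); }   // R = R1 + R2
}

class Parallel extends Circuit {
    final Circuit a, b;
    Parallel(Circuit a, Circuit b) { this.a = a; this.b = b; }
    double resistance() {                                             // R = R1*R2 / (R1 + R2)
        double r1 = a.resistance(), r2 = b.resistance();
        return (r1 * r2) / (r1 + r2);
    }
}

For instance, new Parallel(new Resistor(1), new Series(new Resistor(1), new Resistor(1))).resistance() evaluates to 2/3 ohm, and enumerating labeled circuits amounts to enumerating such trees over subsets of the given resistors.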
