Crossover algorithm for Travelling Salesman? - algorithm

I am looking for a crossover algorithm for a genetic algorithm for the Travelling Salesman Problem.
However, my TSP problem is a variation of the traditional problem:
Instead of only being given a list of points that we need to visit, we are given a list points that we need to visit and a list of points we need to start at and end at. In other words, any route must start and end at any point belonging to the second list, but must visit all the points in the first list.
So in other words, not every permutation of points is valid. Because of this, I'm not sure if traditional crossover algorithms will work well (for instance, I tried ordered crossover and the children it created were generally worse than its parents).
Can anyone suggest a crossover algorithm?

In order to maintain your crossover and mutation operators, you can add a repair operator that checks if the position of your starting and ending locations are correct and if they aren't, swap them into their valid positions (initial and ending).

Related

A* functionality

I'm just after some clarification how A* for path finding should operate in the [case] of two paths having equal value, both during the calculation and also at the end if there are two equal short paths.
For example I'm at my start node and there are two possible nodes I can expand to, but they both have the same f(x). Do they both get expanded and in which order?
What happens if at the end of the search there are two equal shortest paths?
In both cases you just pick an arbitrary one. Note that A* finds one of the shortest paths, not all of them, and no specific way of resolving ambiguities like those that you described is necessary for it to work.
If you build your own implementation of A* you get to decide exactly how such cases would be handled. Typically, whichever of the equal shortest paths is current at the moment when the algorithm determines that all remaining paths are at least as costly will be returned as the shortest path.
In my game program (on hex grids) I use have two separate implementations of A*. One for short distances (and without road movement) uses a vector product as tie-breaker between equal paths, that chooses the one which is visually more direct. The one for longer distances enables road movement and ignores the refinement above, but uses a more complicated heuristic that is much more efficient over long ranges.
There are many questions on Game Development StackExchange that address various refinements of the A* algorithm.
A good heuristic will evaluate on many conditions (add probabilistic conditions such as risk, cost, utility management, rationality of the operation, etc) and thus will minimize the number of shortest path.
However, if there are still multiple paths on the fringe (i.e. array of expandable nodes), a simple A* will pick up arbitrarily.

Generating a tower defense maze (longest maze with limited walls) - near-optimal heuristic?

In a tower defense game, you have an NxM grid with a start, a finish, and a number of walls.
Enemies take the shortest path from start to finish without passing through any walls (they aren't usually constrained to the grid, but for simplicity's sake let's say they are. In either case, they can't move through diagonal "holes")
The problem (for this question at least) is to place up to K additional walls to maximize the path the enemies have to take. For example, for K=14
My intuition tells me this problem is NP-hard if (as I'm hoping to do) we generalize this to include waypoints that must be visited before moving to the finish, and possibly also without waypoints.
But, are there any decent heuristics out there for near-optimal solutions?
[Edit] I have posted a related question here.
I present a greedy approach and it's maybe close to the optimal (but I couldn't find approximation factor). Idea is simple, we should block the cells which are in critical places of the Maze. These places can help to measure the connectivity of maze. We can consider the vertex connectivity and we find minimum vertex cut which disconnects the start and final: (s,f). After that we remove some critical cells.
To turn it to the graph, take dual of maze. Find minimum (s,f) vertex cut on this graph. Then we examine each vertex in this cut. We remove a vertex its deletion increases the length of all s,f paths or if it is in the minimum length path from s to f. After eliminating a vertex, recursively repeat the above process for k time.
But there is an issue with this, this is when we remove a vertex which cuts any path from s to f. To prevent this we can weight cutting node as high as possible, means first compute minimum (s,f) cut, if cut result is just one node, make it weighted and set a high weight like n^3 to that vertex, now again compute the minimum s,f cut, single cutting vertex in previous calculation doesn't belong to new cut because of waiting.
But if there is just one path between s,f (after some iterations) we can't improve it. In this case we can use normal greedy algorithms like removing node from a one of a shortest path from s to f which doesn't belong to any cut. after that we can deal with minimum vertex cut.
The algorithm running time in each step is:
min-cut + path finding for all nodes in min-cut
O(min cut) + O(n^2)*O(number of nodes in min-cut)
And because number of nodes in min cut can not be greater than O(n^2) in very pessimistic situation the algorithm is O(kn^4), but normally it shouldn't take more than O(kn^3), because normally min-cut algorithm dominates path finding, also normally path finding doesn't takes O(n^2).
I guess the greedy choice is a good start point for simulated annealing type algorithms.
P.S: minimum vertex cut is similar to minimum edge cut, and similar approach like max-flow/min-cut can be applied on minimum vertex cut, just assume each vertex as two vertex, one Vi, one Vo, means input and outputs, also converting undirected graph to directed one is not hard.
it can be easily shown (proof let as an exercise to the reader) that it is enough to search for the solution so that every one of the K blockades is put on the current minimum-length route. Note that if there are multiple minimal-length routes then all of them have to be considered. The reason is that if you don't put any of the remaining blockades on the current minimum-length route then it does not change; hence you can put the first available blockade on it immediately during search. This speeds up even a brute-force search.
But there are more optimizations. You can also always decide that you put the next blockade so that it becomes the FIRST blockade on the current minimum-length route, i.e. you work so that if you place the blockade on the 10th square on the route, then you mark the squares 1..9 as "permanently open" until you backtrack. This saves again an exponential number of squares to search for during backtracking search.
You can then apply heuristics to cut down the search space or to reorder it, e.g. first try those blockade placements that increase the length of the current minimum-length route the most. You can then run the backtracking algorithm for a limited amount of real-time and pick the best solution found thus far.
I believe we can reduce the contained maximum manifold problem to boolean satisifiability and show NP-completeness through any dependency on this subproblem. Because of this, the algorithms spinning_plate provided are reasonable as heuristics, precomputing and machine learning is reasonable, and the trick becomes finding the best heuristic solution if we wish to blunder forward here.
Consider a board like the following:
..S........
#.#..#..###
...........
...........
..........F
This has many of the problems that cause greedy and gate-bound solutions to fail. If we look at that second row:
#.#..#..###
Our logic gates are, in 0-based 2D array ordered as [row][column]:
[1][4], [1][5], [1][6], [1][7], [1][8]
We can re-render this as an equation to satisfy the block:
if ([1][9] AND ([1][10] AND [1][11]) AND ([1][12] AND [1][13]):
traversal_cost = INFINITY; longest = False # Infinity does not qualify
Excepting infinity as an unsatisfiable case, we backtrack and rerender this as:
if ([1][14] AND ([1][15] AND [1][16]) AND [1][17]:
traversal_cost = 6; longest = True
And our hidden boolean relationship falls amongst all of these gates. You can also show that geometric proofs can't fractalize recursively, because we can always create a wall that's exactly N-1 width or height long, and this represents a critical part of the solution in all cases (therefore, divide and conquer won't help you).
Furthermore, because perturbations across different rows are significant:
..S........
#.#........
...#..#....
.......#..#
..........F
We can show that, without a complete set of computable geometric identities, the complete search space reduces itself to N-SAT.
By extension, we can also show that this is trivial to verify and non-polynomial to solve as the number of gates approaches infinity. Unsurprisingly, this is why tower defense games remain so fun for humans to play. Obviously, a more rigorous proof is desirable, but this is a skeletal start.
Do note that you can significantly reduce the n term in your n-choose-k relation. Because we can recursively show that each perturbation must lie on the critical path, and because the critical path is always computable in O(V+E) time (with a few optimizations to speed things up for each perturbation), you can significantly reduce your search space at a cost of a breadth-first search for each additional tower added to the board.
Because we may tolerably assume O(n^k) for a deterministic solution, a heuristical approach is reasonable. My advice thus falls somewhere between spinning_plate's answer and Soravux's, with an eye towards machine learning techniques applicable to the problem.
The 0th solution: Use a tolerable but suboptimal AI, in which spinning_plate provided two usable algorithms. Indeed, these approximate how many naive players approach the game, and this should be sufficient for simple play, albeit with a high degree of exploitability.
The 1st-order solution: Use a database. Given the problem formulation, you haven't quite demonstrated the need to compute the optimal solution on the fly. Therefore, if we relax the constraint of approaching a random board with no information, we can simply precompute the optimum for all K tolerable for each board. Obviously, this only works for a small number of boards: with V! potential board states for each configuration, we cannot tolerably precompute all optimums as V becomes very large.
The 2nd-order solution: Use a machine-learning step. Promote each step as you close a gap that results in a very high traversal cost, running until your algorithm converges or no more optimal solution can be found than greedy. A plethora of algorithms are applicable here, so I recommend chasing the classics and the literature for selecting the correct one that works within the constraints of your program.
The best heuristic may be a simple heat map generated by a locally state-aware, recursive depth-first traversal, sorting the results by most to least commonly traversed after the O(V^2) traversal. Proceeding through this output greedily identifies all bottlenecks, and doing so without making pathing impossible is entirely possible (checking this is O(V+E)).
Putting it all together, I'd try an intersection of these approaches, combining the heat map and critical path identities. I'd assume there's enough here to come up with a good, functional geometric proof that satisfies all of the constraints of the problem.
At the risk of stating the obvious, here's one algorithm
1) Find the shortest path
2) Test blocking everything node on that path and see which one results in the longest path
3) Repeat K times
Naively, this will take O(K*(V+ E log E)^2) but you could with some little work improve 2 by only recalculating partial paths.
As you mention, simply trying to break the path is difficult because if most breaks simply add a length of 1 (or 2), its hard to find the choke points that lead to big gains.
If you take the minimum vertex cut between the start and the end, you will find the choke points for the entire graph. One possible algorithm is this
1) Find the shortest path
2) Find the min-cut of the whole graph
3) Find the maximal contiguous node set that intersects one point on the path, block those.
4) Wash, rinse, repeat
3) is the big part and why this algorithm may perform badly, too. You could also try
the smallest node set that connects with other existing blocks.
finding all groupings of contiguous verticies in the vertex cut, testing each of them for the longest path a la the first algorithm
The last one is what might be most promising
If you find a min vertex cut on the whole graph, you're going to find the choke points for the whole graph.
Here is a thought. In your grid, group adjacent walls into islands and treat every island as a graph node. Distance between nodes is the minimal number of walls that is needed to connect them (to block the enemy).
In that case you can start maximizing the path length by blocking the most cheap arcs.
I have no idea if this would work, because you could make new islands using your points. but it could help work out where to put walls.
I suggest using a modified breadth first search with a K-length priority queue tracking the best K paths between each island.
i would, for every island of connected walls, pretend that it is a light. (a special light that can only send out horizontal and vertical rays of light)
Use ray-tracing to see which other islands the light can hit
say Island1 (i1) hits i2,i3,i4,i5 but doesn't hit i6,i7..
then you would have line(i1,i2), line(i1,i3), line(i1,i4) and line(i1,i5)
Mark the distance of all grid points to be infinity. Set the start point as 0.
Now use breadth first search from the start. Every grid point, mark the distance of that grid point to be the minimum distance of its neighbors.
But.. here is the catch..
every time you get to a grid-point that is on a line() between two islands, Instead of recording the distance as the minimum of its neighbors, you need to make it a priority queue of length K. And record the K shortest paths to that line() from any of the other line()s
This priority queque then stays the same until you get to the next line(), where it aggregates all priority ques going into that point.
You haven't showed the need for this algorithm to be realtime, but I may be wrong about this premice. You could then precalculate the block positions.
If you can do this beforehand and then simply make the AI build the maze rock by rock as if it was a kind of tree, you could use genetic algorithms to ease up your need for heuristics. You would need to load any kind of genetic algorithm framework, start with a population of non-movable blocks (your map) and randomly-placed movable blocks (blocks that the AI would place). Then, you evolve the population by making crossovers and transmutations over movable blocks and then evaluate the individuals by giving more reward to the longest path calculated. You would then simply have to write a resource efficient path-calculator without the need of having heuristics in your code. In your last generation of your evolution, you would take the highest-ranking individual, which would be your solution, thus your desired block pattern for this map.
Genetic algorithms are proven to take you, under ideal situation, to a local maxima (or minima) in reasonable time, which may be impossible to reach with analytic solutions on a sufficiently large data set (ie. big enough map in your situation).
You haven't stated the language in which you are going to develop this algorithm, so I can't propose frameworks that may perfectly suit your needs.
Note that if your map is dynamic, meaning that the map may change over tower defense iterations, you may want to avoid this technique since it may be too intensive to re-evolve an entire new population every wave.
I'm not at all an algorithms expert, but looking at the grid makes me wonder if Conway's game of life might somehow be useful for this. With a reasonable initial seed and well-chosen rules about birth and death of towers, you could try many seeds and subsequent generations thereof in a short period of time.
You already have a measure of fitness in the length of the creeps' path, so you could pick the best one accordingly. I don't know how well (if at all) it would approximate the best path, but it would be an interesting thing to use in a solution.

Optimization from partial solution: minimize sum of distances between pairs

I have a problem which I like and I love to think about solutions, but I'm stuck unfortunately. I hope you like it too. The problem states:
I have two lists of 2D points(say A and B) and need to pair up points from A with points from B, under the condition that the sum of the distances in all pairs is minimal. A pair contains one point from A and one from B, a point can be used only once, and as many as possible pairs should be created(i.e. min(length(A), length(B))).
I've made a simple example, where color denotes which list the point is from, and the black connections are the solution.
Although this is a nice problem and I suspect is NP-hard, it gets nicer. I can build on existing solutions. Suppose I have two lists and the corresponding solution(i.e. the set of pairs), then the problem I need to solve is to reoptimalize that solution when a point is added to or removed from either list.
I've unfortunately not been able to come up with any non-brute force algorithm yielding the optimal solution. I hope you can. Any algorithm is appreciated in any (pseudo) language, preferably C#.
This problem is solvable in polynomial time via the Hungarian algorithm. To get a square matrix, add dummy entries to the shorter list at "distance 0" from everything.
Your problem is an instance of the weighted minimum maximal matching problem (as described in this Wikipedia article). There is no polynomial-time algorithm even for the unweighted problem (all distances equal). There are efficient algorithms to approximately solve it in polynomial time (within a factor of 2).
This is the minimum weight Euclidean bipartite matching problem. There is a O(n^(2+epsilon)) algorithm.

Creating a "crossover" function for a genetic algorithm to improve network paths

I'm trying to develop a genetic algorithm that will find the most efficient way to connect a given number of nodes at specified locations.
All the nodes on the network must be able to connect to the server node and there must be no cycles within the network. It's basically a tree.
I have a function that can measure the "fitness" of any given network layout.
What's stopping me is that I can't think of a crossover function that would take 2 network structures (parents) and somehow mix them to create offspring that would meet the above conditions.
Any ideas?
Clarification: The nodes each have a fixed x,y coordiante position. Only the routes between them can be altered.
Amir- I think the idea is that every generated tree will contain the same set of nodes, arranged in a different order.
Perhaps rather than using a crossover-based genetic algorithm, you'd be better off using a less biologically inspired hill-climbing algorithm? Define a set of swaps (trading children between nodes, for example) to act as possible mutations, and then iteratively mutate and check against your fitness function. As is the case with all searches of this kind, you're vulnerable to getting stuck in local maxima, so many runs from different starting positions is a good idea.
Let me begin by answering your question with a question :
How does the fitness function behave if you create a network layout that violates the 'no cycle' rule and the 'connect to server' rule?
If it simply punishes the given network layout via a poor fitness score, you don't need to do anything special except take two network layouts and cross them over, 1/2 from layout A, 1/2 from layout B. That's a very basic cross over function, and it should work.
If however, you are responsible for constructing a valid layout and cannot rely on invalid layouts simply being weeded out, you'll need to do more work.
Sounds like you need to create a Minimum Spanning Tree network. I know that this doesn't really answer your genetic algorithm question, but this is quite a well understood problem. Two classical methods are Prim and Krustal. Maybe these algorithms and the methods they use to select edges to connect might give you some clues. Maybe the genes don't describe the network but instead the likelihood of connecting nodes via particular edges? Or a way of picking a node to connect in next?
Or just check out someone who's done this before, or this.
The purpose of crossover in genetic algorithms is to potentially mix good partial solutions from one parent with those of another. One way to think about partial solutions in this case may be subtrees of closely connected nodes. If your fitness function is fairly smooth in regards to small changes to localized parts of the overall tree, this may be a useful way to think about crossover.
Given this, one possible form of crossover would be the following:
Start with two parent trees, P1, and P2. Select two nodes randomly (possibly with some kind of enforcement on minimum distance between the nodes), N1 and N2.
On a node-by-node basis, "grow" a tree C1 outwards from N1 according to the linkages in P1, while simultaneously growing another tree outwards from N2 starting with P2. Do not add the same node to both trees - keep the sets of nodes entirely disjoint. Continue until all nodes have been added to either C1 or C2. This gives us the "traits" from each parent to recombine, in a form guaranteed to be acyclic.
Recombination is accomplished by adding an additional link, from C1 to C2, to create the new child C. As for which link to choose, this can be randomly selected (either uniformly or according to some distribution), or it could be selected by a greedy algorithm (based on some heuristic, or the overall fitness of the new tree C).
You could check on the crossover operators, which make sure that you have no repeating nodes in the child chromosomes. Couple of those crossover operators are the Order Crossover (OX) and the Edge Crossover operators. Such crossover operators are also helpful in solving TSP using GAs.
And after that you will have to check if you are getting any cycles or not. If yes, generate a new pair of chromosome. This is brute force of course.
An idea to try: encode nodes' positions in a metric space (e.g. 3-dimensional euclidean space). There are no "incorrect" assignments, so crossover is never destructive. From such an assignment you can always build a nearest neighbor tree, a minimum spanning tree or similar.
This is just example of a more general idea: do not encode the tree directly, encode some information from which a tree can always be constructed. The tricky part is to do it in such a way that the child trees keep important properties of the parents.
There was a paper in one of the early conferences that proposed the following algorithm for Traveling Salesman, which I have adapted for several graph problems with success:
Across the entire POPULATION, calculate and sort the nodes by descending number of connections (in other words, if N0 is connected in some individuals to N1, N2, N3, it has 3 connections, if N1 is always connected to N4, it has only 1).
Initially, take the node with the highest count. Call this the current_gene_node. (Say, N0)
LOOP:
Add current_gene_node to your offspring.
Remove that node from the lists of connections. (No cycles, so remove N0 from further consideration.)
If current_gene_node has no connections in the population, choose a random unchosen node in the population (mutation)
Else, from the list of connections for that node, do a lottery selection based on the prevalence of connections across the population (If current_gene_node = N0, and connections N0 are, say, N1 = 50%, N2 = 30%, N3 = 20% -- N1 has a 50% chance of being next current_gene_node).
Go to LOOP until all nodes connected
It's not really genetic in the sense of choosing directly from 2 parents, but it follows the same mathematical pressure of selecting based on population prevalence. So it's "genetic enough" for me and for me it's worked pretty well :-)

Algorithm Optimization - Shortest Route Between Multiple Points

Problem: I have a large collection of points. Each of these points has a list with references to other points with the distance between them already calculated and stored. I need to determine the shortest route that begins from an origin and passes through a specific number of points to any destination.
Ex: I'm on vacation and I'm staying in a specific city. I'm making a ONE WAY trip to see ANY four cities and I want to travel the least distance possible. I cannot visit the same city more than once.
Current solution: Right now I'm just iterating through every possibility manually and storing the shortest path. This works but feels inefficient. Also, this problem will eventually be expanded to include searching from multiple origin points to multiple destination points, so I think that might explode the search space.
What is the better way to search for the shortest route?
Answering to the updated post, your solution of checking every possibility is optimal (at least, noone has discovered better algorithms so far). Yes, that's a travelling salesman, whose essense is not touching every city, but touching every city once. If you don't want to search for best solution possible, you may find it useful to use heuristics that work faster, but allow for limited discrepancy from ideal solution.
For future answerers: Floyd-Warshall algorithm and all Floyd-like variations are inapplicable here.
In generally you should to strict bad variants...
I think you should use some variations of Branch_and_bound method
http://en.wikipedia.org/wiki/Branch_and_bound
Either bredth first search as norheim.se said or Dijkstra's algorithm would be my suggestion as well.
This sounds Travelling Salesman-esque? One solution is to use an optimisation technique such as an evolutionary algorithm. Currently you are doing an exhaustive search, which will get very slow very quickly. But I think this is pretty much a travelling salesman problem and it has been tackled for several decades if not centuries, and such there are several possible ways of attack. Google is your friend.
Perhaps this is what the original poster means by "iterating through each possibility manually and storing the shortest path", but I thought I'd like to make explicit what appears to be a baseline solution.
Assume you already have a two-point shortest path algorithm--this has classical solutions for various kinds of graphs. Assume all distances are nonnegative and d(A->B->C) = d(A->B) + d(B->C).
The essentials are that the path starts at S goes through one of intermediate cities "abcd" and ends with E:
e.g. SabcdE, SacbdE, etc...
With only 4 intermediate cities, you enumerate all 24 permutations. For each permutation use your shortest two-point algorithm to compute the path from head to tail, and its total distance.
Then given the start and ending point, there are 12 possibilities attaching to one of abcd and for each two possibilities for the interior. You've computed these distances already, so you add on the distance from S to the head and the tail to E. Choose minimum. So once you've precomputed the intermediate distances for a fixed set of interior cities you need to do 12 two point shortest path problems for any pair of start and end points.
This obviously scales poorly with increasing number of intermediate cities. It's not clear to me that it could do better unless you impose greater restrictions on the graph structure (is this in a physical Euclidenan space? Triangle inequality?).
My thought example: suppose all intermediate distances between cities are O(1). With no restriction on the graph, then the distance from S to any intermediate city might be 1000 except for one being 1. Same for the tail. So you can force the first city to be visited to be anything. Now, go one layer down, take the first city as the "start point". Apply the same argument: you can make the best path go to any of the following cities by manipulating the distances in the graph.
So it seems that the complexity can't be helped without additional assumptions.
This is the very common and real time situation any one can fall in.Google map user interface gives you the path in the same order, you add in the destination list. it doesn't give you the optimal path though their own Google maps API provide the solution.
Google maps API provides the solution for this. In the request to find out the path you have to provide the flag 'optimizeWaypoints: true,'. The request will seem like this.
var request = {
origin: start,
destination: end,
waypoints: waypts,
optimizeWaypoints: true,
travelMode: google.maps.TravelMode.DRIVING
};
and you can see whole code of the utility in the view source as complete utility is developed in javascript and HTML.
I hope it will help.
It appears that the edges of your graph are bidirectional. In this case, the algorithm you're looking for is Dijkstra's algorithm.

Resources