Where to find a set of hard Traveling Salesman Problems (with known solutions/approximations)? - algorithm

I want to try my hand at finding heuristics/approximations for solving the Traveling Salesman Problem, and in order to do that, I'm looking for some "hard" TSP instances (along with their best known solutions) so that I can try solving them and see how well I can do.
Ideally, they would be simply a text-based list of adjacency matrices or adjacency lists (I don't want to deal with parsing, just the algorithm).
By "hard", I mean that they should be practically impossible to solve or approximate using brute-force.
(This is so that I can be reasonably confident that if I find an answer close to the best known answer, then I'm actually doing something right, and not just getting lucky.)
Are there any lists that would work for this purpose? I searched around a bit but didn't find anything.

Here is another question on SE that partially answers yours (it lists problems, but most of them seem not to have a solution provided; check the links anyway, since things may have changed).
If you can't find them, what about randomly generating a set of nodes along with a path connecting them, saving the path length as "minimal" (making sure that the longest connection between two nodes is never > X) and then adding a bunch of other paths making sure these are all > X?
This way (unless I am missing something) you have a set of connected nodes "as complex as you want" and know the actual shortest connecting path from the start...
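A minimal Python sketch of this idea, under a few assumptions that are not in the answer itself: the hidden "path" is treated as a closed tour, weights are integers, and the function name and the parameters `x` and `seed` are made up for illustration. Every edge on the hidden tour gets a weight of at most `x` and every other edge gets a weight strictly greater than `x`, so any other tour swaps some short edges for strictly longer ones.

```python
import random

def planted_tsp_instance(n, x=10, seed=0):
    """Return (matrix, tour, tour_length) for a symmetric TSP with a known optimum.

    Edges on a hidden random tour get weights in [1, x]; every other edge
    gets a weight in [x + 1, 3 * x], so the hidden tour is optimal (n >= 3).
    """
    rng = random.Random(seed)
    tour = list(range(n))
    rng.shuffle(tour)
    # Start with "long" weights everywhere (strictly greater than x).
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = rng.randint(x + 1, 3 * x)
    # Overwrite the hidden tour's edges with "short" weights (at most x).
    length = 0
    for k in range(n):
        a, b = tour[k], tour[(k + 1) % n]
        w = rng.randint(1, x)
        dist[a][b] = dist[b][a] = w
        length += w
    return dist, tour, length
```

Note that instances built this way are easy for a greedy "shortest edge first" heuristic, since the n shortest edges are exactly the optimal tour, so they are better suited to checking correctness than to measuring how well a heuristic copes with hard inputs.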
Addendum - if you really want to see how you compare to existing tools, then you have to run these on your generated problems. One that is free and accessible (but I don't know how "efficient" it may be) is the TSP Library for R.
Wikipedia has a list of other free software packages for this.
Maybe you could create a different SE question asking how to get other TSP tools.

The TSP gatech site seems to be the canonical site for TSP information.
Here's a list of the available datasets: http://www.tsp.gatech.edu/data/index.html
The optimal solution is available for some datasets with over 10 000 cities. And there are datasets available of over 1 000 000 cities.

There is a well-known algorithm for finding the optimum TSP solution - it is called brute force.
So the only real way you can compare two algorithms has to be on the quality of the solution as well as some other criteria - usually running time.
Even here you run into a problem. Many algorithms are effectively search algorithms, and the longer you search the more possible solutions are evaluated. The algorithms already trade off quality and running time. They may or may not result in the correct (best) answer for some or all graphs.
The only real way to compare your algorithm with others is to implement the other algorithms and then run yours and theirs on the same hard problems (and, as others have pointed out, it is easy to make at least some types of hard problems). Implementing these existing algorithms may also suggest ways of improving yours. http://en.wikipedia.org/wiki/Travelling_salesman_problem has plenty of algorithms, and at least a couple look very easy to code. Why not implement them as the first benchmark for your algorithm?
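As an illustration of how little code such a baseline needs, here is a sketch of two of the simplest heuristics covered on that Wikipedia page: nearest-neighbour construction followed by 2-opt improvement. It assumes a symmetric distance matrix `dist` given as a list of lists; the function names are chosen for this example only.

```python
def tour_length(dist, tour):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def nearest_neighbor(dist, start=0):
    """Build a tour by always moving to the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[last][c])
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(dist, tour):
    """Reverse segments of the tour (in place) for as long as that shortens it."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, nothing to swap
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-9:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour
```

A typical use would be `two_opt(dist, nearest_neighbor(dist))`, with `tour_length` reporting the result so it can be compared against the best known value for the instance.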

Related

What's the theory behind this puzzle?

I recently came across the above puzzle game. The objective is to form a large triangle in such a way that the shapes and colors of the parts of the figures on neighboring triangles match.
One way to solve this problem is to apply an exhaustive search and to test every possible combination (roughly 7.1e9). I wrote a simple script to solve it (github).
Since this puzzle is quite old, brute-forcing this problem may not have been feasible back then. So, what's a more efficient way (algorithm/mathematical theory) to solve this?
This is equivalent to the edge-matching problem (with some regular polygons), which is of course NP-complete (and I assume there are more negative results about approximations). This means that there exist puzzles which are very hard to solve (at least if P != NP).
One interesting side note: there is a very popular (commercial) edge-matching puzzle called Eternity II, which had a prize value of two million dollars. It's still unsolved to my knowledge.
This problem resulted in many attempts and blog posts, which should tell you a lot about solving this kind of problem.
Approaches that failed (in the sense that they did not solve the full-size E2 puzzle, though they did solve other hard ones) but should still work much better than exhaustive search without heuristics are:
SAT-solving (in my opinion most powerful complete approach)
Constraint-programming
Common Metaheuristics (a lot of potential when tuned to some problem-statistics)
Some interesting resources:
Complexity-theory: Demaine, Erik D., and Martin L. Demaine. "Jigsaw puzzles, edge matching, and polyomino packing: Connections and complexity." Graphs and Combinatorics 23.1 (2007): 195-208.
General hardness analysis (practical): Ansótegui, Carlos, et al. "How Hard is a Commercial Puzzle: the Eternity II Challenge." CCIA. 2008.
SAT-solving approach: Heule, Marijn JH. "Solving edge-matching problems with satisfiability solvers." SAT (2009): 69-82.
Edge-matching as benchmarks (because of hardness): Ansótegui, Carlos, et al. "Edge matching puzzles as hard sat/csp benchmarks." International Conference on Principles and Practice of Constraint Programming. Springer Berlin Heidelberg, 2008.
One common approach to solving this sort of problem is with backtracking.
You choose a starting place, put down one of the tiles and then try to find matches for it in the neighboring places. When you get stuck, you back up one, and try an alternative there.
Eventually you have tried every possibility, without bothering with a huge number of dead ends. Once you get stuck, there is no point in filling in the rest in any way, because you'll still be stuck at that one point.
More recently, Knuth has applied his Dancing Links algorithm to problems of this nature, with even greater efficiencies gained thereby.
For a problem the size of your example, with just 9 pieces and two "colors", all solutions would be found in a matter of seconds at the most.
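The backtracking procedure described above can be sketched as follows. This is a simplified stand-in rather than a solver for the triangular puzzle in the question: it assumes square tiles written as `(top, right, bottom, left)` tuples on a rectangular grid, and it treats two touching edges as matching when their labels are equal. The function names are illustrative.

```python
def rotations(tile):
    """All four rotations of a (top, right, bottom, left) tile."""
    return [tile[k:] + tile[:k] for k in range(4)]

def solve(tiles, width, height):
    """Place every tile on a width x height grid so that touching edges are equal.

    Returns a row-major list of oriented tiles, or None if no arrangement exists.
    """
    grid = [None] * (width * height)
    used = [False] * len(tiles)

    def fits(pos, tile):
        top, _, _, left = tile
        # Compare with the right edge of the left neighbour, if there is one.
        if pos % width and grid[pos - 1][1] != left:
            return False
        # Compare with the bottom edge of the upper neighbour, if there is one.
        if pos >= width and grid[pos - width][2] != top:
            return False
        return True

    def place(pos):
        if pos == len(grid):
            return True  # every position is filled
        for i, tile in enumerate(tiles):
            if used[i]:
                continue
            for oriented in rotations(tile):
                if fits(pos, oriented):
                    grid[pos], used[i] = oriented, True
                    if place(pos + 1):
                        return True
                    used[i] = False  # dead end: back up and try the next option
        return False

    return grid if place(0) else None
```

Because `fits` rejects a tile as soon as it clashes with an already placed neighbour, whole branches of the search are skipped without ever filling in the remaining positions.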

Tiling Algorithm

I'm faced with a problem where I have to solve puzzles.
E.g. I have a (variable) area of 20x20 (meters, for example). There are a number of given set pieces of variable sizes, such as 4x3, 4x2, 1x5 pieces, etc. These pieces can also be rotated, to add more pain to my problem. The point of the puzzle is to fill the entire 20x20 area with the given pieces.
What would be a good starting algorithm to achieve such a feat?
I'm thinking of using a heuristic that calculates the open space (for efficiency purposes).
Thanks in advance
That's an Exact Cover problem, with a nice structure too, usually, depending on the pieces. I don't know about any heuristic algorithms, but there are several exact options that should work well.
As usual with Exact Covers, you can use Dancing Links, a way to implement Algorithm X efficiently.
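For a feel of how Algorithm X works, here is a compact sketch that uses Python dictionaries and sets in place of the linked-list structure of Dancing Links (the search is the same, only the bookkeeping differs). The tiny example at the bottom is an assumption made for illustration: it tiles a 2x3 board with dominoes that may be reused freely, where each possible placement is one subset of cells. For a puzzle in which each piece must be used exactly once, one would additionally add a per-piece element to every placement of that piece.

```python
def exact_cover(universe, subsets):
    """Yield each list of subset names that covers every element exactly once.

    subsets maps a name to a list of elements taken from universe.
    """
    # cols[e] = names of the subsets that contain element e
    cols = {e: set() for e in universe}
    for name, elems in subsets.items():
        for e in elems:
            cols[e].add(name)
    yield from _search(cols, subsets, [])

def _search(cols, subsets, partial):
    if not cols:
        yield list(partial)  # nothing left to cover: a solution
        return
    # Branch on the element with the fewest remaining candidates.
    e = min(cols, key=lambda k: len(cols[k]))
    for name in list(cols[e]):
        partial.append(name)
        removed = _select(cols, subsets, name)
        yield from _search(cols, subsets, partial)
        _deselect(cols, subsets, name, removed)
        partial.pop()

def _select(cols, subsets, name):
    """Remove the covered elements and every subset that clashes with name."""
    removed = []
    for e in subsets[name]:
        for other in cols[e]:
            for e2 in subsets[other]:
                if e2 != e:
                    cols[e2].remove(other)
        removed.append(cols.pop(e))
    return removed

def _deselect(cols, subsets, name, removed):
    """Undo _select exactly, in reverse order."""
    for e in reversed(subsets[name]):
        cols[e] = removed.pop()
        for other in cols[e]:
            for e2 in subsets[other]:
                if e2 != e:
                    cols[e2].add(other)

# Example: every way to tile a 2x3 board with dominoes.
W, H = 3, 2
cells = [(r, c) for r in range(H) for c in range(W)]
placements = {}
for r, c in cells:
    if c + 1 < W:
        placements[f"H{r}{c}"] = [(r, c), (r, c + 1)]  # horizontal domino
    if r + 1 < H:
        placements[f"V{r}{c}"] = [(r, c), (r + 1, c)]  # vertical domino
solutions = list(exact_cover(cells, placements))
print(len(solutions))  # 3 tilings
```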
Less generally, you can probably solve this with zero-suppressed decision diagrams. It depends on the tiles though. As a bonus, you can represent all possible solutions and count them or generate one with some properties, all without ever explicitly storing the entire (usually far too large) set of solutions.
BDDs would work about as well, using more nodes to accomplish the same thing (because the solutions are very sparse, as in, using few of the possible tile-placements - ZDDs like that but BDDs like symmetry better than sparseness).
Or you could turn it into a SAT problem; then you get less information (no solution count, for example), but it is faster when easy solutions exist.

Pathfinding with limited knowledge and no distance heuristic

I'm having trouble writing the pathfinding routine for the AI in a simple Elite-esque game I'm writing.
The world in this game is a handful of "systems" connected by "wormholes", and a ship can jump from the system it's in to any system it's linked to once per turn. The AI is limited to only knowing things that it should know; it doesn't know what links come from a system it hasn't been to (though it can work it out from the systems it has seen, since links are two-way). Other parts of the AI decide which system the ship needs to get to based on what goods it has in its inventory and how much it remembers things being worth on systems it has passed through.
The problem is, I don't know how to approach the problem of finding a path to the target system. I can't use A*; there's no way to determine the 'distance' to another system without pathing to it. I also need this algorithm to be efficient, since it'll need to run about 100 times every time the player takes his turn.
Does anyone know of a suitable algorithm?
I ended up implementing a bidirectional, greedy version of breadth-first search, which suits the purpose well enough. To put it simply, I just had the program look through each node the starting node connects to, then each node those nodes connect to, then each node those connect to... until the destination node was found.
Normally one would build a list of appropriate paths and pick the shortest one, but I tried a different method; I had the program run two searches in parallel, one from the starting point, and one from the end point. When the 'from' search found the last node of the 'to' search, the path was considered found.
It then optimizes the path by checking if each node on the path connects to a node further up in the path, and deleting each node in between them.
Whether or not this funky algorithm is actually any better than a straight BFS remains to be seen.
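For comparison, the straight BFS baseline mentioned above is short to write. The sketch below is not the bidirectional variant the answer describes; it is the plain single-direction search, run over whatever part of the map the AI has discovered so far. The `links` dictionary (system name to the set of systems it is known to connect to) and the function name are assumptions made for this example.

```python
from collections import deque

def shortest_known_path(links, start, goal):
    """Breadth-first search over the links the AI has already discovered.

    Returns the list of systems from start to goal with the fewest jumps,
    or None if no route is known yet.
    """
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

Since every jump costs one turn, BFS already returns a path with the fewest jumps among the known links, so no separate path-shortening pass is needed for this baseline.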
When it comes to unknown environments, I usually use an evolutionary algorithm approach. It doesn't guarantee that you'll find the best solution in the small timeframe you have, but it is a way to approach such a problem.
Have a look at Partially Observable Markov Decision Problems (POMDP). You should be able to express your problem with this model.
Then you can use an algorithm that solves these problems to try to find a good solution. Note that solving POMDPs is usually very expensive and you probably have to resort to approximate methods.
The easiest way to cheat this would be to go through, or at least attempt to access, as many systems as possible, then implement the distance heuristic as the sum of all the systems you've been to.
Alternatively, and way cooler:
I've implemented something similar using ACO (ant colony optimization), and it worked pretty well combined with PSO (particle swarm optimization). However, the additional constraints your system imposes mean that you'll have to spend a few sessions (at least one) figuring out the environment layout, and if it's dynamic... well... tough.
The good thing is that this algorithm completely bypasses the need for heuristic generation which is what you need since you are flying blind. Be advised though, that this is a bad idea if your search space (number of runs) is small. (100 may be acceptable, but 10 or 5 ... not so much).
This scales up quite nicely when dealing with large numbers of nodes (systems) and it bypasses the heuristic distance computational need for every node-to-node relationship, thereby making it more efficient.
Good luck.

Course Scheduling Algorithms: why use of DFS or Graph coloring is not suggested?

I need to develop course timetabling software that can allot timeslots and rooms efficiently. This is a curriculum-based routine, not a post-enrollment one. "Efficiently" means that classes are assigned timeslots according to staff time preferences, and that overlap between 1st-year and 2nd-year classes is minimized so that 2nd-year students can retake courses they failed (and likewise for the 3rd-4th year pair).
At first I thought this would be an easy problem, but now it seems otherwise. Most of the papers I've looked at use genetic algorithms, PSO, simulated annealing, or similar algorithms, and I'm still unable to formulate the problem as a GA problem.
What confuses me is why almost none of them suggest DFS or a graph-coloring algorithm.
Can someone explain the scenario if DFS/graph-coloring is used? Or why they aren't suggested or tried.
My experience with solving this problem for a complex department is that the hard constraints (like no overlapping of courses that are taken by the same population, and the teachers' hard constraints) are rather easily solvable by exact methods. I modeled the problem with 0-1 integer linear programming and solved it with a SAT-based tool called minisat+. Competitive commercial tools like CPLEX can also solve it.
So with today's tools there is no need to approximate as suggested above, even when the input is rather large.
Now, optimizing the solution is a different story. There can be many (weighted) objectives, and finding the solution that brings the objective to minimum is indeed very hard computationally (no tool that I tried can solve it within 24 hours), but they reach near optimum in a few hours (I know it is near optimum because I can compute the theoretical bound on the solution).
This document describes applying a GA approach to university time-tabling, so it should be directly applicable to your requirement: Using a GA to solve university time-tabling

Best Fit Scheduling Algorithm

I'm writing a scheduling program with a difficult programming problem. There are several events, each with multiple meeting times. I need to find an arrangement of meeting times such that each schedule contains any given event exactly once, using one of each event's multiple meeting times.
Obviously I could use brute force, but that's rarely the best solution. I'm guessing this is a relatively basic computer science problem, which I'll learn about once I am able to start taking computer science classes. In the meantime, I'd prefer any links where I could read up on this, or even just a name I could Google.
I think you should use a genetic algorithm because:
It is best suited for large problem instances.
It yields reduced time complexity at the price of an inexact answer (not necessarily the ultimate best).
You can specify constraints and preferences easily by adjusting the fitness penalties for unmet ones.
You can specify a time limit for program execution.
The quality of the solution depends on how much time you intend to spend solving the problem.
Genetic Algorithms Definition
Genetic Algorithms Tutorial
Class scheduling project with GA
There are several ways to do this.
One approach is constraint programming. It is a special case of the dynamic programming suggested by feanor. It is helpful to use a specialized library that can do the bounding and branching for you. (Google for "gecode" or "comet-online" to find libraries.)
If you are mathematically inclined, then you can also use integer programming to solve the problem. The basic idea here is to translate your problem into a set of linear inequalities. (Google for "integer programming scheduling" to find many real-life examples, and Google for "Abacus COIN-OR" for a useful library.)
My guess is that constraint programming is the easiest approach, but integer programming is useful if you want to include real variables in your problem at some point.
Your problem description isn't entirely clear, but if all you're trying to do is find a schedule which has no overlapping events, then this is a straightforward bipartite matching problem.
You have two sets of nodes: events and times. Draw an edge from each event to each possible meeting time. You can then efficiently construct the matching (the largest possible set of edges in which no node is used twice) using augmenting paths. This works because you can always convert a bipartite graph into an equivalent flow graph.
An example of code that does this is BIM. Standard graphing libraries such as GOBLIN and NetworkX also have bipartite matching implementations.
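For small inputs, the augmenting-path idea fits in a few lines without any library. The sketch below assumes the simplified reading given in this answer, where each event needs one time slot and no slot may be used twice; the `options` dictionary and the function name are made up for illustration.

```python
def max_matching(options):
    """options maps each event to the list of time slots it could use.

    Returns a dict event -> slot of maximum size in which no slot is used twice.
    """
    slot_owner = {}  # slot -> event currently holding it

    def try_assign(event, seen):
        for slot in options[event]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict its owner if the owner can move elsewhere.
            if slot not in slot_owner or try_assign(slot_owner[slot], seen):
                slot_owner[slot] = event
                return True
        return False

    for event in options:
        try_assign(event, set())  # one augmenting-path search per event
    return {event: slot for slot, event in slot_owner.items()}
```

If the returned dictionary contains every event, a conflict-free schedule exists; otherwise the missing events are the ones that could not be placed.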
This sounds like this could be a good candidate for a dynamic programming solution, specifically something similar to the interval scheduling problem.
There are some visuals here for the interval scheduling problem specifically, which may make the concept clearer. Here is a good tutorial on dynamic programming overall.
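As a concrete instance of that kind of dynamic programme, here is a sketch of weighted interval scheduling: given candidate meetings as `(start, end, weight)` tuples, it finds the largest total weight of meetings that do not overlap. The tuple layout and the function name are assumptions for this example, and it solves the textbook problem rather than the exact scheduling question asked above.

```python
from bisect import bisect_right

def max_weight_schedule(intervals):
    """Best total weight of pairwise non-overlapping (start, end, weight) intervals.

    Intervals that merely touch (one ends exactly when the next starts)
    count as compatible; each interval is assumed to have start < end.
    """
    intervals = sorted(intervals, key=lambda iv: iv[1])  # by finishing time
    ends = [iv[1] for iv in intervals]
    best = [0] * (len(intervals) + 1)  # best[k] considers only the first k intervals
    for k, (start, end, weight) in enumerate(intervals, 1):
        # p = how many of the earlier intervals finish no later than this one starts
        p = bisect_right(ends, start, 0, k - 1)
        # Either skip interval k, or take it together with the best of the first p.
        best[k] = max(best[k - 1], best[p] + weight)
    return best[-1]
```

Sorting by finishing time is what makes the recurrence work: everything compatible with interval k is then a prefix of the sorted list, so one table lookup replaces a search over subsets.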

Resources