How to properly unit test a suffix tree algorithm?

How to properly unit test a suffix tree algorithm? - algorithm

I'm working on an implementation of Ukkonen's linear time suffix tree construction algorithm, and planning to implement improvements suggested by e.g. Kurtz and NJ Larsson (for example edge links instead of suffix links).
While testing, I experienced mixed green and red lights based on the specific strings I tested, and had similar experiences with a few algorithms I found online. Which made me wonder:
Are there any known, specifically built (preferably simple/short) strings for unit-testing suffix trees to ensure the algorithm works precisely in all branching scenarios?
Furthermore, are there any good methods to separate the testing of the tree building algorithm from the testing of the traversal/lookup algorithm?
I know this question doesn't have a single specific correct answer, but I think it could serve as a good reference point for people working on similar algorithms.
My current unit-testing approach is quite primitive (C# with NUnit):
[TestCase]
public void Contains_Simple_ShouldReturnTrue()
{
var s = "bananasbanananananananananabananas";
var st = SuffixTree.Build(s);
var t1 = s.Substring(0, 10);
Assert.IsTrue(st.Contains(t1));
}
// ... Other simple test cases
[TestCase]
// This test fails, but it's not particularly helpful for bugfixing
public void Contains_DynamicBarrage_OnLongString_ShouldReturnTrue()
{
const int CYCLES = 200,
MAXLEN = 200;
var s = "olbafuynhfcxzqhnebecxjrfwfttw"; // Shortened for sanity
var st = SuffixTree.Build(s);
var r = new Random();
for (int i = 0; i < CYCLES; i++)
{
var pos = r.Next(0, s.Length - 2);
var len = r.Next(1, Math.Min(s.Length - pos, MAXLEN));
Assert.IsTrue(st.Contains(s.Substring(pos, len)));
}
}

Related

How do online blackjack games shuffle cards?

What method of shuffling do online blackjack games use? Do they keep a "deck" of cards representation in code, or simply treat each hand as a completely new pick from the set of cards at random?

It depends a lot who is the manufacturer of the game and what kind of programming language they use. Imagine that each time you are creating new deck of card objects. Even in this case you will need to shuffle them. Good shuffling algorithms are very important and if the manufacturer is skilled enough it will use algorithm like this one:
var rand = new Random();
for (int i = cards.Length - 1; i > 0; i--) {
int n = rand.Next(i + 1);
int temp = cards[i];
cards[i] = cards[n];
cards[n] = temp;
}

Find largest circle not overlapping with others using genetic algorithm

I'm using GA, so I took example from this page (http://www.ai-junkie.com/ga/intro/gat3.html) and tried to do on my own.
The problem is, it doesn't work. For example, maximum fitness does not always grow in the next generation, but becomes smallest. Also, after some number of generations, it just stops getting better. For example, in first 100 generations, it found the largest circle with radius 104. And in next 900 largest radius is 107. And after drawing it, I see that it can grow much more.
Here is my code connected with GA. I leave out generating random circles, decoding and drawing.
private Genome ChooseParent(Genome[] population, Random r)
{
double sumFitness = 0;
double maxFitness = 0;
for (int i = 0; i < population.Length; i++)
{
sumFitness += population[i].fitness;
if (i == 0 || maxFitness < population[i].fitness)
{
maxFitness = population[i].fitness;
}
}
sumFitness = population.Length * maxFitness - sumFitness;
double randNum = r.NextDouble() *sumFitness;
double acumulatedSum = 0;
for(int i=0;i<population.Length;i++)
{
acumulatedSum += population[i].fitness;
if(randNum<acumulatedSum)
{
return population[i];
}
}
return population[0];
}
private void Crossover(Genome parent1, Genome parent2, Genome child1, Genome child2, Random r)
{
double d=r.NextDouble();
if(d>this.crossoverRate || child1.Equals(child2))
{
for (int i = 0; i < parent1.bitNum; i++)
{
child1.bit[i] = parent1.bit[i];
child2.bit[i] = parent2.bit[i];
}
}
else
{
int cp = r.Next(parent1.bitNum - 1);
for (int i = 0; i < cp; i++)
{
child1.bit[i] = parent1.bit[i];
child2.bit[i] = parent2.bit[i];
}
for (int i = cp; i < parent1.bitNum; i++)
{
child1.bit[i] = parent2.bit[i];
child2.bit[i] = parent1.bit[i];
}
}
}
private void Mutation(Genome child, Random r)
{
for(int i=0;i<child.bitNum;i++)
{
if(r.NextDouble()<=this.mutationRate)
{
child.bit[i] = (byte)(1 - child.bit[i]);
}
}
}
public void Run()
{
for(int generation=0;generation<1000;generation++)
{
CalculateFitness(population);
System.Diagnostics.Debug.WriteLine(maxFitness);
population = population.OrderByDescending(x => x).ToArray();
//ELITIZM
Copy(population[0], newpopulation[0]);
Copy(population[1], newpopulation[1]);
for(int i=1;i<this.populationSize/2;i++)
{
Genome parent1 = ChooseParent(population, r);
Genome parent2 = ChooseParent(population, r);
Genome child1 = newpopulation[2 * i];
Genome child2 = newpopulation[2 * i + 1];
Crossover(parent1, parent2, child1, child2, r);
Mutation(child1, r);
Mutation(child2, r);
}
Genome[] tmp = population;
population = newpopulation;
newpopulation = tmp;
DekodePopulation(population); //decoding and fitness calculation for each member of population
}
}
If someone can point on potential problem that caused such behaviour and ways to fix it, I'll be grateful.

Welcome to the world of genetic algorithms!
I'll go through your issues and suggest a potential problem. Here we go:
maximum fitness does not always grow in the next generation, but becomes smallest - You probably meant smaller. This is weird since you employed elitism, so each generation's best individual should be at least as good as in the previous one. I suggest you check your code for mistakes because this really should not happen. However, the fitness does not need to always grow. It is impossible to achieve this in GA - it's a stochastic algorithm, working with randomness - suppose that, by chance, no mutation nor crossover happens in a generation - then the fitness cannot improve to the next generation since there is no change.
after some number of generations, it just stops getting better. For example, in first 100 generations, it found the largest circle with radius 104. And in next 900 largest radius is 107. And after drawing it, I see that it can grow much more. - this is (probably) a sign of a phenomenon called premature convergence and it's, unfortunately, a "normal" thing in genetic algorithm. Premature convergence is a situation when the whole population converges to a single solution or to a set of solutions which are near each other and which is/are sub-optimal (i.e. it is not the best possible soluion). When this happens, the GA has a very hard time escaping this local optimum. You can try to tweak the parameters, especially the mutation probability, to force more exploration.
Also, another very important thing that can cause problems is the encoding, i.e. how is the bit string mapped to the circle. If the encoding is much too indirect, it can lead to poor performance of the GA. GAs work when there are some building blocks in the genotype which can be exchanged between among the population. If there are no such blocks, the performance of a GA is usually going to be poor.

I have implemented this exercise and achieved good results. Here is the link:
https://github.com/ManhTruongDang/ai-junkie
Hope this can be of use to you.

how to sort objects for Guendelman shock propagation?

This is a question regarding forming a topographical tree in 3D. A bit of context: I have a physics engine where I have bodies and collision points and some constraints. This is not homework, but an experiment in a multi-threading.
I need to sort the bodies in a bottom-to-top fashion with groups of objects belonging to layers like in this document: See section on "shock-propagation"
http://www2.imm.dtu.dk/visiondag/VD05/graphical/slides/kenny.pdf
the pseudocode he uses to describe how to iterate over the tree makes perfect sense:
shock-propagation(algorithm A)
compute contact graph
for each stack layer in bottom up order
fixate bottom-most objects of layer
apply algorithm A to layer
un-fixate bottom-most objects of layer
next layer
I already have algorithm A figured out (my impulse code). What would the pseudocode look like for tree/layer sorting (topo sort?) with a list of 3D points?
I.E., I don't know where to stop/begin the next "rung" or "branch". I guess I could just chunk it up by y position, but that seems clunky and error prone. Do I look into topographical sorting? I don't really know how to go about this in 3D. How would I get "edges" for a topo sort, if that's the way to do it?
Am I over thinking this and I just "connect the dots" by finding point p1 then the least distant next point p2 where p2.y > p1.y ? I see a problem here where p1 distance from p0 could be greater than p2 using pure distances, which would lead to a bad sort.

i just tackled this myself.
I found an example of how to accomplish this in the downloadable source code linked to this paper:
http://www-cs-students.stanford.edu/~eparker/files/PhysicsEngine/
Specifically the WorldState.cs file starting at line 221.
But the idea being that you assign all static objects with the level of -1 and each other object with a different default level of say -2. Then for each collision with the bodies at level -1 you add the body it collided with to a list and set its level to 0.
After that using a while loop while(list.Count > 0) check for the bodies that collide with it and set there levels to the body.level + 1.
After that, for each body in the simulation that still has the default level (i said -2 earlier) set its level to the highest level.
There are a few more fine details, but looking at the code in the example will explain it way better than i ever could.
Hope it helps!
Relevant Code from Evan Parker's code. [Stanford]
{{{
// topological sort (bfs)
// TODO check this
int max_level = -1;
while (queue.Count > 0)
{
RigidBody a = queue.Dequeue() as RigidBody;
//Console.Out.WriteLine("considering collisions with '{0}'", a.Name);
if (a.level > max_level) max_level = a.level;
foreach (CollisionPair cp in a.collisions)
{
RigidBody b = (cp.body[0] == a ? cp.body[1] : cp.body[0]);
//Console.Out.WriteLine("considering collision between '{0}' and '{1}'", a.Name, b.Name);
if (!b.levelSet)
{
b.level = a.level + 1;
b.levelSet = true;
queue.Enqueue(b);
//Console.Out.WriteLine("found body '{0}' in level {1}", b.Name, b.level);
}
}
}
int num_levels = max_level + 1;
//Console.WriteLine("num_levels = {0}", num_levels);
ArrayList[] bodiesAtLevel = new ArrayList[num_levels];
ArrayList[] collisionsAtLevel = new ArrayList[num_levels];
for (int i = 0; i < num_levels; i++)
{
bodiesAtLevel[i] = new ArrayList();
collisionsAtLevel[i] = new ArrayList();
}
for (int i = 0; i < bodies.GetNumBodies(); i++)
{
RigidBody a = bodies.GetBody(i);
if (!a.levelSet || a.level < 0) continue; // either a static body or no contacts
// add a to a's level
bodiesAtLevel[a.level].Add(a);
// add collisions involving a to a's level
foreach (CollisionPair cp in a.collisions)
{
RigidBody b = (cp.body[0] == a ? cp.body[1] : cp.body[0]);
if (b.level <= a.level) // contact with object at or below the same level as a
{
// make sure not to add duplicate collisions
bool found = false;
foreach (CollisionPair cp2 in collisionsAtLevel[a.level])
if (cp == cp2) found = true;
if (!found) collisionsAtLevel[a.level].Add(cp);
}
}
}
for (int step = 0; step < num_contact_steps; step++)
{
for (int level = 0; level < num_levels; level++)
{
// process all contacts
foreach (CollisionPair cp in collisionsAtLevel[level])
{
cp.ResolveContact(dt, (num_contact_steps - step - 1) * -1.0f/num_contact_steps);
}
}
}
}}}

Optimization of irregular functions

I have a complicated function defined (4 double parameters), which has a lot of different local optima. I have no reason to think that it should be differentiable either. The only thing I can tell is the hypercube in which the (interesting) optima can be found.
I wrote a really crude and slow algorithm to optimize the function:
public static OptimalParameters brutForce(Model function) throws FunctionEvaluationException, OptimizationException {
System.out.println("BrutForce");
double startingStep = 0.02;
double minStep = 1e-6;
int steps = 30;
double[] start = function.startingGuess();
int n = start.length;
Comparer comparer = comparer(function);
double[] minimum = start;
double result = function.value(minimum);
double step = startingStep;
while (step > minStep) {
System.out.println("STEP step=" + step);
GridGenerator gridGenerator = new GridGenerator(steps, step, minimum);
double[] point;
while ((point = gridGenerator.NextPoint()) != null) {
double value = function.value(point);
if (comparer.better(value, result)) {
System.out.println("New optimum " + value + " at " + model.timeSeries(point));
result = value;
minimum = point;
}
}
step /= 1.93;
}
return new OptimalParameters(result, function.timeSeries(minimum));
}
private static Comparer comparer(Model model) {
if (model.goalType() == GoalType.MINIMIZE) {
return new Comparer() {
#Override
public boolean better(double newVal, double optimumSoFar) {
return newVal < optimumSoFar;
}
};
}
return new Comparer() {
#Override
public boolean better(double newVal, double optimumSoFar) {
return newVal > optimumSoFar;
}
};
}
private static interface Comparer {
boolean better(double newVal, double optimumSoFar);
}
Note that it is more important to find a better local optimum than speed of the algorithm.
Are there any better algorithms to do this kind of optimization? Would you have any ideas how to improve this design?

You can use simplex based optimization. It is suitable exactly for problems like you have.
If you can use Matlab, at least for the prototyping, try using fminsearch
http://www.mathworks.com/help/techdoc/ref/fminsearch.html
[1] Lagarias, J.C., J. A. Reeds, M. H. Wright, and P. E. Wright, "Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions," SIAM Journal of Optimization, Vol. 9 Number 1, pp. 112-147, 1998.

Try something classic: http://en.wikipedia.org/wiki/Golden_section_search

Your problem sounds as if metaheuristics would be an ideal solution. You can try a metaheuristic such as Evolution Strategies (ES). ES was designed for tough multimodal real vector functions. ES and several real functions are implemented (Rosenbrock, Rastrigin, Ackley, etc.) in our software HeuristicLab. You can implement your own function there and have it optimized. You don't need to add a lot of code and you can directly copy from the other functions that can work as examples for you. You would need to port your code to C# though, but only the evaluation, the other parts are not needed.
An advantage is that if you have your function implemented in HeuristicLab you can also try to optimize it with a Particle Swarm Optimization (PSO) method, Genetic Algorithm, or Simulated Annealing which are also already implemented and see which one works best. You only need to implement the evaluation function once.
Or you just scan the literature for papers on Evolution Strategies and reimplement it yourself. IIRC Beyer has implementations on his website - it's written for MatLab.

Optimized TSP Algorithms

I am interested in ways to improve or come up with algorithms that are able to solve the Travelling salesman problem for about n = 100 to 200 cities.
The wikipedia link I gave lists various optimizations, but it does so at a pretty high level, and I don't know how to go about actually implementing them in code.
There are industrial strength solvers out there, such as Concorde, but those are way too complex for what I want, and the classic solutions that flood the searches for TSP all present randomized algorithms or the classic backtracking or dynamic programming algorithms that only work for about 20 cities.
So, does anyone know how to implement a simple (by simple I mean that an implementation doesn't take more than 100-200 lines of code) TSP solver that works in reasonable time (a few seconds) for at least 100 cities? I am only interested in exact solutions.
You may assume that the input will be randomly generated, so I don't care for inputs that are aimed specifically at breaking a certain algorithm.

200 lines and no libraries is a tough constraint. The advanced solvers use branch and bound with the Held–Karp relaxation, and I'm not sure if even the most basic version of that would fit into 200 normal lines. Nevertheless, here's an outline.
Held Karp
One way to write TSP as an integer program is as follows (Dantzig, Fulkerson, Johnson). For all edges e, constant we denotes the length of edge e, and variable xe is 1 if edge e is on the tour and 0 otherwise. For all subsets S of vertices, ∂(S) denotes the edges connecting a vertex in S with a vertex not in S.
minimize sumedges e we xe
subject to
1. for all vertices v, sumedges e in ∂({v}) xe = 2
2. for all nonempty proper subsets S of vertices, sumedges e in ∂(S) xe ≥ 2
3. for all edges e in E, xe in {0, 1}
Condition 1 ensures that the set of edges is a collection of tours. Condition 2 ensures that there's only one. (Otherwise, let S be the set of vertices visited by one of the tours.) The Held–Karp relaxation is obtained by making this change.
3. for all edges e in E, xe in {0, 1}
3. for all edges e in E, 0 ≤ xe ≤ 1
Held–Karp is a linear program but it has an exponential number of constraints. One way to solve it is to introduce Lagrange multipliers and then do subgradient optimization. That boils down to a loop that computes a minimum spanning tree and then updates some vectors, but the details are sort of involved. Besides "Held–Karp" and "subgradient (descent|optimization)", "1-tree" is another useful search term.
(A slower alternative is to write an LP solver and introduce subtour constraints as they are violated by previous optima. This means writing an LP solver and a min-cut procedure, which is also more code, but it might extend better to more exotic TSP constraints.)
Branch and bound
By "partial solution", I mean an partial assignment of variables to 0 or 1, where an edge assigned 1 is definitely in the tour, and an edge assigned 0 is definitely out. Evaluating Held–Karp with these side constraints gives a lower bound on the optimum tour that respects the decisions already made (an extension).
Branch and bound maintains a set of partial solutions, at least one of which extends to an optimal solution. The pseudocode for one variant, depth-first search with best-first backtracking is as follows.
let h be an empty minheap of partial solutions, ordered by Held–Karp value
let bestsolsofar = null
let cursol be the partial solution with no variables assigned
loop
while cursol is not a complete solution and cursol's H–K value is at least as good as the value of bestsolsofar
choose a branching variable v
let sol0 be cursol union {v -> 0}
let sol1 be cursol union {v -> 1}
evaluate sol0 and sol1
let cursol be the better of the two; put the other in h
end while
if cursol is better than bestsolsofar then
let bestsolsofar = cursol
delete all heap nodes worse than cursol
end if
if h is empty then stop; we've found the optimal solution
pop the minimum element of h and store it in cursol
end loop
The idea of branch and bound is that there's a search tree of partial solutions. The point of solving Held–Karp is that the value of the LP is at most the length OPT of the optimal tour but also conjectured to be at least 3/4 OPT (in practice, usually closer to OPT).
The one detail in the pseudocode I've left out is how to choose the branching variable. The goal is usually to make the "hard" decisions first, so fixing a variable whose value is already near 0 or 1 is probably not wise. One option is to choose the closest to 0.5, but there are many, many others.
EDIT
Java implementation. 198 nonblank, noncomment lines. I forgot that 1-trees don't work with assigning variables to 1, so I branch by finding a vertex whose 1-tree has degree >2 and delete each edge in turn. This program accepts TSPLIB instances in EUC_2D format, e.g., eil51.tsp and eil76.tsp and eil101.tsp and lin105.tsp from http://www2.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/.
// simple exact TSP solver based on branch-and-bound/Held--Karp
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class TSP {
// number of cities
private int n;
// city locations
private double[] x;
private double[] y;
// cost matrix
private double[][] cost;
// matrix of adjusted costs
private double[][] costWithPi;
Node bestNode = new Node();
public static void main(String[] args) throws IOException {
// read the input in TSPLIB format
// assume TYPE: TSP, EDGE_WEIGHT_TYPE: EUC_2D
// no error checking
TSP tsp = new TSP();
tsp.readInput(new InputStreamReader(System.in));
tsp.solve();
}
public void readInput(Reader r) throws IOException {
BufferedReader in = new BufferedReader(r);
Pattern specification = Pattern.compile("\\s*([A-Z_]+)\\s*(:\\s*([0-9]+))?\\s*");
Pattern data = Pattern.compile("\\s*([0-9]+)\\s+([-+.0-9Ee]+)\\s+([-+.0-9Ee]+)\\s*");
String line;
while ((line = in.readLine()) != null) {
Matcher m = specification.matcher(line);
if (!m.matches()) continue;
String keyword = m.group(1);
if (keyword.equals("DIMENSION")) {
n = Integer.parseInt(m.group(3));
cost = new double[n][n];
} else if (keyword.equals("NODE_COORD_SECTION")) {
x = new double[n];
y = new double[n];
for (int k = 0; k < n; k++) {
line = in.readLine();
m = data.matcher(line);
m.matches();
int i = Integer.parseInt(m.group(1)) - 1;
x[i] = Double.parseDouble(m.group(2));
y[i] = Double.parseDouble(m.group(3));
}
// TSPLIB distances are rounded to the nearest integer to avoid the sum of square roots problem
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
double dx = x[i] - x[j];
double dy = y[i] - y[j];
cost[i][j] = Math.rint(Math.sqrt(dx * dx + dy * dy));
}
}
}
}
}
public void solve() {
bestNode.lowerBound = Double.MAX_VALUE;
Node currentNode = new Node();
currentNode.excluded = new boolean[n][n];
costWithPi = new double[n][n];
computeHeldKarp(currentNode);
PriorityQueue<Node> pq = new PriorityQueue<Node>(11, new NodeComparator());
do {
do {
boolean isTour = true;
int i = -1;
for (int j = 0; j < n; j++) {
if (currentNode.degree[j] > 2 && (i < 0 || currentNode.degree[j] < currentNode.degree[i])) i = j;
}
if (i < 0) {
if (currentNode.lowerBound < bestNode.lowerBound) {
bestNode = currentNode;
System.err.printf("%.0f", bestNode.lowerBound);
}
break;
}
System.err.printf(".");
PriorityQueue<Node> children = new PriorityQueue<Node>(11, new NodeComparator());
children.add(exclude(currentNode, i, currentNode.parent[i]));
for (int j = 0; j < n; j++) {
if (currentNode.parent[j] == i) children.add(exclude(currentNode, i, j));
}
currentNode = children.poll();
pq.addAll(children);
} while (currentNode.lowerBound < bestNode.lowerBound);
System.err.printf("%n");
currentNode = pq.poll();
} while (currentNode != null && currentNode.lowerBound < bestNode.lowerBound);
// output suitable for gnuplot
// set style data vector
System.out.printf("# %.0f%n", bestNode.lowerBound);
int j = 0;
do {
int i = bestNode.parent[j];
System.out.printf("%f\t%f\t%f\t%f%n", x[j], y[j], x[i] - x[j], y[i] - y[j]);
j = i;
} while (j != 0);
}
private Node exclude(Node node, int i, int j) {
Node child = new Node();
child.excluded = node.excluded.clone();
child.excluded[i] = node.excluded[i].clone();
child.excluded[j] = node.excluded[j].clone();
child.excluded[i][j] = true;
child.excluded[j][i] = true;
computeHeldKarp(child);
return child;
}
private void computeHeldKarp(Node node) {
node.pi = new double[n];
node.lowerBound = Double.MIN_VALUE;
node.degree = new int[n];
node.parent = new int[n];
double lambda = 0.1;
while (lambda > 1e-06) {
double previousLowerBound = node.lowerBound;
computeOneTree(node);
if (!(node.lowerBound < bestNode.lowerBound)) return;
if (!(node.lowerBound < previousLowerBound)) lambda *= 0.9;
int denom = 0;
for (int i = 1; i < n; i++) {
int d = node.degree[i] - 2;
denom += d * d;
}
if (denom == 0) return;
double t = lambda * node.lowerBound / denom;
for (int i = 1; i < n; i++) node.pi[i] += t * (node.degree[i] - 2);
}
}
private void computeOneTree(Node node) {
// compute adjusted costs
node.lowerBound = 0.0;
Arrays.fill(node.degree, 0);
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) costWithPi[i][j] = node.excluded[i][j] ? Double.MAX_VALUE : cost[i][j] + node.pi[i] + node.pi[j];
}
int firstNeighbor;
int secondNeighbor;
// find the two cheapest edges from 0
if (costWithPi[0][2] < costWithPi[0][1]) {
firstNeighbor = 2;
secondNeighbor = 1;
} else {
firstNeighbor = 1;
secondNeighbor = 2;
}
for (int j = 3; j < n; j++) {
if (costWithPi[0][j] < costWithPi[0][secondNeighbor]) {
if (costWithPi[0][j] < costWithPi[0][firstNeighbor]) {
secondNeighbor = firstNeighbor;
firstNeighbor = j;
} else {
secondNeighbor = j;
}
}
}
addEdge(node, 0, firstNeighbor);
Arrays.fill(node.parent, firstNeighbor);
node.parent[firstNeighbor] = 0;
// compute the minimum spanning tree on nodes 1..n-1
double[] minCost = costWithPi[firstNeighbor].clone();
for (int k = 2; k < n; k++) {
int i;
for (i = 1; i < n; i++) {
if (node.degree[i] == 0) break;
}
for (int j = i + 1; j < n; j++) {
if (node.degree[j] == 0 && minCost[j] < minCost[i]) i = j;
}
addEdge(node, node.parent[i], i);
for (int j = 1; j < n; j++) {
if (node.degree[j] == 0 && costWithPi[i][j] < minCost[j]) {
minCost[j] = costWithPi[i][j];
node.parent[j] = i;
}
}
}
addEdge(node, 0, secondNeighbor);
node.parent[0] = secondNeighbor;
node.lowerBound = Math.rint(node.lowerBound);
}
private void addEdge(Node node, int i, int j) {
double q = node.lowerBound;
node.lowerBound += costWithPi[i][j];
node.degree[i]++;
node.degree[j]++;
}
}
class Node {
public boolean[][] excluded;
// Held--Karp solution
public double[] pi;
public double lowerBound;
public int[] degree;
public int[] parent;
}
class NodeComparator implements Comparator<Node> {
public int compare(Node a, Node b) {
return Double.compare(a.lowerBound, b.lowerBound);
}
}

If your graph satisfy the triangle inequality and you want a guarantee of 3/2 within the optimum I suggest the christofides algorithm. I've wrote an implementation in php at phpclasses.org.

As of 2013, It is possible to solve for 100 cities using only the exact formulation in Cplex. Add degree equations for each vertex, but include subtour-avoiding constraints only as they appear. Most of them are not necessary. Cplex has an example on this.
You should be able to solve for 100 cities. You will have to iterate every time a new subtour is found. I ran an example here and in a couple of minutes and 100 iterations later I got my results.

I took Held-Karp algorithm from concorde library and 25 cities are solved in 0.15 seconds. This performance is perfectly good for me! You can extract the code (writen in ANSI C) of held-karp from concorde library: http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm. If the download has the extension gz, it should be tgz. You might need to rename it. Then you should make little ajustments to port in in VC++. First take the file heldkarp h and c (rename it cpp) and other about 5 files, make adjustments and it should work calling CCheldkarp_small(...) with edgelen: euclid_ceiling_edgelen.

TSP is an NP-hard problem. (As far as we know) there is no algorithm for NP-hard problems which runs in polynomial time, so you ask for something that doesn't exist.
It's either fast enough to finish in a reasonable time and then it's not exact, or exact but won't finish in your lifetime for 100 cities.

To give a dumb answer: me too. Everyone is interrested in such algorithm, but as others already stated: I does not (yet?) exist. Esp your combination of exact, 200 nodes, few seconds runtime and just 200 lines of code is impossible. You already know that is it NP hard and if you got the slightest impression of asymptotic behaviour you should know that there is no way of achieving this (except you prove that NP=P, and even that I would say thats not possible). Even the exact commercial solvers need for such instances far more than some seconds and as you can imagine they have far more than 200 lines of code (even when you just consider their kernels).
EDIT: The wiki algorithms are the "usual suspects" of the field: Linear Programming and branch-and-bound. Their solutions for the instances with thousands of nodes took Years to solve (they just did it with very very much CPUs parallel, so they can do it faster). Some even use for the branch-and-bound problem specific knowledge for the bounding, so they are no general approaches.
Branch and bound just enumerates all possible paths (e.g. with backtracking) and applies once it has a solution this for to stop a started recursion when it can prove that the result is not better than the already found solution (e.g. if you just visited 2 of your cities and the path is already longer than a found 200 city tour. You can discard all tours that start with that 2 city combination). Here you can invest very much problem specific knowledge in the function that tells you, that the path is not going to be better than the already found solution. The better it is, the less paths you have to look at, the faster is your algorithm.
Linear Programming is an optimization method so solve linear inequality problems. It works in polynomial time (simplex just practically, but that doesnt matter here), but the solution is real. When you have the additional constraint that the solution must be integer, it gets NP-complete. For small instances it is possible, e.g. one method to solve it, then look which variable of the solution violates the integer part and add addition inequalities to change it (this is called cutting-plane, the name cames from the fact that the inequalities define (higher-dimensional) plane, the solution space is a polytop and by adding additional inequalities you cut something with a plane from the polytop). The topic is very complex and even a general simple simplex is hard to understand when you dont want dive deep into the math. There are several good books about, one of the betters is from Chvatal, Linear Programming, but there are several more.

I have a theory, but I've never had the time to pursue it:
The TSP is a bounding problem (single shape where all points lie on the perimeter) where the optimal solution is that solution that has the shortest perimeter.
There are plenty of simple ways to get all the points that lie on a minimum bounding perimeter (imagine a large elastic band stretched around a bunch of nails in a large board.)
My theory is that if you start pushing in on the elastic band so that the length of band increases by the same amount between adjacent points on the perimeter, and each segment remains in the shape of an eliptical arc, the stretched elastic will cross points on the optimal path before crossing points on non-optimal paths. See this page on mathopenref.com on drawing ellipses--particularly steps 5 and 6. Points on the bounding perimeter can be viewed as focal points of the ellipse (F1, F2) in the images below.
What I don't know is if the "bubble stretching" process needs to be reset after each new point is added, or if the existing "bubbles" continue to grow and each new point on the perimeter causes only the localized "bubble" to turn into two line segments. I'll leave that for you to figure out.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio