Travelling Salesman - approximation online software? - genetic-algorithm

Would any of you know of a way to generate even a mediocre solution to the travelling salesman problem? I have 3 people meant to visit 31 destinations... I'm not sure how to approach that?
Thanks to all -
Maxim

There are several options, including Nearest Neighbor, Christofides, and Lin–Kernighan.
If these fail, you can always use code from the experts (free for academic researchers).

I suppose you could try a greedy approach: pick an arbitrary point to start (if a starting point is not specified), and at each step travel to the closest unvisited point. Assuming that the distance between two points can be calculated in constant time, you can then find a path in O(n^2) time.
To clarify, the following pseudocode should help (I haven't written this kind of thing in a while, so I hope this is clear enough):
Name:      GreedyPath(C, p)
Input:     C - a non-empty collection of points to visit
           p - a starting point in C
Output:    S - a sequence of points covering C
Algorithm:
    Remove p from C
    If C is empty
        Return the sequence containing only p
    Else
        Let p1 be the closest point to p in C
        Let S = GreedyPath(C, p1)
        Append p to the start of S
        Return S
Obviously the time-consuming part is finding the closest point to p at each step, but for n points this only requires n(n-1)/2 distance calculations in total (unless I'm mistaken).
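For what it's worth, here is a minimal Python sketch of the same idea, written iteratively rather than recursively. The point representation (2D tuples) and the Euclidean distance function are my own assumptions; any constant-time distance function would do.

import math

def greedy_path(points, start):
    # Nearest-neighbour heuristic: repeatedly walk to the closest unvisited point.
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    unvisited = set(points)
    unvisited.discard(start)
    path = [start]
    while unvisited:
        nearest = min(unvisited, key=lambda q: dist(path[-1], q))
        unvisited.remove(nearest)
        path.append(nearest)
    return path

# Example: visit four points, starting from (0, 0)
print(greedy_path([(0, 0), (1, 1), (5, 0), (2, 2)], (0, 0)))
# [(0, 0), (1, 1), (2, 2), (5, 0)]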

Can you help explain this Held-Karp TSP Pseudocode?

I am trying to implement the Held-Karp algorithm for the Traveling Salesman Problem by following this pseudocode:
(which I found here: https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm#Example.5B4.5D )
I can do the algorithm by hand but am having trouble actually implementing it in code. It would be great if someone could provide an easy-to-follow explanation.
I also don't understand this line:
C({k}, k) := d(1,k)
I thought this part was for setting the distance from the starting city to its connected cities. If that were the case, wouldn't it be C({1}, k) := d(1,k) and not C({k}, k) := d(1,k)? Am I just completely misunderstanding this?
I have also heard that this algorithm does not perform very well past about 15-20 cities, so for around 40 cities, what would be a good alternative?
Held-Karp is a dynamic programming approach.
In dynamic programming, you break the task into subtasks and use a "dynamic function" to solve larger subtasks from the already computed results of smaller subtasks, until you finally solve your task.
To understand a DP algorithm, it's imperative to understand how it defines its subtask and its dynamic function.
In the case of Held-Karp, the subtask is the following:
For a given set of vertices S and a vertex k   (1 ∉ S, k ∈ S)
C(S,k) is the minimal length of the path that starts with vertex 1, traverses all vertices in S and ends with the vertex k.
Given this subtask definition, it's clear why the initialization is:
C({k}, k) := d(1,k)
The minimal length of the path from 1 to k, traversing through {k}, is just the edge from 1 to k.
Next, the "dynamic function".
A side note: a DP algorithm can be written top-down or bottom-up. This pseudocode is bottom-up, meaning it computes smaller tasks first and uses their results for larger tasks. To be more specific, it computes tasks in order of increasing size of the set S, starting from |S| = 1 and going up to |S| = n-1 (i.e. S containing all vertices except 1).
Now, consider a task defined by some S and k. Remember, it corresponds to a path from 1, through S, ending in k.
We break it into:
a path from 1, through all vertices in S except k (S\k), which ends in a vertex m   (m ∈ S, m ≠ k):  C(S\k, m)
the edge from m to k
It's easy to see that if we look through all possible ways to break C(S,k) like this and find the minimal path among them, we'll have the answer for C(S, k). In other words:
C(S, k) = min over m ∈ S, m ≠ k of ( C(S\k, m) + d(m, k) )
Finally, having computed all C(S, k) for |S| = n-1, we check all of them, completing the cycle with the missing edge from k to 1:  d(1,k). The minimal cycle is the final result.
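To make the scheme concrete, here is a short Python sketch of the bottom-up computation, with S encoded as a bitmask over vertices 1..n-1 and vertex 0 playing the role of vertex "1" above. The function name and the distance-matrix representation are my own assumptions.

from itertools import combinations

def held_karp(dist):
    # dist[i][j] is the distance from vertex i to vertex j
    n = len(dist)
    C = {}                                    # C[(S, k)] as defined above, S a bitmask
    for k in range(1, n):
        C[(1 << k, k)] = dist[0][k]           # initialization: C({k}, k) = d(1, k)
    for size in range(2, n):                  # bottom-up: increasing |S|
        for subset in combinations(range(1, n), size):
            S = 0
            for v in subset:
                S |= 1 << v
            for k in subset:
                prev = S & ~(1 << k)          # S \ {k}
                C[(S, k)] = min(C[(prev, m)] + dist[m][k]
                                for m in subset if m != k)
    full = (1 << n) - 2                       # all vertices except vertex 0
    return min(C[(full, k)] + dist[k][0] for k in range(1, n))

# Illustrative 4-city instance; the optimal tour 0-1-3-2-0 has length 80
d = [[ 0, 10, 15, 20],
     [10,  0, 35, 25],
     [15, 35,  0, 30],
     [20, 25, 30,  0]]
print(held_karp(d))                           # 80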
Regarding:
I have also heard that this algorithm does not perform very well past about 15-20 cities, so for around 40 cities, what would be a good alternative?
Held-Karp has algorithmic complexity of Θ(n² · 2ⁿ). For n = 40, that is 40² · 2⁴⁰ ≈ 1.76 × 10¹⁵, which, I would say, is infeasible to compute on a single machine in reasonable time.
As David Eisenstat suggested, there are approaches using mixed integer programming that can solve this problem fast enough for N=40.
For example, see this blog post, and this project that builds upon it.

Is this the best complexity I can get?

The problem goes as follows:
You have n domino pieces, and you know the two numbers on every one of these n pieces. There is also an extra set of m domino pieces, of which you may use at most one piece to help you do the following:
calculate the minimum number of domino pieces that you can use to go from a given starting point S to an ending point D,
meaning that the starting piece should have the number S on it and the ending piece should have the number D.
Input:
n and the domino pieces' numbers (n pairs).
m and the extra domino pieces' numbers (m pairs).
starting point S and a destination D.
Output:
the minimum number of domino pieces.
I am thinking of using BFS for this problem: start from S and find the minimum path to D, repeatedly removing the extra piece m(i) from the graph and adding m(i+1) instead,
but doing this the time complexity will be O(n * m).
And not only that: there can be multiple starting points S, so the complexity would be O(|S| * n * m).
Can it be solved in a better way?
The teacher said it could be solved in linear time, but I am just very confused.
I initially missed that your question allows multiple sources, and wrote a somewhat long answer explaining how to approach that problem. Let me post it here anyway, because it might still be helpful. Scroll further for the solution to the original question.
Finding shortest paths from single S to D in linear time
Let's build the idea incrementally. First, let's try to solve a simpler version of the problem, where we just need to find out whether you can get from a single S to a single D at all by using at most one domino from the set of extra M dominoes.
I suggest approaching it this way: do some preprocessing on the N dominoes that will let you, for each of the M additional dominoes, quickly (in constant time) answer whether there exists a path from S to D that goes through this domino. (And of course we need to remember the edge case where we don't need an extra domino at all, but that's easy to cover in linear time.)
What kind of information lets you answer this question? Let's say you are looking at a domino with numbers A and B on its ends. If you knew that you could get from S to A, and from B to D, you could use this domino to get from S to D, right? Alternatively, if there were a path from S to B and from A to D, that would also work. If neither is true, then there is no way this domino can help you get from S to D.
That's great, but if we run a BFS from every possible B, we won't achieve linear time complexity. However, note that you can reverse the second subproblem (detecting whether a path from B to D exists) and pose it as "can I get from D to every possible B?" That is easily answered with a single BFS.
Can you see how this approach can be adapted to find length of the shortest path through each domino, as opposed to just detecting if a path exists?
Finding shortest paths from multiple S to D in linear time
Let's reverse the problem and say we want to find the shortest paths from D to multiple S. You could create a directed graph with twice as many nodes as there were unique numbers written on dominoes. That is, for each number there are nodes V and V', and if you are in V, it means you haven't used an extra domino yet, but if you are in V', it means you already used one. Each core (that is, one of the original N) domino (A, B) corresponds to 4 edges in this graph: (A -> B), (B -> A), (A' -> B'), (B' -> A'). Each extra domino corresponds to 2 edges: (A -> B'), (B -> A'). Note that once we get into a node with ', we can never get out of it, so we will only use at most one extra domino this way. A single BFS from D in this graph will answer the problem.
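A Python sketch of that construction, with the graph stored as an adjacency list and a layer bit marking whether the extra domino has been spent (the function name and the representation are my own assumptions):

from collections import defaultdict, deque

def min_dominoes(core, extra, sources, D):
    # core, extra: lists of (a, b) number pairs; returns {S: fewest pieces from S to D}
    adj = defaultdict(list)
    for a, b in core:
        for layer in (0, 1):                 # core dominoes are usable within either layer
            adj[(a, layer)].append((b, layer))
            adj[(b, layer)].append((a, layer))
    for a, b in extra:
        adj[(a, 0)].append((b, 1))           # crossing to layer 1 spends the one extra domino
        adj[(b, 0)].append((a, 1))

    # One BFS from D answers every source at once; since core edges are
    # symmetric, reversed paths have the same lengths as forward ones.
    dist = {(D, 0): 0}
    queue = deque([(D, 0)])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    inf = float('inf')
    return {s: min(dist.get((s, 0), inf), dist.get((s, 1), inf)) for s in sources}

# Example: 1-2-3 needs two core pieces; the extra piece (1, 3) does it in one
print(min_dominoes([(1, 2), (2, 3)], [(1, 3)], [1], 3))   # {1: 1}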

Picking the "spread" from the points on a line

I'm facing an algorithmic problem described as follows: given a line from 0 to N (really big N), a list of X points on said line, and a number Z (0 <= Z <= X), pick Z points out of the X so as to maximize the distance between the two closest picked points. The brute-force solution in O(n^2) doesn't seem that difficult, but I'm looking for something more sophisticated that can be done in O(n log n) time. Any clues, solutions, or advice are very appreciated.
Edit: answering the question in the first post: it is the minimal distance (between the two closest picked points) that has to be maximized.
One easy approach is O(X log N).
First, sort the points.
Next observe that if you already know the minimum distance (call it d) between the points, it's O(X) to see if there's a way of picking Z points all of which are at least distance d apart: take the left-most element, then the next that's at least distance d away, then the next that's at least distance d away from that, and so on. If by the time you've got to the end of the array you have at least Z points, then you have a solution, and if you don't, there is no solution.
Now, you can use a binary search on [0, N] to find the largest d with a solution.
The sort is O(X log X), the binary search takes O(log N) trials, and each trial is O(X). Overall that's O(X log X + X log N), but since N >= X, that simplifies to O(X log N).
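A Python sketch of the whole thing, assuming integer coordinates (the function name is my own; note that feasible(0) always holds when Z <= X, so the search is well defined):

def max_min_spread(points, Z):
    # Choose Z of the points so as to maximize the distance between the two closest chosen ones.
    pts = sorted(points)

    def feasible(d):
        # Greedy check: scan left to right, keeping every point at least d from the last kept one.
        count, last = 1, pts[0]
        for p in pts[1:]:
            if p - last >= d:
                count += 1
                last = p
        return count >= Z

    lo, hi = 0, pts[-1] - pts[0]             # the answer lies in [0, N]
    while lo < hi:                           # binary search for the largest feasible d
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(max_min_spread([0, 3, 4, 10], 3))      # 4 (e.g. pick 0, 4, 10)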

A greedy or dynamic algorithm to subset selection

I have a simple algorithmic question. I would be grateful if you could help me.
We have some 2-dimensional points, with a positive weight associated to each of them (a sample problem is attached). We want to select a subset of them which maximizes the total weight, such that no two selected points overlap each other (for example, in the attached file we cannot select both A and C because they are in the same column, and in the same way we cannot select both A and B because they are in the same row). Is there any greedy (or dynamic) approach I can use? I'm aware of the non-overlapping interval selection algorithm, but I cannot use it here because my problem is 2-dimensional.
Any reference or note is appreciated.
Regards
Attachment:
A simple sample of the problem:
A (30$) -------- B (10$)
|
|
|
|
C (8$)
If you are OK with a good solution and do not demand the best solution, you can use heuristic algorithms to solve this.
Let S be the set of points, and w(s) the weight function.
Create a weight function W : 2^S -> R (from the subsets of S to the real numbers):
W(U) = -INFINITY                       if the selection U is not feasible
       Sigma of w(u) over each u in U  otherwise
Also create a function next : 2^S -> 2^(2^S) (a function that takes a subset of S and returns a set of subsets of S):
next(U) = { V : V can be obtained from U by adding or removing exactly one element }
Now, given that data, you can invoke any optimization algorithm from an Artificial Intelligence textbook, such as Genetic Algorithms or Hill Climbing.
For example, Hill Climbing with random restarts will look something like this:
1. best <- -INFINITY
2. while there is more time:
3.     choose a random subset s
4.     NEXT <- next(s)
5.     if max{ W(v) | v in NEXT } < W(s):   // s is a local maximum
5.1.       if W(s) > best: best <- W(s)     // if s is better than the previous result, store it
5.2.       go to 2                          // restart the hill climbing from a different random point
6.     else:
6.1.       s <- argmax{ W(v) | v in NEXT }
6.2.       go to 4
7. return best                              // when out of time, return the best solution found so far
The above algorithm is anytime - meaning it will produce better results if given more time.
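Here is a rough Python rendering of that pseudocode for the point-selection problem above; the feasibility test, the repair of infeasible random starts, and the time budget are my own assumptions:

import random, time

def hill_climb(points, weights, seconds=1.0):
    # points: list of (row, col); weights: parallel list of positive weights
    n = len(points)

    def W(mask):
        chosen = [i for i in range(n) if mask[i]]
        rows = [points[i][0] for i in chosen]
        cols = [points[i][1] for i in chosen]
        if len(set(rows)) < len(rows) or len(set(cols)) < len(cols):
            return float('-inf')             # infeasible: two points share a row or a column
        return sum(weights[i] for i in chosen)

    best = float('-inf')
    deadline = time.time() + seconds
    while time.time() < deadline:            # "while there is more time"
        s = [random.random() < 0.5 for _ in range(n)]
        while W(s) == float('-inf'):         # repair an infeasible random start
            s[random.choice([i for i in range(n) if s[i]])] = False
        while True:
            neighbours = [s[:i] + [not s[i]] + s[i+1:] for i in range(n)]
            candidate = max(neighbours, key=W)
            if W(candidate) <= W(s):         # s is a local maximum (<= also stops on plateaus)
                best = max(best, W(s))
                break
            s = candidate
    return best

print(hill_climb([(0, 0), (0, 1), (1, 0)], [30, 10, 8], seconds=0.1))   # 30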
This can be treated as a linear assignment problem, which can be solved using an algorithm like the Hungarian algorithm. The algorithm tries to minimize the sum of costs, so just negate your weights, and use them as the costs. The assignment of rows to columns will give you the subset of points that you need. There are sparse variants for cases where not every (row,column) pair has an associated point, but you can also just use a large positive cost for these.
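For instance, SciPy's linear_sum_assignment does exactly this minimization; a small sketch on the question's sample, where the zero cost for absent (row, column) pairs is my own convention for "leave this row unassigned":

import numpy as np
from scipy.optimize import linear_sum_assignment

# The sample from the question: A = (row 0, col 0, $30), B = (0, 1, $10), C = (1, 0, $8)
points = {(0, 0): 30, (0, 1): 10, (1, 0): 8}
cost = np.zeros((2, 2))
for (r, c), w in points.items():
    cost[r, c] = -w                          # negate the weights: the solver minimizes

rows, cols = linear_sum_assignment(cost)
chosen = [(r, c) for r, c in zip(rows.tolist(), cols.tolist()) if (r, c) in points]
print(chosen, sum(points[p] for p in chosen))   # [(0, 0)] 30, i.e. take A alone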
Well, you can think of this as a binary constraint optimization problem, and there are various algorithms for it. The easiest algorithm for this problem is backtracking with arc propagation. However, it takes exponential time in the worst case. I am not sure if there are any specific algorithms that take advantage of the geometrical nature of the problem.
This can be solved by a pretty straightforward dynamic programming approach, albeit with an exponential time complexity:
s = {A, B, C, ...}
getMaxSum(s) = max( A.value + getMaxSum(compatibleSubSet(s, A)),
                    B.value + getMaxSum(compatibleSubSet(s, B)),
                    ... )
where compatibleSubSet(s, A) is the subset of s that does not overlap with A.
To optimize it, you can memoize the result for each subset.
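A short memoized version in Python, with the remaining set held as a frozenset so it can key the cache (the names and the point encoding are my own):

from functools import lru_cache

def max_weight_subset(points):
    # points: dict mapping name -> (row, col, weight); returns the best total weight
    def compatible(s, a):
        ra, ca, _ = points[a]
        # keep only points clashing with a in neither row nor column (a itself drops out too)
        return frozenset(p for p in s if points[p][0] != ra and points[p][1] != ca)

    @lru_cache(maxsize=None)
    def best(s):
        if not s:
            return 0
        return max(points[a][2] + best(compatible(s, a)) for a in s)

    return best(frozenset(points))

# The question's sample: taking A alone (30) beats B + C (18)
print(max_weight_subset({'A': (0, 0, 30), 'B': (0, 1, 10), 'C': (1, 0, 8)}))   # 30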
One way to do it:
Write a function that generates subsets ordered from the subset of maximum weight to the subset of minimum weight, while ignoring the constraints.
Then call this function repeatedly until a subset that honors the constraints pops up.
To improve the performance, you can write a less naive generator function that, for instance, honors the not-on-the-same-row constraint but ignores the not-on-the-same-column one.

Point covering problem

I recently had this problem on a test: given a set m of points (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that every point is covered by some line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i = 1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i = 0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm, so my gut tells me it won't, but one of my friends, who is much better at this than I am, says that for this problem a greedy algorithm like this always finds the minimal solution. For proving mine always finds the minimal solution, I did a very hand-wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk that are the min points you deal with at each iteration step. The covering C must cover them all as well. Observe that no segment in C can cover two of these points: if one did, say p_i and a later p_j, then when your algorithm processed p_i it would have chosen that segment (or one reaching even further right), and p_j would already have been covered. Therefore |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) Actually, this problem is a simplified version of the Set Cover problem. While the general problem is NP-complete, this variation turns out to be polynomial.
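Putting the algorithm and note 1 together, here is a small Python version (names are mine); among the segments that actually cover the leftmost uncovered point, it takes the one reaching furthest right:

def cover_points(points, segments):
    # points: x-coordinates; segments: (l, r) pairs; returns a minimum covering subset
    remaining = sorted(set(points))
    chosen = []
    while remaining:
        min_x = remaining[0]
        covering = [s for s in segments if s[0] <= min_x <= s[1]]   # note 1's fix
        if not covering:
            raise ValueError("no segment covers point %r" % min_x)
        best = max(covering, key=lambda s: s[1])
        chosen.append(best)
        remaining = [p for p in remaining if p > best[1]]
    return chosen

print(cover_points([1, 3, 6], [(0, 2), (1, 4), (5, 7)]))   # [(1, 4), (5, 7)]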
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.
