Hill climbing and single-pair shortest path algorithms - data-structures

I have a bit of a strange question. Can anyone tell me where to find information about, or give me a little bit of an introduction to using shortest path algorithms that use a hill climbing approach? I understand the basics of both, but I can't put the two together. Wikipedia has an interesting part about solving the Travelling Sales Person with hill climbing, but doesn't provide a more in-depth explanation of how to go about it exactly.
For example, hill climbing can be
applied to the traveling salesman
problem. It is easy to find a solution
that visits all the cities but will be
very poor compared to the optimal
solution. The algorithm starts with
such a solution and makes small
improvements to it, such as switching
the order in which two cities are
visited. Eventually, a much better
route is obtained.
As far as I understand it, you should pick any path and then iterate through it and make optimisations along the way. For instance going back and picking a different link from the starting node and checking whether that gives a shorter path.
I am sorry - I did not make myself very clear. I understand how to apply the idea to Travelling Salesperson. I would like to use it on a shortest distance algorithm.

You could just randomly exchange two cities.
You first path is: A B C D E F A with length 200
Now you change it by swapping C and D: A B D C E F A with length 350 - Worse!
Next step: A B C D F E A: length 150 - You improved your solution. ;-)
Hill climbing algorithms are really easy to implement but have several problems with local maxima! [A better approch based on the same idea is simulated annealing.]
Hill climbing is a very simple kind of evolutionary optimization, a much more sophisticated algorithm class are genetic algorithms.
Another good metaheuristic for solving the TSP is ant colony optimization

Examples would be genetic algorithms or expectation maximization in data clustering. With an iteration of single steps it is tried to come to a better solution with every step. The problem is that it only finds a local maximum/minimum, it is never assured that it finds the global maximum/minimum.
A solution for the travelling salesman problem as a genetic algorithm for which we need:
Representation of the solution as order of visited cities, e.g. [New York, Chicago, Denver, Salt Lake City, San Francisco]
Fitness function as the travelled distance
Selection of the best results is done by selecting items randomly depending on their fitness, the higher the fitness, the higher the probability that the solution is chosen to survive
Mutation would be switching to cities in a list, like [A,B,C,D] becomes [A,C,B,D]
Crossing of two possible solutions [B,A,C,D] and [A,B,D,C] result in [B,A,D,C], i.e. cutting both list in the middle and use the beginning of one parent and the end of the other parent to form the child
The algorithm then:
initalization of the starting set of solution
calculation of the fitness of every solution
until desired maximum fitness or until no changes happen any more
selection of the best solutions
crossing and mutation
fitness calculation of every solution
It is possible that with every execution of the algorithm the result is differently, therefore it should be executed more then once.

I'm not sure why you would want to use a hill-climbing algorithm, since Djikstra's algorithm is polynomial complexity O( | E | + | V | log | V | ) using Fibonacci queues:
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
If you're looking for an heuristic approach to the single-path problem, then you can use A*:
http://en.wikipedia.org/wiki/A*_search_algorithm
but an improvement in efficiency is dependent on having an admissible heuristic estimate of the distance to the goal.
http://en.wikipedia.org/wiki/A*_search_algorithm

To hillclimb the TSP you should have a starting route. Of course picking a "smart" route wouldn't hurt.
From that starting route you make one change and compare the result. If it's higher you keep the new one, if it's lower keep the old one. Repeat this until you reach a point from where you can't climb anymore, which becomes your best result.
Obviously, with TSP, you will more than likely hit a local maximum. But it is possible to get decent results.

Related

Using a different heuristic for the A* Algorithm

my professor gave me a simple task to program - at least i thought so. The basics are simple: I have a randomized maze out of n > 20.000 different rooms. I thought "hey great, let's use A* to calculate the optimal route". Now i am encountering the following problem:
The java-program from my professor randomizes the rooms into a large array saving all successor rooms from room X. Example:
Rooms[2427][{42},{289},{2833}] // Room 2427 is next to the rooms 42, 289 and 2833 - those 3 rooms can be entered from room 2427.
A* needs a good heuristic to measure the following costs for each room. Example:
TotalCosts for Room Successor = CompleteCosts for CurrentRoom + Costs from CurrentRoom to Successor + estimated rest costs.
The "estimated rest costs" are the problem. If i had a 2D array i could easily use the air distance from a to b as a basic heuristic cause there will be at least X costs (air distance from a to b). However i can't use that now, because i don't know where the rooms are! They are completely randomized and when i don't know where Room 23878 will be.
Are there any other kinds of heuristics i can use? I cannot just set the estimated rest costs for every open room to 0, can i? Are there better ways?
Thanks!
Using 0 for the heuristic should just work. The 0-heuristic is actually a special case of the A* algorithm, also called the Dijkstra algorithm.
I believe there is no admissible heuristic estimate in this case, i.e. h(x) = 0 in all cases.
However, it seems that the edges of the graph are not weighted. You can simply code BFS, which will run optimally fast (and easy to implement) in O(V+E).

Calculating taxi movements

Let's say I have N taxis, and N customers waiting to be picked up by the taxis. The initial positions of both customers and taxis are random/arbitrary.
Now I want to assign each taxi to exactly one customer.
The customers are all stationary, and the taxis all move at identical speed. For simplicity, let's assume there are no obstacles, and the taxis can move in straight lines to assigned customers.
I now want to minimize the time until the last customer enters his/her taxi.
Is there a standard algorithm to solve this? I have tens of thousands of taxis/customers. Solution doesn't have to be optimal, just ‘good’.
The problem can almost be modelled as the standard “Assignment Problem”, solvable using the Hungarian algorithm (the Kuhn–Munkres algorithm or Munkres assignment algorithm). However, I want to minimize the cost of the costliest assignment, not minimize the sum of costs of the assignments.
Since you mentioned Hungarian Algorithm, I guess one thing you could do is using some different measure of distance rather than the euclidean distance and then run t Hungarian Algorithm on it. For example, instead of using
d = sqrt((x0 - x1) ^ 2 + (y1 - y0) ^ 2)
use
d = ((x0 - x1) ^ 2 + (y1 - y0) ^ 2) ^ 10
that could cause the algorithm to penalize big numbers heavily, which could constrain the length of the max distance.
EDIT: This paper "Geometry Helps in Bottleneck Matching and Related
Problems" may contains a better algorithm. However, I am still in the process of reading it.
I'm not sure that the Hungarian algorithm will work for your problem here. According to the link, it runs in n ^ 3 time. Plugging in 25,000 as n would yield 25,000 ^ 3 = 15,625,000,000,000. That could take quite a while to run.
Since the solution does not need to be optimal, you might consider using simulated annealing or possibly a genetic algorithm instead. Either of these should be much faster and still produce close to optimal solutions.
If using a genetic algorithm, the fitness function can be designed to minimize the longest period of time that an individual would need to wait. But, you would have to be careful because if that is the sole criteria, then the solution won't work too well for cases when there is just one cab that is closest to the passenger that is furthest away. So, the fitness function would need to take into account the other waiting times as well. One idea to solve this would be to run the model iteratively and remove the longest cab trip (both cab & person) after each iteration. But, doing that for all 10,000+ cabs/people could be expensive time wise.
I don't think any cab owner or manager would even consider minimizing the waiting time for the last customer entering his cab over minimizing the sum of the waiting time for all cabs - simply because they make more money overall when minimizing the sum of the waiting times. At least Louie DePalma would never do that... So, I suspect that the real problem you have has little or nothing to do with cabs...
A "good" algorithm that would solve your problem is a Greedy Algorithm. Since taxis and people have a position, these positions can be related to a "central" spot. Sort the taxis and people needing to get picked up in order (in relation to the "centre"). Then start assigning taxis, in order, to pick up people in order. This greedy rule will ensure taxis closest to the centre will pick up people closest to the centre and taxis farthest away pick up people farthest away.
A better way might be to use Dynamic Programming however, I am not sure nor have the time to invest. A good tutorial for Dynamic Programming can be found here
For an optimal solution: construct a weighted bipartite graph with a vertex for each taxi and customer and an edge from each taxi to each customer whose weight is the travel time. Scan the edges in order of nondecreasing weight, maintaining a maximum matching of the subgraph containing the edges scanned so far. Stop when the matching is perfect.

Travelling Salesman with multiple salesmen?

I have a problem that has been effectively reduced to a Travelling Salesman Problem with multiple salesmen. I have a list of cities to visit from an initial location, and have to visit all cities with a limited number of salesmen.
I am trying to come up with a heuristic and was wondering if anyone could give a hand. For example, if I have 20 cities with 2 salesmen, the approach that I thought of taking is a 2 step approach. First, divide the 20 cities up randomly into 10 cities for 2 salesman each, and I'd find the tour for each as if it were independent for a few iterations. Afterwards, I'd like to either swap or assign a city to another salesman and find the tour. Effectively, it'd be a TSP and then minimum makespan problem. The problem with this is that it'd be too slow and good neighborhood generation of swapping or assigning a city is hard.
Can anyone give an advise on how I could improve the above?
EDIT:
The geo-location for each city are known, and the salesmen start and end at the same places. The goal is to minimize the max travelling time, making this sort of a minimum makespan problem. So for example, if salesman1 takes 10 hours and salesman2 takes 20 hours, the maximum travelling time would be 20 hours.
TSP is a difficult problem. Multi-TSP is probably much worse. I'm not sure you can find good solutions with ad-hoc methods like this. Have you tried meta-heuristic methods ? I'd try using the Cross Entropy method first : it shouldn't be too hard to use it for your problem. Otherwise look for Generic Algorithms, Ant Colony Optimization, Simulated Annealing ...
See "A Tutorial on the Cross-Entropy Method" from Boer et al. They explain how to use the CE method on the TSP. A simple adaptation for your problem might be to define a different matrix for each salesman.
You might want to assume that you only want to find the optimal partition of cities between the salesmen (and delegate the shortest tour for each salesman to a classic TSP implementation). In this case, in the Cross Entropy setting, you consider a probability for each city Xi to be in the tour of salesman A : P(Xi in A) = pi. And you work, on the space of p = (p1, ... pn). (I'm not sure it will work very well, because you will have to solve many TSP problems.)
When you start talking about multiple salesmen I start thinking about particle swarm optimization. I've found a lot of success with this using a gravitational search algorithm. Here's a (lengthy) paper I found covering the topic. http://eprints.utm.my/11060/1/AmirAtapourAbarghoueiMFSKSM2010.pdf
Why don't you convert your multiple TSP into the traditional TSP?
This is a well-known problem (transforming multiple salesman TSP into TSP) and you can find several articles on it.
For most transformations, you basically copy your depot (where the salesmen start and finish) into several depots (in your case 2), make the edge weights in a way to force a TSP to come back to the depot twice, and then remove the two depots and turn them into one.
Voila! got two min cost tours that cover the vertices exactly once.
I wouldn't start writing an algorythm for such complicated issue (unless that's my day job - to write optimization algorythms). Why don't you turn to a generic solution like http://www.optaplanner.org/ ? You have to define your problem and the program uses algorithms that top developers took years to create and optimize.
My first thought on reading the problem description would be to use a standard approach for the salesman problem (googling for an appropriate one as I've never actually had to write code for it); Then take the result and split it in half. Your algorithm then could be to decide where "half" is -- maybe it is half of the cities, or maybe it is based on distance, or maybe some combination. Or search the result for the largest distance between two cities and choose that as the separation between salesman #1's last city and salesman #2's first city. Of course it does not limit to two salesman, you would break into x pieces; But overall the idea is that your standard 1 salesman TSP solution should have already gotten the "nearby" cities next to each other in the travel graph, so you don't have to come up with a separate grouping algorithm...
Anyway, I'm sure there are better solutions, but this seems like a good first approach to me.
Have a look at this question (562904) - while not identical to yours there should be some good food for thought and references for further study.
As mentioned in the answer above the hierarchical clustering solution will work very well for your problem. Instead of continuing to dissolve clusters until you have a single path, however, stop when you have n, where n is the number of salesmen you have. You can probably improve it some by adding in some "fake" stops to improve the likelihood that your clusters end up evenly spaced from the initial destination if the initial clusters are too disparate. It's not optimal - but you're not going to get an optimal solution for a problem like this. I'd create an app that visualizes the problem and then test many variants of the solution to get a feel for whether your heuristic is optimal enough.
In any case I would not randomize the clusters, that would cause the majority of the clusters to be sub-optimal.
just by starting to read your question using genetic algorithm came to my head. just use two genetic algorithm in the same time one can solve how to assign cities to salesmen and the other can solve the TSP for each salesman you have.

Travelling salesman with repeat nodes & dynamic weights

Given a list of cities and the cost to fly between each city, I am trying to find the cheapest itinerary that visits all of these cities. I am currently using a MATLAB solution to find the cheapest route, but I'd now like to modify the algorithm to allow the following:
repeat nodes - repeat nodes should be allowed, since travelling via hub cities can often result in a cheaper route
dynamic edge weights - return/round-trip flights have a different (usually lower) cost to two equivalent one-way flights
For now, I am ignoring the issue of flight dates and assuming that it is possible to travel from any city to any other city.
Does anyone have any ideas how to solve this problem? My first idea was to use an evolutionary optimisation method like GA or ACO to solve point 2, and simply adjust the edge weights when evaluating the objective function based on whether the itinerary contains return/round-trip flights, but perhaps somebody else has a better idea.
(Note: I am using MATLAB, but I am not specifically looking for coded solutions, more just high-level ideas about what algorithms can be used.)
Edit - after thinking about this some more, allowing "repeat nodes" seems to be too loose of a constraint. We could further constrain the problem so that, although nodes can be repeatedly visited, each directed edge can only be visited at most once. It seems reasonable to ignore any itineraries which include the same flight in the same direction more than once.
I haven't tested it myself; however, I have read that implementing Simulated Annealing to solve the TSP (or variants of it) can produce excellent results. The key point here is that Simulated Annealing is very easy to implement and requires minimal tweaking, while approximation algorithms can take much longer to implement and are probably more error prone. Skiena also has a page dedicated to specific TSP solvers.
If you want the cost of the solution produced by the algorithm is within 3/2 of the optimum then you want the Christofides algorithm. ACO and GA don't have a guaranteed cost.
Solving the TSP is a NP-hard problem for its subcycles elimination constraints, if you remove any of them (for your hub cities) you just make the problem easier.
But watch out: TSP has similarities with association problem in the meaning that you could obtain non-valid itineraries like:
Cities: New York, Boston, Dallas, Toronto
Path:
Boston - New York
New York - Boston
Dallas - Toronto
Toronto - Dallas
which is clearly wrong since we don't go across all cities.
The subcycle elimination constraints serve just to this purpose. Including a 'hub city' sounds like you need to add weights to the point and make an hybrid between flux problems and tsp problems. Sounds pretty hard but the first try may be: eliminate the subcycles constraints relative to your hub cities (and leave all the others). You can then link the subcycles obtained for the hub cities together.
Good luck
Firstly, what is approximate number of cities in your problem set? (Up to 100? More than 100?)
I have a fair bit of experience with GA (not ACO), and like epitaph says, it has a bit of gambling aspect. For some input, it might stop at a brutally inefficient solution. So, what I have done in the past is to use GA as the first option, compare the answer to some lower bound, and if that seems to be "way off", then run a second (usually a less efficient) algorithm.
Of course, I used plenty of terms that were not standard, so let us make sure that we agree what they would be in this context:
lower bound - of course, in this case, MST would be a lower bound.
"Way Off" - If triangle inequality holds, then an upper bound is UB = 2 * MST. A good "way off" in this context would be 2 * UB.
Second algorithm - In this case, both a linear programming based approach and Christofides would be good choices.
If you limit the problem to round-trips (i.e. the salesman can only buy round-trip tickets), then it can be represented by an undirected graph, and the problem boils down to finding the minimum spanning tree, which can be done efficiently.
In the general case I don't know of a clever way to use efficient algorithms; GA or similar might be a good way to go.
Do you want a near-optimal solution, or do you want the optimal solution?
For the optimal solution, there's still good ol' brute force. Due to requirement 1 involving repeat nodes, you'll have to make sure you search breadth-first, not dept-first. Otherwise you can end up in an infinite loop. You can slowly drop all routes that exceed your current minimum until all routes are exhausted and the minimal route is discovered.

Matching algorithm

Odd question here not really code but logic,hope its ok to post it here,here it is
I have a data structure that can be thought of as a graph.
Each node can support many links but is limited to a value for each node.
All links are bidirectional. and each link has a cost. the cost depends on euclidian difference between the nodes the minimum value of two parameters in each node. and a global modifier.
i wish to find the maximum cost for the graph.
wondering if there was a clever way to find such a matching, rather than going through in brute force ...which is ugly... and i'm not sure how i'd even do that without spending 7 million years running it.
To clarify:
Global variable = T
many nodes N each have E,X,Y,L
L is the max number of links each node can have.
cost of link A,B = Sqrt( min([a].e | [b].e) ) x
( 1 + Sqrt( sqrt(sqr([a].x-[b].x)+sqr([a].y-[b].y)))/75 + Sqrt(t)/10 )
total cost =sum all links.....and we wish to maximize this.
average values for nodes is 40-50 can range to (20..600)
average node linking factor is 3 range 0-10.
For the sake of completeness for anybody else that looks at this article, i would suggest revisiting your graph theory algorithms:
Dijkstra
Astar
Greedy
Depth / Breadth First
Even dynamic programming (in some situations)
ect. ect.
In there somewhere is the correct solution for your problem. I would suggest looking at Dijkstra first.
I hope this helps someone.
If I understand the problem correctly, there is likely no polynomial solution. Therefore I would implement the following algorithm:
Find some solution by beng greedy. To do that, you sort all edges by cost and then go through them starting with the highest, adding an edge to your graph while possible, and skipping when the node can't accept more edges.
Look at your edges and try to change them to archive higher cost by using a heuristics. The first that comes to my mind: you cycle through all 4-tuples of nodes (A,B,C,D) and if your current graph has edges AB, CD but AC, BD would be better, then you make the change.
Optionally the same thing with 6-tuples, or other genetic algorithms (they are called that way because they work by mutations).
This is equivalent to the traveling salesman problem (and is therefore NP-Complete) since if you could solve this problem efficiently, you could solve TSP simply by replacing each cost with its reciprocal.
This means you can't solve exactly. On the other hand, it means that you can do exactly as I said (replace each cost with its reciprocal) and then use any of the known TSP approximation methods on this problem.
Seems like a max flow problem to me.
Is it possible that by greedily selecting the next most expensive option from any given start point (omitting jumps to visited nodes) and stopping once all nodes are visited? If you get to a dead end backtrack to the previous spot where you are not at a dead end and greedily select. It would require some work and probably something like a stack to keep your paths in. I think this would work quite effectively provided the costs are well ordered and non negative.
Use Genetic Algorithms. They are designed to solve the problem you state rapidly reducing time complexity. Check for AI library in your language of choice.

Resources