I have a graph that represents a city. I know the location of places of interest (nodes, which have an Importance value), the location of the hotel I'm staying in, how the nodes are connected, and the traversal time between them, and I have access to latitude and longitude. There are no issues converting from time to distance and vice-versa.
The objective is to tour the city, maximizing the importance per day but limiting one day of travel to 10 hours. A day begins and ends at the hotel. I have a working A* algorithm that chooses the lowest value but has no heuristic yet, which I guess makes it a branch and bound (BB) for now. With that in mind:
Since I have access to Lat/Long, my first stab at a heuristic, while only dealing with times, would be the distance as the crow flies between a node and the hotel. Would this be an admissible heuristic? It gives me the shortest possible distance and time, so it wouldn't overestimate.
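One caveat: the straight-line distance is a lower bound on distance, but to stay admissible as a *time* it has to be converted using the fastest speed possible on any edge. A slower "typical" speed can overestimate. Below is a minimal sketch using the standard haversine formula. It assumes (lat, lon) pairs in degrees and a hypothetical `max_speed_kmh` parameter that you would set to the highest speed found anywhere in your graph.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def h_time_to_hotel(node, hotel, max_speed_kmh):
    """Lower bound on travel time (hours) from node to hotel, each given as (lat, lon).

    Assumption: max_speed_kmh is at least the fastest speed achievable on any edge;
    otherwise this estimate can exceed the real travel time and lose admissibility.
    """
    return haversine_km(*node, *hotel) / max_speed_kmh
```

For example, `h_time_to_hotel((0, 0), (0, 1), 50)` is roughly 2.22 hours, because one degree of longitude at the equator is about 111.2 km.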
Now let's say the Importance of a node is between 1 and 4. In order to factor it in, one idea could be g(neighbor) = g(current) + (edge_cost / Importance^2). Assuming this would be valid (if not, why?):
But now the heuristic values would be in a different unit. Could a solution to this simply be to give the Hotel Importance = 1? If the value is the same, will it still be admissible? EDIT: I think this will end up giving me problems because of the difference in scale.
I still have to restrict the total amount of time. Should each node keep track of the total time spent, in order to compare to the limit, plus the g() and h() values, because of the different units?
And finally:
Since I have to start and end in the same node, what comes to mind is to explore nodes and, should I reach the hotel, check whether I still have time to explore its neighbors instead of going back. However, if I have time to expand one more node but then can't get back to the hotel from it in time, I'm assuming I'll have to backtrack to the parent.
I can't help but see similarities to the knapsack problem. Even though I have to use A*, is there any lesson I can take from it?
Must my heuristic be consistent in this case? If so, why?
By the way, the purpose here is pathfinding first, optimizations second.
This actually looks like a combination of the travelling salesman problem (TSP) and the knapsack problem (KP). It's KP in this respect: the knapsack capacity is 10 (the total hours available in a day) and the locations are the items. An item's value equals the location's Importance. An item's weight is the time it takes to travel to the location (plus the location's share of the trip back to the hotel). The challenge arises from the fact that an item's weight is unknown until you solve the optimal tour through the selected locations: enter the TSP and pathfinding.
One approach might be to use a pathfinding algorithm (e.g. A*, Bellman–Ford, or Dijkstra's algorithm) primarily to compute a distance matrix between every pair of nodes. The distance matrix can then be leveraged while solving the TSP portion of the problem: finding a tour through the locations and using the total time as the weight.
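The distance-matrix step can be sketched by running Dijkstra once from each node of interest. This is a minimal sketch that assumes the city graph is a dict mapping each node to a list of `(neighbor, travel_time)` pairs. `dijkstra` and `distance_matrix` are hypothetical helper names.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel time from source to every reachable node.

    graph: dict node -> list of (neighbor, travel_time), travel times non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already settled with a shorter time
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def distance_matrix(graph, nodes):
    """All-pairs travel times between the given nodes (inf where unreachable)."""
    matrix = {}
    for s in nodes:
        dist = dijkstra(graph, s)  # one Dijkstra run per source
        matrix[s] = {t: dist.get(t, float("inf")) for t in nodes}
    return matrix
```

Only the hotel and the places of interest need to be rows and columns of the matrix. Intermediate street nodes are used during the Dijkstra runs and then discarded.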
The next step is up to you. If you are looking for an approximate solution, many heuristics exist for both TSP and KP: See Christofides TSP Heuristic, or the Minimum TSP and Maximum Knapsack entries at the Compendium of NP Optimization problems.
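For the approximate route, one simple option (not one of the cited heuristics, just a common greedy idea) is to keep inserting the location with the best Importance per hour of *added* travel time while the day stays under the limit. This is a minimal sketch assuming a precomputed all-pairs travel-time matrix `dist` (a dict of dicts that includes zero self-distances). `greedy_day_tour` is a hypothetical name, and the result carries no optimality guarantee.

```python
def greedy_day_tour(dist, importance, hotel, limit):
    """Greedy insertion: add the place with the best importance / extra-time ratio.

    dist: dict of dicts of travel times, including dist[x][x] == 0.
    importance: dict place -> Importance value (the hotel is not included).
    Returns (tour starting and ending at the hotel, total travel time used).
    """
    tour = [hotel, hotel]
    used = 0.0
    remaining = set(importance) - {hotel}
    while True:
        best = None  # (score, place, insert_position, extra_time)
        for place in remaining:
            for i in range(len(tour) - 1):
                a, b = tour[i], tour[i + 1]
                # Extra time caused by visiting `place` between a and b.
                extra = dist[a][place] + dist[place][b] - dist[a][b]
                if used + extra <= limit:
                    score = importance[place] / extra if extra > 0 else float("inf")
                    if best is None or score > best[0]:
                        best = (score, place, i + 1, extra)
        if best is None:  # nothing else fits in the day
            return tour, used
        _, place, pos, extra = best
        tour.insert(pos, place)
        used += extra
        remaining.remove(place)
```

The ratio plays the role of the "specific value" in greedy knapsack heuristics. The only TSP-specific part is that an item's weight is its insertion cost into the current tour.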
If, on the other hand, you seek an optimal solution, you may be out of luck. Still, I recommend you find a copy of Graph Theory: An Algorithmic Approach by Nicos Christofides (ISBN-13: 978-0121743505). It provides heuristics for early backtracking in a depth-first search that expedite the search for optimal solutions to several NP-complete problems.
I am using A* in order to solve the Asymmetric Traveling Salesman problem.
My state representation has 4 variables:
1 - Visited cities (List)
2 - Unvisited cities (List)
3 - Current City (Integer)
4 - Current Cost (Integer)
However, even though I find many path-construction algorithms such as Nearest Neighbor, k-opt and so on, I can't find a heuristic suitable for A*, that is, an h(n) function that takes a state as input and returns an integer corresponding to that state's quality.
So my question is, are there such heuristics? Any recommendations?
Thanks in advance
The weight of the minimum spanning tree of the subgraph that contains all unvisited vertices and the current vertex is a lower bound for the cost to finish the current path. It can be used with the A* algorithm, as it can't overestimate the remaining distance (otherwise, the weight of the remaining path would be smaller than the weight of the minimum spanning tree even though that path spans the given vertices, which is a contradiction).
I've never tried it though so I don't know how well it'll work in practice.
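A minimal sketch of that MST bound, using Prim's algorithm. Because the question is about the *asymmetric* TSP, this sketch assumes each undirected MST edge takes the cheaper of its two directions, min(cost[u][v], cost[v][u]). The remaining directed path can never be cheaper than that, so the bound stays admissible. `mst_heuristic` is a hypothetical name, and `cost` is assumed to be a full square matrix indexed by city number.

```python
def mst_heuristic(cost, current, unvisited):
    """Weight of an MST over {current} | unvisited (Prim's algorithm).

    cost[u][v]: asymmetric travel cost; each undirected edge uses the cheaper direction.
    """
    # best[v]: cheapest known edge connecting v to the tree built so far
    best = {v: min(cost[current][v], cost[v][current]) for v in unvisited}
    total = 0
    while best:
        v = min(best, key=best.get)  # closest vertex not yet in the tree
        total += best.pop(v)
        for u in best:
            w = min(cost[v][u], cost[u][v])
            if w < best[u]:
                best[u] = w
    return total
```

This costs O(k^2) per state for k unvisited cities. That is cheap next to the search itself, but it is recomputed for every state A* generates.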
There always are: h(n) = 0 always works. It is useless, turning A* into Dijkstra, but it's definitely admissible.
Another obvious one: let h(n) be the length of the shortest path from the current city back to the starting city (just the direct edge, if the distances obey the triangle inequality). It is still a huge underestimation, but at least it's not necessarily zero. It's obviously valid: the loop has to be closed eventually and (given this partial route) there is no shorter way to do it.
You can be a bit more clever here, for example you could use linear programming (make two variables for each edge, one for each direction, then for every city make a constraint forcing the sum of entering edges to be 1 and a constraint forcing the sum of exiting edges to be one, weights are obviously the distances) to find an underestimation of the length from the current node back to the beginning while touching every city in the set of unvisited cities. Of course if you're doing that, you might as well drop A* and just use the usual integer linear programming tricks. A* doesn't seem like a good fit here (especially in the beginning, the branching factor is too high and the heuristics won't guide it enough yet), but I haven't tried this so who knows.
Also, given the solution from the LP, you can improve it a lot by using some simple tricks (and some advanced tricks that whole books have been written about, but let's not go there; read the books if you want to know). For example, one thing the LP likes to do is form lots of little triangles. This satisfies the degree constraints everywhere locally and keeps everything nice and short. But it's not a tour, and forcing it to be more like a tour will make the heuristic higher (which is better). To remove the sub-tours, you can detect them in the fractional solution and then force the number of entries to the subgraph to be at least 1 (it may have to become more than 1 at some point, so don't force it to be exactly 1) and force the number of exits to be at least 1, by adding the corresponding constraints and solving again. There are many more tricks, but this should already give a very reasonable heuristic, much closer to the actual cost than using any of the overestimating heuristics and dividing them by their worst-case overestimation factor. The problem with those is that the heuristic is usually pretty good, much better than its worst-case factor, and then dividing by the worst-case factor really kills the quality of the heuristic.
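The sub-tour detection step can be sketched as a connected-components pass over the arcs that carry a nonzero LP value. This is a minimal sketch under two assumptions: the LP solution is available as a dict mapping directed arcs (u, v) to fractional values, and the arc lists returned by `cut_arcs` are what you would feed back to your LP solver as "sum of x over these arcs >= 1" constraints. `find_subtours` and `cut_arcs` are hypothetical names, and the solver call itself is not shown.

```python
def find_subtours(nodes, x, eps=1e-9):
    """Connected pieces of the LP support graph; returns [] if it is already connected.

    x: dict (u, v) -> fractional LP value of the directed arc u -> v.
    """
    adj = {v: set() for v in nodes}
    for (u, v), val in x.items():
        if val > eps:
            adj[u].add(v)
            adj[v].add(u)  # ignore direction: we only want the disconnected pieces
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative DFS over the support graph
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(comp)
    return components if len(components) > 1 else []


def cut_arcs(component, nodes):
    """Arcs entering and leaving a component (each sum is then constrained to be >= 1)."""
    outside = [v for v in nodes if v not in component]
    entering = [(u, v) for u in outside for v in component]
    leaving = [(u, v) for u in component for v in outside]
    return entering, leaving
```

For the "two little triangles" case described above, `find_subtours` returns both triangles. Each one then gets an entering constraint and a leaving constraint.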
I'm trying to reconcile two seemingly contradictory ideas:
The unbounded knapsack optimization problem is known to be NP-hard
My colleague and I think we can solve a minor variation on it in polynomial time using A*
Sounds crazy, right? That's what I think!
The variation of the problem is described in terms of a cargo plane that must unload some of its goods in order to reduce its payload to the plane's capacity. So there's a set of items, each with a weight and a value, and a target weight which must be unloaded: optimize the goods to unload so that you have at least W weight removed, and minimize the total value of the goods. Consider the unbounded problem, where there are arbitrarily many items available of each of N different types.
The proposed solution uses a graph which starts at a node (vertex) representing nothing unloaded. Each unload operation represents an edge, so the graph grows exponentially out from the starting point with every possible combination of goods unloaded. The destination node is a virtual aggregate in that all combinations with total weight >= the target are considered the target node. The total weight unloaded so far gets stored in each node and is used to determine whether the target has been reached or not. The cost of each edge is the value of the item being unloaded. So a shortest-path algorithm such as Dijkstra or A* will find the optimum set of goods.
Dijkstra clearly takes exponential time since it is exploring all possible combinations. But with an admissible heuristic, I think A* should run in polynomial time. And I think the following heuristic should work. For every good, calculate the "specific value", which is the ratio of value to weight. Pick the good with the lowest specific value (the cheapest value per unit of weight). As a heuristic for a given node, calculate the weight still needed to be unloaded times that minimum specific value. This provides an estimate which is either exactly correct in the case that the target weight can be achieved by an integer number of those goods, or in all other cases underestimates the distance (value) remaining, because the actual number of goods would need to be rounded up. So the heuristic is admissible.
I haven't proven runtime complexity in any rigorous way. But the way A* works, it will greedily add items towards the goal, exploring the best options quickly, which intuitively feels like it should run in polynomial time for N. And with a properly admissible heuristic the solution is guaranteed to be optimal.
So what's wrong with this solution? I absolutely do not believe we have found a novel solution to a well-studied problem by applying a well-known algorithm. But this seems like it should work.
This sounds like the standard branch and bound method for knapsack. It's good when there's variety in the ratios but devolves to exponential-time brute force when the ratios are the same.
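A minimal sketch of the A* described in the question, using the lowest value-per-weight ratio as the heuristic, which keeps the estimate from exceeding the true remaining cost. The sketch makes one change to the question's graph: it merges every node with the same total weight unloaded (capped at the target), which is an assumption on my part. With that merge there are at most W + 1 states, so the search is pseudo-polynomial in W rather than polynomial in the input size, which matches the knapsack problem's known complexity. `min_value_unload` is a hypothetical name, and weights are assumed to be positive integers.

```python
import heapq
from fractions import Fraction


def min_value_unload(items, target):
    """Minimum total value of goods whose total weight is at least `target`.

    items: list of (weight, value) with positive integer weights; unlimited copies of each.
    """
    # Cheapest value per unit of weight, so remaining * best_ratio never overestimates.
    best_ratio = min(Fraction(v, w) for w, v in items)

    def h(unloaded):
        return max(0, target - unloaded) * best_ratio

    frontier = [(h(0), 0, 0)]  # (f = g + h, g = value unloaded so far, weight unloaded)
    best_g = {0: 0}
    while frontier:
        _, g, unloaded = heapq.heappop(frontier)
        if unloaded >= target:
            return g  # h is consistent, so the first goal popped is optimal
        if g > best_g.get(unloaded, float("inf")):
            continue  # stale entry
        for w, v in items:
            nxt = min(unloaded + w, target)  # all weights >= target form one goal node
            ng = g + v
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None
```

When all the ratios are equal, h ranks almost every partial load the same, so the search degenerates toward exploring everything. This is the failure mode the answer describes.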
If you have the full bus schedule for a country, how can you find the furthest anyone can travel in one day without visiting the same stop twice?
I assume a bus schedule gives you the full list of leaving and arriving times for every bus stop.
A slow and naive method would be as follows.
You can of course make a graph from the bus schedule with multiple directed edges between bus stops. You could then do a depth first search remembering the arrival time of the edge you took to get to each node and only taking edges from that stop that leave after the one that you took to get there. If you go to a node you have been to before you would only carry on from there if the current time in your traversal is before the earliest time you had ever visited that node before. You could record the furthest you can get from each node and then you could check each node to find the furthest you can travel overall.
This seems very inefficient however and it really isn't a normal graph problem. The problem is that in a normal directed graph if you can get from A to B and from B to C then you can get from A to C. This isn't true here.
What is the fastest you can solve this problem?
I think your original algorithm is pretty good.
You can think of your approach as being a version of Dijkstra's algorithm, in attempting to find the shortest path to each node.
Note that it is best at this stage to weight edges in the graph in terms of time. The idea is to use your Dijkstra-like algorithm to compute all nodes reachable within one day's worth of time, and then pick whichever of these nodes is furthest in space from the start point.
Implementations of Dijkstra can use a heap to retrieve the next node to explore in O(log n), and I think this would be a good enhancement to your approach as well. If you always choose the node that you can reach earliest, you never need to repeat the calculation for that node.
Overall the approach is:
For each starting point
Use a modified Dijkstra to compute all nodes reachable in 1 day
Find the furthest in space of all these nodes.
So for n starting points and e bus routes, the complexity is about O(n(n+e)log(n)) to get the optimal answer.
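A minimal sketch of the modified Dijkstra for one starting point. It always settles the stop that can be reached earliest and only boards buses departing after the current arrival time. Assumptions: trips are `(from_stop, depart, to_stop, arrive)` tuples with times in minutes, and stop coordinates are planar `(x, y)` pairs, so the "furthest in space" step uses Euclidean distance. `reachable_within_day` and `furthest_stop` are hypothetical names.

```python
import heapq
import math


def reachable_within_day(trips, start, t0, deadline):
    """Earliest arrival time at every stop reachable from `start` by `deadline`."""
    by_stop = {}
    for trip in trips:
        by_stop.setdefault(trip[0], []).append(trip)
    earliest = {start: t0}
    heap = [(t0, start)]
    while heap:
        t, stop = heapq.heappop(heap)
        if t > earliest[stop]:
            continue  # we already reached this stop earlier
        for _, dep, nxt, arr in by_stop.get(stop, ()):
            # Board only buses leaving after we arrive and arriving before the deadline.
            if dep >= t and arr <= deadline and arr < earliest.get(nxt, math.inf):
                earliest[nxt] = arr
                heapq.heappush(heap, (arr, nxt))
    return earliest


def furthest_stop(trips, coords, start, t0, deadline):
    """Reachable stop that is furthest (straight line) from the start."""
    reached = reachable_within_day(trips, start, t0, deadline)
    sx, sy = coords[start]
    return max(reached, key=lambda s: math.hypot(coords[s][0] - sx, coords[s][1] - sy))
```

An earliest-arrival path never needs to revisit a stop, since coming back to a stop can only happen at a later time. So the "same stop twice" condition is respected automatically.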
You should be able to get improved performance by using an appropriate heuristic in an A* search. Because the search is maximizing distance, the heuristic must never underestimate the distance still reachable from a point, so you could use the maximum speed of a bus multiplied by the remaining time.
Instead of making multiple edges for each departure from a location, you can make multiple nodes per location / time.
Create one node per location per departure time.
Create one node per location per arrival time.
Create edges to connect departures to arrivals.
Create edges to connect a given node to the node belonging to the same location at the nearest future time.
By doing this, any path you can traverse through the graph is "valid" (meaning a traveler would be able to achieve this by a combination of bus trips or choosing to sit at a location and wait for a future bus).
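The construction above can be sketched in a few lines. This is a minimal sketch that assumes trips are `(from_stop, depart, to_stop, arrive)` tuples. It also simplifies the answer in one way: an arrival and a departure at the same stop and the same time share a single node, which assumes zero-minute transfers are allowed. `time_expanded_graph` is a hypothetical name.

```python
from collections import defaultdict


def time_expanded_graph(trips):
    """Nodes are (stop, time); edges are bus rides or waits at the same stop.

    Returns adjacency: node -> list of (next_node, kind), kind being "ride" or "wait".
    """
    times = defaultdict(set)
    adj = defaultdict(list)
    for frm, dep, to, arr in trips:
        times[frm].add(dep)
        times[to].add(arr)
        adj[(frm, dep)].append(((to, arr), "ride"))  # departure -> arrival
    for stop, ts in times.items():
        ordered = sorted(ts)
        for a, b in zip(ordered, ordered[1:]):
            adj[(stop, a)].append(((stop, b), "wait"))  # wait for the next event at this stop
    return adj
```

Any path through the result (for example A@0 → B@30 → B@40 → C@100) corresponds to a sequence of rides and waits that a traveler could actually carry out.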
Sorry to say, but as described this problem has a pretty high complexity. I misread the problem originally and thought it was NP-hard, but it is not. It does, however, have a pretty high complexity that I personally would not want to deal with. The algorithm below is a pretty good approximation that gives considerable complexity savings, which I personally think is worth it.
However, if all you want is an answer that is "pretty good", there are a lot of fairly efficient algorithms out there that will get close very quickly.
Personally I would suggest using a simple greedy algorithm here.
I've done this on a few (granted, small and contrived) examples and it's worked pretty well, with O(n log n) efficiency.
Associate a velocity with each node, velocity being the fastest you can move away from a given node. In my examples this velocity was distance_travelled/(wait_time + travel_time). I used the maximum velocity of all trips leaving a node as the velocity score for that node.
From your node/time calculate the velocities of all neighboring nodes and travel to the "fastest" node.
This algorithm is pretty good for the complexity as it basically transforms the problem into a static search, but there are a couple potential pitfalls that could be adjusted for depending on your data set.
The biggest issue with this algorithm is the possibility of a really fast bus going into the middle of nowhere. You could get around that by adding a "popularity" term to the velocity calculation (make more popular stops effectively faster) but depending on your data set that could easily make things either better or worse.
The simplistic graph representation, i.e., each city is a node and the edges represent time, will not work. That's because the "edge" is not always active: it is only active at certain times of the day.
The second thing that comes to mind is Edward Tufte's Paris Train Schedule, which is a different kind of graph. But that does not quite fit the problem either. With the train schedule, the stations have a sequential relationship, but that's not the case in general with cities and bus schedules.
But Tufte motivates the following way to model it as a graph. You could write code only to construct the graph and use a standard graph library that includes the shortest path algorithm.
Each bus trip is an edge with weight = distance covered
Each (city, departure) and (city, arrival) is a node
All nodes for a given city are connected by zero-weight edges in a time-ordered sequence, ignoring whether it is an arrival or a departure. This subgraph will look like a chain.
(it is a directed graph)
Linear Time Solution: Note that the graph will be a directed, acyclic graph. Finding the longest path in such a graph is linear. "A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G."
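In a DAG you don't actually need to negate the weights and run a general shortest-path algorithm. A single pass in topological order (Kahn's algorithm here) computes the longest path in linear time. This is a minimal sketch that assumes edges are given as `(u, v, weight)` triples with non-negative weights, such as the distance-weighted ride edges and zero-weight chain edges described above. `longest_path_dag` is a hypothetical name.

```python
from collections import defaultdict, deque


def longest_path_dag(edges):
    """Maximum total weight of any path in a DAG given as (u, v, weight) triples."""
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    best = {v: 0 for v in nodes}  # best[v]: longest path ending at v (a path may start anywhere)
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:  # Kahn's algorithm: each node is processed after all its predecessors
        u = queue.popleft()
        for v, w in adj[u]:
            best[v] = max(best[v], best[u] + w)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(best.values(), default=0)
```

Each node and edge is handled once, hence O(V + E).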
Hope this helps! If somebody can post a visualization of the graph, it would be nice. If I can do so myself, I will do 1 more edit.
Naive is the best you'll get: http://en.wikipedia.org/wiki/Longest_path_problem
EDIT:
So the problem is twofold.
Create a list of graphs where it's possible to travel from pointA to pointB. Possible is in terms of times available for busA to travel from pointA to pointB.
Find longest path from all the possible generated path above.
Another approach would be to reevaluate the graph upon each node traversal and find the longest path.
It still reduces to finding longest possible path, which is NP-Hard.
There is a network of towns, connected by roads of various integer lengths.
A traveler wishes to travel in his car from one town to another. However, he does not want to minimize distance traveled; instead he wishes to minimize the petrol cost of the journey. Petrol can be bought in any city, however each city supplies petrol at various (integer) prices (hence why the shortest route is not necessarily the cheapest). 1 unit of petrol enables him to drive for 1 unit of distance.
His car can only hold so much petrol in the tank, and he can choose how many units of petrol to purchase at each city he travels through. Find the minimum petrol cost.
Does anyone know an efficient algorithm that could be used to solve this problem? Even the name of this type of problem would be useful so that I can research it myself! Obviously it's not quite the same as a shortest path problem. Any other tips appreciated!
EDIT - the actual problem I have states that there will be <1000 cities; <10000 roads; and the petrol tank capacity will be somewhere between 1 and 100.
You could solve this directly using Dijkstra's algorithm if you are happy to increase the size of the graph.
Suppose your petrol tank could hold from 0 to 9 units of petrol.
The idea would be to split each town into 10 nodes, with node x for town t representing being at town t with x units of petrol in the tank.
You can then construct zero-cost edges on this expanded graph to represent travelling between different towns (using up petrol in the process so you would go from a level 8 node to a level 5 node if the distance was 3), and more edges to represent filling up the tank at each town with one unit of petrol (with cost depending on the town).
Then applying Dijkstra should give the lowest cost path from the start to the end.
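A minimal sketch of that expanded-graph idea. Nodes are `(town, fuel)` states. Driving is a free edge that uses up petrol, and buying one unit at the current town is an edge costing that town's price. Assumptions: roads are a dict `town -> list of (neighbor, length)` with integer lengths, prices are a dict `town -> price per unit`, and the car starts with an empty tank. `cheapest_trip` is a hypothetical name.

```python
import heapq


def cheapest_trip(roads, price, capacity, start, goal):
    """Minimum petrol cost from start to goal; None if the goal is unreachable."""
    best = {(start, 0): 0}
    heap = [(0, start, 0)]  # (cost so far, town, units in tank)
    while heap:
        cost, town, fuel = heapq.heappop(heap)
        if town == goal:
            return cost  # Dijkstra pops states in cost order, so this is minimal
        if cost > best[(town, fuel)]:
            continue  # stale entry
        moves = []
        if fuel < capacity:
            moves.append((cost + price[town], town, fuel + 1))  # buy one unit here
        for nxt, length in roads.get(town, ()):
            if length <= fuel:
                moves.append((cost, nxt, fuel - length))  # drive: free, but uses petrol
        for ncost, ntown, nfuel in moves:
            if ncost < best.get((ntown, nfuel), float("inf")):
                best[(ntown, nfuel)] = ncost
                heapq.heappush(heap, (ncost, ntown, nfuel))
    return None
```

With fewer than 1000 cities and a tank of at most 100 units, this is at most about 100,000 states, which Dijkstra handles comfortably.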
I think the question is: Is there a chance the petrol stuff makes the underlying traveling salesman problem computationally more feasible? If not, there is no efficient non-approximating algorithm.
Of course, you can find efficient solutions for edge cases, and there might be more edge cases with the petrol condition, as in, always take this city first because the petrol is so cheap.
I think you can solve this with dynamic programming. For each node, you save an array of tuples of petrol cost and the length of the path where you use that petrol, containing the optimal solution. At every step you loop through all nodes, and if there is a node you can go to which already has a solution, you loop through all the nodes you can go to with a solution. You select the minimum cost, but note: you have to account for the petrol cost in the current node. All costs in the array that are higher than the cost in the current node can instead be bought at the current node. Note that nodes which already have a solution should be recalculated, as the nodes you can go to from there could change. You start with the end node, setting the solution to an empty array (or one entry with cost and length 0). The final solution is to take the solution at the beginning and sum up every cost * length.
I'd try this:
Find the shortest route from start to destination. Dijkstra's algorithm is appropriate for this.
Find the minimum cost of petrol to travel this route. I'm not aware of any off-the-shelf algorithm for this, but unless there are many cities along the route even an exhaustive search shouldn't be computationally infeasible.
Find the next shortest route ...
Defining precise stopping criteria is a bit of a challenge; it might be best just to stop once the minimum cost found for a newly tested route is greater than the minimum cost for a route already tested.
So, use 2 algorithms, one for each part of the problem.
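For the second step, the minimum petrol cost along one fixed route, a small dynamic program over fuel levels is one alternative to exhaustive search. This is a minimal sketch assuming integer road lengths, an empty tank at the first city, and a route given as consecutive road lengths plus the price at each city. `min_cost_on_route` is a hypothetical name.

```python
def min_cost_on_route(lengths, prices, capacity):
    """Cheapest petrol cost along a fixed route; None if the route is infeasible.

    lengths[i]: distance from city i to city i + 1; prices[i]: price per unit at city i.
    """
    INF = float("inf")
    cost = [0] + [INF] * capacity  # cost[f]: cheapest way to be at this city with f units
    for length, price in zip(lengths, prices):
        # Buy: ending with f units costs min over f0 <= f of cost[f0] + (f - f0) * price.
        after = cost[:]
        for f in range(1, capacity + 1):
            after[f] = min(after[f], after[f - 1] + price)
        # Drive to the next city, consuming `length` units.
        cost = [INF] * (capacity + 1)
        for f in range(length, capacity + 1):
            cost[f - length] = after[f]
    best = min(cost)
    return None if best == INF else best
```

Each route costs O(k * capacity) for k road segments, so testing many candidate routes stays cheap even with a 100-unit tank.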
This might be optimized suitably well using a Genetic Algorithm. Genetic Algorithms beat humans at some complex problems:
http://en.wikipedia.org/wiki/Genetic_algorithm
The gist of a Genetic Algorithm is:
1. Come up with a ranking function for candidate solutions.
2. Come up with a pool of unique candidate solutions. Initialize it with some randomly generated possibilities: maybe 10 or 100 or 1000...
3. Copy a candidate solution from the pool and perturb it in some way: add a town, remove a town, add two towns, etc. This might improve or worsen matters; your ranking function will help you tell. Which one do you pick? Usually you pick the best, but once in a while you intentionally pick one that's not, to avoid getting stuck in a local optimum.
4. Has the new solution already been ranked? If yes, junk it and go to #3. If no, continue...
5. Add the perturbed candidate back to the pool under its newly calculated rank.
6. Keep going at this (repeat from #3) until you feel you've done it long enough.
7. Finally, select the answer with the best rank. It may not be optimal, but it should be pretty good.
You could also formulate that as an integer linear programming (ILP) problem. The advantage is that there are a number of off-the-shelf solvers for this task, and the complexity won't grow as fast with the size of the tank as it does in Peter's solution.
The variables in this particular problem will be the amounts of petrol purchased in any one town, the amount in the car's tank in any town on the way, and the actual roads taken.
The constraints will have to guarantee that the car spends the necessary fuel on every road, does not have less than 0 or more than MAX units of fuel in any town, and that the roads constitute a path from A to B.
The objective will be the total cost of the fuel purchased.
The whole thing may look monstrous (ILP formulations often do), but it does not mean it cannot be solved in a reasonable time.
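One way the formulation described above might look, as a sketch under stated assumptions. The symbols are mine, not the answer's:
- Each road gives two directed arcs in A, with length d_uv.
- x_uv is 1 if arc (u, v) is used.
- b_v is the petrol bought at town v, at price p_v.
- a_v is the petrol in the tank on arrival at v.
- C is the tank capacity (MAX above), s and t are the start and destination towns (A and B above), and [·] is 1 when its condition holds and 0 otherwise.

The sketch also assumes the car starts empty and visits each town at most once, since there is a single a_v per town.

```latex
\begin{aligned}
\min\quad & \sum_{v \in V} p_v \, b_v \\
\text{s.t.}\quad & \sum_{(v,w) \in A} x_{vw} - \sum_{(u,v) \in A} x_{uv} = [v = s] - [v = t] && \forall v \in V \\
& a_v + b_v \le C && \forall v \in V \\
& a_v \le a_u + b_u - d_{uv} + (C + d_{uv})(1 - x_{uv}) && \forall (u,v) \in A \\
& a_s = 0, \quad a_v \ge 0, \quad b_v \ge 0 && \forall v \in V \\
& x_{uv} \in \{0, 1\} && \forall (u,v) \in A
\end{aligned}
```

When x_uv = 1, the third constraint together with a_v ≥ 0 forces enough petrol to cover the road. When x_uv = 0, its right-hand side is at least C, so it has no effect. The inequality lets the model throw petrol away, which can only raise the cost, so the optimum is unaffected. A route that must pass through the same town twice (for example, a detour to a cheap station and back) is not representable in this sketch.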
Say, we have a circular list representing a solution of the traveling salesman problem. This list is initially empty.
If the user is allowed to enter cities and their coordinates one by one, what heuristics could be used to insert those coordinates into the already existing tour?
An example is the nearest neighbor heuristic: it inserts the new coordinate after the nearest coordinate already in the tour.
What are some other options (pseudo-code if possible)?
There are plenty of construction heuristics you can use, such as First Fit, First Fit Decreasing, Best Fit, Best Fit Decreasing and Cheapest Insertion.
These construction heuristics are normally applied to bin packing, but they can be converted to TSP too. Documentation about those heuristics is here.
Since you're only inserting 1 unassigned entity at a time, all of these basically revert to what you call the nearest neighbor heuristic (with a slight variation on ties), but note that that is not what they usually call Nearest Neighbor. Nearest Neighbor always appends, to the end of the line, the nearest of all unassigned entities.
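Cheapest Insertion, applied to one new point at a time, can be sketched as follows. Try every gap between consecutive tour points (including the wrap-around gap) and insert the point where it lengthens the tour the least. Unlike the nearest-coordinate rule, this looks at both neighbours of each gap. It is a minimal sketch assuming planar `(x, y)` points and Python 3.8+ for `math.dist`. `cheapest_insertion` is a hypothetical name.

```python
import math


def cheapest_insertion(tour, point):
    """Insert `point` into the circular tour where it adds the least length (in place)."""
    if not tour:
        tour.append(point)
        return tour
    best_pos, best_extra = None, math.inf
    for i in range(len(tour)):
        a, b = tour[i], tour[(i + 1) % len(tour)]  # consecutive points, wrapping around
        extra = math.dist(a, point) + math.dist(point, b) - math.dist(a, b)
        if extra < best_extra:
            best_pos, best_extra = i + 1, extra
    tour.insert(best_pos, point)
    return tour
```

For example, inserting (1, -1) into the square [(0, 0), (2, 0), (2, 2), (0, 2)] places it between (0, 0) and (2, 0). Each insertion is O(n), so entering n cities one by one costs O(n^2) overall.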
Now, what you really want is a decent solution without having to restart your entire construction heuristic. That's harder: welcome to repeated planning and real-time planning (and this documentation). I am working on an open source example for TSP and vehicle routing that does real-time planning.
You can of course generalize the idea you have mentioned:
Define k'th_path(v) = minimum weight of a path including min{k, not_visited cities} cities
Note that calculating the k'th path is O(|V|^k) [this bound is not tight]
Special cases:
For k=1 you get the nearest neighbor, as you suggested.
For k=|V| you get an optimal solution [note it will be very expensive to calculate].
There are no other heuristics, because TSP is always about finding the nearest coordinate. At least, I don't know an algorithm that can insert a coordinate and knows the nearest coordinate, but there are plenty of algorithms to find a good tour. A good heuristic is, for example, the Christofides algorithm. It only works on metric instances (distances obeying the triangle inequality, e.g. Euclidean space), but it gives you a guarantee that the solution is within 3/2 of the optimum. It's not very easy to code; in particular, Edmonds' Blossom V algorithm for the matching step requires expert skill. The importance of such a guarantee can hardly be overstated: without one, how would you explain that your method can deliver nonsense in some rare situation?