Travelling by bus - algorithm

If you have the full bus schedule for a country, how can you find the
furthest anyone can travel in one day without visiting the same stop twice?
I assume a bus schedule gives you the full list of leaving and arriving times for every bus stop.
A slow and naive method would be as follows.
You can of course make a graph from the bus schedule with multiple directed edges between bus stops. You could then do a depth first search remembering the arrival time of the edge you took to get to each node and only taking edges from that stop that leave after the one that you took to get there. If you go to a node you have been to before you would only carry on from there if the current time in your traversal is before the earliest time you had ever visited that node before. You could record the furthest you can get from each node and then you could check each node to find the furthest you can travel overall.
This seems very inefficient however and it really isn't a normal graph problem. The problem is that in a normal directed graph if you can get from A to B and from B to C then you can get from A to C. This isn't true here.
What is the fastest you can solve this problem?

I think your original algorithm is pretty good.
You can think of your approach as being a version of Dijkstra's algorithm, in attempting to find the shortest path to each node.
Note that it is best at this stage to weight edges in the graph in terms of time. The idea is to use your Dijkstra-like algorithm to compute all nodes reachable within 1 days worth of time, and then pick whichever of these nodes is furthest in space from the start point.
Implementations of Dijkstra can use a heap to retrieve the next node to explore in O(logn), and I think this would be a good enhancement to your approach as well. If you always choose the node that you can reach earliest, you never need to repeat the calculation for that node.
Overall the approach is:
For each starting point
Use a modified Dijkstra to compute all nodes reachable in 1 day
Find the furthest in space of all these nodes.
So for n starting points and e bus routes, the complexity is about O(n(n+e)log(n)) to get the optimal answer.
You should be able to get improved performance by using an appropriate heuristic in an A* search. The heuristic needs to underestimate the max distance possible from a point, so you could use the maximum speed of a bus multiplied by the remaining time.

Instead of making multiple edges for each departure from a location, you can make multiple nodes per location / time.
Create one node per location per departure time.
Create one node per location per arrival time.
Create edges to connect departures to arrivals.
Create edges to connect a given node to the node belonging to the same location at the nearest future time.
By doing this, any path you can traverse through the graph is "valid" (meaning a traveler would be able to achieve this by a combination of bus trips or choosing to sit at a location and wait for a future bus).

Sorry to say, but as described this problem has a pretty high complexity. Misread the problem originally and thought it was np-hard, but it is not. It does however have a pretty high complexity that I personally would not want to deal with. This algorithm is a pretty good approximation that give a considerable complexity savings that I personally think it worth it.
However, if all you want is an answer that is "pretty good" there are are lot of fairly efficient algorithms out there that will get close very quickly.
Personally I would suggest using a simple greedy algorithm here.
I've done this on a few (granted, small and contrived) examples and it's worked pretty well and has an nlog(n) efficiency.
Associate a velocity with each node, velocity being the fastest you can move away from a given node. In my examples this velocity was distance_travelled/(wait_time + travel_time). I used the maximum velocity of all trips leaving a node as the velocity score for that node.
From your node/time calculate the velocities of all neighboring nodes and travel to the "fastest" node.
This algorithm is pretty good for the complexity as it basically transforms the problem into a static search, but there are a couple potential pitfalls that could be adjusted for depending on your data set.
The biggest issue with this algorithm is the possibility of a really fast bus going into the middle of nowhere. You could get around that by adding a "popularity" term to the velocity calculation (make more popular stops effectively faster) but depending on your data set that could easily make things either better or worse.

The simplistic graph representation will not work. I. e. each city is a node and the edges represent time. That's because the "edge" is not always active -- it is only active at certain times of the day.
The second thing that comes to mind is Edward Tufte's Paris Train Schedule which is a different kind of graph. But that does not quite fit the problem either. With the train schedule, the stations have a sequential relationship between stations, but that's not the case in general with cities and bus schedules.
But Tufte motivates the following way to model it as a graph. You could write code only to construct the graph and use a standard graph library that includes the shortest path algorithm.
Each bus trip is an edge with weight = distance covered
Each (city, departure) and (city, arrival) is a node
All nodes for a given city are connected by zero-weight edges in a time-ordered sequence, ignoring whether it is an arrival or a departure. This subgraph will look like a chain.
(it is a directed graph)
Linear Time Solution: Note that the graph will be a directed, acyclic graph. Finding the longest path in such a graph is linear. "A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G."
Hope this helps! If somebody can post a visualization of the graph, it would be nice. If I can do so myself, I will do 1 more edit.

Naive is the best you'll get -- http://en.wikipedia.org/wiki/Longest_path_problem
EDIT:
So the problem is two fold.
Create a list of graphs where its possible to travel from pointA to pointB. Possible is in terms of times available for busA to travel from pointA to pointB.
Find longest path from all the possible generated path above.
Another approach would be to reevaluate the graph upon each node traversal and find the longest path.
It still reduces to finding longest possible path, which is NP-Hard.

Related

Create the lowest-cost path between two vertices

I have a weighted undirected graph. Given two vertices in that graph that have no path between them, I want to create a path between then by adding edges to the graph, increasing the total weight of the graph by as little as possible. Is there a known algorithm for determining which edges to add?
An analogous problem would be if I have a graph of a country's road system, where there are two cities that are inaccessible by road from each-other, and I want to build the shortest set of new roads that will connect them. There may be other cities in between them that are connected to neither, and if they exist I want to take advantage of them.
Here's a little illustration; red and green are the vertices I want to connect, black lines are existing edges, and the blue line represents the path I want to exist.
Is there a known algorithm that gives the edges that are missing from that path?
You could use the A* algorithm with a weight of zero for existing edges and distance (or whatever cost makes sense) for missing edges.
https://en.wikipedia.org/wiki/A*_search_algorithm
Actually Dijkstra with some preprocessing is best friend for you.
I answered problem that can be similar to this one: What is the time complexity if it needs to revisit visited nodes in BFS?
What I see in common - you want to use as much existing roads as possible. Also you sometimes need to break that and build new road. The point is in setting the proper weights for existing ones and for "possibly new ones".
What would be my approach - lets say, that each 1 km of existing road has cost of 1. You can sum all existing roads in your graph and lets say there are 1000km in total. Then I would preprocess whole graph and from each node(city) I would create path to all other non-diretly-connected cities and each of them would cost 1000 + 1000 per kilometer and then run Dijkstra on it.
It will automatically use as much existing roads as possible and create as less new roads as possible.
Also you can play with that settings a bit to achieve what you want.
Imagine there are two cities that are only 100m away from each other. And there is actually path between them from existing roads that takes 20 000km. With the settings I have suggested you will get 20000km path as the best one (which satisfy need of "dont build any new roads if not necessary"). Sometimes you actually want this. Sometimes you do not. In case you do not, you can think about "ok, if I build like a little of extra road and it dramatically lowers the distance, its still better solution". If this is the case, you can think about lower the price of new roads (like removing the initial cost, or per kilometer cost or both of it - it depends on what you take as best output).
I don't think that there's an accepted algorithm. But you could try and do the following. First run Vitterbi Triangulation, then run a depth first search on this fully connected graph. Take the sum of the new links in the path from A -> B. Remove the longest new link, and repeat. Once the path from A -> B cannot be reached, check the previous solutions to see which of the solutions had the smallest sum of new links.

A* - Graph Traversal Heuristic

I have a graph that represents a city. I know the location of places of interest (nodes, which have a Importance value), the location of the hotel I'm staying in, how the nodes are connected, the traversal time between them and have acess to latitude and longitude. There are no issues converting from time to distance and vice-versa.
The objective is to tour the city, maximizing the importance per day but limiting one day of travel to 10 hours. A day begins and ends at the hotel. I have a working A* algorithm that chooses the lowest value but with no heuristic yet, which I guess makes it a BB for now. With that in mind:
Since I have access to Lat/Long, my first stab at an heuristic, while
only dealing with times, would be the distance as the crow flies
between a node and the hotel. Would this be an admissible heuristic?
It gives me the shortest possible distance and time, so it wouldn't
overestimate.
Now let's say the Importance of a node is between 1-4. In order to factor it in, one idea could be g(neighbor) = g(current) + (edge_cost / Importance^2). Assuming this would be valid (if not, why?):
But now the heuristic values would be in a different unit. Could a solution to this simply be give the Hotel Importance = 1? If the value is the same, will it still be admissible? EDIT: I think this will end up giving me problems because of the difference in scale.
I still have to restrict the total amount of time. Should each node keep track of the total time spent, in order to compare to the limit, plus the g() and h() values, because of the different units?
And finally:
Since I have to start and end in the same node, what comes to mind is to explore a node and should I find the hotel see if I still have time to explore the neighbors instead of going back. However, if I still have time to expand to one more node, but time runs out and I can't get to the hotel from there, I'm assuming I'll have to backtrack to the parent.
I can't help but see similarities to the knapsack problem. Even though I have to use A*, is there any lesson I can take from it?
Must my heuristic be consistent in this case? If so, why?
By the way, the purpose here is pathfinding first, optimizations second.
This actually looks like a combination of the travelling salesman problem (TSP) and knapsack problem (KP). It's KP in this respect: the knapsack capacity is 10 (for total hours available in a day) and the locations are the items. The item value equals the location value. The item weight is equal to the time it takes to travel to the location (plus the location's portion of the trip back to the hotel). The challenge arises from the fact that an item's weight is unknown until you solve the optimal tour through the selected locations--enter the TSP and Pathfinding.
One approach might be to use a pathfinding algorithm (e.g. A*, Bellman–Ford, or Dijkstra's algorithm) primarily to compute a distance matrix between each node. The distance matrix can then be leveraged while solving the TSP portion of the problem: finding a tour through the locations and using the total time as the weight.
The next step is up to you. If you are looking for an approximate solution, many heuristics exist for both TSP and KP: See Christofides TSP Heuristic, or the Minimum TSP and Maximum Knapsack entries at the Compendium of NP Optimization problems.
If on the other hand you seek an optimal solution, you may be out of luck. Still I recommend you find a copy of Graph Theory. An Algorithmic Approach by Nicos Christofides (ISBN-13: 978-0121743505). It provides heuristics for early backtracking in a Depth-First-Search that expedite the search for optimal solutions to several NP-Complete problems.

Graph search problem with route restrictions

I want to calculate the most profitable route and I think this is a type of traveling salesman problem.
I have a set of nodes that I can visit and a function to calculate cost for traveling between nodes and points for reaching the nodes. The goal is to reach a fixed known score while minimizing the cost.
This cost and rewards are not fixed and depend on the nodes visited before.
The starting node is fixed.
There are some restrictions on how nodes can be visited. Some simplified examples include:
Node B can only be visited after A
After node C has been visited, D or E can be visited. Visiting at least one is required, visiting both is permissible.
Z can only be visited after at least 5 other nodes have been visited
Once 50 nodes have been visited, the nodes A-M will no longer reward points
Certain nodes can (and probably must) be visited multiple times
Currently I can think of only two ways to solve this:
a) Genetic Algorithms, with the fitness function calculating the cost/benefit of the generated route
b) Dijkstra search through the graph, since the starting node is fixed, although the large number of nodes will probably make that not feasible memory wise.
Are there any other ways to determine the best route through the graph? It doesn't need to be perfect, an approximated path is perfectly fine, as long as it's error acceptable.
Would TSP-solvers be an option here?
With this much weird variation and path-dependence, what you're actually searching is not the graph itself, but the space of paths from the root, which is a tree. If the problem is as general as you say, you're not going to be able to do better than directly searching the "tree-of-paths", saving the best value and the corresponding path. If you can transform it into any way so that there is no such path-dependence, you should probably do so.
If you can't, there are two basic options: breadth-first, which will return the paths in order of length, but at the cost of high memory usage, as there are many temporary paths that must be stored. Depth-first search only needs to store a single path (which can be done entirely as a series of recursive calls), but has no natural stopping point, and is not guaranteed to actually terminate if there is no upper bound on the path size.
If you're lucky enough that the cost increases monotonically with each additional step, you can instead order by cost. The first one that's good enough is the one you then want. Breadth firs search is sometimes implemented by putting the paths to explore on a queue. Change this to a priority queue based on the cost, and you now have a "cost first search", known formally as Uniform-cost search.
If the cost function can decrease by adding on the path, A* search can be modified to do the search, but you no longer have the guarantee that you can stop early.

How to find minimum number of transfers for a metro or railway network?

I am aware that Dijkstra's algorithm can find the minimum distance between two nodes (or in case of a metro - stations). My question though concerns finding the minimum number of transfers between two stations. Moreover, out of all the minimum transfer paths I want the one with the shortest time.
Now in order to find a minimum-transfer path I utilize a specialized BFS applied to metro lines, but it does not guarantee that the path found is the shortest among all other minimum-transfer paths.
I was thinking that perhaps modifying Dijkstra's algorithm might help - by heuristically adding weight (time) for each transfer, such that it would deter the algorithm from making transfer to a different line. But in this case I would need to find the transfer weights empirically.
Addition to the question:
I have been recommended to add a "penalty" to each time the algorithm wants to transfer to a different subway line. Here I explain some of my concerns about that.
I have put off this problem for a few days and got back to it today. After looking at the problem again it looks like doing Dijkstra algorithm on stations and figuring out where the transfer occurs is hard, it's not as obvious as one might think.
Here's an example:
If here I have a partial graph (just 4 stations) and their metro lines: A (red), B (red, blue), C (red), D (blue). Let station A be the source.
And the connections are :
---- D(blue) - B (blue, red) - A (red) - C (red) -----
If I follow the Dijkstra algorithm: initially I place A into the queue, then dequeue A in the 1st iteration and look at its neighbors :
B and C, I update their distances according to the weights A-B and A-C. Now even though B connects two lines, at this point I don't know
if I need to make a transfer at B, so I do not add the "penalty" for a transfer.
Let's say that the distance between A-B < A-C, which causes on the next iteration for B to be dequeued. Its neighbor is D and only at this
point I see that the transfer had to be made at B. But B has already been processed (dequeued). S
So I am not sure how this "delay" in determining the need for transfer would affect the integrity of the algorithm.
Any thoughts?
You can make each of your weights a pair: (# of transfers, time). You can add these weights in the obvious way, and compare them in lexicographic order (compare # of transfers first, use time as the tiebreaker).
Of course, as others have mentioned, using K * (# of transfers) + time for some large enough K produces the same effect as long as you know the maximum time apriori and you don't run out of bits in your weight storage.
I'm going to be describing my solution using the A* Algorithm, which I consider to be an extension (and an improvement -- please don't shoot me) of Dijkstra's Algorithm that is easier to intuitively understand. The basics of it goes like this:
Add the starting path to the priority queue, weighted by distance-so-far + minimum distance to goal
Every iteration, take the lowest weighted path and explode it into every path that is one step from it (discarding paths that wrap around themselves) and put it back into the queue. Stop if you find a path that ends in the goal.
Instead of making your weight simply distance-so-far + minimum-distance-to-goal, you could use two weights: Stops and Distance/Time, compared this way:
Basically, to compare:
Compare stops first, and report this comparison if possible (i.e., if they aren't the same)
If stops are equal, compare distance traveled
And sort your queue this way.
If you've ever played Mario Party, think of stops as Stars and distance as Coins. In the middle of the game, a person with two stars and ten coins is going to be above someone with one star and fifty coins.
Doing this guarantees that the first node you take out of your priority queue will be the level that has the least amount of stops possible.
You have the right idea, but you don't really need to find the transfer weights empirically -- you just have to ensure that the weight for a single transfer is greater than the weight for the longest possible travel time. You should be pretty safe if you give a transfer a weight equivalent to, say, a year of travel time.
As Amadan noted in a comment, it's all about creating right graph. I'll just describe it in more details.
Consider two vertexes (stations) to have edge if they are on a single line. With this graph (and weights 1) you will find minimum number of transitions with Dijkstra.
Now, lets assume that maximum travel time is always less 10000 (use your constant). Then, weight of edge AB (A and B are on one line) is a time_to_travel_between(A, B) + 10000.
Running Dijkstra on such graph will guarantee that minimal number of transitions is used and minimum time is reached in the second place.
update on comment
Let's "prove" it. There're two solution: with 2 transfers and 40 minutes travel time and with 3 transfers and 25 minutes travel time. In first case you travel on 3 lines, so path weight will be 3*10000 + 40. In second: 4*10000 + 25. First solution will be chosen.
I had the same problem as you, until now. I was using Dijkstra. The penalties for transfers is a very good idea indeed and I've been using it for a while now. The main problem is that you cannot use it directly in the weight as you first you have to identify the transfer. And I didn't want to modify the algorithm.
So what I'be been doing, is that each time and you find a transfer, delete the node, add it with the penalty weight and rerun the graph.
But this way I found out that Dijkstra wont work. And this is where I tried Floyd-Warshall which au contraire to Dijkstra compares all possible paths through the graph between each pair of vertices.
It helped me with my problem switching to Floyd-Warshall. Hope it helps you as well.
Its easier to code and lot more easier to implement.

How to design an approximate path solution?

I am attempting to write (or expand on an existing) graph search algorithm that will let me find the path to get closest to destination node considering there is no guarantee that the nodes will be connected.
To provide a realistic application of this, let's say I need to get from Brampton, Ontario to Hamilton, Ontario. I know my possible options at my start point are Local transit, GO bus or Walking. I know that walking is the least desired way to get to my destination so I look at GO bus first. I know I can take GO to a point close to Hamilton, but at that point the GO bus turns and goes another direction at that closest point is at a place where I have no options (other than walk, but the algorithm would only consider walking for short distances otherwise it will consider the route not feasible)
Using this same example, if the algorithm were to find that I can get there a way that is longer but gets me closer to the destination node (or possible at the destination node) that would be a higher weighted path (The weightings don't matter so much while its searching, only when the results are delivered, it would list by which path was closest to the destination in ascending order). For example, one GO Bus may get me 3km from the destination node, while 3 public transit buses would get me 500m away
So my question is two fold:
1) What algorithm should I start with that does something similar
2) How would I programmaticly explain that it's ok if nodes don't connect so that it doesn't just jump from node A to node R. Would starting from the end and working backward accomplish this
Edit: I forgot to ask how to aim for the best approximate solution because especially with a large graph there will be possibly millions of solutions for this problem.
Thanks,
Michael
Read up on the A* algorithm. It is a generalization of Dijkstra's shortest path algorithm, that allows you to specify a heuristic, which provides a lower bound for distances between two verteces. In your case, the heuristic function would simply return the Euclidean distance.
Run the algorithm and keep track of the vertex with the best characteristic value, which you somehow compute from the graph distance from source and Euclidean distance to target. The only tricky part is to determine when to terminate (unless you want to traverse the entire graph).
Why can't you assume that all nodes are connected? In the real world, they normally are, i.e. you can always walk or call a taxi, etc.?
In this case, you could simply change your model the following way: You have one graph for each transportation method. The nodes that are at the same place are connected with edges of weight 0 (i.e. if you are dropped off by car at an airport or train station).
Then, label each vertex and edge with the transportation type and you can simply use existing routing algorithms. Oh, by the way: A* will not scale well to really big networks. To get something that software like Yahoo/Google/Microsoft Maps actually would use, have a look here. The work of this research group includes the winner of the DIMACS shortest path challenge.
Sounds very much like a travelling salesman problem with additional node characteristics. Just be wary that this type of problem is NP Complete and your best bet would be to go with some sort of approximation algorithm.

Resources