Reasonably efficient algorithm for reward-driven graph traversal

I'm wondering if there is a more elegant solution to this problem. The brute-force approach (depth-first search) is too computationally intensive.
You are given a network of nodes interconnected with paths. Each path has a distance and zero or more elements along the path that can only be collected once every five minutes. Collecting those elements increases your score.
The goal is to plan out the next five minutes of path traversal, keeping in mind the paths that have been traversed already in the last five minutes, so as to maximize the score increase.
The brute force algorithm is to try every possible route from the current location, avoiding places we have already been, stopping when we have traveled our max planning distance or time, and keep a virtual tally of rewards collected. Then all we have to do is choose the route with the highest score.
Unfortunately, the number of nodes and paths in the graph is high enough that planning out even just five minutes worth of travel requires too much computation.
Is there a known algorithm that solves this problem more efficiently than the brute-force method? Even if it only finds an approximate solution, and not an optimal one.
EDIT
Thank you @SaiBot, here is my final solution, in case anyone should ever find themselves asking this same question:
I assigned every path going from node A to node B a unique ID; the path from B to A got its own ID. Outside the DFS search function, but accessible to it, I kept a hash keyed by that ID, whose value holds both the distance traveled prior to taking the path and the reward collected so far. To minimize extra work, I made sure that at each node the outgoing paths were sorted shortest to longest. Then, when the DFS is asked to evaluate a path it has evaluated before, the first thing it inspects is the cached result. If the new arrival satisfies:
( reward <= previous_reward && distance >= previous_distance )
|| reward / distance <= previous_score
then there is no benefit to recursing into this path again, so the function returns immediately with a score of 0 to disqualify it from consideration. Otherwise, it records the new incoming reward, distance, and score in the cache, and proceeds normally.
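For anyone who wants to see the cache check in isolation, here is a minimal sketch in Python. The names (CacheEntry, should_prune) are made up for illustration and are not from my actual code, and the score for a zero distance is assumed to be 0. Note that the second half of the condition is a heuristic: it can throw away routes that are not strictly dominated.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    distance: float  # distance travelled before entering this path
    reward: float    # reward collected before entering this path
    score: float     # reward / distance at that moment


def should_prune(cache, path_id, distance, reward):
    """Return True if re-exploring path_id is judged not worth it.

    cache maps a directed path ID to the best arrival recorded so far.
    """
    # Assumption: an arrival with zero distance gets a score of 0.
    score = reward / distance if distance > 0 else 0.0
    prev = cache.get(path_id)
    if prev is not None:
        dominated = reward <= prev.reward and distance >= prev.distance
        if dominated or score <= prev.score:
            return True
    # First visit, or a better arrival: remember it and keep searching.
    cache[path_id] = CacheEntry(distance, reward, score)
    return False
```

The DFS would call should_prune before recursing into a path and return a score of 0 when it says True.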
In addition, I did one other thing. I reasoned that I wanted a certain amount of novelty in the path, meaning I didn't want it to just find one tiny little path that gets maximum reward, I wanted it to explore the map. So I added a filter to outgoing nodes, saying that if the node has been visited in the past X minutes, remove it from consideration. This had the side-effect of allowing the algorithm to route itself into a corner, so I added a fall-back, where if there were no available options, it would sort the outgoing paths by last visited, oldest first, and try in that order.
The result was decent, but I'm going to do some more experiments to see if I can get even better results.

Your problem is closely related to Pareto-optimal path computation in multi-criteria networks, e.g., as described in this paper.
If you had just one criterion (like distance) associated with each edge, then Dijkstra's algorithm would let you quickly find the shortest path to every node. This is possible since you can "discard" a path that arrives at a node if another path reaching that node already has a lower distance.
The problem arises when you have two or more criteria (e.g., distance and reward) associated with each edge. Now, if two paths (starting from your start node) lead to the same node and path_1 has a lower distance than path_2, but path_2 has a higher reward than path_1, you cannot discard either. However, if a path is worse than another path in both criteria, you can discard it.
One possible algorithm to do the complete search is described in the above paper.
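To make the discard rule concrete, here is a small sketch of how the set of non-dominated (distance, reward) labels at one node could be maintained. This is only the dominance bookkeeping, not the algorithm from the paper, and the function names are illustrative.

```python
def dominates(a, b):
    """a and b are (distance, reward) labels for paths reaching the same node.

    a dominates b if it is no longer and no less rewarding, and not identical.
    """
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def insert_label(labels, new):
    """Try to add `new` to a node's list of Pareto-optimal labels.

    Returns (updated_labels, accepted). A rejected label means the path that
    produced it can be discarded.
    """
    if any(old == new or dominates(old, new) for old in labels):
        return labels, False
    # Keep only the old labels that the new one does not dominate.
    kept = [old for old in labels if not dominates(new, old)]
    kept.append(new)
    return kept, True
```

A multi-criteria search would keep one such list per node and only extend paths whose label was accepted.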
Edit
My answer above does not consider elements reappearing during the route. If you want to include this, you would have to know when and where elements reappear during route planning. This, however, makes things a lot more complicated, since you could achieve a higher reward by "waiting" for elements to respawn.

Related

Find min cost from node A to node B and also keep the path info

I got a question which is to find the min cost from the least number node (1) to the largest number node (7).
The cost is the edge between nodes. I labeled them.
This problem made me think of Dijkstra's algorithm, which has a time complexity of O((v+e) log v).
Any other better approach to solving this question efficiently?
The other requirement is to keep the path information, any thought to keep the path?
As others pointed out, the complexity is as you say and cannot be better. As @nico-schertler commented, searching from both sides in parallel (or taking turns) and stopping as soon as the two searches meet is faster than searching from one side only, but it has the same complexity. A two-sided search is possible in this case (with fixed costs for the bidirectional edges), but it need not be in the general case (e.g. when the cost depends on the path already taken), where Dijkstra is still applicable.
Concerning the keeping of the path: Of course, the whole thing often only makes sense if you get the path to be taken as an answer. There are two main approaches to get the path as a result.
One is to store the path already taken to a certain node along with the node in the lists (white/grey in the classical implementation). Each time you add a new node, you extend the path of its predecessor by one step. If you find the target node, you can directly return the path as a result (along with the cost sum). Of course, this approach uses a lot of memory.
The other is to store only the origin node along with each newly found node, so each node points to the node it was first visited from. Think of it as putting up signposts in each node showing how to go back. This way, when you find the target node, you walk backwards from each node to the one it was first visited from, building the path in reverse order in the process. Then you can return this path.
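A minimal sketch of the second ("signpost") approach in Python, assuming the graph is given as a dict mapping each node to a list of (neighbor, cost) pairs. That layout and the function name are assumptions for illustration only.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Return (cost, path) for the cheapest route, or (None, []) if unreachable."""
    dist = {source: 0}
    prev = {}            # signposts: node -> node we reached it from
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue     # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u          # update the signpost with the distance
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, []
    # Follow the signposts backwards, then reverse.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path
```

Only one predecessor per node is stored, so the extra memory is proportional to the number of nodes rather than to the total length of all candidate paths.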

Travelling by bus

If you have the full bus schedule for a country, how can you find the
furthest anyone can travel in one day without visiting the same stop twice?
I assume a bus schedule gives you the full list of leaving and arriving times for every bus stop.
A slow and naive method would be as follows.
You can of course make a graph from the bus schedule with multiple directed edges between bus stops. You could then do a depth first search remembering the arrival time of the edge you took to get to each node and only taking edges from that stop that leave after the one that you took to get there. If you go to a node you have been to before you would only carry on from there if the current time in your traversal is before the earliest time you had ever visited that node before. You could record the furthest you can get from each node and then you could check each node to find the furthest you can travel overall.
This seems very inefficient however and it really isn't a normal graph problem. The problem is that in a normal directed graph if you can get from A to B and from B to C then you can get from A to C. This isn't true here.
What is the fastest you can solve this problem?
I think your original algorithm is pretty good.
You can think of your approach as being a version of Dijkstra's algorithm, in attempting to find the shortest path to each node.
Note that it is best at this stage to weight edges in the graph in terms of time. The idea is to use your Dijkstra-like algorithm to compute all nodes reachable within one day's worth of time, and then pick whichever of these nodes is furthest in space from the start point.
Implementations of Dijkstra can use a heap to retrieve the next node to explore in O(log n), and I think this would be a good enhancement to your approach as well. If you always choose the node that you can reach earliest, you never need to repeat the calculation for that node.
Overall the approach is:
For each starting point:
    Use a modified Dijkstra to compute all nodes reachable in one day.
    Find the furthest in space of all these nodes.
So for n starting points and e bus routes, the complexity is about O(n(n+e)log(n)) to get the optimal answer.
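As a rough sketch of the "modified Dijkstra" step, the following computes the earliest arrival time at every stop reachable before a deadline. The timetable layout (a dict from stop to a list of (depart, arrive, next_stop) tuples) and the function name are assumptions, not anything given in the question.

```python
import heapq


def earliest_arrivals(trips, start, start_time, deadline):
    """Return {stop: earliest arrival time} for stops reachable by `deadline`.

    trips[stop] is a list of (depart, arrive, next_stop) tuples.
    """
    best = {start: start_time}
    heap = [(start_time, start)]
    while heap:
        t, stop = heapq.heappop(heap)
        if t > best.get(stop, float("inf")):
            continue  # we already found an earlier arrival at this stop
        for depart, arrive, nxt in trips.get(stop, ()):
            # A bus is usable only if it leaves after we arrive here.
            if depart >= t and arrive <= deadline and arrive < best.get(nxt, float("inf")):
                best[nxt] = arrive
                heapq.heappush(heap, (arrive, nxt))
    return best
```

Picking the furthest stop is then a single pass over the returned dict with whatever distance measure you have for stop coordinates.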
You should be able to get improved performance by using an appropriate heuristic in an A* search. Because this is a maximization, the heuristic must never underestimate the maximum distance still possible from a point, so you could use the maximum speed of a bus multiplied by the remaining time.
Instead of making multiple edges for each departure from a location, you can make multiple nodes per location / time.
Create one node per location per departure time.
Create one node per location per arrival time.
Create edges to connect departures to arrivals.
Create edges to connect a given node to the node belonging to the same location at the nearest future time.
By doing this, any path you can traverse through the graph is "valid" (meaning a traveler would be able to achieve this by a combination of bus trips or choosing to sit at a location and wait for a future bus).
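The construction above could be sketched like this, assuming trips are given as (from_stop, depart, to_stop, arrive) tuples. For brevity the sketch uses one node per (stop, time) pair, so an arrival and a departure at the same stop and time share a node. The names are illustrative.

```python
from collections import defaultdict


def build_time_expanded(trips):
    """Return adjacency {(stop, time): [(stop, time), ...]} for a timetable."""
    events = defaultdict(set)    # stop -> all times something happens there
    edges = defaultdict(list)
    for a, dep, b, arr in trips:
        events[a].add(dep)
        events[b].add(arr)
        edges[(a, dep)].append((b, arr))           # riding a bus
    for stop, times in events.items():
        ordered = sorted(times)
        for t0, t1 in zip(ordered, ordered[1:]):
            edges[(stop, t0)].append((stop, t1))   # waiting at the stop
    return dict(edges)
```

Every edge goes forward in time (assuming each trip arrives after it departs), so the result has no cycles.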
Sorry to say, but as described this problem has a pretty high complexity. I misread the problem originally and thought it was NP-hard, but it is not. It does, however, have a pretty high complexity that I personally would not want to deal with. This algorithm is a pretty good approximation that gives a considerable complexity saving that I personally think is worth it.
However, if all you want is an answer that is "pretty good", there are a lot of fairly efficient algorithms out there that will get close very quickly.
Personally I would suggest using a simple greedy algorithm here.
I've done this on a few (granted, small and contrived) examples and it's worked pretty well, with O(n log n) running time.
Associate a velocity with each node, velocity being the fastest you can move away from a given node. In my examples this velocity was distance_travelled/(wait_time + travel_time). I used the maximum velocity of all trips leaving a node as the velocity score for that node.
From your node/time calculate the velocities of all neighboring nodes and travel to the "fastest" node.
This algorithm is pretty good for the complexity as it basically transforms the problem into a static search, but there are a couple potential pitfalls that could be adjusted for depending on your data set.
The biggest issue with this algorithm is the possibility of a really fast bus going into the middle of nowhere. You could get around that by adding a "popularity" term to the velocity calculation (make more popular stops effectively faster) but depending on your data set that could easily make things either better or worse.
The simplistic graph representation, in which each city is a node and the edges represent time, will not work. That's because an "edge" is not always active: it is only active at certain times of the day.
The second thing that comes to mind is Edward Tufte's Paris Train Schedule, which is a different kind of graph. But that does not quite fit the problem either. In the train schedule the stations have a sequential relationship, but that's not the case in general with cities and bus schedules.
But Tufte motivates the following way to model it as a graph. You could write code only to construct the graph and use a standard graph library that includes the shortest path algorithm.
Each bus trip is an edge with weight = distance covered
Each (city, departure) and (city, arrival) is a node
All nodes for a given city are connected by zero-weight edges in a time-ordered sequence, ignoring whether it is an arrival or a departure. This subgraph will look like a chain.
(it is a directed graph)
Linear Time Solution: Note that the graph will be a directed, acyclic graph. Finding the longest path in such a graph is linear. "A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G."
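Here is a sketch of the linear-time longest-path computation on a DAG, processing nodes in topological order (Kahn's algorithm) instead of negating weights. It assumes non-negative edge weights and an edge list of (u, v, weight) tuples, and it lets the path start at any node. It does not by itself enforce the "no stop twice" rule, since several nodes of the time-expanded graph can belong to the same city.

```python
from collections import defaultdict, deque


def longest_path_dag(edges):
    """Return (length, path) of a longest path in a DAG given as (u, v, w) edges."""
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    best = {n: 0 for n in nodes}   # longest distance ending at each node
    prev = {}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v, w in adj[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:      # all predecessors of v are final
                queue.append(v)
    end = max(best, key=best.get)
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return best[end], path[::-1]
```

Each node and edge is handled once, which is where the linear running time comes from.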
Hope this helps! If somebody can post a visualization of the graph, it would be nice. If I can do so myself, I will do 1 more edit.
Naive is the best you'll get -- http://en.wikipedia.org/wiki/Longest_path_problem
EDIT:
So the problem is twofold.
Create a list of graphs where it's possible to travel from pointA to pointB. "Possible" is in terms of the times available for busA to travel from pointA to pointB.
Find the longest path among all the possible paths generated above.
Another approach would be to reevaluate the graph upon each node traversal and find the longest path.
It still reduces to finding longest possible path, which is NP-Hard.

Why does Dijkstra's algorithm need to keep track of the number of steps?

I can understand keeping track of the accumulated distance, the distance per path, and keeping track of the name (or position) of the vertex, but why keep track of the number of steps unless you are wanting to track how efficiently it reached its destination?
The step is totally unnecessary for finding the path, and it seems rather arbitrary anyway. For instance, if you have multiple vertices where the accumulated distance is the same, and the smallest number, there is no reason to care which one you start from, but whichever one it is gets labelled with the next step in line.
I see many pieces of code around, and they generally follow this principle of keeping track of the steps. It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite. In that case, it seems to me that not only is the number of steps per vertex superfluous, but the only information necessary to be bothered with is the distance and the label of the vertex. If you have a distance, you know you have visited the vertex, and since all distances are the same, the first time you reach a vertex should always be its lowest distance. No evaluating whether it is lower or greater is necessary, only that it exists.
Anyway, I'm just curious why something so simple should have superfluous information gathered. Is there some reason for it I'm just not grasping?
EDIT--
To add a little clarity, and since it wasn't formatting properly in the comment, the step is normally shown in the table people tell you to use.
____________________
|name|step|distance|
--------------------
|temporary Labels |
--------------------
The step is added when a position is the next shortest point to the origin.
Okay, I have seen that video now and it’s actually the first time I have ever seen such a table being used. It does not make much sense to me. It completely mixes “labels” with “distances”; a permanent label is the order in which nodes were marked, while temporary labels are the current non-fixed distances. Neither of these are necessary at all.
Instead what you usually have for a node is the following: The distance (from the start node), the parent (or previous) node, and a mark to mark a node as completed or not (in an implementation you usually have a priority queue for all unmarked nodes instead).
You then keep looking at the unmarked node with the smallest total distance, mark it and update the distance of all the unmarked neighbors. And whenever you update to a shorter distance you also update the parent node.
You do not, however, need the order in which you marked the nodes as completed, nor all the previous incomplete distances. To me, in that video, it seems as if it’s just a way to make it easier to check a student’s work, since without identical distances there is only a single order in which you would look at the vertices.
That being said, the normal Dijkstra algorithm does not include this stuff, and it’s not necessary. See the pseudocode on Wikipedia for implementation details on what you actually store (as said, you usually have only the distance and parent for each node, and a priority queue for the unmarked nodes).
"It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite."
What you are describing here is a very special case. The Dijkstra algorithm is actually used for many graph problems where distances are not equal, and with more connections than just 4 simple neighbors in every direction.
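For that special case (every move costs 1, walls are impassable), a plain breadth-first search is enough, and the only thing stored per cell is its distance. A small sketch, assuming the grid is a list of strings with '#' marking walls:

```python
from collections import deque


def grid_distances(grid, start):
    """Return {(row, col): steps from start} for every reachable open cell."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            # The first time a cell is reached is already its shortest
            # distance, so "has a distance" doubles as "visited".
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist
```

No step counter and no comparison of distances is needed here; with unequal edge costs you would go back to a priority queue and the distance/parent bookkeeping described above.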

Graph search problem with route restrictions

I want to calculate the most profitable route and I think this is a type of traveling salesman problem.
I have a set of nodes that I can visit and a function to calculate cost for traveling between nodes and points for reaching the nodes. The goal is to reach a fixed known score while minimizing the cost.
This cost and rewards are not fixed and depend on the nodes visited before.
The starting node is fixed.
There are some restrictions on how nodes can be visited. Some simplified examples include:
Node B can only be visited after A
After node C has been visited, D or E can be visited. Visiting at least one is required, visiting both is permissible.
Z can only be visited after at least 5 other nodes have been visited
Once 50 nodes have been visited, the nodes A-M will no longer reward points
Certain nodes can (and probably must) be visited multiple times
Currently I can think of only two ways to solve this:
a) Genetic Algorithms, with the fitness function calculating the cost/benefit of the generated route
b) Dijkstra search through the graph, since the starting node is fixed, although the large number of nodes will probably make that not feasible memory wise.
Are there any other ways to determine the best route through the graph? It doesn't need to be perfect; an approximate path is perfectly fine, as long as its error is acceptable.
Would TSP-solvers be an option here?
With this much weird variation and path-dependence, what you're actually searching is not the graph itself, but the space of paths from the root, which is a tree. If the problem is as general as you say, you're not going to be able to do better than directly searching the "tree-of-paths", saving the best value and the corresponding path. If you can transform it in any way so that there is no such path-dependence, you should probably do so.
If you can't, there are two basic options: breadth-first, which will return the paths in order of length, but at the cost of high memory usage, as there are many temporary paths that must be stored. Depth-first search only needs to store a single path (which can be done entirely as a series of recursive calls), but has no natural stopping point, and is not guaranteed to actually terminate if there is no upper bound on the path size.
If you're lucky enough that the cost increases monotonically with each additional step, you can instead order by cost. The first path that's good enough is the one you want. Breadth-first search is sometimes implemented by putting the paths to explore on a queue. Change this to a priority queue based on the cost, and you now have a "cost-first search", known formally as uniform-cost search.
If the cost function can decrease by adding on the path, A* search can be modified to do the search, but you no longer have the guarantee that you can stop early.
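A minimal sketch of uniform-cost search over the tree of paths, assuming non-negative step costs. The callbacks successors(path) and is_goal(path) are hypothetical hooks where the path-dependent rules (visit order restrictions, changing rewards) would live.

```python
import heapq
from itertools import count


def uniform_cost_search(start, successors, is_goal):
    """Return (cost, path) of the cheapest path satisfying is_goal, or None.

    successors(path) yields (next_node, step_cost) pairs allowed after `path`.
    """
    tie = count()  # tie-breaker so the heap never compares paths directly
    frontier = [(0, next(tie), (start,))]
    while frontier:
        cost, _, path = heapq.heappop(frontier)
        if is_goal(path):
            return cost, list(path)   # cheapest path first, so we can stop
        for node, step in successors(path):
            heapq.heappush(frontier, (cost + step, next(tie), path + (node,)))
    return None
```

Because whole paths are stored on the queue, memory use grows quickly, and the loop only terminates without a goal if the set of allowed paths is finite.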

An algorithm to check if a vertex is reachable

Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
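One way to build such a bit matrix is Warshall's transitive-closure algorithm. The sketch below uses a Python int as the bit row for each vertex and assumes vertices are numbered 0..n-1; the function names are illustrative.

```python
def transitive_closure(n, edges):
    """Return reach, where bit v of reach[u] is set if v is reachable from u."""
    reach = [0] * n
    for u, v in edges:
        reach[u] |= 1 << v
    for k in range(n):
        mask = 1 << k
        for u in range(n):
            if reach[u] & mask:
                # u reaches k, so u reaches everything k reaches
                reach[u] |= reach[k]
    return reach


def is_reachable(reach, u, v):
    """Constant-time query against the precomputed matrix."""
    return bool((reach[u] >> v) & 1)
```

A vertex is reported as reachable from itself only if it lies on a cycle.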
Depth-first search or breadth-first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can sometimes improve the performance with some heuristics, if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can have the depth-first search look along lines that "aim towards the target". However, imagine the case where the start and end points are right next to each other, but with no vector in between, and to find the path you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited and mark it as visited
While there are nodes in the queue:
    Take the next node off the front of the queue
    If the node is V2, return true
    For every node at the end of an outgoing edge which is not yet visited:
        Mark this node as visited
        Add this node to the queue
    End for
End while
Return false
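In Python the same breadth-first search might look like the sketch below, assuming the graph is a dict mapping each vertex to a list of its successors (an assumed representation, not something from the question).

```python
from collections import deque


def reachable(graph, v1, v2):
    """Return True if v2 can be reached from v1 along directed edges."""
    visited = {v1}
    queue = deque([v1])
    while queue:
        node = queue.popleft()
        if node == v2:
            return True          # stop as soon as the target is found
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return False                 # everything reachable from v1 was examined
```

A True answer can come early; a False answer always costs a visit to every vertex reachable from v1.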
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
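A small sketch of the matrix-power check, generating the powers on the fly with a naive pure-Python multiplication. For real graphs you would want an optimized matrix library; the function names here are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shortest_walk_length(adj, u, v):
    """Return the smallest p in 1..n with (adj**p)[u][v] > 0, or None.

    adj[i][j] is 1 if there is an edge i -> j, else 0.
    """
    n = len(adj)
    power = adj
    for p in range(1, n + 1):
        if power[u][v] > 0:
            return p             # a walk of exactly p edges exists
        power = mat_mul(power, adj)
    return None
```

The entry found at power p is also the number of distinct walks of p edges from u to v.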
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm, which creates an O(n log n) space data structure that takes O(n log n) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with start locations as the rows and end locations as the columns. Fill the array with 0. For each edge in the list of edges, place a 1 at the (start, end) coordinate of the edge.
Now you iterate a few times until either (V1, V2) is 1 or there are no changes.
For each row RowN:
    NextRowN = RowN
    For each column M that is true in RowN:
        OR RowM into NextRowN using boolean OR.
    Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable by V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows are never changing.
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.

Resources