Shortest path: identify edges that cause negative cycles - boost

I have a directed graph with negative edge weights. The graph is modified by the program and sometimes will form negative cycles. When that happens, shortest path algorithms (Bellman-ford/Johnson/Floyd-Warshall) would detect the existence of such negative cycle and fail, but no other useful information is produced.
I would like to identify what edge causes the negative cycle and disallow such modifications in the graph. Can someone help me with a pointer?
Thanks,
Paul

Paul, If you're about to add an edge (source, destination, weight), and you know the distance from destination to source, then you're creating a negative cycle if and only if new weight + old distance is negative.
On the other hand, if you've just got a graph, the bellman-ford algorithm detects negative cycles and can exhibit one when it finds one. You just need to either find an implementation that does that (rather than just failing), or write one yourself. It's not a difficult algorithm and there's lots of pseudocode on the web.
(It's probably a couple of days consultancy work if you want one custom-written for you. I do this sort of thing for a living and would be happy to.)
I'm not sure exactly what you need. I don't know, but I'd imagine that there's an on-line version of Bellman-Ford that keeps its distances up to date cheaply as new edges come in, and will scream if you add a bad one.

Related

Longest path in small graph with small degree

this is a problem I've got from my class (I assure you it's not homework). I'm still pondering about it until now. You will receive a graph with at most 25 nodes and 25 edges. Additionally, each node will have degree of at most 3. The task is to find the longest path in this graph. However, you won't only receive 1 graph, but 15,000 graphs, and you'll need to find the longest path in all of them in 1 second. Could anyone please give me a solution (or better yet, just a hint) to this problem? Thank you very much!
Info:- Nodes can be revisited, the only constraints are the edges.- The graphs are given by the edges. So the first line states how many nodes and edges there are, and the lines after that represent the edges, each edge by a pair of integers.- The edges are unweighted.- The only answer required is the length of the path, not the path itself.- This might be important: the graph isn't necessarily connected.
After I saw that "nodes can be revisited", I realised that this is in some ways a trick question. To satisfy those seemingly unbelievable time constraints, what you actually need here is not an algorithm for constructing such a path (usually called a trail, BTW, if vertices can be reused), but rather, for each connected component of the graph, a way of quickly detecting whether all or nearly all edges in that component can be included in a single trail.
So here is my hint: Did you know that there are seven bridges in Königsberg? ;-)
That might seem cryptic, but I think some quick searching around will point you in the right direction, and you will soon find a way to quickly detect whether all edges in a component can form part of the same trail. (You'll need to do some more thinking to figure out how many edges can be included when the answer to the above question is "no".)

Shortest path that traverses all regions

I have a computational problem where, given a set of observations, I want to determine the minimum set of phenomena (explanations) that account for all observations. Phenomena can cause one another, that is all phenomena can be represented as an unweighed directed graph with causal relationships as edges.
I am given the following:
An exhaustive list of all possible observations O_1 ... O_N
An exhaustive list of all possible phenomena (causes/explanations) C_1, ... C_N
For each observation O_N, a list of the phenomena that can cause it
For each phenomenon C_N, a list of the other phenomena that can cause it
The problem is represented below in graph form (sorry for the quality of the picture). Each node is a phenomenon, and each edge represents a causal relationship between phenomena. Edges are unweighted. Each region outlined by a bigger "bubble" represents a possible observation, with all phenomena lying within the bubble being the subset of phenomena that are known to cause that observation.
The problem, restated, is to find the shortest path that crosses all regions in the graph. (For simplicity, assume there is a unique path that explains all observations - no branching, no need for multiple paths).
My questions are as follows:
Is this a known computational problem, or a variant of a known computational problem?
Are there known algorithms for solving this specific problem (beyond just "use existing shortest path" algorithms)?
If not, how should I approach this problem? Specifically, how do I decompose the problem into simpler (i.e. simple shortest path) problems?
If it helps regarding computational feasibility, the number of observations is on the order of 10,000 and the number of possible phenomena on the order of 100,000.
An phenomenon that causes neither observation nor another phenomenon won't appear in any minimal answer, so we can assume there aren't any of them. In other words, pass one of any algorithm is get rid of these "useless" phenomena.
With that assumption, we treat observations just like any other vertex. Since observations cause nothing, all observations are leaf vertices. Since all phenomenon cause something (see step 1), no phenomenon is a leaf vertex. Thus we can simplify the problem statement and simply talk about the leaf vertices of a directed graph.
In general, there's going to be no single path that hits at least one branch vertex of each leaf. Instead, a better way of posing the problem is seek some kind of minimal graph that spans all the leaf vertices, but need not span the phenomena.
This a variation on the Steiner Tree Problem on graphs. It's NP-complete. Most variations are also NP-complete. The best hope you've got is something good enough, that is, an approximation algorithm.
You don't state this assumption explicitly, but it seems like you're assuming that there's no cyclic causation of phenomena (such as A causes B causes C causes A again). In this case your problem is on a directed acyclic graph, but that doesn't help. The directed problem is as hard as the undirected one.
This is a set-cover probem combined with Hamiltonian path problem. Let me explain: Since each phenomena is related to a group of observations, you can look at each phenomena as a set in the set-cover problem. We need to check each group of phenomena which together cover all observations, to see if a Hamiltonian path exists for this group, that is - there is a simple path which includes all the phenomena in the group.
One approach is to find the smallest set cover (=group of phenomena) and check if a Hamiltonian Path exists for this group. Then continue to the next (equal or larger) set cover, and do the same check, and so on until we find a set cover which has a hamiltonian path. This will be the smallest group of phenomena, which cover all observations, and that has a simple path going over all phenomena in the group.

Pathfinding - A path of less than or equal to n turns

Most of the time when implementing a pathfinding algorithm such as A*, we seek to minimize the travel cost along the path. We could also seek to find the optimal path with the fewest number of turns. This could be done by, instead of having a grid of location states, having a grid of location-direction states. For any given location in the old grid, we would have 4 states in that spot representing that location moving left, right, up, or down. That is, if you were expanding to a node above you, you would actually be adding the 'up' state of that node to the priority queue, since we've found the quickest route to this node when going UP. If you were going that direction anyway, we wouldnt add anything to the weight. However, if we had to turn from the current node to get to the expanded node, we would add a small epsilon to the weight such that two shortest paths in distance would not be equal in cost if their number of turns differed. As long as epsilon is << cost of moving between nodes, its still the shortest path.
I now pose a similar problem, but with relaxed constraints. I no longer wish to find the shortest path, not even a path with the fewest turns. My only goal is to find a path of ANY length with numTurns <= n. To clarify, the goal of this algorithm would be to answer the question:
"Does there exist a path P from locations A to B such that there are fewer than or equal to n turns?"
I'm asking whether using some sort of greedy algorithm here would be helpful, since I do not require minimum distance nor turns. The problem is, if I'm NOT finding the minimum, the algorithm may search through more squares on the board. That is, normally a shortest path algorithm searches the least number of squares it has to, which is key for performance.
Are there any techniques that come to mind that would provide an efficient way (better or same as A*) to find such a path? Again, A* with fewest turns provides the "optimal" solution for distance and #turns. But for my problem, "optimal" is the fastest way the function can return whether there is a path of <=n turns between A and B. Note that there can be obstacles in the path, but other than that, moving from one square to another is the same cost (unless turning, as mentioned above).
I've been brainstorming, but I can not think of anything other than A* with the turn states . It might not be possible to do better than this, but I thought there may be a clever exploitation of my relaxed conditions. I've even considered using just numTurns as the cost of moving on the board, but that could waste a lot of time searching dead paths. Thanks very much!
Edit: Final clarification - Path does not have to have least number of turns, just <= n. Path does not have to be a shortest path, it can be a huge path if it only has n turns. The goal is for this function to execute quickly, I don't even need to record the path. I just need to know whether there exists one. Thanks :)

Travelling by bus

If you have the full bus schedule for a country, how can you find the
furthest anyone can travel in one day without visiting the same stop twice?
I assume a bus schedule gives you the full list of leaving and arriving times for every bus stop.
A slow and naive method would be as follows.
You can of course make a graph from the bus schedule with multiple directed edges between bus stops. You could then do a depth first search remembering the arrival time of the edge you took to get to each node and only taking edges from that stop that leave after the one that you took to get there. If you go to a node you have been to before you would only carry on from there if the current time in your traversal is before the earliest time you had ever visited that node before. You could record the furthest you can get from each node and then you could check each node to find the furthest you can travel overall.
This seems very inefficient however and it really isn't a normal graph problem. The problem is that in a normal directed graph if you can get from A to B and from B to C then you can get from A to C. This isn't true here.
What is the fastest you can solve this problem?
I think your original algorithm is pretty good.
You can think of your approach as being a version of Dijkstra's algorithm, in attempting to find the shortest path to each node.
Note that it is best at this stage to weight edges in the graph in terms of time. The idea is to use your Dijkstra-like algorithm to compute all nodes reachable within 1 days worth of time, and then pick whichever of these nodes is furthest in space from the start point.
Implementations of Dijkstra can use a heap to retrieve the next node to explore in O(logn), and I think this would be a good enhancement to your approach as well. If you always choose the node that you can reach earliest, you never need to repeat the calculation for that node.
Overall the approach is:
For each starting point
Use a modified Dijkstra to compute all nodes reachable in 1 day
Find the furthest in space of all these nodes.
So for n starting points and e bus routes, the complexity is about O(n(n+e)log(n)) to get the optimal answer.
You should be able to get improved performance by using an appropriate heuristic in an A* search. The heuristic needs to underestimate the max distance possible from a point, so you could use the maximum speed of a bus multiplied by the remaining time.
Instead of making multiple edges for each departure from a location, you can make multiple nodes per location / time.
Create one node per location per departure time.
Create one node per location per arrival time.
Create edges to connect departures to arrivals.
Create edges to connect a given node to the node belonging to the same location at the nearest future time.
By doing this, any path you can traverse through the graph is "valid" (meaning a traveler would be able to achieve this by a combination of bus trips or choosing to sit at a location and wait for a future bus).
Sorry to say, but as described this problem has a pretty high complexity. Misread the problem originally and thought it was np-hard, but it is not. It does however have a pretty high complexity that I personally would not want to deal with. This algorithm is a pretty good approximation that give a considerable complexity savings that I personally think it worth it.
However, if all you want is an answer that is "pretty good" there are are lot of fairly efficient algorithms out there that will get close very quickly.
Personally I would suggest using a simple greedy algorithm here.
I've done this on a few (granted, small and contrived) examples and it's worked pretty well and has an nlog(n) efficiency.
Associate a velocity with each node, velocity being the fastest you can move away from a given node. In my examples this velocity was distance_travelled/(wait_time + travel_time). I used the maximum velocity of all trips leaving a node as the velocity score for that node.
From your node/time calculate the velocities of all neighboring nodes and travel to the "fastest" node.
This algorithm is pretty good for the complexity as it basically transforms the problem into a static search, but there are a couple potential pitfalls that could be adjusted for depending on your data set.
The biggest issue with this algorithm is the possibility of a really fast bus going into the middle of nowhere. You could get around that by adding a "popularity" term to the velocity calculation (make more popular stops effectively faster) but depending on your data set that could easily make things either better or worse.
The simplistic graph representation will not work. I. e. each city is a node and the edges represent time. That's because the "edge" is not always active -- it is only active at certain times of the day.
The second thing that comes to mind is Edward Tufte's Paris Train Schedule which is a different kind of graph. But that does not quite fit the problem either. With the train schedule, the stations have a sequential relationship between stations, but that's not the case in general with cities and bus schedules.
But Tufte motivates the following way to model it as a graph. You could write code only to construct the graph and use a standard graph library that includes the shortest path algorithm.
Each bus trip is an edge with weight = distance covered
Each (city, departure) and (city, arrival) is a node
All nodes for a given city are connected by zero-weight edges in a time-ordered sequence, ignoring whether it is an arrival or a departure. This subgraph will look like a chain.
(it is a directed graph)
Linear Time Solution: Note that the graph will be a directed, acyclic graph. Finding the longest path in such a graph is linear. "A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G."
Hope this helps! If somebody can post a visualization of the graph, it would be nice. If I can do so myself, I will do 1 more edit.
Naive is the best you'll get -- http://en.wikipedia.org/wiki/Longest_path_problem
EDIT:
So the problem is two fold.
Create a list of graphs where its possible to travel from pointA to pointB. Possible is in terms of times available for busA to travel from pointA to pointB.
Find longest path from all the possible generated path above.
Another approach would be to reevaluate the graph upon each node traversal and find the longest path.
It still reduces to finding longest possible path, which is NP-Hard.

graph algorithm to detect even cycles

I have an undirected graph. One edge in that graph is special. I want to find all other edges that are part of a even cycle containing the first edge.
I don't need to enumerate all the cycles, that would be inherently NP I think. I just need to know, for each each edge, whether it satisfies the conditions above.
A brute force search works of course but is too slow, and I'm struggling to come up with anything better. Any help appreciated.
I think we have an answer (I must credit my colleague with the idea). Essentially his idea is to do a flood fill algorithm through the space of even cycles. This works because if you have a large even cycle formed by merging two smaller cycles then the smaller cycles must have been both even or both odd. Similarly merging an odd and even cycle always forms a larger odd cycle.
This is a practical option only because I can imagine pathological cases consisting of alternating even and odd cycles. In this case we would never find two adjacent even cycles and so the algorithm would be slow. But I'm confident that such cases don't arise in real chemistry. At least in chemistry as it's currently known, 30 years ago we'd never heard of fullerenes.
If your graph has a small node degree, you might consider using a different graph representation:
Let three atoms u,v,w and two chemical bonds e=(u,v) and k=(v,w). A typical way of representing such data is to store u,v,w as nodes and e,k as edges in a graph.
However, one may represent e and k as nodes in the graph, having edges like f=(e,k) where f represents a 2-step link from u to w, f=(e,k) or f=(u,v,w). Running any algorithm to find cycles on such a graph will return all even cycles on the original graph.
Of course, this is efficient only if the original graph has a small node degree. When a user performs an edit, you can easily edit accordingly the alternative representation.

Resources