How can I know reachability of A* algorithm on a graph? - algorithm

I got some data set of a graph.
then I have to get distance between city A and B.
How can I know reachability of route from A to B ?
do I have to search for all reachable cities with a*?
I think that needs so much time.

If you want to know if an arbitrary node is reachable from another arbitrary node, then yes, you will need to search the graph. Yes, this means you could end up traversing every node on the graph. Yes, if you have a huge number of nodes, it will be slow. Life is tough.
If you need to do many such reachability checks, it might be worth it to pre-process the graph, and store reachability information which can be used for faster lookups. That information will probably take a lot of space, though you could use secondary storage (i.e. hard drive).

Related

Travelling by bus

If you have the full bus schedule for a country, how can you find the
furthest anyone can travel in one day without visiting the same stop twice?
I assume a bus schedule gives you the full list of leaving and arriving times for every bus stop.
A slow and naive method would be as follows.
You can of course make a graph from the bus schedule with multiple directed edges between bus stops. You could then do a depth first search remembering the arrival time of the edge you took to get to each node and only taking edges from that stop that leave after the one that you took to get there. If you go to a node you have been to before you would only carry on from there if the current time in your traversal is before the earliest time you had ever visited that node before. You could record the furthest you can get from each node and then you could check each node to find the furthest you can travel overall.
This seems very inefficient however and it really isn't a normal graph problem. The problem is that in a normal directed graph if you can get from A to B and from B to C then you can get from A to C. This isn't true here.
What is the fastest you can solve this problem?
I think your original algorithm is pretty good.
You can think of your approach as being a version of Dijkstra's algorithm, in attempting to find the shortest path to each node.
Note that it is best at this stage to weight edges in the graph in terms of time. The idea is to use your Dijkstra-like algorithm to compute all nodes reachable within 1 days worth of time, and then pick whichever of these nodes is furthest in space from the start point.
Implementations of Dijkstra can use a heap to retrieve the next node to explore in O(logn), and I think this would be a good enhancement to your approach as well. If you always choose the node that you can reach earliest, you never need to repeat the calculation for that node.
Overall the approach is:
For each starting point
Use a modified Dijkstra to compute all nodes reachable in 1 day
Find the furthest in space of all these nodes.
So for n starting points and e bus routes, the complexity is about O(n(n+e)log(n)) to get the optimal answer.
You should be able to get improved performance by using an appropriate heuristic in an A* search. The heuristic needs to underestimate the max distance possible from a point, so you could use the maximum speed of a bus multiplied by the remaining time.
Instead of making multiple edges for each departure from a location, you can make multiple nodes per location / time.
Create one node per location per departure time.
Create one node per location per arrival time.
Create edges to connect departures to arrivals.
Create edges to connect a given node to the node belonging to the same location at the nearest future time.
By doing this, any path you can traverse through the graph is "valid" (meaning a traveler would be able to achieve this by a combination of bus trips or choosing to sit at a location and wait for a future bus).
Sorry to say, but as described this problem has a pretty high complexity. Misread the problem originally and thought it was np-hard, but it is not. It does however have a pretty high complexity that I personally would not want to deal with. This algorithm is a pretty good approximation that give a considerable complexity savings that I personally think it worth it.
However, if all you want is an answer that is "pretty good" there are are lot of fairly efficient algorithms out there that will get close very quickly.
Personally I would suggest using a simple greedy algorithm here.
I've done this on a few (granted, small and contrived) examples and it's worked pretty well and has an nlog(n) efficiency.
Associate a velocity with each node, velocity being the fastest you can move away from a given node. In my examples this velocity was distance_travelled/(wait_time + travel_time). I used the maximum velocity of all trips leaving a node as the velocity score for that node.
From your node/time calculate the velocities of all neighboring nodes and travel to the "fastest" node.
This algorithm is pretty good for the complexity as it basically transforms the problem into a static search, but there are a couple potential pitfalls that could be adjusted for depending on your data set.
The biggest issue with this algorithm is the possibility of a really fast bus going into the middle of nowhere. You could get around that by adding a "popularity" term to the velocity calculation (make more popular stops effectively faster) but depending on your data set that could easily make things either better or worse.
The simplistic graph representation will not work. I. e. each city is a node and the edges represent time. That's because the "edge" is not always active -- it is only active at certain times of the day.
The second thing that comes to mind is Edward Tufte's Paris Train Schedule which is a different kind of graph. But that does not quite fit the problem either. With the train schedule, the stations have a sequential relationship between stations, but that's not the case in general with cities and bus schedules.
But Tufte motivates the following way to model it as a graph. You could write code only to construct the graph and use a standard graph library that includes the shortest path algorithm.
Each bus trip is an edge with weight = distance covered
Each (city, departure) and (city, arrival) is a node
All nodes for a given city are connected by zero-weight edges in a time-ordered sequence, ignoring whether it is an arrival or a departure. This subgraph will look like a chain.
(it is a directed graph)
Linear Time Solution: Note that the graph will be a directed, acyclic graph. Finding the longest path in such a graph is linear. "A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G."
Hope this helps! If somebody can post a visualization of the graph, it would be nice. If I can do so myself, I will do 1 more edit.
Naive is the best you'll get -- http://en.wikipedia.org/wiki/Longest_path_problem
EDIT:
So the problem is two fold.
Create a list of graphs where its possible to travel from pointA to pointB. Possible is in terms of times available for busA to travel from pointA to pointB.
Find longest path from all the possible generated path above.
Another approach would be to reevaluate the graph upon each node traversal and find the longest path.
It still reduces to finding longest possible path, which is NP-Hard.

What's the purpose of BFS and DFS?

I've learned how these algorithms work, but what are they used for?
Do we use them to:
find a certain node in a graph or
to find a shortest path or
to find a cycle in a graph
?
Both of them just visit all the nodes and mark them visited, and I don't see the point of doing that.
I am sort of lost here what I am learning.
BFS and DFS are graph search algorithms that can be used for a variety of different purposes.
One common application of the two search techniques is to identify all nodes that are reachable from a given starting node. For example, suppose that you have a collection of computers that each are networked to a handful of other computers. By running a BFS or DFS from a given node, you will discover all other computers in the network that the original computer is capable of directly or indirectly talking to. These are the computers that come back marked.
BFS specifically can be used to find the shortest path between two nodes in an unweighted graph. Suppose, for example, that you want to send a packet from one computer in a network to another, and that the computers aren't directly connected to one another. Along what route should you send the packet to get it to arrive at the destination as quickly as possible? If you run a BFS and at each iteration have each node store a pointer to its "parent" node, you will end up finding route from the start node to each other node in the graph that minimizes the number of links that have to be traversed to reach the destination computer.
DFS is often used as a subroutine in more complex algorithms. For example, Tarjan's algorithm for computing strongly-connected components is based on depth-first search. Many optimizing compiler techniques run a DFS over an appropriately-constructed graph in order to determine in which order to apply a specific series of operations. Depth-first search can also be used in maze generation: by taking a grid of nodes and linking each node to its neighbors, you can construct a graph representing a grid. Running a random depth-first search over this graph then produces a maze that has exactly one solution.
This is by no means an exhaustive list. These algorithms have all sorts of applications, and as you start to explore more advanced algorithms you will often find yourself relying on DFS and BFS as building blocks. It's similar to sorting - sorting by itself isn't all that interesting, but being able to sort a list of values is enormously useful as a subroutine in more complex algorithms.
Hope this helps!

What is a fast algorithm for finding critical nodes?

I'm looking for a quick method/algorithm for finding which nodes in a graph is critical.
For example, in this graph:
Node number 2 and 5 are critical.
My current method is to try removing one non-endpoint node from the graph at a time and then check if the entire network can be reached from all other nodes. This method is obvious not very efficient.
What are a better way?
See biconnected components. Calling them articulation points instead of critical nodes seems to yield better search results.
In any case, the algorithm consists of a simple depth first search where you maintain certain information for each node.
there are several better ways. research is always helpful
but since this is homework, the point of the exercise is likely to be to figure it out yourself
hint: how could you decorate the graph to tell you what nodes depend on what other nodes, and would this information perhaps be useful to spot the critical nodes?

What is a good measure of strength of a link and influence of a node?

In the context of social networks, what is a good measure of strength of a link between two nodes? I am currently thinking that the following should give me what I want:
For two nodes A and B:
Strength(A,B) = (neighbors(A) intersection neighbors(B))/neighbors(A)
where neighbors(X) gives the total number of nodes directly connected to X and the intersection operation above gives the number of nodes that are connected to both A and B.
Of course, Strength(A,B) != Strength(B,A).
Now knowing this, is there a good way to determine the influence of a node? I was initially using the Degree Centrality of a node to determine its "influence" but I somehow think its not a good idea because just because a node has a lot of outgoing links does not mean anything. Those links should be powerful as well. In that case, maybe using an aggregate of the strengths of each node connected to this node is a good idea to estimate its influence? Am I in the right direction? Does anyone have any suggestions?
My Philosophy (and understanding of the terms):
Strength indicates how far A is
willing to do what B has already done
Influence indicates how far A can make B do something (persuasion perhaps?)
Constraints:
Access to only a subgraph. I mean, I am trying to be realistic here because social networks are huge and having a complete view is not so practical.
you might want to check out some more sophisticated notions of distance.
A really cool one is "resistance distance", which lets you view distance as how likely a random path from one node will lead you to another
there are several days of lecture notes plus references to further reading at http://www.cs.yale.edu/homes/spielman/462/.
Few thoughts on this:
When you talk about influence of a node in a graph one centrality measurement that comes to mind it closeness centrality. Closeness centrality looks at the number of shortest paths in a graph the node is on. From an influence point of view, the node that is on the most shortest paths is the node that can share information the easiest, ie its nearer to more nodes than any other.
You also mention using the strengths of each node connected to a node. Maybe you should look at eigenvector centrality which ranks a node highly if its connected to other high degree nodes. This is an undirected version of PageRank.
Some questions that might affect you choice here are:
Is you graph directed?
Do you edges have weight? You mention strength... do you mean weights of some kind?
If you do have weights maybe the next step from a simple degree centrality would be to try a weighted degree centrality approach. Thus, just having a high number of connections doesn't automatically make you the most influential.

How to design an approximate path solution?

I am attempting to write (or expand on an existing) graph search algorithm that will let me find the path to get closest to destination node considering there is no guarantee that the nodes will be connected.
To provide a realistic application of this, let's say I need to get from Brampton, Ontario to Hamilton, Ontario. I know my possible options at my start point are Local transit, GO bus or Walking. I know that walking is the least desired way to get to my destination so I look at GO bus first. I know I can take GO to a point close to Hamilton, but at that point the GO bus turns and goes another direction at that closest point is at a place where I have no options (other than walk, but the algorithm would only consider walking for short distances otherwise it will consider the route not feasible)
Using this same example, if the algorithm were to find that I can get there a way that is longer but gets me closer to the destination node (or possible at the destination node) that would be a higher weighted path (The weightings don't matter so much while its searching, only when the results are delivered, it would list by which path was closest to the destination in ascending order). For example, one GO Bus may get me 3km from the destination node, while 3 public transit buses would get me 500m away
So my question is two fold:
1) What algorithm should I start with that does something similar
2) How would I programmaticly explain that it's ok if nodes don't connect so that it doesn't just jump from node A to node R. Would starting from the end and working backward accomplish this
Edit: I forgot to ask how to aim for the best approximate solution because especially with a large graph there will be possibly millions of solutions for this problem.
Thanks,
Michael
Read up on the A* algorithm. It is a generalization of Dijkstra's shortest path algorithm, that allows you to specify a heuristic, which provides a lower bound for distances between two verteces. In your case, the heuristic function would simply return the Euclidean distance.
Run the algorithm and keep track of the vertex with the best characteristic value, which you somehow compute from the graph distance from source and Euclidean distance to target. The only tricky part is to determine when to terminate (unless you want to traverse the entire graph).
Why can't you assume that all nodes are connected? In the real world, they normally are, i.e. you can always walk or call a taxi, etc.?
In this case, you could simply change your model the following way: You have one graph for each transportation method. The nodes that are at the same place are connected with edges of weight 0 (i.e. if you are dropped off by car at an airport or train station).
Then, label each vertex and edge with the transportation type and you can simply use existing routing algorithms. Oh, by the way: A* will not scale well to really big networks. To get something that software like Yahoo/Google/Microsoft Maps actually would use, have a look here. The work of this research group includes the winner of the DIMACS shortest path challenge.
Sounds very much like a travelling salesman problem with additional node characteristics. Just be wary that this type of problem is NP Complete and your best bet would be to go with some sort of approximation algorithm.

Resources