faster graph traversal algorithms compared to dfs - algorithm

I have an undirected unweighted graph represented using adjacency matrix where each node of the graph represents a space partition (e.g. State) while the edges represent the neiborhood relationship (i.e. neighboring states sharing common boundaries). My baseline algorithm uses DFS to traverse the graph and form subgraphs after each step (i.e. adding the new node visited which would result in a bunch of contiguous states). With that subgraph I perform a statistical significance test on certain patterns which exist in the nodes of the graph (i.e. within the states).
At this point I am essentially trying to make the traversal step faster.
I was wondering if you all could suggest any algorithm or resources (e.g. research paper) which performs graph traversal computationally faster than DFS.
Thanks for your suggestion and your time!

Most graph algorithms contain "for given vertex u, list all its neighbors v" as a primitive. Not sure, but sounds like you might want to speed up this piece. Indeed, each country has only few neighbors, typically much less than the total number of countries. If this is the case, replace adjacency matrix graph representation with adjacency lists.
Note that the algorithm itself (DFS or other) will likely remain the same, with just a few changes where it uses this primitive.

Related

Is there any other Data structure to represent Graph other than Adjacency List or Adjacency Matrix?

I was looking for different Data structures for representing Graph and I came accross Nvidia CUDA Toolkit and found out new way to represent graph with the help of source_indices, destination_offsets.
Fascinated by this innovative representation of graph, I searched out for other ways of representing Graphs. But not found anything new.
I was wondering if there was any other way to represent Graph other than Adjacency Matrix or Lists...
I was wondering if there was any other way to represent Graph other
than Adjacency Matrix or Lists...
There are alternatives to the adjacency list or the adjacency matrix, such as edge list, adjacency map or forward star to name a few. Given this graph (images taken from here):
this is the adjacency matrix representation:
this is the adjacency list representation:
this would be another alternative, the edge list:
and another pretty common one is the forward star representation:
If you get into this research field you will find a good number of approaches, mainly optimizations for specific cases, taking into account factors such as:
Graph size (number of nodes)
Density of the graph
Directed or undirected graph
Static or dynamic graph
Graph known at compile time or constructed at runtime
Node IDs (labeled sequentially or not)
...
These optimizations can, for example, support reordering of the nodes in a preprocessing stage to increase reference locality. There is a lot of work for shortest path algorithms, specially when calculating the shortest path in a world map.
One example of optimization would be a dynamic graph structure (Packed-Memory Graph (PMG)) which is suited for large-scale transportation networks.
There is another representation of graphs using Adjacency Set. It is very much similar to adjacency list but instead of using Linked lists, Disjoint Sets [Union-Find] are used. You can read about disjoint sets ADT here.
If E is the number of edges and V is the number of vertices in the graph, then Adjacency set representation of graph takes up (E+V) space.
Complexities of other operations while using adjacency set representation:
Checking edge between vertex v and w : log(Degree(v))
Iterate over edges incident to vertex v: Degree(v)

Finding reachable vertices for every vertex in a directed graph

I know that brute force approach to do this is perform DFS on all the vertices of the graph.So for this algorithm the complexity would be O(V|V+E|). But is there more efficient way to do this?
I get the impression from papers like http://research.microsoft.com/pubs/144985/todsfinal.pdf that there is no algorithm that does better than O(VE) or O(V^3) in the general case. For sparse graphs and other special graphs there are faster algorithms. It seems, however, that you can still make improvements by separating "index construction" from "query", if you have some idea of the number of queries that will be made on the data. If there are going to be a lot of queries, O(1) is possible for queries if all the data is pre-computed (DFS or Floyd-Warshall, etc.) and stored in O(n^2) space. On the other hand, if there are going to be relatively few queries, space and/or index construction time can be reduced at the expense of query time.
I really suspect that there isn't a known better algorithm for general graphs. All the papers I found on the subject [1] [2] describe algorithms that run in O(|V| * |E|) time. That isn't better than your naïve attempt in the worst case.
Even the wikipedia page [3] says the fastest algorithms reduce the problem to matrix multiplication, which the fastest algorithms are only marginally better than your baseline.
[1] http://ion.uwinnipeg.ca/~ychen2/conferencePapers/tranRelationCopy.pdf
[2] http://www.vldb.org/conf/1988/P382.PDF
[3] http://en.wikipedia.org/wiki/Transitive_closure#Algorithms
[EDIT: As pointed out by kraskevich, the final query step can be worse than I had originally claimed: up to O(|V|^2) even for an output of size O(|V|), which is no better than ordinary DFS without any preprocessing.].
In the worst case, O(|V|^2) space would be needed to store all this information explicitly -- i.e., to store the complete list of reachable vertices for each vertex (think of a graph in which every vertex has an edge to every other vertex). But it's possible to represent it in such a way that only O(|V|) space is needed, and this representation can be built in O(|V|+|E|) time, and a query on it will only take time proportional to the size of the answer (number of reachable vertices).
The basic idea is: Every vertex in a strongly connected component (SCC) can reach every other vertex in the same SCC (this is the definition of SCC), and can reach all vertices in SCCs that it can reach, and no other vertices.
Find all SCCs; this can be done in O(|V|+|E|) time. Build a table SCC, so that SCC(u) = i if the SCC of u is i (both vertices in G and SCCs can be represented as integers). Afterwards make another pass through this table to build a dual table, Verts, so that Verts(i) contains a list of all vertices in the ith SCC.
Build a new graph G' whose vertices are the SCCs of G. G' will necessarily be acyclic.
So, given a vertex u in G, look up its SCC, SCC(u). Call this i. Perform a DFS through G' starting at vertex i: For each vertex (of G') j encountered during this DFS, output every vertex (of G) in Verts(j).

The usage of graph traversal algorithm

I am reading the materials related to graph in Data Structures and Algorithms in C++ 4e(By Adam Drozdek). In his implementation of Graph Breadth First Search, the psuedo code is like:
BFS():
for all vertices u
num(u) = 0
edges = null
i = 1
while there is a vertex v such that num(v) is 0
num(v)++
enqueue(v)
while queue is not empty
v = dequeue()
if num(u) is 0
num(u) = i++
enqueue(u)
attach edge(v,u) to edges
output edges
Basically, in the implementation of graph, we already keep a set of all vertices and a set of all edges. In BFS, the algorithm first enumerate every vertex not visited in this set to traverse the complete graph.
My question is:
since we already store all the vertex in a set, we can loop through the set to operate on a particular vertex without using BFS algorithm. Why do we need a graph traversal algorithm and what is main use?
There are many uses for BFS and DFS...
To give you an idea for BFS:
You have a graph representing a social network and want to make friend suggestions for a particular user. Then, you do a BFS. The closer the vertices (people), the better rank in the friends suggestion list. (If the number of users is large, it makes sense to stop at a distance of 3 and not do the BFS on the entire graph).
Solution space searching. Extremely useful when the solution space is infinite. (see Game Trees)
Shortest paths (if the edges have the same weight and there are no loops). Dijkstra adapted it to work for variable weights (see Dijkstra's algorithm).
For instance, typically DFS is implicitly used when a tree is traversed recursively.

Calculate sub-paths between two vertices from minimum spanning trees

I have a minimum spanning tree (MST) from a given graph. I am trying to compute the unique sub-path (which should be part of the MST, not the graph) for any two vertices but I am having trouble finding an efficient way of doing it.
So far, I have used Kruskal's algorithm (using Disjoint Data structure) to calculate the MST (for example: 10 vertices A to J).. But now I want to calculate the sub-path between C to E.. or J to C (assuming the graph is undirected).
Any suggestions would be appreciated.
If you want just one of these paths, doing a DFS on your tree is probably the easiest solution, depending on how you store your tree. If it's a proper graph, then doing a DFS is easy, however, if you only store parent pointers, it might be easier to find the least common ancestor of the two nodes.
To do so you can walk from both nodes u,v to the root r and then compare the r->u and r->v paths. The first node where they differ is the least common ancestor.
With linear preprocessing you can answer least common ancestor queries in constant time. If you want to find the paths between pairs of nodes often, you might want to consider implementing that data structure. This paper explains it quite nicely, I think.

Advantage of depth first search over breadth first search or vice versa

I have studied the two graph traversal algorithms,depth first search and breadth first search.Since both algorithms are used to solve the same problem of graph traversal I would like to know how to choose between the two.I mean is one more efficient than the other or any reason why i would choose one over the other in a particular scenario ?
Thank You
Main difference to me is somewhat theoretical. If you had an infinite sized graph then DFS would never find an element if it exists outside of the first path it chooses. It would essentially keep going down the first path and would never find the element. The BFS would eventually find the element.
If the size of the graph is finite, DFS would likely find a outlier (larger distance between root and goal) element faster where BFS would find a closer element faster. Except in the case where DFS chooses the path of the shallow element.
In general, BFS is better for problems related to finding the shortest paths or somewhat related problems. Because here you go from one node to all node that are adjacent to it and hence you effectively move from path length one to path length two and so on.
While DFS on the other end helps more in connectivity problems and also in finding cycles in graph(though I think you might be able to find cycles with a bit of modification of BFS too). Determining connectivity with DFS is trivial, if you call the explore procedure twice from the DFS procedure, then the graph is disconnected (this is for an undirected graph). You can see the strongly connected component algorithm for a directed graph here, which is a modification of DFS. Another application of the DFS is topological sorting.
These are some applications of both the algorithms:
DFS:
Connectivity
Strongly Connected Components
Topological Sorting
BFS:
Shortest Path(Dijkstra is some what of a modification of BFS).
Testing whether the graph is Bipartitie.
When traversing a multiply-connected graph, the order in which nodes are traversed may greatly influence (by many orders of magnitude) the number of nodes to be tracked by the traversing method. Some kinds of algorithms will be massively better when using breadth-first; others will be massively better when using depth-search.
At one extreme, doing a depth-first search on a binary tree with N leaf nodes requires that the traversing method keep track of lgN nodes while a breadth-first search would require keeping track of at least N/2 nodes (since it might scan all other nodes before it scans any leaf nodes; immediately prior to scanning the first leaf node, it would have encountered N/2 of the leafs' parent nodes which have to be tracked separately since none of them reference each other).
On the other extreme, doing a flood-fill on an NxN grid with a method that, if its pixel hasn't been colored yet, colors that pixel and then flood-fills its neighbors will require enqueuing O(N) pixel coordinates if using breadth-first search, but O(N^2) pixel coordinates if using depth-first. When using breadth-first search, paint will seem to "spread out", regardless of the shape to be painted; when using depth-first algorithm to paint a rectangular spiral, each line of which is straight on one side and jagged on the other (which sides should be straight and jagged depends upon the exact algorithm used), all of the straight sections will get painted before any of the jagged ones, meaning that the system must track the location of every jag separately.
For a complete/perfect tree, DFS takes a linear amount of space with respect to the depth of the tree whereas BFS takes an exponential amount of space with respect to the depth of the tree. This is because for BFS the maximum number of nodes in the queue is proportional to the number of nodes in one level of the tree. In DFS the maximum number of nodes in the stack is proportional to the depth of the tree.

Resources