Storing the graph in a text file for Dijkstra's algorithm

I was wondering, what is the most efficient way of storing a graph in a text file when implementing Dijkstra's algorithm? (An adjacency matrix, an incidence matrix, etc.?)

In the general case, a good approach is to store a list of all edges.
It takes O(E) space: we store two endpoints per edge.
To store it on disk, that will suffice.
To work with such a list, it is usually stored in memory as V adjacency lists, one for every vertex.
This duplicates each edge (u->v and v->u) if the graph is undirected.
However, a common operation for graph algorithms is to traverse all edges from a given vertex.
By storing an adjacency list for each vertex, we get to do that in O(number of neighbors), which is the best possible.
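For concreteness, here is a minimal Python sketch of that scheme; the one-"u v w"-triple-per-line file format and the function names are just illustrative choices, not a standard:

```python
from collections import defaultdict

def save_edge_list(path, edges):
    # one "u v w" triple per line; O(E) space on disk
    with open(path, "w") as f:
        for u, v, w in edges:
            f.write(f"{u} {v} {w}\n")

def load_adjacency_lists(path, undirected=True):
    # build adjacency lists, one per vertex; an undirected
    # graph stores each edge twice (u->v and v->u)
    adj = defaultdict(list)
    with open(path) as f:
        for line in f:
            u, v, w = line.split()
            adj[u].append((v, float(w)))
            if undirected:
                adj[v].append((u, float(w)))
    return adj
```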
An adjacency matrix takes O(V^2) space, which may be fine for dense graphs but is worse than O(E) in the general case.
An incidence matrix takes O(VE) space and is not efficient unless your graph has some special structure that makes it so.
Typical heap-based implementations of Dijkstra's algorithm take O(E log V) time, so O(E) memory is usually fine.
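As a sketch of how those adjacency lists feed into Dijkstra's algorithm, here is the standard binary-heap version using Python's heapq, with lazy deletion of stale queue entries; it runs in O(E log V):

```python
import heapq

def dijkstra(adj, source):
    # adj: dict mapping u -> list of (v, weight) pairs, as loaded above
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter path
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```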

Related

faster graph traversal algorithms compared to dfs

I have an undirected, unweighted graph represented as an adjacency matrix, where each node of the graph represents a space partition (e.g. a state) and the edges represent the neighborhood relationship (i.e. neighboring states sharing common boundaries). My baseline algorithm uses DFS to traverse the graph and form subgraphs after each step (i.e. adding each newly visited node, which yields a set of contiguous states). With each subgraph I perform a statistical significance test on certain patterns which exist in the nodes of the graph (i.e. within the states).
At this point I am essentially trying to make the traversal step faster.
I was wondering if you all could suggest any algorithm or resources (e.g. research paper) which performs graph traversal computationally faster than DFS.
Thanks for your suggestion and your time!
Most graph algorithms contain "for a given vertex u, list all its neighbors v" as a primitive. I'm not sure, but it sounds like this is the piece you want to speed up. Indeed, each state has only a few neighbors, typically far fewer than the total number of states. If this is the case, replace the adjacency matrix representation with adjacency lists (a sketch of the conversion follows below).
Note that the algorithm itself (DFS or other) will likely remain the same, with just a few changes where it uses this primitive.
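A minimal sketch of that conversion, assuming the matrix is a list of 0/1 rows (the function name is illustrative):

```python
def matrix_to_lists(matrix):
    # matrix[u][v] == 1 iff u and v are neighbors;
    # afterwards, "list all neighbors of u" costs O(deg(u)) instead of O(n)
    return [[v for v, bit in enumerate(row) if bit] for row in matrix]
```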

Which Graph Algorithms prefer adjacency matrix and why?

I heard that adjacency lists are used in most graph algorithms (but not all). I'm just wondering what algorithms prefer adjacency matrices and why?
So far I've found that Floyd-Warshall uses adjacency matrices.
Adjacency lists are generally faster than adjacency matrices in algorithms in which the key operation performed per node is "iterate over all the nodes adjacent to this node." That can be done in O(deg(v)) time with an adjacency list, where deg(v) is the degree of node v, while it takes Θ(n) time with an adjacency matrix. Similarly, adjacency lists make it fast to iterate over all of the edges in a graph: it takes O(m + n) time to do so, compared with Θ(n^2) for adjacency matrices.
Some of the most-commonly-used graph algorithms (BFS, DFS, Dijkstra’s algorithm, A* search, Kruskal’s algorithm, Prim’s algorithm, Bellman-Ford, Karger’s algorithm, etc.) require fast iteration over all edges or the edges incident to particular nodes, so they work best with adjacency lists.
You mentioned that Floyd-Warshall uses adjacency matrices. While Floyd-Warshall does maintain an internal matrix tracking shortest paths seen so far, it doesn't actually require the original graph to be an adjacency matrix. The overall cost of the dynamic programming work is Θ(n^3), which is bigger than the O(n^2) cost of converting an adjacency list into an adjacency matrix or vice-versa.
There are only a few places where an adjacency matrix is faster than an adjacency list. Adjacency matrices take O(1) time to test whether a particular edge is present in the graph, which is faster than the O(deg(v)) cost of the corresponding operation on an adjacency list. Since the cost of converting an adjacency list to an adjacency matrix is Θ(n^2), the only cases where an adjacency matrix would outperform an adjacency list are situations where (1) random access to the edges is required and (2) the total runtime of the algorithm is o(n^2). I only know a few algorithms that do this. For example, there's the celebrity-finding problem, where you're given a graph and are asked to find whether there's a node with incoming edges from every other node and outgoing edges to none. This can be done in O(n) time using an adjacency matrix, faster than what can be done with an adjacency list (sketched below).
(That being said, you could also use an adjacency list represented using cuckoo hash tables rather than regular lists and match the same runtime bounds as above, though with the cost of creating the adjacency list now only expected to be fast rather than actually worst-case efficient.)
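For reference, here is a sketch of the celebrity algorithm just mentioned, assuming a 0/1 matrix where adj[u][v] == 1 means there is an edge u -> v:

```python
def find_celebrity(adj):
    # Elimination pass: if the candidate has an outgoing edge to v, the
    # candidate cannot be the celebrity; if it doesn't, then v is missing
    # an incoming edge and v cannot be the celebrity. Either way one of
    # the two is eliminated, so one vertex survives after n-1 O(1) lookups.
    n = len(adj)
    candidate = 0
    for v in range(1, n):
        if adj[candidate][v]:
            candidate = v
    # Verification pass: the celebrity must have an incoming edge from
    # every other vertex and no outgoing edges at all.
    for v in range(n):
        if v != candidate and (adj[candidate][v] or not adj[v][candidate]):
            return -1  # no celebrity exists
    return candidate
```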
The main reason I've found adjacency matrices to be useful is in thinking about graphs from a different perspective. For example, raising an adjacency matrix to the kth power makes a new matrix that counts the number of walks (not necessarily simple paths) from one node to another using exactly k hops. This can be used to count and find triangles in graphs faster than the naive algorithm, for example. Similarly, the Four Russians algorithm for computing transitive closures of graphs works by representing the graph as a matrix and using some clever techniques (treating blocks of bits as integers then used in a lookup table) to outperform the naive search.
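As a small illustration of the matrix-power trick (using NumPy; the example graph is arbitrary):

```python
import numpy as np

# undirected graph on 4 vertices with edges 0-1, 0-2, 1-2, 1-3, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])

A3 = np.linalg.matrix_power(A, 3)  # (A^3)[i][j] = number of 3-hop walks i -> j
triangles = np.trace(A3) // 6      # each triangle yields 6 closed 3-walks
print(triangles)                   # 2 (the triangles {0,1,2} and {1,2,3})
```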
Hope this helps!

Dijkstra's algorithm vs relaxing edges in topologically sorted graph for DAG

I was reading Introduction To Algorithms 3rd Edition. There are 3 methods given to solve the problem. My inquiry is about two of them.
The one with no name
The algorithm starts by topologically sorting the dag (see Section 22.4) to impose a linear ordering on the vertices. If the dag contains a path from vertex u to vertex v, then u precedes v in the topological sort. We make just one pass over the vertices in the topologically sorted order. As we process each vertex, we relax each edge that leaves the vertex.
Dijkstra's Algorithm
This one is quite well known.
As far as the book shows, the time complexity of the unnamed one is O(V+E), but Dijkstra's is O(E log V). We cannot use Dijkstra's with negative weights, but we can use the other one. What are the advantages of using Dijkstra's algorithm, other than that it can be used on cyclic graphs?
Because the first algorithm you give only works on acyclic graphs, whereas Dijkstra's runs on any graph with non-negative edge weights.
The limitations are not the same.
In the real world, many applications can be modelled as graphs with non-negative weights, which is why Dijkstra's is so widely used. Plus, it is very simple to implement. The complexity of Dijkstra's is higher because it relies on a priority queue, but this does not mean it necessarily takes more time to execute. (n log(n) time is not that bad, because log(n) is a relatively small number: log2(10^80) ≈ 266.)
However, this holds for sparse graphs (low density of edges). For dense graphs, other algorithms may be more efficient.
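For reference, a minimal sketch of the unnamed algorithm, assuming vertices numbered 0..n-1 and a weighted edge list; it runs in O(V+E) and handles negative edge weights:

```python
from collections import defaultdict

def dag_shortest_paths(n, edges, source):
    # edges: list of (u, v, w) triples forming a DAG
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # topological order via Kahn's algorithm
    order, frontier = [], [v for v in range(n) if indeg[v] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    # one pass over the vertices in topological order, relaxing out-edges
    dist = [float("inf")] * n
    dist[source] = 0
    for u in order:
        if dist[u] == float("inf"):
            continue  # not reachable from the source
        for v, w in adj[u]:
            dist[v] = min(dist[v], dist[u] + w)
    return dist
```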

Finding reachable vertices for every vertex in a directed graph

I know that the brute-force approach is to perform a DFS from every vertex of the graph, so the complexity of that algorithm would be O(V(V+E)). But is there a more efficient way to do this?
I get the impression from papers like http://research.microsoft.com/pubs/144985/todsfinal.pdf that there is no algorithm that does better than O(VE) or O(V^3) in the general case. For sparse graphs and other special graphs there are faster algorithms. It seems, however, that you can still make improvements by separating "index construction" from "query", if you have some idea of the number of queries that will be made on the data. If there are going to be a lot of queries, O(1) is possible for queries if all the data is pre-computed (DFS or Floyd-Warshall, etc.) and stored in O(n^2) space. On the other hand, if there are going to be relatively few queries, space and/or index construction time can be reduced at the expense of query time.
I really suspect that there isn't a known better algorithm for general graphs. All the papers I found on the subject [1] [2] describe algorithms that run in O(|V| * |E|) time. That isn't better than your naïve attempt in the worst case.
Even the Wikipedia page [3] says the fastest known algorithms reduce the problem to matrix multiplication, and even the fastest matrix multiplication algorithms are only marginally better than your baseline.
[1] http://ion.uwinnipeg.ca/~ychen2/conferencePapers/tranRelationCopy.pdf
[2] http://www.vldb.org/conf/1988/P382.PDF
[3] http://en.wikipedia.org/wiki/Transitive_closure#Algorithms
[EDIT: As pointed out by kraskevich, the final query step can be worse than I had originally claimed: up to O(|V|^2) even for an output of size O(|V|), which is no better than ordinary DFS without any preprocessing.].
In the worst case, O(|V|^2) space would be needed to store all this information explicitly -- i.e., to store the complete list of reachable vertices for each vertex (think of a graph in which every vertex has an edge to every other vertex). But it's possible to represent it in such a way that only O(|V|+|E|) space is needed, this representation can be built in O(|V|+|E|) time, and a query on it will only take time proportional to the size of the answer (the number of reachable vertices).
The basic idea is: Every vertex in a strongly connected component (SCC) can reach every other vertex in the same SCC (this is the definition of SCC), and can reach all vertices in SCCs that it can reach, and no other vertices.
Find all SCCs; this can be done in O(|V|+|E|) time. Build a table SCC, so that SCC(u) gives the index of the SCC containing u (both vertices in G and SCCs can be represented as integers). Afterwards, make another pass through this table to build a dual table, Verts, so that Verts(i) contains a list of all vertices in the ith SCC.
Build a new graph G' whose vertices are the SCCs of G, with an edge from SCC i to SCC j whenever G contains an edge from some vertex in i to some vertex in j. G' will necessarily be acyclic.
So, given a vertex u in G, look up its SCC, SCC(u). Call this i. Perform a DFS through G' starting at vertex i: For each vertex (of G') j encountered during this DFS, output every vertex (of G) in Verts(j).
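Here is a sketch of that construction, assuming vertices numbered 0..n-1 and an edge list; it finds the SCCs with Kosaraju's two-pass algorithm (Tarjan's would work equally well) and answers each query with a DFS over the SCC dag:

```python
from collections import defaultdict

def build_index(n, edges):
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    # pass 1: iterative DFS on G, recording vertices in order of finish time
    visited, order = [False] * n, []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            v = next(it, None)
            if v is None:
                order.append(u)
                stack.pop()
            elif not visited[v]:
                visited[v] = True
                stack.append((v, iter(adj[v])))
    # pass 2: DFS on the reversed graph in decreasing finish time;
    # each new root starts a new SCC
    scc, count = [-1] * n, 0
    for s in reversed(order):
        if scc[s] != -1:
            continue
        scc[s], stack = count, [s]
        while stack:
            u = stack.pop()
            for v in radj[u]:
                if scc[v] == -1:
                    scc[v] = count
                    stack.append(v)
        count += 1
    # Verts table, plus the condensation G' (edges between distinct SCCs)
    verts, gp = defaultdict(list), defaultdict(set)
    for u in range(n):
        verts[scc[u]].append(u)
    for u, v in edges:
        if scc[u] != scc[v]:
            gp[scc[u]].add(scc[v])
    return scc, verts, gp

def reachable_from(u, scc, verts, gp):
    # DFS through G' from u's SCC; the work is proportional to the answer size
    seen, stack, out = {scc[u]}, [scc[u]], []
    while stack:
        i = stack.pop()
        out.extend(verts[i])
        for j in gp[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return out
```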

Cost to transpose a directed graph?

I am trying to construct the transpose of a directed graph by running DFS on the original graph and then generating an adjacency list of the mirror as new nodes are discovered.
What would the computational time of this be? I know that the DFS takes O(|V| + |E|), but what about constructing the adjacency list? How long does it take to construct the adjacency list of the transpose through DFS?
If you have O(1) insertion of items into your graph (supposing you are using a hash table or hash map for vertex lookup, or an array if your vertices are represented by integers), then the asymptotic runtime should be no different from the DFS.
I don't think you actually need to do a DFS, to be honest. I think you could just iterate over each vertex's adjacency list and add the reversed edges that way (see the sketch below). The runtime will still be O(V+E), so theoretically, it doesn't really matter.
Also, if your graph is represented as an edge list, then making the transpose is just O(E): swap the endpoints of each edge. (Note that an edge list doesn't record isolated vertices, but that's true of the original representation too.)
Sorry if there was too much extra information in there, and I hope I was able to help!
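A minimal sketch of the no-DFS approach, assuming the graph is a dict mapping each vertex to a list of its out-neighbors:

```python
from collections import defaultdict

def transpose(adj):
    radj = defaultdict(list)
    for u, neighbors in adj.items():
        radj[u]  # keep u in the map even if nothing points to it originally
        for v in neighbors:
            radj[v].append(u)  # reverse the edge u -> v
    return dict(radj)
```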
