Multigraph and adjacency list - performance

I have a problem that can be represented as a multigraph. To represent this graph internally, I’m thinking of a matrix. I like the idea of a matrix because I want to count the number of edges for a vertex. This would be O(n) time, because all I would have to do is loop through the correct column, so the time complexity would be linear in the number of vertices in the graph, right? However, I’m also thinking of the space complexity. If this graph were to grow, there could be a lot of wasted space. This leads me to using an adjacency list. This may reduce my space complexity, but it sounds like my time complexity just increased. How would I express the time complexity if I wanted to determine the number of edges for a particular vertex? I know the first operation would be to find the vertex, which would be O(n), but then I would also have to scan its list of edges, which could also be O(n). So does this mean my time complexity for this operation is O(n^2)?
EDIT:
I guess if I were to use a hash table, the first operation would be O(1), so does that mean my operation to find the number of edges for a vertex is O(n)?

It will be O(|E|) in the worst case (more precisely, O(deg(v)) for the vertex you look up). |E| can be O(|V|^2), but you want to use an adjacency list because the graph is sparse, so |E| << |V|^2, and it's better to say O(|E|).
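As a rough illustration of the hash-table variant from the EDIT, here is a minimal Python sketch. The class name `Multigraph` and its methods are hypothetical, and the graph is assumed to be undirected. Parallel edges are simply repeated entries in a vertex's list.

```python
from collections import defaultdict


class Multigraph:
    """Undirected multigraph stored as a hash table of adjacency lists."""

    def __init__(self):
        # vertex -> list of neighbours; a parallel edge is a repeated entry
        self.adj = defaultdict(list)

    def add_edge(self, u, v):
        # Store the edge at both endpoints. A self-loop (u == v) ends up
        # twice in the same list, so it contributes 2 to the degree.
        self.adj[u].append(v)
        self.adj[v].append(u)

    def degree(self, v):
        # Expected O(1) hash lookup; Python lists track their own length,
        # so no scan is needed here.
        return len(self.adj.get(v, []))

    def edges_between(self, u, v):
        # Counting parallel edges to one neighbour scans u's list: O(deg(u)).
        return self.adj.get(u, []).count(v)


g = Multigraph()
g.add_edge("a", "b")
g.add_edge("a", "b")  # parallel edge
g.add_edge("a", "c")
print(g.degree("a"), g.edges_between("a", "b"))
```

In a language where the list length is not stored, `degree` would scan the list instead, which is the O(deg(v)) cost discussed above.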

Related

Which Graph Algorithms prefer adjacency matrix and why?

I heard that adjacency lists are used in most graph algorithms (but not all). I'm just wondering what algorithms prefer adjacency matrices and why?
So far I’ve found that Floyd Warshall uses adjacency matrices.
Adjacency lists are generally faster than adjacency matrices in algorithms in which the key operation performed per node is “iterate over all the nodes adjacent to this node.” That can be done in time O(deg(v)) for an adjacency list, where deg(v) is the degree of node v, while it takes time Θ(n) in an adjacency matrix. Similarly, adjacency lists make it fast to iterate over all of the edges in a graph: it takes time O(m + n) to do so, compared with time Θ(n^2) for adjacency matrices.
Some of the most-commonly-used graph algorithms (BFS, DFS, Dijkstra’s algorithm, A* search, Kruskal’s algorithm, Prim’s algorithm, Bellman-Ford, Karger’s algorithm, etc.) require fast iteration over all edges or the edges incident to particular nodes, so they work best with adjacency lists.
You mentioned that Floyd-Warshall uses adjacency matrices. While Floyd-Warshall does maintain an internal matrix tracking shortest paths seen so far, it doesn’t actually require the original graph to be an adjacency matrix. The overall cost of the dynamic programming work is Θ(n^3), which is bigger than the O(n^2) cost of converting an adjacency list into an adjacency matrix or vice-versa.
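To make that concrete, here is a small sketch of Floyd-Warshall that takes an adjacency list as input and builds its distance matrix internally. The function name and the input layout (`adj[u]` as a list of `(v, weight)` pairs, vertices numbered 0..n-1, no negative cycles) are assumptions made for this example.

```python
import math


def floyd_warshall(n, adj):
    """All-pairs shortest paths from an adjacency list.

    adj[u] is a list of (v, weight) pairs; vertices are 0..n-1.
    """
    # The Theta(n^2) conversion step: seed the matrix from the lists.
    dist = [[math.inf] * n for _ in range(n)]
    for u in range(n):
        dist[u][u] = 0
        for v, w in adj[u]:
            dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge

    # The Theta(n^3) dynamic programming step dominates the conversion.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist
```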
There are only a few places where an adjacency matrix is faster than an adjacency list. Adjacency matrices take time O(1) to test whether a particular edge is present in the graph, which is faster than the O(deg(v)) cost of the corresponding operation on an adjacency list. Since the cost of converting an adjacency list to an adjacency matrix is Θ(n^2), the only cases where an adjacency matrix would outperform an adjacency list are situations where (1) random access to the edges is required and (2) the total runtime of the algorithm is o(n^2). I only know a few algorithms that do this. For example, there’s the celebrity-finding problem, where you’re given a graph and are asked to find whether there’s a node with incoming edges from every other node and outgoing edges to no nodes. This can be done in time O(n) using an adjacency matrix, faster than what can be done with an adjacency list.
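One common way to get the O(n) bound for the celebrity problem is a candidate-elimination pass followed by a verification pass. The sketch below assumes `matrix[i][j]` is truthy exactly when there is an edge i -> j. The function name is made up for illustration.

```python
def find_celebrity(matrix):
    """Return the index of the celebrity node, or None if there is none.

    matrix[i][j] is truthy when there is an edge i -> j.
    Uses O(n) edge lookups, each O(1) in an adjacency matrix.
    """
    n = len(matrix)
    if n == 0:
        return None

    # Elimination: each lookup rules out one node. If candidate -> other
    # exists, candidate has an outgoing edge and is ruled out. Otherwise
    # other is missing an incoming edge and is ruled out.
    candidate = 0
    for other in range(1, n):
        if matrix[candidate][other]:
            candidate = other

    # Verification: the survivor must have no outgoing edges and an
    # incoming edge from every other node.
    for other in range(n):
        if other == candidate:
            continue
        if matrix[candidate][other] or not matrix[other][candidate]:
            return None
    return candidate
```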
(That being said, you could also use an adjacency list represented using cuckoo hash tables rather than regular lists and match the same runtime bounds as above, though with the cost of creating the adjacency list now only expected to be fast rather than actually worst-case efficient.)
The main reason I’ve found adjacency matrices to be useful is in thinking about graphs from a different perspective. For example, raising an adjacency matrix to the kth power makes a new matrix that counts the number of walks (paths that are allowed to revisit nodes) from one node to another using exactly k hops. This can be used to count and find triangles in graphs faster than the naive algorithm, for example. Similarly, the Four Russians algorithm for computing transitive closures of graphs works by representing the graph as a matrix and using some clever techniques (treating blocks of bits as integers that are then used in a lookup table) to outperform the naive search.
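As a small illustration of the matrix-power idea, the sketch below counts triangles in a simple undirected graph from the diagonal of the cubed adjacency matrix. Each triangle is counted six times there (three starting nodes, two directions). The helper names are hypothetical, and the multiplication is the plain cubic one, so this only demonstrates the identity. Any speedup over the naive algorithm would come from substituting a fast matrix multiplication routine.

```python
def mat_mul(a, b):
    """Multiply two n x n matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_triangles(adj):
    """Count triangles in a simple undirected graph.

    adj must be a symmetric 0/1 matrix with a zero diagonal.
    """
    cube = mat_mul(mat_mul(adj, adj), adj)
    # cube[i][i] is the number of closed 3-hop routes starting at node i.
    trace = sum(cube[i][i] for i in range(len(adj)))
    return trace // 6
```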
Hope this helps!

Dijkstra's storing the Graph in a text file

I was wondering, what is the most efficient way of storing the graph in a text file while you are implementing Dijkstra's algorithm? (Adjacency matrix, incidence matrix? etc)
In the general case, a good approach is to store a list of all edges.
It takes O(E) space: we store two endpoints per edge.
To store it on disk, that will suffice.
To work with such a list, it is usually stored in memory as V adjacency lists, one for every vertex.
This duplicates each edge (u->v and v->u) if the graph is undirected.
However, a common operation for graph algorithms is to traverse all edges from a given vertex.
By storing an adjacency list for each vertex, we get to do that in O(number of neighbors), which is the best possible.
Adjacency matrix takes O(V^2) space, which might be fine for dense graphs, but is worse than O(E) in the general case.
Incidence matrix takes O(VE) space, and is not efficient, unless your graph is somehow very special to make it so.
The usual binary-heap implementations of Dijkstra's algorithm take O(E log V) time, so O(E) memory is usually fine.
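A minimal sketch of that workflow follows. It assumes a hypothetical text format with one `u v weight` triple per line, blank lines and `#` comments being skipped. The edge list is loaded into per-vertex adjacency lists and then fed to a heap-based Dijkstra. The function names are made up for this example.

```python
import heapq
from collections import defaultdict


def parse_edge_list(lines, directed=False):
    """Build adjacency lists from lines of the form 'u v weight'."""
    adj = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v, w = line.split()
        adj[u].append((v, float(w)))
        if not directed:
            # Undirected graphs store each edge in both directions.
            adj[v].append((u, float(w)))
    return adj


def dijkstra(adj, source):
    """Shortest distances from source; assumes non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Any iterable of lines works, e.g. an open file: open("graph.txt")
example = ["# u v weight", "a b 1", "b c 2", "a c 5"]
print(dijkstra(parse_edge_list(example), "a"))
```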

Linear Time Algorithm For MST

I was wondering if anyone can point to a linear time algorithm to find the MST of a graph when there is a small number of weights (I.e edges can only have 2 different weights).
I could not find anything on google other than Prim's, Kruskal's, Boruvka's none of which seem to have any properties that would reduce the run time in this special case. I'm guessing to make it linear time it would have to be some sort of modification of BFS (which finds the MST when the weights are uniform).
The cause of the lg V factor in Prim's O(E lg V) runtime is the heap that is used to find the next candidate edge. I'm pretty sure that it is possible to design a priority queue that does insertion and removal in constant time when there's a limited number of possible weights, which would reduce Prim to O(E + V).
For the priority queue, I believe an array whose indices cover all the possible weights would suffice, where each element points to a linked list that contains the elements with that weight. You'd still have a factor of d (the number of distinct weights) for figuring out which list to get the next element out of (the "lowest" non-empty one), but if d is a constant, then you'll be fine.
Elaborating on Aasmund Eldhuset's answer: if the weights in the MST are restricted to numbers in the range 0, 1, 2, 3, ..., U-1, then you can adapt many of the existing algorithms to run in (near) linear time if U is a constant.
For example, let's take Kruskal's algorithm. The first step in Kruskal's algorithm is to sort the edges into ascending order of weight. You can do this in time O(m + U) if you use counting sort or time O(m lg U) if you use radix sort. If U is a constant, then both of these sorting steps take linear time. Consequently, the runtime for running Kruskal's algorithm in this case would be O(m α(m)), where α(m) is the inverse Ackermann function, because the limiting factor is going to be the runtime of maintaining the disjoint-set forest.
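Here is a rough sketch of that Kruskal variant, assuming integer weights in [0, U) and vertices numbered 0..n-1. The function name is hypothetical. Distributing the edges into U buckets plays the role of the counting sort, and the disjoint-set forest uses union by size with path halving.

```python
def kruskal_small_weights(n, edges, U):
    """Return the MST edges (a spanning forest if disconnected).

    edges is a list of (u, v, w) with integer 0 <= w < U.
    """
    # Counting-sort step: O(m + U) to group edges by weight.
    buckets = [[] for _ in range(U)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))

    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for bucket in buckets:  # ascending weight order
        for u, v, w in bucket:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge would close a cycle
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru  # union by size
            size[ru] += size[rv]
            tree.append((u, v, w))
    return tree
```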
Alternatively, look at Prim's algorithm. You need to maintain a priority queue of the candidate distances to the nodes. If you know that all the edges are in the range [0, U), then you can do this in a super naive way by just storing an array of U buckets, one per possible priority. Inserting into the priority queue then just requires you to dump an item into the right bucket. You can do a decrease-key by evicting an element and moving it to a lower bucket. You can then do a find-min by scanning the buckets. This causes the algorithm runtime to be O(m + nU), which is linear if U is a constant.
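The bucket-based priority queue described above could look roughly like this. The sketch assumes a connected undirected graph with at least one vertex, integer weights in [0, U), and `adj[u]` given as a list of `(v, w)` pairs. The function name is made up, and Python sets stand in for the buckets.

```python
def prim_bucket(n, adj, U):
    """Total MST weight using an array of U buckets as the priority queue.

    adj[u] is a list of (v, w) pairs with integer 0 <= w < U.
    """
    key = [None] * n          # best known edge weight into each vertex
    in_tree = [False] * n
    buckets = [set() for _ in range(U)]
    total = 0

    def add_to_tree(u):
        in_tree[u] = True
        for v, w in adj[u]:
            if in_tree[v]:
                continue
            if key[v] is None or w < key[v]:
                if key[v] is not None:
                    buckets[key[v]].discard(v)  # decrease-key: evict...
                key[v] = w
                buckets[w].add(v)               # ...and move to a lower bucket

    add_to_tree(0)
    for _ in range(n - 1):
        # find-min: scan the buckets from the lowest weight, O(U) per step
        u = None
        for bucket in buckets:
            if bucket:
                u = bucket.pop()
                break
        if u is None:
            raise ValueError("graph is not connected")
        total += key[u]
        add_to_tree(u)
    return total
```

Each edge is examined a constant number of times and each of the n - 1 extractions scans at most U buckets, which matches the O(m + nU) bound.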
Barder and Burkhardt in 2019 proposed this approach to find MSTs in linear time, provided that the non-MST edges are given in ascending order of their weights.

Can the adjacency matrix implementation of Prim use a min heap?

I found that there are two ways to implement Prim's algorithm, and that the time complexity with an adjacency matrix is O(V^2) while the time complexity with a heap and adjacency list is O(E lg(V)).
I'm wondering whether I can use a heap when the graph is represented with an adjacency matrix. Does it make sense? If it does, is there any difference between adjacency matrix + heap and adjacency list + heap?
Generally, the matrix graph-representation is not so good for Prim's algorithm.
This is because of the main iteration of the algorithm, which pops out a node from the heap, and then scans its neighbors. How do you find its neighbors? Using the matrix graph representation, you basically need to loop over an entire matrix row (in the list graph-representation, you just need to loop over the node's list, which can be significantly shorter).
This means that, irrespective of the heap, the total cost of finding the neighbors of the popped nodes is already Ω(|V|^2), as each node's row is eventually scanned.
So, no - it doesn't make much sense. The heap does not reduce the overall complexity.

How can a heap be used to optimize Prim's minimum spanning tree algorithm?

I have to solve a question that is something like this:
I am given a number N which represents the number of points I have. Each point has two coordinates: X and Y.
I can find the distance between two points with the following formula:
abs(x2-x1)+abs(y2-y1),
(x1,y1) being the coordinates of the first point, (x2,y2) the coordinates of the second point and abs() being the absolute value.
I have to find the minimum spanning tree, meaning I must have all my points connected with the sum of the edges being minimal. Prim's algorithm is good, but it is too slow. I read that I can make it faster using a heap but I didn't find any article that explains how to do that.
Can anyone explain to me how Prim's algorithm works with a heap (some sample code would be good but not necessarily), please?
It is possible to solve this problem efficiently (in O(n log n) time), but it is not that easy. Just using Prim's algorithm with a heap does not help (it actually makes it even slower), because its time complexity is O(E log V), which is O(n^2 log n) in this case, since every pair of points forms an edge.
However, you can use the Delaunay triangulation to reduce the number of edges in the graph. The Delaunay triangulation graph is planar, so it has a linear number of edges. That's why running Prim's algorithm with a heap on it gives O(n log n) time complexity (there are O(n) edges and n vertices). You can read more about it here (covering this algorithm in detail and proving its correctness would make my answer way too long): http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree. Note that even though the article is about the Euclidean MST, the approach for your case is essentially the same (it is possible to build the Delaunay triangulation for Manhattan distance efficiently, too).
A description of the Prim's algorithm with a heap itself is already present in two other answers to your question.
From the Wikipedia article on Prim's algorithm:
[S]toring vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
While it was pointed out that Prim's with a heap is O(E log V), which is O(n^2 log n) in the worst case, I can provide what makes the heap faster in cases other than that worst case, since that has still not been answered.
What makes Prim's so costly at O(V^2) is the necessary updating each iteration in the algorithm. In general, Prim's works by keeping a table of your vertices with the lowest length to other vertices and picking the cheapest vertex to add to your growing tree until all are added. Every time you add a vertex, you must then go back to your table and update any vertices that can now be accessed with less weight. You then must walk back all the way through your table to decide which vertex is cheapest to add. This setup - having to pick the next vertex (O(V)) V times - gives the O(V^2).
The heap is able to help this running time in all cases besides the worst case because it fixes this bottleneck. By working with a minimum heap, you can access the minimum weight in consideration in O(1). Additionally, it costs O(log V) to fix a heap after adding a number to it to maintain its properties, which is done E times for O(E log V) to maintain the heap for Prim's. This becomes the new bottleneck, which is what gives rise to the final running time of O(E log V).
So, depending on how much you know about your data, Prim's with a heap can certainly be more efficient than without!
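A minimal sketch of Prim's algorithm with a heap, using Python's `heapq`, is shown below. Since `heapq` has no decrease-key operation, this version pushes a new entry instead and skips stale ones when they are popped, which keeps the O(E log V) bound. The function name and the input layout (`adj[u]` as a list of `(v, w)` pairs, vertices 0..n-1, connected graph) are assumptions for this example.

```python
import heapq


def prim_heap(n, adj):
    """Total MST weight of a connected undirected graph.

    adj[u] is a list of (v, w) pairs; vertices are 0..n-1.
    """
    in_tree = [False] * n
    heap = [(0, 0)]  # (weight of the edge into the vertex, vertex)
    total = 0
    added = 0
    while heap and added < n:
        w, u = heapq.heappop(heap)  # cheapest candidate, O(log V)
        if in_tree[u]:
            continue  # stale entry: u was already reached more cheaply
        in_tree[u] = True
        total += w
        added += 1
        for v, wv in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (wv, v))  # at most one push per edge direction
    if added != n:
        raise ValueError("graph is not connected")
    return total
```

For the points question above, `adj` would hold the Manhattan distances between points, either for all pairs (the slow O(n^2 log n) case) or only for the edges kept by a sparser graph such as the triangulation.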

Resources