Linear Time Algorithm For MST - algorithm

I was wondering if anyone can point to a linear time algorithm to find the MST of a graph when there is a small number of weights (i.e., edges can only have 2 different weights).
I could not find anything on Google other than Prim's, Kruskal's, and Boruvka's, none of which seem to have any properties that would reduce the run time in this special case. I'm guessing that to make it linear time it would have to be some sort of modification of BFS (which finds the MST when the weights are uniform).

The lg V factor in Prim's runtime (O(E lg V) with a binary heap) comes from the heap that is used to find the next candidate edge. I'm pretty sure that it is possible to design a priority queue that does insertion and removal in constant time when there's a limited number of possible weights, which would reduce Prim to O(V + E).
For the priority queue, I believe an array whose indices cover all the possible weights would suffice, where each element points to a linked list that contains the elements with that weight. You'd still have a factor of d (the number of distinct weights) for figuring out which list to get the next element out of (the "lowest" non-empty one), but if d is a constant, then you'll be fine.

Elaborating on Aasmund Eldhuset's answer: if the weights in the MST are restricted to numbers in the range 0, 1, 2, 3, ..., U-1, then you can adapt many of the existing algorithms to run in (near) linear time if U is a constant.
For example, let's take Kruskal's algorithm. The first step in Kruskal's algorithm is to sort the edges into ascending order of weight. You can do this in time O(m + U) if you use counting sort or time O(m lg U) if you use radix sort. If U is a constant, then both of these sorting steps take linear time. Consequently, the runtime for running Kruskal's algorithm in this case would be O(m α(m)), where α(m) is the inverse Ackermann function, because the limiting factor is going to be the runtime of maintaining the disjoint-set forest.
Alternatively, look at Prim's algorithm. You need to maintain a priority queue of the candidate distances to the nodes. If you know that all the edges are in the range [0, U), then you can do this in a super naive way by just storing an array of U buckets, one per possible priority. Inserting into the priority queue then just requires you to dump an item into the right bucket. You can do a decrease-key by evicting an element and moving it to a lower bucket. You can then do a find-min by scanning the buckets. This causes the algorithm runtime to be O(m + nU), which is linear if U is a constant.
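To make the bucket idea for Prim's concrete, here is a minimal sketch in Python. The function name `prim_bucket_mst`, the adjacency-list layout, and the choice of vertex 0 as the start are my own assumptions for illustration, and the graph is assumed to be connected with integer weights in [0, U).

```python
def prim_bucket_mst(n, adj, U):
    """Total weight of an MST of a connected graph on vertices 0..n-1.

    adj[v] is a list of (neighbor, weight) pairs with 0 <= weight < U.
    Hypothetical helper, not taken from any library.
    """
    INF = U                                  # sentinel key, larger than any real weight
    key = [INF] * n                          # cheapest known edge from the tree to each vertex
    in_tree = [False] * n
    buckets = [set() for _ in range(U)]      # one bucket per possible priority

    key[0] = 0                               # assumption: start from vertex 0
    buckets[0].add(0)
    total = 0
    for _ in range(n):
        # find-min: scan the U buckets for the first non-empty one
        w = next(i for i in range(U) if buckets[i])
        v = buckets[w].pop()
        in_tree[v] = True
        total += w
        for u, wt in adj[v]:
            if not in_tree[u] and wt < key[u]:
                # decrease-key: evict from the old bucket, move to a lower one
                if key[u] != INF:
                    buckets[key[u]].discard(u)
                key[u] = wt
                buckets[wt].add(u)
    return total
```

Each of the n find-min steps scans at most U buckets and each edge triggers at most one O(1) move per direction, which matches the O(m + nU) bound above.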

Bader and Burkhardt in 2019 proposed this approach to find MSTs in linear time, provided that the non-MST edges are given in ascending order of their weights.

Related

Find the shortest path from source to target in a weighted-undirected graph in O(V + E) time

I've been tasked with designing an algorithm that finds the shortest path in a weighted undirected graph with V nodes and E edges in O(V + E) time. The graph weights are all positive integers and no weight is greater than 15.
I believe I can use Dijkstra's algorithm to find the shortest path from a source node to a target node, but I don't think it satisfies the runtime constraints.
Knowing the runtimes of BFS and DFS, I'm thinking that some sort of modification of those algorithms will get me to O(V + E), but I'm not sure what direction to head in or how I can leverage the <= 15 weight constraint on the edges.
Any help is appreciated.
You can use Dijkstra's algorithm, but you have to be a little careful with the priority queue.
Since all the weights are integers from 1 to 15, there can only be 16 different priorities in the queue at any one time. You can use this fact to implement all your priority queue operations in constant time. That will change the complexity of the algorithm from O(|V| + |E| log |V|) to O(|V| + |E|).
There are lots of ways to make that priority queue. Basically you partition the entries into lists of entries with the same priority, and then you only have to prioritize the 16 lists. It's reasonable to keep those 16 lists in a circular array.
The algorithm that you're looking for is called Dial's algorithm; it also works on graphs that contain cycles. Its complexity is O(E + WV), where W is the maximum edge weight. If W >> V, you can replace the one bucket per value with buckets covering the ranges 1, 2-3, 4-7, 8-15, etc.
It's an optimization of Dijkstra's algorithm, which uses the fact that, given the range of weights, you're able to replace the Fibonacci heap with buckets, which will decrease the find_node operation from O(log n) to O(1).
The algorithm in detail is well described on GeeksForGeeks and Wikipedia among others.
You should also be interested in Directed Acyclic Graph Shortest Path in Cormen's Introduction to Algorithms on p. 655 or on GeeksForGeeks
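For reference, the circular array of buckets described in the answers above can be sketched like this in Python. The name `dial_shortest_paths` and the adjacency-list layout are assumptions of mine, and the sketch relies on every weight being an integer between 1 and `max_w`.

```python
def dial_shortest_paths(n, adj, source, max_w):
    """Distances from source to every vertex (float('inf') if unreachable).

    adj[v] is a list of (neighbor, weight) pairs, 1 <= weight <= max_w.
    Hypothetical helper sketching Dial's bucket-based variant of Dijkstra.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    size = max_w + 1                         # circular array of max_w + 1 buckets
    buckets = [[] for _ in range(size)]
    buckets[0].append(source)
    pending = 1                              # entries currently stored in the buckets
    d = 0                                    # distance value currently being settled
    while pending:
        bucket = buckets[d % size]
        while bucket:
            v = bucket.pop()
            pending -= 1
            if dist[v] != d:                 # stale entry: v was reached more cheaply
                continue
            for u, w in adj[v]:
                nd = d + w
                if nd < dist[u]:
                    dist[u] = nd
                    buckets[nd % size].append(u)
                    pending += 1
        d += 1
    return dist
```

Because every tentative distance in the queue lies in the window from d to d + max_w, the slot `nd % size` never collides with a different distance value, which is why max_w + 1 buckets are enough. Instead of a true decrease-key, this sketch leaves stale entries in place and skips them when popped.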

Design an algorithm which finds a minimum spanning tree of this graph in linear time

I am working on a problem in which I am given an undirected graph G on n vertices and with m edges, such that each edge e has a weight w(e) ∈ {1, 2, 3}. The task is to design an algorithm which finds a minimum spanning tree of G in linear time (O(n + m)).
These are my thoughts so far:
In the Algorithmic Graph Theory course which I am currently studying, we have covered Kruskal's and Prim's MST Algorithms. Perhaps I can modify these in some way, in order to gain linear time.
Sorting of edges generally takes log-linear (O(m log m)) time; however, since all edge weights are either 1, 2 or 3, Bucket Sort can be used to sort the edges in time linear in the number of edges (O(m)).
I am using the following version of Kruskal's algorithm:
Kruskal(G)
    for each vertex v ∈ V do MAKE-SET(v)
    sort all edges in non-decreasing order
    for edge (u, v) ∈ E (in the non-decreasing order) do
        if FIND(u) ≠ FIND(v) then
            colour (u, v) blue
            UNION(u, v)
    od
    return the tree formed by blue edges
Also, MAKE-SET(x), UNION(x, y) and FIND(x) are defined as follows:
MAKE-SET(x)
    Create a new tree rooted at x
    PARENT(x) ≔ x
UNION(x, y)
    PARENT(FIND(x)) ≔ FIND(y)
FIND(x)
    y ≔ x
    while y ≠ PARENT(y) do
        y ≔ PARENT(y)
    return y
The issue I have at the moment is that, although I can implement the first two lines of Kruskal's in linear time, I have not managed to do the same for the next four lines of the algorithm (from 'for edge u, ...' until 'UNION (u, v)').
I would appreciate hints as to how to implement the rest of the algorithm in linear time, or how to find a modification of Kruskal's (or some other minimum spanning tree algorithm) in linear time.
Thank you.
If you use the Disjoint Sets data structure with both path compression and union by rank, the amortized cost of each operation grows extremely slowly: it is bounded by the inverse of the Ackermann function, which is not that large even for sizes such as the estimated number of atoms in the universe. Effectively, then, each operation is considered constant time, and so the rest of the algorithm is considered linear time as well.
From the same Wikipedia article:
Since α(n) is the inverse of this function, α(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
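Putting the two pieces together (bucket sort on the weights, then a disjoint-set forest with path compression and union by rank), a rough Python sketch might look like the following. The function name `kruskal_small_weights` and the edge-list format are assumptions for illustration, not the asker's course code.

```python
def kruskal_small_weights(n, edges, max_w=3):
    """Return the MST edges of a graph on vertices 0..n-1.

    edges is a list of (u, v, w) with integer w in 1..max_w.
    Hypothetical helper; near-linear rather than strictly linear.
    """
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:             # path compression
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                     # already connected, edge would form a cycle
        if rank[ra] < rank[rb]:              # union by rank: attach the shallower tree
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    # Bucket sort: one list per weight, O(m + max_w) in total.
    buckets = [[] for _ in range(max_w + 1)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))

    tree = []
    for bucket in buckets:                   # buckets are visited in increasing weight
        for u, v, w in bucket:
            if union(u, v):
                tree.append((u, v, w))
    return tree
```

The plain FIND from the question can walk a long chain on every call, whereas this version flattens the chain as it goes, which is what brings the loop down to O(m α(n)).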

For a given graph G = (V,E) how can you sort its adjacency list representation in O(E+V) time?

Because we know that the integers representing a vertex can take values in [0,...,|V|-1] range, we can use counting sort in order to sort each entry of the adjacency list in O(V) time.
Since we have V lists to sort, that would give us a O(V^2) time algorithm. I don't see how we can transform this into an O(V+E) time algorithm...
In fact, you need to sort E elements in total, the number of edges, so your estimate of O(V^2) is not quite correct. You sort each of the adjacency lists in time linear in the number of edges it contains (this requires not paying O(V) per list for a fresh counting array, for example by handling all the lists in one shared pass). As in total you will have E edges, the complexity of sorting all lists will be O(E). Of course, as you have V lists, you can't get lower than O(V), and thus the estimate is O(V + E).
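One way to get the shared pass without any per-list counting array is to transpose the graph twice. This is a different technique from running counting sort on each list, offered only as a sketch; the names `transpose_sorted` and `sort_adjacency_lists` are made up for this example, and `adj[u]` is assumed to be a list of vertex indices in 0..V-1.

```python
def transpose_sorted(adj):
    """Return the transposed graph; every list comes out in increasing order."""
    n = len(adj)
    out = [[] for _ in range(n)]
    for u in range(n):                       # u is visited in increasing order...
        for v in adj[u]:
            out[v].append(u)                 # ...so each out[v] is filled in sorted order
    return out


def sort_adjacency_lists(adj):
    """Sort every adjacency list in O(V + E) total by transposing twice."""
    return transpose_sorted(transpose_sorted(adj))
```

Each transpose touches every vertex once and every edge once, so the total is O(V + E). For an undirected graph stored symmetrically, a single transpose already gives the sorted lists.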

How can a heap be used to optimize Prim's minimum spanning tree algorithm?

I have to solve a question that is something like this:
I am given a number N which represents the number of points I have. Each point has two coordinates: X and Y.
I can find the distance between two points with the following formula:
abs(x2-x1)+abs(y2-y1),
(x1,y1) being the coordinates of the first point, (x2,y2) the coordinates of the second point and abs() being the absolute value.
I have to find the minimum spanning tree, meaning I must have all my points connected with the sum of the edges being minimal. Prim's algorithm is good, but it is too slow. I read that I can make it faster using a heap but I didn't find any article that explains how to do that.
Can anyone explain to me how Prim's algorithm works with a heap (some sample code would be good but not necessarily), please?
It is possible to solve this problem efficiently (in O(n log n) time), but it is not that easy. Just using Prim's algorithm with a heap does not help (it actually makes it even slower), because its time complexity is O(E log V), which is O(n^2 log n) in this case, since every pair of points is connected by an edge.
However, you can use the Delaunay triangulation to reduce the number of edges in the graph. The Delaunay triangulation graph is planar, so it has a linear number of edges. That's why running Prim's algorithm with a heap on it gives O(n log n) time complexity (there are O(n) edges and n vertices). You can read more about it here (covering this algorithm in detail and proving its correctness would make my answer way too long): http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree. Note that even though the article is about the Euclidean MST, the approach for your case is essentially the same (it is possible to build the Delaunay triangulation for Manhattan distance efficiently, too).
A description of the Prim's algorithm with a heap itself is already present in two other answers to your question.
From the Wikipedia article on Prim's algorithm:
[S]toring vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
While it was pointed out that Prim's with a heap is O(E log V), which is O(n^2 log n) in the worst case, I can provide what makes the heap faster in cases other than that worst case, since that has still not been answered.
What makes Prim's so costly at O(V^2) is the updating required in each iteration of the algorithm. In general, Prim's works by keeping a table of your vertices with the lowest length to other vertices and picking the cheapest vertex to add to your growing tree until all are added. Every time you add a vertex, you must then go back to your table and update any vertices that can now be accessed with less weight. You then must walk back all the way through your table to decide which vertex is cheapest to add. This setup - having to pick the next vertex (O(V)) V times - gives the O(V^2).
The heap is able to help this running time in all cases besides the worst case because it fixes this bottleneck. By working with a minimum heap, you can access the minimum weight in consideration in O(1). Additionally, it costs O(log V) to fix a heap after adding a number to it to maintain its properties, which is done E times for O(E log V) to maintain the heap for Prim's. This becomes the new bottleneck, which is what gives rise to the final running time of O(E log V).
So, depending on how much you know about your data, Prim's with a heap can certainly be more efficient than without!
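Since sample code was asked for, here is a minimal sketch of Prim's with a binary heap in Python. Python's `heapq` module has no decrease-key, so this version swaps in lazy deletion (push duplicates and skip stale entries) rather than the decrease-key scheme quoted from Wikipedia. The name `prim_heap_mst` and the adjacency-list layout are my own assumptions, and the graph is assumed to be connected.

```python
import heapq


def prim_heap_mst(n, adj):
    """Total MST weight of a connected graph on vertices 0..n-1.

    adj[v] is a list of (neighbor, weight) pairs. Hypothetical helper.
    """
    in_tree = [False] * n
    heap = [(0, 0)]                          # (weight of edge into the tree, vertex); start at 0
    total = 0
    added = 0
    while heap and added < n:
        w, v = heapq.heappop(heap)           # cheapest candidate, O(log E)
        if in_tree[v]:
            continue                         # stale entry: v was already added more cheaply
        in_tree[v] = True
        total += w
        added += 1
        for u, wt in adj[v]:
            if not in_tree[u]:
                heapq.heappush(heap, (wt, u))
    return total
```

With lazy deletion the heap can hold up to E entries, so the bound is O(E log E), which is the same as O(E log V) up to a constant factor. On a complete graph built from N points this is still O(N^2 log N), as noted above.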

Directed graph (topological sort)

Say there exists a directed graph, G(V, E) (V represents vertices and E represents edges), where each edge (x, y) is associated with a weight w(x, y), an integer between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(m + n), where m is the number of vertices and n is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?
The algorithm you need to use for finding the minimal path from a given vertex to another in a weighted graph is Dijkstra's algorithm. Unfortunately, its complexity is O(n*log(n) + m), which may be more than what you are trying to achieve.
However, in your case the edges are special: their weights have only 10 valid values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small set of weights) in which all operations take constant time.
One possible way to do that is to have 10 lists, one for each weight. Adding an edge to the data structure is simply an append to a list. Finding the minimum element is an iteration over the 10 lists to find the first one that is non-empty. This is still constant, as no more than 10 iterations will be performed. Removing the minimum element is also pretty straightforward: a simple removal from a list.
Using Dijkstra's algorithm with some data structure of the same asymptotic complexity will be what you need.

Resources