Design an algorithm which finds a minimum spanning tree of this graph in linear time

I am working on a problem in which I am given an undirected graph G on n vertices and with m edges, such that each edge e has a weight w(e) ∈ {1, 2, 3}. The task is to design an algorithm which finds a minimum spanning tree of G in linear time (O(n + m)).
These are my thoughts so far:
In the Algorithmic Graph Theory course which I am currently studying, we have covered Kruskal's and Prim's MST Algorithms. Perhaps I can modify these in some way, in order to gain linear time.
Sorting the edges generally takes log-linear time (O(m log m)); however, since all edge weights are 1, 2 or 3, bucket sort can be used to sort the edges in time linear in the number of edges (O(m)).
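The bucket-sort idea can be sketched as follows (a minimal sketch; the `(u, v, w)` edge-triple format and the function name are illustrative assumptions, not from the course material):

```python
# Bucket sort for edges whose weights are restricted to {1, 2, 3}:
# one pass to distribute, one pass to concatenate, so O(m) total.

def bucket_sort_edges(edges):
    """Return the edges in non-decreasing order of weight in O(m) time."""
    buckets = {1: [], 2: [], 3: []}
    for u, v, w in edges:
        buckets[w].append((u, v, w))
    return buckets[1] + buckets[2] + buckets[3]

edges = [(0, 1, 3), (1, 2, 1), (0, 2, 2), (2, 3, 1)]
print(bucket_sort_edges(edges))
# -> [(1, 2, 1), (2, 3, 1), (0, 2, 2), (0, 1, 3)]
```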
I am using the following version of Kruskal's algorithm:
Kruskal(G)
    for each vertex v ∈ V do MAKE-SET(v)
    sort all edges in non-decreasing order of weight
    for each edge (u, v) ∈ E (in that non-decreasing order) do
        if FIND(u) ≠ FIND(v) then
            colour (u, v) blue
            UNION(u, v)
    od
    return the tree formed by the blue edges
Also, MAKE-SET(x), UNION(x, y) and FIND(x) are defined as follows:
MAKE-SET(x)
    create a new tree rooted at x
    PARENT(x) := x

UNION(x, y)
    PARENT(FIND(x)) := FIND(y)

FIND(x)
    y := x
    while y ≠ PARENT(y) do
        y := PARENT(y)
    return y
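A direct Python translation of the pseudocode above might look like this (a sketch; the `(u, v, w)` edge format is assumed, and Python's built-in sort stands in for the sorting step, where a weight-bucketing sort would make that step linear for weights in {1, 2, 3}):

```python
def kruskal(vertices, edges):
    """MST via the pseudocode above; edges are (u, v, w) triples (assumed format)."""
    parent = {}

    def make_set(x):
        parent[x] = x                      # a new tree rooted at x

    def find(x):
        y = x
        while y != parent[y]:              # walk up to the root
            y = parent[y]
        return y

    def union(x, y):
        parent[find(x)] = find(y)          # hang one root under the other

    for v in vertices:
        make_set(v)
    blue = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # non-decreasing weight
        if find(u) != find(v):
            blue.append((u, v, w))         # colour (u, v) blue
            union(u, v)
    return blue

print(kruskal([0, 1, 2, 3], [(0, 1, 3), (1, 2, 1), (0, 2, 2), (2, 3, 1)]))
# -> [(1, 2, 1), (2, 3, 1), (0, 2, 2)]
```

Note that this naive UNION gives no good worst-case bound on FIND; that is exactly the gap the answer below addresses.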
The issue I have at the moment is that, although I can implement the first two lines of Kruskal's in linear time, I have not managed to do the same for the next four lines of the algorithm (from 'for each edge (u, v) ...' through 'UNION(u, v)').
I would appreciate hints as to how to implement the rest of the algorithm in linear time, or how to find a modification of Kruskal's (or some other minimum spanning tree algorithm) in linear time.
Thank you.

If you use the disjoint-set data structure with both path compression and union by rank, you get a data structure in which each operation's amortized cost grows extremely slowly - it is the inverse of the Ackermann function, and it is less than 5 even for sizes such as the estimated number of atoms in the universe. Effectively, then, each operation is considered constant time, and so the rest of the algorithm is considered linear time as well.
From the Wikipedia article on the disjoint-set data structure:
Since Ξ±(n) is the inverse of this function, Ξ±(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
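A sketch of such a disjoint-set forest with union by rank and path compression (class name and interface are illustrative, not from the original post):

```python
# Disjoint-set forest with union by rank and path compression;
# each operation runs in amortized O(α(n)), effectively constant time.

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        """Merge the sets containing x and y; return False if already merged."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:      # union by rank: attach the
            rx, ry = ry, rx                    # shallower tree under the deeper
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

d = DSU(4)
d.union(0, 1); d.union(2, 3)
print(d.find(0) == d.find(1), d.find(0) == d.find(2))   # -> True False
```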

Related

Building MST from a graph with "very few" edges in linear time

I was at an interview and interviewer asked me a question:
We have a graph G(V,E), we can find MST using prim's or kruskal algorithm. But these algorithms do not take into the account that there are "very few" edges in G. How can we use this information to improve time complexity of finding MST? Can we find MST in linear time?
The only thing I could remember was that Kruskal's algorithm is faster in a sparse graphs while Prim's algorithm is faster in really dense graphs. But I couldn't answer him how to use prior knowledge about the number of edges to make MST in linear time.
Any insight or solution would be appreciated.
Kruskal's algorithm is pretty much linear after sorting the edges. If you use a union-find structure like a disjoint-set forest, the cost of processing a single edge is on the order of lg*(n), where n is the number of vertices; this function grows so slowly that for this purpose it can be considered constant. However, the problem is that sorting the edges still takes O(m * log(m)), where m is the number of edges.
Prim's algorithm will not be able to take advantage of the fact that the edges are very few.
One approach you can use is a 'reversed' MST approach (reverse-delete), where you start off with all the edges and repeatedly remove the heaviest edge whose removal does not disconnect the graph. You keep doing that until only n - 1 edges are left. Still, note that this beats Kruskal's only if the number k of edges to remove is small enough that k * n < m * log(m).
Let's say |E| = |V| + c, with c a small constant. You can run DFS on the graph and, every time you detect a cycle, remove its largest edge; you must do that c + 1 times. O((c + 1) * |E|) = O(|E|): linear time, in theory.
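This idea might be sketched as follows (assumed `(u, v, w)` edge triples; by the cycle property, the heaviest edge of any cycle can be discarded, so each DFS pass removes one of the c + 1 excess edges):

```python
import sys

def mst_few_extra_edges(n, edges):
    """Sketch: while a cycle exists, delete its heaviest edge.
    Linear when |E| = |V| + c for a small constant c."""
    sys.setrecursionlimit(1 << 16)
    edges = list(edges)
    while True:
        adj = [[] for _ in range(n)]
        for i, (u, v, w) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))
        parent = [None] * n          # (vertex, edge index) leading into each vertex
        seen = [False] * n

        def dfs(u, in_edge):
            seen[u] = True
            for v, i in adj[u]:
                if i == in_edge:
                    continue
                if seen[v]:                     # back edge closes a cycle
                    cycle, x = [i], u
                    while x != v:               # climb the tree path u .. v
                        x, j = parent[x]
                        cycle.append(j)
                    return cycle
                parent[v] = (u, i)
                found = dfs(v, i)
                if found:
                    return found
            return None

        cycle = None
        for r in range(n):
            if not seen[r]:
                cycle = dfs(r, -1)
                if cycle:
                    break
        if cycle is None:
            return edges                        # acyclic: this is the MST
        worst = max(cycle, key=lambda i: edges[i][2])
        edges.pop(worst)                        # drop the heaviest cycle edge

print(mst_few_extra_edges(4, [(0, 1, 1), (1, 2, 2), (2, 0, 5), (2, 3, 1)]))
# -> [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
```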

Linear Time Algorithm For MST

I was wondering if anyone can point to a linear-time algorithm for finding the MST of a graph when there is a small number of distinct weights (e.g. edges can only have 2 different weights).
I could not find anything on Google other than Prim's, Kruskal's and Borůvka's, none of which seems to have any property that would reduce the running time in this special case. I'm guessing that making it linear time would require some sort of modification of BFS (which finds an MST when the weights are uniform).
The cause of the log factor in Prim's runtime is the heap that is used to find the next candidate edge. I'm pretty sure it is possible to design a priority queue that does insertion and removal in (amortized) constant time when there is a limited number of possible weights, which would reduce Prim's to O(V + E).
For the priority queue, I believe it would suffice to use an array whose indices cover all the possible weights, where each element points to a linked list containing the elements with that weight. You'd still have a factor of d (the number of distinct weights) for figuring out which list to take the next element from (the "lowest" non-empty one), but if d is a constant, you'll be fine.
Elaborating on Aasmund Eldhuset's answer: if the edge weights are restricted to integers in the range 0, 1, 2, 3, ..., U-1, then you can adapt many of the existing algorithms to run in (near-)linear time when U is a constant.
For example, let's take Kruskal's algorithm. The first step in Kruskal's algorithm is to sort the edges into ascending order of weight. You can do this in time O(m + U) if you use counting sort or time O(m lg U) if you use radix sort. If U is a constant, then both of these sorting steps take linear time. Consequently, the runtime for running Kruskal's algorithm in this case would be O(m Ξ±(m)), where Ξ±(m) is the inverse Ackermann function, because the limiting factor is going to be the runtime of maintaining the disjoint-set forest.
Alternatively, look at Prim's algorithm. You need to maintain a priority queue of the candidate distances to the nodes. If you know that all the edges are in the range [0, U), then you can do this in a super naive way by just storing an array of U buckets, one per possible priority. Inserting into the priority queue then just requires you to dump an item into the right bucket. You can do a decrease-key by evicting an element and moving it to a lower bucket. You can then do a find-min by scanning the buckets. This causes the algorithm runtime to be O(m + nU), which is linear if U is a constant.
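The bucket-based Prim's described above might be sketched like this (an illustrative implementation; the adjacency-list format, the connectedness assumption, and the lazy decrease-key via re-insertion are my choices, not from the answer):

```python
# Prim's algorithm with a bucket array instead of a heap, assuming integer
# edge weights in [0, U). find-min scans at most U buckets per extraction,
# giving O(m + nU) overall: linear when U is a constant.

def prim_bucket(n, adj, U):
    """adj[u] = list of (v, w); returns total MST weight (graph assumed connected)."""
    INF = float('inf')
    dist = [INF] * n
    in_tree = [False] * n
    buckets = [[] for _ in range(U)]       # buckets[w] holds vertices with key w
    dist[0] = 0
    buckets[0].append(0)
    total = 0
    for _ in range(n):
        u = None
        for b in buckets:                  # find-min: lowest non-empty bucket
            while b:
                cand = b.pop()
                if not in_tree[cand]:      # skip stale (lazily deleted) entries
                    u = cand
                    break
            if u is not None:
                break
        in_tree[u] = True
        total += dist[u]
        for v, w in adj[u]:
            if not in_tree[v] and w < dist[v]:
                dist[v] = w                # "decrease-key": lazily re-insert
                buckets[w].append(v)
    return total

adj = [[(1, 1), (2, 3)], [(0, 1), (2, 1)], [(0, 3), (1, 1)]]
print(prim_bucket(3, adj, 4))   # -> 2
```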
Barder and Burkhardt (2019) proposed this approach to find MSTs in linear time, provided that the non-MST edges are given in ascending order of their weights.

Directed graph (topological sort)

Say there exists a directed graph G(V, E) (V represents vertices and E represents edges), where each edge (x, y) is associated with a weight w(x, y), an integer between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(n + m), where n is the number of vertices and m is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?
The standard algorithm for finding a shortest path between two given vertices in a weighted graph is Dijkstra's algorithm. Unfortunately its complexity is O(n*log(n) + m), which may be more than you are trying to achieve.
However, in your case the edges are special - their weights take only 10 distinct values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small set of possible keys) to make every operation constant time.
One possible way to do that: note that the keys in Dijkstra's priority queue are tentative distances, not edge weights, but since every edge weight is between 1 and 10, at any moment all queued distances lie within 10 of the current minimum. So a circular array of 11 lists, indexed by distance mod 11, suffices (this is known as Dial's algorithm). Adding an entry is simply an append to a list; finding the minimum means checking at most 11 lists for the first non-empty one, which is still constant; removing the minimum is also a straightforward removal from a list.
Using Dijkstra's algorithm with a data structure of this kind gives exactly the bound you need.
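A sketch of Dijkstra's algorithm with such a circular bucket queue (Dial's algorithm; the adjacency-list format and function name are assumptions for illustration):

```python
from collections import deque

# Dial's algorithm: Dijkstra with a circular bucket queue. With edge weights
# in 1..maxw, all live tentative distances lie within maxw of the current
# minimum, so maxw + 1 buckets suffice; total time is O(n + m) for constant maxw.

def dial_shortest_path(n, adj, s, t, maxw=10):
    """adj[u] = list of (v, w) with 1 <= w <= maxw; returns dist(s, t) or None."""
    INF = float('inf')
    dist = [INF] * n
    dist[s] = 0
    buckets = [deque() for _ in range(maxw + 1)]
    buckets[0].append(s)
    pending, d = 1, 0
    while pending:
        b = buckets[d % (maxw + 1)]
        while b:
            u = b.popleft()
            pending -= 1
            if dist[u] != d:               # stale entry, skip
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:        # relax edge (u, v)
                    dist[v] = d + w
                    buckets[(d + w) % (maxw + 1)].append(v)
                    pending += 1
        d += 1                             # advance the distance frontier
    return dist[t] if dist[t] < INF else None

adj = [[(1, 2), (2, 10)], [(2, 3)], []]
print(dial_shortest_path(3, adj, 0, 2))   # -> 5
```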

tree graph - find how many pairs of vertices, for which the sum of edges on a path between them is C

I've got a weighted tree graph, where all the weights are positive. I need an algorithm to solve the following problem.
How many pairs of vertices are there in this graph, for which the sum of the weights of edges between them equals C?
I thought of a solution that's O(n^2):
For each vertex, we start a DFS from it and stop a branch when the sum gets bigger than C. Since the number of edges is n-1, that obviously gives us an O(n^2) solution.
But can we do better?
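The O(n^2) approach from the question might be sketched as follows (assumed adjacency-list input; each unordered pair is found from both endpoints, hence the final halving):

```python
# From every vertex, DFS outward, pruning once the accumulated weight
# exceeds C (valid because all weights are positive), and count the
# vertices reached at distance exactly C.

def count_pairs_at_distance(n, adj, C):
    """adj[u] = list of (v, w) for a tree with positive weights."""
    count = 0
    for s in range(n):
        stack = [(s, -1, 0)]                  # (vertex, parent, distance so far)
        while stack:
            u, p, d = stack.pop()
            if d == C and u != s:
                count += 1
            if d >= C:
                continue                      # positive weights: safe to prune
            for v, w in adj[u]:
                if v != p:
                    stack.append((v, u, d + w))
    return count // 2                         # each pair was counted twice

adj = [[(1, 1)], [(0, 1), (2, 1)], [(1, 1)]]  # path 0 - 1 - 2, unit weights
print(count_pairs_at_distance(3, adj, 2))     # -> 1  (only the pair (0, 2))
```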
In terms of asymptotic complexity - no, in general you cannot do better, since the number of such pairs can itself be Θ(n^2).
As an example, take a 'sun/flower' graph:
G = (V ∪ {x}, E)
E = { (x, v) | v ∈ V }
w(e) = 1 (for all edges)
It is easy to see that the graph is indeed a tree.
However, the number of pairs at distance exactly 2 is (n-1)(n-2)/2, which is in Omega(n^2), and thus any algorithm that outputs all of them will take Omega(n^2) time on this instance.

Breadth-first search algorithm (graph represented by the adjacency list) has a quadratic time complexity?

A friend told me that the breadth-first search algorithm (with the graph represented by an adjacency list) has quadratic time complexity. But all the sources say that the complexity of BFS is exactly O(|V| + |E|), or O(n + m) - so where does the quadratic complexity come from?
All the sources are right :-) With BFS you visit each vertex and each edge exactly once, resulting in linear complexity. Now, if it's a complete graph, i.e. each pair of vertices is connected by an edge, then the number of edges grows quadratically with the number of vertices:
|E| = |V| * (|V|-1) / 2
Then one might say the complexity of BFS is quadratic in the number of vertices: O(|V|+|E|) = O(|V|^2)
BFS is O(|V| + |E|), so in terms of the size of the input it is a linear-time algorithm. However, the number of edges can be O(|V|^2) in dense graphs, so if the time complexity is expressed in terms of the number of vertices alone, BFS is O(|V|^2) and can be considered quadratic in the vertices.
O(n + m) is linear in complexity and not quadratic. O(n*m) is quadratic.
0. Initially all the vertices are labelled as unvisited. We start from a given vertex as the current vertex.
1. BFS visits all the adjacent unvisited vertices of the current vertex, queuing up these children.
2. It then labels the current vertex as visited, so that it is not queued again.
3. BFS then takes the first vertex out of the queue and repeats steps 1-2 until no unvisited vertices remain.
The runtime of the above algorithm is linear in the total number of vertices and edges, because it visits each vertex once and checks each of its edges once; thus it takes (number of vertices + number of edges) steps to completely search the graph.
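The steps above can be sketched as follows (one common variant: a vertex is marked visited when it is enqueued, so it can never be queued twice):

```python
from collections import deque

# BFS over an adjacency list: each vertex is enqueued at most once and each
# edge is examined at most twice, so the running time is O(|V| + |E|).

def bfs_order(adj, start):
    visited = [False] * len(adj)
    order = []
    queue = deque([start])
    visited[start] = True                  # mark on enqueue: no duplicates
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:                   # each edge checked from both endpoints
            if not visited[v]:
                visited[v] = True
                queue.append(v)
    return order

print(bfs_order([[1, 2], [0, 3], [0, 3], [1, 2]], 0))   # -> [0, 1, 2, 3]
```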
