Building MST from a graph with "very few" edges in linear time - algorithm

I was at an interview and the interviewer asked me a question:
We have a graph G(V,E), and we can find its MST using Prim's or Kruskal's algorithm. But these algorithms do not take into account that there are "very few" edges in G. How can we use this information to improve the time complexity of finding the MST? Can we find the MST in linear time?
The only thing I could remember was that Kruskal's algorithm is faster on sparse graphs, while Prim's algorithm is faster on really dense graphs. But I couldn't tell him how to use prior knowledge about the number of edges to build the MST in linear time.
Any insight or solution would be appreciated.

Kruskal's algorithm is pretty much linear after sorting the edges. If you use a union-find structure such as a disjoint-set forest, the cost of processing a single edge is on the order of log*(n), where n is the number of vertices, and this function grows so slowly that for practical purposes it can be considered constant. The problem, however, is that sorting the edges still takes O(m log m) time, where m is the number of edges.
Prim's algorithm will not be able to take advantage of the fact that there are very few edges.
One approach that you can use is something like a 'reversed' MST (reverse-delete) approach, where you start off with all edges and repeatedly remove the longest edge whose removal does not disconnect the graph. You keep doing that until only n - 1 edges are left. Still, note that this beats Kruskal only if the number k of edges to remove is small enough that k * n < m * log(m).
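A minimal Python sketch of this reverse-delete idea, assuming vertices numbered 0..n-1, a connected input graph, and edges given as distinct (weight, u, v) tuples; the connectivity test is a plain BFS:

from collections import defaultdict, deque

def connected(n, edges):
    # BFS from vertex 0 over the given edge list; O(n + m)
    adj = defaultdict(list)
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return len(seen) == n

def reverse_delete_mst(n, edges):
    kept = list(edges)
    for e in sorted(edges, key=lambda t: -t[0]):  # heaviest first
        if len(kept) == n - 1:                    # already a spanning tree
            break
        trial = [f for f in kept if f != e]
        if connected(n, trial):                   # e lies on a cycle, safe to drop
            kept = trial
    return kept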

Let's say |E| = |V| + c, with c a small constant. You can run DFS on the graph and, every time you detect a cycle, remove the largest edge on that cycle. You must do that c + 1 times, for O((c + 1) * |E|) = O(|E|): linear time, in theory.
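A sketch of that DFS idea in Python (recursive DFS for brevity; edges are assumed to be distinct (weight, u, v) tuples, vertices 0..n-1, and the graph connected):

from collections import defaultdict

def find_cycle(edges):
    # Return the edges of one cycle found by DFS, or None if acyclic.
    adj = defaultdict(list)
    for e in edges:
        _, u, v = e
        adj[u].append((v, e))
        adj[v].append((u, e))
    entered = {}                         # vertex -> edge used to enter it
    def dfs(u, in_edge):
        entered[u] = in_edge
        for v, e in adj[u]:
            if e is in_edge:
                continue                 # don't walk back along the entering edge
            if v in entered:             # back edge: close the cycle
                cycle, x = [e], u
                while x != v:
                    cycle.append(entered[x])
                    _, a, b = entered[x]
                    x = a if x == b else b
                return cycle
            found = dfs(v, e)
            if found:
                return found
        return None
    return dfs(next(iter(adj)), None) if adj else None

def mst_with_few_extra_edges(n, edges):
    edges = list(edges)
    while len(edges) > n - 1:            # runs c + 1 times when |E| = n + c
        cycle = find_cycle(edges)
        edges.remove(max(cycle))         # drop the heaviest edge on the cycle
    return edges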

Related

Partitioning a graph into two clusters

I have a complete weighted graph G(V, E). I want to partition V into two clusters such that the maximum intra-cluster edge length is minimized. What is the fastest algorithm that solves this problem? I believe this can be solved in O(n^2) time, where |V| = n. One approach would be making the graph bipartite, but I could not figure out the complete algorithm. Can anyone help me figure it out?
Two-color (depth-first search, O(n) time) a maximum spanning forest (Prim's algorithm, O(n^2) time). Proof of correctness left as an exercise.
For the record, for sparser graphs with only m edges, I'm pretty sure there's an O(m)-time algorithm.
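A Python sketch of that recipe, assuming the complete graph is handed over as a symmetric n x n weight matrix w: Prim's algorithm (maximizing instead of minimizing) builds a maximum spanning tree in O(n^2), and a traversal of the tree two-colors it.

def two_clusters(w):
    n = len(w)
    in_tree = [False] * n
    best = [float('-inf')] * n           # heaviest known edge into the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = float('inf')               # make vertex 0 the first pick
    for _ in range(n):
        u = max((best[v], v) for v in range(n) if not in_tree[v])[1]
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and w[u][v] > best[v]:
                best[v] = w[u][v]
                parent[v] = u
    color = [0] * n                      # alternate colors along tree edges
    stack = [0]
    while stack:
        u = stack.pop()
        for v in children[u]:
            color[v] = 1 - color[u]
            stack.append(v)
    return ([v for v in range(n) if color[v] == 0],
            [v for v in range(n) if color[v] == 1])

Design note: the two-coloring forces every maximum-spanning-tree edge across the cut; the proof that this minimizes the maximum intra-cluster edge is the exercise the answer mentions.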

Can we use BFS from each vertex in order to find the graph's diameter? If so, is this the best solution?

So I found an old topic:
Algorithm for diameter of graph?
in which they said the best solution for a non-sparse graph is O(V^3).
But can't we just run BFS from each vertex and then take the maximum?
That way the time complexity would be O(V*(V+E)) = O(V^2 + VE).
Am I wrong? Because if the number of edges is just a constant multiple of V, this would work better, right?
So I guess my question is:
What is the best known time complexity for computing a graph's diameter, as of 2018?
Is my method wrong? What am I missing here?
The matrix in question is non-sparse, so in the worst case it has E ≈ V^2/2 edges. The solution you mention thus becomes O(V^2 + V * V^2) = O(V^3) for non-sparse matrices.
If the matrix were sparse, then it would indeed be faster than O(V^3).
Also, since the graph is non-sparse, it is usually represented using an adjacency matrix, for faster lookup times. Breadth-first search would then take O(V^2), and running it from every vertex, as you suggest, again leads to O(V^3) computational time complexity.
Finding the diameter can be done by first finding all-pairs shortest paths and then taking the maximum length found. The Floyd-Warshall algorithm does this in O(V^3) time; Johnson's algorithm can be implemented to achieve O(V^2 log V + VE) time.
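For the unweighted case, the BFS-from-every-vertex idea from the question can be sketched in Python as follows (adjacency-list input, connected graph assumed; O(V * (V + E)) overall):

from collections import deque

def diameter(adj):
    # adj: dict mapping each vertex to a list of neighbours
    def eccentricity(s):
        dist = {s: 0}
        far = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    far = max(far, dist[v])
                    q.append(v)
        return far
    return max(eccentricity(s) for s in adj)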

Design an algorithm which finds a minimum spanning tree of this graph in linear time

I am working on a problem in which I am given an undirected graph G on n vertices and with m edges, such that each edge e has a weight w(e) ∈ {1, 2, 3}. The task is to design an algorithm which finds a minimum spanning tree of G in linear time (O(n + m)).
These are my thoughts so far:
In the Algorithmic Graph Theory course which I am currently studying, we have covered Kruskal's and Prim's MST Algorithms. Perhaps I can modify these in some way, in order to gain linear time.
Sorting edges generally takes log-linear, O(m log m), time; however, since all edge weights are either 1, 2 or 3, bucket sort can be used to sort the edges in time linear in the number of edges, O(m).
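That bucket sort is just a three-way split; a sketch in Python, assuming each edge is a (weight, u, v) tuple:

def bucket_sort_edges(edges):
    buckets = {1: [], 2: [], 3: []}      # one bucket per possible weight
    for e in edges:
        buckets[e[0]].append(e)
    return buckets[1] + buckets[2] + buckets[3]   # O(m) overall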
I am using the following version of Kruskal's algorithm:
Kruskal(G)
  for each vertex v ∈ V do MAKE-SET(v)
  sort all edges in non-decreasing order of weight
  for each edge (u, v) ∈ E (in the non-decreasing order) do
    if FIND(u) ≠ FIND(v) then
      colour (u, v) blue
      UNION(u, v)
  od
  return the tree formed by blue edges
Also, MAKE-SET(x), UNION(x, y) and FIND(x) are defined as follows:
MAKE-SET(x)
  Create a new tree rooted at x
  PARENT(x) := x
UNION(x, y)
  PARENT(FIND(x)) := FIND(y)
FIND(x)
  y := x
  while y ≠ PARENT(y) do
    y := PARENT(y)
  return y
The issue I have at the moment is that, although I can implement the first two lines of Kruskal's in linear time, I have not managed to do the same for the next four lines of the algorithm (from 'for each edge (u, v) ...' through 'UNION(u, v)').
I would appreciate hints as to how to implement the rest of the algorithm in linear time, or how to find a modification of Kruskal's (or some other minimum spanning tree algorithm) in linear time.
Thank you.
If you use the disjoint-sets data structure with both path compression and union by rank, you get a data structure in which each operation's amortized complexity grows extremely slowly - it is something like the inverse of the Ackermann function, and it is not that large even for sizes such as the estimated number of atoms in the universe. Effectively, then, each operation is considered constant time, and so the rest of the algorithm is considered linear time as well.
From the same Wikipedia article:
Since α(n) is the inverse of this function, α(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
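A compact Python sketch of such a disjoint-set forest, using union by rank plus path halving (a standard variant of path compression):

class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                 # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx              # attach the shallower tree
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

Combined with the bucket sort above, the whole of Kruskal's then runs in O((n + m) * α(n)), which is linear for all practical purposes.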

How can a heap be used to optimize Prim's minimum spanning tree algorithm?

I have to solve a question that is something like this:
I am given a number N which represents the number of points I have. Each point has two coordinates: X and Y.
I can find the distance between two points with the following formula:
abs(x2-x1)+abs(y2-y1),
(x1,y1) being the coordinates of the first point, (x2,y2) the coordinates of the second point and abs() being the absolute value.
I have to find the minimum spanning tree, meaning I must have all my points connected with the sum of the edge weights being minimal. Prim's algorithm is good, but it is too slow. I read that I can make it faster using a heap, but I didn't find any article that explains how to do that.
Can anyone explain to me how Prim's algorithm works with a heap (some sample code would be good, but is not strictly necessary)?
It is possible to solve this problem efficiently (in O(n log n) time), but it is not that easy. Just using Prim's algorithm with a heap does not help (it actually makes it even slower), because its time complexity is O(E log V), which is O(n^2 * log n) in this case: the graph on the points is complete, so E is about n^2.
However, you can use the Delaunay triangulation to reduce the number of edges in the graph. The Delaunay triangulation graph is planar, so it has a linear number of edges. That's why running Prim's algorithm with a heap on it gives O(n log n) time complexity (there are O(n) edges and n vertices). You can read more about it here (covering this algorithm in detail and proving its correctness would make my answer way too long): http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree. Note that even though the article is about the Euclidean MST, the approach for your case is essentially the same (it is possible to build the Delaunay triangulation for Manhattan distance efficiently, too).
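To make the pipeline concrete, here is a sketch of the Euclidean variant built on SciPy's scipy.spatial.Delaunay (points assumed to be 2D and in general position; a minimal union-find is inlined so Kruskal can run on the O(n) triangulation edges):

import numpy as np
from scipy.spatial import Delaunay

def euclidean_mst(points):
    pts = np.asarray(points, dtype=float)
    tri = Delaunay(pts)
    edges = set()
    for a, b, c in tri.simplices:            # every triangle contributes 3 edges
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((min(u, v), max(u, v)))
    weighted = sorted((np.linalg.norm(pts[u] - pts[v]), u, v)
                      for u, v in edges)
    parent = list(range(len(pts)))           # minimal union-find
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in weighted:                 # Kruskal over O(n) edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
    return mst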
A description of the Prim's algorithm with a heap itself is already present in two other answers to your question.
From the Wikipedia article on Prim's algorithm:
[S]toring vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
While it was pointed out that Prim's with a heap is O(E log V), which is O(n^2 log n) in the worst case, I can explain what makes the heap faster in cases other than that worst case, since that has still not been answered.
What makes Prim's so costly at O(V^2) is the updating required in each iteration of the algorithm. In general, Prim's works by keeping a table of your vertices with the lowest edge weight connecting each one to the partially built tree, and picking the cheapest vertex to add to your growing tree until all are added. Every time you add a vertex, you must then go back to your table and update any vertices that can now be reached with less weight. You then must walk all the way through your table to decide which vertex is cheapest to add next. This setup - having to pick the next vertex (O(V)) V times - gives the O(V^2).
The heap is able to help this running time in all cases besides the worst case because it fixes this bottleneck. By working with a minimum heap, you can access the minimum weight under consideration in O(1). Additionally, it costs O(log V) to restore the heap property after inserting an element, which is done E times, for O(E log V) of heap maintenance in Prim's. This becomes the new bottleneck, which is what gives rise to the final running time of O(E log V).
So, depending on how much you know about your data, Prim's with a heap can certainly be more efficient than without!
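To make that concrete, here is a Python sketch of heap-based Prim's using lazy deletion: instead of a decrease-key, it simply pushes a fresh heap entry and skips stale ones on pop, which keeps the code short while staying within O(E log E) = O(E log V). The adjacency-list format and the -1 'no parent' sentinel are assumptions of this sketch.

import heapq

def prim_heap(adj, start=0):
    # adj: dict mapping each vertex to a list of (weight, neighbour) pairs;
    # vertices are non-negative ints, -1 stands for "no parent"
    visited = set()
    mst = []
    heap = [(0, start, -1)]              # (edge weight, vertex, parent)
    while heap:
        w, u, p = heapq.heappop(heap)
        if u in visited:
            continue                     # stale entry: u already in the tree
        visited.add(u)
        if p != -1:
            mst.append((p, u, w))
        for wt, v in adj[u]:
            if v not in visited:
                heapq.heappush(heap, (wt, v, u))
    return mst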

Why do Kruskal and Prim MST algorithms have different runtimes for sparse and dense graphs?

I am trying to understand why Prim and Kruskal have different time complexities when it comes to sparse and dense graphs. After using a couple of applets that demonstrate how each works, I am still left a little confused about how the density of the graph affects the algorithms. I hope someone could give me a nudge in the right direction.
Wikipedia gives the complexity of these algorithms in terms of E, the number of edges, and V, the number of vertices, which is a good practice because it lets you do exactly this sort of analysis.
Kruskal's algorithm is O(E log V). Prim's complexity depends on which data structure you use for it; using an adjacency matrix, it's O(V^2).
Now if you plug in V^2 for E, behold, you get the complexities that you cited in your comment for dense graphs, and if you plug in V for E, lo, you get the sparse ones.
Why do we plug in V^2 for a dense graph? Well, even the densest possible graph has no more than V^2 edges, so clearly E = O(V^2).
Why do we plug in V for a sparse graph? Well, you have to define what you mean by sparse, but suppose we call a graph sparse if each vertex has no more than five edges. I would say such graphs are pretty sparse: once you get up into the thousands of vertices, the adjacency matrix would be mostly empty space. That would mean that for sparse graphs, E ≤ 5V, so E = O(V).
Are these different complexities with respect to the number of vertices by any chance?
There is often a slightly handwavy argument that says that for a sparse graph the number of edges is E = O(V), where V is the number of vertices, while for a dense graph E = O(V^2). As both algorithms potentially have a complexity that depends on E, when you convert this to a complexity that depends on V, you get different results for dense and sparse graphs.
Edit:
Different data structures will also affect the complexity, of course; Wikipedia has a breakdown of this.
Introduction to Algorithms by Cormen et al. does indeed give an analysis, in both cases using a sparse representation of the graph. With Kruskal's algorithm (link vertices in disjoint components until everything joins up), the first step is to sort the edges of the graph, which takes O(E lg E) time, and they simply establish that nothing else takes longer than this. With Prim's algorithm (extend the current tree by adding the closest vertex not already on it), they use a Fibonacci heap to store the queue of pending vertices and get O(E + V lg V), because with a Fibonacci heap decreasing the distance to a vertex in the queue is only O(1), and you do this at most once per edge.
