Recursive minimal spanning tree algorithms - algorithm

Is this a right algorithm for finding minimal spanning tree.
Divide Graph into 2 equally connected parts. Find its minimal spanning trees. Connect them using the smallest edge that connects them. I am trying to get counterexample of this algorithm, but can't.

Consider a four-node graph, connected in a square, with the left edge having cost 10 and all other edges having cost 1. If you divide the graph into left and right for your recursive step, you will end up with a spanning tree of cost 12, instead of cost 3.
MST is not well-adapted to "divide-and-conquer" algorithms. The closest thing is probably the Reverse-Delete algorithm; whenever you fail to remove an edge (because it would disconnect the graph), you can think of the remaining steps as executing recursively on the two sides of that edge.

You have described a divide and conquer algorithm which will not work when determining an MST. Sneftel provided a good counter-example and recursively dividing the graph into two connected parts would be extremely costly.
Instead, a good approach to finding an MST would be to use a greedy algorithm such as Prim's algorithm. We know a greedy algorithm will work because this problem exhibits optimal substructure. For this algorithm, you will want to represent your graph as an adjacency list. First, start at an arbitrary node and add it to a visited list. Add all edges from this node into a min-heap. Include the cheapest edge in your MST and add the connecting node to your visited list. From that node add all edges to your min-heap and then select the cheapest edge to a node that has yet to be visited. Continue doing so until all nodes are visited. Once that is done you have your MST.
You can use other data structures to store the graph and the visited edges, but the ones I have outlined above will maximize the runtime. If we analyze the run-time with these data structures we can see that the runtime is O(E log V) which is the time to update the cost of the elements and maintaining your heap after an edge has been removed. More specifically O(log V) to fix the heap and that is done E number of times.
I also found this quick 2-minute video that outlines Prim's algorithm with an example: Prim's Algorithm in 2 Minutes
I hope this information is helpful!

Related

Minimum Spanning Tree (MST) algorithm variation

I was asked the following question in an interview and I am unable to find an efficient solution.
Here is the problem:
We want to build a network and we are given c nodes/cities and D possible edges/connections made by roads. Edges are bidirectional and we know the cost of the edge. The costs of the edges can be represented as d[i,j] which denotes the cost of the edge i-j. Note not all c nodes can be directly connected to each other (D is the set of possible edges).
Now we are given a list of k potential edges/connections that have no cost. However, you can only choose one edge in the list of k edges to use (like getting free funding to build an airport between two cities).
So the question is... find the set of roads (and the one free airport) that minimizes total cost required to build the network connecting all cities in an efficient runtime.
So in short, solve a minimum spanning tree problem but where you can choose 1 edge in a list of k potential edges to be free of cost. I'm unsure how to solve... I've tried finding all the spanning trees in order of increasing cost and choosing the lowest cost, but I'm still challenged on how to consider the one free edge from the list of k potential free edges. I've also tried finding the MST of the D potential connections and then adjusting it according the the options in k to get a result.
Thank you for any help!
One idea would be to treat your favorite MST algorithm as a black box and to think about changing the edges in the graph before asking for the MST. For example, you could try something like this:
for each edge in the list of possible free edges:
make the graph G' formed by setting that edge cost to 0.
compute the MST of G'
return the cheapest MST out of all the ones generated this way
The runtime of this approach is O(kT(m, n)), where k is the number of edges to test and T(m, n) is the cost of computing an MST using your favorite black-box algorithm.
We can do better than this. There's a well-known problem of the following form:
Suppose you have an MST T for a graph G. You then reduce the cost of some edge {u, v}. Find an MST T' in the new graph G'.
There are many algorithms for solving this problem efficiently. Here's one:
Run a DFS in T starting at u until you find v.
If the heaviest edge on the path found this way costs more than {u, v}:
Delete that edge.
Add {u, v} to the spanning tree.
Return the resulting tree T'.
(Proving that this works is tedious but doable.) This would give an algorithm of cost O(T(m, n) + kn), since you would be building an initial MST (time T(m, n)), then doing k runs of DFS in a tree with n nodes.
However, this can potentially be improved even further if you're okay using some more advanced algorithms. The paper "On Cartesian Trees and Range Minimum Queries" by Demaine et al shows that in O(n) time, it is possible to preprocess a minimum spanning tree so that, in time O(1), queries of the form "what is the lowest-cost edge on the path in this tree between nodes u and v?" in time O(1). You could therefore build this structure instead of doing a DFS to find the bottleneck edge between u and v, reducing the overall runtime to O(T(m, n) + n + k). Given that T(m, n) is very low (the best known bound is O(m α(m)), where α(m) is the Ackermann inverse function and is less than five for all inputs in the feasible univers), this is asymptotically a very quick algorithm!
First generate a MST. Now, if you add a free edge, you will create exactly one cycle. You could then remove the heaviest edge in the cycle to get a cheaper tree.
To find the best tree you can make by adding one free edge, you need to find the heaviest edge in the MST that you could replace with a free one.
You can do that by testing one free edge at a time:
Pick a free edge
Find the lowest common ancestor in the tree (from an arbitrary root) of its adjacent vertices
Remember the heaviest edge on the path between the free edge vertices
When you're done, you know which free edge to use -- it's the one associated with the heaviest tree edge, and you know which edge it replaces.
In order to make steps (2) and (3) faster, you can remember the depth of each node and connect it to multiple ancestors like a skip list. You can then do those steps in O(log |V|) time, leading to a total complexity of O( (|E|+k) log |V| ), which is pretty good.
EDIT: Even Easier Way
After thinking about this a bit, it seems there's a super easy way to figure out which free edge to use and which MST edge to replace.
Disregarding the k possible free edges, you build the MST from the other edges using Kruskal's algorithm, but you modify the usual disjoint set data structure as follows:
Use union by size or rank, but not path compression. Every union operation will then establish exactly one link, and take O(log N) time, and all path lengths will be at most O(log N) long.
For each link, remember the index of the edge that caused it to be created.
For each possible free edge, then, you can walk up the links in the disjoint set structure to find out exactly at which point its endpoints were connected into the same connected component. You get the index of the last required edge, i.e., the one it would replace, and the free edge with the greatest replacement target index is the one you should use.

Modified version of Prim's algorithm with O(kn + m) time complexity

Could you please help me with this problem?
Given an undirected graph G, connected, with weighted edges, such that the weights are integers in [1,k] . Write a modified version of Prim's algorithm that returns the minimum spanning tree in O(kn+m) time.
Note:
n represents the number of vertices
m represents the number of edges
You should be using the limited range of the edge length. This will help you keep a priority queue of the edges more efficiently. Keep in mind the most important step in the algorithm is to find the minimum-weight edge connecting the tree built thus far with a node not yet added to the tree. Try to use counting sort as an inspiration.

What is difference between BFS and Dijkstra's algorithms when looking for shortest path?

I was reading about Graph algorithms and I came across these two algorithms:
Dijkstra's algorithm
Breadth-first search
What is the difference between Dijkstra's algorithm and BFS while looking for the shortest-path between nodes?
I searched a lot about this but didn't get any satisfactory answer!
The rules for BFS for finding shortest-path in a graph are:
We discover all the connected vertices,
Add them in the queue and also
Store the distance (weight/length) from source u to that vertex v.
Update with path from source u to that vertex v with shortest distance and we have it!
This is exactly the same thing we do in Dijkstra's algorithm!
So why are the time complexities of these algorithms so different?
If anyone can explain it with the help of a pseudo code then I will be
very grateful!
I know I am missing something! Please help!
Breadth-first search is just Dijkstra's algorithm with all edge weights equal to 1.
Dijkstra's algorithm is conceptually breadth-first search that respects edge costs.
The process for exploring the graph is structurally the same in both cases.
When using BFS for finding the shortest path in a graph, we discover all the connected vertices, add them to the queue and also maintain the distance from source to that vertex. Now, if we find a path from source to that vertex with less distance then we update it!
We do not maintain a distance in BFS. It is for discovery of nodes.
So we put them in a general queue and pop them. Unlike in Dijikstra, where we put accumulative weight of node (after relaxation) in a priority queue and pop the min distance.
So BFS would work like Dijikstra in equal weight graph. Complexity varies because of the use of simple queue and priority queue.
Dijkstra and BFS, both are the same algorithm. As said by others members, Dijkstra using priority_queue whereas BFS using a queue. The difference is because of the way the shortest path is calculated in both algorithms.
In BFS Algorithm, for finding the shortest path we traverse in all directions and update the distance array respectively. Basically, the pseudo-code will be as follow:
distance[src] = 0;
q.push(src);
while(queue not empty) {
pop the node at front (say u)
for all its adjacent (say v)
if dist[u] + weight < dist[v]
update distance of v
push v into queue
}
The above code will also give the shortest path in a weighted graph. But the time complexity is not equal to normal BFS i.e. O(E+V). Time complexity is more than O(E+V) because many of the edges are repeated twice.
Graph-Diagram
Consider, the above graph. Dry run it for the above pseudo-code you will find that node 2 and node 3 are pushed two times into the queue and further the distance for all future nodes is updated twice.
BFS-Traversal-Working
So, assume if there is lot more nodes after 3 then the distance calculated by the first insertion of 2 will be used for all future nodes then those distance will be again updated using the second push of node 2. Same scenario with 3.
So, you can see that nodes are repeated. Hence, all nodes and edges are not traversed only once.
Dijkstra Algorithm does a smart work here...rather than traversing in all the directions it only traverses in the direction with the shortest distance, so that repetition of updation of distance is prevented.
So, to trace the shortest distance we have to use priority_queue in place of the normal queue.
Dijkstra-Algo-Working
If you try to dry run the above graph again using the Dijkstra algorithm you will find that nodes are push twice but only that node is considered which has a shorter distance.
So, all nodes are traversed only once but time complexity is more than normal BFS because of the use of priority_queue.
With SPFA algorithm, you can get shortest path with normal queue in weighted edge graph.
It is variant of bellman-ford algorithm, and it can also handle negative weights.
But on the down side, it has worse time complexity over Dijkstra's
Since you asked for psuedocode this website has visualizations with psuedocode https://visualgo.net/en/sssp

Time complexity of creating a minimal spanning tree if the number of edges is known

Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct, would it possible to create a minimal spanning tree in linear time?
To do this we must look at each edge; and during this loop there can contain no searches otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow we must only look at each edge once, and decide rather to include it or not based on some "static" previous values that does not involve a growing data structure.
So.. let's say we keep the endpoints of the node in question, then look at the next node, if the next node has the same vertices as prev, then compare the weight of prev and current node and keep the lower one. If the current node's endpoints are not equal to prev, then it is in a different component .. now I am stuck because we cannot create a hash or array to keep track of the component nodes that are already added while look through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?
To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edges are unique.
This can be done in time Ω(m) as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!
There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma" -- that, if you independently randomly select a subset of the edges with probability p and find the minimum spanning tree of this subgraph, then there are only |V|/p edges that are better than the worst edge in the tree path connecting its ends.
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.
The fact that a number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have different N. If you place a upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about current case cannot influence further processing, as we cannot predict how the graph's structure will look like in the next step.
If the number of edges is close to n, like in this case n-6 (after edit), we know that we only need to remove 7 edges as every spanning tree has only n-1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any Minimum Spanning tree(assuming all edges are distinct) and thus, should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.

How to get the new MST from the old one if one edge in the graph changes its weight?

We know the original graph and the original MST. Now we change an edge's weight in the graph. Beside the Prim and the Kruskal, is there any way we can generate the new MST from the old one?
Here's how I would do it:
If the changed edge is in the original MST:
If its weight was reduced, then of course it must be in the new MST.
If its weight was increased, then delete it from the original MST and look for the lowest-weight edge that connects the two subtrees that remain (this could select the original edge again). This can be done efficiently in a Kruskal-like way by building up a disjoint-set data structure to hold the two subtrees, and sorting the remaining edges by weight: select the first one with endpoints in different sets. If you know a way of quickly deleting an edge from a disjoint-set data structure, and you built the original MST using Kruskal's algorithm, then you can avoid recomputing them here.
Otherwise:
If its weight was increased, then of course it will remain outside the MST.
If its weight was reduced, add it to the original MST. This will create a cycle. Scan the cycle, looking for the heaviest edge (this could select the original edge again). Delete this edge. If you will be performing many edge mutations, cycle-finding may be sped up by calculating all-pairs shortest paths using the Floyd-Warshall algorithm. You can then find all edges in the cycle by initially leaving the new edge out, and looking for the shortest path in the MST between its two endpoints (there will be exactly one such path).
Besides linear-time algorithm, proposed by j_random_hacker, you can find a sub-linear algorithm in this book: "Handbook of Data Structures and Applications" (Chapter 36) or in these papers: Dynamic graphs, Maintaining Minimum Spanning Trees in Dynamic Graphs.
You can change the problem a little while the result is same.
Get the structure of the original MST, run DFS from each vertice and
you can get the maximum weighted edge in the tree-path between each
vertice-pair. The complexity of this step is O(N ^ 2)
Instead of changing one edge's weight to w, we can assume we are
adding a new edge (u,v) into the original MST whose weight is w. The
adding edge would make a loop on the tree and we should cut one edge
on the loop to generate a new MST. It's obviously that we can only
compare the adding edge with the maximum edge in the path(a,b), The
complexity of this step is O(1)

Resources