Algorithm to find minimum spanning tree of chosen vertices

One can use Prim's algorithm or Kruskal's algorithm to find the minimum spanning tree/graph of a collection of vertices/nodes and edges/links. What I want though, is an algorithm that finds the minimum spanning graph of this collection, but the resulting graph needs to include only arbitrarily chosen nodes, instead of all nodes. It's okay if the resulting graph includes more nodes than just those needed.
Does such an algorithm exist? Perhaps one could just use Prim's (or Kruskal's) algorithm after modifying the graph to include only the needed nodes? But, I'm not sure how to modify the graph to do so while maintaining its connectedness.
For example, say we have a diamond shaped starting graph (with costs of links in brackets):
      A
  (2)/ \(1)
    B   C
  (2)\ /(5)
      D
Now, we arbitrarily decide that only nodes A and D are needed. If we started at A, we'd still want it to take the left path, because ((2 + 2) < (1 + 5)).
Say we modify the graph slightly:
      A
  (2)/ \(1)
    B   C --(2)-- E
  (2)\ /(5)
      D
If we decide that only nodes A, D, and E are needed, we realize that the path with the minimum cost is not necessarily the one with the fewest links. Taking A--B--D and A--C--E costs 7, but A--C--D and C--E costs 8.

What you want to find is a discrete Steiner tree. When not all vertices in the graph are mandatory but the tree is allowed to split at the optional vertices, the problem is NP-hard.
Wikipedia says of this problem: it is believed that arbitrarily good approximation ratios cannot, in general, be achieved in polynomial time. There is, however, a polynomial-time algorithm that finds a factor-1.39 approximation of a minimum Steiner tree.
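For the diamond example above, networkx ships an approximation of exactly this in `networkx.algorithms.approximation.steiner_tree`; a minimal sketch (node and edge names taken from the question):

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# diamond graph from the question
G = nx.Graph()
G.add_weighted_edges_from([('A', 'B', 2), ('A', 'C', 1),
                           ('B', 'D', 2), ('C', 'D', 5)])

# terminals: only A and D are required; B and C may appear if they help
T = steiner_tree(G, ['A', 'D'], weight='weight')
print(sorted(T.nodes()), T.size(weight='weight'))
```

With only two terminals the approximation reduces to a shortest path, so here it picks the cheaper left route through B (total cost 4) rather than A--C--D (cost 6).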


Minimum Spanning Tree (MST) algorithm variation

I was asked the following question in an interview and I am unable to find an efficient solution.
Here is the problem:
We want to build a network and we are given c nodes/cities and D possible edges/connections made by roads. Edges are bidirectional and we know the cost of the edge. The costs of the edges can be represented as d[i,j] which denotes the cost of the edge i-j. Note not all c nodes can be directly connected to each other (D is the set of possible edges).
Now we are given a list of k potential edges/connections that have no cost. However, you can only choose one edge in the list of k edges to use (like getting free funding to build an airport between two cities).
So the question is... find the set of roads (and the one free airport) that minimizes total cost required to build the network connecting all cities in an efficient runtime.
So in short: solve a minimum spanning tree problem, but where you can choose one edge from a list of k candidate edges to be free of cost. I'm unsure how to solve it. I've tried finding all the spanning trees in order of increasing cost and choosing the cheapest, but I'm still challenged on how to account for the one free edge from the list of k potential free edges. I've also tried finding the MST of the D potential connections and then adjusting it according to the options in k.
Thank you for any help!
One idea would be to treat your favorite MST algorithm as a black box and to think about changing the edges in the graph before asking for the MST. For example, you could try something like this:
for each edge in the list of possible free edges:
    make the graph G' formed by setting that edge's cost to 0
    compute the MST of G'
return the cheapest MST out of all the ones generated this way
The runtime of this approach is O(kT(m, n)), where k is the number of edges to test and T(m, n) is the cost of computing an MST using your favorite black-box algorithm.
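A sketch of this black-box loop in networkx (the road network and the list of free candidates below are made up for illustration):

```python
import networkx as nx

# hypothetical instance: weighted road network and k candidate free edges
edges = [(0, 1, 4), (1, 2, 3), (2, 3, 5), (0, 3, 6), (0, 2, 2)]
free_candidates = [(0, 1), (2, 3)]        # k = 2

G = nx.Graph()
G.add_weighted_edges_from(edges)

best_cost, best_free = float('inf'), None
for u, v in free_candidates:
    Gp = G.copy()
    Gp.add_edge(u, v, weight=0)           # make this candidate free
    cost = nx.minimum_spanning_tree(Gp).size(weight='weight')
    if cost < best_cost:
        best_cost, best_free = cost, (u, v)

print(best_free, best_cost)
```

Each iteration makes one black-box MST call, matching the O(kT(m, n)) bound.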
We can do better than this. There's a well-known problem of the following form:
Suppose you have an MST T for a graph G. You then reduce the cost of some edge {u, v}. Find an MST T' in the new graph G'.
There are many algorithms for solving this problem efficiently. Here's one:
Run a DFS in T starting at u until you find v.
If the heaviest edge on the path found this way costs more than {u, v}:
    Delete that edge.
    Add {u, v} to the spanning tree.
Return the resulting tree T'.
(Proving that this works is tedious but doable.) This would give an algorithm of cost O(T(m, n) + kn), since you would be building an initial MST (time T(m, n)), then doing k runs of DFS in a tree with n nodes.
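A sketch of this swap on a networkx tree (assumes the MST is an nx.Graph with 'weight' attributes; since the path between two nodes in a tree is unique, nx.shortest_path stands in for the DFS):

```python
import networkx as nx

def apply_free_edge(T, u, v):
    """Swap the free edge (u, v) into MST T: remove the heaviest edge on
    the unique tree path u..v, then add (u, v) with weight 0."""
    path = nx.shortest_path(T, u, v)               # unique path in a tree
    heaviest = max(zip(path, path[1:]),
                   key=lambda e: T[e[0]][e[1]]['weight'])
    if T[heaviest[0]][heaviest[1]]['weight'] > 0:  # free edge is cheaper
        T = T.copy()
        T.remove_edge(*heaviest)
        T.add_edge(u, v, weight=0)
    return T

# tiny demo: MST with edges 0-2 (2), 1-2 (3), 2-3 (5); free edge (0, 3)
T = nx.Graph()
T.add_weighted_edges_from([(0, 2, 2), (1, 2, 3), (2, 3, 5)])
T2 = apply_free_edge(T, 0, 3)
print(T2.size(weight='weight'))
```

Here the tree path 0--2--3 has bottleneck edge (2, 3) of weight 5, which the free edge replaces, dropping the total weight from 10 to 5.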
However, this can potentially be improved even further if you're okay using some more advanced algorithms. The paper "On Cartesian Trees and Range Minimum Queries" by Demaine et al. shows that, after O(n) preprocessing of a tree, queries of the form "what is the heaviest edge on the path in this tree between nodes u and v?" can be answered in time O(1). You could therefore build this structure instead of doing a DFS to find the bottleneck edge between u and v, reducing the overall runtime to O(T(m, n) + n + k). Given that T(m, n) is very low (the best known bound is O(m α(m)), where α is the inverse Ackermann function, which is less than five for all inputs of feasible size), this is asymptotically a very fast algorithm!
First generate a MST. Now, if you add a free edge, you will create exactly one cycle. You could then remove the heaviest edge in the cycle to get a cheaper tree.
To find the best tree you can make by adding one free edge, you need to find the heaviest edge in the MST that you could replace with a free one.
You can do that by testing one free edge at a time:
1. Pick a free edge.
2. Find the lowest common ancestor in the tree (from an arbitrary root) of its adjacent vertices.
3. Remember the heaviest edge on the path between the free edge's vertices.
When you're done, you know which free edge to use -- it's the one associated with the heaviest tree edge, and you know which edge it replaces.
In order to make steps (2) and (3) faster, you can remember the depth of each node and connect it to multiple ancestors like a skip list. You can then do those steps in O(log |V|) time, leading to a total complexity of O( (|E|+k) log |V| ), which is pretty good.
EDIT: Even Easier Way
After thinking about this a bit, it seems there's a super easy way to figure out which free edge to use and which MST edge to replace.
Disregarding the k possible free edges, you build the MST from the other edges using Kruskal's algorithm, but you modify the usual disjoint set data structure as follows:
Use union by size or rank, but not path compression. Every union operation will then establish exactly one link, and take O(log N) time, and all path lengths will be at most O(log N) long.
For each link, remember the index of the edge that caused it to be created.
For each possible free edge, then, you can walk up the links in the disjoint set structure to find out exactly at which point its endpoints were connected into the same connected component. You get the index of the last required edge, i.e., the one it would replace, and the free edge with the greatest replacement target index is the one you should use.
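A sketch of that modified structure (union by size, no path compression, one recorded edge index per link). The walk works because link indices strictly increase along every upward path, so a merge-walk that always advances the endpoint with the earlier next link meets exactly at the connecting link:

```python
class LinkedUnionFind:
    """Union by size without path compression; each link stores the
    index of the Kruskal edge that created it."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.link = [None] * n   # edge index that linked node to its parent

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y, edge_index):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.link[ry] = edge_index
        self.size[rx] += self.size[ry]
        return True

    def connect_time(self, u, v):
        """Index of the edge whose union first connected u and v
        (assumes u and v ended up in the same component)."""
        last = None
        while u != v:
            tu, tv = self.link[u], self.link[v]
            # advance the endpoint whose next link happened earlier;
            # a root (link None) can only wait for the other side
            if tv is None or (tu is not None and tu <= tv):
                last, u = tu, self.parent[u]
            else:
                last, v = tv, self.parent[v]
        return last

# demo: Kruskal's unions on edges already sorted by weight, indices 0..3
uf = LinkedUnionFind(4)
for i, (u, v) in enumerate([(0, 2), (1, 2), (0, 1), (2, 3)]):
    uf.union(u, v, i)
print(uf.connect_time(0, 3))   # free edge (0, 3) would replace edge index 3
```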

Algorithm to find minimum spanning tree for graph with edge weights in {1,2,3} [duplicate]

This question already has answers here: A fast algorithm for minimum spanning trees when edge lengths are constrained? (2 answers). Closed 7 years ago.
I have recently been doing some research into Prims/Kruskals algorithms for finding minimum spanning trees in graphs, and I am interested in the following problem:
Let G be an undirected graph on n vertices with m edges, such that each edge has a weight w(e) ∈ {1, 2, 3}. Is there an algorithm which finds a minimum spanning tree of G in time O(n+m)?
Obviously, you could just run Prims on the graph, and you would get a minimum spanning tree, but not in the required time.
I was thinking we could start by adding every edge of weight 1 that creates no cycle, since such an edge is always preferable to an edge of weight 2, say, and then proceed the same way in increasing order of weight.
Any help on possible ways to design an algorithm to do this would be appreciated and any implementations (java preferable but any language welcome) would be super helpful.
You're describing a minor variation of Kruskal's algorithm that makes the cost of sorting by weight O(m) for m edges because you only need to put the edges in 3 buckets.
Since the rest of Kruskal's is very nearly O(m) due to the amazing properties of the disjoint set data structure, you should be in good shape.
Building the tree itself ought to be O(m) rather than O(n + m) as was your goal because there's no need to process the vertices. E.g. if you have a few edges on a gazillion vertices, most with no connection, the latter don't need to increase algorithm cost if you're careful about data structure design.
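A sketch of this bucketed Kruskal's in plain Python (a disjoint set with path halving stands in for "the amazing properties"):

```python
def mst_small_weights(n, edges):
    """Kruskal's algorithm where the 'sort' is 3 buckets, since every
    weight is 1, 2, or 3. edges is a list of (u, v, w) triples."""
    buckets = {1: [], 2: [], 3: []}
    for u, v, w in edges:
        buckets[w].append((u, v))

    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, tree = 0, []
    for w in (1, 2, 3):                     # edges in 'sorted' order, O(m)
        for u, v in buckets[w]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v, w))
                total += w
    return total, tree

total, tree = mst_small_weights(4, [(0, 1, 3), (1, 2, 1), (2, 3, 2),
                                    (0, 3, 1), (0, 2, 2)])
print(total, tree)
```

On this small instance the MST takes both weight-1 edges and one weight-2 edge, for a total weight of 4.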

Finding most efficient path between two nodes in an interval graph

I have interval data:
A = (0,50)
B = (20,500)
C = (80,420)
....
And realized that there's an associated graph with this data, the interval graph
I'd like to find the most efficient path to go from A to G (assume I know all of the positive vertex weights, wa, wb, wc...). I need to start at A and go to G, so the minimum spanning tree must be bounded between these points. One of the constraints in our application is that the interval starting at A and ending at G must be covered in full (no gaps). I'm looking at networkx's minimum-spanning-tree method, and don't understand how to specify that A and G must be the start and end points.
Some other questions that come to mind are:
Since this problem is NP-hard, should I even bother looking for a min-spanning tree if the number of nodes is high? How many nodes would be too many?
Notice that interval F has a unique region. In other words, to completely cover the interval A-G, one HAS to go through F. Therefore, my minimum spanning tree probably should only connect A-F, not A-G. Is there a standard way, given a larger graph, to find all of the subgraphs whose intervals contain no unique patches? In other words, since all paths have to pass through F to get to G, A-F is the min spanning path of interest, not A-G. How does one reduce a graph in such a way without inspecting it manually?
Because I have to go from A-G, I would never go backwards or take a cyclic path. For example, I'd never go A-B-A. Do spanning trees incorporate this? And does this make my graph directed? Consider point C: from C one could go to D, E, or F, but never back to A (for our use case). What does this mean with regard to the directionality of the graph?
Sorry for novice Q's, new to most of this.
If you must go from A to G in an efficient way, you aren't looking for a minimum spanning tree algorithm. A simple shortest-path algorithm is enough. You just have to adapt your graph to put the weights on the edges instead of the nodes, which is just a matter of moving each node's weight onto its incoming edges.
Also, neither shortest path nor minimum spanning tree is NP-hard; there are known polynomial algorithms for both problems. In particular, shortest path can be solved by Dijkstra's algorithm (if your graph doesn't have negative edges, which seems to be true here) and minimum spanning tree can be solved by Prim's or Kruskal's algorithm.
And finally, any tree, by definition, has no cycles.
As mentioned in another answer, Dijkstra's algorithm is the solution. What wasn't mentioned is how to implement that solution in networkx. Here it is. Simple as this:
import networkx as nx
my_graph = nx.Graph()
my_graph.add_edges_from([('A','B'),('B','C'),('A','C'),('C','D'),('A','D'),('C','E'),('D','E'),('D','F'),('F','G')])
#graph is now defined.
shortestpath = nx.dijkstra_path(my_graph, 'A', 'G') #optional weight argument here.
shortestpath
> ['A', 'D', 'F', 'G']
In general, more documentation on how to do the shortest-path algorithms (and there are many variations thereof) is in the networkx shortest-paths documentation.
Note if you have weights on nodes and you want to minimize the sum of the nodes in the path, what you do is place weights on the edges so that the weight of (u, v) is (w[u]+w[v])/2.
Then run nx.dijkstra_path with the optional argument telling networkx where to find the weight of the edges. The weight of the entire path will equal the sum of the intermediate weights, plus half the values of the end nodes. You can then correct for the end node weights.
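A sketch of that node-to-edge weight transform on the graph above (the node weights here are made up for illustration):

```python
import networkx as nx

# hypothetical positive node weights (not from the question)
node_w = {'A': 1, 'B': 5, 'C': 2, 'D': 1, 'E': 4, 'F': 1, 'G': 1}

G = nx.Graph()
for u, v in [('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D'), ('A', 'D'),
             ('C', 'E'), ('D', 'E'), ('D', 'F'), ('F', 'G')]:
    # each edge gets half the weight of each endpoint
    G.add_edge(u, v, weight=(node_w[u] + node_w[v]) / 2)

path = nx.dijkstra_path(G, 'A', 'G', weight='weight')
cost = nx.dijkstra_path_length(G, 'A', 'G', weight='weight')
# cost == sum of node weights on the path, minus half of each endpoint:
# (1 + 1 + 1 + 1) - 0.5 - 0.5 = 3.0
print(path, cost)
```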

Why the tree resulting from Kruskal is different from Dijkstra?

Can anyone explain why the tree resulting from Kruskal is different from Dijkstra ?
I know for a fact that Kruskal's processes edges in nondescending order and Dijkstra's takes advantage of a priority queue, but I still cannot understand why the resulting trees are different.
The basic difference, I would say, is that given a set of nodes, Dijkstra's algorithm finds the shortest path between two nodes, which does not necessarily cover all the nodes in the graph.
In Kruskal's case, however, the algorithm tries to cover all the nodes while keeping the total edge cost minimum.
Consider this graph:
       E
   3 / |
    B  | 3
 5 /3\ |
  /    D
 A     | 2
  \    F
 1 \  / 1
    C
   2 \
      G
Dijkstra's algorithm will return the path A-C-F-D-E for source and destination nodes A and E, at a total cost of 7. However Kruskal's algorithm should cover all the nodes, therefore it will consider edges [AC], [CG], [CF], [FD], [DB] and [DE] with a total cost of 12.
In Dijkstra, the irrelevant nodes (those not on the path from the source to the destination) are ignored, e.g., G and B in this case. The resulting path, of course, is a tree but does not cover all the nodes. There might be millions of nodes connected to G (assuming they are not connected to other nodes) which would not be in the path of Dijkstra's result. On the other hand, Kruskal has to add those nodes to the resulting tree.
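The numbers in that example can be checked with networkx; a sketch comparing the MST's weight with the weight of Dijkstra's shortest-path tree from A:

```python
import networkx as nx

# the example graph above
G = nx.Graph()
G.add_weighted_edges_from([('A', 'B', 5), ('B', 'E', 3), ('B', 'D', 3),
                           ('E', 'D', 3), ('A', 'C', 1), ('C', 'F', 1),
                           ('F', 'D', 2), ('C', 'G', 2)])

mst_weight = nx.minimum_spanning_tree(G).size(weight='weight')

# shortest-path tree from A: each node hangs off its Dijkstra predecessor
pred, dist = nx.dijkstra_predecessor_and_distance(G, 'A')
spt_weight = sum(G[v][ps[0]]['weight'] for v, ps in pred.items() if ps)

print(dist['E'], mst_weight, spt_weight)
```

The A-to-E distance is 7 as claimed, the MST weighs 12, and the shortest-path tree weighs 14: two different trees optimized for two different things.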
The minimum spanning tree is not unique.
Kruskal's algorithm selects a minimum-length edge among all the edges that connect two different disjoint MST components found so far.
The Dijkstra/Prim/Jarník algorithm selects a minimum-length edge among all the edges that extend the single MST component found so far.
At each step, in the general case, the algorithms select a minimum edge from distinct sets of possibilities.
P.S. If the OP refers to the shortest-path tree produced by Dijkstra's shortest-path algorithm, the answer is that a shortest-path tree is not necessarily a minimum spanning tree; the algorithms compute different things.
They solve different problems. It might be possible that they produce the same trees in some cases (e.g. the graph is already a tree), but in general the trees are optimized for different things: one finds the shortest paths from a single source, another minimizes the total weight of the tree.
For example, suppose we built an MST with Kruskal's. Say all the edges in the MST have weight 1 and the tree looks more or less like a path, so that getting from A to Z takes 5 edges and the path from A to Z in the MST costs 5. It might well be that the original graph had an edge from A to Z with cost 3 (< 5) which the MST didn't include, yet Dijkstra's tree will probably contain that edge.

Graph Has Two / Three Different Minimal Spanning Trees ?

I'm trying to find an efficient method of detecting whether a given graph G has two different minimal spanning trees, and also a method to check whether it has three. The naive solution I've thought of is running Kruskal's algorithm once and finding the total weight of the minimal spanning tree; then removing one edge from the graph and running Kruskal's algorithm again, checking whether the weight of the new tree equals the weight of the original, and so on for each edge in the graph. The runtime is O(|V||E| log|V|), which is not good at all, and I think there's a better way to do it.
Any suggestion would be helpful,
thanks in advance
You can modify Kruskal's algorithm to do this.
First, sort the edges by weight. Then, for each weight in ascending order, filter out all irrelevant edges. The relevant edges form a graph on the connected components of the minimum-spanning-forest-so-far. You can count the number of spanning trees in this graph. Take the product over all weights and you've counted the total number of minimum spanning trees in the graph.
You recover the same running time as Kruskal's algorithm if you only care about the one-tree, two-trees, and three-or-more-trees cases. To enumerate spanning trees in general you wind up doing a determinant calculation (Kirchhoff's matrix-tree theorem), so you likely end up with an O(MM(n)) worst case, where MM(n) is the cost of multiplying two n×n matrices.
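A sketch of this per-weight-class count, using Kirchhoff's matrix-tree theorem (a Laplacian minor determinant) on the multigraph of contracted components; numpy floats are rounded back to integers, which is fine for small graphs:

```python
import numpy as np
import networkx as nx

def count_msts(n, edges):
    """Count MSTs of a connected graph: for each weight class in
    ascending order, count spanning trees of the multigraph induced
    on the current components, multiply, then contract."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 1
    for w in sorted({w for _, _, w in edges}):
        # edges of this weight, mapped onto component representatives
        level = [(find(u), find(v)) for u, v, ww in edges if ww == w]
        level = [(a, b) for a, b in level if a != b]   # drop self-loops
        H = nx.MultiGraph()
        H.add_edges_from(level)
        for comp in nx.connected_components(H):
            if len(comp) < 2:
                continue
            S = H.subgraph(comp)
            idx = {v: i for i, v in enumerate(sorted(S.nodes()))}
            L = np.zeros((len(idx), len(idx)))
            for a, b in S.edges():        # parallel edges each count
                L[idx[a], idx[a]] += 1
                L[idx[b], idx[b]] += 1
                L[idx[a], idx[b]] -= 1
                L[idx[b], idx[a]] -= 1
            total *= round(np.linalg.det(L[1:, 1:]))   # matrix-tree
        for a, b in level:                # contract this weight class
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return total

print(count_msts(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)]))  # triangle
```

A triangle with equal weights has 3 MSTs; a triangle with weights 1, 2, 2 has 2, since the two weight-2 edges are interchangeable.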
Suppose you have an MST T0 of a graph. If there is another MST T1, it must contain at least one edge E different from the edges of T0. Throw away E from T1, and the tree separates into two components. In T0, however, these two components must be connected, so there is another edge across the two components with exactly the same weight as E (otherwise we could substitute the heavier of the two for the other and get a spanning tree cheaper than an MST). Substituting this other edge for E gives you another MST.
What this implies is that if there is more than one MST, we can always change just a single edge of an MST and get another MST. So if, for each edge of the MST, you try substituting it with the edges of the same weight and check whether you get another spanning tree (which is then an MST), you get a faster algorithm.
Suppose G is a graph with n vertices and m edges; that the weight of any edge e is W(e); and that P is a minimal-weight spanning tree on G, weighing Cost(W,P).
Let δ = minimal positive difference between any two edge weights. (If all the edge weights are the same, then δ is indeterminate; but in this case, any ST is an MST so it doesn't matter.) Take ε such that δ > n·ε > 0.
Create a new weight function U() with U(e)=W(e)+ε when e is in P, else U(e)=W(e). Compute Q, an MST of G under U. If Cost(U,Q) < Cost(U,P) then Q≠P. But Cost(W,Q) = Cost(W,P) by construction of δ and ε. Hence P and Q are distinct MSTs of G under W. If Cost(U,Q) ≥ Cost(U,P) then Q=P and distinct MSTs of G under W do not exist.
The method above determines if there are at least two distinct MSTs, in time O(h(n,m)) if O(h(n,m)) bounds the time to find an MST of G.
I don't know if a similar method can treat whether three (or more) distinct MSTs exist; simple extensions of it fall to simple counterexamples.
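The perturbation test above can be sketched in networkx. Integer weights are assumed so that δ ≥ 1, and the all-weights-equal case (where δ is indeterminate but, as noted, every spanning tree is an MST) is handled separately:

```python
import networkx as nx

def has_two_msts(G):
    """True iff connected weighted graph G has >= 2 distinct MSTs,
    via the epsilon-perturbation test described above."""
    P = nx.minimum_spanning_tree(G)
    weights = sorted({d['weight'] for _, _, d in G.edges(data=True)})
    if len(weights) == 1:
        # delta indeterminate: every spanning tree is an MST, so two
        # exist exactly when G is not itself a tree
        return G.number_of_edges() > G.number_of_nodes() - 1
    delta = min(b - a for a, b in zip(weights, weights[1:]))
    eps = delta / (2 * G.number_of_nodes())        # delta > n*eps > 0
    U = G.copy()
    for u, v in P.edges():
        U[u][v]['weight'] += eps                   # penalize P's edges
    Q = nx.minimum_spanning_tree(U)
    cost_U = lambda T: sum(U[u][v]['weight'] for u, v in T.edges())
    return cost_U(Q) < cost_U(P) - 1e-12           # Q is strictly cheaper

# two MSTs: the two weight-2 edges are interchangeable
G1 = nx.Graph()
G1.add_weighted_edges_from([(0, 1, 1), (1, 2, 2), (0, 2, 2)])
# unique MST: the diamond graph from the first question
G2 = nx.Graph()
G2.add_weighted_edges_from([('A', 'B', 2), ('A', 'C', 1),
                            ('B', 'D', 2), ('C', 'D', 5)])
print(has_two_msts(G1), has_two_msts(G2))
```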
