Here's the problem.
A weighted undirected connected graph G is given. The weights are constant. The task is to come up with an algorithm that would find the total weight of a spanning tree for G that fulfills these two conditions (ordered by priority):
1. The spanning tree has to have the maximum number of edges with the same weight (the actual repeated weight value is irrelevant);
2. The total spanning tree weight should be minimized. That means, for example, that the spanning tree T1 with weight 120 that has at most 4 edges with the same weight (and the weight of each of those four is 15) should be preferred over the spanning tree T2 with weight 140 that has at most 4 edges with the same weight (and the weight of each of those four is 8).
I've been stuck on that for quite a while now. I've already implemented Boruvka's MST search algorithm for the graph, and now I'm not sure whether I should perform any operations after MST is found, or it's better to modify MST-search algorithm itself.
Any suggestions welcome!
This can be done naively in O(m^2), and without much effort in O(mn). It seems like it is possible to do it even faster, maybe something like O(m log^2(n)) or even O(m log(n)) with a little work.
The basic idea is this. For any weight k, let MST(k) be a spanning tree which contains the maximum possible number of edges of weight k, and has minimum overall weight otherwise. There may be multiple such trees, but they will all have the same total weight, so it doesn't matter which one you pick. MST(k) can be found by using any MST algorithm and treating all edges of weight k as having weight -Infinity.
The solution to this problem will be MST(k) for some k. So the naive solution is to generate all the MST(k) and pick the one with the maximum number of identical edge weights, breaking ties by minimum overall weight. Using Kruskal's algorithm, you can do this in O(m^2), since you only have to sort the edges once.
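Here is a minimal sketch of that naive approach, assuming vertices are numbered 0..n-1, edges are (u, v, w) tuples, and the graph is connected. The function name is made up for illustration. Putting the weight-k edges first in Kruskal's scan order has the same effect as pricing them at -Infinity.

```python
def best_repeated_weight_tree(n, edges):
    """Return (repeat_count, total_weight) of the preferred spanning tree.

    Assumes a connected graph on vertices 0..n-1, edges given as (u, v, w).
    """
    def find(parent, x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_weight = sorted(edges, key=lambda e: e[2])  # sort once, reuse for every k
    best = None
    for k in {w for _, _, w in edges}:
        # Weight-k edges are scanned first, i.e. treated as costing -Infinity.
        order = ([e for e in by_weight if e[2] == k]
                 + [e for e in by_weight if e[2] != k])
        parent = list(range(n))
        count_k = 0
        total = 0
        for u, v, w in order:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
                total += w
                count_k += (w == k)
        candidate = (-count_k, total)  # more repeats first, then lower weight
        if best is None or candidate < best:
            best = candidate
    return -best[0], best[1]
```

Each pass over the pre-sorted edges is close to O(m), and there are at most m distinct weights, which is where the O(m^2) figure comes from.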
This can be improved to O(mn) by first finding the MST using the original weights, and then for each k, modifying the tree to MST(k) by reducing the weight of each edge of weight k to -Infinity. Updating an MST for a reduced edge weight is an O(n) operation, since you just need to find the maximum weight edge in the corresponding fundamental cycle.
To do better than O(mn) using this approach, you would have to preprocess the original MST in such a way that these edge weight reductions can be performed more quickly. It seems that something like a heavy path decomposition should work here, but there are some details to work out.
I was asked the following question in an interview and I am unable to find an efficient solution.
Here is the problem:
We want to build a network and we are given c nodes/cities and D possible edges/connections made by roads. Edges are bidirectional and we know the cost of the edge. The costs of the edges can be represented as d[i,j] which denotes the cost of the edge i-j. Note not all c nodes can be directly connected to each other (D is the set of possible edges).
Now we are given a list of k potential edges/connections that have no cost. However, you can only choose one edge in the list of k edges to use (like getting free funding to build an airport between two cities).
So the question is... find the set of roads (and the one free airport) that minimizes total cost required to build the network connecting all cities in an efficient runtime.
So in short, solve a minimum spanning tree problem but where you can choose 1 edge in a list of k potential edges to be free of cost. I'm unsure how to solve it... I've tried finding all the spanning trees in order of increasing cost and choosing the lowest cost, but I'm still stuck on how to consider the one free edge from the list of k potential free edges. I've also tried finding the MST of the D potential connections and then adjusting it according to the options in k to get a result.
Thank you for any help!
One idea would be to treat your favorite MST algorithm as a black box and to think about changing the edges in the graph before asking for the MST. For example, you could try something like this:
for each edge in the list of possible free edges:
    make the graph G' formed by setting that edge cost to 0.
    compute the MST of G'
return the cheapest MST out of all the ones generated this way
The runtime of this approach is O(kT(m, n)), where k is the number of edges to test and T(m, n) is the cost of computing an MST using your favorite black-box algorithm.
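A rough sketch of the black-box version, using Kruskal's algorithm as the MST routine. The names are hypothetical, vertices are assumed to be 0..n-1, roads are (u, v, cost) tuples, and the network is assumed to be connected. Adding a parallel edge of cost 0 has the same effect as setting that edge's cost to 0.

```python
def kruskal_cost(n, edges):
    """Total cost of an MST of a connected graph given as (u, v, cost) tuples."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total


def cheapest_with_one_free_edge(n, roads, free_candidates):
    """Try every candidate free edge and return (best_cost, chosen_edge)."""
    best = None
    for u, v in free_candidates:
        # A parallel zero-cost edge behaves like setting that edge's cost to 0.
        cost = kruskal_cost(n, roads + [(u, v, 0)])
        if best is None or cost < best[0]:
            best = (cost, (u, v))
    return best
```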
We can do better than this. There's a well-known problem of the following form:
Suppose you have an MST T for a graph G. You then reduce the cost of some edge {u, v}. Find an MST T' in the new graph G'.
There are many algorithms for solving this problem efficiently. Here's one:
Run a DFS in T starting at u until you find v.
If the heaviest edge on the path found this way costs more than {u, v}:
    Delete that edge.
    Add {u, v} to the spanning tree.
Return the resulting tree T'.
(Proving that this works is tedious but doable.) This would give an algorithm of cost O(T(m, n) + kn), since you would be building an initial MST (time T(m, n)), then doing k runs of DFS in a tree with n nodes.
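As a sketch of the DFS step, assuming the MST is already available as an adjacency dict mapping each vertex to a list of (neighbor, weight) pairs (this representation and the function names are my own choice). Since every candidate edge becomes free, the best candidate is simply the one whose tree path contains the heaviest edge.

```python
def heaviest_edge_on_path(tree_adj, u, v):
    """Return (weight, a, b) for the heaviest edge on the tree path u..v.

    Returns None when u == v or v is unreachable from u.
    """
    stack = [(u, None, None)]  # (vertex, parent, heaviest edge seen so far)
    while stack:
        x, parent, heaviest = stack.pop()
        if x == v:
            return heaviest
        for y, w in tree_adj[x]:
            if y != parent:
                if heaviest is None or w > heaviest[0]:
                    stack.append((y, x, (w, x, y)))
                else:
                    stack.append((y, x, heaviest))
    return None


def best_free_edge(tree_adj, free_candidates):
    """Return (replaced_tree_edge, free_edge) giving the largest saving."""
    best = None
    for u, v in free_candidates:
        heaviest = heaviest_edge_on_path(tree_adj, u, v)
        if heaviest is not None and (best is None or heaviest[0] > best[0][0]):
            best = (heaviest, (u, v))
    return best
```

Each call walks at most n vertices, so k candidates cost O(kn) on top of building the initial MST.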
However, this can potentially be improved even further if you're okay using some more advanced algorithms. The paper "On Cartesian Trees and Range Minimum Queries" by Demaine et al. shows that it is possible to preprocess a minimum spanning tree in O(n) time so that queries of the form "what is the bottleneck edge on the path in this tree between nodes u and v?" can be answered in time O(1). (The same structure works for the lowest-cost or the highest-cost edge on the path, since negating the weights swaps the two; here you want the highest-cost one.) You could therefore build this structure instead of doing a DFS to find the heaviest edge between u and v, reducing the overall runtime to O(T(m, n) + n + k). Given that T(m, n) is very low (the best known deterministic bound is O(m α(m, n)), where α is the inverse Ackermann function and is less than five for all inputs in the feasible universe), this is asymptotically a very quick algorithm!
First generate a MST. Now, if you add a free edge, you will create exactly one cycle. You could then remove the heaviest edge in the cycle to get a cheaper tree.
To find the best tree you can make by adding one free edge, you need to find the heaviest edge in the MST that you could replace with a free one.
You can do that by testing one free edge at a time:
1. Pick a free edge
2. Find the lowest common ancestor in the tree (from an arbitrary root) of its adjacent vertices
3. Remember the heaviest edge on the path between the free edge vertices
When you're done, you know which free edge to use -- it's the one associated with the heaviest tree edge, and you know which edge it replaces.
In order to make steps (2) and (3) faster, you can remember the depth of each node and connect it to multiple ancestors like a skip list. You can then do those steps in O(log |V|) time, leading to a total complexity of O( (|E|+k) log |V| ), which is pretty good.
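One way to realize the "skip list of ancestors" idea is binary lifting. The sketch below is only an illustration under a few assumptions: vertices are 0..n-1, the MST is given as adjacency lists of (neighbor, weight) pairs, and edge weights are non-negative (0 is used as the neutral value for the maximum). The function names are made up.

```python
def build_lifting(n, tree_adj, root=0):
    """Precompute 2^j-th ancestors and the heaviest edge on each such jump."""
    log = max(1, n.bit_length())
    depth = [0] * n
    up = [[root] * n for _ in range(log)]  # up[j][v]: 2^j-th ancestor of v
    mx = [[0] * n for _ in range(log)]     # mx[j][v]: heaviest edge on that jump
    seen = [False] * n
    seen[root] = True
    stack = [root]
    while stack:
        x = stack.pop()
        for y, w in tree_adj[x]:
            if not seen[y]:
                seen[y] = True
                depth[y] = depth[x] + 1
                up[0][y] = x
                mx[0][y] = w
                stack.append(y)
    for j in range(1, log):
        for v in range(n):
            mid = up[j - 1][v]
            up[j][v] = up[j - 1][mid]
            mx[j][v] = max(mx[j - 1][v], mx[j - 1][mid])
    return depth, up, mx


def max_edge_on_path(u, v, depth, up, mx):
    """Heaviest edge weight on the tree path between u and v, in O(log n)."""
    best = 0
    if depth[u] < depth[v]:
        u, v = v, u
    diff = depth[u] - depth[v]
    j = 0
    while diff:  # lift u to the depth of v
        if diff & 1:
            best = max(best, mx[j][u])
            u = up[j][u]
        diff >>= 1
        j += 1
    if u == v:
        return best
    for j in range(len(up) - 1, -1, -1):  # lift both to just below the LCA
        if up[j][u] != up[j][v]:
            best = max(best, mx[j][u], mx[j][v])
            u, v = up[j][u], up[j][v]
    return max(best, mx[0][u], mx[0][v])
```

The preprocessing is O(n log n), and each free edge is then tested with one `max_edge_on_path` call.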
EDIT: Even Easier Way
After thinking about this a bit, it seems there's a super easy way to figure out which free edge to use and which MST edge to replace.
Disregarding the k possible free edges, you build the MST from the other edges using Kruskal's algorithm, but you modify the usual disjoint set data structure as follows:
Use union by size or rank, but not path compression. Every union operation will then establish exactly one link, and take O(log N) time, and all path lengths will be at most O(log N) long.
For each link, remember the index of the edge that caused it to be created.
For each possible free edge, then, you can walk up the links in the disjoint set structure to find out exactly at which point its endpoints were connected into the same connected component. You get the index of the last required edge, i.e., the one it would replace, and the free edge with the greatest replacement target index is the one you should use.
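A sketch of this idea, with hypothetical names and assuming vertices 0..n-1, roads as (u, v, cost) tuples, a connected road network, and free edges whose endpoints differ. Links created later sit higher up in the structure, so the walk always advances through the older link first; the last link crossed is the edge that first connected the two endpoints.

```python
INF = float("inf")


def build_link_forest(n, roads):
    """Kruskal with union by size and no path compression."""
    order = sorted(roads, key=lambda e: e[2])
    parent = list(range(n))
    size = [1] * n
    link_index = [None] * n  # index in `order` of the edge that linked x upward

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, (u, v, _) in enumerate(order):
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] > size[rv]:
                ru, rv = rv, ru
            parent[ru] = rv          # exactly one new link per union
            size[rv] += size[ru]
            link_index[ru] = i
    return order, parent, link_index


def replaced_edge_index(u, v, parent, link_index):
    """Index (in sorted order) of the MST edge that first connected u and v."""
    last = -1
    while u != v:
        iu = INF if parent[u] == u else link_index[u]
        iv = INF if parent[v] == v else link_index[v]
        if iu == INF and iv == INF:
            raise ValueError("u and v are not connected by the roads")
        if iu <= iv:
            last, u = iu, parent[u]
        else:
            last, v = iv, parent[v]
    return last


def best_free_edge(n, roads, free_candidates):
    """Return (free_edge_to_use, mst_edge_it_replaces)."""
    order, parent, link_index = build_link_forest(n, roads)
    idx, edge = max((replaced_edge_index(u, v, parent, link_index), (u, v))
                    for u, v in free_candidates)
    return edge, order[idx]
```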
I have recently been doing some research into Prim's and Kruskal's algorithms for finding minimum spanning trees in graphs, and I am interested in the following problem:
Let G be an undirected graph on n vertices with m edges, such that each edge has a weight w(e) ∈ {1, 2, 3}. Is there an algorithm which finds a minimum spanning tree of G in time O(n+m)?
Obviously, you could just run Prim's algorithm on the graph, and you would get a minimum spanning tree, but not in the required time.
I was thinking that we could start by adding every edge with weight 1 to the tree, provided it creates no cycles, as if there is an edge of weight 1 that creates no cycles, then it is preferable to an edge of weight 2 say, and do this in increasing order.
Any help on possible ways to design an algorithm to do this would be appreciated and any implementations (java preferable but any language welcome) would be super helpful.
You're describing a minor variation of Kruskal's algorithm that makes the cost of sorting by weight O(m) for m edges because you only need to put the edges in 3 buckets.
Since the rest of Kruskal's is very nearly O(m) due to the amazing properties of the disjoint set data structure, you should be in good shape.
Building the tree itself ought to be O(m) rather than O(n + m) as was your goal because there's no need to process the vertices. E.g. if you have a few edges on a gazillion vertices, most with no connection, the latter don't need to increase algorithm cost if you're careful about data structure design.
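The question asked for Java, but to keep it short here is a Python sketch of the bucketed Kruskal variant. The function name and the (u, v, w) edge format are my own choices, and vertices are assumed to be 0..n-1. Note that this simple version allocates an array of size n for the disjoint sets, so it is O(n + m) plus the near-constant union-find factor; avoiding the n term would need a hash-based structure.

```python
def mst_small_weights(n, edges, max_weight=3):
    """Kruskal with bucket sorting for integer weights in 1..max_weight."""
    buckets = [[] for _ in range(max_weight + 1)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))  # O(m) "sort" into a few buckets

    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for bucket in buckets:            # lightest bucket first
        for u, v, w in bucket:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue              # this edge would close a cycle
            if rank[ru] < rank[rv]:
                ru, rv = rv, ru
            parent[rv] = ru           # union by rank
            if rank[ru] == rank[rv]:
                rank[ru] += 1
            tree.append((u, v, w))
    return tree
```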
Could you please help me with this problem?
Given an undirected graph G, connected, with weighted edges, such that the weights are integers in [1,k] . Write a modified version of Prim's algorithm that returns the minimum spanning tree in O(kn+m) time.
Note:
n represents the number of vertices
m represents the number of edges
You should be using the limited range of the edge length. This will help you keep a priority queue of the edges more efficiently. Keep in mind the most important step in the algorithm is to find the minimum-weight edge connecting the tree built thus far with a node not yet added to the tree. Try to use counting sort as an inspiration.
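One possible way to follow this hint, sketched under my own assumptions (vertices 0..n-1, adjacency lists of (neighbor, weight) pairs, a connected graph, hypothetical function name): keep one bucket per possible key value, so extracting the minimum is an O(k) scan and lowering a key is O(1). That gives n scans of cost O(k) plus O(m) key updates.

```python
def prim_bucketed(n, adj, k):
    """Prim's algorithm with a bucket queue; returns the MST total weight.

    adj[v] is a list of (neighbor, weight) with integer weights in 1..k.
    """
    INF = k + 1
    key = [INF] * n                          # cheapest known link into the tree
    in_tree = [False] * n
    buckets = [set() for _ in range(k + 1)]  # buckets[w]: fringe vertices with key w
    key[0] = 0
    buckets[0].add(0)
    total = 0
    for _ in range(n):
        # O(k) scan for the lowest non-empty bucket.
        w = next(i for i in range(k + 1) if buckets[i])
        v = buckets[w].pop()
        in_tree[v] = True
        total += w
        for y, wy in adj[v]:
            if not in_tree[y] and wy < key[y]:
                if key[y] != INF:
                    buckets[key[y]].discard(y)  # O(1) decrease-key
                key[y] = wy
                buckets[wy].add(y)
    return total
```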
Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct; would it be possible to create a minimal spanning tree in linear time?
To do this we must look at each edge, and this loop cannot contain any searches, otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow, we must only look at each edge once, and decide whether to include it or not based on some "static" previous values that do not involve a growing data structure.
So.. let's say we keep the endpoints of the edge in question, then look at the next edge; if the next edge has the same endpoints as prev, then compare the weights of prev and the current edge and keep the lower one. If the current edge's endpoints are not equal to prev's, then it is in a different component .. now I am stuck because we cannot create a hash or array to keep track of the component nodes that are already added while looking through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?
To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add a distinct epsilon value to each edge weight to ensure the edge weights are unique.
This can be done in time O(m) as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!
There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
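The key result in the paper is their "sampling lemma": if you select a subset of the edges by including each edge independently with probability p, and find the minimum spanning forest F of this subgraph, then the expected number of edges of the original graph that are "F-light" (not heavier than the heaviest edge on the path in F connecting their endpoints, or joining two different trees of F) is at most |V|/p. Every other edge can be discarded, since it cannot be in the MST.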
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.
The fact that the number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have a different N. If you place an upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has a distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about the current case cannot influence further processing, as we cannot predict what the graph's structure will look like in the next step.
If the number of edges is close to n, like n + 6 in this case (after the edit), we know that we only need to remove 7 edges, as every spanning tree has exactly n - 1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any minimum spanning tree (assuming all edge weights are distinct) and thus should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.
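A small sketch of this cycle-removal idea, assuming a connected graph on vertices 0..n-1 with edges as (u, v, w) tuples and distinct weights. The function names are hypothetical. Each pass builds a search tree, uses the first non-tree edge it meets to recover a cycle, and drops that cycle's heaviest edge.

```python
def find_cycle(n, edges):
    """Return the edges of some cycle in a connected graph, or None."""
    adj = [[] for _ in range(n)]
    for i, (u, v, _) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    parent = [None] * n
    parent_edge = [None] * n  # index of the edge leading to the search-tree parent
    depth = [-1] * n
    depth[0] = 0
    stack = [0]
    while stack:
        x = stack.pop()
        for y, i in adj[x]:
            if i == parent_edge[x]:
                continue
            if depth[y] == -1:
                depth[y] = depth[x] + 1
                parent[y], parent_edge[y] = x, i
                stack.append(y)
            else:
                # Non-tree edge: climb from both ends to their common ancestor.
                cycle = [edges[i]]
                a, b = x, y
                while a != b:
                    if depth[a] < depth[b]:
                        a, b = b, a
                    cycle.append(edges[parent_edge[a]])
                    a = parent[a]
                return cycle
    return None


def mst_by_cycle_removal(n, edges):
    """Repeatedly delete the heaviest edge of a cycle until n - 1 edges remain."""
    edges = list(edges)
    while len(edges) > n - 1:
        cycle = find_cycle(n, edges)
        if cycle is None:
            raise ValueError("expected a connected graph")
        edges.remove(max(cycle, key=lambda e: e[2]))
    return edges
```

With n + c edges, each pass costs O(n + c) and there are c + 1 passes, so this is linear only while c stays a small constant.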
Yes this is homework. I was wondering if someone could explain the process of Sollin's (or Borůvka's) algorithm for determining a minimum spanning tree. Also if you could explain how to determine the number of iterations in the worst case, that would be great.
On a top level, the algorithm works as follows:
Maintain that you have a number of spanning trees for some subgraphs. Initially, every vertex of the graph is a m.s.t. with no edges.
In each iteration, for each of your spanning trees, find a cheapest edge connecting it to another spanning tree. (This is a simplification.)
The worst case in terms of iterations is that you always merge pairs of trees. In that case, the number of trees you have will halve in each iteration, so the number of iterations is logarithmic in the number of nodes.
Also note that there is a special trick involved in choosing the edges to add: if you were not careful, you might introduce a cycle when tree A connects to tree B, tree B connects to tree C and tree C connects to tree A. (This can only happen if all three edges chosen have the same weight. The trick is to have an arbitrary but fixed tie-breaker, like a fixed order of the edges.)
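To make the overview concrete, here is a compact sketch, assuming vertices 0..n-1 and edges as (u, v, w) tuples (the function name is made up). Ties are broken by edge index, which is the fixed tie-breaker mentioned above, and the function also returns the number of rounds so you can check the logarithmic bound.

```python
def boruvka(n, edges):
    """Return (tree_edges, rounds) for a connected graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    components = n
    rounds = 0
    while components > 1:
        rounds += 1
        cheapest = [None] * n  # per component root: index of its cheapest outgoing edge
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                c = cheapest[r]
                # Compare (weight, index) so ties are broken consistently.
                if c is None or (w, i) < (edges[c][2], c):
                    cheapest[r] = i
        merged = False
        for i in cheapest:
            if i is None:
                continue
            u, v, _ = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may have been picked from both sides
                parent[ru] = rv
                tree.append(edges[i])
                components -= 1
                merged = True
        if not merged:
            raise ValueError("graph is not connected")
    return tree, rounds
```

Every component picks an edge in every round, so the component count at least halves each time.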
So there, that's my back-of-index-card overview.
I'm using the layman's terminology.
1. First select a vertex.
2. Check all the edges from that vertex and select the one with the minimum weight.
3. Do this for all the vertices (some edges may be selected more than once).
4. You will get connected components.
5. For each of these connected components, select the minimum-weight edge leaving it, and repeat until only one component remains.
6. Your spanning tree with minimum weight will be formed.