Minimum Spanning Tree (MST) algorithm variation

Minimum Spanning Tree (MST) algorithm variation - algorithm

I was asked the following question in an interview and I am unable to find an efficient solution.
Here is the problem:
We want to build a network and we are given c nodes/cities and D possible edges/connections made by roads. Edges are bidirectional and we know the cost of the edge. The costs of the edges can be represented as d[i,j] which denotes the cost of the edge i-j. Note not all c nodes can be directly connected to each other (D is the set of possible edges).
Now we are given a list of k potential edges/connections that have no cost. However, you can only choose one edge in the list of k edges to use (like getting free funding to build an airport between two cities).
So the question is... find the set of roads (and the one free airport) that minimizes total cost required to build the network connecting all cities in an efficient runtime.
So in short, solve a minimum spanning tree problem but where you can choose 1 edge in a list of k potential edges to be free of cost. I'm unsure how to solve... I've tried finding all the spanning trees in order of increasing cost and choosing the lowest cost, but I'm still challenged on how to consider the one free edge from the list of k potential free edges. I've also tried finding the MST of the D potential connections and then adjusting it according the the options in k to get a result.
Thank you for any help!

One idea would be to treat your favorite MST algorithm as a black box and to think about changing the edges in the graph before asking for the MST. For example, you could try something like this:
for each edge in the list of possible free edges:
make the graph G' formed by setting that edge cost to 0.
compute the MST of G'
return the cheapest MST out of all the ones generated this way
The runtime of this approach is O(kT(m, n)), where k is the number of edges to test and T(m, n) is the cost of computing an MST using your favorite black-box algorithm.
We can do better than this. There's a well-known problem of the following form:
Suppose you have an MST T for a graph G. You then reduce the cost of some edge {u, v}. Find an MST T' in the new graph G'.
There are many algorithms for solving this problem efficiently. Here's one:
Run a DFS in T starting at u until you find v.
If the heaviest edge on the path found this way costs more than {u, v}:
Delete that edge.
Add {u, v} to the spanning tree.
Return the resulting tree T'.
(Proving that this works is tedious but doable.) This would give an algorithm of cost O(T(m, n) + kn), since you would be building an initial MST (time T(m, n)), then doing k runs of DFS in a tree with n nodes.
However, this can potentially be improved even further if you're okay using some more advanced algorithms. The paper "On Cartesian Trees and Range Minimum Queries" by Demaine et al shows that in O(n) time, it is possible to preprocess a minimum spanning tree so that, in time O(1), queries of the form "what is the lowest-cost edge on the path in this tree between nodes u and v?" in time O(1). You could therefore build this structure instead of doing a DFS to find the bottleneck edge between u and v, reducing the overall runtime to O(T(m, n) + n + k). Given that T(m, n) is very low (the best known bound is O(m α(m)), where α(m) is the Ackermann inverse function and is less than five for all inputs in the feasible univers), this is asymptotically a very quick algorithm!

First generate a MST. Now, if you add a free edge, you will create exactly one cycle. You could then remove the heaviest edge in the cycle to get a cheaper tree.
To find the best tree you can make by adding one free edge, you need to find the heaviest edge in the MST that you could replace with a free one.
You can do that by testing one free edge at a time:
Pick a free edge
Find the lowest common ancestor in the tree (from an arbitrary root) of its adjacent vertices
Remember the heaviest edge on the path between the free edge vertices
When you're done, you know which free edge to use -- it's the one associated with the heaviest tree edge, and you know which edge it replaces.
In order to make steps (2) and (3) faster, you can remember the depth of each node and connect it to multiple ancestors like a skip list. You can then do those steps in O(log |V|) time, leading to a total complexity of O( (|E|+k) log |V| ), which is pretty good.
EDIT: Even Easier Way
After thinking about this a bit, it seems there's a super easy way to figure out which free edge to use and which MST edge to replace.
Disregarding the k possible free edges, you build the MST from the other edges using Kruskal's algorithm, but you modify the usual disjoint set data structure as follows:
Use union by size or rank, but not path compression. Every union operation will then establish exactly one link, and take O(log N) time, and all path lengths will be at most O(log N) long.
For each link, remember the index of the edge that caused it to be created.
For each possible free edge, then, you can walk up the links in the disjoint set structure to find out exactly at which point its endpoints were connected into the same connected component. You get the index of the last required edge, i.e., the one it would replace, and the free edge with the greatest replacement target index is the one you should use.

Related

Minimum spanning tree to minimize cost

Can someone please help me solve this problem?
We have a set E of roads, a set H of highways, and a set V of different cities. We also have a cost x(i) associated to each road i and a cost y(i) associated to each highways i. We want to build the roads to connect the cities, with the conditions that there is always a path between any pair of cities and that we can build at most one highway, which may be cheaper than a road.
Set E and set H are different, and their respective costs are unrelated.
Design an algorithm to build the roads (with at most one highway) that minimize the total cost.

So, what we have is a fully connected graph of edges.
Solution steps:
Find the minimum spanning tree for the roads alone and consider it as the minimum cost.
Add one highway to the roads graph an calculate the minimum spanning cost tree again.
compare step 2 cost with the minimum cost to replace it if its smaller.
remove that high way.
go back to step 2 and go the steps again for each highway.
O(nm) = m*mst_cost(n)

Using Prim's or Kruskal's to build an MST: O(E log V).
The problem is the constraint of at most 1 highway.
1. Naive method to solve this:
For each possible highway, build the MST from scratch.
Time complexity of this solution: O(H E log V)
2. Alternative
Idea: If you build an MST, you can refine the MST with a better MST if you have an additional available edge you have not considered before.
Suppose the new edge connects (u,v). If you use this edge, you can remove the most expensive edge in the path between vertices u and v in the MST. You can find the path naively in O(V) time.
Using this idea, the time complexity is the cost to build the initial MST O(E log V) and the time to try to refine the MST with each of the H highways. The total algorithmic complexity is therefore O(E log V + H V), which is better than the first solution.
3. Optimized refinement
Instead of doing a naive path-searching method with the second method, we can find a faster way to do this. One related problem is LCA (lowest-common ancestor). A good way of solving LCA is using jump pointers. First you root hte tree, then each vertex will have jump pointers towards the root (1 step, 2 steps, 4 steps etc.) Pre-processing might cost O(V log V) time, and finding the LCA of 2 vertices is O(log V) worst case (although it is actually O(log (depth of tree)) which is usually better).
Once you have found the LCA, that implicitly gives you the path between vertices u and v. However, to find the most expensive edge to delete could be expensive since traversing the path is costly.
In 1-dimensional problems, the range-maximum-query (RMQ) can be employed. This uses a segment tree to solve the RMQ in O(log N) time.
Instead of a 1-dimensional space (like an array), we have a tree. However, we can apply the same idea, and build a segment tree-like structure. In fact, this is equivalent to bundling an extra piece of information with each jump pointer. To find the LCA, each vertex in the tree will have log(tree depth) jump pointers towards the root. We can bundle the maximum edge weight of the edges we jump over with the jump pointer. The cost of adding this information is the same as creating the jump pointer in the first place. Therefore, a slight refinement to the LCA algorithm allows us to find the maximum edge weight on the path between vertices u and v in O(log (depth)) time.
Finally, putting it together, the algorithmic complexity of this 3rd solution is O(E log V + H log V) or equivalently O((E+H) log V).

Recursive minimal spanning tree algorithms

Is this a right algorithm for finding minimal spanning tree.
Divide Graph into 2 equally connected parts. Find its minimal spanning trees. Connect them using the smallest edge that connects them. I am trying to get counterexample of this algorithm, but can't.

Consider a four-node graph, connected in a square, with the left edge having cost 10 and all other edges having cost 1. If you divide the graph into left and right for your recursive step, you will end up with a spanning tree of cost 12, instead of cost 3.
MST is not well-adapted to "divide-and-conquer" algorithms. The closest thing is probably the Reverse-Delete algorithm; whenever you fail to remove an edge (because it would disconnect the graph), you can think of the remaining steps as executing recursively on the two sides of that edge.

You have described a divide and conquer algorithm which will not work when determining an MST. Sneftel provided a good counter-example and recursively dividing the graph into two connected parts would be extremely costly.
Instead, a good approach to finding an MST would be to use a greedy algorithm such as Prim's algorithm. We know a greedy algorithm will work because this problem exhibits optimal substructure. For this algorithm, you will want to represent your graph as an adjacency list. First, start at an arbitrary node and add it to a visited list. Add all edges from this node into a min-heap. Include the cheapest edge in your MST and add the connecting node to your visited list. From that node add all edges to your min-heap and then select the cheapest edge to a node that has yet to be visited. Continue doing so until all nodes are visited. Once that is done you have your MST.
You can use other data structures to store the graph and the visited edges, but the ones I have outlined above will maximize the runtime. If we analyze the run-time with these data structures we can see that the runtime is O(E log V) which is the time to update the cost of the elements and maintaining your heap after an edge has been removed. More specifically O(log V) to fix the heap and that is done E number of times.
I also found this quick 2-minute video that outlines Prim's algorithm with an example: Prim's Algorithm in 2 Minutes
I hope this information is helpful!

Time complexity of creating a minimal spanning tree if the number of edges is known

Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct, would it possible to create a minimal spanning tree in linear time?
To do this we must look at each edge; and during this loop there can contain no searches otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow we must only look at each edge once, and decide rather to include it or not based on some "static" previous values that does not involve a growing data structure.
So.. let's say we keep the endpoints of the node in question, then look at the next node, if the next node has the same vertices as prev, then compare the weight of prev and current node and keep the lower one. If the current node's endpoints are not equal to prev, then it is in a different component .. now I am stuck because we cannot create a hash or array to keep track of the component nodes that are already added while look through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?

To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edges are unique.
This can be done in time Ω(m) as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!

There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma" -- that, if you independently randomly select a subset of the edges with probability p and find the minimum spanning tree of this subgraph, then there are only |V|/p edges that are better than the worst edge in the tree path connecting its ends.
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.

The fact that a number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have different N. If you place a upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about current case cannot influence further processing, as we cannot predict how the graph's structure will look like in the next step.

If the number of edges is close to n, like in this case n-6 (after edit), we know that we only need to remove 7 edges as every spanning tree has only n-1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any Minimum Spanning tree(assuming all edges are distinct) and thus, should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.

Prim and Kruskal's algorithms complexity

Given an undirected connected graph with weights. w:E->{1,2,3,4,5,6,7} - meaning there is only 7 weights possible.
I need to find a spanning tree using Prim's algorithm in O(n+m) and Kruskal's algorithm in O( m*a(m,n)).
I have no idea how to do this and really need some guidance about how the weights can help me in here.

You can sort edges weights faster.
In Kruskal algorithm you don't need O(M lg M) sort, you just can use count sort (or any other O(M) algorithm). So the final complexity is then O(M) for sorting and O(Ma(m)) for union-find phase. In total it is O(Ma(m)).
For the case of Prim algorithm. You don't need to use heap, you need 7 lists/queues/arrays/anything (with constant time insert and retrieval), one for each weight. And then when you are looking for cheapest outgoing edge you check is one of these lists is nonempty (from the cheapest one) and use that edge. Since 7 is a constant, whole algorithms runs in O(M) time.

As I understand, it is not popular to answer homework assignments, but this could hopefully be usefull for other people than just you ;)
Prim:
Prim is an algorithm for finding a minimum spanning tree (MST), just as Kruskal is.
An easy way to visualize the algorithm, is to draw the graph out on a piece of paper.
Then you create a moveable line (cut) over all the nodes you have selected. In the example below, the set A will be the nodes inside the cut. Then you chose the smallest edge running through the cut, i.e. from a node inside of the line to a node on the outside. Always chose the edge with the lowest weight. After adding the new node, you move the cut, so it contains the newly added node. Then you repeat untill all nodes are within the cut.
A short summary of the algorithm is:
Create a set, A, which will contain the chosen verticies. It will initially contain a random starting node, chosen by you.
Create another set, B. This will initially be empty and used to mark all chosen edges.
Choose an edge E (u, v), that is, an edge from node u to node v. The edge E must be the edge with the smallest weight, which has node u within the set A and v is not inside A. (If there are several edges with equal weight, any can be chosen at random)
Add the edge (u, v) to the set B and v to the set A.
Repeat step 3 and 4 until A = V, where V is the set of all verticies.
The set A and B now describe you spanning tree! The MST will contain the nodes within A and B will describe how they connect.
Kruskal:
Kruskal is similar to Prim, except you have no cut. So you always chose the smallest edge.
Create a set A, which initially is empty. It will be used to store chosen edges.
Chose the edge E with minimum weight from the set E, which is not already in A. (u,v) = (v,u), so you can only traverse the edge one direction.
Add E to A.
Repeat 2 and 3 untill A and E are equal, that is, untill you have chosen all edges.
I am unsure about the exact performance on these algorithms, but I assume Kruskal is O(E log E) and the performance of Prim is based on which data structure you use to store the edges. If you use a binary heap, searching for the smallest edge is faster than if you use an adjacency matrix for storing the minimum edge.
Hope this helps!

Finding X the lowest cost trees in graph

I have Graph with N nodes and edges with cost. (graph may be Complete but also can contain zero edges).
I want to find K trees in the graph (K < N) to ensure every node is visited and cost is the lowest possible.
Any recommendations what the best approach could be?
I tried to modify the problem to finding just single minimal spanning tree, but didn't succeeded.
Thank you for any hint!
EDIT
little detail, which can be significant. To cost is not related to crossing the edge. The cost is the price to BUILD such edge. Once edge is built, you can traverse it forward and backwards with no cost. The problem is not to "ride along all nodes", the problem is about "creating a net among all nodes". I am sorry for previous explanation
The story
Here is the story i have heard and trying to solve.
There is a city, without connection to electricity. Electrical company is able to connect just K houses with electricity. The other houses can be connected by dropping cables from already connected houses. But dropping this cable cost something. The goal is to choose which K houses will be connected directly to power plant and which houses will be connected with separate cables to ensure minimal cable cost and all houses coverage :)

As others have mentioned, this is NP hard. However, if you're willing to accept a good solution, you could use simulated annealing. For example, the traveling salesman problem is NP hard, yet near-optimal solutions can be found using simulated annealing, e.g. http://www.codeproject.com/Articles/26758/Simulated-Annealing-Solving-the-Travelling-Salesma

You are describing something like a cardinality constrained path cover. It's in the Traveling Salesman/ Vehicle routing family of problems and is NP-Hard. To create an algorithm you should ask
Are you only going to run it on small graphs.
Are you only going to run it on special cases of graphs which do have exact algorithms.
Can you live with a heuristic that solves the problem approximately.

Assume you can find a minimum spanning tree in O(V^2) using prim's algorithm.
For each vertex, find the minimum spanning tree with that vertex as the root.
This will be O(V^3) as you run the algorithm V times.
Sort these by total mass (sum of weights of their vertices) of the graph. This is O(V^2 lg V) which is consumed by the O(V^3) so essentially free in terms of order complexity.
Take the X least massive graphs - the roots of these are your "anchors" that are connected directly to the grid, as they are mostly likely to have the shortest paths. To determine which route it takes, you simply follow the path to root in each node in each tree and wire up whatever is the shortest. (This may be further optimized by sorting all paths to root and using only the shortest ones first. This will allow for optimizations on the next iterations. Finding path to root is O(V). Finding it for all V X times is O(V^2 * X). Because you would be doing this for every V, you're looking at O(V^3 * X). This is more than your biggest complexity, but I think the average case on these will be small, even if their worst case is large).
I cannot prove that this is the optimal solution. In fact, I am certain it is not. But when you consider an electrical grid of 100,000 homes, you can not consider (with any practical application) an NP hard solution. This gives you a very good solution in O(V^3 * X), which I imagine is going to give you a solution very close to optimal.

Looking at your story, I think that what you call a path can be a tree, which means that we don't have to worry about Hamiltonian circuits.
Looking at the proof of correctness of Prim's algorithm at http://en.wikipedia.org/wiki/Prim%27s_algorithm, consider taking a minimum spanning tree and removing the most expensive X-1 links. I think the proof there shows that the result has the same cost as the best possible answer to your problem: the only difference is that when you compare edges, you may find that the new edge join two separated components, but in this case you can maintain the number of separated components by removing an edge with cost at most that of the new edge.
So I think an answer for your problem is to take a minimum spanning tree and remove the X-1 most expensive links. This is certainly the case for X=1!

Here is attempt at solving this...
For X=1 I can calculate minimal spanning tree (MST) with Prim's algorithm from each node (this node is the only one connected to the grid) and select the one with the lowest overall cost
For X=2 I create extra node (Power plant node) beside my graph. I connect it with random node (eg. N0) by edge with cost of 0. I am now sure I have one power plant plug right (the random node will definitely be in one of the tree, so whole tree will be connected). Now the iterative part. I take other node (eg. N1) and again connected with PP with 0 cost edge. Now I calculate MST. Then repeat this process with replacing N1 with N2, N3 ...
So I will test every pair [N0, NX]. The lowest cost MST wins.
For X>2 is it really the same as for X=2, but I have to test connect to PP every (x-1)-tuple and calculate MST
with x^2 for MST I have complexity about (N over X-1) * x^2... Pretty complex, but I think it will give me THE OPTIMAL solution
what do you think?
edit by random node I mean random but FIXED node
attempt to visualize for x=2 (each description belongs to image above it)
Let this be our city, nodes A - F are houses, edges are candidates to future cables (each has some cost to build)
Just for image, this could be the solution
Let the green one be the power plant, this is how can look connection to one tree
But this different connection is really the same (connection to power plant(pp) cost the same, cables remains untouched). That is why we can set one of the nodes as fixed point of contact to the pp. We can be sure, that the node will be in one of the trees, and it does not matter where in the tree is.
So let this be our fixed situation with G as PP. Edge (B,G) with zero cost is added.
Now I am trying to connect second connection with PP (A,G, cost 0)
Now I calculate MST from the PP. Because red edges are the cheapest (the can actually have even negative cost), is it sure, that both of them will be in MST.
So when running MST I get something like this. Imagine detaching PP and two MINIMAL COST trees left. This is the best solution for A and B are the connections to PP. I store the cost and move on.
Now I do the same for B and C connections
I could get something like this, so compare cost to previous one and choose the better one.
This way I have to try all the connection pairs (B,A) (B,C) (B,D) (B,E) (B,F) and the cheapest one is the winner.
For X=3 I would just test other tuples with one fixed again. (A,B,C) (A,B,D) ... (A,C,D) ... (A,E,F)

I just came up with the easy solution as follows:
N - node count
C - direct connections to the grid
E - available edges
1, Sort all edges by cost
2, Repeat (N-C) times:
Take the cheapest edge available
Check if adding this edge will not caused circles in already added edge
If not, add this edge
3, That is all... You will end up with C disjoint sets of edges, connect every set to the grid

Sounds like the famous Traveling Salesman problem. The problem known to be NP-hard. Take a look at the Wikipedia as your starting point: http://en.wikipedia.org/wiki/Travelling_salesman_problem

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio