Is this minimum spanning tree algorithm correct? - algorithm

The minimum spanning tree problem is to take a connected weighted graph and find the subset of its edges with the lowest total weight while keeping the graph connected (and as a consequence resulting in an acyclic graph).
The algorithm I am considering is:
Find all cycles.
remove the largest edge from each cycle.
The impetus for this version is an environment that is restricted to "rule satisfaction" without any iterative constructs. It might also be applicable to insanely parallel hardware (i.e. a system where you expect to have several times more degrees of parallelism then cycles).
Edits:
The above is done in a stateless manner (all edges that are not the largest edge in any cycle are selected/kept/ignored, all others are removed).

What happens if two cycles overlap? Which one has its longest edge removed first? Does it matter if the longest edge of each is shared between the two cycles or not?
For example:
V = { a, b, c, d }
E = { (a,b,1), (b,c,2), (c,a,4), (b,d,9), (d,a,3) }
There's an a -> b -> c -> a cycle, and an a -> b -> d -> a

#shrughes.blogspot.com:
I don't know about removing all but two - I've been sketching out various runs of the algorithm and assuming that parallel runs may remove an edge more than once I can't find a situation where I'm left without a spanning tree. Whether or not it's minimal I don't know.

For this to work, you'd have to detail how you would want to find all cycles, apparently without any iterative constructs, because that is a non-trivial task. I'm not sure that's possible. If you really want to find a MST algorithm that doesn't use iterative constructs, take a look at Prim's or Kruskal's algorithm and see if you could modify those to suit your needs.
Also, is recursion barred in this theoretical architecture? If so, it might actually be impossible to find a MST on a graph, because you'd have no means whatsoever of inspecting every vertex/edge on the graph.

I dunno if it works, but no matter what your algorithm is not even worth implementing. Finding all cycles will be the freaking huge bottleneck that will kill it. Also doing that without iterations is impossible. Why don't you implement some standard algorithm, let's say Prim's.

Your algorithm isn't quite clearly defined. If you have a complete graph, your algorithm would seem to entail, in the first step, removing all but the two minimum elements. Also, listing all the cycles in a graph can take exponential time.
Elaboration:
In a graph with n nodes and an edge between every pair of nodes, there are, if I have my math right, n!/(2k(n-k)!) cycles of size k, if you're counting a cycle as some subgraph of k nodes and k edges with each node having degree 2.

#Tynan The system can be described (somewhat over simplified) as a systems of rules describing categorizations. "Things are in category A if they are in B but not in C", "Nodes connected to nodes in Z are also in Z", "Every category in M is connected to a node N and has 'child' categories, also in M for every node connected to N". It's slightly more complicated than this. (I have shown that by creating unstable rules you can model a turning machine but that's beside the point.) It can't explicitly define iteration or recursion but can operate on recursive data with rules like the 2nd and 3rd ones.
#Marcin, Assume that there are an unlimited number of processors. It is trivial to show that the program can be run in O(n^2) for n being the longest cycle. With better data structures, this can be reduced to O(n*O(set lookup function)), I can envision hardware (quantum computers?) that can evaluate all cycles in constant time. giving a O(1) solution to the MST problem.
The Reverse-delete algorithm seems to provide a partial proof of correctness (that the proposed algorithm will not produce a non-minimal spanning tree) this is derived by arguing that mt algorithm will remove every edge that the Reverse-delete algorithm will. However I'm not sure how to show that my algorithm won't delete more than that algorithm.
Hhmm....

OK this is an attempt to finish the proof of correctness. By analogy to the Reverse-delete algorithm, we know that enough edges will be removed. What remains is to show that there will not be to many edges removed.
Removing to many edges can be described as removing all the edges between the side of a binary partition of the graph nodes. However only edges in a cycle are ever removed, therefor, for all edge between partitions to be removed, there needs to be a return path to complete the cycle. If we only consider edges between the partitions then the algorithm can at most remove the larger of each pair of edges, this can never remove the smallest bridging edge. Therefor for any arbitrary binary partitioning, the algorithm can't sever all links between the side.
What remains is to show that this extends to >2 way partitions.

Related

Cycle detection that handles a series of directed edges? [duplicate]

I came upon wait-for graphs and I wonder, are there any efficient algorithms for detecting if adding an edge to a directed graph results in a cycle?
The graphs in question are mutable (they can have nodes and edges added or removed). And we're not interested in actually knowing an offending cycle, just knowing there is one is enough (to prevent adding an offending edge).
Of course it'd be possible to use an algorithm for computing strongly connected components (such as Tarjan's) to check if the new graph is acyclic or not, but running it again every time an edge is added seems quite inefficient.
If I understood your question correctly, then a new edge (u,v) is only inserted if there was no path from v to u before (i.e., if (u,v) does not create a cycle). Thus, your graph is always a DAG (directed acyclic graph). Using Tarjan's Algorithm to detect strongly connected components (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) sounds like an overkill in this case. Before inserting (u,v), all you have to check is whether there is a directed path from v to u, which can be done with a simple BFS/DFS.
So the simplest way of doing it is the following (n = |V|, m = |E|):
Inserting (u,v): Check whether there is a path from v to u (BFS/DFS). Time complexity: O(m)
Deleting edges: Simply remove them from the graph. Time complexity: O(1)
Although inserting (u,v) takes O(m) time in the worst case, it is probably pretty fast in your situation. When doing the BFS/DFS starting from v to check whether u is reachable, you only visit vertices that are reachable from v. I would guess that in your setting the graph is pretty sparse and that the number of vertices reachable by another is not that high.
However, if you want to improve the theoretical running time, here are some hints (mostly showing that this will not be very easy). Assume we aim for testing in O(1) time whether there exists a directed path from v to u. The keyword in this context is the transitive closure of a DAG (i.e., a graph that contains an edge (u, v) if and only if there is a directed path from u to v in the DAG). Unfortunately, maintaining the transitive closure in a dynamic setting seems to be not that simple. There are several papers considering this problem and all papers I found were STOC or FOCS papers, which indicates that they are very involved. The newest (and fastest) result I found is in the paper Dynamic Transitive Closure via Dynamic Matrix Inverse by Sankowski (http://dl.acm.org/citation.cfm?id=1033207).
Even if you are willing to understand one of those dynamic transitive closure algorithms (or even want to implement it), they will not give you any speed up for the following reason. These algorithms are designed for the situation, where you have a lot of connectivity queries (which then can be performed in O(1) time) and only few changes in the graph. The goal then is to make these changes cheaper than recomputing the transitive closure. However, this update is still slower that a single check for connectivity. Thus, if you need to do an update on every connectivity query, it is better to use the simple approach mentioned above.
So why do I mention this approach of maintaining the transitive closure if it does not fit your needs? Well, it shows that searching an algorithm consuming only O(1) query time does probably not lead you to a solution faster than the simple one using BFS/DFS. What you could try is to get a query time that is faster than O(m) but worse than O(1), while updates are also faster than O(m). This is a very interesting problem, but it sounds to me like a very ambitious goal (so maybe do not spend too much time on trying to achieve it..).
As Mark suggested it is possible to use data structure that stores connected nodes. It is the best to use boolean matrix |V|x|V|. Values can be initialized with Floyd–Warshall algorithm. That is done in O(|V|^3).
Let T(i) be set of vertices that have path to vertex i, and F(j) set of vertices where exists path from vertex j. First are true's in i'th row and second true's in j'th column.
Adding an edge (i,j) is simple operation. If i and j wasn't connected before, than for each a from T(i) and each b from F(j) set matrix element (a,b) to true. But operation isn't cheap. In worst case it is O(|V|^2). That is in case of directed line, and adding edge from end to start vertex makes all vertices connected to all other vertices.
Removing an edge (i,j) is not so simple, but not more expensive operation in the worst case :-) If there is a path from i to j after removing edge, than nothing changes. That is checked with Dijkstra, less than O(|V|^2). Vertices that are not connected any more are (a,b):
a in T(i) - i - T(j),
b in F(j) + j
Only T(j) is changed with removing edge (i,j), so it has to be recalculated. That is done by any kind of graph traversing (BFS, DFS), by going in opposite edge direction from vertex j. That is done in less then O(|V|^2). Since setting of matrix element is in worst case is again O(|V|^2), this operation has same worst case complexity as adding edge.
This is a problem which I recently faced in a slightly different situation (optimal ordering of interdependent compiler instructions).
While I can't improve on O(n*n) theoretical bounds, after a fair amount of experimentation and assuming heuristics for my case (for example, assuming that the initial ordering wasn't created maliciously) the following was the best compromise algorithm in terms of performance.
(In my case I had an acceptable "right side failure": after the initial nodes and arcs were added (which was guaranteed to be possible), it was acceptable for the optimiser to occasionally reject the addition of further arcs where one could actually be added. This approximation isn't necessary for this algorithm when carried to completion, but it does admit such an approximation if you wish to do so, and so limiting its runtime further).
While a graph is topologically sorted, it is guaranteed to be cycle-free. In the first phase when I had a static bulk of nodes and arcs to add, I added the nodes and then topologically sorted them.
During the second phase, adding additional arcs, there are two situations when considering an arc from A to B. If A already lies to the left of B in the sort, an arc can simply be added and no cycle can be generated, as the list is still topologically sorted.
If B is to the left of A, we consider the sub-sequence between B and A and partition it into two disjoint sequences X, Y, where X is those nodes which can reach A (and Y the others). If A is not reachable from B, ie there are no direct arcs from B into X or to A, then the sequence can be reordered XABY before adding the A to B arc, showing it is still cycle-free and maintaining the topological sort. The efficiency over the naive algorithm here is that we only need consider the subsequence between B and A as our list is topologically sorted: A is not reachable from any node to the right of A. For my situation, where localised reorderings are the most frequent and important, this an important gain.
As we don't reorder within the sequences X,A,B,Y, clearly any arcs which start or end within the same sequence are still ordered correctly, and the same in each flank, and any "fly-over" arcs from the left to the right flanks. Any arcs between the flanks and X,A,B,Y are also still ordered correctly as our reordering is restricted to this local region. So we only need to consider arcs between our four sequences. Consider each possible "problematic" arc for our final ordering XABY in turn: YB YA YX BA BX AX. Our initial order was B[XY]A, so AX and YB cannot occur. X reaches A, but Y does not, therefore YX and YA do not occur or A could be reached from the source of the arc in Y (potentially via X) a contradiction. Our criterion for acceptability was that there are no links BX or BA. So there are no problematic arcs, and we are still topologically sorted.
Our only acceptability criterion (that A is not reachable from B) is clearly sufficient to create a cycle on adding the arc A->B: B -(X)-> A -> B, so the converse is also shown.
This can be implemented reasonably efficiently if we can add a flag to each node. Consider the nodes [BXY] going right-to-left from the node immediately to the left of A. If that node has a direct arc to A then set the flag. At an arbitrary such node, we need only consider direct outgoing arcs: the nodes to its right are either after A (and so irrelevant), or else have already been flagged if reachable from A, so the flag on such an arbitrary node is set when any flagged nodes are encountered by direct link. If B is not flagged at the end of the process, the reordering is acceptable and the flagged nodes comprise X.
Though this always yields a correct ordering if carried to completion (as far as I can tell), as I mentioned in the introduction it is particularly efficient if your initial build is approximately correct (in the sense of accommodating of likely additional arcs without reordering).
There also exists an effective approximation, if your context is such that "outrageous" arcs can be rejected (those which would massively reorder) by limiting the A to B distance you are prepared to scan. If you have an initial list of the additional arcs you wish to add, they can be ordered by increasing distance in the initial ordering until you run out of some scanning "credit", and call your optimisation a day at that point.
If the graph is directed, you would only have to check the parent nodes (navigate up until you reach the root) of the node where the new edge should start. If one of the parent nodes is equal to the end of the edge, adding the edge would create a cycle.
If all previous jobs are in Topologically sorted order. Then if you add an edge that appears to brake the sort, and can not be fixed, then you have a cycle.
https://stackoverflow.com/a/261621/831850
So if we have a sorted list of nodes:
1, 2, 3, ..., x, ..., z, ...
Such that each node is waiting for nodes to its left.
Say we want to add an edge from x->z. Well that appears to brake the sort. So we can move the node at x to position z+1 which will fix the sort iif none of the nodes (x, z] have an edge to the node at x.

Minimum spanning tree with two edges tied

I'd like to solve a harder version of the minimum spanning tree problem.
There are N vertices. Also there are 2M edges numbered by 1, 2, .., 2M. The graph is connected, undirected, and weighted. I'd like to choose some edges to make the graph still connected and make the total cost as small as possible. There is one restriction: an edge numbered by 2k and an edge numbered by 2k-1 are tied, so both should be chosen or both should not be chosen. So, if I want to choose edge 3, I must choose edge 4 too.
So, what is the minimum total cost to make the graph connected?
My thoughts:
Let's call two edges 2k and 2k+1 a edge set.
Let's call an edge valid if it merges two different components.
Let's call an edge set good if both of the edges are valid.
First add exactly m edge sets which are good in increasing order of cost. Then iterate all the edge sets in increasing order of cost, and add the set if at least one edge is valid. m should be iterated from 0 to M.
Run an kruskal algorithm with some variation: The cost of an edge e varies.
If an edge set which contains e is good, the cost is: (the cost of the edge set) / 2.
Otherwise, the cost is: (the cost of the edge set).
I cannot prove whether kruskal algorithm is correct even if the cost changes.
Sorry for the poor English, but I'd like to solve this problem. Is it NP-hard or something, or is there a good solution? :D Thanks to you in advance!
As I speculated earlier, this problem is NP-hard. I'm not sure about inapproximability; there's a very simple 2-approximation (split each pair in half, retaining the whole cost for both halves, and run your favorite vanilla MST algorithm).
Given an algorithm for this problem, we can solve the NP-hard Hamilton cycle problem as follows.
Let G = (V, E) be the instance of Hamilton cycle. Clone all of the other vertices, denoting the clone of vi by vi'. We duplicate each edge e = {vi, vj} (making a multigraph; we can do this reduction with simple graphs at the cost of clarity), and, letting v0 be an arbitrary original vertex, we pair one copy with {v0, vi'} and the other with {v0, vj'}.
No MST can use fewer than n pairs, one to connect each cloned vertex to v0. The interesting thing is that the other halves of the pairs of a candidate with n pairs like this can be interpreted as an oriented subgraph of G where each vertex has out-degree 1 (use the index in the cloned bit as the tail). This graph connects the original vertices if and only if it's a Hamilton cycle on them.
There are various ways to apply integer programming. Here's a simple one and a more complicated one. First we formulate a binary variable x_i for each i that is 1 if edge pair 2i-1, 2i is chosen. The problem template looks like
minimize sum_i w_i x_i (drop the w_i if the problem is unweighted)
subject to
<connectivity>
for all i, x_i in {0, 1}.
Of course I have left out the interesting constraints :). One way to enforce connectivity is to solve this formulation with no constraints at first, then examine the solution. If it's connected, then great -- we're done. Otherwise, find a set of vertices S such that there are no edges between S and its complement, and add a constraint
sum_{i such that x_i connects S with its complement} x_i >= 1
and repeat.
Another way is to generate constraints like this inside of the solver working on the linear relaxation of the integer program. Usually MIP libraries have a feature that allows this. The fractional problem has fractional connectivity, however, which means finding min cuts to check feasibility. I would expect this approach to be faster, but I must apologize as I don't have the energy to describe it detail.
I'm not sure if it's the best solution, but my first approach would be a search using backtracking:
Of all edge pairs, mark those that could be removed without disconnecting the graph.
Remove one of these sets and find the optimal solution for the remaining graph.
Put the pair back and remove the next one instead, find the best solution for that.
This works, but is slow and unelegant. It might be possible to rescue this approach though with a few adjustments that avoid unnecessary branches.
Firstly, the edge pairs that could still be removed is a set that only shrinks when going deeper. So, in the next recursion, you only need to check for those in the previous set of possibly removable edge pairs. Also, since the order in which you remove the edge pairs doesn't matter, you shouldn't consider any edge pairs that were already considered before.
Then, checking if two nodes are connected is expensive. If you cache the alternative route for an edge, you can check relatively quick whether that route still exists. If it doesn't, you have to run the expensive check, because even though that one route ceased to exist, there might still be others.
Then, some more pruning of the tree: Your set of removable edge pairs gives a lower bound to the weight that the optimal solution has. Further, any existing solution gives an upper bound to the optimal solution. If a set of removable edges doesn't even have a chance to find a better solution than the best one you had before, you can stop there and backtrack.
Lastly, be greedy. Using a regular greedy algorithm will not give you an optimal solution, but it will quickly raise the bar for any solution, making pruning more effective. Therefore, attempt to remove the edge pairs in the order of their weight loss.

Time complexity of creating a minimal spanning tree if the number of edges is known

Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct, would it possible to create a minimal spanning tree in linear time?
To do this we must look at each edge; and during this loop there can contain no searches otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow we must only look at each edge once, and decide rather to include it or not based on some "static" previous values that does not involve a growing data structure.
So.. let's say we keep the endpoints of the node in question, then look at the next node, if the next node has the same vertices as prev, then compare the weight of prev and current node and keep the lower one. If the current node's endpoints are not equal to prev, then it is in a different component .. now I am stuck because we cannot create a hash or array to keep track of the component nodes that are already added while look through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?
To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edges are unique.
This can be done in time Ω(m) as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!
There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma" -- that, if you independently randomly select a subset of the edges with probability p and find the minimum spanning tree of this subgraph, then there are only |V|/p edges that are better than the worst edge in the tree path connecting its ends.
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.
The fact that a number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have different N. If you place a upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about current case cannot influence further processing, as we cannot predict how the graph's structure will look like in the next step.
If the number of edges is close to n, like in this case n-6 (after edit), we know that we only need to remove 7 edges as every spanning tree has only n-1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any Minimum Spanning tree(assuming all edges are distinct) and thus, should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.

graph algorithm to detect even cycles

I have an undirected graph. One edge in that graph is special. I want to find all other edges that are part of a even cycle containing the first edge.
I don't need to enumerate all the cycles, that would be inherently NP I think. I just need to know, for each each edge, whether it satisfies the conditions above.
A brute force search works of course but is too slow, and I'm struggling to come up with anything better. Any help appreciated.
I think we have an answer (I must credit my colleague with the idea). Essentially his idea is to do a flood fill algorithm through the space of even cycles. This works because if you have a large even cycle formed by merging two smaller cycles then the smaller cycles must have been both even or both odd. Similarly merging an odd and even cycle always forms a larger odd cycle.
This is a practical option only because I can imagine pathological cases consisting of alternating even and odd cycles. In this case we would never find two adjacent even cycles and so the algorithm would be slow. But I'm confident that such cases don't arise in real chemistry. At least in chemistry as it's currently known, 30 years ago we'd never heard of fullerenes.
If your graph has a small node degree, you might consider using a different graph representation:
Let three atoms u,v,w and two chemical bonds e=(u,v) and k=(v,w). A typical way of representing such data is to store u,v,w as nodes and e,k as edges in a graph.
However, one may represent e and k as nodes in the graph, having edges like f=(e,k) where f represents a 2-step link from u to w, f=(e,k) or f=(u,v,w). Running any algorithm to find cycles on such a graph will return all even cycles on the original graph.
Of course, this is efficient only if the original graph has a small node degree. When a user performs an edit, you can easily edit accordingly the alternative representation.

Finding X the lowest cost trees in graph

I have Graph with N nodes and edges with cost. (graph may be Complete but also can contain zero edges).
I want to find K trees in the graph (K < N) to ensure every node is visited and cost is the lowest possible.
Any recommendations what the best approach could be?
I tried to modify the problem to finding just single minimal spanning tree, but didn't succeeded.
Thank you for any hint!
EDIT
little detail, which can be significant. To cost is not related to crossing the edge. The cost is the price to BUILD such edge. Once edge is built, you can traverse it forward and backwards with no cost. The problem is not to "ride along all nodes", the problem is about "creating a net among all nodes". I am sorry for previous explanation
The story
Here is the story i have heard and trying to solve.
There is a city, without connection to electricity. Electrical company is able to connect just K houses with electricity. The other houses can be connected by dropping cables from already connected houses. But dropping this cable cost something. The goal is to choose which K houses will be connected directly to power plant and which houses will be connected with separate cables to ensure minimal cable cost and all houses coverage :)
As others have mentioned, this is NP hard. However, if you're willing to accept a good solution, you could use simulated annealing. For example, the traveling salesman problem is NP hard, yet near-optimal solutions can be found using simulated annealing, e.g. http://www.codeproject.com/Articles/26758/Simulated-Annealing-Solving-the-Travelling-Salesma
You are describing something like a cardinality constrained path cover. It's in the Traveling Salesman/ Vehicle routing family of problems and is NP-Hard. To create an algorithm you should ask
Are you only going to run it on small graphs.
Are you only going to run it on special cases of graphs which do have exact algorithms.
Can you live with a heuristic that solves the problem approximately.
Assume you can find a minimum spanning tree in O(V^2) using prim's algorithm.
For each vertex, find the minimum spanning tree with that vertex as the root.
This will be O(V^3) as you run the algorithm V times.
Sort these by total mass (sum of weights of their vertices) of the graph. This is O(V^2 lg V) which is consumed by the O(V^3) so essentially free in terms of order complexity.
Take the X least massive graphs - the roots of these are your "anchors" that are connected directly to the grid, as they are mostly likely to have the shortest paths. To determine which route it takes, you simply follow the path to root in each node in each tree and wire up whatever is the shortest. (This may be further optimized by sorting all paths to root and using only the shortest ones first. This will allow for optimizations on the next iterations. Finding path to root is O(V). Finding it for all V X times is O(V^2 * X). Because you would be doing this for every V, you're looking at O(V^3 * X). This is more than your biggest complexity, but I think the average case on these will be small, even if their worst case is large).
I cannot prove that this is the optimal solution. In fact, I am certain it is not. But when you consider an electrical grid of 100,000 homes, you can not consider (with any practical application) an NP hard solution. This gives you a very good solution in O(V^3 * X), which I imagine is going to give you a solution very close to optimal.
Looking at your story, I think that what you call a path can be a tree, which means that we don't have to worry about Hamiltonian circuits.
Looking at the proof of correctness of Prim's algorithm at http://en.wikipedia.org/wiki/Prim%27s_algorithm, consider taking a minimum spanning tree and removing the most expensive X-1 links. I think the proof there shows that the result has the same cost as the best possible answer to your problem: the only difference is that when you compare edges, you may find that the new edge join two separated components, but in this case you can maintain the number of separated components by removing an edge with cost at most that of the new edge.
So I think an answer for your problem is to take a minimum spanning tree and remove the X-1 most expensive links. This is certainly the case for X=1!
Here is attempt at solving this...
For X=1 I can calculate minimal spanning tree (MST) with Prim's algorithm from each node (this node is the only one connected to the grid) and select the one with the lowest overall cost
For X=2 I create extra node (Power plant node) beside my graph. I connect it with random node (eg. N0) by edge with cost of 0. I am now sure I have one power plant plug right (the random node will definitely be in one of the tree, so whole tree will be connected). Now the iterative part. I take other node (eg. N1) and again connected with PP with 0 cost edge. Now I calculate MST. Then repeat this process with replacing N1 with N2, N3 ...
So I will test every pair [N0, NX]. The lowest cost MST wins.
For X>2 is it really the same as for X=2, but I have to test connect to PP every (x-1)-tuple and calculate MST
with x^2 for MST I have complexity about (N over X-1) * x^2... Pretty complex, but I think it will give me THE OPTIMAL solution
what do you think?
edit by random node I mean random but FIXED node
attempt to visualize for x=2 (each description belongs to image above it)
Let this be our city, nodes A - F are houses, edges are candidates to future cables (each has some cost to build)
Just for image, this could be the solution
Let the green one be the power plant, this is how can look connection to one tree
But this different connection is really the same (connection to power plant(pp) cost the same, cables remains untouched). That is why we can set one of the nodes as fixed point of contact to the pp. We can be sure, that the node will be in one of the trees, and it does not matter where in the tree is.
So let this be our fixed situation with G as PP. Edge (B,G) with zero cost is added.
Now I am trying to connect second connection with PP (A,G, cost 0)
Now I calculate MST from the PP. Because red edges are the cheapest (the can actually have even negative cost), is it sure, that both of them will be in MST.
So when running MST I get something like this. Imagine detaching PP and two MINIMAL COST trees left. This is the best solution for A and B are the connections to PP. I store the cost and move on.
Now I do the same for B and C connections
I could get something like this, so compare cost to previous one and choose the better one.
This way I have to try all the connection pairs (B,A) (B,C) (B,D) (B,E) (B,F) and the cheapest one is the winner.
For X=3 I would just test other tuples with one fixed again. (A,B,C) (A,B,D) ... (A,C,D) ... (A,E,F)
I just came up with the easy solution as follows:
N - node count
C - direct connections to the grid
E - available edges
1, Sort all edges by cost
2, Repeat (N-C) times:
Take the cheapest edge available
Check if adding this edge will not caused circles in already added edge
If not, add this edge
3, That is all... You will end up with C disjoint sets of edges, connect every set to the grid
Sounds like the famous Traveling Salesman problem. The problem known to be NP-hard. Take a look at the Wikipedia as your starting point: http://en.wikipedia.org/wiki/Travelling_salesman_problem

Resources