Using a minimum spanning tree algorithm - algorithm

Suppose I have a weighted undirected graph G = (V,E). Each vertex has a list of elements.
We start at a vertex root and look for all occurrences of elements with a value x. We wish to travel the least distance (in terms of edge weight) to uncover all occurrences of elements with value x.
The way I think of it, an MST will contain all vertices (and hence all vertices that satisfy our condition). Therefore all occurrences can be uncovered by finding the shortest path from root to every other vertex (on the MST, of course).
Edit :
As Louis pointed out, the MST will not work in all cases if the root is chosen arbitrarily. However, to make things clear, the root is part of the input and therefore there will be one and only one MST possible (given that the edges have distinct weights). This spanning tree will indeed have all minimum-cost paths to all other vertices in the graph starting from the root.

I don't think this will work. Consider the following example:
[diagram not preserved: a small graph whose edge weights include 5 and 3]
The minimum spanning tree contains all three edges with weight 3, but this is clearly not the optimum solution.
If I understand the problem correctly, you want to find the minimum-weight tree in the graph which includes all vertices labeled x. (That is, the correct answer would have total weight 8, and would be the two edges drawn vertically in this drawing.) But this does not include your arbitrarily selected root at all.
I am fairly confident that the following variation on Prim's algorithm would work. I am not sure whether it is optimal, though.
Let's say the label we are looking for is called L.
1) Use an all-pairs shortest path algorithm to compute d(v, w) for all v, w.
2) Pick some node labeled L; call this the root. (We can be sure that this will be in the result tree, since we are including all nodes labeled L.)
3) Initialize a priority queue with the root at priority 0. (The priority queue will consist of vertices labeled L, keyed by their minimum distance from any node in the tree, including tree vertices not labeled L.)
4) While the priority queue is nonempty, do the following:
- Pick out the top vertex in the queue; call it v, and its distance from the tree d.
- For each vertex w on the path from v to the tree, v inclusive, find the nearest L-labeled node x to w, and add x to the priority queue, or update its priority. Add w to the tree.

The answer is no, if I'm understanding correctly. The minimum spanning tree contains all vertices in V, but you only want to reach the vertices with value x. Thus, your MST result may include unneeded vertices that add extra path length, and therefore be sub-optimal.

An example has been given where the MST M1 from Root differs from a minimum tree M2 that contains all x nodes but does not contain Root.
Here's an example where Root is in both trees: let graph G contain nodes R, S, T, U, V (R = Root) and a clockwise cycle R-S-T-U-V-R with edge weights 1, 1, 3, 2, 2 going clockwise, and x at R, S, T, U. The first tree, M1 (the MST of G), has subtrees S-T and V-U below R, with cost 6 = 2 + 4; the cost-3 edge T-U is not included in M1. But M2 has only the subtree S-T-U below R, at cost 5.
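The numbers in this five-node cycle can be checked with a few lines of Python. This is only a sketch: the `kruskal` helper and the `(weight, a, b)` edge encoding are assumptions made for illustration, not part of the original answer.

```python
def kruskal(nodes, edges):
    """Return (total_weight, chosen_edges) of an MST via Kruskal's algorithm."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total, chosen = 0, []
    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:              # edge joins two components, so keep it
            parent[ra] = rb
            total += w
            chosen.append((a, b))
    return total, chosen


# Cycle R-S-T-U-V-R with clockwise weights 1, 1, 3, 2, 2.
cycle = [(1, "R", "S"), (1, "S", "T"), (3, "T", "U"), (2, "U", "V"), (2, "V", "R")]

mst_cost, mst_edges = kruskal("RSTUV", cycle)

# The tree R-S-T-U covers every x-labelled node (R, S, T, U) and skips V.
m2_cost = sum(w for w, a, b in cycle if "V" not in (a, b))

print(mst_cost, m2_cost)
```

The MST has to pay for reaching V even though V carries no x, which is why the smaller tree M2 wins here.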

Negative. If the idea is to find for every node that contains 'x' a separate path from root to it, and minimize the total cost of the paths, then you can just use simple shortest-path calculation separately for every node starting from the root, and put the paths together.
Some of those shortest paths will not be in the minimum spanning tree, so if this is your goal, the MST solution does not work. MST optimizes the cost of the tree, not the sum of costs of paths from root to the nodes.
If your idea is to find one path that starts from root and passes through all nodes that contain 'x', then this is a variant of the traveling salesman problem, which is NP-hard, i.e. very hard to solve optimally.


Finding the shortest path that uses at most one specific edge in a graph

Given an undirected graph that has ordinary edges and specific edges, our goal is to find the total weight of the shortest path between two vertices (from a start vertex to an end vertex) that walks through a specific edge at most once. In other words, there are multiple specific edges, and at most one of them can be used.
This is a problem I faced in my data structures homework, and I am stuck at the first step: how to store the edge weights in the graph. Because there are two kinds of edges in the graph, I have no idea how to solve this problem.
I know that I can obtain the shortest path by using Dijkstra's algorithm, but how can I modify the algorithm to respect this restriction?
Thanks a lot for answering my question!
The solution is to duplicate the graph as follows:
Duplicate the vertices, such that for each original vertex A, you have an A and an A'.
If in the original graph there is a normal edge between A and B, then in the new graph, place an edge between A and B and also between A' and B'
If in the original graph there is a specific edge between A and B, then in the new graph place a directed edge from A to B' (not the inverse!) and another from B to A' (again: not the inverse!).
If the task was to find the shortest path between S and D, then in the new graph find the shortest path from S to D or from S to D', whichever is shorter. You can use a standard implementation of Dijkstra's algorithm for that, starting in S and ending when you reach either D or D'.
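A minimal sketch of this duplicated-graph construction, assuming vertices are numbered 0..n-1 and edges are given as `(a, b, weight)` triples. The function name and the input format are hypothetical; the copy A' of vertex A is stored at index A + n.

```python
import heapq


def shortest_with_one_special(n, normal, special, s, d):
    """Shortest s-d distance that uses at most one special edge.

    Vertex v of the original graph is node v; its copy v' is node v + n.
    """
    adj = [[] for _ in range(2 * n)]
    for a, b, w in normal:            # normal edges exist in both copies
        for off in (0, n):
            adj[a + off].append((b + off, w))
            adj[b + off].append((a + off, w))
    for a, b, w in special:           # special edges only lead from A to B'
        adj[a].append((b + n, w))
        adj[b].append((a + n, w))

    # Standard Dijkstra on the doubled graph.
    dist = [float("inf")] * (2 * n)
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                  # stale queue entry
        for v, w in adj[u]:
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return min(dist[d], dist[d + n])  # reach D without, or D' with, a special edge
```

Because there is no edge leading back from the primed copy to the original one, a path can cross a special edge only once. For simplicity this version computes all distances instead of stopping early at D or D'.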
Given n specific edges, run Dijkstra's search n times.
On each run, one of the n specific edges (call it edge i) is set to its real weight and the other n-1 specific edges are set to an infinite weight.
At the end of each run, store the shortest path found together with the index i.
After all runs, select the shortest of the stored paths.
set the weight of all n specific edges to infinity
for i = 0; i < n; i++ {
    set edge i to its real weight
    run Dijkstra's search
    store path
    set edge i back to infinity
}
select the shortest path from the stored paths

How to prove there always exists a minimax path completely on the MST

A minimax path in an undirected graph is a path between two vertices v, w that minimizes the maximum weight of the edges on the path.
Let T be the minimum spanning tree of a given graph G=(V,E). How can I prove that, for any pair of vertices v, w in V, there always exists a minimax path between v and w that is completely on T?
I have tried to assume there is no minimax path completely on T, but I don't know how to get a contradiction.
Assume there exists a minimax path P between vertices u and v that is not completely on the minimum spanning tree T.
This means there is an edge A(p, q) in P that is not in T.
Let Q be the path in T from p to q.
Let B be an edge with the greatest weight in Q.
[figure not preserved: T is marked in green, the length of an edge represents its weight, and P = (u,p,q,v)]
There are now two cases to consider:
1. weight(B) > weight(A): In that case T is not a minimum spanning tree. If you removed B from T and added A instead, you would still have a spanning tree, but its total weight would have decreased. As this is a contradiction (T is given as being a minimum spanning tree), the only possibility left is:
2. weight(B) <= weight(A): In that case you could remove A from P and add the edges of Q to it instead, and it would still be a minimax path, as we did not include an edge with a greater weight than one that was already on that path.
Note that this replacement will make the minimax path longer, but that is not an issue. There can be several paths between two vertices that all minimise the maximum edge weight -- there is no requirement that the minimax path be the shortest of those.
For every edge A on a minimax path that is not in T, an edge replacement can be done as described in point 2, thereby creating a minimax path that will be completely on T.
Suppose that there is a minimax path outside the minimum spanning tree that does better than the path on the minimum spanning tree. Remove the most costly edge on the minimum spanning tree path from the tree, splitting the graph into two connected components. You can get from one component to another using the minimax path. As you go along this path, there must be one edge that leaves one of the components and enters the other component. Add this edge to the minimum spanning tree. The graph is now connected again and the total cost of the minimum spanning tree has reduced, because every edge on the minimax path had a cost less than that of the most costly edge on the minimum spanning tree. So we have a contradiction and no such minimax path can exist.
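The claim can also be checked empirically on a small graph. The sketch below compares the largest edge on the MST path with a brute-force minimax value over all simple paths. The helper names, the `(weight, a, b)` edge encoding and the sample graph are assumptions made for illustration, and the brute force is only usable on tiny inputs.

```python
from itertools import permutations


def kruskal_tree(n, edges):
    """Return the MST as an adjacency list; edges are (weight, a, b)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = [[] for _ in range(n)]
    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree[a].append((b, w))
            tree[b].append((a, w))
    return tree


def bottleneck_on_tree(tree, src, dst):
    """Largest edge weight on the unique tree path from src to dst."""
    stack = [(src, -1, 0)]
    while stack:
        v, prev, worst = stack.pop()
        if v == dst:
            return worst
        for nxt, w in tree[v]:
            if nxt != prev:
                stack.append((nxt, v, max(worst, w)))
    return None


def bottleneck_brute_force(n, edges, src, dst):
    """Minimax value over every simple src-dst path (src != dst)."""
    weight = {}
    for w, a, b in edges:
        weight[a, b] = weight[b, a] = w
    others = [v for v in range(n) if v not in (src, dst)]
    best = float("inf")
    for k in range(len(others) + 1):
        for mid in permutations(others, k):
            path = (src, *mid, dst)
            steps = list(zip(path, path[1:]))
            if all(s in weight for s in steps):
                best = min(best, max(weight[s] for s in steps))
    return best


edges = [(4, 0, 1), (1, 0, 2), (2, 2, 1), (7, 1, 3), (3, 2, 3), (6, 3, 4), (5, 1, 4)]
tree = kruskal_tree(5, edges)
print(bottleneck_on_tree(tree, 0, 3), bottleneck_brute_force(5, edges, 0, 3))
```

In this sample graph the tree path 0-2-3 and the brute-force search agree on the same bottleneck for nodes 0 and 3, as the proof predicts.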

Dijkstra's shortest path algorithm won't work

I have a graph with positive edge weights and positive node weights. The length of a path is defined as the sum of all the edge weights along the path, plus the maximum node weight encountered along the path.
I'd initially thought that a modified Dijkstra would work, but I found a test case where it would fail. How should I go about solving this problem? Are there any standard algorithms I should look at?
My modified Dijkstra is as follows: At each node I record the shortest path so far, and also the maximum node weight I've seen so far, and use that to calculate the length to neighboring nodes. Please see my comment for the details.
Here's a graph where Dijkstra fails:
The numbers in green are the node labels. Everything in blue is a weight (node and edge weights). Let's say I want to compute the shortest path between nodes 1 and 7 (labeled in green). The problem with Dijkstra is that node 4 always records the path 1-8-9-4, since it's shorter than path 1-2-3-4 (length 9 versus 13). But to reach node 7, path 1-8-9-4-5-6-7 is longer than 1-2-3-4-5-6-7.
If you can accept a running time that is one polynomial order larger, there is a fairly easy algorithm:
ModifiedShortestPath(u, v, G) {
    if there is no path from u to v in G, return infinity
    X = StandardShortestPath(u, v, G);
    E = heaviest edge in X
    F = all edges in G of weight >= weight(E)
    Y = ModifiedShortestPath(u, v, G - F); // recurse on G without the F edges
    return Min(X, Y);
}
The runtime of this is up to |E| times that of your standard shortest path.
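Since the question penalises the maximum node weight rather than an edge weight, here is a sketch of a closely related variant that restricts nodes instead of removing edges: for every distinct node weight used as a cap, run a standard Dijkstra on the nodes at or below that cap and add the cap to the result. This is not the recursion above verbatim; the function names and the dictionary-based graph format are assumptions made for illustration.

```python
import heapq


def dijkstra_restricted(adj, node_w, src, dst, cap):
    """Edge-weight shortest distance using only nodes with weight <= cap."""
    if node_w[src] > cap or node_w[dst] > cap:
        return float("inf")
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj[u]:
            if node_w[v] <= cap and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")


def shortest_with_max_node(adj, node_w, src, dst):
    """Minimise (sum of edge weights) + (maximum node weight on the path)."""
    return min(
        dijkstra_restricted(adj, node_w, src, dst, cap) + cap
        for cap in set(node_w.values())
    )


# Two routes from s to t: a cheap one through a heavy node, and a longer one
# through a light node.
adj = {
    "s": [("a", 1), ("b", 3)],
    "a": [("s", 1), ("t", 1)],
    "b": [("s", 3), ("t", 3)],
    "t": [("a", 1), ("b", 3)],
}
node_w = {"s": 1, "a": 10, "b": 1, "t": 1}
print(shortest_with_max_node(adj, node_w, "s", "t"))
```

Each cap overestimates the true cost of a path only when the path's heaviest node is lighter than the cap, and the cap equal to the optimal path's heaviest node gives the exact optimum, so the minimum over all caps is correct. The cost is one Dijkstra run per distinct node weight.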
Your graph is not that clear to begin with (too many values in blue of unclear role), which makes answers even more difficult. A much better question, a simpler graph and some straight answers in this post.
What made it clear for me, and allowed me to correct my implementation and get the correct results, was that at the end of each repetition in the loop, when it was time to pick the next node/vertex, whose unvisited neighbours I should examine, I had to pick from the whole pool of unvisited vertices, not just from the unvisited neighbours of the currently examined node. I was under the false impression that once you pick a path at a crossroad, because the greedy nature of the algorithm takes you there, you can only follow it to the end, unvisited after unvisited node. No. You pick the next globally unvisited node each time based on the smallest tentative value, regardless of its position in the graph or whether it is connected to the current node.
I hope that clears up the confusion that others like me have experienced and that led them here.

graph - The implementation of updating Minimum Spanning Tree after adding a new edge

Here is an exercise:
Suppose we are given the minimum spanning tree T of a given graph G
(with n vertices and m edges) and a new edge e = (u, v) of weight w
that we will add to G. Give an efficient algorithm to find the minimum
spanning tree of the graph G + e. Your algorithm should run in O(n)
time to receive full credit.
I have this idea:
In the MST, just find out the path between u and v. Then find the edge (along the path) with maximum weight; if the maximum weight is bigger than w, then remove that edge from the MST and add the new edge to the MST.
The tricky part is how to do this in O(n) time, and that is where I get stuck.
The question is how the MST is stored. In the usual Prim's algorithm, the MST is stored as a parent array, i.e., each element is the parent of the corresponding vertex.
So suppose the exercise gives me a parent array representing the MST; how can I implement the above algorithm in O(n)?
First, how can I identify the path between u and v from the parent array? I can build two ancestor arrays for u and v, then look for the common ancestor, and then I have the path, although backwards. I think that for this part, finding the common ancestor, I need at least O(n^2), right?
Then, we have the path. But we still need to find the weight of each edge along the path. Since I suppose the graph uses an adjacency list for Prim's algorithm, we need O(m) (m is the number of edges) to look up the weight of each edge.
So I don't see how it is possible to do the algorithm in O(n). Am I wrong?
The idea you have is right. Note that finding the path between u and v is O(n). I'll assume you have a parent array identifying the MST. Tracking the path (for the max edge) from u to v, or from u to the root vertex, should take only O(n). If you reach the root vertex, just track the path from v to u or to the root vertex.
Now that you have the path from u -> u1 ... -> max_path_vert1 -> max_path_vert2 -> ... -> v, remove the edge max_path_vert1->max_path_vert2 (assuming this is greater than the added edge) and reverse the parents for u->...->max_path_vert1 and mark parent[u] = v.
Edit: More explanation for clarity
Note that in an MST there is exactly one path between any pair of vertices. So, if you trace from u->y and v->y, where y is a common ancestor of u and v (the root, for example), you have traced through at most n vertices. If you had traced more than n vertices, you would have visited a vertex twice, which cannot happen in an MST. OK, now hopefully you're convinced it's O(n) to track from u->y and v->y. Once you have these paths, you have established a path from u->v. Do you see how? I'm assuming this is an undirected graph, since finding an MST for a directed graph is a different concept in itself. In an undirected graph, when you have a path from x->y you also have a path from y->x. So, u->y->v exists. You don't even need to trace back from y->v, since the weights for v->y are the same as those for y->v. Just find the edge with the maximum weight while you trace from u->y and v->y.
Now for finding edge weights in O(1); how are you storing your current weights? Adjacency list or adjacency matrix? For O(1) access, store it the way parent vertex array is stored. So, weight[v] = weight(v, parent[v]). So, you'll have O(1) access. Hope this helps.
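A minimal sketch of the whole O(n) update, assuming the tree is given as two arrays: `parent[x]` (with -1 for the root) and `weight[x]`, the weight of the edge between x and `parent[x]`. The function name and the in-place update convention are hypothetical choices for illustration.

```python
def add_edge_to_mst(parent, weight, u, v, w):
    """Update a parent-array MST in place after adding edge (u, v, w).

    parent[x] is x's parent (-1 for the root); weight[x] is the weight of
    the edge between x and parent[x]. Returns True if the tree changed.
    """
    # Mark every ancestor of u (including u itself): at most n steps.
    on_u_path = set()
    x = u
    while x != -1:
        on_u_path.add(x)
        x = parent[x]

    # Climb from v until we meet u's path; that vertex is the common ancestor.
    y = v
    while y not in on_u_path:
        y = parent[y]

    # Find the heaviest tree edge on u..y and v..y, named by its child vertex.
    heaviest, side = None, None
    for start in (u, v):
        x = start
        while x != y:
            if heaviest is None or weight[x] > weight[heaviest]:
                heaviest, side = x, start
            x = parent[x]
    if heaviest is None or weight[heaviest] <= w:
        return False      # u == v, or the new edge does not improve the tree

    # Cut the edge above `heaviest`, then reverse the parent pointers on the
    # chain side..heaviest so the detached piece hangs below the new edge.
    other = v if side == u else u
    prev, prev_w, x = other, w, side
    while True:
        nxt, nxt_w = parent[x], weight[x]
        parent[x], weight[x] = prev, prev_w
        if x == heaviest:
            break
        prev, prev_w, x = x, nxt_w, nxt
    return True
```

Every loop walks a single root-ward chain, so the total work is linear in the number of vertices. Storing the weight next to the parent pointer is what makes each edge-weight lookup O(1).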
Well - your solution is correct.
But regarding implementation, I don't see why you are using G instead of T to find the path between u and v. Any search traversal of T for the path between u and v takes O(n). That is, you can take v as the root and perform a depth-first search [in this case, you treat all neighbors of v as children], stopping the DFS once you find u; the nodes on the stack then correspond to the path between u and v.
It is easy afterward to find the cost of each edge in the path (O(n)), and it is easy as well to delete/add edges. In total O(n).
Does that help somehow ?
Or maybe you are getting O(n^2), as far as I understand, because you access the children of a vertex v in T in O(n). In that case, represent your data structure as a mapped array so that the cost is reduced to O(1) [for instance, {a,b,c,u,w} (vertices) -> {0,1,2,3,4} (indices of vertices)].

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
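The three steps (collapse, minimum spanning tree, expand) can be sketched for an unweighted graph as follows. The function names and the adjacency-dictionary format are assumptions made for illustration. Distances come from one BFS per required node, and Prim's algorithm is run directly on those distances instead of building G' explicitly.

```python
from collections import deque


def bfs_parents(adj, src):
    """BFS from src; returns {node: predecessor} (src maps to None)."""
    pred = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in pred:
                pred[v] = u
                queue.append(v)
    return pred


def path_to(pred, dst):
    """Rebuild the path src..dst from a BFS predecessor map."""
    path = []
    while dst is not None:
        path.append(dst)
        dst = pred[dst]
    return path[::-1]


def steiner_via_terminal_mst(adj, terminals):
    """Heuristic: MST over terminal-to-terminal distances, expanded to paths."""
    terminals = list(terminals)
    preds = {t: bfs_parents(adj, t) for t in terminals}
    # Prim's algorithm on the complete "distance graph" over the terminals.
    in_tree = {terminals[0]}
    nodes, edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        a, b, path = min(
            ((a, b, path_to(preds[a], b))
             for a in in_tree for b in terminals if b not in in_tree),
            key=lambda item: len(item[2]),
        )
        in_tree.add(b)
        nodes.update(path)                                  # expand: vertices
        edges.update(frozenset(e) for e in zip(path, path[1:]))  # and edges
    return nodes, edges
```

The expanded paths may overlap, so the result is a connected subgraph that is not guaranteed to be a tree or to be optimal. This version rebuilds candidate paths on every round for brevity; caching the distances would be the obvious improvement.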
The easiest solutions are the following:
a) based on MST:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from T and from V'.
- repeat loop until all leaves in T are in N.
b) another solution, based on a shortest-paths tree:
- at the beginning of the algorithm, compute all shortest paths in G using any known efficient algorithm.
- pick any node in N, call it v, and let v be the root of a tree T = {v}.
- remove v from N.
- loop:
1) select the shortest path between any node in T and any node in N: a path p = {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V' (and to T).
3) every node in p that is in N is deleted from N.
- repeat loop until N is empty.
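Option (a) is short enough to sketch directly. In an unweighted graph every spanning tree has the same weight, so this sketch assumes a BFS tree is an acceptable stand-in for the MST. The function name and the adjacency-dictionary format are hypothetical.

```python
from collections import deque


def prune_spanning_tree(adj, required):
    """Build a BFS spanning tree, then strip leaves that are not required."""
    required = set(required)
    root = next(iter(required))
    tree = {root: set()}                 # node -> set of tree neighbours
    queue = deque([root])
    while queue:                         # BFS tree: an MST when all weights are 1
        u = queue.popleft()
        for v in adj[u]:
            if v not in tree:
                tree[v] = {u}
                tree[u].add(v)
                queue.append(v)

    # Repeatedly delete leaves that are not in the required set.
    leaves = [v for v in tree if len(tree[v]) <= 1 and v not in required]
    while leaves:
        v = leaves.pop()
        for nb in tree.pop(v, ()):
            tree[nb].discard(v)
            if len(tree[nb]) <= 1 and nb not in required:
                leaves.append(nb)        # nb has just become a removable leaf
    return set(tree)


# A path 0-1-2-3-4 with an extra branch 2-5; only 0 and 3 are required.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(sorted(prune_spanning_tree(adj, {0, 3})))
```

On graphs with cycles the result depends on which spanning tree the search happens to build, so this is a heuristic and can return more nodes than necessary.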
Personally, I used the following algorithm in one of my papers, but it is more suitable for distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(u) = 1.
We create the pair (w(u), id(u)) for each node u.
Each node u builds a multipoint relay set, that is, a set M(u) of 1-hop neighbors such that each 2-hop neighbor is a neighbor of at least one node in M(u) [the smaller M(u), the better the solution].
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors,
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest pair (w(v), id(v)).
-- the trick when you execute this algorithm in a centralized manner is to compute 2-hop neighbors efficiently. The best I could do was to improve from O(n^3) to O(n^2.37) using matrix multiplication.
-- I would really like to know the approximation ratio of this last solution.
I like this reference for heuristics of steiner tree:
The Steiner Tree Problem, by Frank Hwang, Dana Richards, and Pawel Winter
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.
