I am looking for an algorithm that takes an undirected graph as input and finds a subset of vertices such that the subgraph induced by those vertices is a tree (i.e., connected and acyclic).
For instance, in the following figure the 'X' nodes would create a valid solution, but including any of the 'O' nodes would make it invalid.
O
/|
O-X-X-X
\ /
X-X
The usefulness of a solution to me is proportional to the size of the subset. Although I don't need the exact maximum subset, a close approximation would be very helpful.
I've tried the obvious algorithm of starting with a random node and repeatedly adding adjacent vertices as long as they don't induce a cycle. However, I have the feeling that this produces very suboptimal trees.
I should mention that my particular application involves graphs of ~100 nodes and ~1000 edges. This is small enough that brute-force backtracking algorithms might be feasible if well implemented (e.g. using Dancing Links), but I haven't tried this out.
This problem is very similar to Feedback Vertex Set, and unfortunately it's NP-hard. According to the Wikipedia page, the best known approximation algorithm has an approximation ratio of 2: Becker, Ann; Geiger, Dan (1996), "Optimization of Pearl's method of conditioning and greedy-like approximation algorithms for the vertex feedback set problem".
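Below is a minimal Python sketch of that greedy approach. It assumes the graph is a dict that maps each vertex to a set of neighbours. The function name and this representation are illustrative, not taken from the question. The sketch relies on one fact: adding an outside vertex keeps the induced subgraph a tree exactly when that vertex has one neighbour already in the tree.

```python
import random


def greedy_induced_tree(adj, start=None, rng=None):
    """Grow an induced tree greedily (hypothetical helper).

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    A vertex outside the tree can join without creating a cycle exactly
    when it has one neighbour inside the tree (two or more would close a
    cycle, since the tree is connected).
    """
    rng = rng or random.Random()
    if start is None:
        start = rng.choice(sorted(adj))
    tree = {start}
    while True:
        # Outside vertices with exactly one neighbour in the current tree.
        candidates = [u for u in adj if u not in tree and len(adj[u] & tree) == 1]
        if not candidates:
            return tree
        tree.add(rng.choice(candidates))
```

The result depends heavily on the starting node and on the random choices. A cheap improvement is to run it from many starts and keep the largest tree found.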
NP-hardness proof for "Connected Feedback Vertex Set"
Initially I neglected the condition that the resulting graph needs to be connected, which is not required in Feedback Vertex Set (FVS). Below I'll show that your problem, which I'll call Connected Feedback Vertex Set (CFVS), is nevertheless NP-hard.
Given an instance (G = (V, E), k) of FVS, we need to construct an instance (G' = (V', E'), k') of CFVS with the property that (G, k) is a YES-instance of FVS if and only if (G', k') is a YES-instance of CFVS. Informally this G' will look like a "stack of copies" of G, with a few extra vertices and edges. Let's do this as follows:
For each vertex v_i in V, create a path (not a clique, as I originally said in the comments...) of |V| vertices v'_i_j in V', 1 <= j <= |V|. These are the "meat vertices". (You can think of vertex v'_i_j being in "layer" j.) The vertices v'_i_1, v'_i_2, ..., v'_i_|V| are the "strand" of meat corresponding to vertex v_i in G (yes, terrible name...).
For each edge (v_i, v_j) in E, create all |V| corresponding "parallel" edges between the corresponding vertices in G' -- that is, create the edges (v'_i_1, v'_j_1), (v'_i_2, v'_j_2), ..., (v'_i_|V|, v'_j_|V|). (These edges all connect vertices that are in the same layer.)
For each vertex v_i in V, also create an additional "skeleton vertex" u'_i in V'. Make this u'_i adjacent to v'_i_1.
Add another vertex r to V', and make it adjacent to every skeleton vertex u'_i.
Finally, set k' = |V|*k + |V| - 1.
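To make the construction concrete, here is a Python sketch that builds G' and k' from an FVS instance. The vertices of G are assumed to be numbered 0..|V|-1. The vertex encoding (tuples such as ("meat", i, j) for v'_i_j) and the function name are hypothetical choices for illustration.

```python
def build_cfvs_instance(n, edges, k):
    """Build (G', k') from an FVS instance (G, k), following the steps above.

    G has vertices 0..n-1 and undirected `edges` given as pairs (i, j).
    Hypothetical vertex names: ("meat", i, j) for v'_i_j (layer j = 1..n),
    ("skel", i) for u'_i, and "r" for the root.
    Returns (adjacency dict of neighbour sets, k').
    """
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for i in range(n):
        for j in range(1, n + 1):
            adj.setdefault(("meat", i, j), set())
        # Strand of v_i: the path v'_i_1 - v'_i_2 - ... - v'_i_n.
        for j in range(1, n):
            add_edge(("meat", i, j), ("meat", i, j + 1))
        # Skeleton vertex u'_i, adjacent to v'_i_1 and to the root r.
        add_edge(("skel", i), ("meat", i, 1))
        add_edge(("skel", i), "r")
    # One parallel copy of each edge of G in every layer.
    for a, b in edges:
        for j in range(1, n + 1):
            add_edge(("meat", a, j), ("meat", b, j))
    return adj, n * k + n - 1
```

For |V| = n the result has n^2 + n + 1 vertices, so the construction is clearly polynomial.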
First I'll show that if the FVS instance (G, k) is a YES-instance, then (G', k') is a YES-instance of your problem. Let X be any solution (i.e., set of deleted vertices) to the FVS instance (G, k) that leaves at least 1 vertex of G undeleted (such a solution must exist, since a 1-vertex graph contains no cycle); then we can construct a solution X' to the instance of your problem as follows:
For each vertex v_i deleted in the FVS solution X, we delete the corresponding path v'_i_1, ..., v'_i_|V| from G', at a total cost of at most |V|*k (deleting each path costs |V| vertex deletions, and at most k vertices were deleted from G by X). This guarantees that there will be no cycle consisting only of meat vertices in G'-X' (if there were, this would contradict the feasibility of the FVS solution X to (G, k)).
For each connected component of G-X, we delete all but 1 of the corresponding skeleton vertices in G'. What remains of G' is a stack of |V| copies of G-X, plus a single skeleton vertex per component of G-X, plus the root vertex r. Since there is only a single path to r from each connected component (via its single remaining skeleton vertex), there can be no cycle in G'-X'. Since G-X contains at least 1 connected component, this step involves at most |V|-1 deletions, so at most |V|*k + |V| - 1 deletions are needed overall, and the answer to the constructed CFVS instance (G', k') is YES.
Secondly I'll show that if the constructed instance (G', k') of your problem is a YES-instance, then the original instance (G, k) of FVS is a YES-instance.
Let X' be any solution (i.e., set of deleted vertices) to the constructed instance (G', k') of CFVS. Consider the subgraph induced by each layer of meat vertices in G'-X': there are |V| such layers. In general, different layers could contain different numbers of deletions. Choose any layer j that contains a minimum number of deletions; since G'-X' is cycle-free, so is every induced subgraph, including in particular layer j. The number of deletions in layer j is at most k'/|V|, since otherwise (by the minimal choice of j) there would be strictly more than k' deletions overall, a contradiction. But any integer <= k'/|V| must be <= RoundDown((|V|*k + |V| - 1) / |V|) = k, and layer j is just a copy of the original FVS problem (G, k), so it is possible to destroy every cycle in layer j -- and thus in the original FVS instance (G, k) -- with at most k deletions. This implies that (G, k) is a YES-instance of FVS.
(G, k) being a YES-instance of FVS implies that (G', k') is a YES-instance of CFVS, and vice versa. Hence (G, k) being a NO-instance of FVS implies that (G', k') is a NO-instance of CFVS, so the two instances are equivalent. Clearly (G', k') can be constructed in polynomial time from (G, k), so it follows that CFVS is NP-hard. It's also clearly NP-complete, since a solution to a YES-instance can be checked for correctness (that is, its size, cycle-freeness and connectedness) in O(|V|+|E|) time with a single DFS.
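The linear-time check can be sketched in Python as follows. This is a hypothetical helper, with adjacency given as a dict of neighbour sets. It does not look for back edges during the DFS. Instead it uses the equivalent test that a graph is a tree iff it is connected and has exactly one edge fewer than it has vertices. The same helper can verify a candidate answer to the original question: pass every vertex outside the chosen subset as `deleted`.

```python
def is_tree_after_deletion(adj, deleted):
    """Check in O(|V| + |E|) that G - deleted is connected and acyclic.

    adj: dict vertex -> set of neighbours (undirected); deleted: a set.
    """
    remaining = [v for v in adj if v not in deleted]
    if not remaining:
        return False  # assumption: the empty graph does not count as a tree
    # Each edge of the induced subgraph is counted once from each endpoint.
    edge_ends = sum(1 for v in remaining for u in adj[v] if u not in deleted)
    # Iterative DFS from one remaining vertex to test connectivity.
    start = remaining[0]
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in deleted and u not in seen:
                seen.add(u)
                stack.append(u)
    # Connected with exactly |V| - 1 edges <=> tree.
    return len(seen) == len(remaining) and edge_ends // 2 == len(remaining) - 1
```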
It sounds like finding a maximum spanning tree and then selecting a subset based on its structure might help.
If not, try this:
1) Find the node with the fewest edges.
2) If adding it to the subset still leaves you with a tree, mark it as belonging to the subset and "erase" all its edges from the graph.
3) Repeat step 1 until none of the nodes outside the subset can be added.
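The steps above leave some room for interpretation. The following Python sketch implements one reading of them. It starts from a vertex of minimum degree. It treats "erasing" a subset vertex's edges as no longer counting them, so a vertex's current degree is its number of neighbours outside the subset. At each step it adds the vertex with the smallest current degree among those that keep the subset an induced tree (exactly one neighbour inside it). The function name and this interpretation are assumptions.

```python
def min_degree_greedy_tree(adj):
    """Greedy induced tree under one interpretation of the steps above.

    adj: dict vertex -> set of neighbours (undirected).
    Assumption: a vertex's "current" degree is its number of neighbours
    outside the subset, i.e. edges of subset vertices count as erased.
    """
    start = min(adj, key=lambda v: len(adj[v]))
    tree = {start}
    while True:
        # Vertices whose addition keeps the subset an induced tree.
        candidates = [u for u in adj if u not in tree and len(adj[u] & tree) == 1]
        if not candidates:
            return tree
        # Prefer the candidate with the fewest remaining (non-erased) edges.
        best = min(candidates, key=lambda u: len(adj[u] - tree))
        tree.add(best)
```

Preferring candidates with few outside neighbours is meant to make it less likely that later vertices are blocked by cycles. It is still a heuristic with no quality guarantee.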
Hope this helps
Related
Say I have some graph with nodes and undirected edges (the edges may have weights associated with them).
I want to find all (or at least one) connected subgraphs that maximize the sum of the degree centrality of all nodes in the subgraph (the degree centrality is based on the original graph) under the constraint that the sum of the edge weights is < X.
Is there an algorithm that will do this?
A quick search took me to this description of degree centrality. It turns out that the "degree centrality" of a vertex is simply its degree (neighbour count).
Unfortunately your problem is NP-hard, so it's very unlikely that any algorithm exists that can solve every instance quickly. First notice that, assuming edge weights are positive, the edges in any optimal solution necessarily form a tree, since in any non-tree you can delete at least 1 edge without destroying connectivity, and doing so will decrease the total edge weight of the subgraph. (So, as a positive spinoff: If you compute the minimum spanning tree of your input graph and find that it happens to have total weight < X, then you can simply include every vertex in the graph in your solution.)
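The spinoff check in the last sentence can be sketched with Kruskal's algorithm and a union-find structure. This minimal Python sketch uses hypothetical function names. It assumes vertices are numbered 0..n-1 and edges are given as (a, b, w) triples.

```python
def mst_weight(n, weighted_edges):
    """Kruskal's algorithm with union-find on vertices 0..n-1.

    weighted_edges: iterable of (a, b, w). Returns the total MST weight,
    or None if the graph is disconnected.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, used = 0, 0
    for w, a, b in sorted((w, a, b) for a, b, w in weighted_edges):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
            used += 1
    return total if used == n - 1 else None


def whole_graph_fits(n, weighted_edges, X):
    """True if the MST weighs < X, so every vertex can be included."""
    w = mst_weight(n, weighted_edges)
    return w is not None and w < X
```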
Let's formulate a decision version of your problem. Given a graph G = (V, E) with positive (I'll assume) weights on the edges, a number X and a number Y, we want to know: Does there exist a connected subgraph G' = (V', E') of G such that the sum of the edge weights in E' is at most X, and the sum of the degrees of V' (w.r.t. G) is at least Y? (Clearly this is no harder than your original problem: If you had an algorithm to solve your problem, then you could just run it, add up the degrees of the vertices in the subgraph it found and compare this to Y to answer "my" problem.)
Here's a reduction from the NP-hard Steiner Tree in Graphs problem, where we are given a graph G = (V, E) with positive weights on the edges, a subset S of its vertices, and a number k, and the task is to determine whether it's possible to connect the vertices in S using a subset of edges with total weight at most k. (As I showed above, the solution will necessarily be a tree.) If the sum of all degrees in G is d, then all we need to do to transform G into an input for your problem is the following: For each vertex s_i in S we add enough new "ballast" vertices that are each connected only to s_i, via edges with weight X+1, to bring the degree of s_i up to d+1. We set X to k, and set Y to |S|(d+1).
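The construction can be sketched in Python as follows. The function name is hypothetical. The Steiner instance is assumed to have vertices 0..n-1 and edges (a, b, w), and new ballast vertices are numbered from n upwards. Note that d, the sum of all degrees in G, equals twice the number of edges.

```python
def steiner_to_degree_instance(n, weighted_edges, S, k):
    """Turn a Steiner Tree instance (G, S, k) into (G2, X, Y) as described.

    Returns (number of vertices of G2, edge list of G2, X, Y).
    Each terminal s gets d + 1 - deg(s) ballast leaves, each attached by
    an edge of weight X + 1, so every terminal ends up with degree d + 1.
    """
    degree = [0] * n
    for a, b, _ in weighted_edges:
        degree[a] += 1
        degree[b] += 1
    d = sum(degree)
    X = k
    new_edges = list(weighted_edges)
    next_vertex = n
    for s in S:
        for _ in range(d + 1 - degree[s]):
            new_edges.append((s, next_vertex, X + 1))
            next_vertex += 1
    Y = len(S) * (d + 1)
    return next_vertex, new_edges, X, Y
```

The instance grows by at most |S|(d+1) vertices, which is polynomial in the input size. This is what the Knapsack attempt in the sidenote below lacks.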
Now suppose that the solution to the Steiner Tree problem is YES -- that is, there exists a subset of edges having total weight <= k that does connect all the vertices in S. In that case, it's clear that the same subgraph in the instance of your problem constructed above connects (possibly among others) all the vertices in S, and since each vertex in S has degree d+1, the total degree is at least |S|(d+1), so the answer to your decision problem is also YES.
In the other direction, suppose that the answer to your decision problem is YES -- that is, there exists a subset of edges having total weight <= X ( = k) that connects a set of vertices having total degree at least |S|(d+1). We need to show that this implies a YES answer to the original Steiner Tree problem. Clearly it suffices to show that the vertex set V' of any subgraph satisfying the conditions above (i.e. edges have total weight <= k and vertices have total degree >= |S|(d+1)) contains S (possibly among other vertices). So let V' be the vertex set of such a solution, and suppose to the contrary that there is some vertex u in S that is not in V'. But then the largest sum of degrees that we could possibly make would be to include all other non-ballast vertices in the graph in V', which would give a degree total of at most (|S|-1)(d+1) + d (the first term is the degree sum for the other vertices in S; the second is an upper bound on the degree sum of all non-S vertices in G; note that none of the ballast vertices we added in could be in the subgraph, because the only way to include any of them is to use an edge of weight X+1, which we obviously can't do). But clearly (|S|-1)(d+1) + d = |S|(d+1) - 1, which is strictly less than |S|(d+1), contradicting our assumption that V' has a degree total at least |S|(d+1). So it follows that S is a subset of V', and thus that it is possible to use the same subset of edges to connect the vertices in S for a total weight of at most k, i.e. that the answer to the Steiner Tree problem is also YES.
So a YES answer to either problem implies a YES answer to the other one, in turn implying that a NO answer to either implies a NO answer to the other. Thus if it were possible to solve the decision version of your problem in polynomial time, it would imply a polynomial-time solution to the NP-hard Steiner Tree in Graphs problem. This means the decision version of your problem is itself NP-hard, and so is the optimisation version (which as I said above is at least as hard). (The decision form is also NP-complete, since a YES answer can be easily verified in polynomial time.)
Sidenote: At first I thought I had a very straightforward reduction from the NP-hard Knapsack problem: Given a list of n weights w_1, ..., w_n and a list of n profits p_1, ..., p_n, make a single central vertex c, and n other vertices v_1, ..., v_n. For each v_i, attach it to c with an edge of weight w_i, and add p_i other leaf vertices, each attached only to v_i with an edge of weight X+1. However this reduction doesn't actually work, because the profits can be exponential in the input size n, meaning that the constructed instance of your problem might need to have an exponential number of vertices, which isn't allowed for a polynomial-time reduction.
Given an undirected, connected graph G = (V, E), a vertex in V(G), call it v, and a weight function f: E -> R+ (positive real numbers), I need to find an MST in which v's degree is minimal. I've already noticed that if all edges have distinct weights, the MST is unique, so I believe it has something to do with repeated edge weights. I thought about running Kruskal's algorithm, but when sorting the edges, always placing the edges incident to v last among edges of equal weight. For example, if (a,b), (c,d), (v,e) are the only edges of weight k, then the possible orderings of these edges in the sorted edge array are {(a,b),(c,d),(v,e)} or {(c,d),(a,b),(v,e)}. I've run this variation on several graphs and it seems to work, but I couldn't prove it. Does anyone know how to prove the algorithm correct (meaning proving that v's degree is minimal), or give a counterexample where the algorithm fails?
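The tie-breaking rule can be expressed as a secondary sort key, which makes the variant easy to experiment with. This is a minimal Python sketch. It assumes vertices 0..n-1 and edges as (a, b, w) triples, and the function name is hypothetical.

```python
def mst_min_degree_at(n, weighted_edges, v):
    """Kruskal's algorithm where, among edges of equal weight, edges
    incident to v are considered last. Returns the chosen edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Sort by weight, then False (not incident to v) before True.
    order = sorted(weighted_edges, key=lambda e: (e[2], v in (e[0], e[1])))
    chosen = []
    for a, b, w in order:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            chosen.append((a, b, w))
    return chosen
```

On a 4-cycle with equal weights, plain Kruskal can give v degree 2 if v's edges happen to come first. The tie-broken version always gives degree 1.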
First note that Kruskal's algorithm can be applied to any weighted graph, whether or not it is connected. In general it results in a minimum-weight spanning forest (MSF), with one MST for each connected component. To prove that your modification of Kruskal's algorithm succeeds in finding the MST for which v has minimal degree, it helps to prove the slightly stronger result that if you apply your algorithm to a possibly disconnected graph then it succeeds in finding the MSF where the degree of v is minimized.
The proof is by induction on the number, k, of distinct weights.
Basis Case (k = 1). In this case weights can be ignored and we are trying to find a spanning forest in which the degree of v is minimized. In this case, your algorithm can be described as follows: pick edges for as long as possible according to the following two rules:
1) No selected edge forms a cycle with previously selected edges.
2) An edge involving v isn't selected unless every edge that doesn't involve v would violate rule 1.
Let G' denote the graph obtained from G by removing v and all its incident edges. It is easy to see that in this special case the algorithm works as follows. It starts by creating a spanning forest for G'. Then it takes those trees of the forest that are contained in v's connected component in the original graph G and connects each of them to v by a single edge. The trees connected to v in the second stage cannot be connected to each other in any other way, because any connecting edge not involving v would already have been selected in the first stage, by rule 2. So it is easy to see that the degree of v is minimal.
Inductive Case: Suppose that the result is true for k, and let G be a weighted graph with k+1 distinct weights and v a specified vertex in G. Sort the distinct weights in increasing order, so that the (k+1)-th weight, w_{k+1}, is the largest. Let G' be the subgraph of G with the same vertex set but with all edges of weight w_{k+1} removed. Since the edges are sorted in order of increasing weight, the modified Kruskal's algorithm in effect starts by applying itself to G'. Thus, by the induction hypothesis, prior to considering edges of weight w_{k+1} the algorithm has constructed an MSF F' of G' for which the degree d' of v in G' is minimized.
As a final step, modified Kruskal's applied to the overall graph G merges certain trees of F' together by adding edges of weight w_{k+1}. One way to conceptualize this final step is to think of F' as a graph in which two trees are connected exactly when there is an edge of weight w_{k+1} from some node in the first tree to some node in the second tree. This is (almost) the basis case, applied to F'. Modified Kruskal's will add edges of weight w_{k+1} until it can't do so anymore. It won't add an edge incident to v unless there is no other way to connect two trees of F' that need to be connected to get a spanning forest for the original graph G.
The final degree of v in the resulting MSF is d = d' + d", where d" is the number of edges of weight w_{k+1} added at the final step. Neither d' nor d" can be made any smaller, hence d can't be made any smaller either. This is because the degree of v in any spanning forest can be written as the number of edges of weight less than w_{k+1} incident to v plus the number of edges of weight w_{k+1} incident to v.
QED.
There is still an element of hand-waving in this, especially with the final step -- but Stack Overflow isn't a peer-reviewed journal. Anyway, the overall logic should be clear enough.
One final remark -- it seems fairly clear that Prim's algorithm can be similarly modified for this problem. Have you looked into that?
I want to decompose a directed acyclic graph into the minimum number of components such that each component has the following property:
For every pair of vertices (u,v) in a component, there is a path from u to v or from v to u.
Is there any algorithm for this?
I know that when "or" is replaced by "and" in the condition, it is the same as finding the strongly connected components (which is possible using DFS).
EDIT: What happens if the directed graph contains cycles (i.e. it is not acyclic)?
My idea is to order the graph topologically in O(n + m) time using DFS, and then think about for which vertices this property can be false. It can be false for vertices where two different branches join, or where the graph splits into two different branches.
I would start from any vertex that is lowest in the topological ordering and follow its path, choosing branches at random, until you cannot go any further. Then delete this path from the graph (this is the first component). This would be repeated until the graph is empty and you have all such components.
It seems like a greedy algorithm. Consider what happens if you find a very short path in the first run (by bad luck) or the longest path (by good luck): either way, you would still have to pick up the small leftover branch as a component in a later step of the algorithm.
Complexity would be O(n*number of components).
With the "and" condition, you should be considering general directed graphs, since a DAG cannot have a strongly connected component with more than one vertex.
The two existing answers both have problems that I've outlined in comments. But there's a more fundamental reason why no decomposition into components can work in general. First, let's concisely express the relation "u and v belong in the same component of the decomposition" as u # v.
It's not transitive
In order to represent a relation # as membership in the same component, that relation must be an equivalence relation. Among other things, this means it must be transitive: if x # y and y # z, then x # z must necessarily hold. Is our relation # transitive? Unfortunately the answer is "No". There may be a path from x to y (so that x # y) and a path from z to y (so that y # z), but no path from x to z or from z to x (so that x # z does not hold), as the following graph shows:
       z
       |
       |
       v
x----->y
The problem is that according to the above graph, x and y belong in the same component, and y and z belong in the same component, but x and z belong in different components, which is a contradiction. This means that, in general, it's impossible to represent the relationship # as a decomposition into components.
If an instance happens to be transitive
So there is no solution in general -- but there can still be input graphs for which the relation # happens to be transitive, and for which we can therefore compute a solution. Here is one way to do that (though probably not the most efficient way).
Compute shortest paths between all pairs of vertices (using e.g. the Floyd-Warshall algorithm, in O(n^3) time for n vertices). Now, for every vertex pair (u, v), either d(u, v) = inf, indicating that there is no way to reach v from u at all, or not, indicating that there is some path from u to v. To answer the question "Does u # v hold?" (i.e., "Do u and v belong in the same component of the decomposition?"), we can simply calculate d(u, v) != inf || d(v, u) != inf.
This gives us a relation that we can use to build an undirected graph G'. G' has a vertex u' for each original vertex u, and an edge between two vertices u' and v' if and only if d(u, v) != inf || d(v, u) != inf. The relation # is transitive exactly when every connected component of this new graph is a clique. This property can be checked in O(n^2) time. First perform a series of DFS traversals (one from each vertex not yet labelled) to assign a component label to each vertex. Then check that each pair of vertices belongs to the same component if and only if they are connected by an edge. If the property holds then the resulting cliques correspond to the desired decomposition; otherwise, there is no valid decomposition.
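Here is a Python sketch of this check. It assumes vertices 0..n-1 and arcs given as (u, v) pairs, and the function name is hypothetical. Only whether d(u, v) is finite matters, so the sketch uses Warshall's boolean variant of Floyd-Warshall, still O(n^3), instead of computing actual distances. It returns None when no valid decomposition exists.

```python
def decompose_if_transitive(n, arcs):
    """Return the components as sorted vertex lists, or None if # is not
    transitive. Vertices are 0..n-1; arcs are directed pairs (u, v)."""
    # reach[u][v] is True iff there is a path from u to v (Warshall).
    reach = [[u == v for v in range(n)] for u in range(n)]
    for u, v in arcs:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                row_i, row_k = reach[i], reach[k]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    # u # v iff u reaches v or v reaches u.
    related = [[reach[u][v] or reach[v][u] for v in range(n)] for u in range(n)]
    # Label connected components of the undirected graph G' by DFS.
    label = [-1] * n
    components = []
    for s in range(n):
        if label[s] == -1:
            label[s] = len(components)
            comp, stack = [s], [s]
            while stack:
                u = stack.pop()
                for v in range(n):
                    if related[u][v] and label[v] == -1:
                        label[v] = label[s]
                        comp.append(v)
                        stack.append(v)
            components.append(sorted(comp))
    # Valid iff every component is a clique: same label <=> related.
    for u in range(n):
        for v in range(n):
            if (label[u] == label[v]) != related[u][v]:
                return None
    return components
```

The x -> y <- z example above yields None. The transitive tournament from the next paragraph yields a single component.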
Interestingly, there are graphs that are not chains of strongly connected components (as claimed by Zotta), but which nonetheless do have transitive # relations. For example, a tournament is a digraph in which there is an edge, in some direction, between every pair of vertices -- so clearly # holds for every pair of vertices in such a graph. But if we number the vertices 1 to n and include only edges from lower-numbered to higher-numbered vertices, there will be no cycles, and thus the graph is not strongly connected (and if n > 2, then clearly it's not a path).
Given a connected undirected vertex-labelled graph, an edge in the graph, and some number n ≥ 2, is there an efficient algorithm to find all the connected (but not necessarily induced) subgraphs of order n that contain that edge?
I thought the obvious thing to do would be a sort of recursive traversal of the edges, starting from the given edge and taking care that no one path uses an edge more than once (though visiting the same vertex multiple times is permitted). After traversing each edge I check how many vertices are on the path. If it's less than n, I proceed with the traversal. If it's equal to n, I yield the subgraph corresponding to the path and proceed with the traversal. If it's greater than n, I stop. However, this method is inefficient as it produces many of the same subgraphs over and over.
(If it matters, all vertices in the original graph are of degree 8, and I will be looking for subgraph orders up to 15.)
David Eisenstat's comment to my question suggested adapting his answer to a similar question. The following pseudocode is that adaptation, which I've verified works and doesn't generate any one subgraph more than once. In it n refers to the order of the subgraphs to be returned (the test for which can be omitted if you want to return all subgraphs) and neighbours(e) means the set of edges which share an endpoint with e. When the function is first called, the subgraphEdges parameter should contain the given edge, and the rest of the parameters should be set accordingly.
GenerateConnectedSubgraphs(edgesToConsider,
                           subgraphVertices,
                           subgraphEdges,
                           neighbouringEdges):
    let edgeCandidates ← edgesToConsider ∩ neighbouringEdges
    if edgeCandidates = ∅
        if |subgraphVertices| = n
            yield (subgraphVertices, subgraphEdges)
    else
        choose some e ∈ edgeCandidates
        GenerateConnectedSubgraphs(edgesToConsider ∖ {e},
                                   subgraphVertices,
                                   subgraphEdges,
                                   neighbouringEdges)
        GenerateConnectedSubgraphs(edgesToConsider ∖ {e},
                                   subgraphVertices ∪ endpoints(e),
                                   subgraphEdges ∪ {e},
                                   neighbouringEdges ∪ neighbours(e))
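A direct Python translation of the pseudocode may help with testing. Edges are represented as frozensets {a, b}, and the wrapper name is hypothetical. One addition is not in the pseudocode: since the vertex set only ever grows, a branch is abandoned as soon as it exceeds n vertices. This prunes work without losing any output.

```python
def connected_subgraphs_containing(edges, start_edge, n):
    """Yield every connected subgraph with exactly n vertices that contains
    start_edge, as (vertex frozenset, edge frozenset) pairs.

    edges: iterable of undirected edges as frozensets {a, b};
    start_edge: one of those frozensets.
    """
    edges = set(edges)
    incident = {}
    for e in edges:
        for x in e:
            incident.setdefault(x, set()).add(e)

    def neighbours(e):
        # Edges sharing an endpoint with e, excluding e itself.
        return set().union(*(incident[x] for x in e)) - {e}

    def generate(to_consider, verts, sub_edges, neighbouring):
        if len(verts) > n:
            return  # pruning: vertices are never removed again
        candidates = to_consider & neighbouring
        if not candidates:
            if len(verts) == n:
                yield frozenset(verts), frozenset(sub_edges)
            return
        e = next(iter(candidates))
        rest = to_consider - {e}
        # Branch 1: e is excluded for good.
        yield from generate(rest, verts, sub_edges, neighbouring)
        # Branch 2: e and its endpoints are added.
        yield from generate(rest, verts | e, sub_edges | {e},
                            neighbouring | neighbours(e))

    yield from generate(edges - {start_edge}, set(start_edge),
                        {start_edge}, neighbours(start_edge))
```

Each subgraph is produced once, because two branches always differ in whether the chosen edge e is included. The recursion depth grows with the number of edges near the subgraph, so very large inputs may hit Python's default recursion limit.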
I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) into a weighted complete graph G'(N, D). Here N is the subset of vertices you want to cover, and D holds the shortest-path distances (path lengths) between the corresponding vertices in the original graph. This "collapses" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
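Below is a Python sketch of these three steps for an unweighted, connected graph given as a dict of neighbour sets. BFS supplies the path lengths, and Prim's algorithm is run on the implicit complete graph G'(N, D). The function names are hypothetical. The expanded subgraph is returned as in the description and may contain cycles where expanded paths overlap.

```python
from collections import deque


def bfs_tree(adj, src):
    """Unweighted shortest paths from src: returns (dist, parent) dicts."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                parent[w] = u
                queue.append(w)
    return dist, parent


def steiner_mst_heuristic(adj, terminals):
    """MST over the terminals with shortest-path distances as weights,
    then expand every MST edge into its path in G. Returns (V', E')."""
    terminals = list(terminals)
    trees = {t: bfs_tree(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    verts, edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        # Prim's step: cheapest G' edge leaving the current MST.
        a, b = min(((a, b) for a in in_tree for b in terminals if b not in in_tree),
                   key=lambda p: trees[p[0]][0][p[1]])
        in_tree.add(b)
        # Expand: walk from b back to a along a's BFS tree.
        parent = trees[a][1]
        x = b
        while x != a:
            verts.add(x)
            edges.add(frozenset((x, parent[x])))
            x = parent[x]
    return verts, edges
```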
This algorithm is easy to trip up into giving suboptimal solutions. Example case: an equilateral triangle with vertices at the corners, at the midpoints of the sides and in the middle of the triangle, with edges along the sides and from the corners to the middle. To connect the corners it's enough to add the single middle vertex, but this algorithm might choose paths along the sides instead. Nonetheless, if the graph is dense, it should work OK.
The simplest solutions are the following:
a) based on an MST:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E), call it T.
- loop: for every leaf v of T that is not in N, delete v from V' (and from T).
- repeat the loop until all leaves of T are in N.
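Solution (a) can be sketched in Python as follows, for a connected graph given as a dict of neighbour sets; the function name is hypothetical. In an unweighted graph every spanning tree is a minimum one, so a BFS tree is used. Deleting a non-terminal leaf can expose a new one, so the sketch keeps a work list of them.

```python
from collections import deque


def steiner_by_pruning(adj, terminals):
    """Spanning tree of G, then repeatedly delete leaves not in N.
    Returns the surviving vertex set V'."""
    terminals = set(terminals)
    root = next(iter(terminals))
    tree = {v: set() for v in adj}
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                tree[u].add(w)
                tree[w].add(u)
                queue.append(w)
    alive = set(tree)
    leaves = [v for v in tree if len(tree[v]) <= 1 and v not in terminals]
    while leaves:
        v = leaves.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for w in tree[v]:
            tree[w].discard(v)
            # w may have just become a non-terminal leaf.
            if len(tree[w]) <= 1 and w not in terminals:
                leaves.append(w)
        tree[v].clear()
    return alive
```

The result depends on which spanning tree is built. A different tree can leave a different, possibly larger, V'.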
b) another solution, based on a shortest-paths tree:
- pick any node in N, call it v, and let v be the root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path between any node in T and any node in N, i.e. a shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V' (and to T).
3) every node in p that is in N is deleted from N.
--- repeat the loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
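Here is a Python sketch of solution (b) for an unweighted, connected graph. It deviates slightly from the description: instead of precomputing all shortest paths, each iteration runs one multi-source BFS from every vertex already in T. That BFS finds a shortest T-to-N path directly. The function name is hypothetical.

```python
from collections import deque


def steiner_by_shortest_paths(adj, terminals):
    """Grow T from one terminal, repeatedly attaching the nearest remaining
    terminal via a shortest path. Returns the vertex set V' (= T)."""
    remaining = set(terminals)
    tree = {remaining.pop()}
    while remaining:
        # Multi-source BFS from all of T; stop at the first terminal found.
        parent = {v: None for v in tree}
        queue = deque(tree)
        target = None
        while queue and target is None:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    if w in remaining:
                        target = w
                        break
                    queue.append(w)
        if target is None:
            raise ValueError("graph is not connected")
        # Add the path back to T, deleting any terminals on it from N.
        x = target
        while x not in tree:
            tree.add(x)
            remaining.discard(x)
            x = parent[x]
    return tree
```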
Personally, I used the following algorithm in one of my papers, but it is better suited to distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, giving priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, and w(u) = 1 otherwise.
We create the pair (w(u), id(u)) for each node u.
Each node u builds a multipoint relay (MPR) set: a set M(u) of 1-hop neighbors such that each 2-hop neighbor of u is adjacent to at least one node in M(u). [The smaller M(u) is, the better the solution.]
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors,
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest pair (w(v), id(v)).
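Below is a centralized Python simulation of this rule, as a sketch. Several choices are assumptions, not part of the answer. The graph is a dict of neighbour sets. Vertices are comparable and serve as their own ids unless `ids` is given. Each M(u) is built with a simple greedy set-cover heuristic. The function name is hypothetical.

```python
def mpr_cds(adj, N, ids=None):
    """Simulate the selection rule above and return the vertex set V'.

    prio[u] = (w(u), id(u)) with w(u) = 0 if u in N else 1.
    """
    ids = ids or {u: u for u in adj}
    prio = {u: (0 if u in N else 1, ids[u]) for u in adj}

    def mpr(u):
        # 2-hop neighbourhood: neighbours of neighbours, minus u and N(u).
        two_hop = set().union(*(adj[x] for x in adj[u])) - adj[u] - {u}
        chosen = set()
        while two_hop:
            # Greedy cover: neighbour covering most uncovered 2-hop nodes,
            # ties broken by the smaller priority pair.
            best = max(sorted(adj[u] - chosen, key=prio.get),
                       key=lambda x: len(adj[x] & two_hop))
            chosen.add(best)
            two_hop -= adj[best]
        return chosen

    M = {u: mpr(u) for u in adj}
    cds = set()
    for u in adj:
        if not adj[u]:
            cds.add(u)  # isolated vertex: only it can dominate itself
            continue
        v = min(adj[u], key=prio.get)  # neighbour with the smallest pair
        if prio[u] < prio[v] or u in M[v]:
            cds.add(u)
    return cds
```

The 2-hop neighbourhoods are computed here by plain set unions. The matrix-multiplication speedup mentioned below is not attempted.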
-- The trick, when you execute this algorithm in a centralized manner, is to compute 2-hop neighbors efficiently. The best I could do was to go from O(n^3) down to O(n^2.37) using matrix multiplication.
-- I would really like to know the approximation ratio of this last solution.
I like this reference for Steiner tree heuristics:
Hwang, Frank; Richards, Dana; Winter, Pawel. The Steiner Tree Problem.
You could try the following:
Create a minimal vertex cover for the desired nodes N.
Collapse these, possibly unconnected, subgraphs into "large" nodes. That is, for each subgraph, remove it from the graph and replace it with a new node. Call this set of nodes N'.
Compute a minimal vertex cover of the nodes in N'.
"Unpack" the nodes in N'.
I'm not sure whether this gives an approximation within any specific bound. You could perhaps even trick the algorithm into making some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.