Directed graph decomposition - algorithm

I want to decompose a directed acyclic graph into the minimum number of components such that the following property holds in each component:
For every pair of vertices (u,v) in a component, there is a path from u to v or from v to u.
Is there any algorithm for this?
I know that when the "or" in the condition is replaced by "and", the problem is the same as finding the strongly connected components (which is possible using DFS).
*EDIT:* What happens if the directed graph contains cycles (i.e. it is not acyclic)?

My idea is to order the graph topologically using DFS, which takes O(n + m) time, and then think about which vertices can violate this property. It can be violated at vertices where 2 different branches join, or where a path splits into 2 different branches.
I would start from any starting vertex (lowest in the topological ordering) and follow its path, picking branches at random, until I cannot go further, and then delete this path from the graph (first component). This would be repeated until the graph is empty and you have all such components.
It seems like a greedy algorithm, but consider that you might find a very short path in the first run (bad luck) or a longest path (good luck). Either way, you would still have to find the small branch component in a later step of the algorithm.
Complexity would be O(n * number of components).
When the condition uses "and", you should be considering any directed graph, as a DAG cannot have a strongly connected component with more than one vertex.

The two existing answers both have problems that I've outlined in comments. But there's a more fundamental reason why no decomposition into components can work in general. First, let's concisely express the relation "u and v belong in the same component of the decomposition" as u # v.
It's not transitive
In order to represent a relation # as vertices in a component, that relation must be an equivalence relation, which means among other things that it must be transitive: that is, if x # y and y # z, it must necessarily be true that x # z. Is our relation # transitive? Unfortunately the answer is "No", since it may be that there is a path from x to y (so that x # y), and a path from z to y (so that y # z), but no path from x to z or from z to x (so that x # z does not hold), as the following graph shows:
       z
       |
       |
       v
x----->y
The problem is that according to the above graph, x and y belong in the same component, and y and z belong in the same component, but x and z belong in different components, which is a contradiction. This means that, in general, it's impossible to represent the relationship # as a decomposition into components.
If an instance happens to be transitive
So there is no solution in general -- but there can still be input graphs for which the relation # happens to be transitive, and for which we can therefore compute a solution. Here is one way to do that (though probably not the most efficient way).
Compute shortest paths between all pairs of vertices (using e.g. the Floyd-Warshall algorithm, in O(n^3) time for n vertices). Now, for every vertex pair (u, v), either d(u, v) = inf, indicating that there is no way to reach v from u at all, or not, indicating that there is some path from u to v. To answer the question "Does u # v hold?" (i.e., "Do u and v belong in the same component of the decomposition?"), we can simply calculate d(u, v) != inf || d(v, u) != inf.
This gives us a relation that we can use to build an undirected graph G' in which there is a vertex u' for each original vertex u, and an edge between two vertices u' and v' if and only if d(u, v) != inf || d(v, u) != inf. Intuitively, every connected component in this new graph must be a clique. This property can be checked in O(n^2) time by first performing a series of DFS traversals from each vertex to assign a component label to each vertex, and then checking that each pair of vertices belongs to the same component if and only if they are connected by an edge. If the property holds then the resulting cliques correspond to the desired decomposition; otherwise, there is no valid decomposition.
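The check described above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: it assumes vertices are numbered 0..n-1, uses Warshall's boolean transitive closure in place of Floyd-Warshall distances (only reachability matters here), and the function name `decompose_if_transitive` is made up for this example.

```python
from itertools import combinations

def decompose_if_transitive(n, edges):
    """Return the components (sorted vertex lists) if the relation
    'u reaches v or v reaches u' is transitive, otherwise None."""
    # reach[u][v] is True when there is a directed path from u to v.
    reach = [[u == v for v in range(n)] for u in range(n)]
    for u, v in edges:
        reach[u][v] = True
    # Warshall's transitive closure, O(n^3).
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True

    def related(u, v):
        return reach[u][v] or reach[v][u]

    # Label the connected components of the undirected graph G'.
    label = [-1] * n
    components = []
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = len(components)
        stack, comp = [s], [s]
        while stack:
            u = stack.pop()
            for v in range(n):
                if label[v] == -1 and related(u, v):
                    label[v] = label[s]
                    stack.append(v)
                    comp.append(v)
        components.append(sorted(comp))

    # Every component of G' must be a clique: two vertices share a label
    # exactly when they are related.
    for u, v in combinations(range(n), 2):
        if (label[u] == label[v]) != related(u, v):
            return None
    return components
```

On the three-vertex counterexample (edges x->y and z->y) this returns None, while a simple directed path comes back as a single component.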
Interestingly, there are graphs that are not chains of strongly connected components (as claimed by Zotta), but which nonetheless do have transitive # relations. For example, a tournament is a digraph in which there is an edge, in some direction, between every pair of vertices -- so clearly # holds for every pair of vertices in such a graph. But if we number the vertices 1 to n and include only edges from lower-numbered to higher-numbered vertices, there will be no cycles, and thus the graph is not strongly connected (and if n > 2, then clearly it's not a path).

Related

Graph path-finding algorithm variant

I am looking for an algorithm that takes an undirected graph as input and finds a subset of vertices such that the subgraph induced by those vertices forms a connected acyclic tree.
For instance, in the following figure the 'X' nodes would create a valid solution, but including any of the 'O' nodes would make it invalid.
O
/|
O-X-X-X
\ /
X-X
The usefulness of the solution to me is proportionate to the size of the subset. Although I don't need the entire maximal subset, a close approximation would be very helpful.
I've tried the obvious algorithm of starting with a random node and adding adjacent vertices if they don't induce a cycle. However, I have the feeling that this produces very suboptimal trees.
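For reference, the greedy approach described above might look like the sketch below. It assumes the graph is given as a dict mapping each vertex to the set of its neighbours; the name `greedy_induced_tree` and the seeded random generator are choices made for this illustration only.

```python
import random

def greedy_induced_tree(adj, start, rng=None):
    """Greedily grow a vertex set whose induced subgraph is a tree.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    rng = rng or random.Random(0)
    chosen = {start}
    progress = True
    while progress:
        progress = False
        candidates = [v for v in adj if v not in chosen]
        rng.shuffle(candidates)
        for v in candidates:
            # Adding v keeps the induced subgraph a tree exactly when v has
            # one neighbour already in the set: zero would leave it
            # disconnected, two or more would close a cycle.
            if len(adj[v] & chosen) == 1:
                chosen.add(v)
                progress = True
    return chosen
```

The result depends on the start vertex and the visiting order, which is one reason this heuristic can end up with small trees.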
I should mention that my particular application involves graphs of ~100 nodes and ~1000 edges. This is small enough that brute-force backtracking algorithms might be feasible if well implemented (e.g. using Dancing Links), but I haven't tried this out.
This problem is very similar to Feedback Vertex Set, and unfortunately it's NP-hard. According to the Wikipedia page, the best known approximation algorithm has an approximation ratio of 2: Becker, Ann; Geiger, Dan (1996), "Optimization of Pearl's method of conditioning and greedy-like approximation algorithms for the vertex feedback set problem.".
NP-hardness proof for "Connected Feedback Vertex Set"
I neglected the condition that the resulting graph needs to be connected, which is not the case for Feedback Vertex Set (FVS). Below I'll show that your problem, which I'll call Connected Feedback Vertex Set (CFVS), is nevertheless NP-hard.
Given an instance (G = (V, E), k) of FVS, we need to construct an instance (G' = (V', E'), k') of CFVS with the property that (G, k) is a YES-instance of FVS if and only if (G', k') is a YES-instance of CFVS. Informally this G' will look like a "stack of copies" of G, with a few extra vertices and edges. Let's do this as follows:
For each vertex v_i in V, create a path (not a clique, as I originally said in the comments...) of |V| vertices v'_i_j in V', 1 <= j <= |V|. These are the "meat vertices". (You can think of vertex v'_i_j being in "layer" j.) The vertices v'_i_1, v'_i_2, ..., v'_i_|V| are the "strand" of meat corresponding to vertex v_i in G (yes, terrible name...).
For each edge (v_i, v_j) in E, create all |V| corresponding "parallel" edges between the corresponding vertices in G' -- that is, create the edges (v'_i_1, v'_j_1), (v'_i_2, v'_j_2), ..., (v'_i_|V|, v'_j_|V|). (These edges all connect vertices that are in the same layer.)
For each vertex v_i in V, also create an additional "skeleton vertex" u'_i in V'. Make this u'_i adjacent to v'_i_1.
Add another vertex r to V', and make it adjacent to every skeleton vertex u'_i.
Finally, set k' = |V|*k + |V| - 1.
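The construction steps above can be written out as a short Python sketch. The vertex naming scheme (tuples tagged "meat", "skel" and "root") and the function name `build_cfvs_instance` are assumptions made for this example, and layers are numbered from 0 rather than from 1.

```python
def build_cfvs_instance(n, edges, k):
    """Build (vertices, edges, k') of G' from an FVS instance (G, k).

    G has vertices 0..n-1. Vertices of G' are tuples:
    ("meat", i, j), ("skel", i) and ("root",).
    """
    new_vertices, new_edges = [("root",)], []
    for i in range(n):
        # Strand of n meat vertices for v_i, joined into a path.
        for j in range(n):
            new_vertices.append(("meat", i, j))
            if j > 0:
                new_edges.append((("meat", i, j - 1), ("meat", i, j)))
        # Skeleton vertex u'_i hangs off the first layer and touches r.
        new_vertices.append(("skel", i))
        new_edges.append((("skel", i), ("meat", i, 0)))
        new_edges.append((("root",), ("skel", i)))
    # One parallel copy of every original edge in each layer.
    for a, b in edges:
        for j in range(n):
            new_edges.append((("meat", a, j), ("meat", b, j)))
    return new_vertices, new_edges, n * k + n - 1
```

For a triangle with k = 1 this yields 13 vertices, 21 edges and k' = 5, which makes the polynomial size of G' easy to see.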
First I'll show that if the FVS instance (G, k) is a YES-instance, then (G', k') is a YES-instance of your problem. Let X be any solution (i.e., set of deleted vertices) to the FVS instance (G, k) that leaves at least 1 vertex of G undeleted (such a solution must exist, since a 1-vertex graph contains no cycle); then we can construct a solution X' to the instance of your problem as follows:
For each vertex v_i deleted in the FVS solution X, we can delete the corresponding path v'_i_1, ..., v'_i_|V| from G' at a total cost of at most |V|*k (deleting each path costs |V| vertex deletions, and at most k vertices were deleted from G by X). This guarantees that there will be no cycle consisting only of meat vertices in G'-X' (if there were, this would contradict the feasibility of the FVS solution X to (G, k)).
For each connected component in the FVS solution X, we can delete all but 1 of the corresponding skeleton vertices in G'. What we are left with in G' is a stack of |V| copies of the FVS solution G-X, plus a single skeleton vertex per component of that solution, plus the root vertex r. Since we only have a single path to r from each connected component (via a single skeleton vertex per component), there can be no cycle in G'. Since G-X contains at least 1 connected component, this can involve at most |V|-1 deletions, so at most |V|*k + |V| - 1 deletions were needed overall, so the answer to the constructed CFVS instance (G', k') is YES.
Secondly I'll show that if the constructed instance (G', k') of your problem is a YES-instance, then the original instance (G, k) of FVS is a YES-instance.
Let X' be any solution (i.e., set of deleted vertices) to the constructed instance (G', k') of CFVS. Consider the subgraph induced by each layer of meat vertices in G'-X': there are |V| such layers. In general, different layers could contain different numbers of deletions. Choose any layer j that contains a minimum number of deletions; since G'-X' is cycle-free, so is every induced subgraph, including in particular layer j. The number of deletions in layer j is at most k'/|V|, since otherwise (by the minimal choice of j) there would be strictly more than k' deletions overall, a contradiction. But any integer <= k'/|V| must be <= RoundDown((|V|*k + |V| - 1) / |V|) = k, and layer j is just a copy of the original FVS problem (G, k), so it is possible to destroy every cycle in layer j -- and thus in the original FVS instance (G, k) -- with at most k deletions. This implies that (G, k) is a YES-instance of FVS.
(G, k) being a YES-instance of FVS implies (G', k') being a YES-instance of CFVS, and vice versa, so (G, k) being a NO-instance of FVS implies (G', k') being a NO-instance of CFVS, so the problem instances are equivalent. Clearly (G', k') can be constructed in polynomial time from (G, k), so it follows that CFVS is NP-hard. It's also clearly NP-complete, since a solution to a YES-instance can be checked for correctness (that is, cycle-freeness and connectedness) in O(|V|+|E|) time with a single DFS.
This sounds like a case where finding the maximum spanning tree and then taking a subset using its structure would probably help.
If not, try this:
1. Find the node with the fewest edges.
2. If adding it to the subset still leaves you with a tree, mark it as belonging to the subset and "erase" all its edges from the graph.
3. Repeat from step 1 until none of the nodes outside the subset can be added.
Hope this helps

Single edge addition to minimize number of bridges in a graph

You are given a graph G with N vertices and M edges with N<=10^4 and M<=10^5. Now, you have to add exactly one edge (u,v) to the graph so that the total number of bridges is minimized. G may have multiple edges, but no self loops. On the other hand, the newly generated graph G', after adding the edge, may have both self loops and multiple edges. If several such pairs (u,v) with u<=v are possible, output the lexicographically smallest one (the vertices are numbered from 1..n).
A trivial idea would be to try all edges in order and then use the bridge-finding algorithm to count the bridges. This takes time O(V^2 * E), so it is clearly useless. How can this be done faster?
EDIT: Following advice by j_random_hacker, I add the following details about the source of the above problem. This is a problem named Computer Network (specifically problem 3) from India's IOI Training Camp '14 Practice Test (Test 3). It was an onsite offline test, so I cannot prove that it is not from a present contest, by giving a link. But I have a PDF of the problem statement.
This is not a complete answer but some ideas to steer you towards it:
To avoid having to run the bridge-finding algorithm after trying each possible edge, it pays to ask: By how much can adding a single edge (u, v) change the number of bridges in a graph G?
1. If u and v are not already connected by any path in G, then certainly (u, v) will itself become a bridge. What about the "bridgeness" (bridgity? bridgulence?) of all other pairs of vertices? Does it change? (Most importantly: Can any edge go from being a bridge to being a non-bridge? If you can prove that this can never happen, then you can immediately discard all such vertex pairs (u, v) from consideration as they can only ever make the situation worse.)
2. If u and v are already connected in G, there are 2 possibilities:
2.1. Every path P that connects them shares some edge (x, y) (note that x and y are not necessarily distinct from u and v). Then (x, y) is a bridge in G, and adding (u, v) will cause (x, y) to stop being a bridge, because it will then become possible to get from x to y "the long way", by going from x back to u, via the new edge (u, v) to v, and then back up to y. (This assumes that x is closer to u on P than y is, but clearly the argument still works if y is closer: just swap u and v.) There could be multiple such bridges (x, y): in that case, all of them will become non-bridges after (u, v) is added.
2.2. There are at least 2 edge-disjoint paths P and Q already connecting u and v. Obviously no edge (x, y) on P or Q can be a bridge, since if (x, y) on P were deleted, it's still possible to get from x to y "the long way" via Q. The question is, again: What about the bridgeness of all other vertex pairs? You should be able to prove that this property doesn't change, meaning that adding the edge (u, v) leaves the total number of bridges unchanged, and can therefore be disregarded as a useless move (unless there are no bridges at all to start with).
We see that 2.1 above is the only case in which adding an edge (u, v) can be useful. Furthermore, it seems that the more bridges we can find in a single path in G, the more of them we can neutralise by choosing to connect the endpoints of that path.
So it seems like "Find the path in G that contains the most bridges" might be the right criterion. But first we need to ask ourselves: Does the number of bridges in a path P accurately count the number of bridges eliminated by adding an edge from the start of P to the end? (We know that adding such an edge must eliminate at least those bridges, but perhaps some others are also eliminated as a "side effect" -- and if so, then we need to count them somehow to make sure that we add the edge that eliminates the most bridges overall.)
Happily the answer is that no other bridges are eliminated. This time I'll do the proof myself.
Suppose that there is a path P from u to v, and suppose to the contrary that adding the edge (u, v) would eliminate a bridge (x, y) that is not on P. Then it must be that the single edge (x, y) is the only path from x to y, and that adding (u, v) would create a second path Q from x, via the edge (u, v) in either direction, to y that avoids the edge (x, y). But for any such Q, we could replace the edge (u, v) in Q with the path P, which from our initial assumption avoids (x, y), and still get a path Q' from x to y that avoids the edge (x, y) -- this means that (x, y) must have already been connected by two edge-disjoint paths (namely the single edge (x, y) and Q'), so it could not have been a bridge in the first place. Since this is a contradiction, it follows that no such "removed as a side effect" bridge (x, y) can exist.
So "Find the path in G that contains the most bridges, and add an edge between its endpoints" definitely gives the right answer -- but there is still a problem: this sounds a lot like the Longest Path problem, which is NP-hard for general graphs, and therefore slow to solve.
However, there is a way out. (There must be: you already have an O(V^2*E) algorithm, so it can't be that your problem is NP-hard :-) ) Think of the biconnected components in your input graph G as being vertices in another graph G'. What do the edges between these vertices (in G') correspond to in G? Do they have any particular structure? Final (big) hint: What is a critical path?
This answer is a spoiler. You should probably think along with j_random_hacker's answer instead.
If I understand your problem correctly:
Think of the graph as a tree of biconnected components. Find the longest path in this tree and link up its ends with the new edge.
There is a linear-time algorithm for finding biconnected components using depth first search. Finding the longest path in a tree takes linear time and can be done using depth-first search---make it do "find the farthest vertex and return both it and its distance" and use that. So this takes linear time overall.
(You can roll it all into a single depth-first search that returns the number of bridge edges in the bridgiest path and the farthest vertex in said bridgiest path.)
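As a rough illustration of the approach above, here is a Python sketch. It makes several assumptions not stated in the answers: vertices are numbered 0..n-1, components are formed by contracting non-bridge edges (i.e. 2-edge-connected components, the notion that matches bridges), each end component is represented by its smallest vertex, and ties between equally good paths are not broken lexicographically as the problem statement requires. The name `best_edge_to_add` is invented for this example.

```python
from collections import defaultdict

def best_edge_to_add(n, edges):
    """Return (u, v, removed): adding edge (u, v) turns `removed` bridges
    into non-bridges. Parallel edges are allowed in `edges`."""
    adj = defaultdict(list)
    for eid, (a, b) in enumerate(edges):
        adj[a].append((b, eid))
        adj[b].append((a, eid))

    # Iterative DFS computing discovery times and low-links.
    disc, low = [-1] * n, [0] * n
    is_bridge = [False] * len(edges)
    timer = 0
    for root in range(n):
        if disc[root] != -1:
            continue
        disc[root] = low[root] = timer
        timer += 1
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            u, parent_edge, it = stack[-1]
            advanced = False
            for v, eid in it:
                if eid == parent_edge:
                    continue  # skip only the edge we arrived by
                if disc[v] == -1:
                    disc[v] = low[v] = timer
                    timer += 1
                    stack.append((v, eid, iter(adj[v])))
                    advanced = True
                    break
                low[u] = min(low[u], disc[v])
            if not advanced:
                stack.pop()
                if stack:
                    p = stack[-1][0]
                    low[p] = min(low[p], low[u])
                    if low[u] > disc[p]:
                        is_bridge[parent_edge] = True

    # Contract components joined by non-bridge edges.
    comp, rep = [-1] * n, []  # rep[c] = smallest vertex of component c
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = len(rep)
        rep.append(s)
        todo = [s]
        while todo:
            u = todo.pop()
            for v, eid in adj[u]:
                if not is_bridge[eid] and comp[v] == -1:
                    comp[v] = comp[s]
                    todo.append(v)
    tree = defaultdict(list)
    for eid, (a, b) in enumerate(edges):
        if is_bridge[eid]:
            tree[comp[a]].append(comp[b])
            tree[comp[b]].append(comp[a])

    def farthest(src):
        """BFS in the bridge tree: farthest node, its distance, visit order."""
        dist, order = {src: 0}, [src]
        for u in order:  # the list grows while we iterate over it
            for v in tree[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    order.append(v)
        end = max(order, key=lambda c: dist[c])
        return end, dist[end], order

    best, seen = (0, 0, 0), set()
    for c in range(len(rep)):
        if c in seen:
            continue
        a, _, order = farthest(c)  # first sweep finds one end
        seen.update(order)
        b, d, _ = farthest(a)      # second sweep finds the other end
        if d > best[2]:
            best = (*sorted((rep[a], rep[b])), d)
    return best
```

When the graph has no bridges at all, the sketch falls back to the self loop (0, 0) with zero bridges removed.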

Minimizing the number of connected-checks in finding a shortest path in an implicit graph

I'm quite surprised I couldn't find anything on this anywhere, it seems to be a problem that should be quite well known:
Consider the Euclidean shortest path problem, in two dimensions. Given a set of obstacle polygons P and two points a and b, we want to find the shortest path from a to b not intersecting the (interior of) any p in P.
To solve this, one can create the visibility graph for this problem, the graph whose nodes are the vertices of the elements of P, and where two nodes are connected if the straight line between them does not intersect any element of P. The edge weight for any such edge is simply the Euclidean distance between such two points. To solve this, one can then determine the shortest path from a to b in the graph, let's say with A*.
However, this is not a good approach. Creating the visibility graph in advance requires checking if any two vertices from any two polygons are connected, a check that has higher complexity than determining the distance between two nodes. So working with a modified version of A* that "does everything it can before checking if two nodes are actually connected" actually speeds up the problem.
Still, A* and all other shortest path problems always start with an explicitly given graph for which adjacent vertices can be traversed cheaply. So my question is, is there a good (optimal?) algorithm for finding a shortest path between two nodes a and b in an "implicit graph" that minimizes checking if two nodes are connected?
Edit:
To clarify what I mean, this is an example of what I'm looking for:
Let V be a set, a, b elements of V. Suppose w: V x V -> D is a weighting function (to some linearly ordered set D) and c: V x V -> {true, false} returns true iff two elements of V are considered to be connected. Then the following algorithm finds the shortest path from a to b in V, i.e., returns a list [x_i | i < n] such that x_0 = a, x_{n-1} = b, and c(x_i, x_{i+1}) = true for all i < n - 1.
Let (V, E) be the complete graph with vertex set V.
do
    Compute shortest path from a to b in (V, E) and put it in P = [p_0, ..., p_{n-1}]
    if P = empty (there is no shortest path), return NoShortestPath
    Let all_good = true
    for i = 0 ... n - 2 do
        if c(p_i, p_{i+1}) == false, remove edge (p_i, p_{i+1}) from E, set all_good = false and exit for loop
while all_good = false
For computing the shortest paths in the loop, one could use A* if an appropriate heuristic exists. Obviously this algorithm produces a shortest path from a to b.
Also, I suppose this algorithm is somehow optimal in calling c as rarely as possible. For its found shortest path, it must have ruled out all shorter paths that the function w would have allowed for.
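A runnable Python version of the loop above might look like this. It is only a sketch under a few assumptions: w is symmetric, numeric and non-negative, c is symmetric, vertices are hashable and mutually comparable (the heap compares them on ties), and plain Dijkstra stands in for the inner shortest-path computation. The function name is made up for the example.

```python
import heapq

def shortest_path_lazy_removal(vertices, w, c, a, b):
    """Repeatedly find a shortest a-b path in the complete graph and drop
    the first edge on it that fails the connectivity check c."""
    removed = set()  # undirected edges already found to be disconnected

    def dijkstra():
        dist, parent = {a: 0}, {a: None}
        heap, done = [(0, a)], set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            if u == b:
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            for v in vertices:
                if v == u or v in done or frozenset((u, v)) in removed:
                    continue
                nd = d + w(u, v)
                if nd < dist.get(v, float("inf")):
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        return None  # b is unreachable with the remaining edges

    while True:
        path = dijkstra()
        if path is None:
            return None
        for p, q in zip(path, path[1:]):
            if not c(p, q):
                removed.add(frozenset((p, q)))
                break
        else:
            return path  # every edge on the path passed the check
```

Each round either returns or removes one edge, so the loop ends after at most |V|^2 / 2 rounds.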
But surely there is a better way?
Edit 2:
So I found a solution that works relatively well for what I'm trying to do: Using A*, when relaxing a node, instead of going through the neighbors and adding them to / updating them in the priority queue, I put all vertices into the priority queue, marked as hypothetical, together with hypothetical f and g values and the hypothetical parent. Then, when picking the next element from the priority queue, I check if the node's connection to its parent is actually given. If so, the node is progressed as normal, if not, it is discarded.
This greatly reduces the number of connectivity checks and improves performance for me a lot. But I'm sure there's still a more elegant way, in particular one where the "hypothetical new path" doesn't just extend by length one (parents are always actual, not hypothetical).
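The "hypothetical entry" idea from Edit 2 can be sketched as below. The sketch assumes a consistent heuristic h (the default h = 0 turns it into Dijkstra), a non-negative numeric w, and hashable vertices; the name `lazy_astar` and the tie-breaking counter are details added for this illustration.

```python
import heapq
from itertools import count

def lazy_astar(vertices, w, c, a, b, h=lambda v: 0):
    """A* over the complete graph where an edge is only checked with c
    when the queue entry that relies on it is popped."""
    tie = count()  # tie-breaker so the heap never compares vertices
    heap = [(h(a), next(tie), 0, a, None)]
    parent = {}  # confirmed vertex -> confirmed parent
    while heap:
        _, _, g, v, p = heapq.heappop(heap)
        if v in parent:
            continue  # already confirmed via an entry popped earlier
        if p is not None and not c(p, v):
            continue  # the hypothetical edge (p, v) does not exist
        parent[v] = p
        if v == b:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        # Enqueue every unconfirmed vertex as a hypothetical successor.
        for u in vertices:
            if u not in parent:
                g_u = g + w(v, u)
                heapq.heappush(heap, (g_u + h(u), next(tie), g_u, u, v))
    return None
```

Because c is only evaluated for entries that actually reach the front of the queue, pairs whose hypothetical cost is never competitive are never checked.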
A* or Dijkstra's algorithm do not need an explicit graph to work, they actually only need:
source vertex (s)
A function next:V->2^V such that next(v)={u | there is an edge from v to u }
A function isGoal:V->{0,1} such that isGoal(v) = 1 iff v is a target node.
A weight function w:E->R such that w(u,v)= cost to move from u to v
And, of course, in addition A* is going to need a heuristic function h:V->R such that h(v) is the cost approximation.
With these functions, you can generate only the portion of the graph that is needed to find shortest path, on the fly.
In fact, the A* algorithm is often used on infinite graphs (or huge graphs that do not fit in any existing storage) in artificial intelligence problems using this approach.
The idea is, you only look on edges in A* from a given node (all (u,v) in E for some given u). You don't need the entire edges set E in order to do it, you can just use your next(u) function instead.
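A minimal Python sketch of A* driven purely by these functions could look like the following. It assumes a consistent heuristic and hashable vertices; `next_fn` plays the role of next (renamed to avoid shadowing Python's built-in), and the tie-breaking counter is an implementation detail added here.

```python
import heapq
from itertools import count

def astar_implicit(source, next_fn, is_goal, w, h):
    """A* that only ever touches vertices produced by next_fn."""
    tie = count()  # keeps the heap from comparing vertices on ties
    best_g = {source: 0}
    parent = {source: None}
    heap = [(h(source), next(tie), source)]
    closed = set()
    while heap:
        _, _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)
        if is_goal(u):
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in next_fn(u):  # successors are generated on demand
            g = best_g[u] + w(u, v)
            if g < best_g.get(v, float("inf")):
                best_g[v], parent[v] = g, u
                heapq.heappush(heap, (g + h(v), next(tie), v))
    return None
```

For example, on the infinite integer grid with 4-neighbour moves, `next_fn` can simply return the four neighbouring points; no edge set is ever stored.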

Find all edges in min-cut

Let (G,s,t,{c}) be a flow network, and let F be the set of all edges e for which there exists at least one minimum cut (A,B) such that e goes from A to B. Give a polynomial time algorithm that finds all edges in F.
NOTE: So far I know I need to run Ford-Fulkerson so each edges has a flow. Furthermore I know for all edges in F, the flow f(e) = c(e). However not all edges in a graph G which respects that constraint will be in a min-cut. I am stuck here.
Suppose you have computed a max flow on a graph G and you know the flow through every edge in the graph. From the source vertex s, perform a breadth-first or depth-first search on the residual graph: traverse an edge of G in its forward direction only if its flow is less than its capacity, and in its backward direction only if its flow is greater than 0. Denote the set of vertices reachable in this traversal as S, and the unreachable vertices as T.
To obtain the minimum cut C, we simply find all edges in the original graph G which begin at some vertex in S and end at some vertex in T.
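The procedure above can be sketched in Python as follows. To keep the example self-contained it computes the max flow itself with shortest augmenting paths (Edmonds-Karp), which is an assumption of this sketch rather than something the answer prescribes. It also assumes s != t, vertices numbered 0..n-1, and integer capacities. Note that it returns the edges of one minimum cut, the one closest to s.

```python
from collections import deque

def min_cut_edges(n, capacity_edges, s, t):
    """Return (max_flow_value, cut_edges) for a directed flow network.

    capacity_edges is a list of (u, v, capacity) triples.
    """
    # Residual graph as paired arcs: arc i and arc i ^ 1 are reverses.
    to, cap, adj = [], [], [[] for _ in range(n)]
    for u, v, c in capacity_edges:
        adj[u].append(len(to))
        to.append(v)
        cap.append(c)   # forward arc: residual capacity = capacity - flow
        adj[v].append(len(to))
        to.append(u)
        cap.append(0)   # backward arc: residual capacity = flow

    def bfs():
        """Map each vertex reachable from s to the arc used to reach it."""
        via = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for arc in adj[u]:
                if cap[arc] > 0 and to[arc] not in via:
                    via[to[arc]] = arc
                    queue.append(to[arc])
        return via

    flow = 0
    while True:
        via = bfs()
        if t not in via:
            break  # no augmenting path left: the flow is maximum
        path, v = [], t
        while via[v] is not None:
            path.append(via[v])
            v = to[via[v] ^ 1]  # tail of the arc we came in by
        push = min(cap[arc] for arc in path)
        for arc in path:
            cap[arc] -= push
            cap[arc ^ 1] += push
        flow += push

    reachable = set(via)  # the set S, taken from the final failed search
    cut = [(u, v) for u, v, c in capacity_edges
           if u in reachable and v not in reachable]
    return flow, cut
```

The capacities of the returned edges add up to the flow value, which is the equality the facts below establish.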
This tutorial in Topcoder provides an explanation / proof of the above algorithm. Look at the section beginning with the following text:
A cut in a flow network is simply a partition of the vertices in two sets, let's call them A and B, in such a way that the source vertex is in A and the sink is in B.
I shall attempt to provide an explanation of the corresponding section in the Topcoder tutorial (just for me to brush up on this as well).
Now, suppose that we have computed a max flow on a graph G, and that we have computed the set of edges C using the procedure outlined above. From here, we can conclude several facts.
Fact 1: Source vertex s must be in set S, and sink vertex t must be in set T.
Otherwise, vertices s and t must be in the same set, which means that we must have found a path from s to t consisting only of edges that have flow less than capacity. This means that we can push more flow from s to t, and therefore we have found an augmenting path! However, this is a contradiction, since we have already computed a max flow on the graph. Hence, it is impossible for source vertex s and sink vertex t to be connected, and they must be in different sets.
Fact 2: Every edge beginning at set S and ending at set T must have flow == capacity
Again we prove this by contradiction. Suppose that there is a vertex u in S and a vertex v in T, such that edge (u,v) in the residual network has flow less than capacity. By our algorithm above, this edge will be traversed, and vertex v should be in set S. This is a contradiction. Therefore, such an edge must have flow == capacity.
Fact 3: Removing the edges in C from graph G will mean that there is no path from any vertex in set S to any vertex in set T
Suppose that this is not the case, and there is some edge (u,v) that connects vertex u in set S to vertex v in set T. We can separate this into 2 cases:
Flow through edge (u,v) is less than its capacity. But we know this will cause vertex v to be part of set S, so this case is impossible.
Flow through edge (u,v) is equal to its capacity. This is impossible since edge (u,v) will be considered as part of the edge set C.
Hence both cases are impossible, and we see that removing the edges in C from the original graph G will indeed result in a situation where there is no path from S to T.
Fact 4: Every edge in the original graph G that begins at vertex set T but ends at vertex set S must have a flow of 0
The explanation on the Topcoder tutorial may not be obvious on first reading and the following is an educated guess on my part and may be incorrect.
Suppose that there exists some edge (x,y) (where x belongs to vertex set T and y belongs to vertex set S), such that the flow through (x,y) is greater than 0. For convenience, we denote the flow through (x,y) as f. This means that on the residual network, there must exist a backward edge (y,x) with capacity f and flow 0. Since vertex y is part of set S, the backward edge (y,x) has flow 0 with capacity f > 0, our algorithm will traverse the edge (y,x) and place vertex x as part of vertex set S. However, we know that vertex x is part of vertex set T, and hence this is a contradiction. As such, all edges from T to S must have a flow of 0.
With these 4 facts, along with the Max-flow min-cut theorem, we can conclude that:
The max flow must be less than or equal to the capacity of any cut. By Fact 3, C is a cut of the graph, so the max flow must be less than or equal to the capacity of cut C.
Fact 4 allows us to conclude that there is no "back flow" from T to S. This along with Fact 2 means that the flow consists entirely of "forward flow" from S to T. In particular, all the forward flow must result from the cut C. This flow value happens to be the max flow. As such, by the Max-flow min-cut theorem, we know that C must be a minimum cut.

How to detect whether a directed graph is uniquely connected?

A directed graph is said to be uniquely connected if there exists exactly one path between every pair of vertices. How to identify whether a graph has this property or not? This needs to be done in O(n+m) time, where n is the number of vertices of the graph and m is the number of edges.
It is quite clear that there shouldn't be any cross-edges or forward-edges in the graph. But what about back-edges?
If there is exactly one directed path between every pair of nodes, then
every node must have at least one out-edge (else no paths from that node to other nodes)
no node can have more than one out-edge (if there is an edge from X to Y and an edge from X to Z, and there are paths from Y to T and from Z to T, then there are multiple paths from X to T)
But now, with every node having exactly one out-edge, and every node being reachable from every other node, the graph must be a single directed cycle.
That is trivial to check in O(n) time.
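One way to write that check, as a small Python sketch: it assumes vertices are numbered 0..n-1 and tests only the "single directed cycle through all vertices" condition derived above, so small special cases are not treated separately. The function name is chosen for this example.

```python
def is_single_directed_cycle(n, edges):
    """True when the digraph on vertices 0..n-1 is one directed cycle
    through all n vertices."""
    if n == 0 or len(edges) != n:
        return False
    successor = {}
    for u, v in edges:
        if u in successor:
            return False  # more than one out-edge
        successor[u] = v
    # Follow the unique out-edges from vertex 0; a single cycle returns to
    # vertex 0 after exactly n steps without revisiting anything earlier.
    seen, u = set(), 0
    for _ in range(n):
        if u in seen:
            return False
        seen.add(u)
        u = successor[u]
    return u == 0
```

The walk takes n steps and the edge scan takes m = n steps, so the whole check runs in O(n) time.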
Edit: As Erik P notes in the comments, this argument only applies if the paths in question are simple paths. In the same spirit, a graph of size 3 may need special treatment, because the X-Y-Z-T reasoning above doesn't apply, which means a graph with nodes X,Y,Z and edges from X to Y and Z, and from Y and Z to X would be legal.

Resources