In a DAG, how to find vertices where paths converge?

I have a directed acyclic graph (DAG) with some constraints:
There is only one "entry" vertex
There can be multiple leaf vertices
Once a path splits, nothing below one branch can reach into another branch (this will become clearer with the examples below).
There can be any number of "split" vertices. They can be nested.
A "split" vertex can split into any number of paths. The examples below only show 2 paths for each, but it could be more.
My challenge is the following: for each "split" vertex (any vertex that has at least 2 outgoing edges), find the vertices where its paths reconnect - if such a vertex exists. The solution should be as efficient as possible.
Example A:
[Figure: Example A]
In this example, vertex A is a "split" vertex, and its "reconnect vertex" is F.
Example B:
[Figure: Example B]
Here, there are two split vertices: A and E. For both of them vertex G is the reconnect vertex.
Example C:
[Figure: Example C]
Now there are three split vertices: A, D and E. The corresponding reconnect vertices are:
A -> K
D -> K
E -> J
Example D:
[Figure: Example D]
Here we have three split vertices again: A, D and E. But this time, vertex E doesn't have a reconnect vertex because one of the paths terminates early.

It sounds like what you want is:
Connect each vertex with out-degree 0 to a single, newly added terminal vertex.
Construct the dominator tree of the edge-reversed graph, rooted at that terminal vertex. The Wikipedia article on dominators points to a couple of algorithms for doing this.
The "reconnect vertex" for a split vertex is its immediate dominator in the edge-reversed graph, i.e., its parent in that dominator tree. In your original graph this is called its immediate "postdominator". If it's the terminal vertex that you added, then the split vertex doesn't have a reconnect vertex in your original graph.
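A minimal sketch of the recipe above, assuming the graph really is a DAG and Python 3.9+ (for graphlib). For a DAG the general dominator-tree algorithms can be short-circuited: if every vertex is processed after all of its successors, its immediate postdominator is the lowest common ancestor of those successors in the postdominator tree built so far. The sentinel name __END__ and the function name reconnect_vertices are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

END = "__END__"  # hypothetical sentinel for the added terminal vertex


def reconnect_vertices(edges):
    """Map each split vertex to its immediate postdominator, or None."""
    succ = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        nodes.update((u, v))
    # Step 1: connect every out-degree-0 vertex to the single terminal vertex.
    for n in nodes:
        if not succ[n]:
            succ[n].append(END)
    # graphlib expects node -> predecessors; passing successors instead makes
    # static_order() yield every vertex after all of its successors.
    order = TopologicalSorter({n: succ[n] for n in nodes}).static_order()
    ipdom, depth = {END: None}, {END: 0}

    def lca(a, b):
        # Walk the deeper vertex up the postdominator tree until both meet.
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            a = ipdom[a]
        return a

    for n in order:
        if n == END:
            continue
        p = succ[n][0]
        for s in succ[n][1:]:
            p = lca(p, s)
        ipdom[n], depth[n] = p, depth[p] + 1
    # The reconnect vertex of a split vertex is its immediate postdominator;
    # None means the only common postdominator is the added terminal.
    return {n: (None if ipdom[n] == END else ipdom[n])
            for n in nodes if len(succ[n]) >= 2}
```

For instance, with edges A->B, A->C, B->D, C->E, D->F, E->F this returns {"A": "F"}, while a split with one branch ending at a leaf before the branches meet maps to None.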

This is the problem of identifying post-dominators, as studied in compilers and program analysis, where it is often used to calculate control dependences in control flow graphs. "Advanced Compiler Design and Implementation" is a good reference on these topics.
If the graph does not have cycles, then solution (a), suggested by @matt-timmermans, will work.
If the graph has cycles, then solution (a) can report spurious post-dominators. In such cases, a network-flow-based approach works better; the algorithm for calculating non-termination-sensitive control dependence in this paper uses it. The basic idea is to:
at every split node, inject a unique token into the graph along each outgoing edge and
propagate the tokens through the graph subject to this constraint: if node n is reachable from split node m, then tokens arriving at node m pass through node n only if all tokens of node m have arrived at node n.
At the end, node n post-dominates node m if all tokens of node m have arrived at node n.

Related

Find Minimum Vertex Connected Sub-graph

First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E) where |V| is about 16 million and |E| is about 180 million.
For a given set S, which is a subset of V (S will have around 30 vertices), is it possible to find a weakly connected sub-graph G'=(V',E') where S is a subset of V', while keeping |V'| and |E'| as small as possible?
The graph G may change, and I hope there's a way to find the sub-graph in real time. (When a process is writing into G, G will be locked, so don't worry about G getting changed while your sub-graph calculation is still running.)
My current solution is to find the shortest path between each pair of vertices in S and merge those paths to get the sub-graph. The result is OK, but the running time is pretty expensive.
Is there a better way to solve this problem?
If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Do a breadth-first-search of the graph, starting with S as the root set.
When the search discovers a new vertex, remember its predecessor and assign it to the same set as its predecessor.
When you discover an edge that connects two sets, merge the sets and follow the predecessor links to add the connecting path to G'.
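The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tuned implementation for a 16-million-vertex graph: adj is a hypothetical adjacency dict with edge direction already ignored (only weak connectivity matters), all of S is assumed to lie in one weak component, and edges of G' are returned as frozensets of their two endpoints.

```python
from collections import deque


def connect_terminals(adj, S):
    """Return the undirected edges of a connected subgraph containing all of S."""
    uf, pred = {}, {}            # union-find parents; BFS predecessor links
    chosen = set()               # edges of G', stored as frozenset({u, v})

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]    # path halving
            x = uf[x]
        return x

    def add_path(v):
        # Follow predecessor links back to v's BFS root. Once an edge is
        # already in G', the rest of the way to the root is too.
        while pred[v] is not None and frozenset((v, pred[v])) not in chosen:
            chosen.add(frozenset((v, pred[v])))
            v = pred[v]

    queue = deque()
    for s in S:                  # S is the BFS root set
        if s not in uf:
            uf[s], pred[s] = s, None
            queue.append(s)
    sets_left = len(uf)

    while queue and sets_left > 1:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in uf:            # new vertex joins its predecessor's set
                uf[v], pred[v] = find(u), u
                queue.append(v)
            elif find(u) != find(v):   # edge connecting two sets: merge them
                uf[find(u)] = find(v)
                sets_left -= 1
                chosen.add(frozenset((u, v)))
                add_path(u)
                add_path(v)
    return chosen
```

The search stops as soon as all of S is in one set, so on a huge graph it typically touches only the neighbourhood of S rather than all of G.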
Another way to think about doing exactly the same thing:
Sort all the edges in E according to their distance from S. You can use BFS discovery order for this.
Use Kruskal's algorithm to generate a spanning tree for G, processing the edges in that order (https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
Pick a root in S, and remove any subtrees that don't contain a member of S. When you're done, every leaf will be in S.
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.
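The Kruskal formulation can be sketched the same way. Assumptions: adj is again a hypothetical undirected adjacency dict, and the "remove subtrees without a member of S" step is implemented as repeated pruning of leaves that are not in S, which leaves the same tree. Unlike the union-find version, this explores the whole component before pruning.

```python
from collections import defaultdict, deque


def kruskal_then_prune(adj, S):
    """adj: vertex -> neighbours, direction ignored. Returns frozenset edges."""
    # 1. Order edges by distance from S using multi-source BFS discovery order.
    seen, queue, order = set(S), deque(S), []
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            order.append((u, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)

    # 2. Kruskal: keep an edge only if it joins two different trees.
    uf = {}

    def find(x):
        uf.setdefault(x, x)
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    tree = defaultdict(set)
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru != rv:
            uf[ru] = rv
            tree[u].add(v)
            tree[v].add(u)

    # 3. Prune leaves outside S until every remaining leaf is in S.
    keep = set(S)
    stack = [x for x in tree if len(tree[x]) == 1 and x not in keep]
    while stack:
        leaf = stack.pop()
        if len(tree[leaf]) != 1:
            continue
        (p,) = tree[leaf]
        tree[leaf].clear()
        tree[p].discard(leaf)
        if len(tree[p]) == 1 and p not in keep:
            stack.append(p)
    return {frozenset((u, v)) for u in tree for v in tree[u]}
```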

How can I find a way to minimize the number of edges?

I am thinking about an algorithm to solve the problem below:
A graph composed of vertices and edges is given.
There are N customers who want to travel from a vertex to another vertex.
Each customer requirement needs directed edges connecting the two vertices.
The problem is how to find the minimum number of edges that satisfies all customers' requirements.
Here is a simple example:
Customer 1 wants to travel from vertex a to vertex b.
Customer 2 wants to travel from vertex b to vertex c.
Customer 3 wants to travel from vertex a to vertex c.
The simplest way is to give an edge to each customer:
edge 1: vertex a -> vertex b
edge 2: vertex b -> vertex c
edge 3: vertex a -> vertex c
But actually only 2 edges (i.e. edge 1 and edge 2) are needed to satisfy all three customer requirements.
If the number of customers is large, how can I find the minimum set of edges that satisfies all customer requirements?
Is there an algorithm to solve this problem?
You can model the problem as a mixed integer program. You can define binary variables for "arc a -> b is used" and "customer c uses arc a -> b" and write down the requirements as linear inequalities. If your graph is not too large, you can solve such models in reasonable time with a mixed integer programming solver (CPLEX, GUROBI, but there are also free alternatives on the web).
I know that this solution requires some work if you are not familiar with linear programming, but it is guaranteed to find an optimal solution in finite time, and you can probably solve instances with (say) 1000 customers and 1000 arcs.
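One way to write down the model described above, as a sketch. The symbols are assumptions introduced for illustration: A is the set of candidate arcs, customer k travels from s_k to t_k (with s_k different from t_k), x_{ij} = 1 if arc (i,j) is used, and y^k_{ij} = 1 if customer k routes over arc (i,j).

```latex
\begin{aligned}
\min\ & \sum_{(i,j)\in A} x_{ij} \\
\text{s.t. } & \sum_{j:(i,j)\in A} y^k_{ij} - \sum_{j:(j,i)\in A} y^k_{ji} =
  \begin{cases} 1 & i = s_k \\ -1 & i = t_k \\ 0 & \text{otherwise} \end{cases}
  && \forall k,\ \forall i \\
& y^k_{ij} \le x_{ij} && \forall k,\ \forall (i,j)\in A \\
& x_{ij},\ y^k_{ij} \in \{0,1\} && \forall k,\ \forall (i,j)\in A
\end{aligned}
```

The flow-conservation rows force each customer to have a path from s_k to t_k, and the linking rows y^k_{ij} <= x_{ij} make every arc a customer uses count toward the objective.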
If you have N vertices, you can always construct a solution with N (directed) edges. Just create a directed cycle V_1 -> V_2 -> V_3 -> ... -> V_N -> V_1. You can never have a directed path from every vertex V_a to every other vertex V_b with fewer edges: with fewer than N edges the underlying graph is at best a tree, which necessarily contains a leaf. The leaf is either unreachable (if its edge points out of the leaf) or a sink that cannot reach anything else (if its edge points into the leaf).
No need to use any new algorithm; you can use BFS/DFS.
For each customer request (source, destination):
    find whether a path from source to destination already exists among the added edges
    if not:
        add a direct edge from source to destination
        count++
return count
The key part is that, instead of looping through the original graph, we search only through the newly added edges.
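A runnable Python version of the pseudocode above, as a sketch: requests are assumed to be a list of (source, destination) pairs, and the function names are hypothetical. Note that the count depends on the order in which requests are processed: the example's requests in the order a->b, b->c, a->c give 2 edges, but a->c, a->b, b->c give 3.

```python
from collections import defaultdict, deque


def reachable(adj, s, t):
    """BFS over the edges added so far."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def greedy_edge_count(requests):
    """Add edge s->t only when t is not yet reachable from s via added edges."""
    added = defaultdict(list)  # adjacency of the newly added edges only
    count = 0
    for s, t in requests:
        if not reachable(added, s, t):
            added[s].append(t)
            count += 1
    return count
```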
You can use a disjoint-set data structure:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
while (num_edges--) {   // next requested edge (vertex_a, vertex_b)
    if (root(vertex_a) != root(vertex_b)) {
        count++;
        union(vertex_a, vertex_b);
    }
}
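The disjoint-set loop above, as a runnable Python sketch (the function name count_union_find is hypothetical). One caveat worth keeping in mind: union-find ignores edge direction, so it reproduces the count of 2 for the a->b, b->c, a->c example, but it also counts a->b plus b->a as a single edge, even though a directed solution needs two.

```python
def count_union_find(requests):
    """Count edges kept by the disjoint-set rule: one per merge of two sets."""
    parent = {}

    def root(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = 0
    for a, b in requests:
        ra, rb = root(a), root(b)
        if ra != rb:
            count += 1
            parent[ra] = rb  # union
    return count
```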
If I think of the same problem for undirected edges, what we are looking for is the minimum spanning tree (MST) of the original graph (constructed from all edges). The brief explanation is that for each edge E (v1 -> v2), if there is a second path from v1 to v2, there exists a cycle, and for each existing cycle there is an edge we can omit.
For the directed analogue of an MST (a minimum spanning arborescence), you can use the Chu–Liu/Edmonds algorithm.
Note that you would be assigning a weight of 1 to all of your edges.

Find paths that cover all edges between two nodes

We hope you are able to help us with the following problem:
A directed graph that may contain cycles is given. One has to find a set of paths that fulfills the following criterion:
All edges that can be traversed on the way from node A to node B must be covered by the paths within the set (one edge can be part of more than one path in the set).
The solution does not necessarily have to use the lowest number of paths, and the paths do not have to be the shortest ones. However, the solution should be efficiently implementable in a programming language such as Java. We need the solution to generate a few test cases, and it is important to cover all edges between node A and node B.
Does anyone know a suitable algorithm, or does no efficient solution exist?
Thanks a lot in advance for your advice! (We have already searched for a solution, but the ones we found focused on shortest paths and were extremely inefficient.)
Here is a graphical representation of our problem:
http://i.stack.imgur.com/wIY34.jpg
Consider all edges R(A) reachable from A. They can be found by adding a node on each edge (i.e. turning each edge U->V into U->X->V) and then performing a breadth-first search starting from A.
Edges outside of R(A) clearly cannot be on a path from A to B, since then they'd be reachable from A. So all paths to B must go through edges of R(A).
So the set of edges U that we want to "cover" consists of all edges of R(A) from which B is reachable.
Now we are looking for a set of paths S from A to B, which contains all edges of U.
A straightforward method is the following:
Color all edges of R(A) black and set S={ }
While there are black edges remaining:
Take a black edge UV.
If B is reachable from V:
Construct a path P = A -> ... -> U -> V -> ... -> B
Color all edges of P as gray
Add P to S
Else:
Color UV as gray.
Then return S
As @user189 pointed out, if we consider edges reachable from A that go through B, we are allowing paths that go through B twice (e.g. a->b->c->g->f->e in the example image).
His suggested solution (removing the node B before computing R(A)) fixes this.
Regarding complexity:
R(A) can be computed in O(|E|) time and the paths from A to an edge UV in R(A) can be directly read from the BFS tree. To check for reachability to B from V and to find the path, we can use a BFS tree starting from B and following edges backwards, computed in O(|E|) time.
If we reference the paths implicitly through the edge UV that connects the two BFS trees, and use an O(1) read/update structure to maintain the set of black edges and to look up edges in the BFS trees, I think we can do this in O(|E|) time.
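A Python sketch of the colouring procedure (Java would work the same way). It follows one reading of @user189's fix: B is never expanded in the forward search, so edges leaving B are excluded from R(A) while edges into B are kept. The input format (a list of (u, v) pairs) and the function name are assumptions. Unlike the implicit representation discussed above, this builds every path explicitly, and on graphs with cycles a returned walk may repeat vertices.

```python
from collections import deque


def covering_paths(edges, A, B):
    """Return A->B walks that together cover every edge of R(A) that can reach B."""
    out, back = {}, {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
        back.setdefault(v, []).append(u)

    # Forward BFS from A; B is treated as a dead end (never expanded).
    fpred, reach_edges, queue = {A: None}, [], deque([A])
    while queue:
        u = queue.popleft()
        if u == B:
            continue
        for v in out.get(u, ()):
            reach_edges.append((u, v))        # edge belongs to R(A)
            if v not in fpred:
                fpred[v] = u
                queue.append(v)

    # Backward BFS from B: bnext[x] is x's next hop on a shortest x->B walk.
    bnext, queue = {B: None}, deque([B])
    while queue:
        v = queue.popleft()
        for u in back.get(v, ()):
            if u not in bnext:
                bnext[u] = v
                queue.append(u)

    covered, paths = set(), []                # "gray" edges and the set S
    for u, v in reach_edges:                  # each black edge UV
        if (u, v) in covered or v not in bnext:
            continue                          # already gray, or B unreachable
        head = [u]
        while fpred[head[-1]] is not None:    # A -> ... -> U from the BFS tree
            head.append(fpred[head[-1]])
        path = head[::-1] + [v]
        while path[-1] != B:                  # V -> ... -> B
            path.append(bnext[path[-1]])
        covered.update(zip(path, path[1:]))
        paths.append(path)
    return paths
```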

Cycle detection in a Multigraph

I would like to list all the cycles in an undirected multigraph.
Tarjan's strongly connected components algorithm was written for directed graphs. Will it work for multigraphs? If not, is there a cycle-listing algorithm for undirected multigraphs?
There are a few ways to reduce your problem to Tarjan, depending on how you want to count cycles.
First, apply two transformations to your graph:
Convert to a directed graph by replacing each undirected edge with a pair of opposing directed edges.
For each pair of nodes, collapse edges pointing the same direction into a single edge.
You'll be left with a directed graph. Apply Tarjan's algorithm.
Now, depending on what you consider a cycle, you may or may not be done. If a cycle is a set of nodes (that happen to possess the required edges), then you can read the cycles directly off the transformed graph.
If a cycle is a set of edges (sharing the required nodes), then you need to "uncollapse" the edges introduced in step 2 above. For each collapsed edge, enumerate the set of real edges it replaced. Doing so for each edge in each collapsed cycle will yield all actual cycles, at the cost of a combinatorial explosion. Note that this will generate spurious two-cycles, which you'll need to prune.
To illustrate, suppose the original graph has three nodes A, B and C, with two edges between A and B, one between B and C and one between A and C. The collapsed graph will be a triangle, with one cycle.
Having found a cycle between the three nodes, walk each combination of edges to recover the full set of cycles. Here, there are two cycles: both include the A to C and B to C edges. They differ in which A to B edge they choose.
If the original graph also had two edges between B and C, then there would be four expanded cycles. The total number of expanded cycles is the product of the edge counts: 4 = 2 * 2 * 1.
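The uncollapsing step can be sketched with itertools.product, one factor per hop of the collapsed cycle. The input shapes are assumptions: a collapsed cycle is given as a node sequence, and multi_edges maps each unordered node pair to the ids of its parallel edges. Combinations that reuse one physical edge (the spurious two-cycles) are filtered out; a genuine two-cycle over two parallel edges still appears once per traversal direction.

```python
from itertools import product


def expand_cycle(cycle, multi_edges):
    """Expand one collapsed cycle into every concrete cycle over real edges.

    cycle: node sequence of a cycle in the collapsed graph, e.g. ["A", "B", "C"].
    multi_edges: frozenset({u, v}) -> list of ids of the parallel u-v edges.
    """
    hops = [frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
            for i in range(len(cycle))]
    combos = product(*(multi_edges[h] for h in hops))
    # Drop walks that reuse one physical edge, e.g. A -> B -> A over a single
    # edge: these are the spurious two-cycles that need pruning.
    return [c for c in combos if len(set(c)) == len(c)]
```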

Minimize set of edges in a directed graph keeping connected components

Here is the full question:
Assume we have a directed graph G = (V,E), we want to find a graph G' = (V,E') that has the following properties:
G' has the same strongly connected components as G
G' has the same component graph as G
E' is minimized. That is, E' is as small as possible.
Here is what I got:
First, run the strongly connected components algorithm. Now we have the strongly connected components. Now go to each strongly connected component and within that SCC make a simple cycle; that is, a cycle where the only nodes that are repeated are the start/finish nodes. This will minimize the edges within each SCC.
Now, we need to minimize the edges between the SCCs. Alas, I can't think of a way of doing this.
My 2 questions are: (1) Does the algorithm prior to the part about minimizing edges between SCCs sound right? (2) How does one go about minimizing the edges between SCCs.
For (2), I know that this is equivalent to minimizing the number of edges in a DAG. (Think of the SCCs as the vertices). But this doesn't seem to help me.
The algorithm seems right, as long as you allow for closed walks (i.e., walks that repeat vertices). Proper cycles might not exist (e.g. in an "8"-shaped component), and finding them is NP-hard.
It seems that it is sufficient to group the inter-component edges by ordered pairs of components they connect and leave only one edge in each group.
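A sketch of the whole construction in Python, with Kosaraju's algorithm used for the SCC step for brevity (any SCC algorithm works). The within-SCC step follows the question's plan literally and assumes E' may contain edges that are not in E; if E' must be a subset of E, that step needs the closed-walk treatment mentioned above instead. The between-SCC step keeps one original edge per ordered pair of components, as suggested. Function names are hypothetical.

```python
from collections import defaultdict


def scc_ids(nodes, edges):
    """Kosaraju's algorithm (iterative): map each node to a component id."""
    out, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        rev[v].append(u)
    seen, finish = set(), []
    for s in nodes:                      # pass 1: record DFS finish order
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            node, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                stack.pop()
                finish.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(out[nxt])))
    comp = {}
    for s in reversed(finish):           # pass 2: DFS on the reversed graph
        if s in comp:
            continue
        comp[s], stack = s, [s]
        while stack:
            u = stack.pop()
            for w in rev[u]:
                if w not in comp:
                    comp[w] = s
                    stack.append(w)
    return comp


def minimize_edges(nodes, edges):
    comp = scc_ids(nodes, edges)
    members = defaultdict(list)
    for n in nodes:
        members[comp[n]].append(n)
    new_edges = []
    # Within each SCC: one cycle through its members (may add edges not in E).
    for group in members.values():
        if len(group) > 1:
            new_edges += list(zip(group, group[1:] + group[:1]))
    # Between SCCs: keep one original edge per ordered pair of components.
    kept = {}
    for u, v in edges:
        cu, cv = comp[u], comp[v]
        if cu != cv and (cu, cv) not in kept:
            kept[(cu, cv)] = (u, v)
    return new_edges + list(kept.values())
```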
Regarding step 2, minimizing the edges between the SCCs: you could randomly select a vertex and run DFS, keeping only the longest path for each (root, end) pair while removing the other paths. Store all the vertices searched in a list L.
Choose another vertex; if it exists in L, skip to the next vertex; if not, repeat the procedure above.
