Find paths that cover all edges between two nodes - algorithm

We hope you are able to help us with the following problem:
A directed graph that may contain cycles is given. One has to find a set of paths that fulfills the following criterion:
all edges that can be traversed on the way from node A to node B must be covered by the paths within the set (one edge can be part of more than one path from the set).
The solution does not necessarily have to be the one with the lowest number of paths, and the paths do not necessarily have to be the shortest ones. However, the solution should be efficiently implementable in a programming language such as Java. We need the solution to generate a few test cases, and it is important to cover all edges between a node A and a node B.
Does anyone know a suitable algorithm? Or does no efficient solution exist?
Thanks a lot in advance for your advice! (We have already searched for a solution, but the one we found focused on shortest paths and was extremely inefficient.)
Here is a graphical representation of our problem:
http://i.stack.imgur.com/wIY34.jpg

Consider all edges R(A) reachable from A. They can be found by adding a node on each edge (i.e. turning each edge U->V into U->X->V) and then performing a breadth-first search starting from A.
Edges outside of R(A) clearly cannot be on a path from A to B, since then they would be reachable from A. So all paths to B must go through edges of R(A).
So the set of edges U that we want to "cover" consists of all edges of R(A) from which B is reachable.
Now we are looking for a set S of paths from A to B that together contain all edges of U.
A straightforward method is the following:
Color all edges of R(A) black and set S = { }.
While there are black edges remaining:
    Take a black edge UV.
    If B is reachable from V:
        Construct a path P = A -> ... -> U -> V -> ... -> B.
        Color all edges of P gray.
        Add P to S.
    Else:
        Color UV gray.
Then return S.
As #user189 pointed out, if we consider edges reachable from A via B, we allow paths that go through B twice (e.g. a->b->c->g->f->e in the example image).
His suggested solution (removing node B before computing R(A)) fixes this.
Regarding complexity:
R(A) can be computed in O(|E|) time, and the paths from A to an edge UV in R(A) can be read directly from the BFS tree. To check whether B is reachable from V and to find the path, we can use a BFS tree starting from B and following edges backwards, computed in O(|E|) time.
If we reference the paths implicitly through the edge UV that connects the two BFS trees, and use an O(1) read/update structure to maintain the set of black edges and to look up edges in the BFS trees, I think we can do this in O(|E|) time.
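A minimal Python sketch of the black/gray procedure above, assuming the graph is given as a list of (u, v) edge pairs; `covering_paths` and the other names are illustrative. It includes the "remove B" fix by never following B's out-edges in the forward BFS, and it reads A->U prefixes from the forward BFS tree and V->B suffixes from the backward BFS tree rooted at B. The returned items are walks, so they may repeat vertices when the graph has cycles.

```python
from collections import deque

def covering_paths(edges, a, b):
    """Return A->B walks (node lists) that together cover every edge lying on
    some A->B walk that reaches B only at its end."""
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)

    # Forward BFS from A. Out-edges of B are never followed ("remove B" fix).
    parent_fwd = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            continue
        for v in succ.get(u, []):
            if v not in parent_fwd:
                parent_fwd[v] = u
                queue.append(v)

    # Backward BFS from B over reversed edges: next_bwd[v] is the next vertex
    # on a shortest v -> B path (None for B itself).
    next_bwd = {b: None}
    queue = deque([b])
    while queue:
        v = queue.popleft()
        for u in pred.get(v, []):
            if u not in next_bwd:
                next_bwd[u] = v
                queue.append(u)

    def prefix(u):  # A -> ... -> u, read from the forward BFS tree
        path = []
        while u is not None:
            path.append(u)
            u = parent_fwd[u]
        return path[::-1]

    def suffix(v):  # v -> ... -> B, read from the backward BFS tree
        path = []
        while v is not None:
            path.append(v)
            v = next_bwd[v]
        return path

    gray, paths = set(), []
    for u, v in edges:
        # Only black edges of R(A) are considered; B's out-edges are excluded.
        if (u, v) in gray or u not in parent_fwd or u == b:
            continue
        if v in next_bwd:  # B is reachable from v
            p = prefix(u) + suffix(v)
            gray.update(zip(p, p[1:]))
            paths.append(p)
        else:
            gray.add((u, v))
    return paths
```

Each walk is materialized as an explicit list here, so this sketch does not reach the O(|E|) bound described above, which relies on referencing paths implicitly through the connecting edge UV.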

Related

In a DAG, how to find vertices where paths converge?

I have a type of directed acyclic graph, with some constraints.
There is only one "entry" vertex
There can be multiple leaf vertices
Once a path splits, anything under that path cannot reach into the other path (this will become clearer with some examples below)
There can be any number of "split" vertices. They can be nested.
A "split" vertex can split into any number of paths. The examples below only show 2 paths for each, but it could be more.
My challenge is the following: for each "split" vertex (any vertex that has at least 2 outgoing edges), find the vertices where its paths reconnect - if such a vertex exists. The solution should be as efficient as possible.
Example A:
In this example, vertex A is a "split" vertex, and its "reconnect vertex" is F.
Example B:
Here, there are two split vertices: A and E. For both of them vertex G is the reconnect vertex.
Example C:
Now there are three split vertices: A, D and E. The corresponding reconnect vertices are:
A -> K
D -> K
E -> J
Example D:
Here we have three split vertices again: A, D and E. But this time, vertex E doesn't have a reconnect vertex because one of the paths terminates early.
Sounds like what you want is:
Connect each vertex with out-degree 0 to a single terminal vertex
Construct the dominator tree of the edge-reversed graph. The linked Wikipedia article points to a couple of algorithms for doing this.
The "reconnect vertex" for a split vertex is its immediate dominator in the edge-reversed graph, i.e., its parent in that dominator tree. This is called its "postdominator" in your original graph. If it's the terminal vertex that you added, then it doesn't have a reconnect vertex in your original graph.
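A rough Python sketch of these three steps for the DAG case, assuming the graph is a dict mapping each vertex to its list of distinct successors; `reconnect_vertices` and the `EXIT` terminal name are hypothetical. Instead of one of the fast dominator-tree algorithms, it uses the simple iterative set-based formulation of postdominators, which is easy to check but slower on large graphs.

```python
EXIT = "__exit__"  # hypothetical name for the added terminal vertex

def reconnect_vertices(succ):
    """succ maps each vertex of a DAG to its list of successors.
    Returns {split_vertex: reconnect_vertex, or None if there is none}."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    # Step 1: connect every out-degree-0 vertex to a single terminal vertex.
    out = {n: list(succ.get(n, [])) or [EXIT] for n in nodes}
    out[EXIT] = []
    # Step 2: postdominator sets, i.e. dominators of the edge-reversed graph,
    # via the classic iterative data-flow fixed point (start from "all vertices").
    everything = nodes | {EXIT}
    pdom = {n: set(everything) for n in everything}
    pdom[EXIT] = {EXIT}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = {n} | set.intersection(*(pdom[s] for s in out[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    # Step 3: the immediate postdominator is the nearest strict postdominator;
    # strict postdominators form a chain, so it is the one with the largest set.
    result = {}
    for n in nodes:
        if len(out[n]) >= 2:  # a "split" vertex
            ipdom = max(pdom[n] - {n}, key=lambda d: len(pdom[d]))
            result[n] = None if ipdom == EXIT else ipdom
    return result
```

For instance, if A splits into B and C, B splits into D and E, D and E both lead to F, and F and C both lead to G, the sketch maps A to G and B to F; if one of B's branches ends early instead, B maps to None.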
This is the problem of identifying post-dominators in compilers and program analysis. This is often used in the context of calculating control dependences in control flow graphs. "Advanced Compiler Design and Implementation" is a good reference on these topics.
If the graph does not have cycles, then the solution (a) suggested by #matt-timmermans will work.
If the graph has cycles, then solution (a) can report spurious post-dominators. In such cases, a network-flow-based approach works better. The algorithm for calculating non-termination-sensitive control dependence in this paper uses this approach. The basic idea is:
at every split node, inject a unique token into the graph along each outgoing edge and
propagate the tokens through the graph subject to this constraint: if node n is reachable from split node m, then tokens arriving at node m pass through node n only if all tokens of node m have arrived at node n.
At the end, node n post-dominates node m if all tokens of node m have arrived at node n.

Find Minimum Vertex Connected Sub-graph

First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E) where |V| is about 16 million and |E| is about 180 million.
For a given set S, which is a subset of V (the size of S will be around 30), is it possible to find a weakly connected sub-graph G'=(V',E') where S is a subset of V', while keeping |V'| and |E'| as small as possible?
The graph G may change and I hope there's a way to find the sub-graph in real time. (When a process is writing into G, G will be locked, so don't worry about G changing while your sub-graph calculation is still running.)
My current solution is to find the shortest path for each pair of vertices in S and merge those paths to get the sub-graph. The result is OK but the running time is pretty expensive.
Is there a better way to solve this problem?
If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Do a breadth-first-search of the graph, starting with S as the root set.
When the search discovers a new vertex, remember its predecessor and assign it to the same set as its predecessor.
When you discover an edge that connects two sets, merge the sets and follow the predecessor links to add the connecting path to G'
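A sketch of this multi-source BFS with a disjoint-set structure, in Python, assuming adjacency lists that already contain each directed edge in both directions (weak connectivity ignores direction); `connect_terminals` is an illustrative name. It stops as soon as all vertices of S are in one set and returns the edges of G' as unordered pairs.

```python
from collections import deque

def connect_terminals(adj, terminals):
    """adj: undirected adjacency lists. Returns the edges of G' as frozensets."""
    uf = {}  # disjoint-set forest: vertex -> parent

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    pred = {}  # BFS predecessor links
    for s in terminals:
        uf[s], pred[s] = s, None
    sets_left = len(pred)  # one set per distinct vertex of S
    sub_edges = set()
    queue = deque(pred)
    while queue and sets_left > 1:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in pred:  # new vertex joins its predecessor's set
                pred[v] = u
                uf[v] = find(u)
                queue.append(v)
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            uf[ru] = rv  # merge the two sets
            sets_left -= 1
            sub_edges.add(frozenset((u, v)))
            # Follow predecessor links back to S on both sides of the edge.
            for x in (u, v):
                while pred[x] is not None:
                    e = frozenset((x, pred[x]))
                    if e in sub_edges:
                        break  # the rest of this chain is already in G'
                    sub_edges.add(e)
                    x = pred[x]
    return sub_edges
```

Every vertex is visited at most once, so the work is roughly linear in the part of the graph explored before the sets merge, rather than one shortest-path search per pair of vertices in S.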
Another way to think about doing exactly the same thing:
Sort all the edges in E according to their distance from S. You can use BFS discovery order for this
Use Kruskal's algorithm to generate a spanning tree for G, processing the edges in that order (https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
Pick a root in S, and remove any subtrees that don't contain a member of S. When you're done, every leaf will be in S.
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.

Show that the heuristic solution to vertex cover is at most twice as large as the optimal solution

The heuristic solution that I've been given is:
Perform a depth-first-search on the graph
Delete all the leaves
The remaining graph forms a vertex cover
I've been given the question: "Show that this heuristic is at most twice as large as the optimal solution to the vertex cover". How can I show this?
I assume that the graph is connected (if it's not the case, we can solve this problem for each component separately).
I also assume that the dfs-tree is rooted and that a leaf is a vertex that has no children in the rooted dfs-tree (this matters: if we define a leaf differently, the algorithm may not work).
We need to show two things:
The set of vertices returned by the algorithm is a vertex cover. Indeed, there can be only two types of edges with respect to the dfs-tree of any undirected graph: tree edges (such an edge is covered because at least one of its endpoints, the parent, is not a leaf) and back edges (again, one of the endpoints is not a leaf, because a back edge goes from a vertex to its ancestor, and a leaf cannot be an ancestor of another vertex).
Let's consider the dfs-tree and ignore the rest of the edges. I'll show that the tree edges cannot be covered using fewer than half as many vertices as the heuristic returns. Let S be a minimum vertex cover and let H be the set of non-leaf vertices returned by the heuristic. Consider a vertex v such that v is not a leaf and v is not in S (that is, v is returned by the heuristic but it's not in the optimal answer). Since v is not a leaf, there is an edge v -> u in the dfs-tree (where u is a child of v). The edge v -> u is covered by S and v is not in S, so u is in S. Let's define a mapping f from the vertices returned by the heuristic that are not in S as f(v) = u (where v and u have the same meaning as in the previous sentence). Note that v is the parent of u in the dfs-tree. But any vertex has only one parent in a tree! Thus, f is an injection from H \ S into S. It means that the number of vertices in the set returned by the heuristic but not in the optimal answer is not greater than the size of the optimal answer, so |H| = |H ∩ S| + |H \ S| ≤ |S| + |S| = 2|S|. That's exactly what we needed to show.
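A minimal Python sketch of the heuristic, using the leaf definition from the answer above (a vertex without children in the rooted DFS tree) and assuming a connected undirected graph given as adjacency lists; `dfs_tree_vertex_cover` is an illustrative name. The DFS is iterative to avoid Python's recursion limit on large graphs.

```python
def dfs_tree_vertex_cover(adj, root):
    """Non-leaf vertices of a DFS tree rooted at root (connected undirected graph).
    A leaf here means a vertex with no children in the rooted DFS tree."""
    visited = {root}
    has_child = set()
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbours = stack[-1]
        for v in neighbours:
            if v not in visited:
                visited.add(v)
                has_child.add(u)  # v becomes a tree child of u
                stack.append((v, iter(adj[v])))
                break
        else:  # all neighbours of u explored: backtrack
            stack.pop()
    return has_child
```

For example, on a triangle the DFS tree is a path through all three vertices, so the sketch returns its two non-leaf vertices.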
Bad news: the heuristic does not work.
Strictly speaking, a single isolated vertex is a counterexample for the question.
Moreover, the heuristic does not produce a vertex cover at all, even if you correct it for isolated vertices and for 2-point cliques.
Take a look at fully connected graphs with 1 to 3 vertices:
1 - strictly speaking, an isolated vertex is not a leaf (it has degree 0, while a leaf is a vertex with degree 1), so the heuristic will keep it, while a vertex cover will not
2 - the heuristic will drop both leaves, while a vertex cover must keep at least 1 of them
3 - the heuristic will leave 1 vertex, while a vertex cover has to keep at least 2 vertices of this clique

Shortest paths with 2 constraints (Weight and Colour)

Input: We have a directed graph G=(V,E) in which each edge has a weight and a colour {red, green}. We are also given a starting node s.
Problem/Algorithm: Can we find for all u edges of G, the shortest paths s-u with at most k red edges?
First approach: We save for each node the shortest path with 0, 1, ..., k red edges. We modify Dijkstra's algorithm and, depending on the colour of the edge we are looking at, we update the corresponding distances. This approach fails due to its complexity.
Second approach: We make k+1 copies of the graph G (G1, G2, ..., Gk+1). To enforce the constraint of at most k red edges, while we are searching for shortest paths with Dijkstra, every time we "meet" a red edge {ui, vi} in Gi, we connect ui with vi+1 in Gi+1. Because Gk+1 doesn't have any red edges, no path can use more than k red edges. But it fails. For example, with k=2, if a shortest path with 2 red edges is found to node X, then the algorithm will not take into consideration a heavier path with fewer red edges that could lead to an undiscovered node. (If I had enough reputation I could post an image as an example.) Any ideas?
I think your approaches are actually equivalent, provided that for approach #1, you record only the shortest distance to each node for each number of red edges used -- you don't need to record the entire path (just as you don't need to record it for ordinary Dijkstra on an ordinary shortest path problem)
Also this approach is sound. In particular, your reasoning that approach #2 is faulty is itself wrong: for any node X in the original graph, there is no single corresponding node X in the new graph; instead there are separate vertices for each number of red edges used. So the two paths "to X" you are considering are not actually to the same node: one is to (X, 2 red edges used) and one is to e.g. (X, 1 red edge used). Then you can use a single Dijkstra run to calculate shortest paths to all k+1 copies of every vertex (i.e. to the vertices (v, i red edges used) for each 0 <= i <= k and for each v in V(G)), and return the lowest. (I'm assuming here that when you wrote "Can we find for all u edges of G, the shortest paths s-u", you meant "for all nodes u of G, the shortest paths s-u".)
Finally, you need to make sure that for any red edge {u, v} in G, you delete the corresponding edge {ui, vi} for all Gi (as well as add in the edge {ui, vi+1}). You probably intended this, but you weren't explicit about it.
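A sketch of the single Dijkstra run over (vertex, red edges used) states described above, in Python, assuming non-negative weights. A state (v, i) plays the role of vertex v in the copy reached after i red edges, so the k+1 copies are never built explicitly; the function name and edge format are illustrative.

```python
import heapq

def shortest_with_red_limit(edges, s, k):
    """edges: (u, v, weight, colour) tuples with colour "red" or "green" and
    non-negative weights. Returns {v: shortest s-v distance using <= k red edges}."""
    adj = {}
    for u, v, w, colour in edges:
        adj.setdefault(u, []).append((v, w, colour == "red"))
    inf = float("inf")
    dist = {(s, 0): 0}  # state (v, i) = "v reached with i red edges used"
    heap = [(0, s, 0)]
    while heap:
        d, u, used = heapq.heappop(heap)
        if d > dist.get((u, used), inf):
            continue  # stale heap entry
        for v, w, is_red in adj.get(u, []):
            nxt = used + is_red  # a red edge moves to the next copy
            if nxt > k:
                continue  # no red edges leave the last copy
            nd = d + w
            if nd < dist.get((v, nxt), inf):
                dist[(v, nxt)] = nd
                heapq.heappush(heap, (nd, v, nxt))
    best = {}
    for (v, _), d in dist.items():  # minimum over the k+1 copies of each vertex
        best[v] = min(d, best.get(v, inf))
    return best
```

With edges s->X (red, 1), s->Y (green, 5), Y->X (green, 1), X->Z (red, 1) and k=1, the state (X, 1 red edge) is settled first at distance 1, but the heavier state (X, 0 red edges) at distance 6 is kept separately and is the one that reaches Z (distance 7), which is exactly the case the question worried about.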

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D holds the distances (path lengths) between the corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
It is easy to trip up this algorithm so that it gives suboptimal solutions. Example case: an equilateral triangle with vertices at the corners, at the midpoints of the sides and in the middle of the triangle, and edges along the sides and from the corners to the middle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
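A Python sketch of the three steps, assuming an unweighted, connected, undirected graph given as adjacency lists. The distances D come from one BFS per vertex of N, and the minimum spanning tree of G' is built with a naive Prim's algorithm; both choices, and the function names, are assumptions made for brevity.

```python
from collections import deque

def bfs_tree(adj, src):
    """Parent pointers and hop distances of a BFS from src."""
    parent, depth = {src: None}, {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth

def steiner_mst_approx(adj, terminals):
    """MST of the terminal distance graph G'(N, D), expanded back into G."""
    terms = list(dict.fromkeys(terminals))      # N, duplicates removed
    bfs = {t: bfs_tree(adj, t) for t in terms}  # one BFS per terminal gives D

    def path(a, b):  # a shortest a-b path in G, read from b's BFS tree
        parent, p = bfs[b][0], []
        while a is not None:
            p.append(a)
            a = parent[a]
        return p

    # Prim's algorithm on the complete graph over N with weights D.
    in_tree = {terms[0]}
    nodes, edges = {terms[0]}, set()
    while len(in_tree) < len(terms):
        a, b = min(((x, y) for x in in_tree for y in terms if y not in in_tree),
                   key=lambda xy: bfs[xy[1]][1][xy[0]])
        in_tree.add(b)
        p = path(a, b)  # "expand" the MST edge a-b into a path of G
        nodes.update(p)
        edges.update(frozenset(e) for e in zip(p, p[1:]))
    return nodes, edges
```

The naive Prim's loop is cubic in |N|; for small N that is negligible next to the |N| breadth-first searches.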
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from T and from V'.
- repeat loop until all leaves in T are in N.
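A sketch of solution (a) in Python, assuming an unweighted, connected graph given as adjacency lists; since every spanning tree of an unweighted graph has the same total weight, a BFS tree serves as the minimum spanning tree here. The first vertex of N is used as the BFS root (an arbitrary choice), and the function name is illustrative.

```python
from collections import deque

def prune_spanning_tree(adj, terminals):
    """Spanning tree of G, then repeatedly delete leaves that are not in N.
    Returns the remaining tree as adjacency sets (its keys are V')."""
    keep = set(terminals)
    root = terminals[0]
    # In an unweighted graph every spanning tree is minimum; BFS builds one.
    tree = {root: set()}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in tree:
                tree[v] = {u}
                tree[u].add(v)
                queue.append(v)
    # Delete leaves not in N until every leaf is in N.
    leaves = deque(v for v, nbrs in tree.items() if len(nbrs) == 1 and v not in keep)
    while leaves:
        v = leaves.popleft()
        for u in tree.pop(v):
            tree[u].discard(v)
            if len(tree[u]) == 1 and u not in keep:
                leaves.append(u)  # u has just become a non-terminal leaf
    return tree
```

Deleting a leaf can turn its neighbour into a new leaf, which is why the queue is refilled inside the loop instead of scanning the tree again.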
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path between any node in T and any node in N. The shortest path is p: {v, ... , u}, where v is in T and u is in N.
2) every node in p is added to T and to V'.
3) every node in p that is in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm, compute all shortest paths in G using any known efficient algorithm.
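A Python sketch of solution (b), assuming an unweighted, connected graph given as adjacency lists; the function name is illustrative. Rather than precomputing all shortest paths, it runs a multi-source BFS from the current tree in each round, which finds a shortest path from any node in T to the nearest node left in N; that substitution is a simplification made for this sketch, not part of the original answer.

```python
from collections import deque

def grow_shortest_paths(adj, terminals):
    """Grow T from one node of N by repeatedly attaching the shortest path
    from T to the nearest remaining node of N. Returns V'."""
    remaining = set(terminals)
    root = terminals[0]
    remaining.discard(root)
    tree = {root}
    while remaining:
        # Multi-source BFS from every node already in T.
        parent = {t: None for t in tree}
        queue = deque(tree)
        hit = None
        while queue and hit is None:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    if v in remaining:
                        hit = v  # nearest remaining node of N
                        break
                    queue.append(v)
        if hit is None:
            raise ValueError("graph is not connected")
        # Steps 2 and 3: add the path to T/V' and drop any nodes of N on it.
        x = hit
        while x is not None:
            tree.add(x)
            remaining.discard(x)
            x = parent[x]
    return tree
```

Each round costs one BFS, so the total is about |N| breadth-first searches, without storing all-pairs shortest paths.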
Personally, I used the following algorithm in one of my papers, but it is more suitable for distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(u) = 1.
We create the pair (w(u), id(u)) for each node u.
Each node u builds a multipoint relay set, that is, a set M(u) of 1-hop neighbors such that each 2-hop neighbor is a neighbor of at least one node in M(u). [The smaller M(u), the better the solution.]
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors.
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest pair (w(v), id(v)).
-- The trick when you execute this algorithm in a centralized manner is to compute 2-hop neighbors efficiently. The best I could do was to go from O(n^3) to O(n^2.37) using matrix multiplication.
-- I would really like to know the approximation ratio of this last solution.
I like this reference for heuristics of steiner tree:
The Steiner Tree Problem, by Frank Hwang, Dana Richards and Pawel Winter
You could try to do the following:
Create a minimal vertex cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex cover of the nodes in N'.
"Unpack" the nodes in N'.
I am not sure whether this gives an approximation within any specific bound. You could perhaps even trick the algorithm into making some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.
