Linear time algorithm to make a graph strongly connected

We have a weakly acyclic digraph.
We are also given a set A which holds the vertices of G that have in-degree zero and a set B which holds the vertices that have out-degree zero. (The size of A is smaller than the size of B.)
On top of that, we also know that the items in A and B can be put in a particular order (e.g. A = a1, a2, ..., am and B = b1, b2, ..., bn) such that a DFS started at ai visits bi (1 ≤ i ≤ m).
Is it possible to design a linear time algorithm which makes G strongly connected by adding to it as few edges as possible?

Add arcs bj -> aj+1 for j = 1, ..., m-1 and arcs bj -> a1 for j = m, ..., n.
The resulting graph is strongly connected for two reasons. First, the a's and b's are strongly connected to one another by the added arcs together with the original paths from ai to bi. Second, for every node x, there exist i, j such that there exists a path in the original graph from ai to x and a path in the original graph from x to bj.
We cannot use fewer arcs, because an outgoing arc must be added to each of b1, ..., bn.
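The construction above can be sketched in a few lines. This is only an illustration: the function name `arcs_to_add` and the list-based inputs are assumptions, and the lists are expected to be already ordered so that bi is reachable from ai.

```python
def arcs_to_add(a, b):
    """Return the arcs (tail, head) of the construction described above.

    a: sources a1..am, b: sinks b1..bn, ordered so that b[i] is reachable
    from a[i] for every i < m. Assumes 1 <= m <= n.
    """
    m, n = len(a), len(b)
    if not 1 <= m <= n:
        raise ValueError("expected 1 <= len(a) <= len(b)")
    # b_j -> a_{j+1} for j = 1..m-1 (0-based: b[j] -> a[j + 1], j = 0..m-2)
    arcs = [(b[j], a[j + 1]) for j in range(m - 1)]
    # b_j -> a_1 for j = m..n (0-based: j = m-1..n-1)
    arcs += [(b[j], a[0]) for j in range(m - 1, n)]
    return arcs


print(arcs_to_add(["a1", "a2"], ["b1", "b2", "b3"]))
```

Exactly n arcs are produced, one leaving each bj, and building the list takes time linear in m + n.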

Edited: the following does not produce a solution with the fewest links.
You can run http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm in linear time. I propose that you do this and note that "no strongly connected component will be identified before any of its successors". Therefore the first strongly connected component emitted from the graph cannot have any of the other components as a successor. I suggest that every time you emit a strongly connected component which has no successor, you add a link connecting it to this first component. I suggest that you also add a link every time you essentially restart the Tarjan algorithm with a non-recursive call to strongconnect(), connecting the first component to the vertex you are restarting at.
With these links you can get from the first strong component to every other component, and from every other component to the first strong component. Unfortunately this is not necessarily the solution with the fewest links; see the second comment by Per below.

Related

In a DAG, how to find vertices where paths converge?

I have a type of directed acyclic graph, with some constraints.
There is only one "entry" vertex
There can be multiple leaf vertices
Once a path splits, anything under that path cannot reach into the other path (this will become clearer with some examples below)
There can be any number of "split" vertices. They can be nested.
A "split" vertex can split into any number of paths. The examples below only show 2 paths for each, but it could be more.
My challenge is the following: for each "split" vertex (any vertex that has at least 2 outgoing edges), find the vertices where its paths reconnect - if such a vertex exists. The solution should be as efficient as possible.
Example A:
[Image: example A]
In this example, vertex A is a "split" vertex, and its "reconnect vertex" is F.
Example B:
[Image: example B]
Here, there are two split vertices: A and E. For both of them vertex G is the reconnect vertex.
Example C:
[Image: example C]
Now there are three split vertices: A, D and E. The corresponding reconnect vertices are:
A -> K
D -> K
E -> J
Example D:
[Image: example D]
Here we have three split vertices again: A, D and E. But this time, vertex E doesn't have a reconnect vertex because one of the paths terminates early.
Sounds like what you want is:
Connect each vertex with out-degree 0 to a single terminal vertex
Construct the dominator tree of the edge-reversed graph. The linked Wikipedia article points to a couple of algorithms for doing this.
The "reconnect vertex" for a split vertex is its immediate dominator in the edge-reversed graph, i.e., its parent in that dominator tree. This is called its "postdominator" in your original graph. If it's the terminal vertex that you added, then it doesn't have a reconnect vertex in your original graph.
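As a rough sketch of these three steps for the acyclic case only: in a DAG, the immediate postdominator of a vertex is the lowest common ancestor, in the postdominator tree, of its successors, so the tree can be built by visiting vertices with successors first. This is a DAG-specific shortcut, not one of the general dominator algorithms. The names `reconnect_vertices` and `EXIT` are hypothetical, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

EXIT = object()  # hypothetical virtual terminal vertex added after all leaves


def reconnect_vertices(succ):
    """succ: dict mapping each vertex of a DAG to a list of its successors.

    Returns {split vertex: reconnect vertex, or None if there is none}.
    """
    vertices = set(succ) | {w for ws in succ.values() for w in ws}
    # Step 1: connect every vertex with out-degree 0 to the terminal vertex.
    out = {v: list(succ.get(v, [])) or [EXIT] for v in vertices}
    out[EXIT] = []
    # TopologicalSorter expects node -> predecessors; passing successors
    # instead yields an order in which successors come before the vertex.
    order = TopologicalSorter(out).static_order()

    ipdom, depth = {EXIT: None}, {EXIT: 0}

    def lca(x, y):
        # Walk the deeper vertex up the postdominator tree until they meet.
        while x != y:
            if depth[x] < depth[y]:
                x, y = y, x
            x = ipdom[x]
        return x

    # Step 2: build the postdominator tree, successors first.
    for v in order:
        if v is EXIT:
            continue
        p = out[v][0]
        for w in out[v][1:]:
            p = lca(p, w)
        ipdom[v], depth[v] = p, depth[p] + 1

    # Step 3: report the immediate postdominator of each split vertex.
    return {v: (None if ipdom[v] is EXIT else ipdom[v])
            for v in vertices if len(succ.get(v, [])) >= 2}


g = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
     "D": ["E", "F"], "E": [], "F": ["G"], "G": []}
print(reconnect_vertices(g))
```

In this made-up graph, A splits and its paths rejoin at D, while D splits into two paths that end in different leaves, so D has no reconnect vertex.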
This is the problem of identifying post-dominators in compilers and program analysis. This is often used in the context of calculating control dependences in control flow graphs. "Advanced Compiler Design and Implementation" is a good reference on these topics.
If the graph does not have cycles, then the solution (a) suggested by @matt-timmermans will work.
If the graph has cycles, then solution (a) can report spurious post-dominators. In such cases, a network-flow based approach works better. The algorithm to calculate non-termination sensitive control dependence in this paper uses this approach. The basic idea is to
at every split node, inject a unique token into the graph along each outgoing edge and
propagate the tokens thru the graph subject to this constraint: if node n is reachable from split node m, then tokens arriving at node m pass thru node n only if all tokens of node m have arrived at node n.
At the end, node n post-dominates node m if all tokens of node m have arrived at node n.

What is meant by the set of all possible configuration in a given graph G

I'm trying to understand Solved Exercise 2, Chapter 3 of Algorithm Design by Kleinberg and Tardos.
But I'm not getting the idea of the answer.
In short the question is
We are given two robots located at node a and node b. The robots need to travel to node c and d respectively. The problem arises if the robots get close to each other: "Let's assume the distance is r <= 1, so that if they come within one node or less of each other" they will have an interference problem, so they won't be able to transmit data to the base station.
The answer is quite long and it does not make any sense to me or I'm not getting its idea.
Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
Since it's required to solve this problem in polynomial time, I don't think this modification to any of the algorithm "BFS/DFS" will consume a lot of time.
The solution is "From the book"
This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots—that is, the current location of each one—it’s not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we’re trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.
Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
Why the need for larger graph H?
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?
I do not have the book, but it seems from their answer that at each step they move one robot or the other. Assuming that, H consists of all possible pairs of nodes that are more than distance r apart. The nodes in H are adjacent if they can be reached by moving one robot or the other.
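To make the idea concrete, here is a minimal sketch of a search in the configuration graph H, built on the fly instead of being stored explicitly. It assumes G is undirected and given as an adjacency dict, that a configuration is an ordered pair (position of robot 1, position of robot 2), and that a configuration is allowed only when the two positions are more than r apart. The names `bfs_dist` and `robot_schedule` are made up for this example.

```python
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src to every reachable node of G."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def robot_schedule(adj, a, b, c, d, r=1):
    """BFS over configurations; returns a list of (pos1, pos2) or None."""
    dist = {u: bfs_dist(adj, u) for u in adj}

    def allowed(u, v):
        # A configuration is a node of H only if the robots are > r apart.
        return dist[u].get(v, float("inf")) > r

    start, goal = (a, b), (c, d)
    if not allowed(*start) or not allowed(*goal):
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        conf = queue.popleft()
        if conf == goal:
            path = []
            while conf is not None:
                path.append(conf)
                conf = parent[conf]
            return path[::-1]
        u, v = conf
        # Edges of H: exactly one robot moves along an edge of G.
        moves = [(w, v) for w in adj[u]] + [(u, w) for w in adj[v]]
        for nxt in moves:
            if nxt not in parent and allowed(*nxt):
                parent[nxt] = conf
                queue.append(nxt)
    return None


cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(robot_schedule(cycle, 0, 3, 1, 4))
```

H has at most n^2 nodes, each with at most 2n neighbours, so the whole search stays polynomial in the size of G.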
There are not enough details in your proposed algorithm to say anything about it.
Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
I don't think this would be possible. What you're proposing is to calculate the full path, and afterwards check if the given path could work. If not, how would you handle the situation so that when you rerun the algorithm, it won't find that pathological path? You could exclude that from the set of possible options, but I don't think that'd be a good approach.
Suppose a path of length n, and now suppose that the pathology resides in the first step of the given path. Suppose now that this happens every time you recalculate the path. You would have to recalculate the path a lot of times just because the algorithm itself isn't aware of the restrictions needed to get to the right answer.
I think this is the point: the algorithm itself doesn't consider the problem's restrictions, and that is the main problem, because there's no easy way of correcting the given (wrong) solution.
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
What they mean by that is that each node in H represents each possible position of the two robots, which is the same as "all possible pairs of nodes in G".
E.g.: graph G has nodes A, B, C, D, E. H will have nodes AB, AC, AD, AE, BC, BD, BE, CD, CE, DE (consider AB = BA for further analysis).
Let the two robots be named r1 and r2. They start at nodes A and B (given in the question), so the path will start at node AB in graph H. Next, the possibilities are:
r1 moves to a neighbor node from A
r2 moves to a neighbor node from B
(...repeat for each step until r1 and r2 each reach their destination).
All these possible positions of the two robots at the same time are the configurations the answer talks about.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?
Let's look at the possibilities from what they state here:
(u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
The possibilities are:
(u,v) and (u,w) / (v,w) is an edge in E. In this case r2 moves to one of the neighbors of its current node.
(u,v) and (w,v) / (u,w) is an edge in E. In this case r1 moves to one of the neighbors of its current node.
This solution was a bit tricky to me too at first. But after reading it several times and drawing some examples, when I finally bumped into your question, the way you separated each part of the problem then helped me to fully understand each part of the solution. So, a big thanks to you for this question!
Hope it's clearer now for anyone stuck with this problem!

Directed graph decomposition

I want to decompose a directed acyclic graph into the minimum number of components such that in each component the following property holds:
For every pair of vertices (u,v) in a component, there is a path from u to v or from v to u.
Is there any algorithm for this?
I know that when the "or" is replaced by "and" in the condition, it is the same as finding the number of strongly connected components (which is possible using DFS).
EDIT: What happens if the directed graph contains cycles (i.e. it is not acyclic)?
My idea is to order the graph topologically in O(n) using DFS, and then think about which vertices can make this property false. It can be false for vertices that join 2 different branches, or that split into 2 different branches.
I would go from any starting vertex (lowest in the topological ordering) and follow its path, going into random branches, until you cannot go further, then delete this path from the graph (first component). This would be repeated until the graph is empty and you have all such components.
It seems like a greedy algorithm, but consider you find a very short path in the first run(by having a random bad luck) or you find a longest path(good luck). Then you would still have to find that small branch component in another step of algorithm.
Complexity would be O(n*number of components).
With the "and" condition, you should be considering arbitrary directed graphs, as a DAG cannot have a strongly connected component with more than one vertex.
The two existing answers both have problems that I've outlined in comments. But there's a more fundamental reason why no decomposition into components can work in general. First, let's concisely express the relation "u and v belong in the same component of the decomposition" as u # v.
It's not transitive
In order to represent a relation # as vertices in a component, that relation must be an equivalence relation, which means among other things that it must be transitive: that is, if x # y and y # z, it must necessarily be true that x # z. Is our relation # transitive? Unfortunately the answer is "No", since it may be that there is a path from x to y (so that x # y), and a path from z to y (so that y # z), but no path from x to z or from z to x (so that x # z does not hold), as the following graph shows:
       z
       |
       |
       v
x----->y
The problem is that according to the above graph, x and y belong in the same component, and y and z belong in the same component, but x and z belong in different components, which is a contradiction. This means that, in general, it's impossible to represent the relationship # as a decomposition into components.
If an instance happens to be transitive
So there is no solution in general -- but there can still be input graphs for which the relation # happens to be transitive, and for which we can therefore compute a solution. Here is one way to do that (though probably not the most efficient way).
Compute shortest paths between all pairs of vertices (using e.g. the Floyd-Warshall algorithm, in O(n^3) time for n vertices). Now, for every vertex pair (u, v), either d(u, v) = inf, indicating that there is no way to reach v from u at all, or not, indicating that there is some path from u to v. To answer the question "Does u # v hold?" (i.e., "Do u and v belong in the same component of the decomposition?"), we can simply calculate d(u, v) != inf || d(v, u) != inf.
This gives us a relation that we can use to build an undirected graph G' in which there is a vertex u' for each original vertex u, and an edge between two vertices u' and v' if and only if d(u, v) != inf || d(v, u) != inf. Intuitively, every connected component in this new graph must be a clique. This property can be checked in O(n^2) time by first performing a series of DFS traversals from each vertex to assign a component label to each vertex, and then checking that each pair of vertices belongs to the same component if and only if they are connected by an edge. If the property holds then the resulting cliques correspond to the desired decomposition; otherwise, there is no valid decomposition.
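The check described above might look roughly like this. Note the substitution: instead of shortest-path distances it computes a boolean reachability matrix with a Floyd-Warshall style triple loop, which answers the same "is there a path" question. Vertices are assumed to be numbered 0..n-1, and `decomposition_or_none` is a made-up name.

```python
def decomposition_or_none(n, edges):
    """Return the components (as sets) if # is transitive, else None."""
    # reach[i][j] is True when there is a path from i to j.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True

    def related(u, v):  # the relation u # v
        return reach[u][v] or reach[v][u]

    # Label the connected components of the undirected graph G'.
    label = [-1] * n
    comps = []
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = len(comps)
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in range(n):
                if label[v] == -1 and related(u, v):
                    label[v] = label[s]
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)

    # Every component must be a clique: same label <=> related.
    for u in range(n):
        for v in range(u + 1, n):
            if (label[u] == label[v]) != related(u, v):
                return None
    return comps


print(decomposition_or_none(3, [(0, 1), (2, 1)]))          # x=0, y=1, z=2
print(decomposition_or_none(3, [(0, 1), (0, 2), (1, 2)]))  # acyclic tournament
```

The first call uses the x, y, z graph from above and finds no valid decomposition; the second uses a small acyclic tournament, where all three vertices end up in one component.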
Interestingly, there are graphs that are not chains of strongly connected components (as claimed by Zotta), but which nonetheless do have transitive # relations. For example, a tournament is a digraph in which there is an edge, in some direction, between every pair of vertices -- so clearly # holds for every pair of vertices in such a graph. But if we number the vertices 1 to n and include only edges from lower-numbered to higher-numbered vertices, there will be no cycles, and thus the graph is not strongly connected (and if n > 2, then clearly it's not a path).

Check if undirected graph is a tree

I want to check if my undirected graph is a tree. A tree is an acyclic and connected graph. I have a function that checks if the graph is connected. So is it enough for the graph to be connected and have |E|=|V|-1 in order to be a tree?
You are correct: for a connected graph, E = V - 1 is sufficient to check that your graph is a tree.
The logic is that every tree begins with just a root node (V=1, E=0, so E=V-1), and from there, any time we add one node (V=V+1), we must also add exactly one edge (E=E+1). This makes the equation E=V-1 remain true for all trees.
A cycle occurs when we connect two existing nodes with a new edge (E=E+1 but V stays the same), rendering the equation E=V-1 false.
If it interests you, you may want to read about the more general formula v - e + f = 2 for connected planar graphs, where f is the number of regions inside a graph, including the exterior region. (A tree only has an exterior region so f=1). This rule is called Euler's Formula, which you can read about on Wikipedia.
Connected: It means that for every pair of vertices you choose, there will always be a path between them.
|E|=|V|-1: if your graph has |V| vertices and you are given |E|=|V|-1 edges to connect them, then if you form a cycle, you won't be able to form a connected graph (some vertices will remain without edges). We can conclude that these conditions are enough.
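Both conditions together can be checked in linear time. A small sketch, assuming vertices are numbered 0..n-1, the edge list has no duplicates, and (as a convention chosen here) the empty graph is not counted as a tree; `is_tree` is a hypothetical name.

```python
def is_tree(n, edges):
    """True if the undirected graph on vertices 0..n-1 is a tree."""
    if n == 0 or len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Connectivity check: DFS from vertex 0 must reach every vertex.
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # a star-like tree
print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # right edge count, but vertex 3 is cut off
```

The second call shows why the edge count alone is not enough: the graph has |V|-1 edges but contains a cycle and is disconnected.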

How to prove by induction that a strongly connected directed graph has at most 2n - 2 edges?

I have a directed graph which is strongly connected, but removing any edge from it makes the graph no longer strongly connected.
How can I prove that such a graph has no more than 2n − 2 edges? (where n ≥ 3)
I've been searching the literature for a couple of days but it seems such a proof has never been made. Any hints are appreciated.
Here's one outline (details omitted to avoid completely spoiling an exam question).
Prove that the graph G has a simple cycle C.
Prove that every arc in G whose tail and head belong to V(C) belongs to C.
Prove that G/C (graph obtained from G by contracting every arc in C) is strongly connected and that, for all arcs e in G/C, the subgraph G/C - e is not strongly connected.
Conclude by strong induction that G has at most 2|V(G)| - 2 arcs.
To my understanding, you can prove it constructively using a very simple algorithm, and maybe this can help shed some light on a possible proof by induction.
You first pick up an arbitrary node r and run BFS from it - what you get is a directed tree with exactly n-1 edges and n vertices (all reachable from r).
Now, obtain the transposed graph (G^T) from the original, and again run BFS from r - what you get is a directed tree with exactly n-1 edges and n vertices (all reachable from r).
Finally, examine each edge in the latter tree and add it (reversed, so that it is again an edge of the original graph) to the first tree, but only if it is not already in it. This step guarantees that r is reachable from every vertex in the graph, and as every vertex is reachable from r, what you get is a strongly connected spanning sub-graph.
Note that we have added at most n-1 edges to the first tree with n-1 to begin with - and hence there are at most n-1 + n-1 = 2n-2 edges in the resulting graph.
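The construction can be sketched as follows, assuming the input is strongly connected and given as a dict that has a key for every vertex. The names `bfs_tree_edges` and `strong_spanning_subgraph` are invented for this example.

```python
from collections import deque


def bfs_tree_edges(adj, r):
    """Edges (parent, child) of a BFS tree rooted at r."""
    seen, queue, tree = {r}, deque([r]), []
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree


def strong_spanning_subgraph(adj, r):
    """Edge set of a strongly connected spanning subgraph of G."""
    out_tree = bfs_tree_edges(adj, r)  # r reaches every vertex
    # Build the transposed graph G^T.
    radj = {u: [] for u in adj}
    for u, vs in adj.items():
        for v in vs:
            radj[v].append(u)
    # Reverse the G^T tree edges so they are edges of G pointing towards r.
    in_tree = [(v, u) for (u, v) in bfs_tree_edges(radj, r)]
    return set(out_tree) | set(in_tree)


g = {0: [1, 2], 1: [2], 2: [0]}
edges = strong_spanning_subgraph(g, 0)
print(sorted(edges), len(edges) <= 2 * len(g) - 2)
```

If no edge of G can be removed without losing strong connectivity, this subgraph has to be all of G, which is where the bound of 2n - 2 comes from.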
This is untrue. Proof by counter-example.
Graph has nodes A, B, and C
A -> B
B -> A
A -> C
B -> C
C -> B
This is strongly connected.
If I removed C->B, then C is isolated (you cannot get to anything from it) and is not strongly connected. Thus I have provided a graph that:
Is strongly connected
Has more than 2n-2 edges
If I remove one edge, it is no longer strongly connected
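A candidate counter-example like this can be tested mechanically by removing each edge in turn and re-checking strong connectivity. The helper names below are hypothetical. Running the check on the five edges listed above suggests that A -> B, A -> C and B -> C can each be removed while the graph stays strongly connected, so this graph does not satisfy the question's condition that removing any edge breaks strong connectivity.

```python
def reachable(edges, src):
    """Set of vertices reachable from src along the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(vertices, edges):
    # Strongly connected iff the first vertex reaches everything
    # and everything reaches it (checked on the reversed edges).
    root = vertices[0]
    rev = [(v, u) for u, v in edges]
    everything = set(vertices)
    return reachable(edges, root) >= everything and reachable(rev, root) >= everything


def removable_edges(vertices, edges):
    """Edges whose removal keeps the graph strongly connected."""
    return [e for e in edges
            if is_strongly_connected(vertices, [f for f in edges if f != e])]


V = ["A", "B", "C"]
E = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("C", "B")]
print(is_strongly_connected(V, E), removable_edges(V, E))
```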
