Find disjoint sets of vertexes in a Graph - algorithm

I want to find a simple method to generate sets of disjoint parts in a Graph. In other words, in the following Graph, I want to get two sets of {A, B, C, D} and {E, F}.

You can use any graph traversal algorithm (BFS and DFS are the most common).
Whenever the algorithm is "stuck" (there are no more nodes to traverse), you have finished finding one component. Mark it, then choose an arbitrary vertex that has not been traversed yet and start again from there to find the next component.
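A minimal sketch of that idea using BFS, assuming the graph is undirected and stored as a dictionary mapping each vertex to its neighbours. The function name and the edges of the sample graph are made up for illustration, since the picture from the question is not available here.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj maps each vertex to an iterable of its neighbours (assumed symmetric).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue  # already part of an earlier component
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:  # BFS until the traversal is "stuck"
            v = queue.popleft()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical edges chosen so that the components match the question.
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"],
    "E": ["F"], "F": ["E"],
}
components = connected_components(graph)
```

Each vertex and edge is visited once, so the running time is linear in the size of the graph.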

Related

In a DAG, how to find vertices where paths converge?

I have a type of directed acyclic graph, with some constraints.
There is only one "entry" vertex
There can be multiple leaf vertices
Once a path splits, anything under that path cannot reach into the other path (this will become clearer with some examples below)
There can be any number of "split" vertices. They can be nested.
A "split" vertex can split into any number of paths. The examples below only show 2 paths for each, but it could be more.
My challenge is the following: for each "split" vertex (any vertex that has at least 2 outgoing edges), find the vertices where its paths reconnect - if such a vertex exists. The solution should be as efficient as possible.
Example A:
[Image: example A]
In this example, vertex A is a "split" vertex, and its "reconnect vertex" is F.
Example B:
[Image: example B]
Here, there are two split vertices: A and E. For both of them vertex G is the reconnect vertex.
Example C:
[Image: example C]
Now there are three split vertices: A, D and E. The corresponding reconnect vertices are:
A -> K
D -> K
E -> J
Example D:
[Image: example D]
Here we have three split vertices again: A, D and E. But this time, vertex E doesn't have a reconnect vertex because one of the paths terminates early.
Sounds like what you want is:
Connect each vertex with out-degree 0 to a single terminal vertex
Construct the dominator tree of the edge-reversed graph. The linked Wikipedia article points to a couple of algorithms for doing this.
The "reconnect vertex" for a split vertex is its immediate dominator in the edge-reversed graph, i.e., its parent in that dominator tree. This is called its immediate "postdominator" in your original graph. If the immediate dominator is the terminal vertex that you added, then the split vertex doesn't have a reconnect vertex in your original graph.
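The three steps above can be sketched for the acyclic case as follows. This is not one of the general dominator-tree algorithms; it relies on the fact that in a DAG the immediate postdominator of a vertex is the lowest common ancestor, in the postdominator tree, of all its successors, so one pass in reverse topological order is enough. The names `EXIT`, `immediate_postdominators` and `reconnect_vertices` and the sample graph are assumptions made for this example. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

EXIT = object()  # virtual terminal vertex (hypothetical helper, not in the input)

def immediate_postdominators(succ):
    """succ maps every vertex of a DAG to a list of its successors."""
    # graphlib expects node -> predecessors. Passing successors instead
    # yields an order in which every vertex comes after all its successors.
    order = list(TopologicalSorter(succ).static_order())
    ipdom = {EXIT: EXIT}
    depth = {EXIT: 0}  # depth in the postdominator tree

    def lca(a, b):
        # Walk the deeper vertex up until both meet.
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            a = ipdom[a]
        return a

    for v in order:
        # Vertices with out-degree 0 are connected to the virtual terminal.
        targets = list(succ.get(v, ())) or [EXIT]
        p = targets[0]
        for t in targets[1:]:
            p = lca(p, t)
        ipdom[v] = p
        depth[v] = depth[p] + 1
    return ipdom

def reconnect_vertices(succ):
    """Map each split vertex to its reconnect vertex, or None if it has none."""
    ipdom = immediate_postdominators(succ)
    return {v: (None if ipdom[v] is EXIT else ipdom[v])
            for v, out in succ.items() if len(out) >= 2}

# Hypothetical DAG with two split vertices, A and D.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
       "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
result = reconnect_vertices(dag)
```

In this sample, the paths from A rejoin at D and the paths from D rejoin at G. A split vertex with a branch that ends in a leaf before rejoining maps to `None`.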
This is the problem of identifying post-dominators in compilers and program analysis. This is often used in the context of calculating control dependences in control flow graphs. "Advanced Compiler Design and Implementation" is a good reference on these topics.
If the graph does not have cycles, then the solution (a) suggested by #matt-timmermans will work.
If the graph has cycles, then solution (a) can report spurious post-dominators. In such cases, a network-flow based approach works better. The algorithm to calculate non-termination sensitive control dependence in this paper uses this approach. The basic idea is:
at every split node, inject a unique token into the graph along each outgoing edge and
propagate the tokens through the graph subject to this constraint: if node n is reachable from split node m, then tokens arriving at node m pass through node n only if all tokens of node m have arrived at node n.
At the end, node n post-dominates node m if all tokens of node m have arrived at node n.

Find Minimum Vertex Connected Sub-graph

First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E) where |V| is about 16 million and |E| is about 180 million.
For a given set S, which is a subset of V (the size of S will be around 30), is it possible to find a weakly connected sub-graph G'=(V',E') where S is a subset of V', while keeping the sizes of V' and E' as small as possible?
The graph G may change, and I hope there's a way to find the sub-graph in real time. (When a process is writing into G, G will be locked, so don't worry about G changing while your sub-graph calculation is still running.)
My current solution is to find the shortest path for each pair of vertices in S and merge those paths to get the sub-graph. The result is OK, but the running time is pretty expensive.
Is there a better way to solve this problem?
If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Do a breadth-first-search of the graph, starting with S as the root set.
When the search discovers a new vertex, remember its predecessor and assign it to the same set as its predecessor.
When you discover an edge that connects two sets, merge the sets and follow the predecessor links to add the connecting path to G'.
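A rough sketch of these steps, assuming the graph is available as an undirected view (`adj` maps each vertex to its neighbours with edge direction ignored, which is what weak connectivity needs). The function name, the return shape and the small sample graph are assumptions for illustration only.

```python
from collections import deque
from itertools import chain

def connect_terminals(adj, terminals):
    """Return (vertices, edges) of a small subgraph connecting all terminals."""
    parent = {}  # disjoint-set forest; also marks discovered vertices
    pred = {}    # BFS predecessor of each discovered vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def path_to_source(v):
        # Edges on the predecessor chain from v back to its BFS source.
        while pred[v] is not None:
            yield (pred[v], v)
            v = pred[v]

    sources = list(dict.fromkeys(terminals))  # drop duplicates, keep order
    queue = deque()
    for s in sources:
        parent[s], pred[s] = s, None
        queue.append(s)

    vertices, edges = set(sources), set()
    remaining = len(sources) - 1  # merges still needed
    while queue and remaining > 0:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                # New vertex: same set as its predecessor.
                parent[w], pred[w] = find(v), v
                queue.append(w)
            elif find(v) != find(w):
                # Edge joins two sets: merge them and keep the connecting path.
                parent[find(v)] = find(w)
                remaining -= 1
                edges.add((v, w))
                for a, b in chain(path_to_source(v), path_to_source(w)):
                    edges.add((a, b))
                    vertices.update((a, b))
                if remaining == 0:
                    break
    return vertices, edges

# Hypothetical graph: a path 1-2-3-4-5 with a side branch 3-6.
adj = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
vertices, edges = connect_terminals(adj, [1, 5])
```

The search stops as soon as all members of S are in one set, so on a large graph it usually touches only the neighbourhood of S rather than all of G.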
Another way to think about doing exactly the same thing:
Sort all the edges in E according to their distance from S. You can use BFS discovery order for this
Use Kruskal's algorithm to generate a spanning tree for G, processing the edges in that order (https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
Pick a root in S, and remove any subtrees that don't contain a member of S. When you're done, every leaf will be in S.
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.

Algorithm for finding every weakly connected component of a directed graph

I am searching for an algorithm for finding every weakly connected component in a directed graph. I know that for an undirected graph you can do this via a DFS, but this obviously doesn't work for a directed graph. I am storing my graph as an adjacency list.
For example:
A -> B
B -> C
D -> X
So A-B-C is one connected component and D-X is another.
I am not searching for an algorithm for finding strongly connected components!!
Unless your memory constraints are too strict, you can keep a second, temporary adjacency list. In that second adjacency list you put each edge a->b and you also put edges in reverse direction. (i.e. b->a) Then, you can use DFS on that adjacency list to find connected components.
A pretty simple solution would be as follows:
Start by creating an undirected graph from the given graph: simply make a copy and add the reverse of each edge to the set of edges. Create a copy of the set of vertices, start with an arbitrary vertex and DFS-traverse the component containing that vertex, removing all traversed nodes from the set and adding them to a list. Repeat this until the set is empty.
In pseudocode:
bimap edges
edges.putAll(graph.edges())
set vertices = graph.vertices()
list result
while !vertices.isEmpty()
    list component
    vertex a = vertices.removeAny()
    dfsTraverse(a, v -> {
        vertices.remove(v)
        component.add(v)
    })
    result.add(component)
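For comparison, a runnable Python sketch of the same approach, assuming the directed graph is given as a list of (source, target) pairs. The function name and the optional `vertices` parameter (for isolated vertices) are choices made for this example.

```python
from collections import defaultdict

def weakly_connected_components(edges, vertices=()):
    """Return the weakly connected components of a directed graph."""
    # Build the undirected view: every edge a->b is stored in both directions.
    undirected = defaultdict(set)
    for v in vertices:
        undirected.setdefault(v, set())  # keep isolated vertices
    for a, b in edges:
        undirected[a].add(b)
        undirected[b].add(a)

    unvisited = set(undirected)
    result = []
    while unvisited:
        start = unvisited.pop()  # arbitrary remaining vertex
        component = [start]
        stack = [start]
        while stack:  # iterative DFS over the undirected view
            v = stack.pop()
            for w in undirected[v]:
                if w in unvisited:
                    unvisited.remove(w)
                    component.append(w)
                    stack.append(w)
        result.append(component)
    return result

# The example from the question: A -> B, B -> C, D -> X.
wcc = weakly_connected_components([("A", "B"), ("B", "C"), ("D", "X")])
```

The order of the components and of the vertices inside each one is not fixed, because the starting vertex is taken from a set.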

How to find Strongly Connected Components in a Graph?

I am trying to self-study graph theory, and am now trying to understand how to find SCCs in a graph. I have read several different questions/answers on SO (e.g., 1,2,3,4,5,6,7,8), but I can't find one with a complete step-by-step example I could follow.
According to CORMEN (Introduction to Algorithms), one method is:
Call DFS(G) to compute finishing times f[u] for each vertex u
Compute Transpose(G)
Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component
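The four steps can be sketched in Python as below. This is an illustrative version, not the book's pseudocode: it assumes the graph is a dictionary in which every vertex appears as a key, it uses recursion (so very deep graphs would need an iterative DFS), and the sample graph is made up because the graph from the question is only available as an image.

```python
def strongly_connected_components(graph):
    """graph maps each vertex to a list of successors; every vertex is a key."""
    visited = set()

    def dfs(g, v, out):
        # Appends v to out when v finishes, i.e. in post-visit order.
        visited.add(v)
        for w in g[v]:
            if w not in visited:
                dfs(g, w, out)
        out.append(v)

    # Step 1: DFS on G, recording vertices by increasing finishing time.
    finish_order = []
    for v in graph:
        if v not in visited:
            dfs(graph, v, finish_order)

    # Step 2: compute the transpose of G.
    transpose = {v: [] for v in graph}
    for v, successors in graph.items():
        for w in successors:
            transpose[w].append(v)

    # Step 3: DFS on the transpose in order of decreasing finishing time.
    visited.clear()
    components = []
    for v in reversed(finish_order):
        if v not in visited:
            tree = []
            dfs(transpose, v, tree)
            # Step 4: each DFS tree is one strongly connected component.
            components.append(tree)
    return components

# Hypothetical graph: a cycle A -> B -> C -> A that leads into a cycle D <-> E.
g = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"]}
sccs = strongly_connected_components(g)
```

Vertices that were already visited in step 3 are simply skipped in the main loop, which corresponds to the "remove from list since it is already visited" lines in the walkthrough.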
Observe the following graph (question is 3.4 from here. I have found several solutions here and here, but I am trying to break this down and understand it myself.)
Step 1: Call DFS(G) to compute finishing times f[u] for each vertex u
Running DFS starting on vertex A:
Please notice the RED text, formatted as [Pre-Visit, Post-Visit]
Step 2: Compute Transpose(G)
Step 3. Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
Okay, so vertices in order of decreasing post-visit(finishing times) values:
{E, B, A, H, G, I, C, D, F, J}
So at this step, we run DFS on G^T, starting from each vertex in the above list in turn:
DFS(E): {E}
DFS(B): {B}
DFS(A): {A}
DFS(H): {H, I, G}
DFS(G): remove from list since it is already visited
DFS(I): remove from list since it is already visited
DFS(C): {C, J, F, D}
DFS(J): remove from list since it is already visited
DFS(F): remove from list since it is already visited
DFS(D): remove from list since it is already visited
Step 4: Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component.
So we have five strongly connected components: {E}, {B}, {A}, {H, I, G}, {C, J, F, D}
This is what I believe is correct. However, solutions I found here and here say SCCs are {C,J,F,H,I,G,D}, and {A,E,B}. Where are my mistakes?
Your steps are correct and so is your answer. Looking at the other solutions you linked, you can see that they used a different algorithm: first run DFS on G transposed, then run the undirected connected-components algorithm on G, processing the vertices in decreasing order of their post numbers from the previous step.
The problem is that they ran this last step on G transposed instead of on G, and thus got an incorrect answer. If you read Dasgupta from page 98 onwards, you will see a detailed explanation of the algorithm they tried to use.
Your answer is correct. As per CLRS, "A strongly connected component of a directed graph G = (V,E) is a maximal set of vertices C, such that for every pair of vertices u and v, we have both u ~> v and v ~> u, i.e. vertices v and u are reachable from each other."
If you assume {C, J, F, H, I, G, D} is a correct component, there is no way to reach G from D (among many other problems), and likewise for the other set, there is no way to reach E from A.

How to traverse a Graph based on the angle between two edges

I am stuck on a problem regarding graph traversal based on the angle between two edges. To summarize: I am given 5 vertices a, b, c, d, e and the edges (a, b), (b, c), (c, d), (d, e).
I want to traverse the graph while calculating the angle between consecutive edges, for example angle((a, b), (b, c)). If the angle is greater than 10 degrees, I should stop at b and start the process again.
What steps do I need to consider to approach this problem with concrete programming structures?
If I understand correctly, when angle((a,b),(b,c)) returns a value of over some threshold (10, in your example), you should stop traversing the graph.
This means that effectively, this node (b) is not helping by connecting the two edges ((a,b) and (b,c)). It might be useful for some other set of edges, but that specific connection is not available.
What I suggest is swapping the roles of edges and nodes. Every edge in G becomes a node in G', and every node in G that joins two edges becomes an edge in G', but only if angle() for those two edges returns a value lower than your threshold.
On G' you can now run BFS, DFS or any other algorithm of your liking. When you are done, use the reverse transformation to "translate" your answer back into the original graph in question.
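A small sketch of the transformation, under assumptions the question does not state: each vertex has 2D coordinates, edges are directed pairs, and the "angle between two edges" is the turning angle between their direction vectors. The names `turning_angle` and `build_edge_graph` and the coordinates are made up for illustration.

```python
import math

def turning_angle(p, q, r):
    """Angle in degrees between the direction p->q and the direction q->r."""
    v1 = (q[0] - p[0], q[1] - p[1])
    v2 = (r[0] - q[0], r[1] - q[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.degrees(math.atan2(cross, dot)))

def build_edge_graph(coords, edges, threshold_deg=10.0):
    """Build G': one node per edge of G, linked when the turn is below the threshold."""
    edge_graph = {e: [] for e in edges}
    for (u, v) in edges:
        for (x, w) in edges:
            # (u, v) and (x, w) are consecutive when they share the vertex v.
            if x == v and w != u:
                if turning_angle(coords[u], coords[v], coords[w]) < threshold_deg:
                    edge_graph[(u, v)].append((x, w))
    return edge_graph

# Hypothetical coordinates: almost straight at b and d, a sharp turn at c.
coords = {"a": (0, 0), "b": (1, 0), "c": (2, 0.05), "d": (3, 1), "e": (4, 2)}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
edge_graph = build_edge_graph(coords, edges)
```

With these coordinates the traversal of G' starting at (a, b) reaches (b, c) and then stops, because the turn at c exceeds the threshold; a new traversal can then start at (c, d). The nested loop over all edge pairs is quadratic; indexing edges by their start vertex would avoid that on larger graphs.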
