Graph Traversal using DFS - algorithm

I am learning graph traversal from The Algorithm Design Manual by Steven S. Skiena. In his book, he has provided the code for traversing the graph using dfs. Below is the code.
dfs(graph *g, int v)
{
    edgenode *p;                /* temporary pointer */
    int y;                      /* successor vertex */

    if (finished) return;       /* allow for search termination */

    discovered[v] = TRUE;
    time = time + 1;
    entry_time[v] = time;

    process_vertex_early(v);

    p = g->edges[v];
    while (p != NULL) {
        y = p->y;
        if (discovered[y] == FALSE) {
            parent[y] = v;
            process_edge(v,y);
            dfs(g,y);
        }
        else if ((!processed[y]) || (g->directed))
            process_edge(v,y);

        if (finished) return;

        p = p->next;
    }

    process_vertex_late(v);
    time = time + 1;
    exit_time[v] = time;
    processed[v] = TRUE;
}
In an undirected graph, it looks like this code processes each edge twice (calling process_edge(v,y) once while traversing vertex v and again while processing vertex y). So I added the condition parent[v] != y to else if ((!processed[y]) || (g->directed)), and now each edge is processed only once. However, I am not sure how to modify this code so that it also works with parallel edges and self-loops. The code should process parallel edges and self-loops as well.

Short Answer:
Substitute your (parent[v]!=y) for (!processed[y]) instead of adding it to the condition.
Detailed Answer:
In my opinion there is a mistake in the implementation written in the book, which you discovered and fixed (except for parallel edges; more on that below). The implementation is supposed to be correct for both directed and undirected graphs, with the distinction between them recorded in the g->directed boolean property.
In the book, just before the implementation the author writes:
The other important property of a depth-first search is that it partitions the
edges of an undirected graph into exactly two classes: tree edges and back edges. The
tree edges discover new vertices, and are those encoded in the parent relation. Back
edges are those whose other endpoint is an ancestor of the vertex being expanded,
so they point back into the tree.
So the condition (!processed[y]) is supposed to handle undirected graphs (as the condition (g->directed) handles directed graphs) by allowing the algorithm to process back edges and preventing it from re-processing tree edges in the opposite direction. As you noticed, though, with this condition tree edges are treated as back edges when read through the child, so you should simply replace this condition with your suggested (parent[v]!=y).
The condition (!processed[y]) will ALWAYS be true for an undirected graph when the algorithm evaluates it, as long as there are no parallel edges (further details on why this is true below, marked *). If there are parallel edges, those read after the first "copy" of them will yield false and the edge will not be processed, when it should be. Your suggested condition, however, distinguishes tree edges from the rest (back edges, parallel edges and self-loops) and lets the algorithm process only those that are not tree edges read in the opposite direction.
As for self-loops, they should be fine with both the new and the old conditions: they are edges with y == v. When the algorithm reaches them, y is already discovered (because v is discovered before its edges are scanned), not yet processed (v is marked processed only on the last line, after its edges are scanned), and it is not v's parent (v is not its own parent).
*Going through v's edges, the algorithm evaluates this condition for a y that has already been discovered (so it does not enter the first conditional block). As quoted above (the book also contains a semi-proof, which I include at the end of this footnote), p is either a tree edge or a back edge. Since y is discovered, p cannot be a tree edge from v to y. It can be a back edge to an ancestor, which means this call is nested inside a recursive call that started processing that ancestor at some point, so the ancestor's call has not yet reached the final line that marks it as processed (it is still marked as not processed); or it can be a tree edge from y to v, in which case the same situation holds and y is still marked as not processed.
The semi-proof for every edge being a tree-edge or a back-edge:
Why can’t an edge go to a brother or cousin node instead of an ancestor?
All nodes reachable from a given vertex v are expanded before we finish with the
traversal from v, so such topologies are impossible for undirected graphs.

You are correct.
Quoting the book's (2nd edition) errata:
(*) Page 171, line -2 -- The dfs code has a bug, where each tree edge
is processed twice in undirected graphs. The test needs to be
strengthed to be:
else if (((!processed[y]) && (parent[v]!=y)) || (g->directed))
As for cycles - see here
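If it helps, here is a minimal sketch (Java, not Skiena's code) of one way to get the behaviour you ask for: give every physical edge an id shared by its two adjacency-list entries and keep a per-edge processed flag, so each edge, including parallel edges and self-loops, is processed exactly once no matter which endpoint scans it first. The names and the representation are illustrative.

import java.util.*;

class UndirectedDfs {
    static class Edge {
        int to, id;
        Edge(int to, int id) { this.to = to; this.id = id; }
    }

    List<List<Edge>> adj;       // both endpoints of an undirected edge hold an Edge with the same id
    boolean[] discovered;       // per-vertex flag
    boolean[] processedEdge;    // per-edge flag, indexed by Edge.id
    int[] parent;

    void dfs(int v) {
        discovered[v] = true;
        for (Edge e : adj.get(v)) {
            if (!processedEdge[e.id]) {       // first time this physical edge is seen, from either end
                processedEdge[e.id] = true;
                processEdge(v, e.to);         // fires once per edge: tree, back, parallel, or self-loop
            }
            if (!discovered[e.to]) {
                parent[e.to] = v;
                dfs(e.to);
            }
        }
    }

    void processEdge(int x, int y) { /* user-supplied edge processing */ }
}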

Related

Write an efficient algorithm to divide the tree into connected components with at most V/2 vertices

I'm referring to the link below to try and write an algorithm to find a vertex in a tree so that removing that vertex gives connected components with the size of each component being at most V/2 vertices.
https://math.stackexchange.com/questions/1742440/you-can-always-delete-a-vertex-from-a-tree-g-such-that-the-remaining-connected
I do understand the proof given in the accepted answer which uses arrows to find that vertex. I can't quite figure out how to write an algorithm for the same.
I will first explain the proof and then give the pseudocode, so that the pseudocode is easier to follow. The vertex you are looking for is called the centroid, so basically we need to find a centroid of the tree.
First of all, it should be clear that such a vertex always exists; a tree has one or, at most, two centroids, and we only need to find one of them.
Let the given tree be T. Start from any vertex, claiming it to be the required vertex, then check whether that is true. If it is the required vertex, nothing more needs to be done. If it is not, move to the adjacent vertex that lies in the subtree containing more than n/2 vertices. Repeat the process until you find the answer.
Now the pseudocode. Here are the meanings of the variables used:
v_centroid stores the centroid
v[i] stores the list of all nodes that are connected to i
size[i] stores the size of the subtree rooted at i
v_centroid = any vertex
dfs(v_centroid, -1)    // compute subtree sizes; v_centroid is the assumed centroid and the second
                       // argument is the parent of the node being processed. For the initial call
                       // use -1 (or any other unused value) as the parent.
v_centroid = findCentroid(v_centroid, v_centroid)

func dfs(int node, int parent)
    size[node] := 1
    for i in v[node]
        if (*i not equals parent)
            dfs(*i, node)
            size[node] = size[node] + size[*i]    // add the child's subtree size, not the parent's
        end if
    end for
end func

func findCentroid(int node, int parent)
    for i in v[node]
        if (i not equals parent and size[i] > n/2)    // n = total number of vertices in the tree
            return findCentroid(i, node)
        end if
    end for
    return node
end func
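For reference, here is a hedged Java sketch of the same two-pass idea (compute subtree sizes from an arbitrary root, then keep walking down into any subtree larger than n/2); the adjacency-list representation and the names computeSizes/findCentroid are illustrative.

import java.util.*;

class Centroid {
    List<List<Integer>> adj;   // adj.get(v) = neighbours of v
    int[] size;                // size[v] = number of vertices in the subtree rooted at v
    int n;                     // total number of vertices in the tree

    void computeSizes(int node, int parent) {
        size[node] = 1;
        for (int next : adj.get(node)) {
            if (next != parent) {
                computeSizes(next, node);
                size[node] += size[next];
            }
        }
    }

    int findCentroid(int node, int parent) {
        for (int next : adj.get(node)) {
            // only walk downwards, into a child whose subtree is too big
            if (next != parent && size[next] > n / 2) {
                return findCentroid(next, node);
            }
        }
        return node;   // no child subtree exceeds n/2, so node is a centroid
    }
}

The intended usage is computeSizes(root, -1) followed by findCentroid(root, -1), both starting from the same root.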

pseudo-code for visiting a state machine

I want to write pseudo-code for visiting a state machine using DFS. A state machine can be considered a directed graph. The following algorithm from the Cormen book uses DFS to visit a graph.
DFS-VISIT(G, u)    // G = graph, u = root vertex
    u.color = GRAY
    for each v from G.Adjacents(u)    // edge (u, v)
        if v.color == WHITE
            DFS-VISIT(G, v)
A state machine, however, can have multiple edges between two vertices, and the above algorithm stores edges in an adjacency list. I have implemented the algorithm in Java with the following classes:
class Node {
    String name;
    ....
    ArrayList<Transition> transitions;
    void addTransition(Transition tr);
}

class Transition {
    String src;
    String dest;
}
With the above information, I have built a state machine with node and transition objects. I want to modify the above algorithm for the case where I don't have a graph object G; I only have access to the root object.
Can I modify the above algorithm like below?
DFS-VISIT(root)    // root node
    root.color = GRAY
    for each t from root.getTransitions()    // t = transition
        v = t.getDestination()               // v = destination of t
        if v.color == WHITE
            DFS-VISIT(v)
The algorithm is independent of the implementation. Actually, it's the other way around; the question you should be asking is: "does your particular implementation have all the properties required by the algorithm?"
The algorithm has very few strict demands. You need a set of nodes and a set of edges, where each edge connects two nodes; that's the generic definition of a graph. How you acquire, store and process these sets is irrelevant to the algorithm. For each algorithm step, all you need is access to a given node from the set and access to the set of its neighbors. What you've presented seems fine (of course, for the next step you'll need to progress to the next node after the root).
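To make that concrete, here is a hedged Java sketch of the modified DFS-VISIT against classes shaped like the ones in the question; the color field on Node and the name-to-Node lookup map are assumptions added here, because Transition stores its destination as a String.

import java.util.*;

enum Color { WHITE, GRAY, BLACK }

class Node {
    String name;
    Color color = Color.WHITE;
    List<Transition> transitions = new ArrayList<>();
}

class Transition {
    String src;
    String dest;
}

class StateMachineDfs {
    Map<String, Node> nodesByName;   // resolves Transition.dest to a Node object

    void dfsVisit(Node u) {
        u.color = Color.GRAY;                    // u is being explored
        for (Transition t : u.transitions) {     // parallel transitions are simply iterated;
            Node v = nodesByName.get(t.dest);    // a repeated destination is skipped once it is non-WHITE
            if (v.color == Color.WHITE) {
                dfsVisit(v);
            }
        }
        u.color = Color.BLACK;                   // every outgoing transition has been explored
    }
}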

How to create distinct set from other sets?

While solving problems on Techgig.com, I got stuck on one of them. The problem is as follows:
A company organizes two trips for their employees in a year. They want to know whether all the employees can be sent on a trip or not. The condition is that no employee can go on both trips. Also, to determine which employees can go together, the constraint is that employees who have worked together in the past must not be in the same group.
Examples:
Suppose the work history is given as {(1,2),(2,3),(3,4)}; then it is possible to accommodate all four employees in two trips (one trip consisting of employees 1 and 3, the other having employees 2 and 4). Neither trip contains two employees who have worked together in the past. Suppose the work history is given as {(1,2),(1,3),(2,3)}; then there is no way to have two trips that satisfy the company rule and accommodate all the employees.
Can anyone tell me how to proceed on this problem?
I am using this code for DFS and coloring the vertices.
static boolean DFS(int rootNode) {
    Stack<Integer> s = new Stack<Integer>();
    s.push(rootNode);
    state[rootNode] = true;
    color[rootNode] = 1;
    while (!s.isEmpty()) {
        int u = s.peek();
        for (int child = 0; child < numofemployees; child++) {
            if (adjmatrix[u][child] == 1) {
                if (!state[child]) {
                    state[child] = true;
                    s.push(child);
                    color[child] = color[u] == 1 ? 2 : 1;
                    break;
                } else {
                    s.pop();
                    if (color[u] == color[child])
                        return false;
                }
            }
        }
    }
    return true;
}
This problem is functionally equivalent to testing if an undirected graph is bipartite. A bipartite graph is a graph for which all of the nodes can be distributed among two sets, and within each set, no node is adjacent to another node.
To solve the problem, take the following steps.
Using the adjacency pairs, construct an undirected graph. This is pretty straightforward: each number represents a node, and for each pair you are given, form a connection between those nodes.
Test the newly generated graph for bipartiteness. This can be achieved in linear time, as described here.
If the graph is bipartite and you've generated the two node sets, the answer to the problem is yes, and each node set, along with its nodes (employees), corresponds to one of the two trips.
Excerpt on how to test for bipartiteness:
It is possible to test whether a graph is bipartite, and to return
either a two-coloring (if it is bipartite) or an odd cycle (if it is
not) in linear time, using depth-first search. The main idea is to
assign to each vertex the color that differs from the color of its
parent in the depth-first search tree, assigning colors in a preorder
traversal of the depth-first-search tree. This will necessarily
provide a two-coloring of the spanning tree consisting of the edges
connecting vertices to their parents, but it may not properly color
some of the non-tree edges. In a depth-first search tree, one of the
two endpoints of every non-tree edge is an ancestor of the other
endpoint, and when the depth first search discovers an edge of this
type it should check that these two vertices have different colors. If
they do not, then the path in the tree from ancestor to descendant,
together with the miscolored edge, form an odd cycle, which is
returned from the algorithm together with the result that the graph is
not bipartite. However, if the algorithm terminates without detecting
an odd cycle of this type, then every edge must be properly colored,
and the algorithm returns the coloring together with the result that
the graph is bipartite.
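For reference, here is a hedged Java sketch of the two-coloring test described in the excerpt, written against an adjacency matrix like the one in the question; the outer loop over every vertex is there so that graphs with more than one connected component are also covered. Names are illustrative.

class BipartiteCheck {
    int n;               // number of employees / vertices
    int[][] adj;         // adjacency matrix: adj[u][v] == 1 if u and v worked together
    int[] color;         // 0 = uncolored, 1 and 2 are the two trips

    boolean isBipartite() {
        color = new int[n];
        for (int v = 0; v < n; v++) {                       // cover every connected component
            if (color[v] == 0 && !dfs(v, 1)) return false;
        }
        return true;
    }

    boolean dfs(int v, int c) {
        color[v] = c;
        for (int u = 0; u < n; u++) {
            if (adj[v][u] == 1) {
                if (color[u] == c) return false;            // both ends same color: odd cycle
                if (color[u] == 0 && !dfs(u, c == 1 ? 2 : 1)) return false;
            }
        }
        return true;
    }
}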
I even tried a recursive solution, but it passes the same number of test cases. Am I missing any special-case handling?
Below is the recursive solution of the problem:
static void dfs(int v, int curr) {
    state[v] = true;
    color[v] = curr;
    for (int i = 0; i < numofemployees; i++) {
        if (adjmatrix[v][i] == 1) {
            if (color[i] == curr) {
                bipartite = false;
                return;
            }
            if (!state[i])
                dfs(i, curr == 1 ? 2 : 1);
        }
    }
}
I am calling this function from main() as dfs(0,1), where 0 is the starting vertex and 1 is one of the colors.

How to detect if a directed graph is cyclic?

How can we detect whether a directed graph is cyclic? I thought of using breadth-first search, but I'm not sure. Any ideas?
What you really need, I believe, is a topological sorting algorithm like the one described here:
http://en.wikipedia.org/wiki/Topological_sorting
If the directed graph has a cycle then the algorithm will fail.
The comments/replies that I've seen so far seem to be missing the fact that in a directed graph there may be more than one way to get from node X to node Y without there being any (directed) cycles in the graph.
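For illustration, here is a hedged Java sketch of this idea using Kahn's algorithm (repeatedly removing vertices of in-degree zero); if some vertices are never removed, they lie on or behind a cycle. The adjacency-list representation is an assumption.

import java.util.*;

class TopoSortCycleCheck {
    static boolean hasCycle(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> edges : adj)
            for (int v : edges) indegree[v]++;

        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0) queue.add(v);

        int removed = 0;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            removed++;
            for (int v : adj.get(u))
                if (--indegree[v] == 0) queue.add(v);
        }
        return removed < n;   // leftover vertices mean the topological sort failed, i.e. a cycle exists
    }
}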
Usually depth-first search is used instead; I don't know whether BFS can be applied as easily.
In DFS, a spanning tree is built in visiting order. If an ancestor of a node in the tree is reached again (i.e. a back edge is found), then we have detected a cycle.
See http://www.cs.nyu.edu/courses/summer04/G22.1170-001/6a-Graphs-More.pdf for a more detailed explanation.
Use DFS to check whether any path is cyclic:
class Node<T> {
    T value;
    List<Node<T>> adjacent;
}

class Graph<T> {
    List<Node<T>> nodes;

    public boolean isCyclicRec() {
        for (Node<T> node : nodes) {
            Set<Node<T>> initPath = new HashSet<>();
            if (isCyclicRec(node, initPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean isCyclicRec(Node<T> currNode, Set<Node<T>> path) {
        if (path.contains(currNode)) {
            return true;
        } else {
            path.add(currNode);
            for (Node<T> node : currNode.adjacent) {
                if (isCyclicRec(node, path)) {
                    return true;
                } else {
                    path.remove(node);
                }
            }
        }
        return false;
    }
}
Approach 1:
How about assigning a level number to each node to detect a cycle? For example, consider the graph A->(B,C), B->D, D->(E,F), E->G, F->G, E->D. As you perform the DFS, start assigning a level number to each node you visit (root A = 0); the level number of a node = its parent's level + 1. So A=0, B=1, D=2, F=3, G=4. Then the recursion returns to D and reaches E, so E=3. Don't re-mark G (G already has a level number assigned, which is greater than E's). Now E also has an edge to D, so levelization would say D should get a level number of 4. But D already has the lower level 2 assigned to it. Thus, any time you attempt to assign a level number to a node during the DFS that already has a lower level number, you know the directed graph has a cycle.
Approach 2:
Use three colors: white, gray, black. Visit only white nodes, coloring them gray as you go down the DFS; color gray nodes black when the recursion unwinds (all their children are processed). If you hit a gray node, i.e. a node whose children have not all been processed yet, that's a cycle.
E.g., all nodes start white in the directed graph above.
A, B, D, F, G are colored white to gray in turn. G is a leaf, so all its children are processed: color it gray to black. The recursion unwinds to F (all children processed); color it black. Now you are back at D, which still has an unprocessed child, so color E gray; G is already colored black, so don't go further down that way. E also has an edge to D, so while D is still being processed (D is still gray), you find an edge back to D, a gray node, and a cycle is detected.
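Here is a hedged Java sketch of approach 2 (white/gray/black coloring); the map-based graph representation is illustrative.

import java.util.*;

class CycleDetector {
    enum Color { WHITE, GRAY, BLACK }

    Map<Integer, List<Integer>> adj = new HashMap<>();
    Map<Integer, Color> color = new HashMap<>();

    boolean hasCycle() {
        for (Integer v : adj.keySet()) {
            if (color.getOrDefault(v, Color.WHITE) == Color.WHITE && dfs(v)) return true;
        }
        return false;
    }

    boolean dfs(int v) {
        color.put(v, Color.GRAY);                             // v is on the current DFS path
        for (int w : adj.getOrDefault(v, List.of())) {
            Color cw = color.getOrDefault(w, Color.WHITE);
            if (cw == Color.GRAY) return true;                // back edge into the current path: cycle
            if (cw == Color.WHITE && dfs(w)) return true;
        }
        color.put(v, Color.BLACK);                            // v and all its descendants are fully processed
        return false;
    }
}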
Testing for a topological sort of the given graph will lead you to the solution: if the topological-sort algorithm (i.e. ordering the vertices so that all edges point one way) fails, it means that the graph contains a cycle.
Another simple solution would be a mark-and-sweep approach. Basically, for each node in the tree you flag it as "visited" and then move on to its children. If you ever see a node with the "visited" flag set, you know there's a cycle.
If modifying the graph to include a "visited" bit isn't possible, a set of node pointers can be used instead. To flag a node as visited, you place a pointer to it in the set. If the pointer is already in the set, there's a cycle.

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search, and since cycles are not the same as strongly connected components, I kept searching and finally found an efficient algorithm that lists all (elementary) cycles of a directed graph. It is by Donald B. Johnson, and the paper can be found at the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here, implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been to), just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj,node,visited):
if (visited[node]):
if (node == start):
"found a path"
return;
visited[node]=YES;
for child in adj[node]:
dfs(adj,child,visited)
visited[node]=NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest option I found to solve this problem was the Python library called networkx.
It implements Johnson's algorithm, mentioned in the best answer to this question, but it makes it quite simple to run.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
# Return a list of cycles, each described as a list of nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all, you do not really want to find literally all cycles, because if there is one then there are infinitely many of them; for example A-B-A, A-B-A-B-A, etc. Or it may be possible to join two cycles into an 8-shaped cycle, and so on. The meaningful approach is to look for all so-called simple cycles, those that do not cross themselves except at the start/end point. Then, if you wish, you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time the current node has a successor on the stack, a simple cycle is discovered; it consists of the elements on the stack starting with the identified successor and ending with the top of the stack. Depth-first traversal of all simple paths is similar to depth-first search, but you do not mark/record visited nodes other than those currently on the stack as stop points.
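A hedged Java sketch of this baseline enumeration, restricted to cycles through a fixed start node as in the question (the adjacency-map representation and names are illustrative):

import java.util.*;

class SimpleCycles {
    static List<List<Integer>> cyclesThrough(Map<Integer, List<Integer>> adj, int start) {
        List<List<Integer>> cycles = new ArrayList<>();
        dfs(adj, start, start, new ArrayDeque<>(), new HashSet<>(), cycles);
        return cycles;
    }

    static void dfs(Map<Integer, List<Integer>> adj, int start, int node,
                    Deque<Integer> stack, Set<Integer> onStack, List<List<Integer>> cycles) {
        stack.push(node);
        onStack.add(node);
        for (int next : adj.getOrDefault(node, List.of())) {
            if (next == start) {
                // the stack (from start down to node) plus the edge back to start is a simple cycle
                List<Integer> cycle = new ArrayList<>(stack);
                Collections.reverse(cycle);
                cycles.add(cycle);
            } else if (!onStack.contains(next)) {
                dfs(adj, start, next, stack, onStack, cycles);
            }
        }
        onStack.remove(node);
        stack.pop();
    }
}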
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs : The algorithm for those is different. Build a spanning tree and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so called cycle base. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this : http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
The DFS-based variants with back edges will indeed find cycles, but in many cases they will NOT be minimal cycles. In general, DFS gives you a flag that there is a cycle, but it is not good enough to actually enumerate cycles. For example, imagine 5 different cycles sharing two edges; there is no simple way to identify the cycles using just DFS (including backtracking variants).
Johnson's algorithm does give all unique simple cycles and has good time and space complexity.
But if you just want to find MINIMAL cycles (meaning that there may be more than one cycle going through any vertex and we are interested in the minimal ones) AND your graph is not very large, you can try the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest ways to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all pairs of vertices using an adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<= 50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3); space complexity is O(n^2) if you use parent tracking and O(1) extra if you don't.
First, let's answer the question of whether there is a cycle at all.
The algorithm is dead simple. Below is a snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
  for (k <- weights.indices;
       i <- weights.indices;
       j <- weights.indices) {
    val throughK = weights(i)(k) + weights(k)(j)
    if (throughK < weights(i)(j)) {
      weights(i)(j) = throughK
    }
  }
}
Originally this algorithm operates on a weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly here, you need to provide 1 if there is a directed edge between two nodes, or NO_EDGE otherwise.
After the algorithm executes, you can check the main diagonal: if there are values less than NO_EDGE, then the corresponding node participates in a cycle whose length equals that value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use a slightly modified version of the algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
  for (k <- weights.indices;
       i <- weights.indices;
       j <- weights.indices) {
    val throughK = weights(i)(k) + weights(k)(j)
    if (throughK < weights(i)(j)) {
      parents(i)(j) = k
      weights(i)(j) = throughK
    }
  }
}
The parents matrix should initially contain the source vertex index in a cell if there is an edge between the two vertices, and -1 otherwise.
After the function returns, for each pair of vertices you have a reference to the parent node on the shortest path between them.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2

def shortestPathWithParentTracking(
    weights: Array[Array[Int]],
    parents: Array[Array[Int]]) = {
  for (k <- weights.indices;
       i <- weights.indices;
       j <- weights.indices) {
    val throughK = weights(i)(k) + weights(k)(j)
    if (throughK < weights(i)(j)) {
      parents(i)(j) = parents(i)(k)
      weights(i)(j) = throughK
    }
  }
}
def recoverCycles(
    cycleNodes: Seq[Int],
    parents: Array[Array[Int]]): Set[Seq[Int]] = {
  val res = new mutable.HashSet[Seq[Int]]()
  for (node <- cycleNodes) {
    var cycle = new mutable.ArrayBuffer[Int]()
    cycle += node
    var other = parents(node)(node)
    do {
      cycle += other
      other = parents(other)(node)
    } while (other != node)
    res += cycle.sorted
  }
  res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
  val n = 3
  val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
  val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
  shortestPathWithParentTracking(weights, parents)
  val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
  val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
  println("The following minimal cycle found:")
  cycles.foreach(c => println(c.mkString))
  println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
To find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
1. How do you determine the next valid route?
2. How do you determine if a point has been used?
3. How do you avoid crossing over the same point again?
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node into a collection as you find it; this means that you can see very easily whether you are "doubling back" over a point by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using SQL Server 2008, there are some new "hierarchy" features you can use to quickly solve this if you structure your data in a tree.
In the case of undirected graphs, a recently published paper (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful for people arriving from a Google search.
Start at node X and check all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark its children as being children of A, X', where X' is marked as being 2 steps away. If you later hit X and mark it as being a child of X'', that means X is in a 3-node cycle. Backtracking to its parent is easy (as-is, the algorithm has no support for this, so you'd find whichever parent has X').
Note: if the graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice within a cycle.
If what you want is to find all elementary circuits in a graph, you can use the EC algorithm by James C. Tiernan, published in a paper from 1970.
The very original EC algorithm, as I managed to implement it in PHP, is shown below (I hope there are no mistakes). It can also find loops, if there are any. The circuits in this implementation (which tries to clone the original) are the non-zero elements; zero here stands for non-existence (null as we know it).
Apart from that, below follows another implementation that gives the algorithm more independence; this means the nodes can start from anywhere, even from negative numbers, e.g. -4, -3, -2, etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";

$G = array(
    1 => array(1, 2, 3),
    2 => array(1, 2, 3),
    3 => array(1, 2, 3)
);

define('N', key(array_slice($G, -1, 1, true)));
$P = array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0);
$H = array(1 => $P, 2 => $P, 3 => $P, 4 => $P, 5 => $P);
$k = 1;
$P[$k] = key($G);
$Circ = array();

#[EC2 Path Extension]
EC2_Path_Extension:
foreach ($G[$P[$k]] as $j => $child) {
    if ($child > $P[1] and in_array($child, $P) === false and in_array($child, $H[$P[$k]]) === false) {
        $k++;
        $P[$k] = $child;
        goto EC2_Path_Extension;
    }
}

#[EC3 Circuit Confirmation]
if (in_array($P[1], $G[$P[$k]]) === true) { // if PATH[1] is not a child of PATH[current] then we don't have a cycle
    $Circ[] = $P;
}

#[EC4 Vertex Closure]
if ($k === 1) {
    goto EC5_Advance_Initial_Vertex;
}

// since it is not considered again, it is safe to erase it
for ($m = 1; $m <= N; $m++) { // H[P[k], m] <- O, m = 1, 2, . . . , N
    if ($H[$P[$k - 1]][$m] === 0) {
        $H[$P[$k - 1]][$m] = $P[$k];
        break(1);
    }
}
for ($m = 1; $m <= N; $m++) { // H[P[k], m] <- O, m = 1, 2, . . . , N
    $H[$P[$k]][$m] = 0;
}
$P[$k] = 0;
$k--;
goto EC2_Path_Extension;

#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if ($P[1] === N) {
    goto EC6_Terminate;
}
$P[1]++;
$k = 1;
$H = array(
    1 => array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0),
    2 => array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0),
    3 => array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0),
    4 => array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0),
    5 => array(1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0)
);
goto EC2_Path_Extension;

#[EC6 Terminate]
EC6_Terminate:
print_r($Circ);
?>
Then this is the other implementation, more independent of the graph, without goto and without array values; instead it uses array keys: the path, the graph and the circuits are stored as array keys (use array values if you like, just change the required lines). The example graph starts from -4 to show its independence.
<?php
$G = array(
    -4 => array(-4 => true, -3 => true, -2 => true),
    -3 => array(-4 => true, -3 => true, -2 => true),
    -2 => array(-4 => true, -3 => true, -2 => true)
);
$C = array();
EC($G, $C);
echo "<pre>";
print_r($C);

function EC($G, &$C) {
    $CNST_not_closed = false; // this flag indicates no closure
    $CNST_closed = true;      // this flag indicates closure

    // define the state where there are no closures for some node
    $tmp_first_node = key($G);                          // first node = first key
    $tmp_last_node = $tmp_first_node - 1 + count($G);   // last node = last key
    $CNST_closure_reset = array();
    for ($k = $tmp_first_node; $k <= $tmp_last_node; $k++) {
        $CNST_closure_reset[$k] = $CNST_not_closed;
    }

    // define the state where there is no closure for all nodes
    for ($k = $tmp_first_node; $k <= $tmp_last_node; $k++) {
        $H[$k] = $CNST_closure_reset; // keys in the closure arrays represent nodes
    }
    unset($tmp_first_node);
    unset($tmp_last_node);

    # Start algorithm
    foreach ($G as $init_node => $children) { #[Jump to initial node set]
        #[Initial Node Set]
        $P = array();            // declare at startup, remove the old $init_node from the path on each loop
        $P[$init_node] = true;   // the first key in P is always the new initial node
        $k = $init_node;         // update the current node
        // On each loop H[old_init_node] is not cleared because it is never checked again
        do { # Path 1,3,7,4 jumps here to extend father 7
            do { # Path from 1,3,8,5 became 2,4,8,5,6; jump here to extend child 6
                $new_expansion = false;
                foreach ($G[$k] as $child => $foo) { # Consider each child of 7 or 6
                    if ($child > $init_node and isset($P[$child]) === false and $H[$k][$child] === $CNST_not_closed) {
                        $P[$child] = true;     // add this child to the path
                        $k = $child;           // update the current node
                        $new_expansion = true; // set the flag for expanding the child of k
                        break(1);              // we are done, one child at a time
                    }
                }
            } while (($new_expansion === true)); // do while a new child has been added to the path

            # If the first node is a child of the last we have a circuit
            if (isset($G[$k][$init_node]) === true) {
                $C[] = $P; // leaving this out of closure will catch loops too
            }

            # Closure
            if ($k > $init_node) { // if k > init_node then always count(P) > 1, so proceed to closure
                $new_expansion = true;            // $new_expansion is never true here, set true to expand the father of k
                unset($P[$k]);                    // remove k from the path
                end($P); $k_father = key($P);     // get father of k
                $H[$k_father][$k] = $CNST_closed; // mark k as closed
                $H[$k] = $CNST_closure_reset;     // reset k closure
                $k = $k_father;                   // update k
            }
        } while ($new_expansion === true); // if we don't enter the if block, $k keeps its old value
        // Advance Initial Vertex Context
    } // foreach initial
} // function
?>
I have analyzed and documented the EC algorithm, but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a directed graph.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex and run a DFS from it. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x]: dfs_index[x] stores when that node is visited, while dfs_lowval[x] = min(dfs_lowval[k]), where k ranges over the children of x that are not the direct parent of x in the DFS spanning tree. All nodes with the same dfs_lowval[x] are in the same strongly connected component.
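For reference, here is a hedged Java sketch of this first step in the standard index/lowlink formulation of Tarjan's algorithm; the adjacency-list representation is illustrative.

import java.util.*;

class TarjanScc {
    List<List<Integer>> adj;
    int n, counter;
    int[] index, lowlink;          // index[v] = visit order, lowlink[v] = smallest index reachable from v's subtree
    boolean[] onStack;
    Deque<Integer> stack = new ArrayDeque<>();
    List<List<Integer>> components = new ArrayList<>();

    void run() {
        index = new int[n];
        Arrays.fill(index, -1);
        lowlink = new int[n];
        onStack = new boolean[n];
        for (int v = 0; v < n; v++)
            if (index[v] == -1) strongConnect(v);
    }

    void strongConnect(int v) {
        index[v] = lowlink[v] = counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (index[w] == -1) {
                strongConnect(w);
                lowlink[v] = Math.min(lowlink[v], lowlink[w]);
            } else if (onStack[w]) {
                lowlink[v] = Math.min(lowlink[v], index[w]);
            }
        }
        if (lowlink[v] == index[v]) {            // v is the root of a strongly connected component
            List<Integer> comp = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                comp.add(w);
            } while (w != v);
            components.add(comp);
        }
    }
}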
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and then the numbers):
#include <cstdio>
using namespace std;

int main()
{
    int n;
    scanf("%d", &n);
    int num[1000];
    int visited[1000] = {0};
    int vindex[2000];
    for (int i = 1; i <= n; i++)
        scanf("%d", &num[i]);

    int t_visited = 0;
    int cycles = 0;
    int start = 0, index;

    while (t_visited < n)
    {
        for (int i = 1; i <= n; i++)
        {
            if (visited[i] == 0)
            {
                vindex[start] = i;
                visited[i] = 1;
                t_visited++;
                index = start;
                break;
            }
        }
        while (true)
        {
            index++;
            vindex[index] = num[vindex[index - 1]];
            if (vindex[index] == vindex[start])
                break;
            visited[vindex[index]] = 1;
            t_visited++;
        }
        vindex[++index] = 0;
        start = index + 1;
        cycles++;
    }

    printf("%d\n", cycles);
    for (int i = 0; i < (n + 2 * cycles); i++)
    {
        if (vindex[i] == 0)
            printf("\n");
        else
            printf("%d ", vindex[i]);
    }
}
A DFS C++ version of the pseudo-code in the second answer above:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
    if (visited[v]) {
        if (v == start) {
            for (auto c : path)
                cout << c << " ";
            cout << endl;
            return;
        }
        else
            return;
    }
    visited[v] = true;
    path.push_back(v);
    for (auto i : G[v])
        findCircleUnit(start, i, visited, path);
    visited[v] = false;
    path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library provides a set of algorithms and functions to detect cycles.
For a full explanation of the algorithms, visit its wiki.
