Computing depth of each node in a "maximally packed" DAG - algorithm

(Note: I thought about asking this on https://cstheory.stackexchange.com/, but decided my question is not theoretical enough -- it's about an algorithm. If there is a better Stack Exchange community for this post, I'm happy to listen!)
I'm using the terminology "starting node" to mean a node with no links into it, and "terminal node" to mean a node with no links out of it. So the following graph has starting nodes A and B and terminal nodes F and G:
I want to draw it with the following rules:
at least one starting node has a depth of 0.
links always point from top to bottom
nodes are packed vertically as closely as possible
Using those rules, depth of for each node is shown for the graph above. Can someone suggest an algorithm to compute the depth of each node that runs in less than O(n^2) time?
update:
I tweaked the graph to show that the DAG may contain starting and terminal nodes at different depths. (This was a case that I didn't consider in my original buggy answer.) I also switched terminology from "x coordinate" to "depth" in order to emphasize that this is about "graphing" and not "graphics".

Your x coordinate of a node corresponds to the longest way from any node without incomming edges to this node in question. For a DAG it can be calculated in O(N):
given DAG G:
calculate incomming_degree[v] for every v in G
initialize queue q={v with incomming_degree[v]==0}, x[v]=0 for every v in q
while(q not empty):
v=q.pop() #retreive and delete first element
for(w in neighbors of v):
incomming_degree[w]--
if(incomming_degree[w]==0): #no further way to w exists, evaluate
q.offer(w)
x[w]=x[v]+1
x stores the desired information.

Here's one solution which is essentially a two-pass depth-first tree walk. The first pass (traverseA) traces the DAG from the starting nodes (A and B in the O.P.'s example) until encountering terminal nodes (F and G in the example). It them marks them with the maximum depth as traced through the graph.
The second pass (traverseB) starts at the terminal nodes and traces back towards the starting nodes, marking each node along the way with the node's current value OR the previous node's value minus one, whichever is smaller if the node hasn't been visited yet:
function labelDAG() {
nodes.forEach(function(node) { node.depth = -1; }); // initialize
// find and mark terminal nodes
startingNodes().forEach(function(node) { traverseA(node, 0); });
// walk backwards from the terminal nodes
terminalNodes().forEach(function(node) { traverseB(node); });
dumpGraph();
};
function traverseA(node, depth) {
var targets = targetsOf(node);
if (targets.length === 0) {
// we're at a leaf (terminal) node -- set depth
node.depth = Math.max(node.depth, depth);
} else {
// traverse each subtree with depth = depth+1
targets.forEach(function(target) {
traverseA(target, depth+1);
});
};
};
// walk backwards from a terminal node, setting each source node's depth value
// along the way.
function traverseB(node) {
sourcesOf(node).forEach(function(source) {
if ((source.depth === -1) || (source.depth > node.x - 1)) {
// source has not yet been visited, or we found a longer path
// between terminal node and source node.
source.depth = node.depth - 1;
}
traverseB(source);
});
};

Related

Traversal directed graph with cycles

I wrote a script to construct a directed graph using networkx in python, and I want to get all possible path from start to end including cycles.
For example, there is a directed graph:
I want to get these paths:
A->B->D
A->B->C->D
A->B->C->B->D
A->B->C->B->C->B->D
...
As far as I know, there are many algorithms to find shortest paths or paths without cycles between 2 nodes, but I want to find paths with cycles.
Is there any algorithm to achieve this ?
Thx a lot
As noted, there is an infinite number of such paths.
However, you can still generate all of them in a lazy way by maintaining all nodes v (and path you used to reach v) you can reach from the start node in k steps for k=1,2,...; if v is your target node, remember it.
When you have to return the next path, (i) pop the first target node off list, and (ii) generate the next candidates for all non-target nodes on the list. If there is no target node on the list, repeat (ii) until you find one.
The method works assuming the path always exists. If you don't find a path in n-1 steps, where n is the number of nodes, simply report that no path exists.
Here's the pseudo code for an algorithm that generates paths from shortest to longest assuming unit weights:
class Node {
int steps
Node prev
Node(int steps=0, Node prev=null) {
prev = prev
steps = steps
}
}
class PathGenerator {
Queue<Node> nodes
Node start, target;
PathGenerator(Node start, Node target) {
start = start
target = target
nodes = new Queue<>()
nodes.add(start) // assume start.steps=0 and stat.prev=null
}
Node nextPath(int n) {
current_length = -1;
do {
node = nodes.poll()
current_length = node.steps
// expand to all others you can reach from node
for each u in node.neighbors()
list.add(new Node(node, node.steps+1))
// if node is the target, return the path
if (node == target)
return node
} while (current_length < n);
throw new Exception("no path of length <=n exists");
}
}
Beware that the list nodes can grow exponentially in the worst case (think of what happens in case you run it on a complete graph).

Finding all the shortest paths between two nodes in unweighted undirected graph

I need help finding all the shortest paths between two nodes in an unweighted undirected graph.
I am able to find one of the shortest paths using BFS, but so far I am lost as to how I could find and print out all of them.
Any idea of the algorithm / pseudocode I could use?
As a caveat, remember that there can be exponentially many shortest paths between two nodes in a graph. Any algorithm for this will potentially take exponential time.
That said, there are a few relatively straightforward algorithms that can find all the paths. Here's two.
BFS + Reverse DFS
When running a breadth-first search over a graph, you can tag each node with its distance from the start node. The start node is at distance 0, and then, whenever a new node is discovered for the first time, its distance is one plus the distance of the node that discovered it. So begin by running a BFS over the graph, writing down the distances to each node.
Once you have this, you can find a shortest path from the source to the destination as follows. Start at the destination, which will be at some distance d from the start node. Now, look at all nodes with edges entering the destination node. A shortest path from the source to the destination must end by following an edge from a node at distance d-1 to the destination at distance d. So, starting at the destination node, walk backwards across some edge to any node you'd like at distance d-1. From there, walk to a node at distance d-2, a node at distance d-3, etc. until you're back at the start node at distance 0.
This procedure will give you one path back in reverse order, and you can flip it at the end to get the overall path.
You can then find all the paths from the source to the destination by running a depth-first search from the end node back to the start node, at each point trying all possible ways to walk backwards from the current node to a previous node whose distance is exactly one less than the current node's distance.
(I personally think this is the easiest and cleanest way to find all possible paths, but that's just my opinion.)
BFS With Multiple Parents
This next algorithm is a modification to BFS that you can use as a preprocessing step to speed up generation of all possible paths. Remember that as BFS runs, it proceeds outwards in "layers," getting a single shortest path to all nodes at distance 0, then distance 1, then distance 2, etc. The motivating idea behind BFS is that any node at distance k + 1 from the start node must be connected by an edge to some node at distance k from the start node. BFS discovers this node at distance k + 1 by finding some path of length k to a node at distance k, then extending it by some edge.
If your goal is to find all shortest paths, then you can modify BFS by extending every path to a node at distance k to all the nodes at distance k + 1 that they connect to, rather than picking a single edge. To do this, modify BFS in the following way: whenever you process an edge by adding its endpoint in the processing queue, don't immediately mark that node as being done. Instead, insert that node into the queue annotated with which edge you followed to get to it. This will potentially let you insert the same node into the queue multiple times if there are multiple nodes that link to it. When you remove a node from the queue, then you mark it as being done and never insert it into the queue again. Similarly, rather than storing a single parent pointer, you'll store multiple parent pointers, one for each node that linked into that node.
If you do this modified BFS, you will end up with a DAG where every node will either be the start node and have no outgoing edges, or will be at distance k + 1 from the start node and will have a pointer to each node of distance k that it is connected to. From there, you can reconstruct all shortest paths from some node to the start node by listing of all possible paths from your node of choice back to the start node within the DAG. This can be done recursively:
There is only one path from the start node to itself, namely the empty path.
For any other node, the paths can be found by following each outgoing edge, then recursively extending those paths to yield a path back to the start node.
This approach takes more time and space than the one listed above because many of the paths found this way will not be moving in the direction of the destination node. However, it only requires a modification to BFS, rather than a BFS followed by a reverse search.
Hope this helps!
#templatetypedef is correct, but he forgot to mention about distance check that must be done before any parent links are added to node. This means that se keep the distance from source in each of nodes and increment by one the distance for children. We must skip this increment and parent addition in case the child was already visited and has the lower distance.
public void addParent(Node n) {
// forbidding the parent it its level is equal to ours
if (n.level == level) {
return;
}
parents.add(n);
level = n.level + 1;
}
The full java implementation can be found by the following link.
http://ideone.com/UluCBb
I encountered the similar problem while solving this https://oj.leetcode.com/problems/word-ladder-ii/
The way I tried to deal with is first find the shortest distance using BFS, lets say the shortest distance is d. Now apply DFS and in DFS recursive call don't go beyond recursive level d.
However this might end up exploring all paths as mentioned by #templatetypedef.
First, find the distance-to-start of all nodes using breadth-first search.
(if there are a lot of nodes, you can use A* and stop when top of the queue has distance-to-start > distance-to-start(end-node). This will give you all nodes that belong to some shortest path)
Then just backtrack from the end-node. Anytime a node is connected to two (or more) nodes with a lower distance-to-start, you branch off into two (or more) paths.
templatetypedef your answer was very good, thank you a lot for that one(!!), but it missed out one point:
If you have a graph like this:
A-B-C-E-F
| |
D------
Now lets imagine I want this path:
A -> E.
It will expand like this:
A-> B -> D-> C -> F -> E.
The problem there is,
that you will have F as a parent of E, but
A->B->D->F-E is longer than
A->B->C->E. You will have to take of tracking the distances of parents you are so happily adding.
Step 1: Traverse the graph from the source by BFS and assign each node the minimal distance from the source
Step 2: The distance assigned to the target node is the shortest length
Step 3: From source, do a DFS search along all paths where the minimal distance is increased one by one until the target node is reached or the shortest length is reached. Print the path whenever the target node is reached.
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
Every adjacent pair of words differs by a single letter.
Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
sk == endWord
Given two words, beginWord and endWord, and a dictionary wordList, return all the shortest transformation sequences from beginWord to endWord, or an empty list if no such sequence exists. Each sequence should be returned as a list of the words [beginWord, s1, s2, ..., sk].
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]
Explanation: There are 2 shortest transformation sequences:
"hit" -> "hot" -> "dot" -> "dog" -> "cog"
"hit" -> "hot" -> "lot" -> "log" -> "cog"
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: []
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.
https://leetcode.com/problems/word-ladder-ii
class Solution {
public List<List<String>> findLadders(String beginWord, String endWord, List<String> wordList) {
List<List<String>> result = new ArrayList<>();
if (wordList == null) {
return result;
}
Set<String> dicts = new HashSet<>(wordList);
if (!dicts.contains(endWord)) {
return result;
}
Set<String> start = new HashSet<>();
Set<String> end = new HashSet<>();
Map<String, List<String>> map = new HashMap<>();
start.add(beginWord);
end.add(endWord);
bfs(map, start, end, dicts, false);
List<String> subList = new ArrayList<>();
subList.add(beginWord);
dfs(map, result, subList, beginWord, endWord);
return result;
}
private void bfs(Map<String, List<String>> map, Set<String> start, Set<String> end, Set<String> dicts, boolean reverse) {
// Processed all the word in start
if (start.size() == 0) {
return;
}
dicts.removeAll(start);
Set<String> tmp = new HashSet<>();
boolean finish = false;
for (String str : start) {
char[] chars = str.toCharArray();
for (int i = 0; i < chars.length; i++) {
char old = chars[i];
for (char n = 'a' ; n <='z'; n++) {
if(old == n) {
continue;
}
chars[i] = n;
String candidate = new String(chars);
if (!dicts.contains(candidate)) {
continue;
}
if (end.contains(candidate)) {
finish = true;
} else {
tmp.add(candidate);
}
String key = reverse ? candidate : str;
String value = reverse ? str : candidate;
if (! map.containsKey(key)) {
map.put(key, new ArrayList<>());
}
map.get(key).add(value);
}
// restore after processing
chars[i] = old;
}
}
if (!finish) {
// Switch the start and end if size from start is bigger;
if (tmp.size() > end.size()) {
bfs(map, end, tmp, dicts, !reverse);
} else {
bfs(map, tmp, end, dicts, reverse);
}
}
}
private void dfs (Map<String, List<String>> map,
List<List<String>> result , List<String> subList,
String beginWord, String endWord) {
if(beginWord.equals(endWord)) {
result.add(new ArrayList<>(subList));
return;
}
if (!map.containsKey(beginWord)) {
return;
}
for (String word : map.get(beginWord)) {
subList.add(word);
dfs(map, result, subList, word, endWord);
subList.remove(subList.size() - 1);
}
}
}

Dijkstra's pathfinding algorithm

I'm learning about pathfinding from a book, but I have spotted some weird statement.
For completeness, I will include most of the code, but feel free to skip to the second part (Search())
template <class graph_type >
class Graph_SearchDijkstra
{
private:
//create typedefs for the node and edge types used by the graph
typedef typename graph_type::EdgeType Edge;
typedef typename graph_type::NodeType Node;
private:
const graph_type & m_Graph;
//this vector contains the edges that comprise the shortest path tree -
//a directed sub-tree of the graph that encapsulates the best paths from
//every node on the SPT to the source node.
std::vector<const Edge*> m_ShortestPathTree;
//this is indexed into by node index and holds the total cost of the best
//path found so far to the given node. For example, m_CostToThisNode[5]
//will hold the total cost of all the edges that comprise the best path
//to node 5 found so far in the search (if node 5 is present and has
//been visited of course).
std::vector<double> m_CostToThisNode;
//this is an indexed (by node) vector of "parent" edges leading to nodes
//connected to the SPT but that have not been added to the SPT yet.
std::vector<const Edge*> m_SearchFrontier;
int m_iSource;
int m_iTarget;
void Search();
public:
Graph_SearchDijkstra(const graph_type& graph,
int source,
int target = -1):m_Graph(graph),
m_ShortestPathTree(graph.NumNodes()),
m_SearchFrontier(graph.NumNodes()),
m_CostToThisNode(graph.NumNodes()),
m_iSource(source),
m_iTarget(target)
{
Search();
}
//returns the vector of edges defining the SPT. If a target is given
//in the constructor, then this will be the SPT comprising all the nodes
//examined before the target is found, else it will contain all the nodes
//in the graph.
std::vector<const Edge*> GetAllPaths()const;
//returns a vector of node indexes comprising the shortest path
//from the source to the target. It calculates the path by working
//backward through the SPT from the target node.
std::list<int> GetPathToTarget()const;
//returns the total cost to the target
double GetCostToTarget()const;
};
Search():
template <class graph_type>
void Graph_SearchDijkstra<graph_type>::Search()
{
//create an indexed priority queue that sorts smallest to largest
//(front to back). Note that the maximum number of elements the iPQ
//may contain is NumNodes(). This is because no node can be represented
// on the queue more than once.
IndexedPriorityQLow<double> pq(m_CostToThisNode, m_Graph.NumNodes());
//put the source node on the queue
pq.insert(m_iSource);
//while the queue is not empty
while(!pq.empty())
{
//get the lowest cost node from the queue. Don't forget, the return value
//is a *node index*, not the node itself. This node is the node not already
//on the SPT that is the closest to the source node
int NextClosestNode = pq.Pop();
//move this edge from the search frontier to the shortest path tree
m_ShortestPathTree[NextClosestNode] = m_SearchFrontier[NextClosestNode];
//if the target has been found exit
if (NextClosestNode == m_iTarget) return;
//now to relax the edges. For each edge connected to the next closest node
graph_type::ConstEdgeIterator ConstEdgeItr(m_Graph, NextClosestNode);
for (const Edge* pE=ConstEdgeItr.begin();
!ConstEdgeItr.end();
pE=ConstEdgeItr.next())
{
//the total cost to the node this edge points to is the cost to the
//current node plus the cost of the edge connecting them.
double NewCost = m_CostToThisNode[NextClosestNode] + pE->Cost();
//if this edge has never been on the frontier make a note of the cost
//to reach the node it points to, then add the edge to the frontier
//and the destination node to the PQ.
if (m_SearchFrontier[pE->To()] == 0)
{
m_CostToThisNode[pE->To()] = NewCost;
pq.insert(pE->To());
m_SearchFrontier[pE->To()] = pE;
}
//else test to see if the cost to reach the destination node via the
//current node is cheaper than the cheapest cost found so far. If
//this path is cheaper we assign the new cost to the destination
//node, update its entry in the PQ to reflect the change, and add the
//edge to the frontier
else if ( (NewCost < m_CostToThisNode[pE->To()]) &&
(m_ShortestPathTree[pE->To()] == 0) )
{
m_CostToThisNode[pE->To()] = NewCost;
//because the cost is less than it was previously, the PQ must be
//resorted to account for this.
pq.ChangePriority(pE->To());
m_SearchFrontier[pE->To()] = pE;
}
}
}
}
What I don't get is this part:
//else test to see if the cost to reach the destination node via the
//current node is cheaper than the cheapest cost found so far. If
//this path is cheaper we assign the new cost to the destination
//node, update its entry in the PQ to reflect the change, and add the
//edge to the frontier
else if ( (NewCost < m_CostToThisNode[pE->To()]) &&
(m_ShortestPathTree[pE->To()] == 0) )
if the new cost is lower than the cost already found, then why do we also test if the node has not already been added to the SPT? This seems to beat the purpose of the check?
FYI, in m_ShortestPathTree[pE->To()] == 0 the container is a vector that has a pointer to an edge (or NULL) for each index (the index represents a node)
Imagine the following graph:
S --5-- A --2-- F
\ /
-3 -4
\ /
B
And you want to go from S to F. First, let me tell you the Dijkstra's algorithm assumes there are no loops in the graph with a negative weight. In my example, this loop is S -> B -> A -> S or simpler yet, S -> B -> S
If you have such a loop, you can infinitely loop in it, and your cost to F gets lower and lower. That is why this is not acceptable by Dijkstra's algorithm.
Now, how do you check that? It is as the code you posted does. Every time you want to update the weight of a node, besides checking whether it gets smaller weight, you check if it's not in the accepted list. If it is, then you must have had a negative weight loop. Otherwise, how can you end up going forward and reaching an already accepted node with smaller weight?
Let's follow the algorithm on the example graph (nodes in [] are accepted):
Without the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B
New weights: [S(-3)], A(-7), [B(-3)], F(inf)
- Accept A
New weights: [S(-3)], [A(-7)], [B(-11)], F(-5)
- Accept B again
New weights: [S(-14)], [A(-18)], [B(-11)], F(-5)
- Accept A again
... infinite loop
With the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B (doesn't change S)
New weights: [S(0)], A(-7), [B(-3)], F(inf)
- Accept A (doesn't change S or B
New weights: [S(0)], [A(-7)], [B(-3)], F(-5)
- Accept F

How to detect if a directed graph is cyclic?

How can we detect if a directed graph is cyclic? I thought using breadth first search, but I'm not sure. Any ideas?
What you really need, I believe, is a topological sorting algorithm like the one described here:
http://en.wikipedia.org/wiki/Topological_sorting
If the directed graph has a cycle then the algorithm will fail.
The comments/replies that I've seen so far seem to be missing the fact that in a directed graph there may be more than one way to get from node X to node Y without there being any (directed) cycles in the graph.
Usually depth-first search is used instead. I don't know if BFS is applicable easily.
In DFS, a spanning tree is built in order of visiting. If a the ancestor of a node in the tree is visited (i.e. a back-edge is created), then we detect a cycle.
See http://www.cs.nyu.edu/courses/summer04/G22.1170-001/6a-Graphs-More.pdf for a more detailed explanation.
Use DFS to search if any path is cyclic
class Node<T> { T value; List<Node<T>> adjacent; }
class Graph<T>{
List<Node<T>> nodes;
public boolean isCyclicRec()
{
for (Node<T> node : nodes)
{
Set<Node<T>> initPath = new HashSet<>();
if (isCyclicRec(node, initPath))
{
return true;
}
}
return false;
}
private boolean isCyclicRec(Node<T> currNode, Set<Node<T>> path)
{
if (path.contains(currNode))
{
return true;
}
else
{
path.add(currNode);
for (Node<T> node : currNode.adjacent)
{
if (isCyclicRec(node, path))
{
return true;
}
else
{
path.remove(node);
}
}
}
return false;
}
approach:1
how about a level no assignment to detect a cycle. eg: consider the graph below. A->(B,C) B->D D->(E,F) E,F->(G) E->D As you perform a DFS start assigning a level no to the node you visit (root A=0). level no of node = parent+1. So A=0, B=1, D=2, F=3, G=4 then, recursion reaches D, so E=3. Dont mark level for G (G already a level no assigned which is grater than E) Now E also has an edge to D. So levelization would say D should get a level no of 4. But D already has a "lower level" assigned to it of 2. Thus any time you attempt to assign a level number to a node while doing DFS that already has a lower level number set to it, you know the directed graph has a cycle..
approach2:
use 3 colors. white, gray, black. color only white nodes, white nodes to gray as you go down the DFS, color gray nodes to black when recursion unfolds (all children are processed). if not all children yet processed and you hit a gray node thats a cycle.
eg: all white to begin in above direct graph.
color A, B, D, F,G are colored white-gray. G is leaf so all children processed color it gray to black. recursion unfolds to F(all children processed) color it black. now you reach D, D has unprocessed children, so color E gray, G already colored black so dont go further down. E also has edge to D, so while still processing D (D still gray), you find an edge back to D(a gray node), a cycle is detected.
Testing for Topological sort over the given graph will lead you to the solution. If the algorithm for topsort, i.e the edges should always be directed in one way fails, then it means that the graph contains cycles.
Another simple solution would be a mark-and-sweep approach. Basically, for each node in tree you flag it as "visited" and then move on to it's children. If you ever see a node with the "visted" flag set, you know there's a cycle.
If modifying the graph to include a "visited" bit isn't possible, a set of node pointers can be used instead. To flag a node as visited, you place a pointer to it in the set. If the pointer is already in the set, there's a cycle.

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search and since cycles are not same as strongly connected components, I kept on searching and finally, I found an efficient algorithm which lists all (elementary) cycles of a directed graph. It is from Donald B. Johnson and the paper can be found in the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here, implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been), then just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj,node,visited):
if (visited[node]):
if (node == start):
"found a path"
return;
visited[node]=YES;
for child in adj[node]:
dfs(adj,child,visited)
visited[node]=NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest choice I found to solve this problem was using the python lib called networkx.
It implements the Johnson's algorithm mentioned in the best answer of this question but it makes quite simple to execute.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
#Return a list of cycles described as a list o nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all - you do not really want to try find literally all cycles because if there is 1 then there is an infinite number of those. For example A-B-A, A-B-A-B-A etc. Or it may be possible to join together 2 cycles into an 8-like cycle etc., etc... The meaningful approach is to look for all so called simple cycles - those that do not cross themselves except in the start/end point. Then if you wish you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: Do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time when the current node has a successor on the stack a simple cycle is discovered. It consists of the elements on the stack starting with the identified successor and ending with the top of the stack. Depth first traversal of all simple paths is similar to depth first search but you do not mark/record visited nodes other than those currently on the stack as stop points.
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs : The algorithm for those is different. Build a spanning tree and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so called cycle base. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this : http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
The DFS-based variants with back edges will find cycles indeed, but in many cases it will NOT be minimal cycles. In general DFS gives you the flag that there is a cycle but it is not good enough to actually find cycles. For example, imagine 5 different cycles sharing two edges. There is no simple way to identify cycles using just DFS (including backtracking variants).
Johnson's algorithm is indeed gives all unique simple cycles and has good time and space complexity.
But if you want to just find MINIMAL cycles (meaning that there may be more then one cycle going through any vertex and we are interested in finding minimal ones) AND your graph is not very large, you can try to use the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest way to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all the vertices using adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<=50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3), space complexity O(n^2) if you use parent tracking and O(1) if you don't.
First of all let's find the answer to the question if there is a cycle.
The algorithm is dead-simple. Below is snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
weights(i)(j) = throughK
}
}
}
Originally this algorithm operates on weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly you need to provide 1 if there is a directed edge between the nodes or NO_EDGE otherwise.
After algorithm executes, you can check the main diagonal, if there are values less then NO_EDGE than this node participates in a cycle of length equal to the value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use slightly modified version of algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = k
weights(i)(j) = throughK
}
}
}
Parents matrix initially should contain source vertex index in an edge cell if there is an edge between the vertices and -1 otherwise.
After function returns, for each edge you will have reference to the parent node in the shortest path tree.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2;
def shortestPathWithParentTracking(
weights: Array[Array[Int]],
parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = parents(i)(k)
weights(i)(j) = throughK
}
}
}
def recoverCycles(
cycleNodes: Seq[Int],
parents: Array[Array[Int]]): Set[Seq[Int]] = {
val res = new mutable.HashSet[Seq[Int]]()
for (node <- cycleNodes) {
var cycle = new mutable.ArrayBuffer[Int]()
cycle += node
var other = parents(node)(node)
do {
cycle += other
other = parents(other)(node)
} while(other != node)
res += cycle.sorted
}
res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
val n = 3
val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
shortestPathWithParentTracking(weights, parents)
val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
println("The following minimal cycle found:")
cycles.foreach(c => println(c.mkString))
println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
to find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
how do you determine the next valid
route
how do you determine if a point has
been used
how do you avoid crossing over the
same point again
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node as you find them into a collection as you get them, this means that you can see if you are "doubling back" over a point very easily by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using Sql Server 2008 there is are some new "hierarchy" things you can use to quickly solve this if you structure your data in a tree.
In the case of undirected graph, a paper recently published (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful for Google search
Start at node X and check for all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark it's children of being children of A, X', where X' is marked as being 2 steps away.). If you later hit X and mark it as being a child of X'', that means X is in a 3 node cycle. Backtracking to it's parent is easy (as-is, the algorithm has no support for this so you'd find whichever parent has X').
Note: If graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice for a cycle.
If what you want is to find all elementary circuits in a graph you can use the EC algorithm, by JAMES C. TIERNAN, found on a paper since 1970.
The very original EC algorithm as I managed to implement it in php (hope there are no mistakes is shown below). It can find loops too if there are any. The circuits in this implementation (that tries to clone the original) are the non zero elements. Zero here stands for non-existence (null as we know it).
Apart from that below follows an other implementation that gives the algorithm more independece, this means the nodes can start from anywhere even from negative numbers, e.g -4,-3,-2,.. etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";
$G = array(
1=>array(1,2,3),
2=>array(1,2,3),
3=>array(1,2,3)
);
define('N',key(array_slice($G, -1, 1, true)));
$P = array(1=>0,2=>0,3=>0,4=>0,5=>0);
$H = array(1=>$P, 2=>$P, 3=>$P, 4=>$P, 5=>$P );
$k = 1;
$P[$k] = key($G);
$Circ = array();
#[Path Extension]
EC2_Path_Extension:
foreach($G[$P[$k]] as $j => $child ){
if( $child>$P[1] and in_array($child, $P)===false and in_array($child, $H[$P[$k]])===false ){
$k++;
$P[$k] = $child;
goto EC2_Path_Extension;
} }
#[EC3 Circuit Confirmation]
if( in_array($P[1], $G[$P[$k]])===true ){//if PATH[1] is not child of PATH[current] then don't have a cycle
$Circ[] = $P;
}
#[EC4 Vertex Closure]
if($k===1){
goto EC5_Advance_Initial_Vertex;
}
//afou den ksana theoreitai einai asfales na svisoume
for( $m=1; $m<=N; $m++){//H[P[k], m] <- O, m = 1, 2, . . . , N
if( $H[$P[$k-1]][$m]===0 ){
$H[$P[$k-1]][$m]=$P[$k];
break(1);
}
}
for( $m=1; $m<=N; $m++ ){//H[P[k], m] <- O, m = 1, 2, . . . , N
$H[$P[$k]][$m]=0;
}
$P[$k]=0;
$k--;
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if($P[1] === N){
goto EC6_Terminate;
}
$P[1]++;
$k=1;
$H=array(
1=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
2=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
3=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
4=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
5=>array(1=>0,2=>0,3=>0,4=>0,5=>0)
);
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC6_Terminate:
print_r($Circ);
?>
then this is the other implementation, more independent of the graph, without goto and without array values, instead it uses array keys, the path, the graph and circuits are stored as array keys (use array values if you like, just change the required lines). The example graph start from -4 to show its independence.
<?php
$G = array(
-4=>array(-4=>true,-3=>true,-2=>true),
-3=>array(-4=>true,-3=>true,-2=>true),
-2=>array(-4=>true,-3=>true,-2=>true)
);
$C = array();
EC($G,$C);
echo "<pre>";
print_r($C);
function EC($G, &$C){
$CNST_not_closed = false; // this flag indicates no closure
$CNST_closed = true; // this flag indicates closure
// define the state where there is no closures for some node
$tmp_first_node = key($G); // first node = first key
$tmp_last_node = $tmp_first_node-1+count($G); // last node = last key
$CNST_closure_reset = array();
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$CNST_closure_reset[$k] = $CNST_not_closed;
}
// define the state where there is no closure for all nodes
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$H[$k] = $CNST_closure_reset; // Key in the closure arrays represent nodes
}
unset($tmp_first_node);
unset($tmp_last_node);
# Start algorithm
foreach($G as $init_node => $children){#[Jump to initial node set]
#[Initial Node Set]
$P = array(); // declare at starup, remove the old $init_node from path on loop
$P[$init_node]=true; // the first key in P is always the new initial node
$k=$init_node; // update the current node
// On loop H[old_init_node] is not cleared cause is never checked again
do{#Path 1,3,7,4 jump here to extend father 7
do{#Path from 1,3,8,5 became 2,4,8,5,6 jump here to extend child 6
$new_expansion = false;
foreach( $G[$k] as $child => $foo ){#Consider each child of 7 or 6
if( $child>$init_node and isset($P[$child])===false and $H[$k][$child]===$CNST_not_closed ){
$P[$child]=true; // add this child to the path
$k = $child; // update the current node
$new_expansion=true;// set the flag for expanding the child of k
break(1); // we are done, one child at a time
} } }while(($new_expansion===true));// Do while a new child has been added to the path
# If the first node is child of the last we have a circuit
if( isset($G[$k][$init_node])===true ){
$C[] = $P; // Leaving this out of closure will catch loops to
}
# Closure
if($k>$init_node){ //if k>init_node then alwaya count(P)>1, so proceed to closure
$new_expansion=true; // $new_expansion is never true, set true to expand father of k
unset($P[$k]); // remove k from path
end($P); $k_father = key($P); // get father of k
$H[$k_father][$k]=$CNST_closed; // mark k as closed
$H[$k] = $CNST_closure_reset; // reset k closure
$k = $k_father; // update k
} } while($new_expansion===true);//if we don't wnter the if block m has the old k$k_father_old = $k;
// Advance Initial Vertex Context
}//foreach initial
}//function
?>
I have analized and documented the EC but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a DAG.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex.
DFS from that vertex. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x].
dfs_index[x] stores when that node is visited, while dfs_lowval[x] = min(dfs_low[k]) where
k is all the children of x that is not the directly parent of x in the dfs-spanning tree.
All nodes with the same dfs_lowval[x] are in the same strongly connected component.
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and the digits number):
# include<cstdio>
using namespace std;
int main()
{
int n;
scanf("%d",&n);
int num[1000];
int visited[1000]={0};
int vindex[2000];
for(int i=1;i<=n;i++)
scanf("%d",&num[i]);
int t_visited=0;
int cycles=0;
int start=0, index;
while(t_visited < n)
{
for(int i=1;i<=n;i++)
{
if(visited[i]==0)
{
vindex[start]=i;
visited[i]=1;
t_visited++;
index=start;
break;
}
}
while(true)
{
index++;
vindex[index]=num[vindex[index-1]];
if(vindex[index]==vindex[start])
break;
visited[vindex[index]]=1;
t_visited++;
}
vindex[++index]=0;
start=index+1;
cycles++;
}
printf("%d\n",cycles,vindex[0]);
for(int i=0;i<(n+2*cycles);i++)
{
if(vindex[i]==0)
printf("\n");
else
printf("%d ",vindex[i]);
}
}
DFS c++ version for the pseudo-code in second floor's answer:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
if(visited[v]) {
if(v == start) {
for(auto c : path)
cout << c << " ";
cout << endl;
return;
}
else
return;
}
visited[v] = true;
path.push_back(v);
for(auto i : G[v])
findCircleUnit(start, i, visited, path);
visited[v] = false;
path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library give a set of algorithms and functions to detect cycles.
For a full algorithm explanation visit the wiki.

Resources