graph: find subgraphs using list of nodes including wildcards - algorithm

Does the following problem have a name, and is there an algorithm to solve it?
Given a graph, either directed or not, find all the paths which satisfy the specification given by
a list of exact nodes, or
'*?' which denotes just 'any node or no node at all', or
'*{n}' which denote 'any n consecutively connected nodes'
e.g.
A -> B -> *? -> D which results in ABXD and ABYD and ABD etc.
or
A -> *{1} -> D -> *? -> E which results in ABXDZE and ABYDZE and ABDZE etc. etc.
thanks
p.s.
Does anyone know a graph library doing this either in R or perl or C?

I don't know of a library for that, but you have to separate this into two parts:
parsing the user query
the algorithm that finds what you are looking for
For the parsing, I leave it to you to decide what you need (a parsing library or hand-rolled code).
Concerning the algorithm part, I suggest you define a special structure (like a linked list) to represent your query, in which each element denotes either a concrete node, exactly n nodes, or an unlimited number of nodes.
The core of the algorithm is then to find all paths from a node A to a node B using either an unlimited or a limited number of intermediate nodes. You can do this with dynamic programming, or with a search algorithm such as DFS or BFS.
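As an illustration of that last suggestion (my own sketch, not part of the answer), here is how all simple paths from a source to a target with at most a given number of intermediate nodes could be enumerated with a DFS in Python, assuming the graph is a dict of successor lists:

# Sketch: enumerate all simple paths from `src` to `dst` with at most
# `max_hops` intermediate nodes, using a depth-first search.
def paths_with_hops(graph, src, dst, max_hops):
    results = []
    def dfs(node, path, hops_left):
        if node == dst and len(path) > 1:
            results.append(list(path))      # reached the target, record the path
            return
        if hops_left < 0:                   # intermediate-node budget exhausted
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:             # keep the path simple (no cycles)
                path.append(nxt)
                dfs(nxt, path, hops_left - 1)
                path.pop()
    dfs(src, [src], max_hops)
    return results

graph = {"A": ["B"], "B": ["X", "Y", "D"], "X": ["D"], "Y": ["D"]}
print(paths_with_hops(graph, "A", "D", 2))
# [['A', 'B', 'X', 'D'], ['A', 'B', 'Y', 'D'], ['A', 'B', 'D']]

A wildcard query such as A -> *? -> D would then map to a call with a budget of one intermediate node; an exact '*{n}' wildcard would additionally require the number of intermediate nodes to match exactly.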

What I did at the end was:
The problem is to find all paths of length N between 2 nodes. Cycles are excluded.
read the data in as an edgelist, e.g. pairs of from->to nodes (names of nodes are assumed to be unique)
create a hashtable (an unordered_map in Boost/STL C++) with node names as keys and another hashtable as the value;
this second hashtable contains, as its keys, all the nodes the first node leads to.
For example
A->B
A->C
B->D
C->E
E->D
the resultant data structure holding the input data in perl notation looks like this after reading in all the data as an 'edgelist':
my %hash = (
'A' => {'B' => 1, 'C' => 1},
'B' => {'D' => 1},
'C' => {'E' => 1},
'E' => {'D' => 1},
);
finding whether a pair of nodes is DIRECTLY connected can be done roughly like this (perl):
sub search {
    my ($from, $to) = @_;
    return [] unless exists $hash{$from};
    if( $to eq '*' ){ return [ keys %{ $hash{$from} } ] }
    return exists $hash{$from}{$to} ? [$to] : [];
}
In the above function there is provision to return all the nodes a 'from' node is connected to, by setting $to to '*'. The return is an array ref of nodes connected directly to the $from parameter.
Searching for the path between two nodes requires using the above function recursively.
e.g.
sub path {
    my ($from, $to, $hops, $save_results) = @_;
    if( $from eq $to ){ return 1 }   # reached the target node
    if( $hops < 0 ){ return 0 }
    my $results = search($from, '*');
    if( scalar(@$results) == 0 ){ return 0 }
    my $found = 0;
    foreach my $result (@$results){
        my $a_node = Tree::Nary->new($result);
        if( path($result, $to, $hops-1, $a_node) == 1 ){
            $save_results->insert($save_results, -1, $a_node);
            $found = 1;
        }
    }
    return $found;
}
It is OK to use recursion as long as the depth is not too large (say $hops < 6), otherwise you risk a stack overflow [sic].
The most tricky part is to read through the results and extract the nodes for each path. After a lot of deliberation I decided to use a Tree::Nary (n-ary tree) to store the results. At the end we have the following tree:
     |-> B -> D
A -> |
     |-> C -> E -> D
In order to extract all the paths, do:
find all the leaf nodes
start from each leaf node moving backwards via its parent to the root node and saving the node name.
The above was implemented using perl, but I have also done it in C++ using boost::unordered_map for the hashtable. I haven't yet added a tree structure in the C++ code.
Results: for 3281415 edges and 18601 unique nodes, perl takes 3 mins to find A->'*'->'*'->B. I will give an update on the C++ code when ready.
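For illustration (not part of the original answer), a minimal Python sketch of the leaf-to-root extraction step described above, using a hypothetical minimal node type in place of Tree::Nary:

class Node:
    # hypothetical stand-in for Tree::Nary: name, parent pointer, children list
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)

def all_paths(root):
    # find every leaf, then walk each leaf back to the root via `parent`
    paths, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children:                    # leaf
            path, cur = [], node
            while cur is not None:
                path.append(cur.name)
                cur = cur.parent
            paths.append(list(reversed(path)))   # root ... leaf order
        else:
            stack.extend(node.children)
    return paths

# the example tree from above: A -> B -> D and A -> C -> E -> D
a = Node("A"); b = Node("B", a); c = Node("C", a)
Node("D", b); e = Node("E", c); Node("D", e)
print(all_paths(a))   # [['A', 'C', 'E', 'D'], ['A', 'B', 'D']] (order may vary)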

Related

Palindromes in a tree

I am looking at this challenge:
Given a tree with N nodes and N-1 edges. Each edge on the tree is labelled by a string of lowercase letters from the Latin alphabet. Given Q queries, consisting of two nodes u and v, check if it is possible to make a palindrome string which uses all the characters that belong to the string labelled on the edges in the path from node u to node v.
Characters can be used in any order.
N is of the order of 10^5 and Q is of the order of 10^6
Input:
N=3
u=1 v=3 weight=bc
u=1 v=2 weight=aba
Q=4
u=1 v=2
u=2 v=3
u=3 v=1
u=3 v=3
Output:
YES
YES
NO
NO
What I thought was to compute the LCA between two nodes in O(1) after precomputation, using a sparse table and range minimum queries on the Euler tour, then walk the paths from the LCA to node u and from the LCA to node v and accumulate the character frequencies. If the sum of all character frequencies is odd, we check that exactly one character has an odd frequency; if the sum is even, we check that every character has an even frequency. But this process will surely time out because Q can be up to 10^6.
Is there anyone with a better algorithm?
Preparation Step
Prepare your data structure as follows:
For each node get the path to the root, get all letters on the path, and only retain a letter when it occurs an odd number of times on that path. Finally encode that string with unique letters as a bit pattern, where bit 0 is set when there is an "a", bit 1 is set when there is a "b", ... bit 25 is set when there is a "z". Store this pattern with the node.
This preprocessing can be done with a depth-first recursive procedure, where the current node's pattern is passed down to the children, which can apply the edge's information to that pattern to create their own pattern. So this preprocessing can run in linear time in terms of the total number of characters in the tree, or more precisely O(N+S), where S represents that total number of characters.
Query Step
When a query is done perform the bitwise XOR on the two involved patterns. If the result is 0 or it has only one bit set, return "YES", else return "NO". So the query will not visit any other nodes than just the two ones that are given, look up the two patterns and perform their XOR and make the bit test. All this happens in constant time for one query.
The last query given in the question shows that the result should be "NO" when the two nodes are the same node. This is a boundary case, as it is debatable whether an empty string is a palindrome or not. The above XOR algorithm would return "YES", so you would need a specific test for this boundary case, and return "NO" instead.
Explanation
This works because if we look at the paths both nodes have to the root, they may share a part of their path. The characters on that common path should not be considered, and the XOR will make sure they aren't. Where the paths differ, we actually have the edges on the path from the one node to the other. There we see the characters that should contribute to a palindrome.
If a character appears an even number of times in those edges, it poses no problem for creating a palindrome. The XOR makes sure those characters "disappear".
If a character appears an odd number of times, all but one can mirror each other like in the even case. The remaining one can only be used in an odd-length palindrome, and only in the centre position of it. So there can only be one such character. This translates to the test that the XOR result is allowed to have 1 bit set (but not more).
Implementation
Here is an implementation in JavaScript. The example run uses the input as provided in the question. I did not bother to turn the query results from boolean to NO/YES:
function prepare(edges) {
  // edges: array of [u, v, weight] triplets
  // Build adjacency list from the list of edges
  let adjacency = {};
  for (let [u, v, weight] of edges) {
    // convert weight to pattern, as we don't really need to
    // store the strings
    let pattern = 0;
    for (let i = 0; i < weight.length; i++) {
      let ascii = weight.charCodeAt(i) - 97;
      pattern ^= 1 << ascii; // toggle bit that corresponds to letter
    }
    if (v in adjacency && u in adjacency) throw "Cycle detected!";
    if (!(v in adjacency)) adjacency[v] = {};
    if (!(u in adjacency)) adjacency[u] = {};
    adjacency[u][v] = pattern;
    adjacency[v][u] = pattern;
  }
  // Prepare the consolidated path-pattern for each node
  let patterns = {}; // This is the information to return
  function dfs(u, parent, pathPattern) {
    patterns[u] = pathPattern;
    for (let v in adjacency[u]) {
      // recurse into the "children" (the parent is not revisited)
      if (v !== parent) dfs(v, u, adjacency[u][v] ^ pathPattern);
    }
  }
  // Start a DFS from an arbitrary node as root
  dfs(edges[0][0], null, 0);
  return patterns;
}

function query(nodePatterns, u, v) {
  if (u === v) return false; // Boundary case.
  let pattern = nodePatterns[u] ^ nodePatterns[v];
  // "smart" test to verify that at most 1 bit is set
  return pattern === (pattern & -pattern);
}

// Example:
let edges = [[1, 3, "bc"], [1, 2, "aba"]];
let queries = [[1, 2], [2, 3], [3, 1], [3, 3]];
let nodePatterns = prepare(edges);
for (let [u, v] of queries) {
  console.log(u, v, query(nodePatterns, u, v));
}
First of all, let's choose a root. Now imagine that each edge points to the node that is deeper in the tree. Instead of keeping the strings on edges, put each string on the vertex its edge points to; now every vertex except the root carries a string. For each vertex, calculate and store the count of each letter in its string.
From now on we'll handle each letter separately.
Using DFS, calculate for each node v the number of occurrences of each letter on the vertices along the path from v to the root. You'll also need the LCA, so precompute an RMQ structure or find the LCA in O(log n) if you like. Let Letters[v][c] be the number of occurrences of letter c on the path from v to the root. Then the count of letter c on the path from u to v is simply Letters[v][c] + Letters[u][c] - 2 * Letters[LCA(u, v)][c]. Each single letter can be checked in O(1) (or O(log n) if you are not using RMQ), so all 26 letters can be checked in 26 * O(1) per query.
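For illustration only (my own sketch, not the answerer's code), the Letters table and a naive parent-walking LCA in Python; a real solution would use RMQ or binary lifting for the LCA, as described above:

from collections import defaultdict

def preprocess(edges, root=1):
    # edges: list of (u, v, label) describing the tree
    adj = defaultdict(list)
    for u, v, label in edges:
        adj[u].append((v, label))
        adj[v].append((u, label))
    parent, depth = {root: 0}, {root: 0}
    letters = {root: [0] * 26}        # letters[v][c] = count of letter c on path v..root
    stack = [root]
    while stack:
        u = stack.pop()
        for v, label in adj[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                cnt = letters[u][:]
                for ch in label:
                    cnt[ord(ch) - 97] += 1
                letters[v] = cnt
                stack.append(v)
    return parent, depth, letters

def lca(u, v, parent, depth):
    while depth[u] > depth[v]: u = parent[u]
    while depth[v] > depth[u]: v = parent[v]
    while u != v: u, v = parent[u], parent[v]
    return u

def can_make_palindrome(u, v, parent, depth, letters):
    a = lca(u, v, parent, depth)
    odd = sum((letters[u][c] + letters[v][c] - 2 * letters[a][c]) % 2 for c in range(26))
    return u != v and odd <= 1        # the question treats u == v as "NO"

parent, depth, letters = preprocess([(1, 3, "bc"), (1, 2, "aba")])
for u, v in [(1, 2), (2, 3), (3, 1), (3, 3)]:
    print("YES" if can_make_palindrome(u, v, parent, depth, letters) else "NO")
# prints YES YES NO NO for the example input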

An algorithm which finds a path in a tree along the maximum node values at each level

I am looking for an algorithm which finds a path in a tree along the maximum node values at each level. The following graphic illustrates the problem:
If all nodes on a level had unique values, this problem would be quite trivial. However, if there are duplicate values on a level, then I (assume that I) need some sort of look-ahead. As shown in the example, if there is a tie, the path is chosen which allows for a higher value on the next level. Taken to an "extreme", the path
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 0
would be chosen over the path
1 -> 2 -> 3 -> 4 -> 5 -> 5 -> 99
As soon as there is only one possible path, the algorithm may exit.
Until now, my naive solution has been to find all the possible paths and then compare them level by level, which works for the (small) trees I am dealing with, but I'd like to find a "proper" solution.
I am quite sure that this algorithm has already been implemented, however, I find it hard to find the right search terms for it. Is anyone aware of a solution to this problem?
You can start from the root and maintain the list of candidates. Initially the root is the only element in the list. At each iteration you can look at all children of the current candidates and put those that have the maximum value into a new candidates list (that is, each iteration goes one level lower).
Each node is visited at most once, so this solution has a linear time complexity.
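A minimal Python sketch of this candidates idea (my own illustration; Node is a hypothetical tree type, and the function returns the maximum value chosen at each level, while recovering the concrete nodes additionally requires keeping parent pointers):

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def max_values_per_level(root):
    levels = [root.value]
    candidates = [root]                     # all nodes still tied for the best path
    while True:
        children = [c for node in candidates for c in node.children]
        if not children:
            return levels
        best = max(c.value for c in children)
        candidates = [c for c in children if c.value == best]
        levels.append(best)

# tiny example: two children with the same value 2, the tie is broken by the next level
tree = Node(1, [Node(2, [Node(3)]), Node(2, [Node(5)])])
print(max_values_per_level(tree))   # [1, 2, 5]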
Here is a possible implementation (not tested)
Assume a tree structure based on Nodes as follows:
internal class Node
{
    public int Value ;
    public List<Node> Children ;
}
Main program
// Initialize the best path and call the Seek procedure
List<Node> BestPath = SeekBestSubPath(TheRootNode) ;
Recursive function
private List<Node> SeekBestSubPath(Node CurrentNode)
{
    // identify one or more children with max value
    List<Node> BestChildrenNodes = new List<Node>() ;
    int Max = int.MinValue ;
    for (int i = 0; i < CurrentNode.Children.Count; i++)
        if (CurrentNode.Children[i].Value > Max)
        {
            Max = CurrentNode.Children[i].Value ;
            BestChildrenNodes = new List<Node>() { CurrentNode.Children[i] } ;
        }
        else if (CurrentNode.Children[i].Value == Max)
            BestChildrenNodes.Add(CurrentNode.Children[i]) ;
    // Process BestChildrenNodes
    List<Node> result = new List<Node>() ;
    for (int i = 0; i < BestChildrenNodes.Count; i++)
    {   // search the best sub path for each child and keep the best one among all
        List<Node> ChildBestSubPath = SeekBestSubPath(BestChildrenNodes[i]) ;
        if (result.Count == 0 || PathComparator(ChildBestSubPath, result) > 0)
            result = ChildBestSubPath ;
    }
    result.Insert(0, CurrentNode) ;
    return result ;
}
Compare 2 subpaths
// returns 0: equality, 1: Path1 > Path2, -1: Path1 < Path2
private int PathComparator(List<Node> Path1, List<Node> Path2)
{
    int result = 0 ;
    for (int i = 0; i < Math.Min(Path1.Count, Path2.Count) && result == 0; i++)
        result = Math.Sign(Path1[i].Value.CompareTo(Path2[i].Value)) ;
    if (result == 0) result = Math.Sign(Path1.Count.CompareTo(Path2.Count)) ;
    return result ;
}

Edge classification during Breadth-first search on a directed graph

I am having difficulties finding a way to properly classify the edges while doing a breadth-first search on a directed graph.
During a breadth-first or depth-first search, you can classify the edges encountered into 4 classes:
TREE
BACK
CROSS
FORWARD
Skiena [1] gives an implementation. If you move along an edge from v1 to v2, here is a way to return the class during a DFS in java, for reference. The parents map returns the parent vertex for the current search, and the timeOf() method, the time at which the vertex has been discovered.
if ( v1.equals( parents.get( v2 ) ) ) { return EdgeClass.TREE; }
if ( discovered.contains( v2 ) && !processed.contains( v2 ) ) { return EdgeClass.BACK; }
if ( processed.contains( v2 ) )
{
    if ( timeOf( v1 ) < timeOf( v2 ) )
    {
        return EdgeClass.FORWARD;
    }
    else
    {
        return EdgeClass.CROSS;
    }
}
return EdgeClass.UNCLASSIFIED;
My problem is that I cannot get it right for a Breadth first search on a directed graph. For instance:
The following graph - that is a loop - is ok:
A -> B
A -> C
B -> C
BFSing from A, B will be discovered, then C. The edges eAB and eAC are TREE edges, and when eBC is crossed last, B and C are processed and discovered, and this edge is properly classified as CROSS.
But a plain loop does not work:
A -> B
B -> C
C -> A
When the edge eCA is crossed last, A is processed and discovered. So this edge is incorrectly labeled as CROSS, whereas it should be a BACK edge.
There is indeed no difference in the way the two cases are treated, even if the two edges have different classes.
How do you implement a proper edge classification for a BFS on a directed graph?
[1] http://www.algorist.com/
EDIT
Here is an implementation derived from @redtuna's answer.
I just added a check not to fetch the parent of the root.
I have JUnit tests that show it works for directed and undirected graphs, in the cases of a loop, a straight line, a fork, a standard example, a single edge, etc.
@Override
public EdgeClass edgeClass( final V from, final V to )
{
    if ( !discovered.contains( to ) ) { return EdgeClass.TREE; }
    int toDepth = depths.get( to );
    int fromDepth = depths.get( from );
    V b = to;
    while ( toDepth > 0 && fromDepth < toDepth )
    {
        b = parents.get( b );
        toDepth = depths.get( b );
    }
    V a = from;
    while ( fromDepth > 0 && toDepth < fromDepth )
    {
        a = parents.get( a );
        fromDepth = depths.get( a );
    }
    if ( a.equals( b ) )
    {
        return EdgeClass.BACK;
    }
    else
    {
        return EdgeClass.CROSS;
    }
}
How do you implement a proper edge classification for a BFS on a directed graph?
As you already established, seeing a node for the first time creates a tree edge. The problem with BFS instead of DFS, as David Eisenstat said before me, is that back edges cannot be distinguished from cross ones just based on traversal order.
Instead, you need to do a bit of extra work to distinguish them. The key, as you'll see, is to use the definition of a cross edge.
The simplest (but memory-intensive) way is to associate every node with the set of its predecessors. This can be done trivially when you visit nodes. When finding a non-tree edge between nodes a and b, consider their predecessor sets. If one is a proper subset of the other, then you have a back edge. Otherwise, it's a cross edge. This comes directly from the definition of a cross edge: it's an edge between nodes where neither is the ancestor nor the descendant of the other on the tree.
A better way is to associate only a "depth" number with each node instead of a set. Again, this is readily done as you visit nodes. Now when you find a non-tree edge between a and b, start from the deeper of the two nodes and follow the tree edges backwards until you go back to the same depth as the other. So for example suppose a was deeper. Then you repeatedly compute a=parent(a) until depth(a)=depth(b).
If at this point a=b then you can classify the edge as a back edge because, as per the definition, one of the nodes is an ancestor of the other on the tree. Otherwise you can classify it as a cross edge because we know that neither node is an ancestor or descendant of the other.
pseudocode:
foreach edge(a,b) in BFS order:
    if !b.known then:
        b.known = true
        b.depth = a.depth+1
        edge type is TREE
        continue to next edge
    while (b.depth > a.depth): b=parent(b)
    while (a.depth > b.depth): a=parent(a)
    if a==b then:
        edge type is BACK
    else:
        edge type is CROSS
The key property of DFS here is that, given two nodes u and v, the interval [u.discovered, u.processed] is a subinterval of [v.discovered, v.processed] if and only if u is a descendant of v. The times in BFS do not have this property; you have to do something else, e.g., compute the intervals via DFS on the tree that BFS produced. Then the classification is:
1. check for membership in the tree (tree edge)
2. check whether the head's interval contains the tail's (back edge)
3. check whether the tail's interval contains the head's (forward edge)
4. otherwise, declare a cross edge.
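A minimal Python sketch of that approach (my own illustration, assuming the graph is a dict of successor lists and every node is reachable from the chosen root): BFS builds the tree, a DFS over that tree assigns [discover, finish] intervals, and non-tree edges are then classified by interval containment.

from collections import deque

def classify_edges(graph, root):
    parent = {root: None}
    tree = {root: []}
    queue = deque([root])
    while queue:                           # plain BFS building the BFS tree
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in parent:
                parent[v] = u
                tree.setdefault(u, []).append(v)
                tree.setdefault(v, [])
                queue.append(v)

    disc, fin, t = {}, {}, 0
    def dfs(u):                            # intervals computed on the BFS tree
        nonlocal t
        disc[u] = t; t += 1
        for v in tree[u]:
            dfs(v)
        fin[u] = t; t += 1
    dfs(root)

    labels = {}
    for u in graph:
        for v in graph[u]:
            if parent.get(v) == u:
                labels[(u, v)] = "TREE"
            elif disc[v] <= disc[u] and fin[u] <= fin[v]:
                labels[(u, v)] = "BACK"      # v is an ancestor of u
            elif disc[u] <= disc[v] and fin[v] <= fin[u]:
                labels[(u, v)] = "FORWARD"   # v is a descendant of u
            else:
                labels[(u, v)] = "CROSS"
    return labels

g = {"A": ["B"], "B": ["C"], "C": ["A"]}     # the plain loop from the question
print(classify_edges(g, "A"))                # ('C', 'A') is reported as BACK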
Instead of timeOf(), you need another vertex property that contains the distance from the root. Let's name it distance.
You have to process a vertex v in the following way:
for (v0 in v.neighbours) {
if (!v0.discovered) {
v0.discovered = true;
v0.parent = v;
v0.distance = v.distance + 1;
}
}
v.processed = true;
After you have processed a vertex v, you can run the following algorithm for every edge (from v1 to v2) of v:
if (!v1.discovered) return EdgeClass.BACK;
else if (!v2.discovered) return EdgeClass.FORWARD;
else if (v1.distance == v2.distance) return EdgeClass.CROSS;
else if (v1.distance > v2.distance) return EdgeClass.BACK;
else {
    if (v2.parent == v1) return EdgeClass.TREE;
    else return EdgeClass.FORWARD;
}

Merging some sorted lists with unknown order sequence

I have some lists with a variable number of elements. Each list is sorted, but the sorting algorithm is not known. I would like to merge the lists into one big list which contains all lists in the same order, without duplicates.
Example Input:
XS,M,L,XL
S,M,XXL
XXS,XS,S,L
Expected Result:
XXS,XS,S,M,L,XL,XXL
The expected result is obtained by matching up the input sequences in order to obtain a merged result that contains the elements of each input sequence in the correct order, like this:
XS M L XL
S M XXL
XXS XS S L
-------------------
XXS XS S M L XL XXL
The function should notify me if there are elements with ambiguous positions. Here, that would be XXL (it could go after M, L or XL), and I would need to specify its position manually, after XL (because here I know the sort order and can help).
I thought about defining pairs of elements, each pair ordered as in the original lists. From these one could build the complete list.
This can be solved with a Topological Sort algorithm.
If you consider each of your input sequences to be a path through a directed graph, a topological sort will order your set of nodes from left to right in such a way that each directed edge points to the right.
The wikipedia page on Topological Sorting includes this algorithm, first described by Arthur Kahn in 1962:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
remove a node n from S
insert n into L
for each node m with an edge e from n to m do
remove edge e from the graph
if m has no other incoming edges then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
This algorithm, as written, doesn't actually fail if it finds ambiguous sequences, but that's easy to add by inserting a check at the beginning of the loop, like this:
...
while S is non-empty do
if S contains more than 1 item
return error (inputs are ambiguous)
remove a node n from S
...
I don't know what language you're working in, but I've thrown together this PHP implementation as a proof of concept:
function mergeSequences($sequences, $detectAmbiguity = false) {
    // build a list of nodes, with each node recording a list of all incoming edges
    $nodes = array();
    foreach ($sequences as $seq) {
        foreach ($seq as $i => $item) {
            if (!isset($nodes[$item])) $nodes[$item] = array();
            if ($i !== 0) {
                $nodes[$item][] = $seq[$i-1];
            }
        }
    }
    // build a list of all nodes with no incoming edges
    $avail = array();
    foreach ($nodes as $item => $edges) {
        if (count($edges) == 0) {
            $avail[] = $item;
            unset($nodes[$item]);
        }
    }
    $sorted = array();
    $curr = '(start)';
    while (count($avail) > 0) {
        // optional: check for ambiguous sequence
        if ($detectAmbiguity && count($avail) > 1) {
            throw new Exception("Ambiguous sequence: {$curr} can be followed by " . join(' or ', $avail));
        }
        // get the next item and add it to the sorted list
        $curr = array_pop($avail);
        $sorted[] = $curr;
        // remove all edges from the currently selected items to all others
        foreach ($nodes as $item => $edges) {
            $nodes[$item] = array_diff($edges, array($curr));
            if (count($nodes[$item]) == 0) {
                $avail[] = $item;
                unset($nodes[$item]);
            }
        }
    }
    if (count($nodes) > 0) {
        throw new Exception('Sequences contain conflicting information. Cannot continue after: ' . join(', ', $sorted));
    }
    return $sorted;
}
You can call the function like this:
$input = array(
array('XS', 'M', 'L', 'XL'),
array('S', 'M', 'XXL'),
array('XXS', 'XS', 'S', 'L'),
);
echo(join(', ', mergeSequences($input)));
echo(join(', ', mergeSequences($input, true)));
To get the following output:
XXS, XS, S, M, XXL, L, XL
Uncaught exception 'Exception' with message 'Ambiguous sequence: M can be followed by L or XXL'
You are trying to merge partially ordered sets, or posets. The ambiguous parts of the merge are called antichains. So, you want an algorithm that merges posets and tells you what the antichains are.
Here is a paper describing an algorithm for merging posets and detecting antichains, as well as a link to the first author's homepage in case you want to contact him to see if there is any source code available.
Here's what I would do:
1. Preprocess the lists: figuring out that XXS is smaller than XS, which is smaller than S, which is smaller than ... XXL is a constraint satisfaction problem (http://en.wikipedia.org/wiki/Constraint_satisfaction_problem). This type of problem involves searching for a correct ordering among all the elements given the constraints defined in the original lists.
2. Create a bidirectional mapping from the set {XXS, ..., XXL} to the set {1, ..., 6}, after you have done step 1.
3. For each list, create another list by using the mapping defined in step 2.
4. Use a modified merge sort (http://en.wikipedia.org/wiki/Merge_sort) to combine two lists. Modify the merge algorithm so that it reports if two items being compared are identical (and disregards one of the items being merged). A sketch of this step follows the list.
5. Do step 4 for each pair of lists until there is one list.
6. Using the mapping defined in step 2, create the text version of the list.
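For illustration only (my sketch, not the answerer's code), the modified merge from step 4 in Python, assuming the integer mapping from step 2 has already been built:

def merge_unique(a, b):
    # merge two sorted integer lists, keeping only one copy of identical items
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1   # identical: keep one, advance both
        elif a[i] < b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

# sizes mapped as XXS=1, XS=2, S=3, M=4, L=5, XL=6, XXL=7 (step 2)
lists = [[2, 4, 5, 6], [3, 4, 7], [1, 2, 3, 5]]
merged = lists[0]
for other in lists[1:]:                        # step 5: fold the merge over all lists
    merged = merge_unique(merged, other)
print(merged)                                  # [1, 2, 3, 4, 5, 6, 7]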
For the sorting part, I think merge sort is enough according to your description. One thing that needs to be modified: during merging, we should skip an element of the input array if it is the same as the last element already in the result array.
If I understand correctly, you want to build a total order over all possible input elements. A partial order is already defined by the input arrays (since they are already sorted), while the rest needs to be specified by the user. For example, in the question, the order
'S'<'M'<'XXL'
'XS'<'M'<'L'<'XL'
'XXS'<'XS'<'S'<'L'
is well defined, but the algorithm still doesn't know whether 'XXL' is bigger or smaller than 'XL' or 'L'.
Since the three input arrays are sorted, there must exist a total order of the input elements. So my suggestion is to ask your data provider for an ordered list of all possible data elements. It sounds stupid, but it's an easy way.
If this list is not available, an easy approach is to prompt the user to order a pair whenever the algorithm encounters an ambiguous pair, then check whether the answer conflicts with the existing input sequences and remember it. I think topological sorting is more powerful than this application needs: since we deal with single data elements there must exist a total order, while topological sorting deals with partial orders.

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search and, since cycles are not the same as strongly connected components, I kept on searching and finally found an efficient algorithm that lists all (elementary) cycles of a directed graph. It is by Donald B. Johnson and the paper can be found at the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here; the implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been to), then just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj, node, visited):
    if (visited[node]):
        if (node == start):
            "found a path"
        return;
    visited[node] = YES;
    for child in adj[node]:
        dfs(adj, child, visited)
    visited[node] = NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest choice I found to solve this problem was using the Python library networkx.
It implements Johnson's algorithm, mentioned in the top answer to this question, but makes it quite simple to execute.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
# Return a list of cycles, each described as a list of nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all - you do not really want to try to find literally all cycles, because if there is one then there is an infinite number of them. For example A-B-A, A-B-A-B-A, etc. Or it may be possible to join together 2 cycles into an 8-like cycle, etc., etc... The meaningful approach is to look for all so-called simple cycles - those that do not cross themselves except at the start/end point. Then, if you wish, you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time the current node has a successor on the stack, a simple cycle is discovered. It consists of the elements on the stack, starting with the identified successor and ending with the top of the stack. Depth-first traversal of all simple paths is similar to depth-first search, but you do not mark/record visited nodes other than those currently on the stack as stop points.
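A concrete Python sketch of that baseline (my own illustration; it is exponential and duplicates cycles, exactly as noted below):

def simple_cycles_bruteforce(graph):
    # graph: dict node -> list of successors
    cycles, stack, on_stack = [], [], set()

    def dfs(node):
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack:
                # cycle = stack from the first occurrence of nxt up to the top
                cycles.append(stack[stack.index(nxt):] + [nxt])
            else:
                dfs(nxt)
        on_stack.remove(node)
        stack.pop()

    for start in graph:
        dfs(start)
    return cycles

g = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
for cyc in simple_cycles_bruteforce(g):
    print("->".join(cyc))     # each simple cycle shows up once per possible start node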
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs: the algorithm for those is different. Build a spanning tree, and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so-called cycle basis. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this: http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
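For undirected graphs, networkx provides a cycle basis built from a spanning tree via cycle_basis(); a quick sketch (not from the answer):

import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a")])
print(nx.cycle_basis(G))   # e.g. [['b', 'c', 'a'], ['d', 'c', 'a']] (a cycle basis)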
The DFS-based variants with back edges will indeed find cycles, but in many cases these will NOT be minimal cycles. In general, DFS gives you a flag that there is a cycle, but that is not good enough to actually find the cycles. For example, imagine 5 different cycles sharing two edges. There is no simple way to identify cycles using just DFS (including backtracking variants).
Johnson's algorithm indeed gives all unique simple cycles and has good time and space complexity.
But if you want to just find MINIMAL cycles (meaning that there may be more than one cycle going through any vertex and we are interested in finding minimal ones) AND your graph is not very large, you can try to use the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest ways to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all the vertices using an adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<=50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3), space complexity O(n^2) if you use parent tracking and O(1) if you don't.
First of all, let's answer the question of whether there is a cycle at all.
The algorithm is dead-simple. Below is snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
weights(i)(j) = throughK
}
}
}
Originally this algorithm operates on a weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly here, you need to provide 1 if there is a directed edge between the nodes, or NO_EDGE otherwise.
After the algorithm executes, you can check the main diagonal: if there is a value less than NO_EDGE, then that node participates in a cycle of length equal to that value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use slightly modified version of algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = k
weights(i)(j) = throughK
}
}
}
The parents matrix should initially contain the source vertex index in a cell if there is an edge between the vertices, and -1 otherwise.
After function returns, for each edge you will have reference to the parent node in the shortest path tree.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2;
def shortestPathWithParentTracking(
weights: Array[Array[Int]],
parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = parents(i)(k)
weights(i)(j) = throughK
}
}
}
def recoverCycles(
cycleNodes: Seq[Int],
parents: Array[Array[Int]]): Set[Seq[Int]] = {
val res = new mutable.HashSet[Seq[Int]]()
for (node <- cycleNodes) {
var cycle = new mutable.ArrayBuffer[Int]()
cycle += node
var other = parents(node)(node)
do {
cycle += other
other = parents(other)(node)
} while(other != node)
res += cycle.sorted
}
res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
val n = 3
val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
shortestPathWithParentTracking(weights, parents)
val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
println("The following minimal cycle found:")
cycles.foreach(c => println(c.mkString))
println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
to find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
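To illustrate the first point (my own sketch, not from the answer): networkx's condensation() collapses each strongly connected component into a single node, and the result is always a DAG.

import networkx as nx

G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("e", "d")])
C = nx.condensation(G)                      # one node per SCC
print(C.nodes(data="members"))              # which original nodes each SCC node contains
print(nx.is_directed_acyclic_graph(C))      # True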
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
how do you determine the next valid route
how do you determine if a point has been used
how do you avoid crossing over the same point again
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node into a collection as you find it; this means you can very easily see whether you are "doubling back" over a point by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using SQL Server 2008, there are some new "hierarchy" features you can use to quickly solve this if you structure your data in a tree.
In the case of an undirected graph, a recently published paper (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful to someone coming from a Google search.
Start at node X and check all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark its children as being children of A, X', where X' is marked as being 2 steps away. If you later hit X and mark it as being a child of X'', that means X is in a 3-node cycle. Backtracking to its parent is easy (as-is, the algorithm has no support for this, so you'd find whichever parent has X').
Note: if the graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice for a cycle.
If what you want is to find all elementary circuits in a graph, you can use the EC algorithm by James C. Tiernan, published in a paper from 1970.
The very original EC algorithm, as I managed to implement it in PHP, is shown below (I hope there are no mistakes). It can find loops too, if there are any. The circuits in this implementation (which tries to clone the original) are the non-zero elements. Zero here stands for non-existence (null as we know it).
Apart from that, below follows another implementation that gives the algorithm more independence; this means the nodes can start from anywhere, even from negative numbers, e.g. -4, -3, -2, etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";
$G = array(
1=>array(1,2,3),
2=>array(1,2,3),
3=>array(1,2,3)
);
define('N',key(array_slice($G, -1, 1, true)));
$P = array(1=>0,2=>0,3=>0,4=>0,5=>0);
$H = array(1=>$P, 2=>$P, 3=>$P, 4=>$P, 5=>$P );
$k = 1;
$P[$k] = key($G);
$Circ = array();
#[Path Extension]
EC2_Path_Extension:
foreach($G[$P[$k]] as $j => $child ){
if( $child>$P[1] and in_array($child, $P)===false and in_array($child, $H[$P[$k]])===false ){
$k++;
$P[$k] = $child;
goto EC2_Path_Extension;
} }
#[EC3 Circuit Confirmation]
if( in_array($P[1], $G[$P[$k]])===true ){//if PATH[1] is not child of PATH[current] then don't have a cycle
$Circ[] = $P;
}
#[EC4 Vertex Closure]
if($k===1){
goto EC5_Advance_Initial_Vertex;
}
// since it is not considered again, it is safe to delete it
for( $m=1; $m<=N; $m++){//H[P[k], m] <- O, m = 1, 2, . . . , N
if( $H[$P[$k-1]][$m]===0 ){
$H[$P[$k-1]][$m]=$P[$k];
break(1);
}
}
for( $m=1; $m<=N; $m++ ){//H[P[k], m] <- O, m = 1, 2, . . . , N
$H[$P[$k]][$m]=0;
}
$P[$k]=0;
$k--;
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if($P[1] === N){
goto EC6_Terminate;
}
$P[1]++;
$k=1;
$H=array(
1=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
2=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
3=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
4=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
5=>array(1=>0,2=>0,3=>0,4=>0,5=>0)
);
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC6_Terminate:
print_r($Circ);
?>
Then there is the other implementation, more independent of the graph, without goto and without array values; instead it uses array keys: the path, the graph and the circuits are stored as array keys (use array values if you like, just change the required lines). The example graph starts from -4 to show its independence.
<?php
$G = array(
-4=>array(-4=>true,-3=>true,-2=>true),
-3=>array(-4=>true,-3=>true,-2=>true),
-2=>array(-4=>true,-3=>true,-2=>true)
);
$C = array();
EC($G,$C);
echo "<pre>";
print_r($C);
function EC($G, &$C){
$CNST_not_closed = false; // this flag indicates no closure
$CNST_closed = true; // this flag indicates closure
// define the state where there is no closures for some node
$tmp_first_node = key($G); // first node = first key
$tmp_last_node = $tmp_first_node-1+count($G); // last node = last key
$CNST_closure_reset = array();
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$CNST_closure_reset[$k] = $CNST_not_closed;
}
// define the state where there is no closure for all nodes
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$H[$k] = $CNST_closure_reset; // Key in the closure arrays represent nodes
}
unset($tmp_first_node);
unset($tmp_last_node);
# Start algorithm
foreach($G as $init_node => $children){#[Jump to initial node set]
#[Initial Node Set]
$P = array(); // declare at startup, remove the old $init_node from the path on each loop
$P[$init_node]=true; // the first key in P is always the new initial node
$k=$init_node; // update the current node
// On loop, H[old_init_node] is not cleared because it is never checked again
do{#Path 1,3,7,4 jump here to extend father 7
do{#Path from 1,3,8,5 became 2,4,8,5,6 jump here to extend child 6
$new_expansion = false;
foreach( $G[$k] as $child => $foo ){#Consider each child of 7 or 6
if( $child>$init_node and isset($P[$child])===false and $H[$k][$child]===$CNST_not_closed ){
$P[$child]=true; // add this child to the path
$k = $child; // update the current node
$new_expansion=true;// set the flag for expanding the child of k
break(1); // we are done, one child at a time
} } }while(($new_expansion===true));// Do while a new child has been added to the path
# If the first node is child of the last we have a circuit
if( isset($G[$k][$init_node])===true ){
$C[] = $P; // Leaving this out of closure will catch loops to
}
# Closure
if($k>$init_node){ //if k>init_node then always count(P)>1, so proceed to closure
$new_expansion=true; // $new_expansion is never true, set true to expand father of k
unset($P[$k]); // remove k from path
end($P); $k_father = key($P); // get father of k
$H[$k_father][$k]=$CNST_closed; // mark k as closed
$H[$k] = $CNST_closure_reset; // reset k closure
$k = $k_father; // update k
} } while($new_expansion===true); // if we don't enter the if block, k keeps its old value
// Advance Initial Vertex Context
}//foreach initial
}//function
?>
I have analyzed and documented the EC algorithm, but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a DAG.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex.
DFS from that vertex. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x].
dfs_index[x] stores when the node is visited, while dfs_lowval[x] = min(dfs_lowval[k]), where k ranges over all the children of x that are not the direct parent of x in the DFS spanning tree.
All nodes with the same dfs_lowval[x] are in the same strongly connected component.
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
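For the first step (finding the strongly connected components), a compact recursive Tarjan sketch in Python (my own illustration, not the code from the link above):

def tarjan_scc(graph):
    # graph: dict node -> list of successors; returns a list of SCCs
    index, low = {}, {}                 # dfs_index and dfs_lowval from the description
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a strongly connected component
            comp = []
            while True:
                w = stack.pop(); on_stack.remove(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

g = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(tarjan_scc(g))    # [['D'], ['C', 'B', 'A']] -- A, B, C form one SCC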
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and then the numbers):
#include <cstdio>
using namespace std;

int main()
{
    int n;
    scanf("%d", &n);
    int num[1000];
    int visited[1000] = {0};
    int vindex[2000];
    for (int i = 1; i <= n; i++)
        scanf("%d", &num[i]);

    int t_visited = 0;
    int cycles = 0;
    int start = 0, index;

    while (t_visited < n)
    {
        for (int i = 1; i <= n; i++)
        {
            if (visited[i] == 0)
            {
                vindex[start] = i;
                visited[i] = 1;
                t_visited++;
                index = start;
                break;
            }
        }
        while (true)
        {
            index++;
            vindex[index] = num[vindex[index - 1]];
            if (vindex[index] == vindex[start])
                break;
            visited[vindex[index]] = 1;
            t_visited++;
        }
        vindex[++index] = 0;
        start = index + 1;
        cycles++;
    }

    printf("%d\n", cycles);
    for (int i = 0; i < (n + 2 * cycles); i++)
    {
        if (vindex[i] == 0)
            printf("\n");
        else
            printf("%d ", vindex[i]);
    }
}
A C++ DFS version of the pseudo-code in the second answer:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
    if (visited[v]) {
        if (v == start) {
            for (auto c : path)
                cout << c << " ";
            cout << endl;
            return;
        }
        else
            return;
    }
    visited[v] = true;
    path.push_back(v);
    for (auto i : G[v])
        findCircleUnit(start, i, visited, path);
    visited[v] = false;
    path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library provides a set of algorithms and functions to detect cycles.
For a full explanation of the algorithms, visit the wiki.
