minimum collection of vertice disjoint path that covers a given vertice set - algorithm

Problem
Given:
A directed graph G
A source vertex s in G and a target vertex t in G
A set S of vertices of G
I want to find a collection of paths from s to t that covers S.
Then I want to partition the collection of paths into subcollections of vertex-disjoint paths.
Under these constraints, the objective is to minimise the number of subcollections.
Example
For instance, [C1 = {p1,p2,p3}, C2= {p4,p5}, C3= {p6,p7}] is a solution if:
each p_i is a path from s to t
p1,p2,p3 have no vertices in common except s and t;
p4, p5 have no vertices in common except s and t;
p6,p7 have no vertices in common except s and t;
collectively, the 7 paths cover all vertices of S.
In that case, the number of subcollections is 3.
Question
What are some good algorithms or heuristics for this optimisation problem?
I already know min cost flow, and disjoint path algos, but they don't apply in my settings.
I tried min cost flow / node disjoint paths but one run only gives one collection at a time. I don't know how to adjust cost to cover the unexplored vertices.

Given:
A directed graph G
A source vertex s in G and a target vertex t in G
A set S of vertices of G
I want to find a collection of paths from s to t that covers S.
Use Dijkstra's algorithm to find a path from s to every vertex in S and from every point in S to t.
Connect the paths to and from each S vertex into one path from s to t via a point in S.
You now have a collection of paths that, together, cover S. Let's call it CS.
Then I want to partition the collection of paths into subcollections
of vertex-disjoint paths.
Note that if s, the source vertex, has an out degree of sOD, there can be no more than sOD paths in each vertex disjoint collection.
Construct vVDP, an empty vector of vertex disjoint path collections
LOOP P over paths in CS
SET found FALSE
LOOP cs over collections in vVDP
IF P is vertex disjoint with every path in cs
add P to cs
SET found TRUE
BREAK out of LOOP cs
IF found == false
ADD collection containing P to vVDP
Here is a C++ implementation of this algorithm
void cProblem::collect()
{
// loop over paths
for (auto &P : vpath)
{
// loop over collections
bool found = false;
for (auto &cs : vVDP)
{
//loop over paths in collection
bool disjoint;
for (auto &csPath : cs)
{
// check that P is vertex disjoint with path in collection
disjoint = true;
for (auto vc : csPath)
{
for (auto &vp : P)
{
if (vp == vc) {
disjoint = false;
break;
}
}
}
if( ! disjoint )
break;
}
if (disjoint)
{
// P is vertex disjoint from every path in collection
// add P to the collection
cs.push_back(P);
found = true;
break;
}
}
if (!found)
{
// P was NOT vertex disjoint with the paths in any collection
// start a new collection with P
std::vector<std::vector<int>> collection;
collection.push_back(P);
vVDP.push_back(collection);
}
}
}
The complete application is at https://github.com/JamesBremner/so75419067
Detailed documentation id the required input file format at
https://github.com/JamesBremner/so75419067/wiki
If you post a real example in the correct format, I will run the algorithm on it for you.

Related

sort graph by distance to end nodes

I have a list of nodes which belong in a graph. The graph is directed and does not contain cycles. Also, some of the nodes are marked as "end" nodes. Every node has a set of input nodes I can use.
The question is the following: How can I sort (ascending) the nodes in the list by the biggest distance to any reachable end node? Here is an example off how the graph could look like.
I have already added the calculated distance after which I can sort the nodes (grey). The end nodes have the distance 0 while C, D and G have the distance 1. However, F has the distance of 3 because the approach over D would be shorter (2).
I have made a concept of which I think, the problem would be solved. Here is some pseudo-code:
sortedTable<Node, depth> // used to store nodes and their currently calculated distance
tempTable<Node>// used to store nodes
currentDepth = 0;
- fill tempTable with end nodes
while( tempTable is not empty)
{
- create empty newTempTable<Node node>
// add tempTable to sortedTable
for (every "node" in tempTable)
{
if("node" is in sortedTable)
{
- overwrite depth in sortedTable with currentDepth
}
else
{
- add (node, currentDepth) to sortedTable
}
// get the node in the next layer
for ( every "newNode" connected to node)
{
- add newNode to newTempTable
}
- tempTable = newTempTable
}
currentDepth++;
}
This approach should work. However, the problem with this algorithm is that it basicly creates a tree from the graph based from every end node and then corrects old distance-calculations for every depth. For example: G would have the depth 1 (calculatet directly over B), then the depth 3 (calculated over A, D and F) and then depth 4 (calculated over A, C, E and F).
Do you have a better solution to this problem?
It can be done with dynamic programming.
The graph is a DAG, so first do a topological sort on the graph, let the sorted order be v1,v2,v3,...,vn.
Now, set D(v)=0 for all "end node", and from last to first (according to topological order) do:
D(v) = max { D(u) + 1, for each edge (v,u) }
It works because the graph is a DAG, and when done in reversed to the topological order, the values of all D(u) for all outgoing edges (v,u) is already known.
Example on your graph:
Topological sort (one possible):
H,G,B,F,D,E,C,A
Then, the algorithm:
init:
D(B)=D(A)=0
Go back from last to first:
D(A) - no out edges, done
D(C) = max{D(A) + 1} = max{0+1}=1
D(E) = max{D(C) + 1} = 2
D(D) = max{D(A) + 1} = 1
D(F) = max{D(E)+1, D(D)+1} = max{2+1,1+1} = 3
D(B) = 0
D(G) = max{D(B)+1,D(F)+1} = max{1,4}=4
D(H) = max{D(G) + 1} = 5
As a side note, if the graph is not a DAG, but a general graph, this is a variant of the Longest Path Problem, which is NP-Complete.
Luckily, it does have an efficient solution when our graph is a DAG.

Edge classification during Breadth-first search on a directed graph

I am having difficulties finding a way to properly classify the edges while a breadth-first search on a directed graph.
During a breadth-first or depth-first search, you can classify the edges met with 4 classes:
TREE
BACK
CROSS
FORWARD
Skiena [1] gives an implementation. If you move along an edge from v1 to v2, here is a way to return the class during a DFS in java, for reference. The parents map returns the parent vertex for the current search, and the timeOf() method, the time at which the vertex has been discovered.
if ( v1.equals( parents.get( v2 ) ) ) { return EdgeClass.TREE; }
if ( discovered.contains( v2 ) && !processed.contains( v2 ) ) { return EdgeClass.BACK; }
if ( processed.contains( v2 ) )
{
if ( timeOf( v1 ) < timeOf( v2 ) )
{
return EdgeClass.FORWARD;
}
else
{
return EdgeClass.CROSS;
}
}
return EdgeClass.UNCLASSIFIED;
My problem is that I cannot get it right for a Breadth first search on a directed graph. For instance:
The following graph - that is a loop - is ok:
A -> B
A -> C
B -> C
BFSing from A, B will be discovered, then C. The edges eAB and eAC are TREE edges, and when eBC is crossed last, B and C are processed and discovered, and this edge is properly classified as CROSS.
But a plain loop does not work:
A -> B
B -> C
C -> A
When the edge eCA is crossed last, A is processed and discovered. So this edge is incorrectly labeled as CROSS, whether it should be a BACK edge.
There is indeed no difference in the way the two cases are treated, even if the two edges have different classes.
How do you implement a proper edge classification for a BFS on a directed graph?
[1] http://www.algorist.com/
EDIT
Here an implementation derived from #redtuna answer.
I just added a check not to fetch the parent of the root.
I have JUnits tests that show it works for directed and undirected graphs, in the case of a loop, a straight line, a fork, a standard example, a single edge, etc....
#Override
public EdgeClass edgeClass( final V from, final V to )
{
if ( !discovered.contains( to ) ) { return EdgeClass.TREE; }
int toDepth = depths.get( to );
int fromDepth = depths.get( from );
V b = to;
while ( toDepth > 0 && fromDepth < toDepth )
{
b = parents.get( b );
toDepth = depths.get( b );
}
V a = from;
while ( fromDepth > 0 && toDepth < fromDepth )
{
a = parents.get( a );
fromDepth = depths.get( a );
}
if ( a.equals( b ) )
{
return EdgeClass.BACK;
}
else
{
return EdgeClass.CROSS;
}
}
How do you implement a proper edge classification for a BFS on a
directed graph?
As you already established, seeing a node for the first time creates a tree edge. The problem with BFS instead of DFS, as David Eisenstat said before me, is that back edges cannot be distinguished from cross ones just based on traversal order.
Instead, you need to do a bit of extra work to distinguish them. The key, as you'll see, is to use the definition of a cross edge.
The simplest (but memory-intensive) way is to associate every node with the set of its predecessors. This can be done trivially when you visit nodes. When finding a non-tree edge between nodes a and b, consider their predecessor sets. If one is a proper subset of the other, then you have a back edge. Otherwise, it's a cross edge. This comes directly from the definition of a cross edge: it's an edge between nodes where neither is the ancestor nor the descendant of the other on the tree.
A better way is to associate only a "depth" number with each node instead of a set. Again, this is readily done as you visit nodes. Now when you find a non-tree edge between a and b, start from the deeper of the two nodes and follow the tree edges backwards until you go back to the same depth as the other. So for example suppose a was deeper. Then you repeatedly compute a=parent(a) until depth(a)=depth(b).
If at this point a=b then you can classify the edge as a back edge because, as per the definition, one of the nodes is an ancestor of the other on the tree. Otherwise you can classify it as a cross edge because we know that neither node is an ancestor or descendant of the other.
pseudocode:
foreach edge(a,b) in BFS order:
if !b.known then:
b.known = true
b.depth = a.depth+1
edge type is TREE
continue to next edge
while (b.depth > a.depth): b=parent(b)
while (a.depth > b.depth): a=parent(a)
if a==b then:
edge type is BACK
else:
edge type is CROSS
The key property of DFS here is that, given two nodes u and v, the interval [u.discovered, u.processed] is a subinterval of [v.discovered, v.processed] if and only if u is a descendant of v. The times in BFS do not have this property; you have to do something else, e.g., compute the intervals via DFS on the tree that BFS produced. Then the classification pseudocode is 1. check for membership in the tree (tree edge) 2. check for head's interval contains tail's (back edge) 3. check for tail's interval contains head's (forward edge) 4. otherwise, declare a cross edge.
Instead of timeof(), you need an other vertex property, which contains the distance from the root. Let name that distance.
You have to processing a v vertex in the following way:
for (v0 in v.neighbours) {
if (!v0.discovered) {
v0.discovered = true;
v0.parent = v;
v0.distance = v.distance + 1;
}
}
v.processed = true;
After you processed a vertex a v vertex, you can run the following algorithm for every edge (from v1 to v2) of the v:
if (!v1.discovered) return EdgeClass.BACK;
else if (!v2.discovered) return EdgeClass.FORWARD;
else if (v1.distance == v2.distance) return EdgeClass.CROSS;
else if (v1.distance > v2.distance) return EdgeClass.BACK;
else {
if (v2.parent == v1) return EdgeClass.TREE;
else return EdgeClass.FORWARD;
}

Algorithm to find lowest common ancestor in directed acyclic graph?

Imagine a directed acyclic graph as follows, where:
"A" is the root (there is always exactly one root)
each node knows its parent(s)
the node names are arbitrary - nothing can be inferred from them
we know from another source that the nodes were added to the tree in the order A to G (e.g. they are commits in a version control system)
What algorithm could I use to determine the lowest common ancestor (LCA) of two arbitrary nodes, for example, the common ancestor of:
B and E is B
D and F is B
Note:
There is not necessarily a single path to a given node from the root (e.g. "G" has two paths), so you can't simply traverse paths from root to the two nodes and look for the last equal element
I've found LCA algorithms for trees, especially binary trees, but they do not apply here because a node can have multiple parents (i.e. this is not a tree)
Den Roman's link (Archived version) seems promising, but it seemed a little bit complicated to me, so I tried another approach. Here is a simple algorithm I used:
Let say you want to compute LCA(x,y) with x and y two nodes.
Each node must have a value color and count, resp. initialized to white and 0.
Color all ancestors of x as blue (can be done using BFS)
Color all blue ancestors of y as red (BFS again)
For each red node in the graph, increment its parents' count by one
Each red node having a count value set to 0 is a solution.
There can be more than one solution, depending on your graph. For instance, consider this graph:
LCA(4,5) possible solutions are 1 and 2.
Note it still work if you want find the LCA of 3 nodes or more, you just need to add a different color for each of them.
I was looking for a solution to the same problem and I found a solution in the following paper:
http://dx.doi.org/10.1016/j.ipl.2010.02.014
In short, you are not looking for the lowest common ancestor, but for the lowest SINGLE common ancestor, which they define in this paper.
I know it's and old question and pretty good discussion, but since I had some similar problem to solve I came across JGraphT's Lowest Common Ancestor algorithms, thought this might be of help:
NativeLcaFinder
TarjanLowestCommonAncestor
Just some wild thinking. What about using both input nodes as roots, and doing two BFS simultaneously step by step. At a certain step, when there are overlapping in their BLACK sets (recording visited nodes), algorithm stops and the overlapped nodes are their LCA(s). In this way, any other common ancestors will have longer distances than what we have discovered.
Assume that you want to find the ancestors of x and y in a graph.
Maintain an array of vectors- parents (storing parents of each node).
Firstly do a bfs(keep storing parents of each vertex) and find all the ancestors of x (find parents of x and using parents, find all the ancestors of x) and store them in a vector. Also, store the depth of each parent in the vector.
Find the ancestors of y using same method and store them in another vector. Now, you have two vectors storing the ancestors of x and y respectively along with their depth.
LCA would be common ancestor with greatest depth. Depth is defined as longest distance from root(vertex with in_degree=0). Now, we can sort the vectors in decreasing order of their depths and find out the LCA. Using this method, we can even find multiple LCA's (if there).
This link (Archived version) describes how it is done in Mercurial - the basic idea is to find all parents for the specified nodes, group them per distance from the root, then do a search on those groups.
If the graph has cycles then 'ancestor' is loosely defined. Perhaps you mean the ancestor on the tree output of a DFS or BFS? Or perhaps by 'ancestor' you mean the node in the digraph that minimizes the number of hops from E and B?
If you're not worried about complexity, then you could compute an A* (or Dijkstra's shortest path) from every node to both E and B. For the nodes that can reach both E and B, you can find the node that minimizes PathLengthToE + PathLengthToB.
EDIT:
Now that you've clarified a few things, I think I understand what you're looking for.
If you can only go "up" the tree, then I suggest you perform a BFS from E and also a BFS from B. Every node in your graph will have two variables associated with it: hops from B and hops from E. Let both B and E have copies of the list of graph nodes. B's list is sorted by hops from B while E's list is sorted by hops from E.
For each element in B's list, attempt to find it in E's list. Place matches in a third list, sorted by hops from B + hops from E. After you've exhausted B's list, your third sorted list should contain the LCA at its head. This allows for one solution, multiple solutions(arbitrarily chosen among by their BFS ordering for B), or no solution.
I also need exactly same thing , to find LCA in a DAG (directed acyclic graph). LCA problem is related to RMQ (Range Minimum Query Problem).
It is possible to reduce LCA to RMQ and find desired LCA of two arbitrary node from a directed acyclic graph.
I found THIS TUTORIAL detail and good. I am also planing to implement this.
I am proposing O(|V| + |E|) time complexity solution, and i think this approach is correct otherwise please correct me.
Given directed acyclic graph, we need to find LCA of two vertices v and w.
Step1: Find shortest distance of all vertices from root vertex using bfs http://en.wikipedia.org/wiki/Breadth-first_search with time complexity O(|V| + |E|) and also find the parent of each vertices.
Step2: Find the common ancestors of both the vertices by using parent until we reach root vertex Time complexity- 2|v|
Step3: LCA will be that common ancestor which have maximum shortest distance.
So, this is O(|V| + |E|) time complexity algorithm.
Please, correct me if i am wrong or any other suggestions are welcome.
package FB;
import java.util.*;
public class commomAnsectorForGraph {
public static void main(String[] args){
commomAnsectorForGraph com = new commomAnsectorForGraph();
graphNode g = new graphNode('g');
graphNode d = new graphNode('d');
graphNode f = new graphNode('f');
graphNode c = new graphNode('c');
graphNode e = new graphNode('e');
graphNode a = new graphNode('a');
graphNode b = new graphNode('b');
List<graphNode> gc = new ArrayList<>();
gc.add(d);
gc.add(f);
g.children = gc;
List<graphNode> dc = new ArrayList<>();
dc.add(c);
d.children = dc;
List<graphNode> cc = new ArrayList<>();
cc.add(b);
c.children = cc;
List<graphNode> bc = new ArrayList<>();
bc.add(a);
b.children = bc;
List<graphNode> fc = new ArrayList<>();
fc.add(e);
f.children = fc;
List<graphNode> ec = new ArrayList<>();
ec.add(b);
e.children = ec;
List<graphNode> ac = new ArrayList<>();
a.children = ac;
graphNode gn = com.findAncestor(g, c, d);
System.out.println(gn.value);
}
public graphNode findAncestor(graphNode root, graphNode a, graphNode b){
if(root == null) return null;
if(root.value == a.value || root.value == b.value) return root;
List<graphNode> list = root.children;
int count = 0;
List<graphNode> temp = new ArrayList<>();
for(graphNode node : list){
graphNode res = findAncestor(node, a, b);
temp.add(res);
if(res != null) {
count++;
}
}
if(count == 2) return root;
for(graphNode t : temp){
if(t != null) return t;
}
return null;
}
}
class graphNode{
char value;
graphNode parent;
List<graphNode> children;
public graphNode(char value){
this.value = value;
}
}
Everyone.
Try please in Java.
static String recentCommonAncestor(String[] commitHashes, String[][] ancestors, String strID, String strID1)
{
HashSet<String> setOfAncestorsLower = new HashSet<String>();
HashSet<String> setOfAncestorsUpper = new HashSet<String>();
String[] arrPair= {strID, strID1};
Arrays.sort(arrPair);
Comparator<String> comp = new Comparator<String>(){
#Override
public int compare(String s1, String s2) {
return s2.compareTo(s1);
}};
int indexUpper = Arrays.binarySearch(commitHashes, arrPair[0], comp);
int indexLower = Arrays.binarySearch(commitHashes, arrPair[1], comp);
setOfAncestorsLower.addAll(Arrays.asList(ancestors[indexLower]));
setOfAncestorsUpper.addAll(Arrays.asList(ancestors[indexUpper]));
HashSet<String>[] sets = new HashSet[] {setOfAncestorsLower, setOfAncestorsUpper};
for (int i = indexLower + 1; i < commitHashes.length; i++)
{
for (int j = 0; j < 2; j++)
{
if (sets[j].contains(commitHashes[i]))
{
if (i > indexUpper)
if(sets[1 - j].contains(commitHashes[i]))
return commitHashes[i];
sets[j].addAll(Arrays.asList(ancestors[i]));
}
}
}
return null;
}
The idea is very simple. We suppose that commitHashes ordered in downgrade sequence.
We find lowest and upper indexes of strings(hashes-does not mean).
It is clearly that (considering descendant order) the common ancestor can be only after upper index (lower value among hashes).
Then we start enumerating the hashes of commit and build chain of descendent parent chains . For this purpose we have two hashsets are initialised by parents of lowest and upper hash of commit. setOfAncestorsLower, setOfAncestorsUpper. If next hash -commit belongs to any of chains(hashsets),
then if current index is upper than index of lowest hash, then if it is contained in another set (chain) we return the current hash as result. If not, we add its parents (ancestors[i]) to hashset, which traces set of ancestors of set,, where the current element contained. That is the all, basically

Dijkstra's pathfinding algorithm

I'm learning about pathfinding from a book, but I have spotted some weird statement.
For completeness, I will include most of the code, but feel free to skip to the second part (Search())
template <class graph_type >
class Graph_SearchDijkstra
{
private:
//create typedefs for the node and edge types used by the graph
typedef typename graph_type::EdgeType Edge;
typedef typename graph_type::NodeType Node;
private:
const graph_type & m_Graph;
//this vector contains the edges that comprise the shortest path tree -
//a directed sub-tree of the graph that encapsulates the best paths from
//every node on the SPT to the source node.
std::vector<const Edge*> m_ShortestPathTree;
//this is indexed into by node index and holds the total cost of the best
//path found so far to the given node. For example, m_CostToThisNode[5]
//will hold the total cost of all the edges that comprise the best path
//to node 5 found so far in the search (if node 5 is present and has
//been visited of course).
std::vector<double> m_CostToThisNode;
//this is an indexed (by node) vector of "parent" edges leading to nodes
//connected to the SPT but that have not been added to the SPT yet.
std::vector<const Edge*> m_SearchFrontier;
int m_iSource;
int m_iTarget;
void Search();
public:
Graph_SearchDijkstra(const graph_type& graph,
int source,
int target = -1):m_Graph(graph),
m_ShortestPathTree(graph.NumNodes()),
m_SearchFrontier(graph.NumNodes()),
m_CostToThisNode(graph.NumNodes()),
m_iSource(source),
m_iTarget(target)
{
Search();
}
//returns the vector of edges defining the SPT. If a target is given
//in the constructor, then this will be the SPT comprising all the nodes
//examined before the target is found, else it will contain all the nodes
//in the graph.
std::vector<const Edge*> GetAllPaths()const;
//returns a vector of node indexes comprising the shortest path
//from the source to the target. It calculates the path by working
//backward through the SPT from the target node.
std::list<int> GetPathToTarget()const;
//returns the total cost to the target
double GetCostToTarget()const;
};
Search():
template <class graph_type>
void Graph_SearchDijkstra<graph_type>::Search()
{
//create an indexed priority queue that sorts smallest to largest
//(front to back). Note that the maximum number of elements the iPQ
//may contain is NumNodes(). This is because no node can be represented
// on the queue more than once.
IndexedPriorityQLow<double> pq(m_CostToThisNode, m_Graph.NumNodes());
//put the source node on the queue
pq.insert(m_iSource);
//while the queue is not empty
while(!pq.empty())
{
//get the lowest cost node from the queue. Don't forget, the return value
//is a *node index*, not the node itself. This node is the node not already
//on the SPT that is the closest to the source node
int NextClosestNode = pq.Pop();
//move this edge from the search frontier to the shortest path tree
m_ShortestPathTree[NextClosestNode] = m_SearchFrontier[NextClosestNode];
//if the target has been found exit
if (NextClosestNode == m_iTarget) return;
//now to relax the edges. For each edge connected to the next closest node
graph_type::ConstEdgeIterator ConstEdgeItr(m_Graph, NextClosestNode);
for (const Edge* pE=ConstEdgeItr.begin();
!ConstEdgeItr.end();
pE=ConstEdgeItr.next())
{
//the total cost to the node this edge points to is the cost to the
//current node plus the cost of the edge connecting them.
double NewCost = m_CostToThisNode[NextClosestNode] + pE->Cost();
//if this edge has never been on the frontier make a note of the cost
//to reach the node it points to, then add the edge to the frontier
//and the destination node to the PQ.
if (m_SearchFrontier[pE->To()] == 0)
{
m_CostToThisNode[pE->To()] = NewCost;
pq.insert(pE->To());
m_SearchFrontier[pE->To()] = pE;
}
//else test to see if the cost to reach the destination node via the
//current node is cheaper than the cheapest cost found so far. If
//this path is cheaper we assign the new cost to the destination
//node, update its entry in the PQ to reflect the change, and add the
//edge to the frontier
else if ( (NewCost < m_CostToThisNode[pE->To()]) &&
(m_ShortestPathTree[pE->To()] == 0) )
{
m_CostToThisNode[pE->To()] = NewCost;
//because the cost is less than it was previously, the PQ must be
//resorted to account for this.
pq.ChangePriority(pE->To());
m_SearchFrontier[pE->To()] = pE;
}
}
}
}
What I don't get is this part:
//else test to see if the cost to reach the destination node via the
//current node is cheaper than the cheapest cost found so far. If
//this path is cheaper we assign the new cost to the destination
//node, update its entry in the PQ to reflect the change, and add the
//edge to the frontier
else if ( (NewCost < m_CostToThisNode[pE->To()]) &&
(m_ShortestPathTree[pE->To()] == 0) )
if the new cost is lower than the cost already found, then why do we also test if the node has not already been added to the SPT? This seems to beat the purpose of the check?
FYI, in m_ShortestPathTree[pE->To()] == 0 the container is a vector that has a pointer to an edge (or NULL) for each index (the index represents a node)
Imagine the following graph:
S --5-- A --2-- F
\ /
-3 -4
\ /
B
And you want to go from S to F. First, let me tell you the Dijkstra's algorithm assumes there are no loops in the graph with a negative weight. In my example, this loop is S -> B -> A -> S or simpler yet, S -> B -> S
If you have such a loop, you can infinitely loop in it, and your cost to F gets lower and lower. That is why this is not acceptable by Dijkstra's algorithm.
Now, how do you check that? It is as the code you posted does. Every time you want to update the weight of a node, besides checking whether it gets smaller weight, you check if it's not in the accepted list. If it is, then you must have had a negative weight loop. Otherwise, how can you end up going forward and reaching an already accepted node with smaller weight?
Let's follow the algorithm on the example graph (nodes in [] are accepted):
Without the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B
New weights: [S(-3)], A(-7), [B(-3)], F(inf)
- Accept A
New weights: [S(-3)], [A(-7)], [B(-11)], F(-5)
- Accept B again
New weights: [S(-14)], [A(-18)], [B(-11)], F(-5)
- Accept A again
... infinite loop
With the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B (doesn't change S)
New weights: [S(0)], A(-7), [B(-3)], F(inf)
- Accept A (doesn't change S or B
New weights: [S(0)], [A(-7)], [B(-3)], F(-5)
- Accept F

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search and since cycles are not same as strongly connected components, I kept on searching and finally, I found an efficient algorithm which lists all (elementary) cycles of a directed graph. It is from Donald B. Johnson and the paper can be found in the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here, implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been), then just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj,node,visited):
if (visited[node]):
if (node == start):
"found a path"
return;
visited[node]=YES;
for child in adj[node]:
dfs(adj,child,visited)
visited[node]=NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest choice I found to solve this problem was using the python lib called networkx.
It implements the Johnson's algorithm mentioned in the best answer of this question but it makes quite simple to execute.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
#Return a list of cycles described as a list o nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all - you do not really want to try find literally all cycles because if there is 1 then there is an infinite number of those. For example A-B-A, A-B-A-B-A etc. Or it may be possible to join together 2 cycles into an 8-like cycle etc., etc... The meaningful approach is to look for all so called simple cycles - those that do not cross themselves except in the start/end point. Then if you wish you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: Do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time when the current node has a successor on the stack a simple cycle is discovered. It consists of the elements on the stack starting with the identified successor and ending with the top of the stack. Depth first traversal of all simple paths is similar to depth first search but you do not mark/record visited nodes other than those currently on the stack as stop points.
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs : The algorithm for those is different. Build a spanning tree and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so called cycle base. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this : http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
The DFS-based variants with back edges will find cycles indeed, but in many cases it will NOT be minimal cycles. In general DFS gives you the flag that there is a cycle but it is not good enough to actually find cycles. For example, imagine 5 different cycles sharing two edges. There is no simple way to identify cycles using just DFS (including backtracking variants).
Johnson's algorithm is indeed gives all unique simple cycles and has good time and space complexity.
But if you want to just find MINIMAL cycles (meaning that there may be more then one cycle going through any vertex and we are interested in finding minimal ones) AND your graph is not very large, you can try to use the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest way to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all the vertices using adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<=50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3), space complexity O(n^2) if you use parent tracking and O(1) if you don't.
First of all let's find the answer to the question if there is a cycle.
The algorithm is dead-simple. Below is snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
weights(i)(j) = throughK
}
}
}
Originally this algorithm operates on weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly you need to provide 1 if there is a directed edge between the nodes or NO_EDGE otherwise.
After algorithm executes, you can check the main diagonal, if there are values less then NO_EDGE than this node participates in a cycle of length equal to the value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use slightly modified version of algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = k
weights(i)(j) = throughK
}
}
}
Parents matrix initially should contain source vertex index in an edge cell if there is an edge between the vertices and -1 otherwise.
After function returns, for each edge you will have reference to the parent node in the shortest path tree.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2;
def shortestPathWithParentTracking(
weights: Array[Array[Int]],
parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = parents(i)(k)
weights(i)(j) = throughK
}
}
}
def recoverCycles(
cycleNodes: Seq[Int],
parents: Array[Array[Int]]): Set[Seq[Int]] = {
val res = new mutable.HashSet[Seq[Int]]()
for (node <- cycleNodes) {
var cycle = new mutable.ArrayBuffer[Int]()
cycle += node
var other = parents(node)(node)
do {
cycle += other
other = parents(other)(node)
} while(other != node)
res += cycle.sorted
}
res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
val n = 3
val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
shortestPathWithParentTracking(weights, parents)
val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
println("The following minimal cycle found:")
cycles.foreach(c => println(c.mkString))
println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
to find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
how do you determine the next valid
route
how do you determine if a point has
been used
how do you avoid crossing over the
same point again
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node as you find them into a collection as you get them, this means that you can see if you are "doubling back" over a point very easily by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using Sql Server 2008 there is are some new "hierarchy" things you can use to quickly solve this if you structure your data in a tree.
In the case of undirected graph, a paper recently published (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful for Google search
Start at node X and check for all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark it's children of being children of A, X', where X' is marked as being 2 steps away.). If you later hit X and mark it as being a child of X'', that means X is in a 3 node cycle. Backtracking to it's parent is easy (as-is, the algorithm has no support for this so you'd find whichever parent has X').
Note: If graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice for a cycle.
If what you want is to find all elementary circuits in a graph you can use the EC algorithm, by JAMES C. TIERNAN, found on a paper since 1970.
The very original EC algorithm as I managed to implement it in php (hope there are no mistakes is shown below). It can find loops too if there are any. The circuits in this implementation (that tries to clone the original) are the non zero elements. Zero here stands for non-existence (null as we know it).
Apart from that below follows an other implementation that gives the algorithm more independece, this means the nodes can start from anywhere even from negative numbers, e.g -4,-3,-2,.. etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";
$G = array(
1=>array(1,2,3),
2=>array(1,2,3),
3=>array(1,2,3)
);
define('N',key(array_slice($G, -1, 1, true)));
$P = array(1=>0,2=>0,3=>0,4=>0,5=>0);
$H = array(1=>$P, 2=>$P, 3=>$P, 4=>$P, 5=>$P );
$k = 1;
$P[$k] = key($G);
$Circ = array();
#[Path Extension]
EC2_Path_Extension:
foreach($G[$P[$k]] as $j => $child ){
if( $child>$P[1] and in_array($child, $P)===false and in_array($child, $H[$P[$k]])===false ){
$k++;
$P[$k] = $child;
goto EC2_Path_Extension;
} }
#[EC3 Circuit Confirmation]
if( in_array($P[1], $G[$P[$k]])===true ){//if PATH[1] is not child of PATH[current] then don't have a cycle
$Circ[] = $P;
}
#[EC4 Vertex Closure]
if($k===1){
goto EC5_Advance_Initial_Vertex;
}
//afou den ksana theoreitai einai asfales na svisoume
for( $m=1; $m<=N; $m++){//H[P[k], m] <- O, m = 1, 2, . . . , N
if( $H[$P[$k-1]][$m]===0 ){
$H[$P[$k-1]][$m]=$P[$k];
break(1);
}
}
for( $m=1; $m<=N; $m++ ){//H[P[k], m] <- O, m = 1, 2, . . . , N
$H[$P[$k]][$m]=0;
}
$P[$k]=0;
$k--;
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if($P[1] === N){
goto EC6_Terminate;
}
$P[1]++;
$k=1;
$H=array(
1=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
2=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
3=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
4=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
5=>array(1=>0,2=>0,3=>0,4=>0,5=>0)
);
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC6_Terminate:
print_r($Circ);
?>
then this is the other implementation, more independent of the graph, without goto and without array values, instead it uses array keys, the path, the graph and circuits are stored as array keys (use array values if you like, just change the required lines). The example graph start from -4 to show its independence.
<?php
$G = array(
-4=>array(-4=>true,-3=>true,-2=>true),
-3=>array(-4=>true,-3=>true,-2=>true),
-2=>array(-4=>true,-3=>true,-2=>true)
);
$C = array();
EC($G,$C);
echo "<pre>";
print_r($C);
function EC($G, &$C){
$CNST_not_closed = false; // this flag indicates no closure
$CNST_closed = true; // this flag indicates closure
// define the state where there is no closures for some node
$tmp_first_node = key($G); // first node = first key
$tmp_last_node = $tmp_first_node-1+count($G); // last node = last key
$CNST_closure_reset = array();
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$CNST_closure_reset[$k] = $CNST_not_closed;
}
// define the state where there is no closure for all nodes
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$H[$k] = $CNST_closure_reset; // Key in the closure arrays represent nodes
}
unset($tmp_first_node);
unset($tmp_last_node);
# Start algorithm
foreach($G as $init_node => $children){#[Jump to initial node set]
#[Initial Node Set]
$P = array(); // declare at starup, remove the old $init_node from path on loop
$P[$init_node]=true; // the first key in P is always the new initial node
$k=$init_node; // update the current node
// On loop H[old_init_node] is not cleared cause is never checked again
do{#Path 1,3,7,4 jump here to extend father 7
do{#Path from 1,3,8,5 became 2,4,8,5,6 jump here to extend child 6
$new_expansion = false;
foreach( $G[$k] as $child => $foo ){#Consider each child of 7 or 6
if( $child>$init_node and isset($P[$child])===false and $H[$k][$child]===$CNST_not_closed ){
$P[$child]=true; // add this child to the path
$k = $child; // update the current node
$new_expansion=true;// set the flag for expanding the child of k
break(1); // we are done, one child at a time
} } }while(($new_expansion===true));// Do while a new child has been added to the path
# If the first node is child of the last we have a circuit
if( isset($G[$k][$init_node])===true ){
$C[] = $P; // Leaving this out of closure will catch loops to
}
# Closure
if($k>$init_node){ //if k>init_node then alwaya count(P)>1, so proceed to closure
$new_expansion=true; // $new_expansion is never true, set true to expand father of k
unset($P[$k]); // remove k from path
end($P); $k_father = key($P); // get father of k
$H[$k_father][$k]=$CNST_closed; // mark k as closed
$H[$k] = $CNST_closure_reset; // reset k closure
$k = $k_father; // update k
} } while($new_expansion===true);//if we don't wnter the if block m has the old k$k_father_old = $k;
// Advance Initial Vertex Context
}//foreach initial
}//function
?>
I have analized and documented the EC but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a DAG.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex.
DFS from that vertex. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x].
dfs_index[x] stores when that node is visited, while dfs_lowval[x] = min(dfs_low[k]) where
k is all the children of x that is not the directly parent of x in the dfs-spanning tree.
All nodes with the same dfs_lowval[x] are in the same strongly connected component.
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and the digits number):
# include<cstdio>
using namespace std;
int main()
{
int n;
scanf("%d",&n);
int num[1000];
int visited[1000]={0};
int vindex[2000];
for(int i=1;i<=n;i++)
scanf("%d",&num[i]);
int t_visited=0;
int cycles=0;
int start=0, index;
while(t_visited < n)
{
for(int i=1;i<=n;i++)
{
if(visited[i]==0)
{
vindex[start]=i;
visited[i]=1;
t_visited++;
index=start;
break;
}
}
while(true)
{
index++;
vindex[index]=num[vindex[index-1]];
if(vindex[index]==vindex[start])
break;
visited[vindex[index]]=1;
t_visited++;
}
vindex[++index]=0;
start=index+1;
cycles++;
}
printf("%d\n",cycles,vindex[0]);
for(int i=0;i<(n+2*cycles);i++)
{
if(vindex[i]==0)
printf("\n");
else
printf("%d ",vindex[i]);
}
}
DFS c++ version for the pseudo-code in second floor's answer:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
if(visited[v]) {
if(v == start) {
for(auto c : path)
cout << c << " ";
cout << endl;
return;
}
else
return;
}
visited[v] = true;
path.push_back(v);
for(auto i : G[v])
findCircleUnit(start, i, visited, path);
visited[v] = false;
path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library give a set of algorithms and functions to detect cycles.
For a full algorithm explanation visit the wiki.

Resources