A fast way to find connected component in a 1-NN graph? - algorithm

First of all, I got a N*N distance matrix, for each point, I calculated its nearest neighbor, so we had a N*2 matrix, It seems like this:
0 -> 1
1 -> 2
2 -> 3
3 -> 2
4 -> 2
5 -> 6
6 -> 7
7 -> 6
8 -> 6
9 -> 8
the second column was the nearest neighbor's index. So this was a special kind of directed
graph, with each vertex had and only had one out-degree.
Of course, we could first transform the N*2 matrix to a standard graph representation, and perform BFS/DFS to get the connected components.
But, given the characteristic of this special graph, is there any other fast way to do the job ?
I will be really appreciated.
Update:
I've implemented a simple algorithm for this case here.
Look, I did not use a union-find algorithm, because the data structure may make things not that easy, and I doubt whether It's the fastest way in my case(I meant practically).
You could argue that the _merge process could be time consuming, but if we swap the edges into the continuous place while assigning new label, the merging may cost little, but it need another N spaces to trace the original indices.

The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.
This will definitely run in linear time:
- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).
Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:
for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop,
collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets
The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.
You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.

If you want to do it sequencially you can do it using weighted quick union and path compression .Complexity O(N+Mlog(log(N))).check this link .
Here is the pseudocode .honoring #pycho 's words
`
public class QuickUnion
{
private int[] id;
public QuickUnion(int N)
{
id = new int[N];
for (int i = 0; i < N; i++) id[i] = i;
}
public int root(int i)
{
while (i != id[i])
{
id[i] = id[id[i]];
i = id[i];
}
return i;
}
public boolean find(int p, int q)
{
return root(p) == root(q);
}
public void unite(int p, int q)
{
int i = root(p);
int j = root(q);
id[i] = j;
}
}
`
#reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
If you want to find connected components parallely, the asymptotic complexity can be reduced to O(log(log(N)) time using pointer jumping and weighted quick union with path compression. Check this link
https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/

Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.
In your example:
0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the compoent to [5,6,7,8]
9->8, so update the compoent to [5,6,7,8,9]
You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.

Related

How to adapt Fenwick tree to answer range minimum queries

Fenwick tree is a data-structure that gives an efficient way to answer to main queries:
add an element to a particular index of an array update(index, value)
find sum of elements from 1 to N find(n)
both operations are done in O(log(n)) time and I understand the logic and implementation. It is not hard to implement a bunch of other operations like find a sum from N to M.
I wanted to understand how to adapt Fenwick tree for RMQ. It is obvious to change Fenwick tree for first two operations. But I am failing to figure out how to find minimum on the range from N to M.
After searching for solutions majority of people think that this is not possible and a small minority claims that it actually can be done (approach1, approach2).
The first approach (written in Russian, based on my google translate has 0 explanation and only two functions) relies on three arrays (initial, left and right) upon my testing was not working correctly for all possible test cases.
The second approach requires only one array and based on the claims runs in O(log^2(n)) and also has close to no explanation of why and how should it work. I have not tried to test it.
In light of controversial claims, I wanted to find out whether it is possible to augment Fenwick tree to answer update(index, value) and findMin(from, to).
If it is possible, I would be happy to hear how it works.
Yes, you can adapt Fenwick Trees (Binary Indexed Trees) to
Update value at a given index in O(log n)
Query minimum value for a range in O(log n) (amortized)
We need 2 Fenwick trees and an additional array holding the real values for nodes.
Suppose we have the following array:
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0
We wave a magic wand and the following trees appear:
Note that in both trees each node represents the minimum value for all nodes within that subtree. For example, in BIT2 node 12 has value 0, which is the minimum value for nodes 12,13,14,15.
Queries
We can efficiently query the minimum value for any range by calculating the minimum of several subtree values and one additional real node value. For example, the minimum value for range [2,7] can be determined by taking the minimum value of BIT2_Node2 (representing nodes 2,3) and BIT1_Node7 (representing node 7), BIT1_Node6 (representing nodes 5,6) and REAL_4 - therefore covering all nodes in [2,7]. But how do we know which sub trees we want to look at?
Query(int a, int b) {
int val = infinity // always holds the known min value for our range
// Start traversing the first tree, BIT1, from the beginning of range, a
int i = a
while (parentOf(i, BIT1) <= b) {
val = min(val, BIT2[i]) // Note: traversing BIT1, yet looking up values in BIT2
i = parentOf(i, BIT1)
}
// Start traversing the second tree, BIT2, from the end of range, b
i = b
while (parentOf(i, BIT2) >= a) {
val = min(val, BIT1[i]) // Note: traversing BIT2, yet looking up values in BIT1
i = parentOf(i, BIT2)
}
val = min(val, REAL[i]) // Explained below
return val
}
It can be mathematically proven that both traversals will end in the same node. That node is a part of our range, yet it is not a part of any subtrees we have looked at. Imagine a case where the (unique) smallest value of our range is in that special node. If we didn't look it up our algorithm would give incorrect results. This is why we have to do that one lookup into the real values array.
To help understand the algorithm I suggest you simulate it with pen & paper, looking up data in the example trees above. For example, a query for range [4,14] would return the minimum of values BIT2_4 (rep. 4,5,6,7), BIT1_14 (rep. 13,14), BIT1_12 (rep. 9,10,11,12) and REAL_8, therefore covering all possible values [4,14].
Updates
Since a node represents the minimum value of itself and its children, changing a node will affect its parents, but not its children. Therefore, to update a tree we start from the node we are modifying and move up all the way to the fictional root node (0 or N+1 depending on which tree).
Suppose we are updating some node in some tree:
If new value < old value, we will always overwrite the value and move up
If new value == old value, we can stop since there will be no more changes cascading upwards
If new value > old value, things get interesting.
If the old value still exists somewhere within that subtree, we are done
If not, we have to find the new minimum value between real[node] and each tree[child_of_node], change tree[node] and move up
Pseudocode for updating node with value v in a tree:
while (node <= n+1) {
if (v > tree[node]) {
if (oldValue == tree[node]) {
v = min(v, real[node])
for-each child {
v = min(v, tree[child])
}
} else break
}
if (v == tree[node]) break
tree[node] = v
node = parentOf(node, tree)
}
Note that oldValue is the original value we replaced, whereas v may be reassigned multiple times as we move up the tree.
Binary Indexing
In my experiments Range Minimum Queries were about twice as fast as a Segment Tree implementation and updates were marginally faster. The main reason for this is using super efficient bitwise operations for moving between nodes. They are very well explained here. Segment Trees are really simple to code so think about is the performance advantage really worth it? The update method of my Fenwick RMQ is 40 lines and took a while to debug. If anyone wants my code I can put it on github. I also produced a brute and test generators to make sure everything works.
I had help understanding this subject & implementing it from the Finnish algorithm community. Source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf, but they credit Fenwick's 1994 paper for it.
The Fenwick tree structure works for addition because addition is invertible. It doesn't work for minimum, because as soon as you have a cell that's supposed to be the minimum of two or more inputs, you've lost information potentially.
If you're willing to double your storage requirements, you can support RMQ with a segment tree that is constructed implicitly, like a binary heap. For an RMQ with n values, store the n values at locations [n, 2n) of an array. Locations [1, n) are aggregates, with the formula A(k) = min(A(2k), A(2k+1)). Location 2n is an infinite sentinel. The update routine should look something like this.
def update(n, a, i, x): # value[i] = x
i += n
a[i] = x
# update the aggregates
while i > 1:
i //= 2
a[i] = min(a[2*i], a[2*i+1])
The multiplies and divides here can be replaced by shifts for efficiency.
The RMQ pseudocode is more delicate. Here's another untested and unoptimized routine.
def rmq(n, a, i, j): # min(value[i:j])
i += n
j += n
x = inf
while i < j:
if i%2 == 0:
i //= 2
else:
x = min(x, a[i])
i = i//2 + 1
if j%2 == 0:
j //= 2
else:
x = min(x, a[j-1])
j //= 2
return x

How to create distinct set from other sets?

While solving the problems on Techgig.com, I got struck with one one of the problem. The problem is like this:
A company organizes two trips for their employees in a year. They want
to know whether all the employees can be sent on the trip or not. The
condition is like, no employee can go on both the trips. Also to
determine which employee can go together the constraint is that the
employees who have worked together in past won't be in the same group.
Examples of the problems:
Suppose the work history is given as follows: {(1,2),(2,3),(3,4)};
then it is possible to accommodate all the four employees in two trips
(one trip consisting of employees 1& 3 and other having employees 2 &
4). Neither of the two employees in the same trip have worked together
in past. Suppose the work history is given as {(1,2),(1,3),(2,3)} then
there is no way possible to have two trips satisfying the company rule
and accommodating all the employees.
Can anyone tell me how to proceed on this problem?
I am using this code for DFS and coloring the vertices.
static boolean DFS(int rootNode) {
Stack<Integer> s = new Stack<Integer>();
s.push(rootNode);
state[rootNode] = true;
color[rootNode] = 1;
while (!s.isEmpty()) {
int u = s.peek();
for (int child = 0; child < numofemployees; child++) {
if (adjmatrix[u][child] == 1) {
if (!state[child]) {
state[child] = true;
s.push(child);
color[child] = color[u] == 1 ? 2 : 1;
break;
} else {
s.pop();
if (color[u] == color[child])
return false;
}
}
}
}
return true;
}
This problem is functionally equivalent to testing if an undirected graph is bipartite. A bipartite graph is a graph for which all of the nodes can be distributed among two sets, and within each set, no node is adjacent to another node.
To solve the problem, take the following steps.
Using the adjacency pairs, construct an undirected graph. This is pretty straightforward: each number represents a node, and for each pair you are given, form a connection between those nodes.
Test the newly generated graph for bipartiteness. This can be achieved in linear time, as described here.
If the graph is bipartite and you've generated the two node sets, the answer to the problem is yes, and each node set, along with its nodes (employees), correspond to one of the two trips.
Excerpt on how to test for bipartiteness:
It is possible to test whether a graph is bipartite, and to return
either a two-coloring (if it is bipartite) or an odd cycle (if it is
not) in linear time, using depth-first search. The main idea is to
assign to each vertex the color that differs from the color of its
parent in the depth-first search tree, assigning colors in a preorder
traversal of the depth-first-search tree. This will necessarily
provide a two-coloring of the spanning tree consisting of the edges
connecting vertices to their parents, but it may not properly color
some of the non-tree edges. In a depth-first search tree, one of the
two endpoints of every non-tree edge is an ancestor of the other
endpoint, and when the depth first search discovers an edge of this
type it should check that these two vertices have different colors. If
they do not, then the path in the tree from ancestor to descendant,
together with the miscolored edge, form an odd cycle, which is
returned from the algorithm together with the result that the graph is
not bipartite. However, if the algorithm terminates without detecting
an odd cycle of this type, then every edge must be properly colored,
and the algorithm returns the coloring together with the result that
the graph is bipartite.
I even used a recursive solution but this one is also passing the same number of cases. Am I leaving any special case handling ?
Below is the recursive solution of the problem:
static void dfs(int v, int curr) {
state[v] = true;
color[v] = curr;
for (int i = 0; i < numofemployees; i++) {
if (adjmatrix[v][i] == 1) {
if (color[i] == curr) {
bipartite = false;
return;
}
if (!state[i])
dfs(i, curr == 1 ? 2 : 1);
}
}
}
I am calling this function from main() as dfs(0,1) where 0 is the starting vertex and 1 is one of the color

Creating path array using IDDFS

My IDDFS algorithm finds the shortest path of my graph using adjacency matrix.
It shows how deep is the solution (I understand that this is amount of points connected together from starting point to end).
I would like to get these points in array.
For example:
Let's say that solution is found in depth 5, so I would like to have array with points: {0,2,3,4,6}.
Depth 3: array {1,2,3}.
Here is the algorithm in C++:
(I'm not sure if that algorithm "knows" if points which were visited are visited again while searching or not - I'm almost beginner with graphs)
int DLS(int node, int goal, int depth,int adj[9][9])
{
int i,x;
if ( depth >= 0 )
{
if ( node == goal )
return node;
for(i=0;i<nodes;i++)
{
if(adj[node][i] == 1)
{
child = i;
x = DLS(child, goal, depth-1,adj);
if(x == goal)
return goal;
}
}
}
return 0;
}
int IDDFS(int root,int goal,int adj[9][9])
{
depth = 0;
solution = 0;
while(solution <= 0 && depth < nodes)
{
solution = DLS(root, goal, depth,adj);
depth++;
}
if(depth == nodes)
return inf;
return depth-1;
}
int main()
{
int i,u,v,source,goal;
int adj[9][9] = {{0,1,0,1,0,0,0,0,0},
{1,0,1,0,1,0,0,0,0},
{0,1,0,0,0,1,0,0,0},
{1,0,0,0,1,0,1,0,0},
{0,1,0,1,0,1,0,1,0},
{0,0,1,0,1,0,0,0,1},
{0,0,0,1,0,0,0,1,0},
{0,0,0,0,1,0,1,0,1},
{0,0,0,0,0,1,0,1,0}
};
nodes=9;
edges=12;
source=0;
goal=8;
depth = IDDFS(source,goal,adj);
if(depth == inf)printf("No solution Exist\n");
else printf("Solution Found in depth limit ( %d ).\n",depth);
system("PAUSE");
return 0;
}
The reason why I'm using IDDFS instead of other path-finding algorithm is that I want to change depth to specified number to search for paths of exact length (but I'm not sure if that will work).
If someone would suggest other algorithm for finding path of specified length using adjacency matrix, please let me know about it.
The idea of getting the actual path retrieved from a pathfinding algorithm is to use a map:V->V such that the key is a vertex, and the value is the vertex used to discover the key (The source will not be a key, or be a key with null value, since it was not discovered from any vertex).
The pathfinding algorithm will modify this map while it runs, and when it is done - you can get your path by reading from the table iteratively - starting from the target - all the way up to the source - and you get your path in reversed order.
In DFS: you insert the (key,value) pair each time you discover a new vertex (which is key). Note that if key is already a key in the map - you should skip this branch.
Once you finish exploring a certain path, and "close" a vertex, you need take it out of the list, However - sometimes you can optimize the algorithm and skip this part (it will make the branch factor smaller).
Since IDDFS is actually doing DFS iteratively, you can just follow the same logic, and each time you make a new DFS iteration - for higher depth, you can just clear the old map, and start a new one from scratch.
Other pathfinding algorithms are are BFS, A* and dijkstra's algorithm. Note that the last 2 also fit for weighted graphs. All of these can be terminated when you reach a certain depth, same as DFS is terminated when you reach a certain depth in IDDFS.

Bidirectional spanning tree

I came across this question from interviewstreet.com
Machines have once again attacked the kingdom of Xions. The kingdom
of Xions has N cities and N-1 bidirectional roads. The road network is
such that there is a unique path between any pair of cities.
Morpheus has the news that K Machines are planning to destroy the
whole kingdom. These Machines are initially living in K different
cities of the kingdom and anytime from now they can plan and launch an
attack. So he has asked Neo to destroy some of the roads to disrupt
the connection among Machines i.e after destroying those roads there
should not be any path between any two Machines.
Since the attack can be at any time from now, Neo has to do this task
as fast as possible. Each road in the kingdom takes certain time to
get destroyed and they can be destroyed only one at a time.
You need to write a program that tells Neo the minimum amount of time
he will require to disrupt the connection among machines.
Sample Input First line of the input contains two, space-separated
integers, N and K. Cities are numbered 0 to N-1. Then follow N-1
lines, each containing three, space-separated integers, x y z, which
means there is a bidirectional road connecting city x and city y, and
to destroy this road it takes z units of time. Then follow K lines
each containing an integer. Ith integer is the id of city in which ith
Machine is currently located.
Output Format Print in a single line the minimum time required to
disrupt the connection among Machines.
Sample Input
5 3
2 1 8
1 0 5
2 4 5
1 3 4
2
4
0
Sample Output
10
Explanation Neo can destroy the road connecting city 2 and city 4 of
weight 5 , and the road connecting city 0 and city 1 of weight 5. As
only one road can be destroyed at a time, the total minimum time taken
is 10 units of time. After destroying these roads none of the Machines
can reach other Machine via any path.
Constraints
2 <= N <= 100,000
2 <= K <= N
1 <= time to destroy a road <= 1000,000
Can someone give idea how to approach the solution.
The kingdom has N cities, N-1 edges and it's fully connected, therefore our kingdom is tree (in graph theory). At this picture you can see tree representation of your input graph in which Machines are represented by red vertices.
By the way you should consider all paths from the root vertex to all leaf nodes. So in every path you would have several red nodes and during removing edges you should take in account only neighboring red nodes. For example in path 0-10 there are two meaningfull pairs - (0,3) and (3,10). And you must remove exactly one node (not less, not more) from each path which connected vertices in pairs.
I hope this advice was helpful.
All the three answers will lead to correct solution but you can not achieve the solution within the time limit provided by interviewstreet.com. You have to think of some simple approach to solve this problem successfully.
HINT: start from the node where machine is present.
As said by others, a connected graph with N vertices and N-1 edges is a tree.
This kind of problem asks for a greedy solution; I'd go for a modification of Kruskal's algorithm:
Start with a set of N components - 1 for every node (city). Keep track of which components contain a machine-occupied city.
Take 1 edge (road) at a time, order by descending weight (starting with roads most costly to destroy). For this edge (which necessarily connects two components - the graph is a tree):
if both neigboring components contain a machine-occupied city, this road must be destroyed, mark it as such
otherwise, merge the neigboring components into one. If one of them contained a machine-occupied city, so does the merged component.
When you're done with all edges, return the sum of costs for the destroyed roads.
Complexity will be the same as Kruskal's algorithm, that is, almost linear for well chosen data structure and sorting method.
pjotr has a correct answer (though not quite asymptotically optimal), but this statement
This kind of problem asks for a greedy solution
really requires proof, as in the real world (as distinguished from competitive programming), there are several problems of this “kind” for which the greedy solution is not optimal (e.g., this very same problem in general graphs, which is called multiterminal cut and is NP-hard). In this case, proof consists of verifying the matroid axioms. Let a set of edges A &subseteq; E be independent if the graph (V, E &setminus; A) has exactly |A| + 1 connected components containing at least one machine.
Independence of the empty set. Trivial.
Hereditary property. Let A be an independent set. Every edge e &in; A joins two connected components of the graph (V, E &setminus; A), and every connected component contains at least one machine. In putting e back in the graph, the number of connected components containing at least one machine decreases by 1, so A &setminus; {e} is also independent.
Augmentation property. Let A and B be independent sets with |A| < |B|. Since (V, E &setminus; B) has more connected components than (V, E &setminus; A), there exists by the pigeonhole principle a pair of machines u, v such that u and v are disconnected by B but not by A. Since there is exactly one path from u to v, B contains at least one edge e on this path, and A cannot contain e. The removal of A ∪ {e} induces one more connected component containing at least one machine than A, so A ∪ {e} is independent, as required.
Start performing a DFS from either of the machine nodes. Also, keep track of the edge with min weight encountered so far. As soon as you find the next node which also contains a machine, delete the min edge recorded so far. Start DFS from this new node now.
Repeat until you have found all nodes where the machines exists.
Should be of the O(N) that way !!
I write some code, and pasted all the tests.
#include <iostream>
#include<algorithm>
using namespace std;
class Line {
public:
Line(){
begin=0;end=0; weight=0;
}
int begin;int end;int weight;
bool operator<(const Line& _l)const {
return weight>_l.weight;
}
};
class Point{
public:
Point(){
pre=0;machine=false;
}
int pre;
bool machine;
};
void DP_Matrix();
void outputLines(Line* lines,Point* points,int N);
int main() {
DP_Matrix();
system("pause");
return 0;
}
int FMSFind(Point* trees,int x){
int r=x;
while(trees[r].pre!=r)
r=trees[r].pre;
int i=x;int j;
while(i!=r) {
j=trees[i].pre;
trees[i].pre=r;
i=j;
}
return r;
}
void DP_Matrix(){
int N,K,machine_index;scanf("%d%d",&N,&K);
Line* lines=new Line[100000];
Point* points=new Point[100000];
N--;
for(int i=0;i<N;i++) {
scanf("%d%d%d",&lines[i].begin,&lines[i].end,&lines[i].weight);
points[i].pre=i;
}
points[N].pre=N;
for(int i=0;i<K;i++) {
scanf("%d",&machine_index);
points[machine_index].machine=true;
}
long long finalRes=0;
for(int i=0;i<N;i++) {
int bP=FMSFind(points,lines[i].begin);
int eP=FMSFind(points,lines[i].end);
if(points[bP].machine&&points[eP].machine){
finalRes+=lines[i].weight;
}
else{
points[bP].pre=eP;
points[eP].machine=points[bP].machine||points[eP].machine;
points[bP].machine=points[eP].machine;
}
}
cout<<finalRes<<endl;
delete[] lines;
delete[] points;
}
void outputLines(Line* lines,Point* points,int N){
printf("\nLines:\n");
for(int i=0;i<N;i++){
printf("%d\t%d\t%d\n",lines[i].begin,lines[i].end,lines[i].weight);
}
printf("\nPoints:\n");
for(int i=0;i<=N;i++){
printf("%d\t%d\t%d\n",i,points[i].machine,points[i].pre);
}
}

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search and since cycles are not same as strongly connected components, I kept on searching and finally, I found an efficient algorithm which lists all (elementary) cycles of a directed graph. It is from Donald B. Johnson and the paper can be found in the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here, implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been), then just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj,node,visited):
if (visited[node]):
if (node == start):
"found a path"
return;
visited[node]=YES;
for child in adj[node]:
dfs(adj,child,visited)
visited[node]=NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest choice I found to solve this problem was using the python lib called networkx.
It implements the Johnson's algorithm mentioned in the best answer of this question but it makes quite simple to execute.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
#Return a list of cycles described as a list o nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all - you do not really want to try find literally all cycles because if there is 1 then there is an infinite number of those. For example A-B-A, A-B-A-B-A etc. Or it may be possible to join together 2 cycles into an 8-like cycle etc., etc... The meaningful approach is to look for all so called simple cycles - those that do not cross themselves except in the start/end point. Then if you wish you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: Do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time when the current node has a successor on the stack a simple cycle is discovered. It consists of the elements on the stack starting with the identified successor and ending with the top of the stack. Depth first traversal of all simple paths is similar to depth first search but you do not mark/record visited nodes other than those currently on the stack as stop points.
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs : The algorithm for those is different. Build a spanning tree and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so called cycle base. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this : http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
The DFS-based variants with back edges will find cycles indeed, but in many cases it will NOT be minimal cycles. In general DFS gives you the flag that there is a cycle but it is not good enough to actually find cycles. For example, imagine 5 different cycles sharing two edges. There is no simple way to identify cycles using just DFS (including backtracking variants).
Johnson's algorithm is indeed gives all unique simple cycles and has good time and space complexity.
But if you want to just find MINIMAL cycles (meaning that there may be more then one cycle going through any vertex and we are interested in finding minimal ones) AND your graph is not very large, you can try to use the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest way to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all the vertices using adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<=50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3), space complexity O(n^2) if you use parent tracking and O(1) if you don't.
First of all let's find the answer to the question if there is a cycle.
The algorithm is dead-simple. Below is snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
weights(i)(j) = throughK
}
}
}
Originally this algorithm operates on weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly you need to provide 1 if there is a directed edge between the nodes or NO_EDGE otherwise.
After algorithm executes, you can check the main diagonal, if there are values less then NO_EDGE than this node participates in a cycle of length equal to the value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use slightly modified version of algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = k
weights(i)(j) = throughK
}
}
}
Parents matrix initially should contain source vertex index in an edge cell if there is an edge between the vertices and -1 otherwise.
After function returns, for each edge you will have reference to the parent node in the shortest path tree.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2;
def shortestPathWithParentTracking(
weights: Array[Array[Int]],
parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = parents(i)(k)
weights(i)(j) = throughK
}
}
}
def recoverCycles(
cycleNodes: Seq[Int],
parents: Array[Array[Int]]): Set[Seq[Int]] = {
val res = new mutable.HashSet[Seq[Int]]()
for (node <- cycleNodes) {
var cycle = new mutable.ArrayBuffer[Int]()
cycle += node
var other = parents(node)(node)
do {
cycle += other
other = parents(other)(node)
} while(other != node)
res += cycle.sorted
}
res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
val n = 3
val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
shortestPathWithParentTracking(weights, parents)
val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
println("The following minimal cycle found:")
cycles.foreach(c => println(c.mkString))
println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
to find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
how do you determine the next valid
route
how do you determine if a point has
been used
how do you avoid crossing over the
same point again
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node as you find them into a collection as you get them, this means that you can see if you are "doubling back" over a point very easily by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using Sql Server 2008 there is are some new "hierarchy" things you can use to quickly solve this if you structure your data in a tree.
In the case of undirected graph, a paper recently published (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful for Google search
Start at node X and check for all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark it's children of being children of A, X', where X' is marked as being 2 steps away.). If you later hit X and mark it as being a child of X'', that means X is in a 3 node cycle. Backtracking to it's parent is easy (as-is, the algorithm has no support for this so you'd find whichever parent has X').
Note: If graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice for a cycle.
If what you want is to find all elementary circuits in a graph you can use the EC algorithm, by JAMES C. TIERNAN, found on a paper since 1970.
The very original EC algorithm as I managed to implement it in php (hope there are no mistakes is shown below). It can find loops too if there are any. The circuits in this implementation (that tries to clone the original) are the non zero elements. Zero here stands for non-existence (null as we know it).
Apart from that below follows an other implementation that gives the algorithm more independece, this means the nodes can start from anywhere even from negative numbers, e.g -4,-3,-2,.. etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";
$G = array(
1=>array(1,2,3),
2=>array(1,2,3),
3=>array(1,2,3)
);
define('N',key(array_slice($G, -1, 1, true)));
$P = array(1=>0,2=>0,3=>0,4=>0,5=>0);
$H = array(1=>$P, 2=>$P, 3=>$P, 4=>$P, 5=>$P );
$k = 1;
$P[$k] = key($G);
$Circ = array();
#[Path Extension]
EC2_Path_Extension:
foreach($G[$P[$k]] as $j => $child ){
if( $child>$P[1] and in_array($child, $P)===false and in_array($child, $H[$P[$k]])===false ){
$k++;
$P[$k] = $child;
goto EC2_Path_Extension;
} }
#[EC3 Circuit Confirmation]
if( in_array($P[1], $G[$P[$k]])===true ){//if PATH[1] is not child of PATH[current] then don't have a cycle
$Circ[] = $P;
}
#[EC4 Vertex Closure]
if($k===1){
goto EC5_Advance_Initial_Vertex;
}
//afou den ksana theoreitai einai asfales na svisoume
for( $m=1; $m<=N; $m++){//H[P[k], m] <- O, m = 1, 2, . . . , N
if( $H[$P[$k-1]][$m]===0 ){
$H[$P[$k-1]][$m]=$P[$k];
break(1);
}
}
for( $m=1; $m<=N; $m++ ){//H[P[k], m] <- O, m = 1, 2, . . . , N
$H[$P[$k]][$m]=0;
}
$P[$k]=0;
$k--;
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if($P[1] === N){
goto EC6_Terminate;
}
$P[1]++;
$k=1;
$H=array(
1=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
2=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
3=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
4=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
5=>array(1=>0,2=>0,3=>0,4=>0,5=>0)
);
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC6_Terminate:
print_r($Circ);
?>
then this is the other implementation, more independent of the graph, without goto and without array values, instead it uses array keys, the path, the graph and circuits are stored as array keys (use array values if you like, just change the required lines). The example graph start from -4 to show its independence.
<?php
$G = array(
-4=>array(-4=>true,-3=>true,-2=>true),
-3=>array(-4=>true,-3=>true,-2=>true),
-2=>array(-4=>true,-3=>true,-2=>true)
);
$C = array();
EC($G,$C);
echo "<pre>";
print_r($C);
function EC($G, &$C){
$CNST_not_closed = false; // this flag indicates no closure
$CNST_closed = true; // this flag indicates closure
// define the state where there is no closures for some node
$tmp_first_node = key($G); // first node = first key
$tmp_last_node = $tmp_first_node-1+count($G); // last node = last key
$CNST_closure_reset = array();
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$CNST_closure_reset[$k] = $CNST_not_closed;
}
// define the state where there is no closure for all nodes
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$H[$k] = $CNST_closure_reset; // Key in the closure arrays represent nodes
}
unset($tmp_first_node);
unset($tmp_last_node);
# Start algorithm
foreach($G as $init_node => $children){#[Jump to initial node set]
#[Initial Node Set]
$P = array(); // declare at starup, remove the old $init_node from path on loop
$P[$init_node]=true; // the first key in P is always the new initial node
$k=$init_node; // update the current node
// On loop H[old_init_node] is not cleared cause is never checked again
do{#Path 1,3,7,4 jump here to extend father 7
do{#Path from 1,3,8,5 became 2,4,8,5,6 jump here to extend child 6
$new_expansion = false;
foreach( $G[$k] as $child => $foo ){#Consider each child of 7 or 6
if( $child>$init_node and isset($P[$child])===false and $H[$k][$child]===$CNST_not_closed ){
$P[$child]=true; // add this child to the path
$k = $child; // update the current node
$new_expansion=true;// set the flag for expanding the child of k
break(1); // we are done, one child at a time
} } }while(($new_expansion===true));// Do while a new child has been added to the path
# If the first node is child of the last we have a circuit
if( isset($G[$k][$init_node])===true ){
$C[] = $P; // Leaving this out of closure will catch loops to
}
# Closure
if($k>$init_node){ //if k>init_node then alwaya count(P)>1, so proceed to closure
$new_expansion=true; // $new_expansion is never true, set true to expand father of k
unset($P[$k]); // remove k from path
end($P); $k_father = key($P); // get father of k
$H[$k_father][$k]=$CNST_closed; // mark k as closed
$H[$k] = $CNST_closure_reset; // reset k closure
$k = $k_father; // update k
} } while($new_expansion===true);//if we don't wnter the if block m has the old k$k_father_old = $k;
// Advance Initial Vertex Context
}//foreach initial
}//function
?>
I have analized and documented the EC but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a DAG.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex.
DFS from that vertex. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x].
dfs_index[x] stores when that node is visited, while dfs_lowval[x] = min(dfs_low[k]) where
k is all the children of x that is not the directly parent of x in the dfs-spanning tree.
All nodes with the same dfs_lowval[x] are in the same strongly connected component.
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and the digits number):
# include<cstdio>
using namespace std;
int main()
{
int n;
scanf("%d",&n);
int num[1000];
int visited[1000]={0};
int vindex[2000];
for(int i=1;i<=n;i++)
scanf("%d",&num[i]);
int t_visited=0;
int cycles=0;
int start=0, index;
while(t_visited < n)
{
for(int i=1;i<=n;i++)
{
if(visited[i]==0)
{
vindex[start]=i;
visited[i]=1;
t_visited++;
index=start;
break;
}
}
while(true)
{
index++;
vindex[index]=num[vindex[index-1]];
if(vindex[index]==vindex[start])
break;
visited[vindex[index]]=1;
t_visited++;
}
vindex[++index]=0;
start=index+1;
cycles++;
}
printf("%d\n",cycles,vindex[0]);
for(int i=0;i<(n+2*cycles);i++)
{
if(vindex[i]==0)
printf("\n");
else
printf("%d ",vindex[i]);
}
}
DFS c++ version for the pseudo-code in second floor's answer:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
if(visited[v]) {
if(v == start) {
for(auto c : path)
cout << c << " ";
cout << endl;
return;
}
else
return;
}
visited[v] = true;
path.push_back(v);
for(auto i : G[v])
findCircleUnit(start, i, visited, path);
visited[v] = false;
path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library give a set of algorithms and functions to detect cycles.
For a full algorithm explanation visit the wiki.

Resources