How to find all chordless cycles in an undirected graph?
For example, given the graph
0 --- 1
|     | \
|     |  \
4 --- 3 - 2
the algorithm should return 1-2-3 and 0-1-3-4, but never 0-1-2-3-4.
(Note: [1] This question is not the same as small cycle finding in a planar graph, because the graph is not necessarily planar. [2] I have read the paper Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion but I don't understand what they're doing :). [3] I have tried CYPATH but the program only gives the count, algorithm EnumChordlessPath in readme.txt has significant typos, and the C code is a mess. [4] I am not trying to find an arbitrary set of fundamental cycles. A cycle basis can have chords.)
1. Assign numbers to nodes from 1 to n.
2. Pick the node number 1. Call it 'A'.
3. Enumerate pairs of links coming out of 'A'. Pick one. Let's call the adjacent nodes 'B' and 'C', with B less than C.
4. If B and C are connected, then output the cycle ABC, return to step 3 and pick a different pair.
5. If B and C are not connected: enumerate all nodes connected to B. Suppose it's connected to D, E, and F. Create a list of vectors CABD, CABE, CABF. For each of these:
   - if the last node is connected to any internal node except C and B, discard the vector
   - if the last node is connected to C, output and discard
   - if it's not connected to either, create a new list of vectors, appending all nodes to which the last node is connected.
   Repeat until you run out of vectors.
6. Repeat steps 3-5 with all pairs.
7. Remove node 1 and all links that lead to it. Pick the next node and go back to step 2.
Edit: and you can do away with one nested loop.
This seems to work at the first sight, there may be bugs, but you should get the idea:
#include <iostream>
#include <list>
#include <vector>
#include <algorithm>
using namespace std;

void chordless_cycles(int* adjacency, int dim)
{
    for(int i=0; i<dim-2; i++)
    {
        for(int j=i+1; j<dim-1; j++)
        {
            if(!adjacency[i+j*dim])
                continue;
            list<vector<int> > candidates;
            for(int k=j+1; k<dim; k++)
            {
                if(!adjacency[i+k*dim])
                    continue;
                if(adjacency[j+k*dim])
                {
                    cout << i+1 << " " << j+1 << " " << k+1 << endl;
                    continue;
                }
                vector<int> v;
                v.resize(3);
                v[0]=j;
                v[1]=i;
                v[2]=k;
                candidates.push_back(v);
            }
            while(!candidates.empty())
            {
                vector<int> v = candidates.front();
                candidates.pop_front();
                int k = v.back();
                for(int m=i+1; m<dim; m++)
                {
                    if(find(v.begin(), v.end(), m) != v.end())
                        continue;
                    if(!adjacency[m+k*dim])
                        continue;
                    bool chord = false;
                    size_t n;
                    for(n=1; n<v.size()-1; n++)
                        if(adjacency[m+v[n]*dim])
                            chord = true;
                    if(chord)
                        continue;
                    if(adjacency[m+j*dim])
                    {
                        for(n=0; n<v.size(); n++)
                            cout << v[n]+1 << " ";
                        cout << m+1 << endl;
                        continue;
                    }
                    vector<int> w = v;
                    w.push_back(m);
                    candidates.push_back(w);
                }
            }
        }
    }
}
@aioobe has a point. Just find all the cycles and then exclude the non-chordless ones. This may be too inefficient, but the search space can be pruned along the way to reduce the inefficiency. Here is a general algorithm:
void printChordlessCycles( ChordlessCycle path) {
    System.out.println( path.toString() );
    for( Node n : path.lastNode().neighbors() ) {
        if( path.canAdd( n) ) {
            path.add( n);
            printChordlessCycles( path);
            path.remove( n);
        }
    }
}

Graph g = loadGraph(...);
ChordlessCycle p = new ChordlessCycle();
for( Node n : g.getNodes()) {
    p.add(n);
    printChordlessCycles( p);
    p.remove( n);
}
class ChordlessCycle {
    private CountedSet<Node> connected_nodes;
    private List<Node> path;
    ...

    public void add( Node n) {
        for( Node neighbor : n.getNeighbors() ) {
            connected_nodes.increment( neighbor);
        }
        path.add( n);
    }

    public void remove( Node n) {
        for( Node neighbor : n.getNeighbors() ) {
            connected_nodes.decrement( neighbor);
        }
        path.remove( n);
    }

    public boolean canAdd( Node n) {
        return (connected_nodes.getCount( n) == 0);
    }
}
Just a thought:
Let's say you are enumerating cycles on your example graph and you are starting from node 0.
If you do a breadth-first search for each given edge, e.g. 0 - 1, you reach a fork at 1. Then the cycles that reach 0 again first are chordless, and the rest are not and can be eliminated... at least I think this is the case.
Could you use an approach like this? Or is there a counterexample?
How about this. First, reduce the problem to finding all chordless cycles that pass through a given vertex A. Once you've found all of those, you can remove A from the graph, and repeat with another point until there's nothing left.
And how to find all the chordless cycles that pass through vertex A? Reduce this to finding all chordless paths from B to A, given a list of permitted vertices, and search either breadth-first or depth-first. Note that when iterating over the vertices reachable (in one step) from B, when you choose one of them you must remove all of the others from the list of permitted vertices (take special care when B=A, so as not to eliminate three-edge paths).
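The delete-and-recurse reduction above can be sketched in Python. This is my own sketch, not the answerer's code: `chordless_cycles` and its adjacency-dict input format are assumptions, and paths are grown from one neighbour of A toward another, rejecting any extension that would create a chord.

```python
from itertools import combinations

def chordless_cycles(adj):
    """Yield each chordless cycle of an undirected graph exactly once.

    adj maps a vertex to the set of its neighbours. Vertices must be
    sortable (e.g. all ints). Each cycle is reported as a tuple that
    starts at its smallest vertex.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # private copy: we delete vertices
    for a in sorted(adj):
        nbrs = sorted(adj[a])
        for b, c in combinations(nbrs, 2):        # all pairs of links out of a
            if c in adj[b]:
                yield (a, b, c)                   # triangle: chordless by definition
                continue
            # grow chordless paths a-b-...; a cycle closes when we reach c
            stack = [[a, b]]
            while stack:
                path = stack.pop()
                for m in sorted(adj[path[-1]]):
                    if m in path:
                        continue
                    if m in adj[a] and m != c:
                        continue                  # would create a chord back to a
                    if any(m in adj[u] for u in path[1:-1]):
                        continue                  # chord to an internal path vertex
                    if m == c:
                        yield tuple(path) + (c,)  # chordless cycle a-b-...-c-a
                    else:
                        stack.append(path + [m])
        # every chordless cycle through a has been reported: delete a
        for u in adj[a]:
            adj[u].discard(a)
        adj[a] = set()
```

On the question's graph this yields (1, 2, 3) and (0, 1, 3, 4) but never the chorded 5-cycle.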
Find all cycles.
A chordless cycle is a cycle whose set of vertices contains no smaller cycle as a subset. So, once you have all cycles, the problem is simply to eliminate cycles that contain a subset cycle.
For efficiency, for each cycle you find, loop through all existing cycles and verify that it is not a subset of another cycle (or vice versa), and if so, eliminate the larger cycle.
Beyond that, the only difficulty is writing an algorithm that determines whether one set is a subset of another.
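The subset test in the last step is simple with vertex sets. A minimal sketch (my own; `minimal_cycles` is an invented name) that keeps only the set-minimal cycles, which under the answer's definition are the chordless ones:

```python
def minimal_cycles(cycles):
    """Filter a list of cycles (each an iterable of vertices), keeping only
    those whose vertex set does not strictly contain another cycle's set."""
    sets = [frozenset(c) for c in cycles]
    return [c for c, s in zip(cycles, sets)
            if not any(t < s for t in sets)]   # t < s tests proper subset
```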
Related
We are given a graph of N nodes (1 to N), where each node has exactly one directed edge to some node (possibly itself).
We need to answer queries of the type (A, B): how long until two objects collide if one starts at A and the other at B? Both move 1 hop per second. If it is not possible for them to collide, the time is -1.
Time: from X to Y, 1 hop = 1 second.
Constraints:
N, Q <= 10^5 (number of nodes, number of queries).
Example : for given graph
A -> B -> C -> D -> E
               ^    |
               K <- F
Query(A, E): 3 seconds, as at time t = 3 they will both be on node D.
Query(C, D): -1, as they will never collide.
What's the optimal way to answer each query?
Brute Force Approach: time - O(Q * N)
Improved solution using binary lifting technique: time - O(Q * log(N))
private static int[] collisionTime(int N, int Q, int[] A, int[][] queries) {
    // ancestor matrix: creation time O(n * log(n))
    int M = (int) (Math.ceil(Math.log10(N) / Math.log10(2))) + 1;
    int[][] ancestor = new int[N + 1][M];
    for (int i = 1; i <= N; i++) {
        ancestor[i][0] = A[i]; // 2^0-th ancestor
    }
    for (int j = 1; j < M; j++) {
        for (int i = 1; i <= N; i++) {
            ancestor[i][j] = ancestor[ancestor[i][j-1]][j-1];
        }
    }
    int[] answer = new int[Q];
    for (int i = 0; i < Q; i++) {
        int u = queries[i][0];
        int v = queries[i][1];
        answer[i] = timeToCollide(u, v, ancestor);
    }
    return answer;
}

// using binary lifting: time O(log(n))
private static int timeToCollide(int u, int v, int[][] ancestor) {
    int m = ancestor[0].length;
    // edge cases
    if (u == v) // already in collision state
        return 0;
    if (ancestor[u][m-1] != ancestor[v][m-1]) // different topmost ancestors: they can never collide
        return -1;
    int t = 0;
    for (int j = m - 1; j >= 0; j--) {
        if (ancestor[u][j] != ancestor[v][j]) {
            u = ancestor[u][j];
            v = ancestor[v][j];
            t += Math.pow(2, j);
        }
    }
    return t + 1;
}
1. Find all the terminal cycles and designate an arbitrary vertex in each terminal cycle as the cycle root (O(N)).
2. For each vertex, record the length of its terminal cycle, its distance to entry into the terminal cycle, and the distance to the terminal cycle root (O(N)).
3. Sever the outgoing link from each cycle root. This turns the graph into a forest.
4. For each query, find the closest (lowest) common ancestor of the two query nodes in this forest.
5. From the information saved about each query node and the lowest common ancestor, you can figure out the time to collision in constant time.
Step (4), the lowest common ancestor query, is a very well-studied problem. The best algorithms require only linear preprocessing time and provide constant query time, leading to O(N + Q) time for this problem altogether.
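Steps 1 and 2 of this outline can be sketched as follows. This is my own Python sketch, not the answerer's code: `map_functional_graph` is an invented name, it records cycle id, cycle length and distance-to-entry for every node in O(N), and it omits the cycle-root distances and the LCA machinery of steps 3 to 5.

```python
def map_functional_graph(next_node):
    """For each node of a functional graph (next_node[v] is v's single
    out-edge), compute the id and length of its terminal cycle and its
    distance to entry into that cycle, in O(N) total time."""
    n = len(next_node)
    state = [0] * n                    # 0 = new, 1 = on current walk, 2 = finished
    cycle_id = [-1] * n
    cycle_len = [0] * n
    dist = [0] * n                     # 0 for cycle nodes, hops-to-cycle otherwise
    n_cycles = 0
    for start in range(n):
        if state[start]:
            continue
        path, v = [], start
        while state[v] == 0:           # walk until we hit our own walk or old work
            state[v] = 1
            path.append(v)
            v = next_node[v]
        if state[v] == 1:              # closed a brand-new cycle at v
            k = path.index(v)
            for u in path[k:]:
                cycle_id[u], cycle_len[u] = n_cycles, len(path) - k
            n_cycles += 1
            tail = path[:k]
        else:
            tail = path                # attached to previously mapped territory
        for u in reversed(tail):       # fill the tree tail back to front
            w = next_node[u]
            cycle_id[u], cycle_len[u] = cycle_id[w], cycle_len[w]
            dist[u] = dist[w] + 1
        for u in path:
            state[u] = 2
    return cycle_id, cycle_len, dist
```

On the example graph (A..E, F, K numbered 0..6) node A sits 3 hops from the 4-cycle D-E-F-K.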
I believe that the following approach technically achieves O(N+Q) time complexity.
Observations
Subgraphs: The graph is not necessarily contiguous. Every graph consists of one or more disjoint, contiguous, complete subgraphs, meaning:
- No nodes are shared between subgraphs ("disjoint")
- All of the nodes in a subgraph are connected ("contiguous")
- There are no paths connecting different subgraphs ("complete")
I will hereafter refer to these as the subgraphs of the graph, or just "subgraphs". These subgraphs have the following additional properties, which are a consequence of their definition (above) and the types of nodes in the graph (they are all "parent-pointer nodes" with exactly one out-edge/pointer):
- All such subgraphs must have exactly one cycle in them (because a cycle is the only way a path can terminate or be closed)
- The cycle can be of any length cycle.Len >= 1
- Additionally, there may be any number (t >= 0) of trees attached to the cycle at their root (base)
- All nodes are either in the cycle or in one of these trees (the roots of the trees are in the cycle, but are also counted as part of a tree)
Terms:
- cycle Length: The number of nodes in a cycle.
- cycle Base: An arbitrarily chosen node in the cycle, used to measure distances between two nodes in the same cycle, and more generally between any two nodes in the same subgraph.
- tree Base: The base or root node of one of the trees attached to the cycle. As the tree base is also the node that attaches the tree to the cycle, tree base nodes are counted as being in the cycle (and also as part of their tree).
- Distance: For a node in the cycle, the distance (number of hops) from that node to the cycle Base (zero if it is the cycle Base). For a node in a tree (not counting tree Base nodes, which count as in the cycle), the distance from that node to the tree Base node.
Collisions Cases
Trivial
There are many ways or "forms" of collisions possible in a graph, but we can identify two trivial cases upfront:
| (A, B) Relation | Collide? | Collision Distance |
|---|---|---|
| same node | Yes | 0 |
| different subgraphs | No | -1 |
Obviously, if A and B are the same node, then they trivially collide at distance zero. Likewise, if they are in two different subgraphs, then they can never collide because there are no connections between the subgraphs. For the collision cases that follow I will be assuming that these two cases have already been filtered out so that:
A and B are assumed to be different nodes, and
A and B are assumed to be in the same subgraph
Non-Trivial
The following table covers all of the other, non-trivial, cases of the relation between two nodes.
| (A, B) Relation | Collide? | Collision Distance | Notes |
|---|---|---|---|
| same cycle | No | -1 | nodes in a cycle always stay the same distance apart |
| A in a tree & B in the cycle (or vice versa) | only if they both arrive at A's treeBase at the same time | -1 OR A.treeDist | collide iff B.cycleDist = (A.treeDist MOD cycle.Len) |
| A and B in different trees | only if A's and B's distances to their cycle.Base are equal MOD cycle.Len | MAX(A.treeDist, B.treeDist) | they meet when the farther one gets to the cycle (its tree base) |
| A & B in the same tree, but with different treeDists | only if their treeDists are equal MOD cycle.Len | MAX(A.treeDist, B.treeDist) | they meet when the farther one gets to the cycle (its tree base) |
| A & B in the same tree, with the same treeDist | Yes | at their lowest common ancestor (LCA) in the tree | have to search up the tree |
One important principle, applied several times above, is that two different nodes in a cycle can never collide. As each node follows its path around the cycle, the two always stay the same distance apart; there is no way for one node's path to "catch up" to the other's in the cycle. They can only "collide" if they start out in the cycle at the same node.
The consequences of this are that:
1. Two different nodes in the cycle can never collide.
2. A node in a tree can only collide with a node in a cycle if their total distances to the cycle base are the same modulo the cycle length (that is, the same remainder when divided by the cycle length).
3. This is also true for two nodes in different trees, and for two nodes in the same tree with different distances to their tree base.
In all of these cases (from #2 and #3), they will collide when the node that is farthest from its tree Base gets to the cycle (which is also its tree base). This is because nodes in the cycle cannot "catch up" to each other, so once both are in the cycle they must stay the same distance apart; thus they collide exactly when the farther one finally gets to the cycle.
Another important consequence is that every case in both tables above, except for the last one, can be answered in O(1) time, simply by annotating the nodes with some easily determined information:
- their Base node (tree or cycle)
- their Distance to that base node
- the Subgraph they belong to
- the Length of their subgraph's Cycle
These can all be easily determined when initializing the graph, in O(1) time per node (or O(N) total time).
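The O(1) cases above can be sketched in Python. This is my own sketch, not the answer's C#: the annotation-dict layout is invented, and the hard same-tree, same-distance case is returned as None rather than resolved.

```python
def collision_time(a, b, node):
    """Apply the trivial and O(1) table cases. node[v] is an annotation dict:
    comp (subgraph id), in_cycle, dist (hops to its tree base, or to the
    cycle base if in the cycle), total (hops to the cycle base), clen
    (cycle length), base (its base node's id). Returns the collision time,
    -1 for never, or None for the remaining LCA case."""
    if a == b:
        return 0                                   # trivial: same node
    A, B = node[a], node[b]
    if A["comp"] != B["comp"]:
        return -1                                  # different subgraphs never meet
    if A["in_cycle"] and B["in_cycle"]:
        return -1                                  # cycle nodes keep their spacing
    if A["total"] % A["clen"] != B["total"] % B["clen"]:
        return -1                                  # incompatible distances mod cycle length
    if A["in_cycle"] or B["in_cycle"]:
        return B["dist"] if A["in_cycle"] else A["dist"]   # tree node reaches its tree base
    if A["base"] != B["base"] or A["dist"] != B["dist"]:
        return max(A["dist"], B["dist"])           # farther one reaches the cycle
    return None                                    # same tree, same distance: LCA search
```

With annotations for the question's example graph (cycle base D, cycle length 4), Query(A, E) gives 3 and Query(C, D) gives -1, matching the expected answers.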
Approach
Nodes
After the nodes are initially loaded into the graph (MPDGraph object), I annotate the nodes with the additional information listed above. This process (called "Mapping" in the code) proceeds as follows:
1. Pick any node.
2. Follow its path until it "terminates" by reaching a node already in its path, or a node that was previously mapped.
3. If #2 crossed its own path, then we've found a new cycle. Designate the node we crossed as the base node of the cycle, and fill in the mapping properties (cycle, base node, distance, etc.). Unwind our path one node at a time, filling in each node and marking it as InCycle as we go back up the path, until we reach the base node again. Now we are at the base of the tree that our path followed into the cycle, so when we move back to the previous node in the path we switch to marking it as a tree node, until we return to the first node in our path.
4. If instead #2 reached an already-mapped node, then we attach our path to that node and copy its tree/cycle, base, etc. information to our current node. Then we return back up our path as in #3, setting the mapping properties of each node as we go.
5. If there are any unmapped nodes, pick one and go to #2.
This all takes O(N) time.
Queries
We have a method (called MPDGraph.FindCollision) that, given two nodes, applies the rules in the two Collision Cases tables above and returns the result. For every case except the last (nodes in the same tree at the same distance) the answer can be determined in O(1) time using the mapping properties.
If the two nodes are in the same tree and are also the same distance from their tree base, then they could meet anywhere between them and their common treeBase node. For this case, FindCollision(A,B) calls findTreeDistance(A,B), which:
1. Returns zero if they are the same node.
2. Otherwise checks a cache of previously calculated distances to see if the distance has already been calculated for these two nodes. If so, it returns that value.
3. Otherwise, calls findTreeDistance, passing in the parents of the current two nodes, to get their distance, and adds one to it. It then adds this to the cache and returns the value.
Without this memoization (i.e., the cache), each query of this type would take approximately O(log N) on average. With the memoization it is hard to calculate, but I would guess no worse than O(log log N), and for Q much larger than N this converges to O(1).
This makes the query processing time complexity somewhere between O(Q log log N) and O(Q), and the total time between O(N + Q(log log N)) and O(N + Q).
Code
public static int[] collisionTime(int N, int Q, int[] A, int[,] queries)
{
    // create the graph and fill-in the mapping attributes for all nodes
    var graph = new MPDGraph(A);
    graph.MapAllNodes();

    int[] answers = new int[queries.GetLength(0)];
    for (int i = 0; i < answers.Length; i++)
    {
        answers[i] = graph.FindCollision(queries[i, 0], queries[i, 1]);
    }
    return answers;
}
This utilizes the following classes:
MPDGraph Class:
// MPDG: Mono-Pointing, Directed Graph
// An MPDG is a directed graph where every node has exactly one out-edge.
// (MPDG is my own term, I don't know the real name for these)
class MPDGraph
{
    public Node[] Nodes;
    Dictionary<(Node, Node), int> cachedDistances = new Dictionary<(Node, Node), int>();

    // constructor
    public MPDGraph(int[] Pointers)
    {
        Nodes = new Node[Pointers.Length];
        // first fill the array with objects
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i] = new Node(i); }
        // apply their pointers
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i].toNode = Nodes[Pointers[i]]; }
    }

    // map all of the nodes by filling their mapping properties
    public void MapAllNodes()
    {
        for (int i = 0; i < Nodes.Length; i++)
        {
            if (!Nodes[i].isMapped)
                MapPath(Nodes[i], 1);
        }
    }

    // recursively map a path of unmapped nodes, starting from curr
    // returns true if curr is in a cycle, false otherwise
    public Boolean MapPath(Node curr, int pathNo)
    {
        Boolean inCycle = false;
        curr.pathNo = pathNo;
        Node next = curr.toNode;
        if (next.IsInPath)
        {
            // we have found a new cycle
            Cycle Cycle = new Cycle(this, next, curr.pathNo - next.pathNo + 1);
            curr.Map(Cycle);
            return true;
        }
        else if (next.isMapped)
        {
            // we are joining an already partially mapped tree
            if (next.IsInCycle)
            {
                // next is a tree-Base, the top of our tree and also in the cycle
                curr.Map(next.Cycle, false, next, 1);
            }
            else
            {
                // next is a normal tree-node
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return false;
        }
        else
        {
            // continue forward on the path, recurse to the next node
            inCycle = MapPath(next, pathNo + 1);
            if (inCycle)
            {
                if (next.Cycle.Base == next || next.Distance == 0)
                {
                    // we have returned from the cycleBase, which is also a treeBase
                    // so, switch from Cycle to Tree
                    curr.Map(next.Cycle, false, next, 1);
                    return false;
                }
                else
                {
                    // still in the cycle
                    curr.Map(next.Cycle);
                }
            }
            else
            {
                // returned in tree
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return inCycle;
        }
    }

    // Given two starting nodes, determine how many steps it takes until their
    // paths collide. Returns -1 if they will never collide.
    public int FindCollision(int index1, int index2)
    {
        Node node1 = Nodes[index1];
        Node node2 = Nodes[index2];

        // eliminate trivial cases
        if (node1.Cycle != node2.Cycle)
            return -1;      // can't collide if they're in different subgraphs
        else if (node1 == node2)
            return 0;       // if they're the same node, then distance = 0
        else if (node1.IsInCycle && node2.IsInCycle)
            return -1;      // different nodes in a cycle never collide
        else
        {   // they're both in the same subgraph, use math to tell if they collide
            // get their distances to the cycle base
            int dist1 = node1.Distance + (node1.IsInCycle ? 0 : node1.BaseNode.Distance);
            int dist2 = node2.Distance + (node2.IsInCycle ? 0 : node2.BaseNode.Distance);
            int cycleLen = node1.Cycle.Length;

            // use math: modulo(cycle length)
            if ((dist1 % cycleLen) != (dist2 % cycleLen))
            {
                return -1;  // incompatible distances: cannot possibly collide
            }
            else
            {
                // they must collide somewhere, figure out how far that is
                if (node1.IsInCycle || node2.IsInCycle)
                {
                    // if one is in the cycle, they will collide when
                    // the other one reaches the cycle (its treeBase)
                    return (!node1.IsInCycle ? node1.Distance : node2.Distance);
                }
                else if (node1.BaseNode != node2.BaseNode)
                {
                    // They are in different trees: they will collide at
                    // the treeBase of the node that is farther
                    return Math.Max(node1.Distance, node2.Distance);
                }
                else
                {
                    // They are in the same tree:
                    if (node1.Distance != node2.Distance)
                    {
                        // if they are in the same tree, but have different distances
                        // to the treeBase, then they will collide at the treeBase
                        // when the farther one arrives at the treeBase
                        return Math.Max(node1.Distance, node2.Distance);
                    }
                    else
                    {
                        // the hard case, have to walk down their paths
                        // to find their LCA (Lowest Common Ancestor)
                        return findTreeDistance(node1, node2);
                    }
                }
            }
        }
    }

    int findTreeDistance(Node node1, Node node2)
    {
        if (node1 == node2) return 0;

        // normalize their order
        if (node1.index > node2.index)
        {
            var tmp = node1;
            node1 = node2;
            node2 = tmp;
        }

        // check the cache
        int dist;
        if (cachedDistances.ContainsKey((node1, node2)))
        {
            dist = cachedDistances[(node1, node2)];
        }
        else
        {
            // keep recursing to find where they meet
            dist = findTreeDistance(node1.toNode, node2.toNode) + 1;
            // cache this new distance
            cachedDistances.Add((node1, node2), dist);
        }
        return dist;
    }
}
Node Class:
// Represents a node in the MPDG (Mono-Pointing Directed Graph) with the constraint
// that all nodes have exactly one out-edge ("toNode").
class Node
{
    // Primary properties (from input)
    public int index { get; set; }    // this node's unique index (into the original array)
    public Node toNode { get; set; }  // what our single out-edge is pointing to

    public Node(int Index_) { this.index = Index_; }

    // Mapping properties
    // (these must be filled-in to finish mapping the node)

    // The unique cycle of this node's subgraph (all MPDG-subgraphs have exactly one)
    public Cycle Cycle;

    // Every node is either in its subgraph's cycle or in one of the inverted
    // trees whose apex (base) is attached to it. Only valid when BaseNode is set.
    // (tree base nodes are counted as being in the cycle)
    public Boolean IsInCycle = false;

    // The base node of the tree or cycle that this node is in.
    // If (IsInCycle) then it points to this cycle's base node (cycleBase)
    // Else it points to the base node of this node's tree (treeBase)
    public Node BaseNode;

    // The distance (hops) from this node to the BaseNode
    public int Distance = -1; // -1 if not yet mapped

    // Total distance from this node to the cycleBase
    public int TotalDistance { get { return Distance + (IsInCycle ? 0 : BaseNode.Distance); } }

    // housekeeping for mapping nodes
    public int pathNo = -1; // order in our working path

    // Derived (implicit) properties
    public Boolean isMapped { get { return Cycle != null; } }
    public Boolean IsInPath { get { return (pathNo >= 0); } }

    public void Map(Cycle Cycle, bool InCycle = true, Node BaseNode = null, int distance_ = -1)
    {
        this.Cycle = Cycle;
        IsInCycle = InCycle;
        if (InCycle)
        {
            this.BaseNode = Cycle.Base;
            Distance = (Cycle.Length + (Cycle.Base.pathNo - pathNo)) % Cycle.Length;
        }
        else
        {
            this.BaseNode = BaseNode;
            Distance = distance_;
        }
        pathNo = -1; // clean up the path once we're done
    }
}
Cycle Class:
// represents the cycle of a unique MPDG-subgraph
// (should have one of these for each subgraph)
class Cycle
{
    public MPDGraph Graph; // the MPDG that contains this cycle
    public Node Base;      // the base node of a subgraph's cycle
    public int Length;     // the length of this cycle

    public Cycle(MPDGraph graph_, Node base_, int length_)
    {
        Graph = graph_;
        Base = base_;
        Length = length_;
    }
}
Performance Measurements:
| Node Count | Build & Map Graph (mean µs) | Question Count | All Questions (mean µs) | Per Question (mean µs) | Total (mean µs) |
|---|---|---|---|---|---|
| 50 | 0.9 | 1225 | 26 | 0.0212 | 26.9 |
| 500 | 10.1 | 124750 | 2267 | 0.0182 | 2277.1 |
| 1000 | 23.4 | 499500 | 8720 | 0.0175 | 8743.4 |
| 5000 | 159.6 | 12497500 | 229000 | 0.0183 | 229159.6 |
| 10000 | 345.3 | 49995000 | 793212 | 0.0159 | 793557.3 |
It is only possible for a collision to occur at a node that has more than one link leading to it (node D in your example).
Let's call these nodes "crash sites".
So you can prune your graph down to just the crash site nodes. The nodes that lead to a crash site node become attributes of that crash site node.
Like this for your example:
D : { A,B,C }, { E,F,K }
A collision can ONLY occur if the starting nodes are on two different attribute lists of the same crash site node.
Once you are sure a crash can occur, then you can check that both starting nodes are the same distance from the crash site.
Algorithm:
Prune graph to crash site nodes
LOOP over questions
    Get 2 start nodes
    LOOP over crash sites
        IF start nodes on two different attributes of crash site
            IF start nodes are equidistant from crash site
                report crash time
                BREAK from crash site loop
Here is a randomly generated graph with 50 nodes, where every node has one out-edge connected to another randomly chosen node.
The collision sites are
5 7 8 9 10 11 18 19 23 25 31 33 36 37 39
So the algorithm need only loop over 15 nodes, at most, instead of 50.
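The idea can be made executable with a short Python sketch (my own, not the answer's code: the function names are invented, and it walks each query's path once rather than pre-pruning the graph). Note that, as stated, the check compares first arrivals at a crash site; start nodes whose paths lap the terminal cycle before meeting need the modulo-cycle-length comparison from the earlier answer.

```python
def crash_sites(next_node):
    """Nodes with more than one incoming link: the only places a collision can happen."""
    indeg = [0] * len(next_node)
    for w in next_node:
        indeg[w] += 1
    return {v for v, d in enumerate(indeg) if d > 1}

def first_arrivals(next_node, start, sites):
    """Distance from start to each crash site on its path (first visit only)."""
    out, seen, v, d = {}, set(), start, 0
    while v not in seen:
        if v in sites and v not in out:
            out[v] = d
        seen.add(v)
        v, d = next_node[v], d + 1
    return out

def collide(next_node, a, b):
    """Collision time for walkers starting at a and b, or -1: they crash at
    the first crash site that both reach at the same distance."""
    if a == b:
        return 0
    sites = crash_sites(next_node)
    da = first_arrivals(next_node, a, sites)
    db = first_arrivals(next_node, b, sites)
    times = [d for s, d in da.items() if db.get(s) == d]
    return min(times) if times else -1
```

On the original example graph the only crash site is D, and Query(A, E) = 3 while Query(C, D) = -1.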
The answer to the question "do two particular nodes collide?" is almost always "NO". It is kind of boring that way. So let's ask a slightly different question: "for a particular graph, which pairs of nodes result in a collision?" This requires the same algorithm (applied to every pair of nodes) but always produces an interesting answer.
for this graph
I get this answer
0 and 29 collide at 41
1 and 5 collide at 40
2 and 23 collide at 13
8 and 16 collide at 34
8 and 22 collide at 34
8 and 39 collide at 34
9 and 30 collide at 37
10 and 31 collide at 25
11 and 47 collide at 23
12 and 28 collide at 25
12 and 35 collide at 25
12 and 49 collide at 25
13 and 38 collide at 27
14 and 44 collide at 1
15 and 17 collide at 0
15 and 18 collide at 0
15 and 37 collide at 0
16 and 22 collide at 34
16 and 39 collide at 34
17 and 18 collide at 0
17 and 37 collide at 0
18 and 37 collide at 0
20 and 26 collide at 9
20 and 42 collide at 9
20 and 43 collide at 9
21 and 45 collide at 24
22 and 39 collide at 34
25 and 34 collide at 3
26 and 42 collide at 9
26 and 43 collide at 9
28 and 35 collide at 25
28 and 49 collide at 25
32 and 48 collide at 34
33 and 36 collide at 7
35 and 49 collide at 25
42 and 43 collide at 9
Some timing results
| Node Count | Crash Sites (ms) | Question Count | Per Question (mean µs) |
|---|---|---|---|
| 50 | 0.4 | 1225 | 0.02 |
| 500 | 50 | 124750 | 0.02 |
| 5000 | 5500 | ~12M | 0.02 |
| 10000 | 30000 | ~50M | 0.03 |
| 30000 | 181000 | ~450M | 0.6 |
Notes:
The mean time for a question is the average of checking every possible pair of nodes for a possible collision.
Answering a single question is extremely fast, about 20 nanoseconds for moderately sized graphs ( < 10,000 nodes ) [ A previous timing report included outputting the results when a collision was found, which takes much longer than detecting the collision. These results were taken with all output to the console commented out. ]
Setting up the crash sites and their tributaries gets slow with moderately sized graphs ( > 5,000 nodes ). It is only worth doing if a lot of questions are going to be asked.
The code for this is available at https://github.com/JamesBremner/PathFinder
A path, for the context of this question, is a collection of points with integer coordinates v1, v2, v3 ... vn such that v1 is connected to v2, v2 is connected to v3, and so on. The path is non-cyclic and does not have any branches. (By "v and u are connected" it is meant that the absolute difference between either their x or their y coordinates is equal to 1.)
We say there is a possible segment between vi and vj if they follow some criteria which is irrelevant to this question.
ci represents the farthest point on the path in the forward direction such that there is a possible segment between vi and ci. (ci lies ahead of vi)
di represents the farthest point on the path in the backward direction such that there is a possible segment between vi and di. (vi lies ahead of di)
Note: If there is a possible segment between u and v then there is a possible segment between any of its sub segments.
The values of ci and di are already calculated for each i.
For each pair vi and vj there is a penalty associated which also has been calculated for each i and j.
A sequence in a path is a collection of points of the path u1, u2, u3 ... um (not necessarily connected) such that u1 = v1, um = vn, and there is a possible segment between each ui and ui+1.
The number of segments in such a sequence is (m-1).
The problem is to find an optimal sequence: one having the minimum possible number of segments and, among all sequences with that number of segments, the minimum sum of penalties of consecutive points.
This problem is solved in a program called potrace, which I am trying to implement, but that implementation uses cyclic paths while my program uses non-cyclic ones.
I also cannot understand how the potrace implementation below works in the first place.
In the implementation below clip0[i] represents ci and clip1[i] represents di.
In potrace implementation cyclic means v1 and vn are also connected in the path.
Source Line 575
Documentation 2.2.4
/* calculate seg0[j] = longest path from 0 with j segments */
i = 0;
for (j=0; i<n; j++) {
    seg0[j] = i;
    i = clip0[i];
}
seg0[j] = n;
m = j;

/* calculate seg1[j] = longest path to n with m-j segments */
i = n;
for (j=m; j>0; j--) {
    seg1[j] = i;
    i = clip1[i];
}
seg1[0] = 0;

/* now find the shortest path with m segments, based on penalty3 */
/* note: the outer 2 loops jointly have at most n iterations, thus
   the worst-case behavior here is quadratic. In practice, it is
   close to linear since the inner loop tends to be short. */
pen[0]=0;
for (j=1; j<=m; j++) {
    for (i=seg1[j]; i<=seg0[j]; i++) {
        best = -1;
        for (k=seg0[j-1]; k>=clip1[i]; k--) {
            thispen = penalty3(pp, k, i) + pen[k];
            if (best < 0 || thispen < best) {
                prev[i] = k;
                best = thispen;
            }
        }
        pen[i] = best;
    }
}
pp->m = m;
SAFE_CALLOC(pp->po, m, int); // output

/* read off shortest path */
for (i=n, j=m-1; i>0; j--) {
    i = prev[i];
    pp->po[j] = i;
}
A sample input can be this.
EDIT 1:
When I implemented the same code for my case, the last loop broke: the index j either became negative (without self-looping), or i = prev[i] self-looped.
The penalty values are positive.
EDIT 2:
I coded a rough version of Dijkstra's algorithm and it seems to be working.
The relevant bit of my code is below.
using Weight = std::pair<int, float>;

std::vector<std::vector<std::pair<int, Weight>>> graph;
graph.resize(n);

/* This takes O(n^2). */
for (int i = 0; i < n; ++i) {
    for (int j = clip1[i]; j <= clip0[i]; ++j) {
        float pen = calculatePenalty(index, i, j);
        graph[i].emplace_back(j, Weight(1, pen));
        graph[j].emplace_back(i, Weight(1, pen));
    }
}

std::vector<bool> vis(n, false);
std::vector<Weight> dist(n, {10e5 + 1, 0.0f});
std::vector<int> prev(n, 0);
dist[0] = {0, 0.0f};

std::multiset<std::pair<Weight, int>> set;
set.insert({{0, 0.0f}, 0});
while (!set.empty()) {
    auto p = *set.begin();
    set.erase(set.begin());

    int x = p.second;
    if (vis[x]) continue;
    vis[x] = true;

    for (auto v : graph[x]) {
        int e = v.first;
        Weight w = v.second;
        Weight w_ = {dist[x].first + w.first, dist[x].second + w.second};
        if (w_ < dist[e]) {
            prev[e] = x;
            dist[e] = w_;
            set.insert({dist[e], e});
        }
    }
}

for (int i = n - 1; i > 0;) {
    seq.push_back(i);
    i = prev[i];
}
seq.push_back(0);
If there are any errors in the above code then please correct them.
I think a number of improvements can be made to the above code:
The initialization of the graph itself has O(n^2) complexity. There should be an alternative way to do this part, or to restructure the whole thing.
It is also not as compact as the potrace counterpart. A more compact implementation with better time complexity seems possible; if someone could provide some pseudocode in that direction, that would be appreciated.
Also, in the potrace implementation it seems that the number of segments is precisely m. But when I compute m for my case and compare it with seq.size() - 1, they are not equal. (It is both greater and less in different cases, but not by a large margin.)
The problem you're describing is the (single-source, single-destination) shortest-path problem, where an edge's weight is (1, penalty), weights are summed elementwise, and they are ordered lexicographically, so minimizing the number of edges is the first priority and minimizing total penalty the second. You can solve this problem in near-linear time with Dijkstra's algorithm if all your penalties are positive (or zero). In this case, you can prove that the shortest path will never repeat any vertices.
potrace's implementation looks roughly like the Bellman-Ford algorithm (in its dynamic-programming interpretation), which is a good approach if you have a mixture of positive and negative penalties (but unnecessarily slow if you have only positive penalties). In that case, the shortest path might repeat vertices; when that happens, the path will actually repeat some vertices (a negative-weight cycle) infinitely many times, which is probably not what you want.
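The lexicographic trick can be sketched with a tuple-weighted Dijkstra. This is my own Python sketch, not potrace code; Python compares tuples lexicographically, which is exactly the ordering described above, and the graph format is an assumption:

```python
import heapq

def lex_shortest_path(n, edges, src, dst):
    """Dijkstra over weights (segment_count, penalty): tuples add
    componentwise and compare lexicographically, so the result first
    minimizes the number of segments, then the total penalty.
    edges: dict u -> list of (v, penalty), all penalties >= 0."""
    INF = (float("inf"), 0.0)
    dist = [INF] * n
    prev = [None] * n
    dist[src] = (0, 0.0)
    pq = [((0, 0.0), src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                         # stale queue entry
        for v, pen in edges.get(u, []):
            nd = (d[0] + 1, d[1] + pen)
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dist[dst] == INF:
        return None, None
    path = [dst]                             # walk the prev[] chain back to src
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]
```

With only non-negative penalties this visits each vertex once, so it avoids the quadratic inner loop of the Bellman-Ford-style dynamic program.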
The input to the program is the set of edges in the graph. For example, consider the following simple directed graph:
a -> b -> c
The set of edges for this graph is
{ (b, c), (a, b) }
So given a directed graph as a set of edges, how do you determine if the directed graph is a tree? If it is a tree, what is the root node of the tree?
First off, how would you represent this graph: adjacency list, adjacency matrix, or something else? How would you use the chosen representation to answer the above questions efficiently?
Edit 1:
Some people are mentioning using DFS for cycle detection, but the problem is which node to start the DFS from. Since it is a directed graph we cannot start the DFS from a random node; for example, if I started a DFS from vertex 'c' it would not proceed further, since 'c' has no outgoing edges leading to other nodes. The follow-up question is then how to determine the root of this tree.
Here's a fairly direct method. It can be done with either an adjacency matrix or an edge list.
Find the set, R, of nodes that do not appear as the destination of any edge. If R does not have exactly one member, the graph is not a tree.
If R does have exactly one member, r, it is the only possible root.
Mark r.
Starting from r, recursively mark all nodes that can be reached by following edges from source to destination. If any node is already marked, there is a cycle and the graph is not a tree. (This step is the same as a previously posted answer).
If any node is not marked at the end of step 3, the graph is not a tree.
If none of those steps find that the graph is not a tree, the graph is a tree with r as root.
It's difficult to know what is going to be efficient without some information about the numbers of nodes and edges.
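A small Python sketch of the method above (the helper names are mine; it takes an edge list plus the node set and returns the root, or None if the graph is not a tree):

```python
def tree_root(edges, nodes):
    """Return the root if the directed edge list forms a tree over nodes,
    else None."""
    dests = {v for (u, v) in edges}
    roots = [n for n in nodes if n not in dests]
    if len(roots) != 1:
        return None          # need exactly one node with no incoming edge
    root = roots[0]
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    marked = set()
    stack = [root]
    while stack:             # mark everything reachable from the root
        n = stack.pop()
        if n in marked:
            return None      # reached twice: cycle or two parents
        marked.add(n)
        stack.extend(children.get(n, []))
    if marked != set(nodes):
        return None          # some node is unreachable from the root
    return root
```

On the example a -> b -> c, the edge list {(b, c), (a, b)} yields R = {a}, and the traversal from a marks every node exactly once, so a is the root.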
There are 3 properties to check if a graph is a tree:
(1) The number of edges in the graph is exactly one less than the number of vertices |E| = |V| - 1
(2) There are no cycles
(3) The graph is connected
I think this example algorithm can work in the cases of a directed graph:
# given a graph and a starting vertex, check if the graph is a tree
def checkTree(G, v):
    # |E| = |V| - 1
    if edges.size != vertices.size - 1:
        return false
    for u in vertices:
        visited[u] = false
    hasCycle = explore_and_check_cycles(G, v)
    # a tree is acyclic
    if hasCycle:
        return false
    for u in vertices:
        if not visited[u]:  # the graph isn't connected
            return false
    # otherwise passes all properties for a tree
    return true

# given a Graph and a vertex, explore all reachable vertices from the vertex
# and return true if there are any cycles
def explore_and_check_cycles(G, v):
    visited[v] = true
    for u in neighbors(G, v):
        if visited[u]:  # reaching an already-visited vertex indicates a cycle
            return true
        if explore_and_check_cycles(G, u):
            return true
    return false
Sources:
Algorithms by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani
http://www.cs.berkeley.edu/~vazirani/algorithms.html
Start from the root, "mark" it, and then go to all children and repeat recursively. If you reach a child that is already marked, it means that it is not a tree.
Note: this is by no means the most efficient way, but it is conceptually useful. Sometimes you want efficiency, sometimes you want an alternate viewpoint for pedagogic reasons. This is most certainly the latter.
Algorithm: start with an adjacency matrix A of size n. Take the matrix power A**n. If every entry is zero, you know that the graph is at least a collection of trees (a forest). If you can also show that it is connected, then it must be a tree. See nilpotent matrix for more information.
To find the root node, assume you've already shown the graph is a connected tree. Let k be the number of times you have to raise the matrix to a power before it becomes all zero, i.e. the smallest k with A**k = 0. Take the transpose to the (k-1) power, A.T ** (k-1); the only non-zero entries must correspond to the root.
Analysis: a rough worst-case analysis shows that it is bounded above by O(n^4): one O(n^3) matrix multiplication, performed at most n times. You can do better by diagonalizing the matrix, which should bring it down to O(n^3). Considering that this problem can be done in O(n) time and O(1) space, this is only a useful exercise in logic and understanding of the problem.
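The nilpotency test above can be sketched in pure Python with boolean matrix multiplication (illustrative only; it checks the forest condition, not connectivity):

```python
def is_nilpotent_adjacency(A):
    """Check whether A**n == 0 for an n x n 0/1 adjacency matrix A.
    A**n being all zero means the directed graph has no cycles, i.e. it is
    at least a forest; connectivity must be checked separately."""
    n = len(A)

    def matmul(X, Y):
        # boolean matrix product: entry (i, j) is true iff some path exists
        return [[any(X[i][k] and Y[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    P = A
    for _ in range(n - 1):   # build up A**n one multiplication at a time
        P = matmul(P, A)
    return not any(any(row) for row in P)
```

For the path a -> b -> c the adjacency matrix cubes to zero, while a 2-cycle never does, matching the forest test described above.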
For a directed graph, the underlying undirected graph will be a tree if the undirected graph is acyclic and fully connected. The same property holds if, in the directed graph, each vertex has in-degree 1 except for one vertex with in-degree 0.
If the adjacency-list representation also supports an in-degree property for each vertex, then we can apply the above rule easily. Otherwise, we should apply a tweaked DFS to verify that the underlying undirected graph has no loops, as well as that |E| = |V| - 1.
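The in-degree rule above can be checked straight from the edge list; here is a small Python sketch (the names are mine, and acyclicity/connectivity still needs a separate DFS together with |E| = |V| - 1):

```python
from collections import Counter

def root_by_indegree(edges, nodes):
    """Apply the in-degree rule: a directed tree has exactly one vertex with
    in-degree 0 (the root) and every other vertex with in-degree 1.
    Returns the candidate root, or None if the rule is violated."""
    indeg = Counter(v for (u, v) in edges)
    roots = [n for n in nodes if indeg[n] == 0]
    if len(roots) == 1 and all(indeg[n] == 1 for n in nodes if n != roots[0]):
        return roots[0]
    return None
```

On a -> b -> c this returns a; with two edges pointing into the same vertex it returns None, since that vertex would have in-degree 2.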
Following is the code I have written for this. Feel free to suggest optimisations.
import java.util.*;
import java.lang.*;
import java.io.*;

class Graph
{
    private static int V;
    private static int adj[][];

    static void initializeGraph(int n)
    {
        V = n + 1;
        adj = new int[V][V];
        for (int i = 0; i < V; i++)
        {
            for (int j = 0; j < V; j++)
                adj[i][j] = 0;
        }
    }

    static int isTree(int edges[][], int n)
    {
        initializeGraph(n);
        for (int i = 0; i < edges.length; i++)
        {
            addDirectedEdge(edges[i][0], edges[i][1]);
        }
        int root = findRoot();
        if (root == -1)
            return -1;
        boolean visited[] = new boolean[V];
        boolean isTree = isTree(root, visited, -1);
        boolean isConnected = isConnected(visited);
        System.out.println("isTree= " + isTree + " isConnected= " + isConnected);
        if (isTree && isConnected)
            return root;
        else
            return -1;
    }

    static boolean isTree(int node, boolean visited[], int parent)
    {
        visited[node] = true;
        for (int j = 1; j < V; j++)
        {
            if (adj[node][j] == 1)
            {
                if (visited[j])
                    return false;
                // visit all adjacent vertices
                if (!isTree(j, visited, node))
                    return false;
            }
        }
        return true;
    }

    static int findRoot()
    {
        for (int j = 1; j < V; j++)
        {
            int count = 0;
            for (int i = 1; i < V; i++)
            {
                if (adj[i][j] == 1)
                    count++;
            }
            if (count == 0)
                return j;
        }
        return -1;
    }

    static void addDirectedEdge(int s, int d)
    {
        adj[s][d] = 1;
    }

    static boolean isConnected(boolean visited[])
    {
        for (int i = 1; i < V; i++)
        {
            if (!visited[i])
                return false;
        }
        return true;
    }

    public static void main(String[] args) throws java.lang.Exception
    {
        int edges[][] = {{2,3},{2,4},{3,1},{3,5},{3,7},{4,6},{2,8},{8,9}};
        int n = 9;
        int root = isTree(edges, n);
        System.out.println("root is:" + root);
        int edges2[][] = {{2,3},{2,4},{3,1},{3,5},{3,7},{4,6},{2,8},{6,3}};
        int n2 = 8;
        root = isTree(edges2, n2);
        System.out.println("root is:" + root);
    }
}
In short, I need a fast algorithm to count how many acyclic paths are there in a simple directed graph.
By simple graph I mean one without self loops or multiple edges.
A path can start from any node and must end on a node that has no outgoing edges. A path is acyclic if no edge occurs twice in it.
My graphs (empirical datasets) have only between 20-160 nodes; however, some of them have many cycles in them, therefore there will be a very large number of paths, and my naive approach is simply not fast enough for some of the graphs I have.
What I'm doing currently is "descending" along all possible edges using a recursive function, while keeping track of which nodes I have already visited (and avoiding them). The fastest solution I have so far was written in C++, and uses std::bitset argument in the recursive function to keep track of which nodes were already visited (visited nodes are marked by bit 1). This program runs on the sample dataset in 1-2 minutes (depending on computer speed). With other datasets it takes more than a day to run, or apparently much longer.
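The recursive descent with a visited set described above looks roughly like this in Python (an integer doubles as the bitset; whether a lone sink counts as a length-zero path is an assumption of this sketch):

```python
def count_acyclic_paths(adj, n):
    """Count paths that end at a sink (a node with no outgoing edges),
    starting from any node. adj[u] is a list of successors; `visited` is an
    integer bitmask, mirroring the std::bitset approach described above."""
    def descend(u, visited):
        if not adj[u]:
            return 1                      # sink reached: one finished path
        total = 0
        for v in adj[u]:
            if not visited & (1 << v):    # never revisit a node on this path
                total += descend(v, visited | (1 << v))
        return total
    return sum(descend(u, 1 << u) for u in range(n))
```

This is the exponential brute force; as the answers below note, the problem is #P-complete, so no dramatically better exact algorithm is expected.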
The sample dataset: http://pastie.org/1763781
(each line is an edge-pair)
Solution for the sample dataset (first number is the node I'm starting from, second number is the path-count starting from that node, last number is the total path count):
http://pastie.org/1763790
Please let me know if you have ideas about algorithms with a better complexity. I'm also interested in approximate solutions (estimating the number of paths with some Monte Carlo approach). Eventually I'll also want to measure the average path length.
Edit: also posted on MathOverflow under same title, as it might be more relevant there. Hope this is not against the rules. Can't link as site won't allow more than 2 links ...
This is #P-complete, it seems (see http://www.maths.uq.edu.au/~kroese/ps/robkro_rev.pdf). The linked paper gives an approximation algorithm.
If you can relax the simple path requirement, you can efficiently count the number of paths using a modified version of Floyd-Warshall or graph exponentiation as well. See All pairs all paths on a graph
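If vertex repetition is allowed, counting becomes easy because the (i, j) entry of A^k is the number of walks of length k from i to j. A small pure-Python sketch of the graph-exponentiation idea (illustrative; a real implementation would use fast matrix libraries):

```python
def count_walks_up_to(A, max_len):
    """Return the number of walks (vertex repetition allowed) of each length
    1..max_len, by repeated multiplication with the adjacency matrix A."""
    n = len(A)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    counts, P = [], A
    for _ in range(max_len):
        counts.append(sum(sum(row) for row in P))  # total walks of this length
        P = matmul(P, A)
    return counts
```

Note this counts walks, not simple paths: on a 2-cycle the counts never die out, which is exactly the relaxation being described.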
As mentioned by spinning_plate, this problem is #P-complete, so start looking at approximations :). I really like the #P-completeness proof for this problem, so I thought it would be nice to share it:
Let N be the number of paths (starting at s) in the graph and p_k be the number of paths of length k. We have:
N = p_1 + p_2 + ... + p_n
Now build a second graph by changing every edge into a pair of parallel edges. Each path of length k in the original graph now corresponds to 2^k paths, so:
N_2 = p_1*2 + p_2*4 + ... + p_n*(2^n)
Repeating this process with i parallel edges instead of 2, for each i up to n, gives a linear system (with a Vandermonde matrix) that allows us to find p_1, ..., p_n:
N_i = p_1*i + p_2*(i^2) + ... + p_n*(i^n)
Therefore, finding the number of paths in the graph is just as hard as finding the number of paths of a certain length. In particular, p_n is the number of Hamiltonian paths (starting at s), a bona fide #P-complete problem.
I haven't done the math, but I'd guess a similar process can show that just calculating the average length is also hard.
Note: most times this problem is discussed, the paths start from a single vertex and stop wherever. This is the opposite of your problem, but the two should be equivalent by just reversing all the edges.
Importance of Problem Statement
It is unclear what is being counted.
Is the starting node set all nodes for which there is at least one outgoing edge, or is there some other criterion for starting nodes?
Is the ending node set the set of all nodes with zero outgoing edges, or can any node with at least one incoming edge be a possible ending node?
Define your problem so that there are no ambiguities.
Estimation
Estimations can be off by orders of magnitude when designed for randomly constructed directed graphs and the graph is very statistically skewed or systematic in its construction. This is typical of all estimation processes, but particularly pronounced in graphs because of their exponential pattern complexity potential.
Two Optimizing Points
The std::bitset model will be slower than plain bool values on most processor architectures because of the instruction-set mechanics of testing a bit at a particular bit offset. The bitset is more useful when memory footprint, not speed, is the critical factor.
Eliminating cases, or reducing the graph via deduction, is important. For instance, if there is a node with only one outgoing edge, one can calculate the number of paths without it, and then add to the count for the sub-graph the number of paths from the node to which it points.
Resorting to Clusters
The problem can be executed on a cluster by distributing according to starting node. Some problems simply require super-computing. If you have 1,000,000 starting nodes and 10 processors, you can place 100,000 starting node cases on each processor. The above case eliminations and reductions should be done prior to distributing cases.
A Typical Depth First Recursion and How to Optimize It
Here is a small program that provides a basic depth-first, acyclic traversal from any node to any node, which can be altered, placed in a loop, or distributed. The list can be placed into a static native array by using a template with the size as a parameter, if the maximum data set size is known, which reduces iteration and indexing times.
#include <iostream>
#include <list>

class DirectedGraph {
private:
    int miNodes;
    std::list<int> * mnpEdges;
    bool * mpVisitedFlags;

    void initAlreadyVisited() {
        for (int i = 0; i < miNodes; ++i)
            mpVisitedFlags[i] = false;
    }

    void recurse(int iCurrent, int iDestination,
                 int path[], int index,
                 std::list<std::list<int> *> * pnai) {
        mpVisitedFlags[iCurrent] = true;
        path[index++] = iCurrent;
        if (iCurrent == iDestination) {
            auto pni = new std::list<int>;
            for (int i = 0; i < index; ++i)
                pni->push_back(path[i]);
            pnai->push_back(pni);
        } else {
            auto it = mnpEdges[iCurrent].begin();
            auto itBeyond = mnpEdges[iCurrent].end();
            while (it != itBeyond) {
                if (!mpVisitedFlags[*it])
                    recurse(*it, iDestination, path, index, pnai);
                ++it;
            }
        }
        --index;
        mpVisitedFlags[iCurrent] = false;
    }

public:
    DirectedGraph(int iNodes) {
        miNodes = iNodes;
        mnpEdges = new std::list<int>[iNodes];
        mpVisitedFlags = new bool[iNodes];
    }

    ~DirectedGraph() {
        delete [] mnpEdges;        // array news require delete []
        delete [] mpVisitedFlags;
    }

    void addEdge(int u, int v) {
        mnpEdges[u].push_back(v);
    }

    std::list<std::list<int> *> * findPaths(int iStart,
                                            int iDestination) {
        initAlreadyVisited();
        auto path = new int[miNodes];
        auto pnpi = new std::list<std::list<int> *>();
        recurse(iStart, iDestination, path, 0, pnpi);
        delete [] path;            // array form here as well
        return pnpi;
    }
};

int main() {
    DirectedGraph dg(5);
    dg.addEdge(0, 1);
    dg.addEdge(0, 2);
    dg.addEdge(0, 3);
    dg.addEdge(1, 3);
    dg.addEdge(1, 4);
    dg.addEdge(2, 0);
    dg.addEdge(2, 1);
    dg.addEdge(4, 1);
    dg.addEdge(4, 3);
    int startingNode = 0;
    int destinationNode = 1;
    auto pnai = dg.findPaths(startingNode, destinationNode);
    std::cout
        << "Unique paths from "
        << startingNode
        << " to "
        << destinationNode
        << std::endl
        << std::endl;
    bool bFirst;
    std::list<int> * pi;
    auto it = pnai->begin();
    auto itBeyond = pnai->end();
    std::list<int>::iterator itInner;
    std::list<int>::iterator itInnerBeyond;
    while (it != itBeyond) {
        bFirst = true;
        pi = *it++;
        itInner = pi->begin();
        itInnerBeyond = pi->end();
        while (itInner != itInnerBeyond) {
            if (bFirst)
                bFirst = false;
            else
                std::cout << ' ';
            std::cout << (*itInner++);
        }
        std::cout << std::endl;
        delete pi;
    }
    delete pnai;
    return 0;
}
Given two points A and B in a weighted graph, find all paths from A to B where the length of the path is between C1 and C2.
Ideally, each vertex should only be visited once, although this is not a hard requirement. I suppose I could use a heuristic to sort the results of the algorithm to weed out "silly" paths (e.g. a path that just visits the same two nodes over and over again).
I can think of simple brute-force algorithms, but are there any more sophisticated algorithms that would make this more efficient? I can imagine that as the graph grows this could become expensive.
In the application I am developing, A & B are actually the same point (i.e. the path must return to the start), if that makes any difference.
Note that this is an engineering problem, not a computer science problem, so I can use an algorithm that is fast but not necessarily 100% accurate. i.e. it is ok if it returns most of the possible paths, or if most of the paths returned are within the given length range.
[UPDATE]
This is what I have so far. I have this working on a small graph (30 nodes with around 100 edges). The time required is < 100ms
I am using a directed graph.
I do a depth first search of all possible paths.
At each new node
For each edge leaving the node
Reject the edge if the path we have already contains this edge (in other words, never go down the same edge in the same direction twice)
Reject the edge if it leads back to the node we just came from (in other words, never double back. This removes a lot of 'silly' paths)
Reject the edge if (minimum distance from the end node of the edge to the target node B + the distance travelled so far) > Maximum path length (C2)
If the end node of the edge is our target node B:
If the path fits within the length criteria, add it to the list of suitable paths.
Otherwise reject the edge (in other words, we only ever visit the target node B at the end of the path. It won't be an intermediate point on a path)
Otherwise, add the edge to our path and recurse into its target node
I use Dijkstra to precompute the minimum distance of all nodes to the target node.
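A Python sketch of the pruned DFS described above (helper names are mine; it precomputes exact distances to B via Dijkstra on the reversed graph and applies the "cannot finish within C2" rejection; the no-doubling-back rule is omitted for brevity):

```python
import heapq

def paths_in_range(adj, A, B, C1, C2):
    """Enumerate paths A -> B with total weight in [C1, C2].
    adj[u] is a list of (v, w) pairs; every node must appear as a key.
    An edge is never reused in the same direction, and B only ever
    appears as the final node of a path, as in the update above."""
    # precompute min distance from every node to B on the reversed graph
    rev = {u: [] for u in adj}
    for u, nbrs in adj.items():
        for v, w in nbrs:
            rev[v].append((u, w))
    dist = {u: float('inf') for u in adj}
    dist[B] = 0
    heap = [(0, B)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in rev[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))

    results = []

    def dfs(u, travelled, path, used_edges):
        for v, w in adj[u]:
            if (u, v) in used_edges:
                continue                      # never reuse this directed edge
            if travelled + w + dist[v] > C2:  # cannot finish within C2: prune
                continue
            if v == B:
                if C1 <= travelled + w <= C2:
                    results.append(path + [v])
                continue                      # B is only ever an endpoint
            dfs(v, travelled + w, path + [v], used_edges | {(u, v)})

    dfs(A, 0, [A], frozenset())
    return results
```

The pruning line is exactly the rejection rule in the list above: shortest remaining distance plus distance travelled so far must not exceed the maximum path length C2.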
I wrote some Java code to test the DFS approach I suggested. The code does not check for paths in range, but prints all paths; it should be simple to modify it to keep only those in range. I also ran some simple tests. It seems to give correct results with 10 vertices and 50 edges or so, though I did not find time for any thorough testing. I also ran it for 100 vertices and 1000 edges. It doesn't run out of memory and keeps printing new paths until I kill it, of which there are a lot. This is not surprising for randomly generated dense graphs, but may not be the case for real-world graphs, for example where vertex degrees follow a power law (especially with narrow weight ranges). Also, if you are just interested in how path lengths are distributed in a range, you can stop once you have generated a certain number of paths.
The program outputs the following:
a) the adjacency list of a randomly generated graph.
b) Set of all paths it has found till now.
import java.util.*;

public class AllPaths {
    int numOfVertices;
    int[] status;

    AllPaths(int numOfVertices) {
        this.numOfVertices = numOfVertices;
        status = new int[numOfVertices + 1];
    }

    HashMap<Integer, ArrayList<Integer>> adjList = new HashMap<Integer, ArrayList<Integer>>();

    class FoundSubpath {
        int pathWeight = 0;
        int[] vertices;
    }

    // For each vertex, a list of all subpaths of length less than UB found.
    HashMap<Integer, ArrayList<FoundSubpath>> allSubpathsFromGivenVertex = new HashMap<Integer, ArrayList<FoundSubpath>>();

    public void printInputGraph() {
        System.out.println("Random Graph Adjacency List:");
        for (int i = 1; i <= numOfVertices; i++) {
            ArrayList<Integer> toVtcs = adjList.get(new Integer(i));
            System.out.print(i + " ");
            if (toVtcs == null) {
                continue;
            }
            for (int j = 0; j < toVtcs.size(); j++) {
                System.out.print(toVtcs.get(j) + " ");
            }
            System.out.println(" ");
        }
    }

    public void randomlyGenerateGraph(int numOfTrials) {
        Random rnd = new Random();
        for (int i = 1; i < numOfTrials; i++) {
            Integer fromVtx = new Integer(rnd.nextInt(numOfVertices) + 1);
            Integer toVtx = new Integer(rnd.nextInt(numOfVertices) + 1);
            if (fromVtx.equals(toVtx)) {
                continue;
            }
            ArrayList<Integer> toVtcs = adjList.get(fromVtx);
            boolean alreadyAdded = false;
            if (toVtcs == null) {
                toVtcs = new ArrayList<Integer>();
            } else {
                for (int j = 0; j < toVtcs.size(); j++) {
                    if (toVtcs.get(j).equals(toVtx)) {
                        alreadyAdded = true;
                        break;
                    }
                }
            }
            if (!alreadyAdded) {
                toVtcs.add(toVtx);
                adjList.put(fromVtx, toVtcs);
            }
        }
    }

    public void addAllViableSubpathsToMap(ArrayList<Integer> verticesTillNowInPath) {
        FoundSubpath foundSpObj;
        ArrayList<FoundSubpath> foundPathsList;
        for (int i = 0; i < verticesTillNowInPath.size() - 1; i++) {
            Integer startVtx = verticesTillNowInPath.get(i);
            if (allSubpathsFromGivenVertex.containsKey(startVtx)) {
                foundPathsList = allSubpathsFromGivenVertex.get(startVtx);
            } else {
                foundPathsList = new ArrayList<FoundSubpath>();
            }
            foundSpObj = new FoundSubpath();
            foundSpObj.vertices = new int[verticesTillNowInPath.size() - i - 1];
            int cntr = 0;
            for (int j = i + 1; j < verticesTillNowInPath.size(); j++) {
                foundSpObj.vertices[cntr++] = verticesTillNowInPath.get(j);
            }
            foundPathsList.add(foundSpObj);
            allSubpathsFromGivenVertex.put(startVtx, foundPathsList);
        }
    }

    public void printViablePaths(Integer v, ArrayList<Integer> verticesTillNowInPath) {
        ArrayList<FoundSubpath> foundPathsList;
        foundPathsList = allSubpathsFromGivenVertex.get(v);
        if (foundPathsList == null) {
            return;
        }
        for (int j = 0; j < foundPathsList.size(); j++) {
            for (int i = 0; i < verticesTillNowInPath.size(); i++) {
                System.out.print(verticesTillNowInPath.get(i) + " ");
            }
            FoundSubpath fpObj = foundPathsList.get(j);
            for (int k = 0; k < fpObj.vertices.length; k++) {
                System.out.print(fpObj.vertices[k] + " ");
            }
            System.out.println("");
        }
    }

    boolean DfsModified(Integer v, ArrayList<Integer> verticesTillNowInPath, Integer source, Integer dest) {
        if (v.equals(dest)) {
            addAllViableSubpathsToMap(verticesTillNowInPath);
            status[v] = 2;
            return true;
        }
        // If vertex v is already explored till destination, just print all
        // subpaths that meet criteria, using the hashmap.
        if (status[v] == 1 || status[v] == 2) {
            printViablePaths(v, verticesTillNowInPath);
        }
        // Vertex in current path. Return to avoid cycle.
        if (status[v] == 1) {
            return false;
        }
        if (status[v] == 2) {
            return true;
        }
        status[v] = 1;
        boolean completed = true;
        ArrayList<Integer> toVtcs = adjList.get(v);
        if (toVtcs == null) {
            status[v] = 2;
            return true;
        }
        for (int i = 0; i < toVtcs.size(); i++) {
            Integer vDest = toVtcs.get(i);
            verticesTillNowInPath.add(vDest);
            boolean explorationComplete = DfsModified(vDest, verticesTillNowInPath, source, dest);
            if (explorationComplete == false) {
                completed = false;
            }
            verticesTillNowInPath.remove(verticesTillNowInPath.size() - 1);
        }
        if (completed) {
            status[v] = 2;
        } else {
            status[v] = 0;
        }
        return completed;
    }
}

class AllPathsCaller {
    public static void main(String[] args) {
        int numOfVertices = 20;
        /* This is the number of attempts made to create an edge. The edge is
           usually created, but may not be (e.g. if an edge already exists
           between the randomly chosen source and destination). */
        int numOfEdges = 200;
        int src = 1;
        int dest = 10;
        AllPaths allPaths = new AllPaths(numOfVertices);
        allPaths.randomlyGenerateGraph(numOfEdges);
        allPaths.printInputGraph();
        ArrayList<Integer> verticesTillNowInPath = new ArrayList<Integer>();
        verticesTillNowInPath.add(new Integer(src));
        System.out.println("List of Paths");
        allPaths.DfsModified(new Integer(src), verticesTillNowInPath, new Integer(src), new Integer(dest));
        System.out.println("done");
    }
}
I think you are on the right track with BFS. I came up with some rough, vaguely Java-like pseudocode for a proposed solution using BFS. The idea is to store subpaths found during previous traversals, and their lengths, for reuse. I'll try to improve the code when I find the time, but hopefully it gives a clue as to where I am going with this. The complexity, I am guessing, should be of order O(E).
Further comments:
This seems like a reasonable approach, though I am not sure I understand it completely. I've constructed a simple example to make sure I do. Let's consider a simple graph with all edges weighted 1, and the following adjacency-list representation:
A->B,C
B->C
C->D,F
F->D
Say we wanted to find all paths from A to F, not just those in range, and the destination vertices of each source vertex are explored in alphabetical order. Then the algorithm would work as follows:
First starting with B:
ABCDF
ABCF
Then starting with C:
ACDF
ACF
Is that correct?
A simple improvement in that case would be to store, for each vertex visited, the paths found after the first visit to that node. In this example, once you visit C from B, you find that there are two paths to F from C: CF and CDF. You can save this information, and in the next iteration, once you reach C, you can just append CF and CDF to the path you have found so far, without exploring further.
To find edges in range, you can use the conditions you already described for paths generated as above.
A further thought: maybe you do not need to run Dijkstra's algorithm to find shortest paths at all. A subpath's length is found the first time you traverse that subpath. So, in this example, the lengths of CDF and CF are found the first time you visit C via B. This information can be used for pruning the next time C is visited, directly via A. This length is more accurate than the estimate found by Dijkstra's, as it is the exact value, not a lower bound.
Further comments:
The algorithm can probably be improved with some thought. For example, each time the relaxation step is executed in Dijkstra's algorithm (steps 16-19 in the wikipedia description), the rejected older/newer subpath can be remembered using some data structure, if the older path is a plausible candidate (less than upper bound). In the end, it should be possible to reconstruct all the rejected paths, and keep the ones in range.
This algorithm should be O(V^2).
I think visiting each vertex only once may be too optimistic: algorithms such as Dijkstra's shortest path have complexity O(V^2) for finding a single path, the shortest path. Finding all paths (including the shortest path) is a harder problem, so it should have complexity at least O(V^2).
My first thought on approaching the problem is a variation of Dijkstra's shortest-path algorithm. Applying this algorithm once gives you the length of the shortest path, which is a lower bound on the path length between the two vertices. Removing one edge at a time from this shortest path and recalculating the shortest path should give you slightly longer paths.
In turn, edges can be removed from these slightly longer paths to generate more paths, and so on. You can stop once you have a sufficient number of paths, or if the paths you generate are over your upper bound.
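The edge-removal loop described above might be sketched like this (the helper names are mine; it is close in spirit to Yen's k-shortest-paths algorithm, without the formal guarantees):

```python
import heapq

def successive_paths(adj, A, B, max_paths):
    """Generate paths A -> B in roughly increasing length by repeatedly
    banning one edge of an already-found shortest path and re-running
    Dijkstra, as described above. adj[u] is a list of (v, w) pairs."""
    def dijkstra(banned):
        dist, prev, heap = {A: 0}, {}, [(0, A)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue
            for v, w in adj.get(u, []):
                if (u, v) in banned:
                    continue              # this edge was removed
                if d + w < dist.get(v, float('inf')):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        if B not in dist:
            return None, None
        path, node = [B], B
        while node != A:
            node = prev[node]
            path.append(node)
        return path[::-1], dist[B]

    found, seen_paths = [], set()
    queue = [frozenset()]                 # banned-edge sets still to try
    while queue and len(found) < max_paths:
        banned = queue.pop(0)
        path, length = dijkstra(banned)
        if path is None or tuple(path) in seen_paths:
            continue
        seen_paths.add(tuple(path))
        found.append((path, length))
        for u, v in zip(path, path[1:]):  # ban each edge of this path in turn
            queue.append(banned | {(u, v)})
    return found
```

Filtering the generated (path, length) pairs against [C1, C2] then gives paths in range; the enumeration stops early once max_paths candidates have been produced.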
This is my first guess. I am a newbie to stackoverflow: any feedback is welcome.