Correlation Network Implementation - algorithm

I have been working on my graph/network problem, and I think I finally know what I want to do. Now that I am getting into the implementation, I am having trouble deciding which libraries to use. The graph itself is pretty simple: each node is labeled by a string, each edge is a probability/correlation coefficient between the two nodes (variables), and the graph is undirected. The operations that I want to perform on the graph are:
Inserting new nodes/edges (fast)
Finding the all-pairs shortest paths (using 1/probability as the edge weight) and remembering the nodes in each path, probably with Johnson's algorithm
Constructing the minimum weight Steiner tree for k specific vertices
Use Johnson's algorithm to build shortest paths
Iterating over the current nodes in the path p, find the shortest route to the remaining nodes in k
Looking at the mean degree of the graph
Evaluating the betweenness of the nodes
Getting the clustering coefficients
Finding the modularity of the graph
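The Steiner-tree steps above describe a greedy heuristic (the exact minimum Steiner tree problem is NP-hard). A minimal sketch of one common variant, assuming positive edge weights such as 1/probability and a plain adjacency list rather than any particular library: start from one terminal, then repeatedly attach the terminal closest to the current tree via its shortest path. Instead of a precomputed Johnson table it runs a multi-source Dijkstra from the current tree in each round, which picks the same closest terminal. Adj and steinerHeuristic are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Undirected weighted graph: adj[u] holds (neighbour, weight) pairs, weight > 0.
using Adj = std::vector<std::vector<std::pair<int, double>>>;

// Greedy shortest-path heuristic for the Steiner tree (not guaranteed optimal).
// Returns the tree edges as (smaller index, larger index) pairs.
std::set<std::pair<int, int>> steinerHeuristic(const Adj& adj,
                                               const std::vector<int>& terminals) {
    const int n = static_cast<int>(adj.size());
    const double inf = std::numeric_limits<double>::infinity();
    std::set<std::pair<int, int>> treeEdges;
    if (terminals.empty()) return treeEdges;
    for (int t : terminals) assert(t >= 0 && t < n);

    std::vector<char> inTree(n, 0);
    inTree[terminals[0]] = 1;
    while (true) {
        // Multi-source Dijkstra from every vertex already in the tree.
        std::vector<double> dist(n, inf);
        std::vector<int> pred(n, -1);
        using Item = std::pair<double, int>;
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        for (int v = 0; v < n; ++v)
            if (inTree[v]) { dist[v] = 0.0; pq.push({0.0, v}); }
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;
            for (auto [w, len] : adj[u])
                if (d + len < dist[w]) { dist[w] = d + len; pred[w] = u; pq.push({dist[w], w}); }
        }
        // Closest terminal that is not connected yet.
        int best = -1;
        for (int t : terminals)
            if (!inTree[t] && (best < 0 || dist[t] < dist[best])) best = t;
        if (best < 0 || dist[best] == inf) break;  // done, or the rest is unreachable
        // Add the shortest path from the tree to `best`, edge by edge.
        for (int v = best; !inTree[v]; v = pred[v]) {
            inTree[v] = 1;
            treeEdges.insert({std::min(v, pred[v]), std::max(v, pred[v])});
        }
    }
    return treeEdges;
}
```

Terminals that happen to lie on an attached path join the tree at the same time, so they are not attached a second time.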
For many of these, I want to compare the result to the Erdos-Renyi model, using it as a null hypothesis. Also, being able to use the statistical mechanics definitions via a Markov field would be helpful, as then I could calculate correlations between two nodes that are not identical, and ask the graph questions about the entropy, etc. So a good mapping onto a Markov field library of some sort would be useful too.
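For the Erdos-Renyi comparison, two of the quantities above have simple baselines: in G(n, p) the expected mean degree is p(n-1) and the expected clustering coefficient is p, since every pair of neighbours is connected independently with probability p. A minimal sketch, assuming an unweighted, simple, undirected graph stored as neighbour sets (GraphStats and computeStats are hypothetical names), that computes the mean degree, the global clustering coefficient (transitivity) and the clustering of an Erdos-Renyi graph with the same mean degree. Betweenness and modularity have no equally simple closed form, so for those one would sample random graphs instead.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct GraphStats {
    double meanDegree;     // <k> = 2|E| / n
    double transitivity;   // 3 * triangles / connected triples
    double erClustering;   // expected clustering of G(n, p) with p = <k> / (n - 1)
};

// adj[u] = set of neighbours of u in a simple undirected graph (no self-loops).
GraphStats computeStats(const std::vector<std::set<int>>& adj) {
    const std::size_t n = adj.size();
    assert(n >= 2);
    double degreeSum = 0.0, closedTriples = 0.0, triples = 0.0;
    for (std::size_t u = 0; u < n; ++u) {
        const double k = static_cast<double>(adj[u].size());
        degreeSum += k;
        triples += k * (k - 1.0) / 2.0;             // neighbour pairs centred on u
        for (int v : adj[u])
            for (int w : adj[u])
                if (v < w && adj[v].count(w) != 0) closedTriples += 1.0;
    }
    // Each triangle is counted once per corner, i.e. three times in closedTriples.
    GraphStats s;
    s.meanDegree = degreeSum / static_cast<double>(n);
    s.transitivity = triples > 0.0 ? closedTriples / triples : 0.0;
    s.erClustering = s.meanDegree / static_cast<double>(n - 1);
    return s;
}
```

A transitivity well above erClustering indicates more clustering than the null model predicts.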
The crux of the problem at the moment is that I am trying to find a C++ library to work in. I have taken a look at R, but I want something that is going to be more robust and faster. The three libraries that I am considering are:
LEMON
Easy to use and install
Straightforward documentation
Has some of the functions I want already
Dynamically creating a graph from reading in a text file, and making sure there are no duplicate nodes, is a nightmare that I have not been able to figure out
Boost Graph Library
Intractable, arcane definitions for objects, and how to use them
Documentation does not match what the code does, necessarily
Does have many of the algorithms that I want, as well as a very easy way to create a graph from a text file
MultiThreaded Graph Library
Parallelism already incorporated
Reads easier than the BGL
Not as many functions
Still arcane
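The duplicate-node problem mentioned for LEMON is largely library-independent: keep a map from node name to integer id, and only create a node the first time a name is seen. A minimal sketch, assuming the NodeA;correlation;NodeB line format used later in this post (NameIndex, Edge and readEdges are hypothetical names). The resulting integer ids can then be used to create nodes in LEMON, the BGL, or any other library.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

struct Edge { int u, v; double correlation; };

// Assigns consecutive ids to names; a repeated name gets its existing id.
class NameIndex {
public:
    int idOf(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }
    const std::string& nameOf(int id) const { return names_.at(id); }
    std::size_t size() const { return names_.size(); }
private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
};

// Parses "NodeA;correlation;NodeB" lines, skipping malformed ones.
std::vector<Edge> readEdges(std::istream& in, NameIndex& index) {
    std::vector<Edge> edges;
    for (std::string line; std::getline(in, line);) {
        std::istringstream fields(line);
        std::string a, corr, b;
        if (!std::getline(fields, a, ';') || !std::getline(fields, corr, ';') ||
            !std::getline(fields, b))
            continue;
        std::istringstream corrStream(corr);
        double c;
        if (!(corrStream >> c)) continue;
        edges.push_back({index.idOf(a), index.idOf(b), c});
    }
    return edges;
}
```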
Further down the road, I envision the graph living on a distributed network, with distributed storage (Hadoop or something). I suspect that the whole graph will not fit into memory, so I will have to come up with a caching scheme to look at parts of the graph.
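One possible caching scheme for a graph that does not fit in memory is a least-recently-used (LRU) cache of adjacency lists: on a miss, load a node's neighbours from the backing store, and when the cache is full evict the node used least recently. A minimal sketch; the Loader callback stands in for whatever storage backend (Hadoop or otherwise) is eventually used and is purely hypothetical, as are the other names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Neighbours = std::vector<std::pair<std::string, double>>;  // (neighbour, weight)

class AdjacencyCache {
public:
    using Loader = std::function<Neighbours(const std::string&)>;
    AdjacencyCache(std::size_t capacity, Loader loader)
        : capacity_(capacity), loader_(std::move(loader)) { assert(capacity_ > 0); }

    // The returned reference stays valid until this entry is evicted.
    const Neighbours& neighbours(const std::string& node) {
        auto it = index_.find(node);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        // Miss: evict the least recently used entry if full, then load.
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(node, loader_(node));
        index_[node] = order_.begin();
        ++misses_;
        return order_.front().second;
    }
    std::size_t misses() const { return misses_; }

private:
    using Entry = std::pair<std::string, Neighbours>;
    std::size_t capacity_;
    Loader loader_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
    std::size_t misses_ = 0;
};
```

Whether per-node granularity is appropriate depends on the access pattern; caching larger partitions of the graph follows the same structure.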
What library would people suggest for the problem that I described? Would it be better to just use the BGL, and write my own functions? What about the multi-threaded version? Are there any libraries that lend themselves more readily to the type of work I want to do, especially the quantities I want to compute?
Original Post
Thanks!
Edit1
So I am seriously frustrated by the BGL. I have an adjacency list graph, and I want to run my own version of Johnson's algorithm (or Floyd's; at this point I am not picky) on it and get back the distance matrix to look at. Except that I can't get it to work. Here is my full code thus far:
#include <boost/graph/adjacency_list.hpp>
#include <boost/tokenizer.hpp>
#include <boost/tuple/tuple.hpp>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using namespace boost;

int main()
{
    // Read in the file
    std::ifstream datafile("stuff");
    if (!datafile)
    {
        std::cerr << "No Stuff file" << std::endl;
        return EXIT_FAILURE;
    }

    // Build the graph: vertices carry a name (string), edges a weight (double).
    // Note: to attach more than one property, nest property<> templates.
    typedef adjacency_list<vecS, vecS, undirectedS,
                           property<vertex_name_t, std::string>,
                           property<edge_weight_t, double> > Graph;
    Graph g;

    // Property maps for the vertex index, vertex name and edge weight
    typedef property_map<Graph, vertex_index_t>::type vertex_index_map_t;
    vertex_index_map_t vertex_index_map = get(vertex_index, g);
    typedef property_map<Graph, vertex_name_t>::type name_map_t;
    name_map_t name_map = get(vertex_name, g);
    typedef property_map<Graph, edge_weight_t>::type probability_map_t;
    probability_map_t probability = get(edge_weight, g);

    // Map of the vertices by name, used to avoid duplicate vertices
    typedef graph_traits<Graph>::vertex_descriptor Vertex;
    typedef std::map<std::string, Vertex> NameVertexMap;
    NameVertexMap AllNodes;

    // Load the file into the graph; each line has the form NodeA;correlation;NodeB
    for (std::string line; std::getline(datafile, line);)
    {
        char_delimiters_separator<char> sep(false, "", ";");
        tokenizer<> line_toks(line, sep);
        std::vector<std::string> toks(line_toks.begin(), line_toks.end());
        if (toks.size() != 3)
            continue;  // skip blank or malformed lines instead of reading past the end
        const std::string& conditionA = toks[0];
        const std::string& conditionB = toks[2];

        NameVertexMap::iterator pos;
        bool inserted;
        Vertex u, v;
        boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(conditionA, Vertex()));
        if (inserted)
        {
            u = add_vertex(g);
            name_map[u] = conditionA;
            pos->second = u;
        }
        else
        {
            u = pos->second;
        }

        std::istringstream incorrelation(toks[1]);
        double correlate;
        incorrelation >> correlate;

        boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(conditionB, Vertex()));
        if (inserted)
        {
            v = add_vertex(g);
            name_map[v] = conditionB;
            pos->second = v;
        }
        else
        {
            v = pos->second;
        }

        graph_traits<Graph>::edge_descriptor e;
        boost::tie(e, inserted) = add_edge(u, v, g);
        if (inserted)
            probability[e] = 1.0 / correlate;  // edge weight = 1/correlation
    }

    // Print every edge as ( index:name, weight, index:name )
    typedef graph_traits<Graph>::edge_iterator edge_iter;
    std::pair<edge_iter, edge_iter> edgePair;
    Vertex u, v;
    for (edgePair = edges(g); edgePair.first != edgePair.second; ++edgePair.first)
    {
        u = source(*edgePair.first, g);
        v = target(*edgePair.first, g);
        std::cout << "( " << vertex_index_map[u] << ":" << name_map[u] << ", ";
        std::cout << probability[*edgePair.first] << ", ";
        std::cout << vertex_index_map[v] << ":" << name_map[v] << " )" << std::endl;
    }
    return 0;
}
The input file has the format NodeA;correlation;NodeB. The code above works, but I get into serious trouble when I try to add johnson_all_pairs_shortest_paths. What I really want is not only a distance matrix D but also the list of nodes taken along each path, similar to the path reconstruction in the Wikipedia article on the Floyd-Warshall algorithm. I cannot construct D correctly: I want it to be a square matrix of doubles, double D[V][V] with V = num_vertices(g), but I get an error saying that I am not calling the function correctly. Should I just try to roll my own algorithm(s) for this problem, since I can't figure out whether the functionality is there (not to mention how to make the function calls)? The BGL documentation is as obtuse as the implementation, so I don't really have any modern examples to go on.
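Two things may help with the distance matrix. First, double D[V][V] with a run-time V is a variable-length array, which standard C++ does not support; a std::vector<std::vector<double>> indexed as D[u][v] is the usual replacement, and with vecS vertex storage (vertex descriptors are the integers 0..V-1) the BGL all-pairs functions should accept it. Second, as far as I know the BGL all-pairs routines fill in distances only, not paths. A minimal hand-rolled sketch of Floyd-Warshall with successor-matrix path reconstruction, assuming the graph has been exported as an undirected edge list over vertex indices 0..V-1 with non-negative weights (WeightedEdge, AllPairs, allPairs and reconstructPath are hypothetical names):

```cpp
#include <cassert>
#include <limits>
#include <vector>

struct WeightedEdge { int u, v; double w; };

struct AllPairs {
    std::vector<std::vector<double>> dist;  // dist[i][j]: shortest i->j distance
    std::vector<std::vector<int>> next;     // next[i][j]: vertex after i on that path, -1 if none
};

// Floyd-Warshall on an undirected graph with vertices 0..n-1.
AllPairs allPairs(int n, const std::vector<WeightedEdge>& edges) {
    const double inf = std::numeric_limits<double>::infinity();
    AllPairs r{std::vector<std::vector<double>>(n, std::vector<double>(n, inf)),
               std::vector<std::vector<int>>(n, std::vector<int>(n, -1))};
    for (int i = 0; i < n; ++i) { r.dist[i][i] = 0.0; r.next[i][i] = i; }
    for (const auto& e : edges) {
        assert(e.w >= 0.0);
        // Undirected: set both directions; keep the lightest of parallel edges.
        if (e.w < r.dist[e.u][e.v]) {
            r.dist[e.u][e.v] = r.dist[e.v][e.u] = e.w;
            r.next[e.u][e.v] = e.v;
            r.next[e.v][e.u] = e.u;
        }
    }
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (r.dist[i][k] + r.dist[k][j] < r.dist[i][j]) {
                    r.dist[i][j] = r.dist[i][k] + r.dist[k][j];
                    r.next[i][j] = r.next[i][k];
                }
    return r;
}

// Vertex sequence of a shortest i->j path (empty if j is unreachable from i).
std::vector<int> reconstructPath(const AllPairs& r, int i, int j) {
    if (r.next[i][j] < 0) return {};
    std::vector<int> p{i};
    while (i != j) { i = r.next[i][j]; p.push_back(i); }
    return p;
}
```

To feed it from the BGL program above, iterate over edges(g) and push {vertex_index_map[source(e, g)], vertex_index_map[target(e, g)], probability[e]} for each edge; name_map then translates the indices in each reconstructed path back to node names.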

Related

Boost Graph identifying vertices from its created matrix

I have a table of vertices and edges, and from this table I created a Boost graph. Each vertex has an id assigned to it, and each edge also has a length. Now I want to prune the graph by removing nodes. My algorithm works by creating a num_vertices x num_vertices matrix. My problem is how to associate my matrix with boost::vertices, that is, how do I know which matrix column corresponds to which vertex in the graph, since the matrix has no ids? I hope I am not overcomplicating this.
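void Nodekiller::build_matrix(){
    int ndsize = num_vertices(graph);
    // double matrixtb[ndsize][ndsize] is a variable-length array, which is not
    // standard C++; a vector of vectors works instead (needs #include <vector>).
    std::vector<std::vector<double> > matrixtb(ndsize, std::vector<double>(ndsize, 0.0));
    for (int i = 0; i < ndsize; i++){
        for (int j = 0; j < ndsize; j++){
            if (i == j) { matrixtb[i][j] = 0; }
            else {
                matrixtb[i][j] = addEdgeValue(); // if none, add a random value
            }
        }
    }
}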
// I want to sum each column and then prioritize the vertices based on the sums.
So I don't know how to associate boost::vertices(graph) with the matrix in order to be able to prune the graph.
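With vecS as the vertex container, the vertices are indexed 0..n-1 (the vertex_index property), so row and column i of a matrix filled by looping i = 0..n-1 correspond to the vertex with index i. A library-independent sketch of the column-sum ranking, assuming the matrix is filled in that index order (rankByColumnSum is a hypothetical name). With other vertex containers such as listS, one would record a std::vector of vertex descriptors in the same order the matrix is filled and use it to map each index back.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Returns vertex indices ordered by descending column sum.
// Index i refers both to column i of the matrix and to the vertex with index i.
std::vector<int> rankByColumnSum(const std::vector<std::vector<double>>& m) {
    const std::size_t n = m.size();
    std::vector<double> colSum(n, 0.0);
    for (const auto& row : m) {
        assert(row.size() == n);  // square matrix expected
        for (std::size_t j = 0; j < n; ++j) colSum[j] += row[j];
    }
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    // stable_sort keeps ties in index order.
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return colSum[a] > colSum[b]; });
    return order;
}
```

With vecS, boost::vertex(order[0], graph) gives the descriptor of the highest-ranked vertex, which can be passed to clear_vertex and remove_vertex. Removing a vertex from a vecS graph renumbers the vertices after it, so remove them in descending index order or rebuild the matrix after each removal.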
The question is not very clear. Do I understand right:
You have a boost graph
You create a matrix from that graph?
So a first trivial question (maybe outside the scope): do you really need two representations of the same graph, one as a boost::graph and another as your matrix?
You can add and remove edges from a boost::graph easily. The easiest representation is the adjacency list: http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/adjacency_list.html
Maybe a starting point could be this answer: adding custom vertices to a boost graph
You can create all your nodes, iterate over every pair of nodes, and add an edge only if the two nodes are different. Something like:
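typedef boost::graph_traits<Graph>::edge_descriptor edge_t;
boost::graph_traits<Graph>::vertex_iterator vi, vi_end, vi2, vi2_end;
for (boost::tie(vi, vi_end) = boost::vertices(g); vi != vi_end; ++vi) {
    // Restart the inner iteration for every outer vertex.
    for (boost::tie(vi2, vi2_end) = boost::vertices(g); vi2 != vi2_end; ++vi2) {
        if (*vi != *vi2) {
            // For an undirected graph this adds each pair twice; with vecS,
            // compare with *vi < *vi2 instead if that is not wanted.
            edge_t e; bool b;
            boost::tie(e, b) = boost::add_edge(*vi, *vi2, g);
            g[e] = addEdgeValue();
        }
    }
}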

Detect cycles in undirected graph using boost graph library

I've been stuck since yesterday with this problem. Unfortunately/fortunately this problem makes up only about 0.5% of my super huge (for me, a C++ newbie) algorithm, thus the need for a library of existing code that one can just adapt and get things working.
I'd like to detect and list all the cycles in an undirected graph. My edges are not weighted. Yes, what I need is really all the cycles, i.e. something like all the Hamiltonian cycles of a directed graph.
I've been playing around with the Boost Graph Library, and the DFS algorithm seemed very promising; however, it visits each vertex only once and as such cannot give all Hamiltonian cycles.
For the moment, I just need the code to work so that I can continue my algorithm design; afterwards I may consider performance issues. Even a solution with 5 nested for loops is welcome.
Here is code I adapted from a Boost example and played around with, but I don't know how to record and access the predecessors of back edges, and even if that were solved, Boost's DFS visits each vertex only once:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/undirected_dfs.hpp>
#include <iostream>
#include <utility>
#include <vector>

using namespace boost;

struct detect_loops : public boost::dfs_visitor<>
{
    template <class Edge, class Graph>
    void back_edge(Edge e, const Graph& g) {
        std::cout << source(e, g)
                  << " -- "
                  << target(e, g) << "\n";
    }
};

int main(int, char*[])
{
    typedef std::pair<int, int> Edge;
    std::vector<Edge> edges;
    edges.push_back(Edge(0, 1));
    edges.push_back(Edge(1, 2));
    edges.push_back(Edge(2, 3));
    edges.push_back(Edge(3, 1));
    edges.push_back(Edge(4, 5));
    edges.push_back(Edge(5, 0));
    edges.push_back(Edge(4, 0));
    edges.push_back(Edge(5, 6));
    edges.push_back(Edge(2, 6));

    typedef adjacency_list<
        vecS, vecS, undirectedS, no_property,
        property<edge_color_t, default_color_type>
    > graph_t;
    typedef graph_traits<graph_t>::vertex_descriptor vertex_t;
    graph_t g(edges.begin(), edges.end(), 7);

    std::cout << "back edges:\n";
    detect_loops vis;
    undirected_dfs(g, root_vertex(vertex_t(0)).visitor(vis)
                          .edge_color_map(get(edge_color, g)));
    std::cout << std::endl;
    return 0;
}
The example above reports only 3 cycles, but I expect more than 4, since a single vertex may appear in multiple cycles. Secondly, I cannot even store the three cycles that boost's back_edge() gives me in something like std::vector<uInt32> fCycle1, fCycle2, fCycle3. All I get from back_edge() is the source and target vertices.
I'll be grateful for any help and tips. So far, all the examples here will just detect the presence of a cycle or a number thereof but none has shown how to list all of the cycles present.
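A depth-first search reaches each vertex once, so in an undirected graph it reports exactly one back edge per independent cycle (here 9 edges - 7 vertices + 1 component = 3), not every cycle. Listing all simple cycles needs backtracking instead, and since the number of cycles can grow exponentially this is only practical for small graphs. A minimal plain-C++ sketch (not a BGL feature; allSimpleCycles and extend are hypothetical names), assuming a simple undirected graph without parallel edges: each cycle is reported once, starting from its smallest vertex, and the reverse orientation is skipped.

```cpp
#include <cassert>
#include <utility>
#include <vector>

namespace {
// Extends the current path; only vertices larger than `start` may be used,
// so every cycle is found exactly from its smallest vertex.
void extend(const std::vector<std::vector<int>>& adj, int start,
            std::vector<int>& path, std::vector<char>& onPath,
            std::vector<std::vector<int>>& cycles) {
    const int v = path.back();
    for (int w : adj[v]) {
        if (w == start && path.size() >= 3) {
            // Closed a cycle; keep only one of its two orientations.
            if (path[1] < path.back()) cycles.push_back(path);
        } else if (w > start && !onPath[w]) {
            path.push_back(w);
            onPath[w] = 1;
            extend(adj, start, path, onPath, cycles);
            onPath[w] = 0;
            path.pop_back();
        }
    }
}
}  // namespace

// Lists every simple cycle of an undirected graph with vertices 0..n-1.
std::vector<std::vector<int>> allSimpleCycles(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> adj(n);
    for (auto [u, v] : edges) { adj[u].push_back(v); adj[v].push_back(u); }
    std::vector<std::vector<int>> cycles;
    std::vector<char> onPath(n, 0);
    for (int s = 0; s < n; ++s) {
        std::vector<int> path{s};
        onPath[s] = 1;
        extend(adj, s, path, onPath, cycles);
        onPath[s] = 0;
    }
    return cycles;
}
```

On the edge list from the question this finds 6 cycles: the triangles 1-2-3 and 0-4-5 plus four longer cycles through 0, 2, 5 and 6, each stored as a std::vector<int> of vertices.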

Boost Graph Library: edge insertion slow for large graph

I'm trying to implement "intelligent scissors" for interactive image segmentation. Therefore, I have to create a directed graph from an image where each vertex represents a single pixel. Each vertex is then connected to each of its neighbours by two edges, one outgoing and one incoming, because the cost of an edge (a,b) may differ from the cost of (b,a). I'm using images with a size of 512*512 pixels, so I need to create a graph with 262144 vertices and 2091012 edges. Currently, I'm using the following graph:
typedef property<vertex_index_t, int,
        property<vertex_distance_t, double,
        property<x_t, int,
        property<y_t, int
        > > > > VertexProperty;
typedef property<edge_weight_t, double> EdgeProperty;
// define MyGraph
typedef adjacency_list<
    vecS,      // container used for the out-edges (vector)
    vecS,      // container used for the vertices (vector)
    directedS, // directed edges (not sure if this is the right choice for incidenceGraph)
    VertexProperty,
    EdgeProperty
> MyGraph;
I'm using an additional class Graph (sorry for the uninspired naming) which handles the graph:
class Graph
{
private:
    MyGraph *graph;
    property_map<MyGraph, vertex_index_t>::type indexmap;
    property_map<MyGraph, vertex_distance_t>::type distancemap;
    property_map<MyGraph, edge_weight_t>::type weightmap;
    property_map<MyGraph, x_t>::type xmap;
    property_map<MyGraph, y_t>::type ymap;
    std::vector<MyGraph::vertex_descriptor> predecessors;
public:
    Graph();
    ~Graph();
};
Creating a new graph with 262144 vertices is pretty fast, but inserting the edges takes up to 10 seconds, which is way too slow for the desired application. Right now, I'm inserting the edges the following way:
tie(vertexIt, vertexEnd) = vertices(*graph);
for(; vertexIt != vertexEnd; vertexIt++){
    vertexID = *vertexIt;
    x = vertexID % 512;
    y = (vertexID - x) / 512;
    xmap[vertexID] = x;
    ymap[vertexID] = y;
    if(y > 0){
        if(x > 0){
            tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x-1)], *graph); // upper left neighbour
        }
        tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x)], *graph); // upper
        if(x < 511){
            tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x+1)], *graph); // upper right
        }
    }
    if(x < 511){
        tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y)+(x+1)], *graph); // right
    }
    if(y < 511){
        if(x > 0){
            tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x-1)], *graph); // lower left
        }
        tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x)], *graph); // lower
        if(x < 511){
            tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x+1)], *graph); // lower right
        }
    }
    if(x > 0){
        tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y)+(x-1)], *graph); // left
    }
}
Is there anything I can do to improve the speed of the program? I'm using Microsoft Visual C++ 2010 Express in release mode with optimization (as recommended by Boost). I thought I could use a listS container for the vertices or edges, but the vertices are not the problem, and if I use listS for the edges, it gets even slower.
adjacency_list is very general purpose; unfortunately it's never going to be as efficient as a solution exploiting the regularity of your particular use-case could be. BGL isn't magic.
Your best bet is probably to come up with the efficient graph representation you'd use in the absence of BGL (hint: for a graph of an image's neighbouring pixels, this is not going to explicitly allocate all those node and edge objects) and then fit BGL to it (example), or equivalently just directly implement a counterpart to the existing adjacency_list / adjacency_matrix templates (concept guidelines) tuned to the regularities of your system.
By an optimised representation, I of course mean one in which you don't actually store all the nodes and edges explicitly but just have some way of iterating over enumerations of the implicit nodes and edges arising from the fact that the image is a particular size. The only thing you should really need to store is an array of edge weights.
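A minimal sketch of such an implicit representation, assuming an 8-connected W x H pixel grid: vertices are pixel indices y*W + x, neighbours are computed on the fly from fixed offsets, and the only stored data is one weight per (pixel, direction) slot, so edges (a,b) and (b,a) can still have different costs. ImplicitGrid is a hypothetical name, and this is plain C++, not a BGL adapter; using it with BGL algorithms would additionally need the graph_traits specialization and free functions described in the BGL concept guidelines.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

class ImplicitGrid {
public:
    ImplicitGrid(int width, int height)
        : w_(width), h_(height),
          weights_(static_cast<std::size_t>(width) * height * 8, 1.0) {}

    int numVertices() const { return w_ * h_; }

    // Calls f(neighbour, weight) for each of the up to 8 neighbours of pixel v.
    template <class F>
    void forEachNeighbour(int v, F f) const {
        const int x = v % w_, y = v / w_;
        for (int d = 0; d < 8; ++d) {
            const int nx = x + kDx[d], ny = y + kDy[d];
            if (nx < 0 || ny < 0 || nx >= w_ || ny >= h_) continue;  // off the image
            f(ny * w_ + nx, weights_[static_cast<std::size_t>(v) * 8 + d]);
        }
    }

    // Directed cost of the edge leaving pixel v in direction d (0..7).
    double& weight(int v, int d) {
        assert(d >= 0 && d < 8);
        return weights_[static_cast<std::size_t>(v) * 8 + d];
    }

private:
    // Directions: upper left, upper, upper right, right, lower right, lower, lower left, left.
    static constexpr std::array<int, 8> kDx{-1, 0, 1, 1, 1, 0, -1, -1};
    static constexpr std::array<int, 8> kDy{-1, -1, -1, 0, 1, 1, 1, 0};
    int w_, h_;
    std::vector<double> weights_;
};
```

Construction is a single allocation. The weight array has 8 slots per pixel, slightly more than the 2091012 directed edges of a 512 x 512 image, because border pixels leave some slots unused.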

Fast algorithm for counting the number of acyclic paths on a directed graph

In short, I need a fast algorithm to count how many acyclic paths are there in a simple directed graph.
By simple graph I mean one without self loops or multiple edges.
A path can start from any node and must end on a node that has no outgoing edges. A path is acyclic if no edge occurs twice in it.
My graphs (empirical datasets) have only between 20 and 160 nodes; however, some of them have many cycles, so there will be a very large number of paths, and my naive approach is simply not fast enough for some of the graphs I have.
What I'm doing currently is "descending" along all possible edges using a recursive function, while keeping track of which nodes I have already visited (and avoiding them). The fastest solution I have so far was written in C++, and uses std::bitset argument in the recursive function to keep track of which nodes were already visited (visited nodes are marked by bit 1). This program runs on the sample dataset in 1-2 minutes (depending on computer speed). With other datasets it takes more than a day to run, or apparently much longer.
The sample dataset: http://pastie.org/1763781
(each line is an edge-pair)
Solution for the sample dataset (first number is the node I'm starting from, second number is the path-count starting from that node, last number is the total path count):
http://pastie.org/1763790
Please let me know if you have ideas about algorithms with a better complexity. I'm also interested in approximate solutions (estimating the number of paths with some Monte Carlo approach). Eventually I'll also want to measure the average path length.
Edit: also posted on MathOverflow under same title, as it might be more relevant there. Hope this is not against the rules. Can't link as site won't allow more than 2 links ...
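For the Monte Carlo estimate, the simplest candidate I know of is Knuth's random-probe estimator for backtracking trees: walk randomly from the start node, at each step choosing uniformly among the successors not yet on the path, and multiply together the number of choices seen along the way. Following the question's rules (no node repeated, the path must end at a node with no outgoing edges), the product is returned when the walk reaches such a node and 0 is returned when it gets stuck. The average over many walks is an unbiased estimate of the count from that start node, though its variance can be very large on skewed graphs. A sketch under those assumptions; zero-length paths are not counted, and probe and estimatePaths are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <vector>

// One random probe: returns the product of branching factors if the walk ends
// at a sink (out-degree 0), or 0 if it gets stuck on already-visited nodes.
double probe(const std::vector<std::vector<int>>& adj, int start, std::mt19937& rng) {
    if (adj[start].empty()) return 0.0;  // zero-length paths are not counted
    std::vector<char> visited(adj.size(), 0);
    std::vector<int> choices;
    double weight = 1.0;
    int v = start;
    visited[v] = 1;
    while (!adj[v].empty()) {
        choices.clear();
        for (int w : adj[v])
            if (!visited[w]) choices.push_back(w);
        if (choices.empty()) return 0.0;  // dead end, not a counted path
        weight *= static_cast<double>(choices.size());
        std::uniform_int_distribution<std::size_t> pick(0, choices.size() - 1);
        v = choices[pick(rng)];
        visited[v] = 1;
    }
    return weight;  // reached a sink
}

// Average of `samples` probes: an unbiased estimate of the number of
// acyclic paths from `start` that end at a sink.
double estimatePaths(const std::vector<std::vector<int>>& adj, int start,
                     int samples, unsigned seed = 12345) {
    assert(samples > 0);
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < samples; ++i) sum += probe(adj, start, rng);
    return sum / samples;
}
```

Summing estimatePaths over all start nodes estimates the total; lengths of the sampled paths, weighted the same way, would give an estimate for the average path length.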
This is #P-complete, it seems (ref http://www.maths.uq.edu.au/~kroese/ps/robkro_rev.pdf). The linked paper also gives an approximation.
If you can relax the simple path requirement, you can efficiently count the number of paths using a modified version of Floyd-Warshall or graph exponentiation as well. See All pairs all paths on a graph
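Relaxing the simple-path requirement turns the problem into counting walks, which is easy: entry (i, j) of the k-th power of the adjacency matrix A is the number of walks of length k from i to j. On a graph with cycles the total over all lengths is infinite, so a length cap is needed. A minimal sketch, assuming a cap maxLen and using repeated multiplication by A (Matrix, multiply and countWalks are hypothetical names); for large caps, repeated squaring is the faster form of the graph-exponentiation idea.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<unsigned long long>>;

Matrix multiply(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    Matrix c(n, std::vector<unsigned long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            if (a[i][k] != 0)
                for (std::size_t j = 0; j < n; ++j) c[i][j] += a[i][k] * b[k][j];
    return c;
}

// total[i][j] = number of walks (vertices may repeat) of length 1..maxLen from i to j.
// Counts grow exponentially and will overflow for long walks on dense graphs.
Matrix countWalks(const Matrix& adjacency, int maxLen) {
    const std::size_t n = adjacency.size();
    for (const auto& row : adjacency) assert(row.size() == n);
    Matrix power = adjacency;  // A^1
    Matrix total(n, std::vector<unsigned long long>(n, 0));
    for (int len = 1; len <= maxLen; ++len) {
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) total[i][j] += power[i][j];
        if (len < maxLen) power = multiply(power, adjacency);  // A^(len+1)
    }
    return total;
}
```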
As mentioned by spinning_plate, this problem is #P-complete, so start looking for your approximations :). I really like the #P-completeness proof for this problem, so I thought it would be nice to share it:
Let N be the number of paths (starting at s) in the graph and p_k be the number of paths of length k. We have:
N = p_1 + p_2 + ... + p_n
Now build a second graph by replacing every edge with a pair of parallel edges. For each path of length k there will now be 2^k paths, so:
N_2 = p_1*2 + p_2*4 + ... + p_n*(2^n)
Repeating this process, but with i parallel edges instead of 2, for every i up to n, gives a linear system (with a Vandermonde-type matrix) that lets us solve for p_1, ..., p_n:
N_i = p_1*i + p_2*(i^2) + ... + p_n*(i^n)
Therefore, finding the number of paths in the graph is just as hard as finding the number of paths of a certain length. In particular, p_n is the number of Hamiltonian Paths (starting at s), a bona-fide #P-complete problem.
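Written out as a matrix equation, the system from the reduction is as follows, taking N_1 = N (the original graph, i = 1):

$$
\begin{pmatrix} N_1 \\ N_2 \\ \vdots \\ N_n \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
2 & 2^2 & \cdots & 2^n \\
\vdots & \vdots & \ddots & \vdots \\
n & n^2 & \cdots & n^n
\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix},
\qquad
\det = n! \prod_{1 \le i < j \le n} (j - i) \neq 0 .
$$

Row i is i times the row (1, i, ..., i^(n-1)) of a Vandermonde matrix in the distinct values 1, ..., n, so the determinant is nonzero and p_1, ..., p_n are uniquely determined by N_1, ..., N_n.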
I haven't done the math, but I'd also guess that a similar process could prove that just calculating the average length is also hard.
Note: most times this problem is discussed, the paths start from a single edge and stop wherever. This is the opposite of your problem, but the two should be equivalent by just reversing all the edges.
Importance of Problem Statement
It is unclear what is being counted.
Is the starting node set all nodes for which there is at least one outgoing edge, or is there a particular starting node criteria?
Is the ending node set the set of all nodes with zero outgoing edges, or can any node with at least one incoming edge be a possible ending node?
Define your problem so that there are no ambiguities.
Estimation
Estimations can be off by orders of magnitude when designed for randomly constructed directed graphs and the graph is very statistically skewed or systematic in its construction. This is typical of all estimation processes, but particularly pronounced in graphs because of their exponential pattern complexity potential.
Two Optimizing Points
The std::bitset model will be slower than bool values on most processor architectures because of the instruction-level cost of testing a bit at a particular bit offset. The bitset is more useful when memory footprint, not speed, is the critical factor.
Eliminating cases or reducing via deductions is important. For instance, if there are nodes for which there is only one outgoing edge, one can calculate the number of paths without it and add to the number of paths in the sub-graph the number of paths from the node from which it points.
Resorting to Clusters
The problem can be executed on a cluster by distributing according to starting node. Some problems simply require super-computing. If you have 1,000,000 starting nodes and 10 processors, you can place 100,000 starting node cases on each processor. The above case eliminations and reductions should be done prior to distributing cases.
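A minimal sketch of the exact count using the points above: visited flags stored as one char per node rather than a std::bitset, and one independent count per starting node, so that starting nodes can be handed out to different processors. It follows the question's rules as I read them (no node repeated, the path ends at a node with no outgoing edges, at least one edge); extendCount, countFrom and countAll are hypothetical names, and the running time is still exponential in the worst case.

```cpp
#include <cassert>
#include <vector>

namespace {
// Counts acyclic paths from v that end at a sink; `visited` marks the current path.
unsigned long long extendCount(const std::vector<std::vector<int>>& adj, int v,
                               std::vector<char>& visited) {
    if (adj[v].empty()) return 1;  // reached a sink: one complete path
    unsigned long long count = 0;
    visited[v] = 1;
    for (int w : adj[v])
        if (!visited[w]) count += extendCount(adj, w, visited);
    visited[v] = 0;
    return count;
}
}  // namespace

// Paths starting at `start`; each start is independent, so calls can be distributed.
unsigned long long countFrom(const std::vector<std::vector<int>>& adj, int start) {
    assert(start >= 0 && start < static_cast<int>(adj.size()));
    if (adj[start].empty()) return 0;  // zero-length paths are not counted
    std::vector<char> visited(adj.size(), 0);
    return extendCount(adj, start, visited);
}

unsigned long long countAll(const std::vector<std::vector<int>>& adj) {
    unsigned long long total = 0;
    for (int s = 0; s < static_cast<int>(adj.size()); ++s) total += countFrom(adj, s);
    return total;
}
```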
A Typical Depth First Recursion and How to Optimize It
Here is a small program that provides a basic depth first, acyclic traversal from any node to any node, which can be altered, placed in a loop, or distributed. The list can be placed into a static native array by using a template with a size as one parameter if the maximum data set size is known, which reduces iteration and indexing times.
#include <iostream>
#include <list>

class DirectedGraph {
private:
    int miNodes;
    std::list<int> * mnpEdges;
    bool * mpVisitedFlags;

private:
    void initAlreadyVisited() {
        for (int i = 0; i < miNodes; ++ i)
            mpVisitedFlags[i] = false;
    }

    void recurse(int iCurrent, int iDestination,
                 int path[], int index,
                 std::list<std::list<int> *> * pnai) {
        mpVisitedFlags[iCurrent] = true;
        path[index ++] = iCurrent;
        if (iCurrent == iDestination) {
            auto pni = new std::list<int>;
            for (int i = 0; i < index; ++ i)
                pni->push_back(path[i]);
            pnai->push_back(pni);
        } else {
            auto it = mnpEdges[iCurrent].begin();
            auto itBeyond = mnpEdges[iCurrent].end();
            while (it != itBeyond) {
                if (! mpVisitedFlags[* it])
                    recurse(* it, iDestination,
                            path, index, pnai);
                ++ it;
            }
        }
        -- index;
        mpVisitedFlags[iCurrent] = false;
    }

public:
    DirectedGraph(int iNodes) {
        miNodes = iNodes;
        mnpEdges = new std::list<int>[iNodes];
        mpVisitedFlags = new bool[iNodes];
    }

    // The class owns raw arrays, so copying is disabled to avoid double deletes.
    DirectedGraph(const DirectedGraph &) = delete;
    DirectedGraph & operator=(const DirectedGraph &) = delete;

    ~DirectedGraph() {
        // Arrays allocated with new[] must be released with delete[].
        delete[] mnpEdges;
        delete[] mpVisitedFlags;
    }

    void addEdge(int u, int v) {
        mnpEdges[u].push_back(v);
    }

    std::list<std::list<int> *> * findPaths(int iStart,
                                            int iDestination) {
        initAlreadyVisited();
        auto path = new int[miNodes];
        auto pnpi = new std::list<std::list<int> *>();
        recurse(iStart, iDestination, path, 0, pnpi);
        delete[] path;
        return pnpi;
    }
};

int main() {
    DirectedGraph dg(5);
    dg.addEdge(0, 1);
    dg.addEdge(0, 2);
    dg.addEdge(0, 3);
    dg.addEdge(1, 3);
    dg.addEdge(1, 4);
    dg.addEdge(2, 0);
    dg.addEdge(2, 1);
    dg.addEdge(4, 1);
    dg.addEdge(4, 3);
    int startingNode = 0;
    int destinationNode = 1;
    auto pnai = dg.findPaths(startingNode, destinationNode);
    std::cout
        << "Unique paths from "
        << startingNode
        << " to "
        << destinationNode
        << std::endl
        << std::endl;
    bool bFirst;
    std::list<int> * pi;
    auto it = pnai->begin();
    auto itBeyond = pnai->end();
    std::list<int>::iterator itInner;
    std::list<int>::iterator itInnerBeyond;
    while (it != itBeyond) {
        bFirst = true;
        pi = * it ++;
        itInner = pi->begin();
        itInnerBeyond = pi->end();
        while (itInner != itInnerBeyond) {
            if (bFirst)
                bFirst = false;
            else
                std::cout << ' ';
            std::cout << (* itInner ++);
        }
        std::cout << std::endl;
        delete pi;
    }
    delete pnai;
    return 0;
}

Enumerate all paths in a weighted graph from A to B where path length is between C1 and C2

Given two points A and B in a weighted graph, find all paths from A to B where the length of the path is between C1 and C2.
Ideally, each vertex should only be visited once, although this is not a hard requirement. I suppose I could use a heuristic to sort the results of the algorithm to weed out "silly" paths (e.g. a path that just visits the same two nodes over and over again)
I can think of simple brute-force algorithms, but are there any more sophisticated algorithms that would make this more efficient? I can imagine that as the graph grows this could become expensive.
In the application I am developing, A & B are actually the same point (i.e. the path must return to the start), if that makes any difference.
Note that this is an engineering problem, not a computer science problem, so I can use an algorithm that is fast but not necessarily 100% accurate. i.e. it is ok if it returns most of the possible paths, or if most of the paths returned are within the given length range.
[UPDATE]
This is what I have so far. I have this working on a small graph (30 nodes with around 100 edges), where the time required is < 100 ms.
I am using a directed graph.
I do a depth first search of all possible paths.
At each new node
For each edge leaving the node
Reject the edge if the path we have already contains this edge (in other words, never go down the same edge in the same direction twice)
Reject the edge if it leads back to the node we just came from (in other words, never double back. This removes a lot of 'silly' paths)
Reject the edge if (minimum distance from the end node of the edge to the target node B + the distance travelled so far) > Maximum path length (C2)
If the end node of the edge is our target node B:
If the path fits within the length criteria, add it to the list of suitable paths.
Otherwise reject the edge (in other words, we only ever visit the target node B at the end of the path. It won't be an intermediate point on a path)
Otherwise, add the edge to our path and recurse into its target node
I use Dijkstra to precompute the minimum distance of all nodes to the target node.
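The search described in the update can be sketched directly. A minimal C++ version under these assumptions: a directed graph with non-negative weights, lower[v] holding the Dijkstra distance from v to B (computed on the reversed graph, which is what "minimum distance of all nodes to the target" requires), and the rejection rules applied as listed, with the pruning bound including the weight of the edge being considered. Arc, Digraph, shortestToTarget and pathsInRange are hypothetical names. Because A and B may coincide, the search starts by leaving A and accepts a path only when it arrives at B.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int to; double w; };
using Digraph = std::vector<std::vector<Arc>>;

// Dijkstra on the reversed graph: result[v] = shortest distance from v to target.
std::vector<double> shortestToTarget(const Digraph& g, int target) {
    const int n = static_cast<int>(g.size());
    Digraph rev(n);
    for (int u = 0; u < n; ++u)
        for (const Arc& a : g[u]) rev[a.to].push_back({u, a.w});
    std::vector<double> dist(n, std::numeric_limits<double>::infinity());
    using Item = std::pair<double, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[target] = 0.0;
    pq.push({0.0, target});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;
        for (const Arc& a : rev[u])
            if (d + a.w < dist[a.to]) { dist[a.to] = d + a.w; pq.push({dist[a.to], a.to}); }
    }
    return dist;
}

// All paths from `source` that end at `target` with total length in [c1, c2].
std::vector<std::vector<int>> pathsInRange(const Digraph& g, int source, int target,
                                           double c1, double c2) {
    assert(source >= 0 && source < static_cast<int>(g.size()));
    const std::vector<double> lower = shortestToTarget(g, target);
    std::vector<std::vector<int>> found;
    std::vector<int> path{source};
    std::vector<std::vector<char>> used(g.size());  // used[u][i]: arc i of u is on the path
    for (std::size_t u = 0; u < g.size(); ++u) used[u].assign(g[u].size(), 0);

    std::function<void(int, int, double)> dfs = [&](int u, int prev, double len) {
        for (std::size_t i = 0; i < g[u].size(); ++i) {
            const Arc& a = g[u][i];
            if (used[u][i]) continue;                    // same edge, same direction
            if (a.to == prev) continue;                  // never double back
            if (len + a.w + lower[a.to] > c2) continue;  // cannot finish within C2
            if (a.to == target) {                        // B only ever ends a path
                if (len + a.w >= c1) { path.push_back(a.to); found.push_back(path); path.pop_back(); }
                continue;
            }
            used[u][i] = 1;
            path.push_back(a.to);
            dfs(a.to, u, len + a.w);
            path.pop_back();
            used[u][i] = 0;
        }
    };
    dfs(source, -1, 0.0);
    return found;
}
```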
I wrote some Java code to test the DFS approach I suggested. The code does not check for paths in range, but prints all paths; it should be simple to modify it to only keep those in range. I also ran some simple tests. It seems to give correct results with 10 vertices and 50 edges or so, though I did not find time for any thorough testing. I also ran it for 100 vertices and 1000 edges. It doesn't run out of memory and keeps printing new paths until I kill it, of which there are a lot. This is not surprising for randomly generated dense graphs, but may not be the case for real-world graphs, for example where vertex degrees follow a power law (especially with narrow weight ranges). Also, if you are just interested in how path lengths are distributed in a range, you can stop once you have generated a certain number.
The program outputs the following:
a) the adjacency list of a randomly generated graph.
b) Set of all paths it has found till now.
// File: AllPaths.java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

public class AllPaths {
    int numOfVertices;
    int[] status;

    AllPaths(int numOfVertices){
        this.numOfVertices = numOfVertices;
        status = new int[numOfVertices+1];
    }

    HashMap<Integer,ArrayList<Integer>> adjList = new HashMap<Integer,ArrayList<Integer>>();

    class FoundSubpath{
        int pathWeight=0;
        int[] vertices;
    }

    // For each vertex, a list of all subpaths of length less than UB found.
    HashMap<Integer,ArrayList<FoundSubpath>> allSubpathsFromGivenVertex = new HashMap<Integer,ArrayList<FoundSubpath>>();

    public void printInputGraph(){
        System.out.println("Random Graph Adjacency List:");
        for(int i=1;i<=numOfVertices;i++){
            ArrayList<Integer> toVtcs = adjList.get(i);
            System.out.print(i+ " ");
            if(toVtcs!=null){
                for(int j=0;j<toVtcs.size();j++){
                    System.out.print(toVtcs.get(j)+ " ");
                }
            }
            System.out.println(" ");   // end the line even for vertices without out-edges
        }
    }

    public void randomlyGenerateGraph(int numOfTrials){
        Random rnd = new Random();
        for(int i=0;i < numOfTrials;i++){
            Integer fromVtx = rnd.nextInt(numOfVertices)+1;
            Integer toVtx = rnd.nextInt(numOfVertices)+1;
            if(fromVtx.equals(toVtx)){
                continue;
            }
            ArrayList<Integer> toVtcs = adjList.get(fromVtx);
            boolean alreadyAdded = false;
            if(toVtcs==null){
                toVtcs = new ArrayList<Integer>();
            }else{
                for(int j=0;j<toVtcs.size();j++){
                    if(toVtcs.get(j).equals(toVtx)){
                        alreadyAdded = true;
                        break;
                    }
                }
            }
            if(!alreadyAdded){
                toVtcs.add(toVtx);
                adjList.put(fromVtx, toVtcs);
            }
        }
    }

    public void addAllViableSubpathsToMap(ArrayList<Integer> VerticesTillNowInPath){
        FoundSubpath foundSpObj;
        ArrayList<FoundSubpath> foundPathsList;
        for(int i=0;i<VerticesTillNowInPath.size()-1;i++){
            Integer startVtx = VerticesTillNowInPath.get(i);
            if(allSubpathsFromGivenVertex.containsKey(startVtx)){
                foundPathsList = allSubpathsFromGivenVertex.get(startVtx);
            }else{
                foundPathsList = new ArrayList<FoundSubpath>();
            }
            foundSpObj = new FoundSubpath();
            foundSpObj.vertices = new int[VerticesTillNowInPath.size()-i-1];
            int cntr = 0;
            for(int j=i+1;j<VerticesTillNowInPath.size();j++){
                foundSpObj.vertices[cntr++] = VerticesTillNowInPath.get(j);
            }
            foundPathsList.add(foundSpObj);
            allSubpathsFromGivenVertex.put(startVtx,foundPathsList);
        }
    }

    public void printViablePaths(Integer v, ArrayList<Integer> VerticesTillNowInPath){
        ArrayList<FoundSubpath> foundPathsList;
        foundPathsList = allSubpathsFromGivenVertex.get(v);
        if(foundPathsList==null){
            return;
        }
        for(int j=0;j<foundPathsList.size();j++){
            for(int i=0;i<VerticesTillNowInPath.size();i++){
                System.out.print(VerticesTillNowInPath.get(i)+ " ");
            }
            FoundSubpath fpObj = foundPathsList.get(j);
            for(int k=0;k<fpObj.vertices.length;k++){
                System.out.print(fpObj.vertices[k]+" ");
            }
            System.out.println("");
        }
    }

    boolean DfsModified(Integer v, ArrayList<Integer> VerticesTillNowInPath, Integer source, Integer dest){
        if(v.equals(dest)){
            addAllViableSubpathsToMap(VerticesTillNowInPath);
            status[v] = 2;
            return true;
        }
        // If vertex v is already explored till destination, just print all subpaths that meet criteria, using hashmap.
        if(status[v] == 1 || status[v] == 2){
            printViablePaths(v,VerticesTillNowInPath);
        }
        // Vertex in current path. Return to avoid cycle.
        if(status[v]==1){
            return false;
        }
        if(status[v]==2){
            return true;
        }
        status[v] = 1;
        boolean completed = true;
        ArrayList<Integer> toVtcs = adjList.get(v);
        if(toVtcs==null){
            status[v] = 2;
            return true;
        }
        for(int i=0;i<toVtcs.size();i++){
            Integer vDest = toVtcs.get(i);
            VerticesTillNowInPath.add(vDest);
            boolean explorationComplete = DfsModified(vDest,VerticesTillNowInPath,source,dest);
            if(explorationComplete==false){
                completed = false;
            }
            VerticesTillNowInPath.remove(VerticesTillNowInPath.size()-1);
        }
        if(completed){
            status[v] = 2;
        }else{
            status[v] = 0;
        }
        return completed;
    }
}

// File: AllPathsCaller.java
import java.util.ArrayList;

public class AllPathsCaller {
    public static void main(String[] args){
        int numOfVertices = 20;
        /* This is the number of attempts made to create an edge. The edge is usually created but may not be (e.g. if an edge already exists between the randomly chosen source and destination). */
        int numOfEdges = 200;
        int src = 1;
        int dest = 10;
        AllPaths allPaths = new AllPaths(numOfVertices);
        allPaths.randomlyGenerateGraph(numOfEdges);
        allPaths.printInputGraph();
        ArrayList<Integer> VerticesTillNowInPath = new ArrayList<Integer>();
        VerticesTillNowInPath.add(src);
        System.out.println("List of Paths");
        allPaths.DfsModified(src, VerticesTillNowInPath, src, dest);
        System.out.println("done");
    }
}
I think you are on the right track with BFS. I came up with some rough vaguely java-like pseudo-code for a proposed solution using BFS. The idea is to store subpaths found during previous traversals, and their lengths, for reuse. I'll try to improve the code sometime today when I find the time, but hopefully it gives a clue as to where I am going with this. The complexity, I am guessing, should be order O(E).
Further comments:
This seems like a reasonable approach, though I am not sure I understand it completely. I've constructed a simple example to make sure I do. Let's consider a simple graph with all edges weighted 1, and the following adjacency list representation:
A->B,C
B->C
C->D,F
F->D
Say we wanted to find all paths from A to F, not just those in range, and destination vertices from a source vertex are explored in alphabetic order. Then the algorithm would work as follows:
First starting with B:
ABCDF
ABCF
Then starting with C:
ACDF
ACF
Is that correct?
A simple improvement in that case, would be to store for each vertex visited, the paths found after the first visit to that node. For example, in this example, once you visit C from B, you find that there are two paths to F from C: CF and CDF. You can save this information, and in the next iteration once you reach C, you can just append CF and CDF to the path you have found, and won't need to explore further.
To find edges in range, you can use the conditions you already described for paths generated as above.
A further thought: maybe you do not need to run Dijkstra's to find shortest paths at all. A subpath's length will be found the first time you traverse the subpath. So, in this example, the length of CDF and CF the first time you visit C via B. This information can be used for pruning the next time C is visited directly via A. This length will be more accurate than that found by Dijkstra's, as it would be the exact value, not the lower bound.
Further comments:
The algorithm can probably be improved with some thought. For example, each time the relaxation step is executed in Dijkstra's algorithm (steps 16-19 in the Wikipedia description), the rejected older/newer subpath can be remembered using some data structure, if the older path is a plausible candidate (less than the upper bound). In the end, it should be possible to reconstruct all the rejected paths and keep the ones in range.
This algorithm should be O(V^2).
I think visiting each vertex only once may be too optimistic: algorithms such as Dijkstra's shortest path have complexity O(V^2) for finding a single path, the shortest one. Finding all paths (including the shortest path) is a harder problem, so it should have complexity at least O(V^2).
My first thought on approaching the problem is a variation of Dijkstra's shortest path algorithm. Applying this algorithm once would give you the length of the shortest path. This gives you a lower bound on the path length between the two vertices. Removing one edge at a time from this shortest path and recalculating the shortest path should give you slightly longer paths.
In turn, edges can be removed from these slightly longer paths to generate more paths, and so on. You can stop once you have a sufficient number of paths, or if the paths you generate are over your upper bound.
This is my first guess. I am a newbie to stackoverflow: any feedback is welcome.
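A minimal sketch of that edge-removal idea, assuming an undirected graph given as an edge list with non-negative weights: compute the shortest A-B path with Dijkstra, then for each edge on it recompute the shortest path with just that edge removed. This produces one round of slightly longer candidates; repeating it on the new paths, as described above, and dropping duplicates would generate more. It resembles, in a much simplified form, the deviation idea behind k-shortest-path algorithms such as Yen's, but it is not an implementation of those; UEdge, Path, shortestPath and alternatives are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <set>
#include <utility>
#include <vector>

struct UEdge { int u, v; double w; };
struct Path { std::vector<int> vertices; std::vector<int> edgeIds; double length = 0.0; };

// Dijkstra over the edge list, ignoring edge index `banned` (-1 = none).
// Returns an empty Path if t is unreachable from s.
Path shortestPath(int n, const std::vector<UEdge>& edges, int s, int t, int banned) {
    std::vector<std::vector<std::pair<int, int>>> adj(n);  // (neighbour, edge index)
    for (int i = 0; i < static_cast<int>(edges.size()); ++i) {
        assert(edges[i].w >= 0.0);
        if (i == banned) continue;
        adj[edges[i].u].push_back({edges[i].v, i});
        adj[edges[i].v].push_back({edges[i].u, i});
    }
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n, inf);
    std::vector<int> predEdge(n, -1);
    using Item = std::pair<double, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[s] = 0.0;
    pq.push({0.0, s});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;
        for (auto [v, i] : adj[u])
            if (d + edges[i].w < dist[v]) { dist[v] = d + edges[i].w; predEdge[v] = i; pq.push({dist[v], v}); }
    }
    Path p;
    if (dist[t] == inf) return p;
    p.length = dist[t];
    int v = t;
    p.vertices.push_back(v);
    while (v != s) {  // walk back along predecessor edges
        const UEdge& e = edges[predEdge[v]];
        p.edgeIds.push_back(predEdge[v]);
        v = (e.u == v) ? e.v : e.u;
        p.vertices.push_back(v);
    }
    std::reverse(p.vertices.begin(), p.vertices.end());
    std::reverse(p.edgeIds.begin(), p.edgeIds.end());
    return p;
}

// The shortest path plus one alternative per removed edge of it (duplicates dropped).
std::vector<Path> alternatives(int n, const std::vector<UEdge>& edges, int s, int t) {
    std::vector<Path> result;
    Path best = shortestPath(n, edges, s, t, -1);
    if (best.vertices.empty()) return result;
    result.push_back(best);
    std::set<std::vector<int>> seen;
    seen.insert(best.vertices);
    for (int banned : best.edgeIds) {
        Path alt = shortestPath(n, edges, s, t, banned);
        if (!alt.vertices.empty() && seen.insert(alt.vertices).second) result.push_back(alt);
    }
    return result;
}
```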

Resources