Just for "fun", I'm implementing almost every algorithm (where possible) from the book Introduction to Algorithms (Cormen) in C. I've reached the graph chapters and I'm not sure how to design my functions, but first take a look at my data structures (I hope this makes my question clear).
typedef struct s_multi_matrix
{
int width;
int height;
int depth;
multi_matrix_type type; // stores either MMTYPE_INT or MMTYPE_RECORD enums to know where to read values (ivals or rvals)
int * ivals;
struct s_record * rvals; // DT_record is only typedef'd further down, so the struct tag is used here
} DT_multi_matrix;
typedef struct s_double_linked_list
{
struct s_record * sentinel; // DT_record is only typedef'd further down, so the struct tag is used here
} DT_double_linked_list;
typedef struct s_record // multi purpose structure
{
int key;
enum e_record_type type; // stores either TYPE_INT, TYPE_SZ, TYPE_FLOAT, TYPE_VOID to know how to use the data union
union
{
int ival;
char * sval;
float fval;
void * vval;
} data;
struct s_record * left, // for trees, disjoint sets and linked lists
* right,
* parent;
} DT_record;
typedef struct s_graph // this is the structure I'm focusing on right now
{
graph_type type; // GRAPH_DIRECTED or GRAPH_UNDIRECTED
graph_adj_type adj_type; // ADJ_LIST or ADJ_MATRIX
edge_type etype; // WEIGHTED or NOT_WEIGHTED
union
{
DT_double_linked_list * list;
DT_multi_matrix * matrix;
} adjacency;
} DT_graph;
So, I'm thinking about several functions to manipulate the DT_graph type:
// takes a pointer to a pointer to a graph, allocates memory and sets properties
void build_graph(DT_graph ** graph_ptr, graph_type gtype, graph_adj_type atype);
// prints the graph in file (using graphviz)
void print_graph(DT_graph * graph, char * graph_name);
This is the tricky part. Since my graph type has several type combinations (undirected with weighted edges, directed with weighted edges, undirected with unweighted edges, directed with unweighted edges, ...), I'm wondering which is the best approach for the functions:
void dgraph_add_wedge(DT_graph * graph, DT_record * u, DT_record * v, int weight);
void ugraph_add_wedge(DT_graph * graph, DT_record * u, DT_record * v, int weight);
void dgraph_add_nwedge(DT_graph * graph, DT_record * u, DT_record * v);
void ugraph_add_nwedge(DT_graph * graph, DT_record * u, DT_record * v);
The first two would add a weighted edge to a directed / undirected graph, and the last two would do the same thing but without any weight attached to the edge.
The other approach that comes to my mind looks like this:
void graph_add_edge(DT_graph * graph, DT_record * u, DT_record * v, edge_type etype, graph_type gtype);
This seems like a "golden" method for everything, and depending on the values for etype and gtype would do different operations on the graph.
Soooooooo, based upon your experiences and knowledge, what do you recommend?
BTW, I'm sure that no one has asked this before, since this is MY implementation.
Pity this is plain C. With C++, several of these questions would be handled by the polymorphic features of the language. On the other hand, studying algorithms this way makes you focus on the algorithm/data structure proper rather than on some slick features of the language.
Anyway... My two cents:
With regard to selecting the d or u graph type: why not add an attribute to DT_graph that tells the called method(s) the graph type? After all, this is specified when the graph is created. ==> Bam! We're down to only 2 methods.
With regard to the edge weights... Maybe having two methods is preferable, API-wise: why bother the unweighted cases with the extra argument? Implementation-wise you can of course share as much of the logic as possible between all 4 cases. And frankly, once you've written all this, you can still put these behind a facade in the form of a single "golden" method like the one you suggested.
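A minimal sketch of that layout, written in C++ with deliberately simplified, hypothetical stand-ins for the question's types (a `Graph` struct holding a flat adjacency matrix instead of `DT_graph`, integer vertex indices instead of `DT_record` pointers). It only illustrates the idea: the graph remembers its own type, the two public functions differ only in the weight argument, and all four cases share one helper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the question's enums and DT_graph.
enum graph_type { GRAPH_DIRECTED, GRAPH_UNDIRECTED };
enum edge_type { WEIGHTED, NOT_WEIGHTED };

struct Graph {
    graph_type type;    // fixed once, when the graph is built
    edge_type etype;    // fixed once, when the graph is built
    int n;              // number of vertices
    std::vector<int> w; // n*n adjacency matrix, 0 means "no edge"
};

Graph build_graph(int n, graph_type gtype, edge_type etype) {
    return Graph{gtype, etype, n, std::vector<int>(n * n, 0)};
}

// Shared logic for all four combinations: the graph itself knows
// whether the reverse direction must be stored as well.
static void set_edge(Graph &g, int u, int v, int weight) {
    assert(u >= 0 && u < g.n && v >= 0 && v < g.n);
    g.w[u * g.n + v] = weight;
    if (g.type == GRAPH_UNDIRECTED)
        g.w[v * g.n + u] = weight;
}

// Weighted variant: the caller supplies the weight.
void graph_add_wedge(Graph &g, int u, int v, int weight) {
    assert(g.etype == WEIGHTED);
    set_edge(g, u, v, weight);
}

// Unweighted variant: no extra argument; 1 is stored as a marker.
void graph_add_nwedge(Graph &g, int u, int v) {
    assert(g.etype == NOT_WEIGHTED);
    set_edge(g, u, v, 1);
}
```

Calling `graph_add_wedge(g, 0, 2, 7)` on an undirected graph fills both matrix cells, while `graph_add_nwedge(d, 1, 2)` on a directed graph fills only one. A single "golden" function could still be layered on top by dispatching on `g.etype`.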
Good luck with your coding!
The question is as follows:
Find an algorithm that gets a pointer to a binary tree (to the beginning of it) and returns the number of leaves that are in an even depth (level).
In this example, the algorithm will return 2 because we won't count the leaf in level 1 (since 1 is not even).
I guess I need a recursive algorithm. It's pretty easy if I pass two parameters to the function (a pointer to the tree and the level).
I'm wondering if I can solve it with passing the pointer only, without the level.
Consider a function f which recursively descends your tree. You have to differentiate three cases:
Your current node has no children and its depth is even. You return 1.
Your current node has no children and its depth is odd. You return 0.
Your current node has children. You return the sum of all recursive calls of f on these children.
You have to define f on your own.
And no, it is not possible to define f with only one parameter. You have to memorize the current node as well as the actual depth. Recursive Algorithms, by their very nature, have no idea from where they are being called. You can, of course (but not recommended) remember the latter in a static variable as long as you do not parallelize f.
Also, you can wrap f in a function that takes only one parameter and calls the two-parameter f with the current depth set to 0.
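The three cases and the one-parameter wrapper could look roughly like this. The `Node` type and the name `countEvenLeaves` are made up for illustration, and the sketch assumes the root sits at depth 0, as in the wrapper described above.

```cpp
#include <cassert>

// Hypothetical minimal node type; only the child pointers matter here.
struct Node {
    Node *left = nullptr;
    Node *right = nullptr;
};

// The two-parameter f: node plus the depth of that node.
int countEvenLeaves(const Node *node, int depth) {
    assert(depth >= 0);
    if (!node)
        return 0;                        // empty subtree contributes nothing
    if (!node->left && !node->right)
        return depth % 2 == 0 ? 1 : 0;   // leaf: count it only at even depth
    return countEvenLeaves(node->left, depth + 1)
         + countEvenLeaves(node->right, depth + 1);
}

// One-parameter wrapper: starts the recursion with the root at depth 0.
int countEvenLeaves(const Node *root) {
    return countEvenLeaves(root, 0);
}
```

If the levels are meant to start at 1 instead, the wrapper would pass 1 as the initial depth; nothing else changes.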
You can, indeed, solve it using only one parameter. However, in that case you need two little helper functions:
typedef struct TreeNode
{
int val;
struct TreeNode *left;
struct TreeNode *right;
} TreeNode;
int countForOdd(TreeNode*);
int countForEven(TreeNode*);
int count(TreeNode*);
//If the TreeNode passed as parameter is at an odd level, call this function
int countForOdd(TreeNode *node)
{
if(!node) return 0;
return countForEven(node->left)
+ countForEven(node->right);
}
//If the TreeNode passed as parameter is at an even level, call this function
int countForEven(TreeNode *node)
{
if(!node) return 0;
if(!node->left && !node->right) return 1; //only a leaf at an even level is counted
return countForOdd(node->left)
+ countForOdd(node->right);
}
//And finally, our specific function for root is:
int count(TreeNode* root)
{
return countForOdd(root);
}
I have been assigned a project where I must take in a bunch of nodes, as well as edges with weights between certain nodes.
I must then use this information to find a minimum spanning tree for each connected component of the graph (so if the graph has two connected components, I need to create two spanning trees).
The catch is I cannot use any STL libraries except for .
I know I will need to create my own data structures, but I don't know which ones I will need. I suppose a min-heap would be useful for finding the lowest-weight edges to use, but how would I go about creating a min-heap for each connected component?
I was also thinking I need to implement union-find in order to organize the sets of connected components.
What other data structures would I need to implement for this?
For union-find you need to implement a disjoint-set structure.
Here is a simple implementation using plain arrays. Have a look:
// Disjoint Set implementation
// Shashank Jain
#include <iostream>
#define LL long long int
#define LIM 100005
using namespace std;
int p[LIM], n;   // p is for parent
int rnk[LIM];    // union-by-rank bound; not called "rank" because that clashes with std::rank under "using namespace std"
// Put every vertex 1..n into its own singleton set
void create_set()
{
    for (int i = 1; i <= n; i++)
    {
        p[i] = i;
        rnk[i] = 0;
    }
}
// Return the representative of x's set, compressing the path on the way back
int find_set(int x)
{
    if (x == p[x])
        return x;
    else
    {
        p[x] = find_set(p[x]);
        return p[x];
    }
}
// Union by rank: hang the shallower tree under the deeper one
void merge_sets(int x, int y)
{
    int px, py;
    px = find_set(x);
    py = find_set(y);
    if (px == py)
        return;   // already in the same set; do not bump the rank
    if (rnk[px] > rnk[py])
        p[py] = px;
    else if (rnk[py] > rnk[px])
        p[px] = py;
    else
    {
        p[px] = py;
        rnk[py]++;
    }
}
int main()
{
    cin >> n;   // number of vertices, numbered from 1 to n (must be below LIM)
    create_set();
    int a, b, q, i;
    cin >> q;   // number of merge queries
    while (q--)
    {
        cin >> a >> b;
        merge_sets(a, b);
    }
    for (i = 1; i <= n; i++)
    {
        cout << find_set(i) << endl;   // vertices with the same representative are in the same subset
    }
    return 0;
}
I'm going to assume that you can choose your MST algorithm and that the output is a list of edges. Borůvka's algorithm is simple to implement and needs no data structures other than the graph and a disjoint set structure. By contrast, Prim's algorithm requires a priority queue and some logic to handle disconnected graphs, and Kruskal's algorithm requires a disjoint set structure and a sorting algorithm. I would set up the data structures like this. There is an adjacency record for each incident vertex-edge pair.
struct Adjacency;
struct Edge {
int weight;
};
struct Vertex {
struct Adjacency *listhead; // singly-linked list of adjacencies
struct Vertex *parent; // union-find parent
};
struct Adjacency {
struct Adjacency *listnext;
struct Edge *edge;
struct Vertex *endpoint; // the "other" endpoint
};
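As a rough illustration of why Borůvka needs so little machinery, here is a sketch that works on a plain edge list with an array-based union-find rather than on the pointer-based records above. The names `WEdge`, `findSet` and `boruvka` are made up for this example, and `std::vector` is used only for brevity; if it is off limits, raw arrays do the same job. Ties between equal weights are broken by edge index, which keeps the choice of "cheapest edge" consistent across components.

```cpp
#include <cassert>
#include <vector>

struct WEdge { int u, v, weight; };   // hypothetical edge record

// Union-find lookup with path halving.
static int findSet(std::vector<int> &parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Returns the indices (into 'edges') of a minimum spanning forest:
// one spanning tree per connected component.
std::vector<int> boruvka(int n, const std::vector<WEdge> &edges) {
    std::vector<int> parent(n), chosen;
    for (int i = 0; i < n; ++i) parent[i] = i;

    bool merged = true;
    while (merged) {
        merged = false;
        // best[r] = index of the cheapest edge leaving the component rooted at r
        std::vector<int> best(n, -1);
        for (int i = 0; i < static_cast<int>(edges.size()); ++i) {
            assert(edges[i].u >= 0 && edges[i].u < n);
            assert(edges[i].v >= 0 && edges[i].v < n);
            int a = findSet(parent, edges[i].u);
            int b = findSet(parent, edges[i].v);
            if (a == b) continue;   // both ends already in one component
            for (int r : {a, b})
                if (best[r] == -1 || edges[i].weight < edges[best[r]].weight)
                    best[r] = i;    // strict '<' keeps the lowest index on ties
        }
        // Add every selected edge that still joins two different components.
        for (int r = 0; r < n; ++r) {
            if (best[r] == -1) continue;
            int a = findSet(parent, edges[best[r]].u);
            int b = findSet(parent, edges[best[r]].v);
            if (a == b) continue;
            parent[a] = b;
            chosen.push_back(best[r]);
            merged = true;
        }
    }
    return chosen;
}
```

Because the loop simply stops when no component has an outgoing edge left, a disconnected input needs no special handling: afterwards, edges whose endpoints share a `findSet` root belong to the same tree.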
I have been working on my graph/network problem, and I think I finally know what I want to do. Now that I am getting into the implementation, I am having trouble deciding which libraries to use. The graph itself is pretty simple: each node is labeled by a string, and each edge is a probability/correlation coefficient between the two nodes (variables) and is undirected. The operations that I want to perform on the graph are:
Inserting new nodes/edges (fast)
Finding the all pairs shortest (1/probability) path, and remembering the nodes in the path - probably Johnson's algorithm
Constructing the minimum weight Steiner tree for k specific vertices
Use Johnson's algorithm to build shortest paths
Iterating over the current nodes in the path p, find the shortest route to the remaining nodes in k
Looking at the mean degree of the graph
Evaluating the betweenness of the nodes
Getting the clustering coefficients
Finding the modularity of the graph
For many of these, I want to compare the result to the Erdos-Renyi model, testing against it as a null hypothesis. Also, being able to use the statistical mechanics definitions via a Markov field would be helpful, as then I could calculate correlations between two nodes that are not identical, and ask the graph questions about the entropy, etc. So a good mapping onto a Markov field library of some sort would be useful too.
The crux of the problem at the moment is that I am trying to find a C++ library to work in. I have taken a look at R, but I want something that is going to be more robust and faster. The three libraries that I am considering are:
LEMON
Easy to use and install
Straightforward documentation
Has some of the functions I want already
Dynamically creating a graph from reading in a text file, and making sure there are no duplicate nodes, is a nightmare that I have not been able to figure out
Boost Graph Library
Intractable, arcane definitions for objects, and how to use them
Documentation does not match what the code does, necessarily
Does have many of the algorithms that I want, as well as a very easy way to create a graph from a text file
MultiThreaded Graph Library
Parallelism already incorporated
Reads easier than the BGL
Not as many functions
Still arcane
Further down the road, I envision the graph living on a distributed network, with distributed storage (hadoop or something). I suspect that the whole graph will not fit into memory, and so I will have to come up with a caching scenario to look at parts of the graph.
What library would people suggest for the problem that I described? Would it be better to just use the BGL, and write my own functions? What about the multi-threaded version? Are there any libraries that lend themselves more readily to the type of work I want to do, especially the quantities I want to compute?
Original Post
Thanks!
Edit1
So I am seriously frustrated by the BGL. I have an adjacency list graph, and I want to run my own version of the Johnson's (or Floyd's, at this point, I am not picky) on the graph, and return the Distance Matrix for me to look at. Except that I can't get it to work. Here is my full code implementation thus far:
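#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <boost/graph/adjacency_list.hpp>
#include <boost/tokenizer.hpp>
#include <boost/tuple/tuple.hpp>
using namespace boost;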
int main()
{
//Read in the file
std::ifstream datafile("stuff");
if (!datafile)
{
std::cerr << "No Stuff file" << std::endl;
return EXIT_FAILURE;
}
//Build the graph
typedef adjacency_list < vecS, vecS, undirectedS, property < vertex_name_t,
std::string >, property < edge_weight_t, double > > Graph;
Graph g;
//Build the two properties we want, string and double
//Note, you have to nest properties for more
typedef property_map< Graph, vertex_index_t >::type vertex_index_map_t;
vertex_index_map_t vertex_index_map = get(vertex_index, g);
typedef property_map < Graph, vertex_name_t >::type name_map_t;
name_map_t name_map = get(vertex_name, g);
typedef property_map < Graph, edge_weight_t >::type probability_map_t;
probability_map_t probability = get(edge_weight, g);
//Map of the vertices by string
typedef graph_traits < Graph >::vertex_descriptor Vertex;
typedef std::map < std::string, Vertex > NameVertexMap;
NameVertexMap AllNodes;
//Load the file into the graph
for (std::string line; std::getline(datafile, line);)
{
char_delimiters_separator < char >sep(false, "", ";");
tokenizer <> line_toks(line, sep);
tokenizer <>::iterator i = line_toks.begin();
std::string conditionA = *i++;
NameVertexMap::iterator pos;
bool inserted;
Vertex u, v;
boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(conditionA, Vertex()));
if (inserted)
{
u = add_vertex(g);
name_map[u] = conditionA;
pos->second = u;
}
else
{
u = pos->second;
}
std::string correlation = *i++;
std::istringstream incorrelation(correlation);
double correlate;
incorrelation >> correlate;
boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(*i, Vertex()));
if (inserted) {
v = add_vertex(g);
name_map[v] = *i;
pos->second = v;
}
else
{
v = pos->second;
}
graph_traits < Graph >::edge_descriptor e;
boost::tie(e, inserted) = add_edge(u, v, g);
if (inserted)
probability[e] = 1.0/correlate;
}
typedef boost::graph_traits<Graph>::edge_iterator edge_iter;
std::pair<edge_iter, edge_iter> edgePair;
Vertex u, v;
for(edgePair = edges(g); edgePair.first != edgePair.second; ++edgePair.first)
{
u = source(*edgePair.first, g);
v = target(*edgePair.first, g);
std::cout << "( " << vertex_index_map[u] << ":" << name_map[u] << ", ";
std::cout << probability[*edgePair.first] << ", ";
std::cout << vertex_index_map[v] << ":" << name_map[v] << " )" << std::endl;
}
}
The input file has the format NodeA;correlation;NodeB. The code that I pasted above works, but I get into serious trouble when I attempt to include the johnson_all_pairs_shortest_paths functionality. What I really want is two things. First, a DistanceMatrix D, which I cannot seem to construct correctly: I want it to be a square matrix of doubles, double D[V][V] with V = num_vertices(g), but I am told that I am not calling the function correctly. Second, a list of the nodes that were taken along each path, similar to what the wiki article has for path reconstruction in Floyd's algorithm. Should I just attempt to roll my own algorithm(s) for this problem, since I can't figure out whether the functionality is there or not (not to mention how to make the function calls)? The documentation for the BGL is as obtuse as the implementation, so I don't really have any modern examples to go on.
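For reference, a library-independent sketch of the "roll my own" option: Floyd-Warshall on a dense matrix, plus the successor table needed to recover the nodes along each shortest path. It deliberately avoids BGL calls, so it says nothing about how the BGL functions should be invoked. The names `AllPairs`, `floydWarshall` and `reconstructPath` are made up for this example, and the input is assumed to be a square matrix in which a missing edge is stored as infinity. Nested vectors are used because a runtime-sized `double D[V][V]` is not standard C++.

```cpp
#include <cassert>
#include <limits>
#include <vector>

struct AllPairs {
    std::vector<std::vector<double>> dist; // dist[i][j] = shortest distance i -> j
    std::vector<std::vector<int>> next;    // next[i][j] = vertex after i on that path, -1 if unreachable
};

// w[i][j] is the edge weight (e.g. 1/correlation), infinity if there is no edge.
AllPairs floydWarshall(const std::vector<std::vector<double>> &w) {
    const double INF = std::numeric_limits<double>::infinity();
    const int n = static_cast<int>(w.size());
    AllPairs r{w, std::vector<std::vector<int>>(n, std::vector<int>(n, -1))};
    for (int i = 0; i < n; ++i) {
        assert(static_cast<int>(w[i].size()) == n);   // matrix must be square
        for (int j = 0; j < n; ++j) {
            if (i == j) {
                r.dist[i][j] = 0.0;
                r.next[i][j] = j;
            } else if (w[i][j] < INF) {
                r.next[i][j] = j;                     // direct edge: go straight to j
            }
        }
    }
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (r.dist[i][k] + r.dist[k][j] < r.dist[i][j]) {
                    r.dist[i][j] = r.dist[i][k] + r.dist[k][j];
                    r.next[i][j] = r.next[i][k];      // first step is the first step towards k
                }
    return r;
}

// Vertices on a shortest path from u to v (inclusive); empty if v is unreachable.
std::vector<int> reconstructPath(const AllPairs &r, int u, int v) {
    std::vector<int> path;
    if (r.next[u][v] == -1) return path;
    path.push_back(u);
    while (u != v) {
        u = r.next[u][v];
        path.push_back(u);
    }
    return path;
}
```

For an undirected graph the matrix is simply filled symmetrically. The cubic running time is fine for modest vertex counts; for large sparse graphs Johnson's algorithm remains the better fit.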
I'm trying to implement an "intelligent scissor" for interactive image segmentation. Therefore, I have to create a directed graph from an image where each vertex represents a single pixel. Each vertex is then connected to each of its neighbours by two edges: one outgoing and one incoming edge. This is due to the fact that the cost of an edge (a,b) may differ from the cost of (b,a). I'm using images with a size of 512*512 pixels, so I need to create a graph with 262144 vertices and 2091012 edges. Currently, I'm using the following graph:
typedef property<vertex_index_t, int,
property<vertex_distance_t, double,
property<x_t, int,
property<y_t, int
>>>> VertexProperty;
typedef property<edge_weight_t, double> EdgeProperty;
// define MyGraph
typedef adjacency_list<
vecS, // container used for the out-edges (vector)
vecS, // container used for the vertices (vector)
directedS, // directed edges (not sure if this is the right choice for incidenceGraph)
VertexProperty,
EdgeProperty
> MyGraph;
I'm using an additional class Graph (sorry for the uninspired naming) which handles the graph:
class Graph
{
private:
MyGraph *graph;
property_map<MyGraph, vertex_index_t>::type indexmap;
property_map<MyGraph, vertex_distance_t>::type distancemap;
property_map<MyGraph, edge_weight_t>::type weightmap;
property_map<MyGraph, x_t>::type xmap;
property_map<MyGraph, y_t>::type ymap;
std::vector<MyGraph::vertex_descriptor> predecessors;
public:
Graph();
~Graph();
};
Creating a new graph with 262144 vertices is pretty fast, but the insertion of the edges takes up to 10 seconds, which is way too slow for the desired application. Right now, I'm inserting the edges the following way:
tie(vertexIt, vertexEnd) = vertices(*graph);
for(; vertexIt != vertexEnd; vertexIt++){
vertexID = *vertexIt;
x = vertexID % 512;
y = (vertexID - x) / 512;
xmap[vertexID] = x;
ymap[vertexID] = y;
if(y > 0){
if(x > 0){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x-1)], *graph); // upper left neighbour
}
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x)], *graph); // upper
if(x < 511){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y-1)+(x+1)], *graph); // upper right
}
}
if(x < 511){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y)+(x+1)], *graph); // right
}
if(y < 511){
if(x > 0){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x-1)], *graph); // lower left
}
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x)], *graph); // lower
if(x < 511){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y+1)+(x+1)], *graph); // lower right
}
}
if(x > 0){
tie(edgeID, ok) = add_edge(vertexID, indexmap[IRES2D*(y)+(x-1)], *graph); // left
}
}
Is there anything I can do to improve the speed of the program? I'm using Microsoft Visual C++ 2010 Express in release mode with optimization (as recommended by Boost). I thought I could use a listS container for the vertices or edges, but the vertices are no problem, and if I use listS for the edges, it gets even slower.
adjacency_list is very general purpose; unfortunately it's never going to be as efficient as a solution exploiting the regularity of your particular use-case could be. BGL isn't magic.
Your best bet is probably to come up with the efficient graph representation you'd use in the absence of BGL (hint: for a graph of an image's neighbouring pixels, this is not going to explicitly allocate all those node and edge objects) and then fit BGL to it (example), or equivalently just directly implement a counterpart to the existing adjacency_list / adjacency_matrix templates (concept guidelines) tuned to the regularities of your system.
By an optimised representation, I of course mean one in which you don't actually store all the nodes and edges explicitly but just have some way of iterating over enumerations of the implicit nodes and edges arising from the fact that the image is a particular size. The only thing you should really need to store is an array of edge weights.
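A minimal sketch of such an implicit representation, independent of BGL: the only storage is one weight per pixel and direction, and neighbours are computed on the fly. The class name `GridGraph`, its member names and the direction numbering are assumptions made for this example; adapting it to the BGL graph concepts is a separate step not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Implicit 8-connected directed grid graph: vertices and edges are never
// allocated; only the edge weights are stored, 8 slots per pixel.
class GridGraph {
public:
    GridGraph(int width, int height)
        : w_(width), h_(height),
          weight_(static_cast<std::size_t>(width) * height * 8, 0.0) {}

    int vertexCount() const { return w_ * h_; }
    int id(int x, int y) const { return y * w_ + x; }

    // Weight of the directed edge leaving pixel (x, y) in direction d (0..7).
    double &weight(int x, int y, int d) {
        assert(d >= 0 && d < 8);
        return weight_[static_cast<std::size_t>(id(x, y)) * 8 + d];
    }

    // Calls visit(targetVertexId, edgeWeight) for every out-edge of (x, y),
    // skipping directions that would leave the image.
    template <class Visit>
    void forEachOutEdge(int x, int y, Visit visit) const {
        static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
        static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
        for (int d = 0; d < 8; ++d) {
            const int nx = x + dx[d], ny = y + dy[d];
            if (nx < 0 || nx >= w_ || ny < 0 || ny >= h_) continue;
            visit(id(nx, ny), weight_[static_cast<std::size_t>(id(x, y)) * 8 + d]);
        }
    }

private:
    int w_, h_;
    std::vector<double> weight_;
};
```

Constructing `GridGraph g(512, 512)` is a single array allocation, so the edge-insertion phase disappears entirely. A shortest-path search would call `forEachOutEdge` wherever it would otherwise iterate over `out_edges`.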
In short, I need a fast algorithm to count how many acyclic paths are there in a simple directed graph.
By simple graph I mean one without self loops or multiple edges.
A path can start from any node and must end on a node that has no outgoing edges. A path is acyclic if no edge occurs twice in it.
My graphs (empirical datasets) have only between 20 and 160 nodes. However, some of them have many cycles, so there will be a very large number of paths, and my naive approach is simply not fast enough for some of the graphs I have.
What I'm doing currently is "descending" along all possible edges using a recursive function, while keeping track of which nodes I have already visited (and avoiding them). The fastest solution I have so far was written in C++, and uses std::bitset argument in the recursive function to keep track of which nodes were already visited (visited nodes are marked by bit 1). This program runs on the sample dataset in 1-2 minutes (depending on computer speed). With other datasets it takes more than a day to run, or apparently much longer.
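For readers who want to reproduce the baseline, the approach just described might be sketched as follows. This is a guess at its general shape, not the actual program: the names `countFrom` and `countAll` are made up, the bitset is passed by reference and undone on the way back, and a node without outgoing edges is assumed to count as a (zero-length) path on its own. It is still exponential in the worst case.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

const int MAXN = 160;   // upper bound on the node count, taken from the question

typedef std::vector<std::vector<int>> Adj;   // adj[u] = targets of u's out-edges

// Number of simple paths that start at u, avoid already visited nodes,
// and end at a node without outgoing edges.
unsigned long long countFrom(const Adj &adj, int u, std::bitset<MAXN> &visited) {
    assert(u >= 0 && u < static_cast<int>(adj.size()) && u < MAXN);
    if (adj[u].empty())
        return 1;                 // reached a sink: exactly one path ends here
    visited.set(u);               // mark u for the duration of this branch
    unsigned long long total = 0;
    for (int v : adj[u])
        if (!visited.test(v))
            total += countFrom(adj, v, visited);
    visited.reset(u);             // backtrack
    return total;
}

// Sum over all possible starting nodes.
unsigned long long countAll(const Adj &adj) {
    unsigned long long total = 0;
    for (int s = 0; s < static_cast<int>(adj.size()); ++s) {
        std::bitset<MAXN> visited;
        total += countFrom(adj, s, visited);
    }
    return total;
}
```

If sinks should not count as paths by themselves, subtract the number of sink nodes from the result of `countAll`.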
The sample dataset: http://pastie.org/1763781
(each line is an edge-pair)
Solution for the sample dataset (first number is the node I'm starting from, second number is the path-count starting from that node, last number is the total path count):
http://pastie.org/1763790
Please let me know if you have ideas about algorithms with a better complexity. I'm also interested in approximate solutions (estimating the number of paths with some Monte Carlo approach). Eventually I'll also want to measure the average path length.
Edit: also posted on MathOverflow under same title, as it might be more relevant there. Hope this is not against the rules. Can't link as site won't allow more than 2 links ...
This is #P-complete, it seems. (ref http://www.maths.uq.edu.au/~kroese/ps/robkro_rev.pdf). The link has an approximation
If you can relax the simple path requirement, you can efficiently count the number of paths using a modified version of Floyd-Warshall or graph exponentiation as well. See All pairs all paths on a graph
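To illustrate the exponentiation variant: if nodes may repeat, entry (i, j) of the k-th power of the adjacency matrix is the number of walks with exactly k edges from i to j. The sketch below computes that power by repeated squaring. The names `Matrix`, `multiply` and `walksOfLength` are made up for this example, and counts silently wrap around modulo 2^64 when they get too large.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<unsigned long long>> Matrix;

// Plain square-matrix product.
Matrix multiply(const Matrix &a, const Matrix &b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<unsigned long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// result[i][j] = number of walks with exactly k edges from i to j,
// where adj[i][j] is 1 if there is an edge i -> j and 0 otherwise.
Matrix walksOfLength(const Matrix &adj, int k) {
    assert(k >= 0);
    const std::size_t n = adj.size();
    Matrix result(n, std::vector<unsigned long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;   // identity = walks of length 0
    Matrix base = adj;
    while (k > 0) {                 // exponentiation by squaring
        if (k & 1) result = multiply(result, base);
        base = multiply(base, base);
        k >>= 1;
    }
    return result;
}
```

Note that in a graph with cycles the number of walks is unbounded, so a maximum length has to be chosen; this is exactly where the relaxed problem differs from counting acyclic paths.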
As mentioned by spinning_plate, this problem is #P-complete, so start looking for your approximations :). I really like the #P-completeness proof for this problem, so I thought it would be nice to share it:
Let N be the number of paths (starting at s) in the graph and p_k be the number of paths of length k. We have:
N = p_1 + p_2 + ... + p_n
Now build a second graph by changing every edge to a pair of parallel edges. For each path of length k there will now be 2^k paths, so:
N_2 = p_1*2 + p_2*4 + ... + p_n*(2^n)
Repeating this process, but with i parallel edges instead of 2, for i up to n, would give us a linear system (with a Vandermonde matrix) allowing us to find p_1, ..., p_n.
N_i = p_1*i + p_2*(i^2) + ...
Therefore, finding the number of paths in the graph is just as hard as finding the number of paths of a certain length. In particular, p_n is the number of Hamiltonian Paths (starting at s), a bona-fide #P-complete problem.
I haven't done the math, but I'd guess that a similar process should be able to prove that just calculating the average length is also hard.
Note: most times this problem is discussed, the paths start from a single node and stop wherever. This is the opposite of your problem, but the two should be equivalent by just reversing all the edges.
Importance of Problem Statement
It is unclear what is being counted.
Is the starting node set all nodes for which there is at least one outgoing edge, or is there a particular starting node criteria?
Is the ending node set the set of all nodes for which there are zero outgoing edges, or can any node for which there is at least one incoming edge be a possible ending node?
Define your problem so that there are no ambiguities.
Estimation
Estimations can be off by orders of magnitude when designed for randomly constructed directed graphs and the graph is very statistically skewed or systematic in its construction. This is typical of all estimation processes, but particularly pronounced in graphs because of their exponential pattern complexity potential.
Two Optimizing Points
The std::bitset model will be slower than bool values for most processor architectures because of the instruction set mechanics of testing a bit at a particular bit offset. The bitset is more useful when memory footprint, not speed is the critical factor.
Eliminating cases or reducing via deductions is important. For instance, if there are nodes for which there is only one outgoing edge, one can calculate the number of paths without it and add to the number of paths in the sub-graph the number of paths from the node from which it points.
Resorting to Clusters
The problem can be executed on a cluster by distributing according to starting node. Some problems simply require super-computing. If you have 1,000,000 starting nodes and 10 processors, you can place 100,000 starting node cases on each processor. The above case eliminations and reductions should be done prior to distributing cases.
A Typical Depth First Recursion and How to Optimize It
Here is a small program that provides a basic depth first, acyclic traversal from any node to any node, which can be altered, placed in a loop, or distributed. The list can be placed into a static native array by using a template with a size as one parameter if the maximum data set size is known, which reduces iteration and indexing times.
#include <iostream>
#include <list>
class DirectedGraph {
private:
int miNodes;
std::list<int> * mnpEdges;
bool * mpVisitedFlags;
private:
void initAlreadyVisited() {
for (int i = 0; i < miNodes; ++ i)
mpVisitedFlags[i] = false;
}
void recurse(int iCurrent, int iDestination,
int path[], int index,
std::list<std::list<int> *> * pnai) {
mpVisitedFlags[iCurrent] = true;
path[index ++] = iCurrent;
if (iCurrent == iDestination) {
auto pni = new std::list<int>;
for (int i = 0; i < index; ++ i)
pni->push_back(path[i]);
pnai->push_back(pni);
} else {
auto it = mnpEdges[iCurrent].begin();
auto itBeyond = mnpEdges[iCurrent].end();
while (it != itBeyond) {
if (! mpVisitedFlags[* it])
recurse(* it, iDestination,
path, index, pnai);
++ it;
}
}
-- index;
mpVisitedFlags[iCurrent] = false;
}
public:
DirectedGraph(int iNodes) {
miNodes = iNodes;
mnpEdges = new std::list<int>[iNodes];
mpVisitedFlags = new bool[iNodes];
}
~DirectedGraph() {
delete[] mnpEdges; // both arrays come from new[], so they need delete[]
delete[] mpVisitedFlags;
}
void addEdge(int u, int v) {
mnpEdges[u].push_back(v);
}
std::list<std::list<int> *> * findPaths(int iStart,
int iDestination) {
initAlreadyVisited();
auto path = new int[miNodes];
auto pnpi = new std::list<std::list<int> *>();
recurse(iStart, iDestination, path, 0, pnpi);
delete[] path; // allocated with new[]
return pnpi;
}
};
int main() {
DirectedGraph dg(5);
dg.addEdge(0, 1);
dg.addEdge(0, 2);
dg.addEdge(0, 3);
dg.addEdge(1, 3);
dg.addEdge(1, 4);
dg.addEdge(2, 0);
dg.addEdge(2, 1);
dg.addEdge(4, 1);
dg.addEdge(4, 3);
int startingNode = 0;
int destinationNode = 1;
auto pnai = dg.findPaths(startingNode, destinationNode);
std::cout
<< "Unique paths from "
<< startingNode
<< " to "
<< destinationNode
<< std::endl
<< std::endl;
bool bFirst;
std::list<int> * pi;
auto it = pnai->begin();
auto itBeyond = pnai->end();
std::list<int>::iterator itInner;
std::list<int>::iterator itInnerBeyond;
while (it != itBeyond) {
bFirst = true;
pi = * it ++;
itInner = pi->begin();
itInnerBeyond = pi->end();
while (itInner != itInnerBeyond) {
if (bFirst)
bFirst = false;
else
std::cout << ' ';
std::cout << (* itInner ++);
}
std::cout << std::endl;
delete pi;
}
delete pnai;
return 0;
}