Related to Graph data structure - data-structures

Could anyone please explain the difference between an implicit graph and an explicit graph? I tried to read the Wikipedia page on this topic but got confused.

An explicit graph is one that is fully defined up front: we know how many nodes and edges it has and which nodes are joined by an edge.
An implicit graph is not defined in advance: we don't know its nodes and edges ahead of time; instead we build (or discover) the graph as some process runs.
This is very common in backtracking search.
For example, a simple backtracking routine that generates all subsets of the set {A, B, C}:
#include <iostream>
#include <string>
using namespace std;
char elements[4]= "ABC";
void powerSet(int n, string subset){
if(n < 0){
cout<< subset <<endl;
return;
}
powerSet(n-1, subset);
powerSet(n-1, subset+ elements[n]);
}
int main(){
powerSet(2, string(""));
}
We never define a graph, yet we construct and traverse the following tree (this is not the exact recursion tree of the code above, but it is very similar):
            /   \
         /         \
      a               {}
    /   \           /   \
  b       {}      b       {}
 / \     / \     / \     / \
c   {}  c   {}  c   {}  c   {}
abc ab  ac  a   bc  b   c   {empty}   : we get these subsets
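To make the distinction concrete, here is a minimal sketch in C++. The names ExplicitGraph, State, implicit_neighbors, collect_leaves and count_edges are made up for this illustration and do not come from the answer above. The explicit graph stores its adjacency lists; the implicit one stores nothing and computes the neighbours of a state on demand, mirroring the skip/take choice of the subset recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Explicit graph: every vertex and edge is stored up front as adjacency lists.
using ExplicitGraph = std::vector<std::vector<int>>;

// Implicit graph: nothing is stored. A state is (depth, bitmask of chosen
// elements) and its neighbours are computed only when somebody asks for them.
struct State {
    int depth;      // how many elements have been decided so far
    unsigned mask;  // bit i set => element i is in the subset
};

std::vector<State> implicit_neighbors(const State& s, int n) {
    assert(s.depth >= 0 && s.depth <= n);
    if (s.depth == n) return {};                  // leaf: a finished subset
    return {
        {s.depth + 1, s.mask},                    // skip element `depth`
        {s.depth + 1, s.mask | (1u << s.depth)},  // take element `depth`
    };
}

// Depth-first walk over the implicit graph; collects the leaves (subsets).
void collect_leaves(const State& s, int n, std::vector<unsigned>& out) {
    std::vector<State> next = implicit_neighbors(s, n);
    if (next.empty()) { out.push_back(s.mask); return; }
    for (const State& t : next) collect_leaves(t, n, out);
}

// For the explicit graph we can simply inspect what is stored.
std::size_t count_edges(const ExplicitGraph& g) {
    std::size_t total = 0;
    for (const auto& adj : g) total += adj.size();
    return total;
}
```

Calling collect_leaves(State{0, 0u}, 3, out) visits all 8 subsets of a 3-element set without any graph ever being stored, whereas count_edges can only answer because the whole ExplicitGraph already sits in memory.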

Related

boost::adjacency_list<> and boost::listS

the documentation says:
listS selects std::list
The same is being used in an example I'm trying to adapt.
I don't see anywhere in the example where the edge and the vertex type are being passed to boost::adjacency_list<>.
And unsurprisingly constructing the graph using a begin-end pair into a container of edges does not compile for me.
How can one tell the graph library about the type of edges and vertices one intends to use?
I created a copy of the graph with plain std::size_t for vertices and std::pair<std::size_t, std::size_t> for edges.
I'm unclear why I have to do this, as the graph library is a template library.
Q. I don't see anywhere in the example where the edge and the vertex type are being passed to boost::adjacency_list<>.
You choose all the properties of the graph with the template arguments.
The first two choose how edges and vertexes are to be stored. For adjacency-lists, the adjacency-list is a given, but you can choose the edge container selector (which stores the adjacencies (out-edges) per source vertex) and the actual vertex container selector (which stores the vertices themselves).
The other template arguments include the vertex/edge and graph properties. So, where the container selectors choose how to store graph entities, the properties describe what should be stored.
HOW everything is being stored is ultimately an implementation detail.
Q. And unsurprisingly constructing the graph using a begin-end pair into a container of edges does not compile for me.
We can't say anything about that, because we don't know what you mean by "a container of edges". Do you already have your graph in some other format?¹
The constructor arguments are not required to build a graph. E.g.:
#include <boost/graph/adjacency_list.hpp>
using G = boost::adjacency_list<>;
int main() {
G g(10);
}
This is the simplest possible program that creates a graph with 10 unconnected vertices. To also print it:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graph_utility.hpp>
using G = boost::adjacency_list<>;
int main() {
G g(10);
print_graph(g);
}
Output shows 10 vertices with no adjacencies:
0 -->
1 -->
2 -->
3 -->
4 -->
5 -->
6 -->
7 -->
8 -->
9 -->
Q. How can one tell the graph library about the type of edges and vertices one intends to use?
Let me start with the using declarations, then modify the types a little.
Instead of passing edges to the constructor, you can add edges/vertices after construction:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graph_utility.hpp>
#include <iostream>
using G = boost::adjacency_list<>;
using V = G::vertex_descriptor;
using E = G::edge_descriptor;
int main() {
G g;
V v1 = add_vertex(g);
V v2 = add_vertex(g);
E e1 = add_edge(v1, v2, g).first;
// or:
auto [ed, inserted] = add_edge(v1, v2, g);
std::cout << "Second insertion happened: " << std::boolalpha << inserted << "\n";
print_graph(g);
}
Output:
Second insertion happened: true
0 --> 1 1
1 -->
Now, let's immediately show the effect of setS as edge storage:
using G = boost::adjacency_list<boost::setS>;
The output changes, because setS doesn't admit duplicate out-edges:
Second insertion happened: false
0 --> 1
1 -->
Now, let's also try a different vertex container selector:
using G = boost::adjacency_list<boost::setS, boost::listS>;
Uhoh, now there is trouble printing the graph, because, as you might have read already in the linked documentation:
If the VertexList of the graph is vecS, then the graph has a builtin vertex indices accessed via the property map for the vertex_index_t property. The indices fall in the range [0, num_vertices(g)) and are contiguous. When a vertex is removed the indices are adjusted so that they retain these properties. Some care must be taken when using these indices to access exterior property storage. The property map for vertex index is a model of Readable Property Map.
If you use listS then there is no implicit vertex index. Of course, we can add an index, but let's instead add a name property to our vertex.
There are many ways to add properties, but let me show you the more versatile/friendly version: bundled properties:
struct VertexProps {
std::string name;
};
using G = boost::adjacency_list<boost::setS, boost::listS, boost::undirectedS, VertexProps>;
Now, all we need to do is tell print_graph to use the name property from the bundle instead of the vertex index (which is not available for printing, because listS provides no implicit vertex index):
print_graph(g, get(&VertexProps::name, g));
Of course, it becomes nicer when you actually give the vertices names:
V v1 = add_vertex(VertexProps{"v1"}, g);
V v2 = add_vertex(VertexProps{"v2"}, g);
But of course, names can be changed:
g[v2].name += "(changed)";
See it all together:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graph_utility.hpp>
#include <iostream>
#include <string>
struct VertexProps {
std::string name;
};
using G = boost::adjacency_list<boost::setS, boost::listS, boost::undirectedS, VertexProps>;
using V = G::vertex_descriptor;
using E = G::edge_descriptor;
int main() {
G g;
V v1 = add_vertex(VertexProps{"v1"}, g);
V v2 = add_vertex(VertexProps{"v2"}, g);
E e1 = add_edge(v1, v2, g).first;
// or:
auto [ed, inserted] = add_edge(v1, v2, g);
g[v2].name += "(changed)";
std::cout << "Second insertion happened: " << std::boolalpha << inserted << "\n";
print_graph(g, get(&VertexProps::name, g));
}
Prints
Second insertion happened: false
v1 <--> v2(changed)
v2(changed) <--> v1
¹ You may not need to copy anything at all; you may be able to adapt/use the existing graph.

How to find genetic connection between two nodes

One feature in our current project is to find out the relationship between two nodes. A node can be generated by two other nodes together. To put it simply, I treat them as a family tree, as below:
  A     B
 / \    |
C   D   E
|    \ /
F     G
I'm going to write a function to decide whether two nodes are genetically connected, for example:
is_genetic_connected("D", "F"); // returns true because they have common ancestor: "A"
is_genetic_connected("E", "F"); // returns false
I'm not sure if we can apply LCA here, or is there any other good algorithm for such problem?
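One possible approach, offered only as a sketch: since a node may have two parents, the structure is a DAG rather than a tree, so instead of a classical LCA query one can collect the ancestor set of each node and test whether the two sets intersect. The ParentMap representation and the helpers ancestors_of and example_family are assumptions made for this illustration; each node is also counted as its own ancestor, so a node is connected to its descendants.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: each node maps to the nodes that generated it.
using ParentMap = std::map<std::string, std::vector<std::string>>;

// Collect `node` itself plus all of its ancestors by walking parent links.
std::set<std::string> ancestors_of(const ParentMap& parents, const std::string& node) {
    assert(!node.empty());  // the sketch assumes non-empty labels
    std::set<std::string> seen;
    std::vector<std::string> stack{node};
    while (!stack.empty()) {
        std::string cur = stack.back();
        stack.pop_back();
        if (!seen.insert(cur).second) continue;  // already visited
        auto it = parents.find(cur);
        if (it == parents.end()) continue;       // a root: no parents recorded
        for (const std::string& p : it->second) stack.push_back(p);
    }
    return seen;
}

// Two nodes are connected if their ancestor sets share at least one node.
bool is_genetic_connected(const ParentMap& parents,
                          const std::string& a, const std::string& b) {
    std::set<std::string> anc_a = ancestors_of(parents, a);
    for (const std::string& x : ancestors_of(parents, b))
        if (anc_a.count(x)) return true;
    return false;
}

// The family tree from the question: G is generated by D and E together.
ParentMap example_family() {
    return {
        {"C", {"A"}}, {"D", {"A"}}, {"E", {"B"}},
        {"F", {"C"}}, {"G", {"D", "E"}},
    };
}
```

With example_family(), the pair ("D", "F") shares the ancestor "A", while ("E", "F") shares none. Each query walks both ancestor sets, so for many queries on a large graph the sets could be cached per node.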

Finding the minimum number of calls on a tree

I was asked this question in an interview and struggled to answer it correctly in the time allotted. Nonetheless, I thought it was an interesting problem, and I hadn't seen it before.
Suppose you have a tree where the root can call (on the phone) each of its children; when a child receives the call, he can call each of his children, and so on. The calls are made in rounds, and we need to minimize the number of rounds it takes to make all the calls. For example, suppose you have the following tree:
    A
   / \
  /   \
 B     D
 |
 |
 C
One solution is for A to call D in round one, A to call B in round two, and B to call C in round three. The optimal solution is for A to call B in round one, and A to call D and B to call C in round two.
Note that A cannot call both B and D in the same round, nor can any node call more than one of its children in the same round. However, multiple nodes with a different parent can call simultaneously. For example, given the tree:
      A
    / | \
   /  |  \
  B   C   D
 /\       |
/  \      |
E   F     G
We can have a sequence (where - separates rounds), such as:
A B - B E, A D - B F, A C, D G
(A calls B first round, B calls E and A calls D second, ...)
I'm assuming some type of dynamic programming can be used, but I'm not sure which direction to take this in. My initial inclination is to use DFS to order the longest paths from the root to the leaves in decreasing order, but when it comes to the nodes actually making calls, I'm not sure how we can achieve optimality for an arbitrary tree, nor how we can output the sequence of calls that an optimal schedule would make. For instance, in the first example we could output:
A B - B C, A D
I think something like this could get the optimal solution:
Let 'calls' be 0 for each leaf (a leaf has nobody left to call).
For each node, compute 'calls' for all of its children and sort the children in decreasing order of 'calls'.
The position of a child in that order is its 'rank' (the child with the largest 'calls' gets rank 1); the rank is the round in which the parent phones that child.
The 'calls' value of the node is the maximum of 'calls' + 'rank' over its children.
The 'calls' value of the root node is the answer.
It's sorta dynamic programming on trees and you can implement it recursively like this:
int f(node v)
{
    int s = 0;   // a leaf needs 0 rounds
    for each u in v.children
    {
        d[u] = f(u)
    }
    sort d in decreasing order and rank its values in r (r for the maximum u would be 1)
    for each u in v.children
    {
        s = max(s, d[u] + r[u])   // u is phoned in round r[u], then needs d[u] more rounds
    }
    return s
}
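A runnable sketch of the sort-and-rank idea, under the assumptions that a leaf needs 0 rounds and that a child phoned in round r finishes its own subtree r + (its own value) rounds after the parent was reached. The Tree alias and the name min_rounds are chosen for this illustration only; node 0 is assumed to be the root.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// children[v] lists the children of node v; node 0 is assumed to be the root.
using Tree = std::vector<std::vector<int>>;

// Minimum number of rounds needed to inform the whole subtree rooted at v.
int min_rounds(const Tree& children, int v = 0) {
    assert(v >= 0 && v < static_cast<int>(children.size()));
    std::vector<int> d;  // rounds needed by each child's subtree
    for (int u : children[v]) d.push_back(min_rounds(children, u));
    // Phone the child with the most remaining work first.
    std::sort(d.begin(), d.end(), std::greater<int>());
    int best = 0;        // stays 0 for a leaf
    for (std::size_t i = 0; i < d.size(); ++i)
        // The child at position i is phoned in round i + 1.
        best = std::max(best, d[i] + static_cast<int>(i) + 1);
    return best;
}
```

For the first tree in the question (A = 0, B = 1, D = 2, C = 3), min_rounds({{1, 2}, {3}, {}, {}}) gives 2, and for the second tree it gives 3, matching the three-round schedule shown there. Recording the sorted order per node instead of only the maximum would also yield the schedule itself.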
Good Luck!

Correlation Network Implementation

I have been working on my graph/network problem, and I think I finally know what I want to do. Now that I am getting into the implementation, I am having issues deciding what libraries to use. The graph itself is pretty simple: each node is labeled by a string, and each edge is a probability/correlation coefficient between the two nodes (variables), and is undirected. The operations that I want to perform on the graph are:
Inserting new nodes/edges (fast)
Finding the all pairs shortest (1/probability) path, and remembering the nodes in the path - probably Johnson's algorithm
Constructing the minimum weight Steiner tree for k specific vertices
Use Johnson's algorithm to build shortest paths
Iterating over the current nodes in the path p, find the shortest route to the remaining nodes in k
Looking at the mean degree of the graph
Evaluating the betweenness of the nodes
Getting the clustering coefficients
Finding the modularity of the graph
For many of these, I want to compare the result to the Erdos-Renyi model, testing against it as a null hypothesis. Also, being able to use the statistical mechanics definitions via a Markov field would be helpful, as then I could calculate correlations between two nodes that are not identical, and ask the graph questions about the entropy, etc. So a good mapping onto a Markov field library of some sort would be useful too.
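Two of the listed quantities, the mean degree and the clustering coefficient, are simple enough to compute without any graph library. The sketch below uses only the standard library; the Adjacency alias and the function names are assumptions for this illustration and are not part of LEMON, the BGL or the MTGL. For reference, in an Erdos-Renyi graph G(n, p) the expected mean degree is p(n - 1) and the expected local clustering coefficient is p, which gives the baseline to compare against.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simple undirected graph; adj[v] holds the neighbours of vertex v.
using Adjacency = std::vector<std::set<int>>;

void add_undirected_edge(Adjacency& adj, int u, int v) {
    assert(u != v);  // the sketch assumes no self-loops
    adj[u].insert(v);
    adj[v].insert(u);
}

// Average number of neighbours per vertex.
double mean_degree(const Adjacency& adj) {
    if (adj.empty()) return 0.0;
    std::size_t total = 0;
    for (const auto& nbrs : adj) total += nbrs.size();
    return static_cast<double>(total) / adj.size();
}

// Local clustering coefficient of v: the fraction of pairs of neighbours
// of v that are themselves joined by an edge.
double local_clustering(const Adjacency& adj, int v) {
    const std::set<int>& nbrs = adj[v];
    std::size_t k = nbrs.size();
    if (k < 2) return 0.0;  // fewer than two neighbours: defined as 0 here
    std::size_t links = 0;
    for (int a : nbrs)
        for (int b : nbrs)
            if (a < b && adj[a].count(b)) ++links;  // count each pair once
    return 2.0 * links / (k * (k - 1));
}
```

On a triangle 0-1-2 with an extra vertex 3 attached to 2, the mean degree is 2, vertex 0 has clustering coefficient 1 and vertex 2 has 1/3.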
The crux of the problem at the moment is that I am trying to find a C++ library to work in. I have taken a look at R, but I want something that is going to be more robust and faster. The three libraries that I am considering are:
LEMON
Easy to use and install
Straightforward documentation
Has some of the functions I want already
Dynamically creating a graph from reading in a text file, and making sure there are no duplicate nodes, is a nightmare that I have not been able to figure out
Boost Graph Library
Intractable, arcane definitions for objects, and how to use them
Documentation does not match what the code does, necessarily
Does have many of the algorithms that I want, as well as a very easy way to create a graph from a text file
MultiThreaded Graph Library
Parallelism already incorporated
Reads easier than the BGL
Not as many functions
Still arcane
Further down the road, I envision the graph living on a distributed network, with distributed storage (hadoop or something). I suspect that the whole graph will not fit into memory, and so I will have to come up with a caching scenario to look at parts of the graph.
What library would people suggest for the problem that I described? Would it be better to just use the BGL, and write my own functions? What about the multi-threaded version? Are there any libraries that lend themselves more readily to the type of work I want to do, especially the quantities I want to compute?
Thanks!
Edit1
So I am seriously frustrated by the BGL. I have an adjacency list graph, and I want to run my own version of the Johnson's (or Floyd's, at this point, I am not picky) on the graph, and return the Distance Matrix for me to look at. Except that I can't get it to work. Here is my full code implementation thus far:
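#include <boost/graph/adjacency_list.hpp>
#include <boost/tokenizer.hpp>
#include <boost/tuple/tuple.hpp>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
using namespace boost;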
int main()
{
//Read in the file
std::ifstream datafile("stuff");
if (!datafile)
{
std::cerr << "No Stuff file" << std::endl;
return EXIT_FAILURE;
}
//Build the graph
typedef adjacency_list < vecS, vecS, undirectedS, property < vertex_name_t,
std::string >, property < edge_weight_t, double > > Graph;
Graph g;
//Build the two properties we want, string and double
//Note, you have to nest properties for more
typedef property_map< Graph, vertex_index_t >::type vertex_index_map_t;
vertex_index_map_t vertex_index_map = get(vertex_index, g);
typedef property_map < Graph, vertex_name_t >::type name_map_t;
name_map_t name_map = get(vertex_name, g);
typedef property_map < Graph, edge_weight_t >::type probability_map_t;
probability_map_t probability = get(edge_weight, g);
//Map of the vertices by string
typedef graph_traits < Graph >::vertex_descriptor Vertex;
typedef std::map < std::string, Vertex > NameVertexMap;
NameVertexMap AllNodes;
//Load the file into the graph
for (std::string line; std::getline(datafile, line);)
{
char_delimiters_separator < char >sep(false, "", ";");
tokenizer <> line_toks(line, sep);
tokenizer <>::iterator i = line_toks.begin();
std::string conditionA = *i++;
NameVertexMap::iterator pos;
bool inserted;
Vertex u, v;
boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(conditionA, Vertex()));
if (inserted)
{
u = add_vertex(g);
name_map[u] = conditionA;
pos->second = u;
}
else
{
u = pos->second;
}
std::string correlation = *i++;
std::istringstream incorrelation(correlation);
double correlate;
incorrelation >> correlate;
boost::tie(pos, inserted) = AllNodes.insert(std::make_pair(*i, Vertex()));
if (inserted) {
v = add_vertex(g);
name_map[v] = *i;
pos->second = v;
}
else
{
v = pos->second;
}
graph_traits < Graph >::edge_descriptor e;
boost::tie(e, inserted) = add_edge(u, v, g);
if (inserted)
probability[e] = 1.0/correlate;
}
typedef boost::graph_traits<Graph>::edge_iterator edge_iter;
std::pair<edge_iter, edge_iter> edgePair;
Vertex u, v;
for(edgePair = edges(g); edgePair.first != edgePair.second; ++edgePair.first)
{
u = source(*edgePair.first, g);
v = target(*edgePair.first, g);
std::cout << "( " << vertex_index_map[u] << ":" << name_map[u] << ", ";
std::cout << probability[*edgePair.first] << ", ";
std::cout << vertex_index_map[v] << ":" << name_map[v] << " )" << std::endl;
}
}
Where the input file is of the format NodeA;correlation;NodeB. The code that I pasted above works, but I get into serious trouble when I attempt to include the johnson_all_pairs_shortest_paths functionality. Really what I want is not only a DistanceMatrix D (which I cannot seem to construct correctly, I want it to be a square matrix of doubles double D[V][V], V = num_vertices(g), but it gives me back that I am not calling the function correctly), but also a list of the nodes that were taken along that path, similar to what the wiki article has for Floyd's Algorithm path reconstruction. Should I just make the attempt to roll my own algorithm(s) for this problem, since I can't figure out if the functionality is there or not (not to mention how to make the function calls)? The documentation for the BGL is as obtuse as the implementation, so I don't really have any modern examples to go on.
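For reference, the path reconstruction mentioned for Floyd's algorithm can be sketched in plain C++ without the BGL. This is a standalone illustration, not the BGL API: the AllPairs and WeightedEdge structs and the functions floyd_warshall and reconstruct_path are names invented here. It assumes an undirected graph with positive weights (such as 1/correlation) and vertices numbered 0..n-1.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct WeightedEdge { int u, v; double w; };

struct AllPairs {
    std::vector<std::vector<double>> dist;  // dist[u][v]: shortest distance
    std::vector<std::vector<int>> next;     // next[u][v]: vertex after u on the path, -1 if none
};

AllPairs floyd_warshall(int n, const std::vector<WeightedEdge>& edges) {
    const double inf = std::numeric_limits<double>::infinity();
    AllPairs r{std::vector<std::vector<double>>(n, std::vector<double>(n, inf)),
               std::vector<std::vector<int>>(n, std::vector<int>(n, -1))};
    for (int v = 0; v < n; ++v) { r.dist[v][v] = 0.0; r.next[v][v] = v; }
    for (const WeightedEdge& e : edges) {
        assert(e.u >= 0 && e.u < n && e.v >= 0 && e.v < n);
        if (e.w < r.dist[e.u][e.v]) {  // undirected: fill both directions
            r.dist[e.u][e.v] = r.dist[e.v][e.u] = e.w;
            r.next[e.u][e.v] = e.v;
            r.next[e.v][e.u] = e.u;
        }
    }
    // Allow each vertex k in turn as an intermediate stop.
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (r.dist[i][k] + r.dist[k][j] < r.dist[i][j]) {
                    r.dist[i][j] = r.dist[i][k] + r.dist[k][j];
                    r.next[i][j] = r.next[i][k];  // first step is the first step towards k
                }
    return r;
}

// Follow the `next` pointers from u to v; returns an empty vector if v is unreachable.
std::vector<int> reconstruct_path(const AllPairs& r, int u, int v) {
    std::vector<int> p;
    if (r.next[u][v] == -1) return p;
    p.push_back(u);
    while (u != v) { u = r.next[u][v]; p.push_back(u); }
    return p;
}
```

The distance matrix here is a vector of vectors rather than a raw double D[V][V], since V is only known at run time. The cubic running time is fine for small graphs; for large sparse graphs Johnson's algorithm is the better fit, as the question already suspects.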

Find all chordless cycles in an undirected graph

How to find all chordless cycles in an undirected graph?
For example, given the graph
0 --- 1
|     | \
|     |  \
4 --- 3 - 2
the algorithm should return 1-2-3 and 0-1-3-4, but never 0-1-2-3-4.
(Note: [1] This question is not the same as small cycle finding in a planar graph because the graph is not necessarily planar. [2] I have read the paper Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion but I don't understand what they're doing :). [3] I have tried CYPATH but the program only gives the count, algorithm EnumChordlessPath in readme.txt has significant typos, and the C code is a mess. [4] I am not trying to find an arbitrary set of fundamental cycles. A cycle basis can have chords.)
1. Assign numbers to nodes from 1 to n.
2. Pick the node number 1. Call it 'A'.
3. Enumerate pairs of links coming out of 'A'.
4. Pick one. Let's call the adjacent nodes 'B' and 'C' with B less than C.
5. If B and C are connected, then output the cycle ABC, return to step 3 and pick a different pair.
6. If B and C are not connected:
   - Enumerate all nodes connected to B. Suppose it's connected to D, E, and F. Create a list of vectors CABD, CABE, CABF. For each of these:
     - if the last node is connected to any internal node except C and B, discard the vector
     - if the last node is connected to C, output and discard
     - if it's not connected to either, create a new list of vectors, appending all nodes to which the last node is connected.
   - Repeat until you run out of vectors.
7. Repeat steps 3-5 with all pairs.
8. Remove node 1 and all links that lead to it. Pick the next node and go back to step 2.
Edit: and you can do away with one nested loop.
This seems to work at first sight; there may be bugs, but you should get the idea:
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
using namespace std;
void chordless_cycles(int* adjacency, int dim)
{
for(int i=0; i<dim-2; i++)
{
for(int j=i+1; j<dim-1; j++)
{
if(!adjacency[i+j*dim])
continue;
list<vector<int> > candidates;
for(int k=j+1; k<dim; k++)
{
if(!adjacency[i+k*dim])
continue;
if(adjacency[j+k*dim])
{
cout << i+1 << " " << j+1 << " " << k+1 << endl;
continue;
}
vector<int> v;
v.resize(3);
v[0]=j;
v[1]=i;
v[2]=k;
candidates.push_back(v);
}
while(!candidates.empty())
{
vector<int> v = candidates.front();
candidates.pop_front();
int k = v.back();
for(int m=i+1; m<dim; m++)
{
if(find(v.begin(), v.end(), m) != v.end())
continue;
if(!adjacency[m+k*dim])
continue;
bool chord = false;
int n;
for(n=1; n<v.size()-1; n++)
if(adjacency[m+v[n]*dim])
chord = true;
if(chord)
continue;
if(adjacency[m+j*dim])
{
for(n=0; n<v.size(); n++)
cout<<v[n]+1<<" ";
cout<<m+1<<endl;
continue;
}
vector<int> w = v;
w.push_back(m);
candidates.push_back(w);
}
}
}
}
}
@aioobe has a point. Just find all the cycles and then exclude the ones that are not chordless. This may be too inefficient, but the search space can be pruned along the way to reduce the inefficiencies. Here is a general algorithm:
void printChordlessCycles( ChordlessCycle path) {
System.out.println( path.toString() );
for( Node n : path.lastNode().getNeighbors() ) {
if( path.canAdd( n) ) {
path.add( n);
printChordlessCycles( path);
path.remove( n);
}
}
}
Graph g = loadGraph(...);
ChordlessCycle p = new ChordlessCycle();
for( Node n : g.getNodes()) {
p.add(n);
printChordlessCycles( p);
p.remove( n);
}
class ChordlessCycle {
private CountedSet<Node> connected_nodes;
private List<Node> path;
...
public void add( Node n) {
for( Node neighbor : n.getNeighbors() ) {
connected_nodes.increment( neighbor);
}
path.add( n);
}
public void remove( Node n) {
for( Node neighbor : n.getNeighbors() ) {
connected_nodes.decrement( neighbor);
}
path.remove( n);
}
public boolean canAdd( Node n) {
return (connected_nodes.getCount( n) == 0);
}
}
Just a thought:
Let's say you are enumerating cycles on your example graph and you are starting from node 0.
If you do a breadth-first search for each given edge, e.g. 0 - 1, you reach a fork at 1. Then the cycles that reach 0 again first are chordless, and the rest are not and can be eliminated... at least I think this is the case.
Could you use an approach like this? Or is there a counterexample?
How about this. First, reduce the problem to finding all chordless cycles that pass through a given vertex A. Once you've found all of those, you can remove A from the graph, and repeat with another point until there's nothing left.
And how to find all the chordless cycles that pass through vertex A? Reduce this to finding all chordless paths from B to A, given a list of permitted vertices, and search either breadth-first or depth-first. Note that when iterating over the vertices reachable (in one step) from B, when you choose one of them you must remove all of the others from the list of permitted vertices (take special care when B=A, so as not to eliminate three-edge paths).
Find all cycles.
By definition, a chordless cycle is a cycle whose set of points contains no smaller cycle on a subset of those points. So, once you have all cycles, the problem is simply to eliminate the cycles that do contain such a subset cycle.
For efficiency, for each cycle you find, loop through all existing cycles and check whether one is a subset of the other; if so, eliminate the larger cycle.
Beyond that, the only difficulty is figuring out how to write an algorithm that determines whether one set is a subset of another.
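The subset test itself is available in the C++ standard library as std::includes, which works on sorted ranges. A minimal sketch of the filtering step, assuming each cycle is given as a list of integer vertex ids; the function name drop_supersets is made up for this illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Keep only the cycles whose vertex set does not strictly contain the vertex
// set of another cycle. Note: the returned cycles are sorted vertex sets, so
// the original traversal order of each cycle is lost.
std::vector<std::vector<int>> drop_supersets(std::vector<std::vector<int>> cycles) {
    for (auto& c : cycles) {
        assert(c.size() >= 3);          // a simple cycle has at least three vertices
        std::sort(c.begin(), c.end());  // std::includes requires sorted ranges
    }
    std::vector<std::vector<int>> kept;
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        bool has_smaller = false;
        for (std::size_t j = 0; j < cycles.size() && !has_smaller; ++j)
            has_smaller = j != i && cycles[j].size() < cycles[i].size() &&
                          std::includes(cycles[i].begin(), cycles[i].end(),
                                        cycles[j].begin(), cycles[j].end());
        if (!has_smaller) kept.push_back(cycles[i]);
    }
    return kept;
}
```

For the example graph in the question, passing the three cycles 1-2-3, 0-1-3-4 and 0-1-2-3-4 keeps the first two and drops the third. The pairwise comparison is quadratic in the number of cycles, on top of the cost of enumerating all cycles in the first place.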
