I'd like to use the parallel MST algorithm dense_boruvka_minimum_spanning_tree from Boost.
One required parameter of that algorithm's interface is a graph, which "must be a model of Vertex List Graph and Distributed Edge List Graph". I found that the only model in Boost that implements the Distributed Edge List Graph concept is the Distributed Adjacency List. However, the "Graph Concepts" section of that model's documentation explicitly says that
"[...] the distributed adjacency list does not model the Vertex List Graph or Edge List Graph concepts [...]"
At this point I am confused. I'm supposed to pass a data structure to the interface of a boost algorithm which is not provided by the framework? Did I misunderstand something?
NB: I'm pretty new to the Boost world.
Boost Graph provides generic algorithms built around concepts, and has historically included very few models of those graph concepts. People will generally already have their graphs in some existing data structure that they can adapt.
In this light
At this point I am confused. I'm supposed to pass a data structure to the interface of a boost algorithm which is not provided by the framework?
is not so strange after all.
DistributedAdjacencyList only models DistributedVertexListGraph, whereas you need VertexListGraph.
The key difference is highlighted in the DVLG concept documentation:
A Distributed Vertex List Graph is a graph whose vertices are distributed across multiple processes or address spaces. The vertices and num_vertices functions retain the same signatures as in the Vertex List Graph concept, but return only the local set (and size of the local set) of vertices.
In other words, a DVLG is already a VLG, just a distributed one.
What you will want to do is "undistribute" the DVLG using the VertexListAdaptor:
The vertex list graph adaptor adapts any model of Distributed Vertex List Graph in a Vertex List Graph. In the former type of graph, the set of vertices is distributed across the process group, so no process has access to all vertices. In the latter type of graph, however, every process has access to every vertex in the graph. This is required by some distributed algorithms, such as the implementations of Minimum spanning tree algorithms.
The solution seems to be the "VertexListAdaptor": http://www.boost.org/doc/libs/1_47_0/libs/graph_parallel/doc/html/vertex_list_adaptor.html
I was looking for different data structures for representing graphs, and I came across the NVIDIA CUDA Toolkit, where I found a new way to represent a graph using source_indices and destination_offsets.
Fascinated by this innovative representation, I searched for other ways of representing graphs, but did not find anything new.
I was wondering if there was any other way to represent Graph other than Adjacency Matrix or Lists...
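The two arrays named above appear to describe a compressed sparse column (CSC) layout, which, as far as I know, is how the nvGRAPH library in the CUDA Toolkit describes graph topology. A minimal Python sketch, assuming edges are given as (source, destination) pairs over vertices 0..n-1: the in-neighbors of vertex v are source_indices[destination_offsets[v]:destination_offsets[v + 1]]. The helper names to_csc and in_neighbors are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_csc(num_vertices: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build (destination_offsets, source_indices) from (src, dst) edge pairs."""
    # Count incoming edges per destination vertex.
    in_degree = [0] * num_vertices
    for _, dst in edges:
        in_degree[dst] += 1

    # Prefix sums give the start of each destination's block of sources.
    destination_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        destination_offsets[v + 1] = destination_offsets[v] + in_degree[v]

    # Scatter each source into the block of its destination.
    source_indices = [0] * len(edges)
    next_slot = destination_offsets[:-1]  # slicing makes a copy
    for src, dst in edges:
        source_indices[next_slot[dst]] = src
        next_slot[dst] += 1
    return destination_offsets, source_indices


def in_neighbors(v: int, destination_offsets: List[int], source_indices: List[int]) -> List[int]:
    return source_indices[destination_offsets[v]:destination_offsets[v + 1]]


# Hypothetical 4-vertex directed graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
offsets, sources = to_csc(4, edges)
print(offsets)                            # [0, 1, 2, 5, 5]
print(in_neighbors(2, offsets, sources))  # [0, 1, 3]
```

The layout stores V + 1 offsets plus E source indices, compared with V² cells for an adjacency matrix.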
I was wondering if there was any other way to represent Graph other than Adjacency Matrix or Lists...
There are alternatives to the adjacency list and the adjacency matrix, such as the edge list, the adjacency map, or the forward star representation, to name a few. The same graph can be stored in any of these forms: besides the adjacency matrix and the adjacency list, the edge list is one alternative, and another pretty common one is the forward star representation.
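As a rough illustration, here is a sketch that builds each of these representations for a small hypothetical directed graph given as (u, v) pairs over vertices 0..n-1. In this version the forward star stores the edges sorted by source plus an array of offsets, so the out-neighbors of u are heads[first[u]:first[u + 1]]; the names first, heads and build_representations are my own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def build_representations(n: int, edges: List[Edge]):
    edge_list = sorted(edges)  # edge list: the (u, v) pairs, here sorted by source

    # Adjacency matrix: matrix[u][v] == 1 iff there is an edge u -> v.
    matrix = [[0] * n for _ in range(n)]
    # Adjacency list: one list of out-neighbors per vertex.
    adj_list: List[List[int]] = [[] for _ in range(n)]
    # Adjacency map: vertex -> {neighbor: edge attribute}, here a weight of 1.
    adj_map: Dict[int, Dict[int, int]] = defaultdict(dict)
    for u, v in edge_list:
        matrix[u][v] = 1
        adj_list[u].append(v)
        adj_map[u][v] = 1

    # Forward star: count out-degrees into first[u + 1], then take prefix sums.
    first = [0] * (n + 1)
    for u, _ in edge_list:
        first[u + 1] += 1
    for u in range(n):
        first[u + 1] += first[u]
    heads = [v for _, v in edge_list]  # edge targets, grouped by source
    return matrix, adj_list, dict(adj_map), edge_list, (first, heads)


edges = [(2, 0), (0, 2), (3, 2), (0, 1), (1, 2)]
matrix, adj_list, adj_map, edge_list, (first, heads) = build_representations(4, edges)
print(adj_list)      # [[1, 2], [2], [0], [2]]
print(first, heads)  # [0, 2, 3, 4, 5] [1, 2, 2, 0, 2]
```

Forward star gives up the cheap insertion of an adjacency list in exchange for two flat arrays, which is why it suits static graphs.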
If you get into this research field, you will find a good number of approaches, mainly optimizations for specific cases, taking into account factors such as:
Graph size (number of nodes)
Density of the graph
Directed or undirected graph
Static or dynamic graph
Graph known at compile time or constructed at runtime
Node IDs (labeled sequentially or not)
...
These optimizations can, for example, reorder the nodes in a preprocessing stage to increase locality of reference. There is a lot of work on shortest path algorithms, especially for computing shortest paths on a world map.
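To make the reordering idea concrete, here is one simple preprocessing step, sketched under my own assumptions: relabel vertices in breadth-first order so that neighbors tend to get nearby IDs (the idea behind Cuthill-McKee-style orderings). It is only an illustration; work in this area uses various, often more sophisticated, orderings. The function names are hypothetical, and "bandwidth" (largest ID gap across an edge) is used as a rough locality proxy.

```python
from collections import deque
from typing import Dict, List


def bfs_relabel(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Map old vertex IDs to new IDs 0..n-1 in breadth-first order."""
    new_id: Dict[int, int] = {}
    for root in sorted(adj):  # restart in every connected component
        if root in new_id:
            continue
        new_id[root] = len(new_id)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in sorted(adj[v]):
                if w not in new_id:
                    new_id[w] = len(new_id)  # IDs assigned in enqueue (= BFS) order
                    queue.append(w)
    return new_id


def bandwidth(adj: Dict[int, List[int]], label: Dict[int, int]) -> int:
    """Largest ID gap across any edge: a rough proxy for memory locality."""
    return max(abs(label[u] - label[v]) for u in adj for v in adj[u])


# A path graph whose vertex IDs are scrambled along the path.
path = [0, 7, 3, 5, 1, 6, 2, 4]
adj: Dict[int, List[int]] = {v: [] for v in path}
for u, v in zip(path, path[1:]):
    adj[u].append(v)
    adj[v].append(u)

identity = {v: v for v in adj}
print(bandwidth(adj, identity))          # 7
print(bandwidth(adj, bfs_relabel(adj)))  # 1
```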
One example of such an optimization is the Packed-Memory Graph (PMG), a dynamic graph structure suited to large-scale transportation networks.
There is another representation of graphs using an adjacency set. It is very similar to the adjacency list, but instead of a linked list, each vertex keeps its neighbors in a set data structure, typically a balanced binary search tree.
If E is the number of edges and V is the number of vertices in the graph, then the adjacency set representation of a graph takes O(E + V) space.
Complexities of other operations in the adjacency set representation:
Checking for an edge between vertices v and w: O(log Degree(v))
Iterating over the edges incident to vertex v: O(Degree(v))
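A minimal Python sketch of an adjacency set, assuming an undirected graph with comparable vertex IDs. Python has no built-in balanced search tree, so each neighbor set here is a sorted list searched with bisect: the edge check is O(log Degree(v)) and iteration is O(Degree(v)), but insertion costs O(Degree(v)) because of list shifting (a balanced tree would make it O(log Degree(v))). The class name is hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Iterator, List


class AdjacencySetGraph:
    """Undirected graph; each vertex keeps its neighbors in a sorted list."""

    def __init__(self, vertices: Iterable[int]) -> None:
        self._nbrs: Dict[int, List[int]] = {v: [] for v in vertices}

    def add_edge(self, v: int, w: int) -> None:
        for a, b in ((v, w), (w, v)):  # store both directions
            nbrs = self._nbrs[a]
            i = bisect_left(nbrs, b)
            if i == len(nbrs) or nbrs[i] != b:  # keep it a set: no duplicates
                nbrs.insert(i, b)

    def has_edge(self, v: int, w: int) -> bool:
        nbrs = self._nbrs[v]
        i = bisect_left(nbrs, w)  # binary search: O(log Degree(v))
        return i < len(nbrs) and nbrs[i] == w

    def neighbors(self, v: int) -> Iterator[int]:
        return iter(self._nbrs[v])  # O(Degree(v)) to exhaust


g = AdjacencySetGraph(range(4))
for v, w in [(0, 3), (0, 1), (1, 2), (0, 2), (0, 1)]:
    g.add_edge(v, w)
print(list(g.neighbors(0)))                # [1, 2, 3]
print(g.has_edge(2, 1), g.has_edge(1, 3))  # True False
```

Total storage is one key per vertex plus two entries per undirected edge, which matches the O(E + V) space bound.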
In a recent algorithms course we had to form a condensation graph and compute its reflexive-transitive closure to get a partial order. But it was never really explained why we would want to do that in a graph. I understand the gist of a condensation graph in that it highlights the strongly connected components, but what does the partial order give us that the original graph did not?
The algorithm we implemented went like this:
Find strongly connected components (I used Tarjan's algorithm)
Create condensation graph for the SCCs
Form reflexive-transitive closure of adjacency matrix (I used Warshall's algorithm)
Doing that forms the partial order, but... what advantage does finding the partial order give us?
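The three steps can be sketched in plain Python as follows. This is my own minimal version, assuming the graph is a dict mapping each vertex to its successors; it uses a recursive Tarjan (fine for small graphs, though Python's recursion limit makes an iterative version preferable for large ones) and Warshall's algorithm on a boolean matrix. Afterwards, x <= y holds exactly when y's component is reachable from x's.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def tarjan_scc(adj: Graph) -> List[List[str]]:
    """Step 1: strongly connected components via Tarjan's algorithm."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in adj:
        if v not in index:
            visit(v)
    return components


def condensation(adj: Graph, comps: List[List[str]]) -> Tuple[Dict[str, int], List[List[bool]]]:
    """Step 2: one node per SCC, with an edge wherever the graph crosses SCCs."""
    comp_of = {v: i for i, comp in enumerate(comps) for v in comp}
    n = len(comps)
    dag = [[False] * n for _ in range(n)]
    for v, succs in adj.items():
        for w in succs:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]][comp_of[w]] = True
    return comp_of, dag


def reflexive_transitive_closure(m: List[List[bool]]) -> List[List[bool]]:
    """Step 3: Warshall's algorithm, plus True on the diagonal (reflexivity)."""
    n = len(m)
    r = [row[:] for row in m]
    for i in range(n):
        r[i][i] = True
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = True
    return r


adj = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"], "e": ["c"]}
comps = tarjan_scc(adj)
comp_of, dag = condensation(adj, comps)
order = reflexive_transitive_closure(dag)


def leq(x: str, y: str) -> bool:
    """Partial order on SCCs (a preorder on the original vertices)."""
    return order[comp_of[x]][comp_of[y]]


print(leq("a", "d"), leq("d", "a"), leq("a", "b"))  # True False True
```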
Like any other data structure or algorithm, its advantages only matter if its properties are needed :-)
The result of the procedure you described is a structure that can be used to (easily) answer questions like:
For two nodes x and y: is x<=y and/or y<=x, or neither?
For a node x: which nodes a satisfy a<=x, or x<=a?
These properties can be used to answer other questions about the initial graph (DAG), for example whether adding an edge x->y will produce a cycle. That can be checked by intersecting the set A of all a with a<=x and the set B of all b with y<=b. If the intersection of A and B is not empty, then the edge x->y creates a cycle.
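A small sketch of these queries, assuming the closure is stored as a dict mapping each vertex to the set of vertices it can reach (itself included, since the closure is reflexive), so that x <= y means y is reachable from x. To run on its own, it recomputes the closure with a set-based variant of Warshall's algorithm; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def closure(nodes: List[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reflexive-transitive closure: reach[x] = {y : x <= y} (Warshall, row-wise)."""
    reach = {v: {v} for v in nodes}
    for u, v in edges:
        reach[u].add(v)
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def compare(reach: Dict[str, Set[str]], x: str, y: str) -> Tuple[bool, bool]:
    """(x <= y, y <= x); (False, False) means x and y are incomparable."""
    return y in reach[x], x in reach[y]


def below(reach: Dict[str, Set[str]], x: str) -> Set[str]:
    """All a with a <= x."""
    return {a for a in reach if x in reach[a]}


def creates_cycle(reach: Dict[str, Set[str]], x: str, y: str) -> bool:
    """Would adding x -> y close a cycle? Intersect A = {a <= x} with B = {b : y <= b}."""
    return bool(below(reach, x) & reach[y])


nodes = ["a", "b", "c", "d"]
reach = closure(nodes, [("a", "b"), ("b", "c"), ("a", "d")])
print(compare(reach, "b", "d"))        # (False, False): incomparable
print(creates_cycle(reach, "c", "a"))  # True: a -> b -> c -> a
print(creates_cycle(reach, "d", "c"))  # False
```

The same reach[x] set is what you would mark dirty when calculation x changes.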
The structure can also be used to simplify the implementation of algorithms that use a graph to describe other dependencies. For example, x->y means that the result of calculation x is used in calculation y. If calculation x changes, then all calculations a with x<=a should be re-evaluated or flagged 'dirty', or the result of x should be removed from a cache.
Suppose I have an undirected, weighted, connected graph. I want to group together the vertices that have the highest edge values (vertex degree). Using clustering algorithms is one way to do this. What clustering algorithms can I consider for this task? I hope it is clear; if anything needs clarification, please ask. Thanks.
There are two main approaches: giving your graph as input to an existing tool, or using expert knowledge you have about this graph (and its domain) to create a representation, and then applying machine learning methods to it.
I'll start with the second approach:
If you have only the nodes and edges (no further data for each node), you first need to think of a representation for each node/edge. I'm going to explain it for nodes, but it should be similar for edges.
The simplest approach is to represent each node n as a connectivity vector:
Every node n will be represented as n = (I_a(n), I_b(n), I_c(n), I_d(n), I_e(n)), where I_i(n) = 1 if node n is a 'friend' (neighbor) of node i, and 0 otherwise (e.g., a = (0, 1, 1, 0, 1)).
Note that you can decide whether a node is a friend of itself.
A second approach, quite similar to the first one, is to use a vector of edge weights:
n = (W(a,n), W(b,n), W(c,n), W(d,n), W(e,n)), where W(i,n) is the weight of the edge (i,n).
There are a few more ways to represent nodes, but this is enough in order to run some calculations on it.
Once you have this representation, you can start applying clustering algorithms to it.
k-means is a common choice for this task, and sklearn has a good implementation. It has some parameters you can (and should) configure (e.g., the number of clusters k); note that sklearn's KMeans always uses Euclidean distance, so the distance measure itself is not configurable there.
The output of k-means is k disjoint groups of nodes.
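Here is a dependency-free sketch of both steps: it builds the edge-weight vectors described above (assuming an undirected graph and W(i,n) = 0 when there is no edge) and runs plain Lloyd's k-means with explicitly chosen starting centroids so the result is reproducible. In practice you would hand the vectors to sklearn.cluster.KMeans instead; the function names and the example graph are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def weight_vectors(nodes: List[str], weights: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """n = (W(a,n), W(b,n), ...): one row per node, 0.0 where there is no edge."""
    pos = {n: i for i, n in enumerate(nodes)}
    rows = [[0.0] * len(nodes) for _ in nodes]
    for (u, v), w in weights.items():  # undirected: fill both entries
        rows[pos[u]][pos[v]] = w
        rows[pos[v]][pos[u]] = w
    return rows


def kmeans(points: List[List[float]], centroids: List[List[float]], max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with Euclidean distance; returns a cluster label per point."""
    centroids = [list(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


nodes = ["a", "b", "c", "d", "e"]
weights = {("a", "b"): 5, ("b", "c"): 5, ("a", "c"): 5, ("c", "d"): 1, ("d", "e"): 5}
X = weight_vectors(nodes, weights)
labels = kmeans(X, centroids=[X[0], X[4]])  # seed with the vectors of a and e
print(dict(zip(nodes, labels)))  # {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1}
```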
If you want to pass your graph to an algorithm and get some measures, there are more advanced algorithms you can apply. Community detection is used to find communities in a graph. Again, there is a nice Python implementation in the networkx package.
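networkx offers several community detection routines (for example greedy_modularity_communities in networkx.algorithms.community). To keep this self-contained, here is instead a hand-rolled sketch of weighted label propagation, a simple community detection method: each node repeatedly adopts the label with the largest total edge weight among its neighbors. It is not networkx's implementation; the fixed visiting order and the smallest-label tie-break are my own choices to make the result deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def label_propagation(weights: Dict[Tuple[str, str], float], max_rounds: int = 100) -> List[Set[str]]:
    """Weighted label propagation on an undirected graph; returns communities."""
    nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (u, v), w in weights.items():
        nbrs[u][v] = w
        nbrs[v][u] = w
    nodes = sorted(nbrs)
    label = {v: v for v in nodes}  # every node starts in its own community

    for _ in range(max_rounds):
        changed = False
        for v in nodes:
            score: Dict[str, float] = defaultdict(float)
            for u, w in nbrs[v].items():
                score[label[u]] += w  # total edge weight per neighboring label
            best = max(score.values())
            new = min(lab for lab, s in score.items() if s == best)  # tie-break
            if new != label[v]:
                label[v] = new
                changed = True
        if not changed:  # no node switched label: stable
            break

    groups: Dict[str, Set[str]] = defaultdict(set)
    for v, lab in label.items():
        groups[lab].add(v)
    return list(groups.values())


weights = {("a", "b"): 5, ("b", "c"): 5, ("a", "c"): 5, ("c", "d"): 1, ("d", "e"): 5}
communities = label_propagation(weights)
print(sorted(sorted(g) for g in communities))  # [['a', 'b', 'c'], ['d', 'e']]
```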
Is there an algorithm that, given a graph, computes its vertex connectivity (the minimum number of vertices to remove in order to separate the graph into two connected graphs)? (Note that the graph may already be disconnected.) Thanks!
See:
Determining if a graph is K-vertex-connected
k-vertex connectivity of a graph
When you combine this with a binary search over k, you are done.
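A plain-Python sketch of that combination, assuming an undirected graph given as a dict of neighbor sets. The k-connectivity test relies on Menger's theorem: for two non-adjacent vertices s and t, the number of internally vertex-disjoint s-t paths equals a max flow in which every other vertex is split into an in-node and an out-node joined by a capacity-1 arc. A graph on more than k vertices is k-connected iff every non-adjacent pair has at least k such paths, and since k-connectivity implies (k-1)-connectivity, binary search finds the largest such k. It checks all pairs and is meant for clarity, not speed; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def local_connectivity(adj: Graph, s: str, t: str) -> int:
    """Max number of internally vertex-disjoint s-t paths (s, t non-adjacent)."""
    big = len(adj) + 1  # effectively infinite capacity
    cap: Dict[tuple, Dict[tuple, int]] = {}

    def arc(u: tuple, v: tuple, c: int) -> None:
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual arc

    for v in adj:
        # (v, "in") -> (v, "out") carries the vertex capacity.
        arc((v, "in"), (v, "out"), big if v in (s, t) else 1)
        for w in adj[v]:
            arc((v, "out"), (w, "in"), big)

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Capacities are integers, so pushing one unit per path is valid.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def is_k_connected(adj: Graph, k: int) -> bool:
    if len(adj) <= k:
        return False
    return all(local_connectivity(adj, s, t) >= k
               for s, t in combinations(adj, 2) if t not in adj[s])


def vertex_connectivity(adj: Graph) -> int:
    """Largest k for which the graph is k-connected, found by binary search."""
    lo, hi = 0, max(len(adj) - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_k_connected(adj, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def undirected(*edges: str) -> Graph:
    """Build a graph from two-letter edge strings such as "ab"."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


print(vertex_connectivity(undirected("ab", "bc", "cd", "da")))  # 2 (a 4-cycle)
```

An already disconnected graph comes out as 0, since some non-adjacent pair has no path at all.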
This book chapter should have everything you need to get started; it is basically a survey of algorithms for determining the edge connectivity and vertex connectivity of graphs, with pseudocode for the algorithms it describes. Page 12 has an overview of the available algorithms alongside complexity analyses and references. Most of the solutions for the general case are flow-based, with the exception of one randomized algorithm. The different general solutions optimize for different properties of the graph, so you can choose the asymptotically most efficient one beforehand. Also, for some classes of graphs there exist specialized algorithms with better complexity than the general solutions provide.
Here's my situation. I have a graph with different sets of data being added at different times. For example, set1 might have a few thousand nodes, and then set2 comes in later and we apply business logic to create edges from set1 to set2 (and discard any vertices from set1 that do not have edges to set2). Later we get set3, set4, and so on, and the same process applies between each set and the previous one.
My question: what's the best way to organize this? What I did before was name the nodes set1-xx, set2-xx, etc. The problem I faced was that when I ran analytics between the current set and the previous set, I had to loop through the entire graph and look for all the nodes whose names started with 'setx'. This took longer and longer as the graph grew, so I thought of another solution: create a node called 'set1' and connect it to all nodes of that particular set. I am testing it, but I was wondering if there is a more efficient way, or a built-in way, of handling data structures like this. Is there a way to segment data like this?
I think a general solution would be applicable, but if it helps, I'm using neo4j (so any solution specific to that database would be good as well).
You have a very special type of directed graph, called a layered graph.
The choice of the data structure depends primarily on the expected graph density (how many nodes from a previous set/layer are typically connected to a node in the current set/layer) and on the operations that you need to perform on it most of the time. It is definitely a good idea to have each layer directly represented by a numeric index (that is, the outermost structure will be an array of sets/layers), and presumably you can also use one array of vertices per layer. However, the list of edges per vertex (out only, or in and out sets of edges depending on whether you ever traverse the layers backward) may be any of the following:
Linked list of vertex identifiers; this is good if the graph is very sparse and edges are often added/removed.
Sorted array of vertex identifiers; this is good if the graph is quite sparse and immutable.
Array of booleans, indexed by vertex identifiers, determining whether a given vertex is or is not linked by an edge from the current vertex; this is good if the graph is dense.
The "vertex identifier" can take many forms. For example, it can be an index into the array of vertices on the next layer.
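A minimal Python sketch of that layout, under my own assumptions: layers are numbered 0, 1, 2, ..., each layer is a list of vertex payloads, and each vertex's out-edges are a sorted list of indices into the next layer (the sorted-array option above). The class and method names are hypothetical, and prune_unlinked implements the question's rule of discarding vertices that have no edge into the next set.

```python
from bisect import bisect_left
from typing import Any, List


class LayeredGraph:
    def __init__(self) -> None:
        self.layers: List[List[Any]] = []           # layers[i][v] = payload of vertex v
        self.out_edges: List[List[List[int]]] = []  # out_edges[i][v] = sorted targets in layer i + 1

    def add_layer(self, vertices: List[Any]) -> int:
        self.layers.append(list(vertices))
        self.out_edges.append([[] for _ in vertices])
        return len(self.layers) - 1  # the layer's numeric index

    def add_edge(self, layer: int, v: int, w: int) -> None:
        """Edge from vertex v of `layer` to vertex w of `layer + 1`."""
        targets = self.out_edges[layer][v]
        i = bisect_left(targets, w)
        if i == len(targets) or targets[i] != w:
            targets.insert(i, w)

    def has_edge(self, layer: int, v: int, w: int) -> bool:
        targets = self.out_edges[layer][v]
        i = bisect_left(targets, w)
        return i < len(targets) and targets[i] == w

    def prune_unlinked(self, layer: int) -> None:
        """Drop vertices of `layer` with no edge into `layer + 1`, renumbering the rest."""
        keep = [v for v, targets in enumerate(self.out_edges[layer]) if targets]
        new_index = {old: new for new, old in enumerate(keep)}
        self.layers[layer] = [self.layers[layer][v] for v in keep]
        self.out_edges[layer] = [self.out_edges[layer][v] for v in keep]
        if layer > 0:  # re-point edges that arrive from the previous layer
            self.out_edges[layer - 1] = [
                [new_index[w] for w in targets if w in new_index]
                for targets in self.out_edges[layer - 1]
            ]


g = LayeredGraph()
s1 = g.add_layer(["s1-a", "s1-b", "s1-c"])
s2 = g.add_layer(["s2-x", "s2-y"])
g.add_edge(s1, 0, 1)  # s1-a -> s2-y
g.add_edge(s1, 2, 0)  # s1-c -> s2-x
g.add_edge(s1, 2, 1)  # s1-c -> s2-y
g.prune_unlinked(s1)  # s1-b has no edge into set2
print(g.layers[s1], g.out_edges[s1])  # ['s1-a', 's1-c'] [[1], [0, 1]]
```

Fetching the current and previous sets is then just g.layers[i] and g.layers[i - 1], with no scan over the whole graph.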
Your second solution is what I would do: create a setX node and connect all nodes belonging to that set to setX. That way your data is partitioned and easier to query.