Adjacency List structure in HBase

Adjacency List structure in HBase - hadoop

I'm trying to implement the following graph reduction algorithm in
The graph is an undirected weighted graph
I want to strip away all nodes with only two neighbors
and update the weights
Have a look at the following illustration:
Algorithm reduce graph http://public.kungi.org/graph-reduction.png
The algorithm shall transform the upper graph into the lower one. Eliminate node 2 and update the weight of the edge to: w(1-3) = w(1-2)+w(2-3)
Since I have a very large graph I'm doing this with MapReduce.
My Question is how to represent the graph in HBase. I thought about building an adjacency list structure in HBase like this:
Column families: nodes, neighbors
1 -> 2, 6, 7
...
Is there a nicer way to do this?

Adjacency lists are the most frequently recommended structure.
You could use each node ID as the row ID and neighbor IDs as column qualifiers, with the weights as values.

Related

faster graph traversal algorithms compared to dfs

I have an undirected unweighted graph represented using adjacency matrix where each node of the graph represents a space partition (e.g. State) while the edges represent the neiborhood relationship (i.e. neighboring states sharing common boundaries). My baseline algorithm uses DFS to traverse the graph and form subgraphs after each step (i.e. adding the new node visited which would result in a bunch of contiguous states). With that subgraph I perform a statistical significance test on certain patterns which exist in the nodes of the graph (i.e. within the states).
At this point I am essentially trying to make the traversal step faster.
I was wondering if you all could suggest any algorithm or resources (e.g. research paper) which performs graph traversal computationally faster than DFS.
Thanks for your suggestion and your time!

Most graph algorithms contain "for given vertex u, list all its neighbors v" as a primitive. Not sure, but sounds like you might want to speed up this piece. Indeed, each country has only few neighbors, typically much less than the total number of countries. If this is the case, replace adjacency matrix graph representation with adjacency lists.
Note that the algorithm itself (DFS or other) will likely remain the same, with just a few changes where it uses this primitive.

Reduce graph by removing transitive nodes

I'm working on a problem that requires removing transitive nodes in a graph. More specifically I need to reduce the number of edges by removing the nodes from the path between two sets of nodes.
Picture says a thousand words so here is what I'm trying to do
The graph contains 3 types of nodes (Ai, Bi, Ci). I'd like to reduce the graph by removing all the nodes Bi on a path between nodes Ai and Ci, whilst preserving reachability between the Ai,Ci nodes.
This is a tripartite graph, indeed, and I'm wondering if there is an efficient algorithm that can reduce it as per the description shown in the attached picture.

If we let A denote the adjacency matrix between the As and Bs and B denote the adjacency matrix between the Bs and Cs, then the adjacency matrix of the resulting graph is the Boolean matrix product AB. In theory the fast matrix multiplication algorithms apply (if you have dense matrices), but in practice I doubt that they'll help much.

An extension of bipartite graph maximum matching

Assume that there are two graphs like this:
We aim to find the matching correspondences between the two graph.And now we use a method to calculate the similarity of two nodes between the two graphs.
w(A,1) means the similarity of the node A from the left graph between the node 1 from the right graph. Then we can have table like this:
Our target is to calculate the maximum weight matching of all this nodes. And we can use the algorithm Kuhn－Munkras to solve this problem.
But now the question is that is if we add the similarity between edges from the two graphs,how can we calulate the maximum weight matching. It means that the table become this:
AA means the node A, and AB means the edge from A to B. The constraints are that if the final result is that node A matches node 1,the edge AB must matches 12 or 13.So can we use a algorithm like Kuhn－Munkras to solve this problem? If not , how can we find the the maximum weight matching in polynomial time?

Suppose we want to know if two graphs are isomorphic, e.g. the two in your example.
In the first graph we have edges AC and CB, while in the second graph we have edges 13 and 32.
We can set the weight matrix such that there is a high reward for mapping any edge in the first to an edge in the second.
i.e. AC->13 and AC->32 and CB->13 and CB->32 will all have weight 1, while all other matchings have weight zero.
There is an isomorphism between the graphs if and only if there is a maximum weight matching with weight equal to the number of edges.
Therefore your problem is at least as hard as graph isomorphism so it is unlikely that the Kuhn algorithm can be efficiently extended to this case.

BOOST graph queries

I have the following queries:
I want to know how to create a graph dynamically
How to manage multiple weights on a graph
How to find the distance from a particular node to another in a minimum spanning tree using kruskal.In kruskals the minimum spanning tree is output as a vector of edges.Hence the vertices are not explicitly stored. I do not know how to get the distance for say an example node 0 to the node furthest from it. I tried getting the vertices using sourc and target and then storing the verices in an array.After that, locating node 0 and from there iterating and reverse iterating through the vertices calculating the weights to find the largest diatance from the node 0. But I fell I'm using the most round about way of going about it.There must be a function for this, or perhaps a clevere way of going about this.
Does kruskal store the edges in the spanning tree in order of the spanning tree? Or at least is the first node of the first edge stored the actual first node? How can I get the order of the nodes in spanning tree in kruskals?
Similarly how can I get the weight of the spanning tree using Prim? The way I did it was to use the predecessor array where predecessors are stored and find what edge in weightsMap and add it.Is there an easier way? And in prims the distanceMap stores the distance from node 0 to the others in the original graph and not the spanning tree right?

Making a fully connected graph using a distance metric

Say I have a series of several thousand nodes. For each pair of nodes I have a distance metric. This distance metric could be a physical distance ( say x,y coordinates for every node ) or other things that make nodes similar.
Each node can connect to up to N other nodes, where N is small - say 6.
How can I construct a graph that is fully connected ( e.g. I can travel between any two nodes following graph edges ) while minimizing the total distance between all graph nodes.
That is I don't want a graph where the total distance for any traversal is minimized, but where for any node the total distance of all the links from that node is minimized.
I don't need an absolute minimum - as I think that is likely NP complete - but a relatively efficient method of getting a graph that is close to the true absolute minimum.

I'd suggest a greedy heuristic where you select edges until all vertices have 6 neighbors. For example, start with a minimum spanning tree. Then, for some random pairs of vertices, find a shortest path between them that uses at most one of the unselected edges (using Dijkstra's algorithm on two copies of the graph with the selected edges, connected by the unselected edges). Then select the edge that yielded in total the largest decrease of distance.

You can use a kernel to create edges only for nodes under a certain cutoff distance.
If you want non-weighted edges You could simply use a basic cutoff to start with. You add an edge between 2 points if d(v1,v2) < R
You can tweak your cutoff R to get the right average number of edges between nodes.
If you want a weighted graph, the preferred kernel is often the gaussian one, with
K(x,y) = e^(-d(x,y)^2/d_0)
with a cutoff to keep away nodes with too low values. d_0 is the parameter to tweak to get the weights that suits you best.
While looking for references, I found this blog post that I didn't about, but that seems very explanatory, with many more details : http://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
This method is used in graph-based semi-supervised machine learning tasks, for instance in image recognition, where you tag a small part of an object, and have an efficient label propagation to identify the whole object.
You can search on google for : semi supervised learning with graph

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Adjacency List structure in HBase - hadoop

Adjacency lists are the most frequently recommended structure. You could use each node ID as the row ID and neighbor IDs as column qualifiers, with the weights as values.

Related

faster graph traversal algorithms compared to dfs

Reduce graph by removing transitive nodes

An extension of bipartite graph maximum matching

BOOST graph queries

Making a fully connected graph using a distance metric

Categories

Resources