Does the Louvain method used in NetworkX take edge weight into account? - social-networking

I'm using NetworkX to do some community detection. I wonder if there is a community detection method that can take edge weight into account?
I've used the Louvain method in Gephi, but the results suggest it didn't use the edge weights. I want to know whether the Louvain method in NetworkX can use edge weights.

Yes, it does. There are sometimes small differences between implementations over which Python attribute is used as the weight, but handling weighted graphs is one of the fundamental principles of the algorithm (at least after the first level of recursion).
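For instance, with the widely used python-louvain package, the weight is read from an edge attribute whose name you can pass explicitly. A minimal sketch (the toy graph and weights are made up):

```python
import networkx as nx
import community as community_louvain  # the python-louvain package

G = nx.Graph()
G.add_edge("a", "b", weight=5.0)
G.add_edge("b", "c", weight=0.1)
G.add_edge("a", "c", weight=4.0)

# best_partition maps each node to its community id; weight="weight"
# is the default, so pass it explicitly only if your attribute differs.
partition = community_louvain.best_partition(G, weight="weight")
print(partition)
```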

Related

What clustering algorithms can I consider for a graph?

Suppose I have an undirected weighted connected graph. I want to group together vertices that are connected by the highest-valued edges (i.e. by weighted degree). Using clustering algorithms is one way to do this. What clustering algorithms can I consider for this task? I hope this is clear; if anything needs clarification, please ask. Thanks.
There are two main approaches: feeding your graph as input to an existing tool, or using expert knowledge you have about the graph (and its domain) to create a representation, and then applying machine learning methods to it.
I'll start with the second approach:
If you only have the nodes and edges (no further data for each node), you first need to think of a representation for each node/edge. I'm going to explain for nodes, but it should be similar for edges.
The simplest approach is to represent each node n as a connectivity vector:
Every node will be represented as n = (I_a(n), I_b(n), I_c(n), I_d(n), I_e(n)), where I_i(n) = 1 if node n is a 'friend' (neighbor) of node i, and 0 otherwise (e.g. a = (0,1,1,0,1)).
Note that you can decide if a node is a friend of itself.
The second approach, which is quite similar to the first, is to use a vector of edge weights:
n = (W(a,n), W(b,n), W(c,n), W(d,n), W(e,n)), where W(i,n) is the weight of the edge (i,n).
There are a few more ways to represent nodes, but this is enough to start running some calculations.
Once you have this representation, you can start applying clustering algorithms to it.
k-means is considered a good fit for this task, and sklearn has a solid implementation. It has some parameters you can (and should) configure, e.g. the number of clusters and the initialization scheme (note that sklearn's KMeans is tied to Euclidean distance).
The product of k-means is k non-intersecting groups of nodes.
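To make the representation and the clustering step concrete, here is a sketch (the toy graph, node names, and the choice of k = 2 are assumptions):

```python
import networkx as nx
from sklearn.cluster import KMeans

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0),
                           ("c", "d", 0.5), ("d", "a", 1.5)])

nodes = sorted(G.nodes())
# One weighted adjacency row per node: n = (W(a,n), W(b,n), ...).
# Use (A > 0).astype(int) instead for the 0/1 connectivity vectors.
A = nx.to_numpy_array(G, nodelist=nodes, weight="weight")

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(A)
for node, label in zip(nodes, kmeans.labels_):
    print(node, "->", label)
```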
If you want to pass your graph to an algorithm and get some measures back, there are more advanced algorithms you can apply. Community detection is used to find communities in a graph; again, there is a nice Python implementation in the networkx package.
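NetworkX itself also ships community detection routines, for example greedy modularity maximization. A minimal sketch on one of its built-in example graphs:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
# Returns a list of frozensets of nodes, one per detected community.
communities = greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(i, sorted(c))
```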

Is it possible to use A* search for non-grid graphs

I know A* is considered the optimal algorithm for finding the shortest path, but I cannot see how any heuristic can work on a non-lattice graph. This made me wonder whether A* is in fact capable of being used on an undirected or directed graph. If A* is capable of doing this, what kind of heuristic could one use? If not, what is the currently fastest algorithm for calculating the shortest path on a directed or undirected non-lattice graph? Please comment if any more information is required.
It is possible, yes, but A* is popular mainly for finding shortest paths in grid-like graphs. For general graphs there are many other shortest-path algorithms that may match your case better; which one to use depends on your setting.
If you plan to do a Single Source Shortest Path (SSSP) query, where you try to find the shortest path from one node to another, and your graph is unweighted, you could use Breadth-First Search; a bidirectional version of this algorithm performs well in practice. If you do SSSP on a weighted graph, Dijkstra's algorithm is a common choice, and a bidirectional version can be used as well. For all-pairs shortest paths, other algorithms such as Floyd-Warshall or Johnson's algorithm are useful.
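Coming back to A*: if the nodes of your graph carry spatial coordinates, straight-line distance is an admissible heuristic, so A* works on arbitrary graphs, not just lattices. A minimal NetworkX sketch (the toy graph, coordinates, and weights are assumptions):

```python
import math
import networkx as nx

G = nx.Graph()
coords = {"s": (0, 0), "m": (1, 1), "t": (2, 0)}
for n, (x, y) in coords.items():
    G.add_node(n, pos=(x, y))
G.add_edge("s", "m", weight=1.5)
G.add_edge("m", "t", weight=1.5)
G.add_edge("s", "t", weight=4.0)

def euclidean(u, v):
    # Straight-line distance never overestimates the true path cost
    # here (all edge weights exceed it), so it is admissible for A*.
    (x1, y1), (x2, y2) = G.nodes[u]["pos"], G.nodes[v]["pos"]
    return math.hypot(x1 - x2, y1 - y2)

print(nx.astar_path(G, "s", "t", heuristic=euclidean, weight="weight"))
```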
If you want heuristics that estimate the distance before the search is done even without such an embedding, you can do precomputations, which are mostly applicable to each of the algorithms mentioned above. Some examples:
shortcut calculation (ARC-Flags, SHARC, KFlags)
hub identification (also transit nodes): precompute distances between all hub nodes (this has to be done only once in non-dynamic graphs), find the nearest hub of the source and of the sink (e.g. with Dijkstra), then add up the hub-to-hub distance, the source-to-hub distance, and the sink-to-hub distance
structured look-up tables, e.g. distances between hubs and distances between a hub and the nodes within a given distance of it. After precomputation you never need to traverse the graph again; your shortest-path calculation becomes a series of look-ups. This costs a lot of memory but gives strong performance. You can reduce the memory by using upper approximations for the distance calculations.
I would highly recommend that you identify your exact case and look for graph algorithms well suited to it. A lot of shortest-path research is being done across dozens of fields of application.

Strategy to build test graphs for Dijkstra's algorithm?

I recently implemented Dijkstra's algorithm to practice Java. I'm now considering how to build random test graphs (with unidirectional edges).
Currently, I use a naive method. Nodes are created at random locations in 2d space (where x and y are unsigned integers between 0 and some MAX_SPACE constant). Edges are randomly created to connect the nodes, so that each node has an outdegree of at least 1 (and at most MAX_DEGREE). Indegree is not enforced. Then I search for a path between the first and last Nodes in the set, which may or may not be connected.
In a more realistic situation, nodes would have a probability of being connected proportional to their proximity in 2d space. What is a good strategy to build random test graphs with that property?
NOTES
I will primarily use this to build graphs that can be drawn and verified by hand, but scaling to larger graphs is a consideration.
The strategy should be easily modified to support the following constants (and maybe others -- let me know if you think of any interesting ones):
MIN_NODES, MAX_NODES: a range of sizes for the graph
CONNECTEDNESS: average out-degree
PROXIMITY: weight given to preferring to connect proximal nodes
You could start by looking at the different random graph generators available in JUNG (Java library):
Barabasi Albert Generator - Simple evolving scale-free random graph generator. At each time step, a new vertex is created and is connected to existing vertices according to the principle of "preferential attachment", whereby vertices with higher degree have a higher probability of being selected for attachment.
Eppstein Power Law Generator - Graph generator that generates undirected graphs with power-law degree distributions.
There are various other generators available too - See Listing Here
For Python there is the NetworkX library, which also provides many graph generators - Listed Here
With many of these generators you can specify the size, so you can start small and go from there.
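For the proximity requirement from the question specifically, NetworkX's random geometric graph generator is a close fit (it produces undirected graphs, so you would orient the edges yourself for the unidirectional case; the size and radius below are assumptions to tune):

```python
import networkx as nx

# Nodes are dropped uniformly at random in the unit square and connected
# whenever they lie within `radius` of each other, which captures the
# "connect proximal nodes" property directly.
G = nx.random_geometric_graph(n=15, radius=0.35, seed=42)

# Positions are stored in the "pos" node attribute, handy for edge weights:
for u, v in G.edges():
    (x1, y1), (x2, y2) = G.nodes[u]["pos"], G.nodes[v]["pos"]
    G[u][v]["weight"] = math.hypot(x1 - x2, y1 - y2)
```

(The `import math` for `math.hypot` is assumed; larger `radius` raises the average degree, playing the role of the CONNECTEDNESS constant.)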

Weighted n-coloring problem algorithms

I have 100 vertices and a function f(x,y) that computes the weight of the edge between vertex x and vertex y. f is not particularly expensive, so I could generate an indexed adjacency list with weights, if necessary.
What are some efficient, tractable methods for optimizing the n-coloring of these vertices by minimizing or maximizing the sum of the weights of all edges connecting vertices of the same color?
I imagine simulated annealing could be useful in this circumstance.
Links to code packages would also be super useful so I don't have to reinvent the wheel!
Thanks!
A very handy Python package for experimenting with graphs is NetworkX. If you prefer C++, there's also Boost, but using graphs in Boost will seem ridiculously clumsy after NetworkX.
Simulated annealing isn't a bad idea. You can do a regular coloring first to find a lower bound which will help direct your search. You should define your problem more precisely, though. Do you mean to pick some pivot value for the sum of incoming edges and try to partition the colors around the pivot?
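To make the simulated-annealing suggestion concrete, here is a minimal sketch for the minimization variant; the move strategy and cooling schedule are assumptions to tune, not tested values:

```python
import math
import random

def anneal(n, k, f, steps=20000, t0=1.0, cooling=0.999):
    """Minimize the total weight of same-colored edges by simulated annealing.

    n: number of vertices, k: number of colors, f(x, y): edge weight.
    """
    # f is cheap but not free, so tabulate the weights once up front.
    w = [[f(x, y) if x != y else 0.0 for y in range(n)] for x in range(n)]
    colors = [random.randrange(k) for _ in range(n)]
    t = t0
    for _ in range(steps):
        v = random.randrange(n)
        old, new = colors[v], random.randrange(k)
        if new == old:
            continue
        # Cost delta of recoloring v: same-colored edges gained under
        # `new` minus those lost under `old`.
        delta = sum(w[v][u] * ((colors[u] == new) - (colors[u] == old))
                    for u in range(n) if u != v)
        # Accept improvements always, uphill moves with Boltzmann probability.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            colors[v] = new
        t *= cooling
    return colors
```

For the question's setting this would be called as `anneal(100, n_colors, f)`; for maximization, flip the sign of `delta`.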

Graph Isomorphism

Is there an algorithm or heuristics for graph isomorphism?
Corollary: a graph can be represented by many different drawings.
What is the best approach to find different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to simplify the graph into a canonical form and then compare canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
Once you have canonical forms, you can perform the isomorphism comparison (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree (e.g. think of a spanning tree T and a single edge added to it to create T'; these two graphs are not isomorphic, but they have the same spanning tree).
Other techniques involve comparing descriptors (e.g. number of nodes, number of edges), which can produce false positives in general.
I suggest you start with the wiki page about the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary: every possible spatial arrangement of a given graph's vertices is isomorphic to it. So two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, to find those isomorphic drawings enjoying particular properties (e.g. with non-crossing edges, if one exists); that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
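If you just want to use VF2 rather than implement it, NetworkX exposes a VF2-based matcher. A minimal sketch (the two toy graphs are assumptions):

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Two drawings of the same 4-cycle under different node names.
G1 = nx.cycle_graph(4)
G2 = nx.relabel_nodes(nx.cycle_graph(4), {0: "a", 1: "b", 2: "c", 3: "d"})

gm = isomorphism.GraphMatcher(G1, G2)  # VF2-based matcher
print(gm.is_isomorphic())  # True
print(gm.mapping)          # one concrete node-to-node mapping
```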
A very similar problem - graph automorphism - can be solved by saucy, which is available in source code. This finds all symmetries of a graph. If you have two graphs, join them into one and any isomorphism can be discovered as an automorphism of the join.
Disclaimer: I am one of the co-authors of saucy.
There are algorithms to do this -- however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth is either writing or has written on this subject in The Art of Computer Programming during his second pass at (re)writing the series.
As for a simple way to do something that might work in practice on small graphs, I would recommend counting degrees, and then, for each vertex, also noting the set of degrees of its adjacent vertices. This gives you a set of potential vertex isomorphisms for each point. Then just try all of those (via brute force, but choosing vertices in increasing order of the size of their potential-isomorphism sets) from the restricted set. Intuitively, most graph isomorphisms can be computed practically this way, though clearly there will be degenerate cases that take a long time.
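A sketch of the pruning idea from this answer: give each vertex a signature built from its own degree and the sorted multiset of its neighbours' degrees, and only try mappings between vertices with equal signatures (the brute-force matching step itself is omitted; the adjacency-dict format is an assumption):

```python
from collections import defaultdict

def signatures(adj):
    """adj maps each vertex to the set of its neighbours."""
    deg = {v: len(ns) for v, ns in adj.items()}
    # Signature: own degree plus the sorted multiset of neighbour degrees.
    return {v: (deg[v], tuple(sorted(deg[u] for u in ns)))
            for v, ns in adj.items()}

def candidate_map(adj1, adj2):
    """For each vertex of graph 1, the vertices of graph 2 it could map to."""
    sig1, sig2 = signatures(adj1), signatures(adj2)
    by_sig = defaultdict(set)
    for v, s in sig2.items():
        by_sig[s].add(v)
    return {v: by_sig[s] for v, s in sig1.items()}
```

If any candidate set comes back empty, the graphs cannot be isomorphic; otherwise the brute-force search runs only over these restricted sets.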
I recently came across the following paper: http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"
My project - Griso - at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this is the salt of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C and run on a considerable number of different systems.
the AutGroupGraph command in the GRAPE package for GAP.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasising about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search, in such a way that you first use breadth-first search intensively, then set a limit for the breadth analysis and go deeper after checking a few neighbours, lowering the breadth by some amount at each step. This is practically how I find my way on a map: first locate myself with breadth-first search, then search the route with depth-first search - largely, and this is the best evolution my brain has ever invented. :) In the long term some intelligence could be added to increase the breadth-first neighbour count at critical vertices - for example where a large number of neighbouring vertices have the same edge count. Like checking your actual route sometimes with the car (without a GPS).
I've found out that the algorithm belongs to the category of k-dimensional Weisfeiler-Lehman algorithms, and it fails on regular graphs. For more, see:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem to find isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N iterations (where N is the radius of the graph). On each iteration, for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace the node's hash with the newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2 hops away from it. On the Nth step, a node's hash will be affected by the neighborhood N hops around it, so you only need to continue running the Powerhash for N = graph_radius steps. In the end, the hash of the graph's center node will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to determine whether two graphs are isomorphic. If you have labels, include them (on the first step) in the internal hashes that you calculate for each node.
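A compressed sketch of the hashing loop described above (SHA-256 stands in for whatever hash function the original uses, and the plain-dict adjacency format is an assumption):

```python
import hashlib

def neighborhood_hash(adj, iterations):
    """adj maps node -> iterable of neighbours; iterations ~ graph radius."""
    # Initial node hashes; label hashes would go here if labels exist.
    h = {v: hashlib.sha256(b"seed").hexdigest() for v in adj}
    for _ in range(iterations):
        # Each round, a node's hash becomes the hash of its neighbours'
        # sorted, concatenated hashes from the previous round.
        h = {v: hashlib.sha256(
                 "".join(sorted(h[u] for u in adj[v])).encode()).hexdigest()
             for v in adj}
    # Graph-level hash: sorted, concatenated final node hashes.
    return hashlib.sha256("".join(sorted(h.values())).encode()).hexdigest()
```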
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
