I recently implemented Dijkstra's algorithm to practice Java. I'm now considering how to build random test graphs (with unidirectional edges).
Currently, I use a naive method. Nodes are created at random locations in 2D space (where x and y are unsigned integers between 0 and some MAX_SPACE constant). Edges are randomly created to connect the nodes, so that each node has an out-degree of at least 1 (and at most MAX_DEGREE). In-degree is not enforced. Then I search for a path between the first and last nodes in the set, which may or may not be connected.
In a more realistic situation, nodes would have a probability of being connected that depends on their proximity in 2D space. What is a good strategy to build random test graphs with that property?
NOTES
I will primarily use this to build graphs that can be drawn and verified by hand, but scaling to larger graphs is a consideration.
The strategy should be easily modified to support the following constants (and maybe others -- let me know if you think of any interesting ones):
MIN_NODES, MAX_NODES: a range of sizes for the graph
CONNECTEDNESS: average out-degree
PROXIMITY: weight given to preferring to connect proximal nodes
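One way to get the proximity property is a Waxman-style model, where the chance of an edge decays exponentially with distance. The sketch below is a hypothetical generator, not taken from any library: MIN_NODES, MAX_NODES, MAX_SPACE, CONNECTEDNESS and PROXIMITY play the roles described above (their values are placeholders), and each node's out-degree is drawn uniformly from 1 to round(2 * CONNECTEDNESS) - 1, so its mean is about CONNECTEDNESS. Edge weights are Euclidean distances, so the result can be fed straight to Dijkstra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of a Waxman-style generator: nearby nodes are more likely to be linked.
class ProximityGraphGenerator {
    // Hypothetical constants mirroring the question; tune to taste.
    static final int MIN_NODES = 5;
    static final int MAX_NODES = 12;
    static final int MAX_SPACE = 100;
    static final double CONNECTEDNESS = 2.0; // target average out-degree (>= 1)
    static final double PROXIMITY = 6.0;     // 0 = ignore distance, larger = prefer close nodes

    record Node(int x, int y) {}
    record Edge(int from, int to, double weight) {}

    static List<Node> randomNodes(Random rnd) {
        int n = MIN_NODES + rnd.nextInt(MAX_NODES - MIN_NODES + 1);
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            nodes.add(new Node(rnd.nextInt(MAX_SPACE + 1), rnd.nextInt(MAX_SPACE + 1)));
        }
        return nodes;
    }

    static double dist(Node a, Node b) {
        return Math.hypot(a.x() - b.x(), a.y() - b.y());
    }

    static List<Edge> randomEdges(List<Node> nodes, Random rnd) {
        int n = nodes.size();
        double maxDist = Math.sqrt(2.0) * MAX_SPACE; // normalizes distances to [0, 1]
        // Out-degree is uniform on 1..maxOut, so its mean is about CONNECTEDNESS.
        int maxOut = Math.max(1, (int) Math.round(2 * CONNECTEDNESS) - 1);
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            double[] w = new double[n];
            for (int j = 0; j < n; j++) {
                // Waxman-style weight: decays exponentially with normalized distance.
                w[j] = (j == i) ? 0.0
                        : Math.exp(-PROXIMITY * dist(nodes.get(i), nodes.get(j)) / maxDist);
            }
            int outDegree = Math.min(n - 1, 1 + rnd.nextInt(maxOut));
            for (int k = 0; k < outDegree; k++) {
                int j = pickWeighted(w, rnd);
                w[j] = 0.0; // sample without replacement: no duplicate edges
                edges.add(new Edge(i, j, dist(nodes.get(i), nodes.get(j))));
            }
        }
        return edges;
    }

    // Roulette-wheel selection: index j is chosen with probability w[j] / sum(w).
    static int pickWeighted(double[] w, Random rnd) {
        double total = 0;
        for (double x : w) total += x;
        double r = rnd.nextDouble() * total;
        int last = -1;
        for (int j = 0; j < w.length; j++) {
            if (w[j] <= 0) continue;
            last = j;
            r -= w[j];
            if (r < 0) return j;
        }
        return last; // guards against floating-point round-off
    }
}
```

With PROXIMITY = 0 every other node is equally likely, which reproduces the naive method; raising it makes long edges increasingly rare.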
You could start by looking at the different random graph generators available in JUNG (Java library):
Barabasi Albert Generator - Simple evolving scale-free random graph generator. At each time step, a new vertex is created and is connected to existing vertices according to the principle of "preferential attachment", whereby vertices with higher degree have a higher probability of being selected for attachment.
Eppstein Power Law Generator - Graph generator that generates undirected graphs with power-law degree distributions.
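For intuition about what the Barabasi-Albert generator does, here is a small from-scratch sketch of preferential attachment. It is not JUNG's API (the class and method names are made up); it uses the common trick of keeping a list in which each vertex appears once per incident edge, so a uniform draw from that list favors high-degree vertices.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// From-scratch sketch of Barabasi-Albert preferential attachment (not JUNG's API).
class BarabasiAlbertSketch {
    /** Returns undirected edges {newVertex, oldVertex}; each new vertex adds m edges. */
    static List<int[]> generate(int n, int m, Random rnd) {
        if (m < 1 || m >= n) throw new IllegalArgumentException("need 1 <= m < n");
        List<int[]> edges = new ArrayList<>();
        // Each vertex appears here once per incident edge, so a uniform draw
        // picks a vertex with probability proportional to its degree.
        List<Integer> repeated = new ArrayList<>();
        List<Integer> targets = new ArrayList<>();
        for (int v = 0; v < m; v++) targets.add(v); // first new vertex links to m seed vertices
        for (int source = m; source < n; source++) {
            for (int t : targets) {
                edges.add(new int[] {source, t});
                repeated.add(t);
                repeated.add(source);
            }
            // Choose m distinct existing vertices for the next new vertex.
            Set<Integer> chosen = new LinkedHashSet<>();
            while (chosen.size() < m) {
                chosen.add(repeated.get(rnd.nextInt(repeated.size())));
            }
            targets = new ArrayList<>(chosen);
        }
        return edges;
    }
}
```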
There are various other generators available too; see the listing here.
For Python, the NetworkX library also provides many graph generators (listed here).
With many of these generators you can specify the size, so you can start small and go from there.
Related
I have a set of objects (between 1 and roughly 500). Each object is compatible with certain (zero or more) other objects from that same set.
Can anyone give me some pointers as to how to determine the best way to create pairs of objects that are compatible with each other so that most of the objects in the set are paired?
You're looking for a maximum matching in a general graph. As opposed to the stable marriage problem with which you are familiar, in the maximum matching problem the input graph is not necessarily bipartite. There is no notion of stability (as vertices do not rank their compatible options) and what you're looking for is a subset of the edges of the graph such that no two edges share a common vertex (a.k.a., a matching). You're trying to construct that matching which contains the maximum possible number of edges.
Luckily, the problem of finding a maximum matching in a general graph can be solved in polynomial time using Edmonds' matching algorithm (also known as the blossom algorithm, because it contracts blossoms, i.e. odd cycles, into single vertices). The time complexity of Edmonds' matching algorithm is O(E•V^2). While not very efficient, I believe this is good enough for the relatively small graphs you're dealing with. You don't even have to implement it from scratch, as there's an open-source Java implementation of Edmonds' algorithm you can use. However, if you're interested in the state of the art, you can use the most efficient algorithm known for the problem, which runs in O(E•sqrt(V)).
If the vertex compatibility in your input is not dichotomous (that is, each vertex ranks its neighbors by preference), you can add corresponding weights to the edges to capture the preference profile and use the variant of Edmonds' algorithm for weighted graphs.
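To show the moving parts (alternating BFS, blossom contraction, augmentation), here is a compact sketch of Edmonds' blossom algorithm for the unweighted case, following a widely used textbook-style formulation that runs in roughly O(V^3). It is not the open-source implementation mentioned above, it does not handle the weighted variant, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Edmonds' blossom algorithm (unweighted, general graphs), roughly O(V^3).
class BlossomMatching {
    private final int n;
    private final List<List<Integer>> adj = new ArrayList<>();
    private final int[] match, parent, base, queue;
    private final boolean[] used, blossom;

    BlossomMatching(int n) {
        this.n = n;
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        match = new int[n]; parent = new int[n];
        base = new int[n]; queue = new int[n];
        used = new boolean[n]; blossom = new boolean[n];
    }

    void addEdge(int u, int v) { adj.get(u).add(v); adj.get(v).add(u); }

    // Lowest common ancestor of a and b in the alternating tree, working on blossom bases.
    private int lca(int a, int b) {
        boolean[] seen = new boolean[n];
        while (true) {
            a = base[a];
            seen[a] = true;
            if (match[a] == -1) break; // reached the root
            a = parent[match[a]];
        }
        while (true) {
            b = base[b];
            if (seen[b]) return b;
            b = parent[match[b]];
        }
    }

    // Mark the path from v up to base b as part of a blossom, re-pointing parents around the cycle.
    private void markPath(int v, int b, int child) {
        while (base[v] != b) {
            blossom[base[v]] = blossom[base[match[v]]] = true;
            parent[v] = child;
            child = match[v];
            v = parent[match[v]];
        }
    }

    // BFS for an augmenting path from a free root; returns the free endpoint reached, or -1.
    private int findPath(int root) {
        Arrays.fill(used, false);
        Arrays.fill(parent, -1);
        for (int i = 0; i < n; i++) base[i] = i;
        used[root] = true;
        int head = 0, tail = 0;
        queue[tail++] = root;
        while (head < tail) {
            int v = queue[head++];
            for (int to : adj.get(v)) {
                if (base[v] == base[to] || match[v] == to) continue;
                if (to == root || (match[to] != -1 && parent[match[to]] != -1)) {
                    // Odd cycle: contract the blossom into its base vertex.
                    int cur = lca(v, to);
                    Arrays.fill(blossom, false);
                    markPath(v, cur, to);
                    markPath(to, cur, v);
                    for (int i = 0; i < n; i++) {
                        if (blossom[base[i]]) {
                            base[i] = cur;
                            if (!used[i]) { used[i] = true; queue[tail++] = i; }
                        }
                    }
                } else if (parent[to] == -1) {
                    parent[to] = v;
                    if (match[to] == -1) return to; // augmenting path found
                    used[match[to]] = true;
                    queue[tail++] = match[to];
                }
            }
        }
        return -1;
    }

    /** Returns match[], where match[v] is v's partner or -1 if v is unmatched. */
    int[] solve() {
        Arrays.fill(match, -1);
        for (int i = 0; i < n; i++) {
            if (match[i] != -1) continue;
            int v = findPath(i);
            while (v != -1) { // flip matched/unmatched edges along the path
                int pv = parent[v], ppv = match[pv];
                match[v] = pv; match[pv] = v;
                v = ppv;
            }
        }
        return match.clone();
    }
}
```

For the compatibility problem, create one vertex per object, call addEdge for every compatible pair, and read the pairs off the returned array.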
Suppose I have an undirected, weighted, connected graph. I want to group together the vertices joined by the highest edge weights (i.e. by weighted vertex degree). Clustering algorithms are one way to do this. Which clustering algorithms can I consider for this task? I hope this is clear; please ask if anything needs clarification. Thanks.
There are two main approaches: giving your graph as input to an existing tool, or using expert knowledge you have about this graph (and its domain) to create a representation and then applying machine learning methods to it.
I'll start with the second approach:
If you have only the nodes and edges (no further data for each node), you first need to think of a representation for each node/edge. I'll explain it for nodes, but it should be similar for edges.
The simplest approach is to represent each node n as a connectivity vector:
Every node will be represented as n=(Ia(n),Ib(n),Ic(n),Id(n),Ie(n)), where Ii(n)=1 if node n is a 'friend' (neighbor) of node i, and 0 otherwise (e.g. a=(0,1,1,0,1)).
Note that you can decide if a node is a friend of itself.
A second approach, quite similar to the first, is to use a vector of edge weights:
n=(W(a,n),W(b,n),W(c,n),W(d,n),W(e,n)) , where W(i,n) is the weight of the edge (i,n).
There are a few more ways to represent nodes, but this is enough in order to run some calculations on it.
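Both representations can be built from a weighted adjacency matrix. This minimal sketch assumes the graph is stored as a symmetric double[][] in which 0 means "no edge" (so zero-weight edges cannot be represented); the includeSelf flag covers the choice of whether a node is its own friend. Class and method names are hypothetical.

```java
// Builds the connectivity and edge-weight vectors described above.
class NodeVectors {
    /** Connectivity vector of node v: entry i is 1 if node i is a neighbor of v (optionally v itself). */
    static double[][] connectivityVectors(double[][] weight, boolean includeSelf) {
        int n = weight.length;
        double[][] vec = new double[n][n];
        for (int v = 0; v < n; v++)
            for (int i = 0; i < n; i++)
                vec[v][i] = (weight[i][v] != 0 || (includeSelf && i == v)) ? 1.0 : 0.0;
        return vec;
    }

    /** Weight vector of node v: entry i is W(i, v), the weight of edge (i, v), or 0 if absent. */
    static double[][] weightVectors(double[][] weight) {
        int n = weight.length;
        double[][] vec = new double[n][n];
        for (int v = 0; v < n; v++)
            for (int i = 0; i < n; i++)
                vec[v][i] = weight[i][v];
        return vec;
    }
}
```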
Once you have this representation, you can start applying clustering algorithms to it.
k-means is a common choice for this task, and sklearn (scikit-learn) has a good implementation. It has some parameters you can (and should) configure (e.g. the number of clusters k and the initialization method); note that sklearn's KMeans always uses Euclidean distance.
The output of k-means is k disjoint groups of nodes.
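To make the clustering step concrete without depending on sklearn, here is a minimal Lloyd's k-means in Java using Euclidean distance on node vectors such as the ones above. It is a sketch, not a replacement for a tuned library: initialization is a plain random choice of k distinct points (no k-means++), an empty cluster simply keeps its previous centroid, and k must not exceed the number of nodes.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal Lloyd's k-means: returns a cluster id in 0..k-1 for every point.
class KMeansSketch {
    static int[] cluster(double[][] points, int k, int maxIter, Random rnd) {
        int n = points.length, dim = points[0].length;
        // Initialize centroids with k distinct points chosen at random.
        int[] perm = shuffledIndices(n, rnd);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[perm[c]].clone();
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (sqDist(points[p], centroids[c]) < sqDist(points[p], centroids[best])) best = c;
                if (assign[p] != best) { assign[p] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its points.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int p = 0; p < n; p++) {
                count[assign[p]]++;
                for (int d = 0; d < dim; d++) sum[assign[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0) // an empty cluster keeps its old centroid
                    for (int d = 0; d < dim; d++) centroids[c][d] = sum[c][d] / count[c];
        }
        return assign;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }

    // Fisher-Yates shuffle of 0..n-1.
    static int[] shuffledIndices(int n, Random rnd) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
        }
        return idx;
    }
}
```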
If you want to pass your graph to an algorithm and get some measures, there are more advanced algorithms you can apply. Community detection is used to find communities in a graph. Again, there is a nice Python implementation in the networkx package.
Does anyone know of real-world applications where the spanning tree data structure is used?
In networking, we often use minimum spanning tree algorithms. The problem is: given a graph with weighted edges, find a tree of edges with the minimum total weight that satisfies these three properties: connected, acyclic, and consisting of |V| - 1 edges. (In fact, any two of the three conditions imply the third.)
As an example:
For instance, if you have a large local area network with a lot of
switches, it might be useful to find a minimum spanning tree so that
only the minimum number of packets need to be relayed across the
network and multiple copies of the same packet don't arrive via
different paths (remember, any two nodes are connected via only a
single path in a spanning tree).
Other real-world problems include laying out electrical grids,
reportedly the original motivation for Boruvka's algorithm, one of the
first algorithms for finding minimum spanning trees. It shouldn't be
surprising that it would be better to find a minimum spanning tree
than just any old spanning tree; if one spanning tree on a network
would involve taking the most congested, slowest path, it's probably
not going to be ideal!
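Finding such a tree is cheap. A standard method is Kruskal's algorithm: sort the edges by weight and keep each edge that joins two different components, tracking components with union-find. This is a minimal sketch (record and method names are illustrative); on a disconnected graph it returns a minimum spanning forest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Kruskal's algorithm with union-find (path halving).
class KruskalMst {
    record Edge(int u, int v, double w) {}

    static List<Edge> mst(int n, List<Edge> edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        List<Edge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingDouble(Edge::w));
        List<Edge> tree = new ArrayList<>();
        for (Edge e : sorted) {
            int a = find(parent, e.u()), b = find(parent, e.v());
            if (a != b) { // skip edges that would close a cycle
                parent[a] = b;
                tree.add(e);
                if (tree.size() == n - 1) break; // tree is complete
            }
        }
        return tree;
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```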
There are many other applications apart from computer networks; I've listed some below:
Network design:
– telephone, electrical, hydraulic, TV cable, computer, road
The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn’t a tree you can always remove some edges and save money.
Approximation algorithms for NP-hard problems:
– traveling salesperson problem, Steiner tree
A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.
Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance, twelve of the sixteen spanning trees of the complete graph on four vertices are paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is at most the TSP weight, because it's a minimization over a larger set.
On the other hand, if you trace a walk around the minimum spanning tree, you traverse each edge twice and visit all points, so the TSP weight is at most twice the MST weight. Therefore this walk is within a factor of two of optimal.
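This argument translates into a short procedure: build an MST (here with O(V^2) Prim on points in the plane), walk it in depth-first preorder, and close the loop. Unlike the walk in the text, this sketch skips vertices it has already visited ("shortcutting"); that keeps the tour within twice the MST weight only because Euclidean distances satisfy the triangle inequality. Names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// MST-based 2-approximation for a closed tour over points in the plane.
class MstTour {
    static double dist(double[] a, double[] b) { return Math.hypot(a[0] - b[0], a[1] - b[1]); }

    /** parent[] of a minimum spanning tree rooted at vertex 0 (O(n^2) Prim); parent[0] = -1. */
    static int[] primParents(double[][] pts) {
        int n = pts.length;
        int[] parent = new int[n];
        double[] best = new double[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        best[0] = 0;
        for (int iter = 0; iter < n; iter++) {
            int u = -1; // cheapest vertex not yet in the tree
            for (int v = 0; v < n; v++) if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                double d = dist(pts[u], pts[v]);
                if (!inTree[v] && d < best[v]) { best[v] = d; parent[v] = u; }
            }
        }
        return parent;
    }

    /** DFS preorder of the MST; the tour returns from the last vertex to the first. */
    static List<Integer> tour(double[][] pts) {
        int n = pts.length;
        int[] parent = primParents(pts);
        List<List<Integer>> children = new ArrayList<>();
        for (int i = 0; i < n; i++) children.add(new ArrayList<>());
        for (int v = 1; v < n; v++) children.get(parent[v]).add(v);
        List<Integer> order = new ArrayList<>();
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            order.add(u);
            List<Integer> ch = children.get(u);
            for (int i = ch.size() - 1; i >= 0; i--) stack.push(ch.get(i));
        }
        return order;
    }
}
```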
Indirect applications:
– max bottleneck paths
– LDPC codes for error correction
– image registration with Renyi entropy
– learning salient features for real-time face verification
– reducing data storage in sequencing amino acids in a protein
– model locality of particle interactions in turbulent fluid flows
– autoconfig protocol for Ethernet bridging to avoid cycles in a network
Cluster analysis:
The k-clustering problem (single-linkage clustering) can be viewed as finding an MST and deleting the k-1 most expensive edges.
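The MST view of k-clustering can be sketched as Kruskal's algorithm stopped early, once only k components remain; with distinct edge weights this gives the same clusters as building the MST and deleting its k-1 most expensive edges (ties may be broken differently). Names are hypothetical, and cluster labels are union-find root ids rather than 0..k-1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Single-linkage k-clustering: Kruskal's algorithm stopped at k components.
class MstClustering {
    record Edge(int u, int v, double w) {}

    static int[] clusters(int n, List<Edge> edges, int k) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        List<Edge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingDouble(Edge::w));
        int components = n;
        for (Edge e : sorted) {
            if (components <= k) break; // the remaining MST edges are the k-1 we "delete"
            int a = find(parent, e.u()), b = find(parent, e.v());
            if (a != b) { parent[a] = b; components--; }
        }
        int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(parent, i); // component root as label
        return label;
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
}
```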
You can read the details here and here, and see a demo here.
Is there an algorithm that, given a graph, computes the vertex connectivity of that graph (the minimum number of vertices whose removal disconnects the graph)? (Note that the graph may already be disconnected.) Thanks!
See:
Determining if a graph is K-vertex-connected
k-vertex connectivity of a graph
When you combine such a k-connectivity test with a binary search over k, you are done.
This book chapter should have everything you need to get started; it is basically a survey of algorithms for determining the edge connectivity and vertex connectivity of graphs, with pseudocode for the algorithms it describes. Page 12 has an overview of the available algorithms alongside complexity analyses and references. Most of the solutions for the general case are flow-based, with the exception of one randomized algorithm. The general solutions optimize for different properties of the graph, so you can choose the asymptotically most efficient one beforehand. Also, for some classes of graphs, there are specialized algorithms with better complexity than the general solutions provide.
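As an illustration of the flow-based approach (not a specific algorithm from the chapter), the sketch below computes the vertex connectivity directly instead of binary searching on k. By Menger's theorem, for non-adjacent s and t the minimum number of vertices separating them equals the maximum number of internally vertex-disjoint s-t paths, which is a max flow after splitting every vertex v into v_in -> v_out with capacity 1. The minimum over all non-adjacent pairs is the connectivity; a complete graph gets n - 1 by convention and a disconnected graph gets 0. It runs O(V^2) max-flow computations on an adjacency matrix, so it only suits small graphs; names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Vertex connectivity via vertex splitting + Edmonds-Karp max flow (undirected, symmetric adj).
class VertexConnectivity {
    static int kappa(boolean[][] adj) {
        int n = adj.length;
        int best = Math.max(0, n - 1); // complete graph convention
        for (int s = 0; s < n; s++)
            for (int t = s + 1; t < n; t++)
                if (!adj[s][t]) best = Math.min(best, disjointPaths(adj, s, t));
        return best;
    }

    /** Max number of internally vertex-disjoint s-t paths (s, t non-adjacent). */
    static int disjointPaths(boolean[][] adj, int s, int t) {
        int n = adj.length, size = 2 * n; // vertex v -> in = 2v, out = 2v + 1
        int[][] cap = new int[size][size];
        for (int v = 0; v < n; v++) {
            cap[2 * v][2 * v + 1] = 1; // each intermediate vertex can be used once
            for (int u = 0; u < n; u++)
                if (adj[v][u]) cap[2 * v + 1][2 * u] = n; // edges are effectively uncapacitated
        }
        int source = 2 * s + 1, sink = 2 * t, flow = 0;
        while (true) { // BFS finds a shortest augmenting path in the residual graph
            int[] prev = new int[size];
            Arrays.fill(prev, -1);
            prev[source] = source;
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            queue.add(source);
            while (!queue.isEmpty() && prev[sink] == -1) {
                int x = queue.poll();
                for (int y = 0; y < size; y++)
                    if (cap[x][y] > 0 && prev[y] == -1) { prev[y] = x; queue.add(y); }
            }
            if (prev[sink] == -1) return flow;
            int push = Integer.MAX_VALUE;
            for (int y = sink; y != source; y = prev[y]) push = Math.min(push, cap[prev[y]][y]);
            for (int y = sink; y != source; y = prev[y]) {
                cap[prev[y]][y] -= push;
                cap[y][prev[y]] += push;
            }
            flow += push;
        }
    }
}
```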
Is there an algorithm or heuristics for graph isomorphism?
Corollary: A graph can be drawn in many different ways.
What's the best approach to find different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to simplify the graph into a canonical form, and then perform comparison of canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need to have a canonical way to create them.
Once you have canonical forms, you can compare them for isomorphism (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (For example, take a spanning tree T and add a single edge to it to create T'. These two graphs are not isomorphic, but they have the same spanning tree.)
Other techniques involve comparing descriptors (e.g. number of nodes, number of edges), which can produce false positives in general: different descriptors prove that two graphs are not isomorphic, but equal descriptors do not prove that they are.
I suggest you start with the Wikipedia page on the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary: every possible spatial layout of a given graph's vertices gives an isomorphic graph. Two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, finding the drawings that have particular properties (e.g. no crossing edges, if such a drawing exists), and that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
A very similar problem, graph automorphism, can be solved by saucy, which is available in source code. It finds all symmetries of a graph. If you have two graphs, combine them into one (their disjoint union); any isomorphism between them then shows up as an automorphism of the union that maps one graph onto the other.
Disclaimer: I am one of co-authors of saucy.
There are algorithms to do this; however, I have not yet had cause to investigate them seriously. I believe Donald Knuth is either writing or has written on this subject in The Art of Computer Programming, during his second pass at (re)writing those volumes.
As for a simple approach that might work in practice on small graphs, I would recommend counting degrees, then, for each vertex, also noting the set of degrees of its adjacent vertices. This gives you a set of potential vertex isomorphisms (candidate images) for each vertex. Then just try all of those by brute force, choosing vertices in increasing order of the size of their candidate sets. Intuitively, most graph isomorphism instances can be computed practically this way, though clearly there are degenerate cases that might take a long time.
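The degree-signature idea can be sketched as follows: a vertex's signature is its degree plus the sorted degrees of its neighbors, candidates must have equal signatures, and a backtracking search (fewest candidates first) checks that adjacency to already-mapped vertices is preserved. It is a hypothetical, unoptimized sketch for small simple undirected graphs given as adjacency matrices; like the description, it is exponential in the worst case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Degree-signature pruning plus backtracking for graph isomorphism.
class IsoBacktrack {
    static int degree(boolean[][] adj, int v) {
        int d = 0;
        for (boolean e : adj[v]) if (e) d++;
        return d;
    }

    /** Signature of v: its degree followed by the sorted degrees of its neighbors. */
    static List<Integer> signature(boolean[][] adj, int v) {
        List<Integer> nbrDegrees = new ArrayList<>();
        for (int u = 0; u < adj.length; u++) if (adj[v][u]) nbrDegrees.add(degree(adj, u));
        nbrDegrees.sort(null);
        List<Integer> sig = new ArrayList<>();
        sig.add(nbrDegrees.size());
        sig.addAll(nbrDegrees);
        return sig;
    }

    /** Returns map[v] = image of a's vertex v in b, or null if not isomorphic. */
    static int[] isomorphism(boolean[][] a, boolean[][] b) {
        int n = a.length;
        if (b.length != n) return null;
        List<List<Integer>> sigA = new ArrayList<>(), sigB = new ArrayList<>();
        for (int v = 0; v < n; v++) { sigA.add(signature(a, v)); sigB.add(signature(b, v)); }
        List<List<Integer>> cand = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            List<Integer> c = new ArrayList<>();
            for (int w = 0; w < n; w++) if (sigA.get(v).equals(sigB.get(w))) c.add(w);
            if (c.isEmpty()) return null; // no possible image: not isomorphic
            cand.add(c);
        }
        // Map vertices with the fewest candidates first to prune early.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Integer.compare(cand.get(x).size(), cand.get(y).size()));
        int[] map = new int[n];
        Arrays.fill(map, -1);
        return extend(a, b, cand, order, 0, map, new boolean[n]) ? map : null;
    }

    static boolean extend(boolean[][] a, boolean[][] b, List<List<Integer>> cand,
                          Integer[] order, int depth, int[] map, boolean[] usedB) {
        if (depth == order.length) return true;
        int v = order[depth];
        for (int w : cand.get(v)) {
            if (usedB[w]) continue;
            boolean ok = true;
            // Adjacency to every already-mapped vertex must be preserved.
            for (int d = 0; d < depth && ok; d++) {
                int u = order[d];
                if (a[v][u] != b[w][map[u]]) ok = false;
            }
            if (!ok) continue;
            map[v] = w; usedB[w] = true;
            if (extend(a, b, cand, order, depth + 1, map, usedB)) return true;
            map[v] = -1; usedB[w] = false; // undo and try the next candidate
        }
        return false;
    }
}
```

Regular graphs are the degenerate case: every vertex of a 6-cycle and of two disjoint triangles has the same signature, so the pruning does nothing and the backtracking alone has to discover that the graphs differ.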
I recently came across the following paper: http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"
My project, Griso, at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this is the key point of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs [*]. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.
AutGroupGraph command in the GRAPE package for GAP.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasizing about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search: first you use breadth-first search intensively, then you set a limit on the breadth of the analysis and go deeper after checking a few neighbours, lowering the breadth by some amount at every step. This is practically how I find my way on a map: first I locate myself with breadth-first search, then I search for the route largely with depth-first search, and this is the best thing the evolution of my brain has ever invented. :) In the long term, some intelligence could be added to increase the number of neighbours checked by breadth-first search at critical vertices, for example where there are many neighbouring vertices with the same edge count. Like occasionally checking your actual position while driving (without a GPS).
I've found out that the algorithm belongs to the category of k-dimensional Weisfeiler-Lehman algorithms, and it fails on regular graphs. More here:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem to find isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N (where N is the radius of the graph) iterations. On each iteration and for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2 hops away from it. On the Nth step, a node's hash is affected by the neighborhood N hops around it. So you only need to run Powerhash for N = graph_radius steps. In the end, the hash of the graph's center node will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them (on the first step) in the internal hashes that you calculate for each node.
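The iteration above can be sketched in Java. This is an illustration, not the madIS code linked below: it uses SHA-256 hex strings as node hashes, assumes a connected undirected graph (so the radius, computed by BFS from every node, is finite), folds optional node labels into the initial hashes, and fromEdges is a hypothetical helper. As noted at the top of this answer, equal hashes are only evidence of isomorphism; regular graphs in particular can collide.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the neighbor-hash ("Powerhash") iteration described above.
class PowerHashSketch {
    static String sha256(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte x : d) sb.append(String.format("%02x", x));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    /** Hypothetical helper: undirected adjacency lists from an edge array. */
    static List<List<Integer>> fromEdges(int n, int[][] edges) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) g.add(new ArrayList<>());
        for (int[] e : edges) { g.get(e[0]).add(e[1]); g.get(e[1]).add(e[0]); }
        return g;
    }

    /** Radius = min over nodes of the max BFS distance; assumes a connected graph. */
    static int radius(List<List<Integer>> g) {
        int n = g.size(), best = Integer.MAX_VALUE;
        for (int s = 0; s < n; s++) {
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            dist[s] = 0;
            ArrayDeque<Integer> q = new ArrayDeque<>(List.of(s));
            int ecc = 0;
            while (!q.isEmpty()) {
                int u = q.poll();
                ecc = Math.max(ecc, dist[u]);
                for (int v : g.get(u)) if (dist[v] == -1) { dist[v] = dist[u] + 1; q.add(v); }
            }
            best = Math.min(best, ecc);
        }
        return best;
    }

    /** labels may be null for an unlabeled graph. */
    static String graphHash(List<List<Integer>> g, String[] labels) {
        int n = g.size();
        String[] h = new String[n];
        for (int v = 0; v < n; v++) h[v] = sha256(labels == null ? "" : labels[v]);
        int rounds = radius(g);
        for (int r = 0; r < rounds; r++) {
            String[] next = new String[n];
            for (int v = 0; v < n; v++) {
                List<String> nb = new ArrayList<>();
                for (int u : g.get(v)) nb.add(h[u]);
                nb.sort(null);                         // sort the neighbors' previous hashes
                next[v] = sha256(String.join("", nb)); // hash their concatenation
            }
            h = next;                                  // replace all node hashes at once
        }
        String[] sorted = h.clone();
        Arrays.sort(sorted);                           // final hash: sorted node hashes
        return sha256(String.join("", sorted));
    }
}
```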
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py