Graph Isomorphism - algorithm

Is there an algorithm or heuristics for graph isomorphism?
Corollary: a graph can be drawn in many different ways.
What's the best approach to find different drawings of a graph?

It is a hell of a problem.
In general, the basic idea is to reduce each graph to a canonical form and then compare the canonical forms. Spanning trees are often generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
Once you have canonical forms, the isomorphism comparison is (relatively) easy, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (For example, take a tree T and add a single edge to it to create T'. These two graphs are not isomorphic, but T is a spanning tree of both.)
Other techniques compare descriptors (e.g. number of nodes, number of edges), which in general can produce false positives: two graphs can agree on every descriptor and still not be isomorphic.
I suggest you start with the wiki page about the graph isomorphism problem. I can also recommend a book: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary, every possible spatial arrangement of a given graph's vertices is isomorphic to every other. Two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, finding isomorphic drawings with particular properties (e.g. with non-crossing edges, if one exists), and that depends on the properties you want.

One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.

A very similar problem, graph automorphism, can be solved by saucy, which is available as source code. It finds all symmetries of a graph. If you have two graphs, join them into one (as a disjoint union); any isomorphism between them then shows up as an automorphism of the union that maps one graph onto the other.
Disclaimer: I am one of co-authors of saucy.

There are algorithms to do this; however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth is either writing or has written on this subject in The Art of Computer Programming, during his second pass at (re)writing the series.
As for a simple way to do something that might work in practice on small graphs, I would recommend counting degrees and then, for each vertex, also noting the multiset of degrees of its adjacent vertices. This gives you a set of candidate images for each vertex. Then just try all assignments from this restricted set (via brute force, but processing the vertices in increasing order of candidate-set size). Intuitively, most graph isomorphism instances can be solved in practice this way, though clearly there are degenerate cases that might take a long time.
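A minimal sketch of the degree-signature idea above, assuming each graph is given as a dict mapping a vertex to the set of its neighbours. The function name and the representation are my own choices for illustration, not taken from any library:

```python
def find_isomorphism(g, h):
    """Return a dict mapping vertices of g to vertices of h, or None.

    g and h are assumed to be undirected graphs stored as
    {vertex: set_of_neighbours}.
    """
    if len(g) != len(h):
        return None

    def signature(adj, v):
        # Own degree plus the sorted degrees of the neighbours.
        return (len(adj[v]), tuple(sorted(len(adj[u]) for u in adj[v])))

    # Group the vertices of h by signature.
    by_signature = {}
    for w in h:
        by_signature.setdefault(signature(h, w), []).append(w)

    # Candidate images for every vertex of g, smallest sets first.
    candidates = {v: by_signature.get(signature(g, v), []) for v in g}
    order = sorted(g, key=lambda v: len(candidates[v]))

    mapping, used = {}, set()

    def consistent(v, w):
        # Adjacency and non-adjacency to already mapped vertices must match.
        return all((u in g[v]) == (mapping[u] in h[w]) for u in mapping)

    def extend(i):
        if i == len(order):
            return True
        v = order[i]
        for w in candidates[v]:
            if w not in used and consistent(v, w):
                mapping[v] = w
                used.add(w)
                if extend(i + 1):
                    return True
                del mapping[v]
                used.remove(w)
        return False

    return dict(mapping) if extend(0) else None
```

On regular graphs every vertex has the same signature, so the candidate sets do not shrink and this degenerates into plain backtracking; that is the kind of degenerate case mentioned above.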

I recently came across the following paper : http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"

My project - Griso - at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this is the whole point of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.

nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.
The AutGroupGraph command in the GRAPE package for GAP.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.

As for heuristics: I've been fantasising about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search. First you use breadth-first search intensively, then you set a limit on the breadth and go deeper after checking a few neighbours, lowering the breadth limit by some amount at every step. This is practically how I find my way on a map: first I locate myself with breadth-first search, then I search for the route with (largely) depth-first search, and this is the best that evolution has ever invented for my brain. :) In the long term, some intelligence could be added to increase the breadth-first neighbour count at critical vertices, for example where there are a large number of neighbouring vertices with the same edge count. It's like occasionally checking your actual route when driving (without a GPS).

I've found out that the algorithm belongs in the category of k-dimension Weisfeiler-Lehman algorithms, and it fails with regular graphs. For more here:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem to find isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N (where N is the radius of the graph) iterations. On each iteration and for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by the direct neighbors of it. On the second step, a node's hash is affected by the neighborhood 2-hops away from it. On the Nth step a node's hash will be affected by the neighborhood N-hops around it. So you only need to continue running the Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them (on the first step) in the internal hashes that you calculate for each node.
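The steps above could be sketched roughly as follows. This is my own reading of the description, not the code from the linked repository: the names `power_hash`, `radius` and the `rounds` parameter are made up for illustration, SHA-256 is an arbitrary choice of hash, and the graph is assumed to be a connected, undirected dict of adjacency lists.

```python
import hashlib
from collections import deque


def _digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def radius(adj):
    """Smallest BFS eccentricity over all nodes (graph assumed connected)."""
    best = None
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc = max(dist.values())
        best = ecc if best is None else min(best, ecc)
    return best or 0


def power_hash(adj, labels=None, rounds=None):
    """Hash a graph by repeatedly hashing each node's sorted neighbour hashes."""
    labels = labels or {}
    # Labels (if any) enter on the first step, as described above.
    node_hash = {v: _digest(str(labels.get(v, ""))) for v in adj}
    n_rounds = radius(adj) if rounds is None else rounds
    for _ in range(n_rounds):
        # Every node reads the hashes from the previous step.
        node_hash = {
            v: _digest("".join(sorted(node_hash[u] for u in adj[v])))
            for v in adj
        }
    # Final hash: sort the node hashes and hash their concatenation.
    return _digest("".join(sorted(node_hash.values())))
```

Equal hashes only suggest isomorphism. For example, with the same number of rounds, two disjoint triangles and a hexagon get the same hash, because every node in both graphs has exactly two neighbours; this is the regular-graph failure mentioned at the top of this answer.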
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py

Related

Graph Isomorphism Heuristic Solutions

I'm trying to implement a heuristic solution to identify classes of isomorphic graphs from a given set of graphs. Currently I'm labeling each node with a multiset of the degrees of its neighbours (WL Algorithm).
This obviously produces false positives, for cases such as degree regular graphs. I was hoping to find another cheaply implementable (time and space constrained) heuristic that could cut across the corner cases of the WL Algorithm. Essentially I'm looking for a pair of easily implementable heuristics which between them give marginal false positives.
Which heuristic other than the WL algorithm should I be looking at?
Thanks!
Check out the VF2 algorithm: http://www.researchgate.net/profile/Carlo_Sansone/publication/200034365_An_Improved_Algorithm_for_Matching_Large_Graphs/links/0912f50dc9cf0a98d4000000.pdf
There's a C++ library that implements VF2:
http://mivia.unisa.it/datasets/graph-database/vflib/
A Comparison of VF2 with a few other algorithms:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2640&rep=rep1&type=pdf
I've found out that the algorithm belongs in the category of k-dimension Weisfeiler-Lehman algorithms, and it fails with regular graphs. For more here:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem to find isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N (where N is the radius of the graph) iterations. On each iteration and for each node:
Sort the hashes of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by the direct neighbors of it. On the second step, a node's hash is affected by the neighborhood 2-hops away from it. On the Nth step a node's hash will be affected by the neighborhood N-hops around it. So you only need to continue running the Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them in the internal hashes that you calculate for each node (and at each step).
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
Maybe consider the least colored shortest path invariant discussed in this paper: http://www.academia.edu/5111652/A_new_refinement_procedure_for_graph_isomorphism_algorithms?
Another invariant that could be relatively cheap to calculate is the list of cycles that a vertex is part of. Of course, that requires finding the cycles in your graph, but there are many algorithms to do that.

Random spanning trees of bipartite graphs

I'm working on making some code using metaheuristics for finding good solutions to the Fixed Charge Transportation Problem (FCTP).
The problem I'm having is to generate a starting solution, based on finding a spanning tree for the underlying bipartite graph.
I want it to be a random spanning tree, so that I can run the procedure on the same problem multiple times, possibly getting different solutions.
I'm having some difficulties doing this. The approach I've gone for so far is to make a random permutation of the arcs, then iterate through this list, adding each arc to the basis if it won't create a cycle.
I need to find a fast method to check if including an arc will create a cycle. I don't want to "brute force" it, since this approach could take a large amount of time, given big problem instances.
Given that A is an array with a random permutation of the arcs, how would you go about writing the selection procedure?
I've been working on this for a couple of hours now, and nothing I've tried has worked, most of it being nonsensical when it came to application...
Kruskal's algorithm is used for finding the minimum spanning tree. Fast cycle detection is not actually part of Kruskal's algorithm: it works with a data structure that detects cycles quickly as well as with a slow naive check (the complexity will differ, however).
Kruskal's algorithm is still on the right track here, since it is usually implemented with a so-called union-find or disjoint-set data structure for fast cycle detection. That is the part of the Kruskal's algorithm page on Wikipedia that you will need. It is also linked from Wikipedia: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
I found Kruskal's algorithm after long hours of research. I only needed to randomize the order in which I investigated the nodes of the graph.
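For what it's worth, the approach discussed in this thread (shuffle the arcs, then accept each arc unless it would close a cycle, with union-find as the cycle check) might look something like the sketch below. The function name, the `seed` parameter and the arc representation as `(u, v)` pairs are assumptions for illustration.

```python
import random


def random_spanning_tree(nodes, arcs, seed=None):
    """Pick a spanning tree by scanning the arcs in random order.

    An arc is accepted only if its endpoints are in different components,
    which is exactly the "would it create a cycle?" test.
    """
    rng = random.Random(seed)
    parent = {v: v for v in nodes}

    def find(v):
        # Find the component representative, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    shuffled = list(arcs)
    rng.shuffle(shuffled)

    tree = []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # no cycle: endpoints not yet connected
            parent[root_u] = root_v   # merge the two components
            tree.append((u, v))
            if len(tree) == len(parent) - 1:
                break                 # spanning tree is complete
    return tree
```

Keeping the seed lets you reproduce a given starting solution. Note that this does not sample uniformly from all spanning trees, which is probably fine for generating varied starting points.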

Why do these maze generation algorithms produce mazes with different properties?

I was browsing the Wikipedia entry on maze generation algorithms and found that the article strongly insinuated that different maze generation algorithms (randomized depth-first search, randomized Kruskal's, etc.) produce mazes with different characteristics. This seems to suggest that the algorithms produce random mazes with different probability distributions over the set of all single-solution mazes (spanning trees on a rectangular grid).
My questions are:
Is this correct? That is, am I reading this article correctly, and is the article correct?
If so, why? I don't see an intuitive reason why the different algorithms would produce different distributions.
Uh well I think it's pretty obvious different algorithms generate different mazes. Let's just talk about spanning trees of a grid. Suppose you have a grid G and you have two algorithms to generate a spanning tree for the grid:
Algorithm A:
Pick any edge of the grid, with 99% probability choose a horizontal one, otherwise a vertical one
Add the edge to the maze, unless adding it would create a cycle
Stop when every vertex is connected to every other vertex (spanning tree complete)
Algorithm B:
As algorithm A, but set the probability to 1% instead of 99%
"Obviously" algorithm A produces mazes with lots of horizontal passages and algorithm B mazes with lots of vertical passages. That is, there is a statistical correlation between the number of horizontal passages in a maze and the maze being produced by algorithm A.
Of course the differences between the Wikipedia algorithms are more intricate but the principle is the same. The algorithms sample the space of possible mazes for a given grid in a non-uniform, structured way.
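To make the A/B example concrete, here is a rough sketch (my own code, with made-up names) of that edge-picking procedure on a rectangular grid. It assumes width and height are at least 2 and that the probability is strictly between 0 and 1, otherwise the loop cannot finish.

```python
import random


def biased_maze(width, height, p_horizontal, seed=None):
    """Build a spanning tree of the grid, preferring horizontal edges
    with probability p_horizontal (Algorithm A: 0.99, Algorithm B: 0.01)."""
    rng = random.Random(seed)
    cells = [(x, y) for y in range(height) for x in range(width)]
    horizontal = [((x, y), (x + 1, y))
                  for y in range(height) for x in range(width - 1)]
    vertical = [((x, y), (x, y + 1))
                for y in range(height - 1) for x in range(width)]
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    passages = []
    while len(passages) < len(cells) - 1:
        pool = horizontal if rng.random() < p_horizontal else vertical
        a, b = rng.choice(pool)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # skip edges that would create a cycle
            parent[root_a] = root_b
            passages.append((a, b))
    return passages


def count_horizontal(passages):
    return sum(1 for (x1, y1), (x2, y2) in passages if y1 == y2)
```

Comparing `count_horizontal(biased_maze(6, 6, 0.99))` with `count_horizontal(biased_maze(6, 6, 0.01))` over a few runs should show the skew: both calls return spanning trees of the same grid, but drawn from very different distributions.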
LOL I remember a scientific conference where a researcher presented her results about her algorithm that did something "for graphs". The results were statistical and presented for "random graphs". Someone asked from the audience "which distribution of random graphs did you draw the graphs from?" The answer: "uh... they were produced by our graph generation program". Duh!
Interesting question. Here's my random 2c.
Comparing Prim's to, say, DFS, the latter seems to have a proclivity for producing deeper trees, simply because the first 'runs' have more space to create deep trees with fewer branches. Prim's algorithm, on the other hand, appears to create trees with more branching, because any open branch can be selected at each iteration.
One way to ask this would be to look at the probability that each algorithm produces a tree of depth > N. I have a hunch that they would be different. A more formal approach to proving this might be to assign weights to each part of the tree and show that it's more likely to be taken, or to attempt to characterize the space some other way, but I'll be hand-wavy and guess that it's correct :). I'm interested in what led you to think it wouldn't be, because my intuition was the opposite. And no, the Wiki article doesn't give a convincing argument.
EDIT
One simple way to see this is to consider an initial tree with two children and a total of k nodes
e.g.,
*---* ... *
\--* ... *
Choose a random node as the start and end. DFS will produce one of two mazes, either the entire tree, or the part of it with the direct path from start to end. Prim's algorithm will produce the 'maze' with the direct path from start to end with secondary paths of length 1 ... k.
It is not statistical until you request that each algorithm produce every solution it can.
What you are perceiving as statistical bias is only a bias towards the preferred, first solution.
That bias may not be algorithmic (set-theory-wise) but implementation dependent (like the bias in the choice of the pivot in quicksort).
Yes, it is correct. You can produce different mazes by starting the process in different ways. Some algorithms start with a fully closed grid and remove walls to generate a path through the maze, while some start with an empty grid and add walls, leaving behind a path. This alone can produce different results.

How to assign consecutive numbers to nodes of directed graph?

There's a graph with a lot of nodes and very few edges between them. The problem is assigning numbers to the nodes so that most edges go from node i to node i+1, or otherwise connect nodes with nearby numbers.
My problem is about printing graph data nicely, but an algorithm just like that is part of pretty much every compiler (intermediate code is just a graph, produced object code gets memory locations).
I thought it was just a straightforward depth-first search, but the results of that aren't that great: it seems to minimize the number of links back well enough, but the ones it leaves tend to be horrible (like 1 -> 500 -> 1).
Any better ideas?
This paper discusses this problem, if you use Eyal Schneider's formulation of minimizing the sum of the edge deltas (absolute value of the difference between the endpoints' labels). It's under #2, Optimal Linear Arrangements.
Sadly, there's no algorithm given for achieving an optimal ordering (or labeling), and the general problem is NP-complete. There are references to some polynomial-time algorithms for trees, though.
If you want to get into the academic stuff, google gives lots of hits for "Optimal Linear Arrangements".
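To make the objective concrete, here is a small sketch of the "sum of edge deltas" cost, plus a brute-force search that is only usable on tiny graphs (the general problem being NP-complete). The function names are mine, and edges are assumed to be pairs of node names.

```python
from itertools import permutations


def arrangement_cost(edges, order):
    """Sum over all edges of |label(u) - label(v)|, where a node's label
    is its position in `order`."""
    position = {node: i for i, node in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


def best_arrangement(nodes, edges):
    """Try every ordering; factorial time, so for illustration only."""
    return min(permutations(nodes),
               key=lambda order: arrangement_cost(edges, order))
```

Even without an optimal algorithm, `arrangement_cost` is handy for comparing the orderings produced by different heuristics (DFS order, BFS order, and so on) on the same graph.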

Graph Algorithm To Find All Paths Between N Arbitrary Vertices

I have a graph with the following attributes:
Undirected
Not weighted
Each vertex has a minimum of 2 and maximum of 6 edges connected to it.
Vertex count will be < 100
Graph is static and no vertices/edges can be added/removed or edited.
I'm looking for paths between a random subset of the vertices (at least 2). The paths should be simple paths that only go through any vertex once.
My end goal is to have a set of routes so that you can start at one of the subset vertices and reach any of the other subset vertices. It's not necessary to pass through all the subset nodes when following a route.
All of the algorithms I've found (Dijkstra,Depth first search etc.) seem to be dealing with paths between two vertices and shortest paths.
Is there a known algorithm that will give me all the paths (I suppose these are subgraphs) that connect this subset of vertices?
edit:
I've created a (warning! programmer art) animated gif to illustrate what I'm trying to achieve: http://imgur.com/mGVlX.gif
There are two stages: pre-process and runtime.
pre-process
I have a graph and a subset of the vertices (blue nodes)
I generate all the possible routes that connect all the blue nodes
runtime
I can start at any blue node select any of the generated routes and travel along it to reach my destination blue node.
So my task is more about creating all of the subgraphs (routes) that connect all blue nodes, rather than creating a path from A->B.
There are so many ways to approach this, and in order not to confuse things, here's a separate answer that addresses the description of your core problem:
Finding ALL possible subgraphs that connect your blue vertices is probably overkill if you're only going to use one at a time anyway. I would rather use an algorithm that finds a single one, but randomly (so not any shortest path algorithm or such, since it will always be the same).
If you want to save one of these subgraphs, you simply have to save the seed you used for the random number generator and you'll be able to produce the same subgraph again.
Also, if you really want to find a bunch of subgraphs, a randomized algorithm is still a good choice since you can run it several times with different seeds.
The only real downside is that you will never know if you've found every single one of the possible subgraphs, but it doesn't really sound like that's a requirement for your application.
So, on to the algorithm: Depending on the properties of your graph(s), the optimal algorithm might vary, but you could always start off with a simple random walk, starting from one blue node and walking to another blue one (while making sure you're not walking in your own old footsteps). Then choose a random node on that path and start walking to the next blue node from there, and so on.
For certain graphs, this has very bad worst-case complexity but might suffice for your case. There are of course more intelligent ways to find random paths, but I'd start out easy and see if it's good enough. As they say, premature optimization is evil ;)
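A minimal sketch of the first leg of such a walk, assuming the graph is a dict of neighbour collections and that `targets` holds the other blue nodes (so it should not contain `start`). The names and the retry limit are arbitrary choices of mine; extending the route to the remaining blue nodes would repeat the same idea from a random node on the path.

```python
import random


def random_simple_path(adj, start, targets, seed=None, max_tries=1000):
    """Random self-avoiding walk from `start` until a node in `targets`
    is reached. Restarts when the walk gets stuck; returns None if no
    attempt succeeds within max_tries."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        path, seen = [start], {start}
        while path[-1] not in targets:
            # Only step onto nodes we have not walked on yet.
            options = sorted(v for v in adj[path[-1]] if v not in seen)
            if not options:
                break                 # dead end: give up on this attempt
            step = rng.choice(options)
            path.append(step)
            seen.add(step)
        else:
            return path               # loop ended by reaching a target
    return None
```

Because the generator is seeded, storing the seed is enough to regenerate the same route later, as suggested above.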
A simple breadth-first search will give you the shortest paths from one source vertex to all other vertices. So you can perform a BFS starting from each vertex in the subset you're interested in, to get the distances to all other vertices.
Note that in some places, BFS will be described as giving the path between a pair of vertices, but this is not necessary: You can keep running it until it has visited all nodes in the graph.
This algorithm is similar to Johnson's algorithm, but greatly simplified thanks to the fact that your graph is unweighted.
Time complexity: Since there is a constant number of edges per vertex, each BFS will take O(n), and the total will take O(kn), where n is the number of vertices and k is the size of the subset. As a comparison, the Floyd-Warshall algorithm will take O(n^3).
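As a rough illustration of the BFS-from-each-subset-vertex idea (names and the dict-based graph representation are assumptions on my part):

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:         # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def subset_distances(adj, subset):
    """Run one BFS per vertex of interest: O(k * n) for bounded degree."""
    return {s: bfs_distances(adj, s) for s in subset}
```

If you need the actual paths rather than just the distances, store each vertex's predecessor alongside `dist` and walk the predecessors back from the destination.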
What you're searching for is (if I understand it correctly) not really all paths, but rather all spanning trees. Read the wikipedia article about spanning trees here to determine if those are what you're looking for. If it is, there is a paper you would probably want to read:
Gabow, Harold N.; Myers, Eugene W. (1978). "Finding All Spanning Trees of Directed and Undirected Graphs". SIAM J. Comput. 7 (280).

Resources