Why do these maze generation algorithms produce mazes with different properties?

I was browsing the Wikipedia entry on maze generation algorithms and found that the article strongly insinuated that different maze generation algorithms (randomized depth-first search, randomized Kruskal's, etc.) produce mazes with different characteristics. This seems to suggest that the algorithms produce random mazes with different probability distributions over the set of all single-solution mazes (spanning trees on a rectangular grid).
My questions are:
Is this correct? That is, am I reading this article correctly, and is the article correct?
If so, why? I don't see an intuitive reason why the different algorithms would produce different distributions.

Uh, well, I think it's pretty obvious that different algorithms generate different mazes. Let's just talk about spanning trees of a grid. Suppose you have a grid G and two algorithms to generate a spanning tree for it:
Algorithm A:
Pick a random edge of the grid: with 99% probability choose a horizontal one, otherwise a vertical one
Add the edge to the maze, unless adding it would create a cycle
Stop when every vertex is connected to every other vertex (spanning tree complete)
Algorithm B:
As algorithm A, but set the probability to 1% instead of 99%
"Obviously" algorithm A produces mazes with lots of horizontal passages and algorithm B mazes with lots of vertical passages. That is, there is a statistical correlation between the number of horizontal passages in a maze and the maze being produced by algorithm A.
Of course the differences between the Wikipedia algorithms are more intricate but the principle is the same. The algorithms sample the space of possible mazes for a given grid in a non-uniform, structured way.
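To make the toy example concrete, here's a minimal Python sketch of algorithm A - a Kruskal-style loop with a union-find to reject cycles; passing p_horizontal=0.01 turns it into algorithm B (the function name and structure are my own illustration):

    import random

    def biased_spanning_tree(width, height, p_horizontal=0.99):
        # Union-find so we can reject edges that would close a cycle.
        parent = {(x, y): (x, y) for x in range(width) for y in range(height)}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        horizontal = [((x, y), (x + 1, y))
                      for x in range(width - 1) for y in range(height)]
        vertical = [((x, y), (x, y + 1))
                    for x in range(width) for y in range(height - 1)]

        tree = []
        while len(tree) < width * height - 1:
            # prefer horizontal edges with probability p_horizontal
            pool = horizontal if random.random() < p_horizontal else vertical
            if not pool:                      # degenerate 1xN grids
                pool = horizontal + vertical
            a, b = random.choice(pool)
            ra, rb = find(a), find(b)
            if ra != rb:                      # edge keeps the maze acyclic
                parent[ra] = rb
                tree.append((a, b))
        return tree

Counting horizontal versus vertical edges in the returned tree makes the bias plain.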
LOL I remember a scientific conference where a researcher presented her results about her algorithm that did something "for graphs". The results were statistical and presented for "random graphs". Someone asked from the audience "which distribution of random graphs did you draw the graphs from?" The answer: "uh... they were produced by our graph generation program". Duh!

Interesting question. Here are my random 2c.
Comparing Prim's to, say, DFS, the latter seems to have a proclivity for producing deeper trees, simply because the first 'runs' have more space to create deep trees with fewer branches. Prim's algorithm, on the other hand, appears to create trees with more branching, because any open branch can be selected at each iteration.
One way to ask this would be to look at the probability that each algorithm produces a tree of depth > N. I have a hunch that these probabilities would differ. A more formal approach to proving this might be to assign weights to each part of the tree and show that some parts are more likely to be taken, or to characterize the space in some other way, but I'll be hand-wavy and guess it's correct :). I'm interested in what led you to think it wouldn't be, because my intuition was the opposite. And no, the Wiki article doesn't give a convincing argument.
EDIT
One simple way to see this is to consider an initial tree with two children and a total of k nodes,
e.g.,
*---* ... *
\--* ... *
Choose random nodes as the start and end. DFS will produce one of two mazes: either the entire tree, or the part of it containing the direct path from start to end. Prim's algorithm will produce the 'maze' consisting of the direct path from start to end plus secondary paths of lengths 1 ... k.
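To put some numbers behind that hunch, here's a small Python sketch (my own illustration of the two standard carving strategies, not code from the Wikipedia article) that builds spanning trees on an n×n grid with randomized DFS and randomized Prim, then measures tree depth from the start cell:

    import random

    def neighbors(cell, n):
        r, c = cell
        return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n]

    def dfs_maze(n):
        # randomized depth-first search: always extend from the current cell
        start, tree = (0, 0), []
        visited, stack = {start}, [start]
        while stack:
            cell = stack[-1]
            unvisited = [m for m in neighbors(cell, n) if m not in visited]
            if unvisited:
                m = random.choice(unvisited)
                tree.append((cell, m))
                visited.add(m)
                stack.append(m)
            else:
                stack.pop()
        return tree

    def prim_maze(n):
        # randomized Prim: extend from a uniformly chosen frontier edge
        start, tree = (0, 0), []
        visited = {start}
        frontier = [(start, m) for m in neighbors(start, n)]
        while frontier:
            cell, m = frontier.pop(random.randrange(len(frontier)))
            if m in visited:
                continue
            tree.append((cell, m))
            visited.add(m)
            frontier.extend((m, x) for x in neighbors(m, n) if x not in visited)
        return tree

    def depth_from(tree, start):
        # deepest node of the spanning tree, measured from start
        adj = {}
        for a, b in tree:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
        best, seen, stack = 0, {start}, [(start, 0)]
        while stack:
            cell, d = stack.pop()
            best = max(best, d)
            for m in adj.get(cell, []):
                if m not in seen:
                    seen.add(m)
                    stack.append((m, d + 1))
        return best

    trials = 200
    for gen in (dfs_maze, prim_maze):
        avg = sum(depth_from(gen(20), (0, 0)) for _ in range(trials)) / trials
        print(gen.__name__, avg)

Running an experiment along these lines, the DFS trees come out far deeper on average than the Prim trees, consistent with the intuition above.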

It is not statistical until you request that each algorithm produce every solution it can.
What you are perceiving as statistical bias is only a bias towards the preferred, first solution.
That bias may not be algorithmic (set-theory-wise) but implementation-dependent (like the bias in the choice of the pivot in quicksort).

Yes, it is correct. You can produce different mazes by starting the process in different ways. Some algorithms start with a fully closed grid and remove walls to generate a path through the maze, while others start with an empty grid and add walls, leaving behind a path. This alone can produce different results.

Related

How to divide a connected weighted graph to N semi-equal subgraphs

I have a graph of many hundreds of nodes that are mostly connected with each other. I can process the entire graph, but it takes a lot of time, so I would like to divide it into smaller sub-graphs of approximately similar size.
In other words: I have a collection of aerial images and I do pairwise image matching on all of them. As a result I get a set of matches for each pair (a pixel from the first image matched with a pixel in the second image). The number of matches is taken as the weight of this (undirected) edge. These edges then form the graph mentioned above.
I'm not so familiar with graph theory (as it is a very broad topic). What is the best algorithm for this job?
Thank you.
Edit:
This problem has a perfect analogy which I think is easier to understand. Imagine you have a set of people and their connections/friendships, like a social network. Each friendship has a numeric value/weight representing how good friends they are. So in a large group of people, I want to find the k most interconnected sub-groups.
Unfortunately, the problem you're describing is almost certainly NP-hard. From a graph perspective, you have a graph where each edge has a weight on it, and you're trying to split the graph into pieces while minimizing the total cost of the edges cut. This is called the minimum k-cut problem, and it is NP-hard when k is part of the input. If you add the constraint that the pieces should be roughly even in size, you have the balanced k-cut problem, which is also NP-hard.
The good news is that there are nice approximation algorithms for these problems, so if you're looking for solutions that are just "good enough," then you can probably find a library somewhere that implements them. There are also other techniques like spectral clustering which work well in practice and are really fast, but which don't have any guarantees on how well they'll do.
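If approximate is good enough, here is a minimal sketch using scikit-learn's SpectralClustering on the pairwise match counts; the file name and cluster count are made-up placeholders, and note that plain spectral clustering does not enforce equal-sized pieces:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    # adjacency[i][j] = number of pixel matches between image i and image j
    # (0 where a pair was never matched); symmetric, hypothetical input file
    adjacency = np.load("match_counts.npy")

    clustering = SpectralClustering(
        n_clusters=8,               # the k sub-graphs you want
        affinity="precomputed",     # treat `adjacency` as edge weights
        assign_labels="kmeans",
    )
    labels = clustering.fit_predict(adjacency)
    # labels[i] is the sub-graph index assigned to image i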

Count distinct rectangular grid mazes with acyclical paths for given size

I was trying to solve the following problem:
An m×n maze is an m×n rectangular grid with walls placed between grid cells such that there is exactly one path from the top-left square to any other square.
The following are examples of a 9×12 maze and a 15×20 maze (images omitted).
Let C(m,n) be the number of distinct m×n mazes. Mazes which can be formed by rotation and reflection from another maze are considered distinct.
It can be verified that C(1,1) = 1, C(2,2) = 4, C(3,4) = 2415, and C(9,12) = 2.5720e46 (in scientific notation rounded to 5 significant digits).
Find C(100,500).
Now, there is an explicit formula which gives the right result, and it is perfectly computable. However, as I understand it, solutions to Project Euler problems should be clever algorithms rather than explicit formula computations. Trying to formulate the solution as a recursion, I could only arrive at a linear system whose number of variables grows exponentially with the size of the maze. More precisely, if one tries to write a recursion for the number of m×n mazes with m held fixed, one arrives at a linear system in which one variable is the number of m×n mazes with the property given in the statement of problem 380, while the other variables count m×n mazes with more than one connected component touching the boundary of the maze in some specific "configuration" - and the number of such configurations seems to grow exponentially with m. So, while this approach is feasible for m = 2, 3, 4, etc., it does not seem to work for m = 100.
I also thought of reducing the problem to subproblems which can be solved more easily, then reusing the subproblem solutions when constructing solutions to larger subproblems (the dynamic programming approach), but here I stumbled upon the fact that the subproblems seem to involve mazes of irregular shapes, and again, the number of such mazes is exponential in m and n.
If someone knows of a feasible approach (for m=100, n=500) other than using explicit formulas or some ad hoc theorems, and can hint at where to look, I would find it quite interesting.
This is basically a spanning tree counting problem. Specifically, it is counting the number of spanning trees in a grid graph.
Counting Spanning Trees in a Grid Graph
From the "Counting spanning trees" section of the Wikipedia entry:
The number t(G) of spanning trees of a connected graph is a well-studied invariant. In some cases, it is easy to calculate t(G) directly. For example, if G is itself a tree, then t(G) = 1, while if G is the cycle graph C_n with n vertices, then t(G) = n. For any graph G, the number t(G) can be calculated using Kirchhoff's matrix-tree theorem...
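As a sanity check on small cases, here's a minimal Python sketch of the matrix-tree computation for grid graphs: build the Laplacian, delete one row and column, and take a determinant. Floating point is fine at this scale but useless for 100×500, which is where the exact-arithmetic discussion further down comes in:

    import numpy as np

    def count_spanning_trees(m, n):
        # Kirchhoff's matrix-tree theorem: t(G) equals any cofactor of the
        # graph Laplacian. Build the Laplacian of the m-by-n grid graph,
        # delete the first row and column, and take the determinant.
        idx = lambda r, c: r * n + c
        L = np.zeros((m * n, m * n))
        for r in range(m):
            for c in range(n):
                for rr, cc in ((r + 1, c), (r, c + 1)):
                    if rr < m and cc < n:
                        a, b = idx(r, c), idx(rr, cc)
                        L[a, a] += 1
                        L[b, b] += 1
                        L[a, b] -= 1
                        L[b, a] -= 1
        return round(np.linalg.det(L[1:, 1:]))

    print(count_spanning_trees(2, 2))   # 4     -> matches C(2,2)
    print(count_spanning_trees(3, 4))   # 2415  -> matches C(3,4)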
Related Algorithms
Here are a few papers or posts related to counting the number of spanning trees in grid graphs:
"Counting Spanning Trees in Grid Graphs", Melissa Desjarlais and Robert Molina
Department of Mathematics and Computer Science, Alma College, August 17, 2012? (publish date uncertain)
"Counting the number of spanning trees in a graph - A spectral approach", from Univ. of Maryland class notes for CMSC858W: Algorithms for Biosequence Analysis,
April 29th, 2010
"Automatic Generation of Generating Functions for Counting the Number of Spanning Trees for Grid Graphs (and more general creatures) of Fixed (but arbitrary!) Width", by Shalosh B. Ekhad and Doron Zeilberger
The latter paper, by Ekhad and Zeilberger, provided the following, with answers that matched up with the problem at hand:
If you want to see explicit expressions (as rational functions in z) for the formal power series whose coefficient of z^n in its Maclaurin expansion (with respect to z) would give you the number of spanning trees of the m by n grid graph (the Cartesian product of a path of m vertices and a path of n vertices) for m=2 to m=6, the input gives the output.
Specifically, see the output.
Sidenote: without the provided solution values suggesting otherwise, a valid interpretation could be that the external structure of the maze is important. Two or more mazes with identical internal paths would then be different and distinct, as there could be 3 options for entering the maze at a corner - the top-left corner open at the top, open on the left, or open on both the left and the top - and similarly for a corner exit. If one tried to represent these maze possibilities as a tree, two nodes might converge on entry rather than just diverging from start to finish, and there would be one or more additional nodes for the exit possibilities. This would increase the value of C(m,n).
The insight here comes from the question (emphasis mine)
A .. maze is a rectangular grid with walls placed between grid cells such that there is exactly one path from the top-left square to any other square.
If you think of the dual of the maze, i.e. the spaces one can occupy, it is clear that a maze must form a graph. Not just any graph, either: for there to be a single path, the graph must not contain any cycles, which makes it a tree. This reduction to a combinatorics problem suggests an algorithm. In the spirit of Project Euler, the rest is left as an exercise to the reader.
SPOILER AHEAD
I was wrong when stating in one of the comments that "Now, there is a general theorem about spanning trees in a graph, but it does not seem to give a computationally feasible way to compute the number sought". The "general theorem", being the matrix-tree theorem attributed to Kirchhoff and referred to in one of the answers here, gives the result not only as the product of the nonzero eigenvalues of the graph Laplacian divided by the order of the graph, but also as the absolute value of any cofactor of the Laplacian, which in this case is the absolute value of the determinant of a 49999×49999 matrix. But although the matrix is very sparse, it still looked out of reach to me.
However, the reference
http://arxiv.org/pdf/0712.0681.pdf
("Determinants of block tridiagonal matrices", by Luca Guido Molinari),
permitted me to reduce the problem to the evaluation of the determinant of a dense 100×100 integer matrix having very large integers as its entries.
Further, the reference
http://www.ams.org/journals/mcom/1968-22-103/S0025-5718-1968-0226829-0/S0025-5718-1968-0226829-0.pdf
by Erwin H. Bareiss (one usually just speaks of the "Bareiss algorithm", but the recursion which I used, referred to as formula (8) in the reference, seems to be due to Charles Dodgson, a.k.a. Lewis Carroll :) ), then permitted me to evaluate this last determinant and thus to obtain the answer to the original problem.
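For reference, here's a minimal Python sketch of that fraction-free elimination (the Bareiss/Dodgson recursion); every intermediate quantity is an exact integer, which is what makes determinants with huge integer entries tractable:

    def bareiss_det(M):
        # Fraction-free Gaussian elimination (Bareiss): every intermediate
        # value is an exact integer, and the final pivot is the determinant.
        M = [row[:] for row in M]
        n = len(M)
        sign, prev = 1, 1
        for k in range(n - 1):
            if M[k][k] == 0:                  # swap in a nonzero pivot
                for i in range(k + 1, n):
                    if M[i][k] != 0:
                        M[k], M[i] = M[i], M[k]
                        sign = -sign
                        break
                else:
                    return 0
            for i in range(k + 1, n):
                for j in range(k + 1, n):
                    # the division below is always exact - that's the point
                    M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
            prev = M[k][k]
        return sign * M[n - 1][n - 1]

    print(bareiss_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3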
I would say that finding an explicit formula is a perfectly valid way to solve a Project Euler problem. It will be fast and it scales. Just go for it :)

Generating a tower defense maze (longest maze with limited walls) - near-optimal heuristic?

In a tower defense game, you have an NxM grid with a start, a finish, and a number of walls.
Enemies take the shortest path from start to finish without passing through any walls (they aren't usually constrained to the grid, but for simplicity's sake let's say they are; in either case, they can't move through diagonal "holes").
The problem (for this question at least) is to place up to K additional walls so as to maximize the length of the path the enemies have to take. For example, for K=14 (illustration omitted).
My intuition tells me this problem is NP-hard if (as I'm hoping to do) we generalize this to include waypoints that must be visited before moving to the finish, and possibly also without waypoints.
But, are there any decent heuristics out there for near-optimal solutions?
[Edit] I have posted a related question here.
I present a greedy approach that is maybe close to optimal (but I couldn't find an approximation factor). The idea is simple: we should block the cells which sit in critical places of the maze, places that measure the connectivity of the maze. We can consider vertex connectivity and find a minimum vertex cut which disconnects the start and finish (s, f). After that we remove some critical cells.
To turn this into a graph problem, take the dual of the maze. Find a minimum (s, f) vertex cut in this graph. Then examine each vertex in the cut: remove a vertex if its deletion increases the length of all s-f paths, or if it lies on a minimum-length path from s to f. After eliminating a vertex, recursively repeat the above process, k times in total.
But there is an issue: we might remove a vertex which cuts every path from s to f. To prevent this, we can weight the cut vertices: first compute a minimum (s, f) cut; if the cut consists of just one node, give that vertex a high weight like n^3 and compute the minimum (s, f) cut again. The single cut vertex from the previous calculation won't belong to the new cut because of the weighting.
But if there is just one path between s and f (after some iterations), we can't improve it this way. In that case we can fall back to a normal greedy step, like removing a node from one of the shortest paths from s to f which doesn't belong to any cut; after that we can again work with the minimum vertex cut.
The algorithm running time in each step is:
min-cut + path finding for all nodes in min-cut
O(min-cut) + O(n^2) * O(number of nodes in the min-cut)
And because the number of nodes in the min cut cannot be greater than O(n^2) even in a very pessimistic situation, the algorithm is O(k*n^4); but normally it shouldn't take more than O(k*n^3), because the min-cut computation normally dominates the path finding, and the path finding normally doesn't take O(n^2).
I guess the greedy choice is a good start point for simulated annealing type algorithms.
P.S.: minimum vertex cut is similar to minimum edge cut, and a similar max-flow/min-cut approach can be applied to it: just split each vertex into two vertices, Vi and Vo (input and output), joined by an edge; converting the undirected graph to a directed one is not hard either.
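As a concrete illustration of the cut step, networkx already ships a minimum node cut built on exactly this vertex-splitting reduction; the board size and endpoints below are made up:

    import networkx as nx

    # Dual of the maze: one node per open cell, edges between adjacent open
    # cells. A fully open 5x11 board stands in for a real level here.
    G = nx.grid_2d_graph(5, 11)
    s, f = (0, 2), (4, 10)

    # Smallest set of cells (excluding s and f) whose removal disconnects
    # s from f; networkx performs the Vi/Vo splitting internally.
    print(nx.minimum_node_cut(G, s, f))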
It can easily be shown (proof left as an exercise to the reader) that it is enough to search for solutions where every one of the K blockades is put on the current minimum-length route. Note that if there are multiple minimum-length routes, then all of them have to be considered. The reason is that if you don't put any of the remaining blockades on the current minimum-length route, then that route does not change; hence you can put the first available blockade on it immediately during the search. This speeds up even a brute-force search.
But there are more optimizations. You can also always decide to place the next blockade so that it becomes the FIRST blockade on the current minimum-length route; i.e., if you place the blockade on the 10th square of the route, you mark squares 1..9 as "permanently open" until you backtrack. This again saves an exponential number of squares to search during the backtracking search.
You can then apply heuristics to cut down the search space or to reorder it, e.g. first trying those blockade placements that increase the length of the current minimum-length route the most. You can then run the backtracking algorithm for a limited amount of real time and pick the best solution found so far.
I believe we can reduce the contained maximum manifold problem to Boolean satisfiability and show NP-completeness through any dependency on this subproblem. Because of this, the algorithms spinning_plate provided are reasonable as heuristics, precomputing and machine learning are reasonable, and the trick becomes finding the best heuristic solution if we wish to blunder forward here.
Consider a board like the following:
..S........
#.#..#..###
...........
...........
..........F
This has many of the problems that cause greedy and gate-bound solutions to fail. If we look at that second row:
#.#..#..###
Our logic gates are, in 0-based 2D array ordered as [row][column]:
[1][4], [1][5], [1][6], [1][7], [1][8]
We can re-render this as an equation to satisfy the block:
if ([1][9] AND ([1][10] AND [1][11]) AND ([1][12] AND [1][13])):
traversal_cost = INFINITY; longest = False # Infinity does not qualify
Excepting infinity as an unsatisfiable case, we backtrack and rerender this as:
if ([1][14] AND ([1][15] AND [1][16]) AND [1][17]):
traversal_cost = 6; longest = True
And our hidden Boolean relationship falls amongst all of these gates. You can also show that geometric proofs can't fractalize recursively, because we can always create a wall that's exactly N-1 in width or height, and this represents a critical part of the solution in all cases (therefore, divide and conquer won't help you).
Furthermore, because perturbations across different rows are significant:
..S........
#.#........
...#..#....
.......#..#
..........F
We can show that, without a complete set of computable geometric identities, the complete search space reduces itself to N-SAT.
By extension, we can also show that this is trivial to verify and non-polynomial to solve as the number of gates approaches infinity. Unsurprisingly, this is why tower defense games remain so fun for humans to play. Obviously, a more rigorous proof is desirable, but this is a skeletal start.
Do note that you can significantly reduce the n term in your n-choose-k relation. Because we can recursively show that each perturbation must lie on the critical path, and because the critical path is always computable in O(V+E) time (with a few optimizations to speed things up for each perturbation), you can significantly reduce your search space at a cost of a breadth-first search for each additional tower added to the board.
Because we may tolerably assume O(n^k) for a deterministic solution, a heuristic approach is reasonable. My advice thus falls somewhere between spinning_plate's answer and Soravux's, with an eye towards machine learning techniques applicable to the problem.
The 0th solution: Use a tolerable but suboptimal AI, in which spinning_plate provided two usable algorithms. Indeed, these approximate how many naive players approach the game, and this should be sufficient for simple play, albeit with a high degree of exploitability.
The 1st-order solution: Use a database. Given the problem formulation, you haven't quite demonstrated the need to compute the optimal solution on the fly. Therefore, if we relax the constraint of approaching a random board with no information, we can simply precompute the optimum for all K tolerable for each board. Obviously, this only works for a small number of boards: with V! potential board states for each configuration, we cannot tolerably precompute all optimums as V becomes very large.
The 2nd-order solution: Use a machine-learning step. Promote each step as you close a gap that results in a very high traversal cost, running until your algorithm converges or no more optimal solution can be found than greedy. A plethora of algorithms are applicable here, so I recommend chasing the classics and the literature for selecting the correct one that works within the constraints of your program.
The best heuristic may be a simple heat map generated by a locally state-aware, recursive depth-first traversal, sorting the results by most to least commonly traversed after the O(V^2) traversal. Proceeding through this output greedily identifies all bottlenecks, and doing so without making pathing impossible is entirely possible (checking this is O(V+E)).
Putting it all together, I'd try an intersection of these approaches, combining the heat map and critical path identities. I'd assume there's enough here to come up with a good, functional geometric proof that satisfies all of the constraints of the problem.
At the risk of stating the obvious, here's one algorithm:
1) Find the shortest path
2) Test blocking each node on that path and see which block results in the longest new shortest path
3) Repeat K times
Naively, this will take O(K*(V + E log E)^2), but with a little work you can improve step 2 by only recalculating partial paths. A minimal sketch of the naive version follows.
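Here's a short Python sketch of that loop, assuming the board is a list of lists holding '.' for open cells and '#' for walls (representation and names are my own):

    from collections import deque

    def shortest_path(grid, start, goal):
        # BFS over open cells ('.'); returns the cells on one shortest path
        # from start to goal, or None if they are disconnected.
        rows, cols = len(grid), len(grid[0])
        prev = {start: None}
        q = deque([start])
        while q:
            cell = q.popleft()
            if cell == goal:
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = prev[cell]
                return path[::-1]
            r, c = cell
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                        and grid[nxt[0]][nxt[1]] == '.' and nxt not in prev):
                    prev[nxt] = cell
                    q.append(nxt)
        return None

    def greedy_walls(grid, start, goal, k):
        # Step 2 above: try blocking each node on the current shortest path
        # and keep the block that lengthens the new shortest path the most.
        for _ in range(k):
            best, best_len = None, len(shortest_path(grid, start, goal))
            for r, c in shortest_path(grid, start, goal):
                if (r, c) in (start, goal):
                    continue
                grid[r][c] = '#'
                new = shortest_path(grid, start, goal)
                grid[r][c] = '.'
                if new is not None and len(new) > best_len:
                    best, best_len = (r, c), len(new)
            if best is None:        # nothing improves (or all disconnect)
                break
            grid[best[0]][best[1]] = '#'
        return grid

    board = [list("....."), list(".###."), list("....."), list(".....")]
    greedy_walls(board, (0, 0), (3, 4), k=3)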
As you mention, simply trying to break the path is difficult, because most breaks add a length of only 1 (or 2); it's hard to find the choke points that lead to big gains.
If you take the minimum vertex cut between the start and the end, you will find the choke points for the entire graph. One possible algorithm is this:
1) Find the shortest path
2) Find the min-cut of the whole graph
3) Find the maximal contiguous node set in the cut that intersects one point on the path, and block those nodes
4) Wash, rinse, repeat
Step 3) is the big part, and why this algorithm may perform badly, too. You could also try:
blocking the smallest node set that connects with other existing blocks
finding all groupings of contiguous vertices in the vertex cut and testing each of them for the longest path, as in the first algorithm
The last one might be the most promising: if you find a min vertex cut on the whole graph, you're going to find the choke points for the whole graph.
Here is a thought. In your grid, group adjacent walls into islands and treat every island as a graph node. The distance between nodes is the minimal number of walls needed to connect them (i.e., to block the enemy).
In that case you can start maximizing the path length by blocking the cheapest arcs.
I have no idea if this would work, because you could create new islands with the walls you place, but it could help work out where to put them.
I suggest using a modified breadth first search with a K-length priority queue tracking the best K paths between each island.
I would, for every island of connected walls, pretend that it is a light (a special light that can only send out horizontal and vertical rays).
Use ray-tracing to see which other islands the light can hit.
Say island i1 hits i2, i3, i4, and i5 but doesn't hit i6, i7, etc.
Then you would have line(i1,i2), line(i1,i3), line(i1,i4) and line(i1,i5).
Mark the distance of all grid points as infinity. Set the start point to 0.
Now use breadth first search from the start. At every grid point, mark its distance as one more than the minimum distance of its neighbors.
But... here is the catch:
Every time you get to a grid point that is on a line() between two islands, instead of recording the distance as the minimum of its neighbors, you need to make it a priority queue of length K and record the K shortest paths to that line() from any of the other line()s.
This priority queue then stays the same until you get to the next line(), where it aggregates all the priority queues coming into that point.
You haven't shown the need for this algorithm to be real-time, but I may be wrong about this premise. If it needn't be, you could precalculate the block positions.
If you can do this beforehand and then simply make the AI build the maze rock by rock as if it were a kind of tree, you could use genetic algorithms to ease your need for heuristics. You would load a genetic algorithm framework, start with a population made of non-movable blocks (your map) and randomly-placed movable blocks (the blocks the AI would place), and then evolve the population by applying crossovers and mutations over the movable blocks, evaluating individuals by rewarding the longest calculated path. You would then simply have to write a resource-efficient path calculator, with no heuristics in your code. In the last generation of your evolution, you would take the highest-ranking individual, which would be your solution: the desired block pattern for this map.
Genetic algorithms can take you, under ideal conditions, to a local maximum (or minimum) in reasonable time, which may be impossible to reach with analytic solutions on a sufficiently large data set (i.e., a big enough map in your situation).
You haven't stated the language in which you are going to develop this algorithm, so I can't propose frameworks that may perfectly suit your needs.
Note that if your map is dynamic, meaning that the map may change over tower defense iterations, you may want to avoid this technique since it may be too intensive to re-evolve an entire new population every wave.
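For the sake of illustration, here's a compact sketch of that scheme in Python (no framework, just the bare GA loop); the representation, rates, and population size are arbitrary choices of mine, not recommendations:

    import random
    from collections import deque

    def path_length(open_cells, start, goal):
        # BFS shortest-path length over a set of open cells; None if disconnected
        dist = {start: 0}
        q = deque([start])
        while q:
            r, c = q.popleft()
            if (r, c) == goal:
                return dist[(r, c)]
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nxt in open_cells and nxt not in dist:
                    dist[nxt] = dist[(r, c)] + 1
                    q.append(nxt)
        return None

    def evolve(open_cells, start, goal, k, pop_size=50, generations=200):
        # An individual is a frozenset of k cells to wall off.
        candidates = list(open_cells - {start, goal})

        def fitness(ind):
            d = path_length(open_cells - ind, start, goal)
            return -1 if d is None else d   # punish disconnecting the board

        pop = [frozenset(random.sample(candidates, k)) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]       # keep the fitter half
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                pool = list(a | b)              # crossover: sample the union
                child = set(random.sample(pool, min(k, len(pool))))
                if random.random() < 0.3 and child:   # mutation: drop a wall
                    child.discard(random.choice(list(child)))
                while len(child) < k:           # top back up to exactly k walls
                    child.add(random.choice(candidates))
                children.append(frozenset(child))
            pop = parents + children
        return max(pop, key=fitness)

    cells = {(r, c) for r in range(5) for c in range(8)}
    print(evolve(cells, (0, 0), (4, 7), k=6))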
I'm not at all an algorithms expert, but looking at the grid makes me wonder if Conway's Game of Life might somehow be useful for this. With a reasonable initial seed and well-chosen rules about the birth and death of towers, you could try many seeds and subsequent generations thereof in a short period of time.
You already have a measure of fitness in the length of the creeps' path, so you could pick the best one accordingly. I don't know how well (if at all) it would approximate the best path, but it would be an interesting thing to use in a solution.

How to assign consecutive numbers to nodes of directed graph?

There's a graph with a lot of nodes, and very few edges between them - the problem is assigning numbers to the nodes so that most edges connect some node i to node i+1, or otherwise keep their endpoints close.
My problem is about printing graph data nicely, but an algorithm just like this is part of pretty much every compiler (intermediate code is just a graph, and the produced object code gets memory locations).
I thought it was just straightforward depth-first search, but the results of that aren't great - it seems to minimize the number of back links well enough, but the ones it leaves tend to be horrible (like 1 -> 500 -> 1).
Any better ideas?
This paper discusses this problem, if you use Eyal Schneider's formulation of minimizing the sum of the edge deltas (absolute value of the difference between the endpoints' labels). It's under #2, Optimal Linear Arrangements.
Sadly, there's no algorithm given for achieving an optimal ordering (or labeling), and the general problem is NP-complete. There are references to some polynomial-time algorithms for trees, though.
If you want to get into the academic stuff, google gives lots of hits for "Optimal Linear Arrangements".
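If an exact optimum is out of reach, bandwidth-reduction orderings make decent practical heuristics for keeping edge deltas small. Here's a minimal sketch using SciPy's reverse Cuthill-McKee (the edge list is a made-up toy example); note that RCM targets the maximum delta rather than the sum, so treat it as a heuristic for this objective:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    # toy adjacency matrix of an undirected graph with 6 nodes
    edges = [(0, 3), (1, 4), (2, 5), (0, 1), (4, 5)]
    n = 6
    A = np.zeros((n, n), dtype=np.int8)
    for a, b in edges:
        A[a, b] = A[b, a] = 1

    # RCM returns a permutation that tends to keep adjacent nodes close
    perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
    position = np.empty(n, dtype=int)
    position[perm] = np.arange(n)   # position[v] = new label of node v

    # sum of edge deltas under the new labeling
    print(sum(abs(int(position[a]) - int(position[b])) for a, b in edges))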

Graph Isomorphism

Is there an algorithm or heuristics for graph isomorphism?
Corollary: A graph can be represented by many different drawings.
What's the best approach to finding the different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to simplify the graph into a canonical form and then compare canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
Once you have canonical forms, you can perform the isomorphism comparison (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (For example, think of a spanning tree T and a single edge added to it to create T'. These two graphs are not isomorphic, but they have the same spanning tree.)
Other techniques involve comparing descriptors (e.g. the number of nodes, the number of edges), which can produce false positives in general.
I suggest you start with the wiki page about the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary, every possible spatial distribution of a given graph's vertices is isomorphic to the original. So two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, finding those isomorphic structures enjoying particular properties (e.g. with non-crossing edges, if any exist), and that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
A very similar problem - graph automorphism - can be solved by saucy, which is available in source code. This finds all symmetries of a graph. If you have two graphs, join them into one and any isomorphism can be discovered as an automorphism of the join.
Disclaimer: I am one of co-authors of saucy.
There are algorithms to do this - however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth is either writing or has written on this subject in The Art of Computer Programming during his second pass at (re)writing it.
As for a simple way to do something that might work in practice on small graphs, I would recommend counting degrees, and then, for each vertex, also noting the set of degrees of its adjacent vertices. This gives you a set of potential vertex isomorphisms for each point. Then just try all of them (via brute force, but choosing the vertices in increasing order of the size of their potential-isomorphism sets) from this restricted set. Intuitively, most graph isomorphisms can be practically computed this way, though there are clearly degenerate cases that might take a long time.
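Here's a small Python sketch of that pruning idea; `adj` is a dict from vertex to neighbor set, and the helper names are mine:

    from collections import Counter

    def signature(adj):
        # For each vertex, pair its degree with the sorted multiset of its
        # neighbors' degrees.
        deg = {v: len(ns) for v, ns in adj.items()}
        return {v: (deg[v], tuple(sorted(deg[u] for u in adj[v]))) for v in adj}

    def candidate_map(adj1, adj2):
        # Map each vertex of graph 1 to the graph-2 vertices sharing its
        # signature; any isomorphism must respect these sets, so the brute
        # force only needs to search inside them.
        s1, s2 = signature(adj1), signature(adj2)
        if Counter(s1.values()) != Counter(s2.values()):
            return None    # signatures don't even match up: not isomorphic
        return {v: {u for u in s2 if s2[u] == s1[v]} for v in s1}

Ordering the brute-force search by the smallest candidate sets first keeps the branching factor down.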
I recently came across the following paper : http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"
My project - Griso - at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this being the salt of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.
The AutGroupGraph command in GAP's GRAPE package.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasizing about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search: first you use breadth-first search intensively, then you set a limit for the breadth analysis and go deeper after checking a few neighbours, lowering the breadth by some amount at each step. This is practically how I find my way on a map: first locate myself with breadth-first search, then search the route with depth-first search - roughly, and this is the best evolution my brain has ever invented. :) In the long run some intelligence may be added to increase the breadth-first neighbour count at critical vertices - for example where there are a large number of neighbouring vertices with the same edge count. Like occasionally checking your actual route with the car (without a GPS).
I've found out that the algorithm belongs to the category of k-dimensional Weisfeiler-Lehman algorithms, and it fails on regular graphs. For more, see:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem of finding isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N iterations (where N is the radius of the graph). On each iteration, for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2 hops away from it. On the Nth step, a node's hash is affected by the neighborhood N hops around it. So you only need to continue running the Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them (on the first step) in the internal hashes that you calculate for each node.
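For illustration only (the author's actual implementation is linked below), here's a compact Python sketch of the iteration just described:

    import hashlib

    def wl_hash(adj, iterations):
        # adj: dict mapping each vertex to a list of neighbors.
        # Iteratively replace each node's hash with the hash of its
        # neighbors' sorted hashes, as described above.
        h = {v: "0" for v in adj}   # or seed with node labels, if you have them
        for _ in range(iterations):
            h = {v: hashlib.sha256(
                     ",".join(sorted(h[u] for u in adj[v])).encode()
                 ).hexdigest()
                 for v in adj}
        # final hash: sort the node hashes, concatenate, hash once more
        return hashlib.sha256(",".join(sorted(h.values())).encode()).hexdigest()

    # isomorphic graphs always get equal hashes; equal hashes are strong (not
    # certain) evidence of isomorphism, and as noted above the scheme fails
    # on regular graphs
    g1 = {0: [1], 1: [0, 2], 2: [1]}
    g2 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(wl_hash(g1, 3) == wl_hash(g2, 3))   # True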
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
