How to determine which rooms are effectively identical in a given maze - algorithm

The problem in question can be found here
TL;DR:
There is a maze made up of circular rooms connected by indistinguishable corridors, the goal of players is to walk around and map out the whole maze.
Our goal is to look at a maze and try to reduce it as much as possible.
When looking at a maze, you can compare two rooms A and B: if, when you are randomly dropped into the maze, you cannot tell whether you began in A or B, these rooms are considered effectively identical.
By running the maze through an algorithm we should be able to remove all effectively identical rooms, making the maze smaller without affecting the overall feel of the maze to the players.
More details and rules are in the aforementioned document.
My intuition tells me to walk through the maze, building a tree from every single node, and then compare the trees. I will include pictures for the examples given in the document.
Picture 1
Picture 2

I feel, as does Patrick87 in the comments, that it seems likely that a simple graph will eventually be reduced to a single node. So as a frame challenge, I suggest not doing this! But if you want to, you're essentially asking to do a pile of graph isomorphisms, and so there's no better tool than nauty.*
So what you need to do is essentially (pseudocode):
reduce(G):
    for n1 in G.nodes:
        for n2 in G.nodes:
            if n1 != n2:
                let G1 = G
                swap G1[n1] and G1[n2]
                if nauty.isomorphic(G, G1):
                    G.delete(n1)  # or n2
                    return G
    return G  # G cannot be reduced
* Boost has an isomorphism library, but it is naive and very slow.
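If you want to experiment with the idea without setting up nauty, here is a rough Python sketch of the swap-and-compare loop using networkx (my substitution, not what the answer above uses). Since a plain relabelling is always isomorphic to the original graph, the check below is read as label-respecting: it tests whether swapping n1 and n2 leaves the edge set unchanged, i.e. whether the swap is an automorphism of G. Function names are mine.

import networkx as nx

def swap_preserves_graph(G, n1, n2):
    # Relabel n1 <-> n2 and compare edge sets (as unordered pairs);
    # if nothing changes, the two rooms play the same structural role.
    G1 = nx.relabel_nodes(G, {n1: n2, n2: n1}, copy=True)
    edge_set = lambda g: {frozenset(e) for e in g.edges()}
    return edge_set(G1) == edge_set(G)

def reduce_once(G):
    for n1 in G.nodes:
        for n2 in G.nodes:
            if n1 != n2 and swap_preserves_graph(G, n1, n2):
                H = G.copy()
                H.remove_node(n1)   # or n2
                return H            # reduced by one node
    return G                        # G cannot be reduced further

You would call reduce_once repeatedly until the number of nodes stops shrinking.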

Related

Knight's Travails and Binary Search Tree

Here is some context if you're unfamiliar with Knight's Travails.
Your task is to build a function knight_moves that shows the simplest possible way to get from one square to another by outputting all squares the knight will stop on along the way.
I know where to find complete solutions to this exercise, but I'm trying to mostly work through it on my own.
Where I'm stuck is how to set up a binary tree, or, specifically, what is the relationship between all the next possible moves a knight can make from its current location?
As I understand a BST, the defining property is that a tree (and any of its subtrees) stores keys larger than the root node's key to the right of the root, and smaller keys to the left.
How do I represent the value of the knight's current location and its possible (valid) moves?
It would be more valuable if the answer provided is more of a guiding principle (philosophy?) for thinking about BST keys/values and defining parent-child relationships.
Thank you.
This is not a BST problem. Attack this as a graph search problem with backtracking. Start by learning and understanding Dijkstra's algorithm, treating the squares of the board as nodes in a graph. For instance, a1 connects to b3 and c2; all moves have a cost of 1 (which simplifies the algorithm a little).
You'll need to build your graph either before you start, connecting all possible moves; or on the fly, generating all possibilities from the current square. Most students find that doing it on the fly is easier for them.
Also, changing from algebraic chess notation to (row, col) notation helps. If the knight is currently at (row, col), then the legal next moves are
(row ± 2, col ± 1) and
(row ± 1, col ± 2)
... where row and col must both stay in the range 0-7 (throw out moves that take the knight past the board's edge).
Maintain a global list of squares you've already visited, and another of squares you need to visit (one move from those you've visited). You'll also need an array of least cost (number of moves) to get to each square. Initialize that to 100, and let the algorithm reduce it as it walks through the search.
Is that enough to get you moving?
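If it helps, here is a rough Python sketch of that idea (my own code, not part of the original exercise). Since every move costs 1, Dijkstra degenerates into a plain breadth-first search; squares are (row, col) pairs on an assumed 8x8 board.

from collections import deque

def knight_moves(start, goal):
    # Breadth-first search over board squares, generating the moves on the fly.
    offsets = [(2, 1), (2, -1), (-2, 1), (-2, -1),
               (1, 2), (1, -2), (-1, 2), (-1, -2)]
    parent = {start: None}           # also serves as the "visited" set
    queue = deque([start])
    while queue:
        row, col = queue.popleft()
        if (row, col) == goal:
            break
        for dr, dc in offsets:
            nxt = (row + dr, col + dc)
            if 0 <= nxt[0] < 8 and 0 <= nxt[1] < 8 and nxt not in parent:
                parent[nxt] = (row, col)
                queue.append(nxt)
    path, square = [], goal          # walk back from goal to start
    while square is not None:
        path.append(square)
        square = parent[square]
    return list(reversed(path))

print(knight_moves((0, 0), (3, 3)))  # e.g. [(0, 0), (2, 1), (3, 3)]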

Shortest path exercise

I’m trying to solve the following problem:
There are N planets in our galaxy. You can travel in between different planets, but not every planet is joined to another one through a safe route. Each route has a given length in light years. Your task is to establish bases on a given set of planets T(where 0
The input consists of N (the number of planets), R (the number of safe routes between planets), R routes given as triples A B L, where A and B are the stellar IDs of the planets and L is the distance between them in light years, then T (the number of planets where bases need to be established), followed by T numbers giving the IDs of the planets where bases need to be established.
You always start at the planet with ID 0. A base may or may not need to be established on planet 0.
I tried solving the exercise and managed to get a working solution, but it is too slow. I used the Floyd-Warshall algorithm to get minimum paths between any two nodes (planets) in the graph. Next I find the target planet closest to planet 0 and add that path. Then I add this planet to a list of "visited" planets and remove it from the list of "target" planets. I repeat this process, but now I look for the target planet closest to ANY of my visited planets (since traveling between them is free, I don't care where I've been last). I add that distance, move the planet from targets to visited, and keep going until I establish every base necessary.
This provides the correct output, but it is too slow.
Any ideas for improvement? Possibly some altered version of Dijkstra’s algorithm?
I believe you want a minimal spanning tree of the graph consisting of the nodes from T plus node 0 (the starting point). The distances between those nodes are given by the shortest distances you calculated. The exact path between two nodes in T may pass through non-T planets, but otherwise those planets are irrelevant.
There are numerous algorithms for minimal spanning tree. I suggest Kruskal's algorithm as reasonably fast and reasonably easy to implement.
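As a rough Python sketch of that recipe (the adjacency-list format and function names are my own assumptions, and every target is assumed reachable): run Dijkstra from node 0 and from each target planet to get the pairwise shortest distances, then run Kruskal's algorithm on the complete graph over those terminals.

import heapq

def dijkstra(adj, source):
    # adj maps each planet ID to a list of (neighbour, length) pairs.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def base_network_length(adj, targets):
    # Kruskal's MST over {0} | targets, using shortest-path distances as weights.
    terminals = sorted({0} | set(targets))
    dist = {t: dijkstra(adj, t) for t in terminals}
    edges = sorted((dist[a][b], a, b)
                   for i, a in enumerate(terminals)
                   for b in terminals[i + 1:])
    parent = {t: t for t in terminals}
    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

Running Dijkstra once per terminal (instead of Floyd-Warshall over all planets) is usually what removes the slowness.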

Should I iterate over a directed graph using Iterative deepening depth-first search (IDDFS)?

Example: I have 20 persons as objects, and every person knows 0-n others. The direction of the link matters! A person A might know B, but B might not know A. It's a directed graph.
Edit: For simplification, my node objects (in this case Person objects) are able to store arbitrary information. I know this is not the best design but for now that would be fine.
So in the worst case everyone is connected with everyone else, everyone knows everyone else.
This is no real use case, but I want to write a test for this to learn and play around. In a productive environment the number of objects would be limited to about 20, but the ways in which those objects are connected to each other are unlimited.
This is illustrated by a simple example graph (image from an external source, omitted here).
Given a specific person as starting point, I want to walk through the whole graph and examine every possible path exactly once without getting stuck in an infinite loop.
Let's imagine person A knows B, who knows C, and who knows A. The output might be:
A knows B knows C knows A (ok but we don't want to end in an infinite loop so we stop here)
A knows C knows A
A knows T knows R knows V
This would be stupid and must be eliminated:
A knows B knows C knows A knows C knows A knows T knows R knows V ...
I do have a couple of crazy ideas how to tackle this problem. But...
Question) Must I do that with an Iterative deepening depth-first search (IDDFS)?
Jon was kind enough to point out DFS on Wikipedia.
I'm stuck with this part in the article:
a depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously-visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G. The edges traversed in this search form a Trémaux tree, a structure with important applications in graph theory.
specifically this note:
"(since this is a small graph)"
OK so what if this is a huge graph?
Edit: I should mention that the author's title and question have changed so much that some of the information in this answer may no longer be 100% relevant.
As Jon has already mentioned, this is indeed a graph; a directed graph, in fact.
I suggest you look at Adjacency matrices, they will provide you with direct insight as to how you can reach a solution.
I imagine your original lazy solution was probably something akin to an adjacency list, which is fine, but isn't as easy to implement and may also be harder to traverse. There are two main differences between the two.
An adjacency matrix stores an entry for every possible edge, whether it exists (is connected) or not, so it takes more space, but it is a little friendlier to work with; an adjacency list only stores the edges that actually exist, which can be nicer in larger networks because it avoids computation over unconnected nodes.
The primary concern I found when using adjacency lists was not their theoretical space, but that in C++ I was storing each connected node as a pointer in a vector inside each node; this could get way out of hand as soon as the network got bigger, and it was very unfriendly to visualize as well as to manage when adding and deleting nodes.
Adjacency matrices, in comparison, keep a single reference for all nodes (the nodes can be stored in a single vector) and can be easily modified.
If your question is truly about traversal, then if your graph is implemented as an adjacency matrix (a vector of vectors), traversal is simple. See the pseudocode below:
To read (for each neuron) all the neurons a neuron's axon is connected to (i.e. the neuron's outputs):
for (size_t i = 0; i < n; ++i) {          // adjacency matrix is n * n
    Neuron& neuron = nodes[i];
    for (size_t j = 0; j < n; ++j) {      // scan row i: outgoing connections of neuron i
        Axon_connection& connection = edges[i][j];
        if (connection.exists()) {
            ...
        }
    }
}
To read (for each neuron) all the neurons a neuron's dendrites are connected to (i.e. the neuron's inputs):
for (size_t i = 0; i < n; ++i) {          // adjacency matrix is n * n
    Neuron& neuron = nodes[i];
    for (size_t j = 0; j < n; ++j) {      // scan column i: incoming connections of neuron i
        Dendrite& dendrite = edges[j][i];
        if (dendrite.exists()) {
            ...
        }
    }
}
Note this second method may not be cache friendly for big networks, depending on your implementation.
The exists method simply checks whether the adjacency matrix entry is set to true; you can then store additional data, such as connection strengths, in these edges.
My friend, you have posted many very similar questions over the last day or two. I suggest you take a little bit of time out and read an introductory textbook on graph theory, or find some lectures on the subject.
Then you will at least know how to recognize and classify the standard problems. All you are going to get on SO are links back to such resources - it's not worth anyone's time writing out a fresh exposition. When you have a specific question, or are stuck understanding a particular issue, then ask and we will be happy to help, but you need to meet us half-way.
To answer your question, you can perform depth-first search and breadth-first search on an arbitrary graph just as you have previously done on a tree - you simply need to keep track of which nodes you have visited. Look out for this in any code/pseudocode you encounter. You don't have to keep track of visited nodes on a tree (as in your other questions), because a tree is a special instance of a graph (a connected acyclic graph) which cannot be "wildly interconnected".
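For example, here is a rough Python sketch of a depth-first search that tracks the persons already on the current path, so every path is listed exactly once and nothing loops forever (the dict-of-lists format is my assumption; a path is cut just before it would revisit someone):

def all_paths_from(knows, start):
    # knows maps each person to the list of people they know (directed).
    paths = []
    def dfs(person, path, on_path):
        path.append(person)
        on_path.add(person)
        extended = False
        for other in knows.get(person, []):
            if other not in on_path:      # never walk back into the current path
                extended = True
                dfs(other, path, on_path)
        if not extended:                  # dead end: record the path so far
            paths.append(list(path))
        on_path.remove(person)
        path.pop()
    dfs(start, [], set())
    return paths

knows = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(all_paths_from(knows, "A"))   # [['A', 'B', 'C'], ['A', 'C']]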
In answer to your original question, it is definitely theoretically possible to solve. However, if you are after the shortest route that visits every node, this looks suspiciously like the travelling salesman problem, which is NP-hard.
In any case, there are many different graph traversal algorithms (DFS, IDDFS, BFS, etc) which could be of use.
Your data structure is indeed a graph.
I hate to provide such a bare answer, but the question is so basic that Graph Traversal on Wikipedia is more than adequate. The two basic approaches are explained and there is also pseudocode.
One way (and not, necessarily, the best way) to do this is to modify the graph.
For example, say that the graph initially encodes A-->B-->C. If the edge A-->C does not exist, add the edge A-->C.
You can do this for each node in your graph to explicitly state which nodes know each other.
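A rough Python sketch of that edge-adding idea (a naive transitive closure; the dict-of-lists format is my assumption):

def add_transitive_edges(knows):
    # Repeatedly add A -> C whenever A -> B and B -> C exist (and A != C),
    # until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for a in list(knows):
            for b in list(knows[a]):
                for c in knows.get(b, []):
                    if c != a and c not in knows[a]:
                        knows[a].append(c)
                        changed = True
    return knows

knows = {"A": ["B"], "B": ["C"], "C": []}
print(add_transitive_edges(knows))   # {'A': ['B', 'C'], 'B': ['C'], 'C': []}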

Dominoes matching algorithm

Given some inputs, which consist of a left and right symbol, output chains which link the inputs.
Imagine the inputs to be dominoes which you cannot flip horizontally and which you need to chain together. Big circular chains (ignore whether you could physically close them with real dominoes) are preferred over small circular chains, which are preferred over chains whose start and end do not match.
Outputs with more circular chains (regardless of chain length) are what we are looking for. For example, an output of 3 circular chains is better than 1 big chain plus a leftover single domino.
Can someone point me in the right direction? What group of problems does this belong and are there existing algorithms for solving this?
Examples (outputs may be incorrect!):
in[0]=(A,B)
in[1]=(B,C)
in[2]=(C,A)
out[0]=(0,1,2)

in[0]=(A,B)
in[1]=(B,A)
in[2]=(C,D)
in[3]=(D,C)
out[0]=(0,1)
out[1]=(2,3)

in[0]=(A,B)
in[1]=(B,A)
in[2]=(C,D)
in[3]=(E,F)
out[0]=(0,1)
out[1]=(2)
out[2]=(3)

in[0]=(A,B)
in[1]=(B,A)
in[2]=(C,D)
in[3]=(D,E)
out[0]=(0,1)
out[1]=(2,3)

in[0]=(A,B)
in[1]=(B,C)
in[2]=(C,D)
out[0]=(0,1,2)
Dominoes which cannot be flipped horizontally == directed graphs.
Putting dominoes one after the other is called a "path", if it is a closed path, it's a circuit.
A circuit that includes all the vertices of a graph is a Hamiltonian circuit.
Your problem in graph theory terms is: how to split (decompose) your graph into a minimum number of subgraphs that have Hamiltonian circuits. (a.k.a. Hamiltonian graphs)
The problem as it is now is not as clearly stated as it could be - how exactly are solutions rated? What is the most important criterion? Is it the length of the longest chain? Is there a penalty for creating chains of length one?
It is often helpful in such problems to visualize the structure as a graph - say, assign a vertex (V[i]) to each tile. Then for each i, j create an edge between vertices V[i], V[j] if you can place V[i] to the left of V[j] in a chain (so if V[i] corresponds to (X, A) then V[j] corresponds to (A, Y) for some X, Y, A).
In such a graph, chains are paths and closed ("circular") chains are cycles, so the problem has been reduced to finding some cycle and/or path covering of the graph. This type of problem can in turn often be reduced to matching or *-flow problems (max-flow, max-cost-max-flow, min-cost-max-flow or what have you).
But before you can reduce further you have to establish the precise rules according to which one solution is determined to be "better" than another.
It is easy to check whether there exists a circular chain consisting of all dominoes. First you need to make the following directed graph G:
Nodes of G are symbols on the dominoes (A,B,C..) in your example,
For each domino (A,B) you put a directed edge from A to B.
There exists a circular chain consisting of all dominoes iff there exists an Eulerian circuit in G. To check whether the directed graph G has an Eulerian circuit, it is sufficient to check that every node has equal in-degree and out-degree, and that all nodes with at least one edge are connected to each other.
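A rough Python sketch of that test (function name is mine): count in/out degrees per symbol and check balance plus connectivity.

from collections import defaultdict

def single_circular_chain_exists(dominoes):
    # True iff ALL dominoes can be arranged into one circular chain.
    # Each domino (A, B) is treated as a directed edge A -> B.
    if not dominoes:
        return True
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    adj = defaultdict(list)
    for left, right in dominoes:
        out_deg[left] += 1
        in_deg[right] += 1
        adj[left].append(right)
    symbols = set(out_deg) | set(in_deg)
    if any(out_deg[s] != in_deg[s] for s in symbols):
        return False
    # With balanced degrees, reachability from any one symbol to every
    # other used symbol gives the required connectivity.
    start = next(iter(symbols))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == symbols

print(single_circular_chain_exists([("A", "B"), ("B", "C"), ("C", "A")]))               # True
print(single_circular_chain_exists([("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]))   # False: two separate circles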
I'm not sure if this is really the case, but judging by your examples, your problem looks similar to decomposing a permutation into a product of disjoint cycles. Each tile (X,Y) can be seen as P(X) = Y for a permutation P. If this agrees with your assumptions, the good (or bad) news is that such a decomposition is unique (up to the cycle order) and very easy to find. Basically, you start with any tile, find the tile whose left symbol matches its right symbol, and keep following matches until none can be found. Then you move to the next untouched tile. If that's not what you are looking for, the more general graph-based solution by t.dubrownik looks like the way to go.
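Under that permutation assumption (each symbol appears at most once as a left symbol and at most once as a right symbol), the decomposition is just cycle-following; here is a rough Python sketch with my own names:

def chains_from_tiles(tiles):
    # Follow each tile's right symbol to the tile whose left symbol matches.
    # Returns chains as lists of tile indices; a chain is circular when the
    # last tile's right symbol equals the first tile's left symbol.
    start_of = {left: i for i, (left, right) in enumerate(tiles)}
    used, chains = set(), []
    for i in range(len(tiles)):
        if i in used:
            continue
        chain, j = [], i
        while j is not None and j not in used:
            used.add(j)
            chain.append(j)
            j = start_of.get(tiles[j][1])   # tile starting with our right symbol
        chains.append(chain)
    return chains

print(chains_from_tiles([("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]))
# [[0, 1], [2, 3]] - matches out[0]=(0,1), out[1]=(2,3) in the examples above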

Algorithm to find two points furthest away from each other

I'm looking for an algorithm to be used in a racing game I'm making. The map/level/track is randomly generated, so I need to find two locations, start and goal, that make use of most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions; up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the straight-line ("as the crow flies") distance. The path between A and B should be longer if there is a wall (or other blocking area) between them.
I'm unsure where to start; comments are very welcome, and proposed solutions are preferred in pseudocode.
Edit: Right. After looking through gs's code I gave it another shot. Instead of Python, this time I wrote it in C++. But still, even after reading up on Dijkstra's algorithm, the flood fill and Hosam Aly's solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. Full source is on pastie. The only interesting lines (I guess) are the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate it if someone would be kind enough to point out the differences between the algorithms.
In Hosam Aly's algorithm, is the only difference that he scans from the borders instead of from every node?
In Dijkstra's you keep track of and overwrite the distance walked, but not in the flood fill - is that about it?
Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 }
for each point p on the border
    flood-fill all points in the map to find the most distant point distantP and its distance newDistance
    if newDistance > bestSolution.distance
        bestSolution = { p, distantP, newDistance }
    end if
end loop
At first glance this looks like O(n^2), but if I am not mistaken it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 is the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood fill accesses a point at most 4 times (once from each direction). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n^2). (If the map is square, the order would be O(16 n^1.5).)
Update: as per the comments, since the map is more of a maze (rather than one with simple obstacles, as I initially thought), you can apply the same logic as above but check all points in the map (as opposed to border points only). This would be on the order of O(4n^2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth-first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, hence the coefficient of 4 in the formulas above.
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to #Georg for his review.
P.S. Any comments or corrections are welcome.
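A rough Python translation of the pseudocode above, using a breadth-first flood fill on a boolean grid (True = open cell); the names and grid format are my own:

from collections import deque

def flood_fill_farthest(grid, start):
    # BFS over open cells; returns the farthest cell from start and its distance.
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    far, far_d = start, 0
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                if dist[(nr, nc)] > far_d:
                    far, far_d = (nr, nc), dist[(nr, nc)]
                queue.append((nr, nc))
    return far, far_d

def best_start_and_goal(grid):
    # Loop over the open border cells and keep the best flood-fill result.
    rows, cols = len(grid), len(grid[0])
    border = [(r, c) for r in range(rows) for c in range(cols)
              if grid[r][c] and (r in (0, rows - 1) or c in (0, cols - 1))]
    best = ((0, 0), (0, 0), 0)
    for p in border:
        q, d = flood_fill_farthest(grid, p)
        if d > best[2]:
            best = (p, q, d)
    return best   # (start, goal, distance)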
Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was slower by a great magnitude; this is probably because the graph has a very limited number of edges, which is exactly what lets the simpler algorithm finish quickly.
These were the times; for each step the number of cells was quadrupled, and 3 out of 10 cells were obstacles.
Size        Hosam Aly    Floyd-Warshall
(10x10)     0m0.002s     0m0.007s
(20x20)     0m0.009s     0m0.307s
(40x40)     0m0.166s     0m22.052s
(80x80)     0m2.753s     -
(160x160)   0m48.028s    -
The time of Hosam Aly seems to be quadratic, therefore I'd recommend using that algorithm.
Also, the memory consumption of Floyd-Warshall is O(n^2), clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.
It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.
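As a sketch, the double sweep is just two calls to the flood_fill_farthest helper from the earlier sketch (so this snippet assumes that function is in scope); the result is exact on trees and usually very close on maze-like maps:

def approx_endpoints(grid, some_open_cell):
    # Sweep 1: farthest cell a from an arbitrary open cell.
    a, _ = flood_fill_farthest(grid, some_open_cell)
    # Sweep 2: farthest cell b from a; (a, b) is the candidate start/goal pair.
    b, d = flood_fill_farthest(grid, a)
    return a, b, d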
I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(E log V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth-first search, which is O(V + E) - effectively O(V) here, since each cell has at most 4 neighbours. It can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs [pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the n x n (0-1) adjacency matrix of an unweighted, undirected graph G

All-Pairs-Distances(A):
    Z = A * A
    Let B be the n x n matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
    if b_ij = 1 for all i != j: return 2B - A   // base case
    T = All-Pairs-Distances(B)
    X = T * A
    Let D be the n x n matrix s.t. d_ij = 2*t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2*t_ij - 1
    return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)
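For reference, here is a rough numpy transcription of that pseudocode (assuming a connected graph and A given as a square 0/1 numpy array); it is a direct translation, not a tuned implementation:

import numpy as np

def all_pairs_distances(A):
    # Seidel's recursion on the adjacency matrix of a connected,
    # unweighted, undirected graph; returns the distance matrix D.
    n = A.shape[0]
    Z = A @ A
    B = ((A == 1) | (Z > 0)).astype(int)
    np.fill_diagonal(B, 0)
    if B.sum() == n * (n - 1):         # b_ij = 1 for all i != j: base case
        return 2 * B - A
    T = all_pairs_distances(B)
    X = T @ A
    deg = A.sum(axis=1)                # degree(j) for each column j
    return 2 * T - (X < T * deg)       # subtract 1 exactly where x_ij < t_ij * degree(j)

The pair of points with the greatest distance is then np.unravel_index(D.argmax(), D.shape) for D = all_pairs_distances(A), matching the argmax step above.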
Finished a Python mockup of the Dijkstra solution to the problem.
Code got a bit long so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
In the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
It doesn't seem to work though; it always reports the top-left and bottom-right corners as the endpoints of the longest path: 58 tiles. Which of course is true when you don't have obstacles. But even after adding a couple of randomly placed obstacles, the program still finds that pair to be the farthest apart. Maybe it's still true; it's hard to test without more advanced shapes.
But maybe it can at least show my ambition.
Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth-first search, you simply explore all edges once, in a way that guarantees that you find the shortest path to each node - i.e. by exploring the edges in the order you find them.
So basically the difference is that Dijkstra's has to 'backtrack' and look at edges it has explored before to make sure it is following the shortest route, while the breadth-first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral whose outer end turns back toward the centre, the two farthest points could be one at the heart of the spiral and the other at the end of the spiral - both in the middle of the map!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).
Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)
If your objects (points) do not move frequently, you can perform such a calculation in much less than O(n^3) time.
All you need is to break the space into large grid cells and pre-calculate the inter-cell distances. Then selecting the point pairs that occupy the most distant cells is a matter of a simple table lookup. In the average case you will only need to check a small set of objects pair-wise.
This solution works if the distance metric is continuous. Thus if, for example, there are many barriers in the map (as in mazes), this method might fail.
