Solving logic game Lights out with A* algorithm - algorithm

I have some problems solving the logic puzzle called Lights out using the A* algorithm. For now, I'm using an implementation of the A* algorithm where I consider the entire matrix of lights as a node in the algorithm (where 1 represents the lights on and 0 the lights off), together with the coordinates of the current node that will be toggled. After I select the node from the open list with the lowest f score, I will toggle it and get its 8 adjacent neighbors and then append them to the open list and repeat until I find a node that has the sum of all the lights equals to 0 (all the lights are off).
For calculating the f score of each node, I simply compute the sum of all the lights in their local matrix, thus selecting every time the node which has the matrix with the lowest number of lights on.
I know that the algorithm will be not so performant, even when compared to the "Chasing the Lights" method, but I do not understand how to tell the algorithm which next node to pick, so which f scoring function to use, because considering the sum of the lights in the matrix will end up with the algorithm looping through the same 3/4 nodes every time.
Also, I would like some suggestions on how to represent a node for the algorithm since I can't get how to use an algorithm used generally for path optimization inside a matrix where you have a goal node, used in a situation like this where you consider the entire matrix as the node and your goal is not reaching a particular node but just checking that its sum is 0.
The language that I implemented all the work is Lua.
Thank you.
EDIT 5/27/19
Since I'm new to Lua, I'm blaming my mistakes and my capability of writing code in it and also my understanding of the algorithm to the fact that I'm not able to find the solution.
I wasn't good at explaining the problem I was having so I tried to get the best from the comments I received and now I will post the modified code so that guys if you want to help, you will understand better (code >> words haha).
Note: I wrote the algorithm based on this article A* algorithm
lua source code

Not sure what you're looking for.
First of all, simple observation: each field in your game board you want to toggle at most once (toggling a field two times doesn't do anything).
You can go with a very bad approach and create 2^n game states (nodes in your graph), where n is amount of fields in your game board. Why is it terrible? It would take you at least O(2^n) time and space to just create the graph. In O(2^n) time you can just check all possible moves (since you want to toggle each board once) and just immidiately tell the result instead of running additional A*.
Better idea(?): Let's not phisically create the whole graph. You don't want to visit a node two times, so you should store somewhere (probably as some set of bitmasks, where you'd be able to check if a node is already in the set quickly) already visited nodes. Then, when you're in some node, you check all your neighbours - game states after toggling one field. If they're visited, ignore them, otherwise add them to your priority queue. Then take the first node from priority queue, remove it and 'go' to a game state it is representing.
As I said, you can represent game state as bitmask of size n - 0 on i-th position if i-th field hasn't been toggled, 1 otherwise.
Will that be better than naive approach? Depends on your heuristic function. I have no idea which one is better, you have to try some and check the results. In worst case your A* will check every nodes, making the complexity worse than simple brute force. If you get lucky, it may significantly speed it up.

Related

Fixing Karger's min cut algorithm with union-find data structure

I was trying to implement Karger's min cut algorithm in the same way it is explained here but I don't like the fact that at each step of the while loop we can pick an edge with it's two endpoints already in a supernode. More specifically, this part
// If two corners belong to same subset,
// then no point considering this edge
if (subset1 == subset2)
continue;
Is there a quick fix for avoiding this problem?
It might help to back up and think about why there’s a union-find structure here at all and why it’s worth improving on the continue statement.
Conceptually, each contraction performed changes the graph in the following way:
The nodes contracted get replaced with a single node.
The edges incident to either node get replaced with an edge to the new joint node.
The edges running between the two earlier nodes get removed.
The question, then, is how to actually do this to the graph. The code you’ve found does this lazily. It doesn’t actually change the graph when the contraction is done. Instead, it uses the union-find structure to show which nodes are now equivalent to one another. When it samples a random edge, it then has to check whether that edge is one of the ones that would have been deleted in step (3). If so, it skips it and moves on. This has the effect that early contractions are really fast (the likelihood of picking two edges that are part of contracted nodes is very low when few edges are contracted), but later contractions might be a lot slower (once edges have started being contracted, lots of edges may have been deleted).
Here’s a simple modification you can use to speed this step up. Whenever you pick an edge to contract and find that its endpoints are already connected, discard that edge, and remove it from the list of edges so that it never gets picked again. You can do this by swapping that edge to the end of the list of edges, then removing the last element of the list. This has the effect that every edge processed will never be seen again, so across all iterations of the algorithm every edge will be processed at most once. That gives a runtime of one randomized contraction phase as O(m + nα(n)), where m is the number of edges and n is the number of nodes. The factor of α(n) comes from the use of the union-find structure.
If you truly want to remove all semblance of that continue statement, an alternative approach would be to directly simulate the contraction. After each contraction, iterate over all m edges and adjust each one by seeing whether it needs to remain unchanged, point to the new contracted node, or be removed altogether. This will take time O(m) per contraction for a net cost of O(mn) for the overall min cut calculation.
There are ways to speed things up beyond this. Karger’s original paper suggests generating a random permutation of the edges and using binary search over that array with a clever use of BFS or DFS to find the cut produced in time O(m), which is slightly faster than the O(m + nα(n)) approach for large graphs. The basic idea is the following:
Probe the middle element of the list of edges.
Run a BFS on the graph formed by only using those edges and see if there are exactly two connected components.
If so, great! Those two CCs are the ones you want.
If there is only one CC, discard the back half of the array of edges and try again.
If there is more than one CC, contract each CC into a single node and update a global table indicating which CC each node belongs to. Then discard the first half of the array and try again.
The cost of each BFS is O(m), where m is the number of edges in the graph, and this gives the recurrence T(m) = T(m/2) + O(m) because at each stage we’re throwing away half of the edges. That solves to O(m) total time, though as you can see, it’s a much trickier way of coding this algorithm up!
To summarize:
With a very small modification to the provided code, you can keep the continue statement in but still have a very fast implementation of the randomized contraction algorithm.
To eliminate that continue without sacrificing the runtime of the algorithm, you need to do some major surgery and change approaches to something only marginally asymptotically faster than keeping the continue in.
Hope this helps!

How do i find a path through a boolean array?

Here's a question I came across on an interview forum:
You have an array filled with 0s and 1s. A 0 represents burning lava,
and a 1 represents a stepping stone. You start at the beginning of the
array, and you want to find the fastest way to reach the end. At each
time step, you can either increase or decrease your velocity V by 1,
or you can jump to a stepping stone V steps away. You want to reach
the end of the array without overshooting.
What is a good algorithm to solve this?
I tried a few things (mostly using dynamic programming and recursion), but I couldn't figure out an optimal substructure that would lead to a non exponential algorithm. Any ideas?
At each step you have two choices: increasing velocity, or not increasing the velocity. Then, for each of these options, you end up at a different step and the choice repeats. Maybe you can see the tree pattern emerging here. Each node in the tree is a step and each edge is a choice. Each node (step) has two edges (choices), so it is a binary tree.
Also note that if you are at a step x with a velocity V, then it doesn't matter how you got there, the result of what follows will be the same. So here you can optimize a bit. (For example, using dynamic programming.)
The brute-force approach would be to just imagine this tree and do a depth-first search until you reach the end step exactly. There may be multiple solutions, and the fastest one is the solution.
Dynamic Programming is the right approach here.
Your search space is two-dimensional: the current position is your first dimension, and the current velocity is the second dimension. This means that you need a 2D array best[N][N], where N is the number of items in the boolean array. The value at best[s,v] represents the smallest number of steps required to reach position of s with the velocity of v. Examine each point of the search space to check if a stepping stone can be reached with the current velocity. If the answer is "yes", set the corresponding spot in the search space. Also set the points for adjacent speeds. The answer would be at the position best[N-1][0].

Generating a tower defense maze (longest maze with limited walls) - near-optimal heuristic?

In a tower defense game, you have an NxM grid with a start, a finish, and a number of walls.
Enemies take the shortest path from start to finish without passing through any walls (they aren't usually constrained to the grid, but for simplicity's sake let's say they are. In either case, they can't move through diagonal "holes")
The problem (for this question at least) is to place up to K additional walls to maximize the path the enemies have to take. For example, for K=14
My intuition tells me this problem is NP-hard if (as I'm hoping to do) we generalize this to include waypoints that must be visited before moving to the finish, and possibly also without waypoints.
But, are there any decent heuristics out there for near-optimal solutions?
[Edit] I have posted a related question here.
I present a greedy approach and it's maybe close to the optimal (but I couldn't find approximation factor). Idea is simple, we should block the cells which are in critical places of the Maze. These places can help to measure the connectivity of maze. We can consider the vertex connectivity and we find minimum vertex cut which disconnects the start and final: (s,f). After that we remove some critical cells.
To turn it to the graph, take dual of maze. Find minimum (s,f) vertex cut on this graph. Then we examine each vertex in this cut. We remove a vertex its deletion increases the length of all s,f paths or if it is in the minimum length path from s to f. After eliminating a vertex, recursively repeat the above process for k time.
But there is an issue with this, this is when we remove a vertex which cuts any path from s to f. To prevent this we can weight cutting node as high as possible, means first compute minimum (s,f) cut, if cut result is just one node, make it weighted and set a high weight like n^3 to that vertex, now again compute the minimum s,f cut, single cutting vertex in previous calculation doesn't belong to new cut because of waiting.
But if there is just one path between s,f (after some iterations) we can't improve it. In this case we can use normal greedy algorithms like removing node from a one of a shortest path from s to f which doesn't belong to any cut. after that we can deal with minimum vertex cut.
The algorithm running time in each step is:
min-cut + path finding for all nodes in min-cut
O(min cut) + O(n^2)*O(number of nodes in min-cut)
And because number of nodes in min cut can not be greater than O(n^2) in very pessimistic situation the algorithm is O(kn^4), but normally it shouldn't take more than O(kn^3), because normally min-cut algorithm dominates path finding, also normally path finding doesn't takes O(n^2).
I guess the greedy choice is a good start point for simulated annealing type algorithms.
P.S: minimum vertex cut is similar to minimum edge cut, and similar approach like max-flow/min-cut can be applied on minimum vertex cut, just assume each vertex as two vertex, one Vi, one Vo, means input and outputs, also converting undirected graph to directed one is not hard.
it can be easily shown (proof let as an exercise to the reader) that it is enough to search for the solution so that every one of the K blockades is put on the current minimum-length route. Note that if there are multiple minimal-length routes then all of them have to be considered. The reason is that if you don't put any of the remaining blockades on the current minimum-length route then it does not change; hence you can put the first available blockade on it immediately during search. This speeds up even a brute-force search.
But there are more optimizations. You can also always decide that you put the next blockade so that it becomes the FIRST blockade on the current minimum-length route, i.e. you work so that if you place the blockade on the 10th square on the route, then you mark the squares 1..9 as "permanently open" until you backtrack. This saves again an exponential number of squares to search for during backtracking search.
You can then apply heuristics to cut down the search space or to reorder it, e.g. first try those blockade placements that increase the length of the current minimum-length route the most. You can then run the backtracking algorithm for a limited amount of real-time and pick the best solution found thus far.
I believe we can reduce the contained maximum manifold problem to boolean satisifiability and show NP-completeness through any dependency on this subproblem. Because of this, the algorithms spinning_plate provided are reasonable as heuristics, precomputing and machine learning is reasonable, and the trick becomes finding the best heuristic solution if we wish to blunder forward here.
Consider a board like the following:
..S........
#.#..#..###
...........
...........
..........F
This has many of the problems that cause greedy and gate-bound solutions to fail. If we look at that second row:
#.#..#..###
Our logic gates are, in 0-based 2D array ordered as [row][column]:
[1][4], [1][5], [1][6], [1][7], [1][8]
We can re-render this as an equation to satisfy the block:
if ([1][9] AND ([1][10] AND [1][11]) AND ([1][12] AND [1][13]):
traversal_cost = INFINITY; longest = False # Infinity does not qualify
Excepting infinity as an unsatisfiable case, we backtrack and rerender this as:
if ([1][14] AND ([1][15] AND [1][16]) AND [1][17]:
traversal_cost = 6; longest = True
And our hidden boolean relationship falls amongst all of these gates. You can also show that geometric proofs can't fractalize recursively, because we can always create a wall that's exactly N-1 width or height long, and this represents a critical part of the solution in all cases (therefore, divide and conquer won't help you).
Furthermore, because perturbations across different rows are significant:
..S........
#.#........
...#..#....
.......#..#
..........F
We can show that, without a complete set of computable geometric identities, the complete search space reduces itself to N-SAT.
By extension, we can also show that this is trivial to verify and non-polynomial to solve as the number of gates approaches infinity. Unsurprisingly, this is why tower defense games remain so fun for humans to play. Obviously, a more rigorous proof is desirable, but this is a skeletal start.
Do note that you can significantly reduce the n term in your n-choose-k relation. Because we can recursively show that each perturbation must lie on the critical path, and because the critical path is always computable in O(V+E) time (with a few optimizations to speed things up for each perturbation), you can significantly reduce your search space at a cost of a breadth-first search for each additional tower added to the board.
Because we may tolerably assume O(n^k) for a deterministic solution, a heuristical approach is reasonable. My advice thus falls somewhere between spinning_plate's answer and Soravux's, with an eye towards machine learning techniques applicable to the problem.
The 0th solution: Use a tolerable but suboptimal AI, in which spinning_plate provided two usable algorithms. Indeed, these approximate how many naive players approach the game, and this should be sufficient for simple play, albeit with a high degree of exploitability.
The 1st-order solution: Use a database. Given the problem formulation, you haven't quite demonstrated the need to compute the optimal solution on the fly. Therefore, if we relax the constraint of approaching a random board with no information, we can simply precompute the optimum for all K tolerable for each board. Obviously, this only works for a small number of boards: with V! potential board states for each configuration, we cannot tolerably precompute all optimums as V becomes very large.
The 2nd-order solution: Use a machine-learning step. Promote each step as you close a gap that results in a very high traversal cost, running until your algorithm converges or no more optimal solution can be found than greedy. A plethora of algorithms are applicable here, so I recommend chasing the classics and the literature for selecting the correct one that works within the constraints of your program.
The best heuristic may be a simple heat map generated by a locally state-aware, recursive depth-first traversal, sorting the results by most to least commonly traversed after the O(V^2) traversal. Proceeding through this output greedily identifies all bottlenecks, and doing so without making pathing impossible is entirely possible (checking this is O(V+E)).
Putting it all together, I'd try an intersection of these approaches, combining the heat map and critical path identities. I'd assume there's enough here to come up with a good, functional geometric proof that satisfies all of the constraints of the problem.
At the risk of stating the obvious, here's one algorithm
1) Find the shortest path
2) Test blocking everything node on that path and see which one results in the longest path
3) Repeat K times
Naively, this will take O(K*(V+ E log E)^2) but you could with some little work improve 2 by only recalculating partial paths.
As you mention, simply trying to break the path is difficult because if most breaks simply add a length of 1 (or 2), its hard to find the choke points that lead to big gains.
If you take the minimum vertex cut between the start and the end, you will find the choke points for the entire graph. One possible algorithm is this
1) Find the shortest path
2) Find the min-cut of the whole graph
3) Find the maximal contiguous node set that intersects one point on the path, block those.
4) Wash, rinse, repeat
3) is the big part and why this algorithm may perform badly, too. You could also try
the smallest node set that connects with other existing blocks.
finding all groupings of contiguous verticies in the vertex cut, testing each of them for the longest path a la the first algorithm
The last one is what might be most promising
If you find a min vertex cut on the whole graph, you're going to find the choke points for the whole graph.
Here is a thought. In your grid, group adjacent walls into islands and treat every island as a graph node. Distance between nodes is the minimal number of walls that is needed to connect them (to block the enemy).
In that case you can start maximizing the path length by blocking the most cheap arcs.
I have no idea if this would work, because you could make new islands using your points. but it could help work out where to put walls.
I suggest using a modified breadth first search with a K-length priority queue tracking the best K paths between each island.
i would, for every island of connected walls, pretend that it is a light. (a special light that can only send out horizontal and vertical rays of light)
Use ray-tracing to see which other islands the light can hit
say Island1 (i1) hits i2,i3,i4,i5 but doesn't hit i6,i7..
then you would have line(i1,i2), line(i1,i3), line(i1,i4) and line(i1,i5)
Mark the distance of all grid points to be infinity. Set the start point as 0.
Now use breadth first search from the start. Every grid point, mark the distance of that grid point to be the minimum distance of its neighbors.
But.. here is the catch..
every time you get to a grid-point that is on a line() between two islands, Instead of recording the distance as the minimum of its neighbors, you need to make it a priority queue of length K. And record the K shortest paths to that line() from any of the other line()s
This priority queque then stays the same until you get to the next line(), where it aggregates all priority ques going into that point.
You haven't showed the need for this algorithm to be realtime, but I may be wrong about this premice. You could then precalculate the block positions.
If you can do this beforehand and then simply make the AI build the maze rock by rock as if it was a kind of tree, you could use genetic algorithms to ease up your need for heuristics. You would need to load any kind of genetic algorithm framework, start with a population of non-movable blocks (your map) and randomly-placed movable blocks (blocks that the AI would place). Then, you evolve the population by making crossovers and transmutations over movable blocks and then evaluate the individuals by giving more reward to the longest path calculated. You would then simply have to write a resource efficient path-calculator without the need of having heuristics in your code. In your last generation of your evolution, you would take the highest-ranking individual, which would be your solution, thus your desired block pattern for this map.
Genetic algorithms are proven to take you, under ideal situation, to a local maxima (or minima) in reasonable time, which may be impossible to reach with analytic solutions on a sufficiently large data set (ie. big enough map in your situation).
You haven't stated the language in which you are going to develop this algorithm, so I can't propose frameworks that may perfectly suit your needs.
Note that if your map is dynamic, meaning that the map may change over tower defense iterations, you may want to avoid this technique since it may be too intensive to re-evolve an entire new population every wave.
I'm not at all an algorithms expert, but looking at the grid makes me wonder if Conway's game of life might somehow be useful for this. With a reasonable initial seed and well-chosen rules about birth and death of towers, you could try many seeds and subsequent generations thereof in a short period of time.
You already have a measure of fitness in the length of the creeps' path, so you could pick the best one accordingly. I don't know how well (if at all) it would approximate the best path, but it would be an interesting thing to use in a solution.

An algorithm to check if a vertex is reachable

Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
Depth first search or breadth first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can improve the performance sometimes with some heuristics, like if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can attempt to have the depth-first search look along lines that "aim towards the target". However, imagine the case where the start and end points are right next to each other, but with no vector inbetween, and to find it, you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited
While there are nodes in the queue:
If the node is V2, return true
Mark the node as visited
For every node at the end of an outgoing edge which is not yet visited:
Add this node to the queue
End for
End while
Return false
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm which creates an O(nlogn) space data structure that takes O(nlogn) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with start location as the rows and end location as the columns. Fill the arrays with 0. For each edge in the list of edges, place a one in the start,end coordinate of the edge.
Now you iterate a few times until either V1,V2 is 1 or there are no changes.
For each row:
NextRowN = RowN
For each column that is true for RowN
Use boolean OR to OR in the results of that row of that number with the current NextRowN.
Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable by V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows are never changing.
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.

Algorithm to find two points furthest away from each other

Im looking for an algorithm to be used in a racing game Im making. The map/level/track is randomly generated so I need to find two locations, start and goal, that makes use of the most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions; up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the "bird path" for a lack of a better word. The path between A and B should be longer if there is a wall (or other blocking area) between them.
Im unsure on where to start, comments are very welcome and proposed solutions are preferred in pseudo code.
Edit: Right. After looking through gs's code I gave it another shot. Instead of python, I this time wrote it in C++. But still, even after reading up on Dijkstras algorithm, the floodfill and Hosam Alys solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. Full source is on pastie. The only interesting lines (I guess) is the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate the help if someone would be kind enough to point out the differences in the algorithms.
In Hosam Alys algorithm, is the only difference that he scans from the borders instead of every node?
In Dijkstras you keep track and overwrite the distance walked, but not in floodfill, but thats about it?
Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 };
for each point p on the border
flood-fill all points in the map to find the most distant point
if newDistance > bestSolution.distance
bestSolution = { p, distantP, newDistance }
end if
end loop
I guess this would be in O(n^2). If I am not mistaken, it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 represents the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood-fill would access a point a maximum of 4 times (from all directions). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n2). (If the map is square, the order would be O(16n1.5).)
Update: as per the comments, since the map is more of a maze (than one with simple obstacles as I was thinking initially), you could make the same logic above, but checking all points in the map (as opposed to points on the border only). This should be in order of O(4n2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, thus the coefficient in the formulas above.
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to #Georg for his review.
P.S. Any comments or corrections are welcome.
Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was by a great magnitude slower, probably this is because of the very limited amount of edges that help this algorithm to achieve this.
These were the times, each time the field was quadruplet and 3 out of 10 fields were an obstacle.
Size Hosam Aly Floyd-Warshall
(10x10) 0m0.002s 0m0.007s
(20x20) 0m0.009s 0m0.307s
(40x40) 0m0.166s 0m22.052s
(80x80) 0m2.753s -
(160x160) 0m48.028s -
The time of Hosam Aly seems to be quadratic, therefore I'd recommend using that algorithm.
Also the memory consumption by Floyd-Warshall is n2, clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.
It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.
I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(Elog V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth first search, which is O(V). This can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs
[pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the nxn (0-1) adjacency matrix of an unweighted, undirected graph, G
All-Pairs-Distances(A)
Z = A * A
Let B be the nxn matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
if b_ij = 1 for all i != j return 2B - A //base case
T = All-Pairs-Distances(B)
X = T * A
Let D be the nxn matrix s.t. d_ij = 2t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2t_ij - 1
return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)
Finished a python mockup of the dijkstra solution to the problem.
Code got a bit long so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
In the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
Dont seem to work though, it always displays the topleft and bottomright corner as the longest path; 58 tiles. Which of course is true, when you dont have obstacles. But even adding a couple of randomly placed ones, the program still finds that one the longest. Maybe its still true, hard to test without more advanced shapes.
But maybe it can at least show my ambition.
Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth first search, you simply explore all edges once in a way that garantuees that you find the shortest path to each node. i.e. by exploring the edges in the order you find them.
So basically the difference is Dijkstra's has to 'backtrack' and look at edges it has explored before to make sure it is following the shortest route, while the breadth first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral, but with the outer end going back to the middle, you could have two points one at the heart of the spiral and the other in the end of the spiral, both in the middle!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).
Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)
If your objects (points) do not move frequently you can perform such a calculation in a much shorter than O(n^3) time.
All you need is to break the space into large grids and pre-calculate the inter-grid distance. Then selecting point pairs that occupy most distant grids is a matter of simple table lookup. In the average case you will need to pair-wise check only a small set of objects.
This solution works if the distance metrics are continuous. Thus if, for example there are many barriers in the map (as in mazes), this method might fail.

Resources