I'm programming a Risk-like game in CodeIgniter and jQuery. I've come up with a way to create randomly generated maps by making a full layout of tiles and then deleting random ones. However, this sometimes produces what I call islands.
In Risk, you can only attack one space over. So if one player happens to have an island all to themselves, they would never be able to lose.
I'm trying to find a way that I can check the map before the game begins to see if it has islands.
I have already come up with a function to find out how many adjacent spaces there are to each space, but am not sure how to implement it in order to find islands.
Each missing spot is also identified as "water."
I'm not allowed to use image tags:
http://imgur.com/xwWzC.gif
There's a standard name for this problem but off the top of my head the following might work:
Pick any tile at random
Color it
Color its neighbours
Color its neighbours' neighbours
Color its neighbours' neighbours' neighbours, etc.
When you're done (i.e. when all reachable neighbours are colored), loop through the list of all tiles to see whether any are still uncolored (if so, they're an island); a minimal sketch of this is below.
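A minimal sketch of that coloring pass in Python, assuming a set `land` of tiles and a hypothetical `neighbors(tile)` helper; every tile gets a component label, so more than one label means islands exist:

def color_components(land, neighbors):
    color = {}
    next_color = 0
    for tile in land:
        if tile in color:
            continue
        next_color += 1                  # start a new component
        frontier = [tile]
        while frontier:
            t = frontier.pop()
            if t in color:
                continue
            color[t] = next_color        # color it, then its neighbours, etc.
            frontier.extend(n for n in neighbors(t) if n not in color)
    return color                         # more than one color => islands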
How do you do the random generation? Probably the best approach is to solve this at generation time: while generating the map, if you notice that the tile you just created is impossible to get to, you can resolve it immediately by adding the appropriate connecting tile.
Though we'll need to know how you do the generation.
Here's your basic depth-first traversal starting at a random tile, in runnable Python (a LIFO stack gives depth-first order; a FIFO queue would make it breadth-first, which works just as well for a reachability check):

visited = set()
stack = [r]                    # r = a randomly chosen tile
while stack:
    current = stack.pop()
    if current in visited:
        continue               # a tile may be pushed more than once
    visited.add(current)
    for neighbor in current.neighbors():
        if neighbor not in visited:
            stack.append(neighbor)

if visited == set(all_tiles):
    print("No islands")
else:
    print("Islands found:", set(all_tiles) - visited)
This hopefully provides another solution. Instead of "island" I'm using the term "disconnected component" since it only matters whether all tiles are reachable from all others (if there are disconnected components then a player cannot win via conquest if his own territories are all in one component).
Iterate over all 'land' tiles (easy enough to do) and for each tile generate a node in a graph.
For each vertex, join it with an undirected edge to the vertices representing its neighbour tiles (maximum of 6).
Pick any vertex and run depth-first search (or breadth-first) from it.
If the set of vertices found by the DFS is equal to the set of all vertices then there are no disconnected components, otherwise a disconnected component (island) exists.
This should (I think) run in time O(n) where n is the number of land tiles.
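A sketch of steps 1-4 in Python, assuming axial hex coordinates (q, r) and a set `land` of land tiles; the six offsets are the standard axial hex directions and may need adjusting to your own tile indexing:

HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def build_adjacency(land):
    # vertex -> list of up to 6 neighbouring land vertices
    return {t: [(t[0] + dq, t[1] + dr) for dq, dr in HEX_DIRS
                if (t[0] + dq, t[1] + dr) in land]
            for t in land}

def is_connected(land):
    adj = build_adjacency(land)
    seen, stack = set(), [next(iter(land))]   # step 3: DFS from any vertex
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v])
    return seen == land                       # step 4: compare vertex sets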
Run a blurring kernel over your data set, treating the hex grid as an image (it is, sort of):

value(x, y) = average of all tiles around (x, y)

This will erode beaches slightly and eliminate islands.
It is left as an exercise for the student to run an edge-detection kernel over the resulting dataset to populate the beach tiles.
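A rough sketch of the blur-and-threshold idea, assuming the map is a dict mapping axial hex coordinates to 1 (land) or 0 (water); the 0.5 cut-off and the axial offsets are assumptions, not part of the answer above:

HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def blur(height):
    # each tile becomes the average of itself and its six neighbours
    out = {}
    for (q, r), v in height.items():
        vals = [v] + [height.get((q + dq, r + dr), 0) for dq, dr in HEX_DIRS]
        out[(q, r)] = sum(vals) / len(vals)
    return out

def rethreshold(blurred, cut=0.5):
    # a lone land tile surrounded by water averages 1/7, falls below the
    # cut-off, and sinks; solid interior land stays at 1 and survives
    return {p: 1 if v >= cut else 0 for p, v in blurred.items()}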
Related
I'm developing a game similar to Pacman: consider this maze:
Each white square is a node from the maze where an object located at P, say X, is moving towards node A in the right-to-left direction. X cannot switch to its opposite direction unless it encounters a dead-end such as A. Thus the shortest path joining P and B goes through A because X cannot reverse its direction towards the rightmost-bottom node (call it C). A common A* algorithm would output:
to get to B from P first go rightward, then go upward;
which is wrong. So I thought: well, I can set C's visited attribute to true before running A* and let the algorithm find the path. Obviously this method doesn't work for the linked maze, unless I allow it to rediscover some nodes (the question is: which nodes? How do I tell them apart from useless ones?).

The first thought that crossed my mind was: use the previous method while always keeping track of the last-visited cell; if the resulting path isn't empty, you are done. Otherwise, when A* fails, go to the last-visited dead end, say Y, and from there use standard A* to get to the goal (I'm assuming the maze is connected).

My questions are: is this guaranteed to always work? Is there a more efficient algorithm, such as an A*-derived algorithm modified for this purpose? How would you tackle this problem? I would greatly appreciate an answer explaining both optimal and non-optimal search techniques (I don't actually need the shortest path, a slightly longer path is fine, but I'm curious whether an optimal algorithm running as efficiently as Dijkstra's algorithm exists; if it does, what is its running time compared to a non-optimal algorithm?)
EDIT For Valdo: I added 3 cells in order to generalize a bit: please tell me if I got the idea:
Good question. I can suggest the following approach.
Use Dijkstra's algorithm (or A*) on a directed graph. Each cell in your maze should be represented by multiple (up to 4) graph nodes, each node denoting the visited cell in a specific state.
That is, in your example you may be in the cell denoted by P in one of 2 states: while going left, and while going right. Each of them is represented by a separate graph node (though spatially it's the same cell). There's also no direct link between those 2 nodes, since you can't switch your direction in this specific cell.
According to your rules you may only switch direction when you encounter an obstacle, this is where you put links between the nodes denoting the same cell in different states.
You may also think of your graph as your maze copied into 4 layers, each layer representing the state of your pacman. In the layer that represents movement to the right you put only links to the right, consistent with the geometry of your maze. In the cells with obstacles, where moving right is not possible, you put links to the same cells at different layers.
Update:
Regarding the scenario you described in your sketch: it's actually correct, you've got the idea right, but it looks complicated because you decided to put links between different cells AND different states at the same time.
I suggest the following diagram:
The idea is to split your inter-cell AND inter-state links. There are now 2 kinds of edges: inter-cell, marked by blue, and inter-state, marked by red.
Blue edges always connect nodes of the same state (arrow direction) between adjacent cells, whereas red edges connect different states within the same cell.
According to your rules the state change is possible where an obstacle is encountered, hence every state node is the source of either blue edges (if there is no obstacle) or red edges (if it encounters an obstacle, i.e. it can't emit a blue edge). Hence I also painted the state nodes in blue and red.
If according to your rules state transition happens instantly, without delay/penalty, then red edges have weight 0. Otherwise you may assign a non-zero weight for them, the weight ratio between red/blue edges should correspond to the time period ratio of turn/travel.
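A minimal sketch of this construction in Python, under the rule from this answer (turns are only possible where the current direction is blocked) and with zero-cost turns; `free` (the set of walkable cells) and the other helper names are illustrative:

import heapq

DIRS = {"L": (-1, 0), "R": (1, 0), "U": (0, -1), "D": (0, 1)}

def successors(state, free):
    # state = ((x, y), facing)
    (x, y), d = state
    dx, dy = DIRS[d]
    if (x + dx, y + dy) in free:
        yield ((x + dx, y + dy), d), 1       # blue edge: travel, cost 1
    else:
        for d2 in DIRS:                       # red edges: turn in place
            if d2 != d:
                yield ((x, y), d2), 0         # cost 0 (instant turn)

def min_cost(start_state, goal_cell, free):
    # Dijkstra over the layered state graph; 0-weight edges are fine here
    dist = {start_state: 0}
    heap = [(0, start_state)]
    while heap:
        cost, state = heapq.heappop(heap)
        if state[0] == goal_cell:
            return cost
        if cost > dist.get(state, float("inf")):
            continue                          # stale heap entry
        for nxt, w in successors(state, free):
            if cost + w < dist.get(nxt, float("inf")):
                dist[nxt] = cost + w
                heapq.heappush(heap, (cost + w, nxt))
    return None                               # goal unreachable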
If you're not familiar with it (the puzzle is known as Rush Hour), the game consists of a collection of cars of varying sizes, set either horizontally or vertically, on an NxM grid that has a single exit.
Each car can move forward/backward in the directions it's set in, as long as another car is not blocking it. You can never change the direction of a car.
There is one special car, usually it's the red one. It's set in the same row that the exit is in, and the objective of the game is to find a series of moves (a move - moving a car N steps back or forward) that will allow the red car to drive out of the maze.
I've been trying to think how to generate instances for this problem, producing levels of difficulty based on the minimum number of moves needed to solve the board.
Any idea of an algorithm or a strategy to do that?
Thanks in advance!
The board given in the question has at most 4*4*4*5*5*3*5 = 24,000 possible configurations, given the placement of cars.
A graph with 24,000 nodes is not very large for today's computers. So a possible approach would be to
construct the graph of all positions (nodes are positions, edges are moves),
find the minimal number of moves needed to win from every node (e.g. using Dijkstra) and
select a node with a large distance from the goal.
One possible approach would be creating it in reverse.
Generate a random board that has the red car in the winning position.
Build the graph of all reachable positions.
Select a position that has the largest distance from every winning position.
The number of reachable positions is not that big (probably always below 100k), so (2) and (3) are feasible.
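A sketch of both graph-based approaches above, assuming hypothetical helpers: successors(board) yields every position reachable in one move, and winning_positions is the set of boards with the red car at the exit; boards must be hashable. Since moves are reversible, a backward BFS from the winning positions gives each position's optimal solution length, and unit move costs mean plain BFS can stand in for Dijkstra:

from collections import deque

def distances_from(winning_positions, successors):
    dist = {b: 0 for b in winning_positions}
    queue = deque(winning_positions)
    while queue:
        b = queue.popleft()
        for nxt in successors(b):
            if nxt not in dist:
                dist[nxt] = dist[b] + 1
                queue.append(nxt)
    return dist

def hardest_position(winning_positions, successors):
    dist = distances_from(winning_positions, successors)
    return max(dist, key=dist.get)   # reachable node farthest from any win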
How to create harder instances through local search
It's possible that above approach will not yield hard instances, as most random instances don't give rise to a complex interlocking behavior of the cars.
You can do some local search, which requires
a way to generate other boards from an existing one
an evaluation/fitness function
(2) is simple: for instance, use the length of the shortest solution, as above. Though this is quite costly.
(1) requires some thought. Possible modifications are:
add a car somewhere
remove a car (I assume this will always make the board easier)
Those two are enough to reach all possible boards, but one might want to add other moves, because removing a car only makes the board easier. Here are some ideas:
move a car perpendicularly to its driving direction
swap cars within the same lane (aaa..bb.) -> (bb..aaa.)
Hillclimbing/steepest ascent is probably bad because of the large branching factor. One can try to subsample the set of possible neighbouring boards, i.e., look at only a few random neighbours instead of all of them; a skeleton of this is sketched below.
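A skeleton of the subsampled local search described above; mutate(board) applies one of the modifications listed (add/remove/move/swap a car) and difficulty(board) returns the solution length from the earlier step. Both helpers and the default parameters are hypothetical:

import random

def local_search(board, mutate, difficulty, iterations=200, sample_size=8):
    best, best_score = board, difficulty(board)
    for _ in range(iterations):
        # subsample the neighbourhood instead of enumerating every neighbour
        candidates = [mutate(best) for _ in range(sample_size)]
        scored = [(difficulty(b), b) for b in candidates if b is not None]
        if not scored:
            continue
        score, cand = max(scored, key=lambda t: t[0])
        if score > best_score:           # keep the hardest board found so far
            best, best_score = cand, score
    return best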
I know this is ancient but I recently had to deal with a similar problem so maybe this could help.
Constructing instances by applying random operators from a terminal state (i.e., reverse) will not work well. This is due to the symmetry in the state space. On average you end up in a state that is too close to the terminal state.
Instead, what worked better was to generate initial states (by placing random cars on the grid) and then to try to solve it with some bounded heuristic search algorithm such as IDA* or branch and bound. If an instance cannot be solved under the bound, discard it.
Try to avoid plain A*: if you have a definition of what you mean by a "hard" instance (I find 16 moves to be pretty difficult), you can use A* with a pruning rule that prevents expansion of nodes x with g(x)+h(x) > T (T being your threshold, e.g. 16).
Heuristic function: since you don't have to solve optimally, you can use any simple inadmissible heuristic, such as the number of obstacle squares between the red car and the goal. Alternatively, if you need a stronger heuristic, you can implement a Manhattan-distance function by generating the entire set of winning states for the generated puzzle and then using the minimal distance from the current state to any of those terminal states. A sketch of the bounded search is below.
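A sketch of the bounded search: A* that refuses to expand nodes whose g + h exceeds the threshold T. Here successors(state), is_goal(state) and the (possibly inadmissible) heuristic h(state) are hypothetical helpers:

import heapq
from itertools import count

def bounded_astar(start, is_goal, successors, h, T=16):
    tie = count()                        # tiebreaker so states never compare
    heap = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    while heap:
        f, _, g, state = heapq.heappop(heap)
        if is_goal(state):
            return g                     # solved within the bound
        for nxt in successors(state):
            g2 = g + 1
            if g2 + h(nxt) > T:
                continue                 # pruning rule: beyond the threshold
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(heap, (g2 + h(nxt), next(tie), g2, nxt))
    return None                          # unsolvable in <= T moves: discard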
The Problem
I have a bit array which represents a 2-dimensional "map" of "tiles". This image provides a graphical example of the bits in the bit array:
I need to determine how many contiguous "areas" of bits exist in the array. In the example above, there are two such contiguous "areas", as illustrated here:
Tiles must be located directly N, S, E or W of a tile to be considered "contiguous". Diagonally-touching tiles do not count.
What I've Got So Far
Because these bit arrays can become relatively large (several MB in size), I have intentionally avoided using any sort of recursion in my algorithm.
The pseudo-code is as follows:
LET S BE SOURCE DATA ARRAY
LET C BE ARRAY OF IDENTICAL LENGTH TO SOURCE DATA USED TO TRACK "CHECKED" BITS
FOREACH INDEX I IN S
IF C[I] THEN
CONTINUE
ELSE IF S[I] THEN
EXTRACT_AREA(S, C, I)
ELSE
SET C[I]
EXTRACT_AREA(S, C, I):
LET T BE TARGET DATA ARRAY FOR STORING BITS OF THE AREA WE'RE EXTRACTING
LET F BE STACK OF TILES TO SEARCH NEXT
PUSH I ONTO F
SET T[I]
WHILE F IS NOT EMPTY
LET X = POP FROM F
IF C[X] THEN
CONTINUE
ELSE
SET C[X]
IF S[X] THEN
PUSH TILE NORTH OF X TO F
PUSH TILE SOUTH OF X TO F
PUSH TILE WEST OF X TO F
PUSH TILE EAST OF X TO F
SET T[X]
RETURN T
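For reference, a direct Python translation of the pseudo-code above, kept iterative; the map is assumed to be a flat bytearray of width*height cells with 1 meaning a set bit (helper names are illustrative):

def find_areas(s, width):
    checked = bytearray(len(s))            # the C array
    areas = []
    for i in range(len(s)):
        if checked[i]:
            continue
        if s[i]:
            areas.append(extract_area(s, checked, i, width))
        else:
            checked[i] = 1
    return areas

def extract_area(s, checked, i, width):
    t = bytearray(len(s))                  # the T array for this area
    stack = [i]                            # the F stack
    while stack:
        x = stack.pop()
        if checked[x]:
            continue                       # duplicates may sit on the stack
        checked[x] = 1
        if s[x]:
            t[x] = 1
            if x >= width:              stack.append(x - width)   # north
            if x < len(s) - width:      stack.append(x + width)   # south
            if x % width > 0:           stack.append(x - 1)       # west
            if x % width < width - 1:   stack.append(x + 1)       # east
    return t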
What I Don't Like About My Solution
Just to run, it requires two times the memory of the bitmap array it's processing.
While extracting an "area", it uses three times the memory of the bitmap array.
Duplicates exist in the "tiles to check" stack - which seems ugly, but not worth avoiding the way I have things now.
What I'd Like To See
Better memory profile
Faster handling of large areas
Solution / Follow-Up
I re-wrote the solution to explore the edges only (per @hatchet's suggestion).
This was very simple to implement - and eliminated the need to keep track of "visited tiles" completely.
Based on three simple rules, I can traverse the edges, track min/max x & y values, and complete when I've arrived at the start again.
Here's the demo with the three rules I used:
One approach would be a perimeter walk.
Given a starting point anywhere along the edge of the shape, remember that point.
Start the bounding box as just that point.
Walk the perimeter using a clockwise rule set - if the point used to get to the current point was above, then first look right, then down, then left to find the next point on the shape perimeter. This is kind of like the simple strategy of solving a maze where you continuously follow a wall and always bear to the right.
Each time you visit a new perimeter point, expand the bounding box if the new point is outside it (i.e. keep track of the min and max x and y).
Continue until the starting point is reached.
Cons: if the shape has lots of single pixel 'filaments', you'll be revisiting them as the walk comes back.
Pros: if the shape has large expanses of internal occupied space, you never have to visit them or record them like you would if you were recording visited pixels in a flood fill.
So, conserves space, but in some cases at the expense of time.
Edit
As is so often the case, this problem is known, named, and has multiple algorithmic solutions. The problem you described is called Minimum Bounding Rectangle. One way to solve this is using Contour Tracing. The method I described above is in that class, and is called Moore-Neighbor Tracing or Radial Sweep. The links I've included for them discuss them in detail and point out a problem I hadn't caught. Sometimes you'll revisit the start point before you have traversed the entire perimeter. If your start point is for instance somewhere along a single-pixel 'filament', you will revisit it before you're done, and unless you consider this possibility, you'll stop prematurely. The website I linked to talks about ways to address this stopping problem. Other pages at that website also talk about two other algorithms: Square Tracing, and Theo Pavlidis's Algorithm. One thing to note is that these consider diagonals as contiguous, whereas you don't, but that should just be something that can be handled with minor modifications to the basic algorithms.
An alternative approach to your problem is Connected-component labeling. For your needs though, this may be a more time expensive solution than you require.
Additional resource:
Moore Neighbor Contour Tracing Algorithm in C++
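For illustration, a rough Python sketch of Moore-Neighbor tracing for the bounding box, using the naive stop-on-revisiting-the-start rule; as discussed above this can stop prematurely on single-pixel filaments, which Jacob's stopping criterion addresses. The grid layout (list of boolean rows, y growing downward) is an assumption:

def moore_bounding_box(grid):
    h, w = len(grid), len(grid[0])
    filled = lambda x, y: 0 <= x < w and 0 <= y < h and grid[y][x]
    # the 8 Moore neighbours in clockwise order, starting from west
    ring = [(-1, 0), (-1, -1), (0, -1), (1, -1),
            (1, 0), (1, 1), (0, 1), (-1, 1)]
    # scan for a start pixel; its west neighbour is empty by construction
    sx, sy = next((x, y) for y in range(h) for x in range(w) if grid[y][x])
    bx, by = sx - 1, sy                        # backtrack point
    cx, cy = sx, sy
    min_x = max_x = sx
    min_y = max_y = sy
    while True:
        # walk clockwise around (cx, cy) starting just after the backtrack
        i = ring.index((bx - cx, by - cy))
        for k in range(1, 9):
            dx, dy = ring[(i + k) % 8]
            if filled(cx + dx, cy + dy):
                pdx, pdy = ring[(i + k - 1) % 8]
                bx, by = cx + pdx, cy + pdy    # new backtrack: last empty cell
                cx, cy = cx + dx, cy + dy      # step to next contour pixel
                break
        else:
            break                              # isolated single pixel
        min_x, max_x = min(min_x, cx), max(max_x, cx)
        min_y, max_y = min(min_y, cy), max(max_y, cy)
        if (cx, cy) == (sx, sy):
            break                              # back at the start: done
    return min_x, min_y, max_x, max_y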
I actually got a question like this in an interview once.
You can pretend the array is a graph where the connected nodes are the adjacent ones. My algorithm involves scanning one cell to the right until you find a marked node. When you find one, do a breadth-first search, which runs in O(n) and avoids recursion. When the BFS returns, keep searching from where you left off; if a node has already been marked by one of the previous BFS's, you obviously don't need to search from it. I wasn't sure if you wanted to actually return the number of objects found, but it's easy to keep track by just incrementing a counter when you hit the first marked square of a new region.
Generally when you do a flood-fill type algorithm you are placed in a spot and asked to fill. Since this is finding all the filled regions, one way you would want to optimize it is to avoid rechecking the already-marked nodes from previous BFS's; unfortunately, at the moment I cannot think of a way to do that.
One hacky way to reduce memory consumption would be to store a short[][] instead of a boolean[][], and use this scheme to avoid allocating a whole second 2D array:
unmarked = 0, marked = 1, checked and unmarked = 2, checked and marked = 3
This way you can read the status of an entry from its value alone and avoid making a second array.
There is a square grid with obstacles. On that grid, there are two members of a class Person. They face a certain direction (up, right, left or down). Each person has a certain amount of energy. Making the persons turn or making them move consumes energy (turning consumes 1 energy unit, moving consumes 5 energy units).
My goal is to make them move as close as possible to each other (closeness expressed as the Manhattan distance), consuming the least amount of energy possible. Keep in mind there are obstacles on the grid.
How would I do this?
I would use a breadth-first search and record the minimal energy value needed to reach each square. It would terminate when the players meet or there is no more energy left.
I'll make assumptions and remove them later.
Assuming the grid is smaller than 1000x1000 and that you can't run out of energy:
Assuming they can't reach each other:
For Person1 and Person2, find their respective sets of reachable points, R1 and R2
(use breadth-first search, for example).
sort R1 and R2 by x value.
Now go through R1 and R2 to find the pair of points that are closest together.
Hint: we sorted the two arrays, so we know when points are close in terms of their x coordinate. We never have to look further apart on the x coordinate than the current minimum found; a sketch of this sweep is below.
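A sketch of that sweep, assuming R1 and R2 hold (x, y) tuples and distances are Manhattan distances; |x1 - x2| alone is a lower bound on the Manhattan distance, which justifies the early break:

import bisect

def closest_pair(R1, R2):
    R2s = sorted(R2)
    xs = [p[0] for p in R2s]
    best, best_pair = float("inf"), None
    for (x, y) in R1:
        i = bisect.bisect_left(xs, x)
        # scan right, then left, stopping once the x gap alone is too large
        for indices in (range(i, len(R2s)), range(i - 1, -1, -1)):
            for j in indices:
                x2, y2 = R2s[j]
                if abs(x2 - x) >= best:
                    break
                d = abs(x2 - x) + abs(y2 - y)   # Manhattan distance
                if d < best:
                    best, best_pair = d, ((x, y), (x2, y2))
    return best_pair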
Assuming they can reach each other: Use BFS from person1 until you find person2 and record the path
If the path found using BFS required no turns, then that is the solution,
Otherwise:
Create 4 copies of the grid (call them right-grid, left-grid, up-grid, down-grid).
the rule is, you can only be in the left grid if you are moving left, you can only be in the right grid if you are moving right, etc. To turn, you must move from one grid to the other (which uses energy).
Create this structure and then use BFS.
Example:
Now the left grid assumes you are moving left, so create a graph from the left grid where every point is connected to the point on its left with the amount of energy to move forwards.
The only other option when in the left-grid is to move to the up-grid or the down-grid (which uses 1 energy), so connect the corresponding grid points of the up-grid, down-grid and left-grid, etc.
Now you have built your graph, simply use breadth first search again.
I suggest you use Python's NetworkX; it will only be about 20 lines of code (a sketch is below).
Make sure you don't connect squares if there is an obstacle in the way.
Good luck.
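A sketch of the four-layer construction with NetworkX, assuming a set open_cells of obstacle-free (x, y) squares and the costs from the question (5 energy to move forwards, 1 energy per turn); treating a 180-degree turn as two 90-degree turns is an extra assumption. Note that with unequal weights the final search is Dijkstra rather than plain BFS, which NetworkX runs when given weight="weight":

import networkx as nx

DIRS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def build_layered_graph(open_cells):
    G = nx.DiGraph()
    for (x, y) in open_cells:
        for d, (dx, dy) in DIRS.items():
            nxt = (x + dx, y + dy)
            if nxt in open_cells:
                # within one layer you may only move in that layer's direction
                G.add_edge(((x, y), d), (nxt, d), weight=5)
            for d2, (dx2, dy2) in DIRS.items():
                if d2 == d:
                    continue
                opposite = (dx2, dy2) == (-dx, -dy)
                # switching layers in place = turning
                G.add_edge(((x, y), d), ((x, y), d2),
                           weight=2 if opposite else 1)
    return G

# e.g.:
# G = build_layered_graph(open_cells)
# nx.shortest_path_length(G, ((0, 0), "right"), ((5, 3), "up"), weight="weight")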
Suppose we have a connected and undirected graph: G=(V,E).
Definition of connected-set: a group of points belonging to V of G forms a valid connected-set iff every point in this group is within T-1 edges of any other point in the same group, where T is the number of points in the group.
Please note that a connected-set is just the vertex set of a connected subgraph of G (the points without the edges).
We also have an arbitrary function F defined on connected-sets, i.e. given an arbitrary connected-set CS, F(CS) gives us a real value.
Two connected-sets are said to be disjoint if their union is not a connected-set.
For a visual explanation, see the graph below:
In the graph, the red, black and green point sets are all valid connected-sets; the green set is disjoint from the red set, but the black set is not disjoint from the red one.
Now the question:
We want to find a bunch of disjoint connected-sets from G so that:
(1) every connected-set has at least K points (K is a global parameter);
(2) the sum of their function values, Σ F(CS), is maximized.
Is there any efficient algorithm to tackle such a problem, other than an exhaustive search?
Thanks!
For example, the graph can be a planar graph in the 2D Euclidean plane, and the function value F of a connected-set CS can be defined as the area of the minimum bounding rectangle of all the points in CS (the minimum bounding rectangle is the smallest rectangle enclosing all the points in the CS).
If you can define your function and prove it is a Submodular Function (property analogous to that of Convexity in continuous Optimization) then there are very efficient (strongly polynomial) algorithms that will solve your problem e.g. Minimum Norm Point.
To prove that your function is Submodular you only need to prove the following:
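F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) for every pair of sets A, B ⊆ V (the standard submodularity condition).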
There are several available implementations of the Minimum Norm Point algorithm e.g. Matlab Toolbox for Submodular Function Optimization
I doubt there is an efficient algorithm since for a complete graph for instance, you cannot solve the problem without knowing the value of F on every subgraph (except if you have assumptions on F: monotonicity for instance).
Nevertheless, I'd go for a non-deterministic algorithm. Try simulated annealing, with the transitions being:
Remove a point from a set (if it stays connected)
Move a point from a set to another (if they stay connected)
Remove a set
Add a set with one point
Good luck, this seems to be a difficult problem.
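A generic skeleton of that annealing loop, for illustration; random_transition(sets) applies one of the four moves above (returning None when a move would break connectivity), score(sets) computes Σ F(CS), and the linear cooling schedule is an arbitrary choice. All helpers are hypothetical:

import math
import random

def anneal(initial_sets, random_transition, score, steps=100_000, t0=1.0):
    current, cur_score = initial_sets, score(initial_sets)
    best, best_score = current, cur_score
    for step in range(steps):
        t = max(t0 * (1 - step / steps), 1e-9)   # temperature decreases
        cand = random_transition(current)
        if cand is None:
            continue                             # move was invalid
        cand_score = score(cand)
        # always accept improvements; accept worsenings with probability
        # exp(delta / t), which shrinks as the temperature drops
        delta = cand_score - cur_score
        if delta >= 0 or random.random() < math.exp(delta / t):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score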
For such a general F, it is not an easy task to draft an optimized algorithm that goes far beyond the brute-force approach.
For instance, since we want to find a bunch of CS where F(CS) is maximized, should we assume we actually want to find max(Σ F(CS)) over all sets of disjoint CS, or the highest single F value among all possible CS, max(F(csi))? We don't know for sure.
Also, F being arbitrary, we cannot estimate the probability of having F(cs+p1) > F(cs) => F(cs+p1+p2) > F(cs).
However, we can still discuss it:
It seems we can deduce from the problem that we can treat each CS independently, meaning that if n = F(cs1), adding any cs2 (disjoint from cs1) has no impact on the value n.
It also seems believable, and this is where we should be able to get some gain, that the calculation of F can be started from any point of a CS, and, in general, if CS = cs1+cs2, then F(CS) = F(cs1+cs2) = F(cs2+cs1).
Then we want to inject memoization into the algorithm in order to speed up the process when a CS is grown little by little in order to find max(F(cs)). (Considering how general F is, the dynamic programming approach, for instance starting from a CS made of all points and then reducing it little by little, doesn't seem very attractive.)
Ideally, we could start with a CS made of a single point, extend it by one point at a time, checking and storing F values (for each subset). Each test would first check whether the F value is already stored, in order not to recalculate it; then repeat the process for another point, etc., to find the subsets that maximize F. For a large number of points, this is a very lengthy process.
A more reasonable approach would be to try random points and grow the CS up to a given size, then try another area distinct from the bigger CS obtained at the previous stage. One could try to assess the probability explained above, and direct the algorithm in a certain way depending on the result.
But, again due to lack of F properties, we can expect an exponential space need via memoization (like storing F(p1,...,pn), for all subsets). And an exponential complexity.
I would use dynamic programming. You can start out by rephrasing your problem as a node-coloring problem:
Your goal is to assign a color to each node. (In other words you are looking for a coloring of the nodes)
The available colors are black and white.
In order to judge a coloring you have to examine the set of "maximal connected sets of black nodes".
A set of black nodes is called connected if the induced subgraph is connected
A connected set of black nodes is called maximal if none of the nodes in the set has a black neighbor in the original graph that is not contained in the set.
Your goal is to find the coloring that maximizes ΣF(CS). (Here you sum over the "maximal connected sets of black nodes")
You also have the extra constraints specified in your original post (each set must have at least K points); a brute-force reference implementation of this objective is sketched below.
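An exponential reference implementation of the coloring objective above, usable only for tiny graphs; G is assumed to be a dict mapping each node to its neighbors, F scores a frozenset of nodes, and K is the minimum set size:

from itertools import product

def maximal_black_components(G, black):
    # connected components of the subgraph induced by the black nodes;
    # maximality is automatic since components are not adjacent to each other
    comps, seen = [], set()
    for v in black:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            seen.add(u)
            stack.extend(w for w in G[u] if w in black)
        comps.append(frozenset(comp))
    return comps

def best_coloring(G, F, K):
    nodes = list(G)
    best = float("-inf")
    for bits in product([False, True], repeat=len(nodes)):
        black = {v for v, b in zip(nodes, bits) if b}
        comps = maximal_black_components(G, black)
        if any(len(c) < K for c in comps):
            continue               # violates the minimum-size constraint
        best = max(best, sum(F(c) for c in comps))
    return best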
Perhaps you could look for an algorithm that does something like the following
Pick a node
Try to color the chosen node white
Look for a coloring of the remaining nodes that maximizes ΣF(CS)
Try to color the chosen node black
Look for a coloring of the remaining nodes that maximizes ΣF(CS)
Each time you have colored a node white then you can examine whether or not the graph has become "decomposable" (I made up this word. It is not official):
A partially colored graph is called "decomposable" if it contains a pair of non-white nodes that are not connected by any path that avoids white nodes.
If your partially colored graph is decomposable then you can split your problem in to two sub-problems.
EDIT: I added an alternative idea and deleted it again. :)