Problem:
Consider a patient suffering from a skin infection in which germs spread rapidly. Assume the skin surface is modeled as a rectangular grid of size MxN whose cells are marked 0 or 1, where 0 represents an unaffected region of skin and 1 represents an affected region. Germs can move from one cell of the grid to another in 4 possible directions (right, left, up, down), but each germ can move to only one cell at a time in one direction, infecting that cell in 1 sec. The doctor treating the patient sees the current status and wants to know how much time is left to save the patient before the germs spread over the entire skin. Can you help estimate the minimum time taken for the germs to completely occupy the skin surface?
Input: Current status of the skin (a matrix of size MxN with 1's and 0's representing affected and unaffected areas).
Output: Minimum time in seconds to cover the entire grid.
Example:
Input:
[1 1 0 0 1]
[0 1 1 0 0]
[0 0 0 0 1]
[0 1 0 0 0]
Output: 2 seconds
Explanation:
After 1 sec from the input, the matrix could look as below:
[1 1 1 0 1]
[1 1 1 0 1]
[0 1 1 0 1]
[0 1 1 0 1]
In the next sec, the matrix is completely filled with 1's.
I will not present a detailed solution here, just some thoughts that may hopefully help you write your own program.
The first step is to determine the kind of algorithm to implement. The optimal way would be to find a simple and fast ad hoc solution for this problem. In the absence of such a solution, the classical candidates for this kind of problem are DFS, BFS, A*, ...
As the goal is to find the shortest solution, it seems natural to consider BFS first: once BFS finds a solution, we know it is the shortest one, and we can stop the search. However, we then have to avoid an explosion in the number of nodes, as it would lead not only to a huge calculation time but also to huge memory consumption.
A first idea to avoid node inflation is to observe that some 1-cells can be expanded into only one other cell. In the posted diagram, for example, cell (0, 0) (top left) can only be expanded to cell (1, 0). Then, after this expansion, cell (1, 1) can only expand to cell (2, 1); we therefore know it would be suboptimal to expand cell (1, 1) to cell (1, 0). Conclusion: perform such forced expansions first.
In a similar way, once an infected cell is surrounded by other infected cells only, it no longer needs to be considered for further moves.
In the end, it is convenient to maintain a list of infected cells together with the number of non-infected cells each of them can move to, as in the sketch below.
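For instance (a minimal sketch in Python, my own illustration rather than part of the original answer), such a list can be built in one pass:

# Map each infected cell to the healthy neighbours it can expand to.
# Cells with exactly one entry are forced moves; cells that no longer
# appear are fully surrounded and can be ignored from now on.
def frontier(grid):
    rows, cols = len(grid), len(grid[0])
    result = {}
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 1:
                free = [(i + di, j + dj)
                        for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0))
                        if 0 <= i + di < rows and 0 <= j + dj < cols
                        and grid[i + di][j + dj] == 0]
                if free:
                    result[(i, j)] = free
    return result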
Another idea to limit the number of nodes is to detect duplicates, as many of them are likely to appear here. For that, we have to define some kind of hashing. The hash function does not need to be 100% collision-free, but it needs to be fast to calculate, if possible incrementally. If we obtain diagram B from diagram A by adding a 1-cell at position (i, j), then I propose something like
H(B) = H(A) ^ f(i, j)
f(i, j) = a*(1024*i + j) % b
Here, I used the fact that N and M are less than 1000, so 1024*i + j is unique per cell.
Each time a new diagram is considered, we calculate its H value and check whether it already exists in the set of past diagrams.
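A minimal sketch of such incremental hashing in Python (the constants a and b below are arbitrary choices of mine, not values from the answer):

A, B = 1_000_003, (1 << 61) - 1  # hypothetical constants a and b

def f(i, j):
    return (A * (1024 * i + j)) % B

def update_hash(h, i, j):
    # H(B) = H(A) ^ f(i, j); XOR makes the result independent of the
    # order in which the 1-cells were added.
    return h ^ f(i, j)

seen = set()
h = 0
for i, j in [(0, 0), (0, 1), (1, 1)]:  # hash of an initial diagram
    h = update_hash(h, i, j)
seen.add(h)  # later diagrams are checked against this set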
I'm not sure how far I would get with this in an interview situation. After some thought, rather than considering solutions that store more than one full board state, I would rather consider a greedy priority queue, since a strong heuristic for choosing the next zero-cell candidates to fill seems to be:
(1) healthy cells that have the fewest neighbouring infected cells (but at least one, of course),
e.g., choose A over B
1 1 B 0 1
0 1 1 0 0
0 0 A 0 1
0 1 0 0 0
and (2) break ties by choosing first the healthy cells that, when infected, will block the fewest infected cells.
e.g., choose A over B
1 1 1 0 1
1 B 1 0 A
0 0 0 0 1
0 1 0 0 0
An interesting observation is that any healthy destination cell can technically be reached in a time equal to its Manhattan distance from the nearest infected cell, where the cell leading such a "crawl" continually chooses the single move that brings it closer to the destination. At the same time, though, this same infected-cell "snake" produces new "crawlers" that could reach any equally far or closer neighbours. This makes me wonder whether there may be a more efficient way to determine the lower bound, based on counts of the farthest healthy cells.
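As a sanity check of that observation, here is a small Python sketch (mine, not part of the answer) computing the Manhattan lower bound; on the example grid above it returns 2:

def manhattan_lower_bound(grid):
    # Farthest healthy cell, measured by Manhattan distance to the
    # nearest infected cell; assumes at least one cell is infected.
    germs = [(i, j) for i, row in enumerate(grid)
                    for j, v in enumerate(row) if v == 1]
    return max((min(abs(i - gi) + abs(j - gj) for gi, gj in germs)
                for i, row in enumerate(grid)
                for j, v in enumerate(row) if v == 0), default=0)

grid = [[1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0]]
print(manhattan_lower_bound(grid))  # 2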
This is a variant of the multi-agent pathfinding problem (MAPF). There is a ton of recent work on this topic, but earlier modern work is a good starting point for finding optimal solutions to this problem - for instance the operator decomposition approach.
To do this you would order the agents (germs) 1..k. Then, you would start a search where you generate all possible first moves for germ 1, followed by all possible first moves for germ 2, and so on, where the moves for an agent are to stay in place, or to spread to an adjacent unoccupied location. With up to 5 possible actions for each germ (stay, or spread in one of the 4 directions), there are up to 5^k possible joint actions between complete states. (Partial states occur when you haven't yet assigned actions to all k agents.) The number of joint actions is exponential in k, meaning you may run up against resource constraints (time or space) fairly quickly. But, there are only 2^(MxN) states possible. (Since germs don't go away, it's actually 2^(MxN-i), where i is the number of initial germs.)
Every time all k germs have been assigned a possible action, you have a new complete state. (And k then increases for the next iteration, since spreading creates new germs.) The minimum time left comes from the shallowest complete state in which the grid is filled. A bit of brute-force computation will find the shortest solution. (Quite a bit, in the case of large grids.)
You could use BFS to find the first state that is completely filled, but A* might do much better. As a heuristic, you could assume that all adjacent locations of all infected cells get filled in each step, and then compute the number of steps required to fill the grid under that model. That gives a lower bound on the time required to fill the full grid; a sketch follows.
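A possible implementation of that heuristic (my own sketch in Python; the one-cell-per-germ model can only be slower than this relaxation, so the value is admissible):

from collections import deque

def flood_fill_steps(grid):
    # Relaxed model: every infected cell infects all 4 neighbours each
    # step, i.e. a multi-source BFS; the deepest layer is a lower bound.
    rows, cols = len(grid), len(grid[0])
    q = deque((i, j, 0) for i in range(rows)
                        for j in range(cols) if grid[i][j] == 1)
    seen = {(i, j) for i, j, _ in q}
    depth = 0
    while q:
        i, j, d = q.popleft()
        depth = max(depth, d)
        for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and (ni, nj) not in seen:
                seen.add((ni, nj))
                q.append((ni, nj, d + 1))
    return depth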
But, there are many more optimizations. The reason to do operator decomposition is that you can order the moves so as to take the best moves first and avoid considering the weaker possibilities (e.g., joint moves in which not all germs spread). You could also use a partial-expansion approach (EPEA*) to avoid generating a lot of clearly suboptimal policies for the germs.
If I were asking this as an interview question, I might be looking to see someone formulate the problem (what are the actions, what are the states), come up with the lower bound on the solution (every germ expands to every adjacent cell), come up with an algorithm, and perhaps analyze how hard the problem is, in order of increasing difficulty.
Related
I have a collection $C$ of unique binary 1-dimensional vectors of length $n$, so there are $2^n - 1$ elements in this collection, excluding the one with all $0$'s. It's not exactly a power set because all items are the same length, but I used a power set to create this and appended $0$'s in an appropriate way. For example, when $n = 3$,
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Each element has an associated reward function $R$ where $R(v) \ge 0$ for $v \in C$. Also, it's possible $R(v) \ge n$.
These are my current observations on $R$ (percentages are not exact, but representative):
$R(v) = 0$ like $80$% of the time
$R(v) = 10$ like $19$% of the time
$R(v) > 10$ like $1$% of the time
In short, most of the items in this collection will not provide a useful reward. The rewards are not random, but I don't know how to find out what distribution they follow. What is the best way to maximize this reward without having to go through all $2^n - 1$ elements?
Let me rephrase the question with an example.
I think of $C$ as a collection of on/off switch configurations, where each configuration turns on some number of light bulbs (one switch != one light bulb; for example, 0010 can turn on 0 light bulbs, and 0110 can turn on 5 light bulbs). What's the fastest way to find the maximum number of light bulbs turned on if
hardly any of these configurations will turn on a light bulb,
very few configurations turn on more than 10 light bulbs, but there are a few configurations that can turn on 100's of light bulbs.
I tried using a variation of Viterbi (using a cumulative sum of rewards instead of probabilities) in which I keep track of how changing a single element will yield a maximal result, but I think an implementation of branch and bound or a decision tree would be better in this case. A genetic algorithm would not work because the distribution of the rewards is too heavily biased towards 0 and 10. And a brute-force approach, evaluating one element at a time, would take forever if $n$ is large.
Any suggestions? I will have to code this up but I want to get a feel for a solution.
So I started practicing some algorithms and programming before university starts and I ran into this problem:
Given a 3x3 matrix containing the numbers from 0 to 8, find the minimum number of steps required to sort the matrix in the following format:
1 2 3
4 5 6
7 8 0
In one move it is only allowed to pick a cell that is adjacent to the cell which contains the 0 and swap those two cells.
Now, I am really stuck with this one and have no idea how to begin. Any tips and ideas to get me started are appreciated.
This is not homework if anyone thinks that way, I am just trying to exercise and by moving to tougher problems I got stuck. I am not looking for anyone to write the code for me, I just need a point in the right direction because I really want to understand the algorithm behind this. Thank you.
Note: This is actually an AI problem, and not a trivial data structure/algorithm problem.
This problem is called the n-puzzle problem. The example in your question is the 8-puzzle problem.
The way to solve this problem is by trying to shuffle the boxes in a way that each step gets you closer to your final goal. Think of this as a Greedy approach (Best-first search). The best algorithm to use here is the A* algorithm.
We define a state of the game to be the board position, the number of moves made to reach the board position, and the previous state. First, insert the initial state (the initial board, 0 moves, and a null previous state) into a priority queue. Then, delete from the priority queue the state with the minimum priority, and insert onto the priority queue all neighboring states (those that can be reached in one move). Repeat this procedure until the state dequeued is the goal state. The success of this approach hinges on the choice of priority function for a state. We consider two priority functions:
Hamming priority function. The number of blocks in the wrong position, plus the number of moves made so far to get to the state. Intuitively, a state with a small number of blocks in the wrong position is close to the goal state, and we prefer a state that has been reached using a small number of moves.
Manhattan priority function. The sum of the distances (sum of the vertical and horizontal distance) from the blocks to their goal positions, plus the number of moves made so far to get to the state.
For example, the Hamming and Manhattan priorities of the initial state below are 5 and 10, respectively.

 8  1  3        1  2  3     1  2  3  4  5  6  7  8     1  2  3  4  5  6  7  8
 4     2        4  5  6     ----------------------     ----------------------
 7  6  5        7  8        1  1  0  0  1  1  0  1     1  2  0  0  2  2  0  3

 initial        goal        Hamming = 5 + 0            Manhattan = 10 + 0
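To make the two priority functions concrete, here is a small Python sketch (my own illustration; boards are flat 9-tuples with 0 for the blank) that reproduces the values 5 and 10 for the board above:

def hamming(board, moves=0):
    # blocks in the wrong position (blank not counted) + moves so far
    return moves + sum(1 for pos, v in enumerate(board)
                       if v != 0 and v != pos + 1)

def manhattan(board, moves=0):
    # vertical + horizontal distances to goal (blank not counted) + moves
    total = 0
    for pos, v in enumerate(board):
        if v != 0:
            goal = v - 1
            total += abs(pos // 3 - goal // 3) + abs(pos % 3 - goal % 3)
    return moves + total

initial = (8, 1, 3, 4, 0, 2, 7, 6, 5)
print(hamming(initial), manhattan(initial))  # 5 10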
We make a key observation: to solve the puzzle from a given state on the priority queue, the total number of moves we need to make (including those already made) is at least its priority, using either the Hamming or Manhattan priority function. (For Hamming priority, this is true because each block that is out of place must move at least once to reach its goal position. For Manhattan priority, this is true because each block must move at least its Manhattan distance from its goal position. Note that we do not count the blank tile when computing the Hamming or Manhattan priorities.)
Consequently, as soon as we dequeue a state, we have not only discovered a sequence of moves from the initial board to the board associated with the state, but one that makes the fewest possible number of moves.
(Source)
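Putting the pieces together, here is a compact A* sketch in Python using the Manhattan priority (my own illustration, not code from the quoted source):

import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def manhattan(board):
    total = 0
    for pos, v in enumerate(board):
        if v:
            goal = v - 1
            total += abs(pos // 3 - goal // 3) + abs(pos % 3 - goal % 3)
    return total

def solve(start):
    # priority = moves made so far + Manhattan distance to the goal
    frontier = [(manhattan(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, moves, board = heapq.heappop(frontier)
        if board == GOAL:
            return moves  # fewest moves, since Manhattan is admissible
        z = board.index(0)
        for dz in (-1, 1, -3, 3):
            n = z + dz
            # horizontal swaps must stay on the same row
            if 0 <= n < 9 and not (abs(dz) == 1 and z // 3 != n // 3):
                nb = list(board)
                nb[z], nb[n] = nb[n], nb[z]
                nb = tuple(nb)
                if moves + 1 < best.get(nb, float('inf')):
                    best[nb] = moves + 1
                    heapq.heappush(frontier,
                                   (moves + 1 + manhattan(nb), moves + 1, nb))
    return -1  # goal unreachable (half of all permutations are unsolvable)

print(solve((8, 1, 3, 4, 0, 2, 7, 6, 5)))  # minimum number of moves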
I have been sitting on this for almost a week now. Here is the question in PDF format.
I could only think of one idea so far, but it failed. The idea was to recursively create all connected subgraphs, which works in O(num_of_connected_subgraphs), but that is way too slow.
I would really appreciate someone giving me a direction. I'm inclined to think that the only way is dynamic programming, but I can't seem to figure out how to do it.
OK, here is a conceptual description of the algorithm that I came up with:
Form an array for the (x, y) board map from -7 to 7 in both dimensions and place the opponent's pieces on it.
Starting with the first row (lowest Y value, -N):
enumerate all possible combinations of the 2nd player's pieces on the row, eliminating only those that conflict with the opponent's pieces.
for each combination on this row:
--group connected pieces into separate networks and number these
networks starting with 1, ascending
--encode the row as a vector using:
= 0 for any unoccupied or opponent position
= (1-8) for the network group that that piece/position is in.
--give each such grouping a COUNT of 1, and add it to a dictionary/hashset using the encoded vector as its key
Now, for each succeeding row, in ascending order {y=y+1}:
For every entry in the previous row's dictionary:
--If the entry has exactly 1 group, add its COUNT to TOTAL
--enumerate all possible combinations of the 2nd player's pieces
on the current row, eliminating only those that conflict with the
opponent's pieces. (Note: you should skip the initial combination,
where all entries are zero, for this step, as the step above already
covers it.) For each such combination on the current row:
+ produce a grouping vector as described above
+ compare the current row's group-vector to the previous row's
group-vector from the dictionary:
++ if there are any group-*numbers* from the previous row's
vector that are not adjacent to any groups in the current
row's vector, *for at least one value of X*, then skip
to the next combination.
++ any groups for the current row that are adjacent to any
groups of the previous row, acquire the lowest such group
number
++ any groups for the current row that are not adjacent to
any groups of the previous row, are assigned an unused
group number
+ Re-Normalize the group-number assignments for the current-row's
combination (**) and encode the vector, giving it a COUNT equal
to the previous row-vector's COUNT
+ Add the current-row's vector to the dictionary for the current
Row, using its encoded vector as the key. If it already exists,
then add its COUNT to the COUNT for the pre-existing entry
Finally, for every entry in the dictionary for the last row:
If the entry has exactly one group, then add its COUNT to TOTAL
**: Re-Normalizing simply means re-assigning the group numbers so as to eliminate any permutations in the grouping pattern. Specifically, this means that new group numbers should be assigned in increasing order, from left to right, starting from one. So for example, if your grouping vector looked like this after matching it against the previous row:
2 0 5 5 0 3 0 5 0 7 ...
it should be re-mapped to this normal form:
1 0 2 2 0 3 0 2 0 4 ...
Note that as in this example, after the first row, the groupings can be discontiguous. This relationship must be preserved, so the two groups of "5"s are re-mapped to the same number ("2") in the re-normalization.
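As an illustration of the re-normalization step (my own Python sketch, not part of the original answer):

def renormalize(vector):
    # Re-assign group numbers in order of first appearance, left to
    # right, starting from 1; 0 (empty/opponent) stays 0. Discontiguous
    # occurrences of the same old number map to the same new number.
    mapping, nxt, out = {}, 1, []
    for g in vector:
        if g == 0:
            out.append(0)
            continue
        if g not in mapping:
            mapping[g] = nxt
            nxt += 1
        out.append(mapping[g])
    return out

print(renormalize([2, 0, 5, 5, 0, 3, 0, 5, 0, 7]))
# -> [1, 0, 2, 2, 0, 3, 0, 2, 0, 4]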
OK, a couple of notes:
A. I think that this approach is correct, but I am really not certain, so it will definitely need some vetting, etc.
B. Although it is long, it's still pretty sketchy. Each individual step is non-trivial in itself.
C. Although there are plenty of individual optimization opportunities, the overall algorithm is still pretty complicated. It is a lot better than brute-force, but even so, my back-of-the-napkin estimate is still around (2.5 to 10)*10^11 operations for N=7.
So it's probably tractable, but still a long way off from doing 74 cases in 3 seconds. I haven't read all of the detail of Peter de Revaz's answer, but his idea of rotating the "diamond" might be workable for my algorithm. Although it would increase the complexity of the inner loop, it may drop the size of the dictionaries (and thus the number of grouping-vectors to compare against) by as much as 100x, though it's really hard to tell without actually trying it.
Note also that there isn't any dynamic programming here. I couldn't come up with an easy way to leverage it, so that might still be an avenue for improvement.
OK, I enumerated all possible valid grouping-vectors to get a better estimate for (C) above, which lowered it to O(3.5*10^9) for N=7. That's much better, but still about an order of magnitude over what you probably need to finish 74 tests in 3 seconds. That does depend on the tests, though; if most of them are smaller than N=7, it might be able to make it.
Here is a rough sketch of an approach for this problem.
First note that the lattice points need |x|+|y| < N; for N=7 this results in a diamond shape going from coordinates (0,6) to (6,0), i.e. with 7 points on each side.
If you imagine rotating this diamond by 45 degrees, you will end up with a 7x7 square lattice, which may be easier to think about. (Although note that there are also intermediate 6-high columns.)
For example, for N=3 the original lattice points are:
..A..
.BCD.
EFGHI
.JKL.
..M..
Which rotate to
A D I
C H
B G L
F K
E J M
On the (possibly rotated) lattice I would attempt to solve, by dynamic programming, the problem of counting the number of ways of placing armies in the first x columns such that the last column matches a certain string (plus a boolean flag to say whether any points have been placed yet).
The string contains a digit for each lattice point.
0 represents an empty location
1 represents an isolated point
2 represents the first of a new connected group
3 represents an intermediate in a connected group
4 represents the last in a connected group
During the algorithm the strings can represent shapes containing multiple connected groups, but we reject any transformations that leave an orphaned connected group.
When you have placed all columns you need to only count strings which have at most one connected group.
For example, the string for the first 5 columns of the shape below is:
....+ = 2
..+++ = 3
..+.. = 0
..+.+ = 1
..+.. = 0
..+++ = 3
..+++ = 4
The middle + is currently unconnected, but may become connected by a later column, so it still needs to be tracked. (In this diagram I am also assuming an up/down/left/right 4-connectivity. The rotated lattice should really use a diagonal connectivity, but I find that a bit harder to visualise and I am not entirely sure it is still a valid approach with this connectivity.)
I appreciate that this answer is not complete (and could do with lots more pictures/explanation), but perhaps it will prompt someone else to provide a more complete solution.
I've encountered an interesting problem while programming a random level generator for a tile-based game. I've implemented a brute-force solver for it, but it is exponentially slow and definitely unfit for my use case. I'm not necessarily looking for a perfect solution; I'll be satisfied with a “good enough” solution that performs well.
Problem Statement:
Say you have all or a subset of the following tiles available (this is the combination of all possible 4-bit patterns mapped to the right, up, left and down directions):
[image: base tile set: http://img189.imageshack.us/img189/3713/basetileset.png]
You are provided a grid where some cells are marked (true) and others not (false). This could be generated by a Perlin noise algorithm, for example. The goal is to fill this space with tiles so that there are as many complex tiles as possible. Ideally, all tiles should be connected. There might be no solution for some input values (available tiles + pattern). There is always at least one solution if the top-left, unconnected tile is available (in which case all pattern cells can be filled with that tile).
Example:
Images left to right: tile availability (green tiles can be used, red cannot), pattern to fill and a solution
[image: tile availability: http://img806.imageshack.us/img806/2391/sampletileset.png] + [image: pattern to fill: http://img841.imageshack.us/img841/7/samplepattern.png] = [image: a solution: http://img690.imageshack.us/img690/2585/samplesolution.png]
What I tried:
My brute-force implementation attempts every possible tile everywhere and keeps track of the solutions that were found. Finally, it chooses the solution that maximizes the total number of connections outgoing from each of the tiles. The time it takes is exponential in the number of tiles in the pattern; a pattern of 12 tiles takes a few seconds to solve.
Notes:
As I said, performance is more important than perfection. However, the final solution must be properly connected (no tile pointing to a tile which doesn't point back to the original tile). To give an idea of scope, I'd like to handle a pattern of 100 tiles in about 2 seconds.
For 100-tile instances, I believe that a dynamic program based on a carving decomposition of the input graph could fit the bill.
Carving decomposition
In graph theory, a carving decomposition of a graph is a recursive binary partition of its vertices. For example, here's a graph
1--2--3
| |
| |
4--5
and one of its carving decompositions
{1,2,3,4,5}
/ \
{1,4} {2,3,5}
/ \ / \
{1} {4} {2,5} {3}
/ \
{2} {5}.
The width of a carving decomposition is the maximum number of edges leaving one of its partitions. In this case, {2,5} has outgoing edges 2--1, 2--3, and 5--4, so the width is 3. The width of a kd-tree-style partition of a 10 x 10 grid is 13.
The carving-width of a graph is the minimum width of a carving decomposition. It is known that planar graphs (in particular, subgraphs of grid graphs) with n vertices have carving-width O(√n), and the big-O constant is relatively small.
Dynamic program
Given an n-vertex input graph and a carving decomposition of width w, there is an O(2^w n)-time algorithm to compute the optimal tile choice. This running time grows rapidly in w, so you should try decomposing some sample inputs by hand to get an idea of what kind of performance to expect.
The algorithm works on the decomposition tree from the bottom up. Let X be a partition, and let F be the set of edges that leave X. We make a table mapping each of the 2^|F| possibilities for the presence or absence of edges in F to the optimal sum on X under the specified constraints (-Infinity if there is no solution). For example, with the partition {1,4}, we have entries
{} -> ??
{1--2} -> ??
{4--5} -> ??
{1--2,4--5} -> ??
For the leaf partitions with only one vertex, the subset of F completely determines the tile, so it's easy to fill in the number of connections (if the tile is valid) or -Infinity otherwise. For the other partitions, when computing an entry of the table, try all different connectivity patterns for the edges that go between the two children.
For example, suppose we have pieces

               |
.  .-  .-  -.  .
       |

(that is: a blank, an east connector, an east+south connector, a west connector, and a north connector).
The table for {1} is
{} -> 0
{1--2} -> 1
{1--4} -> -Infinity
{1--2,1--4} -> 2
The table for {4} is
{} -> 0
{1--4} -> 1
{4--5} -> 1
{1--4,4--5} -> -Infinity
Now let's compute the table for {1,4}. For {}, without the edge 1--4 we have score 0 for {1} (entry {}) plus score 0 for {4} (entry {}). With edge 1--4 we have score -Infinity + 1 = -Infinity (entries {1--4}).
{} -> 0
For {1--2}, the scores are 1 + 0 = 1 without 1--4 and 2 + 1 = 3 with.
{1--2} -> 3
Continuing.
{4--5} -> 0 + 1 = 1 (> -Infinity = -Infinity + (-Infinity))
{1--2,4--5} -> 1 + 1 = 2 (> -Infinity = 2 + (-Infinity))
At the end we can use the tables to determine an optimal solution.
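To make the bottom-up combination step concrete, here is a sketch in Python (my own illustration; boundary edge sets are frozensets of edge labels, and NEG stands for -Infinity). It reproduces the {1--2} -> 3 entry from the worked example:

from itertools import chain, combinations

NEG = float('-inf')

def subsets(edges):
    edges = list(edges)
    return chain.from_iterable(combinations(edges, r)
                               for r in range(len(edges) + 1))

def combine(table_a, boundary_a, table_b, boundary_b):
    # Edges between the two children occur in both boundary sets; the
    # merged partition's boundary is everything else.
    shared = boundary_a & boundary_b
    outer = (boundary_a | boundary_b) - shared
    table = {}
    for s in map(frozenset, subsets(outer)):
        # Maximize over all connectivity patterns of the shared edges.
        table[s] = max(table_a.get((s & boundary_a) | t, NEG)
                       + table_b.get((s & boundary_b) | t, NEG)
                       for t in map(frozenset, subsets(shared)))
    return table

table_1 = {frozenset(): 0, frozenset({"1-2"}): 1,
           frozenset({"1-4"}): NEG, frozenset({"1-2", "1-4"}): 2}
table_4 = {frozenset(): 0, frozenset({"1-4"}): 1,
           frozenset({"4-5"}): 1, frozenset({"1-4", "4-5"}): NEG}
table_14 = combine(table_1, frozenset({"1-2", "1-4"}),
                   table_4, frozenset({"1-4", "4-5"}))
print(table_14[frozenset({"1-2"})])  # 3, as in the worked example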
Finding a carving decomposition
There are sophisticated algorithms for finding good carving decompositions, but you might not need them. Try a simple binary space partitioning scheme.
As a base, take a look at an earlier answer I gave on searching. Hill-climbing search programs are a tool every programmer should have in their arsenal, as they work much better than plain brute-force solvers.
Here, even a relatively bad search algorithm has the advantage that it won't generate illegal boards, greatly reducing the expected run time.
I think I may have a better idea. I haven't tested it, but I'm pretty sure it will be faster than a purely brute-force solution for large zones.
First, create an empty set (a "set" being a collection that only contains unique objects) of nodes. This collection will be used to identify which tiles have broken connections that need to be fixed.
Fill the data structures that represent the board with the available pieces, using the ones you deem most fit based on your personal criteria, with no regard to the correctness of the solution. This will almost certainly lead you to an invalid state, but it's okay for now. Iterate through the board and find all tiles that have connections leading to nowhere; add them to the set of broken tiles.
Now, iterate through the set. Change the tiles it refers to by reducing their number of connections (otherwise you could get into an infinite loop) so that they have no broken connections, respecting the currently available pieces. Check their neighbours again, and if you broke connections to other tiles, add those to the set of broken tiles too.
Once the set of broken connections is empty, you should have a fine-looking pattern; a sketch follows below. Note however that it has an important caveat: it might tend to oversimplify patterns, since the "fixing" phase always attempts to reduce the number of connections. You may have to be lucky to get interesting patterns, since the result is greatly affected by the first piece you put on each tile.
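A rough sketch of that repair loop (my own Python illustration under simplifying assumptions: each tile is a 4-bit mask over the right/up/left/down connections, cells maps the pattern's coordinates to the currently chosen mask, and the unconnected tile 0 is assumed available so the loop always terminates):

# bit 0 = right, 1 = up, 2 = left, 3 = down; opposite of b is (b + 2) % 4
DIRS = {0: (1, 0), 1: (0, -1), 2: (-1, 0), 3: (0, 1)}

def broken_bits(cells, p):
    # Connections of cells[p] that point off the pattern or at a
    # neighbour that does not point back.
    bad = 0
    for b, (dx, dy) in DIRS.items():
        if cells[p] >> b & 1:
            q = (p[0] + dx, p[1] + dy)
            if q not in cells or not cells[q] >> (b + 2) % 4 & 1:
                bad |= 1 << b
    return bad

def repair(cells, available):
    pending = {p for p in cells if broken_bits(cells, p)}
    while pending:
        p = pending.pop()
        allowed = cells[p] & ~broken_bits(cells, p)
        # Richest available tile using only still-valid connections.
        new = max((t for t in available if t & ~allowed == 0),
                  key=lambda t: bin(t).count("1"))
        removed = cells[p] & ~new
        cells[p] = new
        # Neighbours that pointed at a removed connection are now broken.
        for b, (dx, dy) in DIRS.items():
            if removed >> b & 1:
                q = (p[0] + dx, p[1] + dy)
                if q in cells and broken_bits(cells, q):
                    pending.add(q)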
Well, I have been through many sites teaching how to solve it, but I was wondering how to create it. I am not much interested in the coding aspects, but wanted to know more about the algorithms behind it. For example, when the grid is generated with 10 mines or so, I would use some random function to distribute them across the grid, but then how do I set the numbers associated with them and decide which box is to be opened? I couldn't frame any generic algorithm on how I would go about doing that.
Perhaps something along the lines of:
grid = [n, m] // initialize all cells to 0
for k = 1 to number_of_mines
    get random mine_x and mine_y where grid(mine_x, mine_y) is not a mine
    grid[mine_x, mine_y] = -number_of_mines // negative value = mine
    for x = -1 to 1
        for y = -1 to 1
            if (x, y) != (0, 0) and (mine_x + x, mine_y + y) is inside the grid then
                increment grid[mine_x + x, mine_y + y] by 1

Note: setting the mine cell to -number_of_mines (rather than just -1) keeps it negative even after neighbouring mines increment it.
That's pretty much it...
** EDIT **
Because this algorithm could lead to a board with some mines grouped too closely together, or worse, very dispersed (and thus boring to solve), you can then add extra validation when generating mine_x and mine_y. For example, ensure that at least 3 neighbouring cells are not mines, or perhaps limit the number of mines that are too far from each other, etc.
** UPDATE **
I've taken the liberty of playing a little with JS Bin and came up with a functional Minesweeper game demo. This is simply to demonstrate the algorithm described in this answer. I did not optimize the randomness of the generated mine positions, so some games could be impossible or too easy. Also, there is no validation of how many mines there are in the grid, so you can actually create a 2 by 2 grid with 1000 mines... but that will only lead to an infinite loop :) Enjoy!
You just seed the mines and after that, you traverse every cell and count the neighbouring mines.
Or you set every counter to 0 and, with each seeded mine, increment all neighbouring cells' counters.
If you want to place m mines on N squares and you have access to a random number generator, you just walk through the squares and, for each remaining square, compute (# mines remaining)/(# squares remaining); place a mine if your random number is equal to or below that value. A sketch follows.
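A sketch of that walk in Python (my own illustration; random.random() returns a uniform value in [0, 1)):

import random

def place_mines(n_squares, m_mines):
    # Selection sampling: each square gets a mine with probability
    # (mines remaining) / (squares remaining), which yields exactly
    # m_mines mines, uniformly distributed over the squares.
    mines, remaining = [], m_mines
    for i in range(n_squares):
        if random.random() < remaining / (n_squares - i):
            mines.append(i)
            remaining -= 1
    return mines

print(place_mines(10, 3))  # e.g. [1, 4, 9]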
Now, if you want to label every square with the number of adjacent mines, you can just do it directly:
# Assumes `mines` is a set of (x, y) coordinates of mine positions.
def count(x, y):
    # Number of mines in the 3x3 neighbourhood of (x, y), centre included.
    return sum(1
               for i in (-1, 0, 1)
               for j in (-1, 0, 1)
               if (x + i, y + j) in mines)
or, if you prefer, you can start off with an array of zeros and increment each cell by one in the 3x3 square that has a mine in its center. (It doesn't hurt to number the squares that contain mines.)
This produces a purely random and correctly annotated Minesweeper game. Some random games may not be fun games, however; selecting random-but-fun games is a much more challenging task.