Minimum number of steps to sort 3x3 matrix in a specific way - algorithm

So I started practicing some algorithms and programming before university starts and I ran into this problem:
Given a 3x3 matrix containing the numbers from 0 to 8, find the minimum number of steps required to sort the matrix in the following format:
1 2 3
4 5 6
7 8 0
In one move it is only allowed to pick a cell that is adjacent to the cell which contains the 0 and swap those two cells.
Now, I am really stuck with this one and have no idea how to begin. Any tips and ideas to get me started are appreciated.
This is not homework if anyone thinks that way, I am just trying to exercise and by moving to tougher problems I got stuck. I am not looking for anyone to write the code for me, I just need a point in the right direction because I really want to understand the algorithm behind this. Thank you.

Note: This is actually an AI problem, and not a trivial data structure/algorithm problem.
This problem is called the n-puzzle problem. The example in your question is the 8-puzzle problem.
The way to solve this problem is by trying to shuffle the boxes in a way that each step gets you closer to your final goal. Think of this as a Greedy approach (Best-first search). The best algorithm to use here is the A* algorithm.
We define a state of the game to be the board position, the number of
moves made to reach the board position, and the previous state. First,
insert the initial state (the initial board, 0 moves, and a null
previous state) into a priority queue. Then, delete from the priority
queue the state with the minimum priority, and insert onto the
priority queue all neighboring states (those that can be reached in
one move). Repeat this procedure until the state dequeued is the goal
state. The success of this approach hinges on the choice of priority
function for a state. We consider two priority functions:
Hamming priority function. The number of blocks in the wrong position, plus the number of moves made so far to get to the state. Intutively, a state with a small number of blocks in the wrong position is close to the goal state, and we prefer a state that have been reached using a small number of moves.
Manhattan priority function. The sum of the distances (sum of the vertical and horizontal distance) from the blocks to their goal positions, plus the number of moves made so far to get to the state.
For example, the Hamming and Manhattan priorities of the initial state
below are 5 and 10, respectively.
8 1 3 1 2 3 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
4 2 4 5 6 ---------------------- ----------------------
7 6 5 7 8 1 1 0 0 1 1 0 1 1 2 0 0 2 2 0 3
initial goal Hamming = 5 + 0 Manhattan = 10 + 0
We make a key oberservation: to solve the puzzle from a given state
on the priority queue, the total number of moves we need to make
(including those already made) is at least its priority, using either
the Hamming or Manhattan priority function. (For Hamming priority,
this is true because each block that is out of place must move at
least once to reach its goal position. For Manhattan priority, this is
true because each block must move its Manhattan distance from its goal
position. Note that we do not count the blank tile when computing the
Hamming or Manhattan priorities.)
Consequently, as soon as we dequeue a state, we have not only
discovered a sequence of moves from the initial board to the board
associated with the state, but one that makes the fewest number of
moves.
(Source)

Related

Find Minimum Time to Occupy Grid

Problem:
Consider a patient suffering from skin infection and germs are spreading all over rapidly. Assume that skin surface is scaled as a rectangular grid of size MxN and cells are marked by 0 and 1 where 0 represents non affected region on skin and 1 represents affected region on skin. Germs can move from one cell of grid to another in 4 possible directions (right, left, up, down) but can move to only one cell at a time in one direction and affect that cell in 1 sec. Doctor currently who is treating the patient see's status and wants to know the time left for him to save him before the germs spread all over the skin and patient dies. Can you help to estimate the minimum time taken for the germs to completely occupy skin surface?
Input: : Current status of skin. (A matrix of size MxN with 1's and 0's which represents affected and non affected area)
Output: : Min time in sec to cover all over the grid.
Example:
Input:
[1 1 0 0 1]
[0 1 1 0 0]
[0 0 0 0 1]
[0 1 0 0 0]
Output: 2 seconds
Explanation:
After 1 sec from input, matrix could be as below
[1 1 1 0 1]
[1 1 1 0 1]
[0 1 1 0 1]
[0 1 1 0 1]
In next sec, matrix is completely filled by 1's
I will not present a detailed solution here, but some thoughts that hopefully may help you to write your own program.
First step is to determine the kind of algorithm to implement. The optimal way would be to find a simple and fast ad hoc solution for this problem. In the absence of such a solution, for this kind of problems, classical candidates are DFS, BFS, A* ...
As the goal is to find the shortest solution, it seems natural to consider BFS first, as once BFS finds a solution, we know that it is the shortest ones and we can stop the search. However, then, we have to consider avoiding inflation of the nodes, as it would lead not only to a huge calculation time, but also a huge memory.
First idea to avoid node inflation is to consider that some 1 cells can only be expended in one another cell. In the posted diagram, for example the cell (0, 0) (top left) can only be expended to cell (1, 0). Then, after this expansion, cell (1, 1) can only move to cell (2, 1). Therefore, we know it would be suboptimal to move cell (1,1) to cell (1,0). Therefore: move such cells first.
In a similar way, once an infected cell is surrounded by other infected cells only, it is no longer necessary to consider it for next moves.
At the end, it would be convenient to have a list of infected cells, together with the number of non-infected cells that each such cell can move to.
Another idea to limit the number of nodes is to detect duplicates, as it is likely here that many of them will exist. For that, we have to define a kind of hashing. The used hash function does not need to be 100% efficient, but need to be calculated rapidly, and if possible in a recursive manner. If we obtain B diagram from A diagram by adding a 1-cell at position (i, j), then I propose something like
H(B) = H(A)^f(i, j)
f(i, j) = a*(1024*i+j)%b
Here, I used the fact that N and M are less than 1000.
Each time a new diagram is consider, we have to calculate the corresponding H value and check if it exists already in the set of past diagrams.
I'm not sure how far I would get with this in an interview situation. After some thought, rather than considering solutions that store more than one full board state, I would rather consider a greedy priority queue since a strong heuristic for the next zero-cell candidates to fill seems to be:
(1) healthy cells that have the least neighbouring infected cells (but at least one, of course),
e.g., choose A over B
1 1 B 0 1
0 1 1 0 0
0 0 A 0 1
0 1 0 0 0
and (2) break ties by choosing first the healthy cells that when infected will block the least infected cells.
e.g., choose A over B
1 1 1 0 1
1 B 1 0 A
0 0 0 0 1
0 1 0 0 0
An interesting observation is that any healthy cell destination can technically be reached in time Manhattan-distance from the nearest infected cell, where the cell leading such a "crawl" continually chooses the single move that brings us closer to the destination. We know that at the same time, though, this same infected-cell "snake" produces new "crawlers" that could reach any equally far or closer neighbours. This makes me wonder if there may be a more efficient way to determine the lower-bound, based on counts of the farthest healthy cells.
This is a variant of the multi-agent pathfinding problem (MAPF). There is a ton of recent work on this topic, but earlier modern work is a good starting point for finding optimal solutions to this problem - for instance the operator decomposition approach.
To do this you would order the agents (germs) 1..k. Then, you would start a search where you generate all possible first moves for germ 1, followed by all possible first moves for germ 2, and so on, where moves for an agent are to stay in place, or to spread to an adjacent unoccupied location. With 4 possible actions for each germ, there are up to 4^k possible actions between complete states. (Partial states occur when you haven't yet assigned actions to all k agents.) The number of actions is exponential, meaning you may run up against resource constraints (time or space) fairly quickly. But, there are only 2^(MxN) states possible. (Since agents don't go away, it's actually 2^(MxN-i) where i is the number of initial germs.)
Every time all (k) germs have considered a possible action, you have a new complete state. (And k then increases for the next iteration.) The minimum time left comes from the shallowest complete state which has the grid filled. A bit of brute-force computation will find the shortest solution. (Quite a bit in the case of large grids.)
You could use a BFS to find the first state that is completely filled. But, A* might do much better. As a heuristic, you could consider that all adjacent locations of all cells were filled in each step, and then compute the number of steps required to fill the grid under that model. That gives a lower bound on the time required to fill the full grid.
But, there are many more optimizations. The reason to do operator decomposition is that you could order the moves to take the best moves first and not consider the weaker possibilities (eg all germs don't spread). You could also use a partial-expansion approach (EPEA*) to avoid generating a lot of clearly suboptimal policies for the germs.
If I was asking this as an interview questions I might be looking to see someone formulate the problem (what are actions, what are states), come up with the lower bound on the solution (every germ expands to every adjacent cell), come up with an algorithm, and perhaps analyze how hard the problem is, in order of increasing difficulty.

Unable to solve a codechef puzzle?

I am trying to solve this apparently beginner level problem. But not even able to come up with a brute force approach.
Here is the problem statement:
Johnny has some difficulty memorizing the small prime numbers. So, his computer science teacher has asked him to play with the following puzzle game frequently.
The puzzle is a 3x3 board consisting of numbers from 1 to 9. The objective of the puzzle is to swap the tiles until the following final state is reached:
1 2 3
4 5 6
7 8 9
At each step, Johnny may swap two adjacent tiles if their sum is a prime number. Two tiles are considered adjacent if they have a common edge.
Help Johnny to find the shortest number of steps needed to reach the goal state.
Input
The first line contains t, the number of test cases (about 50). Then t test cases follow. Each test case consists of a 3x3 table describing a puzzle which Johnny would like to solve.
The input data for successive test cases is separated by a blank line.
Output
For each test case print a single line containing the shortest number of steps needed to solve the corresponding puzzle. If there is no way to reach the final state, print the number -1.
Example
Input:
2
7 3 2
4 1 5
6 8 9
9 8 5
2 4 1
3 7 6
Output:
6
-1
Here is a start on Python for finding the distance to ALL boards that are reachable from the start.
start = '123456789'
needed_moves = {start: 0}
work_queue = [start]
while (len(work_queue)):
board = work_queue.pop(0)
for next_board in next_move(board):
if next_board not in needed_moves:
needed_moves[next_board] = needed_moves[board] + 1
work_queue.append(next_board)
This is a standard pattern for a breadth first search. Just supply a next_move function and some glue for the tests cases and you're good.
For a depth first search, just switch the queue for a stack. Which really means do work_stack.pop() to pop the last element rather than work_queue.pop(0) to get the first one.

What is the largest value one can get in game 2048 without new tiles appear

This is a simplified version of the famous game 2048. Given a 4x4 grids with some values chosen from {0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}. A value of 0 indicates that the position in the grid is unoccupied. What is the largest tile that can be produced by any (possible length-zero) sequence of moves (up, down, left,right) from the given game setup, if no new tiles were to be introduced to the grid?
For example, given
2 64 4 32
8 16 8 4
4 32 4 0
2 2 0 0
the answer is 128:
The problem can probably be solved by those AI algorithms (minmax for example), however, I guess that will definitely be an overkill. Is there any simpler algorithm to solve this?
The simplest algorithm you can do in this case is a graph-based search. Every node of the graph represents a current state i of the grid. Every node has 4 children, representing state i+1 and therefore each edge represents a move (up, down, left, right).
So assume you start from the given state, node 1. Then you have to apply the 4 moves and simulate them. That is, node 1 has 4 children, node 2(with left movement), node 3(with right movement) and so on. For each movement you have simulate what that movement does. Store in each node the current state and the value of the maximum tile.
The algorithm would finish whenever there are no more possible movements. So looking for the maximum value of all the leaf nodes would do it.
The pseudo-code would be something like:
input: current state s_current
output: max tile value M
-----
queue <- s_current
while !queue.empty do
s = queue.pop
for each m in move // move contains the 4 moves.
s' = simulate(s,m)
if s' != 0 //so the move was possible
queue.add(s')
else
mark s' as leaf node
M = max_tile(s')
if M > M_current update M
DISCLAIMER: I have not checked for errors in the pseucode, probably some minor steps are missing.
Note that depending on queue datastructure, you are actually implementing a Breath First Search or Depth First Searh, which are the simplest graph algorithms you can implement. Actually, the most difficult part I see here is the simulate() function, which is the one that actually implements the logic of your game. I guess there are much better algorithms, but this is the most simple thing (and actually is not that bad :) )

Is there better way than a Dijkstra algorithm for finding fastest path that do not exceed specified cost

I'm having a problem with finding fastest path that do not exceed specified cost.
There's similar question to this one, however there's a big difference between them.
Here, the only records that can appear in the data are the ones, that lead from a lower point to a higher point (eg. 1 -> 3 might appear but 3 -> 1 might not) (see below). Without knowing that, I'd use Dijkstra. That additional information, might let it do in a time faster than Dijkstras algorithm.
What do you think about it?
Let's say I've got specified maximum cost, and 4 records.
// specified cost
10
// end point
5
//(start point) (finish point) (time) (cost)
2 5 50 5
3 5 20 9
1 2 30 5
1 3 30 7
// all the records lead from a lower to a higher point no.
I have to decide, whether It's possible to get from point (1) to (5) (its impossible when theres no path that costs <= than we've got or when theres no connection between 1-5) and if so, what would be the fastest way to get in there.
The output for such data would be:
80 // fastest time
3 1 // number of points that (1 -> 2) -> (2 -> 5)
Keep in mind, that if there's a record saying you can move 1->2
1 2 30 5
It doesnt allow you to move 2<-1.
For each node at depth n, the minimum cost of path to it is n/2 * (minimal first edge at the path + minimal edge connecting to the node) - sum of arithmetic series. If this computation exceeds the required maximum, no need to check the nodes that follow. Cut these nodes off and apply Dijkstra on the rest.

Finding good heuristic for A* search

I'm trying to find the optimal solution for a little puzzle game called Twiddle (an applet with the game can be found here). The game has a 3x3 matrix with the number from 1 to 9. The goal is to bring the numbers in the correct order using the minimum amount of moves. In each move you can rotate a 2x2 square either clockwise or counterclockwise.
I.e. if you have this state
6 3 9
8 7 5
1 2 4
and you rotate the upper left 2x2 square clockwise you get
8 6 9
7 3 5
1 2 4
I'm using a A* search to find the optimal solution. My f() is simply the number of rotations needed. My heuristic function already leads to the optimal solution (if I modify it, see the notice a t the end) but I don't think it's the best one you can find. My current heuristic takes each corner, looks at the number at the corner and calculates the manhatten distance to the position this number will have in the solved state (which gives me the number of rotation needed to bring the number to this postion) and sums all these values. I.e. You take the above example:
6 3 9
8 7 5
1 2 4
and this end state
1 2 3
4 5 6
7 8 9
then the heuristic does the following
6 is currently at index 0 and should by at index 5: 3 rotations needed
9 is currently at index 2 and should by at index 8: 2 rotations needed
1 is currently at index 6 and should by at index 0: 2 rotations needed
4 is currently at index 8 and should by at index 3: 3 rotations needed
h = 3 + 2 + 2 + 3 = 10
Additionally, if h is 0, but the state is not completely ordered, than h = 1.
But there is the problem, that you rotate 4 elements at once. So there a rare cases where you can do two (ore more) of theses estimated rotations in one move. This means theses heuristic overestimates the distance to the solution.
My current workaround is, to simply excluded one of the corners from the calculation which solves this problem at least for my test-cases. I've done no research if really solves the problem or if this heuristic still overestimates in some edge-cases.
So my question is: What is the best heuristic you can come up with?
(Disclaimer: This is for a university project, so this is a bit of homework. But I'm free to use any resource if can come up with, so it's okay to ask you guys. Also I will credit Stackoverflow for helping me ;) )
Simplicity is often most effective. Consider the nine digits (in the rows-first order) as forming a single integer. The solution is represented by the smallest possible integer i(g) = 123456789. Hence I suggest the following heuristic h(s) = i(s) - i(g). For your example, h(s) = 639875124 - 123456789.
You can get an admissible (i.e., not overestimating) heuristic from your approach by taking all numbers into account, and dividing by 4 and rounding up to the next integer.
To improve the heuristic, you could look at pairs of numbers. If e.g. in the top left the numbers 1 and 2 are swapped, you need at least 3 rotations to fix them both up, which is a better value than 1+1 from considering them separately. In the end, you still need to divide by 4. You can pair up numbers arbitrarily, or even try all pairs and find the best division into pairs.
All elements should be taken into account when calculating distance, not just corner elements. Imagine that all corner elements 1, 3, 7, 9 are at their home, but all other are not.
It could be argued that those elements that are neighbors in the final state should tend to become closer during each step, so neighboring distance can also be part of heuristic, but probably with weaker influence than distance of elements to their final state.

Resources