Alpha-beta pruning with ties causes avoidable loss

Alpha-beta pruning with ties causes avoidable loss - algorithm

In developing software to play some simple strategy games, I am using standard techniques for searching and evaluation, including alpha-beta pruning. But I have encountered an unexpected problem which results in one player choosing an ultimately losing move instead of one that ties the game.
Imagine this scenario: The MAX player has at least two moves to evaluate at depth D. One results in a tie, so its value is 0 and alpha is set to 0. In searching a second MAX move, with alpha = 0, at depth D+1, MIN has at least two replies to evaluate. One of them is a forced win for MIN. An earlier MIN reply results in a tie, so its value is 0, and beta is set to 0. This triggers an alpha-beta cutoff for MIN because beta 0 <= alpha 0. So the later forced win for MIN is never seen, and that second MAX move gets a value of 0. Thus it may be chosen instead of the first MAX move (a tie), resulting in an avoidable loss for MAX.
Here's a more concrete example: A tic-tac-toe board can be represented as…
0 1 2
3 4 5
6 7 8
Let's say that the MIN player chooses square 8. The MAX player has eight possible replies. The first four replies examined – squares 0, 1, 2, and 3 – are all evaluated to eventually result in a loss for MAX. Square 4 is evaluated as a tie, with a value of 0, and thus alpha is set to 0. Square 5 is next evaluated, and as part of that process, the alpha value of 0 is passed down, and the possible replies by MIN are evaluated one at a time. Square 0 is evaluated for MIN as a tie, and thus beta is set to 0. The next possible reply by MIN, square 1, is also a tie. At this point, both beta and alpha are 0, triggering an alpha-beta pruning of the remaining MIN replies. These include squares 4, 6, and 7, all of which allow MIN to win. But these are never seen, because of the alpha-beta pruning. A value of 0 is passed up the search tree. Consequently, MAX incorrectly sees square 5 as a tie just as good as square 4, even though playing at square 5 would likely result in a loss.
Does anyone see a problem with this analysis? Isn't this a problem with alpha-beta pruning?

Related

Find Minimum Time to Occupy Grid

Problem:
Consider a patient suffering from skin infection and germs are spreading all over rapidly. Assume that skin surface is scaled as a rectangular grid of size MxN and cells are marked by 0 and 1 where 0 represents non affected region on skin and 1 represents affected region on skin. Germs can move from one cell of grid to another in 4 possible directions (right, left, up, down) but can move to only one cell at a time in one direction and affect that cell in 1 sec. Doctor currently who is treating the patient see's status and wants to know the time left for him to save him before the germs spread all over the skin and patient dies. Can you help to estimate the minimum time taken for the germs to completely occupy skin surface?
Input: : Current status of skin. (A matrix of size MxN with 1's and 0's which represents affected and non affected area)
Output: : Min time in sec to cover all over the grid.
Example:
Input:
[1 1 0 0 1]
[0 1 1 0 0]
[0 0 0 0 1]
[0 1 0 0 0]
Output: 2 seconds
Explanation:
After 1 sec from input, matrix could be as below
[1 1 1 0 1]
[1 1 1 0 1]
[0 1 1 0 1]
[0 1 1 0 1]
In next sec, matrix is completely filled by 1's

I will not present a detailed solution here, but some thoughts that hopefully may help you to write your own program.
First step is to determine the kind of algorithm to implement. The optimal way would be to find a simple and fast ad hoc solution for this problem. In the absence of such a solution, for this kind of problems, classical candidates are DFS, BFS, A* ...
As the goal is to find the shortest solution, it seems natural to consider BFS first, as once BFS finds a solution, we know that it is the shortest ones and we can stop the search. However, then, we have to consider avoiding inflation of the nodes, as it would lead not only to a huge calculation time, but also a huge memory.
First idea to avoid node inflation is to consider that some 1 cells can only be expended in one another cell. In the posted diagram, for example the cell (0, 0) (top left) can only be expended to cell (1, 0). Then, after this expansion, cell (1, 1) can only move to cell (2, 1). Therefore, we know it would be suboptimal to move cell (1,1) to cell (1,0). Therefore: move such cells first.
In a similar way, once an infected cell is surrounded by other infected cells only, it is no longer necessary to consider it for next moves.
At the end, it would be convenient to have a list of infected cells, together with the number of non-infected cells that each such cell can move to.
Another idea to limit the number of nodes is to detect duplicates, as it is likely here that many of them will exist. For that, we have to define a kind of hashing. The used hash function does not need to be 100% efficient, but need to be calculated rapidly, and if possible in a recursive manner. If we obtain B diagram from A diagram by adding a 1-cell at position (i, j), then I propose something like
H(B) = H(A)^f(i, j)
f(i, j) = a*(1024*i+j)%b
Here, I used the fact that N and M are less than 1000.
Each time a new diagram is consider, we have to calculate the corresponding H value and check if it exists already in the set of past diagrams.

I'm not sure how far I would get with this in an interview situation. After some thought, rather than considering solutions that store more than one full board state, I would rather consider a greedy priority queue since a strong heuristic for the next zero-cell candidates to fill seems to be:
(1) healthy cells that have the least neighbouring infected cells (but at least one, of course),
e.g., choose A over B
1 1 B 0 1
0 1 1 0 0
0 0 A 0 1
0 1 0 0 0
and (2) break ties by choosing first the healthy cells that when infected will block the least infected cells.
e.g., choose A over B
1 1 1 0 1
1 B 1 0 A
0 0 0 0 1
0 1 0 0 0
An interesting observation is that any healthy cell destination can technically be reached in time Manhattan-distance from the nearest infected cell, where the cell leading such a "crawl" continually chooses the single move that brings us closer to the destination. We know that at the same time, though, this same infected-cell "snake" produces new "crawlers" that could reach any equally far or closer neighbours. This makes me wonder if there may be a more efficient way to determine the lower-bound, based on counts of the farthest healthy cells.

This is a variant of the multi-agent pathfinding problem (MAPF). There is a ton of recent work on this topic, but earlier modern work is a good starting point for finding optimal solutions to this problem - for instance the operator decomposition approach.
To do this you would order the agents (germs) 1..k. Then, you would start a search where you generate all possible first moves for germ 1, followed by all possible first moves for germ 2, and so on, where moves for an agent are to stay in place, or to spread to an adjacent unoccupied location. With 4 possible actions for each germ, there are up to 4^k possible actions between complete states. (Partial states occur when you haven't yet assigned actions to all k agents.) The number of actions is exponential, meaning you may run up against resource constraints (time or space) fairly quickly. But, there are only 2^(MxN) states possible. (Since agents don't go away, it's actually 2^(MxN-i) where i is the number of initial germs.)
Every time all (k) germs have considered a possible action, you have a new complete state. (And k then increases for the next iteration.) The minimum time left comes from the shallowest complete state which has the grid filled. A bit of brute-force computation will find the shortest solution. (Quite a bit in the case of large grids.)
You could use a BFS to find the first state that is completely filled. But, A* might do much better. As a heuristic, you could consider that all adjacent locations of all cells were filled in each step, and then compute the number of steps required to fill the grid under that model. That gives a lower bound on the time required to fill the full grid.
But, there are many more optimizations. The reason to do operator decomposition is that you could order the moves to take the best moves first and not consider the weaker possibilities (eg all germs don't spread). You could also use a partial-expansion approach (EPEA*) to avoid generating a lot of clearly suboptimal policies for the germs.
If I was asking this as an interview questions I might be looking to see someone formulate the problem (what are actions, what are states), come up with the lower bound on the solution (every germ expands to every adjacent cell), come up with an algorithm, and perhaps analyze how hard the problem is, in order of increasing difficulty.

Minimum number of steps to sort 3x3 matrix in a specific way

So I started practicing some algorithms and programming before university starts and I ran into this problem:
Given a 3x3 matrix containing the numbers from 0 to 8, find the minimum number of steps required to sort the matrix in the following format:
1 2 3
4 5 6
7 8 0
In one move it is only allowed to pick a cell that is adjacent to the cell which contains the 0 and swap those two cells.
Now, I am really stuck with this one and have no idea how to begin. Any tips and ideas to get me started are appreciated.
This is not homework if anyone thinks that way, I am just trying to exercise and by moving to tougher problems I got stuck. I am not looking for anyone to write the code for me, I just need a point in the right direction because I really want to understand the algorithm behind this. Thank you.

Note: This is actually an AI problem, and not a trivial data structure/algorithm problem.
This problem is called the n-puzzle problem. The example in your question is the 8-puzzle problem.
The way to solve this problem is by trying to shuffle the boxes in a way that each step gets you closer to your final goal. Think of this as a Greedy approach (Best-first search). The best algorithm to use here is the A* algorithm.
We define a state of the game to be the board position, the number of
moves made to reach the board position, and the previous state. First,
insert the initial state (the initial board, 0 moves, and a null
previous state) into a priority queue. Then, delete from the priority
queue the state with the minimum priority, and insert onto the
priority queue all neighboring states (those that can be reached in
one move). Repeat this procedure until the state dequeued is the goal
state. The success of this approach hinges on the choice of priority
function for a state. We consider two priority functions:
Hamming priority function. The number of blocks in the wrong position, plus the number of moves made so far to get to the state. Intutively, a state with a small number of blocks in the wrong position is close to the goal state, and we prefer a state that have been reached using a small number of moves.
Manhattan priority function. The sum of the distances (sum of the vertical and horizontal distance) from the blocks to their goal positions, plus the number of moves made so far to get to the state.
For example, the Hamming and Manhattan priorities of the initial state
below are 5 and 10, respectively.
8 1 3 1 2 3 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
4 2 4 5 6 ---------------------- ----------------------
7 6 5 7 8 1 1 0 0 1 1 0 1 1 2 0 0 2 2 0 3
initial goal Hamming = 5 + 0 Manhattan = 10 + 0
We make a key oberservation: to solve the puzzle from a given state
on the priority queue, the total number of moves we need to make
(including those already made) is at least its priority, using either
the Hamming or Manhattan priority function. (For Hamming priority,
this is true because each block that is out of place must move at
least once to reach its goal position. For Manhattan priority, this is
true because each block must move its Manhattan distance from its goal
position. Note that we do not count the blank tile when computing the
Hamming or Manhattan priorities.)
Consequently, as soon as we dequeue a state, we have not only
discovered a sequence of moves from the initial board to the board
associated with the state, but one that makes the fewest number of
moves.
(Source)

What is the largest value one can get in game 2048 without new tiles appear

This is a simplified version of the famous game 2048. Given a 4x4 grids with some values chosen from {0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}. A value of 0 indicates that the position in the grid is unoccupied. What is the largest tile that can be produced by any (possible length-zero) sequence of moves (up, down, left,right) from the given game setup, if no new tiles were to be introduced to the grid?
For example, given
2 64 4 32
8 16 8 4
4 32 4 0
2 2 0 0
the answer is 128:
The problem can probably be solved by those AI algorithms (minmax for example), however, I guess that will definitely be an overkill. Is there any simpler algorithm to solve this?

The simplest algorithm you can do in this case is a graph-based search. Every node of the graph represents a current state i of the grid. Every node has 4 children, representing state i+1 and therefore each edge represents a move (up, down, left, right).
So assume you start from the given state, node 1. Then you have to apply the 4 moves and simulate them. That is, node 1 has 4 children, node 2(with left movement), node 3(with right movement) and so on. For each movement you have simulate what that movement does. Store in each node the current state and the value of the maximum tile.
The algorithm would finish whenever there are no more possible movements. So looking for the maximum value of all the leaf nodes would do it.
The pseudo-code would be something like:
input: current state s_current
output: max tile value M
-----
queue <- s_current
while !queue.empty do
s = queue.pop
for each m in move // move contains the 4 moves.
s' = simulate(s,m)
if s' != 0 //so the move was possible
queue.add(s')
else
mark s' as leaf node
M = max_tile(s')
if M > M_current update M
DISCLAIMER: I have not checked for errors in the pseucode, probably some minor steps are missing.
Note that depending on queue datastructure, you are actually implementing a Breath First Search or Depth First Searh, which are the simplest graph algorithms you can implement. Actually, the most difficult part I see here is the simulate() function, which is the one that actually implements the logic of your game. I guess there are much better algorithms, but this is the most simple thing (and actually is not that bad :) )

picking element game

This is a simple game:
There is a set, A={a1,...,an}, the opponents can choose one of the first or last elements of set, and at the end the one who collect bigger numbers wins. Now say each participants dose his best, what I need to do is write a Dynamic algorithm to estimate their score.
any idea or clue is truly appreciated.

Here's a hint: to write a dynamic programming algorithm, you typically need a recurrence. Given
A={a1,...,an}
The recurrence would look something like this
f(A)= max( f({a1,...,a_n-1}) , f({a2,...,a_n}) )

Actually the recurrence relation given by dfb may not lead to right answer
as it is not leading to the right sub-optimal structure !
Assume the Player A begins the game :
the structure of problem for him is [a1,a2,...an]
After choosing an element , either a1 or an , its player B's turn to play , and then after that move it is player A's move.
So after two moves , Player A's turn will come again and this will be the right sub-problem for him .The right recurrence relation will be
Suppose from i to j elements are left :
A(i,j)= max(min( A(i+1,j-1),A(i+2,j)+a[i] ), min(A(i,j-2),A(i+1,j-1))+a[j])
Refer to the following link :
http://people.csail.mit.edu/bdean/6.046/dp/

EXAMPLE CODE
Here is Python code to compute the optimal score for first and second players.
A=[3,1,1,3,1,1,3]
cache={}
def go(a,b):
"""Find greatest difference between player 1 coins and player 2 coins when choosing from A[a:b]"""
if a==b: return 0 # no stacks left
if a==b-1: return A[a] # only one stack left
key=a,b
if key in cache:
return cache[key]
v=A[a]-go(a+1,b) # taking first stack
v=max(v,A[b-1]-go(a,b-1)) # taking last stack
cache[key]=v
return v
v = go(0,len(A))
n=sum(A)
print (n+v)/2,(n-v)/2
COUNTEREXAMPLE
Note that the code includes a counter example to one of the other answers to this question.
Consider the case [3,1,1,3,1,1,3].
By symmetry, the first players move always leaves the pattern [1,1,3,1,1,3].
For this the sum of even elements is 1+3+1=5, while the sum of odd is 1+1+3=5, so the argument is that from this position the second player will always win 5, and the first player will always win 5, so the first player will win (as he gets 5 in addition to the 3 from the first move).
However, this logic is flawed because the second player can actually get more.
First player takes 3, leaves [1,1,3,1,1,3] (only choice by symmetry)
Second player takes 3, leaves [1,1,3,1,1]
First player takes 1, leaves [1,3,1,1] (only choice by symmetry)
Second player takes 1, leaves [1,3,1]
First player takes 1, leaves [3,1] (only choice by symmetry)
Second player takes 3, leaves [1]
First player takes 1
So overall first player gets 3+1+1+1=6, while second gets 3+1+3=7 and second player wins.
The flaw is that although it is true that the second player can play such that they will win all even or all odd positions, this is not optimal play and they can actually do better than this in some cases.

Actually you do not need dynamic programming, because it is easy to find an explicit solution for the game above.
Case n is even or n = 1.
The second player to move will always lose.
Case n odd and n > 1.
The second player has a winning strategy iff one of the following 2 scenarios happen:
The elements with even index have bigger sum than all the elements with odd index
All odd elements except the last have bigger sum than all the remainings AND
All odd elements except the first have bigger sum than all the remainings.
Proof sketch:
Case n is even or n = 1: Let Sodd and Seven be the sum of all elements with even/odd indexes. Assume that Sodd > Seven, same argument hold otherwise. The first player has a winning strategy, since he can play in such a way that he will get all odd indexed items.
The case n is odd and n > 1 can also be resolved directly. In fact the first player has two options, he can get the first or last element of the set. Of the remaining elements, partition them the two subsets with odd and even indexes; by the argument above, the second player is going to take the subset with largest sum. If you expand the tree game you will end up with the statement above.

Finding good heuristic for A* search

I'm trying to find the optimal solution for a little puzzle game called Twiddle (an applet with the game can be found here). The game has a 3x3 matrix with the number from 1 to 9. The goal is to bring the numbers in the correct order using the minimum amount of moves. In each move you can rotate a 2x2 square either clockwise or counterclockwise.
I.e. if you have this state
6 3 9
8 7 5
1 2 4
and you rotate the upper left 2x2 square clockwise you get
8 6 9
7 3 5
1 2 4
I'm using a A* search to find the optimal solution. My f() is simply the number of rotations needed. My heuristic function already leads to the optimal solution (if I modify it, see the notice a t the end) but I don't think it's the best one you can find. My current heuristic takes each corner, looks at the number at the corner and calculates the manhatten distance to the position this number will have in the solved state (which gives me the number of rotation needed to bring the number to this postion) and sums all these values. I.e. You take the above example:
6 3 9
8 7 5
1 2 4
and this end state
1 2 3
4 5 6
7 8 9
then the heuristic does the following
6 is currently at index 0 and should by at index 5: 3 rotations needed
9 is currently at index 2 and should by at index 8: 2 rotations needed
1 is currently at index 6 and should by at index 0: 2 rotations needed
4 is currently at index 8 and should by at index 3: 3 rotations needed
h = 3 + 2 + 2 + 3 = 10
Additionally, if h is 0, but the state is not completely ordered, than h = 1.
But there is the problem, that you rotate 4 elements at once. So there a rare cases where you can do two (ore more) of theses estimated rotations in one move. This means theses heuristic overestimates the distance to the solution.
My current workaround is, to simply excluded one of the corners from the calculation which solves this problem at least for my test-cases. I've done no research if really solves the problem or if this heuristic still overestimates in some edge-cases.
So my question is: What is the best heuristic you can come up with?
(Disclaimer: This is for a university project, so this is a bit of homework. But I'm free to use any resource if can come up with, so it's okay to ask you guys. Also I will credit Stackoverflow for helping me ;) )

Simplicity is often most effective. Consider the nine digits (in the rows-first order) as forming a single integer. The solution is represented by the smallest possible integer i(g) = 123456789. Hence I suggest the following heuristic h(s) = i(s) - i(g). For your example, h(s) = 639875124 - 123456789.

You can get an admissible (i.e., not overestimating) heuristic from your approach by taking all numbers into account, and dividing by 4 and rounding up to the next integer.
To improve the heuristic, you could look at pairs of numbers. If e.g. in the top left the numbers 1 and 2 are swapped, you need at least 3 rotations to fix them both up, which is a better value than 1+1 from considering them separately. In the end, you still need to divide by 4. You can pair up numbers arbitrarily, or even try all pairs and find the best division into pairs.

All elements should be taken into account when calculating distance, not just corner elements. Imagine that all corner elements 1, 3, 7, 9 are at their home, but all other are not.
It could be argued that those elements that are neighbors in the final state should tend to become closer during each step, so neighboring distance can also be part of heuristic, but probably with weaker influence than distance of elements to their final state.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio