Modify Minimax to Alpha-Beta pruning pseudo code - algorithm

I am learning the Alpha-Beta pseudo code and I want to write the simplest possible pseudo code for Alpha-Beta pruning.
I have written the pseudo code for Minimax:
function minimax(node, depth)
    if node is a terminal node or depth == 0
        return the heuristic value of node
    else
        best = -99999
        for child in node
            best = max(best, -minimax(child, depth - 1))
        return best
However, I don't know how to modify it to do alpha-beta pruning. Can anyone help?

In Alpha-Beta, you keep track of your guaranteed score for the current position. You can stop immediately once you find a move that is better than the score the opponent has already guaranteed in the previous position (one move earlier).
Technically, each side keeps track of its lower-bound score (alpha), and you also have access to the opponent's lower-bound score (beta).
The following pseudo-code is not tested, but here is the idea:
function alphabeta(node, depth, alpha, beta)
    if node is a terminal node or depth == 0
        return the heuristic value of node
    else
        best = -99999
        for child in node
            best = max(best, -alphabeta(child, depth - 1, -beta, -alpha))
            if best >= beta
                return best
            if best > alpha
                alpha = best
        return best
At the start of the search, you can set alpha to -INFINITY and beta to +INFINITY. Strictly speaking, the sketched algorithm is not the classic two-sided alpha-beta but Negamax. The two compute the same result; Negamax is just a more compact formulation, so this is an implementation detail.
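For reference, here is a minimal runnable Python sketch of the Negamax form above (my own illustration, not from the original post); the node interface with children() and evaluate() is an assumption, where evaluate() scores the position from the point of view of the side to move:

import math

def alphabeta(node, depth, alpha, beta):
    children = node.children()
    if depth == 0 or not children:
        return node.evaluate()  # score from the point of view of the side to move
    best = -math.inf
    for child in children:
        best = max(best, -alphabeta(child, depth - 1, -beta, -alpha))
        if best >= beta:        # opponent already has a better option elsewhere: cut off
            return best
        if best > alpha:
            alpha = best
    return best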
Note that in Alpha-Beta, move ordering is crucial. If, most of the time, you start with the best move, or at least a very good move, you should see a huge improvement over Minimax.
An additional optimization is to start with a restricted alpha-beta window (rather than -INFINITY to +INFINITY). However, if your assumption about the window turns out to be wrong, you have to restart the search with a wider window.
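A sketch of that idea, assuming the alphabeta() function from the sketch above and an illustrative window size; if the score falls outside the window, the search is simply repeated with a full window:

import math

def aspiration_search(root, depth, guess, window=50):
    # Search a narrow window around the previous iteration's score.
    alpha, beta = guess - window, guess + window
    score = alphabeta(root, depth, alpha, beta)
    if score <= alpha or score >= beta:
        # The assumption was wrong (fail-low or fail-high): re-search with a full window.
        score = alphabeta(root, depth, -math.inf, math.inf)
    return score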

Related

A star algorithm: using Heuristic value to act as Tie-breaker where nodes have identical F-values

Background: I am currently working on an 8-puzzle implementation of the original A Star algorithm and comparing it with a slightly modified algorithm which intends to improve node expansion (using additional information; of course A Star in an equally informed unidirectional search has been proven optimal). The Open List of nodes is currently ordered by F value (G-Cost + H-Cost).
So in implementing this in Java I have set up a comparator which orders the List by their F-Values in ascending order.
@Override
public int compare(PNode o1, PNode o2) {
    return Integer.compare(o1.costF, o2.costF);
}
Question:
Is it permitted under A-Star to implement a further tie-breaker using the heuristic cost (H-Value) to break deadlocks in the F-Value (where nodes in the Open List share the same F-Value, they are instead ordered by their H-Cost in ascending order)? The reason I am confused about this is that the actual A-Star pseudocode only sorts the Open List by the F-Value.
Extending the same code above, the implementation will be:
@Override
public int compare(PNode o1, PNode o2) {
    if (o1.costF == o2.costF) {
        return Integer.compare(o1.costH, o2.costH);
    }
    return Integer.compare(o1.costF, o2.costF);
}
If this is permitted, are there any reasons why I should be wary of doing this? Logically I appreciate that ordering the Open List will be more costly. But from my experiments the difference does not seem significant enough to prevent me from using this.
Thanks so much everyone~
Yes it is allowed, pretty much for the reasons stated in mcdowella's answer.
However, I'd like to add that it is often actually a very good idea, and much more beneficial than implied in that answer. This kind of tie-breaker can result in much more significant gains in performance than only finding the goal slightly earlier. See, for example, this page, which visualizes how A* still explores a rather massive area without the tie-breaker, and only a very narrow band along the optimal path with the tie-breaker.
Intuitively, you can understand that it is so effective by thinking of the different levels of "reliability" of G costs versus H costs. G costs are very reliable, they are ground truth, they are precisely the cost you have really already had to pay to reach a node. H costs are much less reliable, they are heuristics, they can be wildly wrong. By implementing the tie-breaker you propose, you are essentially saying "whenever two nodes have the same value for F = G + H, I prefer those with greater G and smaller H over those with smaller G and greater H". This is wise because, when the more reliable G component of F dominates the less reliable H component, F itself will also be more reliable.
Another major advantage is, as described on the page I linked to, that this tie-breaker can avoid exploring large portions of lots of different paths that are all equal and all optimal.
I think this is OK for two reasons
1) You are only changing the behaviour in the case of ties, so all you are doing is selecting one possible execution path from a larger set of execution paths which are possible with the original version.
2) You preserve the property that if you retrieve the goal node from the Open List, every other node in the Open List has G-Cost + H-Cost at least as expensive as that of the node you have just retrieved, and so must lead to a path to the goal node at least as expensive as the node you have just retrieved.
By favoring nodes with low heuristic cost in the case of ties, you are favoring any goal nodes in the case of ties, so I guess you might retrieve the goal node slightly earlier and so finish slightly earlier.
The heuristic can be changed to guarantee that fewer nodes have equal F values. From this article:
The quick hack to work around this problem is to either adjust the g or h values. The tie breaker needs to be deterministic with respect to the vertex (i.e., it shouldn’t be a random number), and it needs to make the f values differ. Since A* sorts by f value, making them different means only one of the “equivalent” f values will be explored.
One way to break ties is to nudge the scale of h slightly. If we scale it downwards, then f will increase as we move towards the goal. Unfortunately, this means that A* will prefer to expand vertices close to the starting point instead of vertices close to the goal. We can instead scale h upwards slightly (even by 0.1%). A* will prefer to expand vertices close to the goal.
dx1 = current.x - goal.x
dy1 = current.y - goal.y
dx2 = start.x - goal.x
dy2 = start.y - goal.y
cross = abs(dx1*dy2 - dx2*dy1)
heuristic += cross*0.001
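As a minimal sketch of the scaling idea quoted above (the factor is illustrative, not from the question): multiplying h by a tiny amount above 1 makes equal-f nodes that are closer to the goal sort first, at the cost of a path that may be very slightly longer than optimal.

def nudged_f(g, h, p=0.001):
    # Prefer nodes closer to the goal among entries with otherwise equal f = g + h.
    return g + h * (1.0 + p)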

Expected performance improvement of alpha-beta pruning optimizations: tt, MTD(f)?

What is the expected performance gain of adding Transposition Tables and then MTD(f) to pure alpha beta pruning for chess?
In my pure alpha beta pruning I use the following move ordering:
principal variation (from previous iterative deepening iteration)
best move from transposition table (if there is any)
captures: highest captured then lowest capturing
killer moves
history heuristics
With the above setup I get the following average results for depth 9:
pure alpha beta: X nodes visited
alpha beta + TT: 0.5*X nodes visited
MTD(f): 0.25*X nodes visited
I expected better results, and I want to make sure whether that is as good as it gets or whether there is something wrong with my implementation.
I search with an accuracy of 0.1 pawn.

Minimax: what to do with equal scores in endgame?

As far as I understood, the Minimax algorithm in its simplest form works the following way: traverse the game tree bottom-up and if it's the player's turn, assign the maximum score of all child nodes to the current node, else the minimum score. Leaves are scored according to the game output, let's say +1 for win, 0 for draw, -1 for loss. Finally choose the move which leads to the node with the highest score.
Of course it is impractical to traverse the whole game tree, so a heuristic is used. But assume we are near the end of the game. Then I have found some problems with this simple approach.
For example, we are playing chess and the player (playing White) has reached this position (diagram omitted):
It is the player's turn. There is a mate in one with Qg7, so the node for Qg7 gets a score of 1. But, for instance, Ke1 is also a legal move. The only reply is c5, after which Qg7# is still available. Because Qg7 gets a score of 1, so does c5, and so does Ke1.
So we have at least two moves with a score of 1 (Ke1 and Qg7). Let's say the algorithm considers king moves first and chooses the first move with the highest score. That would mean, that in this position, the player would not checkmate the opponent, but would do random King moves until the opponent could actually prevent the checkmate (with queening the pawn).
The fundamental problem is, that a checkmate in one (Qg7) has the same score as a checkmate in two (Ke1), so there is no reason for the player, to actually go for the checkmate in one.
This can be prevented with a simple modification to the Minimax algorithm: in case of equal score, choose the shorter path to the position with that score. A checkmate in one would then be preferred.
My question is: I have not found any mention of this in any Minimax-related source, so is there some misunderstanding of Minimax on my part? If not, is this the usual way to solve this or are there superior ways?
I'm pretty sure you understand minimax correctly.
The thing I would probably do is to simply pass down the current distance in the minimax function, and weight the wins / losses according to that. A faster win (to reduce the possibility of unseen situations) and a slower loss (to allow for mistakes by the opponent) should generally be preferred. Whether the win is 1, or any positive value, it doesn't matter too much - it will still get picked as better than 0 or -1.
If you have a win be the largest possible value in your heuristic - you can still do something similar - just weight it by increasing or decreasing it a little, but still have it be larger than all other non-win values.
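A minimal sketch of that weighting, assuming a search where plies_from_root counts the moves already played from the root and WIN is larger than any heuristic score (the names and values are illustrative):

WIN = 1_000_000

def terminal_score(result, plies_from_root):
    # result: +1 win, 0 draw, -1 loss from the searching player's point of view
    if result > 0:
        return WIN - plies_from_root    # faster wins score higher
    if result < 0:
        return -WIN + plies_from_root   # slower losses score higher, leaving room for mistakes
    return 0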
For your example, it probably doesn't really matter, as, when the pawn gets close to promoting, you'd detect that a draw is coming and then you'd make the winning move. But it can certainly be a problem if:
There's a sequence of inevitable moves beyond the search depth that results in a worse outcome for you (which is a pretty common problem with minimax, but if it's likely avoidable, it's obviously better to do so)
There's a time constraint on your side
You don't cater for all draw conditions (e.g. 3 repeated positions)

Tic Tac Toe AI Bugs

I'm trying to implement an AI for Tic Tac Toe that is smart enough to never lose. I've tried two different algorithms but the AI still makes mistakes.
I started with this minimax alpha-beta pruning algorithm. Here's a live demo: http://iioengine.com/ttt/minimax.htm
It runs without error, but if you take the bottom left corner first, then either of the other two squares on the bottom row, the AI doesn't see that coming. I'm sure this isn't a flaw in the minimax algorithm itself - can anyone see an error in my source? You can inspect the demo page to see everything, but here is the primary AI function:
function bestMove(board,depth,low,high,opponent){
  var best=new Move(null,-iio.maxInt);
  var p;
  for (var c=0;c<grid.C;c++)
    for(var r=0;r<grid.R;r++){
      if (board[c][r]=='_'){
        var nuBoard=board.clone();
        nuBoard[c][r]=getTypeChar(opponent);
        if(checkWin(nuBoard,getTypeChar(opponent)))
          p=new Move([c,r],-evaluateBoard(board,getTypeChar(opponent))*10000);
        else if (checkScratch(nuBoard))
          p=new Move([c,r],0);
        else if (depth==0)
          p=new Move([c,r],-evaluateBoard(board,getTypeChar(opponent)));
        else {
          p=bestMove(nuBoard,depth-1,-high,-low,!opponent);
        }
        if (p.score>best.score){
          best=p;
          if (best.score > low)
            low=best.score;
          if (best.score >= high) return best;
        }
      }
    }
  return best;
}
If you are more familiar with negamax, I tried that one too. I lifted the logic straight from this page. Here is a live demo: http://iioengine.com/ttt/negamax.htm
That one freezes up once you reach a win state, but you can already see that the AI is pretty stupid. Is something wrong with the code integration?
Please let me know if you find a flaw in my code that prevents these algorithms from running properly. Thanks.
Update with code:
function checkWin(board,type){
  for (var i=0;i<3;i++)
    if (evaluateRow(board,[i,0,i,1,i,2],type) >= WIN_SCORE
      ||evaluateRow(board,[0,i,1,i,2,i],type) >= WIN_SCORE)
      return true;
  if(evaluateRow(board,[0,0,1,1,2,2],type) >= WIN_SCORE
    ||evaluateRow(board,[2,0,1,1,0,2],type) >= WIN_SCORE)
    return true;
  return false;
}
function evaluateBoard(board,type){
  var moveTotal=0;
  for (var i=0;i<3;i++){
    moveTotal+=evaluateRow(board,[i,0,i,1,i,2],type);
    moveTotal+=evaluateRow(board,[0,i,1,i,2,i],type);
  }
  moveTotal+=evaluateRow(board,[0,0,1,1,2,2],type);
  moveTotal+=evaluateRow(board,[2,0,1,1,0,2],type);
  return moveTotal;
}
The problem lies in your evaluateBoard() function. The evaluation function is the heart of a minimax/negamax algorithm. If your AI is behaving poorly, the problem usually lies in the evaluation of the board at each move.
For the evaluation of the board, you need to take into consideration three things: winning moves, blocking moves, and moves that result in a fork.
Winning Moves
The static evaluation function needs to know if a move results in a win or a loss for the current player. If the move results in a loss for the current player, it needs to return a very low negative number (lower than any regular move). If the move results in a win for the current player, it needs to return a very high positive number (larger than any regular move).
What is important to remember is that this evaluation has to be relative to the player whose move is being considered. If the AI is currently predicting where the Human player will move, then the evaluation must look at the board from the point of view of the Human player. When it is the AI's turn to move, the evaluation must look at the board from the point of view of the Computer player.
Blocking Moves
When you run your evaluation function, the AI actually doesn't think blocking the Human player is beneficial. Your evaluation function looks like it just counts the number of available moves and returns the result. Instead, you need to return a higher positive number for moves that will help the AI win.
To account for blocking, you need to figure out if a player has 2 of their tokens in an open row, column, or diagonal, and then score the blocking square higher than any other square. So if it is the Computer's turn to move, and the Human player has 2 tokens in an open row, the 3rd square in the row needs to have a high positive number (but not as high as a winning square). This will cause the computer to favor that square over any others.
By just accounting for Winning moves and Blocking moves, you will have a Computer that plays fairly well.
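A hedged sketch of such an evaluation in Python (not the poster's evaluateBoard(); the board layout of 'X'/'O'/'_' cells indexed as board[column][row] follows the question, but the weights are illustrative):

def evaluate(board, player):
    # Score the board for `player`: reward lines we can win, punish lines the
    # opponent is about to win (those are the squares we must block).
    lines = [[(c, 0), (c, 1), (c, 2)] for c in range(3)] \
          + [[(0, r), (1, r), (2, r)] for r in range(3)] \
          + [[(0, 0), (1, 1), (2, 2)], [(2, 0), (1, 1), (0, 2)]]
    opponent = 'O' if player == 'X' else 'X'
    score = 0
    for line in lines:
        cells = [board[c][r] for c, r in line]
        if cells.count(player) == 3:
            score += 10000            # a won line
        elif cells.count(opponent) == 3:
            score -= 10000            # a lost line
        elif cells.count(player) == 2 and cells.count('_') == 1:
            score += 100              # one move away from winning
        elif cells.count(opponent) == 2 and cells.count('_') == 1:
            score -= 100              # opponent threatens to win: block this line
    return score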
Forking Moves
Forking moves cause problems for the Computer. The main problem is that the Computer is 'too smart' for its own good. Since it assumes that the Human player will always make the best move every time, it will find situations where every move it could make eventually ends in a loss for it, so it will just pick the first move on the board it can, since nothing else matters.
If we go through your example, we can see this happen: Human player plays bottom left, Computer plays top middle.
| O |
---+---+---
| |
---+---+---
X | |
When the Human player makes a move to the bottom right corner, the Computer sees that if it tries to block that move, the best move the Human player could make is to take the middle square, resulting in a fork and a win for the Human (although this won't happen every time, since humans are fallible, the Computer doesn't know that).
| O |
---+---+---
| X |
---+---+---
X | O | X
Because the computer will lose whether it blocks or doesn't block the Human from winning, blocking the Human will actually bubble up the lowest possible score (since it results in a loss for the Computer). This means that the Computer will take the best score it can - the middle square.
You'll have to figure out what is the best way to handle such situations, since everyone would play it differently. It's just something to be aware of.
With pure Minimax implementation for Tic-Tac-Toe, the A.I. should never lose. At worst, it should go into a draw.
By pure Minimax, I mean an implementation that explores each and every possible move (actually transition from one move to the other) and creates a tree for said moves and transitions (starting with an empty board at the top of the tree, branching off in all possible first moves, then all possible 2nd moves, etc).
(There's also heuristic Minimax, in which you do not render all positions in the tree node from the start, but only go a certain depth.)
The tree should have as leaves only board positions that end the game (X wins, O wins or draw). Such a tree for classic Tic-Tac-Toe (3x3 board) contains 5477 nodes (not counting the all-empty board at the top).
Once such a tree is created, the leaves are scored directly by simply evaluating how the game ended: top score for a leaf node containing a board state where A.I. wins, 0 score for draw, and lowest score for nodes with board state where the human player has won.
(In heuristic Minimax, you'll have to create a "guesstimation" function that evaluates the leaves of the partial tree and assigns min/0/max scores accordingly. In this implementation, there's a chance that the A.I. might lose in the end, and that chance is inversely proportional to how good your "guesstimator" function is at assessing partial game states.)
Next, all intermediate, non-leaf nodes of the tree are scored based on their children. Obviously, you'd do this bottom-up, as initially only the lowest non-leaf nodes have scored children (the leaf nodes) from which to draw their own score.
(In the context of Tic-Tac-Toe there's no point in making a heuristic implementation of Minimax, as it's fairly cheap to render a tree with 5477 + 1 nodes and then score them all. This kind of implementation is useful for games where there's a lot of branching (a lot of possible moves for a given game state), which makes a full tree slow and memory-hungry - such as chess.)
In the end, you'll have a data structure containing all possible Tic-Tac-Toe games, and an exact idea of what's the best move to perform in response to any move the human player does. As such, due to how Tic-Tac-Toe rules work, Minimax A.I. will only win (if the human player makes at least one crucial mistake) or draw (if the human player always makes the best possible move). This stands true no matter who makes the first move.
I've implemented this myself, and it works as expected.
Here are some of the finer points (with which I've struggled a bit):
make sure the function you use to evaluate the board works well, i.e. that it correctly spots when there's a win/draw situation for either X or O. This function will be used on almost every node of your Minimax tree as you build it, and having it bug out will result in seemingly working but in fact flawed code. Test this part extensively.
Make sure you navigate your tree properly, especially when you're scoring intermediate nodes (but also when you're searching for the next move to make). A trivial solution is to build, alongside the tree, a hash table containing each intermediate (non-leaf) node per level of tree depth. This way you'll be sure to get all nodes at the right time when you do the bottom-up scoring.
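To illustrate the full-tree approach described in this answer, here is a compact Python sketch (my own illustration, not the answerer's code); the flat nine-cell board representation and the names are assumptions:

from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def best(board, player):
    # Returns (score, move) for the side to move: +1 win, 0 draw, -1 loss.
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None                        # board full: draw
    other = 'O' if player == 'X' else 'X'
    best_score, best_move = -2, None
    for m in moves:
        child = board[:m] + (player,) + board[m + 1:]
        score = -best(child, other)[0]        # the opponent's best is our worst
        if score > best_score:
            best_score, best_move = score, m
    return best_score, best_move

# A perfect game from the empty board is a draw: prints (0, <some move>).
print(best((None,) * 9, 'X'))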

Alpha-beta pruning for Minimax

I have spent a whole day trying to implement minimax without really understanding it. Now I think I understand how minimax works, but not alpha-beta pruning.
This is my understanding of minimax:
Generate a list of all possible moves, up until the depth limit.
Evaluate how favorable a game field is for every node on the bottom.
For every node (starting from the bottom), the score of that node is the highest score of its children if the layer is max. If the layer is min, the score of that node is the lowest score of its children.
Perform the move that has the highest score if you are trying to max it, or the lowest if you want the min score.
My understanding of alpha-beta pruning is that, if the parent layer is min and your node has a higher score than the minimum score, then you can prune it since it will not affect the result.
However, what I don't understand is: to work out the score of a node, you would need to know the score of all nodes on the layer below it (in my understanding of minimax), which means you'd still be using the same amount of CPU power.
Could anyone please point out what I am getting wrong? This answer ( Minimax explained for an idiot ) helped me understand minimax, but I don't get how alpha beta pruning would help.
Thank you.
To understand Alpha-Beta, consider the following situation. It is White's turn; White is trying to maximize the score, Black is trying to minimize the score.
White evaluates moves A, B, and C and finds the best score is 20, with C. Now consider what happens when evaluating move D:
If White selects move D, we need to consider counter-moves by Black. Early on, we find Black can capture the white queen, and that subtree gets a MIN score of 5 due to the lost queen. However, we have not considered all of Black's counter-moves. Is it worth checking the rest? No.
We don't care whether Black can get a score even lower than 5, because White's move C could keep the score at 20. Black will not choose a counter-move with a score higher than 5 because he is trying to MINimize the score and has already found a move with a score of 5. For White, move C is preferred over move D as soon as the MIN for D (5 so far) goes below that of C (20 for sure). So we "prune" the rest of the tree there, pop back up a level and evaluate White's moves E, F, G, H... to the end.
Hope that helps.
You don't need to evaluate the entire subtree of a node to decide its value. Alpha Beta Pruning uses two dynamically computed bounds alpha and beta to bound the values that nodes can take.
Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).
Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
I've written a detailed explanation of Alpha Beta Pruning, its pseudocode and several improvements: http://kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/
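To make the two bounds concrete, here is a short Python sketch of the explicit max/min formulation described above (my own illustration, not from the linked post); the node interface with is_terminal(), children() and value() is an assumption:

import math

def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or node.is_terminal():
        return node.value()
    if maximizing:
        best = -math.inf
        for child in node.children():
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:     # the min player already has a choice guaranteeing beta: prune
                break
        return best
    else:
        best = math.inf
        for child in node.children():
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:     # the max player already has a choice guaranteeing alpha: prune
                break
        return best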
(Very) short explanation of minimax:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them and give the board positions to the (opponent) evaluator.
The opponent evaluates the new board positions (for him, the opponent side) - by doing essentially the same thing, recursively calling (his opponent) evaluator, unless the maximum depth or some other condition has been reached and a static evaluator is called - and then selects the maximum evaluation and sends the evaluations back to you.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
(Very) short explanation of α-β pruning:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them one by one and give the board positions to the (opponent) evaluator - but you also pass along your current evaluation (of your board).
The opponent evaluates the new board position (for him, the opponent side) and sends the evaluation back to you. But how does he do that? He has the choice of playing m moves. He tries all of them and gives the new board positions (one by one) to (his opponent) evaluator and then chooses the maximum one.
Crucial step: If any of those evaluations that he gets back, is bigger than the minimum you gave him, it is certain that he will eventually return an evaluation value at least that large (because he wants to maximize). And you are sure to ignore that value (because you want to minimize), so he stops any more work for boards he hasn't yet evaluated.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
Here's a short answer -- you can know the value of a node without computing the precise value of all its children.
As soon as we know that a child node cannot be better, from the perspective of the parent-node player, than the previously evaluated sibling nodes, we can stop evaluating the child subtree. It's at least this bad.
I think your question hints at a misunderstanding of the evaluation function:
if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax)
I'm not completely sure what you meant there, but it sounds wrong. The evaluation function (EF) is usually a very fast, static position evaluation. This means that it only needs to look at a single position and reach a 'verdict' from that. (In other words, you don't always evaluate a branch to n plies.)
Now many times, the evaluation truly is static, which means that the position evaluation function is completely deterministic. This is also the reason why the evaluation results are easily cacheable (since they will be the same each time a position is evaluated).
Now, for e.g. chess, there is usually quite a bit of overt/covert deviation from the above:
a position might be evaluated differently depending on game context (e.g. whether the exact position did occur earlier during the game; how many moves without pawn moves/captures have occurred, en passant and castling opportunity). The most common 'trick' to tackle this is by actually incorporating that state into the 'position'1
a different EF is usually selected for the different phases of the game (opening, middle, ending); this has some design impact (how to deal with cached evaluations when changing the EF? How to do alpha/beta pruning when the EF is different for different plies?)
To be honest, I'm not aware how common chess engines solve the latter (I simply avoided it for my toy engine)
I'd refer to online resources like:
Computer Chess Programming Theory
Alpha/Beta Pruning
Iterative Deepening
Transposition Table
1 Just like the 'check'/'stalemate' conditions, if they are not special-cased outside the evaluation function anyway.
