How do I know when I can stop increasing the depth for an iterative deepening algorithm with negamax, alpha-beta pruning, and transposition tables? The following pseudocode is taken from a wiki page:
function negamax(node, depth, α, β, color)
alphaOrig := α
// Transposition Table Lookup; node is the lookup key for ttEntry
ttEntry := TranspositionTableLookup( node )
if ttEntry is valid and ttEntry.depth ≥ depth
if ttEntry.Flag = EXACT
return ttEntry.Value
else if ttEntry.Flag = LOWERBOUND
α := max( α, ttEntry.Value)
else if ttEntry.Flag = UPPERBOUND
β := min( β, ttEntry.Value)
endif
if α ≥ β
return ttEntry.Value
endif
if depth = 0 or node is a terminal node
return color * the heuristic value of node
bestValue := -∞
childNodes := GenerateMoves(node)
childNodes := OrderMoves(childNodes)
foreach child in childNodes
val := -negamax(child, depth - 1, -β, -α, -color)
bestValue := max( bestValue, val )
α := max( α, val )
if α ≥ β
break
// Transposition Table Store; node is the lookup key for ttEntry
ttEntry.Value := bestValue
if bestValue ≤ alphaOrig
ttEntry.Flag := UPPERBOUND
else if bestValue ≥ β
ttEntry.Flag := LOWERBOUND
else
ttEntry.Flag := EXACT
endif
ttEntry.depth := depth
TranspositionTableStore( node, ttEntry )
return bestValue
And this is the iterative deepening call:
while(depth < ?)
{
depth++;
rootNegamaxValue := negamax( rootNode, depth, -∞, +∞, 1)
}
Of course, when I know the total number of moves in a game I could use depth < numberOfMovesLeft as an upper bound. But if this information is not given, when do I know that another call of negamax won't give any better result than the previous run? What do I need to change in the algorithm?
The short answer is: when you run out of time (and the transposition tables are irrelevant to the answer/question).
Here I assume that your evaluation function is reasonable (it gives a good approximation of the position's value).
The main idea behind combining iterative deepening with alpha-beta is the following: let's assume that you have 15 seconds to come up with the best move. How far can you search? I do not know, and nobody else knows either. You can try to search to depth = 8 only to find out that the search finished in 1 second (so you wasted 14 of the available seconds). With trial and error you find that depth = 10 gives you a result in 13 seconds, so you decide to use it all the time. But now something goes terribly wrong (your alpha-beta was not pruning well enough, some of the positions took too much time to evaluate) and your result is not ready in 15 seconds. So you either make a random move or lose the game.
To make sure this never happens, it is nice to always have a good result ready. So you do the following: get the best result for depth = 1 and store it. Find the best result for depth = 2, and overwrite it. And so on. From time to time, check how much time is left, and if it is really close to the time limit, return your best move so far.
Now you do not need to worry about the time: your method will always return the best result you have found so far. With all these recalculations of different subtrees you waste only about half of your resources (that is if you search the whole tree, but with alpha-beta you most probably do not). The additional advantage is that you can now reorder the moves from best to worst at each depth iteration, which makes the pruning more aggressive.
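As a rough illustration, here is a minimal Python sketch of that time-budgeted loop. It assumes a negamax function like the one above already exists; the names (best_value_within, root_node) are only illustrative:
import time

def best_value_within(time_limit_s, root_node, max_depth=64):
    # Time-budgeted iterative deepening: deepen one ply at a time and
    # always keep the result of the last fully completed iteration.
    deadline = time.monotonic() + time_limit_s
    best_value = None
    depth = 0
    while depth < max_depth and time.monotonic() < deadline:
        depth += 1
        best_value = negamax(root_node, depth, float('-inf'), float('inf'), 1)
    return best_value
In practice you would also poll the clock inside negamax itself and abort the current (unfinished) iteration, falling back on the result of the previous depth.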
I'm confused about these two. Is negamax just an optimization of minimax, or is negamax a different search tree algorithm? If negamax is a different search tree algorithm, which one is better?
Information extracted from here:
Negamax is a simplification of minimax that uses the following property:
max(a,b) = -min(-a,-b)
So, instead of the conditional value computation in minimax, which is the following:
if maximizingPlayer then
value := −∞
for each child of node do
value := max(value, minimax(child, depth − 1, FALSE))
return value
else (* minimizing player *)
value := +∞
for each child of node do
value := min(value, minimax(child, depth − 1, TRUE))
return value
you have a single line that does the same in Negamax:
value := max(value, −negamax(child, depth − 1, −color))
and the boolean is replaced by the notion (in the article) of color, which is just a value of 1 or -1 that alternates between player turns (indicating whether we should minimize or maximize on the next turn).
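A minimal Python sketch of that identity (with illustrative node methods children(), is_terminal(), heuristic_value()) might look like this:
def negamax(node, depth, color):
    # color is +1 for the maximizing player, -1 for the minimizing player.
    if depth == 0 or node.is_terminal():
        return color * node.heuristic_value()
    value = float('-inf')
    for child in node.children():
        # max(a, b) = -min(-a, -b): negate the child's result and flip the color,
        # so the same "max" code serves both players.
        value = max(value, -negamax(child, depth - 1, -color))
    return value
So negamax is not a different algorithm: it computes exactly the same value as minimax, just written more compactly; the root is called with color = +1 for the maximizing player.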
I don't get why the flags for table entries are used the way they are. Consider, for example, the pseudocode for negamax with alpha-beta pruning and transposition tables, and concentrate on the TT parts.
(* Transposition Table Lookup; node is the lookup key for ttEntry *)
ttEntry := transpositionTableLookup(node)
if ttEntry is valid and ttEntry.depth ≥ depth then
if ttEntry.flag = EXACT then
return ttEntry.value
else if ttEntry.flag = LOWERBOUND then
α := max(α, ttEntry.value)
else if ttEntry.flag = UPPERBOUND then
β := min(β, ttEntry.value)
if α ≥ β then
return ttEntry.value
That's OK. If the entry contains a lower bound on the exact value, we try to shrink the window from the left (for example, a stored lower bound of 7 lets us raise α from 5 to 7), and so on.
(* Transposition Table Store; node is the lookup key for ttEntry *)
ttEntry.value := value
if value ≤ alphaOrig then
ttEntry.flag := UPPERBOUND
else if value ≥ β then
ttEntry.flag := LOWERBOUND
else
ttEntry.flag := EXACT
ttEntry.depth := depth
transpositionTableStore(node, ttEntry)
And this part I don't understand. Why do we set the UPPERBOUND flag if the value was too small? value is on the left side of the search window -- it's smaller than the known lower bound, alpha. So it seems that value should be a LOWERBOUND.
I'm certainly wrong about that logic, as can be seen from my tests and from the fact that everyone uses this version. But I don't get why.
On second thought, the question is trivial :)
Indeed, if a child's value was good enough to cause a beta-cutoff (value ≥ β), it means the current node has a move that is at least as good as value, but there may be even better moves among the ones we never examined. So value is a LOWERBOUND on the exact node value.
value ≤ alphaOrig means that none of the moves turned out to be better than alphaOrig. Each child search may have been cut off as soon as it could prove the move would not beat α, so for every move we only have an upper bound on its true value; value is therefore an UPPERBOUND on the exact node value.
Lower and upper here are bounds on the value of the current node, not the root one, as I somehow implied.
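To summarize the above in code form, here is a small sketch (hypothetical names) of the store-time classification, with what each flag actually asserts about the node's true value spelled out as comments:
def classify(best_value, alpha_orig, beta):
    # The flag describes what the search proved about THIS node's true value.
    if best_value <= alpha_orig:
        # Fail-low: no move improved the original alpha, so the child searches
        # only establish "true value <= best_value".
        return "UPPERBOUND"
    elif best_value >= beta:
        # Fail-high: we cut off, so unexamined moves could be even better;
        # we only know "true value >= best_value".
        return "LOWERBOUND"
    else:
        # alphaOrig < best_value < beta: the window never clipped the result.
        return "EXACT"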
I'm trying to implement alpha-beta pruning with transposition tables. I found the pseudocode of the algorithm on Wikipedia: https://en.wikipedia.org/wiki/Negamax#cite_note-Breuker-1
However, I believe that this pseudocode is wrong. I think that alphaOrig is useless, and that instead of:
if bestValue ≤ alphaOrig
ttEntry.Flag := UPPERBOUND
It should be:
if bestValue ≤ α
ttEntry.Flag := UPPERBOUND
Can anyone confirm whether I'm right, or explain why I'm wrong? Thanks!
Here is the pseudocode:
function negamax(node, depth, α, β, color)
alphaOrig := α
// Transposition Table Lookup; node is the lookup key for ttEntry
ttEntry := TranspositionTableLookup( node )
if ttEntry is valid and ttEntry.depth ≥ depth
if ttEntry.Flag = EXACT
return ttEntry.Value
else if ttEntry.Flag = LOWERBOUND
α := max( α, ttEntry.Value)
else if ttEntry.Flag = UPPERBOUND
β := min( β, ttEntry.Value)
endif
if α ≥ β
return ttEntry.Value
endif
if depth = 0 or node is a terminal node
return color * the heuristic value of node
bestValue := -∞
childNodes := GenerateMoves(node)
childNodes := OrderMoves(childNodes)
foreach child in childNodes
v := -negamax(child, depth - 1, -β, -α, -color)
bestValue := max( bestValue, v )
α := max( α, v )
if α ≥ β
break
// Transposition Table Store; node is the lookup key for ttEntry
ttEntry.Value := bestValue
if bestValue ≤ alphaOrig
ttEntry.Flag := UPPERBOUND
else if bestValue ≥ β
ttEntry.Flag := LOWERBOUND
else
ttEntry.Flag := EXACT
endif
ttEntry.depth := depth
TranspositionTableStore( node, ttEntry )
return bestValue
There are different implementations of alpha-beta pruning with transposition tables available, for example the ones from Marsland (A Review of Game-Tree Pruning), Breuker (Memory versus Search in Games) and Carolus (Alpha-Beta with Sibling Prediction Pruning in Chess).
For my answer I will quote a snippet of the Talk:Negamax page:
Marsland transposition table logic is equivalent when alphaOrig in Breuker stores α after the transposition table lookup (rather than before). But consider the following case during a negamax function call:
transposition table lookup updates α because it's a "lower bound" (Breuker: alphaOrig < α Marsland: alphaOrig = α)
the move evaluation returns a bestValue (score) equal to this raised α, which does not change further during the move loop
update the node's transposition table entry with the same bestValue (score)
In Breuker's logic, the node's transposition table entry will update with "exact" flag (since alphaOrig < bestValue < β). In Marsland, the update will have "upper bound" flag (since score ≤ α). Optimally, the flag for the score should be "exact" rather than alternating between upper and lower bound. So I think Breuker's version is better?
In Carolus, there's no alphaOrig and no equivalent; α is updated during move evaluation. In this case, after move evaluation, best can never be greater than α, and setting the "exact" flag for the transposition table entry is impossible.
There is even more discussion about this on the talk page of the Negamax article.
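For illustration only, here is a small sketch (hypothetical helper names) contrasting the two store rules described in the quote, using the talk-page scenario where the TT lookup raises α from 2 to 3 and the move loop then returns bestValue = 3 with β = 10:
def flag_breuker(best_value, alpha_orig, beta):
    # alphaOrig is saved *before* the TT lookup can raise alpha.
    if best_value <= alpha_orig:
        return "UPPERBOUND"
    if best_value >= beta:
        return "LOWERBOUND"
    return "EXACT"

def flag_marsland(best_value, alpha, beta):
    # alpha here is the value *after* the TT lookup (and any later raises).
    if best_value <= alpha:
        return "UPPERBOUND"
    if best_value >= beta:
        return "LOWERBOUND"
    return "EXACT"

print(flag_breuker(3, alpha_orig=2, beta=10))  # EXACT
print(flag_marsland(3, alpha=3, beta=10))      # UPPERBOUND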
I am trying to apply the alpha beta pruning algorithm to this given tree.
I am stuck when I hit node C because, after expanding all the children of B, I give A ≥ -4. I then expand C to get I = -3, which IS greater than -4 (-3 ≥ -4). Do I therefore update A to -3? If so, do I then prune J and K because -3 ≥ -3? When I worked through the example, I pruned J, K, M and N. I am really uncertain about this =(
EDIT:
Another question: after exploring B and passing the value of B to A, do we pass this value to C and thus to I? I saw an example where this was the case. Here it is: http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf
However, in this example, http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf, values don't seem to be passed down; instead they seem to only propagate upwards. I am not sure which one is correct, or if it makes a difference at all.
After expanding all the children of B, A has α = -4, β = ∞.
When you get to I (which evaluates to -3), C has α = -4, β = -3. Since α < β, J and K are not pruned. They would need to be evaluated to make sure that they're not less than -3, which would lower the evaluation of C. The value of A is updated to α = -3, β = ∞ only after C has been fully expanded. You can't use the updated alpha value of A when evaluating J, because it wouldn't have been updated yet.
J and K would be pruned if I were -5 instead. In that case it wouldn't matter what J and K are, because we already know the evaluation of C is worse than that of B (-5 < -4), and J and K can only make it worse.
Each node passes the alpha and beta values to its children. The children will then update their own copies of the alpha or beta value depending on whose turn it is and return the final evaluation of that node. That is then used to update the alpha or beta value of the parent.
See Alpha-Beta pruning for example:
function alphabeta(node, depth, α, β, Player)
if depth = 0 or node is a terminal node
return the heuristic value of node
if Player = MaxPlayer
for each child of node
α := max(α, alphabeta(child, depth-1, α, β, not(Player)))
if β ≤ α
break // Beta cut-off
return α
else
for each child of node
β := min(β, alphabeta(child, depth-1, α, β, not(Player)))
if β ≤ α
break // Alpha cut-off
return β
// Initial call
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
Whenever I need to refresh my understanding of the algorithm I use this:
http://homepage.ufp.pt/jtorres/ensino/ia/alfabeta.html
You can enter your tree there and step through the algorithm. The values you would want are:
3 3 3 3
-2 -4 3 etc.
I find that deducing the algorithm from an example provides a deeper understanding.
I'm having trouble understanding this pseudocode I found for alpha beta pruning on wikipedia:
function alphabeta(node, depth, α, β, Player)
if depth = 0 or node is a terminal node
return the heuristic value of node
if Player = MaxPlayer
for each child of node
α := max(α, alphabeta(child, depth-1, α, β, not(Player)))
if β ≤ α
break (* Beta cut-off *)
return α
else
for each child of node
β := min(β, alphabeta(child, depth-1, α, β, not(Player)))
if β ≤ α
break (* Alpha cut-off *)
return β
What is confusing me is the if Player = MaxPlayer condition. I understand the whole idea of recursively calling the function with not(Player) to get the minimum value, which will then recursively call the function with Player, repeating until the depth limit is reached or a goal state has been found. However, I don't understand the
if β ≤ α
break
statement. My understanding of it is that as soon as a value higher than the minimum value found in the previous call (β) is found, that is the value that is used. But since this is the MAX part of the function, don't we want the HIGHEST value, not just ANY value that is greater than beta?
This is the trimming phase of the algorithm, in the MaxPlayer clause (when looking for the maximum value for the player at this node):
Beta is the parameter of the function that acts as the "trimming factor". It represents the minimum score found so far: the parent of the current node, which is a minimizing node, has already found a solution whose value is beta.
Now, if we continue iterating over the children, we will get something at least as good as the current alpha. Since beta ≤ alpha, the parent node, which is a minimizing node, will NEVER choose this alpha (or any value greater than it); it will choose a value that is beta or lower, and the current node has no chance of producing such a value, so we can trim the calculation.
Example:
MIN
/ \
/ \
/ \
/ \
5 MAX
/ | \
/ | \
/ | \
6 8 4
If we apply the normal minimax algorithm, evaluating the MAX node will return 8. However, we know that the MIN node above it is going to compute min(5, MAX(6, 8, 4)). Since after reading 6 we already know max(6, 8, 4) ≥ 6 ≥ 5, we can return 6 without continuing the computation, because the MIN computation at the upper level will be min(5, 6) = 5 either way.
This is the intuition for one level; it is of course applied recursively, so the same idea "flows" down to all levels.
The same idea holds for the trimming condition in the MIN vertex.
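A minimal Python sketch of that one-level cut, using the tree above (the MIN root has already seen 5, so β = 5 when we descend into the MAX node; names are illustrative):
def max_node(children, alpha, beta):
    value = float('-inf')
    for v in children:
        value = max(value, v)
        alpha = max(alpha, value)
        if beta <= alpha:
            # The MIN parent already has 5; it will never accept anything >= 5,
            # so the remaining children can be trimmed (beta cut-off).
            break
    return value

print(max_node([6, 8, 4], alpha=float('-inf'), beta=5))  # prints 6; 8 and 4 are never read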