Transposition tables with alpha-beta pruning - pseudocode

I'm trying to implement alpha-beta pruning with transposition tables. I found the pseudocode of the algorithm on Wikipedia: https://en.wikipedia.org/wiki/Negamax#cite_note-Breuker-1
However, I believe this pseudocode is wrong. I think that alphaOrig is useless, and instead of:
if bestValue ≤ alphaOrig
    ttEntry.Flag := UPPERBOUND
It should be:
if bestValue ≤ α
    ttEntry.Flag := UPPERBOUND
Can anyone confirm that I'm right, or explain to me why I'm wrong? Thanks!
Here is the pseudocode:
function negamax(node, depth, α, β, color)
    alphaOrig := α

    // Transposition Table Lookup; node is the lookup key for ttEntry
    ttEntry := TranspositionTableLookup( node )
    if ttEntry is valid and ttEntry.depth ≥ depth
        if ttEntry.Flag = EXACT
            return ttEntry.Value
        else if ttEntry.Flag = LOWERBOUND
            α := max( α, ttEntry.Value)
        else if ttEntry.Flag = UPPERBOUND
            β := min( β, ttEntry.Value)
        endif
        if α ≥ β
            return ttEntry.Value
        endif

    if depth = 0 or node is a terminal node
        return color * the heuristic value of node

    bestValue := -∞
    childNodes := GenerateMoves(node)
    childNodes := OrderMoves(childNodes)
    foreach child in childNodes
        v := -negamax(child, depth - 1, -β, -α, -color)
        bestValue := max( bestValue, v )
        α := max( α, v )
        if α ≥ β
            break

    // Transposition Table Store; node is the lookup key for ttEntry
    ttEntry.Value := bestValue
    if bestValue ≤ alphaOrig
        ttEntry.Flag := UPPERBOUND
    else if bestValue ≥ β
        ttEntry.Flag := LOWERBOUND
    else
        ttEntry.Flag := EXACT
    endif
    ttEntry.depth := depth
    TranspositionTableStore( node, ttEntry )
    return bestValue

There are different implementations of alpha-beta pruning with transposition tables available, for example the ones from Marsland (A Review of Game-Tree Pruning), Breuker (Memory versus Search in Games), and Carolus (Alpha-Beta with Sibling Prediction Pruning in Chess).
For my answer, I will quote a snippet from the Talk:Negamax page:
Marsland's transposition table logic is equivalent when alphaOrig in Breuker stores α after the transposition table lookup (rather than before). But consider the following sequence of events during a negamax function call:
1. The transposition table lookup updates α because the entry is a "lower bound" (Breuker: alphaOrig < α; Marsland: alphaOrig = α).
2. The move evaluation returns a bestValue (score) equal to α, with α unchanged during the move loop.
3. The node's transposition table entry is updated with that bestValue (score).
In Breuker's logic, the node's transposition table entry will be updated with the "exact" flag (since alphaOrig < bestValue < β). In Marsland's, the update will have the "upper bound" flag (since score ≤ α). Optimally, the flag for the score should be "exact" rather than alternating between upper and lower bound. So I think Breuker's version is better?
In Carolus, there is no alphaOrig and no equivalent; α is updated during move evaluation. In that case, after move evaluation, best can never be greater than α, and setting the "exact" flag for the transposition table entry is impossible.
There is even more discussion about this on the talk page of the Negamax article.
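To make the stored flags concrete, here is a minimal, runnable Python sketch of the same scheme, with alphaOrig saved before the lookup as in Breuker's version quoted above. The nested-tuple tree, the dict used as the table, keying the table by the node itself (instead of a real position hash), and dropping the color factor are simplifications chosen just for this sketch; they are not from the cited papers.

from math import inf

# Toy game tree: an inner node is a tuple of child nodes; a leaf is a number
# giving the score from the point of view of the side to move at that leaf,
# so no "color" factor is needed in this sketch.
EXACT, LOWERBOUND, UPPERBOUND = 0, 1, 2
tt = {}  # node -> (value, flag, depth); a stand-in for a real transposition table

def negamax(node, depth, alpha, beta):
    alpha_orig = alpha                      # saved BEFORE the lookup narrows the window

    entry = tt.get(node)
    if entry is not None and entry[2] >= depth:
        value, flag, _ = entry
        if flag == EXACT:
            return value
        elif flag == LOWERBOUND:
            alpha = max(alpha, value)
        elif flag == UPPERBOUND:
            beta = min(beta, value)
        if alpha >= beta:
            return value

    if depth == 0 or not isinstance(node, tuple):
        return node   # leaf; the sketch assumes the tree is exactly `depth` plies deep

    best = -inf
    for child in node:
        best = max(best, -negamax(child, depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break                           # beta cutoff

    # Classify the stored value against the ORIGINAL alpha, not the narrowed one.
    if best <= alpha_orig:
        flag = UPPERBOUND    # fail-low: every move scored <= alphaOrig
    elif best >= beta:
        flag = LOWERBOUND    # fail-high: we stopped searching early
    else:
        flag = EXACT
    tt[node] = (best, flag, depth)
    return best

# Two-ply example; the search depth matches the tree height.
tree = ((3, -2), (5, 1), (-4, 7))
print(negamax(tree, 2, -inf, inf))   # prints 1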

Related

Alpha-beta pruning with transposition tables

I don't get why the flags for table entries are used the way they are. Consider, for example, the pseudocode for negamax with alpha-beta pruning and transposition tables, and concentrate on the TT parts.
(* Transposition Table Lookup; node is the lookup key for ttEntry *)
ttEntry := transpositionTableLookup(node)
if ttEntry is valid and ttEntry.depth ≥ depth then
    if ttEntry.flag = EXACT then
        return ttEntry.value
    else if ttEntry.flag = LOWERBOUND then
        α := max(α, ttEntry.value)
    else if ttEntry.flag = UPPERBOUND then
        β := min(β, ttEntry.value)
    if α ≥ β then
        return ttEntry.value
That's OK. If the entry contains a lower bound on the exact value, we try to shrink the window from the left, etc.
(* Transposition Table Store; node is the lookup key for ttEntry *)
ttEntry.value := value
if value ≤ alphaOrig then
    ttEntry.flag := UPPERBOUND
else if value ≥ β then
    ttEntry.flag := LOWERBOUND
else
    ttEntry.flag := EXACT
ttEntry.depth := depth
transpositionTableStore(node, ttEntry)
And this part I don't understand. Why do we set the UPPERBOUND flag if the value was too small? value is on the left side of the search window -- it's smaller than the known lower bound, alpha. So it seems that value should be a LOWERBOUND.
I'm certainly wrong in that logic, as seen from my tests and from the fact that everyone uses this version. But I don't get why.
On second thought, the question is trivial :)
Indeed, if a child node's value was good enough to cause a beta-cutoff (value ≥ β), it means the parent node (the node whose entry we are storing) has a move that is at least as good as value, but there may be even better moves that we never searched. So value is a LOWERBOUND on the exact node value.
value ≤ alphaOrig means that every move scored no better than alphaOrig. That means value is an UPPERBOUND on what any move can achieve.
Lower and upper are bounds on the value of the current node, not of the root node, as I had somehow implied.
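To see the three cases side by side, here is a tiny Python sketch of just that classification step (my own illustration; the classify helper and the sample numbers are made up):

# Which flag the stored value gets, relative to the window the node was
# originally searched with (alpha_orig was saved before the search).
EXACT, LOWERBOUND, UPPERBOUND = "EXACT", "LOWERBOUND", "UPPERBOUND"

def classify(value, alpha_orig, beta):
    if value <= alpha_orig:
        return UPPERBOUND   # fail-low: every move scored <= alpha_orig,
                            # so the true value can only be this or lower
    if value >= beta:
        return LOWERBOUND   # fail-high (cutoff): we stopped searching early,
                            # so the true value is this or higher
    return EXACT            # value fell strictly inside the window

# alpha_orig = -10, beta = +10 in all three examples:
print(classify(-12, -10, 10))  # UPPERBOUND
print(classify( 15, -10, 10))  # LOWERBOUND
print(classify(  3, -10, 10))  # EXACT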

When to terminate iterative deepening with alpha-beta pruning and transposition tables?

How do I know when I can stop increasing the depth for an iterative deepening algorithm with negamax, alpha-beta pruning, and transposition tables? The following pseudocode is taken from a wiki page:
function negamax(node, depth, α, β, color)
    alphaOrig := α

    // Transposition Table Lookup; node is the lookup key for ttEntry
    ttEntry := TranspositionTableLookup( node )
    if ttEntry is valid and ttEntry.depth ≥ depth
        if ttEntry.Flag = EXACT
            return ttEntry.Value
        else if ttEntry.Flag = LOWERBOUND
            α := max( α, ttEntry.Value)
        else if ttEntry.Flag = UPPERBOUND
            β := min( β, ttEntry.Value)
        endif
        if α ≥ β
            return ttEntry.Value
        endif

    if depth = 0 or node is a terminal node
        return color * the heuristic value of node

    bestValue := -∞
    childNodes := GenerateMoves(node)
    childNodes := OrderMoves(childNodes)
    foreach child in childNodes
        val := -negamax(child, depth - 1, -β, -α, -color)
        bestValue := max( bestValue, val )
        α := max( α, val )
        if α ≥ β
            break

    // Transposition Table Store; node is the lookup key for ttEntry
    ttEntry.Value := bestValue
    if bestValue ≤ alphaOrig
        ttEntry.Flag := UPPERBOUND
    else if bestValue ≥ β
        ttEntry.Flag := LOWERBOUND
    else
        ttEntry.Flag := EXACT
    endif
    ttEntry.depth := depth
    TranspositionTableStore( node, ttEntry )
    return bestValue
And this is the iterative deepening call:
while(depth < ?)
{
    depth++;
    rootNegamaxValue := negamax( rootNode, depth, -∞, +∞, 1)
}
Of course, when I know the total number of moves in a game, I could use depth < numberOfMovesLeft as an upper bound. But if this information is not given, when do I know that another call of negamax won't give any better result than the previous run? What do I need to change in the algorithm?
The short answer is: when you run out of time (and the transposition tables are irrelevant to the question).
Here I assume that your evaluation function is reasonable (it gives a good approximation of the position's value).
The main idea of combining iterative deepening with alpha-beta is the following: let's assume you have 15 seconds to come up with the best move. How far can you search? I do not know, and no one else knows either. You can try to search to depth = 8, only to find out that the search finished in 1 second (so you wasted the 14 seconds you had available). With trial and error you find that depth = 10 gives you a result in 13 seconds, so you decide to use it all the time. But then something goes terribly wrong (your alpha-beta was not pruning well enough, or some positions took too long to evaluate) and your result is not ready within 15 seconds. So you either make a random move or lose the game.
So that this never happens, it is nice to always have a good result ready. You do the following: get the best result for depth = 1 and store it. Find the best result for depth = 2, and overwrite it. And so on. From time to time check how much time is left, and if it is really close to the time limit, return your best move so far.
Now you do not need to worry about the time; your method will return the best result you have found so far. With all these recalculations of different subtrees you waste at most about half of your resources (and that is if you search the whole tree; with alpha-beta pruning you most probably do not). The additional advantage is that you can now reorder the moves from best to worst on each depth iteration, which makes the pruning more aggressive.
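Here is a minimal Python sketch of that driver loop (my own illustration; negamax and root_node are placeholders for your search function and root position, and a real engine would also check the clock inside the search so that a single deep iteration cannot overrun the budget):

import time
from math import inf

def search_with_budget(root_node, negamax, time_budget_s, max_depth=64):
    # Iterative deepening: always keep the result of the last completed depth.
    deadline = time.monotonic() + time_budget_s
    best_value, completed_depth = None, 0
    depth = 0
    while depth < max_depth:
        depth += 1
        value = negamax(root_node, depth, -inf, inf, 1)   # full window each iteration
        best_value, completed_depth = value, depth        # overwrite only on completion
        if time.monotonic() >= deadline:
            break                                         # close to the limit: stop here
    return best_value, completed_depth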

0/1 knapsack and dynamic programming

In http://en.wikipedia.org/wiki/Knapsack_problem, the DP is:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)

for j from 0 to W do
    m[0, j] := 0
end for

for i from 1 to n do
    for j from 0 to W do
        if w[i] <= j then
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
        else
            m[i, j] := m[i-1, j]
        end if
    end for
end for
I think switching the order of the weight loop and the item loop does not impact the optimal solution. Is this right? Say:
for j from 0 to W do
    for i from 1 to n do
Thanks.
You are correct. The value of m[i,j] depends only on entries with a smaller i and a j that is no larger, so both loop orders fill the table correctly. The situation where changing the loop order matters is when one of the indices can increase. For example, if m[2,2] depended on m[1,3], then we would need to calculate the first row completely before moving to the second row.
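A quick Python check of that claim (my own illustration; the item data is made up):

def knapsack(values, weights, W, item_outer=True):
    n = len(values)
    # m[i][j]: best value using only the first i items with capacity j
    m = [[0] * (W + 1) for _ in range(n + 1)]
    order = (((i, j) for i in range(1, n + 1) for j in range(W + 1)) if item_outer
             else ((i, j) for j in range(W + 1) for i in range(1, n + 1)))
    for i, j in order:
        if weights[i - 1] <= j:                     # value/weight lists are 0-based, hence i-1
            m[i][j] = max(m[i - 1][j], m[i - 1][j - weights[i - 1]] + values[i - 1])
        else:
            m[i][j] = m[i - 1][j]
    return m[n][W]

values, weights, W = [60, 100, 120], [10, 20, 30], 50    # made-up example data
print(knapsack(values, weights, W, item_outer=True))     # 220
print(knapsack(values, weights, W, item_outer=False))    # 220: same answer either way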

0/1 Knapsack Clarification and Optimization

I was reading Wikipedia regarding the 0/1 knapsack problem. I just want to clarify a couple of things. I have two questions:
http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem
I encountered this pseudocode:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)

for w from 0 to W do
    m[0, w] := 0
end for

for i from 1 to n do
    for j from 0 to W do
        if j >= w[i] then
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
        else
            m[i, j] := m[i-1, j]
        end if
    end for
end for
Specifically for this part:
if j >= w[i] then
    m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
1) Correct me if I'm wrong, but shouldn't it be:
m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i], m[i,j-w[i]] + v[i])
?
Or if not, can someone explain to me why it's not needed?
...
2) I also have another question: say I want to optimize this a bit. Would it be wise to have the loop "for j from 0 to W" increment by the GCD of all the item weights (i.e., the GCD of the values stored in array w)? (I'm thinking purely code-wise right now, as I'm about to implement it.)
1) When you add m[i, j-w[i]] + v[i], you're allowing the same item i to be selected more than once, so it is no longer the 0/1 knapsack - it becomes the knapsack problem with an unlimited amount of each item (unbounded knapsack).
2) Yes, but this GCD usually comes out to 1 on real instances, so it's not worth bothering with in the general case, unless you know beforehand that your data would benefit from it. (In that case, you'd actually want to divide all your weights and the capacity by the GCD and keep the original algorithm incrementing 1 at a time; the optimal value itself is unchanged by this rescaling. This would save you memory as well, and if the capacity isn't divisible by that GCD you can simply round it down, since subsets of items can only fill multiples of the GCD.)
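A small Python sketch of that rescaling idea (my own illustration with made-up data; it uses a standard 1D form of the 0/1 DP, which is not the Wikipedia pseudocode quoted above):

from functools import reduce
from math import gcd

def knapsack_01(values, weights, W):
    # Standard 0/1 knapsack over capacities 0..W, using a 1D rolling array.
    dp = [0] * (W + 1)
    for v, w in zip(values, weights):
        for j in range(W, w - 1, -1):   # descending j keeps each item 0/1, not unbounded
            dp[j] = max(dp[j], dp[j - w] + v)
    return dp[W]

def knapsack_01_scaled(values, weights, W):
    g = reduce(gcd, weights)
    # Divide the weights and the capacity by their GCD; the optimal value is unchanged.
    return knapsack_01(values, [w // g for w in weights], W // g)

values, weights, W = [7, 9, 4], [6, 9, 3], 12     # made-up data; the weights share a GCD of 3
print(knapsack_01(values, weights, W))            # 13
print(knapsack_01_scaled(values, weights, W))     # 13, using a table a third the size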

Trouble applying the Alpha Beta Pruning algorithm to this tree

I am trying to apply the alpha beta pruning algorithm to this given tree.
I am stuck when I hit node C, because after expanding all the children of B, I give A >= -4. I then expand C to get I = -3, which IS greater than -4 (-3 >= -4). Do I therefore update A to -3? If so, do I then afterwards prune J and K because -3 >= -3? When I worked through the example, I pruned J, K, M, and N. I am really uncertain about this =(
EDIT:
Another question: after exploring B and passing the value of B up to A, do we pass this value down to C and thus to I? I saw an example where this was the case. Here it is: http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf
However, in this example, http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf, it doesn't seem to pass values down; instead it seems to only propagate values upwards. I am not sure which one is correct, or if it makes a difference at all.
After expanding all the children of B, A has α = -4, β = ∞.
When you get to I, α = -4, β = -3. α < β, so J and K are not pruned. They would still need to be evaluated to make sure they're not less than -3, which would lower the evaluation of C. The value of A is only updated to α = -3, β = ∞ after C has been fully expanded. You can't use the updated alpha value of A when evaluating J, because it wouldn't have been updated yet.
J and K would be pruned if I were -5 instead. In that case it wouldn't matter what J and K are, because we would already know that the evaluation of C is worse than B's (-5 < -4), and J and K could only make it worse.
Each node passes its alpha and beta values to its children. The children then update their own copies of alpha or beta, depending on whose turn it is, and return the final evaluation of that node. That value is then used to update the alpha or beta value of the parent.
See Alpha-Beta pruning for example:
function alphabeta(node, depth, α, β, Player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if Player = MaxPlayer
        for each child of node
            α := max(α, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break // Beta cut-off
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break // Alpha cut-off
        return β

// Initial call
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
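For reference, here is a direct Python transcription of that pseudocode (my own sketch; the nested-tuple tree and the leaf convention are assumptions made just so it runs):

from math import inf

# Tree convention for this sketch: an inner node is a tuple of children,
# a leaf is a number giving the heuristic value from MAX's point of view.
def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or not isinstance(node, tuple):
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta, False))
            if beta <= alpha:
                break                      # beta cut-off
        return alpha
    else:
        for child in node:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta, True))
            if beta <= alpha:
                break                      # alpha cut-off
        return beta

# Initial call on a small hand-built tree (MAX to move at the root):
tree = ((3, 5), (-4, 9), (6, 2))
print(alphabeta(tree, 2, -inf, inf, True))   # 3: MIN would pick 3, -4, or 2; MAX takes 3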
Whenever I need to refresh my understanding of the algorithm I use this:
http://homepage.ufp.pt/jtorres/ensino/ia/alfabeta.html
You can enter your tree there and step through the algorithm. The values you would want are:
3 3 3 3
-2 -4 3 etc.
I find that deducing the algorithm from an example provides a deeper understanding.

Resources