I don't get why flags for table entries are used as they are. Consider e.g. the pseudocode for Negamax with alpha-beta pruning and transposition tables and concentrate on the TT parts.
(* Transposition Table Lookup; node is the lookup key for ttEntry *)
ttEntry := transpositionTableLookup(node)
if ttEntry is valid and ttEntry.depth ≥ depth then
    if ttEntry.flag = EXACT then
        return ttEntry.value
    else if ttEntry.flag = LOWERBOUND then
        α := max(α, ttEntry.value)
    else if ttEntry.flag = UPPERBOUND then
        β := min(β, ttEntry.value)
    if α ≥ β then
        return ttEntry.value
That's OK. If the entry contains a lower bound on the exact value, we try to shrink the window from the left, and so on.
(* Transposition Table Store; node is the lookup key for ttEntry *)
ttEntry.value := value
if value ≤ alphaOrig then
    ttEntry.flag := UPPERBOUND
else if value ≥ β then
    ttEntry.flag := LOWERBOUND
else
    ttEntry.flag := EXACT
ttEntry.depth := depth
transpositionTableStore(node, ttEntry)
And this part I don't understand. Why do we set the UPPERBOUND flag when the value was too small? value lies to the left of the search window, i.e. it is smaller than the known lower bound alpha, so it seems that value should be a LOWERBOUND.
I'm certainly wrong about this, as my tests show, and as the fact that everyone uses this version suggests. But I don't get why.
On second thought, the question is trivial :)
Indeed, if a child node's value was good enough to cause a beta-cutoff (value ≥ β), it means the parent node has a move that is at least as good as value, but there may be an even better move that we never searched. So value is a LOWERBOUND on the exact node value.
Conversely, value ≤ alphaOrig means that every move scored no better than alphaOrig. That means value is an UPPERBOUND on what any move can achieve.
Lower and upper are bounds on the value of the current node, not the root one, as I had somehow assumed.
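The store-side rule above can be sketched as a tiny Python function. This is only an illustration of the flag logic, not engine code; the names classify, alpha_orig and the flag strings are made up here:

```python
# Hypothetical sketch of the transposition-table store rule described above.
EXACT, LOWERBOUND, UPPERBOUND = "EXACT", "LOWERBOUND", "UPPERBOUND"

def classify(value, alpha_orig, beta):
    """Flag for a node whose search returned `value` inside the
    original window (alpha_orig, beta)."""
    if value <= alpha_orig:
        # Every move failed low: value is only an upper bound
        # on the node's true score.
        return UPPERBOUND
    if value >= beta:
        # A move failed high (beta cutoff): the true score is
        # at least value, so it is a lower bound.
        return LOWERBOUND
    # value fell strictly inside the window: it is exact.
    return EXACT

print(classify(-7, alpha_orig=-5, beta=5))   # UPPERBOUND (fail low)
print(classify(9, alpha_orig=-5, beta=5))    # LOWERBOUND (fail high)
print(classify(1, alpha_orig=-5, beta=5))    # EXACT
```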
Related
I'm confused by these two. Is Negamax just an optimization of minimax, or is Negamax a different tree-search algorithm? And if it is a different algorithm, which one is better?
Information extracted from here:
Negamax is a simplification of minimax based on the following identity:
max(a, b) = -min(-a, -b)
So, instead of the conditional value computation in minimax, which looks like this:
if maximizingPlayer then
    value := −∞
    for each child of node do
        value := max(value, minimax(child, depth − 1, FALSE))
    return value
else (* minimizing player *)
    value := +∞
    for each child of node do
        value := min(value, minimax(child, depth − 1, TRUE))
you have a single line that does the same in negamax:
value := max(value, −negamax(child, depth − 1, −color))
and the boolean is replaced by the notion (in the article) of color, a value of 1 or −1 that alternates each turn and encodes whether the next ply should maximize or minimize.
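The equivalence can be checked on a toy tree. Below is a minimal sketch in Python; the nested-list tree and its leaf scores are made up, and leaves are scored from the maximizing player's point of view:

```python
# Minimal check that negamax computes the same value as minimax,
# using max(a, b) = -min(-a, -b). Leaves are heuristic scores.

def minimax(node, maximizing):
    if not isinstance(node, list):          # leaf
        return node
    if maximizing:
        return max(minimax(c, False) for c in node)
    return min(minimax(c, True) for c in node)

def negamax(node, color):
    if not isinstance(node, list):          # leaf
        return color * node                 # score from color's viewpoint
    return max(-negamax(c, -color) for c in node)

tree = [[3, [5, 1]], [[6, 2], 4]]           # made-up game tree
print(minimax(tree, True))                  # 4
print(negamax(tree, 1))                     # 4
```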
This is the pseudocode of A* from wiki (https://en.wikipedia.org/wiki/A*_search_algorithm):
function reconstruct_path(cameFrom, current)
    total_path := {current}
    while current in cameFrom.Keys:
        current := cameFrom[current]
        total_path.prepend(current)
    return total_path

// A* finds a path from start to goal.
// h is the heuristic function. h(n) estimates the cost to reach goal from node n.
function A_Star(start, goal, h)
    // The set of discovered nodes that may need to be (re-)expanded.
    // Initially, only the start node is known.
    // This is usually implemented as a min-heap or priority queue rather than a hash-set.
    openSet := {start}
    // For node n, cameFrom[n] is the node immediately preceding it on the cheapest path from start
    // to n currently known.
    cameFrom := an empty map
    // For node n, gScore[n] is the cost of the cheapest path from start to n currently known.
    gScore := map with default value of Infinity
    gScore[start] := 0
    // For node n, fScore[n] := gScore[n] + h(n). fScore[n] represents our current best guess as to
    // how cheap a path could be from start to finish if it goes through n.
    fScore := map with default value of Infinity
    fScore[start] := h(start)

    while openSet is not empty
        // This operation can occur in O(log(N)) time if openSet is a min-heap or a priority queue
        current := the node in openSet having the lowest fScore[] value
        if current = goal
            return reconstruct_path(cameFrom, current)
        openSet.Remove(current)
        for each neighbor of current
            // d(current, neighbor) is the weight of the edge from current to neighbor
            // tentative_gScore is the distance from start to the neighbor through current
            tentative_gScore := gScore[current] + d(current, neighbor)
            if tentative_gScore < gScore[neighbor]
                // This path to neighbor is better than any previous one. Record it!
                cameFrom[neighbor] := current
                gScore[neighbor] := tentative_gScore
                fScore[neighbor] := tentative_gScore + h(neighbor)
                if neighbor not in openSet
                    openSet.add(neighbor)

    // Open set is empty but goal was never reached
    return failure
I was wondering: does this algorithm assume that the heuristic function is admissible (it never overestimates the actual cost to reach the goal) and consistent (h(x) ≤ d(x, y) + h(y))?
Because I found another pseudocode of A* that is more complex:
function A*(start, goal)
    closedset := the empty set                 % The set of nodes already evaluated.
    openset := set containing the initial node % The set of tentative nodes to be evaluated.
    g_score[start] := 0                        % Distance from start along optimal path.
    came_from := the empty map                 % The map of navigated nodes.
    h_score[start] := heuristic_estimate_of_distance(start, goal)
    f_score[start] := h_score[start]           % Estimated total distance from start to goal through y.
    while openset is not empty
        x := the node in openset having the lowest f_score[] value
        if x = goal
            return reconstruct_path(came_from, goal)
        remove x from openset
        add x to closedset
        foreach y in neighbor_nodes(x)
            if y in closedset
                continue
            tentative_g_score := g_score[x] + dist_between(x, y)
            if y not in openset
                add y to openset
                tentative_is_better := true
            elseif tentative_g_score < g_score[y]
                tentative_is_better := true
            else
                tentative_is_better := false
            if tentative_is_better = true
                came_from[y] := x
                g_score[y] := tentative_g_score
                h_score[y] := heuristic_estimate_of_distance(y, goal)
                f_score[y] := g_score[y] + h_score[y]
    return failure

function reconstruct_path(came_from, current_node)
    if came_from[current_node] is set
        p = reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return the empty path
Both algorithms seem to work correctly with a Euclidean heuristic on an undirected graph whose edge weights are the Euclidean distances between nodes. But is the second pseudocode more general? Does the first one take the admissibility and consistency of the heuristic function for granted?
Neither algorithm can guarantee finding the shortest path if the heuristic function is not admissible. But you are right that the first algorithm relies on the assumption of consistency, while the second one does not.
This difference shows in the use of a closed set: the first algorithm does not maintain one. The closed set in the second algorithm collects nodes for which the shortest path from the source has already been determined. Its role is to prevent the algorithm from reconsidering such a node as the next optimal target via a different path. That can never succeed, since we already determined the shortest path to that node and it wasn't the current path.
However, if edge weights could be negative, there might be a cycle that keeps lowering the cost of a path just by running through it repeatedly. The first algorithm would get stuck in such an endless loop, while the second would not.
Some other differences between the algorithms are not essential:
Only the first version uses a map with infinity as the default value; the second version lacks this default, so it has to check whether the neighbor is in the open set. If not, this is the first visit to that neighbor and it should be made. The first version does not need this separate check, since in that case the best known distance is infinity, and the first visit surely improves on that.
The second version stores the results of the heuristic function in an array, but this brings no benefit: the value is read from the array right after being stored, so storing it is unnecessary, as the first version demonstrates.
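For reference, the first (consistent-heuristic) variant can be sketched as runnable Python using a min-heap for the open set. The graph and heuristic values below are made up for illustration; h is consistent on this graph, so no closed set is needed and stale heap entries are harmless:

```python
# Sketch of the closed-set-free A* variant with a heapq-based open set.
import heapq

def a_star(graph, h, start, goal):
    g = {start: 0.0}                        # best known gScore per node
    came_from = {}
    open_heap = [(h[start], start)]         # entries are (fScore, node)
    while open_heap:
        f, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:     # walk predecessors back to start
                current = came_from[current]
                path.append(current)
            return list(reversed(path)), g[goal]
        for neighbor, weight in graph[current]:
            tentative = g[current] + weight
            if tentative < g.get(neighbor, float("inf")):
                came_from[neighbor] = current
                g[neighbor] = tentative
                heapq.heappush(open_heap, (tentative + h[neighbor], neighbor))
    return None, float("inf")               # open set exhausted: no path

# Toy graph: adjacency lists of (neighbor, weight); h never overestimates.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
h = {"A": 3, "B": 2, "C": 1, "D": 0}
path, cost = a_star(graph, h, "A", "D")
print(path, cost)   # ['A', 'B', 'C', 'D'] 4.0
```

Instead of decreasing a key in place, this sketch pushes a fresh heap entry whenever a node's gScore improves; outdated entries are popped later but can never beat the recorded best path.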
I'm trying to implement alpha-beta pruning with transposition tables. I found the pseudocode of the algorithm on Wikipedia: https://en.wikipedia.org/wiki/Negamax#cite_note-Breuker-1
However, I believe that this pseudocode is wrong. I think that alphaOrig is useless, and instead of:
if bestValue ≤ alphaOrig
    ttEntry.Flag := UPPERBOUND
It should be:
if bestValue ≤ α
    ttEntry.Flag := UPPERBOUND
Can anyone confirm if I'm right or explain to me why I'm wrong, thanks!
Here is the pseudocode:
function negamax(node, depth, α, β, color)
    alphaOrig := α

    // Transposition Table Lookup; node is the lookup key for ttEntry
    ttEntry := TranspositionTableLookup( node )
    if ttEntry is valid and ttEntry.depth ≥ depth
        if ttEntry.Flag = EXACT
            return ttEntry.Value
        else if ttEntry.Flag = LOWERBOUND
            α := max( α, ttEntry.Value )
        else if ttEntry.Flag = UPPERBOUND
            β := min( β, ttEntry.Value )
        endif
        if α ≥ β
            return ttEntry.Value
        endif

    if depth = 0 or node is a terminal node
        return color * the heuristic value of node

    bestValue := -∞
    childNodes := GenerateMoves(node)
    childNodes := OrderMoves(childNodes)
    foreach child in childNodes
        v := -negamax(child, depth - 1, -β, -α, -color)
        bestValue := max( bestValue, v )
        α := max( α, v )
        if α ≥ β
            break

    // Transposition Table Store; node is the lookup key for ttEntry
    ttEntry.Value := bestValue
    if bestValue ≤ alphaOrig
        ttEntry.Flag := UPPERBOUND
    else if bestValue ≥ β
        ttEntry.Flag := LOWERBOUND
    else
        ttEntry.Flag := EXACT
    endif
    ttEntry.depth := depth
    TranspositionTableStore( node, ttEntry )
    return bestValue
There are different implementations of alpha-beta pruning with transposition tables available, for example the ones from Marsland: A Review of Game-Tree Pruning; Breuker: Memory versus Search in Games; and Carolus: Alpha-Beta with Sibling Prediction Pruning in Chess.
For my answer I will quote a snippet of the Talk:Negamax page:
Marsland transposition table logic is equivalent when alphaOrig in Breuker stores α after the transposition table lookup (rather than before). But consider the following case during a negamax function call:
transposition table lookup updates α because it's a "lower bound" (Breuker: alphaOrig < α Marsland: alphaOrig = α)
the move evaluation returns the same as unchanged α for bestValue (score)
update the node's transposition table entry with the same bestValue (score)
In Breuker's logic, the node's transposition table entry will update with "exact" flag (since alphaOrig < bestValue < β). In Marsland, the update will have "upper bound" flag (since score ≤ α). Optimally, the flag for the score should be "exact" rather than alternating between upper and lower bound. So I think Breuker's version is better?
In Carolus, there's no alphaOrig and no equivalent. alpha updates during move evaluation. In this case, after move evaluation, best can never be greater than alpha, and setting "exact" flag for the transposition table entry is impossible.
There is even more discussion about this on the talk page of the Negamax article.
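To make the quoted case concrete, here is a small sketch with made-up numbers showing how the two conventions classify the same search result differently. The variable names are illustrative, not from any of the cited papers:

```python
# The scenario from the quote: the TT lookup raised alpha via a stored
# lower bound, and the move evaluation then returned exactly that value.
alpha_orig = -10     # alpha before the TT lookup (Breuker's reference point)
alpha_after_tt = 3   # alpha after the TT lookup raised it
beta = 20
best_value = 3       # search returns the same value as the raised alpha

# Breuker: classify against alpha as it was before the lookup.
# alpha_orig < best_value < beta, so the entry is stored as EXACT.
breuker_flag = ("UPPERBOUND" if best_value <= alpha_orig
                else "LOWERBOUND" if best_value >= beta else "EXACT")

# Marsland-style: classify against alpha after the lookup.
# best_value <= alpha_after_tt, so the entry is stored as UPPERBOUND.
marsland_flag = ("UPPERBOUND" if best_value <= alpha_after_tt
                 else "LOWERBOUND" if best_value >= beta else "EXACT")

print(breuker_flag, marsland_flag)   # EXACT UPPERBOUND
```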
How do I know when I can stop increasing the depth in an iterative deepening algorithm with negamax, alpha-beta pruning, and transposition tables? The following pseudocode is taken from a wiki page:
function negamax(node, depth, α, β, color)
    alphaOrig := α

    // Transposition Table Lookup; node is the lookup key for ttEntry
    ttEntry := TranspositionTableLookup( node )
    if ttEntry is valid and ttEntry.depth ≥ depth
        if ttEntry.Flag = EXACT
            return ttEntry.Value
        else if ttEntry.Flag = LOWERBOUND
            α := max( α, ttEntry.Value )
        else if ttEntry.Flag = UPPERBOUND
            β := min( β, ttEntry.Value )
        endif
        if α ≥ β
            return ttEntry.Value
        endif

    if depth = 0 or node is a terminal node
        return color * the heuristic value of node

    bestValue := -∞
    childNodes := GenerateMoves(node)
    childNodes := OrderMoves(childNodes)
    foreach child in childNodes
        val := -negamax(child, depth - 1, -β, -α, -color)
        bestValue := max( bestValue, val )
        α := max( α, val )
        if α ≥ β
            break

    // Transposition Table Store; node is the lookup key for ttEntry
    ttEntry.Value := bestValue
    if bestValue ≤ alphaOrig
        ttEntry.Flag := UPPERBOUND
    else if bestValue ≥ β
        ttEntry.Flag := LOWERBOUND
    else
        ttEntry.Flag := EXACT
    endif
    ttEntry.depth := depth
    TranspositionTableStore( node, ttEntry )
    return bestValue
And this is the iterative deepening call:
while (depth < ?)
{
    depth++;
    rootNegamaxValue := negamax( rootNode, depth, -∞, +∞, 1 )
}
Of course, when I know the total number of moves in a game I could use depth < numberOfMovesLeft as an upper bound. But if this information is not given, how do I know that another call of negamax won't give a better result than the previous run? What do I need to change in the algorithm?
The short answer is: when you run out of time (and the transposition tables are irrelevant to the answer/question).
Here I assume that your evaluation function is reasonable (gives a good approximation of the position).
The main idea of combining iterative deepening with alpha-beta is the following: let's assume you have 15 seconds to come up with the best move. How far can you search? I do not know, and no one else knows. You could try searching to depth = 8 only to find that the search finished in 1 second (so you wasted 14 of the available seconds). With trial and error you find that depth = 10 gives you a result in 13 seconds, so you decide to use that all the time. But now something goes terribly wrong (your alpha-beta was not pruning well enough, some of the positions took too long to evaluate) and your result was not ready in 15 seconds. So you either made a random move or lost the game.
To make sure this never happens, it is nice to have a good result ready. So you do the following: get the best result for depth = 1 and store it. Find the best result for depth = 2 and overwrite it. And so on. From time to time check how much time is left, and if it is really close to the limit, return your best move so far.
Now you do not need to worry about time: your method returns the best result found so far. With all these recalculations of different subtrees you waste at most half of your resources (that is, if you check the whole tree, which with alpha-beta you most probably do not). An additional advantage is that you can now reorder the moves from best to worst at each depth iteration, which makes the pruning more aggressive.
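The anytime loop described above can be sketched as follows. This is only the control flow: negamax_stub is a made-up placeholder for the real search (real engines also check the clock inside the search and abort mid-iteration), and max_depth is an arbitrary safety cap:

```python
# Sketch of iterative deepening under a wall-clock budget.
import time

def negamax_stub(depth):
    # Placeholder for the real negamax search from the question.
    # Pretend a deeper search returns a better-informed score.
    return depth

def iterative_deepening(time_limit_s, max_depth=64):
    deadline = time.monotonic() + time_limit_s
    best = None
    depth = 0
    # Deepen one ply at a time, overwriting the stored result, so a
    # completed answer is always available when time runs out.
    while depth < max_depth and time.monotonic() < deadline:
        depth += 1
        best = negamax_stub(depth)
    return best    # best fully-completed result so far

print(iterative_deepening(0.05))
```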
I am trying to apply the alpha-beta pruning algorithm to this given tree.
I am stuck when I hit node C, because after expanding all the children of B, I give A ≥ -4. I then expand C to get I = -3, which IS greater than -4 (-3 ≥ -4). Do I therefore update A to -3? If so, do I then prune J and K because -3 ≥ -3? When I worked through the example, I pruned J, K, M and N. I am really uncertain about this =(
EDIT:
Another question: after exploring B and passing the value of B to A, do we pass this value to C and thus to I? I saw an example where this was the case. Here it is: http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf
However, in this example, http://web.cecs.pdx.edu/~mm/AIFall2011/alphabeta-example.pdf, it doesn't seem to pass values down; instead it seems to only propagate values upwards. I am not sure which one is correct, or whether it makes a difference at all.
After expanding all the children of B, then A has α=-4, β=∞.
When you get to I, then α=-4, β=-3. α < β so J and K are not pruned. They would need to be evaluated to make sure that they're not less than -3, lowering the evaluation of C. The value of A is updated to α=-3, β=∞ after C is expanded. You can't use the updated alpha value of A when evaluating J because it wouldn't have been updated yet.
J and K would be pruned if I was -5 instead. In that case it wouldn't matter what J and K are because we already know the evaluation of C is worse than B because -5 < -4, and J and K can only make that worse.
Each node passes the alpha and beta values to its children. The children will then update their own copies of the alpha or beta value depending on whose turn it is and return the final evaluation of that node. That is then used to update the alpha or beta value of the parent.
See Alpha-Beta pruning for example:
function alphabeta(node, depth, α, β, Player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if Player = MaxPlayer
        for each child of node
            α := max(α, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break   // Beta cut-off
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break   // Alpha cut-off
        return β

// Initial call
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
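A direct Python transcription of that pseudocode can be stepped through on a toy tree. The nested-list tree and its leaf values below are made up for illustration; leaves are heuristic values, as in the pseudocode:

```python
# Fail-hard alpha-beta, mirroring the pseudocode above.
import math

def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or not isinstance(node, list):    # terminal / leaf node
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta, False))
            if beta <= alpha:
                break                                # beta cut-off
        return alpha
    else:
        for child in node:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta, True))
            if beta <= alpha:
                break                                # alpha cut-off
        return beta

# Initial call, as in: alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
tree = [[3, 5], [6, [7, 4, 5]], [1, 6]]
print(alphabeta(tree, 3, -math.inf, math.inf, True))   # 6
```

Tracing this by hand shows the cut-offs in action: in the subtree [7, 4, 5] the 7 raises alpha above beta, pruning the 4 and 5, and in [1, 6] the 1 drops beta below alpha, pruning the 6.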
Whenever I need to refresh my understanding of the algorithm I use this:
http://homepage.ufp.pt/jtorres/ensino/ia/alfabeta.html
You can enter your tree there and step through the algorithm. The values you would want are:
3 3 3 3
-2 -4 3 etc.
I find that deducing the algorithm from an example provides a deeper understanding.