Difference between minimax and negamax algorithms? [duplicate] - algorithm

I'm confused by these two. Is Negamax just an optimization of minimax, or is Negamax a different search tree algorithm? If it is a different algorithm, which one is better?

Extracted information from here
Negamax is a simplification of minimax that uses the following property:
max(a, b) = -min(-a, -b)
So, instead of the conditional value computation in minimax, which is the following:
if maximizingPlayer then
    value := −∞
    for each child of node do
        value := max(value, minimax(child, depth − 1, FALSE))
    return value
else (* minimizing player *)
    value := +∞
    for each child of node do
        value := min(value, minimax(child, depth − 1, TRUE))
    return value
you have a single line that does the same in Negamax:
value := max(value, −negamax(child, depth − 1, −color))
and the boolean is replaced by the notion (in the article) of color, which is just a value of 1 or −1 that alternates between player turns (i.e., whether the next turn should minimize or maximize).
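To make the relationship concrete, here is a minimal Python sketch of both formulations. The toy game tree and the children/evaluate helpers are made up just for this example; leaf scores are from the first (maximizing) player's point of view:

# Toy game tree for illustration: node -> children, leaves -> scores.
TREE = {"A": ["B", "C"]}
SCORES = {"B": 5, "C": 2}

def children(node):
    return TREE.get(node, [])

def evaluate(node):
    return SCORES.get(node, 0)

def minimax(node, depth, maximizing_player):
    if depth == 0 or not children(node):
        return evaluate(node)
    if maximizing_player:
        return max(minimax(c, depth - 1, False) for c in children(node))
    return min(minimax(c, depth - 1, True) for c in children(node))

def negamax(node, depth, color):
    # color is +1 for the maximizing player, -1 for the minimizing one;
    # max(a, b) = -min(-a, -b) collapses the two branches into one.
    if depth == 0 or not children(node):
        return color * evaluate(node)
    return max(-negamax(c, depth - 1, -color) for c in children(node))

# Both formulations agree: negamax is the same search, not a different one.
assert minimax("A", 2, True) == negamax("A", 2, 1) == 5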

Related

Is this A* pseudocode taking admissibility and consistency of the heuristic function for granted?

This is the pseudocode of A* from wiki (https://en.wikipedia.org/wiki/A*_search_algorithm):
function reconstruct_path(cameFrom, current)
    total_path := {current}
    while current in cameFrom.Keys:
        current := cameFrom[current]
        total_path.prepend(current)
    return total_path
// A* finds a path from start to goal.
// h is the heuristic function. h(n) estimates the cost to reach goal from node n.
function A_Star(start, goal, h)
    // The set of discovered nodes that may need to be (re-)expanded.
    // Initially, only the start node is known.
    // This is usually implemented as a min-heap or priority queue rather than a hash-set.
    openSet := {start}

    // For node n, cameFrom[n] is the node immediately preceding it on the cheapest path from start
    // to n currently known.
    cameFrom := an empty map

    // For node n, gScore[n] is the cost of the cheapest path from start to n currently known.
    gScore := map with default value of Infinity
    gScore[start] := 0

    // For node n, fScore[n] := gScore[n] + h(n). fScore[n] represents our current best guess as to
    // how cheap a path could be from start to finish if it goes through n.
    fScore := map with default value of Infinity
    fScore[start] := h(start)

    while openSet is not empty
        // This operation can occur in O(Log(N)) time if openSet is a min-heap or a priority queue
        current := the node in openSet having the lowest fScore[] value
        if current = goal
            return reconstruct_path(cameFrom, current)

        openSet.Remove(current)
        for each neighbor of current
            // d(current,neighbor) is the weight of the edge from current to neighbor
            // tentative_gScore is the distance from start to the neighbor through current
            tentative_gScore := gScore[current] + d(current, neighbor)
            if tentative_gScore < gScore[neighbor]
                // This path to neighbor is better than any previous one. Record it!
                cameFrom[neighbor] := current
                gScore[neighbor] := tentative_gScore
                fScore[neighbor] := tentative_gScore + h(neighbor)
                if neighbor not in openSet
                    openSet.add(neighbor)

    // Open set is empty but goal was never reached
    return failure
I was wondering: is this algorithm assuming that the heuristic function is admissible (never overestimates the actual cost to get to the goal) and consistent (h(x) ≤ d(x, y) + h(y))?
Because I found another pseudocode of A* that is more complex:
function A*(start, goal)
    closedset := the empty set                 % The set of nodes already evaluated.
    openset := set containing the initial node % The set of tentative nodes to be evaluated.
    g_score[start] := 0                        % Distance from start along optimal path.
    came_from := the empty map                 % The map of navigated nodes.
    h_score[start] := heuristic_estimate_of_distance(start, goal)
    f_score[start] := h_score[start]           % Estimated total distance from start to goal through y.
    while openset is not empty
        x := the node in openset having the lowest f_score[] value
        if x = goal
            return reconstruct_path(came_from, goal)
        remove x from openset
        add x to closedset
        foreach y in neighbor_nodes(x)
            if y in closedset
                continue
            tentative_g_score := g_score[x] + dist_between(x, y)
            if y not in openset
                add y to openset
                tentative_is_better := true
            elseif tentative_g_score < g_score[y]
                tentative_is_better := true
            else
                tentative_is_better := false
            if tentative_is_better = true
                came_from[y] := x
                g_score[y] := tentative_g_score
                h_score[y] := heuristic_estimate_of_distance(y, goal)
                f_score[y] := g_score[y] + h_score[y]
    return failure

function reconstruct_path(came_from, current_node)
    if came_from[current_node] is set
        p = reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return the empty path
Both algorithms seem to work correctly with a Euclidean heuristic function on an undirected graph that has the Euclidean distance between nodes as its edge weight. But is the second pseudocode more general? Does the first one take for granted the admissibility and the consistency of the heuristic function?
Neither algorithm can guarantee finding the shortest path if the heuristic function is not admissible. But you are right that the first algorithm relies on the assumption of consistency, while the second one does not.
This difference is expressed in the use of a closed set: the first algorithm does not maintain a closed set. The closed set of the second algorithm collects nodes for which the shortest path from the source to that node has already been determined. The role of this closed set is to prevent the algorithm from reconsidering such a node as the next optimal target via a different path. That reconsideration could never succeed, as we already determined the shortest path to that node and it wasn't the current path.
However, if edge weights could be negative, then there might be a cycle that keeps making the cost of a path less and less, just by running through that cycle repeatedly. The first algorithm would get stuck in such an endless loop, while the second wouldn't.
Some other differences between the algorithms are not essential:
Only the first version uses a map that has infinity as its default value; the second version has no such default, so it has to check whether the neighbor is already in the open set. If it is not, this is the first visit to that neighbor, and the visit must be made. The first version does not need this separate check: in that case the best known distance is infinity, so the first visit is sure to improve on it.
The second version stores the results of the heuristic function in a map, but this brings no benefit: the value is read back right after it is stored, so storing it is unnecessary, as the first version demonstrates.
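To make the first variant concrete, here is a minimal Python sketch of it; the adjacency-dict graph format and the callable h are assumptions of this sketch, not part of the pseudocode above:

import heapq
from math import inf

def a_star(graph, start, goal, h):
    # graph: dict mapping node -> dict of {neighbor: edge weight}
    # h: heuristic estimate from a node to the goal (assumed consistent)
    open_heap = [(h(start), start)]   # min-heap ordered by fScore
    came_from = {}
    g_score = {start: 0}              # missing keys mean infinity
    while open_heap:
        _, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return list(reversed(path))
        for neighbor, weight in graph[current].items():
            tentative = g_score[current] + weight
            if tentative < g_score.get(neighbor, inf):
                came_from[neighbor] = current
                g_score[neighbor] = tentative
                heapq.heappush(open_heap, (tentative + h(neighbor), neighbor))
    return None  # open set exhausted but goal was never reached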

Iterative Deepening A* graph with cycles implementation

I have been trying to implement the Iterative Deepening A* algorithm on a graph with cycles. I have looked at the pseudocode from Wikipedia, found below:
node              current node
g                 the cost to reach current node
f                 estimated cost of the cheapest path (root..node..goal)
h(node)           estimated cost of the cheapest path (node..goal)
cost(node, succ)  step cost function
is_goal(node)     goal test
successors(node)  node expanding function, expand nodes ordered by g + h(node)

procedure ida_star(root)
    bound := h(root)
    loop
        t := search(root, 0, bound)
        if t = FOUND then return bound
        if t = ∞ then return NOT_FOUND
        bound := t
    end loop
end procedure

function search(node, g, bound)
    f := g + h(node)
    if f > bound then return f
    if is_goal(node) then return FOUND
    min := ∞
    for succ in successors(node) do
        t := search(succ, g + cost(node, succ), bound)
        if t = FOUND then return FOUND
        if t < min then min := t
    end for
    return min
end function
However, the problem is that this pseudocode does not deal with cycles: when the search enters a cycle, it does not terminate. How can this be handled?
I recommend you create two node lists and check them for each iteration (see the sketch after the list):
Open list: Contains nodes that have not been expanded. Sorted by the evaluation function f(n) = g(n) + h(n). Initially contains the root. To expand a node you take the first one from the list, and you then add its successors to the list.
Closed list: Contains nodes that have been expanded. When you are about to expand a node, you check that it is not in the closed list. If it is, you discard it.
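As an illustration of the closed-list idea inside IDA*'s depth-first search, a lightweight variant is to track the set of nodes on the current path and skip successors already on it, which is enough to break cycles. A minimal sketch, assuming the same h, cost, is_goal, and successors helpers as the pseudocode above:

from math import inf

FOUND = object()  # sentinel, as in the pseudocode

def ida_star(root):
    bound = h(root)
    path = {root}                # nodes on the current search path
    while True:
        t = search(root, 0, bound, path)
        if t is FOUND:
            return bound
        if t == inf:
            return None          # NOT_FOUND
        bound = t

def search(node, g, bound, path):
    f = g + h(node)
    if f > bound:
        return f
    if is_goal(node):
        return FOUND
    minimum = inf
    for succ in successors(node):
        if succ in path:         # already on the current path: a cycle, skip it
            continue
        path.add(succ)
        t = search(succ, g + cost(node, succ), bound, path)
        path.remove(succ)
        if t is FOUND:
            return FOUND
        if t < minimum:
            minimum = t
    return minimum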
Hope this helps.

When to terminate iterative deepening with alpha beta pruning and transposition tables?

How do I know when I can stop increasing the depth for an iterative deepening algorithm with negamax alpha-beta pruning and transposition tables? The following pseudocode is taken from a wiki page:
function negamax(node, depth, α, β, color)
    alphaOrig := α

    // Transposition Table Lookup; node is the lookup key for ttEntry
    ttEntry := TranspositionTableLookup(node)
    if ttEntry is valid and ttEntry.depth ≥ depth
        if ttEntry.Flag = EXACT
            return ttEntry.Value
        else if ttEntry.Flag = LOWERBOUND
            α := max(α, ttEntry.Value)
        else if ttEntry.Flag = UPPERBOUND
            β := min(β, ttEntry.Value)
        endif
        if α ≥ β
            return ttEntry.Value
        endif

    if depth = 0 or node is a terminal node
        return color * the heuristic value of node

    bestValue := -∞
    childNodes := GenerateMoves(node)
    childNodes := OrderMoves(childNodes)
    foreach child in childNodes
        val := -negamax(child, depth - 1, -β, -α, -color)
        bestValue := max(bestValue, val)
        α := max(α, val)
        if α ≥ β
            break

    // Transposition Table Store; node is the lookup key for ttEntry
    ttEntry.Value := bestValue
    if bestValue ≤ alphaOrig
        ttEntry.Flag := UPPERBOUND
    else if bestValue ≥ β
        ttEntry.Flag := LOWERBOUND
    else
        ttEntry.Flag := EXACT
    endif
    ttEntry.depth := depth
    TranspositionTableStore(node, ttEntry)

    return bestValue
And this is the iterative deepening call:
while (depth < ?)
{
    depth++;
    rootNegamaxValue := negamax(rootNode, depth, -∞, +∞, 1)
}
Of course, when I know the total number of moves in a game I could use depth < numberOfMovesLeft as an upper bound. But if this information is not given, when do I know that another call of negamax doesn't give any better result than the previous run? What do I need to change in the algorithm?
The short answer is: when you run out of time (and the transposition tables are irrelevant to the answer/question).
Here I assume that your evaluation function is reasonable (gives a good approximation of the position).
The main idea of combining iterative deepening with alpha-beta is the following: let's assume that you have 15 seconds to come up with the best move. How far can you search? I do not know, and nobody else knows. You could try to search to depth = 8, only to find out that the search finished in 1 second (so you wasted 14 of your available seconds). With trial and error you find that depth = 10 gives you a result in 13 seconds, so you decide to use that all the time. But now something goes terribly wrong (your alpha-beta was not pruning well enough, some of the positions took too much time to evaluate) and your result is not ready in 15 seconds. So you either make a random move or lose the game.
So that this never happens, it is nice to have a good result ready. So you do the following: get the best result for depth = 1 and store it. Find the best result for depth = 2, and overwrite it. And so on. From time to time check how much time is left, and if it is really close to the time limit, return your best move.
Now you do not need to worry about the time: your method will give the best result you have found so far. With all these recalculations of different subtrees you only waste about half of your resources (that is, if you check the whole tree; with alpha-beta pruning you most probably do not). An additional advantage is that you can reorder the moves from best to worst on each depth iteration, which makes pruning more aggressive.
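A minimal sketch of that time-managed loop, assuming a negamax like the pseudocode above (the names and the budget check are illustrative only):

import time
from math import inf

def iterative_deepening(root_node, seconds):
    # Keep the result of the deepest fully completed search.
    deadline = time.monotonic() + seconds
    best_value, depth = None, 0
    while time.monotonic() < deadline:
        depth += 1
        best_value = negamax(root_node, depth, -inf, inf, 1)
        # A real engine would also poll the clock inside negamax and abort
        # the current iteration, discarding its incomplete result.
    return best_value, depth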

Minimax Algorithm Explanation

I'm looking at this pseudocode for the Minimax algorithm:
Function Minimax-Decision(state) returns an action
    ; inputs: state (current game state)
    ; 'E' means element of, 'a' is the action
    return a E Actions(state) maximizing Min-Value(Result(a, state))

Function Max-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v <-- -infinity
    for a, s in Successors(state) do v <-- Max(v, Min-Value(s))
    return v

Function Min-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v <-- infinity
    for a, s in Successors(state) do v <-- Min(v, Max-Value(s))
    return v
I think I know how the Minimax algorithm works. You fill in "score" values from the bottom up, starting from the utility scores. A Max node takes the largest value of its children, a Min node the smallest. Max predicts that Min will always try to put Max in the worst position possible for the next turn, and using that knowledge, Max tries to make the best move possible.
So, for a tree with order Max, Min, Max from the top:
1) Max wants to make the best move for himself (max utility), so C1=5, C2=11, C3=8, etc.
2) Max predicts that Min will want to put Max in the worst position possible (restrict Max to the smallest utility), so B1=5, B2=2, B3=3
3) Max wants to make the best possible move, so A = B1 = 5
What's confusing me about the pseudocode is the double recursion, and the purpose of v. Can anyone break this down for me?
Thanks for reading everyone!
I think you can understand this code by walking through an informal proof by induction that it works for trees of depth d.
For depth 1 you just have a single node and both Min-Value and Max-Value return the utility of this node.
For depth d > 1, consider Min-Value first. v starts off as infinity, so at the first call to Min(v, Max-Value(s)), v gets set to the utility of a child node as computed by Max-Value (it is Max's turn after Min), and we know that value is correct by induction because the child is of depth d-1. (This first call amounts to an assignment, because v starts at infinity.) Later calls to Min(v, Max-Value(s)) in this routine end up computing the Min of Max-Value(s) over all children of the node passed in. So Min-Value ends up computing the minimum utility over all children of the node passed in, which is the value of that node to Min when it is Min's turn to move.
The argument for Max-Value is much the same as for Min-Value, so the induction tells you that, for trees of any depth, both Min-Value and Max-Value return to you the value of the node passed to them, when it is Max or Min's turn to move and make the choices associated with that node.
You could also show by induction that what this code does is equivalent to what you have described.
So the double recursion arises because it allows Max and Min to take alternate turns as you walk up the tree from the leaves, and v is a temporary which is used to work out the min or max value of all the child nodes of a tree.
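In Python form the double recursion looks like this; terminal_test, utility, and successors (returning (action, state) pairs) are assumed helpers, not part of the original pseudocode:

def minimax_decision(state):
    # Max moves first: pick the action whose resulting state has the
    # highest value when Min replies optimally.
    best_action, _ = max(successors(state), key=lambda pair: min_value(pair[1]))
    return best_action

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = float("-inf")
    for action, result in successors(state):
        v = max(v, min_value(result))  # Min replies to each of Max's moves
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = float("inf")
    for action, result in successors(state):
        v = min(v, max_value(result))  # Max replies to each of Min's moves
    return v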

What can be the efficient approach to solve the 8 puzzle problem?

The 8-puzzle is a square board with 9 positions, filled by 8 numbered tiles and one gap. At any point, a tile adjacent to the gap can be moved into the gap, creating a new gap position. In other words, the gap can be swapped with an adjacent (horizontally or vertically) tile. The objective in the game is to begin with an arbitrary configuration of tiles, and move them so as to get the numbered tiles arranged in ascending order, either running around the perimeter of the board or ordered from left to right, with 1 in the top left-hand position.
I was wondering what approach will be efficient to solve this problem?
I will just attempt to rewrite the previous answer with more details on why it is optimal.
The A* algorithm, taken directly from Wikipedia, is:
function A*(start, goal)
    closedset := the empty set                 // The set of nodes already evaluated.
    openset := set containing the initial node // The set of tentative nodes to be evaluated.
    came_from := the empty map                 // The map of navigated nodes.
    g_score[start] := 0                        // Distance from start along optimal path.
    h_score[start] := heuristic_estimate_of_distance(start, goal)
    f_score[start] := h_score[start]           // Estimated total distance from start to goal through y.
    while openset is not empty
        x := the node in openset having the lowest f_score[] value
        if x = goal
            return reconstruct_path(came_from, came_from[goal])
        remove x from openset
        add x to closedset
        foreach y in neighbor_nodes(x)
            if y in closedset
                continue
            tentative_g_score := g_score[x] + dist_between(x, y)
            if y not in openset
                add y to openset
                tentative_is_better := true
            elseif tentative_g_score < g_score[y]
                tentative_is_better := true
            else
                tentative_is_better := false
            if tentative_is_better = true
                came_from[y] := x
                g_score[y] := tentative_g_score
                h_score[y] := heuristic_estimate_of_distance(y, goal)
                f_score[y] := g_score[y] + h_score[y]
    return failure

function reconstruct_path(came_from, current_node)
    if came_from[current_node] is set
        p = reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return current_node
So let me fill in all the details here.
heuristic_estimate_of_distance is the function Σ d(x_i), where d(·) is the Manhattan distance of each square x_i from its goal state.
So the setup
1 2 3
4 7 6
8 5 _
would have a heuristic_estimate_of_distance of 1 + 2 + 1 = 4, since 8 and 5 are each one square away from their goal positions (d = 1) and 7 is two squares away from its goal position (d(7) = 2).
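Spelled out in Python, with boards represented as tuples of 9 numbers, row by row, and 0 for the gap (a representation chosen just for this sketch):

def manhattan_heuristic(board, goal):
    # Sum of Manhattan distances of every tile (gap excluded) to its goal cell.
    total = 0
    for index, tile in enumerate(board):
        if tile == 0:
            continue
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total

# The example above: 8 and 5 each contribute 1, tile 7 contributes 2, total 4.
board = (1, 2, 3, 4, 7, 6, 8, 5, 0)
goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
assert manhattan_heuristic(board, goal) == 4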
The set of nodes that the A* searches over is defined to be the starting position followed by all possible legal positions. That is, let's say the starting position x is as above:
x =
1 2 3
4 7 6
8 5 _
then the function neighbor_nodes(x) produces the 2 possible legal moves:
1 2 3
4 7 _
8 5 6
or
1 2 3
4 7 6
8 _ 5
The function dist_between(x, y) is defined as the number of square moves that took place to transition from state x to y. For the purposes of this algorithm it is always equal to 1.
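Under the same tuple representation as in the heuristic sketch above, neighbor_nodes amounts to swapping the gap with each adjacent tile:

def neighbor_nodes(board):
    # All boards reachable by sliding one adjacent tile into the gap (0).
    gap = board.index(0)
    row, col = gap // 3, gap % 3
    neighbors = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = r * 3 + c
            new_board = list(board)
            new_board[gap], new_board[other] = new_board[other], new_board[gap]
            neighbors.append(tuple(new_board))
    return neighbors

# The position above has its gap in the bottom-right corner: only 2 legal moves.
assert len(neighbor_nodes((1, 2, 3, 4, 7, 6, 8, 5, 0))) == 2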
closedset and openset are both specific to the A* algorithm and can be implemented using standard data structures (priority queues, I believe). came_from is a data structure used to reconstruct the solution found using the function reconstruct_path, whose details can be found on Wikipedia. If you do not wish to remember the solution, you do not need to implement this.
Last, I will address the issue of optimality. Consider the excerpt from the A* wikipedia article:
"If the heuristic function h is admissible, meaning that it never overestimates the actual minimal cost of reaching the goal, then A* is itself admissible (or optimal) if we do not use a closed set. If a closed set is used, then h must also be monotonic (or consistent) for A* to be optimal. This means that for any pair of adjacent nodes x and y, where d(x,y) denotes the length of the edge between them, we must have:
h(x) ≤ d(x,y) + h(y)"
So it suffices to show that our heuristic is admissible and monotonic. For the former (admissibility), note that given any configuration, our heuristic (the sum of all distances) assumes each square is unconstrained by the rules and can move freely towards its goal position, which is clearly an optimistic estimate. Hence our heuristic is admissible: it never over-estimates, since reaching a goal position will always take at least as many moves as the heuristic estimates.
The monotonicity requirement stated in words is:
"The heuristic cost (estimated distance to goal state) of any node must be less than or equal to the cost of transitioning to any adjacent node plus the heuristic cost of that node."
It is mainly to prevent the possibility of negative cycles, where transitioning to an unrelated node may decrease the distance to the goal node more than the cost of actually making the transition, suggesting a poor heuristic.
Showing monotonicity is pretty simple in our case. Any adjacent nodes x, y have d(x,y) = 1 by our definition of d. Thus we need to show
h(x) ≤ h(y) + 1
which is equivalent to
h(x) − h(y) ≤ 1
which is equivalent to
Σ d(x_i) − Σ d(y_i) ≤ 1
which is equivalent to
Σ (d(x_i) − d(y_i)) ≤ 1
We know by our definition of neighbor_nodes(x) that two neighbouring nodes x, y can differ in the position of at most one square, meaning that in our sums the term
d(x_i) − d(y_i) = 0
for all but one value of i. Let's say without loss of generality this is true of i = k. Furthermore, we know that for i = k the square has moved at most one place, so its distance to a goal state can be at most one more than in the previous state, thus:
Σ (d(x_i) − d(y_i)) = d(x_k) − d(y_k) ≤ 1
showing monotonicity. This shows what needed to be shown, thus proving this algorithm will be optimal (in a big-O notation or asymptotic kind of way.)
Note that I have shown optimality in terms of big-O notation, but there is still lots of room to play in terms of tweaking the heuristic. You can add additional twists to it so that it is a closer estimate of the actual distance to the goal state; however, you have to make sure that the heuristic is always an underestimate, otherwise you lose optimality!
EDIT MANY MOONS LATER
Reading this over again (much) later, I realized the way I wrote it sort of confounds the meaning of optimality of this algorithm.
There are two distinct meanings of optimality I was trying to get at here:
1) The algorithm produces an optimal solution, that is the best possible solution given the objective criteria.
2) The algorithm expands the least number of state nodes of all possible algorithms using the same heuristic.
The simplest way to understand why you need admissibility and monotonicity of the heuristic to obtain 1) is to view A* as an application of Dijkstra's shortest path algorithm on a graph where the edge weights are given by the node distance traveled thus far plus the heuristic distance. Without these two properties, we could have negative edges in the graph, so negative cycles would be possible, and Dijkstra's shortest path algorithm would no longer return the correct answer! (Construct a simple example of this to convince yourself.)
2) is actually quite confusing to understand. To state it fully one needs a lot of quantifiers; for example, when talking about other algorithms, one refers to algorithms similar to A* that expand nodes and search without a priori information (other than the heuristic). Obviously, one can construct a trivial counter-example otherwise, such as an oracle or genie that tells you the answer at every step of the way. To understand this statement in depth I highly suggest reading the last paragraph in the History section on Wikipedia, as well as looking into all the citations and footnotes in that carefully stated sentence.
I hope this clears up any remaining confusion among would-be readers.
You can use a heuristic that is based on the positions of the numbers: the higher the overall sum of the distances of each tile from its goal state, the higher the heuristic value. Then you can implement A* search, which can be proved to be optimal in terms of time and space complexity (provided the heuristic is monotonic and admissible). http://en.wikipedia.org/wiki/A*_search_algorithm
Since the OP cannot post a picture, this is what he's talking about: [image of an 8-puzzle board]
As far as solving this puzzle goes, take a look at the iterative deepening depth-first search algorithm, as made relevant to the 8-puzzle problem by this page.
Donut's got it! IDDFS will do the trick, considering the relatively limited search space of this puzzle. It would be efficient, and hence answers the OP's question. It would find the optimal solution, but not necessarily in optimal complexity.
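For reference, a generic IDDFS sketch (a depth-limited DFS wrapped in a growing depth limit); is_goal and neighbors stand for whatever goal test and move generator you use for the puzzle:

def iddfs(start, is_goal, neighbors, max_depth=40):
    # Iterative deepening DFS: repeat depth-limited DFS with a growing limit,
    # so the first solution found uses the fewest moves.
    def dls(node, depth, path):
        if is_goal(node):
            return path
        if depth == 0:
            return None
        for nxt in neighbors(node):
            if nxt in path:          # avoid cycling back along the current path
                continue
            found = dls(nxt, depth - 1, path + [nxt])
            if found is not None:
                return found
        return None

    for limit in range(max_depth + 1):
        result = dls(start, limit, [start])
        if result is not None:
            return result            # list of states from start to goal
    return None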
Implementing IDDFS would be the more complicated part of this problem; I just want to suggest a simple approach to managing the board, the game's rules, etc. This in particular addresses a way to obtain initial states for the puzzle which are solvable. As hinted in the notes of the question, not all random assignments of 9 tiles (considering the empty slot a special tile) will yield a solvable puzzle. It is a matter of mathematical parity... So, here's a suggestion to model the game:
Make the list of all 3x3 permutation matrices which represent valid "moves" of the game.
Such a list is a subset of the 3x3 matrices with all zeros and two ones. Each matrix gets an ID, which will be quite convenient for keeping track of the moves in the IDDFS search tree. An alternative to matrices is to have two-tuples of the tile position numbers to swap; this may lead to a faster implementation.
Such matrices can be used to create the initial puzzle state, starting with the "win" state and running an arbitrary number of permutations selected at random. In addition to ensuring that the initial state is solvable, this approach also provides an indicative number of moves with which a given puzzle can be solved.
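As a sketch of that scrambling idea (reusing the tuple representation and the neighbor_nodes sketch from the A* answer above), each step is a legal move away from the solved board, so the result is always solvable:

import random

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def scrambled_start(moves, seed=None):
    # Walk `moves` random legal moves away from the solved board; the
    # result is guaranteed solvable, in at most `moves` moves.
    rng = random.Random(seed)
    board = GOAL
    for _ in range(moves):
        board = rng.choice(neighbor_nodes(board))
    return board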
Now let's just implement the IDDFS algo and [joke]return the assignement for an A+[/joke]...
This is an example of the classical shortest path algorithm. You can read more about shortest path here and here.
In short, think of all possible states of the puzzle as vertices in some graph. With each move you change states, so each valid move represents an edge of the graph. Since no move is more expensive than another, you may think of the cost of each move as being 1. The following C++-like pseudocode will work for this problem:
int solve()
{
    int[][] field = new int[3][3];
    // fill the input here
    map<string, int> path;  // encoded state -> number of moves to reach it
    queue<string> q;
    put(field, 0); // we can get to the starting position in 0 turns
    while (!q.empty()) {
        string v = q.poll();
        int[][] take = decode(v);
        int time = path.get(v);
        if (isFinalPosition(take)) {
            return time;
        }
        for each valid move from take to int[][] newPosition {
            put(newPosition, time + 1);
        }
    }
    // no path
    return -1;
}

bool isFinalPosition(int[][] q) {
    return encode(q) == "123456780"; // 0 represents empty space
}

void put(int[][] position, int time) {
    string s = encode(position);
    if (!path.contains(s)) { // visit each state at most once
        path.put(s, time);
        q.add(s); // newly discovered states still have to be expanded
    }
}

string encode(int[][] field) {
    string s = "";
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            s += field[i][j];
    return s;
}

int[][] decode(string s) {
    int[][] ans = new int[3][3];
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            ans[i][j] = s[i * 3 + j];
    return ans;
}
See this link for my parallel iterative deepening search for a solution to the 15-puzzle, which is the 4x4 big-brother of the 8-puzzle.
