What algorithm for a tic-tac-toe game can I use to determine the "best move" for the AI?

In a tic-tac-toe implementation I guess that the challenging part is to determine the best move to be played by the machine.
What are the algorithms that can be pursued? I'm looking at implementations from simple to complex. How would I go about tackling this part of the problem?

The strategy from Wikipedia for playing a perfect game (win or tie every time) seems like straightforward pseudo-code:
Quote from Wikipedia (Tic Tac Toe#Strategy)
A player can play a perfect game of Tic-tac-toe (to win or, at least, draw) if they choose the first available move from the following list, each turn, as used in Newell and Simon's 1972 tic-tac-toe program.[6]
Win: If you have two in a row, play the third to get three in a row.
Block: If the opponent has two in a row, play the third to block them.
Fork: Create an opportunity where you can win in two ways.
Block Opponent's Fork:
Option 1: Create two in a row to force the opponent into defending, as long as it doesn't result in them creating a fork or winning. For example, if "X" has a corner, "O" has the center, and "X" has the opposite corner as well, "O" must not play a corner in order to win. (Playing a corner in this scenario creates a fork for "X" to win.)
Option 2: If there is a configuration where the opponent can fork, block that fork.
Center: Play the center.
Opposite Corner: If the opponent is in the corner, play the opposite
corner.
Empty Corner: Play an empty corner.
Empty Side: Play an empty side.
Recognizing what a "fork" situation looks like could be done in a brute-force manner as suggested.
Note: A "perfect" opponent is a nice exercise but ultimately not worth 'playing' against. You could, however, alter the priorities above to give characteristic weaknesses to opponent personalities.

What you need (for tic-tac-toe or a far more difficult game like Chess) is the minimax algorithm, or its slightly more complicated variant, alpha-beta pruning. Ordinary naive minimax will do fine for a game with as small a search space as tic-tac-toe, though.
In a nutshell, what you want to do is not to search for the move that has the best possible outcome for you, but rather for the move where the worst possible outcome is as good as possible. If you assume your opponent is playing optimally, you have to assume they will take the move that is worst for you, and therefore you have to take the move that MINimises their MAXimum gain.
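The idea can be made concrete with a minimal negamax-style sketch in Python (my own illustration, not from any particular answer; the board is assumed to be a 9-character string of 'X', 'O' and spaces):

```python
# A minimal negamax search: the best move is the one whose worst-case
# outcome (after the opponent's best reply) is highest.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def best_move(board, player):
    """Return (score, move) for `player`: +1 forced win, 0 draw, -1 loss."""
    opponent = 'O' if player == 'X' else 'X'
    if winner(board) == opponent:
        return -1, None               # the opponent's last move already won
    if ' ' not in board:
        return 0, None                # board full: draw
    best = (-2, None)
    for i, sq in enumerate(board):
        if sq == ' ':
            child = board[:i] + player + board[i+1:]
            score = -best_move(child, opponent)[0]  # their best is our worst
            if score > best[0]:
                best = (score, i)
    return best
```

Note the single negation: maximising your own minimum outcome and minimising the opponent's maximum outcome are the same thing when scores are symmetric.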

The brute force method of generating every single possible board and scoring it based on the boards it later produces further down the tree doesn't require much memory, especially once you recognize that 90 degree board rotations are redundant, as are flips about the vertical, horizontal, and diagonal axes.
Once you get to that point, there's something like less than 1k of data in a tree graph to describe the outcome, and thus the best move for the computer.
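That symmetry reduction can be sketched as follows (the tuple representation and canonical-form trick are my own illustration, not part of the original answer):

```python
# Canonicalize a 3x3 board under its 8 symmetries (4 rotations x flip),
# so that symmetric positions share one table entry.
def rotate(b):                      # 90-degree clockwise rotation of a 9-tuple
    return (b[6], b[3], b[0],
            b[7], b[4], b[1],
            b[8], b[5], b[2])

def flip(b):                        # horizontal mirror
    return (b[2], b[1], b[0],
            b[5], b[4], b[3],
            b[8], b[7], b[6])

def canonical(b):
    variants = []
    for base in (b, flip(b)):
        for _ in range(4):
            variants.append(base)
            base = rotate(base)
    return min(variants)            # a fixed representative of the 8 variants
```

All four "lone X in a corner" boards collapse to one canonical form, which is what shrinks the tree so dramatically.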
-Adam

A typical algorithm for tic-tac-toe looks like this:
Board : A nine-element vector representing the board. We store 2 (indicating
Blank), 3 (indicating X), or 5 (indicating O).
Turn: An integer indicating which move of the game is about to be played.
The 1st move is indicated by 1, the last by 9.
The Algorithm
The main algorithm uses three functions.
Make2: returns 5 if the center square of the board is blank i.e. if board[5]=2. Otherwise, this function returns any non-corner square (2, 4, 6 or 8).
Posswin(p): Returns 0 if player p can't win on his next move; otherwise, it returns the number of the square that constitutes a winning move. This function enables the program both to win and to block the opponent's win. It operates by checking each of the rows, columns, and diagonals. By multiplying the values of the squares in an entire row (or column or diagonal), the possibility of a win can be checked: if the product is 18 (3 x 3 x 2), then X can win; if the product is 50 (5 x 5 x 2), then O can win. If a winning row (column or diagonal) is found, the blank square in it is determined and the number of that square is returned.
Go(n): Makes a move in square n. This procedure sets Board[n] to 3 if Turn is odd, or 5 if Turn is even. It also increments Turn by one.
The algorithm has a built-in strategy for each move. It makes the odd numbered
move if it plays X, the even-numbered move if it plays O.
Turn = 1 Go(1) (upper left corner).
Turn = 2 If Board[5] is blank, Go(5), else Go(1).
Turn = 3 If Board[9] is blank, Go(9), else Go(3).
Turn = 4 If Posswin(X) is not 0, then Go(Posswin(X)) i.e. [ block opponent’s win], else Go(Make2).
Turn = 5 If Posswin(X) is not 0, then Go(Posswin(X)) [i.e. win], else if Posswin(O) is not 0, then Go(Posswin(O)) [i.e. block win], else if Board[7] is blank, then Go(7), else Go(3). [to explore other possibilities if there are any]
Turn = 6 If Posswin(O) is not 0 then Go(Posswin(O)), else if Posswin(X) is not 0, then Go(Posswin(X)), else Go(Make2).
Turn = 7 If Posswin(X) is not 0 then Go(Posswin(X)), else if Posswin(O) is not 0, then Go(Posswin(O)), else go anywhere that is blank.
Turn = 8 if Posswin(O) is not 0 then Go(Posswin(O)), else if Posswin(X) is not 0, then Go(Posswin(X)), else go anywhere that is blank.
Turn = 9 Same as Turn=7.
I have used it. Let me know how you guys feel.
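The Posswin product test described above can be sketched in Python (a loose adaptation, not the original 1972 program; the 1-indexed board list is an assumption for readability):

```python
# Board: 10-element list, index 0 unused so squares are numbered 1..9.
# 2 = blank, 3 = X, 5 = O, as in the encoding described above.
BLANK, X, O = 2, 3, 5
LINES = [(1,2,3),(4,5,6),(7,8,9),(1,4,7),(2,5,8),(3,6,9),(1,5,9),(3,5,7)]

def posswin(board, p):
    """Return the square that completes a win for p (3 or 5), or 0 if none."""
    target = p * p * BLANK            # 18 for X, 50 for O
    for line in LINES:
        product = board[line[0]] * board[line[1]] * board[line[2]]
        if product == target:         # exactly two p's and one blank
            for sq in line:
                if board[sq] == BLANK:
                    return sq
    return 0
```

The prime encoding means a single multiplication distinguishes "two X's and a blank" from every other combination in a line.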

Since you're only dealing with a 3x3 matrix of possible locations, it'd be pretty easy to just write a search through all possibilities without taxing your computing power. For each open space, compute all the possible outcomes after marking that space (recursively, I'd say), then use the move with the most possibilities of winning.
Optimizing this would be a waste of effort, really. Though some easy ones might be:
Check first for possible wins for the other team and block the first one you find (if there are two, the game is over anyway).
Always take the center if it's open (and the previous rule has no candidates).
Take corners ahead of sides (again, if the previous rules don't apply).

You can have the AI play itself in some sample games to learn from. Use a supervised learning algorithm to help it along.

An attempt without using a playing field.

to win (you have a double)
if not, avoid losing (the opponent has a double)
if not, check whether you already have a fork (a double double)
if not, if the opponent has a fork:
    search the blocking points for a possible double and fork (ultimate win)
    if not, search the blocking points for forks (choose the one that gives the opponent the most losing possibilities)
    if not, just block (so as not to lose)
if not, search for a double and fork (ultimate win)
if not, search only for forks that give the opponent the most losing possibilities
if not, search only for a double
if not: dead end, tie, play randomly.
if none of the above apply (it means this is your first move):
    if it's the first move of the game:
        give the opponent the most losing possibilities (the algorithm results in corners only, which give the opponent 7 possible losing points)
        or, just to break the boredom, play randomly.
    if it's the second move of the game:
        find only the non-losing points (this gives a few more options)
        or find the points in this list with the best winning chance (this can be boring, since it results only in the corners, adjacent corners, or the center)
Note: When you have a double and a fork, check whether your double gives the opponent a double. If it does, check whether your new mandatory point is included in your fork list.
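The fork detection this decision list relies on can be sketched in Python (board as a 9-character string of 'X', 'O' and spaces; all names here are illustrative assumptions):

```python
# A "fork" square is one that, once taken, leaves the player with two or
# more distinct immediate winning moves, so the opponent can only block one.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winning_moves(board, p):
    """Squares that would complete three in a row for p right now."""
    moves = set()
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(p) == 2 and cells.count(' ') == 1:
            moves.add((a, b, c)[cells.index(' ')])
    return moves

def fork_squares(board, p):
    forks = []
    for i, sq in enumerate(board):
        if sq == ' ':
            child = board[:i] + p + board[i+1:]
            if len(winning_moves(child, p)) >= 2:  # two threats at once
                forks.append(i)
    return forks
```

For example, with X in opposite corners and O in the center, the two free corners are fork squares for X, which is exactly the trap described in the quoted Wikipedia strategy.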

Rank each of the squares with numeric scores. If a square is taken, move on to the next choice (sorted in descending order by rank). You're going to need to choose a strategy (there are two main ones for going first and, I think, three for going second). Technically, you could just program all of the strategies and then choose one at random. That would make for a less predictable opponent.

This answer assumes you understand implementing the perfect algorithm for P1 and discusses how to achieve a win in conditions against ordinary human players, who will make some mistakes more commonly than others.
The game of course should end in a draw if both players play optimally. At a human level, P1 playing in a corner produces wins far more often. For whatever psychological reason, P2 is baited into thinking that playing in the center is not that important, which is unfortunate for them, since it's the only response that does not create a winning game for P1.
If P2 does correctly block in the center, P1 should play the opposite corner, because again, for whatever psychological reason, P2 will prefer the symmetry of playing a corner, which again produces a losing board for them.
For any move P1 may make for the starting move, there is a move P2 may make that will create a win for P1 if both players play optimally thereafter. In that sense P1 may play wherever. The edge moves are weakest in the sense that the largest fraction of possible responses to this move produce a draw, but there are still responses that will create a win for P1.
Empirically (more precisely, anecdotally) the best P1 starting moves seem to be first corner, second center, and last edge.
The next challenge you can add, in person or via a GUI, is not to display the board. A human can definitely remember all the state but the added challenge leads to a preference for symmetric boards, which take less effort to remember, leading to the mistake I outlined in the first branch.
I'm a lot of fun at parties, I know.

A tic-tac-toe adaptation of the minimax algorithm
let gameBoard = [
  [null, null, null],
  [null, null, null],
  [null, null, null]
]

const SYMBOLS = {
  X: 'X',
  O: 'O'
}

const RESULT = {
  INCOMPLETE: "incomplete",
  PLAYER_X_WON: SYMBOLS.X,
  PLAYER_O_WON: SYMBOLS.O,
  tie: "tie"
}
We'll need a function that can check for the result. The function checks each line for a succession of identical symbols. Whatever the state of the board is, the result is one of 4 options: incomplete, player X won, player O won, or a tie.
function checkSuccession (line){
  if (line === SYMBOLS.X.repeat(3)) return SYMBOLS.X
  if (line === SYMBOLS.O.repeat(3)) return SYMBOLS.O
  return false
}
function getResult(board){
  let result = RESULT.INCOMPLETE
  // moveCount (helper not shown) counts the non-null squares
  if (moveCount(board) < 5){
    return result
  }
  let lines = []
  // first we check rows, then columns, then diagonals
  for (let i = 0; i < 3; i++){
    lines.push(board[i].join(''))
  }
  for (let j = 0; j < 3; j++){
    const column = [board[0][j], board[1][j], board[2][j]]
    lines.push(column.join(''))
  }
  const diag1 = [board[0][0], board[1][1], board[2][2]]
  lines.push(diag1.join(''))
  const diag2 = [board[0][2], board[1][1], board[2][0]]
  lines.push(diag2.join(''))
  for (let i = 0; i < lines.length; i++){
    const succession = checkSuccession(lines[i])
    if (succession){
      return succession
    }
  }
  // check for a tie
  if (moveCount(board) === 9){
    return RESULT.tie
  }
  return result
}
Our getBestMove function receives the state of the board and the symbol of the player for which we want to determine the best possible move. The function checks all possible moves with the getResult function. A win gets a score of 1, a loss a score of -1, and a tie a score of 0. If the result is undetermined, we call getBestMove again with the new state of the board and the opposite symbol. Since the next move is the opponent's, their victory is the current player's loss, so the score is negated. In the end every possible move has a score of 1, 0 or -1; we can then sort the moves and return the one with the highest score.
const copyBoard = (board) => board.map(
  row => row.map(square => square)
)

function getAvailableMoves (board) {
  let availableMoves = []
  for (let row = 0; row < 3; row++){
    for (let column = 0; column < 3; column++){
      if (board[row][column] === null){
        availableMoves.push({row, column})
      }
    }
  }
  return availableMoves
}

function applyMove(board, move, symbol) {
  board[move.row][move.column] = symbol
  return board
}
function getBestMove (board, symbol){
  let availableMoves = getAvailableMoves(board)
  let availableMovesAndScores = []
  for (let i = 0; i < availableMoves.length; i++){
    let move = availableMoves[i]
    let newBoard = copyBoard(board)
    newBoard = applyMove(newBoard, move, symbol)
    const result = getResult(newBoard)
    let score
    if (result === RESULT.tie) { score = 0 }
    else if (result === symbol) {
      score = 1
    }
    else {
      // game still undecided: the opponent's best score, negated
      let otherSymbol = (symbol === SYMBOLS.X) ? SYMBOLS.O : SYMBOLS.X
      const nextMove = getBestMove(newBoard, otherSymbol)
      score = -nextMove.score
    }
    if (score === 1) // performance optimization: a win can't be beaten
      return {move, score}
    availableMovesAndScores.push({move, score})
  }
  availableMovesAndScores.sort((moveA, moveB) => {
    return moveB.score - moveA.score
  })
  return availableMovesAndScores[0]
}
Algorithm in action, GitHub, explaining the process in more detail

Related

Write a recurrence for the probability that the champion retains the title?

Problem:
The traditional world chess championship is a match of 24 games. The
current champion retains the title in case the match is a tie. Each
game ends in a win, loss, or draw (tie) where wins count as 1, losses
as 0, and draws as 1/2. The players take turns playing white and
black. White plays first and so has an advantage. The champion plays
white in the first game. The champ has probabilities ww, wd, and wl of
winning, drawing, and losing playing white, and has probabilities bw,
bd, and bl of winning, drawing, and losing playing black.
(a) Write a recurrence for the probability that the champion retains the title.
Assume that there are g games left to play in the match and that the
champion needs to get i points (which may be a multiple of 1/2).
I found this problem in skiena's Algorithm design manual book and tried to solve it.
My take on this problem: at the i-th game, the champion must either win, lose, or draw. We have the prior probabilities ww, wd and wl (since the champion starts with white, I am considering only ww, wd and wl).
So the probability that the champion wins is the current outcome weight multiplied by the prior probability:
P_win(i) = Σ(prior_p * current_outcome_weight) = ((ww*win_weight) + (wd*draw_weight) + (wl*loss_weight))/3
where the current outcome weight is one of three cases:
if the champion wins: win_weight = 1, draw_weight = 0, loss_weight = 0
if the champion loses: win_weight = 0, draw_weight = 0, loss_weight = 0
if the champion draws: win_weight = 0, draw_weight = 1/2, loss_weight = 0
Is there any better way to write the recurrence for this problem?
You basically need to branch on the outcome of the current game — the champion wins, draws, or loses — and recurse on each branch, reducing i according to the outcome and always reducing g.
The stop clauses are: the champion already has the points he needs (retains the title), or there are no games left (loses it).
The champion's colour alternates with the game number, so in a 24-game match with g games left the champion plays white when g is even and black when g is odd. (Note: conditions are evaluated in order, so for g == 0 and i == 0 the i <= 0 case applies.)
Sol(g, i) = {
    i <= 0:     1
    g == 0:     0
    g % 2 == 0: ww*Sol(g-1, i-1) + wd*Sol(g-1, i-1/2) + wl*Sol(g-1, i)
    g % 2 == 1: bw*Sol(g-1, i-1) + bd*Sol(g-1, i-1/2) + bl*Sol(g-1, i)
}
Quick sanity test: if there is one final game left and the champion must win it, then the probability of him retaining the title is bw. In this case you evaluate: Sol(1, 1) = bw*Sol(0,0) + bd*Sol(0,1/2) + bl*Sol(0,1) = bw*1 + bd*0 + bl*0 = bw
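The recurrence can be sketched in Python with memoization (the probability values below are made up purely for illustration, and the colour-by-parity rule assumes an even-length 24-game match as above):

```python
from functools import lru_cache
from fractions import Fraction

# Illustrative probabilities only -- plug in the real ones.
ww, wd, wl = Fraction(4, 10), Fraction(4, 10), Fraction(2, 10)  # champion as white
bw, bd, bl = Fraction(3, 10), Fraction(4, 10), Fraction(3, 10)  # champion as black

@lru_cache(maxsize=None)
def retain(g, i):
    """Probability the champion retains the title with g games left and
    i points still needed; white when g (games remaining) is even."""
    if i <= 0:
        return Fraction(1)            # enough points already: title retained
    if g == 0:
        return Fraction(0)            # no games left, points still missing
    w, d, l = (ww, wd, wl) if g % 2 == 0 else (bw, bd, bl)
    half = Fraction(1, 2)
    return (w * retain(g - 1, i - 1)
            + d * retain(g - 1, i - half)
            + l * retain(g - 1, i))
```

Using Fraction keeps the half-point arithmetic exact and the states hashable for the cache.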

How to account for position's history in transposition tables

I'm currently developing a solver for a trick-based card game called Skat in a perfect information situation. Although most of the people may not know the game, please bear with me; my problem is of a general nature.
Short introduction to Skat:
Basically, each player plays one card alternatingly, and every three cards form a trick. Every card has a specific value. The score that a player has achieved is the result of adding up the value of every card contained in the tricks that the respective player has won. I left out certain things that are unimportant for my problem, e.g. who plays against whom or when do I win a trick.
What we should keep in mind is that there is a running score, and who played what before when investigating a certain position (-> its history) is relevant to that score.
I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table. I read that when searching the tree of a Skat game, you will encounter a lot of positions that have already been investigated.
And that's where my problem comes into play: if I find a position that has already been investigated before, the moves leading to this position were different. Therefore, in general, the score (and alpha or beta) will be different, too.
This leads to my question: How can I determine the value of a position, if I know the value of the same position, but with a different history?
In other words: How can I decouple a subtree from its path to the root, so that it can be applied to a new path?
My first impulse was it's just not possible, because alpha or beta could have been influenced by other paths, which might not be applicable to the current position, but...
There already seems to be a solution
...that I don't seem to understand. In Sebastian Kupferschmid's master's thesis about a Skat solver, I found this piece of code (maybe C-ish / pseudo code?):
def ab_tt(p, alpha, beta):
    if p isa Leaf:
        return 0
    if hash.lookup(p, val, flag):
        if flag == VALID:
            return val
        elif flag == LBOUND:
            alpha = max(alpha, val)
        elif flag == UBOUND:
            beta = min(beta, val)
        if alpha >= beta:
            return val
    if p isa MAX_Node:
        res = alpha
    else:
        res = beta
    for q in succ(p):
        if p isa MAX_Node:
            succVal = t(q) + ab_tt(q, res - t(q), beta - t(q))
            res = max(res, succVal)
            if res >= beta:
                hash.add(p, res, LBOUND)
                return res
        elif p isa MIN_Node:
            succVal = t(q) + ab_tt(q, alpha - t(q), res - t(q))
            res = min(res, succVal)
            if res <= alpha:
                hash.add(p, res, UBOUND)
                return res
    hash.add(p, res, VALID)
    return res
It should be pretty self-explanatory. succ(p) is a function that returns every possible move at the current position. t(q) is what I believe to be the running score of the respective position (the points achieved so far by the declarer).
Since I don't like copying stuff without understanding it, this should just be an aid for anyone who would like to help me out. Of course, I have given this code some thought, but I can't wrap my head around one thing: By subtracting the current score from alpha/beta before calling the function again [e.g. ab_tt(q, res - t(q), beta - t(q))], there seems to be some kind of decoupling going on. But what exactly is the benefit if we store the position's value in the transposition table without doing the same subtraction right here, too? If we found a previously investigated position, how come we can just return its value (in case it's VALID) or use the bound value for alpha or beta? The way I see it, both storing and retrieving values from the transposition table won't account for the specific histories of these positions. Or will it?
Literature:
There are almost no English sources out there that deal with AI in Skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid, Helmert. Unfortunately, the whole paper, and especially the elaboration on transposition tables, is rather compact.
Edit:
So that everyone can better imagine how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The actual score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha-beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in the trick to their score.
The card values are:
Rank:  J  A 10  K  Q  9  8  7
Value: 2 11 10  4  3  0  0  0
I solved the problem. Instead of doing the subtraction upon each recursive call, as suggested by the reference in my question, I subtract the running score from the resulting alpha-beta value only when storing a position in the transposition table:
For exact values (the position hasn't been pruned):
transpo.put(hash, new int[] { TT_VALID, bestVal - node.getScore()});
If the node caused a beta-cutoff:
transpo.put(hash, new int[] { TT_LBOUND, bestVal - node.getScore()});
If the node caused an alpha-cutoff:
transpo.put(hash, new int[] { TT_UBOUND, bestVal - node.getScore()});
Where:
transpo is a HashMap<Long, int[]>
hash is the long value representing that position
bestVal is either the exact value or the value that caused a cutoff
TT_VALID, TT_LBOUND and TT_UBOUND are simple constants, describing the type of transposition table entry
However, this didn't work per se. After posting the same question on gamedev.net, a user named Álvaro gave me the deciding hint:
When storing exact scores (TT_VALID), I should only store positions, that improved alpha.
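The resulting store/probe pattern can be sketched in Python (flag names and structure are illustrative, not the thesis code): values go into the table relative to the running score, and get re-based when probed along a different path:

```python
# Transposition table storing path-independent values: subtract the
# running score when storing, add the current path's score when probing.
TT_VALID, TT_LBOUND, TT_UBOUND = 0, 1, 2
table = {}   # position hash -> (flag, score-relative value)

def tt_store(pos_hash, flag, value, running_score):
    table[pos_hash] = (flag, value - running_score)

def tt_probe(pos_hash, running_score):
    """Return (flag, value re-based to this path's running score), or None."""
    entry = table.get(pos_hash)
    if entry is None:
        return None
    flag, relative = entry
    return flag, relative + running_score
```

This is the decoupling the question asks about: the stored number describes only the remaining subgame, so any path reaching the same position can reuse it after adding its own running score.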

How exactly to use "History Heuristic" in alpha-beta minimax?

I'm making an AI for a chess game.
So far, I've successfully implemented the Alpha-Beta Pruning Minimax algorithm, which looks like this (from Wikipedia):
(* Initial call *)
alphabeta(origin, depth, -∞, +∞, TRUE)

function alphabeta(node, depth, α, β, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if maximizingPlayer
        for each child of node
            α := max(α, alphabeta(child, depth - 1, α, β, FALSE))
            if β ≤ α
                break (* β cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth - 1, α, β, TRUE))
            if β ≤ α
                break (* α cut-off *)
        return β
Since this still takes too much time (the search visits an enormous number of nodes), I came across something called the "history heuristic".
The Algorithm from the original paper:
int AlphaBeta(pos, d, alpha, beta)
{
    if (d == 0 || game is over)
        return Eval(pos);             // evaluate leaf position from current player's standpoint
    score = -INFINITY;                // preset return value
    moves = Generate(pos);            // generate successor moves
    for i = 1 to sizeof(moves) do     // rate all moves
        rating[i] = HistoryTable[ moves[i] ];
    Sort(moves, rating);              // sort moves according to their history scores
    for i = 1 to sizeof(moves) do {   // look over all moves
        Make(moves[i]);               // execute current move
        cur = -AlphaBeta(pos, d-1, -beta, -alpha);  // call other player
        if (cur > score) {
            score = cur;
            bestMove = moves[i];      // update best move if necessary
        }
        if (score > alpha) alpha = score;  // adjust the search window
        Undo(moves[i]);               // retract current move
        if (alpha >= beta) goto done; // cut off
    }
done:
    // update history score
    HistoryTable[bestMove] = HistoryTable[bestMove] + Weight(d);
    return score;
}
So basically, the idea is to keep track of a Hashtable or a Dictionary for previous "moves".
Now I'm confused about what "move" means here.
I'm not sure if it literally refers to a single move or to the overall state after each move.
In chess, for example, what should be the "key" for this hashtable be?
Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
Or the overall state of the chessboard after individual moves?
If 1 is the case, I guess the positions of other pieces are not taken into account when recording the "move" into my History table?
I think the original paper (The History Heuristic and Alpha-Beta Search Enhancements in Practice, Jonathan Schaeffer), available on-line, answers the question clearly. In the paper, the author defined a move as the two indices (from-square and to-square) on the chess board, using a 64x64 table (in effect, I think he used bit shifting and a single-index array) to contain the move history.
The author compared all the available means of move ordering and determined that hh was the best. If current best practice has established an improved form of move ordering (beyond hh + transposition table), I would also like to know what it is.
You can use a transposition table so you avoid evaluating the same board multiple times. Transposition meaning you can reach the same board state by performing moves in different orders. Naive example:
1. e4 e5 2. Nf3 Nc6
1. e4 Nc6 2. Nf3 e5
These plays result in the same position but were reached differently.
http://en.wikipedia.org/wiki/Transposition_table
A common method is called Zobrist hashing to hash a chess position:
http://en.wikipedia.org/wiki/Zobrist_hashing
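Zobrist hashing can be sketched in Python (a minimal illustration; the piece encoding and dict board are arbitrary assumptions):

```python
import random

# One random 64-bit key per (piece, square); a position's hash is the XOR
# of the keys of all pieces on the board. Moving a piece is two XORs.
random.seed(0)                 # fixed seed just for reproducibility
PIECES, SQUARES = 12, 64       # 6 piece types x 2 colours, 64 squares
ZOBRIST = [[random.getrandbits(64) for _ in range(SQUARES)]
           for _ in range(PIECES)]

def full_hash(board):
    """board: dict mapping square index -> piece index."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h

def update_hash(h, piece, from_sq, to_sq):
    # XOR the piece out of its old square and into its new square
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]
```

Because XOR is its own inverse, the incremental update after a move gives exactly the same hash as recomputing from scratch, which is what makes this cheap enough to use inside a search.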
From my experience the history heuristic produces negligible benefits compared to other techniques, and is not worthwhile for a basic search routine. It is not the same thing as using transposition table. If the latter is what you want to implement, I'd still advise against it. There are many other techniques that will produce good results for far less effort. In fact, an efficient and correct transposition table is one of the most difficult parts to code in a chess engine.
First try pruning and move ordering heuristics, most of which are one to a few lines of code. I've detailed such techniques in this post, which also gives estimates of the performance gains you can expect.
In chess, for example, what should be the "key" for this hashtable be?
Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
Or the overall state of the chessboard after individual moves?
The key is an individual move and the positions of other pieces aren't taken into account when recording the "move" into the history table.
The traditional form of the history table (also called butterfly board) is something like:
score history_table[side_to_move][from_square][to_square];
For instance, if the move e2-e4 produces a cutoff, the element:
history_table[white][e2][e4]
is (somehow) incremented (irrespectively from the position in which the move has been made).
As in the example code, history heuristics uses those counters for move ordering. Other heuristics can take advantage of history tables (e.g. late move reductions).
Consider that:
usually the history heuristic isn't applied to plain alpha-beta with no other move-ordering knowledge (in chess, only "quiet" moves are ordered via the history heuristic);
there are alternative forms for the history table (often used is history_table[piece][to_square]).
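A butterfly history table of that shape can be sketched in Python (the depth*depth increment is one common choice, not the only one):

```python
# Butterfly-style history table: indexed by side to move and the move's
# from/to squares, independent of the rest of the position.
SQUARES = 64
history = [[[0] * SQUARES for _ in range(SQUARES)] for _ in range(2)]

def record_cutoff(side, from_sq, to_sq, depth):
    # weight cutoffs found at greater depth more heavily
    history[side][from_sq][to_sq] += depth * depth

def order_moves(side, moves):
    """moves: list of (from_sq, to_sq) pairs; highest history score first."""
    return sorted(moves, key=lambda m: history[side][m[0]][m[1]], reverse=True)
```

Note how little state this needs: the counter for e2-e4 is shared across every position in which that move produced a cutoff, which is exactly the "irrespective of the position" behaviour described above.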

Grundy's game extended to more than two heaps

How can I break a heap into two heaps in Grundy's game?
What about breaking a heap into any number of heaps (no two of them being equal)?
Games of this type are analyzed in great detail in the book series "Winning Ways for your Mathematical Plays". Most of the things you are looking for are probably in volume 1.
You can also take a look at these links: Nimbers (Wikipedia), Sprague-Grundy theorem (Wikipedia) or do a search for "combinatorial game theory".
My knowledge on this is quite rusty, so I'm afraid I can't help you myself with this specific problem. My excuses if you were already aware of everything I linked.
Edit: In general, the method of solving these types of games is to "build up" stack sizes. So start with a stack of 1 and decide who wins with optimal play. Then do the same for a stack of 2 (in Grundy's game a 2 cannot be split, since 1 & 1 is an equal split). Move on to 3, which can be split into 1 & 2. Same for 4 (here it gets trickier): the only legal split is 3 & 1, since 2 & 2 is equal. Using the Sprague-Grundy theorem and the algebraic rules for nimbers, you can calculate who will win. Keep going until you reach the stack size for which you need to know the answer.
Edit 2: The website I was talking about in the comments seems to be down. Here is a link of a backup of it: Wayback Machine - Introduction to Combinatorial Games.
Grundy's Game, and many games like it, can be solved with an algorithm like this:
//returns a Move object representing the current player's optimal move, or null if the player has no chance of winning
function bestMove(GameState g){
    for each (move in g.possibleMoves()){
        nextState = g.applyMove(move)
        if (bestMove(nextState) == null){
            //the next player's best move is null, so if we take this move,
            //he has no chance of winning. This is good for us!
            return move;
        }
    }
    //none of our possible moves led to a winning strategy.
    //We have no chance of winning. This is bad for us :-(
    return null;
}
Implementations of GameState and Move depend on the game. For Grundy's game, both are simple.
GameState stores a list of integers, representing the size of each heap in the game.
Move stores an initialHeapSize integer, and a resultingHeapSizes list of integers.
GameState::possibleMoves iterates through its heap size list, and determines the legal divisions for each one.
GameState::applyMove(Move) returns a copy of the GameState, except the move given to it is applied to the board.
GameState::possibleMoves can be implemented for "classic" Grundy's Game like so:
function possibleMoves(GameState g){
    moves = []
    for each (heapSize in g.heapSizes){
        for each (resultingHeaps in possibleDivisions(heapSize)){
            Move m = new Move(heapSize, resultingHeaps)
            moves.append(m)
        }
    }
    return moves
}

function possibleDivisions(int heapSize){
    divisions = []
    for(int leftPileSize = 1; leftPileSize < heapSize; leftPileSize++){
        int rightPileSize = heapSize - leftPileSize
        if (leftPileSize != rightPileSize){
            divisions.append([leftPileSize, rightPileSize])
        }
    }
    return divisions
}
Modifying this to use the "divide into any number of unequal piles" rule is just a matter of changing the implementation of possibleDivisions.
I haven't calculated it exactly, but an unoptimized bestMove has a pretty crazy worst-case runtime. Once you start giving it a starting state of around 12 stones, you'll get long wait times. So you should implement memoization to improve performance.
For best results, keep each GameState's heap size list sorted, and discard any heaps of size 2 or 1.
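Putting the memoization and normalization suggestions together, a win/loss solver for classic Grundy's game (split one heap into two unequal heaps) can be sketched in Python (names are illustrative):

```python
from functools import lru_cache

# State: a sorted tuple of heap sizes; heaps of size 1 or 2 are dropped,
# since in Grundy's game they can never be split again.
def normalize(heaps):
    return tuple(sorted(h for h in heaps if h > 2))

@lru_cache(maxsize=None)
def first_player_wins(heaps):
    for i, h in enumerate(heaps):
        for left in range(1, h // 2 + h % 2):      # left < h - left: unequal split
            rest = heaps[:i] + heaps[i + 1:]
            child = normalize(rest + (left, h - left))
            if not first_player_wins(child):
                return True                        # we can leave a losing position
    return False                                   # every move (or no move) loses
```

For instance, a single heap of 4 is a first-player loss: the only legal split is 3 & 1, which hands the opponent the winning heap of 3.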

Algorithm for the game of Chomp

I am writing a program for the game of Chomp. You can read the description of the game on Wikipedia, however I'll describe it briefly anyway.
We play on a chocolate bar of dimension n x m, i.e. the bar is divided in n x m squares. At each turn the current player chooses a square and eats everything below and right of the chosen square. So, for example, the following is a valid first move:
The objective is to force your opponent to eat the last piece of chocolate (it is poisoned).
Concerning the AI part, I used a minimax algorithm with depth-truncation. However I can't come up with a suitable position evaluation function. The result is that, with my evaluation function, it is quite easy for a human player to win against my program.
Can anyone:
suggest a good position evaluation function or
provide some useful reference or
suggest an alternative algorithm?
How big are your boards?
If your boards are fairly small then you can solve the game exactly with dynamic programming. In Python:
n, m = 6, 6
init = frozenset((x,y) for x in range(n) for y in range(m))

def moves(board):
    return [frozenset((x,y) for (x,y) in board if x < px or y < py)
            for (px,py) in board]

@memoize
def wins(board):
    if not board: return True
    return any(not wins(move) for move in moves(board))
The function wins(board) computes whether the board is a winning position. The board representation is a set of tuples (x,y) indicating whether piece (x,y) is still on the board. The function moves computes list of boards reachable in one move.
The logic behind the wins function works like this. If the board is empty on our move the other player must have eaten the last piece, so we won. If the board is not empty then we can win if there is any move we can do such that the resulting position is a losing position (i.e. not winning i.e. not wins(move)), because then we got the other player into a losing position.
You'll also need the memoize helper function which caches the results:
def memoize(f):
    cache = dict()
    def memof(x):
        try:
            return cache[x]
        except KeyError:
            cache[x] = f(x)
            return cache[x]
    return memof
By caching we only compute who is the winner for a given position once, even if this position is reachable in multiple ways. For example the position of a single row of chocolate can be obtained if the first player eats all remaining rows in his first move, but it can also be obtained through many other series of moves. It would be wasteful to compute who wins on the single row board again and again, so we cache the result. This improves the asymptotic performance from something like O((n*m)^(n+m)) to O((n+m)!/(n!m!)), a vast improvement though still slow for large boards.
And here is a debugging printing function for convenience:
def show(board):
    for x in range(n):
        print('|' + ''.join('x ' if (x, y) in board else '  ' for y in range(m)))
This code is still fairly slow because it isn't optimized in any way (and this is Python...). If you rewrite it efficiently in C or Java you can probably improve performance more than 100-fold: you should easily be able to handle 10x10 boards, and probably up to 15x15. You should also use a different board representation, for example a bit board. Perhaps you could even speed it up another 1000x by making use of multiple processors.
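As a sketch of the bit-board idea (the encoding and the names here are mine, not part of the answer above): pack the board into a single integer, one bit per square, and memoize on that integer. Integers hash and compare much faster than frozensets.

```python
from functools import lru_cache

N, M = 3, 3                  # board dimensions for this sketch
FULL = (1 << (N * M)) - 1    # bitmask with every square still present

def bit(x, y):
    # one bit per square, row-major
    return 1 << (x * M + y)

def bit_moves(board):
    boards = []
    for px in range(N):
        for py in range(M):
            if board & bit(px, py):
                # eating at (px, py) keeps only the squares with x < px or y < py
                keep = 0
                for x in range(N):
                    for y in range(M):
                        if board & bit(x, y) and (x < px or y < py):
                            keep |= bit(x, y)
                boards.append(keep)
    return boards

@lru_cache(maxsize=None)
def bit_wins(board):
    if not board:
        return True
    return any(not bit_wins(move) for move in bit_moves(board))
```

A production version would precompute the "keep" mask for every square instead of rebuilding it in the inner loop, but this shows the shape of the technique.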
Here is how the wins function can be derived from minimax.
We'll start with minimax:
def minimax(board, depth):
    if depth == 0: return heuristic(board)
    else:
        alpha = -1
        for move in moves(board):
            alpha = max(alpha, -minimax(move, depth - 1))
        return alpha
We can remove the depth checking to do a full search:
def minimax(board):
    if game_ended(board): return heuristic(board)
    else:
        alpha = -1
        for move in moves(board):
            alpha = max(alpha, -minimax(move))
        return alpha
Because the game has ended, heuristic will return either -1 or 1, depending on which player won. If we represent -1 as False and 1 as True, then max(a, b) becomes a or b, and -a becomes not a:
def minimax(board):
    if game_ended(board): return heuristic(board)
    else:
        alpha = False
        for move in moves(board):
            alpha = alpha or not minimax(move)
        return alpha
You can see this is equivalent to:
def minimax(board):
    if not board: return True
    return any([not minimax(move) for move in moves(board)])
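To back up the equivalence claim, here is a small check (scaffolding of mine, using the same moves function as above) that the numeric and boolean versions agree on every position reachable from a small board:

```python
def moves(board):
    return [frozenset((x, y) for (x, y) in board if x < px or y < py)
            for (px, py) in board]

def minimax_num(board):
    # full-depth negamax; an empty board means the opponent just lost, value +1
    if not board:
        return 1
    alpha = -1
    for move in moves(board):
        alpha = max(alpha, -minimax_num(move))
    return alpha

def minimax_bool(board):
    if not board:
        return True
    return any([not minimax_bool(move) for move in moves(board)])

def reachable(board, seen=None):
    # collect every position reachable from board, including board itself
    seen = set() if seen is None else seen
    if board not in seen:
        seen.add(board)
        for move in moves(board):
            reachable(move, seen)
    return seen

start = frozenset((x, y) for x in range(3) for y in range(2))
assert all((minimax_num(b) == 1) == minimax_bool(b) for b in reachable(start))
```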
If we had instead started with minimax with alpha-beta pruning:
def alphabeta(board, alpha, beta):
    if game_ended(board): return heuristic(board)
    else:
        for move in moves(board):
            alpha = max(alpha, -alphabeta(move, -beta, -alpha))
            if alpha >= beta: break
        return alpha

# start the search:
alphabeta(initial_board, -1, 1)
The search starts out with alpha = -1 and beta = 1. As soon as alpha becomes 1, the loop breaks. So we can assume that alpha stays -1 and beta stays 1 in the recursive calls. So the code is equivalent to this:
def alphabeta(board, alpha, beta):
    if game_ended(board): return heuristic(board)
    else:
        for move in moves(board):
            alpha = max(alpha, -alphabeta(move, -1, 1))
            if alpha == 1: break
        return alpha

# start the search:
alphabeta(initial_board, -1, 1)
So we can simply remove the parameters, as they are always passed in as the same values:
def alphabeta(board):
    if game_ended(board): return heuristic(board)
    else:
        alpha = -1
        for move in moves(board):
            alpha = max(alpha, -alphabeta(move))
            if alpha == 1: break
        return alpha

# start the search:
alphabeta(initial_board)
We can again do the switch from -1 and 1 to booleans:
def alphabeta(board):
    if game_ended(board): return heuristic(board)
    else:
        alpha = False
        for move in moves(board):
            alpha = alpha or not alphabeta(move)
            if alpha: break
        return alpha
So you can see this is equivalent to using any with a generator, which stops the iteration as soon as it finds a True value, instead of always computing the whole list of children:
def alphabeta(board):
    if not board: return True
    return any(not alphabeta(move) for move in moves(board))
Note that here we have any(not alphabeta(move) for move in moves(board)) instead of any([not minimax(move) for move in moves(board)]). This speeds up the search by about a factor of 10 for reasonably sized boards: not because the generator form is faster in itself, but because it lets us skip the entire rest of the loop, including the recursive calls, as soon as we have found a value that's True.
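The difference is easy to observe directly (example of mine): with a generator, any stops calling the predicate at the first True, while a list comprehension evaluates every element first.

```python
calls = []

def pred(x):
    # record every evaluation so we can count them
    calls.append(x)
    return x > 2

values = [1, 2, 3, 4, 5]

calls.clear()
any([pred(x) for x in values])   # list: evaluates all five elements
list_calls = len(calls)          # 5

calls.clear()
any(pred(x) for x in values)     # generator: stops as soon as x == 3 succeeds
gen_calls = len(calls)           # 3
```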
So there you have it, the wins function was just alphabeta search in disguise. The next trick we used for wins is to memoize it. In game programming this would be called using "transposition tables". So the wins function is doing alphabeta search with transposition tables. Of course it's simpler to write down this algorithm directly instead of going through this derivation ;)
I don't think a good position evaluation function is possible here, because unlike games like chess, there's no 'progress' short of winning or losing. The Wikipedia article suggests an exhaustive solution is practical for modern computers, and I think you'll find this is the case, given suitable memoization and optimisation.
A related game you may find of interest is Nim.
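For reference, normal-play Nim even has a closed-form solution via the Sprague-Grundy theorem: the player to move wins exactly when the XOR (the "nim-sum") of the heap sizes is nonzero. A minimal sketch:

```python
from functools import reduce
from operator import xor

def nim_wins(heaps):
    # normal play: the player who takes the last object wins;
    # first player wins iff the nim-sum of the heap sizes is nonzero
    return reduce(xor, heaps, 0) != 0

print(nim_wins([1, 2, 3]))   # False: the classic 1-2-3 position is a loss
print(nim_wins([1, 2, 4]))   # True
```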

Resources