Alpha-beta pruning for Minimax - algorithm

I have spent a whole day trying to implement minimax without really understanding it. Now I think I understand how minimax works, but not alpha-beta pruning.
This is my understanding of minimax:
Generate a list of all possible moves, up until the depth limit.
Evaluate how favorable a game field is for every node on the bottom.
For every node (starting from the bottom), the score of that node is the highest score of its children if the layer is max. If the layer is min, the score of that node is the lowest score of its children.
Perform the move that has the highest score if you are trying to max it, or the lowest if you want the min score.
My understanding of alpha-beta pruning is that, if the parent layer is min and your node has a higher score than the minimum score, then you can prune it since it will not affect the result.
However, what I don't understand is: if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax), which means that you'll still be using the same amount of CPU power.
Could anyone please point out what I am getting wrong? This answer (Minimax explained for an idiot) helped me understand minimax, but I don't get how alpha-beta pruning would help.
Thank you.

To understand Alpha-Beta, consider the following situation. It's White's turn; White is trying to maximize the score, Black is trying to minimize the score.
White evaluates moves A, B, and C and finds the best score is 20, with C. Now consider what happens when evaluating move D:
If White selects move D, we need to consider counter-moves by Black. Early on, we find Black can capture the white queen, and that subtree gets a MIN score of 5 due to the lost queen. However, we have not yet considered all of Black's counter-moves. Is it worth checking the rest? No.
We don't care whether Black can get a score lower than 5, because White's move C could keep the score at 20. Black will not choose a counter-move with a score higher than 5, because he is trying to MINimize the score and has already found a move with a score of 5. For White, move C is preferred over move D as soon as the MIN for D (5 so far) goes below that of C (20 for sure). So we "prune" the rest of the tree there, pop back up a level, and evaluate White's moves E, F, G, H... to the end.
Hope that helps.

You don't need to evaluate the entire subtree of a node to decide its value. Alpha-beta pruning uses two dynamically computed bounds, alpha and beta, to bound the values that nodes can take.
Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).
Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
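To make the two bounds concrete, here is a minimal sketch in Python; children and evaluate are hypothetical helpers you would supply for your game, and the two cutoff tests mirror the two paragraphs above.
    # Minimal alpha-beta sketch; `children(state)` and `evaluate(state)`
    # are hypothetical helpers you would supply for your game.
    def alphabeta(state, depth, alpha, beta, maximizing):
        if depth == 0 or not children(state):
            return evaluate(state)  # static evaluation at the horizon
        if maximizing:
            value = float('-inf')
            for child in children(state):
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:   # cutoff: min will never allow this branch
                    break
            return value
        else:
            value = float('inf')
            for child in children(state):
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:   # cutoff: max already has a better option
                    break
            return value
The initial call would be alphabeta(root, depth, float('-inf'), float('inf'), True). Note that pruning never changes the result; it only skips subtrees that cannot affect it.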
I've written a detailed explanation of Alpha Beta Pruning, its pseudocode and several improvements: http://kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/

(Very) short explanation of minimax:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them and give the board positions to the (opponent) evaluator.
The opponent evaluates the new board positions (for him, the opponent side) - by doing essentially the same thing, recursively calling (his opponent) evaluator, unless the maximum depth or some other condition has been reached and a static evaluator is called - and then selects the maximum evaluation and sends the evaluations back to you.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
(Very) short explanation of α-β pruning:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them one by one and give the board positions to the (opponent) evaluator - but you also pass along your current evaluation (of your board).
The opponent evaluates the new board position (for him, the opponent side) and sends the evaluation back to you. But how does he do that? He has the choice of playing m moves. He tries all of them and gives the new board positions (one by one) to (his opponent) evaluator and then chooses the maximum one.
Crucial step: if any of the evaluations he gets back is bigger than the minimum you gave him, it is certain that he will eventually return an evaluation at least that large (because he wants to maximize). And you are sure to ignore that value (because you want to minimize), so he stops doing any more work for boards he hasn't yet evaluated.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.

Here's a short answer -- you can know the value of a node without computing the precise value of all its children.
As soon as we know that a child node cannot be better, from the perspective of the parent-node player, than the previously evaluated sibling nodes, we can stop evaluating the child subtree. It's at least this bad.

I think your question hints at a misunderstanding of the evaluation function
if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax)
I'm not completely sure what you meant there, but it sounds wrong. The evaluation function (EF) is usually a very fast, static position evaluation. This means that it need only look at a single position and reach a 'verdict' from that. (IOW, you don't always evaluate a branch to n plies.)
Much of the time the evaluation truly is static, which means that the position evaluation function is completely deterministic. This is also the reason why evaluation results are easily cacheable (since they will be the same each time a position is evaluated).
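As a sketch of how simple that caching can be (Python; static_eval and the hashable position type are assumptions):
    eval_cache = {}
    def cached_eval(position):
        # Deterministic static evaluation means one score per position,
        # so a plain dictionary suffices as a cache.
        if position not in eval_cache:
            eval_cache[position] = static_eval(position)  # hypothetical EF
        return eval_cache[position]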
Now, for e.g. chess, there is usually quite a bit of overt/covert deviation from the above:
a position might be evaluated differently depending on game context (e.g. whether the exact position occurred earlier during the game; how many moves without pawn moves/captures have occurred; en passant and castling opportunities). The most common 'trick' to tackle this is to actually incorporate that state into the 'position'¹
a different EF is usually selected for the different phases of the game (opening, middle, ending); this has some design impact (how to deal with cached evaluations when changing the EF? How to do alpha/beta pruning when the EF is different for different plies?)
To be honest, I'm not aware of how common chess engines solve the latter (I simply avoided it for my toy engine)
I'd refer to online resources like:
Computer Chess Programming Theory
Alpha/Beta Pruning
Iterative Deepening
Transposition Table
¹ just like the 'check'/'stalemate' conditions, if they are not special-cased outside the evaluation function anyway

Related

Chess programming: minimax, detecting repeats, transposition tables

I'm building a database of chess evaluations (essentially a map from a chess position to an evaluation), and I want to use this to come up with a good move for given positions. The idea is to do a kind of "static" minimax, i.e.: for each position, use the stored evaluation if evaluations for child nodes (positions after next ply) are not available, otherwise use max (white to move)/min (black to move) evaluations of child nodes (which are determined in the same way).
The problem is, of course, loops in the graph, i.e. repeating positions. I can't fathom how to deal with this without making this infinitely less efficient.
The ideas I have explored so far are:
assume an evaluation of 0 for any position that can be reached in a game with fewer moves than are currently evaluated. This is an invalid assumption, because - for example - if White plays A, it might not be desirable for Black to follow up with x, but if White plays B, then y -> A -> x -> -B -> -y might be the best line, resulting in the same position as A -> x, without any repetitions (-m denoting the inverse move of m here; lower case: Black moves, upper case: White moves).
having one instance for each possible way a position can be reached solves the loop problem, but this yields a bazillion instances in some positions and is therefore not practical
the fact that there is a loop from a position back to that position doesn't mean that it's a draw by repetition, because playing the repeating line may not be the best choice
I've tried iterating through the loops a few times to see if the overall evaluation would become stable. It doesn't, because in some cases, assuming the repeat is the best line means it isn't any longer - and then it goes back to the draw being the best line, etc.
I know that chess engines use transposition tables to detect positions already reached before, but I believe this doesn't address my problem, and I actually wonder if there isn't an issue with them: a position may be reachable through two paths in the search tree - one of them going through the same position before, so it's a repeat, and the other path not doing that. Then the evaluation for path 1 would have to be 0, but the one for path 2 wouldn't necessarily be (path 1 may not be the best line), so whichever evaluation the transposition table holds may be wrong, right?
I feel sure this problem must have a "standard / best practice" solution, but google failed me. Any pointers / ideas would be very welcome!
I don't understand what the problem is. A minimax evaluation, unless we've added randomness to it, will have the exact same result for any given board position combined with whose turn it is and other key info. If we have the space available to store common board_position + whose_turn + castling + en_passant + draw_related tuples (or hashes thereof), go right ahead. When reaching that tuple in any other evaluation, just return the stored value, or rely on its more detailed record for more complex evaluations (if the search yielding that record was not exhaustive, we can have different interpretations of it in any one evaluation). If the program also plays chess with time limits on the game, an additional time dimension (maybe a few broad blocks) would probably be needed in the memoisation as well.
(I assume you've read common public info about transposition tables.)
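As a sketch of what such a memoisation key could look like (Python; all field names here are illustrative, not a real engine's API):
    from collections import namedtuple
    # The key must carry every bit of state that affects the evaluation,
    # not just the piece placement.
    EvalKey = namedtuple('EvalKey', [
        'piece_placement',    # e.g. a FEN-like board string
        'side_to_move',
        'castling_rights',
        'en_passant_square',
        'halfmove_clock',     # for draw-related rules
    ])
    evaluations = {}          # EvalKey -> stored evaluation
    def lookup_or_compute(key, compute):
        if key not in evaluations:
            evaluations[key] = compute()
        return evaluations[key]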

Minimax: what to do with equal scores in endgame?

As far as I understood, the Minimax algorithm in its simplest form works the following way: traverse the game tree bottom-up and if it's the player's turn, assign the maximum score of all child nodes to the current node, else the minimum score. Leaves are scored according to the game output, let's say +1 for win, 0 for draw, -1 for loss. Finally choose the move which leads to the node with the highest score.
Of course it is impractical to traverse the whole game tree, so a heuristic is used. But assume we are near the end of the game. Then I have found some problems with this simple approach.
For example, we are playing chess and the player (playing White) has reached this position:
It is the player's turn. There is a mate in one with Qg7, so the node for Qg7 gets a score of 1. But, for instance, Ke1 is also a legal move. The only reply is c5, and then Qg7# is still available. And because Qg7 gets a score of 1, so does c5, and so does Ke1.
So we have at least two moves with a score of 1 (Ke1 and Qg7). Let's say the algorithm considers king moves first and chooses the first move with the highest score. That would mean that in this position the player would not checkmate the opponent, but would make random king moves until the opponent could actually prevent the checkmate (by queening the pawn).
The fundamental problem is that a checkmate in one (Qg7) has the same score as a checkmate in two (Ke1), so there is no reason for the player to actually go for the checkmate in one.
This can be prevented with a simple modification to the Minimax algorithm: in case of equal scores, choose the shorter path to the position with that score. So a checkmate in one would be preferred.
My question is: I have not found any mention of this in any Minimax-related source, so is there some misunderstanding of Minimax on my part? If not, is this the usual way to solve this or are there superior ways?
I'm pretty sure you understand minimax correctly.
The thing I would probably do is simply pass down the current distance in the minimax function and weight wins/losses according to it. A faster win (to reduce the possibility of unseen situations) and a slower loss (to allow for mistakes by the opponent) should generally be preferred. Whether the win is 1 or any positive value doesn't matter too much - it will still get picked as better than 0 or -1.
If you make a win the largest possible value in your heuristic, you can still do something similar - just weight it by increasing or decreasing it a little, while still keeping it larger than all other non-win values.
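A sketch of that weighting (Python; the WIN constant is an assumption, chosen to dominate every ordinary heuristic score):
    WIN = 10000  # assumed larger than any non-terminal heuristic value
    def terminal_score(result, depth_from_root):
        # Shrink wins and losses toward zero the further they sit from
        # the root: faster wins and slower losses then score better.
        if result == 'win':
            return WIN - depth_from_root
        if result == 'loss':
            return -WIN + depth_from_root
        return 0  # draw
A mate in one then scores WIN - 1, strictly better than a mate in two at WIN - 3, so the search prefers the immediate checkmate.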
For your example, it probably doesn't really matter, as, when the pawn gets close to promoting, you'd detect that a draw is coming and then you'd make the winning move. But it can certainly be a problem if:
There's a sequence of inevitable moves beyond the search depth that results in a worse outcome for you (which is a pretty common problem with minimax, but if it's likely avoidable, it's obviously better to do so)
There's a time constraint on your side
You don't cater for all draw conditions (e.g. threefold repetition)

Tic Tac Toe AI Bugs

I'm trying to implement an AI for Tic Tac Toe that is smart enough to never lose. I've tried two different algorithms but the AI still makes mistakes.
I started with this minimax alpha-beta pruning algorithm. Here's a live demo: http://iioengine.com/ttt/minimax.htm
It runs without error, but if you take the bottom left corner first, then either of the other two squares on the bottom row, the AI doesn't see that coming. I'm sure this isn't a flaw in the minimax algorithm - can anyone see an error in my source? You can inspect the demo page to see everything, but here is the primary AI function:
function bestMove(board,depth,low,high,opponent){
  var best=new Move(null,-iio.maxInt);
  var p;
  for (var c=0;c<grid.C;c++)
    for(var r=0;r<grid.R;r++){
      if (board[c][r]=='_'){
        var nuBoard=board.clone();
        nuBoard[c][r]=getTypeChar(opponent);
        if(checkWin(nuBoard,getTypeChar(opponent)))
          p=new Move([c,r],-evaluateBoard(board,getTypeChar(opponent))*10000);
        else if (checkScratch(nuBoard))
          p=new Move([c,r],0);
        else if (depth==0)
          p=new Move([c,r],-evaluateBoard(board,getTypeChar(opponent)));
        else {
          p=bestMove(nuBoard,depth-1,-high,-low,!opponent);
        }
        if (p.score>best.score){
          best=p;
          if (best.score > low)
            low=best.score;
          if (best.score >= high) return best;
        }
      }
    }
  return best;
}
If you are more familiar with negamax, I tried that one too. I lifted the logic straight from this page. Here is a live demo: http://iioengine.com/ttt/negamax.htm
That one freezes up once you reach a win state, but you can already see that the AI is pretty stupid. Is something wrong with the code integration?
Please let me know if you find a flaw in my code that prevents these algorithms from running properly. Thanks.
Update with code:
function checkWin(board,type){
  for (var i=0;i<3;i++)
    if (evaluateRow(board,[i,0,i,1,i,2],type) >= WIN_SCORE
      ||evaluateRow(board,[0,i,1,i,2,i],type) >= WIN_SCORE)
      return true;
  if(evaluateRow(board,[0,0,1,1,2,2],type) >= WIN_SCORE
    ||evaluateRow(board,[2,0,1,1,0,2],type) >= WIN_SCORE)
    return true;
  return false;
}
function evaluateBoard(board,type){
  var moveTotal=0;
  for (var i=0;i<3;i++){
    moveTotal+=evaluateRow(board,[i,0,i,1,i,2],type);
    moveTotal+=evaluateRow(board,[0,i,1,i,2,i],type);
  }
  moveTotal+=evaluateRow(board,[0,0,1,1,2,2],type);
  moveTotal+=evaluateRow(board,[2,0,1,1,0,2],type);
  return moveTotal;
}
The problem lies in your evaluateBoard() function. The evaluation function is the heart of a minimax/negamax algorithm. If your AI is behaving poorly, the problem usually lies in the evaluation of the board at each move.
For the evaluation of the board, you need to take into consideration three things: winning moves, blocking moves, and moves that result in a fork.
Winning Moves
The static evaluation function needs to know if a move results in a win or a loss for the current player. If the move results in a loss for the current player, it needs to return a very low negative number (lower than any regular move). If the move results in a win for the current player, it needs to return a very high positive number (larger than any regular move).
What is important to remember is that this evaluation has to be relative to the player whose move the AI is simulating. If the AI is currently predicting where the Human player will move, then the evaluation must look at the board from the point of view of the Human player. When it's the AI's turn to move, the evaluation must look at the board from the point of view of the Computer player.
Blocking Moves
When you run your evaluation function, the AI actually doesn't think blocking the Human player is beneficial. Your evaluation function looks like it just counts the number of available moves and returns the result. Instead, you need to return a higher positive number for moves that will help the AI win.
To account for blocking, you need to figure out if a player has 2 of their tokens in an open row, column, or diagonal, and then score the blocking square higher than any other square. So if it is the Computer's turn to move, and the Human player has 2 tokens in an open row, the 3rd square in the row needs to have a high positive number (but not as high as a winning square). This will cause the computer to favor that square over any others.
By just accounting for Winning moves and Blocking moves, you will have a Computer that plays fairly well.
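A sketch of such a scoring scheme (written in Python rather than the question's JavaScript; the [row][col] indexing and the exact weights are illustrative, the only requirement being win > block > threat):
    # All 8 winning lines of a 3x3 board as (row, col) triples.
    LINES = ([[(r, 0), (r, 1), (r, 2)] for r in range(3)]
           + [[(0, c), (1, c), (2, c)] for c in range(3)]
           + [[(0, 0), (1, 1), (2, 2)], [(2, 0), (1, 1), (0, 2)]])
    def score_board(board, me, them):
        score = 0
        for line in LINES:
            cells = [board[r][c] for r, c in line]
            if cells.count(me) == 3:
                score += 10000       # I have completed a winning line
            elif cells.count(them) == 3:
                score -= 10000       # the opponent has already won
            elif cells.count(them) == 2 and cells.count(me) == 0:
                score -= 500         # open two-in-a-row: must be blocked
            elif cells.count(me) == 2 and cells.count(them) == 0:
                score += 100         # my own open two-in-a-row (threat)
        return score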
Forking Moves
Forking moves cause problems for the Computer. The main problem is that the Computer is 'too smart' for its own good. Since it assumes that the Human player will always make the best move every time, it will find situations where all the moves it could make will eventually end in a loss for it, so it will just pick the first move on the board it can, since nothing else matters.
If we go through your example, we can see this happen: Human player plays bottom left, Computer plays top middle.
   | O |
---+---+---
   |   |
---+---+---
 X |   |
When the Human player makes a move to the bottom right corner, the Computer sees that if it tries to block that move, the best move the Human player could make is to take the middle square, resulting in a fork and a win for the Human (although this won't happen every time, since Humans are fallible - but the Computer doesn't know that).
   | O |
---+---+---
   | X |
---+---+---
 X | O | X
Because the computer will lose whether it blocks or doesn't block the Human from winning, blocking the Human will actually bubble up the lowest possible score (since it results in a loss for the Computer). This means that the Computer will take the best score it can - the middle square.
You'll have to figure out what is the best way to handle such situations, since everyone would play it differently. It's just something to be aware of.
With pure Minimax implementation for Tic-Tac-Toe, the A.I. should never lose. At worst, it should go into a draw.
By pure Minimax, I mean an implementation that explores each and every possible move (actually transition from one move to the other) and creates a tree for said moves and transitions (starting with an empty board at the top of the tree, branching off in all possible first moves, then all possible 2nd moves, etc).
(There's also heuristic Minimax, in which you do not render all positions in the tree node from the start, but only go a certain depth.)
The tree should have as leaves only board positions that end the game (X wins, O wins, or draw). Such a tree for classic Tic-Tac-Toe (3x3 board) contains 5477 nodes (not counting the all-empty board at the top).
Once such a tree is created, the leaves are scored directly by simply evaluating how the game ended: top score for a leaf node containing a board state where A.I. wins, 0 score for draw, and lowest score for nodes with board state where the human player has won.
(in heuristic Minimax, you'll have to create a "guesstimation" function that evaluates the leaves of the partial tree and assigns min/0/max scores accordingly - in this implementation, there's a chance that the A.I. might lose at the end, and that chance is inversely proportional to how good your "guesstimator" function is at assessing partial game states.)
Next, all intermediate, non-leaf nodes of the tree are scored based on their children. Obviously, you'd do this bottom-up, as initially only the lowest non-leaf nodes have scored children (the leaf nodes) from which to draw their own score.
(In the context of Tic-Tac-Toe there's no point in making a heuristic implementation of Minimax, as it's fairly cheap to render a tree with 5477 + 1 nodes and then score them all. That kind of implementation is useful for games where there's a lot of branching - a lot of possible moves for a given game state - making for a slow, memory-hogging full tree, such as chess.)
In the end, you'll have a data structure containing all possible Tic-Tac-Toe games, and an exact idea of what's the best move to perform in response to any move the human player does. As such, due to how Tic-Tac-Toe rules work, Minimax A.I. will only win (if the human player makes at least one crucial mistake) or draw (if the human player always makes the best possible move). This stands true no matter who makes the first move.
I've implemented this myself, and it works as expected.
Here are some of the finer points (with which I've struggled a bit):
make sure the function you use to evaluate the board works well, i.e. that it correctly spots when there's a win/draw situation for either X or O. This function will be used on almost every node of your Minimax tree as you build it, and having it bug out will result in seemingly working but in fact flawed code. Test this part extensively.
Make sure you navigate your tree properly, especially when you're scoring intermediate nodes (but also when you're searching for the next move to make). A trivial solution is to build, alongside the tree, a hash table containing each intermediary (non-leaf) node per level of tree depth. This way you'll be sure to get all nodes at the right time when you do the bottom-up scoring.
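A compact sketch of that exhaustive scoring (Python; winner and moves are placeholder helpers you would implement for your board representation):
    def minimax(board, player, me):
        # `winner(board)` returns 'X', 'O' or None; `moves(board, player)`
        # yields (move, resulting_board) pairs. Both are placeholders.
        w = winner(board)
        if w is not None:
            return 1 if w == me else -1
        options = list(moves(board, player))
        if not options:
            return 0                       # full board, no winner: draw
        other = 'O' if player == 'X' else 'X'
        scores = [minimax(b, other, me) for _, b in options]
        return max(scores) if player == me else min(scores)
Because Tic-Tac-Toe's full tree is tiny, this recursion visits every game-ending leaf and backs the min/max scores up exactly as described above.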

Othello Evaluation Function

I am currently developing a simple AI for Othello using minimax and alpha-beta pruning.
My question is related to the evaluation function for the state of the board.
I am currently looking to evaluate it by looking at:
Disc count (parity)
Number of legal moves
Importance of particular positions
So let's say the root node is the initial game state. The first action is the AI's action, while the second action is the opponent's action.
          0
         / \        AI's action
        1   1
       / \   \      Opponent's action
      2   2   2
At node level 1, do I evaluate the disc count of my AI's chips and the number of legal moves it can make at the point of time after it has completed an action?
At node level 2, do I evaluate the disc count of the opponent's chips and the number of legal moves it can make at the point of time after the opponent has completed an action?
Meaning AI move -> Opponent move ==> at this point in time I evaluate the opponent's disc count and the number of legal moves the opponent can make.
When generating game trees, you shouldn't evaluate a node unless it is a leaf node. That is, you generate a tree down to a level N (which corresponds to a board with N moves made ahead of the current state of the board), unless you have reached a node which corresponds to an end-of-game situation. It is only in those nodes that you should evaluate the state of the board game with your evaluation function. That is what the minimax algorithm is about. The only case I know of in which you evaluate a node after every player move is the iterative deepening algorithm, which it seems you are not using.
The evaluation function is responsible for providing a quick assessment of the "score" of a particular position - in other words, which side is winning and by how much. It is also called a static evaluation function because it looks only at a specific board configuration. So yes, when you reach level N you can count the possible moves of both the computer and the user and subtract them. For example, if the result is positive it would mean that the computer has the advantage, if it is 0 it would mean a tie, and if it is negative it will represent a disadvantageous situation for the user in terms of mobility. Scoring a node which represents an end-of-game board configuration is trivial: assign a maximum value if you win and a minimum value if you lose.
Mobility is one of the most important features to consider in the evaluation function of most board games (those in which it is valuable). And to evaluate it, you count the possible moves of each player given a static board configuration, no matter whose turn comes next. Even if a player recently made a move, you are giving scores to boards in the same level N of the tree where the same player made the last move (therefore scoring them under the same conditions) and picking, of those, the one which has the best score.
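A sketch of that mobility term (Python; legal_moves is a placeholder for your move generator):
    def mobility_score(board, computer, user):
        # Static mobility: count both players' legal moves from this one
        # board configuration, regardless of whose turn comes next.
        return len(legal_moves(board, computer)) - len(legal_moves(board, user))
Positive values favour the computer, zero is balanced, and negative values favour the user, matching the reading described above.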
The features you are considering in your evaluation are very good. Usually, you want to consider material and mobility (which you are) in games in which they are very valuable (though I don't know if material is always an advantage in Othello; you should know better, as it is the game you are working on) for a winning situation, so I guess you are on the correct path.
EDIT: Be careful! In a leaf node the only thing you want to do is assign a particular score to a board configuration. It is in its parent node where that score is returned and compared with the other scores (corresponding to other children). In order to choose which is the best move available for a particular player, do the following: If the parent node corresponds to an opponent's move, then pick the one with the least (min)value. If it is the computer's turn to move, pick the score with the highest (max)value so that it represents the best possible move for this player.
End game evaluation function
If your search reaches a full board then the evaluation should simply be based on the disc count to determine who won.
Mid game evaluation function
The number of legal moves is useful in the opening and midgame as a large number of moves for you (and a low number for the opponent) normally indicates that you have a good position with lots of stable discs that your opponent cannot attack, while the opponent has a bad position where they may run out of moves and be forced to play a bad move (e.g. to let you play in the corner).
For this purpose, it does not matter greatly whose turn it is when counting the moves so I think you are on the correct path.
(Note that in the early stages of the game it is often an advantage to be the person with fewer discs as this normally means your opponent has few safe moves.)
Random evaluation function
Once upon a time I heard that just using a random number for the Othello evaluation function is (surprisingly to me) also a perfectly reasonable choice.
The logic is that the player with the most choices will be able to steer the game to get the highest random number, so this approach again means that the AI will favour moves which give it lots of choices and its opponent few.
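As a sketch, the whole mid-game evaluation then collapses to a single line (terminal positions would still be scored exactly):
    import random
    def random_eval(board):
        # The side with more branches gets more draws from this
        # distribution, so the search implicitly rewards mobility.
        return random.random()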

Minimax algorithm: Cost/evaluation function?

A school project has me writing a Date game in C++ (example at http://www.cut-the-knot.org/Curriculum/Games/Date.shtml) where the computer player must implement a Minimax algorithm with alpha-beta pruning. Thus far, I understand what the goal is behind the algorithm in terms of maximizing potential gains while assuming the opponent will minify them.
However, none of the resources I read helped me understand how to design the evaluation function that the minimax bases all its decisions on. All the examples have had arbitrary numbers assigned to the leaf nodes; however, I need to actually assign meaningful values to those nodes.
Intuition tells me it'd be something like +1 for a win leaf node, and -1 for a loss, but how do intermediate nodes evaluate?
Any help would be most appreciated.
The most basic minimax evaluates only leaf nodes, marking wins, losses and draws, and backs those values up the tree to determine the intermediate node values.  In the case that the game tree is intractable, you need to use a cutoff depth as an additional parameter to your minimax functions.  Once the depth is reached, you need to run some kind of evaluation function for incomplete states.
Most evaluation functions in a minimax search are domain specific, so finding help for your particular game can be difficult.  Just remember that the evaluation needs to return some kind of percentage expectation of the position being a win for a specific player (typically max, though not when using a negamax implementation).  Just about any less researched game is going to closely resemble another more researched game.  This one ties in very closely with the game pickup sticks.  Using minimax and alpha beta only, I would guess the game is tractable.  
If you must create an evaluation function for non-terminal positions, here is a little help with the analysis of the sticks game, and you can decide whether it's useful for the Date game or not.
Start looking for a way to force an outcome by looking at a terminal position and all the moves which can lead to that position.  In the sticks game, a terminal position is one with 3 or fewer sticks remaining on the last move.  The position that immediately precedes that terminal position therefore leaves 4 sticks to your opponent.  The goal is now to leave your opponent with 4 sticks no matter what, and that can be done from either 5, 6 or 7 sticks being left to you, so you would like to force your opponent to leave you in one of those positions.  The place your opponent needs to be in order for you to be at either 5, 6 or 7 is 8.  Continue this logic on and on and a pattern emerges very quickly: always leave your opponent with a number divisible by 4 and you win; anything else, you lose.
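That pattern is one line of code (a sketch of the sticks-game analysis, not of the Date game itself):
    def is_losing_position(sticks):
        # With a multiple of 4 on your move you lose: whatever k in
        # {1, 2, 3} you take, the opponent answers with 4 - k and
        # hands you a multiple of 4 again.
        return sticks % 4 == 0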
This is a rather trivial game, but the method for determining the heuristic is what is important because it can be directly applied to your assignment.  Since the last to move goes first, and you can only change 1 date attribute at a time, you know to win there needs to be exactly 2 moves left... and so on.
Best of luck, let us know what you end up doing.
The simplest case of an evaluation function is +1 for a win, -1 for a loss and 0 for any non-finished position. Given your tree is deep enough, even this simple function will give you a good player. For any non-trivial games, with high branching factor, typically you need a better function, with some heuristics (e.g. for chess you could assign weights to pieces and find a sum, etc.). In the case of the Date game, I would just use the simplest evaluation function, with 0 for all the intermediate nodes.
As a side note, minimax is not the best algorithm for this particular game; but I guess you know it already.
From what I understand of the Date game you linked to, it seems that the only possible outcomes for a player are win or lose; there is no in-between (please correct me if I'm wrong).
In this case, it is only a matter of assigning a value of 1 to a winning position (current player gets to Dec 31) and a value of -1 to the losing positions (other player gets to Dec 31).
Your minimax algorithm (without alpha-beta pruning) would look something like this:
A_move(day):
    if day == December 31:
        return +1
    else:
        outcome = -1
        for each next_day obtained by increasing the day or month in day:
            outcome = max(outcome, B_move(next_day))
        return outcome

B_move(day):
    if day == December 31:
        return -1
    else:
        outcome = +1
        for each next_day obtained by increasing the day or month in day:
            outcome = min(outcome, A_move(next_day))
        return outcome
