Minimax: what to do with equal scores in endgame? - algorithm

As far as I understand, the Minimax algorithm in its simplest form works the following way: traverse the game tree bottom-up and, if it's the player's turn, assign the maximum score of all child nodes to the current node, else the minimum score. Leaves are scored according to the game outcome, say +1 for a win, 0 for a draw, -1 for a loss. Finally, choose the move which leads to the node with the highest score.
Of course it is impractical to traverse the whole game tree, so a heuristic is used. But assume we are near the end of the game, where full traversal is feasible. Even then, I have found some problems with this simple approach.
For example, we are playing chess and the player (playing white) has reached this position:
It is the player's turn. There is a mate in one with Qg7, so the node for Qg7 gets a score of 1. But Ke1, for instance, is also a legal move. The only reply is c5, after which Qg7# is still available. And because Qg7 then gets a score of 1, so does c5, and so does Ke1.
So we have at least two moves with a score of 1 (Ke1 and Qg7). Let's say the algorithm considers king moves first and chooses the first move with the highest score. That would mean that in this position the player would not checkmate the opponent, but would make aimless king moves until the opponent could actually prevent the checkmate (by queening the pawn).
The fundamental problem is that a checkmate in one (Qg7) has the same score as a checkmate in two (Ke1), so there is no reason for the player to actually go for the checkmate in one.
This can be prevented with a simple modification to the Minimax algorithm: in case of equal scores, choose the shorter path to the position with that score. A checkmate in one would then be preferred.
My question is: I have not found any mention of this in any Minimax-related source, so is there some misunderstanding of Minimax on my part? If not, is this the usual way to solve this or are there superior ways?

I'm pretty sure you understand minimax correctly.
The thing I would probably do is simply pass the current depth down through the minimax function and weight wins and losses according to it. A faster win (to reduce the possibility of unseen complications) and a slower loss (to allow for mistakes by the opponent) should generally be preferred. Whether the win is 1 or any other positive value doesn't matter too much - it will still get picked as better than 0 or -1.
If a win is the largest possible value in your heuristic, you can still do something similar: adjust it up or down a little according to depth, but keep it larger than all other non-win values.
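A minimal sketch of that weighting (the state interface here - legal_moves(), apply(), is_won_by(), is_draw() and the opponent() helper - is hypothetical, not a real API):

WIN = 1000   # any value larger than every non-win heuristic score works

def minimax(state, depth, maximizing, side):
    # Terminal positions are scored by outcome, scaled by depth: a win
    # found earlier scores higher, a loss found later scores less negatively.
    if state.is_won_by(side):
        return WIN - depth
    if state.is_won_by(opponent(side)):
        return -WIN + depth
    if state.is_draw():
        return 0
    scores = [minimax(state.apply(m), depth + 1, not maximizing, side)
              for m in state.legal_moves()]
    return max(scores) if maximizing else min(scores)

With this weighting, Qg7 backs up WIN - 1 while Ke1 followed by c5 and Qg7# backs up WIN - 3, so the mate in one wins the tie-break.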
For your example it probably doesn't really matter, as when the pawn gets close to promoting, you'd detect that a draw is coming and then you'd make the winning move. But it can certainly be a problem if:
There's a sequence of inevitable moves beyond the search depth that results in a worse outcome for you (which is a pretty common problem with minimax, but if it's likely avoidable, it's obviously better to do so)
There's a time constraint on your side
You don't cater for all draw conditions (e.g. threefold repetition)

Related

AI algorithm for item pickup race

I would like to build an AI for the following game:
there are two players on a M x N board
each player can move up/down or left/right
there are different items on the board
the winner is the player who has more items than the other player in as many categories as possible (having more items in one category makes you the winner of that category; the player who wins more categories wins the game)
in one turn you can either pick up an item you are standing on or move
player moves are made at the same time
if two players are standing on the same field and both try to pick up the item, each has a 0.5 chance of getting it
The game ends if one of the following condition is met:
all the items have been picked up
there is already a clear winner because one player has more than half the items in more than half of the categories
I have no experience with AI, but I took a machine learning class some time ago.
How do I get started on such a problem?
Is there a generalization of this problem?
The canonical choice for adversarial search in games like the one you proposed (called two-player zero-sum games) is Minimax search. From Wikipedia, the goal of Minimax is to
Minimize the possible loss for a worst case (maximum loss) scenario. Alternatively, it can be thought of as maximizing the minimum gain.
Hence it is called minimax, or maximin. Essentially you build a tree of Max and Min levels, where each node has a branching factor equal to the number of possible actions on that turn (4 in your case). Each level corresponds to one player's turn, and the tree extends until the end of the game, allowing you to search for the optimal choice at each turn, assuming the opponent plays optimally as well. If your opponent does not play optimally, you will only score better. Essentially, at each node you simulate every possible game and choose the best action for the current turn.
If it seems like generating all possible games would take a long time, you are correct, it's an exponential complexity algorithm. From here you would want to investigate alpha-beta pruning, which essentially allows you to eliminate some of the possible games you are enumerating based on the values you have found so far, and is a fairly simple modification of minimax. This solution will still be optimal. I defer to the wikipedia article for further explanation.
From there, you would want to experiment with different heuristics for eliminating nodes, which can prune away a significant number of nodes to traverse. Do note, however, that eliminating nodes via heuristics will potentially produce a sub-optimal (but still good) solution, depending on your heuristic. One common tactic is to limit the depth of the search tree: you search, say, 5 moves ahead to determine the best current move, using an estimate of each player's score at that depth. Once again, this is a heuristic you can tweak. Something as simple as calculating the score of the game as if it ended on that turn might suffice, and it is definitely a good starting point.
Finally, for the nodes where probability is concerned, there is a slight modification of Minimax called Expectiminimax that essentially takes care of probability by adding a "third" player that makes the random choice for you. The nodes for this third player take the expected value of the random event as their value.
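A minimal sketch of how such chance nodes fit in (the node interface - kind, children(), and outcomes() yielding (probability, child) pairs - is hypothetical):

def expectiminimax(node):
    if node.is_terminal():
        return node.value()
    if node.kind == 'max':          # your turn: pick the best action
        return max(expectiminimax(c) for c in node.children())
    if node.kind == 'min':          # opponent's turn: assume the worst for you
        return min(expectiminimax(c) for c in node.children())
    # chance node (e.g. the 0.5 pickup): probability-weighted average
    return sum(p * expectiminimax(c) for p, c in node.outcomes())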
The usual approach to any such problem is to play the game against a live opponent long enough to find some heuristic solutions (short-term goals) that lead you to victory. Then you implement these heuristics in your solution. Start with really small boards (1x3) and a small number of categories (1), play them and see what happens, and then advance to more complicated cases.
Without playing the game I can only imagine that categories with fewer items are more valuable, as are categories with items currently closer to you, and categories with items that are farthest away from you but still closer to you than to the opponent.
Every category has a cost, which is the number of moves required to gain control of it, but the cost for you is different from the cost for the opponent, and it changes with every move. A category has greater value to you if your cost is near the opponent's cost but still less than it.
Every time you make a move the categories change their values, so you have to recalculate the board and go from there in deciding your next move. The goal is to maximize your values and minimize the opponent's values, assuming that the opponent uses the same algorithm as you.
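A sketch of that valuation (the cost estimates are hypothetical placeholders; how you compute moves-to-control is game-specific):

def category_value(my_cost, opp_cost):
    # my_cost / opp_cost: estimated moves each side needs to gain
    # control of the category; recompute these after every move.
    if my_cost >= opp_cost:
        return 0.0                  # the opponent wins the race: not worth chasing
    # highest when the race is close but still winnable by us
    return 1.0 / (1.0 + (opp_cost - my_cost))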
The search for the best move gets more complicated if you explore more than one turn in advance, but it is also more effective. In that case you have to simulate the opponent's moves using the same algorithm, and then choose the move to which the opponent has the weakest counter-move. This strategy is called minimax.
All this is not really an AI, but it is a road map for an algorithm. Neural networks mentioned in the other answer are more AI-like, but I don't know anything about them.
The goal of the AI is to always seek to maintain the win conditions.
If it is practical (depending on how item locations are stored), at the start of each turn the distance to all remaining items should be known to the AI. Ideally, this would be calculated once when the game is started, then simply "adjusted" based on where the AI moves, instead of recalculating at each turn. It would also be wise to have the AI track the same thing for the player, if the AI isn't going to consider only its own situation.
From there it is a matter of determining what item should be picked up as an optimization of the following considerations:
What items and item categories does the AI currently have?
What items and item categories does the player currently have?
What items and item categories are near the AI?
What items and item categories are near the Player?
Exactly how you do this largely depends on how difficult to beat you want the AI to be.
A simple way would be to use a greedy approach and simply go after the "current" best choice. This could be done by finding the closest item that is not in a category the player is already winning by some number of items (probably 1-3). This produces an AI that tries to win but doesn't think ahead, making it rather easy to predict.
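A sketch of that greedy rule (the item fields, the player_lead mapping and the distance() helper are all hypothetical stand-ins):

def greedy_target(items, ai_pos, player_lead, margin=2):
    # Skip categories the player already leads by `margin` or more items,
    # then chase the closest remaining item. distance() could be, e.g.,
    # Manhattan distance on the grid.
    candidates = [it for it in items if player_lead[it.category] < margin]
    if not candidates:
        return None                 # nothing worth picking up
    return min(candidates, key=lambda it: distance(ai_pos, it.pos))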
Allowing the greedy algorithm to check multiple turns ahead will improve it, and considering what the player will do will improve it further.
Heuristics will lead to a more realistic and harder-to-beat AI, possibly even one that is practically impossible to beat.

Othello Evaluation Function

I am currently developing a simple AI for Othello using minimax and alpha-beta pruning.
My question is related to the evaluation function for the state of the board.
I am currently planning to evaluate it by looking at:
Disc count (parity)
Number of legal moves
Importance of particular positions
So let's say the root node is the initial game state. The first action is the AI's action, while the second action is the opponent's action.
        0
       / \          AI's action
      1   1
     / \   \        Opponent's action
    2   2   2
At node level 1, do I evaluate the disc count of my AI's chips and the number of legal moves it can make at the point of time after it has completed an action?
At node level 2, do I evaluate the disc count of the opponent's chips and the number of legal moves it can make at the point of time after the opponent has completed an action?
Meaning AI move -> Opponent move ==> At this point I evaluate the opponent's disc count and the number of legal moves the opponent can make.
When generating game trees, you shouldn't evaluate a node unless it is a leaf node. That is, you generate a tree until level N (which corresponds to a board with N moves made ahead of the current state of the board), unless you have reached a node which corresponds to an end-of-game situation. It is only in those nodes that you should evaluate the state of the board with your evaluation function. That is what the minimax algorithm is about. The only case I know of in which you evaluate a node after every player move is the iterative deepening algorithm, which it seems you are not using.
The evaluation function is responsible for providing a quick assessment of the "score" of a particular position - in other words, which side is winning and by how much. It is also called a static evaluation function because it looks only at a specific board configuration. So yes, when you reach level N you can count the possible moves of both the computer and the user and subtract them. For example, if the result is positive it means that the computer has the advantage, if it is 0 it means a tie, and if it is negative it represents a disadvantage for the computer in terms of mobility. Scoring a node which represents an end-of-game board configuration is trivial: assign a maximum value if you win and a minimum value if you lose.
Mobility is one of the most important features to be considered in the evaluation function of most board games (those in which it is valuable). And to evaluate it, you count the possible moves of each player given a static board configuration, no matter whose turn comes next. Even though a player has just made a move, you are scoring boards at the same level N of the tree, where the same player made the last move (therefore scoring them under the same conditions), and picking the one with the best score.
The features you are considering in your evaluation are very good. Usually you want to consider material and mobility (which you are doing) in games where they are valuable for a winning situation (though I don't know if material is always an advantage in Othello; you should know better, as it is the game you are working on), so I guess you are on the correct path.
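A sketch of a static evaluation combining those three features (the board methods and the weights are illustrative stand-ins, not a tuned engine):

def evaluate(board, me, opp):
    # Positive scores favour `me`. disc_count(), legal_moves() and at()
    # are assumed helpers on a hypothetical board type.
    parity = board.disc_count(me) - board.disc_count(opp)
    mobility = len(board.legal_moves(me)) - len(board.legal_moves(opp))
    corners = sum(+1 if board.at(sq) == me else -1 if board.at(sq) == opp else 0
                  for sq in ((0, 0), (0, 7), (7, 0), (7, 7)))
    # Guessed weights to be tuned: corner ownership dominates, mobility
    # matters, raw disc count gets the smallest weight.
    return 25 * corners + 5 * mobility + 1 * parity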
EDIT: Be careful! In a leaf node the only thing you want to do is assign a particular score to a board configuration. It is in its parent node where that score is returned and compared with the other scores (corresponding to other children). In order to choose the best move available for a particular player, do the following: if the parent node corresponds to an opponent's move, pick the one with the least (min) value. If it is the computer's turn to move, pick the score with the highest (max) value so that it represents the best possible move for this player.
End game evaluation function
If your search reaches a full board then the evaluation should simply be based on the disc count to determine who won.
Mid game evaluation function
The number of legal moves is useful in the opening and midgame as a large number of moves for you (and a low number for the opponent) normally indicates that you have a good position with lots of stable discs that your opponent cannot attack, while the opponent has a bad position where they may run out of moves and be forced to play a bad move (e.g. to let you play in the corner).
For this purpose, it does not matter greatly whose turn it is when counting the moves so I think you are on the correct path.
(Note that in the early stages of the game it is often an advantage to be the person with fewer discs as this normally means your opponent has few safe moves.)
Random evaluation function
Once upon a time I heard that just using a random number for the Othello evaluation function is (surprisingly to me) also a perfectly reasonable choice.
The logic is that the player with the most choices will be able to steer the game to get the highest random number, and so this approach again means that the AI will favour moves which give it lots of choices, and his opponent few.
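As a sketch, the entire evaluation function then collapses to a single line (the board argument is deliberately ignored):

import random

def evaluate(board):
    # A uniformly random leaf score: the side with more legal moves gets
    # more random draws to maximize over, so mobility is rewarded implicitly.
    return random.random()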

Alpha-beta pruning for Minimax

I have spent a whole day trying to implement minimax without really understanding it. Now I think I understand how minimax works, but not alpha-beta pruning.
This is my understanding of minimax:
Generate a list of all possible moves, up until the depth limit.
Evaluate how favorable a game field is for every node on the bottom.
For every node (starting from the bottom), the score of that node is the highest score of its children if the layer is max. If the layer is min, the score of that node is the lowest score of its children.
Perform the move that has the highest score if you are trying to max it, or the lowest if you want the min score.
My understanding of alpha-beta pruning is that, if the parent layer is min and your node has a higher score than the minimum score, then you can prune it since it will not affect the result.
However, what I don't understand is: if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax). Which means that you'll still be using the same amount of CPU power.
Could anyone please point out what I am getting wrong? This answer ( Minimax explained for an idiot ) helped me understand minimax, but I don't get how alpha beta pruning would help.
Thank you.
To understand Alpha-Beta, consider the following situation. It's White's turn; white is trying to maximize the score, black is trying to minimize the score.
White evaluates moves A, B, and C and finds the best score is 20, with C. Now consider what happens when evaluating move D:
If white selects move D, we need to consider counter-moves by black. Early on, we find black can capture the white queen, and that subtree gets a MIN score of 5 due to the lost queen. However, we have not considered all of black's counter-moves. Is it worth checking the rest? No.
We don't care if black can get a score lower than 5, because white's move C could keep the score at 20. Black will not choose a counter-move with a score higher than 5 because he is trying to MINimize the score and has already found a move with a score of 5. For white, move C is preferred over move D as soon as the MIN for D (5 so far) goes below that of C (20 for sure). So we "prune" the rest of the tree there, pop back up a level and evaluate white moves E, F, G, H... to the end.
Hope that helps.
You don't need to evaluate the entire subtree of a node to decide its value. Alpha-Beta Pruning uses two dynamically computed bounds, alpha and beta, to bound the values that nodes can take.
Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).
Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
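Putting the two bounds together, here is a sketch of the standard recursion (children(), is_terminal() and evaluate() are stand-ins for your game; the initial call is alphabeta(root, depth, float('-inf'), float('inf'), True)):

def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or node.is_terminal():
        return evaluate(node)               # static evaluation at the leaves
    if maximizing:
        value = float('-inf')
        for child in node.children():
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # min player already has a better option elsewhere
                break                       # beta cutoff: skip the remaining children
        return value
    else:
        value = float('inf')
        for child in node.children():
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:               # max player already has a better option elsewhere
                break                       # alpha cutoff
        return value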
I've written a detailed explanation of Alpha Beta Pruning, its pseudocode and several improvements: http://kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/
(Very) short explanation for minimax:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them and give the board positions to the (opponent) evaluator.
The opponent evaluates the new board positions (for him, the opponent side) - by doing essentially the same thing, recursively calling (his opponent's) evaluator, unless the maximum depth or some other condition has been reached and a static evaluator is called - and then selects the maximum evaluation and sends it back to you.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
(Very) short explanation for α-β-pruning:
You (the evaluator of a board position) have the choice of playing n moves. You try all of them one by one and give the board positions to the (opponent) evaluator - but you also pass along your current evaluation (of your board).
The opponent evaluates the new board position (for him, the opponent side) and sends the evaluation back to you. But how does he do that? He has the choice of playing m moves. He tries all of them and gives the new board positions (one by one) to (his opponent) evaluator and then chooses the maximum one.
Crucial step: if any of those evaluations that he gets back is bigger than the minimum you gave him, it is certain that he will eventually return an evaluation value at least that large (because he wants to maximize). And you are sure to ignore that value (because you want to minimize), so he stops doing any more work for boards he hasn't yet evaluated.
You select the move that has the minimum of those evaluations. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
Here's a short answer -- you can know the value of a node without computing the precise value of all its children.
As soon as we know that a child node cannot be better, from the perspective of the parent-node player, than the previously evaluated sibling nodes, we can stop evaluating the child subtree. It's at least this bad.
I think your question hints at a misunderstanding of the evaluation function:
if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax)
I'm not completely sure what you meant there, but it sounds wrong. The evaluation function (EF) is usually a very fast, static position evaluation. This means that it needs only look at a single position and reach a 'verdict' from that. (IOW, you don't always evaluate a branch to n plies.)
Now many times, the evaluation truly is static, which means that the position evaluation function is completely deterministic. This is also the reason why the evaluation results are easily cacheable (since they will be the same each time a position is evaluated).
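A minimal sketch of that cacheability (position_key() is a hypothetical hashing helper; real engines use Zobrist hashing and a bounded transposition table rather than an unbounded dict):

evaluation_cache = {}

def cached_evaluate(position):
    # A deterministic static evaluation makes memoization safe: the same
    # position always yields the same score.
    key = position_key(position)
    if key not in evaluation_cache:
        evaluation_cache[key] = evaluate(position)
    return evaluation_cache[key]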
Now, for e.g. chess, there is usually quite a bit of overt/covert deviation from the above:
a position might be evaluated differently depending on game context (e.g. whether the exact position occurred earlier during the game; how many moves without pawn moves/captures have occurred; en passant and castling opportunities). The most common 'trick' to tackle this is to actually incorporate that state into the 'position'1
a different EF is usually selected for the different phases of the game (opening, middle, ending); this has some design impact (how to deal with cached evaluations when changing the EF? How to do alpha/beta pruning when the EF is different for different plies?)
To be honest, I don't know how common chess engines solve the latter (I simply avoided it for my toy engine)
I'd refer to an online resources like:
Computer Chess Programming Theory
Alpha/Beta Pruning
Iterative Deepening
Transposition Table
1 just like the 'check'/'stalemate' conditions, if they are not special-cased outside the evaluation function anyway

What is it that I don't understand about the minimax algorithm

I have a question about the minimax algorithm.
Let's say I have the following game tree, and I've added some random heuristic values to it.
As I've understood the minimax algorithm, it will choose the green path. However, this might not be the best thing to choose in this situation. Because the right child of the top node has the HIGHEST value that it can get, it's not the best move...
Since if the other player does the other move, my win chance is much less...
I'm sorry, I'm having a hard time expressing what I mean on this question. But how am I thinking wrong here?
The usual way to solve this is to proceed backwards from the lower layers of the tree. Let's check the lowermost four leaves first (the 10-20-15-20 part). Player 2 gets to choose from these if the game ever gets there, so P2 will choose the smaller ones, i.e. 10 and 15. We can then prune the 10-20-15-20 branches of the tree and replace them with 10 (for the leftmost two leaves) and 15 (for the rightmost two). Similarly, we can prune the -100 - 50 pair at the middle and replace them with -100 (not 50 as you did, because at this level it is player 2's turn and he will choose the smaller outcome), the -200 - -100 pair with -200 and so on. So, for me, it seems to be the case that you are taking the maximum at each branching point instead of alternating between the maximum and the minimum.
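A tiny runnable sketch of that bottom-up backup, using nested lists as the tree (the leaves are the 10-20-15-20 part):

def minimax(tree, maximizing):
    if not isinstance(tree, list):      # a leaf: return its heuristic value
        return tree
    values = [minimax(child, not maximizing) for child in tree]
    return max(values) if maximizing else min(values)

# Player 2 (min) backs up min(10, 20) = 10 and min(15, 20) = 15 at the
# bottom; the max level above then picks max(10, 15) = 15.
print(minimax([[10, 20], [15, 20]], maximizing=True))   # prints 15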
You should alternate between taking the minimum and the maximum. If you want to take 50, which is the maximum of 30 and 50, then you should have chosen -100 one level lower on the right side instead, etc. That's why the algorithm is called minimax.
The algorithm assumes both you and the 2nd player want to win, and will always choose the best move. Thus, in the question's tree - as I said in the comment - the last move (which the 2nd player makes) is left and not right. This makes the whole right subtree worthless for the first player, and the minimax algorithm will choose the following path (and not the one described in the question): left->left->right->left
It is true that the algorithm "gives you less chance to win" - this is because of the fact that there is a 2nd player, who wants to win as well!
Have a look at his example.
Here, the X player wants to avoid defeat, so he pursues the '0' in the first step. Note that if (in the example) he took left first, the 2nd player would then take left again and win! The algorithm assures the best possibility - assuming the 2nd player acts the same way (and assuming it knows the whole game tree).

Minimax algorithm: Cost/evaluation function?

A school project has me writing a Date game in C++ (example at http://www.cut-the-knot.org/Curriculum/Games/Date.shtml) where the computer player must implement a Minimax algorithm with alpha-beta pruning. Thus far, I understand the goal behind the algorithm in terms of maximizing potential gains while assuming the opponent will minimize them.
However, none of the resources I read helped me understand how to design the evaluation function that minimax bases all its decisions on. All the examples have arbitrary numbers assigned to the leaf nodes; however, I need to actually assign meaningful values to those nodes.
Intuition tells me it'd be something like +1 for a win leaf node, and -1 for a loss, but how do intermediate nodes evaluate?
Any help would be most appreciated.
The most basic minimax evaluates only leaf nodes, marking wins, losses and draws, and backs those values up the tree to determine the intermediate node values.  In the case that the game tree is intractable, you need to use a cutoff depth as an additional parameter to your minimax functions.  Once the depth is reached, you need to run some kind of evaluation function for incomplete states.
Most evaluation functions in a minimax search are domain specific, so finding help for your particular game can be difficult.  Just remember that the evaluation needs to return some kind of percentage expectation of the position being a win for a specific player (typically max, though not when using a negamax implementation).  Just about any less researched game is going to closely resemble another more researched game.  This one ties in very closely with the game pickup sticks.  Using minimax and alpha beta only, I would guess the game is tractable.  
If you must create an evaluation function for non-terminal positions, here is a little help with the analysis of the sticks game; you can decide whether it's useful for the date game or not.
Start looking for a way to force an outcome by looking at a terminal position and all the moves which can lead to that position. In the sticks game, a terminal position is one with 3 or fewer sticks remaining on the last move. The position that immediately precedes that terminal position therefore leaves 4 sticks to your opponent. The goal is now to leave your opponent with 4 sticks no matter what, and that can be done from 5, 6 or 7 sticks being left to you, and you would like to force your opponent to leave you in one of those positions. The place your opponent needs to be in order for you to be at either 5, 6 or 7 is 8. Continue this logic on and on and a pattern becomes apparent very quickly. Always leave your opponent with a number divisible by 4 and you win; anything else, you lose.
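A sketch of that pattern (assuming the sticks variant where you take 1-3 sticks per turn and taking the last stick wins):

def winning_move(sticks):
    # Leave the opponent a multiple of 4 and you can mirror their moves
    # (they take k, you take 4 - k) all the way down to zero.
    take = sticks % 4
    return take if take != 0 else None   # None: lost against perfect play

for n in range(1, 10):
    print(n, winning_move(n))
# 4 and 8 print None - whoever must move facing a multiple of 4 loses.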
This is a rather trivial game, but the method for determining the heuristic is what is important, because it can be directly applied to your assignment. Since the last to move goes first, and you can only change 1 date attribute at a time, you know that to win there need to be exactly 2 moves left... and so on.
Best of luck, let us know what you end up doing.
The simplest case of an evaluation function is +1 for a win, -1 for a loss and 0 for any non-finished position. Given that your tree is deep enough, even this simple function will give you a good player. For any non-trivial game with a high branching factor, you typically need a better function, with some heuristics (e.g. for chess you could assign weights to pieces and compute a weighted sum, etc.). In the case of the Date game, I would just use the simplest evaluation function, with 0 for all the intermediate nodes.
As a side note, minimax is not the best algorithm for this particular game; but I guess you know it already.
From what I understand of the Date game you linked to, it seems that the only possible outcomes for a player are a win or a loss; there is no in-between (please correct me if I'm wrong).
In this case, it is only a matter of assigning a value of 1 to a winning position (current player gets to Dec 31) and a value of -1 to the losing positions (other player gets to Dec 31).
Your minimax algorithm (without alpha-beta pruning) would look something like this:
from functools import lru_cache

DAYS_IN_MONTH = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def successors(date):
    # All later dates reachable by increasing either the day or the month.
    month, day = date
    for d in range(day + 1, DAYS_IN_MONTH[month] + 1):
        yield (month, d)
    for m in range(month + 1, 13):
        if day <= DAYS_IN_MONTH[m]:
            yield (m, day)

@lru_cache(maxsize=None)
def a_move(date):
    # A is to move from `date`, a (month, day) pair; A maximizes.
    if date == (12, 31):
        return -1                 # B named December 31 last move: A lost
    return max(b_move(nxt) for nxt in successors(date))

@lru_cache(maxsize=None)
def b_move(date):
    # B is to move from `date`; B minimizes.
    if date == (12, 31):
        return +1                 # A named December 31: A won
    return min(a_move(nxt) for nxt in successors(date))

# a_move((1, 1)) now tells you whether the first player can force a win.
