Monte Carlo Tree Search: Tree Policy for two player games - algorithm

I am a little confused about how the MCTS "Tree Policy" is implemented. Every paper or article I read talks about going down the tree from the current game state (in MCTS terminology, the root for the player about to make a move). My question is how I select the best child when I am at the MIN player's level (assuming I am the MAX player). Even if I select some particular action that MIN might take, and my search tree gets deeper through that node, the MIN player might just as well choose a different node during its turn. (If the MIN player is an amateur human, it might well choose a node that is not necessarily the best.) This makes MAX's entire work in propagating through that node futile, since MIN has chosen a different node.
For the steps, I am referring to:
https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
where the tree policy: https://jeffbradberry.com/images/mcts_selection.png
kind of makes me believe that they are executing it from a single player's perspective.

To implement MCTS for a two-player game, you can simply flip the sign of the reward at every step of back-propagation, a one-line change in the code.
This means we are trying to maximize the reward at every layer, but as we propagate the reward up the tree, a positive reward for your opponent becomes a negative reward for you when you reach your layer.
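As a minimal sketch of that sign flip, assuming a hypothetical `Node` class whose field names (`parent`, `visits`, `value_sum`) are illustrative and not taken from the linked article, back-propagation might look like this:

```python
class Node:
    """Hypothetical search-tree node; the field names are illustrative."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        # Total reward, seen from the player who made the move INTO this node.
        self.value_sum = 0.0


def backpropagate(node, reward):
    """Walk from a leaf up to the root, flipping the reward sign at each level.

    `reward` is from the point of view of the player who made the move
    leading to `node` (+1 win, -1 loss, 0 draw).
    """
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        reward = -reward  # the one-line change: my gain is my opponent's loss
        node = node.parent


root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)
backpropagate(grandchild, 1.0)
```

Because every node stores its value from the point of view of the player who chose it, the selection step can pick the highest-scoring child at MAX and MIN levels alike. If the opponent then plays a different move, the statistics already gathered under that other move are still available in its own subtree.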

For MCTS, you need some way of generating a reasonable estimate of the probability distribution of possible moves. For AlphaGo [1], this is the fast rollout policy, $p_\pi$ in the paper, which takes a state and outputs a rough probability distribution over all possible moves. In the paper, the rollout policy is a linear softmax over small pattern features trained on expert moves; the larger policy network is the one trained first on expert games and then improved by playing against itself.
[1] http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html

Related

Game Tree Algorithms & Progressive Deepening: How to approximate an answer without reaching the leaf nodes?

I just saw this MIT lecture on game trees and minimax algorithms, where alpha-beta pruning and progressive deepening were discussed.
https://www.youtube.com/watch?v=STjW3eH0Cik
So, if I understand correctly, progressive deepening is when you try to approximate the answer at every level and go as deep towards the leaf nodes as the time limit for your move allows. It's important to have some answer at any point in time.
Now, at 36:22 the professor discusses the case where we don't have enough time and only get as far as level (d-1), where d is the depth of the tree. He also suggests that we can keep a temporary answer at every level as we go down, since we should have some approximate answer at any point in time.
My question is how we can have any answer without going to the leaf nodes, because it's only at the leaf nodes that we can conclude who wins the game. Think of tic-tac-toe. At level (d-1) we don't have enough information to decide whether the series of moves leading to a node will win or lose me the game. At higher levels, say (d-3), it's even blurrier! Everything is still possible as we go down, isn't it? So, if an algorithm decides to compute only down to level (d-1), then all those path options are equal! Nothing guarantees a win and nothing guarantees a loss at level (d-1), because, if I understand correctly, wins and losses can be calculated only at the leaf nodes. This is especially true in the pure minimax algorithm.
So how exactly are we going to have an 'approximate answer' at level (d-1) or, say, level (d-5)?
I will try to explain this clearly.
Context and importance of progressive deepening
In a real-world game, the time available to decide on a move is limited (because of user experience, other human-computer interaction concerns, or the design of your game). You have a game tree and a choice of algorithms that optimize how you traverse it. But there are three problems:
You have a time constraint.
You need to calculate the best solution in your current game tree, and the time that takes depends on the depth of the tree.
You need to decide whether to go deeper in the tree for a more precise answer without violating the time constraint.
The answer to all three problems is progressive deepening: at the current level you calculate an answer and then try to move on to the next level of the tree; if you run out of time, you already have the answer from the previous level and return that.
The answer to your question
Imagine that the current level of your tree is the final level of the game tree (you are pretending). You will get a better solution if you go to the next level, so if you can, go there. But you first need to calculate the best answer for the current tree, treating it as the final level and scoring its nodes with a static evaluation, as an insurance policy in case the time constraint stops you from finishing the calculation at the next level.
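The idea can be sketched on a toy game. The example below assumes a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) and a static evaluation that simply returns 0 when the search is cut off; the game, the function names, and the `max_depth` cap are all assumptions made for illustration. For brevity it only checks the clock between depths, whereas a real engine would also abort in the middle of an iteration.

```python
import time


def negamax(pile, depth):
    """Depth-limited search; score is for the player to move (+1 win, -1 loss, 0 unknown)."""
    if pile == 0:
        return -1  # the previous player took the last stone
    if depth == 0:
        return 0   # static evaluation at the cutoff: no information, call it even
    return max(-negamax(pile - take, depth - 1) for take in (1, 2, 3) if take <= pile)


def best_move_at_depth(pile, depth):
    """Best move if the tree is treated as ending `depth` plies from now."""
    moves = [take for take in (1, 2, 3) if take <= pile]
    return max(moves, key=lambda take: -negamax(pile - take, depth - 1))


def progressive_deepening(pile, time_budget_s, max_depth=20):
    """Search depth 1, 2, 3, ... and keep the answer from the last completed depth."""
    deadline = time.monotonic() + time_budget_s
    answer = None
    for depth in range(1, max_depth + 1):
        # Always finish depth 1 so that some answer exists; after that obey the clock.
        if answer is not None and time.monotonic() >= deadline:
            break
        answer = best_move_at_depth(pile, depth)
    return answer


move = progressive_deepening(5, time_budget_s=1.0)  # take 1 stone, leaving 4
```

At shallow depths the cutoff nodes all score 0, so the early answers are rough guesses; each deeper pass replaces the guess with a better-informed one.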

AI algorithm for item pickup race

I would like to build an AI for the following game:
there are two players on a M x N board
each player can move up/down or left/right
there are different items on the board
the winner is the player who leads in more categories (having more items in a category wins that category; the player who wins more categories wins the game)
in one turn you can either pick up an item you are standing on or move
player moves are made at the same time
two players standing on the same field have a 0.5 pickup chance if both do it
The game ends if one of the following conditions is met:
all the items have been picked up
there is already a clear winner since one player has more than half the items of more than half of the categories
I have no background in AI, but I took a machine learning class some time ago.
How do I get started on such a problem?
Is there a generalization of this problem?
The canonical choice for adversarial search games like the one you propose (called two-player zero-sum games) is Minimax search. From Wikipedia, the goal of Minimax is to
Minimize the possible loss for a worst case (maximum loss) scenario. Alternatively, it can be thought of as maximizing the minimum gain.
Hence, it is called minimax, or maximin. Essentially you build a tree of Max and Min levels, where each node has a branching factor equal to the number of possible actions at that turn, up to 5 in your case (four directions plus picking up). Each level corresponds to one of the players' turns, and the tree extends until the end of the game, allowing you to search for the optimal choice at each turn, assuming the opponent is playing optimally as well. If your opponent is not playing optimally, you will only score better. Essentially, at each node you simulate every possible game and choose the best action for the current turn.
If it seems like generating all possible games would take a long time, you are correct: it's an algorithm of exponential complexity. From here you would want to investigate alpha-beta pruning, a fairly simple modification of minimax that lets you eliminate some of the possible games you are enumerating based on the values you have found so far. This solution will still be optimal. I defer to the Wikipedia article for further explanation.
From there, you would want to experiment with different heuristics for eliminating nodes, which could prune a significant number of nodes from the tree. Do note, however, that eliminating nodes via heuristics may produce a sub-optimal but still good solution, depending on your heuristic. One common tactic is to limit the depth of the search tree: you search maybe 5 moves ahead to determine the best current move, using an estimate of each player's score at 5 moves ahead. Once again, this is a heuristic you could tweak. Something like simply calculating the score of the game as if it ended on that turn might suffice, and is definitely a good starting point.
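A minimal sketch of depth-limited minimax with alpha-beta pruning follows. The `children` and `evaluate` callbacks and the nested-list toy tree are assumptions made for illustration; for the item game they would be replaced by a move generator and a score estimate.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning over caller-supplied callbacks."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)  # leaf, or the depth limit was reached
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Min already has a better option elsewhere; skip the rest
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break  # Max already has a better option elsewhere; skip the rest
    return value


# Toy tree (hypothetical data): nested lists are positions, integers are final scores.
visited = []


def children(state):
    return state if isinstance(state, list) else []


def evaluate(state):
    visited.append(state)
    # Crude stand-in heuristic: unfinished positions are scored as even.
    return state if isinstance(state, int) else 0


tree = [[3, 5], [2, 9]]
full = alphabeta(tree, 2, -math.inf, math.inf, True, children, evaluate)
```

With the full depth of 2 the result is 3, and the leaf 9 is never evaluated because the second branch is cut off as soon as the 2 is seen. With a depth of 1 the search stops at the inner lists and has to rely on the heuristic instead.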
Finally, for the nodes where probability is concerned, there is a slight modification of Minimax called Expectiminimax that essentially takes care of probability by adding a "third" player that chooses the random choice for you. The nodes for this third player take the expected value of the random event as their value.
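As a rough illustration of chance nodes, the sketch below evaluates a tiny hand-built tree. The tuple encoding (`"max"`, `"min"`, `"chance"`) and the payoff numbers are assumptions for the example only; the 0.5 probability mirrors the contested pickup rule in the question.

```python
def expectiminimax(node):
    """Value of a tree whose nodes are numbers or (kind, branches) tuples."""
    if isinstance(node, (int, float)):
        return node  # leaf: final or estimated score
    kind, branches = node
    if kind == "max":
        return max(expectiminimax(b) for b in branches)
    if kind == "min":
        return min(expectiminimax(b) for b in branches)
    if kind == "chance":
        # branches is a list of (probability, subtree) pairs
        return sum(p * expectiminimax(b) for p, b in branches)
    raise ValueError(f"unknown node kind: {kind!r}")


# Contest a shared item (50% to win it, 50% to lose it) or walk to a safer item.
contested_pickup = ("chance", [(0.5, 1.0), (0.5, -1.0)])
safe_move = 0.25
tree = ("max", [contested_pickup, safe_move])
value = expectiminimax(tree)
```

Here the contested pickup has an expected value of 0, so the Max player prefers the safer move worth 0.25.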
The usual approach to any such problem is to play the game against a live opponent long enough to find some heuristic solutions (short-term goals) that lead you to victory. Then you implement these heuristics in your solution. Start with really small boards (1x3) and a small number of categories (1), play them and see what happens, and then advance to more complicated cases.
Without playing the game I can only imagine that categories with fewer items are more valuable, as are categories with items currently closer to you, and categories with items that are farthest away from you but still closer to you than to the opponent.
Every category has a cost, which is the number of moves required to gain control of it, but the cost for you is different from the cost for the opponent, and it changes with every move. A category has greater value to you if your cost is close to the opponent's cost but still lower than it.
Every time you make a move, categories change their values, so you have to re-evaluate the board and go from there in deciding your next move. The goal is to maximize your values and minimize the opponent's values, assuming that the opponent uses the same algorithm as you.
The search for the best move gets more complicated if you explore more than one turn in advance, but it is also more effective. In this case you have to simulate the opponent's moves using the same algorithm, and then choose the move to which the opponent has the weakest counter-move. This strategy is called minimax.
All this is not really an AI, but it is a road map for an algorithm. Neural networks mentioned in the other answer are more AI-like, but I don't know anything about them.
The goal of the AI is to always seek to maintain the win conditions.
If it is practical (depending on how item locations are stored), the distance to all remaining items should be known to the AI at the start of each turn. Ideally, this would be calculated once when the game starts, then simply adjusted based on where the AI moves, instead of being recalculated every turn. It would also be wise to have the AI track the same distances for the player, unless the AI is only going to consider its own situation.
From there it is a matter of determining which item should be picked up, as an optimization of the following considerations:
What items and item categories does the AI currently have?
What items and item categories does the player currently have?
What items and item categories are near the AI?
What items and item categories are near the Player?
Exactly how you do this largely depends on how difficult to beat you want the AI to be.
A simple way would be to use a greedy approach and simply go after the "current" best choice. This could be done by finding the closest item that is not in a category the player is already winning by some margin (probably 1-3 items). This produces an AI that tries to win but doesn't think ahead, which makes it rather easy to predict.
Allowing the greedy algorithm to check multiple turns ahead will improve it, and considering what the player will do will improve it further.
Heuristics will lead to a more realistic and harder-to-beat AI, possibly even one that is practically impossible to beat.
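A rough sketch of that greedy choice is shown below. The data layout (items as `(position, category)` pairs, per-category counts in dictionaries) and the `give_up_margin` parameter are assumptions made for the example, and Manhattan distance is used because players move only up, down, left, or right.

```python
def manhattan(a, b):
    """Number of up/down/left/right steps between two board squares."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_target(ai_pos, items, ai_counts, player_counts, give_up_margin=2):
    """Pick the closest item whose category is not already lost.

    items: list of ((row, col), category) pairs still on the board.
    A category counts as lost when the player leads it by `give_up_margin` or more.
    """
    worthwhile = [
        (pos, cat)
        for pos, cat in items
        if player_counts.get(cat, 0) - ai_counts.get(cat, 0) < give_up_margin
    ]
    candidates = worthwhile or list(items)  # fall back to any item if all are lost
    if not candidates:
        return None
    return min(candidates, key=lambda item: manhattan(ai_pos, item[0]))


items = [((0, 1), "gem"), ((3, 3), "coin")]
target = greedy_target((0, 0), items, ai_counts={}, player_counts={"gem": 3})
```

In this example the nearby gem is skipped because the player already leads that category by 3, so the AI heads for the coin instead.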

Othello Evaluation Function

I am currently developing a simple AI for Othello using minimax and alpha-beta pruning.
My question is related to the evaluation function for the state of the board.
I am currently looking to evaluate it by looking at:
Disc count (parity)
Number of legal moves
Importance of particular positions
So let's say the root node is the initial game state. The first action is the AI's action, while the second action is the opponent's action.
      0
     / \        AI's action
    1   1
   / \   \      Opponent's action
  2   2   2
At node level 1, do I evaluate the disc count of my AI's chips and the number of legal moves it can make at the point of time after it has completed an action?
At node level 2, do I evaluate the disc count of the opponent's chips and the number of legal moves it can make at the point of time after the opponent has completed an action?
Meaning AI move -> Opponent move ==> At this point in time I evaluate the opponent's disc count and the number of legal moves the opponent can make.
When generating game trees, you shouldn't evaluate a node unless it is a leaf node. That is, you generate the tree down to a level N (which corresponds to a board N moves ahead of the current state), stopping earlier only at nodes that correspond to an end-of-game situation. Only at those nodes should you evaluate the state of the board with your evaluation function. That is what the minimax algorithm is about. The only case I know of in which you evaluate nodes after every player move is the iterative deepening algorithm, which it seems you are not using.
The evaluation function is responsible for providing a quick assessment of the "score" of a particular position - in other words, which side is winning and by how much. It is also called a static evaluation function because it looks only at a specific board configuration. So yes, when you reach level N you can count the possible moves of both the computer and the user and subtract one from the other. For example, a positive result would mean that the computer has the advantage, 0 would mean the two sides are level, and a negative result would mean the computer is at a disadvantage in terms of mobility. Scoring a node that represents an end-of-game board configuration is trivial: assign a maximum value if you win and a minimum value if you lose.
Mobility is one of the most important features to consider in the evaluation function of most board games (those in which it is valuable). To evaluate it, you count the possible moves of each player for a static board configuration, no matter whose turn is next. Even though one player has just moved, all the boards you score sit at the same level N of the tree, where the same player made the last move, so they are scored under the same conditions, and you pick the one with the best score.
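A minimal sketch of such a static evaluation is given below. It assumes a board stored as eight strings using 'X', 'O' and '.', always scores from the AI's point of view regardless of who moved last, and uses placeholder weights (`w_parity`, `w_mobility`, `w_corner`) that are guesses to be tuned, not recommended values.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]


def other(player):
    return "O" if player == "X" else "X"


def is_legal(board, row, col, me):
    """True if `me` may place a disc on (row, col), i.e. it flips at least one disc."""
    if board[row][col] != ".":
        return False
    opp = other(me)
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        seen_opp = False
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opp:
            r, c = r + dr, c + dc
            seen_opp = True
        if seen_opp and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == me:
            return True
    return False


def mobility(board, me):
    """Number of legal moves `me` would have on this board."""
    return sum(is_legal(board, r, c, me) for r in range(8) for c in range(8))


def evaluate(board, ai, w_parity=1, w_mobility=5, w_corner=25):
    """Static score from the AI's point of view; positive means the AI is ahead."""
    opp = other(ai)
    discs = lambda p: sum(row.count(p) for row in board)
    corners = lambda p: sum(board[r][c] == p for r, c in CORNERS)
    return (
        w_parity * (discs(ai) - discs(opp))
        + w_mobility * (mobility(board, ai) - mobility(board, opp))
        + w_corner * (corners(ai) - corners(opp))
    )


START = ["........", "........", "........", "...OX...",
         "...XO...", "........", "........", "........"]
score = evaluate(START, "X")  # symmetric opening position, so the score is 0
```

The same function is applied to every leaf at level N, so all candidate boards are compared under identical conditions.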
The features you are considering in your evaluation are very good. Usually, you want to consider material and mobility (which you are) in games in which they are very valuable (though, I don't know if material is always an advantage in Othello, you should know it better as it is the game you are working on) for a winning situation so I guess you are on the correct path.
EDIT: Be careful! In a leaf node the only thing you want to do is assign a particular score to a board configuration. It is in its parent node where that score is returned and compared with the other scores (corresponding to other children). In order to choose which is the best move available for a particular player, do the following: If the parent node corresponds to an opponent's move, then pick the one with the least (min)value. If it is the computer's turn to move, pick the score with the highest (max)value so that it represents the best possible move for this player.
End game evaluation function
If your search reaches a full board then the evaluation should simply be based on the disc count to determine who won.
Mid game evaluation function
The number of legal moves is useful in the opening and midgame as a large number of moves for you (and a low number for the opponent) normally indicates that you have a good position with lots of stable discs that your opponent cannot attack, while the opponent has a bad position where they may run out of moves and be forced to play a bad move (e.g. to let you play in the corner).
For this purpose, it does not matter greatly whose turn it is when counting the moves so I think you are on the correct path.
(Note that in the early stages of the game it is often an advantage to be the person with fewer discs as this normally means your opponent has few safe moves.)
Random evaluation function
Once upon a time I heard that just using a random number for the Othello evaluation function is (surprisingly to me) also a perfectly reasonable choice.
The logic is that the player with the most choices will be able to steer the game towards the highest random number, so this approach again means that the AI will favour moves which give it lots of choices and its opponent few.

Negamax - player moves twice

How do you handle games where, if a condition is met, the same player moves again?
I tried something like this but I don't think it's quite right:
function negamax(node, depth, α, β, color)
    if node is a terminal node or depth = 0
        return color * the heuristic value of node
    else
        foreach child of node
            if (condition is met) // the same player moves
                val := negamax(child, depth-1, α, β, color)
            else
                val := -negamax(child, depth-1, -β, -α, -color)
            if val≥β
                return val
            if val≥α
                α:=val
        return α
Don't try changing the minimax algorithm itself for this; instead, modify the game representation to accommodate it. There are basically two solutions:
1. Represent the sequence of moves made by a single player as one move. This works when the game is simple, but won't always. I wrote an AI engine for a game where generating this tree (described as one "move" in the game rules) is PSPACE-hard (and had a very large n for real games), meaning it was not computationally feasible. On the other hand, if modeling it this way is easy for your particular game, do it that way.
2. Represent the sequence of moves made by one player as a sequence of alternating moves in which the other player does not do anything. That is, you add a piece of information to the game state whenever your condition is met, such that the only move the other player can make does not change the state. This method is mathematically correct, and when I used it, it worked pretty well in practice. The only complexity to keep in mind is that if you use iterative deepening you will under-evaluate game trees where one player moves many times in a row. You may also need to be careful when designing storage heuristics to be used with the transposition table and other hashes.
I know of no literature that discusses your particular problem. I felt clever when I came up with solution 2 above, but I suspect many other people have invented the same trick.
I should say, getting the minimax family right is surprisingly hard. A trick when designing game search AIs in high-level languages is to test your search algorithms on simpler games (a reduced board size, tic-tac-toe, etc.) to ensure correctness. If the game is small enough you can (a) make sure its results make sense by playing the game out by hand and (b) test advanced algorithms like negascout by making sure they give the same answer as naive negamax. It is also a good idea to keep code with game-specific behavior (evaluation functions, board representations, search and storage heuristics, etc.) away from the code that does tree searches.
In negamax, you are exploring a tree structure in which each node has children corresponding to the moves made by a player. If in some case a player can move twice, you would probably want to think of the "move" for that player as the two-move sequence that player makes. More generally, you should think of the children of the current game state as all possible states that the current player can get the game into after their turn. This includes all game states reachable by one move, plus all the game states reachable in two moves if the player is able to make two moves in one turn. You should, therefore, leave the basic logic of negamax unchanged, but update your code to generate successor states to handle the case where a single player can move twice.
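The sketch below illustrates this on a made-up game: a pile of stones, each move takes 1 or 2, taking the last stone wins, and (as a hypothetical bonus rule) taking 1 stone from an even pile earns another move. `turn_successors` folds the bonus moves into a single turn so `negamax` stays untouched, and `reference` cross-checks the result by tracking the extra move inside the search instead.

```python
def grants_extra_move(pile, take):
    """Hypothetical bonus rule: taking 1 stone from an even pile earns another move."""
    return take == 1 and pile % 2 == 0


def turn_successors(pile):
    """Yield every pile the current player can hand to the opponent in one full turn."""
    for take in (1, 2):
        if take > pile:
            continue
        remaining = pile - take
        if remaining > 0 and grants_extra_move(pile, take):
            yield from turn_successors(remaining)  # the same player keeps moving
        else:
            yield remaining


def negamax(pile):
    """Unmodified negamax: +1 if the player to move wins, -1 if they lose."""
    if pile == 0:
        return -1  # the opponent just took the last stone
    return max(-negamax(nxt) for nxt in turn_successors(pile))


def reference(pile):
    """Cross-check: handle the extra move in the search by skipping the sign flip."""
    if pile == 0:
        return -1
    best = -1
    for take in (1, 2):
        if take > pile:
            continue
        remaining = pile - take
        if remaining > 0 and grants_extra_move(pile, take):
            best = max(best, reference(remaining))   # same player moves again
        else:
            best = max(best, -reference(remaining))  # opponent moves next
    return best


agree = all(negamax(n) == reference(n) for n in range(12))
```

Comparing two independent formulations on a small game like this is a cheap way to catch sign errors before moving to the real game.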
Hope this helps!
When condition is met, don't decrement depth.

How to simulate battle between two players?

I've got two players and I want to simulate a game between them. Both have some attributes (power, intelligence...) and different actions. The outcome of some action is based on attribute values and some luck factor.
Algorithm:
Construct a game tree of all possible moves for both players
game tree would probably have limited depth
Every level would belong to different player
Use some heuristics at leaf nodes to find out the probability of winning for the player who has to make a move
propagate probabilities up (like minimax algorithm does)
choose a move with highest probability
continue at the beginning of this algorithm
So, basically this is the minimax algorithm. I've got a few questions though:
How to take luck factor in account?
When I make one move, do I have to run whole algorithm again? (building the tree with +1 depth and new root node, calculate new probabilities...)
Any other idea for simulating a battle?
Thanks.
You should look into Monte Carlo Tree Search; it sounds as if it will fit your problem well.
Rather than using a heuristic, it runs a full game using random players at each branch before expanding the tree. The good thing about this is, that you're actually building a tree of probabilities, AND you do not have to expand the tree to the end or some cutoff with heuristic like MinMax.
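To give a feel for the random playouts, here is a simplified sketch. It is flat Monte Carlo (it only estimates the win probability of each first action) rather than a full MCTS with tree growth and a selection policy, and the two actions with their damage and hit-chance numbers are invented for the example.

```python
import random

# Hypothetical actions: name -> (damage, chance to hit). The hit chance is the luck factor.
ACTIONS = {"sword": (3, 0.9), "gun": (6, 0.4)}


def attack(action, rng):
    """Damage dealt by one use of `action`, or 0 on a miss."""
    damage, hit_chance = ACTIONS[action]
    return damage if rng.random() < hit_chance else 0


def rollout(hp_a, hp_b, a_to_move, rng):
    """Finish the battle with uniformly random actions; return True if A wins."""
    while hp_a > 0 and hp_b > 0:
        damage = attack(rng.choice(sorted(ACTIONS)), rng)
        if a_to_move:
            hp_b -= damage
        else:
            hp_a -= damage
        a_to_move = not a_to_move
    return hp_a > 0


def estimate_first_moves(hp_a, hp_b, playouts, seed=0):
    """Estimate A's win probability for each possible first action."""
    rng = random.Random(seed)
    estimates = {}
    for action in ACTIONS:
        wins = 0
        for _ in range(playouts):
            hp_b_after = hp_b - attack(action, rng)
            wins += rollout(hp_a, hp_b_after, a_to_move=False, rng=rng)
        estimates[action] = wins / playouts
    return estimates


estimates = estimate_first_moves(hp_a=20, hp_b=20, playouts=200)
```

The luck factor is handled simply by sampling it inside every playout, so the estimates already average over good and bad rolls.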
MCTS is also the current best method for the game of Go, and currently the best at playing games with unknown rules. For extra effect, you can use some finite state machine agents instead of random players to make the probabilities more accurate. You can also reduce the branching factor by using a decider that skips certain branches, based on a heuristic derived from machine learning. (But that's something you'd do last, to increase the speed of the technique.)
If you can do MinMax, you can do MCTS without too much trouble :) And MCTS can play far more complex games than MinMax ever will, because of its greatly reduced complexity in comparison. (Good if you intend to expand the rules of the game continuously.)
Have a look here if you're interested:
http://www.aaai.org/Papers/AIIDE/2008/AIIDE08-036.pdf
And yes, you have to do this at every move for every player. So both MinMax and MCTS will be slow; all game tree based techniques are slow.
With MinMax you can, however, preserve some of your tree: move to the branch that is your new state, and remove its parent and the other subtrees connected to it. Then expand one level further in the subtree that remains. But this is speculation; I have never had time to do that before :) (You'll be preserving errors in your probability calculations, however.)
The good thing about these techniques is that once you've built them, they work. Machine learning techniques run a lot faster, but require hours if not days of training before use ;)
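A small sketch of that re-rooting step, assuming a hypothetical `TreeNode` class that keeps its children in a dictionary keyed by move:

```python
class TreeNode:
    """Hypothetical game-tree node; children are keyed by the move that leads to them."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}

    def add_child(self, move, state):
        child = TreeNode(state, parent=self)
        self.children[move] = child
        return child


def advance_root(root, move):
    """Keep only the subtree reached by `move`; return None if it was never expanded."""
    new_root = root.children.get(move)
    if new_root is None:
        return None  # the caller has to build a fresh tree from the new state
    new_root.parent = None  # drop the old root and its other subtrees
    return new_root


root = TreeNode("start")
kept = root.add_child("sword", "after sword")
root.add_child("gun", "after gun")
kept.add_child("shield", "after sword, shield")
new_root = advance_root(root, "sword")
```

This only helps when the state actually reached matches a node in the tree, which may not hold if a luck factor changes the outcome of the chosen action.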
While generally your algorithm makes sense there is no way we can guarantee that this algorithm is the best one. For example let's imagine two games:
In the first game each player has 2 actions: fire a gun and strike with a sword. In this game each step doesn't affect the other steps, so building a move tree won't make any sense here. Each player just has to choose a weapon and keep firing/striking and shouting 'with the shield or on it!' until death or victory.
The second game also has a third action: steal the opponent's shield. In this case a move tree makes more sense, since it is pretty clear that if you've decided to steal the enemy's shield anyway, it is better to steal it before striking with the sword.
So whether you need this move tree or not depends heavily on your game rules.
The main question regarding the luck factor, as I see it, is whether to include its influence in the move tree or not. It depends on whether the luck factor affects every action in the same way. If it does, the luck factor can be omitted while calculating the move tree and applied only when you calculate the outcome of the chosen action. Otherwise, if the luck factor affects different actions differently (for example, even a complete loser is able to shoot an enemy with a gun, but killing with a spoon requires good luck), then the luck factor should be taken into account while calculating probabilities in the move tree.
Whether you need to recalculate the entire tree after each move depends on whether you can predict the result of the chosen action with 100% certainty. For example, in chess you can predict that if you decide to move a pawn, that pawn will definitely move where you decided. This allows you, at each step, to take the chosen branch of the move tree and calculate one more move for each scenario in it, instead of recalculating the full tree from scratch. But this is not applicable if a player can decide to shoot a gun but, because it's his unlucky day, shoots himself in the leg.
What you are asking for is very "vast"...but has been done by many developers.
I would advise that you start by reading a book about game design like this one:
http://www.amazon.com/Game-Design-Practice-Wordware-Developers/dp/1556229127/ref=cm_cr_pr_product_top
... and also search in www.CodeProject.com and www.codeplex.com for examples of games implementations.
Good luck,
