AI algorithm for an item-pickup race

I would like to build an AI for the following game:
there are two players on a M x N board
each player can move up/down or left/right
there are different items on the board
the player who wins more categories wins the game (having more items of a category than the other player makes you the winner of that category)
in one turn you can either pick up an item you are standing on or move
player moves are made at the same time
if both players are standing on the same field and both try to pick up the item there, each has a 0.5 chance of getting it
The game ends if one of the following conditions is met:
all the items have been picked up
there is already a clear winner, because one player has more than half the items in more than half the categories
I have no idea of AI, but I have taken a machine learning class some time ago.
How do I get started on such a problem?
Is there a generalization of this problem?

The canonical choice for adversarial search games like the one you propose (called two-player zero-sum games) is Minimax search. From Wikipedia, the goal of Minimax is to
Minimize the possible loss for a worst case (maximum loss) scenario. Alternatively, it can be thought of as maximizing the minimum gain.
Hence, it is called minimax, or maximin. Essentially, you build a tree of Max and Min levels, where each node has a branching factor equal to the number of possible actions at each turn (five in your case: four moves plus the pickup). Each level corresponds to one player's turn, and the tree extends until the end of the game, allowing you to search for the optimal choice at each turn, assuming the opponent plays optimally as well. If your opponent is not playing optimally, you will only score better. Essentially, at each node you simulate every possible game and choose the best action for the current turn.
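If it helps to see this concretely, here is a minimal sketch in Python, assuming hypothetical game-specific helpers legal_actions(state), apply(state, action), is_terminal(state) and utility(state) that you would write for your board game. Note the sketch also assumes strictly alternating turns, whereas your game has simultaneous moves, which would need extra modeling on top:

    # Minimal minimax sketch. All four helpers are hypothetical and
    # game-specific; utility(state) returns +1/0/-1 from the
    # maximizing player's point of view.
    def minimax(state, maximizing):
        if is_terminal(state):
            return utility(state)
        values = [minimax(apply(state, a), not maximizing)
                  for a in legal_actions(state)]
        return max(values) if maximizing else min(values)

    def best_action(state):
        # The root is a Max level: pick the action whose subtree
        # has the highest minimax value.
        return max(legal_actions(state),
                   key=lambda a: minimax(apply(state, a), False))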
If it seems like generating all possible games would take a long time, you are correct: this is an exponential-time algorithm. From here you would want to investigate alpha-beta pruning, which essentially allows you to eliminate some of the possible games you are enumerating, based on the values you have found so far, and which is a fairly simple modification of minimax. The solution will still be optimal. I defer to the Wikipedia article for further explanation.
From there, you would want to experiment with different heuristics for eliminating nodes, which can prune a significant number of nodes from the tree; do note, however, that eliminating nodes via heuristics will potentially produce a sub-optimal (but still good) solution, depending on your heuristic. One common tactic is to limit the depth of the search tree: you search maybe 5 moves ahead to determine the best current move, using an estimate of each player's score 5 moves from now. Once again, this is a heuristic you can tweak. Something as simple as calculating the score of the game as if it ended on that turn might suffice, and is definitely a good starting point.
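A sketch of the same search with alpha-beta pruning and a depth cutoff, reusing the helpers assumed above plus a hypothetical evaluate(state) heuristic (for example, the score as if the game ended right now):

    def alphabeta(state, depth, alpha, beta, maximizing):
        if is_terminal(state) or depth == 0:
            return evaluate(state)  # heuristic estimate at the cutoff
        if maximizing:
            value = float('-inf')
            for a in legal_actions(state):
                value = max(value, alphabeta(apply(state, a), depth - 1,
                                             alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break  # beta cutoff: Min will never allow this branch
            return value
        else:
            value = float('inf')
            for a in legal_actions(state):
                value = min(value, alphabeta(apply(state, a), depth - 1,
                                             alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:
                    break  # alpha cutoff
            return value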
Finally, for the nodes where probability is concerned, there is a slight modification of Minimax called Expectiminimax that handles probability by adding a "third" player that makes the random choice for you. The nodes for this third player take the expected value of the random event as their value.
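For your contested-pickup rule, the chance handling might look like the following sketch, again reusing the helpers assumed above, plus a hypothetical outcomes(state, action) that yields (probability, next_state) pairs, e.g. [(0.5, you_got_the_item), (0.5, opponent_got_it)] for a contested pickup:

    def expectiminimax(state, depth, maximizing):
        if is_terminal(state) or depth == 0:
            return evaluate(state)
        def value_of(action):
            # Chance node: probability-weighted average over the
            # random outcomes of this action.
            return sum(p * expectiminimax(s, depth - 1, not maximizing)
                       for p, s in outcomes(state, action))
        values = [value_of(a) for a in legal_actions(state)]
        return max(values) if maximizing else min(values)

For a deterministic action, outcomes would simply yield a single pair with probability 1.0, and this degenerates to ordinary depth-limited minimax.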

The usual approach to any such problem is to play the game against a live opponent long enough to find some heuristics (short-term goals) that lead you to victory, and then implement those heuristics in your solution. Start with really small boards (1x3) and a small number of categories (1), play them and see what happens, and then advance to more complicated cases.
Without playing the game I can only guess that categories with fewer items are more valuable, as are categories whose items are currently closer to you, and categories whose items are farthest from you but still closer to you than to the opponent.
Every category has a cost, which is the number of moves required to gain control of it; the cost for you differs from the cost for the opponent, and it changes with every move. A category has greater value to you if your cost is close to the opponent's cost but still lower.
Every time you make a move the categories change their values, so you have to recalculate the board and go from there in deciding your next move. The goal is to maximize your values and minimize the opponent's, assuming the opponent uses the same algorithm as you.
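As a rough sketch of that recalculation (all names here are illustrative, including the hypothetical moves_needed(position, category) cost estimate):

    def category_values(my_pos, opp_pos, categories):
        # A category is most attractive when my cost is below the
        # opponent's but the two costs are close: it is contested
        # and still winnable. moves_needed is a hypothetical,
        # game-specific cost estimate.
        values = {}
        for cat in categories:
            my_cost = moves_needed(my_pos, cat)
            opp_cost = moves_needed(opp_pos, cat)
            if my_cost < opp_cost:
                values[cat] = 1.0 / (1 + opp_cost - my_cost)
            else:
                values[cat] = 0.0
        return values

The exact weighting is something you would tune by playing, as described above.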
The search for the best move gets more complicated if you explore more than one turn in advance, but it is also more effective. In that case you have to simulate the opponent's moves using the same algorithm, and then choose the move to which the opponent has the weakest counter-move. This strategy is called minimax.
All this is not really AI, but it is a road map for an algorithm. The neural networks mentioned in the other answer are more AI-like, but I don't know anything about them.

The goal of the AI is always to work toward maintaining the win conditions.
If it is practical (depending on how item locations are stored), the AI should know the distance to all remaining items at the start of each turn. Ideally, this would be calculated once when the game starts and then simply adjusted as the AI moves, instead of being recalculated every turn. It would also be wise to have the AI track the same distances for the player, unless the AI is only ever going to consider its own situation.
From there it is a matter of determining which item should be picked up, as an optimization over the following considerations:
What items and item categories does the AI currently have?
What items and item categories does the player currently have?
What items and item categories are near the AI?
What items and item categories are near the Player?
Exactly how you do this largely depends on how difficult to beat you want the AI to be.
A simple way would be to use a greedy approach and simply go after the "current" best choice. This could be done by finding the closest item that is not in a category the player is already winning by more than a certain number of items (probably 1-3). This produces an AI that tries to win but doesn't think ahead, making it rather easy to predict.
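A minimal sketch of that greedy rule, with an illustrative distance helper and items carrying category and position attributes:

    def greedy_target(ai_pos, items, player_lead, margin=2):
        # Go for the closest item whose category the player is not
        # already winning by more than `margin` items; player_lead
        # maps category -> how many items the player leads by.
        candidates = [it for it in items
                      if player_lead.get(it.category, 0) <= margin]
        if not candidates:
            return None  # every category is lost by a wide margin
        return min(candidates,
                   key=lambda it: distance(ai_pos, it.position))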
Allowing the greedy algorithm to check multiple turns ahead will improve it, and considering what the player will do as well will improve it further.
Better heuristics will lead to a more realistic and harder-to-beat AI, possibly even one that is practically impossible to beat.

Related

Monte Carlo Tree Search: Tree Policy for two player games

I am a little confused about how the MCTS "Tree Policy" is implemented. Every paper or article I read talks about going down the tree from the current game state (in MCTS terminology: the root for the player about to make a move). My question is: how am I selecting the best child even when I am at the MIN player's level (assuming I am the MAX player)? Even if I select some particular action that MIN might take, and my search tree gets deeper through that node, the MIN player during its turn might just as well choose some different node. (If the MIN player is an amateur human, it might well choose a node which is not necessarily the best.) This seems to make all of MAX's work in propagating through that node futile, since MIN has chosen a different node.
For the steps, I am referring to
https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
where the tree policy (https://jeffbradberry.com/images/mcts_selection.png)
kind of makes me believe that they are executing it from a single-player perspective.
To implement MCTS for a two-player game, you can simply flip the sign in every step of back-propagation; it is a one-line change in the code.
This means we are trying to maximize the reward in every layer, but when we propagate the reward up the tree, a positive reward for your opponent becomes a negative one for you once it reaches your layer.
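In code, that back-propagation step might look like the following sketch, assuming each node stores visit and reward totals:

    def backpropagate(node, reward):
        # Walk from the expanded node back to the root, flipping the
        # sign at every level: a good outcome for the player who moved
        # at this node is a bad outcome for the player above it.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            reward = -reward   # the one-line sign flip
            node = node.parent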
For MCTS, you need some way of generating a reasonable estimate of the probability distribution of possible moves. For AlphaGo [1], this is the fast rollout policy, $p_\pi$ in the paper, which takes a state and outputs a rough probability distribution over all possible moves. The AlphaGo team implemented this as a shallow neural net, trained first on expert games and then improved by playing against itself.
[1] http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html

Minimax: what to do with equal scores in endgame?

As far as I understand, the Minimax algorithm in its simplest form works the following way: traverse the game tree bottom-up; if it's the player's turn, assign the maximum score of all child nodes to the current node, otherwise the minimum score. Leaves are scored according to the game outcome, let's say +1 for a win, 0 for a draw, -1 for a loss. Finally, choose the move that leads to the node with the highest score.
Of course it is impractical to traverse the whole game tree, so a heuristic is used. But assume we are near the end of the game. Then I have found some problems with this simple approach.
For example, we are playing chess and the player (playing white) has reached this position:
It is the player's turn, and there is a mate in one with Qg7, so the node for Qg7 gets a score of 1. But Ke1, for instance, is also a legal move. The only reply is c5, after which Qg7# is still available. And because Qg7 gets a score of 1, so does c5, and so does Ke1.
So we have at least two moves with a score of 1 (Ke1 and Qg7). Let's say the algorithm considers king moves first and chooses the first move with the highest score. That would mean that, in this position, the player would not checkmate the opponent but would make random king moves until the opponent could actually prevent the checkmate (by queening the pawn).
The fundamental problem is that a checkmate in one (Qg7) has the same score as a checkmate in two (Ke1), so there is no reason for the player to actually go for the checkmate in one.
This can be prevented with a simple modification to the Minimax algorithm: in case of equal scores, choose the shorter path to the position with that score. A checkmate in one would then be preferred.
My question is: I have not found any mention of this in any Minimax-related source, so is there some misunderstanding of Minimax on my part? If not, is this the usual way to solve this or are there superior ways?
I'm pretty sure you understand minimax correctly.
The thing I would probably do is simply pass the current distance from the root down through the minimax function, and weight wins and losses according to it. A faster win (to reduce the possibility of unseen situations) and a slower loss (to allow for mistakes by the opponent) should generally be preferred. Whether the win is 1 or any other positive value doesn't matter too much; it will still get picked as better than 0 or -1.
If a win is the largest possible value in your heuristic, you can still do something similar: weight it by increasing or decreasing it a little, while keeping it larger than all other non-win values.
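For instance, a sketch of that weighting, assuming a large WIN constant, hypothetical is_win/is_loss tests, and ply being the current distance from the root:

    WIN = 1_000_000  # larger than any non-terminal evaluation

    def terminal_score(state, ply):
        # Prefer faster wins and slower losses: subtract the distance
        # from the root for wins, add it for losses.
        if is_win(state):      # hypothetical game-specific test
            return WIN - ply
        if is_loss(state):
            return -WIN + ply
        return 0               # draw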
For your example it probably doesn't really matter: when the pawn gets close to promoting, you'd detect that a draw is coming and then make the winning move. But it can certainly be a problem if:
There's a sequence of inevitable moves beyond the search depth that results in a worse outcome for you (a pretty common problem with minimax, but if it's avoidable, it's obviously better to avoid it)
There's a time constraint on your side
You don't cater for all draw conditions (e.g. threefold repetition)

Negamax - player moves twice

How do you handle games where, if a condition is met, the same player moves again?
I tried something like this but I don't think it's quite right:
function negamax(node, depth, α, β, color)
    if node is a terminal node or depth = 0
        return color * the heuristic value of node
    else
        foreach child of node
            if (condition is met) // the same player moves again
                val := negamax(child, depth-1, α, β, color)
            else
                val := -negamax(child, depth-1, -β, -α, -color)
            if val ≥ β
                return val
            if val ≥ α
                α := val
    return α
Don't try changing the minimax algorithm itself for this; instead, modify the game representation to accommodate the rule. There are basically two solutions:
Represent the sequence of moves made by a single player as one move. This works when the game is simple, but it won't always. I wrote an AI engine for a game where generating this tree (described as one "move" in the game rules) was PSPACE-hard (and had a very large n for real games), meaning it was not computationally feasible. On the other hand, if modeling it this way is easy for your particular game, do it that way.
Represent the sequence of moves made by one player as a sequence of alternating moves in which the other player does not do anything. That is, you add a piece of information to the game state whenever your condition is met, such that the only move the other player can make does not change the state. This method is mathematically correct, and when I used it, it worked pretty well in practice; a minimal sketch follows this list. The only complexity to keep in mind is that if you use iterative deepening, you will under-evaluate game trees where one player moves many times in a row. You may also need to be careful when designing storage heuristics to be used with the transposition table and other hashes.
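Here is that minimal sketch of solution 2; the flag and function names are illustrative:

    PASS = "pass"

    def legal_moves(state):
        # Solution 2: when the extra-move condition was triggered on
        # the previous ply, the opponent's only legal move is a pass
        # that changes nothing on the board (it only clears the flag).
        if state.opponent_must_pass:
            return [PASS]
        return generate_normal_moves(state)  # hypothetical generator

Everything else (the negamax recursion, the alpha-beta window, the sign flips) stays exactly as it was.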
I know of no literature that discusses your particular problem. I felt clever when I came up with solution 2 above, but I suspect many other people have invented the same trick.
I should say, getting the minimax family right is surprisingly hard. A trick when designing game-search AIs in high-level languages is to test your search algorithms on simpler games (a reduced board size, tic-tac-toe, etc.) to ensure correctness. If the game is small enough you can (a) make sure its results make sense by playing the game out by hand and (b) test advanced algorithms like NegaScout by making sure they give the same answer as naive negamax. It is also a good idea to keep code with game-specific behavior (evaluation functions, board representations, search and storage heuristics, etc.) away from the code that does the tree searches.
In negamax, you are exploring a tree structure in which each node's children correspond to the moves a player can make. If in some case a player can move twice, you would probably want to think of the "move" for that player as the two-move sequence that player makes. More generally, you should think of the children of the current game state as all possible states that the current player can get the game into by the end of their turn. This includes all game states reachable by one move, plus all game states reachable in two moves when the player is able to make two moves in one turn. You should therefore leave the basic logic of negamax unchanged, but update your successor-generation code to handle the case where a single player can move twice; a sketch follows.
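A sketch of that successor generation, with hypothetical moves, apply and grants_extra_move helpers:

    def successors(state):
        # Treat a double move as one compound move: yield every state
        # the current player can reach by the end of their turn.
        for m1 in moves(state):
            s1 = apply(state, m1)
            if grants_extra_move(state, m1):
                for m2 in moves(s1):
                    yield apply(s1, m2)
            else:
                yield s1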
Hope this helps!
When the condition is met, don't decrement the depth.

How to simulate battle between two players?

I've got two players and I want to simulate a game between them. Both have some attributes (power, intelligence...) and different actions. The outcome of some action is based on attribute values and some luck factor.
Algorithm:
Construct a game tree of all possible moves for both players
the game tree would probably have a limited depth
every level would belong to a different player
use some heuristic at the leaf nodes to estimate the probability of winning for the player who has to make a move
propagate the probabilities up the tree (like the minimax algorithm does)
choose the move with the highest probability
continue from the beginning of this algorithm
So, basically this is the minimax algorithm. I've got a few questions though:
How to take luck factor in account?
When I make one move, do I have to run whole algorithm again? (building the tree with +1 depth and new root node, calculate new probabilities...)
Any other idea for simulating a battle?
Thanks.
You should look into Monte Carlo Tree Search; it sounds as if it will fit your problem nicely.
Rather than using a heuristic, it runs a full game with random players at each branch before expanding the tree. The good thing about this is that you're actually building a tree of probabilities, and you do not have to expand the tree to the end, or to some cutoff with a heuristic, like minimax does.
MCTS is also the current best method for the game of Go, and currently the best at playing games whose rules are not known in advance. For extra effect, you can use some finite-state-machine agents instead of random players to make the probabilities more accurate. And you can also reduce the branching factor by using a decider that skips certain branches, e.g. a heuristic derived via machine learning. (But that's something you'd do last, to increase the speed of the technique.)
If you can do minimax, you can do MCTS without too much trouble :) And MCTS can play far more complex games than minimax ever will, because of its greatly reduced complexity in comparison. (Good if you intend to expand the rules of the game continuously.)
Have a look here if you're interested:
http://www.aaai.org/Papers/AIIDE/2008/AIIDE08-036.pdf
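If it helps, here is a compact UCT sketch, assuming hypothetical game helpers legal_actions(state), apply(state, action), is_terminal(state), reward_for(state, player) and player_just_moved(state):

    import math, random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children, self.visits, self.reward = [], 0, 0.0

        def ucb1(self, c=1.4):
            if self.visits == 0:
                return float('inf')  # try unvisited children first
            # Mean reward plus an exploration bonus.
            return (self.reward / self.visits +
                    c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through already-expanded nodes.
            while node.children:
                node = max(node.children, key=Node.ucb1)
            # 2. Expansion: add children unless the node is terminal.
            if not is_terminal(node.state):
                node.children = [Node(apply(node.state, a), node, a)
                                 for a in legal_actions(node.state)]
                node = random.choice(node.children)
            # 3. Simulation: random playout from here to the end.
            state = node.state
            while not is_terminal(state):
                state = apply(state, random.choice(legal_actions(state)))
            # Reward from the view of the player who moved into `node`.
            reward = reward_for(state, player_just_moved(node.state))
            # 4. Back-propagation with the two-player sign flip.
            while node is not None:
                node.visits += 1
                node.reward += reward
                reward = -reward
                node = node.parent
        # Play the most-visited root move.
        return max(root.children, key=lambda n: n.visits).action

A real implementation would add things like tree reuse and smarter playouts, but these four phases (selection, expansion, simulation, back-propagation) are the whole skeleton.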
And yes, you have to do this at every move, for every player. So both minimax and MCTS will be slow; all game-tree-based techniques are slow.
With minimax you can, however, preserve some of your tree: move to the branch that is your new state, remove its parent and the subtrees connected to it, then expand one level further in the subtree that remains. But this is speculation; I have never had time to do that before :) (You'll be preserving errors in your probability calculations, however.)
The good thing about these techniques is that once you've built them, they work. Machine learning techniques run a lot faster, but require hours if not days of training prior to use ;)
While your algorithm generally makes sense, there is no way we can guarantee that it is the best one. For example, let's imagine two games:
In the first game each player has 2 actions: fire a gun or strike with a sword. In this game no step affects the other steps, so building a move tree won't make any sense here. Each player just has to choose a weapon and keep firing/striking and shouting 'with the shield or on it!' until death or victory.
The second game also has a third action: steal the opponent's shield. In this case a move tree makes more sense, since it is pretty clear that if you've decided to steal the enemy's shield anyway, it makes more sense to steal it before striking with your sword.
So whether you need a move tree or not depends heavily on your game rules.
The main question regarding the luck factor, as I see it, is whether to include its influence in the move tree or not. That depends on whether the luck factor affects every action in the same way. If it does, the luck factor can be omitted while calculating the move tree and then applied when you calculate the outcome of the chosen action. Otherwise, if the luck factor affects different actions differently (for example, even a complete loser can shoot an enemy with a gun, but a kill with a spoon requires good luck), it should be taken into account while calculating the probabilities in the move tree.
Whether or not you need to recalculate the entire tree after each move depends on whether you can predict the result of the chosen action with 100% certainty. For example, in chess you can predict that if you decide to move a pawn, that pawn will definitely move where you decided. This allows you, at each step, to take the chosen branch of the move tree and calculate one more move for each scenario in it, instead of recalculating the full tree from scratch. But this is not applicable if a player can decide to shoot a gun and, because it's his unlucky day, shoot himself in the leg.
What you are asking for is very broad... but it has been done by many developers.
I would advise that you start by reading a book about game design, like this one:
http://www.amazon.com/Game-Design-Practice-Wordware-Developers/dp/1556229127/ref=cm_cr_pr_product_top
... and also search www.CodeProject.com and www.codeplex.com for examples of game implementations.
Good luck,

Logic / Probability Question: Picking from a bag

I'm coding a board game where there is a bag of possible pieces. Each turn, players remove randomly selected pieces from the bag according to certain rules.
For my implementation, it may be easier to divide up the bag initially into pools for one or more players. These pools would be randomly selected, but now different players would be picking from different bags. Is this any different?
If one player's bag ran out, more would be randomly shuffled into it from the general stockpile.
So long as:
the partition into "pool" bags is random
the assignment of players to a given pool bag is random
the game is such that items drawn by the players are effectively removed from the bag (never returned to the bag, or any other bag, for the duration of the current game)
the players are not cognizant of the content of any of the bags
the two approaches ("original", with one big common bag; "modified", with one pool bag per player) are equivalent with regard to probabilities.
It only gets a bit tricky towards the end of the game, when some of the players' bags are empty. The fairest approach is to let players pick from 100% of the items still in play; hence, a player should first pick which bag to draw from and then [blindly, of course] pick one item from said bag.
This problem illustrates an interesting characteristic of probabilities, which is that probabilities are relative to the amount of knowledge one has about the situation. For example, the game host may well know that the "pool" bag assigned to, say, player X does not include any letter "A" (thinking of Scrabble), but as long as none of the players knows this (and as long as the partition into pool bags was fully random), the game remains fair, and player X still has to assume that the probability of drawing an "A" the next time a letter is drawn is the same as if all remaining letters were available to him/her.
Edit:
Notwithstanding the mathematical validity of the assertion that both procedures are fully equivalent, perception is an important factor in games that include a chance component (in particular if the game also includes a pecuniary component). To avoid the ire of players who do not understand this equivalence, you may want to stick to the original procedure...
Depending on the game rules, @mjv is right: the initial random division doesn't affect the probabilities. This is analogous to a game where n players draw cards in turn from a face-down deck: the initial shuffle of the deck is the random division into the "bags" of cards for each player.
But if you replace the items after each draw, it does matter if there is one bag or many. With one bag any particular item will eventually be drawn by any player with the same probability. With many bags, that item can only be drawn by the player whose bag it was initially placed in.
Popping up to the software level, if the game calls for a single bag, I'd recommend just programming it that way: it should be no more difficult than n bags, and you don't have to prove the new game equivalent to the old.
My intuition tells me that dividing a random collection of things into smaller random subsets remains equally random... it doesn't matter whether a player picks from a big pool or a smaller one (that, in turn, feeds from the big one).
For a game, that is random enough IMHO!
Depending on how crucial security is, it might be okay (if money is involved, yours or theirs, DO NOT DO THAT). I'm not entirely sure it would be less random from the perspective of an ignorant player.
a) Don't count on players being ignorant; your program could be cracked, and then they would know which pieces are coming up.
b) It would be very tricky to fill the bags in such a way that you don't introduce vulnerabilities. For instance, take the naive algorithm of picking a piece randomly, putting it in the first bucket, taking it out of the pool, and then doing the same for the second bucket, and so on. You have just ensured that if there are N pieces, the first player had a probability of 1/N of picking a given piece, the second had 1/(N-1), the third had 1/(N-2), and so on. Players can then analyze the pieces already played in order to figure out the probabilities that other players are holding certain pieces.
I THINK the following algorithm might work better, but almost all people get probability wrong the first time they come up with a new algorithm. DON'T USE THIS; just understand that it might cover the security vulnerability I talked about:
Create a list of N ordered items and instantiate P players
Mark 1/P of the items randomly (with replacement) for each player
Do this repeatedly until all N items are marked and an equal number of items is marked for each player (NOTE: this may take far longer than you will live, depending on N and P)
Place the appropriate items in each player's bucket and randomly rearrange them (do NOT use a place-swapping algorithm)
Even then, after all this, you might still be vulnerable to someone figuring out what's in their bucket through an exploit. Stick with a combined pool; it's still tricky to pick truly randomly, but it will make your life easier.
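For what it's worth, here is a sketch of dealing from a combined pool with a single shuffle, which sidesteps the sequential-picking worry entirely (Python's random.shuffle is a Fisher-Yates shuffle; note that the standard random module is not cryptographically secure):

    import random

    def deal_bags(pieces, num_players):
        # One shuffle of the full pool, then cut it into equal slices;
        # any leftover pieces stay in the general stockpile, as in the
        # question.
        pool = list(pieces)
        random.shuffle(pool)  # consider random.SystemRandom() if
                              # money is involved
        size = len(pool) // num_players
        bags = [pool[i * size:(i + 1) * size] for i in range(num_players)]
        stockpile = pool[num_players * size:]
        return bags, stockpile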
Edit: I know the tone sounds kind of jerky. I mostly included all that bold for people who might read this out of context and try some of these algorithms. I really wish you well :-)
Edit 2: On further consideration, I think that the problem with picking in order might reduce to players taking turns in the first place. If that's already in the rules, it might not matter.
