I've been reading this small tutorial on Nimbers and game theory.
Could someone explain why the mex rule governs the nimber of a game position?
See: http://en.wikipedia.org/wiki/Mex_(mathematics)
From the minimal excluded ordinal, it seems to me that the nimber for a state is actually the minimum state that the player 'cannot' reach. How does that help in governing the state of the current game?
I see a proof on Wikipedia, but I don't understand anything from it.
http://en.wikipedia.org/wiki/Sprague%E2%80%93Grundy_theorem#Proof
The entire idea of a nimber is to draw an analogy with the well understood game of Nim. So unless you understand THAT game, it won't make sense to you.
In the game of Nim we have a set of piles of things. On each turn, you take as many things as you want from one pile and one pile only. The winner is the person to take the last thing from the last pile.
Now try to convince yourself of the following facts.
In Nim, the nimber of a single pile is the size of that pile.
If we have a 2 pile game, the nimber of the position is the xor of the sizes of the two piles. (You will need to do a double induction.)
If we take the set of piles and split it into two, then the nimber of the whole position is the xor of the nimbers of the two subsets.
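These three facts can be checked directly by brute force. The sketch below computes the nimber of a Nim position straight from the mex rule (the names `mex` and `grundy` are my own, not a library API):

```python
from functools import lru_cache

def mex(s):
    """Smallest non-negative integer not in s (the 'minimum excludant')."""
    n = 0
    while n in s:
        n += 1
    return n

@lru_cache(maxsize=None)
def grundy(piles):
    """Nimber of a Nim position, computed directly from the mex rule:
    the mex of the nimbers of all positions reachable in one move."""
    options = set()
    for i, p in enumerate(piles):
        for take in range(1, p + 1):
            nxt = tuple(sorted(piles[:i] + (p - take,) + piles[i + 1:]))
            options.add(grundy(nxt))
    return mex(options)

# Fact 1: a single pile's nimber is its size.
assert grundy((7,)) == 7
# Fact 2: a two-pile position's nimber is the xor of the sizes.
assert grundy((3, 5)) == 3 ^ 5
# Fact 3: splitting the piles in two xors the subsets' nimbers.
assert grundy((1, 2, 4, 7)) == grundy((1, 2)) ^ grundy((4, 7))
```

Note the code never mentions xor when computing `grundy`; the xor relationship falls out of the mex rule, which is exactly the content of the three facts.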
Now here is the point. Replace the piles with arbitrary finite deterministic games with a guaranteed win/lose. Combine the collection into one game in which, on each turn, a player moves in any one of the component games, and the person who makes the last move wins. The nimber as defined above tells you, by analogy with Nim, how to play the combined game perfectly.
If you're playing just a regular 2-person game, then the only fact about the nimber that you actually need to know is whether it is 0 (you're in a losing position) or non-zero (you're in a winning position). The exact nimber is only useful when you can break a complex game into a collection of separate games that you are choosing between on each turn. However, a surprising number of mathematical games do admit such a structure.
For me, it was like this:
Understand Nim, and why the strategy works
Understand Poker Nim, and why the strategy is the same
Understand why the mex is the important number
Poker Nim is just like Nim, except that the players hold onto the "coins" that they remove, and on their turn, they may either move any positive number of coins from one stack into their hand, or move any positive number of coins from their hand onto one stack.
Initially, this feels very different. Play can even proceed for infinitely many moves! But that doesn't happen if Bob and Alice are playing well. Suppose Bob looks at the stacks and sees that he would have a winning strategy if they were playing Nim and not Poker Nim. He can adapt that strategy to Poker Nim as follows: if Alice takes coins off the table, he proceeds as if he is playing Nim; if Alice puts coins onto the table, he immediately removes the coins she just placed. Since she can only have finitely many coins in her hand, she can only stall finitely many times before she is forced to make her losing Nim move.
In Poker Nim, if I have 5 coins in hand and I look at a stack of 3 coins, I can on my move change it to have 0, 1, 2, 4, 5, 6, 7, or 8 coins. What I can't do is leave it at the mex, which is 3. If I move it down, I am playing Nim. If I move it up, you can immediately reverse it back to 3, and I am facing the same situation I was, except that now I have fewer than 5 coins in hand.
So that's Poker Nim, and the essence of how the mex becomes relevant. Moves above the mex are reversible, and so cannot ever turn a losing position into a winning one. Moving above the mex is never helpful. Unless you are trying to overwhelm the computational power of your opponent, that is.
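The 5-coins-in-hand, stack-of-3 example above can be spelled out in a few lines (a sketch; `reachable` and `mex` are my own illustrative helpers):

```python
def mex(values):
    """Minimum excludant: smallest non-negative integer not in values."""
    n = 0
    while n in values:
        n += 1
    return n

def reachable(stack, hand):
    """Stack sizes reachable in one Poker Nim move: remove any positive
    number of coins, or add up to `hand` coins from your hand."""
    down = set(range(stack))                      # remove 1..stack coins
    up = {stack + k for k in range(1, hand + 1)}  # add 1..hand coins
    return down | up

r = reachable(3, 5)
assert r == {0, 1, 2, 4, 5, 6, 7, 8}
assert mex(r) == 3  # the one value you cannot move to is exactly the mex
```

The gap at 3 is the point: every other value is reachable, and every upward move is reversible by the opponent.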
Related
I am thinking about an AI for a game that I could implement. My question is about finding an evaluation function for this game in order to apply the minimax algorithm with alpha/beta cuts.
https://en.wikipedia.org/wiki/Minimax
https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning
Let me describe the game first, explain what I plan to achieve with my AI, and get to the problem.
The game:
A 2-player turn-by-turn game.
Goal is to kill opponent or have more life points at the end.
In comparison with Magic: The Gathering, you both have monsters to attack the opponent. The number is fixed, let’s say 5 each.
A monster has a fight ability (let's say between 1 and 10), and a damage ability (let's say between 1 and 5).
Each turn:
- Active player declares to his opponent which monster (he owns) engages the current fight.
- He secretly sets multipliers face down (see the next paragraph).
- Opponent declares which monster (he owns) fights against the first one, while setting multipliers the same way.
- Fight: fight ability * multipliers = final attack. Biggest attack wins and inflicts damage ability to opponent.
- Next turn, the active player switches
About multipliers: you have 4 cards in hand that can double your attack (and many empty cards, so that you put 4 cards each turn on the table, and the opponent does not know if you multiplied by 1, 2, 4, 8 or 16).
Just in case: let's say we have a rule for draws to be solved.
What I expect with the AI:
To be able to say whether a perfect player should win in a given position. That means, for a winnable position, the AI should tell that there is a way that leads to victory, and give the steps (see example below). For a position that is winnable by the opponent, I have not decided yet, nor for positions that do not lead to the same winner in all cases (they exist ;D).
An example:
2 rounds left to go. I have
- Monster A: fight: 5, damage: 2
- Monster B: fight: 3, damage: 4
- life: 5, 1 multiplier left, my turn to begin
My opponent has
- Monster C: fight: 2, damage: 6
- Monster D: fight: 8, damage: 1
-life: 5, 1 multiplier left
In that case, if you think about it, you win if you play well.
Solution:
You can see that if monster C wins a fight, he inflicts 6 and I lose.
But if he loses, one of my monsters will inflict at least 2, and even if monster D wins (before or after),
I won't die and I will have more life than my opponent. Victory.
That's an example of what I want the AI to find.
Of course, I simplified the example. Maybe it can be trickier. And that's where my question arrives.
We can see that it is simple to calculate all possible duels when we have 2 rounds left (the last round does not need calculation: it is deterministic once both players play their last multipliers).
As we said, we have 5 rounds to go. But my point is that we could have 20, and then it becomes very long to calculate everything (as in trying to find the best move in the first round).
Indeed, we won't try to compute that. In chess, for instance, there are far too many positions to compute them all.
But, if you follow me, there is a solution in chess - we can implement an evaluation function. How do we know that, 10 moves ahead, this move leads to a better position? Because we evaluate that position. We claim that a position is better if it's checkmate, obviously, or if you have more pieces, or if you control the center, and so on...
Then, my question here:
How to evaluate a position in the game I presented ?
I mean, in the first round, if I can compute the possible moves for the next 2 rounds, I arrive at all possible positions for round 3 or 4. But that does not seem to help, in my opinion. You can have more life points, better cards, more multipliers left; it all depends on what comes next. I don't see advantages that hold in general situations. What about you?
N.B.1 I hope it was clear, I simplified game rules, of course we could add rules (combo if 2 consecutive rounds won, multipliers applicable to damage ability...)
N.B.2 I thought about a neural network, but the question is still interesting to me. And a neural network seems hard to set up because of the multiple rounds (my knowledge does not extend to models with recurrent behaviour).
N.B.3 I think that minimax and alpha/beta cuts will help if I still do a full computation analysis, but what I am afraid of is computation time, that's why I ask this here. I could probably begin with complete computation for last-2-round positions, yes.
Thanks for reading, and I hope you find this problem as stimulating as I do!
One way to evaluate the position in any game is to try to understand the thinking process of players who are considered experts in the game. So you can find experts in this game and ask them about factors that determine their decisions during the game. Or you can become an expert yourself by studying the game and playing it a lot. It can be very hard to come up with a good evaluation function by just looking at the rules of the game.
I haven't played this game, but probably it makes sense to start with some simple heuristic which is a linear combination of variables that determine the game state (health points of your main character, number of multipliers you have, total fight/damage ability of all your monsters, maximum fight/damage ability of any of your monsters, number of turns left etc). Take into account the corresponding values for your opponent and you'll get the eval function like this: a1*(my_hp - opp_hp) + a2*(my_monsters_total_fight - opp_monsters_total_fight) + a3*(my_monsters_total_damage - opp_monsters_total_damage) + a4*(my_number_of_multipliers - opp_number_of_multipliers) + ..., where coefficients a1,a2,.. can be positive or negative depending on the effect of corresponding variable (for instance, coefficient of hp variable a1 is positive etc.)
Now, this function might or might not work, but at least it will give you a starting point from which you can try to improve it or completely discard if it fails miserably. You can try to improve this evaluation function by tuning the coefficients, adding some non-linear terms to produce more complex relationships between the variables (multiplications, powers, logs etc.) and see how it affects performance. You can also try to automate the tuning process by using optimization techniques like genetic algorithms and differential evolution. In general, coming up with a good heuristic can be more an art than a science (after all, it's called heuristic for a reason). Start by trial and error and see how it goes.
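The linear combination above can be sketched in a few lines; the feature names, weights, and example values below are purely illustrative, not tuned for the actual game:

```python
def evaluate(state, weights):
    """Linear evaluation: weighted sum of (my value - opponent value)
    for each hand-picked feature. All feature names are illustrative."""
    features = [
        state["my_hp"] - state["opp_hp"],
        state["my_total_fight"] - state["opp_total_fight"],
        state["my_total_damage"] - state["opp_total_damage"],
        state["my_multipliers"] - state["opp_multipliers"],
    ]
    return sum(w * f for w, f in zip(weights, features))

# Example position: even on life and multipliers, behind on monster stats.
state = {"my_hp": 5, "opp_hp": 5,
         "my_total_fight": 8, "opp_total_fight": 10,
         "my_total_damage": 6, "opp_total_damage": 7,
         "my_multipliers": 1, "opp_multipliers": 1}
weights = [1.0, 0.5, 0.5, 2.0]
score = evaluate(state, weights)  # 1.0*0 + 0.5*(-2) + 0.5*(-1) + 2.0*0 = -1.5
```

Tuning then amounts to adjusting `weights` (by hand, or via an optimizer as suggested above) and checking which setting plays better.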
I would like to build an AI for the following game:
there are two players on a M x N board
each player can move up/down or left/right
there are different items on the board
the player who wins more categories wins the game (having more items in a category than the other player makes you the winner of that category)
in one turn you can either pick up an item you are standing on or move
player moves are made at the same time
two players standing on the same field have a 0.5 pickup chance if both do it
The game ends if one of the following condition is met:
all the items have been picked up
there is already a clear winner, since one player has more than half the items of more than half of the categories
I have no idea of AI, but I have taken a machine learning class some time ago.
How do I get started on such a problem?
Is there a generalization of this problem?
The canonical choice for adversarial search games like the one you proposed (called two-player zero-sum games) is Minimax search. From Wikipedia, the goal of Minimax is to
Minimize the possible loss for a worst case (maximum loss) scenario. Alternatively, it can be thought of as maximizing the minimum gain.
Hence, it is called minimax, or maximin. Essentially you build a tree of Max and Min levels, where the nodes each have a branching factor equal to the number of possible actions at each turn, 4 in your case. Each level corresponds to one of the player's turns, and the tree extends until the end of the game, allowing you to search for the optimal choice at each turn, assuming the opponent is playing optimally as well. If your opponent is not playing optimally, you will only score better. Essentially, at each node you simulate every possible game and choose the best action for the current turn.
If it seems like generating all possible games would take a long time, you are correct, it's an exponential complexity algorithm. From here you would want to investigate alpha-beta pruning, which essentially allows you to eliminate some of the possible games you are enumerating based on the values you have found so far, and is a fairly simple modification of minimax. This solution will still be optimal. I defer to the wikipedia article for further explanation.
From there, you would want to experiment with different heuristics for eliminating nodes, which could prune the tree of a significant number of nodes to traverse, however do note that eliminating nodes via heuristics will potentially produce a sub-optimal, but still good solution depending on your heuristic. One common tactic is to limit the depth of the search tree, essentially you search maybe 5 moves ahead to determine the best current move, using an estimate of each player's score at 5 moves ahead. Once again, this is a heuristic you could tweak. Something like simply calculating the score of the game as if it ended on that turn might suffice, and is definitely a good starting point.
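Depth-limited minimax with alpha-beta pruning and a cutoff heuristic can be sketched on a toy game. This is not your board game, just an illustration on pick-up sticks (remove 1-3 sticks per move, taking the last stick wins); the cutoff heuristic is a hand-made assumption:

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning on a toy sticks game.
    The heuristic is only consulted when the depth cutoff is hit."""
    if state == 0:
        # The previous player took the last stick: the player to move lost.
        return -1 if maximizing else 1
    if depth == 0:
        return heuristic(state, maximizing)
    if maximizing:
        value = -math.inf
        for take in range(1, min(3, state) + 1):
            value = max(value, alphabeta(state - take, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: Min will never allow this branch
        return value
    else:
        value = math.inf
        for take in range(1, min(3, state) + 1):
            value = min(value, alphabeta(state - take, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
        return value

def heuristic(state, maximizing):
    # Illustrative cutoff estimate: positions not divisible by 4 favor the mover.
    return (1 if state % 4 != 0 else -1) * (1 if maximizing else -1)

# 4 sticks is a loss for the player to move; 5 is a win.
assert alphabeta(4, 10, -math.inf, math.inf, True) == -1
assert alphabeta(5, 10, -math.inf, math.inf, True) == 1
```

For your game you would replace the sticks moves with the 4 movement/pickup actions and the terminal test with your win condition; the pruning logic itself is unchanged.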
Finally, for the nodes where probability is concerned, there is a slight modification of Minimax called Expectiminimax that essentially takes care of probability by adding a "third" player that chooses the random choice for you. The nodes for this third player take the expected value of the random event as their value.
The usual approach to any such problem is to play the game with live opponent long enough to find some heuristic solutions (short term goals) that lead you to victory. Then you implement these heuristics in your solution. Start with really small boards (1x3) and small number of categories (1), play them and see what happens, and then advance to more complicated cases.
Without playing the game I can only imagine that categories with less items are more valuable, also categories with items currently closer to you, and categories with items that are farthest away from you but still closer to you than to the opponent.
Every category has a cost, which is number of moves required to gain control of it, but the cost for you is different from the cost for the opponent, and it changes with every move. Category has greater value to you if the cost for you is near the cost for the opponent, but is still less than opponent's cost.
Every time you make a move categories change their values, so you have to recalculate the board and go from there in deciding your next move. The goal is to maximize your values and minimize opponents values, assuming that opponent uses the same algorithm as you.
The search for the best move gets more complicated if you explore more than one turn in advance, but it is also more effective. In this case you have to simulate the opponent's moves using the same algorithm, and then choose the move to which the opponent has the weakest counter-move. This strategy is called minimax.
All this is not really an AI, but it is a road map for an algorithm. Neural networks mentioned in the other answer are more AI-like, but I don't know anything about them.
The goal of the AI is to always seek to maintain the win conditions.
If it is practical (depending on how item locations are stored), at the start of each turn, the distance to all remaining items should be known to the AI. Ideally, this would be calculated once when the game is started, then simply "adjusted" based on where the AI moves, instead of being recalculated each turn. It would also be wise to have the AI track the same distances for the player if the AI is going to consider more than its own situation.
From there is a matter of determining what item should be picked up as an optimization of the following considerations:
What items and item categories does the AI currently have?
What items and item categories does the player currently have?
What items and item categories are near the AI?
What items and item categories are near the Player?
Exactly how you do this largely depends on how difficult to beat you want the AI to be.
A simple way would be to use a greedy approach and simply go after the "current" best choice. This could be done by simply finding the closest item that is not in a category that the player is currently winning by so many items (probably 1-3). This produces an AI that tries to win, but doesn't think ahead making it rather easy to predict.
Allowing for the greedy algorithm to check multiple turns ahead will improve the algorithm, that and considering what the player will do will improve the algorithm further.
Heuristics will lead to a more realistic and harder-to-beat AI. Possibly even practically impossible to beat.
I am currently developing a simple AI for Othello using minimax and alpha-beta pruning.
My question is related to the evaluation function for the state of the board.
I am currently looking to evaluate it by looking at:
Disc count (parity)
Number of legal moves
Importance of particular positions
So let's say the root node is the initial game state. The first action is the AI's action, while the second action is the opponent's action.
0
/ \ AI's Action
1 1
/ \ \ Opponent's action
2 2 2
At node level 1, do I evaluate the disc count of my AI's chips and the number of legal moves it can make at the point of time after it has completed an action?
At node level 2, do I evaluate the disc count of the opponent's chips and the number of legal moves it can make at the point of time after the opponent has completed an action?
Meaning AI move -> Opponent move ==> At this point in time I evaluate the opponent's disc count and the number of legal moves the opponent can make.
When generating game trees, you shouldn't evaluate a node unless it is a leaf node. That is, you generate a tree up to a level N (which corresponds to a board with N moves made ahead of the current state of the board), unless you have reached a node which corresponds to an end-of-game situation. It is only in those nodes that you should evaluate the state of the board game with your evaluation function. That is what the minimax algorithm is about. The only case I know of in which you evaluate a node after every player move is the iterative deepening algorithm, which it seems you are not using.
The evaluation function is responsible for providing a quick assessment of the "score" of a particular position - in other words, which side is winning and by how much. It is also called a static evaluation function because it looks only at a specific board configuration. So yes, when you reach level N you can count the possible moves of both the computer and the user and subtract them. For example, if the result is positive it would mean that the computer has the advantage, if it is 0 it would mean a tie, and if it is negative it represents a disadvantage for the user in terms of mobility. Scoring a node which represents an end-of-game board configuration is trivial: assign a maximum value if you win and a minimum value if you lose.
Mobility is one of the most important features to be considered in the evaluation function of most board games (those in which it is valuable). And to evaluate it, you count the possible moves of each player given a static board configuration no matter whose turn is next. Even if a player recently made a move, you are giving scores to boards in the same level N of the tree when the same player made the last move (therefore, scoring them in the same conditions) and picking of those the one which has the best score.
The features you are considering in your evaluation are very good. Usually, you want to consider material and mobility (which you are) in games in which they are very valuable (though I don't know if material is always an advantage in Othello; you should know better, as it is the game you are working on), so I guess you are on the correct path.
EDIT: Be careful! In a leaf node the only thing you want to do is assign a particular score to a board configuration. It is in its parent node where that score is returned and compared with the other scores (corresponding to other children). In order to choose which is the best move available for a particular player, do the following: If the parent node corresponds to an opponent's move, then pick the one with the least (min)value. If it is the computer's turn to move, pick the score with the highest (max)value so that it represents the best possible move for this player.
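The three features from the question could be combined into a static evaluation along these lines. This is a sketch, not a real Othello engine: the board representation (plain sets of occupied squares), the corner-heavy weight table, and the coefficients are all illustrative assumptions you would tune:

```python
# Positional weights: corners are valuable, everything else neutral (illustrative).
WEIGHTS = {(r, c): (5 if (r, c) in {(0, 0), (0, 7), (7, 0), (7, 7)} else 1)
           for r in range(8) for c in range(8)}

def evaluate(ai_squares, opp_squares, ai_moves, opp_moves):
    """Static evaluation mixing the question's three features:
    disc parity, mobility, and positional weights (coefficients illustrative)."""
    parity = len(ai_squares) - len(opp_squares)
    mobility = len(ai_moves) - len(opp_moves)
    position = (sum(WEIGHTS[s] for s in ai_squares)
                - sum(WEIGHTS[s] for s in opp_squares))
    return 1 * parity + 5 * mobility + 2 * position

# AI holds a corner and has more legal moves; opponent holds two edge squares.
score = evaluate({(0, 0)}, {(0, 3), (0, 4)},
                 {(1, 1), (2, 2), (3, 3)}, {(5, 5)})
# parity = -1, mobility = +2, position = +3  ->  score = -1 + 10 + 6 = 15
```

Note the function takes the move lists of *both* players, in line with the point above that mobility is scored from a static board regardless of whose turn it is.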
End game evaluation function
If your search reaches a full board then the evaluation should simply be based on the disc count to determine who won.
Mid game evaluation function
The number of legal moves is useful in the opening and midgame as a large number of moves for you (and a low number for the opponent) normally indicates that you have a good position with lots of stable discs that your opponent cannot attack, while the opponent has a bad position where they may run out of moves and be forced to play a bad move (e.g. to let you play in the corner).
For this purpose, it does not matter greatly whose turn it is when counting the moves so I think you are on the correct path.
(Note that in the early stages of the game it is often an advantage to be the person with fewer discs as this normally means your opponent has few safe moves.)
Random evaluation function
Once upon a time I heard that just using a random number for the Othello evaluation function is (surprisingly to me) also a perfectly reasonable choice.
The logic is that the player with the most choices will be able to steer the game to get the highest random number, and so this approach again means that the AI will favour moves which give it lots of choices, and his opponent few.
A school project has me writing a Date game in C++ (example at http://www.cut-the-knot.org/Curriculum/Games/Date.shtml) where the computer player must implement a Minimax algorithm with alpha-beta pruning. Thus far, I understand the goal behind the algorithm in terms of maximizing potential gains while assuming the opponent will minimize them.
However, none of the resources I read helped me understand how to design the evaluation function that the minimax bases all its decisions on. All the examples have had arbitrary numbers assigned to the leaf nodes; however, I need to actually assign meaningful values to those nodes.
Intuition tells me it'd be something like +1 for a win leaf node, and -1 for a loss, but how do intermediate nodes evaluate?
Any help would be most appreciated.
The most basic minimax evaluates only leaf nodes, marking wins, losses and draws, and backs those values up the tree to determine the intermediate node values. In the case that the game tree is intractable, you need to use a cutoff depth as an additional parameter to your minimax functions. Once the depth is reached, you need to run some kind of evaluation function for incomplete states.
Most evaluation functions in a minimax search are domain specific, so finding help for your particular game can be difficult. Just remember that the evaluation needs to return some kind of percentage expectation of the position being a win for a specific player (typically max, though not when using a negamax implementation). Just about any less researched game is going to closely resemble another more researched game. This one ties in very closely with the game pickup sticks. Using minimax and alpha beta only, I would guess the game is tractable.
If you must create an evaluation function for non-terminal positions, here is a little help with the analysis of the sticks game, and you can decide whether it's useful for the date game or not.
Start looking for a way to force an outcome by looking at a terminal position and all the moves which can lead to that position. In the sticks game, a terminal position is one with 3 or fewer sticks remaining on the last move. The position that immediately precedes that terminal position is therefore one that leaves 4 sticks to your opponent. The goal is now to leave your opponent with 4 sticks no matter what, and that can be done from either 5, 6 or 7 sticks being left to you, so you would like to force your opponent to leave you in one of those positions. The place your opponent needs to be in order for you to be at either 5, 6 or 7 is 8. Continue this logic on and on and a pattern becomes apparent very quickly: always leave your opponent with a number divisible by 4 and you win; anything else, you lose.
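The backward-induction argument above can be verified with a tiny memoized search (a sketch of the sticks game: a move removes 1-3 sticks, and taking the last stick wins):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def is_win(sticks):
    """True if the player to move wins with best play, when a move removes
    1-3 sticks and taking the last stick wins."""
    if sticks == 0:
        return False  # previous player took the last stick: the mover has lost
    # A position is a win iff some move leaves the opponent in a loss.
    return any(not is_win(sticks - t) for t in range(1, min(3, sticks) + 1))

losses = [n for n in range(1, 21) if not is_win(n)]
# Exactly the multiples of 4, as the argument above predicts.
assert losses == [4, 8, 12, 16, 20]
```

This is the same "work backwards from terminal positions" method, just mechanized, and the same structure applies to the date game once you enumerate its moves.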
This is a rather trivial game, but the method for determining the heuristic is what is important because it can be directly applied to your assignment. Since the last to move goes first, and you can only change 1 date attribute at a time, you know to win there needs to be exactly 2 moves left... and so on.
Best of luck, let us know what you end up doing.
The simplest case of an evaluation function is +1 for a win, -1 for a loss and 0 for any non-finished position. Given your tree is deep enough, even this simple function will give you a good player. For any non-trivial games, with high branching factor, typically you need a better function, with some heuristics (e.g. for chess you could assign weights to pieces and find a sum, etc.). In the case of the Date game, I would just use the simplest evaluation function, with 0 for all the intermediate nodes.
As a side note, minimax is not the best algorithm for this particular game; but I guess you know it already.
From what I understand of the Date game you linked to, it seems that the only possible outcomes for a player are win or lose; there is no in-between (please correct me if I'm wrong).
In this case, it is only a matter of assigning a value of 1 to a winning position (current player gets to Dec 31) and a value of -1 to the losing positions (other player gets to Dec 31).
Your minimax algorithm (without alpha-beta pruning) would look something like this:
A_move(day):
    if day == December 31:
        return +1
    else:
        outcome = -1
        for each next_day obtained by increasing the day or the month of day:
            outcome = max(outcome, B_move(next_day))
        return outcome

B_move(day):
    if day == December 31:
        return -1
    else:
        outcome = +1
        for each next_day obtained by increasing the day or the month of day:
            outcome = min(outcome, A_move(next_day))
        return outcome
I'm coding a board game where there is a bag of possible pieces. Each turn, players remove randomly selected pieces from the bag according to certain rules.
For my implementation, it may be easier to divide up the bag initially into pools for one or more players. These pools would be randomly selected, but now different players would be picking from different bags. Is this any different?
If one player's bag ran out, more would be randomly shuffled into it from the general stockpile.
So long as:
the partition into "pool" bags is random
the assignment of players to a given pool bag is random
the game is such that items drawn by the players are effectively removed from the bag (never returned to the bag, or any other bag, for the duration of the current game)
the players are not cognizant of the content of any of the bags
The two approaches ("original" with one big common bag, "modified" with one pool bag per player) are equivalent with regard to probabilities.
It only gets a bit tricky towards the end of the game, when some of the players' bags are empty. The fairest approach is to let players pick from 100% of the items still in play; hence, they should first choose from which bag they pick and then [blindly, of course] pick one item from said bag.
This problem illustrates an interesting characteristic of probabilities, which is that probabilities are relative to the amount of knowledge one has about the situation. For example, the game host may well know that the "pool" bag assigned to, say, player X does not include any, say, letter "A" (thinking about Scrabble), but so long as none of the players knows this (and so long as the partition into pool bags was fully random), the game remains fair, and player X still has to assume that his/her probability of drawing an "A" the next time a letter is drawn is the same as if all remaining letters were available to him/her.
Edit:
Notwithstanding the mathematical validity of the assertion that both procedures are fully equivalent, perception is an important factor in games that include a chance component (in particular if the game also includes a pecuniary component). To avoid the ire of players who do not understand this equivalence, you may stick to the original procedure...
Depending on the game rules, #mjv is right, the initial random division doesn't affect the probabilities. This is analogous to a game where n players draw cards in turn from a face down deck: the initial shuffle of the deck is the random division into the "bags" of cards for each player.
But if you replace the items after each draw, it does matter if there is one bag or many. With one bag any particular item will eventually be drawn by any player with the same probability. With many bags, that item can only be drawn by the player whose bag it was initially placed in.
Popping up to the software level, if the game calls for a single bag, I'd recommend just programming it that way: it should be no more difficult than n bags, and you don't have to prove the new game equivalent to the old.
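The claimed equivalence (in the without-replacement case) is easy to sanity-check with a quick Monte Carlo sketch; the helper names and the 4-item bag are my own illustration:

```python
import random

def draw_one_bag(items, n_players, n_draws):
    """Each player draws n_draws items in turn from one shared shuffled bag."""
    bag = list(items)
    random.shuffle(bag)
    return [[bag.pop() for _ in range(n_draws)] for _ in range(n_players)]

def draw_split_bags(items, n_players, n_draws):
    """The bag is first split randomly into one pool per player."""
    bag = list(items)
    random.shuffle(bag)
    size = len(bag) // n_players
    pools = [bag[i * size:(i + 1) * size] for i in range(n_players)]
    return [[pool.pop() for _ in range(n_draws)] for pool in pools]

# The probability that player 0's first draw is item 'A' should be ~1/4
# under both procedures (items are never returned to any bag).
random.seed(0)
items = ['A', 'B', 'C', 'D']
trials = 20000
hits_one = sum(draw_one_bag(items, 2, 2)[0][0] == 'A' for _ in range(trials))
hits_split = sum(draw_split_bags(items, 2, 2)[0][0] == 'A' for _ in range(trials))
# both frequencies land near 0.25
```

If you add replacement after each draw, the two procedures stop agreeing, exactly as argued above: with split bags, an item can only ever reach the player whose pool it landed in.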
My intuition tells me that dividing a random collection of things into smaller random subsets remains equally random... it doesn't matter if a player picks from a big pool or a smaller one (that, in turn, is refilled from the big one).
For a game, it is random enough IMHO!
Depending on how crucial security is, it might be okay (if money is involved (yours or theirs), DO NOT DO THAT). I'm not entirely sure it would be less random from the perspective of an ignorant player.
a) Don't count on them being ignorant: your program could be cracked, and then they would know what pieces are coming up.
b) It would be very tricky to fill the bags in such a way that you don't introduce vulnerabilities. For instance, let's take the naive algorithm of picking one item randomly and putting it in the first bucket, taking it out, then doing the same for the second bucket, and so on. You just ensured that if there are N pieces, the first player had a probability of 1/N of picking a given piece, the second had 1/(N-1), the third had 1/(N-2), and so on. Players can then analyze the pieces already played in order to figure out the probabilities that other players are holding certain pieces.
I THINK the following algorithm might work better, but almost all people get probability wrong the first time they come up with a new algorithm. DON'T USE THIS, just understand that it might cover the security vulnerability I talked about:
Create a list of N ordered items and instantiate P players
Mark 1/P of the items randomly (with replacement) for each player
Do this repeatedly until all N items are marked and there are an equal number of items marked for each player (NOTE: may take much longer than you will live, depending on N and P)
Place the appropriate items in the player's bucket and randomly rearrange (do NOT use a place swapping algorithm)
Even then after all this, you might still have a vulnerability to someone figuring out what's in their bucket from an exploit. Stick with a combined pool, it's still tricky to pick really randomly, but it will make your life easier.
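If you do end up splitting into pools anyway, a single unbiased shuffle followed by a split avoids the draw-order issues discussed above. A minimal sketch (the stdlib RNG is fine for casual play, but note it is not cryptographically secure, so the money caveat above still applies):

```python
import random

def deal_pools(items, n_players):
    """Partition the bag into equal random pools with one shuffle
    (random.shuffle is a Fisher-Yates shuffle: every ordering is
    equally likely), instead of a repeated mark-and-check procedure."""
    bag = list(items)
    random.shuffle(bag)
    size = len(bag) // n_players
    return [bag[i * size:(i + 1) * size] for i in range(n_players)]

pools = deal_pools(range(12), 3)
assert sorted(sum(pools, [])) == list(range(12))  # nothing lost or duplicated
assert all(len(p) == 4 for p in pools)
```

For anything security-sensitive you would swap the RNG for `random.SystemRandom` (or the `secrets` module), but the shuffle-then-split structure stays the same.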
Edit: I know the tone sounds kind of jerky. I mostly included all that bold for people who might read this out of context and try some of these algorithms. I really wish you well :-)
Edit 2: On further consideration, I think that the problem with picking in order might reduce to having players taking turns in the first place. If that's in the rules already it might not matter.