I am thinking about an AI for a game that I could implement. My question is about finding an evaluation function for this game in order to apply the minimax algorithm with alpha/beta cuts.
https://en.wikipedia.org/wiki/Minimax
https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning
Let me describe the game first, explain what I plan to achieve with my AI, and get to the problem.
The game:
A 2-player turn-by-turn game.
The goal is to kill the opponent, or to have more life points at the end.
By comparison with Magic: The Gathering, both players have monsters with which to attack the opponent. The number is fixed, let's say 5 each.
A monster has a fight ability (let's say between 1 and 10), and a damage ability (let's say between 1 and 5).
Each turn:
- Active player declares to his opponent which monster (he owns) engages the current fight.
- He secretly sets multipliers face down (see the next paragraph for how multipliers work).
- Opponent declares which monster (he owns) fights against the first one, while setting multipliers the same way.
- Fight: fight ability * multipliers = final attack. The bigger attack wins, and the winner inflicts his monster's damage ability on the opponent.
- Next turn, the active player switches.
About multipliers: you have 4 cards for the whole game that each double your attack (plus many blank cards, so that you place 4 cards face down each turn and the opponent cannot tell whether you multiplied by 1, 2, 4, 8 or 16).
Just in case: let's say we have a rule for draws to be solved.
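Just to pin the rules down, here is a minimal Python sketch of how a single fight resolves (the Monster class and resolve_fight are my own names, and the draw handling is a placeholder for whatever rule you choose):

from dataclasses import dataclass

@dataclass
class Monster:
    fight: int   # fight ability, 1..10
    damage: int  # damage ability, 1..5

def resolve_fight(attacker, a_mult, defender, d_mult):
    # a_mult and d_mult are the secretly chosen multipliers: 1, 2, 4, 8 or 16.
    # Returns (damage to attacking player, damage to defending player).
    a_attack = attacker.fight * a_mult
    d_attack = defender.fight * d_mult
    if a_attack > d_attack:
        return 0, attacker.damage
    if d_attack > a_attack:
        return defender.damage, 0
    return 0, 0  # placeholder draw rule

# From the example below: Monster A (fight 5) doubled beats Monster D (fight 8).
# resolve_fight(Monster(5, 2), 2, Monster(8, 1), 1)  ->  (0, 2)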
What I expect with the AI:
To be able to say whether a perfect player should win in a given position. That means, for a winnable position, the AI should report that there is a line of play that leads to victory, and give the steps (see the example below). For a position that is winnable by the opponent, I have not decided yet, nor for positions that do not lead to the same winner in all cases (they exist ;D).
An example:
2 rounds left to go. I have
- Monster A: fight: 5, damage: 2
- Monster B: fight: 3, damage: 4
- life: 5, 1 multiplier left, my turn to begin
My opponent has
- Monster C: fight: 2, damage: 6
- Monster D: fight: 8, damage: 1
- life: 5, 1 multiplier left
In that case, if you think about it, you win if you play well.
Solution:
You can see that if monster C wins a fight, it inflicts 6 and I lose.
But if it loses, one of my monsters will inflict at least 2, and even if monster D wins (before or after),
I won't die and I will have more life than my opponent. Victory.
That's an example of what I want the AI to find.
Of course, I simplified the example. Maybe it can be trickier. And that's where my question comes in.
It is easy to see that calculating all possible duels is feasible with 2 rounds left (the last round needs no calculation: it is deterministic once both players commit their last multipliers).
As we said, we have 5 rounds to go. But my point is that we could have 20, and it becomes very long to calculate everything (as in trying to find the best move in the first round).
Indeed, we won't try to compute that. In chess, for instance, there are far too many positions to compute them all.
But, if you follow me, there is a solution in chess: we can implement an evaluation function. How do we know that 10 moves ahead, this move leads to a better position? Because we evaluate that position. We claim that a position is better if it's checkmate, obviously, or if you have more pieces, or if you control the center, and so on...
Then, my question here:
How do I evaluate a position in the game I presented?
I mean, in the first round, if I can compute the possible moves for the next 2 rounds, I arrive at all possible positions for rounds 3 and 4. But that does not seem to help, in my opinion. You can have more life points, better cards, more multipliers left, yet it all depends on what comes next. I don't see advantages that hold up in general situations. What about you?
N.B.1 I hope this was clear; I simplified the game rules. Of course we could add more rules (a combo bonus for winning 2 consecutive rounds, multipliers applicable to the damage ability...).
N.B.2 I thought about a neural network, but the question is still interesting to me regardless. A neural network also seems hard to set up because of the multiple rounds (my knowledge does not extend to models with any kind of recurrence or memory).
N.B.3 I think that minimax and alpha/beta cuts will help if I still do a full computation analysis, but what I am afraid of is computation time; that's why I am asking here. I could probably begin with complete computation for last-2-round positions, yes.
Thanks for reading, and I hope you find this problem as stimulating as I do!
One way to evaluate the position in any game is to try to understand the thinking process of players who are considered experts in the game. So you can find experts in this game and ask them about factors that determine their decisions during the game. Or you can become an expert yourself by studying the game and playing it a lot. It can be very hard to come up with a good evaluation function by just looking at the rules of the game.
I haven't played this game, but probably it makes sense to start with some simple heuristic which is a linear combination of variables that determine the game state (health points of your main character, number of multipliers you have, total fight/damage ability of all your monsters, maximum fight/damage ability of any of your monsters, number of turns left etc). Take into account the corresponding values for your opponent and you'll get the eval function like this: a1*(my_hp - opp_hp) + a2*(my_monsters_total_fight - opp_monsters_total_fight) + a3*(my_monsters_total_damage - opp_monsters_total_damage) + a4*(my_number_of_multipliers - opp_number_of_multipliers) + ..., where coefficients a1,a2,.. can be positive or negative depending on the effect of corresponding variable (for instance, coefficient of hp variable a1 is positive etc.)
Now, this function might or might not work, but at least it will give you a starting point from which you can try to improve it or completely discard if it fails miserably. You can try to improve this evaluation function by tuning the coefficients, adding some non-linear terms to produce more complex relationships between the variables (multiplications, powers, logs etc.) and see how it affects performance. You can also try to automate the tuning process by using optimization techniques like genetic algorithms and differential evolution. In general, coming up with a good heuristic can be more an art than a science (after all, it's called heuristic for a reason). Start by trial and error and see how it goes.
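For concreteness, here is a minimal Python sketch of such a linear evaluation (all field names and coefficients are made up and would need tuning):

def evaluate(my, opp, a):
    # my/opp are hypothetical state dictionaries for the two players.
    features = [
        my["hp"] - opp["hp"],
        my["total_fight"] - opp["total_fight"],
        my["total_damage"] - opp["total_damage"],
        my["multipliers"] - opp["multipliers"],
    ]
    return sum(ai * fi for ai, fi in zip(a, features))

# Hand-picked starting coefficients a1..a4, to be tuned later
# (by hand, or automatically with a genetic algorithm etc.):
a = [1.0, 0.3, 0.3, 0.5]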
Related
I am writing an AI to play 5 card poker, where you are allowed to discard a card from your hand and swap it for another randomly dealt one if you wish. My AI can value every possible poker hand as shown in the answer to my previous question. In short, it assigns a unique value to each possible hand where a higher value correlates to a better/winning hand.
My task now is to write a function, int getDiscardProbability(int cardNumber), that gives my AI a number from 0-100 relating to whether or not it should discard this card (0 = definitely do not discard, 100 = definitely discard).
The approach I have thought up of was to compute every possible hand by swapping this card for every other card in the deck (assume there are still 47 left, for now), then compare each of their values with the current hand, count how many are better and so (count / 47) * 100 is my probability.
However, this solution is simply looking for any better hand, and not distinguishing between how much better one hand is. For example, if my AI had the hand 23457, it could discard the 7 for an K, producing a very slightly better hand (better high card), or it could exchange the 7 for an A or a 6, completing the Straight - a much better hand (much higher value) than a High King.
So, when my AI is calculating this probability, it would be increased by the same amount when it sees that the hand could be improved by getting the K as when it sees that the hand could be improved by getting an A or 6. Because of this, I somehow need to factor in the difference in value between my hand and each of the possible hands when calculating this probability. What would be a good approach to achieving this?
Games in general have a chicken-egg problem: you want to design an AI that can beat a good player, but you need a good AI to train your AI against. I'll assume you're making an AI for a 2-player version of poker that has antes but no betting.
First, I'd note that if I had a table of win-rate probabilities for each possible poker hand (of which there are surprisingly few really different ones), one can write a function that tells you the expected value of discarding a set of cards from your hand: simply enumerate all possible replacement cards and average the probability of winning over the resulting hands. There are not that many cards to evaluate -- even if you don't ignore suits, and you're replacing the maximum 3 cards, you have only 47 * 46 * 45 / 6 = 16215 possibilities. In practice, there are many fewer interesting possibilities -- for example, if the cards you don't discard aren't all of the same suit, you can ignore suits completely, and if they are of the same suit, you only need to distinguish "same suit" replacements from "different suit" replacements. This is slightly trickier than I describe it, since you've got to be careful to count possibilities right.
Then your AI can work by enumerating all the possible sets of cards to discard of which there are (5 choose 0) + (5 choose 1) + (5 choose 2) + (5 choose 3) = 1 + 5 + 10 + 10 = 26, and pick the one with the highest expectation, as computed above.
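In sketch form (Python), assuming you already have that table as a win_prob function from a sorted hand to a probability -- the next paragraphs are about how to get it:

from itertools import combinations
from statistics import mean

def best_discard(hand, deck, win_prob):
    # Try every discard set of size 0..3 and keep the one whose expected
    # win probability, averaged over all replacement draws, is highest.
    best_ev, best_set = win_prob(tuple(sorted(hand))), ()
    for k in (1, 2, 3):
        for discard in combinations(hand, k):
            kept = [c for c in hand if c not in discard]
            ev = mean(win_prob(tuple(sorted(kept + list(repl))))
                      for repl in combinations(deck, k))
            if ev > best_ev:
                best_ev, best_set = ev, discard
    return best_set, best_ev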
The chicken-egg problem is that you don't have a table of win-rate probabilities per hand. I describe an approach for a different poker-related game here, but the idea is the same: http://paulhankin.github.io/ChinesePoker/ . This approach is not my idea, and essentially the same idea is used for example in game-theory-optimal solvers for real poker variants like piosolver.
Here's the method.
Start with a table of probabilities made up somehow. Perhaps you just start assuming the highest rank hand (AKQJTs) wins 100% of the time and the worst hand (75432) wins 0% of the time, and that probabilities are linear in between. It won't matter much.
Now, simulate tens of thousands of hands with your AI and count how often each hand rank is played. You can use this to construct a new table of win-rate probabilities. This new table of win-rate probabilities is (ignoring some minor theoretical issues) an optimal counter-strategy to your AI in that an AI that uses this table knows how likely your original AI is to end up with each hand, and plays optimally against that.
The natural idea is now to repeat the process again, and hope this yields better and better AIs. However, the process will probably oscillate and not settle down. For example, if at one stage of your training your AI tends to draw to big hands, the counter AI will tend to play very conservatively, beating your AI when it misses its draw. And against a very conservative AI, a slightly less conservative AI will do better. So you'll tend to get a sequence of less and less conservative AIs, and then a tipping point where your AI is beaten again by an ultra-conservative one.
But the fix for this is relatively simple -- just blend the old table and the new table in some way (one standard way is to, at step i, replace the table with a weighted average of 1/i of the new table and (i-1)/i of the old table). This has the effect of not over-adjusting to the most recent iteration. And ignoring some minor details that occur because of assumptions (for example, ignoring replacement effects from the original cards in your hand), this approach will give you a game-theoretically optimal AI, as described in: "An iterative method of solving a game, Julia Robinson (1950)."
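The blending step itself is tiny; a Python sketch, with tables as dicts from hand rank to win probability:

def blend(old_table, new_table, i):
    # Step i of fictitious play: 1/i weight on the new counter-strategy
    # table, (i-1)/i on the running average, so no iteration over-adjusts.
    return {hand: ((i - 1) * old_table[hand] + new_table[hand]) / i
            for hand in old_table}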
A simple (but not so simple) way would be to use some kind of database with the hand combination probabilities (maybe University of Alberta Computer Poker Research Group Database).
The idea is to know, for each combination, what percentage of the time it wins, and then to compare that percentage across each possible hand.
For instance, you have 5 cards, AAAKJ, and it's time to discard (or not).
AAAKJ has a winning percentage (which I don't know; let's say 75).
AAAK (discarding J) has a 78 percentage (let's say).
AAAJ (discarding K) has x.
AAA (discarding KJ) has y.
AA (discarding AKJ) has z.
KJ (discarding AAA) has 11 (?)..
etc.
And the AI would keep the combination with the highest rate of success.
Instead of counting how many are better you might compute a sum of probabilities Pi that the new hand (with swapped card) will win, i = 1, ..., 47.
This might be a tough call because of the other players: you don't know their cards, and thus their current chances to win. To make it easier, an approximation of some sort can be applied.
For example, Pi = N_lose / N, where N_lose is the number of hands that would lose to the new hand with the i-th card, and N is the total possible number of hands without the 5 that the AI is holding. Finally, you use the sum of the Pi instead of the count.
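A brute-force Python sketch of one Pi (enumerating all C(47,5) opponent hands is about 1.5 million evaluations per candidate, so in practice you would sample or precompute, but it shows the definition):

from itertools import combinations

def p_win(new_hand, remaining_deck, value):
    # Fraction of possible opponent hands that new_hand beats;
    # value() is the hand-ranking function from the earlier question.
    v = value(new_hand)
    wins = total = 0
    for opp in combinations(remaining_deck, 5):
        total += 1
        wins += v > value(opp)
    return wins / total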
I've been reading this small tutorial on Nimbers and game theory.
Could someone explain why the mex rule governs the nimber of a game position?
See: http://en.wikipedia.org/wiki/Mex_(mathematics)
From the minimal excluded ordinal, it seems to me that the nimber for a state is actually the minimum state that the player 'cannot' reach. How does that help in governing the state of the current game?
I see a proof on Wikipedia, but I don't understand anything from it.
http://en.wikipedia.org/wiki/Sprague%E2%80%93Grundy_theorem#Proof
The entire idea of a nimber is to draw an analogy with the well understood game of Nim. So unless you understand THAT game, it won't make sense to you.
In the game of Nim we have a set of piles of things. On each turn, you take as many things as you want from one pile and one pile only. The winner is the person to take the last thing from the last pile.
Now try to convince yourself of the following facts.
In Nim, the nimber of a single pile is the size of that pile.
If we have a 2 pile game, the nimber of the position is the xor of the sizes of the two piles. (You will need to do a double induction.)
If we take the set of piles and split it into two, then the nimber of the whole position is the xor of the nimbers of the two subsets.
Now here is the point. Replace the piles with arbitrary deterministic games with a guaranteed win/lose. Turn the collection into a game where on each turn you move in one of the component games, and the person who makes the last move wins. The nimber as defined above tells you, by analogy with Nim, how to play the combined game perfectly.
If you're playing just the regular 2 person game, then the only fact about the nimber that you actually need to know is whether it is 0 (you're in a losing position) or non-zero (you're in a winning position). The exact nimber is only useful when you can break a complex game into a collection of separate games that you are choosing between on each turn. However a surprising number of mathematical games do admit of such a structure.
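If it helps to see the definitions executed, here is a small Python sketch of mex and the Grundy (nimber) computation for an arbitrary impartial game given by a moves(state) function:

from functools import lru_cache

def mex(values):
    # Minimal excludant: the smallest non-negative integer not in the set.
    n = 0
    while n in values:
        n += 1
    return n

@lru_cache(maxsize=None)
def grundy(state, moves):
    # The nimber of a position is the mex of the nimbers of its options.
    return mex({grundy(s, moves) for s in moves(state)})

# A single Nim pile: you may move to any smaller pile, and grundy(n) == n.
nim_moves = lambda n: tuple(range(n))
assert all(grundy(n, nim_moves) == n for n in range(20))

# For a sum of independent games, XOR the component nimbers;
# a total of 0 means the player to move loses with perfect play.
def grundy_sum(states, moves):
    total = 0
    for s in states:
        total ^= grundy(s, moves)
    return total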
For me, it was like this:
Understand Nim, and why the strategy works
Understand Poker Nim, and why the strategy is the same
Understand why the mex is the important number
Poker Nim is just like Nim, except that the players hold onto the "coins" that they remove, and on their turn, they may either move any positive number of coins from one stack into their hand, or move any positive number of coins from their hand onto one stack.
Initially, this feels very different. Play can even proceed for infinitely many moves! But that doesn't happen if Bob and Alice are playing hard. Suppose Bob looks at the stacks and sees that he would have a winning strategy if they were playing Nim and not Poker Nim. He can adapt that strategy to Poker Nim as follows: if Alice takes coins off the table, he proceeds as if he is playing Nim; if Alice puts coins onto the table, he immediately removes the coins she just placed. Since she can only have finitely many coins in her hand, she can only stall finitely many times before she is forced to make her losing Nim move.
In Poker Nim, if I have 5 coins in hand and I look at a stack of 3 coins, I can on my move change it to have 0, 1, 2, 4, 5, 6, 7, or 8 coins. What I can't do is leave it at the mex of that set, which is 3. If I move it down, I am playing Nim. If I move it up, you can immediately reverse it back to 3, and I am facing the same situation I was, except that now I have fewer than 5 coins in hand.
So that's Poker Nim, and the essence of how the mex becomes relevant. Moves above the mex are reversible, and so cannot ever turn a losing position into a winning one. Moving above the mex is never helpful. Unless you are trying to overwhelm the computational power of your opponent, that is.
A school project has me writing a Date game in C++ (example at http://www.cut-the-knot.org/Curriculum/Games/Date.shtml) where the computer player must implement a minimax algorithm with alpha-beta pruning. Thus far, I understand the goal behind the algorithm in terms of maximizing potential gains while assuming the opponent will minimize them.
However, none of the resources I read helped me understand how to design the evaluation function that minimax bases all its decisions on. All the examples have arbitrary numbers assigned to the leaf nodes; however, I need to actually assign meaningful values to those nodes.
Intuition tells me it'd be something like +1 for a win leaf node, and -1 for a loss, but how do intermediate nodes evaluate?
Any help would be most appreciated.
The most basic minimax evaluates only leaf nodes, marking wins, losses and draws, and backs those values up the tree to determine the intermediate node values. In the case that the game tree is intractable, you need to use a cutoff depth as an additional parameter to your minimax functions. Once the depth is reached, you need to run some kind of evaluation function for incomplete states.
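In sketch form (Python, with the game passed in as plain functions, since the details are domain specific):

def minimax(state, depth, maximizing, successors, is_terminal, utility, evaluate):
    # Exact values at terminal nodes; once the cutoff depth is reached,
    # fall back to the heuristic evaluation of the incomplete state.
    if is_terminal(state):
        return utility(state)   # e.g. +1 win, -1 loss, 0 draw
    if depth == 0:
        return evaluate(state)
    values = (minimax(s, depth - 1, not maximizing,
                      successors, is_terminal, utility, evaluate)
              for s in successors(state))
    return max(values) if maximizing else min(values)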
Most evaluation functions in a minimax search are domain specific, so finding help for your particular game can be difficult. Just remember that the evaluation needs to return some kind of percentage expectation of the position being a win for a specific player (typically max, though not when using a negamax implementation). Just about any less researched game is going to closely resemble another more researched game. This one ties in very closely with the game pickup sticks. Using minimax and alpha beta only, I would guess the game is tractable.
If you must create an evaluation function for non-terminal positions, here is a little help with the analysis of the sticks game; you can decide whether it's useful for the Date game or not.
Start looking for a way to force an outcome by looking at a terminal position and all the moves which can lead to that position. In the sticks game, a terminal position is one with 3 or fewer sticks remaining on the last move. The position that immediately precedes that terminal position therefore leaves 4 sticks to your opponent. The goal is now to leave your opponent with 4 sticks no matter what, and that can be done from 5, 6 or 7 sticks being left to you, so you would like to force your opponent to leave you in one of those positions. The place your opponent needs to be in order for you to be left with 5, 6 or 7 is 8. Continue this logic on and on and a pattern becomes apparent very quickly. Always leave your opponent with a number divisible by 4 and you win; anything else, you lose.
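You can verify that divisible-by-4 pattern mechanically; a Python sketch, assuming the usual variant where a move takes 1 to 3 sticks and taking the last stick wins:

from functools import lru_cache

@lru_cache(maxsize=None)
def wins(sticks):
    # True if the player to move can force a win from this many sticks.
    if sticks == 0:
        return False  # the previous player took the last stick and won
    return any(not wins(sticks - take) for take in (1, 2, 3) if take <= sticks)

assert all(wins(n) == (n % 4 != 0) for n in range(100))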
This is a rather trivial game, but the method for determining the heuristic is what is important, because it can be applied directly to your assignment. Since the player who makes the last move wins, and you can only change 1 date attribute at a time, you know that to win there needs to be exactly 2 moves left... and so on.
Best of luck, let us know what you end up doing.
The simplest case of an evaluation function is +1 for a win, -1 for a loss and 0 for any non-finished position. Given your tree is deep enough, even this simple function will give you a good player. For any non-trivial games, with high branching factor, typically you need a better function, with some heuristics (e.g. for chess you could assign weights to pieces and find a sum, etc.). In the case of the Date game, I would just use the simplest evaluation function, with 0 for all the intermediate nodes.
As a side note, minimax is not the best algorithm for this particular game; but I guess you know it already.
From what I understand of the Date game you linked to, it seems that the only possible outcomes for a player are a win or a loss; there is no in-between (please correct me if I'm wrong).
In this case, it is only a matter of assigning a value of 1 to a winning position (current player gets to Dec 31) and a value of -1 to the losing positions (other player gets to Dec 31).
Your minimax algorithm (without alpha-beta pruning) would look something like this:
from functools import lru_cache

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # ignoring leap years

def successors(month, day):
    # You may increase either the day (same month) or the month (same day).
    for d in range(day + 1, DAYS[month - 1] + 1):
        yield month, d
    for m in range(month + 1, 13):
        if day <= DAYS[m - 1]:
            yield m, day

@lru_cache(maxsize=None)
def a_move(month, day):
    # A to move; if the date is already Dec 31, B named it, so A lost.
    if (month, day) == (12, 31):
        return -1
    return max(b_move(m, d) for m, d in successors(month, day))

@lru_cache(maxsize=None)
def b_move(month, day):
    # B to move; if the date is already Dec 31, A named it, so A won.
    if (month, day) == (12, 31):
        return +1
    return min(a_move(m, d) for m, d in successors(month, day))
Way way back (think 20+ years) I encountered a Gomoku game source code in a magazine that I typed in for my computer and had a lot of fun with.
The game was difficult to win against, but the core algorithm for the computer AI was really simple and didn't amount to a lot of code. I wonder if anyone knows this algorithm and has some links to source code or theory about it.
The things I remember was that it basically allocated an array that covered the entire board. Then, whenever I, or it, placed a piece, it would add a number of weights to all locations on the board that the piece would possibly impact.
For instance (note that the weights are definitely wrong as I don't remember those):
1   1   1
 2  2  2
  3 3 3
   444
1234X4321
   444
  3 3 3
 2  2  2
1   1   1
Then it simply scanned the array for an open location with the lowest or highest value.
Things I'm fuzzy on:
Perhaps it had two arrays, one for me and one for itself and there was a min/max weighting?
There might've been more to the algorithm, but at its core it was basically an array and weighted numbers
Does this ring a bell with anyone at all? Anyone got anything that would help?
Reading your description, and thinking a little about it, I think it probably works with a single array, exactly the way you described.
To accomplish the goal of getting five-in-a-row you have to (a) prevent the opponent from succeeding and (b) succeed yourself.
To succeed yourself, you have to place stones near other stones you already have on the board, so it makes sense to add a positive score for fields next to your stones that could participate in a row. Either the linear example you gave, or something quadratic would probably work well.
To prevent your opponent from succeeding, you have to place stones next to his / her stones. It's especially good if you strike two birds with a single stone, so opponent's stones should increase the value of the surrounding fields the same way yours do -- the more stones he already has lined up, the higher the score, and the more likely the algorithm will try to cut the opponent off.
The most important thing here is the weighting of the different fields, and whether the opponent's stones are weighted differently than yours. Unfortunately I can't help with that, but the values should be reasonably simple to figure out through trial and error once the game itself is written.
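A minimal Python sketch of that single-array idea (board size, weights and the opponent factor are all guesses to be tuned):

SIZE = 15
EMPTY = 0
WEIGHTS = (4, 3, 2, 1)  # influence at distance 1..4 along each line
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
              if (dx, dy) != (0, 0)]

def add_influence(score, x, y, factor=1.0):
    # Called whenever either player places a stone at (x, y): bump the
    # score of every field that stone could affect. A factor below 1 for
    # opponent stones makes the engine prefer attacking over blocking.
    for dx, dy in DIRECTIONS:
        for dist, w in enumerate(WEIGHTS, start=1):
            nx, ny = x + dx * dist, y + dy * dist
            if 0 <= nx < SIZE and 0 <= ny < SIZE:
                score[ny][nx] += w * factor

def best_move(board, score):
    # Scan for the open field with the highest accumulated value.
    return max(((score[y][x], x, y)
                for y in range(SIZE) for x in range(SIZE)
                if board[y][x] == EMPTY))[1:]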
However this is a very basic approach, and would be outperformed by a tree search algorithm. Searching Google, there's a related paper on Threat search, which apparently works well for Gomoku. The paper is behind a pay-wall though :/
I haven't read the article, but from the description my guess would be some form of the Minimax algorithm
I saw this algorithm you mentioned - it was pretty simple and fast (no backtracking :-)) and it played very well :-) I must have the source somewhere, but it was many years ago... There were weights for your stones depending on how many other stones were near, and weights for opponent stones. These were lower, so the algorithm preferred the attacking strategy.
But this is of course a very trivial algorithm. A winning strategy has already been found.
See this paper: L. Victor Allis, H. J. van den Herik, M. P. H. Huntjens. Go-Moku and Threat-Space Search. It helped me a lot when I was writing my own program. This way you'll be able to write a program which is very good at attacking the opponent and finding winning combinations.
It's an ancient game - I found the code on Planet Source Code. I played this game during college, and back in the 286 days I had a BASIC version of it.
Here is the program you are looking for
ftp://ftp.mrynet.com/USENIX/80.1/boulder/dpw/gomoku.c
It is almost 40 years old
Working on an open source version for iPhone.
Hit me up if interested in joining!
https://github.com/kigster/kigomoku
First of all: This is not a question about how to make a program play Five in a Row. Been there, done that.
Introductory explanation
I have made a five-in-a-row-game as a framework to experiment with genetically improving AI (ouch, that sounds awfully pretentious). As with most turn-based games the best move is decided by assigning a score to every possible move, and then playing the move with the highest score. The function for assigning a score to a move (a square) goes something like this:
If the square already has a token, the score is 0 since it would be illegal to place a new token in the square.
Each square can be a part of up to 20 different winning rows (5 horizontal, 5 vertical, 10 diagonal). The score of the square is the sum of the score of each of these rows.
The score of a row depends on the number of friendly and enemy tokens already in the row. Examples:
A row with four friendly tokens should have infinite score, because if you place a token there you win the game.
The score for a row with four enemy tokens should be very high, since if you don't put a token there, the opponent will win on his next turn.
A row with both friendly and enemy tokens will score 0, since this row can never be part of a winning row.
Given this algorithm, I have declared a type called TBrain:
type
TBrain = array[cFriendly..cEnemy , 0..4] of integer;
The values in the array indicate the score of a row with either N friendly tokens and 0 enemy tokens, or 0 friendly tokens and N enemy tokens. If there are 5 tokens in a row there's no score, since the row is full.
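To make that scoring concrete, here is a rough Python translation (the board representation and names are mine; the Delphi array becomes a dict with a "friendly" and an "enemy" row):

def windows_through(x, y, size=19):
    # The up-to-20 five-square lines through (x, y):
    # 5 horizontal, 5 vertical and 10 diagonal placements.
    for dx, dy in ((1, 0), (0, 1), (1, 1), (1, -1)):
        for offset in range(-4, 1):
            cells = [(x + (offset + i) * dx, y + (offset + i) * dy)
                     for i in range(5)]
            if all(0 <= cx < size and 0 <= cy < size for cx, cy in cells):
                yield cells

def square_score(board, brain, x, y):
    # Sum of the row scores for every winning row through (x, y).
    if board[y][x] is not None:
        return 0  # occupied square: illegal move
    total = 0
    for cells in windows_through(x, y):
        friendly = sum(board[cy][cx] == "F" for cx, cy in cells)
        enemy = sum(board[cy][cx] == "E" for cx, cy in cells)
        if friendly and enemy:
            continue  # mixed row can never be completed: scores 0
        total += brain["enemy" if enemy else "friendly"][max(friendly, enemy)]
    return total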
It's actually quite easy to decide which values should be in the array. vBrain[0,4] (four friendly tokens) should be "infinite"; let's call that 1.000.000. vBrain[1,4] should be very high, but not so high that the brain would prefer blocking several enemy wins rather than winning itself.
Consider the following (improbable) board:
0123456789
+----------
0|1...1...12
1|.1..1..1.2
2|..1.1.1..2
3|...111...2
4|1111.1111.
5|...111....
6|..1.1.1...
7|.1..1..1..
8|1...1...1.
Player 2 should place his token in (9,4), winning the game, not in (4,4) even though he would then block 8 potential winning rows for player 1. Ergo, vBrain[1,4] should be (vBrain[0,4]/8)-1. Working like this we can find optimal values for the "brain", but again, this is not what I'm interested in. I want an algorithm to find the best values.
I have implemented this framework so that it's totally deterministic. There are no random values added to the scores, and if several squares have the same score, the top-left one is chosen.
Actual problem
That's it for the introduction, now to the interesting part (for me, at least)
I have two "brains", vBrain1 and vBrain2. How should I iteratively make these better? I Imagine something like this:
Initialize vBrain1 and vBrain2 with random values.
Simulate a game between them.
Assign the values from the winner to the loser, then randomly change one of them slightly.
This doesn't seem to work. The brains don't get any smarter. Why?
Should the score-method add some small random values to the result, so that two games between the same two brains would be different? How much should the values change for each iteration? How should the "brains" be initialized? With constant values? With random values?
Also, does this have anything to do with AI or genetic algorithms at all?
PS: The question has nothing to do with Five in a Row. That's just something I chose because I can declare a very simple "Brain" to experiment on.
If you want to approach this problem like a genetic algorithm, you will need an entire population of "brains". Then evaluate them against each other, either in every combination or in a tournament style. Then select the top X% of the population and use those as the parents of the next generation, where offspring are created via mutation (which you have) or genetic crossover (e.g., swap rows or columns between two "brains").
Also, if you do not see any evolutionary progress, you may need more than just win/loss, but come up with some kind of point system so that you can rank the entire population more effectively, which makes selection easier.
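A skeleton of that loop in Python, treating each brain as a flat list of numbers (play_game and the population setup are placeholders for your framework):

import random

def evolve(population, generations, play_game, elite_frac=0.2, mut_rate=0.1):
    for _ in range(generations):
        # Round-robin so the whole population can be ranked by points,
        # not judged on a single win/loss.
        points = {id(b): 0 for b in population}
        for i, b1 in enumerate(population):
            for b2 in population[i + 1:]:
                winner = play_game(b1, b2)  # returns b1, b2 or None on a draw
                if winner is not None:
                    points[id(winner)] += 1
        population.sort(key=lambda b: points[id(b)], reverse=True)
        elite = population[:max(2, int(elite_frac * len(population)))]
        # Refill the population with crossover + mutation of elite parents.
        children = []
        while len(elite) + len(children) < len(population):
            p1, p2 = random.sample(elite, 2)
            child = [g1 if random.random() < 0.5 else g2
                     for g1, g2 in zip(p1, p2)]
            child = [g + random.gauss(0, 1) if random.random() < mut_rate else g
                     for g in child]
            children.append(child)
        population[:] = elite + children
    return population[0]  # best brain from the final ranking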
Generally speaking, yes you can make a brain smarter by using genetic algorithms techniques.
Randomness, or mutation, plays a significant part on genetic programming.
I like this tutorial, Genetic Algorithms: Cool Name & Damn Simple.
(It uses Python for the examples but it's not difficult to understand them)
Take a look at NeuroEvolution of Augmenting Topologies (NEAT). A fancy acronym which basically means the evolution of neural nets - both their structure (topology) and connection weights. I wrote a .Net implementation called SharpNEAT that you may wish to look at. SharpNEAT V1 also has a Tic-Tac-Toe experiment.
http://sharpneat.sourceforge.net/