Minimax with alpha-beta pruning and transposition table - algorithm

I'm trying to implement a minimax algorithm with alpha-beta pruning AND a transposition table. This is for a Pacman agent that may cycle, so special care must be taken: if a state (the game state plus whose turn it is, Pacman's or a ghost's) is already in the transposition table and was previously seen as a parent (grandparent, ...) of the current node, it can be discarded. This works for minimax without alpha-beta pruning. From what I've read, a transposition table combined with alpha-beta seems to be much, much harder to implement. I'm trying to keep the code as clear as possible; it is based on the pseudo-code in Artificial Intelligence: A Modern Approach, and I would like to keep the final result as close as possible to that first approach.
Each pseudo-code I found is written in a very different way:
First pseudo-code
Second pseudo-code
Third pseudo-code
Most of the differences seem cosmetic, but none of those codes has exactly the structure I'm looking for: a minimax split into a minValue and a maxValue function with alpha-beta pruning.
Thanks in advance,
Please ask for any further explanation.
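For reference, the AIMA-style structure I'm starting from looks roughly like this (my own Python sketch, not the book's exact code; terminal_test, utility and successors stand in for my game's helpers):

    from math import inf

    def alpha_beta(state):
        # Minimax value of `state` with alpha-beta pruning, assuming it is
        # the maximizing player's (Pacman's) turn.
        return max_value(state, -inf, inf)

    def max_value(state, alpha, beta):
        if terminal_test(state):
            return utility(state)
        v = -inf
        for child in successors(state):
            v = max(v, min_value(child, alpha, beta))
            if v >= beta:           # beta cutoff
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta):
        if terminal_test(state):
            return utility(state)
        v = inf
        for child in successors(state):
            v = min(v, max_value(child, alpha, beta))
            if v <= alpha:          # alpha cutoff
                return v
            beta = min(beta, v)
        return v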

I'm still kind of new to more advanced AI optimization, but I'll share what I've learned. Two of the pseudo-code links (1 and 3) are both Negamax, which is trickier than minimax because it is less intuitive. The two differing implementations of Negamax in 1 and 3 require different evaluation functions and that is the main reason for their differences (more below). The second link you posted is for MTD(f) which I haven't implemented before, but I believe is different still from both Minimax and Negamax. I believe MTD(f) is considered to be faster. Lastly, the only resource I have ever seen for Minimax with transposition tables is here and I am really not sure if it is even correct. Negamax is pretty much the standard and if you can use Minimax, you can probably use Negamax instead.
While Negamax and Minimax look different, they are essentially doing the same thing. This blog post gives a pretty good description of how they're related, but doesn't explain the differences. I will try to explain why they're different below.
Why minimax and negamax look different but are essentially the same becomes a little more apparent after thinking about a few things related to Minimax:
Minimax only works for 2 player games with one player being the maximizer and another being the minimizer. Tic Tac Toe is a simple example.
A typical evaluation function for Minimax will return +100 if X won in a terminal state, will return -100 if O won in a terminal state, and will return 0 for a draw.
Notice how the scores are the inverse of one another. Every point gained by player 1 is a point lost for player 2. It's a zero sum game.
And a few points about Negamax:
Negamax also only works for 2 player zero sum games. Every point for player 1 is a point lost for player 2.
Negamax uses a slightly different evaluation function than Minimax. It requires that the evaluation is always done from the current player's point of view. That is to say: if X won in a terminal state and it is X's turn, the evaluation should be +100; if X won but it's O's turn, the evaluation would be -100. This is different from what Minimax expects (Minimax always wants an X win to be worth +100). Pseudo-code 1 expects this type of evaluation function.
Some Negamax pseudo-codes, like the Wikipedia article in 3, try to use the same evaluation function as Minimax by negating its value using color, in the line "return color × the heuristic value of node". This also works, but I've never done it that way (links to how I do it below). Note that the color value will only be -1 for min players. I find this way more confusing all around.
Now that the evaluation function is described... notice this line "value := max(value, −negamax(child, depth − 1, −β, −α, −color))" in pseudo-code 3. The returned value (some evaluation value), which is always from the current player's point of view, is inverted. That's because turns alternate: the evaluation came from a child state, i.e. the other player's turn. The alpha and beta values are also inverted.
With Minimax we're coming up with positive and negative evaluations. With Negamax we're always creating positive evaluations and then inverting them as necessary, hence Nega. This is possible because the game is zero sum, a point for player 1 is a loss of a point for player 2.
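To make the two conventions concrete, here is a toy sketch (my own illustration; winner is a hypothetical helper returning 'X', 'O' or None for a finished tic-tac-toe state):

    def minimax_eval(state):
        # Fixed point of view: an X win is always +100, an O win always -100.
        w = winner(state)
        return +100 if w == 'X' else -100 if w == 'O' else 0

    def negamax_eval(state, to_move):
        # Mover's point of view (pseudo-code 1): +100 means "the player to
        # move has won", so the same X win flips sign depending on the turn.
        score = minimax_eval(state)
        return score if to_move == 'X' else -score

    # Wikipedia's variant (pseudo-code 3) keeps minimax_eval and multiplies
    # by color: +1 when the max player is to move, -1 for the min player.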
Why use Negamax? Because it's simpler. It is more challenging to implement the first time, but it makes things more concise. I also believe that transposition table entries need to be handled differently (in a more complex way) for Minimax than for Negamax. Most importantly, everyone else uses it. I wish I had a better explanation for why.
Here is the best resource I have found for implementing Transposition Tables with Negamax (most pseudo-code isn't all that helpful):
Iterative Deepening NegaScout with alpha beta pruning and transposition tables
I've also implemented vanilla Negamax with transposition tables, but I can no longer find the resources I used. To convert the above to vanilla Negamax, you simply replace lines 504 (beginning with // null window search) through 521 with "goodness = -minimax(state, depth - 1, -beta, -alpha);". The extra lines in that code block are the "scout" part, which starts with a narrow alpha-beta search window and widens it as needed. Generally NegaScout is better than Negamax. I could share my full source, but I'd need some time to prepare something fit for posting to SO.
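To give an idea of the final shape, here is a minimal sketch of vanilla Negamax with alpha-beta and a transposition table along the lines described above (my own sketch using Wikipedia-style bound flags; key, children, is_terminal and evaluate are game-specific helpers you would supply):

    from math import inf

    EXACT, LOWER, UPPER = 0, 1, 2    # what a stored value means

    def negamax(state, depth, alpha, beta, color, tt):
        alpha_orig = alpha

        # Table lookup: only trust entries searched at least as deep as us.
        entry = tt.get(key(state))
        if entry is not None and entry['depth'] >= depth:
            if entry['flag'] == EXACT:
                return entry['value']
            if entry['flag'] == LOWER:
                alpha = max(alpha, entry['value'])
            elif entry['flag'] == UPPER:
                beta = min(beta, entry['value'])
            if alpha >= beta:
                return entry['value']

        if depth == 0 or is_terminal(state):
            return color * evaluate(state)   # from the current player's view

        value = -inf
        for child in children(state):
            value = max(value, -negamax(child, depth - 1, -beta, -alpha, -color, tt))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # beta cutoff

        # Store the result, flagged as exact or as an upper/lower bound.
        if value <= alpha_orig:
            flag = UPPER
        elif value >= beta:
            flag = LOWER
        else:
            flag = EXACT
        tt[key(state)] = {'value': value, 'depth': depth, 'flag': flag}
        return value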
If for some reason you cannot implement Negamax, this is the only resource I have found for implementing Transposition Tables with Minimax.
Lastly, I want to throw out a couple things:
When using transposition tables you will probably want to use Iterative Deepening, as it provides a natural cutoff when time is your constraint.
When using transposition tables you will want to consider isomorphic boards. That is, you will want to treat reflected (and rotated) versions of a board as the same board (see the sketch after this list). Example: evaluating this board in tic tac toe XOX|---|X-- is the same as evaluating X--|---|XOX (vertical flip). I'm not sure whether this applies to Pacman, but it is a huge improvement when available. In Tic Tac Toe it leads to 70-90% of search states being shaved off with transposition tables. Reply in a comment if you want to discuss.
If you're implementing your game in JavaScript, take note that standard 64-bit Zobrist keys won't work directly, because JavaScript's bitwise operators work on 32 bits. There are a few different ways around it, but I suggest starting with plain strings as keys in a {} object.
If you're searching for a multiplayer AI you should be looking at Hypermax / Max-N. Minimax and Negamax fail beyond 2 players.
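On the isomorphic-boards point above, here is a small sketch of canonicalizing a tic-tac-toe board under its 8 symmetries before using it as a transposition-table key (my own illustration, assuming the board is a 9-character string in row-major order):

    def rotate(b):   # rotate the 3x3 board 90 degrees clockwise
        return ''.join(b[r * 3 + c] for c in range(3) for r in (2, 1, 0))

    def mirror(b):   # flip the board left-right
        return ''.join(b[r * 3 + c] for r in range(3) for c in (2, 1, 0))

    def canonical(b):
        # All 8 symmetries: 4 rotations of the board and of its mirror image.
        variants = []
        for start in (b, mirror(b)):
            cur = start
            for _ in range(4):
                variants.append(cur)
                cur = rotate(cur)
        return min(variants)   # any fixed representative works; min is easy

    # Use canonical(board) as the transposition-table key so that all
    # reflections/rotations of a position share one entry.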

Related

Chess programming: minimax, detecting repeats, transposition tables

I'm building a database of chess evaluations (essentially a map from a chess position to an evaluation), and I want to use this to come up with a good move for given positions. The idea is to do a kind of "static" minimax, i.e.: for each position, use the stored evaluation if evaluations for child nodes (positions after next ply) are not available, otherwise use max (white to move)/min (black to move) evaluations of child nodes (which are determined in the same way).
The problem is, of course, loops in the graph, i.e. repeated positions. I can't fathom how to deal with this without making the whole thing vastly less efficient.
The ideas I have explored so far are:
assume an evaluation of 0 for any position that can be reached in a game with fewer moves than are currently evaluated. This is an invalid assumption, because - for example - if White plays A, it might not be desirable for Black to follow up with x, but if White plays B, then y -> A -> x -> -B -> -y might be the best line, resulting in the same position as A -> x without any repetitions (-m denoting the inverse move of m here; lower case: Black moves, upper case: White moves).
having one instance for each possible way a position can be reached solves the loop problem, but this yields a bazillion instances in some positions and is therefore not practical.
the fact that there is a loop from a position back to that position doesn't mean that it's a draw by repetition, because playing the repeating line may not be the best choice.
I've tried iterating through the loops a few times to see if the overall evaluation becomes stable. It doesn't, because in some cases assuming the repeat is the best line means it no longer is - and then it goes back to the draw being the best line, and so on.
I know that chess engines use transposition tables to detect positions already reached before, but I believe this doesn't address my problem, and I actually wonder if there isn't an issue with them: a position may be reachable through two paths in the search tree - one of them going through the same position before (so it's a repeat), and the other not. Then the evaluation for path 1 would have to be 0, but the one for path 2 wouldn't necessarily be (path 1 may not be the best line), so whichever evaluation the transposition table holds may be wrong, right?
I feel sure this problem must have a "standard / best practice" solution, but Google failed me. Any pointers / ideas would be very welcome!
I don't understand what the problem is. A minimax evaluation, unless we've added randomness to it, will give the exact same result for any given board position combined with whose turn it is and other key info. If we have the space available to store common board_position + whose_turn + castling + en_passant + draw_related tuples (or hashes thereof), go right ahead. When reaching that tuple in any other evaluation, just return the stored value, or rely on its more detailed record for more complex evaluations (if the search yielding that record was not exhaustive, we can interpret it differently in any one evaluation). If the program also plays chess with time limits on the game, an additional time dimension (maybe a few broad blocks) would probably be needed in the memoisation as well.
(I assume you've read common public info about transposition tables.)
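As a sketch of the kind of memoisation key I mean (field names are illustrative, and board is whatever immutable encoding you use, e.g. a FEN-like string; real engines typically fold the same information into a single Zobrist hash instead of a tuple):

    def position_key(board, side_to_move, castling_rights, en_passant_square,
                     halfmove_clock):
        # Everything the stored value legitimately depends on must be in the
        # key; positions differing in any of these must not share an entry.
        return (board, side_to_move, castling_rights,
                en_passant_square, halfmove_clock)

    evaluations = {}   # position_key(...) -> stored evaluation / record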

Minimax algorithm with memoization?

I am trying to implement a connect four AI using the minimax algorithm in JavaScript. Currently it is very slow. Other than alpha-beta pruning, which I will implement, I was wondering if it is worth it to hash game states to:
their heuristic evaluations
the next best move
I can immediately see why 2 would be useful, since there are many ways to get to the same game state, but I am wondering if I also have to hash the current depth to make this work.
For example, if I reached this state at a depth of 3 (so only, say, 4 more moves to look ahead) vs. at a depth of 2 with 5 moves to look ahead, I might arrive at a different answer. Doesn't this mean I should take the depth into account in the hash?
My second question is whether hashing boards to their evaluations is worth it. It takes me O(n) time to build my hash, and O(n) time to evaluate a board (though it's really more like O(2n) or O(3n)). Are game states usually hashed to their evaluations, or is this overkill? Thanks for any help.
Whenever you hash the value of a state (using heuristics), you need to store the depth at which that state was evaluated. This is because there is a big difference between "the value is 0.1 at depth 1" and "the value is 0.1 at depth 20". In the first case we have barely investigated the space, so we are pretty unsure what happens. In the second case we have already done a huge amount of work, so we kind of know what we are talking about.
The thing is that for some games we do not know what the depth of a position is. Chess, for example. But in Connect 4, just looking at a position tells you its depth.
For the Connect 4 position pictured, the depth is 14 (only 14 discs have been placed). So you do not need to store the depth.
As for whether you should hash the state or re-evaluate it: clearly a position in this game can be reached through many game paths, so you would expect the hash to be helpful. The important question is the trade-off between creating/looking up the hash and how expensive your evaluation function is. If it looks like it does a lot of work - hash it and benchmark.
One last suggestion: you mentioned alpha-beta, which is more helpful than hashing at your stage (and not that hard to implement). You can go further and implement move ordering for your alpha-beta (a sketch follows below). If I were you, I would do that, and only after that implement hashing.
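For Connect 4, even trivial move ordering helps: centre columns are stronger on average, so trying them first causes earlier alpha-beta cutoffs. A sketch (my own illustration; a standard 7-column board is assumed and column_full is a hypothetical helper):

    WIDTH = 7
    # Try centre columns first: [3, 2, 4, 1, 5, 0, 6]
    ORDERED_COLUMNS = sorted(range(WIDTH), key=lambda c: abs(c - WIDTH // 2))

    def ordered_moves(state):
        return [c for c in ORDERED_COLUMNS if not column_full(state, c)]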

Negamax - player moves twice

How do you handle games where, if a condition is met, the same player moves again?
I tried something like this but I don't think it's quite right:
function negamax(node, depth, α, β, color)
    if node is a terminal node or depth = 0
        return color * the heuristic value of node
    else
        foreach child of node
            if (condition is met) // the same player moves
                val := negamax(child, depth-1, α, β, color)
            else
                val := -negamax(child, depth-1, -β, -α, -color)
            if val ≥ β
                return val
            if val ≥ α
                α := val
    return α
Don't try changing the minimax algorithm itself for this; instead, modify the game representation to accommodate it. There are basically two solutions:
Represent the sequence of moves made by a single player as one move. This works when the game is simple, but won't always. I wrote an AI engine for a game where generating this tree (described as one "move" in the game rules) is PSPACE-hard (and had a very large n for real games), meaning it was not computationally feasible. On the other hand, if modeling it this way is easy for your particular game, do it that way.
Represent the sequence of moves made by one player as a sequence of alternating moves where the other player does not do anything. That is, you add a piece of information to the game state whenever your condition is met, such that the only move the other player can make does not change the state (see the sketch below). This method is mathematically correct, and when I used it, it worked pretty well in practice. The only complexity to keep in mind is that if you use iterative deepening you will under-evaluate game trees where one player moves many times in a row. You also may need to be careful when designing storage heuristics to be used with the transposition table and other hashes.
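Here is a rough sketch of solution 2 in Python (names are illustrative: play, base_moves and other are hooks into your own game, not a real API):

    PASS = "pass"

    class State:
        def __init__(self, board, to_move, opponent_must_pass=False):
            self.board = board
            self.to_move = to_move
            self.opponent_must_pass = opponent_must_pass

    def legal_moves(state):
        if state.opponent_must_pass:
            return [PASS]              # the only legal "move": hand the turn back
        return base_moves(state)       # the game's normal move generator

    def apply_move(state, move):
        if move == PASS:
            # A pass changes nothing except whose turn it is.
            return State(state.board, other(state.to_move))
        new_board, condition_met = play(state.board, move, state.to_move)
        # If the mover earned an extra turn, force the opponent to pass next.
        return State(new_board, other(state.to_move),
                     opponent_must_pass=condition_met)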
I know of no literature that discuses your particular problem. I felt clever when I came up with solution 2 above, but I suspect many other people have invented the same trick.
I should say, getting the minimax family right is surprisingly hard. A trick when designing game-search AIs in high-level languages is to test your search algorithms on simpler games (reduced board size, tic-tac-toe, etc.) to ensure correctness. If the game is small enough you can (a) make sure its results make sense by playing the game out by hand and (b) test advanced algorithms like NegaScout by making sure they give the same answer as naive Negamax. It is also a good idea to keep code with game-specific behavior (evaluation functions, board representations, search and storage heuristics, etc.) away from the code that does tree searches.
In negamax, you are exploring a tree structure in which each node has children corresponding to the moves made by a player. If in some case a player can move twice, you would probably want to think of the "move" for that player as the two-move sequence that player makes. More generally, you should think of the children of the current game state as all possible states that the current player can get the game into after their turn. This includes all game states reachable by one move, plus all the game states reachable in two moves if the player is able to make two moves in one turn. You should, therefore, leave the basic logic of negamax unchanged, but update your code to generate successor states to handle the case where a single player can move twice.
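In code, that successor generation might look something like this sketch (legal_moves, make_move and moves_again are hypothetical hooks into your game, and it assumes chains of extra moves always terminate):

    def end_of_turn_states(state):
        # A "child" for negamax is any state reachable by the time the
        # current player's whole turn is over: one move, or several when
        # the condition grants extra moves.
        for move in legal_moves(state):
            nxt = make_move(state, move)
            if moves_again(nxt):                     # same player moves again
                yield from end_of_turn_states(nxt)   # expand the extra move too
            else:
                yield nxt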
Hope this helps!
When condition is met, don't decrement depth.

Minimax algorithm: Cost/evaluation function?

A school project has me writing a Date game in C++ (example at http://www.cut-the-knot.org/Curriculum/Games/Date.shtml) where the computer player must implement a Minimax algorithm with alpha-beta pruning. Thus far, I understand the goal behind the algorithm in terms of maximizing potential gains while assuming the opponent will minimize them.
However, none of the resources I read helped me understand how to design the evaluation function that the minimax bases all its decisions on. All the examples have arbitrary numbers assigned to the leaf nodes; however, I need to actually assign meaningful values to those nodes.
Intuition tells me it'd be something like +1 for a win leaf node and -1 for a loss, but how are intermediate nodes evaluated?
Any help would be most appreciated.
The most basic minimax evaluates only leaf nodes, marking wins, losses and draws, and backs those values up the tree to determine the intermediate node values.  In the case that the game tree is intractable, you need to use a cutoff depth as an additional parameter to your minimax functions.  Once the depth is reached, you need to run some kind of evaluation function for incomplete states.
Most evaluation functions in a minimax search are domain specific, so finding help for your particular game can be difficult.  Just remember that the evaluation needs to return some kind of percentage expectation of the position being a win for a specific player (typically max, though not when using a negamax implementation).  Just about any less researched game is going to closely resemble another more researched game.  This one ties in very closely with the game pickup sticks.  Using minimax and alpha beta only, I would guess the game is tractable.  
If you must create an evaluation function for non-terminal positions, here is a little help with the analysis of the sticks game; you can decide whether it's useful for the Date game or not.
Start looking for a way to force an outcome by looking at a terminal position and all the moves which can lead to that position.  In the sticks game, a terminal position is one with 3 or fewer sticks remaining on the last move.  The position that immediately precedes that terminal position therefore leaves 4 sticks to your opponent.  The goal is now to leave your opponent with 4 sticks no matter what, and that can be done from 5, 6 or 7 sticks being left to you, and you would like to force your opponent to leave you in one of those positions.  The place your opponent needs to be in order for you to be at 5, 6 or 7 is 8.  Continue this logic on and on and a pattern appears very quickly: always leave your opponent with a number divisible by 4 and you win; anything else, you lose.
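That pattern is easy to check against a brute-force search. A quick sketch, assuming the take-1-to-3-sticks, last-stick-wins variant described above:

    from functools import lru_cache

    def sticks_value(n):
        # The heuristic from the analysis: with n sticks and you to move,
        # you win exactly when n is NOT divisible by 4.
        return +1 if n % 4 != 0 else -1

    @lru_cache(maxsize=None)
    def negamax_sticks(n):
        # Brute force from the mover's point of view: +1 = win, -1 = loss.
        if n == 0:
            return -1   # the opponent took the last stick and won
        return max(-negamax_sticks(n - k) for k in (1, 2, 3) if k <= n)

    assert all(sticks_value(n) == negamax_sticks(n) for n in range(1, 30))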
This is a rather trivial game, but the method for determining the heuristic is what is important because it can be directly applied to your assignment.  Since the last to move goes first, and you can only change 1 date attribute at a time, you know to win there needs to be exactly 2 moves left... and so on.
Best of luck, let us know what you end up doing.
The simplest case of an evaluation function is +1 for a win, -1 for a loss and 0 for any non-finished position. Given your tree is deep enough, even this simple function will give you a good player. For any non-trivial games, with high branching factor, typically you need a better function, with some heuristics (e.g. for chess you could assign weights to pieces and find a sum, etc.). In the case of the Date game, I would just use the simplest evaluation function, with 0 for all the intermediate nodes.
As a side note, minimax is not the best algorithm for this particular game; but I guess you know it already.
From what I understand of the Date game you linked to, it seems that the only possible outcomes for a player are a win or a loss, with nothing in between (please correct me if I'm wrong).
In this case, it is only a matter of assigning a value of 1 to a winning position (current player gets to Dec 31) and a value of -1 to the losing positions (other player gets to Dec 31).
Your minimax algorithm (without alpha-beta pruning) would look something like this:
A_move(day):
    if day == December 31:
        return -1    # B moved to Dec 31 on the previous turn, so B has won
    else:
        outcome = -1
        for each next_day obtained by increasing the day or month of day:
            outcome = max(outcome, B_move(next_day))
        return outcome

B_move(day):
    if day == December 31:
        return +1    # A moved to Dec 31 on the previous turn, so A has won
    else:
        outcome = +1
        for each next_day obtained by increasing the day or month of day:
            outcome = min(outcome, A_move(next_day))
        return outcome

What algorithm should I use for "genetic AI improvement"

First of all: This is not a question about how to make a program play Five in a Row. Been there, done that.
Introductory explanation
I have made a five-in-a-row game as a framework to experiment with genetically improving AI (ouch, that sounds awfully pretentious). As with most turn-based games, the best move is decided by assigning a score to every possible move and then playing the move with the highest score. The function for assigning a score to a move (a square) goes something like this:
If the square already has a token, the score is 0 since it would be illegal to place a new token in the square.
Each square can be a part of up to 20 different winning rows (5 horizontal, 5 vertical, 10 diagonal). The score of the square is the sum of the score of each of these rows.
The score of a row depends on the number of friendly and enemy tokens already in the row. Examples:
A row with four friendly tokens should have infinite score, because if you place a token there you win the game.
The score for a row with four enemy tokens should be very high, since if you don't put a token there, the opponent will win on his next turn.
A row with both friendly and enemy tokens will score 0, since this row can never be part of a winning row.
Given this algorithm, I have declared a type called TBrain:
type
  TBrain = array[cFriendly..cEnemy, 0..4] of integer;
The values in the array indicate the score of a row with either N friendly tokens and 0 enemy tokens, or 0 friendly tokens and N enemy tokens. If there are 5 tokens in a row there's no score, since the row is full.
It's actually quite easy to decide which values should be in the array. vBrain[0,4] (four friendly tokens) should be "infinite"; let's call that 1,000,000. vBrain[1,4] should be very high, but not so high that the brain would prefer blocking several enemy wins over winning itself.
Consider the following (improbable) board:
0123456789
+----------
0|1...1...12
1|.1..1..1.2
2|..1.1.1..2
3|...111...2
4|1111.1111.
5|...111....
6|..1.1.1...
7|.1..1..1..
8|1...1...1.
Player 2 should place his token at (9,4), winning the game, not at (4,4), even though that would block 8 potential winning rows for player 1. Ergo, vBrain[1,4] should be (vBrain[0,4]/8)-1. Working like this we can find optimal values for the "brain", but again, this is not what I'm interested in. I want an algorithm to find the best values.
I have implemented this framework so that it's totally deterministic. There are no random values added to the scores, and if several squares share the same score, the top-left one is chosen.
Actual problem
That's it for the introduction; now to the interesting part (for me, at least).
I have two "brains", vBrain1 and vBrain2. How should I iteratively make them better? I imagine something like this:
Initialize vBrain1 and vBrain2 with random values.
Simulate a game between them.
Assign the values from the winner to the loser, then randomly change one of them slightly.
This doesn't seem to work. The brains don't get any smarter. Why?
Should the score method add some small random values to the result, so that two games between the same two brains would turn out differently? How much should the values change in each iteration? How should the "brains" be initialized? With constant values? With random values?
Also, does this have anything to do with AI or genetic algorithms at all?
PS: The question has nothing to do with Five in a Row. That's just something I chose because I can declare a very simple "Brain" to experiment on.
If you want to approach this problem as a genetic algorithm, you will need an entire population of "brains". Then evaluate them against each other, either every combination or tournament style. Then select the top X% of the population and use those as the parents of the next generation, where offspring are created via mutation (which you have) or genetic crossover (e.g., swapping rows or columns between two "brains").
Also, if you do not see any evolutionary progress, you may need more than just win/loss: come up with some kind of point system so that you can rank the entire population more effectively, which makes selection easier (a sketch of the loop follows below).
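A sketch of that generational loop (my own illustration; play_match is a hypothetical hook returning the points brain a scores against brain b, e.g. 1 / 0.5 / 0 for win / draw / loss):

    import random

    POP_SIZE, ELITE_COUNT, MUTATION_RATE = 50, 10, 0.1

    def random_brain(length=10):
        return [random.randint(0, 1_000_000) for _ in range(length)]

    def crossover(a, b):
        cut = random.randrange(1, len(a))        # one-point crossover
        return a[:cut] + b[cut:]

    def mutate(brain):
        return [w + random.randint(-1000, 1000)
                if random.random() < MUTATION_RATE else w
                for w in brain]

    def next_generation(population):
        # Round-robin scoring gives a smoother ranking than single
        # winner-takes-all matches.
        scored = []
        for i, brain in enumerate(population):
            points = sum(play_match(brain, rival)
                         for j, rival in enumerate(population) if j != i)
            scored.append((points, i, brain))    # i breaks ties deterministically
        scored.sort(reverse=True)
        elites = [brain for _, _, brain in scored[:ELITE_COUNT]]
        children = [mutate(crossover(random.choice(elites),
                                     random.choice(elites)))
                    for _ in range(POP_SIZE - ELITE_COUNT)]
        return elites + children

    population = [random_brain() for _ in range(POP_SIZE)]
    for generation in range(100):
        population = next_generation(population)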
Generally speaking, yes, you can make a brain smarter by using genetic algorithm techniques.
Randomness, or mutation, plays a significant part in genetic programming.
I like this tutorial, Genetic Algorithms: Cool Name & Damn Simple.
(It uses Python for the examples but it's not difficult to understand them)
Take a look at NeuroEvolution of Augmenting Topologies (NEAT). A fancy acronym which basically means the evolution of neural nets - both their structure (topology) and connection weights. I wrote a .NET implementation called SharpNEAT that you may wish to look at. SharpNEAT V1 also has a Tic-Tac-Toe experiment.
http://sharpneat.sourceforge.net/
