Minimax algorithm with memoization?

I am trying to implement a Connect Four AI using the minimax algorithm in JavaScript. Currently it is very slow. Other than alpha-beta pruning, which I will implement, I was wondering whether it is worth it to hash game states to:
1. their heuristic evaluations
2. the next best move
I can immediately see why 2 would be useful, since there are many ways to reach the same game state, but I am wondering whether I also have to include the current depth in the hash to make this work.
For example, if I reached this state at depth 3 (say, with only 4 more moves to look ahead) versus at depth 2 with 5 moves to look ahead, I might arrive at a different answer. Doesn't this mean I should take the depth into account in the hash?
My second question is whether hashing boards to their evaluations is worth it. It takes me O(n) time to build my hash and O(n) time to evaluate a board (though really more like 2n or 3n operations). Are game states usually hashed to their evaluations, or is this overkill? Thanks for any help.

Whenever you hash a value of a state (using heuristics), you need to have information about the depth at which this state was evaluated. This is because there is a big difference between "the value is 0.1 at depth 1" and "the value is 0.1 at depth 20". In the first case we have barely investigated the space, so we are pretty unsure what happens. In the second case we have already done a huge amount of work, so we kind of know what we are talking about.
The thing is that for some games we do not know the depth of a position just by looking at it; chess is an example. But in Connect 4, looking at a position tells you the depth.
In Connect 4, the depth of a position is simply the number of discs already played (a board with 14 discs is at depth 14, for example). So you do not need to store the depth.
As for whether you should actually hash the state or re-evaluate it: clearly a position in this game can be reached through many game paths, so you would expect the hash to be helpful. The important question is the trade-off between creating/looking up a hash entry and how expensive your evaluation function is. If the evaluation looks like it does a lot of work, hash it and benchmark.
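If you do try it, one cheap starting point is a map keyed by a board string; since the depth of a Connect 4 position is implied by its disc count, the key needs nothing else. A minimal sketch (in TypeScript; the Cell/Board types and all names are illustrative, not from the question's code):

```typescript
// Minimal evaluation cache for Connect 4, assuming the board is a
// 6x7 array of 'X' | 'O' | null. The depth is implicit in the key,
// because the number of non-empty cells is the depth.
type Cell = 'X' | 'O' | null;
type Board = Cell[][];

const evalCache = new Map<string, number>();

// O(n) key construction over the n cells of the board.
function boardKey(board: Board): string {
  return board.map(row => row.map(c => c ?? '.').join('')).join('/');
}

function evaluateCached(board: Board, evaluate: (b: Board) => number): number {
  const key = boardKey(board);
  const hit = evalCache.get(key);
  if (hit !== undefined) return hit;
  const value = evaluate(board);
  evalCache.set(key, value);
  return value;
}
```

Whether this wins depends on the cost of evaluate versus the key construction, which is exactly the trade-off to benchmark.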
One last suggestion. You mentioned alpha-beta, which at your stage is more helpful than hashing (and not that hard to implement). You can go further and implement move ordering for your alpha-beta. If I were you, I would do that, and only afterwards implement hashing.
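For Connect 4 specifically, a simple static ordering (search center columns first) already gives much better cutoffs. A hedged sketch, assuming a legalMoves helper that returns the playable column indices:

```typescript
// Center-first move ordering for Connect 4 (7 columns). Columns nearer
// the center are statistically stronger, so searching them first makes
// alpha-beta cutoffs happen sooner.
const CENTER_FIRST = [3, 2, 4, 1, 5, 0, 6];

function orderedMoves(legal: number[]): number[] {
  return CENTER_FIRST.filter(col => legal.includes(col));
}

// Usage inside alpha-beta: iterate orderedMoves(legalMoves(board))
// instead of the raw column order 0..6.
```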

Related

Minimal data structure to prevent a 2D-grid traveler from repeating itself

I'm sorry if this is a duplicate of some thread, but I'm really not sure how to describe the question.
I'm wondering what the minimal data structure is to prevent a 2D-grid traveler from repeating itself (i.e., traveling to some point it has already visited). The traveler can only move horizontally or vertically, 1 step each time. For my special case (below), the 2D grid is actually a lower-left triangle where one coordinate never exceeds the other.
For example, in the 1D case this can be done simply by recording the direction of the last move: if the direction changes, the traveler is repeating itself.
For the 2D case it becomes complicated. The most trivial way would be to keep a list of the points traveled before, but I'm wondering whether there are more efficient ways to do that.
I'm implementing a more-or-less "4-finger" algorithm for 4-sum, where the 2 fingers in the middle move in two directions (the fingers are i, j, k, and l):
i=> <=j=> <=k=> <=l
1 2 3 ... 71 72 ... 123 124 ... 201 202 203
The directions the fingers travel are decided (or suggested) by some algorithm, but might lead to an infinite loop. Therefore, I have to reject a suggestion if the 2 middle fingers start to repeat a previously visited position.
EDIT
In the meantime, I found 2 solutions. Neither is an ideal solution to this problem, but they're at least somewhat usable:
As @Sorin mentions below, one solution is to store a bit array representing the state of all cells. For the triangular grid in this example, we can even condense the array to cut the memory cost in half (though computing the bit position then takes k^2 time, where k is the degree of freedom, i.e. 2 here; a standard array needs only linear time).
Another solution is to avoid backward travel entirely: set up the algorithm such that j and k only move in one direction (this is probably greedy).
But since the 2D-grid traveler has the nice property that it moves along an axis 1 step at a time, I'm wondering whether there are more "specialized" representations for this kind of movement.
Thanks for your help!
If you are looking for optimal lookup complexity, then a hash set is the best thing. You need O(N) memory, but all lookups and insertions are O(1).
If you will often visit most of the cells, you can even skip the hashing and store a plain bit array: one bit per cell, and you just check whether the corresponding bit is 0 or 1. This is much more compact in memory (at least 32x, one bit versus one int, and likely more, since you also skip the pointers internal to the data structure, 64 bits each).
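A minimal sketch of such a bit array, specialized to the triangular grid from the question (assuming coordinates satisfy 0 <= x <= y < n; the class and all names are illustrative):

```typescript
// Visited-set for a lower-left triangular grid, one bit per cell.
// Row y holds y + 1 cells, so cell (x, y) gets the linear index
// y * (y + 1) / 2 + x, which also halves the memory versus a full grid.
class TriangularVisited {
  private bits: Uint32Array;

  constructor(n: number) {
    const cells = (n * (n + 1)) / 2;
    this.bits = new Uint32Array(Math.ceil(cells / 32));
  }

  private index(x: number, y: number): number {
    return (y * (y + 1)) / 2 + x;
  }

  visit(x: number, y: number): void {
    const i = this.index(x, y);
    this.bits[i >>> 5] |= 1 << (i & 31);
  }

  visited(x: number, y: number): boolean {
    const i = this.index(x, y);
    return (this.bits[i >>> 5] & (1 << (i & 31))) !== 0;
  }
}
```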
If this still takes too much space, you could use a Bloom filter, but that will give you some false positives (it may tell you that you've visited a cell when in fact you didn't). If that's something you can live with, the space savings are fairly huge.
Other structures like BSP trees or k-d trees could work as well. Once a region is entirely free or entirely occupied (ignoring the unused cells in the upper triangle), you can store all that information in a single node.
This is hard to recommend because of its complexity, and it will likely still use O(N) memory in many cases, but with a larger constant. Also, all checks will be O(log N).

Minimax with alpha-beta pruning and transposition table

I'm trying to implement a minimax algorithm with alpha-beta pruning AND a transposition table. This is for a Pacman agent that may cycle, so special care must be taken about this. If a state (the state of the game plus whose turn it is, Pacman's or the ghost's) is in the transposition table, and the previously seen node is an ancestor (parent, grandparent, ...) of the current node, it can be discarded. This works for minimax without alpha-beta pruning. From previous research, a transposition table (tt) combined with alpha-beta seems to be much, much harder to implement. I'm trying to keep the code as clear as possible; it is based on the pseudo-code from Artificial Intelligence: A Modern Approach. I would like to keep the final result as close as possible to this first approach.
Every pseudo-code I found was defined in a very different way:
First pseudo-code; Second pseudo-code; Third pseudo-code
Most of the differences seem cosmetic, but none of those codes has exactly the structure I'm looking for: a minimax split into a minValue and a maxValue function, with alpha-beta pruning.
Thanks in advance,
Please ask for any further explanation.
I'm still kind of new to more advanced AI optimization, but I'll share what I've learned. Two of the pseudo-code links (1 and 3) are Negamax, which is trickier than Minimax because it is less intuitive. The two differing implementations of Negamax in 1 and 3 require different evaluation functions, and that is the main reason for their differences (more below). The second link you posted is MTD(f), which I haven't implemented before, but I believe it is different from both Minimax and Negamax; MTD(f) is considered to be faster. Lastly, the only resource I have ever seen for Minimax with transposition tables is here, and I am really not sure it is even correct. Negamax is pretty much the standard, and if you can use Minimax, you can probably use Negamax instead.
While Negamax and Minimax look different, they are essentially doing the same thing. This blog post gives a pretty good description of how they're related, but doesn't explain the differences. I will try to explain why they're different below.
Why Minimax and Negamax look different but are essentially the same becomes a little more apparent after considering a few things about Minimax:
Minimax only works for 2-player games, with one player being the maximizer and the other being the minimizer. Tic-Tac-Toe is a simple example.
A typical evaluation function for Minimax returns +100 if X won in a terminal state, -100 if O won in a terminal state, and 0 for a draw.
Notice how the scores are the inverse of one another. Every point gained by player 1 is a point lost by player 2. It's a zero-sum game.
And a few points about Negamax:
Negamax also works only for 2-player zero-sum games. Every point for player 1 is a point lost for player 2.
Negamax uses a slightly different evaluation function from Minimax. It requires that the evaluation is always done from the current player's point of view. That is to say, if in a terminal state X won and it is X's turn, the evaluation should be +100. If it is a terminal state where X won but it is O's turn, the evaluation would be -100. This is different from what Minimax expects (Minimax always wants an X win to be worth +100). Pseudo-code 1 expects this type of evaluation function.
Some Negamax pseudo-codes, like the Wikipedia article in 3, try to use the same evaluation function as Minimax by negating its value using color, as in the line "return color × the heuristic value of node". This also works, but I've never done it that way (links to how I do it below). Note that the color value will only be -1 for min players. I find this way to be more confusing all around.
Now that the evaluation function is described, notice the line "value := max(value, −negamax(child, depth − 1, −β, −α, −color))" in pseudo-code 3. The returned value (some evaluation value), which is always from the current player's point of view, is inverted. That's because turns alternate: the evaluation came from a child state, the other player's turn. The alpha and beta values are also inverted.
With Minimax we come up with positive and negative evaluations directly. With Negamax we always evaluate from the mover's point of view and then invert the result as necessary, hence "Nega". This is possible because the game is zero-sum: a point for player 1 is the loss of a point for player 2.
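To make the inversion concrete, here is a minimal Negamax-with-alpha-beta sketch (a generic illustration with assumed children/evaluate helpers, not code from any of the linked pseudo-codes):

```typescript
// Minimal Negamax with alpha-beta, to illustrate the sign flips.
// `evaluate` must score the position from the point of view of the
// player to move (the Negamax convention described above).
interface Game<S> {
  children(state: S): S[];
  evaluate(state: S): number; // mover-relative score
  isTerminal(state: S): boolean;
}

function negamax<S>(g: Game<S>, state: S, depth: number,
                    alpha: number, beta: number): number {
  if (depth === 0 || g.isTerminal(state)) return g.evaluate(state);
  let value = -Infinity;
  for (const child of g.children(state)) {
    // The child's value is from the opponent's point of view, so it is
    // negated; alpha and beta swap roles and are negated for the same reason.
    value = Math.max(value, -negamax(g, child, depth - 1, -beta, -alpha));
    alpha = Math.max(alpha, value);
    if (alpha >= beta) break; // beta cutoff
  }
  return value;
}
```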
Why use Negamax? Because it's simpler. It is more challenging to implement the first time, but it makes things more concise. I also believe that transposition-table entries need to be handled differently (in a more complex way) for Minimax than for Negamax. Most importantly, everyone else uses it. I wish I had a better explanation for why.
Here is the best resource I have found for implementing Transposition Tables with Negamax (most pseudo-code isn't all that helpful):
Iterative Deepening NegaScout with alpha beta pruning and transposition tables
I've also implemented vanilla Negamax with transposition tables, but I can no longer find the resources I used. To convert the above to vanilla Negamax you simply replace lines 504 (beginning with // null window search) through 521 with "goodness = -minimax(state, depth - 1, -beta, -alpha);". The extra lines in that code block are the "scout" part, which starts with a narrow alpha-beta window and widens it as needed. Generally NegaScout is better than Negamax. I could share my full source, but I'd need some time to prepare something fit for posting to SO.
If for some reason you cannot implement Negamax, this is the only resource I have found for implementing Transposition Tables with Minimax.
Lastly, I want to throw out a couple things:
When using transposition tables you will probably want to use iterative deepening, as it provides a natural cutoff when time is your constraint.
When using transposition tables you will want to consider isomorphic boards, i.e. the same board in reflected positions. Example: evaluating the Tic-Tac-Toe board XOX|---|X-- is the same as evaluating X--|---|XOX (a vertical flip). I'm not sure whether this applies to Pacman, but it is a huge improvement when available; in Tic-Tac-Toe it leads to 70-90% of search states being shaved off with transposition tables. Reply in a comment if you want to discuss.
If you're implementing your game in JavaScript, take note that standard 64-bit Zobrist keys won't work directly, because JavaScript's bitwise operators operate on 32 bits. There are a few different ways around it, but I suggest starting with plain board strings as keys in a map (see the sketch after this list).
If you're searching for a multiplayer AI you should be looking at Hypermax / Max-N. Minimax and Negamax fail beyond 2 players.
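A minimal sketch of the string-key idea from the JavaScript point above (using a Map rather than a plain {} object; the entry layout and names are my own illustration):

```typescript
// String-keyed transposition table, sidestepping 64-bit Zobrist keys.
// An entry records the score and the depth it was searched to, so a
// shallow entry is never trusted for a deeper query.
interface TTEntry {
  score: number;
  depth: number;
}

const table = new Map<string, TTEntry>();

function probe(key: string, depth: number): number | undefined {
  const entry = table.get(key);
  // Only reuse entries searched at least as deeply as we need now.
  return entry !== undefined && entry.depth >= depth ? entry.score : undefined;
}

function store(key: string, depth: number, score: number): void {
  const entry = table.get(key);
  if (entry === undefined || entry.depth < depth) {
    table.set(key, { score, depth });
  }
}
```

A production table for alpha-beta would also store a bound flag (exact, lower, or upper) so that values produced by cutoffs are reused correctly; the NegaScout article linked above covers this.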

Why are chess, checkers, Go, etc. in EXP but conjectured to be in NP?

If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding.
First of all: the number of positions you can set up with 32 pieces on an 8x8 board is limited. We need to consider that any pawn can be promoted to any other piece, and include any such available position, too. Of course, among all these there are some positions that cannot be reached following the rules of chess, but this does not matter. The important thing is: we have a limit. Let's name this limit simply MaxPositions.
Now for any given position, let's build up a tree as follows:
The given position is the root.
Add any position (legal chess position or not) as a child.
For any of these children, add any position as a child again.
Continue this way until your tree reaches a depth of MaxPositions.
I'm now too tired to think about whether we need one additional level of depth for the idea (proof?), but heck, let's just add it. The important thing is: the tree constructed like this is limited.
Next step: from this tree, remove any subtree that is not reachable from the root via legal chess moves. Repeat this step for the remaining children, grandchildren, ..., until there is no unreachable position left in the whole tree. The number of steps must be limited, as the tree is limited.
Now do a breadth-first search and make any node a leaf if its position has been found previously. It must be marked as such (!; a draw candidate?). The same goes for any mate position.
How do we find out if there is a forced mate? In any subtree, if it is your turn, there must be at least one child leading to a forced mate. If it is the opponent's move, there must be, for every child, a grandchild that leads to a mate. This applies recursively, of course. However, as the tree is limited, this whole algorithm is limited.
[censored], this whole algorithm is limited! There is some constant limiting the whole thing. So although the limit is incredibly high (and far beyond what up-to-date hardware can handle), it is a limit (please do not ask me to calculate it...). So: our problem actually is O(1)!!!
The same goes for checkers, Go, ...
So far this covers the forced mate. What about the best move? First, check whether we can find a forced mate. If so, fine, we found the best move. If there are several, select the one needing the fewest moves (there might still be more than one...).
If there is no such forced mate, then we need to measure by some means which move is 'best'. Possibly count the number of available continuations leading to mate. Other propositions for the measurement? As long as we operate on this tree from top to bottom, we still remain limited. So again, we are O(1).
Now what did we miss? Have a look at the link in your comment again: they are talking about NxN checkers! The author is varying the size of the board!
So have a look back at how we constructed the tree. I think it is obvious that the tree grows exponentially with the size of the board (try to prove it yourself...).
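A rough back-of-the-envelope bound (my own, not from the answer) makes the exponential growth explicit. With p piece types on an NxN board:

```latex
% Each of the N^2 squares is empty or holds one of p piece types,
% so the number of distinct board configurations satisfies
\[
  \mathrm{MaxPositions}(N) \;\le\; (p+1)^{N^2},
\]
% which grows exponentially in N; the tree built above has depth
% MaxPositions(N), so its size is at least exponential in N as well.
```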
I know very well that this answer is not a proof that the problem is EXPTIME. Actually, I admit, it is not really an answer at all. But I think what I illustrated still gives quite a good image/impression of the complexity of the problem. And as long as no one provides a better answer, I dare to claim that this is better than nothing at all...
Addendum, considering your comment:
Allow me to refer to Wikipedia. Actually, it should be sufficient to transform the other problem in exponential time, not polynomial time as in the link, as applying the transformation and then solving the resulting problem still remains exponential. But I'm not sure about the exact definition...
It is sufficient to show this for a problem which you already know to be EXP-complete (transforming any other problem to this one and then to the chess problem remains exponential, if both transformations are exponential).
Apparently, J. M. Robson found a way to do this for NxN checkers. It must be possible for generalized chess, too, probably by simply modifying Robson's algorithm. I do not think it is possible for classical 8x8 chess, though...
O(1) applies to classical chess only, not to generalized chess. But it is the latter for which we assume it is not in NP! Actually, in my answer up to this addendum, one proof is lacking: that the size of the limited tree (for fixed N) does not grow faster than exponentially with growing N (so the answer actually is incomplete!).
And to prove that generalized chess is not in NP, we would have to prove that there is no polynomial algorithm to solve the problem on a non-deterministic Turing machine. This I leave open again, and my answer remains even less complete...
"If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding."
Because in order to check that the winner (White) really wins, you would also have to evaluate all possible moves that the loser (Black) could have made in order to win instead. That makes the check exponential as well.

Algorithm for Connect 4 Evaluation of Data Set

I am working on a Connect 4 AI and saw that many people are using this data set, containing all the legal positions at 8 ply and their eventual outcome.
I am using a standard minimax with alpha-beta pruning as my search algorithm. It seems like this data set could be really useful for my AI. However, I'm trying to find the best way to implement it. I thought the best approach might be to process the list and use the board state as a hash for the eventual result (win, loss, draw).
What is the best way to design an AI to use a data set like this? Is my idea of hashing the board state and using it in a traditional search algorithm (e.g. minimax) on the right track, or is there a better way?
Update: I ended up converting the large move database to a plain text format, where 1 represented X and -1 represented O. Then I used a string of the board state and an integer representing the eventual outcome, and put it in a std::unordered_map (see Stack Overflow With Unordered Map for a problem I ran into). The performance of the map was excellent: it built quickly, and the lookups were fast. However, I never quite got the search right. Is the right way to approach the problem to search the database while the number of turns in the game is less than 8, and then switch over to a regular alpha-beta?
Your approach seems correct.
For the first 8 moves, use the alpha-beta algorithm, and use the look-up table to evaluate the value of each node at depth 8.
Once you have "exhausted" the table (exceeded 8 moves in the game), switch to the regular alpha-beta algorithm that ends at terminal states (the leaves of the game tree).
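A minimal sketch of that hand-off (all helpers here, boardKey, children, isTerminal, terminalScore, are assumptions for illustration, not a specific API):

```typescript
// Sketch: during the first 8 plies, search only down to ply 8 and use the
// stored outcome as an exact evaluation; past ply 8, run ordinary
// alpha-beta all the way to terminal states.
declare const outcomes: Map<string, number>; // boardKey -> +1 / 0 / -1
declare function boardKey(board: number[][]): string;
declare function children(board: number[][]): number[][];
declare function isTerminal(board: number[][]): boolean;
declare function terminalScore(board: number[][]): number;

function search(board: number[][], ply: number,
                alpha: number, beta: number, maximizing: boolean): number {
  if (isTerminal(board)) return terminalScore(board);
  if (ply === 8) {
    const stored = outcomes.get(boardKey(board));
    if (stored !== undefined) return stored; // exact game-theoretic value
  }
  for (const child of children(board)) {
    const score = search(child, ply + 1, alpha, beta, !maximizing);
    if (maximizing) alpha = Math.max(alpha, score);
    else beta = Math.min(beta, score);
    if (alpha >= beta) break; // cutoff
  }
  return maximizing ? alpha : beta;
}
```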
This approach is extremely helpful because:
Remember that the complexity of searching the tree is O(B^d), where B is the branching factor (the number of possible moves per state) and d is the depth needed to reach the end of the game.
By using this approach you effectively decrease both B and d for the worst-case waiting times (the moves that take longest to calculate), because:
Your maximal depth shrinks significantly to d-8 (only the last moves need a full search), effectively decreasing d!
The branching factor itself tends to shrink in this game after a few moves (many moves become impossible or lead to defeat and should not be explored); this decreases B.
For the first move, you also shrink the number of developed nodes to B^8 instead of B^d.
So, because of these, the maximal waiting time decreases significantly by using this approach.
Also note: if you find the optimization insufficient, you can always expand your look-up table (to the first 9, 10, ... moves). Of course this increases the needed space exponentially; it is a trade-off you need to examine to choose what best serves your needs (maybe even storing the entire table in the file system, if main memory is not enough, should be considered).

Writing Simulated Annealing algorithm for 0-1 knapsack in C#

I'm in the process of learning about simulated annealing algorithms and have a few questions on how I would modify an example algorithm to solve a 0-1 knapsack problem.
I found this great code on CodeProject:
http://www.codeproject.com/KB/recipes/simulatedAnnealingTSP.aspx
I'm pretty sure I understand how it all works now (except for the Boltzmann condition, which as far as I'm concerned is black magic, though I understand it is about escaping local optima, and apparently it does exactly that). I'd like to redesign this to solve a 0-1 knapsack-"ish" problem. Basically I'm putting one of 5,000 objects into 10 sacks and need to optimize for the least unused space. The actual "score" I assign to a solution is a bit more complex, but not related to the algorithm.
This seems easy enough. It means the Anneal() function would stay basically the same, and I'd have to implement the GetNextArrangement() function to fit my needs. In the TSP problem, he just swaps two random nodes along the path (i.e., he makes a very small change each iteration).
For my problem, on the first iteration I'd pick 10 random objects and look at the leftover space. For the next iteration, would I just pick 10 new random objects? Or am I better off swapping out only a few of the objects, like half of them, or even just one? Or maybe the number of objects I swap out should be relative to the temperature? All of these seem doable to me; I'm just wondering if someone has advice on the best approach (though I can mess around with improvements once I have the code working).
Thanks!
Mike
With simulated annealing, you want to make neighbour states as close in energy as possible. If the neighbours have significantly greater energy, the search will just never jump to them without a very high temperature, so high that it will never make progress. On the other hand, if you can come up with heuristics that exploit lower-energy states, then exploit them.
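As an aside, the "Boltzmann condition" the question calls black magic is just the standard Metropolis acceptance rule; a minimal generic sketch (in TypeScript, not from the CodeProject article):

```typescript
// Metropolis acceptance rule: always accept improvements; accept a worse
// neighbour with probability exp(-deltaE / T), which shrinks as the
// temperature T cools. This is what lets annealing escape local optima.
function accept(currentEnergy: number, neighbourEnergy: number,
                temperature: number): boolean {
  const deltaE = neighbourEnergy - currentEnergy;
  if (deltaE <= 0) return true; // better (or equal) solution: take it
  return Math.random() < Math.exp(-deltaE / temperature);
}
```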
For the TSP, keeping neighbours close in energy means swapping adjacent cities. For your problem, I'd suggest a conditional neighbour-selection algorithm as follows (a sketch of the roulette step follows this list):
If there are objects that fit in the empty space, always put the biggest one in.
If no objects fit in the empty space, pick an object to swap out, but prefer to swap objects of similar sizes.
That is, give candidate objects a probability inverse to the difference in their sizes. You might want to use something like roulette selection here, with the slice size being something like 1 / (size1 - size2)^2.
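A sketch of that roulette step (the +1 in the weight is my own tweak to avoid division by zero for exact size matches; all names are illustrative):

```typescript
// Roulette-wheel selection of an object to swap out, weighting candidates
// by 1 / ((sizeIn - sizeOut)^2 + 1) so similar-sized swaps are most likely.
function pickSwapIndex(candidateSizes: number[], incomingSize: number): number {
  const weights = candidateSizes.map(
    s => 1 / ((incomingSize - s) ** 2 + 1)
  );
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r <= 0) return i;
  }
  return weights.length - 1; // numerical fallback
}
```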
Ah, I think I found my answer on Wikipedia. It suggests moving to a "neighbour" state, which usually implies changing as little as possible (like swapping two cities in a TSP problem).
From: http://en.wikipedia.org/wiki/Simulated_annealing
"The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour."
So I believe my GetNextArrangement() function would want to swap a random item in the set with a random unused item.
