I am working on a connect 4 AI, and saw many people were using this data set, containing all the legal positions at 8 ply, and their eventual outcome.
I am using a standard minimax with alpha-beta pruning as my search algorithm. It seems like this data set could be really useful for my AI. However, I'm trying to find the best way to use it. I thought the best approach might be to process the list and use the board state as a hash key for the eventual result (win, loss, draw).
What is the best way to design an AI to use a data set like this? Is my idea of hashing the board state and using it in a traditional search algorithm (e.g. minimax) on the right track, or is there a better way?
Update: I ended up converting the large move database to a plain text format, where 1 represented X and -1 represented O. Then I used a string of the board state as the key and an integer representing the eventual outcome as the value, and put them in a std::unordered_map (see "Stack Overflow With Unordered Map" for a problem I ran into). The performance of the map was excellent. It built quickly, and the lookups were fast. However, I never quite got the search right. Is the right way to approach the problem to just search the database when the number of turns in the game is less than 8, then switch over to a regular alpha-beta?
Your approach seems correct.
For the first 8 moves, use the alpha-beta algorithm, and use the look-up table to evaluate each node at depth 8.
Once you have "exhausted" the table (more than 8 moves have been played in the game), you should switch to the regular alpha-beta algorithm, which ends at terminal states (leaves in the game tree).
This is extremely helpful because:
Remember that the complexity of searching the tree is O(B^d) - where B is the branch factor (number of possible moves per state) and d is the needed depth until the end.
By using this approach you effectively decrease both B and d for the maximal waiting times (longest moves needed to be calculated) because:
Your maximal depth shrinks significantly to d-8 (only for the last moves), effectively decreasing d!
The branch factor itself tends to shrink in this game after a few moves (many moves become impossible or lead to defeat and should not be explored), which decreases B.
In the first move, you shrink the number of developed nodes as well, to B^8 instead of B^d.
So, because of these - the maximal waiting time decreases significantly by using this approach.
Also note: if you find the optimization insufficient, you can always expand your look-up table (to the first 9, 10, ... moves). Of course, this increases the required space exponentially. It is a trade-off you need to examine to choose what best serves your needs (you might even consider storing the entire game on the file system if main memory is not enough).
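A minimal sketch of the scheme described in this answer, written in Python for brevity rather than the asker's C++. `TakeAway` is a hypothetical toy game standing in for Connect 4 so the example runs instantly, and `game`, `table`, and `table_ply` are illustrative names. The table is assumed to store values from the point of view of the side to move (+1 win, 0 draw, -1 loss).

```python
from math import inf


class TakeAway:
    """Toy stand-in for Connect 4: remove 1 or 2 stones; taking the last stone wins."""

    def terminal_value(self, stones):
        # With no stones left, the side to move has already lost.
        return -1 if stones == 0 else None

    def key(self, stones):
        return stones  # for Connect 4 this would be an encoding of the board

    def moves(self, stones):
        return [m for m in (1, 2) if m <= stones]

    def play(self, stones, move):
        return stones - move


def alphabeta(state, ply, alpha, beta, game, table=None, table_ply=8):
    """Negamax alpha-beta returning the value for the side to move."""
    result = game.terminal_value(state)
    if result is not None:
        return result
    # At the table's ply, use the stored outcome instead of searching deeper.
    if table is not None and ply == table_ply:
        known = table.get(game.key(state))
        if known is not None:
            return known
    best = -inf
    for move in game.moves(state):
        score = -alphabeta(game.play(state, move), ply + 1,
                           -beta, -alpha, game, table, table_ply)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # prune: the opponent will avoid this line
    return best


game = TakeAway()
# Known outcomes for the toy game: multiples of 3 are lost for the side to move.
table = {stones: (-1 if stones % 3 == 0 else 1) for stones in range(11)}
full_search = alphabeta(10, 0, -inf, inf, game)
with_table = alphabeta(10, 0, -inf, inf, game, table, table_ply=3)
```

Both calls should agree on the value of the starting position; the second one simply stops at ply 3 and reads the answer from the table. Once the game itself has passed the table's ply, the same function is called without a table and searches to the terminal states.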
Related
I'm building a database of chess evaluations (essentially a map from a chess position to an evaluation), and I want to use this to come up with a good move for given positions. The idea is to do a kind of "static" minimax, i.e.: for each position, use the stored evaluation if evaluations for child nodes (positions after next ply) are not available, otherwise use max (white to move)/min (black to move) evaluations of child nodes (which are determined in the same way).
The problem is, of course, loops in the graph, i.e. repeating positions. I can't fathom how to deal with this without making this infinitely less efficient.
The ideas I have explored so far are:
assume an evaluation of 0 for any position that can be reached in a game with fewer moves than are currently evaluated. This is an invalid assumption, because, for example, if White plays A, it might not be desirable for Black to follow up with x, but if White plays B, then y -> A -> x -> -B -> -y might be the best line, resulting in the same position as A -> x, without any repetitions (-m denoting the inverse move to m here; lower case: Black moves, upper case: White moves).
having one instance for each possible way a position can be reached solves the loop problem, but this yields a bazillion instances in some positions and is therefore not practical
the fact that there is a loop from a position back to that position doesn't mean that it's a draw by repetition, because playing the repeating line may not be the best choice
I've tried iterating through the loops a few times to see if the overall evaluation would become stable. It doesn't, because in some cases, assuming the repeat is the best line means it isn't any longer, and then it goes back to the draw being the best line, etc.
I know that chess engines use transposition tables to detect positions already reached before, but I believe this doesn't address my problem, and I actually wonder if there isn't an issue with them: a position may be reachable through two paths in the search tree - one of them going through the same position before, so it's a repeat, and the other path not doing that. Then the evaluation for path 1 would have to be 0, but the one for path 2 wouldn't necessarily be (path 1 may not be the best line), so whichever evaluation the transposition table holds may be wrong, right?
I feel sure this problem must have a "standard / best practice" solution, but google failed me. Any pointers / ideas would be very welcome!
I don't understand what the problem is. A minimax evaluation, unless we've added randomness to it, will have the exact same result for any given board position combined with whose turn it is and other key info. If we have the space available to store common board_position+whose_turn+castling+en passant+draw_related tuples (or a hash thereof), go right ahead. When reaching that tuple in any other evaluation, just return the stored value or rely on its more detailed record for more complex evaluations (if the search yielding that record was not exhaustive, we can have different interpretations for it in any one evaluation). If the program also plays chess with time limits on the game, an additional time dimension (maybe a few broad blocks) would probably be needed in the memoisation as well.
(I assume you've read common public info about transposition tables.)
I'm sorry if this is a duplicate of some thread, but I'm really not sure how to describe the question.
I'm wondering what the minimal data structure is to prevent a 2D-grid traveler from repeating itself (i.e. traveling to a point it has already visited). The traveler can only move horizontally or vertically, 1 step at a time. For my special case (below), the 2D grid is actually a lower-left triangle where one coordinate never exceeds the other.
For example, in the 1D case, this can be done simply by recording the direction of the last move. If the direction changes, it's repeating itself.
For the 2D case it becomes complicated. The most trivial way would be to keep a list of the points visited so far, but I'm wondering whether there are more efficient ways to do that.
I'm implementing a more-or-less "4-finger" algorithm for 4-sum where the 2 fingers in the middle move in two directions (the fingers being i, j, k, and l):
i=> <=j=> <=k=> <=l
1 2 3 ... 71 72 ... 123 124 ... 201 202 203
The directions the fingers travel are decided (or suggested) by some algorithm, but this might lead to an infinite loop. Therefore, I have to refuse a suggestion if the 2 fingers in the middle start to repeat a previous position.
EDIT
Over the past few days, I found 2 solutions. Neither is an ideal solution to this problem, but they're at least somewhat usable:
As @Sorin mentioned below, one solution would be saving a bit array representing the state of all cells. For the triangular-grid example here, we can even condense the array to cut the memory cost by half (though requiring k^2 time to compute the bit position, where k is the number of degrees of freedom, i.e. 2 here; a standard array would use only linear time).
Another solution would be to avoid backward travel directly: set up the algorithm such that j and k only move in one direction (this is probably greedy).
But still, since the 2D-grid traveler has the nice property that it moves along an axis 1 step at a time, I'm wondering whether there is a more "specialized" representation for this kind of movement.
Thanks for your help!
If you are looking for optimal lookup complexity, then a hashset is the best thing. You need O(N) memory but all lookups & insertions will be O(1).
If you often visit most of the cells, then you can even skip the hash part and store a bit array. That is, store one bit for every cell and just check whether the corresponding bit is 0 or 1. This is much more compact in memory (at least 32x, one bit vs. one int, but likely more, as you also skip storing some pointers internal to the data structure, typically 64 bits each).
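As a rough illustration of the bit-array idea for the triangular grid from the question, here is a small Python sketch. The class name `TriangleVisited` and the convention that cells are `(row, col)` with `0 <= col <= row < n` are assumptions made for the example, not something taken from the question.

```python
class TriangleVisited:
    """Visited set for cells (row, col) with 0 <= col <= row < n, one bit per cell."""

    def __init__(self, n):
        self.n = n
        cells = n * (n + 1) // 2           # only the lower-left triangle is stored
        self.bits = bytearray((cells + 7) // 8)

    def _index(self, row, col):
        if not (0 <= col <= row < self.n):
            raise IndexError("cell is outside the lower-left triangle")
        # Rows 0..row-1 hold 1 + 2 + ... + row cells in total.
        return row * (row + 1) // 2 + col

    def visit(self, row, col):
        """Mark the cell as visited; return True if it was already visited."""
        i = self._index(row, col)
        byte, mask = i >> 3, 1 << (i & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


visited = TriangleVisited(4)
first = visited.visit(2, 1)    # not seen before
second = visited.visit(2, 1)   # now a repeat
```

For the 2D triangle, the packed index is a single arithmetic expression, so the check stays cheap; the traveler would call `visit` before each step and reject the step when it returns True.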
If this still takes too much space, you could use a Bloom filter, but that will give you some false positives (it tells you that you've visited a cell when in fact you haven't). If that's something you can live with, the space savings are fairly large.
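For reference, a very small Bloom filter can be sketched as below. This is only an illustration: the class name, the sizes used, and the choice of salted `hashlib.blake2b` digests as the hash functions are all assumptions, and a real filter would be sized from the expected number of cells and the acceptable false-positive rate.

```python
import hashlib


class BloomFilter:
    """Approximate set: no false negatives, but occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, cell):
        data = repr(cell).encode()
        for seed in range(self.num_hashes):
            # A different salt per seed gives independent-looking hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, cell):
        for p in self._positions(cell):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, cell):
        # False means "definitely not visited"; True means "probably visited".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(cell))


seen = BloomFilter(num_bits=1024, num_hashes=3)
for cell in [(0, 0), (5, 2), (7, 7)]:
    seen.add(cell)
```

A cell that was added is always reported as present. A cell that was never added is usually, but not always, reported as absent, which for the traveler means occasionally rejecting a move that was actually fine.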
Other structures like BSP or Kd-trees could work as well. Once you reach a point where everything is either free or occupied (ignoring the unused cells in the upper triangle) you can store all that information in a single node.
This is hard to recommend because of its complexity and because it will likely also use O(N) memory in many cases, but with a larger constant. Also, all checks will be O(log N).
I am trying to implement a Connect Four AI using the minimax algorithm in JavaScript. Currently, it is very slow. Other than alpha-beta pruning, which I will implement, I was wondering if it is worth it to hash game states to:
1. their heuristic evaluations
2. the next best move
I can immediately see why 2 would be useful since there are many ways to get to the same game state but I am wondering if I also have to hash the current depth to make this work.
For example, if I reached this state with a depth of 3 (so only say 4 more moves to look ahead) vs a depth of 2 with 5 moves to look ahead, I might arrive at a different answer. Doesn't this mean I should take the depth into account with the hash?
My second question is whether hashing boards to their evaluation is worth it. It takes me O(n) time to build my hash, and O(n) time to evaluate a board (though it's really more like O(2 or 3n)). Are game states usually hashed to their evaluations, or is this overkill? Thanks for any help
Whenever you hash the value of a state (obtained using heuristics), you need to have information about the depth at which this state was evaluated. This is because there is a big difference between a value of 0.1 at depth 1 and a value of 0.1 at depth 20. In the first case we have barely investigated the space, so we are pretty unsure what happens. In the second case we have already done a huge amount of work, so we more or less know what we are talking about.
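To make the point about depth concrete, here is a minimal Python sketch of a table that stores the search depth next to each value and only reuses an entry that was searched at least as deeply as currently needed. The names `TranspositionTable`, `store`, and `probe` are illustrative, and the sketch assumes exact minimax values; with alpha-beta, an entry would also need to record whether the value is exact or only a bound.

```python
class TranspositionTable:
    """Maps a position key to (value, depth the value was searched to)."""

    def __init__(self):
        self._entries = {}

    def store(self, key, value, depth):
        old = self._entries.get(key)
        # Keep the entry backed by the deeper search.
        if old is None or depth >= old[1]:
            self._entries[key] = (value, depth)

    def probe(self, key, depth):
        """Return the stored value if it was searched to at least `depth`, else None."""
        entry = self._entries.get(key)
        if entry is not None and entry[1] >= depth:
            return entry[0]
        return None


table = TranspositionTable()
table.store("some-board", 0.1, depth=1)
shallow_ok = table.probe("some-board", depth=1)    # usable for a 1-ply lookahead
too_shallow = table.probe("some-board", depth=5)   # not trusted for a 5-ply lookahead
```

The search would call `probe` before expanding a node and `store` after evaluating it, so a shallow result never masquerades as a deep one.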
The thing is that for some games we cannot tell the depth just by looking at a position. Chess is an example. But in Connect 4, looking at a position, you know what the depth is.
For the Connect 4 position here, the depth is 14 (only 14 discs have been placed). So you do not need to store the depth.
As for whether you should hash the state or re-evaluate it: clearly a position in this game can be reached through many game paths, so you would expect the hash to be helpful. The important question is the trade-off between the cost of creating and looking up a hash and the cost of your evaluation function. If the evaluation looks like it does a lot of work, hash it and benchmark.
One last suggestion. You mentioned alpha-beta which is more helpful than hashing at your stage (and not that hard to implement). You can go further and implement move ordering for your alpha-beta. If I were you, I would do it, and only after that I would implement hashing.
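One simple form of move ordering for Connect 4 is to try the center columns first, since they take part in more potential lines. This is a common heuristic rather than something prescribed above, and the function name `ordered_columns` is made up for this sketch (shown in Python; the same idea is a one-liner in JavaScript).

```python
def ordered_columns(width=7, playable=None):
    """Return column indices sorted from the center outwards.

    `playable` is an optional collection of columns that are not yet full.
    """
    center = (width - 1) / 2
    columns = range(width) if playable is None else playable
    # sorted() is stable, so equally distant columns keep left-to-right order.
    return sorted(columns, key=lambda col: abs(col - center))


order = ordered_columns()               # [3, 2, 4, 1, 5, 0, 6]
partial = ordered_columns(7, [0, 1, 6])  # only these columns still have room
```

Alpha-beta would then iterate over `order` instead of `0..6`; the earlier a strong move is tried, the more of the remaining moves get pruned.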
Every once in a while I must deal with a list of elements that the user can sort manually.
In most cases I try to rely on a model using an order-sensitive container. However, this is not always possible, and then I resort to adding a position field to my data. This position field is a double, so I can always calculate a position between two numbers. However, this is not ideal, because I am concerned about reaching an edge case where I do not have enough numerical precision to continue inserting between two numbers.
I am having doubts about the best approach to maintaining my position numbers. My first thought is traversing all the rows and giving them a round number after every insertion, like:
Right after dropping a row between 2 and 3:
1 2 2.5 3 4 5
After position numbers update:
1 2 3 4 5 6
That, of course, might get heavy if I have a large number of entries, not so much in memory as in storing all the new values back to the disk/database. I usually work with some type of ORM and mobile software. Updating all the position numbers will pull every object from disk and mark it as dirty, leading to a re-verification of all the related validation rules of my data model.
I could also wait until the precision is not enough to calculate a number between two positions. However the user experience would be bad, since the same operation will no longer require the same amount of time.
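To get a feel for how soon the precision edge case can occur, here is a small Python sketch (Python floats are IEEE 754 doubles). The function names are made up for the illustration. It computes a midpoint, reports when no representable number is left between two positions, and counts how many times a row can be dropped at the same spot before that happens.

```python
def position_between(low, high):
    """Return a position strictly between low and high, or None if none is representable."""
    mid = low + (high - low) / 2
    return mid if low < mid < high else None


def inserts_until_exhausted(low, high):
    """Count consecutive insertions directly after `low` before precision runs out."""
    count = 0
    mid = position_between(low, high)
    while mid is not None:
        count += 1
        high = mid                      # the next row is dropped in front of the last one
        mid = position_between(low, high)
    return count


def renumber(count):
    """Fallback: give all rows fresh round positions 1.0, 2.0, ..."""
    return [float(i) for i in range(1, count + 1)]


worst_case = inserts_until_exhausted(1.0, 2.0)
```

Between positions 1.0 and 2.0, repeatedly inserting at the same spot exhausts a double after a few dozen insertions, because each insertion halves the remaining gap. A `None` result from `position_between` is the signal to renumber, either the whole list or just a neighbourhood around the crowded spot.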
I believe that there is a standard algorithm for these cases that regularly and consistently keeps the position numbers updated, or just some of them. Ideally it should be O(log n), with no big time differences between the worst and best cases.
To be honest, I also think that anything that must be sorted by the user cannot grow so large as to become a real problem in its worst case. The edge case also seems to be extremely rare, even more so if I look for a solution that pushes the border numbers outward. However, I still believe that there is a standard, well-known solution for this problem which I am not aware of, and I would like to learn about it.
Second try.
Consider the full range of position values, say 0 -> 1000
The first item we insert should have a position of 500. Our list is now :
(0) -> 500 -> (1000).
If you insert another item at first position, we end up with :
(0) -> 250 -> 500 -> (1000).
If we keep inserting items at the first position, we are going to have a problem, as our ranges are not equally balanced and... Wait... balanced? Doesn't it sound like a binary tree problem!?
Basically, you store your list as a binary tree. When inserting a node, you assign it a position according to the surrounding nodes. When your tree becomes unbalanced, you rotate nodes to make it balanced again and you recompute the positions of the rotated nodes!
So :
Most of the time, adding a node will not require to change position of other nodes.
When balancing is required, only a subset of your items will be changed.
It's O(log n) !
EDIT
If the user is actually sorting the list manually, then is there really any need to worry about taking O(n) to record the new order? It's O(n) in any case just to display the list to the user.
This doesn't really answer the question, but...
As you talked about "adding a position field to your data", I suppose that your data store is a relational database and that your data has some kind of identifier.
So maybe you can implement a doubly linked list by adding a previous_data_id and next_data_id to your data. Insert/move/remove operations thus are O(1).
Loading such a collection from a database is rather easy:
Fetch each item and add them to a map with their id as key.
For each item connect it with its previous and next item.
Starting with the first item (previous_data_id is undefined) follow the chain and add them to a list.
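The loading steps above can be sketched in Python as follows. The field names `previous_data_id` and `next_data_id` come from this answer; the row format (a dict with an `id` key and `None` for a missing neighbour) and the function name `load_ordered` are assumptions for the example.

```python
def load_ordered(rows):
    """Rebuild the user-defined order from rows stored as a doubly linked list."""
    # Step 1: index every row by its id.
    by_id = {row["id"]: row for row in rows}

    # Step 3 starts at the row that has no predecessor.
    heads = [row for row in by_id.values() if row["previous_data_id"] is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one first item")

    # Steps 2 and 3: follow the next_data_id links through the map.
    ordered = []
    current = heads[0]
    while current is not None:
        if len(ordered) == len(by_id):
            raise ValueError("cycle detected in next_data_id links")
        ordered.append(current)
        next_id = current["next_data_id"]
        current = by_id[next_id] if next_id is not None else None
    return ordered


rows = [
    {"id": 3, "previous_data_id": 2, "next_data_id": None},
    {"id": 1, "previous_data_id": None, "next_data_id": 2},
    {"id": 2, "previous_data_id": 1, "next_data_id": 3},
]
ordered_ids = [row["id"] for row in load_ordered(rows)]
```

The rows can come back from the database in any order; only the links determine the final sequence.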
After some days with no valid answer, this is my theory:
The real challenge here is a practical solution. Maybe there is a mathematically correct solution, but with every day that goes by, it seems more likely that the implementation would be of great complexity. A good solution should not only be mathematically correct, but also balanced against the nature of the problem, the low chances of meeting it, and its minor implications. It is like killing flies with bullets: extremely effective, but pointless.
I am starting to believe that a good answer could be: to hell with the right solution, leave it as a one-line calculation and live with the rare case where sorting two elements might fail. It is not worth increasing complexity and investing time or money in such a nit-picky problem, one so rare and one that causes no data damage, just a temporary UX glitch.
I am coding a board game. I have generated a game tree using alpha-beta pruning and have 2 options:
Use iterative deepening to optimize the alpha-beta so that it keeps generating one more ply until time is over.
By trial and error, I know the maximum depth reachable for every board configuration in the time limit without previously inspecting lower plies.
Which approach is better and will make the search reach a deeper depth? I know, for example, that at the beginning I can generate a tree of depth X consuming all the time available... Can iterative deepening add more depth?
Let me know if I can be more clear...
If your branching factor and your evaluation function's running time stay about the same between different rounds, you might be able to use option 2. But it sounds like it might be very difficult.
Option 1 has the ability to dramatically increase alpha-beta's performance if you don't have perfect move ordering already. You save the best move at every node, found during the previous search up to d-1, and then tell alpha-beta to search that first. It's called a transposition table, and if it is possible to implement this in your case option 1 will dominate option 2.
The first time I heard of a transposition table I didn't think it could make much of a difference, but it increased my maximum search depth by 50%.
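A rough Python sketch of option 1 combined with the "search the previous best move first" idea. Everything here is illustrative: `TakeAway` is a toy game used only so the example runs, `best_moves` is a plain dict keyed by state, and the clock is only checked between iterations, whereas a real engine would also abort in the middle of an iteration.

```python
import time
from math import inf


class TakeAway:
    """Toy game: remove 1 or 2 stones; whoever takes the last stone wins."""

    def is_terminal(self, stones):
        return stones == 0

    def evaluate(self, stones):
        # Side to move has lost if nothing is left; otherwise the outcome is unknown.
        return -1 if stones == 0 else 0

    def moves(self, stones):
        return [m for m in (1, 2) if m <= stones]

    def play(self, stones, move):
        return stones - move


def search(state, depth, alpha, beta, game, best_moves):
    """Depth-limited negamax alpha-beta; the value is for the side to move."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    moves = list(game.moves(state))
    hint = best_moves.get(state)
    if hint in moves:                  # try the previous iteration's best move first
        moves.remove(hint)
        moves.insert(0, hint)
    best_value, best_move = -inf, moves[0]
    for move in moves:
        value = -search(game.play(state, move), depth - 1, -beta, -alpha, game, best_moves)
        if value > best_value:
            best_value, best_move = value, move
        alpha = max(alpha, value)
        if alpha >= beta:
            break
    best_moves[state] = best_move      # remembered for the next, deeper iteration
    return best_value


def iterative_deepening(state, game, time_limit, max_depth):
    deadline = time.monotonic() + time_limit
    best_moves, depth_reached = {}, 0
    for depth in range(1, max_depth + 1):
        search(state, depth, -inf, inf, game, best_moves)
        depth_reached = depth
        if time.monotonic() >= deadline:
            break
    return best_moves.get(state), depth_reached


move, depth_reached = iterative_deepening(4, TakeAway(), time_limit=1.0, max_depth=6)
```

From 4 stones the search settles on taking 1 stone, which leaves the opponent with a losing pile of 3. The shallow iterations cost little compared with the last one, and the move hints they leave behind are what let the deepest iteration prune more.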