How to account for position's history in transposition tables

I'm currently developing a solver for a trick-based card game called Skat, for the perfect-information case. Although most people may not know the game, please bear with me; my problem is of a general nature.
Short introduction to Skat:
Basically, the players take turns playing one card each, and every three cards form a trick. Every card has a specific value. A player's score is the sum of the values of all cards contained in the tricks that this player has won. I left out certain things that are unimportant for my problem, e.g. who plays against whom or who wins a trick.
What we should keep in mind is that there is a running score, and when investigating a certain position, who played what before (i.e. its history) is relevant to that score.
I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table. I read that when searching the tree of a Skat game, you will encounter a lot of positions that have already been investigated.
And that's where my problem comes into play: if I find a position that has already been investigated before, the moves leading to this position have been different. Consequently, the score (and alpha or beta) will in general be different, too.
This leads to my question: How can I determine the value of a position, if I know the value of the same position, but with a different history?
In other words: How can I decouple a subtree from its path to the root, so that it can be applied to a new path?
My first impulse was that it's simply not possible, because alpha or beta could have been influenced by other paths, which might not be applicable to the current position, but...
There already seems to be a solution
...that I don't seem to understand. In Sebastian Kupferschmid's master's thesis about a Skat solver, I found this piece of code (Python-like pseudocode):
def ab_tt(p, alpha, beta):
    if p isa Leaf:
        return 0
    if hash.lookup(p, val, flag):
        if flag == VALID:
            return val
        elif flag == LBOUND:
            alpha = max(alpha, val)
        elif flag == UBOUND:
            beta = min(beta, val)
        if alpha >= beta:
            return val
    if p isa MAX_Node:
        res = alpha
    else:
        res = beta
    for q in succ(p):
        if p isa MAX_Node:
            succVal = t(q) + ab_tt(q, res - t(q), beta - t(q))
            res = max(res, succVal)
            if res >= beta:
                hash.add(p, res, LBOUND)
                return res
        elif p isa MIN_Node:
            succVal = t(q) + ab_tt(q, alpha - t(q), res - t(q))
            res = min(res, succVal)
            if res <= alpha:
                hash.add(p, res, UBOUND)
                return res
    hash.add(p, res, VALID)
    return res
It should be pretty self-explanatory. succ(p) is a function that returns every possible move at the current position. t(q) is what I believe to be the running score of the respective position (the points achieved so far by the declarer).
Since I don't like copying stuff without understanding it, this should just be an aid for anyone who would like to help me out. Of course, I have given this code some thought, but I can't wrap my head around one thing: By subtracting the current score from alpha/beta before calling the function again [e.g. ab_tt(q, res - t(q), beta - t(q))], there seems to be some kind of decoupling going on. But what exactly is the benefit if we store the position's value in the transposition table without doing the same subtraction right here, too? If we found a previously investigated position, how come we can just return its value (in case it's VALID) or use the bound value for alpha or beta? The way I see it, both storing and retrieving values from the transposition table won't account for the specific histories of these positions. Or will it?
Literature:
There are almost no English sources out there that deal with AI in Skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid and Helmert. Unfortunately, the whole paper, and especially the elaboration on transposition tables, is rather compact.
Edit:
So that everyone can better picture how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The current score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha-beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in this trick to their score.
The card values are:
Rank   J    A    10   K    Q    9    8    7
Value  2    11   10   4    3    0    0    0
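To make the scoring concrete, here is a small Python sketch of how a trick's value and the declarer's running score could be computed from the card values above. It assumes cards are identified by rank only (suits don't affect points) and that the caller already knows who won each trick. `CARD_POINTS`, `trick_points` and `declarer_scores` are hypothetical names.

```python
# Card point values from the table above; suits don't matter for scoring.
CARD_POINTS = {"J": 2, "A": 11, "10": 10, "K": 4, "Q": 3, "9": 0, "8": 0, "7": 0}


def trick_points(trick):
    """Sum of the card values in one trick, given as a list of ranks."""
    return sum(CARD_POINTS[rank] for rank in trick)


def declarer_scores(tricks, declarer_won):
    """Running declarer score after each trick.

    declarer_won[i] says whether the declarer took trick i; tricks won by
    the defenders add nothing to the declarer's score.
    """
    score, history = 0, []
    for trick, won in zip(tricks, declarer_won):
        if won:
            score += trick_points(trick)
        history.append(score)
    return history
```

Suppose the declarer wins the first and third of the tricks [A, 10, 7], [J, Q, 9] and [K, K, 8]. The running score then goes 21, 21, 29.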

I solved the problem. Instead of doing the subtractions on each recursive call, as suggested by the reference in my question, I subtract the running score from the resulting alpha-beta value, but only when storing a position in the transposition table:
For exact values (the position hasn't been pruned):
transpo.put(hash, new int[] { TT_VALID, bestVal - node.getScore()});
If the node caused a beta-cutoff:
transpo.put(hash, new int[] { TT_LBOUND, bestVal - node.getScore()});
If the node caused an alpha-cutoff:
transpo.put(hash, new int[] { TT_UBOUND, bestVal - node.getScore()});
Where:
transpo is a HashMap<Long, int[]>
hash is the long value representing that position
bestVal is either the exact value or the value that caused a cutoff
TT_VALID, TT_LBOUND and TT_UBOUND are simple constants, describing the type of transposition table entry
However, this alone didn't work. After I posted the same question on gamedev.net, a user named Álvaro gave me the decisive hint:
When storing exact scores (TT_VALID), I should only store positions that improved alpha.
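Here is a minimal Python sketch of this idea on a toy game tree. It is not the actual solver. The assumptions are:

- each edge carries the declarer points gained by that move;
- `Node.key` stands in for the position hash and is equal for transposed positions;
- leaves are positions in which all cards have been played.

Entries are stored as `best - score`, which is only the points still to come. On lookup, the current path's running score is added back.

Following the spirit of Álvaro's hint, an entry is marked EXACT only when the result lies strictly inside the original (alpha, beta) window. That means it improved alpha without causing a cutoff. A bound entry is used only when it produces a cutoff by itself, so the table never narrows the search window. `Node`, `alphabeta` and the flag names are hypothetical and correspond loosely to TT_VALID, TT_LBOUND and TT_UBOUND.

```python
import math
from dataclasses import dataclass, field

EXACT, LOWER, UPPER = "EXACT", "LOWER", "UPPER"


@dataclass
class Node:
    key: str      # hypothetical position hash: equal for transposed positions
    is_max: bool  # True when the declarer's side is to move
    # (declarer points gained by the move, resulting position) pairs
    children: list = field(default_factory=list)


def alphabeta(node, score, alpha, beta, table):
    """Return the declarer's final score from `node`, given the running score
    `score` on the current path. Table values are stored relative to `score`."""
    entry = table.get(node.key)
    if entry is not None:
        flag, rel = entry
        val = rel + score  # re-attach this path's running score
        if flag == EXACT:
            return val
        if flag == LOWER and val >= beta:
            return val
        if flag == UPPER and val <= alpha:
            return val

    if not node.children:  # all cards played: nothing left to win
        return score

    alpha_orig, beta_orig = alpha, beta
    if node.is_max:
        best = -math.inf
        for points, child in node.children:
            best = max(best, alphabeta(child, score + points, alpha, beta, table))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff
    else:
        best = math.inf
        for points, child in node.children:
            best = min(best, alphabeta(child, score + points, alpha, beta, table))
            beta = min(beta, best)
            if alpha >= beta:
                break  # alpha cutoff

    if best <= alpha_orig:
        flag = UPPER  # failed low: the true value is at most best
    elif best >= beta_orig:
        flag = LOWER  # failed high: the true value is at least best
    else:
        flag = EXACT  # improved alpha without reaching beta
    table[node.key] = (flag, best - score)  # store only the points still to come
    return best
```

As an example, take a MAX root whose two moves gain 4 and 11 points and both lead to the same MIN position X. The moves at X gain 3 or 10 points.

- The first visit to X (running score 4) stores (EXACT, 3).
- The second visit (running score 11) reuses that entry as 14, so the root is correctly valued 14.

Storing the absolute value 7 instead would make the second visit return 7, and the root would wrongly be valued 7.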

Related

Reducing the time complexity of this algorithm

I'm playing a game that has a weapon-forging component, where you combine two weapons to get a new one. The sheer number of weapon combinations (see "6.1. Blade Combination Tables" at http://www.gamefaqs.com/ps/914326-vagrant-story/faqs/8485) makes it difficult to figure out what you can ultimately create out of your current weapons through repeated forging, so I tried writing a program that would do this for me. I give it a list of weapons that I currently have, such as:
francisca
tabarzin
kris
and it gives me the list of all weapons that I can forge:
ball mace
chamkaq
dirk
francisca
large crescent
throwing knife
The problem is that I'm using a brute-force algorithm that scales extremely poorly; it takes about 15 seconds to calculate all possible weapons for seven starting weapons, and a few minutes to calculate for eight starting weapons. I'd like it to be able to calculate up to 64 weapons (the maximum that you can hold at once), but I don't think I'd live long enough to see the results.
function find_possible_weapons(source_weapons)
{
    for (i in source_weapons)
    {
        for (j in source_weapons)
        {
            if (i != j)
            {
                result_weapon = combine_weapons(source_weapons[i], source_weapons[j]);
                new_weapons = array();
                new_weapons.add(result_weapon);
                for (k in source_weapons)
                {
                    if (k != i && k != j)
                        new_weapons.add(source_weapons[k]);
                }
                find_possible_weapons(new_weapons);
            }
        }
    }
}
In English: I attempt every combination of two weapons from my list of source weapons. For each of those combinations, I create a new list of all weapons that I'd have following that combination (that is, the newly-combined weapon plus all of the source weapons except the two that I combined), and then I repeat these steps for the new list.
Is there a better way to do this?
Note that combining weapons in the reverse order can change the result (Rapier + Firangi = Short Sword, but Firangi + Rapier = Spatha), so I can't skip those reversals in the j loop.
Edit: Here's a breakdown of the test example that I gave above, to show what the algorithm is doing. A line in brackets shows the result of a combination, and the following line is the new list of weapons that's created as a result:
francisca,tabarzin,kris
[francisca + tabarzin = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[francisca + kris = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[tabarzin + francisca = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[tabarzin + kris = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
[kris + francisca = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[kris + tabarzin = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
Also, note that duplicate items in a list of weapons are significant and can't be removed. For example, if I add a second kris to my list of starting weapons so that I have the following list:
francisca
tabarzin
kris
kris
then I'm able to forge the following items:
ball mace
battle axe
battle knife
chamkaq
dirk
francisca
kris
kudi
large crescent
scramasax
throwing knife
The addition of a duplicate kris allowed me to forge four new items that I couldn't before. It also increased the total number of forge tests to 252 for a four-item list, up from 27 for the three-item list.
Edit: I'm getting the feeling that solving this would require more math and computer science knowledge than I have, so I'm going to give up on it. It seemed like a simple enough problem at first, but then, so does the Travelling Salesman. I'm accepting David Eisenstat's answer since the suggestion of remembering and skipping duplicate item lists made such a huge difference in execution time and seems like it would be applicable to a lot of similar problems.

Start by memoizing the brute force solution, i.e., sort source_weapons, make it hashable (e.g., convert to a string by joining with commas), and look it up in a map of input/output pairs. If it isn't there, do the computation as normal and add the result to the map. This often results in big wins for little effort.
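Here is a minimal Python sketch of that memoization. It assumes a `combine(a, b)` function that returns the forged weapon for an ordered pair (order matters, as noted in the question). The memo key is the sorted tuple of the inventory, which plays the same role as joining the sorted names with commas. `forgeable` and `EXAMPLE_TABLE` are hypothetical names, and the table contains only the combinations from the question's worked example.

```python
# Combination results taken from the worked example in the question;
# a real table would list every ordered pair.
EXAMPLE_TABLE = {
    ("francisca", "tabarzin"): "chamkaq", ("tabarzin", "francisca"): "chamkaq",
    ("chamkaq", "kris"): "large crescent", ("kris", "chamkaq"): "large crescent",
    ("francisca", "kris"): "dirk", ("kris", "francisca"): "dirk",
    ("dirk", "tabarzin"): "francisca", ("tabarzin", "dirk"): "francisca",
    ("tabarzin", "kris"): "throwing knife", ("kris", "tabarzin"): "throwing knife",
    ("throwing knife", "francisca"): "ball mace",
    ("francisca", "throwing knife"): "ball mace",
}


def forgeable(inventory, combine, memo=None):
    """Return the set of weapons that can be forged, directly or through
    repeated forging, starting from `inventory` (duplicates matter)."""
    if memo is None:
        memo = {}
    key = tuple(sorted(inventory))  # order is irrelevant, multiplicity is not
    if key in memo:
        return memo[key]  # this multiset was already explored via another path
    found = set()
    for i, first in enumerate(key):
        for j, second in enumerate(key):
            if i == j:
                continue  # a weapon can't be combined with itself
            forged = combine(first, second)  # order-sensitive
            rest = [w for k, w in enumerate(key) if k not in (i, j)]
            found.add(forged)
            found |= forgeable(rest + [forged], combine, memo)
    memo[key] = found
    return found
```

Calling `forgeable(["francisca", "tabarzin", "kris"], lambda a, b: EXAMPLE_TABLE[(a, b)])` yields the six weapons listed in the question. Every forge shrinks the inventory by one, so the recursion cannot cycle.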
Alternatively, you could do a backward search. Given a multiset of weapons, form predecessors by replacing one of the weapons with two weapons that forge it, in all possible ways. Start with a list containing a single multiset, which consists of just the goal weapon. Then repeatedly expand the list by the predecessors of its elements and cull multisets that are supersets of others. Stop when you reach a fixed point.
If linear programming is an option, then there are systematic ways to prune search trees. In particular, let's make the problem easier by (i) allowing an infinite supply of "catalysts" (maybe not needed here?) (ii) allowing "fractional" forging, e.g., if X + Y => Z, then 0.5 X + 0.5 Y => 0.5 Z. Then there's an LP formulation as follows. For all i + j => k (i and j forge k), the variable x_{ijk} is the number of times this forge is performed.
minimize sum_{i + j => k} x_{ijk} (to prevent wasteful cycles)
for all i: sum_{j, k: j + k => i} x_{jki}
- sum_{j, k: j + i => k} x_{jik}
- sum_{j, k: i + j => k} x_{ijk} >= q_i,
for all i + j => k: x_{ijk} >= 0,
where q_i is 1 if i is the goal item, else minus the number of i initially available. There are efficient solvers for this easy version. Since the reactions are always 2 => 1, you can always recover a feasible forging schedule for an integer solution. Accordingly, I would recommend integer programming for this problem. The paragraph below may still be of interest.
I know shipping an LP solver may be inconvenient, so here's an insight that will let you do without one. This LP is feasible if and only if its dual is bounded. Intuitively, the dual problem is to assign a "value" to each item such that, however you forge, the total value of your inventory does not increase. If the goal item is valued at more than the available inventory, then you can't forge it. You can use any method that you can think of to assign these values.
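Here is a Python sketch of checking such a valuation, for the plain version of the argument without catalysts. The assumptions are:

- values are nonnegative (otherwise "worth more than the inventory" proves nothing);
- items missing from `values` are worth 0;
- recipes are (first, second, result) triples.

`proves_unforgeable` is a hypothetical name. A False result only means this particular valuation proves nothing. It does not mean the goal is forgeable.

```python
from collections import Counter


def proves_unforgeable(values, recipes, inventory, goal):
    """Return True if the valuation `values` proves that `goal` can't be forged.

    values:    dict item -> nonnegative number; missing items count as 0
    recipes:   iterable of (first, second, result) triples, first + second => result
    inventory: Counter of the items initially held
    """
    def value(item):
        return values.get(item, 0)

    if any(v < 0 for v in values.values()):
        return False  # the argument below relies on nonnegative values
    # No forge may increase the total value of the inventory ...
    for first, second, result in recipes:
        if value(result) > value(first) + value(second):
            return False
    total = sum(value(item) * count for item, count in inventory.items())
    # ... so the total never exceeds its starting level, and a goal
    # worth more than that can never be held.
    return value(goal) > total
```

For example, take the recipes Rapier + Firangi = Short Sword and Firangi + Rapier = Spatha, and value Firangi, Short Sword and Spatha at 1 and Rapier at 0. This shows that a lone Rapier can never become a Spatha. With a Rapier and a Firangi, the same valuation proves nothing.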

I think you are unlikely to get a good general answer to this question, because if there were an efficient algorithm to solve your problem, then it would also be able to solve NP-complete problems.
For example, consider the problem of finding the maximum number of independent rows in a binary matrix.
This is a known NP-complete problem (e.g. by showing equivalence to the maximum independent set problem).
We can reduce this problem to your question in the following manner:
We can start by holding one weapon for each column in the binary matrix, and then we imagine each row describes an alternative way of making a new weapon (say a battle axe).
We construct the weapon translation table such that to make the battle axe using method i, we need all weapons j such that M[i,j] is equal to 1 (this may involve inventing some additional weapons).
Then we construct a series of super weapons which can be made by combining different numbers of our battle axes.
For example, the mega ultimate battle axe may require 4 battle axes to be combined.
If we are able to work out the best weapon that can be constructed from your starting weapons, then we have solved the problem of finding the maximum number of independent rows in the original binary matrix.

It's not a huge saving, but looking at the source document, there are times when combining weapons produces the same weapon as one of the two that were combined. I assume that you won't want to do this, as you'll end up with fewer weapons.
So if you added a check for if the result_weapon was the same type as one of the inputs, and didn't go ahead and recursively call find_possible_weapons(new_weapons), you'd trim the search down a little.
The other thing I can think of is that you are not keeping track of work done. If find_possible_weapons(new_weapons) produces a weapon that you have already got by combining other weapons, you might well be performing the same search branch multiple times.
E.g. if you have a, b, c, d, e, f, g, and if a + b = x and c + d = x, then your algorithm will compare x against e, f, and g twice. So if you keep track of what you've already computed, you'll be onto a winner...
Basically, you have to trim the search tree. There are loads of different techniques to do this: it's called search. If you want more advice, I'd recommend going to the computer science stack exchange.
If you are still struggling, then you could always start weighting items/resulting items, and only focus on doing the calculation on 'high gain' objects...

You might want to start by creating a Weapon[][] matrix, to show the results of forging each pair. You could map the name of the weapon to the index of the matrix axis, and lookup of the results of a weapon combination would occur in constant time.
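Here is a Python version of that idea, as a sketch. Weapon names are mapped to indices once, and the results are stored in a list of lists, so each lookup is two index operations. The recipe triples and the `build_forge_matrix`/`forge` names are hypothetical. Pairs missing from the recipes map to None.

```python
def build_forge_matrix(recipes):
    """Build (names, index, matrix) where matrix[i][j] is the index of the weapon
    produced by forging weapon i (first) with weapon j (second)."""
    names = sorted({weapon for triple in recipes for weapon in triple})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[None] * len(names) for _ in names]  # None: pair not listed
    for first, second, result in recipes:
        matrix[index[first]][index[second]] = index[result]
    return names, index, matrix


def forge(names, index, matrix, first, second):
    """Constant-time lookup of a combination result (order matters)."""
    result = matrix[index[first]][index[second]]
    return None if result is None else names[result]
```

With the question's example recipes, `forge(..., "rapier", "firangi")` gives "short sword", while the reversed pair gives "spatha".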

Genetic/Evolutionary algorithm - Painter

My task:
Create a program that copies a picture (given as input) using only primitives (such as triangles). The program should use an evolutionary algorithm to create the output picture.
My question:
I need to invent an algorithm to create populations and evaluate them (i.e. how closely, in %, they match the input picture).
I have an idea; you can find it below.
So what I want from you: advice (if you find my idea not so bad) or inspiration (maybe you have a better idea?)
My idea:
Let's say that I'll use only triangles to build the output picture.
My first population consists of P pictures (each generated from T randomly generated triangles, called Elements).
I evaluate every picture in the population with my fitness function, keep E of them as the elite and simply remove the rest of the population:
To compare 2 pictures, we check every pixel in picture A and compare its R, G, B with the same pixel (same coordinates) in picture B.
I use this:
SingleDif = sqrt[ (Ar - Br)^2 + (Ag - Bg)^2 + (Ab - Bb)^2]
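Then I sum the differences over all pixels; let's call this SumDif,
and use:
PictureDif = (DifMax - SumDif)/DifMax
where
DifMax = pictureHeight * pictureWidth * 255 * sqrt(3)
(255 * sqrt(3) is the largest possible SingleDif for one pixel), so PictureDif is 1 for identical pictures and 0 for maximally different ones.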
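The fitness measure above can be sketched in Python as follows. Images are assumed to be equally sized lists of rows of (r, g, b) tuples with channels in 0..255. This representation is an assumption; a real implementation would more likely use an imaging or array library. The normalizer is height * width * 255 * sqrt(3), the largest possible sum of per-pixel Euclidean distances, so the result lies in [0, 1].

```python
import math


def picture_dif(a, b):
    """Similarity in [0, 1] between two equally sized RGB images.

    a, b: lists of rows, each row a list of (r, g, b) tuples with 0..255 channels.
    Returns 1.0 for identical images and 0.0 when every pixel is maximally different.
    """
    height, width = len(a), len(a[0])
    sum_dif = 0.0
    for row_a, row_b in zip(a, b):
        for (ar, ag, ab), (br, bg, bb) in zip(row_a, row_b):
            # SingleDif: Euclidean distance between the two colours
            sum_dif += math.sqrt((ar - br) ** 2 + (ag - bg) ** 2 + (ab - bb) ** 2)
    dif_max = height * width * 255 * math.sqrt(3)  # largest possible SumDif
    return (dif_max - sum_dif) / dif_max
```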
The best are used to create the next population in this way:
picture MakeChild(picture Mother, picture Father)
{
    picture child;
    for( int i = 0; i < T; ++i )
    {
        j // a fresh random number from 0 to 1, used to pick the parent
        if( j < 0.5 ) child.element(i) = Mother.element(i);
        else child.element(i) = Father.element(i);
        m // a second, independent random number from 0 to 1, used for mutation
        if( m < some small % ) mutate( child.element(i) );
    }
    return child;
}
So it's quite simple. Only the mutation needs a comment: there is always some small probability that element X in the child will be different from X in its parent. To do this we make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node).
So this is my idea... I didn't test it, didn't code it.
Please check my idea - what do you think about it?

I would make the number of patches of each child dynamic and get the mutation operation to insert/delete patches with some (low) probability. Of course, this could result in a lot of redundancy and bloat in the child's genome. In these situations, it is usually a good idea to use the length of an individual's genome as a parameter of the fitness function, so that individuals get rewarded (with a higher fitness value) for using fewer patches. So, for example, if the PictureDif of individuals A and B is the same but A has fewer patches than B, then A has a higher fitness.
Another issue is the reproductive operator that you proposed (namely, the crossover operation). In order for the evolutionary process to work efficiently, you need to achieve a reasonable balance between exploration and exploitation. One way of doing this is by having a set of reproductive operators that exhibit a good fitness correlation [1], which means the fitness of a child should be close to the fitness of its parent(s).
In the case of single parent reproduction you only need to find the right mutation parameters. However, when it comes to multi-parent reproduction (crossover) one of the frequently used techniques is to produce 2 children (instead of 1) from the same 2 parents. For the first child, each gene comes from the mother with the probability of 0.2 and from the father with the probability of 0.8, and for the second child the other way around. Of course after the crossover, you can do the mutation.
Oh and one more thing, for the mutation operators, when you say
... make random changes in element in child (change his colour by random number, or add random number to his (x,y) coordinate - or his node)
it's a good idea to use a Gaussian distribution to change the colour, coordinate etc.
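Here is a Python sketch of the three suggestions in this answer:

- a length penalty in the fitness;
- the two-children crossover with a 0.2/0.8 bias;
- Gaussian mutation.

Genomes are assumed to be equal-length lists of numeric tuples (e.g. triangle coordinates followed by colour). The penalty weight, mutation rate and sigma are placeholder values, not tuned, and all function names are hypothetical. Drawing a single random number per gene makes the two children complementary, which is one reading of "the other way around".

```python
import random


def penalized_fitness(picture_dif, n_patches, penalty=0.001):
    """Reward similarity but charge a small (placeholder) cost per patch, so of
    two equally similar pictures the one with fewer patches scores higher."""
    return picture_dif - penalty * n_patches


def biased_crossover(mother, father, p=0.2, rng=random):
    """Produce two children from equal-length genomes. Child 1 takes each gene
    from the mother with probability p, otherwise from the father; child 2 gets
    the other parent's gene, so it takes the mother's with probability 1 - p."""
    child1, child2 = [], []
    for gene_m, gene_f in zip(mother, father):
        if rng.random() < p:
            child1.append(gene_m)
            child2.append(gene_f)
        else:
            child1.append(gene_f)
            child2.append(gene_m)
    return child1, child2


def gaussian_mutate(genome, rate=0.05, sigma=10.0, rng=random):
    """With probability `rate`, add Gaussian noise (std. dev. `sigma`) to each
    numeric field of each element. Clamping colours to 0..255 and coordinates
    to the canvas is left to the caller."""
    return [
        tuple(v + rng.gauss(0.0, sigma) if rng.random() < rate else v for v in element)
        for element in genome
    ]
```

After crossover, each child would typically be passed through `gaussian_mutate` before being evaluated.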
[1] Evolutionary Computation: A unified approach by Kenneth A. De Jong, page 69

How exactly to use "History Heuristic" in alpha-beta minimax?

I'm making an AI for a chess game.
So far, I've successfully implemented the Alpha-Beta Pruning Minimax algorithm, which looks like this (from Wikipedia):
(* Initial call *)
alphabeta(origin, depth, -∞, +∞, TRUE)

function alphabeta(node, depth, α, β, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if maximizingPlayer
        for each child of node
            α := max(α, alphabeta(child, depth - 1, α, β, FALSE))
            if β ≤ α
                break (* β cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth - 1, α, β, TRUE))
            if β ≤ α
                break (* α cut-off *)
        return β
Since this is still too slow (it goes through every subtree one by one), I came across something called the "History Heuristic".
The algorithm from the original paper:
int AlphaBeta(pos, d, alpha, beta)
{
    if (d == 0 || game is over)
        return Eval(pos); // evaluate leaf position from current player’s standpoint
    score = -INFINITY; // preset return value
    moves = Generate(pos); // generate successor moves
    for i = 1 to sizeof(moves) do // rating all moves
        rating[i] = HistoryTable[ moves[i] ];
    Sort(moves, rating); // sorting moves according to their history scores
    for i = 1 to sizeof(moves) do { // look over all moves
        Make(moves[i]); // execute current move
        cur = -AlphaBeta(pos, d-1, -beta, -alpha); // call other player
        if (cur > score) {
            score = cur;
            bestMove = moves[i]; // update best move if necessary
        }
        if (score > alpha) alpha = score; // adjust the search window
        Undo(moves[i]); // retract current move
        if (alpha >= beta) goto done; // cut off
    }
done:
    // update history score
    HistoryTable[bestMove] = HistoryTable[bestMove] + Weight(d);
    return score;
}
So basically, the idea is to keep track of a Hashtable or a Dictionary for previous "moves".
Now I'm confused about what this "move" means here.
I'm not sure if it literally refers to a single move or to the overall state after each move.
In chess, for example, what should the "key" for this hashtable be?
Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
Or the overall state of the chessboard after individual moves?
If the first is the case, I guess the positions of other pieces are not taken into account when recording the "move" into my History table?

I think the original paper (The History Heuristic and Alpha-Beta Search Enhancements in Practice, Jonathan Schaeffer), available online, answers the question clearly. In the paper, the author defines a move as its two indices (from-square and to-square) on the chess board, using a 64x64 table (in effect, I think he used bit shifting and a single index array) to store the move history.
The author compared all the available means of move ordering and determined that the history heuristic (hh) was the best. If current best practice has established an improved form of move ordering (beyond hh + transposition table), I would also like to know what it is.

You can use a transposition table to avoid evaluating the same board multiple times. A transposition means that you can reach the same board state by performing moves in different orders. Naive example:
1. e4 e5 2. Nf3 Nc6
1. e4 Nc6 2. Nf3 e5
These plays result in the same position but were reached differently.
http://en.wikipedia.org/wiki/Transposition_table
A common method is called Zobrist hashing to hash a chess position:
http://en.wikipedia.org/wiki/Zobrist_hashing
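Here is a minimal Python sketch of Zobrist hashing on a simplified board representation (a dict from square index to piece letter). It only handles quiet moves; a real engine would also hash castling rights, the en passant file, captures and promotions. The key table is generated from a fixed seed for reproducibility. `ZOBRIST`, `square`, `zobrist_hash` and `update_hash` are hypothetical names.

```python
import random

PIECES = "PNBRQKpnbrqk"      # white pieces upper-case, black pieces lower-case
_rng = random.Random(12345)  # fixed seed so the keys are reproducible
ZOBRIST = {(piece, sq): _rng.getrandbits(64) for piece in PIECES for sq in range(64)}
BLACK_TO_MOVE = _rng.getrandbits(64)


def square(name):
    """Convert algebraic notation to an index: 'a1' -> 0, 'h1' -> 7, 'h8' -> 63."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)


def zobrist_hash(board, black_to_move):
    """Compute the key from scratch. board: dict square index -> piece letter."""
    h = 0
    for sq, piece in board.items():
        h ^= ZOBRIST[(piece, sq)]
    if black_to_move:
        h ^= BLACK_TO_MOVE
    return h


def update_hash(h, piece, from_sq, to_sq):
    """Update a key incrementally for a quiet move (no capture, castling,
    en passant or promotion): remove the piece, re-add it, flip the side."""
    return h ^ ZOBRIST[(piece, from_sq)] ^ ZOBRIST[(piece, to_sq)] ^ BLACK_TO_MOVE
```

XOR is commutative, so the two move orders in the example above (e4, e5, Nf3, Nc6 and e4, Nc6, Nf3, e5) give the same key from the same start position. That is exactly what lets a transposition table recognize them as one position.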

In my experience, the history heuristic produces negligible benefits compared to other techniques, and is not worthwhile for a basic search routine. It is not the same thing as using a transposition table. If the latter is what you want to implement, I'd still advise against it. There are many other techniques that will produce good results for far less effort. In fact, an efficient and correct transposition table is one of the most difficult parts to code in a chess engine.
First try pruning and move ordering heuristics, most of which are one to a few lines of code. I've detailed such techniques in this post, which also gives estimates of the performance gains you can expect.

In chess, for example, what should the "key" for this hashtable be?
Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
Or the overall state of the chessboard after individual moves?
The key is an individual move and the positions of other pieces aren't taken into account when recording the "move" into the history table.
The traditional form of the history table (also called butterfly board) is something like:
score history_table[side_to_move][from_square][to_square];
For instance, if the move e2-e4 produces a cutoff, the element:
history_table[white][e2][e4]
is (somehow) incremented, irrespective of the position in which the move has been made.
As in the example code, history heuristics uses those counters for move ordering. Other heuristics can take advantage of history tables (e.g. late move reductions).
Consider that:
the history heuristic usually isn't applied to plain alpha-beta with no knowledge of move ordering (in chess, only "quiet" moves are ordered via the history heuristic);
there are alternative forms for the history table (often used is history_table[piece][to_square]).
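Here is a Python sketch of the butterfly-board form described above, used to order quiet moves. Squares are 0..63 and moves are (from, to) pairs. The increment of depth * depth on a cutoff is one common weighting; it is assumed here rather than taken from the paper, which uses a depth-dependent `Weight(d)`. `record_cutoff` and `order_quiet_moves` are hypothetical names.

```python
WHITE, BLACK = 0, 1

# history[side][from_square][to_square], squares numbered 0..63
history = [[[0] * 64 for _ in range(64)] for _ in range(2)]


def record_cutoff(side, move, depth):
    """Reward a quiet move that caused a beta cutoff at the given remaining depth.
    depth * depth is an assumed weighting: deeper cutoffs count for more."""
    from_sq, to_sq = move
    history[side][from_sq][to_sq] += depth * depth


def order_quiet_moves(side, moves):
    """Return quiet moves sorted by history score, highest first.
    The rest of the position is deliberately ignored, as described above."""
    return sorted(moves, key=lambda m: history[side][m[0]][m[1]], reverse=True)
```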

Grundy's game extended to more than two heaps

How can I break a heap into two heaps in Grundy's game?
What about breaking a heap into any number of heaps (no two of them being equal)?

Games of this type are analyzed in great detail in the book series "Winning Ways for your Mathematical Plays". Most of the things you are looking for are probably in volume 1.
You can also take a look at these links: Nimbers (Wikipedia), Sprague-Grundy theorem (Wikipedia) or do a search for "combinatorial game theory".
My knowledge on this is quite rusty, so I'm afraid I can't help you myself with this specific problem. My apologies if you were already aware of everything I linked.
Edit: In general, the method of solving these types of games is to "build up" stack sizes. So start with a stack of 1 and decide who wins with optimal play. Then do the same for a stack of 2. Note that in Grundy's game it cannot be split at all, since 1 & 1 would be two equal heaps. Then move on to 3, which can be split into 1 & 2. Same for 4, which can only be split into 3 & 1 (2 & 2 is again not allowed). From here on, the resulting positions contain several heaps, and using the Sprague-Grundy theorem and the algebraic rules for nimbers, you can calculate who will win. Keep going until you reach the stack size for which you need to know the answer.
Edit 2: The website I was talking about in the comments seems to be down. Here is a link to a backup of it: Wayback Machine - Introduction to Combinatorial Games.
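The build-up procedure can be automated with the Sprague-Grundy theorem:

- the Grundy value of a heap is the mex (minimum excluded value) of the XORs of the heaps it can be split into;
- a position with several heaps is a win for the player to move exactly when the XOR of its heaps' values is nonzero.

Here is a Python sketch covering both the classic rule and the "any number of pairwise unequal heaps" variant from the question. The function names are hypothetical.

```python
from functools import lru_cache


def mex(values):
    """Minimum excluded value: the smallest non-negative integer not in `values`."""
    m = 0
    while m in values:
        m += 1
    return m


@lru_cache(maxsize=None)
def grundy_two(n):
    """Grundy value of one heap of size n in classic Grundy's game,
    where a move splits one heap into two unequal, non-empty heaps."""
    options = {grundy_two(a) ^ grundy_two(n - a) for a in range(1, (n + 1) // 2)}
    return mex(options)


def distinct_splits(n, smallest=1):
    """Yield every way to write n as a sum of distinct parts >= smallest,
    each as an increasing tuple (including the single part (n,))."""
    if n == 0:
        yield ()
        return
    for first in range(smallest, n + 1):
        for rest in distinct_splits(n - first, first + 1):
            yield (first,) + rest


@lru_cache(maxsize=None)
def grundy_multi(n):
    """Grundy value of one heap when a move splits it into two or more heaps,
    no two of them equal."""
    options = set()
    for parts in distinct_splits(n):
        if len(parts) >= 2:  # (n,) itself is not a move
            value = 0
            for part in parts:
                value ^= grundy_multi(part)
            options.add(value)
    return mex(options)


def first_player_wins(heaps, grundy=grundy_two):
    """Sprague-Grundy: the player to move wins iff the XOR of heap values is nonzero."""
    total = 0
    for heap in heaps:
        total ^= grundy(heap)
    return total != 0
```

The first values of `grundy_two` for n = 0..10 are 0, 0, 0, 1, 0, 2, 1, 0, 2, 1, 0. So a single heap of 1, 2, 4, 7 or 10 is a loss for the player to move.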

Grundy's Game, and many games like it, can be solved with an algorithm like this:
//returns a Move object representing the current player's optimal move, or null if the player has no chance of winning
function bestMove(GameState g){
    for each (move in g.possibleMoves()){
        nextState = g.applyMove(move)
        if (bestMove(nextState) == null){
            //the next player's best move is null, so if we take this move,
            //he has no chance of winning. This is good for us!
            return move;
        }
    }
    //none of our possible moves led to a winning strategy.
    //We have no chance of winning. This is bad for us :-(
    return null;
}
Implementations of GameState and Move depend on the game. For Grundy's game, both are simple.
GameState stores a list of integers, representing the size of each heap in the game.
Move stores an initialHeapSize integer, and a resultingHeapSizes list of integers.
GameState::possibleMoves iterates through its heap size list, and determines the legal divisions for each one.
GameState::applyMove(Move) returns a copy of the GameState, except the move given to it is applied to the board.
GameState::possibleMoves can be implemented for "classic" Grundy's Game like so:
function possibleMoves(GameState g){
    moves = []
    for each (heapSize in g.heapSizes){
        for each (resultingHeaps in possibleDivisions(heapSize)){
            Move m = new Move(heapSize, resultingHeaps)
            moves.append(m)
        }
    }
    return moves
}

function possibleDivisions(int heapSize){
    divisions = []
    for(int leftPileSize = 1; leftPileSize < heapSize; leftPileSize++){
        int rightPileSize = heapSize - leftPileSize
        if (leftPileSize != rightPileSize){
            divisions.append([leftPileSize, rightPileSize])
        }
    }
    return divisions
}
Modifying this to use the "divide into any number of unequal piles" rule is just a matter of changing the implementation of possibleDivisions.
I haven't calculated it exactly, but an unoptimized bestMove has a pretty crazy worst-case runtime. Once you start giving it a starting state of around 12 stones, you'll get long wait times. So you should implement memoization to improve performance.
For best results, keep each GameState's heap size list sorted, and discard any heaps of size 2 or 1.
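Here is a Python sketch of the memoized version for the classic two-heap rule, following the advice above. States are sorted tuples with heaps of size 1 and 2 removed, and `functools.lru_cache` does the memoization. `normalize`, `possible_divisions` and `best_move` mirror the pseudocode but are hypothetical names. Only unordered divisions are generated, so (1, 3) and (3, 1) are not both tried.

```python
from functools import lru_cache


def normalize(heaps):
    """Sorted tuple of heap sizes; heaps of size 1 or 2 can never be split,
    so they don't affect the outcome and are dropped."""
    return tuple(sorted(h for h in heaps if h > 2))


def possible_divisions(heap):
    """Classic rule: split into two unequal, non-empty heaps (left < right)."""
    return [(left, heap - left) for left in range(1, (heap + 1) // 2)]


@lru_cache(maxsize=None)
def best_move(state):
    """state: a normalized tuple. Returns (heap, (left, right)) for a winning
    move, or None if the player to move loses against perfect play."""
    for i, heap in enumerate(state):
        for left, right in possible_divisions(heap):
            nxt = normalize(state[:i] + state[i + 1:] + (left, right))
            if best_move(nxt) is None:  # the opponent has no winning reply
                return heap, (left, right)
    return None
```

For example, `best_move((5,))` returns `(5, (1, 4))`. That leaves a single heap of 4, from which the opponent can only split off a 1 and leave a 3. The 3 is then split into 1 and 2, and the opponent has no move left.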

What algorithm for a tic-tac-toe game can I use to determine the "best move" for the AI?

In a tic-tac-toe implementation I guess that the challenging part is to determine the best move to be played by the machine.
What algorithms can be pursued? I'm looking for implementations ranging from simple to complex. How would I go about tackling this part of the problem?

The strategy from Wikipedia for playing a perfect game (win or tie every time) seems like straightforward pseudo-code:
Quote from Wikipedia (Tic Tac Toe#Strategy)
A player can play a perfect game of Tic-tac-toe (to win or, at least, draw) if they choose the first available move from the following list, each turn, as used in Newell and Simon's 1972 tic-tac-toe program.[6]
Win: If you have two in a row, play the third to get three in a row.
Block: If the opponent has two in a row, play the third to block them.
Fork: Create an opportunity where you can win in two ways.
Block Opponent's Fork:
Option 1: Create two in a row to force the opponent into defending, as long as it doesn't result in them creating a fork or winning. For example, if "X" has a corner, "O" has the center, and "X" has the opposite corner as well, "O" must not play a corner in order to win. (Playing a corner in this scenario creates a fork for "X" to win.)
Option 2: If there is a configuration where the opponent can fork, block that fork.
Center: Play the center.
Opposite Corner: If the opponent is in the corner, play the opposite corner.
Empty Corner: Play an empty corner.
Empty Side: Play an empty side.
Recognizing what a "fork" situation looks like could be done in a brute-force manner as suggested.
Note: A "perfect" opponent is a nice exercise but ultimately not worth 'playing' against. You could, however, alter the priorities above to give characteristic weaknesses to opponent personalities.
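Here is a Python sketch of the rule list quoted above. Fork recognition is done by brute force: try every empty square and count the resulting two-in-a-row threats. The board is assumed to be a list of nine cells holding "X", "O" or " ", indexed row by row from the top-left, and the function names are hypothetical. The fork-blocking step tries Option 1 first and falls back to Option 2, so it avoids the corner trap described in the quote.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
CORNERS, SIDES = (0, 2, 6, 8), (1, 3, 5, 7)
OPPOSITE = {0: 8, 2: 6, 6: 2, 8: 0}


def winning_squares(board, p):
    """Empty squares that would complete three in a row for player p."""
    squares = set()
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(p) == 2 and marks.count(" ") == 1:
            squares.add(line[marks.index(" ")])
    return squares


def fork_squares(board, p):
    """Empty squares where p would create two winning threats at once (brute force)."""
    forks = []
    for sq in range(9):
        if board[sq] == " ":
            trial = board[:]
            trial[sq] = p
            if len(winning_squares(trial, p)) >= 2:
                forks.append(sq)
    return forks


def choose_move(board, me, opp):
    """Return the square (0..8) picked by the rule list, or None if the board is full."""
    for squares in (winning_squares(board, me), winning_squares(board, opp)):
        if squares:  # Win, then Block
            return min(squares)
    my_forks = fork_squares(board, me)
    if my_forks:  # Fork
        return my_forks[0]
    opp_forks = fork_squares(board, opp)
    if opp_forks:  # Block Opponent's Fork
        for sq in range(9):  # Option 1: force a block that doesn't hand them a fork
            if board[sq] != " ":
                continue
            trial = board[:]
            trial[sq] = me
            threats = winning_squares(trial, me)
            if len(threats) == 1:
                trial[threats.pop()] = opp  # the opponent's forced reply
                if len(winning_squares(trial, opp)) < 2:
                    return sq
        return opp_forks[0]  # Option 2: occupy a fork square
    if board[4] == " ":  # Center
        return 4
    for corner in CORNERS:  # Opposite Corner
        if board[corner] == opp and board[OPPOSITE[corner]] == " ":
            return OPPOSITE[corner]
    for sq in CORNERS + SIDES:  # Empty Corner, then Empty Side
        if board[sq] == " ":
            return sq
    return None
```

Take X on two opposite corners and O in the center, with O to move. Here `choose_move(list("X   O   X"), "O", "X")` returns 1, an edge square, rather than one of the corners that would let X fork.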

What you need (for tic-tac-toe or a far more difficult game like Chess) is the minimax algorithm, or its slightly more complicated variant, alpha-beta pruning. Ordinary naive minimax will do fine for a game with as small a search space as tic-tac-toe, though.
In a nutshell, what you want to do is not to search for the move that has the best possible outcome for you, but rather for the move where the worst possible outcome is as good as possible. If you assume your opponent is playing optimally, you have to assume they will take the move that is worst for you, and therefore you have to take the move that MINimises their MAXimum gain.
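Here is a minimal minimax sketch in Python for tic-tac-toe. The board is assumed to be a nine-character string of "X", "O" and " ". Scores are from X's point of view (+1 win, 0 draw, -1 loss): X maximizes and O minimizes. The `lru_cache` memoization is optional, since plain minimax is fast enough here; it just avoids re-searching transposed positions.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score, move) for `player` to move; move is None at game end."""
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # full board, no winner: draw
    other = "O" if player == "X" else "X"
    results = []
    for m in moves:
        score, _ = minimax(board[:m] + player + board[m + 1:], other)
        results.append((score, m))
    # X takes the maximum score, O the minimum (ties: lowest square first)
    pick = max if player == "X" else min
    return pick(results, key=lambda r: r[0])
```

`minimax(" " * 9, "X")` returns a score of 0: with perfect play on both sides, the game is a draw.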

The brute-force method of generating every single possible board and scoring it based on the boards it later produces further down the tree doesn't require much memory, especially once you recognize that 90-degree board rotations are redundant, as are flips about the vertical, horizontal, and diagonal axes.
Once you get to that point, there's something like less than 1 KB of data in a tree graph to describe the outcome, and thus the best move for the computer.
-Adam
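Here is a Python sketch of exploiting those symmetries. Each board is mapped to a canonical representative: the smallest string among its 8 rotations and reflections. Equivalent boards then share one entry in a table. The board is assumed to be a nine-character string read row by row, and the function names are hypothetical.

```python
def rotate(board):
    """Rotate a 9-character board 90 degrees clockwise."""
    return "".join(board[i] for i in (6, 3, 0, 7, 4, 1, 8, 5, 2))


def mirror(board):
    """Flip the board about its vertical axis."""
    return "".join(board[i] for i in (2, 1, 0, 5, 4, 3, 8, 7, 6))


def canonical(board):
    """Smallest of the 8 symmetric variants (4 rotations, each with and without a flip)."""
    variants = []
    for _ in range(4):
        variants += [board, mirror(board)]
        board = rotate(board)
    return min(variants)
```

The 4 rotations, each optionally combined with a flip, give all 8 symmetries of the square. So a single corner X in any of the four corners maps to the same canonical board.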

A typical algo for tic-tac-toe should look like this:
Board: A nine-element vector representing the board. We store 2 (indicating Blank), 3 (indicating X), or 5 (indicating O).
Turn: An integer indicating which move of the game is about to be played. The 1st move will be indicated by 1, the last by 9.
The Algorithm
The main algorithm uses three functions.
Make2: returns 5 if the center square of the board is blank, i.e. if board[5] = 2. Otherwise, this function returns any blank non-corner square (2, 4, 6 or 8).
Posswin(p): Returns 0 if player p can’t win on his next move; otherwise, it returns the number of the square that constitutes a winning move. This function will enable the program both to win and to block opponents win. This function operates by checking each of the rows, columns, and diagonals. By multiplying the values of each square together for an entire row (or column or diagonal), the possibility of a win can be checked. If the product is 18 (3 x 3 x 2), then X can win. If the product is 50 (5 x 5 x 2), then O can win. If a winning row (column or diagonal) is found, the blank square in it can be determined and the number of that square is returned by this function.
Go(n): makes a move in square n. This procedure sets board[n] to 3 if Turn is odd, or 5 if Turn is even. It also increments Turn by one.
The algorithm has a built-in strategy for each move. It makes the odd-numbered moves if it plays X, and the even-numbered moves if it plays O.
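Here is a Python sketch of Posswin using the product trick described above. It assumes a 10-element list where index 0 is unused, so squares are numbered 1..9 as in the text. Values are 2 (blank), 3 (X) and 5 (O). 18 = 3 * 3 * 2 and 50 = 5 * 5 * 2 have no other factorization into these values. A matching product therefore identifies exactly two marks of one player plus one blank square. `posswin` and the constant names are hypothetical.

```python
BLANK, X, O = 2, 3, 5
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7),
         (2, 5, 8), (3, 6, 9), (1, 5, 9), (3, 5, 7)]


def posswin(board, p):
    """Return a square (1..9) where player p (X or O) can complete a line
    on the next move, or 0 if there is none."""
    target = p * p * BLANK  # 18 for X, 50 for O
    for line in LINES:
        if board[line[0]] * board[line[1]] * board[line[2]] == target:
            for sq in line:
                if board[sq] == BLANK:
                    return sq
    return 0
```

The same function serves both purposes in the strategy: `posswin(board, me)` finds a win, and `posswin(board, opponent)` finds the square to block.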
Turn = 1 Go(1) (upper left corner).
Turn = 2 If Board[5] is blank, Go(5), else Go(1).
Turn = 3 If Board[9] is blank, Go(9), else Go(3).
Turn = 4 If Posswin(X) is not 0, then Go(Posswin(X)) [i.e. block opponent’s win], else Go(Make2).
Turn = 5 If Posswin(X) is not 0 then Go(Posswin(X)) [i.e. win], else if Posswin(O) is not 0, then Go(Posswin(O)) [i.e. block win], else if Board[7] is blank, then Go(7), else Go(3) [to explore other possibilities, if there are any].
Turn = 6 If Posswin(O) is not 0 then Go(Posswin(O)), else if Posswin(X) is not 0, then Go(Posswin(X)), else Go(Make2).
Turn = 7 If Posswin(X) is not 0 then Go(Posswin(X)), else if Posswin(O) is not 0, then Go(Posswin(O)), else go anywhere that is blank.
Turn = 8 If Posswin(O) is not 0 then Go(Posswin(O)), else if Posswin(X) is not 0, then Go(Posswin(X)), else go anywhere that is blank.
Turn = 9 Same as Turn = 7.
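Putting the turn table together, a self-contained JavaScript sketch might look like this. It repeats compact versions of the helpers so that it runs on its own; chooseSquare, winner and selfPlay are hypothetical names, squares are numbered 1 to 9 with the 2/3/5 encoding, and "go anywhere that is blank" is interpreted (as an assumption) as the lowest-numbered blank square. selfPlay lets the table play both sides.

```javascript
// Encoding from the answer: 2 = blank, 3 = X, 5 = O; squares 1..9, board[0] unused.
const BLANK = 2, X = 3, O = 5;
const LINES = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 4, 7],
               [2, 5, 8], [3, 6, 9], [1, 5, 9], [3, 5, 7]];
const product = (b, line) => line.reduce((acc, n) => acc * b[n], 1);

const make2 = (b) => (b[5] === BLANK ? 5 : [2, 4, 6, 8].find((n) => b[n] === BLANK));
function posswin(b, p) {
  const line = LINES.find((l) => product(b, l) === p * p * BLANK);
  return line ? line.find((n) => b[n] === BLANK) : 0;
}
// "Go anywhere that is blank": here simply the lowest-numbered blank square.
const anyBlank = (b) => b.findIndex((v, n) => n > 0 && v === BLANK);

// The turn-by-turn table; returns the square to play on this turn.
function chooseSquare(b, turn) {
  const blank = (n) => b[n] === BLANK;
  switch (turn) {
    case 1: return 1;
    case 2: return blank(5) ? 5 : 1;
    case 3: return blank(9) ? 9 : 3;
    case 4: return posswin(b, X) || make2(b);
    case 5: return posswin(b, X) || posswin(b, O) || (blank(7) ? 7 : 3);
    case 6: return posswin(b, O) || posswin(b, X) || make2(b);
    case 7: case 9: return posswin(b, X) || posswin(b, O) || anyBlank(b);
    case 8: return posswin(b, O) || posswin(b, X) || anyBlank(b);
    default: throw new Error(`no move for turn ${turn}`);
  }
}

// Three in a row multiplies to 27 (X) or 125 (O).
function winner(b) {
  for (const line of LINES) {
    if (product(b, line) === X ** 3) return 'X';
    if (product(b, line) === O ** 3) return 'O';
  }
  return null;
}

// Let the table play both sides until someone wins or the board is full.
function selfPlay() {
  const board = new Array(10).fill(BLANK);
  const moves = [];
  for (let turn = 1; turn <= 9 && winner(board) === null; turn++) {
    const n = chooseSquare(board, turn);
    board[n] = turn % 2 === 1 ? X : O; // Go(n)
    moves.push(n);
  }
  return { moves, winner: winner(board) };
}
```

Traced by hand, selfPlay() produces the moves 1, 5, 9, 2, 8, 7, 3, 6, 4 and ends in a draw.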
I have used it. Let me know how you guys feel.

Since you're only dealing with a 3x3 matrix of possible locations, it'd be pretty easy to just write a search through all possibilities without taxing your computing power. For each open space, compute all the possible outcomes after marking that space (recursively, I'd say), then use the move with the most possibilities of winning.
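A brute-force sketch of that idea, assuming a 9-element row-major board array of 'X', 'O' or null; countOutcomes and mostWinsMove are hypothetical names. countOutcomes recursively enumerates every finished game below a position (temporarily writing into the board and undoing each move), and mostWinsMove follows the suggestion literally by picking the open square whose subtree contains the most wins for the player to move. Note that counting outcomes is not minimax: it does not assume the opponent replies well.

```javascript
const LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6],
               [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]];

function winnerOf(board) {
  for (const [a, b, c] of LINES) {
    if (board[a] !== null && board[a] === board[b] && board[a] === board[c]) {
      return board[a];
    }
  }
  return null;
}

// Counts every finished game reachable from `board`, with `player` to move.
function countOutcomes(board, player) {
  const w = winnerOf(board);
  if (w !== null) return { X: w === 'X' ? 1 : 0, O: w === 'O' ? 1 : 0, draw: 0 };
  if (!board.includes(null)) return { X: 0, O: 0, draw: 1 };
  const total = { X: 0, O: 0, draw: 0 };
  const next = player === 'X' ? 'O' : 'X';
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player; // try the move
    const sub = countOutcomes(board, next);
    board[i] = null; // undo it
    total.X += sub.X;
    total.O += sub.O;
    total.draw += sub.draw;
  }
  return total;
}

// Picks the open square whose subtree holds the most wins for `player`.
function mostWinsMove(board, player) {
  const next = player === 'X' ? 'O' : 'X';
  let best = -1;
  let bestWins = -1;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;
    const wins = countOutcomes(board, next)[player];
    board[i] = null;
    if (wins > bestWins) {
      best = i;
      bestWins = wins;
    }
  }
  return best;
}
```

From the empty board this enumerates all 255,168 possible games, which is small enough for plain recursion; mostWinsMove then picks the center.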
Optimizing this would be a waste of effort, really, though some easy optimizations might be:
Check first for possible wins for the other team and block the first one you find (if there are 2, the game's over anyway).
Always take the center if it's open (and the previous rule has no candidates).
Take corners ahead of sides (again, if the previous rules are empty).
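Those three rules translate almost directly into code. A minimal JavaScript sketch, assuming a 9-element row-major board array ('X', 'O' or null, so index 4 is the center); heuristicMove is a hypothetical name. Following the list literally, it does not look for the computer's own winning move.

```javascript
const LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6],
               [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]];
const CENTER = 4;
const CORNERS_THEN_SIDES = [0, 2, 6, 8, 1, 3, 5, 7];

// Returns the square `me` should take according to the three rules,
// or -1 if the board is full.
function heuristicMove(board, me) {
  const opponent = me === 'X' ? 'O' : 'X';
  // Rule 1: block the first line where the opponent has two marks and one blank.
  for (const line of LINES) {
    const cells = line.map((i) => board[i]);
    if (cells.filter((c) => c === opponent).length === 2 && cells.includes(null)) {
      return line[cells.indexOf(null)];
    }
  }
  // Rule 2: take the center if it is open.
  if (board[CENTER] === null) return CENTER;
  // Rule 3: corners ahead of sides.
  const square = CORNERS_THEN_SIDES.find((i) => board[i] === null);
  return square === undefined ? -1 : square;
}
```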

You can have the AI play against itself in some sample games to learn from. Use a supervised learning algorithm to help it along.

An attempt without using a play field, as a priority list:
To win (complete your double).
If not, not to lose (block the opponent's double).
If not, check whether you already have a fork (a double double).
If not, if the opponent has a fork:
  search the blocking points for a possible double and fork (ultimate win);
  if not, search the blocking points for forks (those which give the opponent the most losing possibilities);
  if not, just take a blocking point (not to lose).
If not, search for a double and fork (ultimate win).
If not, search only for forks, preferring those which give the opponent the most losing possibilities.
If not, search only for a double.
If not, it's a dead end: tie, play randomly.
If not (it means it's your first move):
  if it's the first move of the game, give the opponent the most losing possibilities (the algorithm results in only corners, which give the opponent 7 losing points), or, to break the boredom, just play randomly;
  if it's the second move of the game, find only the non-losing points (this gives a few more options), or find the points in this list with the best winning chance (this can be boring, because it results in only corners, adjacent corners or the center).
Note: When you have doubles and forks, check whether your double gives the opponent a double. If it does, check whether your new mandatory point (the square you are then forced to take to block) is included in your fork list.
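The "double" and "fork" notions the list relies on are easy to compute. The JavaScript sketch below assumes a 9-element row-major board array ('X', 'O' or null); doubles, forks and topRulesMove are hypothetical names. A double is read here as a line with two of a player's marks and one blank (so its blank square is a winning square), and a fork as a move that leaves the player with at least two such squares. topRulesMove covers only the top of the list (win, not lose, then take a fork square, a simplification of the later fork rules); the remaining branches would be layered on in the same way.

```javascript
const LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6],
               [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]];
const other = (p) => (p === 'X' ? 'O' : 'X');

// Blank squares that would complete a line in which `p` already has two marks.
function doubles(board, p) {
  const squares = new Set();
  for (const line of LINES) {
    const cells = line.map((i) => board[i]);
    if (cells.filter((c) => c === p).length === 2 && cells.includes(null)) {
      squares.add(line[cells.indexOf(null)]);
    }
  }
  return [...squares];
}

// Blank squares where a move by `p` creates two or more doubles at once.
function forks(board, p) {
  const squares = [];
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    const next = board.slice();
    next[i] = p;
    if (doubles(next, p).length >= 2) squares.push(i);
  }
  return squares;
}

// Top rules only: win, else don't lose, else take a fork square.
// Returns null when none of these rules applies.
function topRulesMove(board, me) {
  const win = doubles(board, me);
  if (win.length > 0) return win[0];
  const block = doubles(board, other(me));
  if (block.length > 0) return block[0];
  const fork = forks(board, me);
  return fork.length > 0 ? fork[0] : null;
}
```

For example, with X on two opposite corners (0 and 8) and O in the center, forks(board, 'X') returns [2, 6]: each of the two remaining corners creates two open lines at once.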

Rank each of the squares with numeric scores. If a square is taken, move on to the next choice (sorted in descending order by rank). You're going to need to choose a strategy (there are two main ones for going first and, I think, three for going second). Technically, you could just program all of the strategies and then choose one at random. That would make for a less predictable opponent.
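A sketch of the ranking idea in JavaScript, assuming a 9-element row-major board array (null = free). The two score tables in STRATEGIES are made-up examples, not the specific strategies meant above; rankedMove and pickStrategy are hypothetical names. Picking one table at random at the start of a game gives the less predictable opponent described.

```javascript
// Example score tables for squares 0..8 (row-major); higher score = preferred.
// These particular numbers are illustrative assumptions.
const STRATEGIES = [
  [3, 1, 3, 1, 4, 1, 3, 1, 3], // center first, then corners, then sides
  [4, 1, 3, 1, 2, 1, 3, 1, 4], // corners 0 and 8 first, other corners, then center
];

// Highest-ranked free square; -1 if the board is full.
// Array.prototype.sort is stable, so equal scores keep index order.
function rankedMove(board, scores) {
  const order = [...scores.keys()].sort((a, b) => scores[b] - scores[a]);
  const square = order.find((i) => board[i] === null);
  return square === undefined ? -1 : square;
}

// Choose one strategy at random (e.g. once per game).
function pickStrategy(random = Math.random) {
  return STRATEGIES[Math.floor(random() * STRATEGIES.length)];
}
```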

This answer assumes you already know how to implement the perfect algorithm for P1, and discusses how to win against ordinary human players, who make some mistakes more often than others.
The game of course should end in a draw if both players play optimally. At a human level, P1 playing in a corner produces wins far more often. For whatever psychological reason, P2 is baited into thinking that playing in the center is not that important, which is unfortunate for them, since it's the only response that does not create a winning game for P1.
If P2 does correctly block in the center, P1 should play the opposite corner, because again, for whatever psychological reason, P2 will prefer the symmetry of playing a corner, which again produces a losing board for them.
Whatever starting move P1 makes, P2 has a reply that creates a win for P1 if both players play optimally thereafter. In that sense P1 may play anywhere. The edge moves are weak in the sense that a large fraction of the possible responses to them (four of the eight, as many as for the center, against only one of eight for a corner) produce a draw, but there are still responses that create a win for P1.
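These counts can be checked by brute force. A JavaScript sketch, assuming a 9-element row-major board array ('X', 'O' or null, so 0 is a corner, 1 an edge and 4 the center); value and losingReplies are hypothetical names. value is a plain minimax score from P1's (X's) point of view, and losingReplies counts the P2 replies to a given opening after which P1 can force a win.

```javascript
const LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6],
               [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]];

function winnerOf(board) {
  for (const [a, b, c] of LINES) {
    if (board[a] !== null && board[a] === board[b] && board[a] === board[c]) {
      return board[a];
    }
  }
  return null;
}

// Minimax value with `player` to move: +1 if X (P1) can force a win,
// -1 if O (P2) can, 0 if best play leads to a draw.
function value(board, player) {
  const w = winnerOf(board);
  if (w !== null) return w === 'X' ? 1 : -1;
  if (!board.includes(null)) return 0;
  const next = player === 'X' ? 'O' : 'X';
  let best = player === 'X' ? -Infinity : Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;
    const v = value(board, next);
    board[i] = null;
    best = player === 'X' ? Math.max(best, v) : Math.min(best, v);
  }
  return best;
}

// Number of P2 replies to P1's opening square after which P1 can force a win.
function losingReplies(opening) {
  const board = new Array(9).fill(null);
  board[opening] = 'X';
  let count = 0;
  for (let reply = 0; reply < 9; reply++) {
    if (reply === opening) continue;
    board[reply] = 'O';
    if (value(board, 'X') === 1) count++;
    board[reply] = null;
  }
  return count;
}
```

For a corner opening, losingReplies(0) is 7: only the center reply avoids a loss.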
Empirically (more precisely, anecdotally) the best P1 starting moves seem to be first corner, second center, and last edge.
The next challenge you can add, in person or via a GUI, is not to display the board. A human can certainly remember the whole state, but the added challenge leads to a preference for symmetric boards, which take less effort to remember, leading to the mistake I outlined in the first branch.
I'm a lot of fun at parties, I know.

A tic-tac-toe adaptation of the minimax algorithm
let gameBoard = [
  [null, null, null],
  [null, null, null],
  [null, null, null]
]
const SYMBOLS = {
  X: 'X',
  O: 'O'
}
const RESULT = {
  INCOMPLETE: "incomplete",
  PLAYER_X_WON: SYMBOLS.X,
  PLAYER_O_WON: SYMBOLS.O,
  TIE: "tie"
}
We'll need a function that can check for the result. The function will check for a succession of chars. Whatever the state of the board is, the result is one of 4 options: incomplete, player X won, player O won, or a tie.
function checkSuccession(line) {
  if (line === SYMBOLS.X.repeat(3)) return SYMBOLS.X
  if (line === SYMBOLS.O.repeat(3)) return SYMBOLS.O
  return false
}
// Number of squares already taken
function moveCount(board) {
  return board.flat().filter(square => square !== null).length
}
function getResult(board) {
  // Nobody can have three in a row before the fifth move
  if (moveCount(board) < 5) {
    return RESULT.INCOMPLETE
  }
  const lines = []
  // first we check rows, then columns, then diagonals
  // (join('') turns null into '', so only 'XXX' or 'OOO' can match)
  for (let i = 0; i < 3; i++) {
    lines.push(board[i].join(''))
  }
  for (let j = 0; j < 3; j++) {
    const column = [board[0][j], board[1][j], board[2][j]]
    lines.push(column.join(''))
  }
  const diag1 = [board[0][0], board[1][1], board[2][2]]
  lines.push(diag1.join(''))
  const diag2 = [board[0][2], board[1][1], board[2][0]]
  lines.push(diag2.join(''))
  for (const line of lines) {
    const succession = checkSuccession(line)
    if (succession) {
      return succession
    }
  }
  // Check for a tie: full board and no winner
  if (moveCount(board) === 9) {
    return RESULT.TIE
  }
  return RESULT.INCOMPLETE
}
Our getBestMove function will receive the state of the board and the symbol of the player for whom we want to determine the best possible move. The function checks every possible move with the getResult function. If it is a win, it gets a score of 1; if it's a loss, it gets a score of -1; a tie gets a score of 0. If the result is undetermined, we call getBestMove with the new state of the board and the opposite symbol. Since the next move is the opponent's, their victory is the current player's loss, so the score is negated. In the end every possible move receives a score of 1, 0 or -1, so we can sort the moves and return the move with the highest score.
const copyBoard = (board) => board.map(
  row => row.map(square => square)
)
function getAvailableMoves(board) {
  const availableMoves = []
  for (let row = 0; row < 3; row++) {
    for (let column = 0; column < 3; column++) {
      if (board[row][column] === null) {
        availableMoves.push({ row, column })
      }
    }
  }
  return availableMoves
}
function applyMove(board, move, symbol) {
  board[move.row][move.column] = symbol
  return board
}
function getBestMove(board, symbol) {
  const availableMoves = getAvailableMoves(board)
  const availableMovesAndScores = []
  for (const move of availableMoves) {
    const newBoard = applyMove(copyBoard(board), move, symbol)
    const result = getResult(newBoard)
    let score
    if (result === RESULT.TIE) {
      score = 0
    } else if (result === symbol) {
      score = 1
    } else {
      // Game still incomplete: the opponent moves next, and their best score is our loss
      const otherSymbol = (symbol === SYMBOLS.X) ? SYMBOLS.O : SYMBOLS.X
      const nextMove = getBestMove(newBoard, otherSymbol)
      score = -nextMove.score
    }
    if (score === 1) // Performance optimization: nothing beats a win
      return { move, score }
    availableMovesAndScores.push({ move, score })
  }
  availableMovesAndScores.sort((moveA, moveB) => moveB.score - moveA.score)
  return availableMovesAndScores[0]
}

Develop Reference
