How does the minimax algorithm work on tic-tac-toe?

I've read a lot of documents regarding the minimax algorithm and its implementation for the game of tic-tac-toe, but I'm really having a hard time applying it.
Here are links that I've read: 1, 2, 3, 4, 5, and an example using Java: 6
Considering the pseudocode: 7
function minimax(node, depth, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if (maximizingPlayer is TRUE) {
        bestValue = +∞
        for each child of the node {
            val = minimax(child, depth-1, FALSE)
            bestValue = max(bestValue, val)
        }
        return bestValue
    } else if maximizingPlayer is FALSE {
        bestValue = -∞
        for each child of the node {
            val = minimax(child, depth-1, TRUE)
            bestValue = min(bestValue, val)
        }
        return bestValue
    }
Here are my questions:
1. What will I pass to the signature node, is it the valid moves for the current player?
2. What are + and - infinity? What are their possible values?
3. What is the relationship between minimax and the occupied cells on the board?
4. What are child nodes, and how are values extracted from them?
5. How will I determine the maximum allowed depth?

Node is the grid along with the information about which cells have already been played.
±∞ is just a placeholder; you can take int.MaxValue or any other "big"/"small" value in your programming language. It is used later as a comparison against the calculated value of a node, and 0 might already be a valid value. (Your pseudocode is wrong there: if we assign +∞ to bestValue, we want min(bestValue, val), and max(bestValue, val) for -∞.)
Not sure what you mean.
Child nodes are the possible moves that can be made. Once no move can be made, the board is evaluated and the score is returned.
The search space in TicTacToe is very small; in other games a heuristic score is returned, depending on whether the situation is favorable to the player or not. In your scenario there is no need for a hard depth limit.

First of all, stop asking 100 questions in one question. Ask one, try to understand and figure out whether you actually need to ask another.
Now to your questions:
What will I pass to the signature node, is it the valid moves for the
current player?
No, you will pass only node, depth, maximizingPlayer (who plays - min or max). Then, for each child of the node, you find the possible moves from that node. In a real scenario you will most probably have a generator like getChildren with which you get the children.
What are + and - infinity? What are their possible values?
The minimax algorithm just tries to find the maximum value over all minimum values, recursively. The worst possible result for the maximizing player is -infinity, and the worst case for the minimizing player is when the maximizing player gets +infinity. So these are just starting values.
What is the relationship between minimax and the occupied cells on the board?
Not sure what you mean here. Minimax is the algorithm; an occupied cell is an occupied cell. This question is similar to asking what the relation is between dynamic programming and a CSV file.
What are child nodes, and how are values extracted from them?
They are all the possible positions that can be reached from your current position. For example, in tic-tac-toe the children of a node are the boards obtained by placing the current player's mark in each free cell, as in the sketch below.
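A minimal sketch of child generation in Python, assuming a hypothetical board representation (a tuple of 9 cells holding 'X', 'O', or None - not something the answer prescribes):

def children(board, player):
    # Yield every board reachable by placing `player` on an empty cell.
    for i, cell in enumerate(board):
        if cell is None:
            child = list(board)
            child[i] = player
            yield tuple(child)

# Example: the empty board has nine children for 'X'.
empty = (None,) * 9
print(len(list(children(empty, 'X'))))  # prints 9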
How will I determine the maximum allowed depth?
In standard minimax you search until you reach the end of the tree, where your evaluation function gives you a reward. Because the tree is most of the time too huge to fit anywhere, people use a cutoff and approximate the evaluation function with heuristics. The easiest cutoff is to look n moves ahead and hope that your evaluation function is good enough and your opponent thinks n-1 moves ahead.
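Putting the answers together, here is a small depth-limited minimax sketch in Python for tic-tac-toe. It reuses the children generator from the sketch above; winner and evaluate are my own hypothetical helpers, and the initial values are the corrected ones (the maximizer takes the max over children, effectively starting from the smallest value, not +∞):

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    # Return 'X' or 'O' if a line is complete, else None.
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board):
    # Value from X's point of view: +1 win, -1 loss, 0 otherwise.
    w = winner(board)
    return 1 if w == 'X' else -1 if w == 'O' else 0

def minimax(board, depth, maximizing):
    # Terminal node or depth cutoff: return the heuristic value.
    if depth == 0 or winner(board) is not None or None not in board:
        return evaluate(board)
    if maximizing:  # X to move: take the max over all children
        return max(minimax(c, depth - 1, False) for c in children(board, 'X'))
    else:           # O to move: take the min over all children
        return min(minimax(c, depth - 1, True) for c in children(board, 'O'))

print(minimax((None,) * 9, 9, True))  # 0: perfect play from both sides is a draw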

Related

Understanding subtleties of dynamic programming approaches

I understand that there are mainly two approaches to dynamic programming solutions:
Fixed optimal order of evaluation (let's call it the Foo approach): the Foo approach usually goes from subproblems to bigger problems, using results obtained earlier for the subproblems to solve the bigger problems, thus avoiding "revisiting" subproblems. CLRS also seems to call this the "Bottom Up" approach.
Without a fixed optimal order of evaluation (let's call it the Non-Foo approach): in this approach, evaluation proceeds from problems to their subproblems. It ensures that subproblems are not "re-evaluated" (thus ensuring optimality) by maintaining the results of past evaluations in some data structure, and first checking whether the result of the problem at hand already exists in this data structure before starting its evaluation. CLRS seems to call this the "Top Down" approach.
This is what is roughly conveyed as one of the main points by this answer.
I have the following doubts:
Q1. Memoization or not?
CLRS uses the terms "top down with memoization" approach and "bottom up" approach. I feel both approaches require memory to cache the results of subproblems. But then why does CLRS use the term "memoization" only for the top down approach and not for the bottom up approach? After solving some problems with DP, I feel that solutions by the top down approach require memory to cache the results of "all" subproblems. However, that is not the case with the bottom up approach: solutions by the bottom up approach for some problems do not need to cache the results of "all" subproblems. Q1. Am I correct with this?
For example consider this problem:
Given cost[i] being the cost of ith step on a staircase, give the minimum cost of reaching the top of the floor if:
you can climb either one or two steps
you can start from the step with index 0, or the step with index 1
The top down approach solution is as follows:
class Solution:
    def minCostAux(self, curStep, cost):
        if self.minCosts[curStep] > -1:
            return self.minCosts[curStep]
        if curStep == -1:
            return 0
        elif curStep == 0:
            self.minCosts[curStep] = cost[0]
        else:
            self.minCosts[curStep] = min(self.minCostAux(curStep-2, cost) + cost[curStep],
                                         self.minCostAux(curStep-1, cost) + cost[curStep])
        return self.minCosts[curStep]

    def minCostClimbingStairs(self, cost) -> int:
        cost.append(0)
        self.minCosts = [-1] * len(cost)
        return self.minCostAux(len(cost)-1, cost)
The bottom up approach solution is as follows:
class Solution:
    def minCostClimbingStairs(self, cost) -> int:
        cost.append(0)
        secondLastMinCost = cost[0]
        lastMinCost = min(cost[0]+cost[1], cost[1])
        minCost = lastMinCost
        for i in range(2, len(cost)):
            minCost = min(lastMinCost, secondLastMinCost) + cost[i]
            secondLastMinCost = lastMinCost
            lastMinCost = minCost
        return minCost
Note that the top down approach caches the results of all steps in self.minCosts, while the bottom up approach caches the results of only the last two steps in the variables lastMinCost and secondLastMinCost.
Q2. Do all problems have solutions by both approaches?
I feel not all do. I came to this opinion after solving this problem:
Find the probability that the knight will not go out of an n x n chessboard after k moves, if the knight initially starts in the cell at index (row, column).
I feel the only way to solve this problem is to find successive probabilities for an increasing number of steps starting from cell (row, column): that is, the probability that the knight will not go out of the chessboard after step 1, then after step 2, then after step 3, and so on. This is the bottom up approach. We cannot do it top down; for example, we cannot start with the kth step and go to the (k-1)th step, then the (k-2)th step and so on, because:
We cannot know which cells will be reached at the kth step to start with
We cannot ensure that all paths from the kth step will lead to the initial knight cell position (row, column).
Even one of the top-voted answers gives a DP solution as follows:
class Solution {
    private int[][] dir = new int[][]{{-2,-1},{-1,-2},{1,-2},{2,-1},{2,1},{1,2},{-1,2},{-2,1}};
    private double[][][] dp;

    public double knightProbability(int N, int K, int r, int c) {
        dp = new double[N][N][K + 1];
        return find(N, K, r, c);
    }

    public double find(int N, int K, int r, int c) {
        if (r < 0 || r > N - 1 || c < 0 || c > N - 1) return 0;
        if (K == 0) return 1;
        if (dp[r][c][K] != 0) return dp[r][c][K];
        double rate = 0;
        for (int i = 0; i < dir.length; i++)
            rate += 0.125 * find(N, K - 1, r + dir[i][0], c + dir[i][1]);
        dp[r][c][K] = rate;
        return rate;
    }
}
I feel this is still a bottom up approach, since it starts from the initial knight cell position (r, c) (and hence goes from the 0th / no step to the Kth step) despite the fact that it counts K downwards to 0. So this is the bottom up approach done recursively, not the top down approach. To be precise, this solution does NOT first find:
the probability of the knight not going out of the chessboard after K steps starting at cell (r,c)
and then find:
the probability of the knight not going out of the chessboard after K-1 steps starting at cell (r,c)
but it finds them in reverse / bottom up order: first for K-1 steps and then for K steps.
Also, I did not find any solution among the top-voted leetcode discussions doing it in a truly top down manner, starting from the Kth step down to the 0th step and ending at the (row, column) cell, instead of starting with the (row, column) cell.
Similarly, we cannot solve the following problem with the bottom up approach but only with the top down approach:
Find the probability that the knight ends up in the cell at index (row, column) after K steps, starting at any initial cell.
Q2. So am I correct in my understanding that not all problems have solutions by both top down and bottom up approaches? Or am I just overthinking unnecessarily, and both of the above problems can indeed be solved with both top down and bottom up approaches?
PS: I indeed seem to have done some overthinking here: the knightProbability() function above is indeed top down, and I misinterpreted it as explained in detail above 😑. I have kept this explanation for reference, since there are already some answers below, and also as a hint of how confusion / misinterpretation might happen, so that I will be more cautious in the future. Sorry if this long explanation caused you some confusion / frustration. Regardless, the main question still holds: does every problem have bottom up and top down solutions?
Q3. Bottom up approach recursively?
I am pondering whether bottom up solutions for all problems can also be implemented recursively. After trying to do so for other problems, I came to the following conclusion:
We can implement bottom up solutions for such problems recursively, only the recursion won't be meaningful, but kind of hacky.
For example, below is a recursive bottom up solution for the minimum cost climbing stairs problem mentioned in Q1:
from typing import List

class Solution:
    def minCostAux(self, step_i, cost):
        if self.minCosts[step_i] != -1:
            return self.minCosts[step_i]
        self.minCosts[step_i] = min(self.minCostAux(step_i-1, cost),
                                    self.minCostAux(step_i-2, cost)) + cost[step_i]
        if step_i == len(cost)-1:  # returning from a non-base case gives a sense of
                                   # not-so-meaningful recursion.
                                   # Also, base cases usually appear at the
                                   # beginning, before the recursive call.
                                   # Or should we call it a "ceil condition"?
            return self.minCosts[step_i]
        return self.minCostAux(step_i+1, cost)

    def minCostClimbingStairs(self, cost: List[int]) -> int:
        cost.append(0)
        self.minCosts = [-1] * len(cost)
        self.minCosts[0] = cost[0]
        self.minCosts[1] = min(cost[0]+cost[1], cost[1])
        return self.minCostAux(2, cost)
Is my quoted understanding correct?
First, context.
Every dynamic programming problem can be solved without dynamic programming using a recursive function. Generally this will take exponential time, but you can always do it. At least in principle. If the problem can't be written that way, then it really isn't a dynamic programming problem.
The idea of dynamic programming is that if I already did a calculation and have a saved result, I can just use that saved result instead of doing the calculation again.
The whole top-down vs bottom-up distinction refers to the naive recursive solution.
In a top-down approach your call stack looks like the naive version except that you make a "memo" of what the recursive result would have given. And then the next time you short-circuit the call and return the memo. This means you can always, always, always solve dynamic programming problems top down. There is always a solution that looks like recursion+memoization. And that solution by definition is top down.
In a bottom up approach you start with what some of the bottom levels would have been and build up from there. Because you know the structure of the data very clearly, frequently you are able to know when you are done with data and can throw it away, saving memory. Occasionally you can filter data on non-obvious conditions that are hard for memoization to duplicate, making bottom up faster as well. For a concrete example of the latter, see Sorting largest amounts to fit total delay.
Let's start with your summary.
I strongly disagree with your thinking about the distinction in terms of the optimal order of evaluations. I've encountered many cases with top down where optimizing the order of evaluations will cause memoization to start hitting sooner, making the code run faster. Conversely, while bottom up certainly picks a convenient order of operations, it is not always optimal.
Now to your questions.
Q1: Correct. Bottom up often knows when it is done with data, top down does not. Therefore bottom up gives you the opportunity to delete data when you are done with it. And you gave an example where this happens.
As for why only one is called memoization, it is because memoization is a specific technique for optimizing a function, and you get top down by memoizing recursion. While the data stored in dynamic programming may match up to specific memos in memoization, you aren't using the memoization technique.
Q2: I do not know.
I've personally found cases where I was solving a problem over some complex data structure and simply couldn't find a bottom up approach. Maybe I simply wasn't clever enough, but I don't believe that a bottom up approach always exists to be found.
But top down is always possible. Here is how to do it in Python for the example that you gave.
First the naive recursive solution looks like this:
def prob_in_board(n, i, j, k):
    if i < 0 or j < 0 or n <= i or n <= j:
        return 0
    elif k <= 0:
        return 1
    else:
        moves = [
            (i+1, j+2), (i+1, j-2),
            (i-1, j+2), (i-1, j-2),
            (i+2, j+1), (i+2, j-1),
            (i-2, j+1), (i-2, j-1),
        ]
        answer = 0
        for next_i, next_j in moves:
            answer += prob_in_board(n, next_i, next_j, k-1) / len(moves)
        return answer

print(prob_in_board(8, 3, 4, 7))
And now we just memoize.
def prob_in_board_memoized(n, i, j, k, cache=None):
    if cache is None:
        cache = {}
    if i < 0 or j < 0 or n <= i or n <= j:
        return 0
    elif k <= 0:
        return 1
    elif (i, j, k) not in cache:
        moves = [
            (i+1, j+2), (i+1, j-2),
            (i-1, j+2), (i-1, j-2),
            (i+2, j+1), (i+2, j-1),
            (i-2, j+1), (i-2, j-1),
        ]
        answer = 0
        for next_i, next_j in moves:
            answer += prob_in_board_memoized(n, next_i, next_j, k-1, cache) / len(moves)
        cache[(i, j, k)] = answer
    return cache[(i, j, k)]

print(prob_in_board_memoized(8, 3, 4, 7))
This solution is top down. If it seems otherwise to you, then you do not correctly understand what is meant by top-down.
I found your question (does every dynamic programming problem have bottom up and top down solutions?) very interesting. That's why I'm adding another answer to continue the discussion about it.
To answer the question in its generic form, I need to formulate it more precisely with math. First, I need to define precisely what a dynamic programming problem is. Then, I need to define precisely what a bottom up solution and a top down solution are.
I will try to give some definitions, but I think they are not the most generic ones. I think a really generic definition would need heavier math.
First, define a state space S of dimension d as a subset of Z^d (Z represents the set of integers).
Let f: S -> R be a function that we are interested in calculating at a given point P of the state space S (R represents the set of real numbers).
Let t: S -> S^k be a transition function (it associates points in the state space with sets of points in the state space).
Consider the problem of calculating f at a point P in S.
We can consider it a dynamic programming problem if there is a function g: R^k -> R such that f(P) = g(f(t(P)[0]), f(t(P)[1]), ..., f(t(P)[k-1])) (a problem can be solved only by using its subproblems) and t defines a directed graph that is not a tree (subproblems have some overlap).
Consider the graph defined by t. We know it has a source (the point P) and some sinks for which we know the value of f (the base cases). We can define a top down solution for the problem as a depth first search through this graph that starts in the source and calculate f for each vertex at its return time (when the depth first search of all its sub graph is completed) using the transition function. On the other hand, a bottom up solution for the problem can be defined as a multi source breadth first search through the transposed graph that starts in the sinks and finishes in the source vertex, calculating f at each visited vertex using the previous visited layer.
The problem is: to navigate through the transposed graph, for each point you visit you need to know which points transition to this point in the original graph. In math terms, for each point Q in the transition graph, you need to know the set J of points such that for each point Pi in J, t(Pi) contains Q, and there is no other point Pr in the state space outside of J such that t(Pr) contains Q. Notice that a trivial way to know this is to visit the whole state space for each point Q.
The conclusion is that a bottom up solution as defined here always exists, but it only pays off if you have a way to navigate through the transposed graph at least as efficiently as through the original graph. This depends essentially on the properties of the transition function.
In particular, for the leetcode problem you mentioned, the transition function is the function that, for each point on the chessboard, gives all the points the knight can go to. A very special property of this function is that it's symmetric: if the knight can go from A to B, then it can also go from B to A. So, given a certain point P, you can know which points the knight can go to as efficiently as you can know which points the knight could have come from. This is the property that guarantees there exists a bottom up approach as efficient as the top down approach for this problem.
For the leetcode question you mentioned, the top down approach is like the following:
Let P(x, y, k) be the probability that the knight is at square (x, y) at the k-th step. Look at all squares the knight could have come from (you can get them in O(1): just look at the board with pen and paper and derive the formulas for the different cases, like a knight in the corner, a knight on the border, a knight in a central region, etc.). Let them be (x1, y1), ..., (xj, yj). For each of these squares, what is the probability that the knight jumps to (x, y)? Considering that it can go off the board, it's always 1/8. So:
P(x, y, k) = (P(x1, y1, k-1) + ... + P(xj, yj, k-1))/8
The base case is k = 0:
P(x, y, 0) = 1 if (x, y) = (x_start, y_start), and P(x, y, 0) = 0 otherwise.
You iterate through all n^2 squares and use the recurrence formula to calculate P(x, y, k). Many times you will need solutions you have already calculated for k-1, so you can benefit a lot from memoization.
In the end, the final solution will be the sum of P(x, y, k) over all squares of the board.
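For illustration, here is a short memoized Python sketch of that recurrence. It exploits the symmetry of knight moves noted above (the on-board squares reachable from (x, y) are exactly the squares the knight could have come from); the function name and parameters are mine, not leetcode's:

from functools import lru_cache

KNIGHT = [(-2,-1), (-1,-2), (1,-2), (2,-1), (2,1), (1,2), (-1,2), (-2,1)]

def knight_stay_probability(n, K, row, col):
    # Probability that the knight is still on the n x n board after K steps.
    @lru_cache(maxsize=None)
    def P(x, y, k):
        # P(x, y, k): probability that the knight sits on (x, y) at step k.
        if k == 0:
            return 1.0 if (x, y) == (row, col) else 0.0
        return sum(P(x + dx, y + dy, k - 1)
                   for dx, dy in KNIGHT
                   if 0 <= x + dx < n and 0 <= y + dy < n) / 8
    # The final answer is the sum of P(x, y, K) over all squares.
    return sum(P(x, y, K) for x in range(n) for y in range(n))

print(knight_stay_probability(8, 3, 3, 4))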

Dynamic Shortest Path

Your friends are planning an expedition to a small town deep in the Canadian north next winter break. They've researched all the travel options and have drawn up a directed graph whose nodes represent intermediate destinations and edges represent the roads between them.
In the course of this, they've also learned that extreme weather causes roads in this part of the world to become quite slow in the winter and may cause large travel delays. They've found an excellent travel Web site that can accurately predict how fast they'll be able to travel along the roads; however, the speed of travel depends on the time of the year. More precisely, the Web site answers queries of the following form: given an edge e = (u, v) connecting two sites u and v, and given a proposed starting time t from location u, the site will return a value fe(t), the predicted arrival time at v. The Web site guarantees that
fe(t) > t for every edge e and every time t (you can't travel backward in time), and that fe(t) is a monotone increasing function of t (that is, you do not arrive earlier by starting later). Other than that, the functions fe may be arbitrary. For example, in areas where the travel time does not vary with the season, we would have fe(t) = t + ℓe, where ℓe is the time needed to travel from the beginning to the end of the edge e.
Your friends want to use the Web site to determine the fastest way to travel through the
directed graph from their starting point to their intended destination. (You should assume
that they start at time 0 and that all predictions made by the Web site are completely
correct.) Give a polynomial-time algorithm to do this, where we treat a single query to
the Web site (based on a specific edge e and a time t) as taking a single computational step.
import math
import random

# n, distance, and path are assumed to be defined globally.
def updatepath(node):
    randomvalue = random.randint(0, 3)
    print(node, "to other node:", randomvalue)
    for i in range(0, n):
        distance[node][i] = distance[node][i] + randomvalue

def minDistance(dist, flag_array, n):
    min_value = math.inf
    for i in range(0, n):
        if dist[i] < min_value and flag_array[i] == False:
            min_value = dist[i]
            min_index = i
    return min_index

def shortest_path(graph, src, n):
    dist = [math.inf] * n
    flag_array = [False] * n
    dist[src] = 0
    for cout in range(n):
        # find the node index that has min cost
        u = minDistance(dist, flag_array, n)
        flag_array[u] = True
        updatepath(u)
        for i in range(n):
            if graph[u][i] > 0 and flag_array[i] == False and dist[i] > dist[u] + graph[u][i]:
                dist[i] = dist[u] + graph[u][i]
                path[i] = u
    return dist
I applied Dijkstra's algorithm, but it is not correct. What would I change in my algorithm to make it work with dynamically changing edges?
Well, the key point is that the function is monotonically increasing. There is an algorithm which exploits this property, and it is called A*.
Accumulated cost: your prof wants you to use two costs. One is the accumulated cost (this is simply the cost from the previous node added to the cost/time needed to move to the next node).
Heuristic cost: this is some predicted cost.
A plain Dijkstra approach would not work, because you are working with both a heuristic/predicted cost and an accumulated cost.
Monotonically increasing (consistent) means h(A) <= c(A, B) + h(B). It simply says that if you move from node A to node B, then the estimated total cost (heuristic + accumulated) must not decrease. If this property holds, then the first path which A* chooses to the goal is the best one, and it never needs to backtrack.
Note: the power of this algorithm is based entirely on how well you predict the value.
If you underestimate the value, that will be corrected by the accumulated value, but if you overestimate it, A* may choose the wrong path.
Algorithm:
Create a min priority queue.
Insert the initial city into pq.
while (!pq.isEmpty() && !goalFound)
    Node min = pq.delMin()  // returns the city whose distance
                            // (heuristic + accumulated) is minimal
    put all successors of min in pq  // all cities which you can reach; you can
                                     // also keep a list of visited cities so the
                                     // queue stays efficient by not placing the
                                     // same element twice
Keep doing this, and at the end you will either reach the goal or your queue will be empty.
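A rough Python sketch of that loop follows. All names here (successors, fe, heuristic) are hypothetical placeholders; fe(u, v, t) stands for one query to the Web site, and with heuristic returning 0 everywhere this degenerates into a Dijkstra-style search on the time-dependent edges:

import heapq

def earliest_arrival(start, goal, successors, fe, heuristic):
    # Best-first search: fe(u, v, t) returns the arrival time at v when
    # leaving u at time t; heuristic(v) is the predicted remaining cost.
    pq = [(heuristic(start), 0, start)]  # (accumulated + heuristic, time, node)
    visited = set()
    while pq:
        _, t, u = heapq.heappop(pq)
        if u == goal:
            return t
        if u in visited:
            continue
        visited.add(u)
        for v in successors(u):
            arrival = fe(u, v, t)  # one Web-site query
            heapq.heappush(pq, (arrival + heuristic(v), arrival, v))
    return None  # goal unreachable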
Extra
Here I implemented an 8-puzzle solver using A*; it can give you an idea about how costs are defined and how it works.
private void solve(MinPQ<Node> pq, HashSet<Node> closedList) {
    while (!(pq.min().getBoad().isGoal(pq.min().getBoad()))) {
        Node e = pq.delMin();
        closedList.add(e);
        for (Board boards : e.getBoad().neighbors()) {
            Node nextNode = new Node(boards, e, e.getMoves() + 1);
            if (!equalToPreviousNode(nextNode, e.getPreviousNode()))
                pq.insert(nextNode);
        }
    }
    Node collection = pq.delMin();
    while (!(collection.getPreviousNode() == null)) {
        this.getB().add(collection.getBoad());
        collection = collection.getPreviousNode();
    }
    this.getB().add(collection.getBoad());
    System.out.println(pq.size());
}
A link to full code is here.

How to adapt Fenwick tree to answer range minimum queries

A Fenwick tree is a data structure that gives an efficient way to answer two main queries:
add a value to a particular index of an array: update(index, value)
find the sum of elements from 1 to N: find(n)
Both operations are done in O(log(n)) time, and I understand the logic and implementation. It is not hard to implement a bunch of other operations, like finding the sum from N to M.
I wanted to understand how to adapt a Fenwick tree for RMQ (range minimum queries). It is obvious how to change the Fenwick tree for the first two operations, but I am failing to figure out how to find the minimum on the range from N to M.
After searching for solutions, the majority of people think that this is not possible, and a small minority claims that it actually can be done (approach1, approach2).
The first approach (written in Russian; based on my Google translation it has zero explanation and only two functions) relies on three arrays (initial, left and right) and, upon my testing, was not working correctly for all possible test cases.
The second approach requires only one array, allegedly runs in O(log^2(n)), and also has close to no explanation of why and how it should work. I have not tried to test it.
In light of these controversial claims, I wanted to find out whether it is possible to augment a Fenwick tree to answer update(index, value) and findMin(from, to).
If it is possible, I would be happy to hear how it works.
Yes, you can adapt Fenwick Trees (Binary Indexed Trees) to
Update value at a given index in O(log n)
Query minimum value for a range in O(log n) (amortized)
We need 2 Fenwick trees and an additional array holding the real values for nodes.
Suppose we have the following array:
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0
We wave a magic wand and the following trees appear:
Note that in both trees each node represents the minimum value for all nodes within that subtree. For example, in BIT2 node 12 has value 0, which is the minimum value for nodes 12,13,14,15.
Queries
We can efficiently query the minimum value for any range by calculating the minimum of several subtree values and one additional real node value. For example, the minimum value for the range [2,7] can be determined by taking the minimum of BIT2_Node2 (representing nodes 2,3), BIT1_Node7 (representing node 7), BIT1_Node6 (representing nodes 5,6) and REAL_4 - therefore covering all nodes in [2,7]. But how do we know which subtrees we want to look at?
Query(int a, int b) {
    int val = infinity // always holds the known min value for our range
    // Start traversing the first tree, BIT1, from the beginning of range, a
    int i = a
    while (parentOf(i, BIT1) <= b) {
        val = min(val, BIT2[i]) // Note: traversing BIT1, yet looking up values in BIT2
        i = parentOf(i, BIT1)
    }
    // Start traversing the second tree, BIT2, from the end of range, b
    i = b
    while (parentOf(i, BIT2) >= a) {
        val = min(val, BIT1[i]) // Note: traversing BIT2, yet looking up values in BIT1
        i = parentOf(i, BIT2)
    }
    val = min(val, REAL[i]) // Explained below
    return val
}
It can be mathematically proven that both traversals will end in the same node. That node is a part of our range, yet it is not a part of any subtrees we have looked at. Imagine a case where the (unique) smallest value of our range is in that special node. If we didn't look it up our algorithm would give incorrect results. This is why we have to do that one lookup into the real values array.
To help understand the algorithm I suggest you simulate it with pen & paper, looking up data in the example trees above. For example, a query for range [4,14] would return the minimum of values BIT2_4 (rep. 4,5,6,7), BIT1_14 (rep. 13,14), BIT1_12 (rep. 9,10,11,12) and REAL_8, therefore covering all possible values [4,14].
Updates
Since a node represents the minimum value of itself and its children, changing a node will affect its parents, but not its children. Therefore, to update a tree we start from the node we are modifying and move up all the way to the fictional root node (0 or N+1 depending on which tree).
Suppose we are updating some node in some tree:
If new value < old value, we will always overwrite the value and move up
If new value == old value, we can stop since there will be no more changes cascading upwards
If new value > old value, things get interesting.
If the old value still exists somewhere within that subtree, we are done
If not, we have to find the new minimum value between real[node] and each tree[child_of_node], change tree[node] and move up
Pseudocode for updating node with value v in a tree:
while (node <= n+1) {
    if (v > tree[node]) {
        if (oldValue == tree[node]) {
            v = min(v, real[node])
            for-each child {
                v = min(v, tree[child])
            }
        } else break
    }
    if (v == tree[node]) break
    tree[node] = v
    node = parentOf(node, tree)
}
Note that oldValue is the original value we replaced, whereas v may be reassigned multiple times as we move up the tree.
Binary Indexing
In my experiments, Range Minimum Queries were about twice as fast as a Segment Tree implementation, and updates were marginally faster. The main reason for this is using super efficient bitwise operations for moving between nodes. They are very well explained here. Segment Trees are really simple to code, though, so think about whether the performance advantage is really worth it. The update method of my Fenwick RMQ is 40 lines and took a while to debug. If anyone wants my code I can put it on GitHub. I also produced a brute-force solution and test generators to make sure everything works.
I had help understanding this subject & implementing it from the Finnish algorithm community. Source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf, but they credit Fenwick's 1994 paper for it.
The Fenwick tree structure works for addition because addition is invertible. It doesn't work for minimum, because as soon as you have a cell that's supposed to be the minimum of two or more inputs, you've potentially lost information.
If you're willing to double your storage requirements, you can support RMQ with a segment tree that is constructed implicitly, like a binary heap. For an RMQ with n values, store the n values at locations [n, 2n) of an array. Locations [1, n) are aggregates, with the formula A(k) = min(A(2k), A(2k+1)). Location 2n is an infinite sentinel. The update routine should look something like this.
def update(n, a, i, x): # value[i] = x
    i += n
    a[i] = x
    # update the aggregates
    while i > 1:
        i //= 2
        a[i] = min(a[2*i], a[2*i+1])
The multiplies and divides here can be replaced by shifts for efficiency.
The RMQ pseudocode is more delicate. Here's another untested and unoptimized routine.
from math import inf

def rmq(n, a, i, j): # min(value[i:j])
    i += n
    j += n
    x = inf
    while i < j:
        if i%2 == 0:
            i //= 2
        else:
            x = min(x, a[i])
            i = i//2 + 1
        if j%2 == 0:
            j //= 2
        else:
            x = min(x, a[j-1])
            j //= 2
    return x
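A quick usage sketch with the update and rmq routines above, under the layout just described (the n values at locations [n, 2n), aggregates at [1, n), sentinel at 2n); the build loop is my addition:

from math import inf

values = [1, 0, 2, 1, 1, 3, 0, 4]
n = len(values)

# Build: leaves at [n, 2n), sentinel at 2n, aggregates filled bottom-up.
a = [inf] * (2*n + 1)
a[n:2*n] = values
for k in range(n - 1, 0, -1):
    a[k] = min(a[2*k], a[2*k+1])

print(rmq(n, a, 2, 7))  # min(values[2:7]) -> 0
update(n, a, 6, 5)      # value[6] = 5
print(rmq(n, a, 2, 7))  # now -> 1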

How exactly to use "History Heuristic" in alpha-beta minimax?

I'm making an AI for a chess game.
So far, I've successfully implemented the Alpha-Beta Pruning Minimax algorithm, which looks like this (from Wikipedia):
(* Initial call *)
alphabeta(origin, depth, -∞, +∞, TRUE)

function alphabeta(node, depth, α, β, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if maximizingPlayer
        for each child of node
            α := max(α, alphabeta(child, depth - 1, α, β, FALSE))
            if β ≤ α
                break (* β cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth - 1, α, β, TRUE))
            if β ≤ α
                break (* α cut-off *)
        return β
Since this still takes too much time (it goes through all of the tree, node by node), I came across something called the "History Heuristic".
The Algorithm from the original paper:
int AlphaBeta(pos, d, alpha, beta)
{
    if (d=0 || game is over)
        return Eval(pos);        // evaluate leaf position from current player's standpoint

    score = -INFINITY;           // preset return value

    moves = Generate(pos);       // generate successor moves

    for i=1 to sizeof(moves) do  // rating all moves
        rating[i] = HistoryTable[ moves[i] ];

    Sort(moves, rating);         // sorting moves according to their history scores

    for i=1 to sizeof(moves) do {             // look over all moves
        Make(moves[i]);                       // execute current move
        cur = -AlphaBeta(pos, d-1, -beta, -alpha);  // call other player
        if (cur > score) {
            score = cur;
            bestMove = moves[i];              // update best move if necessary
        }
        if (score > alpha) alpha = score;     // adjust the search window
        Undo(moves[i]);                       // retract current move
        if (alpha >= beta) goto done;         // cut off
    }

done:
    // update history score
    HistoryTable[bestMove] = HistoryTable[bestMove] + Weight(d);
    return score;
}
So basically, the idea is to keep a hashtable or dictionary of previous "moves".
Now I'm confused about what "move" means here.
I'm not sure if it literally refers to a single move or to the overall state after each move.
In chess, for example, what should the "key" for this hashtable be?
1. Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
2. The overall state of the chessboard after individual moves?
If 1 is the case, I guess the positions of other pieces are not taken into account when recording the "move" into my History table?
I think the original paper (The History Heuristic and Alpha-Beta Search Enhancements in Practice, Jonathan Schaeffer), which is available online, answers the question clearly. In the paper, the author defined a move as the two indices (from square and to square) on the chess board, using a 64x64 table (in effect, I think he used bit shifting and a single-index array) to contain the move history.
The author compared all the available means of move ordering and determined that hh was the best. If current best practice has established an improved form of move ordering (beyond hh + transposition table), I would also like to know what it is.
You can use a transposition table so you avoid evaluating the same board multiple times. "Transposition" means you can reach the same board state by performing moves in different orders. A naive example:
1. e4 e5 2. Nf3 Nc6
1. e4 Nc6 2. Nf3 e5
These plays result in the same position but were reached differently.
http://en.wikipedia.org/wiki/Transposition_table
A common method to hash a chess position is called Zobrist hashing:
http://en.wikipedia.org/wiki/Zobrist_hashing
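To sketch the idea in Python (the piece encoding and board layout here are my assumptions, not part of the linked articles):

import random

random.seed(0)
# One random 64-bit key per (piece, square): 12 piece types, 64 squares.
ZOBRIST = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]
SIDE_TO_MOVE = random.getrandbits(64)

def zobrist_hash(board, white_to_move):
    # `board` maps square index (0..63) -> piece index (0..11).
    h = SIDE_TO_MOVE if white_to_move else 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h

The useful property is that the hash can be maintained incrementally: moving a piece XORs out its key on the old square and XORs in its key on the new square, so each move updates the hash in O(1).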
From my experience, the history heuristic produces negligible benefits compared to other techniques, and is not worthwhile for a basic search routine. It is not the same thing as using a transposition table. If the latter is what you want to implement, I'd still advise against it. There are many other techniques that will produce good results for far less effort. In fact, an efficient and correct transposition table is one of the most difficult parts of a chess engine to code.
First try pruning and move ordering heuristics, most of which are one to a few lines of code. I've detailed such techniques in this post, which also gives estimates of the performance gains you can expect.
In chess, for example, what should be the "key" for this hashtable be?
Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
Or the overall state of the chessboard after individual moves?
The key is an individual move and the positions of other pieces aren't taken into account when recording the "move" into the history table.
The traditional form of the history table (also called a butterfly board) is something like:
score history_table[side_to_move][from_square][to_square];
For instance, if the move e2-e4 produces a cutoff, the element:
history_table[white][e2][e4]
is (somehow) incremented (irrespective of the position in which the move was made).
As in the example code, the history heuristic uses those counters for move ordering. Other heuristics can take advantage of history tables too (e.g. late move reductions).
Consider that:
usually the history heuristic isn't applied to plain Alpha-Beta with no knowledge of move ordering (in chess, only "quiet" moves are ordered via the history heuristic);
there are alternative forms for the history table (often used is history_table[piece][to_square]).
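To make the butterfly-board scheme concrete, here is a minimal Python sketch; the depth*depth weight is a common choice, not something prescribed above:

# Indexed as [side_to_move][from_square][to_square], squares 0..63.
history_table = [[[0] * 64 for _ in range(64)] for _ in range(2)]

def record_cutoff(side, move, depth):
    # Called when `move` produces a beta cut-off at search depth `depth`.
    frm, to = move
    history_table[side][frm][to] += depth * depth  # deeper cutoffs weigh more

def order_moves(side, moves):
    # Search historically good moves first.
    return sorted(moves, key=lambda m: history_table[side][m[0]][m[1]], reverse=True)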

Grundy's game extended to more than two heaps

How can I break a heap into two heaps in Grundy's game?
What about breaking a heap into any number of heaps (no two of them being equal)?
Games of this type are analyzed in great detail in the book series "Winning Ways for your Mathematical Plays". Most of the things you are looking for are probably in volume 1.
You can also take a look at these links: Nimbers (Wikipedia), Sprague-Grundy theorem (Wikipedia) or do a search for "combinatorial game theory".
My knowledge of this is quite rusty, so I'm afraid I can't help you myself with this specific problem. My apologies if you were already aware of everything I linked.
Edit: In general, the method for solving these types of games is to "build up" stack sizes. So start with a stack of 1 and decide who wins with optimal play. Then do the same for a stack of 2, which can be split into 1 & 1. Then move on to 3, which can be split into 1 & 2. Same for 4 (here it gets trickier): 3 & 1 or 2 & 2; using the Sprague-Grundy theorem and the algebraic rules for nimbers, you can calculate who will win. Keep going until you reach the stack size for which you need to know the answer.
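As a sketch of that build-up in Python, assuming the classic rule (split one heap into two unequal non-empty heaps): by the Sprague-Grundy theorem, a heap's nimber is the mex over the XORs of the nimbers of the two resulting heaps, and a multi-heap position is lost for the player to move exactly when the XOR of its heaps' nimbers is 0.

from functools import lru_cache

def mex(s):
    # Minimum excludant: the smallest non-negative integer not in s.
    m = 0
    while m in s:
        m += 1
    return m

@lru_cache(maxsize=None)
def grundy(n):
    # Nimber of a single heap of size n (heaps of size 1 and 2 allow no move).
    options = set()
    for left in range(1, (n - 1) // 2 + 1):  # left < right guarantees inequality
        options.add(grundy(left) ^ grundy(n - left))
    return mex(options)

print([grundy(n) for n in range(1, 13)])  # [0, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1]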
Edit 2: The website I was talking about in the comments seems to be down. Here is a link to a backup of it: Wayback Machine - Introduction to Combinatorial Games.
Grundy's Game, and many games like it, can be solved with an algorithm like this:
// returns a Move object representing the current player's optimal move,
// or null if the player has no chance of winning
function bestMove(GameState g){
    for each (move in g.possibleMoves()){
        nextState = g.applyMove(move)
        if (bestMove(nextState) == null){
            // the next player's best move is null, so if we take this move,
            // he has no chance of winning. This is good for us!
            return move;
        }
    }
    // none of our possible moves led to a winning strategy.
    // We have no chance of winning. This is bad for us :-(
    return null;
}
Implementations of GameState and Move depend on the game. For Grundy's game, both are simple.
GameState stores a list of integers, representing the size of each heap in the game.
Move stores an initialHeapSize integer, and a resultingHeapSizes list of integers.
GameState::possibleMoves iterates through its heap size list, and determines the legal divisions for each one.
GameState::applyMove(Move) returns a copy of the GameState, except the move given to it is applied to the board.
GameState::possibleMoves can be implemented for "classic" Grundy's Game like so:
function possibleMoves(GameState g){
    moves = []
    for each (heapSize in g.heapSizes){
        for each (resultingHeaps in possibleDivisions(heapSize)){
            Move m = new Move(heapSize, resultingHeaps)
            moves.append(m)
        }
    }
    return moves
}

function possibleDivisions(int heapSize){
    divisions = []
    for(int leftPileSize = 1; leftPileSize < heapSize; leftPileSize++){
        int rightPileSize = heapSize - leftPileSize
        if (leftPileSize != rightPileSize){
            divisions.append([leftPileSize, rightPileSize])
        }
    }
    return divisions
}
Modifying this to use the "divide into any number of unequal piles" rule is just a matter of changing the implementation of possibleDivisions.
I haven't calculated it exactly, but an unoptimized bestMove has a pretty crazy worst-case runtime. Once you give it a starting state of around 12 stones, you'll start seeing long wait times. So you should implement memoization to improve performance.
For best results, keep each GameState's heap size list sorted, and discard any heaps of size 1 or 2 (no legal split exists for them), as in the sketch below.
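A compact Python version of bestMove with the memoization and normalization suggestions applied (heaps are kept as a sorted tuple with sizes 1 and 2 discarded; a winning move is returned as a (heap, (left, right)) pair rather than the pseudocode's Move object):

from functools import lru_cache

def divisions(heap):
    # All splits of `heap` into two unequal non-empty piles.
    return [(left, heap - left) for left in range(1, (heap - 1) // 2 + 1)]

@lru_cache(maxsize=None)
def best_move(heaps):
    # Returns a winning (heap, (left, right)) move, or None if the position is lost.
    for i, heap in enumerate(heaps):
        for left, right in divisions(heap):
            rest = heaps[:i] + heaps[i+1:] + tuple(p for p in (left, right) if p > 2)
            if best_move(tuple(sorted(rest))) is None:
                return (heap, (left, right))
    return None

print(best_move((8,)))  # (8, (1, 7)): splitting 8 into 1 and 7 wins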

Resources