Understanding MCTS node selection - algorithm

I'm currently trying to implement MCTS for a project of mine, but I'm not sure I understand the idea of node selection correctly. At the beginning of the game, after I randomly select one move, unwind the whole tree to the point of a game end and then do the backpropagation, this node is obviously seen as better than all the other ones, since it's 1/1 (if we got the win) vs. their 0/0. How does MCTS escape that trap and not get stuck with the one randomly selected node?
I mean, if we use, say, UCB for finding the best node to expand, it'll always choose the node we selected first (given it resulted in a win), completely ignoring all the other ones, since it'll be the only one with a non-zero value. What am I missing here, since that's obviously not the case?

Each time you are at a node, you choose which child to expand according to these rules:
if at least one child node has never been expanded, expand one of the unexplored children at random (and you can immediately unwind from this child node);
otherwise, every child node has been visited at least once, so compute the "exploration/exploitation" value for each of them and expand the child node with the highest value.
The idea of MCTS is to balance exploration and exploitation. If a child node has never been explored, the "exploration" value associated with it is infinite, so you have to explore it. Once you have expanded all child nodes, however, you will expand the child nodes with higher values more frequently (this is the "exploitation" part).
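To make the two rules concrete, here is a minimal Python sketch using the UCB1 formula. The function names `ucb1` and `select_child`, the `stats` dictionary layout and the exploration constant `c = sqrt(2)` are assumptions made for illustration; they are not taken from any particular library.

```python
import math
import random

def ucb1(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """Exploration/exploitation value of one child; infinite if never visited."""
    if child_visits == 0:
        return math.inf
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(stats):
    """stats maps move -> (wins, visits); returns the move to descend into."""
    # Rule 1: any child that was never visited goes first, chosen at random.
    unvisited = [move for move, (_, visits) in stats.items() if visits == 0]
    if unvisited:
        return random.choice(unvisited)
    # Rule 2: all children visited at least once, so take the highest UCB1 value.
    parent_visits = sum(visits for _, visits in stats.values())
    return max(stats, key=lambda move: ucb1(*stats[move], parent_visits))

# The situation from the question: one child is 1/1, the others are 0/0.
stats = {"a": (1, 1), "b": (0, 0), "c": (0, 0)}
first_pick = select_child(stats)  # one of the unvisited moves, never "a"
```

Even after every child has been visited, the exploration term keeps growing for children that are rarely chosen, so a child that lost its first playout is still revisited from time to time.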

Related

How would I jump from one component to another in this undirected graph?

Since this distance learning thing started I've really struggled to understand data structures and this question really threw me for a loop. I have absolutely no idea how to even start with the code let alone get my point across. Any help at all would be much appreciated...
Pick any node in the graph. That node belongs to some component, and it’s there with potentially a few other nodes. So run a BFS, painting that node and everything reachable from it gold. That’s one component.
Now pick another node. One of two things must be true about it. First, it could be the case that the node has already been painted gold. In that case, you already have counted the component that contains it. Second, it could be unpainted. In that case, you haven’t counted its component, so paint it and all nodes reachable from it gold, and that’s your second component.
Do you think you can generalize this idea so that you count all the components?
As for runtime - how many times does each node get visited this way? Remember that you only paint each node gold once and that each edge is only visited in the course of painting nodes.
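If you want to check your own attempt against something, here is one possible Python sketch of the painting idea. It assumes the graph is given as a dictionary mapping each node to a list of its neighbours; the name `count_components` and the example graph are made up for illustration.

```python
from collections import deque

def count_components(graph):
    """graph maps each node to an iterable of its neighbours (undirected)."""
    painted = set()
    components = 0
    for start in graph:
        if start in painted:
            continue  # this node's component has already been counted
        components += 1
        painted.add(start)
        queue = deque([start])
        while queue:  # BFS: paint everything reachable from start
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in painted:
                    painted.add(neighbour)
                    queue.append(neighbour)
    return components

example = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(count_components(example))  # 3
```

Each node is painted once and each adjacency list is scanned once, which is where the O(V + E) running time comes from.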

How to understand the 4 steps of Monte Carlo Tree Search

From many blogs and this one https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html
We know that the process of MCTS algorithm has 4 steps.
Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.
What does leaf node L mean here? I thought it should be a node representing a terminal state of the game, in other words one that ends the game.
If L is not a terminal node (one end state of the game), how do we decide that the selection step stops on node L?
Expansion: If L is not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
From this description I realise that my previous thought was obviously incorrect.
Then if L is not a terminal node, it implies that L should have children, why not continue finding a child from L at the "Selection" step?
Do we have the children list of L at this step?
From the description of this step itself, when do we create one child node, and when do we need to create more than one? Based on what rule/policy do we select node C?
Simulation: Run a simulated playout from C until a result is achieved.
Because of my confusion about the first question, I cannot understand at all why we need to simulate the game. I thought that the selection step reaches a terminal node, so the game should already have ended at node L on this path. We would not even need "Expansion", because node L is the terminal node.
Backpropagation: Update the current move sequence with the simulation result.
Fine. Last question, from where did you get the answer to these questions?
Thank you
BTW, I also posted the same question at https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search
What does leaf node L mean here?
For the sake of explanation I'm assuming that all the children of a selected node are added during the expansion phase of the algorithm.
When the algorithm starts, the tree is formed only by the root node (a leaf node).
The Expansion phase adds all the states reachable from the root to the tree. Now you have a bigger tree where the leaves are the last added nodes (the root node isn't a leaf anymore).
At any given iteration of the algorithm, the tree (the gray area of the picture) grows. Some of its leaves could be terminal states (according to the rules of the game/problem), but that's not guaranteed.
If you expand too much, you could run out of memory. So the typical implementation of the expansion phase only adds a single node to the existing tree.
In this scenario you could replace the word "leaf" with "not fully expanded":
Starting at root node R, recursively select optimal child nodes until a not fully expanded node L is reached
Based on what rule/policy do we select node C?
It's domain-dependent. Usually you randomly choose a move/state.
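As a rough illustration of how the four steps fit together when only a single node is added per iteration, here is a small Python sketch. The toy game is an assumption chosen only to keep the code short: players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. All names (`Node`, `select`, `expand`, `simulate`, `backpropagate`, `mcts`) are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones      # stones left; the player to move takes 1 or 2
        self.parent = parent
        self.children = {}        # move -> Node
        self.wins = 0             # wins for the player who moved INTO this node
        self.visits = 0

    def untried_moves(self):
        return [m for m in (1, 2) if m <= self.stones and m not in self.children]

    def is_terminal(self):
        return self.stones == 0

def select(node):
    # Selection: descend while the node is fully expanded and not terminal.
    while not node.is_terminal() and not node.untried_moves():
        node = max(node.children.values(),
                   key=lambda ch: ch.wins / ch.visits
                   + math.sqrt(2 * math.log(node.visits) / ch.visits))
    return node

def expand(node):
    # Expansion: add a single new child, chosen at random among untried moves.
    move = random.choice(node.untried_moves())
    child = Node(node.stones - move, parent=node)
    node.children[move] = child
    return child

def simulate(stones):
    """Simulation: random playout; True if the player who moved into this state wins."""
    mover_wins = True             # with 0 stones left, the mover took the last one
    while stones > 0:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        mover_wins = not mover_wins
    return mover_wins

def backpropagate(node, mover_wins):
    # Backpropagation: update the path to the root, flipping perspective each ply.
    while node is not None:
        node.visits += 1
        if mover_wins:
            node.wins += 1
        mover_wins = not mover_wins
        node = node.parent

def mcts(stones, iterations=2000):
    root = Node(stones)
    for _ in range(iterations):
        leaf = select(root)
        if not leaf.is_terminal():
            leaf = expand(leaf)
        backpropagate(leaf, simulate(leaf.stones))
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

best_move = mcts(4)
```

Selection stops at the first node that still has an untried move, which is the "not fully expanded" reading of "leaf" described above. The simulation is needed because that node is usually nowhere near the end of the game. In this toy game, leaving the opponent a multiple of 3 stones is a winning position, so with enough iterations the search should favour taking 1 stone from a pile of 4.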
NOTES
Image from Multi-Objective Monte Carlo Tree Search for Real-Time Games (Diego Perez, Sanaz Mostaghim, Spyridon Samothrakis, Simon M. Lucas).

A star pathfinding. Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?

There is one thing about the A* pathfinding algorithm that I do not understand. In the pseudocode: if the current node's (the node being analysed) g cost is less than the adjacent node's g cost, then recalculate the adjacent node's g, h and f costs and reassign the parent node.
Why do you do this?
Why do you need to re-evaluate the adjacent node's costs and parent if its gCost is greater than the current node's gCost? In what instance would you need to do this?
Edit: I am watching this video
https://www.youtube.com/watch?v=C0qCR18gXdU
At 8:19 he says: When you come across blocks (nodes) that have already been analysed, the question is should we change the properties of the block?
First a tip. You can actually add the time you want as a bookmark to get a video that starts right where you want. In this case https://www.youtube.com/watch?v=C0qCR18gXdU#t=08m19s is the bookmarked time link.
Now the quick answer to your question. We fill in a node the first time we find a path to it. But the first path we found to it might not be the cheapest one. We want the cheapest one, and if we find a cheaper one second, we want it.
Here is a visual metaphor. Imagine a running path with a fence next to it. The spot we want is on the other side of the fence. Actually draw this out, it will help.
The first path that our algorithm finds to it is run down the path, jump over the fence. The second path that we find is run part way down the path, go through the gate, then get to that spot. We don't want to throw away the idea of using the gate just because we already figured out that we could get there by jumping the fence!
Now in your picture put costs of moving from one spot to another that are reasonable for moving along a running path, an open field, through a gate, and jumping a fence. Run the algorithm by hand and you'll see that you figure out first that you can jump the fence, and then later that you really wanted to use the gate.
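If you would rather run it than trace it by hand, here is a minimal Python sketch of A* with the re-evaluation step, applied to a made-up version of the fence and gate picture. The node names, the step costs and the zero default heuristic are all assumptions for illustration, and the sketch assumes a consistent heuristic, so nodes that were already expanded are not reopened.

```python
import heapq

def a_star(graph, start, goal, h=lambda node: 0):
    """graph maps node -> {neighbour: step_cost}. Returns (path, cost, updates)."""
    g = {start: 0}
    parent = {start: None}
    open_heap = [(h(start), start)]
    closed = set()
    updates = []  # (node, old_g, new_g) for every re-evaluation, for illustration
    while open_heap:
        _, current = heapq.heappop(open_heap)
        if current in closed:
            continue  # stale heap entry left behind by an earlier, costlier path
        if current == goal:
            break
        closed.add(current)
        for neighbour, step_cost in graph[current].items():
            if neighbour in closed:
                continue
            tentative = g[current] + step_cost
            if neighbour in g and tentative >= g[neighbour]:
                continue  # the path we already know is at least as cheap
            if neighbour in g:
                updates.append((neighbour, g[neighbour], tentative))
            g[neighbour] = tentative        # keep the cheaper g cost ...
            parent[neighbour] = current     # ... and the parent that achieves it
            heapq.heappush(open_heap, (tentative + h(neighbour), neighbour))
    if goal not in parent:
        return None, float("inf"), updates
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], g[goal], updates

# Hypothetical costs: jumping the fence is expensive, the gate is cheap.
graph = {
    "start": {"midway": 2},
    "midway": {"path_end": 1, "gate": 3},
    "path_end": {"spot": 10},   # jump over the fence
    "gate": {"spot": 1},        # walk through the gate
    "spot": {},
}
path, cost, updates = a_star(graph, "start", "spot")
print(path, cost, updates)
# ['start', 'midway', 'gate', 'spot'] 6 [('spot', 13, 6)]
```

The `updates` list shows the moment in question: the spot is first reached over the fence with a g cost of 13, and later its cost and parent are replaced when the gate gives a g cost of 6.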
This guy is totally wrong: he says to change the parent node, but your successors are based on your parent node, and if you change the parent node you can't have a valid path, because the path is obtained simply by moving from parent to child.
Instead of changing the parent, use the pathmax function. It says that if a parent node A has a child node B whose cost is lower than A's (that is, heuristic(B) + accumulated_cost(B) < heuristic(A) + accumulated_cost(A)), then set the cost of the child equal to the cost of the parent.
Pathmax ensures monotonicity:
Monotonicity: Every child node has a cost greater than or equal to the cost of its parent node.
A* has a property: if your cost is monotonically increasing, then the first (sub)path that A* finds is always part of the final path. More precisely: under monotonicity, each node is reached first through the best path.
Do you see why?
Suppose you have a graph: (A,B), (B,C), (A,E), (E,D), where every tuple means the two nodes are connected. Suppose the cost is monotonically increasing and your algorithm chooses (A,B), (B,C). At this point you know your algorithm has chosen the best path so far, and every other path that can reach this node must have a higher cost. But if the cost is not monotonically increasing, it can be the case that (A,E) costs more than your current cost while (E,D) costs zero, so you have a better path there.
This algorithm relies on its heuristic function. If the heuristic underestimates, it gets corrected by the accumulated cost, but if it overestimates, the algorithm can explore extra nodes; I leave it to you to work out why this happens.
Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?
Don't do this because it's just extra work.
Corollary: if you later come to the same node from a node p with the same cost, simply remove that node from the queue. Do not expand it.

Confusion about complete binary tree

I keep seeing it defined as
A complete binary tree is a binary tree in which every level, except
possibly the last, is completely filled, and all nodes are as far left
as possible.
But..I have no clue as to what it means by "all nodes are as far left as possible." That's..literally my question. I can't expand on it any further because I have no idea what it means by "all nodes are as far left as possible." Like..as far left as possible compared to what? I don't get it
The "as far left as possible" part applies to the last level. That is, at the last level, you should start filling nodes from the left.
For example, the following is a valid complete binary tree, since at the last level all the nodes are as far left as possible.
The following is not
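The rule can also be checked mechanically. Below is a minimal Python sketch that walks a tree level by level and rejects it as soon as a node appears after a gap. The `Node` class, the `is_complete` function and the two small example trees are assumptions made up for illustration.

```python
from collections import deque

class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def is_complete(root):
    """True if every level is full except possibly the last, which is left-filled."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True      # from here on, only gaps may follow
        else:
            if seen_gap:
                return False     # a node to the right of a gap: not left-filled
            queue.append(node.left)
            queue.append(node.right)
    return True

# Last level holds three nodes, packed to the left.
left_filled = Node(Node(Node(), Node()), Node(Node()))
# Last level has a hole: the right subtree has only a right child.
with_hole = Node(Node(Node(), Node()), Node(None, Node()))

print(is_complete(left_filled), is_complete(with_hole))  # True False
```

Reading the last level from left to right, a complete tree never has an empty slot followed by a node, and that is all "as far left as possible" means.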

Something wrong with this heap diagram joke from xkcd?

I came across this picture, and someone had commented that there's a problem with the diagram, but I am not sure what it is.
Here's the picture: (original link)
Now the tree looks alright to me but the heap creates some doubt.
I know that in a binary heap, if the root has two children, then the left child must have its two children before we can proceed to the right child. Is that also the case with an n-ary heap? That is, since the root has four children, should the first child have had its four children before we move on to the next child?
In general, a structure is a heap if it satisfies the heap condition - therefore this heap is ok, because it does satisfy it.
If we're looking for some concrete heap, I guess that pairing heap would be ok.
The problem is that there is a second condition that is generally required. That condition is that every row of the tree must be full except possibly the last, and the last row must be left-filled. In other words, if there are any nodes missing on the last row, they must all be towards the right. In the diagram the second node in the fourth row has no children, and the fourth and fifth each have just a right child. Even worse, the first node in the second row doesn't have a right child. There is one more problem, but I'll leave it to you to find it.
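One way to see how the shape condition carries over to heaps with more than two children per node is the implicit array layout commonly used for binary and d-ary heaps. The Python sketch below assumes that layout and a min-heap ordering; the helper names `children` and `is_min_heap` are made up for illustration.

```python
def children(i, d, n):
    """Indices of the children of node i in a d-ary heap stored in an array of length n."""
    first = d * i + 1
    return range(first, min(first + d, n))

def is_min_heap(values, d=2):
    """Heap condition: no node is larger than any of its children."""
    n = len(values)
    return all(values[i] <= values[c]
               for i in range(n)
               for c in children(i, d, n))

# With d = 4 and 7 stored values, the root has four children and the first
# child has two; the second child cannot have any until the first has four.
print(list(children(0, 4, 7)))  # [1, 2, 3, 4]
print(list(children(1, 4, 7)))  # [5, 6]
print(list(children(2, 4, 7)))  # []
```

In this layout the rows fill left to right by construction, so only the ordering has to be checked. A pointer-based heap such as the pairing heap mentioned above does not use this layout and has no such shape requirement.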

Resources