How to understand the 4 steps of Monte Carlo Tree Search

How to understand the 4 steps of Monte Carlo Tree Search - algorithm

From many blogs and this one https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html
We know that the process of MCTS algorithm has 4 steps.
Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.
What does leaf node L mean here? I thought it should be a node representing the terminal state of the game, or another word which ends the game.
If L is not a terminal node (one end state of the game), how do we decide that the selection step stops on node L?
Expansion: If L is a not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
From this description I realise that obviously my previous thought incorrect.
Then if L is not a terminal node, it implies that L should have children, why not continue finding a child from L at the "Selection" step?
Do we have the children list of L at this step?
From the description of this step itself, when do we create one child node, and when do we need to create more than one child nodes? Based on what rule/policy do we select node C?
Simulation: Run a simulated playout from C until a result is achieved.
Because of the confusion of the 1st question, I totally cannot understand why we need to simulate the game. I thought from the selection step, we can reach the terminal node and the game should be ended on node L in this path. We even do not need to do "Expansion" because node L is the terminal node.
Backpropagation: Update the current move sequence with the simulation result.
Fine. Last question, from where did you get the answer to these questions?
Thank you
BTW, I also post the same question https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search

What does leaf node L mean here?
For the sake of explanation I'm assuming that all the children of a selected node are added during the expansion phase of the algorithm.
When the algorithm starts, the tree is formed only by the root node (a leaf node).
The Expansion phase adds all the states reachable from the root to the tree. Now you have a bigger tree where the leaves are the last added nodes (the root node isn't a leaf anymore).
At any given iteration of the algorithm, the tree (the gray area of the picture) grows. Some of its leaves could be terminal states (according to the rules of the game/problem) but it's not granted.
If you expand too much, you could run out of memory. So the typical implementation of the expansion phase only adds a single node to the existing tree.
In this scenario you could change the word leaf with not fully expanded:
Starting at root node R, recursively select optimal child nodes until a not fully expanded node L is reached
Based on what rule/policy do we select node C?
It's domain-dependent. Usually you randomly choose a move/state.
NOTES
Image from Multi-Objective Monte Carlo Tree Search for Real-Time Games (Diego Perez, Sanaz Mostaghim, Spyridon Samothrakis, Simon M. Lucas).

Related

How to remove invalid edges in graded directed acyclig graph?

The following graph sample is a portion of a directed acyclic graph which is to be layered and cleaned up so that only edges connecting consecutive layers are kept.
So what I need is to eliminate edges that form "shortcuts", that is, that jump between non-consecutive layers.
The following considerations apply:
The bluish ring layering is valid because, starting at 83140 and ending at 29518, both branches have the same amount (3) of intermediary nodes, and there is no path that is longer between start and end node;
The green ring, starting at 94347 and ending at 107263, has an invalid edge (already red-crossed), because the left branch encompasses only one intermediary node, while the right branch encompasses three intermediary nodes; Besides, since the first edge of that branch is already valid - we know it pertains to the valid blue ring - it is possible to know which is the right edge to cross-out - otherwise it would be impossible to know which layer should be assigned to node 94030 and so it should be eliminated;
If we consider the pink ring after considering the green one, we know that the lower red-crossed edge is to be removed.
BUT if we consider only the yellow ring, both branches seem to be right (they contain the same number of inner nodes), but actually they only seem right because they contain symmetric errors (shortcuts jumping the same amount of nodes on both branches). If we take this ring locally, at least one of the branches would end up in wrong layers, so it is necessary to use more global data to avoid this error.
My questions are:
What typical concepts and operations are involved in the formulation and possible solution of this problem?
Is there an algorithm for that?

First, topologically sort the graph.
Now from the beginning of sorted array, start breadth first search and try to find the proper "depth" (i.e distance from root) of every node. Since a node can have multiple parents, for a node x, depth[x] is maximum of depth of all it's parents, plus one. We initialize depth for all nodes as -1.
Now in bfs traversal, when we encounter a node p, we try to update the depth of all it's childs c, where depth[c] = max(depth[c],depth[p]+1). Now there are two ways we can detect a child with shortcut.
if depth[p]+1 < depth[c], it means c has a parent with higher depth than p. So edge p to c must be a shortcut.
if depth[p]+1 > depth[c] and depth[c]!=-1, it means c have a parent with lower depth than p. So p is a better parent, and that other parent of c must have a shortcut with p.
In both cases, we mark c as problematic.
Now our goal is for every 'problematic' node x, we check all it's parent, whose depth should be depth[x]-1. If any of them have depth that is lower than that, that one have a shortcut edge with x that needs to be removed.
Since the graph can have multiple roots, we should have a variable to mark visited nodes, and repeat the above thing for any that's left unvisited.
This will sort the yellow ring problem, because before we visit any node, all it's predecessors has already been visited and properly ranked. This is ensured by the topological sort.
(Note : we can do this by just one pass. Instead of marking problematic nodes, we can maintain a parent variable for all nodes, and delete edge with the old parent whenever case 2 occurs. case 1 should be obvious)

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning Left-Lean-Red-Black tree, from Prof.Robert Sedgewick
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand the insert of the 2-3 tree and the LLRB, I have spent totally like 40 hours now for 2 weeks and I still can't get the deletion of the LLRB.
Can anyone really explain the deletion of LLRB to me?

Ok I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there there imbalance/new nodes in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z which is now root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a child of Z, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right, and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height of new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.

There is an interesting comment about the Sedgwich implementation and in particular its delete method from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgwich pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgwich code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java

Follow the next strategy to delete an arbitrary node in a LLRB tree which is not in a leaf:
Transform a LLRB tree to a 2-3-4 tree (we do not need to transform the whole tree, only a part of the tree).
Replace the value of the node (which we want to delete) its successor.
Delete its successor.
Fix the tree (recover balance, see the book "Algorithms 4th edition" on the pages 435, 436).
If a value in a leaf then we do not need to use a successor to replase this value, but we still need to transform the current tree to 2-3-4 tree to delete this value.
The slide on the page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and the book "Algorithms 4th edition" on the page 437 are a key. They show how a LLRB tree transformations into a 2-3 tree. In the book "Algorithms 4th edition" on the page 442 https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 is an algorithm of transformation for trees.
For example, open the page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on the page 442 we transform these three nodes into the 4-node of a 2-3-4 tree. Then the node D has children B and F we also transform these nodes into a node of 2-3-4 tree. Then the node B has children A and C we also transform these nodes into a node of 2-3-4 tree. And finally we need to delete A. After deletion we need to recover balance. We move up through the tree and we restore balance of the tree (according to rules, see the book "Algorithms 4th edition" on the pages 435, 436). If you need to delete the node D (the same tree on the page 54). You need the same transformations and need to replace the value of the node D on the value of the node E and delete the node E (because it is a successor of D).

What defines left and right in graphs and trees?

I'm learning about tree traversals and I can't seem to find any clear rules for how DFS or BFS algorithms decide which path to take first. I've seen variations of left first or least first.
Is left taken as being first child in the list?
Does this mean that (for a given node) the depth of a vertex in a graph that is part of a cycle is taken using the leftward path?
Also doesn't using a 'least first' rule make for a slower algorithm?
Thanks

Left is only meaningful for trees where the child nodes are oldered. Otherwise usually the author refers to first in the list of child nodes. Depth of a vertex is also not well defined in a graph that is not a tree, but if you refer to depth with respect to a given node that would usually be the shortest distance from the starting node.
I am not sure what does least first mean but if it refers to the key values of nodes and there is no ordering in the child nodes, finding the least will take more time of course.
Hope this helps.

Find a loop in a binary tree

How to find a loop in a binary tree? I am looking for a solution other than marking the visited nodes as visited or doing a address hashing. Any ideas?

Suppose you have a binary tree but you don't trust it and you think it might be a graph, the general case will dictate to remember the visited nodes. It is, somewhat, the same algorithm to construct a minimum spanning tree from a graph and this means the space and time complexity will be an issue.
Another approach would be to consider the data you save in the tree. Consider you have numbers of hashes so you can compare.
A pseudocode would test for this conditions:
Every node would have to have a maximum of 2 children and 1 parent (max 3 connections). More then 3 connections => not a binary tree.
The parent must not be a child.
If a node has two children, then the left child has a smaller value than the parent and the right child has a bigger value. So considering this, if a leaf, or inner node has as a child some node on a higher level (like parent's parent) you can determine a loop based on the values. If a child is a right node then it's value must be bigger then it's parent but if that child forms a loop, it means he is from the left part or the right part of the parent.
3.a. So if it is from the left part then it's value is smaller than it's sibling. So => not a binary tree. The idea is somewhat the same for the other part.
Testing aside, in what form is the tree that you want to test? Remeber that every node has a pointer to it's parent. An this pointer points to a single parent. So depending of the format you tree is in, you can take advantage from this.

As mentioned already: A tree does not (by definition) contain cycles (loops).
To test if your directed graph contains cycles (references to nodes already added to the tree) you can iterate trough the tree and add each node to a visited-list (or the hash of it if you rather prefer) and check each new node if it is in the list.
Plenty of algorithms for cycle-detection in graphs are just a google-search away.

Is there an effient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

Wikipedia: Directed Acyclic Graph
Not sure if leaf node is still proper terminology since it's not really a tree (each node can have multiple children and also multiple parents) and also I'm actually trying to find all the root nodes (which is really just a matter of semantics, if you reverse the direction of all the edges it'd they'd be leaf nodes).
Right now we're just traversing the entire graph (that's reachable from the specified node), but that's turning out to be somewhat expensive, so I'm wondering if there's a better algorithm for doing this. One thing I'm thinking is that we keep track of nodes that have been visited already (while traversing a different path) and don't recheck those.
Are there any other algorithmic optimizations?
We also thought about keeping a list of root nodes that this node is a descendant of, but it seems like maintaining such a list would be fairly expensive as well if we need to check if it changes every time a node is added, moved, or removed.
Edit:
This is more than just finding a single node, but rather finding ALL nodes that are endpoints.
Also there is no master list of nodes. Each node has a list of it's children and it's parents. (Well, that's not completely true, but pulling millions of nodes from the DB ahead of time is prohibitively expensive and would likely cause an OutOfMemory exception)
Edit2:
May or may not change possible solutions, but the graph is bottom-heavy in that there's at most a few dozen root nodes (what I'm trying to find) and some millions (possibly tens or hundreds of millions) leaf nodes (where I'm starting from).

There are a few methods that each may be faster depending on your structure, but in general what youre going to want is a traversal.
A depth first search, goes through each possible route, keeping track of nodes that have already been visited. It's a recursive function, because at each node you have to branch and try each child node of it. There's no faster method if you dont know which way to look for the object you just have to try each way! You definitely need to keep track of where you have already been because it would be wasteful otherwise. It should require on the order of the number of nodes to do a full traversal.
A breadth first search is similar but visits each child of the node before "moving on" and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if it is expected to be all the way down a path, because it forces you to traverse every possible edge.
Youre right about maybe keeping a list of known root nodes, the tradeoff there is that you basically have to do the search whenever you alter the graph. If you are altering the graph rarely this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.
EDIT: Info Update.
It sounds like we are actually looking for a path between two arbitrary nodes, the root/leaf semantic keeps getting switched. The DepthFirstSearch (DFS) starts at one node, and then for each unvisited child, recurse. Break if you find the target node. Due to the way recursion evaluates, this will traverse all the way down the 'left' path, then enumerate nodes at this distance before ever getting to the 'right' path. This is time costly and inefficient if the target node is potentially the first child on the right. BreadthFirst walks in steps, covering all children before moving forward. Because your graph is bottom heavy like a tree, both will be approximately the same execution time.
When the graph is bottom heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have more parents than children, this direction will be much faster. You can also combine the approaches, stepping one up and one down , then comparing lists of nodes, and meeting somewhere in the middle. (this combination might seem the fastest if you ignore that twice as much work is done at each step).
However, since you said that your graph is stored as a list of lists of children, you have no real way of traversing the graph backwards. A node does not know what its parents are. This is a problem. To fix it you have to get a node to know what its parents are by adding that data on graph update, or by creating a duplicate of the whole structure (which you have said is too large). It will need the whole structure to be rewritten, which sounds probably out of the question due to it being a large database at this point.
There's a lot of work to do.
http://en.wikipedia.org/wiki/Graph_(data_structure)

Just color (keep track of) visited nodes.
Sample in Python:
def reachable(nodes, edges, start, end):
color = {}
for n in nodes:
color[n] = False
q = [start]
while q:
n = q.pop()
if color[n]:
continue
color[n] = True
for adj in edges[n]:
q.append(adj)
return color[end]

For a vertex x you want to compute a bit array f(x), each bit corresponds to a root vertex Ri, and 1 (resp 0) means "x can (resp can't) be reached from root vertex Ri.
You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.
Keep U not too big, and precompute f for each vertex x of U.
Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio