Can someone explain Breadth-first search? - algorithm

Can someone explain breadth-first search for solving problems like the one below?
I need to find all the paths between 4 and 7.

You look at all the nodes adjacent to your start node. Then you look at all the nodes adjacent to those (not returning to nodes you've already looked at). Repeat until a satisfying node is found or no nodes remain.
For the kind of problem you indicate, you use the process described above to build a set of paths, terminating any that arrive at the desired destination node, and when your graph is exhausted, the set of paths that so terminated is your solution set.

Breadth first search (BFS) means that you process all of your starting nodes' direct children, before going to deeper levels. BFS is implemented with a queue which stores a list of nodes that need to be processed.
You start the algorithm by adding your start node to the queue. Then you continue to run your algorithm until you have nothing left to process in the queue.
When you dequeue an element from the queue, it becomes your active node. When you are processing your active node, you add all of its children to the back of the queue. You terminate the algorithm when you have an empty queue.
Since you are dealing with a graph instead of a tree, you need to store a list of your processed nodes so you don't enter into a loop.
Once you reach node 7 you have a matching path and you can stop extending that path, meaning that you simply don't add its children to the queue. To avoid loops and to know the exact path you've taken, keep the list of already visited nodes together with each path as the search proceeds.

Think of it as websites with links to other sites on them. Let A be our root node or our starting website.
A is the starting page (Layer 1)
A links to AA, AB, AC (Layer 2)
AA links to AAA, AAB, AAC (Layer 3)
AB links to ABA, ABB, ABC (Layer 3)
AC links to ACA, ACB, ACC (Layer 3)
This is only three layers deep. You search one layer at a time. So in this case you would start at layer A. If that does not match you go to the next layer, AA, AB and AC. If none of those websites is the one you are searching for then you follow the links and go to the next layer. In other words, you look at one layer at a time.
In a depth-first search (its complement) you would go from A to AA to AAA. In other words, you would go DEEP before going WIDE.

You test each node connected to the root node. Then you test each node connected to the previous nodes. And so on, until you find your answer.
Basically, each iteration tests nodes that are the same distance away from the root node.
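A minimal Python sketch of this layer-by-layer idea, reusing the website example from above. The `links` dictionary and the name `bfs_layers` are made up for illustration, and pages without outgoing links are simply left out of the dictionary.

```python
# Hypothetical link structure taken from the website example.
links = {
    "A": ["AA", "AB", "AC"],
    "AA": ["AAA", "AAB", "AAC"],
    "AB": ["ABA", "ABB", "ABC"],
    "AC": ["ACA", "ACB", "ACC"],
}

def bfs_layers(links, root):
    """Group pages by their distance (in links) from the root page."""
    layers = []
    seen = {root}
    frontier = [root]
    while frontier:
        layers.append(frontier)
        next_frontier = []
        for page in frontier:
            for target in links.get(page, []):
                if target not in seen:      # each page belongs to one layer only
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return layers

for depth, layer in enumerate(bfs_layers(links, "A"), start=1):
    print("Layer", depth, layer)
```

Every page in one inner list is the same number of links away from the root, which is exactly the "same distance" property described above.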

BEGIN
4;
4,2;
4,2,1; 4,2,3; 4,2,5;
4,2,1;(fail) 4,2,3,6; 4,2,5,6; 4,2,5,11;
4,2,3,6,7;(pass) 4,2,3,6,8; 4,2,5,6,7;(pass) 4,2,5,6,8; 4,2,5,11,12;
4,2,3,6,8,9; 4,2,3,6,8,10; 4,2,5,6,8,9; 4,2,5,6,8,10; 4,2,5,11,12;(fail)
4,2,3,6,8,9;(fail) 4,2,3,6,8,10;(fail) 4,2,5,6,8,9;(fail) 4,2,5,6,8,10;(fail)
END
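The trace above can be reproduced with a short Python sketch. The original picture of the graph is not available here, so the directed edges below are inferred from the trace and should be treated as an assumption; `all_paths_bfs` is a hypothetical name. Each queue entry is a whole path rather than a single node, so the "visited" check is done per path.

```python
from collections import deque

# Directed edges inferred from the trace (an assumption, not the original figure).
GRAPH = {
    4: [2],
    2: [1, 3, 5],
    3: [6],
    5: [6, 11],
    6: [7, 8],
    8: [9, 10],
    11: [12],
}

def all_paths_bfs(graph, start, goal):
    """Return every simple path from start to goal, shortest paths first."""
    found = []
    queue = deque([[start]])              # each entry is a complete path so far
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            found.append(path)            # (pass): record it, do not extend it
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:           # never revisit a node on the same path
                queue.append(path + [nxt])
        # a path that gets no extension simply dies out here: (fail)
    return found

paths = all_paths_bfs(GRAPH, 4, 7)
for p in paths:
    print(p)
```

Keeping whole paths in the queue is simple but can use a lot of memory on graphs with many alternative routes, since the number of simple paths can grow very quickly.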

Related

Efficient Graph Traversal for Node Editor Evaluation

I have a directed acyclic graph created by users, where each node (vertex) of the graph represents an operation to perform on some data. The outputs of a node depend on its inputs (obviously), and that input is provided by its parents. The outputs are then passed on to its children. Cycles are guaranteed to not be present, so can be ignored.
This graph works on the same principle as the Shader Editor in Blender. Each node performs some operation on its input, and this operation can be arbitrarily expensive. For this reason, I only want to evaluate these operations when strictly required.
When a node is updated, via user input or otherwise, I need to reevaluate every node which depends on the output of the updated node. However, given that I can't justify evaluating the same node multiple times, I need a way to determine the correct order to update the nodes. A basic breadth-first traversal doesn't solve the problem. To see why, consider this graph:
A traditional breadth-first traversal would result in D being evaluated prior to B, despite D depending on B.
I've tried doing a breadth-first traversal in reverse (that is, starting with the O1 and O2 nodes, and traversing up the graph), but I seem to run into the same problem. A reversed breadth-first traversal will visit D before B, thus I2 before A, resulting in I2 being ordered after A, despite A depending on I2.
I'm sure I'm missing something relatively simple here, and I feel as though the reverse traversal is key, but I can't seem to wrap my head around it and get all the pieces to fit. I suppose one potential solution is to use the reverse traversal as intended, but rather than avoiding visiting each node more than once, just visiting each node each time it comes up, ensuring that it has a definitely correct ordering. But visiting each node multiple times and the exponential scaling that comes with that is a very unattractive solution.
Is there a well-known efficient algorithm for this type of problem?
Yes, there is a well known efficient algorithm. It's topological sorting.
Create a dictionary with all nodes and their corresponding in-degree; let's call it indegree_dic. The in-degree is the number of parents (incoming edges) of a node. Have a set S of the nodes with in-degree equal to zero.
Taken from the Wikipedia page with some modification:
L ← Empty list that will contain the nodes sorted topologically
S ← Set of all nodes with no incoming edge that haven't been added to L yet
while S is not empty do
    remove a node n from S
    delete n from indegree_dic
    add n to L
    for each child node m of n do
        decrement indegree_dic[m]
        if indegree_dic[m] equals zero then
            insert m into S
if indegree_dic has length > 0 then
    return error (graph is not a DAG)
else
    return L (a topologically sorted order)
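The pseudocode above could look roughly like this in Python. This is a sketch, not a reference implementation: the name `topological_sort` and the example graph are made up, and the function assumes that every node appears as a key of `children` (with an empty list if it has no children).

```python
from collections import deque

def topological_sort(children):
    """Kahn-style topological sort; children maps node -> list of child nodes."""
    indegree = {n: 0 for n in children}
    for n in children:
        for m in children[n]:
            indegree[m] += 1

    ready = deque(n for n in children if indegree[n] == 0)   # the set S
    order = []                                               # the list L
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in children[n]:
            indegree[m] -= 1
            if indegree[m] == 0:          # all parents of m are already in order
                ready.append(m)

    if len(order) != len(children):       # some node never reached in-degree 0
        raise ValueError("graph is not a DAG")
    return order

# Hypothetical node graph: two inputs, three operations, two outputs.
example = {
    "I1": ["A"], "I2": ["A", "B"],
    "A": ["D"], "B": ["D"],
    "D": ["O1", "O2"], "O1": [], "O2": [],
}
print(topological_sort(example))
```

Instead of deleting entries from the dictionary, this version compares the length of the result with the number of nodes to detect a cycle; the effect is the same.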
This sort is not unique. I mention that because it has some impact on your algorithm.
Now, whenever a change happens to any of the nodes, you can safely avoid recalculating any nodes that come before the changed node in your topologically sorted list, but you need to recalculate the nodes that come after it. You can be sure that all the parents are processed before their children if you follow the sorted list in your calculation.
This algorithm is not optimal, as there could be nodes after the changed node, that are not children of that node. Like in the following scenario:
  A
 / \
B   C
One correct topological sort would be [A, B, C]. Now, suppose B changes. You skip A because nothing has changed for it, but recalculate C because it comes after B. But you actually don't need to, because B has no effect on C whatsoever.
If the impact of this isn't big, you could use this algorithm and keep the implementation easier and less prone to bugs. But if efficiency is key, here are some ideas that may help:
You can do a topological sort each time and take into account which node has changed. When choosing nodes from S in the above algorithm, choose every other node that you can before you choose the changed node. In other words, you choose the changed node from S only when S has length 1. This guarantees that every node that isn't below the changed node in the hierarchy is processed before it. This approach helps when the sorting is much cheaper than processing the nodes.
Another approach, which I'm not entirely sure is correct, is to look after the changed node in the topological sorted list and start processing only when you reach the first child of the changed node.
Another way relies on idea 1 but is helpful if you can do some pre-processing. You can create a topological sort for each case of one node being changed. When a node is changed, you try to put it in the ordering as late as possible. You save all these orderings in a node-to-ordering dictionary and, based on which node has changed, you choose the matching ordering.
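A hedged sketch of idea 1 in Python: the changed node is held back until nothing else is available, so everything from the changed node onward in the result is what needs recalculating. The function name `order_with_changed_last` is hypothetical, and the sketch assumes every node is a key of `children`.

```python
from collections import deque

def order_with_changed_last(children, changed):
    """Topological order in which `changed` is emitted as late as possible."""
    indegree = {n: 0 for n in children}
    for n in children:
        for m in children[n]:
            indegree[m] += 1

    ready = deque(n for n in children if indegree[n] == 0)
    order = []
    changed_waiting = False               # True while `changed` is held back
    while ready or changed_waiting:
        if ready:
            n = ready.popleft()
            if n == changed:
                changed_waiting = True    # pick every other node first
                continue
        else:
            n = changed                   # nothing else left: emit it now
            changed_waiting = False
        order.append(n)
        for m in children[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)

    if len(order) != len(children):
        raise ValueError("graph is not a DAG")
    return order

graph = {"A": ["B", "C"], "B": [], "C": []}
order = order_with_changed_last(graph, "B")
to_recalculate = order[order.index("B"):]
print(order, to_recalculate)
```

In the A/B/C example, C is placed before B, so only B ends up in `to_recalculate`. The sort is redone on every change, which matches the remark that this pays off when sorting is much cheaper than evaluating nodes.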

How to understand the 4 steps of Monte Carlo Tree Search

From many blogs and this one https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html
We know that the process of MCTS algorithm has 4 steps.
Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.
What does leaf node L mean here? I thought it should be a node representing a terminal state of the game, in other words a state which ends the game.
If L is not a terminal node (one end state of the game), how do we decide that the selection step stops on node L?
Expansion: If L is not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
From this description I realise that my previous thought was obviously incorrect.
Then if L is not a terminal node, it implies that L should have children, why not continue finding a child from L at the "Selection" step?
Do we have the children list of L at this step?
From the description of this step itself, when do we create one child node, and when do we need to create more than one child node? Based on what rule/policy do we select node C?
Simulation: Run a simulated playout from C until a result is achieved.
Because of my confusion about the 1st question, I cannot understand at all why we need to simulate the game. I thought that from the selection step we can reach the terminal node and the game should be ended on node L in this path. We do not even need to do "Expansion" because node L is the terminal node.
Backpropagation: Update the current move sequence with the simulation result.
Fine. Last question, from where did you get the answer to these questions?
Thank you
BTW, I also posted the same question at https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search
What does leaf node L mean here?
For the sake of explanation I'm assuming that all the children of a selected node are added during the expansion phase of the algorithm.
When the algorithm starts, the tree is formed only by the root node (a leaf node).
The Expansion phase adds all the states reachable from the root to the tree. Now you have a bigger tree where the leaves are the last added nodes (the root node isn't a leaf anymore).
At any given iteration of the algorithm, the tree (the gray area of the picture) grows. Some of its leaves could be terminal states (according to the rules of the game/problem) but that is not guaranteed.
If you expand too much, you could run out of memory. So the typical implementation of the expansion phase only adds a single node to the existing tree.
In this scenario you could replace the word leaf with not fully expanded:
Starting at root node R, recursively select optimal child nodes until a not fully expanded node L is reached
Based on what rule/policy do we select node C?
It's domain-dependent. Usually you randomly choose a move/state.
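To make the four steps concrete, here is a small Python sketch. Everything specific in it is an assumption made for illustration and does not come from the page quoted in the question: the toy game (a pile of stones, each player takes 1 or 2, whoever takes the last stone wins), the UCB1 formula used for selection, and the names `Node`, `uct_child`, `rollout` and `mcts`. Expansion adds a single, randomly chosen child per iteration.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left for the player to move
        self.parent = parent
        self.move = move                  # move that led from parent to here
        self.children = []
        self.untried = [m for m in (1, 2) if m <= stones]
        self.visits = 0
        self.wins = 0.0                   # wins for the player who moved INTO this node

def uct_child(node, c=1.4):
    """Pick the child with the best UCB1 score (an assumed selection policy)."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` takes the last stone."""
    player, winner = 0, 1                 # with 0 stones the opponent has already won
    while stones > 0:
        stones -= rng.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            winner = player
        player = 1 - player
    return winner == 0

def mcts(stones, iterations=1000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: the node is "not fully expanded", so add one child C.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from C (skipped in effect if C is terminal).
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: update statistics up to the root, flipping perspective.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, root

move, root = mcts(7, iterations=500, seed=1)
print("suggested move:", move, "root visits:", root.visits)
```

Note how selection stops at a node that still has untried moves, which is the "not fully expanded" reading of leaf described above, and how the simulation is needed precisely because that node is usually not a terminal state.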
NOTES
Image from Multi-Objective Monte Carlo Tree Search for Real-Time Games (Diego Perez, Sanaz Mostaghim, Spyridon Samothrakis, Simon M. Lucas).

Breadth first search: the timing of checking visitation status

In a breadth first search of a directed graph (cycles possible), when a node is dequeued, all its children that have not yet been visited are enqueued, and the process continues until the queue is empty.
One time, I implemented it the other way around, where all of a node's children are enqueued, and the visitation status is checked instead when a node is dequeued. If a node being dequeued has been visited before, it is discarded and the process continues with the next node in the queue.
But the result is wrong. Wikipedia also says
depth-first search ...... The non-recursive implementation is similar
to breadth-first search but differs from it in two ways: it uses a
stack instead of a queue, and it delays checking whether a vertex has
been discovered until the vertex is popped from the stack rather than
making this check before pushing the vertex.
However, I cannot wrap my head around what exactly is the difference. Why does depth first search check when popping items out and breadth first search must check before enqueuing?
DFS
Suppose you have a graph:
A---B---E
|   |
|   |
C---D
And you search DFS from A.
You would expect it to search the nodes A,B,D,C,E if using a depth first search (assuming a certain ordering of the children).
However, if you mark nodes as visited before placing them on the stack, then you will visit A,B,D,E,C because C was marked as visited when we examined A.
In some applications where you just want to visit all connected nodes this is a perfectly valid thing to do, but it is not technically a depth first search.
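The difference can be checked with a short Python sketch on the graph above. The adjacency lists encode one particular ordering of the children (an assumption chosen so that the results match the orders quoted above), and both function names are made up.

```python
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": ["B"],
}

def dfs_check_on_pop(graph, start):
    """Iterative DFS: a node counts as visited only when it is popped."""
    order, visited, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue                      # stale duplicate entry, ignore it
        visited.add(node)
        order.append(node)
        for nxt in reversed(graph[node]): # reversed so the first child is popped first
            if nxt not in visited:
                stack.append(nxt)
    return order

def dfs_mark_on_push(graph, start):
    """Same loop, but nodes are marked as visited when they are pushed."""
    order, visited, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in reversed(graph[node]):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return order

print(dfs_check_on_pop(GRAPH, "A"))
print(dfs_mark_on_push(GRAPH, "A"))
```

Both functions visit every connected node exactly once; only the first one produces a genuine depth-first order.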
BFS
In breadth first search you can mark the nodes as visited either before or after pushing to the queue. However, it is more efficient to check before as you do not end up with lots of duplicate nodes in the queue.
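A small Python sketch that compares the two BFS variants on the same five-node graph as in the DFS example. The function names and the enqueue counter are made up for illustration; the lazy variant enqueues every neighbour and discards already visited nodes when they are dequeued.

```python
from collections import deque

GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": ["B"],
}

def bfs_check_on_enqueue(graph, start):
    order, visited, queue = [], {start}, deque([start])
    enqueued = 1
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:        # check before enqueuing
                visited.add(nxt)
                queue.append(nxt)
                enqueued += 1
    return order, enqueued

def bfs_check_on_dequeue(graph, start):
    order, visited, queue = [], set(), deque([start])
    enqueued = 1
    while queue:
        node = queue.popleft()
        if node in visited:               # check after dequeuing
            continue
        visited.add(node)
        order.append(node)
        for nxt in graph[node]:
            queue.append(nxt)             # duplicates are allowed in the queue
            enqueued += 1
    return order, enqueued

eager_order, eager_count = bfs_check_on_enqueue(GRAPH, "A")
lazy_order, lazy_count = bfs_check_on_dequeue(GRAPH, "A")
print(eager_order, eager_count)
print(lazy_order, lazy_count)
```

The visiting order is the same in both variants because the first copy of each node sits at the same position in the queue; the lazy variant just carries extra duplicates. One thing worth checking in a failing implementation is whether a duplicate is really discarded before any per-node work (such as recording a distance or a parent) is done.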
I don't understand why your BFS code failed in this case, perhaps if you post the code it will become clearer?
DFS checks whether a node has been visited when popping it from the stack because it may have been visited at a "deeper" level. For example:
A--B--C--E
|     |
-------
If we start at A, then B and C will be put on the stack; assume we put them on the stack so B will be processed first. When B is now processed, we want to go down to C and finally to E, which would not happen if we marked C as visited when we discovered it from A. Now once we proceed from B, we find the yet unvisited C and put it on the stack a second time. After we finished processing E, all C entries on the stack need to be ignored, which marking as visited will take care of for us.
As @PeterdeRivaz said, for BFS, whether we check nodes for having been visited when enqueuing or when dequeuing is a matter of efficiency, not of correctness.

How can I do this graph traversal?

I have a directed cyclic graph consisting of nodes a, b, c, d, e, f, g, where every node is connected to every other node. The edges may be unidirectional or bidirectional. I need to print out a valid order, e.g. f->a->c->b->e->d->g, such that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
Basically, first we can try to find a start node, that is, a node with no incoming edge (there can be at most one such node). I may or may not find one. I will also do some preprocessing to find the total number of nodes (let's call it n). Now I will start a DFS from the start node, marking nodes as visited when I reach them and counting how many nodes I have visited. If I can reach n nodes by this method, I am done. If I hit a node from which there are no outgoing edges to any unvisited node, I have hit a dead end; I will just mark that node as unvisited again, move the pointer back, and go to the previous node to try a different route.
This was the case when I find a start node. If I don't find a start node, I will just have to try this with various nodes.
I have no idea if I am even close to the solution. Can anyone help me in this regard?
In my opinion, if there is no incoming edge to a node, it means that node is a start node. You can traverse the graph using this start node. And if this start node can not visit all the n nodes, then there is no solution (as you said that all the nodes must be present in the output list.). This is because if you start with some other nodes, you won't be able to reach this start node.
The problem with your solution is that if you enter a loop you don't know if and when to exit.
A DFS in these conditions can easily become a non-polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated; I hope there's room for simplification.
Here is my suggested solution:
1) For each node, construct the table of the nodes it can reach (if a can reach b and c, b can reach d, and c can reach e, then a can reach b, c, d, e even though there is no single path from a passing through all of them).
If no node can reach all the other ones, you're done: the path you're looking for does not exist.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table at the previous point.
Once you have found a loop you can shrink it (and its nodes) to a representative node whose ingoing (outgoing) connections are the union of the ingoing (outgoing) connections of the nodes in the loop.
You keep reducing loops until you cannot do any more.
3) At this point you are left with an acyclic graph. If there is a path connecting all nodes, there is a single node connected to all and starting from it you can perform depth first search.
4) Write down the path by replacing the traversal of representative nodes with a loop from the entry point of the loop to the exit point.
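Step 1 above (the reachability table) could be sketched in Python as follows. This only covers the table and the loop test from step 2, not the shrinking of loops; the name `reachability_table` is made up, and the graph is assumed to be a dictionary from node to list of successors.

```python
from collections import deque

def reachability_table(graph):
    """Map each node to the set of nodes reachable through one or more edges."""
    table = {}
    for source in graph:
        reached = set()
        queue = deque(graph[source])      # start from the successors, not the node itself
        while queue:
            node = queue.popleft()
            if node in reached:
                continue
            reached.add(node)
            queue.extend(graph.get(node, []))
        table[source] = reached
    return table

# The example from step 1: a reaches b and c, b reaches d, c reaches e.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
table = reachability_table(graph)
print(table["a"])

# Step 2: a node lies on a loop exactly when it can reach itself.
on_loop = [n for n in table if n in table[n]]
print(on_loop)
```

One search is run per node, so building the table takes time proportional to V * (V + E), which is polynomial.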

Algorithm for creating dedicated paths between two nodes

I have some texts whose sequence follows a specific order. Some texts change depending on the trail that was traversed. My goal is to generate a static page for each text, interconnecting the pages through links.
The question is about a tool that will generate text for printed books (which are static, obviously). So imagine you are reading a book that is represented by Example 1 (in the image below). Initially, you're at node A, and the text of this page is "Go to page B or page C". If you choose node C, followed by F -> B -> E -> H, you'll see content at node H that should be different from what you would see if you had traversed A -> B -> D -> H, for instance. As it is a printed book, I need to duplicate some paths so that the content of some nodes can change according to the path traversed.
Example:
In this example, I have two possibilities for traversing:
A -> B -> D
A -> C -> D
Expected result:
Page 1: A (link to page 2 and 3)
Page 2: B (link to page 4)
Page 3: C (link to page 5)
Page 4: D
Page 5: DD'
This simple example generates 5 pages, since page 4 has a part of its text that should be displayed only if the reader passes through page 3.
To model this problem I chose to use graph theory. For a better understanding, I drew two examples of the problem I'm trying to solve in the graph below:
Note that the red dashed edges are not in fact edges. They are the way I chose to represent that the content of a given node X changes as a consequence of visiting node Y (read: "the content of node X changes if the path to arrive at X passes through Y").
I read a lot about graphs, traversal strategies (BFS and DFS) and some other topics. My goal is to develop an algorithm that rearranges a given graph in such a way that the pages mentioned previously can be generated. I didn't find any well-known algorithm that solves this problem, but I believe one should already exist. My research didn't find anything useful, so I tried to solve it myself.
My successful approach consisted of traversing the graph until finding a node whose content depends on other nodes. Once this node has been found, it finds all paths from the nodes it depends on to the current node. It traverses these paths, duplicating all nodes that have more than one incoming edge, removing the previous connection and connecting the current node with the duplicated one, and so on until all nodes of the path are consumed. This algorithm works well, but the approach is not efficient and can be very slow with long texts.
My question is: do you know any other better way to solve this problem? Is there any theory or known algorithm that can solve this kind of problem?
Thanks in advance.
Do a DFS and when you see a visited node, duplicate it, break the link through which you just visited it, mark the new node as visited and continue the DFS from this node. This method does not visit a node multiple times and hence is the fastest (meaning it will visit H1 exactly 2 times, not n or k times).
This is linear in terms of the output graph. That is, if the output graph has V' vertices and E' edges, its order is O(V'+E'). You cannot achieve better, as you must visit everything in the output graph at least once.
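A rough Python sketch of this duplicate-on-revisit DFS, run on the small A/B/C/D example from the question. It assumes the graph is acyclic (with a cycle the duplication would never stop), it duplicates every node that is reached a second time whether or not its content actually changes, and the name `unfold` and the apostrophe labels for copies are made up.

```python
def unfold(graph, start):
    """Unfold a DAG into a tree of pages by copying nodes that are reached again."""
    seen = set()
    copies = {}                           # original node -> number of copies so far
    pages = {}                            # page label -> labels of the pages it links to

    def visit(node):
        if node in seen:                  # reached through another link: duplicate it
            copies[node] = copies.get(node, 0) + 1
            label = node + "'" * copies[node]
        else:
            seen.add(node)
            label = node
        pages[label] = [visit(child) for child in graph.get(node, [])]
        return label

    visit(start)
    return pages

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
pages = unfold(graph, "A")
for label, targets in pages.items():
    print(label, "->", targets)
```

The page numbers in the question follow a different numbering than the DFS visiting order used here, but the set of pages and links is the same: C links to the copy D' instead of the original D.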
I am assuming the rules of these red edges are static. Keep multiple contents in one node instead of duplicating it. Now, as the content displayed depends on the path taken to reach the node, at each step we can check the "stack" of the DFS to see the path taken to reach it. The stack will give us the exact path taken to reach it (but note that it will not tell us whether the path visited other descendants of the parent). Then we compare it to the static rules that we already have and display the content.
Time complexity analysis(Worst case):
At each step of the DFS we check the entire stack against the rules. The maximum length of the stack can be h (where h is the height of the tree). Hence the time complexity is O((V+E)*h).
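A minimal Python sketch of the stack check, again assuming an acyclic graph. The shape of `rules` (one trigger node and one alternative text per node) is a simplifying assumption made for illustration, as is the name `render`.

```python
def render(graph, start, rules):
    """Walk every path with a DFS and pick each node's text from the current stack."""
    out = []
    stack = []                            # the path from the start to the current node

    def dfs(node):
        stack.append(node)
        trigger, variant = rules.get(node, (None, None))
        # The O(h) step: scan the stack for the node that triggers the variant text.
        text = variant if trigger in stack else node
        out.append((" -> ".join(stack), text))
        for child in graph.get(node, []):
            dfs(child)
        stack.pop()

    dfs(start)
    return out

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
rules = {"D": ("C", "D'")}                # D shows D' if the path passed through C
for path, text in render(graph, "A", rules):
    print(path, "=>", text)
```

No node is duplicated in the graph itself here; D is simply rendered once per path that reaches it, with the text chosen by the rule.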
Alternatively, if it matters whether the path visited other descendants of the parent (like analysing path A->B->E when it matters that D was already visited), you can introduce the red edges yourself into the data structure based upon the rules. Again keep multiple contents in a node. While deciding which content to display, simply check whether the "endpoints" of the red edges originating from this vertex are already visited. Now use the rules to display the appropriate content.

Resources