DFS vs BFS: 2 differences - algorithm

According to Wikipedia, there are basically two differences in the implementations of DFS and BFS.
They are:
1) DFS uses a stack while BFS uses a queue (this I understand).
2) DFS delays checking whether a vertex has been discovered until the vertex is popped from the stack, rather than making this check before pushing the vertex.
I am not able to understand the second difference. I mean, why does DFS visit the node after removing it from the stack, while BFS visits the node before adding it to the queue?
Thanks!
Extra info:
In a simple implementation of the above two algorithms, we take a boolean array (let us name it visited) to keep track of which nodes have been visited. The question mentions this visited boolean array.

This is probably the first time I have ever heard of DFS delaying setting the "discovered" property until the node is popped from the stack (even on Wikipedia, both the recursive and iterative pseudocode label the current node as discovered before pushing its children onto the stack). Also, if you "discover" a node only when you finish processing it, I think you could easily get into endless loops.
There are, however, situations where I use two flags for each node: one set when entering the node, one upon leaving it (usually I write DFS recursively, so right at the end of the recursive function). I think I used something like this when I needed things like strongly connected components or critical points in a connected graph (= "nodes which, if removed, would make the graph lose its connectivity"). Also, the order in which you exit the nodes is often used for topological sorting (the topological sort is just the reverse of the order in which you finished processing the nodes).

The wikipedia article mentions two ways to perform a DFS: using recursion and using a stack.
For completeness, I copy both here:
Using recursion
procedure DFS(G, v):
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G, w)
Using a stack
procedure DFS-iterative(G, v):
    let S be a stack
    S.push(v)
    while S is not empty
        v ← S.pop()
        if v is not labeled as discovered:
            label v as discovered
            for all edges from v to w in G.adjacentEdges(v) do
                S.push(w)
The important thing to know here is how method calls work. There is an underlying stack; let's call it T. Before a method gets called, its arguments are pushed onto the stack. The method then takes the arguments from that stack, performs its operation, and pushes its result back onto the stack. This result is then taken from the stack by the calling method.
As an example, consider the following snippet:
function caller() {
    callResult = callee(argument1, argument2);
}
In terms of the stack T, this is what happens (schematically):
// inside method caller
T.push(argument1);
T.push(argument2);
"call method callee"
// inside method callee
argument2 = T.pop();
argument1 = T.pop();
"do something"
T.push(result);
"return from method callee"
// inside method caller
callResult = T.pop();
This is pretty much what is happening in the second implementation: the stack is used explicitly. You can compare making the call to DFS in the first snippet with pushing a vertex on the stack, and compare executing the call to DFS in the first snippet with popping the vertex from the stack.
The first thing after popping vertex v from the stack is marking it as discovered. This is equivalent to marking it as discovered as a first step in the execution of DFS.
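For concreteness, here is a small Python sketch of the two placements of the discovered check (the adjacency-dict representation is an assumption made for illustration):

```python
from collections import deque

def dfs_check_on_pop(graph, start):
    """Iterative DFS as in the Wikipedia pseudocode: the
    discovered check is delayed until a vertex is popped."""
    discovered, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v not in discovered:       # check AFTER popping
            discovered.add(v)
            order.append(v)
            for w in graph[v]:
                stack.append(w)       # may push duplicates; they are skipped later
    return order

def bfs_check_on_push(graph, start):
    """BFS: the discovered check happens BEFORE enqueueing,
    so no vertex ever enters the queue twice."""
    discovered, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in discovered:   # check BEFORE pushing
                discovered.add(w)
                queue.append(w)
    return order

g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(dfs_check_on_pop(g, 'a'))   # ['a', 'c', 'd', 'b']
print(bfs_check_on_push(g, 'a'))  # ['a', 'b', 'c', 'd']
```

Note that the DFS variant may hold the same vertex on the stack more than once (here 'd'), which is exactly why the check must be repeated at pop time.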

Related

How to find longest accepted word by automata?

I need to write code in Java that will find the longest word that a DFA accepts. Firstly, if there is a transition to one of the previous states (or a self-transition) on a path that leads to a final state, that means there are infinitely many words, and the longest one doesn't exist (that means there is a Kleene star applied to some word). I was thinking of forming a queue by BFS, where each level is separated by null, so that when I'm iterating through the queue and come across null, the length of the word would be increased by one, but it would be hard to track the set of previous states, so I'm kind of idealess. If you can't code in Java I would appreciate pseudocode or an algorithm.
I don't think this is strictly necessary, but it would not hurt performance too terribly much in practice and might be sufficient for your needs. I would suggest, as a first pass, minimizing the DFA. This can be done in O(n log n) in terms of the number of states, using e.g. Hopcroft's algorithm. This is probably conceptually similar to what Christian Sloper suggests in the comments regarding reversing the transitions to find unproductive states; indeed, there is a minimization algorithm that does this as well, but you might be able to get away with just removing unproductive states and not minimizing here (though minimizing does make the reasoning a little easier).
Doing that is nice because it will remove all unproductive loops and combine them into a single dead state, if indeed there are any unproductive prefixes. It is easy to find the one dead state, if there is one, and remove it from the directed graph formed by the DFA's states and transitions. To do this, do either DFS or BFS, and check each state you come to to see whether (1) all of its transitions are self-loops and (2) the state is not accepting.
With the one dead state removed (if any) any loops or cycles we detect in the remaining directed graph imply there are infinitely many strings in the language, since by definition any remaining states have a path to acceptance. If we find a loop or cycle, we know the language is infinite, and can respond accordingly.
If there are no loops or cycles remaining after removing the dead state from the minimal DFA, what remains is a tree rooted at the start state and whose leaves are accepting states (think about this for a moment and you will see it must be true). Therefore, the length of the longest string accepted is the length (in edges) of the longest path from the root to a leaf; so basically the height of the tree or something close to it (depending on how you define depth/height, whether edges or nodes). You can take any old algorithm for finding the depth and modify it so that in addition to returning the depth, it returns the string corresponding to the deepest subtree, so you can get the string without having to go back through the tree. Something like this:
GetLongestStringInTree(root)
    if root is null return ""
    result = ""
    maxlen = 0
    for each transition
        child = transition.target
        symbol = transition.symbol
        str = symbol + GetLongestStringInTree(child)
        if str.length > maxlen then
            maxlen = str.length
            result = str
    return result
This could be pretty easily modified to find all words of maximum length by adding str to a collection if its length is equal to the max length so far, and emptying that collection when a new longer string is found, and returning the collection (and using the length of the first thing in the collection for checking). That can be left as an exercise; as written, this will just find some arbitrary longest string accepted by the DFA.
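A minimal Python sketch of the pseudocode above, assuming (purely for illustration) that a tree node is represented as a list of (symbol, child) transitions and a leaf as the empty list:

```python
def get_longest_string(node):
    """Return the longest root-to-leaf string; a node is assumed
    to be a list of (symbol, child) transitions, a leaf is []."""
    result = ""
    for symbol, child in node:
        # prepend this transition's symbol to the best string below it
        candidate = symbol + get_longest_string(child)
        if len(candidate) > len(result):
            result = candidate
    return result

# A tiny accepting tree: "ab" vs "c" -> the longest accepted string is "ab"
tree = [('a', [('b', [])]), ('c', [])]
print(get_longest_string(tree))  # ab
```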
This problem becomes a lot simpler if you split it in two. (Sorry, no Java.)
Step 1: Determine if there is a loop.
If there is a loop, there exists an infinitely long input. Detecting a loop in a directed graph can be done with DFS.
Step 2 (no loop): You now have a directed acyclic graph (DAG), and you can find the longest path using this algorithm: Longest path in Directed acyclic graph
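Step 1 can be sketched in Python with the usual three-color DFS (the adjacency-dict representation is an assumption for illustration):

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph with DFS.
    WHITE = unvisited, GRAY = on the current DFS path, BLACK = finished;
    an edge into a GRAY vertex is a back edge, i.e. a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:
                return True                    # back edge found
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

print(has_cycle({'a': ['b'], 'b': ['c'], 'c': ['a']}))   # True
print(has_cycle({'a': ['b', 'c'], 'b': ['c'], 'c': []})) # False
```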

How to find the previous node of a singly linked list without the head pointer

Hi, this was an interview question; any suggestions/answers would be appreciated.
In the question: he gave me a singly linked list, pointed at a random node, and asked me to find the previous node. I asked if I had access to the head pointer; he said no. I asked if it was a circular or doubly linked list; he again said no. How would I find the previous node?
P.S. - I do not have access to the head pointer at all, so I can't keep track of a previous pointer.
From a purely CS standpoint, if you construct a non-cyclic singly linked-list of length n as:
L1 -> L2 -> ... -> L(x-1) -> Lx -> ... -> Ln
The following theorem holds true:
Theorem: Node Lx can only be reached by nodes L1 through Lx.
Just by looking at it, this is fairly obvious: there is no going backwards. I skipped over some steps, but this is a fairly easy conclusion to make. There is no path from Lx to previous nodes in the chain. A formal proof would use induction. This page explains how one might perform induction over a linked list.
The contrapositive is: Nodes L1 through L(x-1) cannot be reached by node Lx. As a direct result L(x-1) cannot be reached by node Lx which contradicts the claim by the interviewer. Either the interviewer is wrong or your interpretation of the interviewer is wrong.
I was only given the random node and I was asked to find the previous node; no head pointer, no cycles.
If such a thing were possible, it would break many existing applications of computer science. For example, garbage collection in safe languages like Python or Ruby relies on nodes no longer being reachable once there is no path to them. If you were able to reach a node in this way, you could cause a "use-after-free" bug, making the language unsafe.
The interviewer might have expected you to state the question is impossible. I have had interviews where this is the case. Sometimes, the interviewer is probing for a more "creative" solution. For example, in a language like C++, all the nodes may be stored in an underlying resource pool which can be iterated over to find the previous node. However, I would find this implementation unusual for an interview and in practice.
Needless to say, the problem as you have stated is not possible under the given constraints.
You can do it like this: you replace the value of the current node with the value of the next node, and in the next pointer of the second-to-last node you put null. It's like deleting an element from a string. (Note: this deletes the given node rather than finding its predecessor.)
Here is the code:
void deleteNode(ListNode* node) {
    ListNode *pre = node;
    while (node->next)
    {
        node->val = node->next->val;
        pre = node;
        node = node->next;
    }
    pre->next = NULL;
}
The ONLY way you can do this is if the linked list is circular, i.e., the last node points to the first node as a type of circular list. Then it is simply a list walk keeping track of the previous node until you arrive again at the node you're on.
It is possible; here is some code for your reference. Here, I have assumed that I know the data values of each node. You can test this code by giving 'b' and 'c' in the call method. Obviously you can create multiple objects too. Let me know if this is the solution you are looking for.
# Program to delete any one node in a singly linked list
class A:
    def __init__(self, data):
        self.data = data
        self.node = None

class B:
    def __init__(self):
        self.head = None

    def printlist(self):
        printval = self.head
        c = 0
        self.call(printval, c)

    def call(self, printval, c):
        temp1 = None
        temp2 = None
        while printval:
            if printval.data == 'c' and c == 0:
                c = c + 1
                temp2 = printval.node
                del printval
                temp1.node = temp2
                printval = temp1.node
            print(printval.data)
            temp1 = printval
            printval = printval.node
o1=B()
o1.head=A("a")
o2=A("b")
o3=A("c")
o4=A("d")
o1.head.node=o2
o2.node=o3
o3.node=o4
o1.printlist()

Efficient way to find all paths between two nodes in a non-cyclical directed graph

I want to find all paths between two nodes in a graph. I wrote a recursive function that finds all the paths with the help of the depth-first-search algorithm. But for bigger graphs it is very inefficient, so I cannot use it for my program.
I am thinking about implementing an iterative method for my problem. This will be very time consuming for me. So does anybody know if this would make sense?
Is an iterative approach more efficient in this case? Or is it possible to optimize my recursive method?
My current function:
function RecDFS(g::GenericGraph, visited)
    nodes = out_neighbors(visited[length(visited)], g)
    for i in nodes
        if in(i, visited)
            continue
        end
        if i.label == "End"
            push!(visited, i)
            println(visited) # print every path from the first node in visited to the node with the label End
            pop!(visited)
            break
        end
    end
    # continue recursive..
    for i in nodes
        if (in(i, visited) || i.label == "End")
            continue
        end
        push!(visited, i)
        RecDFS(g, visited)
        pop!(visited)
    end
end
The problem you are trying to solve is actually NP-hard in general: the number of paths can be exponential in the size of the graph, so no algorithm can enumerate them all in polynomial time.
So you can maybe find some optimizations for your problem, but you cannot make it run fast in the worst case!
As for optimizations, you can do the following. First of all, you mentioned in your question title that your input is a DAG, and DAGs by definition have the following property:
There are no paths between two nodes that lie in two different connected components of the DAG.
So if you have a list of which nodes are in which connected component of the DAG (this is achievable in polynomial time), you can easily cross out a lot of hopeless combinations.
As for making your program iterative, you can easily use a stack instead. Just replace every recursive call with a stack.push(node), put the traversal part of your code inside a while (stack is not empty) loop, and pop the nodes one by one until there are none left. That should do it.
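That rewrite can be sketched in Python (the adjacency-dict graph is a hypothetical representation); the trick is to let the explicit stack hold whole partial paths rather than single nodes:

```python
def all_paths_iterative(graph, start, end):
    """Enumerate every simple path from start to end without recursion:
    the explicit stack holds partial paths instead of single nodes."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            paths.append(path)
            continue
        for nxt in graph[node]:
            if nxt not in path:        # keep the path simple (no revisits)
                stack.append(path + [nxt])
    return paths

g = {'s': ['a', 'b'], 'a': ['t'], 'b': ['a', 't'], 't': []}
for p in all_paths_iterative(g, 's', 't'):
    print(p)   # the three s->t paths, in stack order
```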
After some thought I have found a good solution for my problem. Take a look at this example code:
function RecDFS(g::GenericGraph, visited)
    nodes = out_neighbors(visited[length(visited)], g)
    if checkPath(visited)
        for i in nodes
            if in(i, visited)
                continue
            end
            if i.label == "End"
                push!(visited, i)
                println(visited) # print every path from the first node in visited to the node with the label End
                pop!(visited)
                break
            end
        end
        # continue recursive..
        for i in nodes
            if (in(i, visited) || i.label == "End")
                continue
            end
            push!(visited, i)
            RecDFS(g, visited)
            pop!(visited)
        end
    end
end
All in all, I have just added an additional if-statement. The function checkPath(visited) returns true if the path is valid so far. If the path (or piece of a path) is not valid, the function ends.
For my specific problem this is a very good solution. It was 100 times faster in my test run and needs only 15 seconds for my biggest problem instance with 500 nodes and 16000 edges.
Thank you very much Ashkan Kzme and Rob for your help.
Perform a topological sort of the vertices in your DAG, to get [v0, v1, ... , vn]. Suppose your start node is vs, your destination vt. (If s > t then there are no paths)
Then, for each i in s+1 .. t calculate the paths P(i) from vs to vi as follows:
If there's an edge vs -> vi, that's one path (of length 1)
Find all j such that s < j < i and there's an edge vj -> vi. Add all paths obtained by taking paths from P(j) and appending the edge vj -> vi
Note. for a given i there's no guarantee that there are any paths at all from vs to vi
As already commented, there can be exponentially many paths, thus outputting all paths can't in general be done in less than exponential time. However, you can calculate the number of paths in linear time using this approach.
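The linear-time counting idea can be sketched in Python, assuming a topological ordering of the vertices is already available (the adjacency-dict representation is an illustrative choice):

```python
def count_paths(graph, order, s, t):
    """Count s->t paths in a DAG in O(V + E) time,
    given a topological ordering `order` of the vertices."""
    count = {v: 0 for v in order}
    count[s] = 1
    for v in order:                # process sources before targets
        for w in graph[v]:
            count[w] += count[v]   # every path to v extends to w
    return count[t]

# Diamond DAG: s->a->t and s->b->t, so two paths.
g = {'s': ['a', 'b'], 'a': ['t'], 'b': ['t'], 't': []}
print(count_paths(g, ['s', 'a', 'b', 't'], 's', 't'))  # 2
```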
You can use a queue for depth first search, which saves the nodes you've just passed by.

Time complexity of recursive method involving a tree traversal

I'm trying to get my head around calculating the time complexity of my own method - can anybody nudge me in the direction of calculating the complexity of a method involving a for-each loop that makes a recursive call on itself?
I've written a method that does a tree traversal of an n-ary tree. Unfortunately I can't post the exact code, but it goes: given the root node to start with,
for each (child of node)
    does a quick check
    sets a boolean
    does a recursive call on itself until we get to the leaf nodes
Your loop visits every node of the tree exactly once.
Beginning with the root node, you visit all of its child nodes, then call the same function on every child of those child nodes, and the same repeats.
Since you visit every node exactly once, this loop has a runtime of O(n) for n nodes of your tree, assuming that the quick check is constant time and does not depend on n.
"Is the for-each part done n times?":
Yes and no: the for-each part runs numberOfChildrenOfNode(node) times for a single node, but since you do that for each child node by calling your function recursively, the loop body executes once per node overall (every node except the root is somebody's child), so the total work is still O(n).
What you can test/try: declare a static variable executionCount or something like that, initialize it to 0, and increment it once per visited node. You should see that executionCount equals the number of nodes.
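The executionCount experiment can be sketched in Python, representing an n-ary tree node simply as a list of its children (an illustrative choice):

```python
def count_visits(node):
    """Traverse an n-ary tree, counting how many times the
    function body runs; a node is just a list of its children."""
    visits = 1                      # the visit to `node` itself
    for child in node:
        visits += count_visits(child)
    return visits

tree = [[[], []], [[]]]    # root + 2 children + 3 grandchildren = 6 nodes
print(count_visits(tree))  # 6
```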

DFS questions (one for rooted trees, one for graphs)

Question 1:
For rooted trees, like binary trees, why do people make sure never to push NULL/nil/null onto the stack?
e.g.
while (!stack.empty()) {
    x = stack.pop();
    visit(x);
    if (x->right) stack.push(x->right);
    if (x->left) stack.push(x->left);
}
instead of
while (!stack.empty()) {
    x = stack.pop();
    if (!x) continue;
    visit(x);
    stack.push(x->left), stack.push(x->right);
}
I mean, the second form more naturally aligns with the recursive form of preorder/DFS, so why do people use the "check before" with iterative usually, and "check after" with recursive? Any reason I am missing, other than saving (n+1) stack spots (which doesn't add to the space complexity)?
Question 2:
DFS on Graph: Using iterative, why do people set visited flag when pushing adjacent nodes of current node, but in recursive, we only do it after the recursion takes place?
e.g. iterative:
while (!stack.empty()) {
    x = stack.pop();
    // we do not need to check if x is visited after popping
    for all u adjacent from x
        if (!visited[u]) stack.push(u), visited[u] = true; // we check/set before push
}
but in recursive:
void DFS(Graph *G, int x)
{
    if (visited[x]) return; // we check/set after popping into the node
    visited[x] = true;
    for all u adjacent from x
        DFS(G, u); // we do not check if u is already visited before the call
}
So in general, the connection between the two questions is: why are we more careful before pushing valid things onto an actual stack object (iterative methods of DFS), but less careful when using the hardware stack (recursion)? Am I missing something? Isn't the hardware stack more of a "luxury", because it can overflow (stack objects we can declare on the heap)?
Thank you, kind people, for the insights.
[ This is not simply coding style; this is about total rearrangements of how algorithms are coded. I'm talking about being careless with the hardware stack vs. being careful with the software stack. I'm also wondering about some technical differences (i.e., are they truly equivalent methods in all situations?). And almost all books follow the above patterns. ]
Checking when you push or when you pop are both OK, but checking when you push performs better.
For example, if you have a binary tree of depth 10 and you check when you pop, you are basically traversing a tree of depth 11 (it's as if you added two more NULL children to each leaf); that's about 2048 wasted operations. If it's recursive, it means it will make about 2048 unnecessary function calls.
Other than that, it's OK to check before or after; it just happens that the code you saw was written that way.
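The wasted-pop claim can be checked with a small Python sketch (a full binary tree as nested (left, right) tuples is a toy representation chosen here): for a full tree of 1023 nodes, the check-after variant performs 1024 extra pops on None children.

```python
def pops_check_before(root):
    """Check-before-push: None children are never pushed."""
    pops, stack = 0, [root]
    while stack:
        left, right = stack.pop()
        pops += 1
        if right is not None:
            stack.append(right)
        if left is not None:
            stack.append(left)
    return pops

def pops_check_after(root):
    """Check-after-pop: children are pushed unconditionally."""
    pops, stack = 0, [root]
    while stack:
        x = stack.pop()
        pops += 1
        if x is None:
            continue
        stack.append(x[1])
        stack.append(x[0])
    return pops

def full_tree(depth):
    """A full binary tree as nested (left, right) tuples."""
    return None if depth == 0 else (full_tree(depth - 1), full_tree(depth - 1))

t = full_tree(10)            # 2**10 - 1 = 1023 nodes, 512 leaves
print(pops_check_before(t))  # 1023 pops
print(pops_check_after(t))   # 2047 pops: 1024 of them wasted on None
```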
