DFS questions (one for rooted trees, one for graphs)

Question 1:
For rooted trees, like binary trees, why do people make sure never to push NULL/nil/null onto the stack?
e.g.
while (!stack.empty()) {
    x = stack.pop();
    visit(x);
    if (x->right) stack.push(x->right);
    if (x->left) stack.push(x->left);
}
instead of
while (!stack.empty()) {
    x = stack.pop();
    if (!x) continue;
    visit(x);
    stack.push(x->left), stack.push(x->right);
}
I mean, the second form aligns more naturally with the recursive form of preorder/DFS, so why do people usually use "check before push" in the iterative version and "check after" in the recursive one? Is there any reason I am missing, other than saving (n+1) stack slots (which doesn't change the space complexity)?
Question 2:
DFS on a graph: in the iterative version, why do people set the visited flag when pushing the adjacent nodes of the current node, whereas in the recursive version we only set it after the recursive call has reached the node?
e.g. iterative:
while (!stack.empty()) {
    x = stack.pop();
    // we do not need to check if x is visited after popping
    for all u adjacent from x
        if (!visited[u]) stack.push(u), visited[u] = true; // we check/set before push
}
but in recursive:
void DFS(Graph *G, int x)
{
    if (visited[x]) return; // we check/set after "popping into" the node
    visited[x] = true;
    for all u adjacent from x
        DFS(G, u); // we do not check if u is already visited before the call
}
So in general, the connection between the two questions is: why are we more careful before pushing valid things onto an actual stack object (iterative DFS), but less careful when using the hardware stack (recursion)? Am I missing something? Isn't the hardware stack more of a "luxury" because it can overflow (stack objects, by contrast, can allocate from the heap)?
Thank you kind people for the insights.
[ This is not simply coding style; this is about a total rearrangement of how the algorithms are coded. I'm talking about being careless with the hardware stack vs. being careful with a software stack. I'm also wondering about technical differences (i.e., are they truly equivalent methods in all situations?). And almost all books follow the above patterns. ]

Checking when you push or when you pop are both OK, but checking when you push performs better.
For example, if you have a binary tree of depth 10 and you check when you pop, you are effectively traversing a tree of depth 11 (it's as if you added two NULL children to each leaf): that's 2048 wasted operations. If it's recursive, that means 2048 unnecessary function calls.
Other than that, it's fine to check before or after; it just happens that the code you saw was written that way.
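For concreteness, here is a minimal sketch (C++, adjacency-list representation and names assumed) of the "check after pop" iterative graph DFS that mirrors the recursive form. It may push vertices that are already visited and skips them on pop; that redundant work is exactly what the check-on-push variant in the question avoids:

#include <stack>
#include <vector>

void dfsCheckOnPop(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> st;
    st.push(start);
    while (!st.empty()) {
        int x = st.top();
        st.pop();
        if (visited[x]) continue;      // the "check after pop"
        visited[x] = true;
        // visit(x) would happen here
        for (int u : adj[x])
            if (!visited[u]) st.push(u);
    }
}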

How to find the previous node of a singly linked list without the head pointer

Hi, this was an interview question; any suggestions/answers would be appreciated.
The question: he gave me a singly linked list, pointed at a random node, and asked me to find the previous node. I asked if I had access to the head pointer; he said no. I asked if it was a circular or doubly linked list; he again said no. How would I find the previous node?
P.S. I do not have access to the head pointer at all, so I can't keep track of a previous pointer.
From a purely CS standpoint, if you construct a non-cyclic singly linked-list of length n as:
L1 -> L2 -> ... -> L(x-1) -> Lx -> ... -> Ln
The following theorem holds true:
Theorem: Node Lx can only be reached by nodes L1 through Lx.
Just by looking at it, this is fairly obvious: there is no going backwards. I skipped over some steps, but this is a fairly easy conclusion to make. There is no path from Lx to previous nodes in the chain. A formal proof would use induction. This page explains how one might perform induction over a linked list.
Applying the contrapositive to each earlier node: nodes L1 through L(x-1) cannot be reached from node Lx. In particular, L(x-1) cannot be reached from Lx, which contradicts the claim by the interviewer. Either the interviewer is wrong or your interpretation of the interviewer is wrong.
I was only given the random node and was asked to find the previous node; no head pointer, no cycles.
If such a thing were possible, it would break many existing applications of computer science. For example, garbage collection in safe languages like Python or Ruby relies on nodes no longer being reachable once there is no path to them. If you were able to reach a node in this way, you could cause a use-after-free bug, making the language unsafe.
The interviewer might have expected you to state the question is impossible. I have had interviews where this is the case. Sometimes, the interviewer is probing for a more "creative" solution. For example, in a language like C++, all the nodes may be stored in an underlying resource pool which can be iterated over to find the previous node. However, I would find this implementation unusual for an interview and in practice.
Needless to say, the problem as you have stated it is not possible under the given constraints.
You can do it like this: replace the value of the current node with the value of the next node, repeating all the way down the list, and then set the next pointer of the second-to-last node to null. It's like deleting an element from a string.
Here is the code:
void deleteNode(ListNode* node) {
    ListNode* pre = node;
    // shift each value one node towards the front, tracking the previous node
    while (node->next)
    {
        node->val = node->next->val;
        pre = node;
        node = node->next;
    }
    // drop the now-redundant last node
    pre->next = NULL;
}
The ONLY way you can do this is if the linked list is circular, i.e., the last node points to the first node as a type of circular list. Then it is simply a list walk keeping track of the previous node until you arrive again at the node you're on.
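For what it's worth, that walk is only a few lines; a minimal sketch (node type assumed, and assuming the list really is circular):

struct Node { int val; Node* next; };

// Walk around the circle until the next hop would land on the target;
// that node is the target's predecessor.
Node* previousNode(Node* target) {
    Node* p = target;
    while (p->next != target)
        p = p->next;
    return p;
}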
It is possible; here is the code for your reference. I have assumed that I know the data values of each node. You can test this code by giving 'b' and 'c' in the call method. Obviously you can create multiple objects too. Let me know if this is the solution you are looking for.
# Program to delete any one node in a singly linked list
class A:
    def __init__(self, data):
        self.data = data
        self.node = None

class B:
    def __init__(self):
        self.head = None

    def printlist(self):
        printval = self.head
        c = 0
        self.call(printval, c)

    def call(self, printval, c):
        temp1 = None
        temp2 = None
        while printval:
            if printval.data == 'c' and c == 0:
                c = c + 1
                temp2 = printval.node
                del printval
                temp1.node = temp2
                printval = temp1.node
            print(printval.data)
            temp1 = printval
            printval = printval.node

o1 = B()
o1.head = A("a")
o2 = A("b")
o3 = A("c")
o4 = A("d")
o1.head.node = o2
o2.node = o3
o3.node = o4
o1.printlist()

Recursion vs iteration with regards to memory usage

Suppose I have a recursive as well as an iterative solution (using a stack) to some problem, e.g. preorder traversal of a binary tree. With current computers, memory-wise, is there an advantage to using the recursive solution over the iterative version, or vice versa, for very large trees?
I'm aware that for certain recursive solutions where sub-problems repeat, there are additional time and memory costs if recursion is used. Assume that this is not the case here. For example,
preOrder(Node n){
    if (n == null) return;
    print(n);
    preOrder(n.left);
    preOrder(n.right);
}
vs
preOrder(Node n){
    stack s;
    s.push(n);
    while (!s.empty()){
        Node node = s.pop();
        if (node == null) continue; // guard against pushed null children
        print(node);
        s.push(node.right);
        s.push(node.left);
    }
}
If there is a risk of stack overflow (in this case, because the trees are not guaranteed to be even semi-balanced), then a robust program will avoid recursion and use an explicit stack.
The explicit stack may use less memory, because stack frames tend to be larger than is strictly necessary to maintain the context of recursive calls. (For example, the stack frame will contain at least a return address as well as the local variables.)
However, if the recursion depth is known to be limited, then not having to dynamically allocate can save space and time, as well as programmer time. For example, walking a balanced binary tree only requires recursion to the depth of the tree, which is log2 of the number of nodes; that cannot be a very large number.
As suggested by a commentator, one possible scenario is that the tree is known to be right-skewed. In that case, you can recurse down the left branches without worrying about stack overflow (as long as you are absolutely certain that the tree is right-skewed). Since the second recursive call is in the tail position, it can just be rewritten as a loop:
void preOrder(Node n) {
    for (; n; n = n.right) {
        print(n);
        preOrder(n.left);
        // (the for-loop update already advances n to n.right)
    }
}
A similar technique is often (and should always be) applied to quicksort: after partitioning, the function recurses on the smaller partition, and then loops to handle the larger partition. Since the smaller partition must be less than half the size of the original array, that will guarantee the recursion depth to be less than log2 of the original array size, which is certainly less than 50 stack frames, and probably a lot less.
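As an illustration, here is a hedged sketch of that shape (C++, Lomuto partition, illustrative names): recurse into the smaller partition, loop on the larger one, so the depth stays logarithmic. It would be called as quicksort(a, 0, (int)a.size() - 1).

#include <algorithm>
#include <vector>

void quicksort(std::vector<int>& a, int lo, int hi) {
    while (lo < hi) {
        // Lomuto partition around the last element.
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; ++j)
            if (a[j] < pivot) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);

        // Recurse into the smaller side; the tail call on the larger
        // side becomes another trip around the loop.
        if (i - lo < hi - i) {
            quicksort(a, lo, i - 1);
            lo = i + 1;
        } else {
            quicksort(a, i + 1, hi);
            hi = i - 1;
        }
    }
}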

DFS vs BFS: 2 differences

According to Wikipedia, there are basically two differences in the implementation of DFS and BFS.
They are:
1) DFS uses a stack while BFS uses a queue (this I understand).
2) DFS delays checking whether a vertex has been discovered until the vertex is popped from the stack, rather than making this check before pushing the vertex.
I am not able to understand the second difference. I mean, why does DFS visit a node after removing it from the stack, while BFS visits a node before adding it to the queue?
Thanks!
Extra info:
In a simple implementation of the above two algorithms, we take a boolean array (let us name it visited) to keep track of which nodes have been visited. The question refers to this visited boolean array.
This is probably the first time I've heard of DFS delaying the setting of the "discovered" property until the vertex is popped from the stack (even on Wikipedia, both the recursive and iterative pseudocode label the current node as discovered before pushing its children onto the stack). Also, if you "discover" a node only when you finish processing it, I think you could easily get into endless loops.
There are, however, situations where I use two flags for each node: one set when entering the node, one upon leaving it (I usually write DFS recursively, so this happens right at the end of the recursive function). I think I used something like this when I needed strongly connected components or critical points in a connected graph (nodes which, if removed, would cost the graph its connectivity). Also, the order in which you exit the nodes is often used for topological sorting (the topological sort is just the reverse of the order in which you finished processing the nodes).
The wikipedia article mentions two ways to perform a DFS: using recursion and using a stack.
For completeness, I copy both here:
Using recursion
procedure DFS(G,v):
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G,w)
Using a stack
procedure DFS-iterative(G,v):
    let S be a stack
    S.push(v)
    while S is not empty
        v ← S.pop()
        if v is not labeled as discovered:
            label v as discovered
            for all edges from v to w in G.adjacentEdges(v) do
                S.push(w)
The important thing to know here is how method calls work. There is an underlying stack; let's call it T. Before the method gets called, its arguments are pushed onto the stack. The method then takes the arguments from that stack, performs its operation, and pushes its result back onto the stack. This result is then taken from the stack by the calling method.
As an example, consider the following snippet:
function caller() {
    callResult = callee(argument1, argument2);
}
In terms of the stack T, this is what happens (schematically):
// inside method caller
T.push(argument1);
T.push(argument2);
"call method callee"
// inside method callee
argument2 = T.pop();
argument1 = T.pop();
"do something"
T.push(result);
"return from method callee"
// inside method caller
callResult = T.pop();
This is pretty much what is happening in the second implementation: the stack is used explicitly. You can compare making the call to DFS in the first snippet with pushing a vertex on the stack, and compare executing the call to DFS in the first snippet with popping the vertex from the stack.
The first thing after popping vertex v from the stack is marking it as discovered. This is equivalent to marking it as discovered as a first step in the execution of DFS.
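For contrast, here is a sketch of the BFS counterpart (C++, adjacency-list representation assumed). BFS marks a vertex when it is enqueued; delaying the mark until dequeue would let the same vertex be enqueued many times and blow the queue up to O(E) entries:

#include <queue>
#include <vector>

void bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> discovered(adj.size(), false);
    std::queue<int> q;
    discovered[start] = true;   // mark on enqueue
    q.push(start);
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        // process v here
        for (int w : adj[v]) {
            if (!discovered[w]) {
                discovered[w] = true;   // mark on enqueue
                q.push(w);
            }
        }
    }
}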

Shortest branch in a binary tree?

A binary tree can be encoded using two functions l and r such that, for a node n, l(n) gives the left child of n and r(n) gives the right child of n.
A branch of a tree is a path from the root to a leaf; the length of a branch to a particular leaf is the number of arcs on the path from the root to that leaf.
Let MinBranch(l, r, x) be a simple recursive algorithm that takes a binary tree encoded by the l and r functions, together with the root node x of the tree, and returns the length of the shortest branch of the binary tree.
Give the pseudocode for this algorithm.
OK, so basically this is what I've come up with so far:
MinBranch(l, r, x)
{
    if x is None return 0
    left_one = MinBranch(l, r, l(x))
    right_one = MinBranch(l, r, r(x))
    return {min (left_one),(right_one)}
}
Obviously this isn't great or perfect. I'd be grateful if people can help me get this perfect and working - any help will be appreciated.
I doubt anyone will solve homework for you straight-up. A clue: the return value must surely grow higher as the tree gets bigger, right? However I don't see any numeric literals in your function except 0, and no addition operators either. How will you ever return larger numbers?
Another angle on the same issue: any time you write a recursive function, it helps to enumerate: "What are all the conditions where I should stop calling myself? What do I return in each circumstance?"
You're on the right approach, but you're not quite there; your recursive algorithm will always return 0. (The logic is almost right, though...)
Note that the length of the sub-branches is one less than the length of the branch, so left_one and right_one should be 1 + MinBranch....
Stepping through the algorithm with some sample trees will help uncover off-by-one errors like this one...
It looks like you almost have it, but consider this example:
    4
  3   5
When you trace through MinBranch, you'll see that in your MinBranch(l,r,4) call:

left_one = MinBranch(l, r, l(x))
         = MinBranch(l, r, l(4))
         = MinBranch(l, r, 3)
         = 0

That makes sense; after all, 3 is a leaf node, so of course the distance to the closest leaf node is 0. The same happens for right_one.
But you then wind up here:
return {min (left_one),(right_one)}
     = {min (0), (0) }
     = 0
but that's clearly wrong, because this node (4) is not a leaf node. Your code forgot to count the current node (oops!). I'm sure you can manage to fix that.
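For reference, a corrected version might look like the following sketch (written against child pointers rather than the l and r functions, and assuming x itself is never null). One subtlety the hints above don't cover: a node with a single child must not treat its missing child as a leaf at distance 1.

#include <algorithm>

struct Node { Node* left; Node* right; };

int minBranch(Node* x) {
    if (x->left == nullptr && x->right == nullptr)
        return 0;                        // a real leaf: no arcs below it
    if (x->left == nullptr)
        return 1 + minBranch(x->right);  // don't treat the null side as a leaf
    if (x->right == nullptr)
        return 1 + minBranch(x->left);
    return 1 + std::min(minBranch(x->left), minBranch(x->right));
}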
Now, actually, the way you're doing this isn't the fastest, but I'm not sure if that's relevant for this exercise. Consider this tree:
      4
    3   5
  2
1
Your algorithm will count up the left branch recursively, even though it could, hypothetically, bail out if it first counted the right branch and noted that 3 has a left child, so its branch is clearly longer than 5's (5 being a leaf). But, of course, counting the right branch first doesn't always work!
Instead, with more complicated code, and probably a tradeoff of greater memory usage, you can check nodes left-to-right, top-to-bottom (just like English reading order) and stop at the first leaf you find.
What you've created can be thought of as a depth-first search. However, given what you're after (the shortest branch), this may not be the most efficient approach. Think about how your algorithm would perform on a tree that was very heavy on the left side (of the root node), but had only one node on the right side.
Hint: consider a breadth-first search approach.
What you have there looks like a depth-first search algorithm, which will have to search the entire tree before coming up with a solution. What you need is a breadth-first search, which can return as soon as it finds a solution, without doing a complete search.
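A hedged sketch of that breadth-first idea (C++, pointer-based nodes assumed, root non-null): visit nodes level by level and return the depth of the first leaf encountered, so deep subtrees elsewhere are never explored.

#include <queue>
#include <utility>

struct Node { Node* left; Node* right; };

int minBranchBFS(Node* root) {
    std::queue<std::pair<Node*, int>> q;   // node together with its depth in arcs
    q.push({root, 0});
    while (!q.empty()) {
        Node* n = q.front().first;
        int depth = q.front().second;
        q.pop();
        if (n->left == nullptr && n->right == nullptr)
            return depth;                  // the first leaf dequeued is the shallowest
        if (n->left)  q.push({n->left,  depth + 1});
        if (n->right) q.push({n->right, depth + 1});
    }
    return 0;  // not reached for a non-null root
}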

Best algorithm to test if a linked list has a cycle

What's the best (halting) algorithm for determining if a linked list has a cycle in it?
[Edit] Analysis of asymptotic complexity for both time and space would be sweet so answers can be compared better.
[Edit] Original question was not addressing nodes with outdegree > 1, but there's some talk about it. That question is more along the lines of "Best algorithm to detect cycles in a directed graph".
Have two pointers iterating through the list; make one iterate through at twice the speed of the other, and compare their positions at each step. Off the top of my head, something like:
node* tortoise(begin), * hare(begin);
while (hare && (hare = hare->next))
{
    if (hare == tortoise) { throw std::logic_error("There's a cycle"); }
    hare = hare->next;  // hare may become null here; the loop condition guards it
    if (hare == tortoise) { throw std::logic_error("There's a cycle"); }
    tortoise = tortoise->next;
}
O(n), which is as good as you can get.
Precondition: keep track of the list size (update the size whenever a node is added or deleted).
Loop detection:
Keep a counter while traversing the list.
If the counter exceeds the list size, there is a cycle.
Complexity: O(n)
Note: the comparison between the counter and the list size, as well as the update of the list size, must be made thread-safe.
Take two pointers p and q and start traversing the linked list LL with both:
1) pointer p deletes the previous node each time and moves to the next node.
2) pointer q simply moves forward each time.
Conditions:
1) pointer p points to null and q points to some node: a loop is present.
2) both pointers point to null: there is no loop.
What about using a hash table to store the already-seen nodes (looking at them in order from the start of the list)? In practice, you could achieve something close to O(n).
Otherwise, using a sorted heap instead of a hash table would achieve O(n log n).
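A minimal sketch of that hashing idea (node type assumed), using a hash set of node addresses:

#include <unordered_set>

struct Node { Node* next; };

// O(n) expected time, O(n) extra space.
bool hasCycleHash(Node* head) {
    std::unordered_set<Node*> seen;
    for (Node* n = head; n != nullptr; n = n->next)
        if (!seen.insert(n).second)   // insert reports failure if already present
            return true;
    return false;
}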
I wonder if there's any other way than just going iteratively - populate an array as you step forwards, and check if the current node is already present in the array...
Konrad Rudolph's algorithm won't work if the cycle isn't pointing to the beginning. The following list will make it an infinite loop: 1->2->3->2.
DrPizza's algorithm is definitely the way to go.
In this case OysterD's code will be the fastest solution (vertex coloring)
That would really surprise me. My solution makes at most two passes through the list (if the last node is linked to the penultimate node), and in the common case (no loop) will make only one pass. With no hashing, no memory allocation, etc.
Yes. I've noticed that the formulation wasn't perfect and have rephrased it. I still believe that a clever hashing might perform faster – by a hair. I believe your algorithm is the best solution, though.
Just to underline my point: vertex coloring is used to detect cycles in dependencies by modern garbage collectors, so there is a very real use case for it. They mostly use bit flags to perform the coloring.
You will have to visit every node to determine this. It can be done recursively. To stop yourself visiting already-visited nodes, you need a flag that says "already visited". That alone doesn't give you the loops, though. So instead of a bit flag, use a number: start at 1, mark the connected nodes 2, and recurse until the network is covered. If, while checking nodes, you encounter a node whose number is more than one less than the current node's, then you have a cycle. The cycle length is given by the difference.
Two pointers are initialized at the head of the list. One pointer moves forward one node at each step, and the other moves forward two. If the faster pointer meets the slower one again, there is a loop in the list. Otherwise, there is no loop if the faster one reaches the end of the list.
The sample code below is implemented according to this solution. The faster pointer is pFast, and the slower one is pSlow.
bool HasLoop(ListNode* pHead)
{
    if (pHead == NULL)
        return false;

    ListNode* pSlow = pHead->m_pNext;
    if (pSlow == NULL)
        return false;

    ListNode* pFast = pSlow->m_pNext;
    while (pFast != NULL && pSlow != NULL)
    {
        if (pFast == pSlow)
            return true;

        pSlow = pSlow->m_pNext;

        pFast = pFast->m_pNext;
        if (pFast != NULL)
            pFast = pFast->m_pNext;
    }

    return false;
}
This solution is available on my blog. One more problem is discussed there: what is the entry node of the cycle/loop when a list contains one?
"Hack" solution (should work in C/C++):
Traverse the list, and set the last bit of next pointer to 1.
If find an element with flagged pointer -- return true and the first element of the cycle.
Before returning, reset pointers back, though i believe dereferencing will work even with flagged pointers.
Time complexity is 2n. Looks like it doesn't use any temporal variables.
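A sketch of that hack (node layout assumed; it relies on nodes being at least 2-byte aligned so the low bit of every next pointer is spare, and on exclusive access to the list while it runs):

#include <cstdint>

struct Node { int val; Node* next; };

bool hasCycleTagged(Node* head) {
    const std::uintptr_t TAG = 1;
    bool cycle = false;

    // Pass 1: tag each node's next pointer as we leave the node; arriving
    // at a node whose next is already tagged means we have been here before.
    Node* n = head;
    while (n != nullptr) {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(n->next);
        if (bits & TAG) { cycle = true; break; }
        n->next = reinterpret_cast<Node*>(bits | TAG);
        n = reinterpret_cast<Node*>(bits);   // follow the original, untagged pointer
    }

    // Pass 2: walk from the head again, clearing tags until the first
    // untagged pointer (the point where pass 1 stopped).
    n = head;
    while (n != nullptr) {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(n->next);
        if (!(bits & TAG)) break;            // already clean: restoration done
        n->next = reinterpret_cast<Node*>(bits & ~TAG);
        n = n->next;
    }
    return cycle;
}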
This is a solution using a hash table (just a list, actually) to save the pointer addresses.
def hash_cycle(node):
    hashl = []
    while node:
        if node in hashl:
            return True
        else:
            hashl.append(node)
            node = node.next
    return False
