What are some true Iterative Depth First Search implementation suggestions? - algorithm

So from what I am seeing I have been thaught like most people that the iterative version of DFS is just like iterative BFS besides two differences: replace queue with stack and mark a node as discovered after POP not after PUSH.
Two questions have really been puzzling me recently:
For certain cases it will result in different output, but is it really necessary to mark a node as visited after we POP? Why is it wrong to do it after we PUSH? As far as I can see this will result in the same consequence as to why we do it for BFS...to not have duplicates in our queue/stack.
Now the big one: My impression is that this kind of implementation of iterative DFS is not true DFS at all: If we think about the recursive version it is quite space efficient since it doesnt store all the possible neighbours at one level(as we would do in the iterative version), it only selects one and goes with it and then it backtracks and goes for a second one. As an extreme example think of a graph with one node in the center and conected to 100 leaf nodes. In the recursive implementation if we start from the middle node the underlying stack will grow to maximum 2 ... one for the middle node and one for every leaf it visits. If we do it as we have been thaught with the iterative version the stack will grow to 1000 elements. That doesnt seem right.
So with all those previous details in mind, my question is what would the approach be to have a true iterative DFS implementation?

Question 1: If you check+mark before the push, it uses less space but changes the order in which nodes are visited. You will visit everything, though.
Question 2: You are correct that an iterative DFS will usually put all the children of a node onto the stack at the same time. This increases the space used for some graphs, but it doesn't change the worst case space usage, and it's the easiest way so there's usually no reason to change that.
Occasionally you know that it will save a lot of space if you don't do this, and then you can write an iterative DFS that works more like the recursive one. Instead of pushing the next nodes to visit on the stack, you push a parent and a position in its list of children, or equivalent, which is pretty much what the recursive version has to remember when it recurses. In pseudo-code, it looks like this:
func DFS(start):
let visited = EMPTY_SET
let stack = EMPTY_STACK
visited.add(start)
visit(start)
stack.push( (start,0) )
while(!stack.isEmpty()):
let (parent, pos) = stack.pop()
if (pos < parent.numChildren()):
let child = parent.child[pos]
stack.push(parent,pos+1)
if (!visited.contains(child)):
visited.add(child)
visit(child)
stack.push( (child,0) )
You can see that it's a little more complicated, and the records you push on the stack are tuples, which is annoying in some cases. Often we'll use two stacks in parallel instead of creating tuples to push, or we'll push/pop two records at a time, depending on how nodes and child list positions have to be represented.

Related

Is there a name for this BFS/DFS/IDDFS-like algorithm?

Essentially, it is a depth-first search that stops at a certain depth or cost. For example, it may DFS all nodes within 10 edges from the source, then 20, then 30. The difference is that rather than starting the DFS from scratch after each iteration, I store the "perimeter" of the searched area (a list of nodes) when each iteration of the search reaches its limits.
On the next iteration, i loop through all nodes on the perimeter, performing a DFS from each node, again to a fixed depth/cost before stopping, again recording down the perimeter of the searched area for the next iteration to start from.
The reason I am doing this is because my graph (which is a tree) is split into a set of logical "chunks", each of which must be fully explored before its child-chunks can start being explored. There are a large number of nodes but only a small number of chunks; I am essentially doing a chunk-by-chunk BFS, which each individual chunk (comprising a large number of individual nodes) being fully explored by its own mini-DFS.
Now, I just completely made this up on the spot to solve my problem, and it does, but is there anything like this in the literature? I haven't managed to find anything, but I'm sure someone else has done this before, and properly analysed its performance, asymptotic behavior, disadvantages, bugs, etc.. In that case, I would want to know about it.
I do not now of a name for this mixed type. I have used something similar too, but I don't think it is used very frequently and has a name. Often, the other algorithms make more sense:
If you want to advance slowly in chunks, Why don't you use a BFS?
Often a DFS is preferred because there you get the full traces. Furthermore, doing an iterative deepening DFS is simpler then your algorithm, just twice as time consuming, and requires much less memory.

Algorithm Question.. Linked List

Scenario is as follows:-
I want to reverse the direction of the singly linked list, In other words, after the reversal all pointers should now point backwards..
Well the algorithm should take linear time.
The solution that i have thought of using another datastructure A Stack.. With the help of which the singly linked list would be easily reversed, with all pointers pointing backwards.. But i am in doubt, that whether the following implementation yeild linear time complexity.. Please comment on this.. And if any other efficient algorithm is in place, then please discuss..
Thanks.
You could do it like this: As long as there are nodes in the input list, remove its first node and insert it at the beginning of the output list:
node* reverse(node *in) {
out = NULL;
while (in) {
node = in;
in = in->next;
node->next = out;
out = node;
}
return out;
}
2 times O(N) = O(2*n) is still O(N). So first push N elements and then popping N elements from a stack is indeed linear in time, as you expected.
See also the section Multiplication by a Constant on the "Big O Notation" wikipedia entry.
If you put all of the nodes of your linked list in a stack, it will run in linear time, as you simply traverse the nodes on the stack backwards.
However, I don't think you need a stack. All you need to remember is the node you were just at, to reverse the pointer of the current node. Make note of the next node before you reverse the pointer at this node.
The previous answers have and already (and rightly) mentioned that the solution using pointer manipulation and the solution using stack are both O(n).
The remaining question is to compare the real run time (machine cycle complexity) performance of the two different implementations of the reverse() function.
I expect that the following two aspects might be relevant:
The stack implementation. Does it
require the maximum stack depth to
be explicitly specified? If so, how is that specified? If not, how
the stack does memory management as
the size grows arbitrarily large?
I guess that nodes have to be copied
from list to stack. [Is there a way
without copying?] In that case, the
copy complexity of the node needs to
be accounted for. Thats because the
size of the node can be
(arbitrarily) large.
Given these, in place reversal by manipulating pointers seems more attractive to me.
For a list of size n, you call n times push and n times pop, both of which are O(1) operations, so the whole operation is O(n).
You can use a stack to achieve a O(n) implementation. But the recursive solution IS using a stack (THE stack)! And, like all recursive algorithms, it is equivalent to looping. However, in this case, using recursion or an explicit stack would create a space complexity of O(n) which is completely unnecessary.

Determining the maximum stack depth

Imagine I have a stack-based toy language that comes with the operations Push, Pop, Jump and If.
I have a program and its input is the toy language. For instance I get the sequence
Push 1
Push 1
Pop
Pop
In that case the maximum stack would be 2. A more complicated example would use branches.
Push 1
Push true
If .success
Pop
Jump .continue
.success:
Push 1
Push 1
Pop
Pop
Pop
.continue:
In this case the maximum stack would be 3. However it is not possible to get the maximum stack by walking top to bottom as shown in this case since it would result in a stack-underflow error actually.
CFGs to the rescue you can build a graph and walk every possible path of the basic blocks you have. However since the number of paths can grow quickly for n vertices you get (n-1)! possible paths.
My current approach is to simplify the graph as much as possible and to have less possible paths. This works but I would consider it ugly. Is there a better (read: faster) way to attack this problem? I am fine if the algorithm produces a stack depth that is not optimal. If the correct stack size is m then my only constraint is that the result n is n >= m. Is there maybe a greedy algorithm available that would produce a good result here?
Update: I am aware of cycles and the invariant that all controlf flow merges have the same stack depth. I thought I write down a simple toy-like language to illustrate the issue. Basically I have a deterministic stack-based language (JVM bytecode), so each operation has a known stack-delta.
Please note that I do have a working solution to this problem that produces good results (simplified cfg) but I am looking for a better/faster approach.
Given that your language doesn't seem to have any user input all programs will simply compute in the same way all the time. Therefore, you could execute the program and keep track of the maximum stacksize during execution. Probably not what you want though.
As for your path argumentation: Be aware, that jumping allows cycles, hence, without further analysis a cycle might imply non-termination and stack overflows (i.e. stack size is increased after each cycle execution). [n nodes still means infinitely many paths if there is a cycle]
Instead of actual execution of the code you might be able to do some form of abstract interpretation.
Regarding the comment from IVlad: Simply counting the pushs is wrong due to the existence of possible cycles.
I am not sure what the semantics of your if-statements is though, so this could be useful too: Assume that an if-statement's label can only be a forward label (i.e., you can never jump back in your code). In that case your path counting argument comes back to life. In effect, the resulting CFG will be a tree (or DAG if you don't copy code). In that case you could do an approximative count, by a bottom-up computation of the number of pushs and then taking the maximum number of pushs for both branches in case of an if-statement. It's still not the optimal correct result, but yields a better approximation than a simple count of push-statements.
You generally want to have the stack depth invariant over jumps and loops.
That means that for every node, every incoming edge should have the same stack depth. This simplifies walking the CFG significantly, because backedges can no longer change the stack depth of already calculated instructions.
This also is requirement for bounded stack depth. If not enforced, you will have increasing loops in your code.
Another thing you should consider is making the stack effect of all opcodes deterministic. An example of a nondeterministic opcode would be: POP IF TopOfStack == 0.
Edit:
If you do have a deterministic set of opcodes and the stack depth invariant, there is no need to visit every possible path of the program. It's enough to do a DFS/BFS through the CFG to determine the maximum stack depth. This can be done in linear time (depending on the amount of instructions), but not faster.
Evaluating if the basic blocks, at which the outgoing edges of your current basic block point, still need to be evaluated should not be performance relevant. Even in the worst case, every instruction is an IF, there will be only 2*N edges to evaluate.

What's the difference between backtracking and depth first search?

What's the difference between backtracking and depth first search?
Backtracking is a more general purpose algorithm.
Depth-First search is a specific form of backtracking related to searching tree structures. From Wikipedia:
One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
It uses backtracking as part of its means of working with a tree, but is limited to a tree structure.
Backtracking, though, can be used on any type of structure where portions of the domain can be eliminated - whether or not it is a logical tree. The Wiki example uses a chessboard and a specific problem - you can look at a specific move, and eliminate it, then backtrack to the next possible move, eliminate it, etc.
I think this answer to another related question offers more insights.
For me, the difference between backtracking and DFS is that backtracking handles an implicit tree and DFS deals with an explicit one. This seems trivial, but it means a lot. When the search space of a problem is visited by backtracking, the implicit tree gets traversed and pruned in the middle of it. Yet for DFS, the tree/graph it deals with is explicitly constructed and unacceptable cases have already been thrown, i.e. pruned, away before any search is done.
So, backtracking is DFS for implicit tree, while DFS is backtracking without pruning.
IMHO, most of the answers are either largely imprecise and/or without any reference to verify. So let me share a very clear explanation with a reference.
First, DFS is a general graph traversal (and search) algorithm. So it can be applied to any graph (or even forest). Tree is a special kind of Graph, so DFS works for tree as well. In essence, let’s stop saying it works only for a tree, or the likes.
Based on [1], Backtracking is a special kind of DFS used mainly for space (memory) saving. The distinction that I’m about to mention may seem confusing since in Graph algorithms of such kind we’re so used to having adjacency list representations and using iterative pattern for visiting all immediate neighbors (for tree it is the immediate children) of a node, we often ignore that a bad implementation of get_all_immediate_neighbors may cause a difference in memory uses of the underlying algorithm.
Further, if a graph node has branching factor b, and diameter h (for a tree this is the tree height), if we store all immediate neighbors at each steps of visiting a node, memory requirements will be big-O(bh). However, if we take only a single (immediate) neighbor at a time and expand it, then the memory complexity reduces to big-O(h). While the former kind of implementation is termed as DFS, the latter kind is called Backtracking.
Now you see, if you’re working with high level languages, most likely you’re actually using Backtracking in the guise of DFS. Moreover, keeping track of visited nodes for a very large problem set could be really memory intensive; calling for a careful design of get_all_immediate_neighbors (or algorithms that can handle revisiting a node without getting into infinite loops).
[1] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Ed
According to Donald Knuth, it's the same.
Here is the link on his paper about Dancing Links algorithm, which is used to solve such "non-tree" problems as N-queens and Sudoku solver.
Backtracking, also called depth-first search
Backtracking is usually implemented as DFS plus search pruning. You traverse search space tree depth-first constructing partial solutions along the way. Brute-force DFS can construct all search outcomes, even the ones, that do not make sense practically. This can be also very inefficient to construct all solutions (n! or 2^n). So in reality as you do DFS, you need to also prune partial solutions, which do not make sense in context of the actual task, and focus on the partial solutions, which can lead to valid optimal solutions. This is the actual backtracking technique - you discard partial solutions as early as possible, make a step back and try to find local optimum again.
Nothing stops to traverse search space tree using BFS and execute backtracking strategy along the way, but it doesn't make sense in practice because you would need to store search state layer by layer in the queue, and tree width grows exponentially to the height, so we would waste a lot of space very quickly. That's why trees are usually traversed using DFS. In this case search state is stored in the stack (call stack or explicit structure) and it can't exceed tree height.
Usually, a depth-first-search is a way of iterating through an actual graph/tree structure looking for a value, whereas backtracking is iterating through a problem space looking for a solution. Backtracking is a more general algorithm that doesn't necessarily even relate to trees.
I would say, DFS is the special form of backtracking; backtracking is the general form of DFS.
If we extend DFS to general problems, we can call it backtracking.
If we use backtracking to tree/graph related problems, we can call it DFS.
They carry the same idea in algorithmic aspect.
DFS describes the way in which you want to explore or traverse a graph. It focuses on the concept of going as deep as possible given the choice.
Backtracking, while usually implemented via DFS, focuses more on the concept of pruning unpromising search subspaces as early as possible.
IMO, on any specific node of the backtracking, you try to depth firstly branching into each of its children, but before you branch into any of the child node, you need to "wipe out" previous child's state(this step is equivalent to back walk to the parent node). In other words, each siblings state should not impact each other. On the contrary, during normal DFS algorithm, you don't usually have this constraint, you don't need to wipe out(back track) previous siblings state in order to construct next sibling node.
Depth first is an algorithm for traversing or searching a tree. See here. Backtracking is a much more broad term that is used whereever a solution candidate is formed and later discarded by backtracking to a former state. See here. Depth first search uses backtracking to search a branch first (solution candidate) and if not successful search the other branch(es).
Depth first search(DFS) and backtracking are different searching and traversing algorithms. DFS is more broad and is used in both graph and tree data structure while DFS is limited to tree. With that being said,it does not mean DFS can't be used in graph. It is used in graph as well but only produce spanning tree, a tree without loop(multiple edge starting and ending at same vertex). That is why it is limited to tree.
Now coming back to backtracking, DFS use backtracking algorithm in tree data structure so in tree, DFS and backtracking are similar.
Thus,we can say, they are in same in tree data structure whereas in graph data structure, they are not same.
Idea - Start from any point, check if its the desired endpoint, if yes then we found a solution else goes to all next possible positions and if we can't go further then return to the previous position and look for other alternatives marking that current path wont lead us to the solution.
Now backtracking and DFS are 2 different names given to the same idea applied on 2 different abstract data types.
If the idea is applied on matrix data structure we call it backtracking.
If the same idea is applied on tree or graph then we call it DFS.
The cliche here is that a matrix could be converted to a graph and a graph could be transformed to a matrix. So, we actually apply the idea. If on a graph then we call it DFS and if on a matrix then we call it backtracking.
The idea in both of the algorithm is same.
The way I look at DFS vs. Backtracking is that backtracking is much more powerful. DFS helps me answer whether a node exists in a tree, while backtracking can help me answer all paths between 2 nodes.
Note the difference: DFS visits a node and marks it as visited since we are primarily searching, so seeing things once is sufficient. Backtracking visits a node multiple times as it's a course correction, hence the name backtracking.
Most backtracking problems involve:
def dfs(node, visited):
visited.add(node)
for child in node.children:
dfs(child, visited)
visited.remove(node) # this is the key difference that enables course correction and makes your dfs a backtracking recursion.
In a depth-first search, you start at the root of the tree and then explore as far along each branch, then you backtrack to each subsequent parent node and traverse it's children
Backtracking is a generalised term for starting at the end of a goal, and incrementally moving backwards, gradually building a solution.
Backtracking is just depth first search with specific termination conditions.
Consider walking through a maze where for each step you make a decision, that decision is a call to the call stack (which conducts your depth first search)... if you reach the end, you can return your path. However, if you reach a deadend, you want to return out of a certain decision, in essence returning out of a function on your call stack.
So when I think of backtracking I care about
State
Decisions
Base Cases (Termination Conditions)
I explain it in my video on backtracking here.
An analysis of backtracking code is below. In this backtracking code I want all of the combinations that will result in a certain sum or target. Therefore, I have 3 decisions which make calls to my call stack, at each decision I either can pick a number as part of my path to reach the target num, skip that number, or pick it and pick it again. And then if I reach a termination condition, my backtracking step is just to return. Returning is the backtracking step because it gets out of that call on the call stack.
class Solution:
"""
Approach: Backtracking
State
-candidates
-index
-target
Decisions
-pick one --> call func changing state: index + 1, target - candidates[index], path + [candidates[index]]
-pick one again --> call func changing state: index, target - candidates[index], path + [candidates[index]]
-skip one --> call func changing state: index + 1, target, path
Base Cases (Termination Conditions)
-if target == 0 and path not in ret
append path to ret
-if target < 0:
return # backtrack
"""
def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
"""
#desc find all unique combos summing to target
#args
#arg1 candidates, list of ints
#arg2 target, an int
#ret ret, list of lists
"""
if not candidates or min(candidates) > target: return []
ret = []
self.dfs(candidates, 0, target, [], ret)
return ret
def dfs(self, nums, index, target, path, ret):
if target == 0 and path not in ret:
ret.append(path)
return #backtracking
elif target < 0 or index >= len(nums):
return #backtracking
# for i in range(index, len(nums)):
# self.dfs(nums, i, target-nums[i], path+[nums[i]], ret)
pick_one = self.dfs(nums, index + 1, target - nums[index], path + [nums[index]], ret)
pick_one_again = self.dfs(nums, index, target - nums[index], path + [nums[index]], ret)
skip_one = self.dfs(nums, index + 1, target, path, ret)
In my opinion, the difference is pruning of tree. Backtracking can stop (finish) searching certain branch in the middle by checking the given conditions (if the condition is not met). However, in DFS, you have to reach to the leaf node of the branch to figure out if the condition is met or not, so you cannot stop searching certain branch until you reach to its leaf nodes.
The difference is: Backtracking is a concept of how an algorithm works, DFS (depth first search) is an actual algorithm that bases on backtracking. DFS essentially is backtracking (it is searching a tree using backtracking) but not every algorithm based on backtracking is DFS.
To add a comparison: Backtracking is a concept like divide and conqueror. QuickSort would be an algorithm based on the concept of divide and conqueror.

In place min-max tree invalidation problems

I’m trying to build a parallel implementation of a min-max search. My current approach is to materialize the tree to a small depth and then do the normal thing from each of these nodes.
The simple way to do this is to compute the heuristic value for each leaf and then sweep up and compute the min/max. The problem is that it this omits alpha/beta pruning at the upper levels and makes for a major performance hit.
My first “solution” to that was to push the min/max up after each leaf is computed. This gives updating value so I can scan up the tree and check if a leaf should be pruned.
The problem is that it's totally broken. (2 days of debugging to notice that, darn I feel stupid)
Now for the question:
Is there a way to build a min-max tree that allows the leafs to be evaluated in random order and also allows alpha/beta pruning?
Check out parallel game tree search, e.g. this paper.
I think I have found a solution but I don't like it in a few regards:
Annotate the tree with the number of unfinished children.
After a leaf is evaluated, update it's parent, decrement the parent's count
If that count just reached zero, update it's parent, decrement that count
Lather, rise, repeat
Alpha/beta pruning works as expected.
The problems with this is that with random order evaluation, a lot more nodes will get evaluated before stuff starts getting pruned. On the other hand, that might be mitigated by better ordering of the leafs.

Resources