Differences between graph traversal methods - data-structures

I know of the famous BFS and DFS methods for tree traversal.
I have also heard of bottom-up traversal as well as top-down traversal when visiting an AST (related to compilers).
I cannot understand the relationship between these two sets. Are they referring to the same methods?

For the first part of your question, see this answer. For the second part: bottom-up and top-down usually refer to styles of recursion, so they describe recursive calls rather than trees as such. When the caller passes an intermediate result down to the callee, it is top-down; when the caller calls the callee for a result that the caller will then use, it is bottom-up.
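A minimal OCaml sketch of the two styles, using list summation (the function names are illustrative):

(* Top down: the caller hands the intermediate result (acc) down to the callee. *)
let rec sum_top_down acc = function
  | [] -> acc
  | x :: rest -> sum_top_down (acc + x) rest

(* Bottom up: the caller calls the callee for a result, then uses it on the way back up. *)
let rec sum_bottom_up = function
  | [] -> 0
  | x :: rest -> x + sum_bottom_up rest

Both sum_top_down 0 [1; 2; 3] and sum_bottom_up [1; 2; 3] return 6; only the direction in which intermediate results flow differs.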

Related

When to use parent pointers in trees?

There are many problems in which we need to find the parents or ancestors of a node in a tree repeatedly. So, in those scenarios, instead of finding the parent node at run-time, a less complicated approach seems to be using parent pointers. This is time efficient but increases space. Can anyone suggest in which kinds of problems or scenarios it is advisable to use parent pointers in a tree?
For example - distance between two nodes of a tree?
using parent pointers. This is time efficient but increases space.
A classic trade-off in Computer Science.
In which kinds of problems or scenarios is it advisable to use parent pointers in a tree?
In cases where finding the parents at runtime would cost much more than having pointers to the parents.
Now, one has to understand what cost means. You mentioned the trade-off yourself: one should consider whether it is worth spending some extra memory to store the pointers in order to speed up the program.
Here are some of the scenarios I can think of where having a parent pointer saved in a node could help improve our time complexity:
-> Ancestors of a given node in a binary tree
-> Union-Find algorithm (maintaining a collection of disjoint sets; merging two sets together)
In general, having a parent pointer in any kind of tree or trie problem makes top-down or bottom-up traversal easier.
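For instance, with a parent pointer stored in each node, collecting the ancestors of a node is a simple upward walk. A minimal OCaml sketch, assuming a hypothetical node type with a mutable parent field:

type 'a node = {
  value : 'a;
  mutable parent : 'a node option;
}

(* Walk from a node up to the root in O(depth), with no search from the root. *)
let rec ancestors n =
  match n.parent with
  | None -> []
  | Some p -> p :: ancestors p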
Hope this helps!
As a generalized answer: cases where you need efficient bottom-up traversal outside the context of a top-to-bottom traversal.
As a concrete example, let's say you have graphics software that uses a quad-tree to efficiently draw only the elements on screen and to let users efficiently select elements they click on or marquee-select.
However, after the users select some elements, they can then delete them. Deleting those elements would require the quad-tree to be updated in a bottom-up sort of fashion, updating parent nodes in response to leaf nodes becoming empty. But the elements we want to delete are stored in a different selection list data structure. We didn't arrive at the elements to delete through a top-to-bottom tree traversal.
In that case it might not only be a lot simpler to implement but also computationally efficient to store pointers/indices from child to parent, and possibly even from element to leaf, since we're updating the tree in response to activity that occurred at the leaves. Otherwise the removal of such elements would have to be routed centrally through the tree, working in a top-to-bottom-and-back-up-again fashion.
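A hedged OCaml sketch of that bottom-up update, assuming each node stores an element count and a parent pointer (the names here are hypothetical, not taken from any particular quad-tree implementation):

type qnode = {
  mutable count : int;
  mutable parent : qnode option;
}

(* After removing an element at a leaf, walk upward adjusting counts;
   the root-to-leaf path is never traversed. *)
let rec propagate_removal node =
  node.count <- node.count - 1;
  match node.parent with
  | None -> ()
  | Some p -> propagate_removal p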
To me the most useful cases I've found would be cases where the tree needs to update as a result of activity occurring in the leaves from the "outside world", so to speak, not in the middle of descending down the tree, and often involving two or more data structures, not just that one tree itself.
Another example: say you have a GUI widget which, upon being clicked, minimizes its parent widget. But we don't descend down the tree to determine which widget is clicked; we use another data structure for that, like a spatial hash. So in that case we want to get from child widget to parent widget, but we didn't arrive at the child widget through a top-down traversal of the GUI hierarchy, so we don't have the parent widget readily available on a stack. We arrived at the clicked child widget through a spatial query into a different data structure. In that case we can avoid working our way down from the root to the child's parent if the child simply stores its parent.
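A small OCaml sketch of that scenario (the widget type and field names are hypothetical, and the spatial lookup that finds the clicked widget is elided):

type widget = {
  mutable minimized : bool;
  mutable parent : widget option;
}

(* Called with the widget found via a spatial query, not a tree descent;
   the stored parent pointer gets us to the parent directly. *)
let on_click w =
  match w.parent with
  | None -> ()
  | Some p -> p.minimized <- true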

Why do we have to use depth-first traversal for a parse tree?

During my learning of parsing technology, it seems the parse tree is always traversed in a depth-first manner.
The leftmost derivation corresponds to a preorder traversal of the parse tree, while the rightmost derivation corresponds to the reverse of a postorder traversal of the parse tree. [1]
And pre-order and post-order traversals are just two specific types of depth-first tree traversal. [2]
I think the reason lies in the difference between a plain tree and a parse tree. A plain tree only records the topological structure among nodes, while a parse tree records more than that. A parse tree further implies that a parent node is built upon its child nodes, because a parent node derives into a collection of child nodes. If we want to compute the root node of the parse tree, which is the ultimate goal of creating a parse tree, we have to compute all of its prerequisites first. So a depth-first traversal is naturally required.
Is my understanding correct? Or is there any other scenario where other ways of traversal of a parse tree are necessary/mandatory?
You are considering only two of the possible parsing strategies: top-down left-to-right parsing, and bottom-up left-to-right parsing. Those are the two most popular strategies, to be sure. But they are not the only possibilities.
Each of these two strategies corresponds to one parse tree traverse, as the text you quote indicates. And the two traverses are both depth-first, in effect because the two parse strategies are both left-to-right. [Note 1]
Many other parse strategies are available, and they would correspond to other tree traverses. You could, for example, attempt to parse the text by starting in the middle somewhere (say, at a point where you were for some reason certain of the parse, perhaps because you are within all possible parenthetic groupings) and work outwards from there in some manner determined by your parsing algorithm. This strategy is certainly possible, and there is even a certain amount of literature about it (possibly not very current) because it makes sense in the context of doing partial parses of incorrect texts, for example for diagnostic purposes or syntax-highlighted display.
Even if you perform a left-to-right parse, you don't need to choose between top-down and bottom-up parsing. Before the LALR algorithm was discovered, there was quite a bit of investigation of "left corner" (LC) parsing, which switches between top-down and bottom-up parsing at the point where it becomes convenient to do so (the "corner"). The derivation so produced is neither leftmost nor rightmost, and it is hard to characterize the corresponding traverse (as per my footnote), although I think that a reasonable characterization would still result in a depth-first correspondence because the algorithm is still left-to-right.
In all cases, once the parse tree (or abstract syntax tree) has been constructed, you are free to traverse it in any fashion you like, and different semantic analysis algorithms perform different types of traverses. In an optimizing multi-pass compiler, you would expect to find a huge variety of different tree traverses, some depth-first, some breadth-first, and some which bounce around as necessary.
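For example, once the AST exists, a post-order (depth-first) walk is only one option among many. A minimal OCaml sketch of a toy arithmetic AST evaluated children-first, mirroring the idea that a parse-tree node is built from its children:

type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Post-order evaluation: evaluate the children, then combine at the parent. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b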
Notes:
I'm not sure whether the word "traverse" is really accurate here. The parse tree is not really being traversed, as such, since it doesn't yet exist; it is being constructed. The top-down strategy can be viewed as a depth-first preorder traverse of a tree which magically springs into existence during the traverse.
On the other hand, the bottom-up strategy starts at the leftmost leaf node, and proceeds to deduce the traverse which arrived at that point, which is why the quoted text calls it "the reverse" of a traverse. Is that really a meaningful concept? It is meaningful as a description of the final result, certainly, but it doesn't really correspond to any intuitive sense of the word "traverse". If you were travelling to London, you couldn't start your trip at the point where you make the final exit from the M40.

Why do nodes of a binary tree have links only from parent to children?

Why do nodes of a binary tree have links only from parent to children? I know that there are threaded binary trees, but those are harder to implement. A binary tree with two-way links would allow traversal in both directions iteratively, without a stack or queue.
I do not know of any such design. If there is one, please let me know.
Edit 1: Let me conjure up a problem for this: I want to do traversal without recursion and without using extra memory in the form of a stack or queue.
PS: I am afraid that I am going to get flak and downvotes for this stupid question.
Some binary trees do require children to keep a link to their parent, or even their grandparent, e.g. splay trees. However, this is only used to balance or splay the tree. The reason we only traverse a tree from parent to children is that we are usually searching for a specific node, and as long as the binary tree is implemented such that all left children are less than the parent and all right children are greater than the parent (or vice versa), we only need links in one direction to find that node. We start the search at the root and iterate down, and if the node is in the tree, we are guaranteed to find it. If we started at a leaf, there would be no guarantee of finding the node we want by going back toward the root. The reason we don't have links from child to parent is that they are unnecessary for searches. Hope this helps.
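A minimal OCaml sketch of why downward links suffice for search, using an ordinary (unbalanced) binary search tree:

type bst = Leaf | Branch of bst * int * bst

(* Start at the root; the ordering invariant tells us which child link
   to follow, so parent links are never consulted. *)
let rec mem x = function
  | Leaf -> false
  | Branch (l, v, r) ->
    if x = v then true
    else if x < v then mem x l
    else mem x r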
It can be done; however, we should consider the balance between memory usage and complexity.
Yes, you can traverse the binary tree with an extra link in each node, but you are then using about the same extra memory as a traversal with a queue, which may even run faster.
What a binary search tree is good at is solving many searching problems in O(log N). It's fast enough and memory-efficient.
Let me conjure up a problem for this: I want to do traversal without recursion and without using extra memory in the form of a stack or queue.
Have you considered that the parent pointers in the tree occupy space themselves?
They add O(N) memory to the tree to store parent pointers in order not to use O(log N) space during recursion.
What parent pointers allow us to do is to support an API whereby the caller can pass a pointer to a node and request an operation on it like "find the next node in order" (for example).
In this situation, we do not have a stack which holds the path to the root; we just receive a node "out of the blue" from the caller. With parent pointers, given a tree node, we can find its successor in amortized constant time O(1).
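A hedged OCaml sketch of such a successor operation, assuming a hypothetical node type with mutable left, right, and parent links:

type 'a bnode = {
  value : 'a;
  mutable left : 'a bnode option;
  mutable right : 'a bnode option;
  mutable parent : 'a bnode option;
}

let rec leftmost n =
  match n.left with
  | None -> n
  | Some l -> leftmost l

(* In-order successor via parent pointers: no stack, no recursion from
   the root; amortized O(1) per call over a full traversal. *)
let successor n =
  match n.right with
  | Some r -> Some (leftmost r)
  | None ->
    let rec climb child = function
      | None -> None
      | Some p ->
        (match p.left with
         | Some l when l == child -> Some p
         | _ -> climb p p.parent)
    in
    climb n n.parent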
Implementations which don't require this functionality can save space by not including the parent pointers in the tree, and using recursion or an explicit stack structure for the root to leaf traversals.

How to prevent cycles when using a purely functional depth-first search

I have a graph that is implemented as a list of edges connecting arbitrary nodes, with the data types defined below.
type edge = int * int;;
type graph = edge list;;
How would I perform a purely functional depth-first search while avoiding getting stuck on a cycle? I am not quite sure of how to keep track of all the nodes visited while remaining purely functional. The answer is probably something trivial that I am not conceptually grasping for some reason.
The search function has a parameter that tracks visited nodes. In FP one of the insights is that you can keep calling deeper and deeper (with tail calls). So you can pass the parameter along through all the calls, adding new nodes as you go.
Another parameter could be the nodes you plan to visit later. For DFS this would work like a stack.
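A sketch under the question's own types (the helper names are mine), threading the visited list through every call so that cycles terminate:

(* All nodes reachable from n by a single edge. *)
let neighbours (g : graph) n =
  List.filter_map (fun (a, b) -> if a = n then Some b else None) g

(* Purely functional DFS: the visited accumulator is passed along each
   recursive call, so no mutation is needed and revisits are skipped. *)
let dfs (g : graph) start =
  let rec go visited node =
    if List.mem node visited then visited
    else List.fold_left go (visited @ [node]) (neighbours g node)
  in
  go [] start

dfs returns the visited nodes in discovery order; the linear List.mem lookups could be replaced by a Set module for larger graphs.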

Efficient way to recursively calculate dominator tree?

I'm using the Lengauer-Tarjan algorithm with path compression to calculate the dominator tree for a graph with millions of nodes. The algorithm is quite complex, and I have to admit I haven't taken the time to fully understand it; I'm just using it. Now I need to calculate the dominator trees of the direct children of the root node, and possibly recurse down the graph to a certain depth, repeating this operation. I.e., when I calculate the dominator tree for a child of the root node, I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW just as aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
boost::lengauer_tarjan_dominator_tree_without_dfs might help.
Judging by the lack of comments, I guess there aren't many people on Stack Overflow with the relevant experience to help you. I'm one of those people, but I don't want such an interesting question to go down with a dull thud, so I'll try and lend a hand.
My first thought is that, if this graph is generated by a compiler, it might be worth taking a look at an open-source compiler like GCC to see how it solves this problem.
My second thought is that the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with it. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you construct this tree, you start at the root and work your way out to the leaf nodes. For each node, you store the result of the computation for all the ancestry thus far. That way, you should only ever have to look at the parent node and the current node's data to compute the value for the new node.
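A hedged OCaml sketch of that wrapper idea, using depth as a stand-in for whatever ancestry-derived data would actually be precomputed (all names here are illustrative):

type tree = Node of int * tree list

type wrapped = {
  node : int;
  depth : int;                 (* stand-in for data derived from the ancestry *)
  children : wrapped list;
}

(* Rebuild top-down: each wrapper's data depends only on its parent's
   data and the node itself, so nothing is recomputed twice. *)
let rec wrap d (Node (n, kids)) =
  { node = n; depth = d; children = List.map (wrap (d + 1)) kids }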
I hope that helps!
Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.
I do not fully understand your question, but it seems to me you want some kind of incremental update feature. I researched a while ago what algorithms are out there, but it seemed to me that there is no known way to do this quickly for large graphs (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)
