When to use parent pointers in trees?

There are many problems in which we repeatedly need to find the parent or ancestors of a node in a tree. In those scenarios, instead of finding the parent node at run-time, a less complicated approach seems to be using parent pointers. This is time-efficient but increases space. Can anyone suggest in which kinds of problems or scenarios it is advisable to use parent pointers in a tree?
For example: finding the distance between two nodes of a tree?

using parent pointers. This is time-efficient but increases space.
A classic trade-off in Computer Science.
In which kinds of problems or scenarios is it advisable to use parent pointers in a tree?
In cases where finding the parents at runtime would cost much more than having pointers to them.
Now, one has to understand what cost means. You mentioned the trade-off yourself: one should consider whether it is worth spending some extra memory to store the pointers in order to speed up the program.

Here are some of the scenarios I can think of where having a parent pointer saved in each node could help improve our time complexity:
-> Ancestors of a given node in a binary tree
-> Union Find Algorithm
-> Maintain collection of disjoint sets
-> Merge two sets together
In general, having a parent pointer in any kind of tree or trie problem makes both top-down and bottom-up traversal easier.
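To illustrate the union-find scenario above, here is a minimal sketch (the naming is my own; it uses path halving and union by rank) in which the parent pointers essentially *are* the whole data structure:

```python
class DisjointSet:
    """Union-find: each element stores a parent index; roots point to themselves."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Follow parent pointers to the root, compressing the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Note that union-find only ever walks *up* via parent pointers; child links are never needed at all, which makes it the extreme case of this trade-off.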
Hope this helps!

As a generalized answer: just those cases where you need efficient bottom-up traversal outside the context of a top-to-bottom traversal.
As a concrete example, let's say you have graphics software which uses a quad-tree to efficiently draw only the elements on screen, and to let users efficiently select the elements they click on or marquee-select.
However, after the users select some elements, they can then delete them. Deleting those elements would require the quad-tree to be updated in a bottom-up sort of fashion, updating parent nodes in response to leaf nodes becoming empty. But the elements we want to delete are stored in a different selection list data structure. We didn't arrive at the elements to delete through a top-to-bottom tree traversal.
In that case it might be not only a lot simpler to implement but also computationally efficient to store pointers/indices from child to parent, and possibly even from element to leaf, since we're updating the tree in response to activity that occurred at the leaves. Otherwise you'd have to work from top to bottom and then back up again somehow: the removal of such elements would have to be routed centrally through the tree, working in a top-to-bottom-and-back-up-again fashion.
To me the most useful cases I've found would be cases where the tree needs to update as a result of activity occurring in the leaves from the "outside world", so to speak, not in the middle of descending down the tree, and often involving two or more data structures, not just that one tree itself.
Another example: say you have a GUI widget which, upon being clicked, minimizes its parent widget. But we don't descend the tree to determine which widget was clicked; we use another data structure for that, like a spatial hash. So in that case we want to get from the child widget to the parent widget, but since we didn't arrive at the child widget through a top-down traversal of the GUI hierarchy, we don't have the parent widget readily available on a stack. We arrived at the clicked widget through a spatial query into a different data structure. Here we can avoid working our way down from the root to the child's parent if the child simply stores its parent.
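A minimal sketch of that widget scenario (class and method names are hypothetical): the click handler reaches the parent directly through a stored pointer, with no root-down traversal:

```python
class Widget:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # pointer back up the hierarchy
        self.minimized = False
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def on_click(self):
        # We arrived here via a spatial query, not a root-down traversal,
        # so the parent pointer is the only cheap way back up.
        if self.parent is not None:
            self.parent.minimized = True

root = Widget("window")
button = Widget("minimize-button", parent=root)
button.on_click()   # minimizes root without touching the rest of the tree
```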

Related

Rope and self-balancing binary tree hybrid? (i.e. sorted set with fast n-th element lookup)

Is there a data structure for a sorted set that allows quick lookup of the n-th (i.e. the n-th least) item? That is, something like a hybrid between a rope and a red-black tree.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations or do something else clever and I'm hoping someone smart has already worked this out.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations […]
Yes, this is quite possible; but instead of keeping track of the size of the left subtree, it's a bit simpler to keep track of the size of the complete subtree rooted at a given node. (You can then get the size of its left subtree by examining its left-child's size.) It's not as tricky as you might think, because you can always re-calculate a node's size as long as its children are up-to-date, so you don't need any extra bookkeeping beyond making sure that you recalculate sizes by working your way up the tree.
Note that, in most mutable red-black tree implementations, 'put' and 'delete' stop walking back up the tree once they've restored the invariants, whereas with this approach you need to walk all the way back up the tree in all cases. That'll be a small performance hit, but at least it's not hard to implement. (In purely functional red-black tree implementations, even that isn't a problem, because those always have to walk the full path back up to create the new parent nodes. So you can just put the size-calculation in the constructor — very simple.)
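As a sketch of the approach described above (my own naming, and deliberately not a full red-black tree): each node stores the size of its complete subtree, and a rank lookup compares the requested index against the left child's size:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Size of the complete subtree rooted here; recomputable from
        # the children alone, so no extra bookkeeping is needed.
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(node, k):
    """Return the k-th smallest key (0-based) in the subtree rooted at node."""
    left_size = node.left.size if node.left else 0
    if k < left_size:
        return select(node.left, k)
    if k == left_size:
        return node.key
    return select(node.right, k - left_size - 1)
```

Each `select` call descends one level, so the lookup costs O(height), i.e. O(log n) in a balanced tree.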
Edited in response to your comment:
I was vaguely hoping this data structure already had a name, so I could just find some implementations out there, and that there was something clever one could do to minimize the updating. But while I can find plenty of papers on data structures that are variations of balanced binary trees, I can't figure out a good search term for papers that let one look up the n-th least element.
The fancy term for the nth smallest value in a collection is order statistic; so a tree structure that enables fast lookup by order statistic is called an order statistic tree. That second link includes some references that may help you — not sure, I haven't looked at them — but regardless, that should give you some good search terms. :-)
Yes, this is fully possible. Self-balancing tree algorithms do not actually need to be search trees, that is simply the typical presentation. The actual requirement is that nodes be ordered in some fashion (which a rope provides).
What is required is to update the tree weight on insert and erase. Rotations do not require a full update; a local one is enough. For example, a left rotation requires that the weight of the parent be added to the new parent (since that new parent is the old parent's right child, there is no need to walk down the new parent's right subtree, which is unchanged by the rotation). Similarly, for a right rotation it is necessary to subtract only the weight of the new parent, since the new parent's right subtree will become the left subtree of the old parent.
I suppose it would be possible to create an insert that updates the weight as it does rotations then adds the weight up any remaining ancestors but I didn't bother when I was solving this problem. I simply added the new node's weight all the way up the tree then did rotations as needed. Similarly for erase, I did the fix-up rotations then subtracted the weight of the node being removed before finally unhooking the node from the tree.
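Here is a rough sketch of the local update during a left rotation, with subtree size playing the role of weight (names are my own). Only the two nodes involved are recomputed, children first, since everything below them is unchanged:

```python
def size(n):
    return n.size if n is not None else 0

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)

def rotate_left(p):
    """Left-rotate around p; only p and the new subtree root q need
    their sizes recomputed -- the rotation moves whole subtrees intact."""
    q = p.right
    p.right = q.left
    q.left = p
    p.size = 1 + size(p.left) + size(p.right)   # p first: it is now q's child
    q.size = 1 + size(q.left) + size(q.right)
    return q   # new subtree root
```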

Why nodes of a binary tree have links only from parent to children?

Why do nodes of a binary tree have links only from parent to children? I know that there are threaded binary trees, but those are harder to implement. A binary tree with two-way links would allow traversal in both directions iteratively, without a stack or queue.
I do not know of any such design. If there is one please let me know.
Edit1: Let me conjure a problem for this: I want to do traversal without recursion and without using extra memory in the form of a stack or queue.
PS: I am afraid that I am going to get flak and downvotes for this stupid question.
Some binary trees do require children to keep track of their parent, or even their grandparent, e.g. splay trees; however, that is only to balance or splay the tree. The reason we normally traverse a tree from parent to children is that we are usually searching for a specific node, and as long as the binary tree is implemented so that all left children are less than the parent and all right children are greater than the parent (or vice versa), we only need links in one direction to find that node. We start the search at the root and iterate down, and if the node is in the tree, we are guaranteed to find it. If we started at a leaf, there would be no guarantee of finding the node we want by going back up to the root. The reason we don't have links from the child to the parent is simply that they are unnecessary for searches. Hope this helps.
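A minimal sketch of that one-directional search (names are my own): the ordering invariant tells us which child link to follow, so no upward links are ever consulted:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(node, key):
    """Iterative top-down search: child links alone are enough,
    because the BST invariant decides the direction at every step."""
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None
```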
It can be done; however, we should consider the balance between memory usage and complexity.
Yes, you can traverse the binary tree with an extra link in each node, but then you are using the same extra memory as you would traversing with a queue, which even runs faster.
What a binary search tree is good at is implementing many searching problems in O(log N). It's fast enough and memory-saving.
Let me conjure a problem for this. I want to do traversal without recursion and without using extra memory in the form of a stack or queue.
Have you considered that the parent pointers in the tree occupy space themselves?
They add O(N) memory to the tree to store the parent pointers, in order to avoid using O(log N) space during recursion.
What parent pointers allow us to do is to support an API whereby the caller can pass a pointer to a node and request an operation on it like "find the next node in order" (for example).
In this situation, we do not have a stack which holds the path to the root; we just receive a node "out of the blue" from the caller. With parent pointers, given a tree node, we can find its successor in amortized constant time O(1).
Implementations which don't require this functionality can save space by not including the parent pointers in the tree, and using recursion or an explicit stack structure for the root to leaf traversals.
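A sketch of the successor operation described above (my own naming): given a node received "out of the blue", we climb via parent pointers instead of keeping an ancestor stack:

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = self.right = None
        self.parent = parent

def successor(node):
    """In-order successor of an arbitrary node, with no ancestor stack."""
    if node.right is not None:
        node = node.right            # successor is the leftmost node
        while node.left is not None:  # of the right subtree
            node = node.left
        return node
    # Otherwise climb until we come up from a left child;
    # that parent is the next node in order (None if we were the maximum).
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent
```

A full in-order walk made of repeated `successor` calls touches each edge at most twice, which is where the amortized O(1) per step comes from.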

Can a graph node maintain a list of references to its parents?

I have a DAG implementation that works perfectly for my needs. I'm using it as an internal structure for one of my projects. Recently, I came across a use case where if I modify the attribute of a node, I need to propagate that attribute up to its parents and all the way up to the root. Each node in my DAG currently has an adjacency list that is basically just a list of references to the node's children. However, if I need to propagate changes to the parents of this node (and this node can have multiple parents), I will need a list of references to parent nodes.
Is this acceptable? Or is there a better way of doing this? Does it make sense to maintain two lists (one for parents and one for children)? I thought of adding the parents to the same adjacency list but this will give me cycles (i.e., parent->child and child->parent) for every parent-child relationship.
It's never necessary to store parent pointers in each node, but doing so can make things run a lot faster because you know exactly where to look in order to find the parents. In your case it's perfectly reasonable.
As an analogy: many implementations of binary search trees store parent pointers so that they can more easily support rotations (which need access to the parent) or deletions (where the parent node may need to be known). Similarly, some more complex data structures like Fibonacci heaps use parent pointers in each node in order to implement the decrease-key operation more efficiently.
The memory overhead for storing a list of parents isn't going to be too bad - you're essentially now double-counting each edge: each parent stores a pointer to its child and each child stores a pointer to its parent.
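A minimal sketch of the two-list approach (names are hypothetical). The seen-set matters because with multiple parents the same ancestor can be reached along several paths, and we only want to visit it once:

```python
class DagNode:
    def __init__(self, value=0):
        self.value = value
        self.children = []
        self.parents = []   # second adjacency list, pointing upward

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

def propagate_up(node, visit):
    """Apply `visit` to every ancestor of `node`, each exactly once."""
    seen = set()
    stack = list(node.parents)
    while stack:
        cur = stack.pop()
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        visit(cur)
        stack.extend(cur.parents)
```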
Hope this helps!

Red-Black trees - Erasing a node with two non-leaf children

I've been implementing my own version of a red-black tree, mostly basing my algorithms on Wikipedia (http://en.wikipedia.org/wiki/Red-black_tree). It's fairly concise for the most part, but there's one part that I would like clarification on.
When erasing a node from the tree that has 2 non-leaf (non-NULL) children, it says to move either side's children into the deletable node, and remove that child.
I'm a little confused as to which side to remove from, based on that. Do I pick the side randomly, do I alternate between sides, or do I stick to the same side for every future deletion?
If you have no prior knowledge about your input data, you cannot know which side would be the more beneficial choice for the new intermediate node or the new child.
You can therefore just apply whichever rule suits you best (i.e. is easiest to write/compute -- probably "always take the left one"). Employing a random scheme typically just introduces more unneeded computation.
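A sketch of such a fixed-side rule ("always take the left one"), i.e. always replacing with the in-order predecessor (names are my own):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def deletion_victim(node):
    """For a node with two children: pick the in-order predecessor,
    the rightmost node of the left subtree. Its key is copied into
    `node`, and the predecessor -- which has at most one child --
    is the node actually unlinked from the tree."""
    cur = node.left
    while cur.right is not None:
        cur = cur.right
    return cur
```

Since the predecessor has no right child by construction, the subsequent unlink is the simple one-child case, which is exactly why the two-child case is reduced this way.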

Efficient way to recursively calculate dominator tree?

I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph with millions of nodes. The algorithm is quite complex, and I have to admit I haven't taken the time to fully understand it; I'm just using it. Now I need to calculate the dominator trees of the direct children of the root node, and possibly recurse down the graph to a certain depth, repeating this operation. I.e., when I calculate the dominator tree for a child of the root node, I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW, just as an aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
boost::lengauer_tarjan_dominator_tree_without_dfs might help.
Judging by the lack of comments, I guess there aren't many people on Stack Overflow with the relevant experience to help you. I'm one of those people, but I don't want such an interesting question to go down with a dull thud, so I'll try and lend a hand.
My first thought is that if this graph is generated by other compilers, it might be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem.
My second thought is that the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with it. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you construct this tree, you start at the root and work your way out to the leaf nodes. For each node, you store the result of the computation for all of its ancestry so far. That way, you should only ever have to look at the parent node and the current node's data to compute the value for your new node.
I hope that helps!
Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.
I do not fully understand your question, but it seems to me you want some kind of incremental update feature. I researched a while ago what algorithms are out there, but it seemed that there is no known way to do this quickly for large graphs (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)
