Binary tree node keeping a reference to its parent

Is it 'traditional' (or 'ethical') for a node in a binary tree to keep a reference to its parent?
Normally, I would not think so, simply because a tree is a directed graph, so the fact that the PARENT --> CHILD link is defined should not mean that the CHILD --> PARENT link is also defined.
In other words, by keeping a reference to the parent, we would somehow break the semantics of the tree.
But I would like to know what people think.
I ask because I was given the problem of finding the lowest common ancestor of two given nodes in a tree. If each node has a reference to its parent, the problem would be super easy to solve, but that feels like cheating!
Thanks

How you implement a binary tree should be dependent on your needs.
If your application requires tree traversal in the direction of leaf to trunk, then the best way to do so would be to implement references to parent nodes.
I find that it is better to fit your data structures to your needs rather than try to make workarounds with other logic. After all, why must a tree be a directed graph? Making it directed is a specific implementation, much like a list and its specific implementation as a singly- or doubly-linked list.

It can still be a directed graph of ownership. Consider the following node:
    #include <memory>

    template <typename T>
    struct node
    {
        T data_;
        std::unique_ptr<node> left_child_;   // I own my children.
        std::unique_ptr<node> right_child_;
        node* parent_ = nullptr;             // Just lookin' at my parent (non-owning).
    };
As Steven Meyer said above, it's really not cheating: build the data structure to solve your problem, don't worry about the ethics of it :-)
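To illustrate why the parent link makes the original problem easy, here is a minimal sketch of a lowest-common-ancestor lookup. It assumes a node shaped like the one above (repeated so the block is self-contained); `depth` and `lowest_common_ancestor` are hypothetical helper names, and both arguments are assumed to be non-null.

```cpp
#include <cstddef>
#include <memory>

// Same shape as the node above, repeated so this sketch compiles on its own.
template <typename T>
struct node
{
    T data_;
    std::unique_ptr<node> left_child_;
    std::unique_ptr<node> right_child_;
    node* parent_ = nullptr;  // non-owning link back to the parent
};

// Hypothetical helper: number of parent links between n and the root.
template <typename T>
std::size_t depth(const node<T>* n)
{
    std::size_t d = 0;
    while (n->parent_ != nullptr) {
        n = n->parent_;
        ++d;
    }
    return d;
}

// Hypothetical helper: lowest common ancestor of a and b (both non-null).
// Returns nullptr if the two nodes do not share a root.
template <typename T>
const node<T>* lowest_common_ancestor(const node<T>* a, const node<T>* b)
{
    std::size_t da = depth(a);
    std::size_t db = depth(b);
    // Lift the deeper node until both are at the same depth.
    while (da > db) { a = a->parent_; --da; }
    while (db > da) { b = b->parent_; --db; }
    // Climb in lockstep until the two paths meet.
    while (a != b) {
        a = a->parent_;
        b = b->parent_;
    }
    return a;
}
```

The lookup takes time proportional to the depth of the deeper node and needs no extra storage beyond the one pointer per node.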

Cross-posting from the Software Engineering Stack Exchange.
The Wikipedia definition states "For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but in general, as a data structure, a given node only contains the list of its children but does not contain a reference to its parent (if any)."

Related

Node-Traversal Style Algorithm

Apologies if this is the wrong place to ask this kind of question but seeing as it was algorithm based, I felt it fit.
What I am trying to do is figure out or find out the name of an algorithm I can use to apply to the following...
Currently, this is what happens.
I have a node tree that can have any number of starting nodes. (Not exactly a node tree, but the best analogy I could come up with).
These nodes then branch off into any number of other nodes.
Resources are added to these nodes and associate themselves with all child nodes recursively.
Now the problem arises when I want to get all associated nodes for a particular resource. Getting all of them is fine, but not desirable. What I really want to do is only retrieve the top-most node for each association.
Edit
This is being done in JS using Sails as a framework with MySQL datasource.
    {
      name: 'Some Node Name',
      children: [],  // Array of child nodes
      parent: 1,     // Id of the parent node, or null if it is top-level
      resources: []  // Array of associated resources
    }
If there is already an algorithm that tackles this, I'd appreciate the direction.
Thanks.
If I understood the question correctly, the nodes you call top-level are those that are not referenced by any other node; in graph-theory terms, these are the nodes with in-degree zero. From the description of the problem, these are the ones where parent is null.

When to use parent pointers in trees?

There are many problems in which we need to find the parents or ancestors of a node in a tree repeatedly. In those scenarios, instead of finding the parent node at run time, a less complicated approach seems to be using parent pointers. This is time efficient but increases space usage. Can anyone suggest in which kinds of problems or scenarios it is advisable to use parent pointers in a tree?
For example, finding the distance between two nodes of a tree?
"... using parent pointers. This is time efficient but increases space usage."
A classic trade-off in computer science.
"In which kinds of problems or scenarios is it advisable to use parent pointers in a tree?"
In cases where finding the parents at run time would cost much more than storing pointers to the parents.
Now, one has to understand what cost means. You mentioned the trade-off yourself: one should consider whether it is worth spending some extra memory on the pointers in order to speed up the program.
Here are some scenarios I can think of where having a parent pointer saved in a node could help improve our time complexity:
-> Ancestors of a given node in a binary tree
-> Union Find Algorithm
-> Maintain collection of disjoint sets
-> Merge two sets together
In general, I think having a parent pointer makes top-down or bottom-up traversal easier for any kind of tree or trie problem.
Hope this helps!
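The union-find items in the list above are a case where the parent link is the only link stored. A minimal sketch, assuming elements are identified by indices `0..n-1`; the class name `disjoint_sets` and its members are illustrative, not taken from any particular library:

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set forest: each element stores only the index of its parent.
class disjoint_sets
{
public:
    explicit disjoint_sets(std::size_t n) : parent_(n), size_(n, 1)
    {
        // Every element starts out as the root of its own one-element set.
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Follow parent links up to the root, halving the path along the way.
    std::size_t find(std::size_t x)
    {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];
            x = parent_[x];
        }
        return x;
    }

    // Merge the sets containing a and b.
    // Returns false if they were already in the same set.
    bool unite(std::size_t a, std::size_t b)
    {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;  // attach the smaller tree under the larger one
        size_[a] += size_[b];
        return true;
    }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
};
```

Nodes never need to know their children here: both operations only ever walk from a node toward its root.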
As a generalized answer: cases where you need efficient bottom-up traversal outside the context of a top-to-bottom traversal.
As a concrete example, let's say you have a graphics software which uses a quad-tree to efficiently draw only elements on screen and let users select elements efficiently that they click on or marquee select.
However, after the users select some elements, they can then delete them. Deleting those elements would require the quad-tree to be updated in a bottom-up sort of fashion, updating parent nodes in response to leaf nodes becoming empty. But the elements we want to delete are stored in a different selection list data structure. We didn't arrive at the elements to delete through a top-to-bottom tree traversal.
In that case it might not only be a lot simpler to implement but also computationally efficient to store pointers/indices from child to parent, and possibly even element to leaf, since we're updating the tree in response to activity that occurred at the leaves in a bottom-up fashion. Otherwise you'd have to work from top to bottom and then back up again somehow, and the removal of such elements would have to be done centrally through the tree working in a top-to-bottom-and-back-up-again fashion.
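A rough sketch of that bottom-up update, under simplifying assumptions: `tree_node` and `element` are hypothetical types, a real quad-tree node would also carry bounds and at most four children, and elements are owned elsewhere (for example by the scene or the selection list).

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct tree_node;

// Hypothetical element type: remembers which leaf currently holds it.
struct element
{
    int id = 0;
    tree_node* leaf = nullptr;  // element-to-leaf link
};

// Hypothetical spatial-tree node with a child-to-parent link.
struct tree_node
{
    tree_node* parent = nullptr;                      // non-owning
    std::vector<std::unique_ptr<tree_node>> children; // owning
    std::vector<element*> elements;                   // non-owning
};

// Remove e from its leaf, then walk upward and delete every node that has
// become empty. The root (the node without a parent) is never deleted.
void remove_element(element& e)
{
    tree_node* n = e.leaf;
    if (n == nullptr) return;

    auto& held = n->elements;
    held.erase(std::remove(held.begin(), held.end(), &e), held.end());
    e.leaf = nullptr;

    while (n->parent != nullptr && n->elements.empty() && n->children.empty()) {
        tree_node* p = n->parent;
        auto& siblings = p->children;
        auto it = std::find_if(siblings.begin(), siblings.end(),
                               [n](const std::unique_ptr<tree_node>& c) {
                                   return c.get() == n;
                               });
        if (it != siblings.end()) siblings.erase(it);  // destroys n
        n = p;
    }
}
```

The caller starts from the element it already has in hand (say, from the selection list) and never touches the root unless the pruning reaches it.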
To me the most useful cases I've found would be cases where the tree needs to update as a result of activity occurring in the leaves from the "outside world", so to speak, not in the middle of descending down the tree, and often involving two or more data structures, not just that one tree itself.
Another example: say you have a GUI widget which, upon being clicked, minimizes its parent widget. We don't descend the tree to determine which widget was clicked; we use another data structure for that, such as a spatial hash. So we want to get from the child widget to its parent widget, but since we didn't arrive at the child through a top-down traversal of the GUI hierarchy, we don't have the parent readily available, e.g. on a stack. We arrived at the clicked child through a spatial query into a different data structure. In that case we can avoid working our way down from the root to the child's parent if the child simply stores its parent.

Is there a name for this data structure that is kind of "opposite" of a tree?

We all know what a tree is: on the first level of a tree we have a root, and from the root come branches that are trees as well. But what do I call the "opposite" structure: on the i-th level we have a set of "leaf" nodes, those nodes form groups of one or more nodes, and each group points to a "trunk" node on the (i+1)-th level. If you want a visual example, imagine raindrops flowing down a window and combining as they collide.
A lot of tree data structures are actually constructed from leaf to root, and can be stored to allow for going one or both directions.
I don't think it really has a special name as it's more a convention than a requirement for trees typically to go from root to leaf rather than the other way or both ways. Also there are a number of tree data structures that allow for going both ways.
Every tree is a DAG (a directed acyclic graph), and so is the data structure that you describe. What you describe is also a multitree, a special kind of DAG. Possibly there is a more precise proper subset of multitrees that describes your graph, but I am not aware of it. Hope this helps.

Can a graph node maintain a list of references to its parents?

I have a DAG implementation that works perfectly for my needs. I'm using it as an internal structure for one of my projects. Recently, I came across a use case where if I modify the attribute of a node, I need to propagate that attribute up to its parents and all the way up to the root. Each node in my DAG currently has an adjacency list that is basically just a list of references to the node's children. However, if I need to propagate changes to the parents of this node (and this node can have multiple parents), I will need a list of references to parent nodes.
Is this acceptable? Or is there a better way of doing this? Does it make sense to maintain two lists (one for parents and one for children)? I thought of adding the parents to the same adjacency list but this will give me cycles (i.e., parent->child and child->parent) for every parent-child relationship.
It's never necessary to store parent pointers in each node, but doing so can make things run a lot faster because you know exactly where to look in order to find the parents. In your case it's perfectly reasonable.
As an analogy, many implementations of binary search trees store parent pointers so that they can more easily support rotations (which need access to the parent) or deletions (where the parent node may need to be known). Similarly, some more complex data structures like Fibonacci heaps use parent pointers in each node in order to implement the decrease-key operation more efficiently.
The memory overhead for storing a list of parents isn't going to be too bad - you're essentially now double-counting each edge: each parent stores a pointer to its child and each child stores a pointer to its parent.
Hope this helps!
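A minimal sketch of the two-list approach, assuming nodes are owned elsewhere and that a boolean flag stands in for the attribute being propagated; `dag_node`, `add_edge` and `propagate_up` are hypothetical names:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical DAG node: every edge is recorded twice, once per direction.
struct dag_node
{
    std::string name;
    bool dirty = false;               // stand-in for the propagated attribute
    std::vector<dag_node*> children;  // non-owning
    std::vector<dag_node*> parents;   // non-owning
};

// Record the edge parent -> child in both lists so they stay consistent.
void add_edge(dag_node& parent, dag_node& child)
{
    parent.children.push_back(&child);
    child.parents.push_back(&parent);
}

// Set the flag on start and on every ancestor reachable via parent links.
// The visited set ensures a shared ancestor is processed only once.
void propagate_up(dag_node& start)
{
    std::unordered_set<dag_node*> visited{&start};
    std::vector<dag_node*> stack{&start};
    while (!stack.empty()) {
        dag_node* n = stack.back();
        stack.pop_back();
        n->dirty = true;
        for (dag_node* p : n->parents) {
            if (visited.insert(p).second) stack.push_back(p);
        }
    }
}
```

Keeping the parents in their own list leaves the child adjacency list untouched, so traversals over `children` still see an acyclic graph.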

Efficient way to recursively calculate dominator tree?

I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph where there are millions of nodes. The algorithm is quite complex and I have to admit I haven't taken the time to fully understand it, I'm just using it. Now I have a need to calculate the dominator trees of the direct children of the root node and possibly recurse down the graph to a certain depth repeating this operation. I.e. when I calculate the dominator tree for a child of the root node I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW, just as an aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for, my FindRoots Java heap analyzer, is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
boost::lengauer_tarjan_dominator_tree_without_dfs might help.
Judging by the lack of comments, I guess there aren't many people on Stack Overflow with the relevant experience to help you. I'm one of those people, but I don't want such an interesting question to go down with a dull thud, so I'll try to lend a hand.
My first thought is that if this kind of graph is generated by compilers, would it be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem?
My second thought is that the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with that node. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you're constructing this tree, you'd start at the root and work your way out to the leaf nodes. For each node, you'd store the result of the computation for all the ancestors thus far. That way, you should only ever have to look at the parent node and the current node's data to compute the value for your new node.
I hope that helps!
Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.
I do not fully understand your question, but it seems to me you want some kind of incremental update feature. A while ago I researched which algorithms exist, and it seemed to me that there is no known way to do this quickly for large graphs (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)
