Multi-Root Tree Structure - data-structures

I have a data such that there are many parents each with 0-n children where each child can have 0-n nodes. Each node has a unique identifier (key) Ultimately, the parents are not connected to each other. It seems like this would be a list of trees, however that seems imprecise. I was thinking of joining them with a dummy root.
I need to be able to assembly a list of nodes that occur:
from any given node down (children)
from any given node down (children) then up to the root (up to the specific parent)
the top level parent of any given node (in an O(n) operation)
the level of the child in the tree (in an O(n) operation)
The structure will contain 300,000 nodes.
I was thinking perhaps I could implement a List of Trees and then also maintain a hash lookup structure that will reference a specific key value to provide me with a node as a starting point.
Is this a logical structure? Is there a better way to handle it? It seems crude to me.

If you are concerned in find a root node quickly you can think of create a tree where each node points to another tree.

Related

How to check if two binary trees share a node

Given an array of binary trees find whether any two trees share a node, not value wise, but "pointer" wise. At the bottom I provided an example.
My approach was to iterate through all the trees and store all the leaves (pointers) from each tree into a list, then check if list has any duplicates, but that's a rather slow approach. Is there perhaps a quicker way to solve this?
In the worst case you will have to traverse all nodes (all pointers) to find a shared node (pointer), as it might happen to be the last one visited. So the best time complexity we can expect to have is O(𝑚+𝑛) where 𝑚 and 𝑛 represent the number of nodes in either tree.
We can achieve this time complexity if we store the pointers from the first tree in a hash set and then traverse the pointers of the second tree to see if any of those is in the set. Assuming that get/set operations on a hash set have an amortized constant time complexity, the overal time complexity will be O(𝑚+𝑛).
If the same program is responsible for constructing the trees, then a reuse of the same node can be detected upon insertion. For instance, reuse of the same node in multiple trees can be completely avoided by having the insert method of your tree only take a value as argument, never a node instance. The method will then encapsulate the actual creation of the node, guaranteeing its uniqueness.
An idea for O(#nodes) time and O(1) space. It does more traversal work than simple traversals using a hash table, but it doesn't have the cost of using a hash table. I don't know what's better. Might depend on the language.
For two trees
Create one extra node. Do a Morris traversal of the first tree. It only modifies right child pointers, so we can use left child pointers for marking nodes as seen. For every tree node without left child, set our extra node as left child. Whenever checking a left child pointer, treat our extra node like a null pointer, i.e., don't visit it. After the traversal, the tree structure is restored, and all originally left-child-less tree nodes now point to our extra node as left child. That includes all leaf nodes.
Do a Morris traversal of the second tree. Again treat pointers to our extra node like null pointers. If we ever do encounter our extra node, we know the trees share a node. If not, then we know the trees don't share a node, since if they did share any, they'd also share a leaf node (just go down from any shared node to a leaf node, that's also shared), and all leafs nodes of the first tree are marked. After the traversal, the second tree is restored.
Do a Morris traversal of the first tree again, this time removing our extra node, restoring the original null pointers.
For an array of more than two trees
Mark the first tree as above. Check the second tree as above. Mark the second tree. Check the third. Mark the third. Check the fourth. Mark the fourth. Etc. When you found a shared node or there are no more trees, unmark the marked trees.
Every shared node must have two parents, or an ancestor with two parents.
LOOP over nodes
IF node has two parents
MARK node as shared
Mark all descendants as shared.

Can a graph node maintain a list of references to its parents?

I have a DAG implementation that works perfectly for my needs. I'm using it as an internal structure for one of my projects. Recently, I came across a use case where if I modify the attribute of a node, I need to propagate that attribute up to its parents and all the way up to the root. Each node in my DAG currently has an adjacency list that is basically just a list of references to the node's children. However, if I need to propagate changes to the parents of this node (and this node can have multiple parents), I will need a list of references to parent nodes.
Is this acceptable? Or is there a better way of doing this? Does it make sense to maintain two lists (one for parents and one for children)? I thought of adding the parents to the same adjacency list but this will give me cycles (i.e., parent->child and child->parent) for every parent-child relationship.
It's never necessary to store parent pointers in each node, but doing so can make things run a lot faster because you know exactly where to look in order to find the parents. In your case it's perfectly reasonable.
As an analogy - many implementations of binary search trees will store parent pointers so that they can more easily support rotations (which needs access to the parent) or deletions (where the parent node may need to be known). Similarly, some more complex data structures like Fibonacci heaps use parent pointers in each node in order to more efficiently implement the decrease-key operation.
The memory overhead for storing a list of parents isn't going to be too bad - you're essentially now double-counting each edge: each parent stores a pointer to its child and each child stores a pointer to its parent.
Hope this helps!

Tree traversal - starting at leaves with only parent pointers?

Is it conceptually possible to have a tree where you traverse it by starting at a given leaf node (rather than the root node) and use parent pointers to get to the root?
I ask this since I saw someone implement a tree and they used an array to hold all of the leaf nodes/external nodes and each of the leaf/external nodes point only to their parent nodes and those parent point to their parent node etc. until you get to the root node which has no parents. Their implementation thus would require you to start at one of the leaves to get to anywhere in the tree and you would cannot go "down" the tree since her tree nodes do not have any children pointers, only parent pointers.
I found this implementation interesting since I haven't seen anything like it but I was curious if it could still be considered it a "tree". I have never seen a tree where you start traversal at the leaves, instead of root. I have also never seen a tree where the tree nodes only have parent pointers and no children pointers.
Yep, this structure exists. It's often called a spaghetti stack.
Spaghetti stacks are useful for representing the "is a part of" relation. For example, if you want to represent a class hierarchy in a way that makes upcasting efficient, then you might represent the class hierarchy as a spaghetti stack in which the node for each type stores a pointer to its parent type. That way, it's easy to find whether an upcast is valid by just walking upward from the node.
They're also often used in compilers to track scoping information. Each node represents one scope in the program, and each node has a pointer to the node representing the scope one level above it.
You can also think of a blockchain this way. Each block stores a backreference to its parent block. By starting at any block and tracing backwards to the root, you can recover the state encoded by that block.
Hope this helps!
If an array A of leaf nodes are given, traversal is possible. If only a single leaf node is given, I don't know how to traverse the tree. Pseudocode:
// initial step
add all nodes in A to a queue Q
//removeNode(Q) returns pointer to node at front of Q
while((node = removeNode(Q)) != NULL)
/* do your operation on node */
add node->parent to Q

Find a loop in a binary tree

How to find a loop in a binary tree? I am looking for a solution other than marking the visited nodes as visited or doing a address hashing. Any ideas?
Suppose you have a binary tree but you don't trust it and you think it might be a graph, the general case will dictate to remember the visited nodes. It is, somewhat, the same algorithm to construct a minimum spanning tree from a graph and this means the space and time complexity will be an issue.
Another approach would be to consider the data you save in the tree. Consider you have numbers of hashes so you can compare.
A pseudocode would test for this conditions:
Every node would have to have a maximum of 2 children and 1 parent (max 3 connections). More then 3 connections => not a binary tree.
The parent must not be a child.
If a node has two children, then the left child has a smaller value than the parent and the right child has a bigger value. So considering this, if a leaf, or inner node has as a child some node on a higher level (like parent's parent) you can determine a loop based on the values. If a child is a right node then it's value must be bigger then it's parent but if that child forms a loop, it means he is from the left part or the right part of the parent.
3.a. So if it is from the left part then it's value is smaller than it's sibling. So => not a binary tree. The idea is somewhat the same for the other part.
Testing aside, in what form is the tree that you want to test? Remeber that every node has a pointer to it's parent. An this pointer points to a single parent. So depending of the format you tree is in, you can take advantage from this.
As mentioned already: A tree does not (by definition) contain cycles (loops).
To test if your directed graph contains cycles (references to nodes already added to the tree) you can iterate trough the tree and add each node to a visited-list (or the hash of it if you rather prefer) and check each new node if it is in the list.
Plenty of algorithms for cycle-detection in graphs are just a google-search away.

Are there any tools to find duplicate sections in a tree data structure?

I'm looking for a tool that finds duplicate nodes in a tree data structure (using Freemind to map the data structure, but I'll settle for anything I can export a generic data tree out too...)
The idea is that I can break the tree down into modules which I can repeat thus simplifying the structure of the tree.
I would just have a table of subtrees.
Walk the tree depth-first. At each node, after visiting sub-nodes, if there is an equivalent node in the table, replace the current node with the one in the table. If there is not an equivalent node in the table, then add the current node to the table.
Does that do it? I believe it's called common-subexpression-elimination.
Wouldn't it actually be better to prevent duplicate nodes in a tree? Why do you need duplicate nodes in a tree?

Resources