Hashing a Tree Structure - algorithm

I've just come across a scenario in my project where I need to compare different tree objects for equality with already known instances, and it occurred to me that some sort of hashing algorithm that operates on an arbitrary tree would be very useful.
Take for example the following tree:
        O
       / \
      /   \
     O     O
    /|\    |
   / | \   |
  O  O  O  O
          / \
         /   \
        O     O
Where each O represents a node of the tree, is an arbitrary object, and has an associated hash function. So the problem reduces to: given the hash codes of the nodes of a tree structure, and a known structure, what is a decent algorithm for computing a (relatively) collision-free hash code for the entire tree?
A few notes on the properties of the hash function:
The hash function should depend on the hash code of every node within the tree as well as its position.
Reordering the children of a node should distinctly change the resulting hash code.
Reflecting any part of the tree should distinctly change the resulting hash code
If it helps, I'm using C# 4.0 here in my project, though I'm primarily looking for a theoretical solution, so pseudo-code, a description, or code in another imperative language would be fine.
UPDATE
Well, here's my own proposed solution. It has been helped much by several of the answers here.
Each node (sub-tree/leaf node) has the following hash function:
public override int GetHashCode()
{
    int hashCode = unchecked((this.Symbol.GetHashCode() * 31 +
        this.Value.GetHashCode()));
    for (int i = 0; i < this.Children.Count; i++)
        hashCode = unchecked(hashCode * 31 + this.Children[i].GetHashCode());
    return hashCode;
}
The nice thing about this method, as I see it, is that hash codes can be cached and only recalculated when the node or one of its descendants changes. (Thanks to vatine and Jason Orendorff for pointing this out).
Anyway, I would be grateful if people could comment on my suggested solution here - if it does the job well, then great, otherwise any possible improvements would be welcome.

If I were to do this, I'd probably do something like the following:
For each leaf node, compute the concatenation of 0 and the hash of the node data.
For each internal node, compute the concatenation of 1 and the hash of any local data (NB: may not be applicable) and the hash of the children from left to right.
This will lead to a cascade up the tree every time you change anything, but that MAY be low enough of an overhead to be worthwhile. If changes are relatively infrequent compared to the number of hash lookups, it may even make sense to go for a cryptographically secure hash.
Edit1: There is also the possibility of adding a "hash valid" flag to each node and simply propagating a "false" up the tree (or a "hash invalid" flag and propagating "true") on a node change. That way, it may be possible to avoid a complete recalculation when the tree hash is needed and possibly avoid multiple hash calculations that are not used, at the risk of a slightly less predictable time to get a hash when needed.
Edit3: The hash code suggested by Noldorin in the question looks like it would have a chance of collisions, if the result of GetHashCode can ever be 0. Essentially, there is no way of distinguishing a tree composed of a single node, with "symbol hash" 30 and "value hash" 25 and a two-node tree, where the root has a "symbol hash" of 0 and a "value hash" of 30 and the child node has a total hash of 25. The examples are entirely invented, I don't know what expected hash ranges are so I can only comment on what I see in the presented code.
Using 31 as the multiplicative constant is good, in that it will cause any overflow to happen on a non-bit boundary, although I am thinking that, with sufficient children and possibly adversarial content in the tree, the hash contribution from items hashed early MAY be dominated by later hashed items.
However, if the hash performs decently on expected data, it looks as if it will do the job. It's certainly faster than using a cryptographic hash (as done in the example code listed below).
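The collision described in Edit3 can be checked with a few lines of Python. This is only a sketch: `combine` is a hypothetical stand-in for the `GetHashCode` in the question, and the 32-bit mask emulates C# `unchecked` integer overflow (the bit pattern is the same, although C# would report it as a signed value).

```python
def combine(symbol_hash, value_hash, child_hashes=()):
    """Mimic the question's GetHashCode with 32-bit wraparound."""
    h = (symbol_hash * 31 + value_hash) & 0xFFFFFFFF
    for c in child_hashes:
        h = (h * 31 + c) & 0xFFFFFFFF
    return h

# One node: symbol hash 30, value hash 25.
single = combine(30, 25)
# Two nodes: root with symbol hash 0 and value hash 30, one child whose total hash is 25.
two_node = combine(0, 30, [25])

print(single, two_node)  # both are 955
```

Both trees hash to 30 * 31 + 25 = 955, because a symbol hash of 0 makes the root's own contribution indistinguishable from a leading term of a shorter tree.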
Edit2: As for specific algorithms and minimum data structure needed, something like the following (Python, translating to any other language should be relatively easy).
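#!/usr/bin/env python
import hashlib  # standard-library SHA-1, used here in place of Crypto.Hash.SHA
class Node:
    def __init__(self, parent=None, contents="", children=None):
        self.valid = False
        self.hash = None
        self.parent = parent
        self.contents = contents
        # Avoid sharing one mutable default list between all nodes.
        self.children = children if children is not None else []
        for child in self.children:
            child.parent = self
    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        self.invalidate()
    def invalidate(self):
        # Mark this node and all of its ancestors as needing a rehash.
        self.valid = False
        if self.parent:
            self.parent.invalidate()
    def gethash(self):
        if self.valid:
            return self.hash
        digester = hashlib.sha1()
        digester.update(self.contents.encode("utf-8"))
        if self.children:
            for child in self.children:
                digester.update(child.gethash().encode("ascii"))
            self.hash = "1" + digester.hexdigest()
        else:
            self.hash = "0" + digester.hexdigest()
        self.valid = True  # cache the result until the next invalidate()
        return self.hash
    def setcontents(self, contents):
        self.contents = contents
        self.invalidate()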

Okay, after your edit where you've introduced a requirement that the hashing result should be different for different tree layouts, the only option left is to traverse the whole tree and write its structure to a single array.
That's done like this: you traverse the tree and dump the operations you do. For an original tree that could be (for a left-child-right-sibling structure):
[1, child, 2, child, 3, sibling, 4, sibling, 5, parent, parent, //we're at root again
sibling, 6, child, 7, child, 8, sibling, 9, parent, parent]
You may then hash the list (that is, effectively, a string) any way you like. As another option, you may even return this list as the result of the hash function, so it becomes a collision-free tree representation.
But adding precise information about the whole structure is not what hash functions usually do. The way proposed should compute hash function of every node as well as traverse the whole tree. So you may consider other ways of hashing, described below.
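A minimal Python sketch of the "dump the operations" idea, under some assumptions of mine: a node is a `(label, children)` tuple, labels contain no spaces, and a "parent" token is emitted once after a node's last child (so the token stream differs slightly from the hand-written list above). The names `serialize` and `token_digest` are made up for this example.

```python
import hashlib

def serialize(node, out):
    """Append the traversal tokens for `node` to `out` and return it."""
    label, children = node
    out.append(label)
    for i, child in enumerate(children):
        # The first child is reached with "child", later ones with "sibling".
        out.append("child" if i == 0 else "sibling")
        serialize(child, out)
    if children:
        out.append("parent")  # climb back from the last child to this node
    return out

def token_digest(node):
    """Hash the token list as a single string."""
    text = " ".join(str(t) for t in serialize(node, []))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# The example tree from the question, with nodes numbered 1..9.
example = (1, [(2, [(3, []), (4, []), (5, [])]),
               (6, [(7, [(8, []), (9, [])])])])
tokens = serialize(example, [])
print(tokens)
print(token_digest(example))
```

Because the token list can be decoded back into the tree, two different layouts never produce the same list; only the final hashing step can collide.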
If you don't want to traverse the whole tree:
One algorithm that immediately came to my mind is like this. Pick a large prime number H (that's greater than maximal number of children). To hash a tree, hash its root, pick a child number H mod n, where n is the number of children of root, and recursively hash the subtree of this child.
This seems to be a bad option if trees differ only deeply near the leaves. But at least it should run fast for not very tall trees.
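Here is a rough Python sketch of the single-path idea. The answer does not say how to combine the hashes along the path, so the multiply-by-31 step, the use of CRC32 as the node hash, and the `(label, children)` tuple layout are all assumptions for illustration.

```python
import zlib

H = 1000003          # a prime larger than any expected child count
MASK = 0xFFFFFFFF    # keep results in 32 bits

def label_hash(label):
    """Deterministic stand-in for a node's own hash code."""
    return zlib.crc32(str(label).encode("utf-8"))

def path_hash(node):
    """Hash the root, then follow child number H mod n at every level."""
    label, children = node
    h = label_hash(label)
    if children:
        chosen = children[H % len(children)]
        h = (h * 31 + path_hash(chosen)) & MASK
    return h

a = ("r", [("x", []), ("y", [])])
b = ("r", [("z", []), ("y", [])])   # differs only off the sampled path
c = ("r", [("x", []), ("w", [])])   # differs on the sampled path
print(path_hash(a) == path_hash(b), path_hash(a) == path_hash(c))
```

With two children, `H % 2` is 1, so only the second child is ever inspected. That makes the weakness mentioned above concrete: `a` and `b` collide even though their first children differ.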
If you want to hash fewer elements but go through the whole tree:
Instead of hashing a subtree, you may want to hash layer-wise, i.e. hash the root first, then hash one of the nodes that are its children, then one of the children of the children, etc. So you cover the whole tree instead of one specific path. This makes the hashing procedure slower, of course.
        --- O ------- layer 0, n=1
           / \
          /   \
     --- O --- O ----- layer 1, n=2
        /|\    |
       / | \   |
      /  |  \  |
     O - O - O O------ layer 2, n=4
              / \
             /   \
     ------ O --- O -- layer 3, n=2
A node from a layer is picked with H mod n rule.
The difference between this version and previous version is that a tree should undergo quite an illogical transformation to retain the hash function.
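The layer-wise variant might look like the following in Python. As with the previous idea, the combining step (multiply by 31 and add), the CRC32 node hash, and the `(label, children)` tuples are assumptions of this sketch, not something the answer prescribes.

```python
import zlib

H = 1000003          # a prime larger than any expected layer width
MASK = 0xFFFFFFFF

def label_hash(label):
    """Deterministic stand-in for a node's own hash code."""
    return zlib.crc32(str(label).encode("utf-8"))

def layer_hash(root):
    """Pick node number H mod n from every layer and combine the picks."""
    h = 0
    layer = [root]
    while layer:
        picked = layer[H % len(layer)]
        h = (h * 31 + label_hash(picked[0])) & MASK
        # Build the next layer from all children, left to right.
        layer = [child for node in layer for child in node[1]]
    return h

def make_tree(c_label="c", h_label="h"):
    """The example tree, with two labels left adjustable for experiments."""
    return ("r", [("a", [(c_label, []), ("d", []), ("e", [])]),
                  ("b", [("f", [("g", []), (h_label, [])])])])

print(layer_hash(make_tree()))
```

The whole tree is walked, but only one node hash per layer enters the result, so a change to an unpicked node (for instance the leftmost node of layer 2) still goes unnoticed.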

The usual technique of hashing any sequence is combining the values (or hashes thereof) of its elements in some mathematical way. I don't think a tree would be any different in this respect.
For example, here is the hash function for tuples in Python (taken from Objects/tupleobject.c in the source of Python 2.6):
static long
tuplehash(PyTupleObject *v)
{
    register long x, y;
    register Py_ssize_t len = Py_SIZE(v);
    register PyObject **p;
    long mult = 1000003L;
    x = 0x345678L;
    p = v->ob_item;
    while (--len >= 0) {
        y = PyObject_Hash(*p++);
        if (y == -1)
            return -1;
        x = (x ^ y) * mult;
        /* the cast might truncate len; that doesn't change hash stability */
        mult += (long)(82520L + len + len);
    }
    x += 97531L;
    if (x == -1)
        x = -2;
    return x;
}
It's a relatively complex combination with constants experimentally chosen for best results for tuples of typical lengths. What I'm trying to show with this code snippet is that the issue is very complex and very heuristic, and the quality of the results probably depends on the more specific aspects of your data, i.e. domain knowledge may help you reach better results. However, for good-enough results you shouldn't look too far. I would guess that taking this algorithm and combining all the nodes of the tree instead of all the tuple elements, plus bringing their position into play, will give you a pretty good algorithm.
One option of taking the position into account is the node's position in an inorder walk of the tree.
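As a sketch of how the tuple mixing loop could be reused for a tree, here is a Python port applied recursively, so that a child's position in its parent affects the result. This is one possible reading rather than the inorder-position variant mentioned above. Assumptions: a node is an `(own_hash, children)` tuple, a 64-bit mask stands in for C `long` overflow, and the special handling of -1 is omitted.

```python
MASK = (1 << 64) - 1  # emulate 64-bit wraparound (unsigned, unlike the C code)

def tuplehash(hashes):
    """Port of the mixing loop above for a sequence of integer hashes."""
    mult = 1000003
    x = 0x345678
    remaining = len(hashes)
    for y in hashes:
        remaining -= 1                      # mirrors `--len` in the C loop
        x = ((x ^ y) * mult) & MASK
        mult = (mult + 82520 + 2 * remaining) & MASK
    return (x + 97531) & MASK

def tree_hash(node):
    """Treat each node as the tuple (own hash, child hash 0, child hash 1, ...)."""
    own_hash, children = node
    return tuplehash([own_hash & MASK] + [tree_hash(c) for c in children])

a = (1, [(2, []), (3, [])])
b = (1, [(3, []), (2, [])])   # same nodes, children swapped
print(tree_hash(a) != tree_hash(b))
```

Swapping the two children changes the order in which their hashes are mixed in, and the changing multiplier makes the result order-sensitive.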

Any time you are working with trees recursion should come to mind:
public override int GetHashCode() {
    int hash = 5381;
    foreach(var node in this.BreadthFirstTraversal()) {
        hash = unchecked(33 * hash + node.GetHashCode());
    }
    return hash;
}
The hash function should depend on the hash code of every node within the tree as well as its position.
Check. We are explicitly using node.GetHashCode() in the computation of the tree's hash code. Further, because of the nature of the algorithm, a node's position plays a role in the tree's ultimate hash code.
Reordering the children of a node should distinctly change the resulting hash code.
Check. They will be visited in a different order during the traversal, leading to a different hash code. (Note that if there are two children with the same hash code you will end up with the same hash code upon swapping the order of those children.)
Reflecting any part of the tree should distinctly change the resulting hash code
Check. Again the nodes would be visited in a different order leading to a different hash code. (Note that there are circumstances where the reflection could lead to the same hash code if every node is reflected into a node with the same hash code.)

The collision-free property of this will depend on how collision-free the hash function used for the node data is.
It sounds like you want a system where the hash of a particular node is a combination of the child node hashes, where order matters.
If you're planning on manipulating this tree a lot, you may want to pay the price in space of storing the hashcode with each node, to avoid the penalty of recalculation when performing operations on the tree.
Since the order of the child nodes matters, a method which might work here would be to combine the node data and children using prime number multiples and addition modulo some large number.
To go for something similar to Java's String hashcode:
Say you have n child nodes.
hash(node) = hash(nodedata) +
             hash(childnode[0]) * 31^(n-1) +
             hash(childnode[1]) * 31^(n-2) +
             <...> +
             hash(childnode[n-1])
Some more detail on the scheme used above can be found here: http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/
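The formula above can be evaluated with Horner's rule, which avoids computing the powers of 31 explicitly. A small Python sketch, assuming the node-data hash and the child hashes are already available as integers and that results are kept in 32 bits (both assumptions of this example):

```python
MASK = 0xFFFFFFFF  # keep results in 32 bits, like a Java int

def node_hash(data_hash, child_hashes):
    """hash(nodedata) + sum of child_hashes[i] * 31^(n-1-i), modulo 2^32."""
    acc = 0
    for c in child_hashes:
        acc = (acc * 31 + c) & MASK   # Horner's rule
    return (data_hash + acc) & MASK

print(node_hash(7, [1, 2, 3]))  # 7 + 1*961 + 2*31 + 3 = 1033
print(node_hash(7, [3, 2, 1]))  # 7 + 3*961 + 2*31 + 1 = 2953
```

Reversing the children changes the result because each child is weighted by a different power of 31.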

I can see that if you have a large set of trees to compare, then you could use a hash function to retrieve a set of potential candidates, then do a direct comparison.
A string representation that would work is to use Lisp syntax: put brackets around each subtree and write out the identifier of each node in pre-order. But this is computationally equivalent to a pre-order comparison of the tree, so why not just do that?
I've given 2 solutions: one is for comparing the two trees when you're done (needed to resolve collisions) and the other to compute the hashcode.
TREE COMPARISON:
The most efficient way to compare will be to simply recursively traverse each tree in a fixed order (pre-order is simple and as good as anything else), comparing the node at each step.
So, just create a Visitor pattern that successively returns the next node in pre-order for a tree, i.e. its constructor can take the root of the tree.
Then, just create two instances of the Visitor that act as generators for the next node in pre-order, i.e. Visitor v1 = new Visitor(root1), Visitor v2 = new Visitor(root2)
Write a comparison function that can compare itself to another node.
Then just visit each node of the trees, comparing, and returning false if comparison fails. i.e.
Module
    Function Compare(Node root1, Node root2)
        Visitor v1 = new Visitor(root1)
        Visitor v2 = new Visitor(root2)
        loop
            Node n1 = v1.next
            Node n2 = v2.next
            if (n1 == null) and (n2 == null) then
                return true
            if (n1 == null) or (n2 == null) then
                return false
            if n1.compare(n2) != 0 then
                return false
        end loop
        // unreachable
    End Function
End Module
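In Python the two visitors map naturally onto generators. The sketch below assumes nodes are `(label, children)` tuples. One addition of mine: each step also compares the number of children, since two differently shaped trees can otherwise produce the same pre-order sequence of labels.

```python
from itertools import zip_longest

def preorder(node):
    """Yield (label, child count) for every node in pre-order."""
    label, children = node
    yield (label, len(children))
    for child in children:
        yield from preorder(child)

def trees_equal(root1, root2):
    """Walk both trees in lockstep and stop at the first difference."""
    missing = object()  # marks the shorter traversal running out of nodes
    for n1, n2 in zip_longest(preorder(root1), preorder(root2), fillvalue=missing):
        if n1 != n2:
            return False
    return True

chain = ("a", [("b", [("c", [])])])
star = ("a", [("b", []), ("c", [])])
print(trees_equal(chain, chain), trees_equal(chain, star))  # True False
```

Here `chain` and `star` visit the labels a, b, c in the same order, and it is the child counts that tell them apart.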
HASH CODE GENERATION:
If you want to write out a string representation of the tree, you can use the Lisp syntax for a tree, then sample the string to generate a shorter hashcode.
Module
    Function TreeToString(Node n1) : String
        if n1 == null
            return ""
        String s1 = "(" + n1.toString()
        for each child of n1
            s1 = s1 + TreeToString(child)
        return s1 + ")"
    End Function
End Module
The node's toString() can return the unique label/hash code/whatever for that node. Then you can just do a string comparison on the strings returned by the TreeToString function to determine if the trees are equivalent. For a shorter hashcode, just sample the TreeToString result, e.g. take every fifth character.
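A Python sketch of the same string representation, assuming nodes are `(label, children)` tuples whose labels contain no parentheses. Instead of sampling every fifth character, this version feeds the whole string to SHA-1 from the standard library, which is a substitution on my part to get a fixed-length code.

```python
import hashlib

def tree_to_string(node):
    """Lisp-style rendering: (label(child1)(child2)...)."""
    label, children = node
    return "(" + str(label) + "".join(tree_to_string(c) for c in children) + ")"

def tree_digest(node):
    """Fixed-length hash of the full string representation."""
    return hashlib.sha1(tree_to_string(node).encode("utf-8")).hexdigest()

star = ("a", [("b", []), ("c", [])])
chain = ("a", [("b", [("c", [])])])
print(tree_to_string(star))   # (a(b)(c))
print(tree_to_string(chain))  # (a(b(c)))
print(tree_digest(star))
```

The brackets are what encode the shape: the two trees above contain the same labels in the same order but render to different strings.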

I think you could do this recursively: Assume you have a hash function h that hashes strings of arbitrary length (e.g. SHA-1). Now, the hash of a tree is the hash of a string that is created as a concatenation of the hash of the current element (you have your own function for that) and hashes of all the children of that node (from recursive calls of the function).
For a binary tree you would have:
Hash( h(node->data) || Hash(node->left) || Hash(node->right) )
You may need to carefully check if tree geometry is properly accounted for. I think that with some effort you could derive a method for which finding collisions for such trees could be as hard as finding collisions in the underlying hash function.
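For the binary-tree case, a minimal Python sketch could look like this. The `Node` class, the choice of SHA-1 for both `h` and `Hash`, and the fixed digest used for a missing child are assumptions of the example. The missing-child marker is one way to account for geometry: a lone left child and a lone right child then produce different inputs.

```python
import hashlib

EMPTY = hashlib.sha1(b"empty-subtree").digest()  # stands in for a missing child

class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data      # bytes
        self.left = left
        self.right = right

def tree_hash(node):
    """Hash( h(data) || Hash(left) || Hash(right) ) with fixed-size digests."""
    if node is None:
        return EMPTY
    outer = hashlib.sha1()
    outer.update(hashlib.sha1(node.data).digest())
    outer.update(tree_hash(node.left))
    outer.update(tree_hash(node.right))
    return outer.digest()

left_only = Node(b"root", left=Node(b"a"))
right_only = Node(b"root", right=Node(b"a"))
print(tree_hash(left_only).hex())
print(tree_hash(right_only).hex())
```

Because every component is a fixed-size digest, the concatenation is unambiguous, and reflecting the tree swaps two different 20-byte blocks in the input of the outer hash.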

A simple enumeration (in any deterministic order) together with a hash function that depends on when the node is visited should work.
import java.util.ArrayList;
// Node is assumed to expose hash() (the node's own hash code) and children().
int hash(Node root) {
    ArrayList<Node> worklist = new ArrayList<Node>();
    worklist.add(root);
    int h = 0;
    int n = 0;
    while (!worklist.isEmpty()) {
        Node x = worklist.remove(worklist.size() - 1);
        worklist.addAll(x.children());
        h ^= place_hash(x.hash(), n);
        n++;
    }
    return h;
}
int place_hash(int hash, int place) {
    return (Integer.toString(hash) + "_" + Integer.toString(place)).hashCode();
}

class TreeNode
{
    public static int QualityAgainstPerformance = 3; // tune this for your needs
    public static int PositionMarkConstant = 23498735; // just anything
    public object TargetObject; // the subject of this TreeNode, which contributes its hash code
    IEnumerable<TreeNode> GetChildParticipants()
    {
        yield return this;
        foreach (var child in Children)
        {
            // each child yields itself first, then its own descendants
            foreach (var descendant in child.GetChildParticipants())
                yield return descendant;
        }
    }
    IEnumerable<TreeNode> GetParentParticipants()
    {
        // walk up the ancestor chain; yields nothing for the root
        for (TreeNode parent = Parent; parent != null; parent = parent.Parent)
            yield return parent;
    }
    public override int GetHashCode()
    {
        int computed = 0;
        var nodesToCombine =
            (Parent != null ? Parent : this).GetChildParticipants()
                .Take(QualityAgainstPerformance / 2)
                .Concat(GetParentParticipants().Take(QualityAgainstPerformance / 2));
        foreach (var node in nodesToCombine)
        {
            if (ReferenceEquals(node, this))
                computed = AddToMix(computed, PositionMarkConstant);
            computed = AddToMix(computed, node.GetPositionInParent());
            computed = AddToMix(computed, node.TargetObject.GetHashCode());
        }
        return computed;
    }
}
AddToMix is a function that combines two hashcodes in such a way that the sequence matters.
I don't know exactly what it should be, but you can figure it out. Some bit shifting, rounding, you know...
The idea is that you have to analyse some environment of the node, depending on the quality you want to achieve.

I have to say that your requirements go somewhat against the entire concept of hashcodes.
A hash function's computational complexity should be very limited.
Its computational complexity should not depend linearly on the size of the container (the tree), otherwise it totally breaks hashcode-based algorithms.
Making the position a major property of the node's hash function also goes somewhat against the concept of a tree, but it is achievable if you relax the requirement that the hash HAS to depend on the position.
The overall principle I would suggest is replacing MUST requirements with SHOULD requirements.
That way you can come up with an appropriate and efficient algorithm.
For example, consider building a limited sequence of integer hashcode tokens, adding what you want to this sequence in order of preference.
The order of the elements in this sequence is important; it affects the computed value.
For example, for each node you want to compute:
1. Add the hashcode of the underlying object.
2. Add the hashcodes of the underlying objects of the nearest siblings, if available. I think even the single left sibling would be enough.
3. Add the hashcode of the underlying object of the parent and its nearest siblings, as for the node itself (same as 2).
4. Repeat this with the grandparents, to a limited depth.
//--------5------- ancestor depth 2 and it's left sibling;
//-------/|------- ;
//------4-3------- ancestor depth 1 and it's left sibling;
//-------/|------- ;
//------2-1------- this;
The fact that you are adding a direct sibling's underlying object's hashcode gives a positional property to the hash function.
If this is not enough, add the children:
You shouldn't add every child, just some to give a decent hashcode.
Add the first child, and its first child, and its first child, and so on. Limit the depth to some constant, and do not compute anything recursively; use just the underlying node's object's hashcode.
//----- this;
//-----/--;
//----6---;
//---/--;
//--7---;
This way the complexity is linear to the depth of the underlying tree, not the total number of elements.
Now you have a sequence of integers; combine them with a known algorithm, like Ely suggests above.
1,2,...7
This way, you will have a lightweight hash function, with a positional property, not dependent on the total size of the tree, and even not dependent on the tree depth, and not requiring to recompute hash function of the entire tree when you change the tree structure.
I bet these 7 numbers would give a hash distribution that is near perfect.

Writing your own hash function is almost always a bug, because you basically need a degree in mathematics to do it well. Hash functions are incredibly nonintuitive, and have highly unpredictable collision characteristics.
Don't try directly combining hashcodes for Child nodes -- this will magnify any problems in the underlying hash functions. Instead, concatenate the raw bytes from each node in order, and feed this as a byte stream to a tried-and-true hash function. All the cryptographic hash functions can accept a byte stream. If the tree is small, you may want to just create a byte array and hash it in one operation.

Related

Efficient tree implementation in MATLAB

Tree class in MATLAB
I am implementing a tree data structure in MATLAB. Adding new child nodes to the tree, assigning and updating data values related to the nodes are typical operations that I expect to execute. Each node has the same type of data associated with it. Removing nodes is not necessary for me. So far, I've decided on a class implementation inheriting from the handle class to be able to pass references to nodes around to functions that will modify the tree.
Edit: December 2nd
First of all, thanks for all the suggestions in the comments and answers so far. They have already helped me to improve my tree class.
Someone suggested trying digraph introduced in R2015b. I have yet to explore this, but seeing as it does not work as a reference parameter similarly to a class inheriting from handle, I am a bit sceptical how it will work in my application. It is also at this point not yet clear to me how easy it will be to work with it using custom data for nodes and edges.
Edit: (Dec 3rd) Further information on the main application: MCTS
Initially, I assumed the details of the main application would only be of marginal interest, but since reading the comments and the answer by @FirefoxMetzger, I realise that it has important implications.
I am implementing a type of Monte Carlo tree search algorithm. A search tree is explored and expanded in an iterative manner. Wikipedia offers a nice graphical overview of the process:
In my application I perform a large number of search iterations. On every search iteration, I traverse the current tree starting from the root until a leaf node, then expand the tree by adding new nodes, and repeat. As the method is based on random sampling, at the start of each iteration I do not know which leaf node I will finish at on each iteration. Instead, this is determined jointly by the data of nodes currently in the tree, and the outcomes of random samples. Whatever nodes I visit during a single iteration have their data updated.
Example: I am at node n which has a few children. I need to access data in each of the children and draw a random sample that determines which of the children I move to next in the search. This is repeated until a leaf node is reached. Practically I am doing this by calling a search function on the root that will decide which child to expand next, call search on that node recursively, and so on, finally returning a value once a leaf node is reached. This value is used while returning from the recursive functions to update the data of the nodes visited during the search iteration.
The tree may be quite unbalanced such that some branches are very long chains of nodes, while others terminate quickly after the root level and are not expanded further.
Current implementation
Below is an example of my current implementation, with example of a few of the member functions for adding nodes, querying the depth or number of nodes in the tree, and so on.
classdef stree < handle
% A class for a tree object that acts like a reference
% parameter.
% The tree can be traversed in both directions by using the parent
% and children information.
% New nodes can be added to the tree. The object will automatically
% keep track of the number of nodes in the tree and increment the
% storage space as necessary.
properties (SetAccess = private)
% Hold the data at each node
Node = { [] };
% Index of the parent node. The root of the tree has a parent index
% equal to 0.
Parent = 0;
num_nodes = 0;
size_increment = 1;
maxSize = 1;
end
methods
function [obj, root_ID] = stree(data, init_siz)
% New object with only root content, with specified initial
% size
obj.Node = repmat({ data },init_siz,1);
obj.Parent = zeros(init_siz,1);
root_ID = 1;
obj.num_nodes = 1;
obj.size_increment = init_siz;
obj.maxSize = numel(obj.Parent);
end
function ID = addnode(obj, parent, data)
% Add child node to specified parent
if obj.num_nodes < obj.maxSize
% still have room for data
idx = obj.num_nodes + 1;
obj.Node{idx} = data;
obj.Parent(idx) = parent;
obj.num_nodes = idx;
else
% all preallocated elements are in use, reserve more memory
obj.Node = [
obj.Node
repmat({data},obj.size_increment,1)
];
obj.Parent = [
obj.Parent
parent
zeros(obj.size_increment-1,1)];
obj.num_nodes = obj.num_nodes + 1;
obj.maxSize = numel(obj.Parent);
end
ID = obj.num_nodes;
end
function content = get(obj, ID)
%% GET Return the contents of the given node IDs.
content = [obj.Node{ID}];
end
function obj = set(obj, ID, content)
%% SET Set the content of given node ID and return the modifed tree.
obj.Node{ID} = content;
end
function IDs = getchildren(obj, ID)
% GETCHILDREN Return the list of ID of the children of the given node ID.
% The list is returned as a line vector.
IDs = find( obj.Parent(1:obj.num_nodes) == ID );
IDs = IDs';
end
function n = nnodes(obj)
% NNODES Return the number of nodes in the tree.
% Equal to root + those whose parent is not root.
n = 1 + sum(obj.Parent(1:obj.num_nodes) ~= 0);
assert( obj.num_nodes == n);
end
function flag = isleaf(obj, ID)
% ISLEAF Return true if given ID matches a leaf node.
% A leaf node is a node that has no children.
flag = ~any( obj.Parent(1:obj.num_nodes) == ID );
end
function depth = depth(obj,ID)
% DEPTH return depth of tree under ID. If ID is not given, use
% root.
if nargin == 1
ID = 0;
end
if obj.isleaf(ID)
depth = 0;
else
children = obj.getchildren(ID);
NC = numel(children);
d = 0; % Depth from here on out
for k = 1:NC
d = max(d, obj.depth(children(k)));
end
depth = 1 + d;
end
end
end
end
However, performance at times is slow, with operations on the tree taking up most of my computation time. What specific ways would there be to make the implementation more efficient? It would even be possible to change the implementation to something else than the handle inheritance type if there are performance gains.
Profiling results with current implementation
As adding new nodes to the tree is the most typical operation (along with updating the data of a node), I did some profiling on that.
I ran the profiler on the following benchmarking code with Nd=6, Ns=10.
function T = benchmark(Nd, Ns)
% Tree benchmark. Nd: tree depth, Ns: number of nodes per layer
% Initialize tree
T = stree(rand, 10000);
add_layers(1, Nd);
function add_layers(node_id, num_layers)
if num_layers == 0
return;
end
child_id = zeros(Ns,1);
for s = 1:Ns
% add child to current node
child_id(s) = T.addnode(node_id, rand);
% recursively increase depth under child_id(s)
add_layers(child_id(s), num_layers-1);
end
end
end
Results from the profiler:
R2015b performance
It has been discovered that R2015b improves the performance of MATLAB's OOP features. I redid the above benchmark and indeed observed an improvement in performance:
So this is already good news, although further improvements are of course accepted ;)
Reserving memory differently
It was also suggested in the comments to use
obj.Node = [obj.Node; data; cell(obj.size_increment - 1,1)];
to reserve more memory rather than the current approach with repmat. This improved performance slightly. I should note that my benchmark code is for dummy data, and since the actual data is more complicated this is likely to help. Thanks! Profiler results below:
Questions on even further increasing performance
Perhaps there is an alternative way to maintain memory for the tree that is more efficient? Sadly, I typically don't know ahead of time how many nodes there will be in the tree.
Adding new nodes and modifying the data of existing nodes are the most typical operations I do on the tree. As of now, they actually take up most of the processing time of my main application. Any improvements on these functions would be most welcome.
Just as a final note, I would ideally like to keep the implementation as pure MATLAB. However, options such as MEX or using some of the integrated Java functionalities may be acceptable.
TL;DR: You deep copy the entire stored data on each insertion. Initialize the Parent and Node cell arrays bigger than what you expect to need.
Your data does have a tree structure, however you are not utilising this in your implementation. Instead the implemented code is a computational hungry version of a look up table (actually 2 tables), that stores the data and the relational data for the tree.
The reasons I am saying this are the following:
To insert you call stree.addnode(parent, data), which will store all data in the tree object stree's fields Node = {} and Parent = []
you seem to know beforehand which element in your tree you want to access, as the search code is not given (if you use stree.getchildren(ID) for it I have some bad news)
once you have processed a node you trace it back using find(), which is a list search
By no means does that mean the implementation is clumsy for the data, it may even be the best, depending on what you are doing. However it does explain your memory allocation issues and gives hints on how to resolve them.
Keep Data as lookup table
One of the ways to store data is to keep the underlying look up table. I would only do this, if you know the ID of the first element you want to modify without searching for it. This case allows you to make your structure more efficient in two steps.
First, initialise your arrays bigger than what you expect to need to store the data. If the lookup table's capacity is exceeded, a new one is initialized, which is X fields larger, and a deep copy of the old data is made. If you need to expand capacity once or twice (during all insertions) it might not be an issue, but in your case a deep copy is made for every insertion!
Second I would change the internal structure and merge the two tables Node and Parent. The reason for this is that back-propagation in your code takes O(depth_from_root * n), where n is the number of nodes in your table. This is because find() will iterate over the entire table for each parent.
Instead you can implement something similar to
table = cell(n,1); % n bigger than the expected number of nodes
end_pointer = 1;   % simple pointer to the first free slot
function insert(data,parent_ID)
    if end_pointer <= numel(table)
        content.data = data;
        content.parent = parent_ID;
        table{end_pointer} = content;
        end_pointer = end_pointer + 1;
    else
        % need more space, make sure it's enough this time
        table = [table; cell(end_pointer,1)];
        insert(data,parent_ID);
    end
end
function content = get_value(ID)
    content = table{ID};
end
This instantly gives you access to the parent's ID without the need to find() it first, saving n iterations each step, so the effort becomes O(depth). If you do not know your initial node, then you have to find() that one, which costs O(n).
Note that this structure has no need for is_leaf(), depth(), nnodes() or get_children(). If you still need those I need more insight in what you want to do with your data, as this highly influences a proper structure.
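To make the O(depth) claim concrete, here is the merged-table idea sketched in Python (used purely for illustration; the class and method names are made up, and Python lists already grow geometrically, so no manual preallocation is shown). Each record stores its parent's index, so walking back to the root follows one pointer per level instead of searching the table.

```python
class NodeTable:
    """Flat table of nodes; node IDs are simply indices into the lists."""

    def __init__(self):
        self.data = []     # data[i] is the payload of node i
        self.parent = []   # parent[i] is the ID of node i's parent, -1 for the root

    def insert(self, data, parent_id):
        """Append a node and return its ID."""
        self.data.append(data)
        self.parent.append(parent_id)
        return len(self.data) - 1

    def path_to_root(self, node_id):
        """IDs from the given node up to the root, one step per tree level."""
        path = []
        while node_id != -1:
            path.append(node_id)
            node_id = self.parent[node_id]
        return path

table = NodeTable()
root = table.insert("root", -1)
child = table.insert("child", root)
leaf = table.insert("leaf", child)
print(table.path_to_root(leaf))  # [2, 1, 0]
```

Back-propagating a value then means iterating over `path_to_root(leaf)` and updating `table.data` at each ID.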
Tree Structure
This structure makes sense if you never know the first node's ID and thus always have to search for it.
The benefit is that the search for an arbitrary node works with O(depth), so searching is O(depth) instead of O(n) and back propagating is O(depth^2) instead of O(depth + n). Note that depth can be anything from log(n) for a perfectly balanced tree, which may be possible depending on your data, to n for a degenerate tree that is just a linked list.
However, to suggest something proper I would require more insight, as every tree structure kind of has its own niche. From what I can see so far, I'd suggest an unbalanced tree that is 'sorted' by the simple order given by a node's wanted parent. This may be further optimized depending on
is it possible to define a total order on your data
how do you treat double values (same data appearing twice)
what scale is your data (thousands, millions, ...)
is a lookup / search always paired with back propagation
how long are the chains of 'parent-child' on your data (or how balanced and deep will the tree be using this simple order)
is there always just one parent or is the same element inserted twice with different parents
I'll happily provide example code for above tree, just leave me a comment.
EDIT:
In your case an unbalanced tree (that is constructed in parallel with doing the MCTS) seems to be the best option. The code below assumes that the data is split into state and score, and further that a state is unique. If it isn't, this will still work, however there is a possible optimisation to increase MCTS performance.
classdef node < handle
% A node for a tree in a MCTS
properties
state = {}; %some state of the search space that identifies the node
score = 0;
childs = cell(50,1);
num_childs = 0;
end
methods
function obj = node(state)
% for a new node simulate a score using MC
obj.score = simulate_from(state); % TODO implement simulation state -> finish
obj.state = state;
end
function value = update(obj)
% update the this node using MC recursively
if obj.num_childs == numel(obj.childs)
% there are too many childs, we have to expand the table
obj.childs = [obj.childs; cell(obj.num_childs,1)];
end
if obj.do_exploration() || obj.num_childs == 0
% explore a potential state
state_to_explore = obj.explore();
%check if state has already been visited
terminate = false;
idx = 1;
while idx <= obj.num_childs && ~terminate
if obj.childs{idx}.state_equals(state_to_explore)
terminate = true;
else
% only advance while nothing was found, so idx keeps pointing at the match
idx = idx + 1;
end
end
% perform the appropriate action based on the search result
if idx > obj.num_childs
% state has never been visited
% this action terminates the update recursion
% and creates a new leaf
obj.num_childs = obj.num_childs + 1;
obj.childs{obj.num_childs} = node(state_to_explore);
value = obj.childs{obj.num_childs}.calculate_value();
obj.update_score(value);
else
% state has been visited at least once
value = obj.childs{idx}.update();
obj.update_score(value);
end
else
% exploit what we know already
best_idx = 1;
for idx = 1:obj.num_childs
if obj.childs{idx}.score > obj.childs{best_idx}.score
best_idx = idx;
end
end
value = obj.childs{best_idx}.update();
obj.update_score(value);
end
value = obj.calculate_value();
end
function state = explore(obj)
%select a next state to explore, that may or may not be visited
%TODO
end
function bool = do_exploration(obj)
% decide if this node should be explored or exploited
%TODO
end
function bool = state_equals(obj, test_state)
% returns true if the nodes state is equal to test_state
%TODO
end
function update_score(obj, value)
% updates the score based on some value
%TODO
end
function value = calculate_value(obj)
% returns the value of this node to update previous nodes
%TODO
end
end
end
A few comments on the code:
depending on the setup the obj.calculate_value() might not be needed. E.g. if it is some value that can be computed by evaluating the child's scores alone
if a state can have multiple parents it makes sense to reuse the node object and reference it from each parent in the structure
as each node knows all its children a subtree can be easily generated using node as root node
searching the tree (without any update) is a simple recursive greedy search
depending on the branching factor of your search, it might be worth visiting each possible child once (upon initialization of the node) and later doing randsample(obj.childs,1) for exploration, as this avoids copying / reallocating the child array
the parent property is encoded as the tree is updated recursively, passing value to the parent upon finishing the update for a node
The only time I reallocate memory is when a single node has more than 50 children, and I only do the reallocation for that individual node
This should run a lot faster, as it just worries about whatever part of the tree is chosen and does not touch anything else.
I know that this might sound stupid... but how about keeping the number of free nodes instead of total number of nodes? This would require comparison against a constant (which is zero), which is single property access.
One other voodoo improvement would be moving .maxSize near .num_nodes, and placing both those before the .Node cell. Like this their position in memory won't change relative to the beginning of the object because of the growth of .Node property (the voodoo here being me guessing the internal implementation of objects in MATLAB).
Later Edit When I profiled with the .Node moved at the end of the property list, the bulk of the execution time was consumed by extending the .Node property, as expected (5.45 seconds, compared to 1.25 seconds for the comparison you mentioned).
You can try to allocate a number of elements that is proportional to the number of elements you have actually filled: this is the standard implementation for std::vector in C++
obj.Node = [obj.Node; data; cell(q * obj.num_nodes,1)];
I don't remember exactly, but I believe q is about 0.5 for MSVC (capacity grows 1.5x) and 1 for GCC (capacity doubles).
This is a solution using Java. I don't like it very much, but it does its job. I implemented the example you extracted from wikipedia.
import javax.swing.tree.DefaultMutableTreeNode
% Let's create our example tree
top = DefaultMutableTreeNode([11,21])
n1 = DefaultMutableTreeNode([7,10])
top.add(n1)
n2 = DefaultMutableTreeNode([2,4])
n1.add(n2)
n2 = DefaultMutableTreeNode([5,6])
n1.add(n2)
n3 = DefaultMutableTreeNode([2,3])
n2.add(n3)
n3 = DefaultMutableTreeNode([3,3])
n2.add(n3)
n1 = DefaultMutableTreeNode([4,8])
top.add(n1)
n2 = DefaultMutableTreeNode([1,2])
n1.add(n2)
n2 = DefaultMutableTreeNode([2,3])
n1.add(n2)
n2 = DefaultMutableTreeNode([2,3])
n1.add(n2)
n1 = DefaultMutableTreeNode([0,3])
top.add(n1)
% Element to look for, your implementation will be recursive
searching = [0 1 1];
idx = 1;
node(idx) = top;
for item = searching,
% Java transposes the matrices, remember to transpose back when you are reading
node(idx).getUserObject()'
node(idx+1) = node(idx).getChildAt(item);
idx = idx + 1;
end
node(idx).getUserObject()'
% We made a new test...
newdata = [0, 1]
newnode = DefaultMutableTreeNode(newdata)
% ...so we expand our tree at the last node we searched
node(idx).add(newnode)
% The change has to be propagated (this is where your recursion returns)
for it=length(node):-1:1,
itnode=node(it);
val = itnode.getUserObject()'
newitemdata = val + newdata
itnode.setUserObject(newitemdata)
end
% Let's see if the new values are correct
searching = [0 1 1 0];
idx = 1;
node(idx) = top;
for item = searching,
node(idx).getUserObject()'
node(idx+1) = node(idx).getChildAt(item);
idx = idx + 1;
end
node(idx).getUserObject()'

How to calculate Hash value of a Tree

What is the best way to calculate the hash value of a Tree?
I need to compare the similarity between several trees in O(1). Now, I want to precalculate the hash values and compare them when needed. But then I realized, hashing a tree is different than hashing a sequence. I wasn't able to come up with a good hash function.
What is the best way to calculate hash value of a tree?
Note : I will implement the function in c/c++
Well, hashing a tree means representing it in a unique way so that we can distinguish it from other trees using a simple representation or number. In a normal polynomial hash we use number base conversion: we convert a string or a sequence to a specific prime base and use a mod value which is also a large prime. Using this same technique we can hash a tree.
Now fix the root of the tree at any vertex. Let root = 1 and,
B = The base in which we want to convert.
P[i] = i th power of B (B^i).
level[i] = Depth of the ith vertex (distance from the root).
child[i] = Total number of vertices in the subtree of the ith vertex, including i.
degree[i] = Number of nodes adjacent to vertex i.
Now the contribution of the ith vertex in the hash value is -
hash[i] = ( (P[level[i]]+degree[i]) * child[i] ) % modVal
And the hash value of the entire tree is the sum of all the vertices' hash values:
(hash[1]+hash[2]+....+hash[n]) % modVal
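The definitions above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name tree_hash, the default base 31 and the modulus 1_000_000_007 are my own choices, vertices are assumed to be numbered 1..n, and the tree is assumed to be given as an undirected edge list.

```python
from collections import deque


def tree_hash(n, edges, root=1, base=31, mod=1_000_000_007):
    """Sum of (B^level[i] + degree[i]) * child[i] over all vertices, modulo mod."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Breadth-first search from the root gives level[] and a top-down visiting order.
    level = {root: 0}
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in level:
                level[w] = level[v] + 1
                parent[w] = v
                queue.append(w)

    # Subtree sizes (child[] in the text), accumulated bottom-up.
    size = {v: 1 for v in adj}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]

    total = 0
    for v in order:
        contribution = (pow(base, level[v], mod) + len(adj[v])) * size[v]
        total = (total + contribution) % mod
    return total


star = [(1, 2), (1, 3), (1, 4)]   # root with three leaves
path = [(1, 2), (2, 3), (3, 4)]   # a chain of four vertices
print(tree_hash(4, star), tree_hash(4, path))
```

Note that this hash ignores the order of children, so it treats trees as unordered and depends on which vertex is chosen as the root.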
If we use this definition of tree equivalence:
T1 is equivalent to T2 iff
all paths to leaves of T1 exist exactly once in T2, and
all paths to leaves of T2 exist exactly once in T1
Hashing a sequence (a path) is straightforward. If h_tree(T) is a hash of all paths-to-leafs of T, where the order of the paths does not alter the outcome, then it is a good hash for the whole of T, in the sense that equivalent trees will produce equal hashes, according to the above definition of equivalence. So I propose:
h_path(path) = an order-dependent hash of all elements in the path.
Requires O(|path|) time to calculate,
but child nodes can reuse the calculation of their
parent node's h_path in their own calculations.
h_tree(T) = an order-independent hashing of all its paths-to-leaves.
Can be calculated in O(|L|), where L is the number of leaves
In pseudo-c++:
struct node {
int path_hash; // path-to-root hash; only use for building tree_hash
int tree_hash; // takes children into account; use to compare trees
int content;
vector<node> children;
int update_hash(int parent_path_hash = 1) {
path_hash = parent_path_hash * PRIME1 + content; // order-dependent
tree_hash = path_hash;
for (node& n : children) { // by reference, so each child keeps its updated hashes
tree_hash += n.update_hash(path_hash) * PRIME2; // order-independent
}
return tree_hash;
}
};
After building two trees, update their hashes and compare away. Equivalent trees should have the same hash, different trees not so much. Note that the path and tree hashes that I am using are rather simplistic, and chosen rather for ease of programming than for great collision resistance...
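For comparison, here is a rough Python rendering of the same scheme. It is a sketch, not a vetted hash: the constants 31 and 37 stand in for PRIME1 and PRIME2 (the answer does not fix their values), the 32-bit mask only imitates C++ int wrap-around, and the class name Node is illustrative.

```python
PRIME1 = 31            # assumed value, the original leaves PRIME1 unspecified
PRIME2 = 37            # assumed value, the original leaves PRIME2 unspecified
MASK = (1 << 32) - 1   # keep results in 32 bits, imitating fixed-width ints


class Node:
    def __init__(self, content, children=()):
        self.content = content
        self.children = list(children)
        self.path_hash = 0   # path-to-root hash, only used to build tree_hash
        self.tree_hash = 0   # takes children into account, used to compare trees

    def update_hash(self, parent_path_hash=1):
        # Order-dependent along the path from the root to this node.
        self.path_hash = (parent_path_hash * PRIME1 + self.content) & MASK
        total = self.path_hash
        for child in self.children:
            # Plain addition, so the order of the children does not matter.
            total += child.update_hash(self.path_hash) * PRIME2
        self.tree_hash = total & MASK
        return self.tree_hash


a = Node(1, [Node(2), Node(3)])
b = Node(1, [Node(3), Node(2)])    # same paths, children swapped
c = Node(1, [Node(2, [Node(3)])])  # different set of paths
print(a.update_hash() == b.update_hash(), a.update_hash() == c.update_hash())
```

Swapping siblings leaves the hash unchanged here, which matches the path-based definition of equivalence but not the ordered-children requirement some readers may have.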
Child hashes should be successively multiplied by a prime number & added. Hash of the node itself should be multiplied by a different prime number & added.
Cache the hash of the tree overall -- I prefer to cache it outside the AST node, if I have a wrapper object holding the AST.
public class RequirementsExpr {
protected RequirementsAST ast;
protected int hash = -1;
public int hashCode() {
if (hash == -1)
this.hash = ast.hashCode();
return hash;
}
}
public class RequirementsAST {
protected int nodeType;
protected Object data;
// -
protected RequirementsAST down;
protected RequirementsAST across;
public int hashCode() {
int nodeHash = nodeType;
nodeHash = (nodeHash * 17) + (data != null ? data.hashCode() : 0);
nodeHash *= 23; // prime A.
int childrenHash = 0;
for (RequirementsAST child = down; child != null; child = child.across) {
childrenHash *= 41; // prime B.
childrenHash += child.hashCode();
}
int result = nodeHash + childrenHash;
return result;
}
}
The result of this is that child/descendant nodes in different positions are always multiplied in by different factors, and the node itself is always multiplied in by a different factor from any possible child/descendant nodes.
Note that other primes should also be used in building the nodeHash of the node data, itself. This helps avoid eg. different values of nodeType colliding with different values of data.
Within the limits of 32-bit hashing, this scheme overall gives a very high chance of uniqueness for any differences in tree-structure (eg, transposing two siblings) or value.
Once calculated (over the entire AST) the hashes are highly efficient.
I would recommend converting the tree to a canonical sequence and hashing the sequence. (The details of the conversion depend on your definition of equivalence. For example, if the trees are binary search trees and the equivalence relation is structural, then the conversion could be to enumerate the tree in preorder, as the structure of binary search trees can be recovered from the preorder enumeration.)
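One possible way to sketch the canonical-sequence idea in Python, assuming ordered children and structural equivalence. The tuple representation (label, [children]), the token format and the function names are illustrative assumptions, and labels are assumed not to contain ":" or the unit-separator character used to join tokens.

```python
import hashlib


def canonical_sequence(tree):
    """Preorder token list; each token records the label and the child count."""
    label, children = tree
    tokens = [f"{label}:{len(children)}"]
    for child in children:
        tokens.extend(canonical_sequence(child))
    return tokens


def tree_digest(tree):
    """Hash the canonical sequence with SHA-256 from the standard library."""
    data = "\x1f".join(canonical_sequence(tree)).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("c", []), ("b", [])])   # siblings swapped
t3 = ("a", [("b", [("c", [])])])     # same labels, different shape
print(canonical_sequence(t1))
```

Recording the child count in each token is what lets the shape be recovered from the preorder sequence alone, so trees with the same labels but different structure produce different sequences.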
Thomas's answer boils down at first glance to associating a multivariable polynomial with each tree and evaluating the polynomial at a particular location. There are two steps that, at the moment, have to be assumed on faith; the first is that the map doesn't send inequivalent trees to the same polynomial, and the second is that the evaluation scheme doesn't introduce too many collisions. I can't evaluate the first step presently, though there are reasonable definitions of equivalence that permit reconstruction from a two-variable polynomial. The second is not theoretically sound but could be made so via Schwartz--Zippel.

Sum up child values and save the values calculated in intermediate steps

struct node {
int value;
struct node* left;
struct node* right;
int left_sum;
int right_sum;
}
In a binary tree, from a particular node, there is a simple recursive algorithm to sum up all its child values. Is there a way to save the values calculated in the intermediate steps and store them as left_sum and right_sum in child nodes?
Will it be easier to do this bottom up by adding a struct node* parent link to the node definition?
No, this is clearly an exercise in recursion. Think about what the sum means. It's zero plus the "sum of all values from the root down".
Interestingly enough, the "sum of all values from the root down" is the value of the root node plus the "sum of all values from its left node down" plus the "sum of all values from its right node down".
Hopefully, you can see where I'm going here.
The essence of recursion is to define an operation in terms of similar, simpler, operations with a terminating condition.
The terminating condition, in this case, is the leaf nodes of the tree or, to make the code simpler, beyond the leaf nodes.
Examine the following pseudo-code:
def sumAllNodes (node):
if node == NULL:
return 0
return node.value + sumAllNodes (node.left) + sumAllNodes (node.right)
fullSum = sumAllNodes (rootnode)
That's really all there is to it. With the following tree:
__A(9)__
/ \
B(3) C(2)
/ \ \
D(21) E(7) F(1)
Using the pseudo-code, the sum is the value of A (9) plus the sums of the left and right subtrees.
The left subtree of A is the value of B (3) plus the sums of its left and right subtrees.
The left subtree of B is the value of D (21) plus the sums of its left and right subtrees.
The left subtree of D is the value of NULL (0).
Later on, the right subtree of A is the value of C (2) plus the sums of its left and right subtrees, its left subtree being empty and its right subtree being F (1).
Because you're doing this recursively, you don't explicitly ever walk your way up the tree. It's the fact that the recursive calls are returning with the summed values which gives that ability. In other words, it happens under the covers.
And the other part of your question is not really useful though, of course, there may be unstated requirements that I'm not taking into account, because they're, well, ... unstated :-)
Is there a way to save the values calculated in the intermediate steps and store them as left_sum and right_sum in child nodes?
You never actually re-use the sums for a given sub-tree. During a sum calculation, you would calculate the B-and-below subtree only once as part of adding it to A and the C-and-below subtree.
You could store those values so that B contained both the value and the two sums (left and right) - this would mean that every change to the tree would have to propagate itself up to the root as well but it's doable.
Now there are some situations where that may be useful. For example, if the tree itself changes very rarely but you want the sum very frequently, it makes sense performance wise to do it on update so that the cost is amortised across lots of reads.
I sometimes use this method with databases (which are mostly read far more often than written) but it's unusual to see it in "normal" binary trees.
Another possible optimisation: just maintain the sum as a separate variable in the tree object. Initialise it to zero then, whenever you add a node, add its value to the sum.
When you delete a node, subtract its value from the sum. That gives you your very fast O(1) "return sum" function without having to propagate upwards on update.
The downside is that you only have a sum for the tree as a whole but I'm having a hard time coming up with a valid use case for needing the sum of subtrees. If you have such a use case, then I'd go for something like:
def updateAllNodes (node):
if node == NULL:
return 0
node.leftSum = updateAllNodes (node.left)
node.rightSum = updateAllNodes (node.right)
return node.value + node.leftSum + node.rightSum
# change the tree somehow (possibly many times)
fullSum = updateAllNodes (root)
In other words, just update the entire tree after each change (or batch the changes then update if you know there's quite a few changes happening). This will probably be a little simpler than trying to do it as part of the tree update itself.
You can even use a separate dirtyFlag which is set to true whenever the tree changes and set to false whenever you calculate and store the sum. Then use that in the sum calculation code to only do the recalc if it's dirty (in other words, a cache of the sums).
That way, code like:
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
fullSum = updateAllNodes (root)
will only incur a cost on the first invocation. The other four should be blindingly fast since the sum is cached.
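A minimal Python sketch of the dirty-flag cache described above. The wrapper class SumTree and its members mark_changed, full_sum and recalculations are hypothetical names introduced for illustration; the example tree is the A..F tree from earlier in this answer.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.left_sum = 0
        self.right_sum = 0


def update_all_nodes(node):
    """Store subtree sums in every node; return the sum of node and everything below it."""
    if node is None:
        return 0
    node.left_sum = update_all_nodes(node.left)
    node.right_sum = update_all_nodes(node.right)
    return node.value + node.left_sum + node.right_sum


class SumTree:
    def __init__(self, root=None):
        self.root = root
        self._dirty = True         # nothing has been calculated yet
        self._cached_sum = 0
        self.recalculations = 0    # only here to make the caching visible

    def mark_changed(self):
        """Call this after any change to the tree."""
        self._dirty = True

    def full_sum(self):
        if self._dirty:
            self._cached_sum = update_all_nodes(self.root)
            self._dirty = False
            self.recalculations += 1
        return self._cached_sum


# A(9) with children B(3) and C(2); B has D(21) and E(7); C has right child F(1).
tree = SumTree(Node(9, Node(3, Node(21), Node(7)), Node(2, None, Node(1))))
for _ in range(5):
    total = tree.full_sum()   # only the first call walks the tree
print(total, tree.recalculations)
```

The caller is responsible for calling mark_changed after every modification; forgetting to do so would return a stale sum.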

Permutations with extra restrictions

I have a set of items, for example: {1,1,1,2,2,3,3,3}, and a restricting set of sets, for example {{3},{1,2},{1,2,3},{1,2,3},{1,2,3},{1,2,3},{2,3},{2,3}}. I am looking for permutations of items, but the first element must be 3, and the second must be 1 or 2, etc.
One such permutation that fits is:
{3,1,1,1,2,2,3,3}
Is there an algorithm to count all permutations for this problem in general? Is there a name for this type of problem?
For illustration, I know how to solve this problem for certain types of "restricting sets".
Set of items: {1,1,2,2,3}, Restrictions {{1,2},{1,2,3},{1,2,3},{1,2},{1,2}}. This is equal to 2!/(2-1)!/1! * 4!/2!/2!. Effectively permuting the 3 first, since it is the most restrictive and then permuting the remaining items where there is room.
Also... polynomial time. Is that possible?
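The closed-form count for the small illustration above can be checked by brute force. The Python sketch below is exponential and only meant for verifying tiny cases; the function name count_restricted is illustrative.

```python
from itertools import permutations


def count_restricted(items, restrictions):
    """Count distinct orderings of items whose k-th element lies in restrictions[k]."""
    distinct_orderings = set(permutations(items))  # removes duplicates caused by repeated items
    return sum(
        1
        for ordering in distinct_orderings
        if all(x in allowed for x, allowed in zip(ordering, restrictions))
    )


items = [1, 1, 2, 2, 3]
restrictions = [{1, 2}, {1, 2, 3}, {1, 2, 3}, {1, 2}, {1, 2}]
print(count_restricted(items, restrictions))  # 2 places for the 3, times 4!/(2!2!) = 6
```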
UPDATE: This is discussed further at the links below. The problem above is called "counting perfect matchings" and each permutation restriction above is represented by a {0,1} entry in a matrix of slots to occupants.
https://math.stackexchange.com/questions/519056/does-a-matrix-represent-a-bijection
https://math.stackexchange.com/questions/509563/counting-permutations-with-additional-restrictions
https://math.stackexchange.com/questions/800977/parking-cars-and-vans-into-car-van-and-car-van-parking-spots
All of the other solutions here are exponential time--even for cases that they don't need to be. This problem exhibits similar substructure, and so it should be solved with dynamic programming.
What you want to do is write a class that memoizes solutions to subproblems:
class Counter {
struct Problem {
unordered_multiset<int> s;
vector<unordered_set<int>> v;
};
int Count(Problem const& p) {
if (p.v.size() == 0)
return 1;
if (m.find(p) != m.end())
return m[p];
// otherwise, attack the problem by choosing either an index 'i' (notes below)
// or a number 'n'. This code only illustrates choosing an index 'i'.
Problem smaller_p = p;
smaller_p.v.erase(smaller_p.v.begin() + i);
int retval = 0;
unordered_set<int> distinct(p.s.begin(), p.s.end()); // try each distinct value only once
for (int x : distinct) {
if (p.v[i].find(x) == p.v[i].end())
continue; // x is not allowed at index i
smaller_p.s.erase(smaller_p.s.find(x)); // remove a single copy of x, not all of them
retval += Count(smaller_p);
smaller_p.s.insert(x);
}
m[p] = retval;
return retval;
}
unordered_map<Problem, int> m;
};
The code illustrates choosing an index i, which should be chosen where v[i].size() is small. The other option is to choose a number n, which should be one for which there are few locations in v where it can be placed. I'd say the minimum of the two deciding factors should win.
Also, you'll have to define a hash function for Problem -- that shouldn't be too hard using boost's hash stuff.
This solution can be improved by replacing the vector with a multiset<> (a plain set<> would merge repeated restriction sets), and defining a < operator for unordered_set. This will collapse many more identical subproblems into a single map element, and further mitigate exponential blow-up.
This solution can be further improved by making Problem instances that are the same except that the numbers are rearranged hash to the same value and compare to be the same.
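The memoization idea above could be sketched in Python along the following lines. This is my own illustration rather than a port of the C++ class: the function name count_arrangements is hypothetical, the remaining items are stored as sorted (value, copies) pairs, and the remaining restriction sets are kept in sorted order so that subproblems differing only in slot order share one cache entry.

```python
from collections import Counter
from functools import lru_cache


def count_arrangements(items, restrictions):
    """Count distinct orderings of items whose k-th element lies in restrictions[k]."""
    if len(items) != len(restrictions):
        return 0
    start_pool = tuple(sorted(Counter(items).items()))
    start_slots = tuple(sorted(tuple(sorted(r)) for r in restrictions))

    @lru_cache(maxsize=None)
    def count(pool, slots):
        if not slots:
            return 1
        # Fill the most constrained remaining slot first.
        i = min(range(len(slots)), key=lambda k: len(slots[k]))
        allowed = slots[i]
        rest = slots[:i] + slots[i + 1:]   # removing an element keeps the tuple sorted
        total = 0
        for idx, (value, copies) in enumerate(pool):
            if value not in allowed:
                continue
            if copies == 1:
                smaller = pool[:idx] + pool[idx + 1:]
            else:
                smaller = pool[:idx] + ((value, copies - 1),) + pool[idx + 1:]
            total += count(smaller, rest)
        return total

    return count(start_pool, start_slots)


print(count_arrangements([1, 1, 2, 2, 3],
                         [{1, 2}, {1, 2, 3}, {1, 2, 3}, {1, 2}, {1, 2}]))
```

Sorting the slots is valid because permuting the restriction sets only relabels positions and does not change the count. The worst case can still be exponential; memoization only helps when many subproblems coincide.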
You might consider a recursive solution that uses a pool of digits (in the example you provide, it would be initialized to {1,1,1,2,2,3,3,3}), and decides, at the index given as a parameter, which digit to place at this index (using, of course, the restrictions that you supply).
If you like, I can supply pseudo-code.
You could build a tree.
Level 0: Create a root node.
Level 1: Append each item from the first "restricting set" as children of the root.
Level 2: Append each item from the second restricting set as children of each of the Level 1 nodes.
Level 3: Append each item from the third restricting set as children of each of the Level 2 nodes.
...
The permutation count is then the number of leaf nodes of the final tree.
Edit
It's unclear what is meant by the "set of items" {1,1,1,2,2,3,3,3}. If that is meant to constrain how many times each value can be used ("1" can be used 3 times, "2" twice, etc.) then we need one more step:
Before appending a node to the tree, remove the values used on the current path from the set of items. If the value you want to append is still available (e.g. you want to append a "1", and "1" has only been used twice so far) then append it to the tree.
To save space, you could build a directed graph instead of a tree.
Create a root node.
Create a node for each item in the first set, and link from the root to the new nodes.
Create a node for each item in the second set, and link from each first set item to each second set item.
...
The number of permutations is then the number of paths from the root node to the nodes of the final set.

Determine if two binary trees are equal

What would be the efficient algorithm to find if two given binary trees are equal - in structure and content?
It's a minor issue, but I'd adapt the earlier solution as follows...
eq(t1, t2) =
t1.data=t2.data && eq(t1.left, t2.left) && eq(t1.right, t2.right)
The reason is that mismatches are likely to be common, and it is better to detect (and stop comparing) early - before recursing further. Of course, I'm assuming a short-circuit && operator here.
I'll also point out that this is glossing over some issues with handling structurally different trees correctly, and with ending the recursion. Basically, there need to be some null checks for t1.left etc. If one tree has a null .left but the other doesn't, you have found a structural difference. If both have null .left, there's no difference, but you have reached a leaf - don't recurse further. Only if both .left values are non-null do you recurse to check the subtree. The same applies, of course, for .right.
You could include checks for e.g. (t1.left == t2.left), but this only makes sense if subtrees can be physically shared (same data structure nodes) for the two trees. This check would be another way to avoid recursing where it is unnecessary - if t1.left and t2.left are the same physical node, you already know that those whole subtrees are identical.
A C implementation might be...
bool tree_compare (const node* t1, const node* t2)
{
// Same node check - also handles both NULL case
if (t1 == t2) return true;
// Gone past leaf on one side check
if ((t1 == NULL) || (t2 == NULL)) return false;
// Do data checks and recursion of tree
return ((t1->data == t2->data) && tree_compare (t1->left, t2->left )
&& tree_compare (t1->right, t2->right));
}
EDIT In response to a comment...
The running time for a full tree comparison using this is most simply stated as O(n) where n is kinda the size of a tree. If you're willing to accept a more complex bound you can get a smaller one such as O(minimum(n1, n2)) where n1 and n2 are the sizes of the trees.
The explanation is basically that the recursive call is only made (at most) once for each node in the left tree, and only made (at most) once for each node in the right tree. As the function itself (excluding recursions) only specifies at most a constant amount of work (there are no loops), the work including all recursive calls can only be as much as the size of the smaller tree times that constant.
You could analyse further to get a more complex but smaller bound using the idea of the intersection of the trees, but big O just gives an upper bound - not necessarily the lowest possible upper bound. It's probably not worthwhile doing that analysis unless you're trying to build a bigger algorithm/data structure with this as a component, and as a result you know that some property will always apply to those trees which may allow you a tighter bound for the larger algorithm.
One way to form a tighter bound is to consider the sets of paths to nodes in both trees. Each step is either an L (left subtree) or an R (right subtree). So the root is specified with an empty path. The right child of the left child of the root is "LR". Define a function "paths (T)" (mathematically - not part of the program) to represent the set of valid paths into a tree - one path for every node.
So we might have...
paths(t1) = { "", "L", "LR", "R", "RL" }
paths(t2) = { "", "L", "LL", "R", "RR" }
The same path specifications apply to both trees. And each recursion always follows the same left/right link for both trees. So the recursion visits the paths in the intersection of these sets, and the tightest bound we can specify using this is the cardinality of that intersection (still with the constant bound on work per recursive call).
For the tree structures above, we do recursions for the following paths...
paths(t1) intersection paths(t2) = { "", "L", "R" }
So our work in this case is bounded to at most three times the maximum cost of non-recursive work in the tree_compare function.
This is normally an unnecessary amount of detail, but clearly the intersection of the path-sets is at most as large as the number of nodes in the smallest original tree. And whether the n in O(n) refers to the number of nodes in one original tree or to the sum of the nodes in both, this is clearly no smaller than either the minimum or our intersection. Therefore O(n) isn't such a tight bound, but it's still a valid upper bound, even if we're a bit vague which size we're talking about.
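The path-set argument can be made concrete with a small Python sketch. The Node class and the paths function are illustrative (paths was described above as a purely mathematical device), and the two trees are built to match the example sets given earlier.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def paths(tree, prefix=""):
    """Set of L/R strings, one for every node in the tree; the root is the empty string."""
    if tree is None:
        return set()
    return ({prefix}
            | paths(tree.left, prefix + "L")
            | paths(tree.right, prefix + "R"))


# Shapes chosen to reproduce the example path sets.
t1 = Node(1, Node(2, None, Node(3)), Node(4, Node(5), None))
t2 = Node(1, Node(2, Node(3), None), Node(4, None, Node(5)))

common = paths(t1) & paths(t2)
print(sorted(common))  # positions where both trees have a node
```

The size of common is the number of positions at which the comparison can look at two real nodes; every other call hits a null on at least one side and returns immediately.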
Modulo stack overflow, something like
eq(t1, t2) =
eq(t1.left, t2.left) && t1.data=t2.data && eq(t1.right, t2.right)
(This generalizes to an equality predicate for all tree-structured algebraic data types - for any piece of structured data, check if each of its sub-parts are equal to each of the other one's sub-parts.)
We can also perform two of the traversals (in-order plus either pre-order or post-order) on each tree and then compare the results. If both match, the trees are equivalent, assuming the values within a tree are distinct; a single traversal on its own is not enough.
A more general term for what you are probably trying to accomplish is graph isomorphism. There are some algorithms to do this on that page.
It is a proven fact that a binary tree can be recreated as long as we have the following:
The sequence of nodes that are encountered in an In-Order Traversal.
The sequence of nodes that are encountered in a Pre-Order OR Post-Order Traversal
If two binary trees have the same in-order and [pre-order OR post-order] sequence, then they should be equal both structurally and in terms of values (provided the values within each tree are distinct).
Each traversal is an O(n) operation. The traversals are done 4 times in total and the results from the same type of traversal are compared.
O(n) * 4 + 2 => O(n)
Hence, the total order of time-complexity would be O(n)
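A short Python sketch of the traversal-based comparison, assuming trees are written as nested tuples (value, left, right) with None for an empty subtree and that values within a tree are distinct. The function names are illustrative.

```python
def inorder(tree):
    if tree is None:
        return []
    value, left, right = tree
    return inorder(left) + [value] + inorder(right)


def preorder(tree):
    if tree is None:
        return []
    value, left, right = tree
    return [value] + preorder(left) + preorder(right)


def same_by_traversals(a, b):
    """Equal in-order and pre-order sequences (valid for distinct values only)."""
    return inorder(a) == inorder(b) and preorder(a) == preorder(b)


left_child = (1, (2, None, None), None)    # 2 is the left child of 1
right_child = (1, None, (2, None, None))   # 2 is the right child of 1

print(preorder(left_child) == preorder(right_child))   # pre-order alone cannot tell them apart
print(same_by_traversals(left_child, right_child))
```

The example pair has identical pre-order sequences but different in-order sequences, which is why two traversals are needed.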
I would write it as follows. The following code will work in most functional languages, and even in Python if your datatypes are hashable (e.g. not dictionaries or lists):
topological equality (same in structure, i.e. Tree(1,Tree(2,3))==Tree(Tree(2,3),1)):
tree1==tree2 means set(tree1.children)==set(tree2.children)
ordered equality:
tree1==tree2 means tree1.children==tree2.children
(Tree.children is an ordered list of children)
You don't need to handle the base cases (leaves), because equality has been defined for them already.
bool identical(node* root1,node* root2){
if(root1 == NULL && root2 == NULL)
return true;
if(root1==NULL && root2!=NULL || root1!=NULL && root2 == NULL)
return false;
if(root1->data == root2->data){
bool lIdentical = identical(root1->left,root2->left);
if(!lIdentical)
return false;
bool rIdentical = identical(root1->right,root2->right);
return lIdentical && rIdentical;
}
else{
printf("data1:%d vs data2:%d",root1->data,root2->data);
return false;
}
}
I do not know if this is the most efficient, but I think this works.

Resources