Finding the maximal repeated sub-tree in an object tree - algorithm

I am trying to solve the problem of finding the largest repeated sub-tree in an object tree.
By an object tree I mean a tree where every leaf and internal node has a name. Each leaf additionally has a type and a value of that type. Each internal node holds an ordered set of leaves/nodes.
We are given an object tree that, we know, contains a repeated sub-tree.
By repeated I mean two or more sub-trees that are identical in everything (names/types/order of sub-elements) except the values of the leaves. No nodes/leaves may be shared between the sub-trees.
The problem is to identify these sub-trees of maximal height.
I know that exhaustive search can do the trick; I am looking for a more efficient approach.

You could implement a DFS traversal that generates a hash value for each node. Store these values together with the node's height in a simple array. Sub-tree candidates are the duplicate hash values; you then only have to verify the candidates, since two different sub-trees could yield the same hash value.
Assuming the leaves and internal nodes are all of type Node and that standard access and traversal functions are available:
procedure dfs_update( node : Node, hashmap : Hashmap )
begin
    if is_leaf(node) then
        hashstring = concat("LEAF",'|',get_name_str(node),'|',get_type_str(node))
    else // node is an internal node
        hashstring = concat("NODE",'|',get_name_str(node))
        for each child in get_children_sorted(node)
            dfs_update(child,hashmap)
            hashstring = concat(hashstring,'|',get_hash_string(hashmap,child))
        end for
    end if
    // only a ref to node is added to the hashmap; we could also add
    // the node's height, hashstring, or whatever else is useful but
    // inappropriate to keep in the Node data structure
    add(hashmap, hash(hashstring), node)
end
The tricky part comes after dfs_update: we have to walk the list of colliding nodes in the hashmap in order of descending height and verify, pair by pair, that they are really repetitions of each other.
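For illustration, here is a minimal Python sketch of the same idea, assuming a simple Node class with a name, a type (for leaves), a value and an ordered children list (all names here are hypothetical, not from the question):

    from collections import defaultdict

    class Node:
        def __init__(self, name, type_=None, value=None, children=None):
            self.name = name
            self.type_ = type_            # set for leaves only
            self.value = value            # leaf value, ignored when hashing
            self.children = children or []

    def find_repeated_subtrees(root):
        """Group nodes by a 'shape' key built from names/types/order; leaf values are ignored."""
        groups = defaultdict(list)        # shape string -> list of (height, node)

        def dfs(node):
            if not node.children:
                shape, height = "LEAF|%s|%s" % (node.name, node.type_), 0
            else:
                parts, height = ["NODE|" + node.name], 0
                for child in node.children:
                    child_shape, child_height = dfs(child)
                    parts.append(child_shape)
                    height = max(height, child_height + 1)
                shape = "|".join(parts)
            groups[shape].append((height, node))
            return shape, height

        dfs(root)
        # keep only shapes that occur at least twice, tallest groups first
        repeated = [nodes for nodes in groups.values() if len(nodes) > 1]
        repeated.sort(key=lambda nodes: nodes[0][0], reverse=True)
        return repeated

Because the full shape string is used as the key, collisions cannot give false positives here (at the cost of extra memory); with real hash values you would still need the pairwise verification, and in both cases you still have to discard candidates that overlap, as the question requires.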

Related

What is the pseudocode for this binary tree

Basically I am required to come up with pseudocode for this. What I currently have is
dictionary = {}
if node.left == none and node.right == none
    visit(node)
    dictionary[node] = 1
This handles only the leaf nodes; how do I get the size for each node (parents and the root)?
You can do a post-order traversal to find the size of each node.
The idea is to first handle both the left and right subtrees. Then, after they are processed, you can use this data to process the current node.
This should look something like:
count = 0
if (node.left != none)
    count += visit(node.left)
if (node.right != none)
    count += visit(node.right)
// self is included.
count += 1
// update the node
node.size = count
return count
The dictionary of visited nodes is not needed: since this is a tree, the traversal is guaranteed to terminate.
As a side note, the size attribute of each node is an important one. It basically upgrades your tree to an Order Statistics Tree.
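For illustration, here is a minimal Python version of that post-order size computation, assuming a plain node class with left/right attributes (the names here are hypothetical, not from the question):

    class Node:
        def __init__(self, left=None, right=None):
            self.left = left
            self.right = right
            self.size = 0              # filled in by visit()

    def visit(node):
        # post-order: handle both subtrees first, then the current node
        count = 0
        if node.left is not None:
            count += visit(node.left)
        if node.right is not None:
            count += visit(node.right)
        count += 1                     # the node itself is included
        node.size = count
        return count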
The concept is that each node learns its subtree size by first knowing the subtree size of each of its children (at most two here, since it is a binary tree). Once it knows the size of every child's subtree, it adds them up and finally adds 1 for itself; its parent then does the same, and so on up to the root. A leaf node has no children, so its subtree size is just 1, counting only itself.
Once this idea is clear, the code is easy to write: while traversing, we first compute the subtree sizes of the current node's children and then add 1; a leaf node simply gets a subtree size of 1. Below is the pseudocode of a traverse function that finds the subtree size of each node and stores it in the dictionary subTreeSizeDictionary; a visited dictionary/array with a larger scope is used to keep track of visited nodes.
traverse(Tree curNode, dictionary subTreeSizeDictionary)
    visited[curNode] = true
    subTreeSizeDictionary[curNode] = 0
    for child of curNode
        if (not visited[child])
            traverse(child, subTreeSizeDictionary)
            subTreeSizeDictionary[curNode] += subTreeSizeDictionary[child]
    subTreeSizeDictionary[curNode] += 1
Here it is a binary tree, but as you can see from the pseudocode, this concept works for any valid tree. The time complexity is O(n), since we visit each node only once.

Count the height of a binary tree

I want to calculate the height of a binary tree without using the algorithm that takes the maximum of the depths of all leaves.
This is the structure that I have for each node:
[content, left_son, right_son, father_node]
i.e. a list of arrays of size 4, one per node. left_son, right_son, and father_node are respectively the indexes of the left child node, the right child node and the parent node in the list.
This is indeed possible, but you have to rework your data structure a bit.
You can store a binary tree in an array like this:
Store the value of the root node at index 0.
Store the left child of a node at index 2*i+1, where i is the index of the current node; the right child goes at index 2*i+2.
If you do it like this, a node's parent is found at index (i-1)/2, and your array needs length 2^(h+1)-1, where h is the height of the tree.
So all you have to do is keep track of the last "used" index in your array and use the above formula to calculate the height.
2^(h+1) - 1 = l  =>  h = ceil(ld(l+1)) - 1
with l being the length of the array you effectively use (the last used index + 1) and ld the logarithm with base 2.
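A quick Python sketch of that calculation, assuming the heap-style layout above (root at index 0, children at 2*i+1 and 2*i+2) and that you track the last used index:

    import math

    def height_from_last_index(last_used_index):
        # length of the array portion effectively used
        length = last_used_index + 1
        # 2^(h+1) - 1 = length  =>  h = ceil(log2(length + 1)) - 1
        return math.ceil(math.log2(length + 1)) - 1

    print(height_from_last_index(6))   # full tree of height 2 uses indexes 0..6 -> 2
    print(height_from_last_index(3))   # deepest used slot at index 3 -> height 2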

Decision Tree Depth

As part of my project, I have to use a decision tree. I am using the "fitctree" function, the Matlab function for classifying the features I extracted with PCA.
I want to control the number of trees and the tree depth in the fitctree function.
Does anyone know how I can do this? For example, change the number of trees to 200 and the tree depth to 10. How am I going to do this?
Is it possible to change these values in a decision tree?
Best,
fitctree offers only the following input parameters to control the depth of the resulting tree:
MaxNumSplits
MinLeafSize
MinParentSize
https://de.mathworks.com/help/stats/classification-trees-and-regression-trees.html#bsw6baj
You have to play with those parameters to control the depth of your tree; that's because, otherwise, the decision tree only stops growing when purity is reached.
Another possibility would be to turn on pruning. Pruning will reduce the size of your tree by removing sections of the tree that provide little power to classify instances.
Let me assume that you are using the ID3 algorithm. Its pseudocode shows one way to control the depth of the tree.
ID3 (Examples, Target_Attribute, Attributes, **Depth**)
// Check the remaining depth; if it is 0, stop growing and return a leaf
if (Depth == 0) then Return a single-node tree with label = most common value of the target attribute in the examples.
// Else continue
Create a root node for the tree
If all examples are positive, Return the single-node tree Root, with label = +.
If all examples are negative, Return the single-node tree Root, with label = -.
If number of predicting attributes is empty, then Return the single node tree Root,
with label = most common value of the target attribute in the examples.
Otherwise Begin
A ← The Attribute that best classifies examples.
Decision Tree attribute for Root = A.
For each possible value, vi, of A,
Add a new tree branch below Root, corresponding to the test A = vi.
Let Examples(vi) be the subset of examples that have the value vi for A
If Examples(vi) is empty
Then below this new branch add a leaf node with label = most common target value in the examples
// We decrease the value of Depth by 1 so the tree stops growing when it reaches the designated depth
Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A}, Depth - 1)
End
Return Root
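For intuition only, here is a small Python sketch of the same depth-limiting idea (this is not Matlab's fitctree; the splitting criterion is left abstract as a hypothetical best_split helper):

    def build_tree(examples, attributes, target, depth, best_split):
        # examples: list of dicts, target: key of the class label
        labels = [ex[target] for ex in examples]
        majority = max(set(labels), key=labels.count)

        # stop on the depth limit, on a pure node, or when no attributes remain
        if depth == 0 or len(set(labels)) == 1 or not attributes:
            return {"label": majority}

        attr = best_split(examples, attributes, target)   # e.g. highest information gain
        node = {"attribute": attr, "children": {}, "default": majority}
        for value in set(ex[attr] for ex in examples):
            subset = [ex for ex in examples if ex[attr] == value]
            node["children"][value] = build_tree(
                subset,
                [a for a in attributes if a != attr],
                target,
                depth - 1,                                # shrink the depth budget
                best_split)
        return node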
What algorithm does your fitctree function try to implement?

How do I transfer a normal binary tree into a "smarter" binary tree where each node knows its parents, total subnodes and level?

I'm still getting used to data structures, and I'm comfortable with traversing binary trees in the various ways, but I'm now presented with a situation where I have a normal binary tree, built from nodes that only have data, left and right attributes.
However, I want to transform it into a "smarter" binary tree in which each node knows its parent node, its total number of subnodes, and the level of the overall tree it is at.
I'm really struggling with how I'd go about turning the "dumber" tree into the smarter version. My first instinct is to traverse recursively, but I'm not sure how I'd then be able to determine the parent and the level.
Copy the old tree to a new tree, using the normal recursive methods to traverse the original.
Since you're adding new attributes to the nodes, I presume you'll need to construct new nodes with fields for the new attributes.
Define a recursive function to copy the (sub)tree rooted at a given node. It needs as input its depth and parent. (The parent, of course, needs to be what will be its parent in the new tree.) Let it return the root of the new (sub)tree.
function copy_node (old_node, new_parent, depth) -> returns new_node {
    if (old_node == nil) return nil
    new_node = new node
    new_node.data = old_node.data   // whatever that data might be
    new_node.depth = depth
    new_node.parent = new_parent
    new_node.left = copy_node (old_node.left, new_node, depth + 1)
    new_node.right = copy_node (old_node.right, new_node, depth + 1)
    return new_node }
Copy the whole tree with
new_tree = copy_node (old_tree, nil, 0)
If you're using a language where fields can be added to existing objects willy-nilly, you don't even have to do the extra copying:
function adorn_node (node, parent, depth) {
    if (node == nil) return
    node.parent = parent
    node.depth = depth
    adorn_node (node.left, node, depth + 1)
    adorn_node (node.right, node, depth + 1) }
and start the ball rolling with
adorn_node (root, nil, 0)
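In Python, the in-place approach could be extended with the subtree-size count the question also asks for; a rough sketch (attribute names are assumptions):

    def adorn(node, parent=None, depth=0):
        # annotate each node in place with parent, depth (level) and size
        if node is None:
            return 0
        node.parent = parent
        node.depth = depth
        left_size = adorn(node.left, node, depth + 1)
        right_size = adorn(node.right, node, depth + 1)
        node.size = left_size + right_size + 1   # subnodes plus the node itself
        return node.size

    # adorn(root) annotates the whole tree starting from the root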
That having been said, you will probably discover that there is a very good reason why most binary tree implementations do not contain these extra fields. It's a lot of work to maintain them across the many different operations you want to perform on trees. depth, especially, is hard to keep correct when you need to re-balance a tree.
And the fields don't generally buy you anything. Most algorithms that operate on trees do so using recursive functions, and as you can see from the above examples it's really easy to re-calculate both parent and depth on the fly while you're walking the tree. They don't need to be stored in the nodes themselves.
Tree-balancing often needs to know the difference in heights of the left and right subtrees. ("depth" is the distance to the root; "height" is the distance to the most distant leaf node in the subtree.) height is not so easy to calculate on the way down from the root, but fortunately you're usually only interested in which of the subtrees has the greatest height, and for that it's usually sufficient to store only the values -1, 0, +1 in each node.

Efficiently convert array to cartesian tree

I know how to convert an array to a cartesian tree in O(n) time
http://en.wikipedia.org/wiki/Cartesian_tree#Efficient_construction and
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor#From RMQ to LCA
However, the amount of memory required is too high (in constants), since I need to associate at least a left and a right pointer with every node in the Cartesian tree.
Can anyone link me to work done to reduce these constants (hopefully to 1)?
You do not need to keep the left and right pointers associated with your Cartesian tree nodes.
You only need to keep the parent of each node. By the definition of a Cartesian tree
(a Cartesian tree of an array A[0, N - 1] is a binary tree C(A) whose root is a minimum element of A, labeled with the position i of this minimum; the left child of the root is the Cartesian tree of A[0, i - 1] if i > 0, otherwise there is no left child; the right child is defined similarly for A[i + 1, N - 1]), you can simply traverse the array: if a node's parent has a lower index than the node itself, then the node is the right child of its parent; similarly, if the parent has a higher index, the node is the left child of its parent.
Hope this helps.
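A small Python sketch of that reconstruction, assuming a parentOf array of indexes with -1 marking the root (names are assumptions):

    def children_from_parents(parent_of):
        # rebuild left/right child indexes from the parent-index array;
        # a child with a smaller index than its parent is the left child,
        # a child with a larger index is the right child (-1 means "none")
        n = len(parent_of)
        left = [-1] * n
        right = [-1] * n
        for i, p in enumerate(parent_of):
            if p == -1:
                continue            # i is the root
            if i < p:
                left[p] = i
            else:
                right[p] = i
        return left, right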
It is possible to construct a Cartesian tree with only extra space for child-to-parent references (by index): so besides the input array, you would need an array of equal size, holding index values that relate to the first array. If we call that extra array parentOf, then array[parentOf[i]] will be the parent of array[i], except when array[i] is the root. In that case parentOf[i] should be like a NIL pointer (or, for example, -1).
The Wikipedia article on Cartesian trees, gives a simple construction method:
One method is to simply process the sequence values in left-to-right order [...] in a structure that allows both upwards and downwards traversal of the tree
This may give the impression that it is necessary for that algorithm to maintain both upwards and downwards links in the tree, but this is not the case. It can be done with only maintaining links from child to parent.
During the construction, a new value is injected into the path that ends in the rightmost node (having the value that was most recently added). Any child in that path is by necessity a right child of its parent.
While walking up that path in the opposite direction, from the leaf, keep track of a parent and its right child (where you came from). Once you find the insertion point, that child will get the new node as its parent, and the new node will get the "old" parent as its parent.
At no instance in this process do you need to store pointers to children.
Here is the algorithm written in JavaScript. As example, the tree is populated from the input array [9,3,7,1,8,12,10,20,15,18,5]. For verification only, both the input array and the parent references are printed:
class CartesianTree {
    constructor() {
        this.values = [];
        this.parentOf = [];
    }
    extend(values) {
        for (let value of values) this.push(value);
    }
    push(value) {
        let added = this.values.length; // index of the new value
        let parent = added - 1;         // index of the most recently added value
        let child = -1;                 // a NIL pointer
        this.values.push(value);
        while (parent >= 0 && this.values[parent] > value) {
            child = parent;
            parent = this.parentOf[parent]; // move up
        }
        // inject the new node between child and parent
        this.parentOf[added] = parent;
        if (child >= 0) this.parentOf[child] = added;
    }
}

let tree = new CartesianTree;
tree.extend([9,3,7,1,8,12,10,20,15,18,5]);
printArray("indexes:", tree.values.keys());
printArray(" values:", tree.values);
printArray("parents:", tree.parentOf);

function printArray(label, arr) {
    console.log(label, Array.from(arr, value => (""+value).padStart(3)).join(" "));
}
You can use a heap layout to store your tree: essentially it is an array where the first element is the root, the second is the left child of the root, the third the right child, and so on. It is much cheaper but requires a little more care when programming it.
http://en.wikipedia.org/wiki/Binary_heap
