An algorithm to operate on a binary tree leaves? - algorithm

I didn't systematically learn the Data Structure and Algorithm course in the uni (just read some books) and would like to ask if there is a well-defined algorithm to do the following work for a binary tree:
For a given binary tree and a positive integer n, search its leaves. If the difference between the depth of the two adjacent leaves (imagine all leaves are display as an array, so two adjacent leaves may be in two different sub-trees) is larger than n. Subdivide the leaf with lower depth. Recursively do this operation until no subdivision is required.
The following figure is a demonstration, for n:
Since leaf 1's depth is smaller than leaf 2 by 2, leaf 1 need to be subdivided:
Now no further subdivision is required.

Initialize P as empty
DFS the remaining part of the tree until you are in a leaf L; exit if no more leaves
Compare the current depth DL with the depth DP of the previous leaf P; go to [5] if P is empty
If |DL – DP| ≥ N, split either L (when DL < DP) or P (otherwise)
Consider L the new P and go to [2]
Repeat this until the tree converges (i.e. until no splits can be performed).

Related

How many non-isomorphic full binary trees can n nodes generate?

Isomorphism means that arbitary sub-trees of a full binary tree swapping themselves can be identical to another one.
The answer is definitely not Catalan Number, because the amount of Catalan Number counts the isomorphic ones.
I assume by n nodes you mean n internal nodes. So it will have 2n+1 vertices, and 2n edges.
Next, we can put an order on binary trees as follows. A tree with more nodes is bigger. If two trees have the same number of nodes, compare the left side, and break ties by comparing the right. If two trees are equal in this order, it isn't hard to show by induction that they are the same tree.
For your problem, we can assume that for each isomorphism class we are only interested in the maximal tree in that isomorphism class. Note that this means that both the left and the right subtrees must also be maximal in their isomorphism classes, and the left subtree must be the same as or bigger than the right.
So suppose that f(n) is the number of non-isomorphic binary trees with n nodes. We can now go recursively. Here are our cases:
n=0 there is one, the empty tree.
n=1 there is one. A node with 2 leaves.
n > 1. Let us iterate over m, the number on the right. If 2m+1 < n then there are f(m) maximal trees on the right, f(n-m-1) on the left, and all of those are maximal for f(m) * f(n-m-1). If 2m+1 = n then we want a maximal tree on the right with m nodes, and a maximal tree on the left with m nodes, and the one on the right has to be smaller than or equal to the one on the left. But there is a well-known formula for how many ways to do that, which is f(m) * (f(m) + 1) / 2. And finally we can't have n < 2m+1 because in that case we don't have a maximal tree.
Using this, you should be able to write a recursive function to calculate the answer.
UPDATE Here is such a function:
cache = {0: 1, 1:1}
def count_nonisomorphic_full_trees (n):
if n not in cache:
answer = 0
for m in range(n):
c = count_nonisomorphic_full_trees(m)
if n < m+m+1:
break
elif n == m+m+1:
answer = answer + c*(c+1)//2
else:
d = count_nonisomorphic_full_trees(n-m-1)
answer = answer + c*d
cache[n] = answer
return cache[n]
Note that it starts off slower than the Catalan numbers but still grows exponentially.

Find algorithm in 2-3 BST tree

Is there an algorithm that with a given 2-3 tree T and a pointer to some node v in said tree, the algo can change the key of the node v so T would remain a legal 2-3 tree, in O(logn/loglogn) amortized efficiency?
No.
Assume it was possible, with the algorithm f, we will show we can sort an array with O(n*logn/loglogn) time complexity.
sort array A of length n:
(1) Create an 2-3 tree of size n, with no importance to keys. let it be T.
(2) store all pointers to nodes in T in a second array B.
(3) for each i from 0 to n:
(3.1) f(B[i],A[i]) //modify the tree: pointer: B[i] new value: A[i]
(4) extract elements from T back to A inorder.
correctness:
After each activation of f the tree is legal. After finishing activating f on all elements of T and all elements of A, the tree is legal and contains all elements. Thus, extracting elements from A, we get back the sorted array.
complexity:
(1)Creating a tree [no importance which keys we put] is O(n) we can put 0 in all elements, it doesn't matter
(2)iterating T and creating B is O(n)
(3)activating f is O(logn/loglogn), thus invoking it n times is O(n*logn/loglogn)
(4) extracting elements is just a traversal: O(n)
Thus: total complexity is O(n*logn/loglogn)
But sorting is an Omega(nlogn) problem with comparisons based algorithms. contradiction.
Conclusion: desired f doesn't exist.

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some of the cases I think we need to consider:
Case 1 : r1.data < r2.data
2 subproblems to solve:
first, check r1 and r2.left
second, check r1.right and r2
Case 2 : r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3 : r1.data == r2.data
Again, 2 cases to consider here:
(a) current node is part of largest common BST
compute common subtree size rooted at r1 and r2
(b)current node is NOT part of largest common BST
2 subproblems to solve:
first, solve r1.left and r2.left
second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
if (T is null):
return 0 // leaf case
h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
if (first):
// store hashes of T1's nodes in the set H
H.insert(h)
else:
// check for hashes of T2's nodes in the set H containing T1's nodes
if H.contains(h):
ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
return h
H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
Int bestMatch := 0
Int bestMatchCount := 0
For each Node n in tree1 //should iterate breadth-first
//possible optimization: we can skip every node that is part of each subtree we find
Node n2 := BinarySearch(tree2(n.value))
Int matchCount := CountMatches(n, n2)
If (matchCount > bestMatchCount)
bestMatch := n.value
bestMatchCount := matchCount
End
End
Return ExtractSubtree(BinarySearch(tree1(bestMatch)), BinarySearch(tree2(bestMatch)))
End
CountMatches(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return 0
End
Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End
ExtractSubtree(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return nil
End
Node result := New Node(n1.value)
result.left := ExtractSubtree(n1.left, n2.left)
result.right := ExtractSubtree(n1.right, n2.right)
Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know than any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intution: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum subtree of T and S
Is completely contained in T1 and r.
Is completely contained in T2 and r.
Uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If MCS(T1, S) or MCS(T2, S) has the roots of T1 or T2 as roots, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
For each node v in the postorder traversal, in the order they were visited:
Look up the corresponding node in the hash table for the nodes of S.
If no node was found, set the size of the MCS rooted at v to 0.
If a node v' was found in S:
If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(Note that step (4) runs in expected O(n) time, since it visits each node in S exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (S has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either it exists in T or it does not. If it exists in T, then it can't have any children at all, so we execute branch 4.3.1 and say that it has height one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by #jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and T(S) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in T(S), draw an edge between them in B if they have the same value AND
{
(i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
(ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
(iii): nodeS->left is connected to nodeT-> right and nodeS->right == NULL and nodeT->right==NULL
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If (R(S)_next) is nonempty {
(i) swap R(S) and R(S)_next and swap R(T) and R(T)_next.
(ii) Empty the contents of R(S)_next and R(T)_next.
(iii) Return to step (4).
}
When this algorithm terminates, R(S) and T(S) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity is O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and T(S) can be stored in BST’s ordered by value, however I would be interested to see a proof.

N-ary trees - is it symmetric or not

Given an N-ary tree, find out if it is symmetric about the line drawn through the root node of the tree. It is easy to do it in case of a binary tree. However for N-ary trees it seems to be difficult
One way to think about this problem is to notice that a tree is symmetric if it is its own reflection, where the reflection of a tree is defined recursively:
The reflection of the empty tree is itself.
The reflection of a tree with root r and children c1, c2, ..., cn is the tree with root r and children reflect(cn), ..., reflect(c2), reflect(c1).
You can then solve this problem by computing the tree's reflection and checking if it's equal to the original tree. This again can be done recursively:
The empty tree is only equal to itself.
A tree with root r and children c1, c2, ..., cn is equal to another tree T iff the other tree is nonempty, has root r, has n children, and has children that are equal to c1, ..., cn in that order.
Of course, this is a bit inefficient because it makes a full copy of the tree before doing the comparison. The memory usage is O(n + d), where n is the number of nodes in the tree (to hold the copy) and d is the height of the tree (to hold the stack frames in the recursion tom check for equality). Since d = O(n), this uses O(n) memory. However, it runs in O(n) time since each phase visits each node exactly once.
A more space-efficient way of doing this would be to use the following recursive formulation:
1. The empty tree is symmetric.
2. A tree with n children is symmetric if the first and last children are mirrors, the second and penultimate children are mirrors, etc.
You can then define two trees to be mirrors as follows:
The empty tree is only a mirror of itself.
A tree with root r and children c1, c2,..., cn is a mirror of a tree with root t and children d1, d2, ..., dn iff r = t, c1 is a mirror of dn, c2 is a mirror of dn-1, etc.
This approach also runs in linear time, but doesn't make a full copy of the tree. Comsequently, the memory usage is only O(d), where d is the depth of the tree. This is at worst O(n) but is in all likelihood much better.
I would just do an in-order tree traversal( node left right) on the left sub-tree and save it to a list. Then do another in-order tree traversal (node right left) on the right sub-tree and save it to a list. Then, you can just compare the two lists. They should be the same.
Take a stack
Now each time start traversing through root node,
now recursively call a function and push the element of left sub tree one by one at a particular level.
maintain a global variable and update its value each time a left sub tree is pushed onto the stack.now call recursively(after recursive call to left sub tree)the right sub and pop on each correct match.doing this will ensure that it is being checked in symmetric manner.
At the end if stack is empty ,i.e. all elements are processed and each element of stack has been popped out..you are through!
One more way to answer this question is to look at each level and see if each level is palindrome or not. For Null nodes, we can keep adding dummy nodes with any value which is unique.
It's not difficult. I'm going to play golf with this question. I got 7... anyone got better?
data Tree = Tree [Tree]
symmetrical (Tree ts) =
(even n || symmetrical (ts !! m)) &&
all mirror (zip (take m ts) (reverse $ drop (n - m) ts))
where { n = length ts; m = n `div` 2 }
mirror (Tree xs, Tree ys) =
length xs == length ys && all mirror (zip xs (reverse ys))

Enumerate search trees

According to this question the number of different search trees of a certain size is equal to a catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:
Node* id2tree(int id); // return root of tree
int tree2id(Node* root); // return id of tree
(I ask because the binary code for the tree (se one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e, a variable length code for integers
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc
notice that the number of integers of each code length is 1, 1, 2, 5,.. (the catalan sequence). )
It should be possible to convert the id to tree and back.
The id and bitstrings being:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
First consider the fact that given a bitstring, we can easily move to the tree (a recursive method) and viceversa (preorder, outputting 1 for parent and 0 for leaf).
The main challenge comes from trying to map the id to the bitstring and vice versa.
Suppose we listed out the trees of n nodes as follows:
Left sub-tree n-1 nodes, Right sub-tree 0 nodes. (Cn-1*C0 of them)
Left sub-tree n-2 nodes, Right sub-tree 1 node. (Cn-2*C1 of them)
Left sub-tree n-3 nodes, right sub-tree 2 nodes. (Cn-3*C2 of them)
...
...
Left sub-tree 0 nodes, Right sub-tree n-1 nodes. (C0*Cn-1 of them)
Cr = rth catalan number.
The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.
So say we have an id = S say. We first find an n such that
C0 + C1 + C2 + ... + Cn < S <= C0+C1+ C2 + ... +Cn+1
Then S would correspond to a tree with n+1 nodes.
So you now consider P = S - (C0+C1+C2+ ...+Cn), which is the position in the enumeration of the trees of n+1 nodes.
Now we figure out an r such that Cn*C0 + Cn-1*C1 + .. + Cn-rCr < P <= CnC0 + Cn-1*C1 + .. + Cn-r+1*Cr-1
This tell us how many nodes the left subtree and the right subtree have.
Considering P - Cn*C0 + Cn-1*C1 + .. + Cn-r*Cr , we can now figure out the exact left subtree enumeration position(only considering trees of that size) and the exact right subtree enumeration position and recursively form the bitstring.
Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.
Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.
For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the amount of children) for every node, only restricted by having the total reach exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).
But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.
Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).
So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.
I hope this is a good starting point for a discussion.

Resources