Augmenting AVL tree, sequence - algorithm

I found this in book:
Design a data structure for maintaining dynamic sequence x_1, x_2,... , x_n
which provides operations:
Insert(a,i) - inserts a as x_i, indexes from i+1 to n go up by 1
Delete(i) - deletes x_i, indexes from i+1 to n go down by 1
Find(i) - returns element x_i
Sum_even() - returns sum of the elements with even indexes
The sequence is dynamic so I want to use AVL tree to keep it. So I think I can use standard trick for find, delete and insert - just keep in each node size of sub tree rooted in this node. Then I can easily navigate the tree to find ith element in O(log n) time. Inorder gives me then: x_1, x_2,..., x_n.
But for Sum_even() I probably should have also sum of elements with even and odd indexes in each node. Although it's difficult to update while inserting or deleting. How should it work, can anyone help?

You don't have to store size of subtree in a node. AVL tree stores only difference in heights of left and right subtree: -1, 0 or +1.
In order to easily maintain sum_even you need to store sum_even and sum_odd for each node. It will be the sum of values on even/odd indexes for subtree.
So each node will have variables:
difference difference in height of left and right subtrees
value
parity parity of subtree size (0 or 1)
sum_even
sum_odd
For inserts and deletes use standard algorithm http://en.wikipedia.org/wiki/AVL_tree.
But for each affected node (nodes on the way up to root and rotated nodes) update values:
parity := (left.parity + right.parity + 1) % 2
if left.parity == 0
sum_even := left.sum_even + right.sum_odd
sum_odd := left.sum_odd + right.sum_even + value
else
sum_even := left.sum_even + right.sum_even + value
sum_odd := left.sum_odd + right.sum_odd

Related

Find the largest subarray with all ones

Given a binary array (element is 0 or 1), I need to find the maximum length of sub array having all ones for given range(l and r) in the array.
I know the O(n) approach to find such sub array but if there are O(n) queries the overall complexity becomes O(n^2).
I know that segment tree is used for such type of problems but I cant figure out how to build tree for this problem.
Can I build a segment tree using which I can answer queries in log(n) time such that for O(n) queries overall complexity will become O(nlog(n)).
Let A be your binary array.
Build two array IL and IR:
- IL contains, in order, every i such that A[i] = 1 and (i = 0 or A[i-1] = 0);
- IR contains, in order, every i such that A[i-1] = 1 and (i = N or A[i] = 0).
In other words, for any i, the range defined by IL[i] inclusive and IR[i] non-inclusive corresponds to a sequence of 1s in A.
Now, for any query {L, R} (for the range [L; R] inclusive), let S = 0. Traverse both IL and IR with i, until IL[i] >= L. At this point, if IR[i-1] > L, set S = IR[i-1]-L. Continue traversing IL and IR, setting S = max(S, IR[i]-IL[i]), until IR[i] > R. Finally, if IL[i] <= R, set S = max(S, R-IL[i]).
S is now the size of the greatest sequence of 1s in A between L and R.
The complexity of building IL and IR is O(N), and the complexity of answering a query is O(M), with M the length of IL or IR.
Yes, you can use a segment tree to solve this problem.
Let's try to think what that tree must look like. Obviously, every node must contain the length of max subarray of 1s and 0s in that range.
Now, how do we join two nodes into a bigger one. In other words, you have a node representing [low, mid) and a node representing [mid, high). You have to obtain max subarray for [low, high).
First things first, max for whole will at least be max for parts. So we have to take the maximum among the left and right values.
But what if the real max subarray overlaps both nodes? Well, then it must be the rightmost part of left node and leftmost part of right node. So, we need to keep track of longest subarray at start and end as well.
Now, how to update these left and rightmost subarray lengths? Well, leftmost of parent node must be leftmost of left child, unless leftmost of left child spans the entire left node. In that case, leftmost of parent node will be leftmost of left + leftmost of right node.
A similar rule applies to tracking the rightmost subarray of 1s.
And we're finished. Here's the final rules in pseudo code.
max_sub[parent] = max(max_sub[left], max_sub[right], right_sub[left] + left_sub[right])
left_sub[parent] = left_sub[left] if left_sub[left] < length[left] else left_sub[left] + left_sub[right]
right_sub[parent] = right_sub[right] if right_sub[right] < length[right] else right_sub[right] + right_sub[left]
Note that you will need to take similar steps when finding the result for a range.
Here's an example tree for the array [0, 1, 1, 0, 1, 1, 1, 0].

Find two elements with smallest absolute difference in an interval

I'm given an array and a list of queries of type L R which mean find the smallest absolute difference between any two array elements such that their indices are between L and R inclusive (Here the starting index of array is at 1 instead of at 0)
For example take the array a with elements 2 1 8 5 11 then the query 1-3 which would be (2 1 8) the answer would be 1=2-1, or the query 2-4 (1 8 5) where the answer would be 3=8-5
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
The problem is that I'll have a lot of intervals to check I have to keep the original array intact.
What I've done is I constructed a new array b with indices from the first one such that a[b[i]] <= a[b[j]] for i <= j. Now for each query I loop through the whole array and look if b[j] is between L and R if it is compare its absolute difference to the first next element that is also between L and R keep the minimum and then do the same for that element until you get to the end.
This is inefficient because for each query I have to check all elements of the array especially if the query is small compared to the size of array. I'm looking for a time efficient approach.
EDIT: The numbers don't have to be consecutive, perhaps I gave a bad array as an example, What I've meant for example if it's 1 5 2 then the smallest difference is 1=2-1. In a sorted array the smallest difference is guaranteed to be between two consecutive elements, that's why I've thought of sorting
I'll sketch an O(n (√n) log n)-time solution, which might be fast enough? When I gave up sport programming, computers were a lot slower.
The high-level idea is to apply Mo's trick to a data structure with the following operations.
insert(x) - inserts x into the underlying multiset
delete(x) - deletes one copy of x from the underlying multiset
min-abs-diff() - returns the minimum absolute difference
between two elements of the multiset
(0 if some element has multiplicity >1)
Read in all of the query intervals [l, r], sort them in order of lexicographically nondecreasing (floor(l / sqrt(n)), r) where n is the length of the input, and then to process an interval I, insert the elements in I - I' where I' was the previous interval, delete the elements in I' - I, and report the minimum absolute difference. (The point of the funny sort order is to reduce the number of operations from O(n^2) to O(n √n) assuming n queries.)
There are a couple ways to implement the data structure to have O(log n)-time operations. I'm going to use a binary search tree for clarity of exposition, but you could also sort the array and use a segment tree (less work if you don't have a BST implementation that lets you specify decorations).
Add three fields to each BST node: min (minimum value in the subtree rooted at this node), max (maximum value in the subtree rooted at this node), min-abs-diff (minimum absolute difference between values in the subtree rooted at this node). These fields can be computed bottom-up like so.
if node v has left child u and right child w:
v.min = u.min
v.max = w.max
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max,
w.min - v.value, w.min-abs-diff)
if node v has left child u and no right child:
v.min = u.min
v.max = v.value
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
if node v has no left child and right child w:
v.min = v.value
v.max = w.max
v.min-abs-diff = min(w.min - v.value, w.min-abs-diff)
if node v has no left child and no right child:
v.min = v.value
v.max = v.value
v.min-abs-diff = ∞
This logic can be implemented pretty compactly.
if v has a left child u:
v.min = u.min
v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
else:
v.min = v.value
v.min-abs-diff = ∞
if v has a right child w:
v.max = w.max
v.min-abs-diff = min(v.min-abs-diff, w.min - v.value, w.min-abs-diff)
else:
v.max = v.value
insert and delete work as usual, except that the decorations need to be updated along the traversal path. The total time is still O(log n) for reasonable container choices.
min-abs-diff is implemented by returning root.min-abs-diff where root is the root of the tree.
EDIT #2: My answer determines the smallest difference between any two adjacent values in a sequence, not the smallest difference between any two values in the sequence.
When you say that you have a lot of intervals to check, do you happen to mean that you have to perform checks of many intervals over the same sequence of numbers? If so, what if you just pre-computed the differences from one number to the next? E.g., in Python:
elements = [2, 1, 8, 5, 11]
def get_differences(sequence):
"""Yield absolute differences between each pair of items in the sequence"""
it = iter(sequence)
sentinel = object()
previous = next(it, sentinel)
if previous is sentinel:
return ()
for current in it:
yield abs(previous - current)
previous = current
differences = list(get_differences(elements)) # differences = [1, 7, 3, 6]
Then when you have to find the minimum difference, just return min(differences[start_index:stop_index-1].
EDIT: I missed your paragraph:
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
But I still think what I'm saying makes sense; you don't have to sort the entire collection but you still need to do an O(n) operation. If you're dealing with numeric values on a platform where the numbers can be represented as machine integers or floats, then as long as you use an array-like container, this should be cache friendly and relatively efficient. If you happen to have repeated queries, you might be able to do some memoization to cache pre-computed results.

Each subtree sum in a binary tree in O(N) time and O(1) space?

Given a binary tree, I want to return the maximum sum subtree's root.
Maximum Subtree: A subtree of the tree, whose sum of all its nodes is greater than that of the any other sub tree.
Edit: Node value is an integer.
I can do the following which takes O(n^2).
Calculate the sum of all nodes in the left subtree
Calculate the sum of all nodes in the right subtree
If sum of left subtree and right subtree and value of the root is greater than the current maximum sum, root is stored in the result
call this function recursively , with left subtree as root
call this function recursively , with right subtree as root.
This will take O(n^2).
I can change it to a bottom up approach and use a hashmap to store the node to sum mapping which will make it O(N) but it will take O(N) space.
Is there any approach/method, that is O(N) time and O(1) space ?
That's almost your solution, but takes O(n) and O(h) memory. You just visit each node only once.
calculateSum(vertex):
if not vertex:
return 0
sum = calculateSum(left) + calculateSum(right) + vertex.value
if (sum > max)
max = sum
return sum
Yes, you can do it in O(n) with O(h) space if you calculate the sum on the fly. while iterating the tree. (In here, h is the maximal height of the tree, which is the size of the recursion stack).
Pseudo code:
TreeSum(v):
if (v == null):
return 0
v_sum = TreeSum(v.left) + TreeSum(v.right) + v.value
# max_sum is some global space variable holding the max_sum.
# you hold only once such variable.
if v_sum > max_sum:
max_sum = v_sum
return v_sum
When you are done, max_value hold the value of the maximal such sum.
If you also need the node itself, hold an extra variable which is a pointer to the relevant node, and is modified together with max_sum.
The idea is to do a post order traversal on the tree. First calculate the sum of each subtree, and only then - calculate the value of the root.
While calculating the sum of each subtree, also modify max_sum when you find a new "best" subtree.

Partition a binary tree into k parts with similar sizes

I was trying to split a binary-tree into k similar-sized parts (by removing k-1 edges). Is there any efficient algorithm for this problem? Or is it NP-hard? Any pointers to papers, problem definitions, etc?
-- One reasonable metric for evaluating the quality of partitioning could be the size gap between the largest and smallest partition; another metric could be making the smallest partition having as many vertices as possible.
I can suggest pretty fast solution for making the smallest part having as many vertices as possible metric.
Let suppose we guess the size S of smallest partit and want check if it's correct.
First I want to make a few statements:
If total size of tree bigger than S there is at least one subtree which is bigger than S and all subtrees of that subtree are smaller. (It's enough to check both biggest.)
If there is some way to split tree where size of smallest part >= S and we have subtree T all subtrees of which are smaller than S than we can grant that no edges inside T are deleted. (Cause any such deletion will create a partition which will be smaller than S)
If there is some way to split tree where size of smallest part >= S, and we have some subtree T which size >= S, has no deleted edges inside but is not one of parts, we can split the tree in other way where subtree T will be one of parts itself and all parts will be no smaller than S. (Just move some extra vertices from original part to any other part, this other part will not become smaller.)
So here is an algorithm to check if we can split the tree in k parts no smaller than S.
find all suitable vertices (roots of subtrees of size >= S and size for both childs < S) and add them in list. You can start from the root and move through all vertices while subtrees are bigger than S.
While list not empty and number of parts lesser then K take a vertice from the list and cut it off the tree. Than update size of subtrees for parent vertices and add to the list if one of them become suitable.
You even have no need to update all the parent vertices, only until you will find first which's new subtree size is bigger than S, parent vertices cant't be suitable for adding in list yet and can be updated later.
You may need to construct tree back to restore original subtree sizes assigned to the vertices.
Now we can use bisection method. We can determine upper bound as Smax = n/k and lower bound can be retrieved from equation (2*Smin- 1)*(K - 1) + Smin = N it will grants that if we will cut off k-1 subtrees with two child subtrees of size Smin - 1 each, we will have part of size Smin left. Smin = (n + k -1)/(2*k - 1)
And now we can check S = (Smax + Smin)/2
If we manage to construct partition using the method above than S is smaller or equal to it's largest possible value, also smallest part in constructed partition may be bigger than S and we can set new lower bound to it instead of S, if we fail S is bigger than possible.
Time complexity of one check is k multiplied by number of parent nodes updated each time, for well balanced tree number of updated nodes is constant (we will use trick explaned earlier and will not update all parent nodes), still it's not bigger than (n/k) in worst case for ultimately unbalanced tree. Searching for suitable vertices has very similar behavior (all vertices passed while searching will be updated later.).
Difference between n/k and (n + k -1)/(2*k - 1) is proportional to n/k.
So we have time complexity O(k * log(n/k)) in best case if we have precalculated subtree sizes, O(n) if subtree sizes are not precalculated and O(n * log(n/k)) in worst case.
This method may lead to situation when last of parts will be comparably big but I suppose once you've got suggested method you can figure out some improvements to minimize it.
Here is a polynomial deterministic solution:
Let's assume that the tree is rooted and there are two fixed values: MIN and MAX - minimum and maximum allowed size of one component.
Then one can use dynamic programming to check if there is a partition such that each component size is between MIN and MAX:
Let's assume f(node, cuts_count, current_count) is true if and only if there is a way to make exactly cuts_count cuts in node's subtree so that current_count vertices are connected to the node so that condition 2) holds true.
The base case for the leaves is: f(leaf, 1, 0)(cut the edge from the parent to the leaf) is true if and only if MIN <= 1 and MAX >= 1 f(leaf, 0, 1)(do not cut it) is always true(it is false for all other values of cuts_count and current_count).
To compute f for a node(not a leaf), one can use the following algorithm:
//Combine all possible children states.
for cuts_left in 0..k
for cuts_right in 0..k
for cnt_left in 0..left_subtree_size
for cnt_right in 0..right_subtree_size
if f(left_child, cuts_left, cnt_left) is true and
f(right_child, cuts_right, cnt_right) is true and then
f(node, cuts_left + cuts_right, cnt_left + cnt_right + 1) = true
//Cut an edge from this node to its parent.
for cuts in 0..k-1
for cnt in 0..node's_subtree_size
if f(node, cuts, node's_subtree_size) is true and MIN <= cnt <= MAX:
f(node, cuts + 1, 0) = true
What this pseudo code does is combining all possible states of node's children to compute all reachable states for this node(the first bunch of for loops) and then produces the rest of reachable states by cutting the edge between this node and its parent(the second bunch of for loops)(the state means (node, cuts_count, current_count) tuple. I call it reachable if f(state) is true).
That is the case for a node with two children, the case with one child can be processes in similar manner.
Finally, if f(root, k, 0) is true then it is possible to find the partition which stratifies the condition 2) and it is not possible otherwise. We need to "pretend" that we did k cuts here because we also cut an imaginary edge from root to its parent(this edge and this parent doesn't exist actually) when we computed f for the root(to avoid corner case).
The space complexity of this algorithm(for fixed MIN and MAX) is O(n^2 * k)(n is the number of nodes), time complexity is O(k^2 * n^2). It might seem that the complexity is actually O(k^2 * n^3), but is not so because the product of number of vertices in left and right subtree of a node is exactly the number of pairs of node's such that their least common ancestor is this node. But the total number of pairs of nodes is O(n^2)(and each pair has only one least common ancestor). Thus, the sum of products of left and right subtree sizes over all nodes is O(n^2).
One can simply try all possible MIN and MAX values and choose the best, but it can be done faster. The key observation is that if there is a solution for MIN and MAX, there is always a solution for MIN and MAX + 1. Thus, one can iterate over all possible values of MIN(n / k different values) and apply binary search to find the smallest MAX which gives a valid solution(log n iterations). So the overall time complexity is O(n^2 * k^2 * n / k * log n) = O(n^3 * k * log n). However, if you want to maximize MIN(not to minimize the difference between MAX and MIN), you can simply use this algorithm and ignore MAX value everywhere(by setting its value to n). Then no binary search over MAX would be required, but one would be able to binary search over MIN instead and obtain an O(n^2 * k^2 * log n) solution.
To reconstruct the partition itself, one can start from f(root, k, 0) and apply the steps we used to compute f, but this time in opposite direction(from root to leaves). It is also possible to save the information about how to get the value of each state(what children's states were combined or what was the state before the edge was cut)(and update it appropriately during the initial computation of f) and then reconstruct the partition using this data(if my explanation of this step seems not very clear, reading an article on dynamic programming and reconstructing the answer might help).
So, there is a polynomial solution for this problem on a binary tree(even though it is NP-hard for an arbitrary graph).

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some of the cases I think we need to consider:
Case 1 : r1.data < r2.data
2 subproblems to solve:
first, check r1 and r2.left
second, check r1.right and r2
Case 2 : r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3 : r1.data == r2.data
Again, 2 cases to consider here:
(a) current node is part of largest common BST
compute common subtree size rooted at r1 and r2
(b)current node is NOT part of largest common BST
2 subproblems to solve:
first, solve r1.left and r2.left
second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
if (T is null):
return 0 // leaf case
h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
if (first):
// store hashes of T1's nodes in the set H
H.insert(h)
else:
// check for hashes of T2's nodes in the set H containing T1's nodes
if H.contains(h):
ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
return h
H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
Int bestMatch := 0
Int bestMatchCount := 0
For each Node n in tree1 //should iterate breadth-first
//possible optimization: we can skip every node that is part of each subtree we find
Node n2 := BinarySearch(tree2(n.value))
Int matchCount := CountMatches(n, n2)
If (matchCount > bestMatchCount)
bestMatch := n.value
bestMatchCount := matchCount
End
End
Return ExtractSubtree(BinarySearch(tree1(bestMatch)), BinarySearch(tree2(bestMatch)))
End
CountMatches(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return 0
End
Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End
ExtractSubtree(Node n1, Node n2)
If (!n1 || !n2 || n1.value != n2.value)
Return nil
End
Node result := New Node(n1.value)
result.left := ExtractSubtree(n1.left, n2.left)
result.right := ExtractSubtree(n1.right, n2.right)
Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know than any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intution: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum subtree of T and S
Is completely contained in T1 and r.
Is completely contained in T2 and r.
Uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If MCS(T1, S) or MCS(T2, S) has the roots of T1 or T2 as roots, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
For each node v in the postorder traversal, in the order they were visited:
Look up the corresponding node in the hash table for the nodes of S.
If no node was found, set the size of the MCS rooted at v to 0.
If a node v' was found in S:
If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(Note that step (4) runs in expected O(n) time, since it visits each node in S exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (S has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then either it exists in T or it does not. If it exists in T, then it can't have any children at all, so we execute branch 4.3.1 and say that it has height one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by #jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and T(S) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in T(S), draw an edge between them in B if they have the same value AND
{
(i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
(ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
(iii): nodeS->left is connected to nodeT-> right and nodeS->right == NULL and nodeT->right==NULL
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If (R(S)_next) is nonempty {
(i) swap R(S) and R(S)_next and swap R(T) and R(T)_next.
(ii) Empty the contents of R(S)_next and R(T)_next.
(iii) Return to step (4).
}
When this algorithm terminates, R(S) and T(S) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity is O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and T(S) can be stored in BST’s ordered by value, however I would be interested to see a proof.

Resources