Encoding directed graph as numbers - algorithm

Let's say that I have a directed graph, with a single root and without cycles. I would like to add a type on each node (for example as an integer with some custom ordering) with the following property:
if Node1.type <= Node2.type then there exists a path from Node1 to Node2
Note that topological sorting actually satisfies the reversed property:
if there exists a path from Node1 to Node2 then Node1.type <= Node2.type
so it cannot be used here.
Now note that integers with natural ordering cannot be used here because every 2 integers can be compared, i.e. the ordering of integers is linear while the tree does not have to be.
So here's an example. Assume that the graph has 4 nodes A, B, C, D and 4 arrows:
A->B, A->C, B->D, C->D
So it's a diamond. Now we can put
A.type = 00
B.type = 01
C.type = 10
D.type = 11
where on the right side we have integers in binary format. The comparison is defined bitwise:
(X <= Y) if and only if (n-th bit of X <= n-th bit of Y for all n)
So I guess such ordering could be used, the question is how to construct values from a given graph? I'm not even sure if the solution always exists. Any hints?
UPDATE: Since there is some misunderstanding about the terminology I'm using, let me be more explicit: I'm interested in a directed acyclic graph such that there is exactly one node without predecessors (a.k.a. the root) and there's at most one arrow between any two nodes. The diamond would be an example. It does not have to have one leaf (i.e. the node without successors). Each node might have multiple predecessors and multiple successors. You might say that this is a partially ordered set with a smallest element (i.e. a unique globally minimal element).

You call the relation <=, but it's necessarily not complete (that is: it may be that for a given pair a and b, neither a <= b nor b <= a).
Here's one idea for how to define it.
If your nodes are numbered 0, 1..., N-1, then you can define type like this:
type(i) = (1 << i) + sum(1 << (N + j), for j such that Path(i, j))
And define <= like this:
type1 <= type2 if ((type1 >> N) & type2) != 0
That is, type(i) encodes the value of i in the lowest N bits, and the set of all reachable nodes in the highest N bits. The <= relation looks for the target node in the encoded set of reachable nodes.
This definition works whether or not there are cycles in the graph, and in fact just encodes an arbitrary relation on your set of nodes.
You could make the definition a little more efficient by using ceil(log2(N)) bits to encode the node number (for a total of N + ceil(log2(N)) bits per type).
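As a rough illustration of the encoding above, here is a minimal Python sketch. The function names encode_types and leq are made up for this example, and it assumes that Path(i, i) counts as true (every node reaches itself), so that the relation is reflexive.

```python
def encode_types(n, edges):
    """Return type(i) for every node: the low n bits hold 1 << i,
    the high n bits hold the set of nodes reachable from i."""
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    def reachable(start):
        # Iterative DFS; assumption: a node counts as reachable from itself.
        seen, stack = {start}, [start]
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    return [(1 << i) + sum(1 << (n + j) for j in reachable(i)) for i in range(n)]


def leq(t1, t2, n):
    """t1 <= t2 iff the node stored in t2's low bits is in t1's reachable set."""
    low_mask = (1 << n) - 1
    return ((t1 >> n) & t2 & low_mask) != 0


# Diamond from the question: A=0, B=1, C=2, D=3.
types = encode_types(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

With these values, leq(types[0], types[3], 4) holds (A reaches D), while B and C are incomparable.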

For any DAG, you can define x <= y as "there's a path from x to y". This relation is a partial order. I take it that the question is how to represent this relation efficiently.
For each vertex X, define ¡X to be the set of vertices reachable from X (including X itself). The two statements
¡X is a subset of ¡Y
X is reachable from Y
are equivalent.
Encode these sets as bitsets (N-bit binary numbers), and you are set.
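One possible way to build these bitsets, sketched in Python under the assumption that the nodes are numbered 0..N-1 and the graph is a DAG (the helper names are invented for this example): process the nodes in reverse topological order and OR each successor's set into its predecessor's.

```python
def reachability_bitsets(n, edges):
    """Return reach[x]: an integer whose bit y is set iff y is reachable from x."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm yields a topological order of the DAG.
    order = [v for v in range(n) if indeg[v] == 0]
    for u in order:                      # the list grows while we iterate over it
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)

    reach = [1 << v for v in range(n)]   # every node reaches itself
    for u in reversed(order):            # successors are complete before predecessors
        for v in succ[u]:
            reach[u] |= reach[v]
    return reach


def is_reachable(reach, src, dst):
    """dst is reachable from src iff reach[dst] is a subset of reach[src]."""
    return (reach[dst] & ~reach[src]) == 0


reach = reachability_bitsets(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

The subset test is a single AND plus a comparison, at the price of N bits per node.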

The question said (and continues to say) that the input is a tree, but a later edit contradicted this with an example of a diamond graph. In such non-tree cases, my algorithm below won't apply.
The existing answers work for general relations on general directed graphs, which inflates their representation sizes to O(n) bits for n vertices. Since you have a tree, a shorter O(log n)-bit representation is possible.
In a tree directed away from the root, for any two vertices u and v, the sets of leaves L(u) and L(v) reachable from u and v, respectively, must either be disjoint, or one must be a subset of the other. If they are disjoint, then u is not reachable from v (and vice versa); if one is a proper subset of the other, the one with the smaller set is reachable from the other (and in this case, the one with the smaller set will necessarily have strictly greater depth). If L(u) = L(v), then u is reachable from v if and only if depth(v) < depth(u), where depth(u) is the number of edges on the path from the root to u. (In particular, if L(u) = L(v) and depth(u) = depth(v), then u = v.)
We can encode this relationship concisely by noticing that all leaves reachable from a given vertex v occupy a contiguous segment of the leaves output by an inorder traversal of the tree. For any given vertex v, this set of leaves can therefore be represented by a pair of integers (first, last), with first identifying the first leaf (in inorder traversal order) and last the last. The test for whether a path exists from i to j is then very simple -- in pseudo-C++:
bool doesPathExist(int i, int j) {
    return x[i].first <= x[j].first && x[i].last >= x[j].last && depth[i] <= depth[j];
}
Note that if every non-leaf vertex in the tree has at least 2 children, then you don't need to bother with depths, since L(u) = L(v) implies u = v in this case. (My original version of the post made this assumption; I've now fixed it to work even when this is not the case.)
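To show how the (first, last) pairs and depths might be computed, here is a small Python sketch. It assumes the tree is given as a dict from each node to its ordered list of children; label_tree and does_path_exist are names chosen for this example, and leaves are numbered in left-to-right depth-first order.

```python
def label_tree(children, root):
    """Return (first, last, depth) dicts for a rooted tree {node: [children]}."""
    first, last, depth = {}, {}, {root: 0}
    next_leaf = 0
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # A leaf covers exactly its own position in the leaf order.
            first[node] = last[node] = next_leaf
            next_leaf += 1
        elif not done:
            stack.append((node, True))           # revisit after all children
            for kid in reversed(kids):
                depth[kid] = depth[node] + 1
                stack.append((kid, False))
        else:
            first[node] = first[kids[0]]
            last[node] = last[kids[-1]]
    return first, last, depth


def does_path_exist(labels, i, j):
    first, last, depth = labels
    return first[i] <= first[j] and last[i] >= last[j] and depth[i] <= depth[j]


# r has children a and b; a has a single child c; c has leaves d and e.
labels = label_tree({"r": ["a", "b"], "a": ["c"], "c": ["d", "e"]}, "r")
```

Here a and c cover the same leaves, so only the depth comparison tells them apart: does_path_exist(labels, "a", "c") is true and the reverse is false.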

Related

How to prune this type of sorted weighted trees to maximize this particular function?

Disclaimer #1: I'm not a pro, so many of my nomenclatures might be not standard or useful. Please bear with me / edit me.
Disclaimer #2: As the tags suggest, this may start out as a theoretical question, but I think it's a programming one, though some theory would also be nice.
First, let me describe this type of sorted weighted trees, now called SWR trees. Let T = (V, E, W, U, m, r) be an SWR tree. The only defining properties of T are:
T is an m-ary rooted tree with root r, and every leaf has the same height/level in T
T has predefined and unchanged weights on edges, defined by the function W: E -> R+ (R+ is the set of positive real numbers)
T has predefined and unchanged weights on leaves, defined by the function U: V_L -> R+ (V_L is the set of leaves in V)
For each non-leaf node v of T, its children are sorted in the increasing values of the edges connecting them to v
Now, let me describe the function on T, now called F(T). F will produce a number on T as follows:
Extend the function U to U*: V -> R+ as follows: for each non-leaf node v, assign to v the largest value of the child edges of v (the edges connecting v to its children)
For each height/level h of T, calculate f(h) as the minimum value of the vertices (defined by U*) at that height/level
Sum all of the f(h) to get F(T)
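To make the definition of F concrete, here is a minimal Python sketch that evaluates F(T) level by level. The representation is an assumption made for this example: children maps a node to a list of (edge_weight, child) pairs, leaf_weight holds U, and evaluate_F is an invented name.

```python
def evaluate_F(children, leaf_weight, root):
    """children: {node: [(edge_weight, child), ...]}; leaf_weight: {leaf: U(leaf)}."""
    total = 0
    level = [root]
    while level:
        values = []
        next_level = []
        for v in level:
            kids = children.get(v, [])
            if kids:
                # U*(v) is the largest weight among v's child edges.
                values.append(max(w for w, _ in kids))
                next_level.extend(child for _, child in kids)
            else:
                values.append(leaf_weight[v])
        total += min(values)          # f(h): the smallest U* on this level
        level = next_level
    return total


children = {"r": [(1, "a"), (3, "b")], "a": [(2, "c"), (5, "d")], "b": [(4, "e")]}
leaf_weight = {"c": 7, "d": 6, "e": 9}
F = evaluate_F(children, leaf_weight, "r")
```

In this example the levels contribute 3, min(5, 4) = 4 and min(7, 6, 9) = 6, so F is 13.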
Also, let me describe the proper pruning process on T. Consider the pruning of the edges. When an edge is pruned, its sub-tree is removed. Not only that, all of its larger edges (and their sub-trees) are also removed (keep in mind, due to the sorting, only consider the larger sibling edges). Hence, the remaining tree T' is still an SWR tree and properly inherits all properties from T. Obviously, F(T') has changed (even U* and f have changed).
Therefore, the problem arises. Given an SWR tree T, how can one properly prune it to get an SWR tree T' with the maximum value of F ?
Disclaimer #3: I'm aware of the fact that the problem is like fallen from the sky and rather messy. Please feel free to reformulate it as you like. Also, just to formulate the problem itself exhausts me a bit, so I have had no handle to solve this yet.
Let's first simplify your problem definition slightly by removing the leaf weights. Now that none of the weights are negative, we can put a single child under each leaf and move each leaf's weight to its new child edge.
I can write down what seems like a pretty tight integer program that captures this problem. For each edge e, the variable x[e] is 1 if we keep the edge, 0 otherwise. The variable y[e] is 1 if e is the minimum value of the maximum sibling on its level, 0 otherwise.
maximize sum_{e} W(e) y[e]
subject to
for all e, x[e] ∈ {0, 1}
for all e, y[e] ∈ {0, 1}
for all e sibling of e' with W(e) ≤ W(e'), x[e'] − x[e] ≤ 0
for all e parent of e', x[e'] − x[e] ≤ 0
for all levels ℓ, for all e at level ℓ, for all p at level ℓ−1, y[e] + x[p] − sum_{e' child of p with W(e) ≤ W(e')} x[e'] ≤ 1
for all levels ℓ, sum_{e at level ℓ} y[e] = 1
The first two constraint groups enforce the restrictions on pruning. The next constraint group says, essentially, an edge cannot be the minimum value of the maximum sibling on its level unless each sibling group on its level has an edge at least as valuable or is totally gone. The final constraint is only needed to break ties.
This formulation can be solved as is with an integer program solver, but I strongly suspect that there's a more efficient algorithm.

An efficient solution to find if n vertex disjoint path exist

You have been given an r x c grid. The vertex in row i and column j is denoted by (i,j). Every vertex in the grid has exactly four neighbors except the boundary ones, i.e. the vertices (i,j) with i = 1, i = r, j = 1 or j = c. You are given n starting points. Determine whether there are n vertex-disjoint paths from the starting points to n boundary points.
My Solution
This can be modeled as a max-flow problem. The starting points will be sources, the boundary vertices targets, and each edge and vertex will have a capacity of 1. This can be further reduced to a generic max-flow problem by splitting each vertex in two, with an edge of capacity 1 between the two halves, and adding a supersource and a supersink connected to the sources and targets, respectively, by edges of capacity one.
After this I can simply check whether there exists a flow in each edge (s , si) where s is supersource and si is ith source in i = 1 to n. If it does then the method returns True otherwise False.
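For reference, the construction described above could look roughly like this in Python. This is only a sketch: has_escape is an invented name, cells are 0-indexed here (unlike the 1-indexed statement), and it uses plain BFS augmentation, which needs at most n augmenting paths because the flow value cannot exceed n.

```python
from collections import defaultdict, deque


def has_escape(r, c, starts):
    """True iff vertex-disjoint paths lead from every start cell to the boundary."""
    cap = defaultdict(dict)

    def add_edge(u, v):
        cap[u][v] = 1
        cap[v].setdefault(u, 0)                    # residual edge

    S, T = "source", "sink"
    for i in range(r):
        for j in range(c):
            add_edge(("in", i, j), ("out", i, j))  # vertex capacity 1
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < r and 0 <= nj < c:
                    add_edge(("out", i, j), ("in", ni, nj))
            if i in (0, r - 1) or j in (0, c - 1):
                add_edge(("out", i, j), T)         # boundary cell reaches the supersink
    for i, j in starts:
        add_edge(S, ("in", i, j))

    def augment():
        # BFS for one augmenting path in the residual graph.
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            return False
        v = T
        while parent[v] is not None:               # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        return True

    flow = 0
    while augment():
        flow += 1
    return flow == len(starts)
```

For example, the centre of a 5 x 5 grid cannot escape if all four of its neighbours are also starting points, because every route out of the centre passes through one of them.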
Problem
But using max-flow for this seems like overkill. It would take some time to preprocess the graph, and the max-flow takes about O(V * E^(1/2)).
So I was wondering if there exists a more efficient solution to compute it?

How to Check for existence of Hamiltonian walk in O(2^n) of memory and O(2^n *n) of time

We can simply modify the travelling salesman problem to determine whether a Hamiltonian walk exists in O(2^N * N^2) time.
But I read on Codeforces that it is possible to solve this problem in O(2^N * N) time.
I cannot understand the state they are considering, but they seem to be compressing the original state of the travelling salesman DP.
Can someone give me a detailed explanation? I am new to bitmasking + DP (in fact I started today :D).
If you want, you can look at Point 4 at Codeforces.
Terminology:
binary (x) means x is based 2.
Nodes numbered starting from 0
mask always represent a set of nodes. A node i is in mask means 2^i AND mask != 0. In the same way set mask-i (here - means removing i from the set) can be represented as mask XOR 2^i in bitmask.
Let mask be the bitmask of a set of nodes. We define dp[mask] as the bitmask of another set of nodes, which contains a node i if and only if:
i in mask
a hamilton walk exists for the set of nodes mask, which ends in node i
For example, dp[binary(1011)] = binary(1010) means that a hamilton walk exists for binary(1011) which ends in node 1, and another hamilton walk exists for binary(1011) which ends in node 3
So for N nodes, a hamilton walk exists for all of them if dp[2^N - 1] != 0
Then as described in the Codeforces link you posted, we can calculate dp[] by:
When mask only contains one node i
dp[2^i] = 2^i (which means that for a single node, a walk always exists, and it ends at the node itself).
Otherwise
Given mask, by definition of dp[mask], for every node i in mask, we want to know if a walk exists for mask that ends at i. To calculate this, we first check if any walk exists for the set of nodes mask-i, then check whether, among those walks of mask-i, there's a walk that ends at a node j that's connected to i, since combining them gives us a walk of mask that ends at i.
To make this step faster, we pre-process M[i] to be the bitmask of all nodes connected to i.
So i in dp[mask] if dp[mask XOR 2^i] AND M[i] != 0.
To explain a bit more about this step, dp[mask XOR 2^i] is the set of nodes at which a walk of mask-i can end, and M[i] is the set of nodes directly connected to i. So their AND is nonzero exactly when some walk of mask that ends in i exists.
dp[] is O(2^N) in space.
Calculating dp[] looks like
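# N is the number of nodes and M[i] the adjacency bitmask of node i; 2^i is written 1 << i
dp = [0] * (1 << N)
for mask in range(1, 1 << N):
    for i in range(N):
        if mask & (1 << i):  # node i is in mask
            # mask == 1 << i means that mask contains only node i
            if mask == (1 << i) or (dp[mask ^ (1 << i)] & M[i]) != 0:
                dp[mask] |= 1 << i  # add node i to dp[mask]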
Which is O(2^N * N)
[EDIT: After seeing the other answer, I realise I answered the wrong question. Maybe this info is still useful? If not, I'll delete this. Let me know in a comment.]
They give a clear statement of what each entry of the DP table will hold. It's the solution to a particular subproblem consisting of just a particular subset of vertices, with the additional constraint that the path must end at a particular vertex:
Let dp[mask][i] be the length of the shortest Hamiltonian walk in the subgraph generated by vertices in mask, that ends in the vertex i.
Every path ends at some vertex, so the solution to the original problem (or at least its length) can be found by looking for the minimum of dp[(1 << n) - 1][i] over all 0 <= i < n ((1 << n) - 1 is just a nice trick for creating a bitset with the bottommost n bits all set to 1).
The main update rule (which I've slightly paraphrased below due to formatting limitations) could maybe benefit from more explanation:
dp[mask][i] = min(dp[mask XOR (1 << i)][j] + d(j, i)) over all j such that bit(j, mask) = 1 and (j, i) is an edge
So to populate dp[mask][i] we want to solve the subproblem for the set of vertices in mask, under the constraint that the last vertex in the path is i. First, notice that any path P that goes through all the vertices in mask and ends at i must have a final edge (assuming that there are at least 2 vertices in mask). This edge will be from some non-i vertex j in mask, to i. For convenience, let k be the number of vertices in mask that have an out-edge to i. Let Q be the same path as P, but with its final edge (j, i) discarded: then the length of P is length(Q) + d(j, i). Since any path can be decomposed this way, we could break up the set of all paths through mask to i into k groups according to their final edge, find the best path in each group, and then pick the best of these k minima, and this will guarantee that we haven't overlooked any possibilities.
More formally, to find the shortest path P it would suffice to consider all k possible final edges (j, i), for each such choice finding a path Q through the remaining vertices in mask (i.e., all vertices except for i itself) that ends at j and minimises length(Q) + d(j, i), and then picking the minimum of these minima.
At first, grouping by final edge doesn't seem to help much. But notice that for a particular choice of j, any path Q that ends at j and minimises length(Q) + d(j, i) also minimises length(Q) and vice versa, since d(j, i) is just a fixed extra cost when j (and of course i) are fixed. And it so happens that we already know such a path (or at least its length, which is all we actually need): it is dp[mask XOR (1 << i)][j]! (1 << i) means "the binary integer 1 shifted left i times" -- this creates a bitset consisting of a single vertex, namely i; the XOR has the effect of removing this vertex from mask (since we already know the corresponding bit must be 1 in mask). All in all, mask XOR (1 << i) means mask \ {i} in more mathematical notation.
We still don't know which penultimate vertex j is the best, so we have to try all k of them and pick the best as before -- but finding the best path Q for each choice of j is now a simple O(1) array lookup instead of an exponential-time search :)
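Putting the update rule into code, a straightforward Python sketch might look like the following. The function name and the convention that dist[j][i] is None for a missing edge are assumptions made for this example; the running time is O(2^n * n^2).

```python
def shortest_hamiltonian_walk(n, dist):
    """dist[j][i] is the length of edge (j, i), or None when the edge is absent.
    Returns the length of the shortest Hamiltonian walk, or None if none exists."""
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0                     # a single vertex is a walk of length 0
    for mask in range(1, full + 1):
        for i in range(n):
            if not (mask >> i) & 1 or mask == (1 << i):
                continue                      # i not in mask, or base case already set
            rest = mask ^ (1 << i)            # mask \ {i}
            best = INF
            for j in range(n):                # try every penultimate vertex j
                if (rest >> j) & 1 and dist[j][i] is not None:
                    best = min(best, dp[rest][j] + dist[j][i])
            dp[mask][i] = best
    answer = min(dp[full])
    return None if answer == INF else answer


# Path graph 0 - 1 - 2 with edge lengths 1 and 2; there is no edge between 0 and 2.
path = [[None, 1, None], [1, None, 2], [None, 2, None]]
```

On this path graph the only Hamiltonian walks are 0-1-2 and its reverse, both of length 3.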

Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1

You are given a complete undirected graph with N vertices. All but K edges have a cost of A. Those K edges have a cost of B and you know them (as a list of pairs). What's the minimum cost from node 0 to node N - 1.
2 <= N <= 500k
0 <= K <= 500k
1 <= A, B <= 500k
The problem is, obviously, when those K edges cost more than the other ones and node 0 and node N - 1 are connected by a K-edge.
Dijkstra doesn't work. I've even tried something very similar with a BFS.
Step1: Let G(0) be the set of "good" adjacent nodes with node 0.
Step2: For each node in G(0):
compute G(node)
if G(node) contains N - 1
return step
else
add node to some queue
repeat step2 and increment step
The problem is that this uses up a lot of time due to the fact that for every node you have to make a loop from 0 to N - 1 in order to find the "good" adjacent nodes.
Does anyone have any better ideas? Thank you.
Edit: Here is a link from the ACM contest: http://acm.ro/prob/probleme/B.pdf
This is laborious case work:
A < B and 0 and N-1 are joined by A -> trivial.
B < A and 0 and N-1 are joined by B -> trivial.
B < A and 0 and N-1 are joined by A ->
Do BFS on graph with only K edges.
A < B and 0 and N-1 are joined by B ->
You can check in O(N) time if there is a path with length 2*A (try every vertex in the middle).
To check other path lengths, the following algorithm should do the trick:
Let X(d) be the set of nodes reachable from 0 by using d short edges. You can find X(d) as follows: take each vertex v with unknown distance and iteratively check the edges between v and the vertices of X(d-1). If you find a short edge, then v is in X(d); otherwise you have stepped on a long edge. Since there are at most K long edges, you can step on them at most K times, so you find the distance of every vertex in O(N + K) time.
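A BFS along these lines can be sketched in Python as follows. It is a variant rather than a literal transcription: it scans the still-unvisited vertices from each dequeued vertex, and a scanned vertex is either removed for good or blocked by one of the K long edges, which gives roughly the same O(N + K) bound. The function name is invented for this example.

```python
from collections import deque


def cheap_edge_distances(n, expensive_edges, source=0):
    """BFS over the complement of expensive_edges.
    Returns the number of cheap edges needed to reach each vertex (None if unreachable)."""
    blocked = [set() for _ in range(n)]
    for u, v in expensive_edges:
        blocked[u].add(v)
        blocked[v].add(u)

    dist = [None] * n
    dist[source] = 0
    unvisited = set(range(n)) - {source}
    queue = deque([source])
    while queue and unvisited:
        u = queue.popleft()
        # Every unvisited vertex not joined to u by an expensive edge is one cheap hop away.
        reached = [v for v in unvisited if v not in blocked[u]]
        for v in reached:
            unvisited.remove(v)
            dist[v] = dist[u] + 1
            queue.append(v)
    return dist


# The expensive edges leave only the cheap path 0 - 1 - 2 - 3.
d = cheap_edge_distances(4, [(0, 3), (0, 2), (1, 3)])
```

In the case where A < B and 0 and N-1 are joined by a B edge, the answer would then be the smaller of B and A * d[N-1] (when d[N-1] is not None).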
I propose a solution to a somewhat more general problem where you might have more than two types of edges and the edge weights are not bounded. For your scenario the idea is probably a bit overkill, but the implementation is quite simple, so it might be a good way to go about the problem.
You can use a segment tree to make Dijkstra more efficient. You will need the operations
set upper bound in a range as in, given U, L, R; for all x[i] with L <= i <= R, set x[i] = min(x[i], U)
find a global minimum
The upper bounds can be pushed down the tree lazily, so both can be implemented in O(log n)
When relaxing outgoing edges, look for the edges with cost B, sort them and update the ranges in between all at once.
The runtime should be O(n log n + m log m) if you sort all the edges upfront (by outgoing vertex).
EDIT: Got accepted with this approach. The good thing about it is that it avoids any kind of special casing. It's still ~80 lines of code.
In the case when A < B, I would go with kind of a BFS, where you would check where you can't reach instead of where you can. Here's the pseudocode:
G(k) is the set of nodes reachable by k cheap edges and no less. We start with G(0) = {v0}
while G(k) isn't empty and G(k) doesn't contain vN-1 and k*A < B
    cnt = array[N] of zeroes
    for every node n in G(k)
        for every expensive edge (n,m)
            cnt[m]++
    # now we have that cnt[m] == |G(k)| iff m can't be reached by a cheap edge from any of G(k)
    set G(k+1) to {m; cnt[m] < |G(k)|} except {n; n is in G(0),...G(k)}
    k++
This way you avoid iterating through the (many) cheap edges and only iterate through the relatively few expensive edges.
As you have correctly noted, the problem comes when A > B and edge from 0 to n-1 has a cost of A.
In this case you can simply delete all edges in the graph that have a cost of A. This is because an optimal route shall only have edges with cost B.
Then you can perform a simple BFS since the costs of all edges are the same. It will give you optimal performance as pointed out by this link: Finding shortest path for equal weighted graph
Moreover, you can stop your BFS when the total cost exceeds A.

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?
EDIT 1:
Here is what I have thought:
Let, r1 = current node of 1st tree
r2 = current node of 2nd tree
There are some of the cases I think we need to consider:
Case 1 : r1.data < r2.data
2 subproblems to solve:
first, check r1 and r2.left
second, check r1.right and r2
Case 2 : r1.data > r2.data
2 subproblems to solve:
- first, check r1.left and r2
- second, check r1 and r2.right
Case 3 : r1.data == r2.data
Again, 2 cases to consider here:
(a) current node is part of largest common BST
compute common subtree size rooted at r1 and r2
(b)current node is NOT part of largest common BST
2 subproblems to solve:
first, solve r1.left and r2.left
second, solve r1.right and r2.right
I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?
Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):
ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
    if (T is null):
        return 0 // leaf case
    h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
    if (first):
        // store hashes of T1's nodes in the set H
        H.insert(h)
    else:
        // check for hashes of T2's nodes in the set H containing T1's nodes
        if H.contains(h):
            ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
    return h
H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret
Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.
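A runnable variant of this idea, sketched in Python: instead of a real hash function it assigns each distinct (left id, right id, key) triple a small integer id, which sidesteps collisions entirely. The Node class and function name are assumptions for this example, and unlike the pseudocode it returns 0 rather than -1 when nothing is shared.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def largest_common_subtree_size(t1, t2):
    """Size of the largest subtree (a node plus all its descendants) found in both trees."""
    ids = {}              # (left_id, right_id, key) -> small integer id
    seen_in_t1 = set()
    best = 0

    def visit(node, first):
        nonlocal best
        if node is None:
            return 0, 0                       # (id, size) of the empty tree
        left_id, left_size = visit(node.left, first)
        right_id, right_size = visit(node.right, first)
        shape = (left_id, right_id, node.key)
        node_id = ids.setdefault(shape, len(ids) + 1)
        size = left_size + right_size + 1
        if first:
            seen_in_t1.add(node_id)           # remember every subtree of the first tree
        elif node_id in seen_in_t1:
            best = max(best, size)            # identical subtree exists in the first tree
        return node_id, size

    visit(t1, True)
    visit(t2, False)
    return best


t1 = Node(5, Node(3, Node(1), Node(4)), Node(8))
t2 = Node(6, Node(3, Node(1), Node(4)), Node(9))
```

Here the subtree rooted at 3 appears in both trees, so the result is 3. The recursion depth equals the tree height, so very deep trees would need an iterative traversal.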
Assuming there are no duplicate values in the trees:
LargestSubtree(Tree tree1, Tree tree2)
    Int bestMatch := 0
    Int bestMatchCount := 0
    For each Node n in tree1 //should iterate breadth-first
        //possible optimization: we can skip every node that is part of each subtree we find
        Node n2 := BinarySearch(tree2, n.value)
        Int matchCount := CountMatches(n, n2)
        If (matchCount > bestMatchCount)
            bestMatch := n.value
            bestMatchCount := matchCount
        End
    End
    Return ExtractSubtree(BinarySearch(tree1, bestMatch), BinarySearch(tree2, bestMatch))
End
CountMatches(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return 0
    End
    Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End
ExtractSubtree(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return nil
    End
    Node result := New Node(n1.value)
    result.left := ExtractSubtree(n1.left, n2.left)
    result.right := ExtractSubtree(n1.right, n2.right)
    Return result
End
To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.
This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.
The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.
There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know that any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).
I believe that I have an O(n + m)-time, O(n + m) space algorithm for solving this problem, assuming the trees are of size n and m, respectively. This algorithm assumes that the values in the trees are unique (that is, each element appears in each tree at most once), but they do not need to be binary search trees.
The algorithm is based on dynamic programming and works with the following intuition: suppose that we have some tree T with root r and children T1 and T2. Suppose the other tree is S. Now, suppose that we know the maximum common subtree of T1 and S and of T2 and S. Then the maximum common subtree of T and S either
Is completely contained in T1 and r.
Is completely contained in T2 and r.
Uses both T1, T2, and r.
Therefore, we can compute the maximum common subtree (I'll abbreviate this as MCS) as follows. If neither MCS(T1, S) nor MCS(T2, S) has the root of T1 or T2 as its root, then the MCS we can get from T and S is given by the larger of MCS(T1, S) and MCS(T2, S). If exactly one of MCS(T1, S) and MCS(T2, S) has the root of T1 or T2 as a root (assume w.l.o.g. that it's T1), then look up r in S. If r has the root of T1 as a child, then we can extend that tree by a node and the MCS is given by the larger of this augmented tree and MCS(T2, S). Otherwise, if both MCS(T1, S) and MCS(T2, S) have the roots of T1 and T2 as roots, then look up r in S. If it has as a child the root of T1, we can extend the tree by adding in r. If it has as a child the root of T2, we can extend that tree by adding in r. Otherwise, we just take the larger of MCS(T1, S) and MCS(T2, S).
The formal version of the algorithm is as follows:
(1) Create a new hash table mapping nodes in tree S from their value to the corresponding node in the tree. Then fill this table in with the nodes of S by doing a standard tree walk in O(m) time.
(2) Create a new hash table mapping nodes in T from their value to the size of the maximum common subtree of the tree rooted at that node and S. Note that this means that the MCS-es stored in this table must be directly rooted at the given node. Leave this table empty.
(3) Create a list of the nodes of T using a postorder traversal. This takes O(n) time. Note that this means that we will always process all of a node's children before the node itself; this is very important!
(4) For each node v in the postorder traversal, in the order they were visited:
    (4.1) Look up the corresponding node in the hash table for the nodes of S.
    (4.2) If no node was found, set the size of the MCS rooted at v to 0.
    (4.3) If a node v' was found in S:
        (4.3.1) If neither of the children of v' match the children of v, set the size of the MCS rooted at v to 1.
        (4.3.2) If exactly one of the children of v' matches a child of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the subtree rooted at that child.
        (4.3.3) If both of the children of v' match the children of v, set the size of the MCS rooted at v to 1 plus the size of the MCS of the left subtree plus the size of the MCS of the right subtree.
(5) (Note that step (4) runs in expected O(n) time, since it visits each node in T exactly once, makes O(n) hash table lookups, makes n hash table inserts, and does a constant amount of processing per node).
(6) Iterate across the hash table and return the maximum value it contains. This step takes O(n) time as well. If the hash table is empty (T has size zero), return 0.
Overall, the runtime is O(n + m) time expected and O(n + m) space for the two hash tables.
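A compact Python sketch of these steps is given below. It makes one interpretation explicit that the description leaves open: a child of v is taken to "match" a child of v' only when both sit on the same side (left with left, right with right) and carry the same value. The Node class and function names are invented for this example.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def postorder(root):
    """Return the nodes in an order where children always precede their parent."""
    stack = [root] if root else []
    out = []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(c for c in (node.left, node.right) if c)
    return reversed(out)


def largest_common_subgraph_size(t, s):
    # Value -> node table for S (values are assumed unique within each tree).
    s_nodes = {node.value: node for node in postorder(s)}
    mcs = {}     # value of a node v in T -> size of the MCS rooted at v
    for v in postorder(t):
        v2 = s_nodes.get(v.value)
        if v2 is None:
            mcs[v.value] = 0
            continue
        size = 1
        # Assumption: children are compared side by side (left/left, right/right).
        for child, child2 in ((v.left, v2.left), (v.right, v2.right)):
            if child and child2 and child.value == child2.value:
                size += mcs[child.value]
        mcs[v.value] = size
    return max(mcs.values(), default=0)


t = Node(5, Node(3, Node(1), Node(4)), Node(8))
s = Node(7, Node(5, Node(3, Node(1)), Node(9)), Node(10))
```

In this example the chain 5 -> 3 -> 1 is shared by both trees, so the result is 3.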
To see a correctness proof, we proceed by induction on the height of the tree T. As a base case, if T has height zero, then we just return zero because the loop in (4) does not add anything to the hash table. If T has height one, then its single node either exists in S or it does not. If it exists in S, then it can't have any children in T, so we execute branch 4.3.1 and record that its MCS has size one. Step (6) then reports that the MCS has size one, which is correct. If it does not exist, then we execute 4.2, putting zero into the hash table, so step (6) reports that the MCS has size zero as expected.
For the inductive step, assume that the algorithm works for all trees of height k' < k and consider a tree of height k. During our postorder walk of T, we will visit all of the nodes in the left subtree, then in the right subtree, and finally the root of T. By the inductive hypothesis, the table of MCS values will be filled in correctly for the left subtree and right subtree, since they have height ≤ k - 1 < k. Now consider what happens when we process the root. If the root doesn't appear in the tree S, then we put a zero into the table, and step (6) will pick the largest MCS value of some subtree of T, which must be fully contained in either its left subtree or right subtree. If the root appears in S, then we compute the size of the MCS rooted at the root of T by trying to link it with the MCS-es of its two children, which (inductively!) we've computed correctly.
Whew! That was an awesome problem. I hope this solution is correct!
EDIT: As was noted by @jonderry, this will find the largest common subgraph of the two trees, not the largest common complete subtree. However, you can restrict the algorithm to only work on subtrees quite easily. To do so, you would modify the inner code of the algorithm so that it records a subtree of size 0 if both subtrees aren't present with nonzero size. A similar inductive argument will show that this will find the largest complete subtree.
Though, admittedly, I like the "largest common subgraph" problem a lot more. :-)
The following algorithm computes all the largest common subtrees of two binary trees (with no assumption that it is a binary search tree). Let S and T be two binary trees. The algorithm works from the bottom of the trees up, starting at the leaves. We start by identifying leaves with the same value. Then consider their parents and identify nodes with the same children. More generally, at each iteration, we identify nodes provided they have the same value and their children are isomorphic (or isomorphic after swapping the left and right children). This algorithm terminates with the collection of all pairs of maximal subtrees in T and S.
Here is a more detailed description:
Let S and T be two binary trees. For simplicity, we may assume that for each node n, the left child has value <= the right child. If exactly one child of a node n is NULL, we assume the right node is NULL. (In general, we consider two subtrees isomorphic if they are up to permutation of the left/right children for each node.)
(1) Find all leaf nodes in each tree.
(2) Define a bipartite graph B with edges from nodes in S to nodes in T, initially with no edges. Let R(S) and R(T) be empty sets. Let R(S)_next and R(T)_next also be empty sets.
(3) For each leaf node in S and each leaf node in T, create an edge in B if the nodes have the same value. For each edge created from nodeS in S to nodeT in T, add all the parents of nodeS to the set R(S) and all the parents of nodeT to the set R(T).
(4) For each node nodeS in R(S) and each node nodeT in R(T), draw an edge between them in B if they have the same value AND
{
(i): nodeS->left is connected to nodeT->left and nodeS->right is connected to nodeT->right, OR
(ii): nodeS->left is connected to nodeT->right and nodeS->right is connected to nodeT->left, OR
(iii): nodeS->left is connected to nodeT->left and nodeS->right == NULL and nodeT->right == NULL
}
(5) For each edge created in step (4), add their parents to R(S)_next and R(T)_next.
(6) If (R(S)_next) is nonempty {
(i) swap R(S) and R(S)_next and swap R(T) and R(T)_next.
(ii) Empty the contents of R(S)_next and R(T)_next.
(iii) Return to step (4).
}
When this algorithm terminates, R(S) and R(T) contain the roots of all maximal subtrees in S and T. Furthermore, the bipartite graph B identifies all pairs of nodes in S and nodes in T that give isomorphic subtrees.
I believe this algorithm has complexity O(n log n), where n is the total number of nodes in S and T, since the sets R(S) and R(T) can be stored in BSTs ordered by value; however, I would be interested to see a proof.

Resources