Proving that one binary tree is a subtree of another - algorithm

Assume you have two binary trees and you want to know whether one is a subtree of the other. One solution is to get the inorder and preorder traversals of both trees and check whether the traversals of the candidate subtree are substrings of the corresponding traversal for the other tree. I read several posts about this posts about this solution. One discussion shows that inorder AND preorder traversal are both necessary. Can someone explain why they are sufficient? Why is the case that if the inorder and preorder traversal of tree2 are substrings of those of tree1, then tree2 is a subtree of tree1?

Q: One discussion shows that inorder AND preorder traversal are both
necessary. Can someone explain why they are sufficient?
Because of the simple fact that it is possible to uniquely reconstruct a binary tree from these two traversals (or inorder and postorder, as well). Check this example:
Inorder : [1,2,3,4,5,6]
Preorder : [4,2,1,3,5,6]
From preorder, you know that 4 is the root of the tree. From inorder, you can determine the left and right subtree, and you proceed recursively from this point:
4
/ \
Left subtree Right subtree
Inorder : [1,2,3] Inorder : [5,6]
Preorder: [2,1,3] Preorder: [5,6]
Check for more details in this excellent article:
Reconstructing binary trees from tree traversal. Since these two serializations (traversals actually serialize tree to a string) of the tree combined together have to be unique for a binary tree, we get that one tree is a subtree of another if and only if these traversals are substrings of other two serializations.

People agreed that binary tree can represent the order on it's nodes by left/right relation. That means that left part comes before right part. You may call trees equivalent if the order is the same. So in-order string represents the the order and if you want to check the equivalence, then it is sufficient to check only in-order (by definition).
But when you want to check the full equality of trees then we have to find the way how we can distinguish equivalent trees.For example it can be level-order check. But for subtrees level order doesn't fit, because the level order string for subtree is split. For pre-order you walk the subtree form root before other parts of tree.
Suppose equivalent trees are not equal, then traversing in pre-order everything will be equal until first differ. 2 situations can happen.
1) The value of node of one tree differs from another. That means that pre-order strings differs, because you walk the tree in pre-order.
2) Children signature (no children, only left, only right, both children) differs. But in this situation easy to understand that the in-order will change and trees are not equivalent, which contradicts the conditions.
Note that this works only when all the nodes are unique. If you have all nodes of value like "a" then no matter how you walk, your string is always "aa...a". So you have to distinguish the nodes somehow, not only by "value".

Related

Proof that a unique BST can be reconstructed from a preorder (or a postorder) traversal unambiguously

For a Binary SEARCH Tree, a preorder or a postorder traversal is sufficient to reconstruct its original binary search tree unambigiously.
But I cannot find a proof or a explanation for why it is so.
For inorder traversal, it is trivial to come up with a counter-example to show that there may be many different BSTs correspond to a given inorder traversal.
Is there any proof or reference that a preorder or a postorder traversal is sufficient to reconstruct its original BST unambiguously?
This is for a BST, not for a general binary tree.
By the definition of pre-order traversal, it is guaranteed that root will be the first element.
Now, given the root, there is a unique set of elements that will come in the left half of the root - i.e. all elements having value less that root->val.
Likewise for the right subtree.
Now, we have to apply the above logic recursively for the left subtree and the right subtree.
Since, post-order traversal is just the reverse of pre-order traversal on the reverse tree, the same logic applies to post-order traversal too.

checking subtrees using preorder and inorder strings

A book I'm reading claims that one way to check whether a binary tree B is a subtree of a binary tree A is to build the inorder and preorder strings (strings that represent the inorder and preorder traversal of each tree) of both trees, and check whether inorder_B is a substring of inorder_A and preorder_B is a substring of preorder_A. Note that it claims that you have to check substring match on both the inorder and preorder strings.
Is it really necessary to check for a substring match on both the inorder and preorder strings? Wouldn't it suffice to check either? Could someone provide an example to prove me wrong (i.e. prove the claim in the book right)? I couldn't come up with an example where two trees were unequal but the preorder or inorder strings match.
Consider the two two node trees with A and B as nodes. Tree one has B as root and A as left child. Tree two has A as root and B as right child. The inorder traversals match but the trees differ.
I think you need both if the tree is not a binary search tree but a plain binary tree. Any set of nodes can be a preorder notation. Suppose there is a binary tree a,b,c,d,e,f,g,h and your subtree is cdef. You can create another tree with subtree cde and another subtree fg. There is no way to know the difference.
If it is a binary search tree however you needn't have the inorder.
Incidentally here's an amusing algorithmic problem: given a preorder notation find the number of binary trees that satisfy it.
As supplement to user1952500's answer: if it is a binary search tree, either only preorder or only postorder can make it unique, while only inorder can not. For example:
5
/ \
3 6
inorder: 3-5-6
however, another binary search tree can have the same inorder:
3
\
5
\
6
Also, I believe preorder+inorder+string_comparison only works to check whether two trees are identical. It fails to check whether a tree is the subtree of another tree. To see an example, refer
Determine if a binary tree is subtree of another binary tree using pre-order and in-order strings
a preorder traversal with sentinel to represent null node is well enough.
we can use this approach to serialize/deserialize binary tree. it means there is one-to-one mapping between a binary tree and its preorder with sentinel representation.

How many traversals need to be known to construct a BST

I am very confused by a number of articles at different sites regarding constructing a Binary Search Tree from any one traversal (pre,post or in-order), or a combination of any two of them. For example, at this page, it says that given the pre,post or level order traversal, along with the in-order traversal, one can construct the BST. But here and there, they show us to construct a BST from pre-order alone. Also, here they show us how to construct the BST from given pre and post-order traversals. In some other site, I found a solution for constructing a BST from the post-order traversal only.
Now I know that given the inorder and pre-order traversals, it is possible to uniquely form a BST. As regards the first link I provided, although they say that we can't construct the BST from pre-order and post-order, can't I just sort the post-order array to get its inorder traversal, and then use that and the pre-order array to form the BST? Will that be same as the solution in the 4th link, or different? And given pre-order only, I can sort that to get the in-order, then use that and the pre-order to get the BST. Again, does that have to be different from the solution at links 2 and 3?
Specifically, what is sufficient to uniquely generate the BST? If uniquement is not required, then I can simply sort it to get the in-order traversal, and build one of the N possible BSTs from it recursively.
To construct a BST you need only one (not in-order) traversal.
In general, to build a binary tree you are going to need two traversals, in order and pre-order for example. However, for the special case of BST - the in-order traversal is always the sorted array containing the elements, so you can always reconstruct it and use an algorithm to reconstruct a generic tree from pre-order and in-order traversals.
So, the information that the tree is a BST, along with the elements in it (even unordered) are equivalent to an in-order traversal.
Bonus: why is one traversal not enough for a general tree, (without the information it is a BST)?
Answer: Let's assume we have n distinct elements. There are n! possible lists to these n elements, however - the possible number of trees is much larger (2 * n! possible trees for the n elements are all decayed trees, such that node.right = null in every node, thus the tree is actually a list to the right. There are n! such trees, and another n! trees where always node.left = null ) Thus, from pigeon hole principle - there is at least one list that generates 2 trees, thus we cannot reconstruct the tree from a single traversal.
(QED)
If the values for the nodes of the BST are given then only one traversal is enough because the rest of the data is provided by the values of the nodes. But if the values are unknown then, as per my understanding, constructing a unique BST from a single traversal is not possible. However, I am open to suggestions.

Tree traversal and serialization

I am trying to get straight in my head how tree traversals can be used to uniquely identify a tree, and the crux of it seems to be whether the tree is a vanilla Binary Tree (BT), or if it also has the stricter stipulation of being a Binary Search Tree (BST). This article seems to indicate that for BT's, a single inorder, preorder and postorder traversal will not uniquely identify a tree (uniquely means structure and values of keys in this context). Here is a quick summary of the article:
BTs
1. We can uniquely reconstruct a BT with preorder + inorder and postorder + inorder.
2. We can also use preorder + postorder if we also stipulate that the traversals keeps track of the null children of a node.
(an open question (for me) is if the above is still true if the BT can have non-unique elements)
BSTs
3. We cannot use inorder for a unique id. We need inorder + preorder, or inorder + postorder.
Now, (finally) my question is, can we use just pre-order or just post-order to uniquely identify a BST? I think that we can, since this question and
answer
seems to say yes, we can use preorder, but any input much appreciated.
I can't tell what's being asked here. Any binary tree, whether it's ordered or not, can be serialized by writing out a sequence of operations needed to reconstruct the tree. Imagine a simple stack machine with just two instructions:
Push an empty tree (or NULL pointer if you like) onto the stack
Allocate a new internal node N, stuff a value into N, pop the top two trees off the stack and make them N's left and right children, and finally push N onto the stack.
Any binary tree can be serialized as a "program" for such a machine.
The serialization algorithm uses a postorder traversal.
Okay, you can use preorder only to identify a tree. This is possible because only in preorder traversal does the id-of-current-node comes before the ids of children. So you can read the traversal output root-to-leaves.
You can check http://en.wikipedia.org/wiki/Tree_traversal#Pre-order to confirm
So you can consider a preorder traversal as a list of insertions into a tree. Because the tree insertion into BST is deterministic, when you insert a list of values into an empty tree, you always get the same tree.

Postorder Traversal

In-order tree traversal obviously has application; getting the contents in order.
Preorder traversal seems really useful for creating a copy of the tree.
Is there a common use for postorder traversal of a binary tree?
Let me add another one:
Postorder traversal is also useful in deleting a tree. In order to free up allocated memory of all nodes in a tree, the nodes must be deleted in the order where the current node can only be deleted when both of its left and right subtrees are deleted.
Postorder does exactly just that. It processes both of the left and right subtrees before processing the current node.
If the tree represents a mathematical expression, then to evaluate the expression, a post-order traversal is necessary.
Yes. Postorder is sometimes used to translate mathematical expressions between different notations.
It can also generate a postfix representation of a binary tree.

Resources