Algorithm to identify if a tree is a subtree of another tree

I am reading Cracking the Coding Interview, and I have a question about the solution to the following problem:
You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.
The simple solution it suggests is to create strings representing the in-order and pre-order traversals and check whether T2's pre-order/in-order traversal is a substring of T1's pre-order/in-order traversal.
What I wonder is: why do we need to compare both traversals? And why exactly those two traversals, and not, for example, in-order and post-order? And mainly, wouldn't a single traversal be enough, say only the in-order or only the pre-order traversal?

One traversal isn't enough. Consider the trees 1->2->3 (a chain) and 2<-1->3 (a root with two children). If you start at node 1 and do a pre-order traversal, you encounter the nodes in the order 1, 2, 3 in both cases. So if you simply create a string showing the pre-order, the two give the same result: 1,2,3
On the other hand, if you use a post-order, the two give different results: 3,2,1 and 2,3,1
I bet that for any single ordering you can find two different trees with the same result.
So the question you need to answer for yourself, for any other pair you want to look at, is: are there two different trees that give the same result for both traversals? I'm going to leave that as something to think about and come back later to see if you've got it.
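To make the collision concrete, here is a minimal Python sketch. The `Node` class and the function names are illustrative assumptions, and the chain is built with left children only (the original notation does not say which side).

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this example)."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def preorder(node):
    """Root first, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.val] + preorder(node.left) + preorder(node.right)

def postorder(node):
    """Left subtree first, then the right subtree, then the root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.val]

chain = Node(1, left=Node(2, left=Node(3)))   # 1 -> 2 -> 3
fork = Node(1, left=Node(2), right=Node(3))   # 2 <- 1 -> 3

print(preorder(chain), preorder(fork))    # identical lists
print(postorder(chain), postorder(fork))  # different lists
```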

I think a pre-order traversal with a sentinel to represent each null node is enough.
We can use this approach to serialize/deserialize a binary tree, which means there is a one-to-one mapping between a binary tree and its pre-order-plus-sentinel representation.
After we get the strings for both the small tree and the big tree, we do a string match using the KMP algorithm.
I know people say that we have to use both pre-order and in-order (or post-order and in-order), but most of them just follow what others are saying rather than thinking independently.
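The pre-order-plus-sentinel idea can be sketched roughly as follows. The names `Node`, `serialize` and `is_subtree`, the `#` sentinel and the comma delimiters are all assumptions made for this example. Python's built-in substring search stands in for KMP here, and the recursive walk would have to be made iterative for a tree with millions of nodes.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this example)."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def serialize(root):
    """Pre-order serialization in which '#' marks every missing child."""
    parts = []

    def walk(node):
        if node is None:
            parts.append("#")
            return
        parts.append(str(node.val))
        walk(node.left)
        walk(node.right)

    walk(root)
    # Commas on both ends delimit every token, so "2" cannot match inside "12".
    return "," + ",".join(parts) + ","

def is_subtree(big, small):
    """True if `small` occurs as a complete subtree of `big`."""
    return serialize(small) in serialize(big)

t1 = Node(1, Node(2, Node(4), Node(5)), Node(3))
t2 = Node(2, Node(4), Node(5))   # complete subtree of t1
t3 = Node(2, Node(4))            # same root value, but node 5 is missing

print(is_subtree(t1, t2), is_subtree(t1, t3))
```

The sentinels are what make a single traversal sufficient in this sketch: `t3` is rejected because its serialization records that node 2 has no right child.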

Related

Use case of different traversal order in binary tree

There are pre-order, in-order and post-order traversals for a binary tree, but whatever the order, each just traverses the tree to find a matching path. Is there any use case where I have to use a particular order? Or are they just different ways of doing the same thing, with no difference in practical usage? Thanks.
These traversals have definite practical uses.
A few specific use cases:
In-order traversal gives you the node values in sorted order (for a binary search tree), if your requirement needs sorted information.
Pre-order traversal can be used to create a copy of the tree and to get the prefix expression of an expression tree.
Post-order traversal is used to delete the tree and to get the postfix expression of an expression tree.
Choose the traversal based on which nodes must be fetched first for the requirement or design at hand. If roots must be processed before leaf nodes, pre-order traversal is helpful. If leaf nodes must be processed before root nodes, post-order is helpful.
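The three use cases above can be illustrated with a short Python sketch. All names here (`Node`, `inorder`, `copy_tree`, `visit_postorder`) are made up for the example, and the "delete" case is simulated by recording the order in which nodes would be released.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this example)."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def inorder(node):
    """Left subtree, node, right subtree: sorted output for a BST."""
    if node is not None:
        yield from inorder(node.left)
        yield node.val
        yield from inorder(node.right)

def copy_tree(node):
    """Pre-order copy: the new root is created before its children."""
    if node is None:
        return None
    clone = Node(node.val)
    clone.left = copy_tree(node.left)
    clone.right = copy_tree(node.right)
    return clone

def visit_postorder(node, visit):
    """Post-order: both children are handled before their parent."""
    if node is not None:
        visit_postorder(node.left, visit)
        visit_postorder(node.right, visit)
        visit(node.val)

bst = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5)))
print(list(inorder(bst)))             # values in sorted order

released = []
visit_postorder(bst, released.append)
print(released)                       # leaves first, root last
```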

Proving that one binary tree is a subtree of another

Assume you have two binary trees and you want to know whether one is a subtree of the other. One solution is to get the inorder and preorder traversals of both trees and check whether the traversals of the candidate subtree are substrings of the corresponding traversals of the other tree. I read several posts about this solution. One discussion shows that inorder AND preorder traversal are both necessary. Can someone explain why they are sufficient? Why is it the case that if the inorder and preorder traversals of tree2 are substrings of those of tree1, then tree2 is a subtree of tree1?
Q: One discussion shows that inorder AND preorder traversal are both necessary. Can someone explain why they are sufficient?
Because of the simple fact that it is possible to uniquely reconstruct a binary tree from these two traversals (or inorder and postorder, as well). Check this example:
Inorder : [1,2,3,4,5,6]
Preorder : [4,2,1,3,5,6]
From preorder, you know that 4 is the root of the tree. From inorder, you can determine the left and right subtree, and you proceed recursively from this point:
                   4
                 /   \
Left subtree            Right subtree
Inorder : [1,2,3]       Inorder : [5,6]
Preorder: [2,1,3]       Preorder: [5,6]
For more details, see this excellent article: Reconstructing binary trees from tree traversal.
Since these two serializations of the tree (traversals actually serialize a tree to a string), taken together, are unique for a binary tree, we get that one tree is a subtree of another if and only if these traversals are substrings of the other tree's two serializations.
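The recursive reconstruction described above can be sketched in Python as follows. The function name `build` and the tuple representation `(value, left, right)` are assumptions for this example, and the sketch relies on all node values being distinct.

```python
def build(preorder, inorder):
    """Rebuild a binary tree from its pre-order and in-order value lists.

    Returns nested tuples (value, left, right), or None for an empty tree.
    Assumes all values are distinct.
    """
    if not preorder:
        return None
    root_val = preorder[0]            # pre-order always starts with the root
    k = inorder.index(root_val)       # k values lie to the left of the root
    left = build(preorder[1:1 + k], inorder[:k])
    right = build(preorder[1 + k:], inorder[k + 1:])
    return (root_val, left, right)

tree = build([4, 2, 1, 3, 5, 6], [1, 2, 3, 4, 5, 6])
print(tree)
```

With the example lists from this answer, the root is 4, the left subtree is rooted at 2 with children 1 and 3, and the right subtree is 5 with the single right child 6.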
People agree that a binary tree can represent an order on its nodes through the left/right relation: the left part comes before the right part. You may call two trees equivalent if this order is the same. The in-order string represents that order, so if you want to check equivalence, it is sufficient to check only the in-order string (by definition).
But when you want to check full equality of trees, you have to find a way to distinguish equivalent trees. One option would be a level-order check. For subtrees, however, level order doesn't fit, because the level-order string of a subtree is split up within the level-order string of the whole tree. With pre-order, you walk the subtree from its root before the other parts of the tree.
Suppose two equivalent trees are not equal. Then, traversing them in pre-order, everything will be equal until the first difference. Two situations can happen:
1) The value of a node in one tree differs from the value in the other. Then the pre-order strings differ, because you walk the trees in pre-order.
2) The children signature (no children, only left, only right, both children) differs. But in this situation it is easy to see that the in-order string changes, so the trees are not equivalent, which contradicts the assumption.
Note that this works only when all node values are unique. If every node has the same value, such as "a", then no matter how you walk, your string is always "aa...a". So you have to distinguish the nodes somehow, not only by "value".

How many traversals need to be known to construct a BST

I am very confused by a number of articles on different sites regarding constructing a Binary Search Tree from any one traversal (pre-, post- or in-order), or from a combination of any two of them. For example, this page says that given the pre-, post- or level-order traversal, along with the in-order traversal, one can construct the BST. But here and there, they show how to construct a BST from pre-order alone. Also, here they show how to construct the BST from given pre- and post-order traversals. On another site, I found a solution for constructing a BST from the post-order traversal only.
Now I know that given the in-order and pre-order traversals, it is possible to uniquely form a BST. As regards the first link I provided, although they say that we can't construct the BST from pre-order and post-order, can't I just sort the post-order array to get its in-order traversal, and then use that and the pre-order array to form the BST? Will that be the same as the solution in the 4th link, or different? And given pre-order only, I can sort that to get the in-order, then use that and the pre-order to get the BST. Again, does that have to be different from the solution at links 2 and 3?
Specifically, what is sufficient to uniquely generate the BST? If uniqueness is not required, then I can simply sort it to get the in-order traversal, and build one of the N possible BSTs from it recursively.
To construct a BST you need only one traversal, as long as it is not the in-order one.
In general, to build a binary tree you are going to need two traversals, in-order and pre-order for example. However, for the special case of a BST, the in-order traversal is always the sorted array containing the elements, so you can always recover it by sorting and then use an algorithm that reconstructs a generic tree from its pre-order and in-order traversals.
So, the information that the tree is a BST, along with the elements in it (even unordered), is equivalent to an in-order traversal.
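As a rough Python sketch of that argument: sort the pre-order list to obtain the in-order list, then run the generic two-traversal reconstruction. The names `build` and `bst_from_preorder` and the tuple representation `(value, left, right)` are assumptions for this example, and distinct values are assumed.

```python
def build(preorder, inorder):
    """Generic reconstruction from pre-order and in-order (distinct values)."""
    if not preorder:
        return None
    root_val = preorder[0]
    k = inorder.index(root_val)       # number of values in the left subtree
    left = build(preorder[1:1 + k], inorder[:k])
    right = build(preorder[1 + k:], inorder[k + 1:])
    return (root_val, left, right)

def bst_from_preorder(preorder):
    """For a BST, the in-order traversal is just the sorted values."""
    return build(preorder, sorted(preorder))

tree = bst_from_preorder([4, 2, 1, 3, 6, 5])
print(tree)
```

Faster approaches exist that avoid the sort, but this version mirrors the reasoning in the answer most directly.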
Bonus: why is one traversal not enough for a general tree (without the information that it is a BST)?
Answer: Let's assume we have n distinct elements, with n >= 2. There are n! possible lists of these n elements. However, the number of possible trees is much larger. Counting only degenerate trees already gives 2 * n!: there are n! trees where node.right = null in every node, so that the tree is actually a list going to the left, and another n! trees where node.left = null in every node. Thus, by the pigeonhole principle, there is at least one list that is generated by 2 different trees, so we cannot reconstruct the tree from a single traversal.
(QED)
If the values of the nodes of the BST are given, then a single traversal is enough, because the rest of the information is provided by the values themselves. But if the values are unknown then, as far as I understand, constructing a unique BST from a single traversal is not possible. However, I am open to suggestions.

Can a non-binary tree be traversed in order?

We are dealing with a most-similar-neighbour algorithm here. Part of the algorithm involves searching in order over a tree.
The thing is that, so far, we can't make that tree binary.
Is there an analog to in-order traversal for non-binary trees? I think there is: just traverse the nodes from left to right (and process the parent node only once?).
Any thoughts?
Update
Each node of this tree will hold a small graph of n objects. Each node will have n children (one per element in the graph), each of which will be another graph. So it's "kind of" a B-tree, without all the overflow/underflow mechanics. So I guess the closest equivalent would be something like a B-tree in-order traversal?
Thanks in advance.
Yes, but you need to define what the order is. Post-order and pre-order work the same way as for binary trees, but in-order requires a definition of how the branches are ordered relative to the contents of the node.
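One possible definition, sketched in Python under assumptions of my own: each node holds a list of keys and a list of children, and the traversal visits child i before key i, in the style of a B-tree in-order walk. The `MultiNode` class and the rule for leftover children are hypothetical and would need to be adapted to the actual tree.

```python
class MultiNode:
    """Node with several keys and several children (hypothetical layout)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def inorder(node):
    """Visit child i, then key i, for each i; leftover children come last."""
    for i, key in enumerate(node.keys):
        if i < len(node.children):
            yield from inorder(node.children[i])
        yield key
    # A B-tree node has one more child than keys; visit any that remain.
    for child in node.children[len(node.keys):]:
        yield from inorder(child)

root = MultiNode(
    [10, 20],
    [MultiNode([1, 5]), MultiNode([12, 15]), MultiNode([25, 30])],
)
print(list(inorder(root)))
```

With n keys and n children per node, as described in the question's update, the second loop simply does nothing.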
There is no simple analog of the in-order sequence for trees other than binary trees (actually in-order is a way to get sorted elements from a binary search tree).
You can find more detail in "The Art of Computer Programming" by Knuth, vol. 1, page 336.
If breadth-first search can serve your purpose then you can use that.

Postorder Traversal

In-order tree traversal obviously has application; getting the contents in order.
Preorder traversal seems really useful for creating a copy of the tree.
Is there a common use for postorder traversal of a binary tree?
Let me add another one:
Post-order traversal is also useful for deleting a tree. To free the allocated memory of all nodes in a tree, the nodes must be deleted in an order where the current node is deleted only after both its left and right subtrees have been deleted.
Post-order does exactly that: it processes both the left and the right subtree before processing the current node.
If the tree represents a mathematical expression, then to evaluate the expression, a post-order traversal is necessary.
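A minimal Python sketch of evaluating an expression tree in post-order. The `Node` class, the `OPS` table and the convention that leaves hold numbers while inner nodes hold operator symbols are assumptions for this example.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

class Node:
    """Expression tree node: leaves hold numbers, inner nodes hold operators."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def evaluate(node):
    """Post-order evaluation: both operands are computed before the operator."""
    if node.left is None and node.right is None:
        return node.val
    left_value = evaluate(node.left)
    right_value = evaluate(node.right)
    return OPS[node.val](left_value, right_value)

def postfix(node):
    """Post-order listing of the tokens, i.e. the postfix form."""
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [str(node.val)]

expr = Node("*", Node("+", Node(2), Node(3)), Node(4))   # (2 + 3) * 4
print(evaluate(expr))
print(" ".join(postfix(expr)))
```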
Yes. Postorder is sometimes used to translate mathematical expressions between different notations.
It can also generate a postfix representation of a binary tree.

Resources