Running time to check if a binary tree is subtree of another binary tree - algorithm

I've come across a naive solution for the problem of checking if a binary tree is subtree of another binary tree:
Given two binary trees, check if the first tree is subtree of the second one. A subtree of a tree T is a tree S consisting of a node in T and all of its descendants in T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree.
For example, in the following case, tree 2 (S) is a subtree of tree 1 (T):
Tree 2
    10
   /  \
  4    6
   \
   30

Tree 1
        26
       /  \
     10    3
    /  \    \
   4    6    3
    \
    30
The solution is to traverse the tree T in preorder fashion. For every visited node in the traversal, see if the subtree rooted with this node is identical to S.
It is said in the post that the algorithm has a worst-case running time of O(n^2), or more precisely O(m*n), where m and n are the sizes of the two trees involved.
The point of confusion here is that, if we are traversing both trees at the same time, it would seem that in the worst case you would simply have to recurse through all of the nodes in the larger tree to find the subtree. So how could this version of the algorithm have a quadratic running time?

Well, basically the isSubTree() function only traverses the T tree (the main one, not the subtree). It does nothing with S on its own, so in the worst case this function is executed once for every node in T. However, each execution calls areIdentical(T, S), which in the worst case has to fully traverse one of the trees it is given (until one of them runs out of nodes).
The trees passed to the areIdentical() function do get smaller and smaller, but that doesn't change the asymptotic time complexity. Either way this gives you O(n^2), or O(n*m), where n and m are the numbers of nodes in the two trees.
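To make the two roles concrete, here is a minimal Python sketch of the naive algorithm (the class and function names are my own, not from the post):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def are_identical(a, b):
    """Walk both trees in lockstep: O(min(|a|, |b|))."""
    if a is None or b is None:
        return a is b
    return (a.val == b.val
            and are_identical(a.left, b.left)
            and are_identical(a.right, b.right))

def is_subtree(t, s):
    """Try every node of T as a candidate root for S:
    O(n) candidates, each compared in up to O(m) time -> O(n*m) worst case."""
    if s is None:
        return True
    if t is None:
        return False
    return (are_identical(t, s)
            or is_subtree(t.left, s)
            or is_subtree(t.right, s))
```

The outer recursion supplies the O(n) factor and each comparison supplies the O(m) factor, which is exactly where the quadratic worst case comes from.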

To solve reasonably optimally, flatten the two trees. Using Lisp notation,
we get
(10 (4 (30)) (6))
and
(26 (10 (4 (30)) (6)) (3 (3)))
So the subtree is a substring of the parent's serialization. Using strstr we can
usually finish in O(N) time, though it may take a bit longer if there are lots
and lots of near-matching subtrees. If you need to do many searches, you can
build a suffix tree, which gets a query down to O(M) time, where M is the size
of the subtree.
But the worst-case runtime doesn't actually improve. It's essentially the same
algorithm, and it will still show N*M behaviour if, for example, all the trees
have the same node id and structure, except for the last right child of the
query subtree. It's just that the individual operations become a lot faster.
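A sketch of the flattening idea in Python, with one caveat: for substring matching to be sound, empty children need explicit markers, otherwise a tree with only a left child and a tree with only a right child can serialize identically (the names here are mine):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def serialize(node):
    """Parenthesised serialization; '()' marks an empty child so that
    differently shaped trees can never share a serialization."""
    if node is None:
        return "()"
    return "(%s %s %s)" % (node.val, serialize(node.left), serialize(node.right))

def is_subtree_flat(t, s):
    # Python's `in` plays the role of strstr here.
    return serialize(s) in serialize(t)
```

Without the explicit "()" markers, `4` with a left child `30` and `4` with a right child `30` would flatten to the same string, giving false positives.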


Time complexity of a tree related problem

I am struggling to figure out the time complexity of the following problem (this is not homework, just something I came up with and can't understand).
Suppose you have an arbitrary tree. The algorithm is such that for every node in the tree you have to run some O(1) operation as many times as that node's number of leaf descendants. So, in the example tree below, we would run 2 operations for node A and 6 operations for the root node R.
Let's say you have n nodes, the tree is of depth d, and you may use any other notation necessary. What is the complexity?
I can't quite wrap my head around this. Surely it is less than O(n^2) but how do I approach this? Thank you!
Edit: leaf descendant of a node is a descendant that does not have any children. A descendant is a node reachable by repeated proceeding from parent to child (doesn't matter if it's an internal or a leaf node)
It's Θ(n^2). Obviously, as you noted, it's in O(n^2) because each node must have fewer than n descendant leaves.
In a tree with a construction like this:
  A
 / \
B   C
   / \
  D   E
     / \
    F   G
        ...
The top-most n/4 internal nodes have at least n/4 descendant leaves, so the total number of operations is at least n^2/16, which is in Ω(n^2).
If you have a depth limit d, then each node can have at most d ancestors, so you get O(n*min(d,n)), which is also tight by a similar construction.
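If you want to sanity-check the Ω(n^2) bound numerically, here is a sketch that counts the operations on a caterpillar tree of the shape shown above (the helper names are mine):

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def analyse(node):
    """Return (leaf_descendants, total_operations) for the subtree.
    A leaf has no descendants, so it contributes 0 operations itself."""
    if node is None:
        return 0, 0
    if node.left is None and node.right is None:
        return 1, 0
    ll, lo = analyse(node.left)
    rl, ro = analyse(node.right)
    leaves = ll + rl
    return leaves, lo + ro + leaves   # this node runs `leaves` operations

def caterpillar(k):
    """k internal nodes, each with one leaf child; 2k + 1 nodes in total."""
    node = Node(Node(), Node())
    for _ in range(k - 1):
        node = Node(Node(), node)
    return node
```

On the caterpillar, the internal nodes see roughly k, k-1, ..., 2 leaf descendants, so the total operation count grows quadratically in n = 2k + 1.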
I think it will be O(2(N - Leaf) + Leaf), where Leaf is the number of leaves of the tree: O(2(N - Leaf)) is required to iterate over the tree to find the leaf descendants, and an O(1) operation needs to be performed on each of them.

runtime to find middle element using AVL tree

One of my lecture slides says the following:
To find the middle element in an AVL tree, I traverse the elements in order until I reach the middle element. This takes O(N).
If I understand correctly, finding an element in a tree structure takes O(log n) (base 2), since an AVL tree is a binary tree that always divides into two children.
But why does it say O(N)?
I am just trying to elaborate 'A. Mashreghi' comment.
Since the tree under consideration is an AVL tree, the guaranteed O(log n) lookup holds as long as you have the element (key) to find.
The problem is that you are trying to identify the middle element of the given data structure. As it is an AVL tree (a self-balancing BST), an in-order traversal gives you the elements in ascending order. You want to use this property to find the middle element.
The algorithm goes like this: keep a counter that increments for every node traversed in order, and return the node at position n/2. This sums to O(n/2), and hence the overall complexity is O(n).
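The counter-based in-order walk described above might look like this in Python (a sketch; the names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def count(node):
    return 0 if node is None else 1 + count(node.left) + count(node.right)

def middle_element(root):
    """First pass counts the n nodes; second pass walks in order with a
    counter and stops at position n // 2. Both passes are O(n)."""
    n = count(root)
    target = n // 2
    stack, node, seen = [], root, 0
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        if seen == target:
            return node.key
        seen += 1
        node = node.right
```

Note that counting n already forces a visit to every node, which is one way to see why O(log n) is out of reach for this query without extra bookkeeping.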
Being divided into 2 children does not guarantee perfect symmetry. For instance, consider the most unbalanced of all balanced binary trees: each right child has a depth one more than its corresponding left child.
In such a tree, the middle element will be somewhere down in the right branch's left branch's ...
You need to determine how many nodes N you have, then locate the N/2th largest node. This is not an O(log N) process.

Why the Red Black Tree is kept unbalanced after insertion?

Here is a red-black tree which seems unbalanced. If this is the case, can someone please explain why it is unbalanced?
The term "balanced" is a bit ambiguous, since different kinds of balanced trees have different constraints.
A red-black tree ensures that every path to a leaf has the same number of black nodes, and at least as many black nodes as red nodes. The result is that the longest path is at most twice as long as the shortest path, which is good enough to guarantee O(log N) time for search, insert, and delete operations.
Most other kinds of balanced trees have tighter balancing constraints. An AVL tree, for example, ensures that the lengths of the longest paths on either side of every node differ by at most 1. This is more than you need, and that has costs -- inserting or deleting in an AVL tree (after finding the target node) takes O(log N) operations on average, while inserting or deleting in a red-black tree takes O(1) operations on average.
If you wanted to keep a tree completely balanced, so that you had the same number of descendents on either side of every node, +/- 1, it would be very expensive -- insert and delete operations would take O(N) time.
Yes, it is balanced. The rule says that, counting the black NIL leaves, the longest possible path consists of at most 2*B - 1 nodes, where B is the number of black nodes on the shortest possible path from the root to any leaf. In your example the shortest path has 2 black nodes, so B = 2, and the longest path can have up to 3 nodes; here it has just 2.
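The equal-black-count condition is easy to check mechanically. Here is a hypothetical sketch where a node is a (color, left, right) tuple and None stands for a black NIL leaf:

```python
def black_height(node):
    """Return the black-height of the subtree, or None if two root-to-leaf
    paths disagree on their number of black nodes (i.e. the red-black path
    property is violated)."""
    if node is None:          # NIL leaf, counted as black
        return 1
    color, left, right = node
    lh, rh = black_height(left), black_height(right)
    if lh is None or rh is None or lh != rh:
        return None
    return lh + (1 if color == 'B' else 0)
```

A tree like the one in the question passes this check, which is why it counts as balanced in the red-black sense even though it looks lopsided.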

Is it always possible to turn one BST into another using tree rotations?

Given a set of values, it's possible for there to be many different possible binary search trees that can be formed from those values. For example, for the values 1, 2, and 3, there are five BSTs we can make from those values:
1      1        2       3     3
 \      \      / \     /     /
  2      3    1   3   1     2
   \    /              \   /
    3  2                2 1
Many data structures that are based on balanced binary search trees use tree rotations as a primitive for reshaping a BST without breaking the required binary search tree invariants. Tree rotations can be used to pull a node up above its parent, as shown here:
     u    rotate right      v
    / \   ------------>    / \
   v   C                  A   u
  / \     <------------      / \
 A   B    rotate left       B   C
Given a BST containing a set of values, is it always possible to convert that BST into any arbitrary other BST for the same set of values? For example, could we convert between any of the five BSTs above into any of the other BSTs just by using tree rotations?
The answer to your question depends on whether you are allowed to have equal values in the BST that can appear different from one another. For example, if your BST stores key/value pairs, then it is not always possible to turn one BST for those key/value pairs into a different BST for the same key/value pairs.
The reason for this is that the inorder traversal of the nodes in a BST remains the same regardless of how many tree rotations are performed. As a result, it's not possible to convert from one BST to another if the inorder traversal of the nodes would come out differently. As a very simple case, suppose you have a BST holding two copies of the number 1, each of which is annotated with a different value (say, A or B). In that case, there is no way to turn these two trees into one another using tree rotations:
1:a        1:b
   \          \
    1:b        1:a
You can check this by brute-forcing the (very small!) set of possible trees you can make with the rotations. However, it suffices to note that an inorder traversal of the first tree gives 1:a, 1:b and an inorder traversal of the second tree gives 1:b, 1:a. Consequently, no number of rotations will suffice to convert between the trees.
On the other hand, if all the values are different, then it is always possible to convert between two BSTs by applying the right number of tree rotations. I'll prove this using an inductive argument on the number of nodes.
As a simple base case, if there are no nodes in the tree, there is only one possible BST holding those nodes: the empty tree. Therefore, it's always possible to convert between two trees with zero nodes in them, since the start and end tree must always be the same.
For the inductive step, let's assume that for any two BSTs of 0, 1, 2, .., n nodes with the same values, that it's always possible to convert from one BST to another using rotations. We'll prove that given any two BSTs made from the same n + 1 values, it's always possible to convert the first tree to the second.
To do this, we'll start off by making a key observation. Given any node in a BST, it is always possible to apply tree rotations to pull that node up to the root of the tree. To do this, we can apply this algorithm:
while (target node is not the root) {
    if (node is a left child) {
        apply a right rotation to the node and its parent;
    } else {
        apply a left rotation to the node and its parent;
    }
}
The reason that this works is that every time a node is rotated with its parent, its depth decreases by one; it moves one level closer to the root. As a result, after applying sufficiently many rotations of the above forms, we can bring the target node up to the top of the tree.
This now gives us a very straightforward recursive algorithm we can use to reshape any one BST into another BST using rotations. The idea is as follows. First, look at the root node of the second tree. Find that node in the first tree (this is pretty easy, since it's a BST!), then use the above algorithm to pull it up to the root of the tree. At this point, we have turned the first tree into a tree with the following properties:
The first tree's root node is the root node of the second tree.
The first tree's right subtree contains the same nodes as the second tree's right subtree, but possibly with a different shape.
The first tree's left subtree contains the same nodes as the second tree's left subtree, but possibly with a different shape.
Consequently, we could then recursively apply this same algorithm to make the left subtree have the same shape as the left subtree of the second tree and to make the right subtree have the same shape as the right subtree of the second tree. Since these left and right subtrees must have strictly no more than n nodes each, by our inductive hypothesis we know that it's always possible to do this, and so the algorithm will work as intended.
To summarize, the algorithm works as follows:
If the two trees are empty, we are done.
Find the root node of the second tree in the first tree.
Apply rotations to bring that node up to the root.
Recursively reshape the left subtree of the first tree to have the same shape as the left subtree of the second tree.
Recursively reshape the right subtree of the first tree to have the same shape as the right subtree of the second tree.
To analyze the runtime of this algorithm, note that applying steps 1 - 3 requires at most O(h) steps, where h is the height of the first tree. Every node will be brought up to the root of some subtree exactly once, so we do this a total of O(n) times. Since the height of an n-node tree is never greater than O(n), this means that the algorithm takes at most O(n^2) time to complete. It's possible that it will do a lot better (for example, if the two trees already have the same shape, then this runs in time O(n)), but this gives a nice worst-case bound.
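A sketch of the whole recursive reshaping algorithm in Python, under the assumption that all keys are distinct (the names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def move_to_root(root, key):
    """Rotate the node holding `key` up one level per recursion step until
    it becomes the root (the while-loop above, written recursively)."""
    if root is None or key == root.key:
        return root
    if key < root.key:
        root.left = move_to_root(root.left, key)
        pivot = root.left
        root.left = pivot.right      # right rotation: pivot moves above root
        pivot.right = root
    else:
        root.right = move_to_root(root.right, key)
        pivot = root.right
        root.right = pivot.left      # left rotation: pivot moves above root
        pivot.left = root
    return pivot

def reshape(source, target):
    """Give `source` the exact shape of `target` (same key set assumed)."""
    if target is None:
        return None
    source = move_to_root(source, target.key)
    source.left = reshape(source.left, target.left)
    source.right = reshape(source.right, target.right)
    return source
```

Each call to move_to_root performs only rotations, so the overall transformation uses nothing but the rotation primitive, exactly as the argument requires.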
Hope this helps!
For binary search trees this can actually be done in O(n).
Any tree can be "straightened out", i.e. put into a form in which every node is either the root or a left child.
This form is unique (reading down from the root gives the ordering of the elements).
A tree is straightened out as follows:
For any right child, perform a left rotation about itself. This decreases the number of right children by 1, so the tree is straightened out in O(n) rotations.
If A can be straightened out into S in O(n) rotations, and B into S in O(n) rotations, then since rotations are reversible one can turn A -> S -> B in O(n) rotations.
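A sketch of the straightening step in Python; each left rotation removes exactly one right-child edge, which is why at most n - 1 rotations are ever needed (the names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(u):
    """Pull u's right child above u; u becomes its left child."""
    v = u.right
    u.right, v.left = v.left, u
    return v

def straighten(root):
    """Rotate until every node is the root or a left child (a left spine).
    Returns (new root, number of rotations performed)."""
    rotations = 0
    while root.right:                  # fix the spine at the root
        root = rotate_left(root)
        rotations += 1
    node = root
    while node.left:
        while node.left.right:         # fix the spine below `node`
            node.left = rotate_left(node.left)
            rotations += 1
        node = node.left
    return root, rotations
```

Since rotations are reversible, running this on both A and B and replaying B's rotations backwards gives the A -> S -> B route in O(n) rotations total.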

Split a tree into equal parts by deleting an edge

I am looking for an algorithm to split a tree with N nodes (where the maximum degree of each node is 3) by removing one edge from it, so that the two resulting trees have sizes as close as possible to N/2. How do I find the edge that is "the most centered"?
The tree comes as an input from a previous stage of the algorithm and is input as a graph - so it's not balanced nor is it clear which node is the root.
My idea is to find the longest path in the tree and then select the edge in the middle of the longest path. Does it work?
Optimally, I am looking for a solution that can ensure that neither of the trees has more than 2N / 3 nodes.
Thanks for your answers.
I don't believe that your initial algorithm works for the reason I mentioned in the comments. However, I think that you can solve this in O(n) time and space using a modified DFS.
Begin by walking the graph to count how many total nodes there are; call this n. Now, choose an arbitrary node and root the tree at it. We will now recursively explore the tree starting from the root and will compute for each subtree how many nodes are in each subtree. This can be done using a simple recursion:
If the current node is null, return 0.
Otherwise:
For each child, compute the number of nodes in the subtree rooted at that child.
Return 1 + the total number of nodes in all child subtrees
At this point, we know for each edge what split we will get by removing that edge, since if the subtree below that edge has k nodes in it, the split will be (k, n - k). You can thus find the best cut to make by iterating across all nodes and looking for the one that balances (k, n - k) most evenly.
Counting the nodes takes O(n) time, and running the recursion visits each node and edge at most O(1) times, so that takes O(n) time as well. Finding the best cut takes an additional O(n) time, for a net runtime of O(n). Since we need to store the subtree node counts, we need O(n) memory as well.
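A sketch of this counting approach in Python, using an iterative DFS on an adjacency-list representation (the names and the representation are my assumptions):

```python
def best_split_edge(n, adj):
    """Return (parent, child, k): the edge whose removal splits the n-node
    tree into sizes (k, n - k) with |n - 2k| minimised. `adj` maps each
    node in 0..n-1 to its list of neighbours."""
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True                     # root the tree at an arbitrary node
    order, stack = [], [0]
    while stack:                       # iterative DFS, recording visit order
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)
    size = [1] * n
    for u in reversed(order):          # children are processed before parents
        if parent[u] != -1:
            size[parent[u]] += size[u]
    child = min((v for v in range(n) if parent[v] != -1),
                key=lambda v: abs(n - 2 * size[v]))
    return parent[child], child, size[child]
```

Every node and edge is touched a constant number of times, giving the O(n) time and space bounds described above.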
Hope this helps!
If you look at my answer to Divide-And-Conquer Algorithm for Trees, you can see that I find a node that partitions the tree into two nearly equal-sized trees (a bottom-up algorithm); now you just need to choose one of that node's edges to do what you want.
Your current approach does not work. Assume you have a complete binary tree, and now add a path of length 3*log n to one of its leaves (call it the bad leaf). The longest path will then run from one of the other leaves to the end of the path attached to this bad leaf, and its middle edge will lie within that attached path (in fact past the bad leaf). If you partition on this edge, you get one part of size O(log n) and another part of size O(n).
