sorting 3 BST to one array in O(n) time and O(1) extra space - algorithm

I'm trying to write an algorithm for this problem:
Merge three binary search trees into one sorted array, using O(n) time and O(1) additional space.
I think the straightforward answer is to do an in-order traversal of all three trees at once and compare the elements while traversing. But how can I do such a traversal in all three trees at once? Especially when the trees don't all have the same number of elements.

Your idea seems right.
In each tree, maintain a pointer (iterator).
Initially, the iterator should point to the leftmost node of the tree.
In every iteration, select the minimum of the elements under the three current pointers (it is O(1) time and memory).
Then put that minimum into the resulting array.
After that, advance the corresponding pointer so that it points to the leftmost unvisited element of the tree.
To be able to do that in O(1) memory, the tree should allow some way to go to this next unvisited element: it is sufficient to have a pointer to parent in each node.
Proceed with such iterations until all nodes are visited.
The traversal of a whole tree of n elements takes O(n) time: there are n-1 edges, and the process moves twice along each edge, once up and once down.
So the resulting complexity is 3*O(n) = O(n).
The algorithm to find the next unvisited node is as follows.
Note that, when we are at a node, its left subtree is already fully visited.
A right child counts as visited exactly when we have just climbed up from it.
The steps are:
While there is no unvisited right child, go up to the parent once.
If, in doing so, we went up and to the right (we were the left child), stop right there at the parent: it is the next node.
If we were at the root and there is no parent to go up to, terminate the traversal.
Assuming we did not stop yet, there's a right child.
Go there.
Then while there's a left child, go to the left child.
Stop.
The best way to grasp it is perhaps to visualize the steps on some non-trivial picture of a binary search tree. For example, there are explanatory pictures at the Wikipedia article on tree traversal.
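The iterator described above can be sketched in Python as follows. This is a minimal sketch, assuming each node stores a `parent` pointer; the names `Node`, `insert`, `build`, `leftmost`, `successor`, and `merge_three` are made up for illustration and do not come from the question. The output list is the required result array, so only the three cursors count as extra space.

```python
class Node:
    """BST node with a parent pointer (assumed to be available)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, key):
    """Plain BST insert that maintains parent pointers; returns the root."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left, node.parent = node, cur
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right, node.parent = node, cur
                return root
            cur = cur.right


def build(keys):
    """Helper for the example: insert keys one by one."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def leftmost(node):
    """Follow left children as far as possible (None stays None)."""
    while node is not None and node.left is not None:
        node = node.left
    return node


def successor(node):
    """Next node in sorted order, or None after the last one."""
    if node.right is not None:
        return leftmost(node.right)
    # Climb while we are a right child; stop after climbing from a left child.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent


def merge_three(a, b, c):
    """Merge three BSTs (given by their roots) into one sorted list."""
    cursors = [leftmost(a), leftmost(b), leftmost(c)]
    out = []
    while True:
        best = None
        for i in range(3):
            if cursors[i] is not None and (
                best is None or cursors[i].key < cursors[best].key
            ):
                best = i
        if best is None:          # all three trees are exhausted
            return out
        out.append(cursors[best].key)
        cursors[best] = successor(cursors[best])
```

Trees of different sizes need no special handling here: an exhausted tree simply has a `None` cursor and is skipped when picking the minimum.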

Related

How to check if two binary trees share a node

Given an array of binary trees find whether any two trees share a node, not value wise, but "pointer" wise. At the bottom I provided an example.
My approach was to iterate through all the trees and store all the leaves (pointers) from each tree into a list, then check if list has any duplicates, but that's a rather slow approach. Is there perhaps a quicker way to solve this?
In the worst case you will have to traverse all nodes (all pointers) to find a shared node (pointer), as it might happen to be the last one visited. So the best time complexity we can expect to have is O(m+n) where m and n represent the number of nodes in either tree.
We can achieve this time complexity if we store the pointers from the first tree in a hash set and then traverse the pointers of the second tree to see if any of those is in the set. Assuming that get/set operations on a hash set have an amortized constant time complexity, the overall time complexity will be O(m+n).
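A minimal Python sketch of the hash-set idea, assuming a simple `Node` class with `left` and `right` fields; `Node`, `iter_nodes`, and `shares_node_hashed` are illustrative names. Node identity is captured with the built-in `id()`, which plays the role of the pointer here.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def iter_nodes(root):
    """Yield every node reachable from root (iterative, any order)."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        yield node
        if node.left is not None:
            stack.append(node.left)
        if node.right is not None:
            stack.append(node.right)


def shares_node_hashed(first, second):
    """True if the two trees contain the same node object."""
    seen = {id(node) for node in iter_nodes(first)}      # O(m) time and space
    return any(id(node) in seen for node in iter_nodes(second))  # O(n)
```

Two nodes that merely hold equal values are not reported as shared, because the comparison is on identity, not on `value`.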
If the same program is responsible for constructing the trees, then a reuse of the same node can be detected upon insertion. For instance, reuse of the same node in multiple trees can be completely avoided by having the insert method of your tree only take a value as argument, never a node instance. The method will then encapsulate the actual creation of the node, guaranteeing its uniqueness.
An idea for O(#nodes) time and O(1) space. It does more traversal work than simple traversals using a hash table, but it doesn't have the cost of using a hash table. I don't know what's better. Might depend on the language.
For two trees
Create one extra node. Do a Morris traversal of the first tree. It only modifies right child pointers, so we can use left child pointers for marking nodes as seen. For every tree node without left child, set our extra node as left child. Whenever checking a left child pointer, treat our extra node like a null pointer, i.e., don't visit it. After the traversal, the tree structure is restored, and all originally left-child-less tree nodes now point to our extra node as left child. That includes all leaf nodes.
Do a Morris traversal of the second tree. Again treat pointers to our extra node like null pointers. If we ever do encounter our extra node, we know the trees share a node. If not, then we know the trees don't share a node, since if they did share any, they'd also share a leaf node (just go down from any shared node to a leaf node, that's also shared), and all leaf nodes of the first tree are marked. After the traversal, the second tree is restored.
Do a Morris traversal of the first tree again, this time removing our extra node, restoring the original null pointers.
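The three passes for two trees could look roughly like this in Python. It is a sketch under the assumption that nodes have mutable `left` and `right` fields and that each input is a proper tree; `Node`, `_morris`, and `shares_node` are hypothetical names, and the sentinel object stands in for the "extra node".

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def _morris(root, sentinel, on_leftless):
    """Morris in-order walk that treats `sentinel` like a null left child.

    Calls on_leftless(node) for every node whose left child is None or the
    sentinel. Only right pointers are changed temporarily, and all of them
    are restored by the time the walk finishes.
    """
    cur = root
    while cur is not None:
        if cur.left is None or cur.left is sentinel:
            on_leftless(cur)
            cur = cur.right
        else:
            pred = cur.left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur      # create the temporary thread
                cur = cur.left
            else:
                pred.right = None     # remove the thread again
                cur = cur.right


def shares_node(first, second):
    """True if the trees share a node object; uses O(1) extra space."""
    sentinel = Node(None)             # the one extra node
    shared = False

    def mark(node):
        node.left = sentinel

    def check(node):
        nonlocal shared
        if node.left is sentinel:     # this node was marked in the first tree
            shared = True

    def unmark(node):
        node.left = None

    _morris(first, sentinel, mark)    # pass 1: mark the first tree
    _morris(second, sentinel, check)  # pass 2: look for marks, run to the end
    _morris(first, sentinel, unmark)  # pass 3: restore the null pointers
    return shared
```

The second pass deliberately runs to completion even after a mark is found, so that every temporary thread in the second tree is removed before returning.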
For an array of more than two trees
Mark the first tree as above. Check the second tree as above. Mark the second tree. Check the third. Mark the third. Check the fourth. Mark the fourth. Etc. When you found a shared node or there are no more trees, unmark the marked trees.
Every shared node must have two parents, or an ancestor with two parents.
LOOP over nodes
    IF node has two parents
        MARK node as shared
        Mark all descendants as shared.

Is this new sorting algorithm based on Binary Search Tree useful?

If we somehow transform a binary search tree into a form where no node other than the root has both a left and a right child, the nodes in the right subtree of the root have only right children, and vice versa for the left subtree, then such a configuration of the BST is inherently sorted, with its root approximately in the middle (in the case of nearly complete BSTs). To do this we need to do reverse rotations. Unlike AVL and red-black trees, where rotations are done to make the tree balanced, we would do reversed rotations.
I would like to explain the pseudo code and logical implementation of the algorithm through the following images. The algorithm is to first sort the left subtree with respect to the root and then the right subtree. These two subparts will be opposite to each other, that is, left would interchange with right. For simplicity I have taken a BST with right subtree, with respect to root, sorted.
To improve the complexity compared to tree sort, we can augment the above algorithm. We can add a flag to each node, where 0 stands for a normal node and 1 means the node has a non-null right child in the original unsorted BST. The nodes with flag 1 have an entry in a hash table, with the key being their pointer and the value being the rightmost node. For example, node 23's pointer would map to 30.5's pointer. Then we would not have to traverse all the nodes in between for the iteration. If we have 23's pointer and 30.5's pointer, we can do the required operation in O(1). This will bring down the time complexity compared to tree sort.
Please review the algorithm and suggest whether it is useful.

runtime to find middle element using AVL tree

I have lecture slides that say the following:
To find the middle element in an AVL tree, traverse the elements in order until you reach the middle element. This takes O(N).
As far as I know, finding an element in a tree takes O(log n) (base 2), since an AVL tree is a binary tree in which every node splits into at most two children.
So why do the slides say O(N)?
I am just trying to elaborate 'A. Mashreghi' comment.
Since the tree under consideration is an AVL tree, finding an element is guaranteed to take O(log n), as long as you have the element (key) to search for.
The problem is that you are trying to identify the middle element of the given data structure, so there is no key to search for. Since it is an AVL tree (a self-balancing BST), an in-order traversal gives you the elements in ascending order. You want to use this property to find the middle element.
The algorithm goes like this: keep a counter, increment it for every node visited in order, and return the node at position n/2. This takes about n/2 steps, so the overall complexity is O(n).
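The counting traversal can be sketched like this in Python. It assumes the node count `n` is already known and takes "position n/2" to mean the 0-based index `n // 2` in sorted order; `Node` and `middle_key` are illustrative names, and nothing here is specific to AVL trees beyond being a BST.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def middle_key(root, n):
    """Return the key at 0-based in-order index n // 2.

    Visits about n / 2 nodes, so the cost grows linearly with n.
    """
    stack, cur, seen = [], root, 0
    while stack or cur is not None:
        while cur is not None:        # descend to the leftmost unvisited node
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        if seen == n // 2:            # counter reached the middle position
            return cur.key
        seen += 1
        cur = cur.right
    raise ValueError("tree has fewer than n nodes")
```

The balance of the tree only bounds the stack depth at O(log n); it does not shorten the walk, because the traversal still has to pass over every smaller element first.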
Being divided into 2 children does not guarantee perfect symmetry. For instance, consider the most unbalanced of all balanced binary trees: each right child has a depth one more than its corresponding left child.
In such a tree, the middle element will be somewhere down in the right branch's left branch's ...
You need to determine how many nodes N you have, then locate the N/2th largest node. This is not an O(log N) process.

Binary Tree MIN and MAX Depth

I am having trouble with these questions:
A binary tree with N nodes is at least how deep?
How deep is it at most?
Would the maximum depth just be N?
There are two extremes that you need to consider.
Every node has just a left (or just a right) child, but never both. In that case your binary tree is effectively a linked list.
Every level in your tree is full, except possibly the last level. Trees of this type are called complete.
A third type of tree that I know of may not be relevant to your question: it is called a full tree, and every node is either a leaf or has n children for an n-ary tree.
So, to answer your question: the maximum depth is N (the linked-list case, counting levels), and the minimum is about log2(N) levels, which a complete tree achieves.
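To make the two bounds concrete, here is a small Python sketch. It assumes depth is counted in levels (a single node has depth 1); if you count edges instead, subtract one from both values. The function name `depth_bounds` is made up for this example.

```python
def depth_bounds(n):
    """Return (min_levels, max_levels) for a binary tree with n >= 1 nodes."""
    if n < 1:
        raise ValueError("need at least one node")
    # A tree with k full levels holds 2**k - 1 nodes, so n nodes need
    # floor(log2(n)) + 1 levels, which is exactly n.bit_length().
    min_levels = n.bit_length()
    # The linked-list shape puts every node on its own level.
    max_levels = n
    return min_levels, max_levels
```

For example, 7 nodes fit in 3 levels, while an 8th node forces a 4th level.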

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N).
You have to descend the links to find the leftmost child and then go back by the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
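The edge-counting argument can be checked with a short Python sketch. It uses a plain recursive traversal rather than a threaded one, as an assumption made to keep the example small; `Node` and `count_crossings` are illustrative names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def count_crossings(node):
    """Return (number_of_nodes, edge_crossings) for a full traversal."""
    if node is None:
        return 0, 0
    nodes, crossings = 1, 0
    for child in (node.left, node.right):
        if child is not None:
            sub_nodes, sub_crossings = count_crossings(child)
            nodes += sub_nodes
            # One descent into the child plus one ascent back to this node.
            crossings += sub_crossings + 2
    return nodes, crossings
```

Whatever the shape of the tree, the second value comes out as `2 * (nodes - 1)`, which is the linear bound described above.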
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we increment over the integers and perform individual searches for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
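A rough Python sketch of this comparison follows, assuming a perfectly balanced tree over the integers 0..n-1 and counting each searched node at depth D as 2*D edge crossings (down and back up). The names `Node`, `build_balanced`, `search_depth`, and `total_search_edges` are invented for the example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def build_balanced(lo, hi):
    """Balanced BST holding the integers lo, lo + 1, ..., hi - 1."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, build_balanced(lo, mid), build_balanced(mid + 1, hi))


def search_depth(root, key):
    """Number of edges walked from the root to the node holding key."""
    depth, cur = 0, root
    while cur.key != key:             # key is assumed to be present
        cur = cur.left if key < cur.key else cur.right
        depth += 1
    return depth


def total_search_edges(n):
    """Edge crossings for n independent searches, each going down and back."""
    root = build_balanced(0, n)
    return sum(2 * search_depth(root, key) for key in range(n))
```

With 7 nodes the searches already cost 20 crossings against 12 for a single traversal, and the gap widens by a factor of roughly log N as the tree grows.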
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. In other words, you go through each arc between two nodes twice. That is 2 * (number of arcs) = 2 * (number of nodes - 1), which is O(N).
