Binary Tree MIN and MAX Depth - binary-tree

I am having trouble with these questions:
A binary tree with N nodes is at least how deep?
How deep is it at most?
Would the maximum depth just be N?

There are two extremes that you need to consider.
Every node has at most one child (only a left child, or only a right child). In that case your binary tree is effectively a linked list.
Every level in your tree is full, except possibly the last level. Trees of this type are called complete.
A third type of tree, which may not be relevant to your question, is the full tree: every node is either a leaf or has exactly n children, for an n-ary tree.
So, to answer your question: the maximum depth is N levels (N - 1 edges, if you count depth in edges), which happens in the linked-list case. The minimum is floor(log2(N)) + 1 levels, which happens when the tree is complete.
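As a quick sanity check of those bounds, here is a minimal sketch. It assumes depth is counted in levels (a single node has depth 1), and the function name depth_bounds is only illustrative.

```python
def depth_bounds(n):
    """Return (min_levels, max_levels) for a binary tree with n >= 1 nodes.

    Assumption: depth is counted in levels, so a single node has depth 1.
    """
    if n < 1:
        raise ValueError("the tree must have at least one node")
    # For n >= 1, n.bit_length() equals floor(log2(n)) + 1,
    # the number of levels in a complete tree with n nodes.
    min_levels = n.bit_length()
    # Worst case: every node has a single child, so the tree is a chain.
    max_levels = n
    return min_levels, max_levels
```

For example, depth_bounds(7) gives (3, 7): seven nodes fit in three full levels, or stretch into a chain of seven.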

Related

Best 'order' traversal to copy a balanced binary tree into an AVL tree with minimum rotations

I have two binary trees: A, whose nodes and pointers (left, right, parent) I can access, and B, whose internals I cannot access at all. The idea is to copy A into B by iterating over the nodes of A and inserting each one into B. Given that B is an AVL tree, is there a traversal of A (preorder, inorder, postorder) that minimizes the number of rotations when inserting the elements into B?
Edit:
The tree A is balanced, I just don't know the exact implementation;
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
Rebalancing in AVL happens when the depth of one part of the tree exceeds the depth of some other part of the tree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
Iteration on tree A needs to be done using only pointers (the
programming language is C and there is no queue or stack data
structure that I can make use of).
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)
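The recursive, queue-free approach described above might look roughly like this. It is sketched in Python for brevity rather than C; the Node class and the function names are assumptions made for illustration, and appending to a list stands in for the insert into B.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of levels in the subtree rooted at node (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def emit_level(node, depth, out):
    """Emit only the nodes that sit exactly `depth` levels below `node`."""
    if node is None:
        return
    if depth == 0:
        out.append(node.key)  # in the real program: insert this key into B
        return
    emit_level(node.left, depth - 1, out)
    emit_level(node.right, depth - 1, out)


def level_order(root):
    """Breadth-first order without a queue: one depth-limited pass per level."""
    out = []
    for depth in range(height(root)):
        emit_level(root, depth, out)
    return out
```

No queue is needed, but each pass re-walks the upper levels, and the recursion still uses the call stack (bounded by the height of A).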
If the original tree is not necessarily AVL-balanced, then you can't just copy it.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its index within the source tree, so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since inserts take O(log N) time, though, that is how long insertion takes as well, so doing log N traversals does not increase the complexity of the overall process.
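To illustrate the claim that a node's level depends only on its in-order index, here is a sketch for the simpler case where the sorted keys are available as a list (which the answer above treats as the easy case). It computes the level order of a complete binary search tree; the helper names are hypothetical.

```python
def left_subtree_size(n):
    """Size of the left subtree of a complete binary tree with n nodes."""
    if n <= 1:
        return 0
    h = n.bit_length() - 1      # index of the deepest level (root is level 0)
    last = n - ((1 << h) - 1)   # number of nodes on the deepest level
    half = 1 << (h - 1)         # capacity of the left half of that level
    return (half - 1) + min(last, half)


def complete_level_order(sorted_keys):
    """Arrange sorted keys as a complete BST and return it in level order."""
    n = len(sorted_keys)
    out = [None] * n            # heap-style layout: children of slot i are 2i+1, 2i+2

    def fill(lo, count, slot):
        if count == 0:
            return
        left = left_subtree_size(count)
        out[slot] = sorted_keys[lo + left]   # root of this range, by index alone
        fill(lo, left, 2 * slot + 1)
        fill(lo + left + 1, count - left - 1, 2 * slot + 2)

    fill(0, n, 0)
    return out
```

For the keys 1 to 6 this yields 4, 2, 6, 1, 3, 5. Inserting in that order keeps every intermediate tree complete.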

sorting 3 BST to one array in O(n) time and O(1) extra space

I'm trying to write an algorithm for this problem:
Merge three binary search trees into one sorted array, using O(n) time and O(1) additional space.
I think the straightforward answer is to do an in-order traversal of all three trees at once and compare the elements while traversing. But how can I do such a traversal in all three trees at once? Especially when the trees don't all have the same number of elements.
Your idea seems right.
In each tree, maintain a pointer (iterator).
Initially, the iterator should point to the leftmost node of the tree.
In every iteration, select the minimum of the elements under the three current pointers (it is O(1) time and memory).
Then put that minimum into the resulting array.
After that, advance the corresponding pointer so that it points to the leftmost unvisited element of the tree.
To be able to do that in O(1) memory, the tree should allow some way to go to this next unvisited element: it is sufficient to have a pointer to parent in each node.
Proceed with such iterations until all nodes are visited.
The traversal of a whole tree of n elements takes O(n) time: there are n-1 edges, and the process moves twice along each edge, once up and once down.
So the resulting complexity is 3*O(n) = O(n).
The algorithm to find the next unvisited node is as follows.
Note that, when we are at a node, its left subtree is already fully visited.
The steps are as follows:
While there is no unvisited right child, go up to the parent once.
If, in doing so, we went up and right (we were at the left child), stop right there at the parent.
If we were at the root, terminate the traversal.
Assuming we did not stop yet, there's a right child.
Go there.
Then while there's a left child, go to the left child.
Stop.
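The steps above can be sketched as follows, assuming every node stores a parent pointer. The Node class, the build helper and the function names are illustrative and not part of the original question.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


def build(keys):
    """Plain (unbalanced) BST insertion, used only to set up examples."""
    root = None
    for key in keys:
        if root is None:
            root = Node(key)
            continue
        cur = root
        while True:
            side = "left" if key < cur.key else "right"
            child = getattr(cur, side)
            if child is None:
                setattr(cur, side, Node(key, cur))
                break
            cur = child
    return root


def leftmost(node):
    while node is not None and node.left is not None:
        node = node.left
    return node


def successor(node):
    """Next node in sorted order, or None after the last one."""
    if node.right is not None:
        return leftmost(node.right)
    # Climb while we are a right child; the first parent reached from a
    # left child is the next unvisited node. Falling off the root ends it.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent


def merge_three(a, b, c):
    cursors = [leftmost(a), leftmost(b), leftmost(c)]  # constant extra state
    out = []
    while True:
        best = None
        for i in range(3):
            if cursors[i] is not None and (
                best is None or cursors[i].key < cursors[best].key
            ):
                best = i
        if best is None:
            return out
        out.append(cursors[best].key)
        cursors[best] = successor(cursors[best])
```

Apart from the output array, the only state is the three cursors, so the extra space is O(1). Trees of different sizes are handled naturally: an exhausted cursor simply becomes None.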
The best way to grasp it is perhaps to visualize the steps on some non-trivial picture of a binary search tree. For example, there are explanatory pictures at the Wikipedia article on tree traversal.

runtime to find middle element using AVL tree

One of my lecture slides says the following:
To find the middle element in an AVL tree, traverse the elements in order until reaching the middle element. This takes O(N).
As far as I know, finding an element in a tree structure takes O(log n) (base 2), since an AVL tree is a binary tree in which every node splits into two children.
So why does it say O(N)?
I am just trying to elaborate on A. Mashreghi's comment.
Since the tree under consideration is an AVL tree, finding an element is guaranteed to take O(log n), as long as you have the element (key) to find.
The problem is that you are trying to identify the middle element of the given data structure. Since it is an AVL tree (a self-balancing BST), an in-order traversal gives you the elements in ascending order. You want to use this property to find the middle element.
The algorithm goes like this: increment a counter for every node visited in order, and return the node at position n/2. This takes about n/2 steps, so the overall complexity is O(n).
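A minimal sketch of that counting approach, assuming a plain node type with no stored subtree sizes. For an even number of nodes it returns the upper of the two middle keys, which is an arbitrary choice made here for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder(node):
    """Yield keys in ascending order."""
    if node is not None:
        yield from inorder(node.left)
        yield node.key
        yield from inorder(node.right)


def middle_key(root):
    # First pass: count the nodes, O(n).
    n = sum(1 for _ in inorder(root))
    if n == 0:
        raise ValueError("empty tree has no middle element")
    # Second pass: stop at 0-based position n // 2, about n/2 steps.
    for position, key in enumerate(inorder(root)):
        if position == n // 2:
            return key
```

Both passes are linear, so the whole thing is O(n). The balance of the tree does not help, because no key is known in advance to search for.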
Being divided into two children does not guarantee perfect symmetry. For instance, consider the most unbalanced of all balanced binary trees: at every node, the right subtree is one level taller than the left subtree.
In such a tree, the middle element will be somewhere down in the right branch's left branch's ...
You need to determine how many nodes N you have, then locate the (N/2)th smallest node. This is not an O(log N) process.

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N)..
Because you have to descend the links to find the leftmost child and then go back along the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "prepaying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we iterate over the integers and perform an individual search for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for, which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. In other words, you go through each arc between two nodes twice. That is 2 * (number of arcs) = 2 * (number of nodes - 1), which is O(N).
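The edge-counting argument can be checked empirically. The sketch below is an assumption-laden stand-in: it uses parent pointers instead of threads to climb back up, and counts every edge crossing during a full in-order traversal. The Node class and function name are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def inorder_with_edge_count(root):
    """Return (keys in order, number of edge crossings) for a full traversal."""
    keys, moves = [], 0
    node = root
    if node is None:
        return keys, moves
    while node.left is not None:            # initial descent to the leftmost node
        node = node.left
        moves += 1
    while node is not None:
        keys.append(node.key)
        if node.right is not None:
            node = node.right               # one step right ...
            moves += 1
            while node.left is not None:    # ... then all the way left
                node = node.left
                moves += 1
        else:
            # Climb out of finished right subtrees.
            while node.parent is not None and node is node.parent.right:
                node = node.parent
                moves += 1
            if node.parent is not None:     # final step up from a left child
                moves += 1
            node = node.parent
    return keys, moves
```

On any tree with N nodes the count comes out to exactly 2 * (N - 1), whether the tree is balanced or a degenerate chain.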

Complete binary tree definitions

I have some questions on binary trees:
Wikipedia states: "A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible." What does the last passage, "as far left as possible", mean?
A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree (taken from How to determine if binary tree is balanced?). Is this correct, or is there some "jitter" on the value 1? I read in the answer I linked that there could also be a difference of 4 between the heights of the right and left trees.
Do the complete and height-balanced definitions just apply to binary tree or just any other tree?
Following the reference for the definition on Wikipedia, I got to this page. The definition was taken from there but modified:
Definition: A binary tree in which every level, except possibly the deepest, is completely filled. At depth n, the height of the tree, all nodes must be as far left as possible.
It continues with a note below though,
A complete binary tree has 2^k nodes at every depth k < n and between 2^n and 2^(n+1) - 1 nodes altogether.
Sometimes, definitions vary according to convenience (to be useful for something). That passage might be a variation which, as I understand it, requires leaf nodes to fill the left side of the deepest level first (that is, fill from left to right). The definition that I usually find is exactly as described above but without that passage.
Usually the definition taken for a height-balanced tree is the one you described. In other words:
A tree is balanced if and only if for every node the heights of its two subtrees differ by at most 1.
That definition was taken from here. Again, sometimes definitions are made more flexible to serve specific purposes. For example, the definition of an AVL tree says that
In an AVL tree, the heights of the two child subtrees of any node differ by at most one
Still, I remember once having to rewrite an algorithm so that the tree would be considered height-balanced if the two child subtrees of any node differed by at most 2. Note that the definition you gave is recursive; this is very common for binary trees.
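The recursive definition translates almost directly into code. This is a minimal sketch; the Node class, the function names and the tolerance parameter (1 for the usual definition, 2 for the relaxed variant mentioned above) are assumptions made for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def balanced_height(node, tolerance=1):
    """Return the subtree height, or -1 if any node violates the tolerance."""
    if node is None:
        return 0                      # an empty tree is balanced, height 0
    left = balanced_height(node.left, tolerance)
    if left < 0:
        return -1
    right = balanced_height(node.right, tolerance)
    if right < 0:
        return -1
    if abs(left - right) > tolerance:
        return -1
    return 1 + max(left, right)


def is_height_balanced(root, tolerance=1):
    return balanced_height(root, tolerance) >= 0
```

Returning the height and the verdict together keeps the check at O(N), since each node is visited once.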
In a tree whose number of children is variable, you wouldn't be able to say that it is complete (any parent could have the number of children that you want). Still, it can apply to n-ary trees (with a fixed amount of n children).
Do the complete and height-balanced definitions just apply to binary
tree or just any other tree?
Short answer: Yes, it can be extended to any n-ary tree.

Resources