Runtime to find the middle element using an AVL tree

One of my lecture slides says the following:
To find the middle element in an AVL tree, traverse the elements in order until you reach the middle element. This takes O(N).
As far as I know, finding an element in a tree takes O(log n) (base 2), since an AVL tree is a binary tree in which every node has at most two children.
So why do the slides say O(N)?

I am just trying to elaborate on A. Mashreghi's comment.
Since the tree under consideration is an AVL tree, finding an element is guaranteed to take O(log n), as long as you have the element (key) to look for.
The problem is that you are trying to identify the middle element of the data structure, and you do not know its key in advance. Because it is an AVL tree (a self-balancing BST), an in-order traversal gives you the elements in ascending order. You want to use this property to find the middle element.
The algorithm goes like this: increment a counter for every node visited in order, and return the node at position n/2. This takes about n/2 steps, so the overall complexity is O(n).
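The counting approach described above can be sketched in Python as follows. This is only an illustration: `Node`, `build_balanced`, and `middle_element` are hypothetical names, and the tree is built directly from sorted keys as a height-balanced stand-in for a real AVL tree (no rotations are implemented).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def build_balanced(sorted_keys):
    """Build a height-balanced BST from sorted keys (stand-in for an AVL tree)."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build_balanced(sorted_keys[:mid]),
                build_balanced(sorted_keys[mid + 1:]))


def in_order(node):
    """Yield keys in ascending order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


def count_nodes(node):
    """Count the nodes; this alone already visits every node once."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def middle_element(root):
    """Return the key at 0-based in-order position n // 2, or None if empty."""
    n = count_nodes(root)
    for position, key in enumerate(in_order(root)):
        if position == n // 2:
            return key  # reached after about n / 2 in-order steps
    return None
```

Both the counting pass and the in-order walk touch a number of nodes proportional to n, which is where the O(n) bound comes from.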

Being divided into 2 children does not guarantee perfect symmetry. For instance, consider the most unbalanced of all balanced binary trees: each right subtree is one level deeper than its sibling left subtree.
In such a tree, the middle element will be somewhere down in the right branch's left branch's ...
You need to determine how many nodes N you have, then locate the N/2th largest node. This is not an O(log N) process.

Related

Is this new sorting algorithm based on Binary Search Tree useful?

Suppose we somehow transform a Binary Search Tree into a form where no node other than the root has both a left and a right child, the nodes in the right subtree of the root have only right children, and the nodes in the left subtree have only left children. Such a configuration of the BST is inherently sorted, with its root approximately in the middle (in the case of nearly complete BSTs). To do this we need to do reverse rotations. Unlike AVL and red-black trees, where rotations are done to make the tree balanced, we would do reversed rotations.
I would like to explain the pseudo code and logical implementation of the algorithm through the following images. The algorithm is to first sort the left subtree with respect to the root and then the right subtree. These two subparts will be opposite to each other, that is, left would interchange with right. For simplicity I have taken a BST with right subtree, with respect to root, sorted.
To improve the complexity as compared to tree sort we can augment the above algorithm. We can add a flag to each node, where 0 stands for a normal node and 1 means that the node has a non-null right child in the original unsorted BST. The nodes with flag 1 have an entry in a hash table, with the key being their pointer and the value being the rightmost node. For example, node 23's pointer would map to 30.5's pointer. Then we would not have to traverse all the nodes in between for the iteration. If we have 23's pointer and 30.5's pointer we can do the required operation in O(1). This will bring down the time complexity, as compared to tree sort.
Please review the algorithm and suggest whether it is useful.

Sorting 3 BSTs into one array in O(n) time and O(1) extra space

I'm trying to write an algorithm for this problem:
Merge three binary search trees into one sorted array, using O(n) time and O(1) additional space.
I think the straightforward answer is to do an in-order traversal of all three trees at once and compare the elements while traversing. But how can I do such a traversal in all three trees at once? Especially when the trees don't all have the same number of elements.
Your idea seems right.
In each tree, maintain a pointer (iterator).
Initially, the iterator should point to the leftmost node of the tree.
In every iteration, select the minimum of the elements under the three current pointers (it is O(1) time and memory).
Then put that minimum into the resulting array.
After that, advance the corresponding pointer so that it points to the leftmost unvisited element of the tree.
To be able to do that in O(1) memory, the tree should allow some way to go to this next unvisited element: it is sufficient to have a pointer to parent in each node.
Proceed with such iterations until all nodes are visited.
The traversal of a whole tree of n elements takes O(n) time: there are n-1 edges, and the process moves twice along each edge, once up and once down.
So the resulting complexity is 3*O(n) = O(n).
The algorithm to find the next unvisited node is as follows.
Note that, when we are at a node, its left subtree is already fully visited.
The steps are as follows:
While there is no unvisited right child, go up to the parent once.
If, in doing so, we went up and right (we were at the left child), stop right there at the parent.
If we were at the root, terminate the traversal.
Assuming we did not stop yet, there's a right child.
Go there.
Then while there's a left child, go to the left child.
Stop.
The best way to grasp it is perhaps to visualize the steps on some non-trivial picture of a binary search tree. For example, there are explanatory pictures at the Wikipedia article on tree traversal.
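The iteration scheme from the answer above might look roughly like this in Python. It is a sketch under assumptions: every node carries a `parent` pointer, as the answer requires; `Node`, `insert`, `build`, `leftmost`, `successor`, and `merge_sorted` are hypothetical names; and `insert` is a plain unbalanced BST insert used only to build test trees.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, key):
    """Plain (unbalanced) BST insert that maintains parent pointers."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left, new.parent = new, cur
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right, new.parent = new, cur
                return root
            cur = cur.right


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def leftmost(node):
    while node is not None and node.left is not None:
        node = node.left
    return node


def successor(node):
    """Next node in sorted order, found with O(1) extra memory."""
    if node.right is not None:
        return leftmost(node.right)
    # Climb while we are a right child; the first parent reached from
    # its left child is the next unvisited node.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent  # None once we climb past the root


def merge_sorted(roots):
    """Merge the given BSTs into one sorted list of keys."""
    cursors = [leftmost(root) for root in roots]  # one iterator per tree
    out = []
    while True:
        best = None
        for i, cursor in enumerate(cursors):
            if cursor is not None and (best is None or cursor.key < cursors[best].key):
                best = i
        if best is None:  # every tree is exhausted
            return out
        out.append(cursors[best].key)
        cursors[best] = successor(cursors[best])
```

With a fixed number of trees (three here), the `cursors` list has constant size, so the only non-constant storage is the output array itself.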

Count nodes bigger than root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in the form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add an additional field v.bigger which should contain the number of nodes with a key bigger than v.key that are in the subtree rooted at v. Adding such a field to all nodes of the tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics. For example, when thinking about doing this in a bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (a balanced BST?) with an operation bigger(x), which for a given x returns the number of elements bigger than x in that set in logarithmic time. The problem is that we would need to merge such sets in O(log n), so this seems like a no-go, as I don't know any ordered-set-like data structure which supports quick merging.
I also thought about a top-down approach: a node v adds one to u.bigger for some node u if and only if u lies on the simple path from v to the root and u.key < v.key. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform a depth-first search of the given tree (starting from the root node).
When a node is visited for the first time (coming from its parent), add its key to some order-statistics data structure (OSDS). At the same time, query the OSDS for the number of keys larger than the current key and initialize v.bigger with the negated result of this query.
When a node is visited for the last time (coming from its right child), query the OSDS for the number of keys larger than the current key and add the result to v.bigger. The difference between the two queries counts exactly the larger keys that were added while the subtree of v was being explored.
You could apply this algorithm to any rooted tree (not necessarily a binary tree). It also does not need parent pointers (you could use a DFS stack instead).
For the OSDS you could use either an augmented BST or a Fenwick tree. In the case of a Fenwick tree you need to preprocess the given tree so that the key values are compressed: copy all the keys to an array, sort it, remove duplicates, then replace each key by its index in this array.
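A minimal sketch of the DFS approach from this answer, using a Fenwick tree with compressed keys as the OSDS. The names `Node`, `Fenwick`, `all_nodes`, and `fill_bigger` are hypothetical, and an explicit stack replaces recursion and parent pointers.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.bigger = 0


class Fenwick:
    """Counts inserted ranks; ranks are 1-based."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, index):
        while index < len(self.tree):
            self.tree[index] += 1
            index += index & -index

    def prefix(self, index):
        """Number of inserted values with rank <= index."""
        total = 0
        while index > 0:
            total += self.tree[index]
            index -= index & -index
        return total


def all_nodes(root):
    stack, out = [root], []
    while stack:
        node = stack.pop()
        if node is not None:
            out.append(node)
            stack.extend((node.left, node.right))
    return out


def fill_bigger(root):
    # Key compression: map each distinct key to its 1-based sorted rank.
    keys = sorted({node.key for node in all_nodes(root)})
    rank = {key: i + 1 for i, key in enumerate(keys)}
    fen = Fenwick(len(rank))
    inserted = 0
    stack = [(root, False)]  # (node, True) marks the last visit of a node
    while stack:
        node, leaving = stack.pop()
        if node is None:
            continue
        if not leaving:
            # First visit: insert the key, store the negated count of larger keys.
            fen.add(rank[node.key])
            inserted += 1
            node.bigger = -(inserted - fen.prefix(rank[node.key]))
            stack.append((node, True))
            stack.append((node.right, False))
            stack.append((node.left, False))
        else:
            # Last visit: add the count again; the difference is the subtree's share.
            node.bigger += inserted - fen.prefix(rank[node.key])
```

Each node triggers one insertion and two queries, each O(log n), plus an O(n log n) sort for the compression.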
Basic idea:
Using the bottom-up approach, each node will get two ordered lists of the values in the subtree from both sons and then find how many of them are bigger. When finished, pass the combined ordered list upwards.
Details:
Leaves:
Leaves obviously have v.bigger=0. The node above them creates a two item list of the values, updates itself and adds its own value to the list.
All other nodes:
Get both lists from sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in subtree). During the merge you can also find how many nodes qualify the condition and get the value of v.bigger for the node.
Why is this O(n log n)?
Every node in the tree counts through the number of nodes in its subtree. This means the root counts all the nodes in the tree, the sons of the root together count the number of nodes in the tree (yes, yes, -1 for the root), and so on: all nodes at the same height together count at most the number of nodes below them. This gives us that the number of nodes counted is at most the number of nodes times the height of the tree, which is O(n log n) when the tree is balanced (height O(log n)).
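The bottom-up merging idea can be sketched as below. This is an illustrative version, not necessarily the answerer's exact procedure: `Node` and `fill_bigger_by_merging` are hypothetical names, the two sorted child lists are merged with `heapq.merge`, and the position of the node's own key is found with `bisect_right` instead of being tracked during the merge.

```python
from bisect import bisect_right
from heapq import merge


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.bigger = 0


def fill_bigger_by_merging(node):
    """Set .bigger for every node below `node`; return the subtree's keys, sorted."""
    if node is None:
        return []
    left_keys = fill_bigger_by_merging(node.left)
    right_keys = fill_bigger_by_merging(node.right)
    # Merge two already-sorted lists in time linear in the subtree size.
    keys = list(merge(left_keys, right_keys))
    # Everything at or after `pos` is strictly greater than node.key.
    pos = bisect_right(keys, node.key)
    node.bigger = len(keys) - pos
    keys.insert(pos, node.key)  # pass the combined sorted list upwards
    return keys
```

The work at each node is linear in the size of its subtree, so the total is bounded by the number of nodes times the tree height. The recursion depth also equals the height, so a very deep, unbalanced tree would need an iterative variant.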
What if, for each node, we keep a separate binary search tree (BST) which consists of the nodes of the subtree rooted at that node?
Assuming the given tree is balanced, for a node v at level k, merging the BSTs of v.left and v.right, which both have O(n/2^(k+1)) elements, is O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the right (traditionally) subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are a total of 2^k nodes at level k, therefore we have O(2^k*3*n/2^(k+1)) operations at level k, which simplifies to O(n) (dropping the 3/2 constant). There are log(n) levels, hence we have O(n*log(n)) operations in total.

Binary Tree MIN and MAX Depth

I am having trouble with these questions:
A binary tree with N nodes is at least how deep?
How deep is it at most?
Would the maximum depth just be N?
There are two extremes that you need to consider.
Every node has just a left (or just a right) child, but never both. In that case your binary tree is effectively a linked list.
Every level in your tree is full, except possibly the last level. Trees of this type are called complete.
A third type of tree that I know of may not be relevant to your question, but it is called a full tree: every node is either a leaf or has n children for an n-ary tree.
So to answer your question: the maximum depth is N levels (the linked-list case), and the minimum is about log2(N) levels, which is reached when the tree is complete.
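To make the two bounds concrete, here is a small sketch. It assumes depth is counted in levels, with the root on level 1 (subtract one if you count edges instead); `min_levels` and `max_levels` are hypothetical helper names.

```python
def max_levels(n):
    """A chain of n nodes (every node has one child) has n levels."""
    return n


def min_levels(n):
    """Fewest levels that can hold n nodes.

    A tree with h levels holds at most 2**h - 1 nodes, so we need the
    smallest h with 2**h - 1 >= n, which is floor(log2(n)) + 1 for n >= 1.
    int.bit_length() computes exactly that without floating point.
    """
    return n.bit_length()


for n in (1, 3, 4, 7, 8, 1000):
    print(n, min_levels(n), max_levels(n))
```

For example, 7 nodes fit in 3 full levels, while an 8th node forces a 4th level.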

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N).
You have to descend the links to find the leftmost child and then go back along the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each inserted item is moved to a larger table only a small, constant number of times on average (if the table doubles on each resize, N insertions cause fewer than 2N moves in total), so each insertion can be regarded as "pre-paying" for a fixed number of re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
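The hash table claim can be checked with a tiny simulation. This sketch makes simplifying assumptions that are not in the answer: the table starts with capacity 1, doubles only when it is completely full, and every resize moves each stored item exactly once. Real hash tables resize at some load factor, but the accounting is the same. `count_moves` is a hypothetical name.

```python
def count_moves(n_insertions, initial_capacity=1):
    """Count item moves caused by resizes during n insertions."""
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(n_insertions):
        if size == capacity:
            # Table is full: double it and re-insert every stored item.
            moves += size
            capacity *= 2
        size += 1
    return moves


for n in (10, 1000, 100000):
    print(n, count_moves(n), count_moves(n) / n)
```

Under these assumptions the total number of moves stays below 2 * n, so the average cost per insertion is bounded by a constant even though individual insertions occasionally trigger an expensive resize.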
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we iterate over the integers and perform an individual search for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for, which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. In other words, you go through each arc between two nodes exactly twice. That is 2*(number of arcs) = 2*(number of nodes - 1), which is O(N).
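The edge-counting argument from the answers above can be verified with a short sketch. Note the assumptions: it uses parent pointers rather than threads to move back up (the accounting is the same), it also counts the final climb back to the root, and `Node` and `traverse` are hypothetical names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def traverse(root):
    """In-order walk without a stack; returns (keys, number of edge crossings)."""
    keys, crossings = [], 0
    prev, node = None, root
    while node is not None:
        if prev is node.parent:
            # Arrived from above: go left first if possible, else visit now.
            if node.left is not None:
                nxt = node.left
            else:
                keys.append(node.key)
                nxt = node.right if node.right is not None else node.parent
        elif prev is node.left:
            # Back from the left subtree: visit, then go right or up.
            keys.append(node.key)
            nxt = node.right if node.right is not None else node.parent
        else:
            # Back from the right subtree: this subtree is done, go up.
            nxt = node.parent
        if nxt is not None:
            crossings += 1  # leaving the root upwards is not an edge
        prev, node = node, nxt
    return keys, crossings
```

For a tree with N nodes the walk crosses each of the N-1 edges once downwards and once upwards, so `crossings` comes out as 2 * (N - 1), whatever the shape of the tree.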
