Best 'order' traversal to copy a balanced binary tree into an AVL tree with minimum rotations - data-structures

I have two binary trees. One, A which I can access its nodes and pointers (left, right, parent) and B which I don't have access to any of its internals. The idea is to copy A into B by iterating over the nodes of A and doing an insert into B. B being an AVL tree, is there a traversal on A (preorder, inorder, postorder) so that there is a minimum number of rotations when inserting elements to B?
Edit:
The tree A is balanced, I just don't know the exact implementation;
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).

Rebalancing in AVL happens when the depth of one part of the tree exceeds the depth of some other part of the tree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
Iteration on tree A needs to be done using only pointers (the
programming language is C and there is no queue or stack data
structure that I can make use of).
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)

If the original tree is not necessarily AVL-balanced, then you can't just copy it.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its index within the source tree, so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since inserts take O(log N) time, though, that is how long insertion takes as well, so doing log N traversals does not increase the complexity of the overall process.

Related

How would you keep an ordinary binary tree (not BST) balanced?

I'm aware of ways to keep binary search trees balanced/self-balancing using rotations.
I am not sure if my case needs to be that complicated. I don't need to maintain any sorted order property like with self-balancing BSTs. I just have an ordinary binary tree that I may need to delete nodes or insert nodes. I need try to maintain balance in the tree. For simplicity, my binary tree is similar to a segment tree, and every time a node is deleted, all the nodes along the path from the root to this node will be affected (in my case, it's just some subtraction of the nodal values). Similarly, every time a node is inserted, all the nodes from the root to the inserted node's final location will be affected (an addition to nodal values this time).
What would be the most straightforward way to keep a tree such as this balanced? It doesn't need to be strictly as height balanced as AVL trees, but something like RB trees or maybe slightly less balanced is acceptable as well.
If a new node does not have to be inserted at a particular spot -- possibly determined by its own value and the values in the tree -- but you are completely free to choose its location, then you could maintain the shape of the tree as a complete tree:
In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.
An array is a very efficient data structure for a complete tree, as you can store the nodes in their order in a breadth-first traversal. Because the tree is given to be complete, the array has no gaps. This structure is commonly used for heaps:
Heaps are usually implemented with an array, as follows:
Each element in the array represents a node of the heap, and
The parent / child relationship is defined implicitly by the elements' indices in the array.
Example of a complete binary max-heap with node keys being integers from 1 to 100 and how it would be stored in an array.
In the array, the first index contains the root element. The next two indices of the array contain the root's children. The next four indices contain the four children of the root's two child nodes, and so on. Therefore, given a node at index i, its children are at indices 2i + 1 and 2i + 2, and its parent is at index floor((i-1)/2). This simple indexing scheme makes it efficient to move "up" or "down" the tree.
Operations
In your case, you would define the insert/delete operations as follows:
Insert: append the node to the end of the array. Then perform the mutation needed to its ancestors (as you described in your question)
Delete: replace the node to be deleted with the node that currently sits at the very end of the array, and shorten the array by 1. Make the updates needed that follow from the change at these two locations -- so two paths from root-to-node are impacted.
When balancing non-BSTs, the big question to ask is
Can your tree efficiently support rotations?
Some types of binary trees, like k-d trees, have a specific layer-by-layer structure that makes rotations infeasible. Others, like range trees, have auxiliary metadata in each node that's expensive to update after a rotation. But if you can handle rotations, then you can use just about any of the balancing strategies out there. The simplest option might be to model your tree on a treap: put a randomly-chosen weight field into each node, and then, during insertions, rotate your newly-added leaf up until its weight is less than its parent. To delete, repeatedly rotate the node with its lighter child until it's a leaf, then delete it.
If you cannot support rotations, you'll need a rebalancing strategy that does not require them. Perhaps the easiest option there is to model your tree after a scapegoat tree, which works by lazily detecting a node that's too deep for the tree to be balanced, then rebuilding the smallest imbalanced subtree possible into a perfectly-balanced tree to get everything back into order. Deletions are handled by rebuilding the whole tree once the number of nodes drops by some constant factor.

counting leaf nodes in a tree

Assume we have a tree where every node has pre-decided set of outgoing nodes. Is it possible to come up with a fast way/optimizations to count the number of leaf nodes given a level value? Would be great if someone could suggest any ideas/links/resources to do the same.
No. you'd still have to traverse the entire tree. There's no way of predicting the precise structure - or approximating it - from only the number of childnodes of each node of the tree.
Apart from that: just keep a counter and update it on each insertion. Far simpler and wouldn't change time-complexity of any operation, except for counting leaves, which would be reduced to O(1).
This can actually get pretty tough thing. As it varies of what is the programming language, what is the input data structure, is the tree binary or general tree (arbitrary number of children), size of the tree.
The most general idea is to run a DFS or BFS, starting from the root, to get every node level and then make a list of sets where each set contains the nodes of a single level. The set can be any structure, standard list is fine.
Let's say you are working in C++ which is good, if not the best practical choice if you need performance (even better than C).
Let's say we have a general tree and the input structure is adjacency list as you mentioned.
//nodes are numbered from zero to N-1
vector<vector<int>> adjList;
Then you run either a BFS or DFS, either will do for a tree, keeping a level for each node. The level for a next node is the level of it's parent plus one.
Once you discover the level, you put the node in like this.
vector<vector<int>> nodesPartitionedByLevels(nodeCount);
//run bfs here
//inside it you call
nodesPartitionedByLevels[level].push_back(node)
That's about it.
Then when you have the levels, you iterate all the nodes on that level and you check the adjaceny list if it contans any nodes.
basically you call adjList[node].empty(). If true than that is a leaf node.

Binary Tree MIN and MAX Depth

I am having trouble with these questions:
A binary tree with N nodes is at least how deep?
How deep is it at most?
Would the maximum depth just be N?
There are two extremes that you need to consider.
Every node has just a left(or right) child, but not right child. In which case your binary search tree is merely a linkedlist in practice.
Every level in your tree is full, maybe except the last level. This type of trees are called complete.
Third type of tree that I know may not be relevant to your question. But it is called full tree and every node is either a leaf or has n number of childs for an n-ary tree.
So to answer your question. Max depth is N. And at least it has log(N) levels, when it is a complete tree.

Why in-order traversal of a threaded tree is O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N)..
Because you have to descend the links to find the the leftmost child and then go back by the thread when you want to add the parent to the traversal path. would not that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's
cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which are traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual searches for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we increment over the integers and perform individual searches for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop for another integer will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
There is no loop to find the parent. Otherwise said, you are going through each arc between two node twice. That would be 2*number of arc = 2*(number of node -1) which is O(N).

How can I efficiently get to the leaves of a binary-search tree?

I want to sum all the values in the leaves of a BST. Apparently, I can't get to the leaves without traversing the whole tree. Is this true? Can I get to the leaves without taking O(N) time?
You realize that the leaves themselves will be at least 1/2 of O(n) anyway?
There is no way to get the leaves of a tree without traversing the whole tree (especially if you want every single leaf), which will unfortunately operate in O(n) time. Are you sure that a tree is the best way to store your data if you want to access all of these leaves? There are other data structures which will allow more efficient access to your data.
To access all leaf nodes of a BST, you will have to traverse all the nodes of BST and that would be of order O(n).
One alternative is to use B+ tree where you can traverse to a leaf node in O(log n) time and after that all leaf nodes can be accessed sequentially to compute the sum. So, in your case it would be O(log n + k), where k is the number of leaf nodes and n is the total number of nodes in the B+ tree.
cheers
You will either have to traverse the tree searching for nodes without children, or modify the structure you are using to represent the tree to include a list of the leaf nodes. This will also necessitate modifying your insert and delete methods to maintain the list (for instance, if you remove the last child from a node, it becomes a leaf node). Unless the tree is very large, it's probably nice enough to just go ahead and traverse the tree.

Resources