Traversing an overflowing binary tree - binary-tree

Given a very large binary tree (i.e. with millions of nodes), how to handle determining the number of nodes in the tree? In other words, given the root node of this tree to a function, the function should return the number of nodes in the tree.
Or let's say how do you check if the Binary Tree is BST if the tree has very large number of nodes?

Walk all nodes and check whatever conditions/metric you need. There is nothing else you can do without additional knowledge about the tree.
You can enforce particular conditions at the time when tree is created (i.e. must be balanced/sorted/whatever) or collect information about tree at creation time (i.e. store and constantly update number of children).

To check if it's a VALID bst you have to visit every node depth first and ensure each node is smaller than the previous.
If you want to evaluate how long that will take for a balanced BST you could get a quick approximation of the size by counting the length of one leg, I believe the total size will be between 2^(n-1) and 2^n-1 inclusive

Related

Can I count the number of nodes under given node in AVL tree and balanced binary trees in O(logn) time?

My purpose is to first locate(search) a certain node in the tree(AVL or balanced binary tree), and then count the number of nodes which are under it. The whole operation works in O(logn) times. Is it achievable?
Yes it is, for that you need an implementation of AVL that keeps for each node the number of nodes that it has in its sub-tree.
Now, you only need to locate the desired node, and then look in its 'size' field to know how many nodes are in its sub-tree.

How would you keep an ordinary binary tree (not BST) balanced?

I'm aware of ways to keep binary search trees balanced/self-balancing using rotations.
I am not sure if my case needs to be that complicated. I don't need to maintain any sorted order property like with self-balancing BSTs. I just have an ordinary binary tree that I may need to delete nodes or insert nodes. I need try to maintain balance in the tree. For simplicity, my binary tree is similar to a segment tree, and every time a node is deleted, all the nodes along the path from the root to this node will be affected (in my case, it's just some subtraction of the nodal values). Similarly, every time a node is inserted, all the nodes from the root to the inserted node's final location will be affected (an addition to nodal values this time).
What would be the most straightforward way to keep a tree such as this balanced? It doesn't need to be strictly as height balanced as AVL trees, but something like RB trees or maybe slightly less balanced is acceptable as well.
If a new node does not have to be inserted at a particular spot -- possibly determined by its own value and the values in the tree -- but you are completely free to choose its location, then you could maintain the shape of the tree as a complete tree:
In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.
An array is a very efficient data structure for a complete tree, as you can store the nodes in their order in a breadth-first traversal. Because the tree is given to be complete, the array has no gaps. This structure is commonly used for heaps:
Heaps are usually implemented with an array, as follows:
Each element in the array represents a node of the heap, and
The parent / child relationship is defined implicitly by the elements' indices in the array.
Example of a complete binary max-heap with node keys being integers from 1 to 100 and how it would be stored in an array.
In the array, the first index contains the root element. The next two indices of the array contain the root's children. The next four indices contain the four children of the root's two child nodes, and so on. Therefore, given a node at index i, its children are at indices 2i + 1 and 2i + 2, and its parent is at index floor((i-1)/2). This simple indexing scheme makes it efficient to move "up" or "down" the tree.
Operations
In your case, you would define the insert/delete operations as follows:
Insert: append the node to the end of the array. Then perform the mutation needed to its ancestors (as you described in your question)
Delete: replace the node to be deleted with the node that currently sits at the very end of the array, and shorten the array by 1. Make the updates needed that follow from the change at these two locations -- so two paths from root-to-node are impacted.
When balancing non-BSTs, the big question to ask is
Can your tree efficiently support rotations?
Some types of binary trees, like k-d trees, have a specific layer-by-layer structure that makes rotations infeasible. Others, like range trees, have auxiliary metadata in each node that's expensive to update after a rotation. But if you can handle rotations, then you can use just about any of the balancing strategies out there. The simplest option might be to model your tree on a treap: put a randomly-chosen weight field into each node, and then, during insertions, rotate your newly-added leaf up until its weight is less than its parent. To delete, repeatedly rotate the node with its lighter child until it's a leaf, then delete it.
If you cannot support rotations, you'll need a rebalancing strategy that does not require them. Perhaps the easiest option there is to model your tree after a scapegoat tree, which works by lazily detecting a node that's too deep for the tree to be balanced, then rebuilding the smallest imbalanced subtree possible into a perfectly-balanced tree to get everything back into order. Deletions are handled by rebuilding the whole tree once the number of nodes drops by some constant factor.

Best 'order' traversal to copy a balanced binary tree into an AVL tree with minimum rotations

I have two binary trees. One, A which I can access its nodes and pointers (left, right, parent) and B which I don't have access to any of its internals. The idea is to copy A into B by iterating over the nodes of A and doing an insert into B. B being an AVL tree, is there a traversal on A (preorder, inorder, postorder) so that there is a minimum number of rotations when inserting elements to B?
Edit:
The tree A is balanced, I just don't know the exact implementation;
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
Rebalancing in AVL happens when the depth of one part of the tree exceeds the depth of some other part of the tree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
Iteration on tree A needs to be done using only pointers (the
programming language is C and there is no queue or stack data
structure that I can make use of).
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)
If the original tree is not necessarily AVL-balanced, then you can't just copy it.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its index within the source tree, so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since inserts take O(log N) time, though, that is how long insertion takes as well, so doing log N traversals does not increase the complexity of the overall process.

Count nodes bigger then root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add additional field v.bigger which should contain number of nodes with key bigger than v.key, that are in a subtree rooted at v. Adding such a field to all nodes of a tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics - for example when thinking about doing this problem in bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (balanced BST?) with operation bigger(x), which for a given x returns a number of elements bigger than x in that set in logarihmic time. The problem is, we would need to merge such sets in O(log n), so this seems as a no-go, as I don't know any ordered set like data structure which supports quick merging.
I also thought about top-down approach - a node v adds one to some u.bigger for some node u if and only if u lies on a simple path to the root and u<v. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform depth-first search in given tree (starting from root node).
When any node is visited for the first time (coming from parent node), add its key to some order-statistics data structure (OSDS). At the same time query OSDS for number of keys larger than current key and initialize v.bigger with negated result of this query.
When any node is visited for the last time (coming from right child), query OSDS for number of keys larger than current key and add the result to v.bigger.
You could apply this algorithm to any rooted trees (not necessarily binary trees). And it does not necessarily need parent pointers (you could use DFS stack instead).
For OSDS you could use either augmented BST or Fenwick tree. In case of Fenwick tree you need to preprocess given tree so that values of the keys are compressed: just copy all the keys to an array, sort it, remove duplicates, then substitute keys by their indexes in this array.
Basic idea:
Using the bottom-up approach, each node will get two ordered lists of the values in the subtree from both sons and then find how many of them are bigger. When finished, pass the combined ordered list upwards.
Details:
Leaves:
Leaves obviously have v.bigger=0. The node above them creates a two item list of the values, updates itself and adds its own value to the list.
All other nodes:
Get both lists from sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in subtree). During the merge you can also find how many nodes qualify the condition and get the value of v.bigger for the node.
Why is this O(n logn)?
Every node in the tree counts through the number of nodes in its subtree. This means the root counts all the nodes in the tree, the sons of the root each count (combined) the number of nodes in the tree (yes, yes, -1 for the root) and so on all nodes in the same height count together the number of nodes that are lower. This gives us that the number of nodes counted is number of nodes * height of the tree - which is O(n logn)
What if for each node we keep a separate binary search tree (BST) which consists of nodes of the subtree rooted at that node.
For a node v at level k, merging the two subtrees v.left and v.right which both have O(n/2^(k+1)) elements is O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the right (traditionally) subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are a total of 2^k many level k nodes, therefore we have O(2^k*3*n/2^(k+1)) which is simplified as O(n) (dropping the 3/2 constant). operations at level k. There are log(n) levels, hence we have O(n*log(n)) operations in total.

How can I efficiently get to the leaves of a binary-search tree?

I want to sum all the values in the leaves of a BST. Apparently, I can't get to the leaves without traversing the whole tree. Is this true? Can I get to the leaves without taking O(N) time?
You realize that the leaves themselves will be at least 1/2 of O(n) anyway?
There is no way to get the leaves of a tree without traversing the whole tree (especially if you want every single leaf), which will unfortunately operate in O(n) time. Are you sure that a tree is the best way to store your data if you want to access all of these leaves? There are other data structures which will allow more efficient access to your data.
To access all leaf nodes of a BST, you will have to traverse all the nodes of BST and that would be of order O(n).
One alternative is to use B+ tree where you can traverse to a leaf node in O(log n) time and after that all leaf nodes can be accessed sequentially to compute the sum. So, in your case it would be O(log n + k), where k is the number of leaf nodes and n is the total number of nodes in the B+ tree.
cheers
You will either have to traverse the tree searching for nodes without children, or modify the structure you are using to represent the tree to include a list of the leaf nodes. This will also necessitate modifying your insert and delete methods to maintain the list (for instance, if you remove the last child from a node, it becomes a leaf node). Unless the tree is very large, it's probably nice enough to just go ahead and traverse the tree.

Resources