Counting the nodes of a left-skewed tree faster - data-structures

I have to traverse a tree and count the total number of nodes. The simple way is to traverse the tree and do count++ for each node, but this is very time-consuming for a tree with millions of nodes: it takes O(N) time, where N is the number of nodes. I want to reduce the time complexity of this approach. How can I do that? For reference, I am sharing the pseudo-code of my idea.
struct Node {
    Node* left;
    Node* right;
};
int traverse(Node* node) {
    if (node == null) return 0;       // base case
    count++;
    count += traverse(node->left);    // recursive call
    count += traverse(node->right);   // recursive call
    return count;
}
Also, please let me know whether the above approach will work or not. If not, why? And please suggest a faster approach.

There is no way around visiting the nodes in order to count them. Also note that you have an undeclared variable count in your recursive function. Instead, just start from scratch in each call, as you only return the count of nodes below that node (plus 1 to account for the node itself), irrespective of the rest of the tree:
int traverse(Node* node) {
    if (node == null) return 0; // base case
    return 1 + traverse(node->left) + traverse(node->right);
}
But if you are going to count nodes repeatedly, at different states of the tree, then you may benefit from storing the count of nodes in each subtree and keeping that information updated with every mutation of the tree. So you would store, in each node instance, the number of nodes in the subtree of which it is the root (including itself).
struct Node {
    Node* left;
    Node* right;
    int count;
};
With every insert operation, you increment the count member of all the ancestors of the new node, and give the node itself a count of 1. In a balanced tree this is O(log n) extra work, which does not increase the time complexity you probably already have for the insert operation.
With every delete operation, you decrement the count member of all the ancestors of the deleted node. Again this is O(log n), which does not increase the time complexity you probably already have for the delete operation.
When you need the count of the whole tree, just read the root node's count value. That is an O(1) operation.
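The augmented insert described above can be sketched as follows (a minimal sketch assuming a plain, unbalanced BST; the key field and function names are mine, not from the answer):

```cpp
struct Node {
    Node* left;
    Node* right;
    int key;    // hypothetical payload; the answer's struct has no key field
    int count;  // number of nodes in the subtree rooted here, including this node
};

// Plain BST insert that keeps every count on the search path up to date.
Node* insert(Node* root, int key) {
    if (!root) return new Node{nullptr, nullptr, key, 1};
    ++root->count;  // the new node will end up somewhere below this one
    if (key < root->key) root->left = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    return root;
}

// Counting the whole tree (or any subtree) is now an O(1) lookup.
int size(const Node* root) { return root ? root->count : 0; }
```

Deletion would decrement count along the same path; with a self-balancing tree, the rotations also have to move the counts around, but the per-operation cost stays O(log n).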

Related

Time complexity of finding k successors in BST

Given a binary search tree (BST) of height h, it would take O(k+h) time to apply the BST InOrder Successor algorithm k times in succession, starting in any node, applying each next call on the node that was returned by the previous call.
Pseudo code:
get_kth_successor(node):
    for times = 1 to k:
        node = successor(node)
    return node
How can I prove this time complexity?
In particular, I am trying to establish a relation between k and the number of nodes visited, but can't find any pattern here.
Take the following truths concerning a successor traversal:
1. You can traverse a branch at most two times: once downward and once upward.
2. Every double visit of a branch corresponds to finding at least one more successor: when you backtrack via a branch upwards, you will have visited at least one successor more than at the time you passed that same branch the first time, in the downward direction.
3. The number of branches you will traverse only once cannot be more than 2h. This worst case happens when you start at a leaf in the bottom-left side of the tree and must go all the way up to the root (a successor) and then down again to a bottom leaf to find the successor of the root. But if you need more successors after that, you will have to visit some of these branches again (in backtracking) before you can traverse other branches for the first time. So the total number of branches you traverse only once cannot increase above 2h.
So, to find k successors you will at most traverse k branches twice (downward and upward, cf. point 2) and 2h branches once (point 3), which comes down to a worst-case branch-traversal count of 2k + 2h, which is O(h+k).
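For reference, the single successor step that this proof reasons about can be sketched with parent pointers (a standard textbook routine, not code from the question):

```cpp
#include <cstddef>

struct Node {
    Node *left, *right, *parent;
};

// One successor step: either descend into the right subtree (and then all the
// way left), or ascend until we arrive at a parent from its left child.
Node* successor(Node* n) {
    if (n->right) {
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    Node* p = n->parent;
    while (p && n == p->right) {
        n = p;               // every such upward step revisits a branch (point 2)
        p = p->parent;
    }
    return p;                // NULL if n was the maximum of the tree
}
```

Each branch is either walked downward once, or walked down and later back up; summing over k calls gives exactly the 2k + 2h bound above.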
I am going to write the full implementation for the problem, to make it easier to justify my arguments about the running time.
Assuming that each node of BST has the following structure:
typedef struct node {
    int value;
    struct node* left;
    struct node* right;
} node;
The algorithm will have 2 steps:
a. Start from the root node of the tree and find the starting node, collecting all the ancestors of that node along the way. Store all of them in the stack passed in:
//root -> root node of the tree.
//val  -> value of the node to find.
//s    -> stack to store all ancestors.
node* find_node(node* root, int val, std::stack<node*>& s)
{
    if (root != NULL) s.push(root);
    if (root == NULL || root->value == val) return root;
    if (root->value > val) return find_node(root->left, val, s);
    else return find_node(root->right, val, s);
}
and the call to this method would look like:
//Assuming that root is the root node of the tree and val is the starting value.
std::stack<node*> s;
node* ptr = find_node(root, val, s); // we now have all ancestors of ptr, along with ptr, in stack s.
b. Now we have to print the next k elements of the tree bigger than ptr, starting from that node:
// s -> stack of all ancestors of the node.
// k -> number of successors to print.
void print_k_bigger(std::stack<node*>& s, int k)
{
    // The top element of the stack is the starting node itself, so we won't
    // print it; just pop it and do an in-order traversal of its right child.
    node* prev = s.top();
    s.pop();
    inorder(prev->right, k);
    // Now all the nodes left in the stack are ancestors of the starting node.
    while (!s.empty() && k > 0)
    {
        // Pop the node at the top of the stack.
        node* ptr = s.top();
        s.pop();
        // If the node popped previously (prev) was the right child of the
        // current node, prev was bigger than the current node, so we have to
        // go one more level up to search for successors.
        if (prev != ptr->right)
        {
            // Otherwise the current node is the immediate successor of prev: print it.
            printf("%d ", ptr->value);
            // Reduce the count.
            k--;
            // In-order traversal of the right subtree of the current node.
            inorder(ptr->right, k);
        }
        // Note this: prev must be updated in both cases.
        prev = ptr;
    }
}
Here is what our inorder looks like (note that it stops as soon as k reaches 0):
void inorder(node* ptr, int& k)
{
    if (ptr != NULL && k > 0)
    {
        inorder(ptr->left, k);
        if (k > 0)
        {
            printf("%d ", ptr->value);
            k--;
        }
        inorder(ptr->right, k);
    }
}
Time Analysis:
The method find_node is O(h), as it descends at most the longest root-to-leaf path.
The method print_k_bigger is O(h+k), as in every iteration of the loop either the size of the stack or the value of k is reduced. Note that the calls to inorder() from inside the while loop add no extra overhead: all the calls to inorder() together take at most O(k).
Here is a very simple algorithm to do this:
Step 1: Create an empty stack S and a variable curr = NULL. Find the starting node in the tree and push it, together with all of its ancestors, onto stack S.
Step 2: Pop a node top from stack S and check whether curr is its right child. If it is not, do an in-order traversal of top's right subtree. If we find k or more nodes while traversing, we are done. Otherwise, set curr = top and repeat step 2 until we find k nodes.
The overall time complexity is O(h+k): step 1 takes O(h) time, and all iterations of step 2 combined take O(h+k) time.

Finding the closest intervals in an interval tree that do not contain the query

I have implemented an Interval Tree in Java as outlined in the CLRS algorithm book (it uses red-black tree as the underlying structure). In the book (and as far as I've seen online), it discusses how to find the node whose interval contains the number being queried.
In my case, if the number being queried does not fall into any interval, I want to know what the 'closest' nodes are, i.e., those whose intervals lie immediately before and immediately after the query. I have accomplished this using the following functions. Each node contains the interval (int low, int high), as well as the min and max values of themselves and their subtrees.
public Node[] findPrevNext(int query) {
    if (tree.root.isNull())
        return null;
    else {
        Node prev = findPrev(query, tree.root, new Node());
        Node next = findNext(query, tree.root, new Node());
        Node[] result = {prev, next};
        return result;
    }
}

private Node findPrev(int query, Node x, Node prev) {
    if (x.interval.high < query && x.interval.high > prev.interval.high)
        prev = x;
    if (!x.left.isNull())
        prev = findPrev(query, x.left, prev);
    if (!x.right.isNull())
        prev = findPrev(query, x.right, prev);
    return prev;
}

private Node findNext(int query, Node x, Node next) {
    if (x.interval.low > query && x.interval.low < next.interval.low)
        next = x;
    if (!x.left.isNull())
        next = findNext(query, x.left, next);
    if (!x.right.isNull())
        next = findNext(query, x.right, next);
    return next;
}
The problem, of course, is that the functions findPrev() and findNext() both traverse the whole tree and don't take advantage of the tree's structure. Is there a way to perform this query in O(lg n) time?
I've also considered creating a second Interval Tree containing all the interval gaps and simply querying that instead. Its nodes would then carry information about which elements lie before and after each gap (I have attempted this, but so far without success).
Edit: Just to note, the function findPrevNext() is called after attempting to find the query fails. So, the query is known not to fall in any given interval beforehand.
Since the intervals are ordered by low endpoint, finding the nearest interval above the query is straightforward: starting from the hole where an interval [query, query] would be, ascend the tree until you reach a parent from its left child; that parent is the desired node.
Finding the nearest interval below the query would seem to require inspecting the max fields. Given a subtree containing only intervals below the query (i.e., those whose low endpoint is lower than the query), we can extract one candidate for the nearest interval below by descending along the path whose nodes have the highest value of max. There are O(log n) maximal such subtrees, one for every time the search algorithm considered going left but didn't. We also need to check the O(log n) nodes on the original search path. Naïvely, this idea leads to an O(log² n)-time algorithm, but if we maintain at each node a pointer to one interval whose high endpoint equals the max for that node, then we get an O(log n)-time algorithm.
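The "ascend until you arrive from a left child" case is equivalent to the standard iterative BST search for the smallest low endpoint above the query, which can be sketched as follows (a sketch with hypothetical names; it assumes the CLRS layout where nodes are keyed on low, and that the query is already known to fall in a gap):

```cpp
struct IntervalNode {
    int low, high;               // the stored interval [low, high]
    IntervalNode *left, *right;
};

// Nearest interval entirely above the query: the node with the smallest
// low endpoint strictly greater than the query. O(log n) in a balanced tree.
IntervalNode* nextAbove(IntervalNode* root, int query) {
    IntervalNode* best = nullptr;
    while (root) {
        if (root->low > query) {
            best = root;         // candidate; try to find a smaller low
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return best;                 // nullptr if no interval starts above the query
}
```

The nearest-below direction is the harder one, for the max-field reasons described above; this sketch only covers the easy direction.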
The Java TreeMap, which implements a red-black tree, implements the methods headMap and tailMap that return the portions of the map less than the query point (for headMap) or greater than the query point (for tailMap). If you implement something similar to these methods for your tree then this should allow you to do a linear traversal from the query point to find the N closest intervals, rather than having to traverse the entire tree.

In a BST two nodes are randomly swapped we need to find those two nodes and swap them back

Given a binary search tree in which two nodes have been swapped, we need to find those two nodes and swap them back (we need to swap the nodes, not the data).
I tried to do this by building an additional array that holds the inorder traversal of the tree together with a pointer to each node.
Then we just traverse the array, find the two nodes that are not in sorted order, search for those two nodes in the tree, and swap them.
So my question is: how do we solve this problem in O(1) space?
An in-order traversal of a BST visits the elements in sorted order.
You can use an in-order traversal, find the two out-of-place elements (storing them in two pointers), and swap them back.
This method effectively creates the array you described on the fly, without actually materializing it.
Note, however, that space consumption is not O(1); it is O(h), where h is the height of the tree, due to the recursion stack. If the tree is balanced, that is O(log n).
Depending on the BST, this can be done in O(1) space.
For instance, if the nodes of the tree have a pointer back to their parents, you can use the implementation described in the non-recursive in-order traversal section of the Wikipedia page for Tree_traversal.
As another possibility, if the BST is stored as a 1-dimensional array instead of with pointers between nodes, you can simply walk over the array (doing the arithmetic to determine each node's parent and children).
While doing the traversal/walk, check whether the current element is in the correct place.
If it isn't, you can see where in the tree it should be and swap with the element in that location (take some care in case the root is in the wrong place).
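Another genuinely O(1)-space option, if temporarily rewiring the tree is acceptable, is a Morris (threaded) in-order traversal; this is a different technique from the parent-pointer walk described above, shown here as a sketch that locates the two out-of-place nodes:

```cpp
#include <cstddef>

struct node {
    int data;
    node *left;
    node *right;
};

// Locate the two swapped nodes with a Morris in-order traversal: O(1) extra
// space, at the cost of temporarily threading right pointers to ancestors.
void find_swapped(node* root, node** first, node** second) {
    *first = *second = NULL;
    node *prev = NULL, *cur = root;
    while (cur) {
        node* visit = NULL;
        if (!cur->left) {
            visit = cur;                            // no left subtree: visit now
            cur = cur->right;
        } else {
            node* pred = cur->left;                 // in-order predecessor
            while (pred->right && pred->right != cur) pred = pred->right;
            if (!pred->right) {
                pred->right = cur;                  // create the thread
                cur = cur->left;
            } else {
                pred->right = NULL;                 // remove the thread
                visit = cur;                        // left subtree done: visit
                cur = cur->right;
            }
        }
        if (visit) {                                // in-order visit step
            if (prev && prev->data > visit->data) {
                if (!*first) *first = prev;         // first inversion: remember prev
                *second = visit;                    // last inversion: remember cur
            }
            prev = visit;
        }
    }
}
```

The caller can then swap the two nodes (or their payloads) back. The traversal restores every threaded pointer, so the tree shape is unchanged afterwards.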
We can modify the isBST() method as below to detect the two swapped nodes and swap their values back.
int isBSTUtil(struct node* node, int min, int max, struct node** x);

/* Returns true if the given tree is a binary search tree
   (efficient version). */
int isBST(struct node* node)
{
    struct node* x = NULL;
    return isBSTUtil(node, INT_MIN, INT_MAX, &x);
}

/* Returns true if the given tree is a BST and its
   values are >= min and <= max. */
int isBSTUtil(struct node* node, int min, int max, struct node** x)
{
    /* an empty tree is a BST */
    if (node == NULL)
        return 1;
    /* this node violates the min/max constraint */
    if (node->data < min || node->data > max) {
        if (*x == NULL) {
            *x = node; /* save the first node where the BST property is violated */
        }
        else {
            /* here we found the second node, so swap its data with *x's */
            int tmp = node->data;
            node->data = (*x)->data;
            (*x)->data = tmp;
        }
        /* return 1 rather than 0: we already know the tree is not a BST;
           we are correcting it, not checking it */
        return 1;
    }
    /* otherwise check the subtrees recursively,
       tightening the min or max constraint */
    return
        isBSTUtil(node->left, min, node->data - 1, x) &&  /* allow only distinct values */
        isBSTUtil(node->right, node->data + 1, max, x);   /* allow only distinct values */
}

balanced binary tree visit with a twist

Looking for an in-place O(N) algorithm which prints the node pair(while traversing the tree) like below for a given balanced binary tree:
a
b c
d e f g
Output: bc, de, ef, fg
Where 'a' is the root node, 'b' the left child, 'c' the right, etc.
Please note the pair 'ef' in the output.
Additional info based on comments below:
'node pair' is each adjacent pair at each level
each node can have 'parent' pointer/reference in addition to 'left' and 'right'
If we were to relax O(N) and in-place, are there more simple solutions (both recursive and iterative)?
If this tree is stored in an array, it can be rearranged to be "level continuous" (the first element is the root, the next two its children, the next four their children, etc), and the problem is trivial.
If it is stored in another way, it becomes problematic. You could try a breadth-first traversal, but that can consume O(n) memory.
Well I guess you can create an O(n log n) time algorithm by storing the current level and the path to the current element (represented as a binary number), and only storing the last visited element to be able to create pairs. Only O(1 + log n) memory. -> This might actually be an O(n) algorithm with backtracking (see below).
I know there is an easy algorithm that visits all nodes in order in O(n), so there might be a trick to visit "sister" nodes in order in O(n) time. O(n log n) time is guaranteed though, as you could just stop at a given level.
There is a trivial O(n log n) algorithm as well, you just have to filter the nodes for a given level, increasing the level for the next loop.
Edit:
Okay, I created a solution that runs in O(n) with O(1) memory (two machine-word-sized variables to keep track of the current and maximum level /which is technically O(log log n) memory, but let's gloss over that/ and a few Nodes), but it requires that all Nodes contain a pointer to their parent. With this special structure it is possible to do an in-order traversal without an O(log n) stack, using only two Nodes for stepping either left, up or right. You stop at a particular maximum level and never go below it. You keep track of the actual and the maximum level, and the last Node you encountered on the maximum level. Obviously you can print out such pairs if you encounter the next Node at the max level. You increase the maximum level each time you realize there are no more nodes on that level.
Starting from the root Node of a perfect binary tree with n - 1 nodes, this amounts to 1 + 3 + 7 + 15 + ... + (n - 1) operations. This is (2 + 4 + 8 + 16 + ... + n) - log2 n = 2n - 2 - log2 n = O(n) operations.
Without the special Node* parent members, this algorithm is only possible with O(log n) memory due to the stack needed for the in-order traversal.
Assuming you have the following structure as your tree:
struct Node
{
    Node *left;
    Node *right;
    int value;
};
You can print out all pairs in three passes modifying the tree in place. The idea is to link nodes at the same depth together with their right pointer. You traverse down by following left pointers. We also maintain a count of expected nodes for each depth since we don't null terminate the list for each depth. Then, we unzip to restore the tree to its original configuration.
The beauty of this is the zip_down function; it "zips" together two subtrees such that the right branch of the left subtree points to the left branch of the right subtree. If you do this for every subtree, you can iterate over each depth by following the right pointer.
#include <cassert>
#include <cstdio>
void zip_down(Node *left, Node *right)
{
    if (left && right)
    {
        zip_down(left->right, right->left);
        left->right = right;
    }
}

void zip(Node *left, Node *right)
{
    if (left && right)
    {
        zip(left->left, left->right);
        zip_down(left, right);
        zip(right->left, right->right);
    }
}

void print_pairs(const Node * const node, int depth)
{
    int count = 1 << depth;
    for (const Node *node_iter = node; count > 1; node_iter = node_iter->right, --count)
    {
        printf("(%c, %c) ", node_iter->value, node_iter->right->value);
    }
    if (node->left)
    {
        print_pairs(node->left, depth + 1);
    }
}

void generate_tree(int depth, Node *node, char *next)
{
    if (depth > 0)
    {
        (node->left = new Node)->value = (*next)++;
        (node->right = new Node)->value = (*next)++;
        generate_tree(depth - 1, node->left, next);
        generate_tree(depth - 1, node->right, next);
    }
    else
    {
        node->left = NULL;
        node->right = NULL;
    }
}

void print_tree(const Node * const node)
{
    if (node)
    {
        printf("%c", node->value);
        print_tree(node->left);
        print_tree(node->right);
    }
}

void unzip(Node * const node)
{
    if (node->left && node->right)
    {
        node->right = node->left->right;
        assert(node->right->value - node->left->value == 1);
        unzip(node->left);
        unzip(node->right);
    }
    else
    {
        assert(node->left == NULL);
        node->right = NULL;
    }
}

int main()
{
    char value_generator = 'a';
    Node root;
    root.value = value_generator++;
    generate_tree(2, &root, &value_generator);
    print_tree(&root);
    printf("\n");
    zip(root.left, root.right);
    print_pairs(&root, 0);
    printf("\n");
    unzip(&root);
    print_tree(&root);
    printf("\n");
    return 0;
}
EDIT4: in-place, O(n) time, O(log n) stack space.

Maintaing list order in binary tree

Given a sequence of numbers, I want to insert the numbers into a balanced binary tree such that when I do a inorder traversal on the tree, it gives me the sequence back.
How can I construct the insert method corresponding to this requirement?
Remember that the tree must be balanced, so there isn't a completely trivial solution. I was trying to do this with a modified version of an AVL tree, but I'm not sure if this can work out.
I also wish to be able to implement a delete operation. Delete should delete the item at the ith position in the list.
Edit: clarification:
I want to have:
Insert(i, e), which inserts a single element e right before the ith element in the sequence.
Delete(i), which deletes the ith element of the sequence.
If I do insert(0, 5), insert(0, 4), insert(0, 7), then my stored sequence is now 7, 4, 5 and inorder traversal on the binary tree should give me 7, 4, 5.
If I do delete(1), then inorder traversal on the binary tree should give me 7, 5. As you can see, the sequence order is maintained after deleting the ith sequence element with i = 1 in this case (as I've indexed my sequence from 0).
Keep a linked list in the tree. You already have pointers/references to the nodes thanks to the tree. When you insert into the tree, also insert at the end of the linked list. When you remove from the tree, remove from the linked list. Because you have the references, they're all O(1) operations, and you can iterate through in insertion order if you so choose.
Edit:
to clarify Bart's concern, here's a Java implementation
class LinkedTreeNode<E> {
    E data;
    LinkedTreeNode<E> parent;
    LinkedTreeNode<E> left;
    LinkedTreeNode<E> right;
    LinkedTreeNode<E> insertNext;
    LinkedTreeNode<E> insertPrev;

    LinkedTreeNode(E data) { this.data = data; }
}

class LinkedTree<E> {
    LinkedTreeNode<E> root;
    LinkedTreeNode<E> insertHead;
    LinkedTreeNode<E> insertTail;

    // skip some stuff

    void add(E e) {
        LinkedTreeNode<E> node = new LinkedTreeNode<E>(e);
        // perform the tree insert using whatever method you like
        // update the list (insertTail is a sentinel at the end of the list)
        node.insertNext = insertTail;
        node.insertPrev = insertTail.insertPrev;
        node.insertPrev.insertNext = node;
        insertTail.insertPrev = node;
    }

    E remove(E e) {
        LinkedTreeNode<E> rem = null; // whatever method locates and removes the node
        rem.insertNext.insertPrev = rem.insertPrev;
        rem.insertPrev.insertNext = rem.insertNext;
        return rem.data;
    }
}
If you have random access to the contents of the sequence (e.g. you have an array), then you can simply build the tree directly, so that pre-order traversal provides the sequence you want. It can actually be done without random access, but less efficiently, since linear scans may be required to find the midpoint of the sequence.
The key observation here is that the first value in the sequence goes in the root of the tree, then the remainder of the sequence can be divided into halves. The first half will go into the subtree rooted at the left child and the second half will go into the subtree rooted at the right child. Because you are dividing the remaining list in half at each step, the resulting tree will have log_2 n levels and will be balanced.
Here's some C++ code. Note that "end" refers to the index of the element just past the end of the current array segment.
struct Node {
    Node(int v) : value(v), l(0), r(0) {}
    int value;
    Node* l;
    Node* r;
};

Node* vector_to_tree(int begin, int end, int array[]) {
    if (begin == end)
        return NULL;
    Node* root = new Node(array[begin++]);
    int mid = (begin + end) / 2;
    root->l = vector_to_tree(begin, mid, array);
    root->r = vector_to_tree(mid, end, array);
    return root;
}
void preorder_traverse(Node* root) {
    if (!root)
        return;
    std::cout << root->value << std::endl;
    preorder_traverse(root->l);
    preorder_traverse(root->r);
}
This can be done with an AVL tree.
Add items to the AVL tree by adding to the rightmost node of the tree.
AVL balancing does not affect the inorder traversal property due to the nature of rotations.
Storing the size of the subtree rooted at each node, and maintaining these values for each node affected during inserts/deletes/balances, can be done in O(log n) time per operation. Storing subtree sizes permits searching for an item by its rank in the sequence, since the rank of a subtree's root is its left child's size + 1.
Deletion from the AVL tree can be accomplished by replacing a removed node with the leftmost node of the node's right subtree.
Insert and delete operations are then done in O(log n) time, since the tree is always balanced, and size updates on the nodes are done along a path from inserted/deleted node to root.
Since there is no backing data structure other than a binary tree, other operations that benefit from a tree can be done, such as finding the sum of an [m, n] range of elements in the sequence in O(log n) time.
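The rank search described above (rank of a subtree's root = left child's size + 1) might look like this (a sketch with made-up names; it assumes the tree's rebalancing code keeps each size field correct, and uses 0-based indices to match the question's insert(i, e)/delete(i) interface):

```cpp
#include <cstddef>

struct SeqNode {
    SeqNode *left, *right;
    int size;      // nodes in this subtree; maintained by insert/delete/rotations
    char value;
};

int subtree_size(const SeqNode* n) { return n ? n->size : 0; }

// Return the i-th element of the stored sequence (0-based), steering with the
// left-subtree sizes. O(log n) when the tree is balanced.
SeqNode* nth(SeqNode* root, int i) {
    while (root) {
        int left_size = subtree_size(root->left);
        if (i < left_size) {
            root = root->left;
        } else if (i == left_size) {
            return root;            // this node is the i-th in in-order
        } else {
            i -= left_size + 1;     // skip the left subtree and the root
            root = root->right;
        }
    }
    return NULL;                    // i out of range
}
```

Insert(i, e) and Delete(i) use the same descent to locate position i, then update the size fields along the path back to the root.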
