Delete duplicates from binary tree - algorithm

I'm trying to come up with an algorithm for deleting duplicates from a binary tree / binary search tree. So far the best I could come up with was:
1. Store the inorder traversal of the tree in an array.
2. If the tree has no ordering, sort the array.
3. Remove duplicates from the array and reconstruct the binary tree.
Do we need to store the preorder traversal of the tree as well to reconstruct it?
This puts the complexity at O(n log n) time and O(n) space. Can we do better? Pseudocode / code samples would be appreciated.
EDIT 1: Assume the structure of the binary tree is given by the following object
public class Node
{
    int data;
    Node right;
    Node left;
    // getters and setters for the left and right nodes
}
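For reference, a rough Java sketch of the array-based approach above; if any resulting tree shape is acceptable, no preorder traversal is needed, because a balanced BST can be rebuilt from the sorted, de-duplicated values alone (class and method names here are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class DedupRebuild {
    // Collect the values, de-duplicate, and rebuild a balanced BST.
    // A TreeSet both sorts and removes duplicates in one pass.
    static Node dedup(Node root) {
        TreeSet<Integer> values = new TreeSet<>();
        collect(root, values);
        List<Integer> sorted = new ArrayList<>(values);
        return buildBalanced(sorted, 0, sorted.size() - 1);
    }

    static void collect(Node n, TreeSet<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.data);
        collect(n.right, out);
    }

    // The middle element becomes the root; recurse on each half.
    static Node buildBalanced(List<Integer> a, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node n = new Node();
        n.data = a.get(mid);
        n.left = buildBalanced(a, lo, mid - 1);
        n.right = buildBalanced(a, mid + 1, hi);
        return n;
    }
}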

Remove-duplicates algorithm for a binary search tree:
1. Start a tree walk (in-, pre-, or post-order).
2. At each node, do a binary search on the subtree rooted at that node for the key value stored in the node.
3. If the key value is found further down the subtree, call delete(key) and restart step 2 (there might be multiple duplicates).
4. Repeat step 2 until the key is no longer found in the subtree, then go back to step 1.
Time complexity: O(n log n) (for each of the n elements, do a binary search, which takes O(log n) time).
Space complexity: O(log n) (for the stack space used in the recursion).
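A rough Java sketch of that walk-and-delete idea, using the Node class from the question (this assumes duplicates sit on the normal search path of the subtree, and that delete is the standard BST deletion):

class DedupInPlace {
    // For each node, repeatedly delete its value from the subtrees below it,
    // then recurse into the (possibly changed) children.
    static void removeDuplicates(Node p) {
        if (p == null) return;
        while (contains(p.left, p.data))  p.left  = delete(p.left, p.data);
        while (contains(p.right, p.data)) p.right = delete(p.right, p.data);
        removeDuplicates(p.left);
        removeDuplicates(p.right);
    }

    static boolean contains(Node n, int key) {
        if (n == null) return false;
        if (key < n.data) return contains(n.left, key);
        if (key > n.data) return contains(n.right, key);
        return true;
    }

    // Standard BST delete of one occurrence of key; a two-child node is
    // replaced by its in-order successor.
    static Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.data)      n.left  = delete(n.left, key);
        else if (key > n.data) n.right = delete(n.right, key);
        else {
            if (n.left == null)  return n.right;
            if (n.right == null) return n.left;
            Node succ = n.right;
            while (succ.left != null) succ = succ.left;
            n.data = succ.data;
            n.right = delete(n.right, succ.data);
        }
        return n;
    }
}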

I have a weird idea:
If you already have a tree with duplicates and need to build a new one without duplicates, then you can just:
1. Create an empty hash set of node values.
2. Create a new tree.
3. Traverse the existing tree in any order and get the elements one by one.
4. If an element is already present in the hash set, do not add it to the new tree.
Is that OK?
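In Java, that might look like this (a sketch against the Node class from the question; a HashSet tracks the values already copied):

import java.util.HashSet;
import java.util.Set;

class RebuildWithoutDuplicates {
    static Node rebuild(Node root) {
        return copy(root, null, new HashSet<>());
    }

    // Walk the old tree; insert each not-yet-seen value into the new tree.
    static Node copy(Node old, Node newRoot, Set<Integer> seen) {
        if (old == null) return newRoot;
        if (seen.add(old.data))              // add() returns false for duplicates
            newRoot = insert(newRoot, old.data);
        newRoot = copy(old.left, newRoot, seen);
        newRoot = copy(old.right, newRoot, seen);
        return newRoot;
    }

    static Node insert(Node n, int value) {
        if (n == null) {
            Node leaf = new Node();
            leaf.data = value;
            return leaf;
        }
        if (value < n.data) n.left = insert(n.left, value);
        else                n.right = insert(n.right, value);
        return n;
    }
}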

Proposed recursive solution for your problem:

deleteDup(Node p) {
    if (p == null) return;
    deleteDup(p.left);
    deleteDup(p.right);
    delete(p.value, p.left);    // delete p's value from the left subtree if present
    delete(p.value, p.right);   // delete p's value from the right subtree if present
}

deleteDup(root);

Time complexity:
Deletion in a BST is O(log N), so the recurrence is
T(N) = 2T(N/2) + O(log N),
which solves to T(N) = O(N).
Space complexity: O(log N) (for the recursion stack).
Note: this assumes a balanced BST.

I had another idea that runs in O(n) time and O(n) space.
For example, take a tree with root A whose two children are A and B:

    A
   / \
  A   B

Create a map, counts, that maps each value in the tree to a count.
Traverse all elements in the tree to populate counts. We will get counts[A] = 2 and counts[B] = 1. This takes O(n) space and O(n) time.
Again, traverse all elements in the tree. For every element, check whether counts[element] > 1; if so, decrement counts[element] and delete that element (using binary-tree deletion/rotation). Counting only the traversal, this pass is O(n); each individual deletion costs up to O(h) on top of that.
Once done, the tree has no more duplicates!
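Sketched in Java, assuming a BST and the Node class from the question (for a plain binary tree you would substitute a shape-preserving deletion; note that each BST deletion costs O(h)):

import java.util.HashMap;
import java.util.Map;

class CountsDedup {
    // Pass 1: count every value. Pass 2: delete each surplus occurrence.
    static Node removeDuplicates(Node root) {
        Map<Integer, Integer> counts = new HashMap<>();
        count(root, counts);
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            for (int extra = e.getValue() - 1; extra > 0; extra--)
                root = delete(root, e.getKey());   // O(h) per deletion
        return root;
    }

    static void count(Node n, Map<Integer, Integer> counts) {
        if (n == null) return;
        counts.merge(n.data, 1, Integer::sum);
        count(n.left, counts);
        count(n.right, counts);
    }

    // Standard BST delete of one occurrence of key (in-order successor trick).
    static Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.data)      n.left  = delete(n.left, key);
        else if (key > n.data) n.right = delete(n.right, key);
        else {
            if (n.left == null)  return n.right;
            if (n.right == null) return n.left;
            Node succ = n.right;
            while (succ.left != null) succ = succ.left;
            n.data = succ.data;
            n.right = delete(n.right, succ.data);
        }
        return n;
    }
}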

I guess that if a child node equals its parent node, we can try removing the child node:

void travel(Node* root, int parentVal)
{
    if (root == NULL)
        return;
    if (root->val == parentVal)
        deleteNode(root, root->val);  // method to remove a node from the BST
    travel(root->left, root->val);
    travel(root->right, root->val);
}

travel(root, INT_MIN);

(Note that deleteNode must not free the node while travel still holds a pointer to it.)

The trick in this situation is not to store the duplicates in the first place.
When you add nodes to the tree, why not decide not to add a node if its value already has a duplicate in the tree?
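For a binary search tree that is just a standard insert which refuses equal keys; a minimal Java sketch against the Node class from the question:

class NoDupInsert {
    // BST insert that silently drops keys already present in the tree.
    static Node insert(Node n, int value) {
        if (n == null) {
            Node leaf = new Node();
            leaf.data = value;
            return leaf;
        }
        if (value < n.data)      n.left = insert(n.left, value);
        else if (value > n.data) n.right = insert(n.right, value);
        // value == n.data: duplicate, do nothing
        return n;
    }
}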

Related

Can a binary search tree be constructed from only the inorder traversal?

I wanted to check whether there is a way to construct a binary search tree from only the inorder traversal, which will be in sorted order. I am thinking there might be some way to do it recursively, but I am not able to figure it out. Any pointers would be appreciated.
A BST has exactly one in-order traversal, but more than one BST can be constructed from a given in-order traversal. So yes, it is possible to construct a BST with a given in-order traversal, but you may not end up with the same tree whose in-order traversal you started with.
Check this article for more info: https://www.geeksforgeeks.org/find-all-possible-trees-with-given-inorder-traversal/
Yes, it is possible to construct a binary search tree from an inorder traversal.
Observe that the inorder traversal of a binary search tree is always sorted.
There are many possible outcomes, but one may be particularly interested in producing a tree that is as balanced as possible, so as to make searches in it efficient. One way to produce an inefficient binary search tree is to keep adding new nodes to the right of the last inserted node; the resulting binary search tree is then skewed.
If for example the inorder traversal is 1,2,3,4,5,6, we would get this tree:
1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          6
That is not very helpful as binary search tree, as it really has degenerated into a single chain, and a search in that tree is no better than performing a search from left to right in the input array.
A much more efficient alternative would be this binary search tree, which has the same inorder traversal:
      3
    /   \
   2     5
  /     / \
 1     4   6
Algorithm
The algorithm to produce a rather balanced binary search tree can indeed be recursive:
1. If the given array (traversal) is empty, return null (empty tree).
2. Take the middle value of the given array and create a node for it.
3. Take the subarray to the left of the selected value and run the algorithm recursively on it. This returns a node (rooting a subtree), which should be attached as the left child of the node created in step 2.
4. Take the subarray to the right of the selected value and run the algorithm recursively on it. This returns a node (rooting a subtree), which should be attached as the right child of the node created in step 2.
5. Return the node created in step 2.
// C++ code to convert an inorder traversal into a balanced BST
TreeNode* solve(vector<int>& v, int l, int r) {
    if (l > r) {
        return NULL;
    }
    int m = l + (r - l) / 2;   // avoids overflow of l + r
    TreeNode* root = new TreeNode(v[m]);
    root->left = solve(v, l, m - 1);
    root->right = solve(v, m + 1, r);
    return root;
}

TreeNode* constructFromInOrder(vector<int>& inorder)
{
    return solve(inorder, 0, (int)inorder.size() - 1);
}

AVL Trees: How to do index access?

I noticed on the AVL Tree Wikipedia page the following comment:
"If each node additionally records the size of its subtree (including itself and its descendants), then the nodes can be retrieved by index in O(log n) time as well."
I've googled and have found a few places mentioning accessing by index but can't seem to find an explanation of the algorithm one would write.
Many thanks
[UPDATE] Thanks, everyone. I found @templatetypedef's answer, combined with one of @user448810's links, particularly helpful. Especially this snippet:
"The key to both these functions is that the index of a node is the size of its left child. As long as we are descending a tree via its left child, we just take the index of the node. But when we have to move down the tree via its right child, we have to adjust the size to include the half of the tree that we have excluded."
Because my implementation is immutable, I didn't need to do any additional work when rebalancing, as each node calculates its size on construction (same as the Scheme implementation linked).
My final implementation ended up being:
class Node<K,V> implements AVLTree<K,V> { ...
    public V index(int i) {
        if (left.size() == i) return value;
        if (i < left.size()) return left.index(i);
        return right.index(i - left.size() - 1);
    }
}

class Empty<K,V> implements AVLTree<K,V> { ...
    public V index(int i) { throw new IndexOutOfBoundsException(); }
}
This is slightly different from the other implementations, so let me know if you think I have a bug!
The general idea behind this construction is to take an existing BST and augment each node by storing the number of nodes in the left subtree. Once you have done this, you can look up the nth node in the tree by using the following recursive algorithm:
To look up the nth element (zero-indexed) in a BST whose root node has k elements in its left subtree:
If n = k, return the root node (there are exactly k elements before it, so it is the nth).
If n < k, recursively look up the nth element in the left subtree.
Otherwise, look up the (n - k - 1)st element in the right subtree.
This takes time O(h), where h is the height of the tree; in an AVL tree, that is O(log n). In CLRS, this construction is explored as applied to red/black trees, and such trees are called "order statistic trees."
You have to put in some extra logic during tree rotations to adjust the cached number of elements in the left subtree, but this is not particularly difficult.
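For illustration, a rough Java sketch of a left rotation that keeps a cached subtree size correct; the node layout here is an assumption, not any particular AVL library's:

class SizedNode {
    SizedNode left, right;
    int size = 1;   // number of nodes in the subtree rooted here

    static int size(SizedNode n) { return n == null ? 0 : n.size; }

    // Left rotation: x's right child y becomes the subtree root.
    // Only x and y acquire new children, so only their sizes are recomputed
    // (child x first, since y's new size depends on it).
    static SizedNode rotateLeft(SizedNode x) {
        SizedNode y = x.right;
        x.right = y.left;
        y.left = x;
        x.size = size(x.left) + size(x.right) + 1;
        y.size = size(y.left) + size(y.right) + 1;
        return y;
    }
}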
Hope this helps!

How to generate an AVL tree as lopsided as possible?

I saw this in some paper, and someone argued that there can be at most log(n) rotations when we delete a node of an AVL tree. I believe we can demonstrate this bound by generating an AVL tree that is as lopsided as possible. The problem is how to do this. It will help me a lot in researching the rotations performed on removal. Thanks very much!
If you want to make a maximally lopsided AVL tree, you are looking for a Fibonacci tree, which is defined inductively as follows:
A Fibonacci tree of order 0 is empty.
A Fibonacci tree of order 1 is a single node.
A Fibonacci tree of order n + 2 is a node whose left child is a Fibonacci tree of order n and whose right child is a Fibonacci tree of order n + 1.
For example, here's a Fibonacci tree of order 5 (figure omitted: its left child is a Fibonacci tree of order 3 and its right child a Fibonacci tree of order 4).
The Fibonacci trees represent the maximum amount of skew an AVL tree can have: if any tree were more lopsided, the balance factor of some node would exceed the limits placed by AVL trees.
You can use this definition to very easily generate maximally lopsided AVL trees:
function FibonacciTree(int order):
    if order = 0, return the empty tree.
    if order = 1, create a single node and return it.
    otherwise:
        let left  = FibonacciTree(order - 2)
        let right = FibonacciTree(order - 1)
        return a tree whose left child is "left" and whose right child is "right".
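Translated into Java against a minimal node class, the construction might look like this:

class Fib {
    static class TreeNode { TreeNode left, right; }

    static TreeNode fibonacciTree(int order) {
        if (order == 0) return null;               // order 0: empty tree
        TreeNode n = new TreeNode();               // order 1: a single node
        if (order >= 2) {
            n.left = fibonacciTree(order - 2);     // shallower side
            n.right = fibonacciTree(order - 1);    // deeper side
        }
        return n;
    }
}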
Hope this helps!

Convert Binary Tree -> BST (maintaining original tree shape)

I have a binary tree of some shape. I want to convert it to a BST of the same shape. Is it possible?
I tried methods like:
Do an in-order traversal of the binary tree and put the contents into an array. Then map this into a BST, keeping in mind the condition (left val <= root <= right val). This works for some cases but fails for others.
P.S.: I had a look at this: Binary Trees question. Checking for similar shape. But it's easy to compare two BSTs for similarity in shape.
The short answer is: you can't. A BST requires that the nodes follow the rule left <= current < right. In the example you linked: http://upload.wikimedia.org/wikipedia/commons/f/f7/Binary_tree.svg, if you try to build a BST with the same shape, you'll find that you can't.
However, if you want to stretch the definition of a BST so that it allows left <= current <= right (notice that here current <= right is allowed, as opposed to the stricter definition), you can. Sort all the elements and put them in an array. Now do an in-order traversal, replacing the values at the nodes with each element of your array in turn. Here's some pseudocode:
// t is your non-BST tree, a is an array containing the sorted elements of t,
// i is the current index into a
index i = 0

create_bst(Tree t, Array a)
{
    if (t is NIL)
        return;
    create_bst(t->left, a)
    t->data = a[i]
    i++
    create_bst(t->right, a)
}
The result won't be a true BST, however. If you want a true BST that's as close to the original shape as possible, then again put the elements in a sorted array, but this time insert them into a BST. The order in which you insert them is defined by the sizes of the subtrees of the original tree. Here's some pseudocode:
// left is initially set to 0
create_true_bst(Tree t, BST bt, Array a, index left)
{
    index i = left + left_subtree(t)->size
    bt->insert(a[i])
    if (left_subtree(t)->size != 0)
        create_true_bst(t->left, bt, a, left)
    if (right_subtree(t)->size != 0)
        create_true_bst(t->right, bt, a, i + 1)
}
This won't guarantee that the shape is the same however.
Extract all the elements of the tree, then sort them, and then use a recursive in-order pass to replace the values.
The method you describe is guaranteed to work if you implement it properly. The in-order traversal of a binary tree of a given shape visits the node slots in a fixed, unique order. If you sort the elements by value and write them back in that same order, then it will always be true that
left subtree <= root <= right subtree
for every node, since that is the order in which the traversal visits them and the values were sorted in that order.
I would simply do two in-order traversals. In the first traversal, get the values from the tree and put them into a heap. In the second, get the values in order from the heap and put them into the tree. This runs in O(n·log n) time and O(n) space.
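A sketch of that in Java, with java.util.PriorityQueue as the heap and a minimal node type (the tree's shape is untouched; only the values are rewritten in sorted order):

import java.util.PriorityQueue;

class HeapRelabel {
    static class TreeNode { int data; TreeNode left, right; }

    static void toBst(TreeNode root) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        fill(root, heap);    // traversal 1: collect all values
        write(root, heap);   // traversal 2: write them back in order
    }

    static void fill(TreeNode n, PriorityQueue<Integer> heap) {
        if (n == null) return;
        fill(n.left, heap);
        heap.add(n.data);
        fill(n.right, heap);
    }

    static void write(TreeNode n, PriorityQueue<Integer> heap) {
        if (n == null) return;
        write(n.left, heap);
        n.data = heap.poll();   // smallest remaining value goes next in-order
        write(n.right, heap);
    }
}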

Finding last element of a binary heap

Quoting Wikipedia:
"It is perfectly acceptable to use a traditional binary tree data structure to implement a binary heap. There is an issue with finding the adjacent element on the last level on the binary heap when adding an element which can be resolved algorithmically..."
Any ideas on how such an algorithm might work?
I was not able to find any information about this issue, as most binary heaps are implemented using arrays.
Any help appreciated.
I recently registered an OpenID account and am not able to edit my initial post or comment on answers. That's why I am responding via this answer. Sorry for this.

Quoting Mitch Wheat:
"@Yse: is your question 'How do I find the last element of a binary heap'?"
Yes, it is. Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".

Quoting Suppressingfire:
"Is there some context in which you're asking this question? (i.e., is there some concrete problem you're trying to solve?)"
As stated above, I would like to know a good way to find the last element of a non-array-based binary heap, which is necessary for the insertion and deletion of nodes.

Quoting Roy:
"It seems most understandable to me to just use a normal binary tree structure (using a pRoot and Node defined as [data, pLeftChild, pRightChild]) and add two additional pointers (pInsertionNode and pLastNode). pInsertionNode and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This gives O(1) access to both insertion point and last node of the structure."
Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node when their locations move to another subtree due to a deletion or insertion. But I'll give this a try.

Quoting Zach Scrivena:
"How about performing a depth-first search..."
Yes, this would be a good approach. I'll try that out, too.
Still, I am wondering whether there is a way to "calculate" the locations of the last node and the insertion point. The height of a binary heap with N nodes can be calculated by taking the base-2 logarithm of the smallest power of two that is larger than N. Perhaps it is possible to calculate the number of nodes on the deepest level, too. Then it might be possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.
Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.
For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.
For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.
The trick is to be able to both calculate and reference those insertion/deletion points in constant time, either by augmenting the data structure ("threading" the tree, as mentioned in the Wikipedia article) or by using additional pointers.
The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).
EDIT: added pseudocode for an O(1) insert()
Here is pseudocode for an insert subroutine which is O(1), on average:
define Node = [T data, *pParent, *pLeft, *pRight]

void insert(T data)
{
    do_insertion( data );  // do insertion, update count of data items in tree

    // assume: pInsert points to the node of the tree where the insertion just
    // took place (i.e., either shuffle only data during the insertion, or keep
    // pInsert updated during the bubble process)

    int N = this->CountOfDataItems + 1;  // CountOfDataItems will always be > 0 (and pRoot != null) after an insertion

    p = new Node( <null>, null, null, null );  // new empty node for the next insertion

    // update pInsert (three cases to handle)
    if ( int(log2(N)) == log2(N) )
    {
        // case 1: N is an exact power of two -- O(log2(N))
        // the tree is currently a full complete ("perfect") binary tree, so a
        // new lower level must be started: traverse from pRoot down through
        // each pLeft until an empty pLeft is found for the insertion
        pInsert = pRoot;
        while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; }  // log2(N) iterations
        p->pParent = pInsert;
        pInsert->pLeft = p;
    }
    else if ( isEven(N) )
    {
        // case 2: N is even (and NOT a power of 2) -- O(1)
        p->pParent = pInsert->pParent;
        pInsert->pParent->pRight = p;
    }
    else
    {
        // case 3: N is odd -- O(1)
        p->pParent = pInsert->pParent->pParent->pRight;
        pInsert->pParent->pParent->pRight->pLeft = p;
    }
    pInsert = p;

    // update pLastNode
    // ... [similar process]
}
So, insert(T) is O(1) on average: exactly O(1) in all cases except when the tree must be increased by one level, when it is O(log N); that happens once every ~N insertions at size N (assuming no deletions). The addition of another pointer (pLeftmostLeaf) could make insert() O(1) for all cases and would avoid the possible pathological case of alternating insertion and deletion in a full complete binary tree. (Adding pLeftmostLeaf is left as an exercise [it's fairly easy].)
This is my first time participating in Stack Overflow.
Yes, the answer above by Zach Scrivena is right. What I want to add is a simplified way if we are given the count of nodes.
The basic idea is:
Take the (1-based, level-order) position N of the target node in this complete binary tree (for an insertion, N = node count + 1). Repeatedly compute N % 2 and push the result onto a stack, replacing N with the quotient, until N == 1. Then pop the results out: a result of 1 means right, 0 means left. The sequence is the route from the root to the target position.
Example:
The tree currently has 10 nodes, and I want to insert another node at position 11. How do I route to it?
11 % 2 = 1 --> right (the quotient is 5; push "right" onto the stack)
5 % 2 = 1 --> right (the quotient is 2; push "right" onto the stack)
2 % 2 = 0 --> left (the quotient is 1; push "left" onto the stack; end)
Then pop the stack: left -> right -> right. This is the path from the root.
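In Java, the route computation might look like this (a direct sketch of the modulo idea; an ArrayDeque plays the stack):

import java.util.ArrayDeque;
import java.util.Deque;

class HeapRoute {
    // Turns from the root to node number n (1-based, in level order).
    static Deque<String> routeTo(int n) {
        Deque<String> path = new ArrayDeque<>();
        while (n > 1) {
            path.push(n % 2 == 1 ? "right" : "left");
            n /= 2;
        }
        return path;   // iteration order = root-to-target order
    }
}

For example, routeTo(11) yields left, right, right, matching the example above.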
You can use the binary representation of the size of the binary heap to find the location of the last node in O(log N). The size can be stored and incremented, which takes O(1) time. The fundamental concept behind this is the structure of the binary tree.
Suppose our heap size is 7. The binary representation of 7 is "111". Now, always omit the first bit. We are left with "11", read from left to right. The first bit is '1', so go to the right child of the root node. The remaining string is "1", and its first bit is '1', so again go to the right child of the current node. As you have no more bits to process, you have reached the last node. So the process is: convert the size of the heap into bits, omit the first bit, and, according to each remaining bit in turn, go to the right child of the current node if it is '1' and to the left child if it is '0'.
As you always go down to the very bottom of the binary tree, this operation always takes O(log N) time. This is a simple and accurate procedure to find the last node.
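The same navigation, sketched in Java directly from the bit string (HeapNode is an assumed minimal node type):

class LastNodeFinder {
    static class HeapNode { int value; HeapNode left, right; }

    // Walk from the root to the last node of a heap with the given size.
    static HeapNode lastNode(HeapNode root, int size) {
        String bits = Integer.toBinaryString(size);
        HeapNode cur = root;
        for (int i = 1; i < bits.length(); i++)    // skip the leading 1
            cur = (bits.charAt(i) == '1') ? cur.right : cur.left;
        return cur;
    }
}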
You may not understand it on a first reading. Try working this method on paper for different sizes of binary heap; I'm sure you'll get the intuition behind it. This knowledge is enough to solve your problem; if you want more explanation with figures, you can refer to my blog.
Hope my answer has helped you, if it did, let me know...! ☺
How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree? Thereafter, the first leaf you encounter with a shorter depth, or a parent with a missing child, would indicate where you should place the new node before "bubbling up".
The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:
Let N be the total number of nodes in the tree, and H be the height of the tree.
Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3).
The general formula relating the two is H = ceil[log2(N+1)] - 1.
Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking".
We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.
If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.
However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot.
The number of nodes that are already at the last level of our tree is just (N - 2^H + 1).
This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.
Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):
0 0 0 0 0 X 0 0   <--- represents the last level in our tree, X marks the spot!
          ^
L L L L R R R R   <--- at level 0, proceed to the R child
L L R R L L R R   <--- at level 1, proceed to the L child
L R L R L R L R   <--- at level 2, proceed to the R child
          ^ (which is the position of the new node)
          this column tells us
          if we should proceed to the L or R child at each level
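As a sketch, those per-level turns can be computed by halving the range of last-level slots (Java; the node type and names are assumptions):

class VacantSpotFinder {
    static class HeapNode { HeapNode left, right; }

    // Descend to the parent of spot x (1-based from the left) on the last
    // level of a heap of height H, whose last level has 2^H slots.
    static HeapNode parentOfSpot(HeapNode root, int height, int x) {
        HeapNode cur = root;
        int lo = 1, hi = 1 << height;          // current range of slots
        for (int level = 0; level < height - 1; level++) {
            int mid = (lo + hi) / 2;
            if (x > mid) { cur = cur.right; lo = mid + 1; }
            else         { cur = cur.left;  hi = mid; }
        }
        // two slots remain: the new node becomes cur.left if x == lo,
        // or cur.right if x == hi
        return cur;
    }
}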
EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.
Solution in case you don't have a reference to the parent:
To find the right place for the next node, there are 3 cases to handle:
Case 1: the last tree level is complete
Case 2: the tree's node count is even
Case 3: the tree's node count is odd
Insert:
void Insert(Node root, Node n)
{
    Node parent = findRequiredParentToInsertNewNode(root);
    if (parent.left == null)
        parent.left = n;
    else
        parent.right = n;
}
Find the parent under which the new node must be inserted:

Node findRequiredParentToInsertNewNode(Node root) {
    Node last = findLastNode(root);
    // Case 1: the tree is perfect (NodeCount + 1 is a power of two),
    // so a new level starts: descend all the way to the left
    if (NodeCount + 1 == 2 * Math.pow(2, levelNumber)) {
        while (root.left != null)
            root = root.left;
        return root;
    }
    // Case 2: the node count is even
    else if (Even(NodeCount)) {
        Node n = findParentOfLastNode(root, findParentOfLastNode(root, last));
        return n.right;
    }
    // Case 3: the node count is odd
    else {
        return findParentOfLastNode(root, last);
    }
}
To find the last node, perform a BFS (breadth-first search) and return the last element dequeued:

Node findLastNode(Node root)
{
    if (root.left == null)
        return root;
    Queue q = new Queue();
    q.enqueue(root);
    Node n = null;
    while (!q.isEmpty()) {
        n = q.dequeue();
        if (n.left != null)
            q.enqueue(n.left);
        if (n.right != null)
            q.enqueue(n.right);
    }
    return n;
}
Find the parent of the last node, so that its child pointer can be cleared when the last node replaces the root during removal:

Node findParentOfLastNode(Node root, Node lastNode)
{
    if (root == null)
        return root;
    if (root.left == lastNode || root.right == lastNode)
        return root;
    Node n1 = findParentOfLastNode(root.left, lastNode);
    Node n2 = findParentOfLastNode(root.right, lastNode);
    return n1 != null ? n1 : n2;
}
I know this is an old thread, but I was looking for an answer to the same question. I could not afford an O(log n) solution, as I had to find the last node thousands of times in a few seconds. I did have an O(log n) algorithm, but my program was crawling because of the number of times it performed this operation. So after much thought, I finally found a fix for this. Not sure if anybody thinks this is interesting.
This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).
Just wanted to add that if there is interest, I can provide my solution as well.
The idea is to add the nodes of the binary heap to a queue, where every queue node has front and back pointers. We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node of the binary heap will be at the rear of the queue.
Every time we need to find the last node, we dequeue from the rear, and the second-to-last node becomes the last node in the tree.
When we want to insert, we search backwards from the rear for the first node where we can insert, and put the new node there. It is not exactly O(1), but it reduces the running time dramatically.
