Delete in Binary Search tree? - algorithm

I am reading through the binary tree delete-node algorithm in the book Data Structures and Algorithms: Annotated Reference with Examples.
On page 34, case 4 (deleting a node that has both left and right subtrees), the algorithm described in the book doesn't seem to work. I may be wrong, so could someone tell me what I am missing?
//Case 4
get largestValue from nodeToRemove.Left
FindParent(largestValue).Right <- 0
nodeToRemove.Value<-largestValue.Value
How does the line FindParent(largestValue).Right <- 0 delete the largest value from the subtree?

When deleting a node with two children, you can choose either its in-order successor node or its in-order predecessor node as the replacement. In this case it's finding the largest value in the left sub-tree (meaning the right-most node of its left sub-tree), which means that it is finding the node's in-order predecessor node.
Once you find the replacement node, you don't actually delete the node to be deleted. Instead you take the value from the replacement node and store that value in the node you want to delete. Then, you delete the replacement node. In doing so you preserve the binary search-tree property, since you can be sure that the node you selected has a value that is greater than the values of all the other nodes in the original node's left sub-tree, and lower than the values of all the nodes in the original node's right sub-tree.
EDIT
After reading your question a little more, I think I have found the problem.
Typically what you have in addition to the delete function is a replace function that replaces the node in question. I think you need to change this line of code:
FindParent(largestValue).Right <- 0
to:
FindParent(largestValue).Right <- largestValue.Left
If the largestValue node doesn't have a left child, you simply get null or 0. If it does have a left child, that child becomes a replacement for the largestValue node. So you're right; the code doesn't take into account the scenario that the largestValue node might have a left child.
Another EDIT
Since you've only posted a snippet, I'm not sure what the context of the code is. But the snippet as posted does seem to have the problem you suggest (replacing the wrong node). Usually, there are three cases, but I notice that the comment in your snippet says //Case 4 (so maybe there is some other context).
Earlier, I alluded to the fact that delete usually comes with a replace. So if you find the largestValue node, you delete it according to the two simple cases (node with no children, and node with one child). So if you're looking at pseudo-code to delete a node with two children, this is what you'll do:
get largestValue from nodeToRemove.Left
nodeToRemove.Value <- largestValue.Value
//now replace largestValue with largestValue.Left
if largestValue = largestValue.Parent.Left then //is largestValue a left child?
    largestValue.Parent.Left <- largestValue.Left
else //largestValue must be a right child
    largestValue.Parent.Right <- largestValue.Left
if largestValue.Left is not null then
    largestValue.Left.Parent <- largestValue.Parent
I find it strange that a Data Structures And Algorithms book would leave out this part, so I am inclined to think that the book has further split up the deletion into a few more cases (since there are three standard cases) to make it easier to understand.
To prove that the above code works, consider the following tree:
  8
 / \
7   9
Let's say that you want to delete 8. You try to find largestValue from nodeToRemove.Left. This gives you 7 since the left sub-tree only has one child.
Then you do:
nodeToRemove.Value <- largestValue.Value
Which means:
8.value <- 7.Value
or
8.Value <- 7
So now your tree looks like this:
  7
 / \
7   9
You need to get rid of the replacement node and so you're going to replace largestValue with largestValue.Left (which is null). So first you find out what kind of child 7 is:
if largestValue = largestValue.Parent.Left then
Which means:
if 7 = 7.Parent.Left then
or:
if 7 = 8.Left then
Since 7 is 8's left child, you need to replace 8.Left with 7.Left (largestValue.Parent.Left <- largestValue.Left). Since 7 has no children, 7.Left is null. So largestValue.Parent.Left gets assigned null (which effectively removes its left child). This means that you end up with the following tree:
7
 \
  9
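A minimal runnable sketch of the corrected two-children case may help here. It follows the pseudo-code above (copy the predecessor's value, then splice the predecessor out and reattach its left child). The names Node, insert, remove_two_children and inorder are made up for this illustration and do not come from the book.

```python
class Node:
    """BST node with a parent pointer (hypothetical helper for this sketch)."""
    def __init__(self, value, parent=None):
        self.value = value
        self.left = None
        self.right = None
        self.parent = parent


def insert(root, value):
    """Plain BST insert that also maintains parent pointers."""
    if root is None:
        return Node(value)
    cur = root
    while True:
        side = "left" if value < cur.value else "right"
        child = getattr(cur, side)
        if child is None:
            setattr(cur, side, Node(value, parent=cur))
            return root
        cur = child


def remove_two_children(node):
    """Delete `node` (which must have both children) via its in-order predecessor."""
    largest = node.left
    while largest.right is not None:      # right-most node of the left subtree
        largest = largest.right
    node.value = largest.value            # overwrite the value, keep the node itself
    # Splice `largest` out, putting its (possibly missing) left child in its place.
    if largest is largest.parent.left:    # happens only when largest is node.left
        largest.parent.left = largest.left
    else:                                 # largest is a right child
        largest.parent.right = largest.left
    if largest.left is not None:
        largest.left.parent = largest.parent


def inorder(node):
    """Return the values in sorted order, to check the BST property."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```

Building the 8 / 7 / 9 tree from the walkthrough with insert and calling remove_two_children on the root leaves 7 at the root with 9 as its only child, as in the last diagram.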

The idea is to simply take the value from the largest node on the left hand side and move it to the node that is being deleted, i.e., don't delete the node at all, just replace its contents. Then you prune out the node with the value you moved into the "deleted" node. This maintains the tree ordering, with every node's value larger than all the values in its left subtree and smaller than all the values in its right subtree.

I think you may need to clarify what doesn't work.
I will try and explain the concept of deletion in a binary tree in case this helps.
Let's assume that you wish to delete a node in the tree that has two child nodes.
In the tree below, let's say that you want to delete node b.
      a
    /   \
   b     c
  / \   / \
 d   e f   g
When we delete a node we need to reattach its dependant nodes.
ie. When we delete b we need to reattach nodes d and e.
We know that the left nodes are less than the right nodes in value and that the parent nodes are between the left and right nodes in value. In this case d < b and b < e. This is part of the definition of a binary search tree.
What is slightly less obvious is that e < a. So this means that we can replace b with e. Now that we have reattached e we need to reattach d.
As stated before d < e, so we can attach d as the left node of e.
The deletion is now complete.
( btw The process of moving a node up the tree and rearranging the dependant nodes in this fashion is known as promoting a node. You can also promote a node without deleting other nodes.)
      a
    /   \
   e     c
  /     / \
 d     f   g
Note that there is another perfectly legitimate outcome of deleting node b.
If we chose to promote node d instead of node e the tree would look like this.
      a
    /   \
   d     c
    \   / \
     e f   g

If I understand the pseudo-code, it works in the general case, but fails in the "one node in the left subtree" case. Nice catch.
It effectively replaces the node_to_remove with largest_value from its left subtree (and also nulls the old largest_value node).
Note that in a BST, the nodes in the left subtree of node_to_remove will all be smaller than node_to_remove. The nodes in the right subtree of node_to_remove will all be larger than node_to_remove. So if you take the largest node in the left subtree, it will preserve the invariant.
If this is a "one node in the subtree case", it'll destroy the right subtree instead. Lame :(
As Vivin points out, it also fails to reattach left children of largestNode.

It may make more sense when you look at the Wikipedia's take on that part of the algorithm:
Deleting a node with two children:
Call the node to be deleted "N". Do
not delete N. Instead, choose either
its in-order successor node or its
in-order predecessor node, "R".
Replace the value of N with the value
of R, then delete R. (Note: R itself
has up to one child.)
Note that the given algorithm chooses the in-order predecessor node.
Edit: what appears to be missing is the possibility that R (to use Wikipedia's terminology) has one child. A recursive delete might work better.
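As a rough sketch of what such a recursive delete could look like (this is an illustration, not the book's or Wikipedia's code, and the names Node, delete and inorder are assumptions): the two-children case copies the in-order predecessor's value into N and then recursively deletes R from the left subtree, so a child of R is reattached by the ordinary one-child case.

```python
class Node:
    """Minimal BST node without parent pointers (hypothetical helper)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def delete(node, value):
    """Return the root of the subtree after removing `value`, if present."""
    if node is None:
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    elif node.left is None:               # zero children, or only a right child
        return node.right
    elif node.right is None:              # only a left child
        return node.left
    else:                                 # two children: use in-order predecessor R
        r = node.left
        while r.right is not None:
            r = r.right
        node.value = r.value              # replace the value of N with the value of R
        node.left = delete(node.left, r.value)  # then delete R (it has at most one child)
    return node


def inorder(node):
    """Return the values in sorted order, to check the BST property."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```

The caller reassigns the root, e.g. root = delete(root, 8), because the root itself can be the node that disappears.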

Related

Deleting nodes from a binary search tree

I understand the idea when deleting a node that has two subtrees: I "erase" the node's value and replace it with either its predecessor from the left subtree's value or its successor from the right subtree's value, and then delete that node.
However, does it matter if I choose the successor from the right subtree or the predecessor from the left subtree? Or is either way valid as long as I still have a binary search tree after performing the deletion?
Both ways to perform a delete operation are valid if the node has two children.
Remember that when you get either the in-order predecessor node or the in-order successor node, you must call the delete operation on that node.
It doesn't matter which one you choose to replace. In fact you may need both.
Look at the following BST.
       7
     /   \
    4     10
   / \    /
  1   5  8
   \
    3
To delete 1, you need to replace 1 with right node 3.
And to delete 10, you need to replace 10 with left node 8.

Reconstruct a tree from a list, with depth info encapsulated in the entry of list

We built a list from a tree (not necessarily a binary search tree) via depth first traversal.
Each entry inside is a pair (k, d), k is the key of a node and d is the depth of that node.
Now we need to construct the original tree back from the list.
How do we do it?
Note
tree is not necessarily a binary search tree
we do not know whether the depth first traversal is pre-order, in-order or post-order.
My question is
Can we achieve this reverse engineering under the conditions? I know for binary search tree, we need at least two traversal lists (e.g., inorder and postorder list) to reconstruct the original tree.
How? if possible
Things to note:
The in-order traversal produces a unique tree
The pre-order and post-order don't:
You can't differentiate between these two:
  1      1
 /        \
2          2
I'll just generate the one on the left (doing this makes it a lot easier).
What we can say right away:
If the first node is the root (i.e. it has depth 0):
We're either doing in-order with an empty left subtree, or pre-order.
If the last node is the root:
We're either doing in-order with an empty right subtree, or post-order.
If neither of the above:
We're doing in-order traversal.
For the two cases above where we don't know which traversal to do, the simplest approach is to try to generate the trees for both possible traversals, and discard whichever one doesn't work (based on the below restrictions), if either.
Some restrictions:
For in-order, we can't go right or up if the current node is empty.
For pre-order, we can't go left or right if the current node is empty.
For post-order, we have to go up after setting the current node - we can't go left or right without having set the current node.
In all cases, we try to go left before going right before going up.
By 'go left' or 'go right', I mean creating an (empty) left or right child and traversing to that node.
By 'go up', I mean simply traversing upwards in the already created tree.
Based on the above restrictions, it should be easy to write an algorithm to generate the tree. As an example for in-order:
1. If the new node's depth is deeper than the current node's depth:
1.1. If the current node is empty and doesn't have a left child, we can just create a left child and set that as the current node
1.2. Otherwise, if the current node is not empty and doesn't have a right child, we can just create a right child and set that as the current node
2. Otherwise, if the depth is the same as the current node and the current node is empty, set that node's value to the new node
3. If none of the above cases triggered and the current node is not empty, set the parent of the current node as the current node
4. If none of the above cases triggered, fail
5. If 1.1, 1.2, or 3 triggered, repeat from 1.
Example:
Input: (f, 2), (g, 2), (b, 1), (i, 2), (c, 1), (a, 0)
Since (a, 0) is the root, we're doing either in-order or post-order.
So then we generate 2 subtrees:
in-order    post-order
    .           .
   /           /
  .           .
 /           /
f           f
(. indicates an empty node)
When we get (g, 2), we can already discard the in-order tree, as we can't go right or up from f's parent, because it is empty, so we're stuck.
Then we continue with post-order:
    .
   /
  .
 / \
f   g

    .
   /
  b
 / \
f   g

     .
   /   \
  b     .
 / \   /
f   g i

     .
   /   \
  b     c
 / \   /
f   g i

     a
   /   \
  b     c
 / \   /
f   g i
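For the post-order case, a stack gives a compact way to sketch the same reconstruction. This is a simplified illustration under stated assumptions, not the step-by-step procedure above: it assumes the list is known to be post-order, it stores children as an ordered list, and (as noted earlier) it cannot tell a lone left child from a lone right child. The name build_from_postorder is made up for this example.

```python
def build_from_postorder(pairs):
    """Rebuild a rooted tree from post-order (key, depth) pairs.

    Each node is returned as (key, [children in left-to-right order]).
    """
    stack = []  # (depth, node) entries: finished subtrees still waiting for a parent
    for key, depth in pairs:
        children = []
        # Every finished subtree exactly one level deeper belongs to this node.
        while stack and stack[-1][0] == depth + 1:
            children.append(stack.pop()[1])
        children.reverse()                # popped right-to-left, so restore the order
        stack.append((depth, (key, children)))
    if len(stack) != 1:
        raise ValueError("not a valid post-order (key, depth) list")
    return stack[0][1]
```

Fed the example input (f, 2), (g, 2), (b, 1), (i, 2), (c, 1), (a, 0), it produces a with children b and c, b with children f and g, and c with the single child i, which matches the final tree drawn above.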
I'm not sure what you mean by pre/post/in-order, a single DFS run with the depth data should allow you to reconstruct the tree if you know the time of first visiting each node (I guess that would amount to a "pre-order" by your definition). In-order isn't even defined well in non-binary trees (would the parent appear after the first node? after the second? what if there's a single child?)
If you can tell the order of discovering each node, you can just go over the list while the depth increases, creating more and more children (and keeping track of the last node encountered at each depth), and once you get a non increasing depth you know for sure how many levels up you need to go in order to place the next node.
Two list-consecutive nodes with the same depth would be siblings of the same parent, and in general, if the last node had depth d1 and you now encounter d2, then you need to go up d1-d2+1 levels back up the current branch before attaching the next node.
A depth of d is enough to identify the parent (it would have to be the last node of depth d-1), since in DFS you can't encounter any other parent of that depth without first completely exploring the entire branch descending from the previous one.
A slightly better proof - let v be a node along your list of depth d. It would have to descend from some node of depth d-1.
Let's assume the list is
[(v0,d0), ... (v, d), ...]
The parent can't be in the remainder of the list since this means you reached a child before its parent - impossible while traversing a tree. So the parent has to be in the first ... section. Let's assume it's not the last d-1-depth node prior to v - so let's say the list is -
[(v0,d0), ... (vi, d-1), ... (vj, d-1), ...(v, d), ...]
If v is the child of vi, then when traversing the original tree your DFS reached vi, missed v somehow, then passed to another branch originating from some ancestor of vi, found vj there, and only then came back to vi and encountered v. This violates the DFS premise.
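The parent rule argued above (the parent of a depth-d node is the last node of depth d-1 seen so far) can be sketched in a few lines. This assumes the list records each node at first discovery, i.e. pre-order, and works for any number of children per node. The name build_from_preorder and the (key, children) representation are choices made for this example.

```python
def build_from_preorder(pairs):
    """Rebuild a rooted tree from pre-order (key, depth) pairs.

    Each node is returned as (key, [children in discovery order]).
    """
    root = None
    path = []  # path[d] is the most recently seen node at depth d
    for key, depth in pairs:
        node = (key, [])
        if depth == 0:
            if root is not None:
                raise ValueError("more than one node at depth 0")
            root = node
        else:
            if depth > len(path):
                raise ValueError("depth jumps by more than one level")
            path[depth - 1][1].append(node)   # last node of depth d-1 is the parent
        del path[depth:]                      # leave the branch we just finished
        path.append(node)
    return root
```

Truncating path to the new depth is the "go up d1-d2+1 levels" step described earlier.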
Maybe I'm missing something, but:
If the first pair in the sequence has depth zero, it's a pre-order traversal.
If the last pair in the sequence has depth zero, it's a post-order traversal.
Else it's an in-order traversal.

How to save the memory when storing color information in Red-Black Trees?

I bumped into this question in one of Coursera's algorithms courses and realized that I have no idea how to do that. Still, I have some thoughts about it. The first thing that came to mind was using an optimized bit set (like Java's BitSet) to map a node's key -> color. So all we need is to allocate one bit set for the whole tree and use it as the source of color information. If there are no duplicate elements in the tree, it should work.
Would be happy to see others' ideas about this task.
Just modify the BST. For a black node, do nothing. For a red node, exchange its left child and right child. Then a node can be identified as red or black according to whether its right child is larger than its left child.
Use the least significant bit of one of the pointers in the node to store the color information. The node pointers should contain even addresses on most platforms. See details here.
There are two rules we can use:
Since the root node is always black, a red node will always have a parent node.
A red-black BST always keeps the order left_child < parent < right_child.
Then we do this:
Keep black nodes unchanged.
For a red node, call it R, and suppose it is the left child of its parent node, called P.
Change the red node's value from R to R', where R' = P + P - R.
Now R' > P, but since it sits in the left child position, we will find an order mismatch.
If we find an order mismatch, we know it's a red node,
and it's easy to get back the original value: R = P + P - R'.
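The arithmetic can be sketched as follows. The sketch assumes numeric, distinct keys, and it assumes the mirror-image rule for a red right child, which the answer above does not spell out. The function names encode_red, is_red and decode are hypothetical.

```python
def encode_red(child_key, parent_key):
    """Reflect a red child's key around its parent's key: R' = P + P - R."""
    return 2 * parent_key - child_key


def is_red(stored_key, parent_key, is_left_child):
    """A node is red when its stored key is on the wrong side of its parent."""
    if is_left_child:
        return stored_key > parent_key    # a left child should be smaller
    return stored_key < parent_key        # assumed mirror rule for a right child


def decode(stored_key, parent_key, is_left_child):
    """Recover the real key: R = P + P - R' for red nodes, unchanged for black."""
    if is_red(stored_key, parent_key, is_left_child):
        return 2 * parent_key - stored_key
    return stored_key
```

For example, with P = 10 and a red left child R = 7, the stored value is 13, which is larger than 10 and therefore flags the node as red, and decoding gives 7 back. Since a red node's parent is black, P itself is stored unmodified when the check is made.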
One option is to use a tree that requires less bookkeeping, e.g. a splay tree. However, splay trees in particular aren't very good for iteration (they're much better at random lookup), so they may not be a good fit for the domain you're working in.
You can also use one BitSet for the entire red-black tree based on node position, e.g. the root is the 0th bit, the root's left branch is the 1st bit, the right branch is the 2nd bit, the left branch's left branch is the 3rd bit, etc; this way it shouldn't matter if there are duplicate elements. While traversing the tree make note of which bit position you're at.
It's much more efficient in terms of space to use one bitset for the tree instead of assigning a boolean to each node; each boolean will take up at least a byte and may take up a word depending on alignment, whereas the bitset will only take up one bit per node (plus 2x bits to account for a maximally unbalanced tree where the shortest branch is half the length of the longest branch).
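A small sketch of the position-indexed bit set, using a Python int as the bit storage. The numbering follows the description above (root 0, left child 1, right child 2, left child's left child 3), which is the usual 2i+1 / 2i+2 scheme. The class ColorBits and the helpers left and right are names invented for this example.

```python
def left(index):
    """Bit position of the left child of the node at `index`."""
    return 2 * index + 1


def right(index):
    """Bit position of the right child of the node at `index`."""
    return 2 * index + 2


class ColorBits:
    """One bit per tree position: 1 means red, 0 means black."""
    def __init__(self):
        self.bits = 0

    def set_red(self, index, red):
        if red:
            self.bits |= 1 << index
        else:
            self.bits &= ~(1 << index)

    def is_red(self, index):
        return (self.bits >> index) & 1 == 1
```

While descending from the root, a traversal would carry the current index along and update it with left(index) or right(index) at each step. Note that a rotation changes the positions of the nodes involved, so under this scheme their bits would have to be rewritten along with the rotation.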
Instead of using a boolean property on a child, we could define a red node as one that has a child in the wrong place.
If we go this way all leaf nodes are guaranteed to be black, and we should swap the parent with its sibling (making it red) when inserting a new node.

Deletion procedure for a Binary Search Tree

Consider the deletion procedure on a BST, when the node to delete has two children. Let's say I always replace it with the node holding the minimum key in its right subtree.
The question is: is this procedure commutative? That is, does deleting x and then y give the same result as deleting y first and then x?
I think the answer is no, but I can't find a counterexample, nor figure out any valid reasoning.
EDIT:
Maybe I have to be clearer.
Consider the transplant(node x, node y) procedure: it replaces x with y (and its subtree).
So, if I want to delete a node (say x) which has two children, I replace it with the node holding the minimum key in its right subtree:
y = minimum(x.right)
transplant(y, y.right) // extracts the minimum (it doesn't have left child)
y.right = x.right
y.left = x.left
transplant(x,y)
The question was how to prove the procedure above is not commutative.
Deletion (in general) is not commutative. Here is a counterexample:
  4
 / \
3   7
   /
  6
What if we delete 4 and then 3?
When we delete 4, we get 6 as the new root:
  6
 / \
3   7
Deleting 3 just removes a leaf, which gives us this:
6
 \
  7
What if we delete 3 and then 4?
When we delete 3, the rest of the tree doesn't change:
4
 \
  7
 /
6
However, when we now delete 4, the new root becomes 7:
  7
 /
6
The two resulting trees are not the same, therefore deletion is not commutative.
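The counterexample can be replayed with a short sketch. It assumes the usual general-case rules: a node with at most one child is replaced by that child, and a node with two children takes the minimum key of its right subtree. Trees are written as nested (key, left, right) tuples so that two results can be compared directly. The function name delete is just for this example.

```python
def delete(tree, key):
    """Return a new tree, given as (key, left, right) tuples or None, without `key`."""
    if tree is None:
        return None
    k, left, right = tree
    if key < k:
        return (k, delete(left, key), right)
    if key > k:
        return (k, left, delete(right, key))
    if left is None:                  # zero children, or only a right child
        return right
    if right is None:                 # only a left child
        return left
    succ = right                      # two children: minimum of the right subtree
    while succ[1] is not None:
        succ = succ[1]
    return (succ[0], left, delete(right, succ[0]))


start = (4, (3, None, None), (7, (6, None, None), None))
four_then_three = delete(delete(start, 4), 3)
three_then_four = delete(delete(start, 3), 4)
```

Here four_then_three is (6, None, (7, None, None)) while three_then_four is (7, (6, None, None), None), matching the two final diagrams.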
UPDATE
I didn't read the restriction that this is when you always delete a node with 2 children. My solution is for the general case. I'll update it if/when I can find a counter-example.
ANOTHER UPDATE
I don't have concrete proof, but I'm going to hazard a guess:
In the general case, you handle deletions differently based on whether you have two children, one child, or no children. In the counter-example I provided, I first delete a node with two children and then a node with no children. After that, I delete a node with no children and then a node with one child.
In the special case of only deleting nodes with two children, you want to consider the case where both nodes are in the same sub-tree (since it wouldn't matter if they are in different sub-trees; you can be sure that the overall structure won't change based on the order of deletion). What you really need to prove is whether the order of deletion of nodes in the same sub-tree, where each node has two children, matters.
Consider two nodes A and B where A is an ancestor of B. Then you can further refine the question to be:
Is deletion commutative when you are considering the deletion of two nodes from a Binary Search Tree which have a ancestor-descendant relationship to each other (this would imply that they are in the same sub-tree)?
When you delete a node (let's say A), you traverse the right sub-tree to find the minimum element. This node will be a leaf node and can never be equal to B (because B has two children and cannot be a leaf node). You would then replace the value of A with the value of this leaf-node. What this means is that the only structural change to the tree is the replacement of A's value with the value of the leaf-node, and the loss of the leaf-node.
The same process is involved for B. That is, you replace the value of the node and remove a leaf-node. So in general, when you delete a node with two children, the only structural change is the change in value of the node you are deleting, and the deletion of the leaf node whose value you are using as replacement.
So the question is further refined:
Can you guarantee that you will always get the same replacement node regardless of the order of deletion (when you are always deleting a node with two children)?
The answer (I think) is yes. Why? Here are a few observations:
Let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is not in the left sub-tree of the ancestor node's right child. This means that this sub-tree remains unaffected. What this also means is regardless of the order of deletion, two different sub-trees are modified and therefore the operation is commutative.
Again, let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is in the left sub-tree of the ancestor node's right child. But even here, there is no overlap. The reason is when you delete the descendant node first, you look at the left sub-tree of the descendant node's right child. When you then delete the ancestor node, you will never go down that sub-tree since you will always be going towards the left after you enter the ancestor node's right-child's left sub-tree. So again, regardless of what you delete first you are modifying different sub-trees and so it appears order doesn't matter.
Another case is if you delete the ancestor node first and you find that the minimum node is a child of the descendant node. This means that the descendant node will end up with one child, and deleting the one child is trivial. Now consider the case where in this scenario, you deleted the descendant node first. Then you would replace the value of the descendant node with its right child and then delete the right child. Then when you delete the ancestor node, you end up finding the same minimum node (the old deleted node's left child, which is also the replaced node's left child). Either way, you end up with the same structure.
This is not a rigorous proof; these are just some observations I've made. By all means, feel free to poke holes!
It seems to me that the counterexample shown in Vivin's answer is the sole case of non-commutativity, and that it is indeed eliminated by the restriction that only nodes with two children can be deleted.
But it can also be eliminated if we discard what appears to be one of Vivin's premises, which is that we should traverse the right subtree as little as possible to find any acceptable successor. If, instead, we always promote the smallest node in the right subtree as the successor, regardless of how far away it turns out to be located, then even if we relax the restriction on deleting nodes with fewer than two children, Vivin's result
  7
 /
6
is never reached if we start at
  4
 / \
3   7
   /
  6
Instead, we would first delete 3 (without successor) and then delete 4 (with 6 as successor), yielding
6
 \
  7
which is the same as if the order of deletion were reversed.
Deletion would then be commutative, and I think it is always commutative, given the premise I have named (successor is always smallest node in right subtree of deleted node).
I do not have a formal proof to offer, merely an enumeration of cases:
If the two nodes to be deleted are in different subtrees, then deletion of one does not affect the other. Only when they are in the same path can the order of deletion possibly affect the outcome.
So any effect on commutativity can come only when an ancestor node and one of its descendants are both deleted. Now, how does their vertical relationship affect commutativity?
Descendant in the left subtree of the ancestor. This situation will not affect commutativity because the successor comes from the right subtree and cannot affect the left subtree at all.
Descendant in the right subtree of the ancestor. If the ancestor's successor is always the smallest node in the right subtree, then order of deletion cannot change the choice of successor, no matter what descendant is deleted before or after the ancestor. Even if the successor to the ancestor turns out to be the descendant node that is also to be deleted, that descendant too is replaced with the next-largest node to it, and that descendant cannot have its own left subtree remaining to be dealt with. So deletion of an ancestor and any right-subtree descendant will always be commutative.
I think there are two equally viable ways to delete a node, when it has 2 children: SKIP TO CASE 4...
Case 1: delete 3 (Leaf node)
2 3
/ \ --> / \
1 3 1
Case 2: delete 2 (Left child node)
2 3
/ \ --> / \
1 3 1
Case 3: delete 2 (Right child node)
2 2
/ \ --> / \
1 3 3
______________________________________________________________________
Case 4: delete 2 (Left & Right child nodes)
2 2 3
/ \ --> / \ or / \
1 3 1 3
BOTH WORK and have different resulting trees :)
______________________________________________________________________
As algorithm explained here: http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Trees/AVL-delete.html
Deleting a node with 2 children nodes:
1) Replace the (to-delete) node with its in-order predecessor or in-order successor
2) Then delete the in-order predecessor or in-order successor
I respond here to Vivin's second update.
I think this is a good recast of the question:
Is deletion commutative when you are
considering the deletion of two nodes
from a Binary Search Tree which have a
ancestor-descendant relationship to
each other (this would imply that they
are in the same sub-tree)?
but this bold sentence below is not true:
When you delete a node (let's say A),
you traverse the right sub-tree to
find the minimum element. This node
will be a leaf node and can never be equal to B
since the minimum element in A's right subtree can have a right child. So, it is not a leaf.
Let's call the minimum element in A's right subtree successor(A).
Now, it is true that B cannot be successor(A), but it can be in its right subtree. So, it is a mess.
I try to summarize.
Hypothesis:
A and B have two children each.
A and B are in the same subtree.
Other stuff we can deduce from hypothesis:
B is not successor(A), nor is A successor(B).
Now, given that, I think there are 4 different cases (as usual, let A be an ancestor of B):
B is in A's left subtree
B is an ancestor of successor(A)
successor(A) is an ancestor of B
B and successor(A) don't have any relationship (they are in different subtrees of A).
I think (but of course I cannot prove it) that cases 1, 2 and 4 don't matter.
So, only in the case where successor(A) is an ancestor of B could the deletion procedure fail to be commutative. Or could it?
I pass the ball : )
Regards.

Binary Search Tree - node deletion

I am trying to understand why, when deleting a node in a BST while keeping its children and adhering to the BST structure, you have to take the node's right child (which has a higher value than the node being deleted) and, if that right child has a left child, take that child instead. Otherwise you just take the right child of the node being deleted.
Why don't you just take the left child of the node being deleted, if there is one? Doesn't it still work out correctly?
Or have I missed something?
I'm reading this article.
You're oversimplifying.
The node selected to replace the one that was deleted must be larger than all the nodes to the left of the deleted one, and smaller than all the nodes to the right. So it must be either the left subtree's rightmost descendant or the right subtree's leftmost descendant; except if one or the other subtree is entirely absent, we can remove a level of tree entirely simply by replacing the deleted node with the child that was present.
The rules listed in the article will always give you the right subtree's leftmost descendant when both subtrees are present. If you wished, you could indeed derive an alternative ruleset that used the left subtree's rightmost descendant instead.
It does not "work out correctly" to just always use the left child. Indeed, if there is a child on the right and the left child itself has two children, it cannot even be done without essentially rebuilding the tree.
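To make the two valid candidates concrete, here is a small sketch that finds both for a node with two children. Nodes are plain (key, left, right) tuples, and the name replacement_candidates is made up for this illustration.

```python
def replacement_candidates(node):
    """Return (predecessor_key, successor_key) for a node with two children.

    `node` is a (key, left, right) tuple and both subtrees must be present.
    """
    _, left, right = node
    pred = left
    while pred[2] is not None:        # rightmost descendant of the left subtree
        pred = pred[2]
    succ = right
    while succ[1] is not None:        # leftmost descendant of the right subtree
        succ = succ[1]
    return pred[0], succ[0]


# A node 6 whose left child 4 has a right child 5, and whose right child is 7.
example = (6, (4, None, (5, None, None)), (7, None, None))
candidates = replacement_candidates(example)
```

For this example the candidates are 5 and 7. The left child 4 is not among them, because 5 is in the left subtree but larger than 4.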
You would be correct for the special case that you described. But for something more general where you can have many more levels deeper than the node being deleted you need to replace that node with a node that will be less than everything to the right, and greater than everything to the left. So as an example:
  2
 / \
1   6
   / \
  4   7
   \
    5
Let's say you wanted to remove the node 6. Following your instructions, we would replace it with the left child, node 4. Now what do we do with node 5? We could make it the left child of node 7 (or of the leftmost descendant of node 7 if it existed), but why would you do all this reshuffling when you know that removing a leaf is trivial, and you just want to replace the node with another node that keeps every node on the left smaller and every node on the right greater?

Resources