About orientation in splay trees in Robert Sedgweick - algorithm

I am reading about splay trees by Robert Sedgwick. Following is text snippet from section
In the root-insertion method, we accomplished our primary objective of bringing the newly inserted node to the root of the tree by using left and right rotations. In this section, we examine how we can modify root insertion such that the rotations balance the tree in a certain sense, as well.
Rather than considering (recursively) the single rotation that brings the newly inserted node to the top of the tree, we consider the two rotations that bring the node from a position as one of the grandchildren of the root up to the top of the tree. First, we perform one rotation to bring the node to be a child of the root. Then, we perform another rotation to bring it to the root. There are two essentially different cases, depending on whether or not the two links from the root to the node being inserted are oriented in the same way. . Splay BSTs are based on the observation that there is an alternative way to proceed when the links from the root to the node being inserted are oriented in the same way: Simply perform two rotations at the root.
Additional information is present at slide 4 at following link
http://www.cs.princeton.edu/courses/archive/spr04/cos226/lectures/balanced.4up.pdf
My question is
What does author mean by orientation here?
Request to given an example for orienations differ and orientation match.
Thanks

Orientation simply means left or right child.
A
/ \
B C
/ \
D E
E is an example of the same orientation case - C is the right child of A and E is again the right child of C. D is an example of differing orientations because D is the left child of C but C is the right child of A.
So you can describe the orientation of all the nodes with respect to the root node A as follows.
B left
C right
D right left
E right right

Suppose you are accessing node n, whose parent is p, and whose grandparent is g. Then the orientation is the same if both n and p are the left or right child of its parent.
So, for example, if p is the right child of g and n is the left child of p, the orientation is not the same. If p is the right child of g and n is the right child of p, the orientation is the same.

Related

Finding the neighbours of a segment (Bentley-Ottmann algorithm )

I'm implementing the Bentley-Ottmann algorithm
to find the set of segment intersection points,
unfortunately I didn't understand some things.
For example :
how can I get the neighbours of the segment Sj in the image.
I'm using a balanced binary search tree for the sweepLine status, but we store the segments in the leaves, after reading this wikipedia article I didn't find an explanation for this operation.
From the reference book (de Berg & al.: "Computational Geometry", ill. at p.25):
Suppose we search in T for the segment immediately to the left of some point p that lies on the sweep line.
At each internal node v we test whether p lies left or right of the segment stored at v.
Depending on the outcome we descend to the left or right subtree of v,
eventually ending up in a leaf.
Either this leaf, or the leaf immediately to the left of it, stores the segment we are searching for.
For my example if I follow this I will arrive at the leaf Sj but I will know just the leaf to the left i.e. Sk, how can I get Si?
Edit
I found this discussion that looks like my problem, unfortunately there are no answers about how can I implement some operations in such data structure.
The operation are:
inserting a node in such data structure.
deleting a node.
swapping two nodes.
searching for neighbours' node.
I know how to implement these operations in a balanced binary search tree when we store data too in internal node, but with this type of AVL I don't know if it is the same thing.
Thank you
I stumbled upon the same problem when reading Computational Geometry from DeBerg (see p. 25 for the quote and the image). My understanding is the following:
say you need the right neighbor of a segment S which is in the tree. If you store data in the nodes, the pseudo code is:
locate node S
if S has a right subtree:
return the left-most node of the right subtree of S
else if S is in the left sub-tree of any ancestor:
return the lowest/nearest such ancestor
else
return not found
If you store the data in the leaves, the pseudo-code becomes:
let p the point of S currently on the sweep line
let n the segment at the root of the tree
while n != null && n is not a leaf:
if n = S:
n = right child of S
else:
determine if p is on the right or left of n
update n accordingly (normal descent)
In the end, either n is null and it means there is no right neighbor, or n points to the proper leaf.
The same logic applies for the left neighbor.
Same as you, I have met the same problem while reading the de Berg & al.: "Computational Geometry". But I think The C++ Standard Template Library (STL) have an implantation called "map" which can do the job.
You just need to define some personalized class for line segment and event points and their comparison functions. Then, use std::map to build the tree and access the neighboring element using map.find() to get and iterator, and use iterator to gain access to the two neighbor element.

How to save the memory when storing color information in Red-Black Trees?

I've bumped into this question at one of Coursera algorithms course and realized that I have no idea how to do that. But still, I have some thoughts about it. The first thing that comes into my mind was using optimized bit set (like Java's BitSet) to get mapping node's key -> color. So, all we need is to allocate a one bit set for whole tree and use it as color information source. If there is no duplicatate elements in the tree - it should work.
Would be happy to see other's ideas about this task.
Just modify the BST. For black node, do nothing. And for red node, exchange its left child and right child. In this case, a node can be justified red or black according to if its right child is larger than its left child.
Use the least significant bit of one of the pointers in the node to store the color information. The node pointers should contain even addresses on most platforms. See details here.
There's 2 rules we can use:
since the root node is always black, then a red node will always have a parent node.
RB BST is always with the order that left_child < parent < right_child
Then we will do this:
keep the black node unchanged.
for the red node, we call it as R, we suppose it as the left child node for it's parent node, called P.
change the red node value from R to R', while R' = P + P - R
now that R' > P, but as it's the left child tree, we will find the order mismatch.
If we find an order mismatch, then we will know it's a red node.
and it's easy to go back to the original R = P + P - R'
One option is to use a tree that requires less bookkeeping, e.g. a splay tree. However, splay trees in particular aren't very good for iteration (they're much better at random lookup), so they may not be a good fit for the domain you're working in.
You can also use one BitSet for the entire red-black tree based on node position, e.g. the root is the 0th bit, the root's left branch is the 1st bit, the right branch is the 2nd bit, the left branch's left branch is the 3rd bit, etc; this way it shouldn't matter if there are duplicate elements. While traversing the tree make note of which bit position you're at.
It's much more efficient in terms of space to use one bitset for the tree instead of assigning a boolean to each node; each boolean will take up at least a byte and may take up a word depending on alignment, whereas the bitset will only take up one bit per node (plus 2x bits to account for a maximally unbalanced tree where the shortest branch is half the length of the longest branch).
Instead of using boolean property on a child we could define a red node as the one who has a child in the wrong place.
If we go this way all leaf nodes are guaranteed to to be black and we should swap parent with his sibling (making him red) when inserting a new node.

Can anyone explain the deletion of Left-Lean-Red-Black tree clearly?

I am learning Left-Lean-Red-Black tree, from Prof.Robert Sedgewick
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
While I got to understand the insert of the 2-3 tree and the LLRB, I have spent totally like 40 hours now for 2 weeks and I still can't get the deletion of the LLRB.
Can anyone really explain the deletion of LLRB to me?
Ok I am going to try this, and maybe the other good people of SO can help out. You know how one way of thinking of red nodes is as indicators of
where there there imbalance/new nodes in the tree, and
how much imbalance there is.
This is why all new nodes are red. When the nodes (locally) balance out, they undergo a color flip, and the redness is passed up to the parent, and now the parent may look imbalanced relative to its sibling.
As an illustration, consider a situation where you are adding nodes from larger to smaller. You start with node Z which is now root and is black. You add node Y, which is red and is a left child of Z. You add a red X as a child of Z, but now you have two successive reds, so you rotate right, recolor, and you have a balanced, all black (no imbalance/"new nodes"!) tree rooted at Y [first drawing]. Now you add W and V, in that order. At first they are both red [second drawing], but immediately V/X/W are rotated right, and color flipped, so that only X is red [third drawing]. This is important: X being red indicates that left subtree of Y is unbalanced by 2 nodes, or, in other words, there are two new nodes in the left subtree. So the height of the red links is the count of new, potentially unbalanced nodes: there are 2^height of new nodes in the red subtree.
Note how when adding nodes, the redness is always passed up: in color flip, two red children become black (=locally balanced) while coloring their parent red. Essentially what the deletion does, is reverse this process. Just like a new node is red, we always also want to delete a red node. If the node isn't red, then we want to make it red first. This can be done by a color flip (incidentally, this is why color flip in the code on page 3 is actually color-neutral). So if the child we want to delete is black, we can make it red by color-flipping its parent. Now the child is guaranteed to be red.
The next problem to deal with is the fact that when we start the deletion we don't know if the target node to be deleted is red or not. One strategy would be to find out first. However, according to my reading of your first reference, the strategy chosen there is to ensure that the deleted node can be made red, by "pushing" a red node down in front of the search node as we are searching down the tree for the node to be deleted. This may create unnecessary red nodes that fixUp() procedure will resolve on the way back up the tree. fixUp() presumably maintains the usual LLRBT invariants: "no successive red nodes" and "no right red nodes."
Not sure if that helps, or if we need to get into more detailed examination of code.
There is an interesting comment about the Sedgwich implementation and in particular its delete method from a Harvard Comp Sci professor. Left-Leaning Red-Black Trees Considered Harmful was written in 2013 (the Sedgwich pdf you referenced above is dated 2008):
Tricky writing
Sedgewick’s paper is tricky. As of 2013, the insert section presents 2–3–4 trees as the default and describes 2–3 trees as a variant. The delete implementation, however, only works for 2–3 trees. If you implement the default variant of insert and the only variant of delete, your tree won’t work. The text doesn’t highlight the switch from 2–3–4 to 2–3: not kind.
The most recent version I could find of the Sedgwich code, which contains a 2-3 implementation, is dated April 2014. It is on his Algorithms book site at RedBlackBST.java
Follow the next strategy to delete an arbitrary node in a LLRB tree which is not in a leaf:
Transform a LLRB tree to a 2-3-4 tree (we do not need to transform the whole tree, only a part of the tree).
Replace the value of the node (which we want to delete) its successor.
Delete its successor.
Fix the tree (recover balance, see the book "Algorithms 4th edition" on the pages 435, 436).
If a value in a leaf then we do not need to use a successor to replase this value, but we still need to transform the current tree to 2-3-4 tree to delete this value.
The slide on the page 20 of this presentation https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf and the book "Algorithms 4th edition" on the page 437 are a key. They show how a LLRB tree transformations into a 2-3 tree. In the book "Algorithms 4th edition" on the page 442 https://books.google.com/books?id=MTpsAQAAQBAJ&pg=PA442 is an algorithm of transformation for trees.
For example, open the page 54 of the presentation https://www.cs.princeton.edu/~rs/talks/LLRB/08Dagstuhl/RedBlack.pdf. The node H has children D and L. According to the algorithm on the page 442 we transform these three nodes into the 4-node of a 2-3-4 tree. Then the node D has children B and F we also transform these nodes into a node of 2-3-4 tree. Then the node B has children A and C we also transform these nodes into a node of 2-3-4 tree. And finally we need to delete A. After deletion we need to recover balance. We move up through the tree and we restore balance of the tree (according to rules, see the book "Algorithms 4th edition" on the pages 435, 436). If you need to delete the node D (the same tree on the page 54). You need the same transformations and need to replace the value of the node D on the value of the node E and delete the node E (because it is a successor of D).

Regarding splay trees

I am reading about splay trees in Data structures and algorithms by Mark Allen Wesis
The splaying strategy is similar to the rotation idea, except that we
are a little more selective about how rotations are performed. We will
still rotate bottom up along the access path. Let x be a (nonroot)
node on the access path at which we are rotating. If the parent of x
is the root of the tree, we merely rotate x and the root. This is the
last rotation along the access path. Otherwise, x has both a parent
(p) and a grandparent (g), and there are two cases, plus symmetries,
to consider. The first case is the zig-zag case, Here x is a right
child and p is a left child (or vice versa). If this is the case, we
perform a double rotation, exactly like an AVL double rotation.
Otherwise, we have a zig-zig case: x and p are either both left
children or both right children.
In above text what does author mean by following statement "There are two cases plus symmetries"? two cases are given but what are symmetires here?
Thanks!
I think it's just pretty basic axial symetry:
example for a zig zag case, here are 2 symetric trees:
g
/ \
p d
/\
c x
/ \
a b
g
/ \
d p
/\
x c
/ \
a b
For example, say a case is "When the node in questions is the right child of its parent and the parent is the left child of the grandparent" In this case you do a left rotation and then a right rotation. So the node will come up to the grand parent.
The symmetric part of this case is "When the node in questions is the left child of its parent and the parent is the right child of the grandparent" In this case you do a right rotation and then a left rotation. So the node will come up to the grand parent.
replace left with right and right with left you get the symmetric case.
There are only 3 cases for rotation in a splay tree. Listed it here
You can see the runtime difference in searching with and without splaying.

Delete in Binary Search tree?

I am reading through the binary tree delete node algorithm used in the book Data Structures and Algorithms: Annotated Reference with Examples
on page 34 , case 4(Delete node which has both right and left sub trees), following algorithm described in the book looks doesn't work, probably I may be wrong could someone help me what I am missing.
//Case 4
get largestValue from nodeToRemove.Left
FindParent(largestValue).Right <- 0
nodeToRemove.Value<-largestValue.Value
How does the following line deletes the largest value from sub tree FindParent(largestValue).Right <- 0
When deleting a node with two children, you can either choose its in-order successor node or its in-order predecessor node. In this case it's finding the the largest value in the left sub-tree (meaning the right-most child of its left sub-tree), which means that it is finding the node's in-order predecessor node.
Once you find the replacement node, you don't actually delete the node to be deleted. Instead you take the value from the successor node and store that value in the node you want to delete. Then, you delete the successor node. In doing so you preserve the binary search-tree property since you can be sure that the node you selected will have a value that is lower than the values of all the children in the original node's left sub-tree, and greater that than the values of all the children in the original node's right sub-tree.
EDIT
After reading your question a little more, I think I have found the problem.
Typically what you have in addition to the delete function is a replace function that replaces the node in question. I think you need to change this line of code:
FindParent(largestValue).Right <- 0
to:
FindParent(largestValue).Right <- largestValue.Left
If the largestValue node doesn't have a left child, you simply get null or 0. If it does have a left child, that child becomes a replacement for the largestValue node. So you're right; the code doesn't take into account the scenario that the largestValue node might have a left child.
Another EDIT
Since you've only posted a snippet, I'm not sure what the context of the code is. But the snippet as posted does seem to have the problem you suggest (replacing the wrong node). Usually, there are three cases, but I notice that the comment in your snippet says //Case 4 (so maybe there is some other context).
Earlier, I alluded to the fact that delete usually comes with a replace. So if you find the largestValue node, you delete it according to the two simple cases (node with no children, and node with one child). So if you're looking at pseudo-code to delete a node with two children, this is what you'll do:
get largestValue from nodeToRemove.Left
nodeToRemove.Value <- largestValue.Value
//now replace largestValue with largestValue.Left
if largestValue = largestValue.Parent.Left then
largestValue.Parent.Left <- largestValue.Left //is largestValue a left child?
else //largestValue must be a right child
largestValue.Parent.Right <- largestValue.Left
if largestValue.Left is not null then
largestValue.Left.Parent <- largestValue.Parent
I find it strange that a Data Structures And Algorithms book would leave out this part, so I am inclined to think that the book has further split up the deletion into a few more cases (since there are three standard cases) to make it easier to understand.
To prove that the above code works, consider the following tree:
8
/ \
7 9
Let's say that you want to delete 8. You try to find largestValue from nodeToRemove.Left. This gives you 7 since the left sub-tree only has one child.
Then you do:
nodeToRemove.Value <- largestValue.Value
Which means:
8.value <- 7.Value
or
8.Value <- 7
So now your tree looks like this:
7
/ \
7 9
You need to get rid of the replacement node and so you're going to replace largestValue with largestValue.Left (which is null). So first you find out what kind of child 7 is:
if largestValue = largestValue.Parent.Left then
Which means:
if 7 = 7.Parent.Left then
or:
if 7 = 8.Left then
Since 7 is 8's left child, need to replace 8.Left with 7.Right (largestValue.Parent.Left <- largestValue.Left). Since 7 has no children, 7.Left is null. So largestValue.Parent.Left gets assigned to null (which effectively removes its left child). So this means that you end up with the following tree:
7
\
9
The idea is to simply take the value from the largest node on the left hand side and move it to the node that is being deleted, i.e., don't delete the node at all, just replace it's contents. Then you prune out the node with the value you moved into the "deleted" node. This maintains the tree ordering with every node's value larger than all of it's left children and smaller than all of it's right children.
I think you may need to clarify what doesn't work.
I will try and explain the concept of deletion in a binary tree in case this helps.
Lets assume that you have a node in the tree that has two child nodes that you wish to delete.
in the tree below lets say that you want to delete node b
a
/ \
b c
/ \ / \
d e f g
When we delete a node we need to reattach its dependant nodes.
ie. When we delete b we need to reattach nodes d and e.
We know that the left nodes are less than the right nodes in value and that the parent nodes are between the left and right node s in value. In this case d < b and b < e. This is part of the definition of a binary tree.
What is slightly less obvious is that e < a. So this means that we can replace b with e. Now we have reattached e we need to reattach d.
As stated before d < e so we can attach e as the left node of e.
The deletion is now complete.
( btw The process of moving a node up the tree and rearranging the dependant nodes in this fashion is known as promoting a node. You can also promote a node without deleting other nodes.)
a
/ \
d c
\ / \
e f g
Note that there is another perfectly legitimate outcome of deleteing node b.
If we chose to promote node d instead of node e the tree would look like this.
a
/ \
e c
/ / \
d f g
If I understand the pseudo-code, it works in the general case, but fails in the "one node in the left subtree" case. Nice catch.
It effectively replaces the node_to_remove with largest_value from it's left subtree (also nulls the old largest_value node).
Note that in a BST, the left subtree of node_to_remove will be all be smaller than node_to_remove. The right subtree of node_to_remove will all be larger than node_to_remove. So if you take the largest node in the left subtree, it will preserve the invariant.
If this is a "one node in the subtree case", it'll destroy the right subtree instead. Lame :(
As Vivin points out, it also fails to reattach left children of largestNode.
It may make more sense when you look at the Wikipedia's take on that part of the algorithm:
Deleting a node with two children:
Call the node to be deleted "N". Do
not delete N. Instead, choose either
its in-order successor node or its
in-order predecessor node, "R".
Replace the value of N with the value
of R, then delete R. (Note: R itself
has up to one child.)
Note that the given algorithm chooses the in-order predecessor node.
Edit: what appears to be missing the possibility that R (to use Wikipedia's terminology) has one child. A recursive delete might work better.

Resources