Splay Tree Deletion - algorithm

I'm having trouble conceptualising the process of deletion from a splay tree. Given this initial tree, I want to delete the node 78.
Based on the information from my course (derived from Goodrich, Tamassia and Goldwasser), the deleted node in a BST should be replaced by the next node reached by performing an in-order traversal from the node which should be 91. This node should then be splayed to the top of the tree. However, this is not the case as shown on this visualiser here. https://www.cs.usfca.edu/~galles/visualization/SplayTree.html

The visualizer replaced 78 by its in-order predecessor (70) instead and splayed that node. (The in-order successor, i.e., the next key in sorted order, is 83, not 91.) In general, splay trees are wonderfully malleable: as long as you approximately halve the length of the path you just descended while making every other path at most a little bit longer, you’re doing it right from an asymptotic performance standpoint (your professor may have different ideas, however).

Your textbook description:
the deleted node in a BST should be replaced by the next node reached by performing an in-order traversal from the node which should be 91
That description applies to unbalanced BSTs (binary search trees) but does not apply to most of the various kinds of balanced binary trees, and also does not apply to splay trees. To delete a node in a splay tree, do the following:
Splay the node to be deleted to the root and dispose of it. This leaves two trees, call the left tree A and the right tree B.
The new root of the recombined tree will come from A. Splay the largest (rightmost) node of A to A's root.
Because A's new root has the greatest key in A, it has no right child. Set the right child of A's new root to B.
A is the new combined tree.
This is what the visualization at https://www.cs.usfca.edu/%7Egalles/visualization/SplayTree.html did.
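A minimal sketch of the four steps above, in case it helps to see them as code. The `Node` class, the recursive `splay` helper, and the plain (non-splaying) `insert` used to build the demo tree are my own assumptions for illustration; they are not taken from the textbook or from the visualizer's source.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Bring the node with `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                        # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                      # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                           # zag-zag
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:                         # zag-zig
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)

def delete(root, key):
    root = splay(root, key)              # step 1: splay the node to the root
    if root is None or root.key != key:
        return root                      # key not present, nothing to remove
    a, b = root.left, root.right         # dispose of the root, leaving A and B
    if a is None:
        return b
    largest = a
    while largest.right is not None:     # step 2: find the rightmost node of A
        largest = largest.right
    a = splay(a, largest.key)            # ... and splay it to A's root
    a.right = b                          # step 3: A's root has no right child
    return a                             # step 4: A is the combined tree

def insert(root, key):                   # plain BST insert, only for the demo
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for k in (50, 78, 70, 91, 83):
    root = insert(root, k)
root = delete(root, 78)
print(root.key, inorder(root))           # 70 [50, 70, 83, 91]
```

In this small tree the predecessor of 78 (70) ends up at the root, which matches what the question observed in the visualizer.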
You said in comments to the other answer:
So in practice, the node that you choose to replace the deleted node doesn't really matter, i.e. affect performance etc.
In the typical splay tree deletion algorithm the node to replace will be the predecessor or successor node, in key order.
The rule of thumb is to always splay whenever a specific node is accessed. Find the node to delete, then splay it to the root. Find its predecessor, then splay it to the root. There are variations where you can splay less aggressively, too.

Related

How to check if two binary trees share a node

Given an array of binary trees find whether any two trees share a node, not value wise, but "pointer" wise. At the bottom I provided an example.
My approach was to iterate through all the trees and store all the leaves (pointers) from each tree into a list, then check if list has any duplicates, but that's a rather slow approach. Is there perhaps a quicker way to solve this?
In the worst case you will have to traverse all nodes (all pointers) to find a shared node (pointer), as it might happen to be the last one visited. So the best time complexity we can expect to have is O(m+n), where m and n represent the number of nodes in the two trees.
We can achieve this time complexity if we store the pointers from the first tree in a hash set and then traverse the pointers of the second tree to see if any of those is in the set. Assuming that get/set operations on a hash set have an amortized constant time complexity, the overall time complexity will be O(m+n).
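A rough sketch of the hash-set idea, extended to an array of trees. The `Node` class and the function names are hypothetical; `id()` stands in for the pointer value, which is safe here only because every node stays alive for the duration of the call.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def iter_nodes(root):
    """Yield every node of a tree using an explicit stack."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        yield node
        for child in (node.left, node.right):
            if child is not None:
                stack.append(child)

def share_a_node(trees):
    """Return True if any two trees contain the same node object."""
    seen = set()                      # identities of nodes from earlier trees
    for root in trees:
        current = set()
        for node in iter_nodes(root):
            if id(node) in seen:      # pointer comparison, not value comparison
                return True
            current.add(id(node))
        seen |= current
    return False

shared = Node(3)
t1 = Node(1, shared)
t2 = Node(2, None, shared)
t3 = Node(3)                          # same value as `shared`, different object
print(share_a_node([t1, t3]))         # False
print(share_a_node([t1, t3, t2]))     # True
```

Each node is visited once, so the total work is linear in the number of nodes across all trees, at the cost of one set entry per node.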
If the same program is responsible for constructing the trees, then a reuse of the same node can be detected upon insertion. For instance, reuse of the same node in multiple trees can be completely avoided by having the insert method of your tree only take a value as argument, never a node instance. The method will then encapsulate the actual creation of the node, guaranteeing its uniqueness.
An idea for O(#nodes) time and O(1) space. It does more traversal work than simple traversals using a hash table, but it doesn't have the cost of using a hash table. I don't know what's better. Might depend on the language.
For two trees
Create one extra node. Do a Morris traversal of the first tree. It only modifies right child pointers, so we can use left child pointers for marking nodes as seen. For every tree node without left child, set our extra node as left child. Whenever checking a left child pointer, treat our extra node like a null pointer, i.e., don't visit it. After the traversal, the tree structure is restored, and all originally left-child-less tree nodes now point to our extra node as left child. That includes all leaf nodes.
Do a Morris traversal of the second tree. Again treat pointers to our extra node like null pointers. If we ever do encounter our extra node, we know the trees share a node. If not, then we know the trees don't share a node, since if they did share any, they'd also share a leaf node (just go down from any shared node to a leaf node, that's also shared), and all leaf nodes of the first tree are marked. After the traversal, the second tree is restored.
Do a Morris traversal of the first tree again, this time removing our extra node, restoring the original null pointers.
For an array of more than two trees
Mark the first tree as above. Check the second tree as above. Mark the second tree. Check the third. Mark the third. Check the fourth. Mark the fourth. Etc. Once you have found a shared node or there are no more trees, unmark the marked trees.
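Here is how I would sketch the mark/check/unmark scheme described above. Everything named here (`Node`, `MARK`, `_morris`, `share_a_node`) is made up for illustration, and the sketch assumes each individual tree really is a tree (no node reachable twice within the same tree).

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

MARK = Node("mark")   # the one extra node; it is never visited

def _morris(root, on_leftless):
    """Morris in-order walk that treats MARK like a null left pointer.

    Calls on_leftless(node) for every node without a real left child and
    removes every temporary thread it creates before returning.
    """
    cur = root
    while cur is not None:
        left = None if cur.left is MARK else cur.left
        if left is None:
            on_leftless(cur)
            cur = cur.right
        else:
            pred = left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur       # temporary thread back to cur
                cur = left
            else:
                pred.right = None      # remove the thread again
                cur = cur.right

def share_a_node(trees):
    found = False

    def check(node):
        nonlocal found
        if node.left is MARK:
            found = True

    def mark(node):
        node.left = MARK

    def unmark(node):
        node.left = None

    marked = []
    for root in trees:
        _morris(root, check)           # runs to completion so threads are undone
        if found:
            break
        _morris(root, mark)
        marked.append(root)
    for root in marked:
        _morris(root, unmark)          # restore the original null pointers
    return found

shared = Node("s")
t1 = Node(1, Node(2), shared)
t2 = Node(3, shared, Node(4))
print(share_a_node([t1, t2]))          # True
print(shared.left is None)             # True: the marks were removed again
```

Apart from the single `MARK` node and a few local variables this uses no extra memory, but every tree is walked up to three times.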
Every shared node must have two parents, or an ancestor with two parents.
LOOP over nodes
    IF node has two parents
        MARK node as shared
        MARK all descendants as shared

Red Black Tree Insertion & Deletion Uniqueness

I've been learning and working on implementing a red-black tree data structure. I'm following this article on red-black tree deletion examples and looking at example 5 they have:
When I insert the same nodes into my tree, I get the following:
I understand that red black trees are not unique (I think), therefore both of the above trees are valid since they don't violate any of the properties.
In the example article, after deleting node 1, they get the following:
But after deleting node 1 in my code, I get the following:
Since in my case, node 1 is red, I don't call my delete_fix function which takes care of re-arranging the tree and such. The deletion algorithm I was following simply states to call a delete_fix function if the node to be deleted is black.
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized. It still follows the rules of the red-black tree though. Is this to be expected with red-black trees or am I missing something here?
However, after comparing my tree with the one in the example article I can see that mine is not exactly optimized.
It is optimised. Your tree will be fast at deleting nodes 5, 7, 20 & 28. The other tree will only be fast at deleting 5 & 7.
Bear in mind that Red-Black Trees can be bushy in one direction. If the black height of the tree (counting real nodes) is N, then the shortest path from the root to a leaf node has N nodes (all black) and the longest path from the root to a leaf node has 2 * N nodes (alternating black-red-black-red etc.). If you try to add a new node to a bushy path that is already at maximum height, the tree will recolour and/or rebalance.
If you want a more balanced search tree you should use an AVL tree. Red-Black trees favour minimal insertion/deletion fixups over finding a node. Your tree is fine.

Idiomatic Traversal Binary Tree (Perhaps Any Tree)

A Doubly Linked List enables idiomatic traversal of a Linked List and I thought why not for a Binary Tree? Traditionally, Binary Trees or Trees in general are unidirectional and that implies, given a large tree with a sufficient number of nodes, the running time to find a leaf node can be costly.
If, after finding such a node, to find the next I could traverse the tree back toward the root, would that not be advantageous as compared to another depth-first search through every node of the tree? I have never considered this before until realizing the marriage of a Doubly Linked List and a Binary Tree could potentially add benefit.
For example, if I employed an inner class
    class Tree<T> {
        private class TwoWayNode {
            var data : T
            var left : TwoWayNode
            var right : TwoWayNode
            var previous : TwoWayNode
        }
    }
The use of left and right is as normal, to traverse the respective subtrees from each node, and previous would hold a pointer to the parent node to enable idiomatic traversal. Would something like this work well and what are some of the potential problems or pitfalls?
Given that you store a previous reference, you can walk leftmost first. Upon arrival at a leaf node, you back up one level and traverse right.
You can always compare the node you just came from with the child nodes of the current node, your "walker", so you can check if you went left or right the last time. This makes your traversal stateless and you do not even need recursion; suitable for very large datasets.
Now, every time you have just left the right child, you back up one level again.
This algorithm is a Depth-First-Search.*
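To make the walk concrete, here is a small sketch of a depth-first (pre-order) traversal that uses only the parent link and the node it just came from. The `Node` class mirrors the `TwoWayNode` fields from the question; the `add` helper and the function name are hypothetical.

```python
class Node:
    def __init__(self, data, previous=None):
        self.data = data
        self.left = None
        self.right = None
        self.previous = previous       # pointer to the parent node

    def add(self, side, data):
        """Attach a new child on 'left' or 'right' and return it."""
        child = Node(data, previous=self)
        setattr(self, side, child)
        return child

def preorder(root):
    """Yield data depth-first without recursion or an explicit stack."""
    if root is None:
        return
    stop = root.previous               # where the walk ends (None for a full tree)
    node, came_from = root, stop
    while node is not stop:
        if came_from is node.previous:         # arrived from the parent: first visit
            yield node.data
            if node.left is not None:
                nxt = node.left
            elif node.right is not None:
                nxt = node.right
            else:
                nxt = node.previous            # leaf: back up one level
        elif came_from is node.left and node.right is not None:
            nxt = node.right                   # left subtree done, go right
        else:
            nxt = node.previous                # both subtrees done, back up
        came_from, node = node, nxt

root = Node(1)
two = root.add("left", 2)
root.add("right", 3)
two.add("left", 4)
two.add("right", 5)
print(list(preorder(root)))            # [1, 2, 4, 5, 3]
```

The only state carried between steps is the current node and the previous one, which is the "compare the walker with the child nodes" trick from the answer.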
Making it faster:
Given that you could define some deterministic condition for the order of traversal, this can become quite flexible, and even be used in applications like ray tracing.
*: http://en.wikipedia.org/wiki/Depth-first_search
Bonus: This paper on traversal algorithms for Kd-trees in Ray Tracing: Review: Kd-tree Traversal Algorithms for Ray Tracing (http://dcgi.felk.cvut.cz/home/havran/ARTICLES/cgf2011.pdf)
Indeed nodes of a binary tree are often implemented with pointers to the left and right child and the parent (see this implementation of red black trees).
But you do not always need a parent pointer:
For an inorder-traversal you can use a recursive algorithm so that the call stack takes care of that for you.
If you want to access the min or max node you can simply maintain an extra pointer to them.
Sometimes you can use a finger tree.
Or organize your pointers extra cleverly (see Self adjusting binary search trees page 666):
The left pointer of a node points to the first (left) child
The right pointer of a node points to either the sibling (if it is a left child) or back to the parent (if it is a right child)
Extra cool: Threaded binary search trees for extra easy inorder (and reverse order) traversal without a stack - so O(1) space!

How does a red-black tree work?

There are lots of questions around about red-black trees but none of them answer how they work. Why is it called red-black? How does this keep the tree balanced (thus increasing performance over an unbalanced normal binary search tree)? I'm just looking for an overview of how and why it works.
For searches and traversals, it's the same as any binary tree.
For inserts and deletes, more sophisticated algorithms are applied which aim to ensure that the tree cannot be too unbalanced. These guarantee that all single-item operations will always run in at worst O(log n) time, whereas a simple binary search tree can become so unbalanced that it's effectively a linked list, giving O(n) worst case performance for each single-item operation.
The basic idea of the red-black tree is to imitate a B-tree with up to 3 keys and 4 children per node. B-trees (or variations such as B+ trees) are mainly used for database indexes and for data stored on hard disk.
Each binary tree node has a "colour" - red or black. Each black node is, in the B-tree analogy, the subtree root for the subtree that fits within that B-tree node. If this node has red children, they are also considered part of the same B-tree node. So it is possible (though not done in practice) to convert a red-black tree to a B-tree and back, with (most) structure preserved. The only possible anomaly is that when a B-tree node has two keys and three children, you have a choice of which key goes in the black node in the equivalent red-black tree.
For example, with red-black trees, every line from root to leaf has the same number of black nodes. This rule is derived from the B-tree rule that all leaf nodes are at the same depth.
Although this is the basic idea from which red-black trees are derived, the algorithms used in practice for inserts and deletes are modified to enforce all the B-tree rules (there might be a minor exception - I forget) during updates, but are tailored for the binary tree form. This means that doing a red-black tree insert or delete may give a different structure for the result than that you'd expect comparing with doing the B-tree insert or delete.
For more detail, follow the Wikipedia link that MigDus already supplied.
A red-black tree is an ordered binary tree where each vertex is coloured red or black. The intuition is that a red vertex should be seen as being at the same height as its parent (i.e., an edge to a red vertex is thought of as "horizontal" rather than "descending").
[I don't believe the Wikipedia entry makes this point clear.]
The usual rules for red-black trees require that a red vertex never point to another red vertex. This means that the possible vertex arrangements for any subtree rooted with a black vertex (bbb, bbr, rbb, rbr -- for [left child][root][right child]) correspond to the nodes of 2-3-4 trees.
Searching a red-black tree is just the same as searching an ordinary binary tree. Insertion and deletion are similar, except that a "fix-up" rotation may be required at some point to preserve the red-black invariant.
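The two rules mentioned in these answers (no red vertex pointing to a red vertex, and the same number of black nodes on every root-to-leaf path) are easy to check mechanically. A small sketch, where the `Node` layout and the function name `black_height` are assumptions for illustration only:

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def black_height(node):
    """Return the black height of a subtree, or raise ValueError if a rule is broken.

    Missing children count as black leaves, so an empty subtree has height 1.
    """
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has a red child")
    left_height = black_height(node.left)
    right_height = black_height(node.right)
    if left_height != right_height:
        raise ValueError(f"black counts differ below node {node.key}")
    return left_height + (1 if node.color == BLACK else 0)

# One black vertex with two red children: in the B-tree analogy this is
# a single node holding three keys.
ok = Node(10, BLACK, Node(5, RED), Node(15, RED))
print(black_height(ok))                # 2

# A black child on one side only makes the black counts unequal.
bad = Node(10, BLACK, Node(5, BLACK))
try:
    black_height(bad)
except ValueError as err:
    print("invalid:", err)             # invalid: black counts differ below node 10
```

This only checks the colouring rules; it does not check key ordering or that the root is black.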
Cheers!

tree traverse recursive in level-first order and depth-first order

Is there any algorithm that can traverse a tree recursively in level-first order and non-recursively in postorder? Thanks a lot.
To get an effectively recursive breadth-first search you can use iterative deepening depth-first search. It's particularly good for situations where the branching factor is high, where regular breadth-first search tends to choke from excessive memory consumption.
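A minimal sketch of iterative deepening used for a level-order traversal of a binary tree. The `Node` class and function names are hypothetical. Note the trade-off: each pass re-walks the upper levels, so this saves memory (no queue) at the cost of repeated work.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def visit_level(node, depth, out):
    """Recursively append the values exactly `depth` levels below `node`.

    Returns True if at least one node exists at that depth.
    """
    if node is None:
        return False
    if depth == 0:
        out.append(node.value)
        return True
    found_left = visit_level(node.left, depth - 1, out)
    found_right = visit_level(node.right, depth - 1, out)
    return found_left or found_right

def level_order(root):
    """Level-order traversal by running a depth-limited DFS for depth 0, 1, 2, ..."""
    out, depth = [], 0
    while visit_level(root, depth, out):
        depth += 1
    return out

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(level_order(tree))               # [1, 2, 3, 4, 5]
```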
Edit: Marcos Marin already mentioned it, but for the sake of completeness, the Wikipedia page on breadth-first traversal describes the algorithm thus:
1. Enqueue the root node.
2. Dequeue a node and examine it.
   a. If the element sought is found in this node, quit the search and return a result.
   b. Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
3. If the queue is empty, every node on the graph has been examined – quit the search and return "not found".
4. Repeat from Step 2.
Note: Using a stack instead of a queue would turn this algorithm into a depth-first search.
That last line is, obviously, interesting to you if you want to do a non-recursive depth-first traversal. Getting pre- or post-order is just a matter of modifying how you append the nodes in step 2.b.
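As a rough illustration of the queue-versus-stack remark, here is one loop that does either traversal on a binary tree, depending on which end of a `collections.deque` it takes the next node from. The `Node` class and the `depth_first` flag are my own naming; this version collects all values instead of searching for one element.

```python
from collections import deque

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def traverse(root, depth_first=False):
    """Breadth-first when used as a queue, pre-order depth-first when used as a stack."""
    order = []
    pending = deque([root] if root is not None else [])
    while pending:
        # Queue: take the oldest entry. Stack: take the newest entry.
        node = pending.pop() if depth_first else pending.popleft()
        order.append(node.value)
        children = [c for c in (node.left, node.right) if c is not None]
        if depth_first:
            children.reverse()         # push right first so left is popped first
        pending.extend(children)
    return order

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(traverse(tree))                     # [1, 2, 3, 4, 5]
print(traverse(tree, depth_first=True))   # [1, 2, 4, 5, 3]
```

Because a tree has no cycles, the "not yet discovered" check from the quoted algorithm is not needed here; a general graph would need a visited set.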
You can recurse a tree in post order iteratively by using a stack instead of the implicit call stack used in recursion.
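One possible way to do this with a single explicit stack is sketched below. The `Node` class and the `last_visited` bookkeeping are assumptions of this sketch; there are other common variants (for example, two stacks).

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def postorder(root):
    """Post-order traversal with an explicit stack instead of recursion."""
    out, stack = [], []
    node, last_visited = root, None
    while stack or node is not None:
        if node is not None:
            stack.append(node)         # descend left as far as possible
            node = node.left
        else:
            top = stack[-1]
            if top.right is not None and last_visited is not top.right:
                node = top.right       # right subtree not done yet
            else:
                out.append(top.value)  # both subtrees done: visit the node
                last_visited = stack.pop()
    return out

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(postorder(tree))                 # [4, 5, 2, 3, 1]
```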
Wikipedia says,
Traversal
Compared to linear data structures like linked lists and one dimensional arrays, which have only one logical means of traversal, tree structures can be traversed in many different ways. Starting at the root of a binary tree, there are three main steps that can be performed and the order in which they are performed defines the traversal type.
These steps (in no particular order) are: performing an action on the current node (referred to as "visiting" the node), traversing to the left child node, and traversing to the right child node. Thus the process is most easily described through recursion.
To traverse a non-empty binary tree in preorder, perform the following operations recursively at each node, starting with the root node:
1. Visit the node.
2. Traverse the left subtree.
3. Traverse the right subtree.
(This is also called Depth-first traversal.)
To traverse a non-empty binary tree in inorder, perform the following operations recursively at each node:
1. Traverse the left subtree.
2. Visit the node.
3. Traverse the right subtree.
(This is also called Symmetric traversal.)
To traverse a non-empty binary tree in postorder, perform the following operations recursively at each node:
1. Traverse the left subtree.
2. Traverse the right subtree.
3. Visit the node.
Finally, trees can also be traversed in level-order, where we visit every node on a level before going to a lower level. This is also called Breadth-first traversal.

Resources