Binary Search Tree - Deletion vs Insertion. Which is 'faster'?

Investigate what must happen to delete a key from a Binary Search Tree. Is deletion always as fast as insertion?
I looked up insertions and deletions in a BST. It appears that a deletion is more complex because the nodes need to be re-routed, which also means that the keys need to be reassigned and reorganized.
As far as speed is concerned, based on the complexity of the deletion, I assume that this means that a deletion is not as fast as an insertion.
Is this a correct assumption? Thanks

Although it may initially appear that insertion should be faster, I'm not at all convinced that this is really true, at least to any significant degree.
When we do an insertion, we always insert the new node as a child of a leaf node. We have to traverse the tree to the leaf node to do the insertion.
When we do a deletion, we have three cases to consider. The simplest is that we're deleting a leaf node. In such a case, we set the parent's pointer to that leaf node to a null pointer, and we release the memory occupied by the leaf. Not really any different from the insertion.
If the node to be deleted is a non-leaf node with one child, the task is only marginally more difficult: we set the parent of the current node to point to the child of the node to be deleted, and (again) release the memory occupied by the node we're deleting.
The only time we encounter anything that could be considered extra work at all is when we have to delete a non-leaf node that has two children. In this case, we need to find its in-order neighbor among its descendants: either the right-most descendant of its left child, or the left-most descendant of its right child. That node has at most one child (it is often, but not always, a leaf). We move it into the place of the node we're deleting, reattach its one child (if any) to its old parent, and release the memory of the deleted node.
The thing to keep in mind here is that for insertion, we started by traversing the tree to a leaf, then we insert. In the case of deletion, it's possible that we reach the node to delete before traversing all the way to a leaf, but even in the worst case, we still just continue traversing downward until we reach the replacement node at or near a leaf (something we do for insertion anyway), then assign pointers to move that node into the place of the one being deleted.
There might be one or two extra assignments here (depending primarily on how you implement things), but at most the difference is extremely small.
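To make the comparison concrete, here is a minimal Python sketch of both operations on a plain, unbalanced BST. The names (Node, insert, delete, inorder) are made up for illustration, and the two-child case copies the successor's key into the node rather than relinking the successor node itself, which is one common variation of the swap described above. Note that both operations consist of one walk down the tree followed by a constant number of pointer assignments.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Walk down to an empty child slot and attach a new node there."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # key already present, nothing to do


def delete(root, key):
    """Remove key (if present) and return the possibly new root."""
    parent, node = None, root
    while node is not None and node.key != key:
        parent, node = node, (node.left if key < node.key else node.right)
    if node is None:
        return root  # key not found

    if node.left is not None and node.right is not None:
        # Two children: keep walking down to the in-order successor,
        # copy its key up, then unlink the successor instead.
        parent, succ = node, node.right
        while succ.left is not None:
            parent, succ = succ, succ.left
        node.key = succ.key
        node = succ

    # Here node has at most one child, so it can be spliced out.
    child = node.left if node.left is not None else node.right
    if parent is None:
        return child
    if parent.left is node:
        parent.left = child
    else:
        parent.right = child
    return root


def inorder(root):
    """Return the keys in sorted order (used only for checking)."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

In Python the memory release is left to the garbage collector; in a language with manual memory management, the unlinked node would be freed at the point where it is spliced out.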
From a practical viewpoint, any real performance difference is likely to come down to one question: whether the memory management you're using attempts to balance the costs of allocation and deletion, or favors one over the other (and if so, which).
In short, depending on how your heap is managed, the slowest part of this may well be allocation or deletion of the memory for the node, and the tree manipulation is basically lost in the noise.

Related

How to check if two binary trees share a node

Given an array of binary trees find whether any two trees share a node, not value wise, but "pointer" wise. At the bottom I provided an example.
My approach was to iterate through all the trees and store all the leaves (pointers) from each tree into a list, then check if list has any duplicates, but that's a rather slow approach. Is there perhaps a quicker way to solve this?
In the worst case you will have to traverse all nodes (all pointers) to find a shared node (pointer), as it might happen to be the last one visited. So the best time complexity we can expect to have is O(𝑚+𝑛) where 𝑚 and 𝑛 represent the number of nodes in either tree.
We can achieve this time complexity if we store the pointers from the first tree in a hash set and then traverse the pointers of the second tree to see if any of those is in the set. Assuming that get/set operations on a hash set have an amortized constant time complexity, the overall time complexity will be O(𝑚+𝑛).
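A minimal sketch of the hash-set idea in Python, extended to a list of trees. The Node class and the function names are assumptions for illustration; identity is tracked with the built-in id(), which plays the role of the pointer value here.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def iter_nodes(root):
    """Yield every node of a tree, using an explicit stack."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        yield node
        if node.left is not None:
            stack.append(node.left)
        if node.right is not None:
            stack.append(node.right)


def any_shared_node(trees):
    """Return True if some node object appears in two of the given trees."""
    seen = set()  # identities of nodes from the trees processed so far
    for root in trees:
        current = set()
        for node in iter_nodes(root):
            if id(node) in seen:
                return True
            current.add(id(node))
        seen |= current
    return False
```

Two nodes that merely hold equal values are not reported, because only object identity is compared.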
If the same program is responsible for constructing the trees, then a reuse of the same node can be detected upon insertion. For instance, reuse of the same node in multiple trees can be completely avoided by having the insert method of your tree only take a value as argument, never a node instance. The method will then encapsulate the actual creation of the node, guaranteeing its uniqueness.
An idea for O(#nodes) time and O(1) space. It does more traversal work than simple traversals using a hash table, but it doesn't have the cost of using a hash table. I don't know what's better. Might depend on the language.
For two trees
Create one extra node. Do a Morris traversal of the first tree. It only modifies right child pointers, so we can use left child pointers for marking nodes as seen. For every tree node without left child, set our extra node as left child. Whenever checking a left child pointer, treat our extra node like a null pointer, i.e., don't visit it. After the traversal, the tree structure is restored, and all originally left-child-less tree nodes now point to our extra node as left child. That includes all leaf nodes.
Do a Morris traversal of the second tree. Again treat pointers to our extra node like null pointers. If we ever do encounter our extra node, we know the trees share a node. If not, then we know the trees don't share a node, since if they did share any, they'd also share a leaf node (just go down from any shared node to a leaf node, that's also shared), and all leaf nodes of the first tree are marked. After the traversal, the second tree is restored.
Do a Morris traversal of the first tree again, this time removing our extra node, restoring the original null pointers.
For an array of more than two trees
Mark the first tree as above. Check the second tree as above. Mark the second tree. Check the third. Mark the third. Check the fourth. Mark the fourth. Etc. Once you have found a shared node or there are no more trees, unmark the marked trees.
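The two-tree version of the marking idea above could look roughly like this in Python. This is a sketch under assumptions: the Node class and the helper names are invented for illustration, and the second traversal is always run to completion (rather than stopping at the first hit) so that the temporary Morris threads are removed again.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def _morris(root, mark, on_leftless):
    """Morris in-order walk that treats `mark` like a null left child.

    Calls on_leftless(node) for every node that has no real left child.
    All temporary right-pointer threads are removed before returning.
    """
    cur = root
    while cur is not None:
        if cur.left is None or cur.left is mark:
            on_leftless(cur)
            cur = cur.right  # real right child, or a thread back up
        else:
            pred = cur.left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur   # create temporary thread
                cur = cur.left
            else:
                pred.right = None  # second visit: remove the thread
                cur = cur.right


def share_a_node(a, b):
    """True if trees a and b have a node object in common. O(1) extra space."""
    mark = Node()  # the single extra node
    found = []

    def set_mark(node):
        node.left = mark

    def check(node):
        if node.left is mark:
            found.append(node)

    def clear_mark(node):
        node.left = None

    _morris(a, mark, set_mark)    # mark every left-child-less node of a
    _morris(b, mark, check)       # look for marked nodes while walking b
    _morris(a, mark, clear_mark)  # restore the original null pointers
    return bool(found)
```

Both trees are modified while the function runs and restored before it returns, so this is not safe if another thread reads the trees at the same time.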
Every shared node must have two parents, or an ancestor with two parents.
LOOP over nodes
    IF node has two parents
        MARK node as shared
        MARK all descendants as shared

Rope and self-balancing binary tree hybrid? (i.e Sorted set with fast n-th element lookup)

Is there a data structure for a sorted set that allows quick lookup of the n-th item (i.e. the n-th smallest)? That is, something like a hybrid between a rope and a red-black tree.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations or do something else clever and I'm hoping someone smart has already worked this out.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations […]
Yes, this is quite possible; but instead of keeping track of the size of the left subtree, it's a bit simpler to keep track of the size of the complete subtree rooted at a given node. (You can then get the size of its left subtree by examining its left-child's size.) It's not as tricky as you might think, because you can always re-calculate a node's size as long as its children are up-to-date, so you don't need any extra bookkeeping beyond making sure that you recalculate sizes by working your way up the tree.
Note that, in most mutable red-black tree implementations, 'put' and 'delete' stop walking back up the tree once they've restored the invariants, whereas with this approach you need to walk all the way back up the tree in all cases. That'll be a small performance hit, but at least it's not hard to implement. (In purely functional red-black tree implementations, even that isn't a problem, because those always have to walk the full path back up to create the new parent nodes. So you can just put the size-calculation in the constructor — very simple.)
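As a rough illustration of the size-per-subtree idea, here is a Python sketch on a plain, unbalanced BST. The names Node, size, insert and select are assumptions, and rebalancing is deliberately left out; in a red-black tree the same one-line size recalculation would also run after each rotation and on the walk back up to the root.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    """Subtree size, with an empty subtree counting as 0."""
    return node.size if node is not None else 0


def insert(node, key):
    """Unbalanced BST insert that recomputes sizes on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    # Children are up to date here, so the size can simply be recalculated.
    node.size = 1 + size(node.left) + size(node.right)
    return node


def select(node, n):
    """Return the n-th smallest key (0-based); raise IndexError if absent."""
    while node is not None:
        left = size(node.left)
        if n < left:
            node = node.left
        elif n == left:
            return node.key
        else:
            n -= left + 1  # skip the left subtree and this node
            node = node.right
    raise IndexError("rank out of range")
```

The lookup follows a single root-to-leaf path, so it costs time proportional to the tree height, which is O(log n) once the tree is kept balanced.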
Edited in response to your comment:
I was vaguely hoping this data structure already had a name so I could just find some implementations out there and that there was something clever one could do to minimize the updating but (while I can find plenty of papers on data structures that are variations of balanced binary trees) I can't figure out a good search term to look for papers that let one lookup the nth least element.
The fancy term for the nth smallest value in a collection is order statistic; so a tree structure that enables fast lookup by order statistic is called an order statistic tree. That second link includes some references that may help you — not sure, I haven't looked at them — but regardless, that should give you some good search terms. :-)
Yes, this is fully possible. Self-balancing tree algorithms do not actually need to be search trees, that is simply the typical presentation. The actual requirement is that nodes be ordered in some fashion (which a rope provides).
What is required is to update the tree weight on insert and erase. Rotations do not require a full update, local is enough. For example, a left rotate requires that the weight of the parent be added to the new parent (since that new parent is the old parent's right child it is not necessary to walk down the new parent's right descent tree since that was already the new parent's left descent tree). Similarly, for a right rotate it is necessary to subtract the weight of the new parent only, since the new parent's right descent tree will become the left descent tree of the old parent.
I suppose it would be possible to create an insert that updates the weight as it does rotations then adds the weight up any remaining ancestors but I didn't bother when I was solving this problem. I simply added the new node's weight all the way up the tree then did rotations as needed. Similarly for erase, I did the fix-up rotations then subtracted the weight of the node being removed before finally unhooking the node from the tree.
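The local rotation updates described above can be sketched as follows. This assumes that "weight" means one plus the number of nodes in the left subtree (the rope-style convention with every node counting as 1), which is an interpretation of the answer rather than something it states outright; all names are illustrative, and the insert is unbalanced and assumes distinct keys.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.weight = 1  # assumed meaning: 1 + size of the left subtree


def insert(node, key):
    """Unbalanced insert; bump the weight of each node we descend left from."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.weight += 1
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def rotate_left(p):
    """p's right child becomes the new parent; only its weight changes."""
    r = p.right
    p.right = r.left
    r.left = p
    r.weight += p.weight  # r's left subtree now also holds p and p's left subtree
    return r


def rotate_right(p):
    """p's left child becomes the new parent; only p's weight changes."""
    l = p.left
    p.left = l.right
    l.right = p
    p.weight -= l.weight  # p's left subtree lost l and l's left subtree
    return l


def select(node, k):
    """Return the k-th smallest key (1-based); raise IndexError if absent."""
    while node is not None:
        if k == node.weight:
            return node.key
        if k < node.weight:
            node = node.left
        else:
            k -= node.weight
            node = node.right
    raise IndexError("rank out of range")
```

Neither rotation walks into any subtree: each one adjusts a single weight using values already stored in the two nodes involved.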

Why do nodes of a binary tree have links only from parent to children?

Why do nodes of a binary tree have links only from parent to children? I know that there are threaded binary trees, but those are harder to implement. A binary tree with links in both directions would allow iterative traversal in both directions without a stack or queue.
I do not know of any such design. If there is one please let me know.
Edit1: Let me conjure a problem for this. I want to do traversal without recursion and without using extra memory in form of stack or queue.
PS: I am afraid that I am going to get flak and downvotes for this stupid question.
Some binary trees do require children to keep a reference to their parent, or even their grandparent, e.g. Splay Trees. However this is only to balance or splay the tree. The reason we only traverse a tree from the parent to the children is because we are usually searching for a specific node, and as long as the binary tree is implemented such that all left children are less than the parent, and all right children are greater than the parent (or vice-versa), we only need links in one direction to find that node. We start the search at the root and then iterate down, and if the node is in the tree, we are guaranteed to find it. If we started at a leaf, there is no guarantee we would find the node we want by going back to the root. The reason we don't have links from the child to the parent is because it is unnecessary for searches. Hope this helps.
It can be, however, we should consider the balance between the memory usage and the complexity.
Yes, you can traverse the binary tree using an extra link in each node, but then you are actually using as much extra memory as a traversal with a queue would, and the queue-based traversal even runs faster.
What a binary search tree is good at is solving many search problems in O(log N) time, provided it is balanced. It's fast enough and saves memory.
Let me conjure a problem for this. I want to do traversal without recursion and without using extra memory in form of stack or queue.
Have you considered that the parent pointers in the tree occupy space themselves?
They add O(N) memory to the tree to store the parent pointers, just to avoid using O(log N) stack space during recursion (O(log N) assuming the tree is balanced).
What parent pointers allow us to do is to support an API whereby the caller can pass a pointer to a node and request an operation on it like "find the next node in order" (for example).
In this situation, we do not have a stack which holds the path to the root; we just receive a node "out of the blue" from the caller. With parent pointers, given a tree node, we can find its successor in amortized constant time O(1).
Implementations which don't require this functionality can save space by not including the parent pointers in the tree, and using recursion or an explicit stack structure for the root to leaf traversals.
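A small Python sketch of what parent pointers buy: finding the in-order successor of an arbitrary node, and from that an iterative traversal with no stack, queue or recursion. The class and function names are assumptions for illustration, and the insert is a plain unbalanced BST insert.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None


def insert(root, key):
    """Unbalanced BST insert that also records each new node's parent."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key, parent=node)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key, parent=node)
                return root
            node = node.right


def leftmost(node):
    while node.left is not None:
        node = node.left
    return node


def successor(node):
    """Next node in sorted order, or None if node holds the largest key."""
    if node.right is not None:
        return leftmost(node.right)
    # Climb until we leave a left subtree; that ancestor comes next.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent


def inorder(root):
    """Yield keys in sorted order using only O(1) extra space."""
    node = leftmost(root) if root is not None else None
    while node is not None:
        yield node.key
        node = successor(node)
```

A single successor call can climb or descend a whole root-to-leaf path, but over a full traversal each edge is crossed at most twice, which is where the amortized constant cost per step comes from.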

What is the purpose behind marking some of the nodes in Fibonacci heap?

This picture from the Wikipedia article has three nodes of a Fibonacci heap marked in blue. What is the purpose of some of the nodes being marked in this data structure?
Intuitively, the Fibonacci heap maintains a collection of trees of different orders, coalescing them when a delete-min occurs. The hope in constructing a Fibonacci heap is that each tree holds a large number of nodes. The more nodes in each tree, the fewer the number of trees that need to be stored in the heap, and therefore the less time will be spent coalescing trees on each delete-min.
At the same time, the Fibonacci heap tries to make the decrease-key operation as fast as possible. To do this, Fibonacci heaps allow subtrees to be "cut out" of other trees and moved back up to the root. This makes decrease-key faster, but makes each tree hold fewer nodes (and also increases the number of trees). There is therefore a fundamental tension in the structure of the design.
To get this to work, the shape of the trees in the Fibonacci heap has to be somewhat constrained. Intuitively, the trees in a Fibonacci heap are binomial trees that are allowed to lose a small number of children. Specifically, each non-root node is allowed to lose one child; when it loses a second, it has to be cut out and "reprocessed" at a later step. The marking step in the Fibonacci heap allows the data structure to count how many children have been lost so far. An unmarked node has lost no children since it last became a child of another node, and a marked node has lost one child. Once a marked node loses another child, it has lost two children and thus needs to be moved back to the root list for reprocessing.
The specifics of why this works are documented in many introductory algorithms textbooks. It's not obvious that this should work at all, and the math is a bit tricky.
Hopefully this provides a useful intuition!
A node is marked when one of its child nodes is cut because of a decrease-key. When a second child is cut, the node also cuts itself from its parent. Marking is done so that you know when the second cut occurs.
Good Explanation from Wiki: Operation decrease key will take the node, decrease the key and if the heap property becomes violated (the new key is smaller than the key of the parent), the node is cut from its parent. If the parent is not a root, it is marked. If it has been marked already, it is cut as well and its parent is marked. We continue upwards until we reach either the root or an unmarked node. Now we set the minimum pointer to the decreased value if it is the new minimum.
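The cut and mark steps of decrease-key can be sketched in Python as below. This is deliberately not a full Fibonacci heap: there is no delete-min or consolidation, children are kept in a plain list, and add_child exists only to build example trees by hand. All class and method names are assumptions for illustration.

```python
class FibNode:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.children = []
        self.marked = False


class FibHeapSketch:
    def __init__(self):
        self.roots = []
        self.min = None

    def add_root(self, node):
        self.roots.append(node)
        if self.min is None or node.key < self.min.key:
            self.min = node

    def add_child(self, parent, child):
        """Demo helper for building a tree by hand (not a heap operation)."""
        child.parent = parent
        parent.children.append(child)

    def decrease_key(self, node, new_key):
        if new_key > node.key:
            raise ValueError("new key is larger than the current key")
        node.key = new_key
        parent = node.parent
        if parent is not None and node.key < parent.key:
            self._cut(node)               # heap property violated
            self._cascading_cut(parent)
        if self.min is None or node.key < self.min.key:
            self.min = node

    def _cut(self, node):
        """Move node (with its subtree) to the root list and unmark it."""
        node.parent.children.remove(node)
        node.parent = None
        node.marked = False
        self.roots.append(node)

    def _cascading_cut(self, node):
        """node has just lost a child: mark it, or cut it if already marked."""
        while node.parent is not None:    # roots are never marked
            if not node.marked:
                node.marked = True        # first lost child
                return
            parent = node.parent
            self._cut(node)               # second lost child
            node = parent
```

For example, if a node b loses one child through decrease_key it becomes marked; when a second child of b is cut, b itself is moved to the root list and b's parent is marked in turn.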

Are keys in B-tree nodes duplicated when the node is split?

When a node in a B-tree is split, are keys from the original node duplicated in the new nodes? What's the purpose of doing this? Isn't this inefficient?
No. It's all done with pointers. Half of the pointers are moved to the new node.
Of course, there's no such thing as 'a B-tree'. There are a myriad of different implementations. I could imagine one in which the keys are actually stored in the nodes, such as a tree where the keys are ints. But they still wouldn't be 'duplicated', just the data copied.
If your beef is the storage left behind in the node that gets split, well, that's another optimization choice whether to free and reallocate smaller or not. Probably not, since more insertions could arrive that go into that node's 1/2 of the key space.
I think that you mean a B+ tree.
In a B+ tree that I wrote, I did duplicate the key values in the parent node during a split. key[pos] in the parent was set to the left node's lowest value and pointed to the left node. The right node's lowest value became key[pos+1] in the parent.
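To illustrate the difference the answers are pointing at, here is a small Python sketch of the key movement only, in the usual textbook form: in a B-tree split the middle key moves up to the parent, while in a B+ tree leaf split the separator is a copy of the right leaf's first key. The function names are made up, nodes are reduced to sorted lists of keys, and child pointers and the parent update are omitted. The B+ tree described just above uses a slightly different parent layout than this sketch.

```python
def split_btree_node(keys):
    """B-tree style split: the median key moves up; it is not duplicated."""
    mid = len(keys) // 2
    left, median, right = keys[:mid], keys[mid], keys[mid + 1:]
    return left, median, right


def split_bplus_leaf(keys):
    """B+ tree leaf split: the separator is a copy of the right leaf's first key."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    separator = right[0]  # also remains stored in the right leaf
    return left, separator, right
```

With keys [10, 20, 30, 40, 50], the first function yields [10, 20], 30 and [40, 50], while the second yields [10, 20], 30 and [30, 40, 50], so only the B+ tree variant ends up with 30 stored twice.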

Resources