Fair deletion of nodes in Binary Search Tree - algorithm

The idea of deleting a node in a BST is:
If the node has no children, delete it and set its parent's pointer to it to null.
If the node has one child, replace the node with its child by updating the parent's pointer to point to that child.
If the node has two children, find the node's predecessor and replace the node's key with the predecessor's key, then update the predecessor's parent's pointer to point to the predecessor's only possible child (which can only be a left child).
The last case can also be done using the successor instead of the predecessor.
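Putting the three cases together, here is a minimal sketch in Python (a hypothetical `Node` class, using the predecessor for the two-children case; `insert` and `inorder` are included only to make the sketch self-contained):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Standard BST insert; returns the (possibly new) subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """Sorted list of keys, used here just to check results."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

def delete(root, key):
    """Delete `key` from the subtree rooted at `root`; return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases 1 and 2: at most one child -- splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children -- copy in the predecessor's key (the
        # maximum of the left subtree), then delete the predecessor,
        # which has at most a left child.
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    return root
```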
It's said that if we use the predecessor in some cases and the successor in others (giving them equal priority), we get better empirical performance.
Now the questions are: how is this done? Based on what strategy? And how does it affect performance? (I guess by performance they mean time complexity.)
What I think is that we have to choose between predecessor and successor so as to keep the tree more balanced, but I don't know how to choose which one to use.
One solution is to choose one of them at random (fair randomness), but isn't it better to base the strategy on the tree's structure? The question is: WHEN to choose WHICH?

The thing is that this is a fundamental problem: finding a correct removal algorithm for BSTs. People tried to solve it for 50 years (just like in-place merge) and didn't find anything better than the usual algorithm (removal via predecessor/successor). So, what is wrong with the classic algorithm? Actually, this kind of removal unbalances the tree: after enough random add/remove operations you'll end up with an unbalanced tree of height sqrt(n). And it doesn't matter which you chose (removing the successor, the predecessor, or choosing randomly between the two): the result is the same.
So, what to choose? I'm guessing that randomly choosing (successor or predecessor) on each deletion will postpone the unbalancing of your tree. But if you want a perfectly balanced tree, you have to use red-black trees or something like that.
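For illustration, here is a sketch of the randomized variant: the two-children case flips a fair coin between predecessor and successor replacement (hypothetical `Node`/`delete` names, not taken from any particular library):

```python
import random

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def delete(root, key):
    """BST delete that flips a fair coin between predecessor and
    successor replacement in the two-children case (a sketch)."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right
    elif root.right is None:
        return root.left
    elif random.random() < 0.5:
        # Replace with the in-order predecessor (max of left subtree).
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    else:
        # Replace with the in-order successor (min of right subtree).
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root
```

Alternating deterministically between the two branches instead of calling `random.random()` gives the "toggle" strategy with the same code shape.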

As you said, it's a question of balance, so in general the method that disturbs the balance the least is preferable. You can maintain some metrics to measure the level of balance (e.g., the difference between the maximal and minimal leaf heights, the average height, etc.), but I'm not sure the overhead is worth it. There are also self-balancing data structures (red-black trees, AVL trees, etc.) that mitigate this problem by rebalancing after each deletion. If you want to use a basic BST, I suppose the best strategy without a priori knowledge of the tree structure and the deletion sequence is to alternate between the two methods on each deletion.

Related

Is there a balanced BST with each node maintaining its subtree size?

Is there a balanced BST structure that also keeps track of subtree size in each node?
In Java, TreeMap is a red-black tree, but doesn't provide subtree size in each node.
Previously, I wrote a BST that keeps track of each node's subtree size, but it's not balanced.
The questions are:
Is it possible to implement such a tree while keeping the basic operations O(log n)?
If yes, are there any third-party libraries that provide such an implementation?
A Java implementation would be great, but other languages (e.g. C, Go) would also be helpful.
BTW:
The subtree size should be stored in each node,
so that the size can be obtained without traversing the subtree.
Possible application:
keeping track of the ranks of items whose values (which the rank depends on) might change on the fly.
The Weight Balanced Tree (also called the Adams Tree, or Bounded Balance tree) keeps the subtree size in each node.
This also makes it possible to find the Nth element, from the start or end, in log(n) time.
My implementation in Nim is on github. It has these properties:
Generic (parameterized) key,value map
Insert (add), lookup (get), and delete (del) in O(log(N)) time
Key-ordered iterators (inorder and revorder)
Lookup by relative position from beginning or end (getNth) in O(log(N)) time
Get the position (rank) by key in O(log(N)) time
Efficient set operations using tree keys
Map extensions to set operations with optional value merge control for duplicates
There are also implementations in Scheme and Haskell available.
That's called an "order statistic tree": https://en.wikipedia.org/wiki/Order_statistic_tree
It's pretty easy to add the size to any kind of balanced binary tree (red-black, AVL, B-tree, etc.), or you can use a balancing algorithm that works with the size directly, like weight-balanced trees (see #DougCurrie's answer) or (better) size-balanced trees: https://cs.wmich.edu/gupta/teaching/cs4310/lectureNotes_cs4310/Size%20Balanced%20Tree%20-%20PEGWiki%20sourceMayNotBeFullyAuthentic%20but%20description%20ok.pdf
Unfortunately, I don't think there are any standard-library implementations, but you can find open source if you look for it. You may want to roll your own.
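To show the augmentation itself, here is a sketch of subtree-size bookkeeping on a plain (unbalanced) BST; a real O(log n) implementation would layer the same bookkeeping onto a red-black, AVL, or weight-balanced tree and update sizes during rotations:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(n):
    """Size of a possibly-empty subtree."""
    return n.size if n else 0

def insert(root, key):
    """Unbalanced BST insert that maintains subtree sizes."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size = 1 + size(root.left) + size(root.right)
    return root

def select(root, k):
    """Return the k-th smallest key (0-based) in O(height) time."""
    left = size(root.left)
    if k < left:
        return select(root.left, k)
    if k == left:
        return root.key
    return select(root.right, k - left - 1)

def rank(root, key):
    """Number of keys strictly smaller than `key`."""
    if root is None:
        return 0
    if key <= root.key:
        return rank(root.left, key)
    return size(root.left) + 1 + rank(root.right, key)
```

`select` and `rank` are exactly the order-statistic operations: both just compare `k` or `key` against the stored left-subtree size at each step, so no subtree is ever traversed to count it.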

Mirror a binary tree in constant time

This is not a homework question. I heard that it is possible to mirror a binary tree i.e. flip it, in constant time. Is this really the case?
Sure, depending on your data structure: instead of traversing down the left node and then the right node, you would traverse down the right node and then the left node. This could be a parameter passed into the recursive function that traverses the tree (e.g., in C/C++, a bool bDoLeftFirst, with an if-statement that uses that parameter to decide which order to traverse the child nodes in).
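One way to sketch this is to keep a `flipped` flag on the tree and let every traversal interpret left/right through it (hypothetical `Tree`/`Node` classes; physically swapping every node would be O(n)):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

class Tree:
    def __init__(self, root):
        self.root = root
        self.flipped = False

    def mirror(self):
        """O(1) mirror: no node is touched, only the flag changes."""
        self.flipped = not self.flipped

    def inorder(self):
        """In-order traversal that respects the flipped flag."""
        def walk(n):
            if n is None:
                return []
            first, second = (n.right, n.left) if self.flipped else (n.left, n.right)
            return walk(first) + [n.key] + walk(second)
        return walk(self.root)
```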
Did you mean "invert binary tree", the problem that Max Howell could not solve, which got him rejected by Google?
https://leetcode.com/problems/invert-binary-tree/
You can find solutions in the "discuss" section.

While serializing BST why is in-order essential?

I learnt that, to retain the structure of a BST while serializing it, one needs to store its in-order notation plus one of either its pre-order or post-order notations.
What makes in-order notation essential?
Note: Rewrote the answer, the previous version was incorrect.
For a general binary tree (with unique elements) your statement would be correct. Consider, for instance, these two inputs: the tree with root B and children A and C, and the right-leaning chain A → B → C.
If you serialize these using in-order traversal, both yield ABC. Similar cases exist for the other traversal types.
So why is a combination of in-order and pre-order enough?
The serialized shape of pre-order is [root][left subtree][right subtree]. The root is easy to identify, but you don't know where the left subtree ends and the right subtree begins.
Now consider in-order serialized: [left subtree][root][right subtree]. You know what the root is (thanks to pre-order), so it is really easy to identify the left and right subtrees.
Note that this is still not enough if the elements are not unique. If in the above example we change B into A, both trees would yield [AAC] for both traversal types.
For binary search trees, deserialization is much easier. Why? Every subtree has the property that the nodes in its left subtree are smaller than its root, while the nodes in its right subtree are bigger. Therefore, the pre-order serialization [root][left subtree][right subtree] can easily and unambiguously be parsed again. So, in conclusion, the person who told you that at least two serialization approaches are needed for a BST was mistaken (maybe they forgot about the properties of a BST).
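The argument can be sketched in code: the BST key bounds tell the deserializer exactly where the left subtree ends, so pre-order alone is unambiguous (a sketch with a hypothetical `Node` class):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def serialize(root):
    """Pre-order serialization: [root][left subtree][right subtree]."""
    return [] if root is None else [root.key] + serialize(root.left) + serialize(root.right)

def deserialize(preorder, lo=float('-inf'), hi=float('inf')):
    """Rebuild a BST from its pre-order sequence.

    A key belongs to the current subtree only if it lies in (lo, hi);
    the first out-of-range key marks the end of that subtree.
    """
    if not preorder or not (lo < preorder[0] < hi):
        return None
    key = preorder.pop(0)  # consumes the list; pop(0) kept for clarity
    node = Node(key)
    node.left = deserialize(preorder, lo, key)
    node.right = deserialize(preorder, key, hi)
    return node
```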
Storing BSTs in some sort of order while serializing likely makes it simpler to build upon retrieval. Imagine that you have your BST and just pick nodes at random to serialize and store. When retrieving, it will retrieve in the order stored and then after the fact, something would have to go through and connect all of the nodes. While that should be possible - all the information is there - it seems unnecessary. Each node is just sort of floating; the deserialization process/program has to maintain a list of all the nodes (or similar) while it walks through the list connecting piece by piece.
On the other hand, if you store them in some sort of prescribed order, it can build the tree while reading in each node - it knows where to connect the nodes since they are in order (for clarity: this doesn't imply the next node must be connected to the previously-read node, in the case of adjacent leaves; it's just much simpler to hop up enough levels to the appropriate branch). This should be faster, and potentially use less memory (no list/container while building).

Insertion and removal from a B-tree

I have come across this question and haven't been able to answer it.
Given a B-tree of order 9 with 4 levels, will inserting a new item x and then immediately removing it always bring the tree back to its original structure?
Will removing and then re-inserting an existing item x always bring the tree back to its original structure?
Prove it.
So far I have tried to disprove it but haven't been able to.
I honestly can't find the answer. I'm not asking for a full proof; a general idea of how to prove it would satisfy me.
The answer obviously depends on the implementation of the insert and delete methods but in short: no.
I won't give you a full proof (because you didn't ask for one and because I'm too lazy), but the general idea is that, usually, when you delete a node, the innermost node on the opposite side (relative to the parent) takes its place. So in any scenario where that node exists, it will be moved up. That also means the deleted node was not a leaf, which is a problem because insertion usually adds new items to the tree at a leaf. So the original structure will only be maintained if the innermost node on the opposite side (relative to the parent) is empty.
This is the deletion I'm referring to. If you remove 2 and re-insert it, that's the counterexample.

In place min-max tree invalidation problems

I’m trying to build a parallel implementation of a min-max search. My current approach is to materialize the tree to a small depth and then do the normal thing from each of these nodes.
The simple way to do this is to compute the heuristic value for each leaf and then sweep up, computing the min/max. The problem is that this omits alpha/beta pruning at the upper levels and makes for a major performance hit.
My first "solution" was to push the min/max up after each leaf is computed. This gives an updating value, so I can scan up the tree and check whether a leaf should be pruned.
The problem is that it's totally broken. (2 days of debugging to notice that, darn I feel stupid)
Now for the question:
Is there a way to build a min-max tree that allows the leaves to be evaluated in random order and still allows alpha/beta pruning?
Check out parallel game tree search, e.g. this paper.
I think I have found a solution but I don't like it in a few regards:
Annotate the tree with the number of unfinished children.
After a leaf is evaluated, update its parent and decrement the parent's count.
If that count just reached zero, update that node's parent and decrement that count.
Lather, rinse, repeat.
Alpha/beta pruning works as expected.
The problem with this is that, with random-order evaluation, a lot more nodes will get evaluated before anything starts getting pruned. On the other hand, that might be mitigated by better ordering of the leaves.
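The counting scheme above can be sketched as follows (a hypothetical `GameNode` structure; the alpha/beta bookkeeping itself is omitted):

```python
class GameNode:
    def __init__(self, is_max, children=None, value=None):
        self.is_max = is_max            # True: take max of children, False: min
        self.children = children or []
        self.value = value              # set up front for leaves, None otherwise
        self.parent = None
        self.pending = len(self.children)  # count of unfinished children
        for c in self.children:
            c.parent = self

def finish_leaf(leaf):
    """Called when a leaf's heuristic value is ready; sweeps upward.

    Each parent accumulates a running min/max and decrements its
    unfinished-children count; only when the count hits zero is the
    parent itself complete, and the sweep continues another level up.
    """
    node = leaf
    while node.parent is not None:
        parent = node.parent
        best = max if parent.is_max else min
        parent.value = node.value if parent.value is None else best(parent.value, node.value)
        parent.pending -= 1
        if parent.pending > 0:
            break        # parent is still waiting on other subtrees
        node = parent    # parent complete: propagate further up
```

Because each parent keeps a running bound even before it is complete, a pruning check can read `parent.value` at any time, which is the "updating value" idea from the question.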
