While serializing a BST, why is in-order essential?

I learnt that, to retain the structure of a BST while serializing it, one needs to store the in-order traversal together with either the pre-order or the post-order traversal of the tree.
What makes in-order notation essential?

Note: Rewrote the answer, the previous version was incorrect.
For a general binary tree (with unique elements) your statement would be correct. Consider these two inputs (not very prettily drawn ;-) ):
If you serialize these using in-order traversal, both yield ABC. Similar cases exist for the other traversal types.
So why is a combination of in-order and pre-order enough?
The serialized shape of pre-order is [root][left subtree][right subtree]. The root is easy to identify, but you don't know where the left subtree ends and the right subtree begins.
Now consider in-order serialized: [left subtree][root][right subtree]. You know what the root is (thanks to pre-order), so it is really easy to identify the left and right subtrees.
Note that this is still not enough if the values are not unique. If in the above example we change B into A, both trees would yield [AAC] for both traversal types.
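As a rough sketch of the argument above (not taken from any particular library), the following rebuilds a general binary tree with unique values from its pre-order and in-order traversals. The names `Node` and `build`, and the two example shapes, are my own choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(preorder: Sequence[str], inorder: Sequence[str]) -> Optional[Node]:
    """Rebuild a binary tree from both traversals (values must be unique)."""
    if not preorder:
        return None
    root_value = preorder[0]             # pre-order always starts with the root
    split = inorder.index(root_value)    # in-order tells us how big the left subtree is
    left = build(preorder[1:1 + split], inorder[:split])
    right = build(preorder[1 + split:], inorder[split + 1:])
    return Node(root_value, left, right)


# Two shapes with the same in-order "ABC" but different pre-orders.
chain = build("ABC", "ABC")      # A, then B as right child, then C as right child
balanced = build("BAC", "ABC")   # B at the root with children A and C
```

The slicing keeps the sketch short; a production version would pass indices and use a value-to-position map to avoid the repeated `index` scans.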
For binary search trees deserialization is much easier. Why? Well, every subtree has the property that the nodes in the left subtree are smaller than the root, whereas the nodes in the right subtree are bigger. Therefore, the pre-order serialization [root][left subtree][right subtree] can easily and unambiguously be parsed again. So, in conclusion, the person who told you that at least two serialization approaches are needed for a BST was mistaken (maybe he also forgot about the properties of a BST).
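To make the BST case concrete, here is a minimal sketch that serializes with pre-order only and parses it back using the ordering property as the subtree boundary. It assumes distinct, comparable keys; `Node`, `serialize` and `deserialize` are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def serialize(node: Optional[Node], out: Optional[List[int]] = None) -> List[int]:
    """Pre-order: [root][left subtree][right subtree]."""
    out = [] if out is None else out
    if node is not None:
        out.append(node.key)
        serialize(node.left, out)
        serialize(node.right, out)
    return out


def deserialize(keys: List[int]) -> Optional[Node]:
    """Rebuild a BST from its pre-order listing in one left-to-right pass."""
    pos = 0

    def build(low: float, high: float) -> Optional[Node]:
        nonlocal pos
        # A key outside (low, high) cannot belong to this subtree,
        # so that is where the subtree ends.
        if pos == len(keys) or not (low < keys[pos] < high):
            return None
        node = Node(keys[pos])
        pos += 1
        node.left = build(low, node.key)
        node.right = build(node.key, high)
        return node

    return build(float("-inf"), float("inf"))
```

Each key is consumed exactly once, so the tree is built while reading the stream and no second traversal is needed.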

Storing BSTs in some sort of order while serializing likely makes it simpler to build upon retrieval. Imagine that you have your BST and just pick nodes at random to serialize and store. When retrieving, it will retrieve in the order stored and then after the fact, something would have to go through and connect all of the nodes. While that should be possible - all the information is there - it seems unnecessary. Each node is just sort of floating; the deserialization process/program has to maintain a list of all the nodes (or similar) while it walks through the list connecting piece by piece.
On the other hand, if you store them in some sort of prescribed order, it can build the tree while reading in each node - it knows where to connect the nodes since they are in order (for clarity: this doesn't imply the next node must be connected to the previously-read node, in the case of adjacent leaves; it's just much simpler to hop up enough levels to the appropriate branch). This should be faster, and potentially use less memory (no list/container while building).

Related

Is there a balanced BST in which each node maintains its subtree size?

Is there a balanced BST structure that also keeps track of subtree size in each node?
In Java, TreeMap is a red-black tree, but doesn't provide subtree size in each node.
Previously, I wrote a BST that keeps track of the subtree size of each node, but it is not balanced.
The questions are:
Is it possible to implement such a tree while keeping O(lg(n)) efficiency for basic operations?
If yes, are there any third-party libraries that provide such an implementation?
A Java implementation would be great, but other languages (e.g. C, Go) would also be helpful.
BTW:
The subtree size should be tracked in each node,
so that the size can be obtained without traversing the subtree.
Possible application:
Keep track of the rank of items whose value (on which the rank depends) might change on the fly.
The Weight Balanced Tree (also called the Adams Tree, or Bounded Balance tree) keeps the subtree size in each node.
This also makes it possible to find the Nth element, from the start or end, in log(n) time.
My implementation in Nim is on GitHub. It has these properties:
Generic (parameterized) key,value map
Insert (add), lookup (get), and delete (del) in O(log(N)) time
Key-ordered iterators (inorder and revorder)
Lookup by relative position from beginning or end (getNth) in O(log(N)) time
Get the position (rank) by key in O(log(N)) time
Efficient set operations using tree keys
Map extensions to set operations with optional value merge control for duplicates
There are also implementations in Scheme and Haskell available.
That's called an "order statistic tree": https://en.wikipedia.org/wiki/Order_statistic_tree
It's pretty easy to add the size to any kind of balanced binary tree (red-black, AVL, B-tree, etc.), or you can use a balancing algorithm that works with the size directly, like weight-balanced trees (see DougCurrie's answer) or (better) size-balanced trees: https://cs.wmich.edu/gupta/teaching/cs4310/lectureNotes_cs4310/Size%20Balanced%20Tree%20-%20PEGWiki%20sourceMayNotBeFullyAuthentic%20but%20description%20ok.pdf
Unfortunately, I don't think there are any standard-library implementations, but you can find open-source ones if you look for them. You may want to roll your own.
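To show what "adding the size" amounts to, here is a small sketch of the bookkeeping only; it is not a complete balanced tree and not code from any of the implementations mentioned above. The point is that a rotation, the basic step of red-black and AVL rebalancing, can repair the sizes in O(1), and that the sizes then give select (Nth element) and rank in time proportional to the height. All names (`Node`, `update`, `rotate_right`, `select`, `rank`) are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # number of nodes in the subtree rooted here


def size(node: Optional[Node]) -> int:
    return node.size if node else 0


def update(node: Node) -> Node:
    """Recompute a node's size from its children."""
    node.size = 1 + size(node.left) + size(node.right)
    return node


def rotate_right(node: Node) -> Node:
    """Standard right rotation; only the two rotated nodes need new sizes."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    update(node)          # the lower node first
    return update(pivot)  # then the new subtree root


def select(node: Optional[Node], k: int) -> int:
    """Return the k-th smallest key (0-based)."""
    while node:
        left = size(node.left)
        if k < left:
            node = node.left
        elif k == left:
            return node.key
        else:
            k -= left + 1
            node = node.right
    raise IndexError("k out of range")


def rank(node: Optional[Node], key: int) -> int:
    """Return how many keys in the tree are strictly smaller than key."""
    smaller = 0
    while node:
        if key < node.key:
            node = node.left
        elif key > node.key:
            smaller += size(node.left) + 1
            node = node.right
        else:
            return smaller + size(node.left)
    return smaller
```

Inserts and deletes would call `update` on every node along the search path on the way back up, which keeps the overall cost at O(log n) as long as the underlying tree is balanced.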

Mirror a binary tree in constant time

This is not a homework question. I heard that it is possible to mirror a binary tree i.e. flip it, in constant time. Is this really the case?
Sure, depending on your data structure, you would just do the equivalent of: instead of traversing down the left node and then the right node, you would traverse down the right node, and then the left node. This could be a parameter passed into the recursive function that traverses the tree (i.e. in C/C++, a bool bDoLeftFirst, and an if-statement that uses that parameter to decide which order to traverse the child nodes in).
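A minimal sketch of that idea, assuming the tree is only ever accessed through a wrapper object: the wrapper keeps a flag, `mirror()` just flips it in O(1), and every traversal consults the flag to decide which child to visit first. The class and method names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class Tree:
    def __init__(self, root: Optional[Node]) -> None:
        self.root = root
        self.mirrored = False

    def mirror(self) -> None:
        """Flip the whole tree in O(1): no node is touched."""
        self.mirrored = not self.mirrored

    def inorder(self) -> Iterator[str]:
        def walk(node: Optional[Node]) -> Iterator[str]:
            if node is None:
                return
            # The flag decides which child counts as "left" during traversal.
            if self.mirrored:
                first, second = node.right, node.left
            else:
                first, second = node.left, node.right
            yield from walk(first)
            yield node.value
            yield from walk(second)

        return walk(self.root)
```

The trade-off is that every operation on the tree has to respect the flag; code that reads `node.left` directly still sees the unflipped structure.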
Did you mean "invert binary tree", the problem that Max Howell could not solve and for which he was rejected by Google?
https://leetcode.com/problems/invert-binary-tree/
You can find solutions in the "discuss" section.

B-Tree deletion in a single pass

Is it possible to remove an element from a B-Tree in a single pass?
Wikipedia says "Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring"
but doesn't say anything about how it is done.
A Google search only turns up deletion procedures that have to restructure the tree afterwards.
Cormen also doesn't say anything about it.
It's possible in a variant of B+ tree called PO-B+ tree. In this "preparatory operations B+ tree" the number of keys in a node may be between n-1 and 2n+1 rather than n and 2n in the usual B+-tree (quoted from the paper). For delete operation (called PO-delete in the paper) you just merge (called "catenate" in the paper) all the nodes (except the root) that could be merged (or take a key from a neighbor), while moving toward the leaf. For PO-insert operation you split all the nodes (including the root). The description is given in the paper.
This preemptive restructuring only makes sense if the tree is used in a multi-threaded environment, as it reduces locking and increases concurrency. It does not pay off if the tree is accessed by only one actor.

Insertion In 2-3-4 tree

Consider the following 2-3-4 tree (i.e., B-tree with a minimum degree of two) in
which each data item is a letter. The usual alphabetical ordering of letters is used
in constructing the tree.
What is the result of inserting G in the above tree?
I am getting the answer as
But the answer in solution key is
Can anyone explain how to get the answer provided by the solution key?
As long as the invariants are not violated, the operation is technically valid. The insertion algorithm in CLRS splits on the way down, so it would split the root like you did.
However, another implementation might observe that the second child is empty and the first is full. That means the "rotation" can be done and the root node count is unaffected. The rotation involves pushing L down into the second child (prepending) and pulling I up into L's previous place in the root. Now the first child has only two entries and you can insert into it.
Animated insertion using the CLRS method you used
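For reference, the split-on-the-way-down insertion can be sketched roughly as follows. This follows the CLRS scheme with minimum degree t = 2 (so at most 3 keys per node) and assumes distinct keys. The Python names (`BNode`, `split_child`, `insert`, `inorder`) are my own, and the test data is arbitrary, not the tree from the question.

```python
import bisect
from typing import List

T = 2  # minimum degree; a 2-3-4 tree holds 1..3 keys per node


class BNode:
    def __init__(self, leaf: bool = True) -> None:
        self.keys: List[str] = []
        self.children: List["BNode"] = []
        self.leaf = leaf


def split_child(parent: BNode, i: int) -> None:
    """Split the full child parent.children[i]; its median key moves up."""
    full = parent.children[i]
    right = BNode(leaf=full.leaf)
    median = full.keys[T - 1]
    right.keys = full.keys[T:]
    full.keys = full.keys[:T - 1]
    if not full.leaf:
        right.children = full.children[T:]
        full.children = full.children[:T]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


def insert(root: BNode, key: str) -> BNode:
    """Insert key in a single downward pass and return the (possibly new) root."""
    if len(root.keys) == 2 * T - 1:      # full root: the tree grows in height
        new_root = BNode(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0)
        root = new_root
    node = root
    while not node.leaf:
        i = bisect.bisect_left(node.keys, key)
        if len(node.children[i].keys) == 2 * T - 1:
            split_child(node, i)         # split before descending
            if key > node.keys[i]:
                i += 1
        node = node.children[i]
    bisect.insort(node.keys, key)        # the leaf is guaranteed to have room
    return root


def inorder(node: BNode) -> List[str]:
    if node.leaf:
        return list(node.keys)
    out: List[str] = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Because every full node is split before it is entered, no insertion ever has to walk back up the tree. The rotation-based alternative described above would replace the `split_child` call with a borrow into a sibling when one has room.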

Fair deletion of nodes in Binary Search Tree

The idea of deleting a node in BST is:
If the node has no child, delete it and set the parent's pointer to this node to null
If the node has one child, replace the node with its child by updating the node's parent's pointer to point to that child
If the node has two children, find the predecessor of the node and replace the node with its predecessor, and also update the predecessor's parent's pointer to point to the predecessor's only child (which can only be a left child)
The last case can also be done with the successor instead of the predecessor!
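The three cases above can be sketched as follows. This is only an illustration under a few assumptions: keys are distinct, the two-child case copies the predecessor's (or successor's) key into the node instead of relinking pointers, and the `use_predecessor` flag is a name invented here to switch between the two variants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


def delete(node: Optional[Node], key: int, use_predecessor: bool = True) -> Optional[Node]:
    """Remove key from the subtree and return the subtree's new root."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key, use_predecessor)
    elif key > node.key:
        node.right = delete(node.right, key, use_predecessor)
    elif node.left is None:
        return node.right            # no child, or only a right child
    elif node.right is None:
        return node.left             # only a left child
    elif use_predecessor:
        pred = node.left             # largest key in the left subtree
        while pred.right:
            pred = pred.right
        node.key = pred.key
        node.left = delete(node.left, pred.key, use_predecessor)
    else:
        succ = node.right            # smallest key in the right subtree
        while succ.left:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key, use_predecessor)
    return node


def inorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

A caller can pass a different value of `use_predecessor` on each call, for example by alternating it or drawing it at random.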
It's said that if we use the predecessor in some cases and the successor in others (giving them equal priority), we get better empirical performance.
Now the question is: how is it done? Based on what strategy? And how does it affect performance? (I guess by performance they mean time complexity.)
What I think is that we have to choose the predecessor or the successor so as to get a more balanced tree, but I don't know how to decide which one to use.
One solution is to choose one of them at random (with equal probability), but isn't it better to base the strategy on the tree structure? The question then is WHEN to choose WHICH.
Finding a correct removal algorithm for a BST is a fundamental problem. People have been trying to solve it for 50 years (just like in-place merge), and they haven't found anything better than the usual algorithm (removal via the predecessor or successor). So what is wrong with the classic algorithm? This kind of removal unbalances the tree. After a series of random add/remove operations you end up with an unbalanced tree of height sqrt(n). And it doesn't matter which you choose, successor or predecessor (or a random choice between the two): the result is the same.
So what should you choose? My guess is that randomly choosing between successor and predecessor will postpone the unbalancing of your tree. But if you want a perfectly balanced tree, you have to use a red-black tree or something similar.
As you said, it's a question of balance, so in general the method that disturbs the balance the least is preferable. You can maintain some metrics to measure the level of balance (e.g., the difference between maximal and minimal leaf height, the average height, etc.), but I'm not sure whether the overhead is worth it. Also, there are self-balancing data structures (red-black trees, AVL trees, etc.) that mitigate this problem by rebalancing after each deletion. If you want to use the basic BST, I suppose the best strategy without a priori knowledge of the tree structure and the deletion sequence would be to toggle between the two methods on each deletion.
