Insertion In 2-3-4 tree - data-structures

Consider the following 2-3-4 tree (i.e., a B-tree with a minimum degree of two) in
which each data item is a letter. The usual alphabetical ordering of letters is used
in constructing the tree.
What is the result of inserting G in the above tree?
I am getting the answer as
But the answer in the solution key is
Can anyone explain how to get the answer provided by the solution key?

As long as the invariants are not violated, the operation is technically valid. The insertion algorithm in CLRS splits on the way down, so it would split the root like you did.
However, another implementation might observe that the second child has room and the first is full. That means a "rotation" can be done instead, and the root's key count is unaffected. The rotation involves pushing L down into the second child (prepending it) and pulling I up into L's previous place in the root. Now the first child has only two entries and you can insert into it.
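A minimal Python sketch of that rotation, assuming a hypothetical `Node` class with sorted `keys` and a `children` list (empty for leaves), and a 2-3-4 tree with at most three keys per node. The function name `rotate_into_right_sibling` is illustrative only, not taken from CLRS or any library, and the example tree is made up because the question's figure is not available.

```python
from bisect import insort


class Node:
    """Hypothetical 2-3-4 tree node: sorted keys; children is [] for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children if children is not None else []


def rotate_into_right_sibling(parent, i, max_keys=3):
    """Shift one key from parent.children[i] into parent.children[i + 1].

    The separator parent.keys[i] moves down to the front of the right sibling,
    and the child's largest key moves up to replace it. Returns False (and
    changes nothing) if there is no right sibling or it has no room.
    """
    if i + 1 >= len(parent.children):
        return False
    child, sibling = parent.children[i], parent.children[i + 1]
    if len(sibling.keys) >= max_keys:
        return False
    sibling.keys.insert(0, parent.keys[i])   # push the separator down (prepend)
    parent.keys[i] = child.keys.pop()        # pull the child's largest key up
    if child.children:                       # internal node: its last subtree moves too
        sibling.children.insert(0, child.children.pop())
    return True


# Hypothetical tree: root [L], a full first child and a second child with room.
root = Node(["L"], [Node(["C", "F", "I"]), Node(["P"])])
if rotate_into_right_sibling(root, 0):
    insort(root.children[0].keys, "G")       # the first child now has room for G
```

Here the root ends up as `[I]` with children `[C, F, G]` and `[L, P]`, and no new node or level was created.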
Animated insertion using the CLRS method you used

Related

B-tree insertion: during the descent through the tree, why do we split every node with 2t-1 elements?

In the B-tree insertion algorithm, I see that to handle the case where we need to insert an element into a leaf with 2t-1 elements, we have to split nodes. What I don't understand is why, during the descent through the tree (to the target position), the insertion algorithm splits every node with 2t-1 elements, even though it seems useless. For example:
I understand that there is a case in which a couple of nodes above the leaf have 2t-1 elements, and if we want to move the median up into them we face a problem, but why not give a targeted solution for that case instead of splitting every time?
Correct me if I'm wrong.
We split the full nodes on the way down to the target position because we don't know if we will need to "go back up." You can do it the way you are thinking, where we go down to the target node, split it, and then insert the median of the split into the parent, recursively splitting nodes as needed. But this requires us to go from the root, down to the target, and back up, potentially all the way to the root again. This might be undesirable, e.g. if accessing the nodes twice would be too expensive. In that case, it may be better to go in one pass straight down, where you split any full nodes to anticipate the need for more space.
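The one-pass approach can be sketched in Python as below. This is a minimal sketch of CLRS-style top-down insertion, assuming distinct comparable keys and a minimum degree `t` (so a full node holds `2*t - 1` keys); the `Node` and `BTree` names are hypothetical. Every full child is split before the search descends into it, so a promoted median always finds room in the parent and the algorithm never has to go back up.

```python
# Sketch only: hypothetical Node/BTree classes, distinct keys assumed.
from bisect import bisect_left


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    """Hypothetical B-tree with minimum degree t: a full node holds 2*t - 1 keys."""

    def __init__(self, t=2):
        self.t = t
        self.root = Node()

    def _split_child(self, parent, i):
        """Split the full node parent.children[i]; its median moves up into parent."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, key):
        t = self.t
        if len(self.root.keys) == 2 * t - 1:
            # Split a full root first: the only way the tree grows taller.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * t - 1:
                # Split the full child before descending, so the parent
                # always has room for the promoted median.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)


tree = BTree(t=2)
for k in [1, 2, 3, 4]:
    tree.insert(k)
# tree.root.keys == [2]; its leaves hold [1] and [3, 4]
```

With `t=2`, the full root `[1, 2, 3]` is split before 4 is placed, even though a bottom-up algorithm would not yet have needed to split anything.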
For a demonstration, you can try inserting 10 into the trees in the middle and on the bottom of your drawing. The tree on the bottom, unsplit, needs to be split all the way to the root in the same way as the middle tree, because the two-pass algorithm didn't leave any space. In the middle tree, inserting 10 still causes a split, but it doesn't extend all the way up because the top two layers of the tree are very spacious.
There is an important caveat, though. Let t be the minimum number of children per node. For the two-pass algorithm, the maximum number of children a node can have needs to be at least u = 2t - 1. If it is less, like 2t - 2, then splitting a full node (2t - 3 elements), even with the additional element to insert, will not be able to make two non-deficient nodes. The one-pass algorithm requires a higher maximum, u = 2t. This is because the two-pass algorithm always has an element on hand to cancel exactly one deficiency. The one-pass algorithm does not have this ability: since it sometimes splits nodes unnecessarily, it can't put the element it's holding into one of the deficient nodes, because the element might not belong there.
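Both bounds follow from counting keys. With t the minimum and u the maximum number of children, a non-root node holds between t - 1 and u - 1 keys, and after one key moves up to the parent, each half of a split must still have at least t - 1 keys:

```latex
\begin{aligned}
&\text{Two-pass: } (u-1) + 1 - 1 = u - 1 \text{ keys remain} \;\Rightarrow\; u - 1 \ge 2(t-1) \;\Rightarrow\; u \ge 2t - 1,\\
&\text{One-pass: } (u-1) - 1 = u - 2 \text{ keys remain} \;\Rightarrow\; u - 2 \ge 2(t-1) \;\Rightarrow\; u \ge 2t.
\end{aligned}
```

For t = 2 this allows u = 3 (a 2-3 tree) with the two-pass algorithm, but the one-pass algorithm needs u = 4 (a 2-3-4 tree).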
I've implemented B-trees several times, and have never split nodes on the way down.
Usually I implement insert recursively, such that node->insert(key, data) can return a new key to insert into the parent. The parent calls insert on the child node, and if the child splits, it returns a new key to the parent. If the parent splits, then it returns a key to its parent, etc.
I've found that the insert implementation can stay pretty clean this way.
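Here is a minimal Python sketch of that recursive, bottom-up style, assuming distinct keys and a hypothetical `max_keys` limit (3 gives a 2-3-4 tree). `Node.insert` returns `None` when the key was absorbed without a split, or the `(median, new_right_node)` pair the parent must take in; data payloads are left out to keep it short.

```python
# Sketch only: hypothetical Node/BTree classes, keys only, distinct keys assumed.
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # [] means leaf

    def insert(self, key, max_keys):
        """Insert key below this node.

        Returns None if this node absorbed the key, or (median, right_sibling)
        if it overflowed and split; the caller must then insert them itself.
        """
        i = bisect_left(self.keys, key)
        if self.children:
            result = self.children[i].insert(key, max_keys)
            if result is None:
                return None              # no split below: nothing propagates up
            median, right = result
            self.keys.insert(i, median)
            self.children.insert(i + 1, right)
        else:
            self.keys.insert(i, key)
        if len(self.keys) <= max_keys:
            return None
        mid = len(self.keys) // 2        # overflow: split around the middle key
        median = self.keys[mid]
        right = Node(self.keys[mid + 1:], self.children[mid + 1:])
        self.keys = self.keys[:mid]
        self.children = self.children[:mid + 1]
        return median, right


class BTree:
    def __init__(self, max_keys=3):
        self.max_keys = max_keys
        self.root = Node()

    def insert(self, key):
        result = self.root.insert(key, self.max_keys)
        if result is not None:           # the root split: grow a new root
            median, right = result
            self.root = Node([median], [self.root, right])


tree = BTree(max_keys=3)
for k in [1, 2, 3, 4]:
    tree.insert(k)
# tree.root.keys == [3]; its leaves hold [1, 2] and [4]
```

A split only happens once a node actually overflows, so inserting 1 to 4 leaves a differently shaped (but equally valid) tree than the top-down method, which splits the full root `[1, 2, 3]` pre-emptively.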

Implementing the Rope data structure using binary search trees (splay trees)

In a standard implementation of the Rope data structure using splay trees, the nodes would be ordered according to a rank statistic measuring the position of each one from the start of the string, so the keys normally found in a binary search tree would be irrelevant, would they not?
I ask because the keys shown in the graphic below (thanks Wikipedia!) are letters, which would presumably become non-unique once the number of nodes exceeded the length of the chosen alphabet. Wouldn't it be better to use integers or avoid using keys altogether?
Separately, can anyone point me to a good implementation of the logic to recompute rank statistics after each operation?
Presumably, if the index for a split falls within the substring attached to a particular node, say, between "Hel" and "lo_" on node E above, you would remove the substring from E, split it, and reattach the pieces as two children of E. Correct?
Finally, after a certain number of such operations, the tree could, I suppose, end up with as many leaves as letters. What would be the best way to keep track of that and prune the tree (by combining substrings) as necessary?
Thanks!
For what it's worth, you can implement a Rope using Splay Trees by attaching a substring to each node of the binary search tree (not just to the leaf nodes as shown above).
The rank of each node is its size plus the size of its left subtree. But when recomputing ranks during splay operations, you need to remember to walk down the node.left.right branch, too.
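To see why the `node.left.right` branch matters, here is a minimal Python sketch of a right rotation on a rope tree. It assumes each hypothetical `RopeNode` stores its piece `text` and the character count `total` of its whole subtree, with rank computed as the left subtree's total plus the node's own length; parent pointers and the rest of the splay logic are omitted. During the rotation, `node.left.right` becomes `node`'s new left subtree, so `node` must be recomputed before the new subtree root.

```python
# Sketch only: hypothetical RopeNode, no parent pointers, no full splay.
class RopeNode:
    def __init__(self, text, left=None, right=None):
        self.text = text
        self.left = left
        self.right = right
        self.total = 0
        update(self)


def total(node):
    return node.total if node is not None else 0


def update(node):
    """Recompute the character count of node's subtree from its children."""
    node.total = total(node.left) + len(node.text) + total(node.right)


def rank(node):
    """Characters up to and including this node's piece, within its subtree."""
    return total(node.left) + len(node.text)


def rotate_right(node):
    pivot = node.left
    node.left = pivot.right   # the old node.left.right subtree changes parents
    pivot.right = node
    update(node)              # node is now lower: recompute it first
    update(pivot)             # then the new subtree root
    return pivot


def to_string(node):
    if node is None:
        return ""
    return to_string(node.left) + node.text + to_string(node.right)


root = RopeNode("name", left=RopeNode("Hello_", right=RopeNode("my_")))
# rank(root) == 13 before the rotation
root = rotate_right(root)
# rank(root) == 6 ("Hello_"), rank(root.right) == 7 ("my_" + "name")
```

The text read in order is unchanged, but the rank of the rotated-down node drops from 13 to 7 because "my_" moved under it.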
If each node records a reference to the substring it represents (rather than the actual substring itself), everything runs faster. That way, when a split operation falls within an existing node, you just need to modify the node's attributes to reflect the right part of the substring you want to split, then add another node to represent the left part and merge it with the left subtree.
Done as above, each node records (in addition to its left, right and parent attributes, etc.) its rank, size (in characters), and the location of the first character it represents in the string you're trying to modify. That way, you never actually modify the initial string: you just do your operations on bits of the tree and reproduce the final string when you're ready by walking it in order.
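A minimal Python sketch of that bookkeeping, assuming each hypothetical `Piece` node stores `start`/`length` offsets into the untouched original string plus its subtree's character count. Splaying is left out (the tree here is a plain BST), so only the split-within-a-node step and the in-order reconstruction are shown. Because splitting a node does not change how many characters its subtree covers, no ancestor needs updating.

```python
# Sketch only: hypothetical Piece nodes, no splaying, no parent pointers.
class Piece:
    def __init__(self, start, length):
        self.start = start      # index of this piece's first character in the original
        self.length = length    # number of characters in this piece
        self.left = None
        self.right = None
        self.total = length     # characters in this whole subtree


def total(node):
    return node.total if node is not None else 0


def update(node):
    node.total = total(node.left) + node.length + total(node.right)


def find(root, index):
    """Return (node, offset): the character at index is offset chars into node's piece."""
    node = root
    while node is not None:
        left = total(node.left)
        if index < left:
            node = node.left
        elif index < left + node.length:
            return node, index - left
        else:
            index -= left + node.length
            node = node.right
    raise IndexError(index)


def split_at(root, index):
    """Make a piece boundary fall exactly before position index."""
    node, offset = find(root, index)
    if offset == 0:
        return                  # already a boundary
    left_part = Piece(node.start, offset)
    left_part.left = node.left  # merge the new left piece with the old left subtree
    update(left_part)
    node.start += offset        # node now represents only the right part
    node.length -= offset
    node.left = left_part
    update(node)                # subtree total is unchanged, so ancestors are fine


def render(original, node):
    """Rebuild the edited text with an in-order walk; original is never modified."""
    if node is None:
        return ""
    piece = original[node.start:node.start + node.length]
    return render(original, node.left) + piece + render(original, node.right)


original = "Hello_my_name"
root = Piece(0, len(original))
split_at(root, 3)   # boundary between "Hel" and "lo_my_name"
split_at(root, 6)   # boundary between "lo_" and "my_name"
```

After the two splits the tree holds three pieces, `"Hel"`, `"lo_"` and `"my_name"`, all still pointing into the same `original` string.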

Insertion and removal from a B-tree

I have come across this question and haven't been able to answer it.
Given a B-tree of order 9 with 4 levels, will inserting a new item x and then immediately removing it always bring the tree back to its original structure?
Will removing an existing item x and then reinserting it always bring the tree back to its original structure?
Prove it.
So far I have tried to disprove it but haven't been able to.
Honestly, I can't find the answer. I'm not asking for a full proof; a general idea of how to prove it will satisfy me.
The answer obviously depends on the implementation of the insert and delete methods but in short: no.
I won't give you a full proof (because you didn't ask for one and because I'm too lazy), but the general idea is this: usually, when you delete a node, the innermost node on the opposite side (relative to the parent) takes its place. So in any scenario where that node exists, it will be moved up. It also means that the deleted node was not a leaf, which is a problem because insertion usually adds new nodes to the tree as leaves. So the original structure will only be maintained if the innermost node on the opposite side (relative to the parent) is empty.
This is the deletion I'm referring to. If you remove 2 and reinsert it, that's the counterexample.

Fair deletion of nodes in Binary Search Tree

The idea of deleting a node in a BST is:
If the node has no children, delete it and set the parent's pointer to this node to null.
If the node has one child, replace the node with its child by updating the node's parent's pointer to point to that child.
If the node has two children, find the node's predecessor and replace the node with it; also update the predecessor's parent's pointer to point to the predecessor's only child (which can only be a left child).
The last case can also be done using the successor instead of the predecessor!
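The three cases can be sketched in Python as follows, assuming distinct keys; `Node`, `insert`, `delete`, `inorder`, and the `use_predecessor` flag are hypothetical names. Instead of relinking the predecessor (or successor) node itself, this version copies its key into the node being deleted and then removes the predecessor from the subtree. That removal always falls into one of the first two cases, since the predecessor has no right child (and the successor has no left child).

```python
# Sketch only: hypothetical helpers, unbalanced BST, distinct keys assumed.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key, use_predecessor=True):
    """Delete key from the subtree rooted at root; return the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key, use_predecessor)
    elif key > root.key:
        root.right = delete(root.right, key, use_predecessor)
    elif root.left is None:        # no child, or only a right child
        return root.right
    elif root.right is None:       # only a left child
        return root.left
    elif use_predecessor:          # two children: largest key on the left
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key, use_predecessor)
    else:                          # two children: smallest key on the right
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key, use_predecessor)
    return root


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


by_pred = by_succ = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    by_pred = insert(by_pred, k)
    by_succ = insert(by_succ, k)
by_pred = delete(by_pred, 50, use_predecessor=True)    # root becomes 40
by_succ = delete(by_succ, 50, use_predecessor=False)   # root becomes 60
```

Both variants leave a valid BST with the same in-order sequence; they differ only in which subtree loses a node.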
It's said that if we use the predecessor in some cases and the successor in others (giving them equal priority), we can get better empirical performance.
Now the question is: how is it done? Based on what strategy? And how does it affect performance? (I guess by performance they mean time complexity.)
What I think is that we have to choose the predecessor or the successor so as to keep the tree more balanced! But I don't know how to choose which one to use!
One solution is to choose one of them at random (with equal probability), but isn't it better to base the strategy on the tree structure? The question is: WHEN to choose WHICH?
The thing is that this is a fundamental problem: finding a correct removal algorithm for BSTs. For 50 years people have been trying to solve it (just like in-place merging), and they haven't found anything better than the usual algorithm (removing via the successor or predecessor). So, what is wrong with the classic algorithm? Actually, this removal unbalances the tree. After many random add/remove operations, you'll get an unbalanced tree with height sqrt(n). And it doesn't matter what you choose, removing via the successor or the predecessor (or choosing randomly between them): the result is the same.
So, what to choose? I'm guessing that random (succ or pred) deletion will postpone the unbalancing of your tree. But if you want a tree whose balance is guaranteed, you have to use red-black trees or something like that.
As you said, it's a question of balance, so in general the method that disturbs the balance the least is preferable. You can maintain some metrics to measure the level of balance (e.g., the difference between the maximal and minimal leaf height, the average height, etc.), but I'm not sure whether the overhead is worth it. Also, there are self-balancing data structures (red-black trees, AVL trees, etc.) that mitigate this problem by rebalancing after each deletion. If you want to use the basic BST, I suppose the best strategy without a priori knowledge of the tree structure and the deletion sequence would be to alternate between the two methods on each deletion.
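The alternating strategy can be sketched as a small Python class, assuming distinct keys; `ToggleBST` and `height` are hypothetical names. The flag flips only when a deletion actually reaches the two-children case, since the other cases involve no choice. This sketch makes no claim about the resulting height; `height` is included so you can measure the effect on your own workload.

```python
# Sketch only: hypothetical ToggleBST, unbalanced, distinct keys assumed.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class ToggleBST:
    """Plain BST that alternates predecessor/successor on two-child deletions."""

    def __init__(self):
        self.root = None
        self.use_predecessor = True   # flipped after every two-child deletion

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                return
            node = child

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if node is None:
            return None
        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        elif node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        else:
            use_pred = self.use_predecessor
            self.use_predecessor = not use_pred   # alternate for the next time
            if use_pred:
                pred = node.left
                while pred.right is not None:
                    pred = pred.right
                node.key = pred.key
                node.left = self._delete(node.left, pred.key)
            else:
                succ = node.right
                while succ.left is not None:
                    succ = succ.left
                node.key = succ.key
                node.right = self._delete(node.right, succ.key)
        return node


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


t = ToggleBST()
for k in [50, 30, 70, 20, 40, 60, 80]:
    t.insert(k)
t.delete(50)   # two children: uses the predecessor, root becomes 40
t.delete(40)   # two children again: uses the successor, root becomes 60
```

To compare strategies, you could run the same long random sequence of inserts and deletes against this class and against a version that always uses the predecessor, and compare `height(t.root)`.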

Disadvantages of top-down node splitting on insertion into B+ tree

For a B+ tree insertion, why would you traverse down the tree and then back up, splitting the parents?
Wikipedia suggests this method of insertion:
Perform a search to determine what bucket the new record should go into.
If the bucket is not full (at most b - 1 entries after the insertion), add the record.
Otherwise, split the bucket.
Allocate new leaf and move half the bucket's elements to the new bucket.
Insert the new leaf's smallest key and address into the parent.
If the parent is full, split it too.
Add the middle key to the parent node.
Repeat until a parent is found that need not split.
If the root splits, create a new root which has one key and two pointers.
Why would you traverse down the tree and then go back up performing the splits? Why not split the nodes as you encounter them on the way down?
To me, the proposed method performs twice the work and requires more bookkeeping as well.
Can anyone explain why this is the preferred method for insertion as opposed to splitting on the way down and what the disadvantages are for inserting during the traversal?
You have to backtrack up the tree because you don't actually know whether a split is required at the lowest level until you get there.
It's all there in the phrase "If the bucket is not full, ...".
You should also be aware that it's nowhere near twice the work. Since you're remembering all sorts of stuff on the way down (node pointers, indexes within the node, and so on), there's not as much calculation or searching on the way back up.
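A minimal Python sketch of that Wikipedia-style insertion, assuming a hypothetical `BPlusTree` in which every node holds at most `b - 1` keys, with leaf sibling links and record payloads omitted. The descent records each `(parent, child_index)` pair in `path`, so the upward phase only pops that stack and inserts a separator without searching again. In the common case the leaf has room and the method returns before touching any parent.

```python
# Sketch only: hypothetical B+ tree, keys only, no leaf sibling links.
from bisect import bisect_right, insort


class Leaf:
    def __init__(self, keys=None):
        self.keys = keys if keys is not None else []


class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


class BPlusTree:
    def __init__(self, b=4):
        self.b = b                # at most b - 1 keys per node
        self.root = Leaf()

    def insert(self, key):
        # Descend once, remembering (node, child index) pairs for the way back up.
        path = []
        node = self.root
        while isinstance(node, Inner):
            i = bisect_right(node.keys, key)
            path.append((node, i))
            node = node.children[i]
        insort(node.keys, key)
        if len(node.keys) <= self.b - 1:
            return                # common case: the leaf had room, no upward pass
        # Leaf split: the new right leaf's smallest key is copied into the parent.
        mid = len(node.keys) // 2
        right = Leaf(node.keys[mid:])
        node.keys = node.keys[:mid]
        sep, new_child = right.keys[0], right
        # Walk back up the remembered path, splitting only where needed.
        while path:
            parent, i = path.pop()
            parent.keys.insert(i, sep)
            parent.children.insert(i + 1, new_child)
            if len(parent.keys) <= self.b - 1:
                return
            # Internal split: the middle key moves up (it is not copied).
            mid = len(parent.keys) // 2
            sep = parent.keys[mid]
            new_child = Inner(parent.keys[mid + 1:], parent.children[mid + 1:])
            parent.keys = parent.keys[:mid]
            parent.children = parent.children[:mid + 1]
        # The root itself split: new root with one key and two pointers.
        self.root = Inner([sep], [self.root, new_child])


tree = BPlusTree(b=4)
for k in range(1, 7):
    tree.insert(k)
# tree.root.keys == [3, 5]; leaves hold [1, 2], [3, 4], [5, 6]
```

Only two of the six inserts here (4 and 6) trigger a split, and each time the stored path is all that is needed to find the parent.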
