I'm studying for a test and I'm a little confused by the following.
The following image is a B-tree with t=3 so each node can have at most 2t-1 keys and at least t-1 keys.
I'm being asked to delete key=3.
I can't understand why I need to join the root with its sons in this case. I know the delete algorithm is defensive: it starts at the root and fixes up every node on the way down, so it never needs to go back to an ancestor.
But which rule will be broken if I don't join the root with its son?
[Image: original B-tree]
[Image: B-tree after deleting key 3]
As for me I would just delete key 3 and that's it.
It would not break any of the rules; the algorithm simply performs every possible node merge on the way down while looking up the given key. This is necessary to ensure that there will be no need to traverse the tree upwards after the deletion.
Also, the height of the tree is reduced, which will speed up later lookups.
So this behaviour is an algorithmic decision to implement the B-tree efficiently.
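To make the merge-on-the-way-down step concrete, here is a minimal sketch in Python. The tree is a made-up example with t=3 (the original picture is not reproduced here), `Node`, `merge_children` and `prepare_descent` are hypothetical names, and borrowing from a sibling is left out, so treat it as an illustration rather than a full delete routine.

```python
from dataclasses import dataclass, field

T = 3  # minimum degree: non-root nodes hold between T-1 and 2T-1 keys


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def height(node):
    return 1 if not node.children else 1 + height(node.children[0])


def merge_children(parent, i):
    """Merge children i and i+1 of `parent` around separator key i."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys += [parent.keys.pop(i)] + right.keys
    left.children += right.children
    del parent.children[i + 1]
    return left


def prepare_descent(root, i):
    """Run the merge case before stepping from the root into child i.

    Simplified: only one neighbour is inspected and borrowing is omitted.
    Returns the (possibly new) root.
    """
    child = root.children[i]
    if len(child.keys) > T - 1:
        return root  # the child can afford to lose a key
    j = i if i + 1 < len(root.children) else i - 1  # left node of the pair
    left, right = root.children[j], root.children[j + 1]
    if len(left.keys) == T - 1 and len(right.keys) == T - 1:
        merged = merge_children(root, j)
        if not root.keys:  # root gave up its only key: tree gets shorter
            return merged
    return root


# Hypothetical tree: root [30], two internal nodes with T-1 keys each.
leaves = [Node([1, 3, 5]), Node([12, 14, 16]), Node([22, 24, 26]),
          Node([32, 34, 36]), Node([42, 44, 46]), Node([52, 54, 56])]
root = Node([30], [Node([10, 20], leaves[:3]), Node([40, 50], leaves[3:])])

before = height(root)
new_root = prepare_descent(root, 0)   # key 3 lives below child 0
new_root.children[0].keys.remove(3)   # the leaf has a spare key, so just delete
```

In this example the leaf holding 3 could have lost a key without breaking any rule, yet the root is merged anyway because the internal node on the path holds only T-1 keys. That is the defensive behaviour described above.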
I have seen many different implementations of BK Trees in many different languages, and literally none of them seem to include a way to remove nodes from the tree.
Even the original article where BK Trees were first introduced does not provide meaningful insight into node deletion, as the authors merely suggest marking the node to be deleted so that it is ignored:
The deletion of a key in Structures 1 [the BK Tree] and 2 follows a process similar to that above, with special consideration for the case in which the key to be deleted is the representative x° [root key]. In this case, the key cannot simply be deleted, as it is essential for the structure information. Instead an extra bit must be used for each key which denotes whether the key actually corresponds to a record or not. The search algorithm is modified correspondingly to ignore keys which do not correspond to records. This involves testing the extra bit in the Update procedure.
While it may be theoretically possible to properly delete a node in a BK Tree, is it possible to do so in linear/sublinear time?
If you want to physically remove it from a BK-Tree, then I can't think of a way to do this in linear time for all cases. Consider the two scenarios that can occur when a node is removed. Please note that I do not account for the time spent calculating the Levenshtein distance, because that operation doesn't depend on the number of words, although it requires some processing time too.
Remove non-root node
1. Find the parent of the node in the tree.
2. Save the node's child nodes.
3. Nullify the parent's reference to the node.
4. Re-add each child node as if it were a new node.
Here, even if step 1 can be done in O(1), steps 2 and 4 are way more expensive. Inserting a single node is O(h), where h is the height of the tree. To make matters worse, this has to be done for each child node of the original node (and, to keep the distance invariant intact, for every node below them), so it will be O(k*h), where k is the number of nodes that are re-inserted.
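The non-root case can be sketched roughly as follows. This is only an illustration under a few assumptions: `BKNode`, `insert`, `find_parent`, `collect` and `remove` are made-up names, the metric is passed in as `dist`, and to keep the example short it uses absolute difference on integers instead of Levenshtein distance (any integer-valued metric works the same way). The sketch re-inserts every descendant of the removed node, since a subtree moved as a whole would no longer match the distances stored on its new path.

```python
class BKNode:
    def __init__(self, key):
        self.key = key
        self.children = {}  # distance to self.key -> child BKNode


def insert(root, key, dist):
    """Standard BK-tree insertion: one distance computation per level."""
    node = root
    while True:
        d = dist(key, node.key)
        if d == 0:
            return  # key already present
        if d not in node.children:
            node.children[d] = BKNode(key)
            return
        node = node.children[d]


def find_parent(root, key, dist):
    """Follow the insertion path; return (parent, edge) or None if absent."""
    node = root
    while True:
        d = dist(key, node.key)
        child = node.children.get(d)
        if child is None:
            return None
        if child.key == key:
            return node, d
        node = child


def collect(node):
    """All keys stored below `node` (the node's own key is excluded)."""
    out, stack = [], list(node.children.values())
    while stack:
        n = stack.pop()
        out.append(n.key)
        stack.extend(n.children.values())
    return out


def remove(root, key, dist):
    """Physically remove a non-root key, then re-insert its descendants."""
    found = find_parent(root, key, dist)
    if found is None:
        return False  # absent, or the key is the root (not handled here)
    parent, d = found
    orphans = collect(parent.children[d])
    del parent.children[d]
    for k in orphans:
        insert(root, k, dist)  # O(h) each, so O(k*h) overall
    return True


dist = lambda a, b: abs(a - b)  # stand-in metric, assumption for brevity
root = BKNode(10)
for value in (7, 13, 4, 8, 1, 16):
    insert(root, value, dist)
removed = remove(root, 7, dist)
```

Here 13 was stored below 7, so removing 7 forces 13 to be inserted again from the top.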
Remove root node
Rebuild the tree from scratch without using the previous root node.
Rebuilding a tree will be at least O(n) in the best case and O(h*n) otherwise.
Alternative solution
That's why it's better not to delete a node physically, but to keep it in the tree and just mark it as deleted. This way it will be used, as before, for inserting new nodes, but will be excluded from the suggestion results for a misspelled word. Once the node has been located, this can be done in O(1).
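A minimal sketch of the mark-as-deleted approach, assuming a small hand-rolled tree (`BKTree`, `add`, `mark_deleted` and `search` are illustrative names, not taken from any library). Flipping the flag itself is constant time; the walk to find the word costs one distance computation per level.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, root_word):
        # Each node is [word, deleted_flag, {distance: child_node}].
        self.root = [root_word, False, {}]

    def add(self, word):
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                node[1] = False  # re-adding revives a tombstoned word
                return
            if d not in node[2]:
                node[2][d] = [word, False, {}]
                return
            node = node[2][d]

    def mark_deleted(self, word):
        """Set the tombstone flag; the node keeps routing insertions."""
        node = self.root
        while node is not None:
            d = levenshtein(word, node[0])
            if d == 0:
                node[1] = True
                return True
            node = node[2].get(d)
        return False

    def search(self, word, max_dist):
        results, stack = [], [self.root]
        while stack:
            w, deleted, children = stack.pop()
            d = levenshtein(word, w)
            if d <= max_dist and not deleted:
                results.append(w)
            # Triangle inequality: only these edges can hold matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree("book")
for w in ("books", "cake", "boo", "cook"):
    tree.add(w)
before = tree.search("bok", 1)
tree.mark_deleted("book")          # works for the root as well
after = tree.search("bok", 1)
```

Note that the root can be "deleted" this way too, which is exactly the case the paper singles out.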
Let k and 2k be keys in a B-tree, B.
Assume that the depth of B is reduced if the key k is deleted.
Is it necessarily the case that if we delete the key 2k instead, the depth of B will be reduced as well?
I'm having a hard time visualizing and solving this. Can someone please show me how I should think about it?
Assuming a classical B-tree that constrains the number of keys per node: a precondition for a node merge involving the root - and thus a reduction in height - is that the root have exactly one key and exactly two children, both of which have exactly the minimum allowable number of keys. The state of affairs at deeper levels of the tree does not matter.
If the top of the tree looks as described, then the height reduction can be triggered by deleting the root's only key or one of the keys in its two children. That deletion could be direct, or it could be the result of a change rippling up after a deletion deeper in the tree.
In any case, there are many configurations in which deleting key 2k will not trigger a height reduction under the same circumstances: the key may reside in a 'safe' node (one with more than the minimum number of keys) or have a 'safe' parent, there may be 'safe' siblings somewhere along the path so that borrowing becomes possible, and so on.
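As a concrete counterexample, here is a small sketch that checks the condition above for keys stored in leaves. It assumes the classical bottom-up deletion (delete in the leaf, then borrow or merge upwards) and a tree of height at least two; the tree with t=2, k=5 and 2k=10 is made up for illustration, and `leaf_delete_shrinks_tree` is a hypothetical helper, not a full delete routine.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)  # empty for a leaf


def leaf_delete_shrinks_tree(root, key, t):
    """True if deleting leaf key `key` ripples merges all the way to the root.

    That happens only if the root has a single key and every node on the
    path to the leaf has exactly t-1 keys with no neighbour to borrow from.
    """
    if len(root.keys) != 1:
        return False
    node = root
    while node.children:
        i = bisect_left(node.keys, key)
        child = node.children[i]
        neighbours = node.children[max(i - 1, 0):i] + node.children[i + 1:i + 2]
        if len(child.keys) > t - 1:
            return False  # a 'safe' node absorbs the deletion
        if any(len(s.keys) > t - 1 for s in neighbours):
            return False  # a 'safe' sibling allows borrowing
        node = child
    if key not in node.keys:
        raise ValueError("key is not stored in a leaf")
    return True


# t = 2: every non-root node holds between 1 and 3 keys.
root = Node([8], [
    Node([4], [Node([2]), Node([5])]),
    Node([12], [Node([10, 11]), Node([14])]),
])
shrinks_k = leaf_delete_shrinks_tree(root, 5, 2)    # k = 5
shrinks_2k = leaf_delete_shrinks_tree(root, 10, 2)  # 2k = 10
```

Deleting 5 empties its leaf, the merge empties its parent, and the next merge empties the root, so the height drops. Deleting 10 leaves 11 behind in a still-valid leaf, so nothing ripples up. With a delete that merges preemptively on the way down, the outcome for 10 could differ.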
Visualisation resources on the web are discussed in another topic here:
Are there any B-tree programs or sites that show visually how a B-tree works
Consider the following 2-3-4 tree (i.e., B-tree with a minimum degree of two) in which each data item is a letter. The usual alphabetical ordering of letters is used in constructing the tree.
What is the result of inserting G in the above tree?
I am getting the answer as
But the answer in solution key is
Can anyone explain how to get the answer provided by the solution key?
As long as the invariants are not violated, the operation is technically valid. The insertion algorithm in CLRS splits on the way down, so it would split the root like you did.
However, another implementation might observe that the second child is empty and the first is full. That means the "rotation" can be done and the root node count is unaffected. The rotation involves pushing L down into the second child (prepending) and pulling I up into L's previous place in the root. Now the first child has only two entries and you can insert into it.
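The rotation can be sketched on plain lists. The letters below are a made-up tree, not the one from the question (only L in the root and I at the end of the first child are taken from the description above), and `rotate_into_right_sibling` is a hypothetical helper that only handles leaves, so no child pointers need to move.

```python
import bisect

MAX_KEYS = 3  # a 2-3-4 tree node holds at most three keys


def rotate_into_right_sibling(root_keys, leaves, i):
    """Move the separator down into leaf i+1 and the largest key of leaf i up."""
    left, right = leaves[i], leaves[i + 1]
    if len(left) != MAX_KEYS or len(right) >= MAX_KEYS:
        raise ValueError("rotation needs a full left leaf and room on the right")
    right.insert(0, root_keys[i])  # L is prepended to the second child
    root_keys[i] = left.pop()      # I takes L's place in the root


# Hypothetical two-level tree: a full root over four leaves.
root_keys = ["L", "P", "U"]
leaves = [["B", "H", "I"], ["M"], ["R"], ["X"]]

rotate_into_right_sibling(root_keys, leaves, 0)
bisect.insort(leaves[0], "G")  # the first leaf now has room for G
```

The root still has three keys afterwards and the tree keeps its height, whereas the split-on-the-way-down approach would have split the full root first.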
Animated insertion using the CLRS method you used
I have come across this question and haven't been able to answer it.
Given a B-tree of order 9 with 4 levels, will inserting a new item x and removing it right afterwards always bring the tree back to its original structure?
Will removing and then re-inserting an existing item x always bring the tree back to its original structure?
Prove it.
So far I have tried to disprove it but haven't been able to.
I honestly can't find the answer. I am not asking for a full proof; a general idea of how to prove it will satisfy me.
The answer obviously depends on the implementation of the insert and delete methods but in short: no.
I won't give you a full proof (because you didn't ask for it and because I'm too lazy) but the general idea should be that usually when you delete a node the inner-most node of the opposite side (relative to the parent) takes its place. So in any scenario where that node exists, it will be moved up. It also means that the node was not a leaf, which is a problem because insertion usually puts new nodes on the tree as a leaf. So the original structure will only be maintained if the inner-most node of the opposite side (relative to the parent) is empty.
This is the deletion I'm referring to. If you remove 2 and re-insert it, that's the counterexample.
For a B+ tree insertion why would you traverse down the tree then back upwards splitting the parents?
Wikipedia suggests this method of insertion:
Perform a search to determine what bucket the new record should go into.
If the bucket is not full (at most b - 1 entries after the insertion), add the record.
Otherwise, split the bucket.
Allocate new leaf and move half the bucket's elements to the new bucket.
Insert the new leaf's smallest key and address into the parent.
If the parent is full, split it too.
Add the middle key to the parent node.
Repeat until a parent is found that need not split.
If the root splits, create a new root which has one key and two pointers.
Why would you traverse down the tree and then go back up performing the splits? Why not split the nodes as you encounter them on the way down?
To me, the proposed method performs twice the work and requires more bookkeeping as well.
Can anyone explain why this is the preferred method for insertion as opposed to splitting on the way down and what the disadvantages are for inserting during the traversal?
You have to backtrack up the tree because you don't actually know whether a split is required at the lowest level until you get there.
It's all there in the phrase "If the bucket is not full, ...".
You should also be aware that it's nowhere near twice the work. Since you're remembering all sorts of stuff on the way down (node pointers, indexes within the node, and so on), there's not as much calculation or searching on the way back up.
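Here is a rough sketch of that "remember the path on the way down" idea. It is not the Wikipedia algorithm verbatim: `Node`, `insert`, `leaf_keys` and the capacity `MAX_KEYS` are assumptions chosen for illustration, leaves are not linked together, and the same capacity is used for leaves and internal nodes.

```python
import bisect

MAX_KEYS = 3  # hypothetical capacity for every node


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children  # None for a leaf


def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    # Way down: remember each node visited and the child index taken.
    path, node = [], root
    while node.children is not None:
        i = bisect.bisect_right(node.keys, key)
        path.append((node, i))
        node = node.children[i]
    bisect.insort(node.keys, key)

    # Way up: split only while the node just modified overflows.
    while len(node.keys) > MAX_KEYS:
        mid = len(node.keys) // 2
        if node.children is None:
            # Leaf split: the separator is copied up and stays in the leaf.
            right = Node(node.keys[mid:])
            sep = right.keys[0]
            node.keys = node.keys[:mid]
        else:
            # Internal split: the middle key moves up.
            sep = node.keys[mid]
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
        if not path:
            return Node([sep], [node, right])  # root split: tree grows
        parent, i = path.pop()  # no searching needed, the index was saved
        parent.keys.insert(i, sep)
        parent.children.insert(i + 1, right)
        node = parent
    return root


def leaf_keys(node):
    """All keys stored in the leaves, left to right."""
    if node.children is None:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


root = Node()
for value in range(1, 11):
    root = insert(root, value)
```

Most insertions stop right after the leaf is updated. The upward loop only runs when a split actually happened, and each step just pops a saved `(parent, index)` pair instead of searching again.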