B-tree insertion: during the descent of the tree, why do we split every node with 2t-1 elements? - algorithm

In the B-tree insertion algorithm, I see that in order to handle the case where we need to insert an element into a leaf that already has 2t-1 elements, we have to apply the split algorithm to the tree. What I don't understand is why, during the descent to the insertion point, we split every node with 2t-1 elements, even though that seems useless to me. For example:
[drawing: the example trees referenced in the answer below]
I understand that there is a case where a couple of nodes above the leaf have 2t-1 elements, and if we want to push the median up into them we run into a problem. But why not handle that specific case when it arises, instead of splitting every full node on the way down?
Correct me if I've said something wrong.

We split the full nodes on the way down to the target position because we don't know if we will need to "go back up." You can do it the way you are thinking, where we go down to the target node, split it, and then insert the median of the split into the parent, recursively splitting nodes as needed. But this requires us to go from the root, down to the target, and back up, potentially all the way to the root again. This might be undesirable, e.g. if accessing the nodes twice would be too expensive. In that case, it may be better to go in one pass straight down, where you split any full nodes to anticipate the need for more space.
For a demonstration, you can try inserting 10 into the trees in the middle and on the bottom of your drawing. The tree on the bottom, unsplit, needs to be split all the way to the root in the same way as the middle tree, because the two-pass algorithm didn't leave any space. In the middle tree, inserting 10 still causes a split, but it doesn't extend all the way up because the top two layers of the tree are very spacious.
There is an important caveat, though. Let t be the minimum number of children per node. For the two-pass algorithm, the maximum number of children a node can have needs to be at least u = 2t - 1. If it is less, like 2t - 2, then splitting a full node (2t - 3 elements), even with the additional element to insert, will not be able to make two non-deficient nodes. The one-pass algorithm requires a higher maximum, u = 2t. This is because the two-pass algorithm always has an element on hand to cancel exactly one deficiency. The one-pass algorithm does not have this ability, as it sometimes splits nodes unnecessarily, so it can't stick the element it's holding into one of the deficiencies. It might not belong there.
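For concreteness, here is a minimal Python sketch of the one-pass, CLRS-style insertion with minimum degree t, where every full node met on the descent is split pre-emptively; class and method names are mine, not from the question:

    class BTreeNode:
        def __init__(self, leaf=True):
            self.keys = []          # at most 2*t - 1 keys
            self.children = []      # at most 2*t children (empty for leaves)
            self.leaf = leaf

    class BTree:
        def __init__(self, t=2):
            self.t = t
            self.root = BTreeNode(leaf=True)

        def split_child(self, parent, i):
            """Split the full child parent.children[i] around its median key."""
            t = self.t
            child = parent.children[i]
            right = BTreeNode(leaf=child.leaf)
            median = child.keys[t - 1]
            right.keys = child.keys[t:]            # keys after the median
            child.keys = child.keys[:t - 1]        # keys before the median
            if not child.leaf:
                right.children = child.children[t:]
                child.children = child.children[:t]
            parent.keys.insert(i, median)
            parent.children.insert(i + 1, right)

        def insert(self, key):
            if len(self.root.keys) == 2 * self.t - 1:   # full root: grow the tree upward
                new_root = BTreeNode(leaf=False)
                new_root.children.append(self.root)
                self.split_child(new_root, 0)
                self.root = new_root
            self._insert_nonfull(self.root, key)

        def _insert_nonfull(self, node, key):
            while not node.leaf:
                i = len(node.keys)
                while i > 0 and key < node.keys[i - 1]:
                    i -= 1
                if len(node.children[i].keys) == 2 * self.t - 1:
                    self.split_child(node, i)           # pre-emptive split on the way down
                    if key > node.keys[i]:
                        i += 1
                node = node.children[i]
            i = len(node.keys)                          # node is now a non-full leaf
            while i > 0 and key < node.keys[i - 1]:
                i -= 1
            node.keys.insert(i, key)

    tree = BTree(t=2)                                   # t = 2 gives a 2-3-4 tree
    for k in [10, 20, 5, 6, 12, 30, 7, 17]:
        tree.insert(k)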

I've implemented B-trees several times, and have never split nodes on the way down.
Usually I implement insert recursively, such that node->insert(key, data) can return a new key to insert into the parent. The parent calls insert on the child node, and if the child splits it returns a new key to the parent. If the parent splits, it returns a key to its parent, and so on.
I've found that the insert implementation can stay pretty clean this way.
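A rough Python sketch of that bottom-up pattern, assuming a fixed maximum key count per node (the capacity constant and helper names are mine, and data payloads are omitted):

    import bisect

    MAX_KEYS = 3    # a node splits once it would exceed this many keys

    class Node:
        def __init__(self, keys=None, children=None):
            self.keys = keys or []
            self.children = children or []   # empty for leaves

        def insert(self, key):
            """Insert key below this node; return (median, new_right) if this node split."""
            if not self.children:                               # leaf
                bisect.insort(self.keys, key)
            else:
                i = bisect.bisect_left(self.keys, key)
                promoted = self.children[i].insert(key)
                if promoted:                                    # the child split: absorb its median
                    median, right = promoted
                    self.keys.insert(i, median)
                    self.children.insert(i + 1, right)
            if len(self.keys) > MAX_KEYS:
                return self._split()
            return None

        def _split(self):
            mid = len(self.keys) // 2
            median = self.keys[mid]
            right = Node(self.keys[mid + 1:], self.children[mid + 1:])
            self.keys = self.keys[:mid]
            if self.children:
                self.children = self.children[:mid + 1]
            return median, right

    class Tree:
        def __init__(self):
            self.root = Node()

        def insert(self, key):
            promoted = self.root.insert(key)
            if promoted:                                        # the root itself split: grow upward
                median, right = promoted
                self.root = Node([median], [self.root, right])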

Related

Implementing the Rope data structure using binary search trees (splay trees)

In a standard implementation of the Rope data structure using splay trees, the nodes would be ordered according to a rank statistic measuring the position of each one from the start of the string, so the keys normally found in a binary search tree would be irrelevant, would they not?
I ask because the keys shown in the graphic below (thanks Wikipedia!) are letters, which would presumably become non-unique once the number of nodes exceeded the length of the chosen alphabet. Wouldn't it be better to use integers or avoid using keys altogether?
Separately, can anyone point me to a good implementation of the logic to recompute rank statistics after each operation?
Presumably, if the index for a split falls within the substring attached to a particular node, say, between "Hel" and "llo_" on the node E above, you would remove the substring from E, split it and reattach it as two children of E. Correct?
Finally, after a certain number of such operations, the tree could, I suppose, end up with as many leaves as letters. What would be the best way to keep track of that and prune the tree (by combining substrings) as necessary?
Thanks!
For what it's worth, you can implement a Rope using Splay Trees by attaching a substring to each node of the binary search tree (not just to the leaf nodes as shown above).
The rank of each node is its size plus the size of its left subtree. But when recomputing ranks during splay operations, you need to remember to walk down the node.left.right branch, too.
If each node records a reference to the substring it represents (rather than the actual substring itself), everything runs faster. That way, when a split operation falls within an existing node, you just need to modify the node's attributes to reflect the right part of the substring you want to split, then add another node to represent the left part and merge it with the left subtree.
Done as above, each node records (in addition to its left, right and parent attributes, etc.) its rank, size (in characters) and the location of the first character it represents in the string you're trying to modify. That way, you never actually modify the initial string: you just do your operations on bits of the tree and reproduce the final string when you're ready by walking it in order.
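A small Python sketch of that node layout (not a full splay tree; names are mine, and a real implementation would cache subtree sizes rather than recomputing them): each node references a (start, length) slice of the original, never-modified string, and ranks drive index lookups.

    class RopeNode:
        def __init__(self, start, length, left=None, right=None):
            self.start = start      # offset of this piece in the original string
            self.length = length    # number of characters this node represents
            self.left = left
            self.right = right

        def rank(self):
            """As described above: this node's size plus the size of its left subtree."""
            return subtree_length(self.left) + self.length

    def subtree_length(node):
        # a real implementation would cache this on each node and update it during splays
        if node is None:
            return 0
        return subtree_length(node.left) + node.length + subtree_length(node.right)

    def char_at(node, i, original):
        """Return the i-th character of the rope, navigating by rank."""
        left_len = subtree_length(node.left)
        if i < left_len:
            return char_at(node.left, i, original)
        i -= left_len
        if i < node.length:
            return original[node.start + i]
        return char_at(node.right, i - node.length, original)

    original = "Hello_world"
    rope = RopeNode(5, 6, left=RopeNode(0, 5))     # "Hello" piece, then "_world" piece
    print("".join(char_at(rope, i, original) for i in range(11)))   # Hello_world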

Most performant way to find all the leaf nodes in a tree data structure

I have a tree data structure where each node can have any number of children, and the tree can be of any height. What is the optimal way to get all the leaf nodes in the tree? Is it possible to do better than just traversing every path in the tree until I hit the leaf nodes?
In practice the tree will usually have a max depth of 5 or so, and each node in the tree will have around 10 children.
I'm open to other types of data structures or special trees that would make getting the leaf nodes especially optimal.
I'm using JavaScript, but I'm really just looking for general recommendations in any language.
Thanks!
Memory layout is essential for optimal retrieval, so the child lists should be contiguous rather than linked lists, and the nodes should be placed one after another in retrieval order.
The more static your tree is, the better layout can be done.
All in one layout
All in one array totally ordered
Pro
memory can be streamed for maximal throughput (hardware pre-fetch)
no unneeded page lookups
normal lookups can be made
no extra memory to make linked lists.
internal nodes use offsets to find their children relative to themselves
Con
inserting / deleting can be cumbersome
insert / delete is O(N)
an insert might force a resize of the array, leading to a costly copy
Two array layout
One array for internal nodes
One array for leaves
Internal nodes point to the leaves
Pro
leaf nodes can be streamed at maximum throughput (maybe the best layout if you're mostly interested in the leaves; see the sketch after this list).
no unneeded page lookups
indirect lookups can be made
Con
if all leaves are ordered, insert / delete can be cumbersome
if leaves are unordered, insertion is easy: just add at the end
deleting unordered leaves is also a problem if no tombstones are allowed, as the last leaf would have to be moved back and the internal nodes would need fixing up (via a further indirection this can also be fixed, see slot-map)
resizing either array might lead to a large copy, though less so than the all-in-one layout, as the two arrays can be resized independently
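A tiny Python sketch of one possible encoding of this two-array layout (it assumes each internal node's leaves occupy a contiguous range, which is only one of several ways to "point to the leaves"): the leaves live in one flat list, so retrieving all of them is a straight linear scan with no tree walk.

    from dataclasses import dataclass

    @dataclass
    class Internal:
        first_leaf: int     # index of this node's first leaf in `leaves`
        leaf_count: int     # number of consecutive leaves belonging to it

    leaves = ["a", "b", "c", "d", "e", "f"]
    internals = [Internal(0, 2), Internal(2, 3), Internal(5, 1)]

    def all_leaves():
        return leaves                      # the whole point: no traversal needed

    def leaves_of(node):
        return leaves[node.first_leaf:node.first_leaf + node.leaf_count]

    print(all_leaves())                    # ['a', 'b', 'c', 'd', 'e', 'f']
    print(leaves_of(internals[1]))         # ['c', 'd', 'e']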
Array of arrays (dynamically sized, e.g. a C++ vector of vectors)
using contiguous arrays for referencing the children of each node
Pro
running through each child list is fast
each child array may be resized independently
Con
while this removes much of the extra work of linked-list children, the individual lists are dispersed among all the other data, making lookups take extra time
an insert might cause a resize and copy of an array
Finding the leaves of a tree is O(n), which is optimal for a tree, because you have to look at O(n) places to retrieve all n things; the branch nodes you pass along the way are the constant-factor overhead.
If we increase the branching factor, e.g. letting each branch have 32 children instead of 2, we significantly decrease the number of overhead nodes, which might make the traversal faster.
If we skip a branch, we're not including the values in that branch, so we have to look at all branches.
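As a baseline, a minimal Python sketch of that O(n) traversal with an explicit stack (the dict-based node shape is just for illustration):

    def collect_leaves(root):
        leaves = []
        stack = [root]
        while stack:
            node = stack.pop()
            if not node["children"]:            # no children -> it is a leaf
                leaves.append(node["value"])
            else:
                stack.extend(node["children"])
        return leaves

    tree = {"value": 1, "children": [
        {"value": 2, "children": []},
        {"value": 3, "children": [{"value": 4, "children": []}]},
    ]}
    print(collect_leaves(tree))                 # [4, 2] with this stack order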

HRW rendezvous hashing in log time?

The Wikipedia page for Rendezvous hashing (Highest Random Weight "HRW") makes the following claim:
While it might first appear that the HRW algorithm runs in O(n) time, this is not the case. The sites can be organized hierarchically, and HRW applied at each level as one descends the hierarchy, leading to O(log n) running time, as in.[7]
I got a copy of the referenced paper, "Hash-Based Virtual Hierarchies for Scalable Location Service in Mobile Ad-hoc Networks." However the hierarchy referenced in their paper seems to be very specific to their application domain. As far as I can discern, there is no clear indication of how to generalize the method. The Wikipedia remark makes it seem like log is the general case.
I looked at a few general HRW implementations, and none of them seemed to support anything better than linear time. I gave it some thought, but I don't see any way to organize sites hierarchically without the parent nodes causing inefficient remapping when they drop out, which would significantly defeat the main advantage of HRW.
Does anybody know how to do this? Alternatively, is Wikipedia incorrect about there being a general way to implement this in log time?
Edit: Investigating mcdowella's approach:
OK, I think I see how this could work. But you need a little more than you've specified.
If you just do what you've described, you get into a situation where each leaf probably has either zero or one nodes in it, and there's significant variance in how many nodes are in the leaf-most subtrees. If you replace HRW at each level with just making the whole thing a regular search tree, you get exactly the same effect. Essentially, you've got an implementation of consistent hashing, along with its flaw of unequal loading between buckets. Computing the combined weights, the defining operation of HRW, adds nothing; you're better off just doing a search at each level, since it saves doing the hashes, and it can be implemented without looping over each radix value.
It's fixable though: you just need to be using HRW to choose from many alternatives at the final level. That is, you need all of the leaf nodes to be in large buckets, comparable to the number of replicas you'd have in consistent hashing. These large buckets should be approximately equally-loaded compared to each other, and then you're using HRW to choose the specific site. Since the bucket sizes are fixed, that final HRW step is O(1), and we get all of the key HRW properties.
Honestly though, I think this is pretty questionable. It isn't so much an implementation of HRW, as it is just combining HRW with consistent hashing. I guess there's nothing wrong with that, and it might even be better than the usual technique of using replicas, in some cases. But I think it's misleading to state that HRW is log(n), if this is actually what the author meant.
The original description is also questionable. You don't need to apply HRW at each level, and you shouldn't, as there is no advantage in doing so; you should do something fast (such as indexing), and just use HRW for the final choice.
Is this really the best we can do, or is there some other way to make HRW O(log(n))?
If you give each site a sufficiently long random id expressed in radix k (perhaps by hashing a non-random id), then you can associate the sites with leaves of a tree which has at most k children at each node. There is no need to associate any site with an internal node of the tree.
To work out where to store an item, use HRW to work out from the root of the tree down which way to branch at tree nodes, stopping when you reach a leaf, which is associated with a site. You can do this without having to communicate with any site until you work out which site you want to store the item at - all you need to know is the hashed ids of the sites to construct a tree.
Because sites are associated only with leaves there is no way an internal node of the tree can drop out, except if all of the sites associated with leaves under it drop out, at which point it will become irrelevant.
I don't buy the updated answer. There are two nice properties of HRW that appear to get lost when you compare the weights of branches instead of all sites.
One is that you can pick the top-n sites instead of just the primary, and these should be randomly distributed. If you're descending into a single tree, the top-n sites will be near each other in the tree. This could be fixed by descending multiple times with different salts, but that seems like a lot of extra work.
Two is that it is obvious what happens when a site is added or removed, and only 1/|sites| of the data moves in the case of an add. If you modify the existing tree, it only affects the peer site. In the case of an add, the only data that moves is from the new peer of the added site. In the case of a delete, all the data that was at that site now moves to the former peer. If you instead recompute the tree, all of the data could move depending on the way the tree is constructed.
I think you can use the same "virtual node" approach normally used for consistent hashing. Suppose you have N physical nodes with IDs:
{n1,...,nN}.
Choose V, the number of virtual nodes per physical node, and generate a new list of IDs:
{n1v1, n1v2, ..., n1vV,
 n2v1, n2v2, ..., n2vV,
 ...,
 nNv1, nNv2, ..., nNvV}.
Arrange these into the leaves of a fixed but randomized binary tree with labels on the internal nodes. These internal labels could be, for example, a concatenation of the labels of their child nodes.
To choose a physical node to store an object O at, start at the root and choose the branch with the higher hash H(label,O). Repeat the process until you reach a leaf. Store the object at the physical node corresponding to the virtual node at that leaf. This takes O(log(NV)) = O(log(N)+log(V)) = O(log(N)) steps (since V is constant).
If a physical node fails, the objects at that node are rehashed, skipping over subtrees with no active leaves.
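Here is a hedged Python sketch of that construction; the tree-building and hashing details (SHA-256, pairing leaves level by level, concatenated labels) are my own choices for illustration rather than anything prescribed by this answer:

    import hashlib, random

    def H(label, obj):
        return int.from_bytes(hashlib.sha256(f"{label}|{obj}".encode()).digest(), "big")

    def build_tree(leaf_labels):
        """Pair labels up level by level into a (label, left, right) tuple tree."""
        level = [(lab, None, None) for lab in leaf_labels]
        while len(level) > 1:
            nxt = []
            for i in range(0, len(level) - 1, 2):
                a, b = level[i], level[i + 1]
                nxt.append((a[0] + b[0], a, b))    # internal label = concatenation
            if len(level) % 2:
                nxt.append(level[-1])              # odd node out is carried up a level
            level = nxt
        return level[0]

    def choose(tree, obj):
        label, left, right = tree
        while left is not None:
            # descend toward the child whose label hashes higher together with the object
            label, left, right = left if H(left[0], obj) > H(right[0], obj) else right
        return label

    # 4 physical nodes x 3 virtual nodes each; IDs and the seed are illustrative
    virtual = [f"n{p}v{v}" for p in range(4) for v in range(3)]
    random.Random(42).shuffle(virtual)             # "fixed but randomized" leaf placement
    tree = build_tree(virtual)
    print(choose(tree, "object-123"))              # e.g. 'n2v1' -> store at physical node n2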
One way to implement HRW rendezvous hashing in log time
One way to implement rendezvous hashing in O(log N), where N is the number of cache nodes:
Each file named F is cached in the cache node named C with the largest weight w(F,C), as is normal in rendezvous hashing.
First, we use a nonstandard hash function w() something like this:
w(F,C) = h(F) xor h(C).
where h() is some good hash function.
tree construction
Given some file named F, rather than calculate w(F,C) for every cache node -- which requires O(N) time for each file --
we pre-calculate a binary tree based only on the hashed names h(C) of the cache nodes;
a tree that lets us find the cache node with the maximum w(F,C) value in O(log N) time for each file.
Each leaf of the tree contains the name C of one cache node.
The root (at depth 0) of the tree points to 2 subtrees.
All the leaves where the most significant bit of h(C) is 0 are in the root's left subtree; all the leaves where the most significant bit of h(C) is 1 are in the root's right subtree.
The two children of the root node (at depth 1) deal with the next-most-significant bit of h(C).
And so on, with the interior nodes at depth D dealing with the D'th-most-significant bit of h(C).
With a good hash function, each step down from the root approximately halves the candidate cache nodes in the chosen subtree,
so we end up with a tree of depth roughly log2 N.
(If we end up with a tree that is "too unbalanced",
somehow get everyone to agree on some different hash function from some universal hashing family and rebuild the tree, before we add any files to the cache, until we get a tree that is "not too unbalanced".)
Once the tree has been built, we never need to change it no matter how many file names F we later encounter.
We only change it when we add or remove cache nodes from the system.
filename lookup
For a filename F that happens to hash to h(F) = 0 (all zero bits),
we find the cache node with the highest weight (for that filename) by starting at the root and always taking the right subtree when possible.
If that leads us to an interior node that doesn't have a right subtree, then we take its left subtree.
Continue until we reach a node without a left or right subtree -- i.e., a leaf node that contains the name of the selected cache node C.
When looking up some other file named F, first we hash its name to get h(F), then
we start at the root and go right or left respectively (when possible), depending on whether the next bit in h(F) is 0 or 1.
Since the tree (by construction) is not "too unbalanced",
traversing the whole tree from the root to the leaf that contains the name of the chosen cache node C requires O(log N) time in the worst case.
We expect that for a typical set of file names,
the hash function h(F) "randomly" chooses left or right at each depth of the tree.
Since the tree (by construction) is not "too unbalanced",
we expect each physical cache node to cache roughly the same number of files (within a multiple of 4 or so).
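A hedged Python sketch of this construction and lookup. For simplicity it stores each cache node at the full 32-bit depth of h(C) instead of stopping as soon as the prefixes become unique (so the walk is O(bits) rather than roughly log2 N), but the node it selects is the same as the brute-force argmax of w(F,C) = h(F) xor h(C), barring hash collisions; the hash choice and names are mine.

    import hashlib

    BITS = 32

    def h(name):
        return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

    def bit(x, depth):
        return (x >> (BITS - 1 - depth)) & 1         # most significant bit first

    class Trie:
        def __init__(self):
            self.child = {0: None, 1: None}
            self.site = None                          # set on leaves only

    def insert_site(root, name):
        node, hc = root, h(name)
        for depth in range(BITS):
            b = bit(hc, depth)
            if node.child[b] is None:
                node.child[b] = Trie()
            node = node.child[b]
        node.site = name

    def lookup(root, filename):
        node, hf, depth = root, h(filename), 0
        while node.site is None:
            want = 1 - bit(hf, depth)                 # a differing bit maximizes the xor
            nxt = node.child[want] if node.child[want] is not None else node.child[1 - want]
            node, depth = nxt, depth + 1
        return node.site

    root = Trie()
    for s in ["cacheA", "cacheB", "cacheC", "cacheD"]:
        insert_site(root, s)
    print(lookup(root, "some/file.txt"))              # the cache node with the largest w(F,C)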
drop out effects
When some physical cache node fails,
everyone deletes the corresponding leaf node from their copy of this tree.
(Everyone also deletes every interior node that then has no leaf descendants).
This doesn't require moving around any files cached on any other cache node -- they still map to the same cache node they always did.
(The right-most leaf node in a tree is still the right-most leaf node in that tree, no matter how many other nodes in that tree are deleted).
For example,
   ....
       \
        |
       / \
      |   |
     /   / \
    |   X   |
   / \     / \
  V   W   Y   Z
With this O(log N) algorithm, when cache node X dies, leaf X is deleted from the tree, and all its files become (hopefully relatively evenly) distributed between Y and Z -- none of the files from X end up at V or W or any other cache node.
All the files that previously went to cache nodes V, W, Y, Z continue to go to those same cache nodes.
rebalancing after dropout
Many cache nodes failing, or many new cache nodes being added, or both, may make the tree "too unbalanced".
Picking a new hash function is a big hassle after we've added a bunch of files to the cache, so rather than picking a new hash function like we did when initially constructing the tree, maybe it would be better to somehow rebalance the tree by removing a few nodes, renaming them with some new semi-random names, and then adding them back to the system.
Repeat until the system is no longer "too unbalanced".
(Start with the most unbalanced nodes -- the nodes caching the least amount of data.)
comments
p.s.:
I think this may be pretty close to what mcdowella was thinking,
but with more details filled in to clarify that (a) yes, it is log(N) because it's a binary tree that is "not too unbalanced", (b) it doesn't have "replicas", and (c) when one cache node fails, it doesn't require any remapping of files that were not on that cache node.
p.p.s.:
I'm pretty sure that Wikipedia page is wrong to imply that typical implementations of rendezvous hashing run in O(log N) time, where N is the number of cache nodes.
It seems to me (and I suspect the original designers of the hash as well) that the time it takes to (internally, without communicating) recalculate a hash against every node in the network is going to be insignificant and not worth worrying about compared to the time it takes to fetch data from some remote cache node.
My understanding is that rendezvous hashing is almost always implemented with a simple linear algorithm that uses O(N) time, where N is the number of cache nodes, every time we get a new filename F and want to choose the cache node for that file.
Such a linear algorithm has the advantage that it can use a "better" hash function than the above xor-based w(), so when some physical cache node dies, all the files that were cached on the now-dead node are expected to become evenly distributed among all the remaining nodes.
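For comparison, the simple linear algorithm the last two paragraphs describe is just an argmax over the nodes; a minimal Python sketch (the hash choice and names are mine):

    import hashlib

    def weight(filename, node):
        return int.from_bytes(hashlib.sha256(f"{filename}|{node}".encode()).digest(), "big")

    def choose_node(filename, nodes):
        return max(nodes, key=lambda n: weight(filename, n))     # O(N) per lookup

    print(choose_node("some/file.txt", ["cacheA", "cacheB", "cacheC"]))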

Insertion In 2-3-4 tree

Consider the following 2-3-4 tree (i.e., B-tree with a minimum degree of two) in
which each data item is a letter. The usual alphabetical ordering of letters is used
in constructing the tree.
What is the result of inserting G in the above tree?
I am getting the answer as
But the answer in solution key is
Can anyone explain how to get the answer provided by the solution key?
As long as the invariants are not violated, the operation is technically valid. The insertion algorithm in CLRS splits on the way down, so it would split the root like you did.
However, another implementation might observe that the second child has room to spare while the first is full. That means a "rotation" can be done and the root's key count is unaffected. The rotation involves pushing L down into the second child (prepending it) and pulling I up into L's previous place in the root. Now the first child has only two entries and you can insert into it.
Animated insertion using the CLRS method you used
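A hedged Python sketch of that rotation in a generic B-tree node representation (the node class, helper name, and the extra keys in the example are made up; only "I" and "L" come from the answer):

    class Node:
        def __init__(self, keys, children=None):
            self.keys = keys
            self.children = children or []       # empty for leaves

    def rotate_left_to_right(parent, i):
        """Drop the separator parent.keys[i] into child i+1 (which must have room)
        and pull the largest key of child i up into its place."""
        left, right = parent.children[i], parent.children[i + 1]
        right.keys.insert(0, parent.keys[i])     # separator is prepended to the right sibling
        parent.keys[i] = left.keys.pop()         # left sibling's largest key moves up
        if left.children:                        # carry the subtree that sits between them
            right.children.insert(0, left.children.pop())

    # Made-up keys loosely following the answer's description: "L" drops into the
    # second child and "I", the largest key of the full first child, replaces it.
    root = Node(["L"], [Node(["C", "E", "I"]), Node(["N"])])
    rotate_left_to_right(root, 0)
    print(root.keys, root.children[0].keys, root.children[1].keys)
    # ['I'] ['C', 'E'] ['L', 'N']  -- now G can be inserted into the first child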

Disadvantages of top-down node splitting on insertion into B+ tree

For a B+ tree insertion, why would you traverse down the tree and then back upwards, splitting the parents?
Wikipedia suggests this method of insertion:
Perform a search to determine what bucket the new record should go into.
If the bucket is not full (at most b - 1 entries after the insertion), add the record.
Otherwise, split the bucket.
Allocate new leaf and move half the bucket's elements to the new bucket.
Insert the new leaf's smallest key and address into the parent.
If the parent is full, split it too.
Add the middle key to the parent node.
Repeat until a parent is found that need not split.
If the root splits, create a new root which has one key and two pointers.
Why would you traverse down the tree and then go back up performing the splits? Why not split the nodes as you encounter them on the way down?
To me, the proposed method performs twice the work and requires more bookkeeping as well.
Can anyone explain why this is the preferred method for insertion as opposed to splitting on the way down and what the disadvantages are for inserting during the traversal?
You have to backtrack up the tree because you don't actually know whether a split is required at the lowest level until you get there.
It's all there in the phrase "If the bucket is not full, ...".
You should also be aware that it's nowhere near twice the work. Since you're remembering all sorts of stuff on the way down (node pointers, indexes within the node, and so on), there's not as much calculation or searching on the way back up.
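A hedged Python sketch of that two-pass idea: remember each (parent, child index) on the way down, then walk back up splitting only while a node overflows. The node layout, the capacity constant, and the omission of the leaf-to-leaf links a real B+ tree would maintain are my simplifications.

    MAX_KEYS = 4      # assumed node capacity for this sketch

    class Node:
        def __init__(self, keys=None, children=None):
            self.keys = keys or []
            self.children = children or []     # empty for leaf nodes

    def insert(root, key):
        """Insert key and return the (possibly new) root."""
        # Pass 1: descend to the leaf, remembering each (parent, child index) taken.
        path, node = [], root
        while node.children:
            i = 0
            while i < len(node.keys) and key >= node.keys[i]:
                i += 1
            path.append((node, i))
            node = node.children[i]
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node.keys.insert(i, key)
        # Pass 2: walk back up the remembered path, splitting only while needed.
        while len(node.keys) > MAX_KEYS:
            mid = len(node.keys) // 2
            if node.children:                          # internal node: median moves up
                sep = node.keys[mid]
                right = Node(node.keys[mid + 1:], node.children[mid + 1:])
                node.children = node.children[:mid + 1]
            else:                                      # leaf: smallest right key is copied up
                sep = node.keys[mid]
                right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            if not path:                               # the root itself split: grow a new root
                return Node([sep], [node, right])
            parent, idx = path.pop()
            parent.keys.insert(idx, sep)
            parent.children.insert(idx + 1, right)
            node = parent
        return root

    root = Node()
    for k in [5, 9, 1, 7, 3, 8, 2, 6, 4]:
        root = insert(root, k)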
