Which element is the 'middle' in a B-Tree of even order? - b-tree

If I have an B-Tree of order 4 with the following data in it...
and I need to add 2 to the tree; do I...
add the 2 to the node (making it invalid, as it now has 4 keys), then split the node, taking the value 2 as the middle value and propagating it up
OR
do I not add the 2, take 3 as the middle value, propagate 3 up, then add 2 into the correct node?
Excuse the poor diagram.

You perform the first option. For a B-tree of any order you always add the node then perform splits that propagate upwards. For a great interactive demonstration of a variety of basic (insert, delete, search) operations on data structures, there is a useful algorithm visualization page I go to located here. Find the B-tree page and you will find that it performs option 1.

How to find which element to push upward:
1)Push the element in proper position of Btree and check if overflow occurs.
If then follow steps 2 and 3 given below.
2)find CEILING((order of Btree+1)/2).
3)Move that index element upward giving two pointers to left and right subtree.
Note:First insert the element then follow steps 2 and 3 if overflow occurs.
Here in this example first insert 2.
The partial leaf of the tree becomes |1| 2| 3| 5|.
overflow occurs because only 3 keys can there be in any node.
Find ceiling ((4+1)/2)= ceiling(5/2)= 3 (index no)
3rd index value 3 is the middle element. so propagate it up. 3's left pointer points to 1|2 and right points to 5.

Related

Given a list of keys, how do we find the almost complete binary search tree of that list?

I saw an answer here with the idea implemented in Python (not very familiar with Python) - I was looking for a more general algorithm.
EDIT:
For clarification:
Say we are given a list of integer keys: 23 44 88 12 74 32 7 39 10
That list was chosen arbitrarily. We are to create an almost complete (or complete) binary search tree from that list. There is supposed to be only one such tree...how do we find it?
A binary search tree is constructed so that all items on a node's left subtree are less than the node, and all nodes on the right subtree are greater than the node.
A complete (or almost complete) binary tree is one in which all levels except possibly the last are completely full, and the bottom level is filled to the left.
So, for example, this is an almost-complete binary search tree:
4
/ \
2 5
/ \
1 3
This is not:
3
/ \
2 4
/ \
1 5
Because the bottom level of the tree is not filled from the left.
If the number of items is one less than a power of two (i.e. 3, 7, 15, etc.), then building the tree is easy. Start by sorting the list. Then, take the middle element as the root. So if you have [1,2,3,4,5,6,7], and the root node is 4.
You do the same thing recursively for the right and left halves of the array.
If the number of items is not one less than a power of two, you have to adjust the starting point (the root node) so that the bottom row is left-filled. Note that you might have to apply that adjustment recursively, as well, whenever your subtree length is not one less than a power of two.
Since this is a homework assignment, I'll leave that for you to figure out.

Geneal B+ Tree split logic

I just want to know if you would split a leaf node after the insert or before the insert. lets say our capacity in the leaf is 4 elements and we already have 3 elements in there. would you add the 4th element and immediately split after the insert so we have now two nodes holding 2 elements each. Or would you just add the 4th element so that the leaf is full. Now if you add the 5th element (which would cause an overflow) we do the split and add the element which would result in 2 leaf nodes one holding 2 and one holding 3 elements.
EDIT: Since I have seed both approaches out there in the www. I would like to know the reason when to choose solution 1 or 2. Or if one of them even is incorrect for some reason.
https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
This visualization is very useful to understand B+ tree logic.

Why does adding nodes in (reverse) order make for inefficient searching?

I am preparing for an exam and I have stumbled on the following question:
Draw the binary search tree that would result if data were to be added in the following order:
10,9,8,7,6,5,4,3
Why is the tree that results unsuitable for efficient searching?
My Answer:
I would have thought when creating a BST that we start with the value 10 as the root node then add 9 as the left sub tree value on the first level. Then 8 to the left subtree of 9 and so on. I don't know why this makes it inefficient for searching though. Any ideas?
Since the values are in decreasing order, they get added to the left at each level, practically leaving you with a linked list, which takes O(N) to search, instead of the preferred O(logN) of a BST.
Drawing:
10
/
9
/
8
/
7
/
6
/
5
/
4
/
3
This would create a linked-list, since it would just be a series of nodes; which is a heavily unbalanced tree.
You should look up red-black trees. They have the same time-complexities, but it will constantly move around nodes, so that it is always forming a triangular shape. This will keep the tree balanced.
This is inefficient because the node will always be added to the left subtree of the prior node. Making a search check every every node in the list until it finds the result even though the answer will always be to the left thus actually making it take more computations than just simply having a list that is searched through a loop.

Indexing count of buckets

So, here is my little problem.
Let's say I have a list of buckets a0 ... an which respectively contain L <= c0 ... cn < H items. I can decide of the L and H limits. I could even update them dynamically, though I don't think it would help much.
The order of the buckets matter. I can't go and swap them around.
Now, I'd like to index these buckets so that:
I know the total count of items
I can look-up the ith element
I can add/remove items from any bucket and update the index efficiently
Seems easy right ? Seeing these criteria I immediately thought about a Fenwick Tree. That's what they are meant for really.
However, when you think about the use cases, a few other use cases creep in:
if a bucket count drops below L, the bucket must disappear (don't worry about the items yet)
if a bucket count reaches H, then a new bucket must be created because this one is full
I haven't figured out how to edit a Fenwick Tree efficiently: remove / add a node without rebuilding the whole tree...
Of course we could setup L = 0, so that removing would become unecessary, however adding items cannot really be avoided.
So here is the question:
Do you know either a better structure for this index or how to update a Fenwick Tree ?
The primary concern is efficiency, and because I do plan to implement it cache/memory considerations are worth worrying about.
Background:
I am trying to come up with a structure somewhat similar to B-Trees and Ranked Skip Lists but with a localized index. The problem of those two structures is that the index is kept along the data, which is inefficient in term of cache (ie you need to fetch multiple pages from memory). Database implementations suggest that keeping the index isolated from the actual data is more cache-friendly, and thus more efficient.
I have understood your problem as:
Each bucket has an internal order and buckets themselves have an order, so all the elements have some ordering and you need the ith element in that ordering.
To solve that:
What you can do is maintain a 'cumulative value' tree where the leaf nodes (x1, x2, ..., xn) are the bucket sizes. The value of a node is the sum of values of its immediate children. Keeping n a power of 2 will make it simple (you can always pad it with zero size buckets in the end) and the tree will be a complete tree.
Corresponding to each bucket you will maintain a pointer to the corresponding leaf node.
Eg, say the bucket sizes are 2,1,4,8.
The tree will look like
15
/ \
3 12
/ \ / \
2 1 4 8
If you want the total count, read the value of the root node.
If you want to modify some xk (i.e. change correspond bucket size), you can walk up the tree following parent pointers, updating the values.
For instance if you add 4 items to the second bucket it will be (the nodes marked with * are the ones that changed)
19*
/ \
7* 12
/ \ / \
2 5* 4 8
If you want to find the ith element, you walk down the above tree, effectively doing the binary search. You already have a left child and right child count. If i > left child node value of current node, you subtract the left child node value and recurse in the right tree. If i <= left child node value, you go left and recurse again.
Say you wanted to find the 9th element in the above tree:
Since left child of root is 7 < 9.
You subtract 7 from 9 (to get 2) and go right.
Since 2 < 4 (the left child of 12), you go left.
You are at the leaf node corresponding to the third bucket. You now need to pick the second element in that bucket.
If you have to add a new bucket, you double the size of your tree (if needed) by adding a new root, making the existing tree the left child and add a new tree with all zero buckets except the one you added (which we be the leftmost leaf of the new tree). This will be amortized O(1) time for adding a new value to the tree. Caveat is you can only add a bucket at the end, and not anywhere in the middle.
Getting the total count is O(1).
Updating single bucket/lookup of item are O(logn).
Adding new bucket is amortized O(1).
Space usage is O(n).
Instead of a binary tree, you can probably do the same with a B-Tree.
I still hope for answers, however here is what I could come up so far, following #Moron suggestion.
Apparently my little Fenwick Tree idea cannot be easily adapted. It's easy to append new buckets at the end of the fenwick tree, but not in it the middle, so it's kind of a lost cause.
We're left with 2 data structures: Binary Indexed Trees (ironically the very name Fenwick used to describe his structure) and Ranked Skip List.
Typically, this does not separate the data from the index, however we can get this behavior by:
Use indirection: the element held by the node is a pointer to a bucket, not the bucket itself
Use pool allocation so that the index elements, even though allocated independently from one another, are still close in memory which shall helps the cache
I tend to prefer Skip Lists to Binary Trees because they are self-organizing, so I'm spared the trouble of constantly re-balancing my tree.
These structures would allow to get to the ith element in O(log N), I don't know if it's possible to get faster asymptotic performance.
Another interesting implementation detail is I have a pointer to this element, but others might have been inserted/removed, how do I know the rank of my element now?
It's possible if the bucket points back to the node that owns it. But this means that either the node should not move or it should update the bucket's pointer when moved around.

Permuting a binary tree without the use of lists

I need to find an algorithm for generating every possible permutation of a binary tree, and need to do so without using lists (this is because the tree itself carries semantics and restraints that cannot be translated into lists). I've found an algorithm that works for trees with the height of three or less, but whenever I get to greater heights, I loose one set of possible permutations per height added.
Each node carries information about its original state, so that one node can determine if all possible permutations have been tried for that node. Also, the node carries information on weather or not it's been 'swapped', i.e. if it has seen all possible permutations of it's subtree. The tree is left-centered, meaning that the right node should always (except in some cases that I don't need to cover for this algorithm) be a leaf node, while the left node is always either a leaf or a branch.
The algorithm I'm using at the moment can be described sort of like this:
if the left child node has been swapped
swap my right node with the left child nodes right node
set the left child node as 'unswapped'
if the current node is back to its original state
swap my right node with the lowest left nodes' right node
swap the lowest left nodes two childnodes
set my left node as 'unswapped'
set my left chilnode to use this as it's original state
set this node as swapped
return null
return this;
else if the left child has not been swapped
if the result of trying to permute left child is null
return the permutation of this node
else
return the permutation of the left child node
if this node has a left node and a right node that are both leaves
swap them
set this node to be 'swapped'
The desired behaviour of the algoritm would be something like this:
branch
/ |
branch 3
/ |
branch 2
/ |
0 1
branch
/ |
branch 3
/ |
branch 2
/ |
1 0 <-- first swap
branch
/ |
branch 3
/ |
branch 1 <-- second swap
/ |
2 0
branch
/ |
branch 3
/ |
branch 1
/ |
0 2 <-- third swap
branch
/ |
branch 3
/ |
branch 0 <-- fourth swap
/ |
1 2
and so on...
The structure is just completely unsuited for permutations, but since you know it's left-centered you might be able to make some assumptions that help you out.
I tried working it in a manner similar to yours, and I always got caught on the fact that you only have a binary piece of information (swapped or not) which isn't sufficient. For four leaves, you have 4! (24) possible combinations, but you only really have three branches (3 bits, 8 possible combinations) to store the swapped state information. You simply don't have a place to store this information.
But maybe you could write a traverser that goes through the tree and uses the number of leaves to determine how many swaps are needed, and then goes through those swaps systematically instead of just leaving it to the tree itself.
Something like
For each permutation
Encode the permutation as a series of swaps from the original
Run these swaps on the original tree
Do whatever processing is needed on the swapped tree
That might not be appropriate for your application, but you haven't given that many details about why you need to do it the way you're doing it. The way you're doing it now simply won't work, since factorial (the number of permutations) grows faster than exponential (the number of "swapped" bits you have). If you had 8 leaves, you would have 7 branches and 8 leaves for a total of 15 bits. There are 40320 permutation of 8 leaves, and only 32768 possible combinations of 15 bits. Mathematically, you simply cannot represent the permutations.
What is wrong with making a list of all items in the tree, use generative means to build all possible orders (see Knuth Vol 4), and then re-map them to the tree structure?

Resources