Does anyone know how the sequence of insertions matters for 2-3-4 trees? Or B-trees?
It seems the formula for minimum height is log_m(k+1), where m is the maximum number of children and k is the number of keys.
And the formula for maximum height is log_n((k+1)/2), where n is the minimum number of children an internal node can have.
But what sequence of insertions actually gets these results? I don't know.
It has been suggested that, to minimise the height of the 2-3-4 tree, you take the median of the sorted sequence (e.g. for 1,2,3,4,5,6,7,8 it is 4), insert it, and then repeat for the sublists on either side of the median. Is this true? And if so, what sequence maximises the height?
Yes, the sequence of insertions matters. The tree will be taller for the same number of keys if more of its nodes are 1-nodes (nodes holding a single key). The way to maximise the number of 1-nodes is to keep expanding one branch of the tree to 4-nodes, which increases the height of the tree while leaving many nodes as 1-nodes. Essentially, insert the keys pre-sorted: 1,2,3,...,k. For a minimum-height tree, you want to expand all branches evenly so as to fill up each layer of the tree. So you insert the median of the keys, split the insertion list at this key, then insert the medians of the two halves of the list, and so on.
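The two bounds quoted in the question can be checked numerically. This is only a sketch: the function names are made up for illustration, and it assumes the first formula counts levels while the second counts edges below the root (the convention usually used for the B-tree height bound), so one has to be added to the second result before comparing the two.

```python
def min_levels(k, m):
    """Smallest number of levels h such that a full m-ary tree holds k keys.

    A tree with h completely full levels stores m**h - 1 keys, so this is
    ceil(log_m(k + 1)) computed with integers only.
    """
    h = 0
    capacity = 0  # keys held by a full tree with h levels
    while capacity < k:
        h += 1
        capacity = m ** h - 1
    return h


def max_height_edges(k, n):
    """Largest height h (in edges) reachable with k keys.

    The sparsest tree has a 1-key root and every other node with n children,
    which needs at least 2 * n**h - 1 keys; this is floor(log_n((k + 1) / 2)).
    """
    h = 0
    while 2 * n ** (h + 1) - 1 <= k:
        h += 1
    return h


# 2-3-4 tree: at most m = 4 children, at least n = 2 children per internal node.
print(min_levels(15, 4), max_height_edges(15, 2))
```

For a 2-3-4 tree holding 15 keys this gives 2 levels at best and height 3 (that is, 4 levels) at worst.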
In a traditional radix-balanced tree, the height of the tree can be determined quickly by counting the leading zeros of the number of elements of the tree, and indexes to children can be determined by taking successive chunks of bits from the element index.
However, consider a radix-balanced tree which is almost full. Now make that tree into a relaxed radix balanced tree, so that the tree is no longer "densely packed". This would require that the tree have additional height to accommodate all the elements which could only just barely have been contained by a fully-packed radix tree.
This means that the height of the tree is no longer given by clz(size), and the index into the first child is no longer taken from the first k nonzero bits of the element index (with k the log2 of the tree's branching factor). Instead we'd need to start with a more significant chunk of bits in the index (which would all be zero), and begin a linear search for the element in subtree number 0, "spilling over" into subtree number 1 if the first subtree didn't contain it.
Thus whether the tree "spills over" to a greater height depends on the size of the tree and the density of its packing.
In practice, how is this condition detected? (Obviously performance is important because these structures are designed for inner-loop usage!)
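For reference, the dense case described at the start of the question can be sketched as follows. This only illustrates the densely packed tree, not the relaxed one; the 32-way branching factor (`BITS = 5`) and the function names are assumptions chosen for the example, and Python's `int.bit_length()` stands in for a clz instruction.

```python
BITS = 5                  # assumed log2 of the branching factor (32-way nodes)
MASK = (1 << BITS) - 1


def dense_height(size):
    """Number of levels a densely packed radix tree needs for `size` elements."""
    if size <= 1:
        return 1
    significant_bits = (size - 1).bit_length()  # bits in the largest index
    return -(-significant_bits // BITS)         # ceiling division


def child_indexes(index, height):
    """Child slot to follow at each level, from the root down to the leaf."""
    return [(index >> (BITS * level)) & MASK
            for level in range(height - 1, -1, -1)]


print(dense_height(1024), dense_height(1025))
print(child_indexes(33, dense_height(1024)))
```

With 1024 elements two levels are enough, while the 1025th element forces a third. A relaxed tree loses exactly this guarantee, which is what the question is asking about.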
So in class one of my exercises was to find the insertion orders that result in a binary search tree with a minimum height and maximum height. The numbers being inserted were [1,2,3,4]. The resulting answer was this:
Figure 3.9:
However, what I fail to understand is why the insertion orders 1324, 1342, 4213 and 4231 are not included among the insertion orders resulting in a minimum height. Technically, don't these also result in a BST with a minimum height of 2?
Thank you in advance!
Interesting that the text doesn't mention those four cases. They don't have the worst case height, but they aren't minimal either. There are two features that characterize a tree:
the maximum depth from the root to any node
the average depth from the root to any node
A tree like 1432 has maximum depth 3, and average depth (0+1+2+3)/4 = 1.50
A tree like 3124 has maximum depth 2, and average depth (0+1+1+2)/4 = 1.00
A tree like 1324 has maximum depth 2, but average depth (0+1+2+2)/4 = 1.25
The best possible tree has the smallest average as well as the smallest maximum depth. To put it another way, the best possible tree has every level (except for the last) completely filled.
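The depth figures above can be reproduced with a small script. It is a sketch that assumes a plain, unbalanced BST with distinct keys; `depths` and `summarize` are names made up for this example.

```python
from itertools import permutations


def depths(order):
    """Insert keys in `order` into an unbalanced BST; return each key's depth."""
    left, right, depth = {}, {}, {}
    root = None
    for key in order:
        if root is None:
            root = key
            depth[key] = 0
            continue
        node, d = root, 0
        while True:
            children = left if key < node else right
            d += 1
            if node not in children:   # free slot found: attach the key here
                children[node] = key
                depth[key] = d
                break
            node = children[node]
    return depth


def summarize(order):
    """Return (maximum depth, average depth) for the given insertion order."""
    d = depths(order)
    return max(d.values()), sum(d.values()) / len(d)


for order in ([1, 4, 3, 2], [3, 1, 2, 4], [1, 3, 2, 4]):
    print(order, summarize(order))

# Insertion orders of 1..4 that reach maximum depth 2 without the best average.
print([p for p in permutations([1, 2, 3, 4]) if summarize(p) == (2, 1.25)])
```

The last line lists exactly the four orders from the question: they reach the minimum maximum depth, but not the minimum average depth.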
For example, even though the two trees below have the same number of nodes, and the same maximum depth, the tree on the left is not a minimum height tree because it's missing a node at the third level (which means that the average depth will be greater than the tree on the right).
How many binary tree shapes of N nodes are there with height N-1?
Also, how would you go about proving this by induction?
So a binary tree of height n-1 with n nodes means every node has only one child, a sort of chain-like structure? So the number of binary trees would be the number of different permutations of n numbers, which is n. Am I thinking in the right direction?
You are thinking in the right direction, and you have correctly transformed the original problem into a simpler one. However, what is strange is that the tree is explicitly stated to be "binary" when in fact the statement dictates an even tighter constraint.
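If only the shape is counted (ignoring what is stored in the nodes), the chain observation gives a direct induction: a tree with N nodes and height N-1 is a root with exactly one child, on the left or on the right, and that child is the root of a chain of N-1 nodes. So S(1) = 1 and S(N) = 2 * S(N-1), which gives 2^(N-1) shapes. The sketch below checks this by brute force; it assumes height is measured in edges, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes_up_to(n, h):
    """Number of binary tree shapes with n nodes and height at most h (edges)."""
    if n == 0:
        return 1          # the empty tree fits under any height limit
    if h < 0:
        return 0          # a non-empty tree has height at least 0
    # Choose how many nodes go left; both subtrees must fit in height h - 1.
    return sum(shapes_up_to(i, h - 1) * shapes_up_to(n - 1 - i, h - 1)
               for i in range(n))


def chain_shapes(n):
    """Number of shapes with n nodes and height exactly n - 1."""
    return shapes_up_to(n, n - 1) - shapes_up_to(n, n - 2)


print([chain_shapes(n) for n in range(1, 8)])
```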
I'm working on an algorithm problem. Given n, generate all structurally unique binary search trees that store the values 1...n. The solution was to enumerate each number i in the sequence and use it as the root; the subsequence 1...(i-1) then lies on the left branch of the root, and similarly the subsequence (i+1)...n lies on the right branch. Then construct the subtrees from the subsequences recursively. This approach ensures that the BSTs constructed are all unique, since they have unique roots.
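A minimal sketch of the recursion described above, assuming trees are represented as nested tuples `(root, left, right)` with `None` for an empty subtree (the representation and the name `generate` are choices made for this example):

```python
def generate(lo, hi):
    """Return all structurally unique BSTs storing the values lo..hi."""
    if lo > hi:
        return [None]                      # exactly one empty tree
    trees = []
    for root in range(lo, hi + 1):
        for left in generate(lo, root - 1):        # values below the root
            for right in generate(root + 1, hi):   # values above the root
                trees.append((root, left, right))
    return trees


for tree in generate(1, 3):
    print(tree)
print(len(generate(1, 4)))
```

For n = 3 this yields 5 trees and for n = 4 it yields 14, the Catalan numbers.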
Now my question is: what if the trees are not limited to binary search trees, but can be any binary tree? What would the solution be? I'd still want to go over all the cases with root i, where i = 1, ..., n. The left subtree doesn't have to be in the range 1...(i-1), and the right subtree doesn't have to be in the range (i+1)...n. But how should they be arranged then? Create an arbitrary subset of (i-1) elements and apply the same recursion?
Suppose you were given the following problem: given n disks, arrange them in unique binary-tree shapes. Then, following your correct reasoning in the question, you could say the following: I'll number the disks 1, 2, 3, .., n; then I'll (recursively) build the trees whose root is at disk #1, then disk #2, etc.
So the tree rooted-digraphs you (correctly) found really have nothing to do with the content in the nodes, let alone the question of whether the contents satisfied the BST invariant. Given your question here,
If the question is how many rooted digraphs exist, then it's the same as before.
If the question is how many combinations of rooted digraphs + node contents there are, then you just enumerate the rooted digraphs as you've done, and, for each one, enumerate the permutations of 1, 2, ... n.
If this is the case, and you don't need to enumerate the trees but only need their number, note that it is n! multiplied by the n-th Catalan number.
Use your algorithm for BSTs. This generates the unique shapes of trees. The shapes are unique because, for each root i, there are i-1 elements in the left subtree and the rest are on the right.
Then, for each shape, there are n! orderings of elements. This gives the results for arbitrary trees.
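The count from both answers can be written down directly. This sketch uses the closed form C(2n, n) / (n + 1) for the n-th Catalan number; the function names are made up for the example.

```python
from math import comb, factorial


def catalan(n):
    """n-th Catalan number: the number of binary tree shapes with n nodes."""
    return comb(2 * n, n) // (n + 1)


def labeled_binary_trees(n):
    """Shapes times the n! ways of placing the values 1..n into a shape."""
    return factorial(n) * catalan(n)


print([labeled_binary_trees(n) for n in range(1, 6)])
```

For n = 2, for example, there are 2 shapes (child on the left or on the right) and 2 ways to place the values in each, so 4 trees in total.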
The definition of balanced in this question is:
"The number of nodes in its left subtree and the number of nodes in its right subtree are almost equal, which means their difference is not greater than one."
Given n as the total number of nodes, how many such trees are there?
Also, what if we replace the number of nodes with the height? Given a height, how many height-balanced trees are there?
Only the last level makes a difference, so you can work out how many nodes are left for that level and count the ways to place them. With n nodes, the height is h = floor(log2(n)), so the tree down to depth k = h - 1 is completely full and needs m = sum(i=0..k) 2^i = 2^h - 1 nodes, which leaves n - m nodes for the last level. Some definitions of a balanced binary tree force all the nodes on the last level to be left-aligned, in which case there is obviously only one possibility. Without this constraint there are (2^h choose n - m) combinations, because you pick which of the 2^h possible slots on the last level receive the n - m remaining nodes.
For the height version, you take the sum of (2^h choose i) as i goes from 1 to 2^h, where h is the given height. That covers every possibility: 1 node on the last level, then 2, and so on, until all 2^h slots are assigned and the tree is a full binary tree.
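Both counts can be evaluated with a few lines. Note the assumption baked into this sketch: it counts trees in which every level above the last is completely full, which is the reading used in this answer and is not necessarily the same set of trees as the subtree-size definition quoted in the question. The function names are hypothetical.

```python
from math import comb


def trees_with_n_nodes(n):
    """Ways to place the leftover nodes on the last level (n >= 1)."""
    h = n.bit_length() - 1        # floor(log2(n)), the height of the tree
    m = 2 ** h - 1                # nodes in the full levels 0 .. h - 1
    return comb(2 ** h, n - m)    # choose slots for the remaining n - m nodes


def trees_with_height(h):
    """Sum over every possible number of nodes on the last level."""
    slots = 2 ** h
    return sum(comb(slots, i) for i in range(1, slots + 1))


print([trees_with_n_nodes(n) for n in range(1, 8)])
print(trees_with_height(2))
```

The sum for a given height is every non-empty subset of the 2^h last-level slots, so it equals 2^(2^h) - 1.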