Why different pairs of levels can have different block sizes? - caching

According to the CSAPP, the block size is fixed between any particular pair of adjacent levels in the hierarchy. So that the block size of level k and level k+1 is the same. And the block size of level k-1 and level k is the same. So that it can lead to level k-1 and level k having the same block size. And so on, the level L0 has the same block size as the level L2.
enter image description here

Related

proving level of median in heap

I need to prove that the median of binary heap (doesn't matter if it is a min heap or max heap) can be in the lowest level of the heap (in the leaf). I am not sure how to prove it. I thought about using the fact that a heap is a complete binary tree but I am not sure about it. How can I prove it?
As #Evg mentioned in the comments, if all elements are the same, this is trivially true. Assume that all elements need to be different, and let us focus on the case with an odd amount of nodes 2H+1 and a min heap (the max heap case is similar). To create the min heap where the median is at the bottom, first insert the smallest H elements.
There are two cases. Case 1; after doing this the binary tree formed by these H elements is completely filled (every layer is filled) then you can just insert the remaining H+1 elements on the last layer (which you can do since the maximum capacity of the last layer equals (#total_nodes+1)/2 which is precisely H+1).
Case 2 The last layer still has some unfilled spaces. In this case, take the smallest remaining nodes from the largest H elements until this layer is filled (note that there will be no upward movement in your heap since these elements are already larger than whatever is in the tree). Then start the next layer by inserting the median. Finally insert the remaining nodes, which won't be moved upwards either since they are larger than whatever is in the layer above, by construction. By the same argument about the capacity of the last layer, you will not need to start a new layer during this process.
In the case where there are an even amount of nodes 2H, you can argue similarly, but you would have to define the median as H+1 smallest node (otherwise the statement you want to prove is false, as you can see by noticing that the only possible min-heap for the set {1,2} is the tree with root at 1 and leaf at 2).
Easiest way to prove it is just to make one:
1
2 3
4 5 6 7
Any complete heap with nodes in level order will have the median at the left-most leaf, but you don't have to prove that.

Circular Queue using Dynamic Array

I'm learning data structures from a "Fundamentals of Data structures in C" by Sahni. In the topic, Circular Queue using Dynamic Array, the author has mentioned below point,
Let capacity be the initial capacity of the circular queue,We must
first increase the size of the array using realloc,this will copy
maximum of capacity elements on to the new array. To get a proper
circular queue configuration, we must slide elements in the right
segment(i.e, elements A and B) to the right end of the array(refer
diagram 3.7.d). The array doubling and the slide to the right together
copy at most 2 * capacity -2 elements.
I understand array doubling copies at most capacity elements. But how does array doubling and slide to right copy at most 2 * capacity -2 elements??
Let us try to justify worst case scenario:
For a queue with capacity = N, there are maximum N-1 elements present in the queue.
So, when we double the queue size, we need to copy all these N-1 elements to new queue, and at max there can be N-1 shifts(for elements).
So in total, 2*(N-1) = 2*N - 2

Why does it require no more than 1 + logN compares when inserting a new node into a heap?

I think when inserting a new node into a heap, the amount of nodes it might passes by is logN, why is it (1 + logN), where is 1 from?
This is necessary to account for the border case when the number of notes is 2n. A heap of n levels fits 2n-1 objects, so adding one more object starts the new level:
Black squares represent seven elements of a three-level heap. Red element is number eight. If your search takes you to the location of this last element, you end up with four comparisons, even though log28 is three.

How many larger items in top and middle levels of heap

I'm looking in a book I have for algorithms and came across this problem which I'm having trouble understanding. Using a max heap:
LARGE is the bigger half of numbers to be sorted. I.E. LARGE = { n/2, n/2 + 1, ... n }
Initially, how many items in LARGE can be within the top (log n)/4 levels of the heap, after build heap phase of HeapSort?
Let A be the items in LARGE which are neither in the bottom level nor within the top (log n)/4 levels initially. At least how many items are in A?
As found out in a previous answer, the number of elements in LARGE that can initially be on the bottom level is 2^(h-1). My theory is that if I find the ones in the top log n / 4 levels, I can subtract the bottom from the top and find the middle ones. Not sure how to go about finding the top levels though. I can assume n is large enough to make the quarter of levels even, i.e. 8 levels / 4 = 2 levels for each quarter.

min/max number of records on a B+Tree?

I was looking at the best & worst case scenarios for a B+Tree (http://en.wikipedia.org/wiki/B-tree#Best_case_and_worst_case_heights) but I don't know how to use this formula with the information I have.
Let's say I have a tree B with 1,000 records, what is the maximum (and maximum) number of levels B can have?
I can have as many/little keys on each page. I can also have as many/little number of pages.
Any ideas?
(In case you are wondering, this is not a homework question, but it will surely help me understand some stuff for hw.)
I don't have the math handy, but...
Basically, the primary factor to tree depth is the "fan out" of each node in the tree.
Normally, in a simply B-Tree, the fan out is 2, 2 nodes as children for each node in the tree.
But with a B+Tree, typically they have a fan out much larger.
One factor that comes in to play is the size of the node on disk.
For example, if you have a 4K page size, and, say, 4000 byte of free space (not including any other pointers or other meta data related to the node), and lets say that a pointer to any other node in the tree is a 4 byte integer. If your B+Tree is in fact storing 4 byte integers, then the combined size (4 bytes of pointer information + 4 bytes of key information) = 8 bytes. 4000 free bytes / 8 bytes == 500 possible children.
That give you a fan out of 500 for this contrived case.
So, with one page of index, i.e. the root node, or a height of 1 for the tree, you can reference 500 records. Add another level, and you're at 500*500, so for 501 4K pages, you can reference 250,000 rows.
Obviously, the large the key size, or the smaller the page size of your node, the lower the fan out that the tree is capable of. If you allow variable length keys in each node, then the fan out can easily vary.
But hopefully you can see the gist of how this all works.
It depends on the arity of the tree. You have to define this value. If you say that each node can have 4 children then and you have 1000 records, then the height is
Best case log_4(1000) = 5
Worst case log_{4/2}(1000) = 10
The arity is m and the number of records is n.
The best and worst case depends on the no. of children each node can have. For the best case, we consider the case, when each node has the maximum number of children (i.e. m for an m-ary tree) with each node having m-1 keys. So,
1st level(or root) has m-1 entries
2nd level has m*(m-1) entries (since the root has m children with m-1 keys each)
3rd level has m^2*(m-1) entries
....
Hth level has m^(h-1)*(m-1)
Thus, if H is the height of the tree, the total number of entries is equal to n=m^H-1
which is equivalent to H=log_m(n+1)
Hence, in your case, if you have n=1000 records with each node having m children (m should be odd), then the best case height will be equal to log_m(1000+1)
Similarly, for the worst case scenario:
Level 1(root) has at least 1 entry (and minimum 2 children)
2nd level has as least 2*(d-1) entries (where d=ceil(m/2) is the minimum number of children each internal node (except root) can have)
3rd level has 2d*(d-1) entries
...
Hth level has 2*d^(h-2)*(d-1) entries
Thus, if H is the height of the tree, the total number of entries is equal to n=2*d^H-1 which is equivalent to H=log_d((n+1)/2+1)
Hence, in your case, if you have n=1000 records with each node having m children (m should be odd), then the worst case height will be equal to log_d((1000+1)/2+1)

Resources