Special minHeap: how to print all n elements in O(n)?

A special minHeap is a minHeap in which each level is sorted from left to right.
How can I print all n elements in order in O(n) time, worst case?
The minHeap is implemented as a binary heap, so the tree is a complete binary tree.
[Figure: an example of a special minHeap.]
For that example the result should be: [1,3,4,5,8,10,17,18,20,22,25,30]
This question is from homework.

If n is a parameter independent of the size of the heap, then under a standard comparison-based model, this is impossible. You will need additional restrictions, like more preexisting order than you've mentioned, or all elements of the heap being integers under a sufficiently low bound.
Suppose you have a heap of height k, where the root and its chain of left children have values 1, 2, 3, ..., k. We can assign values greater than k to the k-1 right children of these nodes in any order without violating the "special minHeap" condition, then assign values greater than those to fill out the rest of the heap. Printing the top 2k-1 values in this heap requires sorting the k-1 values that could be in any order, which requires Ω(k*log(k)) time in a comparison model.
If n is supposed to be the size of the heap, this is straightforward. The heap invariant is unnecessary; it only matters that the layers are sorted. Merge the first and second layers, then merge each successive layer into the already-merged result; this takes O(n) time in total. The kth merge combines the 2^k-1 already-merged elements with at most 2^k elements from the next layer, taking O(2^k) time. There are O(log(n)) merges, and summing O(2^k) from k=1 to k=log(n) gives O(n).
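Here is a minimal sketch of that layer-by-layer merge, assuming the heap is stored in the usual level-order array (level j occupies indices 2^j - 1 through 2^(j+1) - 2); the function names and the sample heap are illustrative, not taken from the question:

```python
# Merge the sorted levels of a "special" min-heap (level-order array) in O(n) total.

def merge_sorted(a, b):
    """Standard two-way merge of two sorted lists: O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def print_special_minheap(heap):
    merged = []
    start, width = 0, 1
    while start < len(heap):
        level = heap[start:start + width]      # each level is already sorted
        merged = merge_sorted(merged, level)   # O(len(merged) + len(level)) per step
        start += width
        width *= 2
    print(merged)

# A small made-up special minHeap: levels [2], [3, 10], [4, 5, 11, 12]
print_special_minheap([2, 3, 10, 4, 5, 11, 12])   # -> [2, 3, 4, 5, 10, 11, 12]
```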

Each level of the heap is in ascending order. There are log(n) levels.
We can do a merge of the levels, which is O(n log k). k in this case is the number of levels, or log(n), so we know it's possible to do this in O(n * log(log n)).
The levels have 1, 2, 4, 8, 16, etc. nodes in them. The first element merged out is the root, which exhausts the first level, so the merge heap drops to k-1 sources. In the worst case, by the time half of the nodes have been removed the second level is exhausted too and the merge heap is down to k-2 sources, and so on.
I don't have the math at hand, but I suspect the solution involves expanding the series (i.e. keeping track of the merge-heap size, weighted by the number of nodes that pass through each size) and showing that it sums to a small constant, which brings the bound down toward O(n).
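For what it's worth, the O(n * log(log n)) bound is easy to reach in practice with a standard k-way merge over the sorted levels, e.g. with Python's heapq.merge; a small sketch (the function name and sample heap are illustrative, not from the question):

```python
import heapq

def print_special_minheap_kway(heap):
    """k-way merge of the sorted levels: O(n log k) with k = number of levels = O(log n)."""
    levels = []
    start, width = 0, 1
    while start < len(heap):
        levels.append(heap[start:start + width])   # each level is already sorted
        start += width
        width *= 2
    print(list(heapq.merge(*levels)))

print_special_minheap_kway([2, 3, 10, 4, 5, 11, 12])   # -> [2, 3, 4, 5, 10, 11, 12]
```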

Related

Why is the top down approach of heap construction less efficient than bottom up even though its order of growth is lower O(log n) over O(n)?

How is the bottom-up approach of heap construction of the order O(n)? Anany Levitin says in his book that this is more efficient compared to the top-down approach, which is of order O(log n). Why?
That to me seems like a typo.
There are two standard algorithms for building a heap. The first is to start with an empty heap and to repeatedly insert elements into it one at a time. Each individual insertion takes time O(log n), so we can upper-bound the cost of this style of heap-building at O(n log n). It turns out that, in the worst case, the runtime is Θ(n log n), which happens if you insert the elements in reverse-sorted order.
The other approach is the heapify algorithm, which builds the heap directly by starting with each element in its own binary heap and progressively coalescing them together. This algorithm runs in time O(n) regardless of the input.
The reason why the first algorithm requires time Θ(n log n) is that, if you look at the second half of the elements being inserted, you'll see that each of them is inserted into a heap whose height is Θ(log n), so the cost of doing each bubble-up can be high. Since there are n / 2 elements and each of them might take time Θ(log n) to insert, the worst-case runtime is Θ(n log n).
On the other hand, the heapify algorithm spends the majority of its time working on small heaps. Half the elements are inserted into heaps of height 0, a quarter into heaps of height 1, an eighth into heaps of height 2, etc. This means that the bulk of the work is spent inserting elements into small heaps, which is significantly faster.
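For reference, both strategies are available directly in Python's heapq module; a rough sketch (for heapq's min-heap, the reverse-sorted, i.e. descending, input below is the worst case for the insertion-based build, as noted above):

```python
import heapq

data = list(range(100_000, 0, -1))   # descending: every push is a new minimum, worst case for heappush

# Build by repeated insertion: n pushes, Theta(n log n) on this input.
h1 = []
for x in data:
    heapq.heappush(h1, x)

# Build by heapify (bottom-up / Floyd's method): O(n) regardless of input order.
h2 = list(data)
heapq.heapify(h2)

assert h1[0] == h2[0] == 1           # both are valid min-heaps over the same elements
```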
If you consider swapping to be your basic operation -
In top-down construction, the tree is constructed first and a heapify function is called on the nodes. The worst case would swap log n times (to sift the element to the top of the tree, where the height of the tree is log n) for all n/2 leaf nodes. This results in an O(n log n) upper bound.
In bottom-up construction, you treat all the leaf nodes as already being valid heaps in the first pass, so heapify is now called only on the n/2 internal nodes. At each level, the number of possible swaps increases, but the number of nodes on which it happens decreases.
For example -
At the level right above the leaf nodes, we have n/4 nodes that can have at most 1 swap each.
At its parent level, we have n/8 nodes that can have at most 2 swaps each, and so on.
On summation, we'll come up with a O(n) efficiency for bottom up construction of a heap.
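Writing that summation out (using the per-level counts above and the standard identity sum over i of i/2^i = 2):

```latex
\frac{n}{4}\cdot 1 + \frac{n}{8}\cdot 2 + \frac{n}{16}\cdot 3 + \cdots
  \;=\; n\sum_{i\ge 1}\frac{i}{2^{\,i+2}}
  \;=\; \frac{n}{4}\sum_{i\ge 1}\frac{i}{2^{\,i}}
  \;=\; \frac{n}{4}\cdot 2
  \;=\; \frac{n}{2}
  \;=\; O(n).
```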
"Top-down" and "bottom-up" generally refer to ways of solving a problem, especially in computer science algorithms.
Top down :
Take the whole problem and split it into two or more parts.
Find solution to these parts.
If these parts turn out to be too big to be solved as a whole, split them further and find solutions to those sub-parts.
Merge solutions according to the sub-problem hierarchy thus created after all parts have been successfully solved.
Bottom up :
Breaking the problem into the smallest possible (and practical) parts.
Finding solutions to these small sub-problems.
Merging the solutions you get iteratively (again and again) until you have merged all of them into the final solution to the "big" problem.
The main difference in approach is splitting versus merging: you either start big and split "down" as required, or start with the smallest and merge your way "up" to the final solution.
In the regular heapify(), we perform two comparisons on each node from top to bottom to find the largest of three elements:
Parent node with the left child
The larger node from the first comparison with the second child
Bottom-up Heapsort, on the other hand, only compares the two children and follows the larger child to the end of the tree ("top-down"). From there, the algorithm goes back towards the tree root ("bottom-up") and searches for the first element larger than the root. From this position, all elements are moved one position towards the root, and the root element is placed in the slot that has become free.
Binary Heap can be built in two ways:
Top-Down Approach
Bottom-Up Approach
In the Top-Down Approach, you first begin with 3 elements. You consider 2 of them as heaps and the third as a key k. You then create a new heap by joining these two sub-heaps with the key as the root node. Then, you perform Heapify to maintain the heap order (either Min or Max Heap order).
Then we take two such heaps (containing 3 elements each) and another element as the key k, and create a new heap. We keep repeating this process, increasing the size of each sub-heap, until all elements are added.
This process adds half the elements at the bottom level, a quarter at the second-to-last level, an eighth at the third-to-last, and so on; therefore the complexity of this approach works out to O(n).
In the bottom-up approach, we first simply create a complete binary tree from the given elements. We then apply the DownHeap operation to each parent in the tree, starting from the last parent and going up to the root. This is a much simpler approach. However, since DownHeap's worst case is O(log n) and we apply it to about n/2 nodes, the obvious bound for this method is O(n log n); a tighter analysis (summing the work level by level, as above) shows it is actually O(n).

amortized analysis on min-heap?

If we perform n arbitrary insert and delete operations on an initially empty min-heap (where the location of each element to be deleted is given), why is the amortized cost O(1) for insert and O(log n) for delete?
a) insert O(log n), delete O(1)
b) insert O(log n), delete O(log n)
c) insert O(1), delete O(1)
d) insert O(1), delete O(log n)
Could anyone clarify this for me?
Based on your question and responses to comments, I'm going to assume a binary heap.
First, the worst case for insertion is O(log n) and the worst case for removal of the smallest item is O(log n). This follows from the tree structure of the heap. That is, for a heap of n items, there are log(n) levels in the tree.
Insertion involves (logically) adding the item as the lowest right-most node in the tree and then "bubbling" it up to the required level. If the new item is smaller than the root, then it has to bubble all the way to the top--all log(n) levels. So if you insert the numbers 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 into a min-heap, you'll hit the worst case for every insertion.
Removal of the smallest element involves replacing the lowest item (the root) with the last item and then "sifting" the item down to its proper position. Again, this can take up to log(n) operations.
That's the worst case. The average case is much different.
Remember that in a binary heap, half of the nodes are leafs--they have no children. So if you're inserting items in random order, half the time the item you're inserting will belong on the lowest level and there is no "bubble up" to do. So half the time your insert operation is O(1). Of the other half, half of those will belong on the second level up. And so on. The only time you actually do log(n) operations on insert is when the item you're inserting is smaller than the existing root item. It's quite possible, then, that the observed runtime behavior is that insertion is O(1). In fact that will be the behavior if you insert a sorted array into a min-heap. That is, if you were to insert the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 in that order.
When removing the smallest item from a min-heap, you take the last item from the heap and sift it down from the top. The "half the time" rule comes into play again, but this time it's working against you. That last item you took from the heap probably belongs down there on the lowest level, so you have to sift it all the way back down, which takes log(n) operations. Half the time you'll have to do all log(n) operations; for half of the remaining removals you'll need to do all but one of them, and so on. In fact, the minimum number of levels you have to sift down depends on the depth of the tree: for example, if your heap has more than three items, then removing the smallest item will require at least one sift-down operation, because the next-smallest item is always on the second level of the tree.
It turns out, then, that in the average case insertion into a binary heap takes much less than O(log n) time. It's likely closer to O(1). And removal from a binary heap is much closer to the worst case of O(log n).
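A quick way to see this behavior empirically is to count how many levels elements actually move; the sketch below (a hand-rolled heap with move counters; all names are mine, not from the question) inserts a random permutation and then deletes everything:

```python
import random

class CountingMinHeap:
    """Tiny binary min-heap that counts how far elements move on insert and delete-min."""
    def __init__(self):
        self.a = []
        self.up_moves = 0     # total levels climbed during inserts (sift-up)
        self.down_moves = 0   # total levels descended during delete-min (sift-down)

    def insert(self, x):
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[(i - 1) // 2], self.a[i] = self.a[i], self.a[(i - 1) // 2]
            i = (i - 1) // 2
            self.up_moves += 1

    def delete_min(self):
        a = self.a
        a[0] = a[-1]
        a.pop()
        i, n = 0, len(a)
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            smallest = i
            if l < n and a[l] < a[smallest]: smallest = l
            if r < n and a[r] < a[smallest]: smallest = r
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
            self.down_moves += 1

n = 1 << 16
h = CountingMinHeap()
for x in random.sample(range(n), n):   # insert a random permutation
    h.insert(x)
for _ in range(n):
    h.delete_min()
print("avg sift-up per insert:  ", h.up_moves / n)    # typically a small constant
print("avg sift-down per delete:", h.down_moves / n)  # close to log2(n), here ~16
```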

Get the k smallest elements of an array using quick sort

How would you find the k smallest elements from an unsorted array using quicksort (other than just sorting and taking the k smallest elements)? Would the worst case running time be the same O(n^2)?
You could adapt quicksort: all you have to do is skip the recursive calls on every partition other than the one containing position k, until a pivot lands exactly at position k. If you don't need your output sorted, you can stop there.
Warning: non-rigorous analysis ahead.
However, I think the worst-case time complexity will still be O(n^2). That occurs when you always pick the biggest or smallest element to be your pivot, and you devolve into bubble sort (i.e. you aren't able to pick a pivot that divides and conquers).
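A sketch of that truncated quicksort / quickselect idea (the function name is mine; a random pivot keeps the expected cost linear, though the worst case remains quadratic, as noted above):

```python
import random

def k_smallest(arr, k):
    """Return the k smallest elements of arr (not necessarily sorted).
    Partition as in quicksort, but only recurse into the side containing index k."""
    a = list(arr)

    def partition(lo, hi):
        p = random.randint(lo, hi)          # random pivot: expected O(n), worst case still O(n^2)
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i                            # pivot's final index

    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = partition(lo, hi)
        if p == k:
            break                           # a[:k] now holds the k smallest elements
        elif p < k:
            lo = p + 1                      # the boundary index k lies in the right part
        else:
            hi = p - 1                      # it lies in the left part
    return a[:k]

print(sorted(k_smallest([9, 1, 8, 2, 7, 3, 6, 4, 5], 4)))   # -> [1, 2, 3, 4]
```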
Another solution (if the only purpose of this collection is to pick out the k smallest elements) is to use a bounded heap of at most k nodes (tree height ceil(log(k))): a max-heap of the k smallest elements seen so far, evicting the current largest whenever a smaller element arrives. Each insert and removal then costs O(log(k)), so processing all n elements takes O(n*log(k)) total (versus O(n*log(n)) for a full heapsort), and it gives back the k smallest elements in sorted order. A full heapsort or mergesort would instead sort the whole array in O(n*log(n)) worst case.
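And a sketch of that bounded-heap alternative using Python's heapq, which is a min-heap, so values are negated to simulate the max-heap of the k smallest seen so far (the function name is mine):

```python
import heapq

def k_smallest_heap(arr, k):
    """Keep the k smallest elements seen so far in a size-k max-heap (stored negated):
    O(n log k) time, O(k) extra space; returns them in sorted order."""
    h = []                                  # negated values
    for x in arr:
        if len(h) < k:
            heapq.heappush(h, -x)
        elif -h[0] > x:                     # x beats the current largest of the k kept
            heapq.heapreplace(h, -x)
    return sorted(-v for v in h)

print(k_smallest_heap([9, 1, 8, 2, 7, 3, 6, 4, 5], 4))   # -> [1, 2, 3, 4]
```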

Why is the runtime of building a heap by inserting elements worse than using heapify?

In the CLRS book, building a heap by top-down heapify has the complexity O(n). A heap can also be built by repeatedly calling insertion, which has the complexity nlg(n) in the worst case.
My question is: is there any insight why the latter method has the worse performance?
I asked this question since I feel there are simple insights behind the math. For example,
quicksort, merge sort, and heapsort are all based on reducing unnecessary comparisons, but with different methods.
quicksort: balanced partition, no need to compare left subset to right subset.
merge sort: simply compare the two minimum elements from two sub-arrays.
heapsort: if A has larger value than B, A has larger value than B's descendants, and no need to compare with them.
The main difference between the two is which direction they work in: upwards (the O(n log n) algorithm) or downwards (the O(n) algorithm).
In the O(n log n) algorithm done by making n insertions, each insertion might potentially bubble up an element from the bottom of the (current) heap all the way up to the top. So imagine that you've built all of the heap except the last full layer. Imagine that every time you do an insertion in that layer, the value you've inserted is the smallest overall value. In that case, you'd have to bubble the new element all the way up to the top of the heap. During this time, the heap has height (roughly) log n - 1, so the total number of swaps you'll have to do is (roughly) n log n / 2 - n / 2, giving a runtime of Θ(n log n) in the worst-case.
In the O(n) algorithm done by building the heap in one pass, new elements are inserted at the tops of various smaller heaps and then bubbled down. Intuitively, there are progressively fewer elements the higher up you go in the heap, so most of the work is spent on the small heaps near the leaves rather than on the few elements near the top.
The major difference in the runtimes has to do with the direction. In the O(n log n) version, since elements are bubbled up, the runtime is bounded by the sum of the lengths of the paths from each node to the root of the tree, which is Θ(n log n). In the O(n) version, the runtime is bounded by the lengths of the paths from each node to the leaves of the tree, which is much lower (O(n)), hence the better runtime.
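In symbols, for a complete binary tree on n nodes, the two path sums referred to above are:

```latex
\underbrace{\sum_{v}\operatorname{depth}(v)}_{\text{bubble-up build}} = \Theta(n\log n)
\qquad\text{versus}\qquad
\underbrace{\sum_{v}\operatorname{height}(v)}_{\text{heapify build}}
  \;\le\; \sum_{h\ge 0}\frac{n}{2^{h+1}}\,h \;=\; n \;=\; O(n).
```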
Hope this helps!

Efficiently find order statistics of unsorted list prefixes?

A is an array of the integers from 1 to n in random order.
I need random access to the ith largest element of the first j elements, in logarithmic time or better.
What I've come up with so far is an n x n matrix M, where the element in the (i, j) position is the ith largest of the first j. This gives me constant-time random access, but requires n^2 storage.
By construction, M is sorted by row and column. Further, each column differs from its neighbors by a single value.
Can anyone suggest a way to compress M down to n log(n) space or better, with log(n) or better random access time?
I believe you can perform the access in O(log(N)) time, given O(N log(N)) preprocessing time and O(N log(N)) extra space. Here's how.
You can augment a red-black tree to support a select(i) operation which retrieves the element at rank i in O(log(N)) time. For example, see this PDF or the appropriate chapter of Introduction to Algorithms.
You can implement a red-black tree (even one augmented to support select(i)) in a functional manner, such that the insert operation returns a new tree which shares all but O(log(N)) nodes with the old tree. See for example Purely Functional Data Structures by Chris Okasaki.
We will build an array T of purely functional augmented red-black trees, such that the tree T[j] stores the indexes 0 ... j-1 of the first j elements of A sorted largest to smallest.
Base case: At T[0] create an augmented red-black tree with just one node, whose data is the number 0, which is the index of the 0th largest element in the first 1 elements of your array A.
Inductive step: For each j from 1 to N-1, at T[j] create an augmented red-black tree by purely functionally inserting a new node with index j into the tree T[j-1]. This creates at most O(log(j)) new nodes; the remaining nodes are shared with T[j-1]. This takes O(log(j)) time.
The total time to construct the array T is O(N log(N)) and the total space used is also O(N log(N)).
Once T[j-1] is created, you can access the ith largest element of the first j elements of A by performing T[j-1].select(i). This takes O(log(j)) time. Note that you can create T[j-1] lazily the first time it is needed. If A is very large and j is always relatively small, this will save a lot of time and space.
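Here is a compact sketch of the T[j] construction, substituting a path-copying, size-augmented plain BST for the augmented red-black tree (balancing is omitted for brevity, so the O(log N) bounds only hold when insertions happen to keep the tree shallow; it also stores the values themselves rather than indexes, and all names are mine):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int                      # here the value itself; the answer stores indexes instead
    left: "Optional[Node]"
    right: "Optional[Node]"
    size: int                     # number of nodes in this subtree (the "augmentation")

def size(t: Optional[Node]) -> int:
    return t.size if t else 0

def insert(t: Optional[Node], key: int) -> Node:
    """Path-copying insert: returns a new root sharing all untouched subtrees with t."""
    if t is None:
        return Node(key, None, None, 1)
    if key > t.key:               # larger keys go left, so rank 1 is the largest
        return Node(t.key, insert(t.left, key), t.right, t.size + 1)
    return Node(t.key, t.left, insert(t.right, key), t.size + 1)

def select(t: Node, i: int) -> int:
    """Return the i-th largest key in t (1-indexed); cost is proportional to the depth."""
    left = size(t.left)
    if i <= left:
        return select(t.left, i)
    if i == left + 1:
        return t.key
    return select(t.right, i - left - 1)

# Build the array T: T[j] is a persistent tree over the first j+1 elements of A.
A = [5, 1, 4, 2, 3]               # a made-up permutation for illustration
T, t = [], None
for x in A:
    t = insert(t, x)
    T.append(t)

print(select(T[2], 1))            # largest of A[0..2]      -> 5
print(select(T[4], 3))            # 3rd largest of A[0..4]  -> 3
```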
Unless I misunderstand, you are just finding the k-th order statistic of an array which is the prefix of another array.
This can be done using an algorithm that I think is called 'quickselect' or something along those lines. Basically, it's like quicksort:
Take a random pivot
Swap around array elements so all the smaller ones are on one side
You know the pivot is now the (p+1)th smallest element, where p is the number of smaller array elements
If p+1 = k, it's the solution! If p+1 > k, repeat on the 'smaller' subarray. If p+1 < k, repeat on the 'larger' subarray.
There's a (much) better description here under the Quickselect and Quicker Select headings, and also just generally on the internet if you search for k-th order quicksort solutions.
Although the worst-case time for this algorithm is O(n^2), like quicksort, its expected case is much better (also like quicksort) if you properly select your random pivots. I think the space complexity would just be O(n); you can just make one copy of your prefix whose ordering you're free to muck up.
