Why is the time complexity of heapq.merge higher than that of heapq.heapify? - big-o

Merging k sorted lists containing a total of n elements using heapq.merge is supposed to have a complexity of O(n * log k). This is not directly listed in the documentation, but can be inferred from the Wikipedia article mentioning the direct k-way merge. It also seems fairly intuitive: you create a heap of the k top elements, then repeatedly pop the root of that heap and push onto the heap the next element from the same list, until the heap (and the lists feeding it) are empty.
What bugs me is that the complexity of this algorithm is higher than that of heapq.heapify if the latter is applied to the same number of elements n supplied in a single unsorted list. The latter's complexity is known to be O(n).
This does not make sense - it should be the other way round. It should be more difficult to heapify n unordered elements than to heapify the same elements as sorted in k lists.
What am I missing here?

Direct k-way merge produces a sorted array from your input of sorted arrays.
Creating a heap from all your n elements in unsorted order produces, well, a heap.
A heap is not a sorted list; in fact you need to do a lot of work to produce a sorted list from a heap, as discussed in articles about heapsort, which is an O(n log n) sorting algorithm. So creating the heap may be O(n), but its output is different from that of k-way merge. In this context, you may view a heap as a weaker structure than an already sorted array. Thus, it makes sense that this time complexity is smaller than that of k-way merge.
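The difference in outputs is easy to see with Python's heapq: heapify rearranges the array in place into heap order (which is not sorted order), while merge consumes already-sorted inputs and yields a fully sorted sequence.

```python
import heapq

data = [9, 4, 7, 1, 8, 2]

# heapify works in place in O(n), but the result is a heap, not a sorted list
heap = data[:]
heapq.heapify(heap)
print(heap)        # a valid min-heap, e.g. [1, 4, 2, 9, 8, 7] - not sorted

# merge takes already-sorted inputs and yields them fully sorted, in O(n log k)
merged = list(heapq.merge([1, 4, 9], [2, 7, 8]))
print(merged)      # [1, 2, 4, 7, 8, 9]
```

Note that `heap` satisfies the heap invariant (every parent is no larger than its children) without being sorted, which is exactly the "weaker" guarantee described above.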

Related

existence of a certain data structure

I'm wondering, can there exist such a data structure under the following criteria and time bounds (it might be complicated)?
If we obtain an unsorted list L and build a data structure out of it like this:
Build(L,X) - in O(n) time, we build the structure S from an unsorted list of n elements
Insert(y,S) - in O(lg n) time, we insert y into the structure S
DEL-MIN(S) - in O(lg n) time, we delete the minimal element from S
DEL-MAX(S) - in O(lg n) time, we delete the maximal element from S
DEL-MED(S) - in O(lg n) time, we delete the upper median (ceiling function) element from S
The problem is that the list L is unsorted. Can such a data structure exist?
DEL-MIN and DEL-MAX are easy: keep a min-heap and a max-heap of all the elements. The only trick is that you have to keep cross-indices between the heaps so that when (for example) you remove the max, you can also find it and remove it in the min-heap.
For DEL-MED, you can keep a max-heap of the elements less than the median and a min-heap of the elements greater than or equal to the median. The full description is in this answer: Data structure to find median. Note that in that answer the floor-median is returned, but that's easily fixed. Again, you need to use the cross-indexing trick to refer to the other data structures as in the first part. You will also need to think about how this handles repeated elements, if that's possible in your problem formulation. (If necessary, you can do it by storing repeated elements as (count, value) in your heap, but this complicates rebalancing the heaps on insert/remove a little.)
Can this all be built in O(n)? Yes -- you can find the median of n things in O(n) time (using the median-of-median algorithm), and heaps can be built in O(n) time.
So overall, the data structure is 4 heaps (a min-heap of all the elements, a max-heap of all the elements, a max-heap of the floor(n/2) smallest elements, and a min-heap of the ceil(n/2) largest elements), all with cross-indexes to each other.
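A minimal Python sketch of just the median half of this structure (the two inner heaps, using negated values for the max-heap, as heapq only provides min-heaps). The cross-indexing and the global min/max heaps are omitted for brevity, and the build shown here sorts for simplicity; a true O(n) build would use median-of-medians plus two heapifies instead.

```python
import heapq

class MedianStructure:
    """Sketch: max-heap (negated) of the smaller half, min-heap of the
    larger half.  The upper (ceiling) median is always high[0]."""

    def __init__(self, items):
        items = sorted(items)  # simplification; use median-of-medians for O(n)
        mid = len(items) // 2
        self.low = [-x for x in items[:mid]]   # max-heap of floor(n/2) smallest
        self.high = items[mid:]                # min-heap of ceil(n/2) largest
        heapq.heapify(self.low)
        heapq.heapify(self.high)

    def insert(self, y):                       # O(log n)
        if self.high and y < self.high[0]:
            heapq.heappush(self.low, -y)
        else:
            heapq.heappush(self.high, y)
        self._rebalance()

    def del_med(self):                         # O(log n): remove upper median
        med = heapq.heappop(self.high)
        self._rebalance()
        return med

    def _rebalance(self):
        # keep len(high) in {len(low), len(low) + 1}
        if len(self.low) > len(self.high):
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))
```

The class name and method names here are illustrative, not from the original question; repeated elements and the cross-index bookkeeping would need the extra care described above.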

BUILD-MAX-HEAP running time for array sorted in decreasing order

I know that the running time of BUILD-MAX-HEAP in heap sort is O(n). But if we have an array that is already sorted in decreasing order, why is the running time of BUILD-MAX-HEAP still O(n)?
Isn't it supposed to be something like O(1)? It's already sorted from the maximum value to the minimum value, so we do not need MAX-HEAPIFY.
Is my understanding correct? Could someone please explain it to me?
You are right. It can of course be O(1). When you know for sure that your list is sorted you can use it as your max heap.
The common array implementation of a heap uses these index relations for element positions:
children of i: 2i+1 and 2i+2
parent of i: floor((i-1)/2)
Under these relations, a sorted array already satisfies the heap property (descending for a max-heap, ascending for a min-heap).
Please note that if you need to check first that the list is sorted it is of course still O(n).
EDIT: Heap Sort Complexity
Even though the array might be sorted and building the heap might take only O(1), performing a heap sort will still cost O(n log n).
As said in the comments, heap sort performs n calls to extract-max, each taking O(log n), so we end up with a total time complexity of O(n log n).
In case the array is not sorted, we get a total time complexity of O(n + n log n), which is still O(n log n).
If you know that the array is already sorted in decreasing order, then there's no need to sort it. If you want it in ascending order, you can reverse the array in O(n) time.
If you don't know whether the array is already sorted, then it takes O(n) to determine if it's already reverse sorted.
The reason building a max-heap from a reverse-sorted array is considered O(n) is that you have to start at item n/2 and, working backwards, make sure that each element is not smaller than its children. It's O(n) even though only n/2 elements are checked, because the number of operations performed is proportional to the total number of items.
It's interesting to note, by the way, that you can build a max-heap from a reverse-sorted array faster than you can check the array to see if it's reverse sorted.
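The check described above can be sketched directly: only the first n/2 positions have children, so verifying (or building) the heap touches each parent once, and on a descending array no element ever needs to move.

```python
def is_max_heap(a):
    """Check the max-heap property: each parent >= both of its children.
    Only indices 0 .. n//2 - 1 have children, so this is ~n/2 checks."""
    n = len(a)
    return all(
        a[i] >= a[2 * i + 1] and (2 * i + 2 >= n or a[i] >= a[2 * i + 2])
        for i in range(n // 2)
    )

# A descending array is already a valid max-heap: verifying it takes O(n),
# and no sift-down would ever move an element.
print(is_max_heap([9, 8, 7, 6, 5, 4, 3]))  # True
print(is_max_heap([1, 2, 3]))              # False
```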

What is tree sort useful for?

Tree sort is one of the usual textbook sorting algorithms, where all elements of the list to be sorted are inserted into a binary search tree, and then the tree is traversed to get the elements in order.
Is there any situation where tree sort is preferable over other sorting algorithms that also take O(n log n) time, like quicksort, mergesort, and heapsort?
It doesn't seem very useful since it always takes extra space to store the tree, while those others can be done in-place. And allocating memory for all those tree nodes probably makes it slower too.
One scenario I could figure out is external sorting.
For example, in an external sort you may have k lists of n elements each, kn elements in total.
Consider a scenario where you cannot store all of them in memory at once, i.e. not every element fits in RAM.
In that scenario heap sort or tree sort is useful:
we keep one element from each of the k lists, so our heap will be of size k,
and we can sort by repeatedly taking the minimum (or maximum) element out of the heap.
For more info, check External Sort.
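The scenario above can be sketched with heapq.merge, where each "run" stands in for a sorted file on disk (the run names are illustrative). Only one element per run is held in the heap at a time, i.e. a heap of size k:

```python
import heapq

# Each run stands in for a sorted file on disk; heapq.merge keeps just one
# element per run in its internal heap (size k), which is the point of the
# external-sort scenario: kn elements never need to be in memory at once.
run1 = iter([1, 5, 9])
run2 = iter([2, 6, 10])
run3 = iter([3, 4, 11])

merged = list(heapq.merge(run1, run2, run3))
print(merged)  # [1, 2, 3, 4, 5, 6, 9, 10, 11]
```

In a real external sort the iterators would lazily read from files, so memory usage stays O(k) regardless of the total input size.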

Merging heap arrays with linear complexity

How can I merge two heap arrays into one balanced heap array and still maintain linear complexity? Much of the material I read about merging heaps takes O(n log n).
There is an O(n)-time algorithm (sometimes called "heapify") that, given n total values, creates a max-heap from those values. This earlier answer describes the algorithm and explains its runtime. You can use this algorithm to merge two max-heaps by creating a new array storing all the values from those max-heaps and applying the heapify algorithm to construct a new heap out of them.
If you know you'll be merging heaps frequently, though, there are better data structures than binary heaps. For example, binomial heaps, pairing heaps, and skew heaps all support melding in O(log n) time.
Hope this helps!
We are given two heaps of size N each. Each heap can be represented as an array (see the relation between parent and child indices), so we have two arrays that are heap-ordered. We need to concatenate these two arrays into one by appending all the elements of one array to the end of the other.
So now we have one array of size 2N, but it is not heap-ordered. Making an array heap-ordered takes a linear number of compares and hence linear time (see bottom-up heap construction - Order of building a heap in O(n)).
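The concatenate-then-heapify approach can be sketched in a few lines with Python's heapq (min-heaps here; the max-heap case is symmetric). The function name is illustrative:

```python
import heapq

def merge_heaps(h1, h2):
    """Merge two heap arrays in O(n1 + n2): concatenate, then re-heapify.
    The prior heap order of the inputs is simply ignored."""
    merged = h1 + h2          # O(n1 + n2) copy
    heapq.heapify(merged)     # bottom-up heap construction, O(n1 + n2)
    return merged

a = [1, 3, 5]     # a valid min-heap
b = [2, 4, 6]     # another valid min-heap
m = merge_heaps(a, b)
print(m[0])       # 1, the overall minimum
```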

heap tree - complexity to output a sorted list

There is an unsorted list of numbers, and a heap tree is constructed out of them.
What is the time complexity of outputting a sorted list of numbers from the heap tree that is already constructed?
(Note: the nodes do not need to be removed from the tree to get the current min/max; I am looking for an efficient way to traverse the heap tree and output the sorted list of numbers.)
The same as sorting the list initially - O(n log n). This is because heapifying the list takes O(n) time, and it's impossible to print a sorted sequence out of the heap faster than O(n log n): that would mean any sequence could be sorted faster than O(n log n) (by heapifying and then outputting), which contradicts the comparison-sort lower bound.
This is a duplicate of Print a tree in sorted order using heap properties (Cormen). There is an illustration in the original question of why it is an O(n log n) operation and not O(log n).
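Concretely, even with the O(n) heapify already paid for, producing the sorted sequence still takes n pops of O(log n) each (the function name below is illustrative):

```python
import heapq

def heap_to_sorted(heap):
    """Extract a fully sorted list from an already-built min-heap:
    n pops at O(log n) each, so O(n log n) overall."""
    h = heap[:]                      # don't destroy the caller's heap
    return [heapq.heappop(h) for _ in range(len(h))]

data = [5, 1, 4, 2, 3]
heapq.heapify(data)                  # O(n)
print(heap_to_sorted(data))          # [1, 2, 3, 4, 5]
```

Together these two steps are exactly heapsort, which is why the total cannot beat O(n log n).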
