How can I merge two heap arrays into one balanced heap array and still maintain linear complexity? Much of the material I read about merging heaps takes O(n log n).
There is an O(n)-time algorithm (sometimes called "heapify") that, given n total values, creates a max-heap from those values. This earlier answer describes the algorithm and explains its runtime. You can use this algorithm to merge two max-heaps by creating a new array storing all the values from both max-heaps and applying the heapify algorithm to construct a new heap out of them.
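A minimal sketch of this approach, using Python's heapq (which implements min-heaps; the same O(n) bound applies to max-heaps, e.g. by negating the keys):

```python
import heapq

# Two existing min-heaps (arrays already in heap order).
h1 = [1, 3, 5, 7]
h2 = [2, 4, 6]

merged = h1 + h2       # O(n) concatenation
heapq.heapify(merged)  # O(n) bottom-up heap construction

# Popping everything confirms the result is a valid heap.
print([heapq.heappop(merged) for _ in range(len(merged))])
# -> [1, 2, 3, 4, 5, 6, 7]
```

The same two-step pattern (concatenate, then heapify) works for any array-backed binary heap.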
If you know you'll be merging heaps frequently, though, there are better data structures than binary heaps. For example, binomial heaps, pairing heaps, and skew heaps all support melding in O(log n) time.
Hope this helps!
We are given two heaps of size N each. Each heap can be represented as an array (see the relation between parent and child indices). So we have two heap-ordered arrays. We need to concatenate them into one by appending all the elements of one array to the end of the other.
So now we have one array of size 2N, but it is not heap ordered. Making an array heap ordered takes a linear number of compares and hence takes linear time. (See bottom-up heap construction - Order of building a heap in O(n))
Merging k sorted lists containing a total of n elements using heapq.merge is supposed to have a complexity of O(n * log k). This is not directly stated in the documentation but can be inferred from the Wikipedia article on the direct k-way merge. It also seems fairly intuitive: you create a heap of the k top elements, then repeatedly pop the root of that heap and push the next element from the same list, until the heap (and the lists feeding it) are empty.
What bugs me is that the complexity of this algorithm is higher than that of heapq.heapify if the latter is applied to the same n elements supplied in a single unsorted list. The latter complexity is known to be O(n).
This does not make sense - it should be the other way round. It should be more difficult to heapify n unordered elements than to heapify the same elements as sorted in k lists.
What am I missing here?
Direct k-way merge produces a sorted array from your input of sorted arrays.
Creating a heap from all your n elements in unsorted order produces, well, a heap.
A heap is not a sorted list; in fact, you need to do a lot of work to produce a sorted list from a heap, as discussed in articles about heapsort, which is an O(n log n) sorting algorithm. So creating the heap may be O(n), but its output is different from that of the k-way merge: you may view a heap as weaker than an already sorted array. Thus, it makes sense that this time complexity is smaller than that of the k-way merge.
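A small sketch of the difference in outputs (this only illustrates what each call produces, not the complexities):

```python
import heapq

lists = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Direct k-way merge: O(n log k), and the output is fully sorted.
merged = list(heapq.merge(*lists))
print(merged)        # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# heapify: O(n), but the output is only heap-ordered, not sorted.
flat = [x for lst in lists for x in lst]
heapq.heapify(flat)
print(flat[0])       # the minimum is at the root ...
print(flat)          # ... but the rest of the array need not be sorted
```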
A heap can be constructed from a list in O(n log n) time, because inserting an element into a heap takes O(log n) time and there are n elements.
Similarly, a binary search tree can be constructed from a list in O(n log n) time, because inserting an element into a BST takes O(log n) time on average and there are n elements.
Traversing a heap from min-to-max takes O(n log n) time (because we have to pop n elements, and each pop requires an O(log n) sink operation). Traversing a BST from min-to-max takes O(n) time (literally just an inorder traversal).
So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?
Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.
Construction of a heap can actually be done in O(n) time, not O(n log n) time. This makes heap construction faster than BST construction.
Additionally, arrays can easily be transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't easily be implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require an additional O(n) space allocation to sort, while heaps require only O(1).
All operations on heaps are guaranteed to be O(log n) time. BSTs, unless balanced, may have O(n) operations. Heaps are also dramatically simpler to implement than balanced BSTs.
If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.
There are multiple reasons I can imagine you would want to prefer a (binary) heap over a search tree:
Construction: A binary heap can actually be constructed in O(n) time by applying the heapify operations bottom-up from the smallest to the largest subtrees.
Modification: All operations of the binary heap are rather straightforward:
Inserted an element at the end? Sift it up until the heap condition holds
Swapped the last element to the beginning? Sift it down until the heap condition holds
Changed the key of an entry? Sift it up or down depending on the direction of the change
Conceptual simplicity: Due to its implicit array representation, a binary heap can be implemented by anyone who knows the basic indexing scheme (2i+1, 2i+2 are the children of i) without considering many difficult special cases.
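As an illustration of that simplicity, both sift directions fall out of the indexing scheme directly (a minimal min-heap sketch, not production code; function names are mine):

```python
def sift_up(a, i):
    # Move a[i] up while it is smaller than its parent (min-heap).
    while i > 0:
        parent = (i - 1) // 2
        if a[i] >= a[parent]:
            break
        a[i], a[parent] = a[parent], a[i]
        i = parent

def sift_down(a, i):
    # Move a[i] down while one of its children is smaller (min-heap).
    n = len(a)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):  # children of i
            if c < n and a[c] < a[smallest]:
                smallest = c
        if smallest == i:
            break
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

# Insert: append, then sift up. Delete-min: swap last to root, sift down.
heap = []
for x in [5, 3, 8, 1, 9, 2]:
    heap.append(x)
    sift_up(heap, len(heap) - 1)
print(heap[0])   # 1, the minimum
```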
If you look at these operations in a binary search tree, in theory they are also quite simple, but the tree has to be stored explicitly, e.g. using pointers, and most of the operations require the tree to be rebalanced to preserve the O(log n) height, which requires complicated rotations (red-black trees) or splitting/merging of nodes (B-trees).
EDIT: Storage: As Irleon pointed out, to store a BST you also need more storage, as at least two child pointers need to be stored for every entry in addition to the value itself, which can be a large storage overhead especially for small value types. At the same time, the heap needs no additional pointers.
To answer your question about sorting: a BST takes O(n) time to traverse in-order, but the construction process takes O(n log n) operations which, as mentioned before, are much more complex than heap operations.
At the same time, Heapsort can actually be implemented in-place by building a max-heap from the input array in O(n) time and then repeatedly swapping the maximum element to the back and shrinking the heap. You can think of Heapsort as Selection sort with a helpful data structure that lets you find the next maximum in O(log n) time.
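A minimal in-place Heapsort sketch along these lines (assuming a plain Python list; names are illustrative):

```python
def heapsort(a):
    # In-place heapsort: build a max-heap, then repeatedly swap the
    # maximum to the back and shrink the heap.
    def sift_down(i, n):
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):  # children of i
                if c < n and a[c] > a[largest]:
                    largest = c
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # O(n) bottom-up heapify
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # n-1 swaps, O(log n) sift each
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)

data = [5, 1, 4, 2, 8, 3]
heapsort(data)
print(data)   # [1, 2, 3, 4, 5, 8]
```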
If the sorting method consists of storing the elements in a data structure and afterwards extracting them in sorted order, then, although both approaches (heap and BST) have the same asymptotic complexity O(n log n), the heap tends to be faster. The reason is that a heap is always a perfectly balanced tree and its operations are always O(log n) in a deterministic way, not just on average. With BSTs, insertion and deletion tend to take more time than with a heap, no matter which balancing approach is used. In addition, a heap is usually implemented with an array storing the level-order traversal of the tree, without the need to store any kind of pointers. Thus, if you know the number of elements, which usually is the case, the extra storage required for a heap is less than that used for a BST.
In the case of sorting an array, there is a very important reason why a heap is preferable to a BST: you can use the same array for storing the heap; no additional memory is needed.
Consider two min-heaps H1, H2 of sizes n1 and n2 respectively, such that every node in H2 is greater than every node of H1.
How can I merge these two heaps into one heap H in O(n2) (not O(n^2)..)?
(Assume that the heaps are represented in arrays of size > n1 + n2.)
A heap can be constructed in linear time, see here. This means you need only take all the elements and construct a heap from them to get linear complexity. However, you can use a "more fancy" heap, for instance a leftist heap, and perform the merge operation even faster.
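As a sketch of the leftist-heap option: its merge runs in O(log n1 + log n2). This is a hypothetical minimal min-heap implementation, not tuned code:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.s = 1  # null-path length ("rank"); s(None) counts as 0

def merge(a, b):
    # Merge two leftist min-heaps in O(log n1 + log n2).
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a                     # keep the smaller root on top
    a.right = merge(a.right, b)
    # Restore the leftist property: rank(left) >= rank(right).
    if a.left is None or (a.right is not None and a.left.s < a.right.s):
        a.left, a.right = a.right, a.left
    a.s = 1 + (a.right.s if a.right is not None else 0)
    return a

def insert(root, key):
    return merge(root, Node(key))

def pop_min(root):
    return root.key, merge(root.left, root.right)
```

Insertion and delete-min both reduce to merge, which is why meldable heaps like this one make frequent merging cheap.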
Can someone explain why the following algorithm for merging heaps isn't correct?
Let's say we have two (max) heaps H1 and H2.
To merge them:
create an artificial dummy node whose key value is negative infinity and place it at the root with H1 and H2 attached as children. Then do an O(log n) bubble-down step that eventually swaps the root to a leaf position, where it is ultimately deleted. The resulting structure is a merged heap.
I have seen claims both on wikipedia and elsewhere that merging two equal sized heaps is a Theta(n) operation, in contradiction with what I've written above.
At least as heaps are normally implemented (with the links implicit in the placement of the nodes), a part you seem to be almost ignoring ("with H1 and H2 attached as children") has linear complexity by itself.
As a heap is normally implemented, you have a linear collection (e.g., an array) where each element N has elements 2N and 2N+1 as its children (e.g., with a 1-based array, the children of element 1 are elements 2 and 3, and the children of element 2 are 4 and 5). Therefore, you need to interleave the elements from the two heaps before you get to the starting point for your merge operation.
If you started with explicitly linked binary trees (just following heap-style rules instead of, for example, binary-search-tree ordering) then you'd be right, the merge can be done with logarithmic complexity -- but I doubt the original article intends to refer to that kind of structure.
If you are implementing it as a tree your solution is correct. But as Jerry mentioned, merging array based heaps cannot be done in sub-linear time.
Depending on the frequency and size of the merge I suggest you use a virtual heap. You can implement it as heap of heaps (with arrays). After a few merges you can lazily merge multiple internal heaps into one large heap.
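A hedged sketch of that "heap of heaps" idea using Python's heapq (the class name and API here are made up for illustration): each input heap stays intact, and a small top-level heap is keyed by each sub-heap's current root.

```python
import heapq

class HeapOfHeaps:
    # "Virtual heap": keep each input min-heap intact and maintain a
    # small top-level heap keyed by each sub-heap's current root.
    def __init__(self, heaps):
        self.heaps = [h for h in heaps if h]  # each h is a heapified list
        self.top = [(h[0], i) for i, h in enumerate(self.heaps)]
        heapq.heapify(self.top)

    def meld(self, h):
        # Merging in another heap is O(log k), k = number of sub-heaps.
        if h:
            self.heaps.append(h)
            heapq.heappush(self.top, (h[0], len(self.heaps) - 1))

    def pop(self):
        # O(log k + log |sub-heap|) per extraction.
        root, i = heapq.heappop(self.top)
        h = self.heaps[i]
        heapq.heappop(h)
        if h:
            heapq.heappush(self.top, (h[0], i))
        return root
```

The lazy consolidation mentioned above would amount to occasionally collecting all remaining elements and heapifying them into a single internal heap in O(n).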
Is there an efficient algorithm for merging 2 max-heaps that are stored as arrays?
It depends on what the type of the heap is.
If it's a standard binary heap, where every node has up to two children and which is filled so that the leaves are on at most two different rows, you cannot do better than O(n) for a merge.
Just put the two arrays together and create a new heap out of them which takes O(n).
For better merging performance, you could use another heap variant like a Fibonacci-Heap which can merge in O(1) amortized.
Update:
Note that it is worse to insert all elements of the first heap one by one into the second heap, or vice versa, since each insertion takes O(log(n)), giving O(n log(n)) in total.
As your comment states, you don't seem to know how the heap is optimally built in the beginning (again, for a standard binary heap):
Create an array and put in the elements of both heaps in some arbitrary order
now start at the lowest level. The lowest level contains trivial max-heaps of size 1, so this level is done
move a level up. When the heap condition of one of the "sub-heaps" is violated, swap the root of the "sub-heap" with its bigger child. Afterwards, level 2 is done
move to level 3. When the heap condition is violated, proceed as before: swap the violating node down with its bigger child, recursively, until everything matches up to level 3
...
when you reach the top, you created a new heap in O(n).
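The steps above can be sketched as follows (a minimal max-heap version for merging two heap arrays; the function name is mine):

```python
def merge_max_heaps(h1, h2):
    # Concatenate the two heap arrays, then heapify bottom-up: O(n).
    a = h1 + h2
    n = len(a)

    def sift_down(i):
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):  # children of i
                if c < n and a[c] > a[largest]:
                    largest = c
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Leaves (the lowest level) are trivial heaps already; start one
    # level up and work toward the root, exactly as in the steps above.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)
    return a
```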
I omit a proof here, but intuitively this works because you do most of the work on the bottom levels, where little content has to be swapped to re-establish the heap condition; you operate on many small "sub-heaps". This is much better than inserting every element into one of the heaps, since then you would operate on the whole heap every time, which takes O(log n) per insertion and O(n log n) in total.
Update 2: A binomial heap allows merging in O(log(n)) and would conform to your O(log(n)^2) requirement.
Two binary heaps of sizes n and k can be merged in O(log n * log k) comparisons. See
Jörg-R. Sack and Thomas Strothotte, An algorithm for merging heaps, Acta Informatica 22 (1985), 172-186.
I think what you're looking for in this case is a Binomial Heap.
A binomial heap is a collection of binomial trees and a member of the mergeable-heap family. The worst-case running time for a union (merge) of two or more binomial heaps with n total items is O(lg n).
See http://en.wikipedia.org/wiki/Binomial_heap for more information.