Consider a binary max-heap with n elements. Its height is O(log n). When new elements are inserted into the heap, they are moved through the heap so that the max-heap property always holds.
The new element is added as a child on the last level. After the insertion, however, the max-heap property may be violated, so the heapify method (sifting the new element up) is used to restore it. This has a time complexity of O(log n), i.e., the height of the heap.
But can we make it even more efficient?
When many inserts and deletes are performed, this procedure slows things down. Also, it is a strict requirement that the heap remain a max-heap after every insertion.
The objective is to reduce the time complexity of the heapify method. This is possible only if the number of comparisons is reduced.
Regarding "The objective is to reduce the time complexity of the heapify method": that is a pity, because it is impossible. In contrast, it is possible to reduce the time complexity of multiple inserts and deletes:
Imagine not inserting into the n-item heap immediately, but building an auxiliary one (or even a list) instead.
On delete (extract?), if k << n, place one item from the auxiliary structure (now of size k) "in the spot emptied" and do a sift-down or sift-up as required.
If the auxiliary data structure is not significantly smaller than the main one, merge them.
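The buffering idea can be sketched in Python as follows. This is a minimal, hypothetical sketch: the class name `BufferedMaxHeap`, the use of a heap (rather than a plain list) for the auxiliary structure, and the merge threshold `ratio` are all assumptions. It uses the standard `heapq` module, which is a min-heap, so values are stored negated. `heapq.heapreplace` pops the root and pushes a new item with a single sift-down, which corresponds to placing an auxiliary item "in the spot emptied".

```python
import heapq


class BufferedMaxHeap:
    """Hypothetical sketch: a large main max-heap plus a small auxiliary max-heap."""

    def __init__(self, ratio=4):
        self.main = []       # large heap, values negated (heapq is a min-heap)
        self.aux = []        # small buffer of recent inserts, values negated
        self.ratio = ratio   # assumed threshold: merge once len(aux) * ratio > len(main)

    def insert(self, x):
        # Cheap insert into the small heap: O(log k) with k = len(aux).
        heapq.heappush(self.aux, -x)
        if len(self.aux) * self.ratio > len(self.main):
            # Auxiliary heap is no longer much smaller: merge in O(n + k).
            self.main.extend(self.aux)
            heapq.heapify(self.main)
            self.aux = []

    def extract_max(self):
        if not self.main and not self.aux:
            raise IndexError("extract from empty heap")
        if not self.main or (self.aux and self.aux[0] < self.main[0]):
            # The overall maximum is in the auxiliary heap.
            return -heapq.heappop(self.aux)
        if self.aux:
            # Fill the emptied root with an auxiliary item, then sift it down.
            top = heapq.heapreplace(self.main, heapq.heappop(self.aux))
        else:
            top = heapq.heappop(self.main)
        return -top
```

Because the item moved into the main heap is never larger than the main root it replaces, both heaps stay valid and `extract_max` always returns the maximum of their union.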
Such ponderings lead to advanced heaps like Fibonacci, pairing, Brodal…
The time complexity of the insert operation in a heap is dependent on the number of comparisons that are made. One can imagine using some overhead to implement a smart binary search along the leaf-to-root path.
However, the time complexity is not determined only by the number of comparisons. It is determined by all the work that must be performed, and in this case the number of writes is also O(log n), and that number of writes cannot be reduced.
The number of nodes whose values need to change during the insert operation is O(log n). Reducing the number of comparisons is not enough to reduce the complexity.
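To illustrate why fewer comparisons do not help, here is a hypothetical sketch (`insert_binary_search` is an illustrative name) of the binary search along the leaf-to-root path. The ancestors of the new leaf are already in non-increasing order from the root, so a binary search finds the landing spot with O(log log n) comparisons, but the elements on the path below that spot still have to be shifted down one level each, which is O(log n) writes.

```python
def insert_binary_search(heap, x):
    """Insert x into a list-based max-heap; return the number of comparisons.

    Hypothetical sketch: binary search on the leaf-to-root path.
    """
    heap.append(x)                 # new leaf at the end of the array
    path = []                      # indices from the new leaf up to the root
    i = len(heap) - 1
    while True:
        path.append(i)
        if i == 0:
            break
        i = (i - 1) // 2
    path.reverse()                 # root ... leaf; ancestor values are non-increasing

    # Find the first ancestor smaller than x. The leaf slot (last index) is the
    # fallback and is never probed, because mid < hi inside the loop.
    lo, hi = 0, len(path) - 1
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if heap[path[mid]] < x:
            hi = mid
        else:
            lo = mid + 1

    # Shift every element from the landing spot down one level: O(log n) writes.
    for j in range(len(path) - 1, lo, -1):
        heap[path[j]] = heap[path[j - 1]]
    heap[path[lo]] = x
    return comparisons
```

With about 1000 elements the path has 10 nodes, so at most 4 comparisons are made, yet up to 9 array writes may still be needed.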
Is there a data structure whose elements can be indexed and whose insertion runtime is O(1)? For example, I could index the data structure like a[4], and yet inserting an element at an arbitrary place in the data structure would take O(1) time. Note that the data structure does not maintain sorted order, just the ability for each sequential element to have an index.
I don't think it's possible, since inserting somewhere other than the end or beginning of the ordered data structure would mean that all the indices after the insertion point must be updated to reflect that they have increased by 1, which would take O(n) time in the worst case. If the answer is no, could someone prove it mathematically?
EDIT:
To clarify, I want to maintain the order of insertion of elements, so upon inserting, the item inserted remains sequentially between the two elements it was placed between.
The problem that you are looking to solve is called the list labeling problem.
There are lower bounds on the cost that depend on the relationship between the maximum number of labels you need (n) and the number of possible labels (m).
If n is in O(log m), i.e., if the number of possible labels is exponential in the number of labels you need at any one time, then O(1) cost per operation is achievable... but this is not the usual case.
If n is in O(m), i.e., if they are proportional, then O(log² n) per operation is the best you can do, and the algorithm is complicated.
If n is in O(√m), i.e., if the number of possible labels is at least quadratic in the number of labels you need, then you can do O(log n). Amortized O(log n) is simple, and O(log n) worst case is hard. Both algorithms are described in this paper by Dietz and Sleator. The hard way makes use of the O(log² n) algorithm mentioned above.
HOWEVER, maybe you don't really need labels. If you just need to be able to compare the order of two items in the collection, then you are solving a slightly different problem called "list order maintenance". This problem can actually be solved in constant time -- O(1) cost per operation and O(1) cost to compare the order of two items -- although again O(1) amortized cost is a lot easier to achieve.
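For intuition about labels, here is a deliberately naive list-labeling sketch, assuming integer labels drawn from [0, m). The class and method names are hypothetical. Each insert takes the midpoint of its neighbours' labels, and when no free label remains, every label is reassigned evenly in O(n). Comparing the order of two items is then a single O(1) label comparison. This is not the Dietz and Sleator algorithm and does not achieve the bounds discussed in the answer; it only shows what "labels" and "order queries" mean.

```python
class Node:
    """One list element together with its integer order label."""

    def __init__(self, value, label):
        self.value = value
        self.label = label
        self.prev = None
        self.next = None


class NaiveOrderList:
    """Hypothetical sketch: naive list labeling with labels in [0, m)."""

    def __init__(self, m=2**32):
        self.m = m          # assumed size of the label universe
        self.head = None
        self.size = 0

    def insert_after(self, node, value):
        """Insert value right after node, or at the front if node is None."""
        left = node.label if node is not None else -1
        right_node = node.next if node is not None else self.head
        right = right_node.label if right_node is not None else self.m
        if right - left < 2:
            self._relabel()  # no free label between the neighbours
            return self.insert_after(node, value)
        new = Node(value, (left + right) // 2)  # strictly between left and right
        new.prev, new.next = node, right_node
        if node is not None:
            node.next = new
        else:
            self.head = new
        if right_node is not None:
            right_node.prev = new
        self.size += 1
        return new

    def _relabel(self):
        # Spread all labels evenly; O(n). A gap of at least 2 leaves room for one insert.
        gap = self.m // (self.size + 1)
        if gap < 2:
            raise OverflowError("label universe too small for this many items")
        cur, i = self.head, 1
        while cur is not None:
            cur.label = i * gap
            cur, i = cur.next, i + 1

    @staticmethod
    def precedes(a, b):
        """O(1) order query: does node a come before node b?"""
        return a.label < b.label
```

Repeatedly inserting at the same spot exhausts the gaps quickly, which is exactly the case the cleverer relabeling schemes are designed to handle.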
When inserting into slot i, append the element which was first at slot i to the end of the sequence.
If the sequence capacity must be grown, then this growing may not necessarily be O(1).
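A minimal sketch of this trick on a Python list (the function name is hypothetical). Appending to a Python list is amortized O(1), which matches the caveat about growing the capacity. Note that the displaced element moves to the end, so the relative order of the existing elements is not preserved.

```python
def insert_displacing(seq, i, x):
    """Put x at index i in amortized O(1) by moving the old seq[i] to the end."""
    if not 0 <= i <= len(seq):
        raise IndexError("insert position out of range")
    if i == len(seq):
        seq.append(x)
    else:
        seq.append(seq[i])  # amortized O(1); occasionally the list is reallocated
        seq[i] = x
```

For example, inserting 9 at index 1 of `[1, 2, 3]` yields `[1, 9, 3, 2]`.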
It seems I'm missing something very simple: what are the advantages of a Binary Heap for a Priority Queue compared, say, with a quick-sorted array of values? In both cases we keep values in an array, insert is O(logN), and delete-max is O(1). Initial construction from a given array of elements is O(NlogN) in both cases, though the link http://en.wikipedia.org/wiki/Heap_%28data_structure%29 suggests the faster Floyd's algorithm for Binary Heap construction. But in the case of a queue, the elements are probably received one by one, so this advantage disappears. Also, merge seems to perform better for a Binary Heap.
So what are the reasons to prefer a BH besides merge? Maybe my assumption is wrong, and the BH is used only for study purposes. I checked the C++ docs; they mention "a heap", but of course that does not necessarily mean a Binary heap.
Somewhat similar question: When is it a bad idea to use a heap for a Priority Queue?
The major advantage of the binary heap is that you can add new values to it efficiently after initially constructing it. Suppose you want to back a priority queue with a sorted array. If all the values in the queue are known in advance, you can just sort the values, as you've mentioned. But what happens when you then want to add a new value to the priority queue? This might take time Θ(n) in the worst case, because you'd have to shift the array elements to make space for the new element. On the other hand, insertion into a binary heap takes time O(log n), which is exponentially faster.
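The difference in insertion cost can be sketched with two small Python classes (both names are hypothetical). `bisect.insort` locates the slot with a binary search, but inserting into the list still shifts the later elements, so it is Θ(n) in the worst case; `heapq.heappush` only sifts along one leaf-to-root path.

```python
import bisect
import heapq


class SortedArrayPQ:
    """Max-priority queue backed by an ascending Python list."""

    def __init__(self):
        self.items = []

    def push(self, x):
        # O(log n) comparisons to find the slot, but up to n elements are shifted.
        bisect.insort(self.items, x)

    def pop_max(self):
        return self.items.pop()  # largest element sits at the end: O(1)


class HeapPQ:
    """Max-priority queue on heapq (a min-heap), storing negated values."""

    def __init__(self):
        self.items = []

    def push(self, x):
        heapq.heappush(self.items, -x)  # O(log n) sift-up

    def pop_max(self):
        return -heapq.heappop(self.items)  # O(log n) sift-down
```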
Another reason you'd use a heap over a sorted array is if you only need to dequeue a few elements. As you mentioned, sorting an array takes time O(n log n), but using clever algorithms you can build a heap in time O(n). If you need to build a priority queue and dequeue k elements from it, where k is not known in advance, the runtime with a sorted array is O(n log n + k) and with a binary heap is O(n + k log n). For small k, the second approach is much faster.
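A sketch of the two approaches for dequeuing the k largest values (function names hypothetical). `heapq.heapify` builds a heap in linear time; values are negated because `heapq` is a min-heap.

```python
import heapq


def top_k_with_heap(values, k):
    # O(n) heap construction + k pops at O(log n) each: O(n + k log n).
    heap = [-v for v in values]
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]


def top_k_with_sort(values, k):
    # O(n log n) sort up front, then O(1) per dequeued element: O(n log n + k).
    ordered = sorted(values, reverse=True)
    return ordered[:k]
```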
Is it possible for a priority queue to have both O(1) insertion and removal?
Priority queues can be implemented using heaps, and looking at the run times for Fibonacci heaps, it appears that it is not possible to get a run time better than O(log N) per removal.
I am trying to implement a data structure where given N items I will have half in a max-priority queue and half in a min-priority queue. I am then to remove all N items sequentially.
I can insert all N elements in O(N) time but removing all N items will take O(N*logN) so I am wondering if another approach would be more suitable.
If you could construct a priority queue with O(1) insertion and O(1) removal, you could use it to sort a list of n items in O(n) time. As explained in this answer, you can't sort in O(n) in the general case (comparison sorting needs Ω(n log n) comparisons), so it is impossible to construct a priority queue with O(1) insertion and O(1) removal without making more assumptions about the input.
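The reduction can be sketched as follows, using `heapq` as the priority queue (`pq_sort` is a hypothetical name). Any comparison-based priority queue could be substituted; if both of its operations were O(1), this function would be an O(n) comparison sort.

```python
import heapq


def pq_sort(values):
    """Sort via n insertions followed by n removals from a priority queue."""
    pq = []
    for v in values:
        heapq.heappush(pq, v)  # O(log n) here
    return [heapq.heappop(pq) for _ in range(len(pq))]  # O(log n) each
```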
For example, a priority queue that has O(1) insertion and O(k) (k is the maximum element that could be inserted) removal can be constructed. Keep a table of k linked lists. Insertion of x just prepends an item to the front of the xth list. Removal has to scan through the table to find the first non-empty list (then remove the first item of the list and return the index of that list). There are only k lists, so removal takes O(k) time. If k is a constant, that works out to O(1) removal.
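A sketch of that table-of-lists queue, assuming integer keys from 0 up to the maximum key k (hence k + 1 lists) and an optional payload per item; the class name and the payload argument are additions for illustration. `collections.deque` gives O(1) prepend and removal from the front.

```python
from collections import deque


class BucketPQ:
    """Min-priority queue for integer keys 0..max_key (hypothetical sketch)."""

    def __init__(self, max_key):
        self.buckets = [deque() for _ in range(max_key + 1)]

    def insert(self, x, item=None):
        self.buckets[x].appendleft(item)  # prepend to the x-th list: O(1)

    def remove_min(self):
        # Scan for the first non-empty list: O(k).
        for x, bucket in enumerate(self.buckets):
            if bucket:
                return x, bucket.popleft()
        raise IndexError("remove from empty priority queue")
```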
In practice, using a table of counts would work out better. Incrementing a variable-length integer isn't constant time unless you use amortized analysis (which is why I didn't use it in the previous paragraph), but in practice you wouldn't need variable-length counts anyway. Also, in practice it would be bad for large k, even if k is a constant - you'd run out of memory quickly and scanning for the first non-zero element could take a while.
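The table-of-counts variant is even simpler; here is a sketch with a hypothetical class name. Python integers are arbitrary-precision (variable-length), but for realistic counts an increment is effectively constant time.

```python
class CountingPQ:
    """Min-priority queue over integer keys 0..max_key using a table of counts."""

    def __init__(self, max_key):
        self.counts = [0] * (max_key + 1)

    def insert(self, x):
        self.counts[x] += 1

    def remove_min(self):
        # Scan for the first non-zero count: O(k).
        for x, c in enumerate(self.counts):
            if c:
                self.counts[x] -= 1
                return x
        raise IndexError("remove from empty priority queue")
```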
Is there one type of set-like data structure supporting merging in O(log n) time and k-th element search in O(log n) time? n is the size of this set.
You might try a Fibonacci heap, which does merge in constant amortized time and decrease-key in constant amortized time. Most of the time, such a heap is used for operations where you repeatedly pull the minimum value, so a check-for-membership function isn't implemented. However, it is simple enough to add one using the decrease-key logic, just removing the decrease portion.
If k is a constant, then any meldable heap will do this, including leftist heaps, skew heaps, pairing heaps and Fibonacci heaps. Both merging and getting the first element in these structures typically take O(1) or O(lg n) amortized time, so O(k lg n) at most.
Note, however, that getting to the k'th element may be destructive in the sense that the first k-1 items may have to be removed from the heap.
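A leftist heap is one of the meldable heaps listed above; here is a minimal min-heap sketch in Python (all names hypothetical). `meld` only recurses down right spines, whose length is O(log n), and `kth_smallest` is destructive in exactly the sense described: it pops the k-1 smaller items first, so the total cost is O(k log n). It assumes 1 <= k <= the heap size.

```python
class LNode:
    """Node of a leftist min-heap; rank = length of the right spine."""

    __slots__ = ("key", "left", "right", "rank")

    def __init__(self, key):
        self.key, self.left, self.right, self.rank = key, None, None, 1


def _rank(h):
    return h.rank if h is not None else 0


def meld(a, b):
    """Meld two leftist heaps in O(log n): recursion only walks right spines."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.right = meld(a.right, b)
    if _rank(a.left) < _rank(a.right):
        a.left, a.right = a.right, a.left  # restore the leftist property
    a.rank = _rank(a.right) + 1
    return a


def insert(h, key):
    return meld(h, LNode(key))


def pop_min(h):
    return h.key, meld(h.left, h.right)


def kth_smallest(h, k):
    """Return (k-th smallest key, remaining heap). Destructive: pops k-1 items."""
    for _ in range(k - 1):
        _, h = pop_min(h)
    return h.key, h
```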
If you're willing to accept amortization, you could achieve the desired bounds of O(lg n) time for both meld and search by using a binary search tree to represent each set. Melding two trees of size m and n together requires time O(m log(n / m)) where m < n. If you use amortized analysis and charge the cost of the merge to the elements of the smaller set, at most O(lg n) is charged to each element over the course of all of the operations. Selecting the kth element of each set takes O(lg n) time as well.
I think you could also use a collection of sorted arrays to represent each set, but the amortization argument is a little trickier.
As stated in the other answers, you can use heaps, but getting O(lg n) for both meld and select requires some work.
Finger trees can do this and some more operations:
http://en.wikipedia.org/wiki/Finger_tree
There may be something even better if you are not restricted to purely functional data structures (i.e., "persistent" ones, where this means not "backed up on non-volatile disk storage" but "all previous 'versions' of the data structure remain available even after 'adding' additional elements").
Is there an efficient algorithm for merging 2 max-heaps that are stored as arrays?
It depends on what the type of the heap is.
If it's a standard heap, where every node has up to two children and which is filled so that the leaves are on at most two different rows, you cannot do better than O(n) for a merge.
Just put the two arrays together and create a new heap out of them which takes O(n).
For better merging performance, you could use another heap variant like a Fibonacci heap, which can merge in O(1) amortized.
Update:
Note that inserting all elements of the first heap one by one into the second heap (or vice versa) is worse, since each insertion takes O(log(n)).
As your comment states, you don't seem to know how the heap is optimally built in the first place (again for a standard binary heap):
Create an array and put the elements of both heaps into it in some arbitrary order.
Now start at the lowest level. The lowest level contains trivial max-heaps of size 1, so this level is done.
Move a level up. Whenever the heap condition of one of the "sub-heaps" is violated, swap the root of that "sub-heap" with its bigger child. Afterwards, level 2 is done.
Move to level 3. Whenever the heap condition is violated, proceed as before: swap the element down with its bigger child and continue recursively until everything up to level 3 satisfies the heap condition.
...
When you reach the top, you have created a new heap in O(n).
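The bottom-up construction, applied to the concatenation of two array max-heaps, can be sketched like this (function names hypothetical). Processing indices from n // 2 - 1 down to 0 handles every node after all of its descendants, which is what the level-by-level description requires; nodes at index n // 2 and above are the trivial size-1 heaps.

```python
def sift_down(a, i, n):
    """Move a[i] down until the max-heap property holds for the subtree in a[0:n]."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]  # swap with the bigger child
        i = largest


def merge_max_heaps(h1, h2):
    """Concatenate two array max-heaps and rebuild bottom-up in O(n + k)."""
    a = h1 + h2
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    return a
```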
I omit a proof here, but the intuition is that most of the work happens on the bottom levels, where little content has to be swapped to re-establish the heap condition. You operate on much smaller "sub-heaps", which is much better than inserting every element into one of the heaps: then you would operate on the whole heap every time, which costs O(log n) per insertion and O(n log n) in total.
Update 2: A binomial heap allows merging in O(log(n)) and would conform to your O(log(n)^2) requirement.
Two binary heaps of sizes n and k can be merged in O(log n * log k) comparisons. See
Jörg-R. Sack and Thomas Strothotte, An algorithm for merging heaps, Acta Informatica 22 (1985), 172-186.
I think what you're looking for in this case is a Binomial Heap.
A binomial heap is a collection of binomial trees and a member of the mergeable-heap family. The worst-case running time for a union (merge) of two binomial heaps with n total items is O(lg n).
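A compact binomial-heap sketch, assuming min-ordering and a representation where slot r of a Python list holds the tree of order r or None (all names hypothetical). Union then works like binary addition with a carry, touching O(log n) slots. The trees are reused and mutated, so the input heaps should not be used after a union.

```python
class BTree:
    """Binomial tree node; a tree of order r has children of orders 0..r-1."""

    __slots__ = ("key", "children")

    def __init__(self, key):
        self.key = key
        self.children = []


def _link(a, b):
    # Combine two min-ordered trees of equal order r into one of order r + 1.
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)  # b becomes child number r, keeping children in order
    return a


def union(h1, h2):
    """Union of two binomial heaps in O(log n), like binary addition."""
    result, carry = [], None
    for r in range(max(len(h1), len(h2)) + 1):
        trees = [t for t in (h1[r] if r < len(h1) else None,
                             h2[r] if r < len(h2) else None,
                             carry) if t is not None]
        carry = None
        if len(trees) == 1:
            result.append(trees[0])
        elif len(trees) == 2:
            result.append(None)
            carry = _link(trees[0], trees[1])
        elif len(trees) == 3:
            result.append(trees[0])
            carry = _link(trees[1], trees[2])
        else:
            result.append(None)
    while result and result[-1] is None:
        result.pop()  # trim unused high orders
    return result


def insert(h, key):
    return union(h, [BTree(key)])


def extract_min(h):
    """Remove the minimum root; its children already form a heap in slot order."""
    r = min((i for i, t in enumerate(h) if t is not None), key=lambda i: h[i].key)
    root = h[r]
    rest = h[:r] + [None] + h[r + 1:]
    return root.key, union(rest, root.children)
```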
See http://en.wikipedia.org/wiki/Binomial_heap for more information.