Is it possible for a priority queue to have both O(1) insertion and removal?
Priority queues can be implemented using heaps, and looking at the run times for Fibonacci heaps it appears that it is not possible to get a run time better than O(log N) per removal.
I am trying to implement a data structure where given N items I will have half in a max-priority queue and half in a min-priority queue. I am then to remove all N items sequentially.
I can insert all N elements in O(N) time, but removing all N items will take O(N log N), so I am wondering if another approach would be more suitable.
If you could construct a priority queue with O(1) insertion and O(1) removal, you could use it to sort a list of n items in O(n) time. As explained in this answer, you can't sort in O(n) in the general case, so it is impossible to construct a priority queue with O(1) insertion and O(1) removal without making more assumptions about the input.
For example, a priority queue with O(1) insertion and O(k) removal (where k is the maximum priority that can be inserted) can be constructed. Keep a table of k linked lists. Insertion of x just prepends an item to the front of the xth list. Removal has to scan through the table to find the first non-empty list (then remove the first item of that list and return the index of the list). There are only k lists, so removal takes O(k) time. If k is a constant, that works out to O(1) removal.
In practice, using a table of counts would work out better. Incrementing a variable-length integer isn't constant time unless you use amortized analysis (which is why I didn't use counts in the previous paragraph), but in practice you wouldn't need variable-length counts anyway. Also, in practice this is bad for large k, even if k is a constant: you'd run out of memory quickly, and scanning for the first non-zero element could take a while.
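A minimal sketch of that bucket scheme, in Java for concreteness; the class and method names are illustrative, and priorities are assumed to be integers in [0, k):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Bucket-based priority queue: O(1) insert, O(k) removeMin.
    class BucketQueue {
        private final Deque<Integer>[] buckets; // buckets[p] holds items with priority p

        @SuppressWarnings("unchecked")
        BucketQueue(int k) {
            buckets = new Deque[k];
            for (int i = 0; i < k; i++) buckets[i] = new ArrayDeque<>();
        }

        // O(1): prepend the item to the list for its priority.
        void insert(int priority, int item) {
            buckets[priority].addFirst(item);
        }

        // O(k): scan for the first non-empty list, pop its head,
        // and return that list's index (the minimum priority); -1 if empty.
        int removeMin() {
            for (int p = 0; p < buckets.length; p++) {
                if (!buckets[p].isEmpty()) {
                    buckets[p].removeFirst();
                    return p;
                }
            }
            return -1;
        }
    }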
Consider a binary max-heap with n elements. It will have a height of O(log n). When new elements are inserted into the heap, they are propagated within the heap so that the max-heap property is always satisfied.
The new element is added as a child on the last level. But after insertion, the max-heap property may be violated, so the heapify method is used. This has a time complexity of O(log n), i.e. the height of the heap.
But can we make it even more efficient?
When many inserts and deletes are performed, this procedure makes things slow. It is also a strict requirement that the heap be a max-heap after every insertion.
The objective is to reduce the time complexity of the heapify method. This is possible only if the number of comparisons is reduced.
The objective is to reduce the time complexity of the heapify method.
That is a pity, because that is impossible, in contrast to reducing the time complexity of multiple inserts and deletes:
Imagine not inserting into the n-item heap immediately, but building an auxiliary one (or even a list).
On delete (extract?), place one item from the auxiliary structure (now at size k) "in the spot emptied" and do a sift-down or sift-up as required; this stays cheap as long as k << n.
If the auxiliary data structure is not significantly smaller than the main one, merge them.
Such ponderings lead to advanced heaps like Fibonacci, pairing, Brodal…
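A rough sketch of that buffering idea, assuming a min-queue built on java.util.PriorityQueue; the merge threshold (keep the auxiliary heap around sqrt(n)) is an arbitrary illustrative choice, not a tuned design:

    import java.util.PriorityQueue;

    // Main heap plus a small auxiliary heap that absorbs inserts.
    class BufferedMinQueue {
        private final PriorityQueue<Integer> main = new PriorityQueue<>();
        private final PriorityQueue<Integer> aux = new PriorityQueue<>();

        void insert(int x) {
            aux.add(x); // O(log k) on the small auxiliary heap
            if (aux.size() * aux.size() > main.size()) mergeAux(); // keep k << n
        }

        Integer extractMin() {
            if (aux.isEmpty()) return main.poll();
            if (main.isEmpty()) return aux.poll();
            // The true minimum is at the top of one of the two heaps.
            return aux.peek() <= main.peek() ? aux.poll() : main.poll();
        }

        private void mergeAux() {
            main.addAll(aux); // merge once aux is no longer "small"
            aux.clear();
        }
    }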
The time complexity of the insert operation in a heap depends on the number of comparisons that are made. One could imagine using some overhead to implement a smart binary search along the leaf-to-root path.
However, the time complexity is not determined by comparisons alone. Time complexity is determined by all the work that must be performed, and in this case the number of writes is also O(log n), and that number of writes cannot be reduced.
The insert operation needs to change the values of O(log n) nodes. Reducing the number of comparisons alone is not enough to reduce the complexity.
Max-heaps are used for priority queues because of the cheap extraction of the max element.
However, please bear with me.
Shouldn't we just search for the max element in O(N) time?
I know that extract-max requires just O(log N) time, but before we can do that we need to build a heap, which itself requires O(N) time.
So why do we go through all the complexity of even implementing a heap?
Also, some might say that heaps have an advantage for repeated extract-max operations.
But let's say we perform k search operations: by linear search we get O(kN) == O(N) (for constant k), which is the same as the heap's O(N + k) == O(N).
If we perform N extract-max operations we get O(N log N), which is better than the (N*N) == (N^2) of repeated search operations.
But there too we could sort an array in O(N log N) and then do the N extractions in O(1) time each ==> O(N log N) + O(N).
So my doubt is: do we really need heaps, when we can replace the functionality of a heap with a similar procedure, if not a better one?
What am I missing, and what are heaps really used for?
Forgive my ignorance and bad grammar. Not a native speaker, sorry :(
You can use a heap to sort that array in O(n log n) time in the worst case (unlike Quicksort, unless you implement a complicated pivot selection procedure that's not really practical) and without additional space (unlike Mergesort, unless you implement a complicated in-place merge that is not practical at all).
Heaps really shine when you intermix insertion and extraction though (e.g., Dijkstra's algorithm, Prim's algorithm).
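To make the sorting claim concrete, here is a minimal in-place heapsort sketch in Java: O(n log n) in the worst case and no auxiliary array:

    // Build a max-heap bottom-up (Floyd's O(n) method), then repeatedly
    // swap the max to the end of the shrinking heap region.
    class HeapSort {
        static void sort(int[] a) {
            int n = a.length;
            for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);
            for (int end = n - 1; end > 0; end--) {
                int t = a[0]; a[0] = a[end]; a[end] = t;
                siftDown(a, 0, end);
            }
        }

        private static void siftDown(int[] a, int i, int n) {
            while (2 * i + 1 < n) {
                int c = 2 * i + 1;                     // left child
                if (c + 1 < n && a[c + 1] > a[c]) c++; // pick the larger child
                if (a[i] >= a[c]) return;              // heap property holds
                int t = a[i]; a[i] = a[c]; a[c] = t;
                i = c;
            }
        }
    }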
Consider a scenario where you mix N inserts and N extractions.
For a heap you get O(N log N) total steps.
For a naive approach (scan everything for the max on each extraction) you get O(N^2) total steps.
For a sort-based approach (upon insertion add elements at the end; upon query, re-sort) you also get O(N^2) total steps, assuming an adaptive sort that handles an almost-sorted array in O(N).
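A tiny illustrative driver for that mixed workload, using java.util.PriorityQueue; the element count and interleaving pattern are arbitrary:

    import java.util.PriorityQueue;
    import java.util.Random;

    // N interleaved inserts and extractions: O(log N) per step with a heap.
    public class MixedOps {
        public static void main(String[] args) {
            int n = 1_000_000;
            PriorityQueue<Integer> pq = new PriorityQueue<>();
            Random rnd = new Random(42);
            for (int i = 0; i < n; i++) {
                pq.add(rnd.nextInt());      // insert: O(log N)
                if (i % 2 == 1) pq.poll();  // extraction interleaved with inserts
            }
            System.out.println("remaining: " + pq.size());
        }
    }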
Think of how a priority queue is used in the real world. You add some things, take some away, add some more, extract some, etc. With a heap, both Add and Remove are, in the worst case, O(log n). With a list, either Add is O(1) or Remove is O(1). The other is O(n). Typically you would want Add to be O(n) so that the list is always sorted and the Peek operation can be O(1).
So, given a sequence of Add and Remove operations, when there are already 1,000 items in the heap:
    Operation   Heap     List
    ---------   -----    ----
    Add         log n    n
    Add         log n    n
    Remove      log n    1
    Add         log n    n
    Add         log n    n
    Add         log n    n
    Remove      log n    1
    Remove      log n    1
    Remove      log n    1
    Remove      log n    1
That's 10*log(n) for the heap, and 5n + 5 for the list. log(n) of 1,000 is about 10. So you're talking on the order of 100 operations for a heap. For the list, you're talking on the order of 5,000 operations. So the heap would be 50 times as fast. If you have a million items in the list, you're talking about 200 operations for a heap, and 5 million operations for a list.
If you just want to go through a bunch of items in-order, then it doesn't really make sense to use a priority queue. A sorted list works just fine, and will likely be faster than building a priority queue and pulling items off one-by-one. (Although you might end up using a heap sort to sort the items in the first place.)
Use the right tool for the job.
It seems I'm missing something very simple: what are the advantages of a Binary Heap for a Priority Queue compared with, say, a quick-sorted array of values? In both cases we keep values in an array, insert is O(log N), and delete-max is O(1) in both cases. Initial construction out of a given array of elements is O(N log N) in both cases, though the link http://en.wikipedia.org/wiki/Heap_%28data_structure%29 suggests the faster Floyd's algorithm for Binary Heap construction. But in the case of a queue the elements are probably received one by one, so this advantage disappears. Also, merge seems to perform better for a Binary Heap.
So what are the reasons to prefer a BH besides merge? Maybe my assumption is wrong, and the BH is used only for study purposes. I checked the C++ docs; they mention "a heap", but of course that does not necessarily mean a Binary Heap.
Somewhat similar question: When is it a bad idea to use a heap for a Priority Queue?
The major advantage of the binary heap is that you can add new values to it efficiently after initially constructing it. Suppose you want to back a priority queue with a sorted array. If all the values in the queue are known in advance, you can just sort the values, as you've mentioned. But what happens when you then want to add a new value to the priority queue? This might take time Θ(n) in the worst case, because you'd have to shift over all the array elements to make space for the element you just added. On the other hand, insertion into a binary heap takes time O(log n), which is exponentially faster.
Another reason you'd use a heap over a sorted array is if you only need to dequeue a few elements. As you mentioned, sorting an array takes time O(n log n), but using clever algorithms you can build a heap in time O(n). If you need to build a priority queue and dequeue k elements from it, where k is unknown in advance, the runtime with a sorted array is O(n log n + k) and with a binary heap is O(n + k log n). For small k, the second algorithm is much faster.
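As an illustration of that second pattern: OpenJDK's PriorityQueue(Collection) constructor heapifies its backing array in O(n), so building and then dequeuing only k elements looks like this (a sketch; smallestK is an illustrative name and k <= n is assumed):

    import java.util.Arrays;
    import java.util.List;
    import java.util.PriorityQueue;

    public class TopK {
        // O(n) build + O(k log n) for the k dequeues.
        static int[] smallestK(List<Integer> values, int k) {
            PriorityQueue<Integer> heap = new PriorityQueue<>(values); // O(n) heapify
            int[] out = new int[k];
            for (int i = 0; i < k; i++) out[i] = heap.poll();
            return out;
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(
                smallestK(List.of(5, 1, 9, 3, 7, 2, 8), 3))); // [1, 2, 3]
        }
    }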
I would like to implement a double-ended priority queue with the following constraints:
needs to be implemented in a fixed-size array, say 100 elements; if new elements need to be added after the array is full, the oldest needs to be removed
need maximum and minimum in O(1)
if possible insert in O(1)
if possible remove minimum in O(1)
clear to empty/init state in O(1) if possible
count of number of elements in array at the moment in O(1)
I would like O(1) for all five operations above, but it's not possible to have O(1) on all of them in the same implementation. At least O(1) on three operations and O(log n) on the other two should suffice.
I would appreciate any pointers to such an implementation.
There are many specialized data structures for this. One simple data structure is the min-max heap, which is implemented as a binary heap where the layers alternate between "min layers" (each node is less than or equal to its descendants) and "max layers" (each node is greater than or equal to its descendants). The minimum and maximum can be found in time O(1), and, as in a standard binary heap, enqueues and dequeues can each be done in time O(log n).
You can also use the interval heap data structure, which is another specialized priority queue for the task.
Alternatively, you can use two priority queues - one storing elements in ascending order and one in descending order. Whenever you insert a value, you can then insert elements into both priority queues and have each store a pointer to the other. Then, whenever you dequeue the min or max, you can remove the corresponding element from the other heap.
As yet another option, you could use a balanced binary search tree to store the elements. The minimum and maximum can then be found in time O(log n) (or O(1) if you cache the results) and insertions and deletions can be done in time O(log n). If you're using C++, you can just use std::map for this and then use begin() and rbegin() to get the minimum and maximum values, respectively.
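The Java analogue of that last suggestion is a NavigableSet such as TreeSet; a minimal sketch (assuming distinct elements; with duplicates you'd need a count map or a multiset):

    import java.util.TreeSet;

    public class DoubleEndedPQ {
        public static void main(String[] args) {
            TreeSet<Integer> set = new TreeSet<>();
            set.add(5); set.add(1); set.add(9);   // insert: O(log n)
            System.out.println(set.first());      // minimum: 1
            System.out.println(set.last());       // maximum: 9
            set.pollFirst();                      // remove minimum: O(log n)
            set.pollLast();                       // remove maximum: O(log n)
            System.out.println(set);              // [5]
        }
    }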
Hope this helps!
A binary heap will give you insert and remove minimum in O(log n) and the others in O(1).
The only tricky part is removing the oldest element once the array is full. For this, keep another array:
    time[i] = position in the heap array of the element added at time i + 100 * k
Every 100 iterations, you increment k.
Then, when the array fills up for the first time, you remove heap[time[0]]; when it fills up for the second time, you remove heap[time[1]]; ...; when it fills up for the 101st time, you wrap around and remove heap[time[0]] again, and so on. In general, when it fills up for the kth time, you remove heap[time[(k - 1) % 100]] (100 being your array size).
Make sure to also update the time array when you insert and remove elements.
Removal of an arbitrary element can be done in O(log n) if you know its position: just swap it with the last element in your heap array, shrink the heap by one, and sift the swapped-in element down or up as required.
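A sketch of that arbitrary removal in an array-backed min-heap (illustrative field and method names; a complete version would also update the time[] index array described above):

    // Array-backed min-heap: heap[0..size-1] satisfies the min-heap property.
    class MinHeap {
        int[] heap;
        int size;

        // O(log n): move the last element into the hole, then restore the
        // heap property; at most one of the two sifts actually does work.
        void removeAt(int i) {
            size--;
            if (i == size) return; // removed the last element, nothing to fix
            heap[i] = heap[size];
            siftDown(i);
            siftUp(i);
        }

        void siftDown(int i) {
            while (2 * i + 1 < size) {
                int c = 2 * i + 1;
                if (c + 1 < size && heap[c + 1] < heap[c]) c++;
                if (heap[i] <= heap[c]) return;
                int t = heap[i]; heap[i] = heap[c]; heap[c] = t;
                i = c;
            }
        }

        void siftUp(int i) {
            while (i > 0 && heap[i] < heap[(i - 1) / 2]) {
                int p = (i - 1) / 2;
                int t = heap[i]; heap[i] = heap[p]; heap[p] = t;
                i = p;
            }
        }
    }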
If you absolutely need max and min to be O(1), then what you can do is create a linked list, where you constantly keep track of min, max, and size, and then link all the nodes into some sort of tree structure, probably a heap. Min, max, and size would all be constant time, and since finding any node is O(log n), insert and remove are O(log n) each. Clearing would be trivial.
If your queue is a fixed size, then O-notation is meaningless. Any O(log n) or even O(n) operation is essentially O(1) because n is fixed, so what you really want is an algorithm that's fast for the given dataset. Probably two parallel traditional heap priority queues would be fine (one for high, one for low).
If you know more about what kind of data you have, you might be able to make something more special-purpose.
The task is to implement a queue in Java with the following methods:
enqueue //add an element to queue
dequeue //remove element from queue
peekMedian //find median
peekMinimum //find minimum
peekMaximum //find maximum
size // get size
Assume that ALL METHODS ARE CALLED IN EQUAL FREQUENCY; the task is to have the fastest implementation.
My Current Approach:
Maintain a sorted array in addition to the queue, so enqueue and dequeue take O(log n), and peekMedian, peekMaximum, and peekMinimum all take O(1) time.
Please suggest a method that will be faster, assuming all methods are called in equal frequency.
Well, you are close, but there is still something missing: inserting into or deleting from a sorted array is O(n), because with probability 1/2 the inserted element belongs in the first half of the array, and you then have to shift all the following elements to the right; there are at least n/2 of these, so the total complexity of this operation is O(n), on average and in the worst case.
However, if you switch your sorted data structure to a skip list / balanced BST, you get O(log n) insertion/deletion and O(1) minimum/maximum/median/size (with caching).
EDIT:
You cannot get better than O(log N) for insertion (unless you degrade peekMedian() to Omega(log N)), because otherwise you could sort better than O(N log N):
First, note that the median moves one element to the right for every two "high" elements you insert (here, "high" means >= the current max).
So, by iteratively doing:
    while peekMedian() != MAX:
        output peekMedian()
        insert(MAX)
        insert(MAX)
you can find the "higher" half of the sorted array.
Using the same approach with insert(MIN) you can get the lowest half of the array.
Assuming you have o(log N) insertion (small o notation, i.e., strictly better than Theta(log N)) and O(1) peekMedian(), you have a sort better than O(N log N), but sorting is an Omega(N log N) problem.
=><=
Thus insert() cannot be better than O(log N) while the median remains O(1).
QED
EDIT2: Modifying the median in insertions:
If the tree size before insertion is 2n+1 (odd), then the old median is at index n+1, and the new median stays at the same index (n+1). So if the element was added before the old median, you need to take the node preceding the old median: that is the new median. If it was added after it, do nothing; the old median is the new one as well.
If the size is even (2n elements), then after the insertion the median index increases (from n to n+1). So if the new element was added before the old median, do nothing; if it was added after the old median, the new median is the node following the old median.
Note: here, "next" and "preceding" nodes are those that follow according to the key, and "index" means the rank of the node (the smallest is 1st and the largest is last).
I have only explained how to do it for insertion; the same ideas hold for deletion.
There is a simpler and perhaps better solution. (As has been discussed, the sorted array makes enqueue and dequeue both O(n), which is not so good.)
Maintain two sorted sets in addition to the queue. The Java library provides, e.g., TreeSet (an implementation of SortedSet), which is a balanced search tree. The "low set" stores the first ceiling(n/2) elements in sorted order. The second "high set" has the last floor(n/2).
NB: If duplicates are allowed, you'll have to use something like Google's TreeMultiset instead of regular Java sorted sets.
To enqueue, just add to the queue and to the correct set. If necessary, re-establish balance between the sets by moving one element: either the greatest element of the low set to the high set, or the least element of the high set to the low set. Dequeuing needs the same re-balance operation.
Finding the median if n is odd is just looking up the max element in the low set. If n is even, find the max element in the low set and min in the high set and average them.
With the native Java sorted set implementation (balanced tree), this will be O(log n) for all operations. It will be very easy to code. About 60 lines.
If you implement your own sifting heaps for the low and high sets, then you'll have O(1) for the find median operation while all other ops will remain O(log n).
If you go on and implement your own Fibonacci heaps for the low and high sets, then you'll have O(1) insert as well.
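For reference, a minimal sketch of the two-heap layout using java.util.PriorityQueue (plain binary heaps, so enqueue stays O(log n) while peekMedian is O(1)); the FIFO bookkeeping for dequeue is omitted and the names are illustrative:

    import java.util.Collections;
    import java.util.PriorityQueue;

    class MedianHeaps {
        // Max-heap for the low half, min-heap for the high half.
        private final PriorityQueue<Integer> low =
            new PriorityQueue<>(Collections.reverseOrder());
        private final PriorityQueue<Integer> high = new PriorityQueue<>();

        void add(int x) {
            if (low.isEmpty() || x <= low.peek()) low.add(x); else high.add(x);
            // Re-balance: low keeps ceil(n/2) elements, high keeps floor(n/2).
            if (low.size() > high.size() + 1) high.add(low.poll());
            else if (high.size() > low.size()) low.add(high.poll());
        }

        // O(1); assumes at least one element has been added.
        double peekMedian() {
            if (low.size() > high.size()) return low.peek();
            return (low.peek() + high.peek()) / 2.0;
        }
    }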