I am looking for a data structure to support a kind of advanced priority queueing. The idea is as follows: I need to sequentially process a number of items, and at any given point in time I know the "best" one to do next (based on some metric). The thing is, processing an item changes the metric for a few of the other items, so a static queue does not do the trick.
In my problem, I know which items need to have their priorities updated, so the data structure I am looking for should have the methods
enqueue(item, priority)
dequeue()
requeue(item, new_priority)
Ideally I would like to requeue in O(log n) time. Any ideas?
There is an algorithm with time complexity close to what you require, but the O(log n) bound holds only on average (amortized), if that is acceptable. With this algorithm, you can use an existing priority queue that has no requeue() function.
Assume you maintain a link between each node in your graph and its element in the priority queue, and let each element of the priority queue also store an extra bit called ignore. The modified dequeue runs as follows:
1. Call dequeue().
2. If the ignore bit in the element is true, go back to 1; otherwise return the item id.
The modified enqueue runs as follows:
1. Call enqueue(item, priority).
2. Visit the neighbor nodes v of the item in the graph one by one:
   a. Set the ignore bit to true on the queue element currently linked to v.
   b. Call enqueue(v, new_priority(v)).
   c. Update v's link to point to the newly enqueued element.
   d. Increment num_ignore.
3. If the number of ignored elements (num_ignore) exceeds the number of non-ignored elements, rebuild the priority queue: dequeue all elements, store them, and then enqueue only the non-ignored ones again.
In this algorithm, setting an ignore bit takes constant time, so you essentially delay the O(log n) "requeue" until O(n) ignored elements have accumulated, then clear all of them at once, which takes O(n log n). Amortized over those O(n) requeues, each "requeue" therefore takes O(log n).
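For concreteness, here is a minimal sketch of the lazy scheme in C++ (the class and member names are mine, and the periodic rebuild step is only noted in a comment): each queue entry carries a shared ignore flag, requeue marks the old entry and pushes a fresh one, and dequeue skips marked entries.

```cpp
#include <iostream>
#include <memory>
#include <queue>
#include <unordered_map>

// Each queue entry carries a shared ignore flag; requeue marks the old
// entry ignored and pushes a fresh one. The periodic rebuild step that
// bounds memory is noted below but omitted for brevity.
struct Entry {
    int priority;
    int item;
    std::shared_ptr<bool> ignore;  // shared with the item's current link
    bool operator<(const Entry& o) const { return priority < o.priority; }
};

class LazyPQ {
    std::priority_queue<Entry> heap_;                       // max-heap on priority
    std::unordered_map<int, std::shared_ptr<bool>> link_;   // item -> live flag

public:
    void enqueue(int item, int priority) {
        auto flag = std::make_shared<bool>(false);
        link_[item] = flag;                 // remember the live entry
        heap_.push({priority, item, flag});
    }
    void requeue(int item, int new_priority) {
        auto it = link_.find(item);
        if (it != link_.end()) *it->second = true;  // mark old entry ignored
        enqueue(item, new_priority);
        // A full implementation would count ignored entries here and
        // rebuild the heap once they outnumber the live ones.
    }
    int dequeue() {  // returns -1 when empty
        while (!heap_.empty()) {
            Entry e = heap_.top();
            heap_.pop();
            if (!*e.ignore) { link_.erase(e.item); return e.item; }
            // otherwise the entry was superseded by a requeue: skip it
        }
        return -1;
    }
};

int main() {
    LazyPQ pq;
    pq.enqueue(/*item=*/1, /*priority=*/10);
    pq.enqueue(2, 20);
    pq.requeue(2, 5);                    // item 2 now drops below item 1
    std::cout << pq.dequeue() << '\n';   // 1
    std::cout << pq.dequeue() << '\n';   // 2
}
```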
You cannot achieve exactly the complexity you are asking for: when updating elements, the complexity must also depend on the number of elements updated.
However, if we assume that the number of updated elements in a given step is p, most typical heap implementations will give you O(1) to read the max element's value, O(log n) to dequeue, and O(p log n) for the update operations. I would personally go with a binary heap, as it is fairly easy to implement and will do what you are asking for.
A priority queue is exactly for this. You can implement it, for example, with a max-heap.
http://www.eecs.wsu.edu/~ananth/CptS223/Lectures/heaps.pdf describes the increaseKey(), decreaseKey() and remove() operations. These would let you do what you want. I haven't figured out whether the C++ stdlib implementation supports them yet.
Further, Boost.Heap (http://theboostcpplibraries.com/boost.heap) seems to support update() for some of its heap classes, but I haven't found a full reference yet.
I am currently studying the binomial heap.
I learned that the following operations on binomial heaps can be completed in Θ(log n) time:
Get-max
Insert
Extract Max
Merge
Increase-Key
Delete
But for the two operations Increase-Key and Delete, it is said that they need a pointer to the element in order to complete in Θ(log n) time.
Here are the 3 questions I want to ask:
1. Is this because, if Increase-Key and Delete aren't given a pointer to the element, they have to search for it before the operation can take place?
2. What is the time complexity of the search operation on a binomial heap? (I believe it is O(n).)
3. If a pointer to the element is not given for Increase-Key and Delete, will those two operations take O(n) time, or can it be lower than that?
It’s good that you’re thinking about this!
Yes, that’s exactly right. The nodes in a binomial heap are organized in a way that makes it very quick to find the maximum value, but the remaining elements are not kept in an order that makes it easy to find any particular one.
There isn’t a general way to search a binomial heap for an element faster than O(n). Or, stated differently, the worst-case cost of any way of searching a binomial heap is Ω(n). Here’s one way to see this. Form a binomial heap where n-1 items have priority 137 and one item has priority 42. The item with priority 42 must be a leaf node. There are (roughly) n/2 leaves in the heap, and since there is no ordering on them, to find that one item you would potentially have to look at all the leaves. To formalize this, you could form many different binomial heaps with these items, and whatever algorithm was searching for the item of priority 42 would, on at least one of them, only find it in the last place it looks.
For the reasons given above, no, there’s no way to implement those operations quickly without having pointers to the elements, since in the worst case you would have to search the entire heap.
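In practice, the usual fix is to hold on to those pointers yourself as you insert. A minimal sketch (the item ids are illustrative), assuming Boost.Heap's binomial_heap, whose push() returns a handle that supports later increase/erase in O(log n):

```cpp
#include <boost/heap/binomial_heap.hpp>
#include <iostream>
#include <unordered_map>

// Keep the handles that push() returns in a map keyed by item id,
// so Increase-Key never has to search the heap.
int main() {
    using Heap = boost::heap::binomial_heap<int>;  // max-heap by default
    Heap heap;
    std::unordered_map<int, Heap::handle_type> handle;

    handle[/*item id*/ 1] = heap.push(10);
    handle[2] = heap.push(20);

    heap.increase(handle[1], 42);     // O(log n): no O(n) search needed
    std::cout << heap.top() << '\n';  // 42
}
```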
I know about the implementation of both data structures; I want to know which is better considering time complexity.
Both have the same O(log n) insertion and erase complexity, while get-min is O(1) for both.
A priority queue only gives you access to one element in sorted order, i.e., you can get the highest/lowest-priority item, and when you remove that, you can get the next one, and so on. A set allows you full access in sorted order; for example, you can find two elements somewhere in the middle of the set and then traverse in order from one to the other.
In a priority queue you can have multiple elements with the same priority value, while in a set you can't.
A set is generally backed by a balanced binary tree, while a priority queue is backed by a heap.
So the question is when should you use a binary tree instead of a heap?
In my opinion you should use neither of them. Check out binomial and Fibonacci heaps; for Prim's algorithm they will have better performance.
If you insist on using one of the two, I would go with the priority queue, as it has a smaller memory footprint and can hold multiple elements with the same priority value.
Theoretically speaking, both will give you an O(E log V)-time algorithm. This is not optimal; Fibonacci heaps give you O(E + V log V), which is better for dense graphs (E >> V).
Practically speaking, neither is ideally suited. Since set has long-lived iterators, it's possible to implement a decrease-key operation, reducing the extra storage from O(E) to O(V) (with priority_queue, the workaround is to enqueue vertices multiple times), but the space constant is worse than priority_queue's, and the time constant probably is as well. You should measure your use case.
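For concreteness, here is a sketch of both idioms for Prim's algorithm (the graph representation and function names are mine): the priority_queue version enqueues a vertex once per candidate edge and lazily skips stale entries, while the set version erases and re-inserts to emulate decrease-key, keeping at most one entry per vertex.

```cpp
#include <climits>
#include <functional>
#include <iostream>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (weight, vertex)
using Graph = std::vector<std::vector<Edge>>;

// priority_queue idiom: O(E) extra storage, stale entries skipped lazily.
long long primWithPQ(const Graph& g) {
    std::vector<bool> done(g.size(), false);
    std::priority_queue<Edge, std::vector<Edge>, std::greater<Edge>> pq;
    pq.push({0, 0});
    long long total = 0;
    while (!pq.empty()) {
        auto [w, u] = pq.top(); pq.pop();
        if (done[u]) continue;  // stale entry: u was taken earlier
        done[u] = true;
        total += w;
        for (auto [wv, v] : g[u])
            if (!done[v]) pq.push({wv, v});  // may enqueue v many times
    }
    return total;
}

// set idiom: O(V) extra storage, decrease-key = erase + insert.
long long primWithSet(const Graph& g) {
    const int INF = INT_MAX;
    std::vector<int> key(g.size(), INF);
    std::vector<bool> done(g.size(), false);
    std::set<Edge> s;
    key[0] = 0;
    s.insert({0, 0});
    long long total = 0;
    while (!s.empty()) {
        auto [w, u] = *s.begin(); s.erase(s.begin());
        done[u] = true;
        total += w;
        for (auto [wv, v] : g[u])
            if (!done[v] && wv < key[v]) {
                if (key[v] != INF) s.erase({key[v], v});  // decrease-key
                key[v] = wv;
                s.insert({wv, v});
            }
    }
    return total;
}

int main() {
    // Triangle: 0-1 (1), 1-2 (2), 0-2 (3); MST weight = 3.
    Graph g(3);
    auto add = [&](int a, int b, int w) {
        g[a].push_back({w, b});
        g[b].push_back({w, a});
    };
    add(0, 1, 1); add(1, 2, 2); add(0, 2, 3);
    std::cout << primWithPQ(g) << ' ' << primWithSet(g) << '\n';  // 3 3
}
```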
I will second Jim Mischel's recommendation: a binary heap (a.k.a. priority_queue), moving to a pairing heap if the builtin isn't fast enough.
What is the performance of the insertion operation for a Queue implemented as:
(a) an array, with the items in unsorted order
(b) an array, with the items in sorted order
(c) a linked list, with the items in unsorted order.
For each operation, and each implementation, give the performance in Big-Oh notation and explain enough of the algorithm to justify your answer (e.g., it takes O(n) time because in the worst case... the algorithm does such and such...).
Please explain in detail, it'll help me out a lot!
Short answer: it depends on your data structure.
In a naive array-based implementation (assuming a fixed size), I think it's pretty obvious that insertion is a constant-time operation (that is, O(1)), assuming that you don't run off the end of the array. This is similar in a cyclic array, with similar assumptions.
A dynamic array is a little more complicated. A dynamic array is a fixed-size array that you enlarge once it's filled to a certain point. So for a dynamic array that resizes when it reaches length k, the first k-1 insertions are constant (just like inserting into an ordinary array) and the k-th insertion takes O(k) time: the cost of duplicating the contents of the array into a larger container, and then inserting the element. You can show that this works out to O(1) amortized insertion time, but that may be out of scope for your course.
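To make the array variants concrete, here is a minimal sketch (all names are mine) of a cyclic-array queue that doubles its buffer when full; enqueue is O(1) except for the occasional O(n) resize, which amortizes to O(1):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Growable ring buffer: enqueue is O(1) except when the buffer doubles,
// which costs O(n) but happens rarely enough to amortize to O(1).
class RingQueue {
    std::vector<int> buf_;
    std::size_t head_ = 0, count_ = 0;

    void grow() {
        std::vector<int> bigger(buf_.size() * 2);
        for (std::size_t i = 0; i < count_; ++i)  // O(n) copy, unwrapped
            bigger[i] = buf_[(head_ + i) % buf_.size()];
        buf_ = std::move(bigger);
        head_ = 0;
    }

public:
    explicit RingQueue(std::size_t cap = 4) : buf_(cap) {}

    void enqueue(int x) {
        if (count_ == buf_.size()) grow();
        buf_[(head_ + count_) % buf_.size()] = x;
        ++count_;
    }

    int dequeue() {  // caller must check empty() first
        int x = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return x;
    }

    bool empty() const { return count_ == 0; }
};

int main() {
    RingQueue q;
    for (int i = 0; i < 10; ++i) q.enqueue(i);           // forces resizes
    while (!q.empty()) std::cout << q.dequeue() << ' ';  // 0 1 2 ... 9
    std::cout << '\n';
}
```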
As others have noted, sorted order doesn't affect a standard queue. If you are in fact dealing with a priority queue, then there are lots of possible implementations, which I'll let you research on your own. The best insertion time is O(1), but that implementation has some disadvantages. The standard implementation is O(log n) insertion.
With linked lists, the insertion time will depend on whether the head of the list is the head of the queue (i.e., whether you add onto the head or the tail).
If you're adding onto the head, then it's pretty easy to see that insertion is O(1). If you're adding onto the tail, then it's also easy to see that insertion is O(n) for a list of length n. The main point is that, whichever implementation you choose, insert will always be one of O(1) or O(n), and removal will always be the other.
However, there is a simple trick that will let you get both insert and removal to O(1) in either case. I'll leave it to you to consider how to do that.
I need to frequently find the minimum-value object in a set of objects that is being continually updated, i.e., I need priority-queue-like functionality. What's the best algorithm or data structure to do this? I was thinking of having a sorted tree/heap, and every time the value of an object is updated, removing the object and re-inserting it into the tree/heap. Is there a better way to accomplish this?
A binary heap is hard to beat for simplicity, but it has the disadvantage that decrease-key takes O(n) time. I know, the standard references say that it's O(log n), but first you have to find the item. That's O(n) for a standard binary heap.
By the way, if you do decide to use a binary heap, changing an item's priority doesn't require a remove and re-insert. You can change the item's priority in-place and then either bubble it up or sift it down as required.
If the performance of decrease-key is important, a good alternative is a pairing heap, which is theoretically slower than a Fibonacci heap, but is much easier to implement and in practice is faster than the Fibonacci heap due to lower constant factors. In practice, pairing heap compares favorably with binary heap, and outperforms binary heap if you do a lot of decrease-key operations.
You could also marry a binary heap and a dictionary or hash map, and keep the dictionary updated with the position of the item in the heap. This gives you faster decrease-key at the cost of more memory and increased constant factors for the other operations.
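Here is a minimal sketch of that last combination (all names are mine): a binary max-heap plus a hash map from item to array index, so that changing a priority becomes an O(1) lookup followed by an O(log n) bubble-up or sift-down instead of an O(n) search:

```cpp
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

// Binary max-heap married to a hash map: pos_ tracks where each item
// currently sits in the array, so changing a priority never searches.
class IndexedHeap {
    std::vector<std::pair<int, int>> a_;        // (priority, item)
    std::unordered_map<int, std::size_t> pos_;  // item -> index in a_

    void swapAt(std::size_t i, std::size_t j) {
        std::swap(a_[i], a_[j]);
        pos_[a_[i].second] = i;  // keep the map in sync with the array
        pos_[a_[j].second] = j;
    }
    void siftUp(std::size_t i) {
        while (i > 0 && a_[(i - 1) / 2].first < a_[i].first) {
            swapAt((i - 1) / 2, i);
            i = (i - 1) / 2;
        }
    }
    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t best = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < a_.size() && a_[best].first < a_[l].first) best = l;
            if (r < a_.size() && a_[best].first < a_[r].first) best = r;
            if (best == i) return;
            swapAt(i, best);
            i = best;
        }
    }

public:
    void push(int item, int priority) {
        a_.push_back({priority, item});
        pos_[item] = a_.size() - 1;
        siftUp(a_.size() - 1);
    }
    // In-place priority change: overwrite, then bubble up or sift down.
    void changePriority(int item, int priority) {
        std::size_t i = pos_.at(item);  // O(1) lookup replaces O(n) search
        int old = a_[i].first;
        a_[i].first = priority;
        if (priority > old) siftUp(i); else siftDown(i);
    }
    int top() const { return a_.front().second; }  // item with max priority
};

int main() {
    IndexedHeap h;
    h.push(/*item=*/1, /*priority=*/10);
    h.push(2, 20);
    h.push(3, 15);
    h.changePriority(1, 25);       // no search, no remove/re-insert
    std::cout << h.top() << '\n';  // 1
}
```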
Quoting Wikipedia:
To improve performance, priority queues typically use a heap as their backbone, giving O(log n) performance for inserts and removals, and O(n) to build initially. Alternatively, when a self-balancing binary search tree is used, insertion and removal also take O(log n) time, although building trees from existing sequences of elements takes O(n log n) time; this is typical where one might already have access to these data structures, such as with third-party or standard libraries.
If you are looking for a better way, there must be something special about the objects in your priority queue. For example, if the keys are numbers from 1 to 10, a counting-sort-based approach may outperform the usual ones.
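As an illustration of that idea, here is a minimal sketch (names are mine) of a bucket queue for priorities 1 to 10: insert is O(1), and extract-max scans at most 10 buckets, independent of the number of stored items:

```cpp
#include <array>
#include <iostream>
#include <vector>

// Bucket queue for integer priorities 1..10: one vector per priority.
// insert is O(1); extractMax scans at most 10 buckets, so it is O(1)
// with respect to the number of stored items.
class BucketQueue {
    std::array<std::vector<int>, 11> buckets_;  // index = priority
public:
    void insert(int item, int priority) { buckets_[priority].push_back(item); }

    int extractMax() {  // caller must ensure the queue is non-empty
        for (int p = 10; p >= 1; --p) {
            if (!buckets_[p].empty()) {
                int item = buckets_[p].back();
                buckets_[p].pop_back();
                return item;
            }
        }
        return -1;  // unreachable when non-empty
    }
};

int main() {
    BucketQueue q;
    q.insert(/*item=*/100, /*priority=*/3);
    q.insert(200, 9);
    q.insert(300, 9);
    std::cout << q.extractMax() << '\n';  // 300 (same priority as 200)
}
```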
If your application looks anything like repeatedly choosing the next scheduled event in a discrete event simulation, you might consider the options listed in e.g. http://en.wikipedia.org/wiki/Discrete_event_simulation and http://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-03-FutureEventList.pdf. The latter summarizes results from different implementations in this domain, including many of the options considered in other comments and answers, and a search will find a number of papers in this area. Priority queue overhead really does make a difference in how many times faster than real time you can get your simulation to run - and if you wish to simulate something that takes weeks of real time, this can be important.
It seems I'm missing something very simple: what are the advantages of a binary heap for a priority queue compared with, say, a quick-sorted array of values? In both cases we keep the values in an array, insert is O(log N), and delete-max is O(1) in both cases. Initial construction out of a given array of elements is O(N log N) in both cases, though the link http://en.wikipedia.org/wiki/Heap_%28data_structure%29 suggests the faster Floyd's algorithm for binary heap construction. But in the case of a queue the elements are probably received one by one, so this advantage disappears. Also, merge seems to perform better for a binary heap.
So what are the reasons to prefer a BH besides merge? Maybe my assumption is wrong, and the BH is used only for study purposes. I checked the C++ docs; they mention "a heap", but of course that does not necessarily mean a binary heap.
Somewhat similar question: When is it a bad idea to use a heap for a Priority Queue?
The major advantage of the binary heap is that you can add new values to it efficiently after initially constructing it. Suppose you want to back a priority queue with a sorted array. If all the values in the queue are known in advance, you can just sort the values, as you've mentioned. But what happens when you then want to add a new value to the priority queue? This might take time Θ(n) in the worst case, because you'd have to shift over the array elements to make space for the element you just added. On the other hand, insertion into a binary heap takes time O(log n), which is exponentially faster.
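The contrast is easy to see with the standard library (a small sketch; the sample values are arbitrary): the sorted-array insert must shift the tail of the vector, while the heap insert only walks one root-to-leaf path:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // Sorted-array insert: the position is found in O(log n), but
    // shifting the tail of the vector makes the whole insert O(n).
    std::vector<int> sorted = {1, 3, 5, 7, 9};
    int x = 6;
    sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), x), x);

    // Binary-heap insert: push_back plus push_heap is O(log n).
    std::vector<int> heap = {9, 7, 5, 1, 3};  // already a valid max-heap
    heap.push_back(x);
    std::push_heap(heap.begin(), heap.end());

    std::cout << sorted.back() << ' ' << heap.front() << '\n';  // 9 9
}
```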
Another reason you'd use a heap over a sorted array is if you only need to dequeue a few elements. As you mentioned, sorting an array takes time O(n log n), but using clever algorithms you can build a heap in time O(n). If you need to build a priority queue and dequeue k elements from it, where k is unknown in advance, the runtime with a sorted array is O(n log n + k) and with a binary heap is O(n + k log n). For small k, the second algorithm is much faster.
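A small sketch of that second scenario (the sample values and k are arbitrary): build the heap in O(n) with std::make_heap, then pay O(log n) only for the k elements you actually dequeue:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {5, 1, 9, 3, 7, 2, 8};

    // Build a max-heap in O(n) (typically Floyd's bottom-up construction),
    // then pop only k elements at O(log n) each: O(n + k log n) total.
    std::make_heap(v.begin(), v.end());
    const int k = 2;
    for (int i = 0; i < k; ++i) {
        std::cout << v.front() << ' ';          // current maximum
        std::pop_heap(v.begin(), v.end() - i);  // moves the max to the back
    }
    std::cout << '\n';  // prints: 9 8
}
```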