So let's say I have a priority queue of N items with priorities, where N is in the thousands, implemented with a binary heap. I understand the EXTRACT-MIN and INSERT primitives (see Cormen, Leiserson, and Rivest, which uses -MAX rather than -MIN).
But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself (alternatively, that index would have to be supplied by consumers of the priority queue, but that seems like an abstraction violation), which looks like an oversight. Is there a way to do this efficiently without having to add a hash table on top of the heap?
Right, I think the point here is that the implementation of the priority queue may use a binary heap whose API takes an index (i) for its HEAP-INCREASE-KEY(A, i, key), while the interface to the priority queue may be allowed to take an arbitrary key. You're free to have the priority queue encapsulate the details of the key-to-index mapping. If you need your PQ-INCREASE-KEY(A, old, new) to work in O(log n), then you'd better have an O(log n) or better key-to-index lookup that you keep up to date. That could be a hash table or another fast lookup structure.
So, to answer your question: I think it's inevitable that the data structure be augmented somehow.
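To make that concrete, here's a minimal sketch of that augmentation (illustrative code of my own, not from CLRS, assuming items are unique and hashable): a binary min-heap that keeps an item-to-index dict in sync on every swap, so DECREASE-KEY needs only an O(1) lookup plus an O(log n) sift.

```python
# Illustrative code: a binary min-heap plus a dict that maps each item
# to its current heap index, updated on every swap.
class IndexedPQ:
    def __init__(self):
        self.heap = []    # list of (priority, item) pairs
        self.index = {}   # item -> position of its pair in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.index[self.heap[i][1]] = i
        self.index[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.heap[c][0] < self.heap[smallest][0]:
                    smallest = c
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, item, priority):
        self.heap.append((priority, item))
        self.index[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, new_priority):
        i = self.index[item]            # O(1) lookup, no heap scan needed
        self.heap[i] = (new_priority, item)
        self._sift_up(i)                # a lower key can only move up

    def extract_min(self):
        self._swap(0, len(self.heap) - 1)
        priority, item = self.heap.pop()
        del self.index[item]
        if self.heap:
            self._sift_down(0)
        return item, priority

pq = IndexedPQ()
pq.insert('a', 5)
pq.insert('b', 3)
pq.decrease_key('a', 1)
assert pq.extract_min() == ('a', 1)
```

The only change versus a textbook heap is that every move of an element goes through `_swap`, which keeps the dict consistent.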
FWIW, in case someone still comes looking for something similar: I recently came across an implementation of an indexed priority queue while doing one of the Coursera courses on algorithms.
The basic gist is to incorporate a reverse lookup using two arrays to support the operations the OP stated.
Here's a clear implementation of a min-ordered indexed priority queue.
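The core trick (this is, I believe, the same structure as Sedgewick and Wayne's IndexMinPQ from the Coursera Algorithms course, though the names below are my own) is a pair of parallel arrays that are kept consistent on every heap swap:

```python
# Sketch of the two-array reverse lookup (names are illustrative):
#   keys[i] - the priority associated with client index i
#   pq[h]   - which client index sits at heap position h
#   qp[i]   - which heap position client index i occupies (qp[pq[h]] == h)
class TwoArrayIndex:
    def __init__(self, max_n):
        self.keys = [None] * max_n
        self.pq = []                # heap-ordered array of client indices
        self.qp = [-1] * max_n      # -1 means "index i is not in the queue"

    def contains(self, i):
        return self.qp[i] != -1     # O(1) membership test, no heap search

    def _exch(self, a, b):
        # Every swap during sift-up/sift-down must update both mappings.
        self.pq[a], self.pq[b] = self.pq[b], self.pq[a]
        self.qp[self.pq[a]] = a
        self.qp[self.pq[b]] = b
```

With `qp` maintained this way, decrease-key for client index i starts at heap position `qp[i]` directly instead of searching the heap.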
"But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself" -- it's clear from the code that at least a few of these methods use an index into the heap rather than the item's priority. Clearly, i is an index in HEAP-INCREASE-KEY:
HEAP-INCREASE-KEY(A, i, key)
    if key < A[i]
        then error "new key is smaller than current key"
    A[i] ← key
    ...
So if that's the API, use it.
I modified my node class to add a heapIndex member. This is maintained by the heap as nodes are swapped during insert, delete, decrease, etc.
This breaks encapsulation (my nodes are now tied to the heap), but it runs fast, which was more important in my situation.
One way is to split up the heap into the elements on one side and the organization on the other.
For full functionality, you need two relations:
a) Given a Heap Location (e.g. Root), find the Element seated there.
b) Given an Element, find its Heap Location.
The second is very easy: add a value "location" (most likely an index in an array-based heap) that is updated every time the element is moved in the heap.
The first is also simple: instead of storing Elements, you simply keep a heap of pointers to Elements (or array indices). Now, given a Location (e.g. the Root), you can find the Element seated there by dereferencing the pointer (or accessing the vector).
But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself
Actually, that's not true. You can implement these operations in an unindexed graph, in linked lists, and in "traditional" search trees by keeping predecessor and successor pointers.
Related
What would be the most appropriate way to implement a stack and a queue together efficiently, in a single data structure? The number of elements is unbounded. Retrieval and insertion should both happen in constant time.
A doubly linked list has all the computational complexity attributes you desire, but poor cache locality.
A ring buffer (array) that allows for appending and removing at head and tail has the same complexity characteristics. It uses a dynamic array and requires reallocation once the number of elements grows beyond its capacity.
But, just as an array list / vector is generally faster in practice than a linked list for sequential access, in most cases the ring buffer will be faster and more memory efficient than a doubly linked list implementation.
It is one of the possible implementations of the deque abstract data type; see e.g. the ArrayDeque<E> implementation in Java.
A doubly linked list can solve this problem with all operations taking constant time:
It allows push() or enqueue() by appending the element to the list in constant time.
It allows pop() by removing the last element in constant time.
It allows dequeue() by removing the first element, also in constant time.
A two-way linked list is going to be best for this. Each node in the list has two references: one to the item before it and one to the item after it. The main list object maintains a reference to the item at the front of the list and one at the back of the list.
Any time it inserts an item, the list:
creates a new node, giving it a reference to the previous first or last node in the list (depending on whether you're adding to the front or back).
connects the previous first or last node to point at the newly-created node.
updates its own reference to the first or last node, to point at the new node.
Removing an item from the front or back of the list effectively reverses this process.
Inserting to the front or back of the structure will always be an O(1) operation.
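For a concrete example of this, Python's collections.deque already provides exactly these guarantees (it is implemented as a doubly linked list of fixed-size blocks, which also mitigates the cache-locality concern mentioned above):

```python
from collections import deque

d = deque()
d.append(2)        # push / enqueue at the back: O(1)
d.appendleft(1)    # push at the front: O(1)
d.append(3)
assert d.pop() == 3        # stack-style pop from the back: O(1)
assert d.popleft() == 1    # queue-style dequeue from the front: O(1)
assert list(d) == [2]
```

So in practice you rarely need to hand-roll the node bookkeeping; the standard-library deque covers both the stack and the queue use cases at once.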
Just like it was asked here,
I fail to understand how we can find the index of a relaxed vertex in the heap.
Programming style-wise, the heap is a black box that abstracts away the details of a priority queue. Now if we need to maintain a hash table that maps vertex keys to corresponding indices in the heap array, that would need to be done in heap implementation, right?
But most standard heaps don't provide a hash table that does such mapping.
Another way to deal with this whole problem is to add the relaxed vertices to the heap regardless of anything. When we extract the minimum we'll get the best one. To prevent the same vertex being extracted multiple times, we can mark it visited.
So my exact question is, what is the typical way (in the industry) of dealing with this problem?
What are the pros and cons compared with the methods I mentioned?
Typically, you'd need a specially-constructed priority queue that supports the decreaseKey operation in order to get this to work. I've seen this implemented by having the priority queue explicitly keep track of a hash table of the indices (if using a binary heap), or by having an intrusive priority queue where elements stored are nodes in the heap (if using a binomial heap or Fibonacci heap, for example). Sometimes, the priority queue's insertion operation will return a pointer to the node in the priority queue that holds the newly-added key. As an example, here is an implementation of a Fibonacci heap that supports decreaseKey. It works by having each insert operation return a pointer to the node in the Fibonacci heap, which makes it possible to look up the node in O(1), assuming you keep track of the returned pointers.
Hope this helps!
You are asking some very valid questions but unfortunately they are kind of vague so we won't be able to give you a 100% solid "industry standard" answer. However, I'll try to go over your points anyway:
Programming style-wise, the heap is a black box that abstracts away the details of a priority queue
Technically, a priority queue is the abstract interface (insert elements with a priority, extract the lowest-priority element) and a heap is a concrete implementation (array-based binary heap, binomial heap, Fibonacci heap, etc).
What I'm trying to say is that using an array is only one particular way to implement a priority queue.
Now if we need to maintain a hash table that maps vertex keys to corresponding indices in the heap array, that would need to be done in heap implementation, right?
Yes, because every time you move an element inside the array you will need to update its index in the hash table.
But most standard heaps don't provide a hash table that does such mapping.
Yes. This can be very annoying.
Another way to deal with this whole problem is to add the relaxed vertices to the heap regardless of anything.
I guess that could work, but I don't think I've ever seen anyone do that. The whole point of using a heap here is to increase performance, and by adding redundant elements to the heap you kind of go against that. Sure, you preserve the "black-boxness" of the priority queue, but I don't know if that is worth it. Additionally, there is a chance that the extra pop-heap operations could negatively affect your asymptotic complexity, but I'd have to do the math to check.
what is the typical way (in the industry) of dealing with this problem?
First of all, ask yourself if you can get away with using a dumb array instead of a priority queue.
Sure, finding the minimum element is now O(n) instead of O(log n), but the implementation is the simplest possible (an advantage on its own). Additionally, using an array will be just as efficient if your graph is dense, and even if your graph is sparse it might be efficient enough depending on how big your graph is.
If you really need a priority queue, then you are going to have to find one that has a decreaseKey operation implemented. If you can't find one, I would say it's not that bad to implement it yourself; it might be less trouble than trying to find an existing implementation and then trying to fit it in with the rest of your code.
Finally, I would not recommend using the really fancy heap data structures (such as fibonacci heaps). While these often show up in textbooks as a way to get optimal asymptotics, in practice they have terrible constant factors and these constant factors are significant when compared with something that is logarithmic.
Programming style-wise, the heap is a black box that abstracts away the details of a priority queue.
Not necessarily. Both C++ and Python have heap libraries that provide functions on arrays rather than black box objects. Go abstracts a bit, but requires the programmer to provide an array-like data structure for its heap operations to work on.
All this abstraction leaking in standardized, industry-strength libraries has a reason: some algorithms (Dijkstra) require a heap with additional operations, which would degrade the performance of other algorithms. Yet other algorithms (heapsort) need heap operations that work in-place on input arrays. If your library's heap gives you a black-box object, and it doesn't suffice for some algorithm, then it's time to re-implement the operations as function on arrays, or find a library that does have the operations you need.
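Python's heapq is a good illustration of this deliberately leaky design: its functions operate directly on a plain list rather than on a black-box object, so in-place algorithms like heapsort are possible.

```python
import heapq

arr = [5, 1, 4]
heapq.heapify(arr)         # in place: the plain list is now a valid min-heap
heapq.heappush(arr, 0)     # the functions take the array itself, no wrapper
assert heapq.heappop(arr) == 0
assert arr[0] == 1         # heap invariant: the minimum sits at index 0
```

Because the heap is just a list, you can also inspect or rebuild it at any time, which a sealed priority-queue object would forbid.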
This is a great question and one of those details that algorithms books like CLRS just gloss over without mention.
There are a few ways to handle this:
Use a custom heap implementation that supports decreaseKey operations
Every time you "relax" a vertex, you just add it back into the heap with the new lower weight, then you write a custom way to ignore the old elements later. You can take advantage of the fact that you only ever add a node into the heap/priority-queue if the weight has decreased.
Option #1 is definitely used. For example, if you are familiar with OpenSourceRoutingMachine (OSRM), it searches over graphs with many millions of nodes to compute road routing directions. It uses a Boost implementation of a d-ary heap specifically because it has better decreaseKey performance (source). The Fibonacci heap is also often mentioned for this purpose because it supports O(1) decrease-key operations, but likewise you'd probably have to roll your own.
In option #2 you end up doing more insertions and removeMin operations in total. If D is the total number of "relax" operations you must do, you end up doing a total of D additional heap operations. So while this has a theoretically worse runtime complexity, in practice there is research evidence that option #2 can be more performant because you can take advantage of cache locality and avoid the additional overhead of keeping pointers to do the decreaseKey operations (see [1], specifically pg. 16). This approach also has the advantage of being simpler and allows you to use standard library heap/priority-queue implementations in most languages.
To give you some pseudocode for how option #2 would look:
// Imagine this is some lookup table that has the minimum weight
// so far for each node.
weights = {}
while Queue is not empty:
    u = Queue.removeMin()
    // This is our new logic to discard the duplicate entries.
    if u.weight > weights[u]:
        continue
    visit neighbors[u] and relax() each one
As an alternative, you can also check out the Python standard library heapq docs, which describe another approach to keeping track of "dead" entries in the heap. Whether you find it helpful depends on what data structure you are using for your graph representation and for storing vertex distances.
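Here is a runnable sketch of option #2 using Python's heapq; the graph and its adjacency-dict format are my own illustrative choices, not from any particular library.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, edge_weight), ...]} -- illustrative format."""
    dist = {source: 0}
    heap = [(0, source)]                 # (distance, node); duplicates allowed
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                     # stale entry: a better one was popped earlier
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd             # "relax": push a new pair instead of decrease-key
                heapq.heappush(heap, (nd, v))
    return dist

graph = {'a': [('b', 4), ('c', 1)], 'c': [('b', 2)]}
assert dijkstra(graph, 'a') == {'a': 0, 'b': 3, 'c': 1}
```

Note there is no decreaseKey anywhere: relaxing 'b' through 'c' simply pushes a second entry for 'b', and the stale (4, 'b') entry is skipped when popped.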
[1] Priority Queues and Dijkstra’s Algorithm 2007
Let's say you're composing a blogging website. It displays recent blog posts by multiple authors sorted by "priority". Highest priority on top. Priority is determined by some formula involving:
how recently the post was published
how many comments it attracted
Order must always be accurate in real-time.
Sorting by priority is easy. The problem is that our website is hugely popular and comments fly in at a rate of hundreds per minute, across dozens of posts.
Is there a pattern to handle this scenario? In other words, can we do any better than just updating the priority field whenever there's a comment on a post, and then sorting posts each and every time the page is loaded? Caching post order doesn't help much because heavy user activity causes order to change frequently.
With "pattern" I'm speaking from both a code and database schema point of view.
You can use a balanced binary tree (e.g. a red-black tree) to store the sorted index, which should make it quicker to update than if you were sorting the entire index every time.
Using Java-ish pseudocode this would look like
Tree tree;

Node {
    int priority;

    incrementPriority() {
        priority = priority + 1;
        if(priority > tree.nextHighestNode(this).priority) {
            tree.remove(this);
            tree.add(this);
        }
    }

    decrementPriority() {
        priority = priority - 1;
        if(priority < tree.nextLowestNode(this).priority) {
            tree.remove(this);
            tree.add(this);
        }
    }
}
If changing a node's priority means that it is now in an invalid tree location (higher than what ought to be the next-highest node, or lower than what ought to be the next-lowest node), then it's removed and re-added to the tree (which takes care of rebalancing itself). Insertion is O(log(n)), but when there are no insertions or removals, updating the priority is a constant-time operation.
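Python's standard library has no balanced binary tree, so as a stand-in illustration of the remove-and-re-add idea, here is a sketch using a bisect-maintained sorted list (removal is O(n) here, where a red-black tree would give O(log n); this simplified version also always removes and re-adds rather than first checking the neighboring entry):

```python
import bisect

posts = []   # kept sorted ascending by (priority, post_id)

def add(priority, post_id):
    bisect.insort(posts, (priority, post_id))

def increment_priority(post_id, old_priority):
    # Locate the old (priority, id) entry, remove it, re-insert with
    # the bumped priority; the sorted order is repaired automatically.
    i = bisect.bisect_left(posts, (old_priority, post_id))
    posts.pop(i)
    bisect.insort(posts, (old_priority + 1, post_id))

add(3, 'p1')
add(5, 'p2')
increment_priority('p1', 3)
assert posts == [(4, 'p1'), (5, 'p2')]
```

The same calling pattern carries over directly once the sorted list is swapped for a real balanced-tree library.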
Red-black trees are how balanced binary trees are usually implemented, but there are alternatives e.g. a Tango tree is probably more appropriate here since it's an online implementation. The biggest problem is going to be with concurrency - ideally you would want to be able to implement the nodes' priority fields using some sort of AtomicInteger (permits atomic increments and decrements; quite a few languages have something like this) so that you won't need to lock the field each time you change it, but it will be difficult to atomically compare the priority to the adjacent nodes' priorities.
As an alternative, you can store everything in an array or a linked list and swap adjacent elements when their priorities change - this way you won't need to do a full sort each time, and unlike the balanced binary tree where removing and inserting an element is O(log(n)), swapping two adjacent array/list elements is constant time. The only problem is that adding an entirely new element will be costly with an array as you will need to shift all of the array's elements; it will be O(n) with a list as well because you'll need to traverse the list until you find the correct location to insert the item, but this is probably preferable to the array because you won't need to shift any adjacent elements (which will reduce the amount of locking you need to do).
I've been misguided a bit and so am a bit confused. This is what I have understood as an unsorted priority queue. Can someone please confirm?
An unsorted priority queue is one in which we insert at the end and remove elements based on priority (i.e., the smallest value in the queue).
Thank you.
The basic queue data structure works on a first-come, first-served basis (FIFO: first in, first out).
The first element inserted into the queue is the one served or executed first, and the last element is served or executed last. This represents an unsorted queue.
A sorted queue sorts the inserted elements and then executes them in that order; if you attach some kind of sorting algorithm to the queue, it will behave as that algorithm dictates.
For more info:
http://en.wikipedia.org/wiki/Priority_queue
http://en.wikipedia.org/wiki/Sorting_algorithm
The priority queue is an abstract data structure that defines the methods get-min, push, and pop-min, and possibly also union. Whether the concrete implementation uses a sorted container or not should not affect the operations that we are able to perform.
There are several possible implementations, the most popular of which uses a binary heap (which in a way is not sorted), but one may also use a sorted list, for example. I think wherever you heard about an unsorted priority queue, the person may have meant a priority queue that is not implemented using a sorted list or other sorted container.
There is a conceptual difference between a Queue and a Priority Queue. A Queue is a first-in, first-out data structure that allows efficient access to both ends of the queue (head and tail). A Priority Queue is an abstract data structure that provides a getBestItem() function without specifying how (hence abstract).
An unsorted priority queue could refer to a PQ implementation that does no intermittent work (no organization of elements) and implements getBestItem() as a simple linear search. This makes getBestItem() very inefficient (O(n)), but insert/delete very cheap (O(1)). If insert/delete is frequent and getBestItem() is rare, this can be a valid choice.
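A sketch of such an unsorted priority queue (illustrative code, not from any particular library):

```python
class UnsortedPQ:
    def __init__(self):
        self.items = []

    def insert(self, item, priority):
        self.items.append((priority, item))   # O(1): no ordering work at all

    def get_best_item(self):
        best = min(self.items)                # O(n) linear search for the minimum
        self.items.remove(best)
        return best[1]

pq = UnsortedPQ()
pq.insert('x', 2)
pq.insert('y', 1)
assert pq.get_best_item() == 'y'
```

All the work is deferred to retrieval time, which is exactly the trade-off described above.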
Very often we need to discard repeated states, as stated in uniform-cost search.
if n is in frontier with higher cost
    replace existing node with n
A priority queue doesn't provide an interface for searching for an item by its priority and then updating it. I am surprised I cannot find any resource regarding this; can anyone offer help, please?
You are looking for a Priority Search Queue.
A priority search queue efficiently supports the operations of both a search tree and a priority queue. A binding is a product of a key and a priority. Bindings can be inserted, deleted, modified, and queried in the queue (usually in logarithmic time), and the binding with the least priority can be retrieved in constant time.
Here is an implementation in Haskell.
Many priority queue implementations allow keeping a reference to a queue element and then using it to delete/update that element.
You can easily keep such references if you implement the priority queue as a binary search tree. For a binary heap this is possible but more difficult: you'll need to update the references for all elements moved upheap or downheap.
There are priority queue implementations that allow efficient updates of elements when used with algorithms like uniform-cost search. See the pairing heap and the Fibonacci heap.
Actually, you can get away with a regular priority queue for uniform-cost search.
You can insert a new, better (node, cost) pair without deleting the old one. You will always process the newly inserted entry first (because it's better) and processing the older entry will effectively be a no-op. The downside is that you may end up with O(E) elements in the priority queue (instead of O(V)).
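A sketch of this approach in uniform-cost search terms, using Python's heapq (the graph and its format are a made-up example of mine):

```python
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start)]      # may hold several (cost, node) entries per node
    best = {start: 0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if cost > best.get(node, float('inf')):
            continue             # stale duplicate: a cheaper entry was processed
        if node == goal:
            return cost
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float('inf')):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt))  # old entry stays put
    return None

graph = {'s': [('a', 1), ('g', 10)], 'a': [('g', 2)]}
assert uniform_cost_search(graph, 's', 'g') == 3
```

The "replace existing node" step from the textbook pseudocode becomes a plain push plus the staleness check at the top of the loop; the frontier can grow to O(E) entries, as noted above.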