Priority Queue - Skip List vs. Fibonacci Heap

I am interested in implementing a priority queue to enable an efficient A* implementation that is also relatively simple (I mean that the priority queue should be simple).
It seems that because a skip list offers a simple O(1) extract-min operation and an O(log N) insert, it may be competitive with the more difficult-to-implement Fibonacci heap, which has O(log N) extract-min and O(1) insert. I suppose the skip list would be better for a sparsely connected graph, whereas a Fibonacci heap would be better for a more densely connected one.
This would probably make the Fibonacci heap usually better, but am I correct in assuming that, in big-O terms, these would be similar?

The raison d'être of the Fibonacci heap is its O(1) decrease-key operation, which enables Dijkstra's algorithm to run in O(|V| log |V| + |E|) time. In practice, however, if I needed an efficient decrease-key operation, I'd use a pairing heap, since the Fibonacci heap has awful constants. If your keys are small integers, it may be even better just to use bins.
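For illustration, here is a minimal sketch of the "bins" idea as a bucket queue. It assumes keys are integers in a known range [0, max_key] and that extracted keys are non-decreasing over time (as in Dijkstra), so the scan cursor rarely moves backwards; the class and member names are mine, not from any particular library:

```cpp
#include <cstddef>
#include <vector>

// Sketch of a "bucket queue": one bin per possible integer key.
// Assumes keys lie in [0, max_key]; extract_min is amortized O(1)
// over a mostly monotone sequence of extracted keys.
template <typename T>
class BucketQueue {
public:
    explicit BucketQueue(std::size_t max_key)
        : bins_(max_key + 1), cursor_(0), size_(0) {}

    void insert(std::size_t key, const T& value) {   // O(1)
        bins_[key].push_back(value);
        if (key < cursor_) cursor_ = key;            // tolerate smaller keys
        ++size_;
    }

    T extract_min() {                                // amortized O(1)
        while (bins_[cursor_].empty()) ++cursor_;    // advance to next non-empty bin
        T value = bins_[cursor_].back();
        bins_[cursor_].pop_back();
        --size_;
        return value;
    }

    bool empty() const { return size_ == 0; }

private:
    std::vector<std::vector<T>> bins_;
    std::size_t cursor_;   // index of the smallest possibly non-empty bin
    std::size_t size_;
};
```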

Fibonacci heaps are very, very slow except on very large and dense graphs (on the order of hundreds of millions of edges). They are also notoriously difficult to implement correctly.
On the other hand, skip lists are very nice data structures and relatively simple to implement.
However, I wonder why you're not considering a simple binary heap. I believe binary-heap-based priority queues are even faster than skip-list-based ones; skip lists are mainly used to take advantage of concurrency.
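As a sketch of that suggestion: the standard trick with a binary heap is to skip decrease-key entirely and use "lazy deletion". Shown here for Dijkstra (for A* you would push g + h as the priority); the Graph/Edge types are assumptions for the example, not anything from the question:

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Binary-heap priority queue via std::priority_queue, using "lazy
// deletion" in place of decrease-key: a node is re-pushed whenever a
// better distance is found, and stale entries are skipped on pop.
using Edge = std::pair<int, double>;              // (neighbor, weight) - illustrative
using Graph = std::vector<std::vector<Edge>>;

std::vector<double> dijkstra(const Graph& g, int source) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(g.size(), INF);
    using Entry = std::pair<double, int>;         // (distance, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> pq;

    dist[source] = 0.0;
    pq.push({0.0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;                // stale entry: skip it
        for (auto [v, w] : g[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});            // re-push instead of decrease-key
            }
        }
    }
    return dist;
}
```

The queue may hold up to O(E) entries instead of O(V), but each operation is cheap and the code stays simple.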

Efficient algorithm for finding N min values in a min pairing-heap

I'm using the pairing heap implementation found here: https://github.com/jemalloc/jemalloc/commits/dev/include/jemalloc/internal/ph.h
Once in a while, though, I need to iterate over the N min values in the heap (where N is bounded by the number of elements in the heap, but is usually smaller). Is there any efficient way of doing this?
My current approach is calling remove_first() N times and then pushing all the popped elements back via insert().
Your approach is O(k log n), where k is the number of items you want and n is the number of elements in the heap. It looks like these operations have been optimized quite extensively in the implementation you're using. There is a way to traverse the heap with another heap that solves your problem in O(k log k) instead, which is faster, with the log factor on k rather than n.
The approach is fairly simple: maintain an auxiliary min-heap of values (with pointers to the nodes in the main heap), initialized with the root (which is the minimum value). Then repeatedly pop the auxiliary heap and insert the popped node's children into the auxiliary heap. This is faster because the auxiliary heap only ever contains the nodes that could possibly be the next smallest value: the children of the nodes you've taken so far.
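Here is a sketch of that traversal. Since the pairing-heap node layout from ph.h isn't reproduced here, the idea is illustrated on an array-based binary min-heap, where the children of index i sit at 2i+1 and 2i+2; with the pairing heap you would push each popped node's children instead:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// O(k log k) selection of the k smallest values from a binary min-heap
// stored in array `h`. Only candidates for the next minimum (children
// of nodes already taken) ever enter the auxiliary heap, which
// therefore holds at most O(k) entries.
std::vector<int> k_smallest(const std::vector<int>& h, std::size_t k) {
    std::vector<int> out;
    if (h.empty() || k == 0) return out;
    using Entry = std::pair<int, std::size_t>;    // (value, index in h)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> aux;
    aux.push({h[0], 0});                          // root is the global minimum
    while (!aux.empty() && out.size() < k) {
        auto [val, i] = aux.top();
        aux.pop();
        out.push_back(val);
        for (std::size_t c : {2 * i + 1, 2 * i + 2})  // children become candidates
            if (c < h.size()) aux.push({h[c], c});
    }
    return out;
}
```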
While the big-O complexity of this approach is technically better, it will almost certainly be slower in practice. The min-pairing-heap implementation you're using looks very efficient, which would almost certainly outweigh the overhead of creating an auxiliary heap and performing this search. Add to that the extra code complexity and the possibility of introducing bugs, and it's probably not worth it.
I'm pretty sure you can't do better than O(k log k). My intuition is that there is a constant-time reduction from sorting if you could: build a heap from n elements in O(n), then extract all n of them; extracting in o(n log n) total would beat the proven Ω(n log n) lower bound for comparison-based sorting. This intuition could be wrong, but I think it's pretty close to the truth. Best of luck!

Which implementation is best for Prim's algorithm, a set or a priority queue, and why?

I know how both data structures are implemented; I want to know which is better in terms of time complexity.
Both have the same insertion and erase complexity, O(log n), while get-min is O(1) for both.
A priority queue only gives you access to one element in sorted order: you can get the highest/lowest-priority item, and when you remove that, you can get the next one, and so on. A set gives you full access in sorted order; for example, you can find two elements somewhere in the middle of the set and then traverse in order from one to the other.
In a priority queue you can have multiple elements with the same priority value, while in a set you can't.
A set is generally backed by a balanced binary search tree, while a priority queue is backed by a heap.
So the question is really: when should you use a binary tree instead of a heap?
In my opinion you should use neither of them. Check out binomial and Fibonacci heaps; for Prim's algorithm they will have better performance.
If you insist on using one of the two, I would go with the priority queue, as it has a smaller memory footprint and can hold multiple elements with the same priority value.
Theoretically speaking, both will give you an O(E log V)-time algorithm. This is not optimal; Fibonacci heaps give you O(E + V log V), which is better for dense graphs (E >> V).
Practically speaking, neither is ideally suited. Since a set has stable, long-lived iterators, it's possible to implement a decrease-key operation with it, reducing the extra storage from O(E) to O(V) (with priority_queue, the workaround is to enqueue vertices multiple times), but the space constant is worse than priority_queue's, and the time constant probably is as well. You should measure your use case.
I will second Jim Mischel's recommendation: start with a binary heap (i.e., priority_queue), and move to a pairing heap if the built-in isn't fast enough.
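For concreteness, a sketch of the set-based decrease-key variant of Prim's algorithm, under the usual adjacency-list representation (the types and names here are illustrative, not from the question):

```cpp
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Prim's algorithm using std::set as the priority queue. Because set
// elements are stable, decrease-key can be done as erase + re-insert,
// so each vertex is stored at most once: O(V) queue storage instead of
// the O(E) needed by the re-enqueue workaround with priority_queue.
using Edge = std::pair<int, double>;              // (neighbor, weight) - illustrative
using Graph = std::vector<std::vector<Edge>>;

double prim_mst_weight(const Graph& g, int root) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> key(g.size(), INF);
    std::vector<bool> done(g.size(), false);
    std::set<std::pair<double, int>> pq;          // ordered by (key, vertex)

    key[root] = 0.0;
    pq.insert({0.0, root});
    double total = 0.0;
    while (!pq.empty()) {
        int u = pq.begin()->second;               // vertex with minimum key
        pq.erase(pq.begin());
        done[u] = true;
        total += key[u];
        for (auto [v, w] : g[u]) {
            if (!done[v] && w < key[v]) {
                pq.erase({key[v], v});            // decrease-key: drop old entry
                key[v] = w;                       // (erasing a missing entry is a no-op)
                pq.insert({key[v], v});           // ...and insert the new one
            }
        }
    }
    return total;
}
```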

What is the difference between binary, binomial, and Fibonacci heaps?

I want to know the basic differences between binary, binomial, and Fibonacci heaps, and the scenarios in which each is best to use.
I am mainly concerned with their application in Dijkstra's algorithm: how will its time complexity vary depending on the type of heap used?
According to Wikipedia, a binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints: the shape property (it is a complete binary tree) and the heap property (every node is either greater than or equal to, or less than or equal to, each of its children).
A binomial heap is more complex than a binary heap. However, it has excellent merge performance, bounded by O(log n) time. A binomial heap consists of a list of binomial trees.
Before jumping into Fibonacci heaps, it's probably good to explore why we even need them in the first place. There are plenty of other types of heaps (binary heaps and binomial heaps, for example), so why do we need another one?
The main reason comes up in Dijkstra's algorithm and Prim's algorithm. Both of these graph algorithms work by maintaining a priority queue holding nodes with associated priorities. Interestingly, these algorithms rely on a heap operation called decrease-key that takes an entry already in the priority queue and then decreases its key (i.e., increases its priority). In fact, a lot of the runtime of these algorithms is explained by the number of times you have to call decrease-key. If we could build a data structure that optimized decrease-key, we could optimize the performance of these algorithms. In the case of the binary heap and binomial heap, decrease-key takes time O(log n), where n is the number of nodes in the priority queue. If we could drop that to O(1), then the time complexities of Dijkstra's algorithm and Prim's algorithm would drop from O(m log n) to O(m + n log n), which is asymptotically faster than before. Therefore, it makes sense to try to build a data structure that supports decrease-key efficiently.
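For reference, this is what the O(log n) decrease-key looks like on an array-based binary min-heap: overwrite the key, then sift up toward the root. (Locating the item's index in the first place requires a separate position map.) A minimal sketch:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Decrease-key on an array-based binary min-heap: overwrite the key at
// position i, then sift up. O(log n) worst case, which is exactly the
// cost that Fibonacci heaps improve to amortized O(1).
void decrease_key(std::vector<int>& heap, std::size_t i, int new_key) {
    heap[i] = new_key;                            // assumes new_key <= heap[i]
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (heap[parent] <= heap[i]) break;       // heap property restored
        std::swap(heap[parent], heap[i]);
        i = parent;
    }
}
```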
If you're interested in learning more about Fibonacci heaps, you may want to check out this two-part series of lecture slides. Part one introduces binomial heaps and shows how lazy binomial heaps work. Part two explores Fibonacci heaps. These slides go into more mathematical depth than what I've covered here.

Can heaps really only use O(1) auxiliary storage?

One of the biggest advantages of using heaps for priority queues (as opposed to, say, red-black trees) seems to be space-efficiency: unlike balanced BSTs, heaps only require O(1) auxiliary storage.
That is, the ordering of the elements alone is sufficient to satisfy the invariants and guarantees of a heap, and no child/parent pointers are necessary.
However, my question is: is the above really true?
It seems to me that, in order to use an array-based heap while satisfying the O(log n) running time guarantee, the array must be dynamically expandable.
The only way we can dynamically expand an array in O(1) time is to over-allocate the memory that backs the array, so that we can amortize the memory operations into the future.
But then, doesn't that overallocation imply a heap also requires auxiliary storage?
If the above is true, that seems to imply heaps have no complexity advantages over balanced BSTs whatsoever, so then, what makes heaps "interesting" from a theoretical point of view?
You appear to confuse binary heaps, heaps in general, and implementations of binary heaps that use an array instead of an explicit tree structure.
A binary heap is simply a binary tree with properties that make it theoretically interesting beyond memory use. Binary heaps can be built in linear time, whereas building a BST necessarily takes n log n time. For example, this can be used to select the k smallest/largest values of a sequence in better-than-n-log-n time.
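A minimal sketch of that selection trick using the standard library: std::make_heap builds the heap in O(n), and popping k times costs O(k log n), for O(n + k log n) total:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Select the k smallest values of a sequence in O(n + k log n):
// linear-time heap construction followed by k pops.
std::vector<int> k_smallest_sorted(std::vector<int> v, std::size_t k) {
    std::make_heap(v.begin(), v.end(), std::greater<>{});     // O(n) min-heap build
    std::vector<int> out;
    for (std::size_t i = 0; i < k && !v.empty(); ++i) {
        std::pop_heap(v.begin(), v.end(), std::greater<>{});  // min moves to back
        out.push_back(v.back());
        v.pop_back();
    }
    return out;
}
```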
Implementing a binary heap as an array yields an implicit data structure. This is a topic formalized by theorists, but not actively pursued by most of them. In any case, the classification is justified: to dynamically expand this structure one indeed needs to over-allocate, but not every data structure has to grow dynamically, so the case of a statically sized, pre-allocated data structure is interesting as well.
Furthermore, there is a difference between space needed for faster growth and space needed because each element is larger than it has to be. The first can be avoided by not growing, and can also be reduced to an arbitrarily small constant factor of the total size at the cost of a greater constant factor on running time. The latter kind of space overhead is usually unavoidable and can't be reduced much (the pointers in a tree take at least log n bits, period).
Finally, there are many heaps other than binary heaps (Fibonacci, binomial, leftist, pairing, ...), and almost all of them offer better bounds than binary heaps for at least some operations. The most common ones are decrease-key (altering the value of a key already in the structure in a certain way) and merge (combining two heaps into one). The complexity of these operations is important for the analysis of several algorithms using priority queues, hence the motivation for a lot of research into heaps.
In practice the memory use is important. But with or without over-allocation, the difference is only a constant factor overall, so theorists are not terribly interested in binary heaps. They'd rather get better complexity for decrease-key and merge; most of them are happy if the data structure takes O(n) space. The extremely high memory density, ease of implementation, and cache friendliness are far more interesting to practitioners, and it's they who sing the praises of binary heaps far and wide.

Best algorithm/data structure for a continually updated priority queue

I need to frequently find the minimum-value object in a set of objects that is continually being updated, i.e., I need priority-queue-type functionality. What's the best algorithm or data structure for this? I was thinking of using a sorted tree/heap, and every time an object's value is updated, removing the object and re-inserting it into the tree/heap. Is there a better way to accomplish this?
A binary heap is hard to beat for simplicity, but it has the disadvantage that decrease-key takes O(n) time. I know, the standard references say that it's O(log n), but first you have to find the item. That's O(n) for a standard binary heap.
By the way, if you do decide to use a binary heap, changing an item's priority doesn't require a remove and re-insert. You can change the item's priority in-place and then either bubble it up or sift it down as required.
If the performance of decrease-key is important, a good alternative is a pairing heap, which is theoretically slower than a Fibonacci heap but is much easier to implement and, in practice, faster than the Fibonacci heap due to lower constant factors. In practice, a pairing heap compares favorably with a binary heap, and outperforms it if you do a lot of decrease-key operations.
You could also marry a binary heap to a dictionary or hash map, keeping the dictionary updated with each item's position in the heap. This gives you faster decrease-key at the cost of more memory and increased constant factors for the other operations.
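A minimal sketch of that marriage, assuming integer item ids (all names here are illustrative): the heap stores (key, id) pairs and a hash map tracks each id's position, so changing a priority finds the item in O(1) and then bubbles it up or sifts it down in-place in O(log n), as described above:

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Binary min-heap of (key, id) pairs plus a hash map id -> position,
// so an item can be found in O(1) and its priority changed in O(log n)
// without a remove/re-insert.
class IndexedHeap {
public:
    void push(int id, int key) {
        heap_.push_back({key, id});
        pos_[id] = heap_.size() - 1;
        sift_up(heap_.size() - 1);
    }
    void change_key(int id, int new_key) {        // O(log n), in-place
        std::size_t i = pos_.at(id);
        int old_key = heap_[i].first;
        heap_[i].first = new_key;
        if (new_key < old_key) sift_up(i); else sift_down(i);
    }
    std::pair<int, int> pop_min() {               // returns (key, id)
        auto top = heap_.front();
        swap_nodes(0, heap_.size() - 1);
        heap_.pop_back();
        pos_.erase(top.second);
        if (!heap_.empty()) sift_down(0);
        return top;
    }
    bool empty() const { return heap_.empty(); }

private:
    std::vector<std::pair<int, int>> heap_;       // (key, id)
    std::unordered_map<int, std::size_t> pos_;    // id -> index in heap_

    void swap_nodes(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a].second] = a;                // keep the position map in sync
        pos_[heap_[b].second] = b;
    }
    void sift_up(std::size_t i) {
        while (i > 0 && heap_[(i - 1) / 2].first > heap_[i].first) {
            swap_nodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t best = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap_.size() && heap_[l].first < heap_[best].first) best = l;
            if (r < heap_.size() && heap_[r].first < heap_[best].first) best = r;
            if (best == i) break;
            swap_nodes(i, best);
            i = best;
        }
    }
};
```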
Quoting Wikipedia:
To improve performance, priority queues typically use a heap as their backbone, giving O(log n) performance for inserts and removals, and O(n) to build initially. Alternatively, when a self-balancing binary search tree is used, insertion and removal also take O(log n) time, although building trees from existing sequences of elements takes O(n log n) time; this is typical where one might already have access to these data structures, such as with third-party or standard libraries.
If you are looking for a better way, there must be something special about the objects in your priority queue. For example, if the keys are numbers from 1 to 10, a countsort-based approach may outperform the usual ones.
If your application looks anything like repeatedly choosing the next scheduled event in a discrete event simulation, you might consider the options listed in e.g. http://en.wikipedia.org/wiki/Discrete_event_simulation and http://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-03-FutureEventList.pdf. The latter summarizes results from different implementations in this domain, including many of the options considered in other comments and answers, and a search will find a number of papers in this area. Priority-queue overhead really does make a difference in how many times faster than real time you can get your simulation to run; if you wish to simulate something that takes weeks of real time, this can be important.
