I want to know the basic difference between binary, binomial, and Fibonacci heaps and in which scenarios they are best to use.
I am mainly concerned with their application in Dijkstra's algorithm that how it's Time complexity will vary depending on the type of the heap used?
According to Wikipedia, a binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints complete binary tree and heap property. Note that heap property is all nodes are either greater or less than each of children.
Binomial heap is more complex than most of the binary heaps. However, it has excellent merge performance which bound to O(lg N) time. A binomial heap is consist of a list of binomial trees.
Before jumping into Fibonacci heaps, it's probably good to explore why we even need them in the first place. There are plenty of other types of heaps (binary heaps and binomial heaps, for example), so why do we need another one?
The main reason comes up in Dijkstra's algorithm and Prim's algorithm. Both of these graph algorithms work by maintaining a priority queue holding nodes with associated priorities. Interestingly, these algorithms rely on a heap operation called decrease-key that takes an entry already in the priority queue and then decreases its key (i.e. increases its priority). In fact, a lot of the runtime of these algorithms is explained by the number of times you have to call decrease-key. If we could build a data structure that optimized decrease-key, we could optimize the performance of these algorithms. In the case of the binary heap and binomial heap, decrease-key takes time O(log n), where n is the number of nodes in the priority queue. If we could drop that to O(1), then the time complexities of Dijkstra's algorithm and Prim's algorithm would drop from O(m log n) to (m + n log n), which is asymptotically faster than before. Therefore, it makes sense to try to build a data structure that supports decrease-key efficiently.
If you're interested in learning more about Fibonacci heaps, you may want to check out this two-part series of lecture slides. Part one introduces binomial heaps and shows how lazy binomial heaps work.Part two explores Fibonacci heaps. These slides go into more mathematical depth than what I've covered here.
Related
So I'm attempting to figure out algorithms which proceed in a manner most similar to a PQ-based sort for the following structures.
1-heap
3-heap
n-1 heap
BST
Balanced BST
For an example with heapsort and d-heaps. Heapsort uses a 2-heap as an intermediate representation to sort the contents. For heapsort, the PQ is a 2-heap even though any PQ would work.
I'm not sure what you mean by a 1-heap. Using standard terminology, a 1-heap would be a heap with one child per node: a linked list. That wouldn't perform very well.
A 3-heap is just a d-ary heap where d=3. You say that heap sort uses a 2-heap. That's not always true. Many people implement heap sort with a 2-heap, often because they don't know that they could use a 3-heap. Which is unfortunate, because a 3-heap can be more efficient in many cases. I and many others have implemented heap sort with a 3-heap.
If by "n-1 heap" you mean a heap with n-1 children of the root, that's not going to be very efficient. With n-1 children of the root, when you remove the smallest item, you have to search all n-1 children to find the new root. You might as well just repeatedly search a linear list for the next largest item. An "n-1" heap sort will have the same performance as a selection sort.
A balanced binary search tree will perform better than an unbalanced one, but it's expensive to build the tree. The beauty of heap sort is that you can build the heap in O(n), in place. Building a BST is O(n log n). Although removing the smallest node is an O(log n) operation for both, real world performance favors the heap.
All that said, any data structure that can serve as a priority queue can be used in a pq-based sort. Binary heap (more generally, d-ary heap) is commonly used because it's easy to implement and very efficient. Depending on your application, though, something like pairing heap could potentially outperform binary heap.
A heap can be constructed from a list in O(n logn) time, because inserting an element into a heap takes O(logn) time and there are n elements.
Similarly, a binary search tree can be constructed from a list in O(n logn) time, because inserting an element into a BST takes on average logn time and there are n elements.
Traversing a heap from min-to-max takes O(n logn) time (because we have to pop n elements, and each pop requires an O(logn) sink operation). Traversing a BST from min-to-max takes O(n) time (literally just inorder traversal).
So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?
Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.
Construction of a heap can actually be done in O(n) time, not O(nlogn) time. This makes heap construction faster than BST construction.
Additionally, arrays can be easily transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't be easily implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require additional O(n) space allocation to sort, while Heaps require only O(1).
All operations on heaps are guaranteed to be O(logn) time. BSTs, unless balanced, may have O(n) operations. Heaps are dramatically simpler to implement than Balanced BSTs are.
If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.
There are multiple reasons I can imagine you would want to prefer a (binary) heap over a search tree:
Construction: A binary heap can actually be constructed in O(n) time by applying the heapify operations bottom-up from the smallest to the largest subtrees.
Modification: All operations of the binary heap are rather straightforward:
Inserted an element at the end? Sift it up until the heap condition holds
Swapped the last element to the beginning? Swift it down until the heap condition holds
Changed the key of an entry? Sift it up or down depending on the direction of the change
Conceptual simplicity: Due to its implicit array representation, a binary heap can be implemented by anyone who knows the basic indexing scheme (2i+1, 2i+2 are the children of i) without considering many difficult special cases.
If you look at these operations in a binary search tree, in theory
they are also quite simple, but the tree has to be stored explicitly, e.g. using pointers, and most of the operations require the tree to be
rebalanced to preserve the O(log n) height, which requires complicated rotations (red black-trees) or splitting/merging
nodes (B-trees)
EDIT: Storage: As Irleon pointed out, to store a BST you also need more storage, as at least two child pointers need to be stored for every entry in addition to the value itself, which can be a large storage overhead especially for small value types. At the same time, the heap needs no additional pointers.
To answer your question about sorting: A BST takes O(n) time to traverse in-order, the construction process takes O(n log n) operations which, as mentioned before, are much more complex.
At the same time Heapsort can actually be implemented in-place by building a max-heap from the input array in O(n) time and and then repeatedly swapping the maximum element to tbe back and shrinking the heap. You can think of Heapsort as Insertion sort with a helpful data structure that lets you find the next maximum in O(log n) time.
If the sorting method consists of storing the elements in a data structure and after extracting in a sorted way, then, although both approaches (heap and bst) have the same asymptotic complexity O(n log n), the heap tends to be faster. The reason is the heap always is a perfectly balanced tree and its operations always are O(log n), in a determistic way, not on average. With bst's, depending on the approah for balancing, insertion and deletion tend to take more time than the heap, no matter which balancing approach is used. In addition, a heap is usually implemented with an array storing the level traversal of the tree, without the need of storing any kind of pointers. Thus, if you know the number of elements, which usually is the case, the extra storage required for a heap is less than the used for a bst.
In the case of sorting an array, there is a very important reason which it would rather be preferable a heap than a bst: you can use the same array for storing the heap; no need to use additional memory.
One standard implementation of the Dijkstra algorithm uses a heap to store distances from the starting node S to all unexplored nodes. The argument for using a heap is that we can efficiently pop the minimum distance from it, in O(log n). However, to maintain the invariant of the algorithm, one also needs to update some of the distances in the heap. This involves:
popping non-min elements from the heaps
computing the updated distances
inserting them back into the heap
I understand that popping non-min elements from a heap can be done in O(log n) if one knows the location of that element in the heap. However, I fail to understand how one can know this location in the case of the Dijkstra algorithm. It sounds like a binary search tree would be more appropriate.
More generally, my understanding is that the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element. Is my understanding correct?
However, I fail to understand how one can know this location in the case of the Dijkstra algorithm.
You need an additional array that keeps track of where in the heap the elements live, or an extra data member inside the heap's elements. This has to be updated after each heap operation.
the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element
Even a BST can be amended to keep a pointer to the min element in addition to the root pointer, giving O(1) access to the min (effectively amortizing the O(lg n) work over the other operations).
The only advantage of heaps in terms of worst-case complexity is the "heapify" algorithm, which turns an array into a heap by reshuffling its elements in-place, in linear time. For Dijkstra's, this doesn't matter, since it's going to do n heap operations of O(lg n) cost apiece anyway.
The real reason for heaps, then, is constants. A properly implemented heap is just a contiguous array of elements, while a BST is a pointer structure. Even when a BST is implemented inside an array (which can be done if the number of elements is known from the start, as in Dijkstra's), the pointers take up more memory, and navigating them takes more time than the integer operations that are used to navigate a heap.
Priority queue:
Basic operations: Insertion
Delete (Delete minumum element)
Goal: To provide efficient running time or order of growth for above functionality.
Implementation of Priority queue By:
Linked List: Insertion will take o(n) in case of insertion at end o(1) in case of
insertion at head.
Delet (Finding minumum and Delete this ) will take o(n)
BST:
Insertion/Deltion of minimum = In avg case it will take o(logn) worst case 0(n)
AVL Tree:
Insertion/deletion/searching: o(log n) in all cases.
My confusion goes here:
Why not we have used AVL Tree for implementation of Priority queue, Why we gone
for Binary heap...While as we know that in AVL Tree we can do insertion/ Deletion/searching in o(log n) in worst case.
Complexity isn't everything, there are other considerations for actual performance.
For most purposes, most people don't even use an AVL tree as a balanced tree (Red-Black trees are more common as far as I've seen), let alone as a priority queue.
This is not to say that AVL trees are useless, I quite like them. But they do have a relatively expensive insert. What AVL trees are good for (beating even Red-Black trees) is doing lots and lots of lookups without modification. This is not what you need for a priority queue.
As a separate consideration -- never mind your O(log n) insert for a binary heap, a fibonacci heap has O(1) insert and O(log N) delete-minimum. There are a lot of data structures to choose from with slightly different trade-offs, so you wouldn't expect to see everyone just pick the first thing that satisfies your (quite brief) criteria.
Binary heap is not Binary Search Tree (BST). If severely unbalanced / deteriorated into a list, it will indeed take O(n) time. Heaps are usually always O(log(n)) or better. IIRC Sedgewick claimed O(1) average-time for array-based heaps.
Why not AVL? Because it maintains too much order in a structure. Too much order means, too much effort went into maintaining that order. The less order we can get away with, the better - it will usually translate to faster operations. For example, RBTs are better than AVL trees. RBTs, red-black trees, are almost balanced trees - they save operations while still ensuring O(log(n)) time.
But any tree is totally-ordered structure, so heaps are generally better, because they only ensure that the minimal element is on top. They are only partially ordered.
Because in a binary heap the minimum element is the root.
I am interested in implementing a priority queue to enable an efficient Astar implementation that is also relatively simple (the priority queue is simple I mean).
It seems that because a Skip List offers a simple O(1) extract-Min operation and an insert operation that is O(Log N) it may be competitive with the more difficult to implement Fibonacci Heap which has O(log N) extract-Min and an O(1) insert. I suppose that the Skip-List would be better for a graph with sparse nodes whereas a Fibonacci heap would be better for an environment with more densely connected nodes.
This would probably make the Fibonacci Heap usually better, but am I correct in assuming that Big-Oh wise these would be similar?
The raison d'etre of the Fibonacci heap is the O(1) decrease-key operation, enabling Dijkstra's algorithm to run in time O(|V| log |V| + |E|). In practice, however, if I needed an efficient decrease-key operation, I'd use a pairing heap, since the Fibonacci heap has awful constants. If your keys are small integers, it may be even better just to use bins.
Fibonacci heaps are very very very slow except for very very very very large and dense graphs (on the order of hundreds of millions of edges). They are also notoriously difficult to implement correctly.
On the other hand, skip lists are very nice data structures and relatively simple to implement.
However I wonder why you're not considering using a simple binary heap. I believe binary heaps-based priority queues are even faster than skip list-based priority queues. Skip lists are mainly used to take advantage of concurrency.