Implementing priority queue using max heap vs balanced BST

A balanced BST and a max heap both perform insert and delete in O(log n). However, finding the max value is O(1) in a max heap but O(log n) in a balanced BST.
Removing the max value from a max heap takes O(log n), because it is a delete operation.
In a balanced BST, deleting the max element = finding the max value + deleting it, which is O(log n) + O(log n) and reduces to O(log n). So deleting the max value in a balanced BST is also O(log n).
I have read that one application of a max heap is a priority queue, whose primary purpose is to remove the max value on every dequeue operation. If deleting the max element is O(log n) for both a max heap and a balanced BST, I have the following questions:
What is the purpose of a max heap in the priority queue? Is it used just because it is easier to implement than a fully searchable balanced BST?
Since there is no balance factor calculation, can the max heap be called an unbalanced binary tree?
Every balanced BST can be used as a priority queue that is also searchable in O(log n), whereas searching a max heap is O(n). Is that correct?
All the time complexities are for the worst case. Any help is greatly appreciated.

What is the purpose of a max heap in the priority queue? Is it used just because it is easier to implement than a fully searchable balanced BST?
Some advantages of a heap are:
Given an unsorted input array, a heap can still be built in O(n) time, while a BST needs O(nlogn) time.
If the initial input is an array, that same array can serve as heap, meaning no extra memory is needed for it. Although one could think of ways to create a BST using the data in-place in the array, it would be quite odd (for primitive types) and give more processing overhead. A BST is usually created from scratch, copying the data into the nodes as they are created.
Interesting fact: an array sorted in descending order is already a max heap (and one sorted in ascending order is already a min heap), so if it is known that the input is sorted that way, nothing needs to be done to build the heap.
A heap can be stored as an array without the need of storing cross references, while a BST usually consists of nodes with left & right references. This has at least two consequences:
The memory used for a BST is about 3 times greater than for a heap.
Although several operations have the same time complexity for both heap and BST, the overhead for adapting a BST is much greater, so that the actual time spent on these operations is a (constant) factor greater in the BST case.
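To illustrate the first two points, here is a minimal sketch using Python's standard heapq module. Note the assumptions: heapq provides a min heap rather than a max heap (for a max heap one would typically store negated keys), and the sample values are made up for illustration.

```python
import heapq

data = [9, 4, 7, 1, 8, 2]      # unsorted input array (sample values)
same_list = data               # second name for the same list object

heapq.heapify(data)            # O(n) build; rearranges the list in place
smallest = data[0]             # the root of a min heap is its minimum: O(1) peek

# Each pop is O(log n); popping everything yields the values in sorted order.
drained = [heapq.heappop(data) for _ in range(len(data))]
```

The list object itself serves as the heap, so no per-element nodes or child references are allocated.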
Since there is no balance factor calculation, can the max heap be called an unbalanced binary tree?
A heap is in fact a complete binary tree, so it is always as balanced as it can be: the leaves are always on the last or second-to-last level. A self-balancing BST (AVL, red-black, ...) cannot beat that level of balance; in such trees you will often find leaves on three or more different levels.
Every balanced BST can be used as a priority queue that is also searchable in O(log n), whereas searching a max heap is O(n). Is that correct?
Yes, this is true. So if the application needs the search feature, then a BST is superior.

What is the purpose of a max heap in the priority queue? Is it used just because it is easier to implement than a fully searchable balanced BST?
No. A max heap fits better, since it is designed to expose the next element (by priority) as fast as possible: the maximum sits at the root, so reading it takes O(1) time, and removing it takes O(log n). That is what you want from the simplest possible priority queue.
Since there is no balance factor calculation, can the max heap be called an unbalanced binary tree?
No. A heap is balanced as well. Long story short, the heap is kept in order by sift-up and sift-down operations (swapping elements that are out of order).
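As a rough sketch of what sift-up and sift-down look like, here is a small array-backed max heap. The class name MaxHeap and its method names are illustrative choices, not taken from any particular library.

```python
class MaxHeap:
    """Array-backed binary max heap (illustrative sketch)."""

    def __init__(self):
        self._a = []

    def __len__(self):
        return len(self._a)

    def peek(self):
        return self._a[0]              # the maximum is always at the root: O(1)

    def push(self, x):
        a = self._a
        a.append(x)                    # place the new element at the end ...
        i = len(a) - 1
        while i > 0:                   # ... and sift it up: O(log n)
            parent = (i - 1) // 2
            if a[parent] >= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a = self._a
        top = a[0]
        last = a.pop()                 # remove the last leaf
        if a:
            a[0] = last                # move it to the root ...
            i, n = 0, len(a)
            while True:                # ... and sift it down: O(log n)
                left, right, largest = 2 * i + 1, 2 * i + 2, i
                if left < n and a[left] > a[largest]:
                    largest = left
                if right < n and a[right] > a[largest]:
                    largest = right
                if largest == i:
                    break
                a[i], a[largest] = a[largest], a[i]
                i = largest
        return top


pq = MaxHeap()
for x in [3, 10, 5, 8]:
    pq.push(x)
order = [pq.pop() for _ in range(len(pq))]   # dequeue everything by priority
```

The tree shape never needs repair: elements are only appended to or removed from the end of the array, so the tree stays complete by construction and only the ordering has to be restored.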
Every balanced BST can be used as a priority queue that is also searchable in O(log n), whereas searching a max heap is O(n). Is that correct?
Yes. A linked list or an array could be used as well; it would just be more expensive in terms of O-notation and much slower in practice.

Related

Why do we sort via Heaps instead of Binary Search Trees?

A heap can be constructed from a list in O(n logn) time, because inserting an element into a heap takes O(logn) time and there are n elements.
Similarly, a binary search tree can be constructed from a list in O(n logn) time, because inserting an element into a BST takes on average logn time and there are n elements.
Traversing a heap from min-to-max takes O(n logn) time (because we have to pop n elements, and each pop requires an O(logn) sink operation). Traversing a BST from min-to-max takes O(n) time (literally just inorder traversal).
So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?
Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.
Construction of a heap can actually be done in O(n) time, not O(nlogn) time. This makes heap construction faster than BST construction.
Additionally, arrays can be easily transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't be easily implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require additional O(n) space allocation to sort, while Heaps require only O(1).
All operations on heaps are guaranteed to be O(logn) time. BSTs, unless balanced, may have O(n) operations. Heaps are dramatically simpler to implement than Balanced BSTs are.
If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.

There are multiple reasons I can imagine you would want to prefer a (binary) heap over a search tree:
Construction: A binary heap can actually be constructed in O(n) time by applying the heapify operations bottom-up from the smallest to the largest subtrees.
Modification: All operations of the binary heap are rather straightforward:
Inserted an element at the end? Sift it up until the heap condition holds
Swapped the last element to the beginning? Sift it down until the heap condition holds
Changed the key of an entry? Sift it up or down depending on the direction of the change
Conceptual simplicity: Due to its implicit array representation, a binary heap can be implemented by anyone who knows the basic indexing scheme (2i+1, 2i+2 are the children of i) without considering many difficult special cases.
If you look at these operations in a binary search tree, in theory they are also quite simple, but the tree has to be stored explicitly, e.g. using pointers, and most of the operations require the tree to be rebalanced to preserve the O(log n) height, which requires complicated rotations (red-black trees) or splitting/merging nodes (B-trees).
EDIT: Storage: As Irleon pointed out, to store a BST you also need more storage, as at least two child pointers need to be stored for every entry in addition to the value itself, which can be a large storage overhead especially for small value types. At the same time, the heap needs no additional pointers.
To answer your question about sorting: a BST takes O(n) time to traverse in-order, but the construction process takes O(n log n) operations which, as mentioned before, are much more complex.
At the same time, Heapsort can actually be implemented in-place by building a max-heap from the input array in O(n) time and then repeatedly swapping the maximum element to the back and shrinking the heap. You can think of Heapsort as selection sort with a helpful data structure that lets you find the next maximum in O(log n) time.
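A minimal sketch of that in-place procedure follows; the function names sift_down and heapsort and the sample input are illustrative.

```python
def sift_down(a, i, n):
    """Restore the max-heap property for the subtree rooted at i within a[:n]."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapsort(a):
    """Sort the list a in place in ascending order."""
    n = len(a)
    # Bottom-up construction: heapify the smallest subtrees first, O(n) overall.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeatedly swap the maximum to the back and shrink the heap by one.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)


values = [5, 2, 9, 1, 5, 6]
heapsort(values)
```

Only a constant number of index variables is used besides the input list, which is the O(1) extra space mentioned above.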

If the sorting method consists of storing the elements in a data structure and then extracting them in sorted order, then, although both approaches (heap and BST) have the same asymptotic complexity O(n log n), the heap tends to be faster. The reason is that the heap is always a perfectly balanced tree and its operations are always O(log n), deterministically, not on average. With BSTs, insertion and deletion tend to take more time than with a heap, no matter which balancing approach is used. In addition, a heap is usually implemented as an array storing the level-order traversal of the tree, without the need to store any pointers. Thus, if you know the number of elements, which is usually the case, the extra storage required for a heap is less than that used for a BST.
When sorting an array, there is a very important reason to prefer a heap over a BST: you can use the same array to store the heap, so no additional memory is needed.

Why would one use a heap over a self balancing binary search tree?

What are the uses of heaps? Whatever a heap can do can also be done by a self-balancing binary search tree like an AVL tree. The most common use of heap is to find the minimum (or maximum) element in O(1) time (which is always the root). This functionality can also be included while constructing the AVL tree by maintaining a pointer to the minimum(or maximum) element, and min/max queries can be answered in O(1) time.
The only benefit of heaps over AVL trees I can think of is that AVL trees use a bit more memory because of pointers. Is there any other advantage/functionality of using a heap over an AVL tree?

A heap may have better insert and merge times. It really depends on the type of heap, but typically they are far less strict than an AVL because they don't have to worry about auto balancing after each operation.
A heap merely guarantees that all nodes follow the same style of ordering across the heap. There are of course more strict heaps like a binary heap that make inserting and merging more difficult since ordering matters more, but this is not always the case.
For example, the insert and merge times for a Fibonacci heap would be O(1) vs O(log n) for the AVL.
It is also more difficult to build the full AVL when compared to a heap.
We typically use heaps when we just want quick access to the min or max item and don't care about perfect ordering of the other elements. With fast insertion, we can deal with many elements quickly and always keep our attention on the most important (or least important) ones.

You are correct when you say that a self-balancing binary tree can do strictly more things than a heap, and that heaps use less space for pointers. Here are some additional considerations:
The binary heap takes much less code to implement than an AVL tree. This makes coding, debugging, and modification significantly easier.
An AVL tree uses one object container and two pointers per data item stored. The binary heap uses zero overhead per data item stored - it is all packed into one array.

The main reason is that a binary heap is actually implemented as an array, not a tree (the tree is a metaphor; the actual implementation is an array, where the children of the element at index i are the elements at indices 2i+1 and 2i+2). An array is far more efficient (in constant factors) than a pointer-based tree, both in space and time, thanks to locality of reference, which makes the data structure much more cache-friendly.
In addition, initializing a binary heap with n elements takes O(n) time, while doing the same for a BST takes O(nlogn) time.
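The index arithmetic behind the "tree as a metaphor" point can be sketched as follows. The helper names parent, children and is_max_heap are hypothetical, and the sample array is made up.

```python
def parent(i):
    """Index of the parent of element i (valid for i > 0)."""
    return (i - 1) // 2


def children(i, n):
    """Indices of the children of element i that exist in a heap of size n."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


def is_max_heap(a):
    """True if no element is greater than its parent."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))


heap = [9, 7, 8, 3, 5, 6]
#         9
#       /   \
#      7     8
#     / \   /
#    3   5 6
kids_of_root = children(0, len(heap))
kids_of_two = children(2, len(heap))
valid = is_max_heap(heap)
```

Navigating the tree is plain integer arithmetic on one contiguous array, with no pointers to follow.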

Advantages of heaps over binary trees in the Dijkstra algorithm

One standard implementation of the Dijkstra algorithm uses a heap to store distances from the starting node S to all unexplored nodes. The argument for using a heap is that we can efficiently pop the minimum distance from it, in O(log n). However, to maintain the invariant of the algorithm, one also needs to update some of the distances in the heap. This involves:
popping non-min elements from the heap
computing the updated distances
inserting them back into the heap
I understand that popping non-min elements from a heap can be done in O(log n) if one knows the location of that element in the heap. However, I fail to understand how one can know this location in the case of the Dijkstra algorithm. It sounds like a binary search tree would be more appropriate.
More generally, my understanding is that the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element. Is my understanding correct?

However, I fail to understand how one can know this location in the case of the Dijkstra algorithm.
You need an additional array that keeps track of where in the heap the elements live, or an extra data member inside the heap's elements. This has to be updated after each heap operation.
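A sketch of that bookkeeping, under a few assumptions: the position table is a dict rather than an array, the class name IndexedMinHeap is made up, and a distance is lowered in place with a sift-up instead of being popped and reinserted (which has the same O(log n) cost). The graph format is also an assumption.

```python
class IndexedMinHeap:
    """Binary min heap of [key, item] pairs that tracks where each item lives."""

    def __init__(self):
        self._a = []      # the heap array
        self._pos = {}    # item -> current index in self._a

    def __len__(self):
        return len(self._a)

    def __contains__(self, item):
        return item in self._pos

    def key(self, item):
        return self._a[self._pos[item]][0]

    def _swap(self, i, j):
        a = self._a
        a[i], a[j] = a[j], a[i]
        self._pos[a[i][1]] = i          # keep the position table in sync
        self._pos[a[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            p = (i - 1) // 2
            if self._a[p][0] <= self._a[i][0]:
                break
            self._swap(i, p)
            i = p

    def _sift_down(self, i):
        n = len(self._a)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and self._a[left][0] < self._a[smallest][0]:
                smallest = left
            if right < n and self._a[right][0] < self._a[smallest][0]:
                smallest = right
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, key, item):
        self._a.append([key, item])
        self._pos[item] = len(self._a) - 1
        self._sift_up(len(self._a) - 1)

    def decrease_key(self, item, key):
        i = self._pos[item]             # O(1) lookup of the item's location
        self._a[i][0] = key
        self._sift_up(i)                # O(log n) repair

    def pop_min(self):
        self._swap(0, len(self._a) - 1)
        key, item = self._a.pop()
        del self._pos[item]
        if self._a:
            self._sift_down(0)
        return key, item


def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {}
    heap = IndexedMinHeap()
    heap.push(0, source)
    while heap:
        d, u = heap.pop_min()
        dist[u] = d                     # u's distance is now final
        for v, w in graph.get(u, []):
            if v in dist:
                continue
            if v not in heap:
                heap.push(d + w, v)
            elif d + w < heap.key(v):
                heap.decrease_key(v, d + w)
    return dist


graph = {"S": [("A", 4), ("B", 1)], "B": [("A", 2), ("C", 5)], "A": [("C", 1)], "C": []}
distances = dijkstra(graph, "S")
```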
the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element
Even a BST can be amended to keep a pointer to the min element in addition to the root pointer, giving O(1) access to the min (effectively amortizing the O(lg n) work over the other operations).
The only advantage of heaps in terms of worst-case complexity is the "heapify" algorithm, which turns an array into a heap by reshuffling its elements in-place, in linear time. For Dijkstra's, this doesn't matter, since it's going to do n heap operations of O(lg n) cost apiece anyway.
The real reason for heaps, then, is constants. A properly implemented heap is just a contiguous array of elements, while a BST is a pointer structure. Even when a BST is implemented inside an array (which can be done if the number of elements is known from the start, as in Dijkstra's), the pointers take up more memory, and navigating them takes more time than the integer operations that are used to navigate a heap.

What are the differences between heap and red-black tree?

We know that heaps and red-black tree both have these properties:
worst-case cost for searching is lgN;
worst-case cost for insertion is lgN.
So, since the implementation and operation of red-black trees is difficult, why don't we just use heaps instead of red-black trees? I am confused.

You can't find an arbitrary element in a heap in O(log n). It takes O(n) to do this. You can find the first element (the smallest, say) in a heap in O(1) and extract it in O(log n). Red-black trees and heaps have quite different uses, internal orderings, and implementations: see below for more details.
Typical use
Red-black tree: storing a dictionary where, in addition to lookup, you want the elements sorted by key, so that you can for example iterate through them in order. Insert and lookup are O(log n).
Heap: priority queue (and heap sort). Extraction of minimum and insertion are O(log n).
Consistency constraints imposed by structure
Red-black tree: total ordering: left child < parent < right child.
Heap: dominance: parent < children only.
(note that you can substitute a more general ordering than <)
Implementation / Memory overhead
Red-black tree: pointers used to represent structure of tree, so overhead per element. Typically uses a number of nodes allocated on free store (e.g. using new in C++), nodes point to other nodes. Kept balanced to ensure logarithmic lookup / insertion.
Heap: structure is implicit: root is at position 0, children of root at 1 and 2, etc, so no overhead per element. Typically just stored in a single array.

Red Black Tree:
A form of binary search tree with a deterministic balancing strategy. This balancing guarantees good performance, and the tree can always be searched in O(log n) time.
Heaps:
We need to search through every element in the heap in order to determine whether an element is present. Even with optimization, I believe search is still O(n). On the other hand, a heap is best for finding the min/max of a set, in O(1).
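One possible reading of "optimization" here is to skip subtrees whose root is already smaller than the target (for a max heap). The sketch below assumes an array-backed max heap; the function name heap_contains is hypothetical. The worst case remains O(n), for example when the target is smaller than every element.

```python
def heap_contains(a, target, i=0):
    """Search an array-backed max heap, skipping subtrees that cannot hold target."""
    if i >= len(a) or a[i] < target:
        # Everything below a[i] is <= a[i], so it is also < target.
        return False
    if a[i] == target:
        return True
    return (heap_contains(a, target, 2 * i + 1)
            or heap_contains(a, target, 2 * i + 2))


heap = [9, 7, 8, 3, 5, 6]
found_five = heap_contains(heap, 5)
found_four = heap_contains(heap, 4)
found_ten = heap_contains(heap, 10)    # pruned immediately at the root
```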

Can we use binary search tree to simulate heap operation?

I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?

Sure we can, but with a balanced BST.
The minimum is the leftmost element and the maximum is the rightmost element. Finding each of them is O(log n), and the results can be cached on each insert/delete, after the data structure has been modified [note there is room for optimization here, but this naive approach does not violate the complexity requirement either].
This way you get insert, delete: O(log n); findMin/findMax: O(1).
EDIT:
The only advantage I can think of in this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower [more operations per step, more cache misses are expected...] and consume more space than the regular array-based implementation of a heap.

Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.

Yes, we can, by simply inserting into the BST and finding its minimum. There are few benefits, however, since a lookup will take O(log n) time and other operations incur similar penalties due to the stricter ordering enforced throughout the tree.

Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
A heap can do findMin or findMax in O(1), but not both in the same data structure. With a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, in the insert operation you can check whether the current min value is larger than the newly inserted value and, if so, make the newly added node the min node. The same technique can be applied to the max value. Hence, this BST contains information that you can retrieve in O(1) (same as a binary heap).
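A minimal sketch of the insert-time bookkeeping, assuming a plain unbalanced BST for brevity (a real implementation would add the rebalancing of an AVL or red-black tree). The class names Node and MinMaxBST are made up for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class MinMaxBST:
    """Plain (unbalanced) BST that caches its smallest and largest nodes."""

    def __init__(self):
        self.root = None
        self.min_node = None
        self.max_node = None

    def insert(self, key):
        node = Node(key)
        if self.root is None:
            self.root = self.min_node = self.max_node = node
            return
        cur = self.root
        while True:                      # ordinary BST descent
            if key < cur.key:
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            else:                        # equal keys go to the right
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
        # Update the cached extremes with O(1) extra work.
        if key < self.min_node.key:
            self.min_node = node
        if key >= self.max_node.key:
            self.max_node = node

    def find_min(self):
        return self.min_node.key         # O(1)

    def find_max(self):
        return self.max_node.key         # O(1)


tree = MinMaxBST()
for k in [5, 3, 8, 1, 9, 4]:
    tree.insert(k)
lo, hi = tree.find_min(), tree.find_max()
```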
In this BST (specifically a balanced BST), when you pop the min or the max, the next min to be assigned is the successor of the min node, whereas the next max to be assigned is the predecessor of the max node. Thus it performs in O(1). However, as @JimMischel's comment below points out, we need to rebalance the tree, so it will still run in O(log n) (same as a binary heap).
In my opinion, a heap can generally be replaced by a balanced BST, because the BST performs better in almost everything the heap data structure can do. However, I am not sure whether the heap should be considered an obsolete data structure. (What do you think?)
PS: Have to cross reference to different questions: https://stackoverflow.com/a/27074221/764592
