I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can, but only with a balanced BST.
The minimum is the leftmost element and the maximum is the rightmost element. Finding each of them takes O(log n), and the result can be cached after every insert/delete that modifies the data structure [note there is room for optimization here, but even this naive approach does not violate the complexity requirement!]
This way you get insert, delete: O(log n); findMin/findMax: O(1).
EDIT:
The only advantage I can think of in this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower [more operations per step, more cache misses expected...] and will consume more space than the regular array-based implementation of a heap.
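A minimal sketch of the idea, assuming java.util.TreeMap (a red-black tree) as the balanced BST. The class name BstPriorityQueue and its method names are made up for illustration. Note that this version does not cache the min/max, so findMin/findMax walk the tree in O(log n); caching them after every update, as described above, would bring those two down to O(1).

```java
import java.util.TreeMap;

// Hypothetical priority queue backed by a balanced BST.
// The map stores key -> number of copies, so duplicate keys are allowed.
class BstPriorityQueue {
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();
    private int size = 0;

    // O(log n): insert one copy of key.
    void insert(int key) {
        counts.merge(key, 1, Integer::sum);
        size++;
    }

    // Leftmost key; throws NoSuchElementException if the queue is empty.
    int findMin() {
        return counts.firstKey();
    }

    // Rightmost key; throws NoSuchElementException if the queue is empty.
    int findMax() {
        return counts.lastKey();
    }

    // O(log n): remove and return one copy of the smallest key.
    int deleteMin() {
        int key = counts.firstKey();
        removeOne(key);
        return key;
    }

    // O(log n): remove and return one copy of the largest key.
    int deleteMax() {
        int key = counts.lastKey();
        removeOne(key);
        return key;
    }

    int size() {
        return size;
    }

    private void removeOne(int key) {
        int copies = counts.get(key);
        if (copies == 1) {
            counts.remove(key);
        } else {
            counts.put(key, copies - 1);
        }
        size--;
    }
}
```

Unlike a single binary heap, the same structure answers both deleteMin and deleteMax.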
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting into the BST and finding the minimum in it. There are few benefits, however, since a lookup will take O(log n) time and other functions receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
Heap can do findMin or findMax in O(1) but not both in the same data structure. With a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, in the insert operation you can check whether the min value is larger than the newly inserted value, and if so make the newly added node the min. The same technique can be applied to the max value. Hence, this BST contains this information, which you can retrieve in O(1) (same as binary heap).
In this BST (specifically a balanced BST), when you pop the min or pop the max, the next min is the successor of the min node, whereas the next max is the predecessor of the max node, so finding it is cheap. However, thanks to @JimMischel's comment below, we still need to re-balance the tree, so the pop will still run in O(log n) (same as binary heap).
In my opinion, a heap can generally be replaced by a balanced BST, because the BST performs better in almost everything the heap data structure can do. However, I am not sure if the heap should be considered an obsolete data structure. (What do you think?)
PS: Have to cross reference to different questions: https://stackoverflow.com/a/27074221/764592
Related
Balanced BST and max heap both perform insert and delete in O(logn). However, finding max value in a max heap is O(1) but this is O(logn) in balanced BST.
If we remove the max value in a max heap it takes O(logn) because it is a delete operation.
In balanced BST, deleting the max element = finding max value + delete; it equals logn + logn reduces to O(logn). So even deleting the max value in balanced BST is O(logn).
I have read one such application of max heap is a priority queue and its primary purpose is to remove the max value for every dequeue operation. If deleting max element is O(logn) for both max heap and balanced BST, I have the following questions
What is the purpose of a max heap in the priority queue just because it is easy to implement rather than using full searchable balanced BST?
Since there is no balancing factor calculation, the max heap can be called an unbalanced binary tree?
Every balanced BST can be used as a priority queue and which is also searchable in O(logn) however max heap search is O(n) correct?
All the time complexities are calculated for worst-case. Any help is greatly appreciated.
What is the purpose of a max heap in the priority queue just because it is easy to implement rather than using full searchable balanced BST?
Some advantages of a heap are:
Given an unsorted input array, a heap can still be built in O(n) time, while a BST needs O(nlogn) time.
If the initial input is an array, that same array can serve as heap, meaning no extra memory is needed for it. Although one could think of ways to create a BST using the data in-place in the array, it would be quite odd (for primitive types) and give more processing overhead. A BST is usually created from scratch, copying the data into the nodes as they are created.
Interesting fact: a sorted array is also a heap (ascending order gives a min-heap, descending order gives a max-heap), so if it is known that the input is sorted in the right order, nothing needs to be done to build the heap.
A heap can be stored as an array without the need of storing cross references, while a BST usually consists of nodes with left & right references. This has at least two consequences:
The memory used for a BST is about 3 times greater than for a heap.
Although several operations have the same time complexity for both heap and BST, the overhead for adapting a BST is much greater, so that the actual time spent on these operations is a (constant) factor greater in the BST case.
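To illustrate the in-place O(n) build mentioned above, here is a sketch of the usual bottom-up heapify on a plain int array. The class and method names (HeapBuilder, buildMaxHeap, siftDown, isMaxHeap) are my own choices for the example; the children of index i are assumed to live at 2i+1 and 2i+2.

```java
// Hypothetical helper: turns an int[] into a max-heap without extra memory.
class HeapBuilder {
    // Restore the max-heap property for the subtree rooted at index i,
    // considering only the first n elements of a.
    static void siftDown(int[] a, int i, int n) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < n && a[left] > a[largest]) largest = left;
            if (right < n && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    // Bottom-up construction: sift down every internal node, last one first.
    // Total work is O(n), not O(n log n).
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
    }

    // Checks that every element is no larger than its parent.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] < a[i]) return false;
        }
        return true;
    }
}
```

The array itself is the heap afterwards: no nodes and no left/right references are allocated, which is where the memory saving over a BST comes from.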
Since there is no balancing factor calculation, the max heap can be called an unbalanced binary tree?
A heap is in fact a complete binary tree, so it is always as balanced as it can be: the leaves will always be positioned in the last or second-to-last level. A self-balancing BST (like AVL, red-black,...) cannot beat that high level of balancing; in such trees you will often have leaves occurring at three levels or even more.
Every balanced BST can be used as a priority queue and which is also searchable in O(logn) however max heap search is O(n) correct?
Yes, this is true. So if the application needs the search feature, then a BST is superior.
What is the purpose of a max heap in the priority queue just because it is easy to implement rather than using full searchable balanced BST?
Nope. A max heap fits better, since it is designed to give you the next element (respecting priority) as fast as possible: looking at it takes O(1) time, and removing it takes O(log n). That's what you want from the simplest possible priority queue.
Since there is no balancing factor calculation, the max heap can be called an unbalanced binary tree?
Nope. There is a balance as well. Long story short, the heap is kept in shape by sift-up or sift-down operations (swapping elements which are out of order).
Every balanced BST can be used as a priority queue and which is also searchable in O(logn) however max heap search is O(n) correct?
Yeah! A linked list or an array could be used as well. It is just going to be more expensive in terms of O-notation and much slower in practice.
I'm working on this problem but I'm pretty confused on how to solve it:
Design a data structure that supports the following operations in amortized O(log n) time, where n is the total number of elements:
Ins(k): Insert a new element with key k
Extract-Max: Find and remove the element with largest key
Extract-Min: Find and remove the element with smallest key
Union: Merge two different sets of elements
How do I calculate the amortized time? Isn't this already something like a hash table? Or is it a variant of it?
I would really appreciate if someone can help me with this.
Thank you!!
What you're proposing isn't something that most hash tables are equipped to deal with because hash tables don't usually support finding the min and max elements quickly while supporting deletions.
However, this is something that you could do with a pair of priority queues that support melding. For example, suppose that you back your data structure with two binomial heaps - a min-heap and a max-heap. Every time you insert an element into your data structure, you add it to both the min-heap and the max-heap. However, you slightly modify the two heaps so that each element in the heap stores a pointer to its corresponding element in the other heap; that way, given a node in the min-heap, you can find the corresponding node in the max-heap and vice-versa.
Now, to do an extract-min or extract-max, you just apply a find-min operation to the min-heap or a find-max operation to the max-heap to get the result. Then, delete that element from both heaps using the normal binomial heap delete operation. (You can use the pointer you set up during the insert step to quickly locate the sibling element in the other heap).
Finally, for a union operation, just apply the normal binomial heap merge operation to the corresponding min-heaps and max-heaps.
Since each of the described operations requires only O(1) operations on binomial heaps, and each binomial heap operation takes O(log n) time, each of them runs in O(log n) time in the worst case, with no amortization needed.
Generally speaking, the data structure you're describing is called a double-ended priority queue. There are a couple of specialized data structures you can use to meet those requirements, though the one described above is probably the easiest to build with off-the-shelf components.
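For a feel of the two-heap idea, here is a rough sketch that deliberately swaps in a simpler technique: two java.util.PriorityQueue binary heaps sharing one Entry object per element (the shared object plays the role of the cross pointer), with lazy deletion instead of a real delete in the other heap. MinMaxQueue, Entry and the method names are invented for the example. Extracts are amortized O(log n), but the union below just re-inserts elements, so it does NOT have the O(log n) meld that binomial heaps give you.

```java
import java.util.Comparator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical double-ended priority queue built from two binary heaps.
class MinMaxQueue {
    // One Entry is stored in both heaps; the flag marks it as already extracted.
    private static final class Entry {
        final int key;
        boolean removed = false;
        Entry(int key) { this.key = key; }
    }

    private final PriorityQueue<Entry> minHeap =
            new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.key));
    private final PriorityQueue<Entry> maxHeap =
            new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.key).reversed());
    private int size = 0; // number of live (not yet extracted) elements

    void insert(int key) {
        Entry e = new Entry(key);
        minHeap.add(e);
        maxHeap.add(e);
        size++;
    }

    int extractMin() { return extract(minHeap); }

    int extractMax() { return extract(maxHeap); }

    int size() { return size; }

    // Re-inserts every live key of other: O(m log(n + m)) for m elements.
    // This is an assumption-laden stand-in for a true heap meld.
    void union(MinMaxQueue other) {
        for (Entry e : other.minHeap) {
            if (!e.removed) insert(e.key);
        }
    }

    private int extract(PriorityQueue<Entry> heap) {
        if (size == 0) throw new NoSuchElementException("queue is empty");
        Entry e = heap.poll();
        // Skip stale entries that were already extracted from the other end.
        while (e.removed) {
            e = heap.poll();
        }
        e.removed = true;
        size--;
        return e.key;
    }
}
```

Each entry is polled at most once from each heap, which is why the skipping loop only costs amortized O(log n) per extract. The price is that stale entries linger in the opposite heap until they reach its top.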
One standard implementation of the Dijkstra algorithm uses a heap to store distances from the starting node S to all unexplored nodes. The argument for using a heap is that we can efficiently pop the minimum distance from it, in O(log n). However, to maintain the invariant of the algorithm, one also needs to update some of the distances in the heap. This involves:
popping non-min elements from the heaps
computing the updated distances
inserting them back into the heap
I understand that popping non-min elements from a heap can be done in O(log n) if one knows the location of that element in the heap. However, I fail to understand how one can know this location in the case of the Dijkstra algorithm. It sounds like a binary search tree would be more appropriate.
More generally, my understanding is that the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element. Is my understanding correct?
However, I fail to understand how one can know this location in the case of the Dijkstra algorithm.
You need an additional array that keeps track of where in the heap the elements live, or an extra data member inside the heap's elements. This has to be updated after each heap operation.
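Roughly, the bookkeeping could look like the sketch below: a min-heap of vertex ids with a pos array that is updated on every swap, so decreaseKey can jump straight to the right slot. All names here (IndexedMinHeap, pos, dist, decreaseKey) are assumptions for the example, and vertices are assumed to be numbered 0..n-1 and inserted at most once.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical indexed min-heap keyed by tentative distance.
class IndexedMinHeap {
    private final int[] heap;  // heap[i] = vertex stored in heap slot i
    private final int[] pos;   // pos[v]  = heap slot of vertex v, or -1 if absent
    private final long[] dist; // dist[v] = current key of vertex v
    private int size = 0;

    IndexedMinHeap(int n) {
        heap = new int[n];
        pos = new int[n];
        dist = new long[n];
        Arrays.fill(pos, -1);
    }

    boolean isEmpty() { return size == 0; }

    boolean contains(int v) { return pos[v] != -1; }

    // Assumes v is not already in the heap.
    void insert(int v, long d) {
        dist[v] = d;
        heap[size] = v;
        pos[v] = size;
        siftUp(size);
        size++;
    }

    // O(log n): pos[v] locates the vertex directly, no search needed.
    // Assumes v is in the heap and d is not larger than its current key.
    void decreaseKey(int v, long d) {
        dist[v] = d;
        siftUp(pos[v]);
    }

    // O(log n): remove and return the vertex with the smallest key.
    int extractMin() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        int min = heap[0];
        size--;
        swap(0, size);
        pos[min] = -1;
        siftDown(0);
        return min;
    }

    private void siftUp(int i) {
        while (i > 0 && dist[heap[i]] < dist[heap[(i - 1) / 2]]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < size && dist[heap[left]] < dist[heap[smallest]]) smallest = left;
            if (right < size && dist[heap[right]] < dist[heap[smallest]]) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Every swap also updates pos, which is the whole trick.
    private void swap(int i, int j) {
        int t = heap[i];
        heap[i] = heap[j];
        heap[j] = t;
        pos[heap[i]] = i;
        pos[heap[j]] = j;
    }
}
```

With this in place, Dijkstra's relaxation step can call decreaseKey directly instead of popping a non-min element and re-inserting it.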
the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element
Even a BST can be amended to keep a pointer to the min element in addition to the root pointer, giving O(1) access to the min (effectively amortizing the O(lg n) work over the other operations).
The only advantage of heaps in terms of worst-case complexity is the "heapify" algorithm, which turns an array into a heap by reshuffling its elements in-place, in linear time. For Dijkstra's, this doesn't matter, since it's going to do n heap operations of O(lg n) cost apiece anyway.
The real reason for heaps, then, is constants. A properly implemented heap is just a contiguous array of elements, while a BST is a pointer structure. Even when a BST is implemented inside an array (which can be done if the number of elements is known from the start, as in Dijkstra's), the pointers take up more memory, and navigating them takes more time than the integer operations that are used to navigate a heap.
Say you have a lot of (key, value) objects to keep track of, with many insertions as well as deletions.
You need to satisfy 3 requirements:
get the maximum key in constant time at any point
look up the value of any key in logarithmic time.
insertions and deletes take logarithmic time.
Is there a data structure that can do this?
My thoughts:
Priority queues can get the max in constant time, but I can't look up values.
Binary search trees (2-3 trees) can look up in logarithmic time, but max takes O(lgN) as well.
If I try to keep track of the max in a BST, it takes O(lgN) when I have to delete the max and find the second greatest.
Why do we need those fancy data structures? I think a simple binary search tree that tracks the max node can serve the OP's requirements well.
You can track the node with the max key:
whenever you insert a new node, you compare the key with the previous max key to decide if this is a new max node
whenever you delete the max node, it takes O(logN) to find the next max node
You certainly have O(logN) lookup time with the nature of BST
BST's update takes O(logN) time
You can just use two data structures in parallel:
Store the key/value pairs in a hash table or balanced BST to get lookups in O(log n) or better, and
Store all the keys in a max heap so that you can look up the max in O(1) time.
This makes insertion or deletion take O(log n) time, since that's the time complexity of inserting or deleting from the max heap.
Hope this helps!
Skip lists have an expected O(log n) lookup, and the bottom level is a sorted linked list, so the min is always O(1) at the head (and the max too, if you keep a tail pointer). http://en.wikipedia.org/wiki/Skip_list
I know a hash table has O(1) search time due to the fact that you use keys and you can instantaneously look up that value. As far as the max value, you may be able to constantly keep track of that every time that you insert or delete a value.
How about a list sorted in descending order?
Max is always first so O(1).
Look-up is O(log n) through binary search.
Insertion/Deletion is O(n) because you'll have to shift n-i items when inserting/deleting from position i.
Since you are using key-value pairs, the best solution I can suggest is to use TreeMap in Java.
You can simply use the following 4 methods present in TreeMap:
get(key) and put(key, value) for retrieve and insert
lastKey() for finding the max key.
remove(key) for deletion.
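A quick sketch of those four calls on java.util.TreeMap. The class name TreeMapDemo, the method run() and the sample keys are made up. One caveat: TreeMap.lastKey() walks down the right edge of the tree, so it is O(log n) rather than strictly constant time unless you cache the max yourself.

```java
import java.util.TreeMap;

// Hypothetical demo of the TreeMap calls listed above.
class TreeMapDemo {
    // Returns the largest key that is left after removing the original max.
    static int run() {
        TreeMap<Integer, String> store = new TreeMap<>();
        store.put(42, "answer");        // insert: O(log n)
        store.put(7, "seven");
        store.put(99, "ninety-nine");

        String value = store.get(7);    // lookup by key: O(log n)
        if (!"seven".equals(value)) {
            throw new IllegalStateException("unexpected value: " + value);
        }

        int max = store.lastKey();      // current max key
        store.remove(max);              // delete: O(log n)
        return store.lastKey();         // next max key
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```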
Or
use a structure like the one on this page
Final conclusion:
If you are willing to trade off space for running time, you need to have 2 data structures.
Use a HashMap, which has expected O(1) insert, retrieval and remove, or a TreeMap, which has O(log n) for each of these.
Then, as per the second link I provided, use a two-stack data structure to find the max or min in O(1).
I think this is the best possible solution I can give.
Take a look at the RMQ (Range Minimum/Maximum Query) data structure or the segment tree data structure. They both have O(1) query time, BUT you will have to modify them somehow to store values as well.
Here is nice article http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
As the first comment says, use a max heap. Use a hashmap to store pointers into the heap. These are used for the constant time lookup and log time delete.
Heaps are very simple to implement. They don't require balancing like BST's. Hashmaps are usually built into your language of choice.
Deletion in a tree data structure is already an O(logN) operation, so looking for the second greatest key is not going to change the complexity of the operation.
Though, you can invalidate elements instead of deleting them, and if you keep back pointers inside your data structure, moving from the greatest to the second greatest could be an O(N) operation.
I am just trying to learn binary heap and have a doubt regarding doing delete operation in binary heap.
I have read that we can delete an element from binary heap and we need to reheapify it.
But at the following link, it says unavailable:
http://en.wikibooks.org/wiki/Data_Structures/Tradeoffs
                  Binary Search   AVL Tree    Binary Heap (min)   Binomial Queue (min)
Find              O(log n)        O(log n)    unavailable         unavailable
Delete element    O(log n)        O(log n)    unavailable         unavailable
I am little confused about it.
Thanks in advance for all of the clarifications.
Binary heaps and other priority queue structures don't usually support a general "delete element" operation; you need an additional data structure that keeps track of each element's index in the heap, e.g. a hash table. If you have that, you can implement a general delete operation as
find-element, O(1) expected time with a hash table
decrease key to less than the minimum, O(lg n) time
delete-min and update the hash table, O(lg n) combined expected time.
A regular delete is possible, just like a DeleteMin/Max. The "problem" is that you have to check for both up- and downshifts (i.e., when the "last" node takes up the vacant spot, it can be either too small or too large for that position). Since it can't be both, for obvious reasons, it's easy to check for correctness.
The only problem that remains is the Find. The answer above states that you can Find Element in O(lg n), but I wouldn't know how. In my implementations, I generally build a Heap of pointers-to-elements rather than elements (cheaper copying during up/downshifts). I add a "position" variable to the Element type, which keeps track of the index of the Element's pointer in the Heap. This way, given an element E, I can find its position in the Heap in constant time.
Obviously, this isn't cut out for every implementation.
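Here is roughly what that looks like as a sketch: a min-heap of element references where each Element carries its own position, and delete moves the last element into the vacant slot and then tries both an upshift and a downshift. The names PositionHeap, Element, position and delete are assumptions for the example, not anybody's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical min-heap of references with a position field on each element.
class PositionHeap {
    static final class Element {
        final int key;
        int position = -1; // index in the heap, -1 while not stored
        Element(int key) { this.key = key; }
    }

    private final List<Element> heap = new ArrayList<>();

    int size() { return heap.size(); }

    void insert(Element e) {
        heap.add(e);
        e.position = heap.size() - 1;
        siftUp(e.position);
    }

    Element extractMin() {
        if (heap.isEmpty()) throw new NoSuchElementException("heap is empty");
        Element min = heap.get(0);
        delete(min);
        return min;
    }

    // O(log n): e.position finds the slot in constant time.
    void delete(Element e) {
        if (e.position < 0) throw new NoSuchElementException("element not in heap");
        int i = e.position;
        int last = heap.size() - 1;
        swap(i, last);       // the "last" node takes the vacant spot
        heap.remove(last);   // remove by index
        e.position = -1;
        if (i < heap.size()) {
            // The moved node may be too small or too large for slot i,
            // but never both, so at most one of these actually moves it.
            siftUp(i);
            siftDown(i);
        }
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get(i).key < heap.get((i - 1) / 2).key) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        int n = heap.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && heap.get(left).key < heap.get(smallest).key) smallest = left;
            if (right < n && heap.get(right).key < heap.get(smallest).key) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Swapping references is cheap, and positions are kept in sync here.
    private void swap(int i, int j) {
        Element a = heap.get(i), b = heap.get(j);
        heap.set(i, b);
        heap.set(j, a);
        b.position = i;
        a.position = j;
    }
}
```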
I am confused as to why the delete operation of a binary heap is listed as unavailable in the link in your question. Deletion in a binary heap is quite possible, and it is a composition of 2 other binary heap operations.
https://en.wikipedia.org/wiki/Binary_heap
I am assuming you know all the other operations of a binary heap.
Deleting a key from a binary heap requires 2 lines of code/operations. Suppose you want to delete the item at index x. Decrease its value to the lowest integer possible, that is, Integer.MIN_VALUE. Since it is the lowest of all integer values, it will move to the root position when decreaseItem(int index, int newVal) finishes executing. Afterwards, extract the root by invoking the extractMin() method.
// Complexity: O(lg n)
public void deleteItem(int index) {
    // Assign the lowest possible value so that the item rises to the root
    decreaseItem(index, Integer.MIN_VALUE);
    // extractMin() then removes that item from the heap
    extractMin();
}
Full Code: BinaryHeap_Demo.java
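In case the linked file is not at hand, here is a self-contained sketch of a min-heap around that deleteItem method. Only decreaseItem, extractMin and deleteItem are names from the snippet above; SimpleMinHeap, insert, get and size are my own additions, and this is not the code from BinaryHeap_Demo.java.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical array-based min-heap of ints.
class SimpleMinHeap {
    private int[] items = new int[16];
    private int size = 0;

    int size() { return size; }

    // Value stored at a heap index (array order, not sorted order).
    int get(int index) { return items[index]; }

    void insert(int value) {
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        items[size] = value;
        siftUp(size);
        size++;
    }

    // Assumes newVal is not larger than the current value at index.
    void decreaseItem(int index, int newVal) {
        items[index] = newVal;
        siftUp(index);
    }

    int extractMin() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        int min = items[0];
        size--;
        items[0] = items[size];
        siftDown(0);
        return min;
    }

    // Complexity: O(lg n)
    void deleteItem(int index) {
        // Push the item to the root, then remove the root.
        decreaseItem(index, Integer.MIN_VALUE);
        extractMin();
    }

    private void siftUp(int i) {
        while (i > 0 && items[i] < items[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < size && items[left] < items[smallest]) smallest = left;
            if (right < size && items[right] < items[smallest]) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        int t = items[i];
        items[i] = items[j];
        items[j] = t;
    }
}
```

Note that the caller still has to know the index of the item to delete; finding it by value takes O(n) unless positions are tracked separately.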