I am just trying to learn binary heaps and have a doubt about the delete operation.
I have read that we can delete an element from a binary heap, after which we need to re-heapify it.
But at the following link, it says unavailable:
http://en.wikibooks.org/wiki/Data_Structures/Tradeoffs
                 Binary Search   AVL Tree    Binary Heap (min)   Binomial Queue (min)
Find             O(log n)        O(log n)    unavailable         unavailable
Delete element   O(log n)        O(log n)    unavailable         unavailable
I am a little confused about it.
Thanks in advance for all of the clarifications.
Binary heaps and other priority queue structures don't usually support a general "delete element" operation; you need an additional data structure that keeps track of each element's index in the heap, e.g. a hash table. If you have that, you can implement a general delete operation as
find-element, O(1) expected time with a hash table
decrease key to less than the minimum, O(lg n) time
delete-min and update the hash table, O(lg n) combined expected time.
A regular delete is possible, just like a DeleteMin/Max. The "problem" is that you have to check for both up- and down-shifts: when the "last" node takes over the vacant spot, it can end up either too small or too large for its new position. Since it can't be both at once, for obvious reasons, it's easy to check which direction needs fixing.
The only problem that remains is the Find. The answer above states that you can find an element in O(lg n), but I wouldn't know how. In my implementations, I generally build a heap of pointers-to-elements rather than elements (cheaper copying during up/down-shifts). I add a "position" variable to the element type, which keeps track of the index of the element's pointer in the heap. This way, given an element E, I can find its position in the heap in constant time.
Obviously, this isn't cut out for every implementation.
I am confused about why the delete operation of a binary heap is listed as unavailable in the link in your question. Deletion from a binary heap is quite possible; it's a composition of two other binary heap operations.
https://en.wikipedia.org/wiki/Binary_heap
I assume you know all the other operations of a binary heap.
Deleting a key from a binary heap requires two lines of code. Suppose you want to delete the item at index x. Decrease its value to the lowest integer possible, Integer.MIN_VALUE. Since it's the lowest of all integer values, it will move to the root position once decreaseItem(int index, int newVal) finishes. Afterwards, extract the root by invoking the extractMin() method.
// Complexity: O(lg n)
public void deleteItem(int index) {
    // Assign the lowest value possible so that the item rises to the root
    decreaseItem(index, Integer.MIN_VALUE);
    // extractMin() then removes that item from the heap
    extractMin();
}
Full Code: BinaryHeap_Demo.java
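The snippet above relies on a decreaseItem helper that isn't shown. A minimal, self-contained sketch of how it might look in an array-backed min-heap (the class name and the fixed-size array are my own simplifications, not from the linked demo):

```java
class MinHeapSketch {
    int[] heap = new int[16];
    int size = 0;

    void insert(int val) {
        heap[size] = val;
        // Percolating the new item up from the last slot reuses decreaseItem.
        decreaseItem(size++, val);
    }

    // Overwrite the value at index, then percolate up while the
    // parent is larger (i.e. while the min-heap property is violated).
    void decreaseItem(int index, int newVal) {
        heap[index] = newVal;
        while (index > 0 && heap[(index - 1) / 2] > heap[index]) {
            int parent = (index - 1) / 2;
            int tmp = heap[index];
            heap[index] = heap[parent];
            heap[parent] = tmp;
            index = parent;
        }
    }

    int extractMin() {
        int min = heap[0];
        heap[0] = heap[--size];
        // Percolate the moved item down to restore the heap property.
        int i = 0;
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && heap[l] < heap[smallest]) smallest = l;
            if (r < size && heap[r] < heap[smallest]) smallest = r;
            if (smallest == i) break;
            int tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
            i = smallest;
        }
        return min;
    }

    void deleteItem(int index) {
        decreaseItem(index, Integer.MIN_VALUE);
        extractMin();
    }
}
```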
It is known that the delete operation on a heap takes O(n) (note: deleting an arbitrary element, not the max or min value). I know that a heap is not well suited for deleting or updating, but I'm curious.
I think that if I want to delete a certain element, just percDown(element) and heapSize-- would do it, so shouldn't it take O(log n)?
Did I miss something?
I assume you're talking about the standard array-based complete binary tree implementation.
The short answer is that it doesn't need to. If you maintain a parallel map from objects in the heap to the indices where they're stored, then you can delete in O(log n) using sifting operations to restore the heap property to "fill the hole" created by the deletion.
Naive implementations require O(n) because they begin by searching the heap array. There's nothing about the heap property that helps make this more efficient than O(n).
If you're interested, here is an implementation that keeps a heap of indices into an object array (rather than keeping the objects in the heap nodes) and a reverse index - just an array of integers - that takes an index in the object array to its index in the heap array. This reverse map doesn't change the asymptotic run time of any of the other operations, but it provides O(log n) delete.
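To illustrate the parallel-map idea (all names below are my own, and the linked implementation differs in its details): a min-heap of distinct ints paired with a HashMap from each item to its current slot, kept in sync on every swap, so delete(item) needs no linear search.

```java
import java.util.*;

class IndexedMinHeap {
    private final List<Integer> heap = new ArrayList<>();
    private final Map<Integer, Integer> pos = new HashMap<>(); // item -> index

    public void insert(int item) {
        heap.add(item);
        pos.put(item, heap.size() - 1);
        siftUp(heap.size() - 1);
    }

    // O(log n): look up the item's slot via the map, move the last item
    // into the hole, then sift in whichever direction restores the heap.
    public void delete(int item) {
        int i = pos.remove(item);
        int last = heap.remove(heap.size() - 1);
        if (i < heap.size()) {        // the hole wasn't the last slot
            heap.set(i, last);
            pos.put(last, i);
            siftUp(i);
            siftDown(i);
        }
    }

    public int extractMin() {
        int min = heap.get(0);
        delete(min);
        return min;
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get((i - 1) / 2) > heap.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int s = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap.get(l) < heap.get(s)) s = l;
            if (r < heap.size() && heap.get(r) < heap.get(s)) s = r;
            if (s == i) return;
            swap(i, s);
            i = s;
        }
    }

    private void swap(int a, int b) {
        int tmp = heap.get(a);
        heap.set(a, heap.get(b));
        heap.set(b, tmp);
        pos.put(heap.get(a), a); // keep the reverse map in sync
        pos.put(heap.get(b), b);
    }
}
```

This sketch keys the map by item value, so it assumes distinct items; keying by object identity lifts that restriction.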
I have this problem: I'm keeping a data structure that contains two different heaps, a minimum heap and a maximum heap, which do not contain the same data.
My goal is to keep some kind of record of each node's location in either of the heaps and have it updated along with the heaps' operations.
Bottom line: I'm trying to figure out how I can have a delete(p) function that works in O(lg n) complexity, p being a pointer to a data object that can hold any data.
Thanks,
Ned.
If your heap is implemented as an array of items (references, say), then you can easily locate an arbitrary item in the heap in O(n) time. And once you know where the item is in the heap, you can delete it in O(log n) time. So find and remove is O(n + log n).
You can achieve O(log n) for removal if you pair the heap with a dictionary or hash map, as I describe in this answer.
Deleting an arbitrary item in O(log n) time is explained here.
The trick to the dictionary approach is that the dictionary contains a key (the item key) and a value that is the node's position in the heap. Whenever you move a node in the heap, you update that value in the dictionary. Insertion and removal are slightly slower in this case, because they require making up to log(n) dictionary updates. But those updates are O(1), so it's not hugely expensive.
Or, if your heap is implemented as a binary tree (with pointers, rather than the implicit structure in an array), then you can store a pointer to the node in the dictionary and not have to update it when you insert or remove from the heap.
That being said, the actual performance of add and delete min (or delete max for the max heap) in the paired data structure will be lower than with a standard heap that's implemented as an array, unless you're doing a lot of arbitrary deletes. If you're only deleting an arbitrary item every once in a while, especially if your heap is rather small, you're probably better off with the O(n) delete performance. It's simpler to implement and when n is small there's little real difference between O(n) and O(log n).
Say you have a lot of (key, value) objects to keep track of, with many insertions as well as deletions.
You need to satisfy 3 requirements:
get the maximum key in constant time at any point
look up the value of any key in logarithmic time.
insertions and deletes take logarithmic time.
Is there a data structure that can do this?
My thoughts:
priority queues can get the max in constant time, but I can't look up values.
binary search trees (2-3 trees) can look up in logarithmic time, but max takes O(lg N) as well.
if I try to keep track of the max in a BST, it takes O(lg N) when I have to delete the max and find the second greatest.
Why do we need those fancy data structures? I think a simple binary search tree that tracks the max node can serve the OP's requirements well.
You can track the node with the max key:
whenever you insert a new node, you compare its key with the previous max key to decide whether this is the new max node
whenever you delete the max node, it takes O(log N) to find the next max node
You certainly have O(log N) lookup time, by the nature of a BST
A BST's update takes O(log N) time
You can just use two data structures in parallel-
Store the key/value pairs in a hash table or balanced BST to get O(log n) lookups, and
Store all the values in a max heap so that you can look up the max in O(1) time.
This makes insertion or deletion take O(log n) time, since that's the time complexity of inserting or deleting from the max heap.
Hope this helps!
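A sketch of the two-structure idea in Java, heaping the keys since the question asks for the maximum key (the class name is made up). One caveat: java.util.PriorityQueue.remove(Object) is O(n), so a true O(log n) delete would need an indexed heap like the ones discussed elsewhere on this page; this just shows the shape of the design.

```java
import java.util.*;

class MaxKeyMap {
    private final Map<Integer, String> map = new HashMap<>();    // O(1) expected lookup
    private final PriorityQueue<Integer> maxKeys =
            new PriorityQueue<>(Comparator.reverseOrder());      // O(1) peek of the max

    public void put(int key, String value) {
        // Only add the key to the heap the first time we see it.
        if (map.put(key, value) == null) maxKeys.add(key);       // O(log n)
    }

    public String get(int key) { return map.get(key); }          // O(1) expected

    public void delete(int key) {
        // O(n) with java.util.PriorityQueue; O(log n) with an indexed heap.
        if (map.remove(key) != null) maxKeys.remove(key);
    }

    public Integer maxKey() { return maxKeys.peek(); }           // O(1)
}
```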
Skip lists have expected O(log n) lookup, and since they're built on a linked list, min and max are always O(1). http://en.wikipedia.org/wiki/Skip_list
I know a hash table has O(1) search time due to the fact that you use keys and you can instantaneously look up that value. As far as the max value, you may be able to constantly keep track of that every time that you insert or delete a value.
How about a list sorted in descending order?
Max is always first so O(1).
Look-up is O(log n) through binary search.
Insertion/Deletion is O(n) because you'll have to shift n-i items when inserting/deleting from position i.
Since you are using key-value pairs, the best solution I can suggest is to use a TreeMap in Java.
You can simply use the following four TreeMap methods:
get(key) and put(key, value) for retrieval and insertion
lastKey() for finding the max key
remove(key) for deletion
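For example (a TreeMap is a red-black tree underneath, so get/put/remove and lastKey are each O(log n) rather than O(1)):

```java
import java.util.TreeMap;

class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(5, "five");
        map.put(42, "forty-two");
        map.put(17, "seventeen");

        System.out.println(map.get(17));    // seventeen: O(log n) lookup
        System.out.println(map.lastKey());  // 42: the maximum key; firstKey() gives the minimum
        map.remove(42);                     // O(log n) deletion
        System.out.println(map.lastKey());  // 17
    }
}
```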
Or use a structure like the one on this page.
Final conclusion:
If you are willing to trade off space complexity and are keen on running time, you need two data structures.
Use a HashMap, which has O(1) expected time for insert, retrieval, and removal.
Then, as in the second link I provided, use a two-stack structure to find the max or min in O(1).
I think this is the best possible solution I can give.
Take a look at the RMQ (range minimum/maximum query) structures or the segment tree. A sparse-table RMQ answers queries in O(1) (a segment tree answers in O(log n)), but you would have to modify them somehow to store values as well.
Here is a nice article: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
As the first comment says, use a max heap. Use a hashmap to store pointers into the heap. These are used for the constant time lookup and log time delete.
Heaps are very simple to implement. They don't require balancing like BSTs do. Hash maps are usually built into your language of choice.
Deletion in a tree data structure is already an O(log N) operation, so looking for the second greatest key is not going to change the complexity of the operation.
Alternatively, you can invalidate elements instead of deleting them, and if you keep back pointers inside your data structure, moving from the greatest to the second greatest could be an O(N) operation.
I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can, but with a balanced BST.
The minimum is the leftmost element and the maximum is the rightmost. Finding those elements is O(log n) each, and they can be cached on each insert/delete, after the data structure has been modified (note there is room for optimization here, but this naive approach also doesn't violate the complexity requirement!).
This way you get insert and delete in O(log n), and findMin/findMax in O(1).
EDIT:
The only advantage I can think of in this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower (more operations per step, more cache misses expected...) and consume more space than the regular array-based implementation of a heap.
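For reference, Java's TreeSet is exactly this balanced BST (a red-black tree), so the simulation is a thin wrapper; note one difference from a heap, though: a TreeSet rejects duplicate keys.

```java
import java.util.TreeSet;

class BstAsHeap {
    private final TreeSet<Integer> tree = new TreeSet<>(); // red-black tree

    public void insert(int x) { tree.add(x); }              // O(log n)
    public int findMin()      { return tree.first(); }      // O(log n); cacheable to O(1)
    public int findMax()      { return tree.last(); }       // O(log n); cacheable to O(1)
    public int deleteMin()    { return tree.pollFirst(); }  // O(log n)
}
```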
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting into and finding the minimum in the BST. There are few benefits, however, since a lookup takes O(log n) time and the other operations receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
A heap can do findMin or findMax in O(1), but not both in the same data structure. With a slight modification, a BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node on every operation that can potentially modify the data structure. For example, on insert you can check whether the newly inserted value is smaller than the current min, and if so record the new node as the min; the same technique applies to the max. Hence, this BST carries that information, which you can retrieve in O(1), the same as a binary heap.
In this (specifically balanced) BST, when you pop the min or the max, the next min is the successor of the min node, and the next max is the predecessor of the max node, so finding it takes O(1). However, as @JimMischel points out in a comment below, we still need to re-balance the tree, so the operation still runs in O(log n), the same as a binary heap.
In my opinion, a heap can generally be replaced by a balanced BST, because a balanced BST can do almost everything a heap can. However, I am not sure whether the heap should be considered an obsolete data structure. (What do you think?)
PS: Cross-referencing a related answer: https://stackoverflow.com/a/27074221/764592
This question is from an exam I had, and I couldn't solve it and wanted to see what the answer is (this is not homework, as it will not help me in anything but knowledge).
We need to create a data structure for containing elements whose keys are real numbers.
The data structure should have these functions:
Build(S, array): Builds the data structure S with n elements in O(n)
Insert(S, k) and Delete(S, x) in O(lgn) (k is an element, x is a pointer to it in the data structure)
Delete-Minimal-Positive(S): Remove the element with the minimal positive key
Mode(S): returns the key that is most frequent in S in O(1)
Now, building in O(n) usually means a heap should be used, but a heap does not allow finding frequencies. I couldn't find any way to do this. The best I could come up with is building a red-black tree (O(n lg n)) that will be used to build a frequency heap.
I'm dying to know the answer...
Thanks!
Using just the comparison model, there is no solution to this problem.
The Element Distinctness Problem has provable Omega(nlogn) lower bounds. This (element distinctness) problem is basically the problem of determining if all the elements of an array are distinct.
If there was a solution to your problem, then we could answer the element distinctness problem in O(n) time (find the most frequent element in O(n) time, and see if there are more than one instances of that element, again in O(n) time).
So, I suggest you ask your professor for the computational model.
Well, you can use a hash table to calculate the number of occurrences of distinct real numbers in O(1) amortized time, and then use a standard heap where the items are pairs (real number, number of occurrences) and the heap is sorted according to the number of occurrences field.
When you insert a key or delete a key, you increment or decrement the number of occurrences field by one, or in the extreme cases add or remove a heap element. In both cases you need to percolate up / down because the ordering field has changed.
Assuming the hash table is O(1) operation, you have a standard heap + O(1) hash table and you get all the operations above within the time limits. In particular, you get the "mode" by reading the root element of the heap.
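A sketch of just the insert-side bookkeeping (the class and method names are my own); deletions would need the count-ordered heap described above, since the mode can shift to a different element when a count decreases:

```java
import java.util.HashMap;
import java.util.Map;

class ModeCounter {
    private final Map<Double, Integer> count = new HashMap<>();
    private double mode;
    private int modeCount = 0;

    // O(1) expected per insert: bump the count, and the mode can only
    // change to the inserted key itself, so one comparison suffices.
    public void insert(double key) {
        int c = count.merge(key, 1, Integer::sum);
        if (c > modeCount) {
            modeCount = c;
            mode = key;
        }
    }

    public double mode() { return mode; } // O(1)
}
```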
I think the following solution will be acceptable. It is based on two data structures:
Red-black tree
Binary heap
The binary heap holds tuples of (element value, frequency of element); the heap is built on the frequencies, which gives us the ability to find the mode in O(1).
The red-black tree holds tuples of (element value, pointer to the same element in the heap).
When you need to insert a new element, first try to find it (this takes O(log n)). If the search succeeds, follow the pointer from the element found in the RB-tree, increase its frequency, and re-heapify (also O(log n)). If the search doesn't find the element, insert it into the RB-tree (O(log n)) and into the heap with the value (element, 1) (also O(1)), and set the pointer in the RB-tree to the new heap element.
Initial building will take O(n), because building both structures from a set of n elements takes O(n).
Sorry if I missed something.
For frequencies:
Each entry is bi-directionally linked to its own frequency/counter (use a hash table).
The frequencies themselves are kept in a linked list.
When inserting or deleting an element, its frequency has to move up or down this list, but only ever by a difference of 1.
Each frequency node is therefore linked to the +1 and -1 frequency nodes.
(There are exceptions, but they can be handled.)