Tracking a node inside a heap - algorithm

I have this problem: I'm keeping a data structure that contains two different heaps, a minimum heap and a maximum heap, that do not contain the same data.
My goal is to keep some kind of record of each node's location in either of the heaps and have it updated as the heaps change.
Bottom line: I'm trying to figure out how I can have a delete(p) function that works in O(log n) complexity, p being a pointer to a data object that can hold any data.
Thanks,
Ned.

If your heap is implemented as an array of items (references, say), then you can easily locate an arbitrary item in the heap in O(n) time. And once you know where the item is in the heap, you can delete it in O(log n) time. So find-and-remove is O(n + log n), which is O(n).
You can achieve O(log n) for removal if you pair the heap with a dictionary or hash map, as I describe in this answer.
Deleting an arbitrary item in O(log n) time is explained here.
The trick to the dictionary approach is that the dictionary contains a key (the item key) and a value that is the node's position in the heap. Whenever you move a node in the heap, you update that value in the dictionary. Insertion and removal are slightly slower in this case, because they require up to O(log n) dictionary updates. But those updates are O(1) each, so it's not hugely expensive.
Or, if your heap is implemented as a binary tree (with pointers, rather than the implicit structure in an array), then you can store a pointer to the node in the dictionary and not have to update it when you insert or remove from the heap.
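As a concrete illustration, here is a minimal Java sketch of the dictionary approach (the class and method names are my own, not from the linked answer, and items are assumed distinct):

```java
import java.util.*;

// Sketch: a min-heap paired with a HashMap from item -> array index,
// so delete(item) runs in O(log n) with no search. Assumes distinct items.
class IndexedMinHeap {
    private final List<Integer> a = new ArrayList<>();
    private final Map<Integer, Integer> pos = new HashMap<>();

    public void insert(int item) {
        a.add(item);
        pos.put(item, a.size() - 1);
        siftUp(a.size() - 1);
    }

    // O(log n): look up the index, move the last element into the hole, re-heapify.
    public void delete(int item) {
        int i = pos.remove(item);
        int last = a.remove(a.size() - 1);
        if (i < a.size()) {
            a.set(i, last);
            pos.put(last, i);
            siftDown(i);
            siftUp(i);   // the replacement may need to move either way
        }
    }

    public int min() { return a.get(0); }
    public int size() { return a.size(); }

    private void siftUp(int i) {
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int s = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < a.size() && a.get(l) < a.get(s)) s = l;
            if (r < a.size() && a.get(r) < a.get(s)) s = r;
            if (s == i) return;
            swap(i, s);
            i = s;
        }
    }

    // Every swap updates the map, costing O(1) per heap move.
    private void swap(int i, int j) {
        Collections.swap(a, i, j);
        pos.put(a.get(i), i);
        pos.put(a.get(j), j);
    }
}
```

Note that every sift step pays one extra O(1) map update, which is the slight constant-factor slowdown mentioned above.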
That being said, the actual performance of add and delete-min (or delete-max for the max heap) in the paired data structure will be lower than with a standard array-based heap, unless you're doing a lot of arbitrary deletes. If you're only deleting an arbitrary item every once in a while, especially if your heap is rather small, you're probably better off with the O(n) delete. It's simpler to implement, and when n is small there's little real difference between O(n) and O(log n).

Related

Why does the 'delete' operation (not delete-max or delete-min) of a heap take O(n)?

It is known that the delete operation on a heap takes O(n) (NOTICE: not deleting the max or min value). I know that a heap is not suitable for deleting or updating, but I'm kinda curious.
I think that if I want to delete a certain element, just percDown(element) and heapSize-- would do it... so I think it takes O(log n)?
Did I miss something?
I assume you're talking about the standard array-based complete binary tree implementation.
The short answer is that it doesn't need to. If you maintain a parallel map from objects in the heap to the indices where they're stored, then you can delete in O(log n), using sifting operations to fill the hole created by the deletion and restore the heap property.
Naive implementations require O(n) because they begin by searching the heap array for the element; nothing about the heap property makes that search faster than O(n).
If you're interested, here is an implementation that keeps a heap of indices into an object array (rather than keeping the objects in the heap nodes) and a reverse index - just an array of integers - that takes an index in the object array to its index in the heap array. This reverse map doesn't change the asymptotic run time of any of the other operations, but it provides O(log n) delete.
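For illustration, here is a compact reconstruction of that idea (my own sketch, not the linked implementation): the heap array holds indices into a key array, and the reverse map `where` takes each object index to its slot in the heap array:

```java
import java.util.*;

// Sketch: the heap stores object ids, not objects. keys[id] is the
// priority; where[id] is id's current slot in the heap array (or -1),
// which makes delete(id) O(log n) with no search.
class IndexHeap {
    private final double[] keys;   // priorities, indexed by object id
    private final int[] heap;      // heap of object ids
    private final int[] where;     // reverse map: where[id] = heap slot
    private int n = 0;

    IndexHeap(double[] keys) {
        this.keys = keys;
        this.heap = new int[keys.length];
        this.where = new int[keys.length];
        Arrays.fill(where, -1);
    }

    void insert(int id) { heap[n] = id; where[id] = n; siftUp(n++); }

    int extractMin() { int id = heap[0]; removeAt(0); return id; }

    void delete(int id) { removeAt(where[id]); }   // O(log n), no search

    private void removeAt(int i) {
        where[heap[i]] = -1;
        heap[i] = heap[--n];                       // last element fills the hole
        if (i < n) { where[heap[i]] = i; siftDown(i); siftUp(i); }
    }

    private void siftUp(int i) {
        while (i > 0 && keys[heap[i]] < keys[heap[(i - 1) / 2]]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        for (;;) {
            int s = i, l = 2 * i + 1, r = l + 1;
            if (l < n && keys[heap[l]] < keys[heap[s]]) s = l;
            if (r < n && keys[heap[r]] < keys[heap[s]]) s = r;
            if (s == i) return;
            swap(i, s);
            i = s;
        }
    }

    private void swap(int i, int j) {
        int t = heap[i]; heap[i] = heap[j]; heap[j] = t;
        where[heap[i]] = i;
        where[heap[j]] = j;    // keep the reverse map in sync on every move
    }
}
```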

Data structure design with O(log n) amortized time?

I'm working on this problem but I'm pretty confused on how to solve it:
Design a data structure that supports the following operations in amortized O(log n) time, where n is the total number of elements:
Ins(k): Insert a new element with key k
Extract-Max: Find and remove the element with largest key
Extract-Min: Find and remove the element with smallest key
Union: Merge two different sets of elements
How do I calculate the amortized time? Isn't this already something like a hash table? Or is it a variant of it?
I would really appreciate if someone can help me with this.
Thank you!!
What you're proposing isn't something that most hash tables are equipped to deal with because hash tables don't usually support finding the min and max elements quickly while supporting deletions.
However, this is something that you could do with a pair of priority queues that support melding. For example, suppose that you back your data structure with two binomial heaps - a min-heap and a max-heap. Every time you insert an element into your data structure, you add it to both the min-heap and the max-heap. However, you slightly modify the two heaps so that each element in the heap stores a pointer to its corresponding element in the other heap; that way, given a node in the min-heap, you can find the corresponding node in the max-heap and vice-versa.
Now, to do an extract-min or extract-max, you just apply a find-min operation to the min-heap or a find-max operation to the max-heap to get the result. Then, delete that element from both heaps using the normal binomial heap delete operation. (You can use the pointer you set up during the insert step to quickly locate the sibling element in the other heap).
Finally, for a union operation, just apply the normal binomial heap merge operation to the corresponding min-heaps and max-heaps.
Since each of the described operations requires only a constant number of binomial heap operations, each of them runs in O(log n) worst-case time, with no amortization needed.
Generally speaking, the data structure you're describing is called a double-ended priority queue. There are a couple of specialized data structures you can use to meet those requirements, though the one described above is probably the easiest to build with off-the-shelf components.
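As a rough sketch built from off-the-shelf components (note: this substitutes lazy deletion for the cross-pointer binomial-heap design described above, and omits union, which would still require melding heaps): two binary heaps share a count of pending deletions, giving amortized O(log n) extract-min and extract-max:

```java
import java.util.*;

// Sketch of a double-ended priority queue via lazy deletion: an element
// extracted through one heap is only marked deleted in the other, and
// skipped when it eventually surfaces. Amortized O(log n) per operation.
class DoubleEndedPQ {
    private final PriorityQueue<Integer> min = new PriorityQueue<>();
    private final PriorityQueue<Integer> max = new PriorityQueue<>(Comparator.reverseOrder());
    private final Map<Integer, Integer> deleted = new HashMap<>(); // value -> pending count

    public void insert(int k) { min.add(k); max.add(k); }

    public int extractMin() { int k = purgeTop(min); markDeleted(k); return k; }
    public int extractMax() { int k = purgeTop(max); markDeleted(k); return k; }

    // Pop until we find an element not already removed via the other heap.
    private int purgeTop(PriorityQueue<Integer> pq) {
        while (true) {
            int k = pq.poll();
            Integer c = deleted.get(k);
            if (c == null) return k;
            if (c == 1) deleted.remove(k); else deleted.put(k, c - 1);
        }
    }

    private void markDeleted(int k) { deleted.merge(k, 1, Integer::sum); }
}
```

Each element is popped at most twice (once per heap), which is where the amortized bound comes from.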

Advantages of heaps over binary trees in the Dijkstra algorithm

One standard implementation of the Dijkstra algorithm uses a heap to store distances from the starting node S to all unexplored nodes. The argument for using a heap is that we can efficiently pop the minimum distance from it, in O(log n). However, to maintain the invariant of the algorithm, one also needs to update some of the distances in the heap. This involves:
popping non-min elements from the heap
computing the updated distances
inserting them back into the heap
I understand that popping non-min elements from a heap can be done in O(log n) if one knows the location of that element in the heap. However, I fail to understand how one can know this location in the case of the Dijkstra algorithm. It sounds like a binary search tree would be more appropriate.
More generally, my understanding is that the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element. Is my understanding correct?
However, I fail to understand how one can know this location in the case of the Dijkstra algorithm.
You need an additional array that keeps track of where in the heap the elements live, or an extra data member inside the heap's elements. This has to be updated after each heap operation.
the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element
Even a BST can be amended to keep a pointer to the min element in addition to the root pointer, giving O(1) access to the min (effectively amortizing the O(lg n) work over the other operations).
The only advantage of heaps in terms of worst-case complexity is the "heapify" algorithm, which turns an array into a heap by reshuffling its elements in-place, in linear time. For Dijkstra's, this doesn't matter, since it's going to do n heap operations of O(lg n) cost apiece anyway.
The real reason for heaps, then, is constants. A properly implemented heap is just a contiguous array of elements, while a BST is a pointer structure. Even when a BST is implemented inside an array (which can be done if the number of elements is known from the start, as in Dijkstra's), the pointers take up more memory, and navigating them takes more time than the integer operations that are used to navigate a heap.
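For completeness, a common alternative avoids position tracking entirely (this is the "lazy" variant, a different technique than the position array described above): push a fresh entry on every distance improvement and skip stale ones when popping:

```java
import java.util.*;

// Sketch of "lazy" Dijkstra: instead of decrease-key with a position map,
// push a new (dist, node) pair on every improvement and skip entries
// whose recorded distance is stale when they reach the top.
class LazyDijkstra {
    // adj.get(u) holds edges {v, weight} out of u
    static int[] dijkstra(List<List<int[]>> adj, int src) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        pq.add(new int[]{0, src});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int d = top[0], u = top[1];
            if (d > dist[u]) continue;            // stale entry, skip it
            for (int[] e : adj.get(u)) {
                int v = e[0], nd = d + e[1];
                if (nd < dist[v]) {
                    dist[v] = nd;
                    pq.add(new int[]{nd, v});     // duplicate instead of decrease-key
                }
            }
        }
        return dist;
    }
}
```

The heap can grow to O(m) entries instead of O(n), but each is handled in O(log m) = O(log n) time, and in practice the simpler contiguous heap often wins on constants, as argued above.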

Can we use binary search tree to simulate heap operation?

I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can, but with a balanced BST.
The minimum is the leftmost element and the maximum is the rightmost element. Finding each of them is O(log n), and they can be cached after each insert/delete that modifies the data structure. (Note there is room for optimization here, but even this naive approach doesn't violate the complexity requirement!)
This way you get insert, delete in O(log n) and findMin/findMax in O(1).
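A quick sketch of this with Java's TreeMap (a red-black tree, hence a balanced BST; duplicates are handled with counts — the min/max caching mentioned above is omitted for brevity, so findMin/findMax are O(log n) here rather than O(1)):

```java
import java.util.*;

// Sketch: simulating heap operations with a balanced BST (TreeMap).
// Each key maps to its multiplicity so duplicate inserts are allowed.
class TreeHeap {
    private final TreeMap<Integer, Integer> t = new TreeMap<>();

    public void insert(int k) { t.merge(k, 1, Integer::sum); }  // O(log n)

    public int findMin() { return t.firstKey(); }               // O(log n); O(1) if cached
    public int findMax() { return t.lastKey(); }

    public int deleteMin() {                                    // O(log n)
        int k = t.firstKey();
        if (t.merge(k, -1, Integer::sum) == 0) t.remove(k);     // drop key when count hits 0
        return k;
    }

    public int deleteMax() {
        int k = t.lastKey();
        if (t.merge(k, -1, Integer::sum) == 0) t.remove(k);
        return k;
    }
}
```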
EDIT:
The only advantage I can think of in this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower in practice [more operations per step, more cache misses are expected...] and consume more space than the regular array-based implementation of a heap.
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting and finding the minimum into the BST. There are few benefits, however, since a lookup will take O(log n) time and other functions receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
A heap can do findMin or findMax in O(1), but not both in the same data structure. With a slight modification, a BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node on every operation that can potentially modify the data structure. For example, on insert you can check whether the current min value is larger than the newly inserted value; if so, point the min reference at the newly added node. The same technique can be applied to the max value. Hence this BST carries that information, which you can retrieve in O(1), same as a binary heap.
In this BST (specifically a balanced BST), when you pop the min or the max, the next min is the successor of the min node and the next max is the predecessor of the max node, so updating the cached pointer is O(1). Thanks to @JimMischel's comment below, however: we still need to re-balance the tree, so the pop will still run in O(log n), same as a binary heap.
In my opinion, a heap can generally be replaced by a balanced BST, because the BST performs well on almost everything a heap can do. However, I am not sure whether the heap should be considered an obsolete data structure. (What do you think?)
PS: Have to cross reference to different questions: https://stackoverflow.com/a/27074221/764592

Deletion in binary heap

I am just trying to learn about binary heaps and have a doubt about the delete operation in a binary heap.
I have read that we can delete an element from a binary heap and then we need to re-heapify it.
But the following link says it is unavailable:
http://en.wikibooks.org/wiki/Data_Structures/Tradeoffs
                 Binary Search   AVL Tree   Binary Heap (min)   Binomial Queue (min)
Find             O(log n)        O(log n)   unavailable         unavailable
Delete element   O(log n)        O(log n)   unavailable         unavailable
I am a little confused about it.
Thanks in advance for all of the clarifications.
Binary heaps and other priority queue structures don't usually support a general "delete element" operation; you need an additional data structure that keeps track of each element's index in the heap, e.g. a hash table. If you have that, you can implement a general delete operation as
find-element, O(1) expected time with a hash table
decrease key to less than the minimum, O(lg n) time
delete-min and update the hash table, O(lg n) combined expected time.
A regular delete is possible, just like a DeleteMin/Max. The "problem" is that you have to check for both up- and down-shifts: when the "last" node takes up the vacant spot, it can be either too large or too small for that position. Since it can't be both at once, it's easy to check which sift, if any, restores the heap property.
The only problem that remains is the Find. The answer above states that you can find an element in O(lg n), but I wouldn't know how. In my implementations, I generally build a heap of pointers-to-elements rather than elements (cheaper copying during up/down-shifts). I add a "position" variable to the element type, which keeps track of the index of the element's pointer in the heap. This way, given an element E, I can find its position in the heap in constant time.
Obviously, this isn't cut out for every implementation.
I am confused about why the delete operation of a binary heap is listed as unavailable in the link in your question. Deletion in a binary heap is quite possible, and it's a composition of two other binary heap operations.
https://en.wikipedia.org/wiki/Binary_heap
I am assuming you know all the other operations of a binary heap.
Deleting a key from a binary heap takes two lines of code. Suppose you want to delete the item at index x. Decrease its value to the lowest integer possible, Integer.MIN_VALUE. Since it's the lowest of all integer values, the item will rise to the root position once decreaseItem(int index, int newVal) finishes. Afterwards, extract the root by invoking the extractMin() method.
// Complexity: O(lg n)
public void deleteItem(int index) {
    // Assign the lowest value possible so the item rises to the root
    decreaseItem(index, Integer.MIN_VALUE);
    // Then extractMin removes that item from the heap
    extractMin();
}
Full Code: BinaryHeap_Demo.java
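Since decreaseItem and extractMin aren't shown above, here is a self-contained sketch that fills them in (my own minimal version, not the linked BinaryHeap_Demo.java):

```java
import java.util.*;

// Minimal binary min-heap demonstrating delete-by-index as
// decreaseItem(index, Integer.MIN_VALUE) followed by extractMin().
class BinaryMinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void insert(int v) { a.add(v); siftUp(a.size() - 1); }

    public int extractMin() {
        int root = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) { a.set(0, last); siftDown(0); }
        return root;
    }

    public void decreaseItem(int index, int newVal) {
        a.set(index, newVal);
        siftUp(index);          // the smaller value floats up
    }

    // Complexity: O(lg n)
    public void deleteItem(int index) {
        decreaseItem(index, Integer.MIN_VALUE);  // floats to the root
        extractMin();                            // then remove the root
    }

    public int size() { return a.size(); }

    private void siftUp(int i) {
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            Collections.swap(a, i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        for (;;) {
            int s = i, l = 2 * i + 1, r = l + 1;
            if (l < a.size() && a.get(l) < a.get(s)) s = l;
            if (r < a.size() && a.get(r) < a.get(s)) s = r;
            if (s == i) return;
            Collections.swap(a, i, s);
            i = s;
        }
    }
}
```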
