Data structure design with O(log n) amortized time? - algorithm

I'm working on this problem but I'm pretty confused on how to solve it:
Design a data structure that supports the following operations in amortized O(log n) time, where n is the total number of elements:
Ins(k): Insert a new element with key k
Extract-Max: Find and remove the element with largest key
Extract-Min: Find and remove the element with smallest key
Union: Merge two different sets of elements
How do I calculate the amortized time? Isn't this already something like a hash table? Or is it a variant of it?
I would really appreciate if someone can help me with this.
Thank you!!

What you're proposing isn't something that most hash tables are equipped to deal with because hash tables don't usually support finding the min and max elements quickly while supporting deletions.
However, this is something that you could do with a pair of priority queues that support melding. For example, suppose that you back your data structure with two binomial heaps - a min-heap and a max-heap. Every time you insert an element into your data structure, you add it to both the min-heap and the max-heap. However, you slightly modify the two heaps so that each element in the heap stores a pointer to its corresponding element in the other heap; that way, given a node in the min-heap, you can find the corresponding node in the max-heap and vice-versa.
Now, to do an extract-min or extract-max, you just apply a find-min operation to the min-heap or a find-max operation to the max-heap to get the result. Then, delete that element from both heaps using the normal binomial heap delete operation. (You can use the pointer you set up during the insert step to quickly locate the sibling element in the other heap).
Finally, for a union operation, just apply the normal binomial heap merge operation to the corresponding min-heaps and max-heaps.
Since each of the described operations requires only a constant number of binomial heap operations, each of them runs in time O(log n) worst-case, with no amortization needed.
Generally speaking, the data structure you're describing is called a double-ended priority queue. There are a couple of specialized data structures you can use to meet those requirements, though the one described above is probably the easiest to build with off-the-shelf components.
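As a rough illustration of the paired-heap idea, here is a minimal Python sketch. It is not the binomial heap construction described above: it swaps in two array-backed binary heaps from the standard heapq module and uses lazy deletion (a shared set of live ids) in place of the cross pointers. The class name LazyDEPQ and its methods are made up for this example, and keys are assumed to be numbers so that they can be negated. Each entry is pushed once and popped at most once per heap, so extractions cost O(log N) amortized, where N is the number of insertions so far. Union is left out, because binary heaps cannot be melded in O(log n); that is where binomial heaps are needed.

```python
import heapq
import itertools


class LazyDEPQ:
    """Double-ended priority queue sketch (hypothetical name, numeric keys)."""

    def __init__(self):
        self._min = []                 # entries: (key, uid)
        self._max = []                 # entries: (-key, uid)
        self._alive = set()            # uids of elements not yet extracted
        self._ids = itertools.count()  # unique id per inserted element

    def __len__(self):
        return len(self._alive)

    def insert(self, key):
        uid = next(self._ids)
        self._alive.add(uid)
        heapq.heappush(self._min, (key, uid))
        heapq.heappush(self._max, (-key, uid))

    def _pop_live(self, heap):
        # Discard stale entries whose twin was already extracted
        # through the other heap, then return the first live key.
        while heap:
            key, uid = heapq.heappop(heap)
            if uid in self._alive:
                self._alive.remove(uid)
                return key
        raise IndexError("extract from an empty queue")

    def extract_min(self):
        return self._pop_live(self._min)

    def extract_max(self):
        return -self._pop_live(self._max)
```

The stale entries are the price of skipping the cross pointers: each heap may temporarily hold elements that were already removed through the other one.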

Related

Delete and Increase key for Binomial heap

I am currently studying binomial heaps.
I learned that the following operations on a binomial heap can be completed in Theta(log n) time:
Get-max
Insert
Extract Max
Merge
Increase-Key
Delete
However, the Increase-Key and Delete operations are said to need a pointer to the element in order to complete in Theta(log n).
Here are three questions I want to ask:
Is this because, without a pointer to the element, Increase-Key and Delete have to search for the element before the operation can take place?
What is the time complexity of searching a binomial heap? (I believe O(n).)
If the pointer to the element is not given, will Increase-Key and Delete take O(n) time, or can it be lower than that?
It’s good that you’re thinking about this!
Yes, that’s exactly right. The nodes in a binomial heap are organized in a way that makes it very quick to find the extreme value (the minimum in a min-heap, the maximum in a max-heap like yours), but the relative ordering of the remaining elements is not guaranteed to make it easy to find things.
There isn’t a general way to search a binomial heap for an element faster than O(n). Or, stated differently, the worst-case cost of any way of searching a binomial heap is Ω(n). Here’s one way to see this. Form a binomial max-heap where n-1 items have priority 137 and one item has priority 42. The item with priority 42 must be a leaf node, since it cannot be the parent of a larger item. There are (roughly) n/2 leaves in the heap, and since there is no ordering on them, to find that one item you’d potentially have to look at all the leaves. To formalize this, you could form multiple different binomial heaps with these items, and whatever algorithm was looking for the item of priority 42 would necessarily have to find it in the last place it looks at least once.
For the reasons given above, no, there’s no way to implement those operations quickly without having pointers to them, since in the worst case you have to search everywhere.

Is there a data structure representing an ordered list with O(n*log n) time on main operations?

I am looking for a data structure that allows a specific problem to be solved in O(n*log(n)) complexity. It needs to represent a set of integers, in which I can do the following operations : 
- add an element
- check if an element exists in the set
- delete every value bigger than a given integer
Hopefully with logarithmic complexity.
I looked at linked lists, since adding an element in the middle and deleting a whole part of the structure is easy, but I don't know how to keep the list ordered or implement a binary search on it. At first I was considering hash tables, but I don't know how to filter the set. I'm looking at balanced binary trees, and I do not know whether I am looking for something impossible or whether it exists and I just can't find it.
For implementing from scratch, I would suggest a Treap.
A Treap is just a binary search tree where every node is given a random priority, and it satisfies the heap condition as a tree. This randomized data structure makes the expected time to find, insert, delete and split the tree be O(log(n)). The first three are fairly straightforward. To split, you just put a node in at the point to split with higher priority than the root. Then one half winds up on one side of that node, and the other half on the other.
Please note, while splitting is O(log(n)), freeing up the deleted bits is O(n).
Please note that you may not have to implement anything. For example, in C++ you can just use a std::set (or a std::map, if you need values attached to the keys). Insertion and lookup are O(log(n)), while deleting a range of length m from a structure of size n is O(m + log(n)). If you consider the comment about freeing memory, that's about ideal.
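For the from-scratch route, here is a minimal treap sketch in Python. It uses the common recursive split/merge formulation rather than the "insert a node with higher priority than the root" trick described above; both produce the same two halves. The function names (split, merge, insert, contains, delete_greater) are chosen for this example only, and the expected O(log(n)) bounds rely on the random priorities.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random heap priority
        self.left = None
        self.right = None


def split(node, key):
    """Split a treap into (keys <= key, keys > key)."""
    if node is None:
        return None, None
    if node.key <= key:
        left, right = split(node.right, key)
        node.right = left
        return node, right
    left, right = split(node.left, key)
    node.left = right
    return left, node


def merge(a, b):
    """Merge two treaps; every key in a must be smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b


def contains(node, key):
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def insert(root, key):
    """Return the new root after adding key (duplicates are ignored)."""
    if contains(root, key):
        return root
    left, right = split(root, key)
    return merge(merge(left, Node(key)), right)


def delete_greater(root, bound):
    """Drop every key larger than bound and return the new root."""
    kept, _discarded = split(root, bound)
    return kept
```

In Python the discarded half is simply left to the garbage collector, which is where the O(n) cost of freeing the deleted nodes ends up.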

Tracking a node inside a heap

I have this problem: I'm keeping a data structure that contains two different heaps, a minimum heap and a maximum heap, that do not contain the same data.
My goal is to keep some kind of record of each node's location in either of the heaps and have it updated along with the heap operations.
Bottom line: I'm trying to figure out how I can have a delete(p) function that works in lg(n) complexity, p being a pointer to a data object that can hold any data.
Thanks,
Ned.
If your heap is implemented as an array of items (references, say), then you can easily locate an arbitrary item in the heap in O(n) time. And once you know where the item is in the heap, you can delete it in O(log n) time. So find and remove is O(n + log n).
You can achieve O(log n) for removal if you pair the heap with a dictionary or hash map, as I describe in this answer.
Deleting an arbitrary item in O(log n) time is explained here.
The trick to the dictionary approach is that the dictionary contains a key (the item key) and a value that is the node's position in the heap. Whenever you move a node in the heap, you update that value in the dictionary. Insertion and removal are slightly slower in this case, because they require making up to log(n) dictionary updates. But those updates are O(1), so it's not hugely expensive.
Or, if your heap is implemented as a binary tree (with pointers, rather than the implicit structure in an array), then you can store a pointer to the node in the dictionary and not have to update it when you insert or remove from the heap.
That being said, the actual performance of add and delete min (or delete max for the max heap) in the paired data structure will be lower than with a standard heap that's implemented as an array, unless you're doing a lot of arbitrary deletes. If you're only deleting an arbitrary item every once in a while, especially if your heap is rather small, you're probably better off with the O(n) delete performance. It's simpler to implement and when n is small there's little real difference between O(n) and O(log n).
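The dictionary approach can be sketched as follows in Python. This is an illustrative min-heap only, and the class name IndexedMinHeap and its method names are invented for this example. It assumes the items are unique, hashable and mutually comparable; a real implementation would more likely map item keys to positions.

```python
class IndexedMinHeap:
    """Array-backed binary min-heap with a dict from item to heap index."""

    def __init__(self):
        self._items = []  # the heap array
        self._pos = {}    # item -> current index in _items

    def __len__(self):
        return len(self._items)

    def _swap(self, i, j):
        items = self._items
        items[i], items[j] = items[j], items[i]
        # Every move inside the heap is mirrored in the dictionary.
        self._pos[items[i]] = i
        self._pos[items[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._items[i] >= self._items[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._items[child] < self._items[smallest]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, item):
        if item in self._pos:
            raise ValueError("items must be unique in this sketch")
        self._items.append(item)
        self._pos[item] = len(self._items) - 1
        self._sift_up(len(self._items) - 1)

    def remove(self, item):
        """Delete an arbitrary item in O(log n); KeyError if it is absent."""
        i = self._pos.pop(item)
        last = self._items.pop()
        if i < len(self._items):  # the removed item was not in the last slot
            self._items[i] = last
            self._pos[last] = i
            self._sift_up(i)
            self._sift_down(self._pos[last])

    def pop_min(self):
        smallest = self._items[0]
        self.remove(smallest)
        return smallest
```

The extra dictionary writes in _swap are the overhead mentioned above: every sift step now costs a couple of hash updates in addition to the array swap.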

is there a way to find max in O(1) and do lookups in O(lgN)?

Say you have a lot of (key, value) objects to keep track of, with many insertions as well as deletions.
You need to satisfy 3 requirements:
get the maximum key in constant time at any point
look up the value of any key in logarithmic time.
insertions and deletes take logarithmic time.
Is there a data structure that can do this?
My thoughts:
Priority queues can get the max in constant time, but I can't look up values.
Binary search trees (2-3 trees) can look up in logarithmic time, but max takes O(lgN) as well.
If I try to keep track of the max in a BST, it takes O(lgN) when I have to delete the max and find the second greatest.
Why do we need those fancy data structures? I think a simple binary search tree that tracks the max node can serve the OP's requirements well.
You can track the node with the max key:
whenever you insert a new node, you compare the key with the previous max key to decide if this is a new max node
whenever you delete the max node, it takes O(logN) to find the next max node
You certainly have O(logN) lookup time with the nature of BST
BST's update takes O(logN) time
You can just use two data structures in parallel-
Store the key/value pairs in a hash table or balanced BST to get O(log n) lookups, and
Store all the keys in a max heap so that you can look up the max in O(1) time.
This makes insertion or deletion take O(log n) time, since that's the time complexity of inserting or deleting from the max heap.
Hope this helps!
Skip lists have an expected O(log n) lookup, and the bottom level is a sorted linked list, so min and max are O(1) as long as you keep a pointer to each end. http://en.wikipedia.org/wiki/Skip_list
I know a hash table has O(1) search time due to the fact that you use keys and you can instantaneously look up that value. As far as the max value, you may be able to constantly keep track of that every time that you insert or delete a value.
How about a list sorted in descending order?
Max is always first so O(1).
Look-up is O(log n) through binary search.
Insertion/Deletion is O(n) because you'll have to shift n-i items when inserting/deleting from position i.
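A minimal Python sketch of the sorted-list idea, using the standard bisect module. The class name SortedStore and its methods are made up for this example, and the list is kept in ascending order, so the max sits at the end rather than at the front. As noted above, insertion and deletion stay O(n) because of the shifting, so this does not meet the logarithmic update requirement.

```python
import bisect


class SortedStore:
    """Key/value store over two parallel lists sorted by key (ascending)."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def _find(self, key):
        # Binary search: O(log n).
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def put(self, key, value):
        i, found = self._find(key)
        if found:
            self._vals[i] = value
        else:
            # list.insert shifts the tail of the list: O(n).
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        return self._vals[i]

    def delete(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        del self._keys[i]  # O(n) shift again
        del self._vals[i]

    def max_key(self):
        return self._keys[-1]  # O(1); IndexError if the store is empty
```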
Since you are using key-value pairs, the best solution I can suggest is to use TreeMap in Java.
You can simply use the following 4 methods of TreeMap:
put(key,value) and get(key) for insert and retrieve
lastKey() for finding max key.
remove(key) for deletion.
Or
use a structure like the one on this page
Final conclusion:
If you are willing to trade off space complexity and are keen on running time, you need 2 data structures.
Use a HashMap, which has expected O(1) insert, retrieval and remove (a TreeMap takes O(log n) for each).
Then, as per the second link I provided, use a two-stack data structure to find the max or min in O(1).
I think this is the best possible solution i can give.
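Take a look at the RMQ (range minimum/maximum query) data structure or the segment tree data structure. A segment tree keeps the overall maximum at its root, so that query is O(1) (a general range query takes O(log n)), and a sparse-table RMQ answers any range query in O(1), BUT you will have to modify them somehow to store values as well.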
Here is nice article http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
As the first comment says, use a max heap. Use a hashmap to store pointers into the heap. These are used for the constant time lookup and log time delete.
Heaps are very simple to implement. They don't require balancing like BST's. Hashmaps are usually built into your language of choice.
Deletion in a tree data structure is already an O(logN) operation, so looking for the second greatest key is not going to change the complexity of the operation.
Though, you can invalidate elements instead of deleting them, and if you keep back pointers inside your data structure, moving from the greatest to the second greatest could be an O(N) operation.

Can we use binary search tree to simulate heap operation?

I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can, but with a balanced BST.
The minimum is the leftmost element. The maximum is the rightmost element. Finding those elements is O(log n) each, and they can be cached after each insert/delete, once the data structure has been modified [note there is room for optimization here, but this naive approach also doesn't contradict the complexity requirement!]
This way you get insert,delete: O(logn), findMin/findMax: O(1)
EDIT:
The only advantage I can think of in this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower [more ops per step, more cache misses are expected...] and will consume more space than the regular array-based implementation of a heap.
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting into the BST and finding its minimum. There are few benefits, however, since a lookup will take O(log n) time and other functions receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
Heap can do findMin or findMax in O(1) but not both in the same data structure. With a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, in the insert operation you can check whether the min value is larger than the newly inserted value, and if so make the newly added node the min node. The same technique can be applied to the max value. Hence, this BST contains this information, and you can retrieve it in O(1). (same as binary heap)
In this BST (specifically a balanced BST), when you pop the min or pop the max, the next min is the successor of the min node, whereas the next max is the predecessor of the max node. Thus it performs in O(1). Thanks to @JimMischel's comment below, however: we need to re-balance the tree, so it will still run in O(log n). (same as binary heap)
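To make the bookkeeping concrete, here is a small Python sketch. It deliberately uses a plain, unbalanced BST with parent pointers, so the O(log n) bounds do not hold in the worst case; only the min/max tracking is the point. The class name MinMaxBST and its methods are invented for this example, and pop_max is omitted because it is the mirror image of pop_min.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


class MinMaxBST:
    """Unbalanced BST that caches its min and max nodes (illustration only)."""

    def __init__(self):
        self.root = None
        self.min_node = None
        self.max_node = None

    def insert(self, key):
        parent, node = None, self.root
        while node is not None:
            parent = node
            node = node.left if key < node.key else node.right
        new = Node(key, parent)
        if parent is None:
            self.root = new
        elif key < parent.key:
            parent.left = new
        else:
            parent.right = new
        # Update the cached extremes with one comparison each.
        if self.min_node is None or key < self.min_node.key:
            self.min_node = new
        if self.max_node is None or key >= self.max_node.key:
            self.max_node = new

    def find_min(self):
        return self.min_node.key  # O(1)

    def find_max(self):
        return self.max_node.key  # O(1)

    def pop_min(self):
        node = self.min_node  # leftmost node, so it has no left child
        # The next min is the successor: the leftmost node of the right
        # subtree if there is one, otherwise the parent.
        if node.right is not None:
            succ = node.right
            while succ.left is not None:
                succ = succ.left
        else:
            succ = node.parent
        # Splice the old min out by putting its right child in its place.
        child = node.right
        if child is not None:
            child.parent = node.parent
        if node.parent is None:
            self.root = child
        else:
            node.parent.left = child
        self.min_node = succ
        if succ is None:  # the tree is now empty
            self.max_node = None
        return node.key
```

A balanced tree would additionally rebalance after the splice, which is the O(log n) step mentioned above.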
In my opinion, a heap can generally be replaced by a balanced BST, because the BST performs better in almost all of what the heap data structure can do. However, I am not sure whether the heap should be considered an obsolete data structure. (What do you think?)
PS: Have to cross reference to different questions: https://stackoverflow.com/a/27074221/764592

Resources