I need to design a data structure that supports the following operations on a group S of n numbers:
Init(A,n) - initialize the structure from an array A of n elements = O(n)
insert(x) - insert element x = O(log n)
find_min() - return the minimum element without removing it from the structure = O(1)
find_max() - return the maximum element without removing it from the structure = O(1)
extract_min() - delete the minimum element = O(log n)
extract_max() - delete the maximum element = O(log n)
Any ideas or suggestions on how to approach this problem or how it should be implemented would be greatly appreciated.
That would be a min-max heap. Here is a nice article about it by Malte Skarupke, with a C++ implementation. Here is a Java implementation.
This is a heap data structure, more specifically a binary heap.
There are basically two kinds of heap: a min-heap, in which the minimum element is at the root, and a max-heap, in which the maximum element is at the root.
So, given the constraints, the data structure should be a composite one, consisting of both a min-heap and a max-heap: since you need both the minimum and the maximum in O(1), you have to keep both heaps.
As for the init process that takes O(n) time, it is called heapify: building a heap from an existing array.
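A minimal sketch of this composite structure using Python's heapq module. The class name and the lazy-deletion bookkeeping (the live counter that lets the two heaps stay consistent without cross-pointers) are illustrative choices, not the only way to do it, and find_min/find_max are O(1) only in the amortized sense:

```python
import heapq
from collections import Counter

class MinMaxDS:
    """Double-ended priority queue built from a min-heap and a max-heap.

    Elements extracted through one heap are deleted lazily from the
    other: a counter tracks how many live copies of each value remain,
    and stale heap entries are skipped when they surface at a top.
    """

    def __init__(self, elements=()):          # Init(A, n): O(n) via heapify
        self.min_heap = list(elements)
        self.max_heap = [-x for x in elements]
        heapq.heapify(self.min_heap)
        heapq.heapify(self.max_heap)
        self.live = Counter(elements)

    def _prune(self, heap, sign):
        # Drop entries whose value has already been extracted elsewhere.
        while heap and self.live[sign * heap[0]] == 0:
            heapq.heappop(heap)

    def insert(self, x):                      # O(log n)
        heapq.heappush(self.min_heap, x)
        heapq.heappush(self.max_heap, -x)
        self.live[x] += 1

    def find_min(self):                       # O(1) amortized
        self._prune(self.min_heap, 1)
        return self.min_heap[0]

    def find_max(self):                       # O(1) amortized
        self._prune(self.max_heap, -1)
        return -self.max_heap[0]

    def extract_min(self):                    # O(log n) amortized
        self._prune(self.min_heap, 1)
        x = heapq.heappop(self.min_heap)
        self.live[x] -= 1
        return x

    def extract_max(self):                    # O(log n) amortized
        self._prune(self.max_heap, -1)
        x = -heapq.heappop(self.max_heap)
        self.live[x] -= 1
        return x
```

The alternative to lazy deletion is to pair each entry with a pointer to its twin in the other heap and remove both eagerly, at the cost of more intrusive heap code.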
The question is:
find the median of each window of fixed size k sliding over large data (n numbers)
What I thought is:
maintain two heaps, a max-heap for the numbers less than the current median and a min-heap for the numbers greater than the current median.
The main idea is to FIND the oldest element of the previous window in one of the heaps (depending on whether it is less than or greater than the current median) and replace it with the new element we encounter.
Then rebalance so that |size(heap1) - size(heap2)| = 1 or 0, because the median is the average of the two top elements when the sizes are equal, and otherwise the top element of the larger heap.
The problem I am facing is that the time complexity increases: finding the outgoing element in a heap takes O(k) time, for a total of O(n*k), so I cannot reach the desired complexity of O(n*log k) (as was required in the source of the question).
How can this be reduced without using extra space?
Edit: input: 1 4 3 5 6 2, k=4
median:
from 1 4 3 5 = (4+3)/2
from 4 3 5 6 = (4+5)/2
from 3 5 6 2 = (3+5)/2
You can solve this problem using an order-statistic tree, which is a BST with some additional information that allows finding medians, quantiles and other order statistics in O(log n) time in a tree with n elements.
First, construct an OST with the first k elements. Then, in a loop:
Find and report the median value.
Remove the first element that was inserted into the tree (you can find out which element this was in the array).
Insert the next element from the array.
Each of these steps takes O(log k) if the tree is self-balancing, because we maintain the invariant that the tree never grows beyond size k, which also gives O(k) auxiliary space. The preprocessing takes O(k log k) time while the loop repeats n + 1 - k times for a total time of O(n log k).
If you can find a balanced tree implementation that gives you efficient access to the central element, you should probably use it. You could also do this with heaps, much as you suggest, as long as you keep an extra array of length k which tells you where each element in the window lives in its heap, and which heap it is in. You will have to modify the code that maintains the heap to update this array when it moves things around, but heap code is a lot easier to write and a lot smaller than balanced tree code. Then you don't need to search through the whole heap to remove the item that has just slid off the edge of the window, and the cost comes down to O(n log k).
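As a sketch of the two-heap route with slightly different bookkeeping: instead of an auxiliary position array, each heap entry can carry its array index, so elements that have slid out of the window are deleted lazily when they reach a heap top. The function name is illustrative, and the bound is amortized O(n log k) because stale entries linger in the heaps for a while:

```python
import heapq

def sliding_medians(nums, k):
    """Return the medians of all length-k windows of nums.

    small is a max-heap (negated values) for the lower half of the
    window, large is a min-heap for the upper half. Each entry carries
    its array index, so outgoing elements are deleted lazily: they are
    popped only once they surface at a heap top.
    """
    def move(src, dst):
        v, idx = heapq.heappop(src)
        heapq.heappush(dst, (-v, idx))

    small, large = [], []               # max-heap (negated), min-heap
    for i in range(k):
        heapq.heappush(small, (-nums[i], i))
    for _ in range(k - (k >> 1)):       # move the upper half into large
        move(small, large)

    def median():
        if k & 1:
            return float(large[0][0])
        return (large[0][0] - small[0][0]) / 2

    result = [median()]
    for i in range(k, len(nums)):
        x = nums[i]
        if x >= large[0][0]:            # new element joins the upper half
            heapq.heappush(large, (x, i))
            if nums[i - k] <= large[0][0]:   # outgoing was in the lower half
                move(large, small)
        else:                           # new element joins the lower half
            heapq.heappush(small, (-x, i))
            if nums[i - k] >= large[0][0]:   # outgoing was in the upper half
                move(small, large)
        while small and small[0][1] <= i - k:   # lazy deletion at the tops
            heapq.heappop(small)
        while large and large[0][1] <= i - k:
            heapq.heappop(large)
        result.append(median())
    return result
```

On the example from the question, sliding_medians([1, 4, 3, 5, 6, 2], 4) yields 3.5, 4.5, 4.0, matching (4+3)/2, (4+5)/2, (3+5)/2.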
This problem resembles the efficient implementation of Dijkstra's shortest path, where we need to delete (or, in Dijkstra's case, update) an element that is not at the top of the heap. You can use the same workaround here, but with extra space. First, you cannot use a built-in heap library; create your own heap data structure, but maintain a pointer to each element in the heap, and update those pointers whenever you add or remove an element. Then, after calculating the median of the first k elements, delete the outgoing element directly from the appropriate heap (min or max, according to whether it is greater or less than the median) using its pointer, and heapify from that position. The heap sizes change, and you get the new median using the same size-adjusting logic you are already using.
Heapify takes O(log k), hence your total cost will be O(n*log k), but you will need O(n) more space for the pointers.
The interface provided by the book "Introduction to Algorithms" for decreasing a key in a binomial heap is:
BINOMIAL-HEAP-DECREASE-KEY(H, x, k), where H is the pointer to the first root of the tree and x is the "index" of the node whose key is to be decreased to k. The stated time complexity is O(log n).
However, we usually implement a binomial heap with linked lists, where there is no direct access to x without performing a search, which in general takes O(n).
One way to solve this is to keep a pointer to each node in the binomial heap, which makes direct access to every node O(1), but the space complexity is then O(n).
Does anybody know a better solution? Thanks!
A previous discussion can be found here.
If we store the heap in an array we can easily do that.
If a node is at index i, its left child is at 2i and its right child is at 2i+1 (in the array we store the elements starting from index 1).
If we store the heap elements like this, we can decrement the element at index x directly and then restore the heap property starting from x, which in a min-heap means sifting the element up toward the root.
The decrement itself is constant time, and the sift-up takes O(log n).
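A sketch of this in a 1-indexed array min-heap (slot 0 is left unused so that the parent of index x is simply x // 2; the function name is illustrative):

```python
def decrease_key(heap, x, new_key):
    """Decrease the key at index x of a 1-indexed array min-heap.

    The assignment is O(1); restoring the heap property only requires
    sifting the element up toward the root, which is O(log n) because
    the root path has at most log2(n) nodes.
    """
    assert new_key <= heap[x], "new key must not be larger than the old one"
    heap[x] = new_key
    while x > 1 and heap[x // 2] > heap[x]:   # parent of x is at x // 2
        heap[x // 2], heap[x] = heap[x], heap[x // 2]
        x //= 2
```

For example, decreasing the key 8 (at index 4) to 0 in the heap [_, 1, 5, 3, 8, 6] bubbles the 0 past 5 and 1 up to the root, giving [_, 0, 1, 3, 5, 6].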
I would like to implement a double-ended priority queue with the following constraints:
needs to be implemented in a fixed-size array, say 100 elements; if a new element must be added once the array is full, the oldest needs to be removed
need maximum and minimum in O(1)
if possible insert in O(1)
if possible remove minimum in O(1)
clear to empty/init state in O(1) if possible
count of number of elements in array at the moment in O(1)
I would like O(1) for all five operations above, but it is not possible to have O(1) on all of them in the same implementation. At least O(1) on three operations and O(log n) on the other two should suffice.
I would appreciate any pointers to such an implementation.
There are many specialized data structures for this. One simple data structure is the min-max heap, which is implemented as a binary heap where the layers alternate between "min layers" (each node is less than or equal to its descendants) and "max layers" (each node is greater than or equal to its descendants). The minimum and maximum can be found in time O(1), and, as in a standard binary heap, enqueues and dequeues can be done in O(log n) time each.
You can also use the interval heap data structure, which is another specialized priority queue for the task.
Alternatively, you can use two priority queues - one storing elements in ascending order and one in descending order. Whenever you insert a value, you can then insert elements into both priority queues and have each store a pointer to the other. Then, whenever you dequeue the min or max, you can remove the corresponding element from the other heap.
As yet another option, you could use a balanced binary search tree to store the elements. The minimum and maximum can then be found in time O(log n) (or O(1) if you cache the results) and insertions and deletions can be done in time O(log n). If you're using C++, you can just use std::map for this and then use begin() and rbegin() to get the minimum and maximum values, respectively.
Hope this helps!
A binary heap will give you insert and remove minimum in O(log n) and the others in O(1).
The only tricky part is removing the oldest element once the array is full. For this, keep another array:
time[i] = the position in the heap array of the element added at time i + 100 * k.
Every 100 iterations, you increment k.
Then, when the array fills up for the first time, you remove heap[ time[0] ], when it fills up for the second time you remove heap[ time[1] ], ..., when it fills up for the 100th time, you wrap around and remove heap[ time[0] ] again etc. When it fills up for the kth time, you remove heap[ time[k % 100] ] (100 is your array size).
Make sure to also update the time array when you insert and remove elements.
Removal of an arbitrary element can be done in O(log n) if you know its position: swap it with the last element in your heap array, shrink the heap, and then sift the swapped-in element down (or up) as needed.
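The swap-and-sift removal can be sketched as follows for a 0-indexed array min-heap (helper names are illustrative). Note that the element swapped into the vacated slot may be too small for it as well as too large, so both directions have to be tried:

```python
def remove_at(heap, i):
    """Remove heap[i] from a 0-indexed array min-heap in O(log n).

    Swap the victim with the last element, shrink the array, then
    restore the heap property: the swapped-in element may violate it
    upward (sift up) or downward (sift down), but never both.
    """
    last = len(heap) - 1
    heap[i], heap[last] = heap[last], heap[i]
    removed = heap.pop()
    if i < len(heap):                 # i was not the last slot itself
        _sift_up(heap, i)
        _sift_down(heap, i)
    return removed

def _sift_up(heap, i):
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

def _sift_down(heap, i):
    n = len(heap)
    while True:
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and heap[l] < heap[smallest]:
            smallest = l
        if r < n and heap[r] < heap[smallest]:
            smallest = r
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
```

The same routine serves the time-array scheme above: look up the victim's position in time[k % 100] and call remove_at with it.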
If you absolutely need max and min in O(1), then what you can do is create a linked list where you constantly keep track of min, max, and size, and link all the nodes into some sort of tree structure, probably a heap. Min, max, and size would all be constant time, and since finding any node takes O(log n), insert and remove are O(log n) each. Clearing would be trivial.
If your queue is a fixed size, then O-notation is meaningless. Any O(log n) or even O(n) operation is essentially O(1) because n is fixed, so what you really want is an algorithm that's fast for the given dataset. Probably two parallel traditional heap priority queues would be fine (one for high, one for low).
If you know more about what kind of data you have, you might be able to make something more special-purpose.
I want to implement a data structure with operations pertinent both to arrays, i.e. indexing, and to linked lists, i.e. quick access to the previous/next item. It resembles a sparse array, but memory is not the concern; the concern is time complexity.
Requirements:
key is an integer with a limited range 1..N; you can afford to allocate an array of that size (i.e. memory is not a concern)
Operations:
insert(key, data) - O(1)
find(key) - O(1) - returns the "node" with data
delete(node) - O(1)
next(node) - O(1) - find next occupied node, in the ordering given by key
prev(node) - O(1)
I was thinking of an implementation as an array with pointers to the next/prev occupied item, but I have a problem with the insert operation: how do I find the prev and next items, i.e. the place in the doubly linked list where the new item goes? I don't know how to make this O(1).
If this is not possible please provide a proof.
You can do this with a Van Emde Boas tree.
The tree supports the operations you require:
Insert: insert a key/value pair with an m-bit key
Delete: remove the key/value pair with a given key
Lookup: find the value associated with a given key
FindNext: find the key/value pair with the smallest key that is at least a given k
FindPrevious: find the key/value pair with the largest key that is at most a given k
And the time complexity is O(log m), where m is the number of bits in the keys.
For example if all your keys are 16 bit integers between 0 and 65535, m would be 16.
EDIT
If the keys are in the range 1..N, the complexity is O(log log N) for each of these operations.
The tree also supports min and max operations, which would have complexity O(1).
Insert: O(log log N)
Find: O(log log N)
Delete: O(log log N)
Next: O(log log N)
Prev: O(log log N)
Max: O(1)
Min: O(1)
DETAILS
This tree works by using a large array of child trees.
For example, suppose we had 16 bit keys. The first layer of the tree would store an array of 2^8 (=256) children trees. The first child is responsible for keys from 0 to 255, the second for keys 256,257,..,511, etc.
This makes it very easy to lookup to see whether a node is present as we can simply go straight to the corresponding array element.
However, by itself this would make finding the next element hard, as we might need to search up to 256 child trees to find a non-empty one.
The Van Emde Boas tree contains two additions that make it easy to find the next element:
A min and a max are stored for each tree, so it is O(1) work to see whether we have reached our limits.
An auxiliary tree is used to store the indexes of the non-empty children. This auxiliary tree is a Van Emde Boas tree whose size is the square root of the original size.
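A minimal sketch of this layering in Python, supporting insert, member, and successor (delete follows the same recursive pattern and is omitted for brevity). It assumes the universe size is a power of two and, as in the textbook formulation, keeps each node's min out of its clusters so that every operation recurses into only one substructure:

```python
class VEB:
    """Van Emde Boas tree over the universe 0 .. u-1 (u a power of two).

    Each node splits its universe into about sqrt(u) clusters of size
    about sqrt(u), plus a summary tree recording which clusters are
    non-empty. A node's min is NOT stored in its clusters.
    """

    def __init__(self, u):
        self.u = u
        self.min = self.max = None
        if u > 2:
            k = u.bit_length() - 1        # u = 2 ** k
            self.lo_bits = k // 2         # bits in the low half of a key
            self.lo_size = 1 << self.lo_bits
            hi_size = 1 << (k - self.lo_bits)
            self.summary = VEB(hi_size)
            self.cluster = [VEB(self.lo_size) for _ in range(hi_size)]

    def _split(self, x):
        return x >> self.lo_bits, x & (self.lo_size - 1)

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x       # min lives only at this level
            return
        if x == self.min or x == self.max:
            return                        # set semantics: ignore duplicates
        if x < self.min:
            x, self.min = self.min, x     # push the old min down instead
        if x > self.max:
            self.max = x
        if self.u > 2:
            h, l = self._split(x)
            if self.cluster[h].min is None:
                self.summary.insert(h)    # cluster h just became non-empty
            self.cluster[h].insert(l)

    def member(self, x):
        if x == self.min or x == self.max:
            return True
        if self.u <= 2:
            return False
        h, l = self._split(x)
        return self.cluster[h].member(l)

    def successor(self, x):
        """Smallest stored key strictly greater than x, or None."""
        if self.u == 2:
            return 1 if x == 0 and self.max == 1 else None
        if self.min is not None and x < self.min:
            return self.min
        h, l = self._split(x)
        if self.cluster[h].max is not None and l < self.cluster[h].max:
            return (h << self.lo_bits) | self.cluster[h].successor(l)
        nh = self.summary.successor(h)    # next non-empty cluster
        if nh is None:
            return None
        return (nh << self.lo_bits) | self.cluster[nh].min
```

Since exactly one of the two recursive branches in successor is taken at each level, and each level halves the number of key bits, the running time is O(log log u). This eager version allocates all clusters up front; a practical implementation would create them lazily.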
I have this problem: I'm maintaining a data structure that contains two different heaps, a min-heap and a max-heap, which do not share any data.
My goal is to keep some kind of record of each node's location in either of the heaps, and to have it updated as the heaps operate.
Bottom line: I'm trying to figure out how I can have a delete(p) function that works in O(log n) complexity, p being a pointer to a data object that can hold any data.
Thanks,
Ned.
If your heap is implemented as an array of items (references, say), then you can easily locate an arbitrary item in the heap in O(n) time. And once you know where the item is in the heap, you can delete it in O(log n) time. So find and remove is O(n + log n).
You can achieve O(log n) for removal if you pair the heap with a dictionary or hash map, as I describe in this answer.
Deleting an arbitrary item in O(log n) time is explained here.
The trick to the dictionary approach is that the dictionary contains a key (the item key) and a value that is the node's position in the heap. Whenever you move a node in the heap, you update that value in the dictionary. Insertion and removal are slightly slower in this case, because they require making up to log(n) dictionary updates. But those updates are O(1), so it's not hugely expensive.
Or, if your heap is implemented as a binary tree (with pointers, rather than the implicit structure in an array), then you can store a pointer to the node in the dictionary and not have to update it when you insert or remove from the heap.
That being said, the actual performance of add and delete min (or delete max for the max heap) in the paired data structure will be lower than with a standard heap that's implemented as an array, unless you're doing a lot of arbitrary deletes. If you're only deleting an arbitrary item every once in a while, especially if your heap is rather small, you're probably better off with the O(n) delete performance. It's simpler to implement and when n is small there's little real difference between O(n) and O(log n).
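A minimal sketch of the heap-plus-dictionary pairing described above (class and method names are illustrative; keys are assumed unique and hashable):

```python
class IndexedHeap:
    """Min-heap of (priority, key) pairs with O(log n) delete-by-key.

    `pos` maps each key to its current index in the heap array and is
    updated on every swap, so an arbitrary item can be located in O(1)
    and removed in O(log n) with no search.
    """

    def __init__(self):
        self.heap = []                # list of (priority, key)
        self.pos = {}                 # key -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i # the dictionary tracks every move
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[(i - 1) // 2] > self.heap[i]:
            self._swap((i - 1) // 2, i)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            best, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and self.heap[l] < self.heap[best]:
                best = l
            if r < n and self.heap[r] < self.heap[best]:
                best = r
            if best == i:
                return
            self._swap(i, best)
            i = best

    def push(self, priority, key):
        self.heap.append((priority, key))
        self.pos[key] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def delete(self, key):            # O(log n): position comes from pos
        i = self.pos.pop(key)
        last = self.heap.pop()
        if i < len(self.heap):        # deleted item wasn't the last slot
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(i)

    def pop_min(self):
        priority, key = self.heap[0]
        self.delete(key)
        return priority, key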