Sparse array with O(1) indexing and O(1) prev and next - algorithm

I want to implement a data structure that combines operations typical of arrays (indexing) with those of linked lists (quick access to the prev/next item). It resembles a sparse array, but memory is not a concern; the concern is time complexity.
Requirements:
The key is an integer in a limited range 1..N, so you can afford to allocate an array for it (i.e. memory is not a concern)
Operations:
insert(key, data) - O(1)
find(key) - O(1) - returns the "node" with data
delete(node) - O(1)
next(node) - O(1) - find next occupied node, in the ordering given by key
prev(node) - O(1)
I was thinking of an implementation as an array with pointers to the next/prev occupied item, but I have a problem with the insert operation: how do I find the prev and next items, i.e. the place in the doubly linked list where the new item goes? I don't know how to make this O(1).
If this is not possible please provide a proof.

You can do this with a Van Emde Boas tree.
The tree supports the operations you require:
Insert: insert a key/value pair with an m-bit key
Delete: remove the key/value pair with a given key
Lookup: find the value associated with a given key
FindNext: find the key/value pair with the smallest key at least a given k
FindPrevious: find the key/value pair with the largest key at most a given k
And the time complexity is O(log m), where m is the number of bits in the keys.
For example if all your keys are 16 bit integers between 0 and 65535, m would be 16.
EDIT
If the keys are in the range 1..N, the complexity is O(log log N) for each of these operations.
The tree also supports min and max operations, which would have complexity O(1).
Insert O(log log N)
Find O(log log N)
Delete O(log log N)
Next O(log log N)
Prev O(log log N)
Max O(1)
Min O(1)
DETAILS
This tree works by using a large array of children trees.
For example, suppose we had 16 bit keys. The first layer of the tree would store an array of 2^8 (=256) children trees. The first child is responsible for keys from 0 to 255, the second for keys 256,257,..,511, etc.
This makes it very easy to look up whether a key is present, as we can simply go straight to the corresponding array element.
However, by itself this would make finding the next element hard as we might need to search 256 children trees to find a nonzero element.
The Van Emde Boas tree contains two additions that make it easy to find the next element:
A min and max is stored for each tree so it is O(1) work to see whether we have reached our limits
An auxiliary tree is used to store the indexes of non-zero children. This auxiliary tree is a Van Emde Boas tree of size the square root of the original size.
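The layout described above (child trees indexed by the high half of the key, a min and max per tree, and an auxiliary summary tree of non-empty children) can be sketched in Python roughly as follows. This is a minimal sketch, not a full implementation: the class name VEB and its method names are hypothetical, only insert, membership and a strict successor are shown (delete is omitted), and the children are kept in a dict created on demand rather than in the preallocated array the answer describes. Note that successor here returns the smallest key strictly greater than x, whereas FindNext above is stated as "at least k".

```python
from typing import Dict, Optional


class VEB:
    """Sketch of a van Emde Boas tree over the universe [0, 2**bits)."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.min: Optional[int] = None   # smallest key; NOT stored in any child
        self.max: Optional[int] = None   # largest key
        if bits > 1:
            self.lo_bits = bits // 2                 # bits handled by each child
            self.hi_bits = bits - self.lo_bits       # bits that select the child
            self.clusters: Dict[int, "VEB"] = {}     # assumption: dict, not array
            self.summary: Optional["VEB"] = None     # indexes of non-empty children

    def _split(self, x: int):
        return x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)

    def __contains__(self, x: int) -> bool:
        if x == self.min or x == self.max:
            return True
        if self.bits == 1:
            return False
        hi, lo = self._split(x)
        child = self.clusters.get(hi)
        return child is not None and lo in child

    def insert(self, x: int) -> None:
        if self.min is None:             # empty tree: O(1), no recursion
            self.min = self.max = x
            return
        if x == self.min:
            return
        if x < self.min:                 # new minimum; push the old one down
            x, self.min = self.min, x
        if self.bits > 1:
            hi, lo = self._split(x)
            child = self.clusters.get(hi)
            if child is None:
                child = self.clusters[hi] = VEB(self.lo_bits)
            if child.min is None:        # child was empty: record it in the summary
                if self.summary is None:
                    self.summary = VEB(self.hi_bits)
                self.summary.insert(hi)
            child.insert(lo)             # O(1) when the child was empty
        if x > self.max:
            self.max = x

    def successor(self, x: int) -> Optional[int]:
        """Smallest stored key strictly greater than x, or None."""
        if self.bits == 1:
            return 1 if (x == 0 and self.max == 1) else None
        if self.min is not None and x < self.min:
            return self.min
        hi, lo = self._split(x)
        child = self.clusters.get(hi)
        if child is not None and child.max is not None and lo < child.max:
            return (hi << self.lo_bits) | child.successor(lo)
        nxt = self.summary.successor(hi) if self.summary is not None else None
        if nxt is None:
            return None
        return (nxt << self.lo_bits) | self.clusters[nxt].min
```

Each call recurses into either one child or the summary, never both, and every level halves the number of key bits, which is where the O(log m) = O(log log N) bound comes from.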

Related

Find nth Smallest element from Binary Search Tree

How to find nth Smallest element from Binary Search Tree
Constraints are :
time complexity must be O(1)
No extra space should be used
I have already tried 2 approaches.
Doing inorder traversal and finding nth element - Time complexity O(n)
Maintaining the number of smaller elements at each node and finding the element with m smaller elements - Time complexity O(log n)
The only way I can think of is to change the data structure that holds the BST in memory. Instead of treating every node as a structure of its own (value, left_child and right_child) stored in an unordered array, you can store the values in an ordered array. The nth smallest element is then simply the nth element of the array. The extra computation moves to insertion and deletion, though, so it would still be more efficient overall to use, for example, a C++ set (O(log n) for both insertion and deletion).
It mainly depends on your use case.
If you do not use a data structure that handles the tree by array position, I don't think you can do better than O(log n).
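The ordered-array idea can be sketched in Python as follows. This is only an illustration under assumptions: the class name SortedArray and its methods are hypothetical, keys are integers, and n is 1-based. Lookup by rank is O(1), while insertion and deletion cost O(n) because the elements after the affected position have to be shifted.

```python
import bisect
from typing import List


class SortedArray:
    """Keeps values in ascending order so the nth smallest is an index lookup."""

    def __init__(self) -> None:
        self._values: List[int] = []

    def insert(self, value: int) -> None:
        # Binary search finds the slot in O(log n); the list shift is O(n).
        bisect.insort(self._values, value)

    def delete(self, value: int) -> bool:
        i = bisect.bisect_left(self._values, value)
        if i < len(self._values) and self._values[i] == value:
            del self._values[i]          # O(n) shift
            return True
        return False

    def nth_smallest(self, n: int) -> int:
        # O(1): the array is already ordered.
        if not 1 <= n <= len(self._values):
            raise IndexError("n out of range")
        return self._values[n - 1]
```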

Partial sorting to find the kth largest/smallest elements

Source : Wikipedia
A streaming, single-pass partial sort is also possible using heaps or
other priority queue data structures. First, insert the first k
elements of the input into the structure. Then make one pass over the
remaining elements, add each to the structure in turn, and remove the
largest element. Each insertion operation also takes O(log k) time,
resulting in O(n log k) time overall.
How is this different from first heapifying the complete input array in O(n) time and then extracting the minimum of the heap k times?
I don't understand the part where it says to make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Isn't this the same as the method described in 1) ?
The suggested method is streaming. It doesn't need to have all the items in memory, as the heapify algorithm does, giving it O(k) space complexity (but it only finds the top-k items).
A more explicit description of the algorithm (see also the reference WP gives) is
given a stream of items:
make a heap of the first k elements in the stream,
for each element after the first k:
push it onto the heap,
extract the largest (or smallest) element from the heap and discard it,
finally return the k values left in the heap.
By construction, the heap never grows to more than k + 1 elements. The items can be streamed in from disk, over a network, etc., which is not possible with the heapify algorithm.
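The steps above can be sketched in Python with the standard heapq module. This is a minimal sketch assuming numeric items and that the k smallest are wanted; the function name k_smallest_streaming is hypothetical. Because heapq only provides a min-heap, the values are negated so that the largest item sits at the root and is the one discarded.

```python
import heapq
import itertools
from typing import Iterable, List


def k_smallest_streaming(stream: Iterable[float], k: int) -> List[float]:
    """Return the k smallest items of `stream`, ascending, using O(k) memory."""
    if k <= 0:
        return []
    it = iter(stream)
    # Make a heap of the first k elements (negated: max-heap behaviour).
    heap = [-x for x in itertools.islice(it, k)]
    heapq.heapify(heap)
    for x in it:
        # Push the new item, then drop the largest; the heap stays at size k.
        heapq.heappushpop(heap, -x)
    return sorted(-v for v in heap)
```

The input can be any iterable, including a generator that reads from disk or a socket, since each item is looked at once and then either kept in the heap or discarded.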

Design a data structure with insertion, deletion, get random in O(1) time complexity

It was a recent interview question. Please design a data structure with insertion, deletion and get random in O(1) time complexity. The data structure can be a basic data structure such as an array, a modification of a basic data structure, or a combination of basic data structures.
Combine an array with a hash-map of element to array index.
Insertion can be done by appending to the array and adding to the hash-map.
Deletion can be done by first looking up and removing the array index in the hash-map, then swapping the last element with that element in the array, updating the previously last element's index appropriately, and decreasing the array size by one (removing the last element).
Get random can be done by returning a random index from the array.
All operations take O(1).
Well, in reality, it's amortised (from resizing the array) expected (from expected hash collisions) O(1), but close enough.
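A minimal Python sketch of the array plus hash-map combination described above; the class name RandomSet and its method names are hypothetical. A Python list plays the role of the resizable array and a dict plays the role of the hash-map.

```python
import random
from typing import Dict, Hashable, List


class RandomSet:
    """Set with expected O(1) insert, delete and uniform random selection."""

    def __init__(self) -> None:
        self._items: List[Hashable] = []          # the array
        self._index: Dict[Hashable, int] = {}     # element -> position in the array

    def insert(self, x: Hashable) -> bool:
        if x in self._index:
            return False
        self._index[x] = len(self._items)
        self._items.append(x)
        return True

    def delete(self, x: Hashable) -> bool:
        i = self._index.pop(x, None)
        if i is None:
            return False
        last = self._items.pop()                  # shrink the array by one
        if i < len(self._items):                  # x was not the last element
            self._items[i] = last                 # move the last element into the hole
            self._index[last] = i                 # and record its new position
        return True

    def get_random(self) -> Hashable:
        return random.choice(self._items)         # raises IndexError when empty

    def __len__(self) -> int:
        return len(self._items)
```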
A radix tree would work. See http://en.wikipedia.org/wiki/Radix_tree. Insertion and deletion are O(k) where k is the maximum length of the keys. If all the keys are the same length (e.g., all pointers), then k is a constant so the running time is O(1).
In order to implement get random, maintain a record of the total number of leaves in each subtree (O(k)). The total number of leaves in tree is recorded at the root. To pick one at random, generate a random integer to represent the index of the element to pick. Recursively scan down the tree, always following the branch that contains the element you picked. You always know which branch to choose because you know how many leaves can be reached from each subtree. The height of the tree is no more than k, so this is O(k), or O(1) when k is constant.
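The counting idea can be illustrated with a plain binary trie over fixed-width integer keys, which is a simplification of a radix tree (no path compression). This is a sketch under assumptions: the names CountingTrie, select and get_random are hypothetical, and keys are non-negative integers that fit in the given number of bits, so k is that bit width.

```python
import random
from typing import List, Optional


class _Node:
    def __init__(self) -> None:
        self.count = 0                                   # keys stored below this node
        self.child: List[Optional["_Node"]] = [None, None]


class CountingTrie:
    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.root = _Node()

    def _bits_of(self, key: int) -> List[int]:
        # Most significant bit first, so the trie order matches numeric order.
        return [(key >> b) & 1 for b in range(self.bits - 1, -1, -1)]

    def __contains__(self, key: int) -> bool:
        node = self.root
        for bit in self._bits_of(key):
            node = node.child[bit]
            if node is None:
                return False
        return True

    def insert(self, key: int) -> bool:
        if key in self:
            return False
        node = self.root
        node.count += 1
        for bit in self._bits_of(key):
            if node.child[bit] is None:
                node.child[bit] = _Node()
            node = node.child[bit]
            node.count += 1
        return True

    def delete(self, key: int) -> bool:
        if key not in self:
            return False
        node = self.root
        node.count -= 1
        for bit in self._bits_of(key):
            nxt = node.child[bit]
            nxt.count -= 1
            if nxt.count == 0:
                node.child[bit] = None                   # prune the now-empty subtree
                break
            node = nxt
        return True

    def select(self, i: int) -> int:
        """Return the i-th smallest key (0-based) by following subtree counts."""
        if not 0 <= i < self.root.count:
            raise IndexError("index out of range")
        node, key = self.root, 0
        for _ in range(self.bits):
            left = node.child[0]
            left_count = left.count if left is not None else 0
            if i < left_count:
                node, key = left, key << 1
            else:
                i -= left_count
                node, key = node.child[1], (key << 1) | 1
        return key

    def get_random(self) -> int:
        return self.select(random.randrange(self.root.count))
```

Every operation walks one root-to-leaf path, so each costs O(k) for k-bit keys.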

data structure similar to array but supporting deletion

I am thinking of the following data structure question:
given integers between 1 and n in sorted order, every operation queries and then removes (in a single call) kth smallest number. How to make the query and removal both constant time operations?
It is similar to an array, but requires constant-time removal. An order-statistic balanced binary tree can do this, but with O(lg n) complexity.
Can one take the advantage of the range property (numbers only between 1 and n) to make it work?
LinkedHashSet is what you are looking for. If you want indexing as in arrays, then use LinkedHashMap. But you need to insert the numbers in order from 1 to n.
What is the maximal value of N? You mentioned that you are going to work with positive numbers, so a Van Emde Boas tree is probably the best choice for you.
Short description:
- It can only store non-negative numbers from [0, 2^k), where k is the number of bits required to store the maximal number N.
- All operations (insert, delete, lookup, find_next, find_prev) work in O(log k), not O(log N). So for 32-bit integers the complexity is log(32) = 5.
- The disadvantage is memory consumption. It requires 2^k ~ O(N) memory, so for storing 32-bit integers you need ~1 GB of RAM. Remember that O(N) memory usually means O(number of elements), but here it means O(maximal stored value).
Note: I'm not sure whether a k-th element query is supported, but the description looks promising:
FindNext: find the key/value pair with the smallest key at least a
given k
FindPrevious: find the key/value pair with the largest key at most a
given k
UPDATE
As Dukeling mentioned below, a k-th element query is not supported. I see only one way to implement it:
int x = getMin();                  // smallest stored key
for (int i = 0; i < k - 1; i++)
    x = getNext(x);                // advance to the next stored key
After this loop x holds the k-th element, but the complexity is O(k * log(bits)), which is too slow for large values of k.

Data structures question

This question is from an exam I had, and I couldn't solve it and wanted to see what the answer is (this is not homework, as it will not help me in anything but knowledge).
We need to create a data structure for containing elements whose keys are real numbers.
The data structure should have these functions:
Build(S, array): Builds the data structure S with n elements in O(n)
Insert(S, k) and Delete(S, x) in O(lg n) (k is an element, x is a pointer to it in the data structure)
Delete-Minimal-Positive(S): Remove the element with the minimal positive key
Mode(S): returns the key that is most frequent in S in O(1)
Now, building in O(n) usually means a heap should be used, but a heap does not let you find frequencies. I couldn't find any way to do this. The best I could come up with is building a red-black tree (O(n lg n)) that is then used to build a frequency heap.
I'm dying to know the answer...
Thanks!
Using just the comparison model, there is no solution to this problem.
The Element Distinctness Problem has a provable Omega(n log n) lower bound. This (element distinctness) problem is basically the problem of determining whether all the elements of an array are distinct.
If there were a solution to your problem, then we could answer the element distinctness problem in O(n) time (build the structure in O(n), read off the most frequent element in O(1), and check whether there is more than one instance of that element, again in O(n) time).
So, I suggest you ask your professor for the computational model.
Well, you can use a hash table to calculate the number of occurrences of distinct real numbers in O(1) amortized time, and then use a standard heap where the items are pairs (real number, number of occurrences) and the heap is sorted according to the number of occurrences field.
When you insert a key or delete a key, you increment or decrement the number of occurrences field by one, or in the extreme cases add or remove a heap element. In both cases you need to percolate up / down because the ordering field has changed.
Assuming the hash table is O(1) operation, you have a standard heap + O(1) hash table and you get all the operations above within the time limits. In particular, you get the "mode" by reading the root element of the heap.
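A rough Python sketch of the hash table plus heap combination described above. It covers only insert, delete and mode; Build in O(n) and Delete-Minimal-Positive are not addressed. The class name ModeTracker is hypothetical, and the heap is written out by hand because the entries need to be repositioned after their count changes, which requires tracking each key's position.

```python
from typing import Dict, List, Optional


class ModeTracker:
    """Max-heap of [count, key] entries plus a hash table key -> heap position."""

    def __init__(self) -> None:
        self._heap: List[list] = []
        self._pos: Dict[float, int] = {}

    def _swap(self, i: int, j: int) -> None:
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[parent][0] >= self._heap[i][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        n = len(self._heap)
        while True:
            big = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._heap[c][0] > self._heap[big][0]:
                    big = c
            if big == i:
                return
            self._swap(i, big)
            i = big

    def insert(self, key: float) -> None:
        i = self._pos.get(key)
        if i is None:                       # first occurrence: new heap entry
            i = len(self._heap)
            self._heap.append([1, key])
            self._pos[key] = i
        else:                               # seen before: bump its count
            self._heap[i][0] += 1
        self._sift_up(i)

    def delete(self, key: float) -> None:
        i = self._pos[key]                  # KeyError if the key is absent
        self._heap[i][0] -= 1
        if self._heap[i][0] > 0:
            self._sift_down(i)
            return
        last = len(self._heap) - 1          # count hit zero: remove the entry
        self._swap(i, last)
        self._heap.pop()
        del self._pos[key]
        if i < last:
            self._sift_up(i)
            self._sift_down(i)

    def mode(self) -> Optional[float]:
        return self._heap[0][1] if self._heap else None   # root has the top count
```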
I think the following solution will be acceptable. It is based on two data structures:
Red-black tree
Binary heap
The binary heap holds tuples of (element value, frequency of element) and is ordered by frequency, which gives us the ability to find the mode in O(1).
The red-black tree holds tuples of (element value, pointer to the entry for the same value in the heap).
When you need to insert a new element, first search for it in the tree (this takes O(log n)). If the search succeeds, follow the pointer from the element found in the RB-tree, increase its frequency, and restore the heap order (also O(log n)). If the search does not find the element, insert it into the RB-tree (O(log n)) and into the heap with value = (element, 1) (O(1), because an entry with frequency 1 never has to move up), and set the pointer in the RB-tree to the new entry in the heap.
Initial building will take O(n), provided both structures can be built from a set of n elements in O(n); for the red-black tree that requires input that is already sorted.
Sorry if I am missing something.
For frequencies:
Each entry is bidirectionally linked to its own frequency counter (use a hash table).
The frequencies are kept in a linked list.
When an element is inserted or deleted, its frequency has to move up or down the linked list, but only by a difference of at most 1.
Each frequency is therefore linked to the list elements for frequency +1 and frequency -1.
(There are exceptions, but they can be handled.)

Resources