How to find the nth smallest element in a Binary Search Tree
The constraints are:
Time complexity must be O(1)
No extra space should be used
I have already tried 2 approaches.
Doing an in-order traversal and picking the nth element - time complexity O(n)
Maintaining at each node the count of elements smaller than it, and finding the element with m smaller elements - time complexity O(log n)
The only way I can think of is to change the data structure that holds the BST in memory. It should be simple if you consider every node as a structure itself (value, left_child and right_child): instead of storing the nodes in an unordered array, you can store them in an ordered array. The nth smallest element is then simply the nth element of that array. The extra computation moves to insertion and deletion, but it would still be efficient if you use, for example, a C++ set (log(n) for both insertion and deletion).
It mainly depends on your use case.
If you do not use a data structure for handling the tree (based on array position), I don't think you can do it in anything better than O(log n).
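To make the ordered-array idea concrete, here is a minimal sketch in Java (the class and method names are mine, purely for illustration): the values are kept sorted, so looking up the nth smallest is a plain array access.

import java.util.Arrays;

class SortedArraySet {
    private int[] values = new int[0];

    // O(n) insertion: find the position by binary search, then shift the tail.
    void insert(int x) {
        int i = Arrays.binarySearch(values, x);
        if (i < 0) i = -(i + 1);   // binarySearch returns -(insertionPoint) - 1 when absent
        int[] next = new int[values.length + 1];
        System.arraycopy(values, 0, next, 0, i);
        next[i] = x;
        System.arraycopy(values, i, next, i + 1, values.length - i);
        values = next;
    }

    // O(1) lookup of the nth smallest element (1-based).
    int nthSmallest(int n) {
        return values[n - 1];
    }
}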
It was a recent interview question. Please design a data structure with insertion, deletion and get-random in O(1) time complexity. The data structure can be a basic one such as an array, a modification of a basic data structure, or a combination of basic data structures.
Combine an array with a hash-map of element to array index.
Insertion can be done by appending to the array and adding to the hash-map.
Deletion can be done by first looking up and removing the array index in the hash-map, then swapping the element at that index with the last element of the array, updating the previously-last element's index in the hash-map accordingly, and decreasing the array size by one (removing the last element).
Get random can be done by returning a random index from the array.
All operations take O(1).
Well, in reality, it's amortised (from resizing the array) expected (from expected hash collisions) O(1), but close enough.
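Here is a sketch of that combination in Java (the class name and method signatures are my own, not a standard API); it assumes distinct integer elements, and a non-empty set for getRandom:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

class RandomizedSet {
    private final ArrayList<Integer> items = new ArrayList<>();
    private final HashMap<Integer, Integer> indexOf = new HashMap<>(); // element -> array index
    private final Random rng = new Random();

    public boolean insert(int x) {
        if (indexOf.containsKey(x)) return false;
        indexOf.put(x, items.size());
        items.add(x);                   // append to the end: amortized O(1)
        return true;
    }

    public boolean remove(int x) {
        Integer i = indexOf.remove(x);
        if (i == null) return false;
        int last = items.remove(items.size() - 1);  // pop the last element
        if (i < items.size()) {         // if x wasn't itself the last element,
            items.set(i, last);         // move the last element into x's slot
            indexOf.put(last, i);       // and fix its recorded index
        }
        return true;
    }

    public int getRandom() {
        return items.get(rng.nextInt(items.size()));
    }
}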
A radix tree would work. See http://en.wikipedia.org/wiki/Radix_tree. Insertion and deletion are O(k) where k is the maximum length of the keys. If all the keys are the same length (e.g., all pointers), then k is a constant so the running time is O(1).
In order to implement get random, maintain a record of the total number of leaves in each subtree (O(k) to update). The total number of leaves in the tree is recorded at the root. To pick one at random, generate a random integer to represent the index of the element to pick. Recursively scan down the tree, always following the branch that contains the element you picked. You always know which branch to choose because you know how many leaves can be reached from each subtree. The height of the tree is no more than k, so this is O(k), or O(1) when k is constant.
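A sketch of just the random-selection step in Java, assuming each node caches the number of leaves below it (the radix-tree key handling is omitted, and the field names are illustrative):

import java.util.List;
import java.util.Random;

class TrieNode {
    List<TrieNode> children; // null or empty for a leaf
    int leafCount;           // leaves in this subtree, kept up to date on insert/delete
    Object value;            // payload stored at a leaf
}

// Picks a uniformly random leaf by descending into the branch that
// contains the chosen index. O(k) for a tree of height k.
static Object getRandom(TrieNode root, Random rng) {
    int index = rng.nextInt(root.leafCount); // random leaf index in [0, leafCount)
    TrieNode node = root;
    while (node.children != null && !node.children.isEmpty()) {
        for (TrieNode child : node.children) {
            if (index < child.leafCount) {   // the chosen leaf lies in this branch
                node = child;
                break;
            }
            index -= child.leafCount;        // skip this branch's leaves
        }
    }
    return node.value;
}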
I have this problem: I'm keeping a data structure that contains two different heaps, a minimum heap and a maximum heap, which do not contain the same data.
My goal is to keep some kind of record of each node's location in either of the heaps, and have it updated with the heaps' actions.
Bottom line: I'm trying to figure out how I can have a delete(p) function that works in O(lg n) complexity, p being a pointer to a data object that can hold any data.
Thanks,
Ned.
If your heap is implemented as an array of items (references, say), then you can easily locate an arbitrary item in the heap in O(n) time. And once you know where the item is in the heap, you can delete it in O(log n) time. So find and remove is O(n + log n).
You can achieve O(log n) for removal if you pair the heap with a dictionary or hash map, as I describe in this answer.
Deleting an arbitrary item in O(log n) time is explained here.
The trick to the dictionary approach is that the dictionary contains a key (the item key) and a value that is the node's position in the heap. Whenever you move a node in the heap, you update that value in the dictionary. Insertion and removal are slightly slower in this case, because they require making up to log(n) dictionary updates. But those updates are O(1), so it's not hugely expensive.
Or, if your heap is implemented as a binary tree (with pointers, rather than the implicit structure in an array), then you can store a pointer to the node in the dictionary and not have to update it when you insert or remove from the heap.
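Here is a compact sketch of the paired structure in Java (my own illustration of the technique described above, not library code); it assumes distinct integer items, so the map from item to heap position is well defined:

import java.util.ArrayList;
import java.util.HashMap;

class IndexedMinHeap {
    private final ArrayList<Integer> heap = new ArrayList<>();
    private final HashMap<Integer, Integer> pos = new HashMap<>(); // item -> index in heap

    public void insert(int item) {
        heap.add(item);
        pos.put(item, heap.size() - 1);
        siftUp(heap.size() - 1);
    }

    // Delete an arbitrary item in O(log n). Assumes the item is present.
    public void delete(int item) {
        int i = pos.remove(item);
        int last = heap.remove(heap.size() - 1);
        if (i < heap.size()) {
            heap.set(i, last);
            pos.put(last, i);
            siftDown(i);   // the replacement may be too large for this spot...
            siftUp(i);     // ...or too small, so check both directions
        }
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get(i) < heap.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap.get(l) < heap.get(smallest)) smallest = l;
            if (r < heap.size() && heap.get(r) < heap.get(smallest)) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Swap two heap slots and keep the position map in sync.
    private void swap(int a, int b) {
        int va = heap.get(a), vb = heap.get(b);
        heap.set(a, vb); heap.set(b, va);
        pos.put(vb, a);  pos.put(va, b);
    }
}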
That being said, the actual performance of add and delete min (or delete max for the max heap) in the paired data structure will be lower than with a standard heap that's implemented as an array, unless you're doing a lot of arbitrary deletes. If you're only deleting an arbitrary item every once in a while, especially if your heap is rather small, you're probably better off with the O(n) delete performance. It's simpler to implement and when n is small there's little real difference between O(n) and O(log n).
I'm not sure if it's possible, but it seems at least somewhat reasonable to me. I'm looking for a data structure which allows me to do these operations:
insert an item with O(log n)
remove an item with O(log n)
find/edit the k'th-smallest element in O(1), for arbitrary k (O(1) indexing)
Of course, editing won't result in any change in the order of elements. What makes this somewhat possible is that I'm going to insert elements one by one in increasing order. So if, for example, I insert for the fifth time, I'm sure all four elements inserted before this one are smaller than it, and all the elements inserted after it are going to be larger.
I don't know if the requested time complexities are possible for such a data container. But here are a couple of approaches which almost achieve these complexities.
First one is tiered vector with O(1) insertion and indexing, but O(sqrt N) deletion. Since you expect only about 10000 elements in this container and sqrt(10000)/log(10000) = 7, you get almost the required performance here. Tiered vector is implemented as an array of ring-buffers, so deleting an element requires moving all elements, following it in the ring-buffer, and moving one element from each of the following ring-buffers to the one, preceding it; indexing in this container means indexing in the array of ring-buffers and then indexing inside the ring-buffer.
It is possible to create a different container, very similar to a tiered vector, with exactly the same complexities, but working a little faster because it is more cache-friendly. Allocate an N-element array to store the values, and a sqrt(N)-element array to store index corrections (initialized with zeros). Here is how it works on the example of a 100-element container. To delete the element with index 56, move elements 57..60 to positions 56..59, then add 1 to elements 6..9 of the index-corrections array. To find the 84th element, look up the eighth element of the index-corrections array (its value is 1), add it to the index (84+1=85), then take the 85th element of the main array. After about half of the elements in the main array have been deleted, the whole container must be compacted to restore contiguous storage; this costs only O(1) amortized per deletion. For real-time applications this operation may be performed in several smaller steps.
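To pin down the indexing rule, here is the lookup step in Java (a sketch under the invariant just described; maintaining the corrections on deletion, and the periodic compaction, are left out):

// blockSize = sqrt(N). corrections[b] is the number of physical slots by which
// the logical indices of block b are shifted right by earlier deletions.
static int get(int[] values, int[] corrections, int blockSize, int logicalIndex) {
    return values[logicalIndex + corrections[logicalIndex / blockSize]];
}
// Example from the text: get(values, corrections, 10, 84) reads corrections[8] == 1
// and returns values[85].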
This approach may be extended to a Trie of depth M, taking O(M) time for indexing, O(M*N^(1/M)) time for deletion and O(1) time for insertion. Just allocate an N-element array to store the values, and N^((M-1)/M)-, N^((M-2)/M)-, ..., N^(1/M)-element arrays to store index corrections. To delete element 2345, move 4 elements in the main array, increase 5 elements in the largest "corrections" array, increase 6 elements in the next one and 7 elements in the last one. To get element 5678 from this container, add to 5678 all the corrections in elements 5, 56, 567 and use the result to index the main array. Choosing different values for M, you can balance the complexity between indexing and deletion operations. For example, for N=65000 you can choose M=4; indexing then requires only 4 memory accesses, and deletion updates 4*16=64 memory locations.
I wanted to point out first that if k is really a random number, then it might be worth considering that the problem might be completely different: asking for the k-th smallest element, with k uniformly at random in the range of the available elements is basically... picking an element at random. And it can be done much differently.
Here I'm assuming you actually need to select for some specific, if arbitrary, k.
Given your strong pre-condition that your elements are inserted in order, there is a simple solution:
Since your elements are given in order, just add them one by one to an array; that is, you have some (infinite) table T and a cursor c, initially c := 1; when adding an element x, do T[c] := x and c := c+1.
When you want to access the k-th smallest element, just look at T[k].
The problem, of course, is that as you delete elements, you create gaps in the table, such that element T[k] might not be the k-th smallest, but the j-th smallest with j <= k, because some cells before k are empty.
It is then enough to keep track of the elements you have deleted, to know how many deleted elements are smaller than position k. How do you do this in time at most O(log n)? By using a range tree or a similar type of data structure. A range tree is a structure that lets you add integers and then query for all integers between X and Y. So, whenever you delete an item, simply add its position to the range tree; and when you are looking for the k-th smallest element, make a query for all integers between 0 and k that have been deleted; say that delta of them have been deleted, then the k-th element would be at T[k+delta].
There are two slight catches, which require some fixing:
The range tree returns the range in O(log n) time, but to count the number of elements in the range, you must walk through each element in the range, so this adds O(D) time, where D is the number of deleted items in the range. To get rid of this, you must modify the range tree structure to keep track, at each node, of the number of distinct elements in the subtree. Maintaining this count costs only O(log n), which doesn't impact the overall complexity, and it's a fairly trivial modification to do.
In truth, making just one query will not work. Indeed, if you find that delta elements were deleted in the range 1 to k, then you need to make sure that no elements were deleted in the range k+1 to k+delta, and so on. The full algorithm would be something along the lines of what is below.
static int kthSmallest(int[] T, int k) {
    int a = 1, b = k, delta;
    do {
        delta = deletedInRange(a, b); // deletions in [a, b], O(log n) via the range tree
        a = b + 1;                    // the next window starts right after the old end...
        b = b + delta;                // ...and must cover one extra slot per deletion found
    } while (delta > 0);
    return T[b];
}
The exact complexity of this operation depends on how exactly you make your deletions, but if your elements are deleted uniformly at random, then the number of iterations should be fairly small.
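One concrete way to support deletedInRange in O(log n) is a Fenwick (binary indexed) tree over positions, with a 1 marked wherever an element has been deleted. This is a sketch of that idea in Java (my choice of structure; the range tree described above works just as well):

class DeletionTracker {
    private final int[] fenwick; // 1-based Fenwick tree; each deletion adds a 1 at its position

    DeletionTracker(int maxIndex) {
        fenwick = new int[maxIndex + 1];
    }

    // Record that the element at position i (1-based) was deleted: O(log n).
    void markDeleted(int i) {
        for (; i < fenwick.length; i += i & -i)
            fenwick[i]++;
    }

    // Number of deleted positions in [1, i]: O(log n).
    int deletedUpTo(int i) {
        int count = 0;
        for (; i > 0; i -= i & -i)
            count += fenwick[i];
        return count;
    }

    // Number of deleted positions in [a, b].
    int deletedInRange(int a, int b) {
        return deletedUpTo(b) - deletedUpTo(a - 1);
    }
}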
There is a Treelist (implementation for Java, with source code), which is O(lg n) for all three ops (insert, delete, index).
Actually, the accepted name for this data structure seems to be "order statistic tree". (Apart from indexing, it's also defined to support indexof(element) in O(lg n).)
By the way, O(1) is not considered much of an advantage over O(lg n). Such differences tend to be overwhelmed by the constant factor in practice. (Are you going to have 1e18 items in the data structure? If we set that as an upper bound, that's just equivalent to a constant factor of 60 or so.)
Look into heaps. Insertion and removal should be O(log n), and peeking at the smallest element is O(1). Peeking at or retrieving the k'th element, however, will be O(log n) again.
EDITED: as amit stated, retrieval is more expensive than just peeking
This is probably not possible.
However, you can make certain changes to balanced binary trees to get the kth element in O(log n).
Read more about it here : Wikipedia.
Indexable skip lists might be able to do (close to) what you want:
http://en.wikipedia.org/wiki/Skip_lists#Indexable_skiplist
However, there's a few caveats:
It's a probabilistic data structure. That means it's not necessarily going to be O(log N) for all operations; the bounds hold in expectation.
It's not going to be O(1) for indexing, just O(log N)
Depending on the speed of your RNG and also depending on how slow traversing pointers are, you'll likely get worse performance from this than just sticking with an array and dealing with the higher cost of removals.
Most likely, something along the lines of this is going to be the "best" you can do to achieve your goals.
I am just trying to learn about binary heaps and have a doubt about the delete operation in a binary heap.
I have read that we can delete an element from a binary heap, after which we need to re-heapify it.
But at the following link, it says unavailable:
http://en.wikibooks.org/wiki/Data_Structures/Tradeoffs
                 Binary Search   AVL Tree   Binary Heap (min)   Binomial Queue (min)
Find             O(log n)        O(log n)   unavailable         unavailable
Delete element   O(log n)        O(log n)   unavailable         unavailable
I am a little confused about it.
Thanks in advance for all of the clarifications.
Binary heaps and other priority queue structures don't usually support a general "delete element" operation; you need an additional data structure that keeps track of each element's index in the heap, e.g. a hash table. If you have that, you can implement a general delete operation as
find-element, O(1) expected time with a hash table
decrease key to less than the minimum, O(lg n) time
delete-min and update the hash table, O(lg n) combined expected time.
A regular delete is possible, just like a DeleteMin/Max. The "problem" is that you have to check for both up- and down-shifts (i.e., when the "last" node takes up the vacant spot, it can be either over- or under-valued for that position). Since it can't be both, for obvious reasons, it's easy to check for correctness.
The only problem that remains is the Find. The answer above states that you can find an element in O(lg n), but I wouldn't know how. In my implementations, I generally build a heap of pointers-to-elements rather than elements (cheaper copying during up/down-shifts). I add a "position" variable to the Element type, which keeps track of the index of the Element's pointer in the heap. This way, given an element E, I can find its position in the heap in constant time.
Obviously, this isn't cut out for every implementation.
I am confused about why the delete operation of a binary heap is mentioned as unavailable in the link in your question. Deletion in a binary heap is quite possible, and it's a composition of two other binary heap operations.
https://en.wikipedia.org/wiki/Binary_heap
I am assuming you know all the other operations of a binary heap.
Deleting a key from a binary heap requires two lines of code/operations. Suppose you want to delete the item at index x. Decrease its value to the lowest integer possible, Integer.MIN_VALUE. Since it's the lowest of all integer values, it will rise to the root position once decreaseItem(int index, int newVal) has finished executing. Afterwards, extract the root by invoking the extractMin() method.
// Complexity: O(lg n)
public void deleteItem(int index) {
    // Assign the lowest value possible so that the item rises to the root
    decreaseItem(index, Integer.MIN_VALUE);
    // extractMin() then removes that item from the heap
    extractMin();
}
Full Code: BinaryHeap_Demo.java