Find nth smallest element in a Binary Search Tree

How do you find the nth smallest element in a Binary Search Tree?
The constraints are:
time complexity must be O(1)
no extra space should be used
I have already tried two approaches:
doing an inorder traversal and returning the nth element visited - time complexity O(n)
maintaining, at each node, the count of elements smaller than it, and finding the element with m smaller elements - time complexity O(log n)

The only way I can think of is to change the data structure that holds the BST in memory. It should be simple: if you consider every node as a structure itself (value, left_child and right_child), then instead of storing the nodes in an unordered array you can store them in an ordered array. The nth smallest element is then simply the nth element of the array. The extra computation moves to insertion and deletion. It would still be more efficient to use, for example, a C++ std::set (O(log n) for both insertion and deletion).
It mainly depends on your use case.
If you do not use a data structure that exploits array position to handle the tree, I don't think you can do better than O(log n).
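A minimal Python sketch of the ordered-array idea (the class and method names are illustrative, not from the original answer):

```python
from bisect import insort, bisect_left

class SortedArrayBST:
    """Keeps values in a sorted list instead of linked nodes.
    nth_smallest is O(1); insertion and deletion pay O(n) for the shifts."""
    def __init__(self):
        self.values = []

    def insert(self, value):
        insort(self.values, value)          # binary search + O(n) shift

    def delete(self, value):
        i = bisect_left(self.values, value)
        if i < len(self.values) and self.values[i] == value:
            del self.values[i]              # O(n) shift

    def nth_smallest(self, n):
        return self.values[n - 1]           # 1-based index, O(1)
```

The trade-off is exactly the one described above: nth_smallest becomes O(1), but the shifts make insert and delete O(n), which is why a balanced structure such as std::set is usually the better compromise.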

Related

existence of a certain data structure

I'm wondering: can there exist a data structure meeting the following criteria and times (it might be complicated)?
Given an unsorted list L of n elements, we build a structure S out of it like this:
Build(L,S) - in O(n) time, build the structure S from the unsorted list L
Insert(y,S) - in O(lg n) time, insert y into the structure S
DEL-MIN(S) - in O(lg n) time, delete the minimal element from S
DEL-MAX(S) - in O(lg n) time, delete the maximal element from S
DEL-MED(S) - in O(lg n) time, delete the upper median (ceiling) element from S
The problem is that the list L is unsorted. Can such a data structure exist?
DEL-MIN and DEL-MAX are easy: keep a min-heap and a max-heap of all the elements. The only trick is that you have to keep cross-indices between the heaps so that when (for example) you remove the max, you can also find it and remove it from the min-heap.
For DEL-MED, you can keep a max-heap of the elements less than the median and a min-heap of the elements greater than or equal to the median. The full description is in this answer: Data structure to find median. Note that that answer returns the floor-median, but that's easily fixed. Again, you need the cross-indexing trick to refer to the other data structures as in the first part. You will also need to think about how this handles repeated elements, if that's possible in your problem formulation. (If necessary, you can store repeated elements as (count, value) pairs in your heaps, but this complicates rebalancing on insert/remove a little.)
Can this all be built in O(n)? Yes: you can find the median of n things in O(n) time (using the median-of-medians algorithm), and heaps can be built in O(n) time.
So overall, the data structure is four heaps (a min-heap of all the elements, a max-heap of all the elements, a max-heap of the floor(n/2) smallest elements, and a min-heap of the ceil(n/2) largest elements), all with cross-indexes to each other.
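A minimal Python sketch of just the DEL-MED half of that design (two heaps split around the upper median; the cross-indexing, DEL-MIN/DEL-MAX, duplicate handling, and the O(n) build are omitted, and construction here is by repeated insertion rather than the linear-time build described above):

```python
import heapq

class MedianStructure:
    """low: max-heap (stored negated) of the floor(n/2) smallest elements;
    high: min-heap of the ceil(n/2) largest, so high[0] is the upper median."""
    def __init__(self, items=()):
        self.low, self.high = [], []
        for x in items:
            self.insert(x)

    def insert(self, x):
        if self.high and x < self.high[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        self._rebalance()

    def del_med(self):
        med = heapq.heappop(self.high)  # upper median sits at the top of high
        self._rebalance()
        return med

    def _rebalance(self):
        # Restore len(high) == ceil(n/2) and len(low) == floor(n/2).
        if len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))
        elif len(self.low) > len(self.high):
            heapq.heappush(self.high, -heapq.heappop(self.low))
```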

Partial sorting to find the kth largest/smallest elements

Source: Wikipedia
A streaming, single-pass partial sort is also possible using heaps or other priority queue data structures. First, insert the first k elements of the input into the structure. Then make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Each insertion operation also takes O(log k) time, resulting in O(n log k) time overall.
How is this different from the case where we first heapify the complete input array in O(n) time and then extract the minimum of the heap k times?
I don't understand the part where it says to make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Isn't this the same as the heapify-then-extract method described above?
The suggested method is streaming. It doesn't need to have all the items in memory to run the heapify algorithm, giving it O(k) space complexity (but it only finds the top-k items).
A more explicit description of the algorithm (see also the reference WP gives) is:
given a stream of items:
make a heap of the first k elements in the stream,
for each element after the first k:
push it onto the heap,
extract the largest (or smallest) element from the heap and discard it,
finally return the k values left in the heap.
By construction, the heap never grows to more than k + 1 elements. The items can be streamed in from disk, over a network, etc., which is not possible with the heapify algorithm.
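Here is a small Python version of that loop using the standard heapq module (the function name and the choice to collect the k smallest items are illustrative):

```python
import heapq

def k_smallest_stream(stream, k):
    """Keep a max-heap (negated values) of the k smallest items seen so far.
    Each step is O(log k), so the whole pass is O(n log k) with O(k) space."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, -x)     # fill the heap with the first k items
        else:
            heapq.heappushpop(heap, -x)  # add x, then evict the current largest
    return sorted(-v for v in heap)
```

Because the heap is the only state, the input can be any iterable: a generator reading from disk or a socket works just as well as an in-memory list.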

Design a data structure with insertion, deletion, get random in O(1) time complexity

This was a recent interview question: please design a data structure with insertion, deletion, and get-random in O(1) time complexity. It can be a basic data structure such as an array, a modification of a basic data structure, or a combination of basic data structures.
Combine an array with a hash-map of element to array index.
Insertion can be done by appending to the array and adding to the hash-map.
Deletion can be done by first looking up and removing the array index in the hash-map, then swapping the last element with that element in the array, updating the previously last element's index appropriately, and decreasing the array size by one (removing the last element).
Get random can be done by returning a random index from the array.
All operations take O(1).
Well, in reality, it's amortised (from resizing the array) expected (from expected hash collisions) O(1), but close enough.
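A Python sketch of exactly that array-plus-hash-map combination (the class name is illustrative):

```python
import random

class RandomizedSet:
    """Array of values plus a hash map of value -> array index.
    All three operations are (amortised, expected) O(1)."""
    def __init__(self):
        self.values = []
        self.index = {}

    def insert(self, value):
        if value in self.index:
            return False
        self.index[value] = len(self.values)
        self.values.append(value)
        return True

    def delete(self, value):
        i = self.index.pop(value, None)
        if i is None:
            return False
        last = self.values.pop()       # remove the last element
        if i < len(self.values):       # deleted value was not the last one:
            self.values[i] = last      # move the last element into the hole
            self.index[last] = i       # and fix its recorded index
        return True

    def get_random(self):
        return random.choice(self.values)
```

The swap-with-last trick in delete is what keeps the array free of holes, so get_random can remain a single random index lookup.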
A radix tree would work. See http://en.wikipedia.org/wiki/Radix_tree. Insertion and deletion are O(k) where k is the maximum length of the keys. If all the keys are the same length (e.g., all pointers), then k is a constant so the running time is O(1).
In order to implement get-random, maintain a record of the total number of leaves in each subtree (O(k) extra work per update). The total number of leaves in the tree is recorded at the root. To pick an element at random, generate a random integer to represent the index of the element to pick, then recursively scan down the tree, always following the branch that contains the element you picked. You always know which branch to choose because you know how many leaves can be reached from each subtree. The height of the tree is no more than k, so this is O(k), or O(1) when k is constant.
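A sketch of that idea in Python, using a plain binary trie over fixed-width integer keys rather than a full radix tree (so every key has the same length k = bits; the names are illustrative):

```python
import random

class TrieNode:
    __slots__ = ("child", "count")
    def __init__(self):
        self.child = [None, None]  # children for bit 0 and bit 1
        self.count = 0             # number of keys stored in this subtree

class RandomTrie:
    """Binary trie over fixed-width integer keys; every operation walks
    at most `bits` levels, i.e. O(k) = O(1) for constant key length."""
    def __init__(self, bits=32):
        self.bits = bits
        self.root = TrieNode()

    def _bit(self, key, level):
        return (key >> (self.bits - 1 - level)) & 1

    def insert(self, key):
        # Walk the bits of the key, incrementing subtree counts on the path.
        node = self.root
        node.count += 1
        for level in range(self.bits):
            b = self._bit(key, level)
            if node.child[b] is None:
                node.child[b] = TrieNode()
            node = node.child[b]
            node.count += 1

    def delete(self, key):
        # Assumes the key is present; decrements counts along its path.
        node = self.root
        node.count -= 1
        for level in range(self.bits):
            node = node.child[self._bit(key, level)]
            node.count -= 1

    def get_random(self):
        # Assumes at least one key is stored. Descend using subtree counts
        # so every stored key is returned with equal probability.
        r = random.randrange(self.root.count)
        node, key = self.root, 0
        for level in range(self.bits):
            left = node.child[0]
            left_count = left.count if left else 0
            if r < left_count:
                node, key = left, key << 1
            else:
                r -= left_count
                node, key = node.child[1], (key << 1) | 1
        return key
```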

Find k-th smallest element data structure

I have a problem that requires designing a data structure with O(lg n) worst-case time for the following three operations:
a) Insertion: insert a key into the data structure only if it is not already there.
b) Deletion: delete a key if it is there.
c) Find kth smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I am not sure how to handle part (c).
If anyone has an idea, please share.
Two solutions come to mind:
Use a balanced binary search tree (red-black, AVL, splay... any would do). You're already familiar with operations (1) and (2). For operation (3), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the kth smallest element in O(log n); a sketch follows after this list.
For example, let's say your tree is as follows: the root A has 10 nodes, its left child B has 3 nodes, and its right child C has 6 nodes (3 + 6 + 1 = 10). If you want to find the 8th smallest element, you know you should go to the right side.
Use a skip list. It also supports all of your operations (1), (2), (3) in O(log n) on average, but may take a bit longer to implement.
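A Python sketch of the subtree-size idea; the tree here is left unbalanced for brevity, whereas a red-black or AVL version would maintain the same size fields through rotations to get the O(log n) worst case:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1  # number of nodes in this subtree

def size(node):
    return node.size if node else 0

def insert(node, key):
    """Plain BST insert that maintains subtree sizes; duplicates ignored."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def kth_smallest(node, k):
    """k is 1-based; descend using subtree sizes, O(height) time."""
    while node:
        left = size(node.left)
        if k == left + 1:
            return node.key
        if k <= left:
            node = node.left
        else:
            k -= left + 1        # skip the left subtree and this node
            node = node.right
    raise IndexError("k out of range")
```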
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a Binary Search Tree for search and insertion is O(N), while the average-case cost is O(lg N).
Thus, I would recommend using a Red-Black Binary Search Tree, which guarantees a worst-case complexity of O(lg N) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lg N).
One solution could be to use the partitioning strategy of quicksort.
Step 1: pick the first element as the pivot and move it to its correct place (at most n comparisons).
When the pivot reaches its correct location, do a check:
step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
step 2.3: if location == k, you have got the element; break the loop/recursion.
Step 3: repeat steps 1 to 2.3 on the appropriate sublist.
The average complexity of this solution is O(n), since the sublists shrink geometrically (n + n/2 + n/4 + ... = O(n)); the worst case, with unlucky pivots, is O(n^2).
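A Python sketch of this selection loop; note it picks a random pivot rather than the first element (a substitution on my part) so the O(n) average holds even on already-sorted input:

```python
import random

def quickselect(a, k):
    """Return the k-th smallest (1-based) element of non-empty list a.
    Average O(n) time; partitions a in place."""
    lo, hi = 0, len(a) - 1
    k -= 1  # convert to a 0-based target index
    while True:
        if lo == hi:
            return a[lo]
        pivot_idx = random.randint(lo, hi)
        a[pivot_idx], a[hi] = a[hi], a[pivot_idx]
        # Lomuto partition around the pivot now sitting at a[hi].
        store = lo
        for i in range(lo, hi):
            if a[i] < a[hi]:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final position
        if store == k:
            return a[store]
        elif store > k:
            hi = store - 1   # target is in the first sublist
        else:
            lo = store + 1   # target is in the second sublist
```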
Heap is not the right structure for finding the Kth smallest element of an array, simply because you would have to remove K-1 elements from the heap in order to get to the Kth element.
There is a much better approach to finding Kth smallest element, which relies on median-of-medians algorithm. Basically any partition algorithm would be good enough on average, but median-of-medians comes with the proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific element, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many things you can do in place with arrays. The array is a wonderful data structure, and if you know how to organize its elements for a particular situation, you can get results extremely fast without additional memory use.
The heap structure is a very good example, and so is the QuickSort algorithm. And here is one really fun example of using arrays efficiently (this problem comes from a programming olympiad): Finding a Majority Element in an Array
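The linked majority-element problem is classically solved with the Boyer-Moore voting algorithm; assuming that is the solution the link describes, a minimal sketch:

```python
def majority_candidate(a):
    """Boyer-Moore majority vote: O(n) time, O(1) extra space.
    Returns the majority element if one occurs more than n/2 times;
    if no majority exists the candidate is meaningless, so a second
    counting pass should verify it."""
    candidate, count = None, 0
    for x in a:
        if count == 0:
            candidate = x
        count += 1 if x == candidate else -1
    return candidate
```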

Sorting an n-element array with O(log n) distinct elements in O(n log log n) worst-case time

The problem at hand is what's in the title: give an algorithm that sorts an n-element array with O(log n) distinct elements in O(n log log n) worst-case time. Any ideas?
Furthermore, how do you generally handle arrays with many non-distinct elements?
O(log(log(n))) time is enough for you to do a primitive operation in a search tree with O(log(n)) elements.
Thus, maintain a balanced search tree of all the distinct elements you have seen so far. Each node in the tree additionally contains a list of all elements you have seen with that key.
Walk through the input elements one by one. For each element, try to insert it into the tree (which takes O(log log n) time). If you find you've already seen an equal element, just insert it into the auxiliary list in the already-existing node.
After traversing the entire list, walk through the tree in order, concatenating the auxiliary lists. (If you take care to insert in the auxiliary lists at the right ends, this is even a stable sort).
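A Python sketch of this approach, where a sorted list of the d distinct keys stands in for the balanced search tree (with d = O(log n), the bisect search costs O(log d) = O(log log n) per element, and the O(d) insert of a brand-new key happens only d times):

```python
from bisect import bisect_left

def sort_few_distinct(a):
    """Stable sort for arrays with d = O(log n) distinct keys,
    O(n log log n) overall."""
    keys = []      # sorted list of distinct keys (stands in for the tree)
    buckets = {}   # key -> auxiliary list of equal elements, in input order
    for x in a:
        i = bisect_left(keys, x)           # O(log d) search
        if i == len(keys) or keys[i] != x:
            keys.insert(i, x)              # new key: O(d), happens only d times
            buckets[x] = []
        buckets[x].append(x)               # append at the right end
    out = []
    for k in keys:                         # concatenate lists in key order
        out.extend(buckets[k])
    return out
```

Appending each element at the right end of its bucket and concatenating the buckets in key order is what makes the sort stable, as the answer notes.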
A simple O(log n)-space solution would be:
find the distinct elements using a balanced tree (O(log n) space, O(n log log n) time for the n insertions)
Then you can use this tree to always pick the correct pivot for quicksort.
I wonder if there is an O(log log n)-space solution.
Some details about using a tree:
You should be able to use a red-black tree (or another type of balanced tree) with nodes that hold both a value and a counter: say a tuple (value, count).
When you insert a new value, you either create a new node or increment the count of the existing node holding that value. If you just increment the counter, it takes O(H), where H is the height of the tree (to find the node); if you need to create the node, it also takes O(H) to create and position it (the constants are bigger, but it's still O(H)).
This ensures that the tree holds no more than O(log n) nodes (because there are only O(log n) distinct values), so its height H is O(log log n). Each of the n insertions therefore takes O(log log n), giving O(n log log n) in total.
