Find k-th smallest element data structure - algorithm

I have a problem here that requires designing a data structure that supports the following three operations in O(lg n) worst case:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there!
c) Find kth smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I am not sure how to deal with the c) part.
If anyone has any idea, please share.

Two solutions come in mind:
Use a balanced binary search tree (red-black, AVL, splay, ... any would do). You're already familiar with operations (a) and (b). For operation (c), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the kth smallest element in O(log(n)); a sketch follows below.
For example, say your tree looks as follows: the root A has 10 nodes, its left child B has 3 nodes, and its right child C has 6 nodes (3 + 6 + 1 = 10). If you want to find the 8th smallest element, you know you should go to the right side, because B plus A account for only the 4 smallest elements; you then look for the (8 - 4) = 4th smallest element within C.
Use a skip list. It also supports all of your (a), (b), (c) operations in O(log n) expected time, but it may take a bit longer to implement.
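Here is a minimal sketch of that order-statistic lookup on a size-augmented BST, assuming a hypothetical node shape with key, left, right and size fields (the rebalancing that keeps the tree at O(lg n) height is omitted):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.size = 1  # total number of nodes in this subtree

    def kth_smallest(node, k):
        # Returns the k-th smallest key (1-indexed), or None if k is out of range.
        while node is not None:
            left_size = node.left.size if node.left else 0
            if k <= left_size:
                node = node.left            # answer lies in the left subtree
            elif k == left_size + 1:
                return node.key             # this node is the k-th smallest
            else:
                k -= left_size + 1          # skip the left subtree and this node
                node = node.right
        return None

Insertion and deletion just need to keep the size fields correct along the path they touch (including through rotations), which is O(1) extra work per node, so the O(lg n) worst-case bounds of a red-black or AVL tree are preserved.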

Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.

The worst-case cost of a plain binary search tree for search and insertion is O(N), while the average-case cost is O(lg N).
Thus, I would recommend using a red-black binary search tree, which guarantees a worst-case complexity of O(lg N) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lg N).

One solution could use the partitioning strategy of quicksort (i.e., quickselect).
Step 1: pick the first element as the pivot and move it to its correct sorted place (at most n comparisons).
When the pivot reaches its correct location, do a check:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
Step 2.3: if location == k, you have got the element; break the loop/recursion.
Step 3: repeat steps 1 to 2.3 on the appropriate sublist.
The complexity of this solution is O(n) on average (each pass discards one sublist, so the expected work is n + n/2 + n/4 + ... = O(n)), but O(n^2) in the worst case.
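A minimal sketch of that strategy, assuming a 1-indexed k and the naive first-element pivot described above:

    def quickselect(arr, k):
        # Returns the k-th smallest element (1-indexed) of arr.
        # Average O(n); worst case O(n^2) with this naive pivot choice.
        assert 1 <= k <= len(arr)
        lst = list(arr)
        lo, hi = 0, len(lst) - 1
        while True:
            pivot = lst[lo]                        # Step 1: first element as pivot
            i = lo
            for j in range(lo + 1, hi + 1):        # partition around the pivot
                if lst[j] < pivot:
                    i += 1
                    lst[i], lst[j] = lst[j], lst[i]
            lst[lo], lst[i] = lst[i], lst[lo]      # pivot lands at index i
            location = i + 1                       # its 1-indexed final position
            if location > k:
                hi = i - 1                         # Step 2.1: keep the first sublist
            elif location < k:
                lo = i + 1                         # Step 2.2: keep the second sublist
            else:
                return pivot                       # Step 2.3: found it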

A heap is not the right structure for finding the Kth smallest element of an array, simply because you would have to remove K-1 elements from the heap in order to get to the Kth element.
There is a much better approach to finding Kth smallest element, which relies on median-of-medians algorithm. Basically any partition algorithm would be good enough on average, but median-of-medians comes with the proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific element, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many, many things that you can do in-place with arrays. The array is a wonderful data structure, and if you know how to organize its elements for a particular situation, you can get results extremely fast and without additional memory use.
The heap structure is a very good example, and the QuickSort algorithm as well. And here is one really funny example of using arrays efficiently (this problem comes from a programming olympiad): Finding a Majority Element in an Array
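For reference, a compact sketch of the median-of-medians idea; this is not the linked C# implementation, just a simplified Python rendering of the same algorithm:

    def select(arr, k):
        # k-th smallest element (1-indexed), worst-case O(n) via median-of-medians.
        if len(arr) <= 5:
            return sorted(arr)[k - 1]
        # Take the median of each group of five elements.
        groups = [sorted(arr[i:i + 5]) for i in range(0, len(arr), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursively pick the median of medians as a guaranteed-good pivot.
        pivot = select(medians, (len(medians) + 1) // 2)
        lows = [x for x in arr if x < pivot]
        highs = [x for x in arr if x > pivot]
        n_equal = len(arr) - len(lows) - len(highs)
        if k <= len(lows):
            return select(lows, k)
        elif k <= len(lows) + n_equal:
            return pivot
        else:
            return select(highs, k - len(lows) - n_equal)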

Related

Find nth Smallest element from Binary Search Tree

How to find the nth smallest element in a Binary Search Tree?
Constraints are:
time complexity must be O(1)
No extra space should be used
I have already tried 2 approaches.
Doing inorder traversal and finding nth element - Time complexity O(n)
Maintaining the number of elements smaller than the current node and finding the element with m smaller elements - time complexity O(log n)
The only way I can think of is to change the data structure that holds the BST in memory. It would be simple: instead of considering every node as a structure of its own (value, left_child and right_child), store the values in an ordered array. Then the nth smallest element is just the nth element of your array. The extra computation will be at insertion and deletion. But it might still be more effective to use, for example, a C++ set (O(log n) for both insertion and deletion).
It mainly depends on your use case.
If you do not use a data structure that stores the tree based on array positions, I don't think you can do better than O(log n).
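A minimal sketch of that ordered-array idea (the class name and methods are hypothetical): the nth smallest becomes a direct index, at the price of O(n) shifting on updates:

    import bisect

    class SortedArrayStore:
        # Keys kept in a sorted Python list; a stand-in for storing
        # the BST's values in an ordered array.
        def __init__(self):
            self.keys = []

        def insert(self, key):
            i = bisect.bisect_left(self.keys, key)
            if i == len(self.keys) or self.keys[i] != key:
                self.keys.insert(i, key)   # O(n) worst case: shifts elements

        def delete(self, key):
            i = bisect.bisect_left(self.keys, key)
            if i < len(self.keys) and self.keys[i] == key:
                self.keys.pop(i)           # O(n) worst case: shifts elements

        def nth_smallest(self, n):
            return self.keys[n - 1]        # O(1); n is 1-indexed

The O(1) lookup constraint is met, but insertion and deletion pay the shifting cost, which is exactly the trade-off described above.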

Complexity of inserting n numbers into a binary search tree

I have got a question, and it says "calculate the tight time complexity for the process of inserting n numbers into a binary search tree". It does not say whether the tree is balanced or not. So, what answer can be given to such a question? If the tree is balanced, then its height is log n, and inserting n numbers takes O(n log n) time. But if it is unbalanced, it may take even O(n^2) time in the worst case. What does it mean to find the tight time complexity of inserting n numbers into a BST? Am I missing something? Thanks
It could be O(n^2) even if the tree starts out balanced.
Suppose you're adding a sorted list of numbers, all larger than the largest number already in the tree. In that case, each number will be added as the right child of the rightmost node in the tree, hence O(n^2).
For example, suppose that you add the numbers [15..115] to a tree in which every existing key is smaller than 15.
The numbers will be added as a long chain, each node having a single right-hand child. For the i-th element of the list, you'll have to traverse ~i nodes, which yields O(n^2) in total.
In general, if you'd like to keep each insertion and retrieval at O(log n) (so O(n log n) for all n insertions), you need to use self-balancing trees.
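A small sketch that reproduces the degenerate case with plain, non-balancing inserts:

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Naive BST insert with no rebalancing; returns the (new) root.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root

    # Inserting the sorted keys 15..115 produces a right-leaning chain:
    # the i-th insert walks past ~i nodes, so n inserts cost
    # 1 + 2 + ... + (n - 1) = O(n^2) comparisons in total.
    root = None
    for key in range(15, 116):
        root = insert(root, key)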
What the wiki says is correct.
Since the given tree is a BST, one need not search through the entire tree; comparing the element to be inserted with the roots of the tree/subtrees finds the appropriate node for the element. This takes O(log2 n) when the tree is balanced.
Once we have such a node we can insert the key there, but after that (if the tree is kept in an array) all the elements in the right sub-tree must be pushed to the right, so that the BST's search property does not get violated. If the place to be inserted happens to be the very last one, we need not worry about this second step; if not, this step may take O(n) in the worst case!
So the overall worst-case complexity of inserting an element into a BST is O(n).
Thanks!

What search tree should I use when I know all the probabilities of accessing each element

In case I don't know the probabilities of accessing each element, but I'm sure that some elements will be accessed far more often than others, I will use a splay tree. What should I use if I already know all the probabilities? I assume that there should be some data structure that is better than splay trees for this case.
I'm trying to work out all the cases where and when I should use each type of search tree. Maybe someone can post some links to articles comparing all the search trees and similar structures?
EDIT: I'd still like to have O(log n) as the worst case, but on average it should be faster. Splay trees are a good example, but I'd like to predefine the configuration of the tree.
For example, I have an array of elements to store [a1, a2, .. an], and the probabilities for each element [p1, p2, .. pn], which define how often I will access each element. I can create a splay tree, add each element to it (O(n log n)), and then access them with the given probabilities to shape the desired tree. So if I have probabilities [1/2, 1/4, 1/4], I need to splay the first element to make it be near the top. So I need to order the elements by probability and splay them from the lowest to the highest access probability. That also takes O(n log n). So the overall time of building such a tree is O(n log n) with a big constant. My goal is to lower this number.
I do not mind using something other than a search tree, but I'd like the build time to be lower than in the case of a splay tree. And I want search, insert and delete to be in the range of O(log n) amortized.
Edit: I didn't see that you wanted to update the tree dynamically - the below algorithm requires all elements and probabilities to be known in advance. I'll leave the post up in case someone in such a situation comes along.
If you happen to be in possession of the third edition of Introduction to Algorithms by Cormen et al., it describes a dynamic programming algorithm for creating optimal binary search trees when you know all of the probabilities.
Here is a rough outline of the algorithm: First, sort the elements (on element value, not probability). We don't yet know which element should be the root of the tree, but we know that all elements that will be to the left of the root in the tree will be to the left of that element in the list, and vice versa for the elements to the right of the root. If we choose the element at index k to be the root, we get two subproblems: how to construct an optimal tree for the elements 0 through k-1, and for the elements k+1 through n-1. Solve these problems recursively, so that you know the expected cost for a search in a tree where the root is element k. Do this for all possible choices of k, and you will find which tree is the best one. Use dynamic programming or memoization in order to save computation time.
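A minimal sketch of that dynamic program, simplified to successful-search probabilities only (the full CLRS treatment also handles failed searches via dummy keys) and returning just the optimal expected cost rather than the tree itself:

    def optimal_bst_cost(p):
        # p[i] = access probability of the i-th smallest key (p sums to 1).
        # Returns the minimal expected search cost, counting the root as depth 1.
        # O(n^3) dynamic program; a classical refinement due to Knuth gives O(n^2).
        n = len(p)
        if n == 0:
            return 0.0
        # prefix[i] = p[0] + ... + p[i-1], so keys i..j have mass prefix[j+1] - prefix[i].
        prefix = [0.0] * (n + 1)
        for i in range(n):
            prefix[i + 1] = prefix[i] + p[i]
        # cost[i][j] = expected cost of an optimal tree over keys i..j.
        cost = [[0.0] * n for _ in range(n)]
        for i in range(n):
            cost[i][i] = p[i]
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                best = float("inf")
                for r in range(i, j + 1):          # try every key as the root
                    left = cost[i][r - 1] if r > i else 0.0
                    right = cost[r + 1][j] if r < j else 0.0
                    best = min(best, left + right)
                # Every key in i..j sits one level deeper than in its subtree,
                # so the whole range's probability mass is added once.
                cost[i][j] = best + (prefix[j + 1] - prefix[i])
        return cost[0][n - 1]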
Use a hash table.
You never mentioned needing ordered iteration, and by sacrificing this you can achieve amortized O(1) insert/access complexity, better than O(log n).
Specifically, use a hash table with linked list buckets, and use the move-to-front optimization. What this means is each time you search a bucket (linked list) with more than one item, you move the item found to the front of that bucket. The next time you access this element, it will already be at the front.
If you know the access probabilities, you can further refine the technique. When inserting a new element into a bucket, don't insert it onto the front, but rather insert such that you maintain most-probable-first order. Note the move-to-front technique will tend to perform this sort implicitly already, but you can help it bootstrap more quickly.
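A minimal sketch of that bucket discipline (the class and method names are hypothetical): lookups move the found item to the front of its bucket, and inserts place new items in most-probable-first order to bootstrap that arrangement:

    class MoveToFrontTable:
        def __init__(self, nbuckets=64):
            self.buckets = [[] for _ in range(nbuckets)]

        def _bucket(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key, prob):
            bucket = self._bucket(key)
            # Keep the bucket in most-probable-first order on insert.
            i = 0
            while i < len(bucket) and bucket[i][1] >= prob:
                i += 1
            bucket.insert(i, (key, prob))

        def lookup(self, key):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    # Move-to-front: the next lookup finds it immediately.
                    bucket.insert(0, bucket.pop(i))
                    return True
            return False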
If your tree is not going to change once created, you probably should use a hash table or tango tree:
http://en.wikipedia.org/wiki/Tango_tree
Hash tables, when not overloaded, have O(1) lookup, degrading to O(n) when overloaded.
Tango trees, once constructed, offer O(log log n)-competitive lookup. They do not support deletion or insertion.
There's also something known as a "perfect hash" that might be good for your use.

Data structure to find median

This is an interview question. Design a class, which stores integers and provides two operations:
void insert(int k)
int getMedian()
I guess I can use a BST so that insert takes O(logN) and getMedian takes O(logN) (for getMedian I should add the number of left/right children for each node).
Now I wonder if this is the most efficient solution and there is no better one.
You can use 2 heaps, that we will call Left and Right.
Left is a Max-Heap.
Right is a Min-Heap.
Insertion is done like this:
If the new element x is smaller than the root of Left then we insert x to Left.
Else we insert x to Right.
If after insertion Left has two more elements than Right, then we call Extract-Max on Left and insert the result into Right.
Else if after insertion Right has more elements than Left, then we call Extract-Min on Right and insert the result into Left.
The median is always the root of Left.
So insertion is done in O(lg n) time and getting the median is done in O(1) time.
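A minimal sketch with Python's heapq, which only provides min-heaps, so the max-heap side stores negated values (the class name is hypothetical):

    import heapq

    class MedianStore:
        # left is a max-heap of the smaller half (values negated),
        # right is a min-heap of the larger half; left may hold one extra element.
        def __init__(self):
            self.left = []
            self.right = []

        def insert(self, k):
            if not self.left or k <= -self.left[0]:
                heapq.heappush(self.left, -k)
            else:
                heapq.heappush(self.right, k)
            # Rebalance so that 0 <= len(left) - len(right) <= 1.
            if len(self.left) > len(self.right) + 1:
                heapq.heappush(self.right, -heapq.heappop(self.left))
            elif len(self.right) > len(self.left):
                heapq.heappush(self.left, -heapq.heappop(self.right))

        def get_median(self):
            return -self.left[0]   # the (lower) median is always Left's root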
See this Stack Overflow question for a solution that involves two heaps.
Would it beat an array of integers which performs a sort at insertion time, with a sorting algorithm dedicated to integers (http://en.wikipedia.org/wiki/Sorting_algorithm)? If you choose a candidate whose per-insertion cost is below O(log(n)), then, using an array, getMedian would just take the element at half of the size, which is O(1). It seems possible to me to do better than log(n) + log(n).
Plus, by being a little more flexible, you can improve your performance by changing your sort algorithm according to the properties of your input (is the input almost sorted or not, ...).
I am pretty much an autodidact in computer science, but that is the way I would do it: simpler is better.
You could consider a self-balancing tree, too. If the tree is fully balanced, then the root node is your median. Say the tree is one level deeper on one end; then you just need to know how many nodes there are on the deeper side to pick the correct median.

Listing values in a binary heap in sorted order using breadth-first search?

I'm currently reading this paper and on page five, it discusses properties of binary heaps that it considers to be common knowledge. However, one of the points they make is something that I haven't seen before and can't make sense of. The authors claim that if you are given a balanced binary heap, you can list the elements of that heap in sorted order in O(log n) time per element using a standard breadth-first search. Here's their original wording:
In a balanced heap, any new element can be inserted in logarithmic time. We can list the elements of a heap in order by weight, taking logarithmic time to generate each element, simply by using breadth first search.
I'm not sure what the authors mean by this. The first thing that comes to mind when they say "breadth-first search" would be a breadth-first search of the tree elements starting at the root, but that's not guaranteed to list the elements in sorted order, nor does it take logarithmic time per element. For example, running a BFS on this min-heap produces the elements out of order no matter how you break ties:
        1
       / \
     10   100
    /  \
  11    12
This always lists 100 before either 11 or 12, which is clearly wrong.
Am I missing something? Is there a simple breadth-first search that you can perform on a heap to get the elements out in sorted order using logarithmic time each? Clearly you can do this by destructively modifying heap by removing the minimum element each time, but the authors' intent seems to be that this can be done non-destructively.
You can get the elements out in sorted order by traversing the heap with a priority queue (which requires another heap!). I guess this is what he refers to as a "breadth first search".
I think you should be able to figure it out (given your rep in algorithms) but basically the key of the priority queue is the weight of a node. You push the root of the heap onto the priority queue. Then:
while pq isn't empty:
    pop the minimum node off pq
    append its value to the output list (the sorted elements)
    push its children (if any) onto pq
I'm not really sure (at all) if this is what he was referring to but it vaguely fitted the description and there hasn't been much activity so I thought I might as well put it out there.
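For concreteness, a runnable rendering of that loop, assuming a hypothetical node shape with key, left and right fields. It works because every unvisited node always has an ancestor in the priority queue, so the queue's minimum is the globally smallest unlisted element; each element costs one O(log n) pop plus at most two pushes:

    import heapq
    from itertools import count

    def in_sorted_order(root):
        # Non-destructive traversal of a min-heap in sorted order.
        tie = count()   # tiebreaker so equal keys never compare node objects
        pq = [(root.key, next(tie), root)]
        while pq:                                   # "while pq isn't empty"
            key, _, node = heapq.heappop(pq)        # "pop off pq"
            yield key                               # "append to output list"
            for child in (node.left, node.right):   # "push children onto pq"
                if child is not None:
                    heapq.heappush(pq, (child.key, next(tie), child))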
In case you know that all elements lower than 100 are on the left, you can go left; but even if you go to 100, you can see that there are no elements on its left, so you back out. You visit each node at worst twice before you realize that the element you are searching for is not there. That means you walk this tree at most 2*log(N) steps per element, which simplifies to O(log N) complexity.
The point is that even if you "screw up" and traverse to the "wrong" node, you visit that node at worst once.
EDIT
This is just how heapsort works. You can imagine that you have to re-establish the heap, at O(log n) cost, each time you take out the top element.
