Find the Median of an AVL Tree in O(1) [duplicate]

If you have an AVL tree, what's the best way to get the median from it? The median would be defined as the element with index ceil(n/2) (index starts with 1) in the sorted list.
So if the list was
1 3 5 7 8
the median is 5. If the list was
1 3 5 7 8 10
the median is 5.
If you can augment the tree, I think it's best to have each node store the size (number of nodes) of its subtree, i.e. 1 + left.size + right.size. Using this, the best approach I can think of makes median searching O(lg n), because you can traverse the tree by comparing indexes.
Is there a better way?

Augmenting the AVL tree to store subtree sizes is generally the best approach here if you need efficient median queries. A median query then takes time O(log n), which is pretty fast.
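Here is a minimal sketch of that idea in Python. The node layout and helper names (select, median) are illustrative, and for brevity the insert below is a plain BST insert; a real AVL tree would also rebalance here and fix up the size fields during rotations.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in this subtree, including self

def size(node):
    return node.size if node else 0

def insert(root, key):
    # Plain BST insert, shown only to keep the sketch self-contained.
    # An AVL insert would rebalance here and update sizes in rotations.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size = 1 + size(root.left) + size(root.right)
    return root

def select(root, k):
    """Return the k-th smallest key (1-based). O(height) time."""
    rank = size(root.left) + 1  # rank of this node within its subtree
    if k == rank:
        return root.key
    if k < rank:
        return select(root.left, k)
    return select(root.right, k - rank)

def median(root):
    # ceil(n/2) == (n + 1) // 2 with the 1-based indexing defined above.
    return select(root, (root.size + 1) // 2)
```

For example, inserting 1, 3, 5, 7, 8, 10 and calling median gives 5, matching the definition in the question.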
If you'll be computing the median a huge number of times, you could potentially use an augmented tree and also cache the median value so that you can read it in time O(1). Each time you do an insertion or deletion, you might need to recompute the median in time O(log n), which will slow things down a bit but not impact the asymptotic costs.
Another option would be to thread a doubly-linked list through the nodes in the tree so that you can navigate from a node to its successor or predecessor in constant time. If you do that, then you can store a pointer to the median element, and then on an insertion or a deletion, move the pointer to the left or to the right as appropriate. If you delete the median itself, you can just move the pointer left or right as you'd like. This doesn't require any augmentation and might be a bit faster, but it adds two extra pointers into each node.
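To make the pointer maintenance concrete, here is a small sketch of the insertion side. The class and field names are hypothetical, and the sorted doubly-linked list below is searched linearly only to keep the example self-contained; in the real structure the list is threaded through the AVL tree, so locating the insertion point is O(log n), while the median-pointer update itself is O(1) either way. Deletion is symmetric.

```python
class ListNode:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class MedianList:
    def __init__(self):
        self.head = None
        self.median = None  # element with 1-based rank ceil(n/2)
        self.n = 0

    def insert(self, key):
        node = ListNode(key)
        # O(n) walk; stands in for the tree's O(log n) search.
        if self.head is None or key < self.head.key:
            node.next = self.head
            if self.head:
                self.head.prev = node
            self.head = node
        else:
            cur = self.head
            while cur.next and cur.next.key <= key:
                cur = cur.next
            node.next = cur.next
            node.prev = cur
            if cur.next:
                cur.next.prev = node
            cur.next = node
        # O(1) median-pointer update, by parity of the old size n:
        if self.n == 0:
            self.median = node
        elif self.n % 2 == 0:
            # target rank grows from n/2 to n/2 + 1
            if key >= self.median.key:
                self.median = self.median.next
        else:
            # target rank stays at (n + 1) / 2
            if key < self.median.key:
                self.median = self.median.prev
        self.n += 1
```

For example, inserting 5, 3, 7, 1, 8 in that order leaves the pointer on 5, the median of 1 3 5 7 8.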
Hope this helps!

Related

Why is the top down approach of heap construction less efficient than bottom up even though its order of growth is lower O(log n) over O(n)?

How is the bottom up approach of heap construction of the order O(n) ? Anany Levitin says in his book that this is more efficient compared to top down approach which is of order O(log n). Why?
That to me seems like a typo.
There are two standard algorithms for building a heap. The first is to start with an empty heap and to repeatedly insert elements into it one at a time. Each individual insertion takes time O(log n), so we can upper-bound the cost of this style of heap-building at O(n log n). It turns out that, in the worst case, the runtime is Θ(n log n), which happens if you insert the elements in reverse-sorted order.
The other approach is the heapify algorithm, which builds the heap directly by starting with each element in its own binary heap and progressively coalescing them together. This algorithm runs in time O(n) regardless of the input.
The reason why the first algorithm requires time Θ(n log n) is that, if you look at the second half of the elements being inserted, you'll see that each of them is inserted into a heap whose height is Θ(log n), so the cost of doing each bubble-up can be high. Since there are n / 2 elements and each of them might take time Θ(log n) to insert, the worst-case runtime is Θ(n log n).
On the other hand, the heapify algorithm spends the majority of its time working on small heaps. Half the elements are inserted into heaps of height 0, a quarter into heaps of height 1, an eighth into heaps of height 2, etc. This means that the bulk of the work is spent inserting elements into small heaps, which is significantly faster.
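As a quick illustration of the two strategies, here is a sketch using Python's heapq module, whose heapify is documented to run in linear time:

```python
import heapq
import random

data = [random.random() for _ in range(100_000)]

# Insertion-based construction: n pushes, O(n log n) in the worst case.
h1 = []
for x in data:
    heapq.heappush(h1, x)

# Bottom-up construction (heapify): Theta(n) regardless of input order.
h2 = list(data)
heapq.heapify(h2)
```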
If you consider swapping to be your basic operation:
In top down construction, the tree is constructed first and a heapify function is called on the nodes. In the worst case an element is swapped log n times (to sift it to the top of a tree whose height is log n), and this can happen for each of the n/2 elements in the leaf level, which gives an O(n log n) upper bound.
In bottom up construction, you treat all the leaf nodes as already-valid heaps in the first pass, so heapify is called only on the n/2 internal nodes. At each level going up, the number of possible swaps increases, but the number of nodes on which they happen decreases.
For example:
At the level right above the leaf nodes, we have n/4 nodes that can have at most 1 swap each.
At the level above that, we have n/8 nodes that can have at most 2 swaps each, and so on.
Summing these up, we come out with an O(n) bound for bottom up construction of a heap, as made explicit below.
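Making that summation explicit (a standard calculation; n/2^(h+1) bounds the number of nodes of height h):

```latex
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{n}{2^{h+1}} \cdot O(h)
  = O\!\left( n \sum_{h \ge 0} \frac{h}{2^{h+1}} \right)
  = O(n),
\qquad \text{since } \sum_{h \ge 0} \frac{h}{2^{h}} = 2.
```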
It generally refers to a way of solving a problem, especially in computer science algorithms.
Top down:
Take the whole problem and split it into two or more parts.
Find solutions to these parts.
If these parts turn out to be too big to be solved as a whole, split them further and find solutions to those sub-parts.
Merge the solutions according to the sub-problem hierarchy thus created once all parts have been successfully solved.
Bottom up:
Break the problem into the smallest possible (and practical) parts.
Find solutions to these small sub-problems.
Merge the solutions you get iteratively (again and again) until you have merged all of them into the final solution to the "big" problem.
The main difference in approach is splitting versus merging. You either start big and split "down" as required, or start with the smallest and merge your way "up" to the final solution.
In the regular heapify(), we perform two comparisons on each node from top to bottom to find the largest of three elements:
the parent node with its left child;
the larger node from the first comparison with the second child.
Bottom-up heapsort, on the other hand, only compares the two children and follows the larger child to the end of the tree ("top-down"). From there, the algorithm goes back towards the tree root ("bottom-up") and searches for the first element larger than the root. From this position, all elements are moved one position towards the root, and the root element is placed in the slot that has become free. A sketch of the regular two-comparison sift is shown below.
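Here is a minimal sketch of that regular two-comparison sift-down (the function names are mine), together with the bottom-up build loop that calls it on every parent from the last one up to the root:

```python
def sift_down(a, i, n):
    """Standard sift-down: two comparisons per level, as described above."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:    # comparison 1: parent vs. left child
            largest = left
        if right < n and a[right] > a[largest]:  # comparison 2: winner vs. right child
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]      # swap and continue one level down
        i = largest

def build_max_heap(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # last parent down to the root
        sift_down(a, i, n)
```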
A binary heap can be built in two ways:
Top-Down Approach
Bottom-Up Approach
In the Top-Down Approach, you first begin with 3 elements. You consider 2 of them as heaps and the third as a key k. You then create a new heap by joining these two sub-heaps with the key as the root node. Then you perform heapify to maintain the heap order (either min- or max-heap order).
Then, we take two such heaps (containing 3 elements each) and another element as a key k, and create a new heap. We keep repeating this process, increasing the size of each sub-heap, until all elements are added.
This process adds half the elements in the bottom level, a quarter in the second-to-last one, an eighth in the third-to-last one and so on; therefore, the complexity of this approach works out to O(n).
In the bottom up approach, we first simply create a complete binary tree from the given elements. We then apply a DownHeap operation on each parent of the tree, starting from the last parent and going up the tree until the root. This is a much simpler approach. A naive bound is O(n log n), since DownHeap's worst case is O(log n) and we apply it to the n/2 internal nodes; however, the level-by-level summation shown earlier tightens this to O(n) as well, because most of those DownHeap calls act on very short subtrees.
Regards.


Find median in O(1) in binary tree

Suppose I have a balanced BST (binary search tree). Each tree node contains a special field count, which counts all descendants of that node plus the node itself. This data structure is called an order statistics tree.
It supports two O(logN) operations:
rank(x) -- number of elements that are less than x
findByRank(k) -- find the node with rank == k
Now I would like to add a new operation median() to find the median. Can I assume this operation is O(1) if the tree is balanced?
Unless the tree is complete, the median might be a leaf node, so in the general case the cost will be O(logN). I guess there is a data structure with the requested properties and an O(1) findMedian operation (perhaps a skip list plus a pointer to the median node; I'm not sure about the findByRank and rank operations though), but a balanced BST is not one of them.
If the tree is complete (i.e. all levels completely filled), yes you can.
In a balanced order statistics tree, finding the median is O(log N). If it is important to find the median in O(1) time, you can augment the data structure by maintaining a pointer to the median. The catch, of course, is that you would need to update this pointer during each Insert or Delete operation. Updating the pointer would take O(log N) time, but since those operations already take O(log N) time, the extra work of updating the median pointer does not change their big-O cost.
As a practical matter, this only makes sense if you do a lot of "find median" operations compared to the number of insertions/deletions.
If desired, you could reduce the cost of updating the median pointer during Insert/Delete to O(1) by using a (doubly) threaded binary tree, but Insert/Delete would still be O(log N).

Data structure to find median

This is an interview question. Design a class, which stores integers and provides two operations:
void insert(int k)
int getMedian()
I guess I can use a BST so that insert takes O(logN) and getMedian takes O(logN) (for getMedian I should store the number of left/right children for each node).
Now I wonder if this is the most efficient solution and there is no better one.
You can use 2 heaps, that we will call Left and Right.
Left is a Max-Heap.
Right is a Min-Heap.
Insertion is done like this:
If the new element x is smaller than the root of Left (or Left is empty), then we insert x into Left.
Else we insert x into Right.
If after the insertion the count of elements of Left exceeds that of Right by more than 1, then we call Extract-Max on Left and insert the result into Right.
Else, if after the insertion Right has more elements than Left, then we call Extract-Min on Right and insert the result into Left.
The median is always the root of Left.
So insertion is done in O(lg n) time and getting the median is done in O(1) time.
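A compact sketch of this in Python, using heapq (a min-heap, so the max-heap side stores negated values); the class name is mine:

```python
import heapq

class MedianKeeper:
    """Two heaps: 'left' is a max-heap (stored negated), 'right' a min-heap.
    Invariant: len(left) == len(right) or len(left) == len(right) + 1,
    so the median (element of rank ceil(n/2)) is always the root of left."""

    def __init__(self):
        self._left = []   # max-heap via negation
        self._right = []  # min-heap

    def insert(self, x):
        if not self._left or x <= -self._left[0]:
            heapq.heappush(self._left, -x)
        else:
            heapq.heappush(self._right, x)
        # Rebalance to restore the size invariant.
        if len(self._left) > len(self._right) + 1:
            heapq.heappush(self._right, -heapq.heappop(self._left))
        elif len(self._right) > len(self._left):
            heapq.heappush(self._left, -heapq.heappop(self._right))

    def get_median(self):
        return -self._left[0]  # O(1)
```

For example, inserting 1, 3, 5, 7, 8, 10 one at a time leaves get_median() returning 5.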
See this Stack Overflow question for a solution that involves two heaps.
Would it beat an array of integers that keeps itself sorted at insertion time, using a sorting algorithm dedicated to integers (http://en.wikipedia.org/wiki/Sorting_algorithm), if you choose a candidate that costs less than O(log(n)) per insertion? With an array, getMedian would just take the element at the index at half of the size, which is O(1), no? It seems possible to me to do better than log(n) + log(n).
Plus, by being a little more flexible, you can improve your performance by changing your sort algorithm according to the properties of your input (is the input almost sorted or not, ...).
I am pretty much an autodidact in computer science, but that is the way I would do it: simpler is better.
You could consider a self-balancing tree, too. If the tree is perfectly balanced (every level full), then the root node is your median. Say the tree is one level deeper on one side; then you just need to know how many nodes are on the deeper side to pick the correct median.

Find k-th smallest element data structure

I have a problem here that requires designing a data structure that takes O(lg n) worst case for the following three operations:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there!
c) Find kth smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I am not sure how to deal with part (c).
If anyone has any ideas, please share.
Two solutions come to mind:
Use a balanced binary search tree (red-black, AVL, splay... any would do). You're already familiar with operations (a) and (b). For operation (c), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the kth smallest element in O(log(n)), as in the sketch below.
For example, say your tree is as follows: the root A has 10 nodes, its left child B has 3 nodes, and its right child C has 6 nodes (3 + 6 + 1 = 10). Suppose you want to find the 8th smallest element: since 8 > 3 + 1, you know you should go to the right side, and there you now look for the (8 - 3 - 1) = 4th smallest element.
Use a skip list. It also supports all of your (a), (b), (c) operations in O(log(n)) on average, but may take a bit longer to implement.
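Here is the sketch promised in the first solution, assuming each node carries a count field holding its subtree size (the function name is mine):

```python
def kth_smallest(node, k):
    """Return the k-th smallest key (1-based) in a BST whose nodes have a
    'count' field holding their subtree size. O(height) time."""
    left_size = node.left.count if node.left else 0
    if k == left_size + 1:
        return node.key          # this node is the k-th smallest
    if k <= left_size:
        return kth_smallest(node.left, k)
    return kth_smallest(node.right, k - left_size - 1)
```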
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a Binary Search Tree for search and insertion is O(N) while the average-case cost is O(lgN).
Thus, I would recommend using a Red-Black Binary Search Tree which guarantees a worst-case complexity of O(lgN) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lgN).
One of the solutions could be to use the partitioning strategy of quicksort.
Step 1: Pick the first element as the pivot and move it to its correct place (at most n checks).
When the pivot reaches its correct location, do a check:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
Step 2.3: if location == k, you have found the element; break the loop/recursion.
Step 3: repeat steps 1 to 2.3 on the appropriate sublist.
The average-case complexity of this solution is O(n), since the expected work is n + n/2 + n/4 + ... <= 2n; the worst case, with unlucky pivots, is O(n^2). A sketch follows.
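A minimal sketch of that selection procedure (usually called quickselect), here with a random pivot instead of always the first element, which is what makes the O(n) average hold regardless of input order:

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element of list a (1-based).
    Average O(n); worst case O(n^2) with unlucky pivots."""
    lo, hi = 0, len(a) - 1
    k -= 1  # convert to a 0-based rank
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)           # random pivot
        a[p], a[hi] = a[hi], a[p]
        store = lo
        for i in range(lo, hi):              # Lomuto partition
            if a[i] < a[hi]:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]    # pivot lands in its final place
        if k == store:
            return a[store]
        if k < store:
            hi = store - 1
        else:
            lo = store + 1
```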
A heap is not the right structure for finding the Kth smallest element of an array, simply because you would have to remove K-1 elements from the heap in order to get to the Kth element.
There is a much better approach to finding Kth smallest element, which relies on median-of-medians algorithm. Basically any partition algorithm would be good enough on average, but median-of-medians comes with the proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific element, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many, many things that you can do in-place with arrays. An array is a wonderful data structure, and if you know how to organize its elements for a particular situation, you can get results extremely fast and without additional memory use.
The heap structure is a very good example, and so is the QuickSort algorithm. And here is one really funny example of using arrays efficiently (this problem comes from a programming Olympiad): Finding a Majority Element in an Array
