Sorting a unique array in less than O(nlogn) - algorithm

The question goes like this:
Assume I have an array of real numbers X[x1,...,xn] and a natural constant k such that X[i] < X[i+k] for every i. Write a sorting algorithm with time complexity better than O(n log n). For the purpose of the question I am allowed to use quicksort, counting sort, radix sort, bucket sort, heaps and so on.
All I've figured out so far is that if I take sublists by the remainder of the indices (after dividing by k), those sublists are sorted. But merging them within the required complexity seems impossible. I also tried using min-heaps after realizing that the i-th smallest value must be in the first k*i places, but that took O(n^2), which is too much.
I'd appreciate any guidance/help/references.
Thank you!

Something to note here is that you essentially have k sorted arrays that you could merge directly. That is, the sequence [x[i], x[i+k], x[i+2k], x[i+3k], ...] is sorted, as is [x[i+1], x[i+k+1], x[i+2k+1], ...].
So you have to merge k sorted sequences. The basic idea is:
Initialize a priority queue of size k with the first item from each sorted sequence. That is, add x[0], x[1], x[2], x[3] ... x[k-1] to the priority queue. In addition, save the index of the item in the priority queue structure.
Remove the first item from the queue and add the next item from that sequence. So if the first item you remove from the queue is x[3], then you'll want to add x[3+k] to the queue.
You'll of course have to make sure that you're not attempting to add anything from beyond the end of the array.
Continue in that fashion until the queue is empty.
Complexity is O(n log k). Basically, every item is added to and removed from a priority queue of size k. Both addition and removal are, worst case, O(log k).
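The steps above can be sketched in Python roughly as follows. This is a minimal illustration, assuming 0-based indexing and using the standard `heapq` module as the priority queue; the function name `sort_k_interleaved` is made up for this example.

```python
import heapq


def sort_k_interleaved(x, k):
    """Sort x, assuming x[i] < x[i + k] holds for every valid i."""
    n = len(x)
    # Seed the queue with the head of each of the k sorted subsequences,
    # storing the index next to the value so we know which item comes next.
    heap = [(x[i], i) for i in range(min(k, n))]
    heapq.heapify(heap)

    out = []
    while heap:
        value, i = heapq.heappop(heap)      # smallest remaining item
        out.append(value)
        if i + k < n:                       # stay inside the array
            heapq.heappush(heap, (x[i + k], i + k))
    return out


print(sort_k_interleaved([1, 5, 2, 3, 6, 4, 7], 3))
```

The queue never holds more than k entries, so each push and pop costs O(log k), which is where the O(n log k) total comes from.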

Related

Find Median of K last numbers from Data Stream

Problem: An odd number K is given. At the input, we gradually receive a sequence of N numbers. After reading each number (except the first K-1), print the median of the last K numbers.
I solved it using two heaps (a MaxHeap and a MinHeap) and added a function remove_element(value) to both heaps. remove_element(value) iterates over the heap looking for the value to remove, removes it, and then rebalances the tree. So it works in O(K + log K).
And I solved the problem like this: while iterating over the stream data, I add each new element to the heaps and delete the old one that has just fallen outside the window (the (K+1)-th most recent element). So it works in O(N*(K + log K + log K)) = O(N*K).
I'm wondering 1) if I correctly estimated the time complexity of the algorithm, 2) is it possible to speed up this algorithm. 3) if not (the algorithm is optimal), then how can I prove its optimality.
As for the third question, if my estimate of O(N*K) for the algorithm is correct, I think optimality can be proven based on the obvious idea that in order to write out the median, we need to check all K elements for each of the N requests.
For a better approach, make the heaps be heaps of (value, i). And now you don't need to remove stale values promptly, you can simply keep track of how many non-stale values are on each side, and throw away stale values when they pop up from the heap.
And if more than half of a heap is stale, then do garbage collection to remove stale values.
If you do this well, you should be able to solve this problem with O(k) extra memory in time O(n log(k)).
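One way the lazy-deletion idea could look in Python is sketched below. It is an illustration under assumptions rather than a reference implementation: K is odd, entries are stored as (value, index) pairs, an entry is stale once its index leaves the window, and the names `sliding_medians`, `in_low` and `clean` are invented for this example.

```python
import heapq


def sliding_medians(stream, k):
    """Median of the last k items after each read, for odd k (sketch)."""
    low = []    # max-heap via negated values: entries are (-value, index)
    high = []   # min-heap: entries are (value, index)
    low_live = high_live = 0   # entries still inside the window, per heap
    in_low = {}                # index -> True if that live entry is in `low`
    medians = []

    def clean(heap, oldest, live_count):
        # Garbage-collect when more than half of the heap is stale ...
        if len(heap) > 2 * live_count:
            heap[:] = [e for e in heap if e[1] >= oldest]
            heapq.heapify(heap)
        # ... and always drop stale entries that have reached the top.
        while heap and heap[0][1] < oldest:
            heapq.heappop(heap)

    for i, value in enumerate(stream):
        oldest = i - k + 1                 # smallest index still in the window
        if oldest > 0:                     # entry oldest-1 has just expired
            if in_low.pop(oldest - 1):
                low_live -= 1
            else:
                high_live -= 1
        clean(low, oldest, low_live)
        clean(high, oldest, high_live)

        # Insert the new entry on the side where it belongs.
        if low_live == 0 or value <= -low[0][0]:
            heapq.heappush(low, (-value, i))
            in_low[i] = True
            low_live += 1
        else:
            heapq.heappush(high, (value, i))
            in_low[i] = False
            high_live += 1

        # Rebalance: low keeps as many live entries as high, or one more.
        if low_live > high_live + 1:
            neg, j = heapq.heappop(low)
            heapq.heappush(high, (-neg, j))
            in_low[j] = False
            low_live, high_live = low_live - 1, high_live + 1
        elif low_live < high_live:
            val, j = heapq.heappop(high)
            heapq.heappush(low, (-val, j))
            in_low[j] = True
            low_live, high_live = low_live + 1, high_live - 1
        clean(low, oldest, low_live)
        clean(high, oldest, high_live)

        if i >= k - 1:                     # window is full: report the median
            medians.append(-low[0][0])
    return medians
```

Only the live counts are updated when an item expires; the entry itself stays buried until it surfaces at the top of its heap or a rebuild sweeps it out, so no O(K) search is needed.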

BUILD-MAX-HEAP running time for array sorted in decreasing order

I know that the running time for BUILD-MAX-HEAP in heap sort is O(n). But if we have an array that is already sorted in decreasing order, why do we still have O(n) for the running time of BUILD-MAX-HEAP?
Isn't it supposed to be something like O(1)? It's already sorted from the maximum value to the minimum value, so we do not need MAX-HEAPIFY.
Is my understanding correct? Could someone please explain it to me?
You are right. It can of course be O(1). When you know for sure that your list is sorted you can use it as your max heap.
The common array-based implementation of a heap uses this layout for element positions:
children[i] = 2i+1 and 2i+2
parent[i] = floor((i-1)/2)
A sorted array already satisfies the heap property under this layout (descending order for a max-heap, ascending order for a min-heap).
Please note that if you need to check first that the list is sorted it is of course still O(n).
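As a quick sanity check of that claim, here is a small Python sketch (the helper name `is_max_heap` is made up for this example) that tests the heap property using the parent formula above on a descending array.

```python
def is_max_heap(a):
    """True if every element is <= its parent in the 0-based array layout."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


descending = sorted([7, 3, 9, 1, 4, 8, 2], reverse=True)
print(is_max_heap(descending))
```

In a descending array the parent index floor((i-1)/2) is always smaller than i, so the parent's value is never smaller than the child's.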
EDIT: Heap Sort Complexity
Even though the array might be sorted and building the heap might actually take O(1), whenever you perform a heap sort you will still end up with O(n log n).
As said in the comments, Heap Sort is performing n calls to extract-max. Each extraction operation takes O(log n) - We end up with total time complexity of O(n log n).
In case the array is not sorted we will get total time-complexity of O(n + nlogn) which is still O(n log n).
If you know that the array is already sorted in decreasing order, then there's no need to sort it. If you want it in ascending order, you can reverse the array in O(n) time.
If you don't know whether the array is already sorted, then it takes O(n) to determine if it's already reverse sorted.
The reason building a max heap from a reverse-sorted array is considered O(n) is that you have to start at item n/2 and, working backwards, make sure that the element is not smaller than its children. It's considered O(n) even though there are only n/2 checks, because the number of operations performed is proportional to the total number of items to be checked.
It's interesting to note, by the way, that you can build a max-heap from a reverse-sorted array faster than you can check the array to see if it's reverse sorted.
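The backward pass described in this answer can be sketched in Python as below. This is an illustrative version that assumes 0-based indexing (so the loop starts at index n/2 - 1) and counts its own work; the function name and the returned counters are inventions for this example.

```python
def build_max_heap(a):
    """Heapify a in place; return (comparisons, swaps) performed."""
    n = len(a)
    comparisons = swaps = 0
    # Start at the last node that has a child and work back to the root.
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n:
                    comparisons += 1
                    if a[child] > a[largest]:
                        largest = child
            if largest == i:        # node is not smaller than its children
                break
            a[i], a[largest] = a[largest], a[i]
            swaps += 1
            i = largest             # keep sifting the value down
    return comparisons, swaps


print(build_max_heap([7, 6, 5, 4, 3, 2, 1]))
```

On a reverse-sorted input no swap ever happens, but every non-root element is still compared once, so the work stays proportional to n.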

What is the time complexity of sorting half an array using a priority queue?

So I have developed a Priority Queue using a Min Heap and according to online tutorials it takes O(nlogn) time to sort an entire array using a Priority Queue. This is because we extract 'n' times and for every extraction we have to perform a priority fix which takes logn time. Hence it is nlogn.
However, if I only want to sort half an array every single time, would it still be O(nlogn) time? Or would it be just O(logn)? The reason why I want to do this is because I want to get the element with middle priority and this seems like the only way to do it using a priority queue by extracting half the elements unless there is a more intuitive way of getting the element with middle priority in Priority Queue.
I think that the question is in two parts, so I will answer in two parts:
(a) If I understand you correctly, by sorting "half an array" you mean obtaining a sorted array of (n/2) smallest values of the given array. This will have to take O(n lg n) time. If there were a technique for doing this shorter than O(n lg n) time, then whenever we wanted to sort an array of n values whose maximum value is known to be v (and we can obtain the maximum value in O(n) time), we could construct an array of 2n elements, where the first half is the original array and the second half is filled with a value larger than v. Then, applying the hypothetical technique, we could in effect sort the original array in a time shorter than O(n lg n), which is known to be impossible.
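The padding argument in (a) can be made concrete with a short Python sketch. Both function names are hypothetical, and `smallest_half_sorted` is just an ordinary heap-based routine standing in for the "hypothetical technique"; the point is only to show how sorting half of a padded array sorts the original.

```python
import heapq


def smallest_half_sorted(a):
    """Return the len(a) // 2 smallest values of a in sorted order."""
    heap = list(a)
    heapq.heapify(heap)                                       # O(n)
    return [heapq.heappop(heap) for _ in range(len(a) // 2)]  # n/2 pops


def sort_by_padding(a):
    """Sort a by padding it to length 2n and sorting only the smaller half."""
    if not a:
        return []
    pad = max(a) + 1                 # any value larger than the maximum v
    return smallest_half_sorted(list(a) + [pad] * len(a))


print(sort_by_padding([3, 1, 2]))
```

Extracting n/2 items still costs n/2 pops of O(log n) each, so the heap-based version remains O(n log n).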
(b) But if I am correct in understanding "the element with middle priority" as the median element in an array, you may be interested in this question.

Priority queue O(1) insertion and removal

Is it possible for a priority queue to have both O(1) insertion and removal?
Priority queues can be implemented using heaps and looking at the run times for Fibonacci heaps it appears that it is not possible to get a run time better than O(logN) per removal.
I am trying to implement a data structure where given N items I will have half in a max-priority queue and half in a min-priority queue. I am then to remove all N items sequentially.
I can insert all N elements in O(N) time but removing all N items will take O(N*logN) so I am wondering if another approach would be more suitable.
If you could construct a priority queue with O(1) insertion and O(1) removal, you could use that to sort a list of n items in O(n) time. As explained in this answer, you can't sort in O(n) in the general case, so it will be impossible to construct a priority queue with O(1) insertion and O(1) removal without making more assumptions on the input.
For example, a priority queue that has O(1) insertion and O(k) (k is the maximum element that could be inserted) removal can be constructed. Keep a table of k linked lists. Insertion of x just prepends an item to the front of the xth list. Removal has to scan through the table to find the first non-empty list (then remove the first item of the list and return the index of that list). There are only k lists, so removal takes O(k) time. If k is a constant, that works out to O(1) removal.
In practice, using a table of counts would work out better. Incrementing a variable-length integer isn't constant time unless you use amortized analysis (which is why I didn't use it in the previous paragraph), but in practice you wouldn't need variable-length counts anyway. Also, in practice it would be bad for large k, even if k is a constant - you'd run out of memory quickly and scanning for the first non-zero element could take a while.
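The table-of-lists queue described above might look like this in Python. It is a sketch under assumptions: keys are integers in the range 0 to k-1, Python lists stand in for the linked lists (appending at the end instead of prepending), and the class name `BucketPriorityQueue` is invented for this example.

```python
class BucketPriorityQueue:
    """Min-priority queue for integer keys in range(k) (illustrative)."""

    def __init__(self, k):
        self.buckets = [[] for _ in range(k)]   # one list per possible key

    def insert(self, x):
        self.buckets[x].append(x)               # O(1)

    def remove_min(self):
        # Scan for the first non-empty bucket: O(k) in the worst case.
        for key, bucket in enumerate(self.buckets):
            if bucket:
                bucket.pop()
                return key
        raise IndexError("remove_min from an empty queue")


pq = BucketPriorityQueue(10)
for x in (7, 2, 9, 2):
    pq.insert(x)
print([pq.remove_min() for _ in range(4)])
```

Replacing each list with an integer count gives the table-of-counts variant mentioned in the last paragraph.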

Fastest method for Queue Implementation in Java

The task is to implement a queue in java with the following methods:
enqueue //add an element to queue
dequeue //remove element from queue
peekMedian //find median
peekMinimum //find minimum
peekMaximum //find maximum
size // get size
Assume that ALL METHODS ARE CALLED IN EQUAL FREQUENCY, the task is to have the fastest implementation.
My Current Approach:
Maintain a sorted array, in addition to the queue, so enqueue and dequeue take O(logn) and peekMedian, peekMaximum, peekMinimum all take O(1) time.
Please suggest a method that will be faster, assuming all methods are called in equal frequency.
Well, you are close, but there is still something missing: inserting into or deleting from a sorted array is O(n). With probability 1/2 the inserted element lands in the first half of the array, and then you have to shift all the following elements to the right. There are at least n/2 of these, so the total complexity of this operation is O(n) both on average and in the worst case.
However, if you switch your sorted DS to a skip list or a balanced BST, you are going to get O(logn) insertion/deletion and O(1) minimum/maximum/median/size (with caching).
EDIT:
You cannot get better than O(logN) for insertion (unless you slow peekMedian() down to Omega(logN)), because that would enable you to sort faster than O(NlogN):
First, note that the median moves one element to the right for every two "high" elements you insert (in here, high means >= the current max).
So, by iteratively doing:
    while peekMedian() != MAX:
        peekMedian()
        insert(MAX)
        insert(MAX)
you can find the "higher" half of the sorted array.
Using the same approach with insert(MIN) you can get the lowest half of the array.
Assuming you have o(logN) insertion (small-o notation, i.e. strictly better than Theta(logN)) and O(1) peekMedian(), you get a sort better than O(NlogN), but sorting is an Omega(NlogN) problem.
=><=
Thus insert() cannot be better than O(logN), with median still being O(1).
QED
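To see the mechanics of this reduction, here is a small Python sketch. Everything in it is an assumption made for illustration: `MedianBag` is a made-up stand-in backed by a sorted list (so its insert is O(n), not the fast insert the proof hypothesizes), the median is taken as the lower median, and plus/minus infinity play the roles of MAX and MIN.

```python
import bisect


class MedianBag:
    """Toy structure with insert() and peek_median() (lower median)."""

    def __init__(self, items=()):
        self._data = sorted(items)

    def insert(self, x):
        bisect.insort(self._data, x)

    def peek_median(self):
        return self._data[(len(self._data) - 1) // 2]


def sort_via_median(values):
    """Sort using only insert() and peek_median(), as in the argument above."""
    if not values:
        return []
    hi, lo = float("inf"), float("-inf")

    upper = []                       # median, then everything above it
    bag = MedianBag(values)
    while bag.peek_median() != hi:
        upper.append(bag.peek_median())
        bag.insert(hi)               # two sentinels shift the median
        bag.insert(hi)               # one position to the right

    lower = []                       # median, then everything below it
    bag = MedianBag(values)
    while bag.peek_median() != lo:
        lower.append(bag.peek_median())
        bag.insert(lo)
        bag.insert(lo)

    # lower is in descending order and starts with the median, which
    # upper already contains, so drop it and reverse the rest.
    return lower[:0:-1] + upper


print(sort_via_median([4, 1, 3, 2]))
```

The whole sort uses O(N) calls to insert and peek_median, which is why a faster-than-log insert combined with an O(1) median would break the sorting lower bound.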
EDIT2: Modifying the median in insertions:
If the tree size before insertion is 2n+1 (odd) then the old median is at index n+1, and the new median is at the same index (n+1), so if the element was added before the old median - you need to get the preceding node of the last median - and that's the new median. If it was added after it - do nothing, the old median is the new one as well.
If the tree size before insertion is even (2n elements), then after the insertion the median index increases by one (from n to n+1). So if the new element was added before the old median, do nothing; if it was added after the old median, set the new median to the node that follows the old median.
note: In here next nodes and preceding nodes are those that follow according to the key, and index means the "place" of the node (smallest is 1st and biggest is last).
I only explained how to do it for insertion, the same ideas hold for deletion.
There is a simpler and perhaps better solution. (As has been discussed, the sorted array makes enqueue and dequeue both O(n), which is not so good.)
Maintain two sorted sets in addition to the queue. The Java library provides e.g. TreeSet (an implementation of SortedSet), which is a balanced search tree. The "low set" stores the first ceiling(n/2) elements in sorted order. The second, "high set" has the last floor(n/2).
NB: If duplicates are allowed, you'll have to use something like Google's TreeMultiset instead of regular Java sorted sets.
To enqueue, just add to the queue and the correct set. If necessary, re-establish balance between the sets by moving one element: either the greatest element in the low set to the high set or the least element in the high set to the low set. Dequeuing needs the same re-balance operation.
Finding the median if n is odd is just looking up the max element in the low set. If n is even, find the max element in the low set and min in the high set and average them.
With the native Java sorted set implementation (balanced tree), this will be O(log n) for all operations. It will be very easy to code. About 60 lines.
If you implement your own sifting heaps for the low and high sets, then you'll have O(1) for the find median operation while all other ops will remain O(log n).
If you go on and implement your own Fibonacci heaps for the low and high sets, then you'll have O(1) insert as well.

Resources