What has a time complexity of O(n log k)?

To find the k smallest elements in a basic array of n integers, what algorithm would have a time complexity of O(n log k)?
Looking at merge sort, for example, it has a time complexity of O(n log n). Where does k come into the time complexity?

Put the numbers into a priority queue one by one. Every time you insert a number, if the queue now has more than k elements, remove the largest one. Finally, poll the elements remaining in the queue to get the k smallest from the original array, in reverse order.
This runs in O(n log k) time, assuming your priority queue of size at most k+1 can insert and find-and-remove-max in O(log k) time. A heap or balanced BST will work.
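A minimal sketch of this answer in Python (the name k_smallest is mine; heapq, Python's binary min-heap, is assumed, with values negated to simulate the max-behavior the answer calls for):

    import heapq

    def k_smallest(nums, k):
        heap = []  # min-heap of negated values, i.e. a max-heap of candidates
        for x in nums:
            heapq.heappush(heap, -x)   # insert: O(log k)
            if len(heap) > k:          # queue now holds more than k elements,
                heapq.heappop(heap)    # so remove the largest: O(log k)
        # Polling the survivors yields the k smallest, largest first.
        return [-heapq.heappop(heap) for _ in range(len(heap))]

For example, k_smallest([5, 2, 8, 1, 9, 3], 3) returns [3, 2, 1]: the three smallest values, in the reverse order the answer mentions.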

Related

A heap with n elements that supports Insert and Extract-Min: which of the following tasks can you achieve in O(log n) time?

For the following question (Question 3):
You are given a heap with n elements that supports Insert and Extract-Min. Which of the following tasks can you achieve in O(log n) time?
Find the median of the elements stored in the heap.
Find the fifth-smallest element stored in the heap.
Find the largest element stored in the heap.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
"Find the median of the elements stored in the heap. " should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
It depends on what the running times are of the operations insert and extract-min. In traditional heaps, both take ϴ(log n) time. However, in finger-tree-based heaps, only insert takes ϴ(log n) time, while extract-min takes O(1) time. There, you can find the fifth smallest element in O(5) = O(1) time and the median in O(n/2) = O(n) time. You can also find the largest element in O(n) time.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
The lowest level of the heap contains half of the elements. More correctly, half of the elements of the heap are leaves--have no children. The largest element in the heap is one of those. Finding the largest element of the heap, then, will require that you examine n/2 items. Except that the heap only supports insert and extract-min, so you end up having to call extract-min on every element. Finding the largest element will take O(n log n) time.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
This can be done in log(n) time. Actually 5*log(n) because you have to call extract-min five times. But we ignore constant factors. However it's not constant time because the complexity of extract-min depends on the size of the heap.
"Find the median of the elements stored in the heap." should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
The median is the middle element. So you only have to remove n/2 elements from the heap. But removing an item from the heap is a log(n) operation. So the complexity is O(n/2 log n) and since we ignore constant factors in algorithmic analysis, it's O(n log n).
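For concreteness, a sketch of the fifth-smallest case using only extract-min (Python's heapq stands in for the heap here; working on a copy keeps the original intact):

    import heapq

    def fifth_smallest(heap):
        h = list(heap)           # heap is a list with the min-heap property
        for _ in range(4):
            heapq.heappop(h)     # extract-min, O(log n) each
        return heapq.heappop(h)  # the fifth extract-min: 5 * O(log n) = O(log n)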

Extracting k largest elements

If I have n integers, is it possible to list the k largest elements out of the n values in O(k + log n) time? The closest I've gotten is constructing a max-heap and extracting the maximum k times, which takes O(k log n) time. I was also thinking about using an inorder traversal.
Ways to solve this problem:
1. Sort the data, then take the top k. Sorting takes O(n lg n) and iterating over the top k takes O(k). Total time: O(n lg n + k).
2. Build a max-heap from the data and remove the top item k times. Building the heap is O(n), and each removal is O(lg n) to re-heapify. Total time: O(n) + O(k lg n).
3. Keep a running min-heap of maximum size k. Iterate over all the data, add each item to the heap (evicting the minimum whenever the heap exceeds size k), and then take the entirety of the heap (see the sketch after this list). Total time: O(n lg k) + O(k).
4. Use a selection algorithm to find the k'th largest value, then iterate over all the data to find all items that are larger than that value.
a. You can find the k'th largest using Quickselect, which has an average running time of O(n) but a worst case of O(n^2). Total average-case time: O(n) + O(n) = O(n). Total worst-case time: O(n^2) + O(n) = O(n^2).
b. You can also find the k'th largest using the median-of-medians algorithm, which has a worst-case running time of O(n) but is not in-place. Total time: O(n) + O(n) = O(n).
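A minimal sketch of approach 3 (the name k_largest is mine; the standard library's heapq.nlargest(k, nums) implements essentially the same idea):

    import heapq

    def k_largest(nums, k):
        heap = []                          # running min-heap, size at most k
        for x in nums:
            if len(heap) < k:
                heapq.heappush(heap, x)    # O(log k)
            elif x > heap[0]:              # x beats the smallest candidate
                heapq.heapreplace(heap, x) # pop min, push x: O(log k)
        return heap                        # the k largest, in heap order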
You can use the divide-and-conquer technique for extracting the kth element from an array. The technique is sometimes called Quickselect because it uses the idea of Quicksort.
In Quicksort, we pick a pivot element, move it to its correct position, and partition the array around it. The idea here is not to do a complete quicksort, but to stop at the point where the pivot itself is the k'th smallest element. Also, do not recurse on both the left and right sides of the pivot; recurse on only one of them, according to the position of the pivot. The worst-case time complexity of this method is O(n^2), but it works in O(n) on average.
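A runnable sketch of this idea (random pivot, Lomuto partition; the name quickselect and the 1-based k are my choices):

    import random

    def quickselect(a, k):
        # Returns the k'th smallest element of list a (1-based k).
        # Average O(n); worst case O(n^2), as noted above.
        lo, hi = 0, len(a) - 1
        k -= 1                          # 0-based target index
        while True:
            p = random.randint(lo, hi)  # random pivot guards against bad inputs
            a[p], a[hi] = a[hi], a[p]
            pivot, store = a[hi], lo
            for i in range(lo, hi):     # Lomuto partition around the pivot
                if a[i] < pivot:
                    a[i], a[store] = a[store], a[i]
                    store += 1
            a[store], a[hi] = a[hi], a[store]
            if store == k:              # pivot landed at the target: done
                return a[store]
            elif store < k:             # continue in the right side only
                lo = store + 1
            else:                       # continue in the left side only
                hi = store - 1

For example, quickselect([3, 9, 1, 2, 6, 5, 7, 8, 4], 2) returns 2, the second-smallest element.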
Constructing a heap by inserting the elements one at a time takes O(n log n) (bottom-up heap construction takes only O(n)), and extracting k elements takes O(k log n). If you reached the conclusion that extracting k elements is O(k log n), it means you're not worried about the time it takes to build the heap.
In that case, just sort the list (O(n log n)) and take the k largest elements (O(k)).
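In Python terms, that is just (illustrative):

    def k_largest_by_sorting(nums, k):
        return sorted(nums, reverse=True)[:k]  # O(n log n) sort + O(k) slice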

Find the time complexity of selecting an element which is neither the kth maximum nor the kth minimum?

There are N distinct numbers, given in no particular order. How much time will it take to select a number that is neither the k-th minimum nor the k-th maximum?
I tried like this:
Take the initial k + 1 numbers and sort them in O(k log k). Then pick the kth number in that sorted list; that will be neither the kth minimum nor the kth maximum.
Hence, time complexity = O(k log k).
Example:
Select a number which is neither the 2nd minimum nor the 2nd maximum.
array[] = {3,9,1,2,6,5,7,8,4}
Take the initial 3 numbers as a subarray: 3, 9, 1; the sorted subarray is 1, 3, 9.
Now pick the 2nd element, 3. 3 is neither the 2nd minimum nor the 2nd maximum.
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
The problem is trivial if N < k: in that case there is no k'th largest or smallest element in the array, so one can pick any element (for example the first) in O(1) time.
If N is large enough you can take any subset of size 2k+1 and choose the median. Then you have found a number that is guaranteed not to be the kth largest or smallest number in the overall array. In fact you get something stronger -- it's guaranteed that it will not be in the first k or last k numbers in the sorted array.
Finding a median of M things can be done in O(M) time, so this algorithm runs in O(k) time.
I believe this is asymptotically optimal for large N -- any algorithm that considers fewer than k items cannot guarantee that it chooses a number that's not the kth min or max in the overall array.
If N isn't large enough (specifically N < 2k+1), you can find the minimum (or second minimum value if k=1) in O(N) time. Since k <= N < 2k+1, this is also O(k).
There are three cases where no solution exists: (k=1, N=1), (k=1, N=2), (k=2, N=2).
If you only consider cases where k <= N, then the complexity of the overall algorithm is O(k). If you want to include the trivial cases too then it's somewhat messy. If I(k<=N) is the function that's 1 when k<=N and 0 otherwise, a tighter bound is O(1 + k*I(k<=N)).
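A sketch of the main case (N >= 2k+1, distinct values). sorted() stands in for the median step for brevity, which makes this O(k log k); a linear-time median-selection routine would give the O(k) claimed above:

    def not_kth_min_or_max(a, k):
        sample = sorted(a[:2 * k + 1])  # any 2k+1 elements will do
        # The median has k sample elements below it and k above it, so it
        # cannot be among the k smallest or the k largest of the whole array.
        return sample[k]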
I think there are several points that must be noted about your solution:
First, it requires taking 2k+1 elements instead of the k+1 in your solution. More specifically, you take:
array[] = {3,9,1,2,6,5,7,8,4}
Take the initial 3 numbers as a subarray: 3, 9, 1; the sorted subarray is 1, 3, 9.
Now pick the 2nd element, 3. 3 is neither the 2nd minimum nor the 2nd maximum.
But you cannot verify that 3 is neither the 2nd minimum nor the 2nd maximum using only your k+1 elements (subarray = 3, 9, 1); you would have to scan the whole array to find the 2nd max and 2nd min and check your answer.
On the other hand, by taking 2k+1 elements and sorting them, since your elements are distinct, you know that the (k+1)th element is greater than the first k elements and smaller than the last k elements of your sorted subarray.
In your example:
array[] = {3,9,1,2,6,5,7,8,4}
subarray[] = {3,9,1,2,6}; then sort the subarray: {1,2,3,6,9}, and give the number 3 as the answer.
An example where your solution would not be right:
array[] = {9,8,2,6,5,3,7,1,4}, where your algorithm takes {9,8,2}, sorts it to {2,8,9}, and returns 8, which is the 2nd maximum.
In terms of complexity, taking 2k+1 elements does not change the complexity you found, because O((2k+1) log(2k+1)) is O(k log k).
Clearly if n < 2k+1 the above algorithm won't work, so you will have to sort the entire array, which takes O(n log n); but in this case n < 2k+1, so that is O(k log k) too.
Finally, the algorithm based on the above is O(k log k). A thing that might be confusing is that the problem has two parameters, k and n. If k is much smaller than n, this is an efficient algorithm, since you don't need to scan and sort the whole n-size array; but when k and n are close, it is the same as sorting the n-size array.
One more thing you should understand is that big-O notation measures the time complexity of an algorithm as a function of its input size and describes the algorithm's asymptotic behavior for large inputs. O(1) denotes that the algorithm ALWAYS runs in constant time. So when you write:
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
that is not right: you have to measure the complexity with k as the input variable, not as a constant, since that is what describes the behavior of the algorithm for an arbitrary input k. Clearly the above algorithm does not take O(1) (i.e. constant) time; it takes O(k log k).
Finally, after searching for a better approach to the problem: you could find the kth min and the kth max in O(n) (n is the size of the array), and then with one O(n) loop simply select the first element that differs from both the kth min and the kth max. I think O(n) is the lowest time complexity you can get, since finding the kth min and max takes at least linear time.
For how to find kth min,max in O(n) you could see here:
How to find the kth largest element in an unsorted array of length n in O(n)?
This solution is O(n), while the previous solution was O(k log k). For k close to n, as explained above, the previous solution degrades to O(n log n), so in that case the O(n) solution is better. But if k is usually much smaller than n, then O(k log k) may be better. The good thing about the O(n) solution (the second solution) is that it takes O(n) in all cases, regardless of k, so it is more stable; but as mentioned, for small k the first solution may be better (though in the worst case it can reach O(n log n)).
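A sketch of that O(n) approach; heapq.nsmallest/nlargest are used here as convenient stand-ins (they are O(n log k)) for the linear-time selection the answer assumes, e.g. median-of-medians:

    import heapq

    def pick_neither(a, k):
        kth_min = heapq.nsmallest(k, a)[-1]  # the kth minimum
        kth_max = heapq.nlargest(k, a)[-1]   # the kth maximum
        for x in a:                          # one O(n) pass
            if x != kth_min and x != kth_max:
                return x
        return None                          # no such element exists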
You can sort the entire list in pseudo-linear time using radix sort and then select the k-th largest element in constant time.
Overall it is a worst-case O(n) algorithm, assuming the key width for the radix sort is small relative to n, or that you use a selection algorithm instead.
O(n) is the absolute lower bound here. There's no way to get anything better than linear because if the list is unsorted you need to at least examine everything or you might miss the element you're looking for.

Is it possible to find the k-largest numbers from n unsorted integers with time complexity O(n) and space complexity O(k)?

I know that we can find the k largest numbers from n unsorted integers in two ways:
1. Use an algorithm like quickselect to find the kth largest number, then take the k largest numbers. The time complexity is O(n) and the space complexity is O(n).
2. Use a heap to store the k largest numbers and iterate through the n integers, adding the appropriate integers to the heap. The time complexity is O(n log k) and the space complexity is O(k).
Suppose the n integers arrive in a stream and we don't have random access to them.
I want to know: is it possible to find the k largest numbers from n unsorted integers with time complexity O(n) and space complexity O(k)?
It is. After filling the heap with k elements, instead of evicting one element from the heap after every insertion, evict k elements after every k insertions -- equivalently, buffer up to 2k elements and, whenever the buffer fills, keep only the largest k. Then you don't need the heap structure at all: just run a selection algorithm on the buffer each time. Each O(k) selection happens once per k insertions, so the total time is O(n) with O(k) space.
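A sketch of the trick (the name is mine; sorted() stands in for the selection step, making this O(n log k) as written -- swapping in a linear-time selection gives the claimed O(n)):

    def k_largest_stream(stream, k):
        buf = []                         # never exceeds 2k items: O(k) space
        for x in stream:
            buf.append(x)
            if len(buf) == 2 * k:
                # Selection step, run once per k insertions: keep only
                # the largest k of the 2k buffered items.
                buf = sorted(buf, reverse=True)[:k]
        return sorted(buf, reverse=True)[:k]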
k passes of bubble sort will leave the k largest elements at the end of the array. Time: O(nk).
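For completeness, a sketch of those k passes (in place, O(nk) time):

    def k_largest_bubble(a, k):
        n = len(a)
        for i in range(k):              # pass i floats the i'th-largest
            for j in range(n - 1 - i):  # remaining element to the end
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a[n - k:]                # the k largest, in ascending order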

Get the k smallest elements of an array using quick sort

How would you find the k smallest elements of an unsorted array using quicksort (other than just sorting and taking the k smallest elements)? Would the worst-case running time be the same O(n^2)?
You can adapt quicksort: simply don't recurse into the portions of the array other than the one containing position k, until your pivot lands at position k. If you don't need the output sorted, you can stop there (see the sketch below, after the complexity discussion).
Warning: non-rigorous analysis ahead.
However, I think the worst-case time complexity will still be O(n^2). That occurs when you always pick the biggest or smallest element as your pivot, and the process devolves into something like bubble sort (i.e. you aren't able to pick a pivot that divides and conquers).
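A sketch of that adaptation (random pivot; a loop rather than recursion, since only the side containing position k is ever pursued). Once a pivot lands at index k, the first k slots hold the k smallest elements, unsorted:

    import random

    def smallest_k(a, k):
        lo, hi = 0, len(a) - 1
        while lo < hi:
            p = random.randint(lo, hi)
            a[p], a[hi] = a[hi], a[p]
            pivot, store = a[hi], lo
            for i in range(lo, hi):  # Lomuto partition around the pivot
                if a[i] < pivot:
                    a[i], a[store] = a[store], a[i]
                    store += 1
            a[store], a[hi] = a[hi], a[store]
            if store < k:            # position k lies right of the pivot
                lo = store + 1
            elif store > k:          # position k lies left of the pivot
                hi = store - 1
            else:                    # pivot sits at position k: stop
                break
        return a[:k]                 # the k smallest, unsorted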
Another solution (if the only purpose of this collection is to pick out the k smallest elements) is to use a min-heap of limited tree height ceil(log k) (or exactly k nodes). Each insert into the bounded heap then takes at most O(log k), so inserting all n elements costs O(n log k), and the same holds for removal (versus O(n log n) for both in a full heapsort). This hands the elements back in sorted order in worst-case linearithmic time. The same holds for mergesort.
