Choosing comparison algorithms to find the k max values

Let's say that I want to find the k max values in an array of n elements, and also return them in sorted order.
k may be, for example,
k = 30, k = n/5, ...
I thought about some efficient algorithms, but everything I could think of had O(n log n) complexity. Can I do it in O(n)? Maybe with some modification of quicksort?
Thanks!

The problem can be solved using a min-heap-based priority queue in
O(N log K + K log K) time.
If k is constant (the k = 30 case), the complexity is O(N).
If k = O(N) (the k = n/5 case), the complexity is O(N log N).
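A minimal sketch of the heap approach, assuming Python and its standard `heapq` module. The function name `k_largest_sorted` is made up for illustration. The heap never holds more than k items, so each of the N updates costs O(log K), and the final sort of the survivors costs O(K log K).

```python
import heapq

def k_largest_sorted(values, k):
    """Return the k largest items of values, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest items seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)       # O(log k)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)    # drop the smallest, add v: O(log k)
    return sorted(heap, reverse=True)     # O(k log k)
```

The standard library's `heapq.nlargest(k, values)` does the same job and is probably what you would use in practice.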
Another option for constant k is a k-select (quickselect) algorithm based on the Quicksort partition, with average time O(N) (although the O(N^2) worst case might occur).
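A rough sketch of the partition-based selection, assuming Python. The name `k_largest_quickselect` and the random pivot choice are my own assumptions, not something the answer specifies. It partitions in descending order until the first k slots hold the k largest items, then sorts only those k.

```python
import random

def k_largest_quickselect(values, k):
    """Return the k largest items, largest first (expected O(n + k log k))."""
    items = list(values)          # work on a copy
    k = min(k, len(items))
    if k <= 0:
        return []
    lo, hi = 0, len(items) - 1
    target = k - 1                # index of the k-th largest in descending order
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:             # Hoare-style partition, larger items to the left
            while items[i] > pivot:
                i += 1
            while items[j] < pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        if target <= j:
            hi = j                # k-th largest lies in the left part
        elif target >= i:
            lo = i                # k-th largest lies in the right part
        else:
            break                 # target sits among items equal to the pivot
    return sorted(items[:k], reverse=True)
```

A bad run of pivots still gives the O(N^2) worst case for the selection step.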

There is a way of sorting elements in nearly O(n) if you assume that you only want to sort integers. This can be done with algorithms like bucket sort or radix sort, which do not rely on comparisons between two elements (comparison-based sorts cannot do better than n*log(n) comparisons in the worst case).
Note, however, that these algorithms also have worst-case runtimes that might be slower than O(n*log(n)).
More information can be found here.

No comparison-based sorting algorithm can achieve a better average-case complexity than O(n*lg n).
There are many papers with proofs out there but this site provides a nice visual example.
So unless you are given a sorted array, your best case is going to be an O(n lg n) algorithm.
There are sorts like radix and bucket, but they are not comparison-based sorts, which is what your title seems to ask for.
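To make the bound concrete: n distinct elements can be arranged in n! orders, and each comparison has only two outcomes, so a comparison sort needs at least ceil(log2(n!)) comparisons in the worst case, which grows like n*lg n. A small Python sketch (the function name `min_comparisons` is just for illustration):

```python
import math

def min_comparisons(n):
    """Lower bound on worst-case comparisons for sorting n distinct items."""
    # n! possible orderings, one bit of information per comparison
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 16, 256, 1024):
    # compare the bound with n * log2(n)
    print(n, min_comparisons(n), round(n * math.log2(n)))
```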

Related

Merge k sorted arrays of size n in less than O(nklogk) time complexity

The question:
Merge k sorted arrays each with n elements into a single array of size nk in minimum time complexity. The algorithm should be a comparison-based algorithm. No assumption on the input should be made.
So I know about an algorithm that solves the problem in nklogk time complexity as mentioned here: https://www.geeksforgeeks.org/merge-k-sorted-arrays/.
My question, though, is whether we can merge in less than nklogk time, meaning the runtime is o(nklogk).
So I searched through the internet and found this answer:
Merge k sorted arrays of size n in O(nk) time complexity
It claims that you can divide an array of size K into singletons and merge them into a single array. But this is incorrect: by the same reasoning, one could claim to have found an algorithm that solves the problem in sqrt(n)klogk time, which is o(nklogk), but with n=1 we sort the array in KlogK time, which doesn't contradict the lower bound on sorting an array.
So how can I contradict the lower bound on sorting an array? That is, for an array of size N with no assumptions on the input, sorting takes at least NlogN operations.
The lower bound of n log n applies only to comparison-based sorting algorithms (heap sort, merge sort, etc.). There are, of course, sorting algorithms with better time complexities (such as counting sort); however, they are not comparison-based.
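For reference, the nk log k merge that the question starts from can be sketched with a heap of at most k entries, one per input array. This is a minimal Python version using the standard `heapq` module; the name `merge_k_sorted` is an assumption for illustration, and it is not necessarily the same code as on the linked page.

```python
import heapq

def merge_k_sorted(arrays):
    """Merge k sorted lists into one sorted list in O(N log k), N = total items."""
    # each heap entry is (value, index of source array, position in that array)
    heap = [(arr[0], idx, 0) for idx, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)   # smallest remaining head
        merged.append(value)
        if pos + 1 < len(arrays[idx]):
            nxt = arrays[idx][pos + 1]
            heapq.heappush(heap, (nxt, idx, pos + 1))
    return merged
```

With k arrays of n elements each, every one of the nk elements passes through a heap of size at most k, which gives the nk log k bound.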

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity log(n)??
example [8,2,7,5,0,1]
sort given array with time complexity log(n)
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to, say, sort arbitrary objects like strings in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, a structure like a balanced binary search tree lets you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) is n*log(n) time.
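A small illustration of the single-element case in Python. Note the swap: instead of a tree, this sketch uses binary search over an already sorted list via the standard `bisect` module. Finding the position takes O(log n) comparisons, but a Python list still shifts elements on insert, so the whole operation is O(n); a balanced tree would be needed for a true log(n) insert. The name `insert_sorted` is made up for the example.

```python
import bisect

def insert_sorted(sorted_list, item):
    """Insert item into sorted_list in place, keeping it sorted. Returns the index."""
    pos = bisect.bisect_left(sorted_list, item)  # O(log n) comparisons
    sorted_list.insert(pos, item)                # O(n) shifts in a Python list
    return pos

data = [0, 1, 2, 5, 7, 8]
insert_sorted(data, 6)   # data is now [0, 1, 2, 5, 6, 7, 8]
```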
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N with base 2, it means that you are dividing the input into 2 sets and discarding one of them, repeatedly. A sorting algorithm has to put all the elements in their appropriate places; if we discard half of the list in each iteration, that is not compatible with sorting.
The most efficient sorting algorithms have a time complexity of O(n), but with some limitations. The three most famous algorithms with O(n) complexity are:
Counting sort, with time complexity O(n+k), where k is the maximum number in the given list. Assuming n>>k, you can consider its time complexity to be O(n).
Radix sort, with time complexity O(d*(n+k)), where d is the maximum number of digits of any value in the input list and k is the radix (the number of distinct values a single digit can take). As with counting sort, assuming n>>k and d bounded by a constant, the time complexity is O(n).
Bucket sort, with average time complexity O(n) when the input is spread evenly across the buckets.
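As an illustration of where the d*(n+k) comes from, here is a minimal least-significant-digit radix sort in Python. It is a sketch under assumptions: the input holds non-negative integers only, and `base` (the number of values one digit can take) plays the role of k. Each of the d passes distributes n items into `base` buckets and reads them back in order.

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers. Returns a new sorted list."""
    out = list(nums)
    if not out:
        return out
    largest = max(out)
    exp = 1                                   # 1, base, base**2, ...
    while largest // exp > 0:                 # one pass per digit: d passes
        buckets = [[] for _ in range(base)]
        for x in out:                         # O(n) distribution, stable
            buckets[(x // exp) % base].append(x)
        out = [x for bucket in buckets for x in bucket]   # O(n + base) gather
        exp *= base
    return out
```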
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n*log n) algorithms, such as merge sort, quick sort, and heap sort.
There are also some sorting algorithms with a time complexity of O(n^2) that are recommended for small lists, such as insertion sort, selection sort, and bubble sort.
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values:
count the occurrences of each value in parallel and sum them in lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel computation would be able to do this; general-purpose CPUs would not manage it unless they implemented it in silicon as part of their SIMD.

Time Complexity to Sort a K Sorted Array Using Quicksort Algorithm

Problem:
I have to analyze the time complexity to sort (using Quick sort) a list of integer values which are almost sorted.
What I have done?
I have read SO Q1, SO Q2, SO Q3 and this one.
However, I have not found anything which mentioned explicitly the time complexity to sort a k sorted array using Quick sort.
Since the time complexity of Quick sort algorithm depends on the strategy of choosing pivot and there is a probability to face the worst case due to having almost sorted data, to avoid worst case, I have used median of three values(first, middle, last) as a pivot as referred here.
What do I think?
Since in average case, the time complexity of Quick sort algorithm is O(n log(n)) and as mentioned here, "For any non trivial value of n, a divide and conquer algorithm will need many O(n) passes, even if the array be almost completely sorted",
I think the time complexity to sort a k sorted array using Quick sort algorithm is O(n log(n)), if the worst case does not occur.
My Question:
Am I right that the time complexity to sort a k sorted array using the Quick sort algorithm is O(n log(n)) if I avoid the worst case by selecting a proper pivot, and the worst case does not occur?
When you say the time complexity of Quick Sort, it is O(n^2), because the worst case is assumed by default. If you use another strategy to choose the pivot, like Randomized Quick Sort, for example, your time complexity is still going to be O(n^2) by default. But the expected time complexity is O(n log(n)), since the occurrence of the worst case is highly unlikely. So if you can somehow prove that the worst case is 100% guaranteed not to happen, then you can say the time complexity is less than O(n^2); otherwise, by default, the worst case is considered, no matter how unlikely.
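For concreteness, the median-of-three pivot choice from the question (first, middle, last) might look like this in Python. This is only a sketch with made-up names (`median_of_three`, `quicksort`) and a Lomuto-style partition chosen for brevity, not the asker's actual code. On an almost sorted list the median of the three samples tends to sit near the middle, which is why it sidesteps the usual sorted-input worst case, although adversarial inputs can still force O(n^2).

```python
def median_of_three(a, lo, hi):
    """Index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]

def quicksort(a, lo=0, hi=None):
    """Sort list a in place using median-of-three pivots."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = median_of_three(a, lo, hi)
        a[p], a[hi] = a[hi], a[p]            # move the pivot to the end
        pivot, store = a[hi], lo
        for i in range(lo, hi):              # Lomuto partition
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]    # pivot into its final place
        # recurse into the smaller side, loop on the larger one
        if store - lo < hi - store:
            quicksort(a, lo, store - 1)
            lo = store + 1
        else:
            quicksort(a, store + 1, hi)
            hi = store - 1
```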

Best sorting algorithm for a partly sorted sequence?

I have to answer the following question:
What sorting algorithm is recommended if the first n-m part is already sorted and the remaining part m is unsorted? Are there any algorithms that take O(n log m) comparisons? What about O(m log n) comparisons?
I just can't find the solution.
My first idea was insertion sort, because it is O(n) for an almost sorted sequence. But since we don't know the size of m, the runtime is very likely to be O(n^2) even though the sequence is half sorted already, isn't it?
Then I thought perhaps it's quick sort, because it takes (Sum from k=1 to n) Cavg (1-m) + Cavg (n-m) comparisons. But after ignoring the n-m part of the sequence, the remaining sequence is 1-m in quicksort and not m.
Merge sort and heap sort should have a runtime of O(m log m) for the remaining sequence m, I would say.
Does anyone have an idea or can give me some advice?
Greetings
Have you tried sorting the remaining part m separately in O(m log(m)) (with any algorithm you like: MergeSort, HeapSort, QuickSort, ...) and then merging that part with the sorted part using MergeSort? (You won't even need to fully implement MergeSort, just a single pass of its inner loop body to merge two sorted sequences.)
That would result in O(m*log(m) + n + m) = O(m*log(m) + n) complexity. I don't believe it is possible to find a better asymptotic complexity on a single-core CPU. It will, however, require additional O(n+m) memory for the merged result array.
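A minimal Python sketch of that idea, assuming the sequence has n elements in total and exactly the last m are unsorted. The function name `sort_partly_sorted` is made up, and the built-in `sorted` stands in for whichever O(m log m) sort you prefer.

```python
def sort_partly_sorted(seq, m):
    """seq[:len(seq) - m] is already sorted; the last m items are not."""
    split = len(seq) - m
    head = seq[:split]               # sorted prefix, left untouched
    tail = sorted(seq[split:])       # O(m log m)
    merged = []
    i = j = 0
    # single merge pass over both sorted parts: O(n)
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            merged.append(head[i])
            i += 1
        else:
            merged.append(tail[j])
            j += 1
    merged.extend(head[i:])          # at most one of these is non-empty
    merged.extend(tail[j:])
    return merged
```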
Which sort algorithm works best on mostly sorted data?
Sounds like insertion and bubble are good. You are free to implement as many as you want then test to see which is faster/fewer operations by supplying them partially sorted data.

Can you sort n integers in O(n) amortized complexity?

Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
What about trying to create a worst case of O(n) complexity?
Most of the algorithms today are built on O(nlogn) average + O(n^2) worst case.
Some, while using more memory are O(nlogn) worst.
Can you with no limitation on memory usage create such an algorithm?
What if your memory is limited? how will this hurt your algorithm?
Any page on the intertubes that deals with comparison-based sorts will tell you that you cannot sort faster than O(n lg n) with comparison sorts. That is, if your sorting algorithm decides the order by comparing 2 elements against each other, you cannot do better than that. Examples include quicksort, bubblesort, mergesort.
Some algorithms, like count sort or bucket sort or radix sort do not use comparisons. Instead, they rely on the properties of the data itself, like the range of values in the data or the size of the data value.
Those algorithms might have faster complexities. Here is an example scenario:
You are sorting 10^6 integers, and each integer is between 0 and 10. Then you can just count the number of zeros, ones, twos, etc. and spit them back out in sorted order. That is how countsort works, in O(n + m) where m is the number of values your datum can take (in this case, m=11).
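A minimal sketch of that scenario in Python, assuming every value is an integer between 0 and `max_value` inclusive (the function and parameter names are just for illustration):

```python
def counting_sort(nums, max_value):
    """Sort integers in the range 0..max_value in O(n + m), m = max_value + 1."""
    counts = [0] * (max_value + 1)
    for x in nums:                    # count the zeros, ones, twos, ...
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)       # spit them back out in sorted order
    return out
```

For the example above you would call `counting_sort(data, 10)`, which allocates 11 counters.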
Another:
You are sorting 10^6 binary strings that are all at most 5 characters in length. You can use the radix sort for that: first split them into 2 buckets depending on their first character, then radix-sort them for the second character, third, fourth and fifth. As long as each step is a stable sort, you should end up with a perfectly sorted list in O(nm), where m is the number of digits or bits in your datum (in this case, m=5).
But in the general case, you cannot sort faster than O(n lg n) reliably (using a comparison sort).
I'm not quite happy with the accepted answer so far. So I'm retrying an answer:
Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
The answer to this question depends on the machine that would execute the sorting algorithm. If you have a random access machine, which can operate on exactly 1 bit, you can do radix sort for integers with at most k bits, which was already suggested. So you end up with complexity O(kn).
But if you are operating on a fixed size word machine with a word size of at least k bits (which all consumer computers are), the best you can achieve is O(n log n). This is because either log n < k or you could do a count sort first and then sort with a O (n log n) algorithm, which would yield the first case also.
What about trying to create a worst case of O(n) complexity?
That is not possible. A link was already given. The idea of the proof is that in order to be able to sort, you have to decide for every element to be sorted whether it is larger or smaller than any other element to be sorted. Using transitivity, this can be represented as a decision tree, which needs at least n! leaves (one per possible ordering) and therefore has depth at least log2(n!), which is Θ(n log n). So if you want performance better than Ω(n log n), this means removing edges from that decision tree. But if the decision tree is not complete, then how can you make sure that you have made a correct decision about some elements a and b?
Can you with no limitation on memory usage create such an algorithm?
As explained above, that is not possible. The remaining questions are therefore of no relevance.
If the integers are in a limited range, then an O(n) "sort" of them would involve having a bit vector with one bit per possible value ... looping over the integers in question and, for each integer x, setting bit x%8 of the byte at offset x//8 in that byte array to true. That is an "O(n)" operation. Another loop over that bit array to list/enumerate/return/print all the set bits is, likewise, an O(n) operation. (Naturally, O(2n) reduces to O(n).)
This is a special case where the range is small enough to fit within memory or in a file (with seek() operations). It is not a general solution; but it is described in Bentley's "Programming Pearls" --- and was allegedly a practical solution to a real-world problem (involving something like a "freelist" of telephone numbers ... something like: find the first available phone number that could be issued to a new subscriber).
(Note: flagging every possible integer up to 10 digits in length takes 10^10 bits, roughly 1.2 GB ... so there's room within the 2^31 bytes of a typical Unix/Linux maximum sized memory mapping).
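A small sketch of the bit-vector idea in Python, using a `bytearray` as the bit array. It assumes the integers are distinct, non-negative and smaller than `limit`; duplicates would collapse into a single set bit. The names `bitmap_sort` and `limit` are illustrative.

```python
def bitmap_sort(nums, limit):
    """Sort distinct integers in the range 0..limit-1 using one bit per value."""
    bits = bytearray((limit + 7) // 8)        # limit bits, rounded up to bytes
    for x in nums:
        bits[x // 8] |= 1 << (x % 8)          # set bit x%8 of byte x//8
    # scan the bit vector in increasing order and report the set bits
    return [x for x in range(limit) if bits[x // 8] & (1 << (x % 8))]
```

The second loop walks the whole range rather than the input, so the cost is really O(n + limit), which is why this only pays off when the range is modest.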
I believe you are looking for radix sort.

Resources