Can you sort n integers in O(n) amortized complexity?

Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
What about trying to create a worst case of O(n) complexity?
Most of the algorithms in use today run in O(n log n) on average, with an O(n^2) worst case.
Some, while using more memory, are O(n log n) in the worst case.
Can you, with no limitation on memory usage, create such an algorithm?
What if your memory is limited? How will this affect your algorithm?

Any page on the intertubes that deals with comparison-based sorts will tell you that you cannot sort faster than O(n lg n) with comparison sorts. That is, if your sorting algorithm decides the order by comparing 2 elements against each other, you cannot do better than that. Examples include quicksort, bubblesort, mergesort.
Some algorithms, like count sort or bucket sort or radix sort do not use comparisons. Instead, they rely on the properties of the data itself, like the range of values in the data or the size of the data value.
Those algorithms might have faster complexities. Here is an example scenario:
You are sorting 10^6 integers, and each integer is between 0 and 10. Then you can just count the number of zeros, ones, twos, etc. and spit them back out in sorted order. That is how countsort works, in O(n + m) where m is the number of values your datum can take (in this case, m=11).
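As a concrete sketch of that idea in Python (the function name and the 0..10 range are illustrative, not from the original question):

    def counting_sort(nums, max_value):
        # counts[v] = number of occurrences of v; O(m) space, m = max_value + 1
        counts = [0] * (max_value + 1)
        for v in nums:                       # O(n) counting pass
            counts[v] += 1
        result = []
        for v, c in enumerate(counts):       # O(n + m) output pass
            result.extend([v] * c)           # emit each value c times
        return result

    # counting_sort([3, 0, 10, 7, 3], 10) -> [0, 3, 3, 7, 10]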
Another:
You are sorting 10^6 binary strings that are all at most 5 characters in length (pad them to a common length first). You can use radix sort for that: split them into 2 buckets depending on their last character, then re-split on the fourth, third, second, and first characters. As long as each pass is a stable sort, you end up with a perfectly sorted list in O(nm), where m is the number of digits or bits in your datum (in this case, m=5).
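A hedged sketch of those stable passes (assumes the strings have already been padded to exactly m characters; names are illustrative):

    def radix_sort_binary(strings, m):
        # LSD radix sort: one stable 2-bucket split per character position.
        for pos in range(m - 1, -1, -1):     # last character first
            zeros = [s for s in strings if s[pos] == '0']
            ones = [s for s in strings if s[pos] == '1']
            strings = zeros + ones           # stable: earlier order preserved
        return strings

    # radix_sort_binary(['110', '001', '100', '011'], 3)
    # -> ['001', '011', '100', '110']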
But in the general case, you cannot sort faster than O(n lg n) reliably (using a comparison sort).

I'm not quite happy with the accepted answer so far, so here is another attempt:
Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
The answer to this question depends on the machine that would execute the sorting algorithm. If you have a random access machine that can operate on exactly 1 bit, you can do radix sort for integers with at most k bits, as was already suggested, and you end up with complexity O(kn).
But if you are operating on a fixed-size word machine with a word size of at least k bits (which all consumer computers are), the best you can achieve is O(n log n). This is because either log n < k, in which case the O(kn) of radix sort is no improvement, or you could do a count sort first and then finish with an O(n log n) algorithm, which reduces to the first case as well.
What about trying to create a worst case of O(n) complexity?
That is not possible; a link was already given. The idea of the proof is that in order to sort, you have to decide for every pair of elements which one is larger. Using transitivity, this can be represented as a decision tree, which has n! leaves (one per possible ordering of the input) and therefore depth at least log2(n!) = Ω(n log n). So if you want performance better than Ω(n log n), that means removing edges from that decision tree. But if the decision tree is not complete, then how can you be sure that you have made a correct decision about some elements a and b?
Can you with no limitation on memory usage create such an algorithm?
As shown above, that is not possible, so the remaining questions are of no relevance.

If the integers are in a limited range then an O(n) "sort" of them would involve having a bit vector ... looping over the integers in question and, for each value v, setting bit v % 8 of byte v // 8 in that byte array to true. That is an "O(n)" operation. Another loop over that bit array to list/enumerate/return/print all the set bits is, likewise, an O(n) operation. (Naturally O(2n) reduces to O(n).)
This is a special case where the range is small enough to fit within memory or in a file (with seek() operations). It is not a general solution, but it is described in Bentley's "Programming Pearls" --- and was allegedly a practical solution to a real-world problem (involving something like a "freelist" of telephone numbers ... something like: find the first available phone number that could be issued to a new subscriber).
(Note: log2(10^7) is ~24, so ~2^24 bits are enough to give every possible 7-digit phone number its own bit ... there's plenty of room in the 2^31 bits of a typical Unix/Linux maximum-sized memory mapping.)
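A rough Python rendering of that bit-vector technique (the bounds and names are illustrative; note it collapses duplicates, which is fine for a freelist of distinct phone numbers):

    def bitmap_sort(nums, max_value):
        bits = bytearray(max_value // 8 + 1)   # one bit per possible value
        for v in nums:                         # O(n): set bit v % 8 of byte v // 8
            bits[v // 8] |= 1 << (v % 8)
        out = []
        for v in range(max_value + 1):         # O(range): read the bits back in order
            if bits[v // 8] & (1 << (v % 8)):
                out.append(v)
        return out

    # bitmap_sort([9, 2, 7], 9) -> [2, 7, 9]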

I believe you are looking for radix sort.

Related

Merge k sorted arrays of size n in less than O(nklogk) time complexity

The question:
Merge k sorted arrays each with n elements into a single array of size nk in minimum time complexity. The algorithm should be a comparison-based algorithm. No assumption on the input should be made.
So I know about an algorithm that solves the problem in nklogk time complexity as mentioned here: https://www.geeksforgeeks.org/merge-k-sorted-arrays/.
My question, though, is whether we can merge in less than nklogk --- meaning, the runtime is o(nklogk).
So I searched through the internet and found this answer:
Merge k sorted arrays of size n in O(nk) time complexity
That answer claims to divide an array of size K into singletons and merge them into a single array in O(nk). But this is incorrect: by the same logic one could claim an algorithm running in sqrt(n)*k*log(k) time, which is o(nk*log(k)); yet with n=1 that still sorts the array in K log K time, which doesn't contradict the lower bound on sorting an array, whereas an O(nk) merge with n=1 would sort K elements in O(K) time.
So how could such an algorithm avoid contradicting the lower bound on sorting an array? That is, for an array of size N with no assumptions made on the input, sorting must take at least N log N operations.
The lower bound of n log n only applies to comparison-based sorting algorithms (heap sort, merge sort, etc.). There are, of course, sorting algorithms that have better time complexities (such as counting sort), however they are not comparison-based.
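For reference, the O(nklogk) algorithm mentioned in the question keeps a min-heap of one candidate per array, so each of the nk output elements costs O(log k); Python's heapq.merge implements the same idea. A minimal sketch (function name illustrative):

    import heapq

    def merge_k_sorted(arrays):
        # Heap of (value, source array, index within source); size <= k.
        heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
        heapq.heapify(heap)
        out = []
        while heap:
            value, i, j = heapq.heappop(heap)     # O(log k) per element
            out.append(value)
            if j + 1 < len(arrays[i]):
                heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
        return out

    # merge_k_sorted([[1, 4], [2, 5], [0, 3]]) -> [0, 1, 2, 3, 4, 5]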

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity of log(n)?
Example: given the array [8,2,7,5,0,1], sort it with time complexity log(n).
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we have no way to sort arbitrary comparable objects, like strings, in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in place. In that case, a method like a binary search tree can let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) is n*log(n) time.
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N (base 2), it means you are repeatedly dividing the input list into 2 sets and discarding one of them. In a sorting algorithm we need to put every element in its appropriate place; discarding half of the list on each iteration is incompatible with that.
The most efficient sorting algorithms have time complexity O(n), but with some limitations. The three most famous algorithms with complexity O(n) are:
Counting sort, with time complexity O(n+k), where k is the maximum number in the given list. Assuming n >> k, you can consider its time complexity to be O(n).
Radix sort, with time complexity O(d*(n+k)), where k is the radix (the number of possible digit values) and d is the maximum number of digits in the input list. Similarly to counting sort, assuming n >> k and n >> d, the time complexity is O(n).
Bucket sort, with average time complexity O(n), assuming roughly uniformly distributed input (see the sketch after this answer).
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n log n) algorithms such as merge sort, quick sort, and heap sort.
There are also some sorting algorithms with time complexity O(n^2) which are recommended for smaller lists, such as insertion sort, selection sort, and bubble sort.
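Here is the textbook bucket sort sketch referred to above --- a hedged illustration assuming the inputs are floats spread roughly uniformly over [0, 1), which is exactly the limitation the O(n) average case depends on:

    def bucket_sort(nums):
        # Assumes nums are in [0, 1) and roughly uniform, so each bucket
        # holds O(1) elements on average.
        n = len(nums)
        buckets = [[] for _ in range(n)]
        for x in nums:
            buckets[int(x * n)].append(x)    # O(n) scatter pass
        out = []
        for b in buckets:
            out.extend(sorted(b))            # tiny per-bucket sorts
        return out

    # bucket_sort([0.42, 0.31, 0.73, 0.12]) -> [0.12, 0.31, 0.42, 0.73]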
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel computation would be able to do this; general-purpose CPUs would not do here unless they implement it in silicon as part of their SIMD.

Hash with O(n) run time using O(k) memory

So assume we are given an array of m numbers, where the max number in this array is k. There are duplicates in this array.
let array a = [1,2,3,1,2,5,1,2,3,4]
Is there an algorithm that prints out this array after O(n) operations, resulting in [1,2,3,4,5] (both sorted and with no duplicates), where n is the quantity of unique values?
We are allowed to use O(k) memory --- 5 in this case.
The algorithm I have in mind is to use a hash table. Insert each value into the hash table; if the value already exists, we ignore it. This will sort automatically. However, if we have 5 numbers, [1,2,3,100,4], where one of them is 100, then when printing these 5 numbers we need to run O(k) ~= 100 steps instead of O(n) ~= 5 steps.
Is there a way to solve this problem?
I don't think such an algorithm exists. Take a look here: https://en.wikipedia.org/wiki/Sorting_algorithm
Essentially, for comparison-based algorithms the best you can achieve is O(nlogn). But since you have provided the max value k, I would assume you want something more than just a comparison-based algorithm.
But for non-comparison-based algorithms, since they by nature depend on the magnitude of the numbers, the complexity has to reflect that dependency --- meaning you will definitely have k somewhere in your total time complexity. You won't be able to find an algorithm of just O(n).
Conversely, if such an O(n) algorithm were to exist and not depend on k, you could sort any array of n numbers, since k would be extra, useless information.
You suggest that printing 5 numbers takes O(k) (or 100) time instead of O(n). That is wrong, because to print those 5 numbers it takes 5 steps to iterate and print. How would the value of a number change the time it takes to pull off this problem? The only situation where that should make a difference is if the value is greater than the maximum value of a 32-bit integer, 2^32-1. Then you would have to detect those cases and treat them differently. However, assuming you don't have any integers of that size, you should be able to print 5 integers in O(5) time. I would go back over your calculation of the time it takes to go through your algorithm.
With your method, if you're using efficient algorithms, you should be able to remove duplicates in O(n log n) time, as seen here.
The way I see it, if you have a piece of the algorithm (the hashing part, where you remove duplicates and sort) running in O(N log N) time, and a piece of the algorithm (printing the array) running in O(N) (or O(5) in this case), the entire algorithm runs in O(N) time: O(N) + O(N log N) -> O(N), since O(N) >= O(N log N). I hope that answers what you were asking for!
Looks like I was wrong, since O(N log N) of course grows faster than O(N). I don't think there's any way to pull off your problem.
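To make the role of k concrete, here is a sketch of the O(n + k) direct-address approach the thread is circling around (a boolean array instead of a hash table; names are illustrative):

    def sorted_unique(nums, k):
        seen = [False] * (k + 1)     # O(k) memory, one flag per possible value
        for v in nums:               # O(n) marking pass
            seen[v] = True
        # O(k) output pass: this is where k is unavoidable
        return [v for v in range(k + 1) if seen[v]]

    # sorted_unique([1, 2, 3, 1, 2, 5, 1, 2, 3, 4], 5) -> [1, 2, 3, 4, 5]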

Does comparison really take O(1) time? and if not... why do we use comparison sorts?

Consider two k-bit numbers (in binary representation):
$$A = A_1 A_2 A_3 A_4 ... A_k $$
$$B = B_1 B_2 B_3 B_4 ... B_k $$
to compare them we scan from left to right until we find the first position where the digits differ; the number with the 0 in that position is smaller than the one with the 1. But what if the numbers are:
111111111111
111111111110
clearly this will require scanning the whole number, so if we are told nothing about the numbers ahead of time and are simply handed them, then:
Comparison takes $O(k)$ time.
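As a sketch (bit lists standing in for k-bit numbers, most significant bit first):

    def compare_k_bit(a_bits, b_bits):
        # Worst case touches all k positions, e.g. 111111111111 vs 111111111110.
        for x, y in zip(a_bits, b_bits):
            if x != y:
                return -1 if x < y else 1   # the number holding the 0 is smaller
        return 0                            # equal: also needed the full k bits

    # compare_k_bit([1, 1, 1, 1], [1, 1, 1, 0]) -> 1 (first number is larger)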
Therefore when we look at the code for a sorting method such as high-performance quick sort:
    HPQuicksort(list):                                          T(n)
        check if list is sorted: if so, return list             O(n) (technically O(nk))
        compute median                                          O(n) time (technically O(nk))
        create empty lists L1, L2, and L3                       O(1) time
        scan through list:                                      O(n) iterations
            if element is less, place into L1                   O(k) per comparison
            if element is more, place into L2                   O(k)
            if element is equal, place into L3                  O(k)
        return concatenation of HPQuicksort(L1), L3, HPQuicksort(L2)    2 T(n/2)

Thus: T(n) = O(nk) + 2*T(n/2), which solves to T(n) = O(nk*log(n)).
Which means quicksort is slower than radix sort.
Why do we still use it then?
There seem to be two independent questions here:
Why do we claim that comparisons take time O(1) when analyzing sorting algorithms, when in reality they might not?
Why would we use quicksort on large integers instead of radix sort?
For (1), typically, the runtime analysis of sorting algorithms is measured in terms of the number of comparisons made rather than in terms of the total number of operations performed. For example, the famous sorting lower bound gives a lower bound in terms of number of comparisons, and the analyses of quicksort, heapsort, selection sort, etc. all work by counting comparisons. This is useful for a few reasons. First, typically, a sorting algorithm will be implemented by being given an array and some comparison function used to compare them (for example, C's qsort or Java's Arrays.sort). From the perspective of the sorting algorithm, this is a black box. Therefore, it makes sense to analyze the algorithm by trying to minimize the number of calls to the black box. Second, if we do perform our analyses of sorting algorithms by counting comparisons, it's easy to then determine the overall runtime by multiplying the number of comparisons by the cost of a comparison. For example, you correctly determined that sorting n k-bit integers will take expected time O(kn log n) using quicksort, since you can just multiply the number of comparisons by the cost of a comparison.
For your second question - why would we use quicksort on large integers instead of radix sort? - typically, you would actually use radix sort in this context, not quicksort, for the specific reason that you pointed out. Quicksort is a great sorting algorithm for sorting objects that can be compared to one another and has excellent performance, but radix sort frequently outperforms it on large arrays of large strings or integers.
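To make that concrete, an LSD radix sort for fixed-width integers does one stable pass per byte, so sorting n integers of w bytes costs O(w*n) rather than O(kn log n). A hedged sketch (assumes nonnegative ints below 2**(8*width)):

    def radix_sort_ints(nums, width):
        # One stable 256-bucket split per byte: O(width * (n + 256)) total.
        for shift in range(0, 8 * width, 8):
            buckets = [[] for _ in range(256)]
            for v in nums:
                buckets[(v >> shift) & 0xFF].append(v)
            nums = [v for b in buckets for v in b]
        return nums

    # radix_sort_ints([70000, 3, 256, 65535], 3) -> [3, 256, 65535, 70000]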
Hope this helps!

Is it possible to find two numbers whose difference is minimum in O(n) time

Given an unsorted integer array, and without making any assumptions on the numbers in the array: is it possible to find two numbers whose difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Find smallest and largest element in the list. The difference smallest-largest will be minimum.
If you're looking for the nonnegative difference, then this is of course at least as hard as checking if the array has two equal elements. This is called the element uniqueness problem and, without any additional assumptions (like limiting the size of the integers, or allowing operations other than comparison), it requires >= n log n time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can do it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n * log n)) and then find the minimum difference between adjacent pairs in the sorted list (which adds another O(n)).
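That approach, as a short sketch (assumes at least two elements):

    def min_abs_difference(nums):
        # After sorting, the closest pair must be adjacent.
        nums = sorted(nums)                                  # O(n log n)
        return min(b - a for a, b in zip(nums, nums[1:]))    # O(n) scan

    # min_abs_difference([30, 5, 20, 9]) -> 4   (between 5 and 9)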
I think it is possible. The secret is that you don't actually have to sort the list; you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2-bit elements, one pair for each int from INT_MIN to INT_MAX inclusive, and set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00, set them to 01; if they're 01, set them to 10; otherwise ignore it. This is obviously O(n).
Next, if any pair of bits is set to 10, that is your answer: the minimum distance is 0, because the list contains a repeated number. If not, scan through the tally and find the minimum distance between present values. Many people have already pointed out there are simple O(n) algorithms for this.
So O(n) + O(n) = O(n).
Edit: responding to comments.
Interesting points. I think you could achieve the same result without making any assumptions, by first finding the min/max of the list and then using a sparse array ranging from min to max to hold the data. That takes care of the INT_MIN/INT_MAX assumption, the space complexity, and the O(m) time complexity of scanning the array.
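A hedged sketch of that tally idea, using the min/max refinement from the comment above; note the final scan is O(max - min), not O(n), which is exactly where the limited-range assumption does the work:

    def min_diff_tally(nums):
        lo, hi = min(nums), max(nums)                    # O(n)
        counts = [0] * (hi - lo + 1)                     # O(max - min) space
        for v in nums:                                   # O(n), saturating at 2
            counts[v - lo] = min(counts[v - lo] + 1, 2)
        if 2 in counts:                                  # repeated number
            return 0
        present = [i for i, c in enumerate(counts) if c]
        return min(b - a for a, b in zip(present, present[1:]))

    # min_diff_tally([30, 5, 20, 9]) -> 4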
The best I can think of is to counting-sort the array (possibly combining equal values) and then do the sorted comparisons --- bin sort is O(n + M), M being the number of distinct values. This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integer is a fixed-bit type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
It seems to be possible to sort an unbounded set of integers in O(n*sqrt(log(log(n)))) time. After sorting, it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no, and the proof is similar to the proof that you cannot sort faster than n lg n: you have to compare all of the elements, i.e. create a comparison tree, which implies an Omega(n lg n) algorithm.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)
