Counting Sort has a lower bound of O(n)

The running time of counting sort is Θ(n + k), where k is the range of the input elements. If k = O(n), the algorithm runs in O(n).
Can I say that counting sort has a lower bound of Ω(n) because the algorithm takes O(n) time to solve the problem, and that a lower bound of Ω(n) shows there is no hope of solving this particular problem in time better than Ω(n)?

Well, yes: since T(n, k) = Θ(n + k), we have T(n, k) = Ω(n + k). Since k is nonnegative, n + k = Ω(n), and so T(n, k) = Ω(n) as required.

Another perspective on why the lower bound is indeed Ω(n): if you want to sort an array of n elements, you need to at least look at all the array elements. If you don’t, you can’t form a sorted list of all the elements of the array because you won’t know what those array elements are. :-)
That gives an immediate Ω(n) lower bound for sorting any sequence of n elements, unless you can read multiple elements of the sequence at once (say, using parallelism, or if the array elements are so small that you can read several with a single machine instruction).
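For concreteness, here is a minimal counting-sort sketch (my own illustration, not from the question: it assumes the values are integers in the range 0..k, and it is not the stable version from CLRS). The pass over the count array contributes the Θ(k) part and the passes over the input contribute the Θ(n) part, giving Θ(n + k) overall:

    def counting_sort(a, k):
        """Sort a list of integers in the range 0..k (inclusive)."""
        counts = [0] * (k + 1)                 # Theta(k) initialization
        for x in a:                            # Theta(n): count occurrences
            counts[x] += 1
        out = []
        for value, c in enumerate(counts):     # Theta(n + k): write the output
            out.extend([value] * c)
        return out

    print(counting_sort([3, 1, 4, 1, 5, 0, 2], k=5))  # [0, 1, 1, 2, 3, 4, 5]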

Related

How do you sort an array in the most efficient way when given the largest value?

Let's say that I have an array of size n and the largest value in this array is k.
Let's assume that k = log(sqrt(n)) and I want to sort this array in the most efficient way possible. To do this I've rearranged the equation to express n in terms of k and got n = 2^(2k); that's my array size.
Now if I apply any Θ(n^2) sorting algorithm, the time complexity will be Θ(2^(4k)), which in terms of n is Θ(n^2).
And if I apply a Θ(n log n) sorting algorithm, I will have Θ(k * 2^(2k)), which in terms of n is Θ(n log(sqrt(n))), the most efficient of the two. Did I do this right?
And if I assume k = n^n, can I use the same method as before?
For that case I'm failing to express the array size in terms of k so I can use the same method. Is there another way?
Knowing the largest element value k in the array you want to sort might help, especially if, as in your case, k < n. In that case you can use counting sort and will have a runtime of O(n + k).
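Plugging the question's assumption k = log(sqrt(n)) into these bounds (a quick check I am adding, with log taken base 2):

    k = \log_2 \sqrt{n} = \tfrac{1}{2}\log_2 n \;\Longrightarrow\; n = 2^{2k}
    \Theta(n^2) = \Theta(2^{4k})
    \Theta(n \log n) = \Theta(2^{2k} \cdot 2k) = \Theta(k \cdot 2^{2k}) = \Theta(n \log \sqrt{n})
    \text{counting sort: } O(n + k) = O(n + \log \sqrt{n}) = O(n)

So with k this small, counting sort's O(n + k) collapses to O(n), which beats any comparison-based bound.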

Should we ignore constant k in O(nk)?

I was reading CLRS when I encountered this problem (sorting n/k sublists of length k with insertion sort and then merging them, for a total running time of Θ(nk + n lg(n/k))):
Why do we not ignore the constant k in the big-O expressions in parts (a), (b), and (c)?
In this case, you aren't considering the running time of a single algorithm, but of a family of algorithms parameterized by k. Considering k lets you compare, for example, sorting one list of n elements (k = n) against sorting n/2 two-element lists (k = 2). Somewhere in between there is a value of k, which you are asked to find in part (c), for which Θ(nk + n lg(n/k)) and Θ(n lg n) coincide.
Going into more detail, insertion sort is O(n^2) because (roughly speaking) in the worst case any single insertion can take O(n) time. However, if the sublists have a fixed length k, then each insertion step is O(k), which is O(1) with respect to n, independent of how many lists you are sorting. (That is, the bottleneck is no longer the insertion step, but the merge phase.)
k is not a constant when you compare different algorithms with different values of k.
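To make the role of k concrete, here is a sketch of the hybrid algorithm that CLRS problem describes (the function names and the cutoff parameter k below are mine): sort length-k chunks with insertion sort, then merge them pairwise. The insertion-sort passes account for the Θ(nk) term and the roughly lg(n/k) merge rounds account for the Θ(n lg(n/k)) term.

    import heapq

    def insertion_sort(chunk):
        """Insertion sort on a chunk of length <= k: O(k^2) worst case,
        so over n/k chunks this gives the Theta(nk) term."""
        for i in range(1, len(chunk)):
            x, j = chunk[i], i - 1
            while j >= 0 and chunk[j] > x:
                chunk[j + 1] = chunk[j]
                j -= 1
            chunk[j + 1] = x
        return chunk

    def hybrid_sort(a, k):
        """Sort n/k chunks of length k with insertion sort, then merge them."""
        chunks = [insertion_sort(a[i:i + k]) for i in range(0, len(a), k)]
        # Repeatedly merge pairs of chunks: about lg(n/k) rounds of O(n) work each.
        while len(chunks) > 1:
            merged = []
            for i in range(0, len(chunks), 2):
                pair = chunks[i:i + 2]
                merged.append(list(heapq.merge(*pair)))
            chunks = merged
        return chunks[0] if chunks else []

    print(hybrid_sort([5, 2, 9, 1, 7, 3, 8, 6], k=2))  # [1, 2, 3, 5, 6, 7, 8, 9]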

Lower bound Ω(nlogn) of time complexity of every comparison based sorting algorithm given M maximum

Given the maximum element M of an array a[1,...,n] of n elements, how is the lower bound Ω(n log n) on the time complexity of every comparison-based sorting algorithm affected? I must highlight that the maximum element M of the array is given.
It is not affected.
Note that there are n! possible permutations, and each comparison has two possible outcomes: 'left is higher' or 'right is higher'.
For any comparison-based algorithm, each 'decision' is made according to the outcome of one comparison.
Thus, in order to successfully determine the correct order of any permutation, you are going to need (in the worst case) log2(n!) comparisons.
It is well known that log2(n!) is Θ(n log n), so you are back to a lower bound of Ω(n log n), regardless of the range at hand.
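A quick sketch of why log2(n!) = Θ(n log n), without the full Stirling formula:

    \log_2(n!) = \sum_{i=1}^{n} \log_2 i \;\le\; n \log_2 n
    \log_2(n!) \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i \;\ge\; \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)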
Note that other methods that do not use (only) comparisons exist to sort integers more efficiently.
If M is really a bound on the absolute values of the elements of the array, and the elements are integers, you can sort the array in O(n + M) time, by keeping a separate array int occurrences[2M + 1]; initialized to 0, scanning your original array and counting the number of occurrences of each element, and writing the output array using occurrences.
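A sketch of that idea in code (the helper name is mine; it assumes integer elements with |x| <= M, shifted by M so negative values land at valid indices):

    def sort_bounded_ints(a, M):
        """Sort integers x with |x| <= M in O(n + M) time using an occurrence table."""
        occurrences = [0] * (2 * M + 1)   # index i holds the count of value i - M
        for x in a:
            occurrences[x + M] += 1       # shift by M so negatives get valid indices
        out = []
        for i, c in enumerate(occurrences):
            out.extend([i - M] * c)       # shift back when writing the output
        return out

    print(sort_bounded_ints([3, -2, 0, -2, 5], M=5))  # [-2, -2, 0, 3, 5]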
If the elements are floats (formally, real numbers), having a bound on their magnitudes has no effect.
If the elements are integers but their magnitudes (and hence M) can be arbitrarily large relative to n, then having an upper bound on the magnitudes likewise has no effect, since O(n + M) is no improvement in that case.

Sorting m sets of total O(n) elements in O(n)

Suppose we have m sets S1, S2, ..., Sm of elements from {1, ..., n}.
Given that m = O(n) and |S1| + |S2| + ... + |Sm| = O(n),
sort all the sets in O(n) time and O(n) space.
I was thinking of using the counting sort algorithm on each set.
Counting sort on each set will be O(|S1|) + O(|S2|) + ... + O(|Sm|) = O(n),
and even in the worst case, where one set consists of n elements, it will still take O(n).
But will it solve the problem, and does it still hold that it uses only O(n) space?
Your approach won't necessarily work in O(n) time. Imagine you have n sets of one element each, where each set just holds the value n. Then each run of counting sort needs a count array of size n, so it takes Θ(n) to complete, and the total runtime is Θ(n^2).
However, you can use a modified counting sort to solve this by effectively running counting sort on all the sets at the same time. Create an array of n lists. Then iterate over all the sets, and for each element, if its value is k and it came from set number r, append r to the list at index k. This process essentially builds a histogram of the distribution of the elements across the sets, where each element is annotated with the set it came from. Then iterate over the lists in increasing index order and rebuild the sets in sorted order, using logic similar to counting sort.
Overall, this algorithm takes time Θ(n): Θ(n) to initialize the array, O(n) total to distribute the elements, and O(n) to write them back. It also uses only Θ(n) space, since there are n lists and, across all of them, only O(n) elements in total.
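Here is a sketch of that approach (variable names are mine), assuming the sets are given as lists of integers in the range 1..n:

    def sort_all_sets(sets, n):
        """Sort m sets of values in 1..n in O(n + total size) time and space."""
        buckets = [[] for _ in range(n + 1)]   # buckets[k] holds the indices of sets containing k
        for r, s in enumerate(sets):           # distribute: O(total number of elements)
            for k in s:
                buckets[k].append(r)
        result = [[] for _ in sets]
        for k in range(1, n + 1):              # collect in increasing value order: O(n + total)
            for r in buckets[k]:
                result[r].append(k)
        return result

    print(sort_all_sets([[3, 1, 4], [2], [5]], n=5))  # [[1, 3, 4], [2], [5]]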
Hope this helps!

Prove that the running time of quick sort after modification = O(Nk)

This is a homework question, and I'm not that good at finding the complexity, but I'm trying my best!
Three-way partitioning is a modification of quicksort that partitions elements into groups smaller than, equal to, and larger than the pivot. Only the groups of smaller and larger elements need to be recursively sorted. Show that if there are N items but only k unique values (in other words there are many duplicates), then the running time of this modification to quicksort is O(Nk).
My try:
On the average case, the three partitions will be at these indices
(I assume that the partition holding the duplicated items has size n - k):
first: from 0 to (i - 1)
second: from i to (i + (n - k - 1))
third: from (i + n - k) to (n - 1)
Number of comparisons = (n - k) - 1.
So,
T(n) = (n - k) - 1 + Σ from i = 0 to (n - k - 1) of [T(i) + T(i - k)]
Then I'm not sure how to continue :S
It might be a very bad start though :$
Hope to find some help.
First of all, you shouldn't look at the average case since the upper bound of O(nk) can be proved for the worst case, which is a stronger statement.
You should look at the maximum possible depth of recursion. In normal quicksort, the maximum depth is n. For each level, the total number of operations done is O(n), which gives O(n^2) total in the worst case.
Here, it's not hard to prove that the maximum possible depth is k (since one unique value will be removed at each level), which leads to O(nk) total.
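For reference, a sketch of quicksort with three-way partitioning (my own illustration, not a tuned implementation): the group equal to the pivot is never recursed on, so every root-to-leaf path in the recursion removes at least one of the k distinct values, and each level does O(N) partitioning work in total, which is where O(Nk) comes from.

    def quicksort_3way(a):
        """Quicksort with three-way partitioning; O(Nk) for N items with k distinct values."""
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        less = [x for x in a if x < pivot]      # O(N) partitioning work per recursion level
        equal = [x for x in a if x == pivot]    # duplicates of the pivot are never recursed on
        greater = [x for x in a if x > pivot]
        return quicksort_3way(less) + equal + quicksort_3way(greater)

    print(quicksort_3way([4, 1, 4, 2, 4, 2, 1]))  # [1, 1, 2, 2, 4, 4, 4]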
I don't have a formal education in complexity, but if you think about it as a mathematical problem, you can reason it out like a proof.
Any sorting algorithm needs at least Ω(n) time on n elements, because to sort n elements you have to consider each one at least once. Now, for this particular modification of quicksort, what you have done is simplify the problem, because you are effectively sorting only the unique values: all the values equal to the pivot are already considered sorted, and by its nature quicksort guarantees that every unique value will feature as the pivot at some point, so this eliminates the duplicates.
This means that for a list of size N, quicksort must perform some work for each of the N positions, and that work is finding where the value at that position belongs relative to the other values. But because you are effectively dealing only with unique values, and there are k of those, each element is compared against at most k pivot values. So the algorithm performs O(Nk) operations for an N-sized list with k unique values.
To summarise:
This modification eliminates checking against duplicate values.
But every sorting algorithm must look at every value in the list at least once: N operations.
For every value in the list, the work is to find its position relative to the other values in the list.
Because the duplicates are removed from consideration, this leaves only k distinct values to check against.
Hence O(Nk).
