Quicksort Analysis - sorting

Question:
Here's a modification of quick sort: whenever we have ten items or fewer in a sublist, we sort the sublist using selection sort rather than further recursing with quicksort. Does this change the big-oh time complexity of quicksort? Explain.
In my opinion the big-oh time complexity would change. We know that selection sort is O(n^2) and therefore sorting the sublist of ten items or fewer would take O(n^2). Until we get to a sublist that has ten or fewer items we would use quicksort and keep partitioning the list. So in the end we would have O( nlogn + n^2) which is O(n^2).
Am I correct? If not, could someone explain why?

The reason that the time complexity is actually unaffected is that 10 is a constant term. No matter how large the total array is, it always takes a constant amount of time to sort subarrays of size 10 or less. If you are sorting a list with one million elements, that constant 10 is going to play a very small role in the actual time it takes to sort the list (most of the time will be spent partitioning the original array into subarrays recursively).
If sorting a list of 10 elements takes constant time, partitioning the array at each recursive step is linear, and you end up with log n subarrays of 10 items or fewer, you end up with O(n log n + log n), which is the same as O(n log n).
Saying that selection sort is O(n^2) means that the running time of the algorithm increases quadratically with the size of the input. Running selection sort on an array with a constant number of elements will always take constant time on its own, but if you were to compare the running time of selection sort on arrays of varying sizes, you would see a quadratic increase in the running time as input size varies linearly.

The big O complexity does not change. Please read up on the Master Method (aka Master Theorem) https://en.wikipedia.org/wiki/Master_theorem
If you think through the algorithm as the size of the sorting list grows exceptionally large the time to sort the final ten in any given recursion substree will make insignificant contributions to overall running time.

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity log(n)??
example [8,2,7,5,0,1]
sort given array with time complexity log(n)
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to, say, sort arbitrary objects like strings in less than n*log(n).
That said, there may be times when the list is not arbitrary; ex. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, methods like binary search tree can let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (ie. performing n inserts) is n*log(n) time.
As #dominicm00 also mentioned the answer is no.
In general when you see an algorithm with time complexity of Log N with base 2 that means that, you are dividing the input list into 2 sets, and getting rid of one of them repeatedly. In sorting algorithm we need to put all the elements in their appropriate place, if we get rid of half of the list in each iteration, that does not correlate with sorting functionality.
The most efficient sorting algorithms have the time complexity of O(n), but with some limitations. Three most famous algorithm with complexity of O(n) are :
Counting sort with time complexity of O(n+k), while k is the maximum number in given list. Assuming n>>k, you can consider its time complexity as O(n)
Radix sort with time complexity of O(d*(n+k)), where k is maximum number of input list and d is maximum number of digits you may have in input list. Similar to counting sort assuming n>>k && n>>d => time complexity will be O(n)
Bucket sort with time complexity of O(n)
But in general due to limitation of each of these algorithms most implementation relies on O(n* log n) algorithms, such as merge sort, quick sort, and heap sort.
Also there are some sorting algorithms with time complexity of O(n^2) which are recommended for list with smaller sizes such as insertion sort, selection sort, and bubble sort.
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massive parallel computation would be able to do this, general purpose CPU's would not do here unless they implement it in silicon as part of their SIMD.

Finding the top log n elements of an unsorted list in linear time

I was given a question to find the top log(n) elements in an unsorted array. I know that I can do this in O(n) time with a selection algorithm to find the log(n)-th largest element and then find all elements larger than it. However, would it be possible to use a heap or other priority queue to do it in O(n) time as well?
Thanks
To get the top k items from an unsorted list, you employ the the Quickselect algorithm to partition the array such that the first k elements are at the beginning of the array.
Quickselect is, on average, O(n). However, in pathological cases it can take O(n^2) time.
Using a heap, selecting the top k elements from a list of n is O(n log k). If you're taking the first log(n) items, then k = log(n), and your complexity is O(n log(log(n)).
From a performance perspective, using a heap to select the top log(n) items from a list will almost certainly be faster than using Quickselect. The reason is that Quickselect, even though it's (on average) linear, it has a high constant. It takes almost the same time to select the top 10 items from a list of 1,000,000 as it does to select the top 100,000 items. But my research shows that using a heap to do the selection is faster than Quickselect when the number to be selected is much smaller (less than 1%) of the total number of items. Considering that the log base 2 of 1,000,000 is about 20, it's almost certain that a heap selection algorithm would be faster than Quickselect.

Hash with O(N) Run time using O (k) memory

So assume we are given an array of m numbers, the max number in this array is k. There are duplicates in this array.
let array a = [1,2,3,1,2,5,1,2,3,4]
Is there an algorithm that prints out this array after o(n) operation result in [1,2,3,4,5](both sorted and no duplicate), where n is the quantity of unique values.
We are allowed to use k memory -- 5 in this case.
The algorithm I have in mind is to use a hash table. Insert value into a hash table, if the value exist before, we ignore it. This will sort automatically. However, if we have 5 number, [1,2,3,100,4] but one of them is 100, means when printing these 5 numbers, we need to run o(k) ~= 100 time instead of o(n) ~= 5 time.
Is there a way to solve this problem?
I don't think there exists such algorithm. Take a look here https://en.wikipedia.org/wiki/Sorting_algorithm
Essentially for comparison based algorithms the best you can achieve is O(nlogn). But since you have provided the max value k I would assume you want something more than just comparison based algorithm.
But for non-comparison based algorithms, since it by nature depends on magnitude of the numbers, the complexity has to reflect such dependency - meaning you will definitely have k somewhere in your total time complexity. You won't be able to find an algorithm of just O(n).
Conversly, if that O(n) algorithm were to exist and were not to depend on k. You can sort any array of n numbers since k is an extra, useless information.
You suggest that printing 5 numbers takes o(k) (or 100) time instead of o(n). That is wrong because, to print those 5 numbers, it takes 5 time to iterate and print. How would the value of your number change the time in which it takes to pull off this problem? The only situation when that should make a difference is if the value is greater than the allowable value in a 32-bit integer, or 2^32-1. Then you would have to detect those cases and treat them differently. However, assuming you don't have any integers of that size, you should be able to print 5 integers in O(5) time. I would go back over your calculation of the time it takes to go through your algorithm.
With your method, if you're using efficient algorithms, you should be able to remove duplicates in O(n log n) time, as seen here.
The way I see it, if you have a piece of the algorithm (the hashing part, where you remove duplicates and sort) running in O(n log n) time, and a piece of the algorithm (printing the array) running O(N) (or O(5) in this case), the entire algorithm runs in O(N) time: O(N) + O(N log N) -> O(N), since O(N) >= O(N log N). I hope that answers what you were asking for!
Looks like I was wrong, since O(N log N) of course grows faster than O(N). I don't think there's any way to pull off your problem.

Amortized Analysis of Algorithms

I am currently reading amortized analysis. I am not able to fully understand how it is different from normal analysis we perform to calculate average or worst case behaviour of algorithms. Can someone explain it with an example of sorting or something ?
Amortized analysis gives the average performance (over time) of each operation in
the worst case.
In a sequence of operations the worst case does not occur often in each operation - some operations may be cheap, some may be expensive Therefore, a traditional worst-case per operation analysis can give overly pessimistic bound. For example, in a dynamic array only some inserts take a linear time, though others - a constant time.
When different inserts take different times, how can we accurately calculate the total time? The amortized approach is going to assign an "artificial cost" to each operation in the sequence, called the amortized cost of an operation. It requires that the total real cost of the sequence should be bounded by the total of the amortized costs of all the operations.
Note, there is sometimes flexibility in the assignment of amortized costs.
Three methods are used in amortized analysis
Aggregate Method (or brute force)
Accounting Method (or the banker's method)
Potential Method (or the physicist's method)
For instance assume we’re sorting an array in which all the keys are distinct (since this is the slowest case, and takes the same amount of time as when they are not, if we don’t do anything special with keys that equal the pivot).
Quicksort chooses a random pivot. The pivot is equally likely to be the smallest key,
the second smallest, the third smallest, ..., or the largest. For each key, the
probability is 1/n. Let T(n) be a random variable equal to the running time of quicksort on
n distinct keys. Suppose quicksort picks the ith smallest key as the pivot. Then we run quicksort recursively on a list of length i − 1 and on a list of
length n − i. It takes O(n) time to partition and concatenate the lists–let’s
say at most n dollars–so the running time is
Here i is a random variable that can be any number from 1 (pivot is the
smallest key) to n (pivot is largest key), each chosen with probability 1/n,
so
This equation is called a recurrence. The base cases for the recurrence are T(0) = 1 and T(1) = 1. This means that sorting a list of length zero or one takes at most one dollar (unit of time).
So when you solve:
The expression 1 + 8j log_2 j might be an overestimate, but it doesn’t
matter. The important point is that this proves that E[T(n)] is in O(n log n).
In other words, the expected running time of quicksort is in O(n log n).
Also there’s a subtle but important difference between amortized running time
and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small
possibility that quicksort will cost (n^2) dollars, but the probability that this
will happen approaches zero as n grows large.
Quicksort O(n log n) expected time
Quickselect Θ(n) expected time
For a numeric example:
The Comparison Based Sorting Lower Bound is:
Finally you can find more information about quicksort average case analysis here
average - a probabilistic analysis, the average is in relation to all of the possible inputs, it is an estimate of the likely run time of the algorithm.
amortized - non probabilistic analysis, calculated in relation to a batch of calls to the algorithm.
example - dynamic sized stack:
say we define a stack of some size, and whenever we use up the space, we allocate twice the old size, and copy the elements into the new location.
overall our costs are:
O(1) per insertion \ deletion
O(n) per insertion ( allocation and copying ) when the stack is full
so now we ask, how much time would n insertions take?
one might say O(n^2), however we don't pay O(n) for every insertion.
so we are being pessimistic, the correct answer is O(n) time for n insertions, lets see why:
lets say we start with array size = 1.
ignoring copying we would pay O(n) per n insertions.
now, we do a full copy only when the stack has these number of elements:
1,2,4,8,...,n/2,n
for each of these sizes we do a copy and alloc, so to sum the cost we get:
const*(1+2+4+8+...+n/4+n/2+n) = const*(n+n/2+n/4+...+8+4+2+1) <= const*n(1+1/2+1/4+1/8+...)
where (1+1/2+1/4+1/8+...) = 2
so we pay O(n) for all of the copying + O(n) for the actual n insertions
O(n) worst case for n operation -> O(1) amortized per one operation.

Intuitive explanation for why QuickSort is n log n?

Is anybody able to give a 'plain english' intuitive, yet formal, explanation of what makes QuickSort n log n? From my understanding it has to make a pass over n items, and it does this log n times...Im not sure how to put it into words why it does this log n times.
Complexity
A Quicksort starts by partitioning the input into two chunks: it chooses a "pivot" value, and partitions the input into those less than the pivot value and those larger than the pivot value (and, of course, any equal to the pivot value have go into one or the other, of course, but for a basic description, it doesn't matter a lot which those end up in).
Since the input (by definition) isn't sorted, to partition it like that, it has to look at every item in the input, so that's an O(N) operation. After it's partitioned the input the first time, it recursively sorts each of those "chunks". Each of those recursive calls looks at every one of its inputs, so between the two calls it ends up visiting every input value (again). So, at the first "level" of partitioning, we have one call that looks at every input item. At the second level, we have two partitioning steps, but between the two, they (again) look at every input item. Each successive level has more individual partitioning steps, but in total the calls at each level look at all the input items.
It continues partitioning the input into smaller and smaller pieces until it reaches some lower limit on the size of a partition. The smallest that could possibly be would be a single item in each partition.
Ideal Case
In the ideal case we hope each partitioning step breaks the input in half. The "halves" probably won't be precisely equal, but if we choose the pivot well, they should be pretty close. To keep the math simple, let's assume perfect partitioning, so we get exact halves every time.
In this case, the number of times we can break it in half will be the base-2 logarithm of the number of inputs. For example, given 128 inputs, we get partition sizes of 64, 32, 16, 8, 4, 2, and 1. That's 7 levels of partitioning (and yes log2(128) = 7).
So, we have log(N) partitioning "levels", and each level has to visit all N inputs. So, log(N) levels times N operations per level gives us O(N log N) overall complexity.
Worst Case
Now let's revisit that assumption that each partitioning level will "break" the input precisely in half. Depending on how good a choice of partitioning element we make, we might not get precisely equal halves. So what's the worst that could happen? The worst case is a pivot that's actually the smallest or largest element in the input. In this case, we do an O(N) partitioning level, but instead of getting two halves of equal size, we've ended up with one partition of one element, and one partition of N-1 elements. If that happens for every level of partitioning, we obviously end up doing O(N) partitioning levels before even partition is down to one element.
This gives the technically correct big-O complexity for Quicksort (big-O officially refers to the upper bound on complexity). Since we have O(N) levels of partitioning, and each level requires O(N) steps, we end up with O(N * N) (i.e., O(N2)) complexity.
Practical implementations
As a practical matter, a real implementation will typically stop partitioning before it actually reaches partitions of a single element. In a typical case, when a partition contains, say, 10 elements or fewer, you'll stop partitioning and and use something like an insertion sort (since it's typically faster for a small number of elements).
Modified Algorithms
More recently other modifications to Quicksort have been invented (e.g., Introsort, PDQ Sort) which prevent that O(N2) worst case. Introsort does so by keeping track of the current partitioning "level", and when/if it goes too deep, it'll switch to a heap sort, which is slower than Quicksort for typical inputs, but guarantees O(N log N) complexity for any inputs.
PDQ sort adds another twist to that: since Heap sort is slower, it tries to avoid switching to heap sort if possible To to that, if it looks like it's getting poor pivot values, it'll randomly shuffle some of the inputs before choosing a pivot. Then, if (and only if) that fails to produce sufficiently better pivot values, it'll switch to using a Heap sort instead.
Each partitioning operation takes O(n) operations (one pass on the array).
In average, each partitioning divides the array to two parts (which sums up to log n operations). In total we have O(n * log n) operations.
I.e. in average log n partitioning operations and each partitioning takes O(n) operations.
There's a key intuition behind logarithms:
The number of times you can divide a number n by a constant before reaching 1 is O(log n).
In other words, if you see a runtime that has an O(log n) term, there's a good chance that you'll find something that repeatedly shrinks by a constant factor.
In quicksort, what's shrinking by a constant factor is the size of the largest recursive call at each level. Quicksort works by picking a pivot, splitting the array into two subarrays of elements smaller than the pivot and elements bigger than the pivot, then recursively sorting each subarray.
If you pick the pivot randomly, then there's a 50% chance that the chosen pivot will be in the middle 50% of the elements, which means that there's a 50% chance that the larger of the two subarrays will be at most 75% the size of the original. (Do you see why?)
Therefore, a good intuition for why quicksort runs in time O(n log n) is the following: each layer in the recursion tree does O(n) work, and since each recursive call has a good chance of reducing the size of the array by at least 25%, we'd expect there to be O(log n) layers before you run out of elements to throw away out of the array.
This assumes, of course, that you're choosing pivots randomly. Many implementations of quicksort use heuristics to try to get a nice pivot without too much work, and those implementations can, unfortunately, lead to poor overall runtimes in the worst case. #Jerry Coffin's excellent answer to this question talks about some variations on quicksort that guarantee O(n log n) worst-case behavior by switching which sorting algorithms are used, and that's a great place to look for more information about this.
Well, it's not always n(log n). It is the performance time when the pivot chosen is approximately in the middle. In worst case if you choose the smallest or the largest element as the pivot then the time will be O(n^2).
To visualize 'n log n', you can assume the pivot to be element closest to the average of all the elements in the array to be sorted.
This would partition the array into 2 parts of roughly same length.
On both of these you apply the quicksort procedure.
As in each step you go on halving the length of the array, you will do this for log n(base 2) times till you reach length = 1 i.e a sorted array of 1 element.
Break the sorting algorithm in two parts. First is the partitioning and second recursive call. Complexity of partioning is O(N) and complexity of recursive call for ideal case is O(logN). For example, if you have 4 inputs then there will be 2(log4) recursive call. Multiplying both you get O(NlogN). It is a very basic explanation.
In-fact you need to find the position of all the N elements(pivot),but the maximum number of comparisons is logN for each element (the first is N,second pivot N/2,3rd N/4..assuming pivot is the median element)
In the case of the ideal scenario, the first level call, places 1 element in its proper position. there are 2 calls at the second level taking O(n) time combined but it puts 2 elements in their proper position. in the same way. there will be 4 calls at the 3rd level which would take O(n) combined time but will place 4 elements into their proper position. so the depth of the recursive tree will be log(n) and at each depth, O(n) time is needed for all recursive calls. So time complexity is O(nlogn).

Resources