Quicksort complexities in depth [closed] - algorithm

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
So I am having an exam, and a big part of this exam will be quicksort algorithm. As everyone knows, the best case scenario and actually an average case for this algorithm is: O(nlogn). The worst case scenario would be O(n^2).
As for the worst case scenario I know how to explain it: It happens when the selected pivot would be the smallest or the biggest value in the array, then we would have n quicksort calls which may take up to n time (I mean partition operation). Am I right?
Now the best/average case. I've read Cormen's book and understood many things thanks to it, but for the quicksort algorithm it focuses on the mathematical formulas that explain the O(nlogn) complexity. I just want to know why it is O(nlogn), without getting into a mathematical proof. So far I've only seen the Wikipedia explanation: if we choose a pivot that divides our array into parts of about n/2 and n/2+1 elements each time, then we get a call tree of depth logn. But I don't know if that is true and, even if so, why the depth is logn.
I know that there are many materials covering quicksort on the internet, but they only cover implementation, or are just telling me the complexity, not explaining it.

Am I right?
Yes.
we would have a call tree of depth logn but I don't know if that is true
It is.
why is it logn?
Because, in the best case, we partition the array in half at every step, and an array of n elements can only be halved about log2(n) times before the pieces reach size 1. That gives a call tree of depth logn.
Picture the recursion tree and its depth: it's logn. It is the same reason a search in a balanced BST costs logn, or why a binary search in a sorted array takes logn too.
PS: Math tells the truth; invest in understanding it, and you shall become a better Computer Scientist! =)
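The halving argument can be checked with a few lines of Python. This is only a sketch: `halving_depth` is a hypothetical helper name, and it models the ideal case where every split is exactly in half.

```python
import math

def halving_depth(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    depth = 0
    while n > 1:
        n //= 2
        depth += 1
    return depth

for n in (16, 1024, 1_000_000):
    # The loop count matches floor(log2(n)) for every n >= 1.
    print(n, halving_depth(n), math.floor(math.log2(n)))
```

For 16 elements the depth is 4, and for 1024 elements it is 10, which is exactly log2 of the size.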

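For the best case scenario, quick sort splits the current array 50% / 50% (in half) on each partition step, so the recursion depth is log2(n) (1/.5 = 2). Each level of the recursion does O(n) partitioning work, for a time complexity of O(n log2(n)). The base 2 only contributes a constant factor, which is ignored, so it's O(n log(n)).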
If each partition step produced a 20% / 80% split, then the worst case time complexity would be based on the 80% side, or O(n log1.25(n)) (1/.8 = 1.25). The base 1.25 again only contributes a constant factor, so it's also O(n log(n)), even though the deepest recursion path is about 3 times longer than in the 50% / 50% partition case for sorting 1 million elements (about 62 levels versus about 20).
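The two depth bounds for 1 million elements can be compared directly. This is a small sketch; the variable names are my own.

```python
import math

n = 1_000_000
balanced = math.log(n, 2)        # depth bound when every split is 50% / 50%
skewed = math.log(n, 1 / 0.8)    # depth bound when every split is 20% / 80%

# Prints both depths and their ratio, rounded for readability.
print(round(balanced, 1), round(skewed, 1), round(skewed / balanced, 1))
```

The ratio does not depend on n, because changing the base of a logarithm only multiplies it by a constant.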
The O(n^2) time complexity occurs when the partition split only produces a linear reduction in partition size with each partition step. The simplest and worst case example is when only 1 element is removed per partition step.
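To see the quadratic case concretely, here is a sketch that counts comparisons. It assumes a naive quicksort that always picks the first element as pivot (one possible choice, not something the answer prescribes), and it uses an explicit stack so that Python's recursion limit is not hit on the degenerate input.

```python
import random

def quicksort_comparisons(items):
    """Sort a copy of items with first-element-pivot quicksort; return the comparison count."""
    data = list(items)
    comparisons = 0
    stack = [(0, len(data) - 1)]          # pending (lo, hi) ranges instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = data[lo]
        i = lo                            # data[lo+1..i] holds elements smaller than pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if data[j] < pivot:
                i += 1
                data[i], data[j] = data[j], data[i]
        data[lo], data[i] = data[i], data[lo]   # move pivot to its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    assert data == sorted(items)          # sanity check that the copy really got sorted
    return comparisons

n = 1000
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
print(quicksort_comparisons(range(n)))    # 499500, i.e. n(n-1)/2: one element removed per step
print(quicksort_comparisons(shuffled))    # far fewer, on the order of n log n
```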

Related

Insertion Sort vs. Merge Sort: which is faster depending on array? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Suppose that we are given an array A already sorted in increasing order. Which is asymptotically faster, insertion-sort or merge-sort?
Likewise, suppose we are given an array B sorted in decreasing order, so it needs to be reversed. Which is now asymptotically faster?
I'm having a hard time grasping this, I already know that insertion-sort is better for smaller data sets and merge-sort is better for larger data sets. However I'm not sure why one is faster than the other depending on whether or not the data set is already sorted or not.
Speaking of the worst case, merge sort is faster, with O(N logN) against O(N^2) for insertion sort. However, another characteristic of an algorithm is its best-case complexity (a lower bound on the running time, written here with Omega), which is Omega(N) for insertion sort against Omega(N logN) for merge sort.
The latter can be explained when looking at the algorithms at hand:
Merge sort works by dividing the array in half (if possible), recursively sorting those halves and merging them. Look how it does not depend on the actual order of elements: we will do recursive calls regardless of whether the part we're sorting is already in order (unless it's the base case).
Insertion sort looks for the first element that is out of the desired order and shifts it to the left until it is in order. If there is no such element, no shifting occurs, and the algorithm finishes after doing only O(N) comparisons.
However, merge sort is quite fixable w.r.t. best running time! You can check whether the part at hand is already sorted before going into recursion. This will not change the worst case complexity of O(N logN) (the constant factor grows, though, since each call now also does a linear scan), but it will bring the best case complexity down to Omega(N).
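A minimal sketch of that fix, assuming a plain top-down merge sort that returns a new list. The early-exit check is the only addition; everything else is the usual algorithm.

```python
def merge_sort(items):
    """Merge sort that returns early when the part at hand is already in order."""
    # O(n) scan; it also covers the 0- and 1-element base cases.
    if all(items[i] <= items[i + 1] for i in range(len(items) - 1)):
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:           # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))     # [1, 2, 5, 5, 6, 9]
```

On an already sorted array, the very first call returns after a single linear scan.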
In the case where the data is sorted in the reverse order, the insertion sort's worst case will show itself, since we'll have to move each element (in the order of iteration) from its position to the first position, doing N(N-1)/2 swaps, which belongs to O(N^2). The merge sort, however, still takes O(N logN) because of its recursive approach.
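The two extremes for insertion sort are easy to measure. The sketch below counts comparisons; the function name is an illustration, not taken from the question.

```python
def insertion_sort_comparisons(items):
    """Return (sorted copy, number of element comparisons) for a plain insertion sort."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if data[j] <= key:            # key is already in place relative to data[j]
                break
            data[j + 1] = data[j]         # shift the larger element one slot right
            j -= 1
        data[j + 1] = key
    return data, comparisons

n = 100
print(insertion_sort_comparisons(range(n))[1])          # 99, i.e. N - 1
print(insertion_sort_comparisons(range(n, 0, -1))[1])   # 4950, i.e. N(N-1)/2
```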

Logarithms in Computer Science for Big O Notation? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have always had this question in my head, and have never been able to connect these two concepts so I am looking for some help in understanding Logarithms in Computer Science with respect to Big-O notation and algorithmic time complexity. I understand logarithms as a math concept as being able to answer the question, "what number do I need to raise this base to exponentially to get X?". For example, log2(16) tells us that we need to raise 2 to the 4th power to get 16. I also have a memorization-level understanding that O(log n) algorithms are faster than O(n) and other slower algorithms such as those that are exponential and that an example of an O(log n) algorithm is searching a balanced binary search tree.
My question is a little hard to state exactly, but I think it boils down to why is searching a balanced BST logarithmic and what makes it logarithmic and how do I relate mathematical logarithms with the CS use of the term? And a follow-up question would be what is the difference between O(n log n) and O(log n)?
I know that is not the clearest question in the world, but if someone could help me connect these two concepts it would clear up a lot of confusion for me and take me past the point of just memorization (which I generally hate).
When you are calculating Big O notation, you are calculating the complexity of an algorithm as the problem size grows.
For example, when performing a linear search of a list, the worst possible case is that the element is either in the last index, or not in the list at all, meaning your search will perform N steps, with N being the number of elements in the list. O(N).
An algorithm that will always take the same amount of steps to complete regardless of problem size is O(1).
Logarithms come into play when you are cutting the problem size as you move through an algorithm. For a binary search (searching a balanced BST works the same way), you start in the middle of a sorted list. If the element to search for is smaller, you only focus on the first half of the list. If it is larger, you only focus on the second half. After only one step, you have cut your problem size in half. You continue cutting the list in half until you either find the element or cannot proceed. (Note that a binary search assumes the list is in order.)
Let's say we are looking for 0 in the list below (a balanced BST can be pictured as this ordered list, with the middle element as the root):
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
We first start in the middle: 7
0 is less than 7 so we look in the first half of the list: [0,1,2,3,4,5,6]
We look in the middle of this list: 3
0 is less than 3 and our working list is now: [0,1,2]
So we look at 1. 0 is less than 1, so our list is now [0].
Given we have a working list of just 1 element, we are at the worst case. We either found the element, or it does not exist in the list. We were able to determine this in just four steps, looking at 7,3,1, and 0.
The problem size is 16 (number of elements in the list), which we represent as N.
In this search, we perform 4 comparisons (2^4 = 16, or log base 2 of 16 is 4).
If we took a look at a problem size of 32, the same search would need only 5 comparisons (2^5 = 32, or log base 2 of 32 is 5).
Therefore, the Big O for searching a balanced BST is O(logN) (note that we usually use base 2 for logarithms in CS).
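The walk-through above can be reproduced with a short binary search that counts how many elements it looks at. This is a sketch; `binary_search_steps` is a hypothetical name.

```python
def binary_search_steps(sorted_list, target):
    """Return (index or -1, number of elements inspected) for a binary search."""
    lo, hi = 0, len(sorted_list) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_list[mid] == target:
            return mid, steps
        if sorted_list[mid] < target:
            lo = mid + 1                  # discard the left half
        else:
            hi = mid - 1                  # discard the right half
    return -1, steps

print(binary_search_steps(list(range(16)), 0))   # (0, 4): looks at 7, 3, 1, 0
print(binary_search_steps(list(range(32)), 0))   # (0, 5)
```

Doubling the list from 16 to 32 elements adds just one more step.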
For O(NlogN), the worst case is the problem size times its logarithm. Merge sort and heapsort are O(NlogN) in the worst case, and quick sort is O(NlogN) on average. Insertion sort, by contrast, is O(N^2) in the worst case.
In computer science, the big O notation indicates how fast the number of operations of an algorithm increases with a given parameter n of the problem statement. In a balanced binary search tree, n can be the number of nodes in the tree. As you search through the tree, the algorithm needs to take a decision at each depth level of the tree. Since the number of nodes doubles at each level, the number of nodes in a completely filled tree is n=2^d-1, where d is the depth of the tree. It is thus relatively intuitive that the number of decisions that the algorithm takes is d-1 = log_{2}(n+1)-1. This shows that the complexity of the algorithm is of the order O(log(n)), which means that the number of operations grows like log(n). As a function, log grows slower than n, that is, as n becomes large log(n) is much smaller than n, so an algorithm of time complexity O(log(n)) will be faster than one with complexity O(n), which is itself faster than O(n log(n)).
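The relation n = 2^d - 1 can be inverted numerically. This sketch assumes a completely filled (perfect) tree, and the helper name is my own.

```python
import math

def levels_in_perfect_tree(n):
    """Number of levels d in a perfect binary tree with n = 2**d - 1 nodes."""
    return int(math.log2(n + 1))

for d in range(1, 6):
    n = 2 ** d - 1
    # Each row shows: node count, recovered depth, decisions needed (d - 1).
    print(n, levels_in_perfect_tree(n), levels_in_perfect_tree(n) - 1)
```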
A perfect binary tree of height h has 2^h leaves. When you search, you follow one branch at each level, so the number of steps is the height of the tree, which is logarithmic in the number of nodes. (The logarithm function is the inverse of the exponential function.)

How is O(n log n) different than O(log n)? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Researching big O notation, I understand the concept of O(log n) as a binary search and O(n log n) as a quick sort.
Can anyone put into layman's terms what the main difference in runtime is between these two, and why that is the case?
Intuitively, they seem to be closely related.
Basically: a factor of N.
A binary search only touches a small number of elements. If there's a billion elements, the binary search only touches ~30 of them.
A quicksort touches every single element, a small number of times. If there's a billion elements, the quick sort touches all of them, about 30 times: about 30 billion touches total.
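The figures for a billion elements come from a quick estimate like the one below. It is a rough sketch that ignores constant factors.

```python
import math

n = 1_000_000_000
binary_search_touches = math.ceil(math.log2(n))   # about log2(n) elements inspected
quicksort_touches = n * binary_search_touches     # every element, about log2(n) times

print(binary_search_touches)   # 30
print(quicksort_touches)       # 30000000000
```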
On a plot, log(n) looks flat (not literally but figuratively, in comparison to other functions), while n log(n) has already crossed 600 at n = 100 (100 * log2(100) is about 664). That's how different they are.
In simple terms, both bounds involve log n, but quick sort's O(n log n) is only its average case: on special inputs (for example, already sorted data with a naive pivot choice) it degrades to O(n²). The extra factor of n is there because every one of the n elements is handled at each of the roughly log n levels of recursion. So a naive quick sort is fine for small inputs, but if you need a guaranteed O(n log n) bound when sorting millions or billions of elements, merge sort is the safer choice.

Amortized and Average runtime complexity [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
This is not homework; I am studying amortized analysis, and something confuses me. I can't fully understand the difference between amortized and average complexity, and I'm not sure whether my understanding is right. Here is a question:
--
We know that the runtime complexity of a program depends on the program input combinations --- Suppose the probability of the program with runtime complexity O(n) is p, where p << 1, and in other cases (i.e for the (1-p)possible cases), the runtime complexity is O(logn). If we are running the program with K different input combinations, where K is a very large number, we can say that the amortized and average runtime complexity of this program is:
--
First question is: I have read the question here:Difference between average case and amortized analysis
So, I think there is no answer for the average runtime complexity, because we have no idea what the average input is. But it seems to be p*O(n)+(1-p)*O(logn). Which is correct, and why?
Second, the amortized part. I have read Constant Amortized Time and we already know that the Amortized analysis differs from average-case analysis in that probability is not involved; an amortized analysis guarantees the average performance of each operation in the worst case.
Can I just say that the amortized runtime is O(n)? But the given answer is O(pn). I'm a little confused about why the probability is involved. For a constant p, O(n)=O(pn), but I really have no idea why p would appear there. So I changed my way of thinking: suppose we run the program many times, so that K becomes very big; then the amortized runtime is (K*p*O(n)+K*(1-p)*O(logn))/K = O(pn). That seems to be the same idea as the average case.
Sorry for the confusion. Please help me, and thanks in advance!
With "average" or "expected" complexity, you are making assumptions about the probability distribution of the problem. If you are unlucky, (or if your problem generator maliciously fails to match your assumption 8^), all your operations will be very expensive, and your program might take a much greater time than you expect.
Amortized complexity is a guarantee on the total cost of any sequence of operations. That means, no matter how malicious your problem generator is, you don't have to worry about a sequence of operations taking a much greater time than you expect.
(Depending on the algorithm, it is not hard to accidentally stumble on the worst case. The classic example is the naive Quicksort, which does very badly on mostly-sorted input, even though the "average" case is fast)
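As an illustration of an amortized guarantee, consider an array that doubles its capacity when full. This example is my own assumption and does not come from the question. A single append can cost O(n) copies, yet the total over any n appends stays below 2n, with no probability involved.

```python
def total_copy_cost(num_appends):
    """Count element copies made by a capacity-doubling array over num_appends appends."""
    capacity, size, copies = 1, 0, 0
    for _ in range(num_appends):
        if size == capacity:
            copies += size                # grow: every existing element is copied once
            capacity *= 2
        size += 1
    return copies

print(total_copy_cost(1024))   # 1023, i.e. 1 + 2 + 4 + ... + 512
```

The occasional expensive append is paid for by the many cheap ones before it, so the cost per append is O(1) amortized in the worst case.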

Why is quicksort used in practice? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Quicksort has a worst-case performance of O(n^2), but is still used widely in practice anyway. Why is this?
You shouldn't focus only on the worst case and only on time complexity. It's more about the average case than the worst case, and it's about both time and space.
Quicksort:
has average time complexity of Θ(n log n);
can be implemented with space complexity of Θ(log n);
Also take into account that big O notation ignores constant factors, but in practice it does make a difference if an algorithm is a few times faster. Θ(n log n) means that the running time grows like K n log(n), where K is a constant. In practice, Quicksort tends to have one of the lowest K among comparison sorts.
The average asymptotic order of QuickSort is O(nlogn), and it's usually more efficient than heapsort due to smaller constants (tighter loops). In fact, there is a theoretical linear time median selection algorithm that you can use to always find the best pivot, thus resulting in a worst case of O(nlogn). However, the normal QuickSort is usually faster than this theoretical one.
To put this in perspective, consider the probability that QuickSort with randomly chosen pivots actually hits its O(n^2) worst case. The chance of picking the smallest or largest element as the pivot at every single step is about 2^(n-1)/n!, which shrinks so fast that you will almost never encounter that bad case.
Interestingly, quicksort performs more comparisons on average than mergesort: about 1.39 n lg n (expected, that is 2 n ln n) for quicksort versus about n lg n for mergesort. If all that mattered were comparisons, mergesort would be strongly preferable to quicksort.
The reason that quicksort is fast is that it has many other desirable properties that work extremely well on modern hardware. For example, quicksort requires no dynamic allocations. It can work in-place on the original array, using only O(log n) stack space (worst-case if implemented correctly) to store the stack frames necessary for recursion. Although mergesort can be made to do this, doing so usually comes at a huge performance penalty during the merge step. Other sorting algorithms like heapsort also have this property.
Additionally, quicksort has excellent locality of reference. The partitioning step, if done using Hoare's in-place partitioning algorithm, is essentially two linear scans performed inward from both ends of the array. This means that quicksort will have a very small number of cache misses, which on modern architectures is critical for performance. Heapsort, on the other hand, doesn't have very good locality (it jumps around all over an array), though most mergesort implementations have reasonable locality.
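A minimal sketch of Hoare-style partitioning, to make the "two inward scans" concrete. The middle-element pivot is an assumption of this sketch. Python lists also do not show the cache behaviour described above, so this only illustrates the access pattern.

```python
def hoare_partition(data, lo, hi):
    """Partition data[lo..hi] in place; return j such that data[lo..j] <= data[j+1..hi]."""
    pivot = data[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while data[i] < pivot:            # scan rightward from the left end
            i += 1
        j -= 1
        while data[j] > pivot:            # scan leftward from the right end
            j -= 1
        if i >= j:                        # the scans have met or crossed
            return j
        data[i], data[j] = data[j], data[i]

def quicksort(data, lo=0, hi=None):
    """In-place quicksort built on hoare_partition."""
    if hi is None:
        hi = len(data) - 1
    if lo < hi:
        p = hoare_partition(data, lo, hi)
        quicksort(data, lo, p)
        quicksort(data, p + 1, hi)

values = [9, 3, 7, 1, 8, 2, 5]
quicksort(values)
print(values)   # [1, 2, 3, 5, 7, 8, 9]
```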
Quicksort is also very parallelizable. Once the initial partitioning step has occurred to split the array into smaller and greater regions, those two parts can be sorted independently of one another. Many sorting algorithms can be parallelized, including mergesort, but the performance of parallel quicksort tends to be better than other parallel algorithms for the above reason. Heapsort, on the other hand, does not parallelize well.
The only issue with quicksort is the possibility that it degrades to O(n^2), which on large data sets can be very serious. One way to avoid this is to have the algorithm introspect on itself and switch to one of the slower but more dependable algorithms in the case where it degenerates. This algorithm, called introsort, is a great hybrid sorting algorithm that gets many of the benefits of quicksort without the pathological case.
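The idea can be sketched as follows. This is a simplified illustration, not a production introsort. The depth limit of 2 * log2(n), the middle-element pivot, and the `max_depth` parameter are assumptions of this sketch, and the heapsort fallback copies the slice through `heapq` for brevity, where a real implementation would heapsort in place.

```python
import heapq
import math

def introsort(data, max_depth=None):
    """Sort data in place: quicksort until max_depth partitions deep, then heapsort."""
    if max_depth is None:
        max_depth = 2 * int(math.log2(len(data))) if data else 0

    def heapsort_range(lo, hi):
        heap = data[lo:hi + 1]            # copy of the range (simplification)
        heapq.heapify(heap)
        for k in range(lo, hi + 1):
            data[k] = heapq.heappop(heap)

    def sort(lo, hi, depth_left):
        while lo < hi:
            if depth_left == 0:           # recursion got too deep: switch algorithms
                heapsort_range(lo, hi)
                return
            depth_left -= 1
            pivot = data[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:                 # partition around pivot
                while data[i] < pivot:
                    i += 1
                while data[j] > pivot:
                    j -= 1
                if i <= j:
                    data[i], data[j] = data[j], data[i]
                    i += 1
                    j -= 1
            sort(lo, j, depth_left)       # recurse on the left part
            lo = i                        # loop on the right part

    sort(0, len(data) - 1, max_depth)

values = [4, 1, 9, 7, 3, 8, 2]
introsort(values)
print(values)   # [1, 2, 3, 4, 7, 8, 9]
```

Passing a small `max_depth` forces the fallback early, which is handy for checking that both code paths sort correctly.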
In summary:
Quicksort is in-place except for the stack frames used in the recursion, which take O(log n) space.
Quicksort has good locality of reference.
Quicksort is easily parallelized.
This accounts for why quicksort tends to outperform sorting algorithms that on paper might be better.
Hope this helps!
Because on average it's the fastest comparison sort (in terms of elapsed time).
Because, in the general case, it's one of the fastest sorting algorithms.
In addition to being the fastest, though, some of its bad case scenarios can be avoided by shuffling the array before sorting it. As for its weakness with small data sets, that obviously isn't as big a problem, since the datasets are small and the sort time is probably small regardless.
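A sketch of the shuffle-first idea, assuming a deliberately naive quicksort that uses the first element as pivot. The function names and the `seed` parameter are illustrative only.

```python
import random

def naive_quicksort(data):
    """First-element-pivot quicksort; recursion depth is linear on sorted input."""
    if len(data) <= 1:
        return data
    pivot = data[0]
    smaller = [x for x in data[1:] if x < pivot]
    larger = [x for x in data[1:] if x >= pivot]
    return naive_quicksort(smaller) + [pivot] + naive_quicksort(larger)

def shuffled_quicksort(items, seed=None):
    """Shuffle a copy first so that sorted input no longer triggers the bad case."""
    data = list(items)
    random.Random(seed).shuffle(data)
    return naive_quicksort(data)

# Already-sorted input: without the shuffle, the naive version would recurse
# about 5000 levels deep, past Python's default recursion limit.
result = shuffled_quicksort(range(5000), seed=0)
print(result[:5], result[-1])   # [0, 1, 2, 3, 4] 4999
```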
As an example, I wrote a python function for QuickSort and bubble sorts. The bubble sort takes ~20 seconds to sort 10,000 records, 11 seconds for 7500, and 5 for 5000. The quicksort does all these sorts in around 0.15 seconds!
It might be worth pointing out that C does have the library function qsort(), but there's no requirement that it be implemented using an actual QuickSort; that is up to the library implementer.
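Because it is one of the algorithms that work well on large data sets, with O(NlogN) average complexity. It is also an in-place algorithm that needs only O(log N) extra space for the recursion stack. By selecting the pivot element wisely (for example, a random pivot or median-of-three), we can avoid the common worst cases of quick sort, such as an already sorted array, so that in practice it performs in O(NlogN).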
