Scenarios for selection sort, insertion sort, and quicksort

If anyone can give some input on my logic, I would very much appreciate it.
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst-case scenario for insertion sort is reverse order, which means it is quadratic, and selection sort would be quadratic as well.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think that insertion sort would have to perform many more operations than the number of values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The largest item cannot be exchanged more times than there are positions available, since each exchange should move it toward its final position. So, going from the first position to the last, it would be exchanged N times.
About how many compares will Quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared items at every level; the triangle is N tall and N wide, and its area equals the number of compares performed, which is N^2/2.

Here are my comments on your comments:
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Yes, that's correct. Insertion sort will do O(1) work per element and visit O(n) elements, for a total runtime of O(n). Selection sort always runs in time Θ(n^2) regardless of the input structure, so its runtime will be quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst-case scenario for insertion sort is reverse order, which means it is quadratic, and selection sort would be quadratic as well.
You're right that both algorithms have quadratic runtime. The algorithms should actually have relatively comparable performance, since they'll make the same total number of comparisons.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think that insertion sort would have to perform many more operations than the number of values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
This should take quadratic time (time Θ(n^2)). Take just the elements in the back third of the array. About a third of these, roughly n/9 elements in total, will be 1's, and to insert each of them into the sorted prefix it has to be shifted past all of the 2's and 3's already there, which is at least a third of the array. Therefore, the work done would be at least (n/9)(n/3) = n^2/27, which is quadratic.
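As a quick empirical check (my own sketch, not part of the original answer), here is a small Java program that counts the shifts insertion sort performs on random arrays drawn from {1, 2, 3}; if the quadratic claim holds, doubling n should roughly quadruple the count:

import java.util.Random;

public class ThreeValueInsertion {
    // Counts element shifts made by insertion sort on array a.
    static long countShifts(int[] a) {
        long shifts = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j]; // shift the larger element right
                j--;
                shifts++;
            }
            a[j + 1] = key;
        }
        return shifts;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 1000; n <= 8000; n *= 2) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = 1 + rnd.nextInt(3); // values in {1, 2, 3}
            System.out.println(n + " -> " + countShifts(a));
        }
    }
}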
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The largest item cannot be exchanged more times than there are positions available, since each exchange should move it toward its final position. So, going from the first position to the last, it would be exchanged N times.
There's an off-by-one error here. Going from the first position to the last crosses only N - 1 gaps, so the maximum number of exchanges is N - 1, not N.
About how many compares will Quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared items at every level; the triangle is N tall and N wide, and its area equals the number of compares performed, which is N^2/2.
This really depends on the implementation of Quick.sort(). Quicksort with ternary (three-way) partitioning does only O(n) total work, because all values equal to the pivot are excluded from the recursive calls. If equal keys get no special treatment at all, then your N^2/2 analysis would be correct. (A two-way partition that stops its scans on keys equal to the pivot lands in between, at about N lg N compares.)
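For reference, here is a minimal Java sketch of the three-way (Dijkstra-style) partitioning mentioned above; with all keys equal, the first call sweeps everything into the middle "equal" region and both recursive calls get empty ranges. This is an illustration, not the actual Quick.sort() source:

// a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot
static void quick3way(int[] a, int lo, int hi) {
    if (hi <= lo) return;
    int lt = lo, gt = hi;
    int pivot = a[lo];
    int i = lo + 1;
    while (i <= gt) {
        if (a[i] < pivot) swap(a, lt++, i++);
        else if (a[i] > pivot) swap(a, i, gt--);
        else i++; // equal to the pivot: leave it in the middle region
    }
    quick3way(a, lo, lt - 1); // empty range when all keys are equal
    quick3way(a, gt + 1, hi); // empty range when all keys are equal
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}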
Hope this helps!

Related

Why is insertion sort the best algorithm for a sorted or nearly sorted array?

So I guess it's because it just compares A[k] and A[k-1] and does the whole thing in one sweep, but it's still not clear. Can someone explain better?
Thanks
This link shows a graphical representation of sorting algorithms on different types of data sets.
As you can see, when the data is already sorted the algorithm's complexity is reduced to N, which is equivalent to the number of elements given as input.
The link gives a clear picture of how it's more efficient.
You answered your own question: for a nearly sorted array, insertion sort will only need a handful of O(n) passes to complete. Contrast that with a divide-and-conquer sorting algorithm like merge sort, which takes O(n·lg n). For any non-trivial value of n, a divide-and-conquer algorithm will need many O(n) passes even if the array is almost completely sorted, whereas insertion sort might only require a few.
Insertion sort is a faster, more refined sorting algorithm than selection sort. In selection sort, the algorithm iterates through all of the remaining data on every pass, whether it is already sorted or not. Insertion sort works differently: instead of scanning all of the data after every pass, it only traverses as far as it needs to until the segment being sorted is in order. Insertion sort uses two loops, and therefore two main index variables, here named 'i' and 'j'. After every pass of the outer loop, 'i' and 'j' begin at the same index; the inner loop only executes while 'j' is greater than index 0 AND arr[j] < arr[j - 1]. In other words, while 'j' hasn't reached the front of the data AND the value at 'j' is smaller than the value to the left of 'j', the two are swapped and 'j' is decremented. As long as those two conditions hold, the inner loop keeps executing, and this is what sets insertion sort apart from selection sort: only the data that needs to be moved is moved.
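A minimal Java version of the loop structure described above, with the two index variables named 'i' and 'j' as in the explanation (an illustrative sketch, not code from the original answer):

static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int j = i;
        // Walk arr[j] left while it is smaller than its left neighbor
        // and hasn't reached the front of the array.
        while (j > 0 && arr[j] < arr[j - 1]) {
            int tmp = arr[j];
            arr[j] = arr[j - 1];
            arr[j - 1] = tmp;
            j--;
        }
    }
}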
The general goal of a sorting algorithm is to minimize the number of comparisons. Sorting algorithms have a lower bound and an upper bound on the number of comparisons (n log n worst case for merge and heap sorts, n log n average case for quicksort). In the most general case, you'd go with an algorithm that happens to have the best average or best worst-case number of comparisons. However, when you know something about the data (e.g., the array is already sorted, or almost sorted), you can exploit the fact that insertion sort's lower bound is far lower than that of the "n log n" sorts.
For example, if you have the array [1,2,3,4,5,6,7,9] and you need to insert 8 into it, you can insert it at the end and sort the array using a vanilla n log n sort, which will do roughly 28 comparisons to sort the data to [1,2,3,4,5,6,7,8,9]. Insertion sort, on the other hand, lets you insert the 8 at the right position in only about 8 comparisons.

In quicksort If an array is randomized, does using the median of 3 for pivot selection matter?

I've been comparing the run times of various pivot selection algorithms. Surprisingly, the simplest one, where the first element is always chosen, is the fastest. This may be because I'm filling the array with random data.
If the array has been randomized (shuffled), does it matter? For example, picking the median of 3 as the pivot is always(?) better than picking the first element as the pivot. But this isn't what I've noticed. Is it because if the array is already randomized there would be no reason to assume sortedness, and using the median assumes there is some degree of sortedness?
The worst-case runtime of quicksort is O(n^2); quicksort is only fast in the average case.
To reach an average runtime of O(n log n), you have to choose a random pivot element.
But instead of choosing a random pivot element, you can shuffle the list and choose the first element.
To see why this holds, look at it this way: say all elements start in some specific order. Shuffling applies a random permutation to the list, so a random element ends up at the first position (and at every other position). You can also see it by building the shuffle directly: randomly choose one of all the elements to be the first element, then randomly choose one of the remaining (not yet chosen) elements to be the second, and so on.
If your list is already a randomly generated list, you can directly choose the first element as the pivot without shuffling again.
So choosing the first element is fastest here because of the randomly generated input, and choosing the third or the last element would be just as fast as choosing the first.
All other ways of choosing a pivot element have to compute something (a median, a random number, or something like that), and they have no advantage over a random choice.
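A tiny Java sketch of the shuffle-then-take-first idea (my own illustration): a Fisher-Yates shuffle puts a uniformly random element at every position, so a quicksort that then always takes a[lo] as the pivot behaves exactly like one with a random pivot, without calling the random number generator inside the sort.

static void shuffle(int[] a) {
    java.util.Random rnd = new java.util.Random();
    for (int i = a.length - 1; i > 0; i--) {
        int j = rnd.nextInt(i + 1); // uniform index in [0, i]
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

// usage: shuffle(a); then run quicksort taking the first element as the pivot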
A substantially late response, but I believe it will add some additional info.
Surprisingly the simplest one where the first element is always chosen is the fastest.
This is actually not surprising at all, since you mentioned that you test the algorithm with random data. In reality, the percentage of almost-sorted and sorted data is much greater than would statistically be expected. Take for example chronological data: when you collect it into a log file, some elements can be out of order, but most of them are already sorted. Unfortunately, a Quicksort implementation that takes the first (or last) element as the pivot is vulnerable to such input, and it degenerates into O(n^2) complexity because in the partition step you divide the array into two parts of size 1 and n-1, and therefore get n partitions instead of log n on average.
That's why people decided to add some sort of randomization to make the probability of getting the problematic input as small as possible. There are three well-known approaches:
shuffle the input - to paraphrase Robert Sedgewick, the probability of getting O(n^2) performance with such an approach is lower than the probability that you will be struck by lightning :)
choose the pivot element randomly - Wikipedia says that the expected number of comparisons in this case is 1.386 n log n on average
choose the pivot element as a median of three - Wikipedia says that the expected number of comparisons in this case is 1.188 n log n on average
However, randomization has a cost. If you shuffle the input array, that is O(n), which is dominated by O(n log n), but you need to take into account the cost of invoking the random(..) method n times. With your simple approach that is avoided, so it is faster.
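For completeness, here is what the median-of-three choice might look like in Java (an illustrative sketch; real implementations usually also swap the chosen pivot into position before partitioning):

// Returns the index of the median of a[lo], a[mid], a[hi].
static int medianOfThree(int[] a, int lo, int mid, int hi) {
    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi]) return mid;     // a[lo] < a[mid] < a[hi]
        return (a[lo] < a[hi]) ? hi : lo;   // a[mid] is the largest
    } else {
        if (a[lo] < a[hi]) return lo;       // a[mid] <= a[lo] < a[hi]
        return (a[mid] < a[hi]) ? hi : mid; // a[lo] is the largest
    }
}

// usage: int p = medianOfThree(a, lo, lo + (hi - lo) / 2, hi);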
See also:
Worst case for Quicksort - when can it occur?

Insertion sort vs Bubble Sort Algorithms

I'm trying to understand a few sorting algorithms, but I'm struggling to see the difference in the bubble sort and insertion sort algorithm.
I know both are O(n^2), but it seems to me that bubble sort just bubbles the maximum value of the array to the top on each pass, while insertion sort just sinks the lowest value to the bottom on each pass. Aren't they doing the exact same thing, just in different directions?
For insertion sort, the number of comparisons/potential swaps starts at zero and increases each time (i.e. 0, 1, 2, 3, 4, ..., n); for bubble sort the same behaviour happens, but at the end of the sorting (i.e. n, n-1, n-2, ..., 0), because bubble sort no longer needs to compare with the last elements once they are sorted.
For all this, though, the consensus seems to be that insertion sort is better in general. Can anyone tell me why?
Edit: I'm primarily interested in the differences in how the algorithms work, not so much their efficiency or asymptotic complexity.
Insertion Sort
After i iterations the first i elements are ordered.
In each iteration the next element is bubbled through the sorted section until it reaches the right spot:
sorted | unsorted
1 3 5 8 | 4 6 7 9 2
1 3 4 5 8 | 6 7 9 2
The 4 is bubbled into the sorted section
Pseudocode:
for i in 1 to n
    for j in i downto 2
        if array[j - 1] > array[j]
            swap(array[j - 1], array[j])
        else
            break
Bubble Sort
After i iterations the last i elements are the biggest, and ordered.
In each iteration, sift through the unsorted section to find the maximum.
unsorted | biggest
3 1 5 4 2 | 6 7 8 9
1 3 4 2 | 5 6 7 8 9
The 5 is bubbled out of the unsorted section
Pseudocode:
for i in 1 to n
    for j in 1 to n - i
        if array[j] > array[j + 1]
            swap(array[j], array[j + 1])
Note that typical implementations terminate early if no swaps are made during one of the iterations of the outer loop (since that means the array is sorted).
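In Java, that early-exit optimization might look like this (a sketch of my own, not part of the answer above):

static void bubbleSort(int[] a) {
    int n = a.length;
    for (int i = 0; i < n - 1; i++) {
        boolean swapped = false;
        // After pass i, the last i elements are already in place.
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                swapped = true;
            }
        }
        if (!swapped) break; // no swaps in a full pass: the array is sorted
    }
}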
Difference
In insertion sort elements are bubbled into the sorted section, while in bubble sort the maximums are bubbled out of the unsorted section.
In bubble sort, the ith iteration has n-i-1 inner iterations, or (n^2)/2 in total; in insertion sort, the ith step has at most i inner iterations, but only i/2 on average, since you can stop the inner loop early once you've found the correct position for the current element. So you have (the sum from 0 to n) / 2, which is (n^2)/4 in total.
That's why insertion sort is faster than bubble sort.
Another difference that I haven't seen mentioned here:
Bubble sort does 3 value assignments per swap:
you first have to store the value you want to move forward in a temporary variable (no. 1), then you have to write the other swap partner into the spot whose value you just saved (no. 2), and then you have to write your temporary variable into the other spot (no. 3).
You have to do that for each spot the value has to move through in order to reach its correct place.
With insertion sort, you put the value to be sorted into a temporary variable and then move every element in front of that spot one spot backwards until you reach the correct position for your value, as in the sketch below. That makes one value assignment per spot; at the end, you write your temporary variable into the final spot.
That makes far fewer value assignments, too.
This isn't the biggest speed benefit, but I think it's worth mentioning.
I hope I've expressed myself understandably; if not, sorry, I'm not a native English speaker.
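Here is a sketch of that shift-based insertion (one assignment per spot plus a final write, instead of three assignments per swap); this is my own illustration of the idea described above:

static void insertionSortShifting(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int tmp = a[i]; // the value being inserted
        int j = i;
        while (j > 0 && a[j - 1] > tmp) {
            a[j] = a[j - 1]; // one assignment per spot moved
            j--;
        }
        a[j] = tmp; // final write into the correct spot
    }
}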
The main advantage of insertion sort is that it's an online algorithm: you don't have to have all the values at the start. That can be useful when dealing with data coming from a network or from a sensor.
I have a feeling this would be faster than the conventional n log(n) algorithms in that setting, because with those the complexity would be n * (n log n): reading/storing each of the n values from the stream, and then re-sorting all the values after each arrival (O(n log n) each time), resulting in O(n^2 log n).
Using insertion sort, on the other hand, needs O(n) for reading the values from the stream and O(n) to put each value into its correct place, so it's only O(n^2). Another advantage is that you don't need a buffer for storing values; you sort them in their final destination.
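A minimal sketch of that online usage in Java (the class and method names are my own, purely illustrative): each arriving value is placed immediately, so the data is sorted at every moment without a separate buffer-and-sort step.

import java.util.ArrayList;
import java.util.List;

class OnlineInsertionSorter {
    private final List<Integer> sorted = new ArrayList<>();

    // O(n) per arrival: find the insertion point, then let
    // ArrayList.add(index, value) do the shifting internally.
    void accept(int value) {
        int i = sorted.size();
        while (i > 0 && sorted.get(i - 1) > value) i--;
        sorted.add(i, value);
    }

    List<Integer> snapshot() { return sorted; }
}

// usage: sorter.accept(nextValueFromNetworkOrSensor);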
Bubble sort is not online (it cannot sort a stream of inputs without knowing how many items there will be), because it does not really keep track of a global maximum among the sorted elements. When a new item arrives, you would need to start the bubbling from the very beginning.
Well, bubble sort is better than insertion sort only when someone is looking for the top k elements from a large list of numbers: in bubble sort, after k iterations you'll have the top k elements. After k iterations of insertion sort, however, you only know that those first k elements are sorted.
Though both sorts are O(N^2), the hidden constants are much smaller in insertion sort. Hidden constants refer to the actual number of primitive operations carried out.
When does insertion sort have a better running time?
The array is nearly sorted: notice that insertion sort does fewer operations in this case than bubble sort.
The array is relatively small: in insertion sort you move elements around to place the current element, which is only better than bubble sort when the number of elements is small.
Notice that insertion sort is not always better than bubble sort. To get the best of both worlds, you can use insertion sort for small arrays and merge sort (or quicksort) for larger ones, as in the sketch below.
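A sketch of that hybrid in Java (the cutoff value and helper names are my own, purely illustrative): merge sort supplies the O(n log n) guarantee, while subarrays below the cutoff are handed to insertion sort for its low overhead.

static final int CUTOFF = 16; // tuning value, chosen arbitrarily here

static void hybridSort(int[] a, int[] aux, int lo, int hi) {
    if (hi - lo < CUTOFF) { insertionSort(a, lo, hi); return; }
    int mid = lo + (hi - lo) / 2;
    hybridSort(a, aux, lo, mid);
    hybridSort(a, aux, mid + 1, hi);
    merge(a, aux, lo, mid, hi);
}

static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++)
        for (int j = i; j > lo && a[j] < a[j - 1]; j--) {
            int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
        }
}

static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
    System.arraycopy(a, lo, aux, lo, hi - lo + 1);
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; k++) {
        if (i > mid) a[k] = aux[j++];
        else if (j > hi) a[k] = aux[i++];
        else if (aux[j] < aux[i]) a[k] = aux[j++];
        else a[k] = aux[i++];
    }
}

// usage: hybridSort(a, new int[a.length], 0, a.length - 1);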
Number of swaps in each iteration
Insertion sort performs at most 1 insertion (one element placed) in each iteration.
Bubble sort performs 0 to n swaps in each iteration.
Accessing and changing the sorted part
Insertion sort accesses (and changes when needed) the sorted part to find the correct position of the number under consideration.
When optimized, bubble sort does not access what is already sorted.
Online or not
Insertion sort is online: it takes one input at a time and puts it into its appropriate position. It does not have to compare only adjacent inputs.
Bubble sort is not online: it does not operate on one input at a time but handles a group of inputs (if not all of them) in each iteration, and it only compares and swaps adjacent inputs in each iteration.
Insertion sort:
1. In insertion sort, swapping is not required (elements can be shifted instead).
2. The time complexity of insertion sort is Ω(n) in the best case and O(n^2) in the worst case.
3. It is less complex compared to bubble sort.
4. Example: inserting books into a library, arranging cards.
Bubble sort:
1. Swapping is required in bubble sort.
2. The time complexity of bubble sort is Ω(n) in the best case and O(n^2) in the worst case.
3. It is more complex compared to insertion sort.
I will try to give a more concise and informative answer than others.
Yes, after each pass, insertion sort and bubble sort intuitively seem the same - they both build a sorted sublist at the edge.
However, insertion sort will perform fewer comparisons in general. With insertion sort, we are only performing a linear search in the sorted sublist with each pass. With random data, you can expect to make m/2 comparisons and swaps, where m is the size of the sorted sublist.
With bubble sort, we always compare EVERY adjacent pair in the unsorted sublist on each pass, so that's n-m comparisons (twice as many as insertion sort on random data). This means bubble sort is bad if comparisons are expensive/slow.
Also, the branching associated with swaps and compares for insertion sort is more predictable. We do a linear search at the same time as a linear insert, and we can generally predict/assume that the linear search/insert will continue until the correct space is found. With bubble sort, branching is essentially random, and we can expect a branch miss half the time! With every single compare! This means bubble sort is bad for pipelined processors if comparisons and swaps are relatively cheap/fast.
These factors make bubble sort much slower in general than insertion sort.
Insertion sort: We insert the elements into their proper positions in the array, one at a time. When we reach the nth element, the first n-1 elements are already sorted.
Bubble sort: We start with a bubble of one element and keep extending it by 1 until all elements are included. In each iteration, we simply swap adjacent elements into the proper order so that the largest element ends up at the end of the bubble. In this way, we keep putting the largest element at the end of the array, and after all the iterations the sort is done.
Bubble sort and insertion sort complexity: O(n^2)
Insertion sort is faster than bubble sort, for the following reason:
Insertion sort only compares an element against the sorted part of the array: the ith element is compared against the array of elements 1...i-1, which is already sorted. Therefore, there are fewer comparisons and swaps.
In bubble sort, however, as the bubble grows, the same routine of comparing each pair of neighbors runs again and again. This leads to many more comparisons and swaps compared to insertion sort.
Therefore, even though the time complexity of both algorithms is O(n^2), insertion sort results in a faster approach than bubble sort.
Insertion sort can be summed up as: "Look for the element which should be at the first position (the minimum), make some space by shifting the next elements, and put it at the first position. Good. Now look for the element which should be at the 2nd..." and so on.
Bubble sort operates differently, and can be summed up as: "As long as I find two adjacent elements which are in the wrong order, I swap them."
Bubble sort is almost useless under all circumstances. In use cases where insertion sort may have too many swaps, selection sort can be used instead, because it guarantees at most N - 1 swaps. Since selection sort is better than bubble sort there too, bubble sort has no use cases.

Big(O) running time for selection sort

You are given a list of 100 integers that have been read from a file. If all values are zero, what would be the running time (in terms of O-notation) of a selection sort algorithm?
I thought it was O(n) because selection sort starts with the leftmost number as the sorted side. Then it goes through the rest of the array to find the smallest number and swaps it with the first number on the sorted side. But since they are all zeros, it won't swap any numbers (or so I think).
My teacher said that it is O(n^2). Can anyone explain why?
Selection sort is not adaptive. Each element will always be compared with every other element (compare n elements with n other elements → n^2 comparisons). Thus, selection sort always makes O(n^2) comparisons. It does, however, only O(n) swaps.
Think of a table with n rows and n columns, where each cell needs a comparison to fill in its value (except the diagonal).
More info on this amazing website
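To make the count concrete, here is a small sketch of my own (in Java) that counts selection sort's comparisons; the result is n(n-1)/2 regardless of the input values, all-zeros included:

static long selectionSortCountingCompares(int[] a) {
    long compares = 0;
    for (int i = 0; i < a.length - 1; i++) {
        int min = i;
        for (int j = i + 1; j < a.length; j++) {
            compares++;                 // every remaining pair is compared
            if (a[j] < a[min]) min = j; // never true when all values are zero
        }
        int t = a[i]; a[i] = a[min]; a[min] = t;
    }
    return compares; // always n * (n - 1) / 2
}

For the 100 zero-valued integers in the question, this returns 4950 = 100 * 99 / 2 comparisons, which is why the answer is O(n^2).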

Intuitive explanation for why QuickSort is n log n?

Is anybody able to give a 'plain English' intuitive, yet formal, explanation of what makes quicksort n log n? From my understanding it has to make a pass over n items, and it does this log n times... I'm not sure how to put into words why it does this log n times.
Complexity
Quicksort starts by partitioning the input into two chunks: it chooses a "pivot" value and partitions the input into those items less than the pivot value and those larger than the pivot value (any items equal to the pivot have to go into one chunk or the other, but for a basic description it doesn't matter much which one they end up in).
Since the input (by definition) isn't sorted, to partition it like that, it has to look at every item in the input, so that's an O(N) operation. After it's partitioned the input the first time, it recursively sorts each of those "chunks". Each of those recursive calls looks at every one of its inputs, so between the two calls it ends up visiting every input value (again). So, at the first "level" of partitioning, we have one call that looks at every input item. At the second level, we have two partitioning steps, but between the two, they (again) look at every input item. Each successive level has more individual partitioning steps, but in total the calls at each level look at all the input items.
It continues partitioning the input into smaller and smaller pieces until it reaches some lower limit on the size of a partition. The smallest that could possibly be would be a single item in each partition.
Ideal Case
In the ideal case we hope each partitioning step breaks the input in half. The "halves" probably won't be precisely equal, but if we choose the pivot well, they should be pretty close. To keep the math simple, let's assume perfect partitioning, so we get exact halves every time.
In this case, the number of times we can break it in half will be the base-2 logarithm of the number of inputs. For example, given 128 inputs, we get partition sizes of 64, 32, 16, 8, 4, 2, and 1. That's 7 levels of partitioning (and yes log2(128) = 7).
So, we have log(N) partitioning "levels", and each level has to visit all N inputs. So, log(N) levels times N operations per level gives us O(N log N) overall complexity.
Worst Case
Now let's revisit the assumption that each partitioning level will "break" the input precisely in half. Depending on how good a choice of partitioning element we make, we might not get precisely equal halves. So what's the worst that could happen? The worst case is a pivot that's actually the smallest or largest element in the input. In this case, we do an O(N) partitioning level, but instead of getting two halves of equal size, we end up with one partition of one element and one partition of N-1 elements. If that happens at every level of partitioning, we obviously end up doing O(N) partitioning levels before every partition is down to one element.
This gives the technically correct big-O complexity for Quicksort (big-O officially refers to the upper bound on complexity). Since we have O(N) levels of partitioning, and each level requires O(N) steps, we end up with O(N * N) (i.e., O(N^2)) complexity.
Practical implementations
As a practical matter, a real implementation will typically stop partitioning before it actually reaches partitions of a single element. In a typical case, when a partition contains, say, 10 elements or fewer, you'll stop partitioning and use something like an insertion sort (since it's typically faster for a small number of elements).
Modified Algorithms
More recently, other modifications to Quicksort have been invented (e.g., Introsort, PDQ Sort) which prevent that O(N^2) worst case. Introsort does so by keeping track of the current partitioning "level"; when/if it goes too deep, it'll switch to a heap sort, which is slower than Quicksort for typical inputs but guarantees O(N log N) complexity for any input.
PDQ Sort adds another twist: since heap sort is slower, it tries to avoid switching to it if possible. To do that, if it looks like it's getting poor pivot values, it'll randomly shuffle some of the inputs before choosing a pivot. Then, if (and only if) that fails to produce sufficiently better pivot values, it'll switch to using a heap sort instead.
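Here is an introsort-style sketch of that depth-limit idea in Java (the method names and the PriorityQueue fallback are my own; real introsorts use an in-place heapsort instead):

import java.util.PriorityQueue;

static void introSort(int[] a) {
    // Depth limit of about 2 * log2(n), as in typical introsort descriptions.
    int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(1, a.length)));
    sort(a, 0, a.length - 1, depthLimit);
}

static void sort(int[] a, int lo, int hi, int depth) {
    if (hi <= lo) return;
    if (depth == 0) { heapFallback(a, lo, hi); return; } // recursion too deep: bail out
    int p = partition(a, lo, hi);
    sort(a, lo, p - 1, depth - 1);
    sort(a, p + 1, hi, depth - 1);
}

static int partition(int[] a, int lo, int hi) { // simple Lomuto partition
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

// Guaranteed O(n log n) fallback via a binary heap (not in place).
static void heapFallback(int[] a, int lo, int hi) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int k = lo; k <= hi; k++) pq.add(a[k]);
    for (int k = lo; k <= hi; k++) a[k] = pq.poll();
}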
Each partitioning operation takes O(n) operations (one pass over the array).
On average, each partitioning divides the array into two parts, which adds up to log n partitioning levels. In total we have O(n * log n) operations.
I.e., on average, log n partitioning operations, each taking O(n) operations.
There's a key intuition behind logarithms:
The number of times you can divide a number n by a constant before reaching 1 is O(log n).
In other words, if you see a runtime that has an O(log n) term, there's a good chance that you'll find something that repeatedly shrinks by a constant factor.
In quicksort, what's shrinking by a constant factor is the size of the largest recursive call at each level. Quicksort works by picking a pivot, splitting the array into two subarrays of elements smaller than the pivot and elements bigger than the pivot, then recursively sorting each subarray.
If you pick the pivot randomly, then there's a 50% chance that the chosen pivot will be in the middle 50% of the elements, which means that there's a 50% chance that the larger of the two subarrays will be at most 75% the size of the original. (Do you see why?)
Therefore, a good intuition for why quicksort runs in time O(n log n) is the following: each layer in the recursion tree does O(n) work, and since each recursive call has a good chance of reducing the size of the array by at least 25%, we'd expect there to be O(log n) layers before you run out of elements to throw away out of the array.
This assumes, of course, that you're choosing pivots randomly. Many implementations of quicksort use heuristics to try to get a nice pivot without too much work, and those implementations can, unfortunately, lead to poor overall runtimes in the worst case. Jerry Coffin's excellent answer to this question talks about some variations on quicksort that guarantee O(n log n) worst-case behavior by switching which sorting algorithm is used, and that's a great place to look for more information about this.
Well, it's not always n log n. n log n is the running time when the chosen pivot lands approximately in the middle. In the worst case, if you choose the smallest or the largest element as the pivot, the time will be O(n^2).
To visualize 'n log n', you can assume the pivot to be the element closest to the average of all the elements in the array to be sorted.
This would partition the array into 2 parts of roughly same length.
On both of these you apply the quicksort procedure.
As each step halves the length of the array, you will do this log n (base 2) times until you reach length = 1, i.e. a sorted array of 1 element.
Break the sorting algorithm into two parts: the partitioning and the recursive calls. The complexity of partitioning is O(N), and the complexity of the recursion in the ideal case is O(log N) levels; for example, with 4 inputs there will be 2 (= log 4) levels of recursive calls. Multiplying the two gives O(N log N). It is a very basic explanation.
In fact you need to find the final position of all N elements (each in turn serves as a pivot), but the maximum number of comparisons is log N per element (the first pivot is compared against N elements, the second-level pivots against N/2 each, the third against N/4, ... assuming the pivot is the median element).
In the ideal scenario, the first-level call places 1 element in its proper position. There are 2 calls at the second level, taking O(n) time combined, which put 2 elements in their proper positions. Similarly, there are 4 calls at the third level, which take O(n) combined time and place 4 more elements into their proper positions. So the depth of the recursion tree is log(n), and at each depth O(n) time is needed across all recursive calls. The time complexity is therefore O(n log n).
