insertion, selection, bubble sorting analysis with inversion Robert Sedgewick - algorithm

I am reading the sorting chapter of Robert Sedgewick's Algorithms in C++.
Property 1: Insertion sort and bubble sort use a linear number of
comparisons and exchanges for files with at most a constant number of
inversions corresponding to each element.
In another type of partially sorted file, we perhaps have appended a few elements to a sorted file or have edited a few elements in a sorted file to change their keys. Insertion sort is an efficient method for such files; bubble and selection sort are not.
Property 2: Insertion sort uses a linear number of comparisons and
exchanges for files with at most a constant number of elements having
more than a constant number of corresponding inversions.
My questions on the above properties are:
What is the difference between Property 1 and Property 2? Can anyone explain it?
On what basis does the author say, for Property 2, that insertion sort is efficient while bubble and selection sort are not?
It would be good if this were explained with an example.
Thanks for your time and help.

So, when the sort order is <, an inversion is a pair of positions i < j with a[i] > a[j].
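As a quick illustration of that definition, here is a minimal brute-force sketch in Python. The function name count_inversions is just something I made up for this example; it is quadratic and only meant for small inputs.

```python
def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] by brute force."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


# Each neighbouring pair (2,1), (4,3), (6,5) contributes exactly one inversion.
print(count_inversions([2, 1, 4, 3, 6, 5]))  # 3
```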
Property 1. Consider the sequence 2 1 4 3 6 5 8 7 10 9.... Every element is out of order with respect to its neighbor to the left or to the right, but is in order with respect to all other elements. So each element has a constant number of inversions, one, in this case. This property says that all the elements can be a little out of order.
Both bubble sort and insertion sort will run in linear time. Bubble sort will take just one pass to correct the order since it swaps neighboring elements and another pass to confirm. Insertion sort will only have to do one compare and swap per element.
Property 2. This property is stronger. In addition to allowing all the elements to be a little out of order, it allows a few elements to be very out of order. Consider the same sequence as before, but with the smallest and largest elements moved to opposite ends: n 2 4 3 6 5 8 7 10 9...1. Now 1 and n are out of order with respect to all other elements.
Insertion sort will still perform in linear time. As before, most of the elements require only a few compares and swaps, but there are a few that can take order n compares and swaps. In this example, the first n-1 elements take a couple of compares and swaps each (ok, so the 2 only takes one) to get into place, and the last takes n-1 compares and swaps: 2*(n-1) + 1*(n-1) is order n.
Bubble sort has a much harder time in this example. Each pass can only move the 1 a single step toward the front. Thus it will take at least (n-1) passes, each doing up to (n-1) comparisons, before completion. This is multiplicative: (n-1)*(n-1) is order n^2. (You could also run bubble sort in the opposite direction, in which case the largest element at the beginning would slowly move to the other end instead.)
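If you want to check this yourself, here is a rough Python sketch that counts comparisons on both example sequences. All the helper names are made up for illustration, and the bubble sort is the usual variant that shrinks the unsorted range after each pass and stops after a pass with no swaps; other variants will give somewhat different counts.

```python
def insertion_sort_compares(a):
    """Run swap-based insertion sort on a copy and return the number of compares."""
    a = list(a)
    compares = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            compares += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return compares


def bubble_sort_compares(a):
    """Run bubble sort (with early exit) on a copy and return the number of compares."""
    a = list(a)
    n = len(a)
    compares = 0
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        for j in range(n - 1 - passes):
            compares += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        passes += 1
    return compares


def pairwise_swapped(n):
    """Property 1 example: 2 1 4 3 6 5 ... (n is assumed to be even)."""
    out = []
    for k in range(1, n + 1, 2):
        out += [k + 1, k]
    return out


def extremes_moved(n):
    """Property 2 example: n 2 4 3 6 5 ... 1."""
    middle = [x for x in pairwise_swapped(n) if x not in (1, n)]
    return [n] + middle + [1]


n = 1000
for name, data in [("property 1", pairwise_swapped(n)), ("property 2", extremes_moved(n))]:
    print(name, insertion_sort_compares(data), bubble_sort_compares(data))
```

On the first sequence both counts stay within a small multiple of n; on the second, insertion sort still does, while bubble sort needs on the order of n^2/2 compares.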

Related

Scenarios for selection sort, insertion sort, and quick sort

If anyone can give some input on my logic, I would very much appreciate it.
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think the insertion sort would have to perform many more operations than there are values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The maximum number cannot be passed over more times than there are spaces available, since it should always be approaching its right position. So, going from being the first to the last value spot, it would be exchanged N times.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle that is N tall and N wide can be drawn around the compared objects at every phase. Its area would equal the number of compares performed, which would be (N^2)/2.
Here are my comments on your comments:
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Yes, that's correct. Insertion sort will do O(1) work per element and visit O(n) elements for a total runtime of O(n). Selection sort always runs in time Θ(n^2) regardless of the input structure, so its runtime will be quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
You're right that both algorithms have quadratic runtime. The algorithms should actually have relatively comparable performance, since they'll make the same total number of comparisons.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think the insertion sort would have to perform many more operations than there are values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
This should take quadratic time, Θ(n^2). Take just the elements in the back third of the array. About a third of these elements will be 1's, so there are about n/9 of them. To insert each one into the sorted sequence, it has to move past every 2 and 3 already placed, and those make up about two thirds of the first 2n/3 elements, i.e. about 4n/9 elements. Therefore, the work done would be at least about (n/9)(4n/9) = 4n^2/81, which is quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The maximum number cannot be passed over more times than there are spaces available, since it should always be approaching its right position. So, going from being the first to the last value spot, it would be exchanged N times.
There's an off-by-one error here. When the array has size 1, the largest element can't be moved any more, so the maximum number of moves would be N - 1.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle that is N tall and N wide can be drawn around the compared objects at every phase. Its area would equal the number of compares performed, which would be (N^2)/2.
This really depends on the implementation of Quick.sort(). Quicksort with ternary partitioning would only do O(n) total work because all values equal to the pivot are excluded in the recursive calls. If this isn't done, then your analysis would be correct.
Hope this helps!

Insertion sort vs Bubble Sort Algorithms

I'm trying to understand a few sorting algorithms, but I'm struggling to see the difference in the bubble sort and insertion sort algorithm.
I know both are O(n^2), but it seems to me that bubble sort just bubbles the maximum value of the array to the top on each pass, while insertion sort just sinks the lowest value to the bottom on each pass. Aren't they doing the exact same thing but in different directions?
For insertion sort, the number of comparisons/potential swaps starts at zero and increases each time (i.e. 0, 1, 2, 3, 4, ..., n). For bubble sort the same behaviour happens in reverse (i.e. n, n-1, n-2, ..., 0), because bubble sort no longer needs to compare with the last elements once they are sorted.
For all this though, it seems a consensus that insertion sort is better in general. Can anyone tell me why?
Edit: I'm primarily interested in the differences in how the algorithms work, not so much their efficiency or asymptotic complexity.
Insertion Sort
After i iterations the first i elements are ordered.
In each iteration the next element is bubbled through the sorted section until it reaches the right spot:
sorted | unsorted
1 3 5 8 | 4 6 7 9 2
1 3 4 5 8 | 6 7 9 2
The 4 is bubbled into the sorted section
Pseudocode:
for i in 1 to n
    for j in i downto 2
        if array[j - 1] > array[j]
            swap(array[j - 1], array[j])
        else
            break
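For anyone who wants to run it, the pseudocode above translates roughly into the following Python. This is my own 0-indexed sketch, not a canonical implementation; it sorts the list in place and also returns it for convenience.

```python
def insertion_sort(a):
    """Sort the list a in place by swapping each new element back into the sorted prefix."""
    for i in range(1, len(a)):
        j = i
        # Walk the new element left until its left neighbour is not larger.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a


print(insertion_sort([1, 3, 5, 8, 4, 6, 7, 9, 2]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```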
Bubble Sort
After i iterations the last i elements are the biggest, and ordered.
In each iteration, sift through the unsorted section to find the maximum.
unsorted | biggest
3 1 5 4 2 | 6 7 8 9
1 3 4 2 | 5 6 7 8 9
The 5 is bubbled out of the unsorted section
Pseudocode:
for i in 1 to n
    for j in 1 to n - i
        if array[j] > array[j + 1]
            swap(array[j], array[j + 1])
Note that typical implementations terminate early if no swaps are made during one of the iterations of the outer loop (since that means the array is sorted).
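A possible Python version of the bubble sort pseudocode, including the early termination just mentioned, could look like this. It is a 0-indexed sketch of my own; the swapped flag is one common way to detect a pass without swaps.

```python
def bubble_sort(a):
    """Sort the list a in place; stop early after a pass that makes no swaps."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # After i passes the last i elements are already in their final place.
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a


print(bubble_sort([3, 1, 5, 4, 2, 6, 7, 8, 9]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```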
Difference
In insertion sort elements are bubbled into the sorted section, while in bubble sort the maximums are bubbled out of the unsorted section.
In bubble sort, the ith iteration has n-i-1 inner iterations, (n^2)/2 in total. In insertion sort, the ith step has at most i inner iterations but only i/2 on average, because you can stop the inner loop as soon as you have found the correct position for the current element. So you have (sum from 0 to n) / 2, which is (n^2) / 4 in total.
That's why insertion sort is faster than bubble sort.
Another difference that I didn't see mentioned here:
Bubble sort needs 3 value assignments per swap:
first you save the value you want to push forward in a temporary variable (no. 1), then you write the other value into the spot you just saved (no. 2), and then you write the temporary variable into the other spot (no. 3).
You have to do that for every spot the value moves through on the way to its correct position.
With insertion sort you put the value to be sorted into a temporary variable and then move each value in front of it back by one spot until you reach the correct spot for your value. That makes 1 value assignment per spot. In the end you write the temporary variable into that spot.
That makes far fewer value assignments, too.
This isn't the biggest speed benefit, but I think it is worth mentioning.
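To make the assignment counting concrete, here is a small Python sketch of the shifting variant of insertion sort. The function name and the writes counter are my own additions for illustration; the counter tallies the write into the temporary, each one-slot shift, and the final write.

```python
def insertion_sort_shifting(a):
    """Insertion sort that shifts instead of swapping; returns the number of assignments."""
    writes = 0
    for i in range(1, len(a)):
        key = a[i]            # save the value to insert in a temporary
        writes += 1
        j = i
        while j > 0 and a[j - 1] > key:
            a[j] = a[j - 1]   # one assignment per slot moved
            writes += 1
            j -= 1
        a[j] = key            # write the temporary into its final slot
        writes += 1
    return writes


data = [5, 4, 3, 2, 1]
print(insertion_sort_shifting(data), data)  # 18 [1, 2, 3, 4, 5]
```

A swap-based sort of the same reversed list needs 10 swaps, i.e. 30 assignments at 3 per swap.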
I hope I expressed myself understandably; if not, sorry, I'm not a native English speaker.
The main advantage of insertion sort is that it's an online algorithm. You don't have to have all the values at the start. This could be useful when dealing with data coming from a network or some sensor.
I have a feeling that this would be faster than other conventional n log(n) algorithms, because with those the complexity would be n*(n log(n)): reading/storing each value from the stream (O(n)) and then sorting all the values (O(n log(n))) each time, resulting in O(n^2 log(n)).
In contrast, insertion sort needs O(n) for reading the values from the stream and O(n) to put each value in the correct place, so it's only O(n^2). Another advantage is that you don't need buffers for storing values; you sort them in their final destination.
Bubble sort is not online (it cannot sort a stream of inputs without knowing how many items there will be) because it does not really keep track of a global maximum of the sorted elements. When an item is inserted you will need to start the bubbling from the very beginning.
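A minimal sketch of what "online" means here, assuming the values arrive through any Python iterable (the names insert_sorted and sort_stream are invented for this example): the result is kept sorted after every single arrival, and the total count is never needed up front.

```python
def insert_sorted(sorted_items, value):
    """Insert value into an already sorted list, keeping it sorted."""
    sorted_items.append(value)
    j = len(sorted_items) - 1
    while j > 0 and sorted_items[j - 1] > sorted_items[j]:
        sorted_items[j - 1], sorted_items[j] = sorted_items[j], sorted_items[j - 1]
        j -= 1


def sort_stream(stream):
    """Consume values one at a time; the partial result is sorted at every step."""
    result = []
    for value in stream:
        insert_sorted(result, value)
    return result


print(sort_stream(iter([4, 1, 3, 2])))  # [1, 2, 3, 4]
```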
Bubble sort is better than insertion sort only when someone is looking for the top k elements of a large list of numbers,
i.e. after k iterations of bubble sort you'll have the top k elements. After k iterations of insertion sort, however, you are only assured that the first k elements are sorted.
Though both sorts are O(N^2), the hidden constants are much smaller in insertion sort. Hidden constants refer to the actual number of primitive operations carried out.
When does insertion sort have the better running time?
The array is nearly sorted: notice that insertion sort does fewer operations in this case than bubble sort.
The array is relatively small: in insertion sort you move elements around to make room for the current element. This is only better than bubble sort if the number of elements is small.
Notice that insertion sort is not always better than bubble sort. To get the best of both worlds, you can use insertion sort if the array is small, and probably merge sort (or quicksort) for larger arrays.
Number of swaps in each iteration
Insertion-sort does at most 1 swap in each iteration.
Bubble-sort does 0 to n swaps in each iteration.
Accessing and changing sorted part
Insertion-sort accesses(and changes when needed) the sorted part to find the correct position of a number in consideration.
When optimized, Bubble-sort does not access what is already sorted.
Online or not
Insertion-sort is online. That means insertion-sort takes one input at a time and puts it in the appropriate position before taking the next. It is not limited to comparing adjacent inputs.
Bubble-sort is not online. It does not operate on one input at a time. It handles a group of inputs (if not all) in each iteration. Bubble-sort only compares and swaps adjacent inputs in each iteration.
Insertion sort:
1. In insertion sort, swapping is not required.
2. The time complexity of insertion sort is Ω(n) in the best case and O(n^2) in the worst case.
3. It is less complex than bubble sort.
4. Example: inserting books in a library, arranging cards.
Bubble sort:
1. Swapping is required in bubble sort.
2. The time complexity of bubble sort is Ω(n) in the best case and O(n^2) in the worst case.
3. It is more complex than insertion sort.
I will try to give a more concise and informative answer than others.
Yes, after each pass, insertion sort and bubble sort intuitively seem the same - they both build a sorted sublist at the edge.
However, insertion sort will perform fewer comparisons in general. With insertion sort, we are only performing a linear search in the sorted sublist with each pass. With random data, you can expect to make m/2 comparisons and swaps, where m is the size of the sorted sublist.
With bubble sort, we are always comparing EVERY pair in the unsorted sublist with each pass, so that's n-m comparisons (twice as many as insertion sort on random data). This means bubble sort is bad if comparisons are expensive/slow.
Also, the branching associated with swaps and compares for insertion sort is more predictable. We do a linear search at the same time as a linear insert, and we can generally predict/assume that the linear search/insert will continue until the correct space is found. With bubble sort, branching is essentially random, and we can expect a branch miss half the time! With every single compare! This means bubble sort is bad for pipelined processors if comparisons and swaps are relatively cheap/fast.
These factors make bubble sort much slower in general than insertion sort.
Insertion Sort: We insert the elements into their proper positions in the array, one at a time. When we reach the nth element in the array, the n-1 elements are sorted.
Bubble Sort: We start with a bubble of one element and keep extending the bubble by a quantity of 1, until all elements are added. At any iteration, we simply swap the adjacent elements in the proper order so as to get the largest element at the end of the bubble. In this way, we keep on putting the largest element at the end of the array, and finally after all iterations our sorting is done.
Bubble Sort and Insertion sort complexity: O(n^2)
Insertion sort is faster than bubble sort, for the following reason:
Insertion sort just compares an element to a sorted array, that is, the ith element to the array containing elements 1...i-1, which are sorted already. Therefore, there are fewer comparisons and swaps.
In bubble sort, however, as the bubble grows, the same iteration of comparing each pair of neighbors runs again. This leads to a lot more comparisons and swaps than in insertion sort.
Therefore, even though the time complexity of both algorithms is O(n^2), insertion sort is the faster approach.
Insertion sort can be summarized as "Look for the element which should be at the first position (the minimum), make some space by shifting the next elements, and put it at the first position. Good. Now look at the element which should be at the 2nd position..." and so on.
Bubble sort operates differently and can be summarized as "As long as I find two adjacent elements which are in the wrong order, I swap them".
Bubble sort is almost useless under all circumstances. In use cases where insertion sort may perform too many swaps, selection sort can be used because it guarantees fewer than N swaps. Because selection sort is better than bubble sort, bubble sort has no use cases.

Big-O running time for selection sort

You are given a list of 100 integers that have been read from a file. If all values are zero, what would be the running time (in terms of O-notation) of a selection sort algorithm?
I thought it was O(n) because selection sort starts with the leftmost number as the sorted side. Then it goes through the rest of the array to find the smallest number and swaps it with the first number in the sorted side. But since they are all zeros, it won't swap any numbers (or so I think).
My teacher said that it is O(n^2). Can anyone explain why?
Selection sort is not adaptive. Each element will always be compared with each other element (compare n elements with n other elements → n^2 comparisons). Thus, selection sort always has O(n^2) comparisons. It has, however, O(n) swaps.
Think of a table with n rows and n columns, where each cell needs a comparison to fill in its value (except the diagonal).
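You can verify this by instrumenting a selection sort. The following Python sketch (the function name and counters are mine) counts compares and swaps separately; on 100 zeros it never swaps, but the inner scan still runs in full every time.

```python
def selection_sort_counts(a):
    """Selection sort on a copy of a; returns (compares, swaps)."""
    a = list(a)
    n = len(a)
    compares = swaps = 0
    for i in range(n - 1):
        smallest = i
        # The scan for the minimum cannot stop early, whatever the data looks like.
        for j in range(i + 1, n):
            compares += 1
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return compares, swaps


print(selection_sort_counts([0] * 100))  # (4950, 0)
```

The 4950 is 100 * 99 / 2, i.e. the cells of the table on one side of the diagonal.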
More info on this amazing website

Most suitable sorting algorithm

I have to sort a large array of doubles of size 100000.
The point is that I do not want to sort the whole array but only find the largest 20000 elements in descending order.
Currently I am using selection sort. Any way to improve the performance?
100,000 is not a very large array on most modern devices. Are you sure you can't just sort all of them using a standard library sorting function?
You can avoid a full sort by using a variation of heapsort. Normally in a heapsort you build a heap of the entire data set (100,000 elements in your case). Instead, only allow the heap to grow to 20,000 elements. Keep the largest element at the top of the heap. Once the heap is full (20,000 elements), you compare each subsequent element of the data set to the top of the heap. If the next data set element is larger than the top of the heap, just skip it. If it's smaller than the top of the heap, pop the top of the heap and insert the element from the data set.
Once you've gone through the entire data set, you have a heap of the 20,000 smallest elements of the data set. You can pop them one-by-one into an array to have a sorted array.
This algorithm runs in O(N log K) time, where N is the size of the data set (100,000 in your example) and K is the number of elements you want to keep (20,000 in your example).
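Note that the description above ends up with the K smallest elements; for the largest K, as asked in the question, you use the mirror image: a min-heap of size K whose top is the smallest value kept so far. A sketch with Python's heapq module (the function name is mine, and I sort the final heap instead of popping it one element at a time, which costs an extra O(K log K)):

```python
import heapq


def largest_k_descending(data, k):
    """Return the k largest values of data in descending order using a bounded min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap; heap[0] is the smallest of the values kept so far
    for x in data:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the weakest kept value: pop it and push x in one step.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(largest_k_descending([3, 7, 2, 5, 1, 4, 8], 3))  # [8, 7, 5]
```

The standard library also offers heapq.nlargest(k, data), which returns the same result directly.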
I'd suggest starting with bucket sort and then using some of the simpler algorithms to sort each bucket. If any of them is still too big, you can either use bucket sort again or another nlog(n) method (such as mergesort or quicksort). Otherwise, selection (or better, insertion) will do just fine.
Just for comparison: selection/insertion/quicksort is O(n*n), mergesort is O(nlog(n)), bucket sort is O(n*k), where k is the number of buckets. Choose k < log(n) and you'll get a better performance than the alternatives.
Note: quicksort's worst case scenario is O(n*n), but in practice it is much faster.
Update: O(n*k) is the average performance for bucket sort, not the worst case, so the same note above applies.
If you use the bubble sort algorithm and move the smaller number to the right, then after the 20,000th iteration the 20,000 smallest numbers will be at the end of the array in descending order. For example, for the array 3 7 2 5 1 4 8:
Iteration 1: 7 3 5 2 4 8 1
Iteration 2: 7 5 3 4 8 2 1
Iteration 3: 7 5 4 8 3 2 1
After the 3rd iteration the 3 smallest elements are at the end in descending order.
I recommend this because the complexity then depends on the number of elements you want to sort. If you only want a small number of elements, your program will run fast. The complexity is O(k*n), where k is the number of elements you want to get.
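Here is a small Python sketch that reproduces the iterations above, assuming each pass swaps a neighbouring pair whenever the left value is smaller than the right one. The function name is invented; to collect the largest values at the end instead, flip the comparison.

```python
def bubble_k_smallest_to_end(a, k):
    """Run k bubble passes on a copy of a, pushing smaller values to the right."""
    a = list(a)
    n = len(a)
    for i in range(k):
        # The last i slots already hold the i smallest values, so skip them.
        for j in range(n - 1 - i):
            if a[j] < a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


print(bubble_k_smallest_to_end([3, 7, 2, 5, 1, 4, 8], 3))  # [7, 5, 4, 8, 3, 2, 1]
```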
You can get the first K sorted elements with a modified quicksort. The key is to realise that, once you've reordered your list around the pivot, you can forget about sorting the right-hand side if your pivot has landed at a position ≥K.
In short, just replace the "right-hand" recursive call to quicksort() with
if (pivot < k) quicksort(...)
Alternatively, you could follow the standard heapsort algorithm, but stop after pulling K elements from the heap.
Both of these approaches take O(N + KlogN) time, O(N) space, and can be done in-place.
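A rough Python sketch of the modified quicksort idea follows. It uses a simple Lomuto partition around the last element, which is my own choice for brevity and not something the answer prescribes; with that pivot rule the recursion can get deep on already sorted input, so treat it as an illustration only.

```python
def partial_quicksort(a, k, lo=0, hi=None):
    """Rearrange a in place so that a[:k] holds the k smallest elements in sorted order."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Lomuto partition: everything smaller than the pivot ends up left of index p.
    pivot = a[hi]
    p = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[p], a[j] = a[j], a[p]
            p += 1
    a[p], a[hi] = a[hi], a[p]
    partial_quicksort(a, k, lo, p - 1)
    # Only sort the right-hand side if it still overlaps the first k positions.
    if p < k - 1:
        partial_quicksort(a, k, p + 1, hi)


data = [3, 7, 2, 5, 1, 4, 8]
partial_quicksort(data, 3)
print(data[:3])  # [1, 2, 3]
```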
You can improve on this by using the quicksort algorithm, or you can use merge sort, which will do this in n log(n) time. Measure the running time of both and see which is suitable for your scenario.

Should each group in the quickselect algorithm be sorted?

I have a question. In his lecture on the quickselect algorithm, my teacher said that after we split the array into groups of 5 elements, we don't need to sort each group. Is he correct? When we have a group like <3,5,7,6,1>, how can we find the median without sorting? Thanks.
EDITED: it is not about quickselect, it is about the general linear-time selection algorithm.
If all you need is the median, then sorting first could be more expensive than simply running a half-version of a selection sort, depending on your sort algorithm. In an array of n elements, you know the median will be the middle (n/2+1) element if n is odd, or the average of the two middle elements (n/2, n/2+1) if n is even. So perform a normal selection sort, but instead of running all n passes, run it only halfway to obtain the median value.
You could also do the very easy bubble sort, but only run about n/2 passes. This will ensure the median is in the middle, and is conceptually easy. Do it manually on paper if you doubt it.
The median is the floor(n / 2) + 1th smallest element in sorted order, which the selection algorithm can find in O(n) (considering n to be odd for convenience). So if you know that all the elements to the left of k are smaller than k and all the ones to the right are bigger than k and k is in position floor(n / 2) + 1, then you know that k is the median. You don't need to sort.
For example:
8 3 11 20 18 => 11 is the median because it's in the middle and smaller than everything after it and bigger than everything before it. No sorting required.
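To illustrate finding the median without sorting, here is a short randomized quickselect in Python. This is a stand-in I picked because it is compact; it is not the deterministic groups-of-5 algorithm from the lecture, and it runs in expected rather than worst-case linear time.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (k = 0 is the minimum) without sorting."""
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))


values = [8, 3, 11, 20, 18]
print(quickselect(values, len(values) // 2))  # 11
```

Only the side that contains position k is ever examined again, which is why no full sort is needed.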
There are many variants of the selection algorithm. The basic idea is the same for all of them, but some details might be different. Post your teacher's implementation and try to clarify your question if you need more localized help.

Resources