I have a problem: I'm very confused about the shell sort and insertion sort algorithms. How should we distinguish one from the other?
Shell sort is a generalized version of insertion sort. The basic principle is the same for both algorithms: you have a sorted sequence of length n, you insert the unsorted element into it, and you get a sorted sequence of length n + 1.
The difference is this: insertion sort works with only one sequence (initially the first element of the array) and expands it (using the next element). Shell sort, however, has a diminishing increment, which means there is a gap between the compared elements (initially n/2). Hence there are n/2 sequences to be sorted using insertion sort. In each step the increment is shrunk (often just divided by 2.2) and the number of sequences is reduced. In the last step there is no gap and the algorithm degenerates to a simple insertion sort.
Because of the diminishing increment, the large and small elements are moved rapidly to the correct part of the array, and in the last step they are sorted with insertion sort really fast. This reduces the time complexity to about O(n^(4/3)), depending on the gap sequence used.
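Here is a minimal Python sketch of shell sort, assuming the simple halving gap sequence n/2, n/4, ..., 1 rather than the 2.2 divisor mentioned above:

    def shell_sort(arr):
        gap = len(arr) // 2
        while gap > 0:
            # one insertion-sort pass over each gap-separated subsequence
            for i in range(gap, len(arr)):
                j = i
                while j >= gap and arr[j] < arr[j - gap]:
                    arr[j], arr[j - gap] = arr[j - gap], arr[j]
                    j -= gap
            gap //= 2  # diminishing increment; the final pass uses gap == 1
        return arr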
You can implement insertion sort as a series of comparisons and swaps of contiguous elements. That makes it a "stable sort". Shell sort, instead, compares and swaps elements which are far from each other. That makes it faster.
I suppose that your confusion comes from the fact that shell sort can be implemented as several insertion sorts applied to different subsets of the data. Note that these subsets are composed of noncontiguous elements of the data sequence.
See Wikipedia for more details ;-)
Insertion sort is a simple, in-place, O(N^2) sort. Shell sort is a little more complex and harder to understand, and runs somewhere around O(N^(5/4)). Check the links for examples -- it should be easy to see the difference.
When I got the answer at http://clrs.skanev.com/08/03/02.html for exercise 8.3-2, I could not understand how the index is specifically used to solve it. Could someone please show it step by step, or explain why it is Θ(n)?
Here is the question and answer:
Which of the following sorting algorithms are stable: insertion sort, merge sort, heapsort, and quicksort? Give a simple scheme that makes any sorting algorithm stable. How much additional time and space does your scheme entail?
Stable: Insertion sort, merge sort
Not stable: Heapsort, quicksort
We can make any algorithm stable by mapping the array to an array of pairs, where the first element in each pair is the original element and the second is its index. Then we sort lexicographically. This scheme takes additional Θ(n) space.
In the context of sorting, "stable" means that when a collection containing some elements with equivalent value is sorted, those elements stay in the same order with respect to each other.
So a sorting algorithm can be made stable by storing the original index of each element, and using that index as a secondary way of sorting elements with equal primary value.
To implement this, the comparison function (for example <) would be implemented so that A < B returns true if A.PrimarySortValue < B.PrimarySortValue, returns (A.OriginalIndex < B.OriginalIndex) when A.PrimarySortValue == B.PrimarySortValue, and otherwise (when A.PrimarySortValue > B.PrimarySortValue) returns false.
This requires one additional OriginalIndex value to be stored per element. There are n elements hence Θ(n) extra space is required.
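A minimal Python sketch of this scheme (make_stable and sort_fn are hypothetical names used for illustration):

    def make_stable(sort_fn, arr):
        # Decorate each element with its original index: Θ(n) extra space.
        decorated = [(value, index) for index, value in enumerate(arr)]
        # Tuples compare lexicographically: by value first, then by original
        # index, so equal values keep their relative order after sorting.
        sort_fn(decorated)
        return [value for value, _ in decorated]

    items = [3, 1, 3, 2]
    print(make_stable(list.sort, items))  # [1, 2, 3, 3]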
So I guess it's because it just compares A[k] and A[k-1] and does everything in one sweep, but it's still not clear to me. Can someone explain it better?
Thanks
This link shows a graphical representation of sorting algorithms running on different types of data sets.
As you can see, when the data is already sorted the algorithm's running time is reduced to N, i.e., it is proportional to the number of input elements.
The link gives a clear picture of why this is more efficient.
You answered your own question: for a nearly sorted array, insertion sort will only need a handful of O(n) passes to complete. Contrast that with a divide-and-conquer sorting algorithm like merge sort, which takes O(n log n). For any non-trivial value of n, a divide-and-conquer algorithm will need many O(n) passes, even if the array is almost completely sorted, whereas insertion sort might only require a few.
Insertion sort is a faster and more improved sorting algorithm than selection sort. In selection sort the algorithm iterates through all of the data on every pass, whether it is already sorted or not. Insertion sort works differently: instead of iterating through all of the data after every pass, the algorithm only traverses the data it needs to until the segment being sorted is sorted. Insertion sort requires two loops, and therefore two main variables, which in this case are named 'i' and 'j'. Variables 'i' and 'j' begin at the same index after every pass of the first loop; the second loop only executes while variable 'j' is greater than index 0 AND arr[j] < arr[j - 1]. In other words, as long as 'j' hasn't reached the start of the array AND the value at index 'j' is smaller than the value of the element to its left, 'j' is decremented and the loop keeps executing. This is what sets insertion sort apart from selection sort: only the data that needs to be sorted is sorted.
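A small Python sketch of the two-loop structure just described (variable names match the description):

    def insertion_sort(arr):
        for i in range(1, len(arr)):       # first loop: grow the sorted segment
            j = i                          # 'j' starts where 'i' is on every pass
            # second loop: runs only while j > 0 AND arr[j] < arr[j - 1]
            while j > 0 and arr[j] < arr[j - 1]:
                arr[j], arr[j - 1] = arr[j - 1], arr[j]
                j -= 1                     # 'j' is decremented each time
        return arr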
The general goal of a sorting algorithm is to minimize the number of comparisons. Sorting algorithms have a lower bound and an upper bound on the number of comparisons (n log n worst case for merge and heap sorts, n log n average case for quicksort). In the most general case, you'd go with an algorithm that happens to have the best average or best worst-case number of comparisons. However, when you know something about the data (e.g., the array is already sorted, or almost sorted), you can exploit the fact that insertion sort's lower bound is far lower than that of the "n log n" sorts.
For example, if you have an array [1,2,3,4,5,6,7,9] and you need to insert 8 into it, you can insert it at the end and sort the array using a vanilla n log n sort, which will do roughly 28 comparisons to sort the data to [1,2,3,4,5,6,7,8,9]. Insertion sort, however, lets you insert the 8 at the right position in only about 8 comparisons.
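A quick sketch of that single insertion pass (insert_sorted is a hypothetical helper, not a library function):

    def insert_sorted(arr, x):
        arr.append(x)
        j = len(arr) - 1
        # walks left at most len(arr) - 1 times; here the 8 moves past only the 9
        while j > 0 and arr[j] < arr[j - 1]:
            arr[j], arr[j - 1] = arr[j - 1], arr[j]
            j -= 1

    nums = [1, 2, 3, 4, 5, 6, 7, 9]
    insert_sorted(nums, 8)
    print(nums)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]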
Ques: Mergesort divides a list of numbers into two halves and calls itself recursively on both of them. Instead, can you perform quicksort on the left half and mergesort on the right half? If yes, show how it will sort the following list of numbers by showing every step. If no, explain why you cannot.
I am supposed to sort a list of numbers using mergesort, where the left half is to be sorted using quicksort?
I figured it out.
Ans: Yes, we can.
Sort the right half of the array using mergesort.
Sort the left half using quicksort.
Merge the two using the merge function of merge_sort.
Yes, you can do this. The basic idea behind mergesort is the following:
Split the array into two (or more) pieces.
Sort each piece independently.
Apply a merge step to combine the sorted pieces into one overall sorted list.
From the perspective of correctness, it doesn't actually matter how you sort the lists generated in part (2). All that matters is that those lists get sorted. A typical implementation of mergesort does step (2) by recursively applying itself to the left and right halves, but there's no fundamental reason you have to do this. (In fact, in some optimized versions of mergesort, you specifically don't do this and instead switch to an algorithm like insertion sort when the arrays get sufficiently small).
In your case, you are correct that using quicksort on the left and mergesort on the right would still produce a sorted sequence. However, the way it would work would look quite different from what you're describing. The first half of the array would get quicksorted (because you quicksort the left half), then you'd recursively sort the right half: the first half of that would get quicksorted, then you'd recursively sort its right half, and so on. Overall it would look something like this:
You quicksort the first half of the array, then the first half of what's left, then the first half of what's left, etc. until there are no elements left.
Then, working from left to right, you'd merge the last two elements together, then the last four, then the last eight, etc.
This would be a pretty cool-looking sort, but doing it by hand would be a total pain. You might be better off writing a program that just does this and showing all the intermediate steps. :-)
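If it helps, here is a rough Python sketch of that hybrid (the merge and quicksort helpers are illustrative, not any particular textbook's version):

    def merge(left, right):
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    def quicksort(a):
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        return (quicksort([x for x in a if x < pivot])
                + [x for x in a if x == pivot]
                + quicksort([x for x in a if x > pivot]))

    def hybrid_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        # quicksort the left half, recurse on the right half, then merge
        return merge(quicksort(a[:mid]), hybrid_sort(a[mid:]))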
No, you cannot do it. At least, not if you still want to call it "merge sort". The most fundamental difference between merge sort and quicksort is that the former is a stable algorithm, i.e., equally ordered elements keep their relative positions unaltered after sorting. This is important in many scenarios.
If you sort one half using quicksort, the relative position of equal elements can (and very likely will) change. The resulting algorithm will not preserve stability, so it can't still be considered merge sort.
By the way, the previous answer is correct regarding insertion sort being used as the last step of merge sort. Most efficient merge sort implementations will switch to something like insertion sort when the number of elements is small. Insertion sort is also stable, which is why it can be used without breaking merge sort's stability.
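As a rough sketch of that cutoff technique (CUTOFF is an arbitrary illustrative constant, not a value from any particular library):

    CUTOFF = 8  # arbitrary small threshold, chosen here for illustration

    def insertion_sort(a):
        for i in range(1, len(a)):
            j = i
            while j > 0 and a[j] < a[j - 1]:   # strict <, so the sort is stable
                a[j], a[j - 1] = a[j - 1], a[j]
                j -= 1
        return a

    def merge(left, right):
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:            # <= keeps equal elements in order
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    def merge_sort(a):
        if len(a) <= CUTOFF:
            return insertion_sort(a)           # stable, so stability is preserved
        mid = len(a) // 2
        return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))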
If anyone can give some input on my logic, I would very much appreciate it.
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think that would mean that the insertion sort would have to perform many more operations than the number of elements. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The largest item cannot be exchanged more times than there are positions available, since it should always be approaching its correct position. So, going from the first spot to the last, it would be exchanged N times.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared objects at every phase that is N tall and N wide; the area of this triangle would equal the number of compares performed, which would be (N^2)/2.
Here are my comments on your comments:
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Yes, that's correct. Insertion sort will do O(1) work per element and visit O(n) elements, for a total runtime of O(n). Selection sort always runs in time Θ(n^2) regardless of the input structure, so its runtime will be quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
You're right that both algorithms have quadratic runtime. The algorithms should actually have relatively comparable performance, since they'll make the same total number of comparisons.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think that would mean that the insertion sort would have to perform many more operations than the number of elements. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
This should take quadratic time (time Θ(n^2)). Take just the elements in the back third of the array. About a third of those (n/9 of the elements) will be 1's, and in order to insert them into the sorted sequence each would need to be moved past roughly 2n/3 other elements. Therefore, the work done would be at least (n/9)(2n/3) = 2n^2/27, which is quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The largest item cannot be exchanged more times than there are positions available, since it should always be approaching its correct position. So, going from the first spot to the last, it would be exchanged N times.
There's an off-by-one error here. When the array has size 1, the largest element can't be moved any more, so the maximum number of moves would be N - 1.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared objects at every phase that is N tall and N wide; the area of this triangle would equal the number of compares performed, which would be (N^2)/2.
This really depends on the implementation of Quick.sort(). Quicksort with ternary partitioning would only do O(n) total work because all values equal to the pivot are excluded in the recursive calls. If this isn't done, then your analysis would be correct.
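For reference, here is a sketch of quicksort with ternary (three-way) partitioning, assuming a Dijkstra-style partition:

    def quicksort3(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        pivot = a[lo]
        lt, i, gt = lo, lo + 1, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # everything in a[lt..gt] equals the pivot and is skipped in the
        # recursion, so an all-equal array needs only one linear pass
        quicksort3(a, lo, lt - 1)
        quicksort3(a, gt + 1, hi)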
Hope this helps!
I am doing my revision for the exam.
I would like to know under what conditions insertion sort performs better than bubble sort, given that both have the same average-case complexity of O(N^2).
I did find some related articles, but I can't understand them.
Would anyone mind explaining it in a simple way?
The advantage of bubblesort is in the speed of detecting an already sorted list:
BubbleSort Best Case Scenario: O(n)
However, even in this case insertion sort achieves the same or better performance.
Bubblesort is, more or less, only good for understanding and/or teaching the mechanics of sorting algorithms, and won't find proper usage in programming these days, because its complexity O(n²) means that its efficiency decreases dramatically on lists of more than a small number of elements.
The following things come to mind:
Bubble sort always takes one more pass over the array to determine whether it's sorted. On the other hand, insertion sort doesn't need this: once the last element is inserted, the algorithm guarantees that the array is sorted.
Bubble sort does n comparisons on every pass. Insertion sort does fewer than n comparisons: once the algorithm finds the position where to insert the current element, it stops making comparisons and takes the next element.
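To make the contrast concrete, here is a sketch of bubble sort with the usual early-exit flag:

    def bubble_sort(arr):
        n = len(arr)
        while True:
            swapped = False
            for k in range(1, n):          # n - 1 comparisons on every pass
                if arr[k - 1] > arr[k]:
                    arr[k - 1], arr[k] = arr[k], arr[k - 1]
                    swapped = True
            if not swapped:                # the final pass exists only to
                break                      # confirm that nothing moved
        return arr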
Finally, a quote from the Wikipedia article:
Bubble sort also interacts poorly with modern CPU hardware. It requires at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions. Experiments by Astrachan sorting strings in Java show bubble sort to be roughly 5 times slower than insertion sort and 40% slower than selection sort.
You can find a link to the original research paper there.
I guess the answer you're looking for is here:
Bubble sort may also be efficiently used on a list that is already sorted except for a very small number of elements. For example, if only one element is not in order, bubble sort will take only 2n time. If two elements are not in order, bubble sort will take only at most 3n time...
and
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and often is used as part of more sophisticated algorithms.
Could you provide links to the related articles you don't understand? I'm not sure what aspects they might be addressing. Other than that, there is a theoretical difference which might be that bubble sort is more suited for collections represented as arrays (than it is for those represented as linked lists), while insertion sort is suited for linked lists.
The reasoning would be that bubble sort always swaps two adjacent items at a time, which is trivial on both an array and a linked list (and more efficient on arrays), while insertion sort inserts at a given place in the list, which is trivial for linked lists but involves moving all subsequent elements in an array to the right.
That being said, take it with a grain of salt. First of all, sorting arrays is, in practice, almost always faster than sorting linked lists, simply because scanning the list once already makes an enormous difference. Apart from that, moving n elements of an array to the right is much faster than performing n (or even n/2) swaps. This is why other answers correctly claim insertion sort to be superior in general, and why I really wonder about the articles you read, because I fail to think of a simple way of saying "this is better in cases A, and that is better in cases B".
In the worst case both tend to perform at O(n^2).
In the best-case scenario, i.e., when the array is already sorted, bubble sort can perform at O(n).