InsertionSort vs. InsertionSort vs. BinaryInsertionSort

I have a couple of questions concerning different implementations of insertion sort.
Implementation 1:
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
}
Implementation 2:
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        for (int j = i; j > 0 && a[j - 1] > a[j]; --j) {
            swap(a, j, j - 1);
        }
    }
}

private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}
Here's my first question: one would think that the first version should be a little faster than the second (because of fewer assignments), but it isn't, or at least the difference is negligible. Why?
Second, I was wondering why Java's Arrays.sort() method also uses the second approach (maybe because of code reuse, since the swap method is used in several places, or maybe because it is easier to understand).
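(For context, here is the kind of crude harness one might use to compare the two; the class name and sizes are arbitrary choices of mine, and a serious comparison should use a framework such as JMH, since JIT warm-up easily dominates a naive timing loop. Paste either implementation above into the class to run it.)

public class SortBench {
    public static void main(String[] args) {
        int[] base = new java.util.Random(42).ints(10_000).toArray();
        for (int rep = 0; rep < 10; rep++) {  // early reps serve as JIT warm-up
            int[] a = base.clone();           // identical input on every run
            long t0 = System.nanoTime();
            insertionSort(a);                 // swap in implementation 1 or 2
            System.out.printf("run %d: %.1f ms%n", rep, (System.nanoTime() - t0) / 1e6);
        }
    }
}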
Implementation 3 (binaryInsertionSort):
public static void binaryInsertionSort(int[] a) {
    for (int i = 1; i < a.length; ++i) {
        // Arrays.binarySearch returns (-(insertion point) - 1) if the key is absent
        int pos = Arrays.binarySearch(a, 0, i, a[i]);
        int insertionPoint = (pos >= 0) ? pos : -pos - 1;
        if (insertionPoint < i) {
            int key = a[i];
            // equivalent shifting loop:
            // for (int j = i; j > insertionPoint; --j) {
            //     a[j] = a[j - 1];
            // }
            System.arraycopy(a, insertionPoint, a, insertionPoint + 1, i - insertionPoint);
            a[insertionPoint] = key;
        }
    }
}
Is binary insertion sort of any practical use, or is it more of a theoretical thing? On small arrays the other approaches are much faster, and on bigger arrays mergesort/quicksort have much better performance.

The number of comparisons in the first two is n(n-1)/2 in the worst case, excluding those for the outer loops.
None of these programs makes much sense for real work as it stands, because they don't make use of the information at their disposal. For instance, it is easy to add a check for whether any swaps have been made: if not, the array is sorted and you can finish, perhaps saving most of the work. In practice, these kinds of considerations can dominate the average case.
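One way to read that suggestion as code: scan for an inversion up front and skip the sort entirely if there is none. A minimal sketch (the method name is mine; it falls back to implementation 1 above):

public static void insertionSortChecked(int[] a) {
    boolean sorted = true;
    for (int i = 1; i < a.length; i++) {
        if (a[i - 1] > a[i]) { sorted = false; break; }  // found an inversion
    }
    if (sorted) return;  // already sorted: O(n) work and zero array writes
    insertionSort(a);    // otherwise fall back to implementation 1 above
}

Note that implementation 1 already handles sorted input with N-1 comparisons; the pre-check mainly avoids its redundant writes of key.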
Postscript
Missed the question about Java: I understand that Java's Arrays.sort() is a fairly complex algorithm with many special cases, such as dedicated handling of small arrays, and (for primitive types) it uses a dual-pivot quicksort to do the heavy lifting.

Related

Merge sort gives poor efficiency and isn't affected by compiler optimizations

While trying to measure the time various sorting algorithms need to sort a random array of unsigned integers, I observed some peculiar behavior of top-down Merge sort that does not seem to be caused by a bad implementation.
On arrays of up to 1 million values, Merge sort behaves a lot worse than random-pivot Quicksort and even Shell sort. This was unexpected, so I tried multiple online implementations of Merge sort, but the results are about the same.
Graph 1, optimizations ON
This is the implementation I used for these graphs:
void merge(int *array, int l, int m, int r) {
    int i, j, k, nl, nr;
    nl = m - l + 1;  // length of the left run
    nr = r - m;      // length of the right run
    int *larr = new int[nl], *rarr = new int[nr];
    // copy both runs into freshly allocated scratch arrays
    for (i = 0; i < nl; i++)
        larr[i] = array[l + i];
    for (j = 0; j < nr; j++)
        rarr[j] = array[m + 1 + j];
    // merge the two scratch arrays back into array[l..r]
    i = 0; j = 0; k = l;
    while (i < nl && j < nr) {
        if (larr[i] <= rarr[j]) {
            array[k] = larr[i];
            i++;
        } else {
            array[k] = rarr[j];
            j++;
        }
        k++;
    }
    while (i < nl) {
        array[k] = larr[i];
        i++; k++;
    }
    while (j < nr) {
        array[k] = rarr[j];
        j++; k++;
    }
    delete[] larr;
    delete[] rarr;
}

void mergeSort(int *array, int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;
        mergeSort(array, l, m);
        mergeSort(array, m + 1, r);
        merge(array, l, m, r);
    }
}
I have also tried turning compiler optimizations off (Visual C++ 15) and favoring size over speed; this seems to have affected all the other algorithms but not Merge sort. Nonetheless, it still had the worst time.
Graph 2, optimizations OFF
The only time Merge sort did not give the worst time was a test with arrays of 15 million elements, where it performed slightly better than Heap sort, but still far worse than the others.
The values I plot are averages over 100 tests with random arrays, so I don't think this is just a particular case. I also don't think the use of dynamic memory in Merge sort is the cause of these results; 16 GB of RAM are plenty for these tests and everything else.
Does anybody know why Merge sort behaves so badly and why compiler optimizations don't seem to affect Merge sort?
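One thing worth isolating when profiling this, though it is only a suspicion: the merge above calls new[] and delete[] on every invocation, so the allocator runs O(n) times per sort. A minimal sketch of the buffer-reuse alternative (written in Java to match the other snippets in this thread; all names are mine):

public class MergeSortBuffered {
    public static void mergeSort(int[] a) {
        int[] buf = new int[a.length];  // one scratch buffer for the whole sort
        sort(a, buf, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] buf, int l, int r) {
        if (l >= r) return;
        int m = l + (r - l) / 2;
        sort(a, buf, l, m);
        sort(a, buf, m + 1, r);
        merge(a, buf, l, m, r);
    }

    private static void merge(int[] a, int[] buf, int l, int m, int r) {
        System.arraycopy(a, l, buf, l, r - l + 1);  // snapshot the range once
        int i = l, j = m + 1, k = l;
        while (i <= m && j <= r)
            a[k++] = (buf[i] <= buf[j]) ? buf[i++] : buf[j++];
        while (i <= m)
            a[k++] = buf[i++];
        // whatever remains of the right run is already in place in a
    }
}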

First missing Integer approach's time complexity

I want to understand the time complexity of my below algorithm, which is an acceptable answer for the famous first missing integer problem:
public int firstMissingPositive(int[] A) {
    int l = A.length;
    int i = 0;
    while (i < l) {
        int j = A[i];
        // follow the chain of values, marking slot j-1 with MAX_VALUE
        // to record that the value j is present
        while (j > 0 && j <= l) {
            int k = A[j - 1];
            A[j - 1] = Integer.MAX_VALUE;
            j = k;
        }
        i++;
    }
    // the first slot that was never marked gives the missing integer
    for (i = 0; i < l; i++) {
        if (A[i] != Integer.MAX_VALUE)
            break;
    }
    return i + 1;
}
Observations and findings:
Looking at the loop structure, I thought the complexity should be more than O(n), since I may visit every element more than twice in some cases. But to my surprise the solution was accepted, and I am not able to work out the complexity.
You are probably looking at the nested loops and thinking O(N^2), but it's not that simple.
Every iteration of the inner loop overwrites an item of A with Integer.MAX_VALUE, and there are only N items. Once j points at an item that is already MAX_VALUE, the next step sets j = MAX_VALUE and the loop exits, so each outer iteration adds at most one such wasted step. In total the inner loop therefore runs at most about 2N times.
The total time is therefore O(N).
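This bound is easy to check empirically by counting inner-loop iterations (a hypothetical counter and printout added to the method above; the class name is mine):

public class FirstMissing {
    public static int firstMissingPositive(int[] A) {
        int l = A.length;
        int steps = 0;  // total inner-loop iterations across the whole run
        for (int i = 0; i < l; i++) {
            int j = A[i];
            while (j > 0 && j <= l) {
                int k = A[j - 1];
                A[j - 1] = Integer.MAX_VALUE;
                j = k;
                steps++;
            }
        }
        System.out.println("inner steps = " + steps + ", n = " + l);
        int i = 0;
        while (i < l && A[i] == Integer.MAX_VALUE)
            i++;
        return i + 1;
    }

    public static void main(String[] args) {
        // prints "inner steps = 4, n = 4", then the answer, 2
        System.out.println(firstMissingPositive(new int[] { 3, 4, -1, 1 }));
    }
}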

I am trying to find an effective solution for my homework. Bubble sort or insertion sort?

Hello everyone, I have a question. My task is the following:
Let A[] be an array of natural numbers of length N which is partially sorted, i.e. there exists an index i (0 < i < N-1) such that the subarray A[0],...,A[i] is sorted in increasing order and the subarray A[i+1],...,A[N-1] is also sorted in increasing order. Design an algorithm that sorts the whole array A[] and works in place (i.e. has space complexity O(1)); the result must be stored in the same array A[]. Describe the algorithm, argue its correctness, and estimate its time complexity.
Which approach is better for this task, bubble sort or insertion sort? Or is there a more effective solution? I preferred bubble sort, but I am open to other opinions.
static void bubbleSort(int arr[], int n)
{
    int i, j, temp;
    boolean swapped;
    for (i = 0; i < n - 1; i++)
    {
        swapped = false;
        for (j = 0; j < n - i - 1; j++)
        {
            if (arr[j] > arr[j + 1])
            {
                // swap arr[j] and arr[j+1]
                temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                swapped = true;
            }
        }
        if (swapped == false)
            break;
    }
}

static void printArray(int arr[], int size)
{
    for (int i = 0; i < size; i++)
        System.out.print(arr[i] + " ");
    System.out.println();
}

public static void main(String args[])
{
    int arr[] = { 1, 8, 45, 12, 22, 11, 90 };
    int n = arr.length;
    bubbleSort(arr, n);
    System.out.println("Sorted array: ");
    printArray(arr, n);
}
Bubble sort's complexity is O(n^2), and the if (swapped == false) break; early exit does not improve the worst case: try {2, 3, 4, 5, 1} and you will see that the 1 travels only one position left per pass.
Since there exists an index i (0 < i < N-1) such that A[0..i] and A[i+1..N-1] are each sorted, the array consists of exactly two sorted runs, so the task amounts to merging two sorted arrays. With a temporary buffer that is a standard O(n) merge; merging strictly in place with O(1) extra space, however, is much harder than it looks. A simple approach that is correct, in place, and fast when the runs barely interleave: find the boundary in O(n), then insert each element of the second run backwards into the sorted prefix, exactly as in insertion sort. This runs in O(n + d) time, where d is the total distance elements have to move (worst case O(n^2)). The algorithm is given below:
void sortPartialSortedArray(int arr[], int n)
{
    // find the boundary: the single index where arr[i] > arr[i+1]
    int pos = 0;
    for (int i = 0; i + 1 < n; i++) {
        if (arr[i] > arr[i + 1]) {
            pos = i;
        }
    }
    // insert each element of the second run into the sorted prefix
    for (int i = pos + 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}
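For contrast: if the O(1)-space requirement were relaxed, the two runs could be merged in true O(n) time using a buffer that holds only the first run. A Java sketch (class and method names are mine):

import java.util.Arrays;

public class PartialSort {
    // merge the sorted runs a[0..pos] and a[pos+1..n-1] in O(n) time,
    // using O(pos) extra space for a copy of the first run
    static void mergeRuns(int[] a, int pos) {
        int[] left = Arrays.copyOfRange(a, 0, pos + 1);
        int i = 0, j = pos + 1, k = 0;
        while (i < left.length && j < a.length)
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        while (i < left.length)
            a[k++] = left[i++];  // copy any leftover of the first run
        // leftovers of the second run are already in place
    }
}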

Insertion sort in best case

With reference to Algorithms, Fourth Edition by Robert Sedgewick and Kevin Wayne, I am having difficulty understanding the best-case complexity of insertion sort, as given in the code below:
public class Insertion
{
    public static void sort(Comparable[] a)
    { // Sort a[] into increasing order.
        int N = a.length;
        for (int i = 1; i < N; i++)
        { // Insert a[i] among a[i-1], a[i-2], a[i-3]...
            for (int j = i; j > 0 && less(a[j], a[j-1]); j--)
                exch(a, j, j-1);
        }
    }
    // See page 245 for less(), exch(), isSorted(), and main().
}
The book says that in the best case (a sorted array) the number of exchanges is 0 and the number of compares is N-1. I understand why the exchanges are 0, but I am having a hard time seeing how the number of compares can be N-1 in the best case.
If the array is already sorted, then in the specific implementation of insertion-sort that you provide, each element will only be compared to its immediate predecessor. Since it's not less than that predecessor, the inner for-loop then aborts immediately, without requiring any further comparisons or exchanges.
Note that other implementations of insertion sort do not necessarily have that property; for example, a binary-search-based insertion sort performs about log i comparisons to place element i even when the input is already sorted.
how can number of compares be N-1 in best case?
The best case happens when the array is already sorted. The number of comparisons is N-1 because one comparison is made for each element, from the 2nd to the last.
This can also be observed from your given code:
for (int i = 1; i < N; i++) //int i=1 (start comparing from 2nd element)
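The N-1 count is also easy to confirm with an instrumented comparison (an int[]-based sketch with a counter added; compareCount and the class name are mine):

public class BestCaseCount {
    static int compareCount = 0;

    static boolean less(int x, int y) {
        compareCount++;  // count every comparison the sort makes
        return x < y;
    }

    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && less(a[j], a[j - 1]); j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        sort(new int[] { 1, 2, 3, 4, 5 });  // already sorted, N = 5
        System.out.println(compareCount);   // prints 4, i.e. N-1
    }
}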
A variant with a flag (bool exc) and an if (!exc) break; in the outer loop is sometimes suggested as an optimization, but it is incorrect: unlike bubble sort, seeing no exchange for a single value of i only proves that a[i] is in place, not that the rest of the array is sorted (try {1, 2, 0}: no exchange happens at i = 1, the loop breaks, and the array stays unsorted). No flag is needed here, because the inner loop already exits at the first failed comparison:
public class Insertion
{
    public static void sort(Comparable[] a)
    { // Sort a[] into increasing order.
        int N = a.length;
        for (int i = 1; i < N; i++)
        { // On sorted input the loop condition below fails immediately,
          // costing exactly one comparison per i: N-1 comparisons in total.
            for (int j = i; j > 0 && less(a[j], a[j-1]); j--)
                exch(a, j, j-1);
        }
    }
    // See page 245 for less(), exch(), isSorted(), and main().
}

Understanding of shell sort

I have a couple of questions, which I couldn't find answered online, regarding Shell sort and the gap sequence used below.
public static void shell(int[] a) {
    int increment = a.length / 2;
    while (increment > 0) {
        for (int i = increment; i < a.length; i++) {
            int j = i;
            int temp = a[i];
            while (j >= increment && a[j - increment] > temp) {
                a[j] = a[j - increment];
                j = j - increment;
            }
            a[j] = temp;
        }
        if (increment == 2) {
            increment = 1;  // ensure the final gap-1 pass is not skipped
        } else {
            increment *= (5.0 / 11);  // roughly halve the gap (int truncation)
        }
    }
}
This is the code I found online, but I don't really understand the last else statement. What does 5.0/11 represent?
I also need to analyse the complexity of the algorithm, but I am getting perplexing results: it seems to be O(n) in both the best and the worst case. Are these results legitimate?
5.0/11 ≈ 0.4545 is just a way of roughly halving the increment: increment is an int, so the product is truncated, and any factor below 0.5 and above 0.45 gives about half the previous value in round figures.
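The if (increment == 2) guard in the code matters for a related reason: (int) (2 * (5.0 / 11)) would be 0, which would skip the final gap-1 pass, and it is that last plain insertion-sort pass that guarantees a sorted result. A tiny sketch to print the gap sequence for n = 100 (the class name is mine):

public class GapDemo {
    public static void main(String[] args) {
        int increment = 100 / 2;
        while (increment > 0) {
            System.out.print(increment + " ");  // prints: 50 22 10 4 1
            increment = (increment == 2) ? 1 : (int) (increment * (5.0 / 11));
        }
    }
}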
