Why is Insertion sort faster than Merge sort? - sorting

I created a test on jsperf.com for 3 sorting methods: Bubble, Insertion and Merge. Link
Before each test I create an unsorted array of random numbers from 0 to 1 million.
Each time, the test shows that Insertion sort is faster than Merge sort.
What's the reason for this result, if Merge sort runs in O(n log(n)) time while Insertion and Bubble sorts are O(n^2)?
test result here

Without more testing, a tentative answer:
Your insertion sort is fairly optimised - it only swaps elements. Your merge sort instantiates new arrays using [] and creates new arrays with slice and concat, which is a large memory-management overhead, not to mention that concat and slice have implicit loops inside them (although in native code). Merge sort is efficient when it is done over a single preallocated buffer; with all the copying going on, that should slow it down a lot.

As Amadan commented, it would be best for merge sort to do a one-time allocation of a working buffer the same size as the array to be sorted. Top down merge sort uses recursion to generate the indices used by merge, while bottom up skips the recursion and uses iteration to generate them. Most of the time is spent doing the actual merging of sub-arrays, so on larger arrays (1 million elements or more) top down's extra overhead is only about 5%.
Example C++ code for a somewhat optimized bottom up merge sort.
#include <algorithm>    // std::swap
#include <cstddef>      // size_t

void BottomUpMergeSort(int a[], int b[], size_t n);
void BottomUpCopy(int a[], int b[], size_t ll, size_t rr);
void BottomUpMerge(int a[], int b[], size_t ll, size_t rr, size_t ee);

void MergeSort(int a[], size_t n)           // entry function
{
    if(n < 2)                               // if size < 2 return
        return;
    int *b = new int[n];
    BottomUpMergeSort(a, b, n);
    delete[] b;
}
size_t GetPassCount(size_t n)               // return # passes
{
    size_t i = 0;
    for(size_t s = 1; s < n; s <<= 1)
        i += 1;
    return(i);
}
void BottomUpMergeSort(int a[], int b[], size_t n)
{
    size_t s = 1;                           // run size
    if(GetPassCount(n) & 1){                // if odd number of passes
        for(s = 1; s < n; s += 2)           // swap in place for 1st pass
            if(a[s] < a[s-1])
                std::swap(a[s], a[s-1]);
        s = 2;
    }
    while(s < n){                           // while not done
        size_t ee = 0;                      // reset end index
        while(ee < n){                      // merge pairs of runs
            size_t ll = ee;                 // ll = start of left run
            size_t rr = ll+s;               // rr = start of right run
            if(rr >= n){                    // if only left run
                rr = n;
                BottomUpCopy(a, b, ll, rr); // copy left run
                break;                      // end of pass
            }
            ee = rr+s;                      // ee = end of right run
            if(ee > n)
                ee = n;
            // merge a pair of runs
            BottomUpMerge(a, b, ll, rr, ee);
        }
        std::swap(a, b);                    // swap a and b
        s <<= 1;                            // double the run size
    }
}
void BottomUpCopy(int a[], int b[], size_t ll, size_t rr)
{
    while(ll < rr){                         // copy left run
        b[ll] = a[ll];
        ll++;
    }
}
void BottomUpMerge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                          // b[] index
    size_t l = ll;                          // a[] left index
    size_t r = rr;                          // a[] right index
    while(1){                               // merge data
        if(a[l] <= a[r]){                   // if a[l] <= a[r]
            b[o++] = a[l++];                //   copy a[l]
            if(l < rr)                      //   if not end of left run
                continue;                   //     continue (back to while)
            while(r < ee)                   //   else copy rest of right run
                b[o++] = a[r++];
            break;                          //     and return
        } else {                            // else a[l] > a[r]
            b[o++] = a[r++];                //   copy a[r]
            if(r < ee)                      //   if not end of right run
                continue;                   //     continue (back to while)
            while(l < rr)                   //   else copy rest of left run
                b[o++] = a[l++];
            break;                          //     and return
        }
    }
}

Related

Mergesort implementation is slow

I'm doing a report about different sorting algorithms in C++. What baffles me is that my mergesort seems to be slower than heapsort in both of the languages. From what I've seen, heapsort is supposed to be the slower one.
My mergesort sorts an unsorted array of size 100000 in 19.8 ms, while heapsort sorts it in 9.7 ms. The code for my mergesort function in C++ is as follows:
void merge(int *array, int low, int mid, int high) {
    int i, j, k;
    int lowLength = mid - low + 1;
    int highLength = high - mid;
    int *lowArray = new int[lowLength];
    int *highArray = new int[highLength];
    for (i = 0; i < lowLength; i++)
        lowArray[i] = array[low + i];
    for (j = 0; j < highLength; j++)
        highArray[j] = array[mid + 1 + j];
    i = 0;
    j = 0;
    k = low;
    while (i < lowLength && j < highLength) {
        if (lowArray[i] <= highArray[j]) {
            array[k] = lowArray[i];
            i++;
        } else {
            array[k] = highArray[j];
            j++;
        }
        k++;
    }
    while (i < lowLength) {
        array[k] = lowArray[i];
        i++;
        k++;
    }
    while (j < highLength) {
        array[k] = highArray[j];
        j++;
        k++;
    }
    delete[] lowArray;   // free the per-call temporaries
    delete[] highArray;
}

void mergeSort(int *array, int low, int high) {
    if (low < high) {
        int mid = low + (high - low) / 2;
        mergeSort(array, low, mid);
        mergeSort(array, mid + 1, high);
        merge(array, low, mid, high);
    }
}
The example merge sort does allocation and copying of data in merge(), and both can be eliminated with a more efficient merge sort. A single allocation for the temp array can be done in a helper / entry function, and the copying is avoided by changing the direction of merge depending on the level of recursion, either by using two mutually recursive functions (as in the example below) or with a boolean parameter (a sketch of that variant follows the example).
Here is an example of a C++ top down merge sort that is reasonably optimized. A bottom up merge sort would be slightly faster, and on a system with 16 registers a 4-way bottom up merge sort a bit faster still, about as fast as or faster than quick sort.
#include <cstddef>      // size_t

// prototypes
void TopDownSplitMergeAtoA(int a[], int b[], size_t ll, size_t ee);
void TopDownSplitMergeAtoB(int a[], int b[], size_t ll, size_t ee);
void TopDownMerge(int a[], int b[], size_t ll, size_t rr, size_t ee);

void MergeSort(int a[], size_t n)           // entry function
{
    if(n < 2)                               // if size < 2 return
        return;
    int *b = new int[n];
    TopDownSplitMergeAtoA(a, b, 0, n);
    delete[] b;
}
void TopDownSplitMergeAtoA(int a[], int b[], size_t ll, size_t ee)
{
    if((ee - ll) == 1)                      // if size == 1 return
        return;
    size_t rr = (ll + ee)>>1;               // midpoint, start of right half
    TopDownSplitMergeAtoB(a, b, ll, rr);
    TopDownSplitMergeAtoB(a, b, rr, ee);
    TopDownMerge(b, a, ll, rr, ee);         // merge b to a
}

void TopDownSplitMergeAtoB(int a[], int b[], size_t ll, size_t ee)
{
    if((ee - ll) == 1){                     // if size == 1 copy a to b
        b[ll] = a[ll];
        return;
    }
    size_t rr = (ll + ee)>>1;               // midpoint, start of right half
    TopDownSplitMergeAtoA(a, b, ll, rr);
    TopDownSplitMergeAtoA(a, b, rr, ee);
    TopDownMerge(a, b, ll, rr, ee);         // merge a to b
}
void TopDownMerge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                          // b[] index
    size_t l = ll;                          // a[] left index
    size_t r = rr;                          // a[] right index
    while(1){                               // merge data
        if(a[l] <= a[r]){                   // if a[l] <= a[r]
            b[o++] = a[l++];                //   copy a[l]
            if(l < rr)                      //   if not end of left run
                continue;                   //     continue (back to while)
            while(r < ee)                   //   else copy rest of right run
                b[o++] = a[r++];
            break;                          //     and return
        } else {                            // else a[l] > a[r]
            b[o++] = a[r++];                //   copy a[r]
            if(r < ee)                      //   if not end of right run
                continue;                   //     continue (back to while)
            while(l < rr)                   //   else copy rest of left run
                b[o++] = a[l++];
            break;                          //     and return
        }
    }
}
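For reference, here is a sketch of the boolean-parameter variant mentioned above (my own code, not part of the original answer; it reuses TopDownMerge and would be called from the entry function as SplitMerge(a, b, 0, n, true)). The toA flag says whether this level must leave its result in a[], and it flips at each level of recursion:

void SplitMerge(int a[], int b[], size_t ll, size_t ee, bool toA)
{
    if((ee - ll) == 1){                     // single element
        if(!toA)
            b[ll] = a[ll];                  // result belongs in b: copy it
        return;
    }
    size_t rr = (ll + ee)>>1;               // midpoint, start of right half
    SplitMerge(a, b, ll, rr, !toA);         // children deliver their results
    SplitMerge(a, b, rr, ee, !toA);         //   to the opposite array ...
    if(toA)
        TopDownMerge(b, a, ll, rr, ee);     // ... so merge them back into a
    else
        TopDownMerge(a, b, ll, rr, ee);     // ... or into b
}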

mergesort running faster than radix sort

I sorted a million random positive long numbers, about 20 digits in length, using my implementations of Merge sort and Radix sort.
The merge sort is significantly, almost 6 times, faster than the Radix sort.
I understand that the time complexity of Radix sort also depends on the number of digits of the integers, but my merge implementation is beating my Radix implementation on all input sizes.
I am using my own queue class, which has constant-time push() and pop(), in my radix sort. I am using arrays in the merge sort. Does this have something to do with it?
public static void RadixSort(long arr[]) {
    // Using 10 queues, one for each digit from 0-9.
    Queue q[] = new Queue[10];
    for (int i = 0; i < 10; i++)
        q[i] = new Queue();
    boolean moreDigits = true;
    long divisor = 1;
    while (moreDigits) {
        moreDigits = false;
        for (int i = 0; i < arr.length; i++) {
            long digit = (arr[i] / divisor) % 10;
            // Put number into the appropriate queue; another pass is
            // needed if any number has digits above this position.
            q[(int) digit].enqueue(arr[i]);
            if (arr[i] / divisor > 9)
                moreDigits = true;
        }
        int pos = 0;
        divisor *= 10;
        // Put queue contents back into the array
        for (int i = 0; i < 10; i++)
            while (!q[i].isEmpty())
                arr[pos++] = q[i].dequeue();
    }
}
Here is the merge sort
public static void mergeSort(long[] a) {
    long[] tmp = new long[a.length];
    mergeSort(a, tmp, 0, a.length - 1);
}

private static void mergeSort(long[] a, long[] tmp, int left, int right) {
    if (left < right) {
        int center = (left + right) / 2;
        mergeSort(a, tmp, left, center);        // Divide left half
        mergeSort(a, tmp, center + 1, right);   // Divide right half
        merge(a, tmp, left, center + 1, right); // Merge sorted lists
    }
}

private static void merge(long[] a, long[] tmp, int left, int right,
        int rightEnd) {
    int leftEnd = right - 1;
    int k = left;
    int num = rightEnd - left + 1;
    // Put the smaller element into tmp while both lists are non-empty.
    while (left <= leftEnd && right <= rightEnd)
        if (a[left] < a[right])
            tmp[k++] = a[left++];
        else
            tmp[k++] = a[right++];
    while (left <= leftEnd)     // Copy rest of left half
        tmp[k++] = a[left++];
    while (right <= rightEnd)   // Copy rest of right half
        tmp[k++] = a[right++];
    // Copy tmp back
    for (int i = 0; i < num; i++, rightEnd--)
        a[rightEnd] = tmp[rightEnd];
}
EDIT:
I was rather stupidly using a LinkedList-style Queue. I changed it to use a native array, and now the merge sort is only twice as fast, compared to 6 times as fast earlier. The merge sort is still faster even for numbers only 10 digits long. I guess the Big-O constants are in play here. The millions of function calls to push() and pop() could also be to blame.
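For reference, here is a minimal sketch of that array-backed queue idea (in C++ rather than the poster's Java, and with my own names, so treat it as an assumption rather than the poster's class): a fixed-capacity ring buffer whose push and pop are each a couple of array operations, with none of the per-node allocation a linked-list queue pays for.

#include <cstddef>
#include <vector>

class ArrayQueue {
    std::vector<long> buf;
    std::size_t head = 0, tail = 0, count = 0;
public:
    explicit ArrayQueue(std::size_t capacity) : buf(capacity) {}
    bool empty() const { return count == 0; }
    void push(long v) {                     // assumes the queue is not full
        buf[tail] = v;
        tail = (tail + 1) % buf.size();
        ++count;
    }
    long pop() {                            // assumes the queue is not empty
        long v = buf[head];
        head = (head + 1) % buf.size();
        --count;
        return v;
    }
};

For radix sort the capacity bound is easy: no digit bucket can ever hold more than arr.length elements.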

How to find the subarray that has sum closest to zero or a certain value t in O(nlogn)

Actually it is problem #10 of chapter 8 of Programming Pearls, 2nd edition. It asks two questions: given an array A[] of integers (positive and non-positive), how can you find a continuous subarray of A[] whose sum is closest to 0? Or closest to a certain value t?
I can think of a way to solve the problem for closest to 0: calculate the prefix-sum array S[], where S[i] = A[0]+A[1]+...+A[i], and then sort S by element value, keeping the original index information. To find the subarray sum closest to 0, just iterate over the sorted S, take the difference of each pair of neighboring values, and update the minimum absolute difference.
The question is, what is the best way to solve the second problem? Closest to a certain value t? Can anyone give code, or at least an algorithm? (If anyone has a better solution to the closest-to-zero problem, answers are welcome too.)
To solve this problem, you can build an interval tree of your own, or a balanced binary search tree, or even benefit from the STL map, in O(n log n).
The following uses the STL map, with lower_bound().
#include <map>
#include <iostream>
#include <algorithm>
using namespace std;

int A[] = {10,20,30,30,20,10,10,20};

// return (i, j) s.t. A[i] + ... + A[j] is nearest to value c
pair<int, int> nearest_to_c(int c, int n, int A[]) {
    map<int, int> bst;
    bst[0] = -1;
    // barriers
    bst[-int(1e9)] = -2;
    bst[int(1e9)] = n;
    int sum = 0, start, end, ret = c;
    for (int i=0; i<n; ++i) {
        sum += A[i];
        // it->first >= sum-c, and with the minimal value in bst
        map<int, int>::iterator it = bst.lower_bound(sum - c);
        int tmp = -(sum - c - it->first);
        if (tmp < ret) {
            ret = tmp;
            start = it->second + 1;
            end = i;
        }
        --it;
        // it->first < sum-c, and with the maximal value in bst
        tmp = sum - c - it->first;
        if (tmp < ret) {
            ret = tmp;
            start = it->second + 1;
            end = i;
        }
        bst[sum] = i;
    }
    return make_pair(start, end);
}

// demo
int main() {
    int c;
    cin >> c;
    pair<int, int> ans = nearest_to_c(c, 8, A);
    cout << ans.first << ' ' << ans.second << endl;
    return 0;
}
You can adapt your method. Assume you have an array S of prefix sums, as you wrote, already sorted in increasing order of sum value. The key concept is to not only examine consecutive prefix sums, but instead use two pointers to indicate two positions in the array S. Written in (slightly pythonic) pseudocode:
left = 0   # Initialize window of length 0 ...
right = 0  # ... at the beginning of the array
best = ∞   # Keep track of best solution so far
while right < length(S):        # Iterate until window reaches the end of the array
    diff = S[right] - S[left]
    if diff < t:                # Window is getting too small
        if t - diff < best:     # We have a new best subarray
            best = t - diff
            # remember left and right as well
        right = right + 1       # Make window bigger
    else:                       # Window getting too big
        if diff - t < best:     # We have a new best subarray
            best = diff - t
            # remember left and right as well
        left = left + 1         # Make window smaller
The complexity is bounded by the sorting. The above search takes at most 2n = O(n) iterations of the loop, each with computation time bounded by a constant. Note that the above code was conceived for positive t.
The code was also conceived for positive elements in S. If any negative integers crop up, you might end up with a situation where the original index of right is smaller than that of left, so you'd end up with a subsequence sum of -t. You can check this condition in the if ... < best checks, but if you only suppress such cases there, I believe you might miss some relevant cases. Bottom line: take this idea, think it through, but you'll have to adapt it for negative numbers.
Note: I think this is the same general idea that Boris Strandjev wanted to express in his solution. However, I found that solution somewhat hard to read and harder to understand, so I'm offering my own formulation of it.
Your solution for the 0 case seems OK to me. Here is my solution to the second case:
You again calculate the prefix sums and sort.
You initialize two indices: start to 0 (first index in the sorted prefix array) and end to last (last index of the sorted prefix array).
You start iterating over start = 0...last, and for each one you find the corresponding end - the last index at which the prefix sum is such that prefix[start] + prefix[end] > t. When you find that end, the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1] (the latter taken only if end > 0).
The most important thing is that you do not search for end for each start from scratch - prefix[start] increases in value while iterating over all possible values of start, which means that in each iteration you are interested only in values <= the previous value of end.
You can stop iterating when start > end.
You take the best of all values obtained for all start positions.
It can easily be proved that this gives you a complexity of O(n log n) for the entire algorithm.
I found this question by accident. Although it's been a while, I'll just post it. O(n log n) time, O(n) space algorithm. This is running Java code. Hope this helps people.
import java.util.*;

public class FindSubarrayClosestToZero {

    void findSubarrayClosestToZero(int[] A) {
        int curSum = 0;
        List<Pair> list = new ArrayList<Pair>();

        // 1. create the prefix (curSum) array
        for(int i = 0; i < A.length; i++) {
            curSum += A[i];
            Pair pair = new Pair(curSum, i);
            list.add(pair);
        }

        // 2. sort the prefix array by value
        Collections.sort(list, valueComparator);
        // printPairList(list);
        System.out.println();

        // 3. compute pair-wise value diffs: Triple<diff, i, i+1>
        List<Triple> tList = new ArrayList<Triple>();
        for(int i = 0; i < A.length - 1; i++) {
            Pair p1 = list.get(i);
            Pair p2 = list.get(i+1);
            int valueDiff = p2.value - p1.value;
            Triple triple = new Triple(valueDiff, p1.index, p2.index);
            tList.add(triple);
        }
        // printTripleList(tList);
        System.out.println();

        // 4. sort by min diff
        Collections.sort(tList, valueDiffComparator);
        // printTripleList(tList);

        Triple res = tList.get(0);
        int startIndex = Math.min(res.index1 + 1, res.index2);
        int endIndex = Math.max(res.index1 + 1, res.index2);
        System.out.println("\n\nThe subarray whose sum is closest to 0 is: ");
        for(int i = startIndex; i <= endIndex; i++) {
            System.out.print(" " + A[i]);
        }
    }

    class Pair {
        int value;
        int index;
        public Pair(int value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    class Triple {
        int valueDiff;
        int index1;
        int index2;
        public Triple(int valueDiff, int index1, int index2) {
            this.valueDiff = valueDiff;
            this.index1 = index1;
            this.index2 = index2;
        }
    }

    public static Comparator<Pair> valueComparator = new Comparator<Pair>() {
        public int compare(Pair p1, Pair p2) {
            return p1.value - p2.value;
        }
    };

    public static Comparator<Triple> valueDiffComparator = new Comparator<Triple>() {
        public int compare(Triple t1, Triple t2) {
            return t1.valueDiff - t2.valueDiff;
        }
    };

    void printPairList(List<Pair> list) {
        for(Pair pair : list) {
            System.out.println("<" + pair.value + " : " + pair.index + ">");
        }
    }

    void printTripleList(List<Triple> list) {
        for(Triple t : list) {
            System.out.println("<" + t.valueDiff + " : " + t.index1 + " , " + t.index2 + ">");
        }
    }

    public static void main(String[] args) {
        int A1[] = {8, -3, 2, 1, -4, 10, -5};  // -3, 2, 1
        int A2[] = {-3, 2, 4, -6, -8, 10, 11}; // 2, 4, -6
        int A3[] = {10, -2, -7};               // 10, -2, -7
        FindSubarrayClosestToZero f = new FindSubarrayClosestToZero();
        f.findSubarrayClosestToZero(A1);
        f.findSubarrayClosestToZero(A2);
        f.findSubarrayClosestToZero(A3);
    }
}
Solution time complexity: O(N log N)
Solution space complexity: O(N)
[Note this problem can't be solved in O(N) as some have claimed]
Algorithm:
Compute the cumulative array (here, cum[]) of the given array
Sort the cumulative array
The answer is the minimum of cum[i+1] - cum[i] over all i in [1, n-1] (1-based indexing)
C++ code:
#include <bits/stdc++.h>
#define M 1000010
#define REP(i,n) for (int i=1;i<=n;i++)
using namespace std;
typedef long long ll;

ll a[M], n, cum[M], ans = numeric_limits<ll>::max(); // cum -> cumulative array

int main() {
    ios::sync_with_stdio(false); cin.tie(0); cout.tie(0);
    cin >> n;
    REP(i,n) cin >> a[i], cum[i] = cum[i-1] + a[i];
    sort(cum+1, cum+n+1);
    REP(i,n-1) ans = min(ans, cum[i+1] - cum[i]);
    cout << ans;    // min +ve difference from 0 we can get
}
After more thinking on this problem, I found that @frankyym's solution is the right one. I have made some refinements to the original solution; here is my code:
#include <map>
#include <stdio.h>
#include <stdlib.h>     // abs()
#include <algorithm>
#include <limits.h>
using namespace std;

#define IDX_LOW_BOUND -2

// Return [i..j] range of A
pair<int, int> nearest_to_c(int A[], int n, int t) {
    map<int, int> bst;
    int presum, subsum, closest, i, j, start, end;
    bool unset;
    map<int, int>::iterator it;
    bst[0] = -1;
    // Barriers. Assume that no prefix sum is equal to INT_MAX or INT_MIN.
    bst[INT_MIN] = IDX_LOW_BOUND;
    bst[INT_MAX] = n;
    unset = true;
    // This initial value is always overwritten afterwards.
    closest = 0;
    presum = 0;
    for (i = 0; i < n; ++i) {
        presum += A[i];
        for (it = bst.lower_bound(presum - t), j = 0; j < 2; --it, j++) {
            if (it->first == INT_MAX || it->first == INT_MIN)
                continue;
            subsum = presum - it->first;
            if (unset || abs(closest - t) > abs(subsum - t)) {
                closest = subsum;
                start = it->second + 1;
                end = i;
                if (closest - t == 0)
                    goto ret;
                unset = false;
            }
        }
        bst[presum] = i;
    }
ret:
    return make_pair(start, end);
}

int main() {
    int A[] = {10, 20, 30, 30, 20, 10, 10, 20};
    int t;
    scanf("%d", &t);
    pair<int, int> ans = nearest_to_c(A, 8, t);
    printf("[%d:%d]\n", ans.first, ans.second);
    return 0;
}
As a side note: I agree with the algorithms provided by the other answers here. There is another algorithm on top of my head recently: make another copy of A[], call it B[], where each element is B[i] = A[i] - t/n, i.e. B[0] = A[0] - t/n, B[1] = A[1] - t/n, ..., B[n-1] = A[n-1] - t/n. Then the second problem is transformed into the first problem: once the subarray of B[] with sum closest to 0 is found, the subarray of A[] closest to t is found at the same time. (It is kind of tricky if t is not divisible by n; the precision has to be chosen appropriately. The transformation itself is O(n).)
I think there is a little bug in the closest-to-0 solution. At the last step we should inspect not only the differences between neighboring elements, but also between elements not near each other, if one of them is bigger than 0 and the other one is smaller than 0.
Sorry, I thought I was supposed to get all answers for the problem. Didn't see it only requires one.
Can't we use dynamic programming to solve this question, similar to Kadane's algorithm? Here is my solution to this problem. Please comment if this approach is wrong.
#include <bits/stdc++.h>
using namespace std;

int main() {
    int test;
    cin >> test;
    while (test--) {
        int n;
        cin >> n;
        vector<int> A(n);
        for (int i = 0; i < n; i++)
            cin >> A[i];
        int closest_so_far = A[0];
        int closest_end_here = A[0];
        int start = 0;
        int end = 0;
        int lstart = 0;
        int lend = 0;
        for (int i = 1; i < n; i++) {
            if (abs(A[i] - 0) < abs(A[i] + closest_end_here - 0)) {
                closest_end_here = A[i] - 0;
                lstart = i;
                lend = i;
            }
            else {
                closest_end_here = A[i] + closest_end_here - 0;
                lend = i;
            }
            if (abs(closest_end_here - 0) < abs(closest_so_far - 0)) {
                closest_so_far = closest_end_here;
                start = lstart;
                end = lend;
            }
        }
        for (int i = start; i <= end; i++)
            cout << A[i] << " ";
        cout << endl;
        cout << closest_so_far << endl;
    }
    return 0;
}
Here is a code implementation in Java:
public class Solution {
    /**
     * @param nums: A list of integers
     * @return: A list of integers including the index of the first number
     *          and the index of the last number
     */
    public ArrayList<Integer> subarraySumClosest(int[] nums) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        // guard first, so we never dereference a null or too-short array
        if (nums == null || nums.length < 2) {
            result.add(0);
            result.add(0);
            return result;
        }
        int len = nums.length;
        int[] sum = new int[len];
        HashMap<Integer,Integer> mapHelper = new HashMap<Integer,Integer>();
        int min = Integer.MAX_VALUE;
        int curr1 = 0;
        int curr2 = 0;
        sum[0] = nums[0];
        for (int i = 1; i < len; i++) {
            sum[i] = sum[i-1] + nums[i];
        }
        for (int i = 0; i < len; i++) {
            if (mapHelper.containsKey(sum[i])) {
                result.add(mapHelper.get(sum[i]) + 1);
                result.add(i);
                return result;
            }
            else {
                mapHelper.put(sum[i], i);
            }
        }
        Arrays.sort(sum);
        for (int i = 0; i < len - 1; i++) {
            if (Math.abs(sum[i] - sum[i+1]) < min) {
                min = Math.abs(sum[i] - sum[i+1]);
                curr1 = sum[i];
                curr2 = sum[i+1];
            }
        }
        if (mapHelper.get(curr1) < mapHelper.get(curr2)) {
            result.add(mapHelper.get(curr1) + 1);
            result.add(mapHelper.get(curr2));
        }
        else {
            result.add(mapHelper.get(curr2) + 1);
            result.add(mapHelper.get(curr1));
        }
        return result;
    }
}

How do I write merge in place? [duplicate]

I know the question is not too specific.
All I want is someone to tell me how to convert a normal merge sort into an in-place merge sort (or a merge sort with constant extra space overhead).
All I can find (on the net) are pages saying "it is too complex" or "out of scope of this text".
The only known ways to merge in-place (without any extra space) are too complex to be reduced to practical program. (taken from here)
Even if it is too complex, what is the basic concept of how to make the merge sort in-place?
Knuth left this as an exercise (Vol 3, 5.2.5). There do exist in-place merge sorts, but they must be implemented carefully.
First, a naive in-place merge such as the one described here isn't the right solution: it degrades the performance to O(N^2).
The idea is to sort part of the array while using the rest as the working area for merging.
For example, take the following merge function.
typedef int Key;    /* element type; int is assumed here */
void swap(Key* xs, int i, int j);

void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
It takes an array xs; the two sorted sub-arrays are represented as the ranges [i, m) and [j, n) respectively. The working area starts at w. Compared with the standard merge algorithm given in most textbooks, this one exchanges the contents of the sorted sub-arrays with the working area. As a result, the working area afterwards contains the merged, sorted elements, while the elements previously stored in the working area have been moved to the two sub-arrays.
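For a concrete picture, here is a tiny self-contained demo of that contract (the array contents, and instantiating Key as int, are my own assumptions, not from the answer):

#include <cstdio>
#include <utility>

typedef int Key;

static void swap(Key* xs, int i, int j) { std::swap(xs[i], xs[j]); }

void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m) swap(xs, w++, i++);
    while (j < n) swap(xs, w++, j++);
}

int main() {
    Key xs[8] = {2, 4, 1, 3, 9, 8, 7, 6};   // runs [0,2) and [2,4), work area [4,8)
    wmerge(xs, 0, 2, 2, 4, 4);
    for (int k = 0; k < 8; k++) printf("%d ", xs[k]);
    printf("\n");                           // prints: 8 6 9 7 1 2 3 4
    return 0;
}

The merged run {1, 2, 3, 4} lands in the old work area, while the old work-area values {9, 8, 7, 6} end up, permuted, where the two runs used to be.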
However, there are two constraints that must be satisfied:
The work area must be within the bounds of the array. In other words, it must be big enough to hold the elements exchanged into it without causing out-of-bounds errors.
The work area may overlap with either of the two sorted arrays; however, we must ensure that none of the unmerged elements are overwritten.
With this merging algorithm defined, it's easy to imagine a solution which sorts half of the array. The next question is how to deal with the rest, the unsorted part stored in the work area, as shown below:
... unsorted 1/2 array ... | ... sorted 1/2 array ...
One intuitive idea is to recursively sort the other half of the working area, so that only 1/4 of the elements haven't been sorted yet.
... unsorted 1/4 array ... | sorted 1/4 array B | sorted 1/2 array A ...
The key point at this stage is that sooner or later we must merge the sorted 1/4 elements B with the sorted 1/2 elements A.
Is the remaining working area, which holds only 1/4 of the elements, big enough to merge A and B? Unfortunately, it isn't.
However, the second constraint mentioned above gives us a hint: we can exploit it by arranging the working area to overlap with either sub-array, if we can ensure the merging sequence keeps the unmerged elements from being overwritten.
Actually, instead of sorting the second half of the working area, we can sort the first half, and put the working area between the two sorted arrays like this:
... sorted 1/4 array B | unsorted work area | ... sorted 1/2 array A ...
This setup effectively arranges the work area to overlap with the sub-array A. This idea is proposed in [Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. "Practical in-place mergesort". Nordic Journal of Computing, 1996].
So the only thing left is to repeat the above step, which reduces the working area from 1/2 to 1/4, 1/8, ... When the working area becomes small enough (for example, only two elements left), we can switch to a trivial insertion sort to end the algorithm.
Here is the implementation in ANSI C based on this paper.
void imsort(Key* xs, int l, int u);

void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

/*
 * sort xs[l, u), and put the result into the working area w.
 * constraint: len(w) == u - l
 */
void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2;
            wsort(xs, w, n, l); /* the first half of the previous working area contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /* switch to insertion sort */
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}
where wmerge is defined as previously.
The full source code can be found here, and the detailed explanation can be found here.
By the way, this version isn't the fastest merge sort, because it needs more swap operations. According to my tests, it's faster than the standard version which allocates extra space in every recursion, but slower than an optimized version which doubles the original array in advance and uses it for further merging.
Including its "big result", this paper describes a couple of variants of in-place merge sort (PDF):
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.5514&rep=rep1&type=pdf
In-place sorting with fewer moves
Jyrki Katajainen, Tomi A. Pasanen
It is shown that an array of n
elements can be sorted using O(1)
extra space, O(n log n / log log n)
element moves, and n log2n + O(n log
log n) comparisons. This is the first
in-place sorting algorithm requiring
o(n log n) moves in the worst case
while guaranteeing O(n log n)
comparisons, but due to the constant
factors involved the algorithm is
predominantly of theoretical interest.
I think this is relevant too. I have a printout of it lying around, passed on to me by a colleague, but I haven't read it. It seems to cover basic theory, but I'm not familiar enough with the topic to judge how comprehensively:
http://comjnl.oxfordjournals.org/cgi/content/abstract/38/8/681
Optimal Stable Merging
Antonios Symvonis
This paper shows how to stably merge two sequences A and B of sizes m and n, m ≤ n, respectively, with O(m+n) assignments, O(m log(n/m+1)) comparisons, and using only a constant amount of additional space. This result matches all known lower bounds...
It really isn't easy or efficient, and I suggest you don't do it unless you really have to (and you probably don't have to unless this is homework, since the applications of in-place merging are mostly theoretical). Can't you use quicksort instead? Quicksort will be faster anyway with a few simpler optimizations, and its extra memory is O(log N).
Anyway, if you must do it then you must. Here's what I found: one and two. I'm not familiar with the in-place merge sort, but it seems like the basic idea is to use rotations to facilitate merging two arrays without using extra memory.
Note that this is slower even than the classic merge sort that's not in-place.
The critical step is getting the merge itself to be in-place. It's not as difficult as those sources make out, but you lose something when you try.
Looking at one step of the merge:
[...list-sorted...|x...list-A...|y...list-B...]
We know that the sorted sequence is less than everything else, that x is less than everything else in A, and that y is less than everything else in B. In the case where x is less than or equal to y, you just advance the pointer at the start of A by one. In the case where y is less than x, you've got to shuffle y past the whole of A into sorted. That last step is what makes this expensive (except in degenerate cases).
It's generally cheaper (especially when the arrays only actually contain single words per element, e.g., a pointer to a string or structure) to trade off some space for time and have a separate temporary array that you sort back and forth between.
An example of bufferless mergesort in C.
#define SWAP(type, a, b) \
    do { type t = (a); (a) = (b); (b) = t; } while (0)

static void reverse_(int* a, int* b)
{
    for ( --b; a < b; a++, b-- )
        SWAP(int, *a, *b);
}

static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
    {
        reverse_(a, b);
        reverse_(b, c);
        reverse_(a, c);
    }
    return a + (c - b);
}

static int* lower_bound_(int* a, int* b, const int key)
/* find the first element not less than @p key in a sorted sequence, or the
 * end of the sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
    {
        int* mid = a + i/2;
        if (*mid < key)
            a = mid + 1, i--;
    }
    return a;
}

static int* upper_bound_(int* a, int* b, const int key)
/* find the first element greater than @p key in a sorted sequence, or the
 * end of the sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
    {
        int* mid = a + i/2;
        if (*mid <= key)
            a = mid + 1, i--;
    }
    return a;
}

static void ip_merge_(int* a, int* b, int* c)
/* inplace merge. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 == 0 || n2 == 0)
        return;
    if (n1 == 1 && n2 == 1)
    {
        if (*b < *a)
            SWAP(int, *a, *b);
    }
    else
    {
        int* p, * q;

        if (n1 <= n2)
            p = upper_bound_(a, b, *(q = b+n2/2));
        else
            q = lower_bound_(b, c, *(p = a+n1/2));
        b = rotate_(p, b, q);

        ip_merge_(a, p, b);
        ip_merge_(b, q, c);
    }
}

void mergesort(int* v, int n)
{
    if (n > 1)
    {
        int h = n/2;
        mergesort(v, h); mergesort(v+h, n-h);
        ip_merge_(v, v+h, v+n);
    }
}
An example of adaptive mergesort (optimized).
This adds support code and modifications to accelerate the merge when an auxiliary buffer of any size is available (it still works without additional memory). It uses forward and backward merging, ring rotation, small-sequence merging and sorting, and iterative mergesort.
#include <stdlib.h>
#include <string.h>

static int* copy_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (a != out)
        memcpy(out, a, count*sizeof(int));
    return out + count;
}

static int* copy_backward_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (b != out)
        memmove(out - count, a, count*sizeof(int));
    return out - count;
}

static int* merge_(const int* a1, const int* b1, const int* a2,
                   const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
        *out++ = (*a1 <= *a2) ? *a1++ : *a2++;
    return copy_(a2, b2, copy_(a1, b1, out));
}

static int* merge_backward_(const int* a1, const int* b1,
                            const int* a2, const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
        *--out = (*(b1-1) > *(b2-1)) ? *--b1 : *--b2;
    return copy_backward_(a1, b1, copy_backward_(a2, b2, out));
}

static unsigned int gcd_(unsigned int m, unsigned int n)
{
    while ( n != 0 )
    {
        unsigned int t = m % n;
        m = n;
        n = t;
    }
    return m;
}

static void rotate_inner_(const int length, const int stride,
                          int* first, int* last)
{
    int* p, * next = first, x = *first;
    while ( 1 )
    {
        p = next;
        if ((next += stride) >= last)
            next -= length;
        if (next == first)
            break;
        *p = *next;
    }
    *p = x;
}

static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
    {
        int n1 = c - a;
        int n2 = b - a;
        int* i = a;
        int* j = a + gcd_(n1, n2);

        for ( ; i != j; i++ )
            rotate_inner_(n1, n2, i, c);
    }
    return a + (c - b);
}

static void ip_merge_small_(int* a, int* b, int* c)
/* inplace merge.
 * @note faster for small sequences. */
{
    while ( a != b && b != c )
        if (*a <= *b)
            a++;
        else
        {
            int* p = b+1;
            while ( p != c && *p < *a )
                p++;
            rotate_(a, b, p);
            b = p;
        }
}

static void ip_merge_(int* a, int* b, int* c, int* t, const int ts)
/* inplace merge.
 * @note works with or without additional memory. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 <= n2 && n1 <= ts)
    {
        merge_(t, copy_(a, b, t), b, c, a);
    }
    else if (n2 <= ts)
    {
        merge_backward_(a, b, t, copy_(b, c, t), c);
    }
    /* merge without buffer. */
    else if (n1 + n2 < 48)
    {
        ip_merge_small_(a, b, c);
    }
    else
    {
        int* p, * q;

        if (n1 <= n2)
            p = upper_bound_(a, b, *(q = b+n2/2));
        else
            q = lower_bound_(b, c, *(p = a+n1/2));
        b = rotate_(p, b, q);

        ip_merge_(a, p, b, t, ts);
        ip_merge_(b, q, c, t, ts);
    }
}

static void ip_merge_chunk_(const int cs, int* a, int* b, int* t,
                            const int ts)
{
    int* p = a + cs*2;
    for ( ; p <= b; a = p, p += cs*2 )
        ip_merge_(a, a+cs, p, t, ts);
    if (a+cs < b)
        ip_merge_(a, a+cs, b, t, ts);
}

static void smallsort_(int* a, int* b)
/* insertion sort.
 * @note any stable sort with low setup cost will do. */
{
    int* p, * q;
    for ( p = a+1; p < b; p++ )
    {
        int x = *p;
        for ( q = p; a < q && x < *(q-1); q-- )
            *q = *(q-1);
        *q = x;
    }
}

static void smallsort_chunk_(const int cs, int* a, int* b)
{
    int* p = a + cs;
    for ( ; p <= b; a = p, p += cs )
        smallsort_(a, p);
    smallsort_(a, b);
}

static void mergesort_lower_(int* v, int n, int* t, const int ts)
{
    int cs = 16;
    smallsort_chunk_(cs, v, v+n);
    for ( ; cs < n; cs *= 2 )
        ip_merge_chunk_(cs, v, v+n, t, ts);
}

static void* get_buffer_(int size, int* final)
{
    void* p = NULL;
    while ( size != 0 && (p = malloc(size)) == NULL )
        size /= 2;
    *final = size;
    return p;
}

void mergesort(int* v, int n)
{
    /* @note buffer size may be in the range [0,(n+1)/2]. */
    int request = (n+1)/2 * sizeof(int);
    int actual;
    int* t = (int*) get_buffer_(request, &actual);

    /* @note allocation failure is okay. */
    int tsize = actual / sizeof(int);
    mergesort_lower_(v, n, t, tsize);
    free(t);
}
This answer has a code example, which implements the algorithm described in the paper Practical In-Place Merging by Bing-Chao Huang and Michael A. Langston. I have to admit that I do not understand the details, but the given complexity of the merge step is O(n).
From a practical perspective, there is evidence that pure in-place implementations do not perform better in real-world scenarios. For example, the C++ standard defines std::inplace_merge, which is, as the name implies, an in-place merge operation.
Assuming that C++ libraries are typically very well optimized, it is interesting to see how it is implemented:
1) libstdc++ (part of the GCC code base): std::inplace_merge
The implementation delegates to __inplace_merge, which dodges the problem by trying to allocate a temporary buffer:
typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> _TmpBuf;
_TmpBuf __buf(__first, __len1 + __len2);

if (__buf.begin() == 0)
    std::__merge_without_buffer
        (__first, __middle, __last, __len1, __len2, __comp);
else
    std::__merge_adaptive
        (__first, __middle, __last, __len1, __len2, __buf.begin(),
         _DistanceType(__buf.size()), __comp);
Otherwise, it falls back to an implementation (__merge_without_buffer) which requires no extra memory but no longer runs in O(n) time.
2) libc++ (part of the Clang code base): std::inplace_merge
Looks similar. It delegates to a function which also tries to allocate a buffer. Depending on whether it got enough elements, it chooses the implementation. The constant-memory fallback function is called __buffered_inplace_merge.
Maybe even the fallback is still O(n) time, but the point is that they do not use that implementation if temporary memory is available.
Note that the C++ standard explicitly gives implementations the freedom to choose this approach by lowering the required complexity from O(n) to O(N log N):
Complexity: Exactly N-1 comparisons if enough additional memory is available. If the memory is insufficient, O(N log N) comparisons.
Of course, this can't be taken as proof that constant-space, O(n)-time in-place merges should never be used. On the other hand, if they were faster, the optimized C++ libraries would probably switch to that type of implementation.
This is my C version:
void merge(int *a, int sizei, int sizej);

void mergesort(int *a, int len) {
    int listsize, xsize;
    for (listsize = 1; listsize <= len; listsize *= 2) {
        for (int i = 0, j = listsize; (j + listsize) <= len; i += (listsize * 2), j += (listsize * 2)) {
            merge(&a[i], listsize, listsize);
        }
    }
    listsize /= 2;
    xsize = len % listsize;
    if (xsize > 1)
        mergesort(&a[len - xsize], xsize);
    merge(a, listsize, xsize);
}

void merge(int *a, int sizei, int sizej) {
    int temp;
    int ii = 0;
    int ji = sizei;
    int flength = sizei + sizej;
    for (int f = 0; f < (flength - 1); f++) {
        if (sizei == 0 || sizej == 0)
            break;
        if (a[ii] < a[ji]) {
            ii++;
            sizei--;
        }
        else {
            temp = a[ji];
            for (int z = (ji - 1); z >= ii; z--)
                a[z + 1] = a[z];
            ii++;
            a[f] = temp;
            ji++;
            sizej--;
        }
    }
}
I know I'm late to the game, but here's a solution I wrote yesterday. I also posted this elsewhere, but this appears to be the most popular merge-in-place thread on SO. I've also not seen this algorithm posted anywhere else, so hopefully this helps some people.
This algorithm is in its most simple form so that it can be understood. It can be significantly tweaked for extra speed. Average time complexity is O(n·log₂n) for the stable in-place array merge, and O(n·(log₂n)²) for the overall sort.
// Stable Merge In Place Sort
//
// The following code is written to illustrate the base algorithm. A good
// number of optimizations can be applied to boost its overall speed.
// For all its simplicity, it does still perform somewhat decently.
// Average case time complexity appears to be: O(n.(log₂n)²)

#include <stddef.h>
#include <stdio.h>

#define swap(x, y) (t=(x), (x)=(y), (y)=t)

// Both sorted sub-arrays must be adjacent in 'a'.
// Assumes that both 'an' and 'bn' are always non-zero.
// 'an' is the length of the first sorted section in 'a', referred to as A.
// 'bn' is the length of the second sorted section in 'a', referred to as B.
static void
merge_inplace(int A[], size_t an, size_t bn)
{
    int t, *B = &A[an];
    size_t pa, pb;      // Swap partition pointers within A and B

    // Find the portion to swap. We're looking for how much from the
    // start of B can swap with the end of A, such that every element
    // in A is less than or equal to any element in B. This is quite
    // simple when both sub-arrays come at us pre-sorted.
    for (pa = an, pb = 0; pa > 0 && pb < bn && B[pb] < A[pa-1]; pa--, pb++);

    // Now swap the last part of A with the first part of B according
    // to the indices we found.
    for (size_t index = pa; index < an; index++)
        swap(A[index], B[index-pa]);

    // Now merge the two sub-array pairings. We need to check that either
    // array didn't wholly swap out the other and cause the remaining
    // portion to be zero.
    if (pa > 0 && (an-pa) > 0)
        merge_inplace(A, pa, an-pa);
    if (pb > 0 && (bn-pb) > 0)
        merge_inplace(B, pb, bn-pb);
} // merge_inplace

// Implements a recursive merge-sort algorithm with an optional
// insertion sort for when the splits get too small. 'n' must
// ALWAYS be 2 or more. It enforces this when calling itself.
static void
merge_sort(int a[], size_t n)
{
    size_t m = n/2;

    // Sort the first and second halves only if the target 'n' will be > 1
    if (m > 1)
        merge_sort(a, m);
    if ((n-m) > 1)
        merge_sort(a+m, n-m);

    // Now merge the two sorted sub-arrays together. We know that since
    // n > 1, both m and n-m MUST be non-zero, and so we will never
    // violate the condition of not passing in zero-length sub-arrays.
    merge_inplace(a, m, n-m);
} // merge_sort

// Print an array
static void
print_array(int a[], size_t size)
{
    if (size > 0) {
        printf("%d", a[0]);
        for (size_t i = 1; i < size; i++)
            printf(" %d", a[i]);
    }
    printf("\n");
} // print_array

// Test driver
int
main()
{
    int a[] = { 17, 3, 16, 5, 14, 8, 10, 7, 15, 1, 13, 4, 9, 12, 11, 6, 2 };
    size_t n = sizeof(a) / sizeof(a[0]);

    merge_sort(a, n);
    print_array(a, n);

    return 0;
} // main
Leveraging C++ std::inplace_merge, in-place merge sort can be implemented as follows:
template< class _Type >
inline void merge_sort_inplace(_Type* src, size_t l, size_t r)
{
    if (r <= l) return;
    size_t m = l + (r - l) / 2;     // computes the average without overflow
    merge_sort_inplace(src, l,     m);
    merge_sort_inplace(src, m + 1, r);
    std::inplace_merge(src + l, src + m + 1, src + r + 1);
}
More sorting algorithms, including parallel implementations, are available in https://github.com/DragonSpit/ParallelAlgorithms repo, which is open source and free.
I just tried an in-place merge algorithm for merge sort in Java, using the insertion sort algorithm, with the following steps.
1) Two sorted arrays are available.
2) Compare the first values of each array, and place the smaller value into the first array.
3) Place the larger value into the second array by using insertion sort (traverse from left to right).
4) Then again compare the second value of the first array and the first value of the second array, and do the same. But when swapping happens, there is some clue that further item comparisons can be skipped and only the swap is required.
I have made an optimization here, to use fewer comparisons in the insertion sort. The only drawback I found with this solution is that it needs a lot of swapping of array elements in the second array.
e.g.:
First___Array : 3, 7, 8, 9
Second Array : 1, 2, 4, 5
Then 7, 8, 9 each make the second array swap (move left by one) all its elements in order to place themselves at the end.
So the assumption here is that swapping items is negligible compared to comparing two items.
https://github.com/skanagavelu/algorithams/blob/master/src/sorting/MergeSort.java
package sorting;

import java.util.Arrays;

public class MergeSort {

    public static void main(String[] args) {
        int[] array = { 5, 6, 10, 3, 9, 2, 12, 1, 8, 7 };
        mergeSort(array, 0, array.length - 1);
        System.out.println(Arrays.toString(array));

        int[] array1 = {4, 7, 2};
        System.out.println(Arrays.toString(array1));
        mergeSort(array1, 0, array1.length - 1);
        System.out.println(Arrays.toString(array1));
        System.out.println("\n\n");

        int[] array2 = {4, 7, 9};
        System.out.println(Arrays.toString(array2));
        mergeSort(array2, 0, array2.length - 1);
        System.out.println(Arrays.toString(array2));
        System.out.println("\n\n");

        int[] array3 = {4, 7, 5};
        System.out.println(Arrays.toString(array3));
        mergeSort(array3, 0, array3.length - 1);
        System.out.println(Arrays.toString(array3));
        System.out.println("\n\n");

        int[] array4 = {7, 4, 2};
        System.out.println(Arrays.toString(array4));
        mergeSort(array4, 0, array4.length - 1);
        System.out.println(Arrays.toString(array4));
        System.out.println("\n\n");

        int[] array5 = {7, 4, 9};
        System.out.println(Arrays.toString(array5));
        mergeSort(array5, 0, array5.length - 1);
        System.out.println(Arrays.toString(array5));
        System.out.println("\n\n");

        int[] array6 = {7, 4, 5};
        System.out.println(Arrays.toString(array6));
        mergeSort(array6, 0, array6.length - 1);
        System.out.println(Arrays.toString(array6));
        System.out.println("\n\n");

        // Handling an array of size two
        int[] array7 = {7, 4};
        System.out.println(Arrays.toString(array7));
        mergeSort(array7, 0, array7.length - 1);
        System.out.println(Arrays.toString(array7));
        System.out.println("\n\n");

        int input1[] = {1};
        int input2[] = {4, 2};
        int input3[] = {6, 2, 9};
        int input4[] = {6, -1, 10, 4, 11, 14, 19, 12, 18};

        System.out.println(Arrays.toString(input1));
        mergeSort(input1, 0, input1.length - 1);
        System.out.println(Arrays.toString(input1));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input2));
        mergeSort(input2, 0, input2.length - 1);
        System.out.println(Arrays.toString(input2));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input3));
        mergeSort(input3, 0, input3.length - 1);
        System.out.println(Arrays.toString(input3));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input4));
        mergeSort(input4, 0, input4.length - 1);
        System.out.println(Arrays.toString(input4));
        System.out.println("\n\n");
    }

    private static void mergeSort(int[] array, int p, int r) {
        // Both ways of finding the midpoint below are fine.
        int mid = (r - p)/2 + p;
        int mid1 = (r + p)/2;
        if (mid != mid1) {
            System.out.println(" Mid is mismatching:" + mid + "/" + mid1 + " for p:" + p + " r:" + r);
        }
        if (p < r) {
            mergeSort(array, p, mid);
            mergeSort(array, mid+1, r);
            // merge(array, p, mid, r);
            inPlaceMerge(array, p, mid, r);
        }
    }

    // Regular merge
    private static void merge(int[] array, int p, int mid, int r) {
        int lengthOfLeftArray = mid - p + 1; // It is important to add +1.
        int lengthOfRightArray = r - mid;
        int[] left = new int[lengthOfLeftArray];
        int[] right = new int[lengthOfRightArray];
        for (int i = p, j = 0; i <= mid; ) {
            left[j++] = array[i++];
        }
        for (int i = mid + 1, j = 0; i <= r; ) {
            right[j++] = array[i++];
        }
        int i = 0, j = 0;
        for (; i < left.length && j < right.length; ) {
            if (left[i] < right[j]) {
                array[p++] = left[i++];
            } else {
                array[p++] = right[j++];
            }
        }
        while (j < right.length) {
            array[p++] = right[j++];
        }
        while (i < left.length) {
            array[p++] = left[i++];
        }
    }

    // In-place merge, no extra array
    private static void inPlaceMerge(int[] array, int p, int mid, int r) {
        int secondArrayStart = mid+1;
        int prevPlaced = mid+1;
        int q = mid+1;
        while (p < mid+1 && q <= r) {
            boolean swapped = false;
            if (array[p] > array[q]) {
                swap(array, p, q);
                swapped = true;
            }
            if (q != secondArrayStart && array[p] > array[secondArrayStart]) {
                swap(array, p, secondArrayStart);
                swapped = true;
            }
            // Check the swapped value is in the right place of the second sorted array
            if (swapped && secondArrayStart+1 <= r && array[secondArrayStart+1] < array[secondArrayStart]) {
                prevPlaced = placeInOrder(array, secondArrayStart, prevPlaced);
            }
            p++;
            if (q < r) { // q+1 <= r
                q++;
            }
        }
    }

    private static int placeInOrder(int[] array, int secondArrayStart, int prevPlaced) {
        int i = secondArrayStart;
        for (; i < array.length; i++) {
            // Simply swap till the prevPlaced position
            if (secondArrayStart < prevPlaced) {
                swap(array, secondArrayStart, secondArrayStart+1);
                secondArrayStart++;
                continue;
            }
            if (array[i] < array[secondArrayStart]) {
                swap(array, i, secondArrayStart);
                secondArrayStart++;
            } else if (i != secondArrayStart && array[i] > array[secondArrayStart]) {
                break;
            }
        }
        return secondArrayStart;
    }

    private static void swap(int[] array, int m, int n) {
        int temp = array[m];
        array[m] = array[n];
        array[n] = temp;
    }
}

Non-Recursive Merge Sort

Can someone explain in English how non-recursive merge sort works?
Thanks
Non-recursive merge sort works by considering window sizes of 1, 2, 4, 8, 16, ..., 2^n over the input array. For each window size ('k' in the code below), all adjacent pairs of windows are merged into a temporary space, then put back into the array.
Here is my single-function, C-based, non-recursive merge sort.
Input and output are in 'a'. Temporary storage is in 'b'.
One day, I'd like to have a version that is in-place:
float a[50000000], b[50000000];

void mergesort(long num)
{
    int rght, rend;
    int i, j, m;
    for (int k = 1; k < num; k *= 2) {
        for (int left = 0; left + k < num; left += k*2) {
            rght = left + k;
            rend = rght + k;
            if (rend > num) rend = num;
            m = left; i = left; j = rght;
            while (i < rght && j < rend) {
                if (a[i] <= a[j]) {
                    b[m] = a[i]; i++;
                } else {
                    b[m] = a[j]; j++;
                }
                m++;
            }
            while (i < rght) {
                b[m] = a[i];
                i++; m++;
            }
            while (j < rend) {
                b[m] = a[j];
                j++; m++;
            }
            for (m = left; m < rend; m++) {
                a[m] = b[m];
            }
        }
    }
}
By the way, it is also very easy to prove this is O(n log n). The outer loop over window size grows as a power of two, so k has log n iterations. While there are many windows covered by the inner loop, together all the windows for a given k exactly cover the input array, so the inner loop is O(n). Combining the inner and outer loops: O(n) * O(log n) = O(n log n).
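Written out (my notation, not the answerer's), the count is

\[ T(n) \;=\; \sum_{p=1}^{\lceil \log_2 n \rceil} O(n) \;=\; O(n)\cdot\lceil \log_2 n \rceil \;=\; O(n \log n), \]

where p indexes the passes (window size k = 2^(p-1)) and each pass does O(n) merge work.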
Loop through the elements and make every adjacent group of two sorted by swapping the two when necessary.
Now, dealing with groups of two groups (any two, most likely adjacent groups, but you could use the first and last groups), merge them into one group by repeatedly selecting the lowest-valued element from each group until all 4 elements are merged into a group of 4. Now you have nothing but groups of 4, plus a possible remainder. Using a loop around the previous logic, do it all again, except this time working in groups of 4. This loop runs until there is only one group.
Quoting from Algorithmist:
Bottom-up merge sort is a non-recursive variant of the merge sort, in which the array is sorted by a sequence of passes. During each pass, the array is divided into blocks of size m (initially, m = 1). Every two adjacent blocks are merged (as in normal merge sort), and the next pass is made with a twice larger value of m.
Both recursive and non-recursive merge sort have the same time complexity of O(n log(n)). This is because both approaches use a stack in one manner or another:
In the non-recursive approach, the user/programmer defines and uses the stack explicitly (a sketch follows below).
In the recursive approach, the stack is used internally by the system to store the return address of the function which is called recursively.
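To make that first point concrete, here is a sketch (my own code, not the answerer's) of a merge sort where a programmer-managed stack replaces the call stack. Each frame carries a flag so its merge runs only after both halves have been sorted:

#include <algorithm>
#include <cstddef>
#include <vector>

void mergeSortExplicitStack(std::vector<int>& a) {
    struct Frame { std::size_t lo, hi; bool merged; };
    std::vector<int> tmp(a.size());
    std::vector<Frame> stack;
    stack.push_back({0, a.size(), false});
    while (!stack.empty()) {
        Frame f = stack.back(); stack.pop_back();
        if (f.hi - f.lo < 2) continue;           // runs shorter than 2 are sorted
        std::size_t mid = f.lo + (f.hi - f.lo) / 2;
        if (!f.merged) {
            stack.push_back({f.lo, f.hi, true}); // revisit to merge after children
            stack.push_back({f.lo, mid, false});
            stack.push_back({mid, f.hi, false});
        } else {
            // both halves are sorted; merge them through tmp and copy back
            std::merge(a.begin() + f.lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + f.hi,
                       tmp.begin() + f.lo);
            std::copy(tmp.begin() + f.lo, tmp.begin() + f.hi, a.begin() + f.lo);
        }
    }
}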
The main reason you would want to use a non-recursive MergeSort is to avoid recursion stack overflow. I, for example, am trying to sort 100 million records, each record about 1 kByte in length (= 100 gigabytes), in alphanumeric order. An O(N^2) sort would take 10^16 operations, i.e. it would take decades to run even at 0.1 microseconds per compare operation. An O(N log(N)) merge sort will take less than 10^10 operations, or less than an hour to run at the same operational speed. However, in the recursive version of MergeSort, the 100-million-element sort results in 50 million recursive calls to MergeSort(). At a few hundred bytes per stack frame, this overflows the recursion stack even though the process easily fits within heap memory. Doing the merge sort using dynamically allocated memory on the heap -- I am using the code provided by Rama Hoetzlein above, but with dynamically allocated memory on the heap instead of the stack -- I can sort my 100 million records with the non-recursive merge sort, and I don't overflow the stack. An appropriate conversation for the website "Stack Overflow"!
PS: Thanks for the code, Rama Hoetzlein.
PPS: 100 gigabytes on the heap?!! Well, it's a virtual heap on a Hadoop cluster, and the MergeSort will be implemented in parallel on several machines sharing the load...
I am new here.
I have modified Rama Hoetzlein's solution (thanks for the ideas). My merge sort does not use the last copy-back loop. Plus, it falls back on insertion sort. I have benchmarked it on my laptop and it is the fastest, even better than the recursive version. By the way, it is in Java and sorts from descending order to ascending order. And of course it is iterative. It can be made multithreaded. The code has become complex, so if anyone is interested, please have a look.
Code:
// Wrapper signature assumed; the original snippet begins inside the method.
static int[] iterativeMergeSort(int[] input_array) {
    int num = input_array.length;
    int left = 0;
    int right;
    int temp;
    int LIMIT = 16;
    if (num <= LIMIT) {
        // Single insertion sort
        right = 1;
        while (right < num) {
            temp = input_array[right];
            while ((left > (-1)) && (input_array[left] > temp)) {
                input_array[left+1] = input_array[left--];
            }
            input_array[left+1] = temp;
            left = right;
            right++;
        }
    }
    else {
        int i;
        int j;

        // Fragmented insertion sort
        right = LIMIT;
        while (right <= num) {
            i = left + 1;
            j = left;
            while (i < right) {
                temp = input_array[i];
                while ((j >= left) && (input_array[j] > temp)) {
                    input_array[j+1] = input_array[j--];
                }
                input_array[j+1] = temp;
                j = i;
                i++;
            }
            left = right;
            right = right + LIMIT;
        }

        // Remainder insertion sort
        i = left + 1;
        j = left;
        while (i < num) {
            temp = input_array[i];
            while ((j >= left) && (input_array[j] > temp)) {
                input_array[j+1] = input_array[j--];
            }
            input_array[j+1] = temp;
            j = i;
            i++;
        }

        // Rama Hoetzlein method
        int[] temp_array = new int[num];
        int[] swap;
        int k = LIMIT;
        while (k < num) {
            left = 0;
            i = k; // the midpoint
            right = k << 1;
            while (i < num) {
                if (right > num) {
                    right = num;
                }
                temp = left;
                j = i;
                while ((left < i) && (j < right)) {
                    if (input_array[left] <= input_array[j]) {
                        temp_array[temp++] = input_array[left++];
                    }
                    else {
                        temp_array[temp++] = input_array[j++];
                    }
                }
                while (left < i) {
                    temp_array[temp++] = input_array[left++];
                }
                while (j < right) {
                    temp_array[temp++] = input_array[j++];
                }
                // Do not copy the elements back to input_array here
                left = right;
                i = left + k;
                right = i + k;
            }
            // Instead of copying back in the previous loop, copy the remaining
            // elements to temp_array, then swap the array pointers
            while (left < num) {
                temp_array[left] = input_array[left++];
            }
            swap = input_array;
            input_array = temp_array;
            temp_array = swap;
            k <<= 1;
        }
    }
    return input_array;
}
Just in case anyone's still lurking in this thread... I've adapted Rama Hoetzlein's non-recursive merge sort algorithm above to sort doubly linked lists. This new sort is in-place, stable, and avoids the time-costly list-dividing code found in other linked-list merge sort implementations.
// MergeSort.cpp
// Angus Johnson 2017
// License: Public Domain

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

struct Node {
    int data;
    Node *next;
    Node *prev;
    Node *jump;
};

inline void Move2Before1(Node *n1, Node *n2)
{
    Node *prev, *next;
    // extricate n2 from the linked list ...
    prev = n2->prev;
    next = n2->next;
    prev->next = next; // nb: prev is always assigned
    if (next) next->prev = prev;
    // insert n2 back into the list ...
    prev = n1->prev;
    if (prev) prev->next = n2;
    n1->prev = n2;
    n2->prev = prev;
    n2->next = n1;
}

void MergeSort(Node *&nodes)
{
    Node *first, *second, *base, *tmp, *prev_base;

    if (!nodes || !nodes->next) return;
    int mul = 1;
    for (;;) {
        first = nodes;
        prev_base = NULL;
        // sort each successive mul group of nodes ...
        while (first) {
            if (mul == 1) {
                second = first->next;
                if (!second) {
                    first->jump = NULL;
                    break;
                }
                first->jump = second->next;
            }
            else {
                second = first->jump;
                if (!second) break;
                first->jump = second->jump;
            }
            base = first;
            int cnt1 = mul, cnt2 = mul;
            // the following 'if' condition marginally improves performance
            // in an unsorted list but very significantly improves
            // performance when the list is mostly sorted ...
            if (second->data < second->prev->data)
                while (cnt1 && cnt2) {
                    if (second->data < first->data) {
                        if (first == base) {
                            if (prev_base) prev_base->jump = second;
                            base = second;
                            base->jump = first->jump;
                            if (first == nodes) nodes = second;
                        }
                        tmp = second->next;
                        Move2Before1(first, second);
                        second = tmp;
                        if (!second) { first = NULL; break; }
                        --cnt2;
                    }
                    else {
                        first = first->next;
                        --cnt1;
                    }
                } // while (cnt1 && cnt2)
            first = base->jump;
            prev_base = base;
        } // while (first)
        if (!nodes->jump) break;
        else mul <<= 1;
    } // for (;;)
}

void InsertNewNode(Node *&head, int data)
{
    Node *tmp = new Node;
    tmp->data = data;
    tmp->next = NULL;
    tmp->prev = NULL;
    tmp->jump = NULL;
    if (head) {
        tmp->next = head;
        head->prev = tmp;
        head = tmp;
    }
    else head = tmp;
}

void ClearNodes(Node *head)
{
    if (!head) return;
    while (head) {
        Node *tmp = head;
        head = head->next;
        delete tmp;
    }
}

int main()
{
    srand(time(NULL));
    Node *nodes = NULL, *n;
    const int len = 1000000; // 1 million nodes
    for (int i = 0; i < len; i++)
        InsertNewNode(nodes, rand() >> 4);

    clock_t t = clock();
    MergeSort(nodes); // ~1/2 sec for 1 mill. nodes on a Pentium i7
    t = clock() - t;
    printf("Sort time: %d msec\n\n", (int)(t * 1000 / CLOCKS_PER_SEC));

    n = nodes;
    while (n) {
        if (n->prev && n->data < n->prev->data) {
            printf("oops! sorting's broken\n");
            break;
        }
        n = n->next;
    }
    ClearNodes(nodes);
    printf("All done!\n\n");
    getchar();
    return 0;
}
Edited 2017-10-27: Fixed a bug affecting odd numbered lists
Any interest in this anymore? Probably not. Oh well. Here goes nothing.
The insight of merge sort is that you can merge two (or several) small sorted runs of records into one larger sorted run, and you can do so with simple stream-like operations "read first/next record" and "append record" -- which means you don't need the whole data set in RAM at once: you can get by with just two records, each taken from a distinct run. If you can just keep track of where in your file the sorted runs start and end, you can simply merge pairs of adjacent runs (into a temp file) repeatedly until the file is sorted: this takes a logarithmic number of passes over the file.
A single record is trivially sorted; each time you merge two adjacent runs, the size of each run doubles. So that's one way to keep track. The other is to work on a priority queue of runs: take the two smallest runs from the queue, merge them, and enqueue the result -- until there is only one remaining run. This is appropriate if you expect your data to naturally start with sorted runs.
In practice with enormous data sets you'll want to exploit the memory hierarchy. Suppose you have gigabytes of RAM and terabytes of data. Why not merge a thousand runs at once? Indeed you can, and a priority queue over the runs' head records can help. That will significantly decrease the number of passes you have to make over the file to get it sorted. Some details are left as an exercise for the reader (but see the sketch below).
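As one possible concretization (my own sketch, with in-memory vectors standing in for the file-backed runs), the k-way merge driven by a priority queue of run heads could look like this in C++:

#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge k sorted runs into one output run, always emitting the smallest head.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Item = std::tuple<int, std::size_t, std::size_t>; // (value, run, position)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
            heap.push(std::make_tuple(runs[r][0], r, std::size_t{0}));
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, r, i] = heap.top();
        heap.pop();
        out.push_back(v);                   // emit the smallest head
        if (i + 1 < runs[r].size())         // refill from the same run
            heap.push(std::make_tuple(runs[r][i + 1], r, i + 1));
    }
    return out;
}

With real files, each run would be a buffered reader and out would be a writer; the heap logic is unchanged.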
