Merge sort in C++ with divide by 3 - sorting

How can I write a merge sort that divides into 3 parts?
int merge_sort(int input[], int p, int r)
{
    if ( p >= r )
        return 0;
    int mid = floor((p + r) / 2);
    merge_sort(input, p, mid);
    merge_sort(input, mid + 1, r);
    merge(input, p, r);
}

This is probably supposed to be a 3 way merge. You may want to consider using a bottom up merge sort. For either top down or bottom up merge sort, most of the complexity is going to be in the merge function. As mentioned in the answer linked to by zwergmaster, it's a 3 way merge of runs.

Each run needs a current and an ending index or pointer. A sequence of if / else statements ends up doing two compares to determine which of the 3 runs has the smallest element; that smallest element is moved to the destination array (or vector or ...) and the next element from that run is retrieved. When the end of one of the 3 runs is reached, the code switches to a 2 way merge. When the end of the next run is reached, the code copies the rest of the remaining run.

Then the next set of 3 runs is merged, repeating the process until the end of the array is reached. The array's end could fall within any of the 3 runs, so the last merge near the end of the array may be a merge of 3 or 2 runs, or just a copy of 1 run.
It would be more efficient to have an initial function that allocates a temp array the same size as the array to be sorted, then have it call the merge sort function with the temp array as a parameter, rather than constantly allocating and freeing small temp arrays during the merge sort process.
So using top down merge sort partial code to help explain this:
void merge_sort(int *a, int n)
{
    int *b = new int[n];
    top_down_merge_sort(a, b, 0, n);
    /* ... */
    delete[] b;
}

void top_down_merge_sort(int *a, int *b, int beg, int end)
{
    if(end - beg < 3){
        /* sort in place */
        return;
    }
    int run0 = beg;
    int run1 = beg + (end-beg)/3;
    int run2 = beg + 2*(end-beg)/3;
    top_down_merge_sort(a, b, run0, run1);
    top_down_merge_sort(a, b, run1, run2);
    top_down_merge_sort(a, b, run2, end);
    merge_runs(a, b, run0, run1, run2, end);
}

Related

Why is recursive Merge Sort preferred over iterative Merge Sort even though the latter has better auxiliary space complexity?

While studying the Merge Sort algorithm, I was curious to know if this sorting algorithm can be further optimised. I found out that there exists an iterative version of the Merge Sort algorithm with the same time complexity but even better O(1) space complexity. And an iterative approach is always better than a recursive approach in terms of performance. Then why is it less common and rarely talked about in any regular algorithms course?
Here's the link to Iterative Merge Sort algorithm
If you think that it has O(1) space complexity, look again. They have the original array A of size n, and an auxiliary temp also of size n. (It actually only needs to be n/2 but they kept it simple.)
And the reason why they need that second array is that when you merge, you copy the bottom region out to temp, then merge back into the place where it was.
So the tradeoff is this. A recursive solution involves a lot fewer fiddly bits and makes the concepts clearer, but adds an O(log n) stack overhead on top of the O(n) memory overhead that both solutions share. When you're trying to communicate concepts, that's a straight win.
Furthermore in practice I believe that recursive is also a win.
In the iterative approach you keep making full passes through your entire array. In the case of a large array, that means data comes into the cache for a pass, gets manipulated, and then falls out as you load the rest of the array, only to be loaded again for the next pass.
In the recursive approach, by contrast, for the operations that are the equivalent of the first few passes, you load a chunk into cache, completely sort it, then move on. (How many passes you get this win for depends heavily on data type, memory layout, and the size of your CPU cache.) You are only loading/unloading from cache when you're merging too much data to fit into cache. Algorithms courses generally omit such low-level details, but they matter a lot to real-world performance.
Found out that there exists Iterative version of Merge Sort algorithm with same time complexity but even better O(1) space complexity
The iterative, bottom-up implementation of Merge Sort you linked to doesn't have O(1) space complexity. It maintains a copy of the array, so it has O(n) space complexity. Consequently, the additional O(log n) stack space of the recursive implementation is irrelevant to the total space complexity.
In the title of your question, and in some comments, you use the words "auxiliary space complexity". This is what we usually mean by space complexity, but you seem to suggest this term means constant space complexity. That is not true. "Auxiliary" refers to the space other than the space used by the input; the term by itself tells us nothing about the actual complexity.
Recursive top down merge sort is mostly educational. Most actual libraries use some variation of a hybrid of insertion sort and bottom up merge sort: insertion sort creates small sorted runs that will be merged in an even number of merge passes, so that merging back and forth between the original and temp arrays ends with the sorted data in the original array. (The only copy operation in the merge then handles singleton runs at the end of the array, which can be avoided by choosing an appropriate initial run size for insertion sort. Note that this is not done in my example code; I only use run size 32 or 64, while a more advanced method like Timsort does choose an appropriate run size.)
Bottom up is slightly faster because the array pointers and indexes will be kept in registers (assuming an optimizing compiler), while top down is pushing and popping array pointers and indexes to and from the stack.
Although I'm not sure that the OP actually meant O(1) space complexity for a merge sort, it is possible, but it is about 50% slower than a conventional merge sort with O(n) auxiliary space. It's mostly a research (educational) effort now. The code is fairly complex. Link to example code (one of the options is no extra buffer at all). The benchmark table is for a relatively small number of keys (max is 32767 keys). For a large number of keys, this example ends up about 50% slower than an optimized insertion + bottom up merge sort (std::stable_sort is generalized, e.g. using a pointer to function for every compare, so it is not fully optimized).
https://github.com/Mrrl/GrailSort
Example hybrid insertion + bottom up merge sort C++ code (left out the prototypes):
void MergeSort(int a[], size_t n)       // entry function
{
    if(n < 2)                           // if size < 2 return
        return;
    int *b = new int[n];
    MergeSort(a, b, n);
    delete[] b;
}

void MergeSort(int a[], int b[], size_t n)
{
    size_t s;                           // run size
    s = ((GetPassCount(n) & 1) != 0) ? 32 : 64;
    {                                   // insertion sort
        size_t l, r;
        size_t i, j;
        int t;
        for (l = 0; l < n; l = r) {
            r = l + s;
            if (r > n) r = n;
            l--;                        // one before run start; wraps when l == 0,
                                        //   which the i != l test below relies on
            for (j = l + 2; j < r; j++) {
                t = a[j];
                i = j-1;
                while(i != l && a[i] > t){
                    a[i+1] = a[i];
                    i--;
                }
                a[i+1] = t;
            }
        }
    }
    while(s < n){                       // while not done
        size_t ee = 0;                  // reset end index
        size_t ll;
        size_t rr;
        while(ee < n){                  // merge pairs of runs
            ll = ee;                    // ll = start of left run
            rr = ll+s;                  // rr = start of right run
            if(rr >= n){                // if only left run
                rr = n;                 //   copy left run
                while(ll < rr){
                    b[ll] = a[ll];
                    ll++;
                }
                break;                  //   end of pass
            }
            ee = rr+s;                  // ee = end of right run
            if(ee > n)
                ee = n;
            Merge(a, b, ll, rr, ee);
        }
        std::swap(a, b);                // swap a and b
        s <<= 1;                        // double the run size
    }
}

void Merge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                      // b[] index
    size_t l = ll;                      // a[] left index
    size_t r = rr;                      // a[] right index
    while(1){                           // merge data
        if(a[l] <= a[r]){               // if a[l] <= a[r]
            b[o++] = a[l++];            //   copy a[l]
            if(l < rr)                  //   if not end of left run
                continue;               //     continue (back to while)
            while(r < ee)               //   else copy rest of right run
                b[o++] = a[r++];
            break;                      //   and return
        } else {                        // else a[l] > a[r]
            b[o++] = a[r++];            //   copy a[r]
            if(r < ee)                  //   if not end of right run
                continue;               //     continue (back to while)
            while(l < rr)               //   else copy rest of left run
                b[o++] = a[l++];
            break;                      //   and return
        }
    }
}

size_t GetPassCount(size_t n)           // return # of merge passes
{
    size_t i = 0;
    for(size_t s = 1; s < n; s <<= 1)
        i += 1;
    return(i);
}

Efficiently count occurrences of each element from given ranges

So I have some ranges like these:
2 4
1 9
4 5
4 7
For this the result should be
1 -> 1
2 -> 2
3 -> 2
4 -> 4
5 -> 3
6 -> 2
7 -> 2
8 -> 1
9 -> 1
The naive approach would be to loop through all the ranges, but that would be very inefficient; the worst case would take O(n * n).
What would be the efficient approach, probably in O(n) or O(log(n))?
Here's the solution, in O(n):
The rationale is to record a range [a, b] as a +1 at index a and a -1 just after index b. After adding all the ranges, compute the accumulated sums over that array and display them.
If you need to perform queries while adding the values, a better choice would be to use a Binary Indexed Tree, but your question doesn't seem to require this, so I left it out.
#include <iostream>
#define MAX 1000
using namespace std;

int T[MAX];

int main() {
    int a, b;
    int min_index = 0x1f1f1f1f, max_index = 0;
    while(cin >> a >> b) {
        T[a] += 1;
        T[b+1] -= 1;
        min_index = min(min_index, a);
        max_index = max(max_index, b);
    }
    int sum = 0; // running prefix sum avoids reading T[min_index-1] when min_index is 0
    for(int i=min_index; i<=max_index; i++) {
        sum += T[i];
        cout << i << " -> " << sum << endl;
    }
}
UPDATE: Based on the "provocations" (in a good sense) by גלעד ברקן, you can also do this in O(n log n):
#include <iostream>
#include <map>
#define ull unsigned long long
#define miit map<ull, int>::iterator
using namespace std;

map<ull, int> T;

int main() {
    ull a, b;
    while(cin >> a >> b) {
        T[a] += 1;
        T[b+1] -= 1;
    }
    ull last = 0;
    int count = 0;
    for(miit it = T.begin(); it != T.end(); it++) {
        if (count > 0)
            for(ull i=last; i<it->first; i++)
                cout << i << " " << count << endl;
        count += it->second;
        last = it->first;
    }
}
The advantage of this solution is being able to support ranges with much larger values (as long as the output isn't too large).
The solution would be pretty simple:
Generate two lists with all starting and ending indices of the ranges and sort them.
Keep a counter for the number of ranges that cover the current index. Start at the first index that is in any range and iterate over all numbers up to the last index that is in any range. If an index is part of the list of starting indices, add 1 to the counter; if it's an element of the ending indices, subtract 1 from the counter.
Implementation:
vector<int> count(int** ranges, int rangecount, int rangemin, int rangemax)
{
    vector<int> res;
    multiset<int> open, close; // multiset: several ranges may share an endpoint
    for(int** r = ranges; r < ranges + rangecount; r++)
    {
        open.insert((*r)[0]);
        close.insert((*r)[1]);
    }
    int rc = 0;
    for(int i = rangemin; i <= rangemax; i++)
    {
        rc += open.count(i);  // ranges opening at i
        res.push_back(rc);
        rc -= close.count(i); // ranges closing at i
    }
    return res;
}
Paul's answer still counts from "the first item that is at any range and iterate[s] over all numbers to the last element that is in any range." But what if we could aggregate overlapping counts? For example, if we have three (or say a very large number of) overlapping ranges [(2,6),(1,6),(2,8)], the section (2,6) could depend only on the number of ranges, if we were to label the overlaps with their counts: [(1),3(2,6),(7,8)].
Using binary search (once for the start and a second time for the end of each interval), we could split the intervals and aggregate the counts in O(n * log m * l) time, where n is our number of given ranges and m is the number of resulting groups in the total range and l varies as the number of disjoint updates required for a particular overlap (the number of groups already within that range). Notice that at any time, we simply have a sorted list grouped as intervals with labeled count.
2 4
1 9
4 5
4 7
=>
(2,4)
(1),2(2,4),(5,9)
(1),2(2,3),3(4),2(5),(6,9)
(1),2(2,3),4(4),3(5),2(6,7),(8,9)
So you want the output to be an array, where the value of each element is the number of input ranges that include it?
Yeah, the obvious solution would be to increment every element in the range by 1, for each range.
I think you can be more efficient if you sort the input ranges by start (primary) and end (secondary). So for 32-bit start and end, start:end can be a 64-bit sort key. Actually, just sorting by start is fine; we need to sort the ends differently anyway.
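The start:end sort key idea can be sketched like this (pack_key is an assumed name, not from the answer):

```cpp
#include <cstdint>

// Pack 32-bit start and end into one 64-bit key: sorting by the key sorts
// by start first, with ties broken by end.
std::uint64_t pack_key(std::uint32_t start, std::uint32_t end)
{
    return (static_cast<std::uint64_t>(start) << 32) | end;
}
```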
Then you can see how many ranges you enter for an element, and (with a pqueue of range-ends) see how many you already left.
# pseudo-code with possible bugs.
# TODO: peek or put-back the element from ranges / ends
# that made the condition false.
pqueue ends; // priority queue
int depth = 0; // how many ranges contain this element
for i in output.len {
while (r = ranges.next && r.start <= i) {
ends.push(r.end);
depth++;
}
while (ends.pop < i) {
depth--;
}
output[i] = depth;
}
assert ends.empty();
Actually, we can just sort the starts and ends separately into two separate priority queues. There's no need to build the pqueue on the fly. (Sorting an array of integers is more efficient than sorting an array of structs by one struct member, because you don't have to copy around as much data.)
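Putting the two-sorted-arrays idea into concrete C++ (range_depths is an assumed name; ranges are inclusive [start, end] pairs, and the output length is supplied by the caller):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// For each index in [0, len), count how many inclusive ranges [start, end]
// contain it. Sorts starts and ends separately, then sweeps once.
std::vector<int> range_depths(const std::vector<std::pair<int,int>> &ranges, int len)
{
    std::vector<int> starts, ends;
    for (const auto &r : ranges) {
        starts.push_back(r.first);
        ends.push_back(r.second);
    }
    std::sort(starts.begin(), starts.end());
    std::sort(ends.begin(), ends.end());

    std::vector<int> out(len, 0);
    std::size_t s = 0, e = 0;
    int depth = 0;                // how many ranges contain the current index
    for (int i = 0; i < len; i++) {
        while (s < starts.size() && starts[s] <= i) { depth++; s++; } // entered
        while (e < ends.size() && ends[e] < i)      { depth--; e++; } // left
        out[i] = depth;
    }
    return out;
}
```

With the sample ranges (2,4), (1,9), (4,5), (4,7) this reproduces the counts listed in the question.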

Converting A Recursive Function into a Non-Recursive Function

I'm trying to convert a recursive function into a non-recursive solution in pseudocode. The reason why I'm running into problems is that the method has two recursive calls in it.
Any help would be great. Thanks.
void mystery(int a, int b) {
    if (b - a > 1) {
        int mid = roundDown(a + b) / 2;
        print mid;
        mystery(a, mid);
        mystery(mid + 1, b);
    }
}
This one seems more interesting, it will result in displaying all numbers from a to (b-1) in an order specific to the recursive function. Note that all of the "left" midpoints get printed before any "right" midpoints.
void mystery (int a, int b) {
    if (b > a) {
        int mid = roundDown(a + b) / 2;
        print mid;
        mystery(a, mid);
        mystery(mid + 1, b);
    }
}
For example, if a = 0, and b = 16, then the output is:
8 4 2 1 0 3 6 5 7 12 10 9 11 14 13 15
The textbook method to turn a recursive procedure into an iterative one is simply to replace the recursive calls with a stack and run a loop until the stack is empty.
Try the following:
push(0, 16); /* Prime the stack */
call mystery;
...
void mystery {
    do until stackempty() {      /* iterate until stack is empty */
        pop(a, b)                /* pop and process... */
        do while (b - a >= 1) {  /* run the "current" values down... */
            int mid = roundDown(a+b)/2;
            print mid;
            push(mid+1, b);      /* push in place of recursive call */
            b = mid;
        }
    }
}
The original function had two recursive calls, so why only a single stack? Ignore the requirements of the second recursive call for a moment, and you can easily see that the first recursive call (mystery(a, mid);) could be implemented as a simple loop where b assumes the value of mid on each iteration - nothing else needs to be remembered. So turn it into a loop, push the parameters needed for the second recursive call onto a stack, and add an outer loop to run the stack down. Done.
With a bit of creative thinking, any recursive function can be turned into an iterative one using stacks.
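A self-contained C++ rendering of that transformation, collecting values instead of printing so the order can be compared with the recursive version (mystery_iter is an assumed name):

```cpp
#include <stack>
#include <utility>
#include <vector>

// Iterative version of mystery(a, b), collecting values instead of printing.
// The LIFO pop order reproduces the recursive call order exactly.
std::vector<int> mystery_iter(int a, int b)
{
    std::vector<int> out;
    std::stack<std::pair<int,int>> st;
    st.push(std::make_pair(a, b));                // prime the stack
    while (!st.empty()) {
        std::pair<int,int> cur = st.top();
        st.pop();
        int lo = cur.first, hi = cur.second;
        while (hi - lo >= 1) {                    // run the "current" values down
            int mid = (lo + hi) / 2;              // roundDown for non-negative ints
            out.push_back(mid);
            st.push(std::make_pair(mid + 1, hi)); // replaces the 2nd recursive call
            hi = mid;                             // replaces the 1st recursive call
        }
    }
    return out;
}
```

For a = 0, b = 16 this yields 8 4 2 1 0 3 6 5 7 12 10 9 11 14 13 15, matching the recursive output shown above.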
This is what is happening. You have a long rod and you are dividing it into two. Then you take these two parts and divide each into two. You do this with each sub-part until the length of that part becomes 1.
How would you do that?
Assume you have to break the rod at the mid point. We will put the marks to cut in bins for further cuts. Note: each part spawns two new parts, so we may need on the order of 2^n boxes to store the sub-parts.
len = pow (2, b-a+1)    // +1 might not be needed
ar = int[len]           // large array will memoize my marks to cut
ar[0] = a               // first mark
ar[1] = b               // last mark
start_ptr = 0           // will start from this point
end_ptr = 1             // will end at this point
new_end = end_ptr       // our end point will move for cuts
while true:                             // loop endlessly; maybe there is a limit, I do not know
    while start_ptr < end_ptr:          // look into bins
        i = ar[start_ptr]               //
        j = ar[start_ptr+1]             // pair-wise ends
        if j - i > 1                    // if lengthier than unit length, then add new marks
            mid = floor ( (i+j) / 2 )   // this is my mid
            print mid
            ar[++new_end] = i           // first mark --|
            ar[++new_end] = mid - 1     // mid - 1 mark --+-- these two create one pair
            ar[++new_end] = mid + 1     // 2nd half 1st mark --|
            ar[++new_end] = j           // end mark --+-- these two create 2nd pair
        start_ptr = start_ptr + 2       // jump to next two ends
    if end_ptr == new_end               // if we passed all the pairs and no new pair
        break                           //   was created, we are done
    else
        end_ptr = new_end               // else, start from the sub-problems
PS: I haven't tried this code. This is just pseudo code; it seems to me that it should do the job. Let me know if you try it out - it would validate my algorithm. It is basically a binary tree laid out in an array.
This example recursively splits a range of numbers until the range is reduced to a single value. The output shows the structure of the numbers. The single values are output in order, but grouped based on the left side first split function.
void split(int a, int b)
{
    int m;
    if ((b - a) < 2) {     /* if size == 1, return */
        printf(" | %2d", a);
        return;
    }
    m = (a + b) / 2;       /* else split array */
    printf("\n%2d %2d %2d", a, m, b);
    split(a, m);
    split(m, b);
}

Efficient Way to Arrange Odd and Even Data Sequentially

I have data like (1,2,3,4,5,6,7,8). I want to arrange it like (1,3,5,7,2,4,6,8) in n/2-2 swaps, without using any extra array, and using at most one loop.
Note that I have to do the swaps in the existing array of numbers. If there is another way, without swaps and without an extra array,
please give me some advice.
Maintain two pointers p1, p2: p1 goes from start to end, p2 goes from end to start, and swap misplaced elements.
pseudo code:
specialSort(array):
    p1 <- array.start()
    p2 <- array.end()
    while (p1 < p2):
        if (*p1 % 2 == 1):  // odd element: already in the front part
            p1 <- p1 + 1
            continue
        if (*p2 % 2 == 0):  // even element: already in the back part
            p2 <- p2 - 1
            continue
        // when here, both p1 and p2 need a swap
        swap(p1, p2)
Note that the complexity is O(n): at least one of p1 or p2 changes in every second iteration, so the loop cannot repeat more than 2*n = O(n) times. [We can find a better bound, but it is not needed.] Space complexity is trivially O(1): we allocate a constant amount of space, 2 pointers only. Also note that the result is not stable: odd elements end up before even ones, but not necessarily in their original relative order.
Note2: if your language does not support pointers [e.g. Java, ML, ...], they can be replaced with indexes: i1 going from start to end, i2 going from end to start, with the same algorithm principle.
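A sketch of that index-based variant in C++ (partition_odd_even is an assumed name). Like the pointer version, it is not stable: odd values end up first, but not necessarily in their original relative order:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Two-index partition: odd values before even values, O(n) time, O(1) space.
// Not stable: relative order within each group is not preserved.
void partition_odd_even(std::vector<int> &a)
{
    if (a.empty()) return;
    std::size_t i1 = 0, i2 = a.size() - 1;
    while (i1 < i2) {
        if (a[i1] % 2 != 0) { i1++; continue; }  // odd: already in place
        if (a[i2] % 2 == 0) { i2--; continue; }  // even: already in place
        std::swap(a[i1], a[i2]);                 // both misplaced: swap
    }
}
```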
#include <stdio.h>
#include <string.h>

char array[26] = "ABcdEfGiHjklMNOPqrsTUVWxyZ";

#define COUNTOF(a_) (sizeof(a_)/sizeof(a_)[0])
#define IS_ODD(e) ((e)&0x20)   /* lowercase letters stand in for odd values */
#define IS_EVEN(e) (!IS_ODD(e))

void doswap (char *ptr, unsigned sizl, unsigned sizr);

int main(void)
{
    unsigned bot, limit, cut, top, size;
    size = COUNTOF(array);
    printf("Before:%26.26s\n", array);
    /* pass 1: count the number of EVEN chars */
    for (limit=top=0; top < size; top++) {
        if ( IS_EVEN( array[top] ) ) limit++;
    }
    /* skip initial segment of EVEN */
    for (bot=0; bot < limit; bot++ ) {
        if ( IS_ODD(array[bot])) break;
    }
    /* Find leading stretch of misplaced ODD + trailing stretch of EVEN */
    for (cut=bot; bot < limit; cut = top) {
        /* count misplaced items */
        for ( ; cut < size && IS_ODD(array[cut]); cut++) {;}
        /* count shiftable items */
        for (top=cut; top < size && IS_EVEN(array[top]); top++) {;}
        /* Now, [bot...cut) and [cut...top) are two blocks
        ** that need to be swapped: swap them */
        doswap(array+bot, cut-bot, top-cut);
        bot += top-cut;
    }
    printf("Result:%26.26s\n", array);
    return 0;
}

void doswap (char *ptr, unsigned sizl, unsigned sizr)
{
    if (!sizl || !sizr) return;
    if (sizl >= sizr) {
        char tmp[sizr];
        memcpy(tmp, ptr+sizl, sizr);  /* save the smaller right block */
        memmove(ptr+sizr, ptr, sizl); /* slide the left block right */
        memcpy(ptr, tmp, sizr);
    }
    else {
        char tmp[sizl];
        memcpy(tmp, ptr, sizl);       /* save the smaller left block */
        memmove(ptr, ptr+sizl, sizr); /* slide the right block left */
        memcpy(ptr+sizr, tmp, sizl);  /* left block lands after it */
    }
}
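doswap exchanges two adjacent blocks in place. Here is a self-contained re-statement for testing (block_swap is an assumed name; a fixed-size temp buffer replaces the C variable-length array, assuming the smaller block fits in 64 bytes):

```cpp
#include <cstring>

// Exchange adjacent blocks ptr[0..sizl) and ptr[sizl..sizl+sizr) in place.
// Assumes the smaller of the two blocks is at most 64 bytes.
void block_swap(char *ptr, unsigned sizl, unsigned sizr)
{
    char tmp[64];
    if (!sizl || !sizr) return;
    if (sizl >= sizr) {
        std::memcpy(tmp, ptr + sizl, sizr);  // save the smaller right block
        std::memmove(ptr + sizr, ptr, sizl); // slide the left block right
        std::memcpy(ptr, tmp, sizr);         // drop the right block in front
    } else {
        std::memcpy(tmp, ptr, sizl);         // save the smaller left block
        std::memmove(ptr, ptr + sizl, sizr); // slide the right block left
        std::memcpy(ptr + sizr, tmp, sizl);  // append the left block
    }
}
```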

compaction in an array storing 2 linked lists

An array Arr ( size n ) can represent doubly linked list.
[ Say the cells have struct { int val, next, prev; } ]
I have two lists A and B stored in the array.
A has m nodes and B has n - m nodes.
These nodes being scattered, I want to rearrange them so that all nodes of A occupy Arr[0] .. Arr[m-1] and the rest are filled by nodes of B, in O(m) time.
The solution that occurs to me is to :
Iterate A till a node occurs which is placed beyond Arr[m-1]
then, iterate B till a node occurs which is placed before Arr[m]
swap the two ( including the manipulation of the next prev links of them and their neighbours).
However in this case the total number of iterations is O(n + m). Hence there should be a better answer.
P.S:
This question occurs in Introduction to Algorithms, 2nd edition.
Problem 10.3-5
How about iterating through list A and placing each element in Arr[0] ... Arr[m-1], obviously swapping its position with whatever was there before and updating the prev/next links as well. There will be a lot of swapping but nevertheless it will be O(m) since once you finish iterating through A (m iterations), all of its elements will be located (in order, incidentally) in the first m slots of Arr, and thus B must be located entirely in the rest of Arr.
To add some pseudocode
a := index of head of A
for i in 0 ... m-1
swap Arr[i], Arr[a]
a := index of next element in A
end
I think "jw013" is right, but the idea needs some improvements: by swapping, you are changing the addresses of elements in the Arr array, so you need to be careful about that!
e.g. let's say we have Arr like:
indices: 0 1 2 3 4
| 2 | empty | 3 | empty | 1 | (assume the linked list is 1 -> 2 -> 3)
so Arr[4].next is 0 and Arr[0].next is 2.
But if you just swap Arr[4] and Arr[0], then Arr[0].next is 0, which is not what we want to happen. So we should adjust the pointers when swapping.
so the code for it is like:
public static void compactify(int List_head, int Free, node[] array){
    int List_lenght;
    List_lenght = find_listlenght(List_head, array);
    if(List_lenght != 0){ // if the list is not empty
        int a = List_head;
        for (int i = 0; i < List_lenght; i++) {
            swap( array, a, i );
            a = array[i].next;
            print_mem(array);
        }
    }
}
}
now when calling swap:
private static void swap(node[] array, int a, int i) {
    // adjust the next and prev of both array[a] and array[i]
    // (note: if array[a] and array[i] are directly linked to each other,
    // these fixups need extra special-casing, which is omitted here)
    int next_a = array[a].next;
    int next_i = array[i].next;
    int prev_a = array[a].prev;
    int prev_i = array[i].prev;
    // if array[a] has a next, adjust array[next_a].prev to i
    if(next_a != -1)
        array[next_a].prev = i;
    // if array[i] has a next, adjust array[next_i].prev to a
    if(next_i != -1)
        array[next_i].prev = a;
    // likewise adjust the pointers of array[prev_a] and array[prev_i]
    if(prev_a != -1)
        array[prev_a].next = i;
    if(prev_i != -1)
        array[prev_i].next = a;
    node temp = array[a];
    array[a] = array[i];
    array[i] = temp;
}
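Here is a self-contained C++ sketch of the same compaction idea (swap_slots and compactify are my names, and -1 plays the role of a null link). It relabels every index that referred to either swapped slot, which also covers the tricky case where the two slots are directly linked to each other:

```cpp
#include <set>
#include <utility>
#include <vector>

struct Node { int val, next, prev; };   // -1 means "no neighbor"

// Swap the contents of slots x and y, then re-target every prev/next index
// that referred to x or y (including the case where x and y are adjacent).
void swap_slots(std::vector<Node> &arr, int x, int y)
{
    if (x == y) return;
    std::swap(arr[x], arr[y]);
    std::set<int> touched = {x, y};     // slots whose links may mention x or y
    for (int s : {arr[x].prev, arr[x].next, arr[y].prev, arr[y].next})
        if (s >= 0) touched.insert(s);
    auto relabel = [&](int &f) {        // exchange the labels x and y
        if (f == x) f = y;
        else if (f == y) f = x;
    };
    for (int s : touched) {
        relabel(arr[s].prev);
        relabel(arr[s].next);
    }
}

// Move the m nodes of the list starting at head into slots 0..m-1, in order.
void compactify(std::vector<Node> &arr, int head, int m)
{
    int a = head;
    for (int i = 0; i < m; i++) {
        swap_slots(arr, a, i);          // node i of the list now lives in slot i
        a = arr[i].next;                // follow the (already fixed-up) link
    }
}
```

Each of the m iterations does constant work, so the whole compaction is O(m), matching the bound asked for in the exercise.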
