How to measure OMP time with OMP tasks & recursive task workloads?

Below I try to sketch code that is parallelised using OpenMP tasks.
In main(), a parallel region is started and immediately wrapped into a #pragma omp master section. After estimating the expected workload, and depending on whether that workload is below a given threshold, the work is either passed to a serial function or to a function that recursively splits the workload and spawns separate tasks. The results of the individual #pragma omp task blocks are then aggregated after a #pragma omp taskwait directive.
int main() {
    #pragma omp parallel
    {
        #pragma omp master
        {
            // do some serial stuff
            // estimate if parallelisation is worth it
            const int workload = estimateWorkload();
            if (workload < someBound) {
                serialFunction();
            }
            else {
                parallelFunction(workload);
            }
        }
    }
}
int parallelFunction(int workload) {
    if (workload < someBound) {
        return serialFunction();
    }

    int result1, result2;

    #pragma omp task shared(result1)
    {
        result1 = parallelFunction(workload/2);
    }
    #pragma omp task shared(result2)
    {
        result2 = parallelFunction(workload/2);
    }
    #pragma omp taskwait

    return result1 < result2;
}
How do I measure the actual computing time of each thread in such a setting?
If I measure CPU time and have k active threads, then I will get k*wallTime, which makes sense because the threads are initialized by the leading #pragma omp parallel directive and remain active the whole time. However, this does not give me any information about how much time the threads spend actually working, which makes the code hard to analyse.

Q : How do I measure the actual computing time of each thread in such a setting?
A trivial MOCK-UP CODE for simple, semi-manual code-execution time profiling:
Needless to say, on a "noisy" execution platform the choice of CLOCK_MONOTONIC saves you from drifting time updates, yet it does not "save" you from off-CPU-core wait states caused by any (the more so if heavy) "background" (disturbing) processes scheduled by the O/S.
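A minimal helper for that semi-manual approach, assuming a POSIX platform (the helper name elapsed_s is not part of the original mock-up): take two CLOCK_MONOTONIC samples around the measured section and convert their difference into seconds.

#include <time.h>

static inline double elapsed_s( const struct timespec * t0,
                                const struct timespec * t1 )
{
    return ( t1->tv_sec  - t0->tv_sec  )
         + ( t1->tv_nsec - t0->tv_nsec ) * 1e-9;
}

Wrapping a section in two clock_gettime( CLOCK_MONOTONIC, ... ) calls and feeding both samples to elapsed_s() yields that section's duration in seconds.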
Yet, for a prototyping phase, this is far easier than wiring up all the "omp-native" OMPT callbacks' { ompt_callback_task_create_t, ompt_callback_task_schedule_t, ompt_callback_task_dependence_t, ompt_callback_dispatch_t, ompt_callback_sync_region_t, ..., ompt_callback_thread_begin_t, ompt_callback_thread_end_t, ... } handling.
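For orientation only, a rough skeleton of what that callback wiring involves, per the OpenMP 5.x tools interface ( omp-tools.h ). Exact type names can differ between runtime versions, so treat this as an assumption-laden sketch rather than a drop-in tool:

#include <omp-tools.h>

static void on_thread_begin( ompt_thread_t thread_type, ompt_data_t * thread_data )
{
    (void) thread_type; (void) thread_data;     // e.g. record a per-thread start timestamp here
}

static int tool_initialize( ompt_function_lookup_t lookup,
                            int                    initial_device_num,
                            ompt_data_t          * tool_data )
{
    ompt_set_callback_t set_callback = (ompt_set_callback_t) lookup( "ompt_set_callback" );
    set_callback( ompt_callback_thread_begin, (ompt_callback_t) on_thread_begin );
    (void) initial_device_num; (void) tool_data;
    return 1;                                   // non-zero keeps the tool active
}

static void tool_finalize( ompt_data_t * tool_data ) { (void) tool_data; }

ompt_start_tool_result_t * ompt_start_tool( unsigned int omp_version,
                                            const char * runtime_version )
{
    static ompt_start_tool_result_t result = { tool_initialize, tool_finalize, { 0 } };
    (void) omp_version; (void) runtime_version;
    return &result;
}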
A SIDE-EFFECT BONUS :
The trivial code permits you, if you report and post-process the recorded durations of the nested code executions, to "frame" the hidden costs of the associated call-signature and recursion-nesting related overheads.
The revised, overhead-strict Amdahl's Law then stops lying to you and starts to show you more precisely when this code starts to lose on those very overhead-related ( plus potential atomicity-of-work-unit(s) ) principally-[SERIAL] add-on costs to any expected true-[PARALLEL]-section speedup ( expected from harnessing more of those, and only those, otherwise-free resources ).
That is always the hardest part of the War ( still to be fought ahead ... ).
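To make "overhead-strict" concrete, one common way to fold overheads into Amdahl's Law (this particular form and its parameter names are a paraphrase, not a formula quoted from the text) looks like this:

// s  : serial fraction of the original single-worker run
// oS : setup    overhead, as a fraction of the original single-worker run time
// oT : teardown overhead, as a fraction of the original single-worker run time
// N  : number of workers
double overhead_aware_speedup( double s, double oS, double oT, int N )
{
    return 1.0 / ( s + oS + oT + ( 1.0 - s ) / (double) N );
}

With oS = oT = 0 this collapses to the classical formula; with non-zero overheads the achievable speedup is capped at 1 / ( s + oS + oT ) no matter how many workers are added, and once per-task overheads grow with the number of tasks spawned, the effect described above only gets worse.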
EFFICIENCY of SCHEDULING & OCCUPIED RESOURCES of a CALL to a 2-ary task-SCHEDULED fun() with hidden 1-ary RECURSION:
CALL
42----*--------------------------------------------------------------------------------------*
: | |
: | 21----*---------------------------------------*
: | : | |
: | : | 10----*----------------*
: | : | : | |
: | : | : | 5----*----*
: | : | : | : | |
: | : | : | : | 2<
: | : | : | : 2< /
: | : | : 5----*----* 5___/___/................ #taskwait 2
: | : | : : | | /
: | : | : : | 2< /
: | : | : : 2< / /
: | : | : 5___/___/ /
: | : | 10___/____________/............................. #taskwait 5
: | : 10----*----------------* /
: | : : | | /
: | : : | 5----*----* /
: | : : | : | | /
: | : : | : | 2< /
: | : : | : 2< / /
: | : : 5----*----* 5___/___/ /
: | : : : | | / /
: | : : : | 2< / /
: | : : : 2< / / /
: | : : 5___/___/ / /
: | : 10___/____________/__________/.......................................................... #taskwait 10
: | 21___/
: 21----*---------------------------------------* /
: : | | /
: : | 10----*----------------* /
: : | : | | /
: : | : | 5----*----* /
: : | : | : | | /
: : | : | : | 2< /
: : | : | : 2< / /
: : | : 5----*----* 5___/___/ /
: : | : : | | / /
: : | : : | 2< / /
: : | : : 2< / / /
: : | : 5___/___/ / /
: : | 10___/____________/ /
: : 10----*----------------* / /
: : : | | / /
: : : | 5----*----* / /
: : : | : | | / /
: : : | : | 2< / /
: : : | : 2< / / /
: : : 5----*----* 5___/___/ / /
: : : : | | / / /
: : : : | 2< / / /
: : : : 2< / / / /
: : : 5___/___/ / / /
: : 10___/____________/__________/ /
: 21___/_______________________________________________________/...................................................................................................................... #taskwait 21
42___/
RET_/
The EFFICIENCY of SCHEDULING & OCCUPIED RESOURCES of a CALL to a 2-ary task-SCHEDULED fun() with hidden 1-ary RECURSION matters more and more for any growing scale of the workload: soon workload < someBound * 2 ^ W holds only at the cost of an awfully high W, which causes W * k-ary-many times re-{ -acquired, -allocated, -released } and thus wasted, k-ary-times-requested #pragma omp task shared(...)-handling-related resources, throughout the progression of the whole pure-[SERIAL]-by-definition recursion dive-&-resurfacing back.
It is easy to see how many resources will hang waiting ( even with a 1-ary RECURSION formulation ), until each and every dive to the deepest level of recursion bubbles back up to the top-level #pragma omp taskwait.
The costs of re-allocating new and new resources for each recursion-dive level will most often kill you ( performance-wise ) on the overhead-strict Amdahl's Law, if not drive you into thrashing or system-configuration-related overflow by devastating the real system's physical resources even earlier, for any reasonably large recursion depth.
These are costs you need not have paid, had you not used a "typomatically-cheap"-yet-expensive-in-(idle/wasted)-resources recursive problem formulation, even in the most lightweight 1-ary cases.
See how many :-denoted "waiting lines" run in parallel, compared to how few |-denoted "computing lines" there are in either phase of the topology. They waste, block, and must leave idle all task-related resources ( memory and stack space are just the more visible ones ), which are performance-wise very expensive to acquire ( only to spend most of the processing time idling in waits ) or prone to crash due to overflow if over-subscribed beyond the real system's configured capacities.
The War is yours! Keep Walking ...
The Site-Compliance Disclaimer:
As per StackOverflow policy, the full mock-up code is posted here, in case the Godbolt.org platform should ever become inaccessible. Otherwise feel free to prefer and/or use the Compiler Explorer tools, which go way, way beyond the sequence of chars put into the mock-up source code there.
The choice & the joy of executing it is always yours :o)
#include <time.h>

int estimateWorkload() {
    return 42; // _________________________________________________________ mock-up "workload"
}

int serial_a_bit_less_naive_factorial( int n ){
    return ( n < 3 ) ? n : n * serial_a_bit_less_naive_factorial( n - 1 );
}

int serialFunction() {
    return serial_a_bit_less_naive_factorial( 76 );
}

int parallelFunction( int workload, const int someBound ) { // __ pass both control parameters
    struct timespec T0, T1;
    int retFlag,
        retValue,
        result1,
        result2;

    retFlag = clock_gettime( CLOCK_MONOTONIC, &T0 ); // \/\/\/\/\/\/\/\/\/\ SECTION.begin

    if ( workload < someBound ) {
        retValue = serialFunction();
    }
    else { // -- [SEQ]----------------------------------------------------
        #pragma omp task shared( result1 ) // -- [PAR]|||||||||||||||||||| with (1-ary recursions)
        {
            result1 = parallelFunction( (int) workload / 2, someBound ); // (int) fused DIV
        }
        #pragma omp task shared( result2 ) // -- [PAR]|||||||||||||||||||| with (1-ary recursions)
        {
            result2 = parallelFunction( (int) workload / 2, someBound ); // (int) fused DIV
        }
        #pragma omp taskwait

        retValue = result1 < result2;
    }

    retFlag = clock_gettime( CLOCK_MONOTONIC, &T1 ); // \/\/\/\/\/\/\/\/\/\ SECTION.end
    // ____________________________________________________________________ MAY ADD ACCUMULATION (1-ary recursions)
    // ...
    // ____________________________________________________________________ MAY ADD ACCUMULATION (1-ary recursions)
    return retValue;
}

int main() {
    const int someBound = 3; // _______________________________________ a control parameter A

    #pragma omp parallel
    {
        #pragma omp master
        {
            // -- [SEQ]---------------------------------------- do some serial stuff
            // ------------------------------estimate if parallelisation is worth it
            const int workload = estimateWorkload();

            if ( workload < someBound ) {
                serialFunction();
            }
            else {
                parallelFunction( workload, someBound ); // -- [PAR]||||||| with (1-ary recursions)
            }
        }
    }
}
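The mock-up records T0 and T1 but does not yet report them ( the "MAY ADD ACCUMULATION" placeholders ). One possible way to answer the per-thread question, sketched here under stated assumptions ( the fixed-size array, its 256-slot cap, and the elapsed_s() helper from above are all additions, not part of the original mock-up ), is to accumulate each measured section into a slot indexed by omp_get_thread_num() and print the totals once the parallel region has ended:

#include <omp.h>
#include <stdio.h>

static double work_time_s[256];      // per-thread accumulated section time (assumed thread cap)

// inside parallelFunction(), right after the SECTION.end sample:
//     work_time_s[ omp_get_thread_num() ] += elapsed_s( &T0, &T1 );

// in main(), after the #pragma omp parallel block has ended:
//     for ( int t = 0; t < omp_get_max_threads(); t++ )
//         printf( "thread %2d: %12.6f [s] inside measured sections\n", t, work_time_s[t] );

Keep in mind that with nested tasks a parent section's T0..T1 interval also spans its #pragma omp taskwait, during which the thread may execute other tasks, so these per-thread sums over-count; subtracting the children's durations during post-processing ( or timing only the leaf-level serialFunction() sections ) removes that bias.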

Related

ARM MMU Debugging

I have been working on a bare metal raspberry pi project, and I am now attempting to initialize the memory management unit, following the documentation and examples.
When I run it on the Pi, however, nothing happens afterwards, and when I run it in QEMU using gdb, gdb either crashes or I get a Prefetch Abort as exception 3. Have I incorrectly set some properties such as shareability, or incorrectly used isb, or is there something I have missed?
Here is my code below
pub unsafe fn init_mmu(start: usize) {
    let tcr_el1 =
        (0b10 << 30) | // 4kb Granule
        (0b10 << 28) | // TTBR1 outer shareable
        (25 << 16)   | // TTBR1 size
        (0b10 << 12) | // TTBR0 outer shareable
        25;            // TTBR0 size
    let sctlr_el1 =
        (1 << 4) | // Enforce EL0 stack alignment
        (1 << 3) | // Enforce EL1 stack alignment
        (1 << 1) | // Enforce access alignment
        1;         // Enable MMU
    // 0000_0000: nGnRnE device memory
    // 0100_0100: non cacheable
    let mair = 0b0000_0000_0100_0100;
    let mut table = core::slice::from_raw_parts_mut(start as *mut usize, 2048);
    for i in 0..(512 - 8) {
        table[i] = (i << 9) |
            (1 << 10) | // AF
            (0 << 2)  | // MAIR index
            1;          // Block entry
    }
    for i in (512 - 8)..512 {
        table[i] = (i << 9) |
            (1 << 10) | // AF
            (1 << 2)  | // MAIR index
            1;          // Block entry
    }
    table[512] = (512 << 9) |
        (1 << 10) | // AF
        (1 << 2)  | // MAIR index
        1;          // Block entry
    table[1024] = (0x8000000000000000) | start | 3;
    table[1025] = (0x8000000000000000) | (start + 512 * 64) | 3;
    write!("mair_el1", mair);
    write!("ttbr0_el1", start + 1024 * 64);
    asm!("isb");
    write!("tcr_el1", tcr_el1);
    asm!("isb");
    write!("sctlr_el1", sctlr_el1);
    asm!("isb");
}

the data method of vector gives an unexpected result

I use the data method of vector in C++, but I have some problems. The code is below:
#include <iostream>
#include <vector>

int main ()
{
    std::vector<int> myvector (5);
    int* p = myvector.data();

    *p = 10;
    ++p;
    *p = 20;
    p[2] = 100;

    std::cout << "myvector contains:";
    for (unsigned i = 0; i < myvector.size(); ++i)
        std::cout << ' ' << myvector[i];
    std::cout << '\n';

    return 0;
}
The result is myvector contains: 10 20 0 100 0, but why is the result not myvector contains: 10 20 100 0 0? The first one, *p = 10;, gives 10, and the second one, ++p; *p = 20;, gives 20, that's all right, but the third one, p[2] = 100;, should make element 2 become 100, yet it stays 0. Why?
With visuals:
std::vector<int> myvector (5);
//  ---------------------
//  | 0 | 0 | 0 | 0 | 0 |
//  ---------------------

int* p = myvector.data();
//  ---------------------
//  | 0 | 0 | 0 | 0 | 0 |
//  ---------------------
//    ^
//    p

*p = 10;
//  ----------------------
//  | 10 | 0 | 0 | 0 | 0 |
//  ----------------------
//    ^
//    p

++p;
//  ----------------------
//  | 10 | 0 | 0 | 0 | 0 |
//  ----------------------
//         ^
//         p

*p = 20;
//  -----------------------
//  | 10 | 20 | 0 | 0 | 0 |
//  -----------------------
//         ^
//         p

p[2] = 100;
//  -------------------------
//  | 10 | 20 | 0 | 100 | 0 |
//  -------------------------
//         ^          ^
//         p          p[2]
It's helpful to remember that p[2] is a shorter way to say *(p + 2).
Because you are modifying p itself.
After ++p (which, I remind you, is equivalent to p = p + 1), p points to the element at index 1, so p[2] points to the element at index 3 from the beginning of the vector, which is why the fourth element is changed instead.
After ++p, pointer p is pointing to myvector[1].
Then we have:
p[0] pointing to myvector[1]
p[1] pointing to myvector[2]
p[2] pointing to myvector[3]
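To make that index arithmetic concrete, a small sketch (an illustration, not code from the question) that prints which element p[2] actually refers to after the increment:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v(5);
    int* p = v.data();                       // p points at v[0]
    ++p;                                     // now p points at v[1]
    // p[2] is *(p + 2), i.e. two elements past v[1], which is v[3]
    std::cout << (p + 2 - v.data()) << '\n'; // prints 3
}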

Scheduling algorithm in linear complexity

Here's the problem:
We have n tasks to complete in n days. We can complete a single task per day. Each task has a start date and an end date. We can't start a task before its start date and it has to be completed before its end date. So, given a vector s of the start dates and a vector e of the end dates, give a vector d, if it exists, where d[i] is the day you do task i. For example:
s = {1, 0, 1, 2, 0}
e = {2, 4, 2, 3, 1}
+--------+------------+----------+
| Task # | Start Date | End Date |
+--------+------------+----------+
| 0 | 1 | 2 |
| 1 | 0 | 4 |
| 2 | 1 | 2 |
| 3 | 2 | 3 |
| 4 | 0 | 1 |
+--------+------------+----------+
We have as a possible solution:
d = {1, 4, 2, 3, 0}
+--------+----------------+
| Task # | Date Completed |
+--------+----------------+
| 0 | 1 |
| 1 | 4 |
| 2 | 2 |
| 3 | 3 |
| 4 | 0 |
+--------+----------------+
It is trivial to create an algorithm with O(n^2). It is not too bad either to create an algorithm with O(nlogn). There is supposedly an algorithm that gives a solution with O(n). What would that be?
When you can't use time, use space! You can represent the tasks open on any day using a bit vector. In O(n), create a "starting today" array. You can also represent the tasks ending soonest using another bit vector, which can also be calculated in O(n). And then finally, in O(n) again, scan each day, adding in any tasks starting that day and picking the lowest-numbered task open that day, giving priority to the ones ending soonest.
using System.IO;
using System;

class Program
{
    static void Main()
    {
        int n = 5;
        var s = new int[]{1, 0, 1, 2, 0};
        var e = new int[]{2, 4, 2, 3, 1};

        var sd = new int[n];
        var ed = new int[n];
        for (int task = 0; task < n; task++)
        {
            sd[s[task]] += (1 << task); // Start for task i
            ed[e[task]] += (1 << task); // End for task i
        }

        int bits = 0;
        // Track the earliest ending task
        var ends = new int[n];
        for (int day = n - 1; day >= 0; day--)
        {
            if (ed[day] != 0) // task(s) ending today
            {
                // replace any tasks that have later end dates
                bits = ed[day];
            }
            ends[day] = bits;
            bits = bits ^ sd[day]; // remove any starting
        }

        var d = new int[n];
        bits = 0;
        for (int day = 0; day < n; day++)
        {
            bits |= sd[day]; // add any starting
            int lowestBit;
            if ((ends[day] != 0) && ((bits & ends[day]) != 0))
            {
                // Have early ending tasks to deal with
                // and have not dealt with it yet
                int tb = bits & ends[day];
                lowestBit = tb & (-tb);
                if (lowestBit == 0) throw new Exception("Fail");
            }
            else
            {
                lowestBit = bits & (-bits);
            }
            int task = (int)Math.Log(lowestBit, 2);
            d[task] = day;
            bits = bits - lowestBit; // remove task
        }
        Console.WriteLine(string.Join(", ", d));
    }
}
Result in this case is: 1, 4, 2, 3, 0 as expected.
Correct me if I'm wrong, but I believe that the name for such problems would be Interval Scheduling.
From your post, I assume that you are not looking for the Optimal Schedule and that you're simply looking to find any solution within O(n).
The problem here is that sorting will take O(n log n) and computing will also take O(n log n). You can try to do it in one step:
Define V - a vector of 'time taken' for each task.
Define P - a vector of tasks which finish before a given task.
i.e. if Task 2 finishes before Task 3, P[3] = 2.
Of course, as you can see, the main computation is involved in finding P[j]. You always take the latest-ending non-overlapping task.
Thus, to find P[i], find a task which ends before the i-th task. If more than one exists, pick the one which ends last.
Define M - a vector containing the resultant slots.
The algorithm:
WIS(n)
    M[0] <- 0
    for j <- 1 to n
        do M[j] = max(wj + M[p(j)], M[j-1])
    return M[n]
Sort by end date, start date. Then process data in sorted order only executing tasks that are after the start date.
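To make that sort-by-end-date idea concrete, here is a small C++ sketch (an illustration, not code from any of the answers), using the example from the question and assuming, as the sample solution does, that a task may still run on its end date. Tasks are bucketed by start day and each day the released task that ends soonest is executed; because of the priority queue this is O(n log n), so it illustrates the greedy idea rather than the O(n) bit-vector trick:

#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

int main()
{
    const std::vector<int> s = {1, 0, 1, 2, 0};   // start dates
    const std::vector<int> e = {2, 4, 2, 3, 1};   // end dates (assumed inclusive)
    const int n = (int) s.size();

    // bucket tasks by their start day
    std::vector<std::vector<int>> starts(n);
    for (int t = 0; t < n; ++t) starts[s[t]].push_back(t);

    std::vector<int> d(n, -1);
    // min-heap of (end date, task) for tasks already released
    std::priority_queue<std::pair<int,int>,
                        std::vector<std::pair<int,int>>,
                        std::greater<>> open;

    for (int day = 0; day < n; ++day) {
        for (int t : starts[day]) open.push({e[t], t});
        if (open.empty()) continue;               // no task can run today
        auto [end, t] = open.top(); open.pop();   // most urgent released task
        if (end < day) { std::puts("infeasible"); return 1; }
        d[t] = day;
    }
    for (int t = 0; t < n; ++t) std::printf("%d ", d[t]);
    std::printf("\n");
}

For the example s = {1, 0, 1, 2, 0}, e = {2, 4, 2, 3, 1} this prints 1 4 2 3 0, matching the expected answer.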

Understanding the Recursion of mergesort

Most of the mergesort implementations I see are similar to this one, in the Intro to Algorithms book as well as the online implementations I search for. My recursion chops don't go much further than messing with Fibonacci generation (which was simple enough), so maybe it's the multiple recursions blowing my mind, but I can't even step through the code and understand what's going on, even before I hit the merge function.
How is it stepping through this? Is there some strategy or reading I should undergo to better understand the process here?
void mergesort(int *a, int *b, int low, int high)
{
    int pivot;
    if (low < high)
    {
        pivot = (low + high) / 2;
        mergesort(a, b, low, pivot);
        mergesort(a, b, pivot + 1, high);
        merge(a, b, low, pivot, high);
    }
}
and the merge (although frankly I'm mentally stuck before I even get to this part)
void merge(int *a, int *b, int low, int pivot, int high)
{
    int h, i, j, k;
    h = low;
    i = low;
    j = pivot + 1;

    while ((h <= pivot) && (j <= high))
    {
        if (a[h] <= a[j])
        {
            b[i] = a[h];
            h++;
        }
        else
        {
            b[i] = a[j];
            j++;
        }
        i++;
    }
    if (h > pivot)
    {
        for (k = j; k <= high; k++)
        {
            b[i] = a[k];
            i++;
        }
    }
    else
    {
        for (k = h; k <= pivot; k++)
        {
            b[i] = a[k];
            i++;
        }
    }
    for (k = low; k <= high; k++) a[k] = b[k];
}
MERGE SORT:
1) Split the array in half
2) Sort the left half
3) Sort the right half
4) Merge the two halves together
I think the "sort" function name in MergeSort is a bit of a misnomer, it should really be called "divide".
Here is a visualization of the algorithm in process.
Each time the function recurses, it's working on a smaller and smaller subdivision of the input array, starting with the left half of it. Each time the function returns from recursion, it will continue on and either start working on the right half, or recurse up again and work on a larger half.
Like this
[************************]mergesort
[************]mergesort(lo,mid)
[******]mergesort(lo,mid)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
  [**]mergesort(mid+1,hi)
[***]merge
   [***]mergesort(mid+1,hi)
   [**]mergesort(lo,mid)
     [**]mergesort(mid+1,hi)
   [***]merge
[******]merge
      [******]mergesort(mid+1,hi)
      [***]mergesort(lo,mid)
      [**]mergesort(lo,mid)
        [**]mergesort(mid+1,hi)
      [***]merge
         [***]mergesort(mid+1,hi)
         [**]mergesort(lo,mid)
           [**]mergesort(mid+1,hi)
         [***]merge
      [******]merge
[************]merge
            [************]mergesort(mid+1,hi)
            [******]mergesort(lo,mid)
            [***]mergesort(lo,mid)
            [**]mergesort(lo,mid)
              [**]mergesort(mid+1,hi)
            [***]merge
               [***]mergesort(mid+1,hi)
               [**]mergesort(lo,mid)
                 [**]mergesort(mid+1,hi)
               [***]merge
            [******]merge
                  [******]mergesort(mid+1,hi)
                  [***]mergesort(lo,mid)
                  [**]mergesort(lo,mid)
                    [**]mergesort(mid+1,hi)
                  [***]merge
                     [***]mergesort(mid+1,hi)
                     [**]mergesort(lo,mid)
                       [**]mergesort(mid+1,hi)
                     [***]merge
                  [******]merge
            [************]merge
[************************]merge
An obvious thing to do would be to try this merge sort on a small array, say size 8 (power of 2 is convenient here), on paper. Pretend you are a computer executing the code, and see if it starts to become a bit clearer.
Your question is a bit ambiguous because you don't explain what you find confusing, but it sounds like you are trying to unroll the recursive calls in your head. Which may or may not be a good thing, but I think it can easily lead to having too much in your head at once. Instead of trying to trace the code from start to end, see if you can understand the concept abstractly. Merge sort:
(1) Splits the array in half
(2) Sorts the left half
(3) Sorts the right half
(4) Merges the two halves together
(1) should be fairly obvious and intuitive to you. For step (2) the key insight is this: the left half of an array... is an array. Assuming your merge sort works, it should be able to sort the left half of the array. Right? Step (4) is actually a pretty intuitive part of the algorithm. An example should make it trivial:
at the start
left: [1, 3, 5], right: [2, 4, 6, 7], out: []
after step 1
left: [3, 5], right: [2, 4, 6, 7], out: [1]
after step 2
left: [3, 5], right: [4, 6, 7], out: [1, 2]
after step 3
left: [5], right: [4, 6, 7], out: [1, 2, 3]
after step 4
left: [5], right: [6, 7], out: [1, 2, 3, 4]
after step 5
left: [], right: [6, 7], out: [1, 2, 3, 4, 5]
after step 6
left: [], right: [7], out: [1, 2, 3, 4, 5, 6]
at the end
left: [], right: [], out: [1, 2, 3, 4, 5, 6, 7]
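The step-by-step trace above is exactly what a merge loop does; a minimal C++ sketch (an illustration, not code from the question) for merging two already-sorted vectors:

#include <vector>

std::vector<int> merge(const std::vector<int>& left, const std::vector<int>& right)
{
    std::vector<int> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size())           // pick the smaller front element
        out.push_back(left[i] <= right[j] ? left[i++] : right[j++]);
    while (i < left.size())  out.push_back(left[i++]);    // copy leftovers from the left
    while (j < right.size()) out.push_back(right[j++]);   // copy leftovers from the right
    return out;
}

Calling merge({1, 3, 5}, {2, 4, 6, 7}) reproduces the trace above and returns {1, 2, 3, 4, 5, 6, 7}.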
So assuming that you understand (1) and (4), another way to think of merge sort would be this. Imagine someone else wrote mergesort() and you're confident that it works. Then you could use that implementation of mergesort() to write:
sort(myArray)
{
    leftHalf  = myArray.subArray(0, myArray.Length/2);
    rightHalf = myArray.subArray(myArray.Length/2 + 1, myArray.Length - 1);

    sortedLeftHalf  = mergesort(leftHalf);
    sortedRightHalf = mergesort(rightHalf);

    sortedArray = merge(sortedLeftHalf, sortedRightHalf);
}
Note that sort doesn't use recursion. It just says "sort both halves and then merge them". If you understood the merge example above then hopefully you see intuitively that this sort function seems to do what it says... sort.
Now, if you look at it more carefully... sort() looks pretty much exactly like mergesort()! That's because it is mergesort() (except it doesn't have base cases because it's not recursive!).
But that's how I like thinking of recursive functions--assume that the function works when you call it. Treat it as a black box that does what you need it to. When you make that assumption, figuring out how to fill in that black box is often easy. For a given input, can you break it down into smaller inputs to feed to your black box? After you solve that, the only thing that's left is handling the base cases at the start of your function (which are the cases where you don't need to make any recursive calls. For example, mergesort([]) just returns an empty array; it doesn't make a recursive call to mergesort()).
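Filling in that black box, with the base case added, might look like the following sketch (a C++ illustration reusing the merge() helper from the previous snippet; it is not the question's in-place version):

std::vector<int> mergesort(const std::vector<int>& a)
{
    if (a.size() <= 1)                                    // base case: nothing left to split
        return a;
    std::size_t mid = a.size() / 2;
    std::vector<int> left(a.begin(), a.begin() + mid);    // copy of the left half
    std::vector<int> right(a.begin() + mid, a.end());     // copy of the right half
    return merge(mergesort(left), mergesort(right));      // sort both halves, then merge them
}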
Finally, this is a bit abstract, but a good way to understand recursion is actually to write mathematical proofs using induction. The same strategy used to write a proof by induction is used to write a recursive function:
Math proof:
Show the claim is true for the base cases
Assume it is true for inputs smaller than some n
Use that assumption to show that it is still true for an input of size n
Recursive function:
Handle the base cases
Assume that your recursive function works on inputs smaller than some n
Use that assumption to handle an input of size n
Concerning the recursion part of the merge sort, I've found this page to be very very helpful. You can follow the code as it's being executed. It shows you what gets executed first, and what follows next.
Tom
mergesort() simply divides the array into two halves until the if condition fails, that is, until low < high no longer holds. As you are calling mergesort() twice, once with low to pivot and a second time with pivot+1 to high, the sub-arrays get divided further and further.
Let's take an example:
a[] = {9,7,2,5,6,3,4}
pivot = (0+6)/2 (which will be 3)
=> the first mergesort call will recurse with the array {9,7,2,5} : Left Array
=> the second will get the array {6,3,4} : Right Array
This repeats until you have 1 element in each left as well as right array.
In the end you'll have something similar to this:
L : {9} {7} {2} {5}   R : {6} {3} {4}   (each L and R will have further sub L and R)
=> which on the calls to merge will become
L( L{7,9} R{2,5} ) : R( L{3,6} R{4} )
As you can see, each sub-array gets sorted in the merge function.
=> on the next call to merge, the L and R sub-arrays get in order
L{2,5,7,9} : R{3,4,6}
Now both the L and R sub-arrays are sorted within themselves.
On the last call to merge they'll be merged in order.
The final array will be sorted => {2,3,4,5,6,7,9}
See the merging steps in the answer given by @roliu
My apologies if this has been answered this way. I acknowledge that this is just a sketch, rather than a deep explanation.
While it is not obvious to see how the actual code maps to the recursion, I was able to understand the recursion in a general sense this way.
Take the example unsorted set {2,9,7,5} as input. The merge_sort algorithm is denoted by "ms" for brevity below. Then we can sketch the operation as:
step 1: ms( ms( ms(2),ms(9) ), ms( ms(7),ms(5) ) )
step 2: ms( ms({2},{9}), ms({7},{5}) )
step 3: ms( {2,9}, {5,7} )
step 4: {2,5,7,9}
It is important to note that merge_sort of a singlet (like {2}) is simply the singlet (ms(2) = {2}), so that at the deepest level of recursion we get our first answer. The remaining answers then tumble like dominoes as the interior recursions finish and are merged together.
Part of the genius of the algorithm is the way it builds the recursive formula of step 1 automatically through its construction. What helped me was the exercise of thinking how to turn step 1 above from a static formula to a general recursion.
Trying to work out each and every step of a recursion is often not an ideal approach, but for beginners, it definitely helps to understand the basic idea behind recursion, and also to get better at writing recursive functions.
Here's a C solution to Merge Sort :-
#include <stdio.h>
#include <stdlib.h>

void merge_sort(int *, unsigned);
void merge(int *, int *, int *, unsigned, unsigned);

int main(void)
{
    unsigned size;
    printf("Enter the no. of integers to be sorted: ");
    scanf("%u", &size);

    int * arr = (int *) malloc(size * sizeof(int));
    if (arr == NULL)
        exit(EXIT_FAILURE);

    printf("Enter %u integers: ", size);
    for (unsigned i = 0; i < size; i++)
        scanf("%d", &arr[i]);

    merge_sort(arr, size);

    printf("\nSorted array: ");
    for (unsigned i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");

    free(arr);
    return EXIT_SUCCESS;
}

void merge_sort(int * arr, unsigned size)
{
    if (size > 1)
    {
        unsigned left_size = size / 2;
        int * left = (int *) malloc(left_size * sizeof(int));
        if (left == NULL)
            exit(EXIT_FAILURE);
        for (unsigned i = 0; i < left_size; i++)
            left[i] = arr[i];

        unsigned right_size = size - left_size;
        int * right = (int *) malloc(right_size * sizeof(int));
        if (right == NULL)
            exit(EXIT_FAILURE);
        for (unsigned i = 0; i < right_size; i++)
            right[i] = arr[i + left_size];

        merge_sort(left, left_size);
        merge_sort(right, right_size);

        merge(arr, left, right, left_size, right_size);

        free(left);
        free(right);
    }
}

/*
   This merge() function takes a target array (arr) and two sorted arrays (left and right),
   all three of them allocated beforehand in some other function(s).
   It then merges the two sorted arrays (left and right) into a single sorted array (arr).
   It should be ensured that the size of arr is equal to the size of left plus the size of right.
*/
void merge(int * arr, int * left, int * right, unsigned left_size, unsigned right_size)
{
    unsigned i = 0, j = 0, k = 0;

    while ((i < left_size) && (j < right_size))
    {
        if (left[i] <= right[j])
            arr[k++] = left[i++];
        else
            arr[k++] = right[j++];
    }
    while (i < left_size)
        arr[k++] = left[i++];
    while (j < right_size)
        arr[k++] = right[j++];
}
Here's the step-by-step explanation of the recursion :-
Let arr be [1,4,0,3,7,9,8], having the address 0x0000.
In main(), merge_sort(arr, 7) is called, which is the same as merge_sort(0x0000, 7).
After all of the recursions are completed, arr (0x0000) becomes [0,1,3,4,7,8,9].
| | |
| | |
| | |
| | |
| | |
arr - 0x0000 - [1,4,0,3,7,9,8] | | |
size - 7 | | |
| | |
left = malloc() - 0x1000a (say) - [1,4,0] | | |
left_size - 3 | | |
| | |
right = malloc() - 0x1000b (say) - [3,7,9,8] | | |
right_size - 4 | | |
| | |
merge_sort(left, left_size) -------------------> | arr - 0x1000a - [1,4,0] | |
| size - 3 | |
| | |
| left = malloc() - 0x2000a (say) - [1] | |
| left_size = 1 | |
| | |
| right = malloc() - 0x2000b (say) - [4,0] | |
| right_size = 2 | |
| | |
| merge_sort(left, left_size) -------------------> | arr - 0x2000a - [1] |
| | size - 1 |
| left - 0x2000a - [1] <-------------------------- | (0x2000a has only 1 element) |
| | |
| | |
| merge_sort(right, right_size) -----------------> | arr - 0x2000b - [4,0] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000a (say) - [4] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000b (say) - [0] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000a - [4]
| | | size - 1
| | left - 0x3000a - [4] <-------------------------- | (0x3000a has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000b - [0]
| | | size - 1
| | right - 0x3000b - [0] <------------------------- | (0x3000b has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000b, 0x3000a, 0x3000b, 1, 1) |
| right - 0x2000b - [0,4] <----------------------- | (0x2000b is now sorted) |
| | |
| | free(left) (0x3000a is now freed) |
| | free(right) (0x3000b is now freed) |
| | |
| | |
| merge(arr, left, right, left_size, right_size) | |
| i.e. merge(0x1000a, 0x2000a, 0x2000b, 1, 2) | |
left - 0x1000a - [0,1,4] <---------------------- | (0x1000a is now sorted) | |
| | |
| free(left) (0x2000a is now freed) | |
| free(right) (0x2000b is now freed) | |
| | |
| | |
merge_sort(right, right_size) -----------------> | arr - 0x1000b - [3,7,9,8] | |
| size - 4 | |
| | |
| left = malloc() - 0x2000c (say) - [3,7] | |
| left_size = 2 | |
| | |
| right = malloc() - 0x2000d (say) - [9,8] | |
| right_size = 2 | |
| | |
| merge_sort(left, left_size) -------------------> | arr - 0x2000c - [3,7] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000c (say) - [3] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000d (say) - [7] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000c - [3]
| left - [3,7] was already sorted, but | | size - 1
| that doesn't matter to this program. | left - 0x3000c - [3] <-------------------------- | (0x3000c has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000d - [7]
| | | size - 1
| | right - 0x3000d - [7] <------------------------- | (0x3000d has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000c, 0x3000c, 0x3000d, 1, 1) |
| left - 0x2000c - [3,7] <------------------------ | (0x2000c is now sorted) |
| | |
| | free(left) (0x3000c is now freed) |
| | free(right) (0x3000d is now freed) |
| | |
| | |
| merge_sort(right, right_size) -----------------> | arr - 0x2000d - [9,8] |
| | size - 2 |
| | |
| | left = malloc() - 0x3000e (say) - [9] |
| | left_size = 1 |
| | |
| | right = malloc() - 0x3000f (say) - [8] |
| | right_size = 1 |
| | |
| | merge_sort(left, left_size) -------------------> | arr - 0x3000e - [9]
| | | size - 1
| | left - 0x3000e - [9] <-------------------------- | (0x3000e has only 1 element)
| | |
| | |
| | merge_sort(right, right_size) -----------------> | arr - 0x3000f - [8]
| | | size - 1
| | right - 0x3000f - [8] <------------------------- | (0x3000f has only 1 element)
| | |
| | |
| | merge(arr, left, right, left_size, right_size) |
| | i.e. merge(0x2000d, 0x3000e, 0x3000f, 1, 1) |
| right - 0x2000d - [8,9] <----------------------- | (0x2000d is now sorted) |
| | |
| | free(left) (0x3000e is now freed) |
| | free(right) (0x3000f is now freed) |
| | |
| | |
| merge(arr, left, right, left_size, right_size) | |
| i.e. merge(0x1000b, 0x2000c, 0x2000d, 2, 2) | |
right - 0x1000b - [3,7,8,9] <------------------- | (0x1000b is now sorted) | |
| | |
| free(left) (0x2000c is now freed) | |
| free(right) (0x2000d is now freed) | |
| | |
| | |
merge(arr, left, right, left_size, right_size) | | |
i.e. merge(0x0000, 0x1000a, 0x1000b, 3, 4) | | |
(0x0000 is now sorted) | | |
| | |
free(left) (0x1000a is now freed) | | |
free(right) (0x1000b is now freed) | | |
| | |
| | |
| | |
I know this is an old question, but I wanted to share the thoughts that helped me understand merge sort.
There are two big parts to merge sort
Splitting of the array into smaller chunks (dividing)
Merging the array together (conquering)
The role of the recursion is simply the dividing portion.
I think what confuses most people is that they think there is a lot of logic in the splitting and in determining what to split, but most of the actual sorting logic happens on the merge. The recursion is simply there to divide and do the first half, and then the second half is really just looping and copying things over.
I see some answers that mention pivots but I would recommend not associating the word "pivot" with merge sort because that's an easy way to confuse merge sort with quicksort (which is heavily reliant on choosing a "pivot"). They are both "divide and conquer" algorithms. For merge sort the division always happens in the middle whereas for quicksort you can be clever with the division when choosing an optimal pivot.
process to divide the problem into subproblems
The given example will help you understand recursion. int A[] = { numbers to be sorted }, int p = 0 (lower index), int r = A.length - 1 (higher index).
class DivideConqure1 {

    void devide(int A[], int p, int r) {
        if (p < r) {
            int q = (p + r) / 2;  // divide the problem into sub problems
            devide(A, p, q);      // divide the left problem into sub problems
            devide(A, q + 1, r);  // divide the right problem into sub problems
            merger(A, p, q, r);   // merge the sub problems
        }
    }

    void merger(int A[], int p, int q, int r) {
        int L[] = new int[q - p + 1];
        int R[] = new int[r - q + 0];
        int a1 = 0;
        int b1 = 0;
        for (int i = p; i <= q; i++) {      // store the left sub problem in the left temp array
            L[a1] = A[i];
            a1++;
        }
        for (int i = q + 1; i <= r; i++) {  // store the right sub problem in the right temp array
            R[b1] = A[i];
            b1++;
        }
        int a = 0;
        int b = 0;
        int c = 0;
        for (int i = p; i < r; i++) {
            if (a < L.length && b < R.length) {
                c = i + 1;
                if (L[a] <= R[b]) {         // compare left element <= right element
                    A[i] = L[a];
                    a++;
                } else {
                    A[i] = R[b];
                    b++;
                }
            }
        }
        if (a < L.length)
            for (int i = a; i < L.length; i++) {
                A[c] = L[i];                // store remaining elements of the left temp array into the main problem
                c++;
            }
        if (b < R.length)
            for (int i = b; i < R.length; i++) {
                A[c] = R[i];                // store remaining elements of the right temp array into the main problem
                c++;
            }
    }
}
When you call the recursive method, it does not execute the real function immediately; the call is pushed onto the stack memory. And when the condition is not satisfied, execution goes on to the next line.
Consider that this is your array:
int a[] = {10,12,9,13,8,7,11,5};
So your method merge sort will work like below:
mergeSort(arr a, arr empty, 0 , 7);
mergeSort(arr a, arr empty, 0, 3);
mergeSort(arr a, arr empty,2,3);
mergeSort(arr a, arr empty, 0, 1);
after this `(low + high) / 2 == 0`, so it will come out of the first call and go on to the next:
mergeSort(arr a, arr empty, 0+1,1);
for this also `(low + high) / 2 == 0`, so it will come out of the 2nd call as well and then call:
merger(arr a, arr empty,0,0,1);
merger(arr a, arr empty,0,3,1);
.
.
So on
So all the sorted values get stored in the empty arr.
It might help to understand how the recursive function works.

Antlr parsing hexadecimal number

I have the following grammar for parsing time expressions like '1 day 2 hour'.
time : timeLiteral
| FLOAT
| INTEGER
;
timeLiteral : dayExpr hourExpr? minuteExpr? ;
dayExpr : timeVal ('days' | 'day') ;
hourExpr : timeVal ('hour'|'hours') ;
minuteExpr : timeVal ('minutes' | 'minute') ;
timeVal : INTEGER | FLOAT ;
INTEGER : '0' 'x' (HEXDIGIT)+
| (DIGIT)+
;
FLOAT : '.' (DIGIT)+ (EXPONENT)?
| ((DIGIT)+ '.') => (DIGIT)+ '.' (DIGIT)* (EXPONENT)?
| (DIGIT)+ ('.' (DIGIT)*)? EXPONENT
;
fragment DIGIT : '0'..'9';
fragment HEXDIGIT : 'a'..'f' | 'A'..'F' | DIGIT;
fragment EXPONENT : ('e' | 'E') ('+' | '-')? (DIGIT)+;
This works fine for expressions like '1day 2hour' but fails when hexadecimal values are used with 'day' without a space, like '0x23day 0x56hour'. In this case the lexer creates an INTEGER token with the value '0x23da'. So instead of getting INTEGER, 'day', the parser is getting INTEGER, 'y'.
What can I do to make the lexer give me 0x23 as INTEGER and 'day'?
