Embarrassingly parallel processing using Open MPI and OpenMP - openmp

This may be a silly question, but I am working on an embarrassingly parallel problem: I can divide the work into independent tasks (no communication) that can be performed in parallel.
In a shell script, script.sh, one can use the following:
#!/bin/bash
let MY_ID=${OMPI_COMM_WORLD_RANK}
./a.out $MY_ID
In prog.c we have a simple independent program:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[]){
int myid = atoi(argv[1]);
if(myid%2==0)
printf("\nI am %d and I am an even process\n", myid);
else
printf("\nI am %d and I am an odd process\n", myid);
return 0;
}
And finally, to run the program as 12 separate processes:
mpirun -np 12 script.sh
My question is: is it possible to do the same thing using OpenMP, perhaps with environment variables like OMP_NUM_THREADS?

There is no OpenMP environment variable which, when passed to a program in the way you indicate, is given a different value for each invocation of the program. If you think about how OpenMP works you might conclude, as I do, that this wouldn't make much sense, since an OpenMP program is implemented as a collection of threads within a single process. This is in marked contrast to the MPI model, in which each rank is a separate process, supported by a library for inter-process communication; there it makes sense for each process to have a unique identifier to facilitate communication. When an OpenMP program executes, communication between threads is effected by operations on shared memory locations, not by passing messages to specified threads.
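As a rough sketch of what the shared-memory model looks like for independent tasks, the loop index can play the role that the MPI rank plays in the question. The names classify and run_tasks_omp are made up for illustration, and the parity check merely stands in for real work. Built with -fopenmp the iterations are spread over OMP_NUM_THREADS threads; without that flag the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical per-task work: 0 for an even id, 1 for an odd id. */
static int classify(int id)
{
    return id % 2;
}

/* Run ntasks independent tasks; iteration i plays the role of the MPI rank. */
void run_tasks_omp(int ntasks, int *results)
{
    int i;
    /* Each iteration writes only results[i], so no synchronisation is needed. */
    #pragma omp parallel for
    for (i = 0; i < ntasks; i++)
        results[i] = classify(i);
}
```

The task id here is simply the loop counter, so no environment variable is needed to tell the tasks apart.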
In passing, you're not really writing an MPI program, just using one of the facilities its environment provides for making your shell-script writing a little easier. You could almost as easily write a shell-script to send a different id to each invocation of the program without the MPI environment - and you could do the same for invocations of an OpenMP program. Though why you would do either if your programs really are independent I don't know.
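A minimal sketch of the launcher idea described above, written in C rather than shell so that it matches the rest of the code here. launch_tasks is a hypothetical helper, not part of MPI or OpenMP; it assumes a POSIX system and passes the task id as argv[1], just as script.sh does.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start up to n copies of prog, passing 0..n-1 as argv[1].
   Returns how many of them exited with status 0. */
int launch_tasks(const char *prog, int n)
{
    int started = 0;
    int ok = 0;

    for (int id = 0; id < n; id++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* fork failed: stop launching */
        if (pid == 0) {
            char arg[16];
            snprintf(arg, sizeof arg, "%d", id);
            execlp(prog, prog, arg, (char *)NULL);
            _exit(127);                 /* only reached if exec failed */
        }
        started++;
    }

    /* Collect every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Calling launch_tasks("./a.out", 12) would start twelve copies of the program from the question with ids 0 to 11.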

This is too long to post as a comment. Actually, this kind of parallel processing is useful for ensemble modeling, where your program runs for different initial conditions and each run creates a different folder or output file.
I came up with this:
export OMP_NUM_THREADS=4
g++ prog.cpp -fopenmp
./a.out
in prog.cpp we have:
#include<stdio.h>
#include<stdlib.h>
#include<omp.h>
#include<string>
using std::string;
int main(){
#pragma omp parallel
{
// tid and cmd are declared inside the region so that each thread has its own copy
int tid = omp_get_thread_num();
string cmd = "echo 'this is a test. I am thread :' " + std::to_string(tid);
system(cmd.c_str()); // each thread can create a folder and run a script
// system("./my_script.sh");
}
return 0;
}
with the output as follows:
this is a test. I am thread : 0
this is a test. I am thread : 3
this is a test. I am thread : 2
this is a test. I am thread : 1

Related

Why can't CIVL verify OpenMP programs in my desktop?

I installed CIVL on Ubuntu 14.04 and want to use it to verify OpenMP programs, but I ran into the following problems:
If I use the command 'civl verify file_name.c' to verify an OpenMP program, it just treats the program as an ordinary C program, because I omitted the optional parameter '-ompNoSimplify'.
However, if I use the command 'civl verify -ompNoSimplify file_name.c', it cannot verify any OpenMP program, not even a very simple one such as the following:
#include <stdio.h>
#include <omp.h>
int main()
{
#pragma omp parallel for
for (char i = 'a'; i <= 'z'; i++)
printf("%c\n",i);
return 0;
}
the output of verification

Capture vDSO in strace

I was wondering if there is a way to capture (in other words observe) vDSO calls like gettimeofday in strace.
Also, is there a way to execute a binary without loading linux-vdso.so.1 (a flag or env variable)?
And lastly, what if I write a program that deletes the linux-vdso.so.1 address from the auxiliary vector and then calls execve on my program? Has anyone ever tried that?
You can capture calls to system calls which have been implemented via the vDSO by using ltrace instead of strace. This is because calls to system calls implemented via the vDSO work differently than "normal" system calls and the method strace uses to trace system calls does not work with vDSO-implemented system calls. To learn more about how strace works, check out this blog post I wrote about strace. And, to learn more about how ltrace works, check out this other blog post I wrote about ltrace.
No, it is not possible to execute a binary without loading linux-vdso.so.1. At least, not on my version of libc on Ubuntu precise. It is certainly possible that newer versions of libc/eglibc/etc have added this as a feature but it seems very unlikely. See the next answer for why.
If you delete the address from the auxiliary vector, your binary will probably crash. libc has a piece of code which will first attempt to walk the vDSO ELF object and, if this fails, will fall back to a hardcoded vsyscall address. The only way it will avoid this is if you've compiled glibc with the vDSO disabled.
There is another workaround, though, if you really, really don't want to use the vDSO. You can try using glibc's syscall function and pass in the syscall number for gettimeofday. This will force glibc to call gettimeofday via the kernel instead of the vDSO.
I've included a sample program below illustrating this. You can read more about how system calls work by reading my syscall blog post.
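#define _GNU_SOURCE /* must come before any #include to take effect */
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
int
main(int argc, char *argv[]) {
struct timeval tv;
/* invoke the system call directly, bypassing the vDSO; the timezone argument is left unspecified here, passing NULL explicitly would be safer */
syscall(SYS_gettimeofday, &tv);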
return 0;
}
Compile with gcc -o test test.c and strace with strace -ttTf ./test 2>&1 | grep gettimeofday:
09:57:32.651876 gettimeofday({1467305852, 651888}, {420, 140735905220705}) = 0 <0.000006>
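For the auxiliary-vector part of the question, the relevant entry can at least be inspected from inside a process. The sketch below assumes glibc 2.16 or later, which provides getauxval(); vdso_base is a hypothetical helper name. AT_SYSINFO_EHDR is the auxiliary vector entry that holds the address of the vDSO's ELF header.

```c
#include <elf.h>
#include <sys/auxv.h>

/* Return the address of the vDSO ELF header as recorded in the
   auxiliary vector, or 0 if the entry is not present. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

A return value of 0 would mean the process was started without that entry; any other value is the address at which the vDSO is mapped.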

Creating multiple segment for section using linker script

Could anyone please point out which limit on the alignment value causes multiple segments to be created for a section?
Take the test case below:
#include <stdio.h>
#define SIZE (1 << 11)
int Buffer[SIZE] __attribute__ ((aligned (SIZE * sizeof(int)))) ;
int main (int argc, char * argv[])
{
printf("Test\n");
return 0;
}
If I change the macro definition from
#define SIZE (1 << 11) to #define SIZE (1 << 12)
Before the change we see only two loadable segments; after it we observe three. With GCC 4.8.1 the alignment of the BSS changes from 8K to 16K, which creates three loadable segments.
So can anyone please tell me what changes are needed in the linker script so that only one loadable segment is created for data?
A linker script is put together from two parts:
1. Shell scripts under the ld/emulparams directory, which generate the linker script.
2. The remaining part, which comes from the actual linker source.
Which source applies depends on the linker you are using, i.e. the GNU linker or the gold linker.
The GNU linker script is built as follows:
Under the directory ~/binutils-2013.11/ld/emulparams/ there are architecture-specific shell scripts for the different ELF types and platforms, for example, for i386/VxWorks:
elf_i386_vxworks.sh
The rest, the generic part of the script, comes from the ld/elf sources.
As for segment creation, look at the function
bfd_boolean _bfd_elf_map_sections_to_segments (bfd *abfd, struct bfd_link_info *info)
in the source file "bfd/elf.c".
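To check how many loadable segments a given build actually ends up with, the program headers can be counted directly, which is the same information readelf -l prints. This is only a sketch: count_load_segments is a hypothetical helper, and it assumes a 64-bit ELF file with the same byte order as the host.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count the PT_LOAD program headers of a 64-bit ELF file.
   Returns -1 if the file cannot be read or is not 64-bit ELF. */
int count_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int count = -1;

    if (f == NULL)
        return -1;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;             /* truncated or unreadable header table */
                break;
            }
            if (ph.p_type == PT_LOAD)
                count++;
        }
    }
    fclose(f);
    return count;
}
```

Running it on the test case built with each of the two SIZE values would show whether the extra segment appears with a particular toolchain.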

How to get backtrace without core file?

Suppose a core dump is not generated (for whatever reason) and I want to see the backtrace (the sequence of executed instructions). How can I do that?
/proc/pid/maps stores the memory map of a process.
Is there any file in Linux which stores the user space or kernel space of a process? (Maybe I'm using the wrong words to express this.)
I mean the address-by-address sequence of executed instructions.
To see what the kernel stack for a process looks like at the present time:
sudo cat /proc/PID/stack
If you want to see the user stack for a process, and can get to it while it is still running (even if it is stuck waiting for a system call to return), run gdb and attach to it using its PID. Then use the backtrace command. This will be much more informative if the program was compiled with debug symbols.
If you want a backtrace to be printed in the Linux kernel, use dump_stack().
If you want a backtrace to be printed in user-level C code, implement something like this:
#include <stdlib.h>
#include <stdio.h>
#include <execinfo.h>
#define BACKTRACE_SIZ 64
void show_backtrace (void)
{
void *array[BACKTRACE_SIZ];
size_t size, i;
char **strings;
size = backtrace(array, BACKTRACE_SIZ);
strings = backtrace_symbols(array, size);
if (strings == NULL) // backtrace_symbols returns NULL if it cannot allocate memory
return;
for (i = 0; i < size; i++) {
printf("%p : %s\n", array[i], strings[i]);
}
free(strings); // malloced by backtrace_symbols
}
Then compile the code with the -funwind-tables flag and link with -rdynamic,
as described at http://www.stlinux.com/devel/debug/backtrace

sys_call_table in linux kernel 2.6.18

I am trying to assign the exit system call to a variable with
extern void *sys_call_table[];
real_sys_exit = sys_call_table[__NR_exit];
However, when I run make, the compiler gives me the error
error: ‘__NR_exit’ undeclared (first use in this function)
Any tips would be appreciated :) Thank you
Since you are on a 2.6.x kernel, sys_call_table isn't exported any more.
If you want to avoid the compilation error, try this include:
#include<linux/unistd.h>
However, it will not work. The workaround to "play" with sys_call_table is to find its address in the System.map-XXXX file (located in /boot) with this command:
grep sys_call System.map-2.6.X -i
This will give the address; then this code should allow you to modify the table:
unsigned long *sys_call_table;
sys_call_table = (unsigned long *) simple_strtoul("0xc0318500",NULL,16);
original_mkdir = sys_call_table[__NR_mkdir];
sys_call_table[__NR_mkdir] = mkdir_modificado;
Hope it works for you. I have just tested it under kernel 2.6.24, so it should work for 2.6.18.
Also check this page, it is a very good one:
http://commons.oreilly.com/wiki/index.php/Network_Security_Tools/Modifying_and_Hacking_Security_Tools/Fun_with_Linux_Kernel_Modules
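Rather than copying the address by hand, the lookup can be sketched in user-space C. System.map lines have the form "address type name"; parse_symbol_line is a hypothetical helper that extracts the address for a given symbol, and how the value is then handed to a module (for example as a module parameter) is left open here.

```c
#include <stdio.h>
#include <string.h>

/* Parse one System.map line of the form "address type name".
   If the symbol matches `name`, store its address in *addr and return 1;
   otherwise return 0 and leave *addr untouched. */
int parse_symbol_line(const char *line, const char *name, unsigned long *addr)
{
    unsigned long value;
    char type;
    char symbol[256];

    if (sscanf(line, "%lx %c %255s", &value, &type, symbol) != 3)
        return 0;
    if (strcmp(symbol, name) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Feeding it each line of the System.map file until it returns 1 gives the same address the grep command prints.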
If you haven't included the file syscall.h, you should do that ahead of the reference to __NR_exit. For example,
#include <syscall.h>
#include <stdio.h>
int main()
{
printf("%d\n", __NR_exit);
return 0;
}
which prints:
$ cc t.c
$ ./a.out
60
Some other observations:
If you've already included the file, the usual reasons __NR_exit wouldn't be defined are that the definition was being ignored due to conditional compilation (#ifdef or #ifndef at work somewhere) or because it's being removed elsewhere through a #undef.
If you're writing the code for kernel space, you have a completely different set of headers to use. LXR (http://lxr.linux.no/linux), a searchable, browsable archive of the kernel source, is a helpful resource.
