OpenMP pragma translation to runtime calls - gcc

I wrote a short C program with an OpenMP pragma, and I need to know which libgomp function GCC translates the pragma into.
Here is my marvelous code:
#include <stdio.h>
#include "omp.h"
int main(int argc, char** argv)
{
int k = 0;
#pragma omp parallel private(k) num_threads(4)
{
k = omp_get_thread_num();
printf("Hello World from %d !\n", k);
}
return 0;
}
In order to generate intermediate language from GCC v8.2.0, I compiled this program with the following command:
gcc -fopenmp -o hello.exe hello.c -fdump-tree-ompexp
And the result is given by:
;; Function main (main, funcdef_no=0, decl_uid=2694, cgraph_uid=0, symbol_order=0)
OMP region tree
bb 2: gimple_omp_parallel
bb 3: GIMPLE_OMP_RETURN
Added new low gimple function main._omp_fn.0 to callgraph
Introduced new external node (omp_get_thread_num/2).
Introduced new external node (printf/3).
;; Function main._omp_fn.0 (main._omp_fn.0, funcdef_no=1, decl_uid=2700, cgraph_uid=1, symbol_order=1)
main._omp_fn.0 (void * .omp_data_i)
{
int k;
<bb 6> :
<bb 3> :
k = omp_get_thread_num ();
printf ("Hello World from %d !\n", k);
return;
}
;; Function main (main, funcdef_no=0, decl_uid=2694, cgraph_uid=0, symbol_order=0)
Merging blocks 2 and 7
Merging blocks 2 and 4
main (int argc, char * * argv)
{
int k;
int D.2698;
<bb 2> :
k = 0;
__builtin_GOMP_parallel (main._omp_fn.0, 0B, 4, 0);
D.2698 = 0;
<bb 3> :
<L0>:
return D.2698;
}
The call to "__builtin_GOMP_parallel" is what interests me, so I looked at the libgomp source code in GCC.
However, the only functions I found were these (in parallel.c):
GOMP_parallel_start (void (*fn) (void *), void *data, unsigned num_threads)
GOMP_parallel_end (void)
So I imagine that the call to "__builtin_GOMP_parallel" is somehow transformed into GOMP_parallel_start and GOMP_parallel_end.
How can I verify this assumption? How can I find the translation from the builtin function to the two functions I found in the source code?
Thank you

You almost got it. __builtin_GOMP_parallel is just a compiler alias for GOMP_parallel (defined in omp-builtins.def), and it is translated very late in compilation. You can see the actual call in the assembly produced by gcc -S.
GOMP_parallel is similar to
GOMP_parallel_start(...);
fn(...);
GOMP_parallel_end();
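To make that start / fn / end shape concrete, here is a toy sketch built on pthreads. It is not libgomp code: every toy_* name, the fixed-size worker array, and run_toy_region are hypothetical and exist only for illustration, and it assumes a POSIX system with C11 atomics. outlined_body stands in for the compiler-generated main._omp_fn.0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define TOY_MAX_THREADS 16

/* Hypothetical toy runtime state; NOT libgomp's real data structures. */
struct toy_task { void (*fn)(void *); void *data; };
static struct toy_task toy_current;
static pthread_t toy_workers[TOY_MAX_THREADS];
static unsigned toy_nworkers;

/* Adapts the void(*)(void*) region body to the pthread entry signature. */
static void *toy_trampoline(void *arg)
{
    struct toy_task *task = arg;
    task->fn(task->data);
    return NULL;
}

/* Plays the role of GOMP_parallel_start: launch num_threads - 1 workers. */
static void toy_parallel_start(void (*fn)(void *), void *data, unsigned num_threads)
{
    assert(num_threads >= 1 && num_threads <= TOY_MAX_THREADS);
    toy_current.fn = fn;
    toy_current.data = data;
    toy_nworkers = 0;
    for (unsigned i = 1; i < num_threads; i++) {
        if (pthread_create(&toy_workers[toy_nworkers], NULL,
                           toy_trampoline, &toy_current) == 0)
            toy_nworkers++;
    }
}

/* Plays the role of GOMP_parallel_end: wait for every worker to finish. */
static void toy_parallel_end(void)
{
    for (unsigned i = 0; i < toy_nworkers; i++)
        pthread_join(toy_workers[i], NULL);
    toy_nworkers = 0;
}

/* Plays the role of GOMP_parallel: start, run the body on the calling
   thread as well, then end. */
static void toy_parallel(void (*fn)(void *), void *data, unsigned num_threads)
{
    toy_parallel_start(fn, data, num_threads);
    fn(data);
    toy_parallel_end();
}

/* Stand-in for the outlined region body (main._omp_fn.0 in the dump). */
static void outlined_body(void *data)
{
    atomic_int *counter = data;
    atomic_fetch_add(counter, 1);
}

/* Runs one toy parallel region and reports how many threads executed it. */
static int run_toy_region(unsigned num_threads)
{
    atomic_int counter = 0;
    toy_parallel(outlined_body, &counter, num_threads);
    return atomic_load(&counter);
}
```

Calling run_toy_region(4) from main() runs the body once on the calling thread and once on each of the three workers, mirroring the 4 passed to __builtin_GOMP_parallel in the dump.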

Related

multiple compilation and main() function in c

I have a problem with how gcc compiles multiple files.
My goal is to have a program (see script1.c) that loads (compiles?) the functions from script2.c and script3.c on each run.
This is what my files should look like (the names of the functions in script2.c and script3.c, two main() and one init(), must not be changed).
script1.c:
int main(int argc, char *argv[]){
printf("loaded main() in script1.c\n");
int ret = main(argc, argv); // main() script2.c
if(ret == 0){
init(argc, argv); // init() script3.c
ret = main(argc, argv); // main() script3.c
if(ret == 0){
ret = main(argc, argv); // main() script3.c
}
}
return(ret);
}
script2.c:
int main(int argc, char *argv[]){
printf("loaded main() in script2.c\n");
return(0);
}
script3.c:
void init(int argc, char *argv[]){
...
argv[0] = (char*)strdup("OK");
printf("loaded init() in script3.c\n");
}
int main(int argc, char *argv[]){
...
if(strcmp(argv[0], "OK") == 0){
printf("loaded main() in script3.c\n");
}
return(0);
}
I would like the output to be:
loaded main() in script1.c
loaded main() in script2.c
loaded init() in script3.c
loaded main() in script3.c
loaded main() in script3.c
As written, main() logically just calls itself in a loop.
Here are the different methods I have tried:
Change the name of the main() function of script1.c to _start(), and use the "-Wl,--allow-multiple-definition -nostartfiles" option during compilation.
Result: error; the main() function of script2.c is logically initialized twice.
Use the system() and exec functions in script1.c to compile/run script2.c and script3.c.
Result: error; script3.c executes main() before init(). I think I would need a working equivalent of the "-Wl, -init, init" option for this to work.
Add __attribute__((constructor)) to the init() function so that it runs before the main() of script3.c. The problem is that, in my example, the init() function would then be launched twice:
Here is an example of something that may resemble what you are trying to do. I have one main function as well as three helper functions, aptly named func1, func2 and func3. They all reside in the same directory. Each function is in its own source file, named after the function for clarity (but the source file name is arbitrary). main(), the program's "entry point", calls func1, then func2; func2, in turn, calls func3. The parameters to main() are faithfully passed on each time, and output.
In a typical program one would also have a header file that contains the function declarations for all three helper functions; but in this small example one can simply write them manually.
func1.c:
#include <stdio.h>
void func1(int argc, char **argv)
{
printf("func1: Called with the following arguments:\n");
for(int i = 0; i<argc; i++)
{
printf("func1, arg. %d (index %d): ->%s<-\n", i+1, i, argv[i]);
}
printf("func1: Returning\n");
}
func2.c:
#include <stdio.h>
extern void func3(int, char **);
void func2(int argc, char **argv)
{
printf("func2: Calling func3\n");
func3(argc, argv);
printf("func2: returning\n");
}
func3.c:
#include <stdio.h>
void func3(int argc, char **argv)
{
printf("func3: Called with the following arguments:\n");
for(int i = 0; i<argc; i++)
{
printf("func3, Arg. %d (index %d): ->%s<-\n", i+1, i, argv[i]);
}
printf("func3: Returning\n");
}
And main.c:
#include <stdio.h>
extern void func1(int, char **);
extern void func2(int, char **);
int main(int argc, char **argv)
{
printf("main(): Called with the following arguments:\n");
for(int i = 0; i<argc; i++)
{
printf("Arg. %d (index %d): ->%s<-\n", i+1, i, argv[i]);
}
printf("********* main(): Calling func1 *********\n");
func1(argc, argv);
printf("********* main(): Calling func2 *********\n");
func2(argc, argv);
}
Sample session: Compile with
$ gcc -Wall -o main main.c func1.c func2.c func3.c
and then execute with e.g.
$ ./main 1 2 3
Output:
main(): Called with the following arguments:
Arg. 1 (index 0): ->./main<-
Arg. 2 (index 1): ->1<-
Arg. 3 (index 2): ->2<-
Arg. 4 (index 3): ->3<-
********* main(): Calling func1 *********
func1: Called with the following arguments:
func1, arg. 1 (index 0): ->./main<-
func1, arg. 2 (index 1): ->1<-
func1, arg. 3 (index 2): ->2<-
func1, arg. 4 (index 3): ->3<-
func1: Returning
********* main(): Calling func2 *********
func2: Calling func3
func3: Called with the following arguments:
func3, Arg. 1 (index 0): ->./main<-
func3, Arg. 2 (index 1): ->1<-
func3, Arg. 3 (index 2): ->2<-
func3, Arg. 4 (index 3): ->3<-
func3: Returning
func2: returning
You can see that the first argument to the program is put in there by the command shell and the C runtime library: it is the string with which the program was called, as is customary, here "./main". Only after that come the explicit command line arguments "1", "2" and "3".

Raspberry PI4 Segmentation Fault

I have a short program that causes a segmentation fault on an RPi4 when run several times (e.g. 10 times in a loop).
I am using Raspbian GNU/Linux 10 (buster) and default gcc compiler (sudo apt install build-essential)
gcc --version
gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
Do you think this is a gcc compiler problem? Maybe I am missing some special settings for RPi4.
I am using this to build:
gcc threads.c -o threads -l pthread
The output is sometimes (not always) something like this:
...
in thread_dummy, loop: 003
Segmentation fault
The code is here:
#include <stdio.h> /* for puts() */
#include <unistd.h> /* for sleep() */
#include <stdlib.h> /* for EXIT_SUCCESS */
#include <pthread.h>
#define PTR_SIZE (0xFFFFFF)
#define PTR_CNT (10)
void* thread_dummy(void* param)
{
void* ptr = malloc(PTR_SIZE);
//fprintf(stderr, "thread num: %03i, stack: %08X, heap: %08X - %08X\n", (int)param, (unsigned int)&param, (unsigned int)ptr, (unsigned int)((unsigned char*)ptr + PTR_SIZE));
fprintf(stderr, "in thread_dummy, loop: %03i\n", (int)param);
sleep(1);
free(ptr);
pthread_detach(pthread_self());
return NULL;
}
int main(void)
{
void* ptrs[PTR_CNT];
pthread_t threads[PTR_CNT];
for(int i=0; i<PTR_CNT; ++i)
{
ptrs[i] = malloc(PTR_SIZE);
//fprintf(stderr, "main num: %03i, stack: %08X, heap: %08X - %08X\n", i, (unsigned int)&ptrs, (unsigned int)ptrs[i], (unsigned int)((unsigned char*)ptrs[i] + PTR_SIZE));
fprintf(stderr, "in main, loop: %03i\n", i);
}
fprintf(stderr, "-----------------------------------------------------------\n");
for(int i=0; i<PTR_CNT; ++i)
pthread_create(&threads[i], 0, thread_dummy, (void*)i);
for(int i=0; i<PTR_CNT; ++i)
pthread_join(threads[i], NULL);
for(int i=0; i<PTR_CNT; ++i)
free(ptrs[i]);
return EXIT_SUCCESS;
}
UPDATE:
I also tested it with new gcc, but the problem remains...
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/arm-linux-gnueabihf/11.1.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../configure --enable-languages=c,c++,fortran --with-cpu=cortex-a72 --with-fpu=neon-fp-armv8 --with-float=hard --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.1.0 (GCC)
pthread_create is like malloc, and pthread_detach or pthread_join is like free. You are basically doing something like a "double free": you detach a thread and join it at the same time. Either detach the thread or join it, not both.
You could remove pthread_join from main, but then main may return, and end the process, before the threads have finished. The more logical fix is to remove pthread_detach(...) from inside the thread, where it is useless anyway because the thread terminates right after.
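A minimal sketch of the join-only variant follows. The names run_joined_threads, MAX_THREADS and BUF_SIZE are made up for this example, and the sleep(1) from the question is dropped to keep it short. It also passes the loop index through intptr_t, which avoids the int-to-pointer size mismatch warning on 64-bit targets.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 64
#define BUF_SIZE    0xFFFFFF

static void *thread_dummy(void *param)
{
    void *buf = malloc(BUF_SIZE);
    fprintf(stderr, "in thread_dummy, loop: %03d\n", (int)(intptr_t)param);
    free(buf);
    /* No pthread_detach() here: the creating thread joins this one. */
    return NULL;
}

/* Creates `count` joinable threads and joins each exactly once.
   Returns the number of threads that were created and joined. */
static int run_joined_threads(int count)
{
    assert(count >= 0 && count <= MAX_THREADS);

    pthread_t threads[MAX_THREADS];
    int created[MAX_THREADS] = {0};
    int joined = 0;

    for (int i = 0; i < count; ++i)
        created[i] = (pthread_create(&threads[i], NULL, thread_dummy,
                                     (void *)(intptr_t)i) == 0);

    /* Join only the threads that were actually created. */
    for (int i = 0; i < count; ++i)
        if (created[i] && pthread_join(threads[i], NULL) == 0)
            ++joined;

    return joined;
}
```

With the detach removed, each thread is released exactly once, by pthread_join, so run_joined_threads(10) can be called repeatedly from main() without the double release.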

Cannot understand how libgomp implements the FOR construct

According to the libgomp manual, a code in the form:
#pragma omp parallel for
for (i = lb; i <= ub; i++)
body;
becomes
void subfunction (void *data)
{
long _s0, _e0;
while (GOMP_loop_static_next (&_s0, &_e0))
{
long _e1 = _e0, i;
for (i = _s0; i < _e1; i++)
body;
}
GOMP_loop_end_nowait ();
}
GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
subfunction (NULL);
GOMP_parallel_end ();
I did a very tiny program to debug just to see how this implementation works:
int main(int argc, char** argv)
{
int res, i;
# pragma omp parallel for num_threads(4)
for(i = 0; i < 400000; i++)
res = res*argc;
return 0;
}
Next, I ran gdb and set breakpoints on "GOMP_parallel_loop_static" and "GOMP_parallel_end". At first the library was not loaded, so the breakpoints were pending. When I ran the test program inside gdb, I got the result below:
(gdb) run 2 1 6 5 4 3 8 7
Starting program: ./test 2 1 6 5 4 3 8 7
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff73c9700 (LWP 5381)]
[New Thread 0x7ffff6bc8700 (LWP 5382)]
[New Thread 0x7ffff63c7700 (LWP 5383)]
Thread 1 "test" hit Breakpoint 2, 0x00007ffff7bc0c00 in GOMP_parallel_end () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
As you can see, it reached the second breakpoint, in "GOMP_parallel_end", but not the first. I would like to know how this is possible when the libgomp manual clearly shows that "GOMP_parallel_loop_static" comes first.
Thank you.
That part of GCC's documentation has not really been updated regularly, so it's probably a good idea to only read it as an approximation of what is actually happening. If you're interested in that level of detail, I suggest you look at the debug files generated by -fdump-tree-all and similar options.
With a recent version of GCC, your example generates a call to __builtin_GOMP_parallel, which maps to GOMP_parallel. That one internally calls GOMP_parallel_end at the end, so that's what you're seeing, I suppose.
void
GOMP_parallel (void (*fn) (void *), void *data, unsigned num_threads, unsigned int flags)
{
num_threads = gomp_resolve_num_threads (num_threads, 0);
gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
fn (data);
ialias_call (GOMP_parallel_end) ();
}
Of course, patches to update the documentation will be gladly accepted. :-)
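To make the manual's expansion more tangible, here is a sketch of how a static schedule could hand each thread one contiguous [_s0, _e0) range, which is the kind of pair GOMP_loop_static_next fills in. This is an illustration only: toy_static_range and toy_count_iterations are hypothetical names, and the even-split-with-remainder rule is an assumption about one reasonable partitioning, not a copy of libgomp's code.

```c
#include <assert.h>

/* Hypothetical helper, not a libgomp function: computes the half-open
   range [*s0, *e0) of iterations that thread `tid` out of `nthreads`
   would run for the loop `for (i = lb; i < ub_excl; i++)`.
   Returns 1 if the range is non-empty, 0 otherwise. */
static int toy_static_range(long lb, long ub_excl,
                            unsigned tid, unsigned nthreads,
                            long *s0, long *e0)
{
    assert(nthreads > 0 && tid < nthreads);

    long n  = ub_excl > lb ? ub_excl - lb : 0;   /* total iterations */
    long nt = (long)nthreads;
    long t  = (long)tid;
    long q  = n / nt;                            /* base share per thread */
    long r  = n % nt;                            /* first r threads get one extra */

    long begin = lb + t * q + (t < r ? t : r);
    long len   = q + (t < r ? 1 : 0);

    *s0 = begin;
    *e0 = begin + len;
    return len > 0;
}

/* Walks every thread's range in turn and counts the iterations, to check
   that the ranges cover the loop exactly once. */
static long toy_count_iterations(long lb, long ub_excl, unsigned nthreads)
{
    long total = 0;
    for (unsigned tid = 0; tid < nthreads; tid++) {
        long s0, e0;
        if (toy_static_range(lb, ub_excl, tid, nthreads, &s0, &e0))
            for (long i = s0; i < e0; i++)
                total++;
    }
    return total;
}
```

For the 400000-iteration loop in the question with 4 threads, this scheme gives thread 0 the range [0, 100000) and thread 3 the range [300000, 400000).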

MakeCodeWritable

Good afternoon.
I got the code below from a book. I'm trying to execute it, but I don't know what the "first" and "last" parameters of the makeCodeWritable function are, or where I can find them. Can someone help? This code is about a C obfuscation method. I'm using Xcode with the LLVM GCC 4.2 compiler.
#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
typedef unsigned int uint32;
typedef char* caddr_t;
typedef uint32* waddr_t;
#define Tam_celula 64
#define ALIGN __attribute__((aligned(Tam_celula)))
void makeCodeWritable(char* first, char* last) {
char* firstpage = first - ((int)first % getpagesize());
char* lastpage = last - ((int)last % getpagesize());
int pages = (lastpage-firstpage)/getpagesize()+1;
if (mprotect(firstpage,pages*getpagesize(), PROT_READ|PROT_EXEC|PROT_WRITE)==-1) perror("mprotect");
}
void xor(caddr_t from, caddr_t to, int len){
int i;
for(i=0;i<len;i++){
*to ^= *from; from++; to++;
} }
void swap(caddr_t from, caddr_t to, int len){
int i;
for(i=0;i<len;i++){
char t = *from; *from = *to; *to = t; from++; to++;
} }
#define CELLSIZE 64
#define ALIGN asm volatile (".align 64\n");
void P() {
static int firsttime=1; if (firsttime) {
xor(&&cell5,&&cell2,CELLSIZE);
xor(&&cell0,&&cell3,CELLSIZE);
swap(&&cell1,&&cell4,CELLSIZE);
firsttime = 0; }
char* a[] = {&&align0,&&align1,&&align2,&&align3,&&align4,&&align5};
char*next[] ={&&cell0,&&cell1,&&cell2,&&cell3, &&cell4,&&cell5};
goto *next[0];
align0: ALIGN
cell0: printf("SPGM0\n");
xor(&&cell0,&&cell3,3*CELLSIZE);
goto *next[3];
align1: ALIGN
cell1: printf("SPGM2\n"); xor(&&cell0,&&cell3,3*CELLSIZE);
goto *next[4];
align2: ALIGN
cell2: printf("SPGM4\n"); xor(&&cell0,&&cell3,3*CELLSIZE);
goto *next[5];
align3: ALIGN
cell3: printf("SPGM1\n"); xor(&&cell3,&&cell0,3*CELLSIZE);
goto *next[1];
align4: ALIGN
cell4: printf("SPGM3\n"); xor(&&cell3,&&cell0,3*CELLSIZE);
goto *next[2];
align5: ALIGN
cell5: printf("SPGM5\n");
xor(&&cell3,&&cell0,3*CELLSIZE);
}
int main (int argc, char *argv[]) {
makeCodeWritable(...);
P(); P();
}
The first argument should be (char *)P, because it looks like you want to modify code inside function P. The second argument is the end address of function P. You can compile the code first, use objdump -d to find the start and end addresses of P, compute the size of the function, SIZE, and then pass it manually: makeCodeWritable((char *)P, ((char *)P) + SIZE).
The second way is to let the assembler (as) compute the size of function P, but this depends on the assembly language of your platform. Here is a snippet I modified from your code; it should compile and run on x86 and x86_64 with GCC 4.x on Linux.
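Whichever way the end address is obtained, the page rounding inside makeCodeWritable is safer done on uintptr_t than on int, since casting a pointer to int can truncate it on 64-bit targets. A small sketch of just that arithmetic, where page_span is a hypothetical helper name and not part of the book's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes the page-aligned region covering the
   addresses first..last (inclusive). *start is rounded down to a page
   boundary and *length is a whole number of pages. */
static void page_span(uintptr_t first, uintptr_t last, uintptr_t pagesize,
                      uintptr_t *start, size_t *length)
{
    assert(pagesize != 0 && first <= last);

    uintptr_t first_page = first - (first % pagesize);
    uintptr_t last_page  = last  - (last  % pagesize);

    *start  = first_page;
    *length = (size_t)(last_page - first_page + pagesize);
}
```

The results would then be passed on as mprotect((void *)start, length, PROT_READ | PROT_WRITE | PROT_EXEC), with pagesize taken from getpagesize() as in the original function. Some systems may refuse a mapping that is writable and executable at once, in which case mprotect returns -1 and the existing perror call reports it.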
align5: ALIGN
cell5: printf("SPGM5\n");
xor(&&cell3,&&cell0,3*CELLSIZE);
// add a label at the end of function P in the assembly code
asm ("END_P: \n");
;
}
extern char __sizeof__myfunc[];
int main (int argc, char *argv[]) {
// calculate the code size, ending - starting address of P
asm (" __sizeof__myfunc = END_P-P \n");
// you can see the code size of P
printf("code size is %d\n", (unsigned)__sizeof__myfunc);
makeCodeWritable( (char*)P, ((char *)P) + (unsigned)__sizeof__myfunc);
P(); P();
}
With some modifications to support LLVM GCC and as on Mac OS X:
int main (int argc, char *argv[]) {
size_t sizeof__myfunc = 0;
asm volatile ("movq $(_END_P - _P),%0;"
: "=r" (sizeof__myfunc)
: );
printf("%zu\n", sizeof__myfunc);

openMP is not creating threads in visual studio

My OpenMP version did not give any speed boost. I have a dual-core machine and the CPU usage is always 50%. So I tried the sample program given in the Wiki. It looks like the OpenMP compiler (Visual Studio 2008) is not creating more than one thread.
This is the program:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[]) {
int th_id, nthreads;
#pragma omp parallel private(th_id)
{
th_id = omp_get_thread_num();
printf("Hello World from thread %d\n", th_id);
#pragma omp barrier
if ( th_id == 0 ) {
nthreads = omp_get_num_threads();
printf("There are %d threads\n",nthreads);
}
}
return EXIT_SUCCESS;
}
This is the output that I get:
Hello World from thread 0
There are 1 threads
Press any key to continue . . .
There's nothing wrong with the program, so presumably there's some issue with how it's being compiled or run. Is this VS2008 Pro? A quick search suggests OpenMP is not enabled in the Standard edition. Is OpenMP enabled in Properties -> C/C++ -> Language -> OpenMP (i.e., are you compiling with /openmp)? Is the environment variable OMP_NUM_THREADS being set to 1 somewhere when you run this?
If you want to test out your program with more than one thread, there are several constructs for specifying the number of threads in an OpenMP parallel region. They are, in order of precedence:
Evaluation of the if clause
Setting of the num_threads clause
Use of the omp_set_num_threads() library function
Setting of the OMP_NUM_THREADS environment variable
Implementation default
It sounds like your implementation is defaulting to one thread (assuming you don't have OMP_NUM_THREADS=1 set in your environment).
To test with 4 threads, for instance, you could add num_threads(4) to your #pragma omp parallel directive.
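A small sketch of that check is below. The function name team_size is made up for this example. The #ifdef _OPENMP guard is there so the same file still builds when OpenMP is switched off, in which case it simply reports one thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `requested` threads for one parallel region and returns the
   team size the runtime actually provided. Returns 1 when the file is
   compiled without OpenMP support. */
static int team_size(int requested)
{
    assert(requested > 0);
    int nthreads = 1;
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        /* One thread records the team size; the others wait at the
           implicit barrier at the end of the single construct. */
        #pragma omp single
        nthreads = omp_get_num_threads();
    }
#else
    (void)requested;   /* OpenMP disabled: the request is ignored */
#endif
    return nthreads;
}
```

If team_size(4) returns 1, either the file was built without OpenMP enabled (so _OPENMP is undefined) or the runtime limited the team to one thread, which are the two cases the questions above are trying to tell apart.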
As the other answer noted, you won't really see any "speedup" because you aren't exploiting any parallelism. But it is reasonable to want to run a "hello world" program with several threads to test it out.
As mentioned here, http://docs.oracle.com/cd/E19422-01/819-3694/5_compiling.html, I got it working by setting the environment variable OMP_DYNAMIC to FALSE.
Why would you need more than one thread for that program? It's clearly the case that OpenMP realizes that it doesn't need to create an extra thread to run a program with no loops, no code that could run in parallel whatsoever.
Try running some parallel stuff with OpenMP. Something like this:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define CHUNKSIZE 10
#define N 100
int main (int argc, char *argv[])
{
int nthreads, tid, i, chunk;
float a[N], b[N], c[N];
/* Some initializations */
for (i=0; i < N; i++)
a[i] = b[i] = i * 1.0;
chunk = CHUNKSIZE;
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
{
tid = omp_get_thread_num();
if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
printf("Thread %d starting...\n",tid);
#pragma omp for schedule(dynamic,chunk)
for (i=0; i<N; i++)
{
c[i] = a[i] + b[i];
printf("Thread %d: c[%d]= %f\n",tid,i,c[i]);
}
} /* end of parallel section */
}
If you want some hard core stuff, try running one of these.
