It's simple to run the preprocessor on C/C++ code:
g++ -E <file>.cpp
This passes the file through the preprocessor and emits the preprocessed code.
I have an OpenCL kernel in a .cl file. How do I achieve the same?
This is what I tried, and it failed:
g++ -E -I. -std=c++11 -g -O3 -march=native -I/path/to/opencl/include/ -Wno-unused-result kernel.cl -L/path/to/opencl/lib/x86_64/ -lOpenCL -lquadmath
g++: warning: kernel.cl: linker input file unused because linking not done
Thanks
OpenCL code can run on a different architecture from the one you are compiling on. You might find that there are differences depending on compile-time settings in the code that depend on the physical configuration of the target.
The most reliable method for generating the preprocessed code for AMD devices is to ask the framework to save the temporary files, including the preprocessed output files.
On Linux, all you need to do for AMD is set an environment variable, i.e.:
export AMD_OCL_BUILD_OPTIONS_APPEND="-save-temps"
When you compile your OpenCL program you will see a few files in /tmp. The one with the .i extension is the preprocessed file. It might differ from the one you would get by running cpp on the host architecture.
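If you only need a rough approximation on the host, you can also force a host compiler to preprocess the .cl file by overriding the language with -x. This is a sketch only: OpenCL-specific built-in macros and types will not be defined, so the output can differ from what the OpenCL compiler actually sees.
gcc -E -x c -I. kernel.cl -o kernel.i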
I want to create a static library libmylib.a from mylib.c/.h and link it into a project, in order to use this library in bootloader code, using the arm-none-eabi-gcc cross compiler on Ubuntu 20.04 LTS.
I have an electronic engineering background, so I'm kind of new in this compiler and linker stuff.
What I know:
I've been searching about this, and found out that '.a' files are just packed '.o' files, and that's it. You can create them using ar on Linux. But I don't know how to manage the dependencies for this '.a' file, for example, or how to link it into the project.
What I want to know:
I really want to understand how it works: how to compile and generate the bin, elf or hex files using these static libraries for ARM with the arm-none-eabi-gcc cross compiler (I found builds for Linux), but I don't know how to search for this properly, or how to learn it in a structured way. If you could help me with this I would be really grateful.
First you create your library objects. Let us say that you have a foo function written in foo.c; then you do:
arm-none-eabi-gcc -c foo.c
The -c option tells the compiler to stop after assembling and go no further, so it produces foo.o instead of trying to link.
Then you need to create the .a file
arm-none-eabi-ar -rc libfoo.a foo.o
This command creates a static library called libfoo.a.
At the end you compile your main with:
arm-none-eabi-gcc main.c -L. -lfoo -o main
Note that with the -l flag we don't write the "lib" prefix or the ".a" suffix; those are added automagically. The -L. flag tells gcc to look in the current folder for library files. Also note that the library comes after main.c on the command line: the GNU linker resolves symbols left to right, so a static library must appear after the objects that use it.
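Putting it all together, here is a minimal end-to-end sketch (foo.c/foo.h and the output names are made up here; a real bare-metal build would additionally need a linker script and startup code for your board):
# compile the library source into an object file
arm-none-eabi-gcc -c foo.c -o foo.o
# archive it into a static library (the s flag also writes the index, so ranlib is not needed)
arm-none-eabi-ar rcs libfoo.a foo.o
# compile main.c and link it against the library
arm-none-eabi-gcc main.c -L. -lfoo -o main.elf
# convert the ELF into a raw binary and an Intel HEX image
arm-none-eabi-objcopy -O binary main.elf main.bin
arm-none-eabi-objcopy -O ihex main.elf main.hex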
If I run the following:
c++ -c --std=c++11 $(includes) -o src/main.o src/main.cpp
nvcc -c -m64 -arch=sm_30 --std=c++11 $(includes) -o src/kernels/add.o src/kernels/add.cu
ar qc src/kernels/libkernels.a src/kernels/add.o
ranlib src/kernels/libkernels.a
c++ -o program -L/usr/local/cuda/lib64 src/main.o src/kernels/libkernels.a -lcudart -lcudadevrt
It works. Shouldn't it fail because I didn't perform a -dlink phase? The Parallel4All blog entry on separate compilation says:
When you use nvcc to link, there is nothing special to do: replace your normal compiler command with nvcc and it will take care of all the necessary steps. However, you may choose to use a compiler driver other than nvcc (such as g++) for the final link step. Since your CPU compiler will not know how to link CUDA device code, you'll have to add a step in your build to have nvcc link the CUDA device code, using the nvcc option -dlink.
nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o
This links all the device object code and places it into gpuCode.o. Note that this does not link the CPU object code. In fact, the CPU object code in v3.o, particle.o, and main.o is discarded in this step. To complete the link to an executable, we can use ld or g++.
g++ gpuCode.o main.o particle.o v3.o -lcudart -o app
Does the use of a .a library somehow make up for the lack of "device code linking"?
PS - I'm using CUDA 8.0.61 on Linux Mint 18.2
Device code linking is not required in all scenarios. (This must be true, because prior to CUDA 5.0 there was no device code linking.)
Device code linking is required in a number of scenarios, the most typical being when linking of device code must occur across different compilation units. This means that device code in one module (file, compilation unit) calls device code in another module (file, compilation unit).
I can tell for a fact that this scenario is not present in your case, because exactly one module (file, compilation unit) of yours contains any device code:
nvcc -c -m64 -arch=sm_30 --std=c++11 $(includes) -o src/kernels/add.o src/kernels/add.cu
^^
only one file here
I know this to be true, because any attempt to compile device code with an ordinary host-code compiler other than nvcc will throw syntax errors. Since that is not happening in your case, and you have only one file that could possibly contain device code, you cannot have a scenario where device-code linking is required, so your method works.
Even if you had, for example, multiple .cu files, you might still not need device code linking, as long as no device code in one file calls device code (or references device variables) in another file.
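For contrast, here is a sketch of the case that does need it (the file names are hypothetical): say a kernel in a.cu calls a __device__ function that is defined in b.cu. Then both files must be compiled as relocatable device code with -dc, and the device code must be linked with -dlink before the final host link:
# compile each .cu file to relocatable device code
nvcc -arch=sm_30 -dc a.cu -o a.o
nvcc -arch=sm_30 -dc b.cu -o b.o
# link the device code from both objects into one object
nvcc -arch=sm_30 -dlink a.o b.o -o gpuCode.o
# final host link takes the host objects plus the device-link object
g++ a.o b.o gpuCode.o -L/usr/local/cuda/lib64 -lcudart -o app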
The relevant section of the nvcc manual covers the topic of device code linking in more detail.
Just to see what kind of code CUDA is generating I like to compile to ptx in addition to an object file. Since some of my loop unrolling can take quite a while I'd like to be able to compile *.cu→*.ptx→*.o instead of wasting time with both *.cu→*.ptx and *.cu→*.o, which I'm currently doing.
Simply adding -ptx to the nvcc *.cu line gives the desired ptx output.
Using ptxas -c to compile *.ptx to *.o works, but causes an error in my executable linking: Relocations in generic ELF (EM: 190).
Attempting to compile the *.ptx with nvcc fails silently, outputting nothing.
Is there some option I need to pass to ptxas? How should I properly compile via ptx with separate compilation? Alternatively, can I just tell nvcc to keep the ptx?
Alternatively, can I just tell nvcc to keep the ptx?
Yes, you can tell nvcc to keep all intermediate files, one of which will be the .ptx file.
nvcc -keep ...
Keeping all the intermediate files is a bit messy, but I'm sure you can come up with a script to tidy things up, and only save the files you want.
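For example, a sketch based on the compile line from your question (assuming your nvcc supports --keep-dir, which redirects the intermediates to a directory of your choice):
# keep the intermediate files (including the .ptx) in ./nvcc-tmp instead of the current directory
nvcc -keep --keep-dir ./nvcc-tmp -c -m64 -arch=sm_30 --std=c++11 add.cu -o add.o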
I was trying to do code coverage on a simple hello world program in C++.
The target device is an arm processor and hence I am using GNU ARM toolchain.
arm-elf-gcc -mcpu=arm7tdmi -O2 -g -c main.c -o main.exe creates a .gcno file but fails to create a .gcda file which is needed by gcov to find out the code coverage.
Normally when I run g++/gcc -fprofile-arcs -ftest-coverage <file>.cpp, it first creates a .gcno file and an a.exe. After running the a.exe, it generates the .gcda file.
Here, when I try to run main.exe to generate the .gcda, it throws an error: Program too big to fit in memory.
How do I resolve this issue?
Am I going wrong somewhere?
Thanks,
A-J
Obviously, you have to run your executable on the target device; the "Program too big to fit in memory" error is what you typically see when the host OS tries to execute a cross-compiled ARM binary. The target device must have a filesystem. Upon exit, the executable writes coverage information using ordinary POSIX functions (open, fcntl, write, close, etc.). Look at gcov-io.c in the GCC sources. Make sure you can successfully link libgcov.a into your executable, that you have write permission on the target device, and so on.
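A sketch of the whole round trip, assuming the target runs something POSIX-like with a writable filesystem (file names are illustrative):
# instrument the build for coverage on the host
arm-elf-gcc -mcpu=arm7tdmi -fprofile-arcs -ftest-coverage main.c -o main.elf
# run main.elf on the target, copy the resulting main.gcda back next to
# main.c and main.gcno on the host, then generate the report:
gcov main.c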
Does anyone know if it is possible to compile an MPI program with gcc? I need to use gcc, not mpicc.
mpicc is just a wrapper around a certain set of compilers. Most implementations have their mpicc wrappers understand a special option like -showme (Open MPI) or -show (Open MPI, MPICH and derivatives) that gives the full list of options that the wrapper passes on to the backend compiler.
For example, in Open MPI, wrappers are C++ programs that read plain text configuration files and build command line options that are further passed on to the compiler. mpicc -showme shows the full list of such options:
$ mpicc -showme
icc
-I/opt/MPI/openmpi-1.5.3/linux/intel/include
-I/opt/MPI/openmpi-1.5.3/linux/intel/include/openmpi
-fexceptions
-pthread
-I/opt/MPI/openmpi-1.5.3/linux/intel/lib
-Wl,-rpath,/opt/MPI/openmpi-1.5.3/linux/intel/lib
-I/opt/MPI/openmpi-1.5.3/linux/intel/lib
-L/opt/MPI/openmpi-1.5.3/linux/intel/lib
-lmpi
-ldl
-Wl,--export-dynamic
-lnsl
-lutil
(it's really a single line that I have split here to improve readability)
In that particular case the Intel C Compiler icc is used as the backend compiler, but there are also variants that use GCC. You can also get the list of options needed for the compile phase (usually known as CFLAGS) with mpicc -showme:compile:
$ mpicc -showme:compile
-I/opt/MPI/openmpi-1.5.3/linux/intel/include
-I/opt/MPI/openmpi-1.5.3/linux/intel/include/openmpi
-fexceptions
-pthread
-I/opt/MPI/openmpi-1.5.3/linux/intel/lib
as well as the list of options that you need to pass to the linker (known as LDFLAGS) with mpicc -showme:link:
$ mpicc -showme:link
-fexceptions
-pthread
-I/opt/MPI/openmpi-1.5.3/linux/intel/lib
-Wl,-rpath,/opt/MPI/openmpi-1.5.3/linux/intel/lib
-I/opt/MPI/openmpi-1.5.3/linux/intel/lib
-L/opt/MPI/openmpi-1.5.3/linux/intel/lib
-lmpi
-ldl
-Wl,--export-dynamic
-lnsl
-lutil
These could be used, e.g. in a Makefile, like this:
...
CFLAGS += $(shell mpicc -showme:compile)
LDFLAGS += $(shell mpicc -showme:link)
...
As far as I know -showme:compile and -showme:link are specific to Open MPI and other implementations only give the full list of options when called with -show.
I still think it's better to use mpicc directly, because if something in the MPI setup changes, the change is immediately reflected in the wrapper, while you would have to update your build script / Makefile manually (unless you use -showme:compile and -showme:link to obtain the options automatically).
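If you nevertheless have to invoke gcc yourself, one way is to splice the wrapper's own flags into your command line (my_mpi_program.c is just a placeholder, and this assumes an Open MPI mpicc that understands the -showme options above):
gcc $(mpicc -showme:compile) my_mpi_program.c $(mpicc -showme:link) -o my_mpi_program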
mpicc -compile_info for MPICH.
Yes, you can actually use gcc. In my case (on Ubuntu) mpicc is just a wrapper around gcc; here is the output of the command mpicc -showme:
gcc -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi -pthread -Wl,-rpath -Wl,/usr/lib/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib/openmpi/lib -lmpi
While in Open MPI docs:
The Open MPI team strongly recommends that you simply use Open MPI's "wrapper" compilers to compile your MPI applications. That is, instead of using (for example) gcc to compile your program, use mpicc.
We repeat the above statement: the Open MPI Team strongly recommends that you use the wrapper compilers to compile and link MPI applications.
If you find yourself saying, "But I don't want to use wrapper compilers!", please humor us and try them. See if they work for you. Be sure to let us know if they do not work for you.
Many people base their "wrapper compilers suck!" mentality on bad behavior from poorly-implemented wrapper compilers in the mid-1990s. Things are much better these days; wrapper compilers can handle almost any situation, and are far more reliable than attempting to hard-code the Open MPI-specific compiler and linker flags manually.
That being said, there are some -- very, very few -- situations where using wrapper compilers can be problematic -- such as nesting multiple wrapper compilers of multiple projects. Hence, Open MPI provides a workaround to find out what command line flags you need to compile MPI applications.
For MPICH, according to the mpicc man pages, mpicc -compile_info shows the flags for compiling a program, and mpicc -link_info shows the flags for linking a program.
Yes, you can certainly compile an MPI program without the convenience of the mpicc wrapper. On most implementations mpicc is a shell script (or similar) that sets environment variables and finds and links the various libraries, all the sort of stuff you might otherwise put into a Makefile.
I suggest that you find an instance of the mpicc script and deconstruct it.
mpicc is already using gcc as a backend.