What does it take to make OpenACC/OpenMP 4.0 offloading to NVIDIA/MIC work on GCC? - gcc

I am trying to understand how exactly I can use OpenACC to offload computation to my NVIDIA GPU with GCC 5.3. The more I google, the more confused I become. All the guides I find involve recompiling all of GCC along with two libraries called nvptx-tools and nvptx-newlib. Other sources say that OpenACC is part of the GOMP library. Others say that development of OpenACC support will continue only on GCC 6.x. I have also read that OpenACC support is in the main branch of GCC. However, if I compile a program with -fopenacc and -foffload=nvptx-none, it just won't work. Can someone explain what exactly it takes to compile and run OpenACC code with GCC 5.3+?
Why do some guides seem to require (re)compilation of nvptx-tools, nvptx-newlib, and GCC if, as some internet sources say, OpenACC support is part of GCC's main branch?
What is the role of the GOMP library in all this?
Is it true that development for OpenACC support will only be happening for GCC 6+ from now on?
When OpenACC support matures, is the goal to enable it the same way we enable OpenMP (i.e., by just adding a couple of compiler flags)?
Can someone also provide answers to all the above after replacing "OpenACC" with "OpenMP 4.0 GPU/MIC offload capability"?
Thanks in advance

The link below contains a script that will compile gcc for OpenACC support.
https://github.com/olcf/OLCFHack15/blob/master/GCC5OffloadTest/auto-gcc5-offload-openacc-build-install.sh
OpenACC is part of GCC's main branch now, but there are some points to note. Even when a library is part of the GCC source tree, you have to specify which libraries to build when you compile GCC; not all of them are built by default. For OpenACC there is an additional problem. Since NVIDIA's drivers are not open source, GCC cannot compile OpenACC directly to GPU binaries. It has to compile OpenACC to intermediate NVPTX instructions, which the NVIDIA runtime then handles. Therefore you also need to install the nvptx libraries.
GOMP is the intermediate runtime library that handles both OpenMP and OpenACC.
Yes, I think OpenACC development will happen only in GCC 6, though some of it may still be backported to GCC 5. Your best bet would be to use GCC 6.
While I cannot comment on what the GCC developers will decide to do, the first point already states the problem. Unless NVIDIA makes its drivers open source, I think an extra step will always be necessary.
I believe that right now OpenMP offloading is planned only for CPUs and MIC, and that OpenMP support for both will probably become default behavior. I am not sure whether OpenMP targeting NVIDIA GPUs is an immediate goal, but since GCC uses GOMP for both OpenMP and OpenACC, I believe they might eventually be able to do it. GCC is also targeting HSA with OpenMP, so basically AMD APUs. I am not sure whether AMD GPUs will work the same way, but it may be possible. Since AMD is making its drivers open source, I believe they may be easier to integrate into the default behavior.
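To make the flags discussed above a bit more concrete, here is a minimal sketch of a loop marked for offloading, once with OpenACC and once with OpenMP 4.0. The function names saxpy_acc and saxpy_omp are made up for illustration; the directives are standard OpenACC and OpenMP 4.0 ones. A compiler invoked without -fopenacc or -fopenmp ignores the pragmas, so the same source still computes the correct result on the host.

```c
/* y[i] = a * x[i] + y[i], marked for offload with OpenACC.
   Without -fopenacc the pragma is ignored and this is a plain host loop. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The same loop using an OpenMP 4.0 target region.
   Without -fopenmp the pragma is ignored as well. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Assuming a GCC build that includes the nvptx offload compiler, this would be compiled with the flags from the question (-fopenacc -foffload=nvptx-none); whether the loop actually runs on the GPU depends on how that GCC was configured.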

Related

Is it normal that an executable compiled with gcc performs better than one compiled with LLVM?

I write some C code myself and compile it with LLVM or gcc.
Recently I compiled two different executables with the two compilers (LLVM and gcc) and ran each of them 100,000 times.
I found that the executable from gcc always performs better than the one from LLVM.
I know they are different compilers with different architectures.
But why does gcc always beat LLVM on performance? What is the reason?
So this is your main question:
Question
Is it normal that an executable compiled with gcc performs better than one compiled with LLVM?
This is an interesting but very broad question, because compiler performance depends on a lot of factors.
It depends heavily on your application.
It depends on the underlying architecture and processor.
...and many more factors.
In addition, there are well-established benchmarks for comparing compilers. One application binary performing better is not a proper benchmark and hence not an established result.
However, since program performance is affected by the choice of compiler, it makes complete sense that in your case GCC performs better than LLVM; it is just not bound to happen for every application.
For your further understanding, please check this:
StackOverflow post Clang vs GCC - which produces faster binaries?
Benchmarking LLVM & Clang Against GCC
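If you want to repeat the comparison on your own code, one possible approach is to time the same function under both compilers with a small harness like the sketch below. The names best_cpu_seconds and workload are hypothetical, and the workload is only a stand-in for your real code. Taking the fastest of several runs is one common way to reduce noise, not the only one.

```c
#include <time.h>

/* Run fn() `runs` times and return the smallest CPU time in seconds.
   Returns -1.0 if runs is zero or negative. */
double best_cpu_seconds(void (*fn)(void), int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        clock_t start = clock();
        fn();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Stand-in workload; the volatile sink keeps the optimizer
   from deleting the loop entirely. */
static volatile unsigned long sink;

void workload(void)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; ++i)
        sum += i * i;
    sink = sum;
}
```

Build the same source with both compilers at the same optimization level (for example -O2 for each) before comparing the numbers, since different flags can easily outweigh the difference between the compilers.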

Is it possible to build gcc 1.0 without a C compiler?

Is it possible to build gcc 1.0 with only an assembler, without any C compilers? If it is possible, how can I build it? If it is not possible, how did the first C compiler come out?
Say we have a new CPU architecture with a new instruction set, and the only software that exists for it is an assembler. How can I build gcc for it?
Early versions of GCC were written in C. At the time, the operating systems GCC targeted came with at least a rudimentary C compiler (maybe for K&R C only, without support for prototypes). There was no bootstrap from assembler code involved, even in the first release. For those who did not or could not build GCC by themselves, the FSF provided pre-built binaries on tape, for a fee.
Support for new architectures (if they support self-hosting at all) was and still is implemented using cross-compilers.

Compiling for Cortex M3 bare metal

Is there a guide somewhere that describes how to get LLVM to emit a binary for Cortex-M3 that I can massage into running bare metal? I've spent considerable time playing with LLVM on Windows and Ubuntu to no avail. I can get ARM-like assembly out. I can get bit code out, but what I really need is ELF, DWARF, Hobbit, Gandalf or any other Lord of the Rings critter that has a file format specification. Any and all help appreciated! I'm compiling LLVM 3.4 with CLANG on Ubuntu, Windows and/or OS X.
I created a firmware framework, PolyMCU (https://github.com/labapart/polymcu), that is based on CMake and supports GCC and LLVM. Because it is based on CMake, you can build your firmware on Linux/Windows/MacOS.
It also uses Newlib and supports Baremetal/CMSIS RTOS (RTX)/FreeRTOS.
The benefit of using PolyMCU is this framework does not add any software layer on top of the libc and the MCU vendor's SDKs.
Another benefit is you can easily switch toolchains. I used this feature to get more feedback on my code by testing it with many compilers.
I also wrote a blog post comparing GCC and LLVM build sizes on ARM Cortex-M: http://labapart.com/blogs/3-the-importance-of-the-toolchain-version-in-embedded-space The results are interesting: Clang-generated code is not much bigger than GCC's on Cortex-M...
The best guide that I know of is here: http://wiki.osdev.org/LLVM_Cross-Compiler. It's mostly about building an LLVM cross-compiler, but it does have a "Usage" section. That section shows an example for a Cortex-A processor, but you should be able to get the general idea.
I have created a simple Clang bare-metal Cortex-M3 "hello world" program, but I don't have it in front of me. IIRC, the only options I needed were -march=thumb -mcpu=cortex-m3, as long as LLVM was built with ARM Thumb backend support (again, see http://wiki.osdev.org/LLVM_Cross-Compiler). I did, however, need to link with arm-none-eabi-ld from the GCC toolchain here (http://launchpad.net/gcc-arm-embedded), and I believe that is how you can get your ELF binary.
I've since moved on to the D programming language, and I have a simple example using LDC (The LLVM D compiler) here (http://wiki.dlang.org/Extremely_minimal_semihosted_%22Hello_World%22)
So, I believe compiling bare metal ARM Cortex-M3 software with LLVM can be done, but it seems not many people have tried.
It is possible to use clang++ pulled from http://llvm.org/builds with https://launchpad.net/gcc-arm-embedded as a base, at least for the compile step.
Required extra arguments are the include paths hardcoded into gcc and certain arm-none-eabi defaults:
--target=arm-none-eabi -fshort-enums -isystem "../arm-none-eabi/include/c++/5.2.1" [-isystem ...]

Portable method to package C++11 program sources

C++11 has been around for a while and, given that there are already compilers supporting it on most platforms, it would be nice to use it in some real software, e.g. software that can be distributed in an as-portable-as-possible package, preferably providing ./configure and so on.
Because both Clang and GCC currently need the -std=c++11 flag to compile C++11 source, and both sometimes require specific flags to work correctly (see for example How to compile C++11 with clang 3.2 on OSX lion? or C++11 Thread not working), I'm quite afraid that the package won't work on some platforms that already support C++11 because the compiler is invoked incorrectly.
Q: Is there a standard way to compile C++11 correctly and portably? E.g. an autotools/autoconf check, or a list of compiler/platform directives describing all the options that may be needed? Or does the situation come from the fact that C++11 implementations are currently marked as "experimental", and will the standard eventually stabilize and become the default choice, with no extra compiler flags needed?
Thanks
-exa
Well, if you're trying to write portable code, I would recommend using CMake, a very powerful cross-platform, open-source build system.
Using CMake you should be able to identify the compilers available on your current machine and then generate your makefiles with the flags you want in each case.
I have been using CMake for almost a year now, and it has significantly reduced the time it takes to get a project compiling on different platforms.
I'm using CMake to generate Makefiles for C++11 projects. The only change I need to make in CMakeLists.txt is to add the following:
ADD_DEFINITIONS("-std=gnu++11")
ADD_DEFINITIONS("-D_GLIBCXX_USE_C99_STDINT_TR1")
ADD_DEFINITIONS("-D_GLIBCXX_HAS_GTHREADS")
However, as I use Qt, I recompiled the Qt SDK with gcc 4.8 to get a complete MinGW system that uses gcc 4.8.
With these changes, the project compiles and runs on Windows XP, Windows 7 and Linux, both 32- and 64-bit. I haven't tested it on OS X yet.

Can/should libiomp5 and libgomp mix?

We are compiling an application that uses OpenMP. We are using gcc 4.4, with -fopenmp. The app also uses IPP, which includes its own version of OpenMP (libiomp5). (Note: we are disabling IPP's internal threading by calling ippSetNumThreads(1). According to Intel's documentation, this should avoid conflicts with other threading libraries. However, linking with IPP still links in libiomp5.so.)
Since libiomp5.so is already linked in, we have not been linking with libgomp.so (gcc's version of OpenMP). For a long time this has worked, but after a seemingly inconsequential change we started seeing very odd OpenMP-related crashes on one of four platforms we support (the other three platforms still work fine).
I can make the crashes go away if I link in libgomp.so as well as libiomp5.so.
I have a couple questions about this:
Is linking with both these libraries safe? It seems like they would both define the same symbols.
Is there a way to tell what version of OpenMP libiomp5.so supports? With gcc 4.4, libgomp.so should be at OpenMP v3.0. I can't find any information in Intel's documentation about the OpenMP version of libiomp5.so.
Since no one has answered for a few days, I'll just report what I've found out independently:
Is linking with both these libraries safe?
No. Here's the most useful page I found on this topic:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/optaps/common/optaps_par_openmp_multiple_compilers.htm
Intel recommends that if you are going to be mixing IPP's internal OpenMP threading with your own OpenMP threading, you link to libiomp5 instead of your compiler's OpenMP library. The current version of libiomp5 provides "source compatibility and object-level interoperability" with gcc's OpenMP, but only if you are using gcc "4.42" (sic; I assume they mean 4.4.2) or later.
Is there a way to tell what version of OpenMP libiomp5.so supports?
Yes. Set the environment variable KMP_VERSION=1, then run your application. You'll get some debugging output printed by libiomp5 to your console. If you are using IPP v7 or later, one line will be something like
Intel(R) OMP API version: 3.0 (200805)
If you are using IPP 6, it won't tell you the API version, but it will tell you when it was built and with which version of the Intel compiler. Then you can check and see what version of OpenMP that compiler supported. (11.0 was the first version of the Intel compiler to support OpenMP v3.0.)
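As a complement to the KMP_VERSION output, you can also check which OpenMP version your own compiler targets through the standard _OPENMP macro, which expands to the release date of the supported specification in yyyymm form (the same kind of number as the 200805 above). The sketch below is only an illustration; the function names openmp_version_name and compiled_openmp_date are made up. It reports what the compiler that built your code supports, not what libiomp5 itself implements.

```c
/* Map an OpenMP release date (yyyymm, the format used by the
   _OPENMP macro) to the corresponding specification version. */
const char *openmp_version_name(long date)
{
    switch (date) {
    case 200505: return "2.5";
    case 200805: return "3.0";
    case 201107: return "3.1";
    case 201307: return "4.0";
    case 201511: return "4.5";
    default:     return "unknown";
    }
}

/* Value of _OPENMP for the current build,
   or 0 when compiled without OpenMP support (e.g. no -fopenmp). */
long compiled_openmp_date(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

Calling openmp_version_name(compiled_openmp_date()) from your application and comparing the result with the API version printed by libiomp5 shows whether the compiler and the runtime agree on the OpenMP level.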
