If I run the following:
c++ -c --std=c++11 $(includes) -o src/main.o src/main.cpp
nvcc -c -m64 -arch=sm_30 --std=c++11 $(includes) -o src/kernels/add.o src/kernels/add.cu
ar qc src/kernels/libkernels.a src/kernels/add.o
ranlib src/kernels/libkernels.a
c++ -o program -L/usr/local/cuda/lib64 src/main.o src/kernels/libkernels.a -lcudart -lcudadevrt
It works. Shouldn't it fail because I didn't perform a -dlink phase? The Parallel Forall blog entry on separate compilation says:
When you use nvcc to link, there is nothing special to do: replace your normal compiler command with nvcc and it will take care of all the necessary steps. However, you may choose to use a compiler driver other than nvcc (such as g++) for the final link step. Since your CPU compiler will not know how to link CUDA device code, you'll have to add a step in your build to have nvcc link the CUDA device code, using the nvcc option -dlink.
nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o
This links all the device object code and places it into gpuCode.o. Note that this does not link the CPU object code. In fact, the CPU object code in v3.o, particle.o, and main.o is discarded in this step. To complete the link to an executable, we can use ld or g++.
g++ gpuCode.o main.o particle.o v3.o -lcudart -o app
Does the use of a .a library somehow make up for the lack of "device code linking"?
PS - I'm using CUDA 8.0.61 on Linux Mint 18.2
Device code linking is not required in all scenarios. (This must be true, because prior to CUDA 5.0 there was no device code linking.)
Device code linking is required in a number of scenarios, the most typical being when device code must be linked across different compilation units, i.e. device code in one module (file, compilation unit) calls device code in another module (file, compilation unit).
I can tell for a fact that this scenario is not present in your case, because exactly one of your modules (files, compilation units) contains any device code:
nvcc -c -m64 -arch=sm_30 --std=c++11 $(includes) -o src/kernels/add.o src/kernels/add.cu
^^
only one file here
I know this to be true because any attempt to compile device code with an ordinary host-code compiler (anything other than nvcc) will throw syntax errors. Since that is not happening in your case, and you have only one file that could possibly contain device code, you cannot possibly have a scenario where device code linking is required, so your method works.
Even if you had, for example, multiple .cu files, you might still not need device code linking, as long as no device code in one file called device code (or referenced device variables) in another file.
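For contrast, here is a minimal sketch of a case that does need device code linking: two hypothetical files (a.cu and b.cu, names made up for illustration) where a kernel in one calls a __device__ function defined in the other.

```cuda
// b.cu -- defines a device function (hypothetical example)
__device__ int add_one(int x) { return x + 1; }

// a.cu -- calls the device function that lives in the other file
extern __device__ int add_one(int x);
__global__ void kernel(int *out) { *out = add_one(41); }

// Building this requires relocatable device code and a device-link step,
// roughly (untested sketch, arch chosen to match the question):
//   nvcc -arch=sm_30 -rdc=true -c a.cu -o a.o
//   nvcc -arch=sm_30 -rdc=true -c b.cu -o b.o
//   nvcc -arch=sm_30 -dlink a.o b.o -o dlink.o
//   g++ a.o b.o dlink.o -L/usr/local/cuda/lib64 -lcudart -lcudadevrt -o app
```

Dropping the -dlink step in this two-file case would produce an unresolved device symbol, which is exactly the failure your single-file build cannot hit.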
The relevant section of the nvcc manual covers the topic of device code linking in more detail.
Related
It's simple to call the preprocessor on C/C++ code:
g++ -E <file>.cpp
And it passes the file through the preprocessor and generates the preprocessed code.
I have an OpenCL kernel in a .cl file; how do I achieve the same?
This is what I did and failed:
g++ -E -I. -std=c++11 -g -O3 -march=native -I/path/to/opencl/include/ -Wno-unused-result kernel.cl -L/path/to/opencl/lib/x86_64/ -lOpenCL -lquadmath
g++: warning: kernel.cl: linker input file unused because linking not done
Thanks
OpenCL code can run on a different architecture from the one you are compiling on. You might find that there are differences in the output depending on compile-time settings that reflect the physical configuration of the target.
The most reliable method of generating the preprocessed code for AMD devices is to ask the framework to save its temporary files, including the preprocessed output files.
On Linux, all you need to do for AMD is set an environment variable, i.e.:
export AMD_OCL_BUILD_OPTIONS_APPEND="-save-temps"
When you compile your OpenCL program you will see a few files in /tmp. The one with the .i extension is the preprocessed file. It may differ from the output you would get by running cpp on the host architecture.
I'm trying to use this existing library in my Android Application. It is a rather big library with a lot of code, and it's own makefile, changing all of that would be too much effort. That is why I want to adjust the Makefile so that it gets cross-compiled for Android devices.
First I want to start small and try to compile some example with this library and see if it runs on the Android device.
Here some parts that I thought are to be changed/added to the compiler and the flags:
CXX := ${ANDROID_NDK_TOOLCHAINS}/bin/clang++
CC := ${ANDROID_NDK_TOOLCHAINS}/bin/clang
CFLAGS := -fPIE -fPIC -pie
CXXFLAGS := -fPIE -fPIC -pie
CXXFLAGS += --target=armv5te-none-linux-androideabi --gcc-toolchain=/home/Android/Sdk/ndk-bundle/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64
CFLAGS += --target=armv5te-none-linux-androideabi --gcc-toolchain=/home/Android/Sdk/ndk-bundle/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64
If I do not use the last two flags, compilation works, but I can only run the example on my PC; copying it to the Android device and starting it from a shell gives an error that it was not compiled for this device.
So far that makes sense.
After adding the last two lines, it will abort and say:
fatal error: 'vector' file not found
Some research says that I have to tell the ndk compiler which C++ Runtime Libraries (https://developer.android.com/ndk/guides/cpp-support.html#c_runtime_libraries) it should use.
But how can I add this to my makefile? If I try to add it like it is added to Application.mk (http://mobilepearls.com/labs/native-android-api/ndk/docs/CPLUSPLUS-SUPPORT.html):
APP_STL := gnustl_static
Then it still does not find the vector header.
Do I have to add it differently? Is it possible to include it with just some flag?
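For what it's worth, APP_STL is only read by the ndk-build system (via Application.mk), so a standalone Makefile has to pass the STL header and library paths explicitly. A rough, untested sketch assuming the gnustl runtime shipped with older NDKs (all paths and the armeabi ABI directory are hypothetical and vary by NDK version):

```make
# Hypothetical NDK layout -- adjust to your installation and target ABI
NDK     := /home/Android/Sdk/ndk-bundle
STL_DIR := $(NDK)/sources/cxx-stl/gnu-libstdc++/4.9

CXXFLAGS += -I$(STL_DIR)/include \
            -I$(STL_DIR)/libs/armeabi/include
LDFLAGS  += -L$(STL_DIR)/libs/armeabi -lgnustl_static
```

The -I lines are what make headers like vector findable; the -L/-l pair is needed at link time for the same runtime.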
We are catching link errors on Solaris with makefiles generated by CMake 3.6.2. In the testing below, we are using GCC and not SunCC. From the looks of it, CMake is applying our options inconsistently:
Typical compile command
[ 2%] Building CXX object CMakeFiles/cryptopp-object.dir/cpu.cpp.o
/bin/c++ -fPIC -march=native -m64 -Wa,--divide -o CMakeFiles/cryptopp-object.dir/cpu.cpp.o
-c /export/home/jwalton/cryptopp/cpu.cpp
Abbreviated link command
/bin/c++ CMakeFiles/cryptest.dir/bench1.cpp.o CMakeFiles/cryptest.dir/bench2.cpp.o
...
CMakeFiles/cryptest.dir/fipstest.cpp.o -o cryptest.exe libcryptopp.a -lnsl -lsocket
Typical link error
ld: fatal: file CMakeFiles/cryptopp-object.dir/cryptlib.cpp.o: wrong ELF class: ELFCLASS64
Notice the file was compiled with -march=native -m64 (it's a 64-bit capable machine and kernel), but the link invocation is missing those flags (the default is 32-bit on Solaris).
Searching for "cmake use CXXFLAGS link" produces too much irrelevant noise, and I'm not having much luck finding the right CMakeLists.txt option. I also want to avoid duplicating the work into LDFLAGS, or reformatting the options (the CXXFLAGS option -Wl,-x becomes the LDFLAGS option -x).
How do I instruct CMake to use both CXX and CXXFLAGS when driving link?
I found Running a different program for the linker on the CMake users mailing list, but it does not feel right to me (also, the problem and context are slightly different). It also does not work.
Here is a small example:
PROJECT(foo)
SET(CMAKE_CXX_LINK_EXECUTABLE
"purify <CMAKE_CXX_COMPILER> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
ADD_EXECUTABLE(foo foo.cxx)
I also found Setting global link flags on the mailing list. It does not work, either.
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_CXX_FLAGS}")
SET(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_CXX_FLAGS}")
SET(CMAKE_MODULE_LINKER_FLAGS "${CMAKE_CXX_FLAGS}")
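One workaround sketch (hedged: this appends only the ABI-affecting flag from the question rather than the whole of CMAKE_CXX_FLAGS, and must appear early in CMakeLists.txt, before targets are defined):

```cmake
# -m64 selects the 64-bit ABI; it must reach the link step as well as the
# compile step, or ld sees ELFCLASS64 objects in a 32-bit default link.
set(CMAKE_CXX_FLAGS           "${CMAKE_CXX_FLAGS} -m64")
set(CMAKE_EXE_LINKER_FLAGS    "${CMAKE_EXE_LINKER_FLAGS} -m64")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -m64")
```

Appending to the existing value of each variable, rather than overwriting it as in the snippet above this, preserves any flags CMake or the user already supplied.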
One of my users is getting an error message when trying to compile a C part of our mixed C/C++ codebase on Ubuntu 12.04 with GCC 4.8.1.
We have a C++ library with some C-linkage functions in it, and want to compile a C program that links to it. The library is compiled with g++ and builds fine. The C program fails like this:
> gcc -O3 -g -fPIC -I/media/Repo/lcdm/code/cosmosis/ -Wall -Wextra -pedantic -Werror -std=c99 -o c_datablock_t c_datablock_test.c -L . -lcosmosis
cc1plus: error: command line option ‘-std=c99’ is valid for C/ObjC but not for C++ [-Werror]
The program has a lower-case .c file suffix, so why does gcc try to compile it as C++? We have not seen this on other OSes.
(I know we could kick the problem down the road by removing -Werror or handle this particular file with -x c but I'd like to solve the real problem.)
why does gcc try to compile it as c++
I can think of only two plausible explanations, and they both are end-user's fault.
It could be that the user transferred sources via Windows, and the file is really called C_DATABLOCK_TEST.C, and the user is misleading you.
It could also be that the user overwrote his gcc with g++ (surprisingly many people believe that gcc and g++ are the same thing, but they are not).
To disprove the first possibility, ask the user to execute his build commands under script, and send you the resulting typescript.
To disprove the second, ask the user to add -v to the compile command.
This looks like GCC Bug 54641, which was fixed in a later release of GCC. It is only a warning, but your compile flags cause GCC to treat all warnings as errors.
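A quick way for the user to test the gcc-vs-g++ hypothesis locally: new is an ordinary identifier in C but a reserved keyword in C++, so a genuine gcc accepts the file below while g++ (or a gcc binary that is really g++) rejects it. A small self-contained sketch:

```shell
# Create a file that is legal C but illegal C++
cat > demo.c <<'EOF'
int new = 1;            /* fine in C, a keyword in C++ */
int main(void) { return 0; }
EOF

# A real gcc compiles the .c file as C and succeeds
gcc -std=c99 -c demo.c -o demo_gcc.o && echo "gcc: compiled as C"

# g++ always treats the source as C++ and rejects it
g++ -c demo.c -o demo_gpp.o 2>/dev/null || echo "g++: rejected (compiled as C++)"
```

If the user's "gcc" rejects demo.c, it is not really gcc.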
For a test I have written matrix-multiplication code in CUDA C and compiled it with nvcc to create a shared library, using the following command.
nvcc -c MatMul.cu -o libmatmul.so
Then I wrote an OpenCV program in C and tried to compile it with the following command.
gcc ImgMul.c `pkg-config --cflags --libs opencv` -L. -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -I. -lmatmul -lcudart -o ImgMul
and I am getting the following error.
gputest.c:(.text+0x3f): undefined reference to `matmul'
Could anyone tell me how to link against CUDA libraries when compiling code with gcc?
OS: Ubuntu
gcc : 4.4.0
The first point to make is that
nvcc -c MatMul.cu -o libmatmul.so
does not make a shared library; it just compiles to an object file. Shared libraries and object files are not at all the same thing.
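If a real shared library is the goal, the usual recipe (an untested sketch here, reusing the file names from the question) is to compile with position-independent code and link with -shared:

```shell
# -Xcompiler forwards -fPIC to the host compiler; -shared emits a real .so
nvcc -Xcompiler -fPIC -c MatMul.cu -o MatMul.o
nvcc -shared MatMul.o -o libmatmul.so
```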
That aside, the reason for the undefined-symbol error is C++ name mangling. Host code in CUDA source files is compiled by the host C++ compiler, not a C compiler, so the symbol names the compiler emits for host code are mangled. The easiest way around this is to declare the functions you wish to call from plain C with an extern "C" linkage specification (see here for a reasonable overview of the perils of C/C++ interoperability).