Can I use C++11 in .cu files (CUDA 5.5) on Windows 7 x64 (MSVC) and Linux 64 (GCC 4.8.2)? - gcc

When I compile the following code, which contains C++11 constructs, on Windows 7 x64 (MSVS2012 + Nsight 2.0 + CUDA 5.5), I get no errors and everything compiles and works well:
#include <thrust/device_vector.h>

int main() {
    thrust::device_vector<int> dv(10);
    auto iter = dv.begin();
    return 0;
}
But when I try to compile it under Linux 64 (Debian 7 Wheezy + Nsight Eclipse from CUDA 5.5), I get errors:
../src/CudaCpp11.cu(5): error: explicit type is missing ("int" assumed)
../src/CudaCpp11.cu(5): error: no suitable conversion function from "thrust::detail::normal_iterator>" to "int" exists
2 errors detected in the compilation of "/tmp/tmpxft_00001520_00000000-6_CudaCpp11.cpp1.ii".
make: *** [src/CudaCpp11.o] Error 2
When I added the flag -std=c++11
in Properties-> Build-> Settings-> Tool Settings-> Build Stages-> Preprocessor options (-Xcompiler)
I get more errors:
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h(432): error: identifier "nullptr" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h(432): error: expected a ";"
...
/usr/include/c++/4.8/bits/cpp_type_traits.h(314): error: namespace "std::__gnu_cxx" has no member "__normal_iterator"
/usr/include/c++/4.8/bits/cpp_type_traits.h(314): error: expected a ">"
nvcc error : 'cudafe' died due to signal 11 (Invalid memory reference)
make: *** [src/CudaCpp11.o] Error 11
Only when I use thrust::device_vector<int>::iterator iter = dv.begin(); under Linux/GCC do I get no error. But in Windows with MSVS2012, all C++11 features work fine!
Can I use C++11 in .cu files (CUDA 5.5) on Windows 7 x64 (MSVC) and Linux 64 (GCC 4.8.2)?

You will probably have to split main.cpp from your others.cu, like this:
others.hpp:
void others();
others.cu:
#include "others.hpp"
#include <boost/typeof/std/utility.hpp>
#include <thrust/device_vector.h>

void others() {
    thrust::device_vector<int> dv(10);
    BOOST_AUTO(iter, dv.begin()); // regular C++
}
main.cpp:
#include "others.hpp"

int main() {
    others();
    return 0;
}
This particular answer shows that compiling with an officially supported gcc version (as Robert Crovella correctly stated) works, at least for C++11 code in the main.cpp file:
g++ -std=c++0x -c main.cpp
nvcc -arch=sm_20 -c others.cu
nvcc -lcudart -o test main.o others.o
(tested on Debian 8 with nvcc 5.5 and gcc 4.7.3).
To answer your underlying question: I am not aware that one can use C++11 in .cu files with CUDA 5.5 on Linux (and I was not aware that the shown example with host-side C++11 is handled properly under MSVC). I even filed a feature request for constexpr support, which is still open.
The CUDA programming guide for CUDA 5.5 states:
For the host code, nvcc supports whatever part of the C++ ISO/IEC 14882:2003 specification the host c++ compiler supports.
For the device code, nvcc supports the features illustrated in Code Samples with some restrictions described in Restrictions; it does not support run time type information (RTTI), exception handling, and the C++ Standard Library.
Anyway, it is possible to emulate some C++11 features like auto in kernels, e.g. with BOOST_AUTO.
As an outlook, other C++11 features like threads are quite unlikely to end up in CUDA, and I have heard of no official plans for them yet (as of Supercomputing 2013).
Shameless plug: if you are interested in more of these tweaks, feel free to take a look at our library libPMacc, which provides multi-GPU grid and particle abstractions for simulations. We implemented lambda, an STL-like access concept for 1-3D matrices, and other useful stuff there.
All the best,
Axel
Update: since CUDA 7.0, C++11 support in kernels has been added officially. As BenC pointed out correctly, parts of this feature were already silently added in CUDA 6.5.

According to Jared Hoberock (Thrust developer), it seems that C++11 support was added to CUDA 6.5 (although it is still experimental and undocumented). This may make things easier when starting to use C++11 in very large C++/CUDA projects, since splitting everything into .cu and .cpp files can be quite cumbersome, for instance when you use CMake.
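For illustration, here is a minimal sketch of the kind of device-side C++11 this enables (my example, not from the answers above; it assumes a CUDA 7.0+ toolchain, and the file name is made up):
auto_kernel.cu:
#include <cstdio>

__global__ void kernel() {
    auto x = 21;            // C++11 auto in device code, deduced as int
    auto doubled = x * 2;
    printf("doubled = %d\n", doubled);
}

int main() {
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize(); // wait for the kernel so its printf output appears
    return 0;
}
Compiled with nvcc -std=c++11 auto_kernel.cu, this should print "doubled = 42".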

Related

Undocumented ABI changes of std::function between GCC-4 and GCC-5/6/7/8/9: how to make a .so work with devtoolset-4/6/7/8/9?

Even with _GLIBCXX_USE_CXX11_ABI=0, the std::function of GCC-4 is different from that of GCC-5 and following versions. The following code demonstrates the problem:
==> lib.cc <==
#include <functional>

std::function<int(const void* p)> holder;

int run_holder(const void* p)
{
    return holder(p);
}
==> main.cc <==
#include <stdio.h>
#include <functional>

extern int run_holder(const void* p);
extern std::function<int(const void* p)> holder;

int foo(const void* p)
{
    printf("p=%p\n", p);
    return 0;
}

int main()
{
    holder = foo;
    foo((void*)0x12345678);
    holder((void*)0x12345678);
    run_holder((void*)0x12345678);
}
==> make.sh <==
#!/bin/bash
GCC4=/usr/bin/g++
GCCN="scl enable devtoolset-5 -- g++"
$GCC4 -std=c++11 -c -g lib.cc -shared -o libfoo.so &&
$GCCN -std=c++11 -L. -lfoo -g main.cc -o a.out &&
LD_LIBRARY_PATH=. ./a.out
expected result, something like:
p=0x12345678
p=0x12345678
p=0x12345678
actual result:
p=0x12345678
./make.sh: line 6: 973 Segmentation fault LD_LIBRARY_PATH=. ./a.out
The reason is that the implementation of std::function changed without documentation:
gcc4: /usr/include/c++/4.8.2/functional:2430
typedef _Res (*_Invoker_type)(const _Any_data&, _ArgTypes...);
gcc5: /opt/rh/devtoolset-4/root/usr/include/c++/5.3.1/functional:2226
gcc8: /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:609
using _Invoker_type = _Res (*)(const _Any_data&, _ArgTypes&&...);
The invoker type now takes its arguments by rvalue reference (_ArgTypes&&...), so the invoker pointer stored by GCC-4-compiled code does not match what the newer libstdc++ expects. So I cannot write a .so using std::function that is compiled by gcc4 and used by gcc5/6/7/8. There is no macro like _GLIBCXX_USE_CXX11_ABI that can control this behavior.
So I cannot write a .so using std::function that is compiled by gcc4 and used by gcc5/6/7/8.
Correct. Neither Red Hat nor the GCC project has ever claimed that's possible, quite the opposite. C++11 support in GCC 4.x was incomplete and unstable, and subject to ABI changes and API changes. What you're trying to do was never supported.
I've explained this in more detail at https://stackoverflow.com/a/49119902/981959
The Developer Toolset documentation covers it too (emphasis mine):
"A compiler in C++11 or C++14 mode is only guaranteed to be compatible with another compiler in C++11 or C++14 mode if they are from the same release series (for example from Red Hat Developer Toolset 6.x).
...
"Using the C++14 language version is supported in Red Hat Developer Toolset when all C++ objects compiled with the respective flag have been built using Red Hat Developer Toolset 6 or later. Objects compiled by the system GCC in its default mode of C++98 are also compatible, but objects compiled with the system GCC in C++11 or C++14 mode are not compatible."
There is no macro like _GLIBCXX_USE_CXX11_ABI that can control this behavior.
We do not provide macros to control things that are unsupported and cannot work.
If you want to use C++11 with a mix of GCC versions you need to use a release that has stable, non-experimental support for C++11. So not GCC 4.x.
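If rebuilding everything with one release series is not an option, a common workaround (my sketch, not part of the answer above; the file and function names are hypothetical) is to keep std::function out of the shared-library boundary entirely and expose a plain C interface, whose ABI does not depend on the libstdc++ version:
lib_c_api.h:
#ifdef __cplusplus
extern "C" {
#endif

typedef int (*callback_t)(const void* p); /* plain function pointer: stable ABI */
void set_holder(callback_t cb);
int run_holder(const void* p);

#ifdef __cplusplus
}
#endif
lib.cc:
#include <functional>
#include "lib_c_api.h"

static std::function<int(const void*)> holder; // never crosses the .so boundary

extern "C" void set_holder(callback_t cb) { holder = cb; }
extern "C" int run_holder(const void* p) { return holder(p); }
Either toolset can then build lib.cc, because only C types appear in the exported interface.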

CUDA error with Boost make_shared.hpp? [duplicate]

I'm trying to integrate CUDA into an existing application which uses boost::spirit.
Isolating the problem, I've found out that the following code does not compile with nvcc:
main.cu:
#include <boost/spirit/include/qi.hpp>
#include <cstdlib> // for exit()

int main() {
    exit(0);
}
Compiling with nvcc -o cudaTest main.cu I get a lot of errors that can be seen here.
But if I change the filename to main.cpp, and compile again using nvcc, it works. What is happening here and how can I fix it?
nvcc sometimes has trouble compiling complex template code such as is found in Boost, even if the code is only used in __host__ functions.
When a file's extension is .cpp, nvcc performs no parsing itself and instead forwards the code to the host compiler, which is why you observe different behavior depending on the file extension.
If possible, try to quarantine code which depends on Boost into .cpp files which needn't be parsed by nvcc.
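As a sketch of that quarantine (file and function names are mine, not from the question), the Boost.Spirit code can live behind a Boost-free header that the .cu files may include safely:
parser.hpp:
#include <string>
bool parse_number(const std::string& input, double& result);
parser.cpp:
#include "parser.hpp"
#include <boost/spirit/include/qi.hpp>

bool parse_number(const std::string& input, double& result) {
    namespace qi = boost::spirit::qi;
    std::string::const_iterator first = input.begin();
    bool ok = qi::parse(first, input.end(), qi::double_, result);
    return ok && first == input.end(); // succeed only if all input was consumed
}
parser.cpp is compiled by the host compiler alone; the .cu files only ever see parser.hpp.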
I'd also make sure to try the nvcc which ships with the recent CUDA 4.1. nvcc's template support improves with each release.

Undefined reference when linking with Boost using g++-4.9 on a g++-5-ish distribution

I've written the following groundbreaking application:
#include <boost/program_options.hpp>

int main(int argc, char** argv)
{
    boost::program_options::options_description generic_options("foo");
    return 0;
}
I am trying to build this on Debian Stretch, on which the default compiler is GCC 5.3.1-5, and the installed version of Boost is 1.58.0.
However, for reasons which I will not go into here (which would be apparent had this not been a MCVE), I need to compile and link the binary using g++-4.9, not g++-5.3. Compiling works fine, but when I try to link, this is what happens:
/usr/bin/g++-4.9 -Wall -std=c++11 CMakeFiles/tester3.dir/src/altmain2.cpp.o -o bin/tester3 -rdynamic -lboost_log -lboost_system -lboost_program_options
CMakeFiles/tester3.dir/src/altmain2.cpp.o: In function `main':
altmain2.cpp:(.text+0x61): undefined reference to `boost::program_options::options_description::options_description(std::string const&, unsigned int, unsigned int)'
Is this due to some ABI incompatibility between gcc 4 and 5?
If so, is there any way to get around it other than building my own version of Boost?
If not, what could cause this?
Is this due to some ABI incompatibility between gcc 4 and 5?
Looks like it is. By default, GCC 5 uses a new ABI for several important standard library classes, including the std::basic_string template (and thus also std::string). Unfortunately, it would be impossible to make std::string fully conform to C++11 and later without breaking the ABI.
libstdc++ implements a dual ABI, so that binaries compiled with older versions of GCC will link (and work correctly) against the new library. See this post by Jason Merrill (the maintainer of GCC's C++ front end) for details.
If so, is there any way to get around it other than building my own version of Boost?
Probably not. Boost depends on the new implementation of std::string and does not provide backward compatibility (unlike libstdc++). This problem is mentioned in the Debian bug tracker.
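If you want to verify which ABI a given Boost binary was built against (a diagnostic of my own; the path is illustrative for Debian Stretch), inspect its dynamic symbols:
nm -DC /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0 | grep options_description
Demangled signatures mentioning std::__cxx11::basic_string indicate the new ABI; objects built with g++-4.9 reference the old std::string variant instead, which is why the link fails.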

c++ thread-local storage clang-503.0.40 (Mac OSX)

After I declared a variable in this way:
#include <thread>
namespace thread_space
{
thread_local int s;
} //etc.
I tried to compile my code using 'g++ -std=c++0x -pthread [sourcefile]'. I get the following error:
example.C:6:8: error: thread-local storage is unsupported for the current target
static thread_local int s;
^
1 error generated.
If I try to compile the same code on Linux with GCC 4.8.1 with the same flags, I get a functioning executable file. I'm using clang-503.0.40 (the one which comes with Xcode 5.1.1) on a MacBook Pro running OS X 10.9.3. Can anybody explain to me what I'm doing wrong?
Thank you!!
Try clang++ -stdlib=libc++ -std=c++11. OS X's outdated libstdc++ doesn't support TLS.
Edit
Ok, this works for the normal clang version but not for the Xcode one.
I did a diff against Apple's clang (503.0.38) and the normal released one and found the following difference:
.Case("cxx_thread_local",
- LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported() &&
- !PP.getTargetInfo().getTriple().isOSDarwin())
+ LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported())
So I think this is a bug in Apple's clang version (or they kept it in there on purpose - but still weird, because -v says based on 3.4).
Alternatively, you can use compiler extensions such as __thread (GCC/Clang) or __declspec(thread) (Visual Studio).
Wrap it in a macro and you can easily port your code across different compilers and language versions:
#if HAS_CXX11_THREAD_LOCAL
#define ATTRIBUTE_TLS thread_local
#elif defined (__GNUC__)
#define ATTRIBUTE_TLS __thread
#elif defined (_MSC_VER)
#define ATTRIBUTE_TLS __declspec(thread)
#else // !C++11 && !__GNUC__ && !_MSC_VER
#error "Define a thread local storage qualifier for your compiler/platform!"
#endif
...
ATTRIBUTE_TLS static unsigned int tls_int;
The clang compiler included in the Xcode 8 Beta and GM releases supports the C++11 thread_local keyword with both -std=c++11 and -std=c++14 (as well as the GCC variants).
Earlier versions of Xcode apparently supported C-style thread local storage using the keywords __thread or _Thread_local, according to the WWDC 2016 video "What's New in LLVM" (see the discussion beginning at 5:50).
Seems like you might need to set the minimum OS X version you target to 10.7 or higher.
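Putting the suggestions together, an invocation along these lines (illustrative, not tested on every Xcode release) should work once the toolchain supports TLS:
clang++ -std=c++11 -stdlib=libc++ -mmacosx-version-min=10.7 example.C -o example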

Compiling CUDA with dynamic parallelism fallback - multiple architectures/compute capability

In one application, I've got a bunch of CUDA kernels. Some use dynamic parallelism and some don't. For the purposes of either providing a fallback option if this is not supported, or simply allowing the application to continue but with reduced/partially available features, how can I go about compiling?
At the moment I'm getting invalid device function when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5.
AFAIK you can't use multiple -arch=sm_* arguments, and using multiple -gencode=* doesn't help. Also, for separable compilation I've had to create an additional object file using -dlink, but this doesn't get created when using compute 3.0 (nvlink fatal : no candidate found in fatbinary, due to -lcudadevrt, which I've needed for 3.5). How should I deal with this?
I believe this issue has been addressed now in CUDA 6.
Here's my simple test:
$ cat t264.cu
#include <stdio.h>

__global__ void kernel1(){
    printf("Hello from DP Kernel\n");
}

__global__ void kernel2(){
#if __CUDA_ARCH__ >= 350
    kernel1<<<1,1>>>();
#else
    printf("Hello from non-DP Kernel\n");
#endif
}

int main(){
    kernel2<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
$ nvcc -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -rdc=true -o t264 t264.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="0" ./t264
Hello from non-DP Kernel
$ CUDA_VISIBLE_DEVICES="1" ./t264
Hello from DP Kernel
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Jan_25_17:33:19_PST_2014
Cuda compilation tools, release 6.0, V6.0.1
$
In my case, device 0 is a Quadro5000, a cc 2.0 device, and device 1 is a GeForce GT 640, a cc 3.5 device.
I don't believe there is a way to do this using the runtime API as of CUDA 5.5.
The only way I can think of to get around the problem is to use the driver API to perform your own architecture selection and load code from different cubin files at runtime. The APIs can be safely mixed, so only the context establishment, device selection, and module load phase needs to be done with the driver API. You can use the runtime API after that; you will need a little homemade syntactic sugar for the kernel launches, but otherwise no code changes are required in other runtime API code.
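A rough sketch of that driver API selection phase (my illustration: the cubin file names are made up, error handling is reduced to asserts, and the kernel is assumed to be declared extern "C" so its name is not mangled):
select_cubin.cpp:
#include <cuda.h>
#include <cassert>

int main() {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction kernel;
    int major = 0, minor = 0;

    assert(cuInit(0) == CUDA_SUCCESS);
    assert(cuDeviceGet(&dev, 0) == CUDA_SUCCESS);
    assert(cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev) == CUDA_SUCCESS);
    assert(cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev) == CUDA_SUCCESS);
    assert(cuCtxCreate(&ctx, 0, dev) == CUDA_SUCCESS);

    // Load the dynamic-parallelism build only where it can actually run.
    const char* cubin = (major > 3 || (major == 3 && minor >= 5))
                            ? "kernels_sm35.cubin"
                            : "kernels_sm30.cubin";
    assert(cuModuleLoad(&mod, cubin) == CUDA_SUCCESS);
    assert(cuModuleGetFunction(&kernel, mod, "kernel2") == CUDA_SUCCESS);

    assert(cuLaunchKernel(kernel, 1, 1, 1, 1, 1, 1, 0, 0, NULL, NULL) == CUDA_SUCCESS);
    assert(cuCtxSynchronize() == CUDA_SUCCESS);
    cuCtxDestroy(ctx);
    return 0;
}
After the module load, runtime API calls can be mixed in as described above.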
