Setting up Visual Studio Intellisense for CUDA kernel calls - visual-studio-2010

I've just started CUDA programming and it's going quite nicely, my GPUs are recognized and everything. I've partially set up Intellisense in Visual Studio using this extremely helpful guide here:
and here:
However, Intellisense still doesn't pick up on kernel calls like this:
#include <iostream>
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
__global__ void kernel(void){}
int main()
{
    kernel<<<1,1>>>();
    return 0;
}
The line kernel<<<1,1>>>() is underlined in red, specifically the one arrow to the left of the first one, with the error reading "Error: expected an expression". However, if I hover over the function, its return type and parameters are displayed properly. It still compiles just fine; I'm just wondering how to get rid of this little annoyance.

Wow, lots of dust on this thread. I came up with a macro fix (well, more like workaround...) for this that I thought I would share:
// nvcc does not seem to like variadic macros, so we have to define
// one for each kernel parameter list:
#ifdef __CUDACC__
#define KERNEL_ARGS2(grid, block) <<< grid, block >>>
#define KERNEL_ARGS3(grid, block, sh_mem) <<< grid, block, sh_mem >>>
#define KERNEL_ARGS4(grid, block, sh_mem, stream) <<< grid, block, sh_mem, stream >>>
#else
// Any parser other than nvcc (IntelliSense, for instance) sees empty macros:
#define KERNEL_ARGS2(grid, block)
#define KERNEL_ARGS3(grid, block, sh_mem)
#define KERNEL_ARGS4(grid, block, sh_mem, stream)
#endif
// Now launch your kernel using the appropriate macro:
kernel KERNEL_ARGS2(dim3(nBlockCount), dim3(nThreadCount)) (param1);
I prefer this method because for some reason I always lose the '<<<' in my code, but the macro gets some help via syntax coloring :).

Visual Studio provides IntelliSense for C++; the trick from the rocket scientist's blog basically relies on the similarity CUDA-C has to C++, nothing more.
In the C++ language, the proper parsing of angle brackets is troublesome. You've got < as less-than and for templates, and << as shift. Remember that not long ago we had to put a space between the closing brackets of nested template declarations.
So when the person at NVIDIA who came up with this syntax, evidently not a language expert, chose the worst possible delimiter and then tripled it, trouble was bound to follow. It's amazing that IntelliSense works at all when it sees this.
The only way I know to get full IntelliSense in CUDA is to switch from the Runtime API to the Driver API. The C++ is just C++, and the CUDA is still (sort of) C++, so there is no <<<>>> badness for the language parser to work around.

From VS 2015 and CUDA 7 onwards you can add these two includes before any others, provided your files have the .cu extension:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
No need for MACROS or anything. Afterwards everything will work perfectly.

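I LOVED Randy's solution. I'll match and raise using C preprocessor variadic macros:
// __INTELLISENSE__ is defined only while the Visual Studio IntelliSense parser reads the file.
#ifdef __INTELLISENSE__
#define CUDA_KERNEL(...)
#else
#define CUDA_KERNEL(...) <<< __VA_ARGS__ >>>
#endif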
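Usage example, by analogy with the KERNEL_ARGS macros above:
kernel CUDA_KERNEL(dim3(nBlockCount), dim3(nThreadCount)) (param1);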

I've been learning CUDA and have encountered that exact issue. As others have said, it's just an IntelliSense problem and can be ignored, but I've found a clean solution that actually removes it.
It seems that <<< >>> is interpreted as correct code if it's inside a template function.
I discovered it accidentally when I wanted to create wrappers for kernels so that they can be called from regular cpp code. It's both a nice abstraction and removes the syntax error.
kernel header file (eg. kernel.cuh)
const size_t THREADS_IN_BLOCK = 1024;
typedef double numeric_t;
// sample kernel function headers
__global__ void sumKernel(numeric_t* out, numeric_t* f, numeric_t* blockSum, size_t N);
__global__ void expKernel(numeric_t* out, numeric_t* in, size_t N);
// ..
// strongly typed wrapper for a kernel with 4 arguments
template <typename T1, typename T2, typename T3, typename T4>
void runKernel(void (*fun)(T1, T2, T3, T4), int Blocks, T1 arg1, T2 arg2, T3 arg3, T4 arg4) {
    fun <<<Blocks, THREADS_IN_BLOCK >>> (arg1, arg2, arg3, arg4);
}
// strongly typed wrapper for a kernel with 3 arguments
template <typename T1, typename T2, typename T3>
void runKernel(void (*fun)(T1, T2, T3), int Blocks, T1 arg1, T2 arg2, T3 arg3) {
    fun <<<Blocks, THREADS_IN_BLOCK >>> (arg1, arg2, arg3);
}
// ...
// the zero-argument version is not a template, so it cannot be implemented here
void runKernel(void (*fun)(), int Blocks);
in a .cu file (you will get a syntax error here, but do you ever need a parameter-less kernel function? If not, this and the respective declaration in the header can be deleted):
void runKernel(void (*fun)(), int Blocks) {
    fun <<<Blocks, THREADS_IN_BLOCK >>> ();
}
usage in a .cpp file:
runKernel(kernelFunctionName, blocks, arg1, arg2, arg3);
// for example runKernel(expKernel, B, output, input, size);


std::vector of type deduced from initializers before C++17 ... any workaround for C++11?

I learned that from C++17, with the deduction guides, template arguments of std::vector can be deduced e.g. from the initialization:
std::vector vec = { function_that_calculate_and_return_a_specifically_templated_type() }
However I do not have the luxury of C++17 in the machine where I want to compile and run the code now.
Is there any possible workaround for C++11? If several solutions exist, the best would be the one that keeps the code readable.
At the moment the only idea that I have is to track the various cases throughout the code (luckily there should not be too many) and add some explicit typedef/using declarations.
Any suggestion is very welcome.
The usual way to use type deduction for class template when CTAD is not available is providing a make_* function template, e.g. for your case (trailing return type is necessary for C++11):
#include <vector>
#include <type_traits>
#include <tuple>
template <class ...Args>
auto make_vec(Args&&... args) ->
    std::vector<typename std::decay<typename std::tuple_element<0, std::tuple<Args...>>::type>::type>
{
    // The element type is the decayed type of the first argument.
    using First = typename std::decay<typename std::tuple_element<0, std::tuple<Args...>>::type>::type;
    return std::vector<First>{std::forward<Args>(args)...};
}
You can invoke the above with
const auto v = make_vec(1, 2, 3);
which gets at least kind of close to CTAD in the sense that you don't have to explicitly specify the vector instantiation.
While the answer by lubgr is a correct way, the following template is simpler and seems to work as well:
#include <vector>
#include <string>
template <typename T>
std::vector<T> make_vec(const std::initializer_list<T> &list)
{
    return std::vector<T>(list);
}
int main()
{
    auto v = make_vec({1,2,3});
    auto v2 = make_vec({std::string("s")});
    std::string s("t");
    auto v3 = make_vec({s});
    return v.size() + v2.size() + v3.size();
}
One advantage of using the initializer_list template directly is clearer error messages if you pass mixed types, as in make_vec({1,2,"x"});, because the construction of the invalid initializer list now happens in non-templated code.
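To see what each helper actually deduces, the element type can be checked at compile time. The following is only a sketch under C++11: the names make_vec_variadic and make_vec_list are hypothetical renames of the two make_vec versions above, chosen so that both can sit in one file.

```cpp
#include <initializer_list>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Variadic flavour (hypothetical name): the element type is the decayed
// type of the first argument.
template <class... Args>
auto make_vec_variadic(Args&&... args)
    -> std::vector<typename std::decay<
           typename std::tuple_element<0, std::tuple<Args...>>::type>::type>
{
    using First = typename std::decay<
        typename std::tuple_element<0, std::tuple<Args...>>::type>::type;
    return std::vector<First>{std::forward<Args>(args)...};
}

// initializer_list flavour (hypothetical name): T is deduced from the
// braced list itself.
template <typename T>
std::vector<T> make_vec_list(std::initializer_list<T> list)
{
    return std::vector<T>(list);
}

// Compile-time checks of the deduced vector types.
static_assert(std::is_same<decltype(make_vec_variadic(1, 2, 3)),
                           std::vector<int>>::value,
              "three ints should give std::vector<int>");
static_assert(std::is_same<decltype(make_vec_list({1.5, 2.5})),
                           std::vector<double>>::value,
              "two doubles should give std::vector<double>");
static_assert(std::is_same<decltype(make_vec_list({std::string("s")})),
                           std::vector<std::string>>::value,
              "a std::string should give std::vector<std::string>");
```

Neither helper converts between element types: the variadic one takes its type from the first argument only, and the list one requires every element to have the same type.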

C++11, Is it possible to force an instance to be extern but also a constant expression of a non-type template parameter?

Using C++11, g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18).
Let's pretend I have a templated function (pardon my terminology if it isn't quite right).
I want to perform a "general" algorithm based on what were supposed to be compile-time instances of "field", where the only things that really change are these constants, which I moved into trait classes (only one is shown here, but imagine there are more). Originally I was declaring it as
constexpr field FIELD1{1};
However, in C++11, non-type template params need to have external linkage (unlike C++14, which can have internal and external linkage?). So because it's not in the same translation unit I needed to use extern in order to give it external linkage (sorry if I butchered that explanation also). But by defining it extern I can't define it using constexpr, and it seems that without that constexpr this field is no longer a valid constant expression that qualifies as a non-type template param.
Any suggestions if there is some way I can get around this? Open to a new method of doing things. Below is a simplified (incomplete, and non-compiling version to get the gist of the organization).
So the error I am seeing is along the lines of
error: the value of ‘FIELD1’ is not usable in a constant expression
note: ‘FIELD1’ was not declared ‘constexpr’
extern const field FIELD1;
Not quite sure what could be a best alternative.
I can get rid of the second error by removing the constexpr from the constructor. But then I don't know how to approach the constant expression issue.
// field.H
struct field
{
    int thingone;
    constexpr field(int i):thingone(i){}
};
extern const field FIELD1;
// source file that defines FIELD1
#include "field.H"
const field FIELD1{0};
// header with the traits
#include <cstddef>
#include "field.H"
template< const field& T >
class fieldTraits;
template< >
class fieldTraits<FIELD1>
{
public:
    // Let's say I have common field names
    // with different constants that I want to plug
    // into the "function_name" algorithm
    static constexpr std::size_t field_val = 1;
};
// header with the algorithm
#include <iostream>
#include "field.H"
template< const field& T, typename TT = fieldTraits<T> >
void function_name()
{
    // Let's pretend I'm doing something useful with that data
    std::cout << T.thingone << std::endl;
    std::cout << TT::field_val << std::endl;
}
So because it's not in the same translation unit I needed to use extern in order to give it external linkage (sorry if I butchered that explanation also). But by defining it extern I can't define it using constexpr [...]
Per my comment, you can. It wouldn't work for you, but it's a step that helps in coming up with something that would work:
extern constexpr int i = 10;
This is perfectly valid, gives i external linkage, and makes i usable in constant expressions.
But it doesn't allow multiple definitions, so it can't work in a header file which is included in multiple translation units.
Ordinarily, the way around that is with inline:
extern inline constexpr int i = 10;
But variables cannot be declared inline in C++11.
Except... when they don't need to be declared inline because the effect has already been achieved implicitly:
struct S {
    static constexpr int i = 10;
};
Now, S::i has external linkage and is usable in constant expressions!
You may not even need to define your own class for this, depending on the constant's type: consider std::integral_constant. You can write
using i = std::integral_constant<int, 10>;
and now i::value will do exactly what you want.
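As a rough illustration of how that could carry over to the traits setup in the question, here is a sketch in which the constants are plain int values held in static constexpr members. The names Field1 and Field2, the int key and the values 10 and 20 are all hypothetical; the question keyed its traits on a const field& instead.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical constant holder: a static constexpr member has external
// linkage, is usable in constant expressions, and can live in a header.
struct Field1 {
    static constexpr int thingone = 1;
};

// The same idea without a hand-written class.
using Field2 = std::integral_constant<int, 2>;

// Traits keyed on the constant's value (an assumption of this sketch).
template <int Id>
struct fieldTraits;

template <>
struct fieldTraits<Field1::thingone> {
    static constexpr std::size_t field_val = 10;
};

template <>
struct fieldTraits<Field2::value> {
    static constexpr std::size_t field_val = 20;
};

// "General" algorithm: combines the key with its trait constant.
template <int Id>
constexpr std::size_t function_name()
{
    return Id + fieldTraits<Id>::field_val;
}

static_assert(function_name<Field1::thingone>() == 11,
              "1 + 10, evaluated at compile time");
```

Because the members are only read as values inside constant expressions, no out-of-class definition is needed in C++11 for this usage.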

Different compiler behavior with C++11

The following code
#include <vector>
#include <complex>
#include <algorithm>
#include <type_traits>
template<class K>
inline void conjVec(int m, K* const in) {
    static_assert(std::is_same<K, double>::value || std::is_same<K, std::complex<double>>::value, "");
    if(!std::is_same<typename std::remove_pointer<K>::type, double>::value)
#ifndef OK
        std::for_each(in, in + m, [](K& z) { z = std::conj(z); });
#else
        std::for_each(reinterpret_cast<std::complex<double>*>(in), reinterpret_cast<std::complex<double>*>(in) + m, [](std::complex<double>& z) { z = std::conj(z); });
#endif
}
int main(int argc, char* argv[]) {
    std::vector<double> nums;
    // instantiates conjVec<double>, which is what triggers the error below
    conjVec(nums.size(), nums.data());
    return 0;
}
compiles fine on Linux with
Debian clang version 3.5.0-9
gcc version 4.9.1
icpc version 15.0.1
and on Mac OS X with
gcc version 4.9.2
but not with
icpc version 15.0.1
except if the macro OK is defined. I don't know which compilers are at fault; could someone let me know? Thanks.
PS: here is the error
10:48: error: assigning to 'double' from incompatible type 'complex<double>'
std::for_each(in, in + m, [](K& z) { z = std::conj(z); });
The difference is that on Linux, you're using libstdc++ and glibc, and on MacOS you're using libc++ and whatever CRT MacOS uses.
The MacOS version is correct. (Also, your workaround is completely broken and insanely dangerous.)
Here's what I think happens.
There are multiple overloads of conj in the environment. C++98 brings in a single template, which takes a std::complex<F> and returns the same type. Because this template needs F to be deduced, it doesn't work when calling conj with a simple floating point number, so C++11 added overloads of conj which take float, double and long double, and return the appropriate std::complex instantiation.
Then there's a global function from the C99 library, ::conj, which takes a C99 double complex and returns the same.
libstdc++ doesn't yet provide the new C++11 conj overloads, as far as I can see. The C++ version of conj isn't called. It appears, however, that somehow ::conj found its way into the std namespace, and gets called. The double you pass is implicitly converted to a double complex by adding a zero imaginary part. conj negates that zero. The result double complex is implicitly converted back to a double by discarding the imaginary component. (Yes, that's an implicit conversion in C99. No, I don't know what they were thinking.) The result can be assigned to z.
libc++ provides the new overloads. The one taking a double is chosen. It returns a std::complex<double>. This class has no implicit conversion to double, so the assignment to z gives you an error.
The bottom line is this: your code makes absolutely no sense. A vector<double> isn't a vector<complex<double>> and shouldn't be treated as one. Calling conj on double doesn't make sense. Either it doesn't compile, or it's a no-op. (libc++'s conj(double) is in fact implemented by simply constructing a complex<double> with a zero imaginary part.) And wildly reinterpret_casting your way around compile errors is horrible.
Sebastian Redl's answer explains why your code didn't compile with libc++ but did with libstdc++. if is not the static if that exists in some languages; even if the code in an if branch is 100% dead, it must still be valid code.
In any event, this feels like a massive amount of unnecessary complexity to me. Not everything has to be a template. Especially when your template can only be used with two types, and when used with one of those two it's a no-op.
template<class K>
inline void conjVec(int m, K* const in) {
    static_assert(std::is_same<K, double>::value || std::is_same<K, std::complex<double>>::value, "");
    if(!std::is_same<K, double>::value)
        std::for_each(reinterpret_cast<std::complex<double>*>(in), reinterpret_cast<std::complex<double>*>(in) + m, [](std::complex<double>& z) { z = std::conj(z); });
}
versus
inline void conjVec(int m, double* const in) {}
inline void conjVec(int m, std::complex<double>* const in) {
    std::for_each(in, in + m, [](std::complex<double>& z) { z = std::conj(z); });
}
I know which one I would prefer.
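If the function really has to stay a template, one C++11 way to keep the dead branch from being compiled for double is tag dispatch. This is only a sketch: conjVecImpl is a hypothetical helper name, and the compile-time selection is done through std::true_type and std::false_type overloads instead of a runtime if.

```cpp
#include <algorithm>
#include <complex>
#include <type_traits>

// Selected when K is double: conjugating a real number changes nothing.
template <class K>
void conjVecImpl(int, K*, std::true_type) {}

// Selected otherwise: the lambda is instantiated only for this overload,
// so z = std::conj(z) is never compiled with K = double.
template <class K>
void conjVecImpl(int m, K* in, std::false_type) {
    std::for_each(in, in + m, [](K& z) { z = std::conj(z); });
}

template <class K>
void conjVec(int m, K* in) {
    static_assert(std::is_same<K, double>::value ||
                      std::is_same<K, std::complex<double>>::value,
                  "conjVec supports double and std::complex<double> only");
    // std::is_same<...> derives from true_type or false_type, which picks
    // the matching overload at compile time.
    conjVecImpl(m, in, std::is_same<K, double>());
}
```

The two plain overloads shown above remain the simpler choice; this form is mainly useful when the surrounding code is itself generic over K.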

Getting c++11 auto initialization syntax right

I am a newbie programmer in C++ (but a veteran programmer in other languages) and I am trying to use "Modern C++" in my code.
I am wondering what I am doing wrong here, trying to initialize an istream from a boost::asio::streambuf:
#include <iostream>
#include <boost/asio/streambuf.hpp>
class A {
    void foo();
    boost::asio::streambuf cmdStreamBuf_{};
};
void A::foo() {
    std::istream is1{&cmdStreamBuf_}; // works
    auto is2 = std::istream{&cmdStreamBuf_}; // does not compile
}
I get this error:
try.cpp:13:41: error: use of deleted function 'std::basic_istream<char>::basic_istream(const std::basic_istream<char>&)'
I am not trying to copy; I thought I was constructing an std::istream!
Since all the answers were in the comments, I thought I'd finish this off by doing an official answer myself.
I am using a c++ library that doesn't have movable streams, and this matters because
auto is2 = std::istream{&cmdStreamBuf_};
creates a new std::istream and then initializes is2 with that rvalue (temporary object). It initializes it by calling the copy constructor or the move constructor. My c++ library apparently does not have either of these constructors, therefore the call fails.
I had originally thought that
auto varname = typename{...};
was conceptually the same as
typename varname{...};
but it is not. So, this is an instance where you can't use auto to create a variable.
(sigh) And I was really hyped on using auto everywhere.
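The difference can be reproduced without streams. The sketch below uses a hypothetical class Pinned that can be neither copied nor moved. Two related facts, as far as I know: the move constructor of std::istream is protected in standard C++11, so the auto form would be rejected there as well, and from C++17 onwards guaranteed copy elision makes auto x = T{...} valid even for types like this one.

```cpp
// Hypothetical stand-in for a type that can be neither copied nor moved.
class Pinned {
public:
    explicit Pinned(int v) : value_(v) {}
    Pinned(const Pinned&) = delete;             // also suppresses the implicit move
    Pinned& operator=(const Pinned&) = delete;
    int value() const { return value_; }
private:
    int value_;
};

int directInit() {
    Pinned p{7};            // direct-initialization: no copy or move is needed
    return p.value();
}

int referenceBinding() {
    auto&& p = Pinned{9};   // binds a reference; the temporary lives as long as p
    return p.value();
}

// auto p = Pinned{7};      // ill-formed before C++17: needs an accessible
//                          // copy or move constructor
```

So auto can still be used with such types before C++17, but only through a reference binding, not through copy-initialization from a temporary.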

Is there a way to use _T/TEXT "conditionally" inside a macro?

This question is specific to Visual C++ (you may assume Visual C++ 2005 and later).
I would like to create glue code for a program from unixoid systems (FreeBSD in particular) in order to build and run it on Win32 with a minimum of changes to the original .c file. Most of this is straightforward, but now I have run into an issue. I am using the tchar.h header and TCHAR/_TCHAR and need to create glue code for the err and errx calls (see err(3)) in the original code. Bear with me, even if you don't agree that code using tchar.h should still be written.
The calls to err and errx take roughly two forms, and this is where the problem occurs:
err(1, "some string with %d format specifiers and %s such", ...)
/* or */
err(1, NULL, ...)
The latter would output the error stored in errno (using strerror).
Now, the question is, is there any way to write a generic macro that can take both NULL and a string literal? I do not have to (and will not) care about variables getting passed as the second parameter, only NULL and literal strings.
Of course my naive initial approach didn't account for fmt passed as NULL (using variadic macros!):
#define err(eval, fmt, ...) my_err(eval, _T(fmt), __VA_ARGS__)
Right now I don't have any ideas how to achieve this with macros, because it would require a kind of mix of compile-time and runtime conditionals that I cannot imagine at the moment. So I am looking for an authoritative answer whether this is conceivable at all or not.
The method I am resorting to right now - lacking a better approach - is to write a wrapper function that accepts, just like err and errx, a (const) char * and then converting that to wchar_t * if compiled with _UNICODE defined. This should work, assuming that the caller passes a _TCHAR* string for the variable arguments after the fmt (which is a sane assumption in my context). Otherwise I'd also have to convert %s to %hs inside the format string, to handle "ANSI" strings, as MS calls them.
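The conversion step of such a wrapper might look roughly like this. It is a portable sketch, not the actual glue code: widenFormat is a hypothetical name, std::mbsrtowcs from the standard library stands in for whatever conversion the Win32 build would really use, and the %s to %hs rewriting mentioned above is not handled.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: widens a narrow format string into `out`.
// Returns false when there is no usable format (fmt is NULL or is not a
// valid multibyte string in the current locale), so the caller can fall
// back to printing strerror(errno) only.
bool widenFormat(const char* fmt, std::wstring& out) {
    out.clear();
    if (fmt == NULL) return false;

    std::mbstate_t state = std::mbstate_t();
    const char* src = fmt;
    // With a null destination, mbsrtowcs only reports the required length.
    const std::size_t len = std::mbsrtowcs(NULL, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) return false;

    out.resize(len);
    state = std::mbstate_t();
    src = fmt;
    if (len > 0) std::mbsrtowcs(&out[0], &src, len, &state);
    return true;
}
```

A wrapper for err could call this first and pass the widened string on to the wide printf family only when it returns true.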
Here's one solution:
#define _WIN32_WINNT 0x0502
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#ifdef _UNICODE
/* L ## NULL pastes to the single token LNULL, which maps back to NULL */
#define LNULL NULL
#define fn(x) myfn(L ## x)
#else
#define fn(x) myfn(x)
#endif
void myfn(const TCHAR * str)
{
    if (str == NULL) _tprintf(_T("<NULL>")); else _tprintf(str);
}
int main(int argc, char ** argv)
{
    fn("a string literal");
    fn(NULL);
    return 0;
}
