Cuda Project Not Compiling - visual-studio-2010

I have compiled my cuda project using visual studio 2010. I have countered an error stated:
student_func.cu(65): error C2059: syntax error : '<'
The line where error occurs is when the kernel function is called:
rgba_to_greyscale<<< gridSize, blockSize >>>(d_rgbaImage, d_greyImage, numRows, numCols);
and here is the code for student_func.cu:
#include "reference_calc.cpp"
#include "utils.h"
#include <stdio.h>
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{
}
void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
//You must fill in the correct sizes for the blockSize and gridSize
//currently only one block with one thread is being launched
const dim3 blockSize(1, 1, 1); //TODO
const dim3 gridSize( 1, 1, 1); //TODO
rgba_to_greyscale<<< gridSize, blockSize >>>(d_rgbaImage, d_greyImage, numRows, numCols);
cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

Please, have first a look at this guide on how to integrate CUDA in a Visual Studio C++ project.
Also, you should organize the code so that:
.h, .cpp, .c, .hpp files should not contain CUDA code (like __device__ functions and the kernel call in your case). However, in these files you can call CUDA APIs (for example, cudaMalloc, cudaMemcpy, etc.). These files are compiled by a compiler other than NVCC.
.cuh, .cu files should contain the CUDA code. These files are compiled by NVCC.
As an example, suppose to have a GPU-based FDTD code. I usually do the following (Visual Studio 2010).
main.cpp file, including CPU-GPU memory transfers;
FDTD.cu, including an extern "C" void E_update(...) function which contains the kernel <<< >>> call;
main.h file, including the extern "C" void E_update(...) prototype;
FDTD.cuh, including the __global__ void E_update_kernel(...) function.

Related

Shared library undefined reference for CUDA Kernel wrapper

So I am attempting to use the CUDA Runtime API with Go's cgo on Windows. I've been at this for a few days now and am stuck: I am getting an undefined reference to my kernel wrapper.
I have separated out my kernel and it's wrapper into the following
FILE: cGo.cuh
typedef unsigned long int ktype;
typedef unsigned char glob;
/*
function Prototypes
*/
extern "C" void kernel_kValid(int, int, ktype *, glob *);
__global__ void kValid(ktype *, glob *);
FILE: cGo.cu
#include "cGo.cuh"
#include "device_launch_parameters.h"
#include "cuda.h"
#include "cuda_runtime.h"
//function Definitions
/*
kernel_kValid is a wrapper function for the CUDA Kernel to be called from Go
*/
extern "C" void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values) {
kValid<<<blocks, threads>>>(kInfo, values);//execute the kernel
}
/*
kValid is the CUDA Kernel which is to be executed
*/
__global__ void kValid(ktype *kInfo, glob *values) {
//lots of code
}
I compile my CUDA source code into a shared library as such:
nvcc -shared -o myLib.so cGo.cu
then I have created a header file to include in my cgo
FILE: cGo.h
typedef unsigned long int ktype;
typedef unsigned char glob;
/*
function Declarations
*/
void kernel_kValid(int , int , ktype *, glob *);
Then from the go package I utilize cgo to call my kernel wrapper I have
package cuda
/*
#cgo LDFLAGS: -LC:/Storage/Cuda/lib/x64 -lcudart //this is the Cuda library
#cgo LDFLAGS: -L${SRCDIR}/lib -lmyLib //this is my shared library
#cgo CPPFLAGS: -IC:/Storage/Cuda/include //this contains cuda headers
#cgo CPPFLAGS: -I${SRCDIR}/include //this contains cGo.h
#include <cuda_runtime.h>
#include <stdlib.h>
#include "cGo.h"
*/
import "C"
func useKernel(){
//other code
C.kernel_kValid(C.int(B), C.int(T), unsafe.Pointer(storageDevice), unsafe.Pointer(globDevice))
cudaErr, err = C.cudaDeviceSynchronize()
//rest of the code
}
So all of the calls to the CUDA runtime API don't throw errors, it's only my kernel wrapper. This is the output when I build the cuda package with go.
C:\Users\user\Documents\Repos\go\cuda_wrapper>go build cuda_wrapper\cuda
# cuda_wrapper/cuda
In file included from C:/Storage/Cuda/include/host_defines.h:50:0,
from C:/Storage/Cuda/include/device_types.h:53,
from C:/Storage/Cuda/include/builtin_types.h:56,
from C:/Storage/Cuda/include/cuda_runtime.h:86,
from C:\Go\workspace\src\cuda_wrapper\cuda\cuda.go:12:
C:/Storage/Cuda/include/crt/host_defines.h:84:0: warning: "__cdecl" redefined
#define __cdecl
<built-in>: note: this is the location of the previous definition
# cuda_wrapper/cuda
C:\Users\user\AppData\Local\Temp\go-build038297194\cuda_wrapper\cuda\_obj\cuda.cgo2.o: In function `_cgo_440ebb0a3e25_Cfunc_kernel_kValid':
/tmp/go-build\cuda_wrapper\cuda\_obj/cgo-gcc-prolog:306: undefined reference to `kernel_kValid'
collect2.exe: error: ld returned 1 exit status
It's here I'm not really sure what's wrong. I have been looking at questions asked about undefined references with cgo but nothing I have found has solved my issue. I have also been looking at the fact that the CUDA runtime API is written in C++ and if that would affect how cgo will compile this but again I haven't found anything conclusive. At this point I think I have confused myself more than anything else so I'm hoping someone more knowledgeable can point me in the right direction.
Good catch on the name manlging.
Here's a solution we used for gorgonia:
#include <math.h>
#ifdef __cplusplus
extern "C" {
#endif
__global__ void sigmoid32(float* A, int size)
{
int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
if (idx >= size) {
return;
}
A[idx] = 1 / (1 + powf((float)(M_E), (-1 * A[idx])));
}
#ifdef __cplusplus
}
#endif
So... just wrap your kernel wrapper function in extern "C"

Code with std::bind not compiling, error C2780: expects 6 arguments - 8 provided

Consider the code:
#include <functional>
#include <vector>
#include <stdint.h>
class CFileOperationWatcher
{
public:
CFileOperationWatcher() {}
virtual void onProgressChanged(uint64_t sizeProcessed, uint64_t totalSize, size_t numFilesProcessed, size_t totalNumFiles, uint64_t currentFileSizeProcessed, uint64_t currentFileSize) {}
virtual ~CFileOperationWatcher() {}
void onProgressChangedCallback(uint64_t sizeProcessed, uint64_t totalSize, size_t numFilesProcessed, size_t totalNumFiles, uint64_t currentFileSizeProcessed, uint64_t currentFileSize) {
_callbacks.emplace_back(std::bind(&CFileOperationWatcher::onProgressChanged, this, sizeProcessed, totalSize, numFilesProcessed, totalNumFiles, currentFileSizeProcessed, currentFileSize));
}
protected:
std::vector<std::function<void ()> > _callbacks;
};
int main(int argc, char *argv[])
{
CFileOperationWatcher w;
w.onProgressChangedCallback(0,0,0,0,0,0);
}
I'm getting an error C2780 in Visual Studio 2012. Looks like there's no std::bind definition that can take that many arguments. But isn't it supposed to use variadic templates and accept any number of args?
MSVC++ 2012 has fake variadic templates that rely on macro machinery. By default, they only work for up to 5 parameters. If you need more, you can use _VARIADIC_MAX to up it as high as 10 parameters.
Here's a similar question.
VC++ added variadic templates in the 2013 version.
This compiles fine with clang (3.3), so it must be a compiler bug with VS2012.
And yes, you're correct: std::bind() is a variadic template (C++11), see also http://en.cppreference.com/w/cpp/utility/functional/bind
Visual Studio 2012 has no support for variadic templates. Visual Studio 2013 will on the other hand according to this.

Separating out .cu and .cpp(using c++11 library)

I am trying to convert a c++ program I have which uses random library which is a C++11 feature. After having read through a couple of similar posts here, I tried by separating out the code into three files. At the outset I would like to say that I am not very conversant at C/C++ and mostly use R at work.
The main file looks as follows.
#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_
#include <complex>
#include <random>
#include <iostream>
#include "my_code_header.h"
using namespace std;
std::default_random_engine generator;
std::normal_distribution<double> distribution(0.0,1.0);
const int rand_mat_length = 24561;
double rand_mat[rand_mat_length];// = {0};
void create_std_norm(){
for(int i = 0 ; i < rand_mat_length ; i++)
::rand_mat[i] = distribution(generator);
}
.
.
.
int main(void)
{
...
...
call_global();
return 0;
}
#endif
The header file looks as follows.
#ifndef mykernel_h
#define mykernel_h
void call_global();
void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width);
#endif
And the .cu file looks like the following.
#ifndef _MY_KERNEL_
#define _MY_KERNEL_
#include <iostream>
#include "my_code_header.h"
#define TILE_WIDTH 8
using namespace std;
__global__ void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width)
{
unsigned int row = blockIdx.y*blockDim.y + threadIdx.y;
unsigned int col = blockIdx.x*blockDim.x + threadIdx.x;
if ((row>length) || (col>width)) {
return;
}
...
}
void call_global()
{
const size_t imageLength = 528;
const size_t imageWidth = 528;
const dim3 threadsPerBlock(TILE_WIDTH,TILE_WIDTH);
const dim3 numBlocks(((imageLength) / threadsPerBlock.x), ((imageWidth) / threadsPerBlock.y));
double *d_a, *d_b, *mys ;
...
cudaMalloc((void**)&d_a, sizeof(double) * imageLength);
cudaMalloc((void**)&d_b, sizeof(double) * imageWidth);
cudaMalloc((void**)&mys, sizeof(double) * imageLength * imageWidth);
two_d_example<<<numBlocks,threadsPerBlock>>>(d_a, d_b, mys, imageLength, imageWidth);
...
cudaFree(d_a);
cudaFree(d_b);
}
#endif
Please note that the __global__ has been removed from .h since I was getting the following error owing to it being compiled by g++.
In file included from my_code_main.cpp:12:0:
my_code_header.h:5:1: error: ‘__global__’ does not name a type
When I compile the .cu file with nvcc it is all fine and generates a my_code_kernel.o. But since I am using C++11 in my .cpp I am trying to compile it with g++ and I am getting the following error.
/tmp/ccR2rXzf.o: In function `main':
my_code_main.cpp:(.text+0x1c4): undefined reference to `call_global()'
collect2: ld returned 1 exit status
I understand that this might not have to do anything with CUDA as such and may just be the wrong use of including the header at both places. Also what is the right way to compile and most importantly link the my_code_kernel.o and my_code_main.o(hopefully)? Sorry if this question is too trivial!
It looks like you are not linking with my_code_kernel.o. You have used -c for your nvcc command (causes it to compile but not link, i.e. generate the .o file), I'm going to guess that you're not using -c with your g++ command, in which case you need to add my_code_kernel.o to the list of inputs as well as the .cpp file.
The separation you are trying to achieve is completely possible, it just looks like your not linking properly. If you still have problems, add the compilation commands to your question.
FYI: You don't need to declare two_d_example() in your header file, it is only used within your .cu file (from call_global()).

Steps to make a loadable DLL of some tcl methods in Visual Studio

I want to create a loadable DLL of some of my tcl methods. But I am not getting how to do this. For that I have taken a simple example of tcl api which adds two numbers and prints the sum. Now I want to create a loadable DLL for this to export this tcl functionality.
But I am not understanding how to do it in Visual Studio. I have written a C code which can call this tcl api and get the sum of two integers, but again I don't want it to do this way. I want to create a DLL file to use this tcl functionality. How can I create this DLL on Visual Studio 2010.
Below is my sample tcl program that I am using:
#!/usr/bin/env tclsh8.5
proc add_two_nos { } {
set a 10
set b 20
set c [expr { $a + $b } ]
puts " c is $c ......."
}
And here is the C code which can use this tcl functionality :
#include <tcl.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
Tcl_Interp *interp;
int code;
char *result;
Tcl_FindExecutable(argv[0]);
interp = Tcl_CreateInterp();
code = Tcl_Eval(interp, "source myscript.tcl; add_two_nos");
/* Retrieve the result... */
result = Tcl_GetString(Tcl_GetObjResult(interp));
/* Check for error! If an error, message is result. */
if (code == TCL_ERROR) {
fprintf(stderr, "ERROR in script: %s\n", result);
exit(1);
}
/* Print (normal) result if non-empty; we'll skip handling encodings for now */
if (strlen(result)) {
printf("%s\n", result);
}
/* Clean up */
Tcl_DeleteInterp(interp);
exit(0);
}
I have successfully compiled this code with the below command
gcc simple_addition_wrapper_new.c -I/usr/include/tcl8.5/ -ltcl8.5 -o simple_addition_op
The above code is working with the expected output.
What steps do I need to take to create a loadable dll for this in Visual Studio 2010?
If you look at the answers to this question: here it gives the basic outline of the process you need to go through. There are links from my answer to some Microsoft MSDN articles on creating DLLs.
To go into this in a little more detail for a C++ dll that has Tcl embedded in it.
The first step is to create a new visual studio project with the correct type, one that is going to build a dll that exports symbols. My example project is called TclEmbeddedInDll and that name appears in code in symbols such as TCLEMBEDDEDINDLL_API that are generated by Visual Studio.
The dllmain.cpp look like this:
// dllmain.cpp : Defines the entry point for the DLL application.
#include "stdafx.h"
BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
{
allocInterp() ;
break ;
}
case DLL_THREAD_ATTACH:
break ;
case DLL_THREAD_DETACH:
break ;
case DLL_PROCESS_DETACH:
{
destroyInterp() ;
break;
}
}
return TRUE;
}
The allocInterp() and destroyInterp() functions are defined in the TclEmbeddedInDll.h, the reason for using functions here rather than creating the Tcl_Interp directly is that it keeps the details about Tcl away from the DLL interface. If you create the interp here then you have to include tcl.h and then things get complicated when you try and use the DLL in another program.
The TclEmbeddedInDll.h and .cpp are shown next, the function fnTclEmbeddedInDll() is the one that is exported from the DLL - I'm using C linkage for this rather than C++ as it makes it easier to call the function from other languages IMHO.
// The following ifdef block is the standard way of creating macros which make exporting
// from a DLL simpler. All files within this DLL are compiled with the TCLEMBEDDEDINDLL_EXPORTS
// symbol defined on the command line. This symbol should not be defined on any project
// that uses this DLL. This way any other project whose source files include this file see
// TCLEMBEDDEDINDLL_API functions as being imported from a DLL, whereas this DLL sees symbols
// defined with this macro as being exported.
#ifdef TCLEMBEDDEDINDLL_EXPORTS
#define TCLEMBEDDEDINDLL_API __declspec(dllexport)
#else
#define TCLEMBEDDEDINDLL_API __declspec(dllimport)
#endif
extern "C" {
TCLEMBEDDEDINDLL_API void fnTclEmbeddedInDll(void);
}
void allocInterp() ;
void destroyInterp() ;
// TclEmbeddedInDll.cpp : Defines the exported functions for the DLL application.
//
#include "stdafx.h"
extern "C" {
static Tcl_Interp *interp ;
// This is an example of an exported function.
TCLEMBEDDEDINDLL_API void fnTclEmbeddedInDll(void)
{
int code;
const char *result;
code = Tcl_Eval(interp, "source simple_addition.tcl; add_two_nos");
result = Tcl_GetString(Tcl_GetObjResult(interp));
}
}
void allocInterp()
{
Tcl_FindExecutable(NULL);
interp = Tcl_CreateInterp();
}
void destroyInterp()
{
Tcl_DeleteInterp(interp);
}
The implementation of allocInterp() and destroyInterp() is very naive, no error checking is done.
Finally for the Dll the stdafx.h file ties it all together like this:
// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//
#pragma once
#include "targetver.h"
#define WIN32_LEAN_AND_MEAN // Exclude rarely-used stuff from Windows headers
// Windows Header Files:
#include <windows.h>
// TODO: reference additional headers your program requires here
#include <tcl.h>
#include "TclEmbeddedInDll.h"

Visual Studio, MSBuild: First build after a clean fails, following builds succeed

There's obviously something wrong with my build, but I can't figure it out. I narrowed this down to one of my projects: first build after clean fails, all following builds succeed.
I get linking errors which say that some symbols are already defined:
>------ Build started: Project: Problem, Configuration: Debug Win32 ------
> Blah blah blah...
23> Creating library D:\SVN.DRA.WorkingCopy\Debug\Problem.lib and object D:\SVN.DRA.WorkingCopy\Debug\Problem.exp
23>ProblemDependency1.lib(PD1.obj) : error LNK2005: "public: unsigned short __thiscall PD2Class::getFoo(void)const " (?getFoo#PD2Class##QBEGXZ) already defined in ProblemDependecy2.lib(ProblemDependency2.dll)
23>ProblemDependency1.lib(PD1.obj) : error LNK2005: "public: void __thiscall PD2Class2::`default constructor closure'(void)" (??_FPD2Class2#Image#DRA##QAEXXZ) already defined in ProblemDependency2.lib(ProblemDependency2.dll)
23>D:\SVN.DRA.WorkingCopy\Debug\Problem.dll : fatal error LNK1169: one or more multiply defined symbols found
Problem is a C++/CLI project, built with the /clr switch, which references the unmanaged C++ projects ProblemDependency1, a static lib, and ProblemDependency2, a dll.
ProblemDependency1 references ProblemDependency2.
getFoo() is declared as inline and defined outside of the class declaration, in the .h
PD2Class2 doesn't have an explicitly defined default constructor, but it has a constructor which has all default arguments, so you could say it includes the default constructor as a special case
The .h's where these are defined have #pragma once as their first line.
Any hint on troubleshooting this? I can post more info if needed
Update: I solved the first error thanks to Anders Abel's suggestion, but I still can't solve the second one (the one about the default constructor)
Update: If I compile using MSBuild outside Visual Studio, it fails always, with the same error
Edit: Here's some code. First, a bit of PD2Class2's declaration. PD2Class2's real name is CImage (feeling lazy to anonymize), CImage.h:
#pragma once
#pragma warning( disable: 4251 ) //TODO: Disable and solve
#include "ImageProperties.h"
#include "../CommonCppLibrary/CriticalSection.h"
#include <windows.h>
#include <stdexcept>
#include <string>
class CSharedMemory;
class EmptyImageException;
struct IShape;
struct SImageStatics {
unsigned short low3Percentile;
unsigned short high97Percentile;
unsigned short medianPixelValue;
unsigned short meanPixelValue;
unsigned short minPixelValue;
unsigned short maxPixelValue;
};
namespace DRA{
namespace Image{
class __declspec(dllexport) CImage {
friend class CImageLock;
//Attributes
int m_iPitch;
protected:
mutable CImageProperties m_cProperties;
CSharedMemory * m_pSharedMemmory;
mutable DRA::CommonCpp::CCriticalSection m_csData;
static const float PIXEL_FREQUENCY_COVERAGE;
static const float PIXEL_CUTOFF_PERCENTAGE;
static const int MINIMUM_PIXEL_FREQUENCY; //Pixels with a frequency lower than this are ignored
static const int MINIMUM_WINDOW_WIDTH_FOR_16_BITS;
//Methods
//Some private methods
public:
CImage( DWORD dwWidth = 0, DWORD dwHeight = 0, ULONG uBytesPerPixel = 0,
bool isSigned = false, EPhotometricInterpretation ePI = PI_UNKNOWN,
UINT bitsStored = 0, float pw = -1.0f, float ph = -1.0f, BYTE * pData = NULL );
CImage( const CImageProperties& cProperties, int iPitch = 0 );
CImage( const CImage& rImage );
virtual ~CImage();
virtual CImage& operator=( const CImage& );
bool operator==( const CImage& rImage );
//Alter State
//More methods
//Query State
//More methods
};
}
}
Next, the constructor's definition, from CImage.cpp:
CImage::CImage( DWORD dwWidth, DWORD dwHeight, ULONG uBytesPerPixel, bool isSigned,
EPhotometricInterpretation ePI, UINT bitsStored, float pw, float ph,
BYTE * pData ) :
m_iPitch( dwWidth * uBytesPerPixel ),
m_cProperties( dwWidth, dwHeight, uBytesPerPixel, bitsStored, ePI, isSigned, pw, ph ),
m_pSharedMemmory( NULL ),
m_csData(){
m_pSharedMemmory = new CSharedMemory( pData ? pData : new BYTE[getSize()] );
}
Is getFoo() marked as __declspec(dllexport)? If it is an inline function, it is instantiated/used from wherever it is called through the included header. It shouldn't be part of the functions that the dll exports and it should not have a dllexport directive.
__declspec(dllexport) might be handled through a macro that is expanded to dllexport or dllimport depending on if it is the dll or code using the dll that is compiled. If there is any macro in the function declaration you might have to dig into it to find if there is an export directive.
Update
I think that if the header file is used both when the dll is built and when the dll is used, it is incorrect to have __declspec(dllexport) in the header file. Instead use a macro system:
#ifdef PROBLEMDEPENDENCY2
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT __declspec(dllimport)
#endif
class DLLEXPORT CImage
{
//...
}
Then define the PROBLEMDEPENDENCY2 preprocessor symbol when building the dll, but not when using it. The problem with hardcoding __declspec(dllexport) in the header file is that the compiler will try to export the class both from ProblemDependency2 (which is correct) and from ProblemDependency1 (which is incorrect).
Just something I've run into recently to check:
Are you building on a network volume? I had been having problems with not being able to debug my applications because the .pdb file was not "there" after the build and before the debug launch due to latency in the SAN that I was working on as a build directory.
Once I moved the project build to a local volume, everything was fine.
Don't know if that's what's happening to you or not, but something I'd look into.
I dont' have much c++ experience, but problems like this in other .NET languages, often result from having a DLL reference to another project in the same solution (to the DLL in the "obj" or "bin" folder of the other project, instead of a project reference. This stops Visual Studio from being able to figure out the build order, and, hence, the first time after a "clean", you will not have the DLL you are depending on. On the second build, this DLL will already have been built, and the build will succeed.

Resources