Separating out .cu and .cpp (using a C++11 library) - c++11

I am trying to convert a C++ program that uses the random library, a C++11 feature. After reading through a couple of similar posts here, I tried separating the code into three files. At the outset I should say that I am not very conversant in C/C++ and mostly use R at work.
The main file looks as follows.
#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_
#include <complex>
#include <random>
#include <iostream>
#include "my_code_header.h"
using namespace std;
std::default_random_engine generator;
std::normal_distribution<double> distribution(0.0,1.0);
const int rand_mat_length = 24561;
double rand_mat[rand_mat_length];// = {0};
void create_std_norm(){
for(int i = 0 ; i < rand_mat_length ; i++)
::rand_mat[i] = distribution(generator);
}
.
.
.
int main(void)
{
...
...
call_global();
return 0;
}
#endif
The header file looks as follows.
#ifndef mykernel_h
#define mykernel_h
void call_global();
void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width);
#endif
And the .cu file looks like the following.
#ifndef _MY_KERNEL_
#define _MY_KERNEL_
#include <iostream>
#include "my_code_header.h"
#define TILE_WIDTH 8
using namespace std;
__global__ void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width)
{
unsigned int row = blockIdx.y*blockDim.y + threadIdx.y;
unsigned int col = blockIdx.x*blockDim.x + threadIdx.x;
if ((row >= length) || (col >= width)) { // valid indices are 0..length-1 and 0..width-1
return;
}
...
}
void call_global()
{
const size_t imageLength = 528;
const size_t imageWidth = 528;
const dim3 threadsPerBlock(TILE_WIDTH,TILE_WIDTH);
const dim3 numBlocks(((imageLength) / threadsPerBlock.x), ((imageWidth) / threadsPerBlock.y));
double *d_a, *d_b, *mys ;
...
cudaMalloc((void**)&d_a, sizeof(double) * imageLength);
cudaMalloc((void**)&d_b, sizeof(double) * imageWidth);
cudaMalloc((void**)&mys, sizeof(double) * imageLength * imageWidth);
two_d_example<<<numBlocks,threadsPerBlock>>>(d_a, d_b, mys, imageLength, imageWidth);
...
cudaFree(d_a);
cudaFree(d_b);
cudaFree(mys);
}
#endif
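In call_global the grid size is computed with plain integer division (imageLength / threadsPerBlock.x). That only covers the whole image because 528 happens to be a multiple of TILE_WIDTH (8); for other sizes the last partial tile would be dropped. A common alternative is to round up, as in this sketch (the helper name blocks_needed is made up for illustration). Rounding up launches a few threads past the edge, which is what the kernel's early-return bounds check is for, so that check has to reject an index equal to length or width, not only one greater than it.

```cpp
#include <cstddef>

// Hypothetical helper: number of blocks of `tile` threads needed to cover
// n elements, rounding up so a partial last block is still launched.
constexpr std::size_t blocks_needed(std::size_t n, std::size_t tile)
{
    return (n + tile - 1) / tile;
}
```

Inside call_global this could replace the two divisions, e.g. dim3 numBlocks(blocks_needed(imageLength, TILE_WIDTH), blocks_needed(imageWidth, TILE_WIDTH)), keeping the original's pairing of dimensions.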
Please note that I have removed __global__ from the .h file, because g++ (which also compiles that header) gave the following error:
In file included from my_code_main.cpp:12:0:
my_code_header.h:5:1: error: ‘__global__’ does not name a type
When I compile the .cu file with nvcc, everything is fine and it generates my_code_kernel.o. But since I am using C++11 in my .cpp file, I compile that one with g++, and I get the following error:
/tmp/ccR2rXzf.o: In function `main':
my_code_main.cpp:(.text+0x1c4): undefined reference to `call_global()'
collect2: ld returned 1 exit status
I understand that this might have nothing to do with CUDA as such and may just be a wrong use of including the header in both places. Also, what is the right way to compile and, most importantly, link my_code_kernel.o and (hopefully) my_code_main.o? Sorry if this question is too trivial!

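It looks like you are not linking with my_code_kernel.o. You used -c with your nvcc command, which makes it compile but not link (i.e. it only generates the .o file). I'm going to guess that you're not using -c with your g++ command; in that case you need to add my_code_kernel.o to the list of inputs alongside the .cpp file. Because g++ knows nothing about CUDA, you also have to link the CUDA runtime yourself with -lcudart (plus a -L option pointing at the CUDA library directory if it is not on the default search path).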
The separation you are trying to achieve is entirely possible; it just looks like you're not linking properly. If you still have problems, add the compilation commands to your question.
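Here is a sketch of how the main .cpp side could look once the split is in place, assuming you are free to change the interface: the C++11 random code stays in the file compiled by g++ -std=c++11, and the CUDA side only ever receives a plain pointer and a length. The seed parameter and the vector return type are illustrative choices, not part of the original code; handing the data to the .cu file would also need a hypothetical signature such as void call_global(const double *host_rand, size_t n) in my_code_header.h.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Replacement for the global rand_mat array (sketch): fills a vector with
// n standard-normal samples using the C++11 <random> facilities.
std::vector<double> create_std_norm(std::size_t n, unsigned seed)
{
    std::default_random_engine generator(seed);           // explicitly seeded engine
    std::normal_distribution<double> distribution(0.0, 1.0); // mean 0, stddev 1
    std::vector<double> values(n);
    for (std::size_t i = 0; i < n; ++i)
        values[i] = distribution(generator);
    return values;
}
```

Because the engine is seeded explicitly, two calls with the same seed return identical samples, which makes it easier to compare GPU results against a CPU run; values.data() and values.size() are what would be passed across to the CUDA code.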
FYI: You don't need to declare two_d_example() in your header file, it is only used within your .cu file (from call_global()).
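If you would rather keep the kernel declaration in the shared header, one common approach is to hide it from g++ behind the __CUDACC__ macro, which nvcc defines when it compiles CUDA source and g++ does not. This is a sketch; the kernel_decls_visible constant is hypothetical and exists only to make the effect observable, a real header does not need it.

```cpp
// my_code_header.h (sketch): one header that both g++ and nvcc can include.
#ifndef MY_CODE_HEADER_H
#define MY_CODE_HEADER_H

#include <cstddef>

// Host-callable entry point: plain C++, safe for g++.
void call_global();

#ifdef __CUDACC__
// Only nvcc defines __CUDACC__, so g++ never sees the __global__ keyword.
__global__ void two_d_example(double *a, double *b, double *my_result,
                              std::size_t length, std::size_t width);
constexpr bool kernel_decls_visible = true;
#else
constexpr bool kernel_decls_visible = false;
#endif

#endif // MY_CODE_HEADER_H
```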

Related

Shared library undefined reference for CUDA Kernel wrapper

So I am attempting to use the CUDA Runtime API with Go's cgo on Windows. I've been at this for a few days now and am stuck: I am getting an undefined reference to my kernel wrapper.
I have separated my kernel and its wrapper into the following files:
FILE: cGo.cuh
typedef unsigned long int ktype;
typedef unsigned char glob;
/*
function Prototypes
*/
extern "C" void kernel_kValid(int, int, ktype *, glob *);
__global__ void kValid(ktype *, glob *);
FILE: cGo.cu
#include "cGo.cuh"
#include "device_launch_parameters.h"
#include "cuda.h"
#include "cuda_runtime.h"
//function Definitions
/*
kernel_kValid is a wrapper function for the CUDA Kernel to be called from Go
*/
extern "C" void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values) {
kValid<<<blocks, threads>>>(kInfo, values);//execute the kernel
}
/*
kValid is the CUDA Kernel which is to be executed
*/
__global__ void kValid(ktype *kInfo, glob *values) {
//lots of code
}
I compile my CUDA source code into a shared library as such:
nvcc -shared -o myLib.so cGo.cu
then I have created a header file to include in my cgo
FILE: cGo.h
typedef unsigned long int ktype;
typedef unsigned char glob;
/*
function Declarations
*/
void kernel_kValid(int , int , ktype *, glob *);
Then from the go package I utilize cgo to call my kernel wrapper I have
package cuda
/*
#cgo LDFLAGS: -LC:/Storage/Cuda/lib/x64 -lcudart //this is the Cuda library
#cgo LDFLAGS: -L${SRCDIR}/lib -lmyLib //this is my shared library
#cgo CPPFLAGS: -IC:/Storage/Cuda/include //this contains cuda headers
#cgo CPPFLAGS: -I${SRCDIR}/include //this contains cGo.h
#include <cuda_runtime.h>
#include <stdlib.h>
#include "cGo.h"
*/
import "C"
import "unsafe"
func useKernel(){
//other code
C.kernel_kValid(C.int(B), C.int(T), (*C.ktype)(unsafe.Pointer(storageDevice)), (*C.glob)(unsafe.Pointer(globDevice)))
cudaErr, err = C.cudaDeviceSynchronize()
//rest of the code
}
None of the calls to the CUDA runtime API cause errors; only my kernel wrapper does. This is the output when I build the cuda package with go:
C:\Users\user\Documents\Repos\go\cuda_wrapper>go build cuda_wrapper\cuda
# cuda_wrapper/cuda
In file included from C:/Storage/Cuda/include/host_defines.h:50:0,
from C:/Storage/Cuda/include/device_types.h:53,
from C:/Storage/Cuda/include/builtin_types.h:56,
from C:/Storage/Cuda/include/cuda_runtime.h:86,
from C:\Go\workspace\src\cuda_wrapper\cuda\cuda.go:12:
C:/Storage/Cuda/include/crt/host_defines.h:84:0: warning: "__cdecl" redefined
#define __cdecl
<built-in>: note: this is the location of the previous definition
# cuda_wrapper/cuda
C:\Users\user\AppData\Local\Temp\go-build038297194\cuda_wrapper\cuda\_obj\cuda.cgo2.o: In function `_cgo_440ebb0a3e25_Cfunc_kernel_kValid':
/tmp/go-build\cuda_wrapper\cuda\_obj/cgo-gcc-prolog:306: undefined reference to `kernel_kValid'
collect2.exe: error: ld returned 1 exit status
This is where I'm not really sure what's wrong. I have looked at questions about undefined references with cgo, but nothing I have found solves my issue. I have also wondered whether the fact that the CUDA runtime API is written in C++ affects how cgo compiles this, but again I haven't found anything conclusive. At this point I think I have confused myself more than anything, so I'm hoping someone more knowledgeable can point me in the right direction.
Good catch on the name mangling.
Here's a solution we used for Gorgonia:
#include <math.h>
#ifdef __cplusplus
extern "C" {
#endif
__global__ void sigmoid32(float* A, int size)
{
int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
if (idx >= size) {
return;
}
A[idx] = 1 / (1 + powf((float)(M_E), (-1 * A[idx])));
}
#ifdef __cplusplus
}
#endif
So just wrap your kernel wrapper function in extern "C".
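Applied to a host-side wrapper, the same __cplusplus guard looks roughly like this. The header can be included unchanged by cgo (compiled as C, where __cplusplus is not defined) and by the .cu file (compiled as C++ by nvcc), and in both cases the wrapper gets an unmangled C symbol that the C linker used by cgo can find. The name add_on_device and its host-only body are placeholders so the sketch builds with an ordinary C++ compiler.

```cpp
#include <cassert>

// kernel_api.h (hypothetical): shared by the C side (cgo) and the C++ side (.cu).
#ifdef __cplusplus
extern "C" {
#endif

int add_on_device(int a, int b); // hypothetical wrapper name

#ifdef __cplusplus
}
#endif

// In a real project this definition would live in the .cu file, launch the
// kernel and copy the result back; here it just adds on the host.
extern "C" int add_on_device(int a, int b)
{
    return a + b;
}
```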

how to compile Cuda source with Go language's cgo?

I wrote a simple program in CUDA C and it works in Eclipse Nsight. This is the source code:
#include <iostream>
#include <stdio.h>
__global__ void add( int a,int b, int *c){
*c = a + b;
}
int main(void){
int c;
int *dev_c;
cudaMalloc((void**)&dev_c, sizeof(int));
add <<<1,1>>>(2,7,dev_c);
cudaMemcpy(&c, dev_c, sizeof(int),cudaMemcpyDeviceToHost);
printf("\n2+7= %d\n",c);
cudaFree(dev_c);
return 0;
}
Now I'm trying to use this code from Go with cgo.
So I wrote this new code:
package main
//#include "/usr/local/cuda-7.0/include/cuda.h"
//#include "/usr/local/cuda-7.0/include/cuda_runtime.h"
//#cgo LDFLAGS: -lcuda
//#cgo LDFLAGS: -lcurand
////default location:
//#cgo LDFLAGS: -L/usr/local/cuda-7.0/lib64 -L/usr/local/cuda-7.0/lib
//#cgo CFLAGS: -I/usr/local/cuda-7.0/include/
/*
#include <stdio.h>
__global__ void add( int a,int b, int *c){
*c = a + b;
}
int esegui_somma(void){
int c;
int *dev_c;
cudaMalloc((void**)&dev_c, sizeof(int));
add <<<1,1>>> (2,7,dev_c);
cudaMemcpy(&c, dev_c, sizeof(int),cudaMemcpyDeviceToHost);
cudaFree(dev_c);
return c;
}
*/
import "C"
import "fmt"
func main(){
fmt.Printf("the result is %d\n", C.esegui_somma())
}
But it doesn't work.
I read this error message:
cgo_cudabyexample_1/main.go:34:8: error: expected expression before '<' token
add <<<1,1>>> (2,7,dev_c);
^
I think I need to make cgo use the nvcc CUDA compiler instead of gcc.
How can I do it? Can I change CC environment variable?
I finally figured out how to do this. The biggest problem is that nvcc does not accept gcc's standard flags, and unlike clang it won't silently ignore them. cgo triggers the problem by adding a bunch of flags the user did not explicitly specify.
To make it all work, you'll need to separate out your device code and the functions that directly call it into separate files and compile/package them directly using nvcc into a shared library (.so). Then you'll use cgo to link this shared library using whatever default linker you have on your system. The only thing you'll have to add is -lcudart to your LDFLAGS (linker flags) to link the CUDA runtime.

How do I declare a user defined function in OMNet++?

I have declared a function in the C++ file as described in the documentation and called it in the .ned file, but I get the following error:
error:expected constructor, destructor, or type conversion before ‘(’ token Define_Function(dijkstra, 1);
The following is my c++ file.
#include <omnetpp.h>
#include "stdio.h"
#include "Node.h"
#include "cdelaychannel.h"
Define_Function(dijkstra, 1);
double dijkstra(double start = 1){
....
....
}
In my network description file, I've called the function.
package myproject;
@license(LGPL);
dijkstra(1.0);
Why is it giving me the error?
If you want to create a function for use in NED files, you have to do it as described in the OMNeT++ Manual. An example could be the following:
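// Returns the product of its two integer arguments; usable in NED expressions.
static cNEDValue ned_foo(cComponent *context, cNEDValue argv[], int argc)
{
int a = (long) argv[0];
int b = (long) argv[1];
return a*b;
}
// Registers ned_foo with the NED signature given as a string.
Define_NED_Function(ned_foo,"int ned_foo(int a, int b)");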

How to set up a C/C++ project safely (file organisation)

I have a large project that mixes C and C++. At some point it turned out that there are two C functions with identical names. They are defined in two different *.c files in different locations. At the highest level, the project is C++. This problem was asked and answered here.
However, the question of how to organize those files safely remains. How can I structure such a project so that there are no name conflicts and I can be sure the right function is called? Would writing a wrapper for each of those functions help?
This is how it looks at the moment:
A.h //first declaration of function F
A.c //first definition of function F
B.h //second declaration of function F
B.c //second definition of function F
Trying to do something like this:
extern "C"{
#include "A.h"
#include "B.h"
}
of course causes a name conflict. What can I do to avoid this conflict and have robust code? Would a solution like this help:
A_Wrapper.h: //c++
extern "C"{
#include "A.h"
}
void WrapF_A(int x)
{
F(x);
}
B_Wrapper.h: //C++
extern "C"{
#include "B.h"
}
void WrapF_B(int x)
{
F(x);
}
and then in the program:
#include "A_Wrapper.h"
#include "B_Wrapper.h"
Modifying every file in the project is practically impossible, as it consists of hundreds of files, and I would probably break some code. Is there a way to make an include file visible only in some part of the program?
EDIT:
So I created a simple project illustrating the problem and tried to apply the hints given by doctorlove. However, I still get a "multiple definition of F" error. What should I change? Project files:
A.h:
#ifndef A_H_INCLUDED
#define A_H_INCLUDED
int F(int x);
#endif // A_H_INCLUDED
A.c
#include "A.h"
int F(int x)
{
return x*x;
}
AWrapper.h:
#ifndef AWRAPPER_H_INCLUDED
#define AWRAPPER_H_INCLUDED
int AF(int x);
#endif // AWRAPPER_H_INCLUDED
AW.cpp:
#include "AWrapper.h"
extern "C"{
#include "A.h"
}
int AF(int x)
{
return F(x);
}
B.h:
#ifndef B_H_INCLUDED
#define B_H_INCLUDED
int F(int x);
#endif // B_H_INCLUDED
B.c:
#include "B.h"
int F(int x)
{
return -x*x;
}
BWrapper.h:
#ifndef BWRAPPER_H_INCLUDED
#define BWRAPPER_H_INCLUDED
int BF(int x);
#endif // BWRAPPER_H_INCLUDED
BW.cpp:
#include "BWrapper.h"
extern "C"{
#include "B.h"
}
int BF(int x)
{
return F(x);
}
Go with your wrapper idea, but write a facade (see also here) that exposes only what you need from A and what you need from B, not all the functions in them.
You will end up with something like
//header Wrap_A.h
#ifndef WRAP_A_INCLUDED
#define WRAP_A_INCLUDED
//for some input Data left as an exercise for the reader...
double solve_with_A(Data data);
#endif
//header Wrap_B.h
#ifndef WRAP_B_INCLUDED
#define WRAP_B_INCLUDED
//for some input Data...
double solve_with_B(Data data);
#endif
Then make two .cpp files that include the conflicting header files, those from A in A.cpp and those from B in B.cpp, so the conflicts don't happen. The solve_with_A and solve_with_B functions will then call everything they need without leaking it to the whole program and causing conflicts.
You might have to give some thought to what Data will actually be. You could define your own types, one for A and one for B. Just avoid exposing the implementation details in your wrapping/facade headers.
If headers are causing you pain, firewall them off in the naughty corner.
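A minimal sketch of the A side of such a facade, squeezed into one file for illustration: the rest of the program would include only the facade header and never see A.h. InputA is a hypothetical stand-in for the Data type, and the inline F stands in for the untouched A.c. Note that this only controls which declarations leak into the rest of the program; it does not by itself stop two external definitions of F from clashing at link time.

```cpp
#include <cassert>

// ---- Wrap_A.h (hypothetical facade header) ----
// The rest of the program sees only this type and this function.
struct InputA { int x; };
double solve_with_A(const InputA& data);

// ---- Wrap_A.cpp: the only translation unit that includes A.h ----
extern "C" {
// Stand-in for A.h/A.c so the sketch builds on its own; in the real
// project F comes from the unmodified C sources.
int F(int x) { return x * x; }
}

double solve_with_A(const InputA& data)
{
    return F(data.x); // calls A's F without exposing it to callers
}
```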
EDIT
Given that you have two functions named F, if you put all the sources into one project, the linker should, and will, complain that it can see both. Instead, you need to make two static libraries and expose only the wrapped versions to your main project.

Cuda Project Not Compiling

I compiled my CUDA project with Visual Studio 2010 and encountered this error:
student_func.cu(65): error C2059: syntax error : '<'
The line where error occurs is when the kernel function is called:
rgba_to_greyscale<<< gridSize, blockSize >>>(d_rgbaImage, d_greyImage, numRows, numCols);
and here is the code for student_func.cu:
#include "reference_calc.cpp"
#include "utils.h"
#include <stdio.h>
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{
}
void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
//You must fill in the correct sizes for the blockSize and gridSize
//currently only one block with one thread is being launched
const dim3 blockSize(1, 1, 1); //TODO
const dim3 gridSize( 1, 1, 1); //TODO
rgba_to_greyscale<<< gridSize, blockSize >>>(d_rgbaImage, d_greyImage, numRows, numCols);
cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
First, please have a look at this guide on how to integrate CUDA into a Visual Studio C++ project.
Also, you should organize the code so that:
.h, .cpp, .c, .hpp files should not contain CUDA code (like __device__ functions and the kernel call in your case). However, in these files you can call CUDA APIs (for example, cudaMalloc, cudaMemcpy, etc.). These files are compiled by a compiler other than NVCC.
.cuh, .cu files should contain the CUDA code. These files are compiled by NVCC.
As an example, suppose you have a GPU-based FDTD code. I usually do the following (Visual Studio 2010):
main.cpp file, including CPU-GPU memory transfers;
FDTD.cu, including an extern "C" void E_update(...) function which contains the kernel <<< >>> call;
main.h file, including the extern "C" void E_update(...) prototype;
FDTD.cuh, including the __global__ void E_update_kernel(...) function.

Resources