Type mismatch in CUDA when invoking Kernel


I'm trying to port some code onto a GPU using CUDA 9.0.
I ran into a problem: the compiler appears to expect a different parameter type at the kernel launch than the one declared in the kernel definition.
I have boiled the problem down to the following lines, which I hope are enough to expose the source of the error.
I definitely do not have a second kernel with the same or a similar name, all streams are defined, and for testing purposes I commented out the inner implementation of the kernel.
Real is a typedef that is set to float here. As a trial I replaced Real with float, which gives the same result.
// Kernel definition
__global__ void doStuff(Real *masses)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    // no inner implementation, yet
}
// prepare the loop
for(...)
{
    Real *masses, *d_masses;
    masses = getMasses();
    cudaMalloc(&d_masses, numActiveParticles * sizeof(Real));
    cudaMemcpyAsync(d_masses, masses, numActiveParticles * sizeof(Real), cudaMemcpyHostToDevice, dataStream1);
    cudaStreamSynchronize(dataStream1);
    doStuff<<<256, 256, 0, executionStream>>>(d_masses);
    // ....
}
The error message that I am getting now is:
error: argument of type "Real *" is incompatible with parameter of type
"unsigned int"
and when I replace everything with float:
error: argument of type "float *" is incompatible with parameter of type
"unsigned int"
Any help would be much appreciated. Thank you all in advance.

Update:
I found the error. My class inherits from another class that has a member function with the same name as the kernel. Instead of invoking the kernel, the call always tried to invoke the parent class's member function.
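The name-hiding effect described in the update can be reproduced in plain C++ without CUDA. The sketch below is only an illustration: Base, Solver, callQualified and callMember are hypothetical names, and an ordinary free function stands in for the kernel. Inside a member function, unqualified lookup finds the inherited member first and stops there, so the namespace-scope function is never considered.

```cpp
#include <string>

// Stand-in for the free kernel: takes a pointer, like doStuff(Real *masses).
std::string doStuff(float *masses) { (void)masses; return "free function"; }

// Hypothetical base class with a member that happens to share the name.
struct Base {
    std::string doStuff(unsigned int count) { (void)count; return "Base member"; }
};

struct Solver : Base {
    std::string callQualified(float *masses) {
        // An unqualified doStuff(masses) would not compile here: lookup stops
        // at Base::doStuff(unsigned int), and float* does not convert to
        // unsigned int.
        return ::doStuff(masses);  // leading :: selects the namespace-scope function
    }
    std::string callMember(unsigned int count) {
        return doStuff(count);     // unqualified name resolves to Base::doStuff
    }
};
```

Under the same assumption, qualifying the kernel name at the launch site (for example ::doStuff<<<...>>>(d_masses)) or renaming one of the two functions should avoid the clash.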

Related

Alea GPU for loop cannot get field

I am just starting with Alea GPU and I am curious how to access other types and references inside a GPU parallel For. When I do the following, I get a runtime error that states "Cannot get field random. Possible reasons: 1) Static field is not supported.2) The field type is not supported. 3) In closure class, the field doesn't have [GpuParam] attribute."
This error makes sense, but I am not sure what the correct implementation would be.
[GpuManaged]
public void InitPoints()
{
    var gp = Gpu.Default;
    gp.For(1, (10), (i) =>
    {
        int pointStart = random.Next(totalPoints) + 1;
        Pt point = new Pt(pointStart, ptAt[i]);
        point.Process();
    });
}

You are trying to call System.Random.Next. This is .NET library code and cannot be compiled for the GPU: there is no MSIL behind that function that could be accessed and compiled to run on the GPU. In addition, System.Random.Next is a random number generator implemented for serial applications. You should use the parallel random number generators provided in cuRand, which are also exposed in Alea GPU.

Lambda expression in c++, OS X's clang vs GCC

A particular property of C++ lambda expressions is that they can capture variables from the scope in which they are declared. For example, I can use a declared and initialized variable c inside a lambda even if 'c' is not passed as an argument, as long as it is named in the capture list '[ ]':
#include <iostream>
int main()
{
    int c = 5;
    [c](int d) { std::cout << c + d << '\n'; }(5);
}
The expected output is thus 10. The problem arises when at least 2 variables, one captured and the other sent as an argument, have the same name:
#include <iostream>
int main()
{
    int c = 5;
    [c](int c) { std::cout << c << '\n'; }(3);
}
I think the C++11 standard says that the captured variable takes precedence over the lambda's parameters when the names coincide. Indeed, compiling the code with GCC 4.8.1 on Linux gives the output I expected, 5. If I compile the same code with Apple's version of the clang compiler (clang-503.0.40, the one that comes with Xcode 5.1.1 on Mac OS X 10.9.4), I get the other answer, 3.
I'm trying to figure out why this happens: is it just a bug in Apple's compiler (if the standard really says that the captured 'c' takes precedence), or something similar? Can this issue be fixed?
EDIT
My teacher sent an email to the GCC help list, and they answered that this is clearly a bug in GCC and that it should be reported to Bugzilla. So Clang's behavior is the correct one!

From my understanding of the c++11 standard's points below:
5.1.2 Lambda expressions
3 The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called
the closure type — whose properties are described below.
...
5 The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are
described by the lambda-expression’s parameter-declaration-clause and
trailing-return-type respectively. This function call operator is
declared const (9.3.1) if and only if the lambda-expression’s
parameter-declaration-clause is not followed by mutable.
...
14 For each entity captured by copy, an unnamed non static data member is declared in the closure type
A lambda expression like this...
int c = 5;
[c](int c){ std::cout << c << '\n'; }
...is roughly equivalent to a class/struct like this:
struct lambda
{
    int c; // captured c
    void operator()(int c) const
    {
        std::cout << c << '\n';
    }
};
So I would expect the parameter to hide the captured member.
EDIT:
In point 14 of the standard (quoted above), it would seem that the data member created from the captured variable is unnamed. The mechanism by which it is referenced appears to be independent of normal identifier lookup:
17 Every id-expression that is an odr-use (3.2) of an entity captured by copy is transformed into an access to the corresponding unnamed data member of the closure type.
It is unclear from my reading of the standard if this transformation should take precedence over parameter symbol lookup.
So perhaps this should be marked as UB (undefined behaviour)?

From the C++11 Standard, 5.1.2 "Lambda expressions" [expr.prim.lambda] #7:
The lambda-expression’s compound-statement yields the function-body (8.4) of the function call operator,
but for purposes of name lookup (3.4), determining the type and value of this (9.3.2) and transforming id-expressions
referring to non-static class members into class member access expressions using (*this) (9.3.1),
the compound-statement is considered in the context of the lambda-expression.
Also, from 3.3.3 "Block scope" [basic.scope.local] #2:
The potential scope of a function parameter name (including one appearing in a lambda-declarator) or of
a function-local predefined variable in a function definition (8.4) begins at its point of declaration.
Names in a capture list are not declarations and therefore do not affect name lookup. The capture list just allows you to use the local variables; it does not introduce their names into the lambda's scope. Example:
int i, j;
int main()
{
    int i = 0;
    [](){ i; }; // Error: Odr-uses non-static local variable without capturing it
    [](){ j; }; // OK
}
So, since the parameters to a lambda are in an inner block scope, and since name lookup is done in the context of the lambda expression (not, say, the generated class), the parameter names indeed hide the variable names in the enclosing function.
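The hiding rule can be checked with a small sketch. Closure is a hand-written, hypothetical stand-in for the closure type of [c](int c){ ... }, and callWithShadowingParameter is an illustrative helper. The lambda here deliberately has no capture: as far as I know, later revisions of the language make it an error to name the same identifier in both the capture list and the parameter list, so newer compilers may reject the original snippet outright.

```cpp
// Hand-written stand-in for the closure type of [c](int c){ ... }.
struct Closure {
    int c;                          // plays the role of the captured copy
    int operator()(int c) const {   // parameter with the same name as the member
        return c;                   // refers to the parameter, not the member
    }
    int captured() const { return this->c; }  // the member is still reachable
};

int callWithShadowingParameter() {
    int c = 5;
    (void)c;                            // silence the unused-variable warning
    auto f = [](int c) { return c; };   // the parameter hides the outer c
    return f(3);
}
```

In both cases the argument value is what the body sees, which matches Clang's output of 3 for the original example.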

Can I know data type from variable name in GCC?

I want to find out a data type from a variable name.
My final goal is to obtain a function signature so that I can generate a function stub (skeleton code),
but the GCC error message only reports the name of the undefined function.
Can I see a symbol table, so that I can infer the function signature?
For example, foo.c looks like this:
#include <stdio.h>
int main() {
    int n = 0;
    n = foo();
    return 0;
}
I want to make a function stub,
so I want to know that function foo takes no parameters and returns an integer value.
What should I do?
My idea is as follows:
the linker error message says that function foo is undefined
read line 5
n = foo();
inspect the type of n using the symbol table
Is that right?
Sorry for my bad English.
Please teach me how to infer a function signature.

Inject this code into your source file:
typedef struct { int a; char c; } badtype_t;
badtype_t badtype;
then replace the error line like this:
n = badtype; //foo();
or if you want the type foo returns:
badtype = foo();
then you will get some error like this:
incompatible types when initializing type ‘int’ using type ‘badtype_t’
and you can get the type int.
or if you want the type of foo itself:
foo * 2
then you will get some error like this:
invalid operands to binary * (have 'int (*)()' and 'int')
and you can get the type int (*)() (that is, function taking nothing and returning an int).

It seems ok, but this strategy will not be good enough. Using the left-hand side of an expression is not enough to determine the return-type of the function. In particular, there may be no left-hand side at all, simply: foo();. What then?

If you just want to see a symbol table, that's what nm is for.
For example, if you get an error linking foo.o and bar.o together, you can do this:
nm -a foo.o
That will show you all the symbols defined in module foo.
But I don't see why you think this would help. C symbols do not carry any type information. There may be enough metadata to distinguish extern linkage, and/or to tell whether a symbol is a function or data, but that's it. There is no way to tell an int from a float, or a function taking two ints and returning a double from a function taking a char * and returning a different char *.

So, you have some function named foo defined somewhere, and you want to know what its type is.
If you don't actually have a prototype for foo somewhere in your #included header files, this is easy:
If you're using C99, your code is invalid.
Otherwise, foo must take no arguments and return int, or your code is invalid.
And this isn't one of those "technically invalid, but it works on every platform" cases; it will break. For example, with gcc 4.2 for 64-bit x86 linux or Mac, if you do this:
double foo(double f) { return f*2; }
Then, without a header file, call it like this:
double f = foo(2.0);
printf("%f\n", f);
If compiled as C89, this will compile and link just fine (clang or gcc 4.8 will give you a warning; gcc 4.2 won't even do that by default), and run, and print out 2.0. At least on x86_64; on ARM7, you'll corrupt the stack, and segfault if you're lucky. (Of course it actually does double something, either your 2.0 or some random uninitialized value, but it can't return that to you; it has stashed the result in an arbitrary floating-point register that the caller doesn't know to read.)
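For contrast, here is a minimal sketch of the same call with a prototype visible before use. It is written as C++ (where the declaration is mandatory anyway), and callFoo is a hypothetical wrapper added only for illustration. With the prototype in scope, the compiler knows that both the argument and the result are doubles.

```cpp
// Prototype visible before the call: the compiler knows that the argument
// and the return value are doubles and passes them accordingly.
double foo(double f);

// Hypothetical caller, standing in for the code that lacked the header.
double callFoo() {
    return foo(2.0);
}

// Definition, which in the scenario above would live in another file.
double foo(double f) { return f * 2; }
```

The fix in the C case is the same idea: put the declaration double foo(double f); in a header and include it wherever foo is called.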
If it is in a header file, you can always search for it. emacs, graphical IDEs, etc. are very good at this. But you can use the compiler to help you out, in two ways.
First, just do this:
gcc -E main.c > main.i
less main.i
Now search for /foo, and you'll find it.
Or you can trick the compiler into giving you an error message, as in perreal's answer.

Add a mathematical operation to standard TCL ones

As you know, Tcl has some mathematical functions such as sin, cos, and hypot that are called inside the expr command with parentheses, as follows:
puts [expr sin(1.57)]
How can I add a function, using the Tcl library functions, so that it is called in exactly the same way and does whatever a certain proc defines?
I would like to clarify my question. Say there is a proc (as a string) as follows:
proc add { a b } { return [expr $a+$b] } ;# string of a proc
I also have a Tcl interpreter in my C++ code. I want to take the string of the proc and, at runtime, register a function called add in the tcl::mathfunc namespace (I guess I should use Tcl_CreateObjCommand) so that I can call the following:
puts [expr add(1.57, 1.43)]
How can this be done? Could you please write a simple example? I could not find any example in the Tcl documentation or in books that describes the usage of this command.

Creating a function from C isn't too hard. To do it, you've got to write an implementation of a command that will perform the operation, and register that implementation as a command in the correct namespace. (In 8.4 and before, functions were done with a separate interface that was quite a bit nastier to use; the mechanism was wholly overhauled in 8.5.)
Command Implementation
Note that the signature is defined, and the ignored parameter is not used here. (It's really a void * — great when you're wanting to do things like binding a command to an object — but it simply isn't needed for doing an addition.)
static int AddCmd(ClientData ignored, Tcl_Interp *interp, int objc,
        Tcl_Obj *const objv[]) {
    double x, y, sum;
    /* First, check number of arguments: command name is objv[0] always */
    if (objc != 3) {
        Tcl_WrongNumArgs(interp, 1, objv, "x y");
        return TCL_ERROR;
    }
    /* Get our arguments as doubles */
    if (Tcl_GetDoubleFromObj(interp, objv[1], &x) != TCL_OK ||
            Tcl_GetDoubleFromObj(interp, objv[2], &y) != TCL_OK) {
        return TCL_ERROR;
    }
    /* Do the real operation */
    sum = x + y;
    /* Pass the result out */
    Tcl_SetObjResult(interp, Tcl_NewDoubleObj(sum));
    return TCL_OK;
}
Don't worry about the fact that it's allocating a value here; Tcl's got a very high performance custom memory manager that makes that a cheap operation.
Command Registration
This is usually done inside an initialization function that is registered as part of a Tcl package definition or that is called as part of the initialization of the overall application. You can also do it directly if you are calling Tcl_CreateInterp manually. Which you do depends on exactly how you are integrating with Tcl, and that's quite a large topic of its own. So I'll show how to create an initialization function; that's usually a good start in all scenarios.
int Add_Init(Tcl_Interp *interp) {
    /* Use the fully-qualified name */
    Tcl_CreateObjCommand(interp, "::tcl::mathfunc::add", AddCmd, NULL, NULL);
    return TCL_OK;
}
The first NULL is the value that gets passed through as the first (ClientData) parameter to the implementation. The second is a callback to dispose of the ClientData (or NULL if it needs no action, as here).
Doing all this from C++ is also quite practical, but remember that Tcl is a C library, so the command implementations have to be functions (not methods, unless you go through an adapter) and they need C linkage.
To get the body of a procedure from C (or C++), by far the easiest mechanism is to use Tcl_Eval to run a simple script that calls info body theCmdName. Procedure implementations are very complex indeed, so the interface to them is purely at the script level (unless you actually entangle yourself far more with Tcl than is really wise).
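Since the question starts from a proc held as a string, a script-level route may be enough. The sketch below assumes Tcl 8.5 or later (where commands in ::tcl::mathfunc are visible to expr) and an interpreter created elsewhere with Tcl_CreateInterp. RegisterAddFromScript and EvalExpr are hypothetical helper names, not part of the Tcl API; only Tcl_Eval and Tcl_GetStringResult are library calls.

```cpp
#include <tcl.h>
#include <string>

// Hypothetical helper: defines the proc directly inside ::tcl::mathfunc,
// which makes it callable as add(...) from expr (assumes Tcl 8.5+).
int RegisterAddFromScript(Tcl_Interp *interp) {
    const char *script =
        "proc ::tcl::mathfunc::add {a b} { expr {$a + $b} }";
    return Tcl_Eval(interp, script);  // TCL_OK on success
}

// Hypothetical helper: evaluates an expression such as "add(1.57, 1.43)"
// and returns the interpreter's result (or the error message) as text.
std::string EvalExpr(Tcl_Interp *interp, const std::string &expression) {
    const std::string script = "expr {" + expression + "}";
    if (Tcl_Eval(interp, script.c_str()) != TCL_OK) {
        return std::string("error: ") + Tcl_GetStringResult(interp);
    }
    return Tcl_GetStringResult(interp);
}
```

A caller would run RegisterAddFromScript(interp) once and then EvalExpr(interp, "add(1.57, 1.43)"). The C implementation shown earlier is still the better fit when the function body should run as compiled code rather than as a script.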

System.AccessViolationException storing a variable with Reflection.Emit

I'm building a compiler with Reflection.Emit in my spare time, and I've run into a problem that I don't understand.
For some context: I have a runtime with a couple of types, and one of them is Float2, a simple vector struct with two float values (X and Y). I've added a couple of properties that let me swizzle the values (à la HLSL). For example, given new Float2(1.0f, 2.0f), evaluating (new Float2(1.0f, 2.0f)).YX yields Float2(2.0f, 1.0f).
I'm using this type in my language and currently testing this case (minor details of the language omitted):
float2 a = float2(1.0, 2.0).yx;
return a;
I translate float2(1.0, 2.0) into a constructor call, and .yx into an access to the YX property of my Float2 type.
The problem is that I'm getting a "System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.". I don't understand why, because if I write something like this:
float2 a = float2(1.0, 2.0);
return a;
Everything goes well.
The IL code that I'm generating is the following (I think the problem occurs at "L_0014: stloc.0", though I don't know why):
.method public virtual final instance valuetype
[Bifrost.Psl]Bifrost.Psl.Compiler.Runtime.Float2 Main() cil managed
{
.maxstack 3
.locals init (
[0] valuetype [Bifrost.Psl]Bifrost.Psl.Compiler.Runtime.Float2 num)
L_0000: ldc.r4 1
L_0005: ldc.r4 2
L_000a: newobj instance void [Bifrost.Psl]Bifrost.Psl.Compiler.Runtime.Float2::.ctor(float32, float32)
L_000f: call instance valuetype [Bifrost.Psl]Bifrost.Psl.Compiler.Runtime.Float2 [Bifrost.Psl]Bifrost.Psl.Compiler.Runtime.Float2::get_XY()
L_0014: stloc.0
L_0015: ldloc.0
L_0016: ret
}
Result of peverify:
[IL]: Error: [offset 0x0000000F]
[found value 'Bifrost.Psl.Compiler.Runtime.Float2'][expected address of value 'Bifrost.Psl.Compiler.Runtime.Float2'] Unexpected type on the stack.

The IL looks OK, although I don't know what your Float2 looks like.
I found the best way to debug this is to save the assembly to disk, then run peverify. Any code that generates an AccessViolationException will cause an error in peverify.
Edit: The newobj doc on MSDN talks about pushing an object reference onto the stack, which I took to be a pointer to a value type. If you're getting this error from peverify then I think you need to:
1. newobj
2. stloc to a temporary variable
3. ldloca to get the address of the value type stored in the temporary variable
4. call
Now that I think about it, this is what the C# compiler does if you do a direct call on a value type like 4.ToString();.
