Using cuBLAS in OpenACC - cublas

I want to replace the call to cblas_dgemm() with cublasDgemm(). Here is the original wrapper from the Shark machine learning library:
inline void gemm(
CBLAS_ORDER const Order, CBLAS_TRANSPOSE TransA, CBLAS_TRANSPOSE TransB,
int M, int N, int K,
double alpha, double const *A, int lda,
double const *B, int ldb,
double beta, double *C, int ldc
){
cblas_dgemm(
Order, TransA, TransB,
M, N, K,
alpha,
A, lda,
B, ldb,
beta,
C, ldc
);
}
And here is the modified code using OpenACC pragmas:
inline void gemm(
CBLAS_ORDER const Order, CBLAS_TRANSPOSE TransA, CBLAS_TRANSPOSE TransB,
int M, int N, int K,
double alpha, double const *A, int lda,
double const *B, int ldb,
double beta, double *C, int ldc
){
#ifdef _OPENACC
cublasOperation_t OpT_A, OpT_B;
switch (TransA)
{
case CblasNoTrans:
OpT_A = CUBLAS_OP_N;
break;
case CblasTrans:
OpT_A = CUBLAS_OP_T;
break;
case CblasConjTrans:
OpT_A = CUBLAS_OP_C;
break;
default:
OpT_A = CUBLAS_OP_N;
}
switch (TransB)
{
case CblasNoTrans:
OpT_B = CUBLAS_OP_N;
break;
case CblasTrans:
OpT_B = CUBLAS_OP_T;
break;
case CblasConjTrans:
OpT_B = CUBLAS_OP_C;
break;
default:
OpT_B = CUBLAS_OP_N;
}
cublasHandle_t handle;
#pragma acc data copyin(OpT_A, OpT_B, M, N, K, alpha, A[0:M][0:K], lda, B[0:K][0:N], ldb, beta, ldc) copy(C[0:M][0:N])
{
#pragma acc host_data use_device(handle,OpT_A, OpT_B, A, B, C, M, N, K, lda, ldb, ldc, alpha, beta)
{
cublasDgemm(handle,OpT_A,OpT_B,M,N,K,&alpha,A,lda,B,ldb,&beta,C,ldc);
}
}
#else
cblas_dgemm(
Order, TransA, TransB,
M, N, K,
alpha,
A, lda,
B, ldb,
beta,
C, ldc
);
#endif
}
The problem is that when I compile the code with the OpenACC flag, the elements of the result matrix C are all zeros, both before and after the kernel execution. I am not sure what I am missing here.
I would appreciate any help.

It looks like you've got the basic structure right. You don't need any of the scalar variables on the data or host_data directives, though: that is, the Op*, M, N, K, and ld* variables. I think this is likely your issue, because cublasDgemm is a host function that reads those arguments on the host in order to launch the kernel. Only the arrays A, B, and C need to be present on the device.

Related

std::array in template specialization

I want to write a function that takes n ints as the coordinates into an array, together with the maximum value for each coordinate. The function linearizes these parameters to target a specific index.
int my_func(int x, int y, int XMAX, int YMAX){
return x + y*XMAX;
}
Here is a 2D example, but I can make it generic quite easily with variadic templates.
However, I am stuck when I want to write the same function without passing the maximum value for each coordinate as a function parameter.
Something like this:
template<int XMAX, int YMAX>
int my_func(int x, int y){
return x + y*XMAX;
}
Here it works in 2D, but I want to generalize it from 1 to N dimensions and I don't know how to achieve that.
I was thinking of passing an int N, which is the number of dimensions, and a std::array<size_t, N>::iterator, which is an iterator into the std::array containing the actual maximum values, but it does not compile.
Here is the code:
template <int N, std::array<size_t, N>::iterator it>
void foo(){...}
It says ’std::array<long unsigned int, N>::iterator’ is not a type.
If I just pass the std::array, I get the following error: ’struct std::array<long unsigned int, N>’ is not a valid type for a template non-type parameter
Does anyone have an idea how to solve this problem? I am using C++11 (G++ 5.4.0).
First of all, I suppose you made a small mistake in your function, because to linearize access to the array you need to multiply y by XMAX
int my_func(int x, int y, int XMAX, int YMAX){
return x + y*XMAX;
}
because each row is composed of XMAX items. To answer your question, I used a template parameter pack
#include <cassert>
template <int N>
int my_func(int x)
{
assert(x < N);
return x;
}
template <int N, int... Ns, typename ARG, typename... ARGS>
ARG my_func (ARG x, ARGS... args)
{
assert(x < N);
return x + N*my_func<Ns...>(args...);
}
int main()
{
int a = 1;
int b = 2;
int c = my_func<10, 3>(a, b);
}
The first function is the base case of the recursion. The second function uses two parameter packs, plus two explicit template parameters (N and ARG) that make the recursion possible.
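A variant of the same recursion can be sketched so that it does not rely on overload ranking between the base case and the recursive case. The name linear_index is hypothetical, coordinates are assumed to be std::size_t, and the recursive overload names two extents explicitly so that it is only viable while at least two dimensions remain. It is a sketch under those assumptions and stays within C++11.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, base case: one dimension left,
// so the linear index is the coordinate itself.
template <std::size_t Max>
std::size_t linear_index(std::size_t x)
{
    assert(x < Max);
    return x;
}

// Recursive case: at least two extents (Max and Next) are still listed.
// The first coordinate contributes x, the remaining ones are scaled by Max.
template <std::size_t Max, std::size_t Next, std::size_t... Rest,
          typename... Coords>
std::size_t linear_index(std::size_t x, Coords... rest)
{
    static_assert(sizeof...(Rest) + 1 == sizeof...(Coords),
                  "one coordinate per extent is required");
    assert(x < Max);
    return x + Max * linear_index<Next, Rest...>(rest...);
}
```

With extents 10 and 3, a call such as linear_index<10, 3>(1, 2) evaluates 1 + 10*2, the same value as the 2D function in the question.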

Why does this partial specialization of a template friend function work?

I am writing simple classes that implement vectors and matrices as part of trying to learn OpenGL. I have matrix and vector classes that look like this:
// Forward declarations
template <typename T, size_t N/*len*/> struct vec;
template<typename T, size_t N /*rows*/, size_t M /*cols*/> struct mat;
// Forward declare operator* for matrices
// (NxM) matrix multiplied by (MxP) matrix yields (NxP) matrix
template <typename T, size_t N, size_t M, size_t P>
mat<T, N, P> operator* (const mat<T, N, M>& A, const mat<T, M, P>& B);
template <typename T, size_t N>
struct vec {
public:
vec() {}
virtual ~vec() {}
private:
T m_data[N];
};
template <typename T, size_t N, size_t M>
struct mat {
public:
mat() {}
virtual ~mat() {}
// This is where it gets interesting. By my reading of the rules
// of C++11, this counts as a partial specialization of the
// operator template, and should not work.
// However, it compiles just fine!
template <size_t n, size_t m, size_t p>
friend mat<T, n, p> operator* (const mat<T, n, m>& A,
const mat<T, m, p> &B);
// Implementation appears later in the same header file.
private:
T m_data[N*M];
};
I declare the * operator as a friend because I want it to have access to the internal m_data member, but I don't want the users of 'mat' and 'vec' to know the internals.
This compiles and runs just fine. I have a unit test for the matrix multiplication, and it passes. However, I don't know why it even compiles, let alone runs. By my reading of the rules of C++ templates, the declaration of the * operator counts as a partial specialization of a function template, and is illegal.
What am I missing here?
Turns out this does *NOT* compile. I thought it was compiling because I wasn't invoking the template stream operator in my unit test when I thought I was.
Sorry for the stupid question!
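For readers who land here with the same goal (an operator* that can read m_data without exposing it), one common pattern is to befriend every specialization of the operator template by repeating its full template header in the friend declaration. The sketch below assumes row-major storage, drops the virtual destructors, and adds hypothetical at() accessors purely so the example can be exercised; none of that comes from the original classes.

```cpp
#include <cassert>
#include <cstddef>

template <typename T, std::size_t N, std::size_t M> struct mat;

// Primary operator template: element type and all dimensions are parameters.
template <typename T, std::size_t N, std::size_t M, std::size_t P>
mat<T, N, P> operator*(const mat<T, N, M>& A, const mat<T, M, P>& B);

template <typename T, std::size_t N, std::size_t M>
struct mat {
    mat() : m_data() {}  // value-initializes (zeroes) every element

    // Hypothetical accessors, assumed here only to fill and inspect matrices.
    T& at(std::size_t r, std::size_t c) { return m_data[r * M + c]; }
    const T& at(std::size_t r, std::size_t c) const { return m_data[r * M + c]; }

    // Friend declaration of the whole template, not a specialization of it.
    // The parameter names differ from T, N, M to avoid shadowing them.
    template <typename U, std::size_t n, std::size_t m, std::size_t p>
    friend mat<U, n, p> operator*(const mat<U, n, m>& A, const mat<U, m, p>& B);

private:
    T m_data[N * M];  // row-major storage (an assumption of this sketch)
};

template <typename T, std::size_t N, std::size_t M, std::size_t P>
mat<T, N, P> operator*(const mat<T, N, M>& A, const mat<T, M, P>& B)
{
    mat<T, N, P> C;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < P; ++j)
            for (std::size_t k = 0; k < M; ++k)
                C.m_data[i * P + j] += A.m_data[i * M + k] * B.m_data[k * P + j];
    return C;
}
```

Because the friend declaration names the template itself, the operator can touch the private data of all three matrix types involved (NxM, MxP, and NxP), which a friend restricted to a single mat<T, N, M> could not.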

GCC and clang: subexpression not valid in a constant expression

Consider the following code:
template<int... V>
static constexpr int f(int v) {
int a[] = { (v ^= V, 0)... }; // Line 3
return v;
}
static constexpr int i = f<0x00>(0x11); // Line 7
int main() { }
It compiles with GCC and fails to compile with clang with the following error:
main.cpp:7:22: error: constexpr variable 'i' must be initialized by a constant expression
[...]
main.cpp:3:23: note: subexpression not valid in a constant expression
Note that it doesn't depend on the fact that I'm using a function template.
In other words, neither the code above nor the code below compiles with clang:
static constexpr int mix(int v, int u) {
int a[] = { (v ^= u, 0) };
return v;
}
static constexpr int mf = mix(0x11, 0x00);
int main() { }
Which compiler is right?
As mentioned here, this is a bug in clang:
the bug is something odd about the left-hand side of a comma operator
Confirmed and fixed.

Does C++11 allow a better way to repeat the same action on similar data without non-anonymous helper functions?

(I really can't think of a better title, please help if you can)
It's not unusual to have this kind of repetitive code... a deliberately simple example:
vector<int> x,y,z;
vector<string> a,b,c;
for(auto i : x)
if(test(i))
a.push_back(func(i));
for(auto i : y)
if(test(i))
b.push_back(func(i));
for(auto i : z)
if(test(i))
c.push_back(func(i));
You could of course write a function Convert(const vector<int> &in, vector<string> &out). However, I was curious whether there's a way to use anonymous constructs or function-local definitions to avoid this.
Something like:
vector<int> x,y,z;
vector<string> a,b,c;
for(auto o : {{x,a},{y,b},{z,c}})
...
What would be the neatest way to do this - and does C++11 provide any additional help beyond a 'classic' C++/STL approach?
Note: the point of this question is how to avoid repeating code, not how specifically to transform the data between the vectors - that's just a trivial example.
The classic STL way is to use std::transform, and you can wrap that in a lambda if you want to:
auto map = [](const vector<int>& in, vector<string>& out, string (*f)(int)) {
out.reserve(in.size());
std::transform(in.begin(), in.end(), std::back_inserter(out), f);
};
vector<int> x,y,z;
vector<string> a,b,c;
map(x, a, func);
map(y, b, func);
map(z, c, func);
In C++14 the lambda is easier to write:
auto map = [](const auto& in, auto& out, auto f) {
out.reserve(in.size());
std::transform(in.begin(), in.end(), std::back_inserter(out), f);
};
To perform the different operation in the edited question, just change the body of the lambda. You still avoid repeating the logic: you define it once, then apply it to each pair of vectors:
auto op = [](const auto& in, auto& out, auto f) {
for (auto i : in)
if (test(i))
out.push_back(f(i));
};
op(x, a, func);
op(y, b, func);
op(z, c, func);
If you want to go to extremes in C++14 you can bundle the vectors up in tuples and use apply from the Library Fundamentals TS like so:
vector<int> x,y,z;
vector<string> a,b,c;
using std::experimental::apply;
auto mapper = [](auto... t) { // func and test are functions, so no capture is needed
auto op = [](const auto& in, auto& out) {
for (auto i : in)
if (test(i))
out.push_back(func(i));
};
std::tie( (apply(op, t), std::ignore) ... );
};
mapper( std::tie(x, a), std::tie(y, b), std::tie(z, c) );
But this is getting quite unreadable!
For me this works just fine:
vector<pair<vector<string>&,vector<int>&>> work{{a,x},{b,y},{c,z}};
for(auto w:work)
{
for(auto i : w.second)
if(test(i))
w.first.push_back(func(i));
}
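Since the question asks about C++11 specifically, here is a self-contained sketch of the function-local lambda approach that avoids C++14 generic lambdas by spelling out the parameter types. The bodies of test and func are hypothetical stand-ins (keep even numbers, convert to text), and the Outputs struct and convert_all wrapper exist only to make the sketch callable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the question's test() and func().
inline bool test(int i) { return i % 2 == 0; }
inline std::string func(int i) { return std::to_string(i); }

// Hypothetical container for the three result vectors.
struct Outputs {
    std::vector<std::string> a, b, c;
};

inline Outputs convert_all(const std::vector<int>& x,
                           const std::vector<int>& y,
                           const std::vector<int>& z)
{
    Outputs out;
    // The repeated logic is written once, as a local C++11 lambda.
    auto op = [](const std::vector<int>& in, std::vector<std::string>& dst) {
        for (int i : in)
            if (test(i))
                dst.push_back(func(i));
    };
    op(x, out.a);
    op(y, out.b);
    op(z, out.c);
    return out;
}
```

The lambda keeps the loop body in one place without introducing a named helper at namespace scope, which is what the question was after.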

string literal parameter not accepted to a constexpr function

Call to the extract function below does not work for me on g++ 4.9.0 (20130421). The error I get is that s1 is not a constant expression. If i can be initialized as constexpr then j and k should too. Is that wrong?
#include <tuple>
template <unsigned N1, unsigned N2>
constexpr bool strmatch(const char (&s1)[N1], const char (&s2)[N2], unsigned i = 0)
{
return (s1[i]==s2[i]) ?
(s1[i]=='\0') ?
true
: strmatch(s1, s2, i+1)
: false;
}
template<unsigned N>
constexpr int extract(const std::tuple<int, int> & t1, const char (&array)[N]) {
return std::get<strmatch(array, "m0")>(t1);
}
int main(void)
{
constexpr int i = strmatch("m0", "m0"); // OK
constexpr int j = extract(std::make_tuple(10, 20), "m0");
constexpr int k = extract(std::make_tuple(10, 20), "m1");
return 0;
}
Your code is ill-formed. The problem is that array is not a core constant expression, so can't be used in the template argument in the call to std::get:
template<unsigned N>
constexpr int extract(const std::tuple<int, int> & t1, const char (&array)[N]) {
return std::get<strmatch(array, "m0")>(t1);
}
Remember that constexpr functions can be called at runtime: this code would use the value of a runtime parameter to this function (array) during translation (in the evaluation of the call to strmatch).
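One possible way around the restriction described above is to keep the parameter out of the template argument and choose between the two elements with an ordinary conditional instead. The sketch below reuses strmatch from the question; the name extract_by_name is hypothetical, and the approach only works here because both tuple elements have the same type.

```cpp
#include <cassert>
#include <tuple>

// strmatch as in the question: compares two string literals character by character.
template <unsigned N1, unsigned N2>
constexpr bool strmatch(const char (&s1)[N1], const char (&s2)[N2], unsigned i = 0)
{
    return (s1[i] == s2[i])
        ? ((s1[i] == '\0') ? true : strmatch(s1, s2, i + 1))
        : false;
}

// Hypothetical replacement for extract(): the index passed to std::get is a
// literal, so the function parameter never appears in a template argument.
template <unsigned N>
constexpr int extract_by_name(const std::tuple<int, int>& t1, const char (&name)[N])
{
    return strmatch(name, "m0") ? std::get<1>(t1) : std::get<0>(t1);
}
```

Whether a call to extract_by_name can itself initialize a constexpr variable depends on std::get and std::make_tuple being constexpr, which the standard guarantees only from C++14 onward.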
