Program linking error without log message on intel driver - windows

Note: I will first state my specific issue and then explain my scenario in more detail.
On Windows, using the Intel driver, I can compile these two shaders (vertex and fragment, respectively), but I cannot link them:
attribute vec4 vertex;
void main(void) {
gl_Position = vertex;
}
and
uniform sampler2D textures[2];
vec4 third(sampler2D texture) {
return texture2D(texture,vec2(0.5,0.5));
}
vec4 second(float a) {
return third(textures[0]);
}
void main(void) {
gl_FragColor = second(0.0);
}
Linking fails (the link status is false), but the info log is empty.
The same shaders work fine on Linux with the same GPU, and also on my NVIDIA card. The problem occurs only on Windows with the Intel driver. OpenGL reports the following about the driver:
GL_VENDOR Intel
GL_RENDERER Intel(R) HD Graphics Family
GL_VERSION 3.0.0 - Build 8.15.10.2253
GL_SHADING_LANGUAGE_VERSION 1.30 - Intel Build 8.15.10.2253
The funny thing is that seemingly irrelevant modifications make the program link correctly. I have found a few so far:
If the second function does not take any parameter, it works.
If main() calls third directly, it works.
If the uniform is a single element instead of an array, it works.
If, instead of passing the sampler2D from second to third, I pass an index that third uses to access textures, it works.
My question is: how can I make it work while keeping the same semantics? I need three functions; third and main must not use the uniform, and only second may know about the uniforms. Also, second and third must take parameters. In my real-world scenario, second and third do a lot of work with the values they receive.
And just for clarification, this is why I need it:
In the framework that I am developing, there are three different fragment shaders being used. The first one (main in my reference code) is provided by the user and may call functions (second in my reference code) that are defined in the framework. The third level is also user-provided, but that shader is compiled only once and does not know how many times it will be used or with which values; the second level is responsible for allocating the right number of buffers and calling the third for each.
Here is the code that I used to test the shaders:
#include <SDL.h>
#include <GL/glew.h>
#include <iostream>
#include <vector>
#include <cassert>
#include <sstream>
#include <stdexcept>
#define WIDTH 800
#define HEIGHT 640
void warn(const std::exception& e) {
#ifndef NDEBUG
std::cerr << "Warning: " << e.what() << std::endl;
#endif
};
namespace {
std::string tostr(unsigned a) {
std::stringstream ss;
ss << a;
return ss.str();
}
std::string errorname(unsigned a) {
switch(a) {
case GL_NO_ERROR: return "GL_NO_ERROR";
case GL_INVALID_ENUM: return "GL_INVALID_ENUM";
case GL_INVALID_VALUE: return "GL_INVALID_VALUE";
case GL_INVALID_OPERATION: return "GL_INVALID_OPERATION";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "GL_INVALID_FRAMEBUFFER_OPERATION";
case GL_OUT_OF_MEMORY: return "GL_OUT_OF_MEMORY";
}
return "";
}
}
void checkGlErrorImpl(unsigned line, const char* file) {
GLenum curerr = glGetError();
if( curerr == GL_NO_ERROR )
return;
auto err = std::runtime_error(std::string("OpenGL ")+errorname(curerr)+" error on "+file+":"+tostr(line));
warn(err);
throw err;
}
#define checkGlError() checkGlErrorImpl(__LINE__,__FILE__)
int create_shader(unsigned type, const char* shaderSource) {
unsigned id = glCreateShader(type);
const char* shaderSources[2] = {"#version 130\n",shaderSource};
glShaderSource(id,2,shaderSources,NULL);
glCompileShader(id);
GLint compileStatus;
glGetShaderiv(id, GL_COMPILE_STATUS, &compileStatus);
int msgLength = 0;
glGetShaderiv(id, GL_INFO_LOG_LENGTH, &msgLength);
// One extra zero-initialized byte keeps msg a valid C string even when the log is empty.
std::vector<char> msg(msgLength + 1, '\0');
glGetShaderInfoLog(id, msgLength + 1, NULL, msg.data());
std::cout << "(" << id << ") " << msg.data() << std::endl;
std::runtime_error except(std::string("Error on compiling shader:\n")+msg.data());
if( compileStatus == GL_FALSE ) {
warn(except);
throw except;
}
checkGlError();
return id;
};
int create_program(const std::vector<int>& shaders) {
int id = glCreateProgram();
checkGlError();
for( unsigned int i=0; i< shaders.size(); ++i ) {
glAttachShader(id,shaders[i]);
}
glLinkProgram(id);
GLint linkStatus=-1;
glGetProgramiv(id, GL_LINK_STATUS, &linkStatus);
assert(linkStatus != -1);
checkGlError();
if( linkStatus == GL_FALSE ) {
int msgLength=-1;
glGetProgramiv(id, GL_INFO_LOG_LENGTH, &msgLength);
assert( msgLength != -1 );
char* msg = new char[msgLength+1];
msg[0] = '\0';
std::cout << "Buffer(" << msgLength+1 << ")" << msg << std::endl;
glGetProgramInfoLog(id, msgLength+1, &msgLength, msg);
std::cout << "Second length " << msgLength << std::endl;
std::cout << "Log " << msg << std::endl;
std::string errormsg("Error on linking shader: ");
errormsg += msg;
// delete[] msg;
auto err = std::runtime_error(errormsg);
warn(err);
throw err;
}
checkGlError();
return id;
}
int main(int argc, char** argv) {
if( SDL_Init( SDL_INIT_VIDEO ) < 0 ) {
throw __LINE__;
}
int video_flags;
video_flags = SDL_OPENGL;
video_flags |= SDL_GL_DOUBLEBUFFER;
video_flags |= SDL_HWSURFACE;
video_flags |= SDL_HWACCEL;
SDL_GL_SetAttribute( SDL_GL_DOUBLEBUFFER, 1 );
SDL_Surface* surface = SDL_SetVideoMode( WIDTH, HEIGHT, 24, video_flags );
if( surface == NULL )
throw __LINE__;
unsigned width = WIDTH;
unsigned height = HEIGHT;
GLenum err = glewInit();
if (GLEW_OK != err) {
std::cerr << "Error: " << glewGetErrorString(err) << std::endl;
throw __LINE__;
}
std::vector<int> shaders;
shaders.push_back( create_shader(GL_VERTEX_SHADER,
"attribute vec4 vertex;\n"
"void main(void) {\n"
" gl_Position = vertex;\n"
"}\n"
));
shaders.push_back( create_shader(GL_FRAGMENT_SHADER,
"uniform sampler2D textures[2];\n"
"vec4 third(sampler2D texture) {\n"
" return texture2D(texture,vec2(0.5,0.5));\n"
"}\n"
"vec4 second(float a) {\n"
" return third(textures[0]);\n"
"}\n"
"\n"
"void main(void) {\n"
" gl_FragColor = second(0.0);\n"
"}\n"
));
int program = create_program(shaders);
try {
while( true ) {
SDL_Event event;
while( SDL_PollEvent(&event) ) {
switch( event.type ) {
case SDL_QUIT:
throw 0;
break;
}
}
SDL_Delay(10);
}
} catch( int returnal) {
return returnal;
}
return 0;
};
It depends on GLEW, SDL, OpenGL and C++11.

Related

OpenCL compute histogram program returns 0 in every bin

I'm trying to implement a simple OpenCL program to compute a histogram.
Below is what I currently have:
#include <CL/cl.h>
#include <iostream>
#include <vector>
#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>
#include <algorithm>
//Getting platform, device, context and command queue
void setup(
cl_platform_id &platformId, cl_device_id &deviceId, cl_context& context, cl_command_queue& commandQueue,
std::string platformName = "NVIDIA CUDA", cl_device_type deviceType = CL_DEVICE_TYPE_GPU,
std::string deviceName = "GeForce GTX 1070")
{
using std::vector;
using std::string;
using std::cout;
using std::endl;
cl_uint numberOfPlatforms, numberOfDevices;
cl_int error;
//Finding platform id
error = clGetPlatformIDs(0,nullptr,&numberOfPlatforms);
vector<cl_platform_id> platform(numberOfPlatforms);
error = clGetPlatformIDs(numberOfPlatforms,platform.data(),nullptr);
for(const auto & currentPlatform : platform)
{
size_t stringSize;
error = clGetPlatformInfo(currentPlatform,CL_PLATFORM_NAME,0,nullptr,&stringSize);
char * currentPlatformName = new char[stringSize];
error = clGetPlatformInfo(currentPlatform,CL_PLATFORM_NAME,stringSize,currentPlatformName,nullptr);
if(string(currentPlatformName).compare(platformName) == 0)
{
cout << "Platform " << platformName << " found!" << endl;
delete [] currentPlatformName;
platformId = currentPlatform;
break;
}
delete [] currentPlatformName;
}
error = clGetDeviceIDs(platformId,deviceType,0,nullptr,&numberOfDevices);
vector<cl_device_id> device(numberOfDevices);
error = clGetDeviceIDs(platformId,deviceType,numberOfDevices,device.data(),nullptr);
for(const auto & currentDevice : device)
{
size_t stringSize;
error = clGetDeviceInfo(currentDevice,CL_DEVICE_NAME,0,nullptr,&stringSize);
char * currentDeviceName = new char[stringSize];
error = clGetDeviceInfo(currentDevice,CL_DEVICE_NAME,stringSize,currentDeviceName,nullptr);
if(string(currentDeviceName).compare(deviceName) == 0)
{
cout << "Device " << deviceName << " found!" << endl;
delete [] currentDeviceName;
deviceId = currentDevice;
break;
}
delete [] currentDeviceName;
}
context = clCreateContext(nullptr,1,&deviceId,nullptr,nullptr,&error);
commandQueue = clCreateCommandQueue(context,deviceId,0,&error);
}
void run(const std::string & imagePath, const std::string& programSource, const cl_device_id deviceId,
const cl_context& context, const cl_command_queue& commandQueue, int histogram[256])
{
cl_int error;
int width, height, channels;
stbi_set_flip_vertically_on_load(true);
unsigned char *image = stbi_load(imagePath.c_str(),
&width,
&height,
&channels,
STBI_grey);
char min = 0;
char max = 255;
for(int i = 0; i < width*height; ++i)
{
min = (image[i] < min) ? image[i]:min;
max = (image[i] > max) ? image[i]:max;
}
std::cout << "(min, max) := (" << min << ", " << max << ")" << std::endl;
//create buffers
cl_mem memImage = clCreateBuffer(context,CL_MEM_READ_ONLY,width*height*sizeof(char),image,&error);
cl_mem memHistogram = clCreateBuffer(context,CL_MEM_READ_WRITE,256*sizeof(int),&histogram,&error);
//Create program, kernel and setting kernel args
size_t programSize = programSource.length();
const char * source = programSource.c_str();
cl_program program = clCreateProgramWithSource(context,1,&source,&programSize,&error);
error = clBuildProgram(program,1,&deviceId,nullptr,nullptr,nullptr);
cl_kernel kernel = clCreateKernel(program,"computeHistogram",&error);
error = clEnqueueWriteBuffer(commandQueue,memImage,CL_TRUE,0,sizeof(cl_mem),&image,0,nullptr,nullptr);
error = clSetKernelArg(kernel,0,sizeof(cl_mem),&memImage);
error = clSetKernelArg(kernel,1,sizeof(cl_mem),&memHistogram);
clFinish(commandQueue);
size_t globalWorkSize = width*height;
error = clEnqueueNDRangeKernel(commandQueue,kernel,1,nullptr,&globalWorkSize,nullptr,0,nullptr,nullptr);
error = clEnqueueWriteBuffer(commandQueue,memHistogram,CL_TRUE,0,256*sizeof(int),&histogram,0,nullptr,nullptr);
clFinish(commandQueue);
clReleaseCommandQueue(commandQueue);
clReleaseContext(context);
}
int main(int argc, char** argv)
{
cl_platform_id platformId;
cl_device_id deviceId;
cl_context context;
cl_command_queue commandQueue;
setup(platformId,deviceId,context,commandQueue);
std::string filename = "gray.jpeg";
std::string programSource =
"__kernel void computeHistogram(\n"
" __global char * image, __global int * histogram)\n"
"{\n"
" size_t idx = get_global_id(0);\n"
" char pixelValue = image[idx];\n"
" atomic_inc(&histogram[pixelValue]);\n"
"}\n";
int histogram[256] = {0};
run(filename,programSource, deviceId, context, commandQueue,histogram);
for(int i = 0; i < 256; ++i)
{
std::cout << "i : " << histogram[i] << std::endl;
}
return 0;
}
However, I get 0 in every bin. I think the logic I'm trying to apply is correct, but I cannot figure out what the error is.
There are several problems. To name a few:
clCreateBuffer fails because a host_ptr is passed without this being reflected in the flags parameter. According to the OpenCL specification this case is reported as CL_INVALID_HOST_PTR (-37); calls that then use the invalid buffer fail with CL_INVALID_MEM_OBJECT (-38). CL_MEM_USE_HOST_PTR (or CL_MEM_COPY_HOST_PTR) can be combined with CL_MEM_READ_ONLY and CL_MEM_READ_WRITE, respectively.
clEnqueueWriteBuffer is passed the size of the cl_mem handle instead of the size of the image buffer, and the address of the image pointer instead of the pointer itself.
After clEnqueueNDRangeKernel, clEnqueueWriteBuffer is used again. I suspect the intention was to read the data back, and for that clEnqueueReadBuffer must be used.
There may be more problems. These are just the major ones, and it is hard to imagine that you checked the return codes of the cl functions and all of them returned CL_SUCCESS...
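The point about return codes can be made mechanical. Below is a minimal sketch of such a check, written against plain int so that it compiles without the OpenCL headers; in the real program the argument would be the cl_int returned by each call and the constants would come from CL/cl.h. The names clErrorName and checkCl are hypothetical helpers, not part of OpenCL, and only a handful of codes are listed.

```cpp
#include <stdexcept>
#include <string>

// Numeric values as defined in CL/cl.h; repeated here only so that the
// sketch stands alone without the OpenCL headers.
const int kClSuccess          = 0;    // CL_SUCCESS
const int kClInvalidValue     = -30;  // CL_INVALID_VALUE
const int kClInvalidHostPtr   = -37;  // CL_INVALID_HOST_PTR
const int kClInvalidMemObject = -38;  // CL_INVALID_MEM_OBJECT

// Hypothetical helper: translate a few common codes into readable names.
std::string clErrorName(int error) {
    switch (error) {
        case kClSuccess:          return "CL_SUCCESS";
        case kClInvalidValue:     return "CL_INVALID_VALUE";
        case kClInvalidHostPtr:   return "CL_INVALID_HOST_PTR";
        case kClInvalidMemObject: return "CL_INVALID_MEM_OBJECT";
        default:                  return "error " + std::to_string(error);
    }
}

// Hypothetical helper: throw as soon as a call reports anything but success.
void checkCl(int error, const char* what) {
    if (error != kClSuccess) {
        throw std::runtime_error(std::string(what) + " failed: " + clErrorName(error));
    }
}
```

Calling checkCl(error, "clCreateBuffer") right after each call (with the matching function name) would point at the first failing call instead of silently producing a histogram full of zeros.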
The actual program that works is the following:
#include <CL/cl.h>
#include <iostream>
#include <vector>
#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>
#include <algorithm>
//Getting platform, device, context and command queue
void setup(
cl_platform_id &platformId, cl_device_id &deviceId, cl_context& context, cl_command_queue& commandQueue,
std::string platformName = "NVIDIA CUDA", cl_device_type deviceType = CL_DEVICE_TYPE_GPU,
std::string deviceName = "GeForce GTX 1070")
{
using std::vector;
using std::string;
using std::cout;
using std::endl;
cl_uint numberOfPlatforms, numberOfDevices;
cl_int error;
//Finding platform id
error = clGetPlatformIDs(0,nullptr,&numberOfPlatforms);
vector<cl_platform_id> platform(numberOfPlatforms);
error = clGetPlatformIDs(numberOfPlatforms,platform.data(),nullptr);
for(const auto & currentPlatform : platform)
{
size_t stringSize;
error = clGetPlatformInfo(currentPlatform,CL_PLATFORM_NAME,0,nullptr,&stringSize);
char * currentPlatformName = new char[stringSize];
error = clGetPlatformInfo(currentPlatform,CL_PLATFORM_NAME,stringSize,currentPlatformName,nullptr);
if(string(currentPlatformName).compare(platformName) == 0)
{
cout << "Platform " << platformName << " found!" << endl;
delete [] currentPlatformName;
platformId = currentPlatform;
break;
}
delete [] currentPlatformName;
}
error = clGetDeviceIDs(platformId,deviceType,0,nullptr,&numberOfDevices);
vector<cl_device_id> device(numberOfDevices);
error = clGetDeviceIDs(platformId,deviceType,numberOfDevices,device.data(),nullptr);
for(const auto & currentDevice : device)
{
size_t stringSize;
error = clGetDeviceInfo(currentDevice,CL_DEVICE_NAME,0,nullptr,&stringSize);
char * currentDeviceName = new char[stringSize];
error = clGetDeviceInfo(currentDevice,CL_DEVICE_NAME,stringSize,currentDeviceName,nullptr);
if(string(currentDeviceName).compare(deviceName) == 0)
{
cout << "Device " << deviceName << " found!" << endl;
delete [] currentDeviceName;
deviceId = currentDevice;
break;
}
delete [] currentDeviceName;
}
context = clCreateContext(nullptr,1,&deviceId,nullptr,nullptr,&error);
commandQueue = clCreateCommandQueue(context,deviceId,0,&error);
}
void run(const std::string & imagePath, const std::string& programSource, const cl_device_id deviceId,
const cl_context& context, const cl_command_queue& commandQueue, int histogram[256])
{
cl_int error;
int width, height, channels;
stbi_set_flip_vertically_on_load(true);
unsigned char *image = stbi_load(imagePath.c_str(),
&width,
&height,
&channels,
STBI_grey);
unsigned char min = 255;
unsigned char max = 0;
for(int i = 0; i < width*height; ++i)
{
min = (image[i] < min) ? image[i]:min;
max = (image[i] > max) ? image[i]:max;
}
std::cout << "(min, max) := (" << static_cast<int>(min) << ", " << static_cast<int>(max) << ")" << std::endl;
//create buffers
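//a non-null host_ptr requires CL_MEM_COPY_HOST_PTR (or CL_MEM_USE_HOST_PTR) in the flags
cl_mem memImage = clCreateBuffer(context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,width*height*sizeof(unsigned char),image,&error);
//histogram already is a pointer to the 256 zero-initialized bins, so it is passed as is
cl_mem memHistogram = clCreateBuffer(context,CL_MEM_READ_WRITE|CL_MEM_COPY_HOST_PTR,256*sizeof(int),histogram,&error);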
//Create program, kernel and setting kernel args
size_t programSize = programSource.length();
const char * source = programSource.c_str();
cl_program program = clCreateProgramWithSource(context,1,&source,&programSize,&error);
error = clBuildProgram(program,1,&deviceId,nullptr,nullptr,nullptr);
cl_kernel kernel = clCreateKernel(program,"computeHistogram",&error);
error = clEnqueueWriteBuffer(commandQueue,memImage,CL_TRUE,0,width*height*sizeof(unsigned char),image,0,nullptr,nullptr);
error = clSetKernelArg(kernel,0,sizeof(cl_mem),&memImage);
error = clSetKernelArg(kernel,1,sizeof(cl_mem),&memHistogram);
clFinish(commandQueue);
const size_t globalWorkSize = width*height;
error = clEnqueueNDRangeKernel(commandQueue,kernel,1,nullptr,&globalWorkSize,nullptr,0,nullptr,nullptr);
error = clEnqueueReadBuffer(commandQueue,memHistogram,CL_TRUE,0,256*sizeof(int),histogram,0,nullptr,nullptr);
clFinish(commandQueue);
clReleaseCommandQueue(commandQueue);
clReleaseContext(context);
}
int main(int argc, char** argv)
{
cl_platform_id platformId;
cl_device_id deviceId;
cl_context context;
cl_command_queue commandQueue;
setup(platformId,deviceId,context,commandQueue);
std::string filename = "gray.jpeg";
std::string programSource =
"__kernel void computeHistogram(\n"
" __global unsigned char * image, __global int * histogram)\n"
"{\n"
" size_t idx = get_global_id(0);\n"
" unsigned char pixelValue = image[idx];\n"
" atomic_inc(&histogram[pixelValue]);\n"
" barrier(CLK_GLOBAL_MEM_FENCE);"
"}\n";
int histogram[256] = {0};
run(filename,programSource, deviceId, context, commandQueue,histogram);
for(int i = 0; i < 256; ++i)
{
std::cout << i << " : " << histogram[i] << std::endl;
}
return 0;
}
The main issue was the line
error = clEnqueueReadBuffer(commandQueue,memHistogram,CL_TRUE,0,256*sizeof(int),histogram,0,nullptr,nullptr);
In the original post this was a clEnqueueWriteBuffer call, and the size passed when uploading the image was wrong (sizeof(cl_mem) instead of the image size). I was also using char instead of unsigned char, and finally the kernel is different.
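The char versus unsigned char point is easy to check on the host. The following sketch (hypothetical helper names, plain C++ with no OpenCL) builds a reference histogram on the CPU and shows which index a pixel value turns into when it is read through a signed 8-bit type, which is what char is in OpenCL C. A CPU reference like this can also be compared bin by bin with what the kernel returns.

```cpp
#include <array>
#include <vector>

// Reference histogram on the CPU: one bin per possible 8-bit value.
std::array<int, 256> cpuHistogram(const std::vector<unsigned char>& image) {
    std::array<int, 256> bins{};   // value-initialized: every bin starts at zero
    for (unsigned char pixel : image) {
        ++bins[pixel];             // the index is always in 0..255
    }
    return bins;
}

// Index that results when the same byte is interpreted as a signed 8-bit
// value. Bright pixels (128..255) become negative and would address memory
// in front of the histogram.
int indexAsSignedChar(unsigned char pixel) {
    return static_cast<int>(static_cast<signed char>(pixel));
}
```

On the usual two's complement targets, a pixel value of 200 read as a signed 8-bit value becomes a negative index, so the increment lands outside the 256 bins.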

Load public key with openssl - invalid encoding

I am just starting to use OpenSSL.
I want to use a public key to check a signature, but so far I cannot even read my public key with OpenSSL.
Here is my source code:
#include <iostream>
#include <openssl/ec.h>
#include <openssl/evp.h>
#include <openssl/err.h>
bool verifyPublicKey(const std::string &sRawPublicKey);
void printAllError();
int main(int argc, char* argv[])
{
if (argc < 2) {
std::cerr << "Usage: " << argv[0] << " PUBLIC KEY" << std::endl;
return EXIT_FAILURE;
}
std::string sPublicKey = argv[1];
std::cout << "Key: " << sPublicKey << std::endl;
bool bRes = verifyPublicKey(sPublicKey);
if (!bRes)
{
std::cerr << "verifyPublicKey failed" << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
bool verifyPublicKey(const std::string &sRawPublicKey)
{
bool bRes = false;
EC_KEY *eckey = EC_KEY_new_by_curve_name(NID_X9_62_prime256v1);
EC_KEY_set_conv_form(eckey, POINT_CONVERSION_UNCOMPRESSED);
unsigned char *p_RawPublicKey = new unsigned char[sRawPublicKey.length() + 1];
std::copy(sRawPublicKey.begin(), sRawPublicKey.end(), p_RawPublicKey);
const unsigned char *pubkey_raw_p = p_RawPublicKey;
o2i_ECPublicKey(&eckey, &pubkey_raw_p, sRawPublicKey.size());
if (!EC_KEY_check_key(eckey))
{
EC_KEY_free(eckey);
bRes = false;
printAllError();
}
else
{
EC_KEY_free(eckey);
bRes = true;
}
return bRes;
}
void printAllError()
{
while (ERR_peek_last_error() != 0)
{
std::cerr << ERR_error_string(ERR_get_error(), nullptr) << std::endl;
}
}
I run it with the following public key:
3059301306072A8648CE3D020106082A8648CE3D03010703420004E297417036EB4C6404CC9C2AC4F28468DD0A92F2C9496D187D2BCA784DB49AB540B9FD9ACE0BA49C8532825954755EC10246A71AF2AEE9AEC34BE683CDDFD212
ASN.1 Decoder:
SEQUENCE {
SEQUENCE {
OBJECTIDENTIFIER 1.2.840.10045.2.1 (ecPublicKey)
OBJECTIDENTIFIER 1.2.840.10045.3.1.7 (P-256)
}
BITSTRING 0x04E297417036EB4C6404CC9C2AC4F28468DD0A92F2C9496D187D2BCA784DB49AB540B9FD9ACE0BA49C8532825954755EC10246A71AF2AEE9AEC34BE683CDDFD212
: 0 unused bit(s)
}
With the ASN.1 decoding, I can see that the key I use is in the correct format: z || HEX(x) || HEX(y) with z = 0x04.
The output of the program is as follows:
Key: 3059301306072A8648CE3D020106082A8648CE3D03010703420004E297417036EB4C6404CC9C2AC4F28468DD0A92F2C9496D187D2BCA784DB49AB540B9FD9ACE0BA49C8532825954755EC10246A71AF2AEE9AEC34BE683CDDFD212
error:10067066:elliptic curve routines:ec_GFp_simple_oct2point:invalid encoding
error:10098010:elliptic curve routines:o2i_ECPublicKey:EC lib
error:1010206A:elliptic curve routines:ec_key_simple_check_key:point at infinity
verifyPublicKey failed
I'm lost. Do you have an explanation?
Moreover, is it possible to go further and give only x and y (without the ASN.1 header)?
Thank you
It looks like you should feed the raw point to o2i_ECPublicKey(), without the ASN.1 framing.
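As a sketch of what "the raw point" means here: the program in the question also passes the hexadecimal text itself, whereas o2i_ECPublicKey() works on binary octets, so the string has to be decoded first. The helpers below (hexDigit, hexToBytes and rawP256Point are hypothetical names) use only the standard library. They assume a P-256 key whose last 65 bytes are the uncompressed point 0x04 || X || Y, as in the ASN.1 dump above, and do no real ASN.1 parsing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one hexadecimal digit to its value; throws on anything else.
unsigned char hexDigit(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned char>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned char>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned char>(c - 'A' + 10);
    throw std::invalid_argument("not a hexadecimal digit");
}

// Decode a hex string such as "3059..." into the bytes it represents.
std::vector<unsigned char> hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd number of hex digits");
    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(static_cast<unsigned char>(hexDigit(hex[i]) * 16 + hexDigit(hex[i + 1])));
    }
    return bytes;
}

// Assumption: the trailing 65 bytes of the decoded key are the uncompressed
// point (1 format byte + 32 bytes X + 32 bytes Y). Everything in front of it
// is treated as ASN.1 framing and dropped.
std::vector<unsigned char> rawP256Point(const std::vector<unsigned char>& der) {
    const std::size_t pointSize = 65;
    if (der.size() < pointSize) throw std::invalid_argument("input too short");
    std::vector<unsigned char> point(der.end() - pointSize, der.end());
    if (point[0] != 0x04) throw std::invalid_argument("not an uncompressed point");
    return point;
}
```

The data pointer and size of the resulting 65-byte vector are what would be handed to o2i_ECPublicKey(). This also covers the x and y case: prefixing the two 32-byte values with 0x04 gives the same raw point. If the full structure should be parsed instead, d2i_PUBKEY() is the OpenSSL function that reads a DER-encoded SubjectPublicKeyInfo, which is what the dump above appears to be.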

How to Avoid Copies When Doing Method Chaining in C++

I love to use method chaining to completely initialize objects and then store them in const variables. When analyzing the resulting code, it turns out that this means executing many copy constructors. Therefore I have wondered whether C++11 move semantics might help to optimize method chaining.
Indeed, I have been able to speed up my code significantly by adding overloads with ref-qualifiers to my chain methods. Please consider this source code:
#include <chrono>
#include <iostream>
#include <string>
#undef DEBUGGING_OUTPUT
#undef ENABLE_MOVING
class Entity
{
public:
Entity() :
data(0.0), text("Standard Text")
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Constructing entity." << std::endl;
#endif
}
Entity(const Entity& entity) :
data(entity.data), text(entity.text)
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Copying entity." << std::endl;
#endif
}
Entity(Entity&& entity) :
data(entity.data), text(std::move(entity.text))
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Moving entity." << std::endl;
#endif
}
~Entity()
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Cleaning up entity." << std::endl;
#endif
}
double getData() const
{
return data;
}
const std::string& getText() const
{
return text;
}
void modify1()
{
data += 1.0;
text += " 1";
}
Entity getModified1() const &
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Lvalue version of getModified1" << std::endl;
#endif
Entity newEntity = *this;
newEntity.modify1();
return newEntity;
}
#ifdef ENABLE_MOVING
Entity getModified1() &&
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Rvalue version of getModified1" << std::endl;
#endif
modify1();
return std::move(*this);
}
#endif
void modify2()
{
data += 2.0;
text += " 2";
}
Entity getModified2() const &
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Lvalue version of getModified2" << std::endl;
#endif
Entity newEntity = *this;
newEntity.modify2();
return newEntity;
}
#ifdef ENABLE_MOVING
Entity getModified2() &&
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Rvalue version of getModified2" << std::endl;
#endif
modify2();
return std::move(*this);
}
#endif
private:
double data;
std::string text;
};
int main()
{
const int interationCount = 1000;
{
// Create a temporary entity, modify it and store it in a const variable
// by making use of method chaining.
//
// This approach is elegant to write and read, but it is slower than the
// other approach.
const std::chrono::steady_clock::time_point startTimePoint =
std::chrono::steady_clock::now();
for (int i = 0; i < interationCount; ++i)
{
const Entity entity = Entity().getModified1().getModified1().getModified2().getModified2();
#ifdef DEBUGGING_OUTPUT
std::cout << "Entity has text " << entity.getText() << " and data "
<< entity.getData() << std::endl;
#endif
}
const std::chrono::steady_clock::time_point stopTimePoint =
std::chrono::steady_clock::now();
const std::chrono::duration<double> timeSpan = std::chrono::duration_cast<
std::chrono::duration<double>>(stopTimePoint - startTimePoint);
std::cout << "Method chaining has taken " << timeSpan.count() << " seconds."
<< std::endl;
}
{
// Create an entity and modify it without method chaining. It cannot be
// stored in a const variable.
//
// This approach is optimal from a performance point of view, but it is longish
// and renders usage of a const variable impossible even if the entity
// won't change after initialization.
const std::chrono::steady_clock::time_point startTimePoint =
std::chrono::steady_clock::now();
for (int i = 0; i < interationCount; ++i)
{
Entity entity;
entity.modify1();
entity.modify1();
entity.modify2();
entity.modify2();
#ifdef DEBUGGING_OUTPUT
std::cout << "Entity has text " << entity.getText() << " and data "
<< entity.getData() << std::endl;
#endif
}
const std::chrono::steady_clock::time_point stopTimePoint =
std::chrono::steady_clock::now();
const std::chrono::duration<double> timeSpan = std::chrono::duration_cast<
std::chrono::duration<double>>(stopTimePoint - startTimePoint);
std::cout << "Modification without method chaining has taken "
<< timeSpan.count() << " seconds." << std::endl;
}
return 0;
}
The version without method chaining is approximately 10 times faster here than the other one. As soon as I replace
#undef ENABLE_MOVING
by
#define ENABLE_MOVING
the version without method chaining remains only 1.5 times faster than the other one. So this is a great improvement.
Still I wonder whether I could optimize the code even more. When I switch to
#define DEBUGGING_OUTPUT
then I can see that there are new entities created for every call to getModified1() or getModified2(). The only advantage of move construction is that creation is cheaper. Is there a way to even prevent move construction and work on the original entity with method chaining?
With the help of Igor Tandetnik I guess that I can answer my question!
The modification methods have to be changed to return rvalue references:
#ifdef ENABLE_MOVING
Entity&& getModified1() &&
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Rvalue version of getModified1" << std::endl;
#endif
modify1();
return std::move(*this);
}
#endif
#ifdef ENABLE_MOVING
Entity&& getModified2() &&
{
#ifdef DEBUGGING_OUTPUT
std::cout << "Rvalue version of getModified2" << std::endl;
#endif
modify2();
return std::move(*this);
}
#endif
and the initialization has to happen like this:
const Entity entity = std::move(Entity().getModified1().getModified1().getModified2().getModified2());
Then the method chaining code is almost as efficient as the other code. The difference is one call to the move constructor and one additional destructor call for the temporary instance which is negligible.
Thank you for your help!
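The effect of the rvalue-reference return type can be counted directly. The sketch below is a reduced variant of the Entity class (Counted, append, copies and moves are hypothetical names) with one chain method in both ref-qualified forms and counters in the copy and move constructors. It assumes nothing beyond the standard library.

```cpp
#include <string>
#include <utility>

struct Counted {
    static int copies;   // number of copy-constructor calls
    static int moves;    // number of move-constructor calls
    std::string text;

    Counted() : text("Standard Text") {}
    Counted(const Counted& other) : text(other.text) { ++copies; }
    Counted(Counted&& other) noexcept : text(std::move(other.text)) { ++moves; }

    // Lvalue version: leaves *this untouched and returns a modified copy.
    Counted append(const std::string& suffix) const & {
        Counted result = *this;
        result.text += suffix;
        return result;
    }

    // Rvalue version: modifies the temporary in place and hands it on by
    // reference, so no new object is created per chain step.
    Counted&& append(const std::string& suffix) && {
        text += suffix;
        return std::move(*this);
    }
};

int Counted::copies = 0;
int Counted::moves = 0;
```

With these overloads, `const Counted entity = Counted().append(" 1").append(" 1").append(" 2").append(" 2");` runs the chain on a single temporary and needs one move construction at the end, because the chain expression already has type Counted&&. One caveat of this design: the result must be stored by value as above. Binding it to a reference (for example `const Counted& r = Counted().append(" 1");`) leaves the reference dangling once the temporary is destroyed at the end of the statement.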

Win32/x86 SEH Bug on EXCEPTION_FLT_INVALID_OPERATION?

I just experimented with Win32 structured exception handling.
I provoked a floating-point invalid-operation exception through a 0.0 / 0.0 division.
I wrote the following test-code:
#include <windows.h>
#include <cfloat>
#include <iostream>
int main()
{
double d1 = 0.0;
double d2 = 0.0;
double dx;
_controlfp( _controlfp( 0, 0 ) & ~_EM_INVALID, _MCW_EM );
__try
{
dx = d1 / d2;
}
__except( GetExceptionCode() == EXCEPTION_FLT_INVALID_OPERATION
? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH )
{
std::cout << "exception caught";
}
return 0;
}
I compiled the code both for Win32 x86 and Win32 x64.
In both cases, SSE2 code is generated.
For x64, the exception is caught properly. But for x86, I get an uncaught exception.
When I change the __except-line to
__except( EXCEPTION_EXECUTE_HANDLER )
and compile the code for x86, the exception is caught.
Is this a Windows-bug?
[EDIT1]
I extended my program, here it is:
#include <windows.h>
#include <intrin.h>
#include <cfloat>
#include <limits>
#include <iostream>
using namespace std;
void PrintSseExceptionMask();
void ClearSseExceptionFlags();
LONG CALLBACK Handler( PEXCEPTION_POINTERS ExceptionInfo );
int main()
{
double d1 = 0.0;
double d2 = 0.0;
double dx;
_controlfp( ~_EM_INVALID & _MCW_EM, _MCW_EM );
PrintSseExceptionMask();
AddVectoredExceptionHandler( 0, Handler );
ClearSseExceptionFlags();
dx = d1 / d2;
return 0;
}
void PrintSseExceptionFlags( unsigned mxcsr );
LONG CALLBACK Handler( PEXCEPTION_POINTERS pep )
{
unsigned mxcsr = _mm_getcsr();
if( pep->ExceptionRecord->ExceptionCode == STATUS_FLOAT_INVALID_OPERATION )
cout << "float invalid operation caught" << endl;
else if( pep->ExceptionRecord->ExceptionCode == STATUS_FLOAT_MULTIPLE_TRAPS )
cout << "multiple float traps caught" << endl;
PrintSseExceptionFlags( mxcsr );
return EXCEPTION_CONTINUE_SEARCH;
}
unsigned const MXCSR_INVALID_OPERATION_FLAG = 0x0001;
unsigned const MXCSR_DENORMAL_FLAG = 0x0002;
unsigned const MXCSR_DIVIDE_BY_ZERO_FLAG = 0x0004;
unsigned const MXCSR_OVERFLOW_FLAG = 0x0008;
unsigned const MXCSR_UNDERFLOW_FLAG = 0x0010;
unsigned const MXCSR_PRECISION_FLAG = 0x0020;
unsigned const MXCSR_EXCEPTION_FLAGS = 0x003F;
unsigned const MXCSR_INVALID_OPERATION_MASK = 0x0080;
unsigned const MXCSR_DENORMAL_OPERATION_MASK = 0x0100;
unsigned const MXCSR_DIVIDE_BY_ZERO_MASK = 0x0200;
unsigned const MXCSR_OVERFLOW_MASK = 0x0400;
unsigned const MXCSR_UNDERFLOW_MASK = 0x0800;
unsigned const MXCSR_PRECISION_MASK = 0x1000;
unsigned const MXCSR_EXCEPTION_MASK = 0x1F80;
void PrintSseExceptionFlags( unsigned mxcsr )
{
unsigned exceptionFlags;
static const struct
{
unsigned flag;
const char *pstrFlag; // string literals are const
} aExceptionFlags[] =
{
MXCSR_INVALID_OPERATION_FLAG, "invalid operation flag",
MXCSR_DENORMAL_FLAG, "denormal flag",
MXCSR_DIVIDE_BY_ZERO_FLAG, "divide by zero flag",
MXCSR_OVERFLOW_FLAG, "overflow flag",
MXCSR_UNDERFLOW_FLAG, "underflow flag",
MXCSR_PRECISION_FLAG, "precision flag",
(unsigned)-1, nullptr
};
if( !(exceptionFlags = mxcsr & MXCSR_EXCEPTION_FLAGS) )
{
cout << "no exception flags set" << endl;
return;
}
for( int i = 0; aExceptionFlags[i].pstrFlag; i++ )
if( exceptionFlags & aExceptionFlags[i].flag )
cout << aExceptionFlags[i].pstrFlag << " set" << endl;
}
void PrintSseExceptionMask()
{
unsigned exceptionMasks;
static const struct
{
unsigned mask;
const char *pstrMask; // string literals are const
} aExceptionMasks[] =
{
MXCSR_INVALID_OPERATION_MASK, "invalid operation",
MXCSR_DENORMAL_OPERATION_MASK, "denormal operation",
MXCSR_DIVIDE_BY_ZERO_MASK, "divide by zero",
MXCSR_OVERFLOW_MASK, "overflow",
MXCSR_UNDERFLOW_MASK, "underflow",
MXCSR_PRECISION_MASK, "precision",
(unsigned)-1, nullptr
};
if( (exceptionMasks = _mm_getcsr() & MXCSR_EXCEPTION_MASK) == MXCSR_EXCEPTION_MASK )
{
cout << "all exceptions masked" << endl;
return;
}
for( int i = 0; aExceptionMasks[i].pstrMask; i++ )
if( !(exceptionMasks & aExceptionMasks[i].mask) )
cout << aExceptionMasks[i].pstrMask << " exception enabled" << endl;
}
void ClearSseExceptionFlags()
{
_mm_setcsr( _mm_getcsr() & ~MXCSR_EXCEPTION_FLAGS );
}
When the exception is caught in Handler(), it reports that only the invalid operation flag of the MXCSR is set, and no other flag.
So it seems that the FPU does not behave differently in x86 mode, and that this is rather a Windows bug.
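The premise that 0.0 / 0.0 raises the invalid-operation flag and not the divide-by-zero flag can be cross-checked without any Windows API. The sketch below uses the standard <cfenv> interface, which reports the sticky status flags and does not unmask or trap anything; the function name is hypothetical. It therefore says nothing about how Windows dispatches the exception, only about which flags the division sets.

```cpp
#include <cfenv>

// Returns the floating-point status flags raised by dividing 0.0 by 0.0.
int flagsRaisedByZeroOverZero() {
    volatile double d1 = 0.0;   // volatile keeps the compiler from folding
    volatile double d2 = 0.0;   // the division at compile time
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double dx = d1 / d2;
    (void)dx;                   // the quotient itself is not needed
    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

Testing the returned value against FE_INVALID and FE_DIVBYZERO gives the portable counterpart of inspecting the MXCSR flag bits in Handler().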

What does OpenCL make_kernel actually return and how do I store it?

I have the following code below, where I am trying to store my kernel that I compiled into a variable functor that can then be accessed later. Unfortunately, when I declare auto kernelTest in the struct, it throws an error saying that "non-static member declared as auto". What does cl::make_kernel actually return, and how can I store it as a private variable?
struct OCLData
{
cl::Device device;
cl::Context context;
cl::CommandQueue queue;
cl::Program program;
auto kernelTest; (PROBLEM)
const char *kernelTestSource = MULTILINE(
__kernel void kernelTest(const int N, __global float* A, __global float* B, __global float* C)
{
int i = get_global_id(0);
int j = get_global_id(1);
}
);
OCLData(){
try{
// Set Device
cl_uint deviceIndex = 0;
std::vector<cl::Device> devices;
unsigned numDevices = getDeviceList(devices);
if (deviceIndex >= numDevices)
{
std::cout << "Invalid device index (try '--list')\n";
return;
}
this->device = devices[deviceIndex];
// Set Context and Queue
std::vector<cl::Device> chosen_device;
chosen_device.push_back(device);
this->context = cl::Context(chosen_device);
this->queue = cl::CommandQueue(context, device);
// Print Device Name
std::string name;
getDeviceName(this->device, name);
std::cout << "\nUsing OpenCL device: " << name << "\n";
// Compile GPU Code
this->program = cl::Program(this->context, this->kernelTestSource, true);
//auto kernel = cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer>(this->program, "kernelTest");
this->test = cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer>(this->program, "kernelTest");
//cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer> naive_mmul(this->program, "kernelTest");
std::cout << "GPU Code Compiled: " << "\n";
} catch (cl::Error err)
{
std::cout << "Exception\n";
std::cerr << "ERROR: "
<< err.what()
<< "("
<< err_code(err.err())
<< ")"
<< std::endl;
}
}
};
cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer> creates an object of that type.
According to the C++11 standard, a class member declared auto must also be static const, which means it must be initialized in place; however, quite a bit of code has to run before the cl::make_kernel<...> object can be created.
In this case you can use std::shared_ptr as a member type of the struct:
std::shared_ptr<cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer>> kernelTest;
and then later in the code:
this->kernelTest.reset(new cl::make_kernel<int, cl::Buffer, cl::Buffer, cl::Buffer>(this->program, "kernelTest"));
I did this as suggested by a friend:
typedef cl::make_kernel <float, cl::Buffer&> kernelTestType;
std::function<kernelTestType::type_> kernelTest;
this->kernelTest = kernelTestType(this->program, "kernelTest");
Looks like this works
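Both suggestions follow the same pattern: give the member a concrete type that can start out empty, and fill it in once the setup code has run. The sketch below shows that pattern with a stand-in functor instead of cl::make_kernel, so it compiles without the OpenCL headers. ScaleKernel and Holder are hypothetical names, and the call signature int(int) is only illustrative.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for the kernel functor: it has no default
// constructor and is callable, which is all that matters for the pattern.
class ScaleKernel {
public:
    ScaleKernel(const std::string& name, int factor) : name_(name), factor_(factor) {}
    int operator()(int x) const { return x * factor_; }
private:
    std::string name_;
    int factor_;
};

struct Holder {
    // Option 1: a smart pointer starts out empty and is filled in later.
    std::shared_ptr<ScaleKernel> viaPointer;
    // Option 2: std::function starts out empty and stores a copy of the functor.
    std::function<int(int)> viaFunction;

    Holder() {
        // ... device, context and program setup would run here first ...
        viaPointer = std::make_shared<ScaleKernel>("kernelTest", 3);
        viaFunction = ScaleKernel("kernelTest", 3);
    }
};
```

The pointer version is called as `(*holder.viaPointer)(2)` and the std::function version as `holder.viaFunction(2)`. The second hides the functor type behind a call signature, which is what the typedef with type_ in the post above does.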
