cudaMemPrefetchAsync() returns cudaErrorInvalidDevice - why? - debugging

Whenever I call cudaMemPrefetchAsync() it returns the error code cudaErrorInvalidDevice. I am sure that I pass the right device ID (I have only one CUDA-capable GPU in my laptop, with ID == 0).
I believe the code sample posted below is error-free, but the call to cudaMemPrefetchAsync() keeps returning this error.
What I tried:
A clean driver installation (latest version).
Searching Google for an answer, with no luck (I only managed to find this).
I have no other ideas.
System Spec:
OS: Microsoft Windows 8.1 x64 Home
IDE: Visual Studio 2015
CUDA toolkit: 8.0.61
NVIDIA GPU: GeForce GTX 960M
NVIDIA GPU driver: ver. 381.65 (latest)
Compute Capability: 5.0 (Maxwell)
Unified Memory: supported
Intel integrated GPU: Intel HD Graphics 4600
Code Sample:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- INCLUDE:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// Cuda Libs: ( Device Side ):
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
// Std C++ Libs:
#include <iostream>
#include <iomanip>
///////////
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- NAMESPACE:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
using namespace std;
///////////
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// TEST AREA:
// -- START POINT:
/////////////////////////////////////////////////////////////////////////////////////////////////////////
int main() {
    // Set CUDA device:
    if (cudaSetDevice(0) != cudaSuccess)
        cout << "ERROR: cudaSetDevice(***)" << endl;
    // Array:
    unsigned int size = 1000;
    double * d_ptr = nullptr;
    // Allocate unified memory:
    if (cudaMallocManaged(&d_ptr, size * sizeof(double), cudaMemAttachGlobal) != cudaSuccess)
        cout << "ERROR: cudaMallocManaged(***)" << endl;
    if (cudaDeviceSynchronize() != cudaSuccess)
        cout << "ERROR: cudaDeviceSynchronize(***)" << endl;
    // Prefetch to device 0:
    if (cudaMemPrefetchAsync(d_ptr, size * sizeof(double), 0) != cudaSuccess)
        cout << "ERROR: cudaMemPrefetchAsync(***)" << endl;
    // Exit:
    getchar();
}
///////////

Thanks to talonmies I realized that my GPU does not support the prefetch feature. To be able to use cudaMemPrefetchAsync(***), the GPU must report a non-zero value in the cudaDeviceProp field deviceProp.concurrentManagedAccess.
See more here.
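The support check referred to above can be done at startup with the runtime attribute query instead of fetching the whole cudaDeviceProp struct. A minimal sketch (untested on the laptop in question; on Maxwell parts like the GTX 960M, and reportedly on any Windows/WDDM device, the attribute comes back 0, which is why the prefetch call fails):

```cpp
#include <cuda_runtime.h>
#include <iostream>

int main() {
    int device = 0;
    int concurrentManagedAccess = 0;
    // Non-zero only on GPUs that can migrate managed memory on demand;
    // cudaMemPrefetchAsync() requires this attribute.
    cudaDeviceGetAttribute(&concurrentManagedAccess,
                           cudaDevAttrConcurrentManagedAccess, device);
    if (!concurrentManagedAccess) {
        std::cout << "cudaMemPrefetchAsync() not supported on device "
                  << device << std::endl;
        return 0;
    }
    // Safe to call cudaMemPrefetchAsync() from here on.
    return 0;
}
```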

Related

Use libp11 on Windows environment

I want to install everything I need in order to use libp11.
My environment and needs:
I work on Windows 10, 64-bit.
I add packages with the pacman command in my mingw32 terminal (msys64 version of 2022/09/04).
I work in the Qt Creator editor and I have a .pro file to configure my Qt project.
I want to develop a C++ module for my application which uses libp11 to get the YubiKey bin number and decrypt files.
I use a mingw32 terminal rather than mingw64 because, for the moment, the project is still being developed with Qt 4.8.
What I did:
I read the README file and followed the installation steps of the INSTALL file (first the MinGW / MSYS chapter, then the MSYS2 chapter).
I did much more, but I don't remember everything and I went down many wrong paths.
My problems and questions:
I tried to follow the examples found on GitHub.
I used the site cpp.hotexemples.com to find the second parameter for the PKCS11_CTX_load function. I found the st_engine_ctx structure in this project.
The project file:
TEMPLATE = app
CONFIG += console c++17
CONFIG -= app_bundle
CONFIG += qt
LIBS += \
-lp11 \
-lssl \
-lcrypto
SOURCES += \
TestYubikey.cpp \
main.cpp
HEADERS += \
TestYubikey.h
The header file:
#ifndef TESTYUBIKEY_H
#define TESTYUBIKEY_H
#include <libp11.h>
#include <cstdio>
#include <cstring>
#include <openssl/err.h>
#include <openssl/crypto.h>
#include <openssl/objects.h>
#include <openssl/engine.h>
#include <openssl/ui.h>
/* Engine configuration */
/* The PIN used for login. Cache for the ctx_get_pin function.
* The memory for this PIN is always owned internally,
* and may be freed as necessary. Before freeing, the PIN
* must be whitened, to prevent security holes.
*/
struct st_engine_ctx
{
char *pin = nullptr;
size_t pin_length = 0;
int verbose = 0;
char *module = nullptr;
char *init_args = nullptr;
UI_METHOD *ui_method = nullptr;
void *callback_data = nullptr;
int force_login = 0;
/* Engine initialization mutex */
#if OPENSSL_VERSION_NUMBER >= 0x10100004L && !defined(LIBRESSL_VERSION_NUMBER)
CRYPTO_RWLOCK *rwlock = nullptr;
#else
int rwlock;
#endif
/* Current operations */
PKCS11_CTX *pkcs11_ctx = nullptr;
PKCS11_SLOT *slot_list = nullptr;
unsigned int slot_count = 0;
};
class TestYubikey
{
public:
TestYubikey();
};
#endif // TESTYUBIKEY_H
The source file:
//libp11 is a wrapper library for PKCS#11 modules with OpenSSL interface
#include "TestYubikey.h"
#include <iostream>
TestYubikey::TestYubikey()
{
// Create a new libp11 context
PKCS11_CTX *ctx = PKCS11_CTX_new();
std::cout << "ctx = " << ctx << std::endl;
/* load pkcs #11 module */
int rc = PKCS11_CTX_load(ctx, "C:\\msys64\\mingw32\\lib\\engines-1_1\\pkcs11.dll"); //I test with "libpkcs11.dll" and "pkcs11.dll" too.
std::cout << "rc = " << rc << std::endl;
if (rc == -1)
{
std::cout << "Loading pkcs11 engine failed";
unsigned long error_code = ERR_get_error();
const char* error_detail = ERR_reason_error_string(error_code);
std::cout << " (" << error_code << ") : " << std::string(error_detail) << std::endl;
}
else
{
std::cout << "Loading pkcs11 engine worked !" << std::endl;
}
}
My console output shows:
11:59:27: Starting C:/Users/jgomez/Documents/build-SandBox-Desktop_Qt_4_8_7_MinGW_32_bit-Release/release/SandBox.exe...
ctx = 0x2ca8f50
rc = -1
Loading pkcs11 engine failed (0) : terminate called after throwing an instance of 'std::logic_error'
what(): basic_string: construction from null is not valid
11:59:29: C:/Users/jgomez/Documents/build-SandBox-Desktop_Qt_4_8_7_MinGW_32_bit-Release/release/SandBox.exe exited with code 3
My problem:
rc = -1
Solution:
Use the DLL called opensc-pkcs11.dll provided with the OpenSC project and it should work.
(https://github.com/OpenSC/OpenSC ; after installing, it should be found by default in C:\Program Files (x86)\OpenSC Project\OpenSC\pkcs11)
Explanation:
I encountered the same problem, and after messing around with the files I figured out that this error is caused by the PKCS11_CTX_load function.
PKCS11_CTX_load tries to load pkcs11.dll and then tries to get the address of "C_GetFunctionList" from this DLL, which fails since that DLL does not export that function.

Unable to get the physical monitor handle on Windows 7 for color calibration

I'm on Windows 7 and I'm trying to change the color balance from code. Specifically, I'm trying to change the sliders shown in the color calibration wizard.
I'm assuming that the correct functions are SetMonitorRedGreenOrBlueGain and SetMonitorRedGreenOrBlueDrive.
Here is my minimal working example:
#pragma comment(lib, "dxva2.lib")
#include <windows.h>
#include <lowlevelmonitorconfigurationapi.h>
#include <physicalmonitorenumerationapi.h>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string>
using namespace std;
int main()
{
HWND hWnd = GetDesktopWindow();
HMONITOR hMonitor = MonitorFromWindow(hWnd, MONITOR_DEFAULTTOPRIMARY);
cout << "Monitor: " << hMonitor << endl;
DWORD cPhysicalMonitors;
BOOL bSuccess = GetNumberOfPhysicalMonitorsFromHMONITOR(hMonitor, &cPhysicalMonitors);
cout << "GetNumber: " << bSuccess << ", number of physical monitors: " << cPhysicalMonitors << endl;
LPPHYSICAL_MONITOR pPhysicalMonitors = (LPPHYSICAL_MONITOR)malloc(cPhysicalMonitors * sizeof(PHYSICAL_MONITOR));
bSuccess = GetPhysicalMonitorsFromHMONITOR(hMonitor, cPhysicalMonitors, pPhysicalMonitors);
cout << "GetPhysicalMonitor: " << bSuccess << endl
<< "Handle: " << pPhysicalMonitors[0].hPhysicalMonitor << endl
<< "Description: ";
wcout << (WCHAR*)(pPhysicalMonitors[0].szPhysicalMonitorDescription);
DestroyPhysicalMonitors(cPhysicalMonitors, pPhysicalMonitors);
free(pPhysicalMonitors);
}
The output is:
Monitor: 00010001
GetNumber: 1, number of physical monitors: 1
GetPhysicalMonitor: 1
Handle: 00000000
Description: Generic PnP Monitor
All the functions for brightness and color gains require HANDLE hPhysicalMonitor which is always null for my display (laptop screen). But, I know it must be possible to change the color balance since the color calibration window can do that.
What am I doing wrong?
EDIT 1:
As mentioned in the comments, it seems that the hPhysicalMonitor is correct. My issue is that calling functions like GetMonitorBrightness returns FALSE with an error code of ERROR_GRAPHICS_I2C_ERROR_TRANSMITTING_DATA (An error occurred while transmitting data to the device on the I2C bus.)
EDIT 2:
My monitor does support setting brightness and has 11 levels. Windows itself and some programs are able to adjust the brightness (the back-light of the monitor directly). So the issue must be software related.
My issue is that calling functions like GetMonitorBrightness returns FALSE with an error code of ERROR_GRAPHICS_I2C_ERROR_TRANSMITTING_DATA (An error occurred while transmitting data to the device on the I2C bus.)
GetMonitorBrightness works for me; I tested it on a desktop PC. Some similar cases point out that GetMonitorBrightness does not work on some laptops:
GetMonitorCapabilities returns false
How to control system brightness using the Windows API?
I think your laptop does not support DDC/CI.
GetMonitorCapabilities: The function fails if the monitor does not support DDC/CI.
You may first check whether your laptop supports DDC/CI.
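A check along those lines can be sketched as follows (untested here; hPhysicalMonitor is the handle obtained via GetPhysicalMonitorsFromHMONITOR exactly as in the question's code):

```cpp
#include <windows.h>
#include <physicalmonitorenumerationapi.h>
#include <highlevelmonitorconfigurationapi.h>
#pragma comment(lib, "dxva2.lib")

// Returns true only if the monitor speaks DDC/CI and exposes brightness.
// GetMonitorCapabilities itself fails when there is no DDC/CI support,
// which is common for laptop panels driven over eDP/LVDS.
bool SupportsBrightness(HANDLE hPhysicalMonitor)
{
    DWORD caps = 0, colorTemps = 0;
    if (!GetMonitorCapabilities(hPhysicalMonitor, &caps, &colorTemps))
        return false; // no DDC/CI at all
    return (caps & MC_CAPS_BRIGHTNESS) != 0;
}
```

For laptop backlights, the usual route is WMI (the WmiMonitorBrightnessMethods class) rather than DDC/CI, which would explain why Windows itself can change the brightness while the dxva2 calls fail.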

Weird behaviour using thrust experimental::pinned_allocator in cuda

I am currently trying to remove the cumbersome cudaMallocHost/cudaFreeHost calls from my code. To do so, I would like to use only std::vector, but I absolutely need the underlying memory to be pinned CUDA memory.
However, I am seeing strange behaviour using the thrust::system::cuda::experimental::pinned_allocator<> from the Thrust library:
//STL
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
//CUDA
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>
#define SIZE 4
#define INITVAL 2
#define ENDVAL 4
//Compile using nvcc ./main.cu -o test -std=c++11
int main( int argc, char* argv[] )
{
// init host
std::vector<float,thrust::system::cuda::experimental::pinned_allocator<float> > hostVec(SIZE);
std::fill(hostVec.begin(),hostVec.end(),INITVAL);
//Init device
thrust::device_vector<float> thrustVec(hostVec.size());
//Copy
thrust::copy(hostVec.begin(), hostVec.end(), thrustVec.begin());
//std::cout << "Dereferencing values of the device, values should be "<< INITVAL << std::endl;
std::for_each(thrustVec.begin(),thrustVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;} );
std::cout << "------------------------" << std::endl;
//Do Stuff
thrust::transform( thrustVec.begin(), thrustVec.end(), thrust::make_constant_iterator(2), thrustVec.begin(), thrust::multiplies<float>() );
//std::cout << "Dereferencing values of the device, values should now be "<< ENDVAL << std::endl;
std::for_each(thrustVec.begin(),thrustVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;} );
std::cout << "------------------------" << std::endl;
//Copy back
thrust::copy(thrustVec.begin(), thrustVec.end(), hostVec.begin());
//Synchronize
//cudaDeviceSynchronize(); //makes the weird behaviour to go away
//Check result
//std::cout << "Dereferencing values on the host, values should now be "<< ENDVAL << std::endl;//Also makes the weird behaviour to go away
std::for_each(hostVec.begin(),hostVec.end(),[](float in){ std::cout <<"val is "<<in<<std::endl;} );
return EXIT_SUCCESS;
}
Which, in my setup, gives:
val is 2
val is 2
val is 2
val is 2
------------------------
val is 4
val is 4
val is 4
val is 4
------------------------
val is 2
val is 4
val is 4
val is 4
Why does the copy from device to host seem to fail? nvvp, however, shows a perfectly fine timeline with the right values for the copy.
By the way, I use NVCC/CUDA/Thrust from the 7.5 package, and gcc (GCC) 4.8.5, with a Titan X card.
Thank you in advance for your help.
This was a real bug, and the Thrust developers were already aware of it; see https://github.com/thrust/thrust/issues/775
Using the latest 1.8.3 version of Thrust from the GitHub repository solved the problem for me.

Error in compilation: undefined reference to 'clGetPlatformInfo@20'

I'm a complete newbie at OpenCL programming, and I want to run a simple program from "OpenCL Parallel Programming Development Cookbook".
In fact, I want to query the OpenCL platforms with this simple program:
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>
void displayPlatformInfo(cl_platform_id id,
cl_platform_info param_name,
const char* paramNameAsStr) {
cl_int error = 0;
size_t paramSize = 0;
error = clGetPlatformInfo( id, param_name, 0, NULL, &paramSize );
char* moreInfo = (char*)malloc( sizeof(char) * paramSize);
error = clGetPlatformInfo( id, param_name, paramSize,moreInfo, NULL );
if (error != CL_SUCCESS ) {
perror("Unable to find any OpenCL platform information");
return;
}
printf("%s: %s\n", paramNameAsStr, moreInfo);
}
int main() {
/* OpenCL 1.2 data structures */
cl_platform_id* platforms;
/* OpenCL 1.1 scalar data types */
cl_uint numOfPlatforms;
cl_int error;
/*
Get the number of platforms
Remember that for each vendor's SDK installed on the
Computer, the number of available platform also
*/
error = clGetPlatformIDs(0, NULL, &numOfPlatforms);
if(error < 0) {
perror("Unable to find any OpenCL platforms");
exit(1);
}
// Allocate memory for the number of installed platforms
platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id)
* numOfPlatforms);
// Second call actually fills the platform list
error = clGetPlatformIDs(numOfPlatforms, platforms, NULL);
printf("Number of OpenCL platforms found: %d\n",
numOfPlatforms);
// We invoke the API 'clPlatformInfo' twice for each
// parameter we're trying to extract
// and we use the return value to create temporary data
// structures (on the stack) to store
// the returned information on the second invocation.
for(cl_uint i = 0; i < numOfPlatforms; ++i) {
displayPlatformInfo( platforms[i],
CL_PLATFORM_PROFILE,
"CL_PLATFORM_PROFILE" );
displayPlatformInfo( platforms[i],
CL_PLATFORM_VERSION,
"CL_PLATFORM_VERSION" );
displayPlatformInfo( platforms[i],
CL_PLATFORM_NAME,
"CL_PLATFORM_NAME" );
displayPlatformInfo( platforms[i],
CL_PLATFORM_VENDOR,
"CL_PLATFORM_VENDOR" );
displayPlatformInfo( platforms[i],
CL_PLATFORM_EXTENSIONS,
"CL_PLATFORM_EXTENSIONS" );
}
return 0;
}
I'm on Qt Creator, and my PC's video configuration is: NVIDIA GeForce GT 635M & Intel(R) HD Graphics 4000, under Windows 8.1.
My .pro file is:
SOURCES += \
main.cpp
QMAKE_CXXFLAGS += -std=c++0x
INCLUDEPATH += \
$$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/include)
LIBS += \
$$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/x64/OpenCL.lib)
(The $$quote is there because of the spaces in the file path.) So, my question is: why, when I compile my project, does the error "undefined reference to 'clGetPlatformInfo@20'" appear? There are two other errors (one exactly the same, the other is "undefined reference to 'clGetPlatformIDs@12'").
I have searched the web for days and I can't find the answer (these problems have answers, but for Linux or Mac).
Thanks in advance!
Mathieu
It looks like you are trying to build a 32-bit application while linking with the 64-bit version of OpenCL.lib:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/x64/OpenCL.lib
So either build the application in 64-bit mode, or fix the path to point to the 32-bit version of OpenCL.lib.
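For a 32-bit MinGW kit, the LIBS line would point at the 32-bit import library instead, for example (sketch; the lib/Win32 directory name is an assumption, so check what your CUDA 6.5 install actually contains):

```
LIBS += \
    $$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/Win32/OpenCL.lib)
```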

Using OpenGL Vertex Buffer Objects with Dynamically linked OpenGL from Windows

I am working on setting up a basic OpenGL application by dynamically linking the opengl32.dll file pre-packaged with Windows (that part is non-optional). However, I am having quite a lot of difficulty getting procedure addresses for the functions related to Vertex Buffer Objects.
My initial investigation revealed that Windows only exposes the OpenGL 1.1 specification at first, and wglGetProcAddress calls need to be used to get any functions more recent than that. So I modified my code to attempt that method as well. I am using glGenBuffers as my example case, and have attempted four different ways to load it, all of which fail. I have also used glGetString to check my version number, which is reported as major version 4, so I doubt it lacks VBO support.
How should I be getting the proc addresses for these VBO functions?
A minimized example of the code I'm dealing with is here:
#include <iostream>
#include "windows.h"
using namespace std;
int main()
{
//Load openGL and get necessary functions
HINSTANCE hDLL = LoadLibrary("opengl32.dll");
PROC WINAPI(*winglGetProcAddress)(LPCSTR);
void(*genBuffers)(int, unsigned int*);
if(hDLL)
{
winglGetProcAddress = (PROC WINAPI(*)(LPCSTR))GetProcAddress(hDLL, "wglGetProcAddress");
if(winglGetProcAddress == NULL){cout << "wglGetProcAddress not found!" << endl; return 0;}
genBuffers = (void(*)(int, unsigned int*))GetProcAddress(hDLL, "glGenBuffers");
if(genBuffers == NULL){genBuffers = (void(*)(int, unsigned int*))winglGetProcAddress("glGenBuffers");}
}
else
{cout << "This application requires Open GL support." << endl; return 0;}
//glGenBuffers not supported, fallback to glGenBuffersARB
if(genBuffers == NULL)
{
genBuffers = (void(*)(int, unsigned int*))GetProcAddress(hDLL, "glGenBuffersARB");
if(genBuffers == NULL){genBuffers = (void(*)(int, unsigned int*))winglGetProcAddress("glGenBuffersARB");}
if(genBuffers == NULL)
{cout << "Could not locate glGenBuffers or glGenBuffersARB in opengl32.dll." << endl; return 0;}
}
//get a Vertex Buffer Object
unsigned int a[1];
genBuffers(1, a);
//cleanup
if(!FreeLibrary(hDLL))
{cout << "Failed to free the opengl32.dll library." << endl;}
return 0;
}
When run, it loads the library and gets wglGetProcAddress correctly, but then outputs the "Could not locate glGenBuffers or glGenBuffersARB in opengl32.dll." error, indicating it failed to get either "glGenBuffers" or "glGenBuffersARB" using either "GetProcAddress" or "wglGetProcAddress".
Alternatively, if this does mean I do not have VBO support, will a driver update help, or is it even possible to get it supported? I'd really rather not use deprecated immediate-mode calls.
I am running this in Code::Blocks, on Windows XP, Intel Core i5, with an NVIDIA GeForce GTX 460.
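One detail worth checking in the code above: wglGetProcAddress only returns valid extension pointers while an OpenGL rendering context is current on the calling thread; called with no current context (as in the minimized example, which never creates one) it returns NULL, and GetProcAddress on opengl32.dll returns NULL for any post-1.1 entry point. A hedged sketch of the required ordering (window and pixel-format setup trimmed; hwnd is assumed to be a valid window handle whose DC already had SetPixelFormat applied):

```cpp
#include <windows.h>
#include <GL/gl.h>

void loadGenBuffers(HWND hwnd)
{
    HDC dc = GetDC(hwnd);
    HGLRC rc = wglCreateContext(dc);  // needs a pixel format on dc first
    wglMakeCurrent(dc, rc);           // a context must be CURRENT here

    // Only now can post-1.1 entry points be resolved:
    void (*genBuffers)(int, unsigned int*) =
        (void (*)(int, unsigned int*))wglGetProcAddress("glGenBuffers");

    // ... use genBuffers while the context is current ...

    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(rc);
    ReleaseDC(hwnd, dc);
}
```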
