I'm quite a newbie when it comes to OpenCL programming, and I want to run a simple program from "OpenCL Parallel Programming Development Cookbook".
In fact, I want to query the OpenCL platforms with this simple program:
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>
void displayPlatformInfo(cl_platform_id id,
                         cl_platform_info param_name,
                         const char* paramNameAsStr) {
    cl_int error = 0;
    size_t paramSize = 0;
    error = clGetPlatformInfo( id, param_name, 0, NULL, &paramSize );
    char* moreInfo = (char*)malloc( sizeof(char) * paramSize);
    error = clGetPlatformInfo( id, param_name, paramSize, moreInfo, NULL );
    if (error != CL_SUCCESS ) {
        perror("Unable to find any OpenCL platform information");
        return;
    }
    printf("%s: %s\n", paramNameAsStr, moreInfo);
}
int main() {
    /* OpenCL 1.2 data structures */
    cl_platform_id* platforms;
    /* OpenCL 1.1 scalar data types */
    cl_uint numOfPlatforms;
    cl_int error;

    /*
      Get the number of platforms.
      Remember that for each vendor's SDK installed on the
      computer, the number of available platforms also increases.
    */
    error = clGetPlatformIDs(0, NULL, &numOfPlatforms);
    if(error < 0) {
        perror("Unable to find any OpenCL platforms");
        exit(1);
    }

    // Allocate memory for the number of installed platforms.
    // alloca(...) occupies some stack space but is
    // automatically freed on return
    platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id)
                                         * numOfPlatforms);
    printf("Number of OpenCL platforms found: %u\n",
           numOfPlatforms);

    // We invoke the API 'clGetPlatformInfo' twice for each
    // parameter we're trying to extract
    // and we use the return value to create temporary data
    // structures (on the stack) to store
    // the returned information on the second invocation.
    for(cl_uint i = 0; i < numOfPlatforms; ++i) {
        displayPlatformInfo( platforms[i],
                             CL_PLATFORM_PROFILE,
                             "CL_PLATFORM_PROFILE" );
        displayPlatformInfo( platforms[i],
                             CL_PLATFORM_VERSION,
                             "CL_PLATFORM_VERSION" );
        displayPlatformInfo( platforms[i],
                             CL_PLATFORM_NAME,
                             "CL_PLATFORM_NAME" );
        displayPlatformInfo( platforms[i],
                             CL_PLATFORM_VENDOR,
                             "CL_PLATFORM_VENDOR" );
        displayPlatformInfo( platforms[i],
                             CL_PLATFORM_EXTENSIONS,
                             "CL_PLATFORM_EXTENSIONS" );
    }
    return 0;
}
I'm on Qt Creator, and my PC's graphics hardware is an NVIDIA GeForce GT 635M and an Intel(R) HD Graphics 4000, under Windows 8.1.
My .pro file is:
SOURCES += \
    main.cpp

QMAKE_CXXFLAGS += -std=c++0x

INCLUDEPATH += \
    $$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/include)

LIBS += \
    $$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/x64/OpenCL.lib)
I use $$quote because of the spaces in the file path. So, my question is: why, when I compile my project, do I get the error "undefined reference to `clGetPlatformInfo@20'"? There are two other errors (one is exactly the same, the other is "undefined reference to `clGetPlatformIDs@12'").
I've been searching the web for days and I can't find the answer (these problems have answers, but for Linux or Mac).
Thanks in advance!
Mathieu
It looks like you are trying to build a 32-bit application while linking against the 64-bit version of OpenCL.lib:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/x64/OpenCL.lib
So, either build the application in 64-bit mode, or fix the path to point to the 32-bit version of OpenCL.lib.
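For example, for a 32-bit build the LIBS entry could point at the 32-bit import library instead (assuming the CUDA toolkit's usual Win32 subdirectory layout; adjust the path for your installation):
LIBS += \
    $$quote(C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5/lib/Win32/OpenCL.lib)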
I have the latest Mac Pro (OS: 10.12.2) with an Intel integrated GPU, HD 530 (Gen9), which runs the OpenCL code. In my OpenCL code I use vloadx and atomic_add. I turn my OpenCL kernel code into bitcode as described in https://developer.apple.com/library/content/samplecode/OpenCLOfflineCompilation/Introduction/Intro.html#//apple_ref/doc/uid/DTS40011196-Intro-DontLinkElementID_2 and create the program with clCreateProgramWithBinary. But when I call clBuildProgram, it returns error -11, and the build log is:
error: undefined reference to `_Z6vload2mPKU3AS1h()'
undefined reference to `_Z8atom_addPVU3AS3ii()'
But on my Mac Air with an HD 5500 (Gen8), the code is OK.
Can someone tell me what I should do?
The problem here is that you cannot use binaries compiled for one device on a different, incompatible device. If you compile for Intel, you cannot use the resulting binary on AMD, for example. What you need to do is compile the code for the specific device, from source, every time.
If you do not want to keep the OpenCL code in separate files, you can put it inside your source file by stringifying it. Instead of reading a file, you pass the kernel string embedded in your host code as the kernel source. This allows you to protect your IP. However, you still need to build the code with clBuildProgram every time. You can also save the built program as a binary, so after the first run you won't degrade performance by rebuilding it every time. To give an example, let's suppose you have a kernel.cl file as follows:
__kernel void foo(__global int* in, __global int* out)
{
    int idx = get_global_id(0);
    out[idx] = in[idx] * in[idx];
}
You probably get this kernel code by reading the file with something like:
#define MAX_SOURCE_SIZE (0x100000)   /* any size large enough for the kernel source */

FILE *fp = fopen("kernel.cl", "r");
char *source_str = (char *)malloc(MAX_SOURCE_SIZE);
size_t source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
fclose(fp);

cl_program program = clCreateProgramWithSource(context, 1,
        (const char **)&source_str, (const size_t *)&source_size, &ret);
What you can do instead is something like:
const char* src = "__kernel void foo(__global int* in, __global int* out)\
{\
    int idx = get_global_id(0);\
    out[idx] = in[idx] * in[idx];\
}";

/* The string is null-terminated, so the lengths argument can be NULL. */
program = clCreateProgramWithSource(context, 1, (const char **)&src, NULL, &ret);
When you compile your C code, this string is embedded in the compiled binary, so you protect your source code.
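As a rough sketch of the binary-caching idea mentioned above (assuming a program built for a single device; error checking is omitted and the cache file name kernel.bin is arbitrary):
size_t bin_size = 0;
clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(bin_size), &bin_size, NULL);

unsigned char *binary = (unsigned char *)malloc(bin_size);
clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(binary), &binary, NULL);

FILE *f = fopen("kernel.bin", "wb");   /* arbitrary cache file name */
fwrite(binary, 1, bin_size, f);
fclose(f);
free(binary);
/* On later runs, read the file back, pass it to clCreateProgramWithBinary
   for the same device, and still call clBuildProgram on the result. */
You pay the clBuildProgram cost once on the first run; the cached binary is only valid for the same device and driver, so it is worth falling back to the source path if clCreateProgramWithBinary or the subsequent build fails.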
I have an image drawing routine which is compiled multiple times for SSE, SSE2, SSE3, SSE4.1, SSE4.2, AVX and AVX2.
My program dynamically dispatches one of these binary variations by checking CPUID flags.
On Windows, I check the version of Windows and disable AVX/AVX2 dispatch if the OS doesn't support them. (For example, only Windows 7 SP1 or later supports AVX/AVX2.)
I want to do the same thing on Mac OS X, but I'm not sure what version of OS X supports AVX/AVX2.
Note that what I want to know is the minimum version of OS X for use with AVX/AVX2. Not machine models which are capable of AVX/AVX2.
For detecting instruction set features there are two source files I reference:
Mysticial's cpu_x86.cpp
Agner Fog's instrset_detect.cpp
Both of these files show how to detect SSE through AVX2, as well as XOP, FMA3, FMA4, whether your OS supports AVX, and other features.
I am used to Agner's code (one source file for MSVC, GCC, Clang, and ICC), so let's look at that first.
Here are the relevant code fragments from instrset_detect.cpp for detecting AVX:
iset = 0; // default value
int abcd[4] = {0,0,0,0}; // cpuid results
cpuid(abcd, 0); // call cpuid function 0
//....
iset = 6; // 6: SSE4.2 supported
if ((abcd[2] & (1 << 27)) == 0) return iset; // no OSXSAVE
if ((xgetbv(0) & 6) != 6) return iset; // AVX not enabled in O.S.
if ((abcd[2] & (1 << 28)) == 0) return iset; // no AVX
iset = 7; // 7: AVX supported
with xgetbv defined as
// Define interface to xgetbv instruction
static inline int64_t xgetbv (int ctr) {
#if (defined (_MSC_FULL_VER) && _MSC_FULL_VER >= 160040000) || (defined (__INTEL_COMPILER) && __INTEL_COMPILER >= 1200)
    // Microsoft or Intel compiler supporting _xgetbv intrinsic
    return _xgetbv(ctr); // intrinsic function for XGETBV
#elif defined(__GNUC__) // use inline assembly, Gnu/AT&T syntax
    uint32_t a, d;
    __asm("xgetbv" : "=a"(a), "=d"(d) : "c"(ctr) : );
    return a | (uint64_t(d) << 32);
#else // other compiler: inline assembly with masm/intel/MS syntax
    // see the source file
#endif
}
I did not include the cpuid function (see the source code), and I removed the non-GCC inline assembly from xgetbv to make the answer shorter.
Here is the detect_OS_AVX() from Mysticial's cpu_x86.cpp for detecting AVX:
bool cpu_x86::detect_OS_AVX(){
    // Copied from: http://stackoverflow.com/a/22521619/922184
    bool avxSupported = false;

    int cpuInfo[4];
    cpuid(cpuInfo, 1);

    bool osUsesXSAVE_XRSTORE = (cpuInfo[2] & (1 << 27)) != 0;
    bool cpuAVXSuport = (cpuInfo[2] & (1 << 28)) != 0;

    if (osUsesXSAVE_XRSTORE && cpuAVXSuport)
    {
        uint64_t xcrFeatureMask = xgetbv(_XCR_XFEATURE_ENABLED_MASK);
        avxSupported = (xcrFeatureMask & 0x6) == 0x6;
    }

    return avxSupported;
}
Mysticial apparently came up with this solution from this answer.
Notice that both source files do basically the same thing: check the OSXSAVE bit 27, check the AVX bit 28 from CPUID, check a result from xgetbv.
For AVX the answer is quite straightforward:
You need at least OS X 10.6.7. Please note that only builds 10J3250 and 10J4138 of 10.6.7 support it.
For AVX2 that would be 10.8.4, build 12E3067 or 12E4022.
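If a runtime check is acceptable instead of a minimum-version rule, one possible sketch (assuming the hw.optional.avx1_0 / hw.optional.avx2_0 sysctl keys, which recent OS X releases appear to expose) is:
#include <stddef.h>
#include <sys/sysctl.h>

/* Returns 1 if the OS reports AVX2 as usable via sysctl, 0 otherwise. */
static int os_reports_avx2(void)
{
    int val = 0;
    size_t len = sizeof(val);
    if (sysctlbyname("hw.optional.avx2_0", &val, &len, NULL, 0) != 0)
        return 0;   /* key missing: assume no AVX2 */
    return val != 0;
}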
I am trying to do separate compilation using CUDA 5. For this reason I set "Generate Relocatable Device Code" to "Yes (-rdc=true)" in Visual Studio 2010. The program compiles without errors; however,
I get an invalid device symbol error when I try to initialize device constants using cudaMemcpyToSymbol.
i.e. I have the following constant
__constant__ float gdDomainOrigin[2];
and try to initialize it with
cudaMemcpyToSymbol(gdDomainOrigin, mDomainOrigin, 2*sizeof(float));
which leads to the error. The error does not occur, when I compile everything as a whole, without the aforementioned option set. Could anybody please help me with that?
I can't reproduce this. If I build an application from two .cu files, one containing a __constant__ symbol and a simple kernel, and the other containing the runtime API incantations to populate that constant memory and call the kernel, it works only when relocatable device code is enabled, viz:
__constant__ float gdDomainOrigin[2];

__global__
void kernel(float *inout)
{
    inout[0] = gdDomainOrigin[0];
    inout[1] = gdDomainOrigin[1];
}
and
#include <cstdio>

extern __constant__ float gdDomainOrigin;
extern __global__ void kernel(float *);

inline
void gpuAssert(cudaError_t code, char * file, int line, bool Abort=true)
{
    if (code != 0) {
        fprintf(stderr, "GPUassert: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        if (Abort) exit(code);
    }
}

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

int main(void)
{
    const float mDomainOrigin[2] = { 1.234f, 5.6789f };
    const size_t sz = sizeof(float) * size_t(2);
    float * dbuf, * hbuf;

    gpuErrchk( cudaFree(0) );
    gpuErrchk( cudaMemcpyToSymbol(gdDomainOrigin, mDomainOrigin, sz) );
    gpuErrchk( cudaMalloc((void **)&dbuf, sz) );

    kernel<<<1,1>>>(dbuf);
    gpuErrchk( cudaPeekAtLastError() );

    hbuf = new float[2];
    gpuErrchk( cudaMemcpy(hbuf, dbuf, sz, cudaMemcpyDeviceToHost) );
    fprintf(stdout, "%f %f\n", hbuf[0], hbuf[1]);

    return 0;
}
Compiling and running these in CUDA 5 on a 64 bit linux system with a Kepler GPU produces the following:
$ nvcc -arch=sm_30 -o shared shared.cu shared_dev.cu
$ ./shared
GPUassert: invalid device symbol shared.cu 23
$ nvcc -arch=sm_30 -rdc=true -o shared shared.cu shared_dev.cu
$ ./shared
1.234000 5.678900
You can see that in the first compilation, without relocatable GPU code generation, the symbol isn't found. In the second case, with relocatable GPU code generation, it is found, and the elf header in the object file looks just as you would expect:
$ nvcc -arch=sm_30 -rdc=true -c shared_dev.cu
$ cuobjdump -symbols shared_dev.o
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = linux
compile_size = 64bit
identifier = shared_dev.cu
symbols:
STT_SECTION STB_LOCAL .text._Z6kernelPf
STT_SECTION STB_LOCAL .nv.constant3
STT_SECTION STB_LOCAL .nv.constant0._Z6kernelPf
STT_CUDA_OBJECT STB_LOCAL _param
STT_SECTION STB_LOCAL .nv.callgraph
STT_FUNC STB_GLOBAL _Z6kernelPf
STT_CUDA_OBJECT STB_GLOBAL gdDomainOrigin
Fatbin ptx code:
================
arch = sm_30
code version = [3,1]
producer = cuda
host = linux
compile_size = 64bit
compressed
identifier = shared_dev.cu
ptxasOptions = --compile-only
Perhaps you could try my code and compilation/diagnostic steps and see what happens with your Windows toolchain.
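On Windows the command line should be essentially the same (this is my guess at the invocation; in Visual Studio, -rdc=true corresponds to the "Generate Relocatable Device Code" property, and -arch should match your GPU):
nvcc -arch=sm_30 -rdc=true -o shared.exe shared.cu shared_dev.cu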
I am working on setting up a basic OpenGL application by dynamically linking the opengl32.dll file pre-packaged with Windows (That part is non-optional). However I am having quite a lot of difficulty getting procedure addresses for the functions related to Vertex Buffer Objects.
My initial investigations have revealed that Windows only exposes the OpenGL 1.1 specification at first, and wglGetProcAddress calls need to be used to get any functions more recent than that. So I modified my code to attempt that method as well. I am using glGenBuffers as my example case, and have attempted four different ways to load it, and all fail. I have also used glGetString to check my version number, which is reported as major version 4, so I doubt it lacks VBO support.
How should I be getting the proc addresses for these VBO functions?
A minimized example of the code I'm dealing with is here:
#include <iostream>
#include "windows.h"

using namespace std;

int main()
{
    //Load openGL and get necessary functions
    HINSTANCE hDLL = LoadLibrary("opengl32.dll");
    PROC WINAPI(*winglGetProcAddress)(LPCSTR);
    void(*genBuffers)(int, unsigned int*);

    if(hDLL)
    {
        winglGetProcAddress = (PROC WINAPI(*)(LPCSTR))GetProcAddress(hDLL, "wglGetProcAddress");
        if(winglGetProcAddress == NULL){cout << "wglGetProcAddress not found!" << endl; return 0;}
        genBuffers = (void(*)(int, unsigned int*))GetProcAddress(hDLL, "glGenBuffers");
        if(genBuffers == NULL){genBuffers = (void(*)(int, unsigned int*))winglGetProcAddress("glGenBuffers");}
    }
    else
    {cout << "This application requires Open GL support." << endl; return 0;}

    //glGenBuffers not supported, fallback to glGenBuffersARB
    if(genBuffers == NULL)
    {
        genBuffers = (void(*)(int, unsigned int*))GetProcAddress(hDLL, "glGenBuffersARB");
        if(genBuffers == NULL){genBuffers = (void(*)(int, unsigned int*))winglGetProcAddress("glGenBuffersARB");}
        if(genBuffers == NULL)
        {cout << "Could not locate glGenBuffers or glGenBuffersARB in opengl32.dll." << endl; return 0;}
    }

    //get a Vertex Buffer Object
    unsigned int a[1];
    genBuffers(1, a);

    //cleanup
    if(!FreeLibrary(hDLL))
    {cout << "Failed to free the opengl32.dll library." << endl;}

    return 0;
}
When run, it loads the library and gets wglGetProcAddress correctly, but then outputs the "Could not locate glGenBuffers or glGenBuffersARB in opengl32.dll." error, indicating it failed to get either "glGenBuffers" or "glGenBuffersARB" using either "GetProcAddress" or "wglGetProcAddress".
Alternatively, if this does mean I do not have VBO support, will a driver update help, or is it even possible to get it supported? I'd really rather not use deprecated immediate mode calls.
I am running this in Code::Blocks, on Windows XP, Intel Core i5, with an NVIDIA GeForce GTX 460.
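The most likely reason the lookups fail is that wglGetProcAddress only returns usable entry points for post-1.1 functions while an OpenGL rendering context is current on the calling thread, so a context has to be created and made current before querying glGenBuffers. A minimal sketch (assuming hdc is a device context for a window whose pixel format has already been set, and that the wgl* functions are obtained the same way as above or by linking opengl32.lib):
// Create and activate a rendering context before looking up extensions.
HGLRC ctx = wglCreateContext(hdc);
wglMakeCurrent(hdc, ctx);

// With a current context, the lookup should succeed on drivers exposing GL > 1.1.
void (*genBuffers)(int, unsigned int*) =
    (void(*)(int, unsigned int*))wglGetProcAddress("glGenBuffers");

// ... use genBuffers ...

wglMakeCurrent(NULL, NULL);
wglDeleteContext(ctx);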
I have been looking for some time but have not found anywhere near sufficient documentation/examples on how to use the CryptoAPI that comes with Linux in the creation of syscalls / in kernel land.
If anyone knows of a good source, please let me know; I would like to know how to do SHA1/MD5 and Blowfish/AES within kernel space only.
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

#define SHA1_LENGTH 20

static int __init sha1_init(void)
{
    struct scatterlist sg;
    struct crypto_hash *tfm;
    struct hash_desc desc;
    unsigned char output[SHA1_LENGTH];
    unsigned char buf[10];
    int i;

    printk(KERN_INFO "sha1: %s\n", __FUNCTION__);

    memset(buf, 'A', 10);
    memset(output, 0x00, SHA1_LENGTH);

    tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
    desc.tfm = tfm;
    desc.flags = 0;

    sg_init_one(&sg, buf, 10);
    crypto_hash_init(&desc);
    crypto_hash_update(&desc, &sg, 10);
    crypto_hash_final(&desc, output);

    for (i = 0; i < 20; i++) {
        printk(KERN_ERR "%d-%d\n", output[i], i);
    }

    crypto_free_hash(tfm);
    return 0;
}

static void __exit sha1_exit(void)
{
    printk(KERN_INFO "sha1: %s\n", __FUNCTION__);
}

module_init(sha1_init);
module_exit(sha1_exit);

MODULE_LICENSE("Dual MIT/GPL");
MODULE_AUTHOR("Me");
There are a couple of places in the kernel which use the crypto module: the eCryptfs file system (linux/fs/ecryptfs/) and the 802.11 wireless stack (linux/drivers/staging/rtl8187se/ieee80211/). Both of these use AES, but you may be able to extrapolate what you find there to MD5.
Another good example is from the 2.6.18 kernel source in security/seclvl.c
Note: You can change CRYPTO_TFM_REQ_MAY_SLEEP if needed
static int
plaintext_to_sha1(unsigned char *hash, const char *plaintext, unsigned int len)
{
    struct crypto_tfm *tfm;
    struct scatterlist sg;

    if (len > PAGE_SIZE) {
        seclvl_printk(0, KERN_ERR, "Plaintext password too large (%d "
                      "characters). Largest possible is %lu "
                      "bytes.\n", len, PAGE_SIZE);
        return -EINVAL;
    }
    tfm = crypto_alloc_tfm("sha1", CRYPTO_TFM_REQ_MAY_SLEEP);
    if (tfm == NULL) {
        seclvl_printk(0, KERN_ERR,
                      "Failed to load transform for SHA1\n");
        return -EINVAL;
    }
    sg_init_one(&sg, (u8 *)plaintext, len);
    crypto_digest_init(tfm);
    crypto_digest_update(tfm, &sg, 1);
    crypto_digest_final(tfm, hash);
    crypto_free_tfm(tfm);
    return 0;
}
Cryptodev-linux
https://github.com/cryptodev-linux/cryptodev-linux
It is a kernel module that exposes the kernel crypto API to userspace through /dev/crypto.
SHA calculation example: https://github.com/cryptodev-linux/cryptodev-linux/blob/da730106c2558c8e0c8e1b1b1812d32ef9574ab7/examples/sha.c
As others have mentioned, the kernel does not seem to expose the crypto API to userspace itself, which is a shame, since the kernel can already use hardware-accelerated crypto functions internally.
Crypto operations cryptodev supports: https://github.com/nmav/cryptodev-linux/blob/383922cabeea7dca354415e8c590f8e932f4d7a8/crypto/cryptodev.h
Crypto operations Linux x86 supports: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto?id=refs/tags/v4.0
The best place to start is Documentation/crypto in the kernel sources. dm-crypt is one of the many components that probably use the kernel crypto API, and you can refer to it to get an idea about usage.
how to do SHA1 / MD5 and Blowfish / AES within the kernel space only.
Example of hashing data using a two-element scatterlist:
struct crypto_hash *tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
if (tfm == NULL)
    fail;

char *output_buf = kmalloc(crypto_hash_digestsize(tfm), GFP_KERNEL);
if (output_buf == NULL)
    fail;

struct scatterlist sg[2];
struct hash_desc desc = {.tfm = tfm};

ret = crypto_hash_init(&desc);
if (ret != 0)
    fail;

sg_init_table(sg, ARRAY_SIZE(sg));
sg_set_buf(&sg[0], "Hello", 5);
sg_set_buf(&sg[1], " World", 6);

ret = crypto_hash_digest(&desc, sg, 11, output_buf);
if (ret != 0)
    fail;
One critical note: never compare the return value of crypto_alloc_hash to NULL to detect failure.
Always use the IS_ERR function for this purpose. Comparing against NULL does not capture the error, so you get segmentation faults later on.
If IS_ERR reports failure, you possibly have a missing crypto algorithm in your kernel image (or as a module). Make sure you have selected the appropriate crypto algorithm in make menuconfig.
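For example, a small sketch of that allocation check:
struct crypto_hash *tfm;

tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm)) {
    printk(KERN_ERR "crypto_alloc_hash(sha1) failed: %ld\n", PTR_ERR(tfm));
    return PTR_ERR(tfm);
}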