OpenCL program on amazon EC2 hanging at the end - amazon-ec2

I have an OpenCL C++ program that I'm running on an NVIDIA GPU-enabled EC2 instance. Quite often the program hangs at the very end, and I have to stop the instance to get out of it.
Here is the end of my code. Is there something that I should be doing differently?
/* Finalization */
ret = clFlush(command_queue);
printf("clFlush: %s\n", getErrorString(ret));
ret = clFinish(command_queue);
printf("clFinish: %s\n", getErrorString(ret));
ret = clReleaseMemObject(ymobj);
printf("clReleaseMemObject: %s\n", getErrorString(ret));
ret = clReleaseProgram(program);
printf("clReleaseProgram: %s\n", getErrorString(ret));
ret = clReleaseKernel(kernel);
printf("clReleaseKernel: %s\n", getErrorString(ret));
ret = clReleaseCommandQueue(command_queue);
printf("clReleaseCommandQueue: %s\n", getErrorString(ret));
ret = clReleaseContext(context);
printf("clReleaseContext: %s\n", getErrorString(ret));
free(source_str);
free(y);
exit(0);
Typically I can see the output printed right after clFinish, but nothing after that.

Related

pid_task() causes kernel panic, linux kernel 5.4

I am trying to send a signal from the kernel space to the user space.
I have the below code and I am seeing a kernel panic.
[ 5230.132362] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: prog_irq_handler+0x1d4/0x2cc [prog_mon]
[ 5230.146795] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: prog_irq_handler+0x1d4/0x2cc
After some more debugging, I found that the panic originates from the call below:
t = pid_task(find_pid_ns(id, &init_pid_ns), PIDTYPE_PID);
Dereferencing the value of "t" seems to cause an exception, resulting in a kernel panic.
Is there any known issue with pid_task() in kernel 5.4?
Any help will be appreciated.
My kernel comes from yocto, and it is:
branch ti-linux-5.4.y
commit 6f3bf13d53820fc12432d7052744be2ee046fc92 (HEAD -> ti-linux-5.4.y)
Merge: d2f658ed506d d5ef1ab82339
Author: LCPD Auto Merger lcpd_integration#list.ti.com
Date: Fri Apr 3 10:50:48 2020 -0500
https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/
Full code below:
send_signal(int val, int id, int sig)
{
struct kernel_siginfo info;
struct task_struct *t;
int ret;
ret = 0;
if ((id > 0) && (sig > 0)) {
memset(&info, 0, sizeof(struct siginfo));
info.si_signo = sig;
/* Using SI_KERNEL here results in real_time data not getting delivered to the user space signal handler */
info.si_code = SI_QUEUE;
/* Real time signals may have 32 bits of data */
info.si_int = val;
info._sifields._rt._sigval.sival_int = val;
info.si_errno = 0;
rcu_read_lock();
t = pid_task(find_pid_ns(id, &init_pid_ns), PIDTYPE_PID);
if(t == NULL) {
printk(KERN_ERR "%s: Invalid user handler PID %d\n", module_name, id);
rcu_read_unlock();
return -ENODEV;
}
ret = send_sig_info(sig, &info, t);
rcu_read_unlock();
if (ret < 0)
printk(KERN_INFO "%s: Failed to signal with data %d to user space\n", module_name, val);
}
return ret;
}
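One thing worth checking in the code above, offered as a hedged observation rather than a confirmed diagnosis: info is declared as struct kernel_siginfo, but the memset length is sizeof(struct siginfo). In 5.4 these two types differ in size (struct siginfo is padded out to 128 bytes, struct kernel_siginfo is not), so that memset can write past the end of info on the stack, which would be consistent with the stack-protector report. The user-space sketch below illustrates the mismatch with two made-up structs; small_info and padded_info are stand-ins invented for this example, not the kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the kernel types: a compact struct and a
 * padded variant of it, mimicking kernel_siginfo vs. siginfo. */
struct small_info {
    int signo;
    int err_no;
    int code;
    int value;
};

struct padded_info {
    struct small_info fields;
    char pad[128 - sizeof(struct small_info)];  /* padded out to 128 bytes */
};

/* Number of bytes a memset sized by the padded type would write past the
 * end of a small_info object. */
size_t overrun_bytes(void)
{
    return sizeof(struct padded_info) - sizeof(struct small_info);
}

/* Safer idiom: size the memset by the object itself, so the length stays
 * correct even if the declared type changes later. */
void clear_info(struct small_info *info)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
}
```

In the module, the equivalent change would be memset(&info, 0, sizeof(info)).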

PRINTER_INFO_6 not available using Win32 API

I have been trying to get the printer status from a DNP rx1 printer, but the status does not change when I open the printer's tray. Here is an example that uses the pywin32 library to read the status; it always returns status = 0, even when the tray is open.
device_name = win32print.GetDefaultPrinter()
handle = win32print.OpenPrinter(device_name)
# Get the default properties for the printer
properties = win32print.GetPrinter(handle, 2)
When I try win32print.GetPrinter(handle, 6) (6 = PRINTER_INFO_6), I get a NotImplementedException. My guess is that the printer's firmware has not implemented PRINTER_INFO_6, so I can't get the status from the printer.
I have also tried using powershell with:
Get-Printer | Select Name, PrinterStatus
Also no change in status when I open the tray or if there is a paper jam.
Is there anything that I'm overlooking? Is there anything else I can try to get the status of the printer?
PRINTER_INFO_6 works for me in C++ on Windows 10 1903 with OneNote printer.
And when I pause the printer I get status 0x00000001 (PRINTER_STATUS_PAUSED).
Here is the C++ code I used for testing:
#include <windows.h>
#include <iostream>
#include <cstdio>
#include <cstdlib>
#pragma comment(lib, "Winspool")
int main()
{
DWORD bufSize;
WCHAR* buf = NULL;
HANDLE hPrinter = NULL;
PRINTER_INFO_6 info = {};
DWORD needed;
BOOL result = FALSE;
DWORD err;
// Get required buffer size
result = GetDefaultPrinter(NULL, &bufSize);
if(!result)
{
err = GetLastError();
if (ERROR_INSUFFICIENT_BUFFER != err)
{
std::cout << "GetDefaultPrinter failed with error: \n" << GetLastError();
return 0;
}
}
buf = (WCHAR*)calloc(bufSize, sizeof(WCHAR));
result = GetDefaultPrinter(buf, &bufSize);
if (!result)
{
std::cout << "GetDefaultPrinter failed with error: \n" << GetLastError();
return 0;
}
std::wcout << "Printer name: " << buf << "\n";
result = OpenPrinter(buf, &hPrinter, NULL);
if (!result)
{
std::cout << "OpenPrinter failed with error: \n" << GetLastError();
return 0;
}
result = GetPrinter(hPrinter, 6, (LPBYTE)&info, sizeof(PRINTER_INFO_6), &needed);
if (!result)
{
err = GetLastError();
if (ERROR_INSUFFICIENT_BUFFER != err)
{
std::cout << "GetPrinter failed with error: \n" << GetLastError();
return 0;
}
}
BYTE* statBuf = (BYTE*)calloc(needed, sizeof(BYTE));
result = GetPrinter(hPrinter, 6, statBuf, needed, &needed);
if (!result)
{
std::cout << "GetPrinter failed with error: \n" << GetLastError();
return 0;
}
std::cout << "Printer status (low 32bit): " << *((DWORD*)statBuf) << "\n";
statBuf += sizeof(DWORD);
std::cout << "Printer status (high 32bit): " << *((DWORD*)statBuf) << "\n";
getchar();
}
Some issues I found in testing:
The printer status is defined as a DWORD (4 bytes) in the PRINTER_INFO_6 structure, but GetPrinter requires 8 bytes for it (needed == 8). So you will get ERROR_INSUFFICIENT_BUFFER when you pass a PRINTER_INFO_6 structure as the pPrinter parameter.
Only PRINTER_INFO_6 is defined; the documentation mentions no _PRINTER_INFO_6W (Unicode) or _PRINTER_INFO_6A (ANSI) variant.
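The status value returned above is a bitmask, so several conditions can be reported at once. Below is a minimal, portable sketch of decoding it. The ST_* constants are local copies of a few PRINTER_STATUS_* values from winspool.h, and describe_status is a hypothetical helper written for this example, not a Win32 API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of a few PRINTER_STATUS_* flag values from winspool.h,
 * redefined with an ST_ prefix so the sketch builds on any platform. */
#define ST_PAUSED     0x00000001u
#define ST_ERROR      0x00000002u
#define ST_PAPER_JAM  0x00000008u
#define ST_PAPER_OUT  0x00000010u
#define ST_OFFLINE    0x00000080u
#define ST_DOOR_OPEN  0x00400000u

static const struct { uint32_t bit; const char *name; } FLAGS[] = {
    { ST_PAUSED,    "PAUSED"    },
    { ST_ERROR,     "ERROR"     },
    { ST_PAPER_JAM, "PAPER_JAM" },
    { ST_PAPER_OUT, "PAPER_OUT" },
    { ST_OFFLINE,   "OFFLINE"   },
    { ST_DOOR_OPEN, "DOOR_OPEN" },
};

/* Write a comma-separated list of the known flags set in `status` to `out`.
 * Writes "NONE" for status 0 and "OTHER" if only unlisted bits are set. */
void describe_status(uint32_t status, char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(FLAGS) / sizeof(FLAGS[0]); i++) {
        if (!(status & FLAGS[i].bit))
            continue;
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? "," : "", FLAGS[i].name);
        if (n < 0 || (size_t)n >= out_size - used)
            return;                 /* output truncated; stop here */
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(out, out_size, "%s", status ? "OTHER" : "NONE");
}
```

Passing the low 32 bits printed by the test program (for example 0x00000001) would give "PAUSED".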

on PowerPC P2040/E500mc the LD instruction EA causing kernel panic during PCI card pull out

Everything I have read so far indicates that accessing PCI address space during a card pull-out causes a kernel panic unless it is handled in the kernel's machine_check_handler. The machine_check_handler for e500mc looks for the EA (Effective Address) of the instruction in the MCSRR0 register and compares it against the PCI address space. In my case this address was not in PCI address space, so the machine check interrupt handler could not handle it and the kernel eventually panicked; the CPU had stored some bad address in MCSRR0.
Although the GPRs all point to PCI BAR addresses from previous CPU instructions, the Effective Address stored in MCSRR0 is the same invalid physical address that the NIP points to.
MCSRR1 holds the machine state (MSR) at the point of the interrupt and shows the LD|GLD bits set along with the MCSRR1[RI] bit, so it's a recoverable synchronous interrupt.
Since the access was to an external hot-plugged device, there is no need to crash the system when the device is absent; hence the kernel check and the safe return from the interrupt.
I have a few questions regarding this issue:
Which GPRs are used to determine the effective address of the LD instruction? (The LD bit is set in the MCSR register.) How do I tell which addressing mode was used to generate the effective address for the LD instruction?
The LD instruction uses rD, rA and rB operands. How do I find out which EA calculation mode the processor is using? Apparently there are four of them. Also, which GPRs does each of these operands refer to? I couldn't figure it out from the E500MCRM or the PowerPC EREF.
Since we are writing to PCI address space from user space, the PCI device registers are mapped into the process's virtual address space, and that is where we write. This is a non-cached mapping as far as I know.
Does the CPU's translation to the PCI device's physical address produce a bad address because the PCI device is no longer connected? My assumption was that, since the device is no longer present, the effective address returned was some junk value that caused this kernel panic. I am not sure if that's how the CPU works.
Any suggestions that help my understanding are welcome. This is well beyond my expertise. I have gone through the E500MCRM, the P2040RM and the PowerPC EREF, but I cannot figure out why I am getting a bad address instead of a PCI physical address as the effective address.
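On the addressing-mode questions, a hedged sketch of how the Power ISA encodes loads: the register numbers are fields of the 32-bit instruction word itself. D-form loads (lwz, lbz, lhz and their update variants) compute EA = (RA|0) + sign-extended D, while X-form loads (primary opcode 31, e.g. lwzx) compute EA = (RA|0) + (RB). The helper names below, including load_ea, are made up for illustration, and the caller is assumed to have already checked that the word is a load. Also note that in the crash log inst prints the same value as regs->nip (0x0f6fec68), which may mean the handler fetched the saved NIP value rather than the instruction word stored at that address; that seems worth ruling out before decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field extraction for a 32-bit Power ISA instruction word. */
static inline uint32_t op_of(uint32_t inst) { return inst >> 26; }
static inline uint32_t rt_of(uint32_t inst) { return (inst >> 21) & 0x1f; }
static inline uint32_t ra_of(uint32_t inst) { return (inst >> 16) & 0x1f; }
static inline uint32_t rb_of(uint32_t inst) { return (inst >> 11) & 0x1f; }
/* 16-bit displacement of a D-form instruction, sign-extended. */
static inline int32_t d_of(uint32_t inst) { return (int16_t)(inst & 0xffff); }

/* Effective address of a load on a 32-bit core (hypothetical helper):
 *   X-form (primary opcode 31): EA = (RA|0) + (RB)
 *   D-form (lwz, lbz, lhz, ...): EA = (RA|0) + EXTS(D)
 * "(RA|0)" means: use 0 instead of the register when the RA field is 0.
 * gpr[] stands for a saved register array such as regs->gpr. */
uint32_t load_ea(uint32_t inst, const uint32_t gpr[32])
{
    assert(gpr != NULL);
    uint32_t base = ra_of(inst) ? gpr[ra_of(inst)] : 0;

    if (op_of(inst) == 31)
        return base + gpr[rb_of(inst)];
    return base + (uint32_t)d_of(inst);
}
```

For example, lwz r3,8(r4) encodes as 0x80640008: primary opcode 32, RT = 3, RA = 4, D = 8. With r4 = 0xa0000000 the sketch yields an EA of 0xa0000008.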
kernel - crash dump
fujitsu:~$ fsl_pci_mcheck_exception-> SPRN_MCAR: 0x0
fsl_pci_mcheck_exception-> SPRN_MCSRR0: 0x0f6fec68
fsl_pci_mcheck_exception-> SPRN_MCSRR1: 0x2d002
fsl_pci_mcheck_exception-> SPRN_MCAR: 0x0
fsl_pci_mcheck_exception-> SPRN_DEAR: 0x0
fsl_pci_mcheck_exception-> current->pid: [8333]
fsl_pci_mcheck_exception-> after __get_user_inatomic(inst, &regs->nip): 0x0f6fec68(inst), 0x0f6fec68(regs->nip), 0x0(ret)
Machine check in kernel mode.
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
PREEMPT SMP NR_CPUS=4 P2041 RDB
Modules linked in: i2cBridge(O) interruptDriver_pb(O) cma_alloc(O) hwtp_drv(O) interruptDriver_wdt(O)
NIP: 0f6fec68 LR: 0f6fec4c CTR: 0f6faad4
REGS: e4ec5f10 TRAP: 0204 Tainted: G O (3.8.13-rt9+)
MSR: 0002d002 <CE,EE,PR,ME> CR: 40044442 XER: 20000000
TASK = e57dc020[8333] 'RxManager' THREAD: e4ec4000 CPU: 3
GPR00: 0f6fec4c 52afea90 52b06910 50400000 52afeb50 00000003 a0105210 52afebfc
GPR08: a1ffffff a0000000 0000000c a0000000 20044448 1032e800 52900000 00000006
GPR16: 0f74f434 0f729d20 135a78a0 00200000 0fe28280 52aff4b0 00000000 0fe2a6c8
GPR24: 52afec98 0f6cd268 135a7630 00105210 52afebfc 50400000 0f71d31c 00000003
NIP [0f6fec68] 0xf6fec68
LR [0f6fec4c] 0xf6fec4c
Call Trace:
---[ end trace 2715d0da39427f69 ]---
Here's the code from fsl_pci.c that gets called from machine_check_handler:
#ifdef CONFIG_E500
static int mcheck_handle_load(struct pt_regs *regs, u32 inst)
{
unsigned int rd, ra, rb, d;
rd = get_rt(inst);
ra = get_ra(inst);
rb = get_rb(inst);
d = get_d(inst);
printk(KERN_EMERG "%s==> rd==0x%x, ra=0x%x, rb=0x%x, d=0x%x\n", __FUNCTION__, rd, ra, rb, d);
printk(KERN_EMERG "%s==> get_op(inst) = 0x%x\n", __FUNCTION__, get_op(inst));
return 1;
switch (get_op(inst)) {
case 31:
switch (get_xop(inst)) {
case OP_31_XOP_LWZX:
case OP_31_XOP_LWBRX:
regs->gpr[rd] = 0xffffffff;
break;
case OP_31_XOP_LWZUX:
regs->gpr[rd] = 0xffffffff;
regs->gpr[ra] += regs->gpr[rb];
break;
case OP_31_XOP_LBZX:
regs->gpr[rd] = 0xff;
break;
case OP_31_XOP_LBZUX:
regs->gpr[rd] = 0xff;
regs->gpr[ra] += regs->gpr[rb];
break;
case OP_31_XOP_LHZX:
case OP_31_XOP_LHBRX:
regs->gpr[rd] = 0xffff;
break;
case OP_31_XOP_LHZUX:
regs->gpr[rd] = 0xffff;
regs->gpr[ra] += regs->gpr[rb];
break;
default:
return 0;
}
break;
case OP_LWZ:
regs->gpr[rd] = 0xffffffff;
break;
case OP_LWZU:
regs->gpr[rd] = 0xffffffff;
regs->gpr[ra] += (s16)d;
break;
case OP_LBZ:
regs->gpr[rd] = 0xff;
break;
case OP_LBZU:
regs->gpr[rd] = 0xff;
regs->gpr[ra] += (s16)d;
break;
case OP_LHZ:
regs->gpr[rd] = 0xffff;
break;
case OP_LHZU:
regs->gpr[rd] = 0xffff;
regs->gpr[ra] += (s16)d;
break;
default:
return 0;
}
return 1;
}
static int is_in_pci_mem_space(phys_addr_t addr)
{
struct pci_controller *hose;
struct resource *res;
int i;
list_for_each_entry(hose, &hose_list, list_node) {
if (!(hose->indirect_type & PPC_INDIRECT_TYPE_EXT_REG))
continue;
for (i = 0; i < 3; i++) {
res = &hose->mem_resources[i];
if ((res->flags & IORESOURCE_MEM) &&
addr >= res->start && addr <= res->end)
printk(KERN_EMERG "%s ==> returning from checking addresses\n", __FUNCTION__);
return 1;
}
}
printk(KERN_EMERG "%s ==> returning without checking addresses\n", __FUNCTION__);
return 1;
}
int fsl_pci_mcheck_exception(struct pt_regs *regs)
{
u32 inst;
int ret;
phys_addr_t addr = 0;
/* Let KVM/QEMU deal with the exception */
if (regs->msr & MSR_GS)
return 0;
#ifdef CONFIG_PHYS_64BIT
addr = mfspr(SPRN_MCARU);
addr <<= 32;
#endif
addr += mfspr(SPRN_MCSRR0);
printk(KERN_EMERG "%s-> SPRN_MCAR: 0x%x\n", __FUNCTION__, addr);
printk(KERN_EMERG "%s-> SPRN_MCSRR0: 0x%x\n", __FUNCTION__, mfspr(SPRN_MCSRR0));
printk(KERN_EMERG "%s-> SPRN_MCSRR1: 0x%x\n", __FUNCTION__, mfspr(SPRN_MCSRR1));
printk(KERN_EMERG "%s-> current->pid: 0x%x\n", __FUNCTION__, current->pid);
#ifdef CONFIG_E500
if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
printk(KERN_EMERG "%s-> SPRN_DEAR: 0x%x\n", __FUNCTION__, addr);
#endif
printk(KERN_EMERG "%s-> before get_user: 0x%x, 0x%x\n", __FUNCTION__, regs->nip, inst);
if (is_in_pci_mem_space(addr)) {
if (user_mode(regs)) {
pagefault_disable();
/* I am using __get_user_inatomic to get the instruction from the user
space as any other get_user versions were resulting in -EFAULT as they can
sleep and this needs to be called from user context and we are in interrupt
context.
*/
ret = __get_user_inatomic(inst, &regs->nip);
pagefault_enable();
} else {
ret = probe_kernel_address(regs->nip, inst);
}
printk(KERN_EMERG "%s-> after get_user: 0x%x, 0x%x, 0x%d\n", __FUNCTION__, regs->nip, inst, ret);
if (mcheck_handle_load(regs, inst)) {
regs->nip += 4;
printk(KERN_EMERG "%s-> after mcheck_handle load: 0x%x, 0x%x\n", __FUNCTION__, regs->nip, inst);
return 1;
}
}
return 0;
}
#endif
Here's the code I added to fix the kernel panic. It looks like regs->gpr[0] is the destination of the LD instruction, and incrementing the instruction pointer made the return from interrupt context clean. I still need to verify that the interrupt really originated from an access to a PCI device address. Right now I have commented out the PCI address range check, and without it I can access any address without crashing the system, which is even worse.
Yes, even a null pointer access doesn't crash the system anymore. I tried it with devmem2 and accessed a null pointer: the call goes through the interrupt and returns safely after dumping the logs from the interrupt handler.
regs->gpr[0] = 0xffffffff;
regs->nip += 4;
return 1;
if (mcheck_handle_load(regs, inst)) {

Using libssh library, I have a trouble with ssh_scp_read function

I'm trying to download a file from a server (Linux) to my host (Windows).
The other functions seem to work well, but when I read the file from the server, the data I get is incomplete.
For example:
server text file:
A
B
C
D
E
Reading file and saving to buffer:
A
B
C
I tested with a small text file and it worked. The problem appears now that the text file is bigger than before.
Here is my code.
int scp_receive(ssh_session session)
{
ssh_scp scp;
int rc;
int size, mode;
char *filename;
char *buffer;
int fd;
scp = ssh_scp_new(session, SSH_SCP_READ, "/home/abc/27");
if (scp == NULL)
{
fprintf(stderr, "Error allocating scp session: %s\n",
ssh_get_error(session)); return SSH_ERROR;
}
rc = ssh_scp_init(scp);
if (rc != SSH_OK)
{
fprintf(stderr, "Error initializing scp session: %s\n",ssh_get_error(session));
ssh_scp_free(scp);
return rc;
}
rc = ssh_scp_pull_request(scp);
if (rc != SSH_SCP_REQUEST_NEWFILE)
{
fprintf(stderr, "Error receiving information about file: %s\n",ssh_get_error(session));
return SSH_ERROR;
}
size = ssh_scp_request_get_size(scp);
std::cout << "size is:" << size << std::endl;
filename = strdup(ssh_scp_request_get_filename(scp));
mode = ssh_scp_request_get_permissions(scp);
printf("Receiving file %s, size %d, permisssions 0%o\n", filename, size, mode);
free(filename);
buffer = (char *)malloc(sizeof(char)*size);
if (buffer == NULL)
{
fprintf(stderr, "Memory allocation error\n");
return SSH_ERROR;
}
ssh_scp_accept_request(scp);
rc = ssh_scp_read(scp, buffer, sizeof(char)*size);
if(rc == SSH_ERROR)
{
fprintf(stderr, "Error receiving file data: %s\n", ssh_get_error(session));
free(buffer);
return rc;
}
std::cout << buffer << std::endl;
printf("Done\n");
//write(fd, buffer, size);
char path[30];
sprintf(path, "C:/Check/%s", filename);
if (access(path, F_OK) != 0) {
//TODO: file
remove(path);
}
if(0 < (fd=open(path, O_RDWR | O_CREAT | O_TRUNC, mode)))
{
write(fd, buffer, sizeof(char)*size);
close(fd);
}
else
{
std::cerr << "failed to open" << std::endl;
}
free(buffer);
//rc = ssh_scp_pull_request(scp);
//if(rc != SSH_SCP_REQUEST_EOF)
//{
// fprintf(stderr, "Unexpected request: %s\n",ssh_get_error(session));
// return SSH_ERROR;
//}
ssh_scp_close(scp);
ssh_scp_free(scp);
return SSH_OK;
}
The issue here is with the libssh read function. It reads at most 65536 bytes per call and won't return anything larger than that block size.
If you take a look at the libssh SCP read function here: http://xtalopt.openmolecules.net/wiki/index.fcgi/browser/src/libssh/scp.c?rev=2b0288492ad2481ee8bdbb8c1f9d5c453a044eee
You will see the size is limited to 65536.
SOLUTION
In your application, read the data in multiple smaller chunks until you have read the whole file. Since you already have the size of the file, this is easy to do with a loop.
I know this is a late answer, but hope this can help someone else having this issue.
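The loop described in the solution could look roughly like the sketch below. To keep it self-contained it does not call libssh: read_fn is a hypothetical callback type standing in for ssh_scp_read(scp, dst, len), which returns the number of bytes read or SSH_ERROR, and mock_read is a made-up source that hands out at most 65536 bytes per call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: returns the number of bytes read (> 0),
 * 0 when no more data is available, or a negative value on error. */
typedef int (*read_fn)(void *ctx, void *dst, size_t len);

/* Keep calling read_chunk until `total` bytes have arrived.
 * Returns the number of bytes actually read, or -1 on error. */
long read_fully(read_fn read_chunk, void *ctx, char *buf, size_t total)
{
    size_t done = 0;

    assert(read_chunk != NULL && (buf != NULL || total == 0));
    while (done < total) {
        int n = read_chunk(ctx, buf + done, total - done);
        if (n < 0)
            return -1;          /* propagate the read error */
        if (n == 0)
            break;              /* source ended early */
        done += (size_t)n;      /* advance past the chunk just read */
    }
    return (long)done;
}

/* Mock data source for demonstration only. */
struct mock_src { const char *data; size_t size; size_t pos; };

/* Copies at most 65536 bytes per call, like the limit described above. */
int mock_read(void *ctx, void *dst, size_t len)
{
    struct mock_src *s = ctx;
    size_t left = s->size - s->pos;
    size_t n = len < left ? len : left;

    if (n > 65536)
        n = 65536;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```

In scp_receive, the single ssh_scp_read call would become the body of such a loop, passing buffer + done and size - done on each iteration.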

Xcode 4.2 on Lion OpenCL continually pausing with EXC_BAD_ACCESS

So I am running some simple Hello World OpenCL code in Xcode 4.1 on Lion and it continually breaks at clEnqueueTask. The same thing happens when I run the source from the MacResearch.org OpenCL tutorials, which breaks at clEnqueueNDRangeKernel. lldb gives code 1, address 0x30.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <OpenCL/opencl.h>
#define MEM_SIZE (128)
#define MAX_SOURCE_SIZE (0x100000)
int main ()
{
char *program_source = "\n"\
"__kernel void hello(__global char* string) \n"\
"{ \n"\
" string[0] = 'H'; \n"\
" string[1] = 'e'; \n"\
" string[2] = 'l'; \n"\
" string[3] = 'l'; \n"\
" string[4] = 'o'; \n"\
" string[5] = ','; \n"\
" string[6] = ' '; \n"\
" string[7] = 'w'; \n"\
" string[8] = 'o'; \n"\
" string[9] = 'r'; \n"\
" string[10] = 'l'; \n"\
" string[11] = 'd'; \n"\
" string[12] = '!'; \n"\
" string[13] = '\0'; \n"\
"} \n"\
"\n";
size_t source_size = sizeof(program_source);
cl_device_id device_id = NULL;
cl_context context = NULL;
cl_command_queue command_queue = NULL;
cl_mem memobj = NULL;
cl_program program = NULL;
cl_kernel kernel = NULL;
cl_platform_id platform_id = NULL;
cl_uint ret_num_devices;
cl_uint ret_num_platforms;
cl_int ret;
char string[MEM_SIZE];
// get platform and device information
ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &ret_num_devices);
cl_int err = 0;
size_t returned_size = 0;
size_t buffer_size;
// Get some information about the returned device
cl_char vendor_name[1024] = {0};
cl_char device_name[1024] = {0};
err = clGetDeviceInfo(device_id, CL_DEVICE_VENDOR, sizeof(vendor_name), vendor_name, &returned_size);
err |= clGetDeviceInfo(device_id, CL_DEVICE_NAME, sizeof(device_name),device_name, &returned_size);
// assert(err == CL_SUCCESS);
printf("Connecting to %s %s...\n", vendor_name, device_name);
// create OpenCL context
context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);
// create command queue
command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
// create memory buffer
memobj = clCreateBuffer(context,CL_MEM_READ_WRITE, MEM_SIZE*sizeof(char), NULL, &ret);
// create kernel program from source code
program = clCreateProgramWithSource(context, 1, (const char **)&program_source, (const size_t*)&source_size, &ret);
// build kernel program
ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
// create OpenCL Kernel
kernel = clCreateKernel(program, "hello", &ret);
// set OpenCL kernel parameters
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobj);
// Execute OpenCL kernel
ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);
// copy results from the memory buffer
ret = clEnqueueReadBuffer(command_queue, memobj, CL_TRUE, 0, MEM_SIZE*sizeof(char), string, 0, NULL, NULL);
// display results
puts(string);
// finish up
ret = clFlush(command_queue);
ret = clFinish(command_queue);
ret = clReleaseKernel(kernel);
ret = clReleaseProgram(program);
ret = clReleaseMemObject(memobj);
ret = clReleaseCommandQueue(command_queue);
ret = clReleaseContext(context);
return 0;
}
Tried using Guard Malloc, got:
GuardMalloc[OCL_HW-1453]: recording malloc stacks to disk using standard recorder
GuardMalloc[OCL_HW-1453]: Allocations will be placed on 16 byte boundaries.
GuardMalloc[OCL_HW-1453]: - Some buffer overruns may not be noticed.
GuardMalloc[OCL_HW-1453]: - Applications using vector instructions (e.g., SSE) should work.
GuardMalloc[OCL_HW-1453]: version 24.1
OCL_HW(1453) malloc: process 1423 no longer exists, stack logs deleted from /tmp/stack-logs.1423.OCL_HW.yL5f5u.index
OCL_HW(1453) malloc: stack logs being written into /tmp/stack-logs.1453.OCL_HW.pCjTNR.index
Connecting to NVIDIA GeForce GT 330M...
I had no problems with this code under Snow Leopard and Xcode 3. I made sure not to compile any .cl files by removing them from the target, and 'OpenCL.framework' is linked.
I even wiped my computer and did a clean install of Lion and Xcode, and the problem is still there. I'm pretty sure at this point it's something stupid.
Thanks a bunch
You're right -- it's something silly. You are passing an incorrect value to the fourth parameter of clCreateProgramWithSource. You should be passing the length of your source string, but you are passing the size of the pointer. You can fix it like this:
size_t source_size = strlen(program_source);
Note that I found this by checking the return value from clBuildProgram. It was -11, CL_BUILD_PROGRAM_FAILURE, which means your kernel compilation failed. Since your kernel looked fine, I did this on the command line:
CL_LOG_ERRORS=stdout ./test
Which caused the Apple OpenCL implementation to dump the compiler build log to standard output. I saw this:
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:2:1: error: unknown type name '__kerne'
__kerne
<program source>:2:8: error: expected identifier or '('
__kerne
Which made me immediately think something was up with your source code length parameter.
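The truncated '__kerne' in the log matches the pointer size: sizeof applied to a char * gives the size of the pointer (8 bytes on a 64-bit build), and the first 8 bytes of the source are the leading newline plus "__kerne". A small sketch of that effect follows; visible_source and pointer_size are hypothetical helpers written for this illustration, not OpenCL calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the first `len` bytes of `src` into `out` and NUL-terminate it,
 * mimicking what a compiler sees when handed an explicit source length. */
void visible_source(const char *src, size_t len, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL && out_size > 0);
    size_t n = strlen(src);

    if (len < n)
        n = len;                    /* the length argument cuts the text */
    if (n > out_size - 1)
        n = out_size - 1;           /* leave room for the terminator */
    memcpy(out, src, n);
    out[n] = '\0';
}

/* sizeof on a char* yields the pointer size, not the string length. */
size_t pointer_size(const char *s)
{
    return sizeof(s);
}
```

With a length of 8, a source that starts with "\n__kernel void hello(...)" is cut down to "\n__kerne", which is what the build log shows on line 2.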
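Also note that, once the length comes from strlen, the '\0' inside the host-side string literal becomes a problem: it is an actual NUL byte, so strlen stops there and the kernel source is cut off at that point. You need to change this in your kernel: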
string[13] = '\0';
to
string[13] = 0;
After making these changes, I see this on my MacBook Pro:
Connecting to AMD ATI Radeon HD 6490M...
Hello, world!
