MacOs OpenCL on HD 530 with clBuildProgram error(-11) - macos

I have the latest mac pro(OS:10.12.2) ,with intel intergrated GPU HD 530(Gen9) which runs the OpenCL code. In my OpenCL code, I use vloadx and atomic_add instruction. change my OpenCL kernel code into bitcode like https://developer.apple.com/library/content/samplecode/OpenCLOfflineCompilation/Introduction/Intro.html#//apple_ref/doc/uid/DTS40011196-Intro-DontLinkElementID_2
. and create the program with clCreateProgramWithBinary. But when clBuildProgram, it returns error with -11 .and build log is "
error: undefined reference to _Z6vload2mPKU3AS1h()'
undefined reference to_Z8atom_addPVU3AS3ii()'
"
But in my mac air with HD 5500(Gen8), the code is ok.
Can someone tell me what should I do?

The problem here is, you cannot use incompatible binaries in different devices. Which means if you compile for Intel, you cannot use the compiled binary for AMD for example. What you need to do is compile the code for the specific device every time from the source.
If you do not want to use the OpenCL codes in different files, what you can do is put them inside your source file by stringifying them. Instead of reading a file, you use the kernel string inside your host code to pass as the kernel string. This will allow you to protect your IP. However, everytime, you need to build the code using clBuildProgram. You can also save the built program as binary, so after the first run, you won't degrade performance by building it everytime. To give an example, lets suppose you have a kernel.cl file as following:
__kernel void foo(__global int* in, __global int* out)
{
int idx = get_global_id(0);
out[idx] = in[idx] * in[idx];
}
You probably get this kernel code by reading the file with something like:
char *source_str;
fp = fopen("kernel.cl", "r");
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
fclose(fp);
program = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
What you can do instead is something like:
const char* src = "__kernel void foo(__global int* in, __global int* out)\
{\
int idx = get_global_id(0);\
out[idx] = in[idx] * in[idx];\
}";
program = clCreateProgramWithSource(context, 1, (const char **)&src, (const size_t *)&src_size, &ret);
When you compile your C code, this string will be converted into binary, so you protect your source code.

Related

Yocto Patch Linux Kernel In-Tree-Module with extern symbol exported from Out-Of-Tree Module

I am using Yocto to build an SD Card image for my Embedded Linux Project. The Yocto branch is Warrior and the Linux kernel version is 4.19.78-linux4sam-6.2.
I am currently working on a way to read memory from an external QSPI device in the initramfs and stick the contents into a file in procfs. That part works and I echo data into the proc file and read it out successfully later in user space Linux after the board has booted.
Now I need to use the Linux Kernel module EXPORT_SYMBOL() functionality to allow an in-tree kernel module to know about my out-of-tree custom kernel module exported symbol.
In my custom module, I do this:
static unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
And I patched the official kernel build in a bitbake bbappend file with this:
diff -Naur kernel-source/drivers/net/usb/smsc95xx.c kernel-source.new/drivers/net/usb/smsc95xx.c
--- kernel-source/drivers/net/usb/smsc95xx.c 2020-08-04 22:34:02.767157368 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.c 2020-08-04 23:34:27.528435689 +0000
## -917,6 +917,27 ##
{
const u8 *mac_addr;
+ printk("=== smsc95xx_init_mac_address ===\n");
+ printk("%x:%x:%x:%x:%x:%x\n",
+ lan9730_mac_address_buffer[0],
+ lan9730_mac_address_buffer[1],
+ lan9730_mac_address_buffer[2],
+ lan9730_mac_address_buffer[3],
+ lan9730_mac_address_buffer[4],
+ lan9730_mac_address_buffer[5]);
+ printk("=== mac_addr is set ===\n");
+ if (lan9730_mac_address_buffer[0] != 0xff &&
+ lan9730_mac_address_buffer[1] != 0xff &&
+ lan9730_mac_address_buffer[2] != 0xff &&
+ lan9730_mac_address_buffer[3] != 0xff &&
+ lan9730_mac_address_buffer[4] != 0xff &&
+ lan9730_mac_address_buffer[5] != 0xff) {
+ printk("=== SUCCESS ===\n");
+ memcpy(dev->net->dev_addr, lan9730_mac_address_buffer, ETH_ALEN);
+ return;
+ }
+ printk("=== FAILURE ===\n");
+
/* maybe the boot loader passed the MAC address in devicetree */
mac_addr = of_get_mac_address(dev->udev->dev.of_node);
if (!IS_ERR(mac_addr)) {
diff -Naur kernel-source/drivers/net/usb/smsc95xx.h kernel-source.new/drivers/net/usb/smsc95xx.h
--- kernel-source/drivers/net/usb/smsc95xx.h 2020-08-04 22:32:30.824951447 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.h 2020-08-04 23:33:50.486778978 +0000
## -361,4 +361,6 ##
#define INT_ENP_TDFO_ ((u32)BIT(12)) /* TX FIFO Overrun */
#define INT_ENP_RXDF_ ((u32)BIT(11)) /* RX Dropped Frame */
+extern unsigned char lan9730_mac_address_buffer[6];
+
#endif /* _SMSC95XX_H */
However, the Problem is that the Kernel fails to build with this error:
| GEN ./Makefile
| Using /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source as source for kernel
| CALL /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source/scripts/checksyscalls.sh
| Building modules, stage 2.
| MODPOST 279 modules
| ERROR: "lan9730_mac_address_buffer" [drivers/net/usb/smsc95xx.ko] undefined!
How can I refer to the Out-Of-Tree kernel module exported symbol in a patched In-Tree kernel module?
initramfs relevant code:
msg "Inserting lan9730-mac-address.ko..."
insmod /mnt/lib/modules/4.19.78-linux4sam-6.2/extra/lan9730-mac-address.ko
ls -rlt /proc/lan9730-mac-address
head -c 6 /dev/mtdblock0 > /proc/lan9730-mac-address
Out-Of-Tree module:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
const int BUFFER_SIZE = 6;
int write_length, read_length;
unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
int read_proc(struct file *filp, char *buf, size_t count, loff_t *offp)
{
// Read bytes (returning the byte count) until all bytes are read.
// Then return count=0 to signal the end of the operation.
if (count > read_length)
count = read_length;
read_length = read_length - count;
copy_to_user(buf, lan9730_mac_address_buffer, count);
if (count == 0)
read_length = write_length;
return count;
}
int write_proc(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
if (count > BUFFER_SIZE)
count = BUFFER_SIZE;
copy_from_user(lan9730_mac_address_buffer, buf, count);
write_length = count;
read_length = count;
return count;
}
struct file_operations proc_fops = {
read: read_proc,
write: write_proc
};
void create_new_proc_entry(void) //use of void for no arguments is compulsory now
{
proc_create("lan9730-mac-address", 0, NULL, &proc_fops);
}
int proc_init (void) {
create_new_proc_entry();
memset(lan9730_mac_address_buffer, 0x00, sizeof(lan9730_mac_address_buffer));
return 0;
}
void proc_cleanup(void) {
remove_proc_entry("lan9730-mac-address", NULL);
}
MODULE_LICENSE("GPL");
module_init(proc_init);
module_exit(proc_cleanup);
There are several ways to achieve what you want (taking into account different aspects, like module can be compiled in or be a module).
Convert Out-Of-Tree module to be In-Tree one (in your custom kernel build). This will require simple export and import as you basically done and nothing special is required, just maybe providing a header with the symbol and depmod -a run after module installation. Note, you have to use modprobe in-tree which reads and satisfies dependencies.
Turn other way around, i.e. export symbol from in-tree module and file it in the out-of-tree. In this case you simply have to check if it has been filed or not (since it's a MAC address the check against all 0's will work, no additional flags needed)
BUT, these ways are simply wrong. The driver and even your patch clearly show that it supports OF (Device Tree) and your board has support of it. So, this is a first part of the solution, you may provide correct MAC to the network card using Device Tree.
In the case you want to change it runtime the procfs approach is very strange to begin with. Network device interface in Linux has all means to update MAC from user space at any time user wants to do it. Just use ip command, like /sbin/ip link set <$ETH> addr <$MACADDR>, where <$ETH> is a network interface, for example, eth0 and <$MACADDR> is a desired address to set.
So, if this question rather about module symbols, you need to find better example for it because it's really depends to use case. You may consider to read How to export symbol from Linux kernel module in this case? as an alternative way to exporting. Another possibility how to do it right is to use software nodes (it's a new concept in recent Linux kernel).

How could SSCANF provide so strange results?

I am in 4-day fight with this code:
unsigned long baudrate = 0;
unsigned char databits = 0;
unsigned char stop_bits = 0;
char parity_text[10];
char flowctrl_text[4];
const char xformat[] = "%lu,%hhu,%hhu,%[^,],%[^,]\n";
const char xtext[] = "115200,8,1,EVEN,NFC\n";
int res = sscanf(xtext, xformat, &baudrate, &databits, &stop_bits, (char*) &parity_text, (char*) &flowctrl_text);
printf("Res: %d\r\n", res);
printf("baudrate: %lu, databits: %hhu, stop: %hhu, \r\n", baudrate, databits, stop_bits);
printf("parity: %s \r\n", parity_text);
printf("flowctrl: %s \r\n", flowctrl_text);
It returns:
Res: 5
baudrate: 115200, databits: 0, stop: 1,
parity:
flowctrl: NFC
Databits and parity missing !
Actually memory under the parity variable is '\0'VEN'\0',
looks like the first characters was somehow overwritten by sscanf procedure.
Return value of sscanf is 5, which suggests, that it was able to parse the input.
My configuration:
gccarmnoneeabi 7.2.1
Visual Studio Code 1.43.2
PlatformIO Core 4.3.1
PlatformIO Home 3.1.1
Lib ST-STM 6.0.0 (Mbed 5.14.1)
STM32F446RE (Nucleo-F446RE)
I have tried (without success):
compiling with mbed RTOS and without
variable types uint8_t, uint32_t
gccarm versions: 6.3.1, 8.3.1, 9.2.1
using another IDE (CLion+PlatformIO)
compiling on another computer (same config)
What actually helps:
making the variables static
compiling in Mbed online compiler
The behavior of sscanf is as whole very unpredictable, mixing the order or datatype of variables sometimes helps, but most often ends with another flaws in the output.
This took me longer than I care to admit. But like most issues it ended up being very simple.
char parity_text[10];
char flowctrl_text[4];
Needs to be changed to:
char parity_text[10] = {0};
char flowctrl_text[5] = {0};
The flowctrl_text array is not large enough at size four to hold "EVEN" and the NULL termination. If you bump it to a size of 5 you should have no problem. Just to be safe I would also initialize the arrays to 0.
Once I increased the size I had 0 issues with your existing code. Let me know if this helps.

Halide: OpenCL code generation

Is it possible in Halide to produce a file which contains generated OpenCL code? I have tried to produce a c file from a Halide program which target would be opencl, but I don't see any opencl specfic code there.
Edit 1:
I would like to see especially how kernels are created in Halide. Something like this:
static char
kernelSourceCode[] =
kernel void test_kernel(int a, int b, __global int *out)
{
out[0] = a + b;
}
Edit 2:
Ok, I put HL_DEBUG_CODEGEN=1 to env variable and set in the code set_target(Target::Debug). I got bunch of code on the screen, which some of were OpenCL code but I still can't see any kernel spesific code.
There are two lines on the screen which indicates about kernels. Should there be something?
OpenCL kernel:
/*OpenCL C*/
Then there is also a line:
kernel void _at_least_one_kernel(int x) { }
In example if I have a function like this:
gradient(x, y) = x + y;
Is the function inside a kernel if I want to target to OpenCL?
Here is what I managed to spot from documentation
CUDA or OpenCL are not enabled by default. You have to construct a Target object, enable one of them, and then pass that target object to compile_jit.
Target target = get_host_target();
target.set_feature(Target::OpenCL);
curved.compile_jit(target);
Or similarely you can use compile_to method, by providing the correct target.
EXPORT void Halide::Func::compile_to(const Outputs & output_files,
std::vector<Argument> args,
const std::string& fn_name,
const Target& target = get_target_from_environment()
)

How to use arrays in program (global) scope in OpenCL

AMD OpenCL Programming Guide, Section 6.3 Constant Memory Optimization:
Globally scoped constant arrays. These arrays are initialized,
globally scoped, and in the constant address space (as specified in
section 6.5.3 of the OpenCL specification). If the size of an array is
below 64 kB, it is placed in hardware constant buffers; otherwise, it
uses global memory. An example of this is a lookup table for math
functions.
I want to use this "globally scoped constant array". I have such code in pure C
#define SIZE 101
int *reciprocal_table;
int reciprocal(int number){
return reciprocal_table[number];
}
void kernel(int *output)
{
for(int i=0; i < SIZE; i+)
output[i] = reciprocal(i);
}
I want to port it into OpenCL
__kernel void kernel(__global int *output){
int gid = get_global_id(0);
output[gid] = reciprocal(gid);
}
int reciprocal(int number){
return reciprocal_table[number];
}
What should I do with global variable reciprocal_table? If I try to add __global or __constant to it I get an error:
global variable must be declared in addrSpace constant
I don't want to pass __constant int *reciprocal_table from kernel to reciprocal. Is it possible to initialize global variable somehow? I know that I can write it down into code, but does other way exist?
P.S. I'm using AMD OpenCL
UPD Above code is just an example. I have real much more complex code with a lot of functions. So I want to make array in program scope to use it in all functions.
UPD2 Changed example code and added citation from Programming Guide
#define SIZE 2
int constant array[SIZE] = {0, 1};
kernel void
foo (global int* input,
global int* output)
{
const uint id = get_global_id (0);
output[id] = input[id] + array[id];
}
I can get the above to compile with Intel as well as AMD. It also works without the initialization of the array but then you would not know what's in the array and since it's in the constant address space, you could not assign any values.
Program global variables have to be in the __constant address space, as stated by section 6.5.3 in the standard.
UPDATE Now, that I fully understood the question:
One thing that worked for me is to define the array in the constant space and then overwrite it by passing a kernel parameter constant int* array which overwrites the array.
That produced correct results only on the GPU Device. The AMD CPU Device and the Intel CPU Device did not overwrite the arrays address. It also is probably not compliant to the standard.
Here's how it looks:
#define SIZE 2
int constant foo[SIZE] = {100, 100};
int
baz (int i)
{
return foo[i];
}
kernel void
bar (global int* input,
global int* output,
constant int* foo)
{
const uint id = get_global_id (0);
output[id] = input[id] + baz (id);
}
For input = {2, 3} and foo = {0, 1} this produces {2, 4} on my HD 7850 Device (Ubuntu 12.10, Catalyst 9.0.2). But on the CPU I get {102, 103} with either OCL Implementation (AMD, Intel). So I can not stress, how much I personally would NOT do this, because it's only a matter of time, before this breaks.
Another way to achieve this is would be to compute .h files with the host during runtime with the definition of the array (or predefine them) and pass them to the kernel upon compilation via a compiler option. This, of course, requires recompilation of the clProgram/clKernel for every different LUT.
I struggled to get this work in my own program some time ago.
I did not find any way to initialize a constant or global scope array from the host via some clEnqueueWriteBuffer or so. The only way is to write it explicitely in your .cl source file.
So here my trick to initialize it from the host is to use the fact that you are actually compiling your source from the host, which also means you can alter your src.cl file before compiling it.
First my src.cl file reads:
__constant double lookup[SIZE] = { LOOKUP }; // precomputed table (in constant memory).
double func(int idx) {
return(lookup[idx])
}
__kernel void ker1(__global double *in, __global double *out)
{
... do something ...
double t = func(i)
...
}
notice the lookup table is initialized with LOOKUP.
Then, in the host program, before compiling your OpenCL code:
compute the values of my lookup table in host_values[]
on your host, run something like:
char *buf = (char*) malloc( 10000 );
int count = sprintf(buf, "#define LOOKUP "); // actual source generation !
for (int i=0;i<SIZE;i++) count += sprintf(buf+count, "%g, ",host_values[i]);
count += sprintf(buf+count,"\n");
then read the content of your source file src.cl and place it right at buf+count.
you now have a source file with an explicitely defined lookup table that you just computed from the host.
compile your buffer with something like clCreateProgramWithSource(context, 1, (const char **) &buf, &src_sz, err);
voilĂ  !
It looks like "array" is a look-up table of sorts. You'll need to clCreateBuffer and clEnqueueWriteBuffer so the GPU has a copy of it to use.

Regarding how the parameters to the read function is passed in simple char driver

I am newbei to driver programming i am started writing the simple char driver . Then i created special file for my char driver mknod /dev/simple-driver c 250 0 .when it type cat /dev/simple-driver. it shows the string "Hello world from Kernel mode!". i know that function
static const char g_s_Hello_World_string[] = "Hello world tamil_vanan!\n\0";
static const ssize_t g_s_Hello_World_size = sizeof(g_s_Hello_World_string);
static ssize_t device_file_read(
struct file *file_ptr
, char __user *user_buffer
, size_t count
, loff_t *possition)
{
printk( KERN_NOTICE "Simple-driver: Device file is read at offset =
%i, read bytes count = %u", (int)*possition , (unsigned int)count );
if( *possition >= g_s_Hello_World_size )
return 0;
if( *possition + count > g_s_Hello_World_size )
count = g_s_Hello_World_size - *possition;
if( copy_to_user(user_buffer, g_s_Hello_World_string + *possition, count) != 0 )
return -EFAULT;
*possition += count;
return count;
}
is get called . This is mapped to (*read) in file_opreation structure of my driver .My question is how this function is get called , how the parameters like struct file,char,count, offset are passed bcoz is i simply typed cat command ..Please elabroate how this happening
In Linux all are considered as files. The type of file, whether it is a driver file or normal file depends upon the mount point where it is mounted.
For Eg: If we consider your case : cat /dev/simple-driver traverses back to the mount point of device files.
From the device file name simple-driver it retrieves Major and Minor number.
From those number(especially from minor number) it associates the driver file for your character driver.
From the driver it uses struct file ops structure to find the read function, which is nothing but your read function:
static ssize_t device_file_read(struct file *file_ptr, char __user *user_buffer, size_t count, loff_t *possition)
User_buffer will always take sizeof(size_t count).It is better to keep a check of buffer(In some cases it throws warning)
String is copied to User_buffer(copy_to_user is used to check kernel flags during copy operation).
postion is 0 for first copy and it increments in the order of count:position+=count.
Once read function returns the buffer to cat. and cat flushes the buffer contents on std_out which is nothing but your console.
cat will use some posix version of read call from glibc. Glibc will put the arguments on the stack or in registers (this depends on your hardware architecture) and will switch to kernel mode. In the kernel the values will be copied to the kernel stack. And in the end your read function will be called.

Resources