OpenCL half4 type on Apple OS X

Does anybody know the state of half-precision floating-point support in OpenCL as implemented by Apple?
According to http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_khr_fp16.html
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
should enable support for types such as half4 but when I come to build the kernel the compiler throws a message such as
error: variable has incomplete type 'half4' (aka 'struct __Reserved_Name__Do_not_use_half4')
Is there any way I can get half4 support in Apple's OpenCL?
Thanks.

The latest shipping Apple implementation is on Lion, and it supports OpenCL 1.1. You are looking at the recently released OpenCL 1.2 specification. That simply documents what will be in a given 1.2 implementation of OpenCL, whoever the vendor might be.

The cl_khr_fp16 extension (floating-point operations on the 16-bit scalar type half and the vector types half2, half3, half4, half8 and half16) is an optional extension to OpenCL 1.0, 1.1 and 1.2.
An OpenCL extension defines a macro of the same name as the extension if it is supported in the OpenCL implementation.
e.g.
#ifdef cl_khr_fp16
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
... // Code using half
#else
#error No FP16 support
#endif
I do not believe Apple is currently shipping an OpenCL implementation with half support.

Related

How to use OpenGL ES with GLFW on Windows?

Since the NVIDIA DRIVE product supports the OpenGL ES 2 and 3 specifications, I want to run OpenGL ES code on Windows 10 with a GTX 2070.
Also, GLFW supports configuration hints such as glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API). Is it possible to use GLFW to run OpenGL ES code on Windows 10?
First of all, make sure you have downloaded glad with GL ES support.
https://glad.dav1d.de/
For the glfw part, you need to set the GLFW_CLIENT_API window hints.
glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API);
Also choose which version you want, for example:
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
Then specify the kind of context. In the case of OpenGL ES, according to the documentation, it must be GLFW_OPENGL_ANY_PROFILE.
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_ANY_PROFILE);
GLFW_OPENGL_PROFILE indicates the OpenGL profile used by the context.
This is GLFW_OPENGL_CORE_PROFILE or GLFW_OPENGL_COMPAT_PROFILE if the
context uses a known profile, or GLFW_OPENGL_ANY_PROFILE if the OpenGL
profile is unknown or the context is an OpenGL ES context. Note that
the returned profile may not match the profile bits of the context
flags, as GLFW will try other means of detecting the profile when no
bits are set.
However, GLFW_OPENGL_ANY_PROFILE is already the default value, so you don't really need to set it.
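Putting those hints together, a minimal end-to-end sketch might look like the following (assuming GLFW 3 and a driver that can actually provide an ES context; glad loading and most error handling are omitted):

```c
/* Minimal GLFW + OpenGL ES setup sketch. Assumes GLFW 3 and a driver able
 * to provide an ES 3.2 context. */
#include <GLFW/glfw3.h>

int main(void)
{
    if (!glfwInit())
        return 1;

    glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
    /* GLFW_OPENGL_ANY_PROFILE is already the default; shown for clarity. */
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_ANY_PROFILE);

    GLFWwindow *win = glfwCreateWindow(640, 480, "GLES", NULL, NULL);
    if (!win) {                   /* no ES-capable driver was found */
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(win);
    /* ...load the GL ES function pointers with glad here, then render... */

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}
```

Note that this requires a display and an ES-capable driver at run time, so it cannot be exercised headlessly.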

CUDA-like workflow for OpenCL

The typical example workflow for OpenCL programming seems to be focused on source code within strings, passed to the JIT compiler, then finally enqueued (with a specific kernel name); and the compilation results can be cached - but that's left for you the programmer to take care of.
In CUDA, the code is compiled in a non-JIT way to object files (alongside host-side code, but forget about that for a second), and then one just refers to device-side functions in the context of an enqueue or arguments etc.
Now, I'd like to have the second kind of workflow, but with OpenCL sources. That is, suppose I have some C host-side code, my_app.c, and some OpenCL kernel code in a separate file, my_kernel.cl (which for the purpose of discussion is self-contained). I would like to be able to run a magic command on my_kernel.cl, get a my_kernel.whatever, link (or faux-link) that together with my_app.o, and get a binary. Then, in my_app.c, I want to be able to somehow refer to the kernel, even if it's not an extern symbol, as a compiled OpenCL program (or program + kernel name) - and not get compilation errors.
Is this supported somehow? With NVIDIA's ICD or with one of the other ICDs? If not, is at least some of this supported - say, the magic kernel compiler plus generation of an extra header or source stub to use when compiling my_app.c?
Look into SYCL; it offers single-source C++ for OpenCL. However, it is not yet available on every platform.
https://www.khronos.org/sycl
There is already an ongoing effort to enable a CUDA-like workflow in TensorFlow using SYCL 1.2; it is actively being upstreamed.
Similarly to CUDA, SYCL's approach needs the following steps:
device registration via a device factory (the device is called SYCL) - done here: https://github.com/lukeiwanski/tensorflow/tree/master/tensorflow/core/common_runtime/sycl
operation registration for the above device. In order to create / port an operation you can either:
re-use Eigen's code, since the Tensor module has a SYCL back-end (see https://github.com/lukeiwanski/tensorflow/blob/opencl/adjustcontrastv2/tensorflow/core/kernels/adjust_contrast_op.cc#L416 - we just partially specialize the operation for the SYCL device and call the already implemented functor: https://github.com/lukeiwanski/tensorflow/blob/opencl/adjustcontrastv2/tensorflow/core/kernels/adjust_contrast_op.h#L91); or
write SYCL code - as has been done for FillPhiloxRandom - see https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/random_op.cc#L685
SYCL kernel uses modern C++
you can use OpenCL interoperability - thanks to which you can write pure OpenCL C kernel code! - I think this bit is most relevant to you
The workflow is a bit different, as you do not have to do an explicit instantiation of the functor templates the way CUDA does in https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc, or in any .cu.cc file (in fact you do not have to add any new files at all, which avoids a mess with the build system).
There is also this issue to be aware of: https://github.com/lukeiwanski/tensorflow/issues/89.
TL;DR - CUDA can create "persistent" pointers, while OpenCL needs to go through Buffers and Accessors.
Codeplay's SYCL compiler (ComputeCpp) at the moment requires OpenCL 1.2 with the SPIR extension. That covers Intel CPUs, Intel GPUs (Beignet support is a work in progress) and AMD GPUs (although with older drivers); additional platforms are coming!
Setup instructions can be found here: https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl
Our effort can be tracked in my fork of TensorFlow: https://github.com/lukeiwanski/tensorflow ( branch dev/eigen_mehdi )
Eigen used is: https://bitbucket.org/mehdi_goli/opencl ( branch default )
We are getting there! Contributions are welcome! :)
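Outside of SYCL, one low-tech approximation of the "magic command" from the question is to embed the .cl source into the binary at build time, e.g. with xxd. The kernel is still JIT-compiled at run time via clCreateProgramWithSource, but my_app.c can refer to it as an ordinary symbol and no file needs to be shipped alongside the binary. A sketch (the kernel body here is a made-up placeholder):

```shell
# Generate a C header exposing the kernel source as a byte array.
printf '__kernel void f(__global int *p) { p[0] = 42; }\n' > my_kernel.cl
xxd -i my_kernel.cl > my_kernel.h
# my_kernel.h now defines: unsigned char my_kernel_cl[];
#                          unsigned int  my_kernel_cl_len;
# my_app.c can #include it and pass my_kernel_cl (NUL-terminated or
# length-passed) to clCreateProgramWithSource.
cat my_kernel.h
```

This assumes xxd is installed; CMake and Meson offer equivalent resource-embedding steps.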

OpenCL half precision extension support on Apple OS X

Does anybody know the state of half-precision floating-point support in OpenCL as implemented by Apple?
According to the OpenCL 1.1 spec, the following statement should enable half2:
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
but when I come to build the kernel the compiler throws a message such as
error: variable has incomplete type 'half4' (aka 'struct __Reserved_Name__Do_not_use_half4')
The following thread asks a similar question: OpenCL half4 type Apple OS X
But that thread is old. Can anyone tell me whether half precision is supported by Apple now?
When you want to know whether an extension is supported by a specific implementation (regardless of whether it is Apple's or another vendor's), just use the function
cl_int clGetPlatformInfo(cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret)
passing the value CL_PLATFORM_EXTENSIONS for the param_name argument. It will return a space-separated list of extension names.
Note that this list contains only the extensions "supported by all devices associated with this platform".
This means that even if the platform supports the cl_khr_fp16 extension, it won't appear in the list unless every device supports it too.
To know the extension available on your device use
clGetDeviceInfo(...)
with the value CL_DEVICE_EXTENSIONS for the param_name argument.
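Since the returned list is space-separated, a plain strstr is not quite right: one extension name can be a prefix of another. A minimal sketch of a token-exact search, with has_extension as a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` occurs as a whole space-separated token in `ext_list`
 * (the string filled in via CL_PLATFORM_EXTENSIONS or CL_DEVICE_EXTENSIONS),
 * 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += len;   /* skip this partial match and keep scanning */
    }
    return 0;
}
```

You would call this on the buffer filled in by clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...).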
For a generic answer to OpenCL extension querying see CaptainObvious' answer above (https://stackoverflow.com/a/17425167/5394228).
I asked Apple Developer Support about this, and they said that half support is available in Metal and that there are no plans to add new functionality to OpenCL now. (They answered in Nov 2017.)

Do GLSL geometry shaders work on the GMA X3100 under OSX

I am trying to use a trivial geometry shader, but when run in Shader Builder on a laptop with a GMA X3100 it falls back to the software renderer. According to this document, the GMA X3100 does support EXT_geometry_shader4.
The input is POINTS and the output is LINE_STRIP.
What would be required to get it to run on the GPU (if that is possible)?
uniform vec2 offset;
void main()
{
gl_Position = gl_PositionIn[0];
EmitVertex();
gl_Position = gl_PositionIn[0] + vec4(offset.x,offset.y,0,0);
EmitVertex();
EndPrimitive();
}
From the docs you link to it certainly appears it should be supported.
You could try
int hasGEOM = isExtensionSupported("EXT_geometry_shader4");
If it returns in the affirmative you may have another problem stopping it from working.
Also according to the GLSL Spec (1.20.8) "Any extended behavior must first be enabled.
Directives to control the behavior of the compiler with respect to extensions are declared with the #extension directive"
I didn't see you use this directive in your code, so I suggest adding
#extension GL_EXT_geometry_shader4 : enable
At the top of your shader code block.
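For reference, here is the question's shader with that directive added (a sketch only: EXT_geometry_shader4 also requires the input and output primitive types to be set from the API, via glProgramParameteriEXT with GL_GEOMETRY_INPUT_TYPE_EXT and GL_GEOMETRY_OUTPUT_TYPE_EXT):

```glsl
#version 120
#extension GL_EXT_geometry_shader4 : enable

uniform vec2 offset;

void main()
{
    // Emit the original point, then a second vertex displaced by `offset`,
    // producing one line segment per input point.
    gl_Position = gl_PositionIn[0];
    EmitVertex();
    gl_Position = gl_PositionIn[0] + vec4(offset.x, offset.y, 0.0, 0.0);
    EmitVertex();
    EndPrimitive();
}
```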
I've found this OpenGL Extensions Viewer tool really helpful in tracking down these sorts of issues. It will certainly allow you to confirm Apple's claims. That said, wikipedia states that official GLSL support for geometry shaders is technically an OpenGL 3.2 feature.
Does anyone know if the EXT_geometry_shader4 implementation supports the GLSL syntax, or does it require some hardware or driver specific format?
Interestingly enough, I've heard that the compatibility claims of Intel regarding these integrated GPUs are sometimes overstated or just false. Apparently the X3100 only supports OpenGL 1.4 and below (or so I've heard, take this with a grain of salt, as I can't confirm this).
On my HP Laptop, with an Intel x3100 using Windows 7 x64 drivers (v8.15.10.1930 (9-23-2009)) directly from Intel's website, the extension "EXT_geometry_shader4" (or any variation of it) is NOT supported. I've confirmed this programmatically and using the tool "GPU Caps Viewer" (which lists detected supported extensions, amongst other useful things). Since Windows tends to be the primary subject of driver development from any vendor, it's unlikely the OSX driver is any better, and may in fact have even less supported extensions.

Enabling floating point interrupts on Mac OS X Intel

On Linux, feenableexcept and fedisableexcept can be used to control the generation of SIGFPE interrupts on floating point exceptions. How can I do this on Mac OS X Intel?
Inline assembly for enabling floating point interrupts is provided in http://developer.apple.com/documentation/Performance/Conceptual/Mac_OSX_Numerics/Mac_OSX_Numerics.pdf, pp. 7-15, but only for PowerPC assembly.
Exceptions for SSE can be enabled using _MM_SET_EXCEPTION_MASK from xmmintrin.h. For example, to enable invalid (NaN) exceptions, do
#include <xmmintrin.h>
...
_MM_SET_EXCEPTION_MASK(_MM_GET_EXCEPTION_MASK() & ~_MM_MASK_INVALID);
On Mac OS X this is moderately complicated. OS X uses the SSE unit for all FP math by default, not the x87 FP unit. The SSE unit does not honor the interrupt options, so that means that in addition to enabling interrupts, you need to make sure to compile all your code not to use SSE math.
You can disable the math by adding "-mno-sse -mno-sse2 -mno-sse3" to your CFLAGS. Once you do that you can use some inline assembly to configure your FP exceptions, with basically the same flags as Linux.
short fpflags = 0x1332;   /* Default FP control word; change this however you want. */
asm volatile ("fnclex");
asm volatile ("fldcw %0" : : "m" (fpflags));
The one catch you may find is that since OS X is built entirely using SSE, there may be uncaught bugs. I know there used to be a bug with the signal handler not passing back the proper codes, but that was a few years ago; hopefully it is fixed now.
